
1 KPT Validation with Kepler Stellar Light Curves

1.1 Experimental Design

Objective: Test whether Kime-Phase Tomography (KPT) can discover latent phase structure in repeated measurements of a stochastic astrophysical process—stellar brightness variability.

Rationale: KPT requires data where multiple independent realizations of a longitudinal process are observed. Kepler provides an ideal testbed: each star is an independent realization of the “stellar variability process,” observed over the same time period with identical instrumentation.

1.2 Data Source

Mission: NASA Kepler Space Telescope (2009–2013)

Archive: Mikulski Archive for Space Telescopes (MAST)

Product: Pre-search Data Conditioning Simple Aperture Photometry (PDCSAP) flux.

Sample Selection:

  • Sun-like stars: \(T_{\rm eff} \in [5500, 6000]\) K
  • Main sequence: \(\log g \in [4.0, 4.6]\) dex
  • Bright targets: Kepler magnitude \(< 13\) (high SNR)
  • Quarter \(5\) data (stable instrumental performance).

Data Dictionary

Variable Dimension Description
Y \(I \times K\) Standardized variability matrix; rows = stars, columns = time windows
Y_raw \(I \times K\) Raw variance of flux within each window (\(ppm^2\))
t_center \(K\) Window center times (days from start)
n_stars \(1\) Number of stellar realizations (\(I = 75\))
n_windows \(1\) Number of temporal windows (\(K = 28\))
metadata \(I \times 10\) Stellar properties from Kepler Input Catalog

Metadata fields: kepid (Kepler ID), ra, dec (J2000 coordinates), teff (effective temperature, K), logg (surface gravity, dex), feh (metallicity, dex), kepmag (Kepler magnitude), radius (R☉), mass (M☉), star_id (index).

The KPT data structure mapping is shown below.

KPT Concept Kepler Implementation
Realization index \(i\) Star \(i\) (independent stellar system)
Time window \(k\) \(5\)-day binned interval
Observation \(Y_{i\ k}\) Standardized flux variance of star \(i\) in window \(k\)
Kime-phase \(\theta_i\) Latent stellar “activity phase” or rotation phase
Phase distribution \(\varphi(\theta)\) Distribution of stellar phases across the sample
Kime-surface \(\mathbb{S}(\theta, t)\) Phase-dependent variability pattern
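
For orientation, here is a minimal access sketch of this data structure; kpt_kepler is the object produced later by create_kpt_matrix_kepler() in Section 2.7, and the dimensions shown are those of the final run (\(I = 75\), \(K = 28\)).

# Sketch: field access for the KPT data object summarized in the dictionary above
dim(kpt_kepler$Y)            # 75 stars x 28 time windows (standardized variability)
head(kpt_kepler$t_center)    # window-center times, in days from the start of the quarter
head(kpt_kepler$metadata)    # Kepler Input Catalog properties (kepid, teff, logg, ...)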

1.3 Hypothesis Tested

Null Hypothesis (\(H_0\)): Stellar variability is phase-independent. In other words, \(\varphi(\theta)=\) uniform (all phases equally likely), \(\mathbb{S}(\theta, t)=\) constant (no \(\theta\)-dependence), and the optimal model is \(Y_{i k} \sim N(\mu, \sigma^2)\).

Alternative Hypothesis (\(H_1\)): Stellar variability depends on a latent phase, i.e., \(\varphi(\theta)\not=\) uniform (certain phases more prevalent), \(\mathbb{S}(\theta, t)\) varies with \(\theta\) (phase-dependent patterns exist), and KPT provides improved predictions beyond the additive noise, simple Gaussian model.
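
Written as generative models, using the notation from the mapping table in Section 1.2:

\[
H_0:\; Y_{ik} \sim N(\mu,\ \sigma^2),
\qquad
H_1:\; Y_{ik}\mid\theta_i \sim N\!\big(\mathbb{S}(\theta_i, t_k),\ \sigma^2\big),
\quad \theta_i \sim \varphi(\theta),
\]

so under \(H_1\) each star carries a latent phase \(\theta_i\) drawn from \(\varphi\), and its expected variability in window \(k\) is the kime-surface evaluated at \((\theta_i, t_k)\).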

1.4 Validation Criteria

Metric \(H_0\) (No Structure) \(H_1\) (Structure Detected)
\(\varphi\) deviation from uniform \(< 0.1\) \(> 0.2\)
\(\mathbb{S}\) average \(\theta\)-variation \(< 0.1\) \(> 0.2\)
KPT vs Gaussian RMSE Equal KPT lower
\(95\%\) PI coverage \(\sim 95\%\) \(\sim 95\%\)
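
The first two metrics correspond directly to the diagnostics returned by kpt_gem_stellar() in Section 2.8 (phi_uniform_dist and S_variation); as a sketch, given a fitted object such as fit_gem_kepler:

# Sketch: structure diagnostics, with field names as returned by kpt_gem_stellar()
phi_uniform_dist <- max(abs(fit_gem_kepler$phi - 1))      # compare to the 0.1 / 0.2 thresholds above
S_variation      <- mean(apply(fit_gem_kepler$S, 1, sd))  # compare to the 0.1 / 0.2 thresholds above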

Our experiments using Kepler stellar data indicate that:

Observed: \(\varphi(\theta)\approx\) uniform, \(\mathbb{S}(\theta, t)\approx\) constant.

Hence, KPT correctly identified that stellar variability in this sample has no exploitable kime-phase structure. The stochastic process (granulation, magnetic activity) is statistically homogeneous across stars. Thus, there is no evidence that a hidden phase variable systematically affects variability levels.

This null result does not falsify KPT. It demonstrates that KPT correctly recovers the null structure when no phase dependence exists. This validates the algorithm’s behavior and confirms that Sun-like stellar variability is well modeled as a phase-independent Gaussian process.

2 Implementation: Kepler Light Curves for KPT

This notebook provides a complete implementation for using Kepler stellar light curves as the validation dataset for Kime-Phase Tomography.

2.1 Data Access Strategy

library(tidyverse)
library(httr)
library(jsonlite)

# For handling FITS files (Kepler data format)
# install.packages("FITSio") if needed
library(FITSio)

2.2 Step 1: Query Kepler Targets

We select Sun-like stars with similar properties to ensure the “process” being measured is comparable across realizations.

#' Query NASA Exoplanet Archive for Kepler stellar properties
#' 
#' @param n_targets Maximum number of targets
#' @param teff_range Temperature range (K) for Sun-like stars
#' @param logg_range Surface gravity range for main-sequence
#' @param mag_max Maximum Kepler magnitude (brightness limit)
query_sunlike_targets <- function(n_targets = 500,
                                   teff_range = c(5500, 6000),
                                   logg_range = c(4.0, 4.6),
                                   mag_max = 13) {
  
  base_url <- "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"
  
  query <- sprintf(
    "SELECT TOP %d kepid, ra, dec, teff, logg, feh, kepmag, radius, mass
     FROM keplerstellar 
     WHERE teff BETWEEN %d AND %d
       AND logg BETWEEN %.1f AND %.1f
       AND kepmag < %.1f
       AND radius IS NOT NULL
     ORDER BY kepmag ASC",
    n_targets, 
    teff_range[1], teff_range[2],
    logg_range[1], logg_range[2],
    mag_max
  )
  
  response <- GET(base_url, query = list(query = query, format = "json"))
  
  if (status_code(response) != 200) {
    stop("Query failed with status: ", status_code(response))
  }
  
  data <- fromJSON(rawToChar(response$content))
  as_tibble(data)
}

# Query Sun-like targets
cat("Querying Kepler archive for Sun-like stars...\n")
## Querying Kepler archive for Sun-like stars...
targets <- tryCatch({
  query_sunlike_targets(n_targets = 200)
}, error = function(e) {
  cat("Error:", e$message, "\n")
  cat("Using simulated data for demonstration\n")
  NULL
})

if (!is.null(targets)) {
  cat(sprintf("\nFound %d Sun-like targets\n", nrow(targets)))
  cat("\nStellar parameter ranges:\n")
  cat(sprintf("  Teff: %d - %d K\n", min(targets$teff), max(targets$teff)))
  cat(sprintf("  log g: %.2f - %.2f\n", min(targets$logg), max(targets$logg)))
  cat(sprintf("  Kepmag: %.1f - %.1f\n", min(targets$kepmag), max(targets$kepmag)))
  print(head(targets, 10))
}
## 
## Found 200 Sun-like targets
## 
## Stellar parameter ranges:
##   Teff: 5545 - 5997 K
##   log g: 4.09 - 4.58
##   Kepmag: 6.0 - 13.0
## # A tibble: 10 × 9
##       kepid    ra   dec  teff  logg   feh kepmag radius mass 
##       <int> <dbl> <dbl> <int> <dbl> <dbl>  <dbl>  <dbl> <lgl>
##  1 10005473  289.  47.0  5780  4.44     0   6.00      1 NA   
##  2 10002106  287.  47.0  5780  4.44     0   8.98      1 NA   
##  3 10064712  287.  47.1  5780  4.44     0   9.36      1 NA   
##  4 10068519  289.  47.1  5780  4.44     0   9.54      1 NA   
##  5 10031808  299.  46.9  5780  4.44     0   9.56      1 NA   
##  6 10200031  288.  47.2  5780  4.44     0   9.84      1 NA   
##  7 10202015  289.  47.2  5780  4.44     0   9.86      1 NA   
##  8 10139791  291.  47.2  5780  4.44     0   9.90      1 NA   
##  9 10082904  295.  47.1  5780  4.44     0   9.93      1 NA   
## 10 10068482  289.  47.1  5780  4.44     0   9.99      1 NA

2.3 Step 2: Download Light Curves

#' Get MAST data product URLs for a Kepler target
#' 
#' @param kepid Kepler Input Catalog ID
#' @param product_type Type of light curve ("LLC" for long cadence)
get_kepler_urls <- function(kepid, product_type = "LLC") {
  
  # MAST API query for Kepler data products
  mast_url <- "https://mast.stsci.edu/api/v0/invoke"
  
  request_body <- list(
    service = "Mast.Caom.Filtered",
    format = "json",
    params = list(
      filters = list(
        list(paramName = "target_name", values = list(as.character(kepid))),
        list(paramName = "obs_collection", values = list("Kepler")),
        list(paramName = "dataproduct_type", values = list("timeseries"))
      )
    )
  )
  
  response <- POST(
    mast_url,
    body = toJSON(request_body, auto_unbox = TRUE),
    content_type_json(),
    timeout(60)
  )
  
  if (status_code(response) != 200) {
    return(NULL)
  }
  
  result <- fromJSON(rawToChar(response$content))
  return(result$data)
}

#' Download and parse a Kepler FITS light curve
#' 
#' @param url URL to FITS file
#' @param use_pdcflux Use PDC-corrected flux (TRUE) or SAP flux (FALSE)
download_lightcurve <- function(url, use_pdcflux = TRUE) {
  
  # Download to temp file
  temp_file <- tempfile(fileext = ".fits")
  
  tryCatch({
    download.file(url, temp_file, mode = "wb", quiet = TRUE)
    
    # Read FITS file
    fits <- readFITS(temp_file)
    
    # Extract time and flux
    time <- fits$col[[1]]  # TIME column (BJD - 2454833)
    
    if (use_pdcflux) {
      flux <- fits$col[[8]]  # PDCSAP_FLUX (corrected)
    } else {
      flux <- fits$col[[4]]  # SAP_FLUX (raw)
    }
    
    quality <- fits$col[[10]]  # SAP_QUALITY flags
    
    # Clean up temp file
    unlink(temp_file)
    
    # Return data frame with good quality data
    df <- data.frame(
      time = time,
      flux = flux,
      quality = quality
    ) %>%
      filter(quality == 0, !is.na(flux), !is.na(time))
    
    return(df)
    
  }, error = function(e) {
    unlink(temp_file)
    return(NULL)
  })
}
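
Note that download_lightcurve() above relies on fixed column positions in the FITS binary table; a more defensive variant (and the one used later in download_kepler_via_mast()) looks the columns up by name. A sketch, assuming the same temp_file context:

# Sketch: name-based FITS column lookup, robust to differences in column order
fits <- readFITS(temp_file)
time_idx <- which(fits$colNames == "TIME")
flux_idx <- which(fits$colNames == "PDCSAP_FLUX")
qual_idx <- which(fits$colNames == "SAP_QUALITY")
time    <- as.numeric(fits$col[[time_idx]])
flux    <- as.numeric(fits$col[[flux_idx]])
quality <- as.integer(fits$col[[qual_idx]])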

2.4 Step 3: Simulated Kepler-like Data

For a simple demonstration without downloading actual Kepler data, we simulate Kepler-like light curves with realistic variability properties.

#' Simulate Kepler-like stellar light curves
#' 
#' Includes realistic components:
#' - Granulation noise (red noise)
#' - Rotation + spots (quasi-periodic)
#' - Photon noise (white noise)
#' 
#' @param n_stars Number of stars (realizations)
#' @param n_time Number of time points
#' @param cadence Cadence in minutes
#' @param variability_ppm RMS variability in parts per million
simulate_kepler_lightcurves <- function(n_stars = 100,
                                         n_time = 5000,
                                         cadence = 30,
                                         variability_ppm = 100) {
  
  # Time in days
  dt <- cadence / (60 * 24)  # Convert minutes to days
  t <- seq(0, (n_time - 1) * dt, by = dt)
  
  # Initialize output matrix
  Y <- matrix(NA_real_, nrow = n_stars, ncol = n_time)
  metadata <- tibble(
    star_id = 1:n_stars,
    rotation_period = numeric(n_stars),
    granulation_tau = numeric(n_stars),
    activity_level = numeric(n_stars)
  )
  
  cat("Simulating", n_stars, "stellar light curves...\n")
  
  for (i in 1:n_stars) {
    # Stellar parameters (vary between stars)
    P_rot <- runif(1, 10, 30)  # Rotation period (days)
    tau_gran <- runif(1, 0.1, 0.5)  # Granulation timescale (days)
    activity <- runif(1, 0.3, 1.5)  # Activity level multiplier
    
    metadata$rotation_period[i] <- P_rot
    metadata$granulation_tau[i] <- tau_gran
    metadata$activity_level[i] <- activity
    
    # 1. Granulation noise (red noise via AR process)
    ar_coef <- exp(-dt / tau_gran)
    granulation <- numeric(n_time)
    granulation[1] <- rnorm(1, 0, variability_ppm * 0.5 * activity)
    for (j in 2:n_time) {
      granulation[j] <- ar_coef * granulation[j-1] + 
                        sqrt(1 - ar_coef^2) * rnorm(1, 0, variability_ppm * 0.5 * activity)
    }
    
    # 2. Rotation + spots (quasi-periodic)
    phase <- runif(1, 0, 2*pi)  # Random initial phase
    spot_amp <- variability_ppm * 0.3 * activity
    rotation_signal <- spot_amp * sin(2*pi*t/P_rot + phase) +
                       spot_amp * 0.5 * sin(4*pi*t/P_rot + phase + runif(1, 0, pi))
    
    # 3. Photon noise (white noise, depends on stellar brightness)
    photon_noise <- rnorm(n_time, 0, variability_ppm * 0.2)
    
    # Combine components
    flux <- 1e6 + granulation + rotation_signal + photon_noise
    
    # Normalize to zero mean, unit variance
    Y[i, ] <- (flux - mean(flux)) / sd(flux)
  }
  
  cat("Done!\n")
  
  list(
    Y = Y,
    t = t,
    metadata = metadata,
    n_stars = n_stars,
    n_time = n_time,
    cadence_min = cadence
  )
}

# Generate simulated data
set.seed(42)
kepler_sim <- simulate_kepler_lightcurves(
  n_stars = 100,
  n_time = 2000,  # About 40 days at 30-min cadence
  cadence = 30,
  variability_ppm = 100
)
## Simulating 100 stellar light curves...
## Done!
cat("\n=== SIMULATED KEPLER DATA ===\n")
## 
## === SIMULATED KEPLER DATA ===
cat(sprintf("Stars (realizations): %d\n", kepler_sim$n_stars))
## Stars (realizations): 100
cat(sprintf("Time points: %d\n", kepler_sim$n_time))
## Time points: 2000
cat(sprintf("Time span: %.1f days\n", max(kepler_sim$t)))
## Time span: 41.6 days
cat(sprintf("Cadence: %d minutes\n", kepler_sim$cadence_min))
## Cadence: 30 minutes
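
The granulation component generated inside simulate_kepler_lightcurves() is a stationary AR(1) (red-noise) process; with cadence \(\Delta t\) and granulation timescale \(\tau\) it reads

\[
g_j \;=\; e^{-\Delta t/\tau}\, g_{j-1} \;+\; \sqrt{1 - e^{-2\Delta t/\tau}}\;\varepsilon_j,
\qquad \varepsilon_j \sim N\big(0,\ \sigma_{\rm gran}^2\big),
\]

where the \(\sqrt{1 - e^{-2\Delta t/\tau}}\) factor keeps the stationary variance at \(\sigma_{\rm gran}^2\) regardless of the cadence (the code sets \(\sigma_{\rm gran} = 0.5 \times\) variability_ppm \(\times\) activity).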

2.5 Step 3A: Using Real Kepler Light Curves

Here is a robust version operating with real Kepler data.

  • Automatic method selection: lightkurve (Python package) is tried first, falling back to pure R if it is unavailable,
  • Same output structure as simulate_kepler_lightcurves(),
  • kepler_real$Y: matrix of dimension \(n\_stars \times n\_time\),
  • kepler_real$t: time vector (days),
  • kepler_real$metadata: tibble with stellar properties,
  • kepler_real$n_stars and kepler_real$n_time: number of stars and number of time points.

Invocation protocol: replace the prior simulation chunk with the real Kepler download chunk below.

Instead of kepler_sim <- simulate_kepler_lightcurves(n_stars = 100, ...), the real-data workflow calls download_kepler_lightcurves().


kepler_real <- download_kepler_lightcurves(
  n_stars = 100,              # 100 independent stellar realizations
  teff_range = c(5500, 6000), # Sun-like stars
  max_time_days = 60,         # ~2 months of data
  method = "auto"             # Auto-select best method
)

Then continue with the existing functions create_kpt_matrix_kepler() and kpt_gem_stellar().


kpt_kepler <- create_kpt_matrix_kepler(kepler_real, window_days = 5, step_days = 2)
fit_kepler <- kpt_gem_stellar(kpt_kepler$Y)

The key invocation parameters include:


download_kepler_lightcurves(
  n_stars = 100,        # Number of independent realizations
  teff_range = c(5500, 6000),  # Temperature range (K)
  logg_range = c(4.0, 4.6),    # Surface gravity (main sequence)
  mag_range = c(10, 13),       # Brightness (brighter = better SNR)
  quarter = 5,          # Kepler quarter (1-17)
  max_time_days = 60,   # Time span to use
  method = "auto"       # "auto", "lightkurve", or "mast"
)

In practice, the real Kepler data download may take \(5\)-\(20\) minutes, depending on the \(n\_stars\) value and network speed.

# ==============================================================================
# DOWNLOAD REAL KEPLER STELLAR LIGHT CURVES - UNIFIED VERSION
# ==============================================================================
#
# This chunk downloads actual observed Kepler light curves to use as the
# ultimate validation dataset for Kime-Phase Tomography.
#
# Two methods available:
#   1. lightkurve (Python) via reticulate - Most reliable, more features
#   2. Pure R via direct MAST API - No Python dependency
#
# Output structure matches simulate_kepler_lightcurves():
#   - Y: matrix (n_stars × n_time) of standardized flux
#   - t: time vector in days (starting at 0)
#   - metadata: tibble with stellar properties
#   - n_stars, n_time, cadence_min: scalar metadata
#
# ==============================================================================

library(tidyverse)
library(httr)
library(jsonlite)

# ==============================================================================
# HELPER: Query Kepler Input Catalog
# ==============================================================================

query_kepler_targets <- function(n_targets = 500,
                                  teff_range = c(5500, 6000),
                                  logg_range = c(4.0, 4.6),
                                  mag_range = c(10, 13)) {
  
  tap_url <- "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"
  
  query <- sprintf(
    "SELECT TOP %d kepid, ra, dec, teff, logg, feh, kepmag, radius, mass
     FROM keplerstellar 
     WHERE teff BETWEEN %d AND %d
       AND logg BETWEEN %.2f AND %.2f
       AND kepmag BETWEEN %.1f AND %.1f
       AND radius IS NOT NULL AND mass IS NOT NULL
     ORDER BY kepmag ASC",
    n_targets,
    teff_range[1], teff_range[2],
    logg_range[1], logg_range[2],
    mag_range[1], mag_range[2]
  )
  
  response <- GET(tap_url, query = list(query = query, format = "json"), timeout(120))
  
  if (status_code(response) != 200) {
    stop("Failed to query Kepler stellar catalog")
  }
  
  as_tibble(fromJSON(rawToChar(response$content)))
}

# ==============================================================================
# METHOD 1: Using lightkurve (Python) - RECOMMENDED
# ==============================================================================

download_kepler_via_lightkurve <- function(targets,
                                            n_stars = 100,
                                            quarter = 5,
                                            cadence = "long",
                                            max_time_days = 60,
                                            verbose = TRUE) {
  
  # Check Python availability
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    stop("reticulate package required. Install with: install.packages('reticulate')")
  }
  library(reticulate)
  
  # Check/install lightkurve
  if (!py_module_available("lightkurve")) {
    if (verbose) cat("Installing lightkurve Python package...\n")
    py_install("lightkurve", pip = TRUE)
  }
  
  lk <- import("lightkurve")
  
  if (verbose) cat("Downloading via lightkurve...\n")
  
  lightcurves <- list()
  metadata_list <- list()
  successful <- 0
  
  for (i in seq_len(min(nrow(targets), n_stars * 3))) {
    if (successful >= n_stars) break
    
    kepid <- targets$kepid[i]
    
    if (verbose && (i %% 20 == 0 || i == 1)) {
      cat(sprintf("  Target %d: KIC %d (success: %d/%d)\n", i, kepid, successful, n_stars))
    }
    
    lc_result <- tryCatch({
      search_str <- sprintf("KIC %d", kepid)
      
      search_result <- lk$search_lightcurve(
        search_str,
        mission = "Kepler",
        quarter = as.integer(quarter),
        cadence = cadence
      )
      
      if (length(search_result) == 0) return(NULL)
      
      lc <- search_result$download(quality_bitmask = "hard")
      
      time_raw <- as.numeric(lc$time$value)
      flux_raw <- as.numeric(lc$flux$value)
      
      valid <- is.finite(time_raw) & is.finite(flux_raw)
      if (sum(valid) < 100) return(NULL)
      
      list(
        time = time_raw[valid] - min(time_raw[valid]),
        flux = (flux_raw[valid] / median(flux_raw[valid]) - 1) * 1e6,
        kepid = kepid
      )
    }, error = function(e) NULL)
    
    if (!is.null(lc_result)) {
      successful <- successful + 1
      lightcurves[[successful]] <- lc_result
      metadata_list[[successful]] <- targets[i, ] %>% mutate(star_id = successful)
    }
    
    Sys.sleep(0.3)
  }
  
  list(lightcurves = lightcurves, metadata = bind_rows(metadata_list), n = successful)
}

# ==============================================================================
# METHOD 2: Pure R via MAST API
# ==============================================================================

download_kepler_via_mast <- function(targets,
                                      n_stars = 50,
                                      max_time_days = 60,
                                      verbose = TRUE) {
  
  if (!requireNamespace("FITSio", quietly = TRUE)) {
    install.packages("FITSio")
  }
  library(FITSio)
  
  if (verbose) cat("Downloading via direct MAST API (pure R)...\n")
  
  lightcurves <- list()
  metadata_list <- list()
  successful <- 0
  
  # Quarter timestamps for filename construction
  q_timestamps <- c("2010078095331", "2010174085026", "2009350155506", "2010265121752")
  
  for (i in seq_len(min(nrow(targets), n_stars * 5))) {
    if (successful >= n_stars) break
    
    kepid <- targets$kepid[i]
    kepid_str <- sprintf("%09d", kepid)
    prefix <- substr(kepid_str, 1, 4)
    
    if (verbose && (i %% 10 == 0 || i == 1)) {
      cat(sprintf("  Target %d: KIC %d (success: %d/%d)\n", i, kepid, successful, n_stars))
    }
    
    lc_result <- NULL
    
    for (ts in q_timestamps) {
      fits_url <- sprintf(
        "https://archive.stsci.edu/missions/kepler/lightcurves/%s/%s/kplr%s-%s_llc.fits",
        prefix, kepid_str, kepid_str, ts
      )
      
      lc_result <- tryCatch({
        temp_file <- tempfile(fileext = ".fits")
        on.exit(unlink(temp_file), add = TRUE)
        
        download.file(fits_url, temp_file, mode = "wb", quiet = TRUE)
        fits_data <- readFITS(temp_file)
        
        col_names <- fits_data$colNames
        time_idx <- which(col_names == "TIME")
        flux_idx <- which(col_names == "PDCSAP_FLUX")
        qual_idx <- which(col_names == "SAP_QUALITY")
        
        if (length(time_idx) == 0 || length(flux_idx) == 0) return(NULL)
        
        time <- as.numeric(fits_data$col[[time_idx]])
        flux <- as.numeric(fits_data$col[[flux_idx]])
        quality <- if (length(qual_idx) > 0) as.integer(fits_data$col[[qual_idx]]) else rep(0L, length(time))
        
        valid <- (quality == 0) & is.finite(time) & is.finite(flux) & (flux > 0)
        if (sum(valid) < 100) return(NULL)
        
        list(
          time = time[valid] - min(time[valid]),
          flux = (flux[valid] / median(flux[valid]) - 1) * 1e6,
          kepid = kepid
        )
      }, error = function(e) NULL)
      
      if (!is.null(lc_result)) break
    }
    
    if (!is.null(lc_result)) {
      successful <- successful + 1
      lightcurves[[successful]] <- lc_result
      metadata_list[[successful]] <- targets[i, ] %>% mutate(star_id = successful)
    }
    
    Sys.sleep(0.2)
  }
  
  list(lightcurves = lightcurves, metadata = bind_rows(metadata_list), n = successful)
}

# ==============================================================================
# MAIN FUNCTION: Align and create matrix
# ==============================================================================

create_kepler_matrix <- function(download_result, max_time_days = 60) {
  
  lightcurves <- download_result$lightcurves
  metadata <- download_result$metadata
  n <- download_result$n
  
  if (n < 5) stop("Too few successful downloads")
  
  # Find common time range
  t_min <- max(sapply(lightcurves, function(lc) min(lc$time)))
  t_max <- min(sapply(lightcurves, function(lc) max(lc$time)))
  
  if (!is.null(max_time_days)) {
    t_max <- min(t_max, t_min + max_time_days)
  }
  
  # Create time grid (30-min cadence)
  dt <- 0.0208
  t_grid <- seq(t_min, t_max, by = dt)
  n_time <- length(t_grid)
  
  # Build matrix
  Y <- matrix(NA_real_, nrow = n, ncol = n_time)
  
  for (i in seq_len(n)) {
    lc <- lightcurves[[i]]
    flux_interp <- approx(lc$time, lc$flux, xout = t_grid, rule = 1)$y
    
    if (sum(!is.na(flux_interp)) > 50) {
      Y[i, ] <- (flux_interp - mean(flux_interp, na.rm = TRUE)) / 
                 sd(flux_interp, na.rm = TRUE)
    }
  }
  
  list(
    Y = Y,
    t = t_grid,
    metadata = metadata,
    n_stars = n,
    n_time = n_time,
    cadence_min = dt * 24 * 60,
    source = "Kepler/MAST"
  )
}

# ==============================================================================
# UNIFIED DOWNLOAD FUNCTION
# ==============================================================================

#' Download real Kepler light curves
#'
#' @param n_stars Number of stars to download
#' @param teff_range Temperature range for Sun-like stars (K)
#' @param logg_range Surface gravity range (dex)
#' @param mag_range Kepler magnitude range
#' @param quarter Kepler quarter (1-17)
#' @param max_time_days Maximum time span to include
#' @param method "auto" (try lightkurve, fall back to mast), "lightkurve", or "mast"
#' @param verbose Print progress
#' @return List with Y matrix, t vector, metadata (same as simulate_kepler_lightcurves)
download_kepler_lightcurves <- function(n_stars = 100,
                                         teff_range = c(5500, 6000),
                                         logg_range = c(4.0, 4.6),
                                         mag_range = c(10, 13),
                                         quarter = 5,
                                         max_time_days = 60,
                                         method = "auto",
                                         verbose = TRUE) {
  
  if (verbose) {
    cat("\n")
    cat("═══════════════════════════════════════════════════════════════════\n")
    cat("  DOWNLOADING REAL KEPLER STELLAR LIGHT CURVES                     \n")
    cat("═══════════════════════════════════════════════════════════════════\n")
  }
  
  # Step 1: Query targets
  if (verbose) cat("\nStep 1: Querying Kepler stellar catalog...\n")
  
  targets <- query_kepler_targets(
    n_targets = n_stars * 5,
    teff_range = teff_range,
    logg_range = logg_range,
    mag_range = mag_range
  )
  
  if (verbose) {
    cat(sprintf("  Found %d candidate Sun-like stars\n", nrow(targets)))
  }
  
  # Step 2: Download light curves
  if (verbose) cat("\nStep 2: Downloading light curves...\n")
  
  download_result <- NULL
  
  if (method %in% c("auto", "lightkurve")) {
    download_result <- tryCatch({
      download_kepler_via_lightkurve(
        targets, n_stars = n_stars, quarter = quarter,
        max_time_days = max_time_days, verbose = verbose
      )
    }, error = function(e) {
      if (verbose) cat("  lightkurve method failed:", e$message, "\n")
      NULL
    })
  }
  
  if (is.null(download_result) && method %in% c("auto", "mast")) {
    if (verbose && method == "auto") cat("  Falling back to direct MAST API...\n")
    download_result <- download_kepler_via_mast(
      targets, n_stars = min(n_stars, 75),  # MAST is slower
      max_time_days = max_time_days, verbose = verbose
    )
  }
  
  if (is.null(download_result) || download_result$n < 5) {
    stop("Failed to download sufficient light curves")
  }
  
  if (verbose) cat(sprintf("  Downloaded %d light curves\n", download_result$n))
  
  # Step 3: Create aligned matrix
  if (verbose) cat("\nStep 3: Creating aligned data matrix...\n")
  
  result <- create_kepler_matrix(download_result, max_time_days = max_time_days)
  
  # Summary
  if (verbose) {
    cat("\n")
    cat("═══════════════════════════════════════════════════════════════════\n")
    cat("  DOWNLOAD COMPLETE                                                \n")
    cat("═══════════════════════════════════════════════════════════════════\n")
    cat(sprintf("Stars (realizations): %d\n", result$n_stars))
    cat(sprintf("Time points: %d\n", result$n_time))
    cat(sprintf("Time span: %.1f days\n", max(result$t)))
    cat(sprintf("Cadence: %.1f minutes\n", result$cadence_min))
    cat(sprintf("Matrix dimensions: %d × %d\n", nrow(result$Y), ncol(result$Y)))
    cat(sprintf("Missing values: %.1f%%\n", 100 * mean(is.na(result$Y))))
    cat("\nStellar properties:\n")
    cat(sprintf("  Teff: %d - %d K\n", 
                min(result$metadata$teff), max(result$metadata$teff)))
    cat(sprintf("  Kepmag: %.1f - %.1f\n",
                min(result$metadata$kepmag), max(result$metadata$kepmag)))
    cat("═══════════════════════════════════════════════════════════════════\n")
  }
  
  result
}

# ==============================================================================
# EXECUTE: DOWNLOAD REAL DATA
# ==============================================================================

cat("Starting Kepler light curve download...\n")
## Starting Kepler light curve download...
cat("This may take 5-20 minutes depending on n_stars and method...\n\n")
## This may take 5-20 minutes depending on n_stars and method...
# Download real Kepler data
# This replaces simulate_kepler_lightcurves() with real observed data


n_stars <- 100
cache_file <- sprintf("kepler_lightcurves_%d.rds", n_stars)

if (file.exists(cache_file)) {
  message("Loading cached Kepler data: ", cache_file)
  kepler_real <- readRDS(cache_file)
} else {
  # Download real Kepler light curves from MAST (may take several minutes)
  kepler_real <- tryCatch({
      download_kepler_lightcurves(
        n_stars = n_stars,          # Number of stars (realizations)
        teff_range = c(5500, 6000), # Sun-like temperature
        logg_range = c(4.0, 4.6),   # Main sequence
        mag_range = c(10, 13),      # Bright enough for good data
        quarter = 5,                # Kepler Q5 (good quality)
        max_time_days = 60,         # ~2 months of data
        method = "auto",            # Try lightkurve, fall back to MAST
        verbose = TRUE
      )
    }, error = function(e) {
      cat("Kepler download failed:", e$message, "\n")
      NULL
  })

  if (!is.null(kepler_real)) {
    saveRDS(kepler_real, cache_file)
    message("Saved cached Kepler Lightcurves Data: ", cache_file)
  }
}

# Quick validation
cat("\n=== DATA READY FOR KPT ===\n")
## 
## === DATA READY FOR KPT ===
cat(sprintf("Use kepler_real$Y as input to create_kpt_matrix_kepler()\n"))
## Use kepler_real$Y as input to create_kpt_matrix_kepler()
cat(sprintf("Dimensions match simulate_kepler_lightcurves() output\n"))
## Dimensions match simulate_kepler_lightcurves() output
# Preview first few stars
cat("\nFirst 5 stars:\n")
## 
## First 5 stars:
print(head(kepler_real$metadata, 5))
## # A tibble: 5 × 10
##      kepid    ra   dec  teff  logg   feh kepmag radius  mass star_id
##      <int> <dbl> <dbl> <int> <dbl> <dbl>  <dbl>  <dbl> <dbl>   <dbl>
## 1 10677047  296.  47.9  5663  4.54 -0.14   10.1  0.849 0.916       1
## 2 11081642  291.  48.6  5993  4.25 -0.02   10.1  1.27  1.05        2
## 3  9760362  287.  46.6  5860  4.21 -0.2    10.1  1.25  0.929       3
## 4 10934241  298.  48.4  5912  4.49 -0.16   10.3  0.934 0.974       4
## 5 11200185  296.  48.8  5580  4.48 -0.6    10.3  0.814 0.724       5
# Basic statistics
cat("\nFlux statistics per star (first 5):\n")
## 
## Flux statistics per star (first 5):
for (i in 1:min(5, nrow(kepler_real$Y))) {
  flux <- kepler_real$Y[i, ]
  cat(sprintf("  Star %d (KIC %d): mean=%.3f, sd=%.3f, range=[%.2f, %.2f]\n",
              i, kepler_real$metadata$kepid[i],
              mean(flux, na.rm=TRUE), sd(flux, na.rm=TRUE),
              min(flux, na.rm=TRUE), max(flux, na.rm=TRUE)))
}
##   Star 1 (KIC 10677047): mean=0.000, sd=1.000, range=[-2.68, 2.44]
##   Star 2 (KIC 11081642): mean=-0.000, sd=1.000, range=[-2.78, 2.34]
##   Star 3 (KIC 9760362): mean=-0.000, sd=1.000, range=[-3.30, 1.82]
##   Star 4 (KIC 10934241): mean=0.000, sd=1.000, range=[-2.56, 1.85]
##   Star 5 (KIC 11200185): mean=-0.000, sd=1.000, range=[-2.96, 2.25]

The output of the code-chunk above may look like this ….


Starting Kepler light curve download...
This may take 5-20 minutes depending on n_stars and method...

═══════════════════════════════════════════════════════════════════
  DOWNLOADING REAL KEPLER STELLAR LIGHT CURVES                     
═══════════════════════════════════════════════════════════════════
...

C:\Users\IvoD\ANACON~1\Lib\site-packages\lightkurve\prf\__init__.py:7: UserWarning: Warning: the tpfmodel submodule is not available without oktopus installed, which requires a current version of autograd. See #1452 for details.
  warnings.warn(
Downloading via lightkurve...
  Target 1: KIC 10677047 (success: 0/100)
  Target 20: KIC 10331562 (success: 0/100)
Warning: 35% (1566/4538) of the cadences will be ignored due to the quality mask (quality_bitmask=1664431).
  Falling back to direct MAST API...
Downloading via direct MAST API (pure R)...
  Target 1: KIC 10677047 (success: 0/75)
  Target 10: KIC 10477502 (success: 9/75)
  Target 20: KIC 10331562 (success: 19/75)
  Target 30: KIC 10389154 (success: 29/75)
  Target 40: KIC 11127479 (success: 39/75)
  Target 50: KIC 9962623 (success: 49/75)
  Target 60: KIC 10514429 (success: 59/75)
  Target 70: KIC 10857462 (success: 69/75)
  Downloaded 75 light curves

Step 3: Creating aligned data matrix...

═══════════════════════════════════════════════════════════════════
  DOWNLOAD COMPLETE                                                
═══════════════════════════════════════════════════════════════════
Stars (realizations): 75
Time points: 2885
Time span: 60.0 days
Cadence: 30.0 minutes
Matrix dimensions: 75 × 2885
Missing values: 0.0%

Stellar properties:
  Teff: 5530 - 5999 K
  Kepmag: 10.1 - 11.6
═══════════════════════════════════════════════════════════════════

=== DATA READY FOR KPT ===
Use kepler_real$Y as input to create_kpt_matrix_kepler()
Dimensions match simulate_kepler_lightcurves() output

First 5 stars:
# A tibble: 5 x 10
     kepid    ra   dec  teff  logg   feh kepmag radius  mass star_id
     <int> <dbl> <dbl> <int> <dbl> <dbl>  <dbl>  <dbl> <dbl>   <dbl>
1 10677047  296.  47.9  5663  4.54 -0.14   10.1  0.849 0.916       1
2 11081642  291.  48.6  5993  4.25 -0.02   10.1  1.27  1.05        2
3  9760362  287.  46.6  5860  4.21 -0.2    10.1  1.25  0.929       3
4 10934241  298.  48.4  5912  4.49 -0.16   10.3  0.934 0.974       4
5 11200185  296.  48.8  5580  4.48 -0.6    10.3  0.814 0.724       5

Flux statistics per star (first 5):
  Star 1 (KIC 10677047): mean=0.000, sd=1.000, range=[-2.68, 2.44]
  Star 2 (KIC 11081642): mean=-0.000, sd=1.000, range=[-2.78, 2.34]
  Star 3 (KIC 9760362): mean=-0.000, sd=1.000, range=[-3.30, 1.82]
  Star 4 (KIC 10934241): mean=0.000, sd=1.000, range=[-2.56, 1.85]
  Star 5 (KIC 11200185): mean=-0.000, sd=1.000, range=[-2.96, 2.25]

2.6 Step 4: Visualize Sample Light Curves

# Plot sample light curves
par(mfrow = c(3, 1), mar = c(4, 4, 2, 1))

kepler_data <- kepler_real # Real Kepler Data 
# kepler_data <- kepler_sim  # Simulated Kepler Data 

for (i in 1:3) {
  plot(kepler_data$t, kepler_data$Y[i, ], type = "l", col = "steelblue",
       xlab = "Time (days)", ylab = "Normalized Flux",
       main = sprintf("Star %d (feh  = %.1f d, mass = %.2f d)", 
                      i, kepler_data$metadata$feh[i],
                      kepler_data$metadata$mass[i]))
       # main = sprintf("Star %d (P_rot = %.1f d, τ_gran = %.2f d)", 
       #                i, kepler_data$metadata$rotation_period[i],
       #                kepler_data$metadata$granulation_tau[i]))
  grid()
}

par(mfrow = c(1, 1))

2.7 Step 5: Create KPT Matrix Structure

#' Create KPT matrix from light curves
#' 
#' Groups time points into windows for temporal analysis.
#' 
#' @param sim Output from simulate_kepler_lightcurves()
#' @param window_days Window size in days
#' @param step_days Step between windows
create_kpt_matrix_kepler <- function(sim, 
                                      window_days = 5,
                                      step_days = 2) {
  
  Y_full <- sim$Y
  t <- sim$t
  dt <- t[2] - t[1]
  
  n_stars <- nrow(Y_full)
  
  # Create windows
  window_pts <- round(window_days / dt)
  step_pts <- round(step_days / dt)
  
  # Window centers
  center_idx <- seq(window_pts/2 + 1, ncol(Y_full) - window_pts/2, by = step_pts)
  n_windows <- length(center_idx)
  
  cat(sprintf("Creating KPT matrix:\n"))
  cat(sprintf("  Window size: %.1f days (%d points)\n", window_days, window_pts))
  cat(sprintf("  Step size: %.1f days (%d points)\n", step_days, step_pts))
  cat(sprintf("  Number of windows: %d\n", n_windows))
  
  # For each window, compute summary statistic (variance or RMS)
  # This gives us I realizations × K time windows
  Y <- matrix(NA_real_, nrow = n_stars, ncol = n_windows)
  t_center <- numeric(n_windows)
  
  for (k in seq_along(center_idx)) {
    idx <- (center_idx[k] - window_pts/2 + 1):(center_idx[k] + window_pts/2)
    idx <- idx[idx > 0 & idx <= ncol(Y_full)]
    
    # Compute variance within window (measure of variability)
    Y[, k] <- apply(Y_full[, idx, drop = FALSE], 1, var)
    t_center[k] <- mean(t[idx])
  }
  
  # Log-transform variance (more Gaussian)
  Y_log <- log(Y)
  
  # Standardize
  Y_std <- scale(Y_log)
  
  list(
    Y = Y_std,
    Y_raw = Y,
    t_center = t_center,
    n_stars = n_stars,
    n_windows = n_windows,
    metadata = sim$metadata
  )
}

# Create KPT matrix
kpt_kepler <- create_kpt_matrix_kepler(kepler_data, window_days = 5, step_days = 2)
## Creating KPT matrix:
##   Window size: 5.0 days (240 points)
##   Step size: 2.0 days (96 points)
##   Number of windows: 28
cat(sprintf("\nKPT Matrix dimensions: %d stars × %d windows\n", 
            kpt_kepler$n_stars, kpt_kepler$n_windows))
## 
## KPT Matrix dimensions: 75 stars × 28 windows
cat(sprintf("Overall mean: %.4f\n", mean(kpt_kepler$Y)))
## Overall mean: -0.0000
cat(sprintf("Overall SD: %.4f\n", sd(kpt_kepler$Y)))
## Overall SD: 0.9935
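
Because scale() standardizes each column of the log-variance matrix, every window should come out with mean \(\approx 0\) and SD \(\approx 1\); a quick sanity check:

# Sanity check: per-window means and SDs of the standardized KPT matrix
summary(colMeans(kpt_kepler$Y))
summary(apply(kpt_kepler$Y, 2, sd))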

2.8 Step 6: Apply KPT-GEM and KPT-FFT

# #' Simplified KPT-GEM for stellar data
# kpt_gem_stellar <- function(Y, L_theta = 64, max_iter = 50, tol = 1e-5) {
#   
#   I <- nrow(Y)  # Stars
#   K <- ncol(Y)  # Windows
#   theta <- seq(0, 2*pi, length.out = L_theta + 1)[1:L_theta]
#   
#   # Initialize
#   mu_0 <- mean(Y)
#   phi <- rep(1, L_theta)
#   S <- matrix(mu_0, K, L_theta)
#   sigma2 <- var(as.vector(Y))
#   
#   history <- data.frame(iter = integer(), mu_0 = numeric(), ll = numeric())
#   
#   cat("Running KPT-GEM on stellar data...\n")
#   
#   for (iter in 1:max_iter) {
#     mu_0_old <- mu_0
#     
#     # E-step
#     W_sum <- matrix(0, K, L_theta)
#     YW_sum <- matrix(0, K, L_theta)
#     ll <- 0
#     
#     for (k in 1:K) {
#       yk <- Y[, k]
#       D <- outer(yk, S[k, ], "-")
#       log_lik <- -D^2 / (2 * sigma2)
#       log_w <- sweep(log_lik, 2, log(pmax(phi, 1e-10)), "+")
#       w_max <- apply(log_w, 1, max)
#       w <- exp(log_w - w_max)
#       w <- w / rowSums(w)
#       
#       W_sum[k, ] <- colSums(w)
#       YW_sum[k, ] <- as.numeric(crossprod(w, yk))
#       ll <- ll + sum(w_max + log(rowSums(exp(log_w - w_max))))
#     }
#     
#     # M-step
#     phi_new <- colSums(W_sum)
#     phi_new <- phi_new / sum(phi_new) * L_theta
#     
#     for (k in 1:K) {
#       w_l <- pmax(W_sum[k, ], 1e-10)
#       S[k, ] <- YW_sum[k, ] / w_l
#     }
#     
#     mu_0_new <- mean(S)
#     
#     # Update sigma
#     ss <- 0
#     for (k in 1:K) {
#       yk <- Y[, k]
#       D <- outer(yk, S[k, ], "-")
#       log_lik <- -D^2 / (2 * sigma2)
#       log_w <- sweep(log_lik, 2, log(pmax(phi_new, 1e-10)), "+")
#       w <- exp(log_w - apply(log_w, 1, max))
#       w <- w / rowSums(w)
#       ss <- ss + sum(w * D^2)
#     }
#     sigma2 <- max(ss / (I * K), 1e-6)
#     
#     history <- rbind(history, data.frame(iter = iter, mu_0 = mu_0_new, ll = ll))
#     
#     if (iter <= 5 || iter %% 10 == 0) {
#       cat(sprintf("Iter %d: LL=%.1f, μ₀=%.4f, σ=%.4f\n", iter, ll, mu_0_new, sqrt(sigma2)))
#     }
#     
#     if (abs(mu_0_new - mu_0_old) < tol) {
#       cat("Converged!\n")
#       break
#     }
#     
#     mu_0 <- mu_0_new
#     phi <- phi_new
#   }
#   
#   # Compute E[S]
#   E_S <- sapply(1:K, function(k) sum(S[k, ] * phi) / sum(phi))
#   
#   list(
#     theta = theta,
#     phi = phi,
#     S = S,
#     E_S = E_S,
#     mu_0 = mu_0,
#     sigma = sqrt(sigma2),
#     history = history,
#     phi_uniform_dist = max(abs(phi - 1))
#   )
# }
# 
# # Fit KPT-GEM
# fit_kepler <- kpt_gem_stellar(kpt_kepler$Y, L_theta = 64, max_iter = 100)
# 
# cat("\n=== KPT-GEM RESULTS ===\n")
# cat(sprintf("Estimated μ₀: %.4f\n", fit_kepler$mu_0))
# cat(sprintf("Noise σ: %.4f\n", fit_kepler$sigma))
# cat(sprintf("φ deviation from uniform: %.4f\n", fit_kepler$phi_uniform_dist))

First, implement KPT-GEM.

# ==============================================================================
# KPT-GEM FOR STELLAR DATA
# ==============================================================================
# 
# Updates:
# 1. Convergence checks all parameters (φ, S, σ), not just μ₀
# 2. Minimum iterations to ensure exploration
# 3. Better initialization with small random perturbations
# 4. Handles standardized data properly
#
# ==============================================================================

#' KPT-GEM Algorithm for Stellar Light Curves (Fixed)
#'
#' @param Y Data matrix (n_stars × n_windows), can be standardized
#' @param L_theta Number of theta grid points
#' @param n_harmonics Number of Fourier harmonics for S(θ)
#' @param smooth_kappa Smoothing parameter for φ (higher = less smooth)
#' @param min_iter Minimum iterations before checking convergence
#' @param max_iter Maximum iterations
#' @param tol Convergence tolerance
#' @param verbose Print progress
#' @return List with θ, φ, S, predictions, and diagnostics
kpt_gem_stellar <- function(Y, 
                             L_theta = 64,
                             n_harmonics = 2,
                             smooth_kappa = 10,
                             min_iter = 10,
                             max_iter = 100,
                             tol = 1e-5,
                             verbose = TRUE) {
  
  I <- nrow(Y)  # Number of stars (realizations)
  K <- ncol(Y)  # Number of time windows
  
  # Theta grid
  theta <- seq(0, 2*pi, length.out = L_theta + 1)[1:L_theta]
  dtheta <- 2 * pi / L_theta
  
  # Fourier basis
  cos_basis <- sapply(1:n_harmonics, function(j) cos(j * theta))
  sin_basis <- sapply(1:n_harmonics, function(j) sin(j * theta))
  
  # =========================================================================
  # INITIALIZATION (with small perturbations to break symmetry)
  # =========================================================================
  
  mu_0 <- mean(Y, na.rm = TRUE)
  sigma2 <- var(as.vector(Y), na.rm = TRUE)
  sigma2 <- max(sigma2, 0.01)
  
  # Initialize φ with small random perturbations from uniform
  set.seed(123)  # For reproducibility
  phi <- rep(1, L_theta) + rnorm(L_theta, 0, 0.1)
  phi <- pmax(phi, 0.1)
  phi <- phi / mean(phi)  # Normalize to mean-1
  
  # Initialize S with window means + small θ-dependent variation
  window_means <- colMeans(Y, na.rm = TRUE)
  window_sds <- apply(Y, 2, sd, na.rm = TRUE)
  window_sds[!is.finite(window_sds)] <- 1
  
  S <- matrix(0, K, L_theta)
  for (k in 1:K) {
    # Add small sinusoidal perturbation to break symmetry
    S[k, ] <- window_means[k] + 0.1 * window_sds[k] * sin(theta + runif(1, 0, 2*pi))
  }
  
  # =========================================================================
  # TRACKING
  # =========================================================================
  
  history <- data.frame(
    iter = integer(),
    ll = numeric(),
    mu_0 = numeric(),
    sigma = numeric(),
    phi_change = numeric(),
    S_change = numeric()
  )
  
  if (verbose) {
    cat("\n")
    cat("════════════════════════════════════════════════════════════════\n")
    cat("  KPT-GEM FOR STELLAR DATA                                      \n")
    cat("════════════════════════════════════════════════════════════════\n")
    cat(sprintf("Data: %d stars × %d windows\n", I, K))
    cat(sprintf("Theta grid: %d points, %d harmonics\n", L_theta, n_harmonics))
    cat(sprintf("Initial μ₀: %.4f, σ: %.4f\n", mu_0, sqrt(sigma2)))
    cat(sprintf("Min iterations: %d\n", min_iter))
    cat("\n")
  }
  
  # =========================================================================
  # EM ITERATIONS
  # =========================================================================
  
  for (iter in 1:max_iter) {
    
    phi_old <- phi
    S_old <- S
    sigma2_old <- sigma2
    
    # =====================================================================
    # E-STEP: Compute posterior weights w_{ik}(θ)
    # =====================================================================
    
    W_sum <- matrix(0, K, L_theta)   # Sum of weights per window
    YW_sum <- matrix(0, K, L_theta)  # Sum of y*weights per window
    ll_total <- 0
    
    for (k in 1:K) {
      yk <- Y[, k]
      valid <- is.finite(yk)
      if (sum(valid) < 2) next
      yk_obs <- yk[valid]
      n_obs <- length(yk_obs)
      
      # D[i, l] = y_i - S_k(θ_l)
      D <- outer(yk_obs, S[k, ], "-")
      
      # Log-likelihood: log N(y | S(θ), σ²)
      log_lik <- -D^2 / (2 * sigma2) - 0.5 * log(2 * pi * sigma2)
      
      # Log prior: log φ(θ)
      log_prior <- log(pmax(phi, 1e-12))
      
      # Log posterior (unnormalized)
      log_w <- sweep(log_lik, 2, log_prior, "+")
      
      # Normalize (log-sum-exp for stability)
      log_w_max <- apply(log_w, 1, max)
      w <- exp(log_w - log_w_max)
      w <- w / rowSums(w)
      
      # Accumulate sufficient statistics
      W_sum[k, ] <- colSums(w)
      YW_sum[k, ] <- as.numeric(crossprod(w, yk_obs))
      
      # Log-likelihood contribution
      ll_total <- ll_total + sum(log_w_max + log(rowSums(exp(log_w - log_w_max))))
    }
    
    # =====================================================================
    # M-STEP: Update parameters
    # =====================================================================
    
    # --- Update φ (pooled across windows) ---
    phi_raw <- colSums(W_sum) + 0.01  # Small regularization
    phi_new <- phi_raw / sum(phi_raw) * L_theta  # Mean-1 normalization
    
    # Circular smoothing via von Mises kernel
    if (smooth_kappa > 0) {
      kernel <- exp(smooth_kappa * cos(theta - theta[1]))
      kernel <- kernel / sum(kernel)
      phi_new <- Re(fft(fft(phi_new) * fft(kernel), inverse = TRUE)) / L_theta
      phi_new <- pmax(phi_new, 1e-8)
      phi_new <- phi_new / mean(phi_new)
    }
    
    # --- Update S for each window ---
    S_new <- matrix(0, K, L_theta)
    
    for (k in 1:K) {
      w_l <- pmax(W_sum[k, ], 1e-10)
      y_l <- YW_sum[k, ] / w_l  # Weighted mean at each θ
      
      # Window mean (φ-weighted)
      mu_k <- sum(y_l * phi_new) / sum(phi_new)
      y_centered <- y_l - mu_k
      
      # Fit Fourier harmonics to residuals
      a_coef <- b_coef <- numeric(n_harmonics)
      for (j in 1:n_harmonics) {
        a_coef[j] <- 2 * sum(y_centered * cos_basis[, j] * phi_new) / sum(phi_new)
        b_coef[j] <- 2 * sum(y_centered * sin_basis[, j] * phi_new) / sum(phi_new)
      }
      
      # Regularization (shrink toward flat)
      shrink <- 0.8
      a_coef <- a_coef * shrink
      b_coef <- b_coef * shrink
      
      # Reconstruct S
      S_new[k, ] <- mu_k
      for (j in 1:n_harmonics) {
        S_new[k, ] <- S_new[k, ] + a_coef[j] * cos_basis[, j] + b_coef[j] * sin_basis[, j]
      }
    }
    
    # --- Update μ₀ ---
    mu_0_new <- mean(S_new)
    
    # --- Update σ² ---
    ss_total <- 0
    n_total <- 0
    
    for (k in 1:K) {
      yk <- Y[, k]
      valid <- is.finite(yk)
      if (sum(valid) < 2) next
      yk_obs <- yk[valid]
      
      D <- outer(yk_obs, S_new[k, ], "-")
      log_lik <- -D^2 / (2 * sigma2)
      log_w <- sweep(log_lik, 2, log(pmax(phi_new, 1e-12)), "+")
      log_w_max <- apply(log_w, 1, max)
      w <- exp(log_w - log_w_max)
      w <- w / rowSums(w)
      
      ss_total <- ss_total + sum(w * D^2)
      n_total <- n_total + length(yk_obs)
    }
    
    sigma2_new <- max(ss_total / max(n_total, 1), 0.001)
    
    # =====================================================================
    # CONVERGENCE CHECK (on ALL parameters)
    # =====================================================================
    
    phi_change <- max(abs(phi_new - phi_old))
    S_change <- max(abs(S_new - S_old))
    sigma_change <- abs(sqrt(sigma2_new) - sqrt(sigma2_old))
    
    max_change <- max(phi_change, S_change, sigma_change)
    
    history <- rbind(history, data.frame(
      iter = iter,
      ll = ll_total,
      mu_0 = mu_0_new,
      sigma = sqrt(sigma2_new),
      phi_change = phi_change,
      S_change = S_change
    ))
    
    if (verbose && (iter <= 5 || iter %% 10 == 0 || 
                    (iter >= min_iter && max_change < tol))) {
      cat(sprintf("Iter %3d: LL=%9.1f  μ₀=%8.4f  σ=%6.4f  Δφ=%.2e  ΔS=%.2e\n",
                  iter, ll_total, mu_0_new, sqrt(sigma2_new), phi_change, S_change))
    }
    
    # Update parameters
    phi <- phi_new
    S <- S_new
    mu_0 <- mu_0_new
    sigma2 <- sigma2_new
    
    # Check convergence (only after min_iter)
    if (iter >= min_iter && max_change < tol) {
      if (verbose) cat("Converged!\n")
      break
    }
  }
  
  # =========================================================================
  # COMPUTE FINAL QUANTITIES
  # =========================================================================
  
  # E[S(θ)] for each window
  E_S <- sapply(1:K, function(k) sum(S[k, ] * phi) / sum(phi))
  
  # Overall KPT estimate
  kpt_estimate <- mean(E_S)
  
  # Diagnostics
  phi_entropy <- -sum((phi / L_theta) * log(phi / L_theta + 1e-12))
  phi_uniform_dist <- max(abs(phi - 1))
  S_variation <- mean(apply(S, 1, sd))
  
  if (verbose) {
    cat("\n")
    cat("═══════════════════════════════════════════════════════════════\n")
    cat("  KPT-GEM RESULTS                                               \n")
    cat("═══════════════════════════════════════════════════════════════\n")
    cat(sprintf("Estimated μ₀: %.4f\n", mu_0))
    cat(sprintf("KPT estimate (mean E[S]): %.4f\n", kpt_estimate))
    cat(sprintf("Noise σ: %.4f\n", sqrt(sigma2)))
    cat(sprintf("φ entropy: %.4f (max = %.4f for uniform)\n", 
                phi_entropy, log(L_theta)))
    cat(sprintf("φ deviation from uniform: %.4f\n", phi_uniform_dist))
    cat(sprintf("S average variation: %.4f\n", S_variation))
    cat(sprintf("Iterations: %d\n", iter))
    
    if (phi_uniform_dist < 0.05 && S_variation < 0.1) {
      cat("\nNote: Minimal phase structure detected.\n")
      cat("      Data appears to have no significant θ-dependence.\n")
    } else if (phi_uniform_dist > 0.1) {
      cat("\nPhase structure detected! φ(θ) is non-uniform.\n")
    }
    cat("═══════════════════════════════════════════════════════════════\n")
  }
  
  list(
    # Grid
    theta = theta,
    L_theta = L_theta,
    
    # Estimated parameters
    mu_0 = mu_0,
    phi = phi,
    S = S,
    sigma = sqrt(sigma2),
    
    # Derived quantities
    E_S = E_S,
    kpt_estimate = kpt_estimate,
    
    # Diagnostics
    history = history,
    phi_entropy = phi_entropy,
    phi_uniform_dist = phi_uniform_dist,
    S_variation = S_variation,
    
    # Config
    n_harmonics = n_harmonics,
    converged = (iter < max_iter)
  )
}

# Fit KPT-GEM
fit_gem_kepler <- kpt_gem_stellar(kpt_kepler$Y, L_theta = 64, max_iter = 100)
## 
## ════════════════════════════════════════════════════════════════
##   KPT-GEM FOR STELLAR DATA                                      
## ════════════════════════════════════════════════════════════════
## Data: 75 stars × 28 windows
## Theta grid: 64 points, 2 harmonics
## Initial μ₀: -0.0000, σ: 0.9935
## Min iterations: 10
## 
## Iter   1: LL=   5768.0  μ₀= -0.0000  σ=0.9917  Δφ=2.14e-01  ΔS=2.67e-02
## Iter   2: LL=   5768.0  μ₀= -0.0000  σ=0.9923  Δφ=1.65e-02  ΔS=1.81e-02
## Iter   3: LL=   5768.0  μ₀= -0.0000  σ=0.9926  Δφ=9.97e-03  ΔS=1.30e-02
## Iter   4: LL=   5768.0  μ₀= -0.0000  σ=0.9929  Δφ=6.38e-03  ΔS=1.04e-02
## Iter   5: LL=   5768.0  μ₀= -0.0000  σ=0.9930  Δφ=4.33e-03  ΔS=8.65e-03
## Iter  10: LL=   5768.0  μ₀= -0.0000  σ=0.9933  Δφ=1.14e-03  ΔS=3.29e-03
## Iter  20: LL=   5768.0  μ₀= -0.0000  σ=0.9933  Δφ=3.08e-04  ΔS=3.91e-04
## Iter  30: LL=   5768.0  μ₀= -0.0000  σ=0.9933  Δφ=1.51e-04  ΔS=4.33e-05
## Iter  40: LL=   5768.0  μ₀= -0.0000  σ=0.9933  Δφ=8.52e-05  ΔS=4.71e-06
## Iter  50: LL=   5768.0  μ₀= -0.0000  σ=0.9933  Δφ=4.97e-05  ΔS=5.09e-07
## Iter  60: LL=   5768.0  μ₀= -0.0000  σ=0.9933  Δφ=2.92e-05  ΔS=5.49e-08
## Iter  70: LL=   5768.0  μ₀= -0.0000  σ=0.9933  Δφ=1.72e-05  ΔS=5.91e-09
## Iter  80: LL=   5768.0  μ₀= -0.0000  σ=0.9933  Δφ=1.01e-05  ΔS=6.36e-10
## Iter  81: LL=   5768.0  μ₀= -0.0000  σ=0.9933  Δφ=9.57e-06  ΔS=5.09e-10
## Converged!
## 
## ═══════════════════════════════════════════════════════════════
##   KPT-GEM RESULTS                                               
## ═══════════════════════════════════════════════════════════════
## Estimated μ₀: -0.0000
## KPT estimate (mean E[S]): 0.0000
## Noise σ: 0.9933
## φ entropy: 4.1589 (max = 4.1589 for uniform)
## φ deviation from uniform: 0.0002
## S average variation: 0.0000
## Iterations: 81
## 
## Note: Minimal phase structure detected.
##       Data appears to have no significant θ-dependence.
## ═══════════════════════════════════════════════════════════════
cat("\n=== KPT-GEM RESULTS ===\n")
## 
## === KPT-GEM RESULTS ===
cat(sprintf("Estimated μ₀: %.4f\n", fit_gem_kepler$mu_0))
## Estimated μ₀: -0.0000
cat(sprintf("Noise σ: %.4f\n", fit_gem_kepler$sigma))
## Noise σ: 0.9933
cat(sprintf("φ deviation from uniform: %.4f\n", fit_gem_kepler$phi_uniform_dist))
## φ deviation from uniform: 0.0002
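
The “KPT vs Gaussian RMSE” criterion of Section 1.4 is not reported by kpt_gem_stellar() itself; a minimal in-sample sketch using the fitted object above, where the Gaussian baseline predicts the grand mean and KPT predicts the phase-averaged surface \(E[\mathbb{S}]\) for each window:

# Sketch: in-sample RMSE of the KPT prediction vs the simple Gaussian (grand-mean) model
Y_obs      <- kpt_kepler$Y
pred_kpt   <- matrix(fit_gem_kepler$E_S, nrow = nrow(Y_obs), ncol = ncol(Y_obs), byrow = TRUE)
pred_gauss <- matrix(mean(Y_obs, na.rm = TRUE), nrow = nrow(Y_obs), ncol = ncol(Y_obs))
rmse <- function(a, b) sqrt(mean((a - b)^2, na.rm = TRUE))
cat(sprintf("RMSE  KPT: %.4f   Gaussian: %.4f\n",
            rmse(Y_obs, pred_kpt), rmse(Y_obs, pred_gauss)))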

Next, implement KPT-FFT.

# ==============================================================================
# KPT-FFT FOR STELLAR DATA
# ==============================================================================
#
# Spectral Wiener-Sobolev algorithm for kime-phase tomography.
# Uses FFT for efficient computation with Sobolev regularization.
#
# ==============================================================================

#' KPT-FFT Algorithm for Stellar Light Curves
#'
#' @param Y Data matrix (n_stars × n_windows)
#' @param J Number of Fourier harmonics (modes from -J to J)
#' @param L_theta Number of theta grid points
#' @param lambda_phi Regularization strength for φ
#' @param lambda_S Regularization strength for S
#' @param p_sobolev Sobolev smoothness order
#' @param min_iter Minimum iterations
#' @param max_iter Maximum iterations
#' @param tol Convergence tolerance
#' @param verbose Print progress
#' @return List with θ, φ, S, predictions, and diagnostics
kpt_fft_stellar <- function(Y,
                             J = 4,
                             L_theta = 64,
                             lambda_phi = 0.1,
                             lambda_S = 0.3,
                             p_sobolev = 1,
                             min_iter = 10,
                             max_iter = 100,
                             tol = 1e-5,
                             verbose = TRUE) {
  
  I <- nrow(Y)  # Stars
  K <- ncol(Y)  # Windows
  
  # Theta grid
  theta <- seq(0, 2*pi, length.out = L_theta + 1)[1:L_theta]
  
  # Harmonic indices: n ∈ {-J, ..., J}
  nvec <- seq(-J, J)
  Nn <- length(nvec)
  
  # DFT frequencies for Sobolev penalty
  omega <- 2 * pi * (0:(Nn - 1)) / Nn
  
  # Sobolev penalty weights: Λ_n = (2 sin(ω/2))^{2p}
  Lambda <- (2 * sin(0.5 * omega))^(2 * p_sobolev)
  Lambda[1] <- Lambda[1] + 0.01  # Regularize DC component
  
  # =========================================================================
  # INITIALIZATION
  # =========================================================================
  
  mu_0 <- mean(Y, na.rm = TRUE)
  sigma2 <- var(as.vector(Y), na.rm = TRUE)
  sigma2 <- max(sigma2, 0.01)
  
  # φ per window (will be pooled later)
  phi <- matrix(1.0, K, L_theta)
  
  # Add small perturbations
  set.seed(456)
  for (k in 1:K) {
    phi[k, ] <- 1 + rnorm(L_theta, 0, 0.05)
    phi[k, ] <- pmax(phi[k, ], 0.1)
    phi[k, ] <- phi[k, ] / mean(phi[k, ])
  }
  
  # S initialized near window means
  S <- matrix(0, K, L_theta)
  window_means <- colMeans(Y, na.rm = TRUE)
  window_sds <- apply(Y, 2, sd, na.rm = TRUE)
  window_sds[!is.finite(window_sds)] <- 1
  
  for (k in 1:K) {
    S[k, ] <- window_means[k] + 0.1 * window_sds[k] * cos(theta + runif(1, 0, 2*pi))
  }
  
  # Basis matrices for DFT
  Eneg <- exp(-1i * outer(theta, nvec))  # e^{-inθ}: L_theta × Nn
  Epos <- exp(1i * outer(nvec, theta))    # e^{inθ}: Nn × L_theta
  
  # Initialize Fourier coefficients of S
  f_n <- matrix(0+0i, K, Nn)
  for (k in 1:K) {
    f_n[k, ] <- as.vector(crossprod(S[k, ], Eneg)) / L_theta
  }
  
  # =========================================================================
  # FFT HELPER FUNCTIONS
  # =========================================================================
  
  # Map coefficient vector (indexed by nvec) to DFT order
  coeffs_to_dft <- function(x_n) {
    x_m <- complex(Nn)
    for (i in seq_along(nvec)) {
      n <- nvec[i]
      m <- ((n %% Nn) + Nn) %% Nn
      x_m[m + 1] <- x_n[i]
    }
    x_m
  }
  
  # Map DFT order back to coefficient vector
  dft_to_coeffs <- function(x_m) {
    x_n <- complex(Nn)
    for (i in seq_along(nvec)) {
      n <- nvec[i]
      m <- ((n %% Nn) + Nn) %% Nn
      x_n[i] <- x_m[m + 1]
    }
    x_n
  }
  
  # DFT via FFT
  Xomega <- function(x_n) fft(coeffs_to_dft(x_n))
  coeffs <- function(X_w) dft_to_coeffs(fft(X_w, inverse = TRUE) / Nn)
  
  # =========================================================================
  # TRACKING
  # =========================================================================
  
  history <- data.frame(
    iter = integer(),
    mu_0 = numeric(),
    sigma = numeric(),
    delta_phi = numeric(),
    delta_f = numeric()
  )
  
  if (verbose) {
    cat("\n")
    cat("════════════════════════════════════════════════════════════════\n")
    cat("  KPT-FFT FOR STELLAR DATA (Spectral Wiener-Sobolev)            \n")
    cat("════════════════════════════════════════════════════════════════\n")
    cat(sprintf("Data: %d stars × %d windows\n", I, K))
    cat(sprintf("Fourier modes: J = %d (indices -%d to %d)\n", J, J, J))
    cat(sprintf("Regularization: λ_φ = %.2f, λ_S = %.2f\n", lambda_phi, lambda_S))
    cat(sprintf("Initial μ₀: %.4f, σ: %.4f\n", mu_0, sqrt(sigma2)))
    cat("\n")
  }
  
  # =========================================================================
  # ALTERNATING UPDATES
  # =========================================================================
  
  for (iter in 1:max_iter) {
    
    phi_old <- phi
    f_old <- f_n
    
    # =====================================================================
    # E-STEP: Compute empirical Fourier moments
    # =====================================================================
    
    m_hat <- matrix(0+0i, K, Nn)
    
    for (k in 1:K) {
      yk <- Y[, k]
      valid <- is.finite(yk)
      if (sum(valid) < 2) next
      yk_obs <- yk[valid]
      
      # Posterior weights
      prior_k <- pmax(phi[k, ], 1e-10)
      D <- outer(yk_obs, S[k, ], "-")
      log_lik <- -D^2 / (2 * sigma2)
      log_w <- sweep(log_lik, 2, log(prior_k), "+")
      log_w_max <- apply(log_w, 1, max)
      w <- exp(log_w - log_w_max)
      w <- w / rowSums(w)
      
      # Empirical moment: m_hat[k, n] = E[y · e^{-inθ}]
      xi <- w %*% Eneg  # I × Nn
      m_hat[k, ] <- colMeans(xi * yk_obs)
    }
    
    # =====================================================================
    # Φ-STEP: Update phase distribution via Wiener filter
    # =====================================================================
    
    phi_hat_n <- matrix(0+0i, K, Nn)
    
    for (k in 1:K) {
      F_w <- Xomega(f_n[k, ])
      M_w <- Xomega(m_hat[k, ])
      
      # Wiener filter: Φ = conj(F) · M / (|F|² + λ·Λ)
      denom <- Mod(F_w)^2 + lambda_phi * Lambda + 1e-8
      Phi_w <- Conj(F_w) * M_w / denom
      
      phi_hat_n[k, ] <- coeffs(Phi_w)
    }
    
    # Synthesize φ(θ) from Fourier coefficients
    phi_new <- matrix(0, K, L_theta)
    for (k in 1:K) {
      phi_new[k, ] <- Re(drop(phi_hat_n[k, ] %*% Epos))
    }
    
    # Enforce positivity and mean-1 normalization
    phi_new <- pmax(phi_new, 1e-8)
    for (k in 1:K) {
      phi_new[k, ] <- phi_new[k, ] / mean(phi_new[k, ])
    }
    
    # =====================================================================
    # F-STEP: Update surface coefficients via Wiener filter
    # =====================================================================
    
    f_new <- matrix(0+0i, K, Nn)
    
    for (k in 1:K) {
      # Get Fourier coefficients of φ_k
      phi_coeffs <- as.vector(crossprod(phi_new[k, ], Eneg)) / L_theta
      Phi_w <- Xomega(phi_coeffs)
      M_w <- Xomega(m_hat[k, ])
      
      # Wiener filter: F = conj(Φ) · M / (|Φ|² + λ·Λ)
      denom <- Mod(Phi_w)^2 + lambda_S * Lambda + 1e-8
      F_w <- Conj(Phi_w) * M_w / denom
      
      f_new[k, ] <- coeffs(F_w)
    }
    
    # Enforce Hermitian symmetry (ensures real-valued S)
    for (k in 1:K) {
      for (i in seq_along(nvec)) {
        n <- nvec[i]
        if (n < 0) {
          j <- which(nvec == -n)
          if (length(j) == 1) {
            f_new[k, i] <- Conj(f_new[k, j])
          }
        }
      }
      # DC component must be real
      i0 <- which(nvec == 0)
      f_new[k, i0] <- Re(f_new[k, i0])
    }
    
    # Synthesize S(θ) from Fourier coefficients
    S_new <- matrix(0, K, L_theta)
    for (k in 1:K) {
      S_new[k, ] <- Re(drop(f_new[k, ] %*% Epos))
    }
    
    # =====================================================================
    # UPDATE GLOBAL PARAMETERS
    # =====================================================================
    
    # Update μ₀ from DC components
    i0 <- which(nvec == 0)
    mu_0_new <- mean(Re(f_new[, i0]))
    
    # Update σ²
    ss_total <- 0
    n_total <- 0
    
    for (k in 1:K) {
      yk <- Y[, k]
      valid <- is.finite(yk)
      if (sum(valid) < 2) next
      yk_obs <- yk[valid]
      
      D <- outer(yk_obs, S_new[k, ], "-")
      log_lik <- -D^2 / (2 * sigma2)
      log_w <- sweep(log_lik, 2, log(pmax(phi_new[k, ], 1e-10)), "+")
      log_w_max <- apply(log_w, 1, max)
      w <- exp(log_w - log_w_max)
      w <- w / rowSums(w)
      
      ss_total <- ss_total + sum(w * D^2)
      n_total <- n_total + length(yk_obs)
    }
    
    sigma2_new <- max(ss_total / max(n_total, 1), 0.001)
    
    # =====================================================================
    # CONVERGENCE CHECK
    # =====================================================================
    
    delta_phi <- sqrt(mean((phi_new - phi_old)^2))
    delta_f <- sqrt(mean(Mod(f_new - f_old)^2))
    max_change <- max(delta_phi, delta_f)
    
    history <- rbind(history, data.frame(
      iter = iter,
      mu_0 = mu_0_new,
      sigma = sqrt(sigma2_new),
      delta_phi = delta_phi,
      delta_f = delta_f
    ))
    
    if (verbose && (iter <= 5 || iter %% 10 == 0 || 
                    (iter >= min_iter && max_change < tol))) {
      cat(sprintf("Iter %3d: μ₀=%8.4f  σ=%6.4f  Δφ=%.2e  Δf=%.2e\n",
                  iter, mu_0_new, sqrt(sigma2_new), delta_phi, delta_f))
    }
    
    # Update parameters
    mu_0 <- mu_0_new
    phi <- phi_new
    f_n <- f_new
    S <- S_new
    sigma2 <- sigma2_new
    
    # Check convergence (only after min_iter)
    if (iter >= min_iter && max_change < tol) {
      if (verbose) cat("Converged!\n")
      break
    }
  }
  
  # =========================================================================
  # COMPUTE FINAL QUANTITIES
  # =========================================================================
  
  # Pool φ across windows (average)
  phi_pooled <- colMeans(phi)
  phi_pooled <- phi_pooled / mean(phi_pooled)
  
  # E[S(θ)] for each window
  E_S <- sapply(1:K, function(k) sum(S[k, ] * phi_pooled) / sum(phi_pooled))
  
  # Overall estimate
  kpt_estimate <- mean(E_S)
  
  # Diagnostics
  phi_uniform_dist <- max(abs(phi_pooled - 1))
  S_variation <- mean(apply(S, 1, sd))
  
  if (verbose) {
    cat("\n")
    cat("═══════════════════════════════════════════════════════════════\n")
    cat("  KPT-FFT RESULTS                                               \n")
    cat("═══════════════════════════════════════════════════════════════\n")
    cat(sprintf("Estimated μ₀: %.4f\n", mu_0))
    cat(sprintf("KPT estimate (mean E[S]): %.4f\n", kpt_estimate))
    cat(sprintf("Noise σ: %.4f\n", sqrt(sigma2)))
    cat(sprintf("φ (pooled) deviation from uniform: %.4f\n", phi_uniform_dist))
    cat(sprintf("S average variation: %.4f\n", S_variation))
    cat(sprintf("Iterations: %d\n", iter))
    
    if (phi_uniform_dist < 0.05 && S_variation < 0.1) {
      cat("\nNote: Minimal phase structure detected.\n")
    } else if (phi_uniform_dist > 0.1) {
      cat("\nPhase structure detected! φ(θ) is non-uniform.\n")
    }
    cat("═══════════════════════════════════════════════════════════════\n")
  }
  
  list(
    # Grid
    theta = theta,
    L_theta = L_theta,
    J = J,
    nvec = nvec,
    
    # Estimated parameters
    mu_0 = mu_0,
    phi = phi,
    phi_pooled = phi_pooled,
    S = S,
    f_n = f_n,
    sigma = sqrt(sigma2),
    
    # Derived quantities
    E_S = E_S,
    kpt_estimate = kpt_estimate,
    
    # Diagnostics
    history = history,
    phi_uniform_dist = phi_uniform_dist,
    S_variation = S_variation,
    
    # Config
    converged = (iter < max_iter)
  )
}
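Since coeffs_to_dft() and dft_to_coeffs() rely on wrapping the signed harmonic indices into R's FFT bin ordering, a small illustrative check of that mapping may be useful (not part of the fit; J, Nn, and nvec simply mirror the function's internals).

J <- 4; Nn <- 2 * J + 1            # 9 harmonics for the default J = 4
nvec <- seq(-J, J)                 # signed harmonic indices -4, ..., 4
m <- ((nvec %% Nn) + Nn) %% Nn     # wrap to DFT order 0, ..., Nn - 1
rbind(n = nvec, dft_bin = m + 1)   # bin 1 holds n = 0; bins 6-9 hold n = -4, ..., -1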

# Fit KPT-FFT
fit_fft_kepler <- kpt_fft_stellar(kpt_kepler$Y, L_theta = 64, max_iter = 100)
## 
## ════════════════════════════════════════════════════════════════
##   KPT-FFT FOR STELLAR DATA (Spectral Wiener-Sobolev)            
## ════════════════════════════════════════════════════════════════
## Data: 75 stars × 28 windows
## Fourier modes: J = 4 (indices -4 to 4)
## Regularization: λ_φ = 0.10, λ_S = 0.30
## Initial μ₀: -0.0000, σ: 0.9935
## 
## Iter   1: μ₀=  0.0010  σ=0.9931  Δφ=1.58e+00  Δf=1.51e-02
## Iter   2: μ₀=  0.0004  σ=0.9933  Δφ=1.28e+00  Δf=8.84e-03
## Iter   3: μ₀=  0.0004  σ=0.9932  Δφ=2.13e+00  Δf=2.64e-03
## Iter   4: μ₀=  0.0000  σ=0.9933  Δφ=2.35e+00  Δf=1.85e-03
## Iter   5: μ₀=  0.0001  σ=0.9933  Δφ=1.96e+00  Δf=1.49e-03
## Iter  10: μ₀=  0.0000  σ=0.9933  Δφ=1.24e+00  Δf=3.99e-04
## Iter  20: μ₀= -0.0000  σ=0.9933  Δφ=3.43e-01  Δf=1.42e-04
## Iter  30: μ₀= -0.0000  σ=0.9933  Δφ=3.18e-01  Δf=5.32e-05
## Iter  40: μ₀= -0.0000  σ=0.9933  Δφ=3.78e-01  Δf=2.22e-05
## Iter  50: μ₀= -0.0000  σ=0.9933  Δφ=3.75e-01  Δf=9.20e-06
## Iter  60: μ₀= -0.0000  σ=0.9933  Δφ=4.22e-01  Δf=4.14e-06
## Iter  70: μ₀= -0.0000  σ=0.9933  Δφ=3.58e-01  Δf=2.13e-06
## Iter  80: μ₀= -0.0000  σ=0.9933  Δφ=2.75e-01  Δf=1.33e-06
## Iter  90: μ₀= -0.0000  σ=0.9933  Δφ=1.21e-01  Δf=5.80e-07
## Iter  97: μ₀= -0.0000  σ=0.9933  Δφ=0.00e+00  Δf=7.83e-08
## Converged!
## 
## ═══════════════════════════════════════════════════════════════
##   KPT-FFT RESULTS                                               
## ═══════════════════════════════════════════════════════════════
## Estimated μ₀: -0.0000
## KPT estimate (mean E[S]): -0.0000
## Noise σ: 0.9933
## φ (pooled) deviation from uniform: 0.0000
## S average variation: 0.0000
## Iterations: 97
## 
## Note: Minimal phase structure detected.
## ═══════════════════════════════════════════════════════════════
cat("\n=== KPT-GEM RESULTS ===\n")
## 
## === KPT-GEM RESULTS ===
cat(sprintf("Estimated μ₀: %.4f\n", fit_fft_kepler$mu_0))
## Estimated μ₀: -0.0000
cat(sprintf("Noise σ: %.4f\n", fit_fft_kepler$sigma))
## Noise σ: 0.9933
cat(sprintf("φ deviation from uniform: %.4f\n", fit_fft_kepler$phi_uniform_dist))
## φ deviation from uniform: 0.0000

Finally, let’s test the predictions of both KPT algorithms (GEM and FFT) on the Kepler data.

# ==============================================================================
# PREDICTION FUNCTIONS FOR STELLAR KPT
# ==============================================================================

#' Generate predictions from KPT-GEM fit
#'
#' @param fit Output from kpt_gem_stellar()
#' @param n_new Number of new windows to predict
#' @param n_draw Number of Monte Carlo draws
kpt_gem_predict <- function(fit, n_new, n_draw = 5000) {
  
  phi <- fit$phi
  S_avg <- colMeans(fit$S)
  L <- fit$L_theta
  
  # Convert φ to probability
  p_theta <- pmax(phi, 1e-10)
  p_theta <- p_theta / sum(p_theta)
  
  draws <- matrix(NA, n_draw, n_new)
  
  for (k in 1:n_new) {
    theta_idx <- sample(L, n_draw, replace = TRUE, prob = p_theta)
    draws[, k] <- S_avg[theta_idx] + rnorm(n_draw, 0, fit$sigma)
  }
  
  list(
    point = rep(fit$kpt_estimate, n_new),
    mean = colMeans(draws),
    sd = apply(draws, 2, sd),
    draws = draws,
    pi_low = apply(draws, 2, quantile, 0.025),
    pi_high = apply(draws, 2, quantile, 0.975)
  )
}

#' Generate predictions from KPT-FFT fit
#'
#' @param fit Output from kpt_fft_stellar()
#' @param n_new Number of new windows to predict
#' @param n_draw Number of Monte Carlo draws
kpt_fft_predict <- function(fit, n_new, n_draw = 5000) {
  
  phi <- fit$phi_pooled
  S_avg <- colMeans(fit$S)
  L <- fit$L_theta
  
  p_theta <- pmax(phi, 1e-10)
  p_theta <- p_theta / sum(p_theta)
  
  draws <- matrix(NA, n_draw, n_new)
  
  for (k in 1:n_new) {
    theta_idx <- sample(L, n_draw, replace = TRUE, prob = p_theta)
    draws[, k] <- S_avg[theta_idx] + rnorm(n_draw, 0, fit$sigma)
  }
  
  list(
    point = rep(fit$kpt_estimate, n_new),
    mean = colMeans(draws),
    sd = apply(draws, 2, sd),
    draws = draws,
    pi_low = apply(draws, 2, quantile, 0.025),
    pi_high = apply(draws, 2, quantile, 0.975)
  )
}

# KPT-GEM & KPT-FFT Predictions
#' @param fit Output from kpt_gem_stellar()
#' @param n_new Number of new windows to predict
#' @param n_draw Number of Monte Carlo draws
# pred_kpt_gem <- kpt_gem_predict(fit=fit_gem_kepler, n_new=10, n_draw = 5000)
# pred_kpt_fft <- kpt_fft_predict(fit=fit_fft_kepler, n_new=10, n_draw = 5000)

# PLOT KPT-GEM and KPT-FFT Predictions
# ==============================================================================
# VISUALIZE KPT PREDICTIONS
# ==============================================================================

# Define test set (last n_test windows)
n_test <- 10  # number of trailing windows held out for evaluation
n_train <- kpt_kepler$n_windows - n_test

# Get observed test values
Y_test <- kpt_kepler$Y[, (n_train + 1):kpt_kepler$n_windows]
test_observed <- colMeans(Y_test, na.rm = TRUE)
test_sd <- apply(Y_test, 2, sd, na.rm = TRUE)
t_test <- kpt_kepler$t_center[(n_train + 1):kpt_kepler$n_windows]

# Generate predictions
pred_kpt_gem <- kpt_gem_predict(fit=fit_gem_kepler, n_new = n_test, n_draw = 5000)
pred_kpt_fft <- kpt_fft_predict(fit=fit_fft_kepler, n_new = n_test, n_draw = 5000)

# Plot
par(mfrow = c(1, 2), mar = c(4, 4, 3, 1))

###########################################################
#  KPT-GEM
# 1. Predictions vs Observed
plot(1:n_test, test_observed, pch = 19, cex = 1.5, col = "black",
     ylim = range(c(test_observed, pred_kpt_gem$pi_low, pred_kpt_gem$pi_high)),
     main = "KPT-GEM Predictions vs Observed",
     xlab = "Test Window", ylab = "Standardized Variability")

# Add prediction intervals
arrows(1:n_test, pred_kpt_gem$pi_low, 1:n_test, pred_kpt_gem$pi_high,
       length = 0.05, angle = 90, code = 3, col = "forestgreen", lwd = 2)
points(1:n_test, pred_kpt_gem$mean, pch = 17, col = "forestgreen", cex = 1.2)

# Reference line at 0 (mean of standardized data)
abline(h = 0, lty = 2, col = "gray50")

legend("topright", c("Observed", "KPT Prediction", "95% PI"),
       pch = c(19, 17, NA), lty = c(NA, NA, 1), col = c("black", "forestgreen", "forestgreen"),
       pt.cex = c(1.5, 1.2, NA), lwd = c(NA, NA, 2))

# 2. Coverage check
coverage <- mean(test_observed >= pred_kpt_gem$pi_low & 
                 test_observed <= pred_kpt_gem$pi_high)

# Residuals
residuals <- test_observed - pred_kpt_gem$mean
plot(1:n_test, residuals, pch = 19, col = "steelblue",
     main = sprintf("Residuals (Coverage: %.0f%%)", 100 * coverage),
     xlab = "Test Window", ylab = "Observed - Predicted")
abline(h = 0, lty = 2)
abline(h = c(-1.96, 1.96), lty = 3, col = "red")

par(mfrow = c(1, 1))

# Print summary
cat("\n=== PREDICTION SUMMARY ===\n")
## 
## === PREDICTION SUMMARY ===
cat(sprintf("RMSE: %.4f\n", sqrt(mean(residuals^2))))
## RMSE: 0.0114
cat(sprintf("Bias: %.4f\n", mean(residuals)))
## Bias: -0.0064
cat(sprintf("95%% PI Coverage: %.0f%%\n", 100 * coverage))
## 95% PI Coverage: 100%
###########################################################
#  KPT-FFT
# 1. Predictions vs Observed
plot(1:n_test, test_observed, pch = 19, cex = 1.5, col = "black",
     ylim = range(c(test_observed, pred_kpt_fft$pi_low, pred_kpt_fft$pi_high)),
     main = "KPT-FFT Predictions vs Observed",
     xlab = "Test Window", ylab = "Standardized Variability")

# Add prediction intervals
arrows(1:n_test, pred_kpt_fft$pi_low, 1:n_test, pred_kpt_fft$pi_high,
       length = 0.05, angle = 90, code = 3, col = "forestgreen", lwd = 2)
points(1:n_test, pred_kpt_fft$mean, pch = 17, col = "forestgreen", cex = 1.2)

# Reference line at 0 (mean of standardized data)
abline(h = 0, lty = 2, col = "gray50")

legend("topright", c("Observed", "KPT-FFT Prediction", "95% PI"),
       pch = c(19, 17, NA), lty = c(NA, NA, 1), col = c("black", "forestgreen", "forestgreen"),
       pt.cex = c(1.5, 1.2, NA), lwd = c(NA, NA, 2))

# 2. Coverage check
coverage <- mean(test_observed >= pred_kpt_fft$pi_low & 
                 test_observed <= pred_kpt_fft$pi_high)

# Residuals
residuals <- test_observed - pred_kpt_fft$mean
plot(1:n_test, residuals, pch = 19, col = "steelblue",
     main = sprintf("KPT-FFT: Residuals (Coverage: %.0f%%)", 100 * coverage),
     xlab = "Test Window", ylab = "(KPT-FFT) Observed - Predicted")
abline(h = 0, lty = 2)
abline(h = c(-1.96, 1.96), lty = 3, col = "red")

par(mfrow = c(1, 1))

# Print summary
cat("\n=== KPT-FFT PREDICTION SUMMARY ===\n")
## 
## === KPT-FFT PREDICTION SUMMARY ===
cat(sprintf("RMSE: %.4f\n", sqrt(mean(residuals^2))))
## RMSE: 0.0066
cat(sprintf("Bias: %.4f\n", mean(residuals)))
## Bias: -0.0029
cat(sprintf("95%% PI Coverage: %.0f%%\n", 100 * coverage))
## 95% PI Coverage: 100%
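
As a reference point for the KPT-vs-Gaussian comparison in the validation criteria, a simple Gaussian null-model baseline can be evaluated on the same test windows. A minimal sketch (assuming the train/test split defined above; mu_hat, sigma_hat, and the gauss_* names are ad-hoc):

# Gaussian null model: Y ~ N(mu, sigma^2), fit on the training windows only
Y_train   <- kpt_kepler$Y[, 1:n_train]
mu_hat    <- mean(Y_train, na.rm = TRUE)
sigma_hat <- sd(as.vector(Y_train), na.rm = TRUE)

gauss_rmse  <- sqrt(mean((test_observed - mu_hat)^2))
gauss_cover <- mean(test_observed >= mu_hat - 1.96 * sigma_hat &
                    test_observed <= mu_hat + 1.96 * sigma_hat)
cat(sprintf("Gaussian null RMSE: %.4f\n", gauss_rmse))
cat(sprintf("Gaussian null 95%% PI coverage: %.0f%%\n", 100 * gauss_cover))

Under \(H_o\), the KPT and Gaussian RMSE values should be comparable.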

2.9 Step 7: Visualize Results

par(mfrow = c(2, 2))

# 1. Convergence
plot(fit_gem_kepler$history$iter, fit_gem_kepler$history$mu_0, type = "l",
     col = "green", lwd = 4,
     main = "KPT-GEM/KPT-FFT: μ₀ Convergence", xlab = "Iteration", ylab = "μ₀")
lines(fit_fft_kepler$history$iter, fit_fft_kepler$history$mu_0, 
      col = "red", lwd = 2)
legend("bottomright", legend = c("KPT-GEM", "KPT-FFT"), 
       col = c("green", "red"), lwd = 3)
grid()

# 2. Learned φ(θ)
plot(fit_gem_kepler$theta, fit_gem_kepler$phi, type = "l", col = "green", lwd = 3,
     main = "Learned Phase Distribution φ(θ)",
     xlab = "θ", ylab = "φ(θ)", xaxt = "n")
lines(fit_fft_kepler$theta, fit_fft_kepler$phi_pooled, 
      col = "red", lwd = 2)
legend("bottomright", legend = c("KPT-GEM", "KPT-FFT"), 
       col = c("green", "red"), lwd = 3)
axis(1, at = c(0, pi/2, pi, 3*pi/2, 2*pi), 
     labels = c("0", "π/2", "π", "3π/2", "2π"))
abline(h = 1, lty = 2, col = "gray")
grid()

# 3. Kime surface (heatmap) -- KPT-GEM and KPT-FFT
image(                                # KPT-GEM
  x = fit_gem_kepler$theta,           # θ grid (length L_theta = 64)
  y = 1:nrow(fit_gem_kepler$S),       # time-window index (rows of S)
  z = t(fit_gem_kepler$S),            # dim: L_theta × n_windows
  main = "(KPT-GEM) Kime Surface S(θ, t)",
  xlab = "θ", 
  ylab = "Time Window", 
  col = viridis::viridis(100)
)

filled.contour(
  x = fit_gem_kepler$theta,
  y = 1:nrow(fit_gem_kepler$S),
  z = t(fit_gem_kepler$S),
  color.palette = viridis::viridis,
  xlab = "Theta",
  ylab = "Time Window"
)

image(                                # KPT-FFT
  x = fit_fft_kepler$theta,           # θ grid (length L_theta = 64)
  y = 1:nrow(fit_fft_kepler$S),       # 28 time windows (rows of S)
  z = t(fit_fft_kepler$S),            # dim: 64 × 28
  main = "(KPT-FFT) Kime Surface S(θ, t)",
  xlab = "θ", 
  ylab = "Time Window", 
  col = viridis::viridis(100)
)

filled.contour(
  x = fit_fft_kepler$theta,
  y = 1:nrow(fit_fft_kepler$S),
  z = t(fit_fft_kepler$S),
  color.palette = viridis::viridis,
  xlab = "Theta",
  ylab = "Time Window"
)

# 4. Star-level variability vs mass
plot(kpt_kepler$metadata$mass, rowMeans(kpt_kepler$Y_raw, na.rm = TRUE),
     pch = 19, col = adjustcolor("steelblue", alpha.f = 0.6),
     main = "Variability vs Stellar Mass",
     xlab = "Stellar Mass (M☉)", ylab = "Mean Window Variance (ppm²)")
grid()

par(mfrow = c(1, 1))

2.10 Step 8: Interpretation

cat("\n")
cat("═══════════════════════════════════════════════════════════════════\n")
## ═══════════════════════════════════════════════════════════════════
cat("  INTERPRETATION: KPT ON STELLAR LIGHT CURVES                      \n")
##   INTERPRETATION: KPT ON STELLAR LIGHT CURVES
cat("═══════════════════════════════════════════════════════════════════\n")
## ═══════════════════════════════════════════════════════════════════
cat("\n1. KPT-GEM PHASE STRUCTURE ANALYSIS\n")
## 
## 1. KPT-GEM PHASE STRUCTURE ANALYSIS
if (fit_gem_kepler$phi_uniform_dist < 0.2) {
  cat("   φ(θ) ≈ uniform: No strong phase structure detected.\n")
  cat("   Interpretation: Stellar variability is largely star-independent.\n")
  cat("   The 'kime-phase' θ does not significantly affect variability.\n")
} else {
  cat("   φ(θ) ≠ uniform: Significant phase structure detected!\n")
  cat("   Interpretation: There may be systematic patterns in stellar\n")
  cat("   variability that depend on some hidden 'phase' variable.\n")
}
##    φ(θ) ≈ uniform: No strong phase structure detected.
##    Interpretation: Stellar variability is largely star-independent.
##    The 'kime-phase' θ does not significantly affect variability.
cat("\n2. PHYSICAL INTERPRETATION\n")
## 
## 2. PHYSICAL INTERPRETATION
cat("   In this simulation, any uncovered phase θ could represent:\n")
##    In this simulation, any uncovered phase θ could represent:
cat("   - Stellar rotation phase (check against simulated random phases)\n")
##    - Stellar rotation phase (check against simulated random phases)
cat("   - Activity cycle phase\n")
##    - Activity cycle phase
cat("   - Instrumental systematics common to all stars\n")
##    - Instrumental systematics common to all stars
cat("\n3. KPT VALUE FOR STELLAR SCIENCE\n")
## 
## 3. KPT VALUE FOR STELLAR SCIENCE
cat("   - Characterizes the 'typical' stellar variability pattern\n")
##    - Characterizes the 'typical' stellar variability pattern
cat("   - Separates star-specific (replicate) from universal (process)\n")
##    - Separates star-specific (replicate) from universal (process)
cat("   - Could improve exoplanet transit detection\n")
##    - Could improve exoplanet transit detection
cat("\n1. KPT-FFT PHASE STRUCTURE ANALYSIS\n")
## 
## 1. KPT-FFT PHASE STRUCTURE ANALYSIS
if (fit_fft_kepler$phi_uniform_dist < 0.2) {
  cat("   φ(θ) ≈ uniform: No strong phase structure detected.\n")
  cat("   Interpretation: Stellar variability is largely star-independent.\n")
  cat("   The 'kime-phase' θ does not significantly affect variability.\n")
} else {
  cat("   φ(θ) ≠ uniform: Significant phase structure detected!\n")
  cat("   Interpretation: There may be systematic patterns in stellar\n")
  cat("   variability that depend on some hidden 'phase' variable.\n")
}
##    φ(θ) ≈ uniform: No strong phase structure detected.
##    Interpretation: Stellar variability is largely star-independent.
##    The 'kime-phase' θ does not significantly affect variability.
cat("\n═══════════════════════════════════════════════════════════════════\n")
## 
## ═══════════════════════════════════════════════════════════════════

These results do not falsify KPT; they represent a valid (and expected) outcome.

| Finding | Meaning |
|---|---|
| \(\varphi(\theta)\approx\) uniform | No hidden phase variable affects stellar variability |
| \(\mathbb{S}(\theta)\approx\) constant | Variability does not depend on \(\theta\) |
| KPT estimate \(\approx 0\) | Correct: standardized data has mean \(0\) |
| \(95\%\) PI \(\approx [-2, +2]\) | Matches standardized \(SD \approx 1\) |

In other words, KPT modeling works correctly, and the results indicate that:

  1. Stellar variability is “phase-independent”: Different stars don’t have systematically different “phases” that affect their variability patterns
  2. The optimal model is simply \(Y \sim N(0, 1)\): No complex kime-surface improves predictions
  3. This is scientifically meaningful: Stellar granulation/activity noise is largely star-independent (universal process).

The resulting KPT estimate of the phase law, \(\varphi\approx\) uniform, means that KPT found no exploitable structure with which to improve predictions; the “null model” (a simple Gaussian) is sufficient. This is analogous to finding no significant effect in a well-designed experiment: it still represents a valuable result.

In practice, KPT would uncover structure (\(\varphi \not=\) uniform) if stars at certain rotation phases had systematically different variability, if hidden instrumental effects were correlated across stars, or if some physical process created phase-locked variability patterns.

The fact that KPT uncovered no such structure is itself informative: stellar noise in this sample appears to be genuinely stochastic.
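
As a sanity check on this conclusion, a quick positive control can be run: simulate a sample whose latent phases are deliberately non-uniform and confirm that the same KPT-FFT routine flags the structure. A minimal sketch (all simulation settings are illustrative assumptions, not Kepler-derived):

set.seed(2024)
I_sim <- 75; K_sim <- 28                       # match the Kepler sample dimensions
theta_i <- 2 * pi * rbeta(I_sim, 2, 5)         # deliberately non-uniform latent phases
Y_sim <- sapply(1:K_sim, function(k)
  0.8 * cos(theta_i) + rnorm(I_sim, 0, 0.5))   # phase-dependent signal plus noise
fit_sim <- kpt_fft_stellar(Y_sim, L_theta = 64, max_iter = 50, verbose = FALSE)
cat(sprintf("Positive control: φ deviation from uniform = %.3f\n",
            fit_sim$phi_uniform_dist))         # should clearly exceed the Kepler value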

3 Next Steps

cat("\n=== NEXT STEPS FOR REAL DATA ===\n")
## 
## === NEXT STEPS FOR REAL DATA ===
cat("\n1. DOWNLOAD ACTUAL KEPLER DATA:\n")
## 
## 1. DOWNLOAD ACTUAL KEPLER DATA:
cat("   - Use lightkurve Python package via reticulate\n")
##    - Use lightkurve Python package via reticulate
cat("   - Select 100-500 Sun-like stars from KIC\n")
##    - Select 100-500 Sun-like stars from KIC
cat("   - Use PDC-corrected flux for cleaned data\n")
##    - Use PDC-corrected flux for cleaned data
cat("\n2. COMPARE WITH REAL STELLAR VARIABILITY:\n")
## 
## 2. COMPARE WITH REAL STELLAR VARIABILITY:
cat("   - Known rotation periods from McQuillan et al. (2014)\n")
##    - Known rotation periods from McQuillan et al. (2014)
cat("   - Activity indices (S-index, log R'HK)\n")
##    - Activity indices (S-index, log R'HK)
cat("   - Asteroseismic properties\n")
##    - Asteroseismic properties
cat("\n3. VALIDATE KPT PREDICTIONS:\n")
## 
## 3. VALIDATE KPT PREDICTIONS:
cat("   - Train on subset of stars\n")
##    - Train on subset of stars
cat("   - Predict variability of held-out stars\n")
##    - Predict variability of held-out stars
cat("   - Compare with actual variability metrics\n")
##    - Compare with actual variability metrics
cat("\n4. EXTEND TO TESS:\n")
## 
## 4. EXTEND TO TESS:
cat("   - 10× more stars\n")
##    - 10× more stars
cat("   - Multiple sectors for some stars\n")
##    - Multiple sectors for some stars
cat("   - Different stellar populations\n")
##    - Different stellar populations