Objective: Test whether Kime-Phase Tomography (KPT) can discover latent phase structure in repeated measurements of a stochastic astrophysical process—stellar brightness variability.
Rationale: KPT requires data where multiple independent realizations of a longitudinal process are observed. Kepler provides an ideal testbed: each star is an independent realization of the “stellar variability process,” observed over the same time period with identical instrumentation.
Mission: NASA Kepler Space Telescope (2009–2013)
Archive: Mikulski Archive for Space Telescopes (MAST)
Product: Pre-search Data Conditioning Simple Aperture Photometry (PDCSAP) flux.
Sample Selection:
Data Dictionary
| Variable | Dimension | Description |
|---|---|---|
| Y | \(I \times K\) | Standardized variability matrix; rows = stars, columns = time windows |
| Y_raw | \(I \times K\) | Raw variance of flux within each window (\(ppm^2\)) |
| t_center | \(K\) | Window center times (days from start) |
| n_stars | \(1\) | Number of stellar realizations (\(I = 75\)) |
| n_windows | \(1\) | Number of temporal windows (\(K = 28\)) |
| metadata | \(I \times 10\) | Stellar properties from Kepler Input Catalog |
Metadata fields: kepid (Kepler ID), ra, dec (J2000 coordinates), teff (effective temperature, K), logg (surface gravity, dex), feh (metallicity, dex), kepmag (Kepler magnitude), radius (R☉), mass (M☉), star_id (index).
The KPT data structure mapping is shown below.
| KPT Concept | Kepler Implementation |
|---|---|
| Realization index \(i\) | Star \(i\) (independent stellar system) |
| Time window \(k\) | \(5\)-day binned interval |
| Observation \(Y_{i\ k}\) | Standardized flux variance of star \(i\) in window \(k\) |
| Kime-phase \(\theta_i\) | Latent stellar “activity phase” or rotation phase |
| Phase distribution \(\varphi(\theta)\) | Distribution of stellar phases across the sample |
| Kime-surface \(\mathbb{S}(\theta, t)\) | Phase-dependent variability pattern |
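The mapping above can be made concrete with a toy sketch of the KPT data layout: \(I\) realizations by \(K\) windows, with a latent phase drawn per star. All names and dimensions here are illustrative, not the actual Kepler values; under \(H_o\) the kime-surface is flat, so the phase has no effect and observations are iid Gaussian.

```r
# Toy sketch of the KPT layout: I stars (realizations) x K time windows,
# each star carrying a latent kime-phase theta_i. Illustrative only.
set.seed(1)
I <- 5   # realizations (stars)
K <- 4   # time windows
theta <- runif(I, 0, 2 * pi)      # latent kime-phase per star

# Under H0 the kime-surface is constant, so theta has no effect:
S0 <- function(theta, t) 0        # flat kime-surface

Y <- matrix(NA_real_, nrow = I, ncol = K)
for (i in 1:I) {
  for (k in 1:K) {
    Y[i, k] <- S0(theta[i], k) + rnorm(1, 0, 1)  # observation = surface + noise
  }
}
dim(Y)  # I x K, matching the "realization x window" convention above
```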
Null Hypothesis (\(H_o\)): Stellar variability is phase-independent. In other words, \(\varphi(\theta)=\) uniform (all phases equally likely), \(\mathbb{S}(\theta, t)=\) constant (no \(\theta\)-dependence), and the optimal model is \(Y_{i k} \sim N(\mu, \sigma^2)\).
Alternative Hypothesis (\(H_1\)): Stellar variability depends on a latent phase, i.e., \(\varphi(\theta)\not=\) uniform (certain phases more prevalent), \(\mathbb{S}(\theta, t)\) varies with \(\theta\) (phase-dependent patterns exist), and KPT provides improved predictions beyond the additive noise, simple Gaussian model.
| Metric | \(H_o\) (No Structure) | \(H_1\) (Structure Detected) |
|---|---|---|
| \(\varphi\) deviation from uniform | \(< 0.1\) | \(> 0.2\) |
| \(\mathbb{S}\) average \(\theta\)-variation | \(< 0.1\) | \(> 0.2\) |
| KPT vs Gaussian RMSE | Equal | KPT lower |
| \(95\%\) PI coverage | \(\sim 95\%\) | \(\sim 95\%\) |
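The first two decision metrics can be sketched in a few lines of R. Here \(\varphi\) is assumed normalized so that a uniform distribution equals 1 everywhere (matching the `phi_uniform_dist = max(abs(phi - 1))` convention used in the `kpt_gem_stellar()` fit later in this notebook); taking the \(\theta\)-variation of \(\mathbb{S}\) as the mean across windows of its standard deviation over \(\theta\) is an illustrative choice.

```r
# Sketch of the H0-vs-H1 decision metrics. Assumes phi is scaled so a
# uniform phase distribution is identically 1 (as in kpt_gem_stellar),
# and measures theta-variation of S as mean-over-windows of sd-over-theta.
phi_deviation <- function(phi) max(abs(phi - 1))
S_theta_variation <- function(S) mean(apply(S, 1, sd))  # rows = windows

# A uniform phi and a flat kime-surface fall squarely in the H0 region (< 0.1):
phi_flat <- rep(1, 64)
S_flat <- matrix(0.5, nrow = 28, ncol = 64)
phi_deviation(phi_flat)      # 0
S_theta_variation(S_flat)    # 0
```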
Our experiments using Kepler stellar data indicate that:
Observed: \(\varphi(\theta)\approx\) uniform, \(\mathbb{S}(\theta, t)\approx\) constant.
Hence, KPT correctly identified that stellar variability in this sample has no exploitable kime-phase structure. The stochastic process (granulation, magnetic activity) is statistically homogeneous across stars. Thus, there is no evidence that a hidden phase variable systematically affects variability levels.
This null result does not falsify KPT; it demonstrates that KPT correctly recovers the null structure when no phase dependence exists. This validates the algorithm’s behavior and confirms that Sun-like stellar variability is well modeled as a phase-independent Gaussian process.
This notebook provides a complete implementation for using Kepler stellar light curves as the validation dataset for Kime-Phase Tomography.
We select Sun-like stars with similar properties to ensure the “process” being measured is comparable across realizations.
#' Query NASA Exoplanet Archive for Kepler stellar properties
#'
#' @param n_targets Maximum number of targets
#' @param teff_range Temperature range (K) for Sun-like stars
#' @param logg_range Surface gravity range for main-sequence
#' @param mag_max Maximum Kepler magnitude (brightness limit)
query_sunlike_targets <- function(n_targets = 500,
teff_range = c(5500, 6000),
logg_range = c(4.0, 4.6),
mag_max = 13) {
base_url <- "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"
query <- sprintf(
"SELECT TOP %d kepid, ra, dec, teff, logg, feh, kepmag, radius, mass
FROM keplerstellar
WHERE teff BETWEEN %d AND %d
AND logg BETWEEN %.1f AND %.1f
AND kepmag < %.1f
AND radius IS NOT NULL
ORDER BY kepmag ASC",
n_targets,
teff_range[1], teff_range[2],
logg_range[1], logg_range[2],
mag_max
)
response <- GET(base_url, query = list(query = query, format = "json"))
if (status_code(response) != 200) {
stop("Query failed with status: ", status_code(response))
}
data <- fromJSON(rawToChar(response$content))
as_tibble(data)
}
# Query Sun-like targets
cat("Querying Kepler archive for Sun-like stars...\n")
## Querying Kepler archive for Sun-like stars...
targets <- tryCatch({
query_sunlike_targets(n_targets = 200)
}, error = function(e) {
cat("Error:", e$message, "\n")
cat("Using simulated data for demonstration\n")
NULL
})
if (!is.null(targets)) {
cat(sprintf("\nFound %d Sun-like targets\n", nrow(targets)))
cat("\nStellar parameter ranges:\n")
cat(sprintf(" Teff: %d - %d K\n", min(targets$teff), max(targets$teff)))
cat(sprintf(" log g: %.2f - %.2f\n", min(targets$logg), max(targets$logg)))
cat(sprintf(" Kepmag: %.1f - %.1f\n", min(targets$kepmag), max(targets$kepmag)))
print(head(targets, 10))
}
##
## Found 200 Sun-like targets
##
## Stellar parameter ranges:
## Teff: 5545 - 5997 K
## log g: 4.09 - 4.58
## Kepmag: 6.0 - 13.0
## # A tibble: 10 × 9
## kepid ra dec teff logg feh kepmag radius mass
## <int> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <lgl>
## 1 10005473 289. 47.0 5780 4.44 0 6.00 1 NA
## 2 10002106 287. 47.0 5780 4.44 0 8.98 1 NA
## 3 10064712 287. 47.1 5780 4.44 0 9.36 1 NA
## 4 10068519 289. 47.1 5780 4.44 0 9.54 1 NA
## 5 10031808 299. 46.9 5780 4.44 0 9.56 1 NA
## 6 10200031 288. 47.2 5780 4.44 0 9.84 1 NA
## 7 10202015 289. 47.2 5780 4.44 0 9.86 1 NA
## 8 10139791 291. 47.2 5780 4.44 0 9.90 1 NA
## 9 10082904 295. 47.1 5780 4.44 0 9.93 1 NA
## 10 10068482 289. 47.1 5780 4.44 0 9.99 1 NA
#' Get MAST data product URLs for a Kepler target
#'
#' @param kepid Kepler Input Catalog ID
#' @param product_type Type of light curve ("LLC" for long cadence)
get_kepler_urls <- function(kepid, product_type = "LLC") {
# MAST API query for Kepler data products
mast_url <- "https://mast.stsci.edu/api/v0/invoke"
request_body <- list(
service = "Mast.Caom.Filtered",
format = "json",
params = list(
filters = list(
list(paramName = "target_name", values = list(as.character(kepid))),
list(paramName = "obs_collection", values = list("Kepler")),
list(paramName = "dataproduct_type", values = list("timeseries"))
)
)
)
response <- POST(
mast_url,
body = toJSON(request_body, auto_unbox = TRUE),
content_type_json(),
timeout(60)
)
if (status_code(response) != 200) {
return(NULL)
}
result <- fromJSON(rawToChar(response$content))
return(result$data)
}
#' Download and parse a Kepler FITS light curve
#'
#' @param url URL to FITS file
#' @param use_pdcflux Use PDC-corrected flux (TRUE) or SAP flux (FALSE)
download_lightcurve <- function(url, use_pdcflux = TRUE) {
# Download to temp file
temp_file <- tempfile(fileext = ".fits")
tryCatch({
download.file(url, temp_file, mode = "wb", quiet = TRUE)
# Read FITS file
fits <- readFITS(temp_file)
# Extract time and flux
time <- fits$col[[1]] # TIME column (BJD - 2454833)
if (use_pdcflux) {
flux <- fits$col[[8]] # PDCSAP_FLUX (corrected)
} else {
flux <- fits$col[[4]] # SAP_FLUX (raw)
}
quality <- fits$col[[10]] # SAP_QUALITY flags
# Clean up temp file
unlink(temp_file)
# Return data frame with good quality data
df <- data.frame(
time = time,
flux = flux,
quality = quality
) %>%
filter(quality == 0, !is.na(flux), !is.na(time))
return(df)
}, error = function(e) {
unlink(temp_file)
return(NULL)
})
}For a simple demonstration without downloading actual Kepler data, we simulate Kepler-like light curves with realistic variability properties.
#' Simulate Kepler-like stellar light curves
#'
#' Includes realistic components:
#' - Granulation noise (red noise)
#' - Rotation + spots (quasi-periodic)
#' - Photon noise (white noise)
#'
#' @param n_stars Number of stars (realizations)
#' @param n_time Number of time points
#' @param cadence Cadence in minutes
#' @param variability_ppm RMS variability in parts per million
simulate_kepler_lightcurves <- function(n_stars = 100,
n_time = 5000,
cadence = 30,
variability_ppm = 100) {
# Time in days
dt <- cadence / (60 * 24) # Convert minutes to days
t <- seq(0, (n_time - 1) * dt, by = dt)
# Initialize output matrix
Y <- matrix(NA_real_, nrow = n_stars, ncol = n_time)
metadata <- tibble(
star_id = 1:n_stars,
rotation_period = numeric(n_stars),
granulation_tau = numeric(n_stars),
activity_level = numeric(n_stars)
)
cat("Simulating", n_stars, "stellar light curves...\n")
for (i in 1:n_stars) {
# Stellar parameters (vary between stars)
P_rot <- runif(1, 10, 30) # Rotation period (days)
tau_gran <- runif(1, 0.1, 0.5) # Granulation timescale (days)
activity <- runif(1, 0.3, 1.5) # Activity level multiplier
metadata$rotation_period[i] <- P_rot
metadata$granulation_tau[i] <- tau_gran
metadata$activity_level[i] <- activity
# 1. Granulation noise (red noise via AR process)
ar_coef <- exp(-dt / tau_gran)
granulation <- numeric(n_time)
granulation[1] <- rnorm(1, 0, variability_ppm * 0.5 * activity)
for (j in 2:n_time) {
granulation[j] <- ar_coef * granulation[j-1] +
sqrt(1 - ar_coef^2) * rnorm(1, 0, variability_ppm * 0.5 * activity)
}
# 2. Rotation + spots (quasi-periodic)
phase <- runif(1, 0, 2*pi) # Random initial phase
spot_amp <- variability_ppm * 0.3 * activity
rotation_signal <- spot_amp * sin(2*pi*t/P_rot + phase) +
spot_amp * 0.5 * sin(4*pi*t/P_rot + phase + runif(1, 0, pi))
# 3. Photon noise (white noise, depends on stellar brightness)
photon_noise <- rnorm(n_time, 0, variability_ppm * 0.2)
# Combine components
flux <- 1e6 + granulation + rotation_signal + photon_noise
# Normalize to zero mean, unit variance
Y[i, ] <- (flux - mean(flux)) / sd(flux)
}
cat("Done!\n")
list(
Y = Y,
t = t,
metadata = metadata,
n_stars = n_stars,
n_time = n_time,
cadence_min = cadence
)
}
# Generate simulated data
set.seed(42)
kepler_sim <- simulate_kepler_lightcurves(
n_stars = 100,
n_time = 2000, # About 40 days at 30-min cadence
cadence = 30,
variability_ppm = 100
)
## Simulating 100 stellar light curves...
## Done!
##
## === SIMULATED KEPLER DATA ===
## Stars (realizations): 100
## Time points: 2000
## Time span: 41.6 days
## Cadence: 30 minutes
Here is a more robust version that operates on real Kepler data. It prefers the lightkurve Python package (via reticulate) and falls back to pure R if lightkurve is unavailable. The invocation protocol replaces the prior simulation chunk with a real Kepler download chunk.
Instead of calling
kepler_sim <- simulate_kepler_lightcurves(n_stars = 100, ...)
the real-data use case invokes the method
download_kepler_lightcurves().
kepler_real <- download_kepler_lightcurves(
n_stars = 100, # 100 independent stellar realizations
teff_range = c(5500, 6000), # Sun-like stars
max_time_days = 60, # ~2 months of data
method = "auto" # Auto-select best method
)
Then continue with the existing methods
create_kpt_matrix_kepler() and
kpt_gem_stellar().
kpt_kepler <- create_kpt_matrix_kepler(kepler_real, window_days = 5, step_days = 2)
fit_kepler <- kpt_gem_stellar(kpt_kepler$Y)
The key invocation parameters include:
download_kepler_lightcurves(
n_stars = 100, # Number of independent realizations
teff_range = c(5500, 6000), # Temperature range (K)
logg_range = c(4.0, 4.6), # Surface gravity (main sequence)
mag_range = c(10, 13), # Brightness (brighter = better SNR)
quarter = 5, # Kepler quarter (1-17)
max_time_days = 60, # Time span to use
method = "auto" # "auto", "lightkurve", or "mast"
)
In practice, downloading the real Kepler data may take \(5\)–\(20\) minutes, depending on the n_stars value and network speed.
# ==============================================================================
# DOWNLOAD REAL KEPLER STELLAR LIGHT CURVES - UNIFIED VERSION
# ==============================================================================
#
# This chunk downloads actual observed Kepler light curves to use as the
# ultimate validation dataset for Kime-Phase Tomography.
#
# Two methods available:
# 1. lightkurve (Python) via reticulate - Most reliable, more features
# 2. Pure R via direct MAST API - No Python dependency
#
# Output structure matches simulate_kepler_lightcurves():
# - Y: matrix (n_stars × n_time) of standardized flux
# - t: time vector in days (starting at 0)
# - metadata: tibble with stellar properties
# - n_stars, n_time, cadence_min: scalar metadata
#
# ==============================================================================
library(tidyverse)
library(httr)
library(jsonlite)
# ==============================================================================
# HELPER: Query Kepler Input Catalog
# ==============================================================================
query_kepler_targets <- function(n_targets = 500,
teff_range = c(5500, 6000),
logg_range = c(4.0, 4.6),
mag_range = c(10, 13)) {
tap_url <- "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"
query <- sprintf(
"SELECT TOP %d kepid, ra, dec, teff, logg, feh, kepmag, radius, mass
FROM keplerstellar
WHERE teff BETWEEN %d AND %d
AND logg BETWEEN %.2f AND %.2f
AND kepmag BETWEEN %.1f AND %.1f
AND radius IS NOT NULL AND mass IS NOT NULL
ORDER BY kepmag ASC",
n_targets,
teff_range[1], teff_range[2],
logg_range[1], logg_range[2],
mag_range[1], mag_range[2]
)
response <- GET(tap_url, query = list(query = query, format = "json"), timeout(120))
if (status_code(response) != 200) {
stop("Failed to query Kepler stellar catalog")
}
as_tibble(fromJSON(rawToChar(response$content)))
}
# ==============================================================================
# METHOD 1: Using lightkurve (Python) - RECOMMENDED
# ==============================================================================
download_kepler_via_lightkurve <- function(targets,
n_stars = 100,
quarter = 5,
cadence = "long",
max_time_days = 60,
verbose = TRUE) {
# Check Python availability
if (!requireNamespace("reticulate", quietly = TRUE)) {
stop("reticulate package required. Install with: install.packages('reticulate')")
}
library(reticulate)
# Check/install lightkurve
if (!py_module_available("lightkurve")) {
if (verbose) cat("Installing lightkurve Python package...\n")
py_install("lightkurve", pip = TRUE)
}
lk <- import("lightkurve")
if (verbose) cat("Downloading via lightkurve...\n")
lightcurves <- list()
metadata_list <- list()
successful <- 0
for (i in seq_len(min(nrow(targets), n_stars * 3))) {
if (successful >= n_stars) break
kepid <- targets$kepid[i]
if (verbose && (i %% 20 == 0 || i == 1)) {
cat(sprintf(" Target %d: KIC %d (success: %d/%d)\n", i, kepid, successful, n_stars))
}
lc_result <- tryCatch({
search_str <- sprintf("KIC %d", kepid)
search_result <- lk$search_lightcurve(
search_str,
mission = "Kepler",
quarter = as.integer(quarter),
cadence = cadence
)
if (length(search_result) == 0) return(NULL)
lc <- search_result$download(quality_bitmask = "hard")
time_raw <- as.numeric(lc$time$value)
flux_raw <- as.numeric(lc$flux$value)
valid <- is.finite(time_raw) & is.finite(flux_raw)
if (sum(valid) < 100) return(NULL)
list(
time = time_raw[valid] - min(time_raw[valid]),
flux = (flux_raw[valid] / median(flux_raw[valid]) - 1) * 1e6,
kepid = kepid
)
}, error = function(e) NULL)
if (!is.null(lc_result)) {
successful <- successful + 1
lightcurves[[successful]] <- lc_result
metadata_list[[successful]] <- targets[i, ] %>% mutate(star_id = successful)
}
Sys.sleep(0.3)
}
list(lightcurves = lightcurves, metadata = bind_rows(metadata_list), n = successful)
}
# ==============================================================================
# METHOD 2: Pure R via MAST API
# ==============================================================================
download_kepler_via_mast <- function(targets,
n_stars = 50,
max_time_days = 60,
verbose = TRUE) {
if (!requireNamespace("FITSio", quietly = TRUE)) {
install.packages("FITSio")
}
library(FITSio)
if (verbose) cat("Downloading via direct MAST API (pure R)...\n")
lightcurves <- list()
metadata_list <- list()
successful <- 0
# Quarter timestamps for filename construction
q_timestamps <- c("2010078095331", "2010174085026", "2009350155506", "2010265121752")
for (i in seq_len(min(nrow(targets), n_stars * 5))) {
if (successful >= n_stars) break
kepid <- targets$kepid[i]
kepid_str <- sprintf("%09d", kepid)
prefix <- substr(kepid_str, 1, 4)
if (verbose && (i %% 10 == 0 || i == 1)) {
cat(sprintf(" Target %d: KIC %d (success: %d/%d)\n", i, kepid, successful, n_stars))
}
lc_result <- NULL
for (ts in q_timestamps) {
fits_url <- sprintf(
"https://archive.stsci.edu/missions/kepler/lightcurves/%s/%s/kplr%s-%s_llc.fits",
prefix, kepid_str, kepid_str, ts
)
lc_result <- tryCatch({
temp_file <- tempfile(fileext = ".fits")
on.exit(unlink(temp_file), add = TRUE)
download.file(fits_url, temp_file, mode = "wb", quiet = TRUE)
fits_data <- readFITS(temp_file)
col_names <- fits_data$colNames
time_idx <- which(col_names == "TIME")
flux_idx <- which(col_names == "PDCSAP_FLUX")
qual_idx <- which(col_names == "SAP_QUALITY")
if (length(time_idx) == 0 || length(flux_idx) == 0) return(NULL)
time <- as.numeric(fits_data$col[[time_idx]])
flux <- as.numeric(fits_data$col[[flux_idx]])
quality <- if (length(qual_idx) > 0) as.integer(fits_data$col[[qual_idx]]) else rep(0L, length(time))
valid <- (quality == 0) & is.finite(time) & is.finite(flux) & (flux > 0)
if (sum(valid) < 100) return(NULL)
list(
time = time[valid] - min(time[valid]),
flux = (flux[valid] / median(flux[valid]) - 1) * 1e6,
kepid = kepid
)
}, error = function(e) NULL)
if (!is.null(lc_result)) break
}
if (!is.null(lc_result)) {
successful <- successful + 1
lightcurves[[successful]] <- lc_result
metadata_list[[successful]] <- targets[i, ] %>% mutate(star_id = successful)
}
Sys.sleep(0.2)
}
list(lightcurves = lightcurves, metadata = bind_rows(metadata_list), n = successful)
}
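The MAST fallback above builds archive URLs deterministically from the zero-padded KIC id: `kplr<9-digit id>-<timestamp>_llc.fits` under a 4-digit id-prefix directory. A minimal sketch of this URL construction, factored into a hypothetical helper for clarity:

```r
# Sketch of the MAST light-curve URL pattern used in download_kepler_via_mast().
# kepler_llc_url() is a hypothetical helper name, not part of the notebook's API.
kepler_llc_url <- function(kepid, ts) {
  kepid_str <- sprintf("%09d", kepid)          # zero-pad KIC id to 9 digits
  prefix <- substr(kepid_str, 1, 4)            # 4-digit directory prefix
  sprintf(
    "https://archive.stsci.edu/missions/kepler/lightcurves/%s/%s/kplr%s-%s_llc.fits",
    prefix, kepid_str, kepid_str, ts
  )
}

kepler_llc_url(10677047, "2010078095331")
# "https://archive.stsci.edu/missions/kepler/lightcurves/0106/010677047/kplr010677047-2010078095331_llc.fits"
```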
# ==============================================================================
# MAIN FUNCTION: Align and create matrix
# ==============================================================================
create_kepler_matrix <- function(download_result, max_time_days = 60) {
lightcurves <- download_result$lightcurves
metadata <- download_result$metadata
n <- download_result$n
if (n < 5) stop("Too few successful downloads")
# Find common time range
t_min <- max(sapply(lightcurves, function(lc) min(lc$time)))
t_max <- min(sapply(lightcurves, function(lc) max(lc$time)))
if (!is.null(max_time_days)) {
t_max <- min(t_max, t_min + max_time_days)
}
# Create time grid (30-min cadence)
dt <- 0.0208
t_grid <- seq(t_min, t_max, by = dt)
n_time <- length(t_grid)
# Build matrix
Y <- matrix(NA_real_, nrow = n, ncol = n_time)
for (i in seq_len(n)) {
lc <- lightcurves[[i]]
flux_interp <- approx(lc$time, lc$flux, xout = t_grid, rule = 1)$y
if (sum(!is.na(flux_interp)) > 50) {
Y[i, ] <- (flux_interp - mean(flux_interp, na.rm = TRUE)) /
sd(flux_interp, na.rm = TRUE)
}
}
list(
Y = Y,
t = t_grid,
metadata = metadata,
n_stars = n,
n_time = n_time,
cadence_min = dt * 24 * 60,
source = "Kepler/MAST"
)
}
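The alignment step in create_kepler_matrix() relies on approx() with rule = 1, which interpolates within each star's observed span and returns NA outside it, so gaps stay NA rather than being extrapolated. A minimal toy example of this behavior:

```r
# Minimal sketch of the grid alignment performed by create_kepler_matrix():
# approx(rule = 1) interpolates linearly inside the observed range and
# returns NA outside it, preserving gaps instead of extrapolating.
t_grid <- seq(0, 10, by = 1)        # common time grid (toy)
lc_time <- c(2, 4, 6, 8)            # one star's irregular observation times
lc_flux <- c(1.0, 2.0, 3.0, 4.0)

flux_interp <- approx(lc_time, lc_flux, xout = t_grid, rule = 1)$y
flux_interp
# NA NA 1.0 1.5 2.0 2.5 3.0 3.5 4.0 NA NA
```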
# ==============================================================================
# UNIFIED DOWNLOAD FUNCTION
# ==============================================================================
#' Download real Kepler light curves
#'
#' @param n_stars Number of stars to download
#' @param teff_range Temperature range for Sun-like stars (K)
#' @param logg_range Surface gravity range (dex)
#' @param mag_range Kepler magnitude range
#' @param quarter Kepler quarter (1-17)
#' @param max_time_days Maximum time span to include
#' @param method "auto" (try lightkurve, fall back to mast), "lightkurve", or "mast"
#' @param verbose Print progress
#' @return List with Y matrix, t vector, metadata (same as simulate_kepler_lightcurves)
download_kepler_lightcurves <- function(n_stars = 100,
teff_range = c(5500, 6000),
logg_range = c(4.0, 4.6),
mag_range = c(10, 13),
quarter = 5,
max_time_days = 60,
method = "auto",
verbose = TRUE) {
if (verbose) {
cat("\n")
cat("═══════════════════════════════════════════════════════════════════\n")
cat(" DOWNLOADING REAL KEPLER STELLAR LIGHT CURVES \n")
cat("═══════════════════════════════════════════════════════════════════\n")
}
# Step 1: Query targets
if (verbose) cat("\nStep 1: Querying Kepler stellar catalog...\n")
targets <- query_kepler_targets(
n_targets = n_stars * 5,
teff_range = teff_range,
logg_range = logg_range,
mag_range = mag_range
)
if (verbose) {
cat(sprintf(" Found %d candidate Sun-like stars\n", nrow(targets)))
}
# Step 2: Download light curves
if (verbose) cat("\nStep 2: Downloading light curves...\n")
download_result <- NULL
if (method %in% c("auto", "lightkurve")) {
download_result <- tryCatch({
download_kepler_via_lightkurve(
targets, n_stars = n_stars, quarter = quarter,
max_time_days = max_time_days, verbose = verbose
)
}, error = function(e) {
if (verbose) cat(" lightkurve method failed:", e$message, "\n")
NULL
})
}
if (is.null(download_result) && method %in% c("auto", "mast")) {
if (verbose && method == "auto") cat(" Falling back to direct MAST API...\n")
download_result <- download_kepler_via_mast(
targets, n_stars = min(n_stars, 75), # MAST is slower
max_time_days = max_time_days, verbose = verbose
)
}
if (is.null(download_result) || download_result$n < 5) {
stop("Failed to download sufficient light curves")
}
if (verbose) cat(sprintf(" Downloaded %d light curves\n", download_result$n))
# Step 3: Create aligned matrix
if (verbose) cat("\nStep 3: Creating aligned data matrix...\n")
result <- create_kepler_matrix(download_result, max_time_days = max_time_days)
# Summary
if (verbose) {
cat("\n")
cat("═══════════════════════════════════════════════════════════════════\n")
cat(" DOWNLOAD COMPLETE \n")
cat("═══════════════════════════════════════════════════════════════════\n")
cat(sprintf("Stars (realizations): %d\n", result$n_stars))
cat(sprintf("Time points: %d\n", result$n_time))
cat(sprintf("Time span: %.1f days\n", max(result$t)))
cat(sprintf("Cadence: %.1f minutes\n", result$cadence_min))
cat(sprintf("Matrix dimensions: %d × %d\n", nrow(result$Y), ncol(result$Y)))
cat(sprintf("Missing values: %.1f%%\n", 100 * mean(is.na(result$Y))))
cat("\nStellar properties:\n")
cat(sprintf(" Teff: %d - %d K\n",
min(result$metadata$teff), max(result$metadata$teff)))
cat(sprintf(" Kepmag: %.1f - %.1f\n",
min(result$metadata$kepmag), max(result$metadata$kepmag)))
cat("═══════════════════════════════════════════════════════════════════\n")
}
result
}
# ==============================================================================
# EXECUTE: DOWNLOAD REAL DATA
# ==============================================================================
cat("Starting Kepler light curve download...\n")
## Starting Kepler light curve download...
## This may take 5-20 minutes depending on n_stars and method...
# Download real Kepler data
# This replaces simulate_kepler_lightcurves() with real observed data
n_stars <- 100
cache_file <- sprintf("kepler_lightcurves_%d.rds", n_stars)
if (file.exists(cache_file)) {
message("Loading cached Kepler data: ", cache_file)
kepler_real <- readRDS(cache_file)
} else {
# Retrieve real Kepler light curves from MAST
kepler_real <- tryCatch({
kepler_real <- download_kepler_lightcurves(
n_stars = 100, # Number of stars (realizations)
teff_range = c(5500, 6000), # Sun-like temperature
logg_range = c(4.0, 4.6), # Main sequence
mag_range = c(10, 13), # Bright enough for good data
quarter = 5, # Kepler Q5 (good quality)
max_time_days = 60, # ~2 months of data
method = "auto", # Try lightkurve, fall back to MAST
verbose = TRUE
)
}, error = function(e) {
cat("Kepler download failed:", e$message, "\n")
NULL
})
if (!is.null(kepler_real)) {
saveRDS(kepler_real, cache_file)
message("Saved cached Kepler Lightcurves Data: ", cache_file)
}
}
# Quick validation
cat("\n=== DATA READY FOR KPT ===\n")
##
## === DATA READY FOR KPT ===
## Use kepler_real$Y as input to create_kpt_matrix_kepler()
## Dimensions match simulate_kepler_lightcurves() output
##
## First 5 stars:
## # A tibble: 5 × 10
## kepid ra dec teff logg feh kepmag radius mass star_id
## <int> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 10677047 296. 47.9 5663 4.54 -0.14 10.1 0.849 0.916 1
## 2 11081642 291. 48.6 5993 4.25 -0.02 10.1 1.27 1.05 2
## 3 9760362 287. 46.6 5860 4.21 -0.2 10.1 1.25 0.929 3
## 4 10934241 298. 48.4 5912 4.49 -0.16 10.3 0.934 0.974 4
## 5 11200185 296. 48.8 5580 4.48 -0.6 10.3 0.814 0.724 5
##
## Flux statistics per star (first 5):
for (i in 1:min(5, nrow(kepler_real$Y))) {
flux <- kepler_real$Y[i, ]
cat(sprintf(" Star %d (KIC %d): mean=%.3f, sd=%.3f, range=[%.2f, %.2f]\n",
i, kepler_real$metadata$kepid[i],
mean(flux, na.rm=TRUE), sd(flux, na.rm=TRUE),
min(flux, na.rm=TRUE), max(flux, na.rm=TRUE)))
}
## Star 1 (KIC 10677047): mean=0.000, sd=1.000, range=[-2.68, 2.44]
## Star 2 (KIC 11081642): mean=-0.000, sd=1.000, range=[-2.78, 2.34]
## Star 3 (KIC 9760362): mean=-0.000, sd=1.000, range=[-3.30, 1.82]
## Star 4 (KIC 10934241): mean=0.000, sd=1.000, range=[-2.56, 1.85]
## Star 5 (KIC 11200185): mean=-0.000, sd=1.000, range=[-2.96, 2.25]
The output of the code chunk above may look like this:
Starting Kepler light curve download...
This may take 5-20 minutes depending on n_stars and method...
═══════════════════════════════════════════════════════════════════
DOWNLOADING REAL KEPLER STELLAR LIGHT CURVES
═══════════════════════════════════════════════════════════════════
...
C:\Users\IvoD\ANACON~1\Lib\site-packages\lightkurve\prf\__init__.py:7: UserWarning: Warning: the tpfmodel submodule is not available without oktopus installed, which requires a current version of autograd. See #1452 for details.
warnings.warn(
Downloading via lightkurve...
Target 1: KIC 10677047 (success: 0/100)
Target 20: KIC 10331562 (success: 0/100)
Warning: 35% (1566/4538) of the cadences will be ignored due to the quality mask (quality_bitmask=1664431).
Falling back to direct MAST API...
Downloading via direct MAST API (pure R)...
Target 1: KIC 10677047 (success: 0/75)
Target 10: KIC 10477502 (success: 9/75)
Target 20: KIC 10331562 (success: 19/75)
Target 30: KIC 10389154 (success: 29/75)
Target 40: KIC 11127479 (success: 39/75)
Target 50: KIC 9962623 (success: 49/75)
Target 60: KIC 10514429 (success: 59/75)
Target 70: KIC 10857462 (success: 69/75)
Downloaded 75 light curves
Step 3: Creating aligned data matrix...
═══════════════════════════════════════════════════════════════════
DOWNLOAD COMPLETE
═══════════════════════════════════════════════════════════════════
Stars (realizations): 75
Time points: 2885
Time span: 60.0 days
Cadence: 30.0 minutes
Matrix dimensions: 75 × 2885
Missing values: 0.0%
Stellar properties:
Teff: 5530 - 5999 K
Kepmag: 10.1 - 11.6
═══════════════════════════════════════════════════════════════════
=== DATA READY FOR KPT ===
Use kepler_real$Y as input to create_kpt_matrix_kepler()
Dimensions match simulate_kepler_lightcurves() output
First 5 stars:
# A tibble: 5 x 10
kepid ra dec teff logg feh kepmag radius mass star_id
1 10677047 296. 47.9 5663 4.54 -0.14 10.1 0.849 0.916 1
2 11081642 291. 48.6 5993 4.25 -0.02 10.1 1.27 1.05 2
3 9760362 287. 46.6 5860 4.21 -0.2 10.1 1.25 0.929 3
4 10934241 298. 48.4 5912 4.49 -0.16 10.3 0.934 0.974 4
5 11200185 296. 48.8 5580 4.48 -0.6 10.3 0.814 0.724 5
Flux statistics per star (first 5):
Star 1 (KIC 10677047): mean=0.000, sd=1.000, range=[-2.68, 2.44]
Star 2 (KIC 11081642): mean=-0.000, sd=1.000, range=[-2.78, 2.34]
Star 3 (KIC 9760362): mean=-0.000, sd=1.000, range=[-3.30, 1.82]
Star 4 (KIC 10934241): mean=0.000, sd=1.000, range=[-2.56, 1.85]
Star 5 (KIC 11200185): mean=-0.000, sd=1.000, range=[-2.96, 2.25]
# Plot sample light curves
par(mfrow = c(3, 1), mar = c(4, 4, 2, 1))
kepler_data <- kepler_real # Real Kepler Data
# kepler_data <- kepler_sim # Simulated Kepler Data
for (i in 1:3) {
plot(kepler_data$t, kepler_data$Y[i, ], type = "l", col = "steelblue",
xlab = "Time (days)", ylab = "Normalized Flux",
main = sprintf("Star %d (feh = %.2f dex, mass = %.2f M_sun)",
i, kepler_data$metadata$feh[i],
kepler_data$metadata$mass[i]))
# main = sprintf("Star %d (P_rot = %.1f d, τ_gran = %.2f d)",
# i, kepler_data$metadata$rotation_period[i],
# kepler_data$metadata$granulation_tau[i]))
grid()
}
#' Create KPT matrix from light curves
#'
#' Groups time points into windows for temporal analysis.
#'
#' @param sim Output from simulate_kepler_lightcurves()
#' @param window_days Window size in days
#' @param step_days Step between windows
create_kpt_matrix_kepler <- function(sim,
window_days = 5,
step_days = 2) {
Y_full <- sim$Y
t <- sim$t
dt <- t[2] - t[1]
n_stars <- nrow(Y_full)
# Create windows
window_pts <- round(window_days / dt)
step_pts <- round(step_days / dt)
# Window centers
center_idx <- seq(window_pts/2 + 1, ncol(Y_full) - window_pts/2, by = step_pts)
n_windows <- length(center_idx)
cat(sprintf("Creating KPT matrix:\n"))
cat(sprintf(" Window size: %.1f days (%d points)\n", window_days, window_pts))
cat(sprintf(" Step size: %.1f days (%d points)\n", step_days, step_pts))
cat(sprintf(" Number of windows: %d\n", n_windows))
# For each window, compute summary statistic (variance or RMS)
# This gives us I realizations × K time windows
Y <- matrix(NA_real_, nrow = n_stars, ncol = n_windows)
t_center <- numeric(n_windows)
for (k in seq_along(center_idx)) {
idx <- (center_idx[k] - window_pts/2 + 1):(center_idx[k] + window_pts/2)
idx <- idx[idx > 0 & idx <= ncol(Y_full)]
# Compute variance within window (measure of variability)
Y[, k] <- apply(Y_full[, idx, drop = FALSE], 1, var)
t_center[k] <- mean(t[idx])
}
# Log-transform variance (more Gaussian)
Y_log <- log(Y)
# Standardize
Y_std <- scale(Y_log)
list(
Y = Y_std,
Y_raw = Y,
t_center = t_center,
n_stars = n_stars,
n_windows = n_windows,
metadata = sim$metadata
)
}
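The window arithmetic inside create_kpt_matrix_kepler() can be checked directly against the real-data dimensions reported below (dt = 0.0208 d on the common grid, n_time = 2885 points):

```r
# Quick check of the windowing arithmetic in create_kpt_matrix_kepler()
# under the real-data dimensions (dt = 0.0208 days, n_time = 2885 points).
dt <- 0.0208
n_time <- 2885
window_pts <- round(5 / dt)   # 240 points per 5-day window
step_pts <- round(2 / dt)     # 96 points per 2-day step

centers <- seq(window_pts / 2 + 1, n_time - window_pts / 2, by = step_pts)
length(centers)  # 28 windows, matching K = 28 in the data dictionary
```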
# Create KPT matrix
kpt_kepler <- create_kpt_matrix_kepler(kepler_data, window_days = 5, step_days = 2)
## Creating KPT matrix:
## Window size: 5.0 days (240 points)
## Step size: 2.0 days (96 points)
## Number of windows: 28
cat(sprintf("\nKPT Matrix dimensions: %d stars × %d windows\n",
kpt_kepler$n_stars, kpt_kepler$n_windows))
##
## KPT Matrix dimensions: 75 stars × 28 windows
## Overall mean: -0.0000
## Overall SD: 0.9935
# #' Simplified KPT-GEM for stellar data
# kpt_gem_stellar <- function(Y, L_theta = 64, max_iter = 50, tol = 1e-5) {
#
# I <- nrow(Y) # Stars
# K <- ncol(Y) # Windows
# theta <- seq(0, 2*pi, length.out = L_theta + 1)[1:L_theta]
#
# # Initialize
# mu_0 <- mean(Y)
# phi <- rep(1, L_theta)
# S <- matrix(mu_0, K, L_theta)
# sigma2 <- var(as.vector(Y))
#
# history <- data.frame(iter = integer(), mu_0 = numeric(), ll = numeric())
#
# cat("Running KPT-GEM on stellar data...\n")
#
# for (iter in 1:max_iter) {
# mu_0_old <- mu_0
#
# # E-step
# W_sum <- matrix(0, K, L_theta)
# YW_sum <- matrix(0, K, L_theta)
# ll <- 0
#
# for (k in 1:K) {
# yk <- Y[, k]
# D <- outer(yk, S[k, ], "-")
# log_lik <- -D^2 / (2 * sigma2)
# log_w <- sweep(log_lik, 2, log(pmax(phi, 1e-10)), "+")
# w_max <- apply(log_w, 1, max)
# w <- exp(log_w - w_max)
# w <- w / rowSums(w)
#
# W_sum[k, ] <- colSums(w)
# YW_sum[k, ] <- as.numeric(crossprod(w, yk))
# ll <- ll + sum(w_max + log(rowSums(exp(log_w - w_max))))
# }
#
# # M-step
# phi_new <- colSums(W_sum)
# phi_new <- phi_new / sum(phi_new) * L_theta
#
# for (k in 1:K) {
# w_l <- pmax(W_sum[k, ], 1e-10)
# S[k, ] <- YW_sum[k, ] / w_l
# }
#
# mu_0_new <- mean(S)
#
# # Update sigma
# ss <- 0
# for (k in 1:K) {
# yk <- Y[, k]
# D <- outer(yk, S[k, ], "-")
# log_lik <- -D^2 / (2 * sigma2)
# log_w <- sweep(log_lik, 2, log(pmax(phi_new, 1e-10)), "+")
# w <- exp(log_w - apply(log_w, 1, max))
# w <- w / rowSums(w)
# ss <- ss + sum(w * D^2)
# }
# sigma2 <- max(ss / (I * K), 1e-6)
#
# history <- rbind(history, data.frame(iter = iter, mu_0 = mu_0_new, ll = ll))
#
# if (iter <= 5 || iter %% 10 == 0) {
# cat(sprintf("Iter %d: LL=%.1f, μ₀=%.4f, σ=%.4f\n", iter, ll, mu_0_new, sqrt(sigma2)))
# }
#
# if (abs(mu_0_new - mu_0_old) < tol) {
# cat("Converged!\n")
# break
# }
#
# mu_0 <- mu_0_new
# phi <- phi_new
# }
#
# # Compute E[S]
# E_S <- sapply(1:K, function(k) sum(S[k, ] * phi) / sum(phi))
#
# list(
# theta = theta,
# phi = phi,
# S = S,
# E_S = E_S,
# mu_0 = mu_0,
# sigma = sqrt(sigma2),
# history = history,
# phi_uniform_dist = max(abs(phi - 1))
# )
# }
#
# # Fit KPT-GEM
# fit_kepler <- kpt_gem_stellar(kpt_kepler$Y, L_theta = 64, max_iter = 100)
#
# cat("\n=== KPT-GEM RESULTS ===\n")
# cat(sprintf("Estimated μ₀: %.4f\n", fit_kepler$mu_0))
# cat(sprintf("Noise σ: %.4f\n", fit_kepler$sigma))
# cat(sprintf("φ deviation from uniform: %.4f\n", fit_kepler$phi_uniform_dist))
First, implement KPT-GEM.
# ==============================================================================
# KPT-GEM FOR STELLAR DATA
# ==============================================================================
#
# Updates:
# 1. Convergence checks all parameters (φ, S, σ), not just μ₀
# 2. Minimum iterations to ensure exploration
# 3. Better initialization with small random perturbations
# 4. Handles standardized data properly
#
# ==============================================================================
#' KPT-GEM Algorithm for Stellar Light Curves (Fixed)
#'
#' @param Y Data matrix (n_stars × n_windows), can be standardized
#' @param L_theta Number of theta grid points
#' @param n_harmonics Number of Fourier harmonics for S(θ)
#' @param smooth_kappa Smoothing parameter for φ (higher = less smooth)
#' @param min_iter Minimum iterations before checking convergence
#' @param max_iter Maximum iterations
#' @param tol Convergence tolerance
#' @param verbose Print progress
#' @return List with θ, φ, S, predictions, and diagnostics
kpt_gem_stellar <- function(Y,
L_theta = 64,
n_harmonics = 2,
smooth_kappa = 10,
min_iter = 10,
max_iter = 100,
tol = 1e-5,
verbose = TRUE) {
I <- nrow(Y) # Number of stars (realizations)
K <- ncol(Y) # Number of time windows
# Theta grid
theta <- seq(0, 2*pi, length.out = L_theta + 1)[1:L_theta]
dtheta <- 2 * pi / L_theta
# Fourier basis
cos_basis <- sapply(1:n_harmonics, function(j) cos(j * theta))
sin_basis <- sapply(1:n_harmonics, function(j) sin(j * theta))
# =========================================================================
# INITIALIZATION (with small perturbations to break symmetry)
# =========================================================================
mu_0 <- mean(Y, na.rm = TRUE)
sigma2 <- var(as.vector(Y), na.rm = TRUE)
sigma2 <- max(sigma2, 0.01)
# Initialize φ with small random perturbations from uniform
set.seed(123) # For reproducibility
phi <- rep(1, L_theta) + rnorm(L_theta, 0, 0.1)
phi <- pmax(phi, 0.1)
phi <- phi / mean(phi) # Normalize to mean-1
# Initialize S with window means + small θ-dependent variation
window_means <- colMeans(Y, na.rm = TRUE)
window_sds <- apply(Y, 2, sd, na.rm = TRUE)
window_sds[!is.finite(window_sds)] <- 1
S <- matrix(0, K, L_theta)
for (k in 1:K) {
# Add small sinusoidal perturbation to break symmetry
S[k, ] <- window_means[k] + 0.1 * window_sds[k] * sin(theta + runif(1, 0, 2*pi))
}
# =========================================================================
# TRACKING
# =========================================================================
history <- data.frame(
iter = integer(),
ll = numeric(),
mu_0 = numeric(),
sigma = numeric(),
phi_change = numeric(),
S_change = numeric()
)
if (verbose) {
cat("\n")
cat("════════════════════════════════════════════════════════════════\n")
cat(" KPT-GEM FOR STELLAR DATA \n")
cat("════════════════════════════════════════════════════════════════\n")
cat(sprintf("Data: %d stars × %d windows\n", I, K))
cat(sprintf("Theta grid: %d points, %d harmonics\n", L_theta, n_harmonics))
cat(sprintf("Initial μ₀: %.4f, σ: %.4f\n", mu_0, sqrt(sigma2)))
cat(sprintf("Min iterations: %d\n", min_iter))
cat("\n")
}
# =========================================================================
# EM ITERATIONS
# =========================================================================
for (iter in 1:max_iter) {
phi_old <- phi
S_old <- S
sigma2_old <- sigma2
# =====================================================================
# E-STEP: Compute posterior weights w_{ik}(θ)
# =====================================================================
W_sum <- matrix(0, K, L_theta) # Sum of weights per window
YW_sum <- matrix(0, K, L_theta) # Sum of y*weights per window
ll_total <- 0
for (k in 1:K) {
yk <- Y[, k]
valid <- is.finite(yk)
if (sum(valid) < 2) next
yk_obs <- yk[valid]
n_obs <- length(yk_obs)
# D[i, l] = y_i - S_k(θ_l)
D <- outer(yk_obs, S[k, ], "-")
# Log-likelihood: log N(y | S(θ), σ²)
log_lik <- -D^2 / (2 * sigma2) - 0.5 * log(2 * pi * sigma2)
# Log prior: log φ(θ)
log_prior <- log(pmax(phi, 1e-12))
# Log posterior (unnormalized)
log_w <- sweep(log_lik, 2, log_prior, "+")
# Normalize (log-sum-exp for stability)
log_w_max <- apply(log_w, 1, max)
w <- exp(log_w - log_w_max)
w <- w / rowSums(w)
# Accumulate sufficient statistics
W_sum[k, ] <- colSums(w)
YW_sum[k, ] <- as.numeric(crossprod(w, yk_obs))
# Log-likelihood contribution
ll_total <- ll_total + sum(log_w_max + log(rowSums(exp(log_w - log_w_max))))
}
# =====================================================================
# M-STEP: Update parameters
# =====================================================================
# --- Update φ (pooled across windows) ---
phi_raw <- colSums(W_sum) + 0.01 # Small regularization
phi_new <- phi_raw / sum(phi_raw) * L_theta # Mean-1 normalization
# Circular smoothing via von Mises kernel
if (smooth_kappa > 0) {
kernel <- exp(smooth_kappa * cos(theta - theta[1]))
kernel <- kernel / sum(kernel)
phi_new <- Re(fft(fft(phi_new) * fft(kernel), inverse = TRUE)) / L_theta
phi_new <- pmax(phi_new, 1e-8)
phi_new <- phi_new / mean(phi_new)
}
# --- Update S for each window ---
S_new <- matrix(0, K, L_theta)
for (k in 1:K) {
w_l <- pmax(W_sum[k, ], 1e-10)
y_l <- YW_sum[k, ] / w_l # Weighted mean at each θ
# Window mean (φ-weighted)
mu_k <- sum(y_l * phi_new) / sum(phi_new)
y_centered <- y_l - mu_k
# Fit Fourier harmonics to residuals
a_coef <- b_coef <- numeric(n_harmonics)
for (j in 1:n_harmonics) {
a_coef[j] <- 2 * sum(y_centered * cos_basis[, j] * phi_new) / sum(phi_new)
b_coef[j] <- 2 * sum(y_centered * sin_basis[, j] * phi_new) / sum(phi_new)
}
# Regularization (shrink toward flat)
shrink <- 0.8
a_coef <- a_coef * shrink
b_coef <- b_coef * shrink
# Reconstruct S
S_new[k, ] <- mu_k
for (j in 1:n_harmonics) {
S_new[k, ] <- S_new[k, ] + a_coef[j] * cos_basis[, j] + b_coef[j] * sin_basis[, j]
}
}
# --- Update μ₀ ---
mu_0_new <- mean(S_new)
# --- Update σ² ---
ss_total <- 0
n_total <- 0
for (k in 1:K) {
yk <- Y[, k]
valid <- is.finite(yk)
if (sum(valid) < 2) next
yk_obs <- yk[valid]
D <- outer(yk_obs, S_new[k, ], "-")
log_lik <- -D^2 / (2 * sigma2)
log_w <- sweep(log_lik, 2, log(pmax(phi_new, 1e-12)), "+")
log_w_max <- apply(log_w, 1, max)
w <- exp(log_w - log_w_max)
w <- w / rowSums(w)
ss_total <- ss_total + sum(w * D^2)
n_total <- n_total + length(yk_obs)
}
sigma2_new <- max(ss_total / max(n_total, 1), 0.001)
# =====================================================================
# CONVERGENCE CHECK (on ALL parameters)
# =====================================================================
phi_change <- max(abs(phi_new - phi_old))
S_change <- max(abs(S_new - S_old))
sigma_change <- abs(sqrt(sigma2_new) - sqrt(sigma2_old))
max_change <- max(phi_change, S_change, sigma_change)
history <- rbind(history, data.frame(
iter = iter,
ll = ll_total,
mu_0 = mu_0_new,
sigma = sqrt(sigma2_new),
phi_change = phi_change,
S_change = S_change
))
if (verbose && (iter <= 5 || iter %% 10 == 0 ||
(iter >= min_iter && max_change < tol))) {
cat(sprintf("Iter %3d: LL=%9.1f μ₀=%8.4f σ=%6.4f Δφ=%.2e ΔS=%.2e\n",
iter, ll_total, mu_0_new, sqrt(sigma2_new), phi_change, S_change))
}
# Update parameters
phi <- phi_new
S <- S_new
mu_0 <- mu_0_new
sigma2 <- sigma2_new
# Check convergence (only after min_iter)
if (iter >= min_iter && max_change < tol) {
if (verbose) cat("Converged!\n")
break
}
}
# =========================================================================
# COMPUTE FINAL QUANTITIES
# =========================================================================
# E[S(θ)] for each window
E_S <- sapply(1:K, function(k) sum(S[k, ] * phi) / sum(phi))
# Overall KPT estimate
kpt_estimate <- mean(E_S)
# Diagnostics
phi_entropy <- -sum((phi / L_theta) * log(phi / L_theta + 1e-12))
phi_uniform_dist <- max(abs(phi - 1))
S_variation <- mean(apply(S, 1, sd))
if (verbose) {
cat("\n")
cat("═══════════════════════════════════════════════════════════════\n")
cat(" KPT-GEM RESULTS \n")
cat("═══════════════════════════════════════════════════════════════\n")
cat(sprintf("Estimated μ₀: %.4f\n", mu_0))
cat(sprintf("KPT estimate (mean E[S]): %.4f\n", kpt_estimate))
cat(sprintf("Noise σ: %.4f\n", sqrt(sigma2)))
cat(sprintf("φ entropy: %.4f (max = %.4f for uniform)\n",
phi_entropy, log(L_theta)))
cat(sprintf("φ deviation from uniform: %.4f\n", phi_uniform_dist))
cat(sprintf("S average variation: %.4f\n", S_variation))
cat(sprintf("Iterations: %d\n", iter))
if (phi_uniform_dist < 0.05 && S_variation < 0.1) {
cat("\nNote: Minimal phase structure detected.\n")
cat(" Data appears to have no significant θ-dependence.\n")
} else if (phi_uniform_dist > 0.1) {
cat("\nPhase structure detected! φ(θ) is non-uniform.\n")
}
cat("═══════════════════════════════════════════════════════════════\n")
}
list(
# Grid
theta = theta,
L_theta = L_theta,
# Estimated parameters
mu_0 = mu_0,
phi = phi,
S = S,
sigma = sqrt(sigma2),
# Derived quantities
E_S = E_S,
kpt_estimate = kpt_estimate,
# Diagnostics
history = history,
phi_entropy = phi_entropy,
phi_uniform_dist = phi_uniform_dist,
S_variation = S_variation,
# Config
n_harmonics = n_harmonics,
converged = (iter < max_iter)
)
}
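The numerical core of the E-step is the log-sum-exp normalization of the posterior weights \(w_{ik}(\theta)\), which avoids underflow when \(\exp(-D^2/2\sigma^2)\) is tiny. A minimal sketch with made-up toy values (a 3-point θ grid, 3 observations):

```r
# Toy E-step: posterior weights over theta via log-sum-exp (assumed values).
sigma2 <- 1
S_k    <- c(-0.5, 0.0, 0.5)     # S_k(theta_l) on a 3-point grid
phi    <- c(1.2, 0.9, 0.9)      # mean-1 phase density
yk     <- c(-0.4, 0.1, 0.6)     # observations in window k
D      <- outer(yk, S_k, "-")   # residual y_i - S_k(theta_l)
log_w  <- -D^2 / (2 * sigma2)                 # Gaussian log-likelihood
log_w  <- sweep(log_w, 2, log(phi), "+")      # + log prior
w_max  <- apply(log_w, 1, max)
w      <- exp(log_w - w_max)                  # subtract max before exp
w      <- w / rowSums(w)                      # each row is a distribution
rowSums(w)                                    # all exactly 1
```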
# Fit KPT-GEM
fit_gem_kepler <- kpt_gem_stellar(kpt_kepler$Y, L_theta = 64, max_iter = 100)
##
## ════════════════════════════════════════════════════════════════
## KPT-GEM FOR STELLAR DATA
## ════════════════════════════════════════════════════════════════
## Data: 75 stars × 28 windows
## Theta grid: 64 points, 2 harmonics
## Initial μ₀: -0.0000, σ: 0.9935
## Min iterations: 10
##
## Iter 1: LL= 5768.0 μ₀= -0.0000 σ=0.9917 Δφ=2.14e-01 ΔS=2.67e-02
## Iter 2: LL= 5768.0 μ₀= -0.0000 σ=0.9923 Δφ=1.65e-02 ΔS=1.81e-02
## Iter 3: LL= 5768.0 μ₀= -0.0000 σ=0.9926 Δφ=9.97e-03 ΔS=1.30e-02
## Iter 4: LL= 5768.0 μ₀= -0.0000 σ=0.9929 Δφ=6.38e-03 ΔS=1.04e-02
## Iter 5: LL= 5768.0 μ₀= -0.0000 σ=0.9930 Δφ=4.33e-03 ΔS=8.65e-03
## Iter 10: LL= 5768.0 μ₀= -0.0000 σ=0.9933 Δφ=1.14e-03 ΔS=3.29e-03
## Iter 20: LL= 5768.0 μ₀= -0.0000 σ=0.9933 Δφ=3.08e-04 ΔS=3.91e-04
## Iter 30: LL= 5768.0 μ₀= -0.0000 σ=0.9933 Δφ=1.51e-04 ΔS=4.33e-05
## Iter 40: LL= 5768.0 μ₀= -0.0000 σ=0.9933 Δφ=8.52e-05 ΔS=4.71e-06
## Iter 50: LL= 5768.0 μ₀= -0.0000 σ=0.9933 Δφ=4.97e-05 ΔS=5.09e-07
## Iter 60: LL= 5768.0 μ₀= -0.0000 σ=0.9933 Δφ=2.92e-05 ΔS=5.49e-08
## Iter 70: LL= 5768.0 μ₀= -0.0000 σ=0.9933 Δφ=1.72e-05 ΔS=5.91e-09
## Iter 80: LL= 5768.0 μ₀= -0.0000 σ=0.9933 Δφ=1.01e-05 ΔS=6.36e-10
## Iter 81: LL= 5768.0 μ₀= -0.0000 σ=0.9933 Δφ=9.57e-06 ΔS=5.09e-10
## Converged!
##
## ═══════════════════════════════════════════════════════════════
## KPT-GEM RESULTS
## ═══════════════════════════════════════════════════════════════
## Estimated μ₀: -0.0000
## KPT estimate (mean E[S]): 0.0000
## Noise σ: 0.9933
## φ entropy: 4.1589 (max = 4.1589 for uniform)
## φ deviation from uniform: 0.0002
## S average variation: 0.0000
## Iterations: 81
##
## Note: Minimal phase structure detected.
## Data appears to have no significant θ-dependence.
## ═══════════════════════════════════════════════════════════════
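Inside the M-step, φ is smoothed by circular convolution with a von Mises kernel computed through the FFT. A standalone sketch (toy grid and spike, not the fitted φ) showing that convolution with a kernel summing to one preserves the mean, and hence the mean-1 normalization:

```r
# Circular von Mises smoothing of a spiky density via FFT (toy example).
L <- 64
theta <- seq(0, 2 * pi, length.out = L + 1)[1:L]
phi <- rep(1, L); phi[10] <- 5                 # spike at one grid point
kappa <- 10
kernel <- exp(kappa * cos(theta))              # von Mises kernel centered at 0
kernel <- kernel / sum(kernel)                 # normalize to sum 1
phi_s <- Re(fft(fft(phi) * fft(kernel), inverse = TRUE)) / L  # circular conv.
c(mean(phi), mean(phi_s))                      # convolution keeps the mean
```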
Next, implement KPT-FFT.
# ==============================================================================
# KPT-FFT FOR STELLAR DATA
# ==============================================================================
#
# Spectral Wiener-Sobolev algorithm for kime-phase tomography.
# Uses FFT for efficient computation with Sobolev regularization.
#
# ==============================================================================
#' KPT-FFT Algorithm for Stellar Light Curves
#'
#' @param Y Data matrix (n_stars × n_windows)
#' @param J Number of Fourier harmonics (modes from -J to J)
#' @param L_theta Number of theta grid points
#' @param lambda_phi Regularization strength for φ
#' @param lambda_S Regularization strength for S
#' @param p_sobolev Sobolev smoothness order
#' @param min_iter Minimum iterations
#' @param max_iter Maximum iterations
#' @param tol Convergence tolerance
#' @param verbose Print progress
#' @return List with θ, φ, S, predictions, and diagnostics
kpt_fft_stellar <- function(Y,
J = 4,
L_theta = 64,
lambda_phi = 0.1,
lambda_S = 0.3,
p_sobolev = 1,
min_iter = 10,
max_iter = 100,
tol = 1e-5,
verbose = TRUE) {
I <- nrow(Y) # Stars
K <- ncol(Y) # Windows
# Theta grid
theta <- seq(0, 2*pi, length.out = L_theta + 1)[1:L_theta]
# Harmonic indices: n ∈ {-J, ..., J}
nvec <- seq(-J, J)
Nn <- length(nvec)
# DFT frequencies for Sobolev penalty
omega <- 2 * pi * (0:(Nn - 1)) / Nn
# Sobolev penalty weights: Λ_n = (2 sin(ω/2))^{2p}
Lambda <- (2 * sin(0.5 * omega))^(2 * p_sobolev)
Lambda[1] <- Lambda[1] + 0.01 # Regularize DC component
# =========================================================================
# INITIALIZATION
# =========================================================================
mu_0 <- mean(Y, na.rm = TRUE)
sigma2 <- var(as.vector(Y), na.rm = TRUE)
sigma2 <- max(sigma2, 0.01)
# φ per window (will be pooled later)
phi <- matrix(1.0, K, L_theta)
# Add small perturbations
set.seed(456)
for (k in 1:K) {
phi[k, ] <- 1 + rnorm(L_theta, 0, 0.05)
phi[k, ] <- pmax(phi[k, ], 0.1)
phi[k, ] <- phi[k, ] / mean(phi[k, ])
}
# S initialized near window means
S <- matrix(0, K, L_theta)
window_means <- colMeans(Y, na.rm = TRUE)
window_sds <- apply(Y, 2, sd, na.rm = TRUE)
window_sds[!is.finite(window_sds)] <- 1
for (k in 1:K) {
S[k, ] <- window_means[k] + 0.1 * window_sds[k] * cos(theta + runif(1, 0, 2*pi))
}
# Basis matrices for DFT
Eneg <- exp(-1i * outer(theta, nvec)) # e^{-inθ}: L_theta × Nn
Epos <- exp(1i * outer(nvec, theta)) # e^{inθ}: Nn × L_theta
# Initialize Fourier coefficients of S
f_n <- matrix(0+0i, K, Nn)
for (k in 1:K) {
f_n[k, ] <- as.vector(crossprod(S[k, ], Eneg)) / L_theta
}
# =========================================================================
# FFT HELPER FUNCTIONS
# =========================================================================
# Map coefficient vector (indexed by nvec) to DFT order
coeffs_to_dft <- function(x_n) {
x_m <- complex(Nn)
for (i in seq_along(nvec)) {
n <- nvec[i]
m <- ((n %% Nn) + Nn) %% Nn
x_m[m + 1] <- x_n[i]
}
x_m
}
# Map DFT order back to coefficient vector
dft_to_coeffs <- function(x_m) {
x_n <- complex(Nn)
for (i in seq_along(nvec)) {
n <- nvec[i]
m <- ((n %% Nn) + Nn) %% Nn
x_n[i] <- x_m[m + 1]
}
x_n
}
# DFT via FFT
Xomega <- function(x_n) fft(coeffs_to_dft(x_n))
coeffs <- function(X_w) dft_to_coeffs(fft(X_w, inverse = TRUE) / Nn)
# =========================================================================
# TRACKING
# =========================================================================
history <- data.frame(
iter = integer(),
mu_0 = numeric(),
sigma = numeric(),
delta_phi = numeric(),
delta_f = numeric()
)
if (verbose) {
cat("\n")
cat("════════════════════════════════════════════════════════════════\n")
cat(" KPT-FFT FOR STELLAR DATA (Spectral Wiener-Sobolev) \n")
cat("════════════════════════════════════════════════════════════════\n")
cat(sprintf("Data: %d stars × %d windows\n", I, K))
cat(sprintf("Fourier modes: J = %d (indices -%d to %d)\n", J, J, J))
cat(sprintf("Regularization: λ_φ = %.2f, λ_S = %.2f\n", lambda_phi, lambda_S))
cat(sprintf("Initial μ₀: %.4f, σ: %.4f\n", mu_0, sqrt(sigma2)))
cat("\n")
}
# =========================================================================
# ALTERNATING UPDATES
# =========================================================================
for (iter in 1:max_iter) {
phi_old <- phi
f_old <- f_n
# =====================================================================
# E-STEP: Compute empirical Fourier moments
# =====================================================================
m_hat <- matrix(0+0i, K, Nn)
for (k in 1:K) {
yk <- Y[, k]
valid <- is.finite(yk)
if (sum(valid) < 2) next
yk_obs <- yk[valid]
# Posterior weights
prior_k <- pmax(phi[k, ], 1e-10)
D <- outer(yk_obs, S[k, ], "-")
log_lik <- -D^2 / (2 * sigma2)
log_w <- sweep(log_lik, 2, log(prior_k), "+")
log_w_max <- apply(log_w, 1, max)
w <- exp(log_w - log_w_max)
w <- w / rowSums(w)
# Empirical moment: m_hat[k, n] = E[y · e^{-inθ}]
xi <- w %*% Eneg # I × Nn
m_hat[k, ] <- colMeans(xi * yk_obs)
}
# =====================================================================
# Φ-STEP: Update phase distribution via Wiener filter
# =====================================================================
phi_hat_n <- matrix(0+0i, K, Nn)
for (k in 1:K) {
F_w <- Xomega(f_n[k, ])
M_w <- Xomega(m_hat[k, ])
# Wiener filter: Φ = conj(F) · M / (|F|² + λ·Λ)
denom <- Mod(F_w)^2 + lambda_phi * Lambda + 1e-8
Phi_w <- Conj(F_w) * M_w / denom
phi_hat_n[k, ] <- coeffs(Phi_w)
}
# Synthesize φ(θ) from Fourier coefficients
phi_new <- matrix(0, K, L_theta)
for (k in 1:K) {
phi_new[k, ] <- Re(drop(phi_hat_n[k, ] %*% Epos))
}
# Enforce positivity and mean-1 normalization
phi_new <- pmax(phi_new, 1e-8)
for (k in 1:K) {
phi_new[k, ] <- phi_new[k, ] / mean(phi_new[k, ])
}
# =====================================================================
# F-STEP: Update surface coefficients via Wiener filter
# =====================================================================
f_new <- matrix(0+0i, K, Nn)
for (k in 1:K) {
# Get Fourier coefficients of φ_k
phi_coeffs <- as.vector(crossprod(phi_new[k, ], Eneg)) / L_theta
Phi_w <- Xomega(phi_coeffs)
M_w <- Xomega(m_hat[k, ])
# Wiener filter: F = conj(Φ) · M / (|Φ|² + λ·Λ)
denom <- Mod(Phi_w)^2 + lambda_S * Lambda + 1e-8
F_w <- Conj(Phi_w) * M_w / denom
f_new[k, ] <- coeffs(F_w)
}
# Enforce Hermitian symmetry (ensures real-valued S)
for (k in 1:K) {
for (i in seq_along(nvec)) {
n <- nvec[i]
if (n < 0) {
j <- which(nvec == -n)
if (length(j) == 1) {
f_new[k, i] <- Conj(f_new[k, j])
}
}
}
# DC component must be real
i0 <- which(nvec == 0)
f_new[k, i0] <- Re(f_new[k, i0])
}
# Synthesize S(θ) from Fourier coefficients
S_new <- matrix(0, K, L_theta)
for (k in 1:K) {
S_new[k, ] <- Re(drop(f_new[k, ] %*% Epos))
}
# =====================================================================
# UPDATE GLOBAL PARAMETERS
# =====================================================================
# Update μ₀ from DC components
i0 <- which(nvec == 0)
mu_0_new <- mean(Re(f_new[, i0]))
# Update σ²
ss_total <- 0
n_total <- 0
for (k in 1:K) {
yk <- Y[, k]
valid <- is.finite(yk)
if (sum(valid) < 2) next
yk_obs <- yk[valid]
D <- outer(yk_obs, S_new[k, ], "-")
log_lik <- -D^2 / (2 * sigma2)
log_w <- sweep(log_lik, 2, log(pmax(phi_new[k, ], 1e-10)), "+")
log_w_max <- apply(log_w, 1, max)
w <- exp(log_w - log_w_max)
w <- w / rowSums(w)
ss_total <- ss_total + sum(w * D^2)
n_total <- n_total + length(yk_obs)
}
sigma2_new <- max(ss_total / max(n_total, 1), 0.001)
# =====================================================================
# CONVERGENCE CHECK
# =====================================================================
delta_phi <- sqrt(mean((phi_new - phi_old)^2))
delta_f <- sqrt(mean(Mod(f_new - f_old)^2))
max_change <- max(delta_phi, delta_f)
history <- rbind(history, data.frame(
iter = iter,
mu_0 = mu_0_new,
sigma = sqrt(sigma2_new),
delta_phi = delta_phi,
delta_f = delta_f
))
if (verbose && (iter <= 5 || iter %% 10 == 0 ||
(iter >= min_iter && max_change < tol))) {
cat(sprintf("Iter %3d: μ₀=%8.4f σ=%6.4f Δφ=%.2e Δf=%.2e\n",
iter, mu_0_new, sqrt(sigma2_new), delta_phi, delta_f))
}
# Update parameters
mu_0 <- mu_0_new
phi <- phi_new
f_n <- f_new
S <- S_new
sigma2 <- sigma2_new
# Check convergence (only after min_iter)
if (iter >= min_iter && max_change < tol) {
if (verbose) cat("Converged!\n")
break
}
}
# =========================================================================
# COMPUTE FINAL QUANTITIES
# =========================================================================
# Pool φ across windows (average)
phi_pooled <- colMeans(phi)
phi_pooled <- phi_pooled / mean(phi_pooled)
# E[S(θ)] for each window
E_S <- sapply(1:K, function(k) sum(S[k, ] * phi_pooled) / sum(phi_pooled))
# Overall estimate
kpt_estimate <- mean(E_S)
# Diagnostics
phi_uniform_dist <- max(abs(phi_pooled - 1))
S_variation <- mean(apply(S, 1, sd))
if (verbose) {
cat("\n")
cat("═══════════════════════════════════════════════════════════════\n")
cat(" KPT-FFT RESULTS \n")
cat("═══════════════════════════════════════════════════════════════\n")
cat(sprintf("Estimated μ₀: %.4f\n", mu_0))
cat(sprintf("KPT estimate (mean E[S]): %.4f\n", kpt_estimate))
cat(sprintf("Noise σ: %.4f\n", sqrt(sigma2)))
cat(sprintf("φ (pooled) deviation from uniform: %.4f\n", phi_uniform_dist))
cat(sprintf("S average variation: %.4f\n", S_variation))
cat(sprintf("Iterations: %d\n", iter))
if (phi_uniform_dist < 0.05 && S_variation < 0.1) {
cat("\nNote: Minimal phase structure detected.\n")
} else if (phi_uniform_dist > 0.1) {
cat("\nPhase structure detected! φ(θ) is non-uniform.\n")
}
cat("═══════════════════════════════════════════════════════════════\n")
}
list(
# Grid
theta = theta,
L_theta = L_theta,
J = J,
nvec = nvec,
# Estimated parameters
mu_0 = mu_0,
phi = phi,
phi_pooled = phi_pooled,
S = S,
f_n = f_n,
sigma = sqrt(sigma2),
# Derived quantities
E_S = E_S,
kpt_estimate = kpt_estimate,
# Diagnostics
history = history,
phi_uniform_dist = phi_uniform_dist,
S_variation = S_variation,
# Config
converged = (iter < max_iter)
)
}
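The Φ- and F-steps above are Wiener filters in the DFT domain with Sobolev weights \(\Lambda_n = (2\sin(\omega/2))^{2p}\). A self-contained sketch under assumed toy inputs (the signal, blur kernel, and λ are chosen for illustration): recover a periodic signal \(x\) from a circular blur \(y = h \circledast x\).

```r
# Wiener deconvolution with a Sobolev roughness penalty (toy example).
N <- 32; p <- 1
omega  <- 2 * pi * (0:(N - 1)) / N
Lambda <- (2 * sin(omega / 2))^(2 * p)          # Sobolev penalty weights
x <- cos(2 * pi * (0:(N - 1)) / N)              # true periodic signal
h <- rep(0, N); h[1:3] <- 1 / 3                 # 3-tap circular blur kernel
y <- Re(fft(fft(x) * fft(h), inverse = TRUE)) / N  # blurred observation
H <- fft(h); Yw <- fft(y)
lambda <- 1e-3
Xw <- Conj(H) * Yw / (Mod(H)^2 + lambda * Lambda)  # Wiener-Sobolev estimate
x_hat <- Re(fft(Xw, inverse = TRUE)) / N
max(abs(x_hat - x))                             # small reconstruction error
```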
# Fit KPT-FFT
fit_fft_kepler <- kpt_fft_stellar(kpt_kepler$Y, L_theta = 64, max_iter = 100)
##
## ════════════════════════════════════════════════════════════════
## KPT-FFT FOR STELLAR DATA (Spectral Wiener-Sobolev)
## ════════════════════════════════════════════════════════════════
## Data: 75 stars × 28 windows
## Fourier modes: J = 4 (indices -4 to 4)
## Regularization: λ_φ = 0.10, λ_S = 0.30
## Initial μ₀: -0.0000, σ: 0.9935
##
## Iter 1: μ₀= 0.0010 σ=0.9931 Δφ=1.58e+00 Δf=1.51e-02
## Iter 2: μ₀= 0.0004 σ=0.9933 Δφ=1.28e+00 Δf=8.84e-03
## Iter 3: μ₀= 0.0004 σ=0.9932 Δφ=2.13e+00 Δf=2.64e-03
## Iter 4: μ₀= 0.0000 σ=0.9933 Δφ=2.35e+00 Δf=1.85e-03
## Iter 5: μ₀= 0.0001 σ=0.9933 Δφ=1.96e+00 Δf=1.49e-03
## Iter 10: μ₀= 0.0000 σ=0.9933 Δφ=1.24e+00 Δf=3.99e-04
## Iter 20: μ₀= -0.0000 σ=0.9933 Δφ=3.43e-01 Δf=1.42e-04
## Iter 30: μ₀= -0.0000 σ=0.9933 Δφ=3.18e-01 Δf=5.32e-05
## Iter 40: μ₀= -0.0000 σ=0.9933 Δφ=3.78e-01 Δf=2.22e-05
## Iter 50: μ₀= -0.0000 σ=0.9933 Δφ=3.75e-01 Δf=9.20e-06
## Iter 60: μ₀= -0.0000 σ=0.9933 Δφ=4.22e-01 Δf=4.14e-06
## Iter 70: μ₀= -0.0000 σ=0.9933 Δφ=3.58e-01 Δf=2.13e-06
## Iter 80: μ₀= -0.0000 σ=0.9933 Δφ=2.75e-01 Δf=1.33e-06
## Iter 90: μ₀= -0.0000 σ=0.9933 Δφ=1.21e-01 Δf=5.80e-07
## Iter 97: μ₀= -0.0000 σ=0.9933 Δφ=0.00e+00 Δf=7.83e-08
## Converged!
##
## ═══════════════════════════════════════════════════════════════
## KPT-FFT RESULTS
## ═══════════════════════════════════════════════════════════════
## Estimated μ₀: -0.0000
## KPT estimate (mean E[S]): -0.0000
## Noise σ: 0.9933
## φ (pooled) deviation from uniform: 0.0000
## S average variation: 0.0000
## Iterations: 97
##
## Note: Minimal phase structure detected.
## ═══════════════════════════════════════════════════════════════
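`kpt_fft_stellar()` enforces Hermitian symmetry (\(f_{-n} = \overline{f_n}\), real DC term) before synthesizing \(S(\theta)\); that is what guarantees a real-valued surface. A minimal check on an arbitrary coefficient vector:

```r
# Hermitian-symmetrized Fourier coefficients synthesize a real curve.
J <- 4
nvec <- seq(-J, J)
set.seed(7)
f <- complex(real = rnorm(length(nvec)), imaginary = rnorm(length(nvec)))
for (i in seq_along(nvec)) {
  if (nvec[i] < 0) f[i] <- Conj(f[nvec == -nvec[i]])  # f_{-n} = conj(f_n)
}
f[nvec == 0] <- Re(f[nvec == 0])                      # DC term must be real
theta <- seq(0, 2 * pi, length.out = 65)[1:64]
S <- drop(f %*% exp(1i * outer(nvec, theta)))         # sum_n f_n e^{i n theta}
max(abs(Im(S)))                                       # ~0: synthesis is real
```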
Finally, let’s test the KPT (GEM and FFT) algorithm predictions with the Kepler data.
# ==============================================================================
# PREDICTION FUNCTIONS FOR STELLAR KPT
# ==============================================================================
#' Generate predictions from KPT-GEM fit
#'
#' @param fit Output from kpt_gem_stellar()
#' @param n_new Number of new windows to predict
#' @param n_draw Number of Monte Carlo draws
kpt_gem_predict <- function(fit, n_new, n_draw = 5000) {
phi <- fit$phi
S_avg <- colMeans(fit$S)
L <- fit$L_theta
# Convert φ to probability
p_theta <- pmax(phi, 1e-10)
p_theta <- p_theta / sum(p_theta)
draws <- matrix(NA, n_draw, n_new)
for (k in 1:n_new) {
theta_idx <- sample(L, n_draw, replace = TRUE, prob = p_theta)
draws[, k] <- S_avg[theta_idx] + rnorm(n_draw, 0, fit$sigma)
}
list(
point = rep(fit$kpt_estimate, n_new),
mean = colMeans(draws),
sd = apply(draws, 2, sd),
draws = draws,
pi_low = apply(draws, 2, quantile, 0.025),
pi_high = apply(draws, 2, quantile, 0.975)
)
}
#' Generate predictions from KPT-FFT fit
#'
#' @param fit Output from kpt_fft_stellar()
#' @param n_new Number of new windows to predict
#' @param n_draw Number of Monte Carlo draws
kpt_fft_predict <- function(fit, n_new, n_draw = 5000) {
phi <- fit$phi_pooled
S_avg <- colMeans(fit$S)
L <- fit$L_theta
p_theta <- pmax(phi, 1e-10)
p_theta <- p_theta / sum(p_theta)
draws <- matrix(NA, n_draw, n_new)
for (k in 1:n_new) {
theta_idx <- sample(L, n_draw, replace = TRUE, prob = p_theta)
draws[, k] <- S_avg[theta_idx] + rnorm(n_draw, 0, fit$sigma)
}
list(
point = rep(fit$kpt_estimate, n_new),
mean = colMeans(draws),
sd = apply(draws, 2, sd),
draws = draws,
pi_low = apply(draws, 2, quantile, 0.025),
pi_high = apply(draws, 2, quantile, 0.975)
)
}
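Both predictors work the same way: draw θ from the learned phase distribution, evaluate the window-averaged surface there, and add Gaussian noise; the 95% prediction interval is then read off the empirical quantiles of the draws. A toy sketch with an assumed 4-point grid:

```r
# Monte Carlo predictive draws from a discrete phi and averaged surface
# (all values here are illustrative, not the fitted Kepler quantities).
set.seed(42)
p_theta <- c(0.4, 0.3, 0.2, 0.1)   # discrete phase probabilities
S_avg   <- c(-1, 0, 1, 2)          # surface values on the grid
sigma   <- 0.5
n_draw  <- 5000
idx   <- sample(length(p_theta), n_draw, replace = TRUE, prob = p_theta)
draws <- S_avg[idx] + rnorm(n_draw, 0, sigma)
c(mean = mean(draws),
  lo = unname(quantile(draws, 0.025)),   # 95% prediction interval endpoints
  hi = unname(quantile(draws, 0.975)))
```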
# PLOT KPT-GEM and KPT-FFT Predictions
# ==============================================================================
# VISUALIZE KPT PREDICTIONS
# ==============================================================================
# Define test set (last n_test windows)
n_test <- 10   # number of trailing windows treated as the held-out test set
n_train <- kpt_kepler$n_windows - n_test
# Get observed test values
Y_test <- kpt_kepler$Y[, (n_train + 1):kpt_kepler$n_windows]
test_observed <- colMeans(Y_test, na.rm = TRUE)
test_sd <- apply(Y_test, 2, sd, na.rm = TRUE)
t_test <- kpt_kepler$t_center[(n_train + 1):kpt_kepler$n_windows]
# Generate predictions
pred_kpt_gem <- kpt_gem_predict(fit=fit_gem_kepler, n_new = n_test, n_draw = 5000)
pred_kpt_fft <- kpt_fft_predict(fit=fit_fft_kepler, n_new = n_test, n_draw = 5000)
# Plot
par(mfrow = c(1, 2), mar = c(4, 4, 3, 1))
###########################################################
# KPT-GEM
# 1. Predictions vs Observed
plot(1:n_test, test_observed, pch = 19, cex = 1.5, col = "black",
ylim = range(c(test_observed, pred_kpt_gem$pi_low, pred_kpt_gem$pi_high)),
main = "KPT-GEM Predictions vs Observed",
xlab = "Test Window", ylab = "Standardized Variability")
# Add prediction intervals
arrows(1:n_test, pred_kpt_gem$pi_low, 1:n_test, pred_kpt_gem$pi_high,
length = 0.05, angle = 90, code = 3, col = "forestgreen", lwd = 2)
points(1:n_test, pred_kpt_gem$mean, pch = 17, col = "forestgreen", cex = 1.2)
# Reference line at 0 (mean of standardized data)
abline(h = 0, lty = 2, col = "gray50")
legend("topright", c("Observed", "KPT Prediction", "95% PI"),
pch = c(19, 17, NA), lty = c(NA, NA, 1), col = c("black", "forestgreen", "forestgreen"),
pt.cex = c(1.5, 1.2, NA), lwd = c(NA, NA, 2))
# 2. Coverage check
coverage <- mean(test_observed >= pred_kpt_gem$pi_low &
test_observed <= pred_kpt_gem$pi_high)
# Residuals
residuals <- test_observed - pred_kpt_gem$mean
plot(1:n_test, residuals, pch = 19, col = "steelblue",
main = sprintf("Residuals (Coverage: %.0f%%)", 100 * coverage),
xlab = "Test Window", ylab = "Observed - Predicted")
abline(h = 0, lty = 2)
abline(h = c(-1.96, 1.96), lty = 3, col = "red")
##
## === PREDICTION SUMMARY ===
## RMSE: 0.0114
## Bias: -0.0064
## 95% PI Coverage: 100%
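The coverage statistic above is the fraction of held-out values that land inside their 95% prediction intervals. A synthetic sanity check where the intervals are exact by construction, so coverage should sit near 0.95:

```r
# Empirical coverage of exact 95% intervals on synthetic draws.
set.seed(99)
obs <- rnorm(10000)                       # "held-out" values
lo  <- qnorm(0.025); hi <- qnorm(0.975)   # true 95% interval
coverage <- mean(obs >= lo & obs <= hi)
round(coverage, 2)                        # close to 0.95
```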
###########################################################
# KPT-FFT
# 1. Predictions vs Observed
plot(1:n_test, test_observed, pch = 19, cex = 1.5, col = "black",
     ylim = range(c(test_observed, pred_kpt_fft$pi_low, pred_kpt_fft$pi_high)),
     main = "KPT-FFT Predictions vs Observed",
xlab = "Test Window", ylab = "Standardized Variability")
# Add prediction intervals
arrows(1:n_test, pred_kpt_fft$pi_low, 1:n_test, pred_kpt_fft$pi_high,
length = 0.05, angle = 90, code = 3, col = "forestgreen", lwd = 2)
points(1:n_test, pred_kpt_fft$mean, pch = 17, col = "forestgreen", cex = 1.2)
# Reference line at 0 (mean of standardized data)
abline(h = 0, lty = 2, col = "gray50")
legend("topright", c("Observed", "KPT-FFT Prediction", "95% PI"),
pch = c(19, 17, NA), lty = c(NA, NA, 1), col = c("black", "forestgreen", "forestgreen"),
       pt.cex = c(1.5, 1.2, NA), lwd = c(NA, NA, 2))
# 2. Coverage check
coverage <- mean(test_observed >= pred_kpt_fft$pi_low &
test_observed <= pred_kpt_fft$pi_high)
# Residuals
residuals <- test_observed - pred_kpt_fft$mean
plot(1:n_test, residuals, pch = 19, col = "steelblue",
main = sprintf("KPT-FFT: Residuals (Coverage: %.0f%%)", 100 * coverage),
xlab = "Test Window", ylab = "(KPT-FFT) Observed - Predicted")
abline(h = 0, lty = 2)
abline(h = c(-1.96, 1.96), lty = 3, col = "red")
##
## === KPT-FFT PREDICTION SUMMARY ===
## RMSE: 0.0066
## Bias: -0.0029
## 95% PI Coverage: 100%
par(mfrow = c(2, 2))
# 1. Convergence
plot(fit_gem_kepler$history$iter, fit_gem_kepler$history$mu_0, type = "l",
col = "green", lwd = 4,
main = "KPT-GEM/KPT-FFT: μ₀ Convergence", xlab = "Iteration", ylab = "μ₀")
lines(fit_fft_kepler$history$iter, fit_fft_kepler$history$mu_0,
col = "red", lwd = 2)
legend("bottomright", legend = c("KPT-GEM", "KPT-FFT"),
col = c("green", "red"), lwd = 3)
grid()
# 2. Learned φ(θ)
plot(fit_gem_kepler$theta, fit_gem_kepler$phi, type = "l", col = "green", lwd = 3,
main = "Learned Phase Distribution φ(θ)",
xlab = "θ", ylab = "φ(θ)", xaxt = "n")
lines(fit_fft_kepler$theta, fit_fft_kepler$phi_pooled,
col = "red", lwd = 2)
legend("bottomright", legend = c("KPT-GEM", "KPT-FFT"),
col = c("green", "red"), lwd = 3)
axis(1, at = c(0, pi/2, pi, 3*pi/2, 2*pi),
labels = c("0", "π/2", "π", "3π/2", "2π"))
abline(h = 1, lty = 2, col = "gray")
grid()
# 3. Kime surface (heatmap) -- KPT-GEM and KPT-FFT
image( # KPT-GEM
  x = fit_gem_kepler$theta,        # Length L_theta = 64
  y = 1:nrow(fit_gem_kepler$S),    # Length K = 28 (matches columns of t(S))
  z = t(fit_gem_kepler$S),         # Dim [64, 28]
main = "(KPT-GEM) Kime Surface S(θ, t)",
xlab = "θ",
ylab = "Time Window",
col = viridis::viridis(100)
)
filled.contour(
x = fit_gem_kepler$theta,
y = 1:nrow(fit_gem_kepler$S),
z = t(fit_gem_kepler$S),
color.palette = viridis::viridis,
main = "(KPT-GEM) Kime Surface S(θ, t)",
xlab = "θ",
ylab = "Time Window"
)
image( # KPT-FFT
x = fit_fft_kepler$theta, # Length 64
y = 1:nrow(fit_fft_kepler$S), # Length 19 (Matches columns of t(S))
z = t(fit_fft_kepler$S), # Dim [64, 19]
main = "(KPT-FFT) Kime Surface S(θ, t)",
xlab = "θ",
ylab = "Time Window",
col = viridis::viridis(100)
)
filled.contour(
x = fit_fft_kepler$theta,
y = 1:nrow(fit_fft_kepler$S),
z = t(fit_fft_kepler$S),
color.palette = viridis::viridis,
main = "(KPT-FFT) Kime Surface S(θ, t)",
xlab = "θ",
ylab = "Time Window"
)
# 4. Star-level variability vs mass
plot(kpt_kepler$metadata$mass, rowMeans(kpt_kepler$Y_raw),
pch = 19, col = scales::alpha("steelblue", 0.6),
main = "Variability vs Stellar Mass",
xlab = "Stellar Mass (M☉)", ylab = "Mean Variance (ppm²)")
grid()
par(mfrow = c(1, 1))
## ═══════════════════════════════════════════════════════════════════
## INTERPRETATION: KPT ON STELLAR LIGHT CURVES
## ═══════════════════════════════════════════════════════════════════
##
## 1. KPT-GEM PHASE STRUCTURE ANALYSIS
if (fit_gem_kepler$phi_uniform_dist < 0.2) {
cat(" φ(θ) ≈ uniform: No strong phase structure detected.\n")
cat(" Interpretation: Stellar variability is largely star-independent.\n")
cat(" The 'kime-phase' θ does not significantly affect variability.\n")
} else {
cat(" φ(θ) ≠ uniform: Significant phase structure detected!\n")
cat(" Interpretation: There may be systematic patterns in stellar\n")
cat(" variability that depend on some hidden 'phase' variable.\n")
}
## φ(θ) ≈ uniform: No strong phase structure detected.
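The threshold `phi_uniform_dist < 0.2` above quantifies how far the learned φ(θ) is from uniform. One way such a statistic can be computed is as a total-variation distance on the discretized θ-grid; the sketch below assumes `fit_gem_kepler$phi` stores the relative density (so uniform corresponds to 1, as in the horizontal reference line of the φ(θ) plot), and the fitted object's own metric may differ.

```r
# Sketch: total-variation distance of the learned phase law from uniform.
# Assumes `phi` is a relative density on an equispaced θ-grid over [0, 2π),
# with uniform corresponding to phi ≡ 1; the fitted object's own
# `phi_uniform_dist` may use a different metric.
phi <- fit_gem_kepler$phi
dth <- 2 * pi / length(phi)                  # θ-grid spacing
tv  <- 0.5 * sum(abs(phi - 1)) * dth / (2 * pi)
cat(sprintf("TV distance from uniform: %.3f\n", tv))
```

TV = 0 for an exactly uniform law and grows toward 1 as φ(θ) concentrates on a narrow band of phases.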
## Interpretation: Stellar variability is largely star-independent.
## The 'kime-phase' θ does not significantly affect variability.
##
## 2. PHYSICAL INTERPRETATION
## In this simulation, any uncovered phase θ could represent:
## - Stellar rotation phase (check against simulated random phases)
## - Activity cycle phase
## - Instrumental systematics common to all stars
##
## 3. KPT VALUE FOR STELLAR SCIENCE
## - Characterizes the 'typical' stellar variability pattern
## - Separates star-specific (replicate) from universal (process)
## - Could improve exoplanet transit detection
##
## 1. KPT-FFT PHASE STRUCTURE ANALYSIS
if (fit_fft_kepler$phi_uniform_dist < 0.2) {
cat(" φ(θ) ≈ uniform: No strong phase structure detected.\n")
cat(" Interpretation: Stellar variability is largely star-independent.\n")
cat(" The 'kime-phase' θ does not significantly affect variability.\n")
} else {
cat(" φ(θ) ≠ uniform: Significant phase structure detected!\n")
cat(" Interpretation: There may be systematic patterns in stellar\n")
cat(" variability that depend on some hidden 'phase' variable.\n")
}
## φ(θ) ≈ uniform: No strong phase structure detected.
## Interpretation: Stellar variability is largely star-independent.
## The 'kime-phase' θ does not significantly affect variability.
##
## ═══════════════════════════════════════════════════════════════════
These results do not falsify KPT; they represent a valid (and expected) outcome.
| Finding | Meaning |
|---|---|
| \(\varphi(\theta)\approx\) uniform | No hidden phase variable affects stellar variability |
| \(\mathbb{S}(\theta) \approx\) constant | Variability doesn’t depend on \(\theta\) |
| KPT estimate \(\approx 0\) | Correct! Standardized data has mean \(0\) |
| \(95\% \ PI \approx [-2, +2]\) | Matches standardized \(SD\approx 1\) |
In other words, the KPT modeling pipeline works correctly, and the results indicate a genuine null. The estimated phase law, \(\varphi \approx\) uniform, means that KPT found no exploitable structure with which to improve predictions; the null model, a simple Gaussian, is sufficient. This is like finding no significant effect in a well-designed experiment: it is still a valuable result.
In practice, KPT would recover \(\varphi \neq\) uniform if stars at certain rotation phases had systematically different variability, if hidden instrumental effects were correlated across stars, or if some physical process created phase-locked variability patterns.
The fact that KPT uncovered no such structure is itself informative: stellar noise in this sample appears to be genuinely stochastic.
##
## === NEXT STEPS FOR REAL DATA ===
##
## 1. DOWNLOAD ACTUAL KEPLER DATA:
## - Use lightkurve Python package via reticulate
## - Select 100-500 Sun-like stars from KIC
## - Use PDC-corrected flux for cleaned data
##
## 2. COMPARE WITH REAL STELLAR VARIABILITY:
## - Known rotation periods from McQuillan et al. (2014)
## - Activity indices (S-index, log R'HK)
## - Asteroseismic properties
##
## 3. VALIDATE KPT PREDICTIONS:
## - Train on subset of stars
## - Predict variability of held-out stars
## - Compare with actual variability metrics
##
## 4. EXTEND TO TESS:
## - 10× more stars
## - Multiple sectors for some stars
## - Different stellar populations
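Step 1 of the next-steps list can be sketched as follows. This is an untested outline that assumes a Python environment with `lightkurve` installed and visible to `reticulate`; the target name is purely illustrative.

```r
# Sketch (assumes lightkurve is installed in the Python environment that
# reticulate binds to). Downloads one Kepler long-cadence light curve;
# for Kepler products, lc$flux defaults to the PDCSAP column.
library(reticulate)
lk     <- import("lightkurve")
search <- lk$search_lightcurve("Kepler-10", mission = "Kepler",
                               cadence = "long")
lc     <- search$download()        # first matching quarter
flux   <- lc$flux$value            # PDC-corrected flux (step 1 above)
```

Looping this over 100-500 KIC-selected Sun-like stars, binning each light curve into 5-day windows, and computing per-window flux variances would reproduce the `Y_raw` matrix described in the data dictionary from real data.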