1 Spacetime IID vs. Spacekime Sampling

For all natural spacetime processes, various population characteristics like the mean, variance, range, and quantiles can be estimated by collecting independent and identically distributed (IID) samples. These samples represent observed data that is traditionally used to obtain sample-driven estimates of the specific population characteristics via standard formulas like the sample arithmetic average, variance, range, quantiles, etc. The latter approximate their population counterparts and form the basis for classical parametric and non-parametric statistical inference.

Typically, reliable spacetime statistical inference is conditional on the distribution of the native process as well as a sample-size reflecting the characteristics of the phenomenon. We will demonstrate that spacekime analytics can be equally effective with measuring a single spacetime observation and having a reasonable estimates of the unobserved process kime-phases.

Without loss of generality, suppose we have a pair of cohorts \(A\) and \(B\) and we obtain a series of measurements \(\{X_{A,i}\}_{i=1}^{n_A}\) and \(\{X_{B,i}\}_{i=1}^{n_B}\), respectively. Obviously the relations between the cohorts could widely vary, from being samples of the same process, to being related or completely independent.

To allow us to examine the extreme cases of pairing (1) IID cohorts (\(A\) and \(B\)), and (2) independent but differently distributed cohorts (\(A\) and \(C\)). The latter case may be thought of as a split of the first cohort (\(A\)) into training (\(C\)) and testing (\(D\)) groups. This design allows us to compare the classical spacetime-derived population characteristics of cohort \(A\) to their spacekime-reconstructed counterparts obtained using a single random kime-magnitude observation from \(A\) and kime-phases estimates derived from cohorts \(B\), \(C\) or \(D\).

1.1 fMRI timeseries

The demonstration below is based on a functional magnetic resonance imaging (fMRI) data, which is a 4D hypervolume with intensities representing the blood oxygenation level dependence at a specific spacetime location \((x,y,z,t)\). For simplicity, we will only focus on two fixed spatial locations with varying intensity distributions.

library(EBImage)
require(brainR)
library(spatstat) 
library(ggplot2)
library(kSamples)
library(reshape2)
library(beanplot)
library(rstanarm)

fMRIURL <- "http://socr.umich.edu/HTML5/BrainViewer/data/fMRI_FilteredData_4D.nii.gz"
fMRIFile <- file.path(tempdir(), "fMRI_FilteredData_4D.nii.gz")
download.file(fMRIURL, dest=fMRIFile, quiet=TRUE)
(fMRIVolume <- readNIfTI(fMRIFile, reorient=FALSE))

## NIfTI-1 format
##   Type            : nifti
##   Data Type       : 4 (INT16)
##   Bits per Pixel  : 16
##   Slice Code      : 0 (Unknown)
##   Intent Code     : 0 (None)
##   Qform Code      : 1 (Scanner_Anat)
##   Sform Code      : 0 (Unknown)
##   Dimension       : 64 x 64 x 21 x 180
##   Pixel Dimension : 4 x 4 x 6 x 3
##   Voxel Units     : mm
##   Time Units      : sec

# dimensions: 64 x 64 x 21 x 180 ; 4mm x 4mm x 6mm x 3 sec 

fMRIVolDims <- dim(fMRIVolume); fMRIVolDims

## [1]  64  64  21 180

# time_dim <- fMRIVolDims[4]; time_dim ## 180

# 2. extract the time-corse of 1D mid-axial slice (3D) hypervolume
xA_fMRI_1D_x20_y20_z11 <- fMRIVolume[20, 20, 11, ]; length(xA_fMRI_1D_x20_y20_z11)   #  180

## [1] 180

hist(xA_fMRI_1D_x20_y20_z11)

xB_fMRI_1D_x30_y30_z13 <- fMRIVolume[30, 30, 13, ]; length(xB_fMRI_1D_x30_y30_z13)   #  180

## [1] 180

# hist(xB_fMRI_1D_x30_y30_z13)

# Now, combine your two 1D timeseries into one dataframe for joint hist plotting as densities.  
# First make a new column in each that will be 
# a variable to identify where they came from later.
xA_df <- as.data.frame(xA_fMRI_1D_x20_y20_z11)
colnames(xA_df) <- "value"; xA_df$cohort <- "xA"
xB_df <- as.data.frame(xB_fMRI_1D_x30_y30_z13)
colnames(xB_df) <- "value"; xB_df$cohort <- "xB"

# and combine into your new data frame vegLengths
xA_xB_df <- rbind(xA_df, xB_df)


ggplot(xA_xB_df, aes(value, fill = cohort)) + 
  geom_density(alpha = 0.5, size=1.2) +
  theme(text = element_text(size=20)) +
  xlim(c(10200, 12000))

Clearly the intensities of cohorts \(A\) and \(B\) are independent and follow different distribution. We’ll split the first cohort (\(A\)) into training (\(C\)) and testing (\(D\)) subgroups. Then we will:

transform all four cohorts into Fourier k-space,
iteratively randomly sample single observations from cohort \(C\),
reconstruct the data into spacetime using alternative kime-phase estimates derived from cohorts \(B\) and \(D\), and
compute and compare the classical spacetime-derived population characteristics of cohort \(A\) to their counterparts obtained using a single \(C\) kime-radial measurements paired with \(B\) and \(D\) kime-phases.

# Generic function to Transform Data to/from k-space (Space/Fourier domain)
kSpaceTransform <- function(data, inverse = FALSE, reconPhases = NULL) {
  # ForwardFT (rawData, FALSE, NULL)
  # InverseFT(magnitudes, TRUE, reconPhasesToUse) or InverseFT(FT_data, TRUE, NULL)
  FT_data <- array(complex(), length(data))
  mag_FT_data <- array(complex(), length(data))
  phase_FT_data <- array(complex(), length(data))
  IFT_reconPhases_data <- array(complex(), length(data))

  if (inverse == FALSE | is.null(reconPhases)) {
      FT_data <- fft(data, inverse)
      X2 <- FT_data
      mag_FT_data <- sqrt(Re(X2)^2+Im(X2)^2) 
      phase_FT_data <- atan2(Im(X2), Re(X2)) 
  }
  else {  # for IFT synthesis using user-provided Phases, typically from kime-phase aggregators
      Real <- data * cos(reconPhases)  
      Imaginary <- data * sin(reconPhases) 
      IFT_reconPhases_data <- 
          Re(fft(Real+1i*Imaginary, inverse = TRUE)/length(data))
  }

    ######### Test the FT-IFT analysis-synthesis back-and-forth transform process 
    #         to confirm calculations
    # X2 <- FT_data[ , 1]; mag_FT_data[ , 1] <- sqrt(Re(X2)^2+Im(X2)^2); 
    # phase_FT_data[ , 1] <- atan2(Im(X2), Re(X2)); 
    # Real2 = mag_FT_data[ , 1] * cos(phase_FT_data[ , 1])
    # Imaginary2 = mag_FT_data[ , 1] * sin(phase_FT_data[ , 1])
    # man_hat_X2 = Re(fft(Real2 + 1i*Imaginary2, inverse = T)/length(X2))
    # ifelse(abs(man_hat_X2[5] - data[5, 1]) < 0.001, "Perfect Syntesis", "Problems!!!")
    #########
  
    if (inverse == FALSE | is.null(reconPhases)) {
      return(list("magnitudes"=mag_FT_data, "phases"=phase_FT_data))
      # Use kSpaceTransform$magnitudes & kSpaceTransform$phases to retrieve teh Mags and Phases
    }
    else {
      return(IFT_reconPhases_data)
      # Use Re(kSpaceTransform) to extract spacetime Real-valued reconstructed data
    }
}

# 1. Split the first cohort ($A$) into *training* ($C$) and *testing* ($D$) subgroups.
subset_int <- sample(length(xA_df$value),floor(length(xA_df$value)*0.8))  
# 80% training + 20% testing
xC_fMRI_train <- xA_df$value [subset_int]; length(xC_fMRI_train) # 144

## [1] 144

xD_test <- xA_df$value [-subset_int]; length(xD_test)  # 36

## [1] 36

# 2. Transform all four cohorts into Fourier k-space
# xA, xB, xC_fMRI_train; xD_test
xA <- xA_fMRI_1D_x20_y20_z11; length(xA)   #  180

## [1] 180

xB <- xB_fMRI_1D_x30_y30_z13; length(xB)   #  180

## [1] 180

ft_xA <- fft(xA); ft_xB <- fft(xB)
ft_xC_fMRI_train <- fft(xC_fMRI_train); ft_xD_test <- fft(xD_test); 

# Magnitudes and Phases: Phase  <- atan(Im(img_ff)/Re(img_ff))
mag_ft_xA <- sqrt(Re(ft_xA)^2+Im(ft_xA)^2)
mag_ft_xB <- sqrt(Re(ft_xB)^2+Im(ft_xB)^2)
mag_ft_xC_fMRI_train <- sqrt(Re(ft_xC_fMRI_train)^2+Im(ft_xC_fMRI_train)^2)
mag_ft_xD_test <- sqrt(Re(ft_xD_test)^2+Im(ft_xD_test)^2)

phase_ft_xA <- atan2(Im(ft_xA), Re(ft_xA))
phase_ft_xB <- atan2(Im(ft_xB), Re(ft_xB))
phase_ft_xC_fMRI_train <- atan2(Im(ft_xC_fMRI_train), Re(ft_xC_fMRI_train))
phase_ft_xD_test <- atan2(Im(ft_xD_test), Re(ft_xD_test))

# Double-Check FT-IFT==I   ImplicitlyInvert the FT (IFT)
fftinv <- function( x ) { fft( x, inverse=TRUE ) / length( x ) }
head(Re(fftinv(ft_xA))); head(xA)

## [1] 10620 10483 10519 10661 10598 10780

## [1] 10620 10483 10519 10661 10598 10780

# 3. Iteratively randomly sample single observations from cohort $C$,
N <- 30 # to 30 simulations
# take a random sample of size N (without replacement) from $C$
N_sampleIndx <- sample(1:length(xC_fMRI_train), N, replace=FALSE)
xC_fMRI_sampleN <- xC_fMRI_train[N_sampleIndx]
ft_xC_fMRI_sampleN_mag <- mag_ft_xC_fMRI_train[N_sampleIndx]

# 4. reconstruct the $C$ data into spacetime using a single ft_xC_fMRI_sampleN_mag value and alternative kime-phase estimates derived from cohorts $B$ and $D$ 
# for each ft_xC_fMRI_sampleN_mag[i] value, use $B$ and $D$ phases to reconstruct ift_ft_xC_fMRI_sampleN_PhaseB ift_ft_xC_fMRI_sampleN_PhaseD
ift_ft_xC_fMRI_1sampleN_PhaseB <- 
      array(dim=c(length(xC_fMRI_train), length(xC_fMRI_sampleN)))
ift_ft_xC_fMRI_1sampleN_PhaseD <-
      array(dim=c(length(xC_fMRI_train),length(xC_fMRI_sampleN)))
ift_ft_xC_fMRI_1sampleN_PhaseC <-
      array(dim=c(length(xC_fMRI_train),length(xC_fMRI_sampleN)))
ift_ft_xC_fMRI <- array(dim=length(xC_fMRI_train))
dim(ift_ft_xC_fMRI_1sampleN_PhaseB) # [1] Time=144 Samples_N=30

## [1] 144  30

for (i in 1:N) {
  ift_ft_xC_fMRI_1sampleN_PhaseB[ , i] <- 
    Re(kSpaceTransform(rep(ft_xC_fMRI_sampleN_mag[i], length(xC_fMRI_train)),
                       TRUE, phase_ft_xB[1:length(xC_fMRI_train)]))
  ift_ft_xC_fMRI_1sampleN_PhaseD[ , i] <- 
    Re(kSpaceTransform(rep(ft_xC_fMRI_sampleN_mag[i],
                           length(phase_ft_xD_test)), TRUE,
                       phase_ft_xD_test[1:length(phase_ft_xD_test)]))
  ift_ft_xC_fMRI_1sampleN_PhaseC[ , i] <- 
    Re(kSpaceTransform(rep(ft_xC_fMRI_sampleN_mag[i], length(xC_fMRI_train)),
                       TRUE, phase_ft_xC_fMRI_train[1:length(xC_fMRI_train)]))
}
ift_ft_xC_fMRI <- Re(kSpaceTransform(mag_ft_xC_fMRI_train, TRUE,
                       phase_ft_xC_fMRI_train[1:length(xC_fMRI_train)]))
head(xC_fMRI_train) == head(ift_ft_xC_fMRI)

## [1] TRUE TRUE TRUE TRUE TRUE TRUE

# 5.  compute and compare the *classical spacetime-derived* population characteristics of cohort $A$ to their counterparts obtained using a single $C$ kime-radial measurements paired with $B$ and $D$ kime-phases.
# Data = xC_fMRI_train, ift_ft_xC_fMRI_1sampleN_PhaseB, ift_ft_xC_fMRI_1sampleN_PhaseD
length(xC_fMRI_train) == length(ift_ft_xC_fMRI_1sampleN_PhaseB[ , 1])

## [1] TRUE

summary(scale(xC_fMRI_train))

##        V1          
##  Min.   :-2.27528  
##  1st Qu.:-0.72802  
##  Median :-0.01324  
##  Mean   : 0.00000  
##  3rd Qu.: 0.69867  
##  Max.   : 3.53484

summary(scale(ift_ft_xC_fMRI_1sampleN_PhaseC[ , 15]))

##        V1         
##  Min.   :-2.3609  
##  1st Qu.:-0.6187  
##  Median : 0.0327  
##  Mean   : 0.0000  
##  3rd Qu.: 0.7080  
##  Max.   : 3.4877

summary(scale(ift_ft_xC_fMRI_1sampleN_PhaseB[ , 15]))

##        V1          
##  Min.   :-3.32577  
##  1st Qu.:-0.56299  
##  Median :-0.03926  
##  Mean   : 0.00000  
##  3rd Qu.: 0.60848  
##  Max.   : 2.56786

summary(scale(ift_ft_xC_fMRI_1sampleN_PhaseD[ , 15]))

##        V1          
##  Min.   :-2.72484  
##  1st Qu.:-0.62653  
##  Median : 0.04799  
##  Mean   : 0.00000  
##  3rd Qu.: 0.68595  
##  Max.   : 2.43560

1.1.1 Data Visualization

# Plot all histograms as densities
ift_ft_xC_fMRI_1sampleN_PhaseC_df <- 
  as.data.frame(scale(ift_ft_xC_fMRI_1sampleN_PhaseC))
# colnames(xA_df) <- "value"; xA_df$cohort <- "xA"
#  xA_scale_df <- as.data.frame(scale(xA_df$value))
#  colnames(xA_scale_df) <- "value"; xA_scale_df$cohort <- "xA"
xB_df <- as.data.frame(scale(xB_fMRI_1D_x30_y30_z13))
colnames(xB_df) <- "value"; xB_df$cohort <- "xB"

# 
# and combine into your new data frame Lengths
xA_xB_df <- rbind(xA_df, xB_df)


ggplot(xA_xB_df, aes(value, fill = cohort)) + 
  geom_density(alpha = 0.5, size=1.2) +
  theme(text = element_text(size=20)) +
  xlim(c(10200, 12000))

## Warning: Removed 180 rows containing non-finite values (stat_density).

length(xC_fMRI_train); dim(ift_ft_xC_fMRI_1sampleN_PhaseB)

## [1] 144

## [1] 144  30

dim(ift_ft_xC_fMRI_1sampleN_PhaseC); dim(ift_ft_xC_fMRI_1sampleN_PhaseD)

## [1] 144  30

## [1] 144  30

# Compute the averages accross all N=30 experiments for the B, C & D reconstructions
ift_ft_xC_fMRI_1sampleN_PhaseB_avg <- apply(ift_ft_xC_fMRI_1sampleN_PhaseB, 1, mean)
ift_ft_xC_fMRI_1sampleN_PhaseC_avg <- apply(ift_ft_xC_fMRI_1sampleN_PhaseC, 1, mean)
ift_ft_xC_fMRI_1sampleN_PhaseD_avg <- apply(ift_ft_xC_fMRI_1sampleN_PhaseD, 1, mean)

# Plot 4 density curves (orig=xC_fMRI and 3 reconstructions from B, C and D)
xC_fMRI_train_scale_df <- as.data.frame(scale(xC_fMRI_train))
colnames(xC_fMRI_train_scale_df) <- "value"; xC_fMRI_train_scale_df$series <- "xC_fMRI_original"

ift_ft_xC_fMRI_1sampleN_PhaseB_avg_scale_df <- as.data.frame(scale(ift_ft_xC_fMRI_1sampleN_PhaseB_avg))
colnames(ift_ft_xC_fMRI_1sampleN_PhaseB_avg_scale_df) <- "value"
ift_ft_xC_fMRI_1sampleN_PhaseB_avg_scale_df$series <- "SK_PhaseB"

ift_ft_xC_fMRI_1sampleN_PhaseC_avg_scale_df <- as.data.frame(scale(ift_ft_xC_fMRI_1sampleN_PhaseC_avg))
colnames(ift_ft_xC_fMRI_1sampleN_PhaseC_avg_scale_df) <- "value"
ift_ft_xC_fMRI_1sampleN_PhaseC_avg_scale_df$series <- "SK_PhaseC"

ift_ft_xC_fMRI_1sampleN_PhaseD_avg_scale_df <- as.data.frame(scale(ift_ft_xC_fMRI_1sampleN_PhaseD_avg))
colnames(ift_ft_xC_fMRI_1sampleN_PhaseD_avg_scale_df) <- "value"
ift_ft_xC_fMRI_1sampleN_PhaseD_avg_scale_df$series <- "SK_PhaseD"

# and combine into your new data frame vegLengths
xC_fMRI_SK_Phases_B_C_D_df <- rbind(xC_fMRI_train_scale_df, ift_ft_xC_fMRI_1sampleN_PhaseB_avg_scale_df, 
             ift_ft_xC_fMRI_1sampleN_PhaseC_avg_scale_df, ift_ft_xC_fMRI_1sampleN_PhaseD_avg_scale_df)

# library(ggplot2)
ggplot(xC_fMRI_SK_Phases_B_C_D_df, aes(value, fill = series)) + 
  geom_density(aes(color=series, linetype = series), alpha=0.4, size=1.2) +  # position = "stack"
  theme(text = element_text(size=20)) +
  scale_fill_manual( values = c("yellow", "red", "blue", "green")) +
  geom_line(data=xC_fMRI_train_scale_df, stat = "density", color="purple", lty=4, lwd=2) +
  ## guides(color = guide_legend(order=1)) +
  theme(axis.title.x=element_blank(),axis.text.x=element_blank(), axis.ticks.x=element_blank())

  # theme(legend.position="bottom")

1.1.1.1 Violin Plot

ggplot(xC_fMRI_SK_Phases_B_C_D_df,aes(x=series, y=value, fill=series)) + 
  geom_violin(trim=FALSE) +
  geom_boxplot(width=0.1) + 
  theme_bw()

1.1.1.2 Empirical CDF

ggplot(xC_fMRI_SK_Phases_B_C_D_df,aes(x=value, color=series)) +
  stat_ecdf(size = 0.5)

1.1.1.3 Beanplot

boxplot(xC_fMRI_SK_Phases_B_C_D_df$value ~ xC_fMRI_SK_Phases_B_C_D_df$series)
beanplot(xC_fMRI_SK_Phases_B_C_D_df$value ~ xC_fMRI_SK_Phases_B_C_D_df$series,
         border = "#CAB2D6",
         col = c("#CAB2D6", "#33A02C", "#B2DF8A"),
         side="second",add = T)

1.2 IID Simulation

The following simulation example generates two mixture distribution random samples each of \(n=10,000\) observations, \(\{X_{A,i}\}_{i=1}^{n_A}\), where \(X_{A,i} = 0.3U_i + 0.7V_i\), \(U_i \sim N(0,1)\) and \(V_i \sim N(5,3)\), and \(\{X_{B,i}\}_{i=1}^{n_B}\), where \(X_{B,i} = 0.4P_i + 0.6Q_i\), \(P_i \sim N(20,20)\) and \(Q_i \sim N(100,30)\).

n=10000
mu1 <- 0; mu2 <- 5
sig1 <- 1; sig2 <- 3
weight <- 0.7   

mixedDistFunc <- function (n, weight, mu1, mu2, sig1, sig2) {
    set.seed(1234); U <- rnorm(n, mean=mu1, sd = sig1)
    set.seed(1234); V <- rnorm(n,mean=mu2, sd = sig2)
    # randomly choose U or V
    set.seed(1234); wght <- rbinom(n, size=1, prob=weight) 
    X <- U*(1 - wght) + V*wght 
}

xA <- mixedDistFunc(n=n, weight, mu1, mu2, sig1, sig2)
hist(xA, freq = F)

length(xA)   #  n=10,000

## [1] 10000

xB <- mixedDistFunc(n=n, 0.6, 20, 100, 20, 30)
hist(xB, freq = F)

length(xB)

## [1] 10000

# Now, combine your two univariate sets into one dataframe for joint hist plotting as densities.  
# First make a new column in each that will be 
# a variable to identify where they came from later.
xA_df <- as.data.frame(xA); colnames(xA_df)<-"value"; xA_df$cohort<-"xA"
xB_df <- as.data.frame(xB); colnames(xB_df)<-"value"; xB_df$cohort<-"xB"

# and combine into your new data frame vegLengths
xA_xB_df <- rbind(xA_df, xB_df)

Figure 4.1

ggplot(xA_xB_df, aes(value, fill = cohort)) + 
  geom_density(alpha = 0.5, size=1.2) +
  theme(text = element_text(size=20))

## quartz_off_screen 
##                 2

Clearly the intensities of cohorts \(A\) and \(B\) are independent and follow different distributions. We’ll split the first cohort (\(A\)) into training (\(C\)) and testing (\(D\)) subgroups, and then:

transform all four cohorts into Fourier k-space,
iteratively randomly sample single observations from cohort \(C\),
reconstruct the data into spacetime using a single kime-magnitude value and alternative kime-phase estimates derived from cohorts \(B\), \(C\), and \(D\), and
compute and compare the classical spacetime-derived population characteristics of cohort \(A\) to their spacekime counterparts obtained using a single \(C\) kime-magnitude paired with \(B\), \(C\), or \(D\) kime-phases.

# Generic function to Transform 1D Data to/from k-space (Space/Fourier domain)
kSpaceTransform <- function(data, inverse = FALSE, reconPhases = NULL) {
  # ForwardFT (rawData, FALSE, NULL)
  # InverseFT(magnitudes, TRUE, reconPhasesToUse) or InverseFT(FT_data, TRUE, NULL)
  FT_data <- array(complex(), length(data))
  mag_FT_data <- array(complex(), length(data))
  phase_FT_data <- array(complex(), length(data))
  IFT_reconPhases_data <- array(complex(), length(data))

  if (inverse == FALSE | is.null(reconPhases)) {
      FT_data <- fft(data, inverse)
      X2 <- FT_data
      mag_FT_data <- sqrt(Re(X2)^2+Im(X2)^2) 
      phase_FT_data <- atan2(Im(X2), Re(X2)) 
  }
  else {  # for IFT synthesis using user-provided Phases, typically from kime-phase aggregators
      Real <- data * cos(reconPhases)  
      Imaginary <- data * sin(reconPhases) 
      IFT_reconPhases_data <- 
          Re(fft(Real+1i*Imaginary, inverse = TRUE)/length(data))
  }

    ######### Test the FT-IFT analysis-synthesis back-and-forth transform process 
    #         to confirm calculations
    # X2 <- FT_data[ , 1]; mag_FT_data[ , 1] <- sqrt(Re(X2)^2+Im(X2)^2); 
    # phase_FT_data[ , 1] <- atan2(Im(X2), Re(X2)); 
    # Real2 = mag_FT_data[ , 1] * cos(phase_FT_data[ , 1])
    # Imaginary2 = mag_FT_data[ , 1] * sin(phase_FT_data[ , 1])
    # man_hat_X2 = Re(fft(Real2 + 1i*Imaginary2, inverse = T)/length(X2))
    # ifelse(abs(man_hat_X2[5] - data[5, 1]) < 0.001, "Perfect Syntesis", "Problems!!!")
    #########
  
    if (inverse == FALSE | is.null(reconPhases)) {
      return(list("magnitudes"=mag_FT_data, "phases"=phase_FT_data))
      # Use kSpaceTransform$magnitudes & kSpaceTransform$phases to retrieve teh Mags and Phases
    }
    else {
      return(IFT_reconPhases_data)
      # Use Re(kSpaceTransform) to extract spacetime Real-valued reconstructed data
    }
}

# 1. Split the first cohort ($A$) into *training* ($C$) and *testing* ($D$) subgroups.
subset_int <- sample(length(xA_df$value),floor(length(xA_df$value)*0.8))  
# 80% training + 20% testing
xC <- xA_df$value [subset_int]; length(xC) # 8000

## [1] 8000

xD <- xA_df$value [-subset_int]; length(xD)  # 2000

## [1] 2000

# 2. Transform all four cohorts into Fourier k-space
# xA, xB, xC (train), xD (test)
ft_xA <- fft(xA); ft_xB <- fft(xB)
ft_xC <- fft(xC); ft_xD <- fft(xD)

# Magnitudes and Phases: Phase  <- atan(Im(img_ff)/Re(img_ff))
mag_ft_xA <- sqrt(Re(ft_xA)^2+Im(ft_xA)^2)
mag_ft_xB <- sqrt(Re(ft_xB)^2+Im(ft_xB)^2)
mag_ft_xC <- sqrt(Re(ft_xC)^2+Im(ft_xC)^2)
mag_ft_xD <- sqrt(Re(ft_xD)^2+Im(ft_xD)^2)

phase_ft_xA <- atan2(Im(ft_xA), Re(ft_xA))
phase_ft_xB <- atan2(Im(ft_xB), Re(ft_xB))
phase_ft_xC <- atan2(Im(ft_xC), Re(ft_xC))
phase_ft_xD <- atan2(Im(ft_xD), Re(ft_xD))

# Double-Check FT-IFT==I   ImplicitlyInvert the FT (IFT)
fftinv <- function( x ) { fft( x, inverse=TRUE ) / length( x ) }
head(Re(fftinv(ft_xA))); head(xA)

## [1]  1.3788028  5.8322877  8.2533235 -2.0370931  0.4291247  6.5181677

## [1]  1.3788028  5.8322877  8.2533235 -2.0370931  0.4291247  6.5181677

# 3. Iteratively randomly sample single observations from cohort $C$,
N <- 30 #  30 simulations
# take a random sample of size N (without replacement) from $C$
set.seed(1234); N_sampleIndx <- sample(1:length(xC), N, replace=FALSE)
xC_sampleN <- xC[N_sampleIndx]
ft_xC_mag <- mag_ft_xC[N_sampleIndx]

# 4. reconstruct the $C$ data into spacetime using a single ft_xC_sampleN_mag value and alternative kime-phase estimates derived from cohorts $B$ and $D$ 
# for each ft_xC_sampleN_mag[i] value, use $B$ and $D$ phases to reconstruct ift_ft_xC_sampleN_PhaseB ift_ft_xC_sampleN_PhaseD
ift_ft_xC_1sampleN_PhaseB <- 
      array(dim=c(length(xC), length(xC_sampleN)))
ift_ft_xC_1sampleN_PhaseD <-
      array(dim=c(length(xC), length(xC_sampleN)))
ift_ft_xC_1sampleN_PhaseC <-
      array(dim=c(length(xC), length(xC_sampleN)))
ift_ft_xC <- array(dim=length(xC))
dim(ift_ft_xC_1sampleN_PhaseB) # [1] Size=8000 Samples_N=30

## [1] 8000   30

for (i in 1:N) {
  ift_ft_xC_1sampleN_PhaseB[ , i] <- 
    Re(kSpaceTransform(rep(ft_xC_mag[i], length(xC)),
                       TRUE, phase_ft_xB[1:length(xC)]))
  ift_ft_xC_1sampleN_PhaseD[ , i] <- 
    Re(kSpaceTransform(rep(ft_xC_mag[i], length(phase_ft_xD)), 
                       TRUE, phase_ft_xD[1:length(phase_ft_xD)]))
  ift_ft_xC_1sampleN_PhaseC[ , i] <- 
    Re(kSpaceTransform(rep(ft_xC_mag[i], length(xC)),
                       TRUE, phase_ft_xC[1:length(xC)]))
}
ift_ft_xC <- Re(kSpaceTransform(mag_ft_xC, TRUE,phase_ft_xC[1:length(xC)]))
head(xC) - head(ift_ft_xC) # roundoff should be ~ 0

## [1] -2.664535e-15 -1.554312e-15 -4.440892e-16 -5.773160e-15  0.000000e+00
## [6] -8.881784e-16

# 5.  compute and compare the *classical spacetime-derived* population characteristics of cohort $A$ to their counterparts obtained using a single $C$ kime-radial measurements paired with $B$ and $D$ kime-phases.
# Data = xC_train, ift_ft_xC_1sampleN_PhaseB, ift_ft_xC_1sampleN_PhaseD
length(xC) == length(ift_ft_xC_1sampleN_PhaseB[ , 1])

## [1] TRUE

summary(scale(xC))

##        V1          
##  Min.   :-2.38784  
##  1st Qu.:-0.88609  
##  Median :-0.03893  
##  Mean   : 0.00000  
##  3rd Qu.: 0.75821  
##  Max.   : 3.59925

summary(scale(ift_ft_xC_1sampleN_PhaseC[ , 15]))

##        V1          
##  Min.   :-2.52901  
##  1st Qu.:-0.76221  
##  Median :-0.05584  
##  Mean   : 0.00000  
##  3rd Qu.: 0.72999  
##  Max.   : 3.73114

summary(scale(ift_ft_xC_1sampleN_PhaseB[ , 15]))

##        V1           
##  Min.   :-3.798440  
##  1st Qu.:-0.636799  
##  Median : 0.009279  
##  Mean   : 0.000000  
##  3rd Qu.: 0.645119  
##  Max.   : 3.986702

summary(scale(ift_ft_xC_1sampleN_PhaseD[ , 15]))

##        V1          
##  Min.   :-2.66007  
##  1st Qu.:-0.79651  
##  Median :-0.08165  
##  Mean   : 0.00000  
##  3rd Qu.: 0.73477  
##  Max.   : 3.39448

1.2.1 Data Visualization

# Plot all histograms as densities
ift_ft_xC_1sampleN_PhaseC_df <- 
  as.data.frame(scale(ift_ft_xC_1sampleN_PhaseC))
# colnames(xA_df) <- "value"; xA_df$cohort <- "xA"
#  xA_scale_df <- as.data.frame(scale(xA_df$value))
#  colnames(xA_scale_df) <- "value"; xA_scale_df$cohort <- "xA"
xB_df <- as.data.frame(scale(xB))
colnames(xB_df) <- "value"; xB_df$cohort <- "xB"

# 
# and combine into your new data frame Lengths
xA_xB_df <- rbind(xA_df, xB_df)


ggplot(xA_xB_df, aes(value, fill = cohort)) + 
  geom_density(alpha = 0.5, size=1.2) +
  theme(text = element_text(size=20))

length(xC); dim(ift_ft_xC_1sampleN_PhaseB)

## [1] 8000

## [1] 8000   30

dim(ift_ft_xC_1sampleN_PhaseC); dim(ift_ft_xC_1sampleN_PhaseD)

## [1] 8000   30

## [1] 8000   30

# Compute the averages accross all N=30 experiments for the B, C & D spacekime (IFT) reconstructions
ift_ft_xC_1sampleN_PhaseB_avg <- apply(ift_ft_xC_1sampleN_PhaseB, 1, mean)
ift_ft_xC_1sampleN_PhaseC_avg <- apply(ift_ft_xC_1sampleN_PhaseC, 1, mean)
ift_ft_xC_1sampleN_PhaseD_avg <- apply(ift_ft_xC_1sampleN_PhaseD, 1, mean)

# Plot 4 density curves (orig=xC and 3 reconstructions from B, C and D)
xC_scale_df <- as.data.frame(xC)
colnames(xC_scale_df) <- "value"; xC_scale_df$series <- "xC_original"

ift_ft_xC_1sampleN_PhaseB_avg_scale_df <- 
    as.data.frame(ift_ft_xC_1sampleN_PhaseB_avg)
colnames(ift_ft_xC_1sampleN_PhaseB_avg_scale_df) <- "value"
ift_ft_xC_1sampleN_PhaseB_avg_scale_df$series <- "SK_PhaseB"

ift_ft_xC_1sampleN_PhaseC_avg_scale_df <- 
    as.data.frame(ift_ft_xC_1sampleN_PhaseC_avg)
colnames(ift_ft_xC_1sampleN_PhaseC_avg_scale_df) <- "value"
ift_ft_xC_1sampleN_PhaseC_avg_scale_df$series <- "SK_PhaseC"

ift_ft_xC_1sampleN_PhaseD_avg_scale_df <- 
      as.data.frame(ift_ft_xC_1sampleN_PhaseD_avg)
colnames(ift_ft_xC_1sampleN_PhaseD_avg_scale_df) <- "value"
ift_ft_xC_1sampleN_PhaseD_avg_scale_df$series <- "SK_PhaseD"

# Combine into a new data frame xC_SK_Phases_B_C_D_df
xC_SK_Phases_B_C_D_df <- rbind(xC_scale_df, ift_ft_xC_1sampleN_PhaseB_avg_scale_df, 
             ift_ft_xC_1sampleN_PhaseC_avg_scale_df, ift_ft_xC_1sampleN_PhaseD_avg_scale_df)

# library(ggplot2)
ggplot(xC_SK_Phases_B_C_D_df, aes(value, fill = series)) + 
  geom_density(aes(color=series, linetype = series), alpha=0.4, size=1.2) +  # position = "stack"
  theme(text = element_text(size=20)) +
  scale_fill_manual( values = c("yellow", "red", "blue", "green")) +
  geom_line(data=xC_scale_df, stat = "density", color="purple", lty=4, lwd=2) +
  ## guides(color = guide_legend(order=1)) +
  theme(axis.title.x=element_blank(),axis.text.x=element_blank(), axis.ticks.x=element_blank()) # +

  # xlim(c(-1, 1))
  # theme(legend.position="bottom")

#ggplot(xC_SK_Phases_B_C_D_df, aes(value, fill = series)) + 
#  theme(text = element_text(size=20)) +
#  scale_fill_manual( values = c("yellow", "red", "blue", "green")) +
#  geom_line(stat = "density", lty=4, lwd=2) +
#  ## guides(color = guide_legend(order=1)) +
#  theme(axis.title.x=element_blank(),axis.text.x=element_blank(), #axis.ticks.x=element_blank())

ggplot(xC_SK_Phases_B_C_D_df, aes(value, fill = series)) + 
     geom_line(aes(colour=series, lty=series), stat = "density", lwd=1.5) +
  theme(axis.title.x=element_blank(),axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

# Are the xC (original training) and ift_ft_xC_1sampleN_PhaseC_avg (reconstructed) CORRELATED?
cor(xC, ift_ft_xC_1sampleN_PhaseC_avg)  # [1] 0.8872053

## [1] 0.8844888

# Are the xC (original training) and ift_ft_xC_1sampleN_PhaseD_avg (reconstructed) CORRELATED?
cor(xC, ift_ft_xC_1sampleN_PhaseD_avg)  # [1] 0.005248561

## [1] 0.01011428

# Are the xC (original training) and ift_ft_xC_1sampleN_PhaseB_avg (reconstructed) CORRELATED?
cor(xC, ift_ft_xC_1sampleN_PhaseB_avg)  # [1] -0.001070121

## [1] 0.001782234

Figure 4.2

# Plot xC vs. ift_ft_xC_1sampleN_PhaseC_avg
plot(xC, ift_ft_xC_1sampleN_PhaseC_avg, xlab = "Original", ylab = "Reconstructed",
     main = "Spacekime signal reconstruction using \n a single spacetime observation and perfect kime-phases",
     cex=1.4)
abline(lm(ift_ft_xC_1sampleN_PhaseC_avg ~ xC), col="red", lwd=2)
text(0, 9, sprintf("Corr(Orig, Rec)=%s", 
                    format(cor(scale(ift_ft_xC_1sampleN_PhaseC_avg), scale(xC)), digits=2)))

## quartz_off_screen 
##                 2

#Additional quantitative measures quantifying the differences between distribution (original signal and spacekime reconstructions) include two-sample Kolmogorov-Smirnov (KS) test and the correlation coefficient. The KS test statistic (D) is the maximum distance between the estimated cumulative distribution functions and the corresponding p-value is the probability of seeing a test statistic as high or higher than the one observed given that the two samples were drawn from the same distribution.  In our case, comparing the distributions of the original data and its reconstruction using a single kime magnitude and the correct kime-phases yields a KS statistics D = 0.053875, and p_value= 1.647?e^(-10). This suggests very strong statistical evidence (due to the large sample size) and marginal practical difference between the real and reconstructed signals. Comparing a pair of reconstructions using a single kime-magnitude value and two independent kime-phase estimates (cohorts C and D) yields D = 0.017375, and p_value= 0.1786.

ks.test(scale(xC), scale(ift_ft_xC_1sampleN_PhaseC_avg))

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  scale(xC) and scale(ift_ft_xC_1sampleN_PhaseC_avg)
## D = 0.05775, p-value = 5.174e-12
## alternative hypothesis: two-sided

#      D = 0.053875, p-value = 1.647e-10, alternative hypothesis: two-sided
ks.test(scale(ift_ft_xC_1sampleN_PhaseC_avg), scale(ift_ft_xC_1sampleN_PhaseD_avg))

## Warning in ks.test(scale(ift_ft_xC_1sampleN_PhaseC_avg),
## scale(ift_ft_xC_1sampleN_PhaseD_avg)): p-value will be approximate in the
## presence of ties

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  scale(ift_ft_xC_1sampleN_PhaseC_avg) and scale(ift_ft_xC_1sampleN_PhaseD_avg)
## D = 0.014625, p-value = 0.3592
## alternative hypothesis: two-sided

#      D = 0.017375, p-value = 0.1786, alternative hypothesis: two-sided
# wilcox.test(scale(xC), scale(ift_ft_xC_1sampleN_PhaseC_avg)) # insignificant because of scaling
#library("goftest")
#null_distribution <- function (x) {
#  if (!is.na(approxfun(density(scale(xC)))(x))) {
#    return (approxfun(density(scale(xC)))(x))
#  } else { return (0) }
#}
# Cramer-Von Mises Test of Goodness-of-Fit
#cvm.test(scale(ift_ft_xC_1sampleN_PhaseC_avg), null_distribution) # omega2 = 1854.8, p-value = 0.2955
# Anderson-Darling Test of Goodness-of-Fit
#ad.test(scale(ift_ft_xC_1sampleN_PhaseC_avg), null_distribution) # 

# Anderson-Darling Test of Goodness-of-Fit
# install.packages("kSamples")  

ad.test(scale(ift_ft_xC_1sampleN_PhaseC_avg), scale(xC)) # T.AD=19.18; asympt. P-value=8.578e-09

## 
## 
##  Anderson-Darling k-sample test.
## 
## Number of samples:  2
## Sample sizes:  8000, 8000
## Number of ties: 0
## 
## Mean of  Anderson-Darling  Criterion: 1
## Standard deviation of  Anderson-Darling  Criterion: 0.76131
## 
## T.AD = ( Anderson-Darling  Criterion - mean)/sigma
## 
## Null Hypothesis: All samples come from a common population.
## 
##               AD  T.AD  asympt. P-value
## version 1: 16.18 19.93        4.168e-09
## version 2: 16.20 19.94        4.291e-09

Figure 4.3

# PLOT            
# windows(width=10, height=8)
# quartz(width=10, height=8)
plot(density(scale(xC)), col="black", lty=1, lwd=3, xaxt="n", xlab = "Range",
     main="Spacetime Original vs. Spacekime (SK) Reconstructed Distributions")
lines(density(scale(ift_ft_xC_1sampleN_PhaseC_avg)), col="green", lwd=2, lty=2)
lines(density(scale(ift_ft_xC_1sampleN_PhaseD_avg)), col="blue", lwd=2, lty=10)
lines(density(scale(ift_ft_xC_1sampleN_PhaseB_avg)), col="red", lwd=2, lty=20)
legend("topright", legend=c("Original (Mixture of N(0,1) & N(5,3))", "SK Synthesis (1 Mag, Phase=True)",
      "SK Synthesis (1 Mag, Phase=Indep)", "SK Synthesis (1 Mag, Phase=Diff.Proc.)",
      paste0(sprintf("\nKS.test(Original, SK(1Mag,Phase=True)), D=%s, p=%s\n",
                    format(ks.test(scale(xC), scale(ift_ft_xC_1sampleN_PhaseC_avg))$statistic[[1]], digits=2),
                    format(ks.test(scale(xC), scale(ift_ft_xC_1sampleN_PhaseC_avg))$p.value, digits=2)),
          sprintf("KS.test(SK(1Mag,Phase=Indep), SK(1Mag,Phase=True)), p=%s\n",
                    format(ks.test(scale(ift_ft_xC_1sampleN_PhaseD_avg),
                                   scale(ift_ft_xC_1sampleN_PhaseC_avg))$p.value, digits=2)),
          sprintf("Corr(Original, SK(1Mag,Phase=True))=%s", 
                    format(cor(scale(ift_ft_xC_1sampleN_PhaseC_avg), scale(xC)), digits=2)))),
       col=c("black", "green", "blue", "red", "black"), lty=c(1, 2, 10, 20, 0), lwd=2, bty = "n", cex=0.75,
       y.intersp=0.0, title="Scaled Densities")

## Warning in ks.test(scale(ift_ft_xC_1sampleN_PhaseD_avg),
## scale(ift_ft_xC_1sampleN_PhaseC_avg)): p-value will be approximate in the
## presence of ties

## Warning in ks.test(scale(ift_ft_xC_1sampleN_PhaseD_avg),
## scale(ift_ft_xC_1sampleN_PhaseC_avg)): p-value will be approximate in the
## presence of ties

## quartz_off_screen 
##                 2

# Skewness
e1071::skewness(xC); e1071::skewness(ift_ft_xC_1sampleN_PhaseC_avg); e1071::skewness(ift_ft_xC_1sampleN_PhaseB_avg); e1071::skewness(ift_ft_xC_1sampleN_PhaseD_avg)

## [1] 0.359816

## [1] 0.2563408

## [1] 0.001021943

## [1] 0.2446415

# Kurtosis
e1071::kurtosis(xC); e1071::kurtosis(ift_ft_xC_1sampleN_PhaseC_avg); e1071::kurtosis(ift_ft_xC_1sampleN_PhaseB_avg); e1071::kurtosis(ift_ft_xC_1sampleN_PhaseD_avg)

## [1] -0.6270681

## [1] -0.3619818

## [1] 0.2149918

## [1] -0.5228285

raw_data <- 
  data.frame(Original=scale(xC),
             Phase_True=scale(ift_ft_xC_1sampleN_PhaseB_avg),
             Phase_Diff.Proc=scale(ift_ft_xC_1sampleN_PhaseC_avg),
             Phase_Indep=scale(ift_ft_xC_1sampleN_PhaseD_avg))

1.2.1.1 Violin Plot

melted_data = melt(raw_data)

## No id variables; using all as measure variables

ggplot(melted_data,aes(x=variable, y=value, fill=variable)) + 
  geom_violin(trim=FALSE) +
  geom_boxplot(width=0.1) + 
  theme_bw()

1.2.1.2 Empirical CDF

ggplot(melted_data,aes(x=value, color=variable)) +
  stat_ecdf(size = 0.5)

1.2.1.3 Beanplot

boxplot(melted_data$value ~ melted_data$variable)
beanplot(melted_data$value ~ melted_data$variable,
         border = "#CAB2D6",
         col = c("#CAB2D6", "#33A02C", "#B2DF8A"),
         side="second", add = T)

1.3 Bayesian Inference Simulation

Here we relate to Bayesian inference to Spacekime analytics based on a single (cohort (\(A\)) spacetime observation \(x_{i_o}\) and some prior kime-phases (obtained from cohorts \(A\), \(B\), or \(C\)). This is accomplished by computing the prior or posterior predictive distribution.

1.3.1 Canned academic example (using the `cars` dataset)

#install.packages("rstanarm")
#library("rstanarm")

# Docs: https://rdrr.io/cran/rstanarm/man/posterior_predict.stanreg.html

# 1. Canned example
# if (!exists("example_model")) example(example_model)
# yrep <- posterior_predict(example_model)
# table(yrep)

# 2. Example using sample data (n=10): counts ~ outcome + treatment
# counts <- c(18,17,15,20,10,20,25,13,12,15)
# outcome <- gl(5,2,10)
# treatment <- gl(2,5)
# model_fit <- stan_glm(counts ~ outcome + treatment, 
#                  family = poisson(link="log"),
#                  prior = normal(0, 1), prior_intercept = normal(0, 5))
# new_data <- data.frame(treatment = factor(rep(1,5)), outcome = factor(1:5))
# Draw from the posterior predictive distribution of the outcome.
# ytilde <- posterior_predict(model_fit, new_data, draws = 500)
# print(dim(ytilde))  # 500 by 5 matrix (draws by nrow(new_data))
# ytilde <- data.frame(count = c(ytilde),
#                      outcome = rep(new_data$outcome, each = 500))
# ggplot2::ggplot(ytilde, ggplot2::aes(x=outcome, y=count)) +
#   ggplot2::geom_boxplot() +
#   ggplot2::ylab("predicted count")
# ytilde <- posterior_predict(model_fit, draws = 500)
# bayesplot::color_scheme_set("brightblue")
# bayesplot::ppc_dens_overlay(counts, ytilde[1:100, ])

# Using the CARS data (mpg ~ wt)
mtcars2 <- mtcars   # dim(mtcars2) # [1] 32(Automakers) 11(Features)
# mtcars2$log_mpg <- log(mtcars2$mpg)   # Define outcome of interest
model_fit <- stan_glm(mpg ~ wt, data = mtcars2)

## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 8.7e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.87 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.057879 seconds (Warm-up)
## Chain 1:                0.049047 seconds (Sampling)
## Chain 1:                0.106926 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 1.2e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.06328 seconds (Warm-up)
## Chain 2:                0.043701 seconds (Sampling)
## Chain 2:                0.106981 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 1.3e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.13 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.052791 seconds (Warm-up)
## Chain 3:                0.051304 seconds (Sampling)
## Chain 3:                0.104095 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 1e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.053378 seconds (Warm-up)
## Chain 4:                0.050564 seconds (Sampling)
## Chain 4:                0.103942 seconds (Total)
## Chain 4:

# Get the posterior predictive distribution as a matrix:
# D(number of draws posterior predictive distribution)=500 by 
# N(number of data points being predicted per draw)
ytilde <- posterior_predict(model_fit, draws = 500)
dim(ytilde) # Outcome=mpg, [1] 500(MCMC draws)  32(automakers)

## [1] 500  32

head(ytilde)

##      Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout
## [1,]  27.83336      18.09879   20.46724       14.94926          21.22127
## [2,]  24.77289      16.98112   23.68493       20.31273          22.80704
## [3,]  23.77452      19.55476   21.77243       20.91546          26.30971
## [4,]  26.54254      26.26840   24.77963       18.60093          15.30545
## [5,]  19.92686      21.62055   17.91423       18.02250          21.09841
## [6,]  21.90781      26.91936   29.17032       22.04833          20.19422
##       Valiant Duster 360 Merc 240D Merc 230 Merc 280 Merc 280C Merc 450SE
## [1,] 16.04502   11.63372  10.66559 21.98524 18.77021  16.82558   20.66930
## [2,] 17.02287   19.83796  17.14120 16.72355 18.56964  18.19241   13.83327
## [3,] 11.94724   17.55357  20.60386 20.24220 16.07105  13.96353   21.31576
## [4,] 18.45152   11.59260  17.42055 19.23979 19.46640  16.28337   11.92779
## [5,] 19.79754   20.18799  10.31959 22.63071 14.70449  21.23318   12.28052
## [6,] 19.52867   17.17794  26.50403 20.79099 16.85627  16.74970   13.36202
##      Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
## [1,]   15.08942    11.83466           6.372787            9.092416
## [2,]   17.31504    16.66325           9.372890           12.408165
## [3,]   15.25952    12.59248          17.628026           10.148073
## [4,]   13.40738    19.09064           9.711535            4.601367
## [5,]   12.99332    11.82580          13.126779           10.442684
## [6,]   19.48801    17.77542          10.097532           13.456131
##      Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla Toyota Corona
## [1,]          3.101490 26.17248    33.26072       26.44477      23.51168
## [2,]          9.955109 23.92047    32.12174       31.83815      18.68618
## [3,]         15.272304 26.73828    31.45684       29.42007      18.51923
## [4,]          4.139900 24.74204    29.99198       27.54381      23.87595
## [5,]         10.925105 31.37050    25.43979       29.22767      19.55515
## [6,]         11.670132 26.93697    23.03120       27.34927      22.59423
##      Dodge Challenger AMC Javelin Camaro Z28 Pontiac Firebird Fiat X1-9
## [1,]         16.76778    20.74306   13.82242         20.16253  21.41692
## [2,]         20.19332    19.76735   15.33447         14.03374  29.31019
## [3,]         19.45280    20.17639   18.14594         18.82701  21.83076
## [4,]         19.98458    16.43369   17.16427         17.83606  31.44703
## [5,]         19.06299    12.15604   13.62427         14.28430  27.65276
## [6,]         18.59290    17.41055   17.88571         16.34603  24.81989
##      Porsche 914-2 Lotus Europa Ford Pantera L Ferrari Dino Maserati Bora
## [1,]      26.13833     30.02604       21.19611     19.17363      14.86521
## [2,]      25.81284     31.69259       17.01650     22.37688      14.34606
## [3,]      22.29241     29.33770       23.94098     21.58584      15.72230
## [4,]      30.00675     27.46188       19.61855     26.95284      22.31696
## [5,]      22.70551     27.20964       19.32096     22.94286      16.71434
## [6,]      25.03603     30.59111       12.12025     20.62699      15.26275
##      Volvo 142E
## [1,]   19.80113
## [2,]   21.60304
## [3,]   25.29954
## [4,]   20.34696
## [5,]   17.49975
## [6,]   18.85936

bayesplot::color_scheme_set("brightblue")
bayesplot::ppc_dens_overlay(mtcars2$mpg, ytilde[1:100, ])

bayesplot::ppc_hist(mtcars2$mpg, ytilde[1:100, ])

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Plot a histogram of the distribution of various test statistics (e.g., mean, sd) across MCMC draws.
# The distributions are computed by applying the statistics to each dataset (draw=row) in *ytilde*.
# The blue vertical line overlays the value of the same statistic in the observed data, stat(y=mtcars2$mpg)
bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "mean")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "sd")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "range")
bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "min")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "max")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# 2D (mean, sd) plot
bayesplot::ppc_stat_2d(mtcars2$mpg, ytilde[1:100, ], stat=c("mean", "sd"), size=2.5, alpha=0.7)

1.3.2 Example of spacekime-analytics as a Bayesian Inference problem (using the fMRI dataset)

Recall that we have:

Original data ift_ft_xC_fMRI
IFT of the data using C-Phases ift_ft_xC_fMRI_1sampleN_PhaseC for all samples of \(N=30\)
IFT of the data using B-Phases ift_ft_xC_fMRI_1sampleN_PhaseB for all samples of \(N=30\), and
IFT of the data using D-Phases ift_ft_xC_fMRI_1sampleN_PhaseD for all samples of \(N=30\)

#install.packages("rstanarm")
#library("rstanarm")

# xC_fMRI_scaled <- scale(xC_fMRI)[ , 1]
length(ift_ft_xC_fMRI)    # [1] 8000

## [1] 144

dim(ift_ft_xC_fMRI_1sampleN_PhaseC)  # [1] 8000   30

## [1] 144  30

dim(ift_ft_xC_fMRI_1sampleN_PhaseB)

## [1] 144  30

dim(ift_ft_xC_fMRI_1sampleN_PhaseD)

## [1] 144  30

# Get the posterior predictive distribution as a matrix:
# D(number of draws posterior predictive distribution)=30 by 
# N(number of data time points being predicted per draw)=8000
# Inspect the first few spacekime reconstructed draws: Mind the transposition of the tensor!
# t(ift_ft_xC_fMRI_1sampleN_PhaseC)[ , 1:7]  # 30(draws)  7(first timepoints ony) 
# the spacekime estimates represent the posterior predictive distribution
# ytilde <- t(ift_ft_xC_fMRI_1sampleN_PhaseC + mean(xC_fMRI))  # all 8000 timepoints (columns)
# Center/Offset the spacekime estimates to the center of the original data (xC_fMRI)
ytilde <- rbind(t(ift_ft_xC_fMRI_1sampleN_PhaseC+mean(ift_ft_xC_fMRI)-apply(ift_ft_xC_fMRI_1sampleN_PhaseC,2,mean)),
                t(ift_ft_xC_fMRI_1sampleN_PhaseB+mean(ift_ft_xC_fMRI)-apply(ift_ft_xC_fMRI_1sampleN_PhaseB,2,mean)),
                t(ift_ft_xC_fMRI_1sampleN_PhaseD+mean(ift_ft_xC_fMRI)-apply(ift_ft_xC_fMRI_1sampleN_PhaseD,2,mean))) 
dim(ytilde) # Outcome=bimodal process (xC_fMRI), [1] 30(MCMC draws)  8000(timepoints)

## [1]  90 144

bayesplot::color_scheme_set("brightblue")

bayesplot::ppc_dens_overlay(ift_ft_xC_fMRI, ytilde)

bayesplot::ppc_hist(ift_ft_xC_fMRI, ytilde[1:11, ])

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Plot a histogram of the distribution of various test statistics (e.g., mean, sd) across MCMC draws.
# The distributions are computed by applying the statistics to each dataset (draw=row) in *ytilde*.
# The blue vertical line overlays the value of the same statistic in the observed data, stat(y=mtcars2$mpg)
bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = "mean")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = "sd")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = IQR)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = "min")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = "max")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# bayesplot::ppc_stat(xC_fMRI_scaled, ytilde, stat = e1071::skewness)
# bayesplot::ppc_stat(xC_fMRI_scaled, ytilde, stat = e1071::kurtosis)
# 2D (mean, sd) plot
bayesplot::ppc_stat_2d(ift_ft_xC_fMRI, ytilde, stat=c("mean", "sd"), size=2.5, alpha=0.7)

1.3.3 Example of spacekime-analytics as a Bayesian Inference problem (using the `bimodal` simulated dataset)

Recall that we have:

Original data xC
IFT of the data using C-Phases ift_ft_xC_1sampleN_PhaseC for all samples of \(N=30\)
IFT of the data using B-Phases ift_ft_xC_1sampleN_PhaseB for all samples of \(N=30\), and
IFT of the data using D-Phases ift_ft_xC_1sampleN_PhaseD for all samples of \(N=30\)

#install.packages("rstanarm")
#library("rstanarm")

# xC_scaled <- scale(xC)[ , 1]
length(xC)                      # [1] 8000

## [1] 8000

dim(ift_ft_xC_1sampleN_PhaseC)  # [1] 8000   30

## [1] 8000   30

dim(ift_ft_xC_1sampleN_PhaseB)

## [1] 8000   30

dim(ift_ft_xC_1sampleN_PhaseD)

## [1] 8000   30

# Get the posterior predictive distribution as a matrix:
# D(number of draws posterior predictive distribution)=30 by 
# N(number of data time points being predicted per draw)=8000
# Inspect the first few spacekime reconstructed draws: Mind the transposition of the tensor!
# t(ift_ft_xC_1sampleN_PhaseC)[ , 1:7]  # 30(draws)  7(first timepoints ony) 
# the spacekime estimates represent the posterior predictive distribution
# ytilde <- t(ift_ft_xC_1sampleN_PhaseC + mean(xC))  # all 8000 timepoints (columns)
# Center/Offset the spacekime estimates to the center of the original data (xC)
ytilde <- rbind(t(ift_ft_xC_1sampleN_PhaseC+mean(xC)-apply(ift_ft_xC_1sampleN_PhaseC,2,mean)),
                t(ift_ft_xC_1sampleN_PhaseB+mean(xC)-apply(ift_ft_xC_1sampleN_PhaseB,2,mean)),
                t(ift_ft_xC_1sampleN_PhaseD+mean(xC)-apply(ift_ft_xC_1sampleN_PhaseD,2,mean))) 
dim(ytilde) # Outcome=bimodal process (xC), [1] 30(MCMC draws)  8000(timepoints)

## [1]   90 8000

bayesplot::color_scheme_set("brightblue")

Figure 4.4A

bayesplot::ppc_dens_overlay(xC, ytilde)+
  xlim(-22,25)

## Warning: Removed 3308 rows containing non-finite values (stat_density).

## Warning: Removed 3308 rows containing non-finite values (stat_density).

## quartz_off_screen 
##                 2

bayesplot::ppc_hist(xC, ytilde[1:11, ])

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Plot a histogram of the distribution of various test statistics (e.g., mean, sd) across MCMC draws.
# The distributions are computed by applying the statistics to each dataset (draw=row) in *ytilde*.
# The blue vertical line overlays the value of the same statistic in the observed data, stat(y=mtcars2$mpg)
bayesplot::ppc_stat(xC, ytilde, stat = "mean")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(xC, ytilde, stat = "sd")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Figure 4.4D

bayesplot::ppc_stat(xC, ytilde, stat = IQR)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## quartz_off_screen 
##                 2

bayesplot::ppc_stat(xC, ytilde, stat = "min")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Figure 4.4C

bayesplot::ppc_stat(xC, ytilde, stat = "max")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## quartz_off_screen 
##                 2

Figure 4.4B

# bayesplot::ppc_stat(xC_scaled, ytilde, stat = e1071::skewness)
# bayesplot::ppc_stat(xC_scaled, ytilde, stat = e1071::kurtosis)
# 2D (mean, sd) plot
bayesplot::ppc_stat_2d(xC, ytilde, stat=c("mean", "sd"), size=2.5, alpha=0.7)

## quartz_off_screen 
##                 2

2 Appendix: Functions Used

2.1 fftinv()

# Implicitly Invert the FT (IFT)
#' This function does the IFT and scales appropriately the  result to ensure IFT(FT()) = I()
#' 
#' @param x FT of a dataset.
#' @return The IFT of the input array.
#'
fftinv <- function( x ) { 
  fft( x, inverse=TRUE ) / length( x ) 
  }

2.2 kSpaceTransform()

#' A generic function to Transform Data ={all predictors (X) and outcome (Y)} to k-space (Fourier domain)
#' For ForwardFT, set parameters as (rawData, FALSE, NULL)
#' For InverseFT, there are two parameters setting: (magnitudes, TRUE, reconPhasesToUse) or (FT_data, TRUE, NULL)
#'
#' @param data dataset that needs K-space transformation
kSpaceTransform <- function(data, inverse = FALSE, reconPhases = NULL) {
  # ForwardFT (rawData, FALSE, NULL)
  # InverseFT(magnitudes, TRUE, reconPhasesToUse) or InverseFT(FT_data, TRUE, NULL)
  FT_data <- array(complex(), length(data))
  mag_FT_data <- array(complex(), length(data))
  phase_FT_data <- array(complex(), length(data))
  IFT_reconPhases_data <- array(complex(), length(data))

  if (inverse == FALSE | is.null(reconPhases)) {
      FT_data <- fft(data, inverse)
      X2 <- FT_data
      mag_FT_data <- sqrt(Re(X2)^2+Im(X2)^2) 
      phase_FT_data <- atan2(Im(X2), Re(X2)) 
  }
  else {  # for IFT synthesis using user-provided Phases, typically from kime-phase aggregators
      Real <- data * cos(reconPhases)  
      Imaginary <- data * sin(reconPhases) 
      IFT_reconPhases_data <- 
          Re(fft(Real+1i*Imaginary, inverse = TRUE)/length(data))
  }

    ######### Test the FT-IFT analysis-synthesis back-and-forth transform process 
    #         to confirm calculations
    # X2 <- FT_data[ , 1]; mag_FT_data[ , 1] <- sqrt(Re(X2)^2+Im(X2)^2); 
    # phase_FT_data[ , 1] <- atan2(Im(X2), Re(X2)); 
    # Real2 = mag_FT_data[ , 1] * cos(phase_FT_data[ , 1])
    # Imaginary2 = mag_FT_data[ , 1] * sin(phase_FT_data[ , 1])
    # man_hat_X2 = Re(fft(Real2 + 1i*Imaginary2, inverse = T)/length(X2))
    # ifelse(abs(man_hat_X2[5] - data[5, 1]) < 0.001, "Perfect Syntesis", "Problems!!!")
    #########
  
    if (inverse == FALSE | is.null(reconPhases)) {
      return(list("magnitudes"=mag_FT_data, "phases"=phase_FT_data))
      # Use kSpaceTransform$magnitudes & kSpaceTransform$phases to retrieve teh Mags and Phases
    }
    else {
      return(IFT_reconPhases_data)
      # Use Re(kSpaceTransform) to extract spacetime Real-valued reconstructed data
    }
}

2.3 mixedDistFunc()

# Make a mixture model
#' This function generates a new distribution that is mixed model of two normal distributions
#' 
#' @param n A integer, specifies number of random numbers generated by each normal distribution
#' @param weight Weight of the first normal distribution
#' @param mu1 Mean of the first normal distribution
#' @param mu2 Mean of the secondnormal distribution
#' @param sig1 Standard deviation of the first normal distribution
#' @param sig2 Standard deviation of the second normal distribution
#' @return A number or numbers that is a combination of two normal distributions
#'
mixedDistFunc <- function (n, weight, mu1, mu2, sig1, sig2) {
    set.seed(1234); U <- rnorm(n, mean=mu1, sd = sig1)
    set.seed(1234); V <- rnorm(n,mean=mu2, sd = sig2)
    # randomly choose U or V
    set.seed(1234); wght <- rbinom(n, size=1, prob=weight) 
    X <- U*(1 - wght) + V*wght 
}

For the MCSI data analytics, see the separate script: UM_Michigan_Consumer_Sentiment_Index.R.

Figure 4.1, 4.2, 4.3 & 4.4

Back To Index

SOCR Team

June 2019

1 Spacetime IID vs. Spacekime Sampling

1.1 fMRI timeseries

1.1.1 Data Visualization

1.1.1.1 Violin Plot

1.1.1.2 Empirical CDF

1.1.1.3 Beanplot

1.2 IID Simulation

1.2.1 Data Visualization

1.2.1.1 Violin Plot

1.2.1.2 Empirical CDF

1.2.1.3 Beanplot

1.3 Bayesian Inference Simulation

1.3.1 Canned academic example (using the `cars` dataset)

1.3.2 Example of spacekime-analytics as a Bayesian Inference problem (using the fMRI dataset)

1.3.3 Example of spacekime-analytics as a Bayesian Inference problem (using the `bimodal` simulated dataset)

2 Appendix: Functions Used

2.1 fftinv()

2.2 kSpaceTransform()

2.3 mixedDistFunc()

Figure 4.1, 4.2, 4.3 & 4.4

Back To Index

SOCR Team

June 2019

1 Spacetime IID vs. Spacekime Sampling

1.1 fMRI timeseries

1.1.1 Data Visualization

1.1.1.1 Violin Plot

1.1.1.2 Empirical CDF

1.1.1.3 Beanplot

1.2 IID Simulation

1.2.1 Data Visualization

1.2.1.1 Violin Plot

1.2.1.2 Empirical CDF

1.2.1.3 Beanplot

1.3 Bayesian Inference Simulation

1.3.1 Canned academic example (using the cars dataset)

1.3.2 Example of spacekime-analytics as a Bayesian Inference problem (using the fMRI dataset)

1.3.3 Example of spacekime-analytics as a Bayesian Inference problem (using the bimodal simulated dataset)

2 Appendix: Functions Used

2.1 fftinv()

2.2 kSpaceTransform()

2.3 mixedDistFunc()

1.3.1 Canned academic example (using the `cars` dataset)

1.3.3 Example of spacekime-analytics as a Bayesian Inference problem (using the `bimodal` simulated dataset)