--- title: "Spacekime Analytics (Time Complexity and Inferential Uncertainty)" subtitle: "Spacetime IID vs. Spacekime Sampling" author: "SOCR Team " date: "`r format(Sys.time(),'%m/%d/%Y')`" output: html_document: theme: spacelab highlight: tango includes: before_body: TCIU_header.html toc: true number_sections: true toc_depth: 2 toc_float: collapsed: false smooth_scroll: true code_folding: hide --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, warings = FALSE) ``` # IID Spacetime Sampling vs. Spacekime Sampling For all natural spacetime processes, various population characteristics like the mean, variance, range, and quantiles can be estimated by collecting *independent and identically distributed (IID)* samples. These samples represent observed data that is traditionally used to obtain sample-driven estimates of the specific population characteristics via standard formulas like the *sample* arithmetic average, variance, range, quantiles, etc. The latter approximate their population counterparts and form the basis for classical parametric and non-parametric statistical inference. Typically, reliable *spacetime* statistical inference is conditional on the distribution of the native process as well as a sample-size reflecting the characteristics of the phenomenon. We will demonstrate that *spacekime* analytics can be equally effective with measuring a single spacetime observation and having a reasonable estimates of the unobserved process kime-phases. Without loss of generality, suppose we have a pair of cohorts $A$ and $B$ and we obtain a series of measurements $\{X_{A,i}\}_{i=1}^{n_A}$ and $\{X_{B,i}\}_{i=1}^{n_B}$, respectively. Obviously the relations between the cohorts could widely vary, from being samples of the same process, to being related or completely independent. To allow us to examine the extreme cases of pairing (1) IID cohorts ($A$ and $B$), and (2) independent but differently distributed cohorts ($A$ and $C$). The latter case may be thought of as a split of the first cohort ($A$) into training ($C$) and testing ($D$) groups. This design allows us to compare the classical spacetime-derived population characteristics of cohort $A$ to their spacekime-reconstructed counterparts obtained using a single random kime-magnitude observation from $A$ and kime-phases estimates derived from cohorts $B$, $C$ or $D$. ## fMRI timeseries The demonstration below is based on a functional magnetic resonance imaging (fMRI) data, which is a 4D hypervolume with intensities representing the blood oxygenation level dependence at a specific spacetime location $(x,y,z,t)$. For simplicity, we will only focus on two fixed spatial locations with varying intensity distributions. ```{r eval=TRUE, message=FALSE,warning=FALSE} library(EBImage) require(brainR) library(spatstat) library(ggplot2) library(kSamples) library(reshape2) library(beanplot) library(rstanarm) fMRIURL <- "http://socr.umich.edu/HTML5/BrainViewer/data/fMRI_FilteredData_4D.nii.gz" fMRIFile <- file.path(tempdir(), "fMRI_FilteredData_4D.nii.gz") download.file(fMRIURL, dest=fMRIFile, quiet=TRUE) fMRIVolume <- readNIfTI(fMRIFile, reorient=FALSE) # dimensions: 64 x 64 x 21 x 180 ; 4mm x 4mm x 6mm x 3 sec fMRIVolDims <- dim(fMRIVolume); # fMRIVolDims # time_dim <- fMRIVolDims[4]; time_dim ## 180 # 2. extract the time-corse of 1D mid-axial slice (3D) hypervolume xA_fMRI_1D_x20_y20_z11 <- fMRIVolume[20, 20, 11, ]; # length(xA_fMRI_1D_x20_y20_z11) # 180 # hist(xA_fMRI_1D_x20_y20_z11) library(plotly) plot_ly(x = ~xA_fMRI_1D_x20_y20_z11, type = "histogram") %>% layout(bargap=0.1) xB_fMRI_1D_x30_y30_z13 <- fMRIVolume[30, 30, 13, ]; # length(xB_fMRI_1D_x30_y30_z13) # 180 # hist(xB_fMRI_1D_x30_y30_z13) # Now, combine your two 1D timeseries into one dataframe for joint hist plotting as densities. # First make a new column in each that will be # a variable to identify where they came from later. xA_df <- as.data.frame(xA_fMRI_1D_x20_y20_z11) colnames(xA_df) <- "value"; xA_df$cohort <- "xA" xB_df <- as.data.frame(xB_fMRI_1D_x30_y30_z13) colnames(xB_df) <- "value"; xB_df$cohort <- "xB" # and combine into your new data frame vegLengths xA_xB_df <- rbind(xA_df, xB_df) # ggplot(xA_xB_df, aes(value, fill = cohort)) + # geom_density(alpha = 0.5, size=1.2) + # theme(text = element_text(size=20)) + # xlim(c(10200, 12000)) density_xA <- density(xA_xB_df[ which(xA_xB_df$cohort=="xA"), ]$value) density_xB <- density(xA_xB_df[ which(xA_xB_df$cohort=="xB"), ]$value) df_xA <- as.data.frame(cbind(x=density_xA$x, y=density_xA$y)) df_xB <- as.data.frame(cbind(x=density_xB$x, y=density_xB$y)) plot_ly(df_xA, x = ~x, y = ~y, type = 'scatter', mode = 'lines', name = "xA", fill = 'tozeroy') %>% add_trace(x = ~df_xB$x, y = ~df_xB$y, type = 'scatter', mode = 'lines', name = "xB", fill = 'tozeroy') %>% layout(title="Cohort A and B Distributions", xaxis = list(title = 'value'), yaxis = list(title = 'Density')) ``` Clearly the intensities of cohorts $A$ and $B$ are independent and follow different distribution. We'll split the first cohort ($A$) into *training* ($C$) and *testing* ($D$) subgroups. Then we will: - transform all four cohorts into Fourier k-space, - iteratively randomly sample single observations from cohort $C$, - reconstruct the data into spacetime using alternative kime-phase estimates derived from cohorts $B$ and $D$, and - compute and compare the *classical spacetime-derived* population characteristics of cohort $A$ to their counterparts obtained using a single $C$ kime-radial measurements paired with $B$ and $D$ kime-phases. ```{r eval=TRUE} # Generic function to Transform Data to/from k-space (Space/Fourier domain) kSpaceTransform <- function(data, inverse = FALSE, reconPhases = NULL) { # ForwardFT (rawData, FALSE, NULL) # InverseFT(magnitudes, TRUE, reconPhasesToUse) or InverseFT(FT_data, TRUE, NULL) FT_data <- array(complex(), length(data)) mag_FT_data <- array(complex(), length(data)) phase_FT_data <- array(complex(), length(data)) IFT_reconPhases_data <- array(complex(), length(data)) if (inverse == FALSE | is.null(reconPhases)) { FT_data <- fft(data, inverse) X2 <- FT_data mag_FT_data <- sqrt(Re(X2)^2+Im(X2)^2) phase_FT_data <- atan2(Im(X2), Re(X2)) } else { # for IFT synthesis using user-provided Phases, typically from kime-phase aggregators Real <- data * cos(reconPhases) Imaginary <- data * sin(reconPhases) IFT_reconPhases_data <- Re(fft(Real+1i*Imaginary, inverse = TRUE)/length(data)) } ######### Test the FT-IFT analysis-synthesis back-and-forth transform process # to confirm calculations # X2 <- FT_data[ , 1]; mag_FT_data[ , 1] <- sqrt(Re(X2)^2+Im(X2)^2); # phase_FT_data[ , 1] <- atan2(Im(X2), Re(X2)); # Real2 = mag_FT_data[ , 1] * cos(phase_FT_data[ , 1]) # Imaginary2 = mag_FT_data[ , 1] * sin(phase_FT_data[ , 1]) # man_hat_X2 = Re(fft(Real2 + 1i*Imaginary2, inverse = T)/length(X2)) # ifelse(abs(man_hat_X2[5] - data[5, 1]) < 0.001, "Perfect Syntesis", "Problems!!!") ######### if (inverse == FALSE | is.null(reconPhases)) { return(list("magnitudes"=mag_FT_data, "phases"=phase_FT_data)) # Use kSpaceTransform$magnitudes & kSpaceTransform$phases to retrieve teh Mags and Phases } else { return(IFT_reconPhases_data) # Use Re(kSpaceTransform) to extract spacetime Real-valued reconstructed data } } # 1. Split the first cohort ($A$) into *training* ($C$) and *testing* ($D$) subgroups. subset_int <- sample(length(xA_df$value),floor(length(xA_df$value)*0.8)) # 80% training + 20% testing xC_fMRI_train <- xA_df$value [subset_int]; # length(xC_fMRI_train) # 144 xD_test <- xA_df$value [-subset_int]; # length(xD_test) # 36 # 2. Transform all four cohorts into Fourier k-space # xA, xB, xC_fMRI_train; xD_test xA <- xA_fMRI_1D_x20_y20_z11; # length(xA) # 180 xB <- xB_fMRI_1D_x30_y30_z13; # length(xB) # 180 ft_xA <- fft(xA); ft_xB <- fft(xB) ft_xC_fMRI_train <- fft(xC_fMRI_train); ft_xD_test <- fft(xD_test); # Magnitudes and Phases: Phase <- atan(Im(img_ff)/Re(img_ff)) mag_ft_xA <- sqrt(Re(ft_xA)^2+Im(ft_xA)^2) mag_ft_xB <- sqrt(Re(ft_xB)^2+Im(ft_xB)^2) mag_ft_xC_fMRI_train <- sqrt(Re(ft_xC_fMRI_train)^2+Im(ft_xC_fMRI_train)^2) mag_ft_xD_test <- sqrt(Re(ft_xD_test)^2+Im(ft_xD_test)^2) phase_ft_xA <- atan2(Im(ft_xA), Re(ft_xA)) phase_ft_xB <- atan2(Im(ft_xB), Re(ft_xB)) phase_ft_xC_fMRI_train <- atan2(Im(ft_xC_fMRI_train), Re(ft_xC_fMRI_train)) phase_ft_xD_test <- atan2(Im(ft_xD_test), Re(ft_xD_test)) # Double-Check FT-IFT==I ImplicitlyInvert the FT (IFT) fftinv <- function( x ) { fft( x, inverse=TRUE ) / length( x ) } # head(Re(fftinv(ft_xA))); head(xA) # 3. Iteratively randomly sample single observations from cohort $C$, N <- 30 # to 30 simulations # take a random sample of size N (without replacement) from $C$ N_sampleIndx <- sample(1:length(xC_fMRI_train), N, replace=FALSE) xC_fMRI_sampleN <- xC_fMRI_train[N_sampleIndx] ft_xC_fMRI_sampleN_mag <- mag_ft_xC_fMRI_train[N_sampleIndx] # 4. reconstruct the $C$ data into spacetime using a single ft_xC_fMRI_sampleN_mag value and alternative kime-phase estimates derived from cohorts $B$ and $D$ # for each ft_xC_fMRI_sampleN_mag[i] value, use $B$ and $D$ phases to reconstruct ift_ft_xC_fMRI_sampleN_PhaseB ift_ft_xC_fMRI_sampleN_PhaseD ift_ft_xC_fMRI_1sampleN_PhaseB <- array(dim=c(length(xC_fMRI_train), length(xC_fMRI_sampleN))) ift_ft_xC_fMRI_1sampleN_PhaseD <- array(dim=c(length(xC_fMRI_train),length(xC_fMRI_sampleN))) ift_ft_xC_fMRI_1sampleN_PhaseC <- array(dim=c(length(xC_fMRI_train),length(xC_fMRI_sampleN))) ift_ft_xC_fMRI <- array(dim=length(xC_fMRI_train)) # dim(ift_ft_xC_fMRI_1sampleN_PhaseB) # [1] Time=144 Samples_N=30 for (i in 1:N) { ift_ft_xC_fMRI_1sampleN_PhaseB[ , i] <- Re(kSpaceTransform(rep(ft_xC_fMRI_sampleN_mag[i], length(xC_fMRI_train)), TRUE, phase_ft_xB[1:length(xC_fMRI_train)])) ift_ft_xC_fMRI_1sampleN_PhaseD[ , i] <- Re(kSpaceTransform(rep(ft_xC_fMRI_sampleN_mag[i], length(phase_ft_xD_test)), TRUE, phase_ft_xD_test[1:length(phase_ft_xD_test)])) ift_ft_xC_fMRI_1sampleN_PhaseC[ , i] <- Re(kSpaceTransform(rep(ft_xC_fMRI_sampleN_mag[i], length(xC_fMRI_train)), TRUE, phase_ft_xC_fMRI_train[1:length(xC_fMRI_train)])) } ift_ft_xC_fMRI <- Re(kSpaceTransform(mag_ft_xC_fMRI_train, TRUE, phase_ft_xC_fMRI_train[1:length(xC_fMRI_train)])) # head(xC_fMRI_train) == head(ift_ft_xC_fMRI) # 5. compute and compare the *classical spacetime-derived* population characteristics of cohort $A$ to their counterparts obtained using a single $C$ kime-radial measurements paired with $B$ and $D$ kime-phases. # Data = xC_fMRI_train, ift_ft_xC_fMRI_1sampleN_PhaseB, ift_ft_xC_fMRI_1sampleN_PhaseD # length(xC_fMRI_train) == length(ift_ft_xC_fMRI_1sampleN_PhaseB[ , 1]) summary(scale(xC_fMRI_train)) summary(scale(ift_ft_xC_fMRI_1sampleN_PhaseC[ , 15])) summary(scale(ift_ft_xC_fMRI_1sampleN_PhaseB[ , 15])) summary(scale(ift_ft_xC_fMRI_1sampleN_PhaseD[ , 15])) ``` ### Data Visualization ```{r warning=FALSE} # Plot all histograms as densities ift_ft_xC_fMRI_1sampleN_PhaseC_df <- as.data.frame(scale(ift_ft_xC_fMRI_1sampleN_PhaseC)) # colnames(xA_df) <- "value"; xA_df$cohort <- "xA" # xA_scale_df <- as.data.frame(scale(xA_df$value)) # colnames(xA_scale_df) <- "value"; xA_scale_df$cohort <- "xA" xB_df <- as.data.frame(scale(xB_fMRI_1D_x30_y30_z13)) colnames(xB_df) <- "value"; xB_df$cohort <- "xB" # # and combine into your new data frame Lengths xA_xB_df <- rbind(xA_df, xB_df) # ggplot(xA_xB_df, aes(value, fill = cohort)) + # geom_density(alpha = 0.5, size=1.2) + # theme(text = element_text(size=20)) + # xlim(c(10200, 12000)) density_xA <- density(xA_df$value) df_xA <- as.data.frame(cbind(x=density_xA$x, y=density_xA$y)) plot_ly(df_xA, x = ~x, y = ~y, type = 'scatter', mode = 'lines', name = "xA", fill = 'tozeroy') %>% layout(title="Cohort A Distribution", xaxis = list(title = 'value'), yaxis = list(title = 'Density')) # length(xC_fMRI_train); dim(ift_ft_xC_fMRI_1sampleN_PhaseB) # dim(ift_ft_xC_fMRI_1sampleN_PhaseC); dim(ift_ft_xC_fMRI_1sampleN_PhaseD) # Compute the averages accross all N=30 experiments for the B, C & D reconstructions ift_ft_xC_fMRI_1sampleN_PhaseB_avg <- apply(ift_ft_xC_fMRI_1sampleN_PhaseB, 1, mean) ift_ft_xC_fMRI_1sampleN_PhaseC_avg <- apply(ift_ft_xC_fMRI_1sampleN_PhaseC, 1, mean) ift_ft_xC_fMRI_1sampleN_PhaseD_avg <- apply(ift_ft_xC_fMRI_1sampleN_PhaseD, 1, mean) # Plot 4 density curves (orig=xC_fMRI and 3 reconstructions from B, C and D) xC_fMRI_train_scale_df <- as.data.frame(scale(xC_fMRI_train)) colnames(xC_fMRI_train_scale_df) <- "value"; xC_fMRI_train_scale_df$series <- "xC_fMRI_original" ift_ft_xC_fMRI_1sampleN_PhaseB_avg_scale_df <- as.data.frame(scale(ift_ft_xC_fMRI_1sampleN_PhaseB_avg)) colnames(ift_ft_xC_fMRI_1sampleN_PhaseB_avg_scale_df) <- "value" ift_ft_xC_fMRI_1sampleN_PhaseB_avg_scale_df$series <- "SK_PhaseB" ift_ft_xC_fMRI_1sampleN_PhaseC_avg_scale_df <- as.data.frame(scale(ift_ft_xC_fMRI_1sampleN_PhaseC_avg)) colnames(ift_ft_xC_fMRI_1sampleN_PhaseC_avg_scale_df) <- "value" ift_ft_xC_fMRI_1sampleN_PhaseC_avg_scale_df$series <- "SK_PhaseC" ift_ft_xC_fMRI_1sampleN_PhaseD_avg_scale_df <- as.data.frame(scale(ift_ft_xC_fMRI_1sampleN_PhaseD_avg)) colnames(ift_ft_xC_fMRI_1sampleN_PhaseD_avg_scale_df) <- "value" ift_ft_xC_fMRI_1sampleN_PhaseD_avg_scale_df$series <- "SK_PhaseD" # and combine into your new data frame vegLengths xC_fMRI_SK_Phases_B_C_D_df <- rbind(xC_fMRI_train_scale_df, ift_ft_xC_fMRI_1sampleN_PhaseB_avg_scale_df, ift_ft_xC_fMRI_1sampleN_PhaseC_avg_scale_df, ift_ft_xC_fMRI_1sampleN_PhaseD_avg_scale_df) # library(ggplot2) # ggplot(xC_fMRI_SK_Phases_B_C_D_df, aes(value, fill = series)) + # geom_density(aes(color=series, linetype = series), alpha=0.4, size=1.2) + # position = "stack" # theme(text = element_text(size=20)) + # scale_fill_manual( values = c("yellow", "red", "blue", "green")) + # geom_line(data=xC_fMRI_train_scale_df, stat = "density", color="purple", lty=4, lwd=2) + # ## guides(color = guide_legend(order=1)) + # theme(axis.title.x=element_blank(),axis.text.x=element_blank(), axis.ticks.x=element_blank()) # # theme(legend.position="bottom") density_xC_fMRI_orig <- density(xC_fMRI_SK_Phases_B_C_D_df[ which(xC_fMRI_SK_Phases_B_C_D_df$series=="xC_fMRI_original"), ]$value) density_SK_PhaseB <- density(xC_fMRI_SK_Phases_B_C_D_df[ which(xC_fMRI_SK_Phases_B_C_D_df$series=="SK_PhaseB"), ]$value) density_SK_PhaseC <- density(xC_fMRI_SK_Phases_B_C_D_df[ which(xC_fMRI_SK_Phases_B_C_D_df$series=="SK_PhaseC"), ]$value) density_SK_PhaseD <- density(xC_fMRI_SK_Phases_B_C_D_df[ which(xC_fMRI_SK_Phases_B_C_D_df$series=="SK_PhaseD"), ]$value) df_xC_fMRI_orig <- as.data.frame(cbind( x=density_xC_fMRI_orig$x, y=density_xC_fMRI_orig$y)) df_SK_PhaseB <- as.data.frame(cbind( x=density_SK_PhaseB$x, y=density_SK_PhaseB$y)) df_SK_PhaseC <- as.data.frame(cbind( x=density_SK_PhaseC$x, y=density_SK_PhaseC$y)) df_SK_PhaseD <- as.data.frame(cbind( x=density_SK_PhaseD$x, y=density_SK_PhaseD$y)) plot_ly(df_xC_fMRI_orig, x = ~x, y = ~y, type = 'scatter', mode = 'lines', name = "xC_fMRI_orig", fill = 'tozeroy') %>% add_trace(x = ~df_SK_PhaseB$x, y = ~df_SK_PhaseB$y, type = 'scatter', mode = 'lines', name = "SK_PhaseB", fill = 'tozeroy') %>% add_trace(x = ~df_SK_PhaseC$x, y = ~df_SK_PhaseC$y, type = 'scatter', mode = 'lines', name = "SK_PhaseC", fill = 'tozeroy') %>% add_trace(x = ~df_SK_PhaseD$x, y = ~df_SK_PhaseD$y, type = 'scatter', mode = 'lines', name = "SK_PhaseD", fill = 'tozeroy') %>% layout(title="Cohort Distribuitions - xC_fMRI_orig, SK_PhaseB, SK_PhaseC and SK_PhaseD", legend = list(orientation = 'h'), xaxis = list(title = 'value'), yaxis = list(title = 'Density')) ``` #### Violin Plot ```{r} # ggplot(xC_fMRI_SK_Phases_B_C_D_df,aes(x=series, y=value, fill=series)) + # geom_violin(trim=FALSE) + # geom_boxplot(width=0.1) + # theme_bw() xC_fMRI_SK_Phases_B_C_D_df %>% plot_ly(x = ~series, y = ~value , split = ~series, type = 'violin', box = list(visible = T), meanline = list(visible = T)) %>% layout(xaxis = list(title = "series"), yaxis = list(title = "density", zeroline = F)) ``` #### Empirical CDF ```{r} # ggplot(xC_fMRI_SK_Phases_B_C_D_df,aes(x=value, color=series)) + # stat_ecdf(size = 0.5) df <- dplyr::arrange(xC_fMRI_SK_Phases_B_C_D_df, value) pl <- ggplot(df, aes(x=value, color=series)) + stat_ecdf(size = 0.5) ggplotly(pl) ``` #### Beanplot ```{r} boxplot(xC_fMRI_SK_Phases_B_C_D_df$value ~ xC_fMRI_SK_Phases_B_C_D_df$series) beanplot(xC_fMRI_SK_Phases_B_C_D_df$value ~ xC_fMRI_SK_Phases_B_C_D_df$series, border = "#CAB2D6", col = c("#CAB2D6", "#33A02C", "#B2DF8A"), side="second",add = T) ``` ## IID Simulation The following simulation example generates two mixture distribution random samples each of $n=10,000$ observations, $\{X_{A,i}\}_{i=1}^{n_A}$, where $X_{A,i} = 0.3U_i + 0.7V_i$, $U_i \sim N(0,1)$ and $V_i \sim N(5,3)$, and $\{X_{B,i}\}_{i=1}^{n_B}$, where $X_{B,i} = 0.4P_i + 0.6Q_i$, $P_i \sim N(20,20)$ and $Q_i \sim N(100,30)$. ```{r eval=TRUE} n=10000 mu1 <- 0; mu2 <- 5 sig1 <- 1; sig2 <- 3 weight <- 0.7 mixedDistFunc <- function (n, weight, mu1, mu2, sig1, sig2) { set.seed(1234); U <- rnorm(n, mean=mu1, sd = sig1) set.seed(1234); V <- rnorm(n,mean=mu2, sd = sig2) # randomly choose U or V set.seed(1234); wght <- rbinom(n, size=1, prob=weight) X <- U*(1 - wght) + V*wght } xA <- mixedDistFunc(n=n, weight, mu1, mu2, sig1, sig2) hist(xA, freq = F) # length(xA) # n=10,000 xB <- mixedDistFunc(n=n, 0.6, 20, 100, 20, 30) hist(xB, freq = F) # length(xB) # Now, combine your two univariate sets into one dataframe for joint hist plotting as densities. # First make a new column in each that will be # a variable to identify where they came from later. xA_df <- as.data.frame(xA); colnames(xA_df)<-"value"; xA_df$cohort<-"xA" xB_df <- as.data.frame(xB); colnames(xB_df)<-"value"; xB_df$cohort<-"xB" # and combine into your new data frame vegLengths xA_xB_df <- rbind(xA_df, xB_df) ``` Figure 5.2 ```{r} ggplot(xA_xB_df, aes(value, fill = cohort)) + geom_density(alpha = 0.5, size=1.2) + theme(text = element_text(size=20)) ``` Clearly the intensities of cohorts $A$ and $B$ are independent and follow different distributions. We'll split the first cohort ($A$) into *training* ($C$) and *testing* ($D$) subgroups, and then: - transform all four cohorts into Fourier k-space, - iteratively randomly sample single observations from cohort $C$, - reconstruct the data into spacetime using a single kime-magnitude value and alternative kime-phase estimates derived from cohorts $B$, $C$, and $D$, and - compute and compare the *classical spacetime-derived* population characteristics of cohort $A$ to their spacekime counterparts obtained using a single $C$ kime-magnitude paired with $B$, $C$, or $D$ kime-phases. ```{r eval=TRUE} # Generic function to Transform 1D Data to/from k-space (Space/Fourier domain) kSpaceTransform <- function(data, inverse = FALSE, reconPhases = NULL) { # ForwardFT (rawData, FALSE, NULL) # InverseFT(magnitudes, TRUE, reconPhasesToUse) or InverseFT(FT_data, TRUE, NULL) FT_data <- array(complex(), length(data)) mag_FT_data <- array(complex(), length(data)) phase_FT_data <- array(complex(), length(data)) IFT_reconPhases_data <- array(complex(), length(data)) if (inverse == FALSE | is.null(reconPhases)) { FT_data <- fft(data, inverse) X2 <- FT_data mag_FT_data <- sqrt(Re(X2)^2+Im(X2)^2) phase_FT_data <- atan2(Im(X2), Re(X2)) } else { # for IFT synthesis using user-provided Phases, typically from kime-phase aggregators Real <- data * cos(reconPhases) Imaginary <- data * sin(reconPhases) IFT_reconPhases_data <- Re(fft(Real+1i*Imaginary, inverse = TRUE)/length(data)) } ######### Test the FT-IFT analysis-synthesis back-and-forth transform process # to confirm calculations # X2 <- FT_data[ , 1]; mag_FT_data[ , 1] <- sqrt(Re(X2)^2+Im(X2)^2); # phase_FT_data[ , 1] <- atan2(Im(X2), Re(X2)); # Real2 = mag_FT_data[ , 1] * cos(phase_FT_data[ , 1]) # Imaginary2 = mag_FT_data[ , 1] * sin(phase_FT_data[ , 1]) # man_hat_X2 = Re(fft(Real2 + 1i*Imaginary2, inverse = T)/length(X2)) # ifelse(abs(man_hat_X2[5] - data[5, 1]) < 0.001, "Perfect Syntesis", "Problems!!!") ######### if (inverse == FALSE | is.null(reconPhases)) { return(list("magnitudes"=mag_FT_data, "phases"=phase_FT_data)) # Use kSpaceTransform$magnitudes & kSpaceTransform$phases to retrieve teh Mags and Phases } else { return(IFT_reconPhases_data) # Use Re(kSpaceTransform) to extract spacetime Real-valued reconstructed data } } # 1. Split the first cohort ($A$) into *training* ($C$) and *testing* ($D$) subgroups. subset_int <- sample(length(xA_df$value),floor(length(xA_df$value)*0.8)) # 80% training + 20% testing xC <- xA_df$value [subset_int]; # length(xC) # 8000 xD <- xA_df$value [-subset_int]; # length(xD) # 2000 # 2. Transform all four cohorts into Fourier k-space # xA, xB, xC (train), xD (test) ft_xA <- fft(xA); ft_xB <- fft(xB) ft_xC <- fft(xC); ft_xD <- fft(xD) # Magnitudes and Phases: Phase <- atan(Im(img_ff)/Re(img_ff)) mag_ft_xA <- sqrt(Re(ft_xA)^2+Im(ft_xA)^2) mag_ft_xB <- sqrt(Re(ft_xB)^2+Im(ft_xB)^2) mag_ft_xC <- sqrt(Re(ft_xC)^2+Im(ft_xC)^2) mag_ft_xD <- sqrt(Re(ft_xD)^2+Im(ft_xD)^2) phase_ft_xA <- atan2(Im(ft_xA), Re(ft_xA)) phase_ft_xB <- atan2(Im(ft_xB), Re(ft_xB)) phase_ft_xC <- atan2(Im(ft_xC), Re(ft_xC)) phase_ft_xD <- atan2(Im(ft_xD), Re(ft_xD)) # Double-Check FT-IFT==I ImplicitlyInvert the FT (IFT) fftinv <- function( x ) { fft( x, inverse=TRUE ) / length( x ) } # head(Re(fftinv(ft_xA))); head(xA) # 3. Iteratively randomly sample single observations from cohort $C$, N <- 30 # 30 simulations # take a random sample of size N (without replacement) from $C$ set.seed(1234); N_sampleIndx <- sample(1:length(xC), N, replace=FALSE) xC_sampleN <- xC[N_sampleIndx] ft_xC_mag <- mag_ft_xC[N_sampleIndx] # 4. reconstruct the $C$ data into spacetime using a single ft_xC_sampleN_mag value and alternative kime-phase estimates derived from cohorts $B$ and $D$ # for each ft_xC_sampleN_mag[i] value, use $B$ and $D$ phases to reconstruct ift_ft_xC_sampleN_PhaseB ift_ft_xC_sampleN_PhaseD ift_ft_xC_1sampleN_PhaseB <- array(dim=c(length(xC), length(xC_sampleN))) ift_ft_xC_1sampleN_PhaseD <- array(dim=c(length(xC), length(xC_sampleN))) ift_ft_xC_1sampleN_PhaseC <- array(dim=c(length(xC), length(xC_sampleN))) ift_ft_xC <- array(dim=length(xC)) # dim(ift_ft_xC_1sampleN_PhaseB) # [1] Size=8000 Samples_N=30 for (i in 1:N) { ift_ft_xC_1sampleN_PhaseB[ , i] <- Re(kSpaceTransform(rep(ft_xC_mag[i], length(xC)), TRUE, phase_ft_xB[1:length(xC)])) ift_ft_xC_1sampleN_PhaseD[ , i] <- Re(kSpaceTransform(rep(ft_xC_mag[i], length(phase_ft_xD)), TRUE, phase_ft_xD[1:length(phase_ft_xD)])) ift_ft_xC_1sampleN_PhaseC[ , i] <- Re(kSpaceTransform(rep(ft_xC_mag[i], length(xC)), TRUE, phase_ft_xC[1:length(xC)])) } ift_ft_xC <- Re(kSpaceTransform(mag_ft_xC, TRUE,phase_ft_xC[1:length(xC)])) # head(xC) - head(ift_ft_xC) # roundoff should be ~ 0 # 5. compute and compare the *classical spacetime-derived* population characteristics of cohort $A$ to their counterparts obtained using a single $C$ kime-radial measurements paired with $B$ and $D$ kime-phases. # Data = xC_train, ift_ft_xC_1sampleN_PhaseB, ift_ft_xC_1sampleN_PhaseD # length(xC) == length(ift_ft_xC_1sampleN_PhaseB[ , 1]) summary(scale(xC)) summary(scale(ift_ft_xC_1sampleN_PhaseC[ , 15])) summary(scale(ift_ft_xC_1sampleN_PhaseB[ , 15])) summary(scale(ift_ft_xC_1sampleN_PhaseD[ , 15])) ``` ### Data Visualization ```{r} # Plot all histograms as densities ift_ft_xC_1sampleN_PhaseC_df <- as.data.frame(scale(ift_ft_xC_1sampleN_PhaseC)) # colnames(xA_df) <- "value"; xA_df$cohort <- "xA" # xA_scale_df <- as.data.frame(scale(xA_df$value)) # colnames(xA_scale_df) <- "value"; xA_scale_df$cohort <- "xA" xB_df <- as.data.frame(scale(xB)) colnames(xB_df) <- "value"; xB_df$cohort <- "xB" # # and combine into your new data frame Lengths xA_xB_df <- rbind(xA_df, xB_df) ggplot(xA_xB_df, aes(value, fill = cohort)) + geom_density(alpha = 0.5, size=1.2) + theme(text = element_text(size=20)) # length(xC); dim(ift_ft_xC_1sampleN_PhaseB) # dim(ift_ft_xC_1sampleN_PhaseC); dim(ift_ft_xC_1sampleN_PhaseD) # Compute the averages accross all N=30 experiments for the B, C & D spacekime (IFT) reconstructions ift_ft_xC_1sampleN_PhaseB_avg <- apply(ift_ft_xC_1sampleN_PhaseB, 1, mean) ift_ft_xC_1sampleN_PhaseC_avg <- apply(ift_ft_xC_1sampleN_PhaseC, 1, mean) ift_ft_xC_1sampleN_PhaseD_avg <- apply(ift_ft_xC_1sampleN_PhaseD, 1, mean) # Plot 4 density curves (orig=xC and 3 reconstructions from B, C and D) xC_scale_df <- as.data.frame(xC) colnames(xC_scale_df) <- "value"; xC_scale_df$series <- "xC_original" ift_ft_xC_1sampleN_PhaseB_avg_scale_df <- as.data.frame(ift_ft_xC_1sampleN_PhaseB_avg) colnames(ift_ft_xC_1sampleN_PhaseB_avg_scale_df) <- "value" ift_ft_xC_1sampleN_PhaseB_avg_scale_df$series <- "SK_PhaseB" ift_ft_xC_1sampleN_PhaseC_avg_scale_df <- as.data.frame(ift_ft_xC_1sampleN_PhaseC_avg) colnames(ift_ft_xC_1sampleN_PhaseC_avg_scale_df) <- "value" ift_ft_xC_1sampleN_PhaseC_avg_scale_df$series <- "SK_PhaseC" ift_ft_xC_1sampleN_PhaseD_avg_scale_df <- as.data.frame(ift_ft_xC_1sampleN_PhaseD_avg) colnames(ift_ft_xC_1sampleN_PhaseD_avg_scale_df) <- "value" ift_ft_xC_1sampleN_PhaseD_avg_scale_df$series <- "SK_PhaseD" # Combine into a new data frame xC_SK_Phases_B_C_D_df xC_SK_Phases_B_C_D_df <- rbind(xC_scale_df, ift_ft_xC_1sampleN_PhaseB_avg_scale_df, ift_ft_xC_1sampleN_PhaseC_avg_scale_df, ift_ft_xC_1sampleN_PhaseD_avg_scale_df) # library(ggplot2) ggplot(xC_SK_Phases_B_C_D_df, aes(value, fill = series)) + geom_density(aes(color=series, linetype = series), alpha=0.4, size=1.2) + # position = "stack" theme(text = element_text(size=20)) + scale_fill_manual( values = c("yellow", "red", "blue", "green")) + geom_line(data=xC_scale_df, stat = "density", color="purple", lty=4, lwd=2) + ## guides(color = guide_legend(order=1)) + theme(axis.title.x=element_blank(),axis.text.x=element_blank(), axis.ticks.x=element_blank()) # + # xlim(c(-1, 1)) # theme(legend.position="bottom") #ggplot(xC_SK_Phases_B_C_D_df, aes(value, fill = series)) + # theme(text = element_text(size=20)) + # scale_fill_manual( values = c("yellow", "red", "blue", "green")) + # geom_line(stat = "density", lty=4, lwd=2) + # ## guides(color = guide_legend(order=1)) + # theme(axis.title.x=element_blank(),axis.text.x=element_blank(), #axis.ticks.x=element_blank()) ggplot(xC_SK_Phases_B_C_D_df, aes(value, fill = series)) + geom_line(aes(colour=series, lty=series), stat = "density", lwd=1.5) + theme(axis.title.x=element_blank(),axis.text.x=element_blank(), axis.ticks.x=element_blank()) # Are the xC (original training) and ift_ft_xC_1sampleN_PhaseC_avg (reconstructed) CORRELATED? # cor(xC, ift_ft_xC_1sampleN_PhaseC_avg) # [1] 0.8872053 # Are the xC (original training) and ift_ft_xC_1sampleN_PhaseD_avg (reconstructed) CORRELATED? # cor(xC, ift_ft_xC_1sampleN_PhaseD_avg) # [1] 0.005248561 # Are the xC (original training) and ift_ft_xC_1sampleN_PhaseB_avg (reconstructed) CORRELATED? # cor(xC, ift_ft_xC_1sampleN_PhaseB_avg) # [1] -0.001070121 ``` Figure 5.3 ```{r} # Plot xC vs. ift_ft_xC_1sampleN_PhaseC_avg plot(xC, ift_ft_xC_1sampleN_PhaseC_avg, xlab = "Original", ylab = "Reconstructed", main = "Spacekime signal reconstruction using \n a single spacetime observation and perfect kime-phases", cex=1.4) abline(lm(ift_ft_xC_1sampleN_PhaseC_avg ~ xC), col="red", lwd=2) text(0, 9, sprintf("Corr(Orig, Rec)=%s", format(cor(scale(ift_ft_xC_1sampleN_PhaseC_avg), scale(xC)), digits=2))) ``` ```{r, fig.height=4, fig.width=6,warning=FALSE} #Additional quantitative measures quantifying the differences between distribution (original signal and spacekime reconstructions) include two-sample Kolmogorov-Smirnov (KS) test and the correlation coefficient. The KS test statistic (D) is the maximum distance between the estimated cumulative distribution functions and the corresponding p-value is the probability of seeing a test statistic as high or higher than the one observed given that the two samples were drawn from the same distribution. In our case, comparing the distributions of the original data and its reconstruction using a single kime magnitude and the correct kime-phases yields a KS statistics D = 0.053875, and p_value= 1.647?e^(-10). This suggests very strong statistical evidence (due to the large sample size) and marginal practical difference between the real and reconstructed signals. Comparing a pair of reconstructions using a single kime-magnitude value and two independent kime-phase estimates (cohorts C and D) yields D = 0.017375, and p_value= 0.1786. ks.test(scale(xC), scale(ift_ft_xC_1sampleN_PhaseC_avg)) # D = 0.053875, p-value = 1.647e-10, alternative hypothesis: two-sided ks.test(scale(ift_ft_xC_1sampleN_PhaseC_avg), scale(ift_ft_xC_1sampleN_PhaseD_avg)) # D = 0.017375, p-value = 0.1786, alternative hypothesis: two-sided # wilcox.test(scale(xC), scale(ift_ft_xC_1sampleN_PhaseC_avg)) # insignificant because of scaling #null_distribution <- function (x) { # if (!is.na(approxfun(density(scale(xC)))(x))) { # return (approxfun(density(scale(xC)))(x)) # } else { return (0) } #} # Cramer-Von Mises Test of Goodness-of-Fit #cvm.test(scale(ift_ft_xC_1sampleN_PhaseC_avg), null_distribution) # omega2 = 1854.8, p-value = 0.2955 # Anderson-Darling Test of Goodness-of-Fit #ad.test(scale(ift_ft_xC_1sampleN_PhaseC_avg), null_distribution) # # Anderson-Darling Test of Goodness-of-Fit ad.test(scale(ift_ft_xC_1sampleN_PhaseC_avg), scale(xC)) # T.AD=19.18; asympt. P-value=8.578e-09 ``` Figure 5.4 ```{r warning=FALSE} plot(density(scale(xC)), col="black", lty=1, lwd=3, xaxt="n", xlab = "Range", main="Spacetime Original vs. Spacekime (SK) Reconstructed Distributions") lines(density(scale(ift_ft_xC_1sampleN_PhaseC_avg)), col="green", lwd=2, lty=2) lines(density(scale(ift_ft_xC_1sampleN_PhaseD_avg)), col="blue", lwd=2, lty=10) lines(density(scale(ift_ft_xC_1sampleN_PhaseB_avg)), col="red", lwd=2, lty=20) legend("topright", legend=c("Original (Mixture of N(0,1) & N(5,3))", "SK Synthesis (1 Mag, Phase=True)", "SK Synthesis (1 Mag, Phase=Indep)", "SK Synthesis (1 Mag, Phase=Diff.Proc.)", paste0(sprintf("\nKS.test(Original, SK(1Mag,Phase=True)), D=%s, p=%s\n", format(ks.test(scale(xC), scale(ift_ft_xC_1sampleN_PhaseC_avg))$statistic[[1]], digits=2), format(ks.test(scale(xC), scale(ift_ft_xC_1sampleN_PhaseC_avg))$p.value, digits=2)), sprintf("KS.test(SK(1Mag,Phase=Indep), SK(1Mag,Phase=True)), p=%s\n", format(ks.test(scale(ift_ft_xC_1sampleN_PhaseD_avg), scale(ift_ft_xC_1sampleN_PhaseC_avg))$p.value, digits=2)), sprintf("Corr(Original, SK(1Mag,Phase=True))=%s", format(cor(scale(ift_ft_xC_1sampleN_PhaseC_avg), scale(xC)), digits=2)))), col=c("black", "green", "blue", "red", "black"), lty=c(1, 2, 10, 20, 0), lwd=2, bty = "n", cex=0.75, y.intersp=0.0, title="Scaled Densities") ``` ```{r} # Skewness # e1071::skewness(xC); e1071::skewness(ift_ft_xC_1sampleN_PhaseC_avg); e1071::skewness(ift_ft_xC_1sampleN_PhaseB_avg); e1071::skewness(ift_ft_xC_1sampleN_PhaseD_avg) # Kurtosis # e1071::kurtosis(xC); e1071::kurtosis(ift_ft_xC_1sampleN_PhaseC_avg); e1071::kurtosis(ift_ft_xC_1sampleN_PhaseB_avg); e1071::kurtosis(ift_ft_xC_1sampleN_PhaseD_avg) raw_data <- data.frame(Original=scale(xC), Phase_True=scale(ift_ft_xC_1sampleN_PhaseB_avg), Phase_Diff.Proc=scale(ift_ft_xC_1sampleN_PhaseC_avg), Phase_Indep=scale(ift_ft_xC_1sampleN_PhaseD_avg)) ``` #### Violin Plot ```{r} melted_data = melt(raw_data) ggplot(melted_data,aes(x=variable, y=value, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1) + theme_bw() ``` #### Empirical CDF ```{r} ggplot(melted_data,aes(x=value, color=variable)) + stat_ecdf(size = 0.5) ``` #### Beanplot ```{r} boxplot(melted_data$value ~ melted_data$variable) beanplot(melted_data$value ~ melted_data$variable, border = "#CAB2D6", col = c("#CAB2D6", "#33A02C", "#B2DF8A"), side="second", add = T) ``` ## Bayesian Inference Simulation Here we relate to `Bayesian inference` to `Spacekime analytics` based on a single (cohort ($A$) spacetime observation $x_{i_o}$ and some prior `kime-phases` (obtained from cohorts $A$, $B$, or $C$). This is accomplished by computing the `prior` or `posterior predictive distribution`. ### Canned academic example (using the `cars` dataset) ```{r} #install.packages("rstanarm") #library("rstanarm") # Docs: https://rdrr.io/cran/rstanarm/man/posterior_predict.stanreg.html # 1. Canned example # if (!exists("example_model")) example(example_model) # yrep <- posterior_predict(example_model) # table(yrep) # 2. Example using sample data (n=10): counts ~ outcome + treatment # counts <- c(18,17,15,20,10,20,25,13,12,15) # outcome <- gl(5,2,10) # treatment <- gl(2,5) # model_fit <- stan_glm(counts ~ outcome + treatment, # family = poisson(link="log"), # prior = normal(0, 1), prior_intercept = normal(0, 5)) # new_data <- data.frame(treatment = factor(rep(1,5)), outcome = factor(1:5)) # Draw from the posterior predictive distribution of the outcome. # ytilde <- posterior_predict(model_fit, new_data, draws = 500) # print(dim(ytilde)) # 500 by 5 matrix (draws by nrow(new_data)) # ytilde <- data.frame(count = c(ytilde), # outcome = rep(new_data$outcome, each = 500)) # ggplot2::ggplot(ytilde, ggplot2::aes(x=outcome, y=count)) + # ggplot2::geom_boxplot() + # ggplot2::ylab("predicted count") # ytilde <- posterior_predict(model_fit, draws = 500) # bayesplot::color_scheme_set("brightblue") # bayesplot::ppc_dens_overlay(counts, ytilde[1:100, ]) # Using the CARS data (mpg ~ wt) mtcars2 <- mtcars # dim(mtcars2) # [1] 32(Automakers) 11(Features) # mtcars2$log_mpg <- log(mtcars2$mpg) # Define outcome of interest model_fit <- stan_glm(mpg ~ wt, data = mtcars2) # Get the posterior predictive distribution as a matrix: # D(number of draws posterior predictive distribution)=500 by # N(number of data points being predicted per draw) ytilde <- posterior_predict(model_fit, draws = 500) # dim(ytilde) # Outcome=mpg, [1] 500(MCMC draws) 32(automakers) head(ytilde) bayesplot::color_scheme_set("brightblue") bayesplot::ppc_dens_overlay(mtcars2$mpg, ytilde[1:100, ]) bayesplot::ppc_hist(mtcars2$mpg, ytilde[1:100, ]) # Plot a histogram of the distribution of various test statistics (e.g., mean, sd) across MCMC draws. # The distributions are computed by applying the statistics to each dataset (draw=row) in *ytilde*. # The blue vertical line overlays the value of the same statistic in the observed data, stat(y=mtcars2$mpg) bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "mean") bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "sd") #bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "range") bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "min") bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "max") # 2D (mean, sd) plot bayesplot::ppc_stat_2d(mtcars2$mpg, ytilde[1:100, ], stat=c("mean", "sd"), size=2.5, alpha=0.7) ``` ### Example of spacekime-analytics as a Bayesian Inference problem (using the fMRI dataset) Recall that we have: * Original data `ift_ft_xC_fMRI` * IFT of the data using C-Phases `ift_ft_xC_fMRI_1sampleN_PhaseC` for all samples of $N=30$ * IFT of the data using B-Phases `ift_ft_xC_fMRI_1sampleN_PhaseB` for all samples of $N=30$, and * IFT of the data using D-Phases `ift_ft_xC_fMRI_1sampleN_PhaseD` for all samples of $N=30$ ```{r} #install.packages("rstanarm") #library("rstanarm") # xC_fMRI_scaled <- scale(xC_fMRI)[ , 1] # length(ift_ft_xC_fMRI) # [1] 8000 # dim(ift_ft_xC_fMRI_1sampleN_PhaseC) # [1] 8000 30 # dim(ift_ft_xC_fMRI_1sampleN_PhaseB) # dim(ift_ft_xC_fMRI_1sampleN_PhaseD) # Get the posterior predictive distribution as a matrix: # D(number of draws posterior predictive distribution)=30 by # N(number of data time points being predicted per draw)=8000 # Inspect the first few spacekime reconstructed draws: Mind the transposition of the tensor! # t(ift_ft_xC_fMRI_1sampleN_PhaseC)[ , 1:7] # 30(draws) 7(first timepoints ony) # the spacekime estimates represent the posterior predictive distribution # ytilde <- t(ift_ft_xC_fMRI_1sampleN_PhaseC + mean(xC_fMRI)) # all 8000 timepoints (columns) # Center/Offset the spacekime estimates to the center of the original data (xC_fMRI) ytilde <- rbind(t(ift_ft_xC_fMRI_1sampleN_PhaseC+mean(ift_ft_xC_fMRI)-apply(ift_ft_xC_fMRI_1sampleN_PhaseC,2,mean)), t(ift_ft_xC_fMRI_1sampleN_PhaseB+mean(ift_ft_xC_fMRI)-apply(ift_ft_xC_fMRI_1sampleN_PhaseB,2,mean)), t(ift_ft_xC_fMRI_1sampleN_PhaseD+mean(ift_ft_xC_fMRI)-apply(ift_ft_xC_fMRI_1sampleN_PhaseD,2,mean))) # dim(ytilde) # Outcome=bimodal process (xC_fMRI), [1] 30(MCMC draws) 8000(timepoints) bayesplot::color_scheme_set("brightblue") ``` ```{r} bayesplot::ppc_dens_overlay(ift_ft_xC_fMRI, ytilde) + xlim(10300,10800) + ylim(0,0.01) ``` ```{r} bayesplot::ppc_hist(ift_ft_xC_fMRI, ytilde[1:11, ]) # Plot a histogram of the distribution of various test statistics (e.g., mean, sd) across MCMC draws. # The distributions are computed by applying the statistics to each dataset (draw=row) in *ytilde*. # The blue vertical line overlays the value of the same statistic in the observed data, stat(y=mtcars2$mpg) bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = "mean") bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = "sd") ``` ```{r} bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = IQR) ``` ```{r} bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = "min") ``` ```{r} bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = "max") ``` ```{r} # bayesplot::ppc_stat(xC_fMRI_scaled, ytilde, stat = e1071::skewness) # bayesplot::ppc_stat(xC_fMRI_scaled, ytilde, stat = e1071::kurtosis) # 2D (mean, sd) plot bayesplot::ppc_stat_2d(ift_ft_xC_fMRI, ytilde, stat=c("mean", "sd"), size=2.5, alpha=0.7) ``` ### Example of spacekime-analytics as a Bayesian Inference problem (using the `bimodal` simulated dataset) Recall that we have: * Original data `xC` * IFT of the data using C-Phases `ift_ft_xC_1sampleN_PhaseC` for all samples of $N=30$ * IFT of the data using B-Phases `ift_ft_xC_1sampleN_PhaseB` for all samples of $N=30$, and * IFT of the data using D-Phases `ift_ft_xC_1sampleN_PhaseD` for all samples of $N=30$ ```{r} # xC_scaled <- scale(xC)[ , 1] # length(xC) # [1] 8000 # dim(ift_ft_xC_1sampleN_PhaseC) # [1] 8000 30 # dim(ift_ft_xC_1sampleN_PhaseB) # dim(ift_ft_xC_1sampleN_PhaseD) # Get the posterior predictive distribution as a matrix: # D(number of draws posterior predictive distribution)=30 by # N(number of data time points being predicted per draw)=8000 # Inspect the first few spacekime reconstructed draws: Mind the transposition of the tensor! # t(ift_ft_xC_1sampleN_PhaseC)[ , 1:7] # 30(draws) 7(first timepoints ony) # the spacekime estimates represent the posterior predictive distribution # ytilde <- t(ift_ft_xC_1sampleN_PhaseC + mean(xC)) # all 8000 timepoints (columns) # Center/Offset the spacekime estimates to the center of the original data (xC) ytilde <- rbind(t(ift_ft_xC_1sampleN_PhaseC+mean(xC)-apply(ift_ft_xC_1sampleN_PhaseC,2,mean)), t(ift_ft_xC_1sampleN_PhaseB+mean(xC)-apply(ift_ft_xC_1sampleN_PhaseB,2,mean)), t(ift_ft_xC_1sampleN_PhaseD+mean(xC)-apply(ift_ft_xC_1sampleN_PhaseD,2,mean))) # dim(ytilde) # Outcome=bimodal process (xC), [1] 30(MCMC draws) 8000(timepoints) bayesplot::color_scheme_set("brightblue") ``` Figure 5.5 ```{r warning=FALSE} bayesplot::ppc_dens_overlay(xC, ytilde) + xlim(-22,25) + ylim(0,0.3) ``` ```{r} bayesplot::ppc_hist(xC, ytilde[1:11, ]) # Plot a histogram of the distribution of various test statistics (e.g., mean, sd) across MCMC draws. # The distributions are computed by applying the statistics to each dataset (draw=row) in *ytilde*. # The blue vertical line overlays the value of the same statistic in the observed data, stat(y=mtcars2$mpg) bayesplot::ppc_stat(xC, ytilde, stat = "mean") bayesplot::ppc_stat(xC, ytilde, stat = "sd") ``` ```{r} bayesplot::ppc_stat(xC, ytilde, stat = IQR) ``` ```{r} bayesplot::ppc_stat(xC, ytilde, stat = "min") ``` ```{r} bayesplot::ppc_stat(xC, ytilde, stat = "max") ``` ```{r} # bayesplot::ppc_stat(xC_scaled, ytilde, stat = e1071::skewness) # bayesplot::ppc_stat(xC_scaled, ytilde, stat = e1071::kurtosis) # 2D (mean, sd) plot bayesplot::ppc_stat_2d(xC, ytilde, stat=c("mean", "sd"), size=2.5, alpha=0.7) ``` For the [MCSI data analytics](https://data.sca.isr.umich.edu/), see the separate script: `UM_Michigan_Consumer_Sentiment_Index.R`.