For all natural spacetime processes, various population characteristics like the mean, variance, range, and quantiles can be estimated by collecting independent and identically distributed (IID) samples. These samples represent observed data that are traditionally used to obtain sample-driven estimates of the specific population characteristics via standard formulas like the sample arithmetic average, variance, range, quantiles, etc. The latter approximate their population counterparts and form the basis for classical parametric and non-parametric statistical inference.
Typically, reliable spacetime statistical inference is conditional on the distribution of the native process as well as a sample size reflecting the characteristics of the phenomenon. We will demonstrate that spacekime analytics can be equally effective when measuring a single spacetime observation, provided we have reasonable estimates of the unobserved process kime-phases.
Without loss of generality, suppose we have a pair of cohorts \(A\) and \(B\) for which we obtain series of measurements \(\{X_{A,i}\}_{i=1}^{n_A}\) and \(\{X_{B,i}\}_{i=1}^{n_B}\), respectively. Obviously, the relationship between the cohorts could vary widely, from samples of the same process, to related processes, to completely independent processes.
This lets us examine the extreme cases of pairing: (1) independent and differently distributed cohorts (\(A\) and \(B\)), and (2) IID cohorts obtained by splitting the first cohort (\(A\)) into training (\(C\)) and testing (\(D\)) subgroups. This design allows us to compare the classical spacetime-derived population characteristics of cohort \(A\) to their spacekime-reconstructed counterparts obtained using a single random kime-magnitude observation from \(A\) and kime-phase estimates derived from cohorts \(B\), \(C\), or \(D\).
The demonstration below is based on functional magnetic resonance imaging (fMRI) data, a 4D hypervolume with intensities representing the blood oxygenation level dependent (BOLD) signal at a specific spacetime location \((x,y,z,t)\). For simplicity, we will only focus on two fixed spatial locations with varying intensity distributions.
# Load the required packages (brainR imports oro.nifti, which provides readNIfTI())
library(EBImage)
require(brainR)
library(spatstat)
library(ggplot2)
library(kSamples)
library(reshape2)
library(beanplot)
library(rstanarm)
fMRIURL <- "http://socr.umich.edu/HTML5/BrainViewer/data/fMRI_FilteredData_4D.nii.gz"
fMRIFile <- file.path(tempdir(), "fMRI_FilteredData_4D.nii.gz")
download.file(fMRIURL, dest=fMRIFile, quiet=TRUE)
(fMRIVolume <- readNIfTI(fMRIFile, reorient=FALSE))
## NIfTI-1 format
## Type : nifti
## Data Type : 4 (INT16)
## Bits per Pixel : 16
## Slice Code : 0 (Unknown)
## Intent Code : 0 (None)
## Qform Code : 1 (Scanner_Anat)
## Sform Code : 0 (Unknown)
## Dimension : 64 x 64 x 21 x 180
## Pixel Dimension : 4 x 4 x 6 x 3
## Voxel Units : mm
## Time Units : sec
# dimensions: 64 x 64 x 21 x 180 ; 4mm x 4mm x 6mm x 3 sec
fMRIVolDims <- dim(fMRIVolume); fMRIVolDims
## [1] 64 64 21 180
# time_dim <- fMRIVolDims[4]; time_dim ## 180
# 2. Extract the 1D time-courses at two voxels in the mid-axial slice of the 4D hypervolume
xA_fMRI_1D_x20_y20_z11 <- fMRIVolume[20, 20, 11, ]; length(xA_fMRI_1D_x20_y20_z11) # 180
## [1] 180
xB_fMRI_1D_x30_y30_z13 <- fMRIVolume[30, 30, 13, ]; length(xB_fMRI_1D_x30_y30_z13) # 180
## [1] 180
# hist(xB_fMRI_1D_x30_y30_z13)
# Combine the two 1D time series into one data frame for joint density plotting.
# First add a column to each series identifying its source cohort.
xA_df <- as.data.frame(xA_fMRI_1D_x20_y20_z11)
colnames(xA_df) <- "value"; xA_df$cohort <- "xA"
xB_df <- as.data.frame(xB_fMRI_1D_x30_y30_z13)
colnames(xB_df) <- "value"; xB_df$cohort <- "xB"
# Combine into a single data frame xA_xB_df
xA_xB_df <- rbind(xA_df, xB_df)
ggplot(xA_xB_df, aes(value, fill = cohort)) +
geom_density(alpha = 0.5, size=1.2) +
theme(text = element_text(size=20)) +
xlim(c(10200, 12000))
Clearly the intensities of cohorts \(A\) and \(B\) are independent and follow different distributions. We'll split the first cohort (\(A\)) into training (\(C\)) and testing (\(D\)) subgroups, and then follow the five steps annotated in the code below.
# Generic function to Transform Data to/from k-space (Space/Fourier domain)
kSpaceTransform <- function(data, inverse = FALSE, reconPhases = NULL) {
# ForwardFT (rawData, FALSE, NULL)
# InverseFT(magnitudes, TRUE, reconPhasesToUse) or InverseFT(FT_data, TRUE, NULL)
FT_data <- array(complex(), length(data))
mag_FT_data <- array(complex(), length(data))
phase_FT_data <- array(complex(), length(data))
IFT_reconPhases_data <- array(complex(), length(data))
if (inverse == FALSE || is.null(reconPhases)) {
FT_data <- fft(data, inverse)
X2 <- FT_data
mag_FT_data <- sqrt(Re(X2)^2+Im(X2)^2)
phase_FT_data <- atan2(Im(X2), Re(X2))
}
else { # for IFT synthesis using user-provided Phases, typically from kime-phase aggregators
Real <- data * cos(reconPhases)
Imaginary <- data * sin(reconPhases)
IFT_reconPhases_data <-
Re(fft(Real+1i*Imaginary, inverse = TRUE)/length(data))
}
######### Test the FT-IFT analysis-synthesis back-and-forth transform process
# to confirm calculations
# X2 <- FT_data[ , 1]; mag_FT_data[ , 1] <- sqrt(Re(X2)^2+Im(X2)^2);
# phase_FT_data[ , 1] <- atan2(Im(X2), Re(X2));
# Real2 = mag_FT_data[ , 1] * cos(phase_FT_data[ , 1])
# Imaginary2 = mag_FT_data[ , 1] * sin(phase_FT_data[ , 1])
# man_hat_X2 = Re(fft(Real2 + 1i*Imaginary2, inverse = T)/length(X2))
# ifelse(abs(man_hat_X2[5] - data[5, 1]) < 0.001, "Perfect Synthesis", "Problems!!!")
#########
if (inverse == FALSE || is.null(reconPhases)) {
return(list("magnitudes"=mag_FT_data, "phases"=phase_FT_data))
# Use kSpaceTransform$magnitudes & kSpaceTransform$phases to retrieve the magnitudes and phases
}
else {
return(IFT_reconPhases_data)
# Use Re(kSpaceTransform) to extract spacetime Real-valued reconstructed data
}
}
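As a quick sanity check, here is a minimal round-trip sketch (using a hypothetical test signal y, not part of the original demo): the forward call returns the Fourier magnitudes and phases, and feeding both back through the inverse call should reproduce the signal up to floating-point roundoff.
y <- sin(seq(0, 2*pi, length.out = 64))    # hypothetical test signal
fwd <- kSpaceTransform(y, inverse = FALSE) # list with $magnitudes and $phases
y_hat <- kSpaceTransform(fwd$magnitudes, inverse = TRUE, reconPhases = fwd$phases)
max(abs(y - y_hat))                        # should be ~ 0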
# 1. Split the first cohort ($A$) into *training* ($C$) and *testing* ($D$) subgroups.
subset_int <- sample(length(xA_df$value),floor(length(xA_df$value)*0.8))
# 80% training + 20% testing
xC_fMRI_train <- xA_df$value[subset_int]; length(xC_fMRI_train) # 144
## [1] 144
xD_test <- xA_df$value[-subset_int]; length(xD_test) # 36
## [1] 36
# 2. Transform all four cohorts into Fourier k-space
# xA, xB, xC_fMRI_train; xD_test
xA <- xA_fMRI_1D_x20_y20_z11; length(xA) # 180
## [1] 180
xB <- xB_fMRI_1D_x30_y30_z13; length(xB) # 180
## [1] 180
ft_xA <- fft(xA); ft_xB <- fft(xB)
ft_xC_fMRI_train <- fft(xC_fMRI_train); ft_xD_test <- fft(xD_test);
# Magnitudes and Phases: Magnitude <- sqrt(Re^2 + Im^2); Phase <- atan2(Im, Re)
mag_ft_xA <- sqrt(Re(ft_xA)^2+Im(ft_xA)^2)
mag_ft_xB <- sqrt(Re(ft_xB)^2+Im(ft_xB)^2)
mag_ft_xC_fMRI_train <- sqrt(Re(ft_xC_fMRI_train)^2+Im(ft_xC_fMRI_train)^2)
mag_ft_xD_test <- sqrt(Re(ft_xD_test)^2+Im(ft_xD_test)^2)
phase_ft_xA <- atan2(Im(ft_xA), Re(ft_xA))
phase_ft_xB <- atan2(Im(ft_xB), Re(ft_xB))
phase_ft_xC_fMRI_train <- atan2(Im(ft_xC_fMRI_train), Re(ft_xC_fMRI_train))
phase_ft_xD_test <- atan2(Im(ft_xD_test), Re(ft_xD_test))
# Double-check that IFT(FT(x)) == x by explicitly inverting the FT
fftinv <- function( x ) { fft( x, inverse=TRUE ) / length( x ) }
head(Re(fftinv(ft_xA))); head(xA)
## [1] 10620 10483 10519 10661 10598 10780
## [1] 10620 10483 10519 10661 10598 10780
# 3. Iteratively draw single observations from cohort $C$
N <- 30 # 30 simulations
# take a random sample of size N (without replacement) from $C$
N_sampleIndx <- sample(1:length(xC_fMRI_train), N, replace=FALSE)
xC_fMRI_sampleN <- xC_fMRI_train[N_sampleIndx]
ft_xC_fMRI_sampleN_mag <- mag_ft_xC_fMRI_train[N_sampleIndx]
# 4. Reconstruct the $C$ data into spacetime using a single ft_xC_fMRI_sampleN_mag value and
# alternative kime-phase estimates derived from cohorts $B$ and $D$.
# For each ft_xC_fMRI_sampleN_mag[i] value, use the $B$ and $D$ phases to reconstruct
# ift_ft_xC_fMRI_1sampleN_PhaseB and ift_ft_xC_fMRI_1sampleN_PhaseD.
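# Note (sketch of the synthesis formula): each column i below is the inverse FT of a
# constant-magnitude spectrum with substituted phases, i.e.,
#   x_hat(t) = (1/T) * sum_k m_i * exp(1i*(phi_k + 2*pi*k*t/T)),
# so the sampled magnitude m_i only sets the overall scale; the temporal shape of the
# reconstruction comes entirely from the kime-phases phi_k.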
ift_ft_xC_fMRI_1sampleN_PhaseB <-
array(dim=c(length(xC_fMRI_train), length(xC_fMRI_sampleN)))
ift_ft_xC_fMRI_1sampleN_PhaseD <-
array(dim=c(length(xC_fMRI_train),length(xC_fMRI_sampleN)))
ift_ft_xC_fMRI_1sampleN_PhaseC <-
array(dim=c(length(xC_fMRI_train),length(xC_fMRI_sampleN)))
ift_ft_xC_fMRI <- array(dim=length(xC_fMRI_train))
dim(ift_ft_xC_fMRI_1sampleN_PhaseB) # [1] Time=144 Samples_N=30
## [1] 144 30
for (i in 1:N) {
ift_ft_xC_fMRI_1sampleN_PhaseB[ , i] <-
Re(kSpaceTransform(rep(ft_xC_fMRI_sampleN_mag[i], length(xC_fMRI_train)),
TRUE, phase_ft_xB[1:length(xC_fMRI_train)]))
ift_ft_xC_fMRI_1sampleN_PhaseD[ , i] <-
Re(kSpaceTransform(rep(ft_xC_fMRI_sampleN_mag[i],
length(phase_ft_xD_test)), TRUE,
phase_ft_xD_test[1:length(phase_ft_xD_test)]))
ift_ft_xC_fMRI_1sampleN_PhaseC[ , i] <-
Re(kSpaceTransform(rep(ft_xC_fMRI_sampleN_mag[i], length(xC_fMRI_train)),
TRUE, phase_ft_xC_fMRI_train[1:length(xC_fMRI_train)]))
}
ift_ft_xC_fMRI <- Re(kSpaceTransform(mag_ft_xC_fMRI_train, TRUE,
phase_ft_xC_fMRI_train[1:length(xC_fMRI_train)]))
head(xC_fMRI_train) == head(ift_ft_xC_fMRI)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
# 5. Compute and compare the *classical spacetime-derived* population characteristics of cohort $A$ to their counterparts obtained using a single $C$ kime-radial measurement paired with $B$ and $D$ kime-phases.
# Data = xC_fMRI_train, ift_ft_xC_fMRI_1sampleN_PhaseB, ift_ft_xC_fMRI_1sampleN_PhaseD
length(xC_fMRI_train) == length(ift_ft_xC_fMRI_1sampleN_PhaseB[ , 1])
## [1] TRUE
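# (The four summaries below are summary(scale(...)) outputs for the original series and
# the three spacekime reconstructions; the generating calls were elided in this rendering.)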
## V1
## Min. :-2.27528
## 1st Qu.:-0.72802
## Median :-0.01324
## Mean : 0.00000
## 3rd Qu.: 0.69867
## Max. : 3.53484
## V1
## Min. :-2.3609
## 1st Qu.:-0.6187
## Median : 0.0327
## Mean : 0.0000
## 3rd Qu.: 0.7080
## Max. : 3.4877
## V1
## Min. :-3.32577
## 1st Qu.:-0.56299
## Median :-0.03926
## Mean : 0.00000
## 3rd Qu.: 0.60848
## Max. : 2.56786
## V1
## Min. :-2.72484
## 1st Qu.:-0.62653
## Median : 0.04799
## Mean : 0.00000
## 3rd Qu.: 0.68595
## Max. : 2.43560
# Plot all histograms as densities
ift_ft_xC_fMRI_1sampleN_PhaseC_df <-
as.data.frame(scale(ift_ft_xC_fMRI_1sampleN_PhaseC))
# colnames(xA_df) <- "value"; xA_df$cohort <- "xA"
# xA_scale_df <- as.data.frame(scale(xA_df$value))
# colnames(xA_scale_df) <- "value"; xA_scale_df$cohort <- "xA"
xB_df <- as.data.frame(scale(xB_fMRI_1D_x30_y30_z13))
colnames(xB_df) <- "value"; xB_df$cohort <- "xB"
#
# Combine into a single data frame (note: xB_df is scaled here while xA_df is not, so the
# scaled xB values fall outside the xlim below, triggering the row-removal warning)
xA_xB_df <- rbind(xA_df, xB_df)
ggplot(xA_xB_df, aes(value, fill = cohort)) +
geom_density(alpha = 0.5, size=1.2) +
theme(text = element_text(size=20)) +
xlim(c(10200, 12000))
## Warning: Removed 180 rows containing non-finite values (stat_density).
length(xC_fMRI_train); dim(ift_ft_xC_fMRI_1sampleN_PhaseB); dim(ift_ft_xC_fMRI_1sampleN_PhaseC); dim(ift_ft_xC_fMRI_1sampleN_PhaseD)
## [1] 144
## [1] 144 30
## [1] 144 30
## [1] 144 30
# Compute the averages across all N=30 experiments for the B, C & D reconstructions
ift_ft_xC_fMRI_1sampleN_PhaseB_avg <- apply(ift_ft_xC_fMRI_1sampleN_PhaseB, 1, mean)
ift_ft_xC_fMRI_1sampleN_PhaseC_avg <- apply(ift_ft_xC_fMRI_1sampleN_PhaseC, 1, mean)
ift_ft_xC_fMRI_1sampleN_PhaseD_avg <- apply(ift_ft_xC_fMRI_1sampleN_PhaseD, 1, mean)
# Plot 4 density curves (orig=xC_fMRI and 3 reconstructions from B, C and D)
xC_fMRI_train_scale_df <- as.data.frame(scale(xC_fMRI_train))
colnames(xC_fMRI_train_scale_df) <- "value"; xC_fMRI_train_scale_df$series <- "xC_fMRI_original"
ift_ft_xC_fMRI_1sampleN_PhaseB_avg_scale_df <- as.data.frame(scale(ift_ft_xC_fMRI_1sampleN_PhaseB_avg))
colnames(ift_ft_xC_fMRI_1sampleN_PhaseB_avg_scale_df) <- "value"
ift_ft_xC_fMRI_1sampleN_PhaseB_avg_scale_df$series <- "SK_PhaseB"
ift_ft_xC_fMRI_1sampleN_PhaseC_avg_scale_df <- as.data.frame(scale(ift_ft_xC_fMRI_1sampleN_PhaseC_avg))
colnames(ift_ft_xC_fMRI_1sampleN_PhaseC_avg_scale_df) <- "value"
ift_ft_xC_fMRI_1sampleN_PhaseC_avg_scale_df$series <- "SK_PhaseC"
ift_ft_xC_fMRI_1sampleN_PhaseD_avg_scale_df <- as.data.frame(scale(ift_ft_xC_fMRI_1sampleN_PhaseD_avg))
colnames(ift_ft_xC_fMRI_1sampleN_PhaseD_avg_scale_df) <- "value"
ift_ft_xC_fMRI_1sampleN_PhaseD_avg_scale_df$series <- "SK_PhaseD"
# Combine into a single data frame xC_fMRI_SK_Phases_B_C_D_df
xC_fMRI_SK_Phases_B_C_D_df <- rbind(xC_fMRI_train_scale_df, ift_ft_xC_fMRI_1sampleN_PhaseB_avg_scale_df,
ift_ft_xC_fMRI_1sampleN_PhaseC_avg_scale_df, ift_ft_xC_fMRI_1sampleN_PhaseD_avg_scale_df)
# library(ggplot2)
ggplot(xC_fMRI_SK_Phases_B_C_D_df, aes(value, fill = series)) +
geom_density(aes(color=series, linetype = series), alpha=0.4, size=1.2) + # position = "stack"
theme(text = element_text(size=20)) +
scale_fill_manual( values = c("yellow", "red", "blue", "green")) +
geom_line(data=xC_fMRI_train_scale_df, stat = "density", color="purple", lty=4, lwd=2) +
## guides(color = guide_legend(order=1)) +
theme(axis.title.x=element_blank(),axis.text.x=element_blank(), axis.ticks.x=element_blank())
ggplot(xC_fMRI_SK_Phases_B_C_D_df,aes(x=series, y=value, fill=series)) +
geom_violin(trim=FALSE) +
geom_boxplot(width=0.1) +
theme_bw()
The following simulation example generates two random samples of \(n=10{,}000\) observations each from mixture distributions: \(\{X_{A,i}\}_{i=1}^{n_A}\), where each \(X_{A,i}\) is drawn from \(U_i \sim N(0,1)\) with probability \(0.3\) and from \(V_i \sim N(5,3)\) with probability \(0.7\), and \(\{X_{B,i}\}_{i=1}^{n_B}\), where each \(X_{B,i}\) is drawn from \(P_i \sim N(20,20)\) with probability \(0.4\) and from \(Q_i \sim N(100,30)\) with probability \(0.6\).
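Before simulating, a quick hedged check (not part of the original code) of the theoretical moments of the \(X_A\) mixture: \(E[X_A]=0.3\cdot 0+0.7\cdot 5=3.5\) and \(Var[X_A]=0.3(1^2+0^2)+0.7(3^2+5^2)-3.5^2=11.85\), which the simulated sample should roughly match (up to the seed-reuse quirk noted in the function below).
# Theoretical mixture moments for X_A (deterministic arithmetic)
c(mean = 0.3*0 + 0.7*5,
var = 0.3*(1^2 + 0^2) + 0.7*(3^2 + 5^2) - (0.3*0 + 0.7*5)^2)
##  mean   var 
##  3.50 11.85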
n=10000
mu1 <- 0; mu2 <- 5
sig1 <- 1; sig2 <- 3
weight <- 0.7
mixedDistFunc <- function (n, weight, mu1, mu2, sig1, sig2) {
# Note: reusing set.seed(1234) for U, V, and the mixing indicator makes the three
# draws mutually dependent; the seeds are kept as-is to reproduce the outputs below
set.seed(1234); U <- rnorm(n, mean=mu1, sd = sig1)
set.seed(1234); V <- rnorm(n, mean=mu2, sd = sig2)
# randomly choose U or V for each observation
set.seed(1234); wght <- rbinom(n, size=1, prob=weight)
X <- U*(1 - wght) + V*wght
return(X)
}
xA <- mixedDistFunc(n=n, weight, mu1, mu2, sig1, sig2)
# Generate cohort B from the second mixture: 0.4*N(20,20) + 0.6*N(100,30)
xB <- mixedDistFunc(n=n, weight=0.6, mu1=20, mu2=100, sig1=20, sig2=30)
hist(xA, freq = F)
length(xA); length(xB)
## [1] 10000
## [1] 10000
# Combine the two univariate samples into one data frame for joint density plotting.
# First add a column to each identifying its source cohort.
xA_df <- as.data.frame(xA); colnames(xA_df)<-"value"; xA_df$cohort<-"xA"
xB_df <- as.data.frame(xB); colnames(xB_df)<-"value"; xB_df$cohort<-"xB"
# Combine into a single data frame xA_xB_df
xA_xB_df <- rbind(xA_df, xB_df)
Figure 4.1
ggplot(xA_xB_df, aes(value, fill = cohort)) +
geom_density(alpha = 0.5, size=1.2) +
theme(text = element_text(size=20))
Clearly the intensities of cohorts \(A\) and \(B\) are independent and follow different distributions. We'll again split the first cohort (\(A\)) into training (\(C\)) and testing (\(D\)) subgroups, and then repeat the five steps annotated in the code below.
# Reuse the generic kSpaceTransform() function defined in the fMRI example above.
# 1. Split the first cohort ($A$) into *training* ($C$) and *testing* ($D$) subgroups.
subset_int <- sample(length(xA_df$value),floor(length(xA_df$value)*0.8))
# 80% training + 20% testing
xC <- xA_df$value[subset_int]; length(xC) # 8000
## [1] 8000
xD <- xA_df$value[-subset_int]; length(xD) # 2000
## [1] 2000
# 2. Transform all four cohorts into Fourier k-space
# xA, xB, xC (train), xD (test)
ft_xA <- fft(xA); ft_xB <- fft(xB)
ft_xC <- fft(xC); ft_xD <- fft(xD)
# Magnitudes and Phases: Magnitude <- sqrt(Re^2 + Im^2); Phase <- atan2(Im, Re)
mag_ft_xA <- sqrt(Re(ft_xA)^2+Im(ft_xA)^2)
mag_ft_xB <- sqrt(Re(ft_xB)^2+Im(ft_xB)^2)
mag_ft_xC <- sqrt(Re(ft_xC)^2+Im(ft_xC)^2)
mag_ft_xD <- sqrt(Re(ft_xD)^2+Im(ft_xD)^2)
phase_ft_xA <- atan2(Im(ft_xA), Re(ft_xA))
phase_ft_xB <- atan2(Im(ft_xB), Re(ft_xB))
phase_ft_xC <- atan2(Im(ft_xC), Re(ft_xC))
phase_ft_xD <- atan2(Im(ft_xD), Re(ft_xD))
# Double-check that IFT(FT(x)) == x by explicitly inverting the FT
fftinv <- function( x ) { fft( x, inverse=TRUE ) / length( x ) }
head(Re(fftinv(ft_xA))); head(xA)
## [1] 1.3788028 5.8322877 8.2533235 -2.0370931 0.4291247 6.5181677
## [1] 1.3788028 5.8322877 8.2533235 -2.0370931 0.4291247 6.5181677
# 3. Iteratively draw single observations from cohort $C$
N <- 30 # 30 simulations
# take a random sample of size N (without replacement) from $C$
set.seed(1234); N_sampleIndx <- sample(1:length(xC), N, replace=FALSE)
xC_sampleN <- xC[N_sampleIndx]
ft_xC_mag <- mag_ft_xC[N_sampleIndx]
# 4. Reconstruct the $C$ data into spacetime using a single ft_xC_mag value and
# alternative kime-phase estimates derived from cohorts $B$ and $D$.
# For each ft_xC_mag[i] value, use the $B$ and $D$ phases to reconstruct
# ift_ft_xC_1sampleN_PhaseB and ift_ft_xC_1sampleN_PhaseD.
ift_ft_xC_1sampleN_PhaseB <-
array(dim=c(length(xC), length(xC_sampleN)))
ift_ft_xC_1sampleN_PhaseD <-
array(dim=c(length(xC), length(xC_sampleN)))
ift_ft_xC_1sampleN_PhaseC <-
array(dim=c(length(xC), length(xC_sampleN)))
ift_ft_xC <- array(dim=length(xC))
dim(ift_ft_xC_1sampleN_PhaseB) # [1] Size=8000 Samples_N=30
## [1] 8000 30
for (i in 1:N) {
ift_ft_xC_1sampleN_PhaseB[ , i] <-
Re(kSpaceTransform(rep(ft_xC_mag[i], length(xC)),
TRUE, phase_ft_xB[1:length(xC)]))
ift_ft_xC_1sampleN_PhaseD[ , i] <-
Re(kSpaceTransform(rep(ft_xC_mag[i], length(phase_ft_xD)),
TRUE, phase_ft_xD[1:length(phase_ft_xD)]))
ift_ft_xC_1sampleN_PhaseC[ , i] <-
Re(kSpaceTransform(rep(ft_xC_mag[i], length(xC)),
TRUE, phase_ft_xC[1:length(xC)]))
}
ift_ft_xC <- Re(kSpaceTransform(mag_ft_xC, TRUE,phase_ft_xC[1:length(xC)]))
head(xC) - head(ift_ft_xC) # roundoff should be ~ 0
## [1] -2.664535e-15 -1.554312e-15 -4.440892e-16 -5.773160e-15 0.000000e+00
## [6] -8.881784e-16
# 5. Compute and compare the *classical spacetime-derived* population characteristics of cohort $A$ to their counterparts obtained using a single $C$ kime-radial measurement paired with $B$ and $D$ kime-phases.
# Data = xC_train, ift_ft_xC_1sampleN_PhaseB, ift_ft_xC_1sampleN_PhaseD
length(xC) == length(ift_ft_xC_1sampleN_PhaseB[ , 1])
## [1] TRUE
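# (The four summaries below are summary(scale(...)) outputs for the original series and
# the three spacekime reconstructions; the generating calls were elided in this rendering.)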
## V1
## Min. :-2.38784
## 1st Qu.:-0.88609
## Median :-0.03893
## Mean : 0.00000
## 3rd Qu.: 0.75821
## Max. : 3.59925
## V1
## Min. :-2.52901
## 1st Qu.:-0.76221
## Median :-0.05584
## Mean : 0.00000
## 3rd Qu.: 0.72999
## Max. : 3.73114
## V1
## Min. :-3.798440
## 1st Qu.:-0.636799
## Median : 0.009279
## Mean : 0.000000
## 3rd Qu.: 0.645119
## Max. : 3.986702
## V1
## Min. :-2.66007
## 1st Qu.:-0.79651
## Median :-0.08165
## Mean : 0.00000
## 3rd Qu.: 0.73477
## Max. : 3.39448
# Plot all histograms as densities
ift_ft_xC_1sampleN_PhaseC_df <-
as.data.frame(scale(ift_ft_xC_1sampleN_PhaseC))
# colnames(xA_df) <- "value"; xA_df$cohort <- "xA"
# xA_scale_df <- as.data.frame(scale(xA_df$value))
# colnames(xA_scale_df) <- "value"; xA_scale_df$cohort <- "xA"
xB_df <- as.data.frame(scale(xB))
colnames(xB_df) <- "value"; xB_df$cohort <- "xB"
#
# Combine into a single data frame (note: xB_df is scaled here while xA_df is not,
# so the two densities are plotted on different scales)
xA_xB_df <- rbind(xA_df, xB_df)
ggplot(xA_xB_df, aes(value, fill = cohort)) +
geom_density(alpha = 0.5, size=1.2) +
theme(text = element_text(size=20))
length(xC); dim(ift_ft_xC_1sampleN_PhaseB); dim(ift_ft_xC_1sampleN_PhaseC); dim(ift_ft_xC_1sampleN_PhaseD)
## [1] 8000
## [1] 8000 30
## [1] 8000 30
## [1] 8000 30
# Compute the averages across all N=30 experiments for the B, C & D spacekime (IFT) reconstructions
ift_ft_xC_1sampleN_PhaseB_avg <- apply(ift_ft_xC_1sampleN_PhaseB, 1, mean)
ift_ft_xC_1sampleN_PhaseC_avg <- apply(ift_ft_xC_1sampleN_PhaseC, 1, mean)
ift_ft_xC_1sampleN_PhaseD_avg <- apply(ift_ft_xC_1sampleN_PhaseD, 1, mean)
# Plot 4 density curves (orig=xC and 3 reconstructions from B, C and D)
# (note: unlike the fMRI example, these series are kept on their original scale here)
xC_scale_df <- as.data.frame(xC)
colnames(xC_scale_df) <- "value"; xC_scale_df$series <- "xC_original"
ift_ft_xC_1sampleN_PhaseB_avg_scale_df <-
as.data.frame(ift_ft_xC_1sampleN_PhaseB_avg)
colnames(ift_ft_xC_1sampleN_PhaseB_avg_scale_df) <- "value"
ift_ft_xC_1sampleN_PhaseB_avg_scale_df$series <- "SK_PhaseB"
ift_ft_xC_1sampleN_PhaseC_avg_scale_df <-
as.data.frame(ift_ft_xC_1sampleN_PhaseC_avg)
colnames(ift_ft_xC_1sampleN_PhaseC_avg_scale_df) <- "value"
ift_ft_xC_1sampleN_PhaseC_avg_scale_df$series <- "SK_PhaseC"
ift_ft_xC_1sampleN_PhaseD_avg_scale_df <-
as.data.frame(ift_ft_xC_1sampleN_PhaseD_avg)
colnames(ift_ft_xC_1sampleN_PhaseD_avg_scale_df) <- "value"
ift_ft_xC_1sampleN_PhaseD_avg_scale_df$series <- "SK_PhaseD"
# Combine into a new data frame xC_SK_Phases_B_C_D_df
xC_SK_Phases_B_C_D_df <- rbind(xC_scale_df, ift_ft_xC_1sampleN_PhaseB_avg_scale_df,
ift_ft_xC_1sampleN_PhaseC_avg_scale_df, ift_ft_xC_1sampleN_PhaseD_avg_scale_df)
# library(ggplot2)
ggplot(xC_SK_Phases_B_C_D_df, aes(value, fill = series)) +
geom_density(aes(color=series, linetype = series), alpha=0.4, size=1.2) + # position = "stack"
theme(text = element_text(size=20)) +
scale_fill_manual( values = c("yellow", "red", "blue", "green")) +
geom_line(data=xC_scale_df, stat = "density", color="purple", lty=4, lwd=2) +
## guides(color = guide_legend(order=1)) +
theme(axis.title.x=element_blank(),axis.text.x=element_blank(), axis.ticks.x=element_blank()) # +
# xlim(c(-1, 1))
# theme(legend.position="bottom")
#ggplot(xC_SK_Phases_B_C_D_df, aes(value, fill = series)) +
# theme(text = element_text(size=20)) +
# scale_fill_manual( values = c("yellow", "red", "blue", "green")) +
# geom_line(stat = "density", lty=4, lwd=2) +
# ## guides(color = guide_legend(order=1)) +
# theme(axis.title.x=element_blank(),axis.text.x=element_blank(), #axis.ticks.x=element_blank())
ggplot(xC_SK_Phases_B_C_D_df, aes(value, fill = series)) +
geom_line(aes(colour=series, lty=series), stat = "density", lwd=1.5) +
theme(axis.title.x=element_blank(),axis.text.x=element_blank(),
axis.ticks.x=element_blank())
# Are the xC (original training) and ift_ft_xC_1sampleN_PhaseC_avg (reconstructed) CORRELATED?
cor(xC, ift_ft_xC_1sampleN_PhaseC_avg) # [1] 0.8872053
## [1] 0.8844888
# Are the xC (original training) and ift_ft_xC_1sampleN_PhaseD_avg (reconstructed) CORRELATED?
cor(xC, ift_ft_xC_1sampleN_PhaseD_avg) # [1] 0.005248561
## [1] 0.01011428
# Are the xC (original training) and ift_ft_xC_1sampleN_PhaseB_avg (reconstructed) CORRELATED?
cor(xC, ift_ft_xC_1sampleN_PhaseB_avg) # [1] -0.001070121
## [1] 0.001782234
Figure 4.2
# Plot xC vs. ift_ft_xC_1sampleN_PhaseC_avg
plot(xC, ift_ft_xC_1sampleN_PhaseC_avg, xlab = "Original", ylab = "Reconstructed",
main = "Spacekime signal reconstruction using \n a single spacetime observation and perfect kime-phases",
cex=1.4)
abline(lm(ift_ft_xC_1sampleN_PhaseC_avg ~ xC), col="red", lwd=2)
text(0, 9, sprintf("Corr(Orig, Rec)=%s",
format(cor(scale(ift_ft_xC_1sampleN_PhaseC_avg), scale(xC)), digits=2)))
# Additional quantitative measures of the differences between distributions (original signal
# and spacekime reconstructions) include the two-sample Kolmogorov-Smirnov (KS) test and the
# correlation coefficient. The KS test statistic (D) is the maximum distance between the
# empirical cumulative distribution functions, and the corresponding p-value is the probability
# of seeing a test statistic as high or higher than the one observed, given that the two samples
# were drawn from the same distribution. In our case, comparing the distributions of the original
# data and its reconstruction using a single kime-magnitude and the correct kime-phases yields
# a KS statistic D = 0.053875 and p-value = 1.647e-10 (from one run; cf. the fresh run below).
# This suggests very strong statistical evidence (due to the large sample size) yet marginal
# practical difference between the real and reconstructed signals. Comparing a pair of
# reconstructions using a single kime-magnitude value and two independent kime-phase estimates
# (cohorts C and D) yields D = 0.017375 and p-value = 0.1786.
ks.test(scale(xC), scale(ift_ft_xC_1sampleN_PhaseC_avg))
##
## Two-sample Kolmogorov-Smirnov test
##
## data: scale(xC) and scale(ift_ft_xC_1sampleN_PhaseC_avg)
## D = 0.05775, p-value = 5.174e-12
## alternative hypothesis: two-sided
# D = 0.053875, p-value = 1.647e-10, alternative hypothesis: two-sided
ks.test(scale(ift_ft_xC_1sampleN_PhaseC_avg), scale(ift_ft_xC_1sampleN_PhaseD_avg))
## Warning in ks.test(scale(ift_ft_xC_1sampleN_PhaseC_avg),
## scale(ift_ft_xC_1sampleN_PhaseD_avg)): p-value will be approximate in the
## presence of ties
##
## Two-sample Kolmogorov-Smirnov test
##
## data: scale(ift_ft_xC_1sampleN_PhaseC_avg) and scale(ift_ft_xC_1sampleN_PhaseD_avg)
## D = 0.014625, p-value = 0.3592
## alternative hypothesis: two-sided
# D = 0.017375, p-value = 0.1786, alternative hypothesis: two-sided
# wilcox.test(scale(xC), scale(ift_ft_xC_1sampleN_PhaseC_avg)) # insignificant because of scaling
#library("goftest")
#null_distribution <- function (x) {
# if (!is.na(approxfun(density(scale(xC)))(x))) {
# return (approxfun(density(scale(xC)))(x))
# } else { return (0) }
#}
# Cramer-Von Mises Test of Goodness-of-Fit
#cvm.test(scale(ift_ft_xC_1sampleN_PhaseC_avg), null_distribution) # omega2 = 1854.8, p-value = 0.2955
# Anderson-Darling Test of Goodness-of-Fit
#ad.test(scale(ift_ft_xC_1sampleN_PhaseC_avg), null_distribution) #
# Anderson-Darling Test of Goodness-of-Fit
# install.packages("kSamples")
ad.test(scale(ift_ft_xC_1sampleN_PhaseC_avg), scale(xC)) # T.AD=19.18; asympt. P-value=8.578e-09
##
##
## Anderson-Darling k-sample test.
##
## Number of samples: 2
## Sample sizes: 8000, 8000
## Number of ties: 0
##
## Mean of Anderson-Darling Criterion: 1
## Standard deviation of Anderson-Darling Criterion: 0.76131
##
## T.AD = ( Anderson-Darling Criterion - mean)/sigma
##
## Null Hypothesis: All samples come from a common population.
##
## AD T.AD asympt. P-value
## version 1: 16.18 19.93 4.168e-09
## version 2: 16.20 19.94 4.291e-09
Figure 4.3
# PLOT
# windows(width=10, height=8)
# quartz(width=10, height=8)
plot(density(scale(xC)), col="black", lty=1, lwd=3, xaxt="n", xlab = "Range",
main="Spacetime Original vs. Spacekime (SK) Reconstructed Distributions")
lines(density(scale(ift_ft_xC_1sampleN_PhaseC_avg)), col="green", lwd=2, lty=2)
lines(density(scale(ift_ft_xC_1sampleN_PhaseD_avg)), col="blue", lwd=2, lty=10)
lines(density(scale(ift_ft_xC_1sampleN_PhaseB_avg)), col="red", lwd=2, lty=20)
legend("topright", legend=c("Original (Mixture of N(0,1) & N(5,3))", "SK Synthesis (1 Mag, Phase=True)",
"SK Synthesis (1 Mag, Phase=Indep)", "SK Synthesis (1 Mag, Phase=Diff.Proc.)",
paste0(sprintf("\nKS.test(Original, SK(1Mag,Phase=True)), D=%s, p=%s\n",
format(ks.test(scale(xC), scale(ift_ft_xC_1sampleN_PhaseC_avg))$statistic[[1]], digits=2),
format(ks.test(scale(xC), scale(ift_ft_xC_1sampleN_PhaseC_avg))$p.value, digits=2)),
sprintf("KS.test(SK(1Mag,Phase=Indep), SK(1Mag,Phase=True)), p=%s\n",
format(ks.test(scale(ift_ft_xC_1sampleN_PhaseD_avg),
scale(ift_ft_xC_1sampleN_PhaseC_avg))$p.value, digits=2)),
sprintf("Corr(Original, SK(1Mag,Phase=True))=%s",
format(cor(scale(ift_ft_xC_1sampleN_PhaseC_avg), scale(xC)), digits=2)))),
col=c("black", "green", "blue", "red", "black"), lty=c(1, 2, 10, 20, 0), lwd=2, bty = "n", cex=0.75,
y.intersp=0.0, title="Scaled Densities")
## Warning in ks.test(scale(ift_ft_xC_1sampleN_PhaseD_avg),
## scale(ift_ft_xC_1sampleN_PhaseC_avg)): p-value will be approximate in the
## presence of ties
## Warning in ks.test(scale(ift_ft_xC_1sampleN_PhaseD_avg),
## scale(ift_ft_xC_1sampleN_PhaseC_avg)): p-value will be approximate in the
## presence of ties
# Skewness
e1071::skewness(xC); e1071::skewness(ift_ft_xC_1sampleN_PhaseC_avg); e1071::skewness(ift_ft_xC_1sampleN_PhaseB_avg); e1071::skewness(ift_ft_xC_1sampleN_PhaseD_avg)
## [1] 0.359816
## [1] 0.2563408
## [1] 0.001021943
## [1] 0.2446415
# Kurtosis
e1071::kurtosis(xC); e1071::kurtosis(ift_ft_xC_1sampleN_PhaseC_avg); e1071::kurtosis(ift_ft_xC_1sampleN_PhaseB_avg); e1071::kurtosis(ift_ft_xC_1sampleN_PhaseD_avg)
## [1] -0.6270681
## [1] -0.3619818
## [1] 0.2149918
## [1] -0.5228285
raw_data <-
data.frame(Original=scale(xC),
Phase_True=scale(ift_ft_xC_1sampleN_PhaseC_avg),
Phase_Diff.Proc=scale(ift_ft_xC_1sampleN_PhaseB_avg),
Phase_Indep=scale(ift_ft_xC_1sampleN_PhaseD_avg))
# Note: labels follow the earlier legend (PhaseC = true phases of C, PhaseB = different process, PhaseD = independent split)
melted_data <- reshape2::melt(raw_data)
## No id variables; using all as measure variables
ggplot(melted_data, aes(x=variable, y=value, fill=variable)) +
geom_violin(trim=FALSE) +
geom_boxplot(width=0.1) +
theme_bw()
Here we relate Bayesian inference to spacekime analytics based on a single cohort (\(A\)) spacetime observation \(x_{i_o}\) and some prior kime-phases (obtained from cohorts \(A\), \(B\), or \(C\)). This is accomplished by computing the prior or posterior predictive distribution.
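For reference, the posterior predictive distribution of a new observation \(\tilde{y}\), given observed data \(y\) and model parameters \(\theta\), is
\[p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta ,\]
which is approximated below by drawing \(\theta\) from the MCMC posterior and then simulating \(\tilde{y} \sim p(\tilde{y} \mid \theta)\) via posterior_predict().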
# The demonstration below uses R's built-in mtcars dataset
#install.packages("rstanarm")
#library("rstanarm")
# Docs: https://rdrr.io/cran/rstanarm/man/posterior_predict.stanreg.html
# 1. Canned example
# if (!exists("example_model")) example(example_model)
# yrep <- posterior_predict(example_model)
# table(yrep)
# 2. Example using sample data (n=10): counts ~ outcome + treatment
# counts <- c(18,17,15,20,10,20,25,13,12,15)
# outcome <- gl(5,2,10)
# treatment <- gl(2,5)
# model_fit <- stan_glm(counts ~ outcome + treatment,
# family = poisson(link="log"),
# prior = normal(0, 1), prior_intercept = normal(0, 5))
# new_data <- data.frame(treatment = factor(rep(1,5)), outcome = factor(1:5))
# Draw from the posterior predictive distribution of the outcome.
# ytilde <- posterior_predict(model_fit, new_data, draws = 500)
# print(dim(ytilde)) # 500 by 5 matrix (draws by nrow(new_data))
# ytilde <- data.frame(count = c(ytilde),
# outcome = rep(new_data$outcome, each = 500))
# ggplot2::ggplot(ytilde, ggplot2::aes(x=outcome, y=count)) +
# ggplot2::geom_boxplot() +
# ggplot2::ylab("predicted count")
# ytilde <- posterior_predict(model_fit, draws = 500)
# bayesplot::color_scheme_set("brightblue")
# bayesplot::ppc_dens_overlay(counts, ytilde[1:100, ])
# Using the mtcars data (mpg ~ wt)
mtcars2 <- mtcars # dim(mtcars2) # [1] 32 (car models) 11 (features)
# mtcars2$log_mpg <- log(mtcars2$mpg) # Define outcome of interest
model_fit <- stan_glm(mpg ~ wt, data = mtcars2)
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 8.7e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.87 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.057879 seconds (Warm-up)
## Chain 1: 0.049047 seconds (Sampling)
## Chain 1: 0.106926 seconds (Total)
## Chain 1:
## (analogous sampling output for Chains 2, 3, and 4 omitted)
# Get the posterior predictive distribution as a matrix:
# D (number of draws from the posterior predictive distribution) = 500 by
# N (number of data points being predicted per draw) = 32
ytilde <- posterior_predict(model_fit, draws = 500)
dim(ytilde) # Outcome=mpg, [1] 500 (MCMC draws) x 32 (car models)
## [1] 500 32
head(ytilde)
## Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout
## [1,] 27.83336 18.09879 20.46724 14.94926 21.22127
## [2,] 24.77289 16.98112 23.68493 20.31273 22.80704
## [3,] 23.77452 19.55476 21.77243 20.91546 26.30971
## [4,] 26.54254 26.26840 24.77963 18.60093 15.30545
## [5,] 19.92686 21.62055 17.91423 18.02250 21.09841
## [6,] 21.90781 26.91936 29.17032 22.04833 20.19422
## Valiant Duster 360 Merc 240D Merc 230 Merc 280 Merc 280C Merc 450SE
## [1,] 16.04502 11.63372 10.66559 21.98524 18.77021 16.82558 20.66930
## [2,] 17.02287 19.83796 17.14120 16.72355 18.56964 18.19241 13.83327
## [3,] 11.94724 17.55357 20.60386 20.24220 16.07105 13.96353 21.31576
## [4,] 18.45152 11.59260 17.42055 19.23979 19.46640 16.28337 11.92779
## [5,] 19.79754 20.18799 10.31959 22.63071 14.70449 21.23318 12.28052
## [6,] 19.52867 17.17794 26.50403 20.79099 16.85627 16.74970 13.36202
## Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
## [1,] 15.08942 11.83466 6.372787 9.092416
## [2,] 17.31504 16.66325 9.372890 12.408165
## [3,] 15.25952 12.59248 17.628026 10.148073
## [4,] 13.40738 19.09064 9.711535 4.601367
## [5,] 12.99332 11.82580 13.126779 10.442684
## [6,] 19.48801 17.77542 10.097532 13.456131
## Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla Toyota Corona
## [1,] 3.101490 26.17248 33.26072 26.44477 23.51168
## [2,] 9.955109 23.92047 32.12174 31.83815 18.68618
## [3,] 15.272304 26.73828 31.45684 29.42007 18.51923
## [4,] 4.139900 24.74204 29.99198 27.54381 23.87595
## [5,] 10.925105 31.37050 25.43979 29.22767 19.55515
## [6,] 11.670132 26.93697 23.03120 27.34927 22.59423
## Dodge Challenger AMC Javelin Camaro Z28 Pontiac Firebird Fiat X1-9
## [1,] 16.76778 20.74306 13.82242 20.16253 21.41692
## [2,] 20.19332 19.76735 15.33447 14.03374 29.31019
## [3,] 19.45280 20.17639 18.14594 18.82701 21.83076
## [4,] 19.98458 16.43369 17.16427 17.83606 31.44703
## [5,] 19.06299 12.15604 13.62427 14.28430 27.65276
## [6,] 18.59290 17.41055 17.88571 16.34603 24.81989
## Porsche 914-2 Lotus Europa Ford Pantera L Ferrari Dino Maserati Bora
## [1,] 26.13833 30.02604 21.19611 19.17363 14.86521
## [2,] 25.81284 31.69259 17.01650 22.37688 14.34606
## [3,] 22.29241 29.33770 23.94098 21.58584 15.72230
## [4,] 30.00675 27.46188 19.61855 26.95284 22.31696
## [5,] 22.70551 27.20964 19.32096 22.94286 16.71434
## [6,] 25.03603 30.59111 12.12025 20.62699 15.26275
## Volvo 142E
## [1,] 19.80113
## [2,] 21.60304
## [3,] 25.29954
## [4,] 20.34696
## [5,] 17.49975
## [6,] 18.85936
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Plot a histogram of the distribution of various test statistics (e.g., mean, sd) across MCMC draws.
# The distributions are computed by applying the statistics to each dataset (draw=row) in *ytilde*.
# The blue vertical line overlays the value of the same statistic in the observed data, stat(y=mtcars2$mpg)
bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "mean")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "range")
bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "min")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# 2D (mean, sd) plot
bayesplot::ppc_stat_2d(mtcars2$mpg, ytilde[1:100, ], stat=c("mean", "sd"), size=2.5, alpha=0.7)
Recall that we have ift_ft_xC_fMRI (the full spacekime reconstruction of the training series), as well as ift_ft_xC_fMRI_1sampleN_PhaseC, ift_ft_xC_fMRI_1sampleN_PhaseB, and ift_ft_xC_fMRI_1sampleN_PhaseD for all \(N=30\) samples.
#install.packages("rstanarm")
#library("rstanarm")
# xC_fMRI_scaled <- scale(xC_fMRI)[ , 1]
length(ift_ft_xC_fMRI) # [1] 144
## [1] 144
dim(ift_ft_xC_fMRI_1sampleN_PhaseB); dim(ift_ft_xC_fMRI_1sampleN_PhaseC); dim(ift_ft_xC_fMRI_1sampleN_PhaseD)
## [1] 144 30
## [1] 144 30
## [1] 144 30
# Get the posterior predictive distribution as a matrix:
# D (number of draws from the posterior predictive distribution) = 90 (N=30 samples x 3 phase sources) by
# N (number of time points being predicted per draw) = 144
# Inspect the first few spacekime reconstructed draws; mind the transposition of the tensor!
# t(ift_ft_xC_fMRI_1sampleN_PhaseC)[ , 1:7] # 30 (draws) x 7 (first timepoints only)
# The spacekime estimates represent the posterior predictive distribution
# ytilde <- t(ift_ft_xC_fMRI_1sampleN_PhaseC + mean(xC_fMRI_train)) # all 144 timepoints (columns)
# Center/offset the spacekime estimates to the center of the original data (ift_ft_xC_fMRI)
ytilde <- rbind(t(ift_ft_xC_fMRI_1sampleN_PhaseC+mean(ift_ft_xC_fMRI)-apply(ift_ft_xC_fMRI_1sampleN_PhaseC,2,mean)),
t(ift_ft_xC_fMRI_1sampleN_PhaseB+mean(ift_ft_xC_fMRI)-apply(ift_ft_xC_fMRI_1sampleN_PhaseB,2,mean)),
t(ift_ft_xC_fMRI_1sampleN_PhaseD+mean(ift_ft_xC_fMRI)-apply(ift_ft_xC_fMRI_1sampleN_PhaseD,2,mean)))
dim(ytilde) # Outcome = fMRI time-course (xC_fMRI_train), [1] 90 (draws) x 144 (timepoints)
## [1] 90 144
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Plot a histogram of the distribution of various test statistics (e.g., mean, sd) across MCMC draws.
# The distributions are computed by applying the statistic to each dataset (draw=row) in *ytilde*.
# The vertical line overlays the value of the same statistic in the observed data, stat(y=ift_ft_xC_fMRI)
bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = "mean")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
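As a small hedged extension of the original output, one could also check second-moment calibration of the spacekime posterior predictive draws against the observed series:
# Compare the sd statistic of each spacekime draw to the sd of the observed series
bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = "sd")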