SOCR ≫ TCIU Website ≫ TCIU GitHub ≫

1 Exegeneous Feature Time-series analysis

1.1 Figure 5.7

We can use the UCI ML Air Quality Dataset to demonstrate the effect of kime-direction on the analysis of the longitudinal data. The Air Quality data consists of \(9358\) hourly-averaged responses from an array of 5 sensors embedded in an Air Quality Chemical Multisensor Device. These measurements were obtained in a significantly polluted area during a one year period (March 2004 to February 2005). The features include Concentrations for CO, Non Metanic Hydrocarbons, Benzene, Total Nitrogen Oxides (NOx), and Nitrogen Dioxide (NO2).

The attributes in the CSV file include:

  • Date (DD/MM/YYYY)
  • Time (HH.MM.SS)
  • True hourly averaged concentration CO in mg/m^3 (reference analyzer)
  • PT08.S1 (tin oxide) hourly averaged sensor response (nominally CO targeted)
  • True hourly averaged overall Non Metanic Hydro-carbons concentration in microg/m^3 (reference analyzer)
  • True hourly averaged Benzene concentration in microg/m^3 (reference analyzer)
  • PT08.S2 (Titania) hourly averaged sensor response (nominally NMHC targeted)
  • True hourly averaged NOx concentration in ppb (reference analyzer)
  • PT08.S3 (tungsten oxide) hourly averaged sensor response (nominally NOx targeted)
  • True hourly averaged NO2 concentration in microg/m^3 (reference analyzer)
  • PT08.S4 (tungsten oxide) hourly averaged sensor response (nominally NO2 targeted)
  • PT08.S5 (indium oxide) hourly averaged sensor response (nominally O3 targeted)
  • Temperature in ?C
  • Relative Humidity (%)
  • AH Absolute Humidity
##         Date            Time          CO.GT.         PT08.S1.CO.  
##           : 114   0:00:00 : 390   Min.   :-200.00   Min.   :-200  
##  1/1/2005 :  24   1:00:00 : 390   1st Qu.:   0.60   1st Qu.: 921  
##  1/10/2005:  24   10:00:00: 390   Median :   1.50   Median :1053  
##  1/11/2005:  24   11:00:00: 390   Mean   : -34.21   Mean   :1049  
##  1/12/2005:  24   12:00:00: 390   3rd Qu.:   2.60   3rd Qu.:1221  
##  1/13/2005:  24   13:00:00: 390   Max.   :  11.90   Max.   :2040  
##  (Other)  :9237   (Other) :7131   NA's   :114       NA's   :114   
##     NMHC.GT.         C6H6.GT.        PT08.S2.NMHC.       NOx.GT.      
##  Min.   :-200.0   Min.   :-200.000   Min.   :-200.0   Min.   :-200.0  
##  1st Qu.:-200.0   1st Qu.:   4.000   1st Qu.: 711.0   1st Qu.:  50.0  
##  Median :-200.0   Median :   7.900   Median : 895.0   Median : 141.0  
##  Mean   :-159.1   Mean   :   1.866   Mean   : 894.6   Mean   : 168.6  
##  3rd Qu.:-200.0   3rd Qu.:  13.600   3rd Qu.:1105.0   3rd Qu.: 284.0  
##  Max.   :1189.0   Max.   :  63.700   Max.   :2214.0   Max.   :1479.0  
##  NA's   :114      NA's   :114        NA's   :114      NA's   :114     
##   PT08.S3.NOx.     NO2.GT.         PT08.S4.NO2.   PT08.S5.O3.    
##  Min.   :-200   Min.   :-200.00   Min.   :-200   Min.   :-200.0  
##  1st Qu.: 637   1st Qu.:  53.00   1st Qu.:1185   1st Qu.: 700.0  
##  Median : 794   Median :  96.00   Median :1446   Median : 942.0  
##  Mean   : 795   Mean   :  58.15   Mean   :1391   Mean   : 975.1  
##  3rd Qu.: 960   3rd Qu.: 133.00   3rd Qu.:1662   3rd Qu.:1255.0  
##  Max.   :2683   Max.   : 340.00   Max.   :2775   Max.   :2523.0  
##  NA's   :114    NA's   :114       NA's   :114    NA's   :114     
##        T                  RH                AH               X          
##  Min.   :-200.000   Min.   :-200.00   Min.   :-200.0000   Mode:logical  
##  1st Qu.:  10.900   1st Qu.:  34.10   1st Qu.:   0.6923   NA's:9471     
##  Median :  17.200   Median :  48.60   Median :   0.9768                 
##  Mean   :   9.778   Mean   :  39.49   Mean   :  -6.8376                 
##  3rd Qu.:  24.100   3rd Qu.:  61.90   3rd Qu.:   1.2962                 
##  Max.   :  44.600   Max.   :  88.70   Max.   :   2.2310                 
##  NA's   :114        NA's   :114       NA's   :114                       
##    X.1         
##  Mode:logical  
##  NA's:9471     
##                
##                
##                
##                
## 
## Series: aqi_data$CO.GT. 
## Regression with ARIMA(1,0,1) errors 
## 
## Coefficients:
##          ar1      ma1  intercept  PT08.S1.CO.  NMHC.GT.  C6H6.GT.
##       0.9758  -0.5825   -40.4356       0.0056    0.0452   -2.9162
## s.e.  0.0025   0.0099    22.5588       0.0120    0.0071    0.5165
##       PT08.S2.NMHC.  NOx.GT.  PT08.S3.NOx.  NO2.GT.  PT08.S4.NO2.
##              0.1017   0.0363       -0.0157   0.0091       -0.0196
## s.e.         0.0188   0.0057        0.0073   0.0098        0.0095
##       PT08.S5.O3.        T       RH      AH
##           -0.0091  -0.3853  -0.2147  3.4773
## s.e.       0.0044   0.3318   0.1219  0.6934
## 
## sigma^2 estimated as 1202:  log likelihood=-46447.71
## AIC=92927.43   AICc=92927.49   BIC=93041.73
## Warning in forecast.Arima(fitArimaX, xreg = as.matrix(aqi_data[c((9471 - :
## Upper prediction intervals are not finite.

We can first explore the time-course harmonics of the data.

## quartz_off_screen 
##                 2

## quartz_off_screen 
##                 2

## quartz_off_screen 
##                 2

1.2 Fig 5.8

There is a key difference between spacekime data analytics and spacetime data modeling and inference. This contrast is based on the fact that in spacetime, statistical results are obtained by aggregating repeated (IID) dataset samples or measuring identical replicate cohorts under identical conditions. In spacekime, reliable inference can be made on a single sample, if the kime direction angles are known. Indeed the latter are generally no observable, however, they can be estimated, inferred, or approximated. As the FT and IFT are linear functionals, addition, averaging and multiplication by constants are preserved by the forward and inverse Fourier transforms. Therefore, if we have a number of phase estimates in k-space, we can aggregate these (e.g., by averaging them) and use the resulting assemblage phase to synthesize the data in spacekime. If the composite phases are indeed representative of the process kime orientation, then the reconstructed spacekime inference is expected to be valid even if we used a single sample. In a way, spacekime inference allows a dual representation of the central limit theorem, which guarantees the convergence of sample averages to their corresponding population mean counterparts.

In light of this analytics duality, we can now perform a traditional ARIMA modeling of CO concentration (outcome: PT08.S1.CO.) based on several covariates, e.g., predictors: NMHC.GT., C6H6.GT., PT08.S2.NMHC., NOx.GT., PT08.S3.NOx., NO2.GT., PT08.S4.NO2., PT08.S5.O3., T, RH, and AH.

## [1] 9471   17
## [1] TRUE
## [1] 9000   12
## [1]    9 1000   12
## [1] TRUE
## [1] 1000   12
# 1D timeseries FFT SHIFT
fftshift1D <- function(img_ff) {
  rows <- length(img_ff)   
  rows_half <- ceiling(rows/2)
  return(append(img_ff[(rows_half+1):rows], img_ff[1:rows_half]))
}

# 1. Transform all 9 signals to k-space (Fourier domain)
x1 <- c(1:1000)
FT_epochs_aqi_data <- array(complex(), c(9, 1000, 12))
mag_FT_epochs_aqi_data <- array(complex(), c(9, 1000, 12))
phase_FT_epochs_aqi_data <- array(complex(), c(9, 1000, 12))
for (i in 1:9) {
  FT_epochs_aqi_data[i, , ] <- fft(epochs_aqi_data[i, , ])
  X2 <- FT_epochs_aqi_data[i, , ]
  # plot(fftshift1D(log(Re(X2)+2)), main = "log(fftshift1D(Re(FFT(timeseries))))") 
  mag_FT_epochs_aqi_data[i, , ] <- sqrt(Re(X2)^2+Im(X2)^2); 
  # plot(log(fftshift1D(Re(X2_mag))), main = "log(Magnitude(FFT(timeseries)))") 
  phase_FT_epochs_aqi_data[i, , ] <- atan2(Im(X2), Re(X2)); 
  # plot(fftshift1D(X2_phase), main = "Shift(Phase(FFT(timeseries)))")
}

### Test the process to confirm calculations
# X2<-FT_epochs_aqi_data[1,,];X2_mag<-mag_FT_epochs_aqi_data[1,,];X2_phase<-phase_FT_epochs_aqi_data[1,,]
# Real2 = X2_mag * cos(X2_phase)
# Imaginary2 = X2_mag * sin(X2_phase)
# man_hat_X2 = Re(fft(Real2 + 1i*Imaginary2, inverse = T)/length(X2))
# ifelse(abs(man_hat_X2[5,10] - epochs_aqi_data[1, 5, 10]) < 0.001, "Perfect Syntesis", "Problems!!!")
#######

# 2. Invert back to spacetime the epochs_aqi_data_1 signal with nil phase
Real = mag_FT_epochs_aqi_data[1, , ] * cos(0)  # cos(phase_FT_epochs_aqi_data[1, , ])
Imaginary = mag_FT_epochs_aqi_data[1, , ] * sin(0)   # sin(phase_FT_epochs_aqi_data[1, , ])
ift_NilPhase_X2mag = Re(fft(Real+1i*Imaginary, inverse = T)/length(FT_epochs_aqi_data[1,,]))
# display(ift_NilPhase_X2mag, method = "raster")
# dim(ift_NilPhase_X2mag); View(ift_NilPhase_X2mag); # compare to View(epochs_aqi_data[1, , ])

# 3. Perform ARIMAX modeling of ift_NilPhase_X2mag; report (p,d,q) params and quality metrics AIC/BIC

fitArimaX_nil <- auto.arima(ift_NilPhase_X2mag[ , 1], xreg= ift_NilPhase_X2mag[ , 2:12])
fitArimaX_nil
## Series: ift_NilPhase_X2mag[, 1] 
## Regression with ARIMA(2,0,1) errors 
## 
## Coefficients:
##          ar1      ar2      ma1  intercept    xreg1   xreg2    xreg3
##       1.1141  -0.1457  -0.7892   503.3455  -0.4028  0.1366  -0.5146
## s.e.  0.2064   0.1571   0.1821    73.4216   0.1087  0.1123   0.1072
##        xreg4   xreg5   xreg6   xreg7   xreg8    xreg9  xreg10   xreg11
##       1.0961  1.2195  1.3063  1.2087  1.1491  -0.4823  0.0315  -0.4640
## s.e.  0.1059  0.1029  0.1564  0.1023  0.1101   0.1049  0.1125   0.1076
## 
## sigma^2 estimated as 30448:  log likelihood=-6573.68
## AIC=13179.36   AICc=13179.91   BIC=13257.88
## Warning in cbind(intercept = rep(1, n.ahead), newxreg): number of rows of
## result is not a multiple of vector length (arg 1)
## Warning in z[[1L]] + xm: longer object length is not a multiple of shorter
## object length

## [1] 1000   12
##               [,1]          [,2]         [,3]          [,4]         [,5]
## [1,]  0.0000000+0i  3.0047664+0i -1.038689+0i  0.6640319+0i 0.6095669+0i
## [2,] -1.4212860+0i -0.1016919+0i -2.648523+0i -0.5298284+0i 2.3392869+0i
## [3,] -1.4819520+0i  0.4516082+0i -2.289320+0i -0.9302969+0i 2.5178117+0i
## [4,]  0.1747996+0i -2.5427688+0i -1.433284+0i  1.6936688+0i 2.6054166+0i
## [5,] -2.0177463+0i -0.8160858+0i  3.018639+0i -1.0881887+0i 2.7825180+0i
## Series: ift_AvgPhase_X2mag[, 1] 
## Regression with ARIMA(2,0,3) errors 
## 
## Coefficients:
##          ar1     ar2     ma1      ma2     ma3  intercept   xreg1   xreg2
##       0.3295  0.2384  0.2673  -0.0061  0.1573   742.8001  0.5838  0.2809
## s.e.  0.1354  0.1150  0.1363   0.0650  0.0451    97.1677  0.1700  0.1776
##         xreg3   xreg4    xreg5   xreg6   xreg7   xreg8    xreg9  xreg10
##       -0.6497  1.2399  -0.0261  1.0818  0.2540  0.3065  -0.4052  0.3511
## s.e.   0.1468  0.1695   0.1710  0.2394  0.1706  0.1665   0.1450  0.1791
##        xreg11
##       -0.4577
## s.e.   0.1709
## 
## sigma^2 estimated as 82982:  log likelihood=-7073.9
## AIC=14183.8   AICc=14184.5   BIC=14272.14
## Series: epochs_aqi_data[1, , 1] 
## Regression with ARIMA(0,1,4) errors 
## 
## Coefficients:
##           ma1      ma2     ma3      ma4   xreg1   xreg2   xreg3   xreg4
##       -0.6227  -0.0504  0.0067  -0.2011  0.0788  6.2927  0.0893  0.0180
## s.e.   0.0327   0.0366  0.0396   0.0307  0.0212  1.6588  0.0553  0.0152
##         xreg5    xreg6   xreg7   xreg8   xreg9  xreg10    xreg11
##       -0.0522  -0.0127  0.1809  0.1788  6.4129  1.7568  -11.9521
## s.e.   0.0206   0.0256  0.0273  0.0118  0.7105  0.2722    1.8927
## 
## sigma^2 estimated as 2291:  log likelihood=-5275.11
## AIC=10582.23   AICc=10582.78   BIC=10660.73
## [1] "-0.0114"
## [1] "0.077"
## [1] "0.309"
## Warning in w0 * rep.int(0:(ncol - 1), rep.int(n.legpercol, ncol)): longer
## object length is not a multiple of shorter object length

## Warning in w0 * rep.int(0:(ncol - 1), rep.int(n.legpercol, ncol)): longer
## object length is not a multiple of shorter object length
## quartz_off_screen 
##                 2
## Warning in w0 * rep.int(0:(ncol - 1), rep.int(n.legpercol, ncol)): longer
## object length is not a multiple of shorter object length

## Warning in w0 * rep.int(0:(ncol - 1), rep.int(n.legpercol, ncol)): longer
## object length is not a multiple of shorter object length
## quartz_off_screen 
##                 2

1.3 Fig 5.9

An alternative data analytic approach involves using the Fourier transform applied to the complete 2D data-matrix (rows=time, columns=features), inverting it back in spacetime, and investigating the effect of the timeseries analysis with and without using the correct kime-directions (phases). Knowing the kime-directions is expected to produce better analytical results (e.g., lower bias and lower dispersion).

## [1] 9471   17
## [1] 9471   13
##      CO.GT.        PT08.S1.CO.      NMHC.GT.         C6H6.GT.       
##  Min.   :-200.0   Min.   :-200   Min.   :-200.0   Min.   :-200.000  
##  1st Qu.:   0.6   1st Qu.: 915   1st Qu.:-200.0   1st Qu.:   3.900  
##  Median :   1.5   Median :1050   Median :-200.0   Median :   7.800  
##  Mean   : -33.8   Mean   :1036   Mean   :-157.2   Mean   :   1.843  
##  3rd Qu.:   2.6   3rd Qu.:1218   3rd Qu.:-200.0   3rd Qu.:  13.500  
##  Max.   :  11.9   Max.   :2040   Max.   :1189.0   Max.   :  63.700  
##  PT08.S2.NMHC.       NOx.GT.        PT08.S3.NOx.       NO2.GT.       
##  Min.   :-200.0   Min.   :-200.0   Min.   :-200.0   Min.   :-200.00  
##  1st Qu.: 704.0   1st Qu.:  47.0   1st Qu.: 631.0   1st Qu.:  51.00  
##  Median : 890.0   Median : 139.0   Median : 791.0   Median :  95.00  
##  Mean   : 883.8   Mean   : 166.6   Mean   : 785.4   Mean   :  57.45  
##  3rd Qu.:1102.0   3rd Qu.: 281.5   3rd Qu.: 957.0   3rd Qu.: 132.00  
##  Max.   :2214.0   Max.   :1479.0   Max.   :2683.0   Max.   : 340.00  
##   PT08.S4.NO2.   PT08.S5.O3.           T                  RH         
##  Min.   :-200   Min.   :-200.0   Min.   :-200.000   Min.   :-200.00  
##  1st Qu.:1171   1st Qu.: 689.0   1st Qu.:  10.700   1st Qu.:  33.50  
##  Median :1440   Median : 936.0   Median :  17.000   Median :  48.10  
##  Mean   :1375   Mean   : 963.3   Mean   :   9.661   Mean   :  39.01  
##  3rd Qu.:1658   3rd Qu.:1250.0   3rd Qu.:  24.000   3rd Qu.:  61.70  
##  Max.   :2775   Max.   :2523.0   Max.   :  44.600   Max.   :  88.70  
##        AH           
##  Min.   :-200.0000  
##  1st Qu.:   0.6768  
##  Median :   0.9711  
##  Mean   :  -6.7553  
##  3rd Qu.:   1.2915  
##  Max.   :   2.2310
##      CO.GT.       PT08.S1.CO.        NMHC.GT.         C6H6.GT.      
##  Min.   : 1143   Min.   :-185.1   Min.   :-392.3   Min.   : -65.27  
##  1st Qu.: 1789   1st Qu.: 138.1   1st Qu.: 188.4   1st Qu.: 228.36  
##  Median : 1982   Median : 213.8   Median : 254.5   Median : 298.23  
##  Mean   : 2100   Mean   : 218.9   Mean   : 257.7   Mean   : 304.04  
##  3rd Qu.: 2223   3rd Qu.: 291.7   3rd Qu.: 324.4   3rd Qu.: 373.90  
##  Max.   :41122   Max.   :3787.7   Max.   :1323.0   Max.   :2876.22  
##  PT08.S2.NMHC.        NOx.GT.         PT08.S3.NOx.        NO2.GT.       
##  Min.   : -93.41   Min.   :  32.15   Min.   :-183.04   Min.   :-183.04  
##  1st Qu.:  97.44   1st Qu.: 392.22   1st Qu.:  22.36   1st Qu.:  22.36  
##  Median : 152.62   Median : 458.39   Median :  78.28   Median :  78.28  
##  Mean   : 167.31   Mean   : 476.19   Mean   :  85.98   Mean   :  85.98  
##  3rd Qu.: 217.35   3rd Qu.: 533.20   3rd Qu.: 139.73   3rd Qu.: 139.73  
##  Max.   :4611.64   Max.   :7979.90   Max.   :3896.77   Max.   :3896.77  
##   PT08.S4.NO2.      PT08.S5.O3.            T                 RH        
##  Min.   :  32.15   Min.   : -93.41   Min.   : -65.27   Min.   :-392.3  
##  1st Qu.: 392.22   1st Qu.:  97.44   1st Qu.: 228.36   1st Qu.: 188.4  
##  Median : 458.39   Median : 152.62   Median : 298.23   Median : 254.5  
##  Mean   : 476.19   Mean   : 167.31   Mean   : 304.04   Mean   : 257.7  
##  3rd Qu.: 533.20   3rd Qu.: 217.35   3rd Qu.: 373.90   3rd Qu.: 324.4  
##  Max.   :7979.90   Max.   :4611.64   Max.   :2876.22   Max.   :1323.0  
##        AH        
##  Min.   :-185.1  
##  1st Qu.: 138.1  
##  Median : 213.8  
##  Mean   : 218.9  
##  3rd Qu.: 291.7  
##  Max.   :3787.7

## quartz_off_screen 
##                 2

## quartz_off_screen 
##                 2
## [1] 38880.76

## quartz_off_screen 
##                 2

## quartz_off_screen 
##                 2

## quartz_off_screen 
##                 2
## [1] 2876.822

## quartz_off_screen 
##                 2

2 Appendix: Functions Used

2.1 ff_harmonics()

# Compute the spectral decomposition of an array (harmonics)
#' This function computes the FT of the singlal and plots the first few harmonics
#' 
#' @param x Original signal (1D, 2D, or 3D array).
#' @param n Number of first harmonics to report (integer).
#' @param up Upsamping rate (default=10).
#' @param plot Boolean indicating whether to print the harmonics plot(default==TRUE).
#' @param add whether to overplot the harmonics on an existing graph (default=FALSE), 
#' @param main Title for the plot.
#' @return A plot and a dataframe with the sampled harmonics and their corresponding FT magnitudes/amplitudes.
#'
ff_harmonics = function(x=NULL, n=NULL, up=10L, plot=TRUE, add=F, main=NULL, ...) {
  # The discrete Fourier transformation
  dff = fft(x)
  # time
  t = seq(from = 1, to = length(x))
  
  # Upsampled time
  nt = seq(from = 1, to = length(x)+1-1/up, by = 1/up)
 
  #New spectrum
  ndff = array(data = 0, dim = c(length(nt), 1L))
  ndff[1] = dff[1] # mean, DC component
  if(n != 0){
    ndff[2:(n+1)] = dff[2:(n+1)] # positive frequencies come first
    ndff[length(ndff):(length(ndff) - n + 1)] = dff[length(x):(length(x) - n + 1)] # negative frequencies
  }
  
  # Invert the FT
  indff = fft(ndff/length(y), inverse = TRUE)
  idff = fft(dff/length(y), inverse = TRUE)
  if(plot){
    if(!add){
      plot(x = t, y = x, pch = 16L, xlab = "Time", ylab = "Measurement", 
          col = rgb(red = 0.5, green = 0.5, blue = 0.5, alpha = 0.5),
          main = ifelse(is.null(main), paste(n, "harmonics"), main))
      lines(y = Mod(idff), x = t, col = adjustcolor(1L, alpha = 0.5))
    }
    lines(y = Mod(indff), x = nt, ...)
  }
  ret = data.frame(time = nt, y = Mod(indff))
  return(ret)
}