Learning Modules

Class Notes » (Video) DSPA Overview & Motivation »

Examples of Big Biomedical Challenges (AD, PD, ALS, AWD)

Brain Visualization


Genomics computing


Common Characteristics of Big (Biomedical and Health) Data

High-throughput Big Data Analytics

Class Notes » R Code » Assignment » (Video) DSPA Chapter 1 »

Statistical Software – Pros/Cons Comparison    

Getting started        

Install Basic Shell-based R 

GUI based R Invocation (RStudio)

RStudio GUI Layout 


Simple Long-to-Wide Data format translation    

Data generation


Slicing and extracting data 

Variable conversion 

Variable information 

Data selection and manipulation   

Math Functions       

Matrix Operations    

Advanced Data Processing



QQ Normal Probability Plots

Low-level plotting commands       

Graphics parameters         

Optimization and model fitting      




Data Simulation Primer

Class Notes » R Code » Assignment » (Video) DSPA Chapter 2 »

Managing Data in R

Saving and Loading R Data Structures  

Importing and Saving Data from CSV Files      

Exploring the Structure of Data   

Exploring Numeric Variables       

Measuring the Central Tendency - mean and median 

Measuring Spread - quartiles and the five-number summary 

Visualizing Numeric Variables - boxplots

Visualizing Numeric Variables - histograms      

Understanding Numeric Data - uniform and normal distributions      

Measuring Spread - variance and standard deviation  

Exploring Categorical Variables   

Measuring the Central Tendency - the mode

Exploring Relationships Between Variables

Missing Data

Parsing webpages and visualizing tabular HTML data

Cohort-Rebalancing (for Imbalanced Groups)

Class Notes » R Code » Assignment » (Video) DSPA Chapter 3 »

Classification of visualization methods   


Histograms and density plots      

Pie Chart     

Heat map     


Paired Scatterplots


Trees and Graphs  

Correlation Plots     


Line plots using ggplot      

Density Plots


2D Kernel Density and 3D Surface Plots

Jitter plot     


Hands-on Activity (Health Behavior Risks)       

Class Notes » R Code » Assignment » (Video) DSPA Chapter 4 »

Linear Algebra & Matrix Computing        

Building Matrices     

Create matrices       

Adding columns and rows  

Matrix subscripts     

Matrix Operations    




Elementwise multiplication  

Matrix multiplication 




Matrix Operations    

Matrix Algebra Notation     

Matrix Notation        

Solving Systems of Equations      

The identity matrix   

Vectors, Matrices, and Scalars     

Sample Statistics     



Applications of Matrix Algebra: Linear modeling 

Finding function extrema (min/max) using calculus      

Least Square Estimation    

The R lm Function   

Eigenvalues and Eigenvectors      

Other important functions  

Matrix notation        

Linear regression

Sample covariance matrix  

Class Notes » R Code » Assignment » (Video) DSPA Chapter 5 »

Principal Component Analysis (PCA)

Independent Component Analysis (ICA)

Factor Analysis (FA)

Singular Value Decomposition (SVD)

Class Notes » R Code » Assignment » (Video) DSPA Chapter 6 »

Understanding classification using nearest neighbors

The kNN algorithm

Calculating distance

Choosing an appropriate k

Preparing data for use with kNN

Why is the kNN algorithm lazy?

Predictive Diagnostics

Class Notes » R Code » Assignment » (Video) DSPA Chapter 7 »

The Naive Bayes Algorithm


Bayes Formula       

The Laplace Estimator      

Case Study: Head and Neck Cancer Medication        

Class Notes » R Code » Assignment »

Understanding decision trees

Divide and conquer

The C5.0 decision tree algorithm

Choosing the best split

Pruning the decision tree

Boosting the accuracy of decision trees

Making some mistakes more costly than others

Understanding classification rules

Separate and conquer

The One Rule algorithm

The RIPPER algorithm

Rules from decision trees

Class Notes » R Code » Assignment »

Simple linear regression    

Ordinary least squares estimation


Multiple Linear Regression

Case Study 1: Baseball Players

Step 2 - exploring and preparing the data

Step 3 - training a model on the data     

Step 4 - evaluating model performance 

Step 5 - improving model performance  

Regression trees and model trees

Heart Attack Data

Class Notes » R Code » Assignment »

Neural Networks     

Network topology   

Training neural networks with backpropagation

Case Study 1: Google Trends and the Stock Market

Support Vector Machines (SVM)

Case Study 2: Optical Character Recognition (OCR)

Case Study 3: Iris Flowers

Class Notes » R Code » Assignment »

Association Rules   

Rule support and confidence

Case Study 1: Head and Neck Cancer Medications

Practice Problems: Groceries

Class Notes » R Code » Assignment »

Clustering as a machine learning task    

The k-Means Clustering Algorithm         

Case Study 1: Divorce and Consequences on Young Adults 

Case Study 2: Pediatric Trauma

Practice Problem: Youth Development

Class Notes » R Code » Assignment »

Measuring performance for classification         

Working with classification prediction data

Evaluation: Confusion matrices

Other performance measures

Visualizing performance tradeoffs

Estimating future performance (internal statistical validation)

The holdout method

Class Notes » R Code » Assignment »

Tuning stock models for better performance

Using caret for automated parameter tuning

Creating a simple tuned model

Customizing the tuning process

Improving model performance with meta-learning

Understanding ensembles



Random forests

Training random forests

Evaluating random forest performance

Class Notes » R Code » Assignment »

Working with specialized data and databases

Querying data in SQL databases

Downloading the complete text of web pages

Web-page Data Scraping

Parsing JSON from web APIs

Reading and writing Microsoft Excel spreadsheets using XLSX

Visualizing network data

Data Streams and Streaming Classification

Optimization and improving the computational performance

Generalizing tabular data structures with dplyr

Parallel computing

GPU computing

Class Notes » R Code » Assignment »

Variable selection methods

Case Study - ALS

Evaluating model performance

Class Notes » R Code » Assignment »

Regularized Linear Modeling       

Ridge Regression   

Least Absolute Shrinkage and Selection Operator (LASSO) Regression         

Linear Regression  

Assessing Prediction Accuracy    

Estimating Prediction Error

Improving Prediction Accuracy

General Regularization Framework

Example: Neuroimaging-genetics study of Parkinson's Disease Dataset

Computational Complexity

n-Fold Cross Validation

Controlled Variable Selection: Knockoff Filtering: Simulated Example

PD Neuroimaging-genetics Case-Study


Class Notes » R Code » Assignment »

Time series analysis

Identifying the Diff, AR and MA parameters

Structural Equation Modeling (SEM)

Case study - Parkinson's Disease (PD)   

Linear Mixed model 

GLMM and GEE Longitudinal data analysis

Class Notes » R Code » Assignment »

Term Frequency (TF), Inverse Document Frequency (IDF)

Document Term Matrix (DTM)

Case-Study: Job ranking 


Class Notes » R Code » Assignment »

Forecasting types and assessment approaches


Internal Statistical Cross-validation is an iterative process

Example (Linear Regression)

Cross-validation methods


Summary of CV output

Alternative predictor functions

Prediction Models

Appendix: R Debugging

Class Notes » R Code » Assignment »

Free (unconstrained) optimization

Constrained Optimization

Equality and Inequality constraints

Lagrange Multipliers

Linear and Quadratic Programming

Manual vs. Automated Lagrange Multiplier Optimization

Data Denoising

Class Notes » R Code » Assignment »


Biological Relevance

Simple Neural Net Examples XOR and NAND Operators

Sonar data example

Schizophrenia Neuroimaging Study

Spirals 2D Data

IBS Study

Country QoL Ranking Data

Handwritten Digits Classification

Classifying Real-World Images
