
1 Import, plot, summarize and save data

Load the following two datasets, generate summary statistics for all variables, plot some of the features (e.g., histograms, box plots, density plots, etc.) of some variables, and save the data locally as CSV files:
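Whichever datasets are loaded, the summarize-and-save steps can be sketched as follows. This is a minimal illustration in plain Python, assuming the records have already been read into a list of dictionaries; the column names (`id`, `age`, `score`) and values are hypothetical toy data, not taken from any SOCR file.

```python
import csv
import io
import statistics

# Hypothetical toy records standing in for a loaded dataset (not real study data).
rows = [
    {"id": "1", "age": "61", "score": "12.5"},
    {"id": "2", "age": "54", "score": "9.0"},
    {"id": "3", "age": "70", "score": "15.5"},
    {"id": "4", "age": "58", "score": "11.0"},
]

def summarize(rows, column):
    """Return basic summary statistics for one numeric column."""
    values = [float(r[column]) for r in rows]
    return {
        "n": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
        "sd": statistics.stdev(values),
    }

def to_csv_text(rows):
    """Serialize the records as CSV text; writing this string to a file saves it locally."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

age_summary = summarize(rows, "age")
csv_text = to_csv_text(rows)
```

Calling `summarize` once per numeric column gives a table of statistics that can be compared against the histograms or box plots of the same variables.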

2 Explore some bivariate relations in the above data

Use the ALS case-study data and the long-format SOCR Parkinson’s Disease data (extract the rows with Time=0) to explore some bivariate relations (e.g., bivariate plots, correlations, tables, cross-tables, etc.).
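Two of the bivariate summaries named above, a correlation and a cross-table, can be sketched in plain Python as follows. The variables `age`, `score`, `sex`, and `group` are hypothetical toy values chosen for illustration and are not drawn from the ALS or Parkinson’s Disease files.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def crosstab(a, b):
    """Count how often each (a, b) pair of categories occurs together."""
    return Counter(zip(a, b))

# Hypothetical toy values, not taken from the ALS or PD files.
age = [50, 55, 60, 65, 70]
score = [10, 12, 13, 15, 18]
sex = ["F", "M", "F", "M", "F"]
group = ["PD", "PD", "HC", "HC", "PD"]

r = pearson(age, score)       # linear association between two numeric variables
table = crosstab(sex, group)  # joint counts of two categorical variables
```

The correlation suits pairs of numeric variables, while the cross-table suits pairs of categorical ones; a scatter plot of the same numeric pair is the natural visual companion.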

Use the 07_UMich_AnnArbor_MI_TempPrecipitation_HistData_1900_2015 data to show the relation between temperature and time. [Hint: use geom_line and geom_bar.]

3 Missing data

Introduce (artificially) some missing data, impute the missing values, and examine the differences in summary statistics between the original, incomplete, and imputed data.
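The three-way comparison can be sketched as follows. This is a minimal illustration that assumes values are removed completely at random and that missing entries are filled with the observed mean; mean imputation is only one simple choice, and the assignment does not prescribe a method. The data, the 20% missing fraction, and the helper names are all hypothetical.

```python
import random
import statistics

def introduce_missing(values, fraction, seed=0):
    """Replace a random fraction of the values with None (missing completely at random)."""
    rng = random.Random(seed)
    n_missing = round(len(values) * fraction)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]

def impute_mean(values):
    """Fill each missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

def describe(values):
    """Summary statistics computed over the non-missing entries only."""
    observed = [v for v in values if v is not None]
    return {
        "n": len(observed),
        "mean": statistics.mean(observed),
        "sd": statistics.stdev(observed),
    }

# Hypothetical toy variable: the values 1.0 through 20.0.
original = [float(v) for v in range(1, 21)]
incomplete = introduce_missing(original, fraction=0.2)
imputed = impute_mean(incomplete)

report = {
    "original": describe(original),
    "incomplete": describe(incomplete),
    "imputed": describe(imputed),
}
```

Mean imputation leaves the mean of the observed data unchanged but shrinks the standard deviation, which is one of the differences this comparison should make visible.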

4 Surface plots

Generate a surface plot for the SOCR Knee Pain Data illustrating the 2D distribution of the locations of patient-reported knee pain (use plotly and kernel density estimation).
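The density surface behind such a plot can be sketched as follows. This is a minimal 2D Gaussian kernel density estimate in plain Python, assuming an isotropic kernel with a single hand-picked bandwidth; the pain locations, grid limits, and bandwidth are hypothetical and not taken from the Knee Pain Data.

```python
import math

def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

def kde2d(points, grid_x, grid_y, bandwidth):
    """Gaussian kernel density estimate; density[i][j] is the value at (grid_x[j], grid_y[i])."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)
    density = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            total = sum(
                math.exp(-((gx - px) ** 2 + (gy - py) ** 2) / (2.0 * bandwidth ** 2))
                for px, py in points
            )
            row.append(norm * total)
        density.append(row)
    return density

# Hypothetical (x, y) pain locations, not real patient data.
pain_xy = [(2.0, 3.0), (2.5, 3.2), (3.0, 2.8), (6.0, 7.0), (6.2, 6.5)]
grid_x = linspace(0.0, 10.0, 41)
grid_y = linspace(0.0, 10.0, 41)
z = kde2d(pain_xy, grid_x, grid_y, bandwidth=0.8)
```

The resulting grid `z` has one row per y value and one column per x value, which is the layout a plotly surface trace expects (for example, `plotly.graph_objects.Surface(x=grid_x, y=grid_y, z=z)` in Python).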

5 Unbalanced designs

Rebalance the Parkinson’s Disease data (extract the rows with Time=0) by disease (SWEDD or PD) versus healthy control (HC) status, using the synthetic minority oversampling technique (SMOTE) to ensure approximately equal cohort sizes. (Note: the minority class needs to be coded as 1.)
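The core idea of SMOTE can be sketched as follows: each synthetic case is placed at a random point on the line segment between a minority case and one of its nearest minority neighbors. This is a simplified illustration in plain Python and not the implementation of any particular package; the feature values, the choice of `k`, and the function name `smote` are hypothetical. Following the note above, the minority class is coded as 1.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority points and their neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbors of the chosen base point (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, neighbor)))
    return synthetic

# Hypothetical two-feature toy cohorts, not real PD data.
majority = [(float(i), float(i % 3)) for i in range(8)]            # class 0
minority = [(10.0, 1.0), (11.0, 2.0), (12.0, 1.5), (10.5, 2.5)]    # class 1

new_points = smote(minority, n_new=len(majority) - len(minority))
features = majority + minority + new_points
labels = [0] * len(majority) + [1] * (len(minority) + len(new_points))
```

After rebalancing, both classes have eight cases; the synthetic points always fall inside the range spanned by the original minority cases, so no values are extrapolated.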

6 Aggregate analysis

Use the California Ozone Data to generate a summary report. Make sure to include: a summary of every variable, the structure of the data, proper data-type conversion (if needed), a discussion of the trend in average ozone concentration based on the yearly average for each location, an exploration of the differences in ozone concentration by area, and an exploration of how ozone concentration changes with the seasons.
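The grouping steps of such a report can be sketched as follows. This is a minimal illustration in plain Python, assuming each record carries a location, year, month, and ozone value stored as text; the field names, site names, and values are hypothetical and do not reflect the actual layout of the California Ozone Data.

```python
import statistics
from collections import defaultdict

# Hypothetical raw records (location, year, month, ozone), all stored as text.
raw = [
    ("Site A", "2000", "1", "0.020"),
    ("Site A", "2000", "7", "0.060"),
    ("Site A", "2001", "1", "0.030"),
    ("Site A", "2001", "7", "0.070"),
    ("Site B", "2000", "1", "0.010"),
    ("Site B", "2000", "7", "0.050"),
]

# Data-type conversion: years and months become integers, ozone becomes a float.
records = [
    {"location": loc, "year": int(year), "month": int(month), "ozone": float(ozone)}
    for loc, year, month, ozone in raw
]

def season_of(month):
    """Map a month number (1-12) to a season, with December through February as winter."""
    return ["winter", "spring", "summer", "fall"][(month % 12) // 3]

def mean_by(records, key):
    """Average ozone within each group defined by the key function."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec["ozone"])
    return {k: statistics.mean(v) for k, v in groups.items()}

yearly = mean_by(records, lambda r: (r["location"], r["year"]))  # trend per location
by_area = mean_by(records, lambda r: r["location"])              # differences between areas
by_season = mean_by(records, lambda r: season_of(r["month"]))    # seasonal change
```

The same `mean_by` helper serves all three questions; only the grouping key changes, which keeps the yearly, per-area, and seasonal summaries directly comparable.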
