1 Regression Forecasting for Numerical Data

Use the Quality of Life data (Case06_QoL_Symptom_ChronicIllness) to fit several different Multiple Linear Regression models predicting clinically relevant outcomes, e.g., Chronic Disease Score. Complete the following protocol:

  • Collect data and preprocess it carefully.
  • Summarize and visualize the data using summary, str, pairs.panels, ggplot.
  • Report correlations for numerical features and try to visualize these associations (e.g., heatmap, pairs plot, etc.).
  • Examine potential dependencies among the predictors and between the predictors and the response variable.
  • Fit several Multiple Linear Regression models, report your results, and explain the summary, residuals, effect-size coefficients and the coefficient of determination, \(R^2\).
  • Draw various model diagnostic plots, including a QQ plot, a residuals plot, and a leverage plot (half-normal plot).
  • Interpret the results in terms of the data.
  • Predict the outcomes for new data and assess the prediction using several criteria (e.g., correlation coefficient, MSRE, etc.).
  • Try to improve the model performance using the step function and interpret both AIC and BIC.
  • Fit a regression tree model, visualize the model, and compare it with the earlier OLS model.
  • Use M5P in RWeka to obtain a better model.
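
The fit, predict, and stepwise-selection steps of the protocol above can be sketched in base R as follows. This is only a minimal illustration under stated assumptions: the data are simulated, the column names (x1, x2, x3, y) are hypothetical and are not taken from Case06_QoL_Symptom_ChronicIllness, and root-mean-square error stands in as one possible prediction-error criterion. It does not cover the visualization, regression tree, or M5P steps.

```r
# Simulated stand-in data. The column names are hypothetical and are NOT
# taken from the Case06_QoL_Symptom_ChronicIllness dataset.
set.seed(1234)
n <- 200
sim <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n)   # pure-noise predictor, unrelated to y by construction
)
sim$y <- 1 + 2 * sim$x1 - 1.5 * sim$x2 + rnorm(n)

# Hold out a quarter of the rows to play the role of "new data"
test_idx <- sample(n, size = n / 4)
train <- sim[-test_idx, ]
test  <- sim[test_idx, ]

# Fit the full multiple linear regression model on the training rows
fit_full <- lm(y ~ x1 + x2 + x3, data = train)
print(summary(fit_full))   # coefficients, residual summary, R^2

# Predict the held-out outcomes and assess them with two criteria
pred      <- predict(fit_full, newdata = test)
pred_cor  <- cor(pred, test$y)                  # correlation coefficient
pred_rmse <- sqrt(mean((pred - test$y)^2))      # root-mean-square error

# Backward stepwise selection: the default penalty k = 2 corresponds to AIC,
# while k = log(number of training rows) corresponds to BIC
fit_aic <- step(fit_full, direction = "backward", trace = 0)
fit_bic <- step(fit_full, direction = "backward", trace = 0,
                k = log(nrow(train)))

# Compare the full and reduced models under each criterion (lower is better)
print(AIC(fit_full, fit_aic, fit_bic))
print(BIC(fit_full, fit_aic, fit_bic))
```

With the real data, the formula would name the chosen outcome (e.g., the Chronic Disease Score column) and its predictors, and the same pred_cor and pred_rmse calculations could be reused to compare the OLS model against the regression tree and M5P models.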
