
1 Explain these concepts:

  • Information Gain Measure
  • Impurity
  • Entropy
  • Gini
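As a reference point for these four terms, here is a minimal sketch in Python. The course itself works in R, so this is only an illustration. It computes entropy H(S) = −Σ p_k log2 p_k and Gini impurity G(S) = 1 − Σ p_k², where p_k is the proportion of class k in node S. It also computes information gain: the parent's impurity minus the size-weighted impurity of its child nodes. The function names and the toy "yes"/"no" node are hypothetical and are not taken from any course dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical toy node: 10 cases split into two children of 5 cases each.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

print(round(entropy(parent), 3))                                 # 1.0
print(round(gini(parent), 3))                                    # 0.5
print(round(information_gain(parent, [left, right]), 3))         # 0.278
print(round(information_gain(parent, [left, right], gini), 3))   # 0.18
```

Passing `gini` as the `impurity` argument turns the same function into a Gini-based split criterion. A perfectly separating split of this node would give the maximum gain of 1 bit. For comparison, rpart's classification trees in R split on Gini by default and accept `parms = list(split = "information")` for the entropy-based alternative.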

2 Decision Tree Partitioning

Load the SOCR Neonatal Infant Pain score data and follow these steps:

  • Collect and preprocess the data, e.g., data conversion and variable selection.
  • Randomly split the data into training and testing sets.
  • Train decision tree models on the training set using C5.0 and rpart.
  • Evaluate and compare the two models.
  • Tune the rpart parameters and repeat the evaluation and comparison.
  • Assess the prediction accuracy and report the confusion matrix.
  • Comment on different aspects of the prediction performance.
  • Use various impurity measures and re-estimate the models.
  • Use the RWeka package to train decision tree models and compare the results.
  • Apply a random forest and obtain a variable importance plot.
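The model fitting in these steps relies on the R tools named above (C5.0, rpart, RWeka), so this sketch does not imitate their APIs. It is a minimal standard-library Python sketch of the evaluation work the steps ask for. It covers a reproducible random train/test split, a confusion matrix, accuracy, and Cohen's kappa. The helper names are hypothetical. The "high"/"low" labels are made-up placeholders, not values from the SOCR Neonatal Infant Pain data. The matrix uses actual classes as rows and predicted classes as columns. That orientation is only a convention, so check which one your R output uses before comparing numbers.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows with a fixed seed, then hold out a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def confusion_matrix(actual, predicted, classes):
    """Counts with actual classes as rows and predicted classes as columns."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def accuracy(matrix):
    """Fraction of cases on the diagonal, i.e. correct predictions."""
    total = sum(map(sum, matrix))
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def cohen_kappa(matrix):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    total = sum(map(sum, matrix))
    p_o = accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels standing in for a fitted tree's test-set predictions.
classes = ["high", "low"]
actual = ["high", "high", "low", "low", "low", "high", "low", "low"]
predicted = ["high", "low", "low", "low", "high", "high", "low", "low"]

cm = confusion_matrix(actual, predicted, classes)
print(cm)                          # [[2, 1], [1, 4]]
print(accuracy(cm))                # 0.75
print(round(cohen_kappa(cm), 3))   # 0.467
```

Reporting kappa alongside accuracy helps when classes are imbalanced. In that case a model can score a high accuracy simply by predicting the majority class, while kappa discounts the agreement expected by chance.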

SOCR Resource, Dinov