
1 Explain the following concepts

  • Information Gain Measure
  • Impurity
  • Entropy
  • Gini
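The four concepts are related. Impurity is a generic measure of how mixed the class labels in a node are, and it is zero for a pure node. Entropy, H(S) = -Σ_k p_k log2(p_k), and the Gini index, G(S) = 1 - Σ_k p_k^2, are two common impurity measures, where p_k is the proportion of class k in node S. The information gain of a split is the parent's impurity minus the size-weighted impurity of its children. A minimal Python sketch of these definitions follows; the labels "A"/"B" and the example split are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: chance that two independent draws from the node disagree."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical two-class node and one candidate binary split of it.
parent = ["A"] * 5 + ["B"] * 5
left = ["A"] * 4 + ["B"]
right = ["A"] + ["B"] * 4

print(entropy(parent), gini(parent))
print(information_gain(parent, [left, right]))                 # entropy-based gain
print(information_gain(parent, [left, right], impurity=gini))  # Gini-based gain
```

Here the parent node is an even 50/50 mix, so its entropy is 1 bit and its Gini impurity is 0.5. Splitting it into two 4:1 children yields an information gain of about 0.278 bits under entropy and about 0.18 under Gini impurity, so either criterion would prefer this split to leaving the node unsplit.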

2 Decision Tree Partitioning

Use the SOCR Neonatal Pain data to build and display a decision tree that recursively partitions the data, using the provided features and attributes to split the cases into clusters.

  • Create two classes using the variable Cluster
  • Create random training and test datasets
  • Train a decision tree model on the training data, separately using C5.0 and rpart
  • Evaluate the model performance and compare the C5.0 and rpart results
  • Tune the parameters of the rpart model (e.g., the complexity parameter cp) and evaluate it again
  • Make predictions on the testing data and assess the prediction accuracy; report the confusion matrix
  • Comment on the classification performance
  • Try applying Random Forest classification; report the variable importance plot and the predictions on the testing data, and assess the prediction accuracy.
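To make the recursive partitioning idea concrete, here is a minimal, dependency-free Python sketch. It is CART-style: each node picks the binary split x[f] <= t with the largest Gini gain, which is the criterion rpart uses by default for classification; it is not the C5.0 algorithm. The Neonatal Pain columns are not listed here, so a synthetic two-class dataset stands in for it, and `max_depth` and `min_split` are hypothetical stopping rules only loosely analogous to the `maxdepth` and `minsplit` settings of `rpart.control`. The script also performs a random train/test split and tabulates a confusion matrix.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (feature, threshold, gain) of the best split x[f] <= t, or None."""
    parent, n, best = gini(y), len(y), None
    for f in range(len(X[0])):
        # Skip the largest value so both child nodes are non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or gain > best[2]:
                best = (f, t, gain)
    return best


def build_tree(X, y, depth=0, max_depth=3, min_split=4):
    """Recursively partition (X, y); a leaf stores the majority class."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_split or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None or split[2] <= 0:
        return majority
    f, t, _ = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_split),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_split))


def predict(tree, row):
    """Follow the splits from the root until a leaf (a class label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Synthetic stand-in data: feature 0 depends on the class, feature 1 is pure noise.
rng = random.Random(2024)
X, y = [], []
for _ in range(200):
    label = rng.randint(0, 1)
    X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(0.0, 1.0)])
    y.append(label)

# Random 80/20 train/test split.
idx = list(range(len(y)))
rng.shuffle(idx)
cut = int(0.8 * len(idx))
train, test = idx[:cut], idx[cut:]

tree = build_tree([X[i] for i in train], [y[i] for i in train], max_depth=3)
preds = [predict(tree, X[i]) for i in test]

# Confusion matrix as (actual, predicted) -> count, plus overall accuracy.
confusion = Counter((y[i], p) for i, p in zip(test, preds))
accuracy = sum(c for (a, p), c in confusion.items() if a == p) / len(test)
for actual in (0, 1):
    print(actual, [confusion[(actual, pred)] for pred in (0, 1)])
print("accuracy:", accuracy)
```

Each printed row of the confusion matrix corresponds to an actual class and each column to a predicted class, so the diagonal entries count correct predictions.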
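The same workflow can be sketched end to end with scikit-learn, assuming that library is installed. This is a Python substitute for the R packages named above, not a port of them. `DecisionTreeClassifier(criterion="entropy")` is only a loose stand-in for C5.0, which uses the gain ratio and has extras such as boosting; `criterion="gini"` matches rpart's default splitting rule; and `ccp_alpha` (cost-complexity pruning) plays a role similar to rpart's `cp`. The data come from `make_classification` as a synthetic placeholder, so the feature indices are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class placeholder for the real data (hypothetical features 0..5).
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Entropy-based tree (loose C5.0 stand-in) vs Gini-based tree (rpart's default criterion).
results = {}
for criterion in ("entropy", "gini"):
    model = DecisionTreeClassifier(criterion=criterion, random_state=0)
    model.fit(X_train, y_train)
    results[criterion] = accuracy_score(y_test, model.predict(X_test))

# Tune depth and cost-complexity pruning with 5-fold cross-validation on the training set.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 3, 4, 6, None], "ccp_alpha": [0.0, 0.005, 0.01, 0.02]},
    cv=5)
search.fit(X_train, y_train)
tuned_pred = search.best_estimator_.predict(X_test)
cm = confusion_matrix(y_test, tuned_pred)  # rows = actual class, columns = predicted class
results["tuned"] = accuracy_score(y_test, tuned_pred)

# Random forest and its impurity-based variable importances (they sum to 1).
forest = RandomForestClassifier(n_estimators=300, random_state=0)
forest.fit(X_train, y_train)
results["forest"] = accuracy_score(y_test, forest.predict(X_test))
importance = sorted(enumerate(forest.feature_importances_),
                    key=lambda pair: pair[1], reverse=True)

print(search.best_params_)
print(results)
print(cm)
for feature, score in importance:
    print(f"feature {feature}: {score:.3f}")
```

The sorted `importance` list is the data behind a variable importance plot; any plotting library can draw it as a bar chart. The tuning grid is an arbitrary illustrative choice, and it is searched by cross-validation on the training data only, so the test set is used once, for the final accuracy and confusion matrix.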

SOCR Resource, Dinov