
Start by reviewing Chapter 13 (Model Performance Assessment). Cross-validation is a strategy for validating predictive methods, classification models and clustering techniques by assessing the reliability and stability of the results of the corresponding statistical analyses (e.g., predictions, classifications, forecasts) based on independent datasets. For prediction of trend, association, clustering, and classification, a model is usually trained on one dataset (*training data*) and subsequently tested on new data (*testing or validation data*). Statistical internal cross-validation defines a test dataset to evaluate the model predictive performance as well as assess its power to avoid overfitting. *Overfitting* is the process of computing a predictive or classification model that describes random error, i.e., fits to the noise components of the observations, instead of identifying actual relationships and salient features in the data.
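The core idea above, train on one portion of the data and evaluate on an independent hold-out portion, can be sketched as follows. This is a minimal illustration using only the Python standard library and synthetic data; all variable names and the 75/25 split are our own choices, not prescribed by the chapter.

```python
import random

random.seed(13)

# Synthetic data: y = 2*x + Gaussian noise (illustrative only)
data = [(x, 2 * x + random.gauss(0, 1)) for x in range(100)]

# Hold out 25% of the observations as an independent test (validation) set
random.shuffle(data)
split = int(0.75 * len(data))
train, test = data[:split], data[split:]

# Fit a simple least-squares line y = a + b*x using the training data only
xs = [x for x, _ in train]
ys = [y for _, y in train]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
b = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

def rmse(pairs):
    """Root mean squared error of the fitted line on a set of (x, y) pairs."""
    return (sum((y - (a + b * x)) ** 2 for x, y in pairs) / len(pairs)) ** 0.5

# Comparable train and test errors suggest the model is not overfitting;
# a test error much larger than the training error is a warning sign.
print(rmse(train), rmse(test))
```

An overfit model would show a training error far below its test error; here the two are close because the fitted line matches the true (linear) signal rather than the noise.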

In this Chapter, we will use Google Flu Trends, Autism, and Parkinson’s disease case-studies to illustrate (1) alternative forecasting types using linear and non-linear predictions, (2) exhaustive and non-exhaustive internal statistical cross-validation, and (3) complementary predictor functions.

In Chapter 6 we discussed the types of classification and prediction methods, including `supervised` and `unsupervised` learning. The former are direct and predictive (there are known outcome variables that can be predicted and the corresponding forecasts can be evaluated), whereas the latter are indirect and descriptive (there are no *a priori* labels or specific outcomes).

There are alternative metrics used for evaluation of model performance, see Chapter 13. For example, assessment of supervised prediction and classification methods depends on the type of the labeled **outcome responses** - `categorical` (binary or polytomous) vs. `continuous`.

`Confusion matrices` reporting accuracy, FP, FN, PPV, NPV, LOR and other metrics may be used to assess predictions of dichotomous (binary) or `polytomous outcomes`. \(R^2\), correlations (between predicted and observed outcomes), and RMSE measures may be used to quantify the performance of various supervised forecasting methods on `continuous features`.
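To make these metrics concrete, here is a small sketch computing the binary-outcome measures (accuracy, sensitivity, specificity, PPV, NPV, LOR) from the four cells of a 2x2 confusion matrix, and \(R^2\) and RMSE for a continuous outcome. The cell counts and function names are made up for illustration.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Common metrics derived from a 2x2 confusion matrix (counts are illustrative)."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),        # true-positive rate
        "specificity": tn / (tn + fp),        # true-negative rate
        "PPV": tp / (tp + fp),                # positive predictive value
        "NPV": tn / (tn + fn),                # negative predictive value
        "LOR": math.log((tp * tn) / (fp * fn)),  # log odds-ratio
    }

def r_squared(observed, predicted):
    """Coefficient of determination for a continuous outcome."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean squared error for a continuous outcome."""
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5

m = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(m["accuracy"])  # (40 + 45) / 100 = 0.85
```

FP and FN in the chapter's list are the raw false-positive and false-negative counts (`fp` and `fn` above); the remaining measures are the standard rates derived from them.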

Before we go into the cross-validation of predictive analytics, we will present several examples of *overfitting* that illustrate why a certain amount of skepticism and mistrust may be appropriate when dealing with forecasting models based on large and complex data.

By 2017, there were only **57 US presidential elections** and **45 presidents**. That is a small dataset, and learning from it may be challenging. For instance:

- If the predictor space expands to include things like *having false teeth*, it’s pretty easy for the model to go from fitting the generalizable features of the data (the signal, e.g., presidential actions) to matching noise patterns (e.g., irrelevant characteristics like the gender of the presidents’ children, or the types of dentures they may wear).
- When overfitting noise patterns takes place, the quality of the model fit assessed on the historical data may improve (e.g., a higher \(R^2\), the Coefficient of Determination). At the same time, however, the model performance may be suboptimal when used to make inferences about prospective data, e.g., future presidential elections.
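The pattern described above can be reproduced with a toy example: a maximally flexible model (an exact interpolating polynomial) fits the observed "history" perfectly, yet predicts new observations worse than a trivial mean model. The synthetic data and helper names below are purely illustrative.

```python
import random

random.seed(7)
true_signal = 3.0  # the true process is just a constant plus noise

# Six historical observations and six future (prospective) observations
train = [(x, true_signal + random.gauss(0, 0.5)) for x in range(6)]
test = [(x, true_signal + random.gauss(0, 0.5)) for x in range(6, 12)]

mean_y = sum(y for _, y in train) / len(train)  # simple mean model

def lagrange(x, pts):
    """Degree-5 polynomial interpolating all training points exactly (overfit)."""
    total = 0.0
    for i, (xi, yi) in enumerate(pts):
        weight = 1.0
        for j, (xj, _) in enumerate(pts):
            if i != j:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total

def mse(model, pts):
    return sum((y - model(x)) ** 2 for x, y in pts) / len(pts)

train_err_overfit = mse(lambda x: lagrange(x, train), train)  # essentially zero
test_err_overfit = mse(lambda x: lagrange(x, train), test)    # blows up on new x
test_err_mean = mse(lambda x: mean_y, test)                   # stays near noise level
```

The interpolant's historical fit is perfect (zero training error, \(R^2 = 1\)), but on the prospective points its polynomial wiggles, driven entirely by noise, produce errors orders of magnitude larger than the humble mean model's.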

This cartoon illustrates some of the (unique) noisy presidential characteristics that are thought to be unimportant to presidential elections or presidential performance.