8.1 Training and Validation Recipe
As we build models from our data, it is valuable to see how well they hold up out of sample, that is, on data that were not used to build them. To check this, we build a model on a training portion of the data set and validate it on the remaining holdout (test) portion. A simple procedure for doing this is as follows:
- Select a random training subset of the data using the spreadsheet subset selection procedure, then go into interactive tree analysis.
- Compute a tree on the random training subset.
- Return to the spreadsheet and invert the selection to obtain the holdout (test) data set.
- From the spreadsheet Analysis->Apply a Tree Model menu, choose the tree model you computed on the training subset.
- You can then view the average tree predictions and the RMS (root-mean-square) error of the predictions obtained by applying the tree built on the training set to the holdout set.
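The recipe above is driven through the spreadsheet menus. To make the underlying logic concrete, here is a minimal pure-Python sketch of the same idea. It is a sketch under stated assumptions, not the application's implementation: the function names (`split_rows`, `build_tree`, `predict`, `rms_error`, `train_and_validate`) are hypothetical; the tree is a simple greedy regression tree (splits chosen to minimize the children's sum of squared errors, thresholds at midpoints between observed values), which may differ from the algorithm used by interactive tree analysis; the 70/30 split, `max_depth`, and `min_leaf` values are illustrative; and "average tree prediction" is taken to mean the mean of the predictions on the holdout rows.

```python
import math
import random


def split_rows(n_rows, train_fraction=0.7, seed=0):
    """Pick a random training subset of row indices; the remaining rows form the holdout set."""
    rng = random.Random(seed)
    n_train = int(round(train_fraction * n_rows))
    train = set(rng.sample(range(n_rows), n_train))
    holdout = [i for i in range(n_rows) if i not in train]  # the "inverted" selection
    return sorted(train), holdout


def sse(ys):
    """Sum of squared deviations from the mean (the impurity this tree minimizes)."""
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys)


def candidate_thresholds(values):
    """Midpoints between consecutive distinct values of one feature."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]


def build_tree(X, y, depth=0, max_depth=3, min_leaf=5):
    """Grow a regression tree with greedy splits that minimize the total SSE of the two children."""
    mean = sum(y) / len(y)
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return {"leaf": mean}
    best = None
    for j in range(len(X[0])):
        for t in candidate_thresholds(row[j] for row in X):
            left = [k for k, row in enumerate(X) if row[j] <= t]
            right = [k for k, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[k] for k in left]) + sse([y[k] for k in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:  # no admissible split, so this node becomes a leaf
        return {"leaf": mean}
    _, j, t, left, right = best

    def grow(idx):
        return build_tree([X[k] for k in idx], [y[k] for k in idx],
                          depth + 1, max_depth, min_leaf)

    return {"feature": j, "threshold": t, "left": grow(left), "right": grow(right)}


def predict(tree, row):
    """Follow the splits from the root down to a leaf and return that leaf's mean."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


def rms_error(y_true, y_pred):
    """Root-mean-square error of the predictions against the observed values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def train_and_validate(X, y, train_fraction=0.7, seed=0):
    """Fit on a random training subset, apply to the holdout rows, return (mean prediction, RMS error)."""
    train, holdout = split_rows(len(y), train_fraction, seed)
    tree = build_tree([X[i] for i in train], [y[i] for i in train])
    preds = [predict(tree, X[i]) for i in holdout]
    y_holdout = [y[i] for i in holdout]
    return sum(preds) / len(preds), rms_error(y_holdout, preds)
```

Here `X` is a list of numeric feature rows and `y` the matching responses. The `seed` argument fixes the random training subset so a run can be repeated exactly; rerunning with different seeds gives a rough sense of how much the holdout RMS error depends on which rows happened to land in the training set.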