Forests of Random Trees
While individual trees are adequate predictors and are very useful for exploring and analyzing data, there is room for improvement in predictive power. Years of research and analysis have shown that averaging the predictions of an ensemble of models, or a “Forest” of Trees, typically predicts better than any single tree. Although this was first implemented with tree-based models several years ago, few products have implemented the approach. Of those few that have, none has done so with either multi-way segmentation or multivariate outcomes. And no one but Golden Helix provides it in a single, fully integrated package.
Time and again, we have found that the random tree approach implemented in Optimus RP outperforms all other predictive modeling techniques.
Forests of Random Trees Process
The process for creating and using Forests of Trees is very simple and largely automated. Start by indicating the number of random trees to include in the model, confirm the significance level at which you wish to stop building a given tree, and click OK. It is that simple.
Behind the scenes, Optimus RP performs exhaustive calculations, finding the best split (binary or multi-way) for each variable and calculating its statistical significance. Optimus RP then randomly chooses one of the statistically significant splits, ignoring the others, and splits the data accordingly. It repeats the process in each sub-group until no remaining split meets the preset level of statistical significance. This completes the first tree. The same procedure is repeated for each tree in the forest until the desired number of trees has been created.
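The tree-building loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Optimus RP implementation: it handles numeric predictors with binary splits only, it scores a split with a normal-approximation test on the difference in means (the text does not say which test Optimus RP uses), it applies no adjustment for the number of candidate splits examined, and every name in it (split_p_value, best_split, grow_tree, grow_forest) is hypothetical.

```python
import math
import random
from statistics import mean, variance


def split_p_value(left, right):
    """Two-sided p-value for a difference in means (normal approximation).

    Assumption: a stand-in test chosen for this sketch only.
    """
    if len(left) < 2 or len(right) < 2:
        return 1.0  # too few observations to judge the split
    se = math.sqrt(variance(left) / len(left) + variance(right) / len(right))
    if se == 0.0:
        return 0.0 if mean(left) != mean(right) else 1.0
    z = abs(mean(left) - mean(right)) / se
    return math.erfc(z / math.sqrt(2))


def best_split(rows, y, var):
    """Try every threshold of one variable; return (p, var, threshold) or None."""
    best = None
    for t in sorted({r[var] for r in rows})[:-1]:
        left = [yi for r, yi in zip(rows, y) if r[var] <= t]
        right = [yi for r, yi in zip(rows, y) if r[var] > t]
        p = split_p_value(left, right)
        if best is None or p < best[0]:
            best = (p, var, t)
    return best


def grow_tree(rows, y, alpha, rng):
    """Grow one random tree as nested dicts (a hypothetical layout)."""
    candidates = [best_split(rows, y, v) for v in range(len(rows[0]))]
    significant = [c for c in candidates if c is not None and c[0] < alpha]
    if not significant:
        return {"leaf": mean(y)}  # no significant split left: stop here
    # Pick one significant split at random and ignore the others.
    _, var, t = rng.choice(significant)
    left = [i for i, r in enumerate(rows) if r[var] <= t]
    right = [i for i, r in enumerate(rows) if r[var] > t]
    return {
        "var": var,
        "threshold": t,
        "left": grow_tree([rows[i] for i in left], [y[i] for i in left], alpha, rng),
        "right": grow_tree([rows[i] for i in right], [y[i] for i in right], alpha, rng),
    }


def grow_forest(rows, y, n_trees, alpha=0.05, seed=0):
    """Repeat the random tree-building procedure n_trees times."""
    rng = random.Random(seed)
    return [grow_tree(rows, y, alpha, rng) for _ in range(n_trees)]
```

The two user-facing settings mentioned earlier correspond to n_trees and alpha here. Because the split is chosen at random among the significant candidates rather than always taking the single most significant one, repeated runs produce different trees, which is what makes averaging them worthwhile.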
Applying the Model
For a given tree, the prediction for an observation is the mean outcome of the leaf node into which the observation falls. Averaging these leaf-node means across all trees gives the multi-tree prediction.
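As a small worked example of this averaging, the Python sketch below scores one observation against a hand-written forest of three trees. The nested-dictionary tree layout, the function names predict_tree and predict_forest, and the numbers are all made up for illustration; they do not describe how Optimus RP stores or applies a model.

```python
from statistics import mean


def predict_tree(tree, row):
    """Follow the splits down to a leaf and return that leaf's mean."""
    while "leaf" not in tree:
        branch = "left" if row[tree["var"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


def predict_forest(forest, row):
    """Average the single-tree predictions across all trees."""
    return mean(predict_tree(tree, row) for tree in forest)


# Three illustrative trees; each leaf stores the mean outcome of its node.
forest = [
    {"var": 0, "threshold": 5.0, "left": {"leaf": 2.0}, "right": {"leaf": 8.0}},
    {"var": 1, "threshold": 1.0, "left": {"leaf": 3.0}, "right": {"leaf": 9.0}},
    {"leaf": 4.0},  # a tree in which no split was significant
]

row = (3.0, 2.0)
print(predict_forest(forest, row))  # (2.0 + 9.0 + 4.0) / 3 = 5.0
```

The observation lands in the left leaf of the first tree, the right leaf of the second, and the only leaf of the third, so the multi-tree prediction is the average of those three leaf means.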