7.1 Tree Analysis Overview
Optimus RP’s tree analysis engine is based on Formal Inference Recursive Modeling (FIRM) technology. FIRM has its roots in work done in the 1970s and 1980s by Dr. Douglas Hawkins (http://www.douglashawkins.com).
Early recursive partitioning approaches, such as AID (Automatic Interaction Detection), suffered from a lack of statistical rigor. Dr. Hawkins introduced statistical hypothesis testing as a way to better characterize the statistical validity of the generated models. FIRM was released in the early 1980s as a non-GUI package and is still in use today.
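The idea of judging a candidate split with a hypothesis test can be sketched as follows. This is a minimal illustration, not FIRM's or Optimus RP's actual procedure: the function names are hypothetical, the p-value comes from a normal approximation to Welch's t statistic (adequate only for fairly large groups), and the Bonferroni multiplier is an assumed stand-in for adjusting over the number of candidate splits examined.

```python
import math
from statistics import mean, variance


def two_sample_p_value(a, b):
    """Two-sided p-value for a difference in mean response between groups a and b.

    Uses Welch's t statistic with a standard normal approximation to its
    distribution, which is only reasonable when both groups are fairly large.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # No within-group spread: the groups either match exactly or differ with certainty
        return 1.0 if mean(a) == mean(b) else 0.0
    t = (mean(a) - mean(b)) / se
    # P(|Z| > |t|) for a standard normal Z
    return math.erfc(abs(t) / math.sqrt(2))


def bonferroni(p, n_tests):
    """Bonferroni-adjust a p-value for n_tests candidate splits, capped at 1."""
    return min(1.0, p * n_tests)


# Invented response values for the two sides of one candidate split
left = [92.0, 94.5, 93.1, 95.2, 91.8, 93.9]
right = [97.4, 98.1, 96.9, 99.0, 97.7, 98.6]
raw = two_sample_p_value(left, right)
print(raw, bonferroni(raw, n_tests=5))  # adjusted as if 5 predictors were tried
```

Multiplying by the number of candidates reflects the concern that picking the best of many possible splits can make a chance difference look significant.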
Optimus RP has taken the statistical foundations of FIRM and augmented them with faster and more exact segmenting algorithms. It has also extended FIRM methods to multivariate responses. We are grateful for the continued assistance of Dr. Hawkins in devising and improving many of the statistical and algorithmic methods underlying Optimus RP.
The best way to gain a working understanding of tree analysis is to go through an example.
NOTE: In the rest of this chapter, we generally assume that only one variable (column) in the spreadsheet has been designated as dependent, and that if this dependent variable is categorical, it contains only two categories. See 10 for a discussion of multiple dependent variables and multi-category dependent variables.
A Sample of a Tree Analysis
The tree in Fig. 7.1 is from the CSIM data set included in the Optimus RP release.
This data set simulates the effect of a blood pressure treatment on 1000 individuals. The response is diastolic blood pressure (BP), measured after a hypothetical drug treatment regimen. Several environmental variables mitigate the activity of the simulated treatment and therefore affect the response variable.
To begin creating this tree, open a new project, import the CSIM.ghd legacy file, and then open the spreadsheet view. In the spreadsheet view, inactivate the BP_I column (gray it out) and set BP as the dependent variable. Then choose the Analysis->Interactive Tree Analysis menu item to bring up the tree view.
Right-click the root node of the tree and do a Manual Split on Sex. This split produces two subgroups: females, whose mean blood pressure is lower than that of the males (93.4 vs. 97.8).
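What a manual split on a categorical predictor computes can be sketched in a few lines: partition the rows by each value of the chosen predictor and summarize the response in each child. The function `manual_split` and the toy rows are hypothetical; the values are invented and do not reproduce the CSIM means quoted above.

```python
from collections import defaultdict
from statistics import mean


def manual_split(rows, predictor, response):
    """Split rows on a categorical predictor.

    Returns {category: (row count, mean response)}, one entry per child node.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[predictor]].append(row[response])
    return {cat: (len(vals), mean(vals)) for cat, vals in groups.items()}


# Hypothetical toy rows using the column names from this walkthrough
rows = [
    {"Sex": "F", "Smoke?": "N", "BP": 90.0},
    {"Sex": "F", "Smoke?": "Y", "BP": 96.0},
    {"Sex": "M", "Smoke?": "N", "BP": 97.0},
    {"Sex": "M", "Smoke?": "Y", "BP": 99.0},
]
print(manual_split(rows, "Sex", "BP"))  # {'F': (2, 93.0), 'M': (2, 98.0)}
```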
Next, do a Manual Split on Smoke? from the Sex F node. This splits the females into smoking and non-smoking subgroups; the smokers have higher blood pressure.
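The nested split can be pictured as a small tree data structure in which each child node keeps only the rows that reach it, so the Smoke? split is computed within the females alone. This `Node` class is a hypothetical sketch for illustration, not Optimus RP's internal representation, and the toy rows are invented.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Node:
    """A tree node: a label, the rows that reach it, and its children by category."""
    label: str
    rows: list
    children: dict = field(default_factory=dict)

    def split(self, predictor):
        """Create one child per category of predictor, using only this node's rows."""
        groups = {}
        for row in self.rows:
            groups.setdefault(row[predictor], []).append(row)
        self.children = {
            value: Node(f"{predictor} {value}", subset)
            for value, subset in groups.items()
        }
        return self.children

    def mean_response(self, response):
        """Mean of the response column over the rows in this node."""
        return mean(row[response] for row in self.rows)


# Invented toy rows; they do not reproduce the CSIM results
rows = [
    {"Sex": "F", "Smoke?": "N", "BP": 90.0},
    {"Sex": "F", "Smoke?": "N", "BP": 92.0},
    {"Sex": "F", "Smoke?": "Y", "BP": 96.0},
    {"Sex": "M", "Smoke?": "N", "BP": 97.0},
    {"Sex": "M", "Smoke?": "Y", "BP": 99.0},
]
root = Node("All", rows)
females = root.split("Sex")["F"]  # first split, on the root node
for child in females.split("Smoke?").values():  # second split, within Sex F only
    print(child.label, len(child.rows), child.mean_response("BP"))
```

The Sex M child keeps an empty `children` dict, matching the walkthrough, where only the Sex F node is split further.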
Not all of the effects are necessarily found in a single tree. For example, a node may have more than one significant splitter, but only one of them can be used to split it. This is why multiple trees are usually explored, either interactively or by sampling them with the random tree creation menu described later.
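A minimal sketch of why one tree cannot show every effect: several predictors can be significant at a node, yet only the best one splits it. This is a hypothetical illustration, not Optimus RP's selection rule. It assumes two-category predictors, a normal approximation to Welch's t as a large-sample stand-in for a proper test, a Bonferroni adjustment over the candidate predictors, and an invented `Group` column with no real effect.

```python
import math
from statistics import mean, variance


def split_p_value(rows, predictor, response):
    """Two-sided p-value for a two-category predictor (normal approximation to Welch's t)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[predictor], []).append(row[response])
    a, b = groups.values()  # assumes exactly two categories
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    return math.erfc(abs(t) / math.sqrt(2))


def rank_splitters(rows, predictors, response, alpha=0.05):
    """Return the significant predictors (best first) and the one chosen to split."""
    n = len(predictors)
    adjusted = {p: min(1.0, split_p_value(rows, p, response) * n) for p in predictors}
    significant = sorted((p for p in predictors if adjusted[p] < alpha), key=adjusted.get)
    chosen = significant[0] if significant else None
    return significant, chosen


# Deterministic toy data: Sex adds 8, smoking adds 4, Group has no effect
rows = []
for i in range(40):
    male = i % 2 == 1
    smoker = (i // 2) % 2 == 0
    rows.append({
        "Sex": "M" if male else "F",
        "Smoke?": "Y" if smoker else "N",
        "Group": "A" if i < 20 else "B",
        "BP": 90.0 + 8 * male + 4 * smoker + (i % 5) - 2,
    })

significant, chosen = rank_splitters(rows, ["Sex", "Smoke?", "Group"], "BP")
print(significant, chosen)  # both Sex and Smoke? pass, but only Sex splits the node
```

The Smoke? effect is not lost; it simply appears lower in this tree or at the top of another one, which is why exploring several trees is worthwhile.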