Creating a Random Tree Model
The random creation of multiple trees can be started from either a spreadsheet view or a tree view.
From the spreadsheet view, the Random Tree Creation dialog window is accessed through the Analysis->Create A Multiple Tree Model menu.
From a tree view window, the menu choice Tree->Extend Current Tree Randomly opens the Random Tree Creation dialog shown in Fig. 9.1. Normally, this selection would be made prior to partitioning the tree’s root node.
However, you may build random branches off of a partially built tree. To do this, invoke the Extend Current Tree Randomly menu choice when a tree is partially built. HelixTree will then randomly sample only models that begin with that partially built tree.
From this dialog, you can specify tree building parameters as well as change the tree options used for performing the splits. The tree options can also be changed through the menu Trees->Options->Tree tab.
Here are the details of the values you can edit (the tree options are covered in detail at the beginning of Chapter 7):
| Value | Purpose |
|---|---|
| Number of random trees | This specifies the number of trees that will be generated. The more trees you have time to generate, the better the quality of the multiple tree analysis. You may wish to pick a large number, go out to lunch, and cancel the tree building when you get back. Those trees that were built prior to canceling will be available for analysis. |
| Random seed | This integer value sets the seed for the random number generator. This is useful for generating different forests of random trees, or if you want to regenerate a forest of random trees using the same sequence of random steps as done previously. |
| Number of significant splitters | The program will rank the splitters by significance and uniformly choose from among the 10 most significant by default. You may raise or lower this threshold. If you want any significant splitter to be chosen, set this to a number greater than or equal to the total number of columns in the spreadsheet. Note: non-significant splitters will not be randomly chosen. The default significance, which is set in the tree options menu, is 0.01. |
| Minimum elements per child | This specifies the minimum number of elements allowed in each child node of a split. The default setting is 1. |
| Max Segments | This is the maximum cardinality of a multi-way split, with a default value of 10. |
| Segmenting Algorithm | Two segmenting algorithms are available: the default, Approximate, which runs faster, and Exact, which, though slower, is... more exact! |
| Parallel threads | The default setting is 1. If you are running a multiprocessor machine, setting this value to the number of processors available speeds up the computation. |
| P value threshold type | The default setting is Bonferroni adjusted P. |
| P value threshold | The default setting is 0.01. |
| Multi-way split pairwise P value threshold | The default setting is 0.01. |
| Using missing values | This determines whether missing values are used in prediction or are dropped. |
| Genotype splits | The maximum number of segments has a default value of 10. |
| Haplotype Trend Regression | The default setting is OFF. To do Haplotype Trend Regressions (see Chapter 17), it must be turned ON. When it is ON, additional elements become available to change, such as Marker window size and Minimum frequency. The type of haplotype estimation (EM or CHM) can be changed, as well as the Maximum EM Iterations and EM Convergence Tolerance. |
| Non-genetic splits | The default setting is ON. |
| Linear/Logistic Regression | The default setting is OFF. (See Chapter 7.) |
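
The interaction between the random seed, the significance threshold, and the number of significant splitters can be illustrated with a short Python sketch. This is not HelixTree's code: the function `choose_splitter`, the column names, and the p-values are hypothetical, and the sketch assumes that a splitter counts as significant when its (already adjusted) p-value is at or below the threshold.

```python
import random

def choose_splitter(p_values, rng, top_k=10, threshold=0.01):
    """Pick one splitter uniformly at random from the top_k most significant.

    p_values maps a column name to its (already adjusted) p-value.
    Returns None when no column passes the threshold.
    """
    # Non-significant splitters are never candidates (assumed rule: p <= threshold).
    significant = [(p, name) for name, p in p_values.items() if p <= threshold]
    # Rank by significance: smallest p-value first, ties broken by name.
    significant.sort()
    candidates = [name for _, name in significant[:top_k]]
    if not candidates:
        return None
    # Uniform choice among the ranked candidates.
    return rng.choice(candidates)

# Hypothetical p-values, for illustration only.
example = {"age": 0.0004, "dose": 0.003, "weight": 0.2, "marker_7": 0.009}

# Two generators with the same seed replay the same sequence of random choices.
rng_a = random.Random(42)
rng_b = random.Random(42)
picks_a = [choose_splitter(example, rng_a) for _ in range(5)]
picks_b = [choose_splitter(example, rng_b) for _ in range(5)]
```

In this sketch `picks_a` and `picks_b` are identical because both generators start from the same seed, and `weight` can never be picked because its p-value is above the threshold. Raising `top_k` to at least the number of columns lets any significant splitter be chosen.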
Click on the Go button to begin the generation of random trees.
Click on Close to exit without creating a tree.
Click on Help to open a simple help screen.
If you have chosen a sizable number of trees to create, this could take some time. A progress indicator window pops up showing the percentage completed, which helps you gauge the amount of time remaining. You can click the End Processing button to stop the tree building before completion. All the trees built up to that point will be saved and available for analysis.
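
The behavior described here, where stopping early keeps every tree finished so far, can be pictured with the following Python sketch. It is an illustration under stated assumptions, not HelixTree's implementation: `build_forest`, `fake_build`, and the use of a `threading.Event` as a stand-in for the End Processing button are all hypothetical.

```python
import threading

def build_forest(n_trees, build_one, stop_event, on_progress=None):
    """Build up to n_trees trees, stopping early if stop_event is set.

    Trees completed before the stop request are kept and returned.
    """
    trees = []
    for i in range(n_trees):
        # Check for a stop request before starting the next tree.
        if stop_event.is_set():
            break
        trees.append(build_one(i))
        if on_progress is not None:
            # Report the percentage of requested trees completed so far.
            on_progress(100 * len(trees) // n_trees)
    return trees

stop = threading.Event()
progress = []

def fake_build(i):
    # Placeholder for real tree building (hypothetical).
    if i == 2:
        stop.set()  # simulate a stop request arriving while the third tree is built
    return f"tree-{i}"

forest = build_forest(10, fake_build, stop, progress.append)
```

Here ten trees are requested, the stop request arrives during the third, and `forest` ends up holding the three completed trees while `progress` records 10, 20, and 30 percent.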
Once the multiple tree model is created, it is saved in the Project Folder and can be viewed, renamed, or recalled from the Project Navigator window.