Defining Splits

Even though in many cases you want HelixTree to calculate the significant splits, there are times when you want to manually split on the variables and look at the statistical results. From the Manual Split view, you can pick an independent variable and manually split the results into two or more nodes.

The following describes how to do this for continuous or discrete predictors.

7.5.1 The Split Point


[Picture]
Figure 7.27: A view of the Split View referred to in the text.

In this view the response variable, BP, is on the Y-axis, and the predictor variable, AGE?, is shown on the X-axis. Data points are plotted, and the red line is the moving average over the points.

There is a single cut point at age 50, and missing values are depicted as a large question mark. The mean response of observations with missing values is bracketed by one standard deviation on either side. The question mark (partially hidden to the right of the cut point) represents how the missing data will be classified.
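As an illustration of the bracket drawn around the question mark, the following minimal sketch computes the mean response of the records whose predictor is missing, plus and minus one standard deviation. The function name, the use of None for a missing value, the sample data, and the choice of the sample (n - 1) standard deviation are all assumptions made for the example; the text does not say which estimator HelixTree uses.

```python
from statistics import mean, stdev


def missing_value_band(predictor, response):
    """Return (mean - sd, mean, mean + sd) of the response for records
    whose predictor value is missing (represented here as None)."""
    values = [y for x, y in zip(predictor, response) if x is None]
    if len(values) < 2:
        raise ValueError("need at least two records with a missing predictor")
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation (an assumption)
    return (centre - spread, centre, centre + spread)


# Hypothetical data: three of the six AGE? values are missing.
age = [34, None, 61, None, 47, None]
bp = [118.0, 130.0, 141.0, 134.0, 125.0, 138.0]
low, centre, high = missing_value_band(age, bp)
print(low, centre, high)
```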

Pressing the Reclassify Missing Data button toggles the missing values between being grouped with the right daughter (as currently pictured) and standing on their own, which places them at the far right, still represented by a question mark.

Segmentation cut points (splits) can be moved, added, or deleted as follows:

  • To move a cut point: Left-click on the cut point, then drag and drop it at the new location.
  • To add a cut point: Shift Left-click at the desired location.
  • To delete a cut point: Right-click on or near the cut point.

NOTE: The right mouse button may also be used to zoom the graph. See Section 7.5.5, Zooming into a Specific Region below.
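The three mouse operations above amount to editing a sorted list of cut points. The sketch below models that idea only; the class name, its methods, and the tolerance used to decide what counts as "on or near" a cut point are hypothetical and are not taken from HelixTree.

```python
import bisect


class CutPoints:
    """A sorted collection of cut points on the predictor axis."""

    def __init__(self, points=()):
        self.points = sorted(points)

    def add(self, x):
        """Add a cut point at x (Shift Left-click), keeping the list sorted."""
        if x not in self.points:
            bisect.insort(self.points, x)

    def _nearest(self, x, tolerance):
        """Index of the cut point closest to x, or None if none is near."""
        if not self.points:
            return None
        i = min(range(len(self.points)), key=lambda k: abs(self.points[k] - x))
        return i if abs(self.points[i] - x) <= tolerance else None

    def delete_near(self, x, tolerance=1.0):
        """Delete the cut point on or near x (Right-click)."""
        i = self._nearest(x, tolerance)
        if i is None:
            return False
        del self.points[i]
        return True

    def move(self, grabbed_at, dropped_at, tolerance=1.0):
        """Move the cut point near grabbed_at to dropped_at (drag and drop)."""
        i = self._nearest(grabbed_at, tolerance)
        if i is None:
            return False
        del self.points[i]
        self.add(dropped_at)
        return True


cuts = CutPoints([50])
cuts.add(60)            # two cut points: 50 and 60
cuts.move(50.4, 40)     # drag the cut point near 50 down to 40
after_move = list(cuts.points)
cuts.delete_near(59.5)  # right-click close to 60
print(after_move, cuts.points)
```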

7.5.2 The Split Point Controls the Node Information


[Picture]
Figure 7.28: This is a depiction of the tree with the node split on the cut point

With the split line at 50, the tree view looks like this. Note that 518 patients are aged 50 or younger. The Define Split view and the tree view are linked live: drag the split line and the tree view reflects the change.


[Picture]
Figure 7.29: The Split View showing the cut point moved and a second one added.

Switching from the Age? variable to the Age variable (with one split at about age 40), we now look at adding a cut point. Use Shift Left-click to add a cut point at about age 60. There are now two cut points, and therefore three nodes.


[Picture]
Figure 7.30: The tree view with three nodes reflecting the changes in cut points
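The node counts in the tree view follow directly from the cut points: each record falls into the node whose interval contains its predictor value. The sketch below shows one way to compute such counts. It assumes that a value equal to a cut point goes to the left node (consistent with "aged 50 or younger" above), that missing values are represented as None, and that they either join the right-most node or form their own node at the far right. The function names and the sample ages are hypothetical.

```python
from bisect import bisect_left


def assign_node(value, cuts, missing_own_node=False):
    """Return the index of the node (0 = left-most) for one predictor value.

    cuts must be sorted. A value equal to a cut point goes to the left."""
    if value is None:
        # Missing values: own node at the far right, or the right-most node.
        return len(cuts) + 1 if missing_own_node else len(cuts)
    return bisect_left(cuts, value)


def node_counts(values, cuts, missing_own_node=False):
    """Count how many records fall into each node defined by the cut points."""
    cuts = sorted(cuts)
    n_nodes = len(cuts) + 1 + (1 if missing_own_node else 0)
    counts = [0] * n_nodes
    for v in values:
        counts[assign_node(v, cuts, missing_own_node)] += 1
    return counts


# Hypothetical ages; None marks a missing value.
ages = [23, 40, 41, 50, 58, 60, 61, 75, None]
one_cut = node_counts(ages, [50])
two_cuts = node_counts(ages, [40, 60])
two_cuts_missing_apart = node_counts(ages, [40, 60], missing_own_node=True)
print(one_cut, two_cuts, two_cuts_missing_apart)
```

Recomputing the counts after every change to the cut points mirrors the live link between the two views.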

7.5.3 Smoothing the Data Points


[Picture]
Figure 7.31: The slider at the bottom changes the width of the moving average of the points.

The tree view is recomputed and re-displayed whenever the user changes the position of the cut points or missing values.
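To give a feel for what changing the width of the moving average does, here is a minimal sketch of a centered moving average over points sorted by the predictor. The window is measured in neighbouring points on each side, which is an assumption; the text does not specify how HelixTree defines the width, and the function name and data are hypothetical.

```python
def moving_average(xs, ys, half_width):
    """Centered moving average of ys over points sorted by xs.

    Each output point averages the responses of up to half_width
    neighbours on each side; the window is truncated at the ends."""
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    pairs = sorted(zip(xs, ys))
    smoothed = []
    for i, (x, _) in enumerate(pairs):
        lo = max(0, i - half_width)
        hi = min(len(pairs), i + half_width + 1)
        window = [y for _, y in pairs[lo:hi]]
        smoothed.append((x, sum(window) / len(window)))
    return smoothed


xs = [3, 1, 2, 4, 5]
ys = [30.0, 10.0, 20.0, 40.0, 50.0]
narrow = moving_average(xs, ys, 0)  # no smoothing: the raw points
wide = moving_average(xs, ys, 1)    # three-point window
print(wide)
```

A wider window gives a smoother line that reacts more slowly to local changes in the response.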

7.5.4 Refined or Coarse Data Points

Notice the small black dots in Fig. 7.31. Each dot represents one record. The pull-down menu lets you enlarge the points from refined to coarse. This may give you a better visual feel for the distribution of the data.


[Picture]
Figure 7.32: A view of the Coarse data points.

7.5.5 Zooming into a Specific Region

The data area of these graphs, and of the Show Split Data graphs in Section 7.3.5, may be zoomed using the right mouse button.


[Picture]
Figure 7.33: Right click and diagonally drag with the mouse to zoom on a rectangular area.

A rectangular “rubberband” defining the potential region shows the area that will be in the expanded view.


[Picture]
Figure 7.34: The graph showing the zoomed-in area delineated above.

The data points and other features from the rectangular area will now fill the graph. What had been outside the original rectangular area, which may include lines indicating cut points, will no longer show.


[Picture]
Figure 7.35: An area of the graph defined by the two horizontal lines showing an area of the graph to be expanded.

Alternatively, you can expand the graph in just one direction. To zoom in just the direction of one of the axes, right click and diagonally drag over that axis (where the numbers are). Two lines perpendicular to the axis defining the potential region show you the area to be expanded.
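Conceptually, zooming restricts the graph to the points and cut-point lines inside the dragged region, along both axes or along just one. The following sketch illustrates that filtering step only; it is not HelixTree's drawing code, and the function names and data are hypothetical.

```python
def _inside(value, limits):
    """True if value lies within limits, or if no limits are given."""
    if limits is None:
        return True
    lo, hi = sorted(limits)  # the drag may run in either direction
    return lo <= value <= hi


def zoom(points, x_range=None, y_range=None):
    """Keep the (x, y) points inside the zoomed region.

    Pass both ranges for a rectangular zoom, or only one range to
    zoom along a single axis."""
    return [(x, y) for x, y in points
            if _inside(x, x_range) and _inside(y, y_range)]


def visible_cut_points(cuts, x_range=None):
    """Cut points outside the zoomed X range are no longer shown."""
    return [c for c in cuts if _inside(c, x_range)]


points = [(25, 110), (38, 122), (52, 135), (67, 150)]
rect = zoom(points, x_range=(30, 60), y_range=(130, 120))
vertical_only = zoom(points, y_range=(120, 140))
cuts_shown = visible_cut_points([40, 60], x_range=(30, 55))
print(rect, vertical_only, cuts_shown)
```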


[Picture]
Figure 7.36: The expanded area of the graph previously defined by the two horizontal lines.

Here, after expanding vertically, we decide to further expand the image in the horizontal direction.


[Picture]
Figure 7.37: An area of the graph defined by the two vertical lines showing an area of the graph to be further expanded.


[Picture]
Figure 7.38: The expanded area of the graph previously defined by the two vertical lines.

To restore the original graph, press the Reset View button.

7.5.6 Adding Linear Regression


[Picture]
Figure 7.39: A tree view with linear regression used.

From the tree view, we can turn on linear regression from the Tree->Options menu. If we then manually split on the Age linear regression “split”, the tree drops a regression residual node. If we then try to split the residual node on the Age variable, we get the Define Split view shown in Fig. 7.40. It shows that taking the residual has removed the linear relationship between Age and BP, so the Age split is no longer statistically significant.
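The reason a residual node shows no linear trend against the regressed variable can be checked numerically: after an ordinary least-squares fit, the residuals have zero covariance with the predictor. The sketch below uses a plain least-squares line as a stand-in; the text does not describe HelixTree's regression procedure in detail, and the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """Response minus the fitted line, one value per record."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Hypothetical Age and BP values.
age = [30, 40, 50, 60, 70]
bp = [112.0, 121.0, 128.0, 141.0, 148.0]
slope, intercept = fit_line(age, bp)
res = residuals(age, bp)
print(covariance(age, bp), covariance(age, res))
```

The first covariance is clearly positive, while the second is zero up to rounding error, which is why a further split of the residual on Age finds no linear signal.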


[Picture]
Figure 7.40: A Define Split view of a linear regression.

7.5.7 Categorical Predictors

When we choose a categorical (text) variable to split on, the Define Split button opens a different dialog.


[Picture]
Figure 7.41: A view of the data showing the grouping of the predictors.

This dialog is used to control the grouping of categorical predictors. The response variable is plotted in the Y dimension as bars. The categorical predictor labels are on the X-axis.

In the example above, we have split on a genetic variable with three categorical values grouped into two distinct groups. The category 1_2 has been selected and is highlighted in pink.

If the user highlights one or more values by Control-clicking on them and then clicks in the blank space, the selected value(s) will be separated from the rest into their own group. Alternatively, clicking on the left-most group combines the selected value(s) with the values already in that group, forming a new group that is the union of all those values.


[Picture]
Figure 7.42: The graph redrawn to show the regrouping described in the text.

After selecting 1_2 and then clicking in a blank space, the 1_2 value forms a new group, as shown in Fig. 7.42.


[Picture]
Figure 7.43: Here we have joined 2_2 and 1_2 together.

Here are the details:

To select the categorical predictor values to regroup:

  • Left-click once on a vertical bar to select one categorical value; OR
  • Hold down Ctrl while clicking on the desired values to select/de-select one or more non-contiguous values; OR
  • Left-click on the first value, then Shift-click on the last value to select a range of consecutive values.

To place the selected value(s) into an existing group:

  • Click on the existing group.

To place the selected value(s) into a new group:

  • Left-click on empty space between groups.

To reclassify one value:

  • Click and hold the value.
  • Drag it over to either another group or an empty space between groups, and release the button there.
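The regrouping operations above can be modelled as edits to a list of groups, each group being a list of categorical values. This is a minimal sketch under stated assumptions: the function names are hypothetical, the starting grouping and the value 1_1 are invented for the example, and a new group is simply appended at the right rather than placed at the clicked position.

```python
def _without(groups, selected):
    """Copy of groups with the selected values removed from every group."""
    chosen = set(selected)
    return [[v for v in group if v not in chosen] for group in groups]


def move_to_new_group(groups, selected):
    """Place the selected values into a new group of their own."""
    remaining = [g for g in _without(groups, selected) if g]
    return remaining + [list(selected)]


def move_to_existing_group(groups, selected, target):
    """Place the selected values into the group at index target."""
    stripped = _without(groups, selected)
    stripped[target] = stripped[target] + list(selected)
    # Drop any group left empty by the move.
    return [g for g in stripped if g]


# Hypothetical starting point: three values in two groups.
groups = [["1_1", "1_2"], ["2_2"]]
split_off = move_to_new_group(groups, ["1_2"])
joined = move_to_existing_group(split_off, ["1_2"], target=1)
print(split_off)
print(joined)
```

Dragging a single value onto another group or onto empty space corresponds to calling one of the two functions with a one-element selection.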