7.5 Defining Splits

Although in many cases you will want ChemTree to calculate the significant splits, at times you may want to split on a variable manually and examine the statistical results. From the Manual Split view, you can pick an independent variable and manually split the data into two or more nodes.

The following describes how to do this for continuous or discrete predictors.

7.5.1 The Split Point


[Picture]
Figure 7.24: A view of the Split View referred to in the text.

In this view the response variable, Potency, is on the Y axis, and the predictor variable, PLHI:C(C)-C(CC), is on the X axis. The data points are plotted, and the green line is a moving average over the points. A single cut point divides the compounds into two groups: path length 4 and lower, and path length 5 and higher. The missing values have been grouped with the latter and are depicted as a large question mark with an underscore at the level of the mean Potency.

Pressing the Reclassify Missing Data button toggles the missing values between two placements: grouped with the right daughter node (as currently pictured), or in a group of their own at the far right, still represented by a question mark. Segmentation cut points (splits) can be moved, added, or deleted as follows:

  • To move a cut point: Left-click it and drag it to the new location.
  • To add a cut point: Shift-left-click at the desired location.
  • To delete a cut point: Right-click on or near the cut point.

NOTE: The right mouse button may also be used to zoom the graph. See Section 7.5.5, Zooming into a Specific Region, below.

7.5.2 The Split Point Controls the Node Information


[Picture]
Figure 7.25: The tree view with the node split at the cut point

With the split line at 4, the tree view looks like this. Note that 31 compounds have a path length of 4 or lower and 128 compounds have a path length of 5 or higher. The Split View and the tree view are live-linked: when you drag the split line, the tree view updates to reflect the change.
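To make the live connection concrete, here is a minimal Python sketch of how a set of cut points can partition compounds into nodes. It is not ChemTree's implementation: the function `assign_nodes`, the flags `MISSING_WITH_RIGHT` and `MISSING_OWN_NODE`, and the rule that a value equal to a cut point falls to the left are assumptions made for illustration, and the path lengths are invented values.

```python
import bisect
import math

# Hypothetical labels for the two Reclassify Missing Data states.
MISSING_WITH_RIGHT = "right"  # missing values join the rightmost node
MISSING_OWN_NODE = "own"      # missing values form an extra node at the far right


def is_missing(v):
    return v is None or (isinstance(v, float) and math.isnan(v))


def assign_nodes(values, cut_points, missing=MISSING_WITH_RIGHT):
    """Return a node index (0 = leftmost) for each predictor value.

    Assumes a cut point c sends values <= c to the left, so c = 4 separates
    "path length 4 and lower" from "5 and higher".
    """
    cuts = sorted(cut_points)
    rightmost = len(cuts)  # index of the last node created by the cut points
    nodes = []
    for v in values:
        if is_missing(v):
            nodes.append(rightmost if missing == MISSING_WITH_RIGHT else rightmost + 1)
        else:
            # bisect_left counts the cut points strictly below v
            nodes.append(bisect.bisect_left(cuts, v))
    return nodes


if __name__ == "__main__":
    path_lengths = [2, 3, 4, 4, 5, 6, None, 7]
    for mode in (MISSING_WITH_RIGHT, MISSING_OWN_NODE):
        nodes = assign_nodes(path_lengths, [4], missing=mode)
        n_nodes = max(nodes) + 1
        print(mode, [nodes.count(i) for i in range(n_nodes)])
    # prints "right [4, 4]" then "own [4, 3, 1]"
```

Recomputing the node counts whenever a cut point moves is what keeps the two views in step. Passing two cut points, for example `[3, 4]`, yields three nodes, as in the three-node tree discussed next.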


[Picture]
Figure 7.26: The Split View showing the cut point moved and a second one added.

For example, Shift-left-click to add a cut point at a path length of 3. There are now two cut points, which divide the compounds into three nodes.


[Picture]
Figure 7.27: The tree view with three nodes reflecting the changes in cut points

7.5.3 Smoothing the Data Points


[Picture]
Figure 7.28: The slider at the bottom changes the width of the moving average of the points.

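Use the slider at the bottom of the view to change the width of the moving average shown as the green line; a wider window gives a smoother curve. The tree view is recomputed and redisplayed whenever you change the position of the cut points or the placement of the missing values.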
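The green line in the Split View is a moving average whose width is set by the slider. The Python sketch below shows one common form, a centered moving average over points sorted by the predictor value. Whether ChemTree measures the window in number of points (as assumed here) or in units of the X axis is not stated, and `moving_average` and `half_width` are hypothetical names.

```python
def moving_average(xs, ys, half_width):
    """Centered moving average of ys, with points ordered by x.

    half_width is a hypothetical stand-in for the slider: each smoothed value
    averages up to 2 * half_width + 1 neighbouring points, with the window
    truncated at both ends. Points with a missing x or y are skipped.
    """
    if half_width < 0:
        raise ValueError("half_width must be >= 0")
    pts = sorted((x, y) for x, y in zip(xs, ys) if x is not None and y is not None)
    sx = [x for x, _ in pts]
    sy = [y for _, y in pts]
    smoothed = []
    for i in range(len(sy)):
        lo = max(0, i - half_width)
        hi = min(len(sy), i + half_width + 1)
        smoothed.append(sum(sy[lo:hi]) / (hi - lo))
    return sx, smoothed


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [0, 6, 0, 6, 0]
    print(moving_average(xs, ys, 1))
    # prints ([1, 2, 3, 4, 5], [3.0, 2.0, 4.0, 2.0, 3.0])
```

In the example the raw values swing between 0 and 6, while the smoothed values stay between 2 and 4; a larger `half_width` flattens the curve further, and `half_width=0` returns the raw values.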

7.5.4 Refined or Coarse Data Points

Notice the small black dots in Fig. 7.28; each dot represents one record. Use the pull-down menu to enlarge the points from refined to coarse data points, which may give you a better visual sense of how the data points are distributed.


[Picture]
Figure 7.29: A view of the Coarse data points.

7.5.5 Zooming into a Specific Region

The data area of these graphs, and of the Show Split Data graphs in Section 7.3.7, can be zoomed using the right mouse button.


[Picture]
Figure 7.30: Right click and diagonally drag with the mouse to zoom on a rectangular area.

As you drag, a rectangular “rubberband” outlines the region that will fill the expanded view.


[Picture]
Figure 7.31: The graph showing the zoomed-in area delineated above.

The data points and other features inside the rectangle now fill the graph. Anything that was outside the rectangle, possibly including the lines that mark cut points, is no longer shown.


[Picture]
Figure 7.32: Two horizontal lines delimit the area of the graph to be expanded.

Alternatively, you can expand the graph in one direction only. To zoom along a single axis, right-click and drag diagonally over that axis (where the numbers are). Two lines perpendicular to the axis mark the region that will be expanded.


[Picture]
Figure 7.33: The expanded area of the graph previously defined by the two horizontal lines.

Here, after expanding vertically, we further expand the graph in the horizontal direction.


[Picture]
Figure 7.34: Two vertical lines delimit the area of the graph to be further expanded.


[Picture]
Figure 7.35: The expanded area of the graph previously defined by the two vertical lines.

To restore the original graph, press the Reset View button.
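The zoom behavior can be modeled as a visible data window that each zoom replaces and Reset View restores. The Python sketch below is purely illustrative: the `View` class and its methods are hypothetical and say nothing about how ChemTree draws its graphs. It shows why a rectangle zoom changes both ranges, an axis zoom changes only one, and anything outside the window, including cut-point lines, is not drawn.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """Hypothetical visible data window: x and y ranges in data units."""
    xmin: float
    xmax: float
    ymin: float
    ymax: float

    def zoom_rect(self, x0, y0, x1, y1):
        # Rubberband zoom: the dragged rectangle becomes the new window.
        return View(min(x0, x1), max(x0, x1), min(y0, y1), max(y0, y1))

    def zoom_x(self, x0, x1):
        # Dragging over the X axis: only the x range changes.
        return View(min(x0, x1), max(x0, x1), self.ymin, self.ymax)

    def zoom_y(self, y0, y1):
        # Dragging over the Y axis: only the y range changes.
        return View(self.xmin, self.xmax, min(y0, y1), max(y0, y1))

    def visible(self, points):
        # Points outside the window are simply not drawn.
        return [(x, y) for x, y in points
                if self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax]


if __name__ == "__main__":
    original = View(0, 10, 0, 100)
    # Expand vertically first, then horizontally.
    view = original.zoom_y(20, 60).zoom_x(3, 5)
    print(view.visible([(4, 30), (6, 30), (4, 70)]))  # prints [(4, 30)]
    view = original  # the equivalent of pressing Reset View
```

Keeping the original window unchanged (the class is frozen) is what makes a one-step reset possible, however many zooms were applied.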

7.5.6 Categorical Predictors

When we choose a categorical (text) predictor to split on, the Define Split button opens a different dialog.

This dialog controls the grouping of categorical predictor values. The response variable is plotted on the Y axis as bars, and the categorical predictor labels are on the X axis.

If you highlight one or more values by Ctrl-clicking them and then click in blank space, the selected value(s) are separated from the rest into their own group. Alternatively, if you click on the leftmost group, the selected value(s) combine with the values already in that group to form a new group containing all of them.

Here are the details:

To select the categorical predictor values to regroup:

  • Left-click once on a vertical bar to select one categorical value; OR
  • Hold down Ctrl while clicking the desired values to select or deselect one or more non-contiguous values; OR
  • Left-click the first value, then Shift-click the last value to select a range of consecutive values.

To place the selected value(s) into an existing group:

  • Click on the existing group.

To place the selected value(s) into a new group:

  • Left-click on empty space between groups.

To reclassify one value:

  • Click and hold the value.
  • Drag it to another group or to an empty space between groups, and release the button there.
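One way to picture what this dialog does is to treat each group as a set of category labels, ordered left to right along the X axis. The Python sketch below is hypothetical: `regroup` and `shift_range` are illustrative names, not part of ChemTree, and dropping a group once all its values have been moved out is an assumption about the dialog's behavior.

```python
def regroup(groups, selected, into=None, new_at=None):
    """Return a new left-to-right grouping of category labels.

    groups:   list of sets, e.g. [{"A", "B"}, {"C"}]
    selected: labels to move (one click, a Ctrl-click set, or a Shift-click range)
    into:     index of an existing group to merge into ("click on the existing group")
    new_at:   position for a new group when into is None ("click on empty space");
              defaults to the far right
    Groups emptied by the move are dropped. The input list is not modified.
    """
    selected = set(selected)
    result = [g - selected for g in groups]
    if into is not None:
        result[into] = result[into] | selected
    else:
        result.insert(len(result) if new_at is None else new_at, selected)
    return [g for g in result if g]


def shift_range(labels, first, last):
    """Labels from first to last inclusive, in axis order (Shift-click selection)."""
    i, j = sorted((labels.index(first), labels.index(last)))
    return labels[i:j + 1]


if __name__ == "__main__":
    labels = ["A", "B", "C", "D", "E"]
    groups = [{"A", "B"}, {"C"}, {"D", "E"}]
    # Merge C into the leftmost group; its old group becomes empty and disappears.
    print([sorted(g) for g in regroup(groups, {"C"}, into=0)])
    # prints [['A', 'B', 'C'], ['D', 'E']]
    # Shift-click B..D, then click empty space at the far right.
    print([sorted(g) for g in regroup(groups, shift_range(labels, "B", "D"))])
    # prints [['A'], ['E'], ['B', 'C', 'D']]
```

Dragging a single value is the same operation with a one-element selection, for example `regroup(groups, {"B"}, new_at=1)` to drop B into the gap after the first group.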