13.2 Viewing Correlation Interactions


[Picture]
Figure 13.3: A Correlation Interaction matrix of the previously selected variables

13.2.1 The Correlation Interaction View

Figure 13.3 shows a matrix of the selected variables in the order they were sorted in the Multitree Model window.

The numbers appearing in the black diagonal blocks represent the average proportion of observations described by the variable across all trees. They show the proportion of all observations, over all enumerated random trees, for which the indicated variable was a subject splitter.

13.2.2 The Upper Triangle

The upper triangle of colored blocks represents the average proportions of observations described jointly by their corresponding variable pairs across all trees. That is, each cell in the upper triangle displays the proportion of all observations over all enumerated random trees for which the indicated variables were both subject splitters.
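The diagonal and upper-triangle values could be tallied along the following lines. This is only a sketch under stated assumptions: the input layout (one set of splitter names per observation per tree) and the function name `splitter_proportions` are hypothetical and are not taken from the product, and "described by" is read here as "the variable was a splitter for that observation".

```python
from collections import Counter
from itertools import combinations


def splitter_proportions(trees):
    """Return (single, joint) proportions over all trees and observations.

    `trees` is a hypothetical input layout: a list of trees, where each tree
    is a list holding, for every observation, the set of variable names that
    acted as splitters for that observation.
    """
    single = Counter()  # counts for the diagonal blocks
    joint = Counter()   # counts for the upper-triangle blocks
    total = 0           # number of (tree, observation) pairs seen

    for tree in trees:
        for splitters in tree:
            total += 1
            single.update(splitters)
            # Count every unordered pair of variables used together.
            joint.update(combinations(sorted(splitters), 2))

    if total == 0:
        raise ValueError("no observations supplied")

    single_prop = {var: count / total for var, count in single.items()}
    joint_prop = {pair: count / total for pair, count in joint.items()}
    return single_prop, joint_prop


# Two tiny made-up trees with two observations each.
example_trees = [
    [{"Smoke?", "BMI"}, {"Smoke?"}],
    [{"BMI"}, set()],
]
single_prop, joint_prop = splitter_proportions(example_trees)
```

Pairs are keyed by their sorted variable names, so each pair is counted once regardless of the order in which the splitters appear.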

13.2.3 The Lower Triangle

Each cell in the lower triangle of colored blocks shows how far, measured in standard deviations, the actual number of cases described by the variable pair lies from the expected number of cases. (The standard deviation referred to here is that of a hypothetical random variable used to determine splitting, chosen so that the average proportion of observations for each of the two variables, considered separately, matches the actual proportion. One unit represents one standard deviation of this hypothetical random variable.)

A positive number means the splitters tend to bunch together on the same observations (positive correlation in a mathematical sense). This is evidence of an interaction effect between the splitters: some effect depends upon both splitters acting simultaneously, or interacting, to create it. In a tree, this appears as a split on one splitter underneath a split on the other.

A negative number means the splitters tend to stay apart from each other and not appear together with the same observations, or at least appear together less than they would at random. (This is negative correlation in a mathematical sense.) This represents evidence of a correlation or similarity between the effects of the variables. That is, some effect may be “explained” by either one splitter or the other. If the first splitter is used, the other is “not needed”, and if the other splitter is used, the first is “not needed”. This will show as either one splitter or the other but not both being used in any given tree.

Coloring is as follows:
Red represents evidence of interaction (positive mathematical correlation). Darker red represents more evidence of an interaction effect.
Blue represents evidence of correlation of effects (negative mathematical correlation). Darker blue represents more evidence of correlation of effects.
The color spectra range from zero to four standard deviations (of our would-be random splitting variable).
Colors closer to white indicate that the variables are closer to independent.
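The coloring rules above could be mapped to RGB values roughly as follows. This is a sketch only: the function name `statistic_to_rgb`, the linear fade from white, and the specific RGB endpoints are assumptions. The text specifies only the hues, the direction of intensity, and the four-standard-deviation range.

```python
from typing import Tuple


def statistic_to_rgb(z: float, max_sd: float = 4.0) -> Tuple[int, int, int]:
    """Map a lower-triangle statistic (in standard deviations) to a color.

    Positive values shade toward red, negative values toward blue, and
    values near zero stay close to white. Magnitudes beyond `max_sd`
    are clamped to the strongest shade.
    """
    strength = min(abs(z), max_sd) / max_sd  # 0.0 (white) .. 1.0 (strongest)
    fade = round(255 * (1.0 - strength))     # the channels that fade out
    if z > 0:
        return (255, fade, fade)  # evidence of interaction
    if z < 0:
        return (fade, fade, 255)  # evidence of correlation of effects
    return (255, 255, 255)        # exactly zero: white


white = statistic_to_rgb(0.0)
strong_red = statistic_to_rgb(4.0)
clamped_blue = statistic_to_rgb(-6.0)
```

Under these assumptions, a value of -6 standard deviations receives the same color as -4, because the scale stops at four.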

Suppose $n_i$ describes the number of observations influenced by variable $i$, $n_j$ describes the number of observations influenced by variable $j$, and $n_{ij}$ describes the number jointly influenced. Then the statistic computed (and displayed in the lower-left triangle) is:

\[
\frac{n_{ij} - \dfrac{n_i n_j}{n}}{\sqrt{\dfrac{n_i n_j}{n}}}
\]
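As a minimal sketch, the statistic can be computed directly from the three counts. The function name `interaction_statistic` is hypothetical, and `n` is assumed here to be the total number of observations counted, which the text does not define explicitly.

```python
import math


def interaction_statistic(n_i: int, n_j: int, n_ij: int, n: int) -> float:
    """Return (n_ij - n_i*n_j/n) / sqrt(n_i*n_j/n).

    n_i, n_j : observations influenced by variable i and variable j
    n_ij     : observations influenced by both variables
    n        : total observations (an assumption; not defined in the text)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    expected = n_i * n_j / n  # joint count expected under independence
    if expected == 0:
        raise ValueError("expected joint count is zero; statistic undefined")
    return (n_ij - expected) / math.sqrt(expected)


# Made-up counts: 700 and 600 of 1000 observations, three joint counts.
independent = interaction_statistic(700, 600, 420, 1000)  # matches expectation
bunched = interaction_statistic(700, 600, 500, 1000)      # more than expected
apart = interaction_statistic(700, 600, 300, 1000)        # fewer than expected
```

With these counts the expected joint count is 420, so the first call returns zero, the second is positive (red in the view), and the third is negative (blue in the view).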

In the Correlation Interaction view shown in Figure 13.3, the effects of “Smoke?” and “BMI” are strongly correlated. “Smoke?” describes approximately 70% of the observations and “BMI” describes about 60%. If they were independent, one would expect approximately 42% of the observations (0.7 × 0.6) to be jointly influenced. Instead, the actual joint proportion is significantly lower (6 standard deviations below the expected value).
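The arithmetic behind this example can be written out using the statistic defined earlier. The total count $n$ is not given in the text, so it is left symbolic, and $p_i$ and $p_j$ are shorthand introduced here for the proportions $n_i/n$ and $n_j/n$.

```latex
\begin{align*}
p_i &= \frac{n_i}{n} \approx 0.7, \qquad p_j = \frac{n_j}{n} \approx 0.6 \\
\frac{n_i n_j}{n} &= p_i \, p_j \, n \approx 0.42\,n \\
\frac{n_{ij} - 0.42\,n}{\sqrt{0.42\,n}} &\approx -6
\;\Longrightarrow\;
n_{ij} \approx 0.42\,n - 6\sqrt{0.42\,n}
\end{align*}
```

Because the shortfall scales with $\sqrt{n}$, the same gap in proportions corresponds to more standard deviations as the number of observations grows.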