17.3 Normally Distributed Response Continuous-Ordinal Predictor
The model is that the observations segment into k subgroups, and that each subgroup has a different mean with noise. Finding the k-1 cut-points that optimally split the data in a maximum likelihood sense reduces to minimizing the sum of squared deviations of the observations from their subgroup means. This can be done efficiently using dynamic programming. We do not outline the method here; an excellent reference is Bret Musser’s thesis, “Extensions to Recursive Partitioning,” University of Minnesota School of Statistics, 1999.
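As a rough illustration of the dynamic programming idea (not the method from Musser’s thesis or the SVS implementation), the sketch below finds the k-1 cut-points that minimize the total within-segment sum of squared deviations. It assumes the responses are already sorted by the predictor and that there are no tied predictor values, so a cut is allowed between any two neighbors. The name `optimal_segments` is hypothetical.

```python
from itertools import accumulate


def optimal_segments(y, k):
    """Split the ordered responses y into k contiguous segments.

    Returns (total_sse, cuts): the minimal total within-segment sum of
    squared deviations, and the start indices of segments 2..k.
    Assumes y is sorted by the predictor with no tied predictor values.
    """
    n = len(y)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(y)")

    # Prefix sums let us compute any segment's SSE in constant time.
    s1 = [0.0] + list(accumulate(y))
    s2 = [0.0] + list(accumulate(v * v for v in y))

    def sse(i, j):
        """Sum of squared deviations of y[i:j] about its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[m][j]: best SSE for splitting y[:j] into m segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):  # i = start of the last segment
                c = cost[m - 1][i] + sse(i, j)
                if c < cost[m][j]:
                    cost[m][j] = c
                    back[m][j] = i

    # Walk the back-pointers to recover the k-1 cut-points.
    cuts = []
    j = n
    for m in range(k, 1, -1):
        j = back[m][j]
        cuts.append(j)
    cuts.reverse()
    return cost[k][n], cuts
```

This simple version runs in time proportional to k times n squared. Handling tied predictor values would require restricting cuts to positions between distinct predictor values.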
There is limited theory on the choice of the number of segments. A heuristic is used whereby a two-sample t-test is done between each pair of adjacent segments in a k-way split, and if any pair of means is not significantly different, we drop down to the best k-1 split. Once every pair of adjacent segments is significantly separated, we do an overall goodness of fit test for the resulting split of 2 or more segments.
One can use an F-test to test the hypothesis that there is only one mean versus several. This p-value, which is labeled “p=” in a tree node display, can be calculated as follows.
Let there be n observations split into k subgroups with D unique values of the continuous or ordinal predictor (not counting a possible missing value). Let F0 be the sum of squared deviations from the mean over all responses. Let Fk be the sum over all segments of the squared deviations from the mean responses of each respective segment. Then

$$F = \frac{(F_0 - F_k)/(k - 1)}{F_k/(n - k)}$$

and the p-value is given as an F-test with k-1 and n-k degrees of freedom: p = Ftest(F,k - 1,n - k). Note that this test
statistic does not account for the exhaustive searching through all possible cut-points in finding the optimal set of segments. A
multiplicity adjustment is called for. Musser’s thesis describes many approaches. An obvious but overly conservative
adjustment would be on the order of $\binom{D-1}{k-1}\,p$, where D is the number of unique observations, and k is the
cardinality of the split. Musser describes various asymptotic estimates but notes that these do not hold
well for smaller sample sizes. The approach we use is that of Douglas Hawkins (unpublished work). He has
calculated a multiplicity adjustment based on curve-fitting thousands of simulations of segmenting random
normal data. The adjusted p-value, denoted “aP=” within a node, is calculated from the raw p-value, “p”, as
follows:
Let I = 1 if there are missing values, otherwise I = 0.
If (I = 1 and D = k-1) or (I = 0 and D = k), then aP = p. Otherwise:
Let $X = \log_{100} D$.
If k = 2, then $a = 2.401 + 1.2799 \log(X) + 0.2646 I$; otherwise
$a = 0.82626 + 3.8544 X - 0.61894 k + 0.71286 kX + 0.065859 I$.
$b = 1.2565 - 0.83545 X + 0.41471 X^2 + 0.0012531 k^2 - 0.0325355 kX$
$aP = \min(\exp(a + b \log(p)), 1)$
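The calculation above can be sketched in Python as follows. This is an illustrative transcription, not the SVS source code. It assumes that X is the base-100 logarithm of D, that "log" elsewhere means the natural logarithm (to match the use of exp), that b is computed the same way for every k, and that the F statistic takes the usual one-way analysis of variance form. The function names `f_statistic` and `adjusted_p` are hypothetical.

```python
import math


def f_statistic(f0, fk, n, k):
    """F statistic for one mean versus k segment means.

    f0: sum of squared deviations about the overall mean.
    fk: sum of squared deviations about the segment means.
    Assumes the usual one-way ANOVA form with k-1 and n-k degrees of freedom.
    """
    return ((f0 - fk) / (k - 1)) / (fk / (n - k))


def adjusted_p(p, d, k, has_missing=False):
    """Multiplicity-adjusted p-value (aP) from the raw p-value.

    p: raw p-value, 0 < p <= 1.
    d: number of unique predictor values (missing value not counted).
    k: number of segments in the split.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must satisfy 0 < p <= 1")
    i = 1 if has_missing else 0

    # No search over cut-points took place, so no adjustment is applied.
    if (i == 1 and d == k - 1) or (i == 0 and d == k):
        return p

    # Assumed reading of "log100D": logarithm of D to base 100.
    x = math.log(d) / math.log(100)

    if k == 2:
        a = 2.401 + 1.2799 * math.log(x) + 0.2646 * i
    else:
        a = 0.82626 + 3.8544 * x - 0.61894 * k + 0.71286 * k * x + 0.065859 * i
    b = (1.2565 - 0.83545 * x + 0.41471 * x ** 2
         + 0.0012531 * k ** 2 - 0.0325355 * k * x)

    return min(math.exp(a + b * math.log(p)), 1.0)
```

The raw p-value itself requires the upper tail of an F distribution with k-1 and n-k degrees of freedom, which the standard library does not provide; a statistics package would be needed for that step.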