17.10 Binomially Distributed Response, Categorical Predictor

In the univariate case, the predictor categories are ordered from lowest to highest by their means (proportion of ones) before segmenting is performed. As in the continuous predictor case, segmenting is performed by minimizing the metric

\[
F_k = \sum_{j=1}^{k} \left[ -2 n_j \left( p_j \log(p_j) + (1 - p_j) \log(1 - p_j) \right) \right],
\]

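where the n observations are split into k segments formed from the D categories, $n_j$ is the number of observations in segment j, and $p_j$ is the proportion of ones in segment j.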
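As a minimal sketch of the metric and of a split search, the Python below computes F_k from per-segment counts and tries every two-way cut of the ordered categories. The helper names, the convention 0 log 0 = 0, the restriction to k = 2, and the exhaustive search are all assumptions made for illustration; the text does not say how the minimization is carried out. Every category is assumed to contain at least one observation.

```python
import math

def entropy_term(p):
    """Return p*log(p) + (1-p)*log(1-p), treating 0*log(0) as 0 (assumed)."""
    return sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)

def segment_metric(segments):
    """F_k for a list of (n_j, ones_j) pairs, one pair per segment."""
    return sum(-2.0 * n * entropy_term(ones / n) for n, ones in segments)

def best_two_way_split(categories):
    """Hypothetical helper: try every cut of the ordered categories into two
    contiguous segments and keep the cut with the smallest F_2."""
    # Order categories by their proportion of ones, lowest first.
    ordered = sorted(categories, key=lambda c: c[1] / c[0])
    best = None
    for cut in range(1, len(ordered)):
        parts = (ordered[:cut], ordered[cut:])
        # Pool counts and ones within each candidate segment.
        segments = [(sum(n for n, _ in part), sum(o for _, o in part))
                    for part in parts]
        f = segment_metric(segments)
        if best is None or f < best[0]:
            best = (f, parts[0], parts[1])
    return best

# Four categories given as (observations, ones).
f2, left, right = best_two_way_split([(10, 1), (10, 9), (10, 2), (10, 8)])
print(left, right)
```

In this example the two low-proportion categories end up in one segment and the two high-proportion categories in the other, because that cut leaves the least within-segment deviance.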

In the multivariate case, where the response is v-dimensional, the predictor categories are ordered from lowest to highest by the proportion of ones in one response dimension, and the optimal split is calculated. The categories are then reordered along the next dimension and the optimal split is calculated again, and so forth through all v dimensions. The split with the lowest overall p-value is taken. The p-value calculations are as follows:

  • More than two categories (D > 2) or multivariate. An analysis of deviance test is performed. Let s be the proportion of ones in the entire sample, and $p_j$ be the proportion of ones in segment j. Then
    \[
    F_0 = -2n \left( s \log(s) + (1 - s) \log(1 - s) \right),
    \]

    and $F_k$ is defined as above. Then $X^2 = F_0 - F_k$, and $p = \mathrm{chisqr}(X^2, k - 1)$. If k = D, aP = p; otherwise,

    \[
    \mathit{aP} = \mathrm{chisqr}\left( \frac{F_0 - F_k}{\min\left( 1.013 + \frac{1}{D^{0.75}} - \frac{1}{k^{1.75}},\ 1 \right)},\ D - 1 \right).
    \]

  • Two categories (and two segments) (D = k = 2), univariate test. A Pearson chi-squared test is done. Suppose there are $n_0$ items in the first segment whose binary responses add to $s_0$, and that there are $n_1$ items in the other segment whose binary responses add to $s_1$. Then
    \[
    X^2 = \frac{\left( s_0 (n_1 - s_1) - s_1 (n_0 - s_0) \right)^2 (n_0 + n_1)}{n_0 n_1 (s_0 + s_1)(n_0 - s_0 + n_1 - s_1)},
    \]

    and $p = \mathrm{chisqr}(X^2, k - 1) = \mathrm{chisqr}(X^2, 1)$. aP = p in this case.
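The two unadjusted p-value calculations can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: chisqr(x, df) is assumed to be the upper-tail probability of a chi-squared distribution, the function names are hypothetical, 0 log 0 is taken as 0, and the adjusted value aP for k < D is not computed. The tail probability uses closed-form sums that hold for integer degrees of freedom, so only the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-squared probability for integer df >= 1 (assumed meaning of chisqr)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i < df/2} h^i / i!
        term = math.exp(-h)
        total = term
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return total
    # Odd df: erfc(sqrt(h)) plus terms h^(i - 1/2) * exp(-h) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(h))
    term = math.exp(-h) * math.sqrt(h) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= h / (i + 0.5)
    return total

def deviance(n, ones):
    """-2 n (p log p + (1 - p) log(1 - p)) with p = ones / n and 0 log 0 = 0."""
    p = ones / n
    return -2.0 * n * sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)

def deviance_test(segments):
    """segments: list of (n_j, ones_j). Returns (X2, p) with X2 = F0 - Fk, df = k - 1."""
    n = sum(nj for nj, _ in segments)
    ones = sum(sj for _, sj in segments)
    x2 = deviance(n, ones) - sum(deviance(nj, sj) for nj, sj in segments)
    return x2, chi2_sf(x2, len(segments) - 1)

def pearson_2x2(n0, s0, n1, s1):
    """Pearson chi-squared statistic and p-value for two segments, df = 1."""
    num = (s0 * (n1 - s1) - s1 * (n0 - s0)) ** 2 * (n0 + n1)
    den = n0 * n1 * (s0 + s1) * (n0 - s0 + n1 - s1)
    x2 = num / den
    return x2, chi2_sf(x2, 1)

# Three segments for the deviance test, two segments for the Pearson test.
print(deviance_test([(20, 3), (20, 10), (20, 17)]))
print(pearson_2x2(20, 3, 20, 17))
```

The Pearson function assumes that both segments are non-empty and that the pooled responses are neither all zeros nor all ones; otherwise the denominator is zero.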

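An analysis of deviance test is a likelihood ratio test: its statistic compares the binomial log-likelihood of the segmented data with that of the unsegmented sample.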

References to "the 2x2 chi-square" usually mean a test of association in the 2x2 table using the Pearson chi-squared statistic, which is indeed the most common test statistic, but not the only one.

However, the analysis of deviance test is an accepted, valid test. It in fact has somewhat more theory in its foundation than the Pearson test does, because it is a likelihood ratio test, to which the Pearson statistic is a first-order approximation.

Still, the Pearson and deviance tests, being first-order equivalent, normally do not differ all that much.
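The first-order relationship between the two statistics can be sketched with a Taylor expansion. The notation here is an assumption introduced for illustration and does not appear in the text above: $O_i$ and $E_i$ are the observed and expected counts in cell i of the table, $\delta_i = O_i - E_i$, and $G^2$ is the deviance statistic written in its equivalent observed-versus-expected form.

```latex
\begin{align*}
G^2 &= 2 \sum_i O_i \log\frac{O_i}{E_i}
     = 2 \sum_i (E_i + \delta_i) \log\left( 1 + \frac{\delta_i}{E_i} \right) \\
    &= 2 \sum_i (E_i + \delta_i)
       \left( \frac{\delta_i}{E_i} - \frac{\delta_i^2}{2 E_i^2}
              + O\!\left( \frac{|\delta_i|^3}{E_i^3} \right) \right) \\
    &= 2 \sum_i \delta_i + \sum_i \frac{\delta_i^2}{E_i}
       + O\!\left( \sum_i \frac{|\delta_i|^3}{E_i^2} \right) \\
    &= X^2_{\mathrm{Pearson}}
       + O\!\left( \sum_i \frac{|\delta_i|^3}{E_i^2} \right).
\end{align*}
```

The last step uses $\sum_i \delta_i = 0$, which holds because the observed and expected counts have the same total. The two statistics therefore agree up to terms that are small whenever the deviations $\delta_i$ are small relative to the expected counts.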