17.8 Binomially Distributed Response, Binary Predictor

17.8.1 Univariate Case

In this case, all of the observations whose predictor value is zero are placed in one group, and all of the observations whose predictor value is one are placed in a second group. A chi-square test is then used to test the hypothesis that the two groups have the same proportion of ones in the response; a small p-value indicates that the proportions differ.

More specifically, suppose a binary split places n0 observations in one group, whose binary responses sum to s0, and n1 observations in the other group, whose binary responses sum to s1. Then

\[
X^2 = \frac{\left(s_0(n_1 - s_1) - s_1(n_0 - s_0)\right)^2 (n_0 + n_1)}{n_0 n_1 (s_0 + s_1)(n_0 - s_0 + n_1 - s_1)},
\]

and the p-value is given by integrating the tail of a chi-square distribution with 1 degree of freedom:

\[
p = \mathrm{chisqr}(X^2, 1).
\]
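The calculation can be sketched in a few lines of Python. This is a minimal illustration rather than the package's own implementation: the function names are hypothetical, chisqr(x, 1) is assumed to mean the upper-tail probability of a chi-square distribution with 1 degree of freedom (computed here with the identity P(X > x) = erfc(sqrt(x/2)), which holds for 1 degree of freedom), and the handling of a zero denominator is an assumption the text does not address.

```python
import math


def chi2_tail_1df(x):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def binary_split_pvalue(n0, s0, n1, s1):
    """Return (X2, p) for a binary split of a binary response.

    n0, n1: number of observations in each group.
    s0, s1: sum of the binary responses (count of ones) in each group.
    """
    numerator = (s0 * (n1 - s1) - s1 * (n0 - s0)) ** 2 * (n0 + n1)
    denominator = n0 * n1 * (s0 + s1) * (n0 - s0 + n1 - s1)
    if denominator == 0:
        # Assumption: an empty group or a constant response gives no
        # evidence of a difference, so report X2 = 0 and p = 1.
        return 0.0, 1.0
    x2 = numerator / denominator
    return x2, chi2_tail_1df(x2)


# Example with made-up counts: 10 of 50 ones in one group, 25 of 50 in the other.
x2, p = binary_split_pvalue(n0=50, s0=10, n1=50, s1=25)
print(f"X2 = {x2:.4f}, p = {p:.4g}")
```

When both groups have the same proportion of ones, the numerator is zero, so the statistic is 0 and the p-value is 1.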


17.8.2 Multivariate Case

Suppose we have a v-dimensional binary response, and we split the observations into two daughter nodes based on whether the predictor for a given observation is zero or one. Let n be the total number of observations, and let si be the proportion of ones in dimension i of the response over all n observations, for i = 1...v. Let pi be the proportion of ones for dimension i among the observations where the predictor is equal to one, and qi be the proportion of ones for dimension i among the observations where the predictor is equal to zero. Let np be the number of observations where the predictor is equal to one, and nq be the number of observations where the predictor is equal to zero, so that n = np + nq. Then

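\[
F_0 = \sum_{i=1}^{v} \left[ -2n \left( s_i \log(s_i) + (1 - s_i) \log(1 - s_i) \right) \right]
\]

\[
F_k = \sum_{i=1}^{v} \left[ -2n_p \left( p_i \log(p_i) + (1 - p_i) \log(1 - p_i) \right) - 2n_q \left( q_i \log(q_i) + (1 - q_i) \log(1 - q_i) \right) \right]
\]

\[
p = aP = \mathrm{chisqr}(F_0 - F_k, 1).
\]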
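The multivariate statistic can be sketched the same way. The Python below is an illustration under stated assumptions, not the package's own code: the function names and the list-of-rows data layout are hypothetical, log is taken to be the natural logarithm, 0·log(0) is treated as 0, n is taken to be np + nq, and chisqr(x, 1) is assumed to be the upper-tail probability with 1 degree of freedom, as written in the text.

```python
import math


def chi2_tail_1df(x):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    # max() guards against tiny negative values caused by rounding.
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def log_term(r):
    """r*log(r) + (1 - r)*log(1 - r), with 0*log(0) taken as 0 (assumption)."""
    total = 0.0
    for t in (r, 1.0 - r):
        if t > 0.0:
            total += t * math.log(t)
    return total


def multivariate_split_pvalue(responses, predictor):
    """Return (F0, Fk, p) for a v-dimensional binary response.

    responses: list of rows, each a list of v values in {0, 1}.
    predictor: list of values in {0, 1}, one per row.
    """
    ones = [row for row, x in zip(responses, predictor) if x == 1]
    zeros = [row for row, x in zip(responses, predictor) if x == 0]
    n_p, n_q = len(ones), len(zeros)
    n = n_p + n_q
    v = len(responses[0])

    f0 = 0.0
    fk = 0.0
    for i in range(v):
        s_i = sum(row[i] for row in responses) / n
        # An empty daughter node contributes nothing because its count is 0.
        p_i = sum(row[i] for row in ones) / n_p if n_p else 0.0
        q_i = sum(row[i] for row in zeros) / n_q if n_q else 0.0
        f0 += -2.0 * n * log_term(s_i)
        fk += -2.0 * n_p * log_term(p_i) - 2.0 * n_q * log_term(q_i)

    return f0, fk, chi2_tail_1df(f0 - fk)


# Made-up example: the predictor separates both response dimensions perfectly.
responses = [[1, 0], [1, 0], [0, 1], [0, 1]]
predictor = [1, 1, 0, 0]
f0, fk, p = multivariate_split_pvalue(responses, predictor)
print(f"F0 = {f0:.4f}, Fk = {fk:.4f}, p = {p:.4g}")
```

A split that leaves each daughter node with the same proportions as the parent gives Fk equal to F0, so the statistic is 0 and the p-value is 1.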