Logistic Regression From a Tree Node
The (univariate binary) response, y, is fit to the given predictor variable, x, using logistic regression, and a residual node is created beneath the node in the tree. We may represent the response as y = logit⁻¹(b1x + b0) + ε, where logit⁻¹(t) = 1∕(1 + exp(−t)) is the logistic (inverse logit) function. The model itself is the expression logit⁻¹(b1x + b0), and the error term, ε, is the difference, or residual, between the response and the model of the response. If further splits and/or regressions are done upon this node, this ε term effectively becomes the "dependent variable" for those splits and/or regressions.
For missing values of the predictor, the prediction becomes the mean response of the parent node.
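The model and residual described above can be sketched in Python. This is a minimal illustration, not the program's implementation: the names `logistic` and `node_residuals` are hypothetical, missing predictor values are assumed to be encoded as `None`, and the parent-node mean is assumed to be the mean of y over all observations in the node being regressed.

```python
import math
from typing import List, Optional


def logistic(t: float) -> float:
    """Inverse logit 1 / (1 + e^-t), written to avoid overflow in exp."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def node_residuals(x: List[Optional[float]], y: List[int],
                   b0: float, b1: float) -> List[float]:
    """Return epsilon = y - prediction for each observation in the node.

    Hypothetical helper: the prediction is logistic(b1*x + b0), except that
    a missing predictor value (None) falls back to the parent node's mean
    response, assumed here to be the mean of y over the node.
    """
    parent_mean = sum(y) / len(y)
    residuals = []
    for xi, yi in zip(x, y):
        if xi is None:
            prediction = parent_mean  # missing x: use the parent-node mean
        else:
            prediction = logistic(b1 * xi + b0)
        residuals.append(yi - prediction)
    return residuals
```

The returned list plays the role of ε, the "dependent variable" for any further splits or regressions beneath this node.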
Assuming we have n observations, we use a logit model to fit the binary response, y, using x and a vector of all ones as the covariate matrix, z. (The vector of all ones provides the intercept.) We maximize the log likelihood function for the logit model with the Newton method approach outlined in Econometric Analysis, by W.H. Greene, third ed., Prentice Hall, NJ, 1997, pp. 882-886. To obtain a p-value, we test the hypothesis that the slope coefficient in the logit model is zero, with the intercept left free. We calculate a likelihood ratio statistic, where l0 is the restricted (intercept-only) likelihood and l1 is the unrestricted likelihood. Under the null hypothesis, -2ln(l0∕l1) is asymptotically chi-squared with one degree of freedom, because the null hypothesis imposes a single restriction.

We simplify the notation by using capital L to mean "log likelihood", that is,

$$L_0 = \ln l_0$$

and

$$L_1 = \ln l_1,$$

where base e logarithms are used, so the test statistic is -2ln(l0∕l1) = 2(L1 - L0).

The restricted log likelihood is

$$L_0 = n\left[s \ln s + (1-s)\ln(1-s)\right],$$

where s is the proportion of the n dependent observations (y1...yn) that are equal to one; this is the log likelihood obtained when every observation is assigned the probability s. The unrestricted log likelihood is

$$L_1 = \sum_{i=1}^{n}\left[y_i \ln\left(\frac{1}{1+e^{-\beta^T z_i}}\right) + (1-y_i)\ln\left(1-\frac{1}{1+e^{-\beta^T z_i}}\right)\right],$$

where z_i = (x_i, 1)^T is the i-th row of the covariate matrix z.

Here, β is the vector

$$\beta = \begin{pmatrix} b_1 \\ b_0 \end{pmatrix},$$

evaluated at the value that maximizes L1, as found by the Newton iterations.