Linear Regression From a Tree Node
The (univariate) response, y, is fit to the given predictor variable, x, using linear regression, and a residual node is created beneath the node in the tree. We may represent the response as y = b1·x + b0 + ε. The model itself is the line b1·x + b0, and the error term, ε, is the difference, or residual, between the response and the model's prediction of it. If further splits or regressions are performed on this node, this ε term effectively becomes the “dependent variable” for them.
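The fit described above can be sketched in Python. This sketch computes the slope b1 and intercept b0 for the rows in a node and returns the residuals ε that a child split or regression would treat as its response. It assumes ordinary least squares as the fitting method, since the text says only "linear regression". The function name `fit_node_regression` is hypothetical, and the inputs are assumed to be free of missing values.

```python
from typing import List, Tuple


def fit_node_regression(x: List[float], y: List[float]) -> Tuple[float, float, List[float]]:
    """Fit y = b1*x + b0 by ordinary least squares; return (b1, b0, residuals).

    Hypothetical helper: assumes equal-length inputs, no missing values,
    and a non-constant x.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of x from its mean
    ss = sum((xi - x_bar) ** 2 for xi in x)
    if ss == 0.0:
        raise ValueError("x is constant; slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ss
    b0 = y_bar - b1 * x_bar
    # Residuals eps_i = y_i - (b1*x_i + b0): the response for any further
    # splits or regressions beneath this node.
    residuals = [yi - (b1 * xi + b0) for xi, yi in zip(x, y)]
    return b1, b0, residuals


b1, b0, eps = fit_node_regression([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(b1, b0)  # → 2.0 1.0
```

Because the sample data lie exactly on y = 2x + 1, every residual in `eps` is zero. With noisy data, the residuals are what a child node would go on to model.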
When the value of the predictor is missing, the prediction is the mean response of the parent node.
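The missing-value rule can be sketched as a prediction function. This version assumes that missing predictor values are represented as `None` or NaN and that `parent_mean` is the mean response over the rows of the node being regressed. Both the representation and the name `predict_node` are assumptions made for illustration.

```python
import math
from typing import List, Optional


def predict_node(x: List[Optional[float]], b1: float, b0: float,
                 parent_mean: float) -> List[float]:
    """Predict each row from the node's fitted line b1*x + b0.

    Hypothetical helper: rows whose predictor is None or NaN fall back
    to parent_mean instead of the regression line.
    """
    preds = []
    for xi in x:
        if xi is None or math.isnan(xi):
            preds.append(parent_mean)   # missing predictor: use the parent node mean
        else:
            preds.append(b1 * xi + b0)  # otherwise evaluate the fitted line
    return preds


print(predict_node([1.0, None, float("nan"), 3.0], b1=2.0, b0=1.0, parent_mean=5.5))
# → [3.0, 5.5, 5.5, 7.0]
```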
The p-value is calculated using a likelihood ratio test. We calculate the sum of squared residuals as follows:
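$$\mathrm{SSR} = \sum_{i=1}^{n} \left(y_i - b_1 x_i - b_0\right)^2.$$

Let the mean of the $x_i$'s be $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, and let the sum of squared deviations from the mean be given by

$$\mathrm{SS} = \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2.$$

Then the test statistic for the slope is

$$t = \frac{b_1}{\sqrt{\mathrm{SSR} \,/\, \left((n-2)\,\mathrm{SS}\right)}},$$

and the p-value is given by the two-sided Student’s t distribution with n − 2 degrees of freedom:

$$p = 2\left(1 - F_{n-2}\left(|t|\right)\right),$$

where $F_{n-2}$ is the cumulative distribution function of Student’s t distribution with n − 2 degrees of freedom.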
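The calculation above can be worked through with a rough Python sketch that uses only the standard library. The names `slope_test`, `t_two_sided_p` and `reg_incomplete_beta` are hypothetical.

The standard library has no Student's t CDF. The sketch therefore evaluates the two-sided tail through the identity P(|T| ≥ |t|) = I_x(ν/2, 1/2), with x = ν/(ν + t²). The regularized incomplete beta function I_x is computed by a continued fraction, which is a swapped-in numerical method and not necessarily what the tool itself uses.

The sketch also computes the likelihood-ratio statistic. Under the usual Gaussian-error assumption, −2 ln Λ = n·ln(SST/SSR), where SST is the total sum of squares of y. This equals n·ln(1 + t²/(n − 2)), so it is a monotone function of |t|.

```python
import math
from typing import List, Tuple


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 301):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even term
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd term
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 3e-14:
            break
    return h


def reg_incomplete_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) on the side where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value P(|T| >= |t|) for Student's t with df degrees of freedom."""
    return reg_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


def slope_test(x: List[float], y: List[float]) -> Tuple[float, float, float, float, float, float]:
    """Return (b1, b0, SSR, SS, t, p) for the least-squares line y = b1*x + b0."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss = sum((xi - x_bar) ** 2 for xi in x)                       # SS
    if ss == 0.0:
        raise ValueError("x is constant; slope is undefined")
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b1 * xi - b0) ** 2 for xi, yi in zip(x, y))  # SSR
    if ssr == 0.0:
        raise ValueError("perfect fit; t statistic is unbounded")
    t = b1 / math.sqrt(ssr / ((n - 2) * ss))
    p = t_two_sided_p(t, n - 2)
    return b1, b0, ssr, ss, t, p


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]
b1, b0, ssr, ss, t, p = slope_test(xs, ys)
# Likelihood-ratio statistic under Gaussian errors: n * ln(SST / SSR)
sst = sum((yi - sum(ys) / len(ys)) ** 2 for yi in ys)
lr = len(xs) * math.log(sst / ssr)
print(f"b1={b1:.4f} t={t:.3f} p={p:.2e} LR={lr:.3f}")
```

In practice, a library routine such as SciPy's Student's t survival function would replace the hand-written incomplete beta. It is written out here only so that the sketch runs without third-party packages.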