Logistic Regression
15.4 Logistic Regression
Genotype or Numeric Association Test - Logistic Regression
The (univariate binary) response, y, is fit to the given predictor variable, x, using logistic regression. The results, which are output in a new spreadsheet along with other association test results, include the regression p-value, the parameters b0 and b1, any multiple test correction results, and any expected p-values based on the rank of the observed p-value and the number of predictors. The response is represented with the formula y = logit(b1x + b0) + 𝜖, with the model itself being the expression logit(b1x + b0) and the error term, 𝜖, expressing the difference, or residual, between the model of the response and the response itself. Here logit(t) stands for the logistic function 1∕(1 + e^(−t)), the form that appears in the log likelihood for this model.
Assuming there are n observations, a logit model is used to fit the binary response, y, using x and a vector of 1’s as a covariate matrix (z). (The vector of 1’s facilitates obtaining an intercept.) Newton’s method is used to maximize the log likelihood function and so estimate the logit model [Green 1997]. The null hypothesis being tested is that the slope coefficient in the logit model is zero, so the fitted model is compared with the model that has an intercept only. A likelihood ratio statistic is calculated, where L0 is the likelihood of the intercept-only model and L1 is the likelihood of the fitted model, and −2ln(L0∕L1) is asymptotically distributed as Chi-Squared with 1 degree of freedom, one for the single slope coefficient being tested.
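As a minimal sketch of this fit (an illustration, not the SVS implementation), the following pure-Python code applies Newton's method to the log likelihood for one predictor plus an intercept, then forms the likelihood ratio statistic against the intercept-only model. The function names `fit_logit` and `lr_test_univariate`, the zero starting values, and the convergence tolerance are assumptions made for this example. The sketch also assumes that the comparison has one degree of freedom (only the slope is constrained), that the response contains both 0's and 1's, and that the data are not perfectly separable.

```python
import math

def sigmoid(t):
    """Logistic function 1 / (1 + e^-t)."""
    return 1.0 / (1.0 + math.exp(-t))

def log_likelihood(b1, b0, x, y):
    """Log likelihood of the model p_i = sigmoid(b1 * x_i + b0)."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b1 * xi + b0)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def fit_logit(x, y, tol=1e-10, max_iter=50):
    """Newton's method for (b1, b0); tol and max_iter are illustrative."""
    b1, b0 = 0.0, 0.0
    for _ in range(max_iter):
        g1 = g0 = h11 = h10 = h00 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b1 * xi + b0)
            w = p * (1 - p)
            g1 += (yi - p) * xi   # gradient with respect to the slope
            g0 += (yi - p)        # gradient with respect to the intercept
            h11 += w * xi * xi    # information matrix entries
            h10 += w * xi
            h00 += w
        # Newton step: solve (information matrix) * step = gradient (2x2 case).
        det = h11 * h00 - h10 * h10
        d1 = (h00 * g1 - h10 * g0) / det
        d0 = (h11 * g0 - h10 * g1) / det
        b1, b0 = b1 + d1, b0 + d0
        if abs(d1) < tol and abs(d0) < tol:
            break
    return b1, b0

def lr_test_univariate(x, y):
    """Return (b1, b0, statistic, p-value) for the slope test."""
    n = len(y)
    s = sum(y) / n  # proportion of responses equal to one
    l0 = n * (s * math.log(s) + (1 - s) * math.log(1 - s))  # intercept only
    b1, b0 = fit_logit(x, y)
    l1 = log_likelihood(b1, b0, x, y)
    stat = -2.0 * (l0 - l1)
    # Chi-squared tail probability with 1 degree of freedom (assumed here).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return b1, b0, stat, p_value

# Made-up data: 0, 1, or 2 copies of an allele and a case/control status.
x = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
y = [0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1]
b1, b0, stat, p_value = lr_test_univariate(x, y)
```

In this made-up data set the observed proportions of cases are 1/4, 1/2, and 3/4, which lie exactly on a logistic curve, so the fitted slope equals ln 3.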
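We simplify the notation by using a lower-case l to mean “log likelihood”, that is, l0 = ln(L0) and l1 = ln(L1).
Using this notation, the log likelihood of the intercept-only model is
$$l_0 = n\left[s\log(s) + (1 - s)\log(1 - s)\right],$$
where s is the proportion of the n observations for which y = 1, and the log likelihood of the fitted model is
$$l_1 = \sum_{i=1}^{n}\left[y_i\log\left(\frac{1}{1 + e^{-\hat{\beta}^T z_i}}\right) + (1 - y_i)\log\left(1 - \frac{1}{1 + e^{-\hat{\beta}^T z_i}}\right)\right],$$
where $\hat{\beta}$ is the vector (b1,b0) and $z_i$ is the ith row of z.
Multiple Logistic Regression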
Full Model Only Regression Equation
The multiple logistic regression uses a logit model to fit the binary response y, using the covariate matrix X, consisting of the values of the continuous predictors and indicator (dummy) variables for the categorical predictors, along with a column of 1’s for the intercept. Newton’s method is used to maximize the log likelihood function and so estimate the logit model [Green 1997]. The null hypothesis being tested is that all of the slope coefficients in the logit model are zero, so the fitted model is compared with the model that has an intercept only. A likelihood ratio statistic is calculated, where L0 is the likelihood of the intercept-only model and L1 is the likelihood of the fitted model, and −2ln(L0∕L1) is asymptotically distributed as Chi-Squared with k degrees of freedom, where k is the number of slope coefficients being tested.
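We simplify the notation by using a lower-case l to mean “log likelihood”, that is, l0 = ln(L0) and l1 = ln(L1).
Using this notation, the log likelihood of the intercept-only model is
$$l_0 = n\left[s\log(s) + (1 - s)\log(1 - s)\right],$$
where s is the proportion of the n observations for which y = 1, and the log likelihood of the fitted model is
$$l_1 = \sum_{i=1}^{n}\left[y_i\log\left(\frac{1}{1 + e^{-\hat{\beta}^T z_i}}\right) + (1 - y_i)\log\left(1 - \frac{1}{1 + e^{-\hat{\beta}^T z_i}}\right)\right],$$
where $\hat{\beta}$ is the vector (b0,b1,…,bk) of the intercept and slope coefficients and $z_i$ is the ith row of X.
Full Versus Reduced Model Regression Equation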
For the full versus reduced logistic regression model, logistic regression equations are obtained for both the full model and
for the reduced model. The reduced logistic regression model includes only the dependent and any covariates selected for
the reduced model. The full logistic regression model includes all of the variables including any full model
covariates.
A likelihood ratio statistic is calculated to find the significance of including the full model regressors versus not including them. The maximized likelihood of the reduced model is represented by L0, and L1 is the maximized likelihood of the full model. Their logarithms are computed as follows:
$$l_0 = \log L_0 = \sum_{i=1}^{n}\left[y_i\log\left(\frac{1}{1 + e^{-\hat{\beta}_R^T z_i}}\right) + (1 - y_i)\log\left(1 - \frac{1}{1 + e^{-\hat{\beta}_R^T z_i}}\right)\right],$$
$$l_1 = \log L_1 = \sum_{i=1}^{n}\left[y_i\log\left(\frac{1}{1 + e^{-\hat{\beta}_F^T z_i}}\right) + (1 - y_i)\log\left(1 - \frac{1}{1 + e^{-\hat{\beta}_F^T z_i}}\right)\right],$$
and p-value = P(X > −2(l0 − l1)), where X ∼ χ2(m − k), m is the degrees of freedom of the full model, and k is the degrees of freedom of the reduced model. Here, $\hat{\beta}_R$ is the reduced model vector of slope coefficients, $\hat{\beta}_F$ is the full model vector of slope coefficients, and $z_i$ is the ith row of the covariate matrix of the corresponding model.
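The p-value formula above can be sketched in a few lines of Python. This is an illustration only, not the SVS code: the helper names `chi2_sf` and `lr_p_value` are hypothetical, and the Chi-Squared tail probability is computed with a standard series for the regularized lower incomplete gamma function, which is adequate for moderate statistics but loses accuracy for extremely small p-values.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for X ~ Chi-Squared(df).

    Uses the series gamma(a, t) = t^a e^-t * sum_n t^n / (a (a+1) ... (a+n))
    with a = df / 2 and t = x / 2, divided by Gamma(a).
    """
    if x <= 0:
        return 1.0
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total) and n < 10000:
        n += 1
        term *= t / (a + n)
        total += term
    lower = total * math.exp(-t + a * math.log(t) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def lr_p_value(l0, l1, m, k):
    """Return (statistic, p-value) with p-value = P(X > -2(l0 - l1)).

    l0, l1: maximized log likelihoods of the reduced and full models.
    m, k:   degrees of freedom of the full and reduced models.
    """
    stat = -2.0 * (l0 - l1)
    return stat, chi2_sf(stat, m - k)

# Made-up log likelihoods for a full model with three more
# degrees of freedom than the reduced model.
stat, p_value = lr_p_value(l0=-120.4, l1=-114.9, m=5, k=2)
```

Taking the two log likelihoods as inputs keeps the test independent of how the models were fitted; any routine that maximizes the log likelihoods given above can supply them.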
Regressor Statistics
The coefficient of each regressor, along with the y-intercept, is calculated as a part of the Newton’s method approach of maximizing the log likelihood function for the full model.
The standard error of the jth regressor is found by inverting the information matrix of the regression, which is formed using the intercept as the last coefficient. The square root of the jth diagonal element of the inverted matrix is the standard error of the jth regressor, and the square root of the last diagonal element of the inverted matrix is the standard error of the intercept. (See [Hosmer and Lemeshow 2000].)
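The standard error calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the SVS implementation: it takes the information matrix to be the sum over observations of p(1 − p) z zᵀ, the usual form for logistic regression, and the helper names `invert` and `standard_errors` are hypothetical. As in the text, the intercept is the last coefficient, so each covariate row ends with the constant 1.

```python
import math

def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def standard_errors(rows, beta):
    """Standard errors of the coefficients in beta (intercept last).

    rows: covariate rows, each ending with the constant 1.
    beta: fitted coefficients in the same order as the row entries.
    """
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for z in rows:
        eta = sum(coef * value for coef, value in zip(beta, z))
        p = 1.0 / (1.0 + math.exp(-eta))
        w = p * (1.0 - p)
        for i in range(k):
            for j in range(k):
                info[i][j] += w * z[i] * z[j]
    cov = invert(info)
    # The square root of the jth diagonal element is the jth standard error.
    return [math.sqrt(cov[j][j]) for j in range(k)]

# Made-up example: one regressor taking the values 0, 1, 2 (four
# observations each) with fitted slope ln 3 and intercept -ln 3.
rows = [[float(v), 1.0] for v in (0, 1, 2) for _ in range(4)]
se_slope, se_intercept = standard_errors(rows, [math.log(3), -math.log(3)])
```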
The p-value Pr(Chi) associated with dropping the jth regressor from the regression is found by running a separate logistic regression using all the regressors as the full model and a model with all the regressors except the jth regressor as the reduced model. (The “Chi” refers to the likelihood ratio test that is performed between these two models to find the p-value.)
The regression odds ratio for a coefficient β is simply e^β. It is the factor by which the odds of the dependent being one are multiplied when the given regressor increases by one unit. An example would be the ratio of the odds of being a case rather than a control for a smoker to the odds of being a case rather than a control for a non-smoker.
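A short numerical check makes this interpretation concrete. The coefficient values below are made up for illustration, and the helper names are hypothetical; the sketch only verifies that, under a logistic model, increasing a regressor by one unit multiplies the odds by e raised to its coefficient.

```python
import math

def probability(b1, b0, x):
    """Modeled probability that the dependent is one."""
    return 1.0 / (1.0 + math.exp(-(b1 * x + b0)))

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

# Made-up coefficients: slope for the regressor and intercept.
b1, b0 = 0.7, -1.2

odds_ratio = math.exp(b1)

# Odds at x = 3 divided by odds at x = 2: a one-unit increase.
ratio_from_model = odds(probability(b1, b0, 3.0)) / odds(probability(b1, b0, 2.0))
```

The two quantities agree up to rounding, and the same holds for any pair of values one unit apart, because the intercept and the starting value cancel in the ratio.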
The p-value for the univariate fit of the jth regressor is obtained from a separate logistic regression which is calculated as if the jth regressor were the only regressor in the model against the dependent variable.
Categorical Covariates and Interaction Terms
If a covariate is categorical, dummy variables are used to indicate the category of the covariate. A value of “1” for
the observation indicates that it is equal to the category the dummy variable represents. Similarly, if the
observation is not equal to the category for the dummy variable, then it is assigned the value of “0”. As the
values of one dummy variable can be determined by examining all other dummy variables for a covariate, in
most cases the last dummy variable is dropped. This avoids using a rank-deficient matrix in the regression
equation.
A first-order interaction term is a new covariate created from the product of two covariates as specified in either the full- or reduced-model covariates. If one of the two covariates is categorical, the dummy variable for each of its categories is multiplied by the other covariate to create the first-order interaction terms. If both covariates are categorical, each dummy variable of one covariate is multiplied by each dummy variable of the other.
For example, consider the following covariates for six samples.
| Sample | Lab | Dose | Age |
| --- | --- | --- | --- |
| sample01 | A | Low | 35 |
| sample02 | A | Med | 31 |
| sample03 | A | High | 37 |
| sample04 | B | Low | 32 |
| sample05 | B | Med | 36 |
| sample06 | B | High | 33 |
Using dummy variables for the categorical covariates the above table would be:
| Sample | Lab=A | Lab=B | Dose=Low | Dose=Med | Dose=High | Age |
| --- | --- | --- | --- | --- | --- | --- |
| sample01 | 1 | 0 | 1 | 0 | 0 | 35 |
| sample02 | 1 | 0 | 0 | 1 | 0 | 31 |
| sample03 | 1 | 0 | 0 | 0 | 1 | 37 |
| sample04 | 0 | 1 | 1 | 0 | 0 | 32 |
| sample05 | 0 | 1 | 0 | 1 | 0 | 36 |
| sample06 | 0 | 1 | 0 | 0 | 1 | 33 |
Interactions Lab*Dose and Lab*Age would be specified as:
| Sample | A*Low | A*Med | A*High | B*Low | B*Med | B*High | A*Age | B*Age |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sample01 | 1 | 0 | 0 | 0 | 0 | 0 | 35 | 0 |
| sample02 | 0 | 1 | 0 | 0 | 0 | 0 | 31 | 0 |
| sample03 | 0 | 0 | 1 | 0 | 0 | 0 | 37 | 0 |
| sample04 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 32 |
| sample05 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 36 |
| sample06 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 33 |
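The tables above can be reproduced with a short Python sketch. The variable and function names are chosen for this example only. Like the tables, the sketch keeps every dummy column; dropping the last dummy variable of each covariate, as described earlier, is not shown.

```python
samples = [
    ("sample01", "A", "Low", 35),
    ("sample02", "A", "Med", 31),
    ("sample03", "A", "High", 37),
    ("sample04", "B", "Low", 32),
    ("sample05", "B", "Med", 36),
    ("sample06", "B", "High", 33),
]

def dummy_columns(values, categories):
    """One 0/1 column per category, in the order given."""
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

labs = [lab for _, lab, _, _ in samples]
doses = [dose for _, _, dose, _ in samples]
ages = [age for _, _, _, age in samples]

lab_dummies = dummy_columns(labs, ["A", "B"])
dose_dummies = dummy_columns(doses, ["Low", "Med", "High"])

# Lab*Dose: every Lab dummy multiplied by every Dose dummy.
lab_by_dose = {
    f"{lab}*{dose}": [a * b for a, b in zip(lab_col, dose_col)]
    for lab, lab_col in lab_dummies.items()
    for dose, dose_col in dose_dummies.items()
}

# Lab*Age: every Lab dummy multiplied by the numeric Age covariate.
lab_by_age = {
    f"{lab}*Age": [a * age for a, age in zip(lab_col, ages)]
    for lab, lab_col in lab_dummies.items()
}
```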
Stepwise Regression
If only a few variables (regressors or covariates) drive the outcome of the response, Stepwise Regression can isolate these variables. The methods for the two types of stepwise regression, forward selection and backward elimination, are described below.
Forward Selection
Starting with either the null model or the reduced model (depending on which type of regression was
specified), successive models are created, each one using one more regressor (or covariate) than the previous
model.
Each of the unused regressors is added in turn to the current model to create a “trial” model for that regressor. The p-value of the trial model (as the full model) versus the current model (as the reduced model) is calculated, and the trial model with the smallest p-value is used as the next model. In this way the method adds the next most significant variable to the current model. If the current model had the smallest p-value, or if no trial p-value is smaller than the specified p-value cut-off, the forward selection method stops and declares the current model to be the final model as determined by stepwise forward selection.
If the model with all regressors has the smallest p-value, then this full model is the final model.
From the standpoint of further analysis, the final model becomes the “full model” for this set of potential
regressors.
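The forward selection loop can be sketched in Python as follows. This is a simplified illustration, not the SVS implementation: `p_value` is a hypothetical callback that returns the likelihood ratio p-value of a trial (full) model versus the current (reduced) model, and only the cut-off stopping rule is implemented. Treating a p-value exactly equal to the cut-off as not significant is an assumption of the sketch.

```python
def forward_selection(candidates, p_value, cutoff, start=()):
    """Return the list of regressors in the final model.

    candidates: names of the potential regressors.
    p_value:    hypothetical callback p_value(reduced, full) giving the
                p-value of the model `full` versus the model `reduced`.
    cutoff:     p-value cut-off for adding a regressor.
    start:      regressors of the starting (null or reduced) model.
    """
    current = list(start)
    unused = [c for c in candidates if c not in current]
    while unused:
        # One trial model per unused regressor.
        trials = [(p_value(current, current + [c]), c) for c in unused]
        best_p, best = min(trials)
        if best_p >= cutoff:
            break  # no trial model is better than the cut-off
        current.append(best)
        unused.remove(best)
    return current

# Stand-in p-values keyed by the regressor just added (made-up numbers);
# a real callback would fit both models and run the likelihood ratio test.
made_up = {"age": 0.001, "dose": 0.2, "lab": 0.03}
final_model = forward_selection(
    ["age", "dose", "lab"],
    lambda reduced, full: made_up[full[-1]],
    cutoff=0.05,
)
```

With these stand-in p-values the loop adds `age`, then `lab`, and stops because the only remaining trial p-value (0.2 for `dose`) is not below the cut-off.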
Backward Elimination
Starting with the full model, successive models are created, each one using one fewer regressor (or covariate) than the previous model.
Each of the regressors currently in the model is removed in turn to create a “trial” model excluding that regressor. The p-value of the current model (as the full model) versus the trial model (as the reduced model) is calculated, and the trial model with the largest p-value is used as the next model. In this way the method removes the least significant variable from the current model. If every p-value is smaller than the specified p-value cut-off, the backward elimination method stops. The method also stops if all variables have been removed from the model, or if all variables left are included in the original reduced model.
From the standpoint of further analysis, the final model becomes the “full model” for this set of potential
regressors.
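Backward elimination can be sketched in the same style. Again this is an illustration rather than the SVS implementation: `p_value` is a hypothetical callback returning the likelihood ratio p-value of the current (full) model versus a trial (reduced) model. The sketch removes the regressor whose removal test has the largest p-value, which is what removing the least significant variable implies; that reading, and the handling of a p-value exactly equal to the cut-off, are assumptions.

```python
def backward_elimination(full, p_value, cutoff, keep=()):
    """Return the list of regressors in the final model.

    full:    names of the regressors in the starting full model.
    p_value: hypothetical callback p_value(reduced, full) giving the
             p-value of the model `full` versus the model `reduced`.
    cutoff:  p-value cut-off for keeping a regressor.
    keep:    regressors of the original reduced model, never removed.
    """
    current = list(full)
    while True:
        removable = [r for r in current if r not in keep]
        if not removable:
            break  # nothing left, or only reduced-model variables remain
        # One trial model per removable regressor.
        trials = [
            (p_value([c for c in current if c != r], current), r)
            for r in removable
        ]
        worst_p, worst = max(trials)
        if worst_p < cutoff:
            break  # every p-value is smaller than the cut-off
        current.remove(worst)
    return current

# Stand-in p-values keyed by the regressor just removed (made-up numbers).
made_up = {"age": 0.001, "dose": 0.2, "lab": 0.03, "batch": 0.6}

def stub(reduced, full):
    removed = next(r for r in full if r not in reduced)
    return made_up[removed]

final_model = backward_elimination(["age", "dose", "lab", "batch"], stub, cutoff=0.05)
```

With these stand-in p-values the loop removes `batch`, then `dose`, and stops because both remaining regressors have p-values below the cut-off.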