Results from Linear Regression (Optional Module)
NOTE: A linear regression attempt can sometimes fail. One cause is insufficient rank in the design matrix, which arises when there are too few observations or when some of the regressors are “collinear”, that is, linear combinations of one or more other regressors, and therefore unable to contribute new information to the regression.
When haplotype regression is run on a spreadsheet in which some rows are inactivated, a regression can also fail if some haplotypes are present only in the inactivated rows.
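The rank condition described in the note can be checked on a small example. The following is a minimal sketch, not the software's own check: the `matrix_rank` helper, its tolerance, and the toy design matrix are all illustrative assumptions.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix (list of row lists) by Gaussian elimination."""
    m = [[float(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Partial pivoting: pick the remaining row with the largest entry.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot, so this column adds no rank
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


# Hypothetical design matrix. Columns: intercept, x1, x2, x3,
# where x3 = x1 + x2 is a linear combination of other regressors.
design = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 2, 1, 3],
    [1, 1, 1, 2],
    [1, 0, 0, 0],
]

n_columns = len(design[0])
collinear_rank = matrix_rank(design)
is_rank_deficient = collinear_rank < n_columns

# Too few observations: two rows can never support four columns.
too_few_rank = matrix_rank(design[:2])
```

Here the rank is smaller than the number of columns in both cases, so the regression coefficients are not uniquely determined and a fit would be expected to fail.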
26.11.1 P-value Plot
If a moving HTR window was specified, a p-value plot is output. The p-value at each plot location comes from the regression performed on the window that begins at the indicated marker. It is the full-model-vs-reduced-model p-value if a reduced model is in use, and the regression p-value otherwise.
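The indexing of plot locations by each window's first marker can be sketched as follows. This is an illustration only: it assumes a fixed number of markers per window and a step of one marker, and the `moving_windows` helper and marker names are hypothetical.

```python
def moving_windows(markers, window_size):
    """Map each window's first marker to the list of markers in that window."""
    last_start = len(markers) - window_size
    return {
        markers[i]: markers[i:i + window_size]
        for i in range(last_start + 1)
    }


markers = ["rs1", "rs2", "rs3", "rs4", "rs5"]  # hypothetical marker names
windows = moving_windows(markers, window_size=3)

# The p-value plotted at "rs2" would come from the regression on windows["rs2"].
window_at_rs2 = windows["rs2"]
```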
26.11.2 Residual Spreadsheet
If a moving HTR window was NOT specified, a residual spreadsheet may be produced. This spreadsheet contains the actual, predicted, and residual values for each sample, as well as the estimated haplotype values for haplotypes and the spreadsheet values for non-genetic regressors. The residual value of a sample is defined as the difference between the sample’s actual value and its predicted value from the regression.
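As a rough illustration of the actual, predicted, and residual columns, the sketch below fits a single-regressor least-squares line and builds one row per sample. The data, the `fit_line` helper, the row layout, and the sign convention (actual minus predicted) are assumptions made for this example.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


x = [0.0, 1.0, 2.0, 3.0]        # hypothetical regressor values
actual = [1.0, 3.0, 2.0, 5.0]   # hypothetical response values
intercept, slope = fit_line(x, actual)

rows = []
for xi, yi in zip(x, actual):
    predicted = intercept + slope * xi
    rows.append({
        "regressor": xi,
        "actual": yi,
        "predicted": predicted,
        "residual": yi - predicted,  # actual minus predicted
    })
```

Because the model includes an intercept, the residuals in `rows` sum to zero up to rounding.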
26.11.3 Linear Regression Statistical Output Viewer
As detailed in 24.2.6, a statistical output viewer will be displayed for a single regression, either when run directly or when invoked from a point in the p-value plot.
26.11.4 Overall Statistics
If a full vs. reduced model is being used, the following overall statistics are displayed for both normal and stepwise regression:
- The name of the response variable.
- The $R^2$ statistic (coefficient of determination) of the full model. For stepwise regression, “full” means using only the regressors chosen by the stepwise procedure (plus the reduced-model covariates).
The coefficient of determination $R^2$ is computed as
$$R^2 = 1 - \frac{\mathrm{sse}}{\mathrm{sst}}$$
where sse is the sum of squared errors (the sum of squares of the predicted minus the actual values) and sst is the total sum of squares (the sum of squares of the dependent variable's average minus the actual values). This statistic is sometimes thought of as the proportion of the variation of the dependent variable “explained” by the independent variables.
- The $R^2$ statistic of the reduced model, computed in a similar fashion.
- The adjusted $R^2$ statistic of the full model. Adjusted $R^2$ is meant to compensate for many regressors each “explaining” small portions of the variation by chance.
This statistic is computed as
$$R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{N - 1}{N - k - 1}$$
where N is the sample size and k is the number of regressors. (In some cases, adjusted $R^2$ may be negative.)
- The F-statistic for the full model.
- The F-statistic for the full vs. the reduced model.
- The p-value from the full model.
- The p-value from the full vs. the reduced model.
- The permuted p-value, if permutation testing has been selected.
- The number of permutations, if permutation testing has been selected.
- Regression degrees of freedom of the full model.
- Regression degrees of freedom of the reduced model.
- Residual degrees of freedom of the full model.
- Total degrees of freedom of the full model.
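The definitions of sse, sst, N, and k in the list above can be turned into a short calculation. This sketch assumes the usual forms, R² = 1 − sse/sst and adjusted R² = 1 − (1 − R²)(N − 1)/(N − k − 1). The function names and the data are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination from actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    sst = sum((mean_actual - a) ** 2 for a in actual)
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for sample size n and k regressors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


actual = [1.0, 3.0, 2.0, 5.0]      # hypothetical response values
predicted = [1.1, 2.2, 3.3, 4.4]   # hypothetical fitted values

r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n=len(actual), k=1)

# A weak fit with many regressors relative to the sample size
# gives a negative adjusted R^2.
weak_adj_r2 = adjusted_r_squared(0.1, n=5, k=3)
```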
If only a full model without a reduced model is being used, the following overall statistics are displayed for both normal and stepwise regression:
- The name of the response variable.
- The multiple correlation coefficient R (the square root of $R^2$).
- The coefficient of determination $R^2$.
- The adjusted $R^2$ statistic.
- The sample size.
- The standard error of the estimate. This is computed as
$$s_e = \sqrt{\frac{\mathrm{sse}}{n - \mathrm{reg\_df} - 1}}$$
where $s_e$ is the standard error (of the estimate), sse is the sum of squared errors (the sum of squares of the predicted minus the actual values), n is the sample size, and reg_df is the number of regressors in the full model.
- The standard deviation of the response.
- The F-statistic.
- The p-value from the regression.
- The permuted p-value, if permutation testing has been selected.
- The number of permutations, if permutation testing has been selected.
- The regression degrees of freedom.
- The residual degrees of freedom.
- The total degrees of freedom.
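Several of the statistics in the list above follow directly from sse, sst, the sample size, and the number of regressors. The sketch below uses standard textbook formulas, which are assumed rather than confirmed to match the software's computation. In particular, the standard deviation of the response is taken as the sample standard deviation and the F-statistic as the usual overall regression F. The function name and input values are hypothetical.

```python
import math


def overall_statistics(sse, sst, n, reg_df):
    """Overall statistics for a full-model-only regression with an intercept."""
    resid_df = n - reg_df - 1
    total_df = n - 1
    r2 = 1.0 - sse / sst
    return {
        "R": math.sqrt(r2),
        "R2": r2,
        "adjusted_R2": 1.0 - (1.0 - r2) * total_df / resid_df,
        "std_error_of_estimate": math.sqrt(sse / resid_df),
        "std_dev_of_response": math.sqrt(sst / total_df),  # assumed sample SD
        "F": ((sst - sse) / reg_df) / (sse / resid_df),
        "regression_df": reg_df,
        "residual_df": resid_df,
        "total_df": total_df,
    }


stats = overall_statistics(sse=2.7, sst=8.75, n=4, reg_df=1)
```

Note that the regression and residual degrees of freedom add up to the total degrees of freedom, n − 1.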
26.11.5 Regressor Statistics
For all types of linear regressions, the y-intercept for the full model is displayed. For full-vs-reduced-model linear regressions, the y-intercept for the reduced model is also displayed.
Then, the following statistics are displayed for each regressor:
- The regressor, which might be either a haplotype or a covariate.
- The regression coefficient for this regressor.
- The standard error for this regressor. To compute this, a full-model-only regression is run in which this regressor is the substitute dependent variable and all the other regressors form the full model. If $ss_r$ is the sum of squares of this regressor’s actual values minus this regressor’s average, $R_r^2$ is the $R^2$ value obtained from this regression-against-the-regressor, and the standard error of the estimate is $s_e$, then the standard error of the regressor $s_r$ will be
$$s_r = \frac{s_e}{\sqrt{ss_r\,(1 - R_r^2)}}$$
- The value of the t-statistic for this regressor. This is computed as
$$t = \frac{\beta}{s_r}$$
where β is the regressor’s regression coefficient and $s_r$ is the regressor’s standard error.
- Pr(> |t|). This is the p-value from a regression that uses the actual full model as its full model and the actual full model without this regressor as its reduced model. It therefore shows how much difference this particular regressor makes in the regression. Pr(> |t|) is the probability, if this regressor truly made no difference, of obtaining a t-statistic at least this large in absolute value purely by chance, that is, of landing this far out in one of the “tails” of the t-distribution.
- Univariate Fit. This is the p-value obtained by regressing the dependent variable on this regressor alone. Even if the main regression is full-model vs. reduced-model, this regressor is the only regression variable involved in finding this p-value.
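The regressor standard error and t-statistic described in the list above can be illustrated with two regressors, where the auxiliary regression of one regressor on the other is a simple regression. This is a sketch under stated assumptions, not the software's implementation. It assumes the standard error has the form s_e / sqrt(ss_r (1 − R_r²)), and the data and helper name are hypothetical.

```python
import math


def centered_cross(a, b):
    """Sum over samples of (a - mean(a)) * (b - mean(b))."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


# Hypothetical data: two regressors and a response for six samples.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 7.2, 6.8, 11.1, 10.9]
n = len(y)

s11, s22 = centered_cross(x1, x1), centered_cross(x2, x2)
s12 = centered_cross(x1, x2)
s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
syy = centered_cross(y, y)

# Full model y ~ x1 + x2 with intercept, from the centered normal equations.
det = s11 * s22 - s12 ** 2
beta1 = (s22 * s1y - s12 * s2y) / det
beta2 = (s11 * s2y - s12 * s1y) / det
sse = syy - beta1 * s1y - beta2 * s2y
s_e = math.sqrt(sse / (n - 2 - 1))  # two regressors in the full model

# Auxiliary regression: x1 as the dependent variable, x2 as the regressor.
ss_r = s11                       # sum of squares of x1 about its average
r_r2 = s12 ** 2 / (s11 * s22)    # R^2 of x1 regressed on x2
s_r = s_e / math.sqrt(ss_r * (1.0 - r_r2))
t_stat = beta1 / s_r

# Cross-check: the same standard error from the inverse of the
# centered cross-product matrix.
s_r_direct = s_e * math.sqrt(s22 / det)
```

The cross-check shows that the auxiliary-regression route agrees with the more familiar covariance-matrix form. The closer `r_r2` is to 1, the larger the standard error becomes, which is how collinearity inflates the uncertainty of a coefficient.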
26.11.6 Left-Out Regressors
Any potential regressors which have been left out are listed here.
All non-stepwise regressions which include haplotypes leave out one final haplotype-based regressor. This may be a haplotype of a “normal” frequency (if there is no “rare” haplotype), a “rare” haplotype, or one final haplotype category consisting of the frequencies of the “rare haplotypes” aggregated together. Leaving out this final regressor avoids the multicollinearity problem that would otherwise occur between the haplotypic regressors.
For a stepwise regression, this list will include all regressors that were excluded from the final model of the regression.
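The multicollinearity in the non-stepwise case can be seen in a toy example. The sketch assumes that each sample's haplotype values are expected copy counts summing to 2 (diploid samples). The haplotype names, the values, and the rare-haplotype cutoff are hypothetical, and the software may represent these regressors differently.

```python
RARE_THRESHOLD = 0.15         # hypothetical frequency cutoff
CHROMOSOMES_PER_SAMPLE = 2    # assumes diploid samples

# Hypothetical estimated haplotype values, one list entry per sample.
dosages = {
    "ACG": [1.0, 2.0, 0.0, 1.0, 1.0],
    "ATG": [1.0, 0.0, 1.5, 0.0, 1.0],
    "GCG": [0.0, 0.0, 0.5, 0.0, 0.0],
    "GTA": [0.0, 0.0, 0.0, 1.0, 0.0],
}


def frequency(column):
    """Overall haplotype frequency across all samples."""
    return sum(column) / (CHROMOSOMES_PER_SAMPLE * len(column))


common = {h: c for h, c in dosages.items() if frequency(c) >= RARE_THRESHOLD}
rare = [c for c in dosages.values() if frequency(c) < RARE_THRESHOLD]

# Aggregate the rare haplotypes into one final category.
rare_aggregate = [sum(values) for values in zip(*rare)]

# The final category is fully determined by the intercept and the other
# haplotype regressors, so including it would add no new information.
reconstructed = [
    CHROMOSOMES_PER_SAMPLE - sum(values) for values in zip(*common.values())
]
```

Since `reconstructed` equals `rare_aggregate` for every sample, the aggregated category is an exact linear combination of the intercept and the remaining haplotype regressors, which is why one final haplotype-based regressor is left out.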
26.11.7 Table of Haplotypes
If haplotypes were involved, a table of the haplotypes (used in the regression or not) and their frequencies is shown.
Additionally, if a full vs. reduced model is used for the main regression, the “Individual vs. Reduced Model P-Value” is shown for each haplotype. The full model for this p-value is formed by taking the reduced model and adding to it the haplotype being listed. The reduced model of the main regression serves as the reduced model here as well. (This contrasts with the method for finding the “univariate fit”, which uses no reduced model.)
26.11.8 Parameters
The parameters used for the regression are shown.
NOTE: If this display was opened by clicking a single point in a p-value plot made from a moving window and then clicking the View Regression Results button, the markers used for this point’s regression are shown just before this (Parameters) section, after the table of haplotypes. Otherwise, they are shown near the bottom of this section.