Test Statistics
HelixTree can perform or output the following statistical tests where appropriate:
- Correlation/Trend Test
- Armitage Trend Test
- Exact Form of Armitage Test
- (Pearson) Chi-Squared Test
- Fisher’s Exact Test
- Odds Ratio with Confidence Limits
- Analysis of Deviance
- F-Test
These are described in the following subsections.
18.3.1 Correlation/Trend Test
This test is available for both case/control and quantitative dependent variables under every genetic model except the genotypic model. It is not available when missing values are used as predictors.
Also, this is the only test which is available if Principal Components Analysis (PCA) is used for stratification correction on the input data. (See 18.6.1.)
This test reports a p-value for whether the (possibly PCA-corrected) dependent variable value is correlated with, or shows a “trend” that depends upon, the (possibly PCA-corrected) count value of the genotype.
For case/control dependent variables, and before any PCA correction, a “case” is considered to have a value of one, and a “control” is considered to have a value of zero.
For the genotype predictor variable, its count values (before any PCA correction) are as follows:
- For the additive model: The count of the minor allele D, which is zero within genotype dd, one within genotype Dd, and two within genotype DD, where d is the major allele.
- For the dominant model: The count is one for genotypes DD and Dd and zero for genotype dd.
- For the recessive model: The count is one for genotype DD and zero for genotypes Dd and dd.
In addition, this test reports a signed correlation value indicating the strength and direction of the association between the (possibly PCA-corrected) count value and the (possibly PCA-corrected) dependent variable value.
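The count values listed above, and the signed correlation computed from them, can be sketched as follows. This is a minimal illustration with no PCA correction; the names `COUNT_VALUES`, `pearson_r`, and `genotype_correlation` are hypothetical and are not part of HelixTree. How the correlation is converted into a p-value is described in Section 26.22.1 and is not reproduced here.

```python
from math import sqrt

# Count value of each genotype under each model
# (D = minor allele, d = major allele).
COUNT_VALUES = {
    "additive":  {"dd": 0, "Dd": 1, "DD": 2},
    "dominant":  {"dd": 0, "Dd": 1, "DD": 1},
    "recessive": {"dd": 0, "Dd": 0, "DD": 1},
}


def pearson_r(xs, ys):
    """Signed Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def genotype_correlation(genotypes, dependent, model="additive"):
    """Correlate the model's genotype count values with the dependent variable."""
    counts = [COUNT_VALUES[model][g] for g in genotypes]
    return pearson_r(counts, dependent)


# Case = 1, control = 0, as described above.
genotypes = ["dd", "dd", "Dd", "Dd", "DD", "DD"]
status = [0, 0, 0, 1, 1, 1]
r_additive = genotype_correlation(genotypes, status, "additive")
r_dominant = genotype_correlation(genotypes, status, "dominant")
```

A positive value here means that higher count values go with case status (or with larger quantitative values); a negative value means the opposite.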
NOTE: In the special circumstance of an additive model where the dependent variable is case/control and there is no Principal Components Analysis correction being done, this test yields p-value results very close to those obtained from the Armitage Trend Test, described below.
See Section 26.22.1 in the Formulas and Theories chapter for an explanation of this statistic.
18.3.2 Armitage Trend Test
This test is available specifically under the additive model for a case/control dependent variable when missing data is dropped.
This test assesses whether “case” vs. “control” status shows a “trend” that depends upon the count of the minor allele D, which is zero within genotype dd, one within genotype Dd, and two within genotype DD.
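As an illustration, the sketch below computes the textbook Cochran-Armitage trend statistic from genotype counts, using the allele counts 0, 1, and 2 as scores and a chi-square distribution with one degree of freedom for the p-value. The function name `armitage_trend` is hypothetical, and the exact formula HelixTree uses is the one given in Section 26.22.2, which may differ in detail from this sketch.

```python
from math import erfc, sqrt


def armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Textbook Cochran-Armitage trend test.

    cases, controls: counts for genotypes (dd, Dd, DD), in that order.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    totals = [a + b for a, b in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = sum(controls)
    sum_t_cases = sum(t * r for t, r in zip(scores, cases))
    sum_t = sum(t * m for t, m in zip(scores, totals))
    sum_t2 = sum(t * t * m for t, m in zip(scores, totals))
    numerator = n * (n * sum_t_cases - n_cases * sum_t) ** 2
    denominator = n_cases * n_controls * (n * sum_t2 - sum_t ** 2)
    chi2 = numerator / denominator
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Genotype counts in the order (dd, Dd, DD).
chi2, p_value = armitage_trend(cases=(0, 1, 2), controls=(2, 1, 0))
```

In this form the statistic equals the number of samples times the squared correlation between case status and minor-allele count.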
See Section 26.22.2 in the Formulas and Theories chapter for an explanation of this statistic.
18.3.3 Exact Form of Armitage Test
This test is also available specifically under the additive model for a case/control dependent variable when missing data is dropped.
This exact test yields the probability under the null hypothesis of having a “trend” at least as extreme as the one observed, assuming an equal probability of any permutation of the dependent variable. This form, which is more computationally expensive than the standard Armitage Trend Test, avoids the chi-square approximation used in that test.
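The idea of weighting every permutation of the dependent variable equally can be sketched by brute force for a very small sample. The sketch below assumes that “at least as extreme” means an absolute deviation of the case score total from its expectation that is at least as large as the one observed; that two-sided definition, and the name `exact_trend_p`, are assumptions for illustration only. The enumeration grows combinatorially, so this is not how a production implementation would do it.

```python
from itertools import combinations


def exact_trend_p(scores, status):
    """Exact permutation p-value for a trend, by full enumeration.

    scores: minor-allele count (0, 1, or 2) per sample.
    status: 1 for case, 0 for control, per sample.
    """
    n = len(scores)
    n_cases = sum(status)
    total = sum(scores)
    observed = sum(s for s, y in zip(scores, status) if y == 1)
    # Deviation from expectation, scaled by n so it stays an integer.
    observed_dev = abs(n * observed - n_cases * total)
    extreme = 0
    arrangements = 0
    # Every choice of which samples are the cases is equally likely.
    for case_idx in combinations(range(n), n_cases):
        t = sum(scores[i] for i in case_idx)
        arrangements += 1
        if abs(n * t - n_cases * total) >= observed_dev:
            extreme += 1
    return extreme / arrangements


scores = [0, 0, 0, 2, 2, 2]
status = [0, 0, 0, 1, 1, 1]
p_exact = exact_trend_p(scores, status)
```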
See Section 26.22.3 in the Formulas and Theories chapter for an explanation of this statistic.
18.3.4 (Pearson) Chi-Squared Test
The Pearson Chi-Squared test is available for a case/control dependent variable for all genetic models and tests except the Additive Model, and is available whether missing values are used or dropped.
This test compares the observed contingency table with the expected contingency table. The table has the categories of the selected model along one dimension and “case” vs. “control” along the other, and the expected table is constructed by keeping the margins constant.
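A minimal sketch of this comparison follows, using the standard Pearson formula. The function names `expected_table` and `pearson_chi2` are hypothetical. The sketch assumes that every row and column total is nonzero and returns only the statistic and its degrees of freedom.

```python
def expected_table(observed):
    """Expected counts under independence, with margins held constant."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi2(observed):
    """Return (chi2, degrees_of_freedom) for a rows x columns table."""
    expected = expected_table(observed)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: case, control. Columns: the two categories of a 2 x 2 model.
observed = [[10, 20], [20, 10]]
chi2, df = pearson_chi2(observed)
```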
The respective contingency tables and their dimensions when dropping missing values are as follows:
| Genetic Model or Test | Contingency Table and Dimension |
|---|---|
| Basic Allelic Test | (Case/Control) vs (D vs d) (2 x 2) |
| Genotypic Test | (Case/Control) vs (DD vs dd vs Dd) (2 x 3) |
| Dominant | (Case/Control) vs ((DD or Dd) vs dd) (2 x 2) |
| Recessive | (Case/Control) vs (DD vs (Dd or dd)) (2 x 2) |
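The tables above can all be derived from per-genotype counts. The sketch below shows one way to do that collapsing when missing values are dropped. The function name `contingency_table` and the input layout are hypothetical, and the allelic row assumes that each genotype contributes two alleles to the count.

```python
def contingency_table(counts, model):
    """Build the case/control contingency table for a model or test.

    counts: {"case": {"DD": n, "Dd": n, "dd": n}, "control": {...}}
    Returns [case_row, control_row], columns ordered as in the table above.
    """
    rows = []
    for group in ("case", "control"):
        g = counts[group]
        if model == "allelic":
            # Assumption: two alleles are counted per genotype (D vs d).
            rows.append([2 * g["DD"] + g["Dd"], 2 * g["dd"] + g["Dd"]])
        elif model == "genotypic":
            rows.append([g["DD"], g["dd"], g["Dd"]])
        elif model == "dominant":
            rows.append([g["DD"] + g["Dd"], g["dd"]])
        elif model == "recessive":
            rows.append([g["DD"], g["Dd"] + g["dd"]])
        else:
            raise ValueError(f"unknown model: {model}")
    return rows


counts = {
    "case": {"DD": 5, "Dd": 10, "dd": 15},
    "control": {"DD": 2, "Dd": 8, "dd": 20},
}
allelic = contingency_table(counts, "allelic")
dominant = contingency_table(counts, "dominant")
```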
If you have chosen Use Missing Values As Predictors, the respective expanded contingency tables and their dimensions become as follows:
| Genetic Model or Test | Contingency Table and Dimension |
|---|---|
| Basic Allelic Test | (Case/Control) vs (D vs d vs missing – two missing values are used for every missing genotype) (2 x 3) |
| Genotypic Test | (Case/Control) vs (DD vs dd vs Dd vs missing) (2 x 4) |
| Dominant | (Case/Control) vs ((DD or Dd) vs dd vs missing) (2 x 3) |
| Recessive | (Case/Control) vs (DD vs (Dd or dd) vs missing) (2 x 3) |
See Section 26.22.4 in the Formulas and Theories chapter regarding the Pearson Chi-Squared Test.
18.3.5 Fisher’s Exact Test
Fisher’s Exact Test is also available for a case/control dependent variable for all genetic models and tests except the Additive Model, and is available whether missing values are used or dropped.
This test yields the exact probability under the null hypothesis of having a contingency table at least as extreme as the one observed, assuming an equal probability of any permutation of the dependent variable. This form, which is more computationally expensive than the Pearson Chi-Squared Test, avoids the chi-square approximation altogether.
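For a 2 x 2 table, the exact probability can be sketched with the hypergeometric distribution. The sketch below assumes the common convention that a table is “at least as extreme” when its probability, with the margins fixed, is no larger than that of the observed table; that convention and the name `fisher_exact_2x2` are assumptions for illustration. The 2 x 3 and 2 x 4 tables need a generalization that is not shown.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denominator = comb(n, col1)

    def weight(x):
        # Number of arrangements giving x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed_weight = weight(a)
    # Integer weights keep the "no more probable than observed" test exact.
    extreme = sum(
        w for w in (weight(x) for x in range(lowest, highest + 1))
        if w <= observed_weight
    )
    return extreme / denominator


p_small = fisher_exact_2x2([[3, 0], [0, 3]])
p_tea = fisher_exact_2x2([[3, 1], [1, 3]])
```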
See 18.3.4 above for a listing of the possible contingency tables.
See Section 26.22.5 in the Formulas and Theories chapter regarding Fisher’s Exact Test.
18.3.6 Odds Ratio with Confidence Limits
If you have a case/control dependent variable, you are dropping missing data, and you are using any model or test other than the Genotypic Test, you may choose to output odds ratios, along with the lower and upper 95% confidence bounds for each, as follows:
- Basic Allelic Test: The odds ratio for the minor allele enhancing the effect and the odds ratio for the major allele enhancing the effect.
- Dominant: The “normal” odds ratio ((DD, Dd) vs. (dd), where D is the minor allele and d is the major allele) and an inverse odds ratio ((dd) vs. (DD, Dd)).
- Recessive: The “normal” odds ratio ((DD) vs. (Dd, dd)) and an inverse odds ratio ((Dd, dd) vs. (DD)).
- Additive: The odds ratio for (Dd) vs. (dd) (heterozygous vs. homozygous major allele) and the odds ratio for (DD) vs. (Dd) (homozygous minor allele vs. heterozygous).
NOTE: Under the additive model, the two odds ratios may be thought of as a check on the validity of the model itself in describing the effect, as well as indicators of the intensity of the association. If the two odds ratios are approximately the same, then the additive model may be considered valid. If the two odds ratios are very different, then there may be some other model that better describes the data. For instance, a high and significant odds ratio for (Dd) vs. (dd) and a low or insignificant odds ratio for (DD) vs. (Dd) may be an indicator that the dominant model really better describes the effect.
NOTE: An odds ratio is generally considered significant if both the lower and the upper 95% confidence bounds are greater than one (or both less than one for an odds ratio less than one).
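To make the significance rule concrete, the sketch below computes an odds ratio and a 95% confidence interval using the widely used log odds ratio (Woolf) method. Whether HelixTree uses exactly this interval is an assumption; the formula it uses is given in Section 26.22.6. The function name `odds_ratio_ci` is hypothetical, and the method is undefined if any cell count is zero.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio with confidence bounds (Woolf / log method).

    a, b: case counts in the first and second category.
    c, d: control counts in the first and second category.
    z: normal quantile; 1.959964 gives a 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Dominant model: categories (DD or Dd) vs dd.
normal_or, lower, upper = odds_ratio_ci(15, 15, 10, 20)
# Inverse odds ratio: swap the two categories.
inverse_or, inv_lower, inv_upper = odds_ratio_ci(15, 15, 20, 10)
# The interval contains one exactly when lower < 1 < upper.
significant = lower > 1 or upper < 1
```

In this example the interval spans one, so the odds ratio would not be considered significant. The bounds of the inverse odds ratio are the reciprocals of the bounds of the “normal” one.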
See Section 26.22.6 in the Formulas and Theories chapter regarding odds ratios.
18.3.7 Analysis of Deviance
This test is available for a case/control dependent variable for all genetic models and tests except the Additive Model, and is available whether missing values are used or dropped.
It is an alternative statistic, equivalent to first order to the Pearson Chi-Squared statistic, for testing the observed contingency table against the expected contingency table. The table has the categories of the selected model along one dimension and “case” vs. “control” along the other.
See 18.3.4 for a listing of the possible contingency tables.
This test has a somewhat stronger theoretical foundation than the Pearson test (18.3.4), because it is a likelihood ratio test, to which the Pearson test is a first-order approximation.
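A minimal sketch of a likelihood ratio statistic for a contingency table follows, in its usual form of twice the sum of observed times the log of observed over expected. The function name `deviance` is hypothetical, and the precise formulation HelixTree uses is in Section 26.22.7. Cells with an observed count of zero contribute nothing to the sum.

```python
from math import log


def deviance(observed):
    """Likelihood ratio (G) statistic and degrees of freedom for a table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            if o > 0:  # a zero cell contributes zero
                expected = row_totals[i] * col_totals[j] / n
                g += 2 * o * log(o / expected)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return g, df


observed = [[10, 20], [20, 10]]
g_stat, df = deviance(observed)
```

For this table the Pearson statistic is about 6.67 and the likelihood ratio statistic is about 6.80, which illustrates how close the two usually are.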
See Section 26.22.7 in the Formulas and Theories chapter for an explanation of this statistic.
18.3.8 F-Test
This is one of the two tests available for a quantitative dependent variable. (The other is the Correlation/Trend Test, 18.3.1.) The F-Test is available for all genetic models and tests except the Additive Model, and is available whether missing values are used or dropped.
This test assesses whether the distribution of the dependent variable differs significantly among the categories of the predictor variable.
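Assuming the F-Test here is the usual one-way analysis of variance F statistic (an assumption; see Section 26.22.8 for the formula actually used), the computation can be sketched as below. The function name `f_statistic` is hypothetical. Each inner list holds the dependent variable values for one category of the predictor, and the p-value, which requires the F distribution, is not computed.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic.

    groups: one list of dependent variable values per category.
    Returns (f, df_between, df_within).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the category means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Variation of the values around their own category mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    )
    df_between = k - 1
    df_within = n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Genotypic test: one group each for DD, dd, and Dd.
f_value, df_between, df_within = f_statistic(
    [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
)
```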
The respective sets of categories when dropping missing values are as follows:
| Genetic Model or Test | Categories |
|---|---|
| Basic Allelic Test | D vs d |
| Genotypic Test | DD vs dd vs Dd |
| Dominant | (DD or Dd) vs dd |
| Recessive | DD vs (Dd or dd) |
If you have chosen Use Missing Values As Predictors, the respective expanded sets of categories become as follows:
| Genetic Model or Test | Categories |
|---|---|
| Basic Allelic Test | D vs d vs missing – two missing values are used for every missing genotype |
| Genotypic Test | DD vs dd vs Dd vs missing |
| Dominant | (DD or Dd) vs dd vs missing |
| Recessive | DD vs (Dd or dd) vs missing |
See Section 26.22.8 in the Formulas and Theories chapter for an explanation of the F-Test.