Methods for the Genetic Association Tests
The following subsections further explain certain methods used in the Genetic Association Test module (Chapter 18).
26.22.1 Correlation/Trend Test
This tests the significance of any correlation between two numeric variables (or two variables that have been encoded as numeric variables). The correlation may also be thought of as a “trend” of one of the variables against the other.
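If we have n pairs of observations $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$, the (signed) correlation R between them is

$$R = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}.$$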
Meanwhile,

$$\chi^2 = (n-1)R^2$$

follows an approximate chi-squared distribution with one degree of freedom, from which a p-value may be obtained.
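The computation above can be sketched in a few lines of Python. This is a minimal illustration, not HelixTree's implementation: the function name `correlation_trend_test` is hypothetical, the statistic is assumed to be (n - 1)R², and both variables are assumed to have nonzero variance. For one degree of freedom the chi-squared upper tail has the closed form erfc(sqrt(χ²/2)), so only the standard library is needed.

```python
import math

def correlation_trend_test(x, y):
    """Return (R, chi2, p) for paired numeric observations x and y."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)      # signed Pearson correlation
    chi2 = (n - 1) * r * r              # assumed form of the test statistic
    # Upper tail of a chi-squared variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return r, chi2, p

# Genotypes coded 0/1/2 (additive model) against case status coded 0/1.
r, chi2, p = correlation_trend_test([0, 1, 2, 0, 1, 2], [0, 0, 1, 0, 1, 1])
```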
NOTE: In the special case of the additive model (with no PCA correction) for a case/control study, if we were to use, instead of the above formula,

$$\chi^2 = nR^2,$$

we would have the mathematical equivalent of the Armitage Trend Test.
NOTE: This correlation/trend test is also the only test available after PCA correction. In that case, however, the formula for the chi-squared statistic is instead

$$\chi^2 = (n-k-1)R^2,$$

where k is the number of principal components that have been removed from the data. The premise is that the PCA correction has removed k degrees of freedom from the data, and only the remaining degrees of freedom need to be tested.
26.22.2 Armitage Trend Test
This tests the “trend” in an ordered case/control contingency table. In HelixTree, the ordering is by the number of minor alleles in the genotype: zero, one, or two.
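Let $n_{10}$, $n_{11}$, and $n_{12}$ be the counts for cases with 0, 1, and 2 minor alleles, respectively, and $n_{00}$, $n_{01}$, and $n_{02}$ be the counts for controls with 0, 1, and 2 minor alleles, respectively. Also, let $s_0 = 0$, $s_1 = 1$, and $s_2 = 2$, and write $n_{+i} = n_{0i} + n_{1i}$ for the total count in column i.
If we let N be the total count,

$$p_{case} = \frac{\sum_{i} n_{1i}}{N}, \qquad p_{1i} = \frac{n_{1i}}{n_{+i}}, \qquad \bar{s} = \frac{\sum_{i} n_{+i}\,s_i}{N},$$

and

$$b = \frac{\sum_{i} n_{+i}\,(p_{1i} - p_{case})(s_i - \bar{s})}{\sum_{i} n_{+i}\,(s_i - \bar{s})^2},$$

then the prediction equation under ordinary least-squares fit is

$$\hat{p}_{1i} = p_{case} + b\,(s_i - \bar{s}).$$

The statistic for the Armitage Trend Test is

$$\chi^2 = \frac{b^2}{p_{case}(1 - p_{case})}\sum_{i} n_{+i}\,(s_i - \bar{s})^2,$$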
which is asymptotically chi-squared with one degree of freedom. This is used to obtain the chi-squared-based p-value for this test.
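As an illustration, the trend statistic can be computed directly from the two rows of counts. The sketch below is not taken from HelixTree: the function name `armitage_trend_test` is hypothetical, and the slope b and the statistic are assumed to take the usual least-squares (Cochran-Armitage) form, with the sums weighted by the column totals. The numerator is written as a sum of (cases minus expected cases) so that an empty column does not cause a division by zero.

```python
import math

def armitage_trend_test(cases, controls, scores=(0, 1, 2)):
    """Return (b, chi2, p) for case and control counts per genotype column."""
    col = [a + c for a, c in zip(cases, controls)]        # column totals
    n = sum(col)                                          # total count N
    p_case = sum(cases) / n
    s_bar = sum(t * s for t, s in zip(col, scores)) / n   # weighted mean score
    # sum_i n_+i (p_1i - p_case)(s_i - s_bar), using n_+i * p_1i = n_1i
    num = sum((a - t * p_case) * (s - s_bar)
              for a, t, s in zip(cases, col, scores))
    den = sum(t * (s - s_bar) ** 2 for t, s in zip(col, scores))
    b = num / den                                         # least-squares slope
    chi2 = b * b * den / (p_case * (1 - p_case))
    p = math.erfc(math.sqrt(chi2 / 2))                    # chi-squared, 1 df
    return b, chi2, p

# 0, 1, 2 cases and 2, 1, 0 controls with 0, 1, 2 minor alleles.
b, chi2, p = armitage_trend_test([0, 1, 2], [2, 1, 0])
```

For this small table the slope is 0.5 and the statistic is 4.0.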
26.22.3 Exact Form of Armitage Test
The exact form of this test yields the exact probability under the null hypothesis of having a “trend” at least as extreme as the one observed, assuming an equal probability of any permutation of the dependent variable.
To perform the exact Armitage test, we define the trend score for the contingency table m as

where

The exact permutation p-value is evaluated as

where

26.22.4 (Pearson) Chi-Squared Test
This is the most commonly used way to obtain a p-value for (the extremeness of) an (unordered) m×n contingency table. It tests the null hypothesis that rows and columns are independent, that is, that the proportions within each row match the proportions of the marginal column totals and the proportions within each column match the proportions of the marginal row totals.
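If the contingency table with elements $x_{ij}$ has N observations, we make an “expected” contingency table based on the marginal totals:

$$E_{ij} = \frac{\left(\sum_{l=1}^{n} x_{il}\right)\left(\sum_{l=1}^{m} x_{lj}\right)}{N}.$$

We then obtain a p-value from the fact that

$$\chi^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}\frac{(x_{ij} - E_{ij})^2}{E_{ij}}$$

approximates a chi-squared distribution with (m-1)(n-1) degrees of freedom.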
For the 2x2, 2x3, and 2x4 tables for which this technique is used in HelixTree, the degrees of freedom are 1, 2, and 3 respectively.
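A short Python sketch of this calculation follows. The function names `pearson_chi_squared` and `chi2_sf` are hypothetical, and the code assumes every row and column total is nonzero so that no expected count is zero. The upper-tail helper only covers 1, 2, and 3 degrees of freedom, using the closed forms available for those cases; a general routine would need an incomplete gamma function.

```python
import math

def pearson_chi_squared(table):
    """Return (chi2, df) for a table given as a list of equal-length rows."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    chi2 = 0.0
    for i, r in enumerate(table):
        for j, x in enumerate(r):
            e = row[i] * col[j] / n          # expected count from the margins
            chi2 += (x - e) ** 2 / e
    df = (len(row) - 1) * (len(col) - 1)
    return chi2, df

def chi2_sf(x, df):
    """Upper-tail chi-squared probability for df = 1, 2, or 3."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("only df = 1, 2, 3 are handled in this sketch")

chi2, df = pearson_chi_squared([[10, 20], [20, 10]])
p = chi2_sf(chi2, df)
```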
26.22.5 Fisher’s Exact Test
The output of this test is the sum of the probabilities of all contingency tables whose marginal sums are the same as those of the observed contingency table and which are as extreme or more extreme (equally probable or less probable) than the observed contingency table.
The probability of a 2×r contingency table with elements $x_{ij}$, row totals $r_i$, column totals $c_j$, and N observations in total is given by

$$P = \frac{\prod_{i} r_i!\;\prod_{j} c_j!}{N!\;\prod_{i,j} x_{ij}!}.$$
To reduce the amount of computation, techniques developed by Mehta and Patel ([Mehta and Patel 1983], [Mehta and Patel 1986]) are used in HelixTree for computing Fisher’s Exact Test.
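For the smallest case, a 2x2 table, the definition can be illustrated by brute-force enumeration. The sketch below does not use the Mehta and Patel techniques; it simply visits every table with the observed margins. The function name `fisher_exact_2x2` is hypothetical, and the table probability is written with binomial coefficients, which is algebraically the same as the ratio of factorials for fixed margins. A small relative tolerance is an added assumption so that tables tied with the observed one are counted despite floating-point rounding.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d        # row totals
    c1 = a + c                   # first column total
    n = r1 + r2

    def prob(k):
        # Probability of the table whose top-left cell is k, margins fixed.
        return comb(r1, k) * comb(r2, c1 - k) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)   # feasible top-left values
    # Sum every table that is equally probable or less probable than observed.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

p = fisher_exact_2x2(3, 1, 1, 3)   # 34/70
```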
26.22.6 Odds Ratio with Confidence Limits
For the purposes of this discussion, we define our 2×2 contingency table as being organized as “(Case vs. Control) vs. (Yes vs. No)”, and we define $y_{case}$ as the number of “Case” Yes’s, $n_{case}$ as the number of “Case” No’s, $y_{control}$ as the number of “Control” Yes’s, and $n_{control}$ as the number of “Control” No’s.
The odds ratio is defined as the ratio of the odds for “Case” among the Yes’s to the odds for “Case” among the No’s, or equivalently the ratio of the odds for “Yes” among the cases to the odds for “Yes” among the controls, or equivalently

$$OR = \frac{y_{case}\,n_{control}}{n_{case}\,y_{control}}.$$

To obtain confidence limits, we use the standard error of log(OR), which is

$$s = \sqrt{\frac{1}{y_{case}} + \frac{1}{n_{case}} + \frac{1}{y_{control}} + \frac{1}{n_{control}}}.$$

The 95% confidence interval then ranges from $e^{\log(OR) - 1.96s}$ to $e^{\log(OR) + 1.96s}$.
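A minimal Python sketch of the odds ratio and its confidence limits is shown below. The function name `odds_ratio_ci` and its argument names are hypothetical, and all four cell counts are assumed to be positive (a zero cell makes the odds ratio or its standard error undefined).

```python
import math

def odds_ratio_ci(y_case, n_case, y_control, n_control, z=1.96):
    """Return (odds ratio, lower limit, upper limit); z=1.96 gives 95%."""
    odds_ratio = (y_case * n_control) / (n_case * y_control)
    # Standard error of log(OR).
    s = math.sqrt(1 / y_case + 1 / n_case + 1 / y_control + 1 / n_control)
    lower = math.exp(math.log(odds_ratio) - z * s)
    upper = math.exp(math.log(odds_ratio) + z * s)
    return odds_ratio, lower, upper

# 20 Yes and 10 No among cases, 10 Yes and 20 No among controls.
odds_ratio, lower, upper = odds_ratio_ci(20, 10, 10, 20)
```

Because the interval is symmetric on the log scale, the product of the two limits equals the square of the odds ratio.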
26.22.7 Analysis of Deviance
This is a maximum-likelihood based technique.
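Let s be the proportion of cases in the entire sample, $n_j$ be the number of observations in column j of the contingency table, $p_j$ be the proportion of cases in column j, and k be the number of columns. Then, to perform an analysis of deviance test, we define

$$F_0 = -2\left(\sum_{j=1}^{k} n_j\right)\bigl(s\log(s) + (1 - s)\log(1 - s)\bigr)$$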
and
$$F_k = \sum_{j=1}^{k}\Bigl[-2\,n_j\bigl(p_j\log(p_j) + (1 - p_j)\log(1 - p_j)\bigr)\Bigr].$$
The test statistic is then $F_0 - F_k$, which approximates a chi-squared distribution with k-1 degrees of freedom. A p-value is then obtained based on this chi-squared approximation.
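The deviance difference can be sketched as follows. The names `binomial_deviance` and `analysis_of_deviance` are hypothetical. Two assumptions are made that the text does not state: the pooled term uses the total sample size with the overall case proportion s, and p log(p) is taken as 0 when p is 0, which is its limiting value.

```python
import math

def _xlogx(p):
    # Limiting value of p*log(p) as p -> 0 is 0.
    return 0.0 if p == 0 else p * math.log(p)

def binomial_deviance(n, p):
    """-2 * n * (p log p + (1 - p) log(1 - p))."""
    return -2 * n * (_xlogx(p) + _xlogx(1 - p))

def analysis_of_deviance(cases, totals):
    """Return (F0 - Fk, df) from per-column case counts and column totals."""
    n = sum(totals)
    s = sum(cases) / n                       # overall proportion of cases
    f0 = binomial_deviance(n, s)             # pooled (null) term
    fk = sum(binomial_deviance(nj, cj / nj)  # per-column terms
             for cj, nj in zip(cases, totals) if nj > 0)
    return f0 - fk, len(totals) - 1

stat, df = analysis_of_deviance([10, 20], [30, 30])
```

When every column has the same proportion of cases, the two terms coincide and the statistic is zero (up to rounding).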
26.22.8 The F-Test
The F-Test applies to a quantitative trait being subdivided into two or more groups according to the category of the predictor variable.
This test is on whether the distributions of the dependent variable within each category are significantly different between the various categories of the predictor variable. Another way to phrase this question is whether the variation of the trait between the categories is substantial by comparison to the variation of the trait within the categories.
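If there are n observations $x_i$ subdivided into k groups, with overall mean $\bar{x}$ and with $\bar{x}_j$ the mean of the observations in group j, we define

$$SS_T = \sum_{i=1}^{n}(x_i - \bar{x})^2,$$

and

$$SS_E = \sum_{j=1}^{k}\;\sum_{i \in \text{group } j}(x_i - \bar{x}_j)^2.$$

If $v_1 = (k - 1)$ and $v_2 = (n - k)$, then

$$\frac{SS_T - SS_E}{v_1}$$

is proportional to the variance between the groups, and

$$\frac{SS_E}{v_2}$$

is proportional to the variance within the groups.
The F statistic becomes

$$F = \frac{(SS_T - SS_E)/v_1}{SS_E/v_2}.$$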
To obtain a p-value from this, the two degrees of freedom v1 and v2 must be taken into account, to know which F distribution to use.
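The statistic itself is a standard one-way analysis of variance and can be sketched with plain Python. The function name `f_statistic` is hypothetical, and the code assumes at least two groups, more observations than groups, and some variation within the groups. Converting F to a p-value needs the upper tail of the F distribution with v1 and v2 degrees of freedom, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.f.sf`) is one option.

```python
def f_statistic(groups):
    """Return (F, v1, v2) for a list of groups of numeric observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Total sum of squares around the overall mean.
    ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)
    # Within-group sum of squares around each group's own mean.
    ss_within = 0.0
    for g in groups:
        mean_g = sum(g) / len(g)
        ss_within += sum((x - mean_g) ** 2 for x in g)
    v1, v2 = k - 1, n - k
    f = ((ss_total - ss_within) / v1) / (ss_within / v2)
    return f, v1, v2

f, v1, v2 = f_statistic([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```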
26.22.9 The T-Test
The T-Test is a special form of the F-Test in which distributions in only two categories are being compared. (The T statistic is the square root of the corresponding F statistic for two categories.)
In the CNAM Association Test module (Chapter 25), the T-Test is used for a quantitative predictor (independent variable) and a case/control (boolean) dependent variable.
The test is on whether the distributions of the quantitative predictor within the two categories of case vs. control are significantly different. Another way to phrase this question is whether the variation of the predictor between the categories is substantial by comparison to the variation of the predictor within the categories.
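If there are $n_t$ observations $x_{ti}$ corresponding to a true dependent variable value and $n_f$ observations $x_{fi}$ corresponding to a false dependent variable value, we define

$$\bar{x}_t = \frac{1}{n_t}\sum_{i=1}^{n_t} x_{ti},$$

$$\bar{x}_f = \frac{1}{n_f}\sum_{i=1}^{n_f} x_{fi},$$

$$s^2 = \frac{\sum_{i=1}^{n_t}(x_{ti} - \bar{x}_t)^2 + \sum_{i=1}^{n_f}(x_{fi} - \bar{x}_f)^2}{n_t + n_f - 2}.$$

Then

$$S_d = \sqrt{s^2\left(\frac{1}{n_t} + \frac{1}{n_f}\right)}.$$

If $S_d$ is less than a threshold ($10^{-6}$), then the p-value returned is 1.0. Otherwise,

$$T = \frac{\bar{x}_t - \bar{x}_f}{S_d}.$$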
The p-value may be calculated on the basis of this T value as a “two-sided p-value” using Student’s t distribution with nt + nf - 2 degrees of freedom.
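A sketch of the T statistic in Python follows. It assumes the pooled-variance (equal-variance) form of the two-sample test, which is the form consistent with nt + nf - 2 degrees of freedom; the function name `pooled_t_statistic` is hypothetical. The function returns `None` for the statistic when the standard error falls below the threshold, which corresponds to the case where a p-value of 1.0 is reported. The two-sided p-value needs the Student's t distribution, which is outside the standard library; SciPy's `scipy.stats.t.sf` is one way to obtain it.

```python
import math

def pooled_t_statistic(x_true, x_false, threshold=1e-6):
    """Return (T, degrees of freedom); T is None if S_d is below threshold."""
    nt, nf = len(x_true), len(x_false)
    mean_t = sum(x_true) / nt
    mean_f = sum(x_false) / nf
    # Pooled sum of squared deviations around each group's own mean.
    ss = (sum((x - mean_t) ** 2 for x in x_true)
          + sum((x - mean_f) ** 2 for x in x_false))
    df = nt + nf - 2
    s_d = math.sqrt(ss / df * (1 / nt + 1 / nf))   # standard error of the difference
    if s_d < threshold:
        return None, df      # degenerate case: report a p-value of 1.0
    return (mean_t - mean_f) / s_d, df

t, df = pooled_t_statistic([5, 6, 7], [1, 2, 3])
```

For this example T is the square root of 24, and 24 is the F statistic obtained when the same two groups are compared with the F-Test.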