LD Computation

You can specify whether the EM algorithm or CHM is used for the LD computations from the project options dialog (see Section 3.5.3.3) or the tree options dialog (see Section 7.2.3.1). The project options are accessed from the project window using Tools->Options for Updates and New Projects->LD Parms or Tools->Current Project’s Options->LD Parms, and the tree options are accessed from the tree diagram using Tree->Options->LD Parms. In the following explanation, the bolded text refers to options in this dialog window.

The linkage disequilibrium is calculated as follows:

Let $K$ be a marker locus with alleles $1, \ldots, k$ having frequencies $p_1, \ldots, p_k$, and let $M$ be a marker locus with alleles $1, \ldots, m$ having frequencies $q_1, \ldots, q_m$. The linkage disequilibrium contribution for allele $i$ from locus $K$ and allele $j$ from locus $M$ is $D_{ij} = p_{ij} - p_i q_j$, where $p_{ij}$ is the joint frequency of alleles $i$ and $j$ on the same gamete.

If the Composite Haplotype Method (CHM) of haplotype estimation is chosen, we approximate this by using the composite LD, $\Delta_{ij} = (p_{ij} + p_{i/j}) - 2 p_i q_j$, where $p_{i/j}$ is the joint frequency of $i$ and $j$ on two different gametes.

$\Delta_{ij}$ approximates $D_{ij}$ because $\Delta_{ij} = D_{ij} + (p_{i/j} - p_i q_j)$, and when Hardy-Weinberg equilibrium holds on the level of two-marker haplotypes, $p_{i/j} = p_i q_j$. Because $p_{ij} + p_{i/j}$ is an observable quantity, $\Delta_{ij}$ may be estimated without using the EM algorithm.

Let $n_{iujv}$ be the count of the genotypes containing allele $i$ and allele $u$ at locus $K$, and allele $j$ and allele $v$ at locus $M$. Let $n$ be the number of individuals. If we let

\[
n_{uv} = 2 n_{uuvv} + \sum_{i=1,\, i \neq u}^{k} n_{iuvv} + \sum_{j=1,\, j \neq v}^{m} n_{uujv} + \frac{1}{2} \sum_{i=1,\, i \neq u}^{k} \sum_{j=1,\, j \neq v}^{m} n_{iujv},
\]

we have $p_{ij} + p_{i/j} = \frac{n_{ij}}{n}$ and $\Delta_{ij} = \frac{n_{ij}}{n} - 2 p_i q_j$. When $k > 2$ or $m > 2$, we use the chi-squared distribution with $(k - 1)(m - 1)$ degrees of freedom,

\[
X^2 = n \sum_{i=1}^{k} \sum_{j=1}^{m} \frac{\Delta_{ij}^2}{2 p_i q_j}.
\]

From this, we obtain the distribution's p-value $p = \mathrm{chisqr}(X^2, (k - 1)(m - 1))$, and the correlation $R$ from the inverse distribution for one degree of freedom, which is

\[
R = \sqrt{\frac{F^{-1}(p)}{n}}.
\]

(Some values obtained for the correlation may be greater than one since these are estimates based on p-values.)
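As an illustration of the CHM quantities above, the following Python sketch computes the composite counts, the composite LD values and the chi-squared statistic from unphased genotypes. The genotype encoding (one pair of alleles per locus per individual) and the function name `composite_ld` are hypothetical choices made for this example and are not part of the software. The sketch relies on the observation that each individual contributes (copies of allele i at K) x (copies of allele j at M) / 2 to the composite count, which reproduces the weights 2, 1, 1 and 1/2 in the definition above. The denominator follows the formula as printed in this section.

```python
from collections import Counter

def composite_ld(genotypes, k_alleles, m_alleles):
    """Return (delta, x2) for unphased two-locus genotypes.

    genotypes: list of ((a1, a2), (b1, b2)), the two alleles at locus K and
    the two alleles at locus M of each individual (hypothetical encoding).
    """
    n = len(genotypes)
    n_ij = Counter()      # composite counts n_ij
    k_copies = Counter()  # allele copy counts at locus K
    m_copies = Counter()  # allele copy counts at locus M
    for k_pair, m_pair in genotypes:
        for a in k_pair:
            k_copies[a] += 1
        for b in m_pair:
            m_copies[b] += 1
        # Weight 2 (double homozygote), 1 (one heterozygous locus), 1/2 (both).
        for i in set(k_pair):
            for j in set(m_pair):
                n_ij[i, j] += k_pair.count(i) * m_pair.count(j) / 2
    p = {i: k_copies[i] / (2 * n) for i in k_alleles}
    q = {j: m_copies[j] / (2 * n) for j in m_alleles}
    delta = {(i, j): n_ij[i, j] / n - 2 * p[i] * q[j]
             for i in k_alleles for j in m_alleles}
    # Alleles with zero frequency are skipped (an assumption of this sketch).
    x2 = n * sum(delta[i, j] ** 2 / (2 * p[i] * q[j])
                 for i in k_alleles for j in m_alleles
                 if p[i] > 0 and q[j] > 0)
    return delta, x2
```

A call such as `composite_ld([((1, 2), (1, 2)), ((1, 1), (2, 2))], [1, 2], [1, 2])` returns the dictionary of composite LD values keyed by allele pair, together with the statistic.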

When $k = m = 2$ and the HW Correction when both markers are biallelic option is not set, we approximate the chi-squared statistic that would be obtained using $D_{ij}$,

\[
X^2 = n \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{\Delta_{ij}^2}{p_i q_j} \approx n \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{D_{ij}^2}{p_i q_j}
\]

when the markers are close to HWE. (Since this is an approximation, some values obtained for the correlation may be greater than one. These indicate departure from Hardy-Weinberg equilibrium on the level of two-marker haplotypes.)

When k = m = 2, and the HW Correction when both markers are biallelic option is set, we instead use the following direct formula

\[
n R^2 = X^2 = \frac{n \Delta_{AB}^2}{\left(p_A (1 - p_A) + D_{AA}\right)\left(q_B (1 - q_B) + D_{BB}\right)},
\]

where the two biallelic markers are thought of as containing alleles $A$ vs. $a$ and $B$ vs. $b$, respectively, and $D_{AA}$ and $D_{BB}$ are the respective Hardy-Weinberg coefficients for allele $A$ of the first marker and allele $B$ of the second marker. This expression may be shown to approximate the expression

\[
n R^2 = \frac{n D_{AB}^2}{p_A (1 - p_A)\, q_B (1 - q_B)}
\]

to compute n times the square of the correlation.

On the other hand, if the Expectation/Maximization (EM) method of haplotype estimation has been chosen (see Section 19), we have available estimates of the $p_{ij}$ and therefore of the $D_{ij}$, so we may directly write the chi-squared statistic with $(k - 1)(m - 1)$ degrees of freedom as

\[
X^2 = n \sum_{i=1}^{k} \sum_{j=1}^{m} \frac{D_{ij}^2}{p_i q_j}.
\]

From this, as for CHM, we obtain the distribution's p-value

\[
p = \mathrm{chisqr}(X^2, (k - 1)(m - 1)),
\]

and the correlation $R$ from the inverse distribution for one degree of freedom, which is

\[
R = \sqrt{\frac{F^{-1}(p)}{n}}.
\]
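The chain from LD values to statistic, p-value and equivalent correlation can be sketched in Python using only the standard library. The sketch assumes that chisqr denotes the upper-tail probability of the chi-squared distribution and that the inverse is the matching one-degree-of-freedom upper-tail inverse; the section does not state this explicitly. All function names are hypothetical, and the series used for the tail probability loses precision for extremely small p-values.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Power series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-16 * abs(total):
        k += 1
        term *= z / (a + k)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi2_isf_1df(p):
    """Value x with upper-tail probability p for one degree of freedom (bisection)."""
    lo, hi = 0.0, 1.0
    while math.erfc(math.sqrt(hi / 2.0)) > p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if math.erfc(math.sqrt(mid / 2.0)) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def em_x2(d, p, q, n):
    """X^2 = n * sum_ij D_ij^2 / (p_i q_j) from EM-based LD estimates."""
    return n * sum(d[i][j] ** 2 / (p[i] * q[j])
                   for i in range(len(p)) for j in range(len(q)))

def equivalent_r(x2, df, n):
    """Correlation R with the same p-value at one degree of freedom."""
    return math.sqrt(chi2_isf_1df(chi2_sf(x2, df)) / n)
```

With one degree of freedom the round trip through the p-value returns the statistic itself, so `equivalent_r(x2, 1, n)` reduces to the square root of `x2 / n`. For multi-allelic markers the two steps use different degrees of freedom, which is why the resulting R is only an equivalent value.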

In the case that the Use Patient Data Containing Missing Values box has been checked (as a part of using the EM method), not only will haplotype frequencies for missing data be imputed, but from these imputed frequencies, the $p_i$ and $q_j$ for the LD calculations will also be imputed.

14.4.1 LD Computation for D prime

Let $K$ and $M$ be two markers with alleles $1, \ldots, k$ and $1, \ldots, m$ respectively, having frequencies $p_1, \ldots, p_k$ and $q_1, \ldots, q_m$ respectively. We wish to estimate the normalized LD coefficient $D'$, which is the LD coefficient divided by its maximum possible value.

If we are using the Composite Haplotype Method (CHM) of haplotype estimation, and either we are estimating $D'$ involving at least one multi-allelic marker or we are not using the HW Correction when both markers are biallelic option, we approximate the maximum possible value of the LD coefficient with what would be the maximum possible value given Hardy-Weinberg equilibrium on the level of two-marker haplotypes. Using the same definitions as in the previous discussion on linkage disequilibrium, including the definitions for $n$, $n_{ij}$, $p_{ij}$, $p_{i/j}$, and $\Delta_{ij}$, we thus define

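\[
D'_{ij} =
\begin{cases}
\Delta_{ij} / \min\left(p_i q_j,\ (1 - p_i)(1 - q_j)\right), & \text{if } \Delta_{ij} < 0 \\
\Delta_{ij} / \min\left((1 - p_i) q_j,\ p_i (1 - q_j)\right), & \text{otherwise}
\end{cases}
\]

and

\[
D' = \sum_{i=1}^{k} \sum_{j=1}^{m} p_i q_j \left| D'_{ij} \right|.
\]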

If we are using the Composite Haplotype Method (CHM) of haplotype estimation for two biallelic markers, but we are also using the HW Correction when both markers are biallelic option, the $D'$ estimate is first computed as above. Then it is corrected by multiplying by the square root of the ratio of the corrected chi-square estimate to the would-be uncorrected chi-square estimate for the same markers. The corrected value is

\[
D' = D'_{\text{uncorrected}} \sqrt{\frac{p_A (1 - p_A)\, q_B (1 - q_B)}{\left(p_A (1 - p_A) + D_{AA}\right)\left(q_B (1 - q_B) + D_{BB}\right)}}.
\]
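The form of this correction factor can be checked with a short derivation, using the notation $q_B$, $D_{AA}$ and $D_{BB}$ of the biallelic chi-squared formulas given earlier. For two biallelic markers each row and each column of the $\Delta_{ij}$ sums to zero, so the four values differ only in sign, $\Delta_{ij} = \pm \Delta_{AB}$, and the uncorrected statistic reduces to

\[
n \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{\Delta_{ij}^2}{p_i q_j}
= n \Delta_{AB}^2 \left( \frac{1}{p_A} + \frac{1}{1 - p_A} \right) \left( \frac{1}{q_B} + \frac{1}{1 - q_B} \right)
= \frac{n \Delta_{AB}^2}{p_A (1 - p_A)\, q_B (1 - q_B)}.
\]

Dividing the corrected statistic $n \Delta_{AB}^2 / \left[ \left(p_A (1 - p_A) + D_{AA}\right)\left(q_B (1 - q_B) + D_{BB}\right) \right]$ by this expression cancels $n \Delta_{AB}^2$ and leaves the ratio of allele-frequency terms whose square root multiplies the uncorrected estimate.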

On the other hand, if the Expectation/Maximization (EM) method of haplotype estimation is chosen, we may define the $D'$ contributions directly from the EM estimates for $D_{ij}$. We have

\[
D'_{ij} =
\begin{cases}
D_{ij} / \min\left(p_i q_j,\ (1 - p_i)(1 - q_j)\right), & \text{if } D_{ij} < 0 \\
D_{ij} / \min\left((1 - p_i) q_j,\ p_i (1 - q_j)\right), & \text{otherwise}
\end{cases}
\]

and (as for CHM)

\[
D' = \sum_{i=1}^{k} \sum_{j=1}^{m} p_i q_j \left| D'_{ij} \right|.
\]
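The piecewise normalization and the weighted sum can be sketched in Python as below. The function name `d_prime` and the list-of-lists layout are hypothetical. The same function applies to either method: pass the composite LD values for CHM or the EM-based LD values for EM. Skipping cells whose maximum is zero is an assumption of this sketch, not something stated in the text.

```python
def d_prime(ld, p, q):
    """Overall D' from a k x m table of LD values and allele frequencies.

    ld[i][j] holds Delta_ij (CHM) or D_ij (EM); p and q are the allele
    frequencies at the first and second marker.
    """
    total = 0.0
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            value = ld[i][j]
            if value < 0:
                bound = min(p_i * q_j, (1 - p_i) * (1 - q_j))
            else:
                bound = min((1 - p_i) * q_j, p_i * (1 - q_j))
            if bound > 0:  # skip degenerate cells (assumption of this sketch)
                total += p_i * q_j * abs(value / bound)
    return total
```

For two biallelic markers every cell has the same normalized magnitude, so the weighted sum returns that common magnitude because the weights add up to one.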

In the case that the Use Patient Data Containing Missing Values box has been checked (as a part of using the EM method), not only will haplotype frequencies for missing data be imputed, but from these imputed frequencies, the $p_i$ and $q_j$ for the LD calculations will also be imputed.

NOTE: For multi-allelic markers, the p-value method of finding an equivalent $R$ for one degree of freedom sometimes results in the computed value for $R$ being greater than the computed value for $D'$.

14.4.2 LD Computation for Quick Mode

If the Only Output R Squared and D Prime (Quick Mode) option has been selected, we directly use

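\[
R^2 = \sum_{i=1}^{k} \sum_{j=1}^{m} \frac{\Delta_{ij}^2}{2 p_i q_j}
\]

(CHM),

\[
R^2 = \frac{\Delta_{AB}^2}{\left(p_A (1 - p_A) + D_{AA}\right)\left(q_B (1 - q_B) + D_{BB}\right)}
\]

(CHM with HW correction), or

\[
R^2 = \sum_{i=1}^{k} \sum_{j=1}^{m} \frac{D_{ij}^2}{p_i q_j}
\]

(EM).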

No further computations are pursued, except those for $D'$.

NOTE: For multi-allelic markers, the value of the first or third relation above, as applicable, which is actually equal to $X^2 / n$, is shown in quick mode as an approximation to $R^2$.
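A minimal Python sketch of the two summation forms used in quick mode follows, written from the formulas as printed in this section. The function name `quick_r2` and its `method` argument are hypothetical. The HW-corrected biallelic form is not covered here.

```python
def quick_r2(ld, p, q, method="EM"):
    """Quick-mode R^2 as a sum over allele pairs.

    ld[i][j] holds Delta_ij for method "CHM" or D_ij for method "EM".
    The divisor 2 p_i q_j (CHM) versus p_i q_j (EM) follows the formulas
    as printed in this section.
    """
    if method not in ("CHM", "EM"):
        raise ValueError("method must be 'CHM' or 'EM'")
    scale = 2.0 if method == "CHM" else 1.0
    return sum(ld[i][j] ** 2 / (scale * p[i] * q[j])
               for i in range(len(p)) for j in range(len(q)))
```

Multiplying the returned value by the number of individuals gives the chi-squared statistic of the corresponding full computation, which is the relationship stated in the note above.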