Haplotype Frequency Estimation Overview

HelixTree assumes that genetic data are phase-ambiguous: for each patient and marker locus, both alleles are known, but it is not known which allele belongs to which chromosome. Often, however, the quantity of interest is the pattern of alleles across different loci on the same chromosome, that is, the haplotype. This module, when invoked from the spreadsheet view, estimates haplotype frequencies for selected loci using the Expectation/Maximization (EM) algorithm.
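To make phase ambiguity concrete, the following minimal Python sketch lists the haplotype pairs that are consistent with one unphased multi-locus genotype. It is an illustration only, not HelixTree code: the function name compatible_haplotype_pairs and the representation of a genotype as a list of two-allele tuples are assumptions made for this example.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: one (allele, allele) tuple per locus (hypothetical format).
    """
    pairs = set()
    # At each locus, choose which of the two alleles sits on the first chromosome.
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = tuple(locus[c] for locus, c in zip(genotype, choice))
        hap2 = tuple(locus[1 - c] for locus, c in zip(genotype, choice))
        # Sort so that (hap1, hap2) and (hap2, hap1) count as the same pair.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)

# Heterozygous at both loci: the phase cannot be read off the genotype.
double_het = compatible_haplotype_pairs([("A", "a"), ("B", "b")])

# Heterozygous at one locus only: the phase is unambiguous.
single_het = compatible_haplotype_pairs([("A", "A"), ("B", "b")])
```

A patient who is heterozygous at both loci could carry either AB with ab or Ab with aB, whereas a patient who is heterozygous at no more than one locus has a single possible haplotype pair.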

Each EM iteration starts from the current haplotype probabilities, which are either the initial values or the estimates from the previous iteration. Assuming random mating, the iteration first computes the probability that each multi-locus genotype would have, given the probabilities of the haplotypes that can make it up (the "Expectation" step). It then finishes by re-estimating the haplotype probabilities based on the ratios between these computed genotype probabilities and the observed genotype frequencies (the "Maximization" step).
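The iteration described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the HelixTree implementation: the function names, the genotype format (one two-allele tuple per locus), the uniform starting frequencies, and the stopping rule are all choices made for this example, and missing values are not handled.

```python
from collections import Counter
from itertools import product

def ordered_resolutions(genotype):
    """All distinct ordered (hap1, hap2) pairs consistent with a genotype."""
    found = set()
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = tuple(locus[c] for locus, c in zip(genotype, choice))
        hap2 = tuple(locus[1 - c] for locus, c in zip(genotype, choice))
        found.add((hap1, hap2))
    return sorted(found)

def em_haplotype_frequencies(genotypes, max_iter=100, tol=1e-9):
    """Estimate haplotype frequencies from unphased genotypes (sketch only)."""
    # Count identical genotypes, ignoring the order of alleles within a locus.
    counts = Counter(tuple(tuple(sorted(locus)) for locus in g) for g in genotypes)
    resolutions = {g: ordered_resolutions(g) for g in counts}
    haplotypes = sorted({h for res in resolutions.values() for pair in res for h in pair})
    n = sum(counts.values())
    # Assumed initialization: every candidate haplotype is equally likely.
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(max_iter):
        expected = dict.fromkeys(haplotypes, 0.0)
        for g, count in counts.items():
            # Genotype probability under random mating: sum of haplotype
            # frequency products over the ordered resolutions.
            pair_prob = {pair: freq[pair[0]] * freq[pair[1]] for pair in resolutions[g]}
            geno_prob = sum(pair_prob.values())
            # Share the observed count among resolutions in proportion to
            # their probabilities, crediting both haplotypes of each pair.
            for (hap1, hap2), p in pair_prob.items():
                weight = count * p / geno_prob
                expected[hap1] += weight
                expected[hap2] += weight
        # Re-estimate: expected haplotype copies divided by 2n chromosomes.
        updated = {h: c / (2 * n) for h, c in expected.items()}
        change = max(abs(updated[h] - freq[h]) for h in haplotypes)
        freq = updated
        if change < tol:
            break
    return freq

# Eight patients with unambiguous phase and two double heterozygotes.
sample = (
    [[("A", "A"), ("B", "B")]] * 4
    + [[("a", "a"), ("b", "b")]] * 4
    + [[("A", "a"), ("B", "b")]] * 2
)
estimates = em_haplotype_frequencies(sample)
```

In this sample the homozygous patients show only haplotypes AB and ab, so the estimates for the two double heterozygotes shift toward the AB with ab resolution as the iterations proceed. Note that a uniform start can stall when every patient is equally ambiguous, so a practical implementation would likely try other starting values as well.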

We use the EM approach outlined in [Excoffier 1995] and the approach for handling missing values outlined in [Chiano 1998].