Window Display and Navigation


[Picture]
Figure 19.1: Haplotype Frequency viewer window

In Figure 19.1, the user has selected markers 1-11 and run the EM algorithm to convergence, finding 9 different haplotypes in the population that occur with a frequency of at least 0.0001.

The user interface elements available in the Haplotype Frequency viewer are described below:

19.2.1 Marker List

The list of genetic marker loci, with their spreadsheet variable numbers, is displayed in the upper-left corner. Use single click, shift-click, control-click, and/or press-and-drag to select (by highlighting) the marker loci from which to estimate haplotypes.

In practice, depending on marker heterogeneity, about 27 bi-allelic markers is close to the maximum the program can handle on a computer with 512 MB of memory. For multi-allelic markers, fewer markers can be handled because the number of possible haplotypes is greater.
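To illustrate why the marker limit is lower for multi-allelic markers, the sketch below counts possible haplotypes as the product of the allele counts at each selected marker. The function name `possible_haplotypes` and the example allele counts are illustrative assumptions, not part of the program.

```python
from math import prod

def possible_haplotypes(alleles_per_marker):
    """Number of distinct haplotypes: the product of the allele counts at each marker."""
    return prod(alleles_per_marker)

# 27 bi-allelic markers: 2**27 possible haplotypes.
biallelic_27 = possible_haplotypes([2] * 27)

# Hypothetical comparison: 12 markers with 5 alleles each already give more.
multiallelic_12 = possible_haplotypes([5] * 12)

print(biallelic_27, multiallelic_12)
```

In this hypothetical comparison, 12 five-allele markers produce more possible haplotypes than 27 bi-allelic ones.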

19.2.2 Patient List

The upper-middle window displays a list of patient identifiers, with “ALL PATIENTS” selected by default. If you click on a particular patient, the haplotype list changes to show the diplotypes for that patient. (See 19.3 and 19.5 below.)
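As a rough sketch of what a per-patient diplotype listing involves, the following computes the probability of each haplotype pair consistent with one patient's unphased genotype, given a set of haplotype frequencies. It assumes bi-allelic markers coded as 0, 1, or 2 copies of one allele; the function name and the example frequencies are hypothetical, and this is not necessarily how the program computes its values.

```python
from itertools import product

def diplotype_probabilities(genotype, freqs):
    """Probability of each unordered haplotype pair given an unphased genotype.

    genotype: tuple of 0/1/2 allele counts, one per marker (assumed coding).
    freqs: dict mapping haplotype tuples of 0/1 to population frequencies.
    Assumes at least one consistent pair has non-zero frequency.
    """
    # Ordered allele assignments (chromosome 1, chromosome 2) per genotype code.
    options = {0: [(0, 0)], 1: [(0, 1), (1, 0)], 2: [(1, 1)]}
    weights = {}
    for combo in product(*(options[g] for g in genotype)):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        key = tuple(sorted((h1, h2)))  # unordered diplotype
        weights[key] = weights.get(key, 0.0) + freqs.get(h1, 0.0) * freqs.get(h2, 0.0)
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}

# Hypothetical frequencies for two markers; the patient is heterozygous at both.
example_freqs = {(0, 0): 0.6, (1, 1): 0.3, (0, 1): 0.05, (1, 0): 0.05}
probs = diplotype_probabilities((1, 1), example_freqs)
```

For a double heterozygote there are two candidate diplotypes, and the one built from the more frequent haplotypes receives most of the probability.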

19.2.3 Alternate Field Chooser

This chooser in the upper-right corner lets you select which field is shown in column 3 of the haplotype list (see below) when the Patient List selection is set to “ALL PATIENTS”. The choices are CHM Frequencies, Estimated 99% Confidence Intervals, Estimated 95% Confidence Intervals, and Estimated 90% Confidence Intervals.

19.2.4 Haplotype List

The resulting list of haplotypes is displayed on the right side, under the alternate field chooser. The haplotypes are in column 1, the EM-method estimated frequencies are in column 2, and the chosen alternate field (see above) is in column 3. (For an individual patient, the CHM diplotype probability is always shown in column 3.) The haplotype list may be sorted by any column by clicking on that column's header bar.

19.2.5 Copying Haplotype Results

To copy all or part of the haplotype list (for pasting into other applications such as Excel), use click, shift-click, control-click, and/or press-and-drag to select the desired portion of the list, then right-click to open the copy menu. Alternatively, right-click first and choose "select all" from that menu to select the entire list, then right-click again to open the menu and copy.

19.2.6 EM Initial Conditions

Choose whether to initialize the EM algorithm with "CHM" (Composite Haplotype Method frequency estimates), Random Values, or Equal Values. The Composite Haplotype Method (see Section 26.7) estimates haplotypes assuming that both allele phases are equally likely at every locus. Random values are recomputed whenever (re-)initialization takes place.
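The two simpler starting points can be sketched as follows. This is an illustration only: the function names are hypothetical, and drawing uniform random numbers and normalizing them is one plausible reading of "Random Values", not a documented detail of the program. The CHM option is not shown.

```python
import random

def equal_init(n_haplotypes):
    """'Equal Values': every possible haplotype starts at the same frequency."""
    return [1.0 / n_haplotypes] * n_haplotypes

def random_init(n_haplotypes, rng=None):
    """'Random Values': random positive weights, normalized to sum to 1.

    A fresh draw is made on every call, mirroring re-initialization.
    """
    rng = rng or random.Random()
    raw = [rng.random() for _ in range(n_haplotypes)]
    total = sum(raw)
    return [x / total for x in raw]

start_equal = equal_init(4)
start_random = random_init(4, random.Random(1))
```

Passing a seeded `random.Random` makes a random start reproducible; omitting it gives a new start each time.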

19.2.7 Compute EM/Refine Estimate

To compute the haplotypes using the selected initialization method, up to the selected number of iterations, and within the selected convergence tolerance, click the "Compute EM" button. The estimated haplotypes will appear in the list in the upper-right corner. For "simple" estimates this happens immediately; for more difficult estimates a progress bar is displayed. You may cancel a more difficult estimate if, for instance, you want to change the convergence tolerance or use single-step mode (see Section 19.2.12).

This button will change to a "Finish EM" button if convergence was not achieved or if you use single-step mode (see Section 19.2.12). It will change to a "Refine Estimate" button if convergence is achieved, but you then change the desired convergence tolerance.

This button will change its name to "Re-Compute" if convergence is achieved while "Random Values" was selected for initializing, or if a new method of initializing is selected.
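The interplay of initialization, iteration limit, and convergence tolerance can be sketched with a small EM loop. This is a minimal illustration, not the program's implementation: it assumes bi-allelic markers with genotypes coded as 0, 1, or 2 copies of one allele, starts from equal frequencies, and uses hypothetical function names (`compatible_pairs`, `em_step`, `run_em`).

```python
from itertools import product

def compatible_pairs(genotype):
    """All ordered haplotype pairs (h1, h2) consistent with an unphased genotype."""
    options = {0: [(0, 0)], 1: [(0, 1), (1, 0)], 2: [(1, 1)]}
    pairs = []
    for combo in product(*(options[g] for g in genotype)):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        pairs.append((h1, h2))
    return pairs

def em_step(freqs, genotypes):
    """One EM iteration; returns (new_freqs, largest change in any frequency)."""
    counts = {h: 0.0 for h in freqs}
    for g in genotypes:
        pairs = compatible_pairs(g)
        # E-step: weight each phase resolution by the product of its frequencies.
        weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
        total = sum(weights)
        for (h1, h2), w in zip(pairs, weights):
            counts[h1] += w / total
            counts[h2] += w / total
    # M-step: expected haplotype counts divided by the number of chromosomes.
    n_chromosomes = 2 * len(genotypes)
    new_freqs = {h: c / n_chromosomes for h, c in counts.items()}
    change = max(abs(new_freqs[h] - freqs[h]) for h in freqs)
    return new_freqs, change

def run_em(genotypes, n_markers, max_iterations=50, tolerance=1e-4):
    """Iterate until the largest change is within tolerance or the limit is hit."""
    haplotypes = list(product((0, 1), repeat=n_markers))
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}  # equal start
    change = float("inf")
    iterations = 0
    while iterations < max_iterations and change > tolerance:
        freqs, change = em_step(freqs, genotypes)
        iterations += 1
    return freqs, iterations, change

# Four hypothetical patients, two markers each.
genotypes = [(0, 0), (0, 0), (2, 2), (1, 1)]
freqs, iterations, change = run_em(genotypes, n_markers=2)
```

In this toy data set the double heterozygote is resolved almost entirely into the two haplotypes already seen in the homozygous patients, so their estimated frequencies approach 0.625 and 0.375. Calling `em_step` once on its own corresponds to a single-step iteration, and calling `run_em`-style iteration again with a smaller `tolerance` corresponds to refining a converged estimate.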

19.2.8 Finish EM

Click this to perform another set of iterations to attempt to obtain convergence.

19.2.9 Refine Estimate

Click this to resume iterating until convergence is obtained according to a new convergence tolerance you have entered, or up to the selected number of iterations.

19.2.10 Re-Compute

Click this to re-estimate haplotypes using a brand new set of initial values.

19.2.11 Initialize

To merely set or re-set the initial conditions, possibly using a different initializing method, click the "Initialize" button.

19.2.12 Step Once

To see the result of a single iteration, click "Step Once".

19.2.13 Display Threshold

Only haplotypes with a frequency above the "Display Threshold" value will be shown. To show more or fewer haplotypes, change the "Display Threshold" value. This is useful for culling out rare haplotypes from the display.
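The effect of the threshold can be sketched as a simple filter over the estimated frequencies. The function name and the example values are hypothetical; the strict comparison follows the wording "above" in the text.

```python
def haplotypes_to_show(freqs, display_threshold):
    """Keep only haplotypes whose frequency is strictly above the threshold,
    ordered from most to least frequent."""
    kept = [(h, f) for h, f in freqs.items() if f > display_threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Hypothetical estimates for four haplotypes.
estimates = {"AAG": 0.61, "ACG": 0.37, "CAG": 0.0199, "CCT": 0.0001}
shown = haplotypes_to_show(estimates, display_threshold=0.0001)
```

Raising the threshold shortens the list; lowering it reveals rarer haplotypes.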

19.2.14 Reshow

To immediately display more or fewer haplotypes based on a new "Display Threshold" value, click "Reshow".

19.2.15 Confidence Interval Estimation Threshold

To avoid possible singularities in matrix computation, and to improve computation speed for larger marker windows and numbers of possible haplotypes, only those haplotypes whose estimated frequencies are higher than this threshold will be involved in the final step of the confidence interval calculations. To check the rigor of this approximation (on a smaller-sized window), decrease this number, or set it to zero. To speed up the calculation further, increase this number.

MATHEMATICAL NOTE: This procedure (of using a threshold that is not exactly zero) works by assuming that the information matrix elements formed from a large-probability (“common”) haplotype and a small-probability (“rare”) haplotype are close to zero. If those elements were exactly zero, the information matrix could be put into block-diagonal form, with the common haplotypes in one block and the rare haplotypes in the other. That would allow dealing only with the first block, that is, ignoring rare haplotypes altogether. The approximation assumes that reasonable results are still obtained by dealing only with the common-haplotype block even when those elements are not exactly zero.
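The argument in the note can be written out as follows. The symbols are introduced here for illustration and are not the manual's notation: $I$ is the information matrix, and the subscripts $c$ and $r$ denote the common and rare haplotypes respectively.

```latex
I =
\begin{pmatrix}
  I_{cc} & I_{cr} \\
  I_{rc} & I_{rr}
\end{pmatrix},
\qquad
I_{cr} = I_{rc}^{\mathsf{T}} \approx 0
\quad\Longrightarrow\quad
I^{-1} \approx
\begin{pmatrix}
  I_{cc}^{-1} & 0 \\
  0 & I_{rr}^{-1}
\end{pmatrix}
```

Under this approximation, the variances and covariances of the common haplotype frequencies are taken from $I_{cc}^{-1}$ alone, so the rare-haplotype block never has to be inverted.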

19.2.16 Use Patient Data Containing Missing Values

When this box is checked, if a patient has missing values for one or more of the markers that have been selected, all possible allelic values at these markers are assumed with equal probability as initial conditions. All these resulting haplotypes are used in the EM computations.

When this box is not checked, any patient with a missing value at one or more of the selected markers is dropped. When the Compute EM button is pressed, the patient list is updated to gray out the patients that have missing values.
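The two behaviors can be sketched as follows. This is an illustration under stated assumptions, not the program's code: each marker is stored as a pair of alleles or `None` when missing, markers are bi-allelic with alleles 0 and 1, and the function names are hypothetical.

```python
from itertools import product

def expand_missing(genotype, alleles=(0, 1)):
    """Box checked: replace each missing marker with every possible allele pair,
    giving all completed genotypes the same initial weight."""
    per_marker = [
        [pair] if pair is not None else list(product(alleles, repeat=2))
        for pair in genotype
    ]
    completed = [tuple(combo) for combo in product(*per_marker)]
    weight = 1.0 / len(completed)
    return [(g, weight) for g in completed]

def drop_incomplete(patients):
    """Box not checked: keep only patients with no missing marker."""
    return {pid: g for pid, g in patients.items() if None not in g}

# Hypothetical patients with two markers each; P2 is missing the second marker.
patients = {"P1": [(0, 1), (1, 1)], "P2": [(0, 1), None]}
expanded = expand_missing(patients["P2"])
complete_only = drop_incomplete(patients)
```

Each additional missing marker multiplies the number of completed genotypes, so the expansion grows quickly when several markers are missing.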

19.2.17 Maximum EM Iterations

To change the maximum number of iterations that will be done at one time by the "Compute EM"/"Finish EM"/"Refine Estimate"/"Re-Compute" button, change the "Maximum EM Iterations" value.

19.2.18 EM Convergence Tolerance

To change the largest amount by which any haplotype frequency may change in one iteration while the estimate is still considered to have "converged", alter the "EM Convergence Tolerance" value.

19.2.19 Genotypes/Haplotypes

The total number of genotypes found and the total number of haplotypes found from these genotypes are displayed under this title, separated by a slash.

19.2.20 Haplotypes Shown

The number of haplotypes being displayed is shown here. If the number that meet the "Display Threshold" value exceeds a certain upper limit, that number will be shown separately.

19.2.21 Current EM Iterations

This is the total number of iterations taken thus far since initialization.

19.2.22 Proximity to Convergence

This is the maximum amount by which any haplotype frequency value changed during the last iteration.