Using the Association Test Window

18.8.1 Data Requirements

Genetic association tests require a data set containing genotypic data and either case/control or quantitative trait data. To use these tests, first import your data into a HelixTree project (see Chapter 4). Once you have a spreadsheet for this data, set the column representing the case/control status or quantitative trait as the dependent variable (see Section 6.2), then open the Genetic Association Tests options dialog by selecting Genetics->Genetic Association Tests from the spreadsheet menu.

NOTE: It is common practice to inactivate those markers known to have data quality issues before testing, especially if you wish to use PCA.

NOTE: Case/control data with some missing values can still be analyzed as case/control data. See 18.4.2.

18.8.2 Available Tabs

The genetic association test window consists of three tabs:

  • Association Test Parameters. This tab contains all the parameters for the association tests themselves, plus options for running principal components analysis (PCA), either to correct the test input data for stratification or simply to output the principal components without using them for stratification correction.
  • Principal Component Analysis Parameters. This tab contains all of the remaining parameters for principal components analysis (PCA).

    NOTE: These parameters are also available in the stand-alone PCA window. If you wish to do principal components analysis of your data without performing an association test, please go to that window. (See 18.9.)

  • Overall Marker Statistics. This tab contains the parameters for obtaining overall marker statistics. These statistics are independent of any association test, except that most of them will be subdivided into overall, cases, and controls if you have made a single case/control variable dependent.

    NOTE: These parameters are also available in the stand-alone Overall Marker Statistics window. If you wish to obtain overall marker statistics without performing an association test, please go to that window. (See 6.6.2.)

18.8.3 The Association Test Parameters Tab


[Picture]
Figure 18.1: The Association Tests Window

In the Association Test Parameters tab, select the one genetic model or other test you wish to use, whether to include missing values in the analysis, whether to correct your input data for stratification through PCA, and all of the statistical tests you wish to perform.

Optionally, you may select multiple-testing corrections to apply to the non-exact statistical tests, or choose to correct for stratification through Genomic Control.
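
Genomic Control is commonly described as dividing each chi-square statistic by an inflation factor lambda, taken as the median observed statistic divided by the median of a 1-degree-of-freedom chi-square distribution (about 0.4549). This section does not state how HelixTree computes the correction, so the following is only a minimal sketch of that textbook form. The function name and the floor of lambda at 1.0 are assumptions made for illustration.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (lambda, corrected statistics) for a list of 1-df chi-square values.

    Hypothetical helper: illustrates the textbook correction, not
    necessarily the computation HelixTree performs.
    """
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    # Assumption: never inflate statistics, so lambda is floored at 1.0.
    lam = max(lam, 1.0)
    return lam, [x / lam for x in chi2_stats]
```

A lambda close to 1.0 suggests little inflation from stratification; larger values shrink every statistic by the same factor.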

NOTE: This user interface is dynamic: making certain choices changes which other choices are available. Specifically, the following restrictions apply:

  • Your choice of genetic model, whether to use missing values, and whether to correct your input data through PCA determines which statistical tests are available.
  • PCA is not available for basic allele tests or genotypic tests.
  • The additive model is not available when using missing data as predictors.
  • Genomic control is not available when using missing data as predictors.
  • PCA additive model is not available when using missing data as predictors.
  • Genomic control is not available at the same time as permutation testing.
  • Genomic control is not available for the genotypic model when the dependent variable is quantitative.

(If you can’t remember these, the window will remind you!)
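
The restrictions listed above can be read as a small set of validation rules. The sketch below encodes them in Python purely as a memory aid. The class, field names, and model labels are hypothetical and are not part of HelixTree. The fifth restriction (PCA additive model with missing data as predictors) is treated here as covered by the additive-model rule, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class AssocOptions:
    # All names are hypothetical; they mirror the choices in the dialog.
    model: str                       # e.g. "basic_allele", "genotypic", "additive"
    quantitative: bool = False       # dependent variable is a quantitative trait
    missing_as_predictors: bool = False
    pca_correction: bool = False
    genomic_control: bool = False
    permutations: bool = False


def violations(opts):
    """Return the list of restrictions that the given options would break."""
    problems = []
    if opts.pca_correction and opts.model in ("basic_allele", "genotypic"):
        problems.append("PCA is not available for basic allele or genotypic tests")
    # Assumed to also cover the PCA additive model restriction.
    if opts.missing_as_predictors and opts.model == "additive":
        problems.append("additive model is not available with missing data as predictors")
    if opts.missing_as_predictors and opts.genomic_control:
        problems.append("genomic control is not available with missing data as predictors")
    if opts.genomic_control and opts.permutations:
        problems.append("genomic control is not available with permutation testing")
    if opts.genomic_control and opts.model == "genotypic" and opts.quantitative:
        problems.append("genomic control is not available for the genotypic model "
                        "with a quantitative dependent variable")
    return problems
```

For example, `violations(AssocOptions(model="genotypic", pca_correction=True))` reports the PCA restriction, while an additive model with no other options set reports nothing.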

NOTE: Single Value Permutations and Full Scan Permutations can be run individually or together. You must provide the number of permutations to use in the test. When both types of permutations are run together, the same number of permutations is used for both. The number of permutations should be greater than or equal to three. Permuted p-values are calculated only for non-exact test statistics.
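
As an illustration of how two kinds of permuted p-values can share one set of permutations, the sketch below shuffles the case/control labels a fixed number of times. It assumes that a single-value p-value compares each marker against its own permuted statistics, and that a full-scan p-value compares each marker against the best statistic over all markers in each permutation. This section does not define the two terms, so that reading, the toy statistic, and all function names are assumptions.

```python
import random


def mean_diff(genotypes, status):
    """Toy statistic: |mean genotype in cases - mean genotype in controls|."""
    cases = [g for g, s in zip(genotypes, status) if s == 1]
    controls = [g for g, s in zip(genotypes, status) if s == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permuted_p_values(markers, status, n_perm, seed=0):
    """Return (single_value, full_scan) permuted p-values, one pair per marker.

    markers: list of genotype lists, one per marker (hypothetical layout).
    status:  list of 0/1 case/control labels, one per subject.
    """
    if n_perm < 3:
        raise ValueError("number of permutations should be at least three")
    rng = random.Random(seed)
    observed = [mean_diff(m, status) for m in markers]
    single_hits = [0] * len(markers)
    full_hits = [0] * len(markers)
    shuffled = list(status)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the dependent variable
        perm_stats = [mean_diff(m, shuffled) for m in markers]
        best = max(perm_stats)  # best statistic over the full scan
        for i, obs in enumerate(observed):
            single_hits[i] += int(perm_stats[i] >= obs)
            full_hits[i] += int(best >= obs)
    # Assumption: p-value is the plain fraction of permutations at least as extreme.
    single = [h / n_perm for h in single_hits]
    full = [h / n_perm for h in full_hits]
    return single, full
```

Under these assumptions the full-scan p-value for a marker is never smaller than its single-value p-value, because the best permuted statistic is at least as large as that marker's own permuted statistic.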

18.8.4 Principal Component Analysis Parameters Tab


[Picture]
Figure 18.2: The Principal Component Analysis Parameters Tab

If you chose either to correct for stratification with PCA or to output a PCA component spreadsheet, you can select PCA parameters from this tab: specifically, the number of components to find, normalization, whether to output a separate eigenvalue spreadsheet, and whether and how to eliminate component outlier subjects and recompute the components. Please see 18.6.1 for an explanation of the options for this tab.

Note that the genetic model, selectable in the Association Test Parameters tab, is also a parameter that influences finding the principal components.

18.8.5 Overall Marker Statistics Tab


[Picture]
Figure 18.3: The Overall Marker Statistics Tab

Here, you can choose to output any of the overall marker statistics available in this tab. Please see 18.7 for an explanation of the options for this tab.

18.8.6 Processing

When you have selected all the tests and outputs you wish to perform, select the Run button to start the selected tests. While the association test analysis itself is running, you can press the End Processing button on the progress bar dialog to stop the analysis and display the results for only the markers completed. (Note that stopping the progress bar for the principal components analysis will simply cancel all analysis.)

When the test is complete, a message will open to inform you of the number of markers analyzed and skipped and of the spreadsheet(s) produced.

18.8.7 Spreadsheet Outputs

The following spreadsheets may be produced:

  • The results of the association tests and marker statistics will be displayed in the same spreadsheet. Each of the statistics calculated will be in its own column. If the original data set was a marker-mapped spreadsheet, the first two columns of this spreadsheet will be chromosome number and position.

    NOTE: The markers that were skipped will be included in this spreadsheet, but the data for those markers will be displayed as missing values.

  • If you requested a principal components spreadsheet, this will be created with one row per patient or subject and one column per component. The components will be sorted by eigenvalue, large to small. Only the number of components you requested will be shown.
  • If you requested an eigenvalue spreadsheet from PCA, it will list the eigenvalues from large to small, limited to the number of components you requested.
  • If you requested elimination of outlier subjects, and outliers were found, a spreadsheet will be made to list these outliers and the iteration and component in which they were found.
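
The layout of the principal components and eigenvalue spreadsheets described above can be sketched as follows. This is only an illustration of the ordering and truncation, assuming the eigenvalues and component vectors have already been computed. The function name and data layout are hypothetical.

```python
def pca_output_tables(eigenvalues, components, k):
    """Order components by eigenvalue (large to small) and keep the first k.

    eigenvalues: list of floats, one per component.
    components:  list of component vectors, each with one value per subject.
    Returns (eigenvalue_column, rows), where each row holds one subject's
    values for the k retained components.
    """
    order = sorted(range(len(eigenvalues)),
                   key=lambda i: eigenvalues[i], reverse=True)[:k]
    kept_values = [eigenvalues[i] for i in order]
    kept_components = [components[i] for i in order]
    # Transpose so that rows are subjects and columns are components.
    rows = [tuple(values) for values in zip(*kept_components)]
    return kept_values, rows
```

Each row of the second result corresponds to one subject, matching the row-per-subject, column-per-component layout of the components spreadsheet.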

NOTE: To view any outputs as p-value-style plots, go to the output spreadsheet and either plot the individual columns of interest by clicking on their headers (6.2.5) or select that spreadsheet’s Analysis->Plot Numeric Columns to create one plot window in which any of the output columns can be selected for plotting (6.5.5). Each column’s data will be plotted against its row header names or numbers.