10.5 CNV Association Tests

The CNV Association Tests window (Analysis > CNV Association Tests) allows you to run association tests and principal component analysis directly on CNV segment mean covariates with a marker map applied. This window produces the same type of output as the Numeric Association Tests window (see Numeric Association Tests).

Summary information about the dependent variable and the other variables is displayed at the top of this window for reference. This information is visible from both tabs in this dialog window.

CNV Association Tests Overview

The CNV Association Tests window offers a straightforward way to test for associations between numeric log2 ratio (or related CNV) predictors and either case/control status or a quantitative trait, using one or more statistical measures.

In addition, the CNV Association Tests window offers correction for batch effects (and stratification) using principal component analysis, as well as several multiple testing corrections for the p-values produced by the association tests.

Tests and Analysis Methods

You can choose from the following statistical tests and/or methods where appropriate:

  • Correlation/Trend Test
  • T-Test
  • Linear Regression
  • Logistic Regression

These tests and methods are described below.

Correlation/Trend Test

This test is available for both case/control and quantitative dependent variables. Apart from linear and logistic regression, it is the only test available when principal component analysis (PCA) is used to correct the input data for batch effects or stratification. See Principal Component Analysis Overview for more information.

This test reports the p-value for the hypothesis that the (possibly PCA-corrected) dependent variable is correlated with, or shows a “trend” depending on, the (possibly PCA-corrected) predictor.

For case/control dependent variables, and before any PCA correction, a “case” is considered to have a value of one, and a “control” is considered to have a value of zero.

In addition, this test reports a signed correlation value indicating the strength and direction of the dependency between the (possibly PCA-corrected) predictor value and the (possibly PCA-corrected) dependent variable value.

See the Formulas and Theories chapter for an explanation of this statistic (Statistics Available for Genotype Association Tests).
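
As a rough illustration of the case/control coding and the correlation statistic described above, the following Python sketch computes a Pearson correlation between one CNV predictor column and the dependent variable. It assumes a common large-sample approximation in which (n - 1)·r² is compared to a chi-square distribution with one degree of freedom; the exact statistic SVS uses is the one given in the Formulas and Theories chapter, and the function name `correlation_trend_test` is hypothetical. Samples with a missing value (`None`) in either variable are dropped.

```python
import math

# Hypothetical sketch, not the SVS implementation: Pearson correlation between a
# CNV predictor (e.g. log2 ratios) and the dependent variable, with an approximate
# p-value from (n - 1) * r**2 compared to a chi-square distribution with 1 df.
CASE_CONTROL_CODING = {"case": 1.0, "control": 0.0}


def correlation_trend_test(predictor, dependent):
    pairs = []
    for x, y in zip(predictor, dependent):
        if x is None or y is None:
            continue  # drop samples missing either value
        if isinstance(y, str):
            y = CASE_CONTROL_CODING[y]  # case = 1, control = 0
        pairs.append((float(x), float(y)))

    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three complete samples")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("predictor or dependent variable is constant")

    r = sxy / math.sqrt(sxx * syy)  # signed correlation
    chi2 = (n - 1) * r * r
    # Upper tail of chi-square(1): P(X > c) = erfc(sqrt(c / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return r, p_value


log2_ratios = [0.42, 0.35, None, 0.51, -0.10, -0.22, 0.05, -0.31]
status = ["case", "case", "case", "case", "control", "control", "control", None]
r, p = correlation_trend_test(log2_ratios, status)
print(f"r = {r:.3f}, p = {p:.3g}")
```

A positive r here means higher log2 ratios tend to go with cases.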

T-Test

The T-Test can only be run when the dependent variable is binary (for example, a variable such as gender that has been transformed into a binary variable). For more information, see Statistics for Numeric Association Tests.
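
The following Python sketch shows a two-sample t-test comparing predictor values between the two groups of a binary dependent variable (coded 1 and 0). It assumes the classic Student pooled-variance form; SVS's exact statistic is documented in Statistics for Numeric Association Tests and may differ (for example, a Welch variant). The two-sided p-value is computed from the t distribution through the regularized incomplete beta function, implemented here with the standard continued-fraction method so the example needs only the standard library. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for this m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), evaluated with the continued fraction on its convergent side."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sample_t_test(values, groups):
    """Student's pooled-variance t-test between group 1 (cases) and group 0 (controls)."""
    g1 = [v for v, g in zip(values, groups) if v is not None and g == 1]
    g0 = [v for v, g in zip(values, groups) if v is not None and g == 0]
    n1, n0 = len(g1), len(g0)
    if n1 < 2 or n0 < 2:
        raise ValueError("each group needs at least two non-missing values")
    m1, m0 = sum(g1) / n1, sum(g0) / n0
    df = n1 + n0 - 2
    pooled_var = (sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n0))
    if se == 0:
        raise ValueError("zero variance in both groups")
    t = (m1 - m0) / se
    # Two-sided p-value: P(|T| >= |t|) = I_{df / (df + t^2)}(df / 2, 1 / 2)
    p = regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p


t, df, p = two_sample_t_test([1.0, 3.0, None, 0.0, 2.0], [1, 1, 1, 0, 0])
print(f"t = {t:.4f}, df = {df}, p = {p:.4f}")
```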

Linear/Logistic Regression

When the dependent variable is a quantitative (real- or integer-valued) trait, linear regression is available. With linear regression, a line is fit to the response in terms of the predictor’s value, and a p-value is computed for the significance of the fit. The output includes not only the regression p-value but also the estimates of the intercept and slope of the regression line.
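
A minimal least-squares sketch of the linear regression described above, assuming one CNV predictor per fit. It returns the intercept, the slope, and the t statistic for the hypothesis that the slope is zero; a two-sided p-value would then come from a t distribution with n - 2 degrees of freedom (with SciPy, for example, `2 * scipy.stats.t.sf(abs(t), df)`). Whether SVS derives its regression p-value exactly this way is described in Section Linear Regression; the function name is hypothetical.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x (hypothetical sketch).

    Returns the intercept, the slope, the t statistic for H0: slope = 0,
    and its degrees of freedom (n - 2). Samples missing either value are dropped.
    """
    pairs = [(float(a), float(b)) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three complete samples")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    if sxx == 0:
        raise ValueError("predictor is constant")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - intercept - slope * a) ** 2 for a, b in pairs)  # residual sum of squares
    df = n - 2
    se_slope = math.sqrt(rss / df / sxx)
    # A perfect fit leaves no residual variance, so the t statistic is unbounded
    t_slope = slope / se_slope if se_slope > 0 else math.copysign(math.inf, slope)
    return intercept, slope, t_slope, df


log2_ratio = [0.0, 1.0, 2.0, 3.0, None]
trait = [1.0, 3.0, 2.0, 5.0, 4.0]
b0, b1, t, df = simple_linear_regression(log2_ratio, trait)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}, t = {t:.3f} on {df} df")
```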

When the dependent variable is a binary trait, logistic regression is available. With logistic regression, a logistic (sigmoid) curve is fit to the predictor value, and a p-value is computed for the significance of the fit. The output includes not only the regression p-value but also the estimates of β0 and β1.

Bonferroni and False Discovery Rate (FDR) multiple testing corrections can also be applied to the regression results.

See the Formulas and Theories chapter for an explanation of these statistics (Section Linear Regression and Section Logistic Regression).

Note on Missing Values

All missing values, in both the predictor variables and the dependent variable, are dropped from the analysis.

Multiple Testing Corrections

When many tests are performed, some may produce a significant-looking test statistic by chance alone. Multiple testing corrections are designed to help guard against this. You may optionally select one or more of the following multiple testing corrections.

Bonferroni Adjustment

The Bonferroni adjustment multiplies each individual p-value by the number of times the test was performed. This value, which is quite conservative, estimates the probability that the test would have obtained a value at least this significant by chance at least once across all the times it was performed. (The number of times the test was performed equals the number of markers processed. Other types of tests on the same markers are not counted.)
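
A minimal sketch of the Bonferroni adjustment described above, applied to the p-values of a single test type across all markers. Capping the adjusted value at 1.0 is a common convention assumed here; the text does not state whether SVS caps it. The function name is hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjust the p-values from ONE test type across all markers.

    m is the number of markers tested with this test; other test types are not
    counted. Capping at 1.0 is a common convention assumed here.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


adjusted = bonferroni([0.001, 0.02, 0.4])
print(adjusted)
```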

False Discovery Rate

The False Discovery Rate (FDR) option calculates the FDR for each selected statistical test. This correction is based on the p-values from the original test.

A general interpretation of the FDR is “What would the rate of false discoveries (false positives) be if I accepted ALL of the tests whose p-value is at or below the p-value of this test?”

See the Formulas and Theories chapter for an explanation of this correction procedure (Section False Discovery Rate).
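
The interpretation above corresponds to the quantity p × m / rank, where m is the number of tests and rank is the position of the test's p-value in ascending order. The following sketch assumes the Benjamini-Hochberg step-up procedure, which additionally makes the values monotone in p; SVS's exact formula is given in Section False Discovery Rate, and whether it applies this monotonicity step is not stated here. The function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg FDR values (q-values), returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the smallest
    # p * m / rank over this test and all tests with larger p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print([round(q, 4) for q in q_values])
```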

Permutation Testing

Permutation testing is another way of determining whether a significant test statistic value was obtained by chance alone.

  • Single Value Permutation Testing: With single value permutations, the dependent variable is permuted and the selected statistical test is performed on the given marker. This process is repeated the number of times you select (counting the original test as one “permutation”). The permuted p-value is the fraction of permutations in which the test came out as significant as, or more significant than, it did with the non-permuted dependent variable.
  • Full Scan Permutation Testing: The full-scan permutation technique differs from the single-value technique in that it addresses the multiple testing problem. It does this by comparing the original test result for an individual marker with the most significant permuted results from all tested markers. The specified number of permutations is performed on the dependent variable, and each permutation is tested with each marker. For each permutation, only the most significant test statistic across all markers tested with that permutation is saved. The p-value is the fraction of permutations in which this saved best value of the test statistic was more significant than the original test statistic for the given marker.

See the Formulas and Theories chapter for a more detailed explanation and examples of permutation testing (Section Permutation Testing Methodology).
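
The two permutation schemes can be contrasted in a short Python sketch. It assumes |Pearson r| as the test statistic (larger means more significant), counts the original test as one permutation for both schemes, and treats ties as "as significant" in both; these details, and all function names, are assumptions for illustration rather than SVS's exact procedure. The same shuffled dependent vectors are reused for every marker, so a marker's full-scan p-value can never be smaller than its single-value p-value.

```python
import random


def abs_correlation(x, y):
    """|Pearson r|, used here as the test statistic (larger = more significant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_p_values(markers, dependent, n_permutations, seed=0):
    """Single-value and full-scan permuted p-values (hypothetical sketch).

    markers: one predictor column per marker, in the same sample order as
    `dependent`. The original, unpermuted test counts as one permutation.
    """
    if n_permutations < 3:
        raise ValueError("use at least three permutations")
    rng = random.Random(seed)
    observed = [abs_correlation(col, dependent) for col in markers]
    single_hits = [1] * len(markers)  # the original test counts as one "permutation"
    scan_hits = [1] * len(markers)
    y = list(dependent)
    for _ in range(n_permutations - 1):
        rng.shuffle(y)  # permute the dependent variable only
        permuted = [abs_correlation(col, y) for col in markers]
        best = max(permuted)  # full scan keeps only the best statistic over all markers
        for j, stat in enumerate(observed):
            if permuted[j] >= stat:
                single_hits[j] += 1
            if best >= stat:
                scan_hits[j] += 1
    single = [h / n_permutations for h in single_hits]
    full_scan = [h / n_permutations for h in scan_hits]
    return single, full_scan


markers = [[0.4, 0.3, 0.5, -0.1, -0.2, 0.0],
           [0.13, -0.21, 0.34, 0.02, 0.25, -0.08]]
status = [1, 1, 1, 0, 0, 0]
single, full_scan = permutation_p_values(markers, status, n_permutations=200)
print("single-value:", single)
print("full-scan:   ", full_scan)
```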

Principal Components Analysis

To use principal component analysis (PCA) to correct for batch effects or stratification, check the “Correct for Batch Effects/Stratification with PCA” box. The PCA Parameters tab becomes available once this box is checked. If the “Use Corrected Dependent” box is checked, the dependent variable is also corrected using the principal components found; otherwise, the dependent variable is left uncorrected. Correcting a binary dependent variable makes it continuous, so linear regression and the Correlation/Trend Test are the appropriate tests in this situation.
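
To illustrate why a corrected binary dependent variable becomes continuous, the sketch below removes from a per-sample vector its projection onto a set of principal components, in the style of regressing out the top components. It assumes the components are already available as orthonormal per-sample vectors (for example, from an earlier PCA of the marker data); SVS's exact correction is described in the Principal Component Analysis Overview, and the function name is hypothetical.

```python
import math


def pca_correct(values, components):
    """Remove the part of a per-sample vector explained by principal components.

    Hypothetical sketch: `components` are per-sample component vectors assumed
    to be orthonormal (e.g. from an earlier PCA of the marker data). `values`
    is one CNV marker column, or the dependent variable when it is corrected too.
    """
    n = len(values)
    mean = sum(values) / n
    residual = [v - mean for v in values]  # centre the column first
    for comp in components:
        coef = sum(r * c for r, c in zip(residual, comp))  # projection onto this component
        residual = [r - coef * c for r, c in zip(residual, comp)]
    return residual


u = [c / math.sqrt(20) for c in (3.0, 1.0, -1.0, -3.0)]  # one orthonormal component
binary_dependent = [1, 1, 0, 0]
corrected = pca_correct(binary_dependent, [u])
print([round(v, 3) for v in corrected])
```

The corrected values are no longer 0 and 1, which is why a logistic model no longer applies to a corrected binary dependent.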

Using the CNV Association Test Window

Summary information for the dependent variable is displayed at the top of this window for reference. This information is visible from both tabs in this dialog window.

Data Requirements

CNV Association Tests require a marker mapped dataset containing numeric log2 ratio or related CNV data and either case/control or quantitative trait data. To use these tests, first import your data into an SVS project (see Importing Your Data Into A Project). Once you have the spreadsheet for this data, select the column representing the case/control status or quantitative trait as the dependent variable (see Column States), and open the CNV Association Tests options dialog by selecting Analysis > CNV Association Tests from the spreadsheet menu.

Available Tabs

The CNV Association Tests window consists of two tabs:

  • Association Test Parameters: This tab contains all the parameters needed for the association tests themselves, plus options for applying principal component analysis (PCA) to correct the test input data for batch effects/stratification and for using PCA to correct the dependent variable.
  • PCA Parameters: This tab contains all of the remaining parameters for principal component analysis (PCA).

The Association Test Parameters Tab


Figure 84: CNV Association Tests Window – Association Test Parameters Tab

In the Association Test Parameters tab (see Figure 84), select all of the statistical tests you wish to perform, select whether you wish to correct your input data for batch effects or stratification through PCA, and select any multiple-testing corrections to apply to the results.

If an option is hidden, grayed out, or otherwise inaccessible, it conflicts with one or more options you have already selected and cannot be selected at the same time.

NOTE:

  • Single Value Permutations and Full Scan Permutations can be run individually or together. You must provide a value for the number of permutations used in the test. When running both types of permutations together, the selected number of permutations is the same for both. The number of permutations should be greater than or equal to three. Permuted P-Values are calculated only for non-exact test statistics.

The PCA Parameters Tab


Figure 85: CNV Association Tests Window – PCA Parameters Tab

If you chose to correct for batch effects/stratification with PCA, you can select PCA parameters from this tab (see Figure 85).

The principal components can either be computed from the data or, if they have already been computed for this dataset, taken from an existing spreadsheet: select the “Use precomputed principal components” option and then select the spreadsheet of principal components. See Applying PCA to a Superset of Markers and Applying PCA to a Subset of Samples for specific limitations of this feature.

The other options include the number of components to be found, whether to output a separate eigenvalue spreadsheet, and whether and how to eliminate component outlier subjects and recompute components. See Principal Component Analysis Overview for an explanation of the options for this tab.
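
The outlier options are explained in the Principal Component Analysis Overview rather than here, so the following is only a sketch under stated assumptions: a subject is flagged on a component when its score lies more than a chosen number of standard deviations from that component's mean. The default threshold of 6, the use of the sample standard deviation, and the function name are illustrative assumptions. Removing the flagged subjects, recomputing the components, and repeating would complete the procedure; that loop is omitted because it requires a full PCA.

```python
import statistics


def find_component_outliers(scores, sd_threshold=6.0):
    """Flag (subject, component) pairs lying more than `sd_threshold` standard
    deviations from that component's mean (hypothetical sketch; one pass only).

    scores: dict mapping subject id -> list of that subject's component values.
    """
    subjects = list(scores)
    n_components = len(scores[subjects[0]])
    flagged = []
    for k in range(n_components):
        column = [scores[s][k] for s in subjects]
        mean = statistics.mean(column)
        sd = statistics.stdev(column)  # sample standard deviation
        if sd == 0:
            continue
        for s, value in zip(subjects, column):
            if abs(value - mean) > sd_threshold * sd:
                flagged.append((s, k))
    return flagged


scores = {"s1": [0.0, 0.2], "s2": [0.1, -0.1], "s3": [-0.1, 0.0],
          "s4": [0.05, 0.1], "s5": [-0.05, -0.2], "s6": [3.0, 0.0]}
print(find_component_outliers(scores, sd_threshold=1.5))
```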

Processing

When you have selected all the tests and outputs you want, click the Run button to start the selected tests and correction procedures. While the analysis is running, you can press the Cancel button on the progress bar dialog to stop it.

When the tests are completed the output spreadsheet(s) will appear.

Spreadsheet Outputs

The outputs can include the following:

  • The results of the association tests will be displayed in a marker mapped spreadsheet. Each of the statistics calculated will be in its own column.
  • If you requested a principal components spreadsheet, it will be created with one row per patient or subject and one column per component. The components are sorted by eigenvalue, from largest to smallest, and only the number of components you requested is shown.
  • If you requested an eigenvalue spreadsheet from PCA, it lists the eigenvalues of the requested components from largest to smallest.
  • If you requested elimination of outlier subjects, and outliers were found, a spreadsheet will be made to list these outliers and the iteration and component in which they were found.

NOTE:

  • If you wish to see any outputs as p-value-style plots, go to the output spreadsheet and either plot individual columns of interest by right-clicking their headers and selecting Plot Variable, or select Analysis > Plot Numeric Columns to plot multiple columns at once. The column data will be plotted against genomic position in the plot viewer, which is a Genome Browser with available Annotation Tracks.