10.5 CNV Association Tests
The CNV Association Tests window (Analysis > CNV Association Tests) allows you to run association tests
and principal component analysis directly on CNV segment mean covariates with a marker map applied.
This window has the same type of output as the Numeric Association Test window (see Numeric Association
Tests).
Summary information for the dependent variable and any other relevant variables is displayed at the top of this
window for reference. This information is visible from both tabs in this dialog window.
CNV Association Tests Overview
The CNV Association Tests window offers a straightforward way of testing for associations between numeric log2 ratio or
related CNV predictors and either case/control status or a quantitative trait, using one or more statistical
measures.
In addition, the CNV Association Tests window offers correction for batch effects (and stratification) using principal
component analysis. There are also several options for applying multiple testing corrections to the p-values resulting
from the association tests.
Tests and Analysis Methods
You can choose from the following statistical tests and/or methods where appropriate:
- Correlation/Trend Test
- T-Test
- Linear Regression
- Logistic Regression
These tests and methods are described below.
Correlation/Trend Test
This test is available for both case/control and quantitative dependent variables. Apart from linear and logistic
regression, it is the only test available when principal component analysis (PCA) is used for batch
effects/stratification correction of the input data. See Principal Component Analysis Overview for more
information.
This test reports the p-value for the (possibly PCA-corrected) dependent variable having any correlation with, or
“trend” depending on, the (possibly PCA-corrected) predictor.
For case/control dependent variables, and before any PCA correction, a “case” is considered to have a value of one, and a
“control” is considered to have a value of zero.
In addition, this test reports a signed correlation value indicating the strength and direction of the dependency between
the (possibly PCA-corrected) predictor value and the (possibly PCA-corrected) dependent variable value.
See the Formulas and Theories chapter for an explanation of this statistic (Statistics Available for Genotype Association
Tests).
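As a rough illustration of how a correlation/trend test yields both a signed correlation and a p-value for one marker, here is a minimal Python sketch. It assumes the common large-sample form in which n·r² is compared with a chi-square distribution with one degree of freedom; the exact statistic SVS uses is given in the Formulas and Theories chapter. The names `pearson_r` and `trend_test` and the sample values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def trend_test(predictor, dependent):
    """Return (signed r, p-value) for one marker.

    ASSUMPTION: statistic = n * r**2, compared with a chi-square
    distribution with 1 degree of freedom (large-sample approximation).
    """
    n = len(predictor)
    r = pearson_r(predictor, dependent)
    stat = n * r * r
    # Upper tail of chi-square(1): P(X >= s) = erfc(sqrt(s / 2))
    p = math.erfc(math.sqrt(stat / 2.0))
    return r, p


# Hypothetical data: one marker's log2 ratios; cases coded 1, controls 0.
log2_ratio = [0.42, 0.35, 0.51, -0.05, 0.02, -0.11, 0.08, -0.02]
status = [1, 1, 1, 0, 0, 0, 0, 0]
r, p = trend_test(log2_ratio, status)
print(f"r = {r:.3f}, p = {p:.3g}")
```

A positive r here means cases tend to have higher log2 ratios than controls at this marker.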
T-Test
The T-Test can only be run when the dependent variable (phenotype) is binary (which may be the result of transforming a variable such as gender into a binary variable). For more information, see Statistics for Numeric Association Tests.
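A minimal sketch of a two-sample t statistic comparing predictor values between cases (coded 1) and controls (coded 0). It assumes the pooled-variance (Student) form; this section does not say which t-test variant SVS uses, so `two_sample_t` is a hypothetical illustration only. A two-sided p-value would then come from the t distribution with the returned degrees of freedom (for example `2 * scipy.stats.t.sf(abs(t), df)`).

```python
import math
from statistics import mean, variance


def two_sample_t(predictor, dependent):
    """Pooled-variance t statistic for cases (1) versus controls (0).

    ASSUMPTION: Student's equal-variance form. Returns (t, df).
    """
    cases = [x for x, y in zip(predictor, dependent) if y == 1]
    controls = [x for x, y in zip(predictor, dependent) if y == 0]
    n1, n2 = len(cases), len(controls)
    df = n1 + n2 - 2
    # Pooled sample variance; statistics.variance uses the n - 1 denominator
    sp2 = ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / df
    t = (mean(cases) - mean(controls)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


# Cases have values 1, 2, 3; controls have 0, 1, 2.
t, df = two_sample_t([1.0, 2.0, 3.0, 0.0, 1.0, 2.0], [1, 1, 1, 0, 0, 0])
print(t, df)  # t = sqrt(1.5), about 1.2247, with 4 degrees of freedom
```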
Linear/Logistic Regression
When the dependent variable is a quantitative (real- or integer-valued) trait, linear regression is available. With linear regression, a line is fit to the response as a function of the predictor’s value, and a p-value is computed for goodness of fit. The output includes not only the regression p-value but also the estimates of the intercept and slope of the regression line.
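The linear fit described above can be sketched with ordinary least squares for a single predictor. The hypothetical `simple_linear_regression` helper returns the intercept, the slope and a t statistic for the slope. It assumes that the regression p-value tests whether the slope is zero, which is the usual approach but is an assumption here; the p-value itself would come from the t distribution with n - 2 degrees of freedom.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares fit y = intercept + slope * x.

    Returns (intercept, slope, t_slope, df). t_slope tests slope == 0
    (ASSUMPTION: this is what the reported p-value is based on).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters)
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    df = n - 2
    se_slope = math.sqrt(rss / df / sxx)
    return intercept, slope, slope / se_slope, df


# Hypothetical quantitative trait against one marker's values.
intercept, slope, t_slope, df = simple_linear_regression(
    [0, 1, 2, 3, 4], [1.1, 2.9, 5.2, 6.8, 9.0]
)
print(intercept, slope, t_slope, df)
```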
When the dependent variable is a binary trait, logistic regression is available. With logistic regression, a logistic
(sigmoid) curve is fit to the predictor value, and a p-value is computed for goodness of fit. The output includes not
only the regression p-value but also the estimates for β0 (the intercept) and β1 (the coefficient of the predictor).
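For the logistic case, a minimal sketch fits β0 and β1 by Newton-Raphson and, as an assumption, computes the p-value with a likelihood-ratio test against an intercept-only model (chi-square, 1 df). The function names are hypothetical, and the sketch does not handle perfectly separated data, where a finite maximum-likelihood β1 does not exist.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_regression(x, y, max_iter=50, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1 * x; return (b0, b1, p_value).

    ASSUMPTION: p_value is a likelihood-ratio test of b1 == 0.
    Data must not be perfectly separable.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                # information matrix X^T W X
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # Newton step via 2x2 inverse
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    ll_full = sum(
        yi * math.log(_sigmoid(b0 + b1 * xi))
        + (1 - yi) * math.log(1.0 - _sigmoid(b0 + b1 * xi))
        for xi, yi in zip(x, y)
    )
    # Null model: intercept only, fitted probability = fraction of cases
    ybar = sum(y) / len(y)
    ll_null = sum(yi * math.log(ybar) + (1 - yi) * math.log(1 - ybar) for yi in y)
    stat = 2.0 * (ll_full - ll_null)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))  # chi-square(1) tail
    return b0, b1, p_value


# Hypothetical, non-separable data: log2 ratios and case/control status.
log2_ratio = [-1.0, -0.5, -0.3, 0.0, 0.2, 0.5, 1.0, 1.2]
status = [0, 0, 0, 1, 0, 1, 1, 0]
b0, b1, p = logistic_regression(log2_ratio, status)
print(b0, b1, p)
```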
Bonferroni and False Discovery Rate (FDR) multiple testing corrections can also be applied to the regression
results.
See the Formulas and Theories chapter for an explanation of these statistics (Section Linear Regression and Section Logistic
Regression).
Note on Missing Values
All missing values will be dropped from the analysis both from the predictor variables and from the dependent variable.
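One way to read this rule, assumed here rather than stated in the manual, is that a sample is excluded from a given test whenever either its predictor value or its dependent value is missing. The hypothetical helper below treats both `None` and NaN as missing.

```python
import math


def drop_missing(predictor, dependent):
    """Keep only samples where both the predictor and dependent are present.

    ASSUMPTION: None and float NaN both represent missing values.
    """
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    pairs = [(x, y) for x, y in zip(predictor, dependent)
             if not missing(x) and not missing(y)]
    return [x for x, _ in pairs], [y for _, y in pairs]


xs, ys = drop_missing([0.1, None, 0.3, float("nan")], [1, 0, None, 0])
print(xs, ys)  # [0.1] [1]
```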
Multiple Testing Corrections
When many tests are performed, some may produce significant-looking test statistics by chance alone. Multiple testing corrections adjust the resulting p-values to help guard against this. You may optionally select one or more of the following multiple testing corrections.
Bonferroni Adjustment
The Bonferroni adjustment multiplies each individual p-value by the number of times the test was performed. This
adjusted value, which is quite conservative, is an upper bound on the probability that the test would have produced a
value at least this significant by chance at least once across all the times the test was performed. (The number of
times the test was performed equals the number of markers processed. Other types of tests on the same markers are not
counted.)
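The adjustment amounts to one line per p-value. This sketch assumes adjusted values are capped at 1, which is conventional but not stated in this section; `bonferroni` is a hypothetical name, and `pvalues` should hold the results of one test type across all markers processed.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests (ASSUMPTION: capped at 1)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


print(bonferroni([0.001, 0.02, 0.5]))  # roughly [0.003, 0.06, 1.0]
```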
False Discovery Rate
The False Discovery Rate (FDR) option calculates the FDR for each statistical test selected, based on the p-values from
the original test.
A general interpretation of the FDR is “What would the rate of false discoveries (false positives) be if I accepted ALL of
the tests whose p-value is at or below the p-value of this test?”
See the Formulas and Theories chapter for an explanation of this correction procedure (Section False Discovery
Rate).
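A common way to compute per-test FDR values matching the interpretation above is the Benjamini-Hochberg step-up procedure. The sketch below assumes that procedure; the Formulas and Theories chapter describes the exact calculation SVS performs. Each returned value is the smallest FDR level at which that test would be declared a discovery. The name `bh_fdr` is hypothetical.

```python
def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted p-values (ASSUMPTION: BH step-up)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q is a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


print(bh_fdr([0.01, 0.04, 0.03, 0.5]))
```

Taking the running minimum keeps the values monotone, so a test with a smaller p-value never receives a larger FDR than one with a bigger p-value.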
Permutation Testing
Permutation testing is another way of determining if a significant test statistic value was obtained by chance
alone.
Single Value Permutation Testing
With single value permutations, the dependent variable is permuted and the given statistical test is performed on the
given marker using the permuted values. This process is repeated the number of times you select (counting the original
test as one “permutation”). The permuted p-value is the fraction of permutations in which the test came out as
significant as, or more significant than, it did with the non-permuted dependent variable.
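To make the counting rule concrete, here is a minimal single-value permutation sketch for one marker. It assumes the absolute correlation as the test statistic (larger means more significant) and counts the unpermuted test as one of the `n_perm` permutations, as described above; ties count as "as significant". The function names and data are hypothetical.

```python
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as the test statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy / math.sqrt(sxx * syy))


def single_value_permutation_p(predictor, dependent, n_perm, seed=0):
    """Permuted p-value for one marker; the original test counts as one permutation."""
    rng = random.Random(seed)
    observed = abs_corr(predictor, dependent)
    as_or_more_extreme = 1  # the unpermuted test itself
    shuffled = list(dependent)
    for _ in range(n_perm - 1):
        rng.shuffle(shuffled)  # permute the dependent variable
        if abs_corr(predictor, shuffled) >= observed:
            as_or_more_extreme += 1
    return as_or_more_extreme / n_perm


log2_ratio = [0.42, 0.35, 0.51, -0.05, 0.02, -0.11, 0.08, -0.02]
status = [1, 1, 1, 0, 0, 0, 0, 0]
p = single_value_permutation_p(log2_ratio, status, n_perm=200, seed=1)
print(p)
```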
Full Scan Permutation Testing
The full-scan permutation technique differs from the single-value technique in that it addresses the multiple testing
problem. It does this by comparing the original test result from an individual marker with the most significant
permuted results from all tested markers. The specified number of permutations are performed on the dependent variable,
and each permutation is tested with every marker. For each permutation, only the most significant test statistic among
all markers tested with that permutation is saved.
The p-value is the fraction of permutations in which this best saved value of the test statistic was more significant than
the original statistical test on the given marker.
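The full-scan variant differs only in what each marker's original statistic is compared against. This sketch again assumes absolute correlation as the statistic and, carrying over the single-value convention as an assumption, counts the unpermuted data as one permutation and counts ties. Each permutation keeps only its best statistic across all markers, and each marker's p-value is the fraction of those maxima at least as large as that marker's original statistic. Names and data are hypothetical.

```python
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as the test statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy / math.sqrt(sxx * syy))


def full_scan_permutation_p(markers, dependent, n_perm, seed=0):
    """markers: one list of predictor values per marker."""
    rng = random.Random(seed)
    observed = [abs_corr(m, dependent) for m in markers]
    # Best statistic over all markers for each permutation (original included).
    best = [max(observed)]
    shuffled = list(dependent)
    for _ in range(n_perm - 1):
        rng.shuffle(shuffled)
        best.append(max(abs_corr(m, shuffled) for m in markers))
    return [sum(b >= obs for b in best) / n_perm for obs in observed]


status = [1, 1, 1, 0, 0, 0, 0, 0]
markers = [
    [0.42, 0.35, 0.51, -0.05, 0.02, -0.11, 0.08, -0.02],  # associated marker
    [0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.2, -0.15],        # noise marker
]
pvals = full_scan_permutation_p(markers, status, n_perm=200, seed=1)
print(pvals)
```

Because every marker is compared with the same list of per-permutation maxima, a marker with a larger original statistic can never receive a larger full-scan p-value.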
See the Formulas and Theories chapter (Section Permutation Testing Methodology) for a more detailed explanation and
examples of permutation testing.
Principal Components Analysis
To use principal component analysis (PCA) to correct for batch effects or stratification, check the “Correct for Batch Effects/Stratification with PCA” box. The PCA Parameters tab becomes available once this box is checked. If the “Use Corrected Dependent” box is checked, the dependent variable is also corrected using the principal components found; if it is not checked, the dependent variable is left uncorrected. Correcting a binary dependent variable makes it continuous, so linear regression and the Correlation/Trend Test are the appropriate tests in this situation.
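As a sketch of what correcting a column with principal components can mean, the hypothetical helper below removes a column's projection onto each of a set of precomputed components, in the style of EIGENSTRAT. It assumes the components are orthonormal and orthogonal to the all-ones vector (true when they are computed from centered data); see Principal Component Analysis Overview for the method SVS actually uses. The example also shows why a corrected binary dependent variable is no longer binary.

```python
def correct_with_components(values, components):
    """Regress precomputed principal components out of one column.

    ASSUMPTION: each component is a per-sample vector, the components are
    orthonormal, and each is orthogonal to the all-ones vector.
    Returns the centered residuals.
    """
    n = len(values)
    mean = sum(values) / n
    resid = [v - mean for v in values]
    for u in components:
        # Projection coefficient of the current residual onto this component
        coef = sum(r * ui for r, ui in zip(resid, u))
        resid = [r - coef * ui for r, ui in zip(resid, u)]
    return resid


component = [0.5, 0.5, -0.5, -0.5]  # hypothetical unit-length component
print(correct_with_components([1, 0, 0, 0], [component]))  # [0.5, -0.5, 0.0, 0.0]
```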
Using the CNV Association Test Window
Summary information for the dependent variable is displayed at the top of this window for reference. This information is
visible from both tabs in this dialog window.
Data Requirements
CNV Association Tests require a marker mapped dataset containing numeric log2 ratio or related CNV data and either
case/control or quantitative trait data. To use these tests, first import your data into an SVS project (see Importing
Your Data Into A Project). Once you have the spreadsheet for this data, select the column representing the
case/control status or quantitative trait as the dependent variable (see Column States), and open the CNV
Association Tests options dialog by selecting Analysis > CNV Association Tests from the spreadsheet
menu.
Available Tabs
The CNV Association Tests window consists of two tabs:
- Association Test Parameters: This tab contains all the parameters needed for the association tests themselves, plus options for selecting principal component analysis (PCA) for batch effects/stratification correction of the test input data and for using PCA to correct the dependent variable.
- PCA Parameters: This tab contains all of the remaining parameters for principal component analysis (PCA).
NOTE: These parameters are also available in the stand-alone Numeric Principal Component Analysis window. If you wish to perform principal component analysis on your data without performing an association test, see Using the Numeric Principal Components Analysis Window.
The Association Test Parameters Tab
In the Association Test Parameters tab (see Figure 84), select all of the statistical tests you wish to perform, select
whether you wish to correct your input data for batch effects or stratification through PCA, and select any multiple-testing
corrections to apply to the results.
If an option is hidden, grayed out, or inaccessible, it is because another option you have already selected cannot be
used at the same time.
NOTE:
- Single Value Permutations and Full Scan Permutations can be run individually or together. You must provide a value for the number of permutations used in the test. When running both types of permutations together, the selected number of permutations is the same for both. The number of permutations should be greater than or equal to three. Permuted P-Values are calculated only for non-exact test statistics.
The PCA Parameters Tab
If you chose to correct for batch effects/stratification with PCA, you can select PCA parameters from this
tab (see Figure 48).
The principal components can either be computed or, if they have already been computed for the dataset, taken from an
existing spreadsheet of principal components by selecting the “Use precomputed principal components” option. See
Applying PCA to a Superset of Markers and Applying PCA to a Subset of Samples for specific limitations of this
feature.
The other options include the number of components to be found, whether to output a separate eigenvalue spreadsheet,
and whether and how to eliminate component outlier subjects and recompute components. See Principal Component
Analysis Overview for an explanation of the options for this tab.
Processing
When you have selected all the tests and outputs you wish to perform, select the Run button to start the selected tests
and correction procedures. While the association test analysis is running, you can press the Cancel button on the progress
bar dialog to stop the analysis.
When the tests are completed, the output spreadsheet(s) will appear.
Spreadsheet Outputs
The outputs can include the following:
- The results of the association tests will be displayed in a marker mapped spreadsheet. Each of the statistics calculated will be in its own column.
- If you requested a principal components spreadsheet, this will be created, with rows according to the patient or subject, and columns according to the component. These components will be sorted by eigenvalue, large to small. Only the number of components you requested will be shown.
- If you requested an eigenvalue spreadsheet from PCA, it will simply show the eigenvalues from large to small (of the number of components you requested).
- If you requested elimination of outlier subjects, and outliers were found, a spreadsheet will be made to list these outliers and the iteration and component in which they were found.
NOTE:
- If you wish to see any outputs as p-value-style plots, go to the output spreadsheet and either plot individual columns of interest by right-clicking on their headers and selecting Plot Variable, or go to Analysis > Plot Numeric Columns to plot multiple columns at once. Each column’s data will be plotted against genomic position, and the plot viewer will display it as a Genome Browser with the available Annotation Tracks.