Copy Number Association Features

The following is an overview of the main copy number analysis features in CNAM. For an in-depth, step-by-step workflow showing how to use these features, download the following tutorial:

Whole Genome Copy Number Association Workflow

Data Import and Normalization

The first step in copy number analysis is to create a set of log ratios by normalizing probe intensity data against a reference set. CNAM offers direct support for Illumina and Affymetrix platforms, with additional functionality for other providers.

Affymetrix: CNAM substantially replicates the Affymetrix workflow for converting CEL files to log ratios, including:

  • Quantile normalization (without gender bias)
  • Virtual Array Generation (merging CN and SNP data, or NSP and STY)
  • Normalizing log ratios against reference populations (samples can be their own reference for all platforms)

This workflow supports 500k, SNP 5.0, and SNP 6.0 arrays. It is also fast enough to handle thousands of samples: 2,000 samples can be processed overnight. If preferred, CNAM can also read CNT and CNCHP files produced by Affymetrix's CNAT Batch Analysis and Genotyping Console Software 2.0 tools.
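The normalization steps listed above can be sketched in a few lines. This is a minimal illustration, not CNAM's or Affymetrix's implementation: the function names are hypothetical, ties between equal intensities are ignored, and the per-probe reference is assumed to be the median across the reference samples.

```python
import math
from statistics import mean, median

def quantile_normalize(samples):
    """Force every sample onto the same intensity distribution.

    samples: list of equal-length lists of probe intensities, one per sample.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Target distribution: the mean intensity at each rank across samples.
    target = [mean(s[rank] for s in sorted_samples) for rank in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = target[rank]
        normalized.append(out)
    return normalized

def log_ratios(samples, reference):
    """log2 of each intensity over the per-probe reference median (an assumption)."""
    n = len(samples[0])
    ref = [median(r[i] for r in reference) for i in range(n)]
    return [[math.log2(s[i] / ref[i]) for i in range(n)] for s in samples]

# Two toy samples with three probes each; the samples act as their own reference.
normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
ratios = log_ratios(normalized, reference=normalized)
```

Passing the normalized samples as their own reference mirrors the option mentioned in the list above.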

For more details, see Workflow for Reading Affymetrix CEL Files in the latest Manual.

Illumina: A BeadStudio plug-in, developed in a joint effort with Illumina, provides the ability to export Illumina intensity data directly into Golden Helix’s DSF format. Chromosomes can be exported all at once or individually during the export process.

For more details, see Exporting Copy Number Data using the HelixTree DSF Plug-In in the latest Manual.

Other Platforms: You may also create a minimal data file with your own normalized log ratio data using the Affymetrix CNT file format.

Filtering Problematic Markers

It is sometimes desirable to filter "problematic" markers before performing association, such as markers with low call rates or markers that appear gender-associated because of a poorly randomized experiment. CNAM provides a straightforward method for identifying these markers and excluding them from analysis.

Correcting for CNV-Based Batch Effects and Stratification

As with SNPs in HelixTree, CNAM can correct log ratios for batch effects and stratification using an Eigenstrat-based principal component analysis (PCA) method.
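The general idea behind a PCA-based correction can be sketched as follows: center the log ratios per marker, find the dominant axis of variation across samples, and subtract each marker's projection onto that axis. This is only an illustration under stated assumptions. It removes a single component found by power iteration, the function name is hypothetical, and CNAM's actual method (number of components, outlier handling) is not described here.

```python
import math

def remove_top_component(logr, iterations=200):
    """Return marker-centered log ratios with the top principal component removed.

    logr: list of samples, each a list of per-marker log ratios.
    """
    n, m = len(logr), len(logr[0])
    col_means = [sum(row[j] for row in logr) / n for j in range(m)]
    x = [[row[j] - col_means[j] for j in range(m)] for row in logr]
    # Sample-by-sample covariance (up to a constant factor).
    cov = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
           for i in range(n)]
    # Power iteration from an arbitrary non-constant start vector.
    v = [1.0 + i for i in range(n)]
    for _ in range(iterations):
        w = [sum(cov[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return x  # no variation left to remove
        v = [c / norm for c in w]
    # Project every marker onto the component and keep the residuals.
    loadings = [sum(v[i] * x[i][j] for i in range(n)) for j in range(m)]
    return [[x[i][j] - v[i] * loadings[j] for j in range(m)] for i in range(n)]

# Toy data: samples 0 and 1 carry a batch shift of about +1 on every marker.
batchy = [[1.0, 1.1, 0.9], [1.0, 0.9, 1.1], [0.0, 0.1, -0.1], [0.0, -0.1, 0.1]]
corrected = remove_top_component(batchy)
```

In the toy data the batch shift dominates the variation, so the residuals retain only the small marker-level differences.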

Whole Genome Log Ratio Association Tests

In addition to performing association tests on copy number covariates using segmentation (below), CNAM offers a straightforward approach for performing whole genome single marker association tests (correlation/trend, t-test, and regression) directly on log2 ratios. Further, a median smoothing script is available and can be applied to real-valued columns (e.g., p-values) to significantly improve signal-to-noise ratios.
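As an illustration of median smoothing, the sketch below replaces each value with the median of a sliding window centered on it. The function name, the default window size, and the shrinking window at the ends are assumptions for this example, not details of the CNAM script.

```python
from statistics import median

def median_smooth(values, window=5):
    """Replace each value with the median of its neighbors in marker order.

    The window shrinks at the ends of the sequence rather than padding.
    """
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

# A single spike (9) is removed, while the sustained shift to 5 is kept.
smoothed = median_smooth([1, 1, 9, 1, 1, 5, 5, 5], window=3)
```

Isolated outliers are suppressed while runs of consistently elevated values survive, which is why smoothing a column such as -log10 p-values can make regional signals easier to see.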

Identifying Copy Number Variations Using Dynamic Optimal Segmenting

CNAM employs an optimal segmenting algorithm that utilizes dynamic programming to exhaustively search through all possible change-points in log ratio data to discover regions of markers in which log ratios vary significantly from region to region. These regions will generally be where there is copy number variation in your data. CNAM employs robust permutation testing to verify found copy number variations.
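The dynamic programming idea can be illustrated with a small change-point search. This sketch assumes a penalized least-squares criterion (sum of squared deviations within each segment plus a fixed penalty per segment). The criterion CNAM actually uses and its permutation testing are not shown, and the function name and `penalty` parameter are hypothetical.

```python
def find_change_points(values, penalty):
    """Exact segmentation minimizing within-segment squared error plus a penalty.

    Returns the indices at which a new segment starts (excluding index 0).
    """
    n = len(values)
    # Prefix sums give the cost of any segment in constant time.
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        # Sum of squared deviations from the mean of values[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n   # best[j]: optimal cost of values[:j]
    prev = [0] * (n + 1)                # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):              # consider every possible last segment
            candidate = best[i] + cost(i, j) + penalty
            if candidate < best[j]:
                best[j], prev[j] = candidate, i

    # Walk back through the stored segment starts.
    starts, j = [], n
    while j > 0:
        starts.append(prev[j])
        j = prev[j]
    return sorted(starts)[1:]

# Toy log ratios with a gained region spanning markers 4 through 7.
change_points = find_change_points([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0], penalty=0.5)
```

Because every possible last segment is considered for every prefix, the search is exhaustive over all change-point placements while remaining quadratic in the number of markers.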

CNAM offers two types of segmenting methods, univariate and multivariate. These methods are based on the same algorithm, but use different criteria for determining cut-points.

Multivariate Segmenting
The multivariate method is preferable for finding very small copy number regions (even single marker resolution), and for finding conserved regions that may be useful for association studies.

Univariate Segmenting
The univariate method segments each sample separately, finding the cut-points of each segment for each sample and outputting a spreadsheet showing all cut-points found among all samples.

For more details, see The Copy Number Segmentation Algorithm in the latest HelixTree Manual.

Association Analysis on Copy Number Variation Covariates

Association analysis on the copy number variations found by segmenting log ratios is performed via segmenting association and regression in HelixTree. Similar to how the segmenting algorithm in CNAM identifies copy number variations from log ratios, HelixTree employs segmenting to find the optimal split(s) in a dataset whereby the mean(s) of the resulting subgroups differ based on a given variable (e.g. mean intensity values for a given marker). HelixTree then performs the appropriate association test to determine whether the differences in means among the subgroups are statistically significant.

HelixTree is unique in that it can use this segmenting association approach to find multiway splits supported by the data. This is especially powerful given that a deletion or duplication is a disruption, and a disruption may lead to a higher incidence of disease. In cases where a basic test of association, such as regression or a t-test, may not find anything, a multiway split may find that patients with a segment log ratio mean near zero (copy number two) have a lower incidence of disease than patients with a higher or lower log ratio mean (the tails).
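The scenario described above, where both tails carry higher risk than the middle, can be illustrated with a brute-force three-way split. This is a sketch under stated assumptions rather than HelixTree's method: it scores every pair of cut-points with a plain chi-square statistic, applies no multiple-testing adjustment, and assumes the data contain both affected and unaffected patients. All names are hypothetical.

```python
def chi_square(groups):
    """Chi-square statistic for a table of (cases, controls) per subgroup."""
    total_cases = sum(cases for cases, _ in groups)
    total_controls = sum(controls for _, controls in groups)
    total = total_cases + total_controls
    stat = 0.0
    for cases, controls in groups:
        size = cases + controls
        for observed, column_total in ((cases, total_cases),
                                      (controls, total_controls)):
            expected = size * column_total / total
            stat += (observed - expected) ** 2 / expected
    return stat

def best_three_way_split(segment_means, affected):
    """Try every pair of cut-points on the sorted segment means.

    Returns (statistic, (low_cut, high_cut)), where each cut is the smallest
    segment mean belonging to the middle and upper group respectively.
    """
    pairs = sorted(zip(segment_means, affected))
    status = [a for _, a in pairs]
    n = len(pairs)
    best = (0.0, None)
    for i in range(1, n - 1):
        for j in range(i + 1, n):
            groups = []
            for part in (status[:i], status[i:j], status[j:]):
                cases = sum(part)
                groups.append((cases, len(part) - cases))
            stat = chi_square(groups)
            if stat > best[0]:
                best = (stat, (pairs[i][0], pairs[j][0]))
    return best

# Toy data: patients with losses or gains are affected; those near zero are not.
means = [-0.6, -0.5, -0.4, -0.05, 0.0, 0.05, 0.02, 0.4, 0.5, 0.6]
affected = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
stat, cuts = best_three_way_split(means, affected)
```

A linear trend test on these toy data would see no association, because the affected patients sit symmetrically in both tails, whereas the three-way split separates them cleanly.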

Visualizing Segmenting Results

There are tools available outside CNAM to visualize your copy number segmenting analysis results. Using the optional wiggle track file output or the CSV export of the segments data set, you can directly import your results into the following genome viewers:

  • UCSC Genome Browser
  • Illumina BeadStudio Genome Viewer
  • Affymetrix Integrated Genome Browser (IGB)
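For readers who export segments as CSV and want a wiggle track of their own, the sketch below writes segments in the UCSC wiggle `variableStep` layout, using one declaration line per segment so that each segment can have its own span. The exact layout of CNAM's wiggle output is not described here, so the function name, the track name, and the input tuple format are assumptions for illustration.

```python
def segments_to_wiggle(segments, track_name="cnv_segments"):
    """Format segments as a UCSC wiggle track.

    segments: iterable of (chrom, start, end, mean_log_ratio) with 1-based,
    inclusive coordinates (an assumed input layout).
    """
    lines = [f'track type=wiggle_0 name="{track_name}"']
    for chrom, start, end, value in segments:
        span = end - start + 1
        # One declaration per segment so that each can have its own span.
        lines.append(f"variableStep chrom={chrom} span={span}")
        lines.append(f"{start} {value:.4f}")
    return "\n".join(lines) + "\n"

# A single 1,000-base segment with a mean log ratio of -0.5.
wiggle_text = segments_to_wiggle([("chr1", 1000, 1999, -0.5)])
```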

For more details, see Visualizing Copy Number Analysis Results in the latest Manual.