Copy Number Analysis Overview

25.1.1 Copy Number Variation

Normally, each base pair is present in two copies, one on each chromosome of a pair. (In males, a base pair on the X chromosome normally has only one copy.) Even if the two copies carry different alleles, there are still considered to be two copies.

However, under certain circumstances, notably in some diseases, a base pair, or even an entire chromosome, may be present in more than two copies, in only one copy, or may be deleted entirely. The number of copies of a base pair is termed its “copy number”, and variation in copy number is termed “copy number variation”.

When a sample is scanned on a microarray, the more copies of a base pair there are, the higher the total intensity will be, regardless of which alleles are present if the marker is a polymorphism. Considerable processing is typically required to transform the raw intensity data into a quantile-normalized log base-2 ratio of the observed intensities versus those of a reference population. When an observation's intensity equals the reference population median for a given marker, the log2 ratio is zero. Amplifications relative to the reference yield log2 ratios significantly greater than zero, and deletions yield log2 ratios significantly less than zero.
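The last two steps of that processing can be illustrated with a minimal Python sketch. It assumes a markers × arrays matrix of raw intensities, quantile normalization applied to all arrays together, and a reference population whose per-marker median is the denominator of the ratio. The function names are hypothetical, and this is not CNAM's actual preprocessing, which involves more steps. The helper `expected_log2_ratio` shows idealized values against a diploid reference: one copy gives log2(1/2) = -1, three copies give log2(3/2) ≈ 0.585, and zero copies give negative infinity.

```python
import numpy as np

def quantile_normalize(intensities):
    """Quantile-normalize a markers x arrays matrix so that every array
    (column) ends up with the same intensity distribution."""
    x = np.asarray(intensities, dtype=float)
    order = np.argsort(x, axis=0)          # row indices that sort each column
    ranks = np.argsort(order, axis=0)      # rank of each value within its column
    mean_by_rank = np.sort(x, axis=0).mean(axis=1)
    return mean_by_rank[ranks]             # replace each value by the mean at its rank

def log2_ratios(observed, reference):
    """log2 of observed intensities over the per-marker reference median.

    observed: markers x samples; reference: markers x reference samples.
    """
    ref_median = np.median(reference, axis=1, keepdims=True)
    return np.log2(observed / ref_median)

def expected_log2_ratio(copy_number, reference_copies=2):
    """Idealized log2 ratio for a copy number against a diploid reference.
    Zero copies gives -inf (NumPy also emits a divide-by-zero warning)."""
    return np.log2(copy_number / reference_copies)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.lognormal(mean=8.0, sigma=0.3, size=(1000, 10))  # simulated intensities
    norm = quantile_normalize(raw)                   # normalize all arrays together
    ratios = log2_ratios(norm[:, :2], norm[:, 2:])   # arrays 0-1 vs reference arrays 2-9
    print(ratios.shape)                              # (1000, 2)
    print([round(float(expected_log2_ratio(c)), 3) for c in (1, 2, 3, 4)])
    # [-1.0, 0.0, 0.585, 1.0]
```

Because the ratio is taken against the reference median, a marker whose observed intensity matches that median has a log2 ratio of exactly zero, and doubling or halving the intensity moves the ratio to +1 or -1.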

25.1.2 The Copy Number Analysis Module (CNAM)

The Copy Number Analysis Module (CNAM) scans microarray log2 ratio data from both the Affymetrix and Illumina platforms to determine where copy number variation occurs, and then performs association analysis on the log2 ratios over these regions of copy number variation. Your own log2 ratio data (possibly from other platforms) can also be analyzed by CNAM once it has been converted to the Affymetrix CNT text file format (C.4).
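CNAM's own optimal segmenting algorithm is not specified here. As a stand-in, the sketch below uses a generic penalized least-squares segmentation ("optimal partitioning" by dynamic programming) on a single sample's log2 ratios, assumed to be ordered by marker position along one chromosome. The function `optimal_partition` and its `penalty` parameter are hypothetical; a larger penalty yields fewer segments.

```python
import numpy as np

def optimal_partition(values, penalty):
    """Segment an ordered series into constant-mean pieces by minimizing
    (sum of within-segment squared errors) + penalty * (number of segments).

    Exact dynamic program, O(n^2). Returns (start, end) index pairs, end exclusive.
    """
    x = np.asarray(values, dtype=float)
    n = len(x)
    # Prefix sums give any segment's squared error in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def sse(i, j):
        """Squared error of x[i:j] around its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = np.full(n + 1, np.inf)             # best[j]: optimal cost of x[:j]
    best[0] = 0.0
    last_start = np.zeros(n + 1, dtype=int)   # start of the final segment of x[:j]
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + sse(i, j) + penalty
            if cost < best[j]:
                best[j], last_start[j] = cost, i

    # Walk back from the end to recover the segment boundaries.
    segments, j = [], n
    while j > 0:
        i = int(last_start[j])
        segments.append((i, j))
        j = i
    return segments[::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated log2 ratios: normal, a one-copy gain (about log2(3/2)), normal again.
    truth = np.r_[np.zeros(40), np.full(20, 0.585), np.zeros(40)]
    noisy = truth + rng.normal(scale=0.1, size=truth.size)
    for start, end in optimal_partition(noisy, penalty=0.5):
        print(start, end, round(float(noisy[start:end].mean()), 3))
```

For example, `optimal_partition([0, 0, 0, 1, 1, 1], penalty=0.5)` returns `[(0, 3), (3, 6)]`: two flat segments cost 2 × 0.5 = 1.0, which beats a single segment's squared error of 1.5 plus its 0.5 penalty.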

The workflow is as follows:

  • Prepare a “.dsf” file of log2 ratio data from your Affymetrix or Illumina data.
  • Execute the copy number optimal segmenting algorithm on your “.dsf” data file, which:
    • Scans the log2 ratio data and segments it to determine regions of probable copy number variation. In addition to the normal progress bar, a log is displayed showing its progress as it segments sub-regions of the marker region being analyzed.
    • Imports into HelixTree a new covariates spreadsheet containing, for each sample, the average log2 ratio over each region.
    • Optionally exports a UCSC Wiggle file containing the segment positions and means.
  • Import another spreadsheet containing phenotypic data, such as case/control status.
  • Join the covariates spreadsheet with the phenotype spreadsheet so that the phenotypic data comes first.
  • Perform association analysis between a phenotype and the covariates in the joined spreadsheet.
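The data flow of the later workflow steps can be sketched in Python, assuming the segments are already known as (start, end) marker-index pairs. The sketch averages each sample's log2 ratios over each segment to build the covariates, writes one sample's segment means in UCSC wiggle `fixedStep` form, joins a binary case/control phenotype in front of the covariates by sample ID, and computes a Welch t statistic per segment as a simple stand-in association test. All function names are hypothetical; the exact layout of HelixTree's wiggle export and the association tests HelixTree offers are not reproduced here.

```python
import numpy as np

def segment_covariates(log2, segments):
    """Average each sample's log2 ratios over each segment.

    log2: samples x markers array; segments: (start, end) marker indices,
    end exclusive. Returns a samples x segments array of covariates.
    """
    log2 = np.asarray(log2, dtype=float)
    return np.column_stack([log2[:, s:e].mean(axis=1) for s, e in segments])

def wiggle_lines(chrom, positions, segments, means, name="CN segments"):
    """UCSC wiggle text with one fixedStep block per segment, covering the
    segment from its first to its last marker position (1-based)."""
    lines = [f'track type=wiggle_0 name="{name}"']
    for (s, e), m in zip(segments, means):
        start, stop = positions[s], positions[e - 1]
        span = stop - start + 1
        lines.append(f"fixedStep chrom={chrom} start={start} step={span} span={span}")
        lines.append(f"{m:.4f}")
    return lines

def join_phenotype_first(pheno_ids, phenotype, cov_ids, covariates):
    """Inner-join on sample ID; column 0 is the phenotype, then the covariates."""
    pheno = dict(zip(pheno_ids, phenotype))
    rows = [[pheno[sid], *row] for sid, row in zip(cov_ids, covariates) if sid in pheno]
    return np.array(rows, dtype=float)

def welch_t(joined):
    """Welch t statistic per covariate: cases (phenotype 1) minus controls (0)."""
    y, X = joined[:, 0], joined[:, 1:]
    cases, controls = X[y == 1], X[y == 0]
    diff = cases.mean(axis=0) - controls.mean(axis=0)
    se = np.sqrt(cases.var(axis=0, ddof=1) / len(cases)
                 + controls.var(axis=0, ddof=1) / len(controls))
    return diff / se

if __name__ == "__main__":
    ids = ["s1", "s2", "s3", "s4"]
    log2 = np.array([[0.0, 0.1, 1.0, 0.9],
                     [0.1, 0.0, 0.0, 0.1],
                     [0.0, -0.1, 1.1, 1.0],
                     [-0.1, 0.0, 0.1, -0.1]])
    segments = [(0, 2), (2, 4)]
    covs = segment_covariates(log2, segments)
    # Wiggle track for sample s1 only.
    print("\n".join(wiggle_lines("chr1", [1000, 2000, 3000, 4000], segments, covs[0])))
    joined = join_phenotype_first(ids, [1, 0, 1, 0], ids, covs)
    print(welch_t(joined))   # near 0 for segment 1, large and positive for segment 2
```

Putting the phenotype in column 0 mirrors the workflow's requirement that the phenotypic data come first, so the test function can split the response from the covariates by position alone.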