Created:
March 27, 2008

Updated:
July 16, 2008

User Level:

Intermediate

Products:
HelixTree, CNAM, WGA Module

Step 5. Run Segmenting Algorithm to Generate Segmenting Mean Values and Covariate Table

CNAM employs an optimal segmenting algorithm that uses dynamic programming to exhaustively search all possible change-points in LogR data and discover regions of markers whose LogRs differ significantly from those of neighboring regions. These regions generally correspond to copy number variation in your data.
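To make the idea of an exhaustive dynamic-programming search concrete, here is a minimal sketch of classic least-squares optimal segmentation. It assumes a fixed number of segments `k` and a within-segment squared-error criterion. CNAM's actual criterion and its rule for choosing the number of segments are not described here, so treat `optimal_segmentation` as a hypothetical illustration, not the product's implementation.

```python
from itertools import accumulate


def optimal_segmentation(logr, k):
    """Split logr into k contiguous segments that minimize the total
    within-segment squared error. Returns the cut-points, i.e. the
    marker indices where each new segment starts (hypothetical helper)."""
    n = len(logr)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(logr)")
    # Prefix sums give the squared error of any segment in O(1).
    s = [0.0] + list(accumulate(logr))
    s2 = [0.0] + list(accumulate(x * x for x in logr))

    def sse(i, j):
        # Squared error of logr[i:j] around its own mean.
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[m][j] = best cost of splitting logr[:j] into m segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = cost[m - 1][i] + sse(i, j)
                if c < cost[m][j]:
                    cost[m][j], back[m][j] = c, i
    # Follow the back-pointers to recover the k - 1 cut-points.
    cuts, j = [], n
    for m in range(k, 1, -1):
        j = back[m][j]
        cuts.append(j)
    return sorted(cuts)
```

For example, `optimal_segmentation([0.0, 0.1, -0.1, 0.8, 0.9, 0.7, 0.0, 0.05], 3)` returns `[3, 6]`, isolating the elevated middle block. The triple loop costs roughly O(k·n²) for n markers, which suggests why subdividing long marker regions into windows (the first level below) helps speed.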

The segmenting process is optimized by working at three levels:

  1. If desired, the region of markers is subdivided into a moving window of sub-regions.
  2. A unique segmenting algorithm is applied to find multiple segments wherever possible and, by implication, the segment boundaries, which we term “cut-points”.
  3. A permutation algorithm is applied to validate the found cut-points.
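The permutation step can be illustrated with a generic max-statistic permutation test: shuffle the LogR values many times and ask how often the shuffled data produce a split at least as strong as the best split in the real data. This is a sketch under that assumption; the function names and the standardized mean-difference statistic are hypothetical choices, not CNAM's documented procedure.

```python
import math
import random
from statistics import mean


def split_stat(values, cut):
    """Standardized difference in mean LogR on either side of a cut-point."""
    left, right = values[:cut], values[cut:]
    scale = math.sqrt(len(left) * len(right) / len(values))
    return scale * abs(mean(left) - mean(right))


def best_split_stat(values):
    """Strongest split over every possible single cut-point."""
    return max(split_stat(values, c) for c in range(1, len(values)))


def permutation_p_value(values, n_perm=1000, seed=0):
    """Fraction of shuffles whose best split is at least as strong as the
    observed best split (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = best_split_stat(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys positional structure
        if best_split_stat(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Taking the maximum over all cut-points in every shuffle matters: it accounts for having searched every position, so a cut-point is only accepted when it is stronger than the best split that random ordering would typically produce.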

CNAM offers two types of segmenting methods, univariate and multivariate. These methods are based on the same algorithm, but use different criteria for determining cut-points.

The multivariate method segments all samples simultaneously, finding general copy number regions that may be similar across all samples. This method is preferable for finding very small copy number regions, and for finding conserved regions that may be useful for association studies. For a given sample, the covariate for each segment is the mean of that sample's LogRs within the segment. If copy number variation occurs at consistent positions across multiple samples, the multivariate method will find those segments.
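One plausible way to picture "segmenting all samples simultaneously" is a cost that sums each sample's within-segment squared error over a shared set of boundaries, followed by per-sample segment means as covariates. The sketch below assumes that formulation; `joint_sse` and `covariate_table` are hypothetical names, and CNAM's exact multivariate criterion may differ.

```python
from statistics import mean


def joint_sse(logr_by_sample, start, end):
    """Squared error of markers [start, end) summed over all samples.
    Each sample keeps its own mean, but all samples share the boundaries."""
    total = 0.0
    for values in logr_by_sample.values():
        seg = values[start:end]
        m = mean(seg)
        total += sum((v - m) ** 2 for v in seg)
    return total


def covariate_table(logr_by_sample, cuts):
    """Per-sample mean LogR within each shared segment (the covariates).
    cuts are marker indices where new segments start."""
    table = {}
    for sample, values in logr_by_sample.items():
        bounds = [0, *cuts, len(values)]
        table[sample] = [mean(values[a:b]) for a, b in zip(bounds, bounds[1:])]
    return table
```

Substituting a joint cost like this for a single-sample cost in a dynamic-programming search yields cut-points that best fit all samples at once, which is why segments shared by many samples stand out.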

In reality, there may not always be consistent copy number segments across multiple samples. The univariate method segments each sample separately, finding the cut-points of each segment for each sample and outputting a spreadsheet showing all cut-points found among all samples.
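Because the univariate method reports cut-points from every sample in one spreadsheet, it can help to think of the output as the union of per-sample cut-point lists. The sketch below assumes each sample's cut-points are already known as marker positions; `union_cut_points` is a hypothetical helper, and the real spreadsheet layout may differ.

```python
from collections import defaultdict


def union_cut_points(cuts_by_sample):
    """For every cut-point position found in any sample, list the samples
    in which it was found, sorted by position."""
    table = defaultdict(list)
    for sample, cuts in cuts_by_sample.items():
        for pos in cuts:
            table[pos].append(sample)
    return sorted((pos, sorted(samples)) for pos, samples in table.items())
```

A position shared by many samples in this table hints at a common copy number region, whereas positions unique to one sample reflect sample-specific variation.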

This tutorial focuses on performing multivariate segmentation.

Performing Copy Number Segmentation
From the Project Navigator window, select >CNAM >Copy Number Segmentation. The Copy Number Segmentation window will appear (Figure 1).


Figure 1. Copy Number Segmentation window

As with LogR association tests, you can perform segmentation on any LogR DSF file, though we recommend first correcting for batch effects and stratification and filtering out problematic markers as was done in Steps 2 and 3.

Browse to the LogR DSF file created in Step 3 and select the Chromosomes you want to segment.

Note: Segmentation takes a fair amount of time to run. You may want to start with only those chromosomes that showed interesting regions from the LogR association tests in Step 4.

The next set of parameters, Chromosome Segmenting Options, allows you to choose which method you want to use for segmenting (multivariate or univariate) and to optimize the speed of the analysis. For the purpose of this tutorial, select the Multivariate algorithm and leave the rest of the parameters as the default values. For more information about these parameters, see the following:

›› Using the Copy Number Analysis Segmentation Tool

Next, click on the Optional Output tab. Check the Optional Bookmark File Output box if you want to export the segment means to a UCSC Wiggle Track (WIG) format file. WIG files allow you to view segment results in supported genome browsers such as UCSC’s Genome Browser and Affymetrix’s Genome Browser. Viewing segment results in UCSC’s Genome Browser will be covered in Step 8. Viewing segmentation results in Affymetrix's GTC Browser is covered in the following tutorial:

›› Visualizing Copy Number Variation in the Affymetrix Genome Browser

NOTE: If you are considering using UCSC’s Genome Browser, be aware that it has limitations that depend on the parameter settings chosen in this step. Please see Step 8 before proceeding.

Click Browse, navigate to the location where you want to save the WIG file, enter a file name, and click Save.

Note: If you output WIG files while using the univariate segmenting algorithm, the Browse button will prompt you to select a directory, because a separate WIG file is generated for each sample. Each file is named after its sample.
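For readers curious about what such a file contains, here is a sketch that writes segment means in UCSC's variableStep wiggle format, one file per sample named after the sample. It assumes segments are given as 1-based inclusive (start, end, mean) tuples; the helper names are hypothetical, and CNAM's own WIG output may be laid out differently.

```python
from pathlib import Path


def wig_lines(chrom, segments, track_name):
    """Build variableStep wiggle lines for (start, end, mean) segments,
    using 1-based inclusive positions."""
    lines = [f'track type=wiggle_0 name="{track_name}"']
    for start, end, value in segments:
        # A new declaration line per segment lets each segment have its own span.
        lines.append(f"variableStep chrom={chrom} span={end - start + 1}")
        lines.append(f"{start} {value:.4f}")
    return lines


def write_sample_wigs(out_dir, chrom, segments_by_sample):
    """Write one WIG file per sample, named after the sample."""
    out = Path(out_dir)
    for sample, segments in segments_by_sample.items():
        text = "\n".join(wig_lines(chrom, segments, sample)) + "\n"
        (out / f"{sample}.wig").write_text(text)
```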

If you wish to exclude additional markers other than those already excluded in Step 3, go to the Exclude Markers tab, Browse to the CSV file with the additional markers and click Open.
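The effect of the Exclude Markers tab can be pictured as a simple set filter. This sketch assumes the CSV lists one marker name per row in its first column; that layout, and the helper names, are assumptions rather than CNAM's documented format. A header row, if present, would simply be treated as a marker name that matches nothing.

```python
import csv


def load_excluded(lines):
    """Collect marker names from the first column of CSV rows (assumed layout).
    `lines` can be an open file or any iterable of strings."""
    return {row[0].strip() for row in csv.reader(lines) if row and row[0].strip()}


def drop_excluded(markers, excluded):
    """Keep markers in their original order, skipping excluded ones."""
    return [m for m in markers if m not in excluded]
```

Typical use would be `with open(path, newline="") as f: excluded = load_excluded(f)`.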

You are now ready to run segmentation. Click Run.

Figure 2a. Copy number segment covariate spreadsheet.
Figure 2b. Segment means spreadsheet.
When segmentation has completed, two spreadsheets will pop up. The first is a copy number segment covariate spreadsheet (Figure 2a), which contains the mean LogR value for each sample within each segment of markers. The second is a segment means spreadsheet (Figure 2b), which contains columns for the chromosome name, segment start position, segment end position, segment mean, and segment length in number of markers. If you chose to output a WIG file, it will be stored at the path you selected above.
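The rows of the segment means spreadsheet can be sketched from marker positions, per-sample LogRs, and cut-points. Here the "segment mean" is assumed to be the mean LogR pooled over all samples and markers in the segment; the source does not define it precisely, and `segment_means_table` is a hypothetical helper.

```python
from statistics import mean


def segment_means_table(chrom, positions, logr_by_sample, cuts):
    """Rows of (chromosome, start, end, segment mean, markers) per segment.
    cuts are marker indices where new segments start; the segment mean is
    assumed to pool all samples and markers in the segment."""
    bounds = [0, *cuts, len(positions)]
    rows = []
    for a, b in zip(bounds, bounds[1:]):
        pooled = [v for values in logr_by_sample.values() for v in values[a:b]]
        rows.append((chrom, positions[a], positions[b - 1], mean(pooled), b - a))
    return rows
```

For two samples `{"S1": [0, 0, 1, 1], "S2": [0, 0, 0.5, 0.5]}` at positions 100 to 400 with a cut at index 2, this yields one row per segment: `("chr1", 100, 200, 0, 2)` and `("chr1", 300, 400, 0.75, 2)`.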

From here you can visualize the copy number segments found in a genome browser (Step 8), discretize the segment covariate spreadsheet to create additional two-state or three-state covariates (Step 6), or move on to performing association analysis on the copy number segment covariates (Step 7).
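As a preview of what discretization means, the sketch below maps a segment's mean LogR to a three-state call (loss, neutral, gain) or a two-state call (changed or not). The cutoffs of ±0.2 are purely hypothetical placeholders; the thresholds to use are a separate decision covered in Step 6.

```python
def three_state(value, loss_cutoff=-0.2, gain_cutoff=0.2):
    """Map a segment mean LogR to -1 (loss), 0 (neutral) or 1 (gain).
    The default cutoffs are hypothetical placeholders."""
    if value <= loss_cutoff:
        return -1
    if value >= gain_cutoff:
        return 1
    return 0


def two_state(value, cutoff=0.2):
    """1 if the segment deviates from neutral in either direction, else 0."""
    return int(abs(value) >= cutoff)
```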