Copy Number Analysis Examples and Tutorials
25.9.1 Univariate Segmenting Example
This example will demonstrate the univariate segmenting algorithm for copy number analysis (25.10). In this example, we will import the segment mean data calculated using the univariate segmenting algorithm. We will then view the segments found using the UCSC Genome Browser.
25.9.1.1 Performing Univariate Segmenting on Two Samples
For this example, we import the segment mean data for a DSF containing two samples, each with 100 markers. To import the segmenting results, we open CNAM->Copy Number Analysis (25.3) from the Project View. We select our DSF file and choose the Univariate algorithm for the analysis. From the Optional Output tab, we choose to output the segmenting results as Wiggle track files for import into the UCSC Genome Browser.
When the analysis is complete, the Run Log shows that two segments were found in sample1 and five segments were found in sample2. The resulting Segment Means spreadsheet shows where the segments occur and the mean log2 ratio of each segment.
Note that our example data set contains two samples, each with different segmenting results. To show both samples in the same spreadsheet, the markers are divided at every segment boundary found in any sample. As a result, an individual sample may show adjacent segments with identical segment means.
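To read one sample's own segments back out of such a combined spreadsheet, adjacent rows that share a segment mean can be collapsed. The following is a minimal sketch in Python, not part of the software; the row layout (first marker, last marker, mean log2 ratio) and the values are assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical rows for ONE sample taken from a combined Segment Means
# spreadsheet: (first_marker, last_marker, mean_log2_ratio).
# The values are illustrative only, not results from the example data.
rows = [
    (1, 20, 0.02),
    (21, 45, 0.02),
    (46, 60, 0.61),
    (61, 100, 0.61),
]

def merge_adjacent(rows):
    """Collapse runs of consecutive rows that share the same segment mean."""
    merged = []
    for seg_mean, run in groupby(rows, key=lambda r: r[2]):
        run = list(run)
        # Keep the first marker of the run and the last marker of the run.
        merged.append((run[0][0], run[-1][1], seg_mean))
    return merged

sample_segments = merge_adjacent(rows)
print(sample_segments)
```

The comparison is an exact equality test on the means, which is appropriate only when the duplicated rows carry the same stored value.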
25.9.1.2 Viewing the Segmenting Results in the Genome Browser
To view the segmenting results graphically, we use the UCSC Genome Browser (25.8.2). Using the Manage Custom Tracks feature of the genome browser, we upload the WIG files containing the segmenting results. Because we performed univariate segmenting on two samples, the analysis produced two WIG files. Both files can be viewed together in the genome browser. The resulting plot shows the segments and segment means for both samples.
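For readers unfamiliar with Wiggle tracks, the sketch below builds a minimal variableStep track in the general format accepted by the UCSC Genome Browser. It illustrates the file format only; the exact contents of the files written by the analysis may differ, and the track name, chromosome, positions, and values are hypothetical.

```python
def wiggle_lines(track_name, chrom, positions, values):
    """Return the lines of a minimal variableStep Wiggle track."""
    lines = [
        f'track type=wiggle_0 name="{track_name}"',
        f"variableStep chrom={chrom}",
    ]
    # One "position value" line per marker; positions are 1-based.
    for pos, val in zip(positions, values):
        lines.append(f"{pos} {val:.3f}")
    return lines

# Hypothetical data: three markers on chr1 with their segment means.
positions = [1000, 2000, 3000]
segment_means = [0.02, 0.02, 0.61]

wig = wiggle_lines("sample1", "chr1", positions, segment_means)
print("\n".join(wig))
```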
25.9.1.3 Checking the Results Against a Tree Analysis
To check the univariate segmenting results, we use the same data in a tree analysis to find the significant splits. We import a file for each sample containing the same log2 ratio data. To create this spreadsheet, we import the log2 ratio data, transpose the spreadsheet, and join it with a column of marker indexes.
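The reshaping described above can be pictured with a small sketch. It assumes, hypothetically, that the imported data holds one row per sample and one column per marker, and that marker indexes start at 1; the values are illustrative only.

```python
# Hypothetical imported data: one row per sample, one column per marker.
log2_by_sample = {
    "sample1": [0.01, 0.03, 0.58, 0.63],
    "sample2": [-0.40, -0.38, 0.02, 0.00],
}

def marker_table(values):
    """Transpose one sample's row into (marker_index, log2_ratio) rows."""
    return [(index, value) for index, value in enumerate(values, start=1)]

# One table per sample, each pairing a marker index with its log2 ratio.
tables = {name: marker_table(vals) for name, vals in log2_by_sample.items()}

for name, table in tables.items():
    print(name, table)
```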
For the first sample, we set the log2 ratio column as the dependent variable and perform an Interactive Tree Analysis (7). In the Tree View, we click on the tree node and select Manual Split. From the Manual Split window, we choose SNP# as the splitter and select Define Split (7.3.1.2). By repeating these steps for the second sample, we create two Define Split plots, each showing the split data for one sample.
The views in figure 25.16 were created by averaging over 10 values and averaging between segments rather than averaging overall. From these plots, you can see that the splits correspond to the segmenting results found using the univariate segmenting algorithm. To see the details of these splits, press OK in the Define Split window. The resulting tree shows exactly where the splits occur and the (rounded) mean between those splits.
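The mean between splits can be reproduced by hand once the split positions are known. The sketch below covers only that averaging step, not the tree analysis that finds the splits. The function name, the convention that a split is the 1-based index of the first marker in the new segment, and the sample values are all assumptions for illustration.

```python
from statistics import mean

def means_between_splits(values, splits):
    """Mean of the values in each interval delimited by the split positions.

    `splits` holds 1-based marker indexes at which a new segment starts.
    Returns (first_marker, last_marker, rounded_mean) for each segment.
    """
    bounds = [1] + sorted(splits) + [len(values) + 1]
    return [
        (start, end - 1, round(mean(values[start - 1:end - 1]), 3))
        for start, end in zip(bounds, bounds[1:])
    ]

# Hypothetical log2 ratios for eight markers with one split at marker 5.
values = [0.0, 0.25, 0.0, 0.25, 0.5, 0.75, 0.5, 0.75]
result = means_between_splits(values, [5])
print(result)
```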
25.9.2 Multivariate Segmenting Example
This example will demonstrate the multivariate segmenting algorithm for copy number analysis (25.10). In this example, we will import the segment mean data calculated using the multivariate segmenting algorithm. We will then view the segments found using the UCSC Genome Browser.
25.9.2.1 Performing Multivariate Segmenting on Two Samples
For this example, we import the segment mean data for a DSF containing two samples, each with 100 markers. To import the segmenting results, we open CNAM->Copy Number Analysis (25.3) from the Project View. We select our DSF file and choose the Multivariate algorithm for the analysis. From the Optional Output tab, we choose to output the segmenting results as a Wiggle track file for import into the UCSC Genome Browser.
When the analysis is complete, the Run Log shows that four common segments were found between sample1 and sample2. The resulting Segment Means spreadsheet shows where the segments occur and the mean log2 ratio of each segment.
25.9.2.2 Viewing the Segmenting Results in the Genome Browser
To view the segmenting results graphically, we use the UCSC Genome Browser (25.8.2). Using the Manage Custom Tracks feature of the genome browser, we upload the WIG file containing the segmenting results. Because we performed multivariate segmentation on the samples, the results for both samples are contained in one WIG track file. The resulting plot shows the segments and segment means for both samples.
25.9.3 Data Quality Tutorial
You may want to exclude markers from your data when their performance is known to be questionable. The following tutorial demonstrates finding markers in non-sex chromosomes that are being falsely associated with gender. These markers will then be used to create an exclusion list for removing the erroneous markers from future segmenting analyses.
Note: It is important when thinking about data quality to consider the source of your data. Normalization of the intensity data is required to evaluate copy number variation; the method of normalization will affect the quality of the copy number signal.
25.9.3.1 Finding Erroneous Markers
To begin, we import log ratio data for chromosomes 1 through 22. We join the imported spreadsheet with a spreadsheet containing sex phenotypes for those samples. The column containing the sex phenotypes is set to active-dependent.
Note: If you have many samples in your data set, you may be unable to import all chromosomes 1 through 22 together. You may alternatively import one chromosome at a time and create a marker exclusion list for each chromosome.
Perform an Interactive Tree Analysis on the joined data set. In the Tree View, go to Tree->Options to select Linear/Logistic Regression and to deselect Non-genetic splits (7.2). Return to the Tree View, click on the tree node to open the drop-down menu, and select Manual Split. From the Manual Split window, select Plot P Values by Var # (7.3.1.2). This plot shows the markers whose P-Values differ significantly from the rest of the data set. These are the markers to remove, because they show an incorrect correlation with the gender of the sample.
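To give a feel for what a per-marker P-Value measures here, the sketch below tests the linear association between one autosomal marker's log ratios and a 0/1 sex code. It is a stand-in under stated assumptions, not the regression used by the tree analysis: it uses a correlation t statistic with a normal approximation (reasonable only for larger sample counts), and the function name and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def approx_p_value(marker_values, sex_codes):
    """Approximate two-sided p-value for a marker/sex linear association.

    Assumption: the t statistic is compared against a standard normal
    distribution, which is only an approximation for small sample counts.
    """
    n = len(marker_values)
    mx, my = mean(marker_values), mean(sex_codes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(marker_values, sex_codes))
    sxx = sum((x - mx) ** 2 for x in marker_values)
    syy = sum((y - my) ** 2 for y in sex_codes)
    if sxx == 0 or syy == 0:
        return 1.0  # no variation in one variable, so no evidence
    r = sxy / sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation
    t = r * sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Hypothetical data: sex coded 0/1 for eight samples.
sex = [0, 0, 0, 0, 1, 1, 1, 1]
well_behaved = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
erroneous = [-0.5, -0.4, -0.5, -0.4, 0.5, 0.4, 0.5, 0.4]

p_ok = approx_p_value(well_behaved, sex)
p_bad = approx_p_value(erroneous, sex)
print(p_ok, p_bad)
```

A marker on chromosomes 1 through 22 is not expected to track sex, so a very small value such as the second one is what flags a marker as suspect.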
25.9.3.2 Creating the Marker Exclusion List
From the plot window, select File->P-Value Spreadsheet to open a spreadsheet containing these values. Sort the spreadsheet by aP (adjusted P) to arrange the markers with the smallest P-Values at the top of the spreadsheet. We decide that, for future analysis of this data set, we will want to exclude all markers with P-Values less than 10^-4.
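The sort-and-threshold step amounts to the following sketch. The marker names and adjusted P-Values are made up for illustration, and the strict "less than" comparison is our reading of the cutoff.

```python
THRESHOLD = 1e-4  # exclude markers with adjusted P-Values below this cutoff

# Hypothetical adjusted P-Values keyed by marker name (placeholder names).
adjusted_p = {
    "marker_a": 0.52,
    "marker_b": 3.0e-7,
    "marker_c": 8.0e-5,
    "marker_d": 0.013,
}

# Sort ascending so the smallest P-Values come first, as in the spreadsheet.
ranked = sorted(adjusted_p.items(), key=lambda item: item[1])

# Keep only the markers that fall below the cutoff.
flagged = [name for name, p in ranked if p < THRESHOLD]
print(flagged)
```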
We create a subset of these markers to save and use as the basis for an exclusion list. Inactivate all rows containing acceptable P-Values and export the active rows as a CSV file. Using a spreadsheet editor or some other utility, remove all data except the marker names (the row labels). The comma-separated file containing only marker names will be used as the exclusion list (25.3.5).
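If you prefer a script to a spreadsheet editor, the marker names can be stripped out with a few lines of Python. This is a sketch under assumptions: the export is taken to have a header row with the row labels in the first column, the column names and marker names shown are placeholders, and writing one name per line is our guess at the layout (see 25.3.5 for the format the exclusion list must follow).

```python
import csv
import io

# Hypothetical exported CSV: header row, then one flagged marker per row,
# with the row label (marker name) in the first column.
exported = io.StringIO(
    "Marker,P,aP\n"
    "marker_b,2.0e-9,3.0e-7\n"
    "marker_c,4.5e-7,8.0e-5\n"
)

def marker_names(csv_file):
    """Return the first column of every data row, skipping the header."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    return [row[0] for row in reader if row]

names = marker_names(exported)

# Write the names only, one per line (assumed layout).
output = io.StringIO()
for name in names:
    output.write(name + "\n")

print(output.getvalue(), end="")
```

To work with real files, replace the two `io.StringIO` objects with `open(...)` calls on your own paths.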