Created:
July 30, 2008

User Level:
Intermediate

Products:
HelixTree, CNAM

Step 1. Run PCA with the Number of PCs Equal to the Total Number of Samples Minus 1

Before you can determine the correct number of PCs to use, you first need to run PCA with the number of PCs equal to the total number of samples minus 1. Because PCA centers the data, a data set with n samples has at most n - 1 nonzero eigenvalues, so this run gives you the complete list of eigenvalues from which to build a scree plot.
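The reason for the n - 1 limit can be checked with a small NumPy sketch. It assumes a samples-by-markers genotype matrix coded 0/1/2 (synthetic random data here, not a HelixTree export). Centering each marker makes every column sum to zero, so the n x n sample covariance matrix has rank at most n - 1 and therefore at most n - 1 nonzero eigenvalues. This illustrates the linear algebra only; it is not HelixTree's implementation.

```python
import numpy as np

def sample_eigenvalues(genotypes: np.ndarray) -> np.ndarray:
    """Eigenvalues of the sample-by-sample covariance of mean-centered genotypes.

    genotypes: array of shape (n_samples, n_markers), coded 0/1/2.
    Returns the eigenvalues sorted from largest to smallest.
    """
    centered = genotypes - genotypes.mean(axis=0)     # center each marker across samples
    cov = centered @ centered.T / genotypes.shape[1]   # n x n sample covariance matrix
    eigvals = np.linalg.eigvalsh(cov)                  # symmetric matrix: real eigenvalues, ascending
    return eigvals[::-1]                               # reorder to descending

# Synthetic genotypes for illustration only.
rng = np.random.default_rng(0)
n_samples, n_markers = 10, 500
G = rng.integers(0, 3, size=(n_samples, n_markers)).astype(float)

eig = sample_eigenvalues(G)
# Count eigenvalues that are not numerically zero relative to the largest one.
nonzero = int(np.sum(eig > 1e-8 * eig[0]))
print(len(eig), nonzero)  # expected: 10 9
```

With 10 samples the covariance matrix has 10 eigenvalues, but only 9 of them are nonzero, which is why requesting samples minus 1 PCs already captures every eigenvalue the scree plot needs.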

SNP Analysis
To run PCA for SNP analysis, open a spreadsheet with genotypic covariates. This spreadsheet can contain other covariates as well.

NOTE: Including the nonautosomal chromosomes in PCA may cause an erroneous systematic shift in the data, so we recommend basing the adjustment on the autosomes (chromosomes 1-22) only. If a marker map is applied to your genotype spreadsheet, you can do this by selecting >Edit >Column >Select columns by chromosome/region range and highlighting only chromosomes 1-22.
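If you prepare marker lists outside the program, the same autosome-only selection can be sketched in Python. This assumes a hypothetical marker map stored as a dictionary from marker name to chromosome label, with labels written as plain strings such as "1", "22", "X", "XY", "MT", or "?". The marker names and data structure are illustrative and are not a HelixTree file format.

```python
AUTOSOMES = {str(c) for c in range(1, 23)}  # "1" through "22"

def autosomal_markers(marker_map: dict[str, str]) -> list[str]:
    """Return marker names whose chromosome label is 1-22, preserving input order.

    marker_map: hypothetical {marker_name: chromosome_label} mapping.
    Markers on X, Y, XY, MT, or unknown ("?") chromosomes are dropped.
    """
    return [name for name, chrom in marker_map.items() if chrom.strip() in AUTOSOMES]

# Hypothetical marker map for illustration.
example_map = {
    "rs0001": "1",
    "rs0002": "22",
    "rs0003": "X",
    "rs0004": "XY",
    "rs0005": "MT",
    "rs0006": "?",
    "rs0007": "7",
}
print(autosomal_markers(example_map))  # ['rs0001', 'rs0002', 'rs0007']
```

Labels with a prefix such as "chr1" would need to be normalized first; the sketch assumes bare chromosome numbers.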

Select >Genetics >Principal Component Analysis. This opens the following window:

Figure 1. Principal Component Analysis window.

From here you can choose the genetic model you are testing for, how each marker is normalized, and other options.
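To illustrate what a genetic model and a per-marker normalization do, the sketch below codes genotypes under an additive model (the count of a chosen allele: 0, 1, or 2) and then applies one common normalization: subtract each marker's mean and divide by sqrt(p(1 - p)), with the allele frequency p estimated as half the mean. This normalization is an assumption chosen for illustration, similar in spirit to the EIGENSTRAT approach; the options offered in the HelixTree dialog may differ. The genotype strings and counted alleles are hypothetical.

```python
import numpy as np

def additive_code(genotypes: list[list[str]], alleles: list[str]) -> np.ndarray:
    """Code genotype strings such as 'A/G' as the count of each marker's chosen allele.

    genotypes: rows are samples, columns are markers.
    alleles: the allele to count for each marker (one per column).
    """
    return np.array(
        [[g.split("/").count(alleles[j]) for j, g in enumerate(row)] for row in genotypes],
        dtype=float,
    )

def normalize_markers(coded: np.ndarray) -> np.ndarray:
    """Center each marker (column) and scale it by sqrt(p(1 - p)), with p = mean / 2.

    Monomorphic markers (p == 0 or p == 1) have zero scale and are set to 0.
    """
    mean = coded.mean(axis=0)
    p = mean / 2.0
    scale = np.sqrt(p * (1.0 - p))
    centered = coded - mean
    out = np.zeros_like(coded)
    ok = scale > 0
    out[:, ok] = centered[:, ok] / scale[ok]
    return out

# Three samples, two markers (hypothetical data).
genos = [["A/A", "C/T"], ["A/G", "T/T"], ["G/G", "C/C"]]
coded = additive_code(genos, ["G", "T"])  # [[0, 1], [1, 2], [2, 0]]
z = normalize_markers(coded)              # [[-2, 0], [0, 2], [2, -2]]
```

Scaling by sqrt(p(1 - p)) keeps rare and common markers on a comparable footing, so a handful of high-variance markers does not dominate the principal components.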

In the Principal Components section, set Find up to Top to the total number of samples in your spreadsheet minus 1 (for example, 269 for a spreadsheet of 270 samples). Make sure the Output Separate Spreadsheet of Eigenvalues box is checked, and click Run.

When the analysis finishes, two spreadsheets are created: one containing the eigenvalues and the other containing the principal components. Because you are still deciding how many principal components to use, you can delete the principal component spreadsheet.
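The eigenvalue spreadsheet is the input for the scree plot. As a sketch, assuming you have saved the eigenvalues to a CSV file with a column named `Eigenvalue` (the export route and column name are assumptions, and the numbers below are made up), the following computes the values usually plotted: each eigenvalue by component number, plus the proportion and cumulative proportion of variance. Because all n - 1 components were requested, the eigenvalues sum to the total variance, so these proportions are meaningful.

```python
import csv
import io

def scree_table(eigenvalues):
    """Return (component, eigenvalue, proportion, cumulative proportion) tuples."""
    total = sum(eigenvalues)
    rows, running = [], 0.0
    for i, ev in enumerate(eigenvalues, start=1):
        running += ev
        rows.append((i, ev, ev / total, running / total))
    return rows

# Stand-in for a hypothetical CSV export, one eigenvalue per row (illustrative numbers only).
csv_text = """Eigenvalue
4.0
2.0
1.0
0.5
0.5
"""
eigs = [float(r["Eigenvalue"]) for r in csv.DictReader(io.StringIO(csv_text))]
for comp, ev, prop, cum in scree_table(eigs):
    print(f"PC{comp}: eigenvalue={ev:.2f} proportion={prop:.3f} cumulative={cum:.3f}")
```

Plotting the eigenvalue column against component number gives the scree plot; the point where the curve flattens is a common guide for how many PCs to keep.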

CNV Analysis
To run PCA for CNV analysis, first prepare a LogR DSF file. If you are not familiar with this process, follow Step 1 of the Whole Genome Copy Number Association tutorial.

Once you have prepared a LogR DSF file, select >CNAM >LogR Association Tests & PCA from the Project Navigator window.

Figure 2. LogR Association Tests and PCA window ready to perform PCA analysis on a LogR DSF file.

In this window (Figure 2), use Browse to locate the LogR DSF file and click Open. Notice that by default all chromosomes appear in the box below the input DSF file location; only the chromosomes listed there will be corrected.

NOTE: Including the nonautosomal chromosomes may cause an erroneous systematic shift in the data, so we recommend excluding them from PCA and adjusting only the autosomes (chromosomes 1-22). To do this, click Chromosomes, uncheck X, Y, XY, MT, and ? (if present), and click OK.

Because you are not performing association tests during this step, uncheck the three association test boxes. Although you are still determining how many PCs to use, you need to check the Correct Batch Effects/Stratification with PCA box to activate the Principal Component Analysis Parameters tab. Leave the Output the PCA adjusted LogRs to a DSF box unchecked.
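For intuition about what the correction does once you have chosen the number of PCs k, here is a minimal NumPy sketch of one standard approach: mean-center each marker, then subtract its projection onto the top k sample-space principal components (an EIGENSTRAT-style adjustment). This is an assumption for illustration, not CNAM's implementation, and you do not need it during this step. The function name and the simulated two-batch LogR data are hypothetical.

```python
import numpy as np

def remove_top_pcs(logr: np.ndarray, k: int) -> np.ndarray:
    """Remove the top k principal components from a samples x markers LogR matrix.

    Each marker is mean-centered, then its projection onto the k leading
    sample-space PCs (left singular vectors) is subtracted. Returns centered residuals.
    """
    centered = logr - logr.mean(axis=0)
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    top = u[:, :k]                              # n_samples x k, orthonormal columns
    return centered - top @ (top.T @ centered)  # subtract projection onto top PCs

# Simulated LogR values: 10 samples in two hypothetical batches of 5, 200 markers.
rng = np.random.default_rng(1)
batch = np.repeat([0.0, 1.0], 5)[:, None]
logr = rng.normal(0, 0.1, size=(10, 200)) + batch * rng.normal(0, 1, size=(1, 200))
corrected = remove_top_pcs(logr, k=1)
```

Here the batch shift dominates the first PC, so removing k = 1 component strips most of the batch signal while leaving the residual variation in place.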

Figure 3. Principal Component Analysis Parameters tab set to find 269 principal components.

Click the Principal Component Analysis Parameters tab (Figure 3).

Set the number of principal components to the number of samples in your LogR DSF file minus 1.

Make sure the Output Separate Spreadsheet of Eigenvalues box is checked. For this tutorial, you can ignore the PCA Outlier Removal settings. Click Run.

When the analysis finishes, two spreadsheets are created: one containing the eigenvalues and the other containing the principal components. Because you are still deciding how many principal components to use, you can delete the principal component spreadsheet.