7.9 Numeric Principal Component Analysis

Data Requirements

Numeric principal component analysis requires a spreadsheet containing numeric data. First import the numeric data into an SVS project (see Importing Your Data Into A Project). Once the project contains a spreadsheet with binary, integer, and/or real-valued numeric data, open the Numeric Principal Component Analysis options dialog by selecting Quality Assurance > Numeric Principal Component Analysis from the spreadsheet menu.

NOTE:

  • It is common practice to inactivate markers known to have data quality issues before using principal components analysis.

Using the Numeric Principal Components Analysis Window

[Picture]

Figure 41: Numeric Principal Component Analysis

NOTE:

  • This window (see Figure 41) provides essentially the same functions as the PCA Parameters tab of the Numeric Association Tests dialog, which is reached through the Analysis menu when Correct for Batch Effects/Stratification with PCA is selected on the Association Test Parameters tab. The only difference is that the separate Numeric Principal Component Analysis window does not require an association test to be performed at the same time.

Processing

The principal components can either be computed or, if they have already been computed for the dataset, taken from an existing spreadsheet: select the “Use precomputed principal components” option and then choose the spreadsheet of principal components. See Applying PCA to a Superset of Markers and Applying PCA to a Subset of Samples for specific limitations of this feature.

Select the PCA parameters: the maximum number of components to find and correct for, the spreadsheets to output, whether to center by marker, by sample, or by both, and whether to eliminate component outlier subjects. When PCA outlier removal is performed by recomputing components, you can also select the number of times to recompute the components, the criterion for determining an outlier, and the number of components from which to remove outliers.
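The outlier-removal loop described above can be illustrated outside SVS with a short NumPy sketch. This is not the SVS implementation: the function name, its parameters, and in particular the outlier criterion (a sample whose score on a component lies more than `threshold_sd` standard deviations from the mean) are assumptions made for illustration only.

```python
import numpy as np

def find_pca_outliers(data, n_components=2, n_iterations=3, threshold_sd=6.0):
    """Hypothetical sketch: flag outlier samples, recomputing components each pass.

    data: samples-by-markers matrix with no missing values (assumption).
    Returns (outliers, active), where outliers is a list of
    (sample index, iteration, component) and active holds the kept sample indices.
    """
    x = np.asarray(data, dtype=float)
    active = np.arange(x.shape[0])
    outliers = []
    for iteration in range(1, n_iterations + 1):
        sub = x[active]
        sub = sub - sub.mean(axis=0, keepdims=True)      # center by marker
        u, _, _ = np.linalg.svd(sub, full_matrices=False)
        scores = u[:, :n_components]                     # one column per component
        sd = scores.std(axis=0)
        sd = np.where(sd > 0, sd, 1.0)                   # guard against division by zero
        z = np.abs(scores - scores.mean(axis=0)) / sd
        flagged = z > threshold_sd                       # assumed outlier criterion
        is_outlier = flagged.any(axis=1)
        if not is_outlier.any():
            break                                        # nothing left to remove
        for row in np.flatnonzero(is_outlier):
            component = int(np.argmax(flagged[row])) + 1  # first offending component
            outliers.append((int(active[row]), iteration, component))
        active = active[~is_outlier]
    return outliers, active

# Usage: 60 synthetic samples, one of which is shifted far from the rest.
rng = np.random.default_rng(0)
data = rng.normal(size=(60, 20))
data[7] += 50.0
outliers, active = find_pca_outliers(data)
```

The returned list mirrors the information the dialog options control: which sample was removed, in which recomputation pass, and on which component it was detected.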

For new projects, data centering by marker is normally recommended. Data centering by sample may be useful for components that are applied to chromosomes or regions other than those from which they are originally computed. For more information on these options, see Centering by Marker vs. Not Centering by Marker and Centering by Sample vs. Not Centering by Sample.
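As a rough illustration of the two centering choices, the NumPy sketch below treats the spreadsheet as a samples-by-markers matrix (rows are samples, columns are markers) with no missing values. The function name `center` and its arguments are hypothetical and do not correspond to an SVS API.

```python
import numpy as np

def center(data, by_marker=True, by_sample=False):
    """Hypothetical sketch of centering a samples-by-markers matrix."""
    x = np.asarray(data, dtype=float).copy()
    if by_marker:
        x -= x.mean(axis=0, keepdims=True)   # each marker (column) gets mean 0
    if by_sample:
        x -= x.mean(axis=1, keepdims=True)   # each sample (row) gets mean 0
    return x

data = np.array([[0.0, 1.0, 2.0],
                 [2.0, 1.0, 0.0],
                 [1.0, 2.0, 2.0],
                 [1.0, 0.0, 2.0]])

by_marker = center(data, by_marker=True, by_sample=False)
by_both = center(data, by_marker=True, by_sample=True)
```

In this sketch, centering by marker and then by sample leaves both the column means and the row means at zero, because the row means of marker-centered data themselves average to zero.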

See Correction of Input Data by Principal Component Analysis for more information on other options in this dialog.

Select the Run button to perform the analysis.

When the analysis is complete, a message indicating the number of components found and the number of predictors analyzed will be appended to the Node Change Log for each output spreadsheet. All spreadsheets selected for output will be opened.

Spreadsheet Outputs

The possible output spreadsheets are as follows:

  • The corrected input data. (Recall that genotypic data is first converted to numeric data by the selected genetic model.)
  • The principal components spreadsheet, with one row per sample and one column per component. The components are sorted by eigenvalue, largest to smallest. Only the number of components requested is shown.
  • The eigenvalue spreadsheet, which lists the eigenvalues from largest to smallest for the number of components requested.
  • If recomputing components after removal of outliers was selected, and outliers were found, then a spreadsheet will be created to list these outliers and the iteration and component in which they were found.
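For readers who want to see how the first three outputs relate to one another, here is a minimal NumPy sketch. It assumes centering by marker, no missing values, and eigenvalues taken as squared singular values with no further scaling. SVS may normalize differently, and the function name `numeric_pca` is hypothetical.

```python
import numpy as np

def numeric_pca(data, max_components):
    """Hypothetical sketch: components, eigenvalues, and corrected data."""
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0, keepdims=True)       # center by marker
    # Thin SVD; singular values are returned in descending order.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    k = min(max_components, len(s))
    components = u[:, :k]                       # rows = samples, columns = components
    eigenvalues = s[:k] ** 2                    # largest first (scaling is an assumption)
    # Corrected data: remove the part explained by the top-k components.
    corrected = x - components @ (components.T @ x)
    return components, eigenvalues, corrected

rng = np.random.default_rng(0)
data = rng.normal(size=(6, 10))                 # 6 samples, 10 markers (synthetic)
components, eigenvalues, corrected = numeric_pca(data, max_components=2)
```

After correction, the data have no remaining projection onto the retained components, which is the sense in which the input is "corrected" in this sketch.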

NOTE:

  1. If you wish to plot any outputs (such as the single column of the eigenvalue spreadsheet), select Plot Numeric Values from the Plot menu. The column of data will be plotted against the row labels or eigenvalue number.
  2. If you wish to plot a principal component eigenvector against another eigenvector, select XY Scatter Plots from the Plot menu. See Multi-Color Scatter Plots for PCA or Gender Analysis for more information.