Using the Separate Principal Components Analysis Window
NOTE: This window performs essentially the same functions as the middle tab of the Association Test Window when Output Principal Components to Spreadsheet is selected in the first tab of that window. The difference is that you do not need to run an association test at the same time.
18.9.1 Data Requirements
Principal components analysis requires a data set containing genotypic data. First import your data into a HelixTree project (see Chapter 4). Once you have a spreadsheet for this data, open the Principal Component Analysis options dialog by selecting Genetics->Principal Component Analysis from the spreadsheet menu.
NOTE: It is common practice to inactivate those markers known to have data quality issues before using principal components analysis.
18.9.2 Processing
Select the PCA parameters: the genetic model, the number of components to find, the normalization, whether to output a separate eigenvalue spreadsheet, and whether and how to eliminate component outlier subjects and recompute the components. See 18.6.1 for an explanation of the options in this window.
Select the Run button to start finding the principal components.
When the analysis is complete, a message opens to report the number of markers analyzed and skipped and the spreadsheet(s) produced.
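As a rough illustration of what this step computes, the following Python sketch finds the leading principal components of a small numeric genotype matrix. It is not HelixTree code. The function name `principal_components`, the 0/1/2 additive coding, the standard-deviation scaling used for "normalization", and the absence of missing genotypes are all assumptions made for the example; the actual options are those described in 18.6.1.

```python
import numpy as np

def principal_components(genotypes, n_components, normalize=True):
    """Return (components, eigenvalues) for a subjects-by-markers matrix.

    Assumptions (not taken from the manual): genotypes are coded as
    numbers such as 0/1/2, there are no missing values, and
    "normalization" means dividing each marker by its standard deviation.
    """
    x = np.asarray(genotypes, dtype=float)
    # Center every marker (column) so each has mean zero across subjects.
    x = x - x.mean(axis=0)
    if normalize:
        sd = x.std(axis=0)
        sd[sd == 0] = 1.0  # monomorphic markers stay at zero
        x = x / sd
    # Subject-by-subject matrix whose eigenvectors are the components.
    m = x @ x.T / x.shape[1]
    eigvals, eigvecs = np.linalg.eigh(m)  # eigh returns ascending order
    # Keep the requested number of components, largest eigenvalue first.
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], eigvals[order]

# Hypothetical toy data: 6 subjects (rows) by 5 markers (columns).
toy = np.array([
    [0, 1, 2, 0, 1],
    [1, 1, 0, 2, 0],
    [2, 0, 1, 1, 2],
    [0, 2, 2, 1, 0],
    [1, 0, 0, 2, 1],
    [2, 2, 1, 0, 0],
])
components, eigenvalues = principal_components(toy, n_components=3)
```

In this sketch `components` has one row per subject and one column per component, and `eigenvalues` is ordered from large to small.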
18.9.3 Spreadsheet Outputs
The following spreadsheets may be produced:
- The principal components spreadsheet, with one row per patient or subject and one column per component. The components are sorted by eigenvalue, large to small. Only the number of components you requested is shown.
- The eigenvalue spreadsheet, if you requested it, which lists the eigenvalues of the requested components from large to small.
- If you requested elimination of outlier subjects and outliers were found, a spreadsheet listing these outliers along with the iteration and component in which each was found.
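To make the outlier listing concrete, the sketch below shows one way an iterative "remove outliers and recompute" loop could record the subject, iteration, and component for each outlier. It is only an illustration under stated assumptions. The rule used here (a subject is an outlier if it lies more than `n_sd` standard deviations from the mean on any retained component) and the names `top_components`, `remove_outliers`, `n_sd`, and `max_iter` are hypothetical. The rule HelixTree actually applies is governed by the options explained in 18.6.1.

```python
import numpy as np

def top_components(x, k):
    """Top-k principal components (one row per subject) of a numeric matrix."""
    x = x - x.mean(axis=0)
    vals, vecs = np.linalg.eigh(x @ x.T / x.shape[1])
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order]

def remove_outliers(genotypes, subject_ids, k=2, n_sd=3.0, max_iter=5):
    """Return (kept_ids, found), where found lists (subject, iteration, component).

    Assumed rule, for illustration only: flag a subject whose score on any
    of the k components is more than n_sd standard deviations from the mean.
    """
    x = np.asarray(genotypes, dtype=float)
    ids = list(subject_ids)
    found = []
    for iteration in range(1, max_iter + 1):
        pcs = top_components(x, k)
        z = np.abs(pcs - pcs.mean(axis=0)) / pcs.std(axis=0)
        bad = np.argwhere(z > n_sd)  # rows of (subject index, component index)
        if bad.size == 0:
            break
        flagged = {}
        for row, comp in bad:
            # Record the first (1-based) component on which the subject is extreme.
            flagged.setdefault(int(row), int(comp) + 1)
        for row, comp in flagged.items():
            found.append((ids[row], iteration, comp))
        keep = [i for i in range(len(ids)) if i not in flagged]
        x = x[keep]  # drop outliers, then recompute on the next pass
        ids = [ids[i] for i in keep]
    return ids, found

# Hypothetical data: 39 similar subjects plus one very different subject.
rng = np.random.default_rng(0)
typical = rng.binomial(2, 0.1, size=(39, 200))
odd_one = np.full((1, 200), 2)
data = np.vstack([typical, odd_one])
names = [f"S{i}" for i in range(39)] + ["S_out"]
kept, outliers = remove_outliers(data, names)
```

Each tuple in `outliers` corresponds to one row of the outlier spreadsheet described above: the subject, the iteration, and the component.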
NOTE: To view any output (such as the single column of the eigenvalue spreadsheet) as a p-value-style plot, open the output spreadsheet and do one of the following: click the header(s) of the column(s) of interest to plot them individually (6.2.5), or select Analysis->Plot Numeric Columns from that spreadsheet's menu to create one plot window in which any of the output columns can be selected for display (6.5.5). Each column's data is plotted against its row header names or numbers.