Step 3. Find the Position of the "Elbow" in the Scree Plot
Finding the position of the elbow on a scree plot by eye is very difficult. HelixTree can do this automatically using the segmenting algorithm built into its interactive tree analysis.
Open the log-transformed eigenvalue spreadsheet created in Step 2. Make sure the columns are still the same colors as when the spreadsheet was originally created (black, gray, and magenta, respectively).
Select >Analysis >Interactive Tree Analysis.
A window with a single box (referred to as the root node) will appear.
From this window, first select >Tree >Options. Leave all options at their defaults, except set Max Segments to 2 and Segmenting Algorithm to Exact O(n^2), as in Figure 2. Click OK to set these parameters.
Next, in the tree view, left-click the root node and select Manual Split. This opens the manual split window (not shown) and splits the root node in the tree view into two "daughter" nodes (Figure 3). Note that for each node, n = number of observations, u = mean, and s = standard deviation.
To understand what the algorithm is doing in this context, think of the eigenvalues as belonging to two groups: the eigenvalues that correspond to batch effects and/or population stratification, and the eigenvalues that don't. To find the cut point between them, HelixTree's segmenting algorithm chooses the index at which the sum of squared errors, added over the two groups, is smallest. This cut point is essentially the "elbow" on the scree plot.
For the example represented in Figure 3, HelixTree found 20 to be the cut point. Therefore, this is the number of PCs you should use in further PCA.
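The two-group idea described above can be illustrated outside HelixTree with a short sketch. This is not HelixTree's implementation: it assumes that "error" means the deviation of each value from its group's mean, and the function name find_elbow and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np


def find_elbow(log_eigenvalues):
    """Split an ordered sequence into two contiguous groups.

    Returns (k, sse), where k is the number of leading values placed in
    the first group and sse is the total within-group sum of squared
    deviations from each group's mean (an assumed error definition).
    """
    y = np.asarray(log_eigenvalues, dtype=float)
    n = y.size
    if n < 2:
        raise ValueError("need at least two values to split")

    best_k, best_sse = None, np.inf
    # Try every cut point that leaves both groups non-empty.
    for k in range(1, n):
        left, right = y[:k], y[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k, best_sse


# Made-up values: three large entries followed by four small ones.
values = [5.0, 4.8, 4.9, 1.0, 0.9, 1.1, 1.0]
k, sse = find_elbow(values)
print(k)  # 3
```

Here the first three values form one group and the rest form the other, so the sketch reports 3 as the cut point. With real data, the input would be the log-transformed eigenvalues in their original order.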