Created:
July 30, 2008

User Level:
Intermediate

Products:
HelixTree, CNAM

Step 3. Find the Position of the "Elbow" in the Scree Plot

Finding the position of the elbow on a scree plot by eye is very difficult. HelixTree can do this automatically using the segmenting algorithm built into its interactive tree analysis.

Figure 1. Tree options.

Open the log-transformed eigenvalue spreadsheet created in Step 2. Make sure the columns still have the same colors as when the spreadsheet was originally created (black, gray, and magenta, respectively).

Select >Analysis >Interactive Tree Analysis.

A window with a single box (referred to as the root node) will appear.

From this window, first select >Tree >Options. Leave all options at their defaults, except set Max Segments to 2 and Segmenting Algorithm to Exact O(n^2), as in Figure 1. Click OK to set these parameters.

Figure 2. Tree split.

Next, in the tree view, left-click the root node and select Manual Split. This opens the manual split window (not shown) and splits the root node in the tree view into two "daughter" nodes (Figure 2). Note that for each node, n = number of observations, u = mean, and s = standard deviation.

To understand what the algorithm is doing in this context, think of the eigenvalues as belonging to two groups: those that correspond to batch effects and/or population stratification, and those that do not. To find the cut point between them, HelixTree's segmenting algorithm searches for the index at which splitting the eigenvalues into two groups gives the smallest total sum of squared errors. This cut point is essentially the "elbow" on the scree plot.
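The idea of a two-segment split can be illustrated outside HelixTree with a short brute-force sketch. This is not HelixTree's implementation; it assumes that each group's error is measured about that group's own mean, and the function names (sse, find_elbow) and the sample values are made up for illustration. Every possible cut index is tried and the one with the smallest combined sum of squared errors is kept, which takes quadratic time in the number of eigenvalues.

```python
from typing import Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations of `values` from their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def find_elbow(log_eigenvalues: Sequence[float]) -> Tuple[int, float]:
    """Return (k, total_sse), where the first k values form the left group.

    Assumption: the error of each group is taken about that group's mean.
    Each candidate cut recomputes both errors from scratch, so the search
    is O(n^2) overall.
    """
    n = len(log_eigenvalues)
    if n < 2:
        raise ValueError("need at least two eigenvalues to split")
    best_k, best_total = 1, float("inf")
    for k in range(1, n):  # both groups must be non-empty
        total = sse(log_eigenvalues[:k]) + sse(log_eigenvalues[k:])
        if total < best_total:
            best_k, best_total = k, total
    return best_k, best_total


# Hypothetical log eigenvalues: three large values, then a flat tail.
example = [5.0, 4.8, 4.9, 1.0, 0.9, 1.1, 1.0]
k, total = find_elbow(example)
print(k)  # 3: the first three eigenvalues form the "large" group
```

In this sketch, k plays the role of the cut point reported by the tree split: the number of leading eigenvalues that fall in the first group.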

For the example shown in Figure 2, HelixTree found the cut point to be 20. Therefore, 20 is the number of PCs you should use in further principal component analysis.