Step 7. Perform Association Analysis on Copy Number Segment Covariates
The next step is performing association analysis on the copy number segment covariates created in Step 5 and Step 6. To do this, you first need to join the copy number segment covariate spreadsheets with your phenotype data.
Your phenotype spreadsheet should already be in HelixTree from Step 2. If you have not already done so, from the Project Navigator window go to >File >Import Data and choose either >Import Wizard or >Import ASCII File. Make sure to indicate the column with your sample names as the row label column. This will ensure each sample’s phenotype status is appropriately matched to its copy number segment data when performing the association test(s).
Open your phenotype spreadsheet and select >File >Join Spreadsheets on Row Labels. Select one of the copy number covariate spreadsheets (raw or discretized) and click OK. A new spreadsheet will be created (Figure 1) with your sample names as row labels followed by each sample’s phenotype information and then its covariates for each segment.
Figure 1: Joined phenotype and copy number segment covariate spreadsheet.
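For readers who want to see what a join on row labels does to the data, here is a minimal sketch in Python. It is not HelixTree code: the sample names, column names, and values are hypothetical, and it assumes that rows without a matching sample name in both spreadsheets are dropped.

```python
# Hypothetical toy data: sample name -> phenotype columns,
# and sample name -> copy number segment covariates.
phenotype = {
    "S1": {"status": 1},
    "S2": {"status": 0},
    "S3": {"status": 1},
}
covariates = {
    "S2": {"seg_chr1_a": -0.41, "seg_chr1_b": 0.02},
    "S1": {"seg_chr1_a": 0.03, "seg_chr1_b": 0.35},
    "S4": {"seg_chr1_a": 0.00, "seg_chr1_b": 0.01},
}


def join_on_row_labels(left, right):
    """Keep only row labels present in both tables; left columns come first."""
    joined = {}
    for label, left_row in left.items():
        if label in right:
            joined[label] = {**left_row, **right[label]}
    return joined


joined = join_on_row_labels(phenotype, covariates)
for label, row in joined.items():
    print(label, row)
```

Only S1 and S2 appear in the result because they are the only sample names found in both tables. This is why the sample name column must be set as the row label column on import.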
Association analysis on this spreadsheet is performed using the classic tree-based analysis approach in HelixTree. From your joined spreadsheet, first Left Click once on the column header of your dependent variable. The column will turn magenta, indicating its dependent variable status. You can also Left Click twice on the header of each column that is not a copy number segment covariate to turn it grey. This inactivates those columns so they won’t be used during analysis. If you want to include them, leave them black.
Next select >Analysis >Interactive Tree Analysis.

Figure 2: Root node: n = 500 samples, u = 62% cases, s = standard deviation of 0.48.
A window with a single box (referred to as the root node) will appear (Figure 2) displaying the number of samples in your data set (n), the mean value of the response variable (u), and the response variable’s standard deviation (s). From this window you can perform either tree-based segment association or regression association analysis.
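The three root node statistics can be reproduced by hand for a binary response coded 1 (case) and 0 (control). The sketch below assumes 310 cases and 190 controls, counts chosen only to give 62% cases in 500 samples, and it assumes the standard deviation divides by n. The tutorial does not say which divisor HelixTree uses.

```python
import math

# Hypothetical counts: 310 cases (coded 1) and 190 controls (coded 0), i.e. 62% cases.
response = [1] * 310 + [0] * 190

n = len(response)
u = sum(response) / n                                   # mean = proportion of cases
s = math.sqrt(sum((y - u) ** 2 for y in response) / n)  # population standard deviation

print(f"n = {n}, u = {u:.2f}, s = {s:.3f}")
```

For a 0/1 response the standard deviation is determined by the mean, s = sqrt(u(1 - u)), which is about 0.485 here. That is in line with the value shown in the root node.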
NOTE: This tutorial only covers basic tree and regression analysis. To learn more about advanced capabilities of tree analysis see Interactive Tree Analysis in the manual.
Tree-Based Segment Association
Similar to how the segmenting algorithm works in CNAM to identify copy number variations from LogRs (Step 5), HelixTree employs segmenting to find the optimal split(s) in a dataset whereby the mean(s) of the resulting subgroups differ based on a given variable (e.g. mean intensity values for a given marker).
HelixTree then performs the appropriate association test to determine whether the differences in means among the subgroups are statistically significant. HelixTree is unique in that it can use this segmenting association approach to find multi-way splits supported by the data. This is especially powerful because a deletion or duplication is, in general, a disruption, and a disruption, regardless of its direction, may lead to a higher incidence of disease. Where a basic test of association, such as regression or a t-test, may not find anything, a multi-way split may find that samples with a segment mean LogR near zero (copy number two) have a lower incidence of disease than samples with either higher or lower mean LogRs (the tails).
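The following toy example illustrates why a multi-way split can detect what a linear test misses. It is a sketch only: the data are hypothetical, the cut points are fixed by hand rather than found by HelixTree's segmenting algorithm, and no significance test is performed.

```python
# Hypothetical toy data: (segment mean LogR, case status) pairs.
samples = (
    [(-0.5, 1)] * 8 + [(-0.5, 0)] * 2      # deletions: 80% cases
    + [(0.0, 1)] * 2 + [(0.0, 0)] * 8      # copy number two: 20% cases
    + [(0.5, 1)] * 8 + [(0.5, 0)] * 2      # duplications: 80% cases
)


def covariance(pairs):
    """Population covariance of LogR and case status; near 0 means no linear trend."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    return sum((x - mx) * (y - my) for x, y in pairs) / n


def case_rates(pairs, cuts):
    """Case rate in each interval defined by the sorted cut points."""
    bins = [[] for _ in range(len(cuts) + 1)]
    for x, y in pairs:
        index = sum(x > c for c in cuts)   # number of cut points below x
        bins[index].append(y)
    return [sum(b) / len(b) for b in bins]


print("covariance:", covariance(samples))
print("three-way case rates:", case_rates(samples, cuts=[-0.25, 0.25]))
```

The covariance is essentially zero, so a linear trend test has nothing to find. The three-way split shows that both tails have a much higher case rate than the middle group.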
To perform basic tree-based segment association, Right Click on the root node and select Manual Split. This will pop up the Manual Split window (Figure 3) with a list of p-values, the copy number segment covariates (referred to here as splitters), and the split rule used to perform the association.
Figure 3: Manual Split window highlighting significant region Chr1:210-216 with F-Test Bonferroni corrected p-value (bP) of 1.15E-10.
Clicking on each row of the Manual Split window will change the tree view to reflect the various split rules. Clicking on Plot P Values by Var # will plot the p-values for each segment covariate, which will produce a p-value plot similar to the ones created in Step 4.
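As background on the Bonferroni corrected p-value (bP), the standard Bonferroni adjustment multiplies each raw p-value by the number of tests and caps the result at 1. The sketch below shows that textbook calculation on hypothetical p-values. How HelixTree counts the number of tests for a given split is not described here, so treat the test count as an assumption.

```python
def bonferroni(p_values):
    """Multiply each raw p-value by the number of tests, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for four segment covariates.
raw = [1e-6, 0.004, 0.03, 0.6]
for p, bp in zip(raw, bonferroni(raw)):
    print(f"raw p = {p:g}  ->  bP = {bp:g}")
```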
You can visualize the association results for each splitter by clicking Define Split at the bottom of the Manual Split window. This will plot the segment mean LogR values against the response variable. For copy number segment association, you will typically see a plot with a binary (two-way) or three-way split (denoted by one or two vertical lines respectively).
Figure 4a, shown below, displays a binary split. Notice how the plot is centered around zero on the X-axis. This is because a LogR of zero is equivalent to copy number two. A rudimentary interpretation of this plot is that on average, a copy number equal to or less than two will result in a reduced incidence of disease (control), whereas a copy number variation greater than two will result in a greater incidence of disease (case).
Figure 4b is a screenshot of a three-way split. The interpretation of this plot is that on average, a copy number less than or greater than two will result in a greater incidence of disease (case), whereas a copy number equal to two will result in a reduced incidence of disease (control).
If you see a plot where the LogRs go from low to high or high to low, there is a good chance an additive effect is taking place. In this case, it would be best to perform regression association (below).
If you have multiple segment covariate spreadsheets as created in Step 6, repeat this step for each one.
Tree-Based Regression Association
Regression association is also performed using interactive tree analysis with a modification to one of the tree options. To perform regression, go back to your joined phenotype/copy number segment covariate spreadsheet. Make sure your response variable is still magenta and select >Analysis >Interactive Tree Analysis. This will again pop up the tree view with a single root node. From this window select >Tree >Options. The following window will appear (Figure 5).
Figure 5: Tree options window with Linear/Logistic Regression selected.
Toward the middle of the window there is a check box Linear/Logistic Regression. Check this box. This will enable regression to be performed from the Tree View.
NOTE: Based on your response variable, HelixTree will automatically perform a linear regression for quantitative dependent variables and logistic regression for binary dependent variables.
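The model choice described in the note can be pictured with the sketch below. The rule used here (exactly two distinct response values means binary) is an assumption for illustration. The tutorial does not state how HelixTree decides whether a dependent variable is binary or quantitative.

```python
def regression_type(response):
    """Return 'logistic' for a two-valued response, otherwise 'linear'."""
    distinct = set(response)
    return "logistic" if len(distinct) == 2 else "linear"


print(regression_type([0, 1, 1, 0]))           # case/control status
print(regression_type([2.3, 4.1, 3.7, 5.0]))   # quantitative trait
```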
Click OK.
Now from the root node in the Tree View, Right Click and select Manual Split. A similar window to Figure 6 will appear.
Figure 6: Manual Split window showing significant region (Chr1:163-171) with regression based Bonferroni p-value (bP) of 5.22E-05.
Notice that the Split Types are now regression based and the split rule is equivalent to the regression equation. The p-values are also different because they are now based on a regression test.
Click Create Split. This will drop a residual node in the Tree View (Figure 7). Notice the mean (u) in the residual node is now zero. From here you can plot the association results by Right Clicking on the root node and selecting >Visualize Split Data >Show Split Data. A plot similar to that in the Define Split window will appear (Figure 8).
Figure 7: Tree View after the “split” with option to view association results (split data).
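The zero mean in the residual node follows from how regression with an intercept works: the fitted values absorb the mean of the response, so the residuals (observed minus fitted) average to zero. The sketch below shows this for a simple linear fit on hypothetical data. It assumes the residual node holds observed minus fitted values, which the tutorial implies but does not state.

```python
# Hypothetical toy data: segment mean LogR (x) and a quantitative response (y).
x = [-0.6, -0.3, 0.0, 0.2, 0.5, 0.8]
y = [1.1, 1.9, 2.4, 3.1, 3.8, 4.9]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares for y = intercept + slope * x.
slope = (
    sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    / sum((xi - mean_x) ** 2 for xi in x)
)
intercept = mean_y - slope * mean_x

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
mean_residual = sum(residuals) / n

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"mean residual = {mean_residual:.2e}")
```

The mean residual is zero up to floating point rounding. A logistic fit with an intercept has the same property at its maximum likelihood estimate.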
From here you can repeat Step 5 and segment additional chromosomes or visualize interesting regions in an external genome browser (Step 8).