12.12 Creating Specialized Plots
The following are instructions for creating three common types of plots in genetic association studies: multi-color scatter plots for principal component analysis (PCA) or gender analysis, Manhattan plots, and Q-Q or P-P plots.
Multi-Color Scatter Plots for PCA or Gender Analysis
Both of these plots use an XY Scatter Plot and a third, categorical column to split and filter the plotted data based on
either gender or batch/site/ethnicity information. Unexpected results or structure in the plot
with respect to gender or batch/site/ethnicity information indicate that there may be problems with the
data.
To create either one of these plots, the appropriate data spreadsheet must be joined with the desired phenotype
information. See Joining or Merging Spreadsheets for more information on joining two spreadsheets. Once the spreadsheet is
in the correct format, select Plot > XY Scatter Plot. Select the appropriate independent and dependent variables for the
particular application. See PCA Analysis or Gender Verification below for more information.
Then, in the Plot Viewer, select the graph item listed under the graph. In the Graph Item Controls, select the Filter tab.
In the column selector dialog, select the phenotype to use as the third dimension, that is, the variable that determines the plot color.
Then click Split.
The graph items are named after the categories in the categorical or binary column used for splitting, but
they can be renamed to have shorter or more informative names. To rename an item, right-click on it and select
“Rename”.
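Outside the Plot Viewer, the effect of Split can be pictured as grouping the plotted points by the categorical column, with one group per graph item. The following is a minimal Python sketch of that idea, not SVS code; the function name split_by_category and the sample rows are hypothetical.

```python
from collections import defaultdict


def split_by_category(rows):
    """Group (x, y, category) rows into one list of points per category.

    Each resulting group plays the role of one graph item (one color).
    """
    groups = defaultdict(list)
    for x, y, category in rows:
        groups[category].append((x, y))
    return dict(groups)


# Hypothetical rows: (independent value, dependent value, phenotype category)
rows = [
    (0.01, -0.02, "Site A"),
    (0.03, 0.04, "Site B"),
    (-0.02, 0.01, "Site A"),
]
groups = split_by_category(rows)
for name, points in groups.items():
    print(name, len(points))
```

Each key of the returned dictionary corresponds to the default name a graph item would receive after splitting.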
PCA Analysis
For PCA Analysis, the spreadsheet from which the Plot Viewer is launched should be the Principal Components spreadsheet joined with
the phenotype information. In the XY Scatter Parameters dialog, select the first principal component for the Independent
variable and the second principal component for the Dependent variable.
In the Graph Item Control Filter tab, select the batch/site/ethnicity/gender phenotype column for splitting (see
Figure 98).
If the plot shows structure such that most of the data points corresponding to one batch, site, or ethnicity are grouped together, then there is evidence of batch-to-batch, site-to-site, or between-ethnicity variability, and this non-randomness could negatively affect association results. PCA correction using an appropriate number of principal components will correct for this stratification before analysis.
This process can be repeated for other principal components. When the differently colored data points appear randomly distributed, this suggests that the preceding components capture the stratification, which indicates how many principal components to correct for.
Gender Verification
For gender verification, the gender phenotype information must be joined to the average X intensity and average Y
intensity information. The average X and Y intensities must be created using a Python script or other
means.
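As an illustration of what such a script might compute, the sketch below takes the per-sample mean of marker intensities on one chromosome. It assumes that "average X intensity" means the mean intensity over the markers mapped to the X chromosome (and likewise for Y). It is plain Python rather than the SVS scripting interface, and the function name average_intensity and the sample values are hypothetical.

```python
from statistics import mean


def average_intensity(intensities, marker_chromosomes, chromosome):
    """Mean intensity of one sample over the markers on `chromosome`.

    intensities: marker name -> intensity for one sample
    marker_chromosomes: marker name -> chromosome label
    Missing values (None) are skipped; returns None if no values remain.
    """
    values = [
        value
        for marker, value in intensities.items()
        if marker_chromosomes.get(marker) == chromosome and value is not None
    ]
    return mean(values) if values else None


# Hypothetical marker map and one sample's intensities
marker_chromosomes = {"m1": "X", "m2": "X", "m3": "Y", "m4": "1"}
sample = {"m1": 1.0, "m2": 0.5, "m3": 0.25, "m4": 2.0}

avg_x = average_intensity(sample, marker_chromosomes, "X")
avg_y = average_intensity(sample, marker_chromosomes, "Y")
print(avg_x, avg_y)
```

Repeating this for every sample yields the two columns that are then joined with the reported gender.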
Once the spreadsheet is in the correct format, select the Average X Intensity for the Independent variable and the
Average Y Intensity for the Dependent variable in the XY Scatter Parameters dialog. See Figure 99 for an example of a gender
concordance plot. In this example, several reported females exhibit apparent mosaicism and should be
considered for exclusion from the analysis.
In the Graph Item Control Filter tab, select the reported gender phenotype column for splitting.
There should be two clusters of data points. The upper left cluster corresponds to patients with average intensities
consistent with male intensities, and the lower right cluster corresponds to patients with average intensities consistent
with female intensities. Outliers and data points that fall in the wrong cluster for their reported gender are suspect and should be
examined for accuracy or dropped from analysis.
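The comparison between cluster membership and reported gender can be sketched in Python as follows. This is a deliberate simplification, not an SVS feature: it assigns a sample to the male-like cluster using only a cutoff on average Y intensity, and the cutoff value, the function name flag_discordant, and the sample data are all hypothetical. In practice the cutoff would be chosen by inspecting the plot.

```python
def flag_discordant(samples, y_cutoff):
    """Return IDs of samples whose cluster disagrees with reported gender.

    samples: list of (sample_id, average_y_intensity, reported_gender)
    A sample is treated as male-like when its average Y intensity is
    above y_cutoff, and as female-like otherwise (simplifying assumption).
    """
    flagged = []
    for sample_id, avg_y, reported in samples:
        inferred = "Male" if avg_y > y_cutoff else "Female"
        if inferred != reported:
            flagged.append(sample_id)
    return flagged


# Hypothetical samples and cutoff
samples = [
    ("S1", 0.9, "Male"),
    ("S2", 0.1, "Female"),
    ("S3", 0.8, "Female"),  # reported female, but male-like intensity
]
suspects = flag_discordant(samples, y_cutoff=0.5)
print(suspects)
```

Flagged samples are candidates for review rather than automatic exclusion, since intermediate intensities can also reflect conditions such as the apparent mosaicism mentioned above.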
A similar check can be performed when only X chromosome data are available. In this case, sort the Average X Intensity column
in Ascending order, and then select Plot > Plot Numeric Values and select the Average X Intensity for the Dependent
variable.
Multi-Color Manhattan Plots
For Multi-Color Manhattan Plots, the −log10 p-value from an association result is plotted in such a manner that every
chromosome is a different color. This requires splitting on chromosome number.
The first step is to import the appropriate genetic marker map as a spreadsheet. To do so, see Importing Genetic Marker Map as Spreadsheet.
Select Plot > Plot Numeric Values and, in the Numeric Value Plot Parameters dialog, select the desired p-value
column, preferably the −log10 version if available, as the Dependent variable. For an example of a Manhattan Plot, see
Figure 100. In this case the dependent variable of the association test was Gender, so the significant results in this plot indicate markers that are
associated with gender.
Select the graph item in the Graph Control Tree structure in the Graph Control Interface and set the desired graph item behavior. Then, in the Graph Item Controls Filter tab, select chromosome in the column selector list box and click Split. This creates a graph item for every chromosome in the association test results, each in a different color. Colors can be changed by editing the individual graph items, and graph items can be renamed by right-clicking on the graph item in the Graph Control Tree structure. A legend can be added to the graph by clicking on the graph name in the Graph Control Tree and checking the Legend box in the Annotation Tracks tab window.
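The data behind such a plot can be sketched in Python: each p-value is converted to −log10(p) and the results are grouped by chromosome, one series per color. This is an illustration only, not SVS code; the function name manhattan_series and the sample results are hypothetical, and p-values are assumed to be strictly greater than zero.

```python
import math
from collections import defaultdict


def manhattan_series(results):
    """Build one series of (marker index, -log10 p) per chromosome.

    results: list of (chromosome, p_value) in marker-map order.
    P-values must be > 0, since log10(0) is undefined.
    """
    series = defaultdict(list)
    for index, (chromosome, p_value) in enumerate(results):
        series[chromosome].append((index, -math.log10(p_value)))
    return dict(series)


# Hypothetical association results in marker-map order
results = [("1", 0.5), ("1", 1e-8), ("2", 0.01), ("X", 0.001)]
series = manhattan_series(results)
for chromosome, points in series.items():
    print(chromosome, points)
```

Keeping the running marker index as the horizontal coordinate places the chromosomes side by side, which is what gives the plot its characteristic banded appearance once each series has its own color.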
Q-Q Plot or P-P Plot
If “Output Data for P-P/Q-Q Plots” is selected in an analysis window, then columns for plotting this information are
output in the association test results spreadsheet.
A Q-Q Plot is a plot of the observed quantiles versus the expected quantiles. A P-P plot is a plot of the observed p-values versus
p-value rank, or of −log10(p-value) versus −log10(rank). To plot these values, select Plot > XY Scatter Plot (see
Figure 101). In the XY Scatter Parameters dialog, select the desired expected value as the Independent variable and the
respective observed value as the Dependent variable.
To add a y = x line in the Plot Viewer, click on the graph name in the Graph Control Tree in the Graph Control Interface. From the Graph Controls, select the “Add Item” tab and select the f(x) = m(x) + b item. The default line is the y = x line.
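For readers who want to see how expected and observed values of this kind relate, the following Python sketch builds (expected, observed) pairs on the −log10 scale from a list of p-values. It is not the SVS calculation: it assumes p-values are uniformly distributed under the null hypothesis and uses rank / (n + 1) as the expected p-value for each rank, which is one common convention among several. The function name qq_points and the sample p-values are hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10 p-value pairs, smallest p first.

    The expected p-value for the i-th smallest observation is taken to be
    i / (n + 1), an assumed convention for uniform p-values under the null.
    """
    ordered = sorted(p_values)
    n = len(ordered)
    return [
        (-math.log10(rank / (n + 1)), -math.log10(p))
        for rank, p in enumerate(ordered, start=1)
    ]


# Hypothetical p-values from an association test
points = qq_points([0.5, 0.01, 0.2, 0.9])
for expected, observed in points:
    print(round(expected, 3), round(observed, 3))
```

Points lying close to the y = x line indicate observed p-values that match the expected distribution, while points rising above the line at the upper end indicate smaller p-values than expected.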