12.12 Creating Specialized Plots

The following are instructions for creating three common plots in genetic association studies: multi-color scatter plots for principal component analysis (PCA) or gender analysis, Manhattan plots, and Q-Q or P-P plots.

Multi-Color Scatter Plots for PCA or Gender Analysis

Both of these plots use an XY Scatter Plot and a third categorical column to split and filter the plotted data based on either gender or batch/site/ethnicity information. Unexpected results or structure in the plot with respect to gender or batch/site/ethnicity would indicate that there may be problems with the data.

To create either one of these plots, the appropriate data spreadsheet must be joined with the desired phenotype information. See Joining or Merging Spreadsheets for more information on joining two spreadsheets. Once the spreadsheet is in the correct format, select Plot > XY Scatter Plot. Select the appropriate independent and dependent variables for the particular application. See PCA Analysis or Gender Verification below for more information.

Then, in the Plot Viewer, select the graph item listed under the graph. In the Graph Item Controls, select the Filter tab. In the column selector dialog, select the phenotype to use as the third dimension, that is, the variable that determines the plot color. Then click Split.

The graph items are named after the categories in the categorical or binary column used for splitting, but they can be renamed to have shorter or more informative names. To rename an item, right-click on it and select “Rename”.
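Conceptually, splitting on a categorical column partitions the plotted points into one group per category, and each group is then drawn in its own color. The following is a minimal Python sketch of that grouping step only. It is not the SVS implementation; the function name `split_by_category`, the column names, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def split_by_category(rows, x_key, y_key, category_key):
    """Group (x, y) points by the value of a categorical column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_key]].append((row[x_key], row[y_key]))
    return dict(groups)


# Hypothetical joined spreadsheet: two numeric columns plus one phenotype column.
rows = [
    {"sample": "S1", "PC1": 0.12, "PC2": -0.03, "ethnicity": "CEU"},
    {"sample": "S2", "PC1": 0.10, "PC2": -0.01, "ethnicity": "CEU"},
    {"sample": "S3", "PC1": -0.20, "PC2": 0.05, "ethnicity": "YRI"},
]

groups = split_by_category(rows, "PC1", "PC2", "ethnicity")
for name, points in groups.items():
    # Each group corresponds to one graph item (one color) after a split.
    print(name, points)
```

Each key of `groups` plays the role of one graph item created by Split, which is why the items are initially named after the category values.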

PCA Analysis


[Picture]

Figure 98: Plot of Second Eigenvector vs First Eigenvector – Split on Ethnicity

For PCA Analysis the spreadsheet from which the Plot Viewer is launched should be Principal Components joined with the phenotype information. In the XY Scatter Parameters dialog, select the first principal component for the Independent variable and the second principal component for the Dependent variable.

In the Graph Item Control Filter tab, select the batch/site/ethnicity/gender phenotype column for splitting (see Figure 98).

If there is structure in the plot such that most of the data points corresponding to one batch/site/ethnicity are grouped together, then there is evidence of batch-to-batch, site-to-site, or between-ethnicity variability, and this non-randomness could negatively affect association results. PCA correction using the correct number of principal components will correct for this stratification before analysis.

This process can be repeated for other principal components. When the distribution of the differently colored data points becomes random, this indicates that the number of principal components needed for an accurate correction has been reached.

Gender Verification


[Picture]

Figure 99: Gender Verification – Average Y Chr Intensity vs Average X Chr Intensity, Split on Reported Gender

For gender verification the gender phenotype information will need to be joined to the average X intensity and average Y intensity information. The average X and Y intensities will need to be created using a Python script or other means.
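As an illustration of what such a script has to compute, the sketch below averages each sample's marker intensities separately over the X and Y chromosomes. It is plain Python and does not use the SVS scripting API; the function name, the marker names, and the data layout (one dictionary of marker intensities per sample plus a marker-to-chromosome map) are assumptions made for the example.

```python
from statistics import mean


def average_intensity_by_chromosome(intensities, marker_chromosomes, chromosome):
    """Average one sample's intensities over the markers on one chromosome.

    Missing values (None) are skipped. Returns None if no marker qualifies.
    """
    values = [
        value
        for marker, value in intensities.items()
        if marker_chromosomes.get(marker) == chromosome and value is not None
    ]
    return mean(values) if values else None


# Hypothetical marker map and per-sample intensity data.
marker_chromosomes = {"rs1": "X", "rs2": "X", "rs3": "Y", "rs4": "Y", "rs5": "1"}
samples = {
    "S1": {"rs1": 1.0, "rs2": 1.2, "rs3": 0.1, "rs4": 0.3, "rs5": 0.9},
    "S2": {"rs1": 0.5, "rs2": 0.7, "rs3": 0.8, "rs4": None, "rs5": 1.1},
}

# One (average X intensity, average Y intensity) pair per sample.
averages = {
    name: (
        average_intensity_by_chromosome(data, marker_chromosomes, "X"),
        average_intensity_by_chromosome(data, marker_chromosomes, "Y"),
    )
    for name, data in samples.items()
}
print(averages)
```

The two resulting values per sample are what would be stored as the Average X Intensity and Average Y Intensity columns before joining with the gender phenotype.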

Once the spreadsheet is in the correct format, in the XY Scatter Parameters dialog select the Average X Intensity for the Independent variable and the Average Y Intensity for the Dependent variable. See Figure 99 for an example of a gender concordance plot. In this example there are several reported females who exhibit apparent mosaicism and should be considered for exclusion from the analysis.

In the Graph Item Control Filter tab, select the reported gender phenotype column for splitting.

There should be two clusters of data points. The upper left cluster will correspond to patients with average intensities consistent with male intensities, and the lower right cluster will correspond to patients with average intensities consistent with female intensities. Outliers and data points that fall in the wrong cluster for their reported gender are suspect and should be examined for accuracy or dropped from analysis.
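The visual check described above can also be approximated with a simple rule. The sketch below assigns each sample to the upper left (male) or lower right (female) region using two cutoffs and lists the samples whose region disagrees with the reported gender. The cutoff values, the function names, and the "M"/"F" coding are hypothetical; in practice the cutoffs would be read off the plot for the particular data set.

```python
def infer_gender(avg_x, avg_y, x_cutoff, y_cutoff):
    """Return "M", "F", or None for a point outside both clusters."""
    if avg_x < x_cutoff and avg_y >= y_cutoff:
        return "M"  # upper left: low X intensity, high Y intensity
    if avg_x >= x_cutoff and avg_y < y_cutoff:
        return "F"  # lower right: high X intensity, low Y intensity
    return None  # outlier relative to both clusters


def flag_suspect_samples(samples, x_cutoff, y_cutoff):
    """List samples whose inferred cluster disagrees with the reported gender."""
    suspects = []
    for name, (avg_x, avg_y, reported) in samples.items():
        inferred = infer_gender(avg_x, avg_y, x_cutoff, y_cutoff)
        if inferred != reported:
            suspects.append((name, reported, inferred))
    return suspects


# Hypothetical samples: (average X intensity, average Y intensity, reported gender).
samples = {
    "S1": (0.6, 0.8, "M"),
    "S2": (1.1, 0.2, "F"),
    "S3": (0.6, 0.8, "F"),  # falls in the male cluster but is reported female
    "S4": (1.1, 0.7, "F"),  # falls outside both clusters
}

suspects = flag_suspect_samples(samples, x_cutoff=0.9, y_cutoff=0.5)
print(suspects)
```

A rule like this only produces a list of candidates; the flagged samples still need to be examined on the plot before any are dropped.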

This procedure can be performed if only the X chromosome is available. In this case, sort the Average X Intensity column in Ascending order, and then select Plot > Plot Numeric Values and select the Average X Intensity for the Dependent variable.

Multi-Color Manhattan Plots

[Picture]

Figure 100: Manhattan Plot of -Log10 P Values from Association Test on Simulated Phenotype of 90 CEU HapMap Samples

For Multi-Color Manhattan Plots, the -log10 p-value from an association test is plotted so that every chromosome is a different color. This requires splitting on chromosome number.

The first step is to import the appropriate genetic marker map as a spreadsheet. To do so see Importing Genetic Marker Map as Spreadsheet.

Select Plot > Plot Numeric Values and in the Numeric Value Plot Parameters dialog select the desired p-value column, preferably the -log10 version if available, as the Dependent variable. For an example of a Manhattan Plot, see Figure 100. In this case the dependent variable of the association test was Gender, so the significant results in this plot indicate markers that are associated with gender.

Select the graph item in the Graph Control Tree structure in the Graph Control Interface. Set the desired graph item behavior. Then, in the Graph Item Controls Filter tab, select chromosome in the column selector list box. Click Split. This will create a graph item for every chromosome in the association test results, each in a different color. Colors can be changed by editing the individual graph items, and graph items can be renamed by right-clicking on the graph item in the Graph Control Tree structure. A legend can be added to the graph by clicking on the graph name in the Graph Control Tree and checking the Legend box in the Annotation Tracks tab window.
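For readers who want to see the data preparation behind a Manhattan plot, the sketch below converts p-values to -log10 values and groups the points by chromosome, which mirrors the split on the chromosome column. It is a plain Python illustration under assumed inputs, not how SVS builds the plot; the function name and the `(chromosome, position, p_value)` record layout are hypothetical.

```python
import math
from collections import defaultdict


def manhattan_series(results):
    """Group (position, -log10 p) points by chromosome.

    results: iterable of (chromosome, position, p_value) records.
    Records with a missing or non-positive p-value are skipped because
    -log10 is undefined for them.
    """
    series = defaultdict(list)
    for chromosome, position, p_value in results:
        if p_value is None or p_value <= 0:
            continue
        series[chromosome].append((position, -math.log10(p_value)))
    for points in series.values():
        points.sort()  # order markers by position within each chromosome
    return dict(series)


# Hypothetical association test results.
results = [
    ("1", 2000, 0.01),
    ("1", 1000, 0.5),
    ("2", 1500, 1e-8),
    ("X", 300, 0.001),
]

series = manhattan_series(results)
for chromosome, points in series.items():
    # One series per chromosome corresponds to one colored graph item.
    print(chromosome, points)
```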

Q-Q Plot or P-P Plot

[Picture]

Figure 101: Plot of Observed versus Expected -Log10 P-Values

If “Output Data for P-P/Q-Q Plots” is selected in an analysis window, then columns for plotting this information are output in the association test results spreadsheet.

A Q-Q plot is a plot of the observed quantiles versus the expected quantiles. A P-P plot is a plot of the observed p-values versus p-value rank, or of log10(p-value) versus log10(rank). To plot these values, select Plot > XY Scatter Plot (see Figure 101). In the XY Scatter Parameters dialog, select the desired expected value as the Independent variable and the respective observed value as the Dependent variable.

In the Plot Viewer, to add a y = x line, click on the graph name in the Graph Control Tree in the Graph Control Interface. From the Graph Controls, select the “Add Item” tab and select the f(x) = m(x) + b item. The default line is the y = x line.
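To make the relationship between the expected and observed columns concrete, the sketch below builds the points of an observed versus expected -log10 p-value plot. It assumes one common convention, in which the expected value for the p-value of rank i among n is (i - 0.5) / n under the null hypothesis. This is an assumption made for illustration; the columns output by SVS may be computed differently. The function name `qq_points` is hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10 p-value pairs, smallest p first.

    Assumes the expected p-value for rank i of n is (i - 0.5) / n.
    Missing values and values outside (0, 1] are dropped.
    """
    observed = sorted(p for p in p_values if p is not None and 0 < p <= 1)
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected = (rank - 0.5) / n
        points.append((-math.log10(expected), -math.log10(p)))
    return points


# Hypothetical p-values from an association test.
points = qq_points([0.5, 0.05, 0.9, 0.2])
for expected, observed in points:
    print(round(expected, 3), round(observed, 3))
```

Points that lie close to the y = x line indicate observed p-values that agree with the null expectation, which is why the reference line is useful on this plot.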