Tutorial: SNP Analysis

Updated: November 2009

Level: Beginner

Modules: HelixTree, WGA, Regression

The following tutorials are designed to systematically introduce you to a few of the new features and enhancements in SVS 7, particularly focusing on SNP association. They are not meant to replicate all the workflows you might use in a complete analysis, but instead touch on a sampling of the more typical scenarios you may come across in your own studies.

7. REGRESSION ANALYSIS

Now that you have uncovered the top hits, you can use them in a full vs. reduced model regression analysis to see how much they add to the explanation of case/control status beyond what other associated phenotypic variables already explain.

A. Isolating SNPs for Full Model

At this point you are only interested in the top hits that passed a whole genome chi-squared test, so you need to subset them out from the original filtered spreadsheet.

  • Open the Association Tests (Basic Allelic Tests) spreadsheet. Right-click on the Chi-Squared P column and select Sort Ascending.

For this tutorial only the top two hits will be considered as they were the only ones to meet a 0.05 Bonferroni significance level.
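The Bonferroni cutoff used above is the significance level divided by the number of tests performed. A minimal Python sketch of that selection, assuming made-up marker names and p-values (none of them come from the tutorial data) and a hypothetical helper named bonferroni_hits:

```python
def bonferroni_hits(p_values, alpha=0.05):
    """Return (marker, p) pairs passing a Bonferroni cutoff, sorted ascending by p."""
    threshold = alpha / len(p_values)  # per-test cutoff: alpha / number of tests
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    return [(marker, p) for marker, p in ranked if p <= threshold]


# Hypothetical p-values for illustration only; not taken from the tutorial data.
example = {"snp_a": 2e-9, "snp_b": 0.5, "snp_c": 1e-3, "snp_d": 0.2, "snp_e": 0.04}
hits = bonferroni_hits(example)
print(hits)
```

With five tests the per-test cutoff is 0.05 / 5 = 0.01, so snp_e (p = 0.04) would pass an uncorrected 0.05 threshold but not the corrected one. In a whole genome scan the divisor is the number of markers tested, which makes the cutoff far stricter.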

  • Go to Select >Row >Inactivate All Rows, which turns all rows gray (inactive).
  • Next, left-click on the row labels for the first and second SNPs to turn them black (active).

A new spreadsheet, Association Tests (Basic Allelic Tests) - Mapped Sheet 2, is created. This spreadsheet will be used to isolate only these two markers in the original filtered genotype spreadsheet.

  • Open the HM_All - Active Subset spreadsheet and go to Select >Activate Columns by Row Labels.
  • Choose the Association Tests (Basic Allelic Tests) - Mapped Sheet 2 spreadsheet (just created) and click OK.

A new spreadsheet tab is created, HM_All - Mapped Sheet 3, with only the two significant SNP columns active. Since it is hard to see this with so many columns, you can search for those SNPs in the spreadsheet to confirm.
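Conceptually, activating columns by row labels is a name match: a column stays active only if its name appears among the row labels of the other spreadsheet. The sketch below illustrates that idea in Python. It is not how SVS implements the feature; the function name is hypothetical, and SNP_A-0000001 is a placeholder marker invented for the example.

```python
def activate_columns_by_row_labels(column_names, row_labels):
    """Map each column name to True (active) or False (inactive)."""
    wanted = set(row_labels)
    return {name: name in wanted for name in column_names}


# Hypothetical column layout; SNP_A-0000001 is a made-up placeholder marker.
columns = ["C/C", "Age", "SNP_A-0000001", "SNP_A-4236717", "SNP_A-4194530"]
top_hits = ["SNP_A-4236717", "SNP_A-4194530"]

state = activate_columns_by_row_labels(columns, top_hits)
active = [name for name, is_active in state.items() if is_active]
print(active)
```

Note that the phenotype columns (C/C, Age) come out inactive in this sketch, which matches the need to reactivate them by hand in the next steps.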

  • Select Edit >Find. Enter SNP_A-4236717 in the text box. Make sure Column Names is selected under Find in: and click OK. If desired, search for the second significant SNP, SNP_A-4194530.

In order to perform regression, the phenotypic variables need to be activated.

Figure 13. Regression Analysis Window


  • Scroll to the beginning of the HM_All - Mapped Sheet 3 spreadsheet.
  • Left-click once on the C/C column header to activate it.
  • Holding the Shift key, scroll to the last phenotypic variable (Med Class) and left-click its column header. This will activate all columns in between.

To clean up the project a little, first create a column subset spreadsheet from these active columns and then create a top-level spreadsheet.

  • Go to Select >Column >Column Subset Spreadsheet.
  • From the resulting subset spreadsheet select File >Create Top-Level Spreadsheet. Give it the name HM_All Regression and click OK.
  • Close all other windows except the newly created HM_All Regression - Sheet 1 spreadsheet.

B. Recoding Genotypes

In order to perform regression analysis on genotypic variables, you need to recode them as integers based on a specified genetic model.

  • Open the HM_All Regression - Sheet 1 spreadsheet and select Edit >Recode Genotypes.
  • In the Recode Genotypes window, select Encode genotypes numerically based on genetic model: and choose the Additive model.
  • Create the recoded spreadsheet as a child of the Current spreadsheet and click OK.
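
Under an additive model, each genotype is typically replaced by the number of copies of one allele (usually the minor allele) it carries: 0, 1, or 2. The Python sketch below shows that recoding under several assumptions that are not stated in the tutorial: genotypes are written as strings like A_B, a missing call is written ?_?, and the minor allele is determined from the sample itself. The function name is hypothetical, and SVS may handle ties and missing values differently.

```python
from collections import Counter


def recode_additive(genotypes, missing="?_?"):
    """Recode genotype strings such as 'A_B' as minor-allele counts (0, 1, 2).

    Missing calls become None. Assumes a biallelic marker.
    """
    called = [g for g in genotypes if g != missing]
    allele_counts = Counter(allele for g in called for allele in g.split("_"))
    # The minor allele is the less frequent one among called genotypes.
    # A monomorphic marker has no minor allele, so every call is coded 0.
    # On an exact tie, the allele seen first is treated as minor (arbitrary).
    minor = min(allele_counts, key=allele_counts.get) if len(allele_counts) > 1 else None
    return [None if g == missing else g.split("_").count(minor) for g in genotypes]


sample = ["A_A", "A_B", "B_B", "A_A", "?_?", "A_B"]
coded = recode_additive(sample)
print(coded)
```

In this sample, allele B appears 4 times and allele A 6 times, so B is counted: A_A becomes 0, A_B becomes 1, and B_B becomes 2.
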
Figure 14. Regression results


C. Full vs. Reduced Model

  • From the HM_All Regression Numeric Genotypes Additive spreadsheet, left-click the C/C column header once to make it the dependent variable (magenta), then select Analysis >Regression Analysis.
  • Within the Regression Analysis window, click Compute significance of full model vs. reduced model under Model Significance to activate the Reduced Model Regressors section.
  • Click Add Covariate under Reduced Model Regressors. Ctrl-click to select both Age and Weight (Lbs), since they are known to be associated with case/control status (which is based on Sbp), and click Add.
  • Click Close to close the covariate selection window.
  • Add the two SNPs as Full model covariates in the same manner.
  • Make sure the Perform regression with selected covariates only radio button is selected. The resulting window should look like Figure 13.
  • Click Run.

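Upon completion, a Regression Statistics Viewer window will appear (Figure 14) with the results for the full vs. reduced model. Notice that SNP_A-4236717 has an odds ratio of ~2.92: under the additive coding, each additional copy of the counted allele multiplies the estimated odds of being a case by roughly 2.9, with the other regressors held fixed.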
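
For a binary dependent variable such as C/C, one common way to compare a full and a reduced logistic model is a likelihood-ratio test, and an odds ratio is the exponential of a regression coefficient. The Python sketch below illustrates both. It assumes the full model adds exactly two regressors (the two SNPs), which gives 2 degrees of freedom; the log-likelihood values are invented for illustration, the function names are hypothetical, and SVS may compute its model-comparison statistic differently.

```python
import math


def likelihood_ratio_test_df2(loglik_reduced, loglik_full):
    """Likelihood-ratio test for a full model that adds exactly two regressors.

    With 2 degrees of freedom, the chi-squared survival function
    has the closed form exp(-x / 2).
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


def odds_ratio(beta):
    """Convert a logistic regression coefficient into an odds ratio."""
    return math.exp(beta)


# Hypothetical log-likelihoods; not results from the tutorial data.
stat, p_value = likelihood_ratio_test_df2(loglik_reduced=-120.0, loglik_full=-110.0)
print(stat, p_value)

# A coefficient of ln(2.92) corresponds to an odds ratio of about 2.92.
print(odds_ratio(math.log(2.92)))
```

A small p-value here would indicate that the SNPs improve the fit beyond what Age and Weight (Lbs) already explain.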

© 2010 Golden Helix, Inc. All Rights Reserved
