Is there a recommended cutoff for Hardy-Weinberg equilibrium (HWE) and minor allele frequency (MAF) in whole-genome analysis?

We are not sure there are any accepted rules of thumb, so the cautious answer is that it depends.

There are certain population structures in which large departures from HWE are legitimate, as well as regions of the genome prone to copy number deletions that can produce large departures from HWE. If you ignore those and assume the HWE p-values are uniformly distributed (as they would be if every SNP were truly in equilibrium), then by chance alone a 500K dataset would be expected to have 500,000 × .01 = 5,000 p-values below .01. So if you picked a threshold of .01, you would be throwing away about 5,000 SNPs whose HWE p-value is < .01 purely by chance. Some people pick a .001 cutoff instead, so they expect to throw away only about a tenth of a percent of the good data (roughly 500 SNPs), while perhaps removing a much larger share of the bad data. Again, if you had reason to believe the departures from HWE were real, you might modify this.

It is worthwhile to see how many SNPs fail your threshold and whether that number is what statistical chance alone would predict. In a very high-quality dataset, absent the other processes described above, the two numbers would be about the same, and you would have reason to lower your threshold (that is, use a smaller p-value cutoff so that fewer good SNPs are discarded). If, on the contrary, many more SNPs fail than chance predicts, it may be more important to make sure you are getting rid of bad SNPs than to worry about losing a few good ones, in which case the .001 threshold might be adequate.
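
The chance-failure arithmetic and the observed-versus-expected check can be sketched in Python. This is a minimal illustration, not Golden Helix's implementation: it assumes per-SNP genotype counts are already available, uses the simple 1-degree-of-freedom chi-square test for HWE as an assumed choice (exact tests are often preferred when some genotype classes are small), and the helper names `hwe_chisq_pvalue`, `expected_failures`, and `compare_to_chance` are hypothetical.

```python
import math
from typing import Iterable, Tuple


def hwe_chisq_pvalue(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """1-df chi-square test for departure from Hardy-Weinberg equilibrium.

    Assumption: a simple asymptotic test; exact tests behave better
    when some genotype classes contain very few individuals.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 1.0
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def expected_failures(n_snps: int, alpha: float) -> float:
    """SNPs expected below `alpha` by chance alone if HWE p-values are uniform."""
    return n_snps * alpha


def compare_to_chance(pvalues: Iterable[float], alpha: float) -> Tuple[int, float]:
    """Return (observed, expected) counts of SNPs with HWE p-value < alpha."""
    pvalues = list(pvalues)
    observed = sum(1 for pv in pvalues if pv < alpha)
    return observed, expected_failures(len(pvalues), alpha)


if __name__ == "__main__":
    for alpha in (0.01, 0.001):
        n_fail = expected_failures(500_000, alpha)
        print(f"alpha={alpha}: ~{n_fail:,.0f} SNPs fail by chance")
```

Running it prints about 5,000 expected chance failures at .01 and 500 at .001 for a 500K panel. With real data you would pass your own HWE p-values to `compare_to_chance`; an observed count close to the expected one suggests the filter is mostly discarding good SNPs.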

Note that the X chromosome will mostly be out of HWE by definition when males are included: outside the pseudoautosomal regions, males carry a single X and cannot be heterozygous. You may therefore want to use only the females when filtering X-chromosome SNPs.
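
A minimal sketch of females-only filtering for one X-chromosome SNP. It assumes genotypes are coded as alternate-allele dosages (0, 1, 2, or None for missing) and sex as 'M'/'F' per sample; this encoding, the helper names, and the toy data are hypothetical. It also assumes the simple 1-df chi-square HWE test. Males are treated as called homozygous, which is why pooling them depletes heterozygotes.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def genotype_counts(
    genotypes: Sequence[Optional[int]],
    sexes: Sequence[str],
    keep_sex: Optional[str] = "F",
) -> Tuple[int, int, int]:
    """Count (hom_ref, het, hom_alt) among non-missing samples.

    keep_sex="F" keeps females only; keep_sex=None keeps everyone.
    """
    counts = Counter(
        g for g, s in zip(genotypes, sexes)
        if g is not None and (keep_sex is None or s == keep_sex)
    )
    return counts[0], counts[1], counts[2]


def hwe_chisq_pvalue(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Simple 1-df chi-square HWE test (an assumed choice; exact tests suit small counts)."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 1.0
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2.0))


# Made-up toy SNP: five females, then five males (no male heterozygotes).
genotypes = [0, 1, 2, 1, 0, 0, 2, 0, 2, 2]
sexes = ["F"] * 5 + ["M"] * 5

females_only = genotype_counts(genotypes, sexes, keep_sex="F")
everyone = genotype_counts(genotypes, sexes, keep_sex=None)
print("females only:", females_only, hwe_chisq_pvalue(*females_only))
print("all samples:", everyone, hwe_chisq_pvalue(*everyone))
```

On this toy SNP the females-only counts are (2, 2, 1), while pooling the males gives (4, 2, 4) and a noticeably smaller HWE p-value, purely because the males contribute no heterozygotes.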

For minor allele frequency, if you have a small number of cases with the disease, rare alleles could be powerful in association testing, so you would want to be careful about excluding them. Consider the extreme case where a rare allele appears in exactly the cases and not in the controls; power is maximized. I would not go lower than a cutoff of .001. Perhaps .01 or .05 would be reasonable thresholds to consider, particularly if you don't believe the process is driven by rare alleles (say, with balanced numbers of cases and controls) and you wish to reduce the multiple-testing burden by removing SNPs that are either bad or unlikely to be informative.
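
A minimal Python sketch of MAF filtering at the thresholds mentioned above and its effect on the multiple-testing burden. It assumes per-SNP genotype counts are available and keeps SNPs whose MAF is at or above the cutoff (the inclusive boundary is an assumption). A Bonferroni-corrected per-test level is used only as one simple way to show how removing SNPs relaxes the threshold. The SNP IDs and counts are made up for illustration.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (hom_ref, het, hom_alt)


def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Frequency of the less common allele, from genotype counts."""
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if n_alleles == 0:
        return 0.0
    alt_freq = (n_het + 2 * n_hom_alt) / n_alleles
    return min(alt_freq, 1.0 - alt_freq)


def maf_filter(snps: Dict[str, Counts], cutoff: float) -> Dict[str, Counts]:
    """Keep SNPs whose MAF is >= cutoff (inclusive boundary is an assumption)."""
    return {snp: c for snp, c in snps.items() if minor_allele_frequency(*c) >= cutoff}


def bonferroni_alpha(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return family_alpha / n_tests if n_tests else family_alpha


# Hypothetical SNPs with made-up genotype counts.
snps = {
    "snp_common": (40, 40, 20),  # MAF 0.40
    "snp_low": (94, 6, 0),       # MAF 0.03
    "snp_rare": (996, 4, 0),     # MAF 0.002
}

for cutoff in (0.001, 0.01, 0.05):
    kept = maf_filter(snps, cutoff)
    alpha = bonferroni_alpha(len(kept))
    print(f"MAF >= {cutoff}: keep {sorted(kept)}, per-test alpha {alpha:.4g}")
```

Raising the cutoff from .001 to .05 drops the two lower-frequency SNPs here and loosens the per-test threshold accordingly; whether that trade is worth it depends, as above, on whether you expect rare alleles to drive the association.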

© 2010 Golden Helix, Inc. All Rights Reserved
