Assess and Remedy Data Quality

Quality Control

HelixTree provides a variety of features that help you both assess the quality of your data and remedy any problems you find.

Allele and Genotype Frequencies/Counts
Quickly assess the minor allele frequency (MAF) and allele and genotype counts for each marker in your dataset.
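To make these quantities concrete, here is a minimal Python sketch of the counts and MAF for a single biallelic marker. It is an illustration only, not HelixTree's implementation; the function name `allele_summary` and the genotype encoding (two-character strings, with `None` for a missing call) are assumptions made for this example.

```python
from collections import Counter

def allele_summary(genotypes):
    """Return genotype counts, allele counts, and MAF for one biallelic marker.

    `genotypes` is a list of 2-character strings such as "AG"; None marks
    a missing call (an encoding assumed for this sketch).
    """
    called = [g for g in genotypes if g is not None]
    # Sort the two alleles so "AG" and "GA" count as the same genotype.
    genotype_counts = Counter("".join(sorted(g)) for g in called)
    allele_counts = Counter(a for g in called for a in g)
    total = sum(allele_counts.values())
    if len(allele_counts) < 2:
        maf = 0.0  # monomorphic marker, or no called genotypes
    else:
        maf = min(allele_counts.values()) / total
    return genotype_counts, allele_counts, maf

marker = ["AA", "AG", "GA", "GG", "AA", None, "AG", "AA"]
genotype_counts, allele_counts, maf = allele_summary(marker)
# 9 A alleles and 5 G alleles among 7 called genotypes, so MAF = 5/14
```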

Call Rates
Calculate the fraction of called genotypes for each marker. With data from certain providers, you can also set a confidence threshold on import that determines which genotypes are treated as called.
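As a rough illustration of the calculation, the sketch below computes a call rate for one marker and shows how a confidence threshold changes it. The `(genotype, confidence)` pair layout and the function name `call_rate` are assumptions for this example; the formats HelixTree actually imports may differ.

```python
def call_rate(calls, threshold=0.0):
    """Fraction of samples with a called genotype for one marker.

    `calls` is a list of (genotype, confidence) pairs; a genotype of None
    means no call. Calls whose confidence falls below `threshold` are
    treated as missing. Both conventions are assumed for this sketch.
    """
    if not calls:
        return 0.0
    called = sum(
        1 for genotype, confidence in calls
        if genotype is not None and confidence >= threshold
    )
    return called / len(calls)

calls = [("AA", 0.99), ("AG", 0.80), (None, 0.0), ("GG", 0.97)]
rate_all = call_rate(calls)          # 3 of 4 samples called
rate_strict = call_rate(calls, 0.9)  # 2 of 4 samples pass the threshold
```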

Hardy-Weinberg Equilibrium P-Value
Determine how closely the genotype frequencies in your dataset approximate Hardy-Weinberg equilibrium (HWE) by rapidly calculating and plotting HWE p-values for the entire dataset or for subgroups (e.g. cases or controls) within it.
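One common way to obtain an HWE p-value is a Pearson chi-square goodness-of-fit test with one degree of freedom, sketched below in Python. Whether HelixTree uses exactly this test is an assumption; the function name `hwe_chi2_pvalue` is hypothetical.

```python
import math

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for departure from HWE at a biallelic marker."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 1.0  # monomorphic marker: nothing to test
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

p_in_hwe = hwe_chi2_pvalue(25, 50, 25)   # counts match HWE expectations
p_out_hwe = hwe_chi2_pvalue(40, 20, 40)  # strong heterozygote deficit
```

To test a subgroup, pass only the genotype counts observed in that subgroup (for example, controls).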

Fisher’s Exact Test for HWE P-Value
This option displays the Fisher’s Exact Test HWE p-value for each marker, calculated separately for cases, controls, and the total dataset.
See the Formulas and Theories chapter of the HelixTree Manual for a more comprehensive explanation of this statistic.
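For readers who want a feel for the calculation, the sketch below implements one standard exact test for HWE at a biallelic marker: it conditions on the observed allele counts and sums the probabilities of all heterozygote counts that are no more likely than the observed one. This is an assumed formulation for illustration; the formula documented in the HelixTree Manual may differ, and the function names are hypothetical.

```python
import math

def _log_prob(n_ab, n_a, n):
    """Log probability of n_ab heterozygotes given n samples and n_a A alleles."""
    n_b = 2 * n - n_a
    n_aa = (n_a - n_ab) // 2
    n_bb = (n_b - n_ab) // 2
    lf = lambda k: math.lgamma(k + 1)  # log factorial
    return (n_ab * math.log(2) + lf(n) - lf(n_aa) - lf(n_ab) - lf(n_bb)
            + lf(n_a) + lf(n_b) - lf(2 * n))

def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Exact HWE p-value: total probability of outcomes no likelier than observed."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    n_a = 2 * n_aa + n_ab
    # The heterozygote count always has the same parity as the allele count.
    max_het = min(n_a, 2 * n - n_a)
    probs = {h: math.exp(_log_prob(h, n_a, n))
             for h in range(n_a % 2, max_het + 1, 2)}
    observed = probs[n_ab]
    # Small tolerance so that ties with the observed outcome are included.
    p_value = sum(pr for pr in probs.values() if pr <= observed * (1 + 1e-9))
    return min(1.0, p_value)

p_cases = hwe_exact_pvalue(40, 20, 40)     # hypothetical case counts
p_controls = hwe_exact_pvalue(25, 50, 25)  # hypothetical control counts
p_total = hwe_exact_pvalue(65, 70, 65)     # cases and controls combined
```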

Signed HWE Correlation R
This option displays the Signed HWE Correlation R for each marker. This measure shows whether the data for a marker tends towards homozygosity (positive signed R) or towards heterozygosity (negative signed R). It is calculated for each marker separately for cases, controls, and the total dataset.
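One statistic with this sign behavior is the correlation between the two alleles carried by an individual, which for a biallelic marker equals 1 minus the ratio of observed to expected heterozygosity. The Python sketch below uses that definition as an assumption; it is not taken from the HelixTree Manual, and the function name is hypothetical.

```python
def signed_hwe_r(n_aa, n_ab, n_bb):
    """Within-individual allelic correlation for a biallelic marker.

    Positive values indicate an excess of homozygotes relative to HWE,
    negative values an excess of heterozygotes.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 0.0
    p = (2 * n_aa + n_ab) / (2 * n)    # frequency of allele A
    expected_het = 2 * p * (1 - p)     # heterozygosity expected under HWE
    if expected_het == 0:
        return 0.0                     # monomorphic marker
    observed_het = n_ab / n
    return 1.0 - observed_het / expected_het

r_homozygous = signed_hwe_r(40, 20, 40)    # positive: excess homozygotes
r_heterozygous = signed_hwe_r(10, 80, 10)  # negative: excess heterozygotes
r_equilibrium = signed_hwe_r(25, 50, 25)   # zero: counts match HWE
```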

Genotype Gender Check
Dr. Bo Peng of MD Anderson Cancer Center developed a script that predicts the gender of samples from the rate of heterozygosity in their chromosome X genotype data. To use this functionality, download the GenotypeGenderCheck.py script and save it to the ../HelixTree/scriptsHT/user/Spreadsheet/Genetics/ directory. Documentation is available for the script.
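The idea behind a heterozygosity-based check can be sketched in a few lines of Python. This is not Dr. Peng's script: the function name, the genotype encoding, and the 5% threshold are all assumptions chosen for illustration. Samples with a single X chromosome should show almost no heterozygous calls on chromosome X, apart from genotyping errors.

```python
def predict_gender_from_x(x_genotypes, het_threshold=0.05):
    """Predict gender from chromosome X heterozygosity for one sample.

    `x_genotypes` is a list of 2-character genotype strings for markers on
    chromosome X, with None for missing calls. `het_threshold` is an
    illustrative cutoff, not a recommended value.
    Returns (prediction, heterozygosity_rate).
    """
    called = [g for g in x_genotypes if g is not None]
    if not called:
        return "unknown", None
    het_rate = sum(1 for g in called if g[0] != g[1]) / len(called)
    prediction = "male" if het_rate < het_threshold else "female"
    return prediction, het_rate

sample_1 = ["AA", "GG", "CC", "TT", None, "AA"]  # no heterozygous calls
sample_2 = ["AG", "GG", "CT", "TT", "AC", "AA"]  # half the calls heterozygous
result_1 = predict_gender_from_x(sample_1)
result_2 = predict_gender_from_x(sample_2)
```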

SNP Concordance
It is often beneficial to genotype a set of samples more than once to confirm the validity of an assay. The SNP Concordance feature facilitates this by calculating various concordance rates for all SNPs for a given set of samples. You can also use it to check for concordance between the same markers genotyped on different platforms.
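As a simple illustration, a per-SNP concordance rate can be computed as the fraction of samples whose two calls agree, counting only samples called in both runs. The Python sketch below assumes that definition; the data layout, the marker names, and the function name are hypothetical, and HelixTree reports additional concordance rates not shown here.

```python
def snp_concordance(run_a, run_b):
    """Per-marker concordance between two genotyping runs.

    `run_a` and `run_b` map marker name -> list of genotypes in the same
    sample order; None marks a missing call. Returns marker -> rate, or
    None when no sample was called in both runs.
    """
    rates = {}
    for marker in run_a.keys() & run_b.keys():
        pairs = [(a, b) for a, b in zip(run_a[marker], run_b[marker])
                 if a is not None and b is not None]
        if not pairs:
            rates[marker] = None
            continue
        # Compare unordered allele pairs so "AG" matches "GA".
        matches = sum(1 for a, b in pairs if sorted(a) == sorted(b))
        rates[marker] = matches / len(pairs)
    return rates

run_a = {"rs1": ["AA", "AG", "GG", None], "rs2": ["CC", "CT", "TT", "CC"]}
run_b = {"rs1": ["AA", "GA", "GG", "AG"], "rs2": ["CC", "CC", "TT", "CC"]}
rates = snp_concordance(run_a, run_b)
```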

Filtering Markers
Easily exclude markers that deviate from HWE, have a minor allele frequency or call rate below a user-specified threshold, or fail other quality control criteria.
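A threshold-based filter of this kind can be sketched as follows. The statistic names, the default thresholds, and the function name are assumptions for this example; suitable cutoffs depend on your study and platform.

```python
def filter_markers(stats, min_maf=0.01, min_call_rate=0.95, min_hwe_p=1e-4):
    """Return the names of markers that pass all three quality thresholds.

    `stats` maps marker name -> dict with "maf", "call_rate", and "hwe_p"
    entries. The default thresholds are illustrative only.
    """
    kept = []
    for name, s in stats.items():
        if (s["maf"] >= min_maf
                and s["call_rate"] >= min_call_rate
                and s["hwe_p"] >= min_hwe_p):
            kept.append(name)
    return kept

stats = {
    "rs1": {"maf": 0.20, "call_rate": 0.99, "hwe_p": 0.45},
    "rs2": {"maf": 0.005, "call_rate": 0.99, "hwe_p": 0.80},  # MAF too low
    "rs3": {"maf": 0.30, "call_rate": 0.90, "hwe_p": 0.60},   # call rate too low
    "rs4": {"maf": 0.25, "call_rate": 0.98, "hwe_p": 1e-8},   # out of HWE
}
passing = filter_markers(stats)
```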

Haplotype Frequency Estimation
The haplotype frequency viewer enables you to estimate haplotype frequencies for selected loci using both the expectation-maximization (EM) algorithm and the Composite Haplotype Method (CHM). Estimated haplotypes can then be combined to generate a diplotype table for further analysis.
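The EM approach is easiest to see for two biallelic loci, where only double heterozygotes have ambiguous phase. The Python sketch below is a textbook version of that case, written for illustration under an HWE assumption; it is not HelixTree's implementation, it does not cover CHM, and the genotype coding (0, 1, or 2 copies of the alternate allele per locus) is an assumption.

```python
def em_two_locus(genotypes, iterations=100):
    """Estimate two-locus haplotype frequencies with the EM algorithm.

    `genotypes` is a list of (g1, g2) pairs, each value the number of
    copies (0, 1, 2) of the alternate allele at that locus.
    Returns a dict mapping haplotype (a1, a2) -> estimated frequency.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freq = {h: 0.25 for h in haps}
    if not genotypes:
        return freq
    # Count haplotypes whose phase is unambiguous.
    base = {h: 0.0 for h in haps}
    n_double_het = 0
    for g1, g2 in genotypes:
        if (g1, g2) == (1, 1):
            n_double_het += 1
            continue
        first = (g1 // 2, g2 // 2)
        second = (g1 - first[0], g2 - first[1])
        base[first] += 1
        base[second] += 1
    total = 2 * len(genotypes)
    for _ in range(iterations):
        # E-step: probability that a double heterozygote carries 00/11
        # rather than 01/10, given the current frequency estimates.
        cis = freq[(0, 0)] * freq[(1, 1)]
        trans = freq[(0, 1)] * freq[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from the expected counts.
        counts = dict(base)
        counts[(0, 0)] += n_double_het * w
        counts[(1, 1)] += n_double_het * w
        counts[(0, 1)] += n_double_het * (1 - w)
        counts[(1, 0)] += n_double_het * (1 - w)
        freq = {h: counts[h] / total for h in haps}
    return freq

hap_freq = em_two_locus([(0, 0), (2, 2), (1, 1)])
```

In this small example the unambiguous samples carry only the 00 and 11 haplotypes, so the estimate for the double heterozygote converges on the same pair.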

Inferring Missing Genotype Data
Though call rates for the latest genotyping platforms consistently exceed 95%, tens of thousands of SNPs may still not receive a call. HelixTree makes it easy to “recover” this data by inferring missing genotypes using an extension of the Expectation-Maximization (EM) algorithm.
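The general idea can be illustrated with a deliberately simplified two-locus case: given haplotype frequencies (for example, EM estimates) and a sample's genotype at one locus, compute the most probable genotype at a neighboring locus where the call is missing. This sketch is not HelixTree's algorithm; the function name, the input format, and the assumption that haplotypes pair at random are all choices made for this example.

```python
from itertools import product

def infer_second_locus(g1, hap_freq):
    """Infer a missing genotype at locus 2 from the genotype at locus 1.

    `g1` is the number of alternate alleles (0, 1, 2) observed at locus 1.
    `hap_freq` maps two-locus haplotypes (a1, a2) -> frequency.
    Returns (most_probable_g2, posterior), or (None, posterior) when g1
    is impossible under the given frequencies.
    """
    posterior = {0: 0.0, 1: 0.0, 2: 0.0}
    # Sum over all ordered haplotype pairs consistent with g1,
    # assuming haplotypes pair at random.
    for h1, h2 in product(hap_freq, repeat=2):
        if h1[0] + h2[0] == g1:
            posterior[h1[1] + h2[1]] += hap_freq[h1] * hap_freq[h2]
    total = sum(posterior.values())
    if total == 0:
        return None, posterior
    posterior = {g: v / total for g, v in posterior.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior

# Hypothetical frequencies with strong linkage between the two loci.
hap_freq = {(0, 0): 0.6, (1, 1): 0.3, (0, 1): 0.05, (1, 0): 0.05}
best_g2, posterior = infer_second_locus(2, hap_freq)
```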
