SVS offers options for performing many different QC functions on genomic data. This blog takes you through some of the most commonly applied filters for various analysis types.
Filters for GWAS data vary depending on the type of association test you are performing. A typical GWAS of common variants usually requires filters to remove problematic or poorly called variants and to eliminate rare variants, which have limited statistical power. The default minor allele frequency (MAF) threshold in SVS is 5%, but users often wish to use lower thresholds (1% or less), especially with larger sample sizes. The default call rate threshold in SVS is 0.95, but it can be adjusted to reflect the call rate that would be considered an outlier in your data. LD pruning to remove correlated SNPs is good practice before running principal components analysis, IBD analysis, or other population-level functions that might be biased by large blocks of redundant SNPs. Most of these functions, together with many others, can be found under the Genotype menu in any SVS spreadsheet.
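To make the MAF and call rate filters described above concrete, here is a minimal Python sketch of the arithmetic behind them. It is not SVS code: the genotype encoding (0, 1, or 2 copies of the alternate allele, with `None` for a no-call), the function names, and the marker names are all assumptions made for illustration. Whether a value exactly at the threshold is kept is also an assumption here.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical encoding: 0, 1, or 2 copies of the alternate allele; None = no-call.
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype at this marker."""
    if not genotypes:
        return 0.0
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Frequency of the less common allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))  # two alleles per diploid sample
    return min(alt_freq, 1 - alt_freq)


def passes_qc(
    genotypes: Sequence[Genotype],
    min_call_rate: float = 0.95,  # mirrors the default quoted in the text
    min_maf: float = 0.05,        # mirrors the default quoted in the text
) -> bool:
    """Keep a marker only if it meets both thresholds (inclusive, by assumption)."""
    return (
        call_rate(genotypes) >= min_call_rate
        and minor_allele_frequency(genotypes) >= min_maf
    )


# Toy data with made-up marker names.
markers: Dict[str, List[Genotype]] = {
    "rs_common": [0, 1, 2, 1, 0, 1, 1, 0, 2, 1],
    "rs_rare": [0] * 19 + [1],
    "rs_low_call": [0, 1, None, None, 2, 1, 0, 1, 1, 0],
}

kept = [name for name, gts in markers.items() if passes_qc(gts)]
print(kept)
```

In this toy data, only the first marker survives: the second has an MAF of 2.5%, and the third has a call rate of 0.8. Lowering `min_maf` to 0.01 would retain the rarer variant, which is the kind of adjustment the text suggests for larger sample sizes.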
Note: See some of our previous blogs and webcasts about GWAS data quality.
When looking at variant calls from DNAseq data, it is necessary to make sure the data used in the analysis are accurate. SVS allows you to use any of the quality metrics contained in your VCF file to filter variants, and it also calculates some other useful QC statistics. To filter on values from the VCF file, the usual starting point is the function Set Genotypes to No-Call Based on Additional Spreadsheets in the DNASeq menu. For example, a VCF file commonly includes information like Read Depths, Allelic Depths, and Genotype Qualities. Each of these additional spreadsheets can be used in various ways to filter your data. Filtering can be based on the zygosity state, a range of values, or a simple value threshold. The function sets a genotype to missing for any sample that does not meet the selected threshold, and it eliminates an entire marker from the dataset if none of the samples for that marker pass the filters. Within SVS you can also calculate useful statistics like Ti/Tv ratios, singleton counts, and genotype concordance rates for duplicates.
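The no-call logic and the Ti/Tv ratio described above can be sketched in a few lines of Python. This is an illustration of the idea, not the SVS implementation: the dictionary layout, the function names, and the thresholds (`min_dp=10`, `min_gq=20`) are hypothetical and are not SVS defaults.

```python
from typing import Dict, List, Optional, Tuple

GenotypeTable = Dict[str, List[Optional[str]]]  # marker name -> per-sample genotype


def mask_low_quality(
    genotypes: GenotypeTable,
    depths: Dict[str, List[int]],
    quals: Dict[str, List[int]],
    min_dp: int = 10,  # illustrative threshold, not an SVS default
    min_gq: int = 20,  # illustrative threshold, not an SVS default
) -> GenotypeTable:
    """Set failing genotypes to None and drop markers with no passing sample."""
    result: GenotypeTable = {}
    for marker, gt_row in genotypes.items():
        masked = [
            gt if gt is not None and dp >= min_dp and gq >= min_gq else None
            for gt, dp, gq in zip(gt_row, depths[marker], quals[marker])
        ]
        if any(gt is not None for gt in masked):
            result[marker] = masked
    return result


PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """Purine<->purine or pyrimidine<->pyrimidine change (assumes ref != alt)."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snvs: List[Tuple[str, str]]) -> float:
    """Transitions divided by transversions over a list of (ref, alt) SNVs."""
    ti = sum(is_transition(ref, alt) for ref, alt in snvs)
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")


# Toy data: two markers, three samples each.
gts = {"var1": ["0/1", "1/1", "0/0"], "var2": ["0/1", "0/0", "0/1"]}
dps = {"var1": [25, 8, 30], "var2": [5, 3, 9]}
gqs = {"var1": [99, 40, 15], "var2": [50, 60, 70]}

filtered = mask_low_quality(gts, dps, gqs)
print(filtered)

ratio = ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")])
print(ratio)
```

Here `var1` keeps only its first genotype (the second sample fails on depth, the third on genotype quality), and `var2` is removed entirely because no sample passes. The four toy SNVs contain three transitions and one transversion.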
Note: For more information on VCF file formats please see Dr. Bryce Christensen’s Blog on the subject.
Quality checking genomic data is at least as important as analyzing the results: if the data are of low quality, a true result may fail to reach significance, or a result may appear significant with no supporting evidence. This has been a brief review of common filters and quality assurance metrics available in SVS. If there is a function that you did not read about here and you would like to know whether SVS will work for your research, please contact our Support Team.
We try to make quality assurance and data filtering a less stressful process in SVS, and I hope these tips and tricks have been helpful. We offer researchers flexible options to choose their own filtering and quality assurance parameters. The parameters in this blog are merely common values and should not be used without considering your own data filtering and quality assurance needs.