Golden Helix is committed to helping you succeed.
All packages of SVS come with our bioinformatic experts at your fingertips. You can rest assured that you have someone to call when you get stuck or have a question.
Next-generation sequencing poses some unique data import and management challenges. Unlike microarrays, where every sample is assayed for the same SNP set, next-generation sequencing generates variant calls unique to each sample. Most of these are considered rare and are not currently catalogued, which makes conventional data import and mapping difficult. SVS makes this easy with streamlined import and mapping of common and standardized formats, such as variant call files (VCF) or variant files from Complete Genomics. Furthermore, you can combine NGS data from multiple sources without having to worry about file format compatibility. If your files contain read depth and quality scores, they will be imported automatically as well. From there, quality assurance, variant filtering, and analysis are as fast as ever.
Similar to the ANNOVAR program, variant classification examines the interactions between variants and gene transcripts in order to classify variants based on their potential effect on genes. Variants are classified according to their position in relation to gene transcripts (intronic, exonic, utr5, etc.). In addition, variants in coding exons are further classified according to their effect on the gene's protein sequence (synonymous, non-synonymous, frameshifting, etc.). This gives insight into which variants are most likely to have functional effects.
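The positional part of this classification can be illustrated with a minimal Python sketch. It is not the SVS or ANNOVAR implementation: the `Transcript` model, its field names, and the restriction to a single forward-strand transcript with half-open intervals are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    """Hypothetical, simplified forward-strand transcript (half-open intervals)."""
    tx_start: int   # first base of the transcript
    tx_end: int     # one past the last base of the transcript
    cds_start: int  # first coding base
    cds_end: int    # one past the last coding base
    exons: list     # list of (start, end) exon intervals

def classify_position(pos, tx):
    """Return a coarse positional class for a single-base variant."""
    if not (tx.tx_start <= pos < tx.tx_end):
        return "intergenic"
    in_exon = any(start <= pos < end for start, end in tx.exons)
    if not in_exon:
        return "intronic"
    if pos < tx.cds_start:
        return "utr5"   # exonic but upstream of the coding sequence
    if pos >= tx.cds_end:
        return "utr3"   # exonic but downstream of the coding sequence
    return "exonic"     # inside a coding exon

# Made-up coordinates for illustration only.
tx = Transcript(100, 1000, 150, 900, [(100, 200), (400, 500), (800, 1000)])
labels = {pos: classify_position(pos, tx) for pos in (50, 120, 160, 300, 950)}
```

A real classifier would also handle reverse-strand transcripts, multiple transcripts per gene, splice sites, and the coding effect of the change (synonymous, non-synonymous, frameshifting).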
After performing extensive quality assurance on your data, the next step is sorting through all your variants to find those that really matter. Though more manageable on a whole-exome scale, this process can be daunting. The 1000 Genomes Project has already catalogued more than 25 million variants, 18 million more than dbSNP 135 (the closest thing to a database for common variants). Each person is expected to have roughly 4 million variants, 20 thousand in coding regions, and 250-300 that are potentially damaging. How do you distinguish the relatively small number of damaging variants from those that are benign?
SVS makes this process easy with filtering by annotation tracks. Using gene tracks, you can filter out variants that fall outside of genes or exons, leaving only those in coding regions. Public database probe tracks, such as dbSNP 135 or 1000 Genomes, enable you to exclude variants considered common.
The dbNSFP Functional Predictions track can be used to filter out variants that are predicted to be tolerated or benign by the following functional predictions: SIFT, PolyPhen2, MutationTaster, GERP++, and PhyloP. One or all of these predictions can be used to measure how likely a variant is to be damaging and to filter out those considered benign. Functional prediction filtering is especially helpful for targeted resequencing projects where you are trying to locate causal variants based on GWAS results. You can also use case-control or familial data to identify variants that are unique to affected individuals.
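As a rough illustration of how these annotation filters combine, the Python sketch below keeps only variants that are in a coding exon, are not common in a reference population, and are called damaging by at least one predictor. The record layout, the field names (`in_coding_exon`, `reference_maf`, `damaging_calls`), and the thresholds are hypothetical and do not reflect the SVS data model.

```python
def keep_variant(variant, max_maf=0.01, min_damaging=1):
    """Return True if a variant survives all three annotation filters."""
    # 1. Gene track filter: drop anything outside coding exons.
    if not variant["in_coding_exon"]:
        return False
    # 2. Public database filter: drop variants that are common in the
    #    reference population (None means the variant is not catalogued).
    maf = variant["reference_maf"]
    if maf is not None and maf > max_maf:
        return False
    # 3. Functional prediction filter: require enough "damaging" calls.
    damaging = sum(1 for is_damaging in variant["damaging_calls"].values() if is_damaging)
    return damaging >= min_damaging

# Made-up example records.
variants = [
    {"id": "v1", "in_coding_exon": True, "reference_maf": None,
     "damaging_calls": {"SIFT": True, "PolyPhen2": False}},
    {"id": "v2", "in_coding_exon": True, "reference_maf": 0.20,
     "damaging_calls": {"SIFT": True, "PolyPhen2": True}},
    {"id": "v3", "in_coding_exon": False, "reference_maf": None,
     "damaging_calls": {"SIFT": True, "PolyPhen2": True}},
    {"id": "v4", "in_coding_exon": True, "reference_maf": 0.001,
     "damaging_calls": {"SIFT": False, "PolyPhen2": False}},
]
kept = [v["id"] for v in variants if keep_variant(v)]
```

Raising `min_damaging` makes the filter stricter by requiring agreement among several predictors.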
SVS provides a wide array of quality assurance measures to ensure your data is of the highest quality and your results are optimized. In addition to standard quality assurance measures - such as SNP and sample filtering on call rate, cryptic relatedness, population stratification, Mendelian error detection, LD pruning, etc. - you can also screen for variants with poor read depths and other quality scores that come with your variant call files.
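Screening on read depth and call rate can be sketched in a few lines of Python. This is only an illustration under assumed inputs: genotypes are encoded as alternate-allele counts for one variant across samples, `None` marks a missing call, and the thresholds of 10 reads and 90% call rate are arbitrary example values, not SVS defaults.

```python
def mask_low_depth(genotypes, depths, min_depth=10):
    """Replace calls backed by fewer than min_depth reads with None (missing)."""
    return [g if d >= min_depth else None for g, d in zip(genotypes, depths)]

def call_rate(genotypes):
    """Fraction of samples with a non-missing call for one variant."""
    return sum(g is not None for g in genotypes) / len(genotypes)

# One variant across four samples (made-up values).
genotypes = [0, 1, 2, 1]
depths = [30, 5, 12, 8]

masked = mask_low_depth(genotypes, depths)
passes_qc = call_rate(masked) >= 0.9  # drop the variant if too many calls are missing
```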
Variant map visualization provides a practical representation of large genotype spreadsheets in the context of sequencing variant analysis. With a quick glance at a variant map, where variants can be colored by allele or any categorical variable (e.g. variant classification), researchers can immediately see areas where sample groups differ, indicating a possible site for further analysis. Adding annotation tracks completes the picture, helping you better understand the relevance of significant findings.
In rare variant analysis, it's hypothesized that rather than there being a single causal variant, multiple variants have a compound effect on the trait of interest, referred to as rare variant burden. Traditional single marker association techniques used in GWAS studies do not have the power to detect rare variants or provide tools for measuring their compound effect. To do this, it is necessary to “collapse” several variants into a single covariate based on regions such as genes.
SVS employs several collapsing methods that enable you to perform association testing with your sequence data. The simplest method creates a binary covariate per gene whereby each sample is assigned a one or zero based on the presence or absence of at least one rare variant in each gene. A slightly more sophisticated approach creates an integer covariate for each gene by counting the number of variants for a given sample in each gene. Using the software's powerful numeric association testing and regression analysis capabilities, you can then perform association testing with these gene-based covariates.
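The two simple collapsing schemes described here can be sketched as follows. The input layout is an assumption made for illustration: each sample maps to a list of alternate-allele counts, `variant_genes` gives the gene for each column, and the matrix is assumed to have been filtered down to rare variants already.

```python
def collapse_by_gene(genotypes, variant_genes):
    """Build per-gene binary and count covariates for every sample.

    genotypes: {sample: [alt-allele count per variant, or None if missing]}
    variant_genes: gene name for each variant column, in the same order
    """
    binary, counts = {}, {}
    for sample, calls in genotypes.items():
        per_gene = dict.fromkeys(variant_genes, 0)
        for call, gene in zip(calls, variant_genes):
            if call is not None and call > 0:
                per_gene[gene] += 1  # sample carries this variant
        counts[sample] = per_gene
        # Binary covariate: 1 if the sample carries at least one variant in the gene.
        binary[sample] = {gene: int(n > 0) for gene, n in per_gene.items()}
    return binary, counts

# Made-up example: four rare variants in two genes, two samples.
variant_genes = ["GENE_A", "GENE_A", "GENE_B", "GENE_B"]
genotypes = {"s1": [1, 1, 2, 0], "s2": [0, 0, 0, 1]}
binary, counts = collapse_by_gene(genotypes, variant_genes)
```

Either covariate table can then be passed to a standard regression or association test, with one column per gene instead of one per variant.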
More advanced methods in SVS are Combined Multivariate and Collapsing (CMC) by Li and Leal and Kernel-Based Adaptive Cluster (KBAC) by Liu and Leal. CMC first bins variants according to a criterion such as minor allele frequency, then collapses the variants within each bin, and finally performs multivariate testing on the counts across the various bins. KBAC differs from CMC in that variant classification and association testing are unified into a single procedure. KBAC models the risk associated with multi-site genotypes rather than collapsing individual genotypes based on specified bins.
Both CMC and KBAC in SVS allow for quantitative phenotypes and correction for covariates and confounders in permutation testing, resulting in fewer false positives. Using one of these approaches gives greater power to detect the significance of rarer variants.
Due to cost, most next-generation sequencing studies thus far have involved a relatively small number of samples compared to traditional GWAS studies. This makes it difficult to calculate an in-sample minor allele frequency (MAF) that reliably indicates how rare a variant is. Variant Frequency Binning by MAF uses the MAFs of an external reference population to classify the variants in your own samples in terms of rarity.
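The idea of binning by an external reference MAF can be sketched in Python as below. The bin edges (1% and 5%), the bin labels, and the choice to label uncatalogued variants as "novel" are illustrative assumptions, not the thresholds or names used by SVS.

```python
from bisect import bisect_right

BIN_EDGES = (0.01, 0.05)                       # hypothetical MAF cut-offs
BIN_LABELS = ("rare", "intermediate", "common")

def maf_bin(reference_maf):
    """Assign a rarity bin from the MAF observed in a reference population."""
    if reference_maf is None:
        return "novel"  # variant absent from the reference population
    # bisect_right counts how many edges are <= the MAF, which indexes the label.
    return BIN_LABELS[bisect_right(BIN_EDGES, reference_maf)]

# Made-up reference frequencies for five variants.
reference_mafs = {"v1": 0.002, "v2": 0.03, "v3": 0.25, "v4": None, "v5": 0.01}
bins = {vid: maf_bin(maf) for vid, maf in reference_mafs.items()}
```

With these cut-offs a variant at exactly 1% falls in the "intermediate" bin, because each bin includes its lower edge.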
Because sequencing may still be cost-prohibitive on a large scale, one unique approach is to combine next-generation sequencing with the latest custom microarray technologies to maximize your GWAS results. By sequencing a modest number of cases, you can use SVS to identify an enriched panel of variants that are "common" in cases, though rarer in the general population. You can then design a more affordable custom microarray panel based on these variants and run it on all samples in typical GWAS fashion.
Read more about combining NGS with microarrays on Our 2 SNPs® »