Golden Helix is committed to helping you succeed.
All packages of SVS come with our bioinformatic experts at your fingertips. You can rest assured that you have someone to call when you get stuck or have a question.
Next-generation sequencing poses some unique data import and management challenges. Unlike microarrays, where every sample is assayed for the same SNP set, next-generation sequencing generates variant calls unique to each sample. Most of these are considered rare and are not currently catalogued, which makes conventional data import and mapping difficult. SVS makes this easy with streamlined import and mapping of common and standardized formats, such as Variant Call Format (VCF) files or variant files from Complete Genomics. Furthermore, you can combine NGS data from multiple sources without having to worry about file format compatibility. If your files contain read depth and quality scores, they will be imported automatically as well. From there, quality assurance, variant filtering, and analysis are as fast as ever.
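To make the read depth and quality score fields concrete, here is a minimal sketch of pulling per-sample genotype (GT), read depth (DP), and genotype quality (GQ) out of a single VCF data line. It is plain Python for illustration only and is not how SVS imports data; the function names and the returned dictionary layout are assumptions.

```python
def _to_int(value):
    """Convert a VCF field to int, treating absent or '.' values as missing."""
    if value is None or value == ".":
        return None
    return int(value)


def parse_vcf_line(line, sample_names):
    """Parse one tab-separated VCF data line (hypothetical helper).

    Columns 1-8 are the fixed VCF fields, column 9 is FORMAT,
    and the remaining columns hold one entry per sample.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt = fields[:5]
    keys = fields[8].split(":")  # e.g. ["GT", "DP", "GQ"]
    calls = {}
    for name, raw in zip(sample_names, fields[9:]):
        # Trailing sub-fields may be omitted, so missing keys become None.
        values = dict(zip(keys, raw.split(":")))
        calls[name] = {
            "GT": values.get("GT"),
            "DP": _to_int(values.get("DP")),
            "GQ": _to_int(values.get("GQ")),
        }
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),
        "calls": calls,
    }


example = "1\t12345\trs99\tA\tG\t50\tPASS\t.\tGT:DP:GQ\t0/1:18:99\t./.:.:."
record = parse_vcf_line(example, ["S1", "S2"])
```

Here `record["calls"]["S1"]` carries a heterozygous call with depth 18, while `S2` is a missing call whose depth and quality are recorded as `None`.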
Similar to the ANNOVAR program, variant classification examines the interactions between variants and gene transcripts in order to classify variants based on their potential effect on genes. Variants are classified according to their position in relation to gene transcripts (intronic, exonic, utr5, etc). In addition, variants in coding exons are further classified according to their effect on the gene's protein sequence (synonymous, non-synonymous, frameshifting, etc). This gives insight into which variants are most likely to have functional effects.
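The position-based part of this classification can be sketched in a few lines of Python. This is a simplified illustration, not the SVS or ANNOVAR implementation: the `Transcript` structure and its field names are assumptions, only forward-strand transcripts are handled, and categories such as splice sites or upstream/downstream regions are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transcript:
    """Hypothetical forward-strand transcript; coordinates are 1-based, inclusive."""
    name: str
    exons: List[Tuple[int, int]]  # (start, end) pairs sorted by position
    cds_start: int                # first coding base
    cds_end: int                  # last coding base


def classify_position(pos: int, tx: Transcript) -> str:
    """Classify a variant position relative to a single transcript."""
    tx_start, tx_end = tx.exons[0][0], tx.exons[-1][1]
    if pos < tx_start or pos > tx_end:
        return "intergenic"
    in_exon = any(start <= pos <= end for start, end in tx.exons)
    if not in_exon:
        return "intronic"
    # Exonic bases outside the coding region are untranslated (UTR).
    if pos < tx.cds_start:
        return "utr5"
    if pos > tx.cds_end:
        return "utr3"
    return "exonic"


tx = Transcript("TX1", [(100, 200), (300, 400), (500, 600)], cds_start=150, cds_end=550)
labels = {pos: classify_position(pos, tx) for pos in (50, 120, 180, 250, 580)}
```

A fuller classifier would go one step further for the `exonic` case and compare the reference and altered codons to decide between synonymous, non-synonymous, and frameshifting changes.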
After performing extensive quality assurance on your data, the next step is sorting through all your variants to find those that really matter. Though more manageable on a whole-exome scale, this process can be daunting. The 1000 Genomes Project has already catalogued more than 25 million variants, 18 million more than dbSNP 135 (the closest thing to a database for common variants). Each person is expected to have roughly 4 million variants, 20 thousand in coding regions, and 250-300 that are potentially damaging. How do you distinguish the relatively small number of damaging variants from those that are benign?
SVS makes this process easy with filtering by annotation tracks. Using gene tracks, you can filter variants outside of genes or exons, leaving only those in coding regions. Public database probe tracks, such as dbSNP 135 or 1000 Genomes, enable you to exclude variants considered common.
The db NS Functional Predictions track can be used to filter out variants that are predicted as tolerated or benign based on the following functional predictions: SIFT, PolyPhen2, MutationTaster, GERP++, and PhyloP. One or all of these predictions can be used to measure how likely a variant is to be damaging and filter out those considered benign. Functional prediction filtering is especially helpful for targeted resequencing projects where you are trying to locate causal variants based on GWAS results. You can also use case-control or familial data to identify variants that are unique to affected individuals.
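The filtering cascade described above (keep coding variants, drop common ones, drop those predicted benign) can be sketched as follows. This is an illustration in plain Python, not the SVS filtering interface: the record fields `in_coding_exon` and `prediction`, the `"D"` label for a damaging prediction, and the example identifiers are all assumptions.

```python
def filter_candidates(variants, common_ids, damaging_labels=("D",)):
    """Return variants that are coding, not catalogued as common, and predicted damaging.

    variants: list of dicts with hypothetical keys 'id', 'in_coding_exon', 'prediction'.
    common_ids: set of identifiers found in a public catalogue of common variants.
    """
    kept = []
    for variant in variants:
        if not variant["in_coding_exon"]:
            continue  # gene-track filter: outside coding regions
        if variant["id"] in common_ids:
            continue  # public-database filter: considered common
        if variant["prediction"] not in damaging_labels:
            continue  # functional-prediction filter: tolerated or benign
        kept.append(variant)
    return kept


variants = [
    {"id": "v1", "in_coding_exon": True, "prediction": "D"},
    {"id": "v2", "in_coding_exon": False, "prediction": "D"},
    {"id": "v3", "in_coding_exon": True, "prediction": "T"},
    {"id": "v4", "in_coding_exon": True, "prediction": "D"},
]
candidates = filter_candidates(variants, common_ids={"v4"})
```

In this toy input only `v1` survives: `v2` lies outside coding exons, `v3` is predicted tolerated, and `v4` is listed as common.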
SVS provides a wide array of quality assurance measures to ensure your data is of the highest quality and your results are optimized. In addition to standard quality assurance measures - such as SNP and sample filtering on call rate, cryptic relatedness, population stratification, Mendelian error detection, LD pruning, etc. - you can also screen for variants with poor read depths and other quality scores that come with your variant call files.
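Two of these measures, read depth screening and call rate filtering, can be sketched with NumPy. This is an illustration under stated assumptions rather than the SVS procedure: genotypes are stored as a samples-by-variants array of minor allele counts, missing calls are represented as NaN, and the thresholds are arbitrary example values.

```python
import numpy as np


def mask_low_depth(genotypes, depths, min_depth=10):
    """Set genotype calls with read depth below min_depth to missing (NaN)."""
    masked = genotypes.astype(float).copy()
    masked[depths < min_depth] = np.nan
    return masked


def call_rate(genotypes, axis):
    """Fraction of non-missing calls; axis=0 gives per-variant, axis=1 per-sample."""
    return 1.0 - np.isnan(genotypes).mean(axis=axis)


def filter_by_call_rate(genotypes, min_variant_rate=0.9, min_sample_rate=0.9):
    """Drop low call rate variants first, then low call rate samples."""
    keep_variants = call_rate(genotypes, axis=0) >= min_variant_rate
    kept = genotypes[:, keep_variants]
    keep_samples = call_rate(kept, axis=1) >= min_sample_rate
    return kept[keep_samples], keep_samples, keep_variants


genotypes = np.array([[0, 1, 2], [0, 1, 0], [1, 0, 0], [0, 0, 1]])
depths = np.array([[20, 5, 30], [25, 4, 30], [30, 3, 12], [22, 40, 15]])
masked = mask_low_depth(genotypes, depths, min_depth=10)
filtered, keep_samples, keep_variants = filter_by_call_rate(masked, 0.75, 0.75)
```

In this toy data the second variant has adequate depth in only one of four samples, so it is dropped, and all four samples are retained.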
Variant map visualization provides a practical representation of large genotype spreadsheets in the context of sequencing variant analysis. With a quick glance at a variant map, where variants can be colored by allele or any categorical variable (e.g. variant classification), researchers can immediately see areas where sample groups differ, indicating a possible site for further analysis. Adding annotation tracks further illustrates a complete picture of variants, helping you better understand the relevance of significant findings.
In rare variant analysis, it's hypothesized that rather than there being a single causal variant, multiple variants have a compound effect on the trait of interest, referred to as rare variant burden. Traditional single marker association techniques used in GWAS studies do not have the power to detect rare variants or provide tools for measuring their compound effect. To do this, it is necessary to “collapse” several variants into a single covariate based on regions such as genes.
SVS employs several collapsing methods that enable you to perform association testing with your sequence data. The simplest method creates a binary covariate per gene whereby each sample is assigned a one or zero based on the presence or absence of at least one rare variant in each gene. A slightly more sophisticated approach creates an integer covariate for each gene by counting the number of variants for a given sample in each gene. Using the software's powerful numeric association testing and regression analysis capabilities, you can then perform association testing with these gene-based covariates.
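The two simple collapsing schemes can be sketched with NumPy as follows. This is a minimal illustration, not the SVS implementation: it assumes a samples-by-variants array of minor allele counts, a gene label per variant, and a precomputed flag marking which variants count as rare. The function and argument names are hypothetical.

```python
import numpy as np


def collapse_by_gene(genotypes, variant_genes, rare_mask):
    """Build per-gene covariates from rare variants.

    genotypes: samples x variants array of minor allele counts (0, 1, 2).
    variant_genes: gene label for each variant column.
    rare_mask: True for each variant considered rare.
    Returns (binary, counts), each a dict mapping gene -> per-sample array.
    """
    variant_genes = np.asarray(variant_genes)
    rare_mask = np.asarray(rare_mask, dtype=bool)
    binary, counts = {}, {}
    for gene in sorted(set(variant_genes[rare_mask].tolist())):
        columns = (variant_genes == gene) & rare_mask
        carries = genotypes[:, columns] > 0           # sample carries the variant
        counts[gene] = carries.sum(axis=1)            # integer covariate
        binary[gene] = (counts[gene] > 0).astype(int) # presence/absence covariate
    return binary, counts


genotypes = np.array([[0, 1, 0, 2],
                      [0, 0, 0, 0],
                      [1, 1, 0, 0]])
binary, counts = collapse_by_gene(
    genotypes,
    variant_genes=["A", "A", "B", "B"],
    rare_mask=[True, True, True, False],
)
```

Either dictionary can then serve as the predictor in a per-gene association or regression test against the phenotype.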
More advanced methods in SVS are Combined Multivariate and Collapsing (CMC) by Li and Leal and Kernel-Based Adaptive Cluster (KBAC) by Liu and Leal. CMC first bins variants according to a criterion such as minor allele frequency, then collapses the variants within each bin, and finally performs multivariate testing on the counts across the various bins. KBAC differs from CMC in that both variant classification and association testing are unified into a single procedure. KBAC models the risk associated with multi-site genotypes rather than collapsing individual genotypes based on specified bins.
Both CMC and KBAC in SVS allow for quantitative phenotypes and the correction of covariates and confounders in permutation testing, resulting in fewer false positives. Using one of these approaches will give greater power to detect the significance of rarer variants.
SVS comes with an array of resources for learning and utilizing the software to get the most out of your data including tutorials, add-on scripts, example data and projects, and much more.
Due to cost, most next-generation studies thus far have involved a relatively small number of samples compared to traditional GWAS studies. This makes it difficult to calculate in-sample minor allele frequency (MAF) to identify how rare a variant is. Variant Frequency Binning by MAF uses the MAFs of an external reference population to classify the variants in your own samples in terms of rarity.
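The idea of binning by an external reference can be sketched as below. This is an illustration only, not the SVS feature itself: the bin names, the 1% and 5% cut-offs, and the choice to place uncatalogued variants in a separate "novel" bin are assumptions.

```python
from bisect import bisect_right


def bin_by_reference_maf(variant_ids, reference_maf, thresholds=(0.01, 0.05)):
    """Assign each variant a rarity bin using MAFs from a reference population.

    reference_maf: dict mapping variant id -> minor allele frequency.
    thresholds: ascending cut-offs; MAF < 0.01 is 'rare', 0.01 <= MAF < 0.05 is
    'low-frequency', and MAF >= 0.05 is 'common' (assumed example values).
    """
    labels = ["rare", "low-frequency", "common"]
    bins = {}
    for variant_id in variant_ids:
        maf = reference_maf.get(variant_id)
        if maf is None:
            bins[variant_id] = "novel"  # absent from the reference population
        else:
            bins[variant_id] = labels[bisect_right(thresholds, maf)]
    return bins


reference = {"rs1": 0.20, "rs2": 0.03, "rs3": 0.004}
bins = bin_by_reference_maf(["rs1", "rs2", "rs3", "chr1:12345"], reference)
```

Because the bins depend only on the reference frequencies, they stay stable even when the study itself has too few samples to estimate MAF reliably.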
Because sequencing may still be cost-prohibitive on a large scale, one approach is to combine next-generation sequencing with the latest custom microarray technologies to maximize your GWAS results. By sequencing a modest number of cases, you can use SVS to identify an enriched panel of variants that are "common" in cases, though rarer in the general population. You can then design a more affordable custom microarray panel based on these variants and run them on all samples in typical GWAS fashion.
The core architecture of SVS has been designed to efficiently handle datasets of virtually any size and type on a desktop computer. SVS natively supports over 70 different file formats and over 40 export formats to streamline data management, ensuring you spend most of your time on the more important aspects of analysis.
Real-time spreadsheet manipulation, data editing, and enrichment help eliminate the hassles of working with large-scale, complex data. Easily combine multiple sample sets and data of different types, from different arrays, or even platforms. Further, an integrated spreadsheet editor facilitates data editing and transformations on a grand scale.
Genomic Build, Marker Map, and Annotation Management
SVS provides a robust set of tools for working with and managing genomic information. Easily switch among a wide variety of supported species and genomic builds and apply genetic marker maps to ensure all analyses and visualizations are accurate based on correct genomic coordinates. Further, genomic annotations can be used to enrich analyses with visualization alongside data for greater context in the genome browser. SVS provides real-time network access to an expanding list of genomic annotations. You can also use your own custom annotations from private sources or public databases, such as UCSC, RefSeq, and Ensembl.
Python is a clear and powerful object-oriented programming language, comparable to Perl, Ruby, Scheme, or Java. SVS gives you programmatic access to most SVS functionality via a Python scripting interface, enabling you to automate workflows, interoperate with other programs, and develop more robust data management and manipulation routines. Also included are the mature statistical and numeric packages NumPy and SciPy, which give you a broad base of standardized test statistics on which to build your own methods, as well as the 2D plotting library matplotlib for generating a nearly limitless number of publication-quality plots and other visualizations.