SNP & Variation Suite | Genomic and Phenotypic Data Analytics

Features

SNP & Variation Suite is a powerful analytic tool created specifically to empower biologists and other researchers to easily perform complex analyses and visualizations on genomic and phenotypic data. With SVS you can focus on your research instead of learning to be a programmer or waiting in line for bioinformaticians.

Numeric Analysis Methods

Principal component analysis for integer or quantitative data
Wave detection/correction
Matched pairs T-Test
Fisher's Exact Test for binary predictors and a binary dependent variable
Derivative log ratio spread
Percentile-based Winsorizing
Segmentation of log-ratio data to detect copy number regions
Standard sample statistics to summarize columns or rows of data
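
As an illustration of percentile-based Winsorizing from the list above, here is a minimal sketch in Python. It is not SVS code: the function name, the nearest-rank percentile rule, and the default 5th/95th percentile cut-offs are assumptions chosen for the example.

```python
import math

def winsorize(values, lower_pct=5.0, upper_pct=95.0):
    """Clamp every value to the [lower_pct, upper_pct] percentile range."""
    ordered = sorted(values)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile (an assumption; other interpolation rules exist).
        rank = max(1, math.ceil(p * n / 100))
        return ordered[rank - 1]

    low, high = percentile(lower_pct), percentile(upper_pct)
    # Values outside the range are pulled in to the boundary, not removed.
    return [min(max(v, low), high) for v in values]
```

Unlike trimming, Winsorizing keeps the number of observations unchanged, which matters when the values must stay aligned with samples or markers.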

Data Management

Display p-value results, raw data and annotation sources all in the same view
Natural pan and zoom controls quickly allow you to zero in on a region of interest
A smart labeling system balances clarity with information density

Support & Extensibility

Technical manual with methods fully documented and explained
Customer support available by phone and e-mail
Training available through live web demonstrations

Genomic Visualization

Efficiently handle micro-array and whole-exome data for thousands of samples on a desktop computer
Scales to whole-genome and imputed datasets

Call CNVs on Large-N Workflows

Call Copy Number Variants on targeted gene panel and exome NGS coverage data
Output for use in association studies and other cohort analyses

SVS-CVS (Clinical Variant Scoring)

Splice Site Predictions around existing splice sites
Novel Splice Site Predictions for exonic and intronic variants not near splice sites
SIFT/Polyphen2 functional predictions precomputed for all missense mutations
PhyloP/GERP conservation scores computable for any genomic variant using a 100-species multiple sequence alignment

Use Cases

GWAS

GWAS continues to be an effective method for identifying disease-susceptibility genes in humans and other organisms. SNP & Variation Suite empowers users to run basic and advanced SNP analyses, incorporating a number of intuitive workflows to lead you beyond single marker associations.

Powerful Genotype Association Testing and Statistics
SVS offers a powerful and straightforward way of testing for genotypic association against either dichotomous or quantitative traits using one or more statistical measures under any one of several genetic model assumptions. These tests can be run individually or simultaneously while also correcting for stratification and applying multiple testing corrections (including permutation testing).
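
To make the idea of permutation testing concrete, the following Python sketch estimates an empirical p-value for one marker by shuffling case/control labels. It is only an illustration under stated assumptions: the test statistic (absolute difference in mean allele count), the function name, and the default number of permutations are hypothetical and are not taken from SVS.

```python
import random

def permutation_p_value(case_values, control_values, n_permutations=1000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    n_cases = len(case_values)

    def abs_mean_diff(cases, controls):
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

    observed = abs_mean_diff(case_values, control_values)
    pooled = list(case_values) + list(control_values)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # break any link between genotype and case status
        if abs_mean_diff(pooled[:n_cases], pooled[n_cases:]) >= observed:
            hits += 1
    # Add-one correction so the estimate can never be exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

For example, `permutation_p_value([2, 2, 1, 2], [0, 1, 0, 0])` compares allele counts (0, 1 or 2) between four cases and four controls.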
Meta-Analysis
Meta-analysis takes the results of two or more GWAS for multiple SNPs or markers, computes standard meta-analysis statistics for each SNP, and compiles the results into one spreadsheet. SVS can perform meta-analysis on results created within SVS, results from third-party software programs, or a combination of the two. Results for a fixed-effects model and a random-effects model, along with tests for heterogeneity between studies, are automatically computed for every meta-analysis performed.
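
The kind of per-SNP statistics involved can be sketched as follows. This is a generic textbook formulation (inverse-variance fixed effects, Cochran's Q and I-squared for heterogeneity, and a DerSimonian-Laird random-effects estimate), written as an assumption about what "standard meta-analysis statistics" means; it is not a description of the SVS implementation, and all names are hypothetical.

```python
import math

def meta_analysis(betas, std_errs):
    """Combine per-study effect sizes and standard errors for one SNP."""
    weights = [1.0 / se ** 2 for se in std_errs]
    w_sum = sum(weights)

    # Fixed-effects (inverse-variance weighted) estimate.
    fixed = sum(w * b for w, b in zip(weights, betas)) / w_sum
    fixed_se = math.sqrt(1.0 / w_sum)

    # Heterogeneity: Cochran's Q and the I^2 statistic.
    q = sum(w * (b - fixed) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    c = w_sum - sum(w * w for w in weights) / w_sum
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects estimate: same weighting with tau^2 added to each variance.
    r_weights = [1.0 / (se ** 2 + tau2) for se in std_errs]
    r_sum = sum(r_weights)
    random_effect = sum(w * b for w, b in zip(r_weights, betas)) / r_sum
    return {
        "fixed": fixed, "fixed_se": fixed_se,
        "q": q, "i2": i2, "tau2": tau2,
        "random": random_effect, "random_se": math.sqrt(1.0 / r_sum),
    }
```

When the studies agree closely, tau^2 is zero and the random-effects result collapses to the fixed-effects result; disagreement between studies widens the random-effects standard error.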
Linkage Disequilibrium and Haplotype Analysis
Interactively explore linkage disequilibrium (LD) and haplotypes in an innovative and powerful interface. You can view LD plots from one or more populations and explore them side-by-side with association results. For haplotype analysis it is easy to define and modify haplotype blocks from an LD plot or spreadsheet, compute haplotype and diplotype frequency tables, and perform a number of haplotype association tests and trend regression, including per-block and per-haplotype methods.
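
For readers new to LD, the pairwise measures shown in an LD plot can be sketched in a few lines of Python. The example assumes phased haplotypes with alleles coded 0/1 at two markers, which is a simplification made for illustration; it does not describe how SVS estimates LD from genotype data, and the function name is hypothetical.

```python
def ld_stats(haplotypes):
    """Compute D, |D'| and r^2 from a list of (allele_a, allele_b) pairs coded 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")

    # D' rescales D by its largest possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else float("nan")
    return {"d": d, "d_prime": d_prime, "r2": r2}
```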
Regression Analysis
SVS incorporates advanced regression technologies that enable you to perform linear and logistic regression, stepwise regression (both backward elimination and forward selection), gene-by-environment interaction regression, and permutation tests with numeric variables and recoded genotypes. You can use a moving window along with numeric or categorical covariates, against a single dependent variable. Regressions may either be performed with all variables and covariates together ("full model") or with some of the covariates grouped into a "reduced model" (yielding a full-vs-reduced model p-value).

Genomic Prediction

From obtaining allele substitution values to building predictive models, SNP & Variation Suite has all the tools for genomic prediction and visualization. Compare and contrast results using the available methods or pick your favorite method. Covariates can be included in every analysis and X-Chromosome correction is also available. SVS simplifies the entire genomic prediction process from data management to model building to visualization.

Genomic Prediction Methods
Methods available in SVS include Genomic Best Linear Unbiased Predictors (GBLUP), Bayes C and Bayes C-Pi. These tools create and find a solution to, or an approximate solution to, one or more sets of mixed linear model equations. The genomic information from the samples is included in every model to obtain a "genomic prediction". Given the available dataset, genomic prediction methods can be used to build a prediction model that explains the association between the genotypes (genetic data) and the phenotype information best. This model can then be used in research to better understand the phenotype, and in commercial applications to improve decision making.
K-Fold Cross Validation
Automatically build training and validation sets within SVS using K-Fold Cross Validation. Account for stratification when picking the samples for each set to ensure balanced sets to obtain the best prediction models. Then run genomic prediction for one or more genomic prediction methods directly from K-Fold Cross Validation to save time and mouse clicks. SVS's K-Fold Cross Validation will also ensure major and minor alleles are consistently encoded through each data subset to ensure consistent direction of effect.
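
The idea of stratified K-fold splitting can be sketched in Python as follows. This is a generic illustration, not the SVS procedure: the function names, the round-robin assignment, and the seed parameter are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_k_fold(sample_ids, strata, k, seed=0):
    """Split samples into k folds while keeping each stratum evenly spread."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for sample_id, stratum in zip(sample_ids, strata):
        by_stratum[stratum].append(sample_id)

    folds = [[] for _ in range(k)]
    offset = 0
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)
        # Deal the shuffled members out round-robin, continuing where the
        # previous stratum stopped so fold sizes stay balanced overall.
        for i, sample_id in enumerate(members):
            folds[(offset + i) % k].append(sample_id)
        offset += len(members)
    return folds

def train_validation_splits(folds):
    """Yield (training, validation) lists, holding out one fold at a time."""
    for i, validation in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, validation
```

Each of the k models is then trained on k-1 folds and evaluated on the held-out fold, so every sample is used for validation exactly once.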
Applying a Prediction Model to New Data
After building a model, apply it to a new dataset to predict the phenotype. If the phenotype values are known this can be used to validate the model. If unknown, this can be used to make decisions based on the genetic data for the samples without phenotype information based on the samples used to build the prediction model. SVS automatically adjusts for strand information to ensure consistent direction of effect between the model used for prediction and the dataset the model is applied to.
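
A simplified sketch of applying a model to new genotypes is given here in Python. It assumes an additive model in which each marker contributes its allele substitution effect times the number of copies of the effect allele, and it handles opposite-strand data by complementing alleles. The data layout, marker names, and function names are hypothetical, and strand-ambiguous A/T and C/G markers are deliberately not resolved here.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def effect_allele_count(genotype, effect_allele, other_allele):
    """Count copies of the effect allele, flipping strand if needed."""
    expected = {effect_allele, other_allele}
    if set(genotype) <= expected:
        return genotype.count(effect_allele)
    # Alleles do not match the model: try the opposite strand.
    flipped = [COMPLEMENT[a] for a in genotype]
    if set(flipped) <= expected:
        return flipped.count(effect_allele)
    raise ValueError("genotype %r does not match model alleles" % (genotype,))

def predict_phenotype(intercept, model, genotypes):
    """model: {marker: (effect_allele, other_allele, effect)};
    genotypes: {marker: two-letter genotype string such as "AG"}."""
    total = intercept
    for marker, (effect_allele, other_allele, effect) in model.items():
        count = effect_allele_count(genotypes[marker], effect_allele, other_allele)
        total += effect * count
    return total
```

For example, with a hypothetical model `{"rs1": ("A", "G", 0.5), "rs2": ("C", "T", -1.0)}` and an intercept of 10.0, a sample with genotypes `{"rs1": "AA", "rs2": "AG"}` is read on the opposite strand at rs2 (one copy of C) and receives a prediction of 10.0.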
Visualization
Visualize the predicted versus actual phenotypes in a cluster plot to gauge the accuracy of the prediction model. Getting to a scatter plot with a trend line is straightforward and you can color the data points by any covariates or by a stratifying variable. The normalized log-transformed allele substitution values are genomic data, and as with all genomic data in SNP & Variation Suite, plotting these values with GenomeBrowse provides the genomic context needed to interpret the markers with the largest influence on the prediction model and to identify key genes. Our live-streaming annotation repository as well as custom annotations for dozens of species can help decipher the significance of any results in the context of your research.

Imputation

Impute missing or incomplete genotypes in your GWAS workflows with SVS's adaptation of the mature BEAGLE 4.1 algorithm that is designed to scale to tens of thousands of samples and whole genome sequencing variation density.

Human & Animal Genomics
If you are studying human populations, we provide publicly available subsets of phased 1000 Genomes genotypes, filtered down to useful frequencies for imputation:

5% allele frequency or greater (8.5 million variants)
1% allele frequency or greater (14.2 million variants)
Allele count greater than 20 (~0.4% allele frequency; 19.5 million variants)

System Requirements
The imputation capability is provided as part of an SVS Server license. The recommended minimum machine for running SVS on a server with imputation is an 8-core machine with 16 GB of RAM. The imputation program is multi-threaded and automatically detects the number of available CPU cores. Runtime depends directly on the number of CPU cores, so large imputation jobs benefit from having as many CPU cores as possible on a single server.

Large Sample DNA-Sequencing Analysis

SNP & Variation Suite includes rare variant analysis tools with region-based collapsing methods for whole-genome and whole-exome DNA next-generation sequencing. For the first time, in a single, integrated desktop solution, you can perform standard variant association workflows for quality assurance and association analysis on hundreds to millions of common and rare variants for thousands of samples.

Data Import and Management
Next-generation sequencing poses some unique data import and management challenges. Unlike microarrays where every sample is assayed for the same SNP set, next-generation sequencing generates variant calls unique to each sample. Most of these are considered rare and are not currently cataloged, which makes conventional data import and mapping difficult. SVS makes this easy with streamlined import and mapping of common and standardized formats such as variant call files (VCF) version 4.0 and higher. Furthermore, you can combine NGS data from multiple sources without having to worry about file format compatibility. If your files contain read depth and quality scores, they can be imported as well. From there, quality assurance, variant filtering, and analysis are as fast as ever.
Variant Classification
Similar to the ANNOVAR program, variant classification examines the interactions between variants and gene transcripts in order to classify variants based on their potential effect on genes. Variants are classified according to their position in relation to gene transcripts (intronic, exonic, utr5, etc). In addition, variants in coding exons are further classified according to their effect on the gene's protein sequence (synonymous, non-synonymous, frame-shifting, etc). This gives insight into which variants are most likely to have functional effects.
Quality Assurance
SVS provides a wide array of quality assurance measures to ensure your data is of the highest quality and your results are accurate. Standard quality assurance measures for small sample or small family exome or whole genome workflows are supported. These include screening out variants with poor read depths and other quality scores from the variant call files, filtering on presence (or absence) in public annotation databases, filtering on minor allele (alternate allele) frequency based on public catalogs, and filtering on whether a variant affects protein coding. For small families, Mendelian error detection is also available.
Variant Filtering
After performing extensive quality assurance on your data, the next step is sorting through all your variants to find those that really matter. Though more manageable on a whole-exome scale, this process can be daunting. SVS makes this process easy with filtering by annotation tracks. Using gene tracks, you can filter variants outside of genes or exons, leaving only those in coding regions. Public database probe tracks, such as dbSNP, 1000 Genomes, NHLBI ESP6500 Exomes, and ClinVar enable you to exclude variants considered common. The dbNSFP NS Functional Predictions track can be used to filter out variants that are predicted as tolerated or benign based on the following functional predictions: SIFT, PolyPhen2, MutationTaster, GERP++, PhyloP and more. One or all of these predictions can be used to measure how likely a variant is to be damaging and filter out those considered benign. Functional prediction filtering is especially helpful for targeted resequencing projects where you are trying to locate causal variants based on GWAS results. You can also use case-control or familial data to identify variants that are unique to affected individuals only.
Rare Variant Burden and Association Testing
SVS employs several collapsing methods that enable you to perform association testing with your sequence data. The simplest method creates a binary covariate per gene whereby each sample is assigned a one or zero based on the presence or absence of at least one rare variant in each gene. A slightly more sophisticated approach creates an integer covariate for each gene by counting the number of variants for a given sample in each gene. Using the software's powerful numeric association testing and regression analysis capabilities, you can then perform association testing with these gene-based covariates.
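
The two collapsing schemes described above can be sketched in Python. The data layout (nested dictionaries keyed by sample and variant), the function name, and the identifiers in the usage note are hypothetical choices for the example rather than SVS structures.

```python
def collapse_by_gene(genotypes, variant_gene, rare_variants):
    """Build per-gene covariates from rare variant calls.

    genotypes: {sample: {variant: alternate allele count}}
    variant_gene: {variant: gene}
    rare_variants: variants to include in the collapsing
    Returns (binary, counts), each shaped {sample: {gene: value}}.
    """
    genes = sorted({variant_gene[v] for v in rare_variants})
    binary, counts = {}, {}
    for sample, calls in genotypes.items():
        per_gene = {gene: 0 for gene in genes}
        for variant in rare_variants:
            if calls.get(variant, 0) > 0:  # sample carries this rare variant
                per_gene[variant_gene[variant]] += 1
        counts[sample] = per_gene
        # Binary covariate: 1 if at least one rare variant is present in the gene.
        binary[sample] = {gene: int(n > 0) for gene, n in per_gene.items()}
    return binary, counts
```

A sample carrying two rare variants in a hypothetical gene G1 and none in G2 gets binary covariates G1=1, G2=0 and count covariates G1=2, G2=0; either table can then be used as the predictor in an association test or regression.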
Variant Frequency by MAF
Due to cost, most next-generation sequencing studies thus far have involved a relatively small number of samples compared to traditional GWAS. This makes it difficult to calculate in-sample minor allele frequency (MAF) to identify how rare a variant is. Variant Frequency Binning by MAF uses the MAFs of an external reference population to classify the variants in your own samples in terms of rarity.
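
A minimal sketch of binning by reference MAF follows. The bin names and the 1% and 5% cut-offs are assumptions chosen for illustration, as is the function name; they are not SVS defaults.

```python
def classify_by_reference_maf(variants, reference_maf, rare_below=0.01, common_from=0.05):
    """Assign each variant a rarity bin using an external reference population.

    reference_maf maps variant IDs to minor allele frequencies; variants
    missing from the reference are reported as "novel".
    """
    bins = {}
    for variant in variants:
        maf = reference_maf.get(variant)
        if maf is None:
            bins[variant] = "novel"
        elif maf < rare_below:
            bins[variant] = "rare"
        elif maf < common_from:
            bins[variant] = "intermediate"
        else:
            bins[variant] = "common"
    return bins
```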

RNA-Sequencing Analysis

SNP & Variation Suite offers advanced analysis tools designed to perform differential expression workflows for RNA expression profiling experiments. Regardless of the upstream secondary analysis tool used to align and quantify reads into weighted counts, SVS provides all the data normalization, differential expression, and visualization techniques needed to be able to conduct RNA sequencing analysis quickly and easily, giving you everything you might expect from expression microarrays.

DESeq Analysis
Taking advantage of analysis techniques developed by Anders and Huber 2010, the DESeq tool is designed to estimate variance-mean dependence in count data and test for differential expression between types using a model based on the negative binomial distribution. DESeq in SVS not only calculates the mean values from your genes or transcripts for each group, but also detects the squared coefficient of variation (SCV). This approach helps to recognize those transcripts with the highest consistency by providing p-values and fold change between each study group while filtering out erratic variations found within certain transcripts.
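
To show what the per-group mean and squared coefficient of variation look like for a single gene, here is a small Python sketch. It uses plain sample moments and a method-of-moments dispersion under a negative binomial variance of mean + dispersion x mean^2. This is a simplification stated as an assumption: DESeq itself normalizes counts and fits a mean-dispersion relationship across genes, neither of which is done here. The function name is hypothetical, and each group needs at least two samples with a nonzero mean.

```python
import math
from statistics import mean, variance

def summarize_gene(counts_a, counts_b):
    """Per-group mean, SCV and moment-based dispersion, plus log2 fold change."""
    summary = {}
    for label, counts in (("a", counts_a), ("b", counts_b)):
        m = mean(counts)
        v = variance(counts)  # sample variance (n - 1 denominator)
        summary[label] = {
            "mean": m,
            "scv": v / m ** 2,  # squared coefficient of variation
            # Variance in excess of the Poisson expectation, scaled by mean^2.
            "dispersion": max(0.0, (v - m) / m ** 2),
        }
    summary["log2_fold_change"] = math.log2(summary["b"]["mean"] / summary["a"]["mean"])
    return summary
```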
Normalization and Log Transformation
Various aspects of the RNA-Seq sample preparation and sequencing process can result in extremely high variance of read counts within and between samples, even when each sample is sequenced to the same target depth. While DESeq has a built-in normalization method, you can also normalize your data as outlined by Bullard et al. 2010. This normalized data can then be used in PCA to see whether your biological factors are driving the primary principal components, or to run association analysis with some of our many supported statistical tests such as T-Test and regression with optional covariates.
Visualization
Advanced visualization can be used to interpret the analysis of your RNA-seq differential expression. Getting to a standard volcano plot showing p-values versus fold-change is a cinch, and you can interactively set thresholds on the data to see which genes show statistical significance and large-magnitude count differences. Your top genes in their normalized form are output from DESeq and can be hierarchically clustered and plotted in a heatmap. The dendrograms on the sample and gene axes provide clear feedback that the undirected clustering followed the biological grouping and that the statistical test provided genes with stark differences in expression between groups.
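
The thresholding behind a volcano plot can be sketched in Python as follows. The input layout, the function name, and the default thresholds (p-value of 0.05 and an absolute log2 fold change of 1) are assumptions for the example, not SVS settings.

```python
import math

def volcano_points(results, p_threshold=0.05, min_abs_log2_fc=1.0):
    """Turn {gene: (p_value, log2_fold_change)} into plottable points.

    Each point is (gene, x, y, highlighted) where x is the log2 fold change
    and y is -log10(p), the usual volcano plot axes.
    """
    points = []
    for gene, (p_value, log2_fc) in results.items():
        y = -math.log10(p_value)
        # Highlight genes that pass both the significance and magnitude cut-offs.
        highlighted = p_value <= p_threshold and abs(log2_fc) >= min_abs_log2_fc
        points.append((gene, log2_fc, y, highlighted))
    return points
```

Genes with a small p-value but a negligible fold change, or a large fold change but a weak p-value, fall outside the highlighted corners of the plot.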

Small Sample DNA-Seq Workflows

SNP & Variation Suite delivers the most powerful rare variant filtering workflows with the latest annotation sources and GenomeBrowse visualization. For the first time, in a single, integrated desktop solution, you can interactively filter hundreds to millions of common and rare variants down to a handful of potentially pathogenic variants.

Data Import and Management
Next-generation sequencing poses some unique data import and management challenges. Unlike microarrays where every sample is assayed for the same SNP set, next-generation sequencing generates variant calls unique to each sample. Most of these are considered rare and are not currently cataloged, which makes conventional data import and mapping difficult. SVS makes this easy with streamlined import and mapping of common and standardized formats such as variant call files (VCF) version 4.0 and higher. Furthermore, you can combine NGS data from multiple sources without having to worry about file format compatibility. If your files contain read depth and quality scores, they can be imported as well. From there, quality assurance, variant filtering, and analysis are as fast as ever.
Quality Assurance
SVS provides a wide array of quality assurance measures to ensure your data is of the highest quality and your results are accurate. Standard quality assurance measures for small sample or small family exome or whole genome workflows are supported. These include screening out variants with poor read depths and other quality scores from the variant call files, filtering on presence (or absence) in public annotation databases, filtering on minor allele (alternate allele) frequency based on public catalogs, and filtering on whether a variant affects protein coding. For small families, Mendelian error detection is also available.
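
Mendelian error detection in a parent-child trio can be illustrated with a short Python check. It assumes biallelic autosomal genotypes written as two-character strings such as "AG"; the function name is hypothetical and the sketch ignores missing calls and sex chromosomes.

```python
from itertools import product

def is_mendelian_error(child, mother, father):
    """Return True if the child's genotype cannot be formed by taking
    one allele from the mother and one allele from the father."""
    child_alleles = sorted(child)
    for maternal, paternal in product(mother, father):
        if sorted((maternal, paternal)) == child_alleles:
            return False  # at least one consistent transmission exists
    return True
```

For example, a child called "GG" with parents "AA" and "AG" is flagged, because the mother cannot transmit a G allele.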
Variant Classification
Similar to the ANNOVAR program, variant classification examines the interactions between variants and gene transcripts in order to classify variants based on their potential effect on genes. Variants are classified according to their position in relation to gene transcripts (intronic, exonic, utr5, etc). In addition, variants in coding exons are further classified according to their effect on the gene's protein sequence (synonymous, non-synonymous, frame-shifting, etc). This gives insight into which variants are most likely to have functional effects.
Variant Filtering
After performing extensive quality assurance on your data, the next step is sorting through all your variants to find those that really matter. Though more manageable on a whole-exome scale, this process can be daunting. The 1000 Genomes Project has already cataloged more than 25 million variants, 18 million more than dbSNP 135 (the closest thing to a database for common variants). Each person is expected to have roughly 4 million variants, 20 thousand in coding regions, and 250-300 that are potentially damaging. How do you distinguish the relatively small number of damaging variants from those that are benign? SVS makes this process easy with filtering by annotation tracks. Using gene tracks, you can filter variants outside of genes or exons, leaving only those in coding regions. Public database probe tracks, such as dbSNP, 1000 Genomes, NHLBI ESP6500 Exomes, and ClinVar enable you to exclude variants considered common. The dbNSFP NS Functional Predictions track can be used to filter out variants that are predicted as tolerated or benign based on the following functional predictions: SIFT, PolyPhen2, MutationTaster, GERP++, PhyloP and more. One or all of these predictions can be used to measure how likely a variant is to be damaging and filter out those considered benign. Functional prediction filtering is especially helpful for targeted resequencing projects where you are trying to locate causal variants based on GWAS results. You can also use case-control or familial data to identify variants that are unique to affected individuals only.
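
The filtering chain described above can be sketched as a simple Python function over variant records. The record fields (`in_coding_region`, `ref_maf`, `predictions`), the 1% frequency cut-off, and the prediction labels are hypothetical stand-ins for whatever annotation tracks supply; this is an illustration of the logic, not the SVS filtering interface.

```python
def filter_variants(variants, max_ref_maf=0.01):
    """Keep coding variants that are not common and not uniformly predicted benign."""
    kept = []
    for variant in variants:
        # Step 1: gene track filter, drop variants outside coding regions.
        if not variant["in_coding_region"]:
            continue
        # Step 2: public catalog filter, drop variants considered common.
        ref_maf = variant.get("ref_maf")
        if ref_maf is not None and ref_maf > max_ref_maf:
            continue
        # Step 3: functional prediction filter, drop variants that every
        # available predictor calls tolerated or benign.
        predictions = variant.get("predictions", {})
        if predictions and all(p in ("tolerated", "benign") for p in predictions.values()):
            continue
        kept.append(variant)
    return kept
```

Requiring all predictors to agree before discarding a variant is the conservative choice; filtering on any single predictor would remove more variants at the risk of losing true positives.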

Copy Number Analysis

SNP & Variation Suite offers a complete set of tools for processing raw intensity data, identifying regions of copy number variation (CNV), visualizing copy number data, and performing association analyses on a variety of copy number covariates. From cytogenetic research to genome-wide copy number association from microarrays, SVS delivers a powerful toolset for correlating common and rare chromosomal aberrations with a disease.

Data Processing
SVS offers direct import of log ratio data from a number of providers including Affymetrix, Agilent, NimbleGen, and Illumina. For Affymetrix CEL files (500K, 5.0, and 6.0), a powerful processing tool enables you to run quantile normalization on the A and B probe intensities, including virtual array generation to merge CN and SNP probes or multiple arrays (e.g. NSP and STY). This process scales to thousands of samples and can use any sample set as a reference.
CNV Association Testing
A number of covariate generation procedures enable you to perform association testing on raw or PCA-corrected log ratios, CNV segment means, and discretized values based on three- and two-state models representing loss, neutral, and gain. Perform numeric association tests or advanced linear and logistic regression with CNV covariates alone or in combination with other genetic markers and phenotypic variables.
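
Discretizing segment means into a three-state covariate can be sketched in Python. The log ratio thresholds of -0.3 and 0.3 and the function name are assumptions for illustration only; appropriate cut-offs depend on the platform and data.

```python
def discretize_segment_means(segment_means, loss_below=-0.3, gain_above=0.3):
    """Map log-ratio segment means to -1 (loss), 0 (neutral) or 1 (gain)."""
    states = []
    for segment_mean in segment_means:
        if segment_mean < loss_below:
            states.append(-1)
        elif segment_mean > gain_above:
            states.append(1)
        else:
            states.append(0)
    return states
```

A two-state covariate can be derived from the same output, for instance by recoding every nonzero state as 1.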
Copy Number Detection with Optimal Segmenting
SVS employs a powerful optimal segmenting algorithm called Copy Number Analysis Method (CNAM) using dynamic programming to detect inherited and de novo CNVs on a per-sample (univariate) and multi-sample (multivariate) basis. Unlike Hidden Markov Models, which assume the means of different copy number states are consistent, optimal segmenting properly delineates CNV boundaries in the presence of mosaicism, even at a single probe level, and with controllable sensitivity and false discovery rate. Optimal segmenting incorporates a parallelized, unbiased randomization permutation procedure that uses all available cores on your computer. The permutation procedure replaces a naïve, potentially biased randomization procedure with the unbiased Fisher and Yates method (also known as the Knuth shuffle). An added option allows you to further refine your segments by efficiently removing univariate outliers during the segmentation process.
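
For reference, the Fisher and Yates (Knuth) shuffle mentioned above can be written in a few lines of Python. This is the standard textbook algorithm, not code from SVS; the function name and the optional `rng` parameter are choices made for the example.

```python
import random

def fisher_yates_shuffle(items, rng=None):
    """Return a uniformly random permutation of items."""
    if rng is None:
        rng = random.Random()
    shuffled = list(items)
    # Walk from the last position down, swapping each element with a
    # randomly chosen element at or before it (randint is inclusive).
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled
```

Restricting the swap partner to positions 0 through i is what makes every permutation equally likely. A common biased variant instead picks the partner from the whole list at every step, which over-represents some orderings.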
Detecting and Correcting for Plate/Batch Effects, Genomic Waves, and other Quality Issues
For both micro-array and aCGH data, significant bias can be introduced by batch effects (plate, machine, and site variation), genomic waves, and population stratification. Other sources of variation include sample extraction and preparation procedures, cell types, temperature fluctuation, and even ambient ozone levels in a lab. These can lead to complications ranging from poorly defined segments to false and non-replicable findings. SVS offers a number of tools to not only detect these data quality problems but also correct for them.

Recommended Learning Materials

We have a variety of supplemental learning materials that are an excellent resource for anyone interested in the industry or our software solutions. Here are some of our recommended materials for you to check out related to SVS!

eBooks

Check out our eBooks on a variety of exciting topics.

Webcasts

Watch an informative webcast showing SVS in action!

Other Resources

Explore a GWAS project in SVS or follow along with a tutorial!

SVS Viewer:
Download Here

Introduction to SVS:
Download Here

Contact Sales

Our sales team is ready to set you up on an SVS license! Contact us today to begin analyzing your data with the various features of SVS.

Technical Specifications

SVS is on-premises software, ensuring full control over installation and data management. It is compatible with various deployment environments including workstations, server setups with remote desktop access, and private cloud servers.

The software is optimized for operation within strict corporate firewalls. It seamlessly integrates with existing web proxy configurations, ensuring uninterrupted functionality in secured network infrastructures. SVS's internet connectivity requirements are minimal. It only needs to connect to a select group of Golden Helix servers. This connection is essential for license verification and accessing annotation data updates.

See System Requirements for more details of hardware and operating systems requirements based on planned workflows.

Related Products

VarSeq

GenomeBrowse
