The fifth installment of SNP & Variation Suite (SVS) 7 fills out the Sequence Analysis Module premiered in version 7.4, giving you more ways to explore and analyze your NGS data to identify variants that matter. The Genome Browser has also received a lot of attention, including the addition of a global and track-based search feature and the ability to immediately visualize the differences among two or more groups by grouping, filtering, and splitting graphs. Enjoy!
SVS 7.4 brought you an entirely new Sequence Analysis Module with the latest advances in tertiary or "sense making" analysis methods for whole-genome and whole-exome DNA next-generation sequencing. Version 7.5 makes it even better.
While version 7.4 introduced the ability to import VCF files standardized by the 1000 Genomes project, SVS 7.5 can now more efficiently import a wider variety of VCF files as well as variant files from Complete Genomics. Furthermore, the import tool has been overhauled to allow for the combination of file formats from various sources without having to worry about compatibility.
After importing, you may want to use the new bi-allelic expansion tool to encode multi-allelic variants (variants with two or more alternate alleles present at a given locus) as multiple bi-allelic variants. Because the mature workflows developed for microarray-based genotype analysis only support bi-allelic columns, multi-allelic variants were previously ignored or dropped.
SVS can now expand the genotype columns into a single column for each alternate allele present in all samples at a given locus, allowing for more comprehensive analysis of NGS data.
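As a rough illustration of the idea (not the SVS implementation), the following Python sketch splits one multi-allelic site into one bi-allelic column per alternate allele. The function name, the input layout, and the rule of recoding every other allele as the reference are all assumptions made for this example.

```python
def expand_multiallelic(ref, alts, genotypes):
    """Split one multi-allelic site into one bi-allelic column per alternate allele.

    ref        reference allele, e.g. "A"
    alts       alternate alleles seen at the locus, e.g. ["T", "G"]
    genotypes  per-sample allele pairs, e.g. [("A", "T"), ("T", "G")]

    Assumption of this sketch: in the column for a given alternate allele,
    any other allele is recoded as the reference allele.
    """
    columns = {}
    for alt in alts:
        columns[alt] = [
            tuple(alt if allele == alt else ref for allele in pair)
            for pair in genotypes
        ]
    return columns


# Three samples at a locus with reference A and alternates T and G.
columns = expand_multiallelic("A", ["T", "G"], [("A", "A"), ("A", "T"), ("T", "G")])
for alt, column in columns.items():
    print(alt, column)
```

Each resulting column contains only two alleles, so it can be passed to methods that expect bi-allelic genotype data.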
Variant map visualization provides a practical representation of large genotype spreadsheets in the context of sequencing variant analysis. With a quick glance at a variant map, where variants can be colored by allele or any categorical variable (e.g. variant classification), researchers can immediately see areas where sample groups differ, indicating a possible site for further analysis. Adding annotation tracks further illustrates a complete picture of variants, helping you better understand the relevance of significant findings.
Similar to the ANNOVAR program, variant classification examines the interactions between variants and gene transcripts in order to classify variants based on their potential effect on genes. Variants are classified according to their position in relation to gene transcripts (intronic, exonic, utr5, etc.). In addition, variants in coding exons are further classified according to their effect on the gene's protein sequence (synonymous, non-synonymous, frameshifting, etc.). This gives insight into which variants are most likely to have functional effects.
Combined Multivariate and Collapsing (CMC) was introduced in SVS in February as an advanced method for analyzing NGS data. A second method from Leal's group, Kernel-Based Adaptive Cluster (KBAC; Liu and Leal), has now also been added. KBAC differs from CMC in that both variant classification and association testing are unified into a single procedure. KBAC models the risk associated with multi-site genotypes rather than collapsing individual genotypes based on specified bins.
But we took them both one step further. Working in conjunction with Baylor College of Medicine, the Golden Helix product team implemented both CMC and KBAC in SVS in a regression framework, allowing for quantitative phenotypes and the correction of covariates and confounders. These new versions are more powerful and result in fewer false positives through the use of permutation testing. Using either of these approaches will give greater power to detect the significance of rarer variants.
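To give a feel for collapsing and permutation testing, here is a deliberately simplified Python sketch. It covers only the collapsing step plus a label-permutation test on carrier rates; it is not CMC, KBAC, or the regression framework described above. The function names, the MAF threshold, and the test statistic are assumptions of the example.

```python
import numpy as np


def collapse_rare(genotypes, maf_threshold=0.01):
    """Collapse rare variants in one bin (e.g. a gene) into a carrier indicator.

    genotypes is a samples x variants matrix of minor-allele counts (0, 1, 2).
    Returns 1 for samples carrying at least one rare variant, else 0.
    """
    g = np.asarray(genotypes)
    maf = g.mean(axis=0) / 2.0          # in-sample minor allele frequency
    rare = maf < maf_threshold
    return (g[:, rare].sum(axis=1) > 0).astype(int)


def permutation_p_value(carrier, is_case, n_perm=1000, seed=0):
    """Permutation p-value for the case/control difference in carrier rate."""
    rng = np.random.default_rng(seed)
    carrier = np.asarray(carrier)
    is_case = np.asarray(is_case, dtype=bool)

    def stat(labels):
        return abs(carrier[labels].mean() - carrier[~labels].mean())

    observed = stat(is_case)
    hits = sum(stat(rng.permutation(is_case)) >= observed for _ in range(n_perm))
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Shuffling the case/control labels keeps the genotype data intact while breaking any real association, which is what lets the permuted statistics stand in for the null distribution.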
» More about Next-Generation Sequencing
A cornerstone of SVS, the Genome Browser received two substantial upgrades in version 7.5 as well as the variant maps discussed above.
In the age of Google, users now demand powerful searching capabilities in every tool they use. Responding to this cry, our engineers built a powerful search engine into the ever-more-powerful Genome Browser. Search for your favorite gene, and you're right there. Search for an rs ID, and you're right there. Search for a reference in an annotation track, and you're right there. Navigation has never been so fun.
With SVS 7.5, users will be able to group and filter variables dynamically "on the fly" to look at multiple dimensions of specificity and perform quality assurance on their data. While users can currently filter in the Genome Browser, it is limited to one variable at a time. Version 7.5 eliminates this constraint to empower visualization to the nth level, saving you time and allowing for a more exploratory experience as you can cluster, sort, categorize, and dig in based on the results you are seeing in real time.

With over 30 new features, the fourth installment of SNP & Variation Suite 7 empowers you to explore your data as never before. Identify rare causal variants with a new Sequence Analysis Module. Improve your GWAS and CNV results with state-of-the-art workflows. Dramatically increase your productivity. And gain greater insights with the most advanced genome browser. This is just a taste of what you'll discover on your way to more impactful research. Enjoy!
What's new: DNA next-gen sequencing | New state-of-the-art GWAS & CNV quality | Genome browser and annotation tracks | GPU accelerated copy number analysis | On-demand advanced method development
INTRODUCING THE FIRST INTEGRATED SOLUTION FOR DNA NEXT-GEN SEQUENCING ANALYSIS
SVS 7.4 brings you an entirely new Sequence Analysis Module with the latest advances in tertiary or "sense making" analysis methods for whole-genome and whole-exome DNA next-generation sequencing. For the first time in a single, integrated desktop solution, you will be able to quickly analyze millions of common and rare variants from tens to thousands of samples to assess their impact on inherited traits.
Zoomed in view of the human reference genome and SIFT Prediction track.
Targeted resequencing, whole-exome, or whole-genome. It doesn't matter. With SVS you can efficiently manage, analyze, and interactively explore millions of variants for thousands of samples.
SVS makes importing variant calls easy with streamlined import and mapping of the most common and standardized formats, such as variant call files (VCF) and SOAPsnp from the Beijing Genomics Institute.
Using genomic annotation tracks such as dbSNP, SIFT, 1000 Genomes, and more, SVS enables you to quickly and easily sift through millions of variants to filter out those that are common, benign, poorly covered, or don’t matter for your study. You can also use case-control or familial data to identify variants that are unique to affected individuals only.
SVS gives you the power to assess the impact of rare variants on your trait of interest when traditional association techniques don't apply. Find genes or regions with an abundance of variants in your sample set. Classify the rarity of variants when your sample size is too small to calculate in-sample minor allele frequency. Assess rare variant burden using powerful collapsing and association methods, including the Combined Multivariate and Collapsing (CMC) method from Li and Leal. And understand the contribution of rare variants with functional prediction.
» More about Next-Generation Sequencing
Laurie et al. from Bruce Weir's group at the University of Washington recently published a definitive paper on quality control and quality assurance methods in genome-wide association studies. We challenged ourselves to provide you with every method they covered. We did that and then some. In addition to the already comprehensive quality assurance procedures available in SVS, here's what's new in 7.4.
Heat map of Identity by Descent matrix sorted by bovine breed.
Related individuals wreak havoc on association tests where independence is assumed. Identity by descent (right) and inbreeding coefficient calculations help you control for unknown or cryptic relatedness in your samples.
To obtain better results when running certain tests you can quickly filter (prune) correlated markers prior to analysis.
Identifying outliers in autosome heterozygosity helps detect contaminated DNA samples (and population stratification in some cases).
Several new methods make it easy to verify that a sample’s reported gender is consistent with its inferred gender. These include X chromosome heterozygosity on genotypes, plotting X versus Y intensity values and averaging log ratio values of the X chromosome (especially helpful for identifying gender anomalies).
PCA plot of study population with outliers identified in green based on multidimensional outlier detection.
Calculating the inter-quartile range (IQR) of a numeric distribution is useful for determining outliers for many quality assurance measurements.
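A minimal Python sketch of IQR-based outlier flagging follows. The 1.5 multiplier is the common Tukey convention and is an assumption here, as are the function name and the sample heterozygosity rates; the thresholds SVS applies may differ.

```python
import numpy as np


def iqr_outliers(values, multiplier=1.5):
    """Return a boolean mask marking values outside the IQR fences."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    return (values < lower) | (values > upper)


# Made-up per-sample heterozygosity rates; the last sample stands out.
het_rates = [0.30, 0.31, 0.29, 0.32, 0.30, 0.45]
print(iqr_outliers(het_rates))
```

Because the fences are built from quartiles rather than the mean and standard deviation, a single extreme sample does not widen them much.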
This feature extends quartile summary statistics to multiple dimensions. You can use it to identify outliers such as samples whose ethnicity does not match that of your study population when examining two or more principal components.
Derivative log ratio spread (DLRS) is a measurement of point-to-point consistency or noisiness in log ratio (LR) data. It correlates with low call rates and over/under abundance of identified copy number segments. Samples with higher values of DLRS tend to have poor signal-to-noise properties and are good candidates to exclude from analysis.
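One common way to compute DLRS is the standard deviation of the differences between adjacent probes, divided by the square root of two. The sketch below uses that definition as an assumption; some tools use a robust (IQR-based) variant instead, and the exact formula in SVS is not stated here.

```python
import numpy as np


def dlrs(log_ratios):
    """Derivative log ratio spread for one sample.

    log_ratios must be ordered by genomic position. Differencing adjacent
    probes cancels genuine copy number level shifts, so the result mostly
    reflects probe-to-probe noise.
    """
    diffs = np.diff(np.asarray(log_ratios, dtype=float))
    return np.std(diffs, ddof=1) / np.sqrt(2)


quiet = [0.00, 0.01, -0.01, 0.00, 0.01]
noisy = [0.0, 0.5, -0.4, 0.6, -0.5]
print(dlrs(quiet), dlrs(noisy))
```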
Detecting large chromosomal aberrations is both a quality assurance step and an analysis step. For example, by averaging log ratios across all autosomal chromosomes you can quickly detect cell line artifacts. But you may also be able to detect large aberrations that are instrumental in detecting disease-causing loci.
Comparing "good" log ratios (top) versus log ratios with a wave effect (bottom).
Genomic waves are ubiquitous in copy number data and can cause inaccuracies with any copy number detection algorithm. SVS employs the method of Diskin et al. (2008) to help you both detect and correct for genomic waves.
Percentile-based winsorizing can be used to prevent segmentation algorithms from being driven by outlier values, resulting in a more accurate determination of regions of copy number variation.
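Percentile-based winsorizing can be sketched in a few lines of Python. The 5th and 95th percentile cut-offs and the function name are assumptions chosen for illustration, not SVS defaults.

```python
import numpy as np


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clip values to the given lower and upper percentiles."""
    values = np.asarray(values, dtype=float)
    low, high = np.percentile(values, [lower_pct, upper_pct])
    return np.clip(values, low, high)


# Made-up log ratios with two extreme probes (3.0 and -2.5).
log_ratios = [0.0, 0.1, -0.1, 0.05, 3.0, -0.05, 0.02, -2.5, 0.0, 0.1]
print(winsorize(log_ratios))
```

Unlike trimming, winsorizing keeps every probe in place and only pulls extreme values in to the percentile limits, so the positional structure that segmentation relies on is preserved.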
"I have a Dell T7500 workstation with 2x Intel x5570 processors and 48GB RAM running ubuntu 10.04 64-bit. Using hyperthreading, 16 CPU cores are available, which I have used for CNAM in the past. Recently I installed a Tesla c2050 GPGPU card and had the chance to run a small comparison on Affymetrix Cytogenetics 2.7M arrays for one chromosome.
The speed up is about 25x, which is quite satisfying. SVS 7 achieves a 99% GPU usage and so makes excellent use of the hardware. The potential time savings are simply invaluable."
- K. Duesing, Research Scientist at the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia
Major enhancements to key copy number analysis workflows help you get the most accurate and informative results significantly faster than before.
By using your computer’s video graphics card, which acts like a mini compute cluster for a fraction of the price, CNV segmenting that used to take hours or days can now be completed in minutes, without compromising accuracy. Internal benchmark tests have shown 5-20x speed increases for univariate segmenting using a GPU over the CPU. Even more exciting is the 10-100x speed increase for multivariate segmenting, which, admittedly, was nearly impossible to use before.
Let's face it. Importing Affymetrix CEL files in the past was a pain. Not so anymore. We have completely revamped Affymetrix CEL file import to be much more streamlined and versatile. You can now easily select all samples as the reference without building a reference spreadsheet. You can also choose to use pre-computed HapMap populations as references. Based on the type of CEL files you're importing, SVS will also automatically identify the proper marker map and annotation files you need. If you don't have them, it will automatically download them for you. And for downstream analysis you have more flexibility in choosing the type of data you can import.
The Affymetrix Cytogenetic Whole Genome 2.7M Array is now fully supported, with enhanced CEL and CYCHP import, downloadable marker maps and library files, and access to pre-computed normalization data built on 485 samples so that you can normalize log ratios on a sample-by-sample basis.
Heat map of univariate segmentation results.
A number of new methods and enhancements are also available once you segment your log ratio data. You can discretize your segment covariates and segment list spreadsheets to categorize segment means into two or three state models. This helps magnify small, statistically significant differences between cases and controls and reduces the influence of outliers. You can also assess the overabundance of segments per sample. An unusually large number of segments is often indicative of data quality problems such as wave effects.
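Discretizing segment means into a three-state model can be sketched as follows. The thresholds of -0.2 and 0.2 and the function name are hypothetical values picked for the example, not recommended settings.

```python
def discretize_segment_mean(mean, loss_threshold=-0.2, gain_threshold=0.2):
    """Map a segment mean (log ratio) to -1 (loss), 0 (neutral), or 1 (gain)."""
    if mean <= loss_threshold:
        return -1
    if mean >= gain_threshold:
        return 1
    return 0


segment_means = [-0.45, -0.05, 0.02, 0.31, 1.8]
print([discretize_segment_mean(m) for m in segment_means])
```

Note how the extreme value 1.8 and the moderate value 0.31 both map to the same state, which is how discretizing limits the influence of outliers.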
After the successful launch of the SVS Genome Browser in v7.3, we immediately began making it more powerful and flexible. Although we have worked hard to provide you with a bountiful set of public reference data through our on-demand network track feature, we realized that we needed to put the power into your hands to convert, create, and visualize any potential annotation information that can help you understand your data.
Annotation track manager with import from Wiggle file selected.
You can now easily customize the genome browser with annotation tracks that matter to you. Support for 2Bit, Wiggle, FASTA, and tabular files enables you to import your own custom annotations or tables from popular online databases such as UCSC, RefSeq, and dbGaP. You can also create any type of annotation from an SVS spreadsheet or download network annotation tracks from Golden Helix and store them locally for speed and efficiency.
You now have immediate access to several new annotation tracks from our network server including probe tracks for dbSNP builds 129, 130, and 131, SNPs catalogued from the 1000 Genomes project, miRNA, and Affymetrix MIP and Cytogenetics array annotations. For rare variant analysis a SIFT track is also available with predictions of how likely a mutation at a given locus is to be damaging.
Whether you're studying human genetics on newer or older builds, or one of many plant and animal species, you can now set the default genome so that you don't have to switch the build every time you open a plot. You can also set default annotation tracks to appear every time a genome browser is opened.
Now included with Python in SVS are the mature statistical and numeric methods packages of NumPy and SciPy, giving SVS a broad base of standardized test statistics and linear algebra. Now both you and our own bioinformaticians supporting you can quickly adapt methods and build custom analyses to solve any unique challenges you encounter. Combined with the powerful interactive features of SVS, Python scripts using these packages are first-class features with polished interfaces, interactions, and logging support. In fact, the Combined Multivariate and Collapsing method was entirely developed in Python!
Several enhancements and new additions will make SVS easier to use and learn. Download complete projects to help learn new analysis tricks and plotting techniques. Access pre-processed public data such as the 1000 Genomes and HapMap to use as references. And easily download a full assortment of Affymetrix and Illumina marker maps. You'll also find a redesigned Regression Analysis window that is more intuitive, as well as some handy dimension data at the top of every spreadsheet so you always know exactly how many samples and variables are represented without having to scroll.

Fully Integrated and Interactive Genome Browser
Static genome browsers are a thing of the past. SVS 7.3 delivers fast, exploratory analysis of your data and genomic annotations simultaneously in a single, coherent view. Real-time network access to an
» More about the Genome Browser
Faster, More Powerful Runs of Homozygosity Analysis
Comparing regions of the genome where long stretches of homozygous markers (Runs of Homozygosity) are present or absent can help identify rare variants involved in recessive, penetrant disorders. SVS 7.3 delivers a faster ROH algorithm with more control over parameters, allowing the detection of longer, more biologically meaningful runs.
Enhanced Data Support for Copy Number and Cytogenetic Research
SVS now offers a full suite of copy number and cytogenetic research tools for all major aCGH and SNP microarray platforms, including Affymetrix, Agilent, NimbleGen, and Illumina. New in SVS 7.3 is streamlined import of NimbleGen Data Summary Files and Affymetrix's Cytogenetics Whole-Genome 2.7M and Molecular Inversion Probe (MIP) Arrays.
Non-Human Genomes
Enhanced Plotting Controls
Creating captivating visualizations just got a whole lot easier. SVS 7.3 offers more control over how images are displayed, saved, and shared, and lets you add as many graphs as you like to a single view from any data source in your project without having to first merge spreadsheets. Combined with annotation tracks from the new Genome Browser, the views you create are sure to make your colleagues jealous.
Accelerated and Enhanced PBAT Analysis
With SVS 7.3 we continue our dedication to working collaboratively with Dr. Christoph Lange of Harvard University School of Public Health to deliver the fastest, most powerful version of PBAT yet. Enhancements include accelerated performance, less restrictive parameters, and more options for family-based association testing.
For a complete list of improvements and bug fixes in v7.3 see the Release Notes section of the SNP & Variation Suite Manual.

SVS now provides integrated tools for the design and analysis of family-based association studies through an exclusive version of the PBAT software package developed by Dr. Christoph Lange of Harvard University's School of Public Health. PBAT incorporates virtually all of the features of the FBAT package also released by Harvard but also provides many additional options for designing association/linkage studies and analyzing data with multiple continuous traits.
» More about Golden Helix PBAT
The latest version of PBAT incorporates a novel test that assesses the genotyping quality of individual probands in family-based association studies. Published in PLoS Genetics [Fardo, 2009], these tests are “ideally suited as the final layer of quality control filters in the cleaning process of genome-wide association studies.” You can also assess Mendelian errors, Hardy-Weinberg Equilibrium, and Call Rates per Marker.
» More about Family-Based QC in PLoS Genetics
A new plotting option enables you to generate heat maps – two-dimensional intensity plots of numeric values – from a spreadsheet. Heat maps are useful for identifying non-random patterns in your data. In addition to other applications, they can be helpful in identifying samples, or groups of samples, with copy number losses and gains. Heat maps can also be plotted alongside other numeric plots (e.g. p-values, CNV segmentation results) as well as LD plots.
Also included in the latest version is a global sample test to detect departures from Hardy-Weinberg Equilibrium within a single proband or case in a population-based association study. This test is especially valuable for genome-wide association studies.
Plots can now be more easily customized for publication, printing, and outputting to PDF with new print and image preview capabilities. Increase the scale and quality of an image, include Full Domain and Genome Track views, save to a variety of graphic formats and more.
» More about Saving and Printing Graphs
For a complete list of improvements and bug fixes in v7.1 see the Release Notes section of the SNP & Variation Suite Manual.

Interactively explore LD and haplotype analysis in an innovative and powerful new interface. You can view LD plots from one or more populations and explore them side-by-side with association results. For haplotype analysis it is easy to define and modify haplotype blocks from an LD plot or spreadsheet, compute haplotype and diplotype frequency tables, and perform a number of haplotype association tests, including per-block and per-haplotype methods.
» More about LD and Haplotype Analysis
Achieve better precision and accelerated speed for detecting copy number variation. CNAM Optimal Segmenting now incorporates a new parallelized, unbiased randomization permutation procedure that uses all available cores on your computer. The new permutation procedure replaces a naïve, potentially biased randomization procedure with the unbiased Fisher and Yates method (also known as the Knuth shuffle). An added option allows you to further refine your segments by efficiently removing univariate outliers during the segmentation process.
» More about CNAM Optimal Segmenting
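For reference, the Fisher-Yates (Knuth) shuffle itself is short. The Python sketch below shows the general algorithm only; it is not the parallelized SVS procedure, and the function name is an assumption.

```python
import random


def fisher_yates_shuffle(items, rng=None):
    """Return an unbiased random permutation of items."""
    if rng is None:
        rng = random.Random()
    shuffled = list(items)
    # Walk from the last position down to the second.
    for i in range(len(shuffled) - 1, 0, -1):
        # Pick j uniformly from 0..i (randint includes both ends).
        j = rng.randint(0, i)
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


print(fisher_yates_shuffle(range(10), random.Random(0)))
```

The key detail is that j is drawn from 0..i rather than from the whole list. Drawing from the whole list at every step is the classic naive variant, and it does not produce all permutations with equal probability.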
The time required for iterative use of Principal Component Analysis (PCA) has been significantly reduced by enabling the “recycling” of pre-computed principal components. This lets you run PCA once and then reuse the principal components in subsequent analyses instead of performing the time-consuming computation each time. Further, new data centering options, by marker and by sample, are now available for numeric data values (such as log ratios), improving the calculation of and correction for principal components.
» More about Principal Component Analysis
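The idea of computing principal components once and reusing them can be sketched with NumPy. This is an illustration under simple assumptions (samples in rows, markers in columns, components taken from an SVD, correction by removing the projection onto them); the function names are hypothetical and the SVS procedure may differ in detail.

```python
import numpy as np


def principal_components(data, n_components, center="marker"):
    """Compute the top sample-space principal components once.

    center="marker" subtracts each column mean; center="sample" subtracts
    each row mean.
    """
    data = np.asarray(data, dtype=float)
    if center == "marker":
        centered = data - data.mean(axis=0)
    else:
        centered = data - data.mean(axis=1, keepdims=True)
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components]          # orthonormal columns


def correct_with_components(values, components):
    """Reuse stored components: remove their projection from values."""
    values = np.asarray(values, dtype=float)
    return values - components @ (components.T @ values)


rng = np.random.default_rng(1)
log_ratios = rng.normal(size=(20, 50))           # made-up data
pcs = principal_components(log_ratios, 2)        # expensive step, done once
corrected = correct_with_components(log_ratios, pcs)
```

Once `pcs` is stored, any later correction needs only the two matrix products in `correct_with_components`, not another decomposition.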
Support for importing and exporting PED, TPED, and BED file formats makes it easy to move your data back and forth between SVS and other genetic analysis programs.
» More about Supported File Types
For a variety of applications, such as imputation and meta-analysis, it is important that two or more datasets represent alleles from the same strand for a given set of markers. Marker maps for Affymetrix and Illumina data (when exporting as Golden Helix DSF from BeadStudio) now include fields for top and bottom strand alleles. This enables you to transcode all genotypic markers from the AB to ACGT formats based on either strand, ensuring consistency among two or more datasets.
» More about Genetic Marker Maps
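A small Python sketch of AB-to-ACGT transcoding follows. The marker map layout, the field names `top` and `bottom`, and the marker ID `rs0001` are hypothetical and exist only for this example; real marker maps will be structured differently.

```python
# Hypothetical marker map: (allele A, allele B) as read on each strand.
MARKER_MAP = {
    "rs0001": {"top": ("A", "G"), "bottom": ("T", "C")},
}


def transcode(marker, ab_genotype, strand="top", marker_map=MARKER_MAP):
    """Convert an AB-coded genotype such as "AB" to nucleotide alleles."""
    allele_a, allele_b = marker_map[marker][strand]
    lookup = {"A": allele_a, "B": allele_b}
    return "/".join(lookup[code] for code in ab_genotype)


print(transcode("rs0001", "AB"))             # top strand alleles
print(transcode("rs0001", "AB", "bottom"))   # bottom strand alleles
```

Transcoding every dataset against the same strand is what makes the resulting nucleotide calls directly comparable.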
Regression results are now more informative with several new regression outputs added to the results spreadsheet (when regressing once on each data column). This makes it easy to both sort and plot on a number of regression-based statistics. Selecting Allele Frequencies under Genotype Statistics now displays the minor and major alleles, in addition to their frequencies, for each genotypic marker.
» More about Regression Analysis
For a complete list of improvements and bug fixes in v7.1 see the Release Notes section of the SNP & Variation Suite Manual.

Anticipating association studies with hundreds of millions of data points generated per sample by next-generation sequencing, the core architecture of SVS 7 has been completely reinvented to efficiently handle datasets of virtually any size on a desktop computer. Smart memory management and data caching ensure you will experience accelerated performance at every step.
Seeing is believing with an intuitive interface that puts your data in genomic context at every step. Discover how rewarding it is to navigate whole genome data live within a spreadsheet - complete with genomic annotations - or visually in a genome browser. For follow up analyses you can quickly look up significant markers in supported online databases. More consistent workflows make performing complex analyses quick, easy, and efficient.
Find more associations with the most extensive collection of genetic association tests, including allele, genotype, haplotype, copy number variation, runs of homozygosity, multi-locus, LD, and regression-based testing. Many tests can be run individually or simultaneously while also controlling for false positives by employing multiple testing corrections and permutation testing. Additional outputs of expected values enable you to generate Q-Q and P-P plots.
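As an illustration of how expected values feed a Q-Q plot, the Python sketch below pairs sorted observed p-values with the quantiles expected under a uniform null. The plotting position (i - 0.5) / n is one common convention and is an assumption here, as is the function name.

```python
import numpy as np


def qq_points(p_values):
    """Return (expected, observed) -log10 p-values for a Q-Q plot."""
    observed = np.sort(np.asarray(p_values, dtype=float))
    n = observed.size
    # Expected quantiles of n uniform p-values under the null hypothesis.
    expected = (np.arange(1, n + 1) - 0.5) / n
    return -np.log10(expected), -np.log10(observed)


expected, observed = qq_points([0.5, 0.01, 0.2, 0.9])
for e, o in zip(expected, observed):
    print(round(e, 3), round(o, 3))
```

Points that rise well above the diagonal across the whole range suggest inflation (for example from stratification), while a departure only in the upper tail is the pattern expected from true associations.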
SVS 7 offers a complete workflow for copy number analysis and related CNV association studies. Process raw intensity data and simultaneously correct for batch effects, genomic waves, and population stratification, while significantly improving signal-to-noise ratios. Employ optimal segmentation to detect copy number segment boundaries both on a per-sample (univariate) and multi-sample (multivariate) basis in the presence of mosaicism, even at a single probe level, and with controllable sensitivity and false discovery rates. Further, calculate CNV covariates for association testing and visualize copy number data in a genome browser.
» More about Copy Number Analysis
A new dynamic analytic visualization tool with integrated genome browser offers exceptional flexibility in how you visualize data and present results. Gain greater insights with unprecedented whole genome views and navigation control. Apply data transformations or analytic functions in real-time. When you finalize the view you want, save your plots to a number of publication quality formats, including scalable vector graphics.
Having collaborated on over twenty SNP and CNV genome-wide association studies, we understand how critical high-quality data is for achieving quality results. Therefore, considerable effort has been made to enhance quality assurance at every step. You can now easily generate a number of genotype statistics, view cluster plots of allele intensities, check gender and marker concordance, perform variance analysis on log ratios, filter poor quality markers and samples, and more.
In addition to standard quality assurance measures, SVS 7 offers a powerful principal component analysis (PCA) approach for both SNP and CNV data to simultaneously correct for batch effects, genomic waves, and population stratification. New enhancements include streamlined plotting of principal components and the ability to correct data using pre-computed principal components from a subset of markers (e.g. ancestry informative markers).
» More about Principal Component Analysis
The sheer size and complexity of whole genome data makes it extremely difficult to work with. SVS 7 eliminates the hassles with real-time spreadsheet manipulation, data editing, and enrichment. Easily combine multiple sample sets and data of different types, from different arrays, or even platforms. Quickly recode genotypes based on a specified genetic model, flip DNA strands, transcode from AB to ACGT formats, and more. Further, an integrated spreadsheet editor facilitates data editing and transformation on a grand scale.
An advanced regression module allows you to perform linear and logistic regression, stepwise regression (both backward elimination and forward selection), and permutation tests with numeric variables and recoded genotypes. Use a moving window along with numeric or categorical covariates, against a single dependent variable. Regressions may either be performed with all variables and covariates together (“full model”) or with some of the covariates grouped into a “reduced model” (yielding a full-vs-reduced model p-value).
» More about Regression Analysis
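For linear regression, a full-vs-reduced model p-value is usually obtained from an F-test on the two residual sums of squares. The Python sketch below shows that standard test using NumPy and SciPy; the function name is an assumption, and it does not cover the logistic case, where a likelihood ratio test would typically be used instead.

```python
import numpy as np
from scipy import stats


def full_vs_reduced_f_test(y, x_reduced, x_full):
    """F-test comparing a reduced linear model with a full one.

    x_reduced holds the covariates only; x_full holds the covariates plus
    the extra predictors (e.g. a recoded genotype). An intercept is added
    to both models.
    """
    y = np.asarray(y, dtype=float)

    def rss_and_rank(x):
        design = np.column_stack([np.ones(len(y)), x])
        beta, _, rank, _ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        return resid @ resid, rank

    rss_r, rank_r = rss_and_rank(x_reduced)
    rss_f, rank_f = rss_and_rank(x_full)
    df_num = rank_f - rank_r            # parameters added by the full model
    df_den = len(y) - rank_f            # residual degrees of freedom
    f_stat = ((rss_r - rss_f) / df_num) / (rss_f / df_den)
    return f_stat, stats.f.sf(f_stat, df_num, df_den)
```

A small p-value indicates that the extra predictors explain variation in the dependent variable beyond what the covariates in the reduced model already account for.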
Automate workflows, incorporate custom methods, or interoperate with other programs. These are just a few examples of how you can enhance the utility of SVS 7 with a fully programmatic Python scripting interface. New to SVS 7 is an integrated Python script editor that makes it easy to read and write scripts, helping even novice users realize the power of scripting.