A scalable, multi-project warehouse for NGS variant call sets, clinical reports and catalogs of variant assessments.


As Precision Medicine takes off, the number of samples in a testing lab and the associated data volume are increasing exponentially. To organize the data and build a knowledge base of cases that can be used for future analysis as well as ongoing research, labs need to leverage state-of-the-art warehousing technology. Building on the algorithms and high-performance storage technology powering the VarSeq® software, VSWarehouse is a scalable, multi-project warehouse for NGS variant call sets, clinical reports and catalogs of variant assessments.

Organize Samples into Projects

Rather than relying on a single large relational model that is costly and mutable, VSWarehouse builds on the high-performance storage technology developed for VarSeq, so your samples can be organized into as many fully versioned projects as needed in a fraction of the space. As new samples are uploaded from VarSeq's integrated VSWarehouse uploader, a background job is queued and run to create a new version of the project.

Scalable Technology

VSWarehouse is built on the Postgres database technology stack with a completely customized and optimized storage and query-execution layer. Taking advantage of the matrix structure of genomic data, a very space-efficient columnar and compressed storage engine allows projects computed with VarSeq's mature NGS data wrangling and annotation algorithms to be stored at a fraction of the size of traditional databases while still allowing for the full power and utility of a mature SQL front-end.
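To illustrate the general idea of columnar, compressed storage, here is a minimal Python sketch. It is not VSWarehouse's storage engine, whose internals are not described here. The function names, the JSON encoding and the use of zlib are all assumptions chosen for the illustration, and the variant table is made up.

```python
import json
import zlib


def pack_columns(rows):
    """Store each field of a list of row dicts as its own compressed column."""
    if not rows:
        return {}
    return {
        field: zlib.compress(json.dumps([row[field] for row in rows]).encode("utf-8"))
        for field in rows[0]
    }


def read_column(packed, field):
    """Decompress one column without touching any of the others."""
    return json.loads(zlib.decompress(packed[field]).decode("utf-8"))


def unpack_rows(packed):
    """Rebuild the original row dicts from the compressed columns."""
    columns = {field: read_column(packed, field) for field in packed}
    if not columns:
        return []
    length = len(next(iter(columns.values())))
    return [{field: columns[field][i] for field in columns} for i in range(length)]


# Made-up variant table: one row per variant, one genotype column per sample.
rows = [
    {"chrom": "chr1", "pos": 1000 + i, "ref": "A", "alt": "G", "S1": 0, "S2": i % 2}
    for i in range(1000)
]
packed = pack_columns(rows)
row_bytes = len(json.dumps(rows).encode("utf-8"))
column_bytes = sum(len(blob) for blob in packed.values())
print(f"uncompressed rows: {row_bytes} bytes, compressed columns: {column_bytes} bytes")
```

Because values within a column tend to resemble each other (the same chromosome, mostly identical genotypes), each column compresses well, and a query that needs only one field can decompress that column alone.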

Variant Assessment Catalogs

VarSeq strives to provide all the high- and low-level details needed for a variant scientist or medical professional to classify or QC variants for a specific sample or presenting phenotype. Our Assessment Catalog feature offers a flexible way to capture lab-specific flags or classifications of variants outside of the single-project context, so they can be used as an annotation source for future projects. VSWarehouse acts as the hosting server for these assessment catalogs, providing a web interface in which to query and manage them.

Central VSReports Hosting

VSReports allows customizable report templates to be completed on a sample-by-sample basis in VarSeq. These sample-level decisions and the rendered report are saved at the project level (and are exportable as HTML or PDF). VSWarehouse preserves the same user experience within VarSeq, but the reports are hosted, saved and indexed on the VSWarehouse server. All reports can then be queried at the variant or sample level, and the rendered reports hosted on the server are ready for download or integration with other internal systems.

Projects as Variant Frequency Annotations

Projects hosted on VSWarehouse can be used as annotation sources in VarSeq to be integrated into your custom variant annotation and interpretation workflow. This allows any new variant to be annotated and potentially filtered with the frequency of that variant in your warehouse projects.
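As a rough sketch of what frequency annotation involves, the Python below computes an alternate allele frequency per variant from a project's genotype calls and uses it to filter new variants. It is an illustration only, not VarSeq's or VSWarehouse's implementation. The function names, the variant key format, the diploid assumption and the 0.5 threshold are all hypothetical.

```python
def allele_frequencies(project):
    """Alternate allele frequency per variant, assuming diploid calls.

    project maps a variant key to {sample: alt allele count (0, 1, 2) or None}.
    Samples with a missing call (None) are left out of the denominator.
    """
    freqs = {}
    for variant, calls in project.items():
        called = [count for count in calls.values() if count is not None]
        freqs[variant] = sum(called) / (2 * len(called)) if called else None
    return freqs


def annotate_and_filter(new_variants, freqs, max_af):
    """Attach the project frequency to each variant and keep the rare ones.

    Variants never seen in the project get a frequency of None and are kept.
    """
    kept = []
    for variant in new_variants:
        af = freqs.get(variant)
        if af is None or af <= max_af:
            kept.append((variant, af))
    return kept


# Made-up project with four samples and two variants.
project = {
    "chr1:1000:A>G": {"S1": 0, "S2": 1, "S3": 0, "S4": None},
    "chr2:500:C>T": {"S1": 2, "S2": 2, "S3": 1, "S4": 1},
}
freqs = allele_frequencies(project)
rare = annotate_and_filter(
    ["chr1:1000:A>G", "chr2:500:C>T", "chr3:42:G>A"], freqs, max_af=0.5
)
```

Here the second variant is common in the project (6 of 8 alleles) and is filtered out, while the first variant and the previously unseen third variant are kept.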

Multiple Interfaces

Without losing a single piece of information in the VCFs, VSWarehouse creates a single annotated matrix of all unique variants for all uploaded samples. This matrix is accessible through multiple interfaces, including a web-based interface, an annotation interface from VarSeq and more.
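The idea of a single matrix of unique variants across samples can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the product's merge logic: it keeps only a genotype string per call, whereas the text above describes retaining everything in the VCFs. The function name and the tuple layout are hypothetical.

```python
def build_variant_matrix(sample_calls):
    """Merge per-sample calls into one row per unique variant.

    sample_calls maps a sample name to a list of
    (chrom, pos, ref, alt, genotype) tuples. The result maps each unique
    (chrom, pos, ref, alt) key to {sample: genotype or None}.
    """
    samples = sorted(sample_calls)
    matrix = {}
    for sample in samples:
        for chrom, pos, ref, alt, genotype in sample_calls[sample]:
            key = (chrom, pos, ref, alt)
            # First sighting of a variant creates a row with every sample unset.
            row = matrix.setdefault(key, {name: None for name in samples})
            row[sample] = genotype
    return samples, matrix


# Made-up calls for two samples that share one variant.
sample_calls = {
    "S1": [("chr1", 1000, "A", "G", "0/1"), ("chr2", 500, "C", "T", "1/1")],
    "S2": [("chr1", 1000, "A", "G", "1/1"), ("chr3", 42, "G", "A", "0/1")],
}
samples, matrix = build_variant_matrix(sample_calls)
```

The four input calls collapse into three rows, and a sample with no call for a variant is recorded as None rather than dropped.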

Use Cases

Managing Massive Amounts of Genetic Data

As Next-Generation Sequencing is taking off in the clinic, it creates a significant data management issue for clinicians, scientists and IT professionals alike.

How can we retain the massive amounts of data coming out of clinical pipelines in a way that enables labs to systematically build a knowledge base capturing the insights clinicians gain on a day-to-day basis while analyzing the genetic information of their patients? What infrastructure is required to alert medical personnel to new research that could potentially alter medical decisions? And how can we embed the work that is being done in the labs into general hospital workflows? Data warehousing is a pivotal technology that can help in all of these areas.

Relevant Data Warehouse Concepts

In a nutshell, a data warehouse integrates the following concepts:

  • Full-scale storage of all genomic variant data across gene panel, exome, and whole-genome data. This includes not only the complete variant data in the VCF, but also clinical reports and stored ACMG or AMP variant classifications and interpretation records.
  • Overlay data from the outside, such as industry benchmark data. In our domain, clinicians will likely leverage databases such as ICGC, CIViC, ClinVar, dbSNP, gnomAD Exomes and Genomes and many more.
  • Store data in a suitable format to allow for easy access and decision making: This requires the definition of a unified warehouse data model.
  • Allow the deployment of analytics: Creation of dashboards that give users insight into the content of the warehouse, and the ability to query the data models, e.g. how many samples are stored, or which variants have been seen in a given gene.
  • Provide mechanisms to connect with external systems.
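The two example questions in the list above (how many samples, and which variants have been seen in a gene) map naturally onto SQL. The sketch below uses Python's built-in sqlite3 module as a stand-in database. The `observations` table, its columns and the data are hypothetical and do not reflect the actual VSWarehouse data model.

```python
import sqlite3

# In-memory stand-in database with a hypothetical one-table schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (sample TEXT, gene TEXT, variant TEXT)")
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?)",
    [
        ("S1", "GENE_A", "chr1:1000:A>G"),
        ("S2", "GENE_A", "chr1:1000:A>G"),
        ("S2", "GENE_A", "chr1:2000:C>T"),
        ("S3", "GENE_B", "chr2:500:G>A"),
    ],
)


def count_samples(conn):
    """How many distinct samples are stored?"""
    query = "SELECT COUNT(DISTINCT sample) FROM observations"
    return conn.execute(query).fetchone()[0]


def variants_in_gene(conn, gene):
    """Which variants have been seen in this gene, and in how many samples?"""
    query = (
        "SELECT variant, COUNT(DISTINCT sample) FROM observations "
        "WHERE gene = ? GROUP BY variant ORDER BY variant"
    )
    return conn.execute(query, (gene,)).fetchall()


print(count_samples(conn))
print(variants_in_gene(conn, "GENE_A"))
```

A dashboard would typically run queries of this shape on a schedule or on demand and render the results as counts or tables.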

Easily Reference Previous Work

Among other things, our customers deploy our warehouse solution to reference past work in their ongoing clinical interpretation, to maintain their own assessment catalogs, and to determine allele frequencies for a specific population or disease category.

Recommended Learning Materials

We have a variety of supplemental learning materials that are an excellent resource for anyone interested in the industry or our software solutions. Here are some of our recommended materials for you to check out related to VSWarehouse!


Read our eBook on the data explosion in genetics and how warehousing will come into play!


Learn how to leverage our state-of-the-art genetic data warehousing technology.

Getting Started with VSWarehouse

Watch Now

Other Resources

Explore a clinical workflow in VarSeq or follow along with a tutorial!

VarSeq Viewer:
Download Here

VSWarehouse Tutorial:
Download Here

Try VSWarehouse for Free

Did you know we offer complimentary trials of our software? No restricted features, no sample data - you get to try all the features of VSWarehouse with your data and see how it works!

If you are interested in a trial, please fill out the form below, and we will send you the details!

Technical Specifications

VarSeq is on-premises software, ensuring full control over installation and data management. It is compatible with various deployment environments including workstations, server setups with remote desktop access, and private cloud servers.

The software is optimized for operation within strict corporate firewalls. It seamlessly integrates with existing web proxy configurations, ensuring uninterrupted functionality in secured network infrastructures. VarSeq's internet connectivity requirements are minimal. It only needs to connect to a select group of Golden Helix servers. This connection is essential for license verification and accessing annotation data updates.

See System Requirements for more details on hardware and operating system requirements based on planned workflows.