Cyber Security Strategies for NGS Testing Labs: Part IV

         February 13, 2020

Golden Helix is in a unique position to provide a secure on-premise analysis solution. This capability rests on two enablers. First, we build our software from the ground up on the assumption that it should run on any operating system, potentially behind firewalls or even without internet access. Second, we provide these solutions under a licensing model based on training and supporting users, not on tracking per-sample usage of cloud resources.

Components of Deployment Architecture 

A secure solution requires more than a single component to analyze, report on and store for the long term the patient and genomic data created while providing an NGS-based genetic test. At a high level, the components of this workflow are as follows:

  • Automated Secondary Analysis: Next-Generation Sequencing machines create short sequence reads in parallel, covering selected genes many fold over. Each sample must go through a series of algorithms that align these short reads to the human genome and detect variations from the canonical reference. Each variant is then looked up in various population and clinical catalogs, and its functional and gene impact is predicted. Furthermore, different algorithms and strategies are used to detect short variations versus large-scale Copy Number Variants (CNVs).
  • Variant Interpretation and Reporting: Once all variants are detected and annotated, generally only a few meet the quality and annotation criteria to be evaluated. The vast majority of mutations will be dismissed as benign for reasons such as occurring frequently in other individuals or not changing the gene product. The remaining variants must be interpreted using standardized guidelines to reduce the variation of clinical outcomes between laboratories. For this reason, a scoring rubric has been published by the American College of Medical Genetics and Genomics (ACMG) that evaluates population, functional, clinical and patient-level evidence to reach a final classification of a variant. Similarly, when interpreting somatic variations in cancer, the Association for Molecular Pathology (AMP) guidelines provide evaluation and reporting systems for cancer gene testing.
  • Warehousing of Genomic Results and Internal Knowledgebase: As a laboratory follows the test-appropriate guidelines, the work done to score and classify each variant can be saved and re-used the next time that variant is detected. Similarly, each clinical report should be saved and periodically checked by an automated scan for new and changed genomic and clinical annotations that may invalidate the asserted results. These critical knowledgebases need to be stored on a secure central server with integration into each step of the analysis process. Furthermore, as NGS tests scale to exomes and genomes, the number of detected variants grows from thousands to millions per sample. Efficient, genomically optimized storage must be employed to catalog all observed variants across all tested samples and leverage the full potential of the laboratory’s efforts.
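As a rough illustration of how the filtering and knowledgebase re-use described above fit together, here is a minimal Python sketch. It is not Golden Helix code: the `Variant` fields, the `triage` function, the 1% allele-frequency cutoff and the dictionary-based knowledgebase are all assumptions made for this example, and real filter criteria are set by each laboratory.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical key for a variant: (chromosome, position, ref allele, alt allele)
VariantKey = Tuple[str, int, str, str]


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    population_af: float   # allele frequency found in a population catalog
    changes_protein: bool  # whether the gene product is predicted to change

    @property
    def key(self) -> VariantKey:
        return (self.chrom, self.pos, self.ref, self.alt)


# Illustrative cutoff only; this is an assumption, not a guideline value.
MAX_POPULATION_AF = 0.01


def needs_interpretation(variant: Variant) -> bool:
    """Keep variants that are rare and predicted to change the gene product."""
    return variant.population_af < MAX_POPULATION_AF and variant.changes_protein


def triage(
    variants: Iterable[Variant],
    knowledgebase: Dict[VariantKey, str],
) -> Tuple[Dict[VariantKey, str], List[Variant]]:
    """Split candidate variants into previously classified and still pending."""
    known: Dict[VariantKey, str] = {}
    pending: List[Variant] = []
    for variant in variants:
        if not needs_interpretation(variant):
            continue  # dismissed: common in the population or no protein change
        prior = knowledgebase.get(variant.key)
        if prior is not None:
            known[variant.key] = prior  # re-use the lab's saved classification
        else:
            pending.append(variant)     # needs a fresh interpretation
    return known, pending
```

Calling `triage(sample_variants, lab_knowledgebase)` returns the classifications that can be carried over from earlier work together with the short list of variants that still need review, which is the saving the internal knowledgebase is meant to provide.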

Golden Helix has developed and packaged products that can take the analysis from the raw sequence data to the clinical reporting and data warehousing requirements described above. The individual products include: 

  • Sentieon’s Secondary Analysis & VS-CNV: These solutions provide the complete set of secondary analysis algorithms. Together, they run data through an automated process that takes the raw sequence reads produced by an NGS machine, aligns them to the human reference and calls small variants and large CNV events. The VS-CNV algorithm goes beyond the capabilities of existing public algorithms and provides unprecedented levels of detection of single-exon to whole-chromosome events.
  • VSPipeline: Along with automating the detection of small variants and CNV events, much of the enrichment of these raw variants, and even the application of the scoring criteria provided by the ACMG and AMP guidelines, can be automated to provide lab personnel with precomputed projects containing high-quality candidate variants prepared for interpretation. VSPipeline allows lab-specific workflows and knowledgebases to be configured into this automated process, complying with the reproducibility requirements of the Clinical Laboratory Improvement Amendments (CLIA) certification required for providing laboratory-developed tests (LDTs).
  • VarSeq: Following the automated analysis, VarSeq provides a rich visual interface for sample- and variant-level quality assurance and for the deep-dive interpretation work of evaluating the impact of candidate variants on the gene and patient. Dozens of data sources are pulled on demand to provide rich graphs, tables, and views of previously interpreted variants within the genomic region being evaluated. Variants that meet ‘Pathogenic’ scoring criteria are then placed alongside other test summaries and technical information into a clinical report that can be signed out by the laboratory medical director. Multiple users may be involved in these steps, often with first-pass work done by a technician and the final review and report summary authored by medical geneticists.
  • VSWarehouse: The output of the many steps in the test analysis process can be centralized, stored and queried in VSWarehouse. Along with an interface for managing the knowledgebases containing the clinical interpretations and reports, VSWarehouse also has a storage and annotation backend that scales to hundreds of thousands of whole exomes and genomes to support the future needs of any clinical lab. 
  • Supporting Resources Servers: These individual analysis components often need vast amounts of genomic annotations and clinical assertions from both public and licensed sources. Golden Helix provides these annotations as part of the clinical workflow solutions through globally distributed annotation servers. As individual annotation sources are utilized by an algorithm or an interpretation workflow, the latest version can be downloaded locally to support fast and offline analysis. When starting the software, users must first log in to authenticate themselves and validate their software license. Once a user is logged in, their name is associated with saved interpretations and reports.
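The "download the latest version locally, then work offline" pattern mentioned for annotation sources can be sketched as follows. This is an illustrative assumption about how such a cache might behave, not the actual Golden Helix implementation: `AnnotationCache`, its `refresh` and `get` methods, and the injected `fetch_latest` callable are hypothetical names, and the cache is held in memory to keep the example short.

```python
from typing import Callable, Dict, Tuple

# (version label, annotation payload) for one annotation source
Annotation = Tuple[str, bytes]


class AnnotationCache:
    """Keeps local copies of annotation sources so analysis can run offline."""

    def __init__(self, fetch_latest: Callable[[str], Annotation]) -> None:
        # fetch_latest is a hypothetical downloader supplied by the caller;
        # it is expected to raise OSError when the server is unreachable.
        self._fetch_latest = fetch_latest
        self._local: Dict[str, Annotation] = {}

    def refresh(self, source: str) -> bool:
        """Try to download the latest version; report whether it succeeded."""
        try:
            self._local[source] = self._fetch_latest(source)
        except OSError:
            return False  # offline or blocked: keep whatever copy we have
        return True

    def get(self, source: str) -> Annotation:
        """Return the local copy without touching the network."""
        if source not in self._local:
            raise LookupError(f"{source!r} has not been downloaded yet")
        return self._local[source]
```

In this sketch a workflow would call `refresh` while a connection is available and `get` during analysis, so a lab behind a firewall keeps working from the last version it downloaded.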

