Automating a clinical workflow creates a stable and repeatable clinical analysis. Automation reduces the potential to introduce human error, helps in regulatory compliance, and improves the precision of the clinical results. It is important to know that if you run a sample through your clinical pipeline, you are going to get the same results today as you will in 6 months. The key to reproducibility is the stability achieved through a reduction in human input. The reduction in human input also allows for higher throughput. An automated workflow can be set up to run as soon as the sequencer is finished with no lag time between analysis steps. It can scale easily by adding more computational resources and free up lab technicians time. This allows them to focus on the variant analysis. For them, this means less time on the keyboard at the command line, and more time analyzing clinically relevant variants in VarSeq. Finally, the automated pipeline scripts provide stable documentation of the process that makes it easier to ensure that you are meeting regulatory guidelines, such as CLIA certification.
From FASTQ to Clinical Report
In this series of posts, we will discuss the analytical process for NGS genetic tests from FASTQ to Clinical Reports. Many steps, especially at the beginning of the pipeline can be fully automated. When a step does require hands on analysis however, we will cover how it can be streamlined and made repeatable through a guided workflow.
In this first part of three, we cover the fully automated steps of the pipeline that can take the raw sequence data to fully called, annotated, filtered and prioritized “candidate” variants and CNVs.
The total pipeline can be broken down into two major sections: Automated and Hands-On. Lets first discuss the pipeline that takes the output from the sequencer and create a VCF with the called variants for a given sample. This entails a couple different steps which are best described by the intermediate files that they create. This step creates the FASTQ, BAM, and VCF files. Next, two work streams are initiated for CNVs versus small variants. For CNVs, the raw BAM files are analyzed to compute the coverage of the target panel regions and that data is used to call CNVs. Both raw CNVs and raw variants from the VCF file can be augmented with genomic annotations and filtered. The result should be a set of prioritized “candidate” variants, that have passed filters based on quality metrics, annotation of population frequency and clinical relevance and in-silico predictions. The end goal of automating these processes is to create a fully preprocessed VarSeq project with clinically relevant high-quality events ready for the next hands-on step of the analytical process. This step can often be done by a lab technician and takes the candidate variants to a QC-validated set of draft interpretations ready for a lab director to review and sign off in a clinical report. We discuss these steps in Part 2 of this blog post series.
Automating Raw Sequence to Called Variants
Setting up the automated pipeline is tedious and requires sharp attention to detail. Fortunately, you only have set this up once, after which you can run the same analysis using the same script on all of the samples. If the sequencer does not output FASTQ files directly, the first step is to convert the output. For example, when working with Illumina sequencers, the larger machines will stream out “BCL” files during the run. These BCL files capture the raw machine instrumental data from the sequencing process. To convert from BCL to FASTQ you use Illumina’s free tool aptly named bcl2fastq. Once the outputs have been converted to the universal FASTQ format, we run the script that will take the FASTQ files and align with the reference sequence to create the initial BAM files. These BAM files are then cleaned up by removing duplicates and performing base score quality re-calibration (BSQR). The final step is to call the variants in the processed BAM files creating the VCF files. These steps are typically the most
Automating Annotation and Filtering
VSPipeline is a command-line version of VarSeq. It uses a VarSeq template which recreates any VarSeq project you have used in the past with a new set in input samples. VSPipeline takes the VCF and BAM files as inputs to create a VarSeq project that is ready for analysis by a lab tech. The BAM files are used to call CNVs for each sample, which are then annotated and filtered based on the template parameters. Likewise, the VCF provides the variants which can be annotated and filtered using any of the sources and algorithms available in VarSeq. This includes the ACMG Guidelines algorithm, which automatically scores the input variants using many of the ACMG criteria. Filtering on the ACMG algorithm classification provides an excellent starting point when looking at clinically actionable variants.
You are not alone, we are here to help
Pipeline automation is a complex process with many different inputs and parameters. Luckily our team is on hand to help you get set up with the best practices for the needs of your lab. Our experience positions us with the tools and pre-built best practice pipelines to help get your own workflow implemented to match your laboratories unique needs. Automating this process simplifies the rote and repetitive steps to have more time to focus on analyzing the high-quality clinical variants that are produced in the VarSeq project.