Getting Started Guide for Sentieon

         September 21, 2018

Sentieon: your swift secondary analysis solution.

Golden Helix’s software solutions provide reputable, high-quality analysis of your NGS data. Viewed from 30,000 feet, the annotation and filtering of variants in your VCF files and the discovery of CNVs based on coverage data in the BAM file make up the tertiary portion of the analysis. But what does Golden Helix offer for secondary analysis, i.e., the alignment of reads and the variant calling process?

Last year we announced our partnership with Sentieon to provide our users with an accurate and exceedingly fast secondary analysis solution. The purpose of this blog post is to give additional instruction on getting started with Sentieon and preparing to run your secondary pipeline.

Step 1: Getting started

If you click on the link above, you will be directed to our step-by-step guide for getting Sentieon installed and set up on your machine. There are a few things to consider before requesting a license and downloading content. First and foremost, consider the environment in which you would like to run the software. Sentieon is fast, really fast, but it will still perform best in a robust computational environment (a server, for example).
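
Before committing to a machine, it can help to see what resources it actually offers. The following is a minimal sketch, not part of the Sentieon download: the function names `cpu_count` and `free_gb` are our own, and it assumes a Linux or Cygwin shell with the standard `nproc`, `df`, and `awk` utilities.

```shell
#!/usr/bin/env bash
# Report the resources a secondary analysis run would have on this machine.
# cpu_count and free_gb are illustrative helper names, not Sentieon tools.

# Number of available CPU cores; falls back to getconf if nproc is absent.
cpu_count() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

# Free space, in whole gigabytes, on the filesystem holding the given directory.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

echo "CPU cores:      $(cpu_count)"
echo "Free disk (GB): $(free_gb .)"
```

Run it from the directory where you plan to keep your FASTQ inputs and outputs, since alignment output can be large.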

Following the guide, you’ll see that the first step is to download the required files. After updating your proxy settings if necessary, you’ll need to prepare to download the Sentieon content by installing Git (Fig 1). With Git installed, you can run the download scripts (Fig 2). The guide also includes additional instructions for a Windows install.

Fig 1. The user will need to have Git installed, which can be done using the listed commands.
Fig 2. Steps to clone the Git repository and download the Sentieon software and reference sequence.
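
As a rough sketch of the check-then-clone flow described above, the snippet below confirms that Git is on the PATH before trying to clone anything. `REPO_URL` is a placeholder that you must set to the repository address given in the guide, and `have_command` is a helper name we made up for illustration.

```shell
#!/usr/bin/env bash
# Check that Git is available before attempting to clone the repository.

# Succeeds if the named command is on the PATH (illustrative helper).
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# REPO_URL is a placeholder: set it to the repository address from the guide.
REPO_URL="${REPO_URL:-}"

if ! have_command git; then
    echo "Git is not installed; install it first (see Fig 1)." >&2
elif [ -z "$REPO_URL" ]; then
    echo "Found $(git --version). Set REPO_URL to the address from the guide."
else
    git clone "$REPO_URL"
fi
```

For example, `REPO_URL=<address from the guide> bash check_git.sh` would perform the clone, where `check_git.sh` is whatever name you saved the sketch under.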

With these steps completed, you will have a Windows_Sentieon directory containing the secondary analysis directory. If you are installing on Windows instead of Linux, you’ll also set up a Cygwin terminal for running the simple Linux commands that Sentieon requires.

Fig 3. The new secondary analysis folder will be created after running the download scripts. The Cygwin directory/terminal will be available for users who would like to run Sentieon on Windows.

Step 2: Requesting the license file

This step will also require contacting Golden Helix to discuss setting up a trial run of the software. Our sales team is always available and excited to discuss the benefits of using Sentieon as your secondary analysis solution. Once we have approved your trial, you can run the licensing script to get the license file from us at Golden Helix (Fig 4).

Fig 4. The license request command, in which the user enters their email address; the command then automatically sends Golden Helix/Sentieon the machine information needed for the license file.

Step 3: Creating an example pipeline

We supply an example pipeline that users can study to see what a pipeline script (i.e., a bash script) may look like. This example file is available in the secondary analysis folder after download. After receiving the license file from Golden Helix, consider copying it into your secondary analysis folder as well (Fig 5).

Fig 5. Secondary analysis directory containing the example build_pipeline script that users can either use directly or reference to develop their custom pipeline.

When running the pipeline script, you’ll go through a series of steps to name your sample, designate the path to your FASTQ files (here, an inputs directory I created in the secondary analysis directory; see Fig 5), output alignment metrics, and access the license file (Fig 6).

Fig 6. The process of running the example pipeline script shipped with the secondary analysis folder.
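
Before answering the prompts, it is worth confirming that your FASTQ files are where you think they are. The sketch below is an assumption-laden helper rather than anything shipped with Sentieon: it assumes an `inputs` directory, the common `.fastq`/`.fq` extensions (optionally gzipped), and an `_R1`/`_R2` naming scheme for paired reads. `list_fastqs` and `mate_of` are hypothetical names.

```shell
#!/usr/bin/env bash
# List the FASTQ files in an inputs directory so paths can be checked up front.

# Print FASTQ files directly inside the given directory, one per line.
# The accepted extensions are an assumption about how your files are named.
list_fastqs() {
    find "$1" -maxdepth 1 -type f \
        \( -name '*.fastq' -o -name '*.fastq.gz' -o -name '*.fq' -o -name '*.fq.gz' \) \
        | sort
}

# Derive the mate file name for a read-1 file, assuming _R1/_R2 naming.
mate_of() {
    printf '%s\n' "$1" | sed 's/_R1/_R2/'
}

INPUT_DIR="${1:-inputs}"
if [ -d "$INPUT_DIR" ]; then
    list_fastqs "$INPUT_DIR"
else
    echo "No such directory: $INPUT_DIR" >&2
fi
```

If your files follow a different naming convention, adjust the patterns in `list_fastqs` and the substitution in `mate_of` accordingly.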

After running the build_pipeline script in either your Cygwin or Linux terminal, you will see that a new output directory is created in the secondary analysis directory, which contains the call_variants script (Fig 7).

Fig 7. Newly created output directory and script file.

Let’s take a quick look at what is in this call_variants script. The script contains the number of threads determined when you ran the license request command, the annotation and reference sequence file paths, sample names and paths, the paths to the Sentieon tools and license file, and the output folder path.

Fig 8. The top section of the call_variants script created after building the example pipeline.
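
To give a feel for this kind of header, here is a hedged sketch of how such variables might be laid out. It is not the generated script: every variable name, value, and path is a placeholder chosen for illustration. The one real name is `SENTIEON_LICENSE`, the environment variable Sentieon reads to locate its license; whether the generated script sets it this way is an assumption.

```shell
#!/usr/bin/env bash
# Illustrative header only: every value and path below is a placeholder,
# and the variable names are not taken from the generated call_variants script.

NUM_THREADS=16                                  # threads used by each step
REFERENCE="/path/to/reference.fa"               # reference sequence
ANNOTATION="/path/to/annotation.vcf.gz"         # annotation file
SAMPLE="sample1"                                # sample name
FASTQ_1="inputs/${SAMPLE}_R1.fastq.gz"          # read 1
FASTQ_2="inputs/${SAMPLE}_R2.fastq.gz"          # read 2
SENTIEON_BIN="/path/to/sentieon/bin"            # Sentieon tools
export SENTIEON_LICENSE="/path/to/license.lic"  # license file
OUTPUT_DIR="output/${SAMPLE}"                   # where results are written

# Report a missing file on stderr and return non-zero (illustrative helper).
require_file() {
    [ -f "$1" ] || { echo "Missing file: $1" >&2; return 1; }
}

# Count how many of the required inputs are missing before any work starts.
missing=0
for f in "$REFERENCE" "$ANNOTATION" "$FASTQ_1" "$FASTQ_2" "$SENTIEON_LICENSE"; do
    require_file "$f" || missing=$((missing + 1))
done
echo "$missing required file(s) missing"
```

Checking paths up front like this is a design choice worth copying into a custom pipeline: a mistyped path fails in seconds rather than partway through alignment.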

The next section of the call_variants script lists the algorithms and steps of the alignment and variant calling process. It first runs the BWA-MEM equivalent for alignment, then produces metrics results, removes duplicate reads, and finally calls the variants (Fig 9).

Fig 9. Section of the call_variants script that includes the alignment, metrics, deduping, and calling steps with the associated settings and algorithms.
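
The four stages can be sketched as the dry run below, which only prints commands and never executes Sentieon. The command shapes follow the DNAseq workflow as we recall it from the Sentieon manual; the file names, thread count, read group string, and choice of metrics algorithms are assumptions, and the generated script may use different settings. Check every option against the manual in your secondary analysis directory before running anything.

```shell
#!/usr/bin/env bash
# Dry run: print the commands for each stage instead of executing them.
# All values below are placeholders chosen for illustration.

NT=16
REF=reference.fa
SAMPLE=sample1
R1=${SAMPLE}_R1.fastq.gz
R2=${SAMPLE}_R2.fastq.gz

print_steps() {
    cat <<EOF
# 1. Alignment with the BWA-MEM equivalent, piped into a coordinate sort
sentieon bwa mem -R '@RG\tID:$SAMPLE\tSM:$SAMPLE\tPL:ILLUMINA' -t $NT $REF $R1 $R2 | sentieon util sort -r $REF -o sorted.bam -t $NT --sam2bam -i -
# 2. Alignment metrics
sentieon driver -r $REF -t $NT -i sorted.bam --algo MeanQualityByCycle mq_metrics.txt --algo AlignmentStat aln_metrics.txt
# 3. Duplicate removal: collect scores, then remove duplicates
sentieon driver -t $NT -i sorted.bam --algo LocusCollector --fun score_info score.txt
sentieon driver -t $NT -i sorted.bam --algo Dedup --rmdup --score_info score.txt --metrics dedup_metrics.txt deduped.bam
# 4. Variant calling
sentieon driver -r $REF -t $NT -i deduped.bam --algo Haplotyper output.vcf.gz
EOF
}

print_steps
```

Printing the commands first makes it easy to review paths and options before spending compute time on a real run.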

Users can customize their call_variants script with whichever settings and algorithms they wish to use; an excellent reference for these options is the Sentieon manual found in the secondary analysis directory.

Now the final step is to run the call_variants script (Fig 10). You’ll see the script run through the alignment, metrics, deduping, and variant calling steps. After the run completes, you’ll find the BAM and VCF files in the output folder (Fig 11), along with the metrics output generated in the script’s second step. These BAM and VCF files can then be imported into VarSeq/SVS for your DNAseq analysis. This concludes the basics of getting Sentieon installed and running your example pipeline. A future Sentieon blog will describe more advanced steps in customizing your call_variants script for joint calling and running batches of samples in one run.

Fig 10. Running the call_variants script.
Fig 11. Final output after running the call_variants script, which includes the VCF and BAM files ready for import into your Golden Helix software solution.
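
A quick way to confirm that a run produced usable results is to list the non-empty BAM and VCF files in the output folder. This is a minimal sketch under stated assumptions: the `output` default directory and the `.bam`, `.vcf`, and `.vcf.gz` extensions are guesses about your setup, and `list_results` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# List non-empty BAM and VCF files under an output directory.

# Print matching result files, one per line; empty files are skipped.
list_results() {
    find "$1" -type f -size +0 \
        \( -name '*.bam' -o -name '*.vcf' -o -name '*.vcf.gz' \) \
        | sort
}

OUTPUT_DIR="${OUTPUT_DIR:-output}"
if [ -d "$OUTPUT_DIR" ]; then
    list_results "$OUTPUT_DIR"
else
    echo "No such directory: $OUTPUT_DIR" >&2
fi
```

An empty listing after a run suggests that a step failed early, so check the terminal output from the script before importing anything.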

Part II of this getting started guide, which covers custom scripts for batch runs, can be found here.
