If you follow Golden Helix feature development, you are aware of the substantial evolution of our copy number variant (CNV) detection and evaluation capabilities in VarSeq. CNV detection and evaluation is typically handled through the VarSeq graphical user interface. However, in some cases, users benefit from running this process via the command-line interface. Golden Helix supports users in both realms, and the VS-CNV command-line tool shipped with the VarSeq software serves exactly this purpose. This blog introduces the VS-CNV command-line tool as an alternative approach to CNV calling.
First Step: Accessing the Tools
Once you’ve installed the VarSeq software, you’ll find the VS-CNV tool in the installation folder. You can navigate to this folder from a terminal (Figure 1) or from the user interface (Figure 2).
Second Step: Building the CNV Reference Samples
Entering a command with no arguments presents a help message. In Figure 3, you’ll see the command entered at the top to run VS-CNV, the definition of the user login credentials, and the command to build the CNV reference set. The fundamental idea of the CNV approach in VarSeq is to use coverage data from a collection of reference samples to create a normalized diploid coverage profile for all samples in your cohort.
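To make the idea of a normalized reference profile more concrete, here is a minimal Python sketch. It is not the VS-CNV algorithm; it only illustrates the general concept under simple assumptions: each sample is a list of mean depths per target region, each sample is scaled by its own total coverage, and a sample of interest is compared to the reference mean with a ratio and a z-score. All function names and the toy numbers are hypothetical.

```python
from statistics import mean, stdev


def normalize(coverage):
    """Scale one sample's per-target depths so they sum to 1."""
    total = sum(coverage)
    if total == 0:
        raise ValueError("sample has no coverage")
    return [depth / total for depth in coverage]


def reference_profile(reference_samples):
    """Per-target (mean, standard deviation) of normalized reference coverage.

    Needs at least two reference samples, because stdev() requires two values.
    """
    normalized = [normalize(sample) for sample in reference_samples]
    per_target = zip(*normalized)  # regroup values by target region
    return [(mean(values), stdev(values)) for values in per_target]


def compare_to_reference(sample, profile):
    """Return a (ratio, z_score) pair per target for one sample of interest."""
    results = []
    for value, (ref_mean, ref_sd) in zip(normalize(sample), profile):
        ratio = value / ref_mean if ref_mean else float("nan")
        z_score = (value - ref_mean) / ref_sd if ref_sd else float("nan")
        results.append((ratio, z_score))
    return results


# Toy data (hypothetical): three reference samples, three target regions.
references = [[100, 200, 100], [110, 190, 100], [90, 210, 100]]
profile = reference_profile(references)

# A sample with reduced depth at the second target relative to the references.
for ratio, z_score in compare_to_reference([100, 100, 100], profile):
    print(round(ratio, 2), z_score)
```

In this toy model, a ratio well below 1 at a target suggests a loss relative to the reference set, and a ratio well above 1 suggests a gain. Real callers use more robust normalization and reference selection than this sketch does.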
Figure 4 shows an example run of the reference sample coverage calculation, which defines the path to the CNV reference folder, the path to the reference BAMs, and the BED file listing all target regions (exons) for the genes in this TruSight panel.
Once the command completes, you can browse to the CNV reference folder to see the coverage calculations for each sample stored in TSF files. These references must be calculated before any CNV calls are made for individual samples, which is our next step.
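Because the target regions come from a BED file, it can be worth sanity-checking that file before starting a long coverage run. The sketch below is an optional helper, not part of VS-CNV. It relies only on the standard BED convention that the first three columns are chromosome, 0-based start, and end; the function name `read_bed` and the example lines are hypothetical.

```python
def read_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples.

    Blank lines, comments, and "track"/"browser" header lines are skipped.
    Raises ValueError on malformed intervals so problems surface early.
    """
    regions = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"line {number}: expected at least 3 fields")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end <= start:
            raise ValueError(f"line {number}: invalid interval {start}-{end}")
        regions.append((chrom, start, end))
    return regions


# Hypothetical example lines; a real file would be opened with open(path).
example = ["# panel targets", "chr1\t100\t200\tEXON_A", "chr2\t300\t450\tEXON_B"]
regions = read_bed(example)
print(len(regions), "target regions,", sum(e - s for _, s, e in regions), "bases")
```

To check a file on disk, pass an open file handle instead of the example list, since the function accepts any iterable of lines.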
Third Step: Calling CNVs
Now that the reference sample coverage statistics are complete, we can run the CNV detection commands. Figure 6 shows the “target” command, which detects CNVs from targeted region coverage and is used for both the panel and exome CNV methods. The final output is a TSV file containing the CNV calls for sample 1.
As the CNV caller runs, its progress is reported in the terminal, as shown in Figure 7. The bottom of the output shows where the sample TSV file is written. You can see the example CNV calls for this TruSight panel in Figure 8.
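Since the calls are written as a TSV file, they are easy to post-process with a short script. The following Python sketch shows one way to load a tab-separated file with a header row and filter it by a column value. The column names `Region` and `State` and the example rows are hypothetical placeholders; the actual columns are whatever appears in the header of your VS-CNV output.

```python
import csv
import io


def read_cnv_calls(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_calls(calls, column, wanted):
    """Keep only the rows whose value in `column` equals `wanted`."""
    return [row for row in calls if row.get(column) == wanted]


# Hypothetical content standing in for a real output file.
EXAMPLE = (
    "Region\tState\n"
    "chr1:100-200\tDeletion\n"
    "chr2:300-450\tDuplication\n"
)

calls = read_cnv_calls(io.StringIO(EXAMPLE))
deletions = filter_calls(calls, "State", "Deletion")
print(list(calls[0].keys()))  # inspect the header before filtering
print(len(calls), len(deletions))
```

For a file on disk, replace `io.StringIO(EXAMPLE)` with `open(path, newline="")`, and check the printed header first so the filter uses a column that actually exists.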
The example above was a simple approach to calling CNVs for a smaller panel. Many of our users may also wish to run this process on their exome or whole-genome data. Below is a screenshot of additional CNV calling steps: including loss of heterozygosity (LOH) calls to exclude non-normal coverage regions, and the binned approach for the whole genome. All of the steps described in this blog can be referenced here.
If you would like to learn more about this command-line approach to CNV calling, or would rather deploy this process through the graphical user interface, please reach out to [email protected] for a training session. If you are interested in scheduling a call to find which Golden Helix product would best suit your needs, please email us at [email protected].
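For readers new to binned coverage, the general idea is that whole-genome data has no target list, so the genome is divided into windows and coverage is summarized per window. The Python sketch below illustrates fixed-size binning only. The function name, the chromosome length, and the bin size are hypothetical, and it does not reflect how VS-CNV chooses or sizes its bins.

```python
def make_bins(chrom, length, bin_size):
    """Split one chromosome into consecutive 0-based, half-open bins.

    The last bin is shortened so it never extends past the chromosome end.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return [
        (chrom, start, min(start + bin_size, length))
        for start in range(0, length, bin_size)
    ]


# Hypothetical toy chromosome of 250 bases split into 100-base bins.
for region in make_bins("chr1", 250, 100):
    print(region)
```

Each bin then plays the same role that a target region plays in the panel and exome workflow.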