
Introduction to VarSeq Tutorial

Welcome to the Introduction to VarSeq Tutorial!

Updated: January 16, 2021

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/welcome.png

Level: Beginner

Product: VarSeq

This tutorial covers a basic cardiomyopathy gene panel workflow with an emphasis on understanding and exploring filter chains and variant tables.

Requirements

To complete this tutorial you will need to download and unzip the following file, which includes a starter project.

Important

Most of the workflow described in this tutorial can be done within the free VarSeq Viewer; however, some options require an active license. You can go to Discover VarSeq and request a viewer or evaluation license.

Download

VarSeq_Intro_Tutorial.zip

Files included in the above ZIP file: Introduction to VarSeq Tutorial – Starter project containing 1 VCF file for the NextSeq 500 TruSight Cardio dataset with the NA12877-Rep1_S1 sample.

Note

VarSeq version 2.2.1 was used to create this tutorial. While every attempt will be made to keep this content relevant, it is possible that certain features or icons may change with newer releases.

Setup

The most recent version of VarSeq can be downloaded from here: VarSeq Download.

varseq download

Select your operating system and download. Additional information for platform specific installation can be found in the Installing and Initializing section of the manual.

The Setup Wizard will then guide you through the setup process.

setup wizard start

On the final page of the Setup Wizard, select Finish with the Launch VarSeq option checked.

setup wizard finish

This will bring up the introductory VarSeq page where new users can register their information. A confirmation email is then sent to verify the email address.

register

Once the email has been confirmed, users can select the Login tab and enter their login email and password.

login

At this point, the VarSeq Viewer mode is accessed and can be used. If the user already has a license key, this can be activated by selecting Help on the title bar and then selecting Activate a VarSeq License Key.

activate license

This will bring up a dialog where the license key can be entered. Enter your license key and select Verify.

activate license key

Once the license key is verified, read the license agreement, select the I accept the license agreement option, and select Verify.

Congratulations! At this point, the product license is activated and you are ready to start an example project or a tutorial!

Note

During the initial installation process, the user will be asked where to store the AppData folder. Although this location can be changed after installation, it is recommended that multiple-user organizations select a shared drive location to increase ease of project sharing and to decrease redundancy.

Overview

This tutorial will start with an existing project. Much of the functionality described in this tutorial can be performed with a “Viewer” license; for options that are not allowed, the software will display a warning instead. We recommend going through the tutorial with an active license or a demo license to experience full functionality within this project. Additionally, this project could be used as a basis for further exploration, annotation and filtering. The end result will be exported data with a candidate variant.

VarSeq software supports several variant filtering workflows such as trio, cancer, and gene panel workflows to name a few. This tutorial will focus on a Cardiomyopathy gene panel workflow that utilizes a set of filtering criteria to find a clinically relevant pathogenic variant.

The ZIP file can be downloaded and the contents can be extracted to a convenient location. Then from VarSeq go to File > Open Project to select the saved project file. Alternatively, if you would like to go through the import process, see the import instructions below.

To import the original VCF files that were used in the supplied project, download them from our Data Repository through VarSeq by opening the software and going to Tools > Manage Data Sources and selecting the NextSeq 500 TruSight Cardio file from the Example Samples > NextSeq 500 TruSight Cardio folder. Select the NA12877-Rep1_S1 and NA12878-Rep1_S1 Variant Map files. Then click Download in the lower left corner. See Figure 1-1.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure0-1.png
Figure 1-1: Download the VCF files used in the sample project to import them manually into a new VarSeq project

Return to VarSeq and select Create New Project. At this point you can either choose to use a template or start an empty project from scratch. Provide a name for the project and click Ok. See Figure 1-2.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure0-2.png
Figure 1-2: Create a new project in VarSeq

Next, we will import the VCF files by clicking Import Variants. Open the folder containing the downloaded NA12877-Rep1_S1 and NA12878-Rep1_S1 VCF files, then click and drag the files into the import window. Alternatively, you can click Add Files, navigate to the folder with the samples, and add them that way. See Figure 1-3. Click Next. Now specify the sample relationships. Click Next twice more and click Finished to complete the import.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure0-3.png
Figure 1-3: Add VCF files to VarSeq by clicking and dragging the files into the importer or add the files through the Add Files option
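The variant count that VarSeq reports after import reflects the data records in the VCF files. As a rough, independent check outside VarSeq, the sketch below counts the non-header lines of a VCF file using only the Python standard library. The function name `count_vcf_records` and the tiny demo file are hypothetical, and the total may differ from what VarSeq shows, for example if multi-allelic sites are split or counts are reported per sample.

```python
import gzip
import os
import tempfile


def count_vcf_records(path):
    """Count data lines (non-empty lines not starting with '#') in a plain or gzipped VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


# Tiny made-up VCF used only to exercise the function.
demo_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t1000\t.\tA\tG\t50\tPASS\t.\n"
    "1\t2000\t.\tC\tT\t60\tPASS\t.\n"
)

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.vcf")
    with open(demo_path, "w") as handle:
        handle.write(demo_vcf)
    demo_count = count_vcf_records(demo_path)

print(demo_count)
```

To try it on real data, point `count_vcf_records` at one of the downloaded VCF files instead of the demo file.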

A typical hereditary workflow would begin with a genetic test for patients with inherited cardiovascular disorders, hereditary cancers, or other inherited diseases that are identified through the patient’s family history and clinical presentation. Once variants have been called by a secondary analysis pipeline, the variants should be first filtered on quality of the call. Also, it is important to capture rare variants that are not found at high frequency across the general population and this can be achieved by utilizing variant frequency databases. The workflow wraps up by choosing variants that are associated with the phenotype of the patient and selecting for those variants that are pathogenic or likely pathogenic.
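The workflow described above can be pictured as a chain of pass/fail tests applied one after another, with a variant count recorded after each step. The following is a conceptual sketch only, not VarSeq's implementation: the field names, the example variants, and the 0.01 frequency cutoff are invented for illustration, while the read depth threshold of 100 and the gene rank cutoff of 0.9 mirror values used later in this tutorial.

```python
# Each variant is a dict; all field names and values are invented for illustration.
variants = [
    {"id": "v1", "dp": 150, "maf": 0.001, "gene_rank": 0.95, "clinvar": "Pathogenic"},
    {"id": "v2", "dp": 40, "maf": 0.001, "gene_rank": 0.95, "clinvar": "Pathogenic"},
    {"id": "v3", "dp": 200, "maf": 0.20, "gene_rank": 0.95, "clinvar": "Benign"},
    {"id": "v4", "dp": 180, "maf": 0.002, "gene_rank": 0.10, "clinvar": "Pathogenic"},
    {"id": "v5", "dp": 120, "maf": 0.003, "gene_rank": 0.92, "clinvar": "Benign"},
]

# One (name, test) pair per step of the workflow, in order.
chain = [
    ("call quality", lambda v: v["dp"] >= 100),
    ("rare in population", lambda v: v["maf"] < 0.01),  # assumed cutoff
    ("matches phenotype", lambda v: v["gene_rank"] > 0.9),
    ("pathogenic", lambda v: v["clinvar"] in {"Pathogenic", "Likely pathogenic"}),
]


def run_chain(variants, chain):
    """Apply each test in order and record how many variants remain after it."""
    remaining = list(variants)
    counts = [("input", len(remaining))]
    for name, keep in chain:
        remaining = [v for v in remaining if keep(v)]
        counts.append((name, len(remaining)))
    return remaining, counts


final, counts = run_chain(variants, chain)
for name, count in counts:
    print(f"{name}: {count}")
```

The per-step counts play the same role as the numbers shown on each filter card in the filter chain.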

The second chapter or section of this tutorial will cover the Filter View, including the input and output of the filter chain, how to expand and collapse a card, how to move a card, change a card and click on a card to get a filter level report.

The third chapter will cover the Table View, including how to hide and show columns, cloning a table and linking and unlinking it to a filter result, viewing intermediate results in the filter chain, and getting column reports and histograms.

The fourth chapter will cover exporting results from VarSeq into various formats.

The last chapter will wrap up the tutorial as well as include links for where to go next to learn more about VarSeq.

The Filter View Interface

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-1.png
Figure 2-1: The VarSeq window with the filter view highlighted and detailed

1. First, we will examine the input and output number of variants in the filter chain. You should see a number at the top right of the filter chain (as noted in Figure 2-1) that indicates how many variants were imported from the VCF file for the current sample. This number should be 14,597. See also Figure 2-2.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-2.png
Figure 2-2: The number of variants from the VCF file for the current sample

You should also see a number at the bottom right of the filter chain (as noted in Figure 2-1) that indicates how many variants remained for the current sample after applying the current filter chain and all enabled filters. This number should be 1. See also Figure 2-3.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-3.png
Figure 2-3: The number of variants from the VCF file for the current sample after applying all of the filters in the filter chain

Note

Two filters are currently not enabled. These filters are grayed out and all of the variants pass through these cards (the card has no effect on the variants in the filter chain).

You should also notice a number on each filter card, enabled or otherwise. This is the number of variants that remain in the filter chain after applying that particular filter.
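The behavior of disabled cards can be illustrated with a small model: a disabled card still displays a count, but the count equals that of the card above it because every variant passes through. This is a hypothetical sketch; the `FilterCard` class, the card names, and the read depth values are made up and are not part of VarSeq.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FilterCard:
    name: str
    keep: Callable[[int], bool]
    enabled: bool = True


def card_counts(values, cards):
    """Return (card name, variants remaining after that card) for each card."""
    remaining = list(values)
    counts = []
    for card in cards:
        if card.enabled:
            remaining = [v for v in remaining if card.keep(v)]
        # A disabled card leaves `remaining` unchanged.
        counts.append((card.name, len(remaining)))
    return counts


read_depths = [20, 60, 110, 150, 300]  # made-up values
cards = [
    FilterCard("DP >= 100", lambda dp: dp >= 100),
    FilterCard("DP >= 200 (disabled)", lambda dp: dp >= 200, enabled=False),
    FilterCard("DP < 250", lambda dp: dp < 250),
]
counts = card_counts(read_depths, cards)
print(counts)
```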

2. Next, we will examine a filter in more detail. In the current project, the filters are shown in a collapsed state. These filters can be examined by expanding the cards and showing the filter criteria. Click on the open rectangle (see Figure 2-4) for the Read Depths (DP) (Current) card.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-4.png
Figure 2-4: Click on the open square for the Read Depths (DP) (Current) card to expand the card and see the filtering criteria

This filter is a numeric filter, which means there is a numeric threshold box and options on how to filter based on that threshold. Currently, variants for the current sample are kept only if the read depth is greater than or equal to 100 reads. See Figure 2-5.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-5.png
Figure 2-5: 13,222 variants had a read depth of less than 100. The number of variants with a read depth score greater than or equal to 100 was 797. 578 variants were recorded as missing for the current sample.

3. To see what happens to the filter chain when changing a numeric filter, change the value of 100 to 50. You should see that 1,077 variants now pass this filter. See Figure 2-6.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-6.png
Figure 2-6: Changed the read depth threshold from 100 to 50 which resulted in the filter chain updating the resulting variants

Now, change the threshold back to 100 and collapse the card by clicking on the horizontal line in the upper right corner of the card. See Figure 2-7.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-7.png
Figure 2-7: Collapse the Read Depth card to hide the threshold and criteria information to conserve space
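The numeric filter in steps 2 and 3 can be thought of as a three-way split of the input variants: those at or above the threshold, those below it, and those with no value for the sample. The sketch below uses made-up read depths (with `None` standing in for a missing value) to show how lowering the threshold from 100 to 50 lets more variants through, and it checks that the three counts quoted in the Figure 2-5 caption add up to the 14,597 input variants. The function name is hypothetical.

```python
def partition_by_threshold(depths, threshold):
    """Split read depths into pass (>= threshold), fail (< threshold), and missing (None)."""
    passed = [d for d in depths if d is not None and d >= threshold]
    failed = [d for d in depths if d is not None and d < threshold]
    missing = [d for d in depths if d is None]
    return passed, failed, missing


# Made-up read depths; None marks a variant with no DP value for the sample.
depths = [12, 250, None, 99, 100, 57, None, 480]

passed_100, failed_100, missing = partition_by_threshold(depths, 100)
passed_50, failed_50, _ = partition_by_threshold(depths, 50)
print(len(passed_100), len(failed_100), len(missing))
print(len(passed_50), len(failed_50))

# Counts from the Figure 2-5 caption: below 100, at least 100, and missing.
figure_total = 13_222 + 797 + 578
print(figure_total)
```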

4. Filter containers are a great way to group filter cards. To make a filter container, right click inside of the filter chain window and select Add Filter Container. See Figure 2-8. Re-name the filter container to ‘Quality Control Filters’ by double clicking on the filter container title. See Figure 2-9.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-8.png
Figure 2-8: Right click in the empty space underneath the filter chain then select Add Filter Container

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-9.png
Figure 2-9: You can change the name of any filter card by double-clicking on the filter card or filter container.

To change the order of the filtering operations, click on a card and drag it to the desired location. We will demonstrate this by moving the Read Depths (DP) (Current) and the Genotype Qualities (GQ) (Current) filter cards into the Quality Control Filters filter container. Click on the Read Depths (DP) (Current) filter card, drag it, and hover inside the Quality Control Filters filter container until the black horizontal I-bar appears under the Quality Control Filters filter container. See Figure 2-10.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-10.png
Figure 2-10: Move the Read Depths filter into the Quality Control Filters filter container

Repeat this process with the Genotype Qualities (GQ) (Current) card. See Figure 2-11.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-11.png
Figure 2-11: Both Read Depth and Genotype Qualities filter cards are within the Quality Control Filters filter container

Finally, move the Quality Control Filters filter container to the top of the filter chain: click on the filter container, drag it, and hover over the top of the filter chain until the black horizontal I-bar appears over the All MAF card. See Figure 2-12.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-12.png
Figure 2-12: Move the Quality Control Filters filter container to the top of the filter chain.
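Grouping cards into a container and moving the container changes the intermediate counts shown on the cards, but as long as a variant must pass every card to survive, the final set of variants stays the same. The sketch below illustrates this under that assumption; it is not VarSeq's implementation. The example values, the genotype quality cutoff of 20, and the frequency cutoff of 0.01 are invented for illustration.

```python
# Made-up per-variant values: (read depth, genotype quality, minor allele frequency).
variants = [
    (150, 99, 0.001),
    (40, 99, 0.002),
    (200, 15, 0.001),
    (300, 80, 0.250),
    (120, 60, 0.004),
]


def depth_ok(v):
    return v[0] >= 100


def quality_ok(v):
    return v[1] >= 20  # assumed cutoff


def rare(v):
    return v[2] < 0.01  # assumed cutoff


def container(*filters):
    """Group several filters so they behave like a single card."""
    def keep(v):
        return all(f(v) for f in filters)
    return keep


def run(variants, chain):
    """Apply the filters in order; return the survivors and the count after each step."""
    remaining = list(variants)
    counts = []
    for keep in chain:
        remaining = [v for v in remaining if keep(v)]
        counts.append(len(remaining))
    return remaining, counts


quality_control = container(depth_ok, quality_ok)

before_final, before_counts = run(variants, [rare, depth_ok, quality_ok])
after_final, after_counts = run(variants, [quality_control, rare])
print(before_counts, after_counts, before_final == after_final)
```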

5. To add an annotation source to the project, click on the Add Icon and select Add Variant Annotation. See Figure 2-13.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-13.png
Figure 2-13: Use the Add Icon to add annotation sources to your project

We will add an additional population frequency database. To do this, select the Public Annotations folder and then select 1kG Phase3-Variant Frequencies 5a with Genotype Counts, GHI. You can also use the Filter search bar at the top of the window to easily search for annotation sources. See Figure 2-14.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-14.png
Figure 2-14: Add 1kG Phase3-Variant Frequencies 5a with Genotype Counts, GHI to the project

Once added, scroll all the way to the right in the variant table to see 1kG Phase3-Variant Frequencies 5a with Genotype Counts, GHI within the table and all of the associated fields. Any of the fields in the variant table can be added to the filter chain by right clicking on the field and selecting Add to Filter Chain. Let’s do this with the Allele Frequencies field. In addition, when you click on any column header in the variant table, the Details view will update to reflect the information from the source as well as provide a table of values contained within the column and the proportion of values seen for the input set of variants for each category. See Figure 2-15.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-15.png
Figure 2-15: Add the Allele Frequencies field from the 1kG Phase3-Variant Frequencies 5a with Genotype Counts, GHI source to the filter chain. Also note the updated details view

Now, set the numeric filter to 0.3 for the Allele Frequencies filter card, then move the filter card under the All MAF filter card. See Figure 2-16.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-16.png
Figure 2-16: Set the threshold for allele frequency to 0.3 and move the filter card under the All MAF filter card.

6. Next, expand the All MAF filter card and click on the yellow histogram icon. This will give a histogram of the minor allele frequencies for input variants for this card. See Figure 2-17.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure2-17.png
Figure 2-17: The histogram of the All MAF (NHLBI) values for the variants input into the filter.
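A histogram like the one in Figure 2-17 is built by counting how many values fall into each frequency interval. The sketch below shows that binning step for a handful of made-up minor allele frequencies; the bin width of 0.1 and the text rendering are arbitrary choices for illustration and say nothing about how VarSeq draws its plot. Minor allele frequencies range from 0 to 0.5 by definition, which sets the upper bound.

```python
def histogram(values, bin_width=0.1, upper=0.5):
    """Count values in equal-width bins covering [0, upper]; None values are skipped."""
    n_bins = round(upper / bin_width)
    counts = [0] * n_bins
    for value in values:
        if value is None:
            continue
        # Clamp so that a value equal to `upper` lands in the last bin.
        index = min(int(value / bin_width), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up minor allele frequencies; None marks a variant absent from the source.
mafs = [0.001, 0.004, 0.02, None, 0.12, 0.18, 0.26, 0.45, 0.5]
maf_counts = histogram(mafs)

for i, count in enumerate(maf_counts):
    low = i * 0.1
    print(f"{low:.1f}-{low + 0.1:.1f} | {'#' * count}")
```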

The Table View Interface

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure3-1.png
Figure 3-1: The VarSeq window with the table view highlighted and detailed

The table view (Figure 3-1) in VarSeq can have numerous columns from the various data sources. To make these columns or data fields easier to work with, they are grouped by source. Columns and column groups can be hidden or shown to make the fields or sources more easily accessible. First, we will show you two different ways to hide and show columns and column groups. Right-click on any column, for example Identifier, and select Hide. All column groups, with the exception of sample column groups, can be hidden in this manner as well. See Figure 3-2.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure3-2.png
Figure 3-2: Right-click on the Identifier column and select Hide

The visibility of fields for a particular column group can be quickly modified by hovering over the column group and right clicking to see all the fields available. Check the Identifier box to show it again. See Figure 3-3.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure3-3.png
Figure 3-3: Check the box for Identifier to show the field in the table

Another way to hide and show columns and column groups is to use the Hide/Show or Column Visibility dialog. This dialog allows you to quickly select which columns to show or hide, including previously hidden columns or column groups. Click on the eye icon in the tool bar to launch the dialog. The Column Visibility dialog should look like Figure 3-4, with the RefSeq Genes 105v2, NCBI column group expanded.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure3-4.png
Figure 3-4: Click on the Hide/Show Icon in the tool bar to launch the Column Visibility dialog. Expand and collapse column groups to view the fields in each group. Check or uncheck the boxes to change the visibility of each column group.

Expand the Transcript Interactions RefSeq Genes 105v2, NCBI column group and click on Unselect All to the right of the column group name. Then click on the Transcript Interactions RefSeq Genes 105v2, NCBI box, click on the box in front of HGVS p., and click OK. This will show the column group with only the “p.” notation column visible. See Figure 3-5.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure3-5.png
Figure 3-5: Show HGVS p. notation

Tables can either update when filter results are clicked on or be fixed to a particular filter result. To tell if a table is locked to a particular filter, examine the table filter icon. If the padlock icon is open, the table is unlocked. If it is closed, the table is locked to the particular result identified in the icon. See Figure 3-6.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure3-6.png
Figure 3-6: The current table shows the results from the current filter chain but it is not locked.

Click on the final result of the filter chain. The table filter icon should now say “Cardiomyopathy Input”. We want to keep a copy of this table and create another table to explore intermediate results. So, click the lock icon to lock the current table, then click the clone icon in the Table View tool bar. This will lock the current table and create a copy. See Figure 3-7.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure3-7-1.png
https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure3-7-2.png
Figure 3-7: Clone the current table to lock it to the filter chain results and create another locked table.

Click the table for the copy to bring it to the front and unlock the filter view, then click on the filter results for the Effect (Combined) filter card. You should see the top table update to include all 82 variants from that card; however, the bottom table locked to Cardiomyopathy does not update. See Figure 3-8.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure3-8.png
Figure 3-8: Clicking on intermediate results updates the unlocked table but does not update the locked table
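The difference between locked and unlocked tables can be summarized with a toy model: an unlocked table follows whichever filter result is clicked, while a locked table keeps showing the result it was locked to. The `TableView` class and its methods below are hypothetical and exist only to illustrate the behavior described above; they are not a VarSeq API.

```python
class TableView:
    """Toy model of a table that follows the clicked filter result unless locked."""

    def __init__(self, name, shown_result):
        self.name = name
        self.shown_result = shown_result
        self.locked = False

    def clone(self):
        """Create a copy that shows the same result and has the same lock state."""
        copy = TableView(self.name + " (copy)", self.shown_result)
        copy.locked = self.locked
        return copy

    def on_result_clicked(self, result_name):
        """Switch to the clicked result only when the table is unlocked."""
        if not self.locked:
            self.shown_result = result_name


cardiomyopathy_table = TableView("Table 1", "Cardiomyopathy")
cardiomyopathy_table.locked = True            # click the lock icon
working_copy = cardiomyopathy_table.clone()   # click the clone icon
working_copy.locked = False                   # unlock the copy

# Clicking an intermediate result notifies every table.
for table in (cardiomyopathy_table, working_copy):
    table.on_result_clicked("Effect (Combined)")

print(cardiomyopathy_table.shown_result)
print(working_copy.shown_result)
```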

Click around some more, expand some cards and see how the table updates. Click on the Gene Rank filter card and expand it. Now right-click on the Greater than 0.9 category and select Results Up Through This Expression. This will update the table to show all variants for the current sample that match our phenotype of interest (cardiomyopathy) from the original input set of variants. See Figure 3-9.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure3-9.png
Figure 3-9: Display all of the variants for the current sample that match the cardiomyopathy phenotype from the original input set of variants

Note in Figure 3-10 how the table filter icon changes to indicate that a particular category was used to select the variants displayed in the table.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure3-10.png
Figure 3-10: All variants matching the cardiomyopathy phenotype in the table view

Exporting Results

For this chapter, the definition of “export” will be loosely interpreted. The first “export” we are going to perform is sending a ClinVar RCV number to the ClinVar website to examine a variant in further detail. To start, click on the locked table tab that contains our 1 clinically significant variant and unlock the table. Next, scroll over to the ClinVar* column group. See Figure 4-1.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure4-1.png
Figure 4-1: The ClinVar information for the clinically significant variant. Click on the RCV number

Double-click on the RCV number for the variant and this will launch a web browser displaying results from ClinVar for this variant. See Figure 4-2.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure4-2.png
Figure 4-2: Click on the RCV number to pull up ClinVar in an internet web browser

Rows can also be copied out of the table individually. To do this, right-click anywhere in a row and select one of the options. Information from a single cell can be copied out of the table from the same right-click menu. See Figure 4-3.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure4-3.png
Figure 4-3: Copy information from a row or a cell to the clipboard for pasting

To copy information out of the Detail View, you can either click on the Copy button on the lower right corner of the view to copy all of the visible information or you can click and drag to make a selection and either use Ctrl + C or right-click and select Copy to copy the information and formatting to the clipboard. See Figure 4-4.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure4-4.png
Figure 4-4: Copying information out of the detail view preserves any HTML formatting

The variants and annotation information from the table can also be exported directly to different file formats including VCF, Microsoft Excel XLSX, and text (TXT). Click on the Export icon on the top left of the VarSeq Window and then select VCF File…. See Figure 4-5.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure4-5.png
Figure 4-5: Click on the Export icon and select Export to VCF

Select the (Variants: 14,597) Cardiomyopathy Input: NA12877-Rep1_S1 table to be exported in VCF format then click OK. See Figure 4-6.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure4-6.png
Figure 4-6: Exporting single table to VCF

All of the annotation information can be exported in the VCF file, which will include all samples. Just as columns and column groups can be hidden and shown, the available columns and column groups can be selected for inclusion in the VCF file. Select RefSeq Genes 105 Interim v1, NCBI, Gene and Cardiomyopathy PhoRank and then click Next. Click on Browse to select a location to save the file and set the file name to export_Cardiomyopathy_tutorial-{sample}.vcf.gz or another name of your choice. Once the options are set, click Export. See Figure 4-7.

https://doc.goldenhelix.com/VarSeq/tutorials/varseq_intro/_images/figure4-7.png
Figure 4-7: Select fields and specify file name and path to export the data to a VCF file

This file can then be imported into any other program that takes VCF files or reimported into VarSeq for accounts that have an active license.
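For readers who want to work with the exported file in a script, the sketch below illustrates two ideas. First, it shows how a `{sample}` placeholder in a file name expands when treated as a Python format string; that VarSeq substitutes the sample name in the same way is an assumption. Second, it parses the INFO column of a VCF data line, which is where annotation fields are conventionally stored as semicolon-separated `key=value` entries. The function name `parse_info` and the INFO keys in the demo line are hypothetical and do not reflect the actual field names VarSeq writes.

```python
template = "export_Cardiomyopathy_tutorial-{sample}.vcf.gz"
samples = ["NA12877-Rep1_S1", "NA12878-Rep1_S1"]

# Assumed behavior: one file name per sample, with the placeholder filled in.
export_names = [template.format(sample=s) for s in samples]
print(export_names)


def parse_info(info_field):
    """Parse a VCF INFO string into a dict; flag entries without '=' map to True."""
    info = {}
    if info_field in ("", "."):
        return info
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


# Made-up data line; the INFO keys are placeholders, not real exported field names.
demo_line = "1\t1000\t.\tA\tG\t50\tPASS\tGene=EXAMPLE1;PhoRank=0.95;DB"
info = parse_info(demo_line.split("\t")[7])
print(info)
```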

Conclusion

This tutorial was designed to give a taste of all the features and capabilities of VarSeq and a brief orientation to key features.

If you are interested in getting a demo license to try out additional features that require an active license, such as creating a project, adding annotation sources, and saving projects, please request a demo from: Discover VarSeq

If you have an active license, we encourage you to try out the intermediate tutorial on Cancer Gene Panels: Cancer Gene Panel Tutorial.

Additional features and capabilities are being added all the time, so if you do not see a feature you need for your workflows please do not hesitate to let us know at Golden Helix Support!

Updated on February 25, 2021
