Welcome to the VarSeq Cancer Gene Panel Tutorial!
This tutorial covers a basic gene panel workflow with an emphasis on adding, modifying and manipulating filter chains.
To complete this tutorial you will need to download and unzip the following file, which includes 3 VCF files for import into a project.
Files included in the above ZIP file: Cancer Gene Panel Tutorial – Contains three VCF files for three replicates at different percentages of Horizon Dx known somatic mutations with NA12877 (dilutions of 10%, 25% and 50%).
VarSeq® software supports both trio and gene panel variant filtering workflows. This tutorial focuses on a cancer gene panel workflow to demonstrate ways that the filter chain can be used.
This tutorial will start with creating a new project from an empty project template, importing data, creating a filter chain and adding additional annotation sources.
The end result will be a new project template that can be used to create new projects in the future.
Three VCF files are contained in the ZIP file that accompanies this project. They are three replicated samples at different percentages of Horizon Dx known somatic mutations with NA12877 from the Illumina TruSight Myeloid Sequencing Panel. The replicates increase in dilution: 10%, 25% and 50%.
After the ZIP file has been downloaded, extract the contents to a convenient location.
Create a New Project
1. First, open VarSeq. From the VarSeq welcome screen, you can either select an existing project or create a new project. See Figure 2.1.
2. Click on Create New Project, select Empty Project in the template list, and change the project name to Cancer Gene Panel Tutorial. Browse to the correct location to store your project, and then click OK. See Figure 2.2.
3. After you create a new project, VarSeq will open with a single view prompting you to import data. Click on Import Variants… to import the data provided for this tutorial. See Figure 2.3.
Importing Data from VCF Files into a Project
Variants are imported into VarSeq from VCF files. Multiple VCF files can be imported simultaneously. If there are multiple samples in a single VCF file, or no samples in a VCF file, these can all be imported into VarSeq.
In this case, we are going to import three VCF files: NA12877 replicates at 10%, 25%, and 50% Horizon Dx dilutions.
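For orientation, the sample names that appear in the import wizard come from the columns that follow FORMAT on the #CHROM header line of each VCF file. The sketch below is plain Python, independent of VarSeq, and reads those names from an in-memory example. The example header and record, and the function name sample_names, are illustrative assumptions and are not taken from the tutorial files.

```python
import io

# Hypothetical, minimal VCF content for illustration only; the record is made up.
EXAMPLE_VCF = """##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12877-10-Horizon_S3
chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP:GQ\t0/1:3000:99
"""

def sample_names(handle):
    """Return the sample columns from the #CHROM header line of a VCF."""
    for line in handle:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            # The first nine columns are fixed (CHROM through FORMAT);
            # any further columns are sample names.
            return columns[9:]
    return []

print(sample_names(io.StringIO(EXAMPLE_VCF)))  # ['NA12877-10-Horizon_S3']
```

A VCF file with no sample columns yields an empty list here, which corresponds to the "no samples in a VCF file" case mentioned above.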
1. From the import dialog, click on Add Files and browse to where the example VCF files were saved. Select them and click Open. Then click Next > from the import dialog. See Figure 3.1.
2. On the sample relationships page, select Cancer Samples and then click Next >. See Figure 3.2.
3. The next step in the wizard is to change the sample options. The samples can be renamed at this point, and the affection status of the samples can be changed. If many samples are being imported simultaneously, and the affection status or other phenotype information is contained in a text file, this can be imported by clicking on From Text File.
For this tutorial, we will change all three sample names. Click in the Rename Sample Text Box for each replicate and rename:
- NA12877-10-Horizon_S3 to 10 Percent
- NA12877-25-Horizon_S2 to 25 Percent
- NA12877-50-Horizon_S1 to 50 Percent
All three replicates are Affected, so we don’t need to change the affection status. Then click Next >. See Figure 3.3.
The status currently has two options:
- Affected: Sample is affected with one or more phenotypes under study.
- Unaffected: Sample is not affected and used to filter out common variants or as a baseline.
4. The last page of the Import Variants Wizard is a report of the data to be imported. If something doesn’t look right you can go back and make changes before committing to importing the data. Once satisfied with the summary click Finished. See Figure 3.4.
5. Once the data is imported, two new views are added to the VarSeq window: the Filter View and the Table View. See Figure 3.5.
6. Data is automatically saved when certain events happen, but we recommend that you save the project after importing data. To do this, either:
- Go to File > Save Project, or
- Click on the Save Project icon on the tool bar, or
- Use the keyboard shortcut <CTRL> + S.
Rearranging Views in VarSeq
The goal for this step is to learn how to rearrange views and add views to the window. To do this, we will remove the filter view and then add it back into the window, along with the details view.
1. First, remove the Filter View by clicking on the “X” on the filter tab. You should be left with the table view. See Figure 4.1.
2. Click on the Hide/Show details window icon in the toolbar. See Figure 4.2.
This will create a Details view on the right side of the screen. You can click on the Hide/Show details window icon or the ‘Undock’ icon in the details window to hide the details view.
3. Next, hover over the upper left-hand corner to get the stash icon for that corner. Right-click on this icon and select Filter. See Figure 4.3.
You should end up with a view like that in Figure 4.4. Note: You can move any tab in VarSeq by dragging and dropping it into a target location.
Adding Quality Filters to the Filter Chain
Creation of the filter chain has been split into two chapters or pages. This first chapter covers adding quality filters, creating a filter container and moving filter cards into this container.
1. The first filter we are going to add is a read depth filter. For this data, the field we prefer is Read Depth (DP).
We will add this filter from the table view. From the table, right click on the Read Depth (DP) column and select Add to Filter Chain. See Figure 5.1.
2. In the Read Depth (DP) (Current) filter card, on the left, the (Current) refers to the current sample as indicated above the Filter Chain. If you switch samples the values in this card will update to the sample indicated.
For this data, a good threshold is 2500. Change the value in the text box from 0 to 2500 and click on Equal to 2500 and Greater than 2500. See Figure 5.2.
3. Next, we are going to create a genotype quality filter. Right-click on the Genotype Qualities (GQ) column and select Add to Filter Chain. Change the threshold for the Genotype Qualities (GQ) filter card to 50 and click on Equal to 50 and Greater than 50. See Figure 5.3.
4. In addition to filter cards, you can group filter cards together in the filter chain. This is done by creating a filter container. Create a filter container by right-clicking in the Filter View and selecting Add Filter Container. See Figure 5.4.
5. We are going to put both the read depth and genotype qualities filter cards into this container. First, we will change the title of the filter container to be more descriptive. Double-click on the title of the filter container and edit the title to be Quality Filters. See Figure 5.5.
7. Click and drag the two quality filter cards (Read Depth (DP) and Genotype Qualities (GQ)) into the Quality Filters container. See Figure 5.6.
These two filters are applied sequentially; in this situation, the container is simply a way to group similar filters. Filter containers have a logical condition that determines how variants pass through the filter cards within the container. By default, the container uses sequential (AND) logic, which means that a variant must pass every filter card in the container to pass the container. The logical condition can be changed to OR, which means that a variant only has to pass at least one filter card in the container.
Try this now by clicking on the wrench icon for the Quality Filters container and choosing OR. See Figure 5.7.
Notice how the cards are rearranged to be side-by-side instead of vertically stacked? This is a visual cue so that you know the logic being used for the filter container.
Now, switch the logic back to AND.
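The difference between AND and OR container logic can be sketched in a few lines of Python. This is only an illustration of the logic described above, not VarSeq code: each filter card is modelled as a function that returns True when a variant passes, the thresholds are the ones used in this tutorial, and the four variant records are invented.

```python
# Each "filter card" is modelled as a predicate on a variant record.
# The field names DP and GQ mirror the tutorial; the records are invented.
variants = [
    {"id": "v1", "DP": 3000, "GQ": 99},
    {"id": "v2", "DP": 3000, "GQ": 20},
    {"id": "v3", "DP": 1200, "GQ": 80},
    {"id": "v4", "DP": 900, "GQ": 10},
]

def read_depth_card(v):
    return v["DP"] >= 2500  # "Equal to 2500" and "Greater than 2500"

def genotype_quality_card(v):
    return v["GQ"] >= 50  # "Equal to 50" and "Greater than 50"

def container(cards, logic="AND"):
    """Combine cards: AND needs every card to pass, OR needs at least one."""
    combine = all if logic == "AND" else any
    return lambda v: combine(card(v) for card in cards)

cards = [read_depth_card, genotype_quality_card]
passed_and = [v["id"] for v in variants if container(cards, "AND")(v)]
passed_or = [v["id"] for v in variants if container(cards, "OR")(v)]
print(passed_and)  # ['v1']
print(passed_or)   # ['v1', 'v2', 'v3']
```

With AND, only the variant that meets both thresholds survives; with OR, any variant that meets at least one threshold survives.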
8. Now that we are finished with the quality filters, we will collapse the filter container so that it takes up less space in the filter chain. To do this, click on the dash in the upper right corner. This same behavior can be used to collapse filter cards as well. See Figure 5.8.
NOTE: If the filter container logic was set to OR, instead of a horizontal bar the collapse button would be a vertical line.
At the end of these steps, your filter chain should look like the one in Figure 5.9.
Adding Additional Filters to the Filter Chain
This chapter will cover a different way to add a filter card to a filter chain, annotating variants against a CIViC data source and creating filter cards for CIViC fields.
1. Another way to add a filter card to the filter chain is directly from the filter chain. We will do this to add a variant allele frequency filter. This uses the variant allele frequency values that are computed from the number of reads with an alternate allele divided by the total read depth. In this case the formula is (FSAF + FSAR)/FDP.
Right-click below the Quality Filters container in the filter view and select Add Filter. See Figure 6.1.
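As a worked example of the formula (FSAF + FSAR)/FDP from step 1, the following Python sketch computes a variant allele frequency from three counts. It assumes that FSAF and FSAR are the alternate allele read counts on the forward and reverse strands and that FDP is the total read depth; that reading of the field names, the function name, and the example numbers are assumptions made for illustration.

```python
def variant_allele_freq(fsaf, fsar, fdp):
    """Return (FSAF + FSAR) / FDP, or None when the depth is zero."""
    if fdp == 0:
        # No reads at this position, so the frequency is undefined.
        return None
    return (fsaf + fsar) / fdp

# 30 forward-strand and 20 reverse-strand alternate reads out of 1000 reads.
print(variant_allele_freq(30, 20, 1000))  # 0.05
```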
2. The source and field selection dialog should open automatically. If it doesn’t, click on the wrench icon in the upper right-hand corner of the new card. See Figure 6.2.
Type variant into the filter box, click on Variant Allele Freq, and then click OK. See Figure 6.3.
Now that the Variant Allele Freq filter card has been created, type 0.01 into the input box and click Greater than 0.01. This will keep all variants that have a variant allele frequency greater than 0.01. See Figure 6.4.
To keep only heterozygous variants, click on the + button next to the text box, enter a new value of 0.85, and select Between 0.01 and 0.85. See Figure 6.5.
Now delete the 0.85 value by clicking on the minus – sign next to the box with 0.85 in it.
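The two states of the Variant Allele Freq card (a single lower threshold, or a range with an added upper value) can be sketched as one Python function. This is an illustration only: the function name is hypothetical, and whether VarSeq treats the range boundaries as inclusive and how it handles missing values are assumptions made here.

```python
def passes_vaf_card(vaf, lower=0.01, upper=None):
    """Keep a variant when vaf > lower and, if upper is set, vaf <= upper."""
    if vaf is None:
        # Assumption: variants without a computed frequency do not pass.
        return False
    if vaf <= lower:
        return False
    return upper is None or vaf <= upper

vafs = [0.005, 0.05, 0.48, 0.99]
print([v for v in vafs if passes_vaf_card(v)])              # [0.05, 0.48, 0.99]
print([v for v in vafs if passes_vaf_card(v, upper=0.85)])  # [0.05, 0.48]
```

Removing the upper value, as in the last step, corresponds to calling the function with upper=None again.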
5. Next, we will select a public annotation source, download it, and annotate the variants with this data source. From the menu bar, click on the add icon, and select Variant Annotation. See Figure 6.6.
Now click the Public Annotations location on the left side of the dialog and then type CIViC into the filter box to show the CIViC sources. See Figure 6.7.
Select CIViC- Region Clinical Evidence Summaries 2021-01-01, WUSTL and press Select. If you already have a local copy downloaded, VarSeq will start the annotation process. If not, you will be prompted to download the required source. Select Yes to download the source. See Figure 6.8.
Once the download has finished VarSeq will automatically add the annotations to the variant table. Scroll to the right in the table view to see the new column groups added for CIViC. See Figure 6.9.
6. To filter variants down to only those present in CIViC, right-click on the Matched? column and select Add to Filter Chain. Click on True on the filter card. See Figure 6.10.
7. To filter variants down to a particular set of genes in CIViC whose names contain or start with the same string (say, all NOTCH genes), right-click on the Gene Name column and select Add to Filter Chain. Type in NOTCH and select Starts with NOTCH. See Figure 6.11.
Collapse this card by clicking on the minus (–) sign, then double-click on the card title and change it to NOTCH Genes. See Figure 6.12.
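A "Starts with" text filter such as the NOTCH Genes card amounts to a prefix match on the gene name. The Python sketch below shows the idea. The helper name and the gene list are illustrative, and whether VarSeq matches case-sensitively is not stated in this tutorial, so both behaviours are shown.

```python
def starts_with(values, prefix, case_sensitive=True):
    """Return the values that begin with prefix."""
    if case_sensitive:
        return [v for v in values if v.startswith(prefix)]
    return [v for v in values if v.upper().startswith(prefix.upper())]

# Gene symbols chosen for illustration; not taken from the tutorial data.
gene_names = ["NOTCH1", "NOTCH2", "TP53", "NOTCH3", "KRAS", "notch1"]
print(starts_with(gene_names, "NOTCH"))
# ['NOTCH1', 'NOTCH2', 'NOTCH3']
print(starts_with(gene_names, "NOTCH", case_sensitive=False))
# ['NOTCH1', 'NOTCH2', 'NOTCH3', 'notch1']
```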
8. Filter cards can be rearranged by clicking, dragging and dropping them in the desired location in the filter chain or in a particular filter container.
Click and drag the NOTCH Genes filter card to the top of the filter chain and drop it there. See how the number of variants changes? While the end result is still 2 variants for 10 Percent, there were originally 2,971 variants for 10 Percent in NOTCH genes. See Figure 6.13.
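The reason the final count stays the same while the intermediate counts change is that sequential (AND) filtering gives the same survivors in any order. The Python sketch below illustrates this with three simplified cards and six invented variant records; the card names echo this tutorial, but the data, thresholds on this toy set, and the run_chain helper are assumptions for illustration.

```python
# Invented records; only the field names echo the tutorial.
variants = [
    {"gene": "NOTCH1", "DP": 3000, "VAF": 0.05},
    {"gene": "NOTCH2", "DP": 800, "VAF": 0.40},
    {"gene": "TP53", "DP": 4000, "VAF": 0.30},
    {"gene": "NOTCH1", "DP": 2600, "VAF": 0.002},
    {"gene": "KRAS", "DP": 500, "VAF": 0.10},
    {"gene": "NOTCH3", "DP": 900, "VAF": 0.20},
]

cards = {
    "Read Depth": lambda v: v["DP"] >= 2500,
    "Variant Allele Freq": lambda v: v["VAF"] > 0.01,
    "NOTCH Genes": lambda v: v["gene"].startswith("NOTCH"),
}

def run_chain(order, records):
    """Apply cards in the given order; return (per-card counts, survivors)."""
    counts = []
    remaining = list(records)
    for name in order:
        remaining = [v for v in remaining if cards[name](v)]
        counts.append((name, len(remaining)))
    return counts, remaining

original, kept_a = run_chain(
    ["Read Depth", "Variant Allele Freq", "NOTCH Genes"], variants)
reordered, kept_b = run_chain(
    ["NOTCH Genes", "Read Depth", "Variant Allele Freq"], variants)
print(original)   # [('Read Depth', 3), ('Variant Allele Freq', 2), ('NOTCH Genes', 1)]
print(reordered)  # [('NOTCH Genes', 4), ('Read Depth', 2), ('Variant Allele Freq', 1)]
print(kept_a == kept_b)  # True
```

Moving the gene card to the top changes how many variants each card reports, but the set of variants at the bottom of the chain is unchanged.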
Exploring Filter Chains for Each Sample Including the Control Sample
The filter chain filters variants for a single sample at a time. You can browse through all of the samples to examine the effect of the filter chain on each sample. By default, only the affected samples will be shown in the sample selection dialog, or through browsing, but this option can be changed to show all samples.
In this section of the tutorial we will do the following:
- Examine the results of the filter chain set up in the previous steps for all samples, including the control sample.
- Perform variant classification on the variants.
- Wrap up by removing any variants present in the control sample that have at least one copy of the alternate allele.
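The last item, removing variants that carry at least one alternate allele in the control sample, can be sketched in Python as a lookup against the control genotypes. This is an illustration of the idea rather than the VarSeq procedure: the variant keys, genotype encoding, and the choice to treat a variant that is absent from the control as reference-only are all assumptions.

```python
# Variants are keyed by (chromosome, position, ref, alt); all values invented.
tumor_variants = {
    ("chr1", 100, "A", "G"),
    ("chr2", 200, "C", "T"),
    ("chr3", 300, "G", "A"),
}

# Control genotypes as allele-index tuples: 0 = reference, 1 = alternate.
control_genotypes = {
    ("chr1", 100, "A", "G"): (0, 1),  # heterozygous in the control
    ("chr2", 200, "C", "T"): (0, 0),  # reference-only in the control
}

def has_alt_allele(genotype):
    """True when the genotype carries at least one alternate allele."""
    return any(allele is not None and allele > 0 for allele in genotype)

# Assumption: a variant missing from the control is treated as reference-only.
kept = {
    key for key in tumor_variants
    if not has_alt_allele(control_genotypes.get(key, (0, 0)))
}
print(sorted(kept))  # [('chr2', 200, 'C', 'T'), ('chr3', 300, 'G', 'A')]
```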
1. On the menu bar, you should see a left navigation arrow, the sample name and a right navigation arrow. Click on the navigation arrows on either side of the sample name to see how the number of variants that pass through the filter chain changes. See Figure 7.1.
2. First, 10 Percent has 2 variants that pass through the filter chain. See Figure 7.2.
3. Next, 25 Percent has 3 variants that pass through the filter chain.
4. Finally, 50 Percent has 3 variants that pass through the filter chain. See Figure 7.4.
5. To perform variant classification (Annotate Transcripts), all we need to do is annotate against a transcript/gene source. From the menu bar, click on the Add icon and select Variant Annotation…. Type RefSeq into the filter box, select the RefSeq Genes 105v2, NCBI source, and click Select. See Figure 7.5.
After the annotation has finished, new column groups will be added to the table pertaining to variant classification and transcript/gene annotation.
Finishing Project and Saving as Template
To wrap up this tutorial, we will save this project and also save it as a template that can be reused on other samples.
Much more analysis can be performed in VarSeq; however, we hope this was a good orientation to most of the functionality of the filter chain and to how to annotate variants and perform computations on sources.
1. To save the project, either:
- Go to File > Save Project, or
- Click on the Save Project icon on the tool bar, or
- Use the keyboard shortcut <CTRL> + S.
2. To save the project as a template, go to File > Save Project as Template….
Suggested information that can be filled in the text boxes is shown in the image below and also in the bulleted list that you can copy and paste into the dialog. See Figure 8.1.
- Name: Cancer Gene Panel Tutorial Template
- Description: This template takes multiple cancer gene panel samples, performs quality filtering and variant allele frequency filtering, identifies variants in CIViC, and performs transcript annotation.
- Series Name: Cancer Template
- Version: 1.4.2
- Author: Golden Helix, Inc. (or your name)
- Tags: cancer
- Specific Assembly: Check this box and make sure the assembly is set to Homo sapiens (Human), GRCh37 g1k (Feb 2009)
Once this information is set, click Save.
Congratulations! You have completed the Cancer Gene Panel Tutorial. If you had any problems or have any questions please feel free to contact us at firstname.lastname@example.org.