Welcome to the VarSeq VSClinical ACMG Tutorial!
This tutorial covers a basic VSClinical ACMG workflow with an emphasis on understanding and exploring VSClinical ACMG classification tools.
To complete this tutorial you will need to download and unzip the following file, which includes a starter project.
VarSeq VSClinical ACMG
VSClinical is a tool that provides a simple way to leverage all the available evidence for a variant and score it for its potential impact on a disorder. The available evidence falls into groups that include gene function and phenotype association, variant segregation in a family, the results of prediction algorithms, and the variant's presence in databases. It also draws on previous classifications you yourself have made for any variant.
This collection of evidence is then linked to 33 criteria that the user can efficiently assess in a streamlined effort. These criteria then are aggregated and provide the basis for the final classifications of pathogenic, likely pathogenic, uncertain significance, likely benign or benign. When considering the amount of available evidence and requirements for eventually classifying variants, this process can become complex and difficult to master. Fortunately, VSClinical is a solution to this complexity for a number of reasons.
VSClinical not only simplifies the process of scoring and classifying variants, but also provides a simple yet sophisticated way of presenting all evidence and criteria visually.
As you work through the classification process, you will be presented with questions to answer that connect the evidence to the most appropriate criteria. VarSeq also computes recommended answers and provides the supporting evidence for each recommendation.
Figure 1-2: Pathogenic classification results.
One major concern when answering these evidence-based questions and selecting the appropriate criteria is the repeatability of the resulting classification. An often overlooked issue when working through the guidelines manually is the impact of fatigue. VSClinical helps reduce this variability and is easy to learn. Through VarSeq’s intuitive workflow, a user who is not well versed in the ACMG guidelines can be brought up to speed rapidly and produce consistent results. Moreover, multiple users can make individual blind evaluations, which can then be compared to determine whether they arrive at consistent classifications. It is also worth mentioning that despite the simplified workflow, the user can still dive deep into any of the supporting evidence to further clarify the variant’s impact and classification.
Another advantage of getting new users up to speed quickly is that the high workload of assessing variants is streamlined, so fewer users can do more. On top of this, our team works to incorporate new developments to the guideline process, so you don’t have to.
This is a brief overview of the power behind the VSClinical ACMG workflow. We have several other resources and publications that can shed more light on the content, but for now let’s focus on an example workflow.
2. New Project Workflow
This tutorial walks the user through an ACMG workflow from blank template through to report. First, a new, blank project will be created, the VCF file imported, the variants filtered, and the ACMG Classifier run on the selected variants. This will lead to some example case variants and finally a report to highlight the results.
This tutorial is accompanied by a VCF file contained in a ZIP file. Before starting, download the ZIP file and extract the contents to a convenient location.
This tutorial begins by opening up a new instance of VarSeq and creating a new project with the GRCh37 Genome Assembly and Empty Project template. The image below shows these options and labels this project: ACMG Tutorial Example.
Selecting OK brings you to the next screen, which directs the user to import variants. Select Add Files, navigate to the location of the extracted ZIP file, and select the VCF file. The default selections can be kept, and the import completed by selecting Next three times and then Finished.
At this point we have a blank template project with 1 sample imported consisting of 30,294 variants, so the next step is to filter the variants down to a smaller number of applicable variants for further examination.
3. The Filter View Interface
We first want to look at high quality variants, so the first filter we will apply is Read Depth. This field is not shown by default, so we will show it by selecting the eye icon in the Variant tab, expanding the Sample Fields category, selecting Read Depths (DP), and then clicking OK to close the dialog.
Next, returning to the Variant tab, add this field to the filter chain by locating it under the Sample Fields category, right-clicking on it, and selecting Add to Filter Chain.
Next, enter a value of 100 in the Read Depths card and select both the Equal to and Greater than options, so that variants with a read depth of 100 or more are kept. Then add the Genotype Qualities (GQ) field to the filter chain by right-clicking on the column title in the Variant tab and selecting Add to Filter Chain. For this filter, we will input a value of 20 and select only the Greater than option on the filter card. Now the VarSeq project and filter chain should look like the image below.
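For readers who want to see the logic of these two quality filters outside the interface, here is a minimal Python sketch. It is not VarSeq code: the function name passes_quality, the toy records, and the choice to drop variants with a missing DP or GQ are all assumptions made for illustration.

```python
from typing import Optional

def passes_quality(dp: Optional[int], gq: Optional[int],
                   min_dp: int = 100, min_gq: int = 20) -> bool:
    """Mimic the two filter cards: DP >= min_dp and GQ > min_gq."""
    if dp is None or gq is None:
        # Assumption: a variant with a missing value fails the filter.
        return False
    # "Equal to" + "Greater than" on the Read Depths card  -> dp >= min_dp
    # "Greater than" only on the Genotype Qualities card   -> gq >  min_gq
    return dp >= min_dp and gq > min_gq

# Hypothetical records, not taken from the tutorial VCF.
toy_variants = [
    {"id": "a", "DP": 250, "GQ": 99},   # passes both cards
    {"id": "b", "DP": 100, "GQ": 21},   # passes: DP equal to 100 is allowed
    {"id": "c", "DP": 99,  "GQ": 99},   # fails the Read Depths card
    {"id": "d", "DP": 300, "GQ": 20},   # fails: GQ must be strictly above 20
    {"id": "e", "DP": None, "GQ": 50},  # fails: missing read depth
]

kept_ids = [v["id"] for v in toy_variants if passes_quality(v["DP"], v["GQ"])]
print(kept_ids)
```

The difference between the two cards is only the comparison operator: the read depth threshold is inclusive, while the genotype quality threshold is exclusive.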
Next, we would like to filter further using specific annotation sources. To add annotations, click on Add in the title bar and select Variant Annotation.
This will bring up the Select Data Source dialog where you want to select the Public Annotations header in the file tree.
We are going to select two annotation sources: NHLBI ESP6500SI-V2-SSA137 Exomes Variant Frequencies 0.0.30, GHI and ClinVar 2019-02-01, NCBI. To find these in the list of annotations, you can scroll through the list or type part of a title in the Filter box to limit the annotations displayed. To select this specific older version of ClinVar, you must un-check the Current checkbox in the top-right of the window, because previous versions of sources are hidden by default.
Note: We are using these specific allele frequency and ClinVar tracks here to demonstrate manually annotating and filtering variants, not as examples of a production workflow. At the end of this tutorial we will discuss better filters and how to save your project as a template. We also suggest using the ACMG Guidelines Gene Panel template as a starting point for production workflows.
The annotation sources are selected by checking the box next to the annotation title and then clicking on Select. At this point, the user may be asked to download the selected sources to the local system for faster recall in the future, so select Yes.
The annotation sources are added to the variant table and can be found by scrolling to the right. We are going to add a filter on the Classification field in the ClinVar track and on All MAF from the NHLBI track. This is done the same way as above, by right-clicking on the field column title and selecting Add to Filter Chain.
Since we are trying to filter down to problematic variants, we are going to select Likely Pathogenic, Pathogenic, Uncertain Significance, and Missing in the ClinVar Classification filter card. In the All MAF filter card, enter a Minor Allele Frequency value of 0.3 and select every option EXCEPT Greater than.
Keep in mind that we can rearrange the filters to change the order in which the filters are applied. The filter cards can be rearranged by left clicking on the card title and dragging and dropping the card in a different order. While moving cards, users can notice that the variant totals will change accordingly as the filters are evaluated top-to-bottom. We will use the filter cards in the order of Read Depths, Genotype Qualities, Classification, All MAF to match the image below.
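The top-to-bottom evaluation of the filter chain can be sketched in a few lines of Python. This is only an illustration under stated assumptions: run_chain, the card predicates, and the six toy variants are hypothetical, and selecting every All MAF option except Greater than is interpreted here as keeping values at or below 0.3 as well as missing values.

```python
from typing import Callable, Dict, List, Tuple

Variant = Dict[str, object]
Card = Tuple[str, Callable[[Variant], bool]]

def run_chain(variants: List[Variant], cards: List[Card]):
    """Apply the cards in order and record how many variants survive each one."""
    remaining = list(variants)
    counts = []
    for name, keep in cards:
        remaining = [v for v in remaining if keep(v)]
        counts.append((name, len(remaining)))
    return remaining, counts

# None stands in for the "Missing" option on a filter card.
KEPT_CLASSES = {"Likely Pathogenic", "Pathogenic", "Uncertain Significance", None}

cards: List[Card] = [
    ("Read Depths", lambda v: v["DP"] >= 100),
    ("Genotype Qualities", lambda v: v["GQ"] > 20),
    ("Classification", lambda v: v["clinvar"] in KEPT_CLASSES),
    ("All MAF", lambda v: v["maf"] is None or v["maf"] <= 0.3),
]

# Hypothetical records, not taken from the tutorial VCF.
toy = [
    {"id": "v1", "DP": 300, "GQ": 99, "clinvar": "Pathogenic", "maf": 0.001},
    {"id": "v2", "DP": 50,  "GQ": 10, "clinvar": "Pathogenic", "maf": None},
    {"id": "v3", "DP": 150, "GQ": 10, "clinvar": None, "maf": None},
    {"id": "v4", "DP": 200, "GQ": 60, "clinvar": "Benign", "maf": 0.2},
    {"id": "v5", "DP": 120, "GQ": 45, "clinvar": None, "maf": 0.45},
    {"id": "v6", "DP": 180, "GQ": 70, "clinvar": "Uncertain Significance", "maf": None},
]

final_fwd, counts_fwd = run_chain(toy, cards)
final_rev, counts_rev = run_chain(toy, list(reversed(cards)))
print(counts_fwd)
print(counts_rev)
```

Because every card must pass for a variant to survive, reordering the cards changes the intermediate totals shown on each card but not the final set of variants.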
At this point the extensive filtering has led us to 987 variants. We will order these in decreasing Read Depth (DP) by right clicking on the Read Depth column and selecting Sort Descending.
From this point, we would like to run the ACMG Classifier algorithm.
4. ACMG Algorithm
This is performed by selecting Add in the title bar and selecting Computed Data…. Then, under Variant > Per Sample, select ACMG Sample Classifier and click OK.
Running this algorithm will require some additional annotation sources to be downloaded for use and the status of these sources will be shown in the following dialog.
This dialog prompts the user to choose the assessment catalogs for saving Cancer Interpretations, Somatic Variants, and Germline ACMG Variants. If assessment catalogs for any category already exist, they can be found or navigated to from the corresponding drop-down menu, but we want to create new assessment catalogs for this tutorial by selecting Create first for the Internal Database of Cancer Interpretations section.
The create catalog dialog will first ask you to determine the database type. The options are SQLite, PostgreSQL, MySQL, or, if you have the added feature connected, VSWarehouse.
If not using VSWarehouse, a common choice is the SQLite type with the catalog saved locally. In either case, select a name or location for the new assessment catalog and select OK.
At this point an assessment catalog has been selected or created, and we are ready to run the ACMG Sample Classifier by selecting OK on this screen as well as on the following screen to keep the default values.
This may take a minute to run, but will add the ACMG Sample Classifier results to the variant table.
At this point, it would be useful to change the column visibility using the eye icon at the top left of the variant table. Using the up arrow on the bottom right, move the ACMG Sample Classifier fields below Sample Fields. This will position the ACMG Classifier columns next to Sample Fields for easier navigation.
Figure 4-5: Changing column visibility.
Once the algorithm completes computation, scroll to the output in the variant table. Feel free to review the output, particularly the Classification column. Right click on the Classification column to add this field to the filter chain. Once added to the filter chain select Likely Pathogenic, Pathogenic, and VUS/Weak Pathogenic. Your filter chain should look like Figure 4-6, leaving us with 18 variants for potential evaluation in VSClinical.
Return to the variant table to create a variant flag used for selection of variants to be evaluated. Ensure that the variant set is Sample specific then click OK.
With the variants sorted for descending read depth, select the top 4 variants for evaluation: the CT deletion in PTEN, large insertion in EGFR, and the G/A variants in MLH1 and RAF1.
The next step is to look at the ACMG Guidelines.
5. ACMG Guidelines
After running the ACMG Sample Classifier algorithm, opening the ACMG Guidelines is as easy as opening a new tab, selecting VSClinical, and then selecting ACMG Guidelines from the drop-down menu.
This will open a dialog that asks the user to specify which assessment catalog to use (or to create a new catalog) for each of the three databases. Starting with the Internal Database of Classified Germline ACMG Variants for Samples, the database options are SQLite, PostgreSQL, MySQL, or, if you have the added feature connected, VSWarehouse. If not using VSWarehouse, a common choice is the SQLite type with the catalog saved locally. In either case, select a name or location for the new assessment catalog and select OK.
Repeat this step to create two more catalogs, one from the Internal Database of Classified Germline ACMG CNVs for Samples and one from the Internal Database of Gene Dosage Sensitivity Curations. The example below shows 3 created SQLite assessment catalogs called Germline, ACMG CNV Case Catalog, and ACMG CNV Gene Catalog.
Next will be creation of the variant sets for Primary and Secondary Findings and Uncertain Significance variants. Click the Create link at the bottom of the window to create the variant sets then click Apply.
The next window will present an evaluation summary but requires you to select which variants will be evaluated. Scroll down to the Variants to Evaluate section and click Add Variants From Project.
In the top left corner, change the drop-down to the To Evaluate variant set, where you’ll see the selected variants come through. Click the Score all Recommended Criteria checkbox at the bottom and click Prepare to Add. You’ll then see the auto-classifications generated for the selected variants. Now click Add Variants to move on to evaluation.
With the variants ready for review, click the Variants tab at the top of the window to move into the detailed evaluation for each variant.
Each variant is automatically evaluated by the ACMG recommendation engine, with many of the individual criteria receiving recommendations on whether it should be answered Yes or No as well as detailed reasons for these recommendations.
In the Evidence Summary section, criteria that are recommended will be listed under Recommended to Score Pathogenic and Recommended to Score Benign. The names of the recommended criteria are shown in colored bubbles with arrow links that navigate to the section containing the criteria question and supporting evidence.
Criteria questions are organized in the following sections lower in the window: Population, Gene Impact, Studies, and Clinical. In the following pages of this tutorial, we will go through a number of examples that will take us through these sections. As we confirm criteria questions, note how the right bar ACMG Scoring section will update with the currently scored criteria and resulting ACMG Classification.
The following examples are taken from the evaluation created with our specific filtering and sorting efforts. The first example is an interesting missense variant in the RAF1 gene that at first looks to be pathogenic.
6. Example 1 Pathogenic RAF1 S257L
The first variant is a missense RAF1 variant in exon 7 and is shown as number 1 in the variant cards of the sidebar. This variant will be determined to be a straightforward pathogenic variant.
First look at the Population Frequency section below, which shows that this variant is novel in both the gnomAD Exomes Frequency and 1000 Genomes Frequency tracks. This allows the inclusion of the first moderate criterion, PM2. Next, move on to the Gene Region and Mutation Profile section.
Another moderate criterion, PM1, is confirmed here since this variant is located in a mutational hotspot with no known benign variants. Furthermore, with a high missense Z-score shown in the Missense as Mechanism of Disease section, the gene has a low rate of benign variants and is known to have other missense variants that provide a common mechanism for disease. Together these capture the first supporting criterion, PP2. Moving down further to the Computational Evidence section, the second supporting criterion captured is PP3, since SIFT and PolyPhen predict the variant to be damaging and PhyloP and GERP++ predict the position to be conserved.
The last two criteria to assess are in the Clinical and Functional Studies section. Due to the presence of multiple other high-quality (2+ star rating) pathogenic variants in ClinVar, the strong PS1 criterion and the moderate PM5 criterion can be included. This is, of course, contingent on a literature review, which is easily accessible within VSClinical. In this section, variants can be searched in Google, Google Scholar, or PubMed from their corresponding buttons. The detailed assessments and citations of labs submitting this variant to ClinVar can also be reviewed. Two publications are easily found that reference not only this specific variant, but also other pathogenic variants at the same amino acid. These papers can be cited in the final interpretation. After completing these classifications, return to the ACMG Classification/Interpretation section.
At this point all of the criteria have been accounted for, the auto interpretation has been built, and the final classification is presented. The variant is pathogenic under rule iii of the ACMG combining rules, having 1 strong, 3 moderate, and 2 supporting criteria.
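To make the aggregation step concrete, the following Python sketch encodes the combining rules published in the 2015 ACMG/AMP guidelines. It is an illustration only, not VSClinical's recommendation engine: the function names are hypothetical, strength-modified codes such as PVS1_Strong are not handled, and conflicting pathogenic and benign evidence is simply returned as uncertain significance.

```python
import re
from collections import Counter
from typing import Iterable

# Criteria codes are grouped by their strength prefix, e.g. "PM2" -> "PM".
_CODE = re.compile(r"^(PVS|PS|PM|PP|BA|BS|BP)\d+$")

def count_strengths(criteria: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    for code in criteria:
        match = _CODE.match(code)
        if match is None:
            raise ValueError(f"Unrecognized criterion code: {code}")
        counts[match.group(1)] += 1
    return counts

def classify(criteria: Iterable[str]) -> str:
    c = count_strengths(criteria)
    pvs, ps, pm, pp = c["PVS"], c["PS"], c["PM"], c["PP"]
    ba, bs, bp = c["BA"], c["BS"], c["BP"]

    pathogenic = (
        # Rule i: very strong plus additional evidence
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        # Rule ii: two or more strong
        or ps >= 2
        # Rule iii: one strong plus moderate/supporting combinations
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    # Assumption: contradictory evidence falls back to uncertain significance.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"

# The six criteria scored for the RAF1 variant in this example.
raf1_result = classify(["PS1", "PM1", "PM2", "PM5", "PP2", "PP3"])
print(raf1_result)
```

With one strong, three moderate, and two supporting criteria, the RAF1 example satisfies the first branch of rule iii (one strong and at least three moderate), so the supporting criteria are not needed to reach the pathogenic classification.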
In summary, this has been determined to be a straightforward pathogenic variant, so we want to include it in the final report, specifically in the primary findings. In the interpretation section for this variant, select the Primary Findings check box, verify that Pathogenic is selected in the Classification drop-down menu, and then click in the For Disorder: box. This brings up a list of associated phenotypes based on the OMIM annotation source. Select Noonan Syndrome 5, and leave the Inheritance / Variant Type: as the default. Click Review and Save to confirm the variant interpretation and then Save and close in the pop-up menu.
After saving the interpretation for the RAF1 missense variant, we’ll now click on the MLH1 synonymous variant.
7. Example 2 Uncertain Significance MLH1 Q701= Splice Variant
The first criterion listed for this synonymous variant in MLH1 is PM2, answered yes because the variant is rare or novel in the 1kG Phase 3 and gnomAD Exomes frequency catalogs. Looking at the Computational Evidence section, you’ll see that this G>A mutation is not only in a conserved position but is also predicted by all splice site algorithms to disrupt a splice site.
You would expect the inclusion of PVS1_strong since all 4 algorithms agree in disruption of a donor splice site with 40 downstream pathogenic variants indicating this region is critical to protein function. However, because the mutation results in a synonymous variant, this is a special consideration in which PVS1 cannot be scored. This demonstrates how some of the complexities in applying criteria are being automatically considered and presented to the users for thorough evaluation.
Let us move on now to example 3.
8. Example 3 Likely Pathogenic EGFR In-Frame Insertion
The third example is an in-frame insertion in EGFR. Select this variant card and look at the Population Frequency section. The frequency tracks show that the variant is novel, so the PM2 criterion is included. Next, move to the Gene Region and Mutation Profile section.
Expanding the Gene Region and Mutation Profile section and showing the details for PM1 explains that this 12 bp in-frame insertion is located in a mutational hotspot, with 2 other variants within 6 amino acids having been shown to be pathogenic and none shown to be benign. Along with scoring PM1 (Hot spot), we can score PM4, as this insertion changes the protein length by adding 4 residues while not being in a repeat region (seen in the Coding Change and Repeats section). Finally, we can see in the Computational Evidence section that this position is conserved by GERP++ and PhyloP across 100 vertebrate species, so we can score PP3.
While looking at this section, we can consider how any insertion or deletion is represented, using the display of the mutation aligned against the transcript strand. The standard procedure is to represent an insertion or deletion in the right-aligned format, or 3’ alignment against the transcript. However, in forward strand genes, this position is opposite to how a variant is represented in a VCF file (left-aligned to the reference sequence). VSClinical automatically re-aligns the variant to the canonical 3’ transcript position, but allows you to view the variant in either alignment. This particular variant moves 12 bp and across an intron-exon boundary, and would be completely misclassified by any tool not performing right-shifting normalization.
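The idea behind right-shifting can be shown with a short Python sketch for an insertion in a forward strand gene, where the 3’ direction of the transcript is to the right in reference coordinates. This is a generic illustration and not VSClinical's implementation: the function names, the 0-based coordinates, and the toy sequence are assumptions.

```python
from typing import Tuple

def right_shift_insertion(ref: str, pos: int, ins: str) -> Tuple[int, str]:
    """Shift an insertion as far right as possible without changing the result.

    pos is a 0-based index; the inserted bases go immediately before ref[pos].
    """
    # While the next reference base equals the first inserted base, the
    # insertion can slide one position right by rotating its sequence.
    while ins and pos < len(ref) and ref[pos] == ins[0]:
        ins = ins[1:] + ins[0]
        pos += 1
    return pos, ins

def apply_insertion(ref: str, pos: int, ins: str) -> str:
    """Return the alternate sequence produced by the insertion."""
    return ref[:pos] + ins + ref[pos:]

# Toy example: a CA repeat, with the insertion left-aligned as in a VCF file.
toy_ref = "GCACACAT"
left_pos, left_ins = 1, "CA"

right_pos, right_ins = right_shift_insertion(toy_ref, left_pos, left_ins)
left_alt = apply_insertion(toy_ref, left_pos, left_ins)
right_alt = apply_insertion(toy_ref, right_pos, right_ins)
print(left_pos, right_pos, left_alt == right_alt)
```

Both representations produce the same alternate sequence, yet the reported position differs by the length of the repeat. In a real gene that difference can move a variant across an exon boundary, which is why the alignment convention matters for classification. For a reverse strand gene, the 3’ direction of the transcript corresponds to the left in reference coordinates, so the same idea would be applied in the opposite direction.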
The final criteria listed here (PP5 and PM5) relate to nearby variants and to previously submitted classifications of this variant as likely pathogenic in ClinVar.
9. Example 4 Uncertain Significance PTEN Stop Gain
For the last example select variant number 4. First move to the Population Frequency section to analyze the population counts. It is shown again in this example that this variant is novel, so it captures the first layer of moderate evidence PM2. Next, move on to the Tolerant Loss of Function Mutations section.
Here we can further analyze the frameshift variant in PTEN. The PVS1 criterion is presented for null variants in a gene where LOF is a known mechanism of disease (note that PVS1 is very strong evidence toward a pathogenic classification). VSClinical provides critical insight here: this variant is in the last exon of the gene, but there is also one other pathogenic stop gain variant downstream of it. This is the justification for the PVS1_Strong criterion.
Now we’ll take a look at reporting capabilities with the RAF1 Pathogenic variant evaluated earlier.
10. Using Interpretations in Reports
Now click on the Report tab at the top of the VSClinical page. You’ll see open fields available to enter sample level data. Feel free to test filling these sample fields, but ultimately select the Microsoft Word icon on the right panel when ready to render the report.
In the center of the screen, click the New Report Template icon, and choose from the pre-made templates. Select the Mendelian Disorder Template and label it ACMG tutorial.
Next, click the Render icon.
If you have Microsoft Word on your machine, the rendered report should open automatically in Word. Here you’ll see the sample and lab information at the top as well as the variant interpretation for the RAF1 Pathogenic variant we flagged for Primary Findings.
To modify the structure of any report template, click on the ACMG Tutorial.docx open location link. This document shows where each piece of data will appear in the final report and can easily be rearranged or modified to suit your preferred template layout.
11. Making a Project Template
Finally, if we want to repeat all of these steps with data from a new set of samples, VarSeq supports saving any project as a Project Template that allows you to recreate the exact same project with a new set of input VCFs.
Simply go to File > Save Project as Template… and provide a new name for the template. You also have the option of providing documentation for users who will see it in the New Project dialog.
That concludes the VarSeq VSClinical ACMG Guidelines workflow tutorial.
This tutorial was designed to give a taste of all the features and capabilities of VarSeq and a brief orientation to key features.
If you are interested in getting a demo license to try out additional features that require an active license, such as creating a project, adding annotation sources, and saving projects, please request a demo from: Discover VarSeq
If you have an active license, we encourage you to try out the intermediate tutorial on Cancer Gene Panels: Cancer Gene Panel Tutorial
Additional features and capabilities are being added all the time, so if you do not see a feature you need for your workflows please do not hesitate to let us know!