Oncogenicity Scoring in VSClinical

Summary

Before accessing the clinical evidence associated with a specific variation, one must establish that the variant is likely to be a driver mutation, which generates functional changes that enhance tumor cell proliferation. In this webcast, we will discuss VSClinical’s capabilities for determining the oncogenicity of a variant. This will include a deep dive into our oncogenicity scoring system and a discussion of the various criteria used to distinguish driver mutations from benign variations and variants of uncertain significance.

What you will learn in this webcast:

  • How to evaluate the oncogenicity of a variant in VSClinical
  • What evidence to consider when classifying LoF variants
  • How to examine the in-silico evidence for missense variants
  • How to evaluate a variant's rate of occurrence in somatic catalogs

About the Presenter

Dr. Nathan Fortier

Nathan Fortier, Ph.D., Director of Research for Golden Helix, joined the development team in June of 2014. Nathan obtained his Bachelor’s degree in Software Engineering from Montana Tech University in May 2011, received a Master’s degree in Computer Science from Montana State University in May 2014, and received his Ph.D. in Computer Science from Montana State University in May 2015. Nathan works on data curation, script development, and product code. When not working, Nathan enjoys hiking and playing music.

Transcript

Before I jump into a discussion of the AMP guidelines and our oncogenicity scoring, I'd like to take a moment to talk about our NIH funding. So, the research that we're reporting here was supported by the National Institute of General Medical Sciences of the National Institutes of Health under the following awards. The PI under this grant is Andreas Scherer, Ph.D., and the content that I'm presenting today is the sole responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

I'd like to take a moment to talk about who we are as a company. Golden Helix is a global bioinformatics software and analytics company that enables research and clinical practices to analyze large genomic data sets. We were originally founded in 1998, based on pharmacogenomics work performed at GlaxoSmithKline, which was and still is a primary investor in our company. We currently have three flagship products: VarSeq, SVS, and VSWarehouse. SVS is our research application, and it enables researchers to perform complex analyses and visualizations on genomic and phenotypic data. VarSeq, on the other hand, is our clinical application platform, used for filtering and annotating variants of interest.

And of course all the information produced from VarSeq can then be stored in our VSWarehouse solution, which is designed to be installed on a server location and serve as a repository for your variants, evaluations, annotations, and hosted reports.

Our software has been very well received by the industry over the years and we've been cited in thousands of peer-reviewed publications and it's really a testament to our customer base. We work with over 400 organizations all over the globe. This includes pharmaceutical companies, top-tier institutions such as Stanford and Yale, government organizations, clinics, and genetic testing labs. Our products boast well over 20,000 installations and thousands of unique users.

This means that over the course of 20 years, our products have received a lot of user feedback, which we are always very receptive to when developing and releasing new versions of our products. This feedback keeps our software relevant and well vetted in its capabilities and quality, which builds our products' reputation, trust, and client experience. We also stay on the forefront of the needs of the industry and community by regularly attending conferences and providing useful product information via eBooks, tutorials, and blog posts.

Your access to the software is a simple subscription-based model where we don't charge per-sample, nor per-version. You also maintain full access to our support and training staff to help you get up to speed quickly and get going with your analysis.

Now, the Golden Helix stack supports the entire workflow for NGS genetic testing of cancer, all the way from FASTQ to clinical reports. Through our partnership with Sentieon, we allow you to call small variants, such as SNPs and insertions and deletions. Through VS-CNV, we allow you to call both large and small CNVs, including amplifications and deletions.

Once you've called your variants and your CNVs, you can annotate, filter, and prioritize your SNPs and CNVs using VarSeq. Then, once you have filtered down to a set of interesting SNPs and CNVs, you can interpret these using VSClinical. And, of course, you can interpret variants in accordance with the ACMG guidelines and AMP guidelines for germline and somatic variants, respectively. Once you've completed your interpretation, you can generate beautiful clinical reports that can be saved in either Microsoft Word or PDF format. And of course, all variants in any of your VarSeq projects can be exported to Excel or as plain text.

So now I'd like to talk a little bit about the biology of cancer. The reality is that cancer is understood at a patient level by understanding the underlying biology, and that biology is driven by genetics, as cancer is a disease of the genome. To best understand the biology of cancer, the seminal paper “Hallmarks of Cancer” really digs into the cellular mechanisms that drive tumorigenesis and metastasis. These hallmarks include things like angiogenesis and escaping programmed cell death. A given gene can promote or suppress one or more of these hallmarks. Understanding these specific drivers as they relate to the genetics of a tumor in a specific cancer opens new opportunities for things like targeted molecular therapies.

For example, if we're looking at an oncogene, then we know we're looking for variants that are going to be gains of function for that gene. If, on the other hand, we're looking at a tumor suppressor gene such as TP53, then we know we're going to be looking for variants that are likely to cause a loss of function in that gene.

Now, many biomarkers relevant to cancer treatment, diagnosis, and prognosis can be identified using a Next-Generation Sequencing panel, which is why we see the proliferation of these kits from Ion Torrent and Illumina that are oriented around the 50-100 most commonly mutated oncogenes and tumor suppressor genes in cancer.

The identification of relevant biomarkers can provide the clinician with indications for treatment along with prognostic or diagnostic information. These biomarkers range from the presence or absence of certain proteins and antigens to specific genomic attributes of the tumor. And many of these biomarkers, such as BRAF V600E or CNVs such as ERBB2 amplifications, can be identified through Next-Generation Sequencing.

Now, of course, once you've identified these biomarkers, it is important to have a set of best practices to guide the interpretation of these variants. And the AMP guidelines provide such best practices in the context of somatic variants.

The AMP guidelines include an evidence tier system for assessing the quality of the evidence for the biomarker under consideration within the context of the patient's tumor type. Tier 1 evidence includes things like FDA-approved therapies targeting a specific biomarker in that tumor type. Tier 2 evidence includes FDA-approved therapies for different tumor types along with investigational therapies. And Tiers 3 and 4 are reserved for variants that are of uncertain significance or that are benign, respectively.
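
As a rough illustration of the tier logic just described, the Python sketch below maps a hypothetical evidence summary to a tier. The `BiomarkerEvidence` class and `amp_tier` function are illustrative names, not part of VSClinical, and the sketch follows only the simplified description in this talk; the full AMP guidelines also grade evidence levels (A through D) that are ignored here.

```python
from dataclasses import dataclass


@dataclass
class BiomarkerEvidence:
    """Hypothetical summary of the evidence found for one biomarker."""
    fda_therapy_same_tumor: bool = False   # FDA-approved therapy for this biomarker and tumor type
    fda_therapy_other_tumor: bool = False  # FDA-approved therapy in a different tumor type
    investigational_therapy: bool = False  # therapy under investigation
    benign: bool = False                   # variant judged benign


def amp_tier(ev: BiomarkerEvidence) -> str:
    """Simplified tier assignment mirroring the description above (not the full AMP rules)."""
    if ev.benign:
        return "Tier 4"
    if ev.fda_therapy_same_tumor:
        return "Tier 1"
    if ev.fda_therapy_other_tumor or ev.investigational_therapy:
        return "Tier 2"
    return "Tier 3"  # no actionable evidence: uncertain significance


print(amp_tier(BiomarkerEvidence(fda_therapy_other_tumor=True)))  # Tier 2
```

The benign check comes first so that a variant judged benign is never promoted to an actionable tier by stray therapy evidence.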

VSClinical implements all of the requirements described in the AMP guidelines in a guided workflow that supports the scoring and interpretation of all variants and produces high-quality and customizable clinical reports.

The primary value of our workflow is reuse of work: all interpretations that are saved can later be reused. Interpretations can of course be saved at the variant level, but they can also be saved at the gene or exon level. For example, if you're saving an interpretation for a damaging variant in TP53, then you may want to save your interpretation within the scope of all loss-of-function variants in TP53. So, the next time you see any damaging variant in that gene, you can reuse that previously saved interpretation, even if it is not the exact same variant.
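
A minimal sketch of scoped reuse, assuming a simple in-memory store: saved interpretations are looked up from the most specific scope (the exact variant) to broader ones (the exon, then all loss-of-function variants in the gene). The `Variant` and `InterpretationStore` names and the scope keys are hypothetical and do not describe how VSClinical stores its knowledge base.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Variant:
    gene: str
    exon: int
    protein_change: str  # illustrative protein notation, e.g. "p.R196*"
    is_lof: bool         # frameshift, nonsense, canonical splice, ...


class InterpretationStore:
    """Hypothetical store of saved interpretations keyed by (scope, key)."""

    def __init__(self) -> None:
        self._saved: Dict[Tuple[str, object], str] = {}

    def save(self, scope: str, key: object, text: str) -> None:
        self._saved[(scope, key)] = text

    def lookup(self, v: Variant) -> Optional[str]:
        # Try the most specific scope first, then fall back to broader ones.
        candidates = [("variant", (v.gene, v.protein_change)),
                      ("exon", (v.gene, v.exon))]
        if v.is_lof:
            candidates.append(("gene_lof", v.gene))
        for candidate in candidates:
            if candidate in self._saved:
                return self._saved[candidate]
        return None


store = InterpretationStore()
store.save("gene_lof", "TP53", "Loss of function of TP53 ...")
# A different LoF variant in the same gene reuses the gene-level interpretation.
print(store.lookup(Variant("TP53", 6, "p.R196*", True)))
```

Ordering the candidate scopes from narrow to broad means a variant-specific interpretation, when one exists, always wins over a gene-level one.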

The VSClinical AMP workflow also provides the clinician with tools for determining if a variant is likely to be a driver mutation that should be interpreted in the first place. And these tools are part of our oncogenicity scoring system, and this is going to be the crux of today's talk.

When developing our oncogenicity scoring system, we've worked with a number of different stakeholders. The whole point of the scoring system is to allow you to determine whether a variant is likely to be a driver mutation, and whether that variant is worth interpreting. The idea here is that we use an additive scoring system, and when a variant's total score crosses a certain threshold, the variant is assigned to the corresponding class. For example, variants with scores exceeding a threshold of 3 are classified as ‘likely oncogenic’ or ‘oncogenic’, while variants with scores of negative 3 or lower are considered ‘benign’ or ‘likely benign’.
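
The additive scoring described above can be sketched in a few lines of Python. The thresholds follow the talk (scores exceeding 3 versus scores of -3 or lower); the criterion names and point values in the example are hypothetical, and because the talk does not give a separate cutoff for ‘oncogenic’ versus ‘likely oncogenic’, the sketch reports those two classes together.

```python
from typing import Dict, Tuple

# Thresholds as described in the talk; exact boundary handling is an assumption.
ONCOGENIC_THRESHOLD = 3   # scores exceeding 3 -> likely oncogenic / oncogenic
BENIGN_THRESHOLD = -3     # scores of -3 or lower -> likely benign / benign


def classify(criteria: Dict[str, int]) -> Tuple[int, str]:
    """Sum signed criterion points and map the total to a class."""
    score = sum(criteria.values())
    if score > ONCOGENIC_THRESHOLD:
        label = "likely oncogenic or oncogenic"
    elif score <= BENIGN_THRESHOLD:
        label = "likely benign or benign"
    else:
        label = "uncertain significance"
    return score, label


# Hypothetical criterion names and point values, for illustration only.
print(classify({"cosmic": 1, "in_silico": 2, "hotspot": 2}))
# (5, 'likely oncogenic or oncogenic')
```

Because benign evidence carries negative points and oncogenic evidence positive points, conflicting evidence naturally cancels out toward the uncertain-significance band.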

And on this slide, you can see the kinds of evidence that we take into account when determining the oncogenicity of a given variant. When determining whether a variant is benign, we rely on things like population frequencies, whether the variant is observed in a homozygous state in controls, and whether the variant is a silent or intronic variant, or a variant in a UTR that is known to have no effect on splicing. Evidence that we use to determine that a variant is oncogenic includes things like the variant’s occurrence in somatic catalogs such as COSMIC and the assessments associated with that variant in databases like CIViC and ClinVar.

We also take into account the effect of the variant. For example, if you're looking at a loss-of-function variant, our software will examine whether loss-of-function variants tend to be oncogenic in that gene by looking at databases such as COSMIC and the kinds of mutations observed in that gene. And if you're looking at a missense variant, then we will look at what kinds of missense variants we are seeing in the nearby region. Do we see other pathogenic variants within the same region, or do we see benign missense variants within this region? We also check whether the variant occurs in a hotspot or in an active binding site. And for any variant, we'll take into account all available computational evidence. This includes missense prediction algorithms such as SIFT and PolyPhen, conservation scores such as GERP and PhyloP, and splice site prediction algorithms.

So, with that brief overview, I would like to jump into a demonstration of the product so you can see how this really works in practice. This is the screen that you'll see when you first open the AMP workflow. The idea here is that we've already filtered down to a set of interesting variants that we'd like to interpret, and now we're going to examine these variants. The first screen shows simple information about the sample and about the patient. Beyond this, you can also specify the kind of cancer that we're looking at. In this case, I've selected Colorectal Adenocarcinoma. Later, when we start considering the evidence for a given biomarker, we will look for evidence that is relevant to your specified tumor type.

The next screen we have here is our mutation profile, which is where you specify which variants you're going to interpret. Of course, you can always import any small variants or CNVs that are in your current project. But we also allow you to manually enter small variants, to import or manually enter CNVs that were called using other methodologies, and to enter fusions and wild types as well.

Once you've specified the set of variants that you would like to interpret, this is where the oncogenicity scoring comes in. This is the oncogenicity scoring screen, where you can examine whether a variant is likely to be a driver mutation. It starts out by giving you some basic overview information about the variant: which chromosome it occurs on, which gene it occurs in, and what kinds of evidence we are seeing associated with this variant.

If we scroll down to this next section, we can see the specifics of the variant’s oncogenicity. This variant is a textbook example of a driver mutation: it's a G12D mutation in KRAS, a very well-known mutation. And of course, we're classifying it as oncogenic with a score of 10, which is well above our threshold for classifying a variant as oncogenic. If we look on the left here, we can see the specific pieces of evidence that we're using to obtain this classification.

So for example, we can see that the variant occurs in nearly a thousand samples in COSMIC. It's been previously classified as pathogenic in ClinVar. We can see that we have a large amount of computational evidence indicating that this variant is likely to be damaging: both SIFT and PolyPhen predict that it's damaging, it is conserved across all mammalian species in our multiple species alignment, and GERP and PhyloP predict that the region is conserved. We can see that the variant occurs in both a cancer hotspot and an active binding site, and we can also see that 7 variants within 6 amino acid positions of our variant of interest have been shown to be pathogenic and none have been shown to be benign. All of this gives strong evidence of its oncogenicity. For any of these criteria, it's really easy to deep dive into the data and examine it yourself. For example, we could do this with the in-silico prediction algorithms. If we just click on this arrow here, we can see the specific reasons why we're recommending this criterion.

And if we scroll up, we can see the specific scores from our prediction algorithms. Here we can see that SIFT and PolyPhen both predict the variant to be damaging, and that the position appears to be highly conserved according to PhyloP and GERP.

Now if we look at the right, we can see an alignment of 100 different vertebrates, showing exactly what this position looks like in each of those species, which gives us a good feel for how conserved this region is. As we scroll down here, we can see how highly conserved this particular position is. In fact, it's conserved all the way down to lamprey. So clearly this is a highly conserved region.
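
To make the idea of positional conservation concrete, here is a small sketch that summarizes a single column of a multiple species alignment: the fraction of aligned species that carry the reference base, and whether every species in a chosen clade (for example, all mammals) carries it. The input format and function name are assumptions for illustration, and the column shown is synthetic; tools such as PhyloP and GERP use evolutionary models rather than this simple count.

```python
from typing import Dict, Set, Tuple

GAP_OR_MISSING = {"-", "N"}


def conservation_summary(column: Dict[str, str], ref: str,
                         clade: Set[str]) -> Tuple[float, bool]:
    """Summarize one alignment column.

    Returns (fraction of aligned species matching `ref`,
             whether every species in `clade` matches `ref`).
    Species absent from `column` count as not matching in the clade check.
    """
    aligned = [base for base in column.values() if base not in GAP_OR_MISSING]
    fraction = (sum(base == ref for base in aligned) / len(aligned)) if aligned else 0.0
    clade_conserved = all(column.get(species) == ref for species in clade)
    return fraction, clade_conserved


# Synthetic column, not real alignment data.
column = {"human": "G", "mouse": "G", "dog": "G", "chicken": "A", "lamprey": "-"}
print(conservation_summary(column, "G", {"human", "mouse", "dog"}))  # (0.75, True)
```

Gaps are excluded from the fraction so that species with no aligned sequence at this position neither help nor hurt the conservation estimate.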

So, we can go back and examine some of the other criteria that we have here. The next one I'd like to deep dive into is the criterion NP+1, the criterion for a nearby pathogenic variant. What it's asking is: do we have a missense variant in a region that contains multiple pathogenic variants but no nearby benign variants? We can see the reason this is being recommended: 7 variants within 6 amino acid positions of our variant of interest have been shown to be pathogenic. But if we scroll up, we can look at this evidence ourselves.

So, in this table, we present all variants in ClinVar (in the future, it will also present all variants in CIViC), along with variants in your own internal database that you've previously classified, including how those variants have been classified and what kind of variant each one is. And of course, you can use these controls at the top to change the scope of the variants we're looking at. For example, we could look only at variants in the same codon as our variant of interest. And since our variant is a missense variant, we could restrict this further to missense variants within that same codon. If we do this, we can clearly see that there are 6 variants within the same codon as our variant of interest, and every one of them has been classified as ‘pathogenic’. So we can see exactly why this particular criterion is being recommended to us.
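
The NP+1 check and the scope controls can be sketched as a window search over known variants. In this hypothetical sketch, `window=6` mirrors the ‘within 6 amino acid positions’ wording, `window=0` corresponds to the same-codon scope, and requiring at least two pathogenic neighbours is an assumed reading of ‘multiple’; the data is synthetic.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

PATHOGENIC = {"pathogenic", "likely pathogenic"}
BENIGN = {"benign", "likely benign"}


@dataclass
class KnownVariant:
    aa_pos: int           # amino acid position in the protein
    consequence: str      # e.g. "missense", "frameshift"
    classification: str   # e.g. "pathogenic", "likely benign"


def nearby_counts(known: Iterable[KnownVariant], aa_pos: int, window: int = 6,
                  consequence: Optional[str] = None) -> Tuple[int, int]:
    """Count pathogenic and benign known variants within `window` residues.

    window=0 restricts to the same codon; `consequence` optionally restricts
    the comparison to one variant type (e.g. only missense).
    """
    hits = [k for k in known
            if abs(k.aa_pos - aa_pos) <= window
            and (consequence is None or k.consequence == consequence)]
    n_path = sum(k.classification.lower() in PATHOGENIC for k in hits)
    n_benign = sum(k.classification.lower() in BENIGN for k in hits)
    return n_path, n_benign


def nearby_pathogenic_met(n_path: int, n_benign: int, min_path: int = 2) -> bool:
    # Multiple pathogenic neighbours and no benign ones; min_path is an assumption.
    return n_path >= min_path and n_benign == 0


known = [KnownVariant(12, "missense", "pathogenic"),
         KnownVariant(12, "missense", "likely pathogenic"),
         KnownVariant(13, "missense", "pathogenic"),
         KnownVariant(12, "frameshift", "pathogenic"),
         KnownVariant(30, "missense", "benign")]
print(nearby_counts(known, 12))                                    # (4, 0)
print(nearby_counts(known, 12, window=0, consequence="missense"))  # (2, 0)
```

Exact set membership is used for the classification labels so that entries such as ‘conflicting interpretations of pathogenicity’ are not miscounted as pathogenic.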

So that covers a really straightforward oncogenic variant. Let's look at one that is a little less straightforward. Here we have a mutation in MLH1, and you'll notice from the c.790+5G>T notation that this is actually an intronic variant, located 5 base pairs into the intron. Given that this variant is intronic, it's very interesting that we're recommending a ‘likely oncogenic’ classification.
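
Reading the HGVS notation is what reveals that this variant is intronic. Below is a minimal parser for simple coding-DNA substitutions only (it rejects UTR positions, indels, and other forms); the function name and returned fields are illustrative assumptions, not a VSClinical API.

```python
import re

# Simple coding-DNA substitutions only, e.g. "c.35G>A" or "c.790+5G>T".
_C_SUBSTITUTION = re.compile(r"^c\.(\d+)([+-]\d+)?([ACGT])>([ACGT])$")


def parse_c_substitution(hgvs_c: str) -> dict:
    match = _C_SUBSTITUTION.match(hgvs_c)
    if match is None:
        raise ValueError(f"unsupported notation: {hgvs_c}")
    offset = int(match.group(2)) if match.group(2) else 0
    return {
        "exon_base": int(match.group(1)),  # nearest coding (exonic) base
        "intron_offset": offset,           # +n: into the following intron, -n: into the preceding one
        "ref": match.group(3),
        "alt": match.group(4),
        "intronic": offset != 0,
        # +1/+2 (donor GT) and -1/-2 (acceptor AG) are the canonical splice dinucleotides.
        "canonical_splice_site": abs(offset) in (1, 2),
    }


print(parse_c_substitution("c.790+5G>T"))
```

For c.790+5G>T the parser reports an intronic offset of +5 that lies outside the canonical donor dinucleotide, which is why splice prediction algorithms, rather than a simple position rule, are needed to judge its effect.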

So, let's examine why this might be the case. Our first indication in this direction is the fact that this region appears to be highly conserved. Let's look at our multiple species alignment again. Here we can see for our 100 vertebrates, if we scroll down, that this region is in fact very highly conserved. Again, it's conserved all the way down to lamprey, which is interesting for a variant in an intronic region. But if we look here on the left, we can clearly see what's going on: all 4 of our splice site prediction algorithms predict that this variant will disrupt an existing splice site, and that seems to explain why this region is so highly conserved.

And sure enough, if we scroll down, we can see that although we would normally treat a silent variant such as an intronic variant as evidence of the variant being benign, we are not doing so here. Because 4/4 splice site prediction algorithms predict that this variant disrupts an existing splice site, we can no longer say that the variant has no impact, so we no longer recommend this evidence of the variant being ‘benign’. In fact, we have evidence that this variant may disrupt a splice site and may cause a loss of function of this gene. Likewise, we can see that we're recommending a piece of evidence related to the splice site predictions: moderate evidence in support of oncogenicity, because all four of our splice site algorithms predict that this variant will disrupt splicing.

Now it's worth noting that if we only have one of these pieces of evidence, it would not be enough to give us a ‘likely oncogenic’ classification. If the region was highly conserved, but we had no evidence that the variant impacted splicing, then this would result in a variant of uncertain significance. Likewise, if all of our algorithms predicted the variant to impact splicing but the region didn't appear to be conserved, then that again would put this variant into the uncertain significance category. It's only with both of these pieces of evidence together that we can determine that, yes, this variant is likely to be oncogenic despite the fact that it's intronic.
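
The interplay described above can be sketched with hypothetical point values chosen only to reproduce the reasoning: conservation alone or unanimous splice predictions alone stay below the threshold, while both together exceed it. The weights, the rule that benign silent/intronic evidence applies only when no predictor flags a splicing effect, and the handling of split predictor votes are all assumptions, not VSClinical's actual scoring.

```python
# Hypothetical point values, chosen only to reproduce the reasoning above.
CONSERVED_POINTS = 2          # region highly conserved
SPLICE_MODERATE_POINTS = 2    # all splice predictors agree on disruption
SILENT_BENIGN_POINTS = -1     # silent/intronic variant with no splice signal
ONCOGENIC_THRESHOLD = 3       # scores exceeding 3 -> likely oncogenic


def intronic_score(conserved: bool, splice_votes: int, n_predictors: int = 4) -> int:
    score = 0
    if splice_votes == 0:
        # Benign evidence applies only when nothing suggests a splicing effect.
        score += SILENT_BENIGN_POINTS
    elif splice_votes == n_predictors:
        # Unanimous predictions: benign evidence is withdrawn, moderate support added.
        score += SPLICE_MODERATE_POINTS
    # Split votes (assumption): neither benign nor oncogenic splice evidence.
    if conserved:
        score += CONSERVED_POINTS
    return score


for conserved, votes in [(True, 0), (False, 4), (True, 4)]:
    s = intronic_score(conserved, votes)
    label = "likely oncogenic" if s > ONCOGENIC_THRESHOLD else "uncertain significance"
    print(conserved, votes, s, label)
# True 0 1 uncertain significance
# False 4 2 uncertain significance
# True 4 4 likely oncogenic
```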

So, let's look at another variant that's a little less straightforward. Here we have a loss-of-function mutation; the description of the variant tells us that it is a frameshift mutation. So it's interesting that a frameshift mutation has been classified as ‘likely benign’. Let's see why we might be reaching this classification.

What we do see is we have one piece of evidence that supports oncogenicity and that's the fact that it is loss-of-function.

We can see that we have a loss-of-function variant that is well upstream of the penultimate exon junction; specifically, it's about 6,000 base pairs away from the penultimate exon. So normally we would expect this to be a damaging variant. But if you know anything about the gene MUC4, then you will know that it's an oncogene, and what we're expecting is a variant that activates this gene in order for it to be a driver mutation. To determine whether this is the case, this is where the next piece of evidence comes in.
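
The reason a loss-of-function variant well upstream of the penultimate exon junction is normally expected to be damaging is nonsense-mediated decay (NMD): a premature stop codon lying more than roughly 50 to 55 nucleotides upstream of the last exon-exon junction typically targets the transcript for degradation. A minimal sketch of that commonly cited heuristic follows; the coordinates are hypothetical, and for a frameshift the premature stop sits somewhat downstream of the variant itself, so using the variant position is a simplification. This is not a description of VSClinical's own implementation.

```python
def predicted_to_trigger_nmd(ptc_pos: int, last_junction_pos: int,
                             threshold_nt: int = 55) -> bool:
    """Apply the commonly cited ~50-55 nt rule for nonsense-mediated decay.

    Both positions are transcript (cDNA) coordinates; `last_junction_pos` is the
    last base of the penultimate exon. A stop codon in the last exon, or within
    `threshold_nt` upstream of the final junction, is predicted to escape NMD.
    """
    return (last_junction_pos - ptc_pos) > threshold_nt


# Hypothetical coordinates: a stop codon ~6,000 nt upstream of the last junction.
print(predicted_to_trigger_nmd(ptc_pos=1_200, last_junction_pos=7_200))  # True
print(predicted_to_trigger_nmd(ptc_pos=7_180, last_junction_pos=7_200))  # False
```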

What this criterion does is look in COSMIC to see what kinds of variants are present in this gene in terms of their effect. And we can see that loss-of-function variants make up less than 1% of the variants in this gene in COSMIC. That indicates to us that loss-of-function variants are not a mechanism for activating this gene. Therefore, we don't get the additional evidence that we would normally get if we found that loss-of-function variants had an activating effect on this gene.
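
The COSMIC check amounts to asking what fraction of a gene's catalogued mutations are loss of function. In this sketch, the set of Sequence Ontology terms counted as loss of function and the 10% cutoff are assumptions, and the counts are synthetic rather than COSMIC data.

```python
from typing import Dict

# Sequence Ontology consequence terms treated as loss of function here (an assumption).
LOF_TERMS = {"frameshift_variant", "stop_gained", "splice_donor_variant",
             "splice_acceptor_variant", "start_lost"}


def lof_fraction(consequence_counts: Dict[str, int]) -> float:
    """Fraction of catalogued mutations in a gene that are loss of function."""
    total = sum(consequence_counts.values())
    lof = sum(n for term, n in consequence_counts.items() if term in LOF_TERMS)
    return lof / total if total else 0.0


def lof_supports_mechanism(consequence_counts: Dict[str, int],
                           min_fraction: float = 0.10) -> bool:
    # min_fraction is a hypothetical cutoff, not a documented VSClinical value.
    return lof_fraction(consequence_counts) >= min_fraction


# Synthetic counts, not real COSMIC data.
counts = {"missense_variant": 990, "frameshift_variant": 5, "stop_gained": 4}
print(round(lof_fraction(counts), 4), lof_supports_mechanism(counts))  # 0.009 False
```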

And beyond that, we can see that this variant also occurs at a high frequency. That's what the criterion PF5 is saying: this variant is common in at least one of our population catalogs, and that is considered strong evidence that the variant is benign. If we scroll up here, we can look at a plot of that population frequency information. We can see here in gnomAD that across all individuals the variant occurs at about 2.4% in a heterozygous state. And if we look at Finnish individuals in particular, we can see that this variant occurs at about 6% in a heterozygous state, which is much, much higher than the frequency we would expect for a truly oncogenic variant. So, despite the fact that we're looking at a loss-of-function variant here, we have good reason to believe that this variant is ‘likely benign’, and it's probably not worth interpreting.
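
Checking frequency in every subpopulation, not just the overall figure, is what flags this variant so clearly: it is more common among Finnish individuals than overall. A minimal sketch, treating the approximate figures quoted above simply as frequencies; the 1% cutoff is a hypothetical value, since the talk does not state the PF5 threshold.

```python
from typing import Dict, Tuple


def max_population_frequency(freqs: Dict[str, float]) -> Tuple[str, float]:
    """Return the population with the highest frequency and that frequency."""
    population = max(freqs, key=freqs.__getitem__)
    return population, freqs[population]


def common_in_any_population(freqs: Dict[str, float], threshold: float = 0.01) -> bool:
    # threshold is a hypothetical cutoff; checking every subpopulation avoids
    # missing a variant that is common only in one group.
    if not freqs:
        return False
    return max_population_frequency(freqs)[1] >= threshold


# Approximate figures quoted in the talk for the MUC4 frameshift.
gnomad_like = {"all": 0.024, "finnish": 0.06}
print(max_population_frequency(gnomad_like))  # ('finnish', 0.06)
print(common_in_any_population(gnomad_like))  # True
```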

All right, I want to look at one more variant in our oncogenicity scoring system. This is a missense mutation in the gene APC. You'll notice that this variant also occurs in COSMIC, much like the variant we saw in KRAS, but it occurs in far fewer samples. The KRAS variant occurred in nearly 1,000 samples; this variant occurs in only four samples in COSMIC.

As a result, its occurrence in COSMIC is only considered weak supporting evidence, which is why we only get +1 for this criterion. We also notice that we do have some in-silico evidence: it's predicted damaging by both SIFT and PolyPhen, and the region is conserved across all mammalian species.
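
The contrast between the KRAS and APC variants suggests that COSMIC evidence is graded by how many samples carry the variant. A sketch of such grading, assuming hypothetical sample-count tiers and point values (the talk only tells us that four samples earns +1):

```python
# Hypothetical (minimum sample count, points) tiers, most to least demanding.
COSMIC_TIERS = [(100, 3), (10, 2), (1, 1)]


def cosmic_points(n_samples: int) -> int:
    """Map a variant's COSMIC sample count to supporting points."""
    for min_count, points in COSMIC_TIERS:
        if n_samples >= min_count:
            return points
    return 0


print(cosmic_points(4))     # 1  (a few samples: weak support, as for the APC variant)
print(cosmic_points(1000))  # 3  (hypothetical value for nearly a thousand samples)
```

Listing the tiers from most to least demanding lets the first match return the strongest applicable weight.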

We can also see that the variant is predicted to disrupt splicing by all of our splice site algorithms. And finally, we can see that three variants within six amino acid positions of our variant of interest have been shown to be pathogenic, while none have been shown to be benign. Taking all of this evidence in combination, we reach an oncogenic classification for this variant in APC.

So, once you've gone through and determined your list of variants that are oncogenic or likely oncogenic, you can then interpret those variants, and this is where our biomarkers tab comes into play. The biomarkers tab allows you to write interpretations from scratch or to modify interpretations that were pulled in from our cancerKB.

So, on the right here, we can see an overview of the biomarker itself, showing which specific biomarker we're looking at. A little further right, we can see descriptions relating to the gene, specifically descriptions pulled from COSMIC’s “Hallmarks of Cancer”, along with information about the specific transcript that we're looking at. If we scroll down a little further, we can see our gene summary, which draws on a number of different sources, including COSMIC, CIViC, and Genetics Home Reference, to give you information about the specific gene.

Now here on the left, you can see some of the interpretation information that has been pulled in from our cancerKB. And of course, any of this information can be edited or added to. For example, we could select some of our information from COSMIC Hallmarks, copy it, and insert it into our interpretation. That will automatically update our inline references based on the presence of this PubMed ID.

And once we've edited any of this information, we can save it to our knowledge base by selecting Review and Save. Then, any time we see a mutation in the same gene, the gene-level information that we've just saved will be pulled in and shown to us.

Next we have our frequency and outcome section, which describes the impact of the gene APC specifically on colorectal cancer and how often it is mutated in this context. This includes the mutation rate for various tissue types in both COSMIC and MSK-IMPACT. Here we can see that this gene tends to be mutated very frequently in bowel cancer in both of these databases. This screen also shows you any related interpretations for the same tissue type so that you can take them into account in your interpretations.

The next section we have here is our biomarker summary, and this is showing you information related to the oncogenicity of the variant that we discussed in the previous section.

And then finally, we have our clinical evidence section. And this includes all the clinical evidence obtained from DrugBank, PMKB, and CIViC. It includes information on drug sensitivity, drug resistance, prognostic evidence, as well as diagnostic evidence. And ultimately this information is really what makes a variant reportable.

Now for our particular variant of interest, we can see we do have a few things. We don't have any information that would allow us to report this variant as Tier 1, but we do have some preclinical evidence that we could include in our report. And we make it really easy to add any of these drugs to your report: you simply click the plus button next to the name of the drug, and you can easily add any text to your interpretation. Of course, you can add any relevant references as well, and those will automatically be included in your report based on our lookup of the PubMed ID.

And so that gives you an idea of how you can go through and interpret some of these biomarkers. If you want a more thorough discussion of this, I definitely recommend looking at Gabe Rudy's webcast. That was our previous webcast on the topic, where he really deep dives into the interpretation of these biomarkers.

Once you've done this interpretation, you can generate a report. With a simple click of a button, you can generate either a Microsoft Word report or a PDF version of this report. Once you've generated your report, we can go ahead and look at it. Here we can see a list of the biomarkers that we're reporting. Any of the interpretation that we entered on the previous screen is shown here in detail, including information about drug sensitivity. So here we can see all of the information we've entered on this KRAS G12D mutation, and we can also see the information we've entered on the APC mutation we just covered, along with the drug sensitivity information that we entered. And again, any PubMed IDs that you've entered anywhere here will be imported as inline references, with links out to the specific articles that you cited on PubMed.

And so that concludes our overview of our oncogenicity scoring system based on the AMP guidelines. Now, we do have some upcoming webcasts I'd like to mention. In August, Eli Sward, Ph.D., will present a webcast on using VSClinical AMP guidelines to perform cancer testing. And then Gabe Rudy will present another webcast in September on cancer interpretation reuse and Golden Helix’s cancerKB, which we touched on a little bit today. And again, I'd like to thank the NIH: all the research that we've shown today was supported by the National Institute of General Medical Sciences of the NIH under the following award numbers. The content that I've shown today is solely the responsibility of the authors and doesn't necessarily reflect the official views of the NIH.

First question is can I include germline variants in my clinical report?

Yes, you definitely can. It's in fact very easy. If you have a variant you can actually mark that variant as being a germline variant and then you can interpret it using our ACMG guidelines workflow which will guide you through the process of interpreting the variant in accordance with the ACMG guidelines. Once that interpretation is complete you can then include the variant as a secondary germline finding in your report.

Okay, how do I filter my variants to select those to use in the workflow?

Yeah, so VarSeq has a robust filtering interface and a robust collection of annotations. You can annotate against population catalogs such as 1000 Genomes and gnomAD, against COSMIC, and against ClinVar. And of course you can filter down to any relevant variants based on any of those annotations, and we also have numerous pre-built templates to help get you up and running with that filtering process. It's also worth mentioning that for our ACMG guidelines workflow, we have an automated classification algorithm that you can run to attempt to automatically classify all of the variants within your project, allowing you to easily filter out the obviously benign variants.

Great. Thank you, Nathan. It looks like you've wrapped up all the questions, and thank you for speaking today. And thank you to everyone else who joined us. As Nathan mentioned, we have a lot of webcasts coming up, so we hope that you will join us for those: July, August, and September are full! That wraps up today's presentation. Just a reminder to fill out the exit survey that will pop up in a minute here, and I hope you all have a great rest of your day. Thanks again, Nathan!