Hello, and welcome to today's webcast presentation, clinical validation of copy number variants using the AMP guidelines. We want to thank all of you for joining us today, and also our presenter, Dr. Eli Sward. Eli, thank you so much for joining us.
Yeah. Thanks for having me.
Before I pass things over to you, Eli, I would just like to mention to all of our attendees that throughout the presentation we will be accepting questions via the Questions tab of your GoToWebinar panel. So if something pops up during the webcast, you can go ahead and enter it into the tab, and at the end Eli will be answering questions in a live Q&A. So Eli, I will go ahead and pass things over to you.
Perfect. Thank you.
I wanted to thank all of you for joining today. The subject of this webcast will be exploring the clinical validation of copy number variants using the AMP guidelines, with a focus on quick, consistent, and comprehensive variant interpretation paired with standardized clinical reporting. But before we jump into the project, I did want to give any newcomers some background on our company.
As Delaina mentioned, if you do have any questions throughout the demonstration, don't hesitate to put them in the questions tab. I'll try to answer them either during the demonstration or after, so feel free to use that option.
But first and foremost, we recently received grant funding from the NIH, which we are incredibly grateful for. The research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under these listed awards. Additionally, we are also grateful for receiving local grant funding from the state of Montana.
Our PI is Dr. Andreas Scherer, who is also the CEO of Golden Helix. The content described today, which we will be talking about, is the responsibility of the authors and does not officially represent the views of the NIH. Again, we are very thankful for grants such as this, which provide huge momentum in developing the quality software we provide. So now let's learn a little bit more about Golden Helix as a company.
Golden Helix is a global bioinformatics company that was originally founded in 1998, and it was based on work performed at GlaxoSmithKline, which was and still is a primary investor in our company. We currently have two flagship products, which are VarSeq and SNP & Variation Suite, SVS for short.
VarSeq is our tertiary analysis tool that is used in the clinical setting for variant annotation and filtering, and now users have the ability to automate these evaluations according to the AMP and ACMG guidelines, as well as render clinical reports with ease. VarSeq also has the capability to detect copy number variations, which will be the primary focus of today's webcast.
For large-scale high-throughput labs, there is the VSPipeline solution, which can automate this entire process. Paired with VarSeq is VSWarehouse, and this ultimately solves the issue of data storage for the ever-increasing genomic content that your lab may obtain, but it is also fully queryable and auditable with definability of user access for project managers or collaborators.
Lastly, we won't be talking about it today, but we have our research application platform, which is SVS. This enables complex analysis and visualizations on genomic and phenotypic data, including the ability to perform genome-wide association studies, genomic prediction, RNA sequencing analysis as well as process CNVs.
We're also cited in thousands of peer-reviewed publications, including high-impact journals such as Science, Nature, and Nature Genetics. We have over 400 customers globally, including top-tier institutions such as Stanford and Yale, governmental organizations, clinical settings such as SickKids, and genetic testing labs. There's quite a consortium of customers that we have; and we have over 20,000 installs of our products and thousands of unique users.
Why is this relevant to you? Over the course of 20 years our products have received a lot of user feedback, so our software is extremely well-vetted.
We also stay relevant in the community by regularly attending conferences and providing useful product information via eBooks, tutorials, and blog posts, which I would highly recommend checking out. Additionally, Golden Helix provides innovative software solutions such as our CNV caller, as well as others which have received active grant funding, which was mentioned earlier. Most importantly, your access to the software is based on an annual subscription model. You're not charged on a per-sample basis, and you have full access to myself and other field application scientists to help get you up to speed quickly with your analysis.
So now let's talk a little bit more about our product stack. Golden Helix partners with Sentieon to provide the capability to start with an initial FASTQ file and go all the way down to a clinical report. Sentieon provides the alignment and variant calling steps to produce the VCF and BAM files, which serve as a basis for CNV detection and import for your tertiary analysis in VarSeq. If you are performing NGS-based CNV analysis, Golden Helix is the market leader. We're supported by studies such as one from the Robarts Research Institute, which showed 100% concordance with MLPA in CNV event detection.
Additionally, the imported variants in your VarSeq project can be run through the VSClinical automated ACMG and AMP guidelines. After completing the secondary and tertiary processing, all analysis can be rendered into a clinical report which can be stored in our VSWarehouse solution. Ultimately, this provides researchers and clinicians with access to this information to view those previous findings. But since the focus of this demonstration is on CNVs, let's look a little bit more into this innovative feature.
To best explain the value of VS-CNV detection, we can compare it against the traditional methods, which include MLPA and CMA, both of which require outsourcing. MLPA is ideally suited for detecting smaller events centered around 150 base pairs and is tailored to a single gene or a few genes, but it's also expensive: it can cost around $80 per gene. One additional con of MLPA is its inability to detect larger events, which chromosomal microarrays can handle. CMA typically detects events from 10 kilobase pairs up to large aneuploidy-level events, but, unlike MLPA, it fails to detect smaller CNV events. On the other hand, CNV detection with your NGS data in VarSeq accurately detects events ranging from single-exon to aneuploidy events and gives users full capability to process everything from small gene panels up to whole genome data sets.
Furthermore, this all occurs in one software suite and it's not outsourced, which can save you a fortune on your assays and reduce your analysis time. So the next question is: how are the CNVs detected using your NGS data? Well, NGS-based detection of CNVs starts with the coverage data stored in the BAM file.
Now inherently, the coverage profile can differ between samples and targets due to systematic biases, which you can see on the image on the right-hand side. You can see that the coverage can differ between samples, but our development team created an algorithm that provides a solution to these associated challenges. So, the solution is actually to normalize the coverage data and to use a set of user-defined references to represent a normal diploid region. For this approach to work effectively, we do recommend having a reference set composed of no less than 30 samples that come from the same library platform and preparation methods.
Furthermore, for targeted gene panels and whole exomes, we do recommend an average coverage of 100x; obviously, if you were looking at whole genome data, the coverage would be shallower. Following the correct setup, the normalized coverage for your sample of interest is then compared to the normalized reference set, which is how the CNV events are computed.
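To make the normalization idea concrete, here is a minimal Python sketch. It is not the VS-CNV algorithm itself, whose details are not covered in this talk; it assumes the simplest possible scheme, in which each sample's per-target depths are divided by that sample's own mean depth and the reference profile is the per-target average over the reference samples. The function names and the depth values are hypothetical.

```python
from statistics import mean


def normalize_coverage(target_depths):
    """Scale one sample's per-target mean depths so they average to 1.0."""
    sample_mean = mean(target_depths)
    if sample_mean == 0:
        raise ValueError("sample has no coverage")
    return [depth / sample_mean for depth in target_depths]


def reference_profile(reference_samples):
    """Average the normalized depth of each target across the reference set."""
    normalized = [normalize_coverage(sample) for sample in reference_samples]
    # zip(*normalized) walks the samples target by target
    return [mean(per_target) for per_target in zip(*normalized)]


# Hypothetical mean depths for three targets in three reference samples.
# The samples were sequenced to different depths but share the same shape.
references = [
    [100.0, 200.0, 300.0],
    [50.0, 100.0, 150.0],
    [200.0, 400.0, 600.0],
]
profile = reference_profile(references)
print(profile)
```

After normalization the three reference samples look identical even though their raw depths differ by a factor of four, which is the point of normalizing before comparing a sample of interest to the reference set.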
This algorithm then validates those events using metrics such as the ratio and z-score. The ratio is simply a reflection of the change in coverage between your sample and the reference set. A decrease in the ratio to a value of around 0.5 is indicative of a heterozygous deletion, which you can see, and a ratio of around 1.5 is indicative of a possible duplication.
We have an image here in which we have displayed the ratio. A normal diploid region would be centered around 1, and we have this increase in coverage with a ratio of around 1.5, indicating that we are looking at a duplication event over these BRCA2 exons. Additionally, the z-score is the number of standard deviations by which the sample's normalized coverage differs from the reference set. You can see that in this image here, which shows z-scores ranging from around 6 to around 3.5 standard deviations, which paired together with the ratio is strong evidence for a duplication event. Now, in addition to these metrics, there's also an introspective capability of adding certainty to the detected event by generating a p-value, and the CNV caller also offers an extensive database to annotate your CNVs.
In addition to the p-value, these annotations include the ability to detect known events in healthy populations using sources such as DGV, ExAC, and 1kG, to detect CNVs in repeat regions, and even to determine the classification of the detected CNV events. So it really applies to the tertiary analysis for CNV annotations. But ultimately this is all performed in one suite, and it can prevent the need to outsource to other limited and costly traditional methods.
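As a rough illustration of these two metrics, the Python sketch below computes a ratio and a z-score for one target from normalized coverage values. It is a simplified reading of the description above, not the caller's actual implementation: the function names, the example values, and the 0.2 tolerance around the ratios of 0.5, 1.0, and 1.5 are all assumptions made for the example.

```python
from statistics import mean, stdev


def ratio_and_zscore(sample_value, reference_values):
    """Compare one target's normalized coverage to the reference set."""
    ref_mean = mean(reference_values)
    ref_sd = stdev(reference_values)  # sample standard deviation
    ratio = sample_value / ref_mean
    z_score = (sample_value - ref_mean) / ref_sd
    return ratio, z_score


def suggest_state(ratio, tolerance=0.2):
    """Map a ratio to a coarse label; the tolerance is an illustrative choice."""
    if abs(ratio - 0.5) <= tolerance:
        return "possible heterozygous deletion"
    if abs(ratio - 1.5) <= tolerance:
        return "possible duplication"
    if abs(ratio - 1.0) <= tolerance:
        return "diploid"
    return "unclassified"


# Hypothetical normalized coverage of one exon in five reference samples
reference_values = [0.9, 1.0, 1.1, 1.0, 1.0]
ratio, z_score = ratio_and_zscore(1.5, reference_values)
print(round(ratio, 2), round(z_score, 2), suggest_state(ratio))
```

A ratio near 1.5 on its own is only suggestive; it is the combination with a z-score several standard deviations away from the reference set that makes the duplication call convincing.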
So now that you know a little bit more about our caller, I wanted to point out that calling CNVs from your NGS data is actually very well validated. Our CNV software has been adopted by hundreds of users and cited in over 15 publications, including well-regarded journals focusing on a variety of different topics. Most importantly, a paper by Iacocca et al. from the Robarts Research Institute compared our software to the traditional methods.
They ultimately found 100% concordance in CNV event detection and highlighted that our software is an accurate and cost-effective tool in the CNV workspace. If you're interested in these papers, feel free to reach out to us and we can always provide you with this information. So now that we have discussed the CNV caller, let's briefly talk about the VSClinical AMP feature, which will be the primary focus of today. VSClinical is a solution to automate the evaluation of both germline and somatic variants, including single nucleotide variants, indels, gene fusions, and CNVs, according to the ACMG and AMP guidelines. Automation of these guidelines is important because it promotes consistency and standardization in variant and biomarker evaluations, which can ultimately reduce workflow fatigue and improve accuracy.
VSClinical will also provide users with the most up-to-date variant knowledge, evidence, and annotations, which is a solution to manually managing the evolving knowledge and databases for any given variant. A more subtle value, but just as critical, is the educational interface VSClinical provides, which familiarizes new users with the AMP and ACMG guidelines. So this can be extremely valuable when teaching a new employee in your workspace or potentially a group of students in your classroom. Lastly, clinicians can also benefit from the fact that we integrate Word-based reporting capabilities directly into the software, which allows for unique and lab-specific report customizations and prevents the need for outsourcing the reporting pipeline. So now that we have discussed some of the primary values of VSClinical, let's dive a little deeper into the AMP workspace.
In the AMP workspace, the ultimate goal is to have a reportable biomarker, which is ultimately a variant that affects clinical care by having drug sensitivity, drug resistance, prognostic or diagnostic outcomes.
So biomarkers are largely interpreted using the clinical evidence available, and that interpretation is now standardized using the AMP guidelines, which suggest clustering the available clinical evidence into the four tier levels that most of you are likely familiar with. Variants that reach Tier 1 are those that have strong clinical significance, or in other words known FDA-approved therapies or well-powered clinical studies from experts in the field. It ranges from Tier 1 all the way down to Tier 4 for variants that are either likely benign or benign due to high allele frequencies in population databases, or that have no published evidence of association with cancer.
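For readers who want the four tiers in one place, here is a small Python lookup. The Tier 1, Tier 3, and Tier 4 wording follows the description in this talk; the Tier 2 label and the split of evidence levels (A and B under Tier 1, C and D under Tier 2) come from the published AMP/ASCO/CAP guideline rather than from this webcast. The names `AMP_TIERS`, `TIER_LEVELS`, and `describe_tier` are hypothetical.

```python
# Tier descriptions (Tier 2 wording and evidence levels per the published guideline)
AMP_TIERS = {
    1: "Strong clinical significance",
    2: "Potential clinical significance",
    3: "Unknown clinical significance",
    4: "Benign or likely benign",
}

# Evidence levels only apply to the first two tiers
TIER_LEVELS = {1: {"A", "B"}, 2: {"C", "D"}}


def describe_tier(tier, level=None):
    """Return a readable label such as 'Tier 1 Level A: ...'."""
    if tier not in AMP_TIERS:
        raise ValueError(f"unknown tier: {tier}")
    label = f"Tier {tier}"
    if level is not None:
        if level not in TIER_LEVELS.get(tier, set()):
            raise ValueError(f"level {level} is not valid for tier {tier}")
        label += f" Level {level}"
    return f"{label}: {AMP_TIERS[tier]}"


print(describe_tier(1, "A"))
print(describe_tier(3))
```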
However, the reality is that capturing all relevant evidence to determine the tier level is a large undertaking. For example, how does one efficiently process the rapidly growing knowledge on Tier 3 variants, or variants of uncertain significance, in a consistent, routine fashion while the variant evidence keeps growing and evolving? This question is actually the premise for the automation of the AMP guidelines with VSClinical, which does so by leveraging information from the most up-to-date annotation sources. With the publication of an increasing number of large-scale genomic sequencing projects for a variety of tumor types, a wealth of genomic information is being generated and consolidated into many public databases globally. Here are some of the evolving cancer databases that are automatically incorporated and updated in the VSClinical AMP guidelines. These databases contain information about variant frequencies, known somatic mutations, and functional predictions, as well as treatment and clinical trial information from sources such as DrugBank, PMKB, and CIViC.
Some popular cancer databases include COSMIC and CIViC, which you can see here. These ultimately contain all the known information about each variant, tumor location, and histology, along with links to original publications and case reports. We also support clinical drug and prediction annotations such as DrugBank and PMKB, as well as clinical trial information, which can provide insight into FDA-approved drugs for a given gene or biomarker. Furthermore, not seen on this list, but incredibly useful to users, is our Golden Helix supported CancerKB catalog.
Built into the VSClinical AMP feature is the Golden Helix CancerKB catalog, which is an interpretation knowledge base that covers many common genes and biomarkers. It's developed by a team of expert curators in a clinical context that writes interpretations for the most common genes, most common biomarkers, and the most common cancers. What this means is that if you add a variant to your report and have not interpreted it before, you may be provided with a jump-start on your interpretation with information on the gene level and sometimes biomarker and even targeted therapy options. Additionally, this is an evolving database, so if you would like, you can contribute your own interpretations back to us anonymously, and our professional team will integrate all interpretations on a regular basis back into the knowledge base.
Following the variant interpretation and classification is the ability to easily and quickly create standardized clinical reports. All the relevant annotation and catalog evidence for a variant is automatically populated into the report, and report customizations can easily be done through a Microsoft Word interface. As mentioned in the previous slide, the time to reporting is reduced even more when leveraging the CancerKB catalog, which has those predefined clinical assessments that we'll show today. So now that we have the background, let's break down what we will be investigating in today's demo.
Our project is based on a whole exome sequencing sample, which has a reference set of around 50 samples. In this data set the coverage statistics and CNV events are already precomputed, but we will outline how the steps were performed. Furthermore, the basis of this analysis focuses on a female who has been diagnosed with breast carcinoma. With this knowledge, we will use our PhoRank algorithm to identify the genes most highly associated with this disease. We will then navigate through the AMP interface, starting with demonstrating how CancerKB can allow one to render a clinical report quickly by automating the biomarker and therapy information; and then we'll backtrack and show the steps performed to get to that interpretation.
So as I mentioned, if you have any questions at any point in this demonstration, feel free to enter them into the Questions tab, and I'll be sure to try to answer those as we go or maybe even at the end of the demonstration. With that said, let's focus on our VarSeq project.
As I mentioned, here we have a project that was already precomputed. But for those of you that are new to our software, I just wanted to outline some basic functionality. During the presentation we mentioned that VarSeq is implemented for your tertiary analysis. What we have in this project is we've imported our VCF files, which contain our variants, as well as our BAM files, which will be used to compute our CNV events. This has the basic orientation in the upper right-hand corner.
You can see we have the number of samples that were imported as well as the number of variants, which are listed here on a per-row basis. Previous webcasts have gone into great detail in explaining how we can get down to variants of clinical significance using our filter chain, which you can see is already predefined, but the focus of this demonstration is actually going to be on our CNVs. The premise of the CNVs is utilizing the existing coverage data stored in your BAM file. In order to compare the differences in coverage, you have to build a reference set. The reference set is built by going to the Tools icon in the upper left-hand corner and going to manage reference samples. In this dialog, we have all of the samples that I've essentially used in the past as my reference set.
It's really useful because you can not only store your references from targeted gene panels, but you can also store your references for whole exomes or whole genomes all in the same interface.
So, when you go to load a new project and analyze those CNVs, it'll pull from those references that are directly tied to your sample of interest. What you can see here is we have a list of references. Obviously, if you were to start a project from scratch you would have a blank slate. In order to add your references, all you have to do is go down to this icon and select Add References. You can then define your sample BAMs. This is asking you to navigate to the directory that has the BAM files for your reference set, and if we choose to, we can go to our Add Files. I can go to my BAM files and upload just a couple as an example. As mentioned, we do recommend having a reference set composed of around 30 samples. You can continue to add new references as you go. So, what we've done now is we've added our BAM files. If we click on Next, this next interface is basically taking you through the option of defining your workflow.
For somatic mutations or germline mutations where you may be utilizing a targeted gene panel or looking at whole exome data, you would use this Targeted References option. What it's doing is asking you to select an interval source that defines the target coverage regions. Typically, when you are looking at a targeted gene panel or whole exome data, you are provided with a BED file, and when you have that BED file, you can then choose to select that track, and it brings you to the options for looking for your BED file in your local folder. Alternatively, you do have this Convert wizard icon, which lets you upload that BED file relatively easily.
As mentioned, support is always included, so if you have any questions or would like us to convert it for you, we can do that as well. Once you have it converted and loaded into your local folder, all you have to do is navigate to that particular interval track, click Select, and it's now recognized.
What it's going to do is compute coverage statistics over all of the BAM files that you've imported and over the specific intervals defined by your BED file. Once you click Create, you will then be presented with a dialog very similar to this, where you have your sample name. You can see here we have approximately 54 exome samples. It has some unique nomenclature, including the panel that was used and even a unique panel hash, which allows the software to identify, for your sample of interest, what panel should be used for your reference set.
Once you've computed your coverage statistics for your reference set, which you can see is already done, the next thing that we need to do is compute the coverage statistics for our sample of interest. As I mentioned, these coverage statistics were already precomputed. But if you were to do this on your own, it's relatively easy. You just have to go to this Add icon and go to the Secondary Tables, and now you'll see a couple of different options, which include calling the coverage regions, performing the LoH, and then that CNV annotation. What we've done is we've already run the coverage regions, which we have displayed here. What you can see is that for this particular sample, we have our intervals, which again are defined by our BED file, and then we have some basic summary statistics, including the min and max, as well as the average coverage and the coverage at 1, 20, 100, and 500x. So, as recommended, if you are looking at whole exome or targeted gene panels, you want to have an average coverage centered around 100x; it's kind of our gold standard, which we validated the software with. Once you've computed your coverage statistics, the next thing is to go back to your secondary tables and call LoH. LoH is based on the variant allele frequency being pulled in from your VCF file. So, if you're looking at targeted gene panels, it's likely that you're not going to have a large number of variants imported into your project, and thus running the LoH algorithm isn't necessarily valid.
Typically, we recommend running the LoH algorithm for whole exome and whole genome data. What that looks like is we have our LoH output, where again you can see we have our intervals, and then we have some basic information indicating if we're looking at an LoH event or not. Ultimately what this does is rule out noisy regions from the normalization process, so it allows the software to accurately detect CNVs from your sample of interest relative to your reference set.
So we have already called LoH, and the next thing to do is to call our CNVs, which again is just going to Add Secondary Tables and then going to our CNVs, which we've already precomputed in our project. What we are displaying here is our CNV table and our predefined filter chain, which is something that you could ultimately save in a project template and run for future samples.
What it's showing us is we have our CNVs, which are identified from our existing coverage data stored in our BAM file, as well as some basic information including CNV information, the number of intervals, the CNV event coverage, as well as the size. So as mentioned in the presentation, our software can not only detect single-exon events, but also large aneuploidy events, and then we also have some basic information regarding our sample. What we want to do is ultimately get down to the CNVs that are clinically relevant and then evaluate them according to the AMP guidelines. The way that we do that is by generating a filter chain, which is already created. The filter cards are all implemented from our CNV table.
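The per-interval summary statistics described above can be sketched in a few lines of Python. This is only an illustration of the kind of numbers shown in the coverage regions table, under the assumption that the 1x, 20x, 100x, and 500x columns report the percentage of bases in the interval covered at or above each depth; the function name and the depth values are made up for the example.

```python
def interval_coverage_stats(depths, thresholds=(1, 20, 100, 500)):
    """Summarize per-base read depths for one BED interval."""
    if not depths:
        raise ValueError("interval has no bases")
    stats = {
        "min": min(depths),
        "max": max(depths),
        "mean": sum(depths) / len(depths),
    }
    for threshold in thresholds:
        # Percentage of bases covered at or above this depth
        covered = sum(1 for depth in depths if depth >= threshold)
        stats[f"pct_{threshold}x"] = 100.0 * covered / len(depths)
    return stats


# Hypothetical per-base depths for a very short interval
depths = [0, 15, 25, 120, 130, 110, 95, 600]
stats = interval_coverage_stats(depths)
print(stats)
```

In practice the depths would come from the BAM file over each interval in the BED file, and a mean near 100x is the range the talk recommends for targeted panels and exomes.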
So for example, we can see we have here the CNV State, and if we open up this dialog, we're selecting for those that are deletions, duplications, or heterozygous deletions. A filter card can be pulled from any field that's present in our CNV table just by right-clicking on the column header and adding it to the filter chain. You'll now see it's present, and then you can select those values. Similarly to how we've selected those for the CNV State, we've identified those CNVs that aren't associated with any quality flags. Ultimately, the question mark is indicating we're looking at high-quality CNV events. And then again, we do have that introspective capability of defining the confidence of your CNV using the p-value. The p-value is a threshold that you can implement. Right now, it's set to a pretty strict standard, which is 0.001, and you can see that just by applying that we're getting down to the CNV events that are of high quality. Now, in addition to integrating filter cards coming in from our CNV caller, we also have the ability to annotate the CNV events using public annotation sources, which is one of the first areas that goes beyond just having the innovative ability to call CNVs from your NGS data. Some common ones that we have here are the 1kG Phase 3, where essentially we're removing those common CNVs that are present in the 1000 Genomes Project. We're also looking at the ClinVar classification, and we're ruling out those that are benign or likely benign. Now the question is, in addition to these annotations, do we have other annotations that we could use for our CNV analysis? We do have an extensive list of public annotations, which can be accessed using this icon, and which here you can see under our Public Annotations folder.
We list all of those annotations that could be integrated into your project, as well as a subset that's specific to CNVs and large variants. So here we're getting that 1kG Phase 3, ClinGen, as well as ClinVar. Once you select an annotation source, it's then downloaded into your project, and you can see those annotations on the right-hand side. For these individual CNV events, we can see that we have our annotation being pulled from RefSeq genes. It's useful because it can show you the names of the genes the event is present in, as well as the exons that it's overlapping. Furthermore, we also have those annotations including the presence in 1kG Phase 3, as well as the summary of those CNV events in ClinVar.
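The logic of this filter chain can be written down as a simple predicate. The Python sketch below is only a way of reading the filter cards described above; it is not how VarSeq stores or evaluates filters. The record fields (`state`, `flags`, `p_value`, `in_1kg`, `clinvar`), the state labels, and the example records are all hypothetical.

```python
WANTED_STATES = {"Deletion", "Het Deletion", "Duplication"}
EXCLUDED_CLINVAR = {"Benign", "Likely Benign"}


def passes_filter_chain(cnv, max_p_value=0.001):
    """Apply each filter card in turn; a CNV must pass all of them."""
    return (
        cnv["state"] in WANTED_STATES                # keep called events only
        and not cnv["flags"]                         # no quality flags raised
        and cnv["p_value"] <= max_p_value            # strict confidence cutoff
        and not cnv["in_1kg"]                        # drop common 1kG Phase 3 events
        and cnv["clinvar"] not in EXCLUDED_CLINVAR   # drop benign classifications
    )


# Hypothetical CNV records
cnvs = [
    {"id": "cnv1", "state": "Het Deletion", "flags": [], "p_value": 1e-6,
     "in_1kg": False, "clinvar": "Pathogenic"},
    {"id": "cnv2", "state": "Duplication", "flags": ["Low Z-score"],
     "p_value": 1e-5, "in_1kg": False, "clinvar": None},
    {"id": "cnv3", "state": "Deletion", "flags": [], "p_value": 0.02,
     "in_1kg": False, "clinvar": None},
    {"id": "cnv4", "state": "Duplication", "flags": [], "p_value": 1e-4,
     "in_1kg": True, "clinvar": "Benign"},
]
kept = [cnv["id"] for cnv in cnvs if passes_filter_chain(cnv)]
print(kept)
```

Writing the chain as one predicate makes it easy to see that every card narrows the list, which mirrors how the counts shrink from card to card in the project.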
Now as I mentioned, this particular individual was diagnosed with breast cancer. A lot of labs might have a known phenotype, and if you ever want to integrate that phenotype into your analysis, all you have to do is go to the Add icon and go to Computed Data, where we have an algorithm under this project cohort for our CNVs, which is the PhoRank algorithm.
The PhoRank algorithm, which you can see here, essentially ranks your genes by their association with your particular phenotype. The higher the gene rank, the more associated that particular gene is with your phenotype. What we've done is we ran that for breast cancer. For example, if we scroll to the right here, we can see we ran this breast cancer PhoRank algorithm, which provided us with a gene rank score that we've added to our filter chain. And so with this filter chain, we're setting a pretty high threshold of 0.99. So we only want to look at those genes that are most highly associated with our particular phenotype, which is this breast carcinoma. And so we're getting down to a handful of events, around nine CNVs that we can choose to investigate.
The focus of today is actually going to be on a BRCA2 CNV event. But before I jump into the AMP guidelines, I also wanted to point out that we have an exceptional visualization capability with GenomeBrowse. GenomeBrowse is accessed within the software, and if we were to select a CNV event here, we can see that information being pulled up in GenomeBrowse. So again, what we have plotted within this dialog is the ratio and the z-score, which are showing you those changes in the coverage profile relative to the reference set. So for this heterozygous deletion, we have a ratio of around 0.2 to 0.6, indicating we're getting this het deletion, with a z-score of around five to three standard deviations.
So this is kind of another way we can visualize. We can also plot your BAM file and look at that coverage and pileup information as well. Well, this is just showing you some basic visualizations. So ultimately what we've done is we've now identified potentially a heterozygous deletion that's causing a loss of function in our BRCA2 gene and we want to investigate this in the AMP guidelines.
The way that we can do that is if we open up the AMP guidelines interface, what we have is some basic information regarding a couple different tabs including the patient, mutation profile, variants, biomarkers and report. Now in this patient tab, you can enter in some basic information regarding your sample as well as your patient information, but more importantly you can select the current diagnosis or tumor type.
We do provide you with the available tissues as well as even the subsets. We also have some common ones here on the right-hand side. So what we're doing is we're selecting this invasive breast carcinoma defined by the tissue, defined by the breast tissue. What this will do is it'll really perpetuate the ability to provide adequate interpretations for this particular mutation, but then in addition to selecting this tumor type, we also have some basic sequencing summary statistics, which according to the AMP guidelines is relevant and important to include in your report. So this is just kind of showing us our distribution.
We have 20,000 plus variants showing us the distribution of heterozygous, homozygous variants as well as that variant type, it’s showing us the average coverage for our genes which is more valid information that we can incorporate into report. Now in addition to defining the tissue type as well as some basic information you also have the ability to enter in your variants into the AMP guidelines and so within each of these you can see here, you can insert a single nucleotide variants using the small variants. We have CNVs which we can see we have our BRCA2 mutation, but we also have the ability to evaluate gene fusions as well as considerations for wild-type genes. You not only have the ability to add your CNVs your SNVs or gene fusion wild-types from your projects, but you can also manually add them and investigate them according to the AMP guidelines.
And so what we've done here is we have our BRCA2 mutation, it’s showing us that we have an exon loss of between 13 and 14 and showing us the z-score and ratio that were computed with our software. Then we have the ability, we can see that it’s being reported as a biomarker which can be included in the report. Now when you add these variants into the AMP guidelines you'll also get a notification if it's potentially present in the Golden Helix CancerKB catalog, what that classification was and so it’s showing us that we have this BRCA2 deletion which is being classified as this Tier 1 Level A where we have drug sensitivity information, as well as gene summary biomarker in outcomes and frequencies. Before we kind of just deep dive into how we're getting to this Tier 1 classification, I really wanted to show you how valuable this CancerKB catalog is and how easy it is to render this information into a clinical report. And so the way that we can do that is if we go to our report tab, you'll see we have information being pulled in from you know, our results summary so we can choose if we wanted to we could add modify this information where we have multiple genomic alterations detecting including biomarkers from FDA approved drugs, as well as some summary on patient information, but we also have the biomarker results. This is the evidence that is necessary in a clinical setting for a report which allows the physician and the patient to interpret a report or the clinical evidence correctly. So we have classification evidence. We have the clinical significance as well as the outcomes and frequencies and biomarkers summaries. So ultimately what this is going to show us is the, you know, we have some drug sensitivity, which is it's recommending a particular drug. You can modify this information in this dialogue.
You'll also see that it's providing you with those inline references, and we'll show how we got to those in a little bit. But more important is the report capability. As I mentioned during the presentation, our reports are entirely based on Word, and so if you wanted to, you can choose to open up the Word template and make modifications. Then when you're ready, all you have to do is click on the Render icon. It'll upload your information into your Word report, and then you can open up that dialogue. When you open up that dialogue, you'll now see we have our reporting capability. The report is an essential part of any laboratory test and should contain all of the information required for the ordering physician and the patient to know what exactly was tested, what results were obtained from the test, and any additional analytic factors that may influence the clinical interpretation of the results. Our template really provides a basis for that information. Here you can see you have your patient information. More importantly, we have our particular mutation with our evidence level. So it's a Tier 1 Level A, and we have that data directly imported in an easy-to-view format.
We have our BRCA2, it's an exon loss, and it's associated with Tier 1 Level A drug sensitivity for Olaparib. We also provide you with the automated interpretations, including the clinical significance, outcomes and frequencies, and biomarker summary, as well as that drug sensitivity. In addition, you'll also notice that we have these PubMed IDs, which are listed at the bottom of the report.
We list some NGS coverage report information as well as methods and limitations for that report, but then we also have those inline citations. So this is all based on Word and includes all the information that is required by the AMP guidelines for reporting somatic mutations in cancer. In addition, if you don't have a Word-based document, you can also choose to use our Cloud converter, which will convert your document into a PDF that can then also be stored in your particular project.
So this is just kind of showing you the report capabilities, with all of this rich information which is provided and aggregated by the Golden Helix CancerKB catalog. But now let's take a step back and look at how this information was acquired. All of this information is really present in the Biomarkers tab. In our Biomarkers tab, we can see we have a basic BRCA2 summary: it's associated with DNA repair.
We also have some gene symbols and previous names, and we also display the hallmarks according to COSMIC. If your variant influences a different transcript, we can see what transcript is being selected, as well as whether there were considerations for gene fusions and pathways. But more importantly, if we scroll down, we're now getting into this information where we have our BRCA2 gene summary. So this is just providing you with information regarding your particular variant.
If you wanted to look at that and how we're getting to this interpretation, which is again something that's provided by the Golden Helix CancerKB catalog, we have that. We have this information already pre-built, which can really streamline your analysis. But if you were to start from scratch, we also provide you with the most relevant annotation and interpretation sources, including information being pulled in from COSMIC, CIViC, Genetics Home Reference, CGD, and NCBI. Now another useful thing: it's really important for labs internally to keep track of any modifications that were made to a particular biomarker interpretation. And if that is the case, it's very straightforward. So let's say we wanted to add, for example, just a simple sentence into our interpretation for our BRCA2 gene summary.
What you'll notice is the ability to choose to review and save now, and what it'll show you is your previous interpretation and then the interpretation that was just added. So this is a really helpful way to keep track of all of the modifications that were made, especially if you have multiple people evaluating the same variant. Another thing that's really valuable is sharing this anonymous interpretation with the Golden Helix curation team. You do have the option to uncheck this, but if you keep it checked, what it does is create a repository that's sent to our curation team, our panel of expert professionals.
What they will do is consolidate that information and incorporate it back into the CancerKB catalog. Then if a new user were to see that same variant or biomarker, they would be provided with that interpretation according to the CancerKB catalog. So that's just a very useful feature for keeping those biomarker interpretations in one centralized repository.
Additionally, you can also see who it was modified by. So I modified it. You could choose to discard those changes, and it'll remove them from your interpretation.
But getting back to this gene summary, we're looking at BRCA2. It's a tumor suppressor gene and it participates in the DNA damage response. If we were to scroll down, in addition to that gene summary, we also have the alteration frequency and outcome.
So this interpretation is specific to invasive breast carcinoma, but if you wanted to, you could also change it to look at all cancers. With this alteration frequency and outcomes, we also display the mutation frequency among different tissue types. For example, this BRCA2 alteration largely occurs in breast, but it can also occur in bowel, lung, and a variety of other tissues according to COSMIC, and if we wanted to look at MSK, we could see that information present as well.
Now another useful feature is that we have the related interpretations for other tumor types. So this ultimately can provide you with a good starting point on what the relevance is and what the alteration frequency and outcomes may be in a different tumor type, which could really jumpstart your evaluation. In addition, we have this information being provided by the Golden Helix CancerKB catalog, and right now it's showing us information for colorectal adenocarcinoma. For the alteration frequency and outcome, it's showing us that it occurs in 2 to 9 percent of breast cancers, according to COSMIC and OncoKB.
There's evidence for germline and somatic alterations: approximately 7% of breast cancers are estimated to harbor a germline BRCA1 or BRCA2 mutation. So this is obviously evidence that is valuable to include in our clinical report, which we're doing. Then the last couple are the biomarker summaries. We're looking at this interpretation scope for a BRCA2 deletion, and really this is valuable because it's indicating that we're looking at a loss-of-function mutation for our BRCA2 gene. So it's essentially truncating our particular protein. When we look at this BRCA2 deletion and this loss of function for all cancers, again we're provided with this expert interpretation for the biomarker. It's showing us that BRCA2 is critical in DNA repair, most importantly for homologous recombination, and that its loss of function is what leads to the biomarker interpretation.
Another valuable feature is that when there are PubMed ID references, as you can see here, and they're integrated into your interpretation, it will automatically pick up those inline citations. If you ever wanted to view that information, all you have to do is click on that particular hyperlink and it'll bring you to not only the title, the PubMed ID, and the author, but also the abstract. Then you can choose to look at those individual publications and get a good understanding of how they're concluding that the BRCA2 deletion causes a loss of function, and the result of it being a biomarker.
So that's just another useful capability, but more important is the clinical evidence. This ultimately is where we're getting to our tier classifications. Again, this is based on three different annotation sources, which include DrugBank, PMKB, and CIViC, and we can see the clinical evidence is organized into drug sensitivity, drug resistance, prognostic, and diagnostic categories.
And so for our particular case, if we were to look at BRCA, so breast cancer adenocarcinoma, for the tissue type, and look at maybe a matching region or even a matching gene, and look at all of the drug sensitivity information, we can see here that we have 29 results. We actually have one that's a direct hit, and this is Olaparib, characterized for a BRCA2 mutation in cancer, and we can see that it has a Tier 1 Level A validation with a four-star review status.
Now the useful thing is that in addition to providing you with this hierarchical information, we also display all of the data regarding this particular choice. For example, if we were to look at this related interpretation for invasive breast carcinoma for loss of function, we can see that it's associated with Tier 1 Level A for Olaparib. We then provide you with all of the relevant information regarding the clinical evidence surrounding this therapeutic option, and it's been integrated into the reporting capability. So here you can see we have Olaparib, and it's associated with Tier 1 Level A for invasive breast carcinoma. Then the interpretation scope is that BRCA2 deletion, along with our inline citations. So relatively quickly, it's easy to see how this information, especially with Golden Helix CancerKB, can really streamline your analysis for some of those common mutations in cancer. Additionally, we also provide you with all of the relevant information and all of the up-to-date annotation sources to standardize this reporting capability.
And so just as a recap, if we go back to our reports, you then have the ability to sign off and finalize, which will finalize your sample, and you can also modify this information. But if we open up our report tab, click on the Render icon, and open up that Word report, we now have again all of the relevant information that's required for that reporting capability. So ultimately the goal of these guidelines is to establish standardized classification, annotation, interpretation, and reporting of sequencing variants for both germline and somatic mutations. I really hope that this demonstration was able to convey how VSClinical aids in that standardization and can improve your NGS pipeline, but most of all, I hope you enjoyed this demonstration.
And so with that said, let's jump back to the Powerpoint for a couple more slides to go over here.
So as I mentioned, if you have any questions, please don't hesitate to enter them into the GoToWebinar panel. But again, I want to acknowledge our grant funding from the NIH, which we are very grateful for. The research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under these listed awards. We're also very grateful for the local grant funding from the state of Montana. The PI is Dr. Andreas Scherer, who is also the CEO of Golden Helix, and the content described today is solely the responsibility of the authors and does not officially represent the views of the NIH.
Questions & Answers
Can we import external CNVs into the AMP workflow?
External CNVs called with chromosomal microarray or MLPA can be imported into the CNV caller. This function is located in the secondary tables of the “add” icon, from which you can select “import your CNVs from file”. This will provide you with a new CNV table, in which you can then annotate your events and evaluate them according to the AMP guidelines.
How do we assess the quality of the CNVs or determine if it’s real?
If you wanted to assess the validity of the CNV calls, you can visualize information such as the ratio and the z-score for your CNV event. The thresholds are defined within the algorithm and have been created to accurately detect CNVs relative to CMA and MLPA.
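As a rough illustration of what a ratio and z-score can tell you about a CNV call, here is a minimal sketch. It assumes the conventional definitions (sample coverage divided by the reference mean, and the number of reference standard deviations the sample sits from that mean). The webcast does not spell out how the software computes these values, and the function name and inputs are hypothetical.

```python
from statistics import mean, stdev

def ratio_and_zscore(sample_cov, reference_covs):
    """Compare one target's normalized coverage against a reference set.

    Assumed definitions (not taken from the software):
      ratio   = sample coverage / mean reference coverage
      z-score = (sample coverage - mean reference coverage) / reference std dev
    """
    if len(reference_covs) < 2:
        raise ValueError("need at least two reference samples")
    ref_mean = mean(reference_covs)
    ref_sd = stdev(reference_covs)  # sample standard deviation
    if ref_mean == 0 or ref_sd == 0:
        raise ValueError("reference coverage has zero mean or zero spread")
    return sample_cov / ref_mean, (sample_cov - ref_mean) / ref_sd

# Hypothetical target: references cluster near 100x, the sample has half that.
ratio, z = ratio_and_zscore(50, [90, 100, 110])
print(ratio, z)
```

Under these assumed definitions, a ratio near 0.5 together with a strongly negative z-score is the pattern you would expect for a heterozygous deletion, while a ratio near 1 and a z-score near 0 suggests no copy number change.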
Is the CancerKB feature an additional cost?
No, CancerKB is integrated into the VSClinical AMP workflow. Additionally, you can submit your interpretations to this database, which will be reviewed by our expert panel and added to the catalog. If you are interested in adding the AMP workflow to your license, let our team know!
We’re considering validating CNV analysis for a whole genome/whole exome analysis using a total of 25 samples. Will that essentially suffice for our initial needs and thereafter subsequently include additional CNV references once more samples are sequenced over time?
We recommend having a reference set composed of 30 samples as we have found that our software performs best with this condition. However, it is an iterative process so if you have 25 samples, you can start with those references and then continue to add your references as you go.
Can VSClinical software detect and annotate specific variants including structural rearrangements, inversions, and translocations for whole exome/whole genome?
Within our AMP workflow, you do have the ability to evaluate Single Nucleotide Variants, insertions and deletions, gene fusions, copy number variants, and considerations for wild type genes. However, inversions and translocations are not traditionally identified using NGS approaches and are thus not present in a VCF file. Since the import option within VarSeq is dependent on VCF and BAM files, it would not be able to detect or evaluate those types of variants.
When are you using the binned option for your reference set?
The binned approach is geared towards analyzing shallow coverage whole-genome data that does not have a BED file that defines the target intervals. The minimum bin size can be set to 10,000 base pairs and then the algorithm will compute coverage statistics for the entire genome within the specified bin size. The binned approach is very similar to CMA, which allows you to accurately detect large aneuploidy events.
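To make the binning idea concrete, here is a minimal sketch that counts reads per fixed-size bin along one chromosome. It only illustrates the general technique, not the software's implementation: the function name is hypothetical, and using 0-based read start positions as the coverage statistic is an assumption.

```python
def binned_read_counts(read_starts, chrom_length, bin_size=10_000):
    """Count reads per fixed-size bin on a single chromosome.

    read_starts  -- iterable of 0-based read start positions (assumed input)
    chrom_length -- chromosome length in base pairs
    bin_size     -- bin width in base pairs (10,000 as mentioned in the answer)
    """
    if bin_size <= 0 or chrom_length <= 0:
        raise ValueError("bin_size and chrom_length must be positive")
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < chrom_length:  # ignore positions off the chromosome
            counts[pos // bin_size] += 1
    return counts

# Four reads on a 30 kb toy chromosome fall into three 10 kb bins.
print(binned_read_counts([0, 9_999, 10_000, 25_000], 30_000))
```

Because no BED file is needed, the bins themselves act as the target intervals, and each bin's count could then be compared against the same bin in the reference samples.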
Can you say again how the determination is made that a sample CNV is the same CNV present in 1kG, ClinGen, etc.? What is the overlapping calculation?
The annotations that we are using for CNVs are interval tracks that are matched by overlapping region rather than by a specific location. For a matching region, the algorithm will then produce a similarity coefficient, which is defined as the size of the intersection divided by the size of the union, also known as the Jaccard index. We could definitely deep-dive a little further into this question, and if you are interested, just give us a shout at firstname.lastname@example.org!
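The similarity coefficient described above can be sketched for two genomic intervals as follows. This is a minimal illustration that assumes both intervals are on the same chromosome and use half-open coordinates; the function name is hypothetical and the sketch is not the software's own code.

```python
def jaccard_index(a_start, a_end, b_start, b_end):
    """Jaccard index of two half-open intervals [start, end).

    Both intervals are assumed to lie on the same chromosome.
    Returns intersection length divided by union length, in [0, 1].
    """
    if a_end <= a_start or b_end <= b_start:
        raise ValueError("intervals must have positive length")
    # Overlap is zero when the intervals do not touch.
    intersection = max(0, min(a_end, b_end) - max(a_start, b_start))
    union = (a_end - a_start) + (b_end - b_start) - intersection
    return intersection / union

# A called CNV versus a catalog CNV that overlaps half of it:
# intersection = 50 bp, union = 150 bp.
print(jaccard_index(100, 200, 150, 250))
```

A value of 1.0 means the two events cover exactly the same region, and 0.0 means they do not overlap at all, so a cutoff on this value is one way to decide whether a sample CNV "matches" an annotated one.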
What’s the estimated false discovery rate of the CNV prediction tool?
The false discovery rate is defined by the user in the algorithm and can be changed between a sensitivity, balanced or precision setting. Sensitivity will detect more CNVs but increase the rates of false positives, whereas the precision setting will detect fewer CNVs but decrease the rates of false positives. This in combination with removing quality flags, and the introspective capability of defining confidence based on the event p-value will significantly reduce the rate of false positives and allow you to accurately detect CNV events with your NGS data.