- The VS-CNV algorithm has been upgraded with new options, filters and quality flags for the NGS Exome use case. Newly added CNV algorithms will use some of these options, but project templates will preserve the previous algorithm defaults. For more details see our NGS Exome CNV Calling tutorial and the documentation CNV Caller on Target Regions.
- There is a new Create Low Quality Targets wizard accessible from the Manage Reference Samples dialog that produces a set of targets from a selected panel that fail various quality metrics, including average depth across all reference samples, extreme GC content and too short spans.
- Previously when importing external CNVs from a text file, the text file containing the CNVs had to be associated with a sample to successfully complete import. Now, the CNV site files can be imported without a sample association.
- When multiple CNV VCF files were imported into a project with multiple samples, the mapping of samples to VCF was not respected when the imported file order did not match the project sample order. This has been fixed.
- When importing external CNVs, merge behavior settings can now be saved and incorporated into project templates.
- In the CNV table and in the details panel, the ACMG Sample CNV Classifier algorithm now displays the CNV classifications correctly with spaces. Previously, the classifications were displayed as a single word without a space.
- VarSeq now supports editing of the sample names in the project without triggering a re-running of the CNV and LOH algorithms.
- If the coverage table contained extra chromosomes with lowercase letters in their names, a crash would occur in the code used to call whole chromosome CNV events. This issue has now been resolved.
- Novel splice sites will now be considered by VSClinical when computing the recommended ACMG scoring criteria. Variants for which 3 or 4 of the 4 splice site algorithms agree that the variant introduces a novel splice site will have a “No” recommendation for BP7 and a recommendation for PP3 (instead of potentially BP4), allowing variants that would otherwise be filtered out as likely benign to have some weak pathogenic evidence. Ultimately, review of literature and clinical case history is required to get a novel splice site to a Likely Pathogenic classification. The Computational Evidence section and recommendation text have been enhanced with details about the effect of novel and canonical splice sites, distinguishing between events that cause a frameshift or stop gain versus those that preserve the reading frame.
- We have changed the recommended criteria for predicted splice disrupting variants from PVS1 (with modifiers) to PP3_Strong and PP3_Moderate depending on the predicted impact. This change was made to reflect the exact language of the 2018 PVS1 guideline paper in conjunction with a discussion with one of the authors of that paper.
- The following strength modifiers have been added to the ACMG/ACGS Guideline workflows: PS1_Moderate, PS1_Supporting, PM1_Strong, and PS3_VeryStrong.
- The ACMG Classifier algorithms within VarSeq projects were updated to more strictly apply PM5 only when ClinVar variants are contained by the current mutated codon. This prevents the case of larger deletions that overlap the current variant from triggering a PM5 recommendation. It also now applies it consistently for in-frame insertions and deletions.
- The ACMG Classifier algorithms now apply the ACGS guidelines-specific criteria rules, just like the recommendation in VSClinical. This includes PVS1 for stop-loss variants, PVS1_Moderate and PS1_Moderate (potentially) for start losses and PS1_Supporting (potentially) for splice disrupting variants.
- The VarSeq ACMG Classifier algorithm now recommends PM6 (Unconfirmed Denovo allele) versus PS2 (Confirmed Denovo Allele) for variants that are in a de Novo genotype state in the project. The guidelines require the confirmation of maternity and paternity to use PS2, and PM6 if otherwise assumed.
- In regions with poor multi-species alignment, such as the mitochondrial chromosome, BP4 was sometimes being recommended because the algorithm mistook the lack of data as evidence of “Not Conserved”. The ACMG Classifier algorithms now only recommend BP4 when there is evidence in both GERP and phyloP that the positions are not conserved.
- Various rare edge cases in variant classification have been harmonized to have consistently suggested criteria and reasons between the ACMG Classifier algorithms within VarSeq and the VSClinical ACMG workflow recommendations.
- The ACMG Classifier algorithms improve handling of multi-allelic variants that require normalization (such as A/AA/-) to correctly annotate the normalized form of each alternate allele (the insertion of an A, and the deletion of an A in this example).
- The ACMG Classifier algorithms had issues when the same sub-population names appeared in multiple custom control population sources. The algorithm can now handle these duplicate names.
- The ACMG Classifier algorithms had the default options for including population frequency sources to double the homozygous count. This made sense when computing the number of alleles, but was being done unnecessarily in this context, causing the recommendation based on homozygous counts to have incorrect numbers in the recommendation text. Note that the recommendation code itself was correct and not impacted.
- When the ACGS guidelines are chosen as the scoring system, PP5/BP6 will not be recommended as per the following recommendation: “The ClinGen Sequence Variant interpretations group recommends that this criterion is not used (Biesecker and Harrison, 2018).”
- VSClinical now allows adding and annotating variants up to 1000bp upstream and downstream of an interpreted transcript.
- VSClinical now supports adding and interpreting variants in non-coding RNAs. For example, adding NR_003051.3(RMRP):n.-19_-13dupTCTGTGA will now detect the upstream non-coding RNA and detect and display the Pathogenic ClinVar assertion and recommend PP5 along with PM2.
- Mitochondrial variants can now be manually added to VSClinical with NC_012920.1:m.9035T>C where the HGVS notation uses “m” instead of “g” as the prefix.
- PM1 will now be recommended in VSClinical for larger variants spanning multiple amino acids when the nearby pathogenic variant is within 6 amino acids of either the start or stop of the affected amino acids. Previously this computation only compared the distance between the amino acid starts of the compared variants.
- When saving variant interpretations, there is now a “Save & Next” button that will take you to the next unsaved variant in the evaluation. If all variants are saved, the button will say “Save & Review” and take you back to the variant table in the Evaluation tab.
- When using a project opened over a network share, doing quick and repetitive saves of variant assessments in VSClinical had the potential to back up the save queue of the project workflow state and cause a crash. This was only observed in a relatively slow network when the project workflow states were over 10MB, but the issue was addressed by forcing the serialization of these save requests.
- When classifying variants with the ACGS guidelines, now two very strong (VS) criteria will result in a Pathogenic auto-classification.
- The Splice Predictions criteria now considers disruptive novel splice site predictions as well as changes to existing splice sites. The wording of the criteria and responses has also been updated to reflect these changes.
- The splice sites view in VSClinical displays various details on hover. The “Delta Raw” metric has been added to the splice site prediction tool tip for GeneSplicer and MaxEntScan predictions.
- The VarSeq 2.2.2 release introduced a regression that removed the ability to switch genes in the transcript selection dialog when a variant overlapped multiple genes. This capability has now been restored to allow the evaluation of the variant to switch to a different overlapping gene or even non-coding RNA.
- Multi-allelic variants in the 1000 Genomes catalog would in some situations be picked up incorrectly in the ACMG recommendation engine, impacting criteria like BS2. This has been fixed to always pick the correct allele frequency records from the 1000 Genomes catalog.
- The system Word report templates have been updated to have different formatting and output for each of the Test Result values of Positive, Negative, Inconclusive, and Not Evaluated.
- An option to “Open Systems Filters Location” has been added to the drop-down menu for the Word template output selector that displays the system filters file that may be useful when learning the programming environment for Word template filter functions.
- The first two tabs of the VSClinical interface have been combined into an Evaluation tab. This makes it easy to see the tumor type before adding mutations to the evaluation. A “Changelog History” section was added as well. Similarly, a fly-out menu is available from the top-right of the interface, similar to VSClinical ACMG, that provides an overview of the evaluation and quick access to actions like close and finalize.
- A Commercial Labels field has been added to the reportable drugs section. This field fills in automatically from DrugBank’s provided “Product” field for drugs and drug combinations.
- The variant scoring card now has an auto generated Oncogenicity Interpretation based on the scored criteria that can be added to the variant interpretation with a single button.
- Splice acceptor variants that are predicted to cause exon skipping of an in-frame exon are optionally interpreted as exon deletions instead of LoF variants. This allows cancer variants like NM_007297.4:c.5327-1G>A to be interpreted as a MET exon 14 deletion as it is commonly characterized in the literature.
- You can now add Loss of Heterozygosity events to the AMP workflow from the CNV project table or manually. Per-gene LOH events can be provided with biomarker interpretations and added to clinical reports with AMP tier evidence levels.
- When using a project that has not been configured with VSClinical ACMG, loss of function germline variants showed an error message and not a recommendation for the ACMG PVS1 criteria. These variants can now be added without issue.
- When saving somatic or germline variant interpretations, there is now a “Save & Next” button that will take you to the next unsaved variant in the evaluation. If all variants are saved, the button will say “Save & Review” and take you back to the variant table on the Mutation Profile tab.
- Liftover can now be enabled in the import command with the additional “liftover=true” parameter, allowing the import of VCF files with different coordinate systems than the genome assembly specified in the project template.
- Attempting to import a VCF file with a detected genome assembly different than that of the selected assembly for the project template will produce an error with a description of the mismatched coordinate systems.
- The update_cnv_import command caused an import error when provided more than one CNV VCF file as input. It now works with both single VCF file and per-sample VCF files.
- To support the above capabilities, a task_wait command is required before exporting or closing the project to ensure the import and all dependent algorithms are run. See Waiting For Task Completion for more details.
- VSPipeline will now provide a more informative error message for license_activate if the program does not have permissions to write the license file.
- There is now a download_annotations.sh script that invokes vspipeline with the request to download the reference sequence for the given genome assembly (or GRCh37 by default).
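The PM1 proximity change for larger variants described earlier in this list can be illustrated with a small sketch. This is not VSClinical's implementation: the function names, the use of a single amino acid position for the nearby pathogenic variant, and the example coordinates are all assumptions made for illustration. Only the 6 amino acid window and the "either the start or stop" comparison come from the note above.

```python
def pm1_start_only(variant_aa_start: int, pathogenic_aa_pos: int, window: int = 6) -> bool:
    """Previous behavior as described: compare amino acid start positions only."""
    return abs(pathogenic_aa_pos - variant_aa_start) <= window


def pm1_span_aware(
    variant_aa_start: int,
    variant_aa_end: int,
    pathogenic_aa_pos: int,
    window: int = 6,
) -> bool:
    """New behavior as described: the pathogenic variant may be within
    `window` amino acids of either the start or the stop of the affected span."""
    if variant_aa_start > variant_aa_end:
        raise ValueError("variant_aa_start must not exceed variant_aa_end")
    near_start = abs(pathogenic_aa_pos - variant_aa_start) <= window
    near_end = abs(pathogenic_aa_pos - variant_aa_end) <= window
    return near_start or near_end


# Hypothetical example: a deletion affecting amino acids 100-110 with a
# pathogenic variant at amino acid 115.
old_result = pm1_start_only(100, 115)        # 15 residues from the start
new_result = pm1_span_aware(100, 110, 115)   # 5 residues from the stop
```

In this hypothetical case the start-only comparison misses the nearby pathogenic variant, while the span-aware comparison finds it because the variant lies 5 amino acids past the end of the deletion.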
VarSeq Annotations and Algorithms
- A new “Gene List Exon Coverage” algorithm is available that will compute coverage statistics on a list of genes instead of requiring a user-provided target region annotation file. For each gene, the clinically relevant transcript is selected, and coverage is computed for the coding region of each exon. See Gene List Coverage Statistics.
- The Coverage Statistics Algorithm (and Binned Coverage Statistics) now have a “Count Duplicate Alignments (filtered by default)” option that, when checked, turns off the filtering of reads marked as duplicates in the BAM.
- Most VarSeq algorithms, such as the ACMG Classifier and the Annotate Transcripts Algorithm, that have user defined setting options now have a right-click menu option allowing the algorithm options to be edited from the variant or CNV table directly.
- ACMG CNV Site Classifier: A new algorithm that is computed in the CNV Table. This algorithm computes classifications for each CNV based on the ACMG CNV Guidelines and does not require sample designation, or the CNVs to have been called with the VarSeq CNV algorithm. Additionally, this algorithm can be used with CNVs imported from an external caller.
- Updates have been made to the Annotate Transcripts algorithm to better support annotating non-coding RNAs along with a new option to include non-coding RNA, microRNA, and mRNA transcripts. Another option has been added to allow the annotation of upstream and downstream variants. These options are on by default for newly added gene annotations.
- Our shipped project templates have been updated to match the default options for gene annotation, including the annotation of non-coding RNAs, the inclusion of upstream and downstream variants and the detection of splice sites. Splice site predictions no longer require a VSClinical license.
- For insertions and deletions, the novel splice site predictions of the transcript annotations algorithm now consider the canonical splice site and the novel splice site together, so that a canonical site that has merely shifted is not predicted as an introduced splice site.
- There are RefSeq transcript alignments to the human genome that have to compensate for differences between the reference sequence and the canonical RNA sequence. When these differences are one or two missing or additional bases, the transcript alignment introduces a one or two base “intron” (alignment gap). The transcript annotations algorithm will no longer classify variants in these gapped introns as canonical splice site mutations, but simply as splice region variants. This results in fewer benign polymorphisms being incorrectly classified and considered as loss of function variants.
- The RefSeq shipped gene track for GRCh37 was replaced with a version that contains the same transcripts but has updated gene names and aliases to match the latest gene names from NCBI. Specifically, the gene names for the MT chromosome were using older naming conventions and not the “MT-” prefix used by the latest gene tracks.
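The handling of one or two base alignment-gap “introns” described above can be sketched as follows. This is only an illustration under stated assumptions, not the Annotate Transcripts implementation: the function name, the returned labels, the 1-based inclusive coordinates, and the simplification that only the first and last two intronic bases count as canonical splice positions are all hypothetical.

```python
def classify_intronic_position(
    intron_start: int,
    intron_end: int,
    variant_pos: int,
    max_gap_length: int = 2,
) -> str:
    """Classify a position inside an intron (1-based, inclusive coordinates).

    Introns of one or two bases are treated as alignment gaps, so positions
    inside them are reported as splice region rather than canonical splice site.
    """
    if not intron_start <= variant_pos <= intron_end:
        raise ValueError("variant_pos must lie inside the intron")

    intron_length = intron_end - intron_start + 1
    if intron_length <= max_gap_length:
        # Alignment gap: not a real intron, so no canonical donor/acceptor.
        return "splice_region"

    # Simplification: the first two and last two intronic bases are the
    # canonical donor and acceptor positions.
    if variant_pos - intron_start < 2 or intron_end - variant_pos < 2:
        return "canonical_splice_site"
    return "other_intronic"


gap_call = classify_intronic_position(1000, 1001, 1000)     # two base gap
donor_call = classify_intronic_position(1000, 1999, 1001)   # real intron
```

Checking the intron length before the distance to the intron boundary is what keeps a two base gap from ever producing a canonical splice site call.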
VarSeq Projects and General
There have been fairly large changes in the system gene preferences file (under Data/GenePreferences.gene-pref) in VarSeq 2.2.2 and now in VarSeq 2.2.3. These changes were prompted by the incorporation of the MANE transcripts in our most recent gene tracks and the large increase in the number of available transcripts in those tracks. In a high-profile example, BRAF went from single transcript to multi-transcript status, switching the long-established default transcript. Our heuristic for selecting the clinically relevant transcript has been stable, but we use the system gene preferences to override the heuristic when there is a clear community preference for a given transcript. For example, BRAF NM_004333 has 563 ClinVar submissions referencing this specific transcript, yet the “MANE Select” mRNA transcript NM_001374258 has 0. In VarSeq 2.2.3, we have used the ClinVar Assessments counts to ensure these clear preferences are the default. Of course, you can switch transcripts used in VSClinical at any time, and saving a variant interpretation also saves your transcript preference for all future variants in a given gene. Using this updated strategy, in VarSeq 2.2.3 there are 850 genes with a preferred transcript defined.
- The shipped gene preferences file has been updated to select default transcripts based on data from the ClinVar Assessments track and the gene track. Transcripts that have more than 10 ClinVar assessments and more than double the assessments of the next most interpreted transcript will be set as the default transcript, overriding the MANE transcript in about 90 genes. The remaining 750 genes with manual transcripts are ones without MANE transcripts, resulting in the clinically interpreted transcript replacing the transcript that would have been selected by VarSeq’s default heuristics.
- VarSeq now runs on Mac with the “Dark Mode” setting enabled. Previously, the setting caused a mix of inverted and regular colors. While VarSeq does not have a separate dark color palette, it now will consistently use the default colors and remain usable in dark mode.
- Attempting to import a VCF file with a detected genome assembly different than that of the selected assembly for the project template will produce a warning on the select sources screen and prompt with a final warning message when completing the wizard.
- The HGVS output for variants before and after transcript sequences has been improved to be able to use the “dup” syntax for insertions. For example: NR_003051n.-22_-15dupTACTCTGT.
- The log messages for algorithm computation completion now list the relevant axis (table types) in the output. For example, annotating a CNV table will now have an event name of “Task Finished” (CNVs).
- On import, there is a new default option to “Use Default Chromosome Names” that will rename alternative chromosome names to the primary names used in VarSeq. For example, a VCF with NC_000009.11 in the CHR field will be imported as 9. Similarly a VCF with “MT” for a GRCh38 project will be renamed to “M” (see following note).
- Annotating assessment catalog variants with an “M” chromosome name on GRCh38 would fail due to the assembly primary chromosome name being “MT”. We changed a couple of things in this regard. First, we made changes to the annotation code to allow project variants with non-primary chromosome names to be read and saved to assessment catalogs. Second, we changed the default mitochondrial chromosome name on GRCh38 to be “M”, matching the reference sequence naming provided by NCBI and commonly used for alignment to GRCh38. Finally, we translate non-primary chromosome names to the primary chromosome names of the assembly on import.
Note that GRCh38 assessment catalogs with previously saved “MT” chromosome entries will not be seen in new projects that now have “M” mitochondrial chromosome names on import. Reach out to support and we will help you update your historical assessment catalogs to rename all “MT” interpretations to match the new “M” naming convention going forward.
- VarSeq project templates will now remember preferences about mapping per-VCF fields like QUAL into sample fields when saving project templates. This also works for imported CNV VCF files as part of the project template.
- GenomeBrowse: Improved handling of plotting mitochondrial chromosome variants regardless of “M” or “MT” being used as the chromosome name.
- GenomeBrowse: When the version of an annotation source is updated in the variant and CNV tables, if the source is plotted in GenomeBrowse, the plot will now update to the same annotation version that was used in the table.
- GenomeBrowse: The Feature List when displaying all features of filtered clinically relevant GRCh38 gene tracks will now display all features, including those in the mitochondrial chromosome.
- GenomeBrowse: Newly created BAM plots will now have the “Filter Duplicate Alignments” option turned on, to match the coverage statistics algorithms in VarSeq.
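The default-transcript rule described above for the shipped gene preferences file (more than 10 ClinVar assessments and more than double the count of the next most interpreted transcript) can be sketched as below. This is a minimal illustration, not the code used to build the shipped file: the function name and the dictionary input are assumptions, and only the BRAF counts are taken from these notes.

```python
from typing import Dict, Optional


def pick_preferred_transcript(
    assessment_counts: Dict[str, int],
    min_count: int = 10,
    ratio: int = 2,
) -> Optional[str]:
    """Return the transcript with a clear ClinVar preference, or None.

    A transcript qualifies when it has more than `min_count` assessments and
    more than `ratio` times the assessments of the runner-up transcript.
    """
    if not assessment_counts:
        return None

    ranked = sorted(assessment_counts.items(), key=lambda item: item[1], reverse=True)
    top_name, top_count = ranked[0]
    runner_up_count = ranked[1][1] if len(ranked) > 1 else 0

    if top_count > min_count and top_count > ratio * runner_up_count:
        return top_name
    # No clear preference: fall back to MANE or the default heuristic.
    return None


# BRAF counts as quoted in the notes above.
braf_choice = pick_preferred_transcript({"NM_004333": 563, "NM_001374258": 0})
```

With the BRAF counts, NM_004333 is selected because 563 exceeds both the minimum count and twice the runner-up count of 0. A gene whose top two transcripts have, say, 12 and 8 assessments would return None, leaving the choice to the MANE transcript or the default heuristic.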