VarSeq 2.3.0 is a major release with significant new features and improvements based on user feedback and product roadmap progression. The major new capabilities include:
- VSClinical AMP module update to support genomic signatures and an updated CancerKB 2.0 that includes expanded expert curations for signatures and the latest relevant cancer drugs and biomarker interpretations.
- An updated importer that supports VCF 4.3 BND (breakend) records that can describe structural variants such as gene fusions. VarSeq also added supporting annotations and VSClinical AMP import capabilities for these imported structural variants.
- Support for the CRAM file format in addition to BAM for visualization and coverage analysis, including coverage used for CNV calling.
- Expanded customization and automation through VSClinical evaluation scripts, with bundled scripts for popular platforms such as TSO500 and Archer Fusions.
- A new curation interface lets users write their own cancer interpretations, which can be saved and reused in evaluations. It is available as a third dropdown option, “Cancer Interpretations”, when opening VSClinical, and interpretations can be written and saved for variants without first adding them to an evaluation.
- A new assessment catalog has been added for saving drug interpretations, with fields for drug approval status, drug class, mechanism of action, indication for use, drug descriptions, drug resistance, citations, and information about who wrote the interpretation and when it was saved. This catalog currently must be saved locally and is not supported in VSWarehouse.
- Evaluation options have been updated to allow setting interpretation match behavior to “Match Best”, “Match All”, or “Match All Including Other Tumor Types”.
- The VSClinical AMP start screen has been updated to prompt the user to select a tumor type prior to starting an evaluation.
- VSClinical now supports adding and describing fusions using a double colon and defining exon-to-exon splice fusion junctions, e.g. VCL::ALK e16-e20.
- Structural variants, copy number variants, and SNPs can be added directly from VarSeq variant tables and manually added to VSClinical AMP evaluations.
- The Evaluation tab has been restructured to display data in five tables: Variants to Evaluate, Knowledge Base Interpretations, Drugs and Trials, NGS Sequencing Summary, and Changelog History. Knowledge Base Interpretations is a new table that displays drug sensitivity, drug resistance, diagnostic, prognostic, biomarker summaries, and gene interpretations for previously saved variants in the user assessment catalog or interpretations that are included in the CancerKB database. Drugs and Trials is a new table that allows users to view and navigate to all drugs and trials that have been associated with variants in the project.
- The gene tab has been restructured to include gene summaries, alteration frequencies, and genomic signature descriptions. These interpretations were previously found in the variants tab.
- Variant impact can now be associated with biomarkers within VSClinical AMP: “Activating”, “Gain of Function”, “Loss of Function”, “Loss”, “Negative Finding”, “Unknown”.
- In VSClinical AMP there is now support for analyzing and reporting combination biomarkers.
- VSClinical AMP now supports saving variant, region, or gene level negative finding interpretations.
- The Biomarkers and Clinical Trials tabs have been merged and now clinical evidence such as available drug therapies, diagnostic or prognostic information, and associated clinical trials for variants that have been added to the evaluation are analyzed within the Clinical Evidence tab.
- Within the therapeutic options table of the Clinical Evidence tab, a Drug Descriptions tab has been added. It includes a new Resistance Description interpretation field for information about eventual acquired resistance associated with the drug, which is agnostic of the sample, variant, and tumor type.
- Large copy number variants spanning multiple genes can now be added to the AMP workflow as a single event. If the CNV overlaps any cancer genes, these genes are presented with their role in cancer and users can select the primary gene for clinical reporting.
- Copy number variants can now be reported as VUS, biomarker, or germline suspected CNVs.
- VSClinical AMP now supports adding CNVs with cytoband notation (e.g. del(17p13.1)).
- A CNV region plot, similar to the CNV plot in VSClinical ACMG, has been incorporated into the AMP interface.
- The clinical trials search dialog allows users to search by drug, biomarker, and disease and these results can now be filtered by trial phase, patient age, patient sex, by region (such as Europe) or country, and by distance using United States and European postal codes.
- In the add/edit dialog for clinical evidence, a Scored Criteria Worksheet has been added. The worksheet enables users to specify the reasoning behind AMP tier level selection and simplify tier level assignment.
- Manually adding fusions with fusion partners to VSClinical AMP evaluations has been improved to allow swapping the primary/secondary gene instead of auto-selecting the primary and secondary gene.
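The double-colon fusion notation mentioned in the list above (for example, VCL::ALK e16-e20) can be broken into its parts with a short parser. The following is a minimal sketch only: the regular expression, the `Fusion` class, and the `parse_fusion` function are hypothetical names introduced for illustration, and the exact grammar accepted by VSClinical is not specified in these notes.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for strings such as "VCL::ALK e16-e20".
# The grammar VSClinical actually accepts is an assumption here.
FUSION_RE = re.compile(
    r"^(?P<first>[A-Za-z0-9-]+)::(?P<second>[A-Za-z0-9-]+)"
    r"(?:\s+e(?P<exon_first>\d+)-e(?P<exon_second>\d+))?$"
)


@dataclass(frozen=True)
class Fusion:
    first_gene: str
    second_gene: str
    first_exon: Optional[int] = None
    second_exon: Optional[int] = None


def parse_fusion(text: str) -> Fusion:
    """Split 'GENE1::GENE2 eN-eM' into gene symbols and optional exon numbers."""
    match = FUSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognized fusion description: {text!r}")
    exon_first = match.group("exon_first")
    exon_second = match.group("exon_second")
    return Fusion(
        first_gene=match.group("first"),
        second_gene=match.group("second"),
        first_exon=int(exon_first) if exon_first is not None else None,
        second_exon=int(exon_second) if exon_second is not None else None,
    )


if __name__ == "__main__":
    print(parse_fusion("VCL::ALK e16-e20"))
```

The exon junction is treated as optional so that a bare gene pair such as VCL::ALK also parses.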
VSClinical AMP now has “Evaluation Scripts” on the first screen that enable importing of custom data files as well as automation of many capabilities of the workflow. Existing evaluation scripts can be modified by copying the script and making changes, or new scripts can be written from scratch. Evaluation scripts can also be called from VSPipeline to automate many tasks that would previously have required manual user action.
- Import TSO500 script adds fusions, DNA called CNVs, and genomic signatures from solid tumors and liquid biopsy samples that are reported in the CombinedVariantOutput.tsv file.
- Add Associated Clinical Trials script adds all associated clinical trials for biomarkers added to the project.
- Import Project Variants, CNVs, and Fusions script adds the filtered result of the variant, CNV, and breakend tables to a new evaluation.
- Import TSO500 All Splice Variants script accepts the annotated.json.gz file from the RNA pipeline output as input and will import all of the splice variants that have allele depths greater than or equal to 5 and a PASS filter to the evaluation as Copy Number Variations.
- Import TSO500 All Fusions script accepts the ALLFusions.csv file from the RNA pipeline output as input and will import all fusions in the file. The script can be easily edited to import only fusions that pass the “KeepFusion” quality filter.
- Import Ion Torrent Signatures script imports genomic signature data from sample fields of a VCF file header into an evaluation.
- Import Archer Fusions and Deletions script imports combined TSV files from the Archer pipeline.
- Sync Report Status with Variant Sets script enables record sets from variant tables to be used to define the report status of the variants that have been added to the evaluation (for example, record sets mapped to “Primary Findings” will get a report status of “Biomarker”, record sets mapped to “Secondary Findings” will get a report status of “Secondary Germline”, and record sets mapped to “Uncertain Significance” will get a report status of “VUS”).
- The custom script “Compatibility Report Data” can optionally be run in the report tab; it prepares the report data to be compatible with pre-2.3.0 AMP Word report templates.
- Three new report templates have been added to VSClinical AMP and now support reporting negative findings, genomic signatures, and combination biomarkers.
- The Word report system will now load the filter functions from a .js file with the same base file name as the selected Word .docx template file. This supports having multiple report templates, each with its own .js file.
- Hovering over variants in the Evaluation tab when the table is sorted would sometimes provide the variant name of a different row. The tooltip has been updated to match the sorted row data.
- VSClinical ACMG now supports adding CNVs with cytoband notation (e.g. del(17p13.1)).
- Adding variants is now more robust to selected Previously Interpreted Variants sources with unexpected field values.
- The gnomAD frequency information card now links out to gnomAD v3 (GRCh38 coordinates) when in a GRCh38 project.
- VSClinical ACMG classification of MT variants previously failed to pick up overlapping ClinVar variants on the GRCh38 genome, but will now match and score criteria related to nearby or overlapping pathogenic variants.
- Multi-allelic variants will now normalize each allele before displaying the HGVS in the classifier output.
- The CNV Evidence card in VSClinical now provides bounds for the HGVS g. coordinate on the lifted-over chromosome to match gene coordinates on a non-native assembly.
- Adding an annotation source that is not supported as an ACMG Previously Interpreted Variant Source in the Project Options dialog, specifically ClinVar, caused an error. VSClinical will no longer error out even if an invalid source is selected.
- The copy number (CN) field imported into VarSeq from CNV VCF files is now pulled into VSClinical ACMG/AMP for copy number variant evaluation and can be included in clinical reports.
- Opening a project that requires additional configuration to view VSClinical now delays popping up the options dialog until the user selects “Update” from the VSClinical tab.
- VSClinical AMP and ACMG now support manually adding CNVs with g. notation, transcript name with g. notation, and transcript name with c. notation (this was already supported for variants).
- On computers where dates are displayed in Thai or similar non-numeric encodings, VSClinical was not able to process the list of available sources and was stuck on the source list dialog. This has now been fixed so that the serialization of dates in projects is agnostic to the user's desktop preferences.
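Two of the behaviors described in the list above are simple lookups and can be sketched briefly: the record-set-to-report-status mapping used by the Sync Report Status with Variant Sets script, and the rule that a report template loads its filter functions from a .js file with the same base name. This is an illustrative sketch, not the bundled script. The function names and the example template path are hypothetical; only the three mappings and the same-base-name rule come from the notes.

```python
from pathlib import Path
from typing import Optional

# Mappings as stated in the release notes; the dictionary itself is illustrative.
RECORD_SET_TO_STATUS = {
    "Primary Findings": "Biomarker",
    "Secondary Findings": "Secondary Germline",
    "Uncertain Significance": "VUS",
}


def report_status(record_set: str) -> Optional[str]:
    """Return the report status for a record set, or None if it is unmapped."""
    return RECORD_SET_TO_STATUS.get(record_set)


def filter_script_for(template: str) -> Path:
    """Return the .js path that shares its base file name with a .docx template."""
    return Path(template).with_suffix(".js")


if __name__ == "__main__":
    print(report_status("Primary Findings"))
    # "templates/amp_report.docx" is a made-up example path.
    print(filter_script_for("templates/amp_report.docx"))
```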
VarSeq Annotations and Algorithms
- The following annotation sources have been added to VSClinical AMP: NCT Trials, Drug Central, CIViC Assertion Summaries, and NCI Thesaurus Drugs. The following sources have been removed: PMKB and CGI Biomarkers.
- Tabular sources such as Cancer Ontology, Human Phenotype Ontology, OMIM Phenotype Ontology, MONDO, Panel App Panels, Gene Ontology, Drug Central, NCI Thesaurus, and NCT Trials are now updated on a monthly schedule instead of upon VarSeq release.
- Support for the T2T CHM13 genome assembly was added, with built-in liftover support for mapping GRCh37 and GRCh38 to the T2T assembly.
- The OMIM Genes and Phenotypes annotation sources now use a batchidentifier algorithm to annotate variants based on Gene ID (Entrez). OMIM annotation will now leverage the RefSeq Genes transcript algorithm settings to annotate upstream and downstream variants. Previous project templates can be updated by deleting and re-adding the OMIM annotations to pick up the new annotation mode.
- Reference sequences for GRCh37 and GRCh38 have been updated to better support CRAM file coverage.
- The default gene preferences file was updated to include curated inheritance models for more genes, manually resolved in the case of conflicts from OMIM. It also includes updated default transcripts for a small number of genes based on updated submission counts from ClinVar. See the blog post on Selecting Clinically Relevant Transcripts.
Golden Helix CancerKB V2
- New drug interpretations field added to CancerKB including information about the class of the drug, mechanism of action, indication for use, description, and information about eventual acquired resistance associated with the drug.
- New interpretations for drug sensitivity and resistance, prognostic, and diagnostic are now available at a tumor type level and for genomic signatures.
- New tier II drug sensitivity and resistance interpretations added for BRAF, KIT, PTEN, ABL1, and MET.
- Interpretations for genomic signatures have been curated and are available in CancerKB.
- Interpretations for combination biomarkers or multiple biomarkers are now supported in CancerKB.
- Coverage algorithms can now use CRAM files in addition to BAM files to compute coverage.
- The Annotate Transcripts algorithm has been updated to show all clinically relevant transcripts when annotating multi-allelic variants as opposed to only displaying one transcript.
- A new Annotate Fusions Algorithm was added for annotating rearrangements. This algorithm annotates against gene fusions in the selected source, and identifies complex rearrangements with breakends whose overlapping genes match those of fusions in the annotation source.
- A new Annotate Overlapping Genes algorithm was added for annotating rearrangements. This algorithm annotates breakends against overlapping genes based on the clinically relevant transcript. This algorithm has a number of different outputs such as indicating the 5′ and 3′ genes, location of the breakends, preserved exons, and effect.
- A new Annotate Overlapping Regions algorithm was added for annotating rearrangements. This algorithm annotates fusions against regions in the selected source and identifies rearrangements with breakends overlapping corresponding regions in the annotation source.
- The RefSeq transcript algorithm can now be configured to label new stop codons with an “*” instead of “Ter” for nonsense and frameshift variants.
- The VarSeq gene annotations algorithm is now up to 50% faster on large projects.
- The PhoRank algorithm was enhanced with additional fields to report the directly associated patient phenotypes and count of directly associated patient phenotypes.
- The Match Genes Linked to Phenotypes, Match Genes Linked to Disorders, and PhoRank algorithms were not correctly parsing multi-word HPO terms that were copied and pasted from a text document into the search dialog. This has now been fixed so that each word is not searched independently and the multi-word HPO term is searched as one entry.
- The VarSeq CNV Calling algorithm now automatically detects the copy number (CN) field from VCF files and uses this field for CNV calling. Previously, the CNV Caller algorithm relied on z-score and ratio fields from the coverage statistics algorithm for calling CNVs. The CN field is also incorporated into VSClinical and can be included in clinical reports. The CNV caller is also now more aware of CNVs called in PAR regions for male samples.
- Additional advanced options were added to the CNV Calling algorithm to adjust sex chromosome thresholds for gender inference.
- When importing CNVs or running the CNV Caller algorithm, regions that have no CNVs were previously called “Diploid” but are now labeled “Normal”. This change particularly applies to the X chromosome in male samples. Prior to this change, regions of the male X chromosome without copy number events were labeled diploid when these regions are actually haploid.
- Reported Bugs fixed.
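The stop-codon labeling option for the RefSeq transcript algorithm mentioned in the list above is a notation change in the HGVS protein string: “Ter” versus “*”. A minimal sketch of that conversion is shown below. The function name `use_star_for_stop` is hypothetical, and this is not a description of how VarSeq implements the option.

```python
def use_star_for_stop(hgvs_p: str) -> str:
    """Rewrite 'Ter' as '*' in the protein part of an HGVS p. description.

    Only the text after the last 'p.' is changed, so an accession prefix
    such as 'NP_000537.3:' is left untouched.
    """
    prefix, separator, protein = hgvs_p.rpartition("p.")
    return prefix + separator + protein.replace("Ter", "*")


if __name__ == "__main__":
    print(use_star_for_stop("p.Arg97Ter"))         # nonsense variant
    print(use_star_for_stop("p.Arg97ProfsTer23"))  # frameshift variant
```

No three-letter amino acid code contains “Ter”, so the plain string replacement is safe within the protein part of the description.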
- The update_bnd_import command was added to allow importing break-end VCF files defining fusions and other structural variants to a specific table in the project template, similar to the existing
- Error messages for VSPipeline are now more concise and informative. If an error occurs while processing a project, a non-zero exit code is returned, allowing calling bash scripts on Linux to respond to the error.
- The VSPipeline import command can now be passed a directory as a substitution for a list of files and will import every VCF file present in that directory.
- VSPipeline has additional commands, including add_annotation_folder, to influence the paths used to find annotation files for projects.
- VSPipeline will now look for the VS_PASSWORD environment variables, and if present, perform a login command on startup with their values. Similarly, VSPipeline will call the login_token command with the VS_LOGIN_TOKEN variable if present in the environment.
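The VS_LOGIN_TOKEN environment variable and the non-zero exit code described above combine naturally in an automation wrapper: pass the credential through the environment and branch on the exit status. The sketch below assumes nothing about the VSPipeline command line, which is not shown in these notes. The `command` argument stands in for whatever invocation is actually used, and `child_env` and `run_checked` are hypothetical helper names.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional, Sequence


def child_env(token: Optional[str] = None,
              base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and optionally set VS_LOGIN_TOKEN for the child."""
    env = dict(os.environ if base is None else base)
    if token is not None:
        env["VS_LOGIN_TOKEN"] = token
    return env


def run_checked(command: Sequence[str], env: Mapping[str, str]) -> int:
    """Run a command and report a non-zero exit code on stderr."""
    result = subprocess.run(list(command), env=dict(env))
    if result.returncode != 0:
        print(f"command failed with exit code {result.returncode}",
              file=sys.stderr)
    return result.returncode


if __name__ == "__main__":
    # Placeholder command: a Python one-liner that exits with status 3.
    # Replace it with the real pipeline invocation for your installation.
    placeholder = [sys.executable, "-c", "import sys; sys.exit(3)"]
    sys.exit(run_checked(placeholder, child_env()))
```

Passing the token through the environment keeps it out of the command line, where it could otherwise appear in process listings or shell history.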
VarSeq Projects and General
- A new project template, “Comprehensive Cancer Template”, is now available for GRCh37 and GRCh38.
- The VarSeq license will stay current even if permissions on the user data folder do not allow updating the vsprops.json file.
- There will not be a message on Linux about “DevTools listening”, as the integrated web browser developer tools are not enabled by default. To enable the dev tools, set the environment variable QTWEBENGINE_REMOTE_DEBUGGING=8923.
- On Linux, there is now a crash handler that runs if VarSeq executes an invalid instruction. This produces a “mdmp” file in the User Data folder that can be sent to support. This capability was previously available only on Windows.
- Various performance improvements were made to the filter chain to improve responsiveness to changes in the filter chain or the flagging of rows from tables.
- VarSeq now supports importing structural variants in the form of breakend VCF records (BND records). The project import wizard, as well as the option to add a secondary import table, now support importing these records into a new “Breakend” table. This new table type supports specialized annotation algorithms for annotating fusions.
- Importing records with an ALT field containing a * allele is now supported. According to the VCF spec, this symbol indicates there was an overlapping deletion. VarSeq will now import these records by removing the reference to this allele in the corresponding GT field and allele-matching fields. This will in effect make variants haploid for their called alleles.
- Variant import can be filtered based on fields from the SNV, CNV, and SV VCF files, such as filter = PASS, alt allele frequency, or read depth.
- CRAM files can now be associated with samples as alignment files and can be used for computing targeted and binned coverage statistics.
- The custom folder path options dialog now has a “Reset All” button that resets all custom folder paths to their original locations based on the current AppData path.
- When exporting variants to VCF from VarSeq, there is now an option to add the ‘chr’ prefix to the chromosome name. There is also an added command to export variants with the added ‘chr’ prefix using VSPipeline.
- When using the import wizard for assessment catalogs, the auto-mapping of fields from a selected table to a catalog’s fields was not consistent. This has been improved to be deterministic and predictable in the auto-detection of field mappings.
- An import option has been added allowing the use of a variant/CNV size threshold to determine which table the variants/CNVs will be imported into.
- The Subset on Track import option will now warn users if the project coordinates do not match the coordinates of the subset track source when lift-over is also selected as an import option.
- A new option to match sample names by prefix has been added to the sample import dialog. This feature is most applicable when importing variant and CNV VCF files, or variant, CNV, and SV VCF files, for the same sample.
- You can now close a project and downloads will continue. You will be prompted on the closing of VarSeq if you would like to cancel ongoing downloads.
- Better error messages are displayed in the case of a download failure (such as network interruption or blocking web proxy).
- The default Assessment Catalog folder path is no longer associated with the App Data location when a custom assessment catalog path is provided. This was an issue when reading and writing to the gene preferences file.
- The VarSeq login screen now reports detailed error messages in the case of SSL connection issues caused by local firewalls or web proxies.
- Warehouse-based region annotations can now be used to annotate CNV tables from the add icon within the CNV table.
- Assessment catalog automapping to fields from the VarSeq table such as HGVS c./p. and gene names has been improved to be deterministic.
- VarSeq was crashing when adding samples to an existing project when sample names were identical. This has now been fixed.
- GenomeBrowse has been updated to support plotting FASTA files as a reference sequence track.
- Soft-clipped reads can now be visualized in GenomeBrowse. By default the soft-clipped reads are hidden, but this feature can be optionally turned on.
- CRAM files can now be plotted in GenomeBrowse in the same way as BAM files. They will auto-detect indexed reference sequences that are available or prompt to download available references that match. If a custom reference was used to do alignment, it will need to be run through the Convert Wizard before the CRAM file will be readable.
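The handling of the * ALT allele described in the list above (removing references to the allele from the GT field so the remaining call becomes haploid) can be illustrated on a single genotype. The sketch below is one possible reading of that description and not VarSeq's importer code. The function name `drop_star_allele` is hypothetical, and allele-matching fields other than GT are not covered.

```python
import re
from typing import List, Tuple


def drop_star_allele(alts: List[str], gt: str) -> Tuple[List[str], str]:
    """Remove '*' from the ALT list and from a GT string, renumbering alleles.

    VCF allele indexes: 0 is REF, 1..n are the ALT alleles in order.
    """
    remap = {"0": "0", ".": "."}
    kept: List[str] = []
    for old_index, alt in enumerate(alts, start=1):
        if alt == "*":
            continue  # no entry in remap, so calls to this allele are dropped
        kept.append(alt)
        remap[str(old_index)] = str(len(kept))
    separator = "|" if "|" in gt else "/"
    calls = [remap[allele] for allele in re.split(r"[/|]", gt) if allele in remap]
    # If every called allele was '*', fall back to a missing call.
    return kept, separator.join(calls) or "."


if __name__ == "__main__":
    # ALT=A,* with GT=1/2 becomes ALT=A with a haploid GT of 1.
    print(drop_star_allele(["A", "*"], "1/2"))
```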