New Product Add-Ons
- The VSClinical ACMG Guidelines workflow now has an additional CNV interpretation framework based on the ACMG/ClinGen guidelines. This product supports interpreting CNVs detected with VS-CNV or imported CNVs alongside variants and requires both a VSClinical ACMG license and a CNV license.
Release notes are organized by product. Each product section starts with new features, then polish items and bug fixes for that product or functional area.
VS-CNV
- A new flag was added to the CNV table output which denotes deletions that contain two or more heterozygous variants. This flag is called “Deletion Contains Heterozygous Variants”.
- Probability/Segregation: A new algorithm has been implemented, “Copy Number Probability/Segregation”, which runs on the CNV table and computes the expected copy number of each called CNV. If parental information is provided for a sample, the algorithm also computes the probability that the CNV is present in the mother and the father. This provides a computed confidence that a CNV is de novo.
- Latest Assessment: A new algorithm has been implemented, “Latest CNV Sample Assessments”, which annotates the CNV table with the latest assessments from a selected CNV assessment catalog.
- Matching Sample: Two new algorithms have been implemented, “Annotate CNVs Matching Current Sample” and “Annotate Regions Matching Current Sample”, which annotate the CNV or region table with the assessments from a selected CNV assessment catalog that contain the current sample name in a specified “Samples” field.
- The default value of “Size of the Reference Sample Subset” in the Advanced Parameters for the CNV Caller on Target Regions has been increased from 100 samples to 10,000 samples. This restores the 2.2.0 release behavior in most cases.
  Note: The 2.2.1 update added a feature to subset the considered reference samples to handle extremely large reference sample sets. The default value of 100 used in 2.2.1 resulted in selecting sub-optimal reference samples and thus degraded the performance of the CNV caller for some customers. To maintain the same normalization behavior as version 2.2.0, we have updated the default to a very large number, namely 10,000.
- The CNV output of a karyotype value has been adjusted to only report a karyotype on CNVs larger than at least one cytoband. CNV flags will also be reported for large CNVs along with karyotypes.
- The Binned CNV Caller algorithm was running slowly because it processed the Variant Allele Frequency (VAF) values, even though they were not incorporated into the calling algorithm. The VAF values are no longer processed during Binned CNV Caller computations.
- The CNV caller algorithm (Target or Binned) would get triggered to re-run when any sample table field was edited. Now it only re-runs when the “Samples” name field changes.
- The “Annotate Overlapping Genes” algorithm has been enhanced to support the match mode of features that are “Within” or “Overlap” the current record. This is in addition to the current matching based on a similarity (Jaccard) index.
- The “Annotate Overlapping Regions” and “Annotate Overlapping CNVs” algorithms now have the additional options of annotating records that are completely “Within” another annotation record or annotating records that completely “Contain” other annotation records.
- The CNV gene annotation algorithm, when run on a large number of CNVs, would produce an “Algorithm Error”, reporting that it “exceeds the maximum number of unique strings”. The algorithm has been updated to have the correct type for the “Region (Clinically Relevant)” field and to no longer produce this error.
- When adding a CNV annotation algorithm through Add > Computed Data, then selecting Annotate Overlapping CNVs (or Regions) from the CNV table, afterwards selecting a warehouse source, the software would report that only local sources are supported. This has been updated to support setting parameters for annotating warehouse-based catalogs.
- The sample sex detection for the CNV caller has been enhanced to use the heuristic of a normalized Y mean depth ratio of < 0.05 for females.
- The Target CNV Caller dialog has new “Advanced” parameters:
  - The checkbox “Signal Scaling for High Percent Difference” allows manually enabling or disabling the normalization scaling performed on samples with a high percent difference to the reference samples.
  - The new parameter “Percent of Targets to Force Aneuploid Call” forces an event to be called a whole chromosome event if the percentage of targets in a given state exceeds the selected percentage (95% default). In other words, if the percentage of deleted or duplicated targets in a chromosome exceeds the selected threshold, then the entire chromosome will be called as an aneuploid deletion or duplication event.
  - The new parameter “Sample Type” can be set to either “Auto”, “Gene Panel”, or “Exome” for inferring the sample type. If the sample type is “Exome”, the algorithm is permitted to call whole chromosome events and report these events in the Karyotype output at the sample level as being whole chromosome events. You may force this option if you desire to call whole chromosome aneuploidy events, especially in the X or Y chromosomes.
  - The option “Subset normalization targets in exomes to those containing het variants” will cause the algorithm to only use targets containing variants with a heterozygous VAF when computing the mean coverage for normalization. This is recommended only for highly mutated tumor samples. In previous releases this action always took place for “exome” sized target lists, but now must be turned on explicitly as an option as most samples do not benefit from this normalization technique.
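
The “Within”, “Overlap”, and “Contain” match modes and the Jaccard similarity index mentioned above for the overlap annotation algorithms can be illustrated with a minimal Python sketch. This is not the VS-CNV implementation: the function names, the half-open coordinate convention, the direction of “within” and “contain” (taken here from the point of view of the current record), and the `min_jaccard` cutoff are all assumptions made for illustration.

```python
from typing import Tuple

# Hypothetical representation: half-open [start, end) on a single chromosome.
Interval = Tuple[int, int]


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of bases shared by a and b (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def jaccard(a: Interval, b: Interval) -> float:
    """Shared bases divided by the bases covered by either interval."""
    shared = overlap_length(a, b)
    union = (a[1] - a[0]) + (b[1] - b[0]) - shared
    return shared / union if union > 0 else 0.0


def matches(record: Interval, feature: Interval, mode: str,
            min_jaccard: float = 0.5) -> bool:
    """Decide whether an annotation feature matches the current record."""
    if mode == "overlap":   # any shared base
        return overlap_length(record, feature) > 0
    if mode == "within":    # assumed: record lies completely inside the feature
        return feature[0] <= record[0] and record[1] <= feature[1]
    if mode == "contain":   # assumed: record completely covers the feature
        return record[0] <= feature[0] and feature[1] <= record[1]
    if mode == "jaccard":   # similarity-based matching with an assumed cutoff
        return jaccard(record, feature) >= min_jaccard
    raise ValueError(f"unknown match mode: {mode}")


cnv = (1000, 5000)
gene = (2000, 3000)
print(matches(cnv, gene, "overlap"))  # True
print(matches(cnv, gene, "within"))   # False
print(matches(cnv, gene, "contain"))  # True
print(jaccard(cnv, gene))             # 0.25
```

A small gene inside a large CNV has a low Jaccard index even though it is fully covered, which is one reason explicit “Within” and “Contain” modes are useful next to similarity-based matching.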
VSClinical ACMG
- As noted above, the VSClinical ACMG Guidelines workflow now has an additional CNV interpretation framework based on the ACMG/ClinGen guidelines. This product supports interpreting CNVs detected with VS-CNV or imported CNVs alongside variants and requires both a VSClinical ACMG license and a CNV license.
- The existing ACMG variant scoring workflow has been reorganized into a single view under the Variants tab.
- The new ACMG CNV scoring workflow has been created under the CNV tab.
- The Gene tab now contains coverage statistics at the target level as well as the ability to provide a gene list for summary statistics and reporting.
- The Phenotype tab allows selecting phenotypes and disorders for the patient as well as entering patient notes that result in automatically extracted terms.
- The Reports tab summarizes all reportable information and provides the ability to render that information into a customizable Word and PDF report output.
- The management of annotation sources used by the ACMG (and AMP) guidelines has been updated to allow selecting specific versions to be used in the evaluation process. The versions selected when creating an evaluation will be locked. This means that regardless of what new or updated sources are downloaded, an evaluation will continue to use and display the versions of sources selected on creation. The ACMG workflow allows these versions to be locked to the project template so all new evaluations are created with a specific version list. The name and version date of each source is available to display in the Word-based report system.
- The VSClinical ACMG catalog schema has been expanded with fields “Omim ID”, “Mondo ID”, “Report Section”, “Criteria Comments”, “Citations Data”, “Interpretation Citations”, “Exon Number” and “Interpretation Notes”. There is a useful button in the ACMG configuration dialog to upgrade your catalog to include these fields. Once upgraded, all newly saved interpretations will include data in these fields from the evaluation.
- The population catalogs using the scoring criteria BA1, BS1, and PM2 can now be configured in the options dialog under the “ACMG Frequency Sources” tab. The catalogs for controls (BS2) can be controlled under the “ACMG Control Sources” tab.
- The PM2 criterion recommendation is now based on a very low allele frequency instead of an absolute total allele count to be more robust to population catalogs with small sample sizes. In rare cases, the previous behavior resulted in both PM2 and BA1 being recommended. These settings are tunable, but by default the PM2 threshold is less than 0.02% for recessive genes and less than 0.01% for dominant genes. The threshold is evaluated on the sub-population with the largest allele frequency that meets the specified minimum allele count.
- The ACMG guidelines have been updated to incorporate the Recommendations for interpreting the loss of function PVS1 ACMG/AMP variant criterion. This allows PVS1 to be recommended at lower strength levels (Strong, Moderate, Supporting) based on LoF variants with less definite evidence of pathogenicity.
- The Gene Constraints track used by VSClinical has been switched from ExAC to gnomAD. You can expect the constraint values used in PVS1 and PP2 to evaluate the tolerance of loss-of-function variants and missense variants (respectively) to be different. Additionally, the LoF value now being used is based on the computations of the observed/expected ratio (O/E) and specifically the upper bound of the 90% CI computed on this value as suggested by the gnomAD team in the gnomAD v2.1 blog post.
- A special consideration warning is now given if “PVS1” and “PP3” are both scored.
- The ACMG criteria have had the evidence strength modifiers expanded to include newly published papers’ recommendations.
- Variants reported in the Gene Region and Mutation Profile table for forward-strand genes are now displayed in their right-aligned position. In most cases, this should match the HGVS descriptions reported by ClinVar for these variants.
- Now, adding the ACMG Classifier algorithm will always use the existing project’s version of the gene track for nearby variant criteria in order to stay consistent with the current project. Previously, the algorithm always selected the most recent locally downloaded gene track version.
- The ACMG classifier algorithm now ensures that previous classifications are detected and counted even when the transcript name has changed, such as when a gene track has been updated.
- The ACMG classifier algorithm now matches any gene annotation algorithm in the project to define the splice region variant for the purpose of the variant sequence ontology.
- Comments that were added to different classification sections in VSClinical were not being correctly added to the variant interpretation when selecting the “Add Text to My Interpretation” option. This has now been fixed.
- Comments that were written in the “Criteria List Table” were not being saved to the evaluation. This has now been fixed.
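
The frequency-based PM2 recommendation described above can be sketched as follows. This is an illustrative approximation only, not the VSClinical scoring code. The default thresholds (0.02% recessive, 0.01% dominant) come from the note above; the data layout, the function names, the reading of “minimum allele count” as the alternate allele count, the default value of `min_allele_count`, and the handling of variants where no sub-population qualifies are all assumptions.

```python
from typing import Dict, Optional, Tuple

# Default thresholds from the release note, expressed as fractions.
PM2_THRESHOLDS = {"recessive": 0.0002, "dominant": 0.0001}


def max_subpopulation_af(counts: Dict[str, Tuple[int, int]],
                         min_allele_count: int) -> Optional[float]:
    """Largest allele frequency among sub-populations that qualify.

    counts maps a sub-population name to (alternate allele count,
    total allele number). A sub-population qualifies when its alternate
    allele count reaches min_allele_count (assumed interpretation).
    """
    afs = [ac / an for ac, an in counts.values()
           if an > 0 and ac >= min_allele_count]
    return max(afs) if afs else None


def recommend_pm2(counts: Dict[str, Tuple[int, int]], inheritance: str,
                  min_allele_count: int = 2) -> bool:
    """Recommend PM2 when the qualifying frequency is below the threshold."""
    threshold = PM2_THRESHOLDS[inheritance]
    af = max_subpopulation_af(counts, min_allele_count)
    # Assumption: with no qualifying sub-population the variant is treated
    # as rare enough for PM2.
    return af is None or af < threshold


# Hypothetical counts: a single observation in a tiny sub-population does
# not drive the decision because it misses the minimum allele count.
counts = {"afr": (3, 20000), "nfe": (1, 100)}
print(recommend_pm2(counts, "recessive"))  # True  (0.015% < 0.02%)
print(recommend_pm2(counts, "dominant"))   # False (0.015% >= 0.01%)
```

Evaluating a frequency rather than a raw count avoids the situation where a small catalog makes a common variant look rare, which is consistent with the motivation given above.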
VSClinical AMP
- The AMP workflow has now adopted the concept of work being done in “Evaluations” similar to the VSClinical ACMG workflow. This allows for the creation of multiple evaluations per sample, and also allows for wiping the slate clean by deleting an evaluation. Existing projects are upgraded to show a single evaluation where work was done for a sample.
- The AMP workflow report templates have been updated to use the new customizable “filters” technique also used in the ACMG workflow. This means the input data has been re-organized to be less redundant while also providing more details.
  Note: Due to these changes, Word report templates based on the previous version may not fully render. We suggest you re-create custom reports starting with the two default report templates. Please contact support if you have a more complex custom template and we will be happy to assist in transitioning it to the new system.
- A “Somatic Catalog” was added to the AMP workflow. This allows saving and auto-filling previously saved oncogenic scoring and interpretations at the variant level for somatic variants. This is the equivalent of the “Classified Germline” catalog for the ACMG workflow and germline variants in the AMP workflow. You can save/load variants into this catalog in the “Variants” tab, while still using the “Cancer Interpretations” catalog in the “Biomarkers” tab.
- The updated management of annotation source versions now used in VSClinical is used in the AMP guidelines as well. See the note in the ACMG section for more details.
- A new Genes tab is available with the details and plots about the target region coverage. You can now provide virtual gene panel lists at the sample or project level to report coverage details.
- The cancer tumor type options have been expanded to have a top-level type for each tissue type. For example, “Lung Carcinoma” can now be selected instead of just one of its sub-types.
- When scoring somatic variants, the cancer hotspot criterion is now available to manually score for any variant type, not just missense and in-frame indels. The auto-scoring system will continue to only recommend the hotspot criterion for missense changes.
- When searching for clinical trials, there is now a filter option for trials that contain specific Diseases in their summary.
- Manually adding germline variants to the AMP Guidelines was producing an error. It is now supported again.
- There was an SSL handshake error on Windows when using the cloud convert Word to PDF feature. This has now been fixed.
VSReports
- APIs that allow for reading input data from other samples have been added to VSReports.
- VSReports now allows the creating of reports in projects that contain only variant sites without associated samples.
- VSReports now allows selecting record sets for Coverage and CNV sections from any of the tables in the project. Previously only a single table of a given type was supported. This allows, for example, reports that may select failed target regions from different coverage tables.
VSPipeline
- VSPipeline now has an “update_cnv_import” command that allows the setting of input files for importing external CNVs into the project from VCFs.
VarSeq Annotations and Algorithms
- Latest Assessment: A new algorithm has been implemented, “Latest Variant Sample Assessments”, which annotates the variant table with the latest assessments from a selected variant assessment catalog.
- Matching Sample: A new algorithm has been implemented, “Annotate Variants Matching Current Sample”, which annotates the variant table with the assessments from a selected assessment catalog that have the current sample name in a specified “Samples” field.
- Transcripts: The novel splice site predictions in the transcript annotation algorithm now have an “N of 4 Changed To Predicted Splicing” field to improve filtering options for interesting novel splice sites.
- Transcripts: Previously the novel splice prediction algorithm reported any change in splicing predictions, but now it reports only changes from not-splicing to splicing for a novel splice site.
- Transcripts: For multi-allelic variants, it is possible that each alternate allele has a different sequence ontology and impact on the overlapping transcript. The “(Clinically Relevant)” output fields, including the HGVS descriptions, have been updated to prioritize reporting the allele with the highest impact on the transcript.
- Transcripts: The “Distance to Splice Site (Clinically Relevant)” field in the transcript annotation algorithm has been updated to account for all combinations of a variant being before or after the splice site and on reverse strands. For clarity, this distance calculation is now the number of bases that would be required to move the variant to overlap the canonical splice site. This is generally one greater than the previously reported number.
  Note: The values of the “N of 4 Predicted Splice Disrupted” output may also change as a result of this fix. We expect the change to be in the direction of an additional algorithm predicting the variant as splice disrupting.
- Transcripts: The HGVS format for frame-shifts that introduce stop-gains has been changed to not include the “fs” suffix. Thus, for instance, we have “p.Leu103*” instead of “p.Leu103*fs” (as preferred by ClinVar). This matches the format expected by other tools such as Mutalyzer.
- Target Region Coverage: The option to “Collapse overlapping regions” was added to the coverage statistics algorithm so that, if this option is selected, target regions that overlap do not get counted more than once.
- Target Region Coverage and Binned Region Coverage: These algorithms now allow users to optionally include multiple additional depth threshold calculations outside of the default 1x, 20x, 100x, and 500x options.
- Gene List: The Match Genes List Per Sample algorithm was creating an error if there were certain missing fields in the input source. This has been changed to run even if some fields are missing.
- Phenotype Linked Genes: The Match Genes Linked to Phenotypes algorithm will now show the linked genes as well as the selected HPO terms used in the project Log.
- PhoRank: The Human Phenotype Ontology and Gene Ontology tracks used by PhoRank and other phenotype-based features have been updated to the latest versions.
- PhoRank: The PhoRank algorithm was previously producing ‘nan’ values for the score when all of the genes involved receive a score of zero. They are now filled in with ‘?’ values.
- PhoRank: Running the Variant PhoRank Gene Ranking algorithm with HPO terms did not recognize the HPO terms if “Enhance with OMIM Phenotypes” was selected. This has now been fixed.
- Mendel Error: The Mendel Error algorithm now outputs the fields “Inherited from Mother Count” and “Inherited from Father Count” to the sample table.
- Import: There is now improved support for importing CNVs in VCFs from more external callers such as CANVAS.
- Import: The import system used to add additional sample level fields was made robust to detect and handle non-UTF-8 text encoded using the Latin-1 codec in addition to continuing to support the default UTF-8 Unicode.
- Import: The way in which multi-allelic variants have their alternate alleles represented on import was dependent on file list order (if different files used different allele orderings), but this has been changed to be file order agnostic by forcing all multi-allelic variants to be in lexicographic order (A/C/G/T).
- Import: The “Match Variants to Affected Individuals’ Genotypes” import option had an out-of-bounds error with the trio data when samples were excluded on import. This has been resolved to allow sample subsetting and variant matching on import.
- Import: When importing files to VarSeq, the default import selections were not preserved if the Advanced Options checkbox was checked and a project template was being used. This has now been fixed.
- Import: Importing multiple CNV files for different samples was not correctly mapping CNVs to the assigned sample. This has now been fixed.
- Import: The 23andMe import tool has been updated to set the genotypes for insertions/deletions to missing, as the provided data in their text file does not inform genotypes.
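
The effect of the “Collapse overlapping regions” option described above for Target Region Coverage can be illustrated with a short interval-merging sketch. It is a generic approach written for this note and is not taken from the VarSeq coverage algorithm; the `Region` tuple layout, the half-open coordinates, and the function names are assumptions.

```python
from typing import List, Tuple

# Hypothetical layout: (chromosome, start, end) with half-open coordinates.
Region = Tuple[str, int, int]


def collapse_regions(regions: List[Region]) -> List[Region]:
    """Merge overlapping target regions so no base is counted twice."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or directly abuts) the previous region: extend it.
            # Merging abutting regions does not change the base count.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_bases(regions: List[Region]) -> int:
    """Sum of region lengths."""
    return sum(end - start for _, start, end in regions)


targets = [("1", 100, 200), ("1", 150, 300), ("2", 50, 80)]
collapsed = collapse_regions(targets)
print(collapsed)               # [('1', 100, 300), ('2', 50, 80)]
print(total_bases(targets))    # 280 (bases 150-200 counted twice)
print(total_bases(collapsed))  # 230
```

Without collapsing, the 50 shared bases on chromosome 1 contribute twice to any per-base coverage summary, which is the double counting the option is meant to avoid.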
VarSeq Projects and General
- The shipped gene tracks have been updated:
  - GRCh37: RefSeq Genes 105.20190906 v2, NCBI, with the 0.91 release of MANE transcript status.
  - GRCh38: RefSeq Genes 109.20200815 v1, NCBI, with the 0.91 release of MANE transcript status.
- Network: The following timeouts have been increased:
  - Loading remote source info: from 5 seconds to 1 minute.
  - Querying remote annotation source: from 1 minute to 3 minutes.
- Sources: In a multi-user or clinical environment, it may make sense to depend on the latest downloaded sources, rather than the latest versions published to the public and secure annotation sources, when adding new algorithms or using VSClinical. This fundamental behavior can be changed for all users launching VarSeq by adding a “hosts.json” file to the install directory. This file also allows setting the default VSWarehouse URL so that each user does not need to configure that internal property the first time they run VarSeq. See Installing and Initializing for more details.
- Sources: VarSeq now prompts when a requested download action will create a second copy of an annotation source. This prevents the problem of a shared annotations folder collecting multiple fully downloaded copies of updated annotation sources.
- Projects: The shipped templates now have the common annotation sources unlocked so that the most recent versions of these annotations will be used automatically.
- Projects: Certain complex templates would run to completion but not display the per-sample data in the coverage table. The visibility of these fields has been resolved.
- Projects: When using a saved project template which used LiftOver in the import process, changes to the LiftOver options on new projects were not recognized. New projects can now change the assembly of input files on project templates saved with LiftOver enabled.
- Projects: The project loading performance was improved for projects with multiple overlapping filters and algorithms.
- Projects: On the latest Mac and Linux operating systems, complex projects could crash due to low default resource limits applied to applications. VarSeq now raises the allowed file handles and will no longer crash due to this cause.
- Annotation Sources: You will now be prompted if you select an annotation source to download that is already present in your annotations folder or is in the process of being downloaded. This situation may happen when you are using a shared annotation folder or you are running multiple instances of VarSeq.
- Tables: In VarSeq tables, the Change Column Visibility dialog now makes it easier to disable a group and not lose the state of an individual column’s visibility.
- GenomeBrowse: On GRCh38, the X and Y chromosomes would only display if there was plotted data present in them. This has been updated to always display these chromosomes, which matches the behavior for GRCh37.
- GenomeBrowse: The handling of gene tracks with malformed coordinate systems has been improved.
- GenomeBrowse: Plotting certain tracks that used chromosome identifiers such as NC_000001.10 failed to display the zoomed-out coverage. This was an impact from recent versions of dbSNP. These recent versions of dbSNP are now supported.
- GenomeBrowse: In rare circumstances, closing the GenomeBrowse view would cause the program to crash. Changes were made to the cleanup strategy to prevent further crashes upon closing the GenomeBrowse view.
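
For readers curious about the file handle limit mentioned above for Mac and Linux, the general technique of raising a process's open-file limit can be sketched with Python's standard `resource` module (Unix only). This is a generic illustration and not how VarSeq itself is implemented; the function name and the `target` value of 4096 are assumptions chosen for the example.

```python
import resource


def raise_open_file_limit(target: int = 4096) -> int:
    """Raise the soft limit on open file handles toward `target`.

    Returns the soft limit in effect afterwards. The soft limit can only
    be raised up to the hard limit without elevated privileges.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do
    # Cap the request at the hard limit unless the hard limit is unlimited.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft >= new_soft:
        return soft  # current limit is already at least as high
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


limit = raise_open_file_limit()
print("open file limit:", limit)
```

Default soft limits on some systems are low enough that a project holding many annotation and data files open at once can exhaust them, so an application raising its own soft limit at startup is a common remedy.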