Enhanced Transcript Annotation: HGVS Repeat Notation Support in VarSeq

· Nathan Fortier · About Golden Helix, Bioinformatic Support

At Golden Helix, we’ve supported the HGVS variant nomenclature standard since the very first release of VarSeq. The HGVS standard provides a consistent and precise language for describing sequence variants in the context of gene transcripts and is a critical part of clinical genomics workflows.

As with all living standards, HGVS continues to evolve, refining how we describe increasingly complex genomic events. Along the way, we’ve kept pace by extending VarSeq’s Annotate Transcripts algorithm to handle new edge cases and provide users with fine-grained control over how their HGVS annotations are computed and displayed.

Some users adopt the latest recommendations from HGVS, while others prefer more stable community practices aligned with clinical reporting pipelines. We support both approaches by offering customizable options, so you can strike the right balance for your workflow. Today, we’re excited to announce support for one of the more recent additions to the HGVS standard: repeat notation.

Why Repeat Notation Matters

During NGS analysis, it is common to encounter variants in repeat regions, where a short nucleotide motif is repeated multiple times. Previously, deletions or duplications in these repeat regions could only be annotated using standard HGVS notation, without regard for the surrounding repeated sequence. This can be problematic when analyzing clinically significant short tandem repeats (STRs).

A compelling example is the HTT gene, where repeat variation has direct clinical implications for Huntington disease. The HTT gene contains a repeated motif in exon 1, and the number of repeats in this region varies significantly across the population. Importantly, the number of repeats in the reference sequence is somewhat arbitrary. The GRCh38 reference sequence contains 19 copies of the repeat motif, but individuals in the general population may carry anywhere between 6 and 26 copies, a range that is considered benign. Pathogenic expansions typically involve more than 40 repeats.

Consider the variant 4:3074877 -/CAGCAGCAGCAG:

This variant falls within the HTT repeat region and inserts four additional copies of the repeated motif. Using the default notation, this variant would be annotated as c.110_111insGCAGCAGCAGCA. This notation indicates an insertion of 12 nucleotides between positions 110 and 111, but it tells us nothing about how many total copies of the motif are present in the sample.

Using the new repeat notation feature, VarSeq will instead annotate this variant as c.54GCA[23]. This alternate representation tells us that, starting at position 54, the clinically relevant repeat motif “GCA” occurs exactly 23 times in the sample: the 19 copies in the reference plus the four inserted copies. This representation makes it instantly clear that the sample falls within the benign range, providing more clinically relevant information than the traditional insertion notation.
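The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration and not VarSeq's implementation: the sequence is a toy stand-in (placeholder bases around 19 copies of the motif, not real HTT transcript sequence), and the helper name count_tandem_copies is hypothetical. The position, motif, reference copy count, and benign range are the ones quoted in this article.

```python
def count_tandem_copies(sequence: str, start: int, motif: str) -> int:
    """Count back-to-back copies of `motif` beginning at 0-based index `start`."""
    copies = 0
    pos = start
    while sequence.startswith(motif, pos):
        copies += 1
        pos += len(motif)
    return copies


MOTIF = "GCA"
REPEAT_START = 54        # 1-based c. position of the first repeat base (from the article)
REFERENCE_COPIES = 19    # copies in the reference (from the article)

# Assumption: toy transcript, NOT real HTT sequence. 53 placeholder bases,
# then the repeat tract (c.54 through c.110), then a short placeholder tail.
toy_reference = "T" * (REPEAT_START - 1) + MOTIF * REFERENCE_COPIES + "TTT"

# c.110_111insGCAGCAGCAGCA: insert 12 bases between positions 110 and 111.
inserted = "GCAGCAGCAGCA"
toy_sample = toy_reference[:110] + inserted + toy_reference[110:]

reference_copies = count_tandem_copies(toy_reference, REPEAT_START - 1, MOTIF)
sample_copies = count_tandem_copies(toy_sample, REPEAT_START - 1, MOTIF)

# Build the repeat-style description from the pieces.
notation = f"c.{REPEAT_START}{MOTIF}[{sample_copies}]"

# Range quoted in the article as benign: 6 to 26 copies.
in_benign_range = 6 <= sample_copies <= 26

print(reference_copies, sample_copies, notation, in_benign_range)
```

The same 12-base insertion that looks opaque as c.110_111insGCAGCAGCAGCA becomes a copy count that can be compared directly against the ranges discussed above.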

Syntax of HGVS Repeat Notation

The HGVS repeat notation follows the syntax structure shown below:

prefix + position_repeat_start + repeat_sequence + [ copy_number ]

Where:

  • prefix defines the transcript and notation type (e.g., NM_006172.4:c.)
  • position_repeat_start indicates the starting position of the repeat
  • repeat_sequence is the actual nucleotide sequence that constitutes the repeat unit
  • copy_number is the number of times the repeat motif is present in the sample

This standardized format ensures consistency across different analysis workflows while providing clear, interpretable information about repeat-altering variants.
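To make the four components concrete, here is a minimal Python sketch that splits a repeat-notation string into the fields listed above and reassembles it. It is an illustration under simplifying assumptions, not a full HGVS parser: it handles only the c. notation type with plain positive integer positions and A/C/G/T motifs, and the names RepeatAnnotation, parse_repeat, and format_repeat are hypothetical.

```python
import re
from typing import NamedTuple


class RepeatAnnotation(NamedTuple):
    prefix: str                 # e.g. "c." or "<transcript>:c."
    position_repeat_start: int  # position of the first repeat base
    repeat_sequence: str        # the repeat unit, e.g. "GCA"
    copy_number: int            # copies present in the sample


# Assumption: coding (c.) descriptions with simple integer positions only.
_REPEAT_PATTERN = re.compile(
    r"^(?P<prefix>(?:[^:\s]+:)?c\.)"  # optional "transcript:" then "c."
    r"(?P<start>\d+)"                 # position_repeat_start
    r"(?P<seq>[ACGT]+)"               # repeat_sequence
    r"\[(?P<copies>\d+)\]$"           # [copy_number]
)


def parse_repeat(text: str) -> RepeatAnnotation:
    """Split a repeat-notation string into its four components."""
    match = _REPEAT_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a supported repeat description: {text!r}")
    return RepeatAnnotation(
        prefix=match["prefix"],
        position_repeat_start=int(match["start"]),
        repeat_sequence=match["seq"],
        copy_number=int(match["copies"]),
    )


def format_repeat(annotation: RepeatAnnotation) -> str:
    """Reassemble the components in the order given by the syntax."""
    return (
        f"{annotation.prefix}{annotation.position_repeat_start}"
        f"{annotation.repeat_sequence}[{annotation.copy_number}]"
    )


parsed = parse_repeat("c.54GCA[23]")
print(parsed)
print(format_repeat(parsed))
```

A conventional insertion description such as c.110_111insGCAGCAGCAGCA does not match this pattern, so parse_repeat raises ValueError for it rather than guessing at a copy number.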

How to Use This Feature in VarSeq

To take advantage of this new feature:

  1. From VarSeq, select Add → Computed Data
  2. Under the Project/Cohort folder, select the Annotate Transcripts algorithm
  3. Select your desired gene track
  4. Open the Output Options tab
  5. Click the Enabled option next to Repeat Notation

Conclusion

This new feature reflects our ongoing commitment to supporting high-quality, standards-compliant transcript annotation. Whether you’re analyzing exomes, panels, or whole genomes, VarSeq’s support for flexible and accurate HGVS repeat notation empowers you to communicate results with greater precision. If you’re interested in learning more about VarSeq’s powerful annotation capabilities, please reach out to our team.

About Nathan Fortier

Nathan Fortier, Ph.D., Director of Research for Golden Helix, joined the development team in June of 2014. Nathan obtained his Bachelor’s degree in Software Engineering from Montana Tech University in May 2011, received a Master’s degree in Computer Science from Montana State University in May 2014, and received his Ph.D. in Computer Science from Montana State University in May 2015. Nathan works on data curation, script development, and product code. When not working, Nathan enjoys hiking and playing music.
