The Breakend Catalog Format for VarSeq 3

July 8, 2025

New to VarSeq 3 is support for breakend catalogs. Breakends represent the junction points of structural variants, which include complex genomic rearrangements like translocations, inversions, and large deletions that play crucial roles in cancer genomics and rare disease analysis. Unlike other catalogs, a breakend catalog must store two genomic positions for each record, one for each side of the structural variant junction, so this record type needs its own catalog format. We took this opportunity to reflect on which fields are required to save each breakend record, with the goal of simplifying interpretation and enabling a quick visual understanding of the breakend rearrangements.

The Current Challenge with Structural Variant Representation

Structural variants, particularly breakends representing complex rearrangements like translocations and inversions, play crucial roles in both cancer genomics and germline disease analysis. However, the two splice sites can be combined in several ways, and interpreting them requires some mental gymnastics. On top of that, the same event often has more than one correct representation (for example, a CNV deletion can also be represented as breakends), which adds further complexity at interpretation time. The VarSeq import process helps by providing options to convert between CNVs and BNDs on import, allowing users to work with their preferred representation. While the VCF standard provides a framework for representing breakends, it was designed to capture all the details observed when calling a structural variant rather than for human consumption.

The challenge in establishing a standard for storing structural variants in catalogs, which track population frequency and previous clinical interpretations, is to pick a representation that is accurate and also supports an intuitive mental picture of these complex rearrangements. Traditional approaches often focus on the technical details of strand orientation and mate pair information, which can obscure the fundamental question: which genomic regions are actually joined together in the resulting fusion? Our breakend visualization tool highlighted this interpretive challenge.

This complexity led us to rethink how breakends should be stored and displayed in VarSeq 3. Rather than simply importing the technical representation from VCF files, we developed a more intuitive catalog format that emphasizes which genomic regions are retained and joined, making it easier for researchers and clinicians to quickly understand the functional impact of these complex rearrangements.

The new fields for the breakend catalogs are:

  • Chr – The chromosome of the described end of the event.
  • Start – The position of the described end of the event.
  • Stop – Matches the start position, as breakend positions are described as two zero-width locations representing the junction between two genomic segments. This field is included in the catalog for completeness and so that the track can be read as a genomic position (chr:start-stop).
  • Retained – Indicates whether the region before or after the genomic position was retained in the resulting breakend (in genomic space). Possible values are ‘Before’ or ‘After’.
  • Insertion – The inserted sequence between the two breakpoints, if present.

Beyond these core fields, further fields may be added to provide supplemental information. These can be used to support annotations as well as more complex interpretation workflows.
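
As a rough illustration of the field list above, here is a minimal Python sketch of one catalog record. The class names (BreakendEnd, BreakendRecord), the grouping of the fields once per end, and the "extra" dictionary are assumptions made for this example and are not the VarSeq implementation. The positions are copied straight from the VCF example later in this post, which is also an assumption about the coordinate convention.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class BreakendEnd:
    """One side of the junction (hypothetical structure)."""
    chr: str
    start: int
    retained: str  # 'Before' or 'After', in genomic space

    def __post_init__(self):
        if self.retained not in ("Before", "After"):
            raise ValueError(f"Retained must be 'Before' or 'After', got {self.retained!r}")

    @property
    def stop(self) -> int:
        # Breakend positions are zero-width, so Stop matches Start.
        return self.start

    def region(self) -> str:
        # The chr:start-stop form mentioned in the Stop field description.
        return f"{self.chr}:{self.start}-{self.stop}"


@dataclass
class BreakendRecord:
    """A catalog record holding both ends of one junction (hypothetical)."""
    first: BreakendEnd
    second: BreakendEnd
    insertion: str = ""  # inserted sequence between the breakpoints, if present
    extra: Dict[str, str] = field(default_factory=dict)  # supplemental fields


# Variant #1 from the example below; positions assumed to be taken as-is from VCF POS.
record = BreakendRecord(
    first=BreakendEnd("2", 321681, "Before"),
    second=BreakendEnd("17", 198982, "Before"),
)
print(record.first.region(), record.first.retained)
print(record.second.region(), record.second.retained)
```

Keeping Stop as a derived property rather than a stored value mirrors the description above: it always equals Start and exists only so the end can be read as a region.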

Example Breakends

We can look at how the breakends from the VCF spec would be handled. These entries show the different possible configurations of a breakend with regard to the strand preserved on each side of the breakpoint. The three variants in the example, each written as a pair of mated records, look like this in VCF form:

##fileformat=VCFv4.2
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=MATEID,Number=1,Type=String,Description="Breakend mate">
##reference=file:///mnt/data/files/human_38.fasta
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA19238
2	321681	bnd_W	G	G]17:198982]	6	PASS	MATEID=bnd_Y;SVTYPE=BND	GT	1/1
2	321682	bnd_V	T	]13:123456]T	6	PASS	MATEID=bnd_U;SVTYPE=BND	GT	1/1
13	123456	bnd_U	C	C[2:321682[	6	PASS	MATEID=bnd_V;SVTYPE=BND	GT	1/1
13	123457	bnd_X	A	[17:198983[A	6	PASS	MATEID=bnd_Z;SVTYPE=BND	GT	1/1
17	198982	bnd_Y	A	A]2:321681]	6	PASS	MATEID=bnd_W;SVTYPE=BND	GT	1/1
17	198983	bnd_Z	C	[13:123457[C	6	PASS	MATEID=bnd_X;SVTYPE=BND	GT	1/1
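
The six records above describe three junctions, because each junction is written once from each side and linked through MATEID. The following Python sketch shows one way to pair them up. The RECORDS list is a hand-copied subset of the columns above, and the helper names mate_id and pair_mates are made up for this illustration.

```python
# (CHROM, POS, ID, ALT, INFO) copied from the six example records.
RECORDS = [
    ("2", 321681, "bnd_W", "G]17:198982]", "MATEID=bnd_Y;SVTYPE=BND"),
    ("2", 321682, "bnd_V", "]13:123456]T", "MATEID=bnd_U;SVTYPE=BND"),
    ("13", 123456, "bnd_U", "C[2:321682[", "MATEID=bnd_V;SVTYPE=BND"),
    ("13", 123457, "bnd_X", "[17:198983[A", "MATEID=bnd_Z;SVTYPE=BND"),
    ("17", 198982, "bnd_Y", "A]2:321681]", "MATEID=bnd_W;SVTYPE=BND"),
    ("17", 198983, "bnd_Z", "[13:123457[C", "MATEID=bnd_X;SVTYPE=BND"),
]


def mate_id(info):
    """Return the MATEID value from an INFO string, or None if absent."""
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "MATEID":
            return value
    return None


def pair_mates(records):
    """Group records into (record, mate) pairs, in order of first appearance."""
    by_id = {rec[2]: rec for rec in records}
    seen = set()
    pairs = []
    for rec in records:
        rec_id, mate = rec[2], mate_id(rec[4])
        if rec_id in seen or mate not in by_id:
            continue  # already paired, or mate missing from the file
        seen.update({rec_id, mate})
        pairs.append((rec, by_id[mate]))
    return pairs


for rec, mate in pair_mates(RECORDS):
    print(f"{rec[2]} ({rec[0]}:{rec[1]}) <-> {mate[2]} ({mate[0]}:{mate[1]})")
```

Records whose mate is not present are skipped here for simplicity; a real importer would need to decide how to handle them.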

If you want to test out these variants yourself, you can download the file here:

Variant #1

Pasting the first fusion into our handy tool from the VCF visualization blog post, we get a fusion that looks like this:

In VarSeq this is the first row imported into the table, with a ‘Left’ orientation and a ‘Reverse’ mate strand. This is because the ‘Left’ region is preserved and the joined region is on the reverse strand. With the new nomenclature, the Retained values are ‘Before’ and ‘Before’, since the region before each position is retained in the resulting fusion.

Variant #2

The second fusion looks like this:

In VarSeq this is a ‘Left’ oriented breakend with the joined region on the forward strand. To convert this to the new field values, we just need to record which side of each breakpoint is preserved. In this case, the region ‘After’ the chr 2 position is attached to the region ‘Before’ the chr 13 position.

Variant #3

The final fusion looks like this:

In VarSeq this is a ‘Right’ oriented breakend with the mate on the ‘Reverse’ strand. When we add this to the catalog, we have ‘After’ for both positions, since the region after each break is preserved.
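
The three conversions above follow a simple pattern in the VCF ALT string, which can be sketched in Python. This is an illustration based on the four ALT forms in the VCF breakend notation (t[p[, t]p], ]p]t, [p[t), not the VarSeq import code, and the function name retained_sides is hypothetical. The rule it encodes: if the bases come first, the region before the record's own position is retained, otherwise the region after it; a ‘]’ bracket means the region before the mate position is retained, and a ‘[’ bracket means the region after it.

```python
import re

# Matches the four breakend ALT forms: t[p[  t]p]  ]p]t  [p[t
_BND_ALT = re.compile(
    r"^(?P<lead>[ACGTNacgtn]*)"
    r"(?P<bracket>[\[\]])(?P<chrom>[^\[\]]+):(?P<pos>\d+)(?P=bracket)"
    r"(?P<trail>[ACGTNacgtn]*)$"
)


def retained_sides(alt):
    """Return (retained at this record's position, retained at the mate position)."""
    m = _BND_ALT.match(alt)
    # Exactly one of the leading/trailing base strings must be present.
    if m is None or bool(m["lead"]) == bool(m["trail"]):
        raise ValueError(f"not a breakend ALT: {alt!r}")
    local = "Before" if m["lead"] else "After"
    mate = "Before" if m["bracket"] == "]" else "After"
    return local, mate


print(retained_sides("G]17:198982]"))   # Variant #1
print(retained_sides("]13:123456]T"))   # Variant #2
print(retained_sides("[17:198983[A"))   # Variant #3
```

For the three example variants this gives Before/Before, After/Before, and After/After, matching the walkthrough above. Any inserted sequence carried in the ALT bases is ignored in this sketch.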

Conclusion

The new breakend catalog format in VarSeq 3 represents a significant step forward in structural variant analysis, providing researchers and clinicians with a streamlined approach to storing, annotating, and interpreting complex genomic rearrangements. By simplifying the field requirements while maintaining comprehensive genomic position information, these catalogs enable more efficient identification of clinically relevant breakends and potential sequencing artifacts. We also hope the new format makes it easy to picture the breakpoint being interpreted.

If you have questions about implementing breakend catalogs in your VarSeq 3 workflows or need assistance with structural variant analysis, please don’t hesitate to contact our team at [email protected]. We’re here to help you get the most out of these powerful new features.
