Real-World Applications of VarSeq in Genetic Research

March 25, 2025

Genomic research is advancing at an unprecedented pace, and with it comes the need for powerful tools that can turn raw sequencing data into meaningful insights. VarSeq, our variant analysis and interpretation platform, has become a go-to solution for researchers tackling complex genetic questions. Whether in rare disease diagnostics, population genetics, cancer research, or infectious disease studies, VarSeq provides the accuracy, efficiency, and flexibility needed to make sense of vast genomic datasets. In this blog, we’ll share some real-world use cases, highlighting how researchers are leveraging VarSeq to push the boundaries of genetic discovery.

Case Report: Prolonged survival in Schinzel–Giedion syndrome featuring megaureter and de novo SETBP1 mutation

Background: Rare early-onset lower urinary tract (REOLUT) disorders affect the ureter, urinary bladder, or urethra and manifest before birth or in childhood. Monogenic causes have been reported in a subset of such individuals.

Objectives: A possible genetic cause was considered in a child with a megaureter who had syndromic features.

Subjects and methods: Whole-exome sequencing was undertaken in individuals with megaureter. Immunohistochemistry was performed in urinary tract tissues of unaffected human fetuses.

Results: The index case presented at 6 months with urosepsis and was found to have a unilateral primary non-refluxing megaureter which required stenting of its distal portion. This, together with dysmorphic features and developmental delay, led to a clinical diagnosis of Schinzel–Giedion syndrome (SGS). She was found to carry a de novo missense variant in SET binding protein 1 (SETBP1), c.2613T>G (GenBank: NM_015559.3) (p.Ile871Met), a gene previously implicated in SGS. She was in good general health at 11 years of age, an unusual outcome given that most individuals with SGS die in the first 2 years of life. SETBP1 was detected in the fetal urinary tract, both in the urothelium and in nerve trunks in the kidney hilum and around the ureter. No SETBP1 gene variants were detected in eight further cases of megaureter.

Conclusions: This case indicates the value of genetic testing when a REOLUT disorder is accompanied by syndromic signs outside the urinary tract. SETBP1 may drive the functional differentiation of the human fetal ureter.

The index case and others with megaureter were recruited to a research study investigating genetic causes for REOLUT disorders. Informed consent was provided by the parents (Integrated Research Application System study code 286682) who were themselves asymptomatic. They also permitted the sharing of the facial image of the index case. Whole-exome sequencing (WES) was carried out for the index individual, as previously described by the Beijing Genomics Institute (BGI) (6). The paired-end sequencing was performed using a BGI exome kit version 4 (59M) 6G BGI-Seq500, and the expected average sequencing coverage was ≥50× per nucleotide. Sequencing reads were aligned to the human reference genome (GRCh37/Hg19) using the Burrows–Wheeler Aligner software (BWA-short v.0.6.2). Raw data (FASTQ files) were converted to Binary Alignment Map (BAM) and Variant Call Format (VCF) files, the standard formats for storing read alignments and variant calls, respectively. Analysis of exome data was performed in-house using VarSeq v2.2 (Golden Helix, Inc., Bozeman, MT, USA; http://www.goldenhelix.com). Duplicate reads were removed. Initially, the genome data were filtered for rare or novel variants in genes previously associated with LUT. Subsequently, an agnostic approach to variant filtering was applied, prioritizing variants with an ultrarare minor allele frequency for both dominant and recessive inheritance models and those with in silico evidence of pathogenicity.
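The agnostic filtering step described above can be illustrated with a minimal Python sketch. This is not VarSeq's workflow or the authors' code: the field names (`maf`, `zygosity`, `predicted_damaging`), the `ULTRARARE_MAF` cutoff, and the example records are all hypothetical, since the study does not state a numeric threshold.

```python
from collections import defaultdict

# Hypothetical cutoff: the study says "ultrarare" without giving a number.
ULTRARARE_MAF = 0.0001


def prioritize(variants, maf_cutoff=ULTRARARE_MAF):
    """Return (dominant, recessive) candidate lists from variant dicts."""
    # Keep ultrarare or novel variants that also have in silico support.
    rare = [
        v for v in variants
        if (v["maf"] is None or v["maf"] <= maf_cutoff)
        and v["predicted_damaging"]
    ]

    # Dominant model: one heterozygous variant can be sufficient.
    dominant = [v for v in rare if v["zygosity"] == "het"]

    # Recessive model: homozygous variants, or two or more heterozygous
    # variants in the same gene (possible compound heterozygosity).
    by_gene = defaultdict(list)
    for v in rare:
        by_gene[v["gene"]].append(v)

    recessive = []
    for gene_variants in by_gene.values():
        hets = [v for v in gene_variants if v["zygosity"] == "het"]
        recessive.extend(v for v in gene_variants if v["zygosity"] == "hom")
        if len(hets) >= 2:
            recessive.extend(hets)
    return dominant, recessive


# Made-up records for illustration only (GENE_A and GENE_B are placeholders).
example = [
    {"gene": "SETBP1", "maf": None, "zygosity": "het", "predicted_damaging": True},
    {"gene": "GENE_A", "maf": 0.02, "zygosity": "het", "predicted_damaging": True},
    {"gene": "GENE_B", "maf": 0.00005, "zygosity": "hom", "predicted_damaging": True},
]
dominant, recessive = prioritize(example)
```

Under these assumptions, the novel heterozygous variant is kept as a dominant candidate, the common variant is dropped, and the rare homozygous variant is kept as a recessive candidate.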

Beaman GM, Jarvis BW, Goyal A, Keene DJB, Cervellione M, Lopes FM, Metcalfe KA, Woolf AS and Newman WG (2025) Case Report: Prolonged survival in Schinzel–Giedion syndrome featuring megaureter and de novo SETBP1 mutation. Front. Pediatr. 13:1534192. doi: 10.3389/fped.2025.1534192

Detection of Genetic Variants in Thai Population by Trio-Based Whole-Genome Sequencing Study

This trio-based whole-genome sequencing (WGS) study enhances the accuracy of variant detection by leveraging parental genotypes, which facilitates the identification of de novo mutations and population-specific variants. Nonetheless, the comprehensive genetic variation data of the Thai population remain limited, posing challenges to advancing personalized medicine and population-based screening strategies. We establish the genetic variation information of a healthy Thai population by analyzing the sequences of 40 trios, yielding 80 whole genomes (excluding offspring). The resulting dataset encompasses 20.2 million variants, including 1.1 million novel and 19.1 million known variants. Within this dataset, we identify 169 pathogenic variants, of which 56 are classified as rare and 87 are absent from the ClinVar database as of version 2023. These pathogenic variants, particularly the rare and de novo mutations, will likely be of significant interest for genetic association studies. Notably, one pathogenic variant linked to a de novo mutation is found in the SF3B2 gene, which is associated with craniofacial microsomia. With its innovative methodology and comprehensive dataset, our trio-based whole-genome sequencing study provides an invaluable representation of the genetic variations in the Thai population. These data provide a critical foundation for further analyses of the pathogenic variants related to human disease phenotypes in genetic association studies.

The final step in the family-based whole-genome sequencing study involved genotype refinement using the Genome Analysis Toolkit (GATK) (package version 4.0.12.0). The tools used in the genotype refinement pipeline were CalculateGenotypePosteriors, VariantFiltration, and VariantAnnotator (with the PossibleDeNovo annotation). Additionally, the VarSeq® software version 2.2.1 (Golden Helix®, Bozeman, MT, USA) was employed for variant filtration and annotation with default parameters.
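The core idea behind flagging a possible de novo variant in a trio can be shown with a simplified Python sketch. It is a conceptual illustration only, not GATK's implementation: the real annotation also takes genotype confidence into account, which this version ignores. The function name and the tuple encoding of genotypes are assumptions made for the example.

```python
def is_candidate_de_novo(child_gt, mother_gt, father_gt):
    """Flag a site where the child carries an allele absent from both parents.

    Genotypes are tuples of allele indices, e.g. (0, 1) for a heterozygote,
    with 0 as the reference allele. None marks a missing genotype call.
    """
    # A missing call in any family member makes the site uninformative.
    if None in (child_gt, mother_gt, father_gt):
        return False
    # Both parents must be homozygous for the reference allele.
    parents_hom_ref = all(allele == 0 for allele in mother_gt + father_gt)
    # The child must carry at least one non-reference allele.
    child_has_alt = any(allele != 0 for allele in child_gt)
    return parents_hom_ref and child_has_alt
```

For example, `is_candidate_de_novo((0, 1), (0, 0), (0, 0))` flags the site, while a site where either parent is heterozygous is treated as inherited.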

Boonin, P.; Klumsathian, S.; Iemwimangsa, N.; Sensorn, I.; Charoenyingwatana, A.; Chantratita, W.; Chareonsirisuthigul, T. Detection of Genetic Variants in Thai Population by Trio-Based Whole-Genome Sequencing Study. Biology 2025, 14, 301. https://doi.org/10.3390/biology14030301

EV DNA from pancreatic cancer patient-derived cells harbors molecular, coding, non-coding signatures and mutational hotspots

DNA packaged into cancer cell-derived EV is not well appreciated. Here, we uncovered signatures of EV DNA secreted by pancreatic cancer cells. The cancer cells and non-cancer counterparts exhibit distinct low vs. high molecular weight (LMW vs. HMW) EV DNA fragments distribution, respectively. Genome sequencing and Single Nucleotide Variants analysis revealed that 95% of reads and 94% of SNVs map to noncoding regions of the genome. Given that ~1% of the human genome represents coding regions, the 5% mapping rate to coding regions suggests a non-random enrichment of certain coding regions and mutations. The LMW DNA fragments not only set cancer cells apart, but also harbor cancer specific enrichment of unique coding regions, the top nine being FAM135B, COL22A1, TSNARE1, KCNK9, ZFAT, JRK, MROH5, GSDMD, and MIR3667HG. Additionally, the cancer cells’ LMW DNA fragments exhibit dense centromeric mapping more strikingly on chromosomes 3, 7, 9, 10, 11, 13, 17, and 20. Mutational profiling turned up close to 200 mutations specific for the cancer cells. Altogether, our analyses suggest that centromeric regions might hold clues to EV DNA content from pancreatic cancer, the molecular, mutational signatures thereof, and rationalizes the need for a new approach to DNA biomarker research.
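The enrichment argument in the abstract is a simple ratio, sketched below in Python. The baseline assumes reads are sampled uniformly at random from the genome, which is a simplification, and the function name is hypothetical. The ratio alone says nothing about statistical significance.

```python
def fold_enrichment(observed_fraction, expected_fraction):
    """Ratio of the observed mapping fraction to the fraction expected by chance."""
    if expected_fraction <= 0:
        raise ValueError("expected_fraction must be positive")
    return observed_fraction / expected_fraction


# About 5% of reads map to coding regions, which make up roughly 1% of the genome.
coding_fold = fold_enrichment(0.05, 0.01)
```

With the figures quoted above, coding regions are represented at roughly five times the rate expected from their share of the genome.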

Variant analysis was performed using VarSeq™ v2.5.0 (Golden Helix, Inc., Bozeman, MT, www.goldenhelix.com) (ref: VarSeq™ (Version 8.x) [Software]. Bozeman, MT: Golden Helix, Inc. Available from http://www.goldenhelix.com). Overall, 10,066,263 variants were detected in the dataset. Variants were quality-filtered to a minimum read depth of 100, no LowQual flag, and a variant allele fraction ≥0.1. Additionally, variants with allele frequencies ≤0.01 and ≥0.99 in gnomAD and 1 kG Phase3 databases were kept (42, 43). Variants remaining after filtration were summarized and visualized using ggplot2 in R.
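The thresholds quoted above translate into a short predicate, sketched here in Python. The dictionary keys are hypothetical, and two points are assumptions rather than statements from the paper: a variant absent from a population database is kept, and the frequency criterion must hold in both databases.

```python
def af_is_extreme(af, low=0.01, high=0.99):
    """True when a population allele frequency is rare or near-fixed.

    Assumption: a variant missing from the database (af is None) is kept.
    """
    return af is None or af <= low or af >= high


def passes_ev_dna_filters(variant):
    """Apply the depth, quality-flag, allele-fraction, and frequency filters."""
    return (
        variant["read_depth"] >= 100
        and "LowQual" not in variant["filter_flags"]
        and variant["vaf"] >= 0.1
        # Assumption: the criterion must hold in both databases.
        and af_is_extreme(variant["gnomad_af"])
        and af_is_extreme(variant["kg_phase3_af"])
    )
```

A variant with depth 150, no flags, an allele fraction of 0.3, and a gnomAD frequency of 0.001 would pass, while the same variant at a population frequency of 0.2 would not.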

Olou, A.A., Tom, W.A., Krzyzanowski, G. et al. EV DNA from pancreatic cancer patient-derived cells harbors molecular, coding, non-coding signatures and mutational hotspots. Commun Biol 8, 368 (2025). https://doi.org/10.1038/s42003-025-07567-1

Genetic Landscape and Mitochondrial Metabolic Dysregulation in Patients Suffering From Severe Long COVID

Long COVID represents a significant global health challenge with an unclear etiology. Alongside accumulating evidence of mitochondrial dysfunction in patients with acute SARS-CoV-2 infection, a symptomatic overlap exists between long COVID and mitochondrial disorders. However, the genetic underpinnings of mitochondrial dysfunction in long COVID have not been previously explored. We employed whole genome sequencing to analyze 13 patients with severe long COVID to identify genetic defects related to mitochondrial function. We performed extracellular bioenergetics flux analysis on peripheral blood mononuclear cells and proteomics to evaluate cellular bioenergetics and compared the results to those of healthy controls. Our investigation identified 10 variants classified as pathogenic or likely pathogenic and 83 variants of unknown significance affecting a wide range of mitochondria-associated biological functions. Bioenergetics flux analysis in peripheral blood mononuclear cells revealed an altered ATP production rate in four long COVID patients compared to healthy controls. This study presents initial evidence of a potential underlying genetic predisposition to mitochondrial dysfunction in long COVID while demonstrating altered cellular energy capacity in a subset of these patients. These findings open avenues for further research into the role of mitochondrial dysfunction and pathology in patients suffering from long COVID and may pave the way for targeted therapeutic strategies aimed at mitigating mitochondrial dysfunction.

Variant analysis and filtering were performed in VarSeq 2.3.0 (Golden Helix Inc, Bozeman, MT). First, variants were filtered based on quality, keeping only high-quality variants. Variants were retained if they had either PASS or missing Variant Quality Score Recalibration (VQSR) scores (not used on WGS data), variant allele frequency in the sample > 0.25, genotype quality ≥ 20, and read depth > 30 (WES) or > 10 (WGS). Second, variants were kept if the minor allele frequency was < 0.001 in gnomAD exomes. Third, variants were filtered based on deleteriousness. Only variants classified as loss of function, missense, or located within 20 bp of splice sites were considered. Additionally, they had to fulfill the American College of Medical Genetics and Genomics (ACMG) classification as pathogenic, likely pathogenic, or VUS, and meet at least one of the following criteria: Combined Annotation Dependent Depletion (CADD) score ≥ 15 (or missing) or Rare Exome Variant Ensemble Learner (REVEL) score ≥ 0.25 (or missing). Finally, a filter for biological relevance to mitochondrial dysfunction was applied using three gene lists: Human MitoCarta 3.0, the PanelApp Mitochondrial Disorders (v. 4.113), and an in-house list of muscle-related genes. Additionally, only variants in genes with a gene damage index above 1,383,953 were retained, while variants found in uncertain reads within low-complexity areas were removed. All retained variants underwent thorough manual assessment regarding their impact on gene function and interference with relevant cellular and immune antiviral pathways. Their pathological potential, as documented in existing literature, was considered in relation to existing hypotheses on long COVID pathogenesis.
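The filter chain described above can be written as a series of small predicates. The following Python sketch uses hypothetical field names and effect labels, treats a missing gnomAD frequency as passing (an assumption), and leaves out the gene damage index and low-complexity filters. It shows the logic only and is not the VarSeq project the authors used.

```python
# Depth thresholds per assay, as quoted in the methods.
MIN_DEPTH = {"WES": 30, "WGS": 10}
# Hypothetical labels for the retained effect and classification categories.
KEPT_EFFECTS = {"loss_of_function", "missense", "splice_region_20bp"}
KEPT_ACMG = {"pathogenic", "likely_pathogenic", "VUS"}


def passes_quality(v, assay):
    """Step 1: PASS or missing filter status, plus VAF, GQ, and depth cutoffs."""
    return (
        v["vqsr_filter"] in ("PASS", None)
        and v["vaf"] > 0.25
        and v["genotype_quality"] >= 20
        and v["read_depth"] > MIN_DEPTH[assay]
    )


def passes_rarity(v):
    """Step 2: gnomAD exomes MAF < 0.001 (assumption: missing MAF is kept)."""
    maf = v["gnomad_exomes_maf"]
    return maf is None or maf < 0.001


def score_ok(score, cutoff):
    """A prediction score passes when it is missing or at least the cutoff."""
    return score is None or score >= cutoff


def passes_deleteriousness(v):
    """Step 3: effect type, ACMG class, and CADD >= 15 or REVEL >= 0.25."""
    return (
        v["effect"] in KEPT_EFFECTS
        and v["acmg_class"] in KEPT_ACMG
        and (score_ok(v["cadd"], 15) or score_ok(v["revel"], 0.25))
    )


def keep_variant(v, assay, gene_list):
    """Run all steps, then require the gene to be on a relevance list."""
    return (
        passes_quality(v, assay)
        and passes_rarity(v)
        and passes_deleteriousness(v)
        and v["gene"] in gene_list
    )
```

Here `gene_list` would be the union of the three gene lists named in the text. Splitting the steps into separate functions makes it easy to report how many variants each stage removes.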

Hansen KS, Jørgensen SE, Cömert C, Schiøttz-Christensen B, Bross P, Agergaard J, Leth S, Østergaard L, Palmfeldt J, Olsen RKJ, Mogensen TH. Genetic Landscape and Mitochondrial Metabolic Dysregulation in Patients Suffering From Severe Long COVID. J Med Virol. 2025 Mar;97(3):e70275. doi: 10.1002/jmv.70275. PMID: 40025839; PMCID: PMC11873671.

Genomic research doesn’t just require data—it demands precision, speed, and the right tools to make sense of it all. VarSeq has proven itself as a trusted solution for genetic analysis, helping researchers diagnose rare diseases, identify novel population variants, and uncover critical cancer mutations. As the field continues to evolve, we remain committed to providing the technology that empowers scientists to accelerate discoveries and drive innovation in precision medicine.
