Golden Helix customers continue to push the boundaries of genomic research and clinical applications, leveraging VarSeq to analyze complex genetic data and uncover previously uncatalogued variants. With advancements in long-read sequencing and our recent updates to the VarSeq software, clinicians can now harness phasing information to distinguish between inherited and novel mutations in a Trio analysis with unprecedented accuracy. In this blog, we highlight a case study on how VarSeq integrates long-read sequencing data from providers like PacBio and ONT to phase variants and reveal critical genetic information that was once hidden, paving the way for new discoveries in hereditary and de novo mutations. Let’s dive in.
For this Trio, the Mother and Father are unaffected, while the Proband has an array of phenotypes. We are specifically focusing on the HPO terms Intellectual Disability and Growth Delay. With our classic VarSeq Trio template, we can only analyze what we would traditionally consider ‘small variants’, with no regard for the phasing recorded in the genotype (GT) and phase set (PS) fields. While GT will be familiar to those who analyze germline samples, the PS field may be new. It identifies a group of variants whose genotypes were phased relative to one another, so within a phase set, alleles written on the same side of the ‘|’ separator lie on the same copy of the chromosome. By updating our workflow, we are able to bring our classic Inheritance Stack into the Collapsed Phased Variants filter chain. The key difference from our classic Trio workflow is that we can now leverage phasing information to find even more variants that are de novo or inherited from a family member.
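To make the GT and PS fields concrete, here is a minimal Python sketch of how two phased calls could be compared. It is an illustration only, not how VarSeq is implemented: the helper names (`parse_sample`, `alt_sides`, `same_haplotype`) and the phase set IDs `1001` and `2002` are hypothetical. It also takes the conservative view that a call with no PS value cannot be compared.

```python
def parse_sample(format_keys: str, sample_value: str) -> dict:
    """Split a VCF sample column into a dict keyed by the FORMAT fields."""
    return dict(zip(format_keys.split(":"), sample_value.split(":")))


def alt_sides(gt: str) -> set:
    """Haplotype indices (0 = left of '|', 1 = right) carrying a non-reference allele."""
    return {i for i, allele in enumerate(gt.split("|")) if allele not in ("0", ".")}


def same_haplotype(a: dict, b: dict) -> bool:
    """True when both calls are phased, share a phase set, and place an
    alternate allele on the same side of the '|' separator."""
    if "|" not in a["GT"] or "|" not in b["GT"]:
        return False  # '/' marks an unphased genotype, so nothing can be concluded
    ps_a, ps_b = a.get("PS", "."), b.get("PS", ".")
    if ps_a == "." or ps_a != ps_b:
        return False  # assumption: a missing or different phase set is not comparable
    return bool(alt_sides(a["GT"]) & alt_sides(b["GT"]))


# Hypothetical phase set IDs, used for illustration only.
first = parse_sample("GT:PS", "1|0:1001")
second = parse_sample("GT:PS", "1|0:1001")
other_side = parse_sample("GT:PS", "0|1:1001")

print(same_haplotype(first, second))      # True: same set, same side
print(same_haplotype(first, other_side))  # False: same set, opposite sides
```

An unphased genotype such as `0/1` always returns `False` here, which mirrors why the classic workflow cannot use these calls to reason about haplotypes.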
In addition, we can leverage the PhoRank algorithm to narrow the scope to variants related to our patient’s phenotype. As mentioned before, our patient has the phenotypes of Intellectual Disability and Growth Delay, while both parents are unaffected. This implies we are looking for either a de novo variant or a recessive variant. While there were a number of variants that fit the bill, our Collapsed Phased Variants algorithm was the real hero, bringing two very interesting variants to light.
First, we have a Pathogenic variant in EP300, a dominant gene. This variant is shown to be de novo in our Proband. What is interesting is that the original VCF recorded this change as two separate variants:
- Proband: 22:41172489 GT/AA GT:1|0
- Proband: 22:41172491 T/AC GT:1|0
These adjacent variants were also in phase. When collapsed down to one variant, we can see it is both highly pathogenic according to ClinVar and highly related to our phenotypes of interest.
- Proband: 22:41172489 GTT/AAAC GT:1|0
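The collapse itself is simple to reason about once phasing is known. The sketch below merges two directly adjacent heterozygous calls on the same haplotype by concatenating their reference and alternate alleles. It is a simplified illustration and not the VarSeq algorithm: the `Variant` class, the `collapse` function, and the phase set ID `ps1` are hypothetical, and it deliberately refuses calls separated by a gap, since filling a gap would require reference bases that are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int   # 1-based position of the first reference base
    ref: str
    alt: str
    gt: str    # phased genotype, e.g. "1|0"
    ps: str    # phase set ID (hypothetical value in this example)


def collapse(first: Variant, second: Variant) -> Variant:
    """Merge two adjacent heterozygous calls that sit on the same haplotype."""
    same_phase = (
        first.chrom == second.chrom
        and first.ps == second.ps
        and first.gt == second.gt
        and first.gt in ("1|0", "0|1")  # this sketch handles phased het calls only
    )
    # Adjacent means the second call starts right after the first call's ref span.
    adjacent = first.pos + len(first.ref) == second.pos
    if not (same_phase and adjacent):
        raise ValueError("variants are not adjacent and in phase")
    return Variant(
        first.chrom,
        first.pos,
        first.ref + second.ref,
        first.alt + second.alt,
        first.gt,
        first.ps,
    )


a = Variant("22", 41172489, "GT", "AA", "1|0", "ps1")
b = Variant("22", 41172491, "T", "AC", "1|0", "ps1")
merged = collapse(a, b)
print(f"{merged.chrom}:{merged.pos} {merged.ref}/{merged.alt} GT:{merged.gt}")
# 22:41172489 GTT/AAAC GT:1|0
```

The first call spans positions 41172489 and 41172490, so the second call at 41172491 follows it with no gap, which is what allows the alleles to be joined directly.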
Next, we leverage our Phased Compound Het Detection to identify two variants in Compound Het through the use of the PS field. This algorithm was able to determine that two variants in NSUN2, a recessive gene, were inherited one from the Mother and one from the Father, and sit on opposite copies of the chromosome, placing them in Recessive Compound Het for our Proband.
- Mother: 5:6604227 G/C 0/1
- Father: 5:6609844 C/T 0/1
- Proband: 5:6604227 G/C 0|1 AND 5:6609844 C/T 1|0
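The Proband genotypes above show the pattern to look for: the alternate allele is on the right-hand haplotype at the first site (`0|1`) and on the left-hand haplotype at the second (`1|0`). A minimal sketch of that "in trans" test follows. It is only an illustration of the idea, not the Phased Compound Het Detection algorithm itself; the function names and the phase set ID `ps1` are hypothetical, and it assumes both calls were phased within the same phase set.

```python
def alt_sides(gt: str) -> set:
    """Haplotype indices (0 = left of '|', 1 = right) carrying a non-reference allele."""
    return {i for i, allele in enumerate(gt.split("|")) if allele not in ("0", ".")}


def in_trans(gt_a: str, gt_b: str, ps_a: str, ps_b: str) -> bool:
    """True when two phased heterozygous calls in one phase set place their
    alternate alleles on opposite haplotypes."""
    if "|" not in gt_a or "|" not in gt_b or ps_a != ps_b:
        return False  # unphased, or phased in different sets: cannot be compared
    sides_a, sides_b = alt_sides(gt_a), alt_sides(gt_b)
    # Each call must be heterozygous (exactly one side), and the sides must differ.
    return len(sides_a) == 1 and len(sides_b) == 1 and sides_a != sides_b


# Proband genotypes from the NSUN2 example; "ps1" is a hypothetical phase set ID.
print(in_trans("0|1", "1|0", "ps1", "ps1"))  # True: opposite haplotypes
print(in_trans("1|0", "1|0", "ps1", "ps1"))  # False: same haplotype (in cis)
```

Two variants in cis would leave one copy of the gene intact, which is why the opposite-haplotype check matters for a recessive gene.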
These variants are also pathogenic and score highly on our PhoRank search for our phenotypes. Thanks to the phase set information, our new phased variant algorithms, and some updates to our standard Trio workflow, we are able to further analyze the data available to us to bring answers to families.
If you would like help updating your Trio templates to include Inheritance Stacks for Complex Variables and Collapsed Phased Variants, please email [email protected].