Our webcast on Integrating Long and Short Read Sequencing for Comprehensive NGS Analysis was a timely review of a topic. We discussed how users are leveraging both short and long-read sequencing modalities for comprehensive NGS analyses, reviewing the differences between long and short-read sequencing, the benefits and limitations of each modality, and how they complement each other. The theme was how Golden Helix facilitates streamlining and validating these workflows, as our VarSeq software provides the interface that can create comprehensive tertiary analysis workflows on both short and long-read data, and we also partner with Sentieon for secondary analysis pipelines that can transform both short and long read sequencing data into formats compatible for VarSeq tertiary analysis. Our demonstration depicted a comprehensive family analysis project where both short and long-read sequencing were used to evaluate potentially causative variants behind a rare ocular disorder called aniridia. You can watch the webcast recording here.
We reviewed how well long-read technology has increased the yield of data that can be derived from sequencing a genome or parts of a genome. Short-read sequencing is globally well established and, as such, is the workhorse for NGS, but where short-read falls short, especially for genomic events that are buried in the dark regions of the genome, long-read sequencing is an excellent complementary approach for the resolution of these unsolved cases. We explored the idea that when trying to achieve the most comprehensive NGS analysis, do users lean towards sacrificing the yield of long reads for familiarity and the low cost of short reads, or do they just dive into the new technology? What is being observed is that transitions seem to make the most sense, especially for people in a clinical scenario, so there is a gradual adoption of long-read sequencing.
A number of questions came up during the webcast that we would like to recap and provide answers to here:
Can VarSeq be used to filter or sort DNA modifications such as methylation info from long-read data? If not – is it in the pipeline to do so? Is it in the works to incorporate methylated BAMs such as the ones produced by ONT?
VarSeq provides the ability to import and visualize differentially methylated regions (DMRs). With our secondary tables import tool, you can import these regions (usually a .tsv file from PacBio or ONT methylation pipelines) for annotation and filtering. You may visualize differential methylation scores and calls. The ability to visualize 5mC bases in BAM files in our Genome Browser is under consideration for feature development. Please reach out to [email protected] with your use cases for methylation data to see how we can best support you.
For the family trio VarSeq analysis, did you compare long reads from PacBio vs Oxford Nanopore?
While we did not present a side-by-side comparison of the same samples processed by PacBio versus Oxford Nanopore in our webcast demonstration, a comparison like this is quite easily accomplished within VarSeq. This article also highlights the comparison between PacBio and Oxford Nanopore.
Do you have a “normal” (without CNVs) sample database to use as a reference to detect CNVs?
The VarSeq CNV caller does not use a panel of normals. We use a reference set normalization method, wherein we normalize the coverage data for a given sample and compare it to a normalized mean value of diploid coverage across a collection of germline reference samples provided by the user. The user’s reference set must be derived from samples that were prepared on the same sequencing platform and using the same library prep method for consistency. When using this method, the reference set does not need to be only control samples without CNV events. Having multiple reference samples and averaging the normalized coverage is enough to prevent any CNV event in a single reference sample from skewing the reference set overall. PacBio and ONT have their unique pipelines for CNV detection. You can read more about PacBio CNV calling here and Oxford Nanopore here.
Is there any strategy employed to overcome the limitations of long-read sequencing in detecting small variants, particularly in regions of high sequence complexity?
Over the years, the accuracy of long-read sequencing technologies has drastically improved and is continually improving. Current estimates place the accuracy of PacBio HiFi reads, for example, on par with short read sequencing, with a typical accuracy of 99.9%, as discussed in this article and this article.
Do you have an analysis of NIPS from MGI?
We are agnostic to the secondary analysis pipeline and can typically handle any secondary output that meets the standard VCF specifications. We are ready to support VarSeq users and potential customers in testing and importing data from new long-read pipelines, so feel free to reach out to us.
This is an exciting time in the NGS field, as newer technologies like long-read sequencing have greatly increased the yield from DNA sequencing and opened the door to more comprehensive analyses. With VarSeq, you have a tertiary tool that facilitates analyzing and integrating new and current technologies. If you are interested in how VarSeq can enhance your short and long-read analysis, please email us at [email protected] or visit us here to start your evaluation.