Warehouse Support for Long-read Part 1: Native Support for WDL

Andrew Legan · About Golden Helix

As long-read sequencing delivers on its promise, it is awesome to see our customers moving beyond single-nucleotide variants (SNVs) and insertions/deletions (INDELs) into structural variants (SVs), tandem repeats, haplotype phasing, and even methylation signals from the same dataset.

But for most labs, the bottleneck is implementing long-read analysis workflows. Getting a best-practice pipeline to run once is not even the hardest part (although that is not always trivial!). Getting it to run the same way every time, across many samples, with clean traceability and downstream handoff to evaluation and reporting systems… that can take a lot of time to configure, and even more time to repeat if the documentation isn’t thorough.

That is why VSWarehouse 3 (VSW3) added native support for WDL (Workflow Description Language, a human-readable language for defining tasks and workflows). WDL makes complex pipelines portable and shareable, but portability alone does not guarantee day-to-day reliability. Labs still need consistent parameterization, repeatable execution, and a practical way to manage outputs so they are immediately usable for interpretation and reporting.
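One common way to get consistent parameterization is to keep lab-wide defaults in one place and let each run override only what it must. The sketch below is a hypothetical illustration, not VSW3's implementation: it assumes inputs are a flat dict of fully qualified WDL input names (the `humanwgs_singleton.*` keys are placeholders, not the upstream workflow's documented inputs), rejects overrides that match no known key, and refuses to proceed while any required value is still unset.

```python
from typing import Any, Mapping

# Hypothetical lab-wide defaults. Key names are placeholders, not the real
# input names of any upstream WDL workflow. None marks "must be supplied per run".
LAB_DEFAULTS: dict[str, Any] = {
    "humanwgs_singleton.sample_id": None,
    "humanwgs_singleton.gpu": False,
    "humanwgs_singleton.ref_map_file": "/refs/GRCh38.ref_map.tsv",
}


def merge_inputs(defaults: Mapping[str, Any],
                 overrides: Mapping[str, Any]) -> dict[str, Any]:
    """Return defaults updated with per-run overrides, validating both ways."""
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        # A misspelled parameter fails loudly instead of silently using a default.
        raise KeyError(f"unknown input(s): {', '.join(unknown)}")
    merged = {**defaults, **overrides}
    missing = sorted(k for k, v in merged.items() if v is None)
    if missing:
        raise ValueError(f"missing required input(s): {', '.join(missing)}")
    return merged
```

For example, `merge_inputs(LAB_DEFAULTS, {"humanwgs_singleton.sample_id": "HG002"})` yields a complete input set, while a typo such as `sampel_id` raises an error before anything runs.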

In VSW3, “native support” is about integration. You can run upstream WDL workflows faithfully, as published, while they remain fully part of the Warehouse automation environment.

A concrete example is the PacBio HiFi WGS workflows we ship. This automation includes prerequisite tasks for preparing the required reference bundle, individual tasks that can be run independently, and an execution step that runs the upstream WDL singleton workflow through MiniWDL, all within your Warehouse environment.
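The execution step ultimately comes down to a MiniWDL command-line invocation. As a minimal sketch, assuming the upstream workflow file is available locally (the `singleton.wdl` path, the input key, and the `requested_inputs.json` file name are hypothetical), a wrapper might write the resolved inputs to a JSON file in a per-run directory and build the `miniwdl run` argument list. The `-i` (input JSON) and `-d` (run directory) options reflect miniwdl's CLI as we understand it; check `miniwdl run --help` for your installed version.

```python
import json
import shlex
import tempfile
from pathlib import Path
from typing import Any, Mapping


def prepare_miniwdl_run(wdl_path: str, inputs: Mapping[str, Any],
                        run_root: Path) -> list[str]:
    """Write the resolved inputs to JSON and return a `miniwdl run` argv list.

    Hypothetical wrapper: the file name requested_inputs.json and the layout of
    run_root are assumptions, not VSW3 behavior.
    """
    run_root.mkdir(parents=True, exist_ok=True)
    inputs_file = run_root / "requested_inputs.json"
    # sort_keys makes identical inputs produce byte-identical files, which
    # keeps runs easy to diff and audit.
    inputs_file.write_text(json.dumps(dict(inputs), indent=2, sort_keys=True))
    # -i: input JSON file; -d: directory under which miniwdl places the run.
    return ["miniwdl", "run", wdl_path, "-i", str(inputs_file), "-d", str(run_root)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        argv = prepare_miniwdl_run(
            "singleton.wdl",  # hypothetical local path to the upstream workflow
            {"humanwgs_singleton.sample_id": "HG002"},  # hypothetical input key
            Path(tmp) / "HG002",
        )
        # To execute for real: subprocess.run(argv, check=True)
        print(shlex.join(argv))
```

Returning an argument list rather than a shell string avoids quoting problems and lets the same list be logged verbatim for traceability.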

With these workflows, the automation infrastructure is designed so that you can always quickly and confidently answer questions like:

  • Which container image (and version) ran?
  • What exact parameters were used?
  • What upstream inputs produced this output?
  • What resources were required (CPU, RAM, disk)?
  • Where are the logs and QC outputs for audit/troubleshooting?
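The answers to those questions can be captured as a small provenance record written alongside every run. This is a hypothetical sketch, not VSW3's actual schema: the field names are illustrative, and the parameter fingerprint is simply a SHA-256 digest of the canonical (sorted-key) JSON of the inputs, so two runs with identical parameters get identical fingerprints regardless of key order.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class RunProvenance:
    """Hypothetical per-run record answering the audit questions above."""
    container_image: str            # image name including tag or digest
    parameters: dict[str, Any]      # exact resolved workflow inputs
    upstream_inputs: list[str]      # files that produced this run's outputs
    cpu: int
    memory_gb: float
    disk_gb: float
    log_paths: list[str] = field(default_factory=list)
    qc_paths: list[str] = field(default_factory=list)

    def parameter_fingerprint(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so key order does not matter.
        canonical = json.dumps(self.parameters, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def to_json(self) -> str:
        record = asdict(self)
        record["parameter_fingerprint"] = self.parameter_fingerprint()
        return json.dumps(record, indent=2, sort_keys=True)
```

Storing the fingerprint next to the full parameter set makes it cheap to find every run that used the same configuration, while the full record still answers "what exactly was used?"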

A system for repeated automation of long-read analysis

At a high level, the upstream singleton workflow covers the core long-read stages you would expect for production WGS. That includes alignment (pbmm2), small variant calling (DeepVariant), SV detection (Sawfish), tandem repeat genotyping (TRGT), phasing and haplotagging (HiPhase), and coverage metrics (Mosdepth). It also includes specialized modules that long-read programs increasingly rely on, such as mitochondrial analysis (MitorSaw), pharmacogenomics (PBStarPhase, with downstream formatting for importing the diplotype calls into your VarSeq PGx projects), and complex locus haplotyping (Paraphase). These are the kinds of analyses that often force labs into brittle one-off scripts if the workflow system cannot manage them cleanly.
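Those stages have a natural dependency order: nearly everything starts from the aligned reads, and phasing consumes the variant calls. The sketch below encodes a simplified, assumed version of that ordering (it is not the exact task graph of the upstream WDL) and uses Python's standard-library `graphlib.TopologicalSorter` to derive batches of stages whose dependencies are all satisfied, and which could therefore run in parallel.

```python
from graphlib import TopologicalSorter

# Assumed, simplified dependencies: each stage maps to the stages it waits on.
# This is illustrative only and does not reproduce the upstream task graph.
STAGE_DEPS: dict[str, set[str]] = {
    "pbmm2": set(),
    "deepvariant": {"pbmm2"},
    "sawfish": {"pbmm2"},
    "trgt": {"pbmm2"},
    "mosdepth": {"pbmm2"},
    "mitorsaw": {"pbmm2"},
    "paraphase": {"pbmm2"},
    "hiphase": {"deepvariant", "sawfish", "trgt"},
    "pbstarphase": {"hiphase"},
}


def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group stages into batches; every stage's dependencies precede its batch."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a deterministic order
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

A workflow engine such as MiniWDL derives this scheduling from the WDL itself; the point of the sketch is only to show why a single alignment step fans out into many independent analyses before phasing pulls them back together.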

VSW3 helps ensure the outputs land in a consistent location and structure, even linking them to automated tertiary analysis steps. Rather than just running a pipeline, users can get results that are immediately ready for interpretation and reporting. As with all of the Golden Helix software suite, nothing is a black box, and users can configure any step in the process.
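Landing outputs in a consistent structure can be as simple as a deterministic mapping from each workflow output to a destination path. Assuming an outputs mapping of output names to file paths (similar in spirit to the outputs.json that miniwdl writes to its run directory), the hypothetical layout `<root>/<sample>/<stage>/<file name>` below gives every sample the same shape, so a downstream tertiary step can find, say, the phased small-variant VCF without searching. The output names, stage folders, and root path are illustrative assumptions.

```python
from pathlib import PurePosixPath
from typing import Any, Mapping

# Hypothetical mapping from workflow output names to stage folders.
STAGE_FOR_OUTPUT = {
    "humanwgs_singleton.merged_haplotagged_bam": "alignment",
    "humanwgs_singleton.phased_small_variant_vcf": "small_variants",
    "humanwgs_singleton.phased_sv_vcf": "structural_variants",
    "humanwgs_singleton.phased_trgt_vcf": "tandem_repeats",
}


def plan_output_layout(outputs: Mapping[str, Any], sample_id: str,
                       root: str = "/warehouse/results") -> dict[str, PurePosixPath]:
    """Map each single-file output to <root>/<sample>/<stage>/<file name>."""
    plan: dict[str, PurePosixPath] = {}
    for name, src in sorted(outputs.items()):
        if not isinstance(src, str):
            # This sketch skips array outputs and unset (null) optional outputs.
            continue
        stage = STAGE_FOR_OUTPUT.get(name, "other")  # unmapped outputs still get a home
        plan[name] = PurePosixPath(root, sample_id, stage, PurePosixPath(src).name)
    return plan
```

Computing the plan separately from copying or linking files makes the layout easy to review and test before anything on disk changes.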

This is the real value of native WDL support in VSW3. Instead of treating WDL as something you run off to the side, you run it inside a governed workflow system designed to connect automated secondary analysis to downstream tertiary interpretation and reporting workflows.


About Andrew Legan

Andrew Legan joined Golden Helix in 2025 as a Technical Field Application Scientist. Andrew earned a BA from Vanderbilt in 2015 and a PhD in Neurobiology and Behavior from Cornell in 2022. He was a postdoc at the USDA and the University of Arizona, conducting research in comparative genomics. Outside of work, Andrew enjoys playing the drum set and exploring the outdoors.
