A special thank you to everyone who was able to attend our most recent webcast, Dynamic and Flexible Fullstack NGS Pipelines in VSWarehouse 3. We covered a lot of ground in this session and wanted to follow up on the remaining questions from the Q&A about the functionality of VSWarehouse 3 (VSW3):
Questions
Q: Could you clarify ‘tasks’ versus ‘workflow’ versus ‘pipelines’?
A: While all of these are somewhat generic terms, in VSW3 we use them in specific contexts. Let’s break it down:
- Task: In VSW3, a task is an atomic (indivisible) component of a workflow. That is, it is a process that can’t really be broken down into sub-processes, or that for convenience simply hasn’t been broken down further. A task can be anything from aligning a FASTQ into a BAM or CRAM to pulling sample information from a LIMS. Tasks generally have specific inputs and outputs.
- Workflow: In VSW3, a workflow is a directed acyclic graph (DAG) of tasks, in which certain tasks depend on the outputs of other tasks. In the examples that we looked at, we had simple workflows where the inputs of each task are the outputs of the previous one. For instance, one task takes a set of FASTQs as input and outputs a BAM file, and the next one takes in a BAM and outputs a VCF.
- Pipeline: A pipeline is just a colloquial synonym for a workflow, not to be confused with VSPipeline, our command-line interface for VarSeq.
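To make the task/workflow distinction concrete, here is a minimal sketch of a workflow as a DAG of tasks, written in plain Python with the standard-library graphlib module. It is not VSW3's own workflow format or API. The task names are hypothetical, and the third task (annotate_vcf) is an illustrative addition that was not part of the webcast examples.

```python
import graphlib

# Hypothetical task names. Each task maps to the set of tasks whose
# outputs it needs as inputs (its upstream dependencies).
workflow = {
    "align_fastq_to_bam": set(),                          # FASTQs in, BAM out
    "call_variants_bam_to_vcf": {"align_fastq_to_bam"},   # BAM in, VCF out
    "annotate_vcf": {"call_variants_bam_to_vcf"},         # illustrative extra step
}


def execution_order(dag):
    """Return one valid run order for the tasks in the DAG.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    i.e. if the graph is not actually acyclic.
    """
    return list(graphlib.TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

Because every task here depends only on the one before it, there is exactly one valid order. In a branching workflow, several orders may be valid, and independent tasks could run in parallel.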
Q: For the cloud installation, is there only one cloud that this works on (like AWS) or are you flexible?
A: VSW3 can be deployed on any cloud infrastructure, or a mix of different clouds and local hardware.
Q: There was a lot of talk about ‘glob’ expressions that seemed to be using RegEx to truncate output file names. Is there any documentation or would the FAS Team be able to help me set that up?
A: Regular expressions (RegEx) can seem intimidating at first, but do not worry! Our FAS Team has a lot of experience in this field and will be happy to coach you through setting up workflows in your lab, including using RegEx to specify your output file names. We also include documentation directly within the VSW3 interface.
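Since the question mentions both glob expressions and RegEx, it may help to see that the two are different pattern syntaxes. The sketch below uses Python's standard fnmatch and re modules purely as an illustration. The file names and the strip_read_suffix helper are hypothetical, and the exact pattern syntax VSW3 accepts should be checked against the in-product documentation.

```python
import fnmatch
import re

# Hypothetical file names, for illustration only.
filenames = [
    "sampleA_R1.fastq.gz",
    "sampleA_R2.fastq.gz",
    "sampleA.bam",
    "sampleB.vcf",
]

# Glob: '*' matches any run of characters, '?' matches exactly one character.
fastqs = fnmatch.filter(filenames, "*_R?.fastq.gz")

# RegEx: the '$' anchor restricts the match to the end of the name,
# and the dots are escaped so they match literal '.' characters.
READ_SUFFIX = re.compile(r"_R[12]\.fastq\.gz$")


def strip_read_suffix(name):
    """Truncate a paired-end FASTQ file name down to its sample name."""
    return READ_SUFFIX.sub("", name)


samples = sorted({strip_read_suffix(f) for f in fastqs})
print(fastqs)
print(samples)
```

A glob is usually enough to select files, while a regular expression is the better fit when part of the name has to be extracted or removed.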
Q: Of the workflows seen in the webcast, are these all shipped with the software, or do I have to start from scratch?
A: Several secondary and tertiary pipeline workflows do ship with the software. We try to make these workflows and tasks as accessible and flexible as possible so users can adapt them to their needs. We also provide support for porting users’ existing pipelines into VSWarehouse 3.
Final Thoughts
Our Dynamic and Flexible Fullstack NGS Pipelines in VSWarehouse 3 webcast was a lot of fun. We got to demonstrate how disparate secondary analysis pipelines can be consolidated in VSW3 into streamlined workflows, and how VSW3’s infrastructure flexibility can be used to optimize performance for specific project needs. Overall, this highly customizable interface leads users to the same kinds of VarSeq projects they have gotten used to over the years.
If you would like to know more about VSWarehouse 3, please check out our other webcasts: