VarSeq PGx inputs: Converting Microarray data to VCF format

· Darby Kammeraad · Software Update

The release of VarSeq version 2.6.0 provides many new features. Most notable is our support for tertiary analysis of pharmacogenomic (PGx) data. VarSeq not only calls the necessary gene diplotypes for your PGx panels but also handles large batches of samples, from the called diplotypes through to the final report on drug recommendations. A recent webcast demonstrates these capabilities. This feature has been well received, as many clinical labs are already processing a variety of PGx panels from microarray platforms. However, there is also an effort to utilize next-generation sequencing (NGS) to ensure coverage of all relevant variation across the genome (van der Lee et al., 2020; https://pubmed.ncbi.nlm.nih.gov/33291630).

From a support perspective, we quickly realized that VarSeq needed to import genotype data from two different file types: the VCF files produced by NGS pipelines and the genotype files produced by microarray platforms. Fortunately, we were quick to provide a solution.

NGS-based PGx

Several of our current users may expect to run their PGx analysis on existing sequencing data. That is indeed possible with our PGx platform in VarSeq, as it is designed to import the same VCF files currently generated by your secondary analysis pipeline. The imported format is the same whether you are calling panels, exomes, or entire genomes. Moreover, the goal of whole-genome sequencing is to generate calls for the entire spectrum of variant types, including small variants (SNVs and indels), copy number variants, and other structural variants. This is another advantage of long-read sequencing and a justification for our blossoming partnership with PacBio. However, the adoption of NGS data in clinical pharmacogenomics may still be in early transition, if not purely research, for most labs.

Array-based PGx

Commercial PGx arrays can range from a small handful of variants up to a quite large list of all drug response-related variants. The scale of the variant load is not a problem for VarSeq, as it has proven its ability to run large genome projects with multiple samples. However, getting array data into the software poses a unique challenge. Fortunately, array data is relatively straightforward, sometimes consisting of just two fields: a variant identifier and the associated genotype for the given sample. Golden Helix has developed a process for converting the array data to the VCF structure necessary to build a VarSeq project. Array data formats may vary across platforms, but the strategy for conversion remains constant. If you currently have array data and want to automate the tertiary process of calling diplotypes, associating phenotypes, and subsequently rendering reports, we encourage you to reach out to our support team at [email protected] to set up this conversion and test our VarSeq platform.
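To give a feel for what such a conversion involves, here is a minimal Python sketch. It is not the Golden Helix conversion process, only an illustration under stated assumptions: the array export is a list of (variant identifier, two-letter genotype call) pairs, and a manifest maps each identifier to a chromosome, position, reference allele, and alternate allele. The manifest entries, the identifiers `var_A` and `var_B`, and the function names are all hypothetical placeholders. The sketch handles only biallelic SNVs; indels, multi-allelic sites, and strand flips would need extra handling.

```python
# Hypothetical manifest: variant ID -> (chrom, pos, ref, alt).
# These are placeholder values for illustration, not real coordinates.
MANIFEST = {
    "var_A": ("1", 1000, "A", "G"),
    "var_B": ("2", 2000, "C", "T"),
}


def genotype_to_gt(call, ref, alt):
    """Translate a two-letter allele call such as 'AG' into a VCF GT string.

    Any call that is not exactly two ref/alt alleles (for example the
    no-call '--') becomes the missing genotype './.'.
    """
    if len(call) != 2:
        return "./."
    codes = []
    for allele in call:
        if allele == ref:
            codes.append("0")
        elif allele == alt:
            codes.append("1")
        else:
            return "./."
    # Array genotypes are unphased, so order the alleles and use '/'.
    return "/".join(sorted(codes))


def array_to_vcf(rows, sample, manifest):
    """Build single-sample VCF text from (variant_id, call) pairs."""
    header = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t" + sample,
    ]
    records = []
    for var_id, call in rows:
        if var_id not in manifest:
            continue  # skip probes that have no manifest entry
        chrom, pos, ref, alt = manifest[var_id]
        gt = genotype_to_gt(call, ref, alt)
        records.append((chrom, pos, var_id, ref, alt, gt))
    # Simple sort by chromosome name (as text), then by position.
    records.sort(key=lambda r: (r[0], r[1]))
    body = [
        "\t".join([chrom, str(pos), var_id, ref, alt, ".", ".", ".", "GT", gt])
        for chrom, pos, var_id, ref, alt, gt in records
    ]
    return "\n".join(header + body) + "\n"


if __name__ == "__main__":
    array_rows = [("var_B", "TC"), ("var_A", "GG"), ("var_A", "--")]
    print(array_to_vcf(array_rows, "Sample1", MANIFEST), end="")
```

In practice the manifest would come from the array vendor's annotation file, and the reference alleles would need to be checked against the same genome build used for the VarSeq project.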

References

van der Lee M, Kriek M, Guchelaar HJ, Swen JJ. Technologies for Pharmacogenomics: A Review. Genes (Basel). 2020 Dec 4;11(12):1456. doi: 10.3390/genes11121456. PMID: 33291630; PMCID: PMC7761897.

