VarSeq PGx inputs: Converting Microarray data to VCF format

         April 3, 2024

The release of VarSeq version 2.6.0 provides many new features. Most notable is our support for tertiary analysis of pharmacogenomic (PGx) data. VarSeq not only calls the necessary gene diplotypes for your PGx panels but also handles large batches of samples, from the called diplotypes through to the final report on drug recommendations. A recent webcast demonstrates these capabilities. This feature has been well received, as many clinical labs are already processing a variety of PGx panels from microarray platforms. However, there is also an effort to utilize next-generation sequencing (NGS) to ensure coverage of all relevant variation across the genome.

From a support perspective, we quickly realized that VarSeq needed to import data from two different genotype file types: the VCF files produced by sequencing pipelines and the output of microarray platforms. Fortunately, we were quick to provide a solution.

NGS-based PGx

Several of our current users may expect to run PGx analysis on their existing sequencing data. That is indeed possible with our PGx platform in VarSeq, which is designed to import the same VCF files currently generated by your secondary analysis pipeline. The imported format is consistent whether you are calling panels, exomes, or entire genomes. Moreover, the goal of whole-genome sequencing is to generate calls for the entire spectrum of variant types, including small variants (SNVs and indels), copy number variants, and other structural variants. This is another advantage of long-read sequencing and a justification for our blossoming partnership with PacBio. However, the implementation of NGS data in clinical pharmacogenomics may still be in early transition, if not purely research, for most labs.

Array-based PGx

Commercial PGx arrays range from a small handful of variants up to a large list of all drug response-related variants. The scale of the variant load is not a problem for VarSeq, which has proven its ability to run large genome projects with multiple samples. However, getting array data into the software poses a unique challenge. Fortunately, array data is relatively straightforward, sometimes consisting of just two fields: a variant identifier and the associated genotype for the given sample. Golden Helix has developed a process for converting array data to the VCF structure necessary to build a VarSeq project. Array data formats vary across platforms, but the strategy for conversion remains constant. If you currently have array data and want to automate the tertiary process of diplotype calling, associating phenotypes, and subsequently rendering reports, we encourage you to reach out to our support team to set up this conversion and test our VarSeq platform.
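To give a feel for what such a conversion involves, here is a minimal Python sketch. It is not the Golden Helix conversion process, only an illustration under stated assumptions: the array export is a list of (variant identifier, genotype) pairs, every variant is a biallelic SNV, and a manifest is available that maps each identifier to a chromosome, position, reference allele, and alternate allele. The `MANIFEST` contents, the function names, and the sample name are all hypothetical, and the coordinates are placeholders rather than real genomic positions.

```python
# Hypothetical manifest: variant ID -> (chrom, pos, ref, alt).
# These entries are placeholders for illustration, not real coordinates.
MANIFEST = {
    "rs0000001": ("1", 1000, "A", "G"),
    "rs0000002": ("2", 2000, "C", "T"),
}


def genotype_to_gt(call, ref, alt):
    """Translate a two-letter array call such as 'AG' into a VCF GT string."""
    if len(call) != 2:
        return "./."
    codes = []
    for allele in call:
        if allele == ref:
            codes.append("0")
        elif allele == alt:
            codes.append("1")
        else:
            # No-call or an allele the manifest does not describe.
            return "./."
    return "/".join(sorted(codes))


def array_to_vcf(rows, sample, manifest=MANIFEST):
    """Build minimal single-sample VCF text from (variant_id, call) pairs."""
    header = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t" + sample,
    ]
    records = []
    for variant_id, call in rows:
        if variant_id not in manifest:
            continue  # skip probes with no known coordinates
        chrom, pos, ref, alt = manifest[variant_id]
        gt = genotype_to_gt(call, ref, alt)
        records.append((chrom, pos, variant_id, ref, alt, gt))
    # Simplified ordering: chromosome as a string, then position.
    records.sort(key=lambda r: (r[0], r[1]))
    body = [
        "\t".join([c, str(p), vid, ref, alt, ".", ".", ".", "GT", gt])
        for c, p, vid, ref, alt, gt in records
    ]
    return "\n".join(header + body) + "\n"


if __name__ == "__main__":
    demo_rows = [("rs0000002", "CT"), ("rs0000001", "GG")]
    print(array_to_vcf(demo_rows, "SAMPLE1"), end="")
```

A real conversion would also need to deal with details this sketch ignores, such as strand orientation, indels and copy number probes, multiallelic sites, and the reference genome build the manifest was designed against.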


