Genotype imputation is a common and useful practice that allows GWAS researchers to analyze untyped SNPs without the cost of genotyping millions of additional SNPs. In the Services Department at Golden Helix, we often perform imputation on client data, and we have our own software preferences for a variety of reasons. However, other imputation software packages have their own advantages as well. This motivated us to perform some tests to assess certain performance features, such as accuracy and computation time, of a few common imputation software programs.
For this comparison, we tested three imputation programs: BEAGLE, IMPUTE2, and Minimac. With BEAGLE and IMPUTE2, imputation was performed both with and without pre-phasing the sample data. Minimac is an implementation of the MaCH method that utilizes pre-phasing. We did not run MaCH without pre-phasing due to computational constraints. Pre-phasing is a technique that phases the sample data prior to running imputation (as opposed to phasing the sample data during imputation), which can significantly improve computation time with a slight accuracy trade-off.
The variables measured include imputation accuracy (concordance rates), imputation quality, computation time, and memory usage. Concordance for each SNP is the number of correctly imputed genotypes (judged by comparing the imputed data against the full dataset) divided by the total number of genotypes, which equals the number of samples. Quality was determined by looking at the per-SNP quality metrics provided by each program. These metrics differ between programs, so the recommended threshold for each metric was applied separately. Computation time was measured by running each program on a 64-bit Linux computer with 16 GB of memory.
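As a rough illustration of the concordance definition above, the following Python sketch computes a per-SNP concordance rate and the mean across SNPs. The function names, the 0/1/2 allele-dosage encoding, and the toy genotypes are assumptions made for the example; they are not taken from any of the imputation programs or from our actual pipeline.

```python
def snp_concordance(imputed, truth):
    """Fraction of samples whose imputed genotype matches the true genotype.

    Both arguments hold the genotype calls for one SNP, one entry per sample.
    The 0/1/2 allele-dosage encoding used below is an assumption of this sketch.
    """
    if len(imputed) != len(truth):
        raise ValueError("genotype vectors must cover the same samples")
    if not truth:
        raise ValueError("no samples to compare")
    matches = sum(1 for i, t in zip(imputed, truth) if i == t)
    return matches / len(truth)


def mean_concordance(imputed_by_snp, truth_by_snp):
    """Mean of the per-SNP concordance rates over SNPs present in both inputs."""
    shared = sorted(set(imputed_by_snp) & set(truth_by_snp))
    if not shared:
        raise ValueError("no SNPs in common")
    rates = [snp_concordance(imputed_by_snp[s], truth_by_snp[s]) for s in shared]
    return sum(rates) / len(rates)


# Toy data (hypothetical SNP IDs and genotypes), four samples per SNP.
truth = {"rs_a": [0, 1, 2, 1], "rs_b": [2, 2, 0, 1]}
imputed = {"rs_a": [0, 1, 2, 0], "rs_b": [2, 2, 0, 1]}

per_snp = {snp: snp_concordance(imputed[snp], truth[snp]) for snp in truth}
overall = mean_concordance(imputed, truth)
print(per_snp, overall)
```

In this toy case one of the four genotypes for the first SNP is wrong, so its concordance is 0.75, and the mean over both SNPs is 0.875.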
The baseline study data included 141 unrelated HapMap samples genotyped on Illumina Omni1, representing the three major HapMap population groups. We imputed these samples based on the 1000 Genomes Phase 1 v3 reference panel as provided on each imputation program’s website. In order to simulate how a researcher would typically perform imputation on their own data, the reference datasets were downloaded directly from each program’s website and were not modified. Each data provider filters the reference data in a slightly different way, so the reference datasets were not identical, even though all were derived from the same original dataset. The sample data was limited to SNPs on chromosome 20.
The following Venn diagram represents the overlap of genetic data at the same genomic position between the three reference datasets and the original 1000 Genomes dataset. The diagram counts unique genomic positions, so the total number of rows found in each dataset is slightly more than the number displayed on the diagram, since some variants have duplicate positions.
An interesting point to note about this diagram is the existence of markers in the IMPUTE2 and BEAGLE reference datasets at genomic positions that were not found in the original 1000 Genomes dataset. Upon further investigation, most of these could be attributed to off-by-one position differences for some indels reported in the 1000 Genomes dataset. This demonstrates how different data processing pipelines handle complex genotype information in slightly different ways. For the same reason, the total number of markers at unique positions differs in each version of the reference dataset.
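The position-overlap comparison described above can be sketched with plain set operations. This is a minimal illustration, not the procedure we actually ran: the function names, the dataset labels "A" and "B", and the toy positions are all hypothetical.

```python
from collections import Counter


def duplicate_rows(rows):
    """Number of rows that repeat an already-seen (chromosome, position) pair."""
    return len(rows) - len(set(rows))


def venn_regions(named_sets):
    """Count positions by the exact combination of datasets that contain them.

    Returns a Counter keyed by a frozenset of dataset names, which corresponds
    to one region of a Venn diagram.
    """
    regions = Counter()
    all_positions = set().union(*named_sets.values())
    for pos in all_positions:
        members = frozenset(name for name, s in named_sets.items() if pos in s)
        regions[members] += 1
    return regions


# Toy rows (hypothetical positions); dataset A has two rows at position 100.
rows_a = [("20", 100), ("20", 100), ("20", 200), ("20", 300)]
rows_b = [("20", 200), ("20", 300), ("20", 301)]

regions = venn_regions({"A": set(rows_a), "B": set(rows_b)})
for members, count in regions.items():
    print(sorted(members), count)
print("duplicate rows in A:", duplicate_rows(rows_a))
```

Because the rows are collapsed to unique positions before counting, a dataset's row count can exceed the sum of its Venn regions, as with the duplicated position in dataset A here. An off-by-one shift in a reported position (300 versus 301 in the toy data) shows up as a position that appears in only one dataset.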
Each program outperformed the others in certain areas. Based on all of the metrics measured, IMPUTE2 seemed to perform with the greatest accuracy and quality, although other programs performed better in other areas.
As expected, pre-phasing the original dataset drastically reduced the total compute time. When the data was pre-phased, IMPUTE2 ran the fastest, followed by Minimac, and then BEAGLE. Without pre-phasing, IMPUTE2 was much faster than BEAGLE.
IMPUTE2 also had superior concordance rates, although all software programs performed well in this area. Minimac had the lowest concordance rate at 96.25%.
|Program |Total Compute Time* |Mean SNP Concordance |Total # SNPs |# High Quality SNPs |% High Quality Imputed
|IMPUTE2 with Pre-phasing | | | | |
|BEAGLE with Pre-phasing | | | | |
*Includes all steps required.
Without pre-phasing, IMPUTE2 had the highest quality imputation, but after pre-phasing, the certainty metric provided in the IMPUTE2 output dropped dramatically (see first figure below). The R^2 accuracy value given by BEAGLE was also lower in the output based on pre-phased data, but the change was not nearly as dramatic (see second figure below).
An unfortunate drawback of IMPUTE2 was its intensive memory usage. IMPUTE2 used all available RAM (16 GB), making it impossible to perform any other tasks on the machine. BEAGLE and Minimac, on the other hand, used far less memory (although they took longer to finish). BEAGLE was run using the “lowmem” option for more efficient memory usage, which also had the effect of increasing runtime.
All of the 141 test samples are also included in the 1000 Genomes reference panel. We recognize that this may bias the accuracy of the results, but it was acceptable for our purposes. The concordance rates represent how well each imputation program was able to reproduce genotypes for samples where the correct answer was already present in the reference panel. The algorithms used in each program may be more or less appropriate for this situation.
Another metric not discussed previously is the availability of documentation. In this category, BEAGLE wins. Not only does it have a nice PDF manual, but we have also had great success in asking the authors specific questions and getting thorough responses in a timely manner.
In summary, choosing the most appropriate imputation program to use depends on the qualities most important to the researcher and the hardware available. An important factor in our testing was that we chose to run the entire length of chromosome 20 in a single batch. The performance of the various tools, particularly with regard to compute time, would likely be quite different had we run the imputation in smaller batches.
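For readers who want to try the batched approach mentioned above, here is a minimal Python sketch that splits a chromosome into consecutive windows. The 5 Mb window size is chosen purely for illustration, the chromosome 20 length is the GRCh37 value and is an assumption about the build in use, and the function name is hypothetical. Flanking buffer regions around each window, which imputation tools typically need, are left out of this sketch.

```python
def chunk_intervals(start, end, chunk_size):
    """Split the inclusive range [start, end] into consecutive inclusive
    windows of at most chunk_size base pairs each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if end < start:
        raise ValueError("end must not be smaller than start")
    intervals = []
    lower = start
    while lower <= end:
        upper = min(lower + chunk_size - 1, end)
        intervals.append((lower, upper))
        lower = upper + 1
    return intervals


# Assumption: chromosome 20 length in the GRCh37 build.
CHR20_LENGTH = 63_025_520
# Assumption: 5 Mb windows, picked only for illustration.
CHUNK_SIZE = 5_000_000

chunks = chunk_intervals(1, CHR20_LENGTH, CHUNK_SIZE)
for lower, upper in chunks:
    print(f"chr20:{lower}-{upper}")
```

Each window could then be imputed as a separate job and the results concatenated afterwards, which trades one long, memory-hungry run for several smaller ones.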