Why should a genetic researcher care about the latest in video gaming technology? The answer is video graphics cards, or Graphics Processing Units (GPUs). For certain computational tasks, a single GPU can perform as well as an entire cluster of CPUs at a fraction of the cost. And because video gaming has grown into a highly competitive multi-billion dollar industry, the cost of powerful graphics hardware is being driven ever downward. Although GPUs are optimized for graphics rendering, new technologies like OpenCL and CUDA have made it possible for programmers to put the GPU to work on more general computational tasks, such as scientific computing.
You might be wondering why anyone would bother reprogramming a GPU to perform scientific computing, or more specifically bioinformatics. GPUs are highly specialized for one task: drawing 3D polygonal models as quickly as possible. This is surprisingly similar to many tasks in bioinformatics. Here’s a rough breakdown of the process:
Drawing 3D polygonal models:
- Load millions of polygons into memory.
- Compute the position, lighting, and texture for each polygon.
- Output the results to the screen.

Many bioinformatics analyses:
- Load millions of data points into memory.
- Compute (or accumulate) a statistic of interest for each point.
- Output the results to a file.
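The bioinformatics steps above follow the same embarrassingly parallel "map" pattern as the rendering pipeline: each element can be processed independently. Here is a minimal sketch in Python; the per-point statistic (a log2 ratio of observed vs. reference intensity, a common copy-number measure) and the toy data are illustrative assumptions, not part of any particular pipeline.

```python
# Minimal sketch of the data-parallel "map" pattern shared by graphics
# rendering and many bioinformatics computations. The statistic here
# (log2 ratio of observed vs. reference intensity) is illustrative only.
import math

def compute_stat(observed, reference):
    """Per-point statistic: log2 ratio of observed to reference intensity."""
    return math.log2(observed / reference)

# Step 1: load data points into memory (toy tuples stand in for a file).
points = [(2.0, 1.0), (1.0, 1.0), (0.5, 1.0)]

# Step 2: compute the statistic for each point independently -- on a GPU,
# each of these iterations could run on its own core.
results = [compute_stat(obs, ref) for obs, ref in points]

# Step 3: output the results.
print(results)  # [1.0, 0.0, -1.0]
```

Because no iteration depends on any other, the middle step is exactly the kind of work a GPU's hundreds of cores can absorb.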
Given the exponential growth of data generated from the latest in microarrays and next-generation sequencing, these similarities motivated us to investigate using GPU computing to accelerate certain aspects of our own genetic analysis package, the SNP & Variation Suite.
GPUs for CNV analysis
The first task we decided to tackle was copy number segmenting. The segmenting algorithm implemented in the Copy Number Analysis Module (CNAM) of SVS is an optimal algorithm, which produces high quality CNV segmentation (we’d argue the highest), but that quality comes at the expense of computation time. If we can dramatically speed up the computation, our customers will no longer have to choose between quality and speed. Using the new OpenCL standard, we were able to produce a version of CNAM that runs on a variety of GPUs, which Golden Helix customers can look forward to in the upcoming 7.4 release of SVS.
So, how does a GPU perform on copy number analysis? To find out, I ran CNAM on several CPUs and GPUs we had on hand (see table below) and measured how many samples each could process per hour. I used sample data from the Affymetrix 2.7M array, since it is fairly representative of the high-density data sets that can be burdensome to segment.
Summary of Processors Used for CNV Test
| Processor | Approx. Cost as of Oct 2010 |
| --- | --- |
| Intel® Core™ 2 Duo T7250 | |
| Intel® Core™ i7 860 | |
| Intel® Xeon™ E5430 | |
| NVidia® GeForce™ GTX260 | |
| NVidia® GeForce™ GTX480 | |
Here are the results when using a 25,000 marker moving window and a permutation p-value of 0.001:
As you can see, the GPUs have a respectable advantage over the CPUs. Even the Xeon™ E5430 (a high-end quad-core server processor) can’t keep up with the GTX260, an older mid-range consumer graphics card. Switching from a Core 2 Duo to a GPU would yield a 20x-40x speed-up, which translates into several weeks of time savings for some whole genome projects.
In this first test, chromosomes were divided into 25,000-marker windows, each of which was optimally segmented before the results were combined. Breaking the chromosomes into “bite-sized” pieces saves time, but does not produce a globally optimal segmentation. So I disabled the moving window for the next test, meaning each chromosome had to be handled in one large batch (up to 210,000 markers in the case of chromosome 2). Due to time constraints, the Core™ 2 Duo was omitted from this test.
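The window idea can be sketched in a few lines of Python. To be clear, this is a simplified stand-in, not CNAM's optimal algorithm: it finds at most one change point per window by exhaustive search, and the sum-of-squared-errors cost function is an assumption chosen for illustration. The point is the structure: each window is segmented independently, so the combined result need not be globally optimal.

```python
# Hedged sketch of moving-window segmentation: split a long chromosome into
# fixed-size windows and segment each independently. NOT CNAM's algorithm --
# one change point per window, minimizing sum of squared errors (SSE).

def sse(xs):
    """Sum of squared deviations from the mean."""
    if not xs:
        return 0.0
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def best_split(window):
    """Split index minimizing total SSE, or None if no split improves it."""
    best_cost, best_idx = sse(window), None
    for i in range(1, len(window)):
        cost = sse(window[:i]) + sse(window[i:])
        if cost < best_cost:
            best_cost, best_idx = cost, i
    return best_idx

def segment(values, window_size):
    """Segment each window independently and collect the breakpoints.
    Windows are processed in isolation, so a copy-number change that
    straddles a window boundary can be missed -- hence 'not globally
    optimal'. Each window could also run on its own GPU core."""
    breakpoints = []
    for start in range(0, len(values), window_size):
        split = best_split(values[start:start + window_size])
        if split is not None:
            breakpoints.append(start + split)
    return breakpoints

# Toy data: a copy-number shift at index 4.
data = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
print(segment(data, window_size=8))  # [4]
```

Disabling the moving window corresponds to setting `window_size` to the full chromosome length, which makes each batch far larger and (for a CPU) far slower.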
The CPUs take a big performance hit when processing an entire chromosome, but the GPUs (which are designed to handle millions of polygons) barely flinch. The GTX480 is actually a bit faster when not using a moving window. With a GPU, researchers will no longer need to settle for the moving-window approximation.
CNAM uses permutation testing to discard superfluous segments. Unfortunately, we have found that permutation testing does not benefit from GPU computing, which makes it the main bottleneck when segmenting with a GPU. However, if you can afford a more relaxed p-value of 0.005, it is possible to squeeze even more speed out of the GPUs:
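To see why the p-value threshold matters, here is a rough sketch of how a permutation test can judge a candidate segment boundary. This is not CNAM's implementation; the mean-difference statistic, the 1,000-permutation count, and the toy data are all illustrative assumptions. Note the loop over permutations is sequential here, which hints at why this step is harder to accelerate than the segmentation itself.

```python
# Hedged sketch of permutation testing for a candidate segment boundary:
# is the mean difference between two adjacent segments larger than chance
# would produce? Not CNAM's implementation; all parameters illustrative.
import random

def perm_pvalue(left, right, n_perm=1000, seed=7):
    """Fraction of label-shuffled datasets whose mean gap matches or
    exceeds the observed one. A segment whose boundary has a p-value
    above the chosen threshold would be discarded as superfluous."""
    rng = random.Random(seed)
    observed = abs(sum(left) / len(left) - sum(right) / len(right))
    pooled = left + right
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(left)], pooled[len(left):]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            hits += 1
    return hits / n_perm

# A clear copy-number shift easily survives even a strict threshold.
p = perm_pvalue([0.0] * 20, [1.0] * 20)
print(p < 0.005)  # True
```

Relaxing the threshold from 0.001 to 0.005 allows fewer permutations per boundary (a p-value only needs enough resolution to be compared against the threshold), which shrinks the serial bottleneck.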
What gives GPUs their superior performance?
It’s worthwhile to briefly discuss how the GPU achieves such high performance. The secret lies in the GPU’s massive parallelism. Take a look at a typical CPU’s architecture vs. a typical GPU’s architecture:
A CPU commonly has 4 to 8 fast, flexible cores clocked at 2 to 3 GHz, whereas a GPU has hundreds of relatively simple cores clocked at around 1 GHz. Tasks that can be divided efficiently across many threads see enormous benefits when running on a GPU. This highly parallel architecture is the reason a GPU can process such large batches of copy number data so quickly.
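The arithmetic behind that advantage is simple to sketch. Assuming an 8-core CPU and, say, the GTX480's 480 cores (the core counts are the only assumptions here), dividing the 210,000 markers of chromosome 2 across the available cores gives each GPU core a tiny fraction of the work:

```python
# Sketch of dividing an embarrassingly parallel job across workers.
# Core counts are illustrative: ~8 for a modern CPU, hundreds for a GPU.

def chunk(n_items, n_workers):
    """Split n_items as evenly as possible among n_workers."""
    base, extra = divmod(n_items, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]

markers = 210_000  # chromosome 2 in the no-window test above

cpu = chunk(markers, 8)    # an 8-core CPU
gpu = chunk(markers, 480)  # e.g. the GTX480's 480 cores

print(max(cpu))  # 26250 markers per CPU core
print(max(gpu))  # 438 markers per GPU core
```

Even at roughly a third of the clock speed, sixty times more workers wins easily, provided the per-marker work really is independent.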
A real game changer?
It’s hard to tell whether or not GPUs will revolutionize the industry, but here’s what we know: data sets are growing exponentially larger, and the computation on that data is growing exponentially more expensive. I highlighted an example here where, because of GPUs, one can get truly optimal results rather than settle for approximations forced by computational limits, and at a fraction of the cost. There are many similar examples in bioinformatics, especially with next-generation sequencing becoming more mainstream (think exhaustive gene-gene interaction tests on 10 million or more variants). Regardless, GPUs provide a practical way to keep pace with ever-expanding data sets without investing in cumbersome compute clusters, centralizing data processing, or going to the cloud for everything.
Who knows… maybe names like NVidia® GeForce™ GTX480 or ATI FirePro™ V9800 will be as common in our industry as Illumina® HiSeq 2000, Affymetrix® Genome-Wide Human SNP Array 6.0, and Applied Biosystems® SOLiD™ 4hq. …And that’s my two SNPs.