Meta-analysis is an important tool to have in the bioinformatics toolbox, and the numbers speak for themselves: it is the fourth most requested feature for SVS, and a simple Google Scholar search for genetics + meta-analysis over 2014 and 2015 finds 17,300 results. Several meta-analysis utilities already exist that take the results from studies and perform the meta-analysis. Fingers crossed that you have all of the information you need, and in a usable format!
Let’s step back for a minute and talk about meta-analysis. What is it, and why should you consider it? Meta-analysis takes the results from existing studies and analyzes those results, not the original data. This approach is valuable when you want to compare different published results, or to compare results between different populations as an alternative to PCA correction or mixed-model analysis.
This differs from a standard genome-wide association study (GWAS) approach. A GWAS starts from the raw data, or at least the genotype calls, and filters the SNP markers and samples to remove low-quality calls, probes, or samples. Assuming you have enough cases and controls (or a quantitative phenotype), you can get good results from a GWAS. If you want to compare your results to another study, you have three options: you could just look at the p-values (hint: not a great idea); you could try to get your hands on the raw data and run it through the same process you used for your own data (great idea, but probably not going to happen); or you could analyze just the results, using a bit of extra information, and perform a statistical test to compare them. That statistical test is meta-analysis.
Of course, there is more than one type of meta-analysis test, and which test you run depends on what information you have available. All studies must supply the same type of input: either effect data (p-values) or inverse-variance-based inputs (odds ratios or effect sizes). If you use p-values, you will also need the effective sample sizes and the direction of the effect. With odds ratios you will also need the 95% confidence interval, and with effect sizes you will need the standard error. Within the inverse-variance inputs, however, you can mix and match odds ratios and effect sizes between studies. Assuming you have all of that information in the correct format, it is just a matter of entering the parameters you want.
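For the inverse-variance inputs, the arithmetic can be sketched as follows. This is a minimal illustration, not SVS's implementation: it assumes a fixed-effect model, converts each odds ratio and its 95% CI to a log odds ratio and standard error via SE = (ln(upper) - ln(lower)) / (2 x 1.96), and assumes any effect sizes being mixed in are already on the log-odds scale (for example, logistic regression betas). The function names and study values are hypothetical.

```python
import math
from statistics import NormalDist

# Multiplier behind a two-sided 95% confidence interval (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)


def from_odds_ratio(odds_ratio, ci_low, ci_high):
    """Convert an odds ratio and its 95% CI to (beta, se) on the log-odds scale.

    Assumes the CI was computed symmetrically on the log scale, as is usual
    for logistic regression output.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se


def inverse_variance_meta(estimates):
    """Fixed-effect inverse-variance meta-analysis.

    estimates: list of (beta, se) pairs, all on the same scale and for the
    same effect allele. Returns (pooled_beta, pooled_se, two_sided_p).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return pooled_beta, pooled_se, p


# Hypothetical inputs: two studies report an odds ratio with a 95% CI,
# one reports a log-odds effect size with its standard error.
studies = [
    from_odds_ratio(1.15, 1.01, 1.31),
    from_odds_ratio(1.10, 0.97, 1.25),
    (0.12, 0.06),
]
beta, se, p = inverse_variance_meta(studies)
print(f"pooled OR = {math.exp(beta):.3f}, SE(log OR) = {se:.4f}, p = {p:.2e}")
```

Each study is weighted by 1/SE², so larger, more precise studies pull the pooled estimate harder; a random-effects model would be the usual alternative when the studies are heterogeneous.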
For example, consider five studies, labeled A through E in Figure 1. The information available for the five studies is summarized in this chart:
Figure 1: Effect Data Output versus Inverse Variance Based Input
Not all of the studies have the same information available. Four of the five studies have the inverse-variance input information, whereas only three have all of the p-value information needed. Study C should therefore be excluded, and the meta-analysis should combine the odds ratio and effect size inputs from the remaining four studies.
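The selection logic above, which keeps whichever input scheme covers the most studies, can be sketched as a small check over each study's reported fields. The class, field names, and tie-breaking rule (prefer inverse variance on a tie) are assumptions for illustration, and the study values are made up so that the counts match the text; they are not the contents of Figure 1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StudyInputs:
    """Summary statistics a study reports for one SNP (missing = None)."""
    name: str
    p_value: Optional[float] = None
    sample_size: Optional[int] = None
    direction: Optional[str] = None          # "+" or "-"
    odds_ratio: Optional[float] = None
    ci_95: Optional[Tuple[float, float]] = None
    beta: Optional[float] = None
    se: Optional[float] = None

    def has_pvalue_inputs(self) -> bool:
        # p-value inputs need the p-value, sample size and effect direction
        return None not in (self.p_value, self.sample_size, self.direction)

    def has_inverse_variance_inputs(self) -> bool:
        # either an odds ratio with its 95% CI, or an effect size with its SE
        has_or = self.odds_ratio is not None and self.ci_95 is not None
        has_beta = self.beta is not None and self.se is not None
        return has_or or has_beta


def choose_scheme(studies: List[StudyInputs]) -> Tuple[str, List[str]]:
    """Pick the input scheme that keeps the most studies.

    Ties go to the inverse-variance scheme (an assumption for this sketch).
    """
    pvalue_ok = [s.name for s in studies if s.has_pvalue_inputs()]
    iv_ok = [s.name for s in studies if s.has_inverse_variance_inputs()]
    if len(iv_ok) >= len(pvalue_ok):
        return "inverse-variance", iv_ok
    return "p-value", pvalue_ok


# Made-up availability that mirrors the counts in the text:
# four studies with inverse-variance inputs, three with complete p-value inputs.
studies = [
    StudyInputs("A", p_value=0.03, sample_size=2000, direction="+",
                odds_ratio=1.2, ci_95=(1.02, 1.41)),
    StudyInputs("B", beta=0.10, se=0.07),
    StudyInputs("C", p_value=0.20, sample_size=800, direction="+"),
    StudyInputs("D", p_value=0.11, sample_size=1500, direction="+",
                odds_ratio=1.1, ci_95=(0.98, 1.24)),
    StudyInputs("E", beta=0.08, se=0.06),
]
scheme, kept = choose_scheme(studies)
print(scheme, kept)
```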
In addition to making sure that the required input is available, it is essential to make sure that the tested alleles are equivalent between studies. When the allele information is available, good meta-analysis programs can account for differences in the reported alleles, except in ambiguous cases such as A/T or C/G SNPs, where a strand flip cannot be distinguished from a swapped effect allele.
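Allele harmonization can be sketched roughly like this: a study's estimate is kept as-is when its effect allele matches a chosen reference, negated when the alleles are swapped (negating a log odds ratio is the same as taking 1/OR), checked again on the complementary strand, and dropped when the SNP is strand-ambiguous or the alleles do not match. This is a simplified, hypothetical helper for biallelic A/C/G/T SNPs only; real tools may also use allele frequencies to rescue some ambiguous SNPs.

```python
# Complementary bases, used to test for a strand flip.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(a1: str, a2: str) -> bool:
    """A/T and C/G SNPs read the same on both strands, so a strand flip
    cannot be told apart from a swapped effect allele."""
    return COMPLEMENT[a1] == a2


def align_beta(ref_effect, ref_other, effect, other, beta):
    """Re-express a study's beta (log odds ratio or effect size) in terms of
    the reference effect allele.

    Returns the aligned beta, or None when the SNP should be dropped
    (non-ACGT alleles, ambiguous strand, or alleles that do not match).
    """
    alleles = (ref_effect, ref_other, effect, other)
    if any(a not in COMPLEMENT for a in alleles):
        return None
    if is_ambiguous(effect, other):
        return None
    # Check the alleles as reported, then on the opposite strand.
    for e, o in ((effect, other), (COMPLEMENT[effect], COMPLEMENT[other])):
        if (e, o) == (ref_effect, ref_other):
            return beta       # same effect allele: keep the sign
        if (e, o) == (ref_other, ref_effect):
            return -beta      # swapped alleles: negate log OR / effect size
    return None


print(align_beta("A", "G", "G", "A", 0.2))   # swapped alleles
print(align_beta("A", "G", "T", "C", 0.2))   # opposite strand, same effect allele
print(align_beta("A", "T", "T", "A", 0.2))   # ambiguous A/T SNP, dropped
```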
Figure 2: Individually, none of the populations has a significant result for this gene, but because all of the studies show the same direction of effect, the combined meta-analysis yields an interesting finding.
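The effect shown in Figure 2, where individually non-significant results that agree in direction combine into a significant one, can be reproduced with the p-value inputs. The sketch below uses a sample-size-weighted Z-score (Stouffer-style) combination with square-root-of-N weights, similar to the sample-size-based scheme in tools such as METAL; whether SVS weights studies the same way is an assumption here. The function name and study values are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_meta(studies):
    """Sample-size-weighted Z-score meta-analysis.

    studies: list of (two_sided_p, sample_size, direction) with direction
    "+" or "-" relative to a common effect allele.
    Returns (combined_z, two_sided_p).
    """
    nd = NormalDist()
    weighted_sum = 0.0
    weight_sq_sum = 0.0
    for p, n, direction in studies:
        z = nd.inv_cdf(1 - p / 2)        # |Z| implied by the two-sided p-value
        if direction == "-":
            z = -z                       # sign comes from the effect direction
        w = math.sqrt(n)                 # weight by the square root of sample size
        weighted_sum += w * z
        weight_sq_sum += w * w
    combined_z = weighted_sum / math.sqrt(weight_sq_sum)
    combined_p = 2 * (1 - nd.cdf(abs(combined_z)))
    return combined_z, combined_p


# Hypothetical: four studies, none individually significant at 0.05,
# all pointing in the same direction.
same_direction = [(0.08, 1000, "+")] * 4
print(sample_size_meta(same_direction))

# The same p-values with conflicting directions cancel out.
mixed_direction = [(0.08, 1000, "+"), (0.08, 1000, "-")] * 2
print(sample_size_meta(mixed_direction))
```

The second call shows why the direction of effect is required alongside the p-values: without it, agreeing and conflicting studies would look identical.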
Now for the exciting news: meta-analysis is coming to Golden Helix’s SNP and Variation Suite (SVS) very soon! You will be able to take advantage of the plethora of data import options in SVS to compare studies or results in numerous formats, select your options in a dialog (no need to know command-line options!), and visualize your results in GenomeBrowse or export them to Excel to create a forest plot. (Forest plots in SVS will come later.)
Featured image: http://imgs.xkcd.com/comics/meta-analysis.png