Late last month I had the opportunity to attend one of my favorite events: the annual meeting of the International Genetic Epidemiology Society (IGES). This year’s conference was held in Vienna, in conjunction with the Genetic Analysis Workshop (GAW) and the International Society for Clinical Biostatistics (ISCB). The program at IGES this year was very diverse, with content ranging from pharmacogenomics to risk prediction to microbiomics and beyond.
The session on risk prediction, held jointly with ISCB on August 28th, set the theme of the conference for me. Two talks in particular, by Joan Bailey-Wilson and Bertram Müller-Myhsok, really made me think about what elements are required to successfully implement a predictive model based on genetic data, and I listened to the rest of the conference with this theme in mind.
Dr. Müller-Myhsok spoke in detail about data quality, reinforcing the fact that you can never be too careful about data accuracy, batch effects, and confounding factors. Risk predictions are only as good as the data upon which they are based. It was exciting to hear an audience member quote from a paper by our company founder, Christophe Lambert, during the Q&A session. The questioner suggested (correctly, IMO) that confounding factors would be less of a problem if we all followed better study design principles. In response, Dr. Müller-Myhsok admitted that poor study design is endemic in genetics, especially with regard to data sharing and the resulting batch effects. Those who follow this blog and our webinars know how passionate we are about data quality and good study design at Golden Helix.
Dr. Bailey-Wilson spoke about the four fundamental issues of any clinical predictive test: analytic validity, clinical validity, clinical utility, and ELSI (ethical, legal, and social implications). The data-quality issues mentioned above relate mainly to analytic validity. Of the remaining issues, I see clinical utility as the most important for genetic tests in complex diseases. A predictive test isn’t going to have much clinical utility unless it can accurately explain a high proportion of disease variability (perhaps 60-80% or more), at least within a defined group of people. BRCA1 and BRCA2 testing for breast cancer is useful because mutations in those genes have very high penetrance, or positive predictive value. But no such high-penetrance genes are known for many other complex diseases. Heart disease is an interesting example. In a 2010 Lancet paper, researchers built a risk score based on carefully selected and highly significant GWAS SNPs, only to find that the “genetic risk score did not improve risk discrimination” over traditional methods. The effect sizes of the SNPs in the model were simply too small to be useful for prediction. Similar challenges have been encountered in many other complex diseases.
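To get a feel for why small effect sizes limit discrimination, here is a minimal simulation sketch. It is not the method from the Lancet paper; it simply assumes a hypothetical cohort in which every SNP has the same per-allele odds ratio, builds a weighted genetic risk score (risk-allele counts times log odds ratios), and measures discrimination as the area under the ROC curve (AUC), computed with the Mann-Whitney statistic. The function names, the 13-SNP default, the allele frequency, and the logistic disease model are all my own illustrative assumptions.

```python
import math
import random


def genetic_risk_score(dosages, log_odds_ratios):
    """Weighted GRS: risk-allele counts (0, 1 or 2) times per-allele log odds ratios, summed."""
    if len(dosages) != len(log_odds_ratios):
        raise ValueError("need exactly one weight per SNP")
    return sum(d * w for d, w in zip(dosages, log_odds_ratios))


def auc(case_scores, control_scores):
    """ROC AUC via the Mann-Whitney statistic: P(case score > control score), ties count 1/2."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def simulate_auc(n_snps=13, odds_ratio=1.1, allele_freq=0.3,
                 n_people=2000, baseline_logit=-2.0, seed=1):
    """Hypothetical cohort in which every SNP has the same per-allele odds ratio.

    Disease status is drawn from a logistic model whose linear predictor is
    the centered GRS, so baseline_logit roughly sets the prevalence.
    Returns the AUC of the GRS for separating cases from controls.
    """
    rng = random.Random(seed)
    weights = [math.log(odds_ratio)] * n_snps
    mean_score = sum(2 * allele_freq * w for w in weights)  # expected GRS under HWE
    cases, controls = [], []
    for _ in range(n_people):
        # Each genotype is two independent draws of the risk allele.
        dosages = [sum(rng.random() < allele_freq for _ in range(2))
                   for _ in range(n_snps)]
        score = genetic_risk_score(dosages, weights)
        p_disease = 1.0 / (1.0 + math.exp(-(baseline_logit + score - mean_score)))
        (cases if rng.random() < p_disease else controls).append(score)
    return auc(cases, controls)


if __name__ == "__main__":
    for odds_ratio in (1.1, 1.5, 3.0):
        print(f"per-allele OR {odds_ratio}: AUC = {simulate_auc(odds_ratio=odds_ratio):.3f}")
```

With per-allele odds ratios around 1.1, typical of GWAS hits, the AUC in this toy setup stays only modestly above the coin-flip value of 0.5, while much larger effects push it far higher. The exact numbers depend entirely on the assumed parameters.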
So what can be done? We can’t change nature (there is a reason that we call them complex diseases), but we can certainly rethink our approach to the problem. I think that we can start by considering the message delivered the following day by Françoise Clerget-Darpoux. She did not mince words as she spoke about proper study design and reminded us all to think carefully about the data that we use. For example, GWAS is based on tag-SNPs. Tag-SNPs are great for finding susceptibility loci, but they are very poor for modeling gene effects, because they are usually only correlated with, not identical to, the causal variant. In terms of prediction, tag-SNPs may be several steps removed from the biological process, meaning that any models based on those SNPs will be noisy at best. Risk models should ideally be based on functional variants or other biomarkers that are better predictors of the underlying biology. I think that we all know this, which is why we continue to pursue next-gen sequencing and other advanced technologies.
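The cost of working through a tag-SNP can be illustrated with another small simulation sketch. It assumes a hypothetical LD model in which, on each haplotype, the tag allele copies the causal allele with probability 1 - mismatch and is otherwise an independent draw with the same frequency, which gives a tag-causal correlation of r = 1 - mismatch. Only the causal variant affects the (made-up, additive) trait. The function names and all parameter values are illustrative assumptions, not anyone's published model.

```python
import random


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_tag_vs_causal(n_people=5000, allele_freq=0.3, mismatch=0.2,
                           effect=0.5, seed=7):
    """Hypothetical LD model: on each haplotype the tag allele copies the causal
    allele with probability 1 - mismatch, otherwise it is an independent draw
    with the same frequency, so the tag-causal correlation is r = 1 - mismatch.

    Returns the trait variance explained (r^2) by the causal SNP and by the tag SNP.
    """
    rng = random.Random(seed)
    causal, tag, trait = [], [], []
    for _ in range(n_people):
        c_count = t_count = 0
        for _ in range(2):  # two haplotypes per person
            c = rng.random() < allele_freq
            t = c if rng.random() >= mismatch else rng.random() < allele_freq
            c_count += c
            t_count += t
        causal.append(c_count)
        tag.append(t_count)
        # Additive trait: only the causal variant has a biological effect.
        trait.append(effect * c_count + rng.gauss(0.0, 1.0))
    return pearson_r(causal, trait) ** 2, pearson_r(tag, trait) ** 2


if __name__ == "__main__":
    r2_causal, r2_tag = simulate_tag_vs_causal()
    print(f"variance explained: causal {r2_causal:.3f}, tag {r2_tag:.3f}")
```

Under this model the variance explained at the tag is roughly the LD r² (here about 0.64) times the variance explained at the causal variant, so a predictive model built on many tag-SNPs inherits that loss at every locus.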
Our field continues to move forward. New data-generation technologies and analysis methods appear constantly, and our understanding of complex diseases continues to improve as well. At Golden Helix, we are trying to keep abreast of advances in the field. Please contact us if you have questions about how to approach your research data. We always like to hear about what you are working on and the challenges that you face. We might even be able to offer some help, either through our software products or custom analysis services. And that’s … my 2 SNPs.