While there is a lot of excitement in the industry about Next-Generation Sequencing, questions remain as to whether it will be “the answer” to the ills of the genetic research industry. I recently had the opportunity to sit down with three of Golden Helix’s thought leaders and ask them a few questions about NGS. Joining me were Gabe Rudy, Vice President of Product Development; Dr. Bryce Christensen, Statistical Geneticist and Director of Services; and Dr. Christophe Lambert, Chairman & CEO. Here’s what they had to say:
Jessica: Is the missing heritability going to be found in Next-Generation Sequencing (NGS) data?
Gabe: Maybe, maybe not. For a while now, it has been a challenge to find strong causal signals at single loci of common variants. So we are now looking for the compound effects of multiple loci, aggregating to the level of genes, or even taking a systems biology approach and testing for compound genetic signals in functional units larger than genes. Another theory is that complex diseases may not be explained by common variants, but may actually be an aggregation of smaller or more private mutations that present similarly in the clinic.
Bryce: The consensus seems to be that causal variants for complex diseases can only be discovered with huge case/control studies, which was also the prevailing belief in the GWAS era. So we shouldn’t panic if early sequencing efforts don’t immediately answer the outstanding questions for all of the complex diseases we are studying. We need to be patient and let the science evolve.
Gabe: Very true. There are many who believe there’s no point in doing analysis on complex disease with small numbers of samples. But that argument inherits some of the underlying assumptions of the GWAS era, which is that complex common diseases are mostly driven by common variants. Doing a GWAS-like single-probe test or even a gene-level test does require large sample sizes to provide enough statistical power.
The counter-argument is that there are things to learn from small family experiments with NGS data. Studies built around single-family analyses are reporting success stories and may contribute to our understanding of the genetic basis of complex traits. And there is a lot to learn about specific complex diseases with this study design, whether that means finding candidate genes for larger-scale studies or cataloging rare or private mutations that co-segregate with common mutations in interesting biological pathways.
Christophe: That’s an exciting thing in our marketplace. Because of cost drivers, there are going to be a lot of researchers doing small sequencing studies on families. And there are some new success stories of uncovering the causal variants for at least the rare diseases, if not some of the common ones, using small pedigrees.
Bryce: Collapsing tests make a lot of sense in theory, but have yet to be proven in practice. There is ongoing discussion in the field about the proper way to conduct collapsing tests, and new collapsing algorithms are being proposed on a regular basis. While people generally talk about collapsing within genes, there are other functional units in the genome that should be considered, like microRNAs. On a higher level, we may even consider aggregating variants over signaling pathways and families of genes.
Gabe: Right. When collapsing at the pathway or gene family level, you are still leaving a lot of the biology unanswered. Sometimes these networks will have a number of similarly functioning genes, and you need loss-of-function mutations in multiple functional groups to really affect the product of the pathway. But, of course, these are good problems to have, and if you can discover loss-of-function mutations in genes with a plausible biological story, you have taken a good step toward finding a causal mutation or set of mutations.
Christophe: On the causal side, I’d also tie in the idea of private mutations and using model organisms to do knock-out studies. So the question is, if the causes of diseases are so diverse and unique to the individual or pedigree, how are we going to do functional studies? And in the end, how is that going to get translated into the clinic?
Jessica: Going back to NGS studies, what are some issues you see today?
Gabe: On the bioinformatics side, I would say all of the effort in benchmarking is redundant and inefficient. Researchers are saying “I’ve tried five different indel callers and they’re all bad.” Well, every single one of them just took the output of BWA and tried to use it. Obviously they are all going to be constrained by the limitations of BWA’s alignment strategy for insertions and deletions, which is not its primary focus. To do better than that, you have to go with local realignment, which Complete Genomics has shown is far superior for accurately calling small insertions and deletions in DNA resequencing studies. Clearly, the informatics tools are not at the point where everyone can agree on them and move on.
Another thing is the trend toward integrative studies: integrating RNA, DNA, methylation, and sequencing data and throwing them all into one giant mess of analysis. Because it’s so much data, people sometimes try an undirected approach, such as machine learning, to make some sense of it. So far, this approach doesn’t look very promising for finding significant signals through the noise. At the moment, a directed or single-assay approach seems to be the better choice. Start with variant calls or RNA-Seq, and then pull in things like ChIP-Seq or Meth-Seq after you have done your first round of analysis to see whether they support your results or point out false positives.
Christophe: I see a lot of problems with experimental design and multiple testing in the individual studies, and often the most that emerges from a study is a promising pattern at close to the 0.05 significance level. So if the individual study designs aren’t working, combining different types of studies, with all of their attendant confounding and experimental design problems, isn’t likely to be as productive as proponents of integrative biology might hope.
Another piece of the puzzle is the technology and the current battle between long reads and short reads. In the long term, long reads are going to win, but short reads are having their day today. Interrogating biology thoroughly, and dealing with odd structural rearrangements and repeat regions, is near-intractable with short reads. So we haven’t even interrogated all parts of the genome well (including some of the most common variants in repeat regions), and we’re already saying, “Hey, the missing heritability – we don’t know where it is.”
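Christophe’s point about results hovering near the 0.05 level follows from standard multiple-testing arithmetic. The sketch below assumes independent tests and uses Bonferroni correction; the test counts (such as roughly 20,000 gene-level tests) are illustrative assumptions rather than figures from any study discussed here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among n_tests independent null tests."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_positives(alpha, n_tests):
    """Expected number of null tests passing an uncorrected threshold alpha."""
    return alpha * n_tests


for n_tests in (1, 20, 20_000):
    print(
        f"{n_tests:>6} tests: FWER at 0.05 = {familywise_error_rate(0.05, n_tests):.3f}, "
        f"expected false positives = {expected_false_positives(0.05, n_tests):g}, "
        f"Bonferroni threshold = {bonferroni_threshold(0.05, n_tests):.2e}"
    )
```

With only 20 independent tests at 0.05, the chance of at least one false positive is already about 64%, and at 20,000 tests the Bonferroni threshold drops to 2.5 × 10⁻⁶, which is why a lone result near 0.05 is weak evidence.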
Bryce: At IGES, in September, Ellen Wijsman pointed out in the opening session that NGS is really just another way to collect genotypes, and it doesn’t change the fact that the phenotypes are still complex. We’ve used all kinds of different technology over the years to gather genotypes. We shouldn’t be surprised that just having a different way to get genotypes isn’t answering the question. That message was repeated throughout the conference, and I heard a similar theme emerge at ASHG.
Jessica: So if it isn’t the answer to everything, what does the future of NGS look like?
Bryce: In the short term, I wonder if a year from now we’re going to see a “Back to the Future” moment in light of Affymetrix and Illumina both launching exome chips with extensive targeted rare content in coding regions. They are very inexpensive compared to exome sequencing and won’t have the bioinformatics overhead that we’re talking about with sequence data. I don’t know why anyone wouldn’t use them. It sounds like they are selling in huge numbers. Perhaps we will see the broader field put NGS aside for a while and try out the exome arrays while the power players in NGS work out the kinks.
Christophe: My question is – are we in the midst of a hype curve for next-gen sequencing? Is it going to plummet in the next year and then slowly pick back up, as we know many “hot” things do in our industry? You have this exaggerated sense of promise, and then we might have a reality check whereby a lot of the early findings on rare diseases are invalidated. And then we’ll be throwing up our hands again. Or is the promise really delivering so much that we are just going to ride up the hype curve, and we’re never going to have a dip? Does that ever happen?
Gabe: The discouraging part of jumping into NGS is that the bioinformatics is hard and the interpretation is hard. But there’s stuff being found, right? I don’t think the promise is being under-delivered. It’s probably just a mismatch of expectations. The people leading the charge with the biggest bullhorns are the platform providers and the large genome institutes, and they certainly make it look easy! But it is hard.
Bryce: You look at some of the influential early papers coming out on whole genome NGS, and they really gloss over how hard it was to do the bioinformatics part. It’s no surprise that so many new bioinformatics services companies seem to be popping up, given the difficult learning curve involved. The workflows will undoubtedly become more standardized and streamlined over time. I mentioned earlier that we need to be patient as the science continues to evolve, and this is a big part of that.
Jessica: What methods are being used for sequence data analysis today?
Gabe: It’s pretty clear the VAAST-style workflow is getting a lot of attention. VAAST ranks your variants for you with a lot of things happening under the hood to account for functional prediction, population frequency, and other annotations. And it incorporates family structure and inheritance to whatever degree possible. On the downside, it’s a big black box, and that is a turn-off for a certain type of investigator who wants to ensure those steps and those criteria are in line with their understanding of their experiment.
Christophe: One of the things to keep an eye on is IBD (identity by descent) approaches being merged with sequence data, where the idea is that, ultimately, everybody is related. So these little chunks of DNA inherited from a shared ancestor are going to be very informative and can be used in the filtering process. That’s how IBD methods are going to bridge the two worlds of large enough pedigrees and huge case/control studies.
If I were to bet on where some breakthroughs are going to happen in bioinformatics, it would be some sort of merging of IBD, sequencing, and perhaps coalescent ancestry models. All that will eventually come together, and we are going to get much better disease models and models of where these diseases originated. So it’s going to be relevant not just to ancestry, but also to disease. Phasing and imputation are other areas where this is relevant.
I do see a lot of talk about phasing, especially because Complete Genomics does local phasing with all their variant calls, but I haven’t seen anyone really using that information. Take a family of four: you can actually map out every single recombination point in the genome (about 20 or 30 per generation), and that, alongside your disease and mutation information, can give you a lot of insight.
For example, suppose you have two heterozygous mutations in a gene. If you know the alternate alleles are on the same chromosome (in cis), then they will both end up in the same transcript and protein product, and the other copy of the gene is left intact. If they’re on different chromosomes (in trans), you are potentially creating two different mutant transcripts and two different proteins, with no unaffected copy remaining.
Bryce: Phasing really can give you more insight, particularly in family studies. This is a place where I see longer sequence reads eventually making a big difference. The longer reads you can get, the better you can determine your phasing. If we can accurately determine phase, we can do things like identify compound heterozygotes without needing to sequence parents, or, as Christophe mentioned, identify shared IBD segments more accurately in distantly related individuals.
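The cis/trans distinction behind compound heterozygotes is simple to express once genotypes are phased. The sketch below assumes VCF-style genotype strings, where `|` marks a phased call and `/` an unphased one, and assumes both sites fall in the same phase block; the function names are hypothetical, and this is an illustration rather than how Complete Genomics or any particular tool reports phase.

```python
def parse_phased_gt(gt):
    """Parse a VCF-style diploid genotype string such as '0|1'.

    Returns (allele_on_haplotype_1, allele_on_haplotype_2) for phased calls,
    or None when phase is unknown ('0/1') or an allele is missing ('.').
    """
    if "|" not in gt or "." in gt:
        return None
    first, second = gt.split("|")
    return int(first), int(second)


def cis_or_trans(gt_site1, gt_site2):
    """Classify two heterozygous biallelic sites in one gene as cis or trans.

    Assumes both calls come from the same phase block, so haplotype 1 at
    one site is the same chromosome copy as haplotype 1 at the other.
    """
    site1, site2 = parse_phased_gt(gt_site1), parse_phased_gt(gt_site2)
    if site1 is None or site2 is None:
        return "unknown"
    if sorted(site1) != [0, 1] or sorted(site2) != [0, 1]:
        raise ValueError("both sites must be biallelic heterozygotes")
    # Which haplotype (0 or 1) carries the alternate allele at each site.
    if site1.index(1) == site2.index(1):
        return "cis"  # same chromosome copy: the other copy is intact
    return "trans"  # one hit on each copy: candidate compound heterozygote
```

For example, `cis_or_trans("0|1", "1|0")` classifies the pair as trans, while an unphased call such as `"0/1"` yields `"unknown"`, which is exactly the case where parental genotypes or longer reads are needed to resolve phase.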
Christophe: In addition to speculating on future innovation, it is also worthwhile to characterize the current genetic research market by observing the distribution of posters at ASHG/ICHG. From what I could see, the majority of them are case reports of a rare disease, often accompanied by a photograph of the affected individual and perhaps a family pedigree analysis. If you think about the progression of science, it goes from classification and categorization to correlation and then causation. A large share of people in genetic research are making case reports, classifying what they see, and will be working toward understanding correlations and functional causes later.
Bryce: As one of my former professors likes to say, the science of biology has been little more than the process of naming things for most of the last 10,000 years. We can’t get away from it, because we are still finding so many new things.
Jessica: Any final thoughts regarding NGS?
Gabe: NGS is inevitably going to be more and more relevant to clinical workflows, and the analysis is not getting easier. But a huge swell of momentum is pushing things forward at an incredible pace.
Bryce: The future will definitely be exciting and many great discoveries will be made, but we need to keep our expectations realistic and not expect too much too soon. Next-gen sequencing is far from being a mature science. It’s still in the embryonic stage. I think there are going to be a few more generations added on to “next-gen,” whether it’s “next-next-gen” or “next-next-next-gen.” The technology isn’t quite there yet, particularly in regard to the data analysis and informatics. Our high-throughput data generation capabilities have been outpacing our data analysis capabilities at least since the birth of GWAS, but we are catching up.
Christophe: As I look at the onward march of humanity trying to make progress, it is so frustrating to see the incremental nature of it and humanity’s inability to let go of old paradigms and deal with uncertainty and complexity. I’m not seeing systemic coordination to really get us to where we want to go. It seems disconnected and scattered. We’re at risk of continuing to be much slower than we could be because science isn’t stepping back and looking at the process. Still, I’m tremendously optimistic about the power of NGS technologies, and we are betting our future on their use becoming ubiquitous for both research and the clinic.
…And that’s our 2 SNPs.®