My investigation into my wife’s rare autoimmune disease
I recently got invited to speak at the plenary session of AGBT about my experience in receiving and interpreting my Direct to Consumer (DTC) exomes. I’ve touched on this before in my post discussing my own exome and a caution for clinical labs setting up a GATK pipeline based on buggy variants I received in an updated report.
But I haven’t had a chance to discuss the potentially most interesting member of my exome trio: my wife.
While my exome analysis falls squarely in the “narcissisome” camp of investigating a healthy individual with no expectation of finding highly penetrant functional alleles, I have a meaningful and nuanced question to ask of my wife’s exome: Can exome data provide a plausible genetic story about the pathology of a complex disorder like an autoimmune disease?
While I commented previously about the incomplete picture we have after the “GWAS era” in complex diseases, it turns out Rheumatoid Arthritis is arguably one of the success stories.
The overall heritability of RA is estimated at 60%, and after many GWAS studies, recent meta-analyses pin the portion of the heritability we can account for around 50%.
My wife, however, was diagnosed not with RA but with Juvenile Idiopathic Arthritis (JIA). Although the two end up being classified and treated similarly, JIA is defined by early onset, with symptoms first diagnosed before the age of 16.
Interestingly, the “Idiopathic” in JIA means literally that we don’t understand the fundamental cause of the onset of the rheumatic symptoms for patients.
So my first thought was to follow a path similar to what I took with my own exome and look for rare, potentially damaging variants of high effect in my wife’s exome.
But I knew this would most likely result in quite a few candidate variants, and I wanted to focus on variants that had some plausible phenotypic story.
In particular, were any of these variants in genes directly associated with JIA or maybe in a suggestive pathway relationship with JIA associated genes?
Rare Variant Analysis with Biological Context in Ingenuity Variant Analysis
Ingenuity is well known for having expended a huge amount of effort curating literature to build up a fairly comprehensive repository of gene network connections to phenotypes.
Ingenuity’s cloud-based variant analysis tool does the same type of variant filtering I went through with my own exomes in SVS, but it can also look at variants in the context of a disease if they have curated literature for it.
It turns out, Ingenuity had Juvenile Idiopathic Arthritis in their database.
Applying the common filters with the additional “Biological Context” filter that looks at variants directly or “1 hop” away from genes curated to be involved in JIA, I had a very manageable list of 36 variants to look through.
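The filtering idea can be sketched in a few lines of Python. This is only an illustration of the logic (rare, predicted damaging, and in or one interaction away from a disease gene), not Ingenuity’s actual implementation. The gene names, the interaction list, and the 1% frequency cutoff are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    allele_freq: float  # population allele frequency, between 0 and 1
    damaging: bool      # True if some predictor flags the variant as damaging


def one_hop_genes(disease_genes, interactions):
    """Return the disease genes plus every gene that directly interacts with one."""
    disease_genes = set(disease_genes)
    nearby = set(disease_genes)
    for gene_a, gene_b in interactions:
        if gene_a in disease_genes:
            nearby.add(gene_b)
        if gene_b in disease_genes:
            nearby.add(gene_a)
    return nearby


def filter_variants(variants, disease_genes, interactions, max_freq=0.01):
    """Keep rare, predicted-damaging variants in or one hop from a disease gene."""
    context = one_hop_genes(disease_genes, interactions)
    return [
        v for v in variants
        if v.allele_freq <= max_freq and v.damaging and v.gene in context
    ]


# Hypothetical toy data: placeholder names, not real gene symbols.
disease_genes = {"DISEASE_GENE_1"}
interactions = [
    ("DISEASE_GENE_1", "NEIGHBOR_GENE"),
    ("NEIGHBOR_GENE", "DISTANT_GENE"),
]
variants = [
    Variant("NEIGHBOR_GENE", 0.0004, True),   # rare, damaging, one hop away
    Variant("DISTANT_GENE", 0.0002, True),    # two hops away
    Variant("DISEASE_GENE_1", 0.2, True),     # too common
    Variant("NEIGHBOR_GENE", 0.001, False),   # not predicted damaging
]
kept = filter_variants(variants, disease_genes, interactions)
```

Only the first toy variant survives all three filters, which is the same kind of narrowing that takes a full exome down to a short list.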
Most of these variants, when pulled up in GenomeBrowse, turn out to be relatively benign or only tentatively related to JIA, but there were certainly some rare variants with a plausible story about how they might confer risk of an autoimmune disease.
One of the most interesting stories is a heterozygous missense mutation in NFKB1. What makes it possibly pathogenic is that NFKB1 has documented evidence of being haploinsufficient such that a heterozygous mutation could result in functional changes in its downstream genes.
In this case, NFKB1 interacts with 7 different genes annotated as being associated with JIA. The allele frequency of this mutation is quite rare and it never occurs in the homozygous state in the NHLBI cohort. Interestingly enough, the allele frequency is within the range of the prevalence of JIA in the population, which is between 8 and 150 per 100,000.
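To make that comparison concrete, here is a minimal sketch that converts the quoted prevalence range into fractions and checks whether an allele frequency falls inside it. The frequency used below is a made-up placeholder, not the real value for this NFKB1 variant.

```python
# Prevalence range of JIA quoted above, converted to fractions of the population.
PREVALENCE_LOW = 8 / 100_000     # 0.00008
PREVALENCE_HIGH = 150 / 100_000  # 0.0015


def within_prevalence_range(allele_freq):
    """True if the allele frequency lies inside the quoted JIA prevalence range."""
    return PREVALENCE_LOW <= allele_freq <= PREVALENCE_HIGH


# Hypothetical allele frequency, for illustration only.
example_allele_freq = 0.0005
example_in_range = within_prevalence_range(example_allele_freq)
```

A match like this is only suggestive: an allele frequency counts chromosomes while prevalence counts people, so the two numbers are not directly equivalent.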
The trick in interpreting this variant, as is the case with most variants, is that there is no documented direct relationship between this gene and JIA, let alone autoimmune disorders in general, and there are no clinical annotations for this individual mutation. Without that, all we can say is that this is an interesting putative variant with a suggestive pathway.
It’s also worth noting that my wife has no rare potentially damaging variants in any of the JIA associated genes directly.
No Rare Variants. Where Else Can We Look for that Heritability?
You might have noticed that quite a few variants in my IVA screenshot have gene names starting with HLA.
It’s not surprising that the MHC region (containing the HLA genes) has a lot of mutations (it’s the most polymorphic region of the human genome), but it’s also not surprising to see HLA genes, which encode a cell’s immune response, in a biological filter for an autoimmune disease.
In fact, this paper claims that at least for RA, most of the heritability of the 32 genes in the HLA region can be attributed to the HLA-DRB1 and HLA-DPB1 genes.
Compared to the common SNPs associated with RA, which may each explain only 0.1-0.5% of the heritability of the disease, the MHC region as a whole is expected to explain a whopping 37%.
So can we look at these many rare frameshift or missense mutations in DRB1 and learn something about my wife’s genetic risk?
As you can see from the above GenomeBrowse screenshot, this region is extremely problematic from an alignment and variant calling perspective. The read coverage of 500-1,000x is way higher than the 100x average and half the reads are filtered (shown by the light gray hashed area) because DRB1 is known to have many pseudogenes with high homology and those reads map to multiple genomic loci.
More importantly, talking about the variants between the human reference sequence and one’s own exome or genome for the HLA genes is not that meaningful. There are actually 7 “alternate loci” defined by the GRCh consortium for the MHC region of which only one is used in the canonical chromosome 6 reference we aligned to.
But even accounting for the alternative loci is not the way to go if you want to take advantage of the research done on specific HLA gene alleles and the risk associated with them.
For example, I found the following HLA Gene “types” to have JIA specific predictions:
- HLA-A*02 predisposes to early-onset JIA
- HLA-DRB1*08 and HLA-DPB1*03 predispose to poly RF negative JIA (my wife’s subtype)
If you don’t recognize this <gene name>*<type> syntax, it turns out that for the 32 HLA genes, these “types” are defined by the IMGT/HLA database, and there are nearly 9,000 unique cataloged allele sequences for these 32 genes.
The classification is hierarchical where a two digit code like HLA-A*02 specifies a set of similar proteins encoded by variants of the HLA-A gene in the population, while a four-digit code such as HLA-A*02:01 uniquely specifies a protein encoded by HLA-A.
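The hierarchy is easy to see if you split an allele name into its colon-separated fields. Here is a minimal sketch of a parser, assuming names of the form HLA-<gene>*<field>:<field>... as used in this post. The function and class names are my own, and the sketch ignores the optional suffix letters that some official allele names carry.

```python
import re
from typing import NamedTuple, Optional, Tuple


class HlaAllele(NamedTuple):
    gene: str                 # e.g. "A" or "DRB1"
    fields: Tuple[str, ...]   # e.g. ("24", "02", "01", "01")

    @property
    def two_digit(self) -> str:
        """The allele group, e.g. HLA-A*02."""
        return f"HLA-{self.gene}*{self.fields[0]}"

    @property
    def four_digit(self) -> Optional[str]:
        """The specific protein, e.g. HLA-A*02:01, or None if only a group was given."""
        if len(self.fields) < 2:
            return None
        return f"HLA-{self.gene}*{self.fields[0]}:{self.fields[1]}"


def parse_hla_allele(name: str) -> HlaAllele:
    """Split an allele name such as HLA-DRB1*04:07:01 into gene and fields."""
    match = re.fullmatch(r"HLA-([A-Z0-9]+)\*(\d+(?::\d+)*)", name)
    if match is None:
        raise ValueError(f"unrecognized HLA allele name: {name!r}")
    return HlaAllele(gene=match.group(1), fields=tuple(match.group(2).split(":")))


example = parse_hla_allele("HLA-A*02:01")
```

With this, `example.two_digit` gives the group-level name and `example.four_digit` the protein-level name, which is the resolution at which most published risk associations are reported.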
Well, given I have exome data, I wasn’t sure whether I could get to this type of HLA “type” data for my wife. PCR and protein-based assays are often used to get these types for samples, but I remembered chatting with some folks from an informatics company named Omixon that they were doing this with NGS data.
I fired off an email and soon got confirmation from their CTO, Tim Hague, that they could do HLA typing with NGS data and have just started handling exomes with great success.
Better yet, he’d be happy to run my wife’s data for this region through their latest software!
Given they are in Central European Time, I stayed up late chopping up my wife’s BAM file and converting it back to FASTQ and got it uploaded to them at around 11:30 pm.
By the time I was kicking my feet out from under the covers, I saw emails on my phone from them that had arrived “3 hours ago”.
Now that’s a quick turn-around time!
It’s neat to see how Omixon’s technology works. In most cases, like the HLA-DRB1 gene for my wife, there is a very clear winner in terms of coverage and quality of all the hundreds of known allele sequences for each of her chromosomes.
HLA alleles for risk-associated HLA genes:
|HLA gene|Allele 1|Allele 2|
|HLA-A|HLA-A*24:02:01:01|HLA-A*03:01:01:01|
|HLA-DRB1|HLA-DRB1*04:07:01|HLA-DRB1*14:54:01|
It is very cool to have HLA types, and to have them out to this many digits (beyond four digits, the extra digits specify intronic and UTR differences between alleles). But when comparing these to the risk alleles for JIA I showed above, we are out of luck: her HLA-A alleles (*24 and *03) and HLA-DRB1 alleles (*04 and *14) include neither HLA-A*02 nor HLA-DRB1*08.
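The comparison itself is simple enough to script. This sketch truncates each typed allele to its two-digit group and checks it against the JIA risk types listed earlier. The allele names come from this post; the helper names are my own.

```python
def allele_group(allele):
    """Reduce a full allele name to its two-digit group (HLA-A*24:02:01:01 -> HLA-A*24)."""
    return allele.split(":")[0]


def gene_of(name):
    """Return the gene part of an allele or group name (HLA-A*24 -> HLA-A)."""
    return name.split("*")[0]


def compare_to_risk(typed_alleles, risk_groups):
    """Sort risk groups into carried, absent, and not assessable from the typed genes."""
    typed_groups = {allele_group(a) for a in typed_alleles}
    typed_genes = {gene_of(a) for a in typed_alleles}
    carried = [r for r in risk_groups if r in typed_groups]
    absent = [r for r in risk_groups
              if gene_of(r) in typed_genes and r not in typed_groups]
    untyped = [r for r in risk_groups if gene_of(r) not in typed_genes]
    return carried, absent, untyped


typed = [
    "HLA-A*24:02:01:01", "HLA-A*03:01:01:01",
    "HLA-DRB1*04:07:01", "HLA-DRB1*14:54:01",
]
risk = ["HLA-A*02", "HLA-DRB1*08", "HLA-DPB1*03"]
carried, absent, untyped = compare_to_risk(typed, risk)
```

Here `carried` comes back empty, `absent` holds the HLA-A and HLA-DRB1 risk types, and `untyped` holds HLA-DPB1*03, since no HLA-DPB1 result is listed in the table.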
Put the Exomes on the Shelf, Pull Out Good Ol’ Genotype Array Data
While exomes give us great detail on rare and even private mutations, most of the literature on complex diseases is focused on the findings of Genome-Wide Association Studies (GWAS) that are looking at the genotypes of common variants for large cohorts of case/controls.
These common variants are almost entirely located between genes (often 1-100kbp upstream) with the hypothesis that they are “tagging” a region containing a causal variant that is most likely involved in regulating the transcription and expression of the associated genes.
You can see in the GenomeBrowse view of the HLA region above, for example, that the genotype array SNPs show up between genes and don’t overlap with the exome coverage and exome variant set.
While there have been many GWAS studies on RA, only recently have we started to see publications presenting associations between common SNPs and JIA.
For example, just last year in 2012, a group from the UK Epidemiology Unit published their latest findings of JIA associations and compared them to RA associations.
Since I had my full set of 560K common variant genotypes from 23andMe as part of their standard service, I thought I might try to imitate their key feature of aggregating the risk predictions of GWAS studies and build a JIA specific genetic risk prediction for my wife!
Well, this is definitely more tricky than it looks.
First, one has to find the genotypes for the published SNPs, or see if you can impute your genotype (potentially by hand) if you don’t have the reported RSID in your dataset, which I had to do for about 50% of the reported SNPs. You then flip the Odds Ratios for the SNPs where you do not have the minor allele (which is not always the alternate allele, as many of these common SNPs have the minor allele as the reference). Then you need to pull out your statistics textbook (or bug your statistician colleague) and figure out how to combine all the Odds Ratios listed in these papers using a Cochran–Mantel–Haenszel statistic. Finally, there are still a lot of choices to make about which lists of SNPs from these papers to use and how to compare or aggregate these predictions from multiple papers.
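To give a feel for the bookkeeping, here is a rough sketch of the flipping and combining steps. Note that it is not the Cochran–Mantel–Haenszel approach mentioned above. It swaps in a simpler log-additive model that multiplies per-allele Odds Ratios, relative to a person carrying zero copies of each reported allele. The RSIDs, alleles, and Odds Ratios are hypothetical placeholders, and the sketch ignores strand issues and assumes the SNPs act independently.

```python
import math


def flip_odds_ratio(odds_ratio):
    """Re-express an Odds Ratio reported for one allele in terms of the other allele."""
    return 1.0 / odds_ratio


def combined_odds_ratio(snps, genotypes):
    """Multiply per-allele Odds Ratios by the number of reported alleles carried.

    snps:      list of dicts with "rsid", "allele" (the allele the OR refers to), "or"
    genotypes: dict mapping rsid -> two-letter genotype such as "AG"
    SNPs missing from genotypes are skipped (they would need imputation).
    """
    log_total = 0.0
    for snp in snps:
        genotype = genotypes.get(snp["rsid"])
        if genotype is None:
            continue
        copies = genotype.count(snp["allele"])  # 0, 1, or 2
        log_total += copies * math.log(snp["or"])
    return math.exp(log_total)


# Hypothetical published SNPs and a hypothetical genotype file.
published = [
    {"rsid": "rs_example_1", "allele": "A", "or": 1.2},
    {"rsid": "rs_example_2", "allele": "G", "or": 0.8},
    {"rsid": "rs_example_3", "allele": "T", "or": 1.5},  # not on the array
]
my_genotypes = {"rs_example_1": "AG", "rs_example_2": "GG"}
score = combined_odds_ratio(published, my_genotypes)
```

In this toy case one copy of the first allele and two copies of the second give 1.2 × 0.8 × 0.8, and the third SNP is skipped because it is missing from the genotype data. Which baseline you measure against, and which SNP lists you include, changes the answer, which is part of why these aggregate predictions are so slippery.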
Looking at this 2012 paper from the UK Epidemiology Unit, there is a mixed set of signals, but I found the PTPN2 risk allele interesting, as a section in another GWAS of JIA published by CCHMC in 2010 mentioned that PTPN2 knock-out mice show an increase in TNFα. That’s phenotypically relevant here because my wife did not respond to the standard treatment of methotrexate for her arthritis, but her symptoms are managed beautifully by the TNF inhibitor Enbrel.
But when I re-did this type of analysis with the CCHMC 2010 GWAS I just mentioned and their follow-up paper in 2012, I got very inconsistent and inconclusive results.
Really, what I suspect is going on here is the phenotypic heterogeneity of JIA is making these GWAS results very variable in how they apply to a distinct subphenotype like my wife’s polyarticular RF negative JIA.
Fewer than half of the individuals in this study have polyarticular JIA (affecting five or more joints) and the rest have pauciarticular JIA (affecting four or fewer joints).
While that was necessary to get to the sample sizes required to have genome-wide significant statistics, pauciarticular JIA is thought to be only 40% heritable and potentially have a very different genetic architecture!
The Exome May Still Bear Fruit for Polygenic Diseases, But We Might Need Whole Genomes
I honestly think that there is some meaningful signal in my wife’s exome. But until there is more research done on autoimmune disorders, or JIA specifically, that includes rare content, it’s very hard to know if rare loss of function or potentially damaging mutations are likely to play a role in the pathogenesis of her JIA.
Unfortunately, so far there have been very few success stories of rare variants contributing to the genetic architecture of autoimmune diseases. Pretty much the only one to speak of is a rare variant in IFIH1 that protects against Type 1 Diabetes.
It’s very possible most of the lower penetrance but individually significant alleles for these disorders are causal mutations in regulatory regions for these associated genes. In most cases the causal allele will be tagged by a GWAS SNP but not caught by a genotype array directly.
With more affordable whole-genome sequencing and much better definitions of the functional regulatory regions for genes from the ENCODE project, there may still be more to this genetic story to come.