Q&A from our December Genomic Prediction webcast

Our December Genomic Prediction webcast covered using Bayes-C Pi and Genomic Best Linear Unbiased Predictors (GBLUP) to predict phenotypic traits from genotypes, with the goal of identifying the plants or animals with the best breeding potential for desirable traits.

The webcast generated a lot of good questions, as our webcasts generally do, so I decided to start sharing these Q&A sessions with the community. If the questions below spark new questions or need clarification, feel free to get in touch with us at [email protected].

Question: Does the program (SNP & Variation Suite (SVS)) allow fitting fixed effects in GBLUP?

Answer: Absolutely, yes. There is an option to add covariates to the model, and any numeric, binary, or categorical variable can be accounted for in that manner.
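
To make the idea of fixed effects in GBLUP concrete, here is a minimal sketch of the underlying mixed model, y = Xb + g + e, where X holds the covariates and g is the genomic effect with covariance proportional to a relationship matrix G. This is not the SVS implementation. The function name and the assumption that the variance components are already known are mine, for illustration only; in practice the variance components are estimated from the data (for example by REML).

```python
import numpy as np

def gblup_with_fixed_effects(y, X, G, var_g, var_e):
    """Solve y = X b + g + e with g ~ N(0, var_g * G), e ~ N(0, var_e * I).

    Hypothetical helper: var_g and var_e are treated as known here.
    Returns the fixed-effect estimates and the predicted genomic effects.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    G = np.asarray(G, dtype=float)
    n = y.shape[0]

    # Phenotypic covariance implied by the model.
    V = var_g * G + var_e * np.eye(n)

    # Generalized least squares estimate of the fixed effects.
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)

    # Best linear unbiased prediction of the genomic effects.
    resid = y - X @ beta
    g_hat = var_g * G @ np.linalg.solve(V, resid)
    return beta, g_hat
```

A categorical covariate would be expanded into indicator (dummy) columns of X before calling a function like this; a numeric or binary covariate can go in as a single column.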

Question: Is it possible to use several phenotypes in the analysis and get prediction based on a combination of those, potentially with specific weighting?

Answer: This is a bit more of a challenge. The best available option would be to incorporate those additional phenotypes as covariates.

Question: Does the Bayesian method as implemented in SVS account for relatedness?

Answer: Yes. Similar to GBLUP, the Bayesian method incorporates the genomic relationship matrix.

Question: Are options available for calculating the relationship matrix?

Answer: Yes, we have a few different methods at this point: identity-by-state matrices or identity-by-descent matrices. There is also a G matrix or an A matrix.
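
As an illustration of what a G matrix looks like in practice, here is a minimal sketch of one common formulation (VanRaden's first method): center the allele counts by twice the allele frequency, then scale the cross-product. Whether SVS computes its G matrix in exactly this way is an assumption on my part, and the function name is hypothetical. The sketch also assumes genotypes coded as 0/1/2 allele counts with no missing values.

```python
import numpy as np

def vanraden_g_matrix(genotypes):
    """Genomic relationship matrix from an (n_samples, n_markers) array.

    Assumes allele counts coded 0/1/2, no missing values, and at least
    one polymorphic marker (otherwise the scaling term is zero).
    """
    M = np.asarray(genotypes, dtype=float)

    # Observed frequency of the counted allele at each marker.
    p = M.mean(axis=0) / 2.0

    # Center each marker column by its expected allele count.
    Z = M - 2.0 * p

    # Scale so that G is comparable to a pedigree relationship matrix.
    denom = 2.0 * np.sum(p * (1.0 - p))
    return Z @ Z.T / denom
```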

Question: The phenotype distribution should be normal but if not normal, how do you proceed?

Answer: Prediction accuracy will be best with normal or near-normal distributions. You may need to apply a transformation to normalize some variables. If you have a binary trait, the methods as implemented in SVS will work, but they will not assign a zero or a one as the prediction; they will treat the outcome as though it were quantitative.
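
One transformation that is often used for skewed quantitative traits is the rank-based inverse normal transform. The sketch below is just one option, not something the webcast prescribed: the function name is hypothetical, and the Blom offset of 3/8 is an assumed default. It replaces each value with the normal quantile of its rank, giving tied values their average rank.

```python
from statistics import NormalDist

def inverse_normal_transform(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform of a list of numbers.

    Each value is mapped to the standard normal quantile of
    (rank - c) / (n - 2c + 1); ties receive their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n

    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    normal = NormalDist()
    return [normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]
```

The transform preserves the ordering of the samples, so it changes the shape of the distribution without changing which samples rank highest.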

Question: Can exome data be used instead of SNP chip data for human near-Mendelian disorders?

Answer: This is a case that I haven’t personally tested. With exome data, the question typically comes down to the frequency of the SNPs and what kind of statistical power you get from them. Quite often, the SNPs on exome chips are very rare and, as a result, may not be ideal for model-building, but there’s certainly nothing to stop you from doing it if you want to give it a try.

Question: Was there any bias in the estimation of SNP effects in GBLUP or Bayes-C Pi, as is often seen in single-SNP estimates?

Answer: Generally, with GBLUP, given that it fits so many SNPs into the model, it’s hard for any one SNP to be overly influential. But we did see one example (in the webcast) with Bayes-C Pi where the R implementation and the SVS implementation selected different SNPs. That difference strongly influenced the predictions for the few samples where those SNPs had differing genotypes. It’s something that you should watch out for.

Question: Does SVS use all markers to predict breeding value for an oligogenic trait?

Answer: That depends on which method you use. GBLUP is going to use all markers, or at least all of the markers that you as the user provide. In the example, I selected markers based on allele frequency and call rate. There are situations where you may want to restrict further and, as a starting point, give it only SNPs that have some prior association with the phenotype, in which case it would use only what you give it. Bayes-C Pi, on the other hand, whether you give it many SNPs or only a few, will usually try to identify just a few of the most influential SNPs from which to build the model.
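
For readers who want to see what selecting markers by allele frequency and call rate amounts to, here is a minimal sketch outside of SVS. The function name and the default thresholds are assumptions for illustration; they are not the values used in the webcast. Genotypes are assumed to be coded as 0/1/2 allele counts, with NaN marking a missing call.

```python
import numpy as np

def filter_markers(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Return a boolean mask of markers that pass both filters.

    genotypes: (n_samples, n_markers) array of 0/1/2 allele counts,
    with np.nan for missing calls. Thresholds are illustrative only.
    """
    M = np.asarray(genotypes, dtype=float)
    n_samples, n_markers = M.shape

    # Call rate: fraction of samples with a non-missing genotype.
    n_called = (~np.isnan(M)).sum(axis=0)
    call_rate = n_called / n_samples

    # Allele frequency among called genotypes; markers with no calls get 0.
    allele_sum = np.nansum(M, axis=0)
    freq = np.divide(allele_sum, 2.0 * n_called,
                     out=np.zeros(n_markers), where=n_called > 0)
    maf = np.minimum(freq, 1.0 - freq)

    return (call_rate >= min_call_rate) & (maf >= min_maf)
```

The resulting mask can be used to subset the genotype columns before building a relationship matrix or fitting a model, so that the method only sees the markers you chose to give it.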