While VarSeq has always had excellent support for variant interpretation and analysis, we continue to find new edge cases in the clinical literature that help us improve our interpretation capabilities. In this blog, we will be covering some of the new improvements in VarSeq to support the interpretation of non-coding and splice site variants. Transcript Annotation Improvements: Let’s start by covering some… Read more »
Using the K-Fold Cross-Validation Statistics to Understand the Predictive Power of your Data in SVS: In cross-validation, a set of data is divided into two parts, the “training set” and the “validation set”. A model for predicting a phenotype from genotypic data and (usually) some fixed effect parameters is “trained” using the training set—that is, the best value(s) of the… Read more »
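To make the idea concrete outside of SVS, here is a minimal sketch of k-fold cross-validation on a toy genotype/phenotype data set using scikit-learn. The simulated data, the ridge model, and the correlation metric are placeholder choices for illustration, not the SVS workflow.

```python
# Minimal k-fold cross-validation sketch (illustrative only, not the SVS workflow).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_samples, n_markers = 200, 500

# Toy genotypes coded 0/1/2 and a phenotype driven by a handful of markers plus noise.
X = rng.integers(0, 3, size=(n_samples, n_markers)).astype(float)
true_effects = np.zeros(n_markers)
true_effects[:10] = rng.normal(0, 0.5, 10)
y = X @ true_effects + rng.normal(0, 1.0, n_samples)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_correlations = []
for train_idx, test_idx in kf.split(X):
    model = Ridge(alpha=1.0)
    model.fit(X[train_idx], y[train_idx])   # "train" on the training set
    pred = model.predict(X[test_idx])       # evaluate on the held-out validation fold
    fold_correlations.append(np.corrcoef(pred, y[test_idx])[0, 1])

print("Per-fold predictive correlation:", np.round(fold_correlations, 3))
print("Mean:", np.mean(fold_correlations).round(3))
```

Each fold takes a turn as the validation set, so every sample contributes to both training and evaluation, which is what makes the resulting statistics a useful summary of predictive power.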
Interpretation of variants in accordance with the ACMG guidelines requires that variants near canonical splice boundaries be evaluated for their potential to disrupt gene splicing [1]. The five most common tools for splice site detection are NNSplice, MaxEntScan, GeneSplicer, HumanSplicingFinder, and SpliceSiteFinder-like. Because these algorithms have been made easily accessible in the bioinformatics tool Alamut, they have been canonized for… Read more »
The VarSeq clinical platform is built on a strong foundation of data curation and annotation algorithms to ensure the variants identified have all the information required to make the correct clinical assessments. It’s easy to make light of “variant annotation”, but the details run very deep into the roots of how we represent genomic data, how public data is aggregated, stored… Read more »
One frequent question I hear from SVS customers is whether whole exome sequence data can be used for principal components analysis (PCA) and other applications in population genetics. The answer is, “yes, but you need to be cautious.” What does cautious mean? Let’s take a look at the 1000 Genomes project for some examples.
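As a rough illustration of what PCA does with genotype data, here is a small sketch using scikit-learn on simulated genotypes from two artificial populations. The simulated data and the mean-centering step are assumptions made for this example; it is not the 1000 Genomes analysis discussed in the post.

```python
# Illustrative PCA on a mean-centered genotype matrix (toy data, not 1000 Genomes).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_samples, n_variants = 100, 2000

# Two artificial "populations" with slightly different allele frequencies,
# mimicking the kind of structure PCA is meant to pick up.
freqs_a = rng.uniform(0.05, 0.5, n_variants)
freqs_b = np.clip(freqs_a + rng.normal(0, 0.05, n_variants), 0.01, 0.99)
pop_a = rng.binomial(2, freqs_a, size=(n_samples // 2, n_variants))
pop_b = rng.binomial(2, freqs_b, size=(n_samples // 2, n_variants))
genotypes = np.vstack([pop_a, pop_b]).astype(float)

# Center each variant on its mean genotype before extracting components.
centered = genotypes - genotypes.mean(axis=0)
pcs = PCA(n_components=2).fit_transform(centered)

print("PC1, population A:", pcs[: n_samples // 2, 0].round(2))
print("PC1, population B:", pcs[n_samples // 2 :, 0].round(2))
```

With whole exome data the variant set is restricted to coding regions, which is one reason the post urges caution before interpreting the resulting components.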
Golden Helix is excited to host a webinar on Tuesday, August 26th, discussing the Genomic Prediction methods that were recently integrated into the SVS software. Genomic prediction uses several pieces of information when calculating its results: genetic information is used to predict the phenotype or trait for the individuals, and the phenotypic trait data can be provided for a subset or for all… Read more »
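For readers who want a feel for the workflow before the webinar, below is a minimal sketch of the general genomic prediction setup: train on the individuals with known phenotypes and predict the rest. Ridge regression stands in here for the methods integrated into SVS; the simulated data and model choice are assumptions for illustration only.

```python
# Sketch of genomic prediction: train on individuals with phenotypes, predict the rest.
# Ridge regression is a stand-in for the methods covered in the webinar; this is not
# the SVS implementation, just an illustration of the workflow.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
n_individuals, n_markers = 300, 1000

genotypes = rng.integers(0, 3, size=(n_individuals, n_markers)).astype(float)
marker_effects = rng.normal(0, 0.2, n_markers) * (rng.random(n_markers) < 0.05)
phenotype = genotypes @ marker_effects + rng.normal(0, 1.0, n_individuals)

# Phenotypes are known only for a training subset; the rest are to be predicted.
train = np.arange(n_individuals) < 200
model = Ridge(alpha=10.0).fit(genotypes[train], phenotype[train])
predicted = model.predict(genotypes[~train])

accuracy = np.corrcoef(predicted, phenotype[~train])[0, 1]
print(f"Predictive correlation on the unphenotyped set: {accuracy:.3f}")
```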
You probably haven’t spent much time thinking about how we represent genes in a genomic reference sequence context. And by genes, I really mean transcripts since genes are just a collection of transcripts that produce the same product. But in fact, there is more complexity here than you ever really wanted to know about. Andrew Jesaitis covered some of this… Read more »
A few months ago, our CEO, Christophe Lambert, directed me toward an interesting commentary published in Nature Reviews Genetics by authors Bjarni J. Vilhjalmsson and Magnus Nordborg. Population structure is frequently cited as a major source of confounding in GWAS, but the authors of the article suggest that the problems often blamed on population structure actually result from the environment… Read more »
Recently I gave a presentation on bioinformatic filtering: the process of using quality scores, annotation databases, and functional prediction scores to intelligently and quickly reduce your variant search space. In this webcast, I mention that filtering is something we have been doing for a long time, and that there are some great examples that use exome sequencing data along with… Read more »
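As a toy illustration of that kind of filtering, the snippet below reduces a small variant table using a quality threshold, a population allele frequency annotation, and a functional prediction score. The column names, annotation values, and cutoffs are hypothetical and chosen only to show the pattern, not the filters used in the webcast.

```python
# Toy example of bioinformatic filtering with pandas: reduce a variant table using
# quality scores, a population allele frequency annotation, and a functional
# prediction score. Columns, values, and thresholds are hypothetical.
import pandas as pd

variants = pd.DataFrame({
    "chrom":      ["1", "1", "2", "7", "17"],
    "pos":        [10143, 240678, 4760051, 1175595, 4309446],
    "qual":       [15.0, 250.0, 88.0, 310.0, 199.0],
    "read_depth": [8, 64, 30, 85, 52],
    "gnomad_af":  [0.20, 0.0001, 0.03, 0.0, 0.0004],
    "cadd_phred": [3.1, 28.5, 12.0, 35.0, 24.1],
})

filtered = variants[
    (variants["qual"] >= 30)          # keep well-supported calls
    & (variants["read_depth"] >= 20)  # require adequate coverage
    & (variants["gnomad_af"] < 0.01)  # drop common variants
    & (variants["cadd_phred"] >= 20)  # keep predicted-deleterious variants
]

print(f"{len(variants)} variants in, {len(filtered)} variants out")
print(filtered[["chrom", "pos", "gnomad_af", "cadd_phred"]])
```

Each filter on its own removes only part of the noise; chained together they quickly shrink the search space to a reviewable list.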
Allow me to introduce you to Blaine Bettinger. Blaine is a patent attorney who holds a PhD in Biochemistry with a concentration in genetics. He is also a family history enthusiast who writes the Genetic Genealogist blog, where he gives commentary on applications of genomic science for advancing personal and family history research. I first learned about Blaine last May… Read more »
Golden Helix’s SNP & Variation Suite (SVS) has a Regression Module to enable researchers with varying degrees of statistical knowledge to interrogate their data using regression models that account for potential confounding effects of covariates and interaction terms. While these tools are labeled “basic”, they can be difficult to use and their results hard to interpret for those who have only… Read more »
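To show the kind of model the module covers, here is a small regression example with a covariate and a genotype-by-covariate interaction term using statsmodels. The variable names and simulated data are made up for illustration; this is a sketch of the general technique, not the SVS Regression Module itself.

```python
# Illustrative regression with covariates and an interaction term using statsmodels.
# The variables (genotype dosage, age, sex) and the data are simulated for this example.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 500
data = pd.DataFrame({
    "dosage": rng.integers(0, 3, n).astype(float),  # genotype coded 0/1/2
    "age":    rng.normal(55, 10, n),
    "sex":    rng.integers(0, 2, n),
})
# Phenotype with a genotype effect, covariate effects, and a genotype-by-sex interaction.
data["phenotype"] = (
    0.4 * data["dosage"] + 0.02 * data["age"] + 0.3 * data["sex"]
    + 0.25 * data["dosage"] * data["sex"] + rng.normal(0, 1.0, n)
)

model = smf.ols("phenotype ~ dosage + age + sex + dosage:sex", data=data).fit()
print(model.summary().tables[1])  # coefficient estimates, standard errors, p-values
```

The `dosage:sex` term is the interaction: its coefficient tells you whether the genotype effect differs by sex after adjusting for the covariates, which is exactly the kind of question the interpretation difficulties mentioned above tend to arise from.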
Including the human genome, whose sequence was completed by the Human Genome Project in 2003, scientists have created whole genome sequence maps for over 1,000 species. From maize to oysters, the quest to investigate different species’ genetic code continues. Mapping is the “first step” that provides a baseline for further study into differences between species, the occurrence of certain diseases, and the prevalence of traits… Read more »