Category Archives: General statistical genetics principles

VarSeq Update: Support for the Interpretation of Non-coding and Splice Site Variants

         April 15, 2021

While VarSeq has always had excellent support for variant interpretation and analysis, we continue to find new edge cases in the clinical literature that help us improve our interpretation capabilities. In this blog post, we cover some of the new improvements in VarSeq that support the interpretation of non-coding and splice site variants. Transcript Annotation Improvements: Let’s start by covering some… Read more »

Using the K-Fold Cross-Validation Statistics to Understand the Predictive Power of your Data in SVS

         December 12, 2019

In cross-validation, a set of data is divided into two parts, the “training set” and the “validation set”. A model for predicting a phenotype from genotypic data and (usually) some fixed effect parameters is “trained” using the training set; that is, the best value(s) of the… Read more »
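The training/validation split described above can be sketched in a few lines of Python. This is a minimal illustration of generic k-fold splitting, not the SVS implementation: the function name `k_fold_splits`, its parameters, and the shuffling scheme are assumptions made for the example. Each sample lands in the validation set of exactly one fold and in the training set of the others.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (training_indices, validation_indices) for each of k folds.

    Hypothetical helper for illustration; not part of SVS.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once so fold membership does not depend on sample order.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled samples into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        # Every fold except the i-th one forms the training set.
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


if __name__ == "__main__":
    for fold_number, (train, val) in enumerate(k_fold_splits(10, 5), start=1):
        print(fold_number, sorted(train), sorted(val))
```

In a real analysis, a model would be fit on each training set and scored on the matching validation set, and the scores would then be summarized across the k folds.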

Revisiting the Five Splice Site Algorithms used in Clinical Genetics

         January 16, 2018

Interpretation of variants in accordance with the ACMG guidelines requires that variants near canonical splice boundaries be evaluated for their potential to disrupt gene splicing [1]. The five most common tools for splice site detection are NNSplice, MaxEntScan, GeneSplicer, HumanSplicingFinder, and SpliceSiteFinder-like. Because these algorithms have been made easily accessible in the bioinformatics tool Alamut, they have been canonized for… Read more »

Top 10 Posts for Understanding Clinical Annotation of Genomic Variants

         August 24, 2017

The VarSeq clinical platform is built on a strong foundation of data curation and annotation algorithms to ensure the variants identified have all the information required to make the correct clinical assessments. It’s easy to make light of “variant annotation”, but the details run very deep into the roots of how we represent genomic data, how public data is aggregated, stored… Read more »

SVS, Population Genetics, and 1000 Genomes Phase 3

         January 27, 2015

One frequent question I hear from SVS customers is whether whole exome sequence data can be used for principal components analysis (PCA) and other applications in population genetics. The answer is, “yes, but you need to be cautious.” What does cautious mean? Let’s take a look at the 1000 Genomes project for some examples.

Genomic Prediction and How it’s Used

         August 21, 2014

Golden Helix is excited to host a webinar on Tuesday, August 26th, discussing the Genomic Prediction methods that were recently integrated into the SVS software. Genomic prediction uses several pieces of information when calculating its results. Genetic information is used to predict the phenotype or trait for the individuals. The phenotypic trait data can be provided for a subset or for all… Read more »

RefSeq Genes: Updated to NCBI Provided Alignments and Why You Care

         August 14, 2014

You probably haven’t spent much time thinking about how we represent genes in a genomic reference sequence context. And by genes, I really mean transcripts since genes are just a collection of transcripts that produce the same product. But in fact, there is more complexity here than you ever really wanted to know about. Andrew Jesaitis covered some of this… Read more »

Population Structure + Genetic Background + Environment = Mixed Model

         March 22, 2013

A few months ago, our CEO, Christophe Lambert, directed me toward an interesting commentary published in Nature Reviews Genetics by authors Bjarni J. Vilhjalmsson and Magnus Nordborg. Population structure is frequently cited as a major source of confounding in GWAS, but the authors of the article suggest that the problems often blamed on population structure actually result from the environment… Read more »

What is Bioinformatic Filtering?

         June 29, 2012

Recently I gave a presentation on bioinformatic filtering: the process of using quality scores, annotation databases, and functional prediction scores to intelligently and quickly reduce your variant search space. In this webcast, I mention that filtering is something we have been doing for a long time, and that there are some great examples that use exome sequencing data along with… Read more »
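The filtering idea described above can be sketched as a chain of simple thresholds. This is only an illustration under stated assumptions: the `Variant` fields, the `filter_variants` function, and the default cutoffs are hypothetical and are not taken from the presentation or from any Golden Helix product.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical variant record used only for this sketch."""
    name: str
    quality: float          # call quality score from the variant caller
    population_freq: float  # allele frequency from an annotation database
    damaging_score: float   # functional prediction score in [0, 1]; higher = more likely damaging


def filter_variants(variants, min_quality=30.0, max_freq=0.01, min_damaging=0.5):
    """Keep variants that pass all three filters (illustrative thresholds)."""
    return [
        v for v in variants
        if v.quality >= min_quality            # drop low-confidence calls
        and v.population_freq <= max_freq      # drop variants common in the population
        and v.damaging_score >= min_damaging   # drop variants predicted to be benign
    ]


if __name__ == "__main__":
    candidates = [
        Variant("var_a", quality=50.0, population_freq=0.001, damaging_score=0.9),
        Variant("var_b", quality=10.0, population_freq=0.001, damaging_score=0.9),
        Variant("var_c", quality=50.0, population_freq=0.200, damaging_score=0.9),
        Variant("var_d", quality=50.0, population_freq=0.001, damaging_score=0.1),
    ]
    print([v.name for v in filter_variants(candidates)])
```

Each condition removes a different class of uninteresting variant, which is how successive filters shrink the search space.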

Admixture and Blaine Bettinger

         January 25, 2012

Allow me to introduce you to Blaine Bettinger. Blaine is a patent attorney who holds a PhD in Biochemistry with a concentration in genetics. He is also a family history enthusiast who writes the Genetic Genealogist blog, where he gives commentary on applications of genomic science for advancing personal and family history research. I first learned about Blaine last May… Read more »

Please Help Me Get My Regression Model Set Up!

         July 6, 2011

Golden Helix’s SNP & Variation Suite (SVS) has a Regression Module that enables researchers with varying degrees of statistical knowledge to interrogate their data using regression models that account for potential confounding effects of covariates and interaction terms. While these tools are labeled “basic”, they can be difficult to use, and their results can be hard to interpret, for those who have only… Read more »

The What, Why, and How of Creating a Genome Map

         August 10, 2010

Counting the Human Genome Project, completed in 2003, scientists have created whole genome sequence maps for over 1,000 species. From maize to oysters, the quest to investigate different species’ genetic code continues. Mapping is the “first step” that provides a baseline for further study into differences between species, the occurrence of certain diseases, and the prevalence of traits… Read more »