New breakthroughs are being made every day in genomics. It’s a dynamic and fascinating industry, and with exceptional growth forecast in the DNA sequencing market, a new generation is entering the field: future researchers, clinicians, counselors and doctors. This new generation will need not only to learn the science, but also to understand how to process the massive amounts of data generated by DNA sequencing (and genomics in general).
Managing large volumes of data is already a mission-critical concern in bioinformatics, where many core facilities are overworked. They do their best to keep up with demand, but going forward there will be more data, more projects and more people to support. How will bioinformatics keep up?
Now, as universities put educational programs together to prepare the next generation of scientists to understand the ins and outs of DNA analytics, they are running into obstacles. Bright students who are fascinated by the science (human, animal, plant) are not necessarily computer programmers, nor do they want to be. Yet many of the tools used to teach basic analytic skills in genomics programs are public domain/open source programs that require enormous amounts of computer science knowledge to navigate.
These future scientists are forced to learn various public domain software platforms such as PLINK, Eigensoft, and ANNOVAR. They need to learn scripting languages such as R, Perl, or Python. They need to learn how to manipulate large datasets that even seasoned researchers can struggle with. Learning the programs at a basic level can take many months, perhaps several semesters, before students can even focus on the actual data and the science they went to school to learn.
Based on interviews I have conducted with leading genomics programs, some of the key issues with current training methods are:
- Lack of intuitive visualization in public domain software programs.
- Students being required to learn scripting and programming skills.
- Difficulty manipulating very large datasets with the tools available. Common issues include dealing with heterogeneous data sources, cleaning up data, and preparing it for the analytics project.
We must collectively rethink how we train the next generation to handle the data so they can better understand the science. Fortunately, our SNP & Variation Suite (SVS) platform has already solved these problems.
SVS is an integrated collection of proven methods to manage, analyze, and visualize your data. It packs a big punch in a small package and runs on the desktops or laptops that students already have.
We spend a lot of time simplifying the import of large datasets. Hundreds of exomes and genomes can be imported for analysis or visualization with a few mouse clicks. It’s simple and easy to start working with the data in a way that students are familiar with (a GUI), rather than having to learn arcane command-line syntax.
SVS eliminates the need to learn how to script, though advanced students can still explore scripting options with our Python interface.
All major methods and algorithms used in the software are documented (including the math, use cases, and article citations). We completely shunned the “black box” model. This is ideal for teaching purposes as professors want to review the underlying math and/or data manipulation workflows. We are not aware of another commercial competitor whose methods are as transparent as ours.
As a company we can provide additional support to professors putting together a full course or a summer program:
- We provide sample datasets that can be used to create training projects for the students.
- We have extensive videos and tutorials on our website that are useful for self-study and can be included in course work.
- Lastly, we have a team of training professionals with extensive knowledge of the industry. We are happy to advise on curriculum design and related topics.
We at Golden Helix are committed to supporting the education of the next generation of genetic researchers. To this end, we are offering flexible license schemes that address the special needs of educational genomics programs to make teaching and learning bioinformatics easier. Please contact us if you want to learn more.