Recently we have expanded our annotation track offerings with new human variant frequency catalogs such as the 1000 Genomes Phase 1 Data. Of course, we also curate data for plant and animal genomes – some of which are currently available in our software and some of which will be available in our next release. In this blog post, I will fill you in on these new plant and non-human genomes and what to look for in the near future for annotation tracks. Also, I encourage you to contact us if you are working with SNP & Variation Suite (SVS) and a genome not currently available in our software, in order to let us know what genome and annotation tracks are important to you and your research.
This fall we have been quickly building up the list of supported species and builds for our software. We recently added the following four genomes:
- Zea mays AGP_2 (Maize)
- Bos taurus UMD_3.1 (Bovine)
- Sus scrofa Sscrofa_9.2 (Porcine)
- Ovis aries OAR_1.0 and OAR_2.0 (Ovine)
Previously, the data for these genomes came from a variety of sources. The resulting challenge was that most of the annotation data was not available in a nicely packaged format from either UCSC or Ensembl. We were recently informed that Ensembl finally had a gene track available for the Bos taurus UMD_3.1 build, and we quickly added this track to our list of available resources.
The following new tracks for the Maize genome (Zea mays AGP_2) are available on request. Although no maize annotation tracks are currently available through the Golden Helix Annotation Track Manager, we hope to make these tracks available directly from our network in the near future.
The maize annotation tracks were curated from data available from maizesequence.org.
A previous version of the Bos taurus genome (Bos taurus UMD_3.1) has been available since September. However, with the release of the EnsemblGene track this week, the build name has been finalized to UMD_3.1, and we are now publicly releasing all tracks with this new standardized name. All tracks for this genome can now be downloaded directly from the Golden Helix Network through the annotation track manager, and the new genome map will become available in the next software update for Golden Helix SVS (version 7.5.6). These tracks include:
- EnsemblGenes-Ensembl_UMD_3.1_Bos_taurus.idf
- NCBIGenes-CBCB_UMD_3.1_Bos_taurus.idf
- ReferenceSequence-CBCB_UMD_3.1_Bos_taurus.idf
The porcine genome (Sus scrofa Sscrofa_9.2) is also now available. The annotation tracks are publicly available from the Golden Helix Network, and the genome map will be available in the next SVS release (version 7.5.6). The list of tracks for the porcine genome includes:
Finally, the ovine genomes (OAR_1.0 and OAR_2.0) and annotation data are now available on request. The genome maps for these two builds will be released with the next version of the SVS software (7.5.6), and the annotation tracks should be publicly available sometime this month. The list of available annotation tracks includes:
In addition to these four new genomes, reference sequence tracks for Mus musculus (mouse), Macaca mulatta (Rhesus monkey), Rattus norvegicus (rat), and Gallus gallus (chicken) have been built and made publicly available for download.
In the future we plan to add most of the genomes currently available through Ensembl and UCSC, along with their gene tracks and reference sequence tracks. Often these sources do not yet carry the newer genomes that are available directly from a consortium. In that case, as long as there is information on the size and number of chromosomes, reference sequence data, and ideally gene information that includes codon start and stop positions as well as exon start and stop positions, we can curate this data for you. Once we curate the data, we make it available to the requester for approval. Once documentation and citations have been generated, we can then make the tracks available to the general public. At this time, new genome maps can either be sent by email and then added to the software, or be included in the software with scheduled bug fix releases.
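If you are wondering whether your genome of interest has enough information for a curation request, the requirements above can be sketched as a simple checklist. This is only an illustration: the `GenomeSubmission` class, its field names, and the `check_submission` function are hypothetical and are not part of SVS or any Golden Helix tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GenomeSubmission:
    """Hypothetical summary of what data is on hand for a genome build."""
    name: str
    # Chromosome name -> length in base pairs (gives both number and size).
    chromosome_sizes: Dict[str, int] = field(default_factory=dict)
    has_reference_sequence: bool = False
    # Gene information with codon and exon start/stop positions.
    has_gene_models: bool = False


def check_submission(sub: GenomeSubmission) -> Tuple[List[str], List[str]]:
    """Return (missing required items, missing recommended items)."""
    required = []
    recommended = []
    if not sub.chromosome_sizes:
        required.append("size and number of chromosomes")
    if not sub.has_reference_sequence:
        required.append("reference sequence data")
    # The post describes gene information as ideal rather than mandatory.
    if not sub.has_gene_models:
        recommended.append("gene models with codon and exon start/stop positions")
    return required, recommended


if __name__ == "__main__":
    # Placeholder values for illustration only, not real genome data.
    example = GenomeSubmission(
        name="Example_build_1.0",
        chromosome_sizes={"1": 1000, "2": 800},
        has_reference_sequence=True,
    )
    required, recommended = check_submission(example)
    print("Missing (required):", required)
    print("Missing (recommended):", recommended)
```

A submission with nothing in the required list is a reasonable candidate to send our way; anything in the recommended list simply limits which gene-based tracks can be built.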
There is one special note I would like to pass on. We have been asked several times recently if we could curate data for SIFT or PolyPhen for plant or animal genomes. There is currently no genome-wide functional scoring data publicly available for genomes other than the two human genome builds (NCBI_36 and GRCh_37). Until additional genomes are analyzed and the data published from either sift.jcvi.org or http://genetics.bwh.harvard.edu/pph2, we will not be able to provide these kinds of tracks for non-human genomes. If you are aware of pre-computed amino acid substitution functional scoring and prediction data for your genome of interest, please let us know and we can look into how to curate and provide analysis support for this data.
At Golden Helix we do not create the annotation data; we merely process publicly available data into a format that is readable by our genome browser and by the features that use annotation tracks for analysis. All of the tools used to build annotation tracks and genome maps are currently available to our customers, so they can build their own custom annotation tracks if data is not publicly available. We would be happy to either provide instruction on how to use these tools or curate the data for you. Please understand that it can take a couple of weeks to have data processed and prepared for public release. …And that’s my 2 SNPs.