JEnsembl
News
2015-01-13 Release version 1.78. Updated to handle schemas up to Ensembl Schema 78 (Ensembl Genomes 25). Introduced searching for partial matches on species aliases/names. (Download)
The JEnsembl API is published: JEnsembl: a version-aware Java API to Ensembl data systems. Paterson T, Law A. (2012) Bioinformatics 28(21):2724-2731. [PDF]
To help drive our development of the API we are running a poll to determine how potential users might use JEnsembl. Please add your opinion at Survey Monkey.
A demonstration Java API for accessing Ensembl datasources
We have constructed a proof-of-concept implementation of an Ensembl API that demonstrates the tractability of accessing Ensembl datasources at Ensembl and EnsemblGenomes using Java.
We believe that the availability of a Java-based API to the Ensembl system would be a timely and highly effective addition to the bioinformatics toolbox, facilitating the development of integrated applications for data analysis and display. Current programmatic access to Ensembl datasources is provided through Ensembl’s Perl API modules. As a flexible scripting language, Perl is ideally suited to processing large volumes of text-based data, but true object-oriented languages like Java are more appropriate for developing maintainable, large-scale software applications and for embedding in graphical interfaces. Consequently, Java is now widely used for bioinformatics analysis tools.
Our demonstration architecture addresses a number of objectives. Specifically, it provides access to all versions of databases at Ensembl and EnsemblGenomes. It implements a varied selection of data access functions to demonstrate data retrieval from 'Core', 'Variation' and 'Compara' databases, mapping between CoordinateSystems, and transparent mapping between database versions where a change in schema requires alternative data access methods.
The modular architecture followed in the project allows separation of DataAccess functionality from Model Objects. Specifically, separating the DataAccess Configuration (the mapping of SQL statements to Model Objects) from the DataAccess Objects allows the configuration module to control per-schema changes in access code. The software modules are published as separate Maven artifacts, allowing users to import these libraries as Maven dependencies; alternatively, the binary libraries may be downloaded and added to the class path of any Java project.
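The separation described above can be pictured with a minimal sketch. This is not the JEnsembl code: the class names (VersionedSqlConfig, GeneDao, VersionedSqlDemo), the query name, the SQL text and the schema cut-off of 65 are all hypothetical and only illustrate how a configuration module might choose SQL per schema version while the data access object stays unchanged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical configuration module: query name -> SQL text, keyed by schema version. */
final class VersionedSqlConfig {
    // For each query name, the SQL text keyed by the first schema version it applies to.
    private final Map<String, NavigableMap<Integer, String>> statements = new HashMap<>();

    void register(String queryName, int fromSchemaVersion, String sql) {
        statements.computeIfAbsent(queryName, k -> new TreeMap<>())
                  .put(fromSchemaVersion, sql);
    }

    /** Returns the SQL registered for the newest cut-off that is <= schemaVersion. */
    String sqlFor(String queryName, int schemaVersion) {
        NavigableMap<Integer, String> byVersion = statements.get(queryName);
        if (byVersion == null) {
            throw new IllegalArgumentException("Unknown query: " + queryName);
        }
        Map.Entry<Integer, String> entry = byVersion.floorEntry(schemaVersion);
        if (entry == null) {
            throw new IllegalArgumentException(
                "No SQL for " + queryName + " at schema " + schemaVersion);
        }
        return entry.getValue();
    }
}

/** Hypothetical data access object: asks the configuration which SQL to run. */
final class GeneDao {
    private final VersionedSqlConfig config;
    private final int schemaVersion;

    GeneDao(VersionedSqlConfig config, int schemaVersion) {
        this.config = config;
        this.schemaVersion = schemaVersion;
    }

    /** The statement this DAO would prepare; no database connection is opened here. */
    String geneByStableIdSql() {
        return config.sqlFor("geneByStableId", schemaVersion);
    }
}

final class VersionedSqlDemo {
    /** Illustrative configuration; the table layouts and the cut-off 65 are assumptions. */
    static VersionedSqlConfig exampleConfig() {
        VersionedSqlConfig config = new VersionedSqlConfig();
        config.register("geneByStableId", 1,
            "SELECT g.gene_id FROM gene g JOIN gene_stable_id s USING (gene_id) "
                + "WHERE s.stable_id = ?");
        config.register("geneByStableId", 65,
            "SELECT g.gene_id FROM gene g WHERE g.stable_id = ?");
        return config;
    }

    public static void main(String[] args) {
        VersionedSqlConfig config = exampleConfig();
        System.out.println(new GeneDao(config, 60).geneByStableIdSql());
        System.out.println(new GeneDao(config, 78).geneByStableIdSql());
    }
}
```

In this sketch a schema change only requires registering a new statement in the configuration; the data access object and any model objects it populates are untouched.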
What is Ensembl?
The Ensembl Project provides a genome annotation system for the annotation, analysis and display of genome assembly databases, available for vertebrates at www.ensembl.org and more recently for other taxonomic groups at www.ensemblgenomes.org. Together with core genomic annotations, the curated resources now include comparative-genomic, variation, functional-genomic and regulatory data. Ensembl is one of the most widely used bioinformatics resources on the Internet.
Accessing Ensembl's Data
Access to the data is freely provided through Ensembl’s interactive web browser, the BioMart data mining tool, a publicly exposed MySQL database and programmatically through Perl API modules. The Perl API is also used for the majority of the Ensembl systems' internal workflows.
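As a rough illustration of what direct access to the public MySQL server involves, the sketch below parses a species database name into its parts and builds a JDBC URL for it. It assumes the common naming pattern species_type_release_assembly (for example homo_sapiens_core_78_38); Ensembl Genomes names carry an additional version field and are not handled. The class EnsemblDatabaseName is hypothetical, the port number is an assumption to be checked against Ensembl's current documentation, and actually connecting would additionally require a MySQL JDBC driver on the class path.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper describing one species database on the public server. */
final class EnsemblDatabaseName {
    // Assumed naming pattern: <species>_<type>_<release>_<assembly>
    private static final Pattern NAME = Pattern.compile(
        "^([a-z0-9]+(?:_[a-z0-9]+)*)_(core|variation|funcgen)_(\\d+)_([0-9a-z]+)$");

    final String fullName;
    final String species;
    final String type;
    final int release;
    final String assembly;

    private EnsemblDatabaseName(String fullName, String species, String type,
                                int release, String assembly) {
        this.fullName = fullName;
        this.species = species;
        this.type = type;
        this.release = release;
        this.assembly = assembly;
    }

    /** Splits a database name into species, type, release and assembly. */
    static EnsemblDatabaseName parse(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised database name: " + name);
        }
        return new EnsemblDatabaseName(
            name, m.group(1), m.group(2), Integer.parseInt(m.group(3)), m.group(4));
    }

    /** Builds a standard MySQL JDBC URL; this method does not open a connection. */
    String jdbcUrl(String host, int port) {
        return "jdbc:mysql://" + host + ":" + port + "/" + fullName;
    }

    public static void main(String[] args) {
        EnsemblDatabaseName db = parse("homo_sapiens_core_78_38");
        System.out.println(db.species + " " + db.type + " release " + db.release);
        // Port 3306 is an assumption; the public server accepts the user "anonymous".
        System.out.println(db.jdbcUrl("ensembldb.ensembl.org", 3306));
    }
}
```

Because the release number is embedded in each database name, a client can list the databases on the server and select the one matching the release it needs.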
A variety of alternative API libraries for Ensembl data access have previously been developed in other programming languages: Java (Ensembl's retired EnsJ project), R (biomaRt at Bioconductor), Python (e.g. PyCogent), Jython (using EnsJ libraries with the Java Python interpreter) and Ruby (BioRuby). Most of these are not actively maintained and many were developed to support bulk data access (which is in any case often adequately handled through Ensembl's BioMart). None of these alternative APIs adequately addresses issues around the versioning of Ensembl releases and the evolving data schema. Data access is often mediated through the database structure (e.g. using 'ActiveRecords' in BioRuby) rather than through a genetic model of the data.