A full Java API for accessing Ensembl data sources would replicate all the data access functionality of the Core, Compara, Funcgen and Variation modules of the Perl API, and would:
- connect to and extract data from the current Ensembl release
- access all instances of Ensembl data systems, including the single-species databases at Ensembl and the multi-species databases (bacterial collections) at EnsemblGenomes
- access data from all database types: core, funcgen, variation, compara etc.
- return Java objects corresponding to the major Ensembl object types, including Sequence Regions, Markers, Alleles, Genes, Exons, Transcripts, CoordinateSystems and AnnotationFeatures of many kinds
- map coordinates between the CoordinateSystem levels (e.g. chromosome and contig) for which a given genome provides a mapping
- provide an architecture for updating API connectivity and functionality as new Ensembl releases appear, whilst maintaining backwards compatibility with earlier releases (improving on the Perl API, which requires a version-specific API release for each Ensembl release)
- ideally be compatible with the open-source BioJava libraries where relevant
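The coordinate mapping requirement can be illustrated with a minimal sketch, assuming a segment table loosely modeled on the Ensembl core `assembly` table: each row aligns a block of an assembled region (e.g. a chromosome) with an equal-length block of a component region (e.g. a contig), plus an orientation `ori` of 1 or -1. The class, record and method names (`CoordinateMapperSketch`, `AssemblySegment`, `toComponent`, `toAssembly`) are hypothetical, not part of any existing Ensembl or BioJava API, and the segment data in `main` is illustrative only. It uses records, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch: maps single positions between an assembled level
 * (e.g. chromosome) and a component level (e.g. contig).
 */
class CoordinateMapperSketch {

    /** One aligned block; ori is 1 if the component runs forward, -1 if reversed. */
    record AssemblySegment(String asmRegion, long asmStart, long asmEnd,
                           String cmpRegion, long cmpStart, long cmpEnd, int ori) {}

    /** A 1-based position on a named sequence region, with strand 1 or -1. */
    record Location(String region, long position, int strand) {}

    private final List<AssemblySegment> segments = new ArrayList<>();

    void addSegment(AssemblySegment s) {
        // Both sides of a segment must cover the same number of bases.
        if (s.asmEnd() - s.asmStart() != s.cmpEnd() - s.cmpStart()) {
            throw new IllegalArgumentException("Segment lengths differ: " + s);
        }
        segments.add(s);
    }

    /** Assembled -> component; empty if the position falls in a gap between segments. */
    Optional<Location> toComponent(Location asm) {
        for (AssemblySegment s : segments) {
            if (s.asmRegion().equals(asm.region())
                    && asm.position() >= s.asmStart() && asm.position() <= s.asmEnd()) {
                long offset = asm.position() - s.asmStart();
                long pos = (s.ori() == 1) ? s.cmpStart() + offset : s.cmpEnd() - offset;
                return Optional.of(new Location(s.cmpRegion(), pos, asm.strand() * s.ori()));
            }
        }
        return Optional.empty();
    }

    /** Component -> assembled; the inverse of toComponent. */
    Optional<Location> toAssembly(Location cmp) {
        for (AssemblySegment s : segments) {
            if (s.cmpRegion().equals(cmp.region())
                    && cmp.position() >= s.cmpStart() && cmp.position() <= s.cmpEnd()) {
                long offset = (s.ori() == 1) ? cmp.position() - s.cmpStart() : s.cmpEnd() - cmp.position();
                return Optional.of(new Location(s.asmRegion(), s.asmStart() + offset, cmp.strand() * s.ori()));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        CoordinateMapperSketch mapper = new CoordinateMapperSketch();
        // Illustrative data: chromosome "1" built from two contigs, the second reversed,
        // with a 100 bp gap (1001-1100) between them.
        mapper.addSegment(new AssemblySegment("1", 1, 1000, "contigA", 1, 1000, 1));
        mapper.addSegment(new AssemblySegment("1", 1101, 2100, "contigB", 501, 1500, -1));

        Location onContig = mapper.toComponent(new Location("1", 1200, 1)).orElseThrow();
        System.out.println(onContig);                    // Location[region=contigB, position=1401, strand=-1]
        System.out.println(mapper.toAssembly(onContig)); // Optional[Location[region=1, position=1200, strand=1]]
        System.out.println(mapper.toComponent(new Location("1", 1050, 1))); // Optional.empty
    }
}
```

A real implementation would also need to map whole ranges that may span several segments (returning a list of pieces and gaps) and to chain mappings across intermediate levels; the sketch handles single positions between two adjacent levels only.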
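One possible shape for the versioning requirement is sketched below, as an assumption rather than a settled design: client code depends only on stable interfaces (`Gene`, `GeneAdaptor`), while a registry picks the schema-specific implementation from the database's schema version (Ensembl core databases record this under the `schema_version` key of the `meta` table). All type names here are hypothetical, the adaptors return in-memory objects instead of issuing JDBC queries, and the cut-over versions 50 and 65 are invented for illustration, not actual Ensembl schema changes.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Supplier;

/** Client-facing view of a gene; the same whatever schema version it was read from. */
interface Gene {
    String stableId();
}

/** Data access for genes; one implementation per range of compatible schema versions. */
interface GeneAdaptor {
    Gene fetchByStableId(String stableId);
    String name();
}

record SimpleGene(String stableId) implements Gene {}

// Hypothetical adaptors: real ones would issue schema-specific SQL over JDBC.
class LegacyGeneAdaptor implements GeneAdaptor {
    public Gene fetchByStableId(String id) { return new SimpleGene(id); }
    public String name() { return "legacy"; }
}

class CurrentGeneAdaptor implements GeneAdaptor {
    public Gene fetchByStableId(String id) { return new SimpleGene(id); }
    public String name() { return "current"; }
}

/** Picks the newest adaptor whose first supported schema version is <= the database's version. */
class AdaptorRegistry {
    private final NavigableMap<Integer, Supplier<GeneAdaptor>> byFirstVersion = new TreeMap<>();

    void register(int firstSchemaVersion, Supplier<GeneAdaptor> factory) {
        byFirstVersion.put(firstSchemaVersion, factory);
    }

    GeneAdaptor geneAdaptorFor(int schemaVersion) {
        Map.Entry<Integer, Supplier<GeneAdaptor>> entry = byFirstVersion.floorEntry(schemaVersion);
        if (entry == null) {
            throw new IllegalArgumentException("No adaptor for schema version " + schemaVersion);
        }
        return entry.getValue().get();
    }
}

class VersionedApiDemo {
    public static void main(String[] args) {
        AdaptorRegistry registry = new AdaptorRegistry();
        // Hypothetical cut-over: schema versions 50-64 use the legacy adaptor, 65 onwards the current one.
        registry.register(50, LegacyGeneAdaptor::new);
        registry.register(65, CurrentGeneAdaptor::new);
        System.out.println(registry.geneAdaptorFor(60).name()); // legacy
        System.out.println(registry.geneAdaptorFor(70).name()); // current
    }
}
```

Because the lookup uses `floorEntry`, a newer release falls through to the most recent adaptor until a schema change actually requires a new one, and older databases keep resolving to the adaptor written for them. That is one way to avoid shipping a separate API build per Ensembl release.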
There would be no obvious value in replicating Ensembl Perl API functionality specific to data curation, the annotation pipeline, website provision, etc.