News...!
2014-03-17 Release version 1.75. Updated to handle up to Ensembl Schema 75 (Ensembl Genomes 22). Major improvements to methods for Registry initialization and query. (Download)
JEnsembl API is published: JEnsembl: a version-aware Java API to Ensembl data systems. Paterson T, Law A. (2012) Bioinformatics 28(21):2724-2731. [PDF]
To help drive our development of the API we are running a poll to try and determine how potential users might use JEnsembl. Please add your opinion at Survey Monkey.

JEnsembl: Example Code

Requirements
Download Example Code
Example Code Snippets
Example Code Files
 

Requirements

To run these examples you need the compiled Java API and all of its dependencies on your class path. This can be done in one of the following ways (simplest method first):

  • building your own Maven project using the specified Maven dependencies from public repositories, or by installing the artifacts locally (in your '.m2' repository) from the File Download Page.

  • including the binary bundle jensembl-bundle-no-config.jar together with a datasource configuration jar (e.g. Releases/1_75/jensembl-bundle-no-config-1.75.jar and Releases/1_75/ensembl-config-1.75.jar on the File Download Page).

  • including all of the individual binary jar files in a Release download (e.g. Releases/1_75/release1_75.zip on the File Download Page).

  • building your own Java project using the JEnsembl source code from Subversion (and importing required dependencies using Maven or downloading as above).
 

Download Example Code

Example code can be downloaded from the File Download Page (e.g. the source-code jar artifact ensembl-test-1.75-sources.jar is included in the Releases/1_75/release1_75.zip archive) and is also available on Subversion (EnsemblTest). Alternatively, source files may be viewed individually as web pages by clicking on the filenames below.

Example command-line usage

>ls
classes lib source

>find . -name "*.java"
./source/uk/ac/roslin/ensembl/demo/ArchivesConnection.java
./source/uk/ac/roslin/ensembl/demo/AssemblyExceptions.java
./source/uk/ac/roslin/ensembl/demo/BacterialGeneHomologues.java ... etc.

>ls lib
ensembl-model-1.75.jar biojava3-core-3.0.3.jar cglib-nodep-2.1_3.jar mybatis-3.0.2.jar ensembl-config-1.75.jar mysql-connector-java-5.1.6.jar ensembl-data-access-1.75.jar ensembl-data-access-interface-1.75.jar slf4j-api-1.6.1.jar ensembl-datamapper-1.75.jar ensembl-datasource-aware-model-1.75.jar (optional slf4j-log4j12-1.6.1.jar & log4j-1.2.16.jar or the slf4j-nop-1.6.1.jar)
OR
>ls lib

jensembl-bundle-no-config-1.75.jar ensembl-config-1.75.jar

  • COMPILING
    on *nix:
    javac -sourcepath source -classpath "classes:lib/*" `find . -name "*.java"` -d classes
    on Windows:
    javac -sourcepath source -classpath classes;lib\* source\uk\ac\roslin\ensembl\demo\*.java -d classes

  • RUNNING
    on *nix:
    java -cp "classes:lib/*" uk.ac.roslin.ensembl.demo.Genes
    on Windows:
    java -cp classes;lib\* uk.ac.roslin.ensembl.demo.Genes

Note that the logger facade jar slf4j-api-1.6.1.jar must always be included. At run time you can add any compatible logger implementation, or the 'No Operation' logger slf4j-nop-1.6.1.jar, as desired.
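If an example fails at start-up, it can help to confirm that the required classes are actually visible on the class path. The following is a minimal sketch using only the standard library; the class ClasspathCheck is a hypothetical helper and is not part of JEnsembl. The two class names it checks are taken from the jar and demo package named above.

```java
// Sketch only: reports whether classes needed by the examples are visible on the class path.
class ClasspathCheck {

    /** Returns true if the named class can be located (the class is not initialized). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.slf4j.Logger",                 // provided by slf4j-api-1.6.1.jar
            "uk.ac.roslin.ensembl.demo.Genes"   // compiled example class from this page
        };
        for (String name : required) {
            System.out.println(name + ": " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the same -cp setting as the examples; any line reporting MISSING points at a jar or directory that is absent from the class path.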

 

Example Code Snippets

Load a Registry
NEW v1.75 Selective loading of Releases
The Default Version for Data Queries is 'Current'
MyBatis Configuration Handles Ensembl Versioning
DNA Sequences
Assembled DNA Sequences
Feature Locations
 

Load a Registry (All available datasources and versions at a datasource)

// Connect to default Ensembl or EnsemblGenomes datasource

DBRegistry ensReg 
             = new DBRegistry(DataSource.ENSEMBLDB);
DBRegistry ensgenReg 
             = new DBRegistry(DataSource.ENSEMBLGENOMES);

NOTE: because of the increasing size of the DataSources (particularly the number of species in EnsemblGenomes), loading complete registries may be unacceptably slow. Users may prefer to create and query an uninitialized registry, or to make a registry containing data from only the current or a specified release (see below).

// Connect to locally configured datasource

RegistryConfiguration conf = new RegistryConfiguration();
conf.setDBByFile(new File("dbConn.properties"));
conf.setSchemaByFile(new File("schema.properties"));
DBRegistry localReg = new DBRegistry(conf);

// Retrieve species objects from a registry

Species chick = ensReg.getSpeciesByAlias("chicken");
Species O81 = ensgenReg.getSpeciesByAlias("E_coli_O81");
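The local configuration above is read from ordinary Java properties files. As a rough illustration of that file format only, the sketch below writes and reads back a small properties file with java.util.Properties. The keys and values shown (url, port, username) are hypothetical placeholders; the keys that JEnsembl actually expects are defined by the files in its configuration module and are not reproduced here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Properties;

class LocalPropertiesSketch {

    /** Writes a small demo properties file; every key and value here is hypothetical. */
    static Path writeDemoFile() {
        try {
            Path file = Files.createTempFile("dbConn", ".properties");
            file.toFile().deleteOnExit();
            Files.write(file, Arrays.asList(
                    "url=jdbc:mysql://localhost",
                    "port=3306",
                    "username=anonymous"), StandardCharsets.UTF_8);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads key=value pairs from a properties file. */
    static Properties load(Path file) {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(writeDemoFile());
        System.out.println("url  = " + props.getProperty("url"));
        System.out.println("port = " + props.getProperty("port"));
    }
}
```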
 

Discover available releases and selectively load a Registry

To improve the speed and efficiency of data loading for users who only wish to use the current release data (or data from a particular release version), various new static methods were added to the DBRegistry class in release 1.75. It is now possible to query which versions are available without completely initializing the Registry, and information is also available from the SchemaVersion class about which schema versions and data releases the current API can use.

QUERY THE API CONFIGURATION (using SchemaVersion class)

SchemaVersion s = new SchemaVersion();

System.out.println("API CONFIGURATION: Current declared Ensembl Version: "+s.getCurrentEnsemblVersion());

System.out.println("API CONFIGURATION: Current declared EnsemblGenomes Version: "+s.getCurrentGenomesVersion());

System.out.println("API CONFIGURATION: List of Registered Ensembl Schema Versions: "+Arrays.toString(s.getRegisteredSchemas()));

System.out.println("API CONFIGURATION: List of Registered EnsemblGenomes Release Versions: "+Arrays.toString(s.getKnownGenomesReleases()));

DISCOVER AVAILABLE RELEASES WITHOUT INITIALIZING THE REGISTRY

DBRegistry reg = DBRegistry.createUninitializedRegistryForDataSource(DataSource.ENSEMBLDB);

//Prints a brief report detailing which versions are available in this Data Source for this API release
System.out.println(reg.getVersionReport());

//Prints a report with further details of releases present in the Data Source
//which cannot be recognized by this API
System.out.println(reg.getBriefRegistryReport());

//Create detailed report listing all Species (and aliases), versions and databases available in the registry
File report = reg.getRegistryReport();

INITIALIZE A REGISTRY FOR PARTICULAR RELEASES

//Initialize Registry solely for current release data
DBRegistry reg =
DBRegistry.createRegistryForDataSourceCurrentRelease(DataSource.ENSEMBLDB);

//Initialize data solely for Ensembl (vertebrates) release 72.
DBRegistry reg =
DBRegistry.createRegistryForDataSourceAtReleaseVersion(DataSource.ENSEMBLDB, 72);

//The release version for EnsemblGenomes and Bacteria does not match the schema number
DBRegistry reg =
DBRegistry.createRegistryForDataSourceAtReleaseVersion(DataSource.ENSEMBLGENOMES, 20);

 

Data is retrieved by default from the most recent DB schema, but earlier versions can be specified (e.g. version 55).

Chromosome chr25_curr = chick.getChromosomeByName("25");

Chromosome chr25_55 = chick.getChromosomeByName("25","55");

//The API also handles multispecies bacterial databases
Chromosome O81_chr = O81.getChromosomeByName("chromosome");
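The "current unless told otherwise" behaviour shown above is a common overloading pattern. The sketch below illustrates it with plain Java; DefaultVersionSketch and its string-valued records are hypothetical stand-ins and do not reflect how JEnsembl's Species class is implemented.

```java
import java.util.HashMap;
import java.util.Map;

class DefaultVersionSketch {
    private final String currentVersion;
    // key: "name@version", value: a label standing in for a Chromosome object
    private final Map<String, String> chromosomes = new HashMap<>();

    DefaultVersionSketch(String currentVersion) {
        this.currentVersion = currentVersion;
    }

    void register(String name, String version, String label) {
        chromosomes.put(name + "@" + version, label);
    }

    /** No version given: delegate to the current release. */
    String getChromosomeByName(String name) {
        return getChromosomeByName(name, currentVersion);
    }

    /** Explicit version: look up exactly that release (null if unknown). */
    String getChromosomeByName(String name, String version) {
        return chromosomes.get(name + "@" + version);
    }

    public static void main(String[] args) {
        DefaultVersionSketch chick = new DefaultVersionSketch("75");
        chick.register("25", "75", "chromosome 25 (release 75)");
        chick.register("25", "55", "chromosome 25 (release 55)");
        System.out.println(chick.getChromosomeByName("25"));
        System.out.println(chick.getChromosomeByName("25", "55"));
    }
}
```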
 

MyBatis configuration transparently handles schema changes between Ensembl release versions.

DAOFactories for each database use the appropriate MyBatis mapping configuration for each Ensembl Release. Changes to the database schema are handled by adding version-specific mappings to the Configuration package/artifact (ensembl-config.jar), which limits the changes required in the Model and DataAccess artifacts.

For example, in Release 51 the column 'hit_id' was renamed to 'hit_name' in the core 'protein_feature' table. A new version of the MyBatis mapping file 'protein_feature.xml' is used for subsequent releases. The API transparently determines the correct mapping context for different database versions (via the DAOFactory objects).

Species hsa = ensReg.getSpeciesByAlias("human");
Gene g58 = hsa.getGeneByStableID("ENSG00000139618", "58");
g58.getDescription();

   ==> breast cancer 2, early onset [Source:HGNC Symbol;Acc:1101]

g58.getCanonicalTranslation().getProteinFeatures().size();

   ==> 34

Gene g50 = hsa.getGeneByStableID("ENSG00000139618", "50");

g50.getDescription();

   ==> Breast cancer type 2 susceptibility protein (Fanconi anemia group D1 protein). [Source:Uniprot/SWISSPROT;Acc:P51587]

g50.getCanonicalTranslation().getProteinFeatures().size();

   ==> 34
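One simple way to picture version-specific mapping selection is a sorted map keyed by the first schema version at which each mapping applies. The sketch below is an illustration under that assumption, not JEnsembl's actual mechanism (which works through its DAOFactory objects and MyBatis configuration). MappingSelectorSketch, the lower bound of 0 and the label strings are all hypothetical.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class MappingSelectorSketch {
    // key: first schema version at which a mapping applies; value: a label for that mapping
    private final NavigableMap<Integer, String> mappings = new TreeMap<>();

    void register(int fromVersion, String mapping) {
        mappings.put(fromVersion, mapping);
    }

    /** Returns the mapping registered at the highest version not above schemaVersion. */
    String mappingFor(int schemaVersion) {
        Map.Entry<Integer, String> entry = mappings.floorEntry(schemaVersion);
        if (entry == null) {
            throw new IllegalArgumentException("No mapping registered for schema " + schemaVersion);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        MappingSelectorSketch selector = new MappingSelectorSketch();
        selector.register(0, "protein_feature.xml using hit_id");    // 0 is an arbitrary lower bound
        selector.register(51, "protein_feature.xml using hit_name"); // column renamed in Release 51
        System.out.println("schema 50: " + selector.mappingFor(50));
        System.out.println("schema 58: " + selector.mappingFor(58));
    }
}
```

With this arrangement a new schema change only needs one more register call; versions between changes fall back to the most recent earlier mapping.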

In a further example, the merging of the separate '***_stable_id' tables into the 'gene', 'exon', 'transcript' and 'translation' tables in Ensembl release 65 requires different SQL queries to be run before and after this release. Again the API transparently determines the correct mapping context for different database versions (via the DAOFactory objects).

Underlying getExon SQL query in uk.ac.roslin.ensembl.configfiles.schema.57.core.Exon.xml

  
  SELECT * 	
  FROM
  (  
    (exon e, exon_transcript et, seq_region sr)
      LEFT JOIN exon_stable_id esi ON esi.exon_id = e.exon_id)
 
   WHERE
     e.is_current = 1
   AND
     sr.seq_region_id = e.seq_region_id
   AND
     e.exon_id = et.exon_id 
   <if test="featureStableID != null" >
     AND esi.stable_id  = #{featureStableID}
   </if>
   <if test="featureID != null" >
     AND e.exon_id= #{featureID}
   </if>
   <if test="transcriptID != null" >
     AND et.transcript_id= #{transcriptID}
   </if>
   ORDER BY et.transcript_id, et.rank 

Underlying getExon SQL query in uk.ac.roslin.ensembl.configfiles.schema.65.core.Exon.xml

   
  SELECT *	
  FROM
    exon e, exon_transcript et, seq_region sr

  WHERE
    e.is_current = 1
  AND
    sr.seq_region_id = e.seq_region_id
  AND
    e.exon_id = et.exon_id
  <if test="featureStableID != null" >
    AND e.stable_id  = #{featureStableID}
  </if>
  <if test="featureID != null" >
    AND e.exon_id= #{featureID}
  </if>
  <if test="transcriptID != null" >
    AND et.transcript_id= #{transcriptID}
  </if>
  ORDER BY et.transcript_id, et.rank
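To make the difference between the two mappings concrete, the sketch below assembles equivalent query text in plain Java. It is a hand-written illustration, not what MyBatis generates: the <if test="... != null"> blocks are mimicked with null checks, the #{...} parameters are shown as JDBC-style ? placeholders, and the cut-over at version 65 follows the description above. ExonQuerySketch and buildExonQuery are hypothetical names.

```java
class ExonQuerySketch {

    /**
     * Builds exon query text for a schema version. Null arguments are left out
     * of the WHERE clause, mirroring the conditional blocks in the mapping files.
     */
    static String buildExonQuery(int schemaVersion, String featureStableID,
                                 Integer featureID, Integer transcriptID) {
        // From release 65 the stable ID is a column of the exon table itself.
        boolean merged = schemaVersion >= 65;
        StringBuilder sql = new StringBuilder("SELECT * FROM ");
        if (merged) {
            sql.append("exon e, exon_transcript et, seq_region sr");
        } else {
            sql.append("((exon e, exon_transcript et, seq_region sr)")
               .append(" LEFT JOIN exon_stable_id esi ON esi.exon_id = e.exon_id)");
        }
        sql.append(" WHERE e.is_current = 1")
           .append(" AND sr.seq_region_id = e.seq_region_id")
           .append(" AND e.exon_id = et.exon_id");
        if (featureStableID != null) {
            sql.append(" AND ").append(merged ? "e" : "esi").append(".stable_id = ?");
        }
        if (featureID != null) {
            sql.append(" AND e.exon_id = ?");
        }
        if (transcriptID != null) {
            sql.append(" AND et.transcript_id = ?");
        }
        sql.append(" ORDER BY et.transcript_id, et.rank");
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildExonQuery(57, "someStableID", null, null));
        System.out.println(buildExonQuery(65, "someStableID", null, null));
    }
}
```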

Again, because this is handled purely via the MyBatis mapping rules in the Configuration module (directed by an individual DAOFactory for each database instance), the access code is the same for all release versions.

  public Collection<DAExon> DATranscript.getExons();

    is implemented by calling 
	
  Collection<DAExon> out =  
    (Collection<DAExon>) this.getDaoFactory()
      .getExonDAO().getExonsForTranscript(this);

    where the DBExonDAO uses its version-aware DAOFactory 
       to obtain the correct Query for that release 
       version via the ExonMapper Interface

  SqlSession session = this.getFactory().getNewSqlSession();
  ExonMapper mapper = session.getMapper(ExonMapper.class);
  List<DAExon> exons = mapper.getExon(featureQuery);

 

DADNASequences extend and modify the behaviour of BioJava Sequence objects.

Sequences can be instantiated directly
DADNASequence myseq 
      = new DADNASequence("accgggttttMKYRNWSVBDHACTG");
             
myseq.getSequenceAsString(3,24);

             ==> cgggttttMKYRNWSVBDHACT
             
myseq.getReverseComplementSequenceAsString(3,24);

             ==> AGTDHVBSWNYRMKaaaacccg
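The results above rely on 1-based, inclusive coordinates and on complementing IUPAC ambiguity codes while preserving case. The standalone sketch below reproduces the same two results without BioJava or JEnsembl; IupacSequenceSketch is a hypothetical helper written for illustration and makes no claim about how DADNASequence is implemented.

```java
class IupacSequenceSketch {
    // IUPAC nucleotide codes and their complements, matched position by position.
    private static final String CODES       = "ACGTMKRYWSBVDHN";
    private static final String COMPLEMENTS = "TGCAKMYRWSVBHDN";

    /** Complements a single base, keeping upper or lower case. */
    static char complement(char base) {
        int i = CODES.indexOf(Character.toUpperCase(base));
        if (i < 0) {
            throw new IllegalArgumentException("Not an IUPAC nucleotide code: " + base);
        }
        char c = COMPLEMENTS.charAt(i);
        return Character.isLowerCase(base) ? Character.toLowerCase(c) : c;
    }

    /** Subsequence using 1-based, inclusive coordinates. */
    static String subSequence(String seq, int start, int end) {
        return seq.substring(start - 1, end);
    }

    /** Reverse complement of the 1-based, inclusive range start..end. */
    static String reverseComplement(String seq, int start, int end) {
        String sub = subSequence(seq, start, end);
        StringBuilder out = new StringBuilder(sub.length());
        for (int i = sub.length() - 1; i >= 0; i--) {
            out.append(complement(sub.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String seq = "accgggttttMKYRNWSVBDHACTG";
        System.out.println(subSequence(seq, 3, 24));        // cgggttttMKYRNWSVBDHACT
        System.out.println(reverseComplement(seq, 3, 24));  // AGTDHVBSWNYRMKaaaacccg
    }
}
```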
 

AssembledDNASequences are DNASequences that hold an Assembly object

Chromosomes are examples of AssembledDNASequences that hold an Assembly, which stitches together component DNASequences and Gaps.
They can be retrieved by Name (and Version):
Chromosome chr25v67  = chick.getChromosomeByName("25", "67"); 
But the 'actual' DNA sequence of an AssembledDNASequence is only lazy loaded from the constituent components of the Assembly as required.
The necessary mapping between the Coordinate Systems of the AssembledDNASequences, Contigs and constituent DNASequences is handled transparently by the API.
System.out.println(chr25v67.getSequenceAsString(1000000, 1000020));

             ==> CACAGGTATTTCTTGATCCTC
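The idea of stitching a requested region together from lazily loaded components can be sketched as follows. This is a simplified model under stated assumptions: components lie on the forward strand only, uncovered positions are reported as 'N', and a Supplier stands in for the database query. AssemblySketch and Component are hypothetical classes, not JEnsembl types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

class AssemblySketch {

    /** A component placed on the assembled sequence at 1-based, inclusive coordinates. */
    static final class Component {
        final int start;
        final int end;
        private final Supplier<String> loader; // stands in for a database query
        private String bases;                  // null until first requested
        int loads = 0;                         // how many times the loader has run

        Component(int start, int end, Supplier<String> loader) {
            this.start = start;
            this.end = end;
            this.loader = loader;
        }

        String bases() {
            if (bases == null) {
                bases = loader.get();
                loads++;
            }
            return bases;
        }
    }

    private final int length;
    private final List<Component> components = new ArrayList<>();

    AssemblySketch(int length) {
        this.length = length;
    }

    void add(Component c) {
        components.add(c);
    }

    /** Returns bases start..end (1-based, inclusive); positions without a component are gaps ('N'). */
    String getSequenceAsString(int start, int end) {
        if (start < 1 || end > length || start > end) {
            throw new IllegalArgumentException("Bad range " + start + ".." + end);
        }
        char[] out = new char[end - start + 1];
        Arrays.fill(out, 'N');
        for (Component c : components) {
            int from = Math.max(start, c.start);
            int to = Math.min(end, c.end);
            if (from > to) {
                continue; // no overlap: this component stays unloaded
            }
            String b = c.bases();
            for (int pos = from; pos <= to; pos++) {
                out[pos - start] = b.charAt(pos - c.start); // assembly coordinate -> component coordinate
            }
        }
        return new String(out);
    }

    public static void main(String[] args) {
        AssemblySketch chr = new AssemblySketch(20);
        chr.add(new Component(1, 5, () -> "ACGTA"));
        chr.add(new Component(11, 15, () -> "GGCCT"));
        System.out.println(chr.getSequenceAsString(4, 12));
    }
}
```

Only components overlapping the requested range are loaded, which is the point of deferring sequence retrieval until it is needed.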
 

Feature locations on a DNASequence (i.e. 'mappings') can be queried bidirectionally using lazy-load queries.

Gene gene 
      = chick.getGeneByStableID("ENSGALG00000009011");
Features such as Genes are mapped on DADNASequences.
A Gene's location on the Chromosome can be queried:
System.out.println("Gene location: "
    + gene.getChromosomeMapping().getTargetCoordinates());

           ==> Gene location: 3080 - 12536 (REVERSE_STRAND)
This mapping held by the Gene references the Chromosome:
Chromosome chr  = (Chromosome) gene.getChromosomeMapping().getTarget();
System.out.println("Chromosome: "+chr.getName());

           ==> Chromosome: 25
The Chromosome also holds mappings of its Gene locations (obtained by lazy-load):
List<? extends Gene> genesOnRegion 
       = chr.getGenesOnRegion(1,chr.getLength());
All of these Coordinates on the Chromosome can be transparently mapped to coordinates on the constituent Contigs and DNASequences in Ensembl that represent real sequence data (see above).
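A bidirectional mapping of this kind can be modelled as a single object that both ends hold a reference to. The sketch below is illustrative only: MappingSketch and its nested Gene, Chromosome and Mapping classes are hypothetical simplifications (no lazy loading, no coordinate systems), reusing the gene ID and coordinates quoted above.

```java
import java.util.ArrayList;
import java.util.List;

class MappingSketch {

    static final class Gene {
        final String stableId;
        Mapping chromosomeMapping; // set when the gene is mapped
        Gene(String stableId) { this.stableId = stableId; }
    }

    static final class Chromosome {
        final String name;
        final List<Mapping> mappings = new ArrayList<>();
        Chromosome(String name) { this.name = name; }

        /** Genes whose mapped range overlaps start..end (inclusive). */
        List<Gene> getGenesOnRegion(int start, int end) {
            List<Gene> hits = new ArrayList<>();
            for (Mapping m : mappings) {
                if (m.start <= end && m.end >= start) {
                    hits.add(m.source);
                }
            }
            return hits;
        }
    }

    /** One mapping object, referenced from both the Gene and the Chromosome. */
    static final class Mapping {
        final Gene source;
        final Chromosome target;
        final int start;
        final int end;
        final boolean reverseStrand;
        Mapping(Gene source, Chromosome target, int start, int end, boolean reverseStrand) {
            this.source = source;
            this.target = target;
            this.start = start;
            this.end = end;
            this.reverseStrand = reverseStrand;
        }
    }

    /** Creates a mapping and registers it on both ends. */
    static Mapping map(Gene g, Chromosome c, int start, int end, boolean reverseStrand) {
        Mapping m = new Mapping(g, c, start, end, reverseStrand);
        g.chromosomeMapping = m;
        c.mappings.add(m);
        return m;
    }

    public static void main(String[] args) {
        Chromosome chr25 = new Chromosome("25");
        Gene gene = new Gene("ENSGALG00000009011");
        map(gene, chr25, 3080, 12536, true);
        System.out.println("Gene is on chromosome " + gene.chromosomeMapping.target.name);
        System.out.println("Genes in 1..5000: " + chr25.getGenesOnRegion(1, 5000).size());
    }
}
```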

 

Example Code Files

RegistryFactory.java
NEW v1.75 In order to improve the speed and efficiency of data loading for users who only wish to use the current release data (or data from a particular release version) various new static methods have been added to the DBRegistry class. It is possible to query which versions are available without completely initializing the Registry, and information is also available from the SchemaVersion class about which schema versions and data releases the current API can use.
EnsemblConnection.java
Demonstrating typical connection to the Ensembl (Vertebrate) datasource, Registry autoconfiguration and basic data retrieval functions.
GenomesConnection.java
Demonstrating typical connection to the EnsemblGenomes (Non-Vertebrate) datasource, (Plants, Protists, Fungi, Metazoa and Bacterial Collections).
ArchivesConnection.java
Demonstrating typical connection to the archives of the Ensembl (Vertebrate) datasource.
SpeciesVersions.java
Demonstrates retrieving data for different release versions of Species assemblies. Includes access by alias, handling of species/database renames (e.g. Orangutan), shows how species in bacterial collections are accessed, concentrating particularly on the major expansion, reorganisation and renamings that were implemented in EnsemblGenomes v17.
LoadLocalDatasourceProperties.java
Demonstrates use of a local db-connection.properties file instead of using the file in-place in the configuration module.
LoadLocalConfigurationProperties.java
Demonstrates use of a local configuration.properties file instead of using the file in-place in the configuration module.
CommandLineConfiguration.java
Script that can load db-connection.properties and schema-version.properties files on command-line (will default to use local demo files).
DNASequences.java
JEnsembl Datasource Aware DNA Sequences extend org.biojava3.core.sequence.DNASequence but with modified behaviour specified by implementing uk.ac.roslin.ensembl.model.core.DNASequence. Demonstrates basic sequence functions of DADNASequences (use of BioJava STRAND is deprecated; JEnsembl works in a Positive context unless methods specify 'ReverseComplement').
RevalidateAndLazyLoadSequences.java
The properties (i.e. features and nucleotide sequence) of DADNASequences (minimally with a valid database identifier) can be 'lazy loaded' when required.
ComponentAssembledSequencesOfChromosomes.java
Demonstration of how to retrieve sequences and complementary sequences from a chromosome; the transparent assembly of a chromosome from its component sequences; the internals of a chromosome assembly - made up of mapped component DADNASequences.
ChromosomeCaching.java
Demonstrates the use of the chromosome cache. Chromosomes for a given species/version are only instantiated once - and then cached for reuse. Their properties are lazy loaded as needed; if initially instantiated as Generic DADNASequences - they can be validated.
AssemblyExceptions.java
The human genome assembly has assembly updates associated with release versions between major Genome builds. These are represented as Novel, Patch and Haplotype Exceptions (Note: Ensembl treats PseudoAutosomalRegions in a similar fashion, but JEnsembl integrates this information into the standard Chromosome Model).
Genes.java
Retrieving Ensembl annotated gene models through the API. Genes are a specific type of 'feature' annotation. Genes are retrieved together with their mapped chromosomal location.
ExonsTranscriptionAndTranslation.java
Demonstrates integration of BioJava3 transcription and translation functions together with the retrieval and stitching together of exon sequences by the JEnsembl API. Uses BioJava 'transcription engines' for translation: the datasource is queried to use the correct codon table if specified. Examples shown for chordate, plant and bacterial genes.
TranscriptionAndTranslation.java
A fuller demonstration of moving between the various coordinate axes: chromosome, gene, primary transcript, processed (spliced) transcript, translation and protein, and retrieval of actual sequence data from each of these. Uses brca-2, a human gene on the forward strand of chromosome 13.
TranscriptionAndTranslationReverse.java
A similar demonstration of moving between the various coordinate axes as for TranscriptionAndTranslation.java above, but for zar1l, a human gene on the reverse strand of chromosome 13.
ExaminingObjectHashCodes.java
Mainly of use for developers, in order to keep track of objects, for debugging or creating caches.
EnsemblGeneHomologues.java
Retrieving from the Ensembl Compara datasource all asserted homologues (orthologues and paralogues) for a given gene. Integration of gene information across the Ensembl Core and Compara schema, for example, comparing the mapping data for genes in Compara with information in the matching core database.
PlantGeneHomologues.java
Similar demonstration to 'EnsemblGeneHomologues.java' above but querying plant genes in EnsemblGenomes, to retrieve homologous plant genes. Separate Compara sources are available for the 'Plant', 'Fungi', 'Metazoa', 'Protist' and 'Bacterial' datasources.
BacterialGeneHomologues.java
Demonstration of bacterial gene homology searching using EnsemblGenomes.
EnsemblSyntenies.java
Demonstrates searching for regions of conserved synteny between one species and another. (Uses homology searches with genes in the selected region of the source chromosome).
PlantsSyntenies.java
As 'EnsemblSyntenies.java' above but querying a region of a plant chromosome using EnsemblGenomes datasource.
BacterialSyntenies.java
As 'EnsemblSyntenies.java' above but querying a region of a bacterial chromosome using EnsemblGenomes datasource.
Variations.java
Retrieving locations of Ensembl records of dbSNP variations within a chromosomal region of interest: examining whether these are unique or not.
VariationScript.java
Demonstration script for getting variations near genes.
Analysis.java
Analysis objects that describe the procedures used to generate particular feature annotations can be retrieved for Genes, Transcripts and ProteinFeatures. (Eventually will not be limited to Core Objects).
XRefs.java
Demonstrates the recovery and use of External References for 'XRef-ed' objects (genes, transcripts and translations). XRefs hold references to informational objects in External Databases.
VegaAndCCDSIDs.java
Demonstration of both retrieval by VegaID (for Genes,Transcripts and Translation/Proteins) and CCDS_ID (for Transcripts) and retrieval of XRefs representing this information for these Object types. (Limited to the species with high quality manual annotations curated by the Vega and CCDS databases).
Synonyms.java
Demonstrates the retrieval of all name-synonyms held by all the XRefs of an object, or just the synonyms of a particular XRef. Also shows look up of objects by name using matches to a synonym.
LogicFromArkMAP.java
ArkMAP is a downloadable map drawing application that uses the JEnsembl API to download gene-annotated chromosome maps from Ensembl datasources. The application integrates JEnsembl data retrieval with ArkDB map drawing code which uses the Java Swing API. Salient features of Ensembl data retrieval are combined in this demonstration code (getting a basic Ensembl map, getting gene homologies, getting SNP variations on an Ensembl map, finding regions of conserved synteny for a selected chromosomal region, displaying maps of assembly exceptions and haplotypes).
LogicFromSavantPlugin.java
The Savant Plugin embeds JEnsembl functionality in a plugin for the Savant Genome Browser. Salient features of JEnsembl mediated data retrieval are shown in this demonstration code.
UserChristelle.java
Script developed for user: parses chromosomal regions (gene locations) specified in a local BED data file, uses the JEnsembl API to retrieve sequence data flanking the given locations.
UserSetzermann.java
Code developed for a user script: uses the JEnsembl API to fetch all genes, transcripts and exons on all chromosomes, and writes the details out to file.
BioinformaticsPublicationExamples.java
Reproduces the code examples shown in the JEnsembl publication in 'Bioinformatics'.
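
The "instantiate once, then reuse" behaviour described for ChromosomeCaching.java above can be pictured with a keyed cache. The following is a minimal sketch of that general technique using the standard library; ChromosomeCacheSketch, its key format and its nested Chromosome class are hypothetical and are not taken from the JEnsembl source.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class ChromosomeCacheSketch {

    /** Minimal stand-in for a chromosome object. */
    static final class Chromosome {
        final String species;
        final String name;
        final String version;
        Chromosome(String species, String name, String version) {
            this.species = species;
            this.name = name;
            this.version = version;
        }
    }

    private final Map<String, Chromosome> cache = new ConcurrentHashMap<>();
    final AtomicInteger created = new AtomicInteger(); // counts real instantiations

    /** Returns the cached chromosome for species/name/version, creating it on first request. */
    Chromosome get(String species, String name, String version) {
        String key = species + "|" + name + "|" + version;
        return cache.computeIfAbsent(key, k -> {
            created.incrementAndGet();
            return new Chromosome(species, name, version);
        });
    }

    public static void main(String[] args) {
        ChromosomeCacheSketch cache = new ChromosomeCacheSketch();
        Chromosome first = cache.get("chicken", "25", "75");
        Chromosome second = cache.get("chicken", "25", "75");
        System.out.println("same instance: " + (first == second));
        System.out.println("instances created: " + cache.created.get());
    }
}
```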