FlyMine: An integrated database for Drosophila and Anopheles genomics

Gos Micklem(1), Andrew Varley(2), Richard Smith, Rachel Lyne
(1) gos@gen.cam.ac.uk, University of Cambridge; (2) ajv12@cam.ac.uk, University of Cambridge

FlyMine is an open-source project to build an integrated database of genomic, expression and protein data for Drosophila and Anopheles. We aim to provide a powerful and flexible query system, with the data available for arbitrary queries through both a web interface and a programming API. The database is an object database built on top of PostgreSQL using the Apache OJB object/relational mapping tool. We have modified OJB heavily to support true object-based queries, written either in OQL or with the FlyMine Query API (Java). At the underlying SQL level, the data are also stored redundantly in a collection of "precomputed tables": materialised views over one or more master tables. Every incoming query is analysed automatically to determine whether some combination of these precomputed tables can be used to answer it more quickly, which substantially reduces response times for many queries. This SQL rewriting module can also be used independently of FlyMine to speed up queries against read-only SQL databases. Remote bioinformatics users will be able to use the same query API over SOAP/HTTPS against the main FlyMine servers. The data model is specified as a UML diagram from which all model-specific parts of the system are generated automatically, so the FlyMine software can easily be applied to other domains. We will also provide a graphical object query tool that lets non-programmers formulate complex arbitrary queries against the data model. The project website is at www.flymine.org.
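The abstract does not say how FlyMine matches incoming queries to precomputed tables. The Java sketch below is a hypothetical, greedy illustration of the general idea only. A query is reduced to the set of master tables it joins plus its join conditions (as plain strings). A precomputed table can stand in for part of the query when all of its tables are still uncovered and all of its join conditions appear in the query. The names `PrecomputedRewriter`, `Query`, `Precomputed` and `rewrite` are invented for this example. Filters, column remapping and cost estimation, which a real rewriter must handle, are omitted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of precomputed-table substitution; not FlyMine's actual algorithm.
public class PrecomputedRewriter {

    // A query reduced to the master tables it joins and its join conditions.
    public record Query(Set<String> tables, Set<String> joins) {}

    // A precomputed table: a materialised join over some master tables.
    public record Precomputed(String name, Set<String> tables, Set<String> joins) {}

    // Returns the relations to scan after greedily substituting precomputed tables.
    // Precomputed tables come first in the plan; leftover master tables follow in sorted order.
    public static List<String> rewrite(Query q, List<Precomputed> available) {
        Set<String> remainingTables = new TreeSet<>(q.tables());
        Set<String> remainingJoins = new HashSet<>(q.joins());
        List<String> plan = new ArrayList<>();

        // Simplifying heuristic: try precomputed tables covering the most master tables first.
        List<Precomputed> candidates = new ArrayList<>(available);
        candidates.sort(Comparator.comparingInt((Precomputed p) -> p.tables().size()).reversed());

        for (Precomputed p : candidates) {
            // Usable only if none of its tables is already covered
            // and every join it materialises is also required by the query.
            if (remainingTables.containsAll(p.tables()) && remainingJoins.containsAll(p.joins())) {
                plan.add(p.name());
                remainingTables.removeAll(p.tables());
                remainingJoins.removeAll(p.joins());
            }
        }

        // Any master tables not covered by a precomputed table are scanned directly.
        plan.addAll(remainingTables);
        return plan;
    }
}
```

For example, given a query joining `gene`, `protein` and `organism`, and a precomputed table `pc_gene_protein` that materialises the gene-protein join, `rewrite` returns `[pc_gene_protein, organism]`: one scan of the precomputed table replaces the original two-table join. Choosing greedily by size keeps the sketch short, but it does not guarantee the best combination of precomputed tables.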