G-language Genome Analysis Environment Version 2

Yohei Yamada1, Kazuharu Arakawa2, Ryo Hattori, Yusuke Kobayashi, Hayataro Kouchi, Atsuko Kishi, Masaru Tomita
1skipper@g-language.org, Keio University, Department of Environmental Information; 2gaou@g-language.org, Keio Institute for Advanced Biosciences

The G-language Project of the Institute for Advanced Biosciences aims to make genome analysis more efficient by (1) constructing an integrated environment for the development of analysis software; (2) systematically accumulating existing analysis software, analysis methodologies, and their results; and (3) constructing generic analysis packages that allow users to avoid redundancy in the process of analysis. Development of version 1 of the G-language Genome Analysis Environment (G-language GAE) reached a stable stage in July 2002, and we are currently implementing version 2 of the software with greater efficiency and flexibility. The version 2 core has been rewritten to gain further speed and integrity through a more flexible object-oriented architecture. Database access drivers, I/O controllers, and calculation engines are easily pluggable, and the internal data structures can also be altered by defining a specific superclass. Supported genome database formats are extended to GenBank, Fasta, EMBL, Swiss, PIR, SCF, GCG, Ace, raw, ptt, eri, Qual, Phd, and XML genome database formats such as GAME, BSML, and EML. Most common relational database systems, such as Oracle, MySQL, Postgres, Ingres, Sybase, and Informix, can be accessed through unified procedures. The whole system is fully compatible with version 1 of the software and with Perl, and it can be easily linked to bioperl. An instance of G-language GAE is interchangeable with that of bioperl, and all analysis methods provided by the system are directly accessible with a bioperl instance. Large-scale internal caching and storage of data in a relational database reduces the cost of computation and memory, and structured, persistent data makes possible a "session" of analysis that allows continuous analysis through cascades of analysis programs. An instance of G-language GAE can be instantly converted to and stored as a relational database. The new front-end, "Inspire", achieves platform independence and freedom in presentation by making HTML the basic framework of the output. With HTML, anyone can create a multimedia presentation with little experience, and an expert can create interactive and dynamic presentations enhanced with Java, Flash, SVG, and similar technologies. G-language GAE version 2 provides a set of APIs for the development of SVG (Scalable Vector Graphics). SVG presentations can be clicked, zoomed in and out, dragged, and animated, and they gain all the benefits of hyperlinks for connecting to useful online resources. The Inspire front-end runs both locally and on the Web; therefore, an application created for Inspire can be directly converted into a web application. This achieves complete operating system independence. The new genome map output is rendered in SVG. Analysis systems being developed concurrently with G-language GAE version 2 are the new version of the cDNA Analysis System (CASYS: Matsuzaki et al.), the Comparative Genome Analysis System (COMGA: Nakamura et al.), the Chi Sequence Analysis System (Arakawa and Kyuma et al.), and the Bacteria Analysis System (BAS: Mori et al.). This work is supported by the members of the G-language Project. URL: http://www.g-language.org/ (G-language Project official Web site).