Automatic generation of cell-wide pathway model from complete genome

Kazuharu Arakawa1, Yohei Yamada2, Hiromi Komai, Kosaku Shinoda, Yoichi Nakayama, Masaru Tomita
1gaou@g-language.org, Institute for Advanced Biosciences, Keio University; 2skipper@g-language.org, Keio University

Knowledge in molecular biology is rapidly accumulating at the levels of the genome, transcriptome, proteome, and metabolome, and calls for a systems biology approach that views the dynamic behavior of a cell as a complex system. Simulation is challenging, however, especially where large-scale modeling is required, because it needs a vast number of accurate parameters. From its experience in modeling an in silico mitochondrion, the E-Cell project estimated the cost of modeling a whole Escherichia coli cell to be at least 1800 man-months. Large-scale modeling of a cell in silico therefore demands a novel high-throughput approach. If successfully integrated, the large amounts of available genome sequences, transcript and expression data, enzyme reaction data, metabolic pathway maps, and cellular metabolite data will form a strong base for a qualitative cell model. The Genome-based E-cell Modeling System (GEM System), built on the G-language Genome Analysis Environment, performs a fully automatic conversion of genome sequence data into a qualitative in silico cell model by linking information from major public databases such as GenBank, EMBL, SWISS-PROT, KEGG, ARM, BRENDA, and WIT. ORFs are matched to their corresponding proteins, and thus to stoichiometric reactions, through a combined method using annotation, homology, and orthology. The generated reaction network is then checked against KEGG reference pathway maps for false positives and false negatives and, where applicable, for pathway connectivity. In the hybrid algorithm of dynamic and static simulation methods, once all rate-limiting reactions are represented dynamically, every other reaction can be represented statically with the same accuracy as a completely dynamic simulation. The GEM System can semi-automatically generate the static part of this hybrid algorithm and provides a base for large-scale dynamic simulation.
The generated models can be directly simulated using E-Cell, and SBML porting is being developed.
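
The ORF-to-reaction matching and the reference-pathway check described above can be illustrated with a minimal Python sketch. It is not the GEM System's implementation: the evidence priority (annotation, then homology, then orthology), the toy ORF identifiers, the simplified reaction strings, and all function names are assumptions made for illustration only.

```python
# Illustrative sketch only; all data and names below are hypothetical.

# Evidence available per ORF: an EC number proposed by each method.
EVIDENCE = {
    "orf1": {"annotation": "2.7.1.1"},
    "orf2": {"homology": "5.3.1.9"},
    "orf3": {"orthology": "2.7.1.11", "homology": "2.7.1.11"},
    "orf4": {},  # no evidence: stays unassigned
}

# Simplified stoichiometric reactions keyed by EC number.
EC_TO_REACTION = {
    "2.7.1.1": "glucose + ATP -> G6P + ADP",
    "5.3.1.9": "G6P -> F6P",
    "2.7.1.11": "F6P + ATP -> FBP + ADP",
}

# EC numbers expected on a (toy) reference pathway map.
REFERENCE = {"2.7.1.1", "5.3.1.9", "2.7.1.11", "4.1.2.13"}

# Assumed order in which evidence sources are trusted.
PRIORITY = ("annotation", "homology", "orthology")


def assign_ec(evidence):
    """Return (ec_number, source) from the first source with a hit, else None."""
    for source in PRIORITY:
        if source in evidence:
            return evidence[source], source
    return None


def build_network(orf_evidence):
    """Map each assigned EC number to the ORFs (and evidence source) behind it."""
    network = {}
    for orf, evidence in orf_evidence.items():
        hit = assign_ec(evidence)
        if hit is not None:
            ec, source = hit
            network.setdefault(ec, []).append((orf, source))
    return network


def compare_with_reference(network, reference):
    """List reference steps with no ORF (candidate false negatives) and
    assigned steps absent from the reference (candidate false positives)."""
    found = set(network)
    return {
        "missing": sorted(reference - found),
        "extra": sorted(found - reference),
    }


network = build_network(EVIDENCE)
report = compare_with_reference(network, REFERENCE)

for ec in sorted(network):
    print(ec, EC_TO_REACTION.get(ec, "unknown reaction"), network[ec])
print(report)
```

In this toy case the step 4.1.2.13 appears on the reference map but has no supporting ORF, so it is flagged as a candidate false negative that would need a further search or manual review.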