PRIME: automatically extracted PRotein Interactions and Molecular Information databasE.

Asako Koike (1), Yoshiyuki Kobayashi (2), Toshihisa Takagi
(1) akoike@ims.u-tokyo.ac.jp, Human Genome Center, The Institute of Medical Science, Univ. of Tokyo; (2) yashi@ls.hitachi.co.jp, Life Science Group, Hitachi, Ltd.

PRIME is an integrated database covering the major completely sequenced eukaryotes and is an extended version of the Kinase Pathway Database (http://kinasedb.ontology.ims.u-tokyo.ac.jp/). It contains protein-protein/family, protein-gene, and protein-compound interaction data, domain information, structural information, and protein kinase classifications. It also provides an interface that automatically draws pathway graphics. The protein, gene, and compound interactions are extracted automatically from abstracts for all genes and proteins by natural language processing. The extraction method uses phrase patterns together with two dictionaries: GENA (http://gena.ontology.ims.u-tokyo.ac.jp/search/servlet/gena), a protein, gene, and compound name dictionary, and the Family name dictionary, which contains hierarchical family/protein names. Together these allow interactions to be extracted even when the protein/gene names are ambiguous. With this database, pathways can be compared easily among species using protein interaction data and ortholog tables, and unknown pathways can be predicted under various conditions, such as sequence similarity and domain composition, using protein interaction data from other organisms. The database will be available for querying and browsing at http://prime.ontology.ims.u-tokyo.ac.jp/.
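
As an illustration of how phrase patterns and name dictionaries can work together, the following is a minimal Python sketch. It is not the PRIME implementation: the dictionary entries, the three phrase patterns, and the function names (resolve, extract_interactions) are all hypothetical and chosen only for this example. A name is first looked up in a small name dictionary; if it is not found there, it is treated as a family name and expanded to the family's members.

```python
import re

# Hypothetical name dictionary: surface form (lowercase) -> canonical identifier.
# A stand-in for a resource like GENA; the entries are illustrative only.
NAME_DICT = {
    "raf-1": "RAF1",
    "c-raf": "RAF1",
    "mek1": "MAP2K1",
    "erk2": "MAPK1",
}

# Hypothetical family dictionary: family name (lowercase) -> member identifiers.
FAMILY_DICT = {
    "mek": ["MAP2K1", "MAP2K2"],
    "erk": ["MAPK1", "MAPK3"],
}

# Hypothetical phrase patterns: (relation label, regex for the connecting phrase).
PATTERNS = [
    ("phosphorylates", r"phosphorylates"),
    ("binds", r"binds(?: to)?"),
    ("interacts", r"interacts with"),
]


def resolve(name):
    """Map a surface name to identifiers: exact name first, then family members."""
    key = name.lower()
    if key in NAME_DICT:
        return [NAME_DICT[key]]
    return list(FAMILY_DICT.get(key, []))


def extract_interactions(sentence):
    """Return (agent, relation, target) triples found by 'A <phrase> B' patterns."""
    # Longest names first so that "MEK1" is preferred over the family name "MEK".
    names = sorted(list(NAME_DICT) + list(FAMILY_DICT), key=len, reverse=True)
    name_re = "|".join(re.escape(n) for n in names)
    results = []
    for label, phrase in PATTERNS:
        regex = re.compile(
            rf"\b({name_re})\s+{phrase}\s+({name_re})\b", re.IGNORECASE
        )
        for match in regex.finditer(sentence):
            # A family name expands to one triple per family member.
            for agent in resolve(match.group(1)):
                for target in resolve(match.group(2)):
                    results.append((agent, label, target))
    return results


if __name__ == "__main__":
    print(extract_interactions("Raf-1 phosphorylates MEK1 in vitro."))
    print(extract_interactions("c-Raf binds to ERK."))
```

In this sketch the synonyms "Raf-1" and "c-Raf" resolve to the same identifier, while the family name "ERK" expands to one interaction per family member. A real system would need far richer patterns and disambiguation than this toy example.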