The Encyclopedia of Life (EOL) Project

Phil Bourne¹, Wilfred Li², Baldridge, K.; Baru, C.; Byrnes, R.; Clingman, E.; Cotofana, C.; Ferguson, C.; Fountain, A.; Greenberg, J.; Jermanis, D.; Matthews, J.; Miller, M.; Mitchell, J.; Mosley, M.; Pekurovsky, D.; Quinn, G.B.; Reyes, V.; Rowley, J.; Shindyalov, I.; Smith, C.; Stoner, D.; Veretnik, S.
¹ bourne@sdsc.edu, San Diego Supercomputer Center; ² wilfred@sdsc.edu, San Diego Supercomputer Center

There are currently more than 800 genomes for which sequence data is publicly available. Accompanying this massive supply of genomic data is a need to annotate putative protein sequences from structural and functional points of view. The Encyclopedia of Life (EOL) is an ambitious project to extensively catalog the complete proteome of every living species in a flexible, powerful reference system. An open collaboration led by the San Diego Supercomputer Center, EOL will generate biological insight using the world's foremost academic computational resources. This includes calculating three-dimensional models and assigning biological function for all recognizable proteins in all currently known genomes.

Central to EOL genomic data processing is an annotation pipeline, a computationally intensive process that draws on Grid, supercomputer and cluster computing resources. Two important issues in the pipeline are automation and the automated quality assessment that must accompany it. In the pipeline model, these issues were addressed by introducing six reliability categories, building a benchmark based on 1000 non-redundant SCOP folds, and testing a variety of search conditions and methods against that benchmark.
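
As an illustration of how reliability categories might be checked against a benchmark of known folds, the following is a minimal sketch rather than the EOL pipeline's actual implementation. The E-value cut-offs, the category labels A to F, and the function names are all hypothetical and chosen only for the example. The abstract does not specify how the six categories are defined.

```python
from bisect import bisect_left

# Hypothetical E-value cut-offs separating six reliability categories
# (A = most reliable, F = least reliable). These values are illustrative
# assumptions, not the thresholds used by the EOL pipeline.
CUTOFFS = [1e-50, 1e-20, 1e-10, 1e-4, 1e-2]
LABELS = ["A", "B", "C", "D", "E", "F"]


def reliability_category(e_value):
    """Map an E-value to a category; values at or below a cut-off
    fall into the more reliable category."""
    return LABELS[bisect_left(CUTOFFS, e_value)]


def benchmark_accuracy(hits):
    """Tally correct fold assignments per category.

    hits: iterable of (e_value, predicted_fold, true_fold) tuples, where
    true_fold comes from a benchmark set with known fold labels.
    Returns {category: (n_correct, n_total)}.
    """
    tally = {label: [0, 0] for label in LABELS}
    for e_value, predicted, true in hits:
        category = reliability_category(e_value)
        tally[category][1] += 1
        if predicted == true:
            tally[category][0] += 1
    return {label: tuple(counts) for label, counts in tally.items()}


if __name__ == "__main__":
    # Made-up benchmark hits, for demonstration only.
    example_hits = [
        (1e-60, "a.1", "a.1"),
        (1e-30, "b.2", "b.2"),
        (1e-30, "b.2", "c.3"),
        (0.5, "d.4", "a.1"),
    ]
    for label, (correct, total) in benchmark_accuracy(example_hits).items():
        print(label, correct, total)
```

Running the same tally for each search method or parameter setting would show how often assignments in each category agree with the benchmark, which is one way an automated quality measure could be calibrated.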

Scientists will be able to uncover the prevalence of a given protein across all kingdoms of life, molecular interactions with that protein, and whether the function of the protein varies across species. EOL caters to a diversity of users, from researchers interested in proteomic associations, to undergraduates wishing to know the name and function of proteins associated with a particular organism, and even to elementary school students learning about proteins for the first time. For further information about the EOL project and to access the beta development version, point your web browser to: http://www.eolproject.info