ISCB Newsletter 9-2

BioLINK and BioCreAtIvE:
Linking Text to Biological Resources

Lynette Hirschman¹*, Rolf Apweiler⁶, Christian Blaschke², Evelyn Camon⁶, Gianni Cesareni⁷, K. Bretonnel Cohen⁸, Marc Colosimo¹, Jeffrey Colombe¹, Martin Krallinger³, Alexander A. Morgan¹, Hagit Shatkay⁴, Larry Smith⁵, Lorraine Tanabe⁵, W. John Wilbur⁵, Alfonso Valencia³*

¹MITRE Corporation, ²BIOALMA, ³CNIO - Spanish National Center for Oncology Research, ⁴Queens University, Ontario, ⁵NCBI, NLM, NIH, ⁶EBI-EMBL, ⁷U. Rome Tor Vergata, ⁸University of Colorado Health Sciences Center.

*Correspondence should be addressed to Alfonso Valencia
(valencia@cnio.es) or Lynette Hirschman (lynette@mitre.org)

Text mining for biomedicine is a rapidly growing field, with more than 200 papers published in 2005 and 100 systems available either online or for download. At this early stage in the life of a research area, it is important to set standards and evaluation criteria that pave the way for future progress by concentrating resources, focusing effort on key problems, and providing shared data and infrastructure.

For the past five years, the BioLINK ISCB SIG has been discussing the linkage of the biomedical literature to other biological resources, particularly ontologies and biological databases. Today's biologists and bioinformaticians rely not only on access to the literature (through PubMed), but increasingly on curated biological databases, such as the model organism databases, protein sequence and structure databases, and protein interaction and gene expression databases. In 2006, BioLINK will be co-organized with the Bio-ontologies SIG (bio-ontologies.man.ac.uk/home.html).

BioLINK has focused on two factors: 1) the need of experimental biologists to obtain better access to the biomedical literature, and 2) the need of biological database curators for better tools to support curation of the literature. These discussions led to the first BioCreAtIvE (Critical Assessment of Information Extraction in Biology), whose goal was to assess the state of the art in text mining applied to biological problems [1]. The first evaluation took place in 2003-2004 and attracted broad participation from the bioinformatics and biomedical text mining community. There were two tasks. The first addressed the extraction of gene or protein names from text and their mapping to standardized gene identifiers. The second addressed the automatic extraction and assignment of Gene Ontology (GO) annotations to human proteins based on evidence from full-text articles.

We are now launching the second BioCreAtIvE challenge, to be held in October 2006, with the workshop expected to take place in spring 2007 (see biocreative.sourceforge.net). It will consist of three tracks: Gene/Protein Mentions (GM), Gene/Protein Normalization (GN) and Protein-Protein Interaction (PPI). The PPI track of BioCreAtIvE II is new and will involve identifying protein-protein interactions from full-text papers, including extraction of excerpts from those papers that describe experimentally derived interactions, for curation into one of two interaction databases: IntAct [2] and MINT [3].

References
[1] Hirschman, L., Yeh, A., et al. (2005) Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics 6 Suppl 1:S1.

[2] Hermjakob, H., Montecchi-Palazzi, L., et al. (2004) IntAct - an open source molecular interaction database. Nucl. Acids Res. 32:D452-D455.

[3] Zanzoni, A., Montecchi-Palazzi, L., et al. (2002) MINT: a Molecular INTeraction database. FEBS Lett. 513(1):135-40.