ISCB Newsletter v.6.3

The third meeting of the Special Interest Group (SIG) for Text Mining was held in conjunction with ISMB in Australia this year, following the 2001 meeting in Copenhagen and the 2002 meeting in Edmonton. The Text Mining SIG has been organized by the BioLINK group (www.pdg.cnb.uam.es/BioLINK), with its main contributors Lynette Hirschman from MITRE (Bedford, MA) and Alfonso Valencia from the CNB (Madrid, Spain), together with this year’s local organizers Christian Blaschke, Marc Light and Alexander Yeh. The SIG’s main goal has been to foster communication in text mining and information extraction applied to biology and biomedicine. To this end, the BioLINK group holds regular open meetings that bring together researchers from the field to exchange ideas and share them with a wider community interested in the latest developments; in the past two meetings, the Text Mining SIG has included reports from related SIGs (e.g., Bio-Ontologies and BioPathways).

Despite their successes in other fields, Natural Language Processing (NLP) techniques were not introduced into biology until the late 1990s. The field has been dominated by two approaches that do not necessarily converge:

Application-oriented, where simple (possibly too simple) methods are used to address "real" biological problems;
Tool-oriented, where complex, state-of-the-art NLP methods are used to address problems that are not always relevant to biologists.

During the SIG meeting it became apparent that three major bottlenecks hinder current development:

The complex and non-standardized nomenclature of genes and proteins in the scientific literature. This makes it difficult to identify the basic content of a document, in particular, the entities mentioned.
The absence of large, annotated standard corpora for training and evaluating alternative methods.
The lack of common standards and evaluation criteria that allow researchers to compare the performance of different methodologies.

To begin to address these problems, the BioLINK group is organizing a critical assessment of text mining methods later this year (see www.pdg.cnb.uam.es/BioLink/BioCreative.eval.html). The assessment is inspired by the CASP evaluations and will be carried out in collaboration with Swiss-Prot, HighWire Press, FlyBase and other groups. The tasks will include entity detection (gene and protein names) and the functional characterization of proteins by assignment of Gene Ontology terms, based on full-text documents.

Bio-Ontologies SIG

The annual one-day Bio-Ontologies Meeting has now been running for six years. The meeting consists of a series of half-hour talks. These range in style from fairly formal presentations of complete pieces of work, through works in progress, to the very informal and discursive.

The theme of this year's meeting was “Ontologies and Text Analysis”. The programme was strong this year, so the tension between allowing time for the many excellent talks and leaving room for discussion and questions from the floor was particularly keenly felt.

Although ontologies have been used for many years, the recent explosion of interest within bioinformatics can be explained straightforwardly in three words: “The Gene Ontology” (www.geneontology.org). The success of GO continued to show at this year's meeting, with many talks relating directly or indirectly to GO.

Two main themes emerged strongly this year (the call, program, abstracts and slides may be found at www.man.ac.uk/~stevensr/meeting03/). The first of these was the intended theme of the overlap between text and ontologies. This was the first time that a theme set for this meeting had actually been followed, indicating the close relationship of the two areas.

The second theme was the increase in bridges being built between formal ontologists and bioinformaticians. We welcome the willingness of many within the bio-ontology community to operate across domains to find solutions that work. The sixth annual Bio-Ontologies Meeting showed that the use of ontologies is moving beyond a purely niche interest.

BioPathways SIG

The 5th BioPathways Consortium Meeting gathered 21 speakers, close to a hundred registered participants, and an undetermined number of visitors from neighboring SIGs.

The meeting featured two main scientific sessions, a session focused on databases and data exchange, and a contributed session on software tools. To foster depth of exchange, scientific sessions were structured as a series of long presentations, concluded by one hour of open discussion on the session theme.

The “Regulation and interactions on a systems scale” session revolved around a few key themes, such as the search for “modules” in biological networks, or the use of stochastic models and inference methods. The issues of reliability and validation of predictions were ubiquitous.

The “Function and evolution of metabolic networks” session illustrated well how the ongoing generation of measurements on a whole-cell scale is driving the need for models (static and dynamic), for network reconstruction methods, and for analytical approaches. A recurring theoretical theme was the use of comparative approaches.

The “Databases and Ontologies” session featured presentations on pathways exchange formats and languages by a fairly comprehensive selection of the existing groups (SBML, BioPax, CellML, OMG-LSR), as well as presentations on pathways databases, for which there is clearly an unfilled need. Finally, the “Software Tools” session included presentations on a variety of pathways data management and visualization tools.

In summary, the meeting confirmed that the field of networks-related computational biology is more than ever in a fast-growth stage, with both frontier and depth expanding. While some theoretical subfields, such as network reconstruction from experimental data, are acquiring technical depth and generating predictions of increasing biological relevance, there is a clear trend towards a stronger coupling between theoretical and experimental approaches, leading to new open questions on both sides. Another noticeable trend is the strong revival of fields which had been perceived as fairly well understood and stable, such as metabolism, thanks both to the “systems-wide” perspective and to new theoretical tools.

More details on the meeting program can be found at www.biopathways.org.