The third meeting
of the Special Interest Group (SIG) for Text Mining was held in
conjunction with ISMB in Australia this year, following the 2001
meeting in Copenhagen and the 2002 meeting in Edmonton. The Text
Mining SIG has been organized by the BioLINK group (www.pdg.cnb.uam.es/BioLINK),
with its main contributors Lynette Hirschman from MITRE (Bedford,
MA) and Alfonso Valencia from the CNB (Madrid, Spain), together with
this year's local organizers Christian Blaschke, Marc Light
and Alexander Yeh. The SIG's main goal has been to foster communication
in text mining and information extraction applied to biology and
biomedicine. To this end, the BioLINK group holds regular open meetings
to bring together researchers from the field to exchange ideas
and share them with a wider community interested in the latest developments;
in the past two meetings, the Text Mining SIG has included reports
from related SIGs (e.g., Bio-Ontologies and BioPathways).
Despite successes in other fields, Natural Language Processing (NLP) techniques
were not introduced into biology until the late 1990s. The field
has been dominated by two, not necessarily convergent, approaches:
oriented, where simple (possibly too simple) methods are used
to address "real" biological problems;
- Tool oriented,
where complex, state-of-the-art NLP methods are used to address
problems that are not always relevant to biologists.
During the SIG
meeting it became apparent that three major bottlenecks hinder current progress:
- The complex
and non-standardized nomenclature of genes and proteins in the
scientific literature. This makes it difficult to identify the
basic content of a document, in particular, the entities mentioned.
- The non-existence
of large, annotated standard corpora for training and evaluation
of alternative methods.
- The lack
of common standards and evaluation criteria that allow researchers
to compare the performance of different methodologies.
To begin to
address these problems, the BioLINK group is organizing a critical
assessment of text mining methods later this year (see www.pdg.cnb.uam.es/BioLink/BioCreative.eval.html).
The assessment is inspired by the CASP evaluations and will be carried
out in collaboration with Swiss-Prot, HighWire Press, FlyBase and
other groups. The tasks will include entity detection (gene and
protein names) and the functional characterization of proteins by
assignment of Gene Ontology terms, based on full-text documents.
The annual one-day
Bio-Ontologies Meeting has now been running for six years. The meeting
consists of a series of half-hour talks. These range in style from
fairly formal, complete pieces of work, through works in
progress, to the very informal and discursive.
The theme of
this year's meeting was Ontologies and Text Analysis.
The programme was strong this year, so the tension between
allowing time for the many excellent talks and leaving room for discussion and questions
from the floor was particularly keenly felt.
Although ontologies have been used for many years, the recent explosion of interest
within bioinformatics can be explained straightforwardly in three
words: The Gene Ontology (www.geneontology.org).
The success of GO continued to show at this year's meeting, with
many talks relating directly or indirectly to GO.
Two main themes
emerged strongly this year (the call, programme, abstracts and slides
may be found at www.man.ac.uk/~stevensr/meeting03/). The first of these was the intended theme
of the overlap between text and ontologies. This was the first time
a theme had been followed at this meeting, indicating the close
relationship of the two areas.
The second theme
was the increase in bridges being built between formal ontologists
and bioinformaticians. We welcome the willingness of many within
the bio-ontology community to operate across domains to find solutions
that work. The sixth annual Bio-Ontologies Meeting showed
that the use of ontologies is moving beyond a purely niche interest.
The 5th BioPathways
Consortium Meeting gathered 21 speakers, close to a hundred registered
participants, and an undetermined number of visitors from neighboring
featured two main scientific sessions, a session focused on databases
and data exchange, and a contributed session on software tools.
To foster depth of exchange, scientific sessions were structured
as a series of long presentations, concluded by one hour of open
discussion on the session theme.
and interactions on a systems scale session revolved around
a few key themes, such as the search for modules in
biological networks, or the use of stochastic models and inference
methods. The issues of reliability and validation of predictions
and evolution of metabolic networks session illustrated well
how the ongoing generation of measurements on a whole-cell scale
is driving the need for models (static and dynamic), for network
reconstruction methods, and for analytical approaches. A recurring
theoretical theme was the use of comparative approaches.
and Ontologies session featured presentations on pathway
exchange formats and languages by a fairly comprehensive selection
of the existing groups (SBML, BioPAX, CellML, OMG-LSR), as well
as presentations on pathway databases, for which there is clearly
an unfilled need. Finally, the Software Tools session
included presentations on a variety of pathways data management and
the meeting confirmed that the field of network-related computational
biology is more than ever in a fast-growth stage, with both frontier
and depth expanding. While some theoretical subfields, such as network
reconstruction from experimental data, are acquiring technical depth
and generating predictions of increasing biological relevance, there
is a clear trend towards a stronger coupling between theoretical
and experimental approaches, leading to new open questions on both
sides. Another noticeable trend is the strong revival of fields
which had been perceived as fairly well understood and stable, such
as metabolism, thanks both to the systems-wide perspective
and to new theoretical tools.
More information on the meeting program can be found at www.biopathways.org.