Strategies for Seeking and Publishing Biomedical Literature on the WWW

Richard K. Belew
University of California, San Diego

Mark Craven
University of Wisconsin

Synopsis

It is hard to imagine bioinformatics having grown up without the WWW. Biological scientists now search for and post data, scientific publications, and curricula in a variety of formats as a routine part of their work. This tutorial will provide an introduction to both well-established and state-of-the-art methods for finding, publishing, and extracting information from on-line, text-based sources.

Topics to be covered include:

Information retrieval basics
The basic methods and principles that underlie all information retrieval systems, including general-purpose WWW search engines as well as biomedically focused resources like PubMed.
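
One principle shared by most of these systems is the inverted index, which maps each term to the documents containing it, combined with a term-weighting scheme for ranking. The sketch below assumes simple TF-IDF weighting and a toy three-document collection; it illustrates the idea and is not how PubMed or any particular search engine actually scores results.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    # Inverted index: term -> {doc_id: term frequency in that document}.
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

def search(query, index, n_docs):
    # Score each document by the summed TF-IDF weight of the query terms it contains.
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # rarer terms get higher weight
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy collection for illustration only.
docs = {
    "d1": "p53 regulates apoptosis in tumor cells",
    "d2": "apoptosis and caspase activation",
    "d3": "tumor suppressor p53 mutations in cancer",
}
index = build_index(docs)
results = search("p53 apoptosis", index, len(docs))
```

Here d1 ranks first because it is the only document containing both query terms. Note that a term appearing in every document gets an IDF of zero and so contributes nothing to the ranking.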

Bibliometric search techniques
Systems such as Entrez and Swanson's ARROWSMITH that (like Google) exploit bibliographic citations to uncover important relationships across the biomedical literature.
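
The Google comparison refers to link analysis, in which a page (or paper) is considered important if important pages point to it. The following is a minimal PageRank-style sketch over a hypothetical citation graph, assuming a damping factor of 0.85 and a fixed number of iterations. It shows the general technique only and is not a description of how Entrez or ARROWSMITH work internally.

```python
def pagerank(citations, damping=0.85, iterations=50):
    # citations: paper -> list of papers it cites (edges run from citing to cited).
    papers = set(citations) | {p for cited in citations.values() for p in cited}
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                # Pass an equal share of this paper's rank to each paper it cites.
                share = damping * rank[p] / len(cited)
                for q in cited:
                    new_rank[q] += share
            else:
                # A paper that cites nothing spreads its rank evenly over all papers.
                for q in papers:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank

# Hypothetical citation graph: A, B and D all cite C; D also cites B.
citations = {
    "A": ["C"],
    "B": ["C"],
    "D": ["C", "B"],
}
ranks = pagerank(citations)
```

Paper C, cited by every other paper, ends up with the highest score, and the scores always sum to 1.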

Document similarity
Methods for identifying "similar" documents, based on both keywords and citations associated with the documents.
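
Two common ways to make "similar" concrete are cosine similarity between keyword-count vectors and bibliographic coupling (two papers are related if they cite the same references). The sketch below assumes raw term counts and normalizes coupling as a Jaccard overlap. Both choices are illustrative and are not necessarily the measures covered in the tutorial.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine of the angle between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def coupling(refs_a, refs_b):
    # Bibliographic coupling, normalized here as the Jaccard overlap of reference lists.
    refs_a, refs_b = set(refs_a), set(refs_b)
    union = refs_a | refs_b
    return len(refs_a & refs_b) / len(union) if union else 0.0

# Hypothetical keyword lists and reference IDs.
doc1 = Counter("p53 apoptosis tumor p53".split())
doc2 = Counter("p53 tumor suppressor".split())
keyword_sim = cosine(doc1, doc2)
citation_sim = coupling(["r1", "r2", "r3"], ["r2", "r3", "r4"])
```

The keyword similarity works out to 3 / (sqrt(6) * sqrt(3)), roughly 0.71, and the two reference lists share 2 of 4 distinct references, giving 0.5.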

Portals to biomedical sources
Special-purpose "portals" for accessing the biomedical literature. These will include more conventional text search engines like PubMed and Entrez, as well as sequence databases, such as SwissProt, that also provide entry points to the biomedical literature.
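
Many portals can also be queried by programs. NCBI's E-utilities service, which is not named in this synopsis, is one example. The sketch below only builds an ESearch request URL for PubMed and does not send it. The helper name `esearch_url` and the default `retmax` are illustrative choices, and anyone actually sending such requests should follow NCBI's usage guidelines.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esearch_url(term, db="pubmed", retmax=20):
    # Build (but do not send) an ESearch request that returns matching record IDs.
    # esearch_url is a hypothetical helper name used here for illustration.
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)

url = esearch_url("p53 AND apoptosis")
```

Passing `url` to `urllib.request.urlopen` would issue the request. The query string uses the same Boolean syntax a user would type into the PubMed search box.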

Authority resources
Resources such as the Medical Subject Headings (MeSH) controlled vocabulary, which is used to index articles in MEDLINE, and the Unified Medical Language System (UMLS), intended to help programs and people better regularize content descriptors.
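
A controlled vocabulary lets many surface phrasings map onto one preferred descriptor. The sketch below assumes a tiny, hand-made table in the spirit of MeSH entry terms. It is not copied from the real MeSH files, which are far larger and also encode hierarchical relationships between headings.

```python
# Hypothetical, hand-made excerpt mapping entry terms to preferred headings;
# NOT taken from the actual MeSH data files.
ENTRY_TERMS = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
    "tumor": "Neoplasms",
    "tumors": "Neoplasms",
    "neoplasm": "Neoplasms",
}

def normalize(phrases):
    # Map each phrase to its preferred heading; leave unknown phrases unchanged.
    return [ENTRY_TERMS.get(p.strip().lower(), p) for p in phrases]

headings = normalize(["Heart attack", "tumors", "p53"])
```

Indexing and searching on the normalized headings, rather than the raw phrases, lets a query for "tumor" find articles that only ever say "neoplasm".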

Morphological analysis
Methods that exploit the special morphology (surface structure) of many biomedical terms in order to improve retrieval accuracy.
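
Biomedical terms often carry telltale surface features, such as the "-ase" suffix on many enzyme names or short letter-digit gene symbols like "BRCA1". The rules below are hypothetical heuristics chosen for illustration. Real systems combine far richer patterns with lexicons, because simple suffix rules misfire (for example, "disease" would be labeled an enzyme here).

```python
import re

# Hypothetical suffix rules, checked in order; illustrative only.
SUFFIX_CLASSES = [
    (re.compile(r"ases?$"), "enzyme"),        # kinase, proteases
    (re.compile(r"itis$"), "inflammation"),   # hepatitis
    (re.compile(r"omas?$"), "tumor"),         # melanoma, carcinomas
]
# Short letter-then-digit tokens such as "p53" or "BRCA1" often name genes or proteins.
GENE_LIKE = re.compile(r"^[A-Za-z]{1,6}\d{1,3}[A-Za-z]?$")

def classify(token):
    # Return a guessed term class based only on the token's surface form, or None.
    for pattern, label in SUFFIX_CLASSES:
        if pattern.search(token.lower()):
            return label
    if GENE_LIKE.match(token):
        return "gene/protein?"
    return None
```

For example, `classify("kinase")` gives "enzyme" and `classify("BRCA1")` gives "gene/protein?", while an ordinary word such as "protein" matches no rule.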

Extraction methods
Emerging techniques that automatically extract targeted classes of keywords and relationships among them (e.g., protein-protein interactions) from text sources.
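
The simplest extraction approach matches hand-written lexical patterns around relation verbs. The pattern below is a hypothetical example. It assumes protein names are single capitalized or alphanumeric tokens and recognizes only three relation phrases. The learned and parser-based methods the tutorial refers to are considerably more robust than this.

```python
import re

# Hypothetical pattern: <Name> (interacts with | binds [to] | phosphorylates) <Name>,
# where a name is assumed to be one token starting with a capital letter.
PATTERN = re.compile(
    r"\b([A-Z][A-Za-z0-9-]*)\s+(interacts with|binds(?: to)?|phosphorylates)\s+([A-Z][A-Za-z0-9-]*)"
)

def extract_interactions(sentence):
    # Return (agent, relation, target) triples for every match in the sentence.
    return [m.groups() for m in PATTERN.finditer(sentence)]

triples = extract_interactions("We show that MDM2 binds to TP53 and ATM phosphorylates CHK2.")
```

This yields `("MDM2", "binds to", "TP53")` and `("ATM", "phosphorylates", "CHK2")`. Phrasings the pattern does not anticipate, such as passive voice ("TP53 is bound by MDM2"), are simply missed. That brittleness motivates the more general techniques.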

Quality assurance mechanisms
Proposed mechanisms (e.g., publication and annotation standards) designed to increase the fidelity of, and confidence in, biological data resources.

Publishing tricks for the Web
Finding relevant information requires intelligent search by browsing users, but authors can also increase the chances that their resources are found by using techniques (e.g., HTML META tags) that better describe WWW pages to search engines.
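
On the publishing side, META tags are short elements in a page's HEAD that summarize its content for crawlers. The sketch below generates description and keywords tags, escaping characters such as "&" so that the HTML stays valid. The helper name and sample text are hypothetical. How much weight a given engine gives these tags is up to that engine; Google, for instance, has stated that it does not use the keywords tag in ranking.

```python
from html import escape

def meta_tags(description, keywords):
    # Emit description/keywords META elements for a page's <head>,
    # escaping quotes, '&', '<' and '>' in the attribute values.
    # meta_tags is a hypothetical helper name used for illustration.
    return "\n".join([
        f'<meta name="description" content="{escape(description)}">',
        f'<meta name="keywords" content="{escape(", ".join(keywords))}">',
    ])

head = meta_tags("Curated p53 interaction data & references",
                 ["p53", "TP53", "apoptosis"])
```

Listing synonyms such as "p53" and "TP53" together is one way an author can anticipate the different terms searchers might use.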