An algorithm to select abstracts from MEDLINE concerning UV-regulated genes

Hiroko Ao1, Toshihisa Takagi
1aohiroko@ims.u-tokyo.ac.jp, Department of Computational Biology

Ultraviolet (UV) radiation is one of the most serious risk factors for skin. It causes undesirable changes such as wrinkles, spots, and cancers. To investigate UV-regulated genes, we attempted to extract information on their interactions from the MEDLINE database. However, searching by gene name alone did not meet this need, because it returned an enormous number of irrelevant abstracts. Accordingly, we have developed an efficient algorithm to select potentially relevant abstracts from the results of a PubMed search. The algorithm is based on the observation that relevant abstracts usually include the query (a gene name) and at least one keyword. Keywords are chosen from synonyms collected from public databases (e.g., SWISS-PROT and LocusLink). When the heuristic was applied to 487 UV-regulated genes, it extracted sentences containing the query with 97% precision and 97% recall.
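
The selection rule described above (keep an abstract only if it contains the query gene name and at least one keyword) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the case-insensitive whole-term matching, and the naive sentence splitter are all assumptions made for illustration, and the example texts are invented.

```python
import re


def contains_term(text, term):
    # Assumed matching rule: case-insensitive, and the term must not be
    # embedded in a longer word (so "p53" does not match inside "TP53").
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def select_abstracts(abstracts, query, keywords):
    # Keep abstracts that mention the query AND at least one keyword
    # (keywords would be synonyms gathered from public databases).
    return [
        a for a in abstracts
        if contains_term(a, query) and any(contains_term(a, k) for k in keywords)
    ]


def sentences_with_query(abstract, query):
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    return [s for s in sentences if contains_term(s, query)]


# Hypothetical inputs, for illustration only.
abstracts = [
    "UV irradiation induced p53 expression in keratinocytes. TP53 is a tumor suppressor.",
    "The P53 bus route was closed for repairs.",
    "TP53 mutations are common in skin tumors.",
]
selected = select_abstracts(abstracts, "p53", ["TP53", "tumor protein p53"])
for abstract in selected:
    print(sentences_with_query(abstract, "p53"))
```

Requiring a synonym keyword in addition to the query is what filters out abstracts where the gene name appears only as a coincidental string, at the cost of missing abstracts that use no listed synonym.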