The biomedical literature, as measured by the number of entries in the National Library of Medicine's MEDLINE database, has been growing exponentially (~e^0.043) for over two decades. Last year, 562,134 articles were added, more than 1,540 per day. Furthermore, the genomic revolution is breaking down disciplinary boundaries in biomedicine, greatly expanding the number of potentially relevant publications that researchers must track. High-throughput techniques, such as expression arrays and shotgun proteomics, exacerbate this problem by identifying dozens to thousands of genes or gene products relevant to the phenomena under study; many of these will have been characterized in subdisciplines previously thought to be unrelated to the study. Computational information extraction and retrieval techniques are becoming increasingly important tools for managing the biomedical literature and for rapidly finding and organizing all available information about large gene sets. Recent progress suggests that computational natural language processing techniques may be more effective on biomedical text than on general English. In this talk, I will give an overview of some of the relevant techniques and applications of computational natural language processing and describe recent results obtained in my laboratory.