Oral Presentation Number: 36

J. Gregory Caporaso, University of Colorado Health Sciences Center, Dept. of Biochemistry and Molecular Genetics

Author(s): J. Gregory Caporaso, William A. Baumgartner, Jr., Hyunmin Kim, Zhiyong Lu, Helen L. Johnson, Olga Medvedeva, Anna Lindemann, Lynne Fox, Elizabeth K. White, K. Bretonnel Cohen, and Lawrence Hunter

Title: Information Retrieval, Question-Answering, Machine Learning, and Concept Recognition in TREC Genomics 2006

Abstract: TREC Genomics 2006 presented a challenge to automatically identify document passages that answer questions on twenty-seven topics from a collection of 162,259 full-text biomedical journal articles. Questions were derived from actual information needs of biomedical researchers, and performance was based on human evaluation of the retrieved passages. The Center for Computational Pharmacology approached this task by generating a candidate result set, which was subsequently expanded (to improve recall) and then pruned (to improve precision). First, we created a Lemur search index and converted questions into term-expanded queries. Pseudo-relevance feedback was employed to expand search results. Next, from a pool of zone-filtered documents, we further expanded the result set using an LSA-like approach. In the final step, which eliminates false positives, we used naïve Bayesian classifiers trained on human-labeled data with features including bigrams, normalized semantic concepts, and OpenDMAP-style pattern matches. Three separate experiments allow us to compare the utility of the different techniques in this task.
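
The final pruning step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a multinomial naïve Bayes model with add-one smoothing, uses only word bigrams as features (the semantic-concept and OpenDMAP-style pattern features are omitted), and the names `BigramNaiveBayes`, `prune`, and the labels `"relevant"` / `"irrelevant"` are hypothetical.

```python
import math
from collections import Counter


def bigrams(text):
    """Return the word bigrams of a passage (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


class BigramNaiveBayes:
    """Toy multinomial naive Bayes over bigram features (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # additive smoothing constant (assumed value)
        self.counts = {}              # label -> Counter of bigram counts
        self.doc_counts = Counter()   # label -> number of training passages
        self.vocab = set()            # all bigrams seen during training

    def fit(self, passages, labels):
        for text, label in zip(passages, labels):
            feats = bigrams(text)
            self.counts.setdefault(label, Counter()).update(feats)
            self.doc_counts[label] += 1
            self.vocab.update(feats)
        return self

    def log_scores(self, text):
        """Log prior plus smoothed log likelihood for each label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        feats = [f for f in bigrams(text) if f in self.vocab]  # skip unseen bigrams
        scores = {}
        for label, feat_counts in self.counts.items():
            n = sum(feat_counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for f in feats:
                score += math.log(
                    (feat_counts[f] + self.alpha) / (n + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def prune(candidates, model, keep_label="relevant"):
    """Keep only the candidate passages the classifier labels as relevant."""
    return [p for p in candidates if model.predict(p) == keep_label]
```

In this sketch, a model fitted on human-labeled passages would be passed to `prune` together with the expanded candidate set, so that only passages classified as relevant remain in the final result.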