Oral
Presentation Number: 36
Presenter: J. Gregory Caporaso, University of Colorado Health Sciences Center,
Dept. of Biochemistry and Molecular Genetics
Author(s): J. Gregory Caporaso, William A. Baumgartner, Jr., Hyunmin Kim,
Zhiyong Lu, Helen L. Johnson, Olga Medvedeva, Anna Lindemann, Lynne Fox,
Elizabeth K. White, K. Bretonnel Cohen, and Lawrence Hunter
Title: Information Retrieval, Question-Answering, Machine Learning, and Concept Recognition in TREC Genomics 2006
Abstract:
TREC Genomics 2006 presented a challenge to automatically identify
document passages that answer questions on twenty-seven topics from a collection
of 162,259 full-text biomedical journal articles. Questions were derived from
actual information needs of biomedical researchers, and performance was based
on human evaluation of the retrieved passages. The Center for Computational
Pharmacology approached this task by generating a candidate result set that
was subsequently expanded (to improve recall) and then pruned (to improve precision).
First, we created a Lemur search index and converted questions into term-expanded
queries. Pseudo-relevance feedback was employed to expand search results. Next,
from a pool of zone-filtered documents, we further expanded the result set using
an LSA-like (latent semantic analysis) approach. In the final step, which
eliminated false positives, we used
naïve Bayesian classifiers trained on human-labeled data with features
including bigrams, normalized semantic concepts, and OpenDMAP-style pattern
matches. Three separate experiments allow us to compare the utility of the different
techniques in this task.
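
The final pruning step can be illustrated with a minimal sketch, assuming a
naïve Bayes classifier over word bigrams only. This is not the system
described above: the class name, the `prune` helper, the "relevant" and
"irrelevant" labels, and the add-one smoothing are all assumptions made for
illustration, and the semantic-concept and OpenDMAP-style pattern features
are left out.

```python
import math
from collections import Counter


def bigrams(text):
    """Lowercase, whitespace-tokenize, and return adjacent word pairs."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


class BigramNaiveBayes:
    """Hypothetical multinomial naive Bayes over bigram features."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # additive smoothing (assumed value)
        self.counts = {}              # label -> Counter of bigram counts
        self.doc_counts = Counter()   # label -> number of training passages
        self.vocab = set()            # all bigrams seen in training

    def fit(self, passages, labels):
        for text, label in zip(passages, labels):
            feats = bigrams(text)
            self.doc_counts[label] += 1
            self.counts.setdefault(label, Counter()).update(feats)
            self.vocab.update(feats)
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior for each label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        feats = [f for f in bigrams(text) if f in self.vocab]  # skip unseen
        scores = {}
        for label, feat_counts in self.counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(feat_counts.values()) + self.alpha * vocab_size
            for f in feats:
                score += math.log((feat_counts[f] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def prune(candidates, clf, keep_label="relevant"):
    """Keep only the candidate passages the classifier labels as relevant."""
    return [c for c in candidates if clf.predict(c) == keep_label]
```

In such a sketch, `prune` would be applied to the expanded candidate set, so
that the recall gained by expansion is traded against the precision gained by
discarding passages the classifier rejects.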