Extracting Transcription Factor Interactions from Medline Abstracts

Marc Light¹, Robert Arens, Vladimir Leontiev, Meredith Patterson, Xinying Qiu, Hudong Wang
¹ marc-light@uiowa.edu, University of Iowa

Staying abreast of research on transcription factors (TFs) is currently a difficult task for biologists: the body of research is already too large, and it is still growing. We are building a system that will automatically extract TF interactions from Medline abstracts and populate a database table with them, using a number of computational linguistics modules. We are in the early stages of the project but have manually annotated a corpus of Medline abstracts for TF interactions. We have also formally evaluated, on Medline abstracts, a number of component technologies likely to be useful for the task, e.g., a tokenizer, a part-of-speech tagger, a word sense disambiguator, and a syntactic parser. The corpus currently comprises 97 positive examples (abstracts noting TF interactions) and 784 negative examples. In addition, for each positive abstract, the sentences that imply an interaction have been marked, along with the interacting TFs. In this poster we present the corpus, the curation process, the evaluations of the component technologies, and our proposed system design.
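To make the annotation scheme concrete, the following is a minimal sketch of how the corpus and the target database rows might be represented. The class and field names (`AnnotatedAbstract`, `InteractionSentence`, `pmid`, `interaction_rows`) are hypothetical and not taken from the project; the sketch also assumes that an interaction is an unordered pair of TFs, so the pair is sorted before being stored. The only figures used are the corpus counts stated above (97 positive and 784 negative abstracts), which give a positive rate of 97/881, roughly 11%.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InteractionSentence:
    """A sentence marked as implying a TF interaction (hypothetical schema)."""
    text: str
    tfs: tuple[str, str]  # the two interacting transcription factors


@dataclass
class AnnotatedAbstract:
    """One Medline abstract with its annotation (hypothetical schema)."""
    pmid: str   # assumed identifier field
    text: str
    positive: bool  # True if the abstract notes a TF interaction
    sentences: list[InteractionSentence] = field(default_factory=list)


def positive_rate(corpus: list[AnnotatedAbstract]) -> float:
    """Fraction of abstracts annotated as positive."""
    return sum(a.positive for a in corpus) / len(corpus)


def interaction_rows(corpus: list[AnnotatedAbstract]) -> list[tuple[str, str, str]]:
    """Flatten sentence-level annotations into (pmid, tf_a, tf_b) table rows.

    Assumes interactions are symmetric: each TF pair is sorted and
    duplicates within an abstract are collapsed.
    """
    rows = set()
    for abstract in corpus:
        for sentence in abstract.sentences:
            tf_a, tf_b = sorted(sentence.tfs)
            rows.add((abstract.pmid, tf_a, tf_b))
    return sorted(rows)


if __name__ == "__main__":
    # Placeholder corpus with the stated class balance; texts are empty stand-ins.
    corpus = [AnnotatedAbstract(f"pos{i}", "", True) for i in range(97)]
    corpus += [AnnotatedAbstract(f"neg{i}", "", False) for i in range(784)]
    print(f"{positive_rate(corpus):.3f}")  # prints 0.110
```

The skew toward negative examples is worth keeping in view when the extraction components are evaluated, since a classifier that always answered "no interaction" would already be right about 89% of the time on this corpus.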