20th Annual International Conference on
Intelligent Systems for Molecular Biology


Posters

Poster numbers will be assigned May 30th.
If you cannot find your poster below, you have probably not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the poster acceptance email, which contains a confirmation link. Click on it and follow the instructions.

If you need further assistance please contact submissions@iscb.org and provide your poster title or submission ID.

Category Y - ''
Y01 - An Innovative Biological Named Entity Recognition System Using Support Vector Machine and Web Evidence
Short Abstract: The number of electronic biological publications is growing rapidly, motivating the development of effective automated information retrieval systems. Identifying named entities in literature is an essential part of many projects. We propose an SVM-based biological named-entity recognition system that utilizes a small set of corpus-specific contextual features and supportive web evidence. Both the features and supportive evidence are obtained with little human intervention. Gene/protein names frequently co-occur with a restricted set of local contextual words, and therefore the statistical significance of the co-occurrence with those contextual words indicates the likelihood that a candidate word is a gene/protein name. We collected 35,384 sentences from 724 full-text articles that have at least one of 1153 pre-identified gene/protein names, and selected 43 contextual features from meaningful words that most frequently co-occur with these gene/protein names at the sentence level. For each selected contextual term, supportive web evidence, i.e., the ratio of the number of web PDF documents that contain both the candidate and contextual terms compared to the number containing only the candidate term, is identified. An SVM model is then trained on each set of such ratio vectors. During the prediction stage, we also consider the TF-IDF value, which limits the search for gene/protein names to the most meaningful terms in the texts, to further refine the SVM-predicted positives. Our system’s performance is comparable to ABNER on unseen texts and achieves an F1-score of 0.496, while requiring far fewer features and allowing simple adaptation to any corpus.
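The web-evidence feature described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the contextual terms, and the hit counts are hypothetical, and the "number containing only the candidate term" is read here as the number of documents returned when querying the candidate term alone. The abstract does not say how the counts were obtained.

```python
from typing import Dict, List


def web_evidence_vector(
    candidate_hits: int,
    joint_hits: Dict[str, int],
    context_terms: List[str],
) -> List[float]:
    """Return one ratio per contextual term for a single candidate word.

    candidate_hits: documents returned for the candidate term alone.
    joint_hits: documents returned for candidate AND contextual term.
    """
    if candidate_hits <= 0:
        # No evidence at all: return zeros rather than dividing by zero.
        return [0.0 for _ in context_terms]
    return [joint_hits.get(term, 0) / candidate_hits for term in context_terms]


# Hypothetical counts, for illustration only.
context_terms = ["expression", "binds", "mutant"]
vector = web_evidence_vector(
    candidate_hits=200,
    joint_hits={"expression": 150, "binds": 50},
    context_terms=context_terms,
)
print(vector)  # [0.75, 0.25, 0.0]
```

A vector of this kind, one per candidate word, is the sort of input the abstract describes passing to the SVM.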
Y02 - Contextualising and exploring human-pathogen molecular interactions through full-scale biomedical text mining
Short Abstract: A comprehensive list of human pathogens compiled by Woolhouse & Gaunt (2007; PMID 18033594) identified 1,399 known species. For each of these pathogens, focus in the literature (with varying intensity) has been placed on documenting molecular interactions between the host and the pathogen. In this study we have extracted human-pathogen molecular interactions from all available Medline abstracts and titles and open access PMC full text, using the current state of the art in text mining. Human-pathogen molecular interactions were extracted from these documents using Textpipe, a software pipeline integrating various top-performing named entity recognisers and event extractors. We created a database of the human-pathogen literature references by mapping co-occurrences of pathogen and human mentions to documents using text matches from a combined NCBI and custom species synonyms list, matching associated MeSH terms and corresponding gene mentions from a previously constructed dataset using GNAT (PMID: 18689813). To explore the extracted human-pathogen interactions, we also grouped each species taxonomically into various categories such as bacteria, virus, genus, etc. Furthermore, we supplemented each pathogen with other useful contexts, such as whether the pathogen is toxic or an allergen, its pathogenicity and whether a vaccine is available, and studied the nature of the associated interactions. Our results highlight commonly occurring types of interactions across human pathogens, relate these to infection outcome and treatment, and suggest candidate interactions for lesser-studied pathogens that have yet to be recorded. In conclusion, we have provided a resource and unique insight into the global human-pathogen molecular interaction network.
Y03 - Computational support for the reconstruction of Biochemical Networks
Short Abstract: Biochemical networks, as a synthesis of bio-molecular knowledge, play an essential role in systems biology and drug-related research. Assembly of such networks from heterogeneous and dispersed sources is typically performed by a group of experts (curators) who systematically scan relevant publications or databases and manually extract relevant information. We identified a number of key aspects where this particularly tedious and time-consuming task can be supported computationally. We outline a curation pipeline and elucidate important and novel components. First, we present a novel data model designed to seamlessly integrate network reconstruction, structural and functional network validation, formulation of mathematical models, and network simulations. Key concepts include the representation of the type and strength of the evidence supporting the relations between network entities and the use of controlled vocabularies from biomedical ontologies. Second, we describe ChemSpot, a named entity recognition tool for identifying mentions of chemicals in natural language texts. ChemSpot uses a hybrid approach combining a Conditional Random Field with a dictionary, achieving an F1 measure of 68.1% on the SCAI corpus and outperforming the only other freely available tool, OSCAR4, by 10.8 percentage points. Third, we provide preliminary results on a machine learning approach for identifying experimental techniques and methods in scientific literature. The proper combination of these methods allows for enhanced computational support in biochemical network reconstruction.
Y04 - LitMS: A flexible relevancy scoring system
Short Abstract: Several studies on the distribution of biomedical terms and relationships in the scientific literature show that they obey Zipf’s law, where a minority of topics are well supported with many literature hits, and the majority (the “long tail”) have few papers. We are particularly interested in the “long tail” of term pairs mentioned in only one paper, because it represents a large proportion of the literature in which documents with valuable, unique findings are intermingled with documents containing co-located, yet unrelated, pairs of concepts. This is where new ideas emerge, whether it is the characterization of a previously uncharacterized gene or a novel association between two previously known concepts.

We have developed LitMS, a flexible Literature Mining System, that scores individual documents on their relevance to one or more topics typically queried by researchers. LitMS uses the Apache Lucene information retrieval library to perform dictionary look-ups and extract contextual and positional features from the text, which are then passed to one of several scoring algorithms to assess relevance to specific topics of interest. Using Medline records as an example, we show that LitMS can accurately score document topicality (i.e. centrality to a concept), and can effectively scale to large datasets. LitMS scores also distinguish documents representing valid relationships between pairs of concepts from documents that mention pairs of unrelated concepts, achieving an F-score of 0.74 (0.89 precision and 0.63 recall) measured against a corpus of manually curated gene-disease relationships.
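The reported F-score can be checked against the stated precision and recall. The short Python sketch below assumes that "F-score" means the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name is illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
litms_f = f_score(0.89, 0.63)
print(round(litms_f, 2))  # 0.74
```

Under that assumption, the three reported figures are mutually consistent.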
Y05 - Elucidating gene signatures that control the circadian rhythm in cyanobacteria using bioinformatics methods
Short Abstract: Cyanothece sp. ATCC 51142 is an organism that has both photosynthetic and nitrogen-fixing ability. It has developed a temporal regulation in which N2 fixation and photosynthesis occur at different times throughout a diurnal cycle, with very high levels of CO2 fixation during the light and high levels of N2 fixation in the dark. The mechanisms underlying the circadian rhythm and the signature genes elucidating this mechanism are addressed in this research. The objective is to integrate gene expression data with data and knowledge from prior studies using bibliomics techniques, in the de novo construction of quasi-complete transcriptional regulatory networks, to identify gene signatures in functional motifs and elucidate their role in circadian rhythms in Cyanothece sp. ATCC 51142. Sequence data from a transcription profiling time series of Cyanothece sp. ATCC 51142 grown in 12-hour light/12-hour dark and then 24-hour light were used to construct the initial global regulatory network. The differentially regulated genes were used as the bait list to perform text mining using “BioMap”, an in-house tool, to extract associations from the literature. Different network topological features are used to identify the signature pathways during the day and night. The functions of already known genes in well-studied homologous species were mapped to the functions of the un-annotated genes of Cyanothece sp. ATCC 51142. We have identified significant (p<0.05) signature pathways, such as photosynthesis, pantothenate and CoA biosynthesis, and glyoxylate and dicarboxylate metabolism, that operate during the day. During the night, pathways such as ribosome, riboflavin metabolism, fatty acid biosynthesis, and sulfur metabolism were found to be significant (p<0.05).
Y06 - Knowledge Extraction from Biomedical Literature using Approximate Subgraph Matching
Short Abstract: Recent research in information extraction from the biomedical literature has focused on extracting important interactions between biological components and their downstream effects, known as events. Events of interest have included gene expression, binding, or regulation events. Often events can function as participants in other events, and the complex, nested event structures make automatic event extraction a challenging task. In this work, we developed an Approximate Subgraph Matching (ASM) approach to computationally mine biological events.

The key contextual dependencies among biological entities from full parsing of annotated text are identified automatically as event rules. Event recognition corresponds to a search for a subgraph within the sentence graph that is approximately isomorphic to the rule graph. ASM allows partial matching by introducing corresponding penalties and applying a distance threshold to determine the degree of similarity between a rule graph and a sentence graph. By integrating a certain degree of error tolerance into the graph matching process, ASM is capable of retrieving biological events encoded by longer-range dependencies and various syntactic relationships in sentences, while maintaining extraction precision at a high level.

We have applied ASM to the GENIA event extraction task of BioNLP-ST-2011. It achieves an F-score of 51.9% on the development data, making it the first system using automatically learned rules to obtain performance comparable to the dominant SVM-based systems. The 70.8% precision also shows an advantage over other methods. When applied to the extraction of protein-residue associations, ASM achieves a state-of-the-art 84% F-score, demonstrating the generalization capability of the approach.
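The idea of matching a rule graph against a sentence graph under a distance threshold can be illustrated with a heavily simplified Python sketch. It is not the ASM algorithm from the abstract: the distance used here (a fixed penalty for each rule edge missing from the sentence graph), the brute-force search over node mappings, and all names, labels, and example graphs are assumptions made for illustration only.

```python
from itertools import permutations
from typing import Dict, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (head node id, dependency label, tail node id)


def best_match_distance(
    rule_nodes: Dict[str, str],
    rule_edges: Set[Edge],
    sent_nodes: Dict[str, str],
    sent_edges: Set[Edge],
    missing_edge_penalty: float = 1.0,
) -> Optional[float]:
    """Smallest distance over all label-preserving injective node mappings.

    Returns None when the rule nodes cannot be mapped onto sentence nodes.
    """
    rule_ids = list(rule_nodes)
    best: Optional[float] = None
    # Brute force: try every injective assignment of rule nodes to sentence nodes.
    for chosen in permutations(sent_nodes, len(rule_ids)):
        mapping = dict(zip(rule_ids, chosen))
        if any(rule_nodes[r] != sent_nodes[s] for r, s in mapping.items()):
            continue  # node labels must agree
        missing = sum(
            1
            for head, label, tail in rule_edges
            if (mapping[head], label, mapping[tail]) not in sent_edges
        )
        distance = missing * missing_edge_penalty
        if best is None or distance < best:
            best = distance
    return best


def matches(distance: Optional[float], threshold: float) -> bool:
    """A rule fires when some mapping lies within the distance threshold."""
    return distance is not None and distance <= threshold


# Hypothetical rule and sentence graphs.
rule_nodes = {"r1": "Trigger", "r2": "Protein", "r3": "Protein"}
rule_edges = {("r1", "nsubj", "r2"), ("r1", "dobj", "r3")}
sent_nodes = {"s1": "Trigger", "s2": "Protein", "s3": "Protein", "s4": "Other"}
sent_edges = {("s1", "nsubj", "s2"), ("s1", "prep_to", "s3")}

d = best_match_distance(rule_nodes, rule_edges, sent_nodes, sent_edges)
print(d, matches(d, threshold=1.0), matches(d, threshold=0.0))
```

In this toy case one rule edge has no counterpart in the sentence graph, so an exact match (threshold 0) fails while a threshold of 1.0 accepts it, which is the kind of error tolerance the abstract describes.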
Y07 - Construction of a Semantic Chinese Medical Terminology DictionaryLexicon
Short Abstract: Lexicon-based analysis is commonly used in medical language processing applications such as information extraction and entity annotation. The quality of many medical language processing systems depends highly on the completeness of the lexicons they use. The Unified Medical Language System (UMLS) is a well-developed set of biomedical vocabularies for English, but resources for Chinese are limited. In this study, we construct a specifically designed Chinese Medical Thesaurus (CMT) as a basic tool to facilitate subsequent information retrieval in Chinese medical documents. Terms in this dictionary are collected from various data sources. After data cleaning and redundancy removal, we obtain more than 13,000 entries. They are put into 6 categories: drug name, disease, symptom and sign, anatomic location, pathophysiology, and laboratory test. Category names are assigned to each word as semantic tags. We evaluate CMT with annotated drug manual documents and some electronic discharge summaries from Fudan University Shanghai Cancer Center. CMT can successfully recognize nearly 80% of medical-related words and give their semantic meanings corresponding to the context. To identify medical concepts and the relations between terms, our next step is to try out methods for automatically building semantic relationships such as ‘functionally_related_to’ and ‘is-a’ within the dictionary. With a more refined organization of these data, we believe it will be a helpful tool for Chinese medical document retrieval and text mining.
