Updated April 19, 2011
Wednesday – February 23, 2011
4:00 pm - 5:00 pm - Poster (Author) Set-up
5:00 pm - 7:00 pm - Poster Reception
Poster 1: Contextual Understanding of Experimental Data Via Formal Semantic Integration of NLP-extracted Content with other Semantically Integrated Resources
IO Informatics, Inc.
Berkeley, CA US
Linguamatics Ltd., UK
Biological systems are inherently complex. Experimental results, especially those spanning multiple experimental modalities or diverse biological responses, are difficult to interpret out of context. This is a key area for the application of semantic technologies. The first step is the integration of analytical results under a well-formed application ontology. Extensible semantic integration standards such as RDF, N3 and OWL are used to create triples-based data models that are coherent, dynamically extensible and remappable. This first step supports the rapid creation of coherent experimental correlation networks and provides a statistically relevant view of system perturbations. However, it does not necessarily provide a better understanding of the biological functions involved. To achieve contextual understanding, these networks need to be further enriched with mechanistic knowledge. This requires the ability to bring in resources relevant to biological functions, either through direct connections or via queries to SPARQL endpoints. Adding information about interactions, pathways or previous observations helps describe biological processes that might otherwise be missed. Natural Language Processing (NLP) can be used to extract relationships between concepts from resources such as scientific journal articles, collaborations, comparative studies and clinical trials. When the NLP-extracted relationships are semantically integrated with experimental findings, the resulting view of the biological system is enhanced. Using thesauri to harmonize classes and relationships from those extracts, and merging them into a dynamically extended application ontology, yields functionally connected experimental results.
This approach makes it possible to apply biomarker patterns or molecular signatures derived from the network to answer complex biological questions, and also to apply them actively for screening and decision support. This poster describes a use case in which multiple experimental datasets (micro-RNA, sequencing, gene expression, drug target assays) were semantically integrated, enriched with public knowledge resources (tissue-specific gene expression and regulation [TIGER], human RNA drug targets [TargetScan], miRBase, Microcosm, Diseasome) and supplemented with NLP-extracted relationships concerning specific diseases (in this case, severe renal failure) from a variety of articles and other sources. Tools used in this scenario were IO Informatics’ Sentient Knowledge Explorer for semantic integration of experimental data, ontology import, network visualization and graphical SPARQL queries, in conjunction with relationships extracted from MEDLINE abstracts by Linguamatics’ I2E enterprise text mining platform. The resulting semantic network provides a reliable qualification of drug targets with broader applications. The kidney-disease-related profiles generated in this example are based on contextual understanding of the biological functions involved in the disease and their manifestation in grounded experimental observations, as well as on verification with mined content from trusted resources. Such a methodology can significantly affect life-science and drug-discovery research, leading towards more effective drugs and widespread use in personalized medicine to improve the quality of life.
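As a minimal sketch of the triples-based integration the abstract describes, experimental findings and NLP-extracted relationships can be merged into one graph of (subject, predicate, object) triples and queried together. All identifiers and relationships below are hypothetical illustrations, not data from the study; a real deployment would use an RDF store and SPARQL endpoints rather than this toy in-memory graph.

```python
# Hypothetical experimental observations, expressed as triples.
experimental = [
    ("miR-21", "upregulated_in", "renal_failure_cohort"),
    ("GeneX", "differentially_expressed_in", "renal_failure_cohort"),
]

# Hypothetical relationships mined from literature abstracts via NLP.
nlp_extracted = [
    ("miR-21", "represses", "GeneX"),
    ("GeneX", "participates_in", "fibrosis_pathway"),
]

# Semantic merge: both sources land in one extensible graph.
graph = set(experimental) | set(nlp_extracted)

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    analogous to a variable in a SPARQL basic graph pattern."""
    return [(ts, tp, to) for (ts, tp, to) in graph
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# Which mined relationships give mechanistic context to the dysregulated miRNA?
print(match(graph, s="miR-21", p="represses"))
```

The point of the merge is that a single query now spans both experimental evidence and mined mechanistic knowledge, which is what lets a correlation network acquire biological context.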
Poster 2: DrugTree: A Phylogenetic Platform to Study Protein-ligand Binding Relationships in the Drug Discovery Process
D. Jason Seraydarian
Department of Computer Science
Montclair State University
Montclair NJ US
In the News: CSHALS Poster Contributor listed on the F1000 Posters Bank.
The discovery of drugs that have the desired pharmacological profiles is critical for human health and survival, yet time-consuming and expensive. Consequently, we must aim to obtain maximum benefit from those medicinal compounds that have already been identified and found to have favorable properties. The DrugTree Project creates a toolkit for scientists interested in understanding the broader implications of the relationship between phylogenetics and the binding between a homologous set of enzymes and their corresponding ligands and inhibitors. Phylogeny is a useful context in which to view these relationships: as a protein evolves, one feature that changes is the binding pocket, and hence binding specificity. Consequently, evolutionary relationships can provide predictive power to establish the binding between a given ligand and a homolog based on known binding relationships within a protein family. Insight may also be gained into which phylogenetically prevalent amino acid changes within the binding site are responsible for different ligand specificities amongst the homologs in a family. The DrugTree Project has completed a prototype Web-based computing system that integrates phylogenetic data and analyses about enzymes with known information about their ligands and inhibitors. Currently, no single data repository integrates curated drug-target and protein-ligand datasets with a large, popular protein database like UniProt and then provides tools that allow users to view these datasets in a phylogenetically meaningful context. The DrugTree tool integrates data from the UniProt (www.uniprot.org/), BindingDB (www.bindingdb.org/) and BRENDA (www.brenda-enzymes.org/) databases, allowing the user to create trees that combine UniProt’s massive non-redundant database with the data from the known-inhibitor repositories. The system initially integrates these three datasets, creating a local repository.
Via a Web interface, it allows a user to create a phylogenetic tree for a homologous set of enzymes. It then enables the user to perform phylogenetic reconstruction analyses via parsimony techniques, with a select few reconstruction algorithms. Finally, the tool allows the user to view the compounds that inhibit or bind each homolog next to the enzyme name. This poster introduces the DrugTree tool. It will demonstrate its effectiveness through analysis of a subset of dihydrofolate reductase proteins and some of the set’s known inhibitors. Dihydrofolate reductase is both an important target and a good model system: this enzyme has recently been of interest as a drug target in global health issues, including treatment of various parasitic diseases such as malaria, African sleeping sickness, Chagas disease, and tuberculosis. Many sequences and crystal structures are available for dihydrofolate reductase, and purification is easy due to the commercial availability of affinity chromatography resin specific for this enzyme. It is therefore ideal for verifying our results. The poster will also discuss our future development plans for the DrugTree platform.
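The parsimony-based reconstruction step can be illustrated with the classic Fitch small-parsimony algorithm, which counts the minimum number of character changes a tree requires. The four-taxon tree and binding-site residues below are hypothetical; this sketches the parsimony principle, not DrugTree's actual algorithm choices.

```python
def fitch(tree, states):
    """Fitch parsimony for one character on a rooted binary tree.
    `tree` is a leaf name (str) or a (left, right) tuple; `states`
    maps leaf names to observed character states.
    Returns (candidate_state_set, minimum_change_count)."""
    if isinstance(tree, str):                       # leaf node
        return {states[tree]}, 0
    lset, lcost = fitch(tree[0], states)
    rset, rcost = fitch(tree[1], states)
    inter = lset & rset
    if inter:                                       # children agree: no new change
        return inter, lcost + rcost
    return lset | rset, lcost + rcost + 1           # disagreement costs one change

# Hypothetical tree of four homologous enzymes and one binding-site residue:
tree = (("enzA", "enzB"), ("enzC", "enzD"))
residue = {"enzA": "F", "enzB": "F", "enzC": "L", "enzD": "F"}
_, changes = fitch(tree, residue)
print(changes)  # minimum substitutions needed to explain the residue pattern
```

Scoring binding-site positions this way is what lets phylogenetically prevalent amino acid changes be tied to shifts in ligand specificity across a family.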
Poster 3: Data Driven Derivation of Canonical Eligibility Criteria for Clinical Trials
University of Missouri
Kansas City, Missouri US
Recruitment of subjects for clinical trial research is currently an inefficient and time-consuming process in the development of a new drug. Recruitment challenges are particularly difficult for studies involving vulnerable populations, especially those with psychiatric disorders. Another major hurdle to automating the process is that eligibility criteria are written in free text that cannot be reliably parsed or processed computationally. To overcome these obstacles, we created an intelligent online system that targets two goals: helping recruiters develop and specify a standardized representation of eligibility criteria, and automating the selection of candidates for mental health research studies. As proof of concept, the methodology has been developed and validated on a corpus of 701 clinical trials on generalized anxiety disorder containing 2,765 redundant inclusion criteria and 4,411 redundant exclusion criteria. A combination of pairwise matching of ontological terms and clustering is used to present a semantically non-redundant eligibility criteria set. Text mining techniques (removal of punctuation, breaking criteria into individual sentences, excision of stop words, stemming, and conversion of phrases to single terms) were used to remove noise from the free text. Finally, only the ontological terms (terms from SNOMED CT, MeSH and LOINC) in each criterion are extracted for symmetric pairwise scoring. MCL (Markov clustering) was then applied to the resulting output. The clusters obtained are transformed into ontological concepts using the Tf-Idf terms of each cluster and mapping concepts to the terminological hierarchies of SNOMED CT and the NCI Thesaurus. Each cluster's concepts are in turn linked dynamically to database queries. Protégé was used for ontology creation, the Jena API to interact with the ontology, and SPARQL to construct queries. Finally, these queries are used to retrieve patient records.
Results for a particular study are ranked based on the percentage of the criteria list satisfied. The recruiter receives suggestions for creating criteria through association rule mining of eligibility concepts. The total numbers of non-redundant inclusion and exclusion concepts obtained were 126 and 175, respectively. The clustering accuracy, determined using the F-measure, is 0.93 for inclusion and 0.95 for exclusion. The first 15 inclusion and 23 exclusion concepts (selected by cluster size) cover 85% of the redundant criteria set. Using the ontology developed, any new eligibility criterion related to GAD can be mapped to a cluster ontology and in turn to a database query, which is used to search the patient database table. Thus the patient recruitment process can be largely automated. This paper presents a complementary data-driven approach that helps find a minimal non-redundant representation of an arbitrary collection of clinical trial eligibility criteria and automates the recruitment of patients for clinical trials. Our system thus allows recruiters the flexibility of using free text while the semantics of the criteria are captured for computer readability. We would like to acknowledge the National Institute of Mental Health for funding this project (1R43MH085372-01A1).
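The symmetric pairwise scoring step described above can be sketched as follows: after normalization, each criterion is reduced to its set of ontology terms, and every pair of criteria is scored by term overlap before clustering. The criteria, term sets, and the use of Jaccard similarity as the symmetric measure are all illustrative assumptions, not the study's exact formula.

```python
from itertools import combinations

# Hypothetical criteria, each already reduced to its extracted ontology terms
# (e.g. SNOMED CT / MeSH concepts) after stop-word removal and stemming.
criteria_terms = {
    "c1": {"generalized anxiety disorder", "adult"},
    "c2": {"generalized anxiety disorder", "adult", "female"},
    "c3": {"pregnancy"},
}

def jaccard(a, b):
    """Symmetric overlap score between two term sets, in [0.0, 1.0]."""
    return len(a & b) / len(a | b)

# Symmetric score over all unordered pairs; high-scoring pairs become strong
# edges in the similarity graph that a clustering step (e.g. MCL) consumes.
scores = {frozenset((i, j)): jaccard(criteria_terms[i], criteria_terms[j])
          for i, j in combinations(criteria_terms, 2)}
print(scores)
```

Collapsing each resulting cluster to a single representative concept is what turns thousands of redundant free-text criteria into the compact non-redundant set the abstract reports.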
Poster 4: Integrating Multi-Dimensional Genomic, Proteomic and Clinical Data of Inflammation and Injury
Massachusetts General Hospital
Stanford Genome Technology Center
Stanford, CA US
Recent developments in high-throughput technologies have enabled direct studies of patients’ genomic response to diseases and treatments, and new computational methods need to be developed to translate the large amount of genomic, proteomic and clinical data into new knowledge in medicine. Over the past nine years, the Inflammation and the Host Response to Injury Glue Grant Consortium has utilized multiple experimental tools to study the temporal immune-inflammatory response in blood leukocytes and sub-populations from over 500 severely injured patients, together with their comprehensive clinical information. We are developing semantics-based approaches to integrating these genomic and proteomic data with patients’ clinical information to elucidate disease mechanisms and predict patient outcomes.
Poster 5: Translational Medicine in Action: Linking and Visualizing a Network of Biomedical Research Scientists using Nexus
Stony Brook University School of Medicine
Stony Brook, NY US
The goal of translational medicine is to translate basic science research into advances in clinical medicine. One way to meet this goal is to pair up basic scientists with clinical researchers who share common research interests. The challenge is that the terms used by each group do not perfectly align. To demonstrate the utility of semantic web technology in translational medicine, we apply it to interconnect the research interests of clinical and basic scientists. The research interests of SUNY Reach faculty were obtained from MeSH terms in publication data and are expressed in the VIVO ontology, normalized to the UMLS. The VIVO ontology is part of the NIH-funded VIVO project to interlink research scientists across different institutions. To explore novel interconnections in the network of research scientists, the Nexus visualization environment was utilized. Nexus, a locally developed project, is a semantic web visualization tool built on the OpenSimulator platform. Nexus allows collaborative real-time viewing and annotating of RDF data in a 3D environment.