IO Informatics

Conference on Semantics in Healthcare and Life Sciences


Updated May 08, 2014

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .


Posters will be on display throughout the conference, beginning 5 p.m. Wednesday, February 26, with a special reception for poster authors to present their work to conference delegates:

  • Poster Set-up: Wednesday – February 26, 4:00 p.m. – 5:00 p.m.
  • Poster Reception: Wednesday – February 26, 5:00 p.m. – 6:30 p.m.

When preparing and designing your poster please note that it should be no larger than 44 inches wide by 44 inches high (there are two posters per side).
  Posters must be removed by 3:00 p.m., Friday, February 28.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Poster 01:
A Semantic Computing Platform to Enable Translational and Clinical Omics

Jonathan Hirsch, Syapse, United States
Andro Hsu, Syapse, United States
Tony Loeser, Syapse, United States

To bring omics data from benchtop to point of care, labs and clinics must be able to handle three types of data with very different properties and requirements. The first, biomedical knowledge, is highly complex, continually evolving, and comprises millions of concepts and relationships. The second, medical data including clinical health and outcomes records, is temporal, unstructured, and hard to access. The third, omics data such as whole-genome sequence, is structured but voluminous. Attempts to bridge the three have had limited success: no single data architecture allows efficient querying of all three types of data. The lack of scalable infrastructure that can integrate complex biomedical knowledge, temporal medical data, and omics data is a barrier to widespread use of omics data in clinical decision-making.

Syapse has built a data platform that enables capture, modeling, and query of all three data types, along with applications and programming interfaces for seamless integration with lab and clinical workflows. Using a proprietary semantic layer based on the Resource Description Framework (RDF) and related standards, the Syapse platform enables assembly of a dynamic knowledgebase from biomedical ontologies such as SNOMED CT and OMIM as well as customers’ internally curated knowledge. Similarly, HL7-formatted medical data can be imported and represented as RDF objects. Lastly, Syapse enables federated queries that associate RDF-represented knowledge with omics data while retaining the benefits of scalable storage and indexing techniques. Biologists and clinicians can access the platform through a rich web application layer that enables role-specific customization at any point in the clinical omics workflow.

We will describe how biologists and clinicians use Syapse as the infrastructure of an omics learning healthcare system. Clinical R&D performs data mining queries, e.g.
selecting patients who share disease and treatment characteristics to identify associations between omics profiles and clinical outcomes. Organizations update clinical evidence such as variant interpretation or pharmacogenetic associations in a knowledgebase that triggers alerts in affected patient records. At point of care, clinical decision support interfaces present internal or external treatment guidelines based on patients’ omics profiles. The Syapse platform is a cloud-based solution that allows labs and clinics to deliver translational and clinical omics with minimal IT resources.
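
The three-way linkage the abstract describes can be sketched with plain (subject, predicate, object) triples queried together. Everything below, the URIs, predicates, and patient data, is invented for illustration and is not Syapse's actual schema:

```python
# Invented URIs and data: knowledge, clinical, and omics facts held as
# (subject, predicate, object) triples in one queryable set.
triples = {
    # knowledge layer (e.g. derived from OMIM / SNOMED CT)
    ("omim:143100", "rdfs:label", "Huntington disease"),
    ("omim:143100", "ex:associatedGene", "gene:HTT"),
    # clinical layer (e.g. imported from HL7 messages)
    ("patient:42", "ex:hasDiagnosis", "omim:143100"),
    # omics layer (e.g. a variant call from whole-genome sequencing)
    ("patient:42", "ex:hasVariant", "var:001"),
    ("var:001", "ex:inGene", "gene:HTT"),
}

def variants_in_diagnosed_gene(triples):
    """Find (patient, variant) pairs where the variant falls in a gene
    associated with that patient's diagnosis."""
    hits = []
    for p, _, dx in (t for t in triples if t[1] == "ex:hasDiagnosis"):
        dx_genes = {o for s, pr, o in triples
                    if s == dx and pr == "ex:associatedGene"}
        for _, _, var in (t for t in triples
                          if t[0] == p and t[1] == "ex:hasVariant"):
            var_genes = {o for s, pr, o in triples
                         if s == var and pr == "ex:inGene"}
            if dx_genes & var_genes:
                hits.append((p, var))
    return hits
```

In a real deployment this join would be a federated SPARQL query; the sketch only shows why holding all three layers in one graph makes the cross-layer question a single query.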

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Poster 02:
Optimization of RDF Data for Graph-based Applications

Ryota Yamanaka, The University of Tokyo, Japan
Tazro Ohta, Database Center for Life Science, Japan
Hiroyuki Aburatani, The University of Tokyo, Japan

Various omics datasets are publicly available in RDF through data repositories or SPARQL endpoints, which makes it easier to obtain datasets from different sources that can be integrated. However, when we use parts of linked data in our own applications for data analysis or visualization, the data need not remain in RDF; it can be processed into whatever format suits the usage. In fact, most web applications handle tabular data rather than RDF.

There are two reasons for application developers to convert semantic networks into other data models rather than keeping the original RDF in backend databases. One is the difficulty of understanding complex RDF schemas and writing SPARQL queries; the other is that the data model described in RDF is not always optimized for search performance. Consequently, we need practical methods to convert RDF into a data model optimized for each application in order to build an efficient database from parts of linked data.

The simplest way to optimize RDF data for most applications is to convert it into tabular form and store it in relational databases. With this method, however, we need to consider not only table definitions but also de-normalization and indices to reduce the cost of table-join operations. We are therefore focusing on graph databases instead. Their data models naturally describe semantic networks and, like triplestores, support network search operations such as traversal.

Although the data models in graph databases are similar in structure to RDF-based semantic networks, they differ in some respects. For example, in Neo4j, the graph database management system we used in this project, relationships can hold properties, whereas edges in RDF-based semantic networks cannot. We are therefore researching how to fit RDF data to graph database features effectively, for better search performance and more efficient application development.

We have developed tools to convert RDF data (as well as tabular data) and have loaded sample data into a graph database. Currently, we are developing demo applications to search and visualize graph data such as pathway networks. These resources are available at
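
The conversion the abstract describes can be sketched in a few lines of Python: triples whose object is itself a resource become edges, while literal-valued triples are folded into node properties, mirroring how property graphs such as Neo4j's attach attributes to nodes. The sample triples are invented:

```python
def rdf_to_property_graph(triples):
    """Split triples: URI-valued objects become edges, literal-valued
    objects become node properties (the optimization discussed above)."""
    uris = {t[0] for t in triples}          # anything used as a subject
    nodes, edges = {}, []
    for s, p, o in triples:
        nodes.setdefault(s, {})
        if o in uris:                       # resource -> edge
            nodes.setdefault(o, {})
            edges.append((s, p, o))
        else:                               # literal -> node property
            nodes[s][p] = o
    return nodes, edges

triples = [
    ("gene:TP53", "label", "TP53"),
    ("gene:TP53", "participatesIn", "pathway:apoptosis"),
    ("pathway:apoptosis", "label", "Apoptosis"),
]
nodes, edges = rdf_to_property_graph(triples)
```

A real converter must also decide how to handle blank nodes and shared literals; the heuristic here (anything appearing as a subject is a resource) is only a starting point.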

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Poster 03:
A Phenome-Guided Drug Repositioning through a Latent Variable Model

Halil Bisgin, National Center for Toxicological Research, United States
Zhichao Liu, National Center for Toxicological Research, United States    
Hong Fang, National Center for Toxicological Research, United States
Reagan Kelly, National Center for Toxicological Research, United States    
Xiaowei Xu, University of Arkansas at Little Rock, United States    
Weida Tong, National Center for Toxicological Research, United States

The phenome has been widely studied to find overlaps with the genome, which in turn has identified correlations for diseases. Besides its explanatory power for causality, the phenome has also been explored for drug repositioning, the process of identifying new uses for existing drugs. However, most current phenome-based approaches limit the search space for candidate drugs to the most similar drugs. For a comprehensive analysis of the phenome, we assumed that all phenotypes (indications and side effects) were generated from a probabilistic distribution that can provide the likelihood of new therapeutic indications for a given drug. We employed Latent Dirichlet Allocation (LDA), which introduces latent variables (topics) assumed to govern the phenome distribution, and developed our model on the phenome information in the Side Effect Resource (SIDER). We first examined the recovery potential of LDA by perturbing the drug-phenotype matrix for each of the 11,183 drug-indication pairs: each indication in turn was assumed to be unknown, and we attempted to recover it from the remaining drug-phenotype pairs. We were able to recover known indications masked during the model development phase with a 70% success rate on the portion of drug-indication pairs (5,516 out of 11,183) whose probabilities were greater than random chance (p > 0.005). After obtaining a decision criterion that considers both probability and rank, we applied the model to the whole phenome to suggest alternative indications. We retrieved FDA-approved indications for 6 drugs whose indications were not listed in SIDER. For the 907 drugs present with indication information, our model suggested at least one alternative treatment option for further investigation. Several of the suggested new uses can be supported with information from the scientific literature.
The results demonstrated that the phenome can be further analyzed by a generative model, which can discover the probabilistic associations between drugs and therapeutic uses. In this regard, LDA stands as a promising tool to explore new uses of existing drugs by narrowing down the search space.
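
The masking-and-recovery evaluation can be illustrated at toy scale. The sketch below substitutes simple co-occurrence scoring for LDA and uses invented drugs and phenotypes, but follows the same protocol: hide a known indication, then try to recover it from the remaining drug-phenotype pairs:

```python
def recover_masked(drug, masked, phenome):
    """Hide one known phenotype of `drug`, then rank candidate phenotypes
    by similarity-weighted evidence from the other drugs and return the
    top suggestion (a stand-in for the LDA-based recovery test)."""
    observed = phenome[drug] - {masked}
    scores = {}
    for other, phens in phenome.items():
        if other == drug:
            continue
        weight = len(observed & phens)      # crude drug similarity
        for ph in phens - observed:
            scores[ph] = scores.get(ph, 0) + weight
    return max(scores, key=scores.get) if scores else None

# invented drug -> phenotype sets (indications and side effects)
phenome = {
    "drugA": {"headache", "hypertension", "nausea"},
    "drugB": {"headache", "hypertension", "dizziness"},
    "drugC": {"nausea", "rash"},
    "drugD": {"headache", "hypertension"},
}
```

Masking "hypertension" for drugA, the evidence from drugB and drugD recovers it as the top-ranked candidate; repeating this for every known pair gives the success rate the abstract reports for the real model.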

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Poster 04:
Cross-Linking DOI Author URIs Using Research Networking Systems

Nick Benik, Harvard Medical School, United States
Timothy Lebo, Rensselaer Polytechnic Institute, United States
Griffin Weber, Harvard Medical School, United States

A proof-of-concept application was created to automatically cross-link publications that were written by the same person by harvesting linked open data from institution-based research networking systems. This is important because it (1) helps people identify related articles when exploring the biomedical literature, (2) gives scientists appropriate credit for the work they have done, and (3) makes it easier to find experts in a subject area. Our presentation will include a demo of an interactive network visualization that allows exploration of these newly created links.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Poster 05:
Simplified Web 3.0 Query Portal for Biomedical Research

Justin Lancaster, Hydrojoule, LLC, United States

We will report on progress toward a simpler, generalized web interface that gives a broader group of biomedical researchers useful access to one or more RDF triplestores of biomedical information (Bio2RDF, etc.). The approach "wraps" a SPARQL and/or Virtuoso-type query structure as a tool called by a more user-friendly query interface. The build employs a lightweight client using JS, JS frameworks, and JSON for exchange, with one or more servers providing backend routing and intermediary, user-specific storage (node.JS and a specialized graph-store DB). This is a first step toward a longer-term prototype that supports hypothesis generation related to the research query and to the first rounds of data returned from the triplestore.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Poster 06:
A Minimal Governance Layer for the Web of Linked Data

Alexander Grüneberg, University of Alabama at Birmingham, United States
Jonas Almeida, University of Alabama at Birmingham, United States

The absence of governance engines capable of governing semantic web constructs by responding solely to embedded RDF (Resource Description Framework) descriptions of the governance rules remains one of the major obstacles to a web of linked data that can traverse domains with modulated privacy. This is particularly true when the governance is as fluid as the data models described by the RDF assertions. Based in part on our previous experience developing and using S3DB, we have realized that the most scalable solutions will place minimalist requirements on the governance mechanism and will maximize the use of existing dereferencing conventions. Accordingly, we propose a generalized rule-based authorization system decoupled from services exposed over HTTP.

The authorization rules are written in JSON-LD (JavaScript Object Notation for Linked Data), apply to resources identified by URIs, and can be dynamically inserted and removed. Our solution uses a simple inheritance model in which resources inherit rules from their parent resource. This procedure seeks to advance the recently proposed Linked Data Platform by weaving it together with widely supported web authentication standards. The management of sensitive patient-derived molecular pathology data is used as an illustrative case study in which the proposed solution is validated. The mediating governance engine is made available as an open source NodeJS module at
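
The inheritance model just described can be sketched as a walk up the resource path until a matching rule is found. The rule shape below is invented for illustration and is not the project's actual JSON-LD vocabulary:

```python
def is_allowed(agent, mode, resource, rules):
    """Walk up the resource path until a rule matches; resources
    inherit rules from their parent resource. Default deny at root."""
    path = resource
    while True:
        for rule in rules.get(path, []):
            if rule["agent"] == agent and mode in rule["modes"]:
                return rule["effect"] == "allow"
        if path == "/":
            return False               # no rule anywhere: deny
        path = path.rsplit("/", 1)[0] or "/"

# A broad grant on the collection, with a narrower deny on one resource.
rules = {
    "/patients": [
        {"agent": "dr-smith", "modes": ["read"], "effect": "allow"},
    ],
    "/patients/42/genome": [
        {"agent": "dr-smith", "modes": ["read"], "effect": "deny"},
    ],
}
```

Because the most specific rule wins, dr-smith can read `/patients/42` (inherited from `/patients`) but not `/patients/42/genome`, which matches the fluid, per-resource governance the abstract calls for.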

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
***F1000 POSTER PRIZE***

Poster 07:
BAO Reveal: Assay Analysis Using the BioAssay Ontology and Open PHACTS

Simon Rakov, AstraZeneca, United States
Linda Zander Balderud, AstraZeneca, Sweden
Tom Plasterer, AstraZeneca, United States

Biochemical assay data is complex and highly variable. It contains attributes and terms that are not applicable to all assays, and uses controlled vocabularies inconsistently. Sometimes relevant data is entered in the wrong attribute, or appears in a different document altogether. The complexity and inconsistency of assay data creates challenges for those who wish to query assay data for a given set of attributes, such as the technology chosen to conduct a given assay.

The BioAssay Ontology (BAO) is an ontology developed by the University of Miami and extended by the Open PHACTS consortium, itself part of the European Union’s Innovative Medicines Initiative (IMI). The purpose of the BAO is to standardize how assay data is represented. It incorporates many standard public ontologies, importing sections of the NCBI taxonomy, UniProt, the Unit Ontology, the Ontology for Biomedical Investigations, and the Gene Ontology, among others. More than 900 PubChem assays have been annotated according to the BAO.

We have converted over 400 AstraZeneca primary HTS assays into BAO format in order to evaluate whether this common model can improve project success analyses based on assay technologies, help us to better understand the impact of technology artifacts such as frequent hitters, and improve our ability to employ data mining methodologies against assay data. We have created static visualizations that combine our internal data with the annotated PubChem assays. Most recently, this project has created a dynamic interface, “BAO Reveal,” for querying and visualizing BAO data.

Our frequent hitter analysis methodology has found twice as many frequently-hitting assays when assay data is structured using BAO than with previous methods that did not have the granularity of BAO. This has suggested improvements to the data capture process from these assays. The dynamic faceting features and linked biochemical information in BAO Reveal provide researchers ways to investigate the underlying causes of broad assay patterns. This will allow us to focus assay development efforts on the most promising approaches.
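
A generic version of a frequent-hitter screen (not AstraZeneca's actual methodology) can be sketched as follows: flag compounds that score active in more than a chosen fraction of all assays, since such promiscuous hits usually reflect technology artifacts rather than genuine activity. The assay and compound IDs are invented:

```python
from collections import Counter

def frequent_hitters(assays, threshold=0.5):
    """assays maps assay id -> set of compounds scoring active.
    Flags compounds active in more than `threshold` of all assays."""
    counts = Counter(c for hits in assays.values() for c in hits)
    n_assays = len(assays)
    return {c for c, k in counts.items() if k / n_assays > threshold}

assays = {
    "A1": {"cmpd1", "cmpd2"},
    "A2": {"cmpd1", "cmpd3"},
    "A3": {"cmpd1"},
    "A4": {"cmpd4"},
}
```

Here cmpd1 hits in 3 of 4 assays and is flagged. The BAO's contribution is upstream of this calculation: consistent annotation of assay technology makes it possible to group assays correctly before counting.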

BAO Reveal facilitates identification of screening technologies used for similar targets and helps analyze the robustness of a specific assay technology for a biological target. It can identify screening data to confirm assay reproducibility, and it also assists frequent hitter analysis. As a linked data application built on Open PHACTS methodologies and other semantic web standards, BAO Reveal is well positioned for exploitation in multiple directions by multiple communities.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Poster 08:
A High Level API for Fast Development of High Performance Graph Analytics
on GPUs

Zhisong Fu, SYSTAP LLC., United States

High performance graph analytics are critical for a long list of application domains, ranging from social networks and information systems to security, biology, healthcare, and the life sciences. In recent years, the rapid advancement of many-core processors, in particular graphics processing units (GPUs), has sparked broad interest in developing high performance graph analytics on these architectures. However, the single instruction multiple thread (SIMT) architecture used in GPUs places particular constraints on both the design and implementation of graph analytics algorithms and data structures, making the development of such programs difficult and time-consuming.

We present an open source library (MPGraph) that provides a high level abstraction which makes it easy to develop high performance graph analytics on massively parallel hardware. This abstraction is based on the Gather-Apply-Scatter (GAS) model as used in GraphLab. To deliver high performance and efficiently utilize the high memory bandwidth of GPUs, the underlying CUDA kernels use several sophisticated strategies, such as vertex-degree-dependent dynamic parallelism granularity and frontier compaction. Our experiments show that for many graph analytics algorithms, an implementation using our abstraction is up to two orders of magnitude faster than parallel CPU implementations on up to 24 CPU cores, with performance comparable to a state-of-the-art manually optimized GPU implementation. In addition, with our abstraction, new algorithms can be implemented in a few hours that fully exploit the data-level parallelism of the GPU and offer throughput of up to 3 billion traversed edges per second on a single GPU. We will explain the concepts behind the high-level abstraction and provide a starting point for people who want to write high throughput analytics.
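
The GAS pattern the library builds on can be illustrated with a small, sequential breadth-first search sketch: each superstep gathers candidate levels from in-neighbors, applies the minimum, and scatters changed vertices into the next frontier. This is only the programming model; MPGraph's CUDA kernels parallelize and optimize each phase:

```python
INF = float("inf")

def gas_bfs(edge_list, source):
    """BFS levels computed as Gather-Apply-Scatter supersteps."""
    verts = {v for e in edge_list for v in e}
    out_nbrs = {v: [] for v in verts}
    in_nbrs = {v: [] for v in verts}
    for u, v in edge_list:
        out_nbrs[u].append(v)
        in_nbrs[v].append(u)
    level = {v: INF for v in verts}
    level[source] = 0
    frontier = {source}
    while frontier:
        nxt = set()
        # candidates: vertices reachable from the current frontier
        for v in {w for u in frontier for w in out_nbrs[u]}:
            # gather: best tentative level from active in-neighbors
            cand = min(level[u] + 1 for u in in_nbrs[v] if u in frontier)
            # apply + scatter: keep improvements, activate changed vertices
            if cand < level[v]:
                level[v] = cand
                nxt.add(v)
        frontier = nxt
    return level

edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
levels = gas_bfs(edges, "a")
```

The shrinking/growing frontier set is exactly what frontier compaction targets on the GPU: only active vertices do work in each superstep.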

MPGraph is now in its second release.  Future work will extend the platform to multi-GPU workstations and GPU compute clusters.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
***F1000 POSTER PRIZE***

Poster 09:
A Nanopublication Framework for Systems Biology and Drug Repurposing

Jim McCusker, Rensselaer Polytechnic Institute, United States
Kusum Solanki, Rensselaer Polytechnic Institute, United States
Cynthia Chang, Rensselaer Polytechnic Institute, United States
Michel Dumontier, Stanford University, United States
Jonathan Dordick, Rensselaer Polytechnic Institute, United States
Deborah McGuinness, Rensselaer Polytechnic Institute, United States

Systems biology studies interactions between proteins, genes, drugs, and other molecular entities. A number of databases have been developed that serve as a patchwork across the landscape of systems biology, focusing on different experimental methods, many species, and a wide diversity of inclusion criteria. Systems biology has been used in the past to generate hypotheses for drug effects, but has become fragmented under the large number of disparate and disconnected databases. In our efforts to create a systematic approach to discovering new uses for existing drugs, we have developed Repurposing Drugs with Semantics (ReDrugS). Our ReDrugS framework can accept data from nearly any database that contains biological or chemical entity interactions. We represent this information as sets of nanopublications, fine-grained assertions that are tied to descriptions of their attribution and supporting provenance. These nanopublications are required to have descriptions of the experimental methods used to justify their assertions. By inferring the probability of truth from those experimental methods, we are able to create consensus assertions, along with a combined probability. Those consensus assertions can be searched for via a set of Semantic Automated Discovery and Integration (SADI) web services, which are used to drive a demonstration web interface. We then show how associations between exemplar drugs and cancer-driving genes can be explored and discovered. Future work will incorporate protein/disease associations, perform hypothesis generation on indirect drug targets, and test the resulting hypotheses using high throughput drug screening.
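
One plausible reading of the consensus step above is a noisy-OR combination: a consensus assertion fails only if every independent piece of supporting evidence is wrong. The method reliabilities below are invented for illustration; in the ReDrugS framework they would be inferred from the experimental method descriptions carried in each nanopublication's provenance:

```python
from math import prod

# Invented per-method reliabilities (probability the method's
# assertion is true), keyed by experimental method.
METHOD_PROB = {
    "yeast-two-hybrid": 0.5,
    "affinity-chromatography": 0.8,
    "co-crystallography": 0.95,
}

def consensus_probability(methods):
    """Noisy-OR over independent supporting assertions: the consensus
    holds unless every piece of evidence is wrong."""
    return 1 - prod(1 - METHOD_PROB[m] for m in methods)
```

Two weak assertions (0.5 and 0.8) combine to a 0.9 consensus, showing how corroboration across databases strengthens an interaction claim even when no single source is reliable on its own.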

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Poster 10:
An Interoperable Framework for Biomedical Image Retrieval and Knowledge Discovery

Syed Ahmad Chan Bukhari, University of New Brunswick, Canada
Mate Levente Nagy, Yale University, United States
Paolo Ciccarese, Harvard University, United States
Artjom Klein, IPSNIP Computing Inc., Canada
Michael Krauthammer, Yale University, United States
Christopher Baker, University of New Brunswick, Canada

Biomedical images play an irrefutably central role in life science discoveries. The ongoing challenges associated with knowledge management and utility operations unique to biomedical image data are only recently gaining recognition. Making biomedical image content explicit is essential for medical decision-making tasks such as diagnosis, treatment planning, and follow-up, as well as for data management, data reuse in biomedical research, and the assessment of care delivery. In our previous work, we developed the Yale Image Finder (YIF), a novel biomedical image search engine that indexes around two million biomedical images along with associated metadata. While YIF is considered a veritable source of easily accessible biomedical images, a number of usability and interoperability challenges have yet to be addressed, including provenance and cross-platform accessibility.

To overcome these issues and to accelerate the adoption of YIF for next generation biomedical applications, we have developed a publicly accessible Biomedical Image API with multiple modalities. The core API is powered by a dedicated semantic architecture that exposes Yale Image Finder (YIF) content as linked data, permitting integration with related information resources and consumption by linked data-aware data services. We have established a protocol to transform image data according to linked open data recommendations, and exposed it through a SPARQL endpoint and linked data explorer.

To facilitate the ad-hoc integration of image data with other online data resources, we built semantic web services compatible with the SADI semantic web service framework. The utility of the combined infrastructure is illustrated with a number of compelling use cases and further extended through the incorporation of Domeo, a well-known tool for open annotation. Domeo facilitates enhanced search over the images using annotations provided through crowdsourcing. In the current configuration, our triplestore holds more than thirty-five million triples and can be accessed and operated through syntactic or semantic solutions. The core features of the framework, namely data reusability, system interoperability, semantic image search, automatic updates, and a dedicated semantic infrastructure, make the system a state-of-the-art resource for image data discovery and retrieval.
A demo can be accessed at:
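
The annotation-enhanced search Domeo enables can be sketched as an inverted index over caption words plus crowdsourced annotation terms. The image IDs, captions, and annotations below are invented:

```python
def build_index(images):
    """Inverted index over caption words plus annotation terms."""
    index = {}
    for img_id, meta in images.items():
        terms = {w.lower() for w in meta["caption"].split()}
        terms |= {a.lower() for a in meta["annotations"]}
        for t in terms:
            index.setdefault(t, set()).add(img_id)
    return index

def search(index, query):
    """AND-search: return images matching every query term."""
    hits = [index.get(t.lower(), set()) for t in query.split()]
    return set.intersection(*hits) if hits else set()

images = {
    "img1": {"caption": "Tumor section stained with hematoxylin",
             "annotations": ["melanoma", "histology"]},
    "img2": {"caption": "Western blot of p53 expression",
             "annotations": ["immunoblot"]},
}
index = build_index(images)
```

The term "melanoma" retrieves img1 even though it appears nowhere in the caption, which is the value crowdsourced annotation adds over caption-only indexing.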

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Poster 11:
HYDRA8: A Graphical Query Interface for SADI Semantic Web Services

Christopher J. O. Baker, CEO
IPSNP Computing Inc.
Saint John, NB, Canada

HYDRA is a high-performance query engine operating on networks of SADI services representing various distributed resources. Here we present an intuitive, end user-oriented querying and data browsing tool designed to construct SPARQL queries for issue to the HYDRA back end. The current prototype permits users to (i) enter key phrases and generate suitable query graphs corresponding to those phrases; (ii) select between the suggested graphs; (iii) manually extend the automatically suggested query graphs, adding new relations and entities according to the current semantic data schema (predicate map); and (iv) run a query with HYDRA and see the results in tabular form. A demo will be available at the conference.
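
Step (i), suggesting query graphs from key phrases, can be sketched against a toy predicate map. The entity types and relations below are invented and are not HYDRA's actual schema:

```python
# Invented predicate map: which relations connect which entity types.
PREDICATE_MAP = [
    ("Drug", "targets", "Protein"),
    ("Protein", "encodedBy", "Gene"),
    ("Drug", "treats", "Disease"),
]

def suggest_query_graphs(phrases):
    """Suggest triple patterns whose subject and object types both
    match the user's key phrases (a toy version of steps i-ii)."""
    wanted = {p.capitalize() for p in phrases}
    return [(s, p, o) for s, p, o in PREDICATE_MAP
            if s in wanted and o in wanted]

def to_sparql(pattern):
    """Render one suggested pattern as a SPARQL query string."""
    s, p, o = pattern
    return (f"SELECT ?{s.lower()} ?{o.lower()} WHERE {{ "
            f"?{s.lower()} a :{s} . ?{o.lower()} a :{o} . "
            f"?{s.lower()} :{p} ?{o.lower()} }}")
```

Entering "drug" and "protein" yields the single candidate pattern Drug-targets-Protein, which the user could then extend (step iii) with further relations from the predicate map before HYDRA executes the resulting SPARQL.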