Conference on Semantics in Healthcare and Life Sciences

TUTORIAL

Updated March 03, 2014


Those attending the Callimachus Tutorial on Wednesday, February 26th, have three ways to maximize their experience.  The tutorial will be interactive and projected to the room so everyone can follow along.

1.  Those wishing to explore the code at their own pace are welcome to install Callimachus Open Source from http://callimachusproject.org.
2.  Tutorial attendees may also request access to their own copy of Callimachus Enterprise running on Rackspace Cloud.
3.  Anyone who wishes to write Callimachus applications using Callimachus Enterprise during the conference should send email to the organizers, and a cloud instance will be established for them.

We look forward to seeing you at the tutorial!

...............................................................................................................

Tutorial – Wednesday, February 26

1:00 pm – 5:00 pm  

*Please note the tutorial is included as part of the conference registration fee. Delegates wishing to attend should select the tutorial during registration.

Enterprise and Scientific Data Sharing Using Callimachus

Click here for presentation slides

Callimachus Enterprise is an innovative solution for building and hosting Web applications that combine enterprise and public data in the cloud.  Applications built on Callimachus have specific benefits for scientific data sharing.

This tutorial will introduce common issues with existing enterprise data sharing and show how Callimachus can help.  Callimachus' particular benefits to scientific enterprises will be highlighted, such as on-the-fly data integration from multiple sources, as will Callimachus applications aimed at scientists, such as presentation material reuse, tracking and attribution.

Attendees will be led through the creation of simple Callimachus applications, including:
- Data integration from multiple sources
- Data re-use
- Development of visualizations
- Leveraging data in the Web, including big data sets.

Callimachus Enterprise is based upon The Callimachus Project, an Open Source Software project which limits vendor lock-in and encourages community involvement.  Callimachus implements Web standards to ensure it plays well with others.

**Some understanding of relational databases and enterprise software in general would be helpful.  Knowledge of semantics is not necessary, as the tutorial will cover the very basics.

Presenter:
Dr. David Wood is CTO of enterprise software vendor 3 Round Stones Inc.  He has contributed to the evolution of the World Wide Web since 1999, especially in the formation of standards and technologies for the Semantic Web. He has architected key aspects of the Web, including the Persistent Uniform Resource Locator (PURL) service and several Semantic Web databases and frameworks, among them the Callimachus Project. He has represented international organizations in the evolution of Internet standards at the International Organization for Standardization (ISO), the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C), and is currently chair of the W3C RDF Working Group.  David is the author of Programming Internet Email (O’Reilly, 1999) and Linked Data: Structured Data on the Web (Manning, 2013), and editor of Linking Enterprise Data (Springer, 2010) and Linking Government Data (Springer, 2011).



 

Conference on Semantics in Healthcare and Life Sciences

PHOTOS


Robert Stanley, CEO, IO Informatics presenting the IO Informatics Best Poster Prize to Syed Amad Chan Bukhari, University of New Brunswick, St. John, An Interoperable Framework for Biomedical Image Retrieval and Knowledge Discovery
.................................................................................................
F1000 Poster Prize co-winners Tom Plasterer, AstraZeneca and Simon Rakov, AstraZeneca, BAO Reveal: Assay Analysis Using the BioAssay Ontology and Open PHACTS
.................................................................................................
IO Informatics Best Paper Prize winner Aftab Iqbal, Insight, Centre for Data Analytics, NUI Galway, GenomeSnip: Fragmenting the Genomic Wheel to Augment Discovery in Cancer Research with CSHALS Co-Chair Jonas Almeida, University of Alabama at Birmingham, CSHALS Co-Chair Ted Slater, YarcData, Robert Stanley, CEO IO Informatics, and CSHALS Program Committee Chair Chris Baker, University of New Brunswick, St. John, and IPSNP Computing Inc.
.................................................................................................
F1000 Poster Prize winner Jim McCusker, Rensselaer Polytechnic Institute, and 5AM Solutions, Inc. A Nanopublication Framework for Systems Biology and Drug Repurposing
.................................................................................................
 
 Thank you to our sponsors!
 .................................................................................................
 
ISCB Executive Director Diane Kovats welcoming delegates to CSHALS 2014
 .................................................................................................
 
Foundation Medicine Student Travel Fellowship Awardees
from left to right:  Syed Bukhari, University of New Brunswick, St. John, Helena Deus, CSHALS Organizing Committee Member and Senior Scientist, Foundation Medicine, Justin Lancaster, kwiKBio, Aftab Iqbal, Insight, Centre for Data Analytics, NUI Galway, David Mason, Concordia, Ryota Yamanaka, University of Tokyo, HyeongSik Kim, North Carolina State University. Not shown:  Arash Shaban-Nejad
  .................................................................................................



 

Conference on Semantics in Healthcare and Life Sciences (CSHALS)

PRESENTERS

Updated March 19, 2014


Indicates that presentation slides or other resources are available.


Thursday, February 27

HEALTH ANALYTICS

Time: 10:05 a.m. – 10:55 a.m.

Lattices for Representing and Analyzing Organogenesis

Click here for PDF of presentation

Chimezie Ogbuji, Metacognition LLC, United States
Rong Xu, Case Western Reserve University, United States

A systems-based understanding of the molecular processes and interactions that drive the development of the human heart, and other organs, is an important component of improving the treatment of common congenital defects. In this work, we present an application of Formal Concept Analysis (FCA) on molecular interaction networks obtained from the integration of two central biomedical ontologies (the Gene Ontology and Foundational Model of Anatomy) and a subset of a cross-species anatomy ontology. We compare the formal concept lattice produced by our method against a cardiac developmental (CD) protein database to verify the biological significance of the structure of the resulting lattice. Our method provides a unique and unexplored computational framework for understanding the molecular processes underlying human anatomy development.
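For readers unfamiliar with Formal Concept Analysis, its core construction can be sketched in a few lines of Python. The context below is invented for illustration (the paper's contexts come from the ontologies named above, not from these toy gene names); a formal concept is a pair (extent, intent) that is closed under the two derivation operators.

```python
from itertools import combinations

# Toy formal context: objects (hypothetical genes) x attributes (processes).
context = {
    "geneA": {"heart_development", "cell_signaling"},
    "geneB": {"heart_development", "cell_adhesion"},
    "geneC": {"cell_signaling", "cell_adhesion"},
}
attributes = set().union(*context.values())

def extent(intent_set):
    """Objects possessing every attribute in the intent."""
    return {g for g, attrs in context.items() if intent_set <= attrs}

def intent(objs):
    """Attributes shared by every object in the set."""
    if not objs:
        return set(attributes)
    return set.intersection(*(context[g] for g in objs))

# A formal concept is a pair (E, I) with extent(I) == E and intent(E) == I.
# Brute force: close every attribute subset (fine for tiny contexts).
concepts = set()
for r in range(len(attributes) + 1):
    for combo in combinations(sorted(attributes), r):
        e = extent(set(combo))
        concepts.add((frozenset(e), frozenset(intent(e))))

for e, i in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(e), "<->", sorted(i))
```

Ordering these concepts by inclusion of their extents yields the concept lattice the abstract refers to; real FCA toolkits use far more efficient closure algorithms than this exhaustive sketch.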

[Top] [Full Agenda]

...............................................................................................................

A Semantic-driven Surveillance Model to Enhance Population Health Decision-making

Click here for PDF of presentation

Anya Okhmatovskaia, McGill University, Canada
Arash Shaban-Nejad, McGill University, Canada
Maxime Lavigne, McGill University, Canada
David L. Buckeridge, McGill University, Canada

The Population Health Record (PopHR) is a web-based software infrastructure that retrieves and integrates heterogeneous data from multiple sources (administrative, clinical, and survey) in near-real time and supports intelligent analysis and visualization of these data to create a coherent portrait of population health through a variety of indicators. Focused on improving population health decision-making, PopHR addresses common flaws of existing web portals with similar functionality, such as the lack of structure in presenting available indicators, insufficient transparency of computational algorithms and underlying data sources, overly complicated user interface, and poor support for linking different indicators together to draw conclusions about population health. PopHR presents population health indicators in a meaningful way, generates results that are contextualized by public health knowledge, and interacts with the user through a simple and intuitive natural language interface.

[Top] [Full Agenda]
...............................................................................................................

INTEROPERABILITY IN HEALTHCARE

Time: 2:05 p.m. – 2:55 p.m.

Towards Semantic Interoperability of the CDISC Foundational Standards

Click here for PDF of presentation

Scott Bahlavooni, Biogen Idec, United States
Geoff Low, Medidata Solutions, United Kingdom
Frederik Malfait, IMOS Consulting, Switzerland

The mission of the Clinical Data Interchange Standards Consortium (CDISC) is "to develop and support global, platform-independent data standards that enable information system interoperability to improve medical research and related areas of healthcare." The CDISC Foundational Standards support the clinical research life cycle from protocol development through analysis and reporting to regulatory agencies. Unfortunately, the majority of the Foundational Standards utilized by pharmaceutical, biologic, and device development companies are published in formats that are not machine-processable (e.g., Microsoft Excel, PDF). The PhUSE Computational Science Symposium Semantic Technology (ST) project, in consultation with CDISC, has been working on representing the CDISC Foundational Standards in RDF based on an ISO 11179-type metamodel. Draft RDF representations are available for five of the CDISC Foundational Standards - CDASH, SDTM, SEND, ADaM, and controlled terminology - and have been published on GitHub. In Q1 2014, these representations will undergo CDISC public review to facilitate their adoption as a CDISC standard and a standard representation of the CDISC Foundational Standards. Additional activities are ongoing to create RDF representations of conformance checks, the CDISC Protocol Representation Model (PRM), and the Study Design Model (SDS-XML). Further activities are planned around analysis metadata, linking to EHR, and representing clinical trial data in RDF. The presentation will provide an overview of the different models, their relationships to one another, and how they can be used to manage data standards in an ISO 11179-type metadata registry. It explains how to represent the CDISC Foundational Standards in a machine-readable format that enables full semantic interoperability.
The presentation will also highlight how this work can facilitate a planned RDF export format for the CDISC SHARE environment, a metadata registry currently being developed by CDISC to develop and manage CDISC standards in a consistent way.

[Top] [Full Agenda]

...............................................................................................................

Therapeutic Areas Ontology for Study Data

Eric Prud'hommeaux, W3C, United States
Charles Mead, W3C, France
Sajjad Hussain, INSERM U1142, France

In December 2012, the FDA solicited advice on how to address major challenges in cross-study data integration. A significant part of the feedback they received was to leverage Semantic Web technologies to capture embedded context for machine processing. They combined this with an already existing effort to standardize a set of around 60 "Therapeutic Areas" (TAs) -- attributes commonly captured for submissions for a particular disease.

In the fall of 2013, the FDA asked IBM to provide ontologies for 12 TAs. Initial work on a Renal Transplantation TA has provided a shared ontology for all TAs grounded in BRIDG, as well as a set of emerging practices for efficiently extracting expertise from Subject Matter Experts (SMEs). The resulting test-driven development approach could establish practice for the production of future FDA TAs and, ideally, for the design of studies in the long tail of trials not covered by current TAs.

Further, the development of such a corpus of ontologies, together with the trivial expression of submission data as RDF, can effectively create a standard clinical study metadata and data repository, subsuming both CDISC SHARE and ODM-based repositories.

[Top] [Full Agenda]

...............................................................................................................

LINKED DATA

Time: 3:20 p.m. – 4:35 p.m.

The RDF Platform of the European Bioinformatics Institute

Click here for PDF of presentation

Andrew M. Jenkinson, European Bioinformatics Institute, United Kingdom
Simon Jupp, European Bioinformatics Institute, United Kingdom
Jerven Bolleman, Swiss Institute of Bioinformatics, Switzerland
Marco Brandizi, European Bioinformatics Institute, United Kingdom
Mark Davies, European Bioinformatics Institute, United Kingdom
Leyla Garcia, European Bioinformatics Institute, United Kingdom
Anna Gaulton, European Bioinformatics Institute, United Kingdom
Sebastien Gehant, Swiss Institute of Bioinformatics, Switzerland
Camille Laibe, European Bioinformatics Institute, United Kingdom
James Malone, European Bioinformatics Institute, United Kingdom
Nicole Redaschi, Swiss Institute of Bioinformatics, Switzerland
Sarala M. Wimalaratne, European Bioinformatics Institute, United Kingdom
Maria Martin, European Bioinformatics Institute, United Kingdom
Helen Parkinson, European Bioinformatics Institute, United Kingdom
Ewan Birney, European Bioinformatics Institute, United Kingdom

The breadth and diversity of data available to support research in the life sciences is a valuable asset. Integrating these complex and disparate data does, however, present a challenge. The Resource Description Framework (RDF) technology stack offers a mechanism for storing, integrating and querying across such data in a flexible and semantically accurate manner, and is increasingly being used in this domain. However, the technology is still maturing and has a steep learning curve. As a major provider of bioinformatics data and services, the European Bioinformatics Institute (EBI) is committed to making data readily accessible to the community in ways that meet existing demand. To support further adoption of RDF for molecular biology, the EBI RDF platform (https://www.ebi.ac.uk/rdf) has therefore been developed. In addition to coordinating RDF activities across the institute, the platform provides a new entry point for querying and exploring the integrated resources available at the EBI.

[Top] [Full Agenda]

...............................................................................................................

PubChemRDF: Towards a semantic description of PubChem

Gang Fu, National Center for Biotechnology Information, United States
Bo Yu, National Center for Biotechnology Information, United States
Evan Bolton, National Center for Biotechnology Information, United States

PubChem, located at the National Center for Biotechnology Information (NCBI), is a community-driven chemical biology resource containing information about the biological activities of small molecules. With over 250 contributors, PubChem is a sizeable resource with over 125 million chemical substance descriptions, 48 million unique small molecules, and 220 million biological activity results. PubChem integrates this information with other NCBI internal resources (such as PubMed, Gene, Taxonomy, BioSystems, OMIM, and MeSH) as well as external resources (such as KEGG, DrugBank, and patent documents).

Semantic Web technologies are emerging as an increasingly important approach to distribute and integrate scientific data. These technologies include the Resource Description Framework (RDF). RDF is a family of World Wide Web Consortium (W3C) specifications used as a general method for concept description. The RDF data model can encode semantic descriptions in so-called triples (subject-predicate-object). For example, in the phrase “atorvastatin may treat hypercholesterolemia,” the subject is “atorvastatin,” the predicate is “may treat,” and the object is “hypercholesterolemia.”
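The triple model described above can be illustrated roughly with plain Python tuples standing in for the IRIs and typed literals that real RDF uses; the terms below echo the abstract's example rather than actual PubChemRDF identifiers.

```python
# A tiny triple store: each entry is a (subject, predicate, object) triple.
triples = {
    ("atorvastatin", "may_treat", "hypercholesterolemia"),
    ("atorvastatin", "type", "small_molecule"),
    ("hypercholesterolemia", "type", "disease"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# What might atorvastatin treat?
print(match(s="atorvastatin", p="may_treat"))
```

Pattern matching with wildcards, as in `match` above, is the basic operation that SPARQL generalizes when querying RDF graphs like PubChemRDF.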

The PubChemRDF project provides a great wealth of PubChem information in RDF format. One of the aims is to help researchers work with PubChem data on local computing resources using semantic web technologies. Another aim is to harness ontological frameworks to help facilitate PubChem data sharing, analysis, and integration with resources external to NCBI and across scientific domains.

This talk will give an overview of the recently released PubChemRDF project scope and some examples of its use.

[Top] [Full Agenda]

...............................................................................................................

Evolving from Data-analytics to Data-applications: How Modular Software Environments Built on Semantic Linked-Data Enable a New Generation of Collaborative Life-Science Applications

David King, Exaptive Inc., United States
Robert McBurney, The Accelerated Cure Project, United States

This talk proposes a new approach to solving one of the paramount challenges of gaining valuable insight from complex data environments like those found in the life sciences: the integration not only of distributed and disparate data, but also of diverse, specialized, and often siloed scientists and their analytical tools. Traditional tightly-coupled architectures, relying on data-warehousing and monolithic software, are ill-equipped to handle a diverse and dynamic landscape of data, users and tools. Fragmented data cannot be leveraged by equally fragmented software. A successful approach must dynamically link scientists and their data and facilitate their cross-disciplinary collaboration. This talk will explore the technical details for implementing architectures that accomplish this through linked-data and linked-visualization, and will show how this approach is being leveraged by The Accelerated Cure Project and the Orion Bionetworks Alliance in their efforts to lower the barriers to collaborative Multiple Sclerosis research. The combination of Object-Oriented and Flow-Based programming with Semantic Data standards and recent HTML5 Web advances creates the potential for a new breed of life-science analytics applications. Applications built from this style of combinatorial architecture achieve a loosely-coupled modularity that allows them to leverage the wide variety of components present in complex data systems instead of being hindered by it. By treating datasets more modularly, these applications give liquidity to otherwise frozen and siloed data. By treating algorithms and visualizations as data-agnostic modules, these applications lower the barriers to creative experimentation, a key component of hypothesis generation and insight.
By leveraging recent advances in HTML5 Web capabilities, these techniques can deliver web-based, highly cohesive exploratory interfaces that can be rapidly deployed, evolved, and redeployed to a distributed and dynamic workforce of collaborating scientists, analysts, and subject matter experts. The technical details of this talk will be balanced by two real-world use cases: how the approach is being applied by The Accelerated Cure Project as a next-generation interface to their unique biosample repository, and how it is being applied by the Orion Bionetworks Alliance as a means to link and visualize data from multiple large-scale clinical studies. A goal of the talk will be to strike a balance between the macro and micro views of modular data-application development to provide attendees with two sets of useful takeaways. At the macro level, the audience will be exposed to a generic architecture for combinatorial interactive interface development that can inform some of their future design decisions. At the micro level, the audience will be introduced to some specific linked-data and visualization techniques for building exploratory interfaces that are both thought-provoking and inspiring for the attendees’ future work.

[Top] [Full Agenda]



Friday, February 28

SEMANTICS IN PHARMA

Time: 10:05 a.m. – 10:55 a.m.

The Open PHACTS Project: Progress and Future Sustainability

Click here for PDF of presentation

Tom Plasterer, AstraZeneca, United States
Lee Harland, Connected Discovery, United Kingdom
Bryn Williams-Jones, Connected Discovery, United Kingdom

The Open PHACTS project, an Innovative Medicines Initiative funded effort, has delivered a platform designed to reduce barriers to drug discovery and accelerate collaboration within and outside of biopharma. It has done so by building a community of life science and computational researchers around a use-case driven model, taking advantage of existing community efforts while pushing life science content providers towards the use and provisioning of linked data. The project is pursuing an ENSO (explore new scientific opportunities) call to expand the available content set and look at new distribution models. It is also looking to complete the project in the fall of 2014 and to turn these successes into a self-sustaining foundation, the Open PHACTS Foundation (OPF). We’ll provide a travelogue of this journey along with a look into some of the major contributions, from the Open PHACTS API to the exemplar applications built on top of this platform. From there a snapshot of the impact of Open PHACTS at AstraZeneca will be discussed, from the vantage point of characterising internal and external biochemistry assays using the BioAssay Ontology and Open PHACTS assay content.

[Top] [Full Agenda]
...............................................................................................................

Safety Information Evaluation and Visual Exploration (“SIEVE”)


Click here for PDF of presentation

Suzanne Tracy, AstraZeneca, United States
Stephen Furlong, AstraZeneca, United States
Robert Stanley, IO Informatics, United States
Peter Bogetti, AstraZeneca, United States
Jason Eshleman, IO Informatics, United States
Michael Goodman, AstraZeneca, United States

AstraZeneca (“AZ”) Patient Safety Science wanted to improve retrieval of clinical trial data and biometric assessments across studies. Traditionally, evaluation of clinical trials data across studies required manual intervention to deliver desired datasets. A proposal titled Safety Information Evaluation and Visual Exploration (“SIEVE”) was sponsored by Patient Safety Science. This took the form of a collaboration between AZ and IO Informatics (“IO”). AZ provided the project environment, data resources, subject matter expertise (“SME”) and business expertise. IO provided semantic software, data modeling and integration services including solutions architecture, knowledge engineering and software engineering. The project goal was to improve search and retrieval of clinical trials data. SIEVE was to provide a web-based environment suitable for cross-study analysis. The environment was to align across biomarkers, statistics and bioinformatics groups. Over-arching goals included decision-making for biomarker qualification, trial design, concomitant medication analysis and translational medicine. The team analyzed approximately 42,000 trials records, identified by unique subjectIDs. IO’s Knowledge Explorer software was used by IO’s knowledge engineers in collaboration with AZ’s SMEs to explore the content of these records as linked RDF networks. Reference metadata such as studyID, subjectID, rowID, gender and DoB was central to assuring valid integration. Because almost all documents contained both subjectID and studyID, concatenating these items into a single identifier allowed connections that bridged multiple documents for data traversal. 36,000 records contained valid data, each including a unique trial, patient, and at least one row of laboratory data. IO created a semantic data model or “application ontology” to meet SIEVE requirements.
The resulting data model and instances were harmonized by application of SPARQL-based rules and inference and were aligned with AZ standards. Data was integrated under this ontology, loaded into a semantic database and connected to IO’s “Web Query” software. The result is a web-based user interface accessible to end users for cross-study searching, reporting, charting and sub-querying. Methods include “Quick Search” options, shared searches and query building supporting nesting, inclusion / exclusion, ranges, etc. Advanced queries are presented as filters for user entry to search subjects (“views” or “facets”) including Clinical Assays, Therapy Areas, Adverse Events and Subject Demographics. Reports include exporting, charting, hyperlink mapping and results-list based searches. Results include reduced time to evaluate data from clinical trials and to facilitate forward-looking decisions relevant to portfolios. Alternatives are less efficient: trial data could previously be evaluated within a study, but there was no method to evaluate trials data across studies without manual intervention. Semantic technologies applied for data description, manipulation and linking provided mission-critical value. This was particularly apparent for integration and harmonization, in light of differences discovered across resources. IO’s Knowledge Explorer supported data visualization and manipulation, and applied inference and SPARQL-based rules to RDF creation. This resulted in efficient data modeling, transformation, harmonization and integration and helped assure a successful project.
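The composite-identifier step described above can be sketched as follows. The field names and records here are hypothetical, not drawn from SIEVE data, and the real project produced RDF rather than Python dictionaries; the point is simply why concatenating studyID and subjectID disambiguates subjects across studies.

```python
# Records from two hypothetical document types sharing reference metadata.
demographics = [
    {"studyID": "S01", "subjectID": "1001", "gender": "F"},
    {"studyID": "S02", "subjectID": "1001", "gender": "M"},  # same subjectID, different study
]
labs = [
    {"studyID": "S01", "subjectID": "1001", "test": "ALT", "value": 31},
]

def key(rec):
    # subjectID alone is ambiguous across studies; the concatenated key is not.
    return f"{rec['studyID']}|{rec['subjectID']}"

# Bridge the two document types on the composite identifier.
by_key = {key(r): dict(r) for r in demographics}
for lab in labs:
    by_key[key(lab)].update(test=lab["test"], value=lab["value"])

print(by_key["S01|1001"])
```

Without the composite key, the lab row for subject 1001 could attach to either study's demographic record; with it, the S01 record alone receives the laboratory data.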

[Top] [Full Agenda]

...............................................................................................................

DISCOVERY INFRASTRUCTURE

Time: 2:05 p.m. – 4:00 p.m.

Scalable Ontological Query Processing over Semantically Integrated Life Science Datasets using MapReduce

Click here for PDF of presentation

Hyeongsik Kim, North Carolina State University, United States
Kemafor Anyanwu, North Carolina State University, United States

While addressing the challenges of join-intensive Semantic Web workloads has been a key research focus, processing disjunctive queries has not. Disjunctive queries arise frequently when querying heterogeneous datasets that have been integrated using rich and complex ontologies, as in biological data warehouses such as UniProt, Bio2RDF, and Chem2Bio2RDF. Here, the same or similar concepts may be modeled in different ways in the different datasets. Therefore, several disjunctions are often required in queries to describe the different related expressions relevant for a query, either included explicitly or generated as part of an inferencing-based query expansion process. Often, the width (number of branches) of such disjunctive queries can be large, making them expensive to evaluate. They pose particularly significant challenges when cloud platforms based on MapReduce, such as Apache Hive and Pig, are used to scale up processing, translating to long execution workflows with large I/O, sorting and network traffic costs. This paper presents an algebraic interpretation of such queries that produces query rewritings that are more amenable to efficient processing on MapReduce platforms than traditional relational query rewritings.
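To make the cost intuition concrete, here is a toy sketch in Python rather than the paper's MapReduce setting: a naive plan evaluates a disjunctive query as one pass over the data per branch, while a rewritten plan folds all branches into a single pass with a membership test. The predicates are invented for illustration; the paper's actual rewriting is algebraic and far more general than this.

```python
# Triples where the same concept appears under different vocabularies.
triples = [
    ("p1", "encodes", "proteinA"),
    ("p2", "codes_for", "proteinB"),   # same concept, different predicate
    ("p3", "located_in", "nucleus"),
]

# Disjunctive predicates (the "branches" of a UNION-style query).
branches = {"encodes", "codes_for", "translates_to"}

# Naive plan: one scan of the data per disjunct (len(branches) scans).
naive = []
for pred in branches:
    naive.extend(t for t in triples if t[1] == pred)

# Rewritten plan: a single scan with a set-membership filter.
rewritten = [t for t in triples if t[1] in branches]

assert sorted(naive) == sorted(rewritten)
print(rewritten)
```

On a MapReduce engine, each branch of the naive plan would typically become its own job with its own I/O, sort, and shuffle costs, which is why folding wide disjunctions into fewer passes matters.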

[Top] [Full Agenda]

...............................................................................................................

Multi-Domain Collaboration for Web-Based Literature Browsing and Curation

David H. Mason, Concordia University, Canada
Marie-Jean Meurs, Concordia University, Canada
Erin McDonnell, Concordia University, Canada
Ingo Morgenstern, Concordia University, Canada
Carol Nyaga, Concordia University, Canada
Vahé Chahinian, Concordia University, Canada
Greg Butler, Concordia University, Canada
Adrian Tsang, Concordia University, Canada

We present Proxiris, a web-based tool developed at the Centre for Structural and Functional Genomics, Concordia University. Proxiris is an Open Source, easily extensible annotation system that supports teams of researchers in literature curation on the Web via a browser proxy. The most important Proxiris features are iterative annotation refinement using stored documents, Web scraping, strong search capabilities, and a team approach that includes specialized software agents. Proxiris is designed in a modular way, using a collection of Free and Open Source components best suited to each task.

[Top] [Full Agenda]

..............................................................................................................

Improving Scientific Information Sharing by Fostering Reuse of Presentation Material

Bernadette Hyland, 3 Round Stones Inc., United States
David Wood, 3 Round Stones Inc., United States
Luke Ruth, 3 Round Stones Inc., United States
James Leigh, 3 Round Stones Inc., United States

Most scientific developments are recorded in published papers and communicated via presentations. Scientific findings are presented within organizations, at conferences, via Webinars and other fora. Yet after delivery to an audience, important information is often left to wither on hard drives, in document management systems and even on the Web. Accessing the data underlying scientific findings has been the Achilles’ heel of researchers due to closed and proprietary systems. Because there is no comprehensive ecosystem for scientific findings, important research and discovery is repeatedly performed and communicated. In an ideal world, published papers and presentations would be the start of an ongoing dialogue with peers and colleagues. By definition this dialogue spans geographical boundaries and therefore must involve a global network, international data exchange standards, a universal address scheme and provision for open annotations. Security and privacy are key concerns in discussing and sharing scientific research. 3 Round Stones has created a collaboration system that is platform-agnostic, relying on the architecture and standards of the Web and using the Open Source Callimachus Linked Data Management system. The collaboration system facilitates the full ecosystem for sharing and annotating scientific findings. It includes modern social network capabilities (e.g., bio, links to social media sites, contact information), Open Annotation support for associating distinct pieces of information, and metrics on content usage. Papers and presentations can be annotated and iterated using a familiar review cycle involving authors, editors and peer reviewers. Proper attribution for content is handled automatically. The 3 Round Stones collaboration system has the potential to foster discovery, access and reuse of scientific findings.

[Top] [Full Agenda]

...............................................................................................................

***IO INFORMATICS BEST PAPER PRIZE***

GenomeSnip: Fragmenting the Genomic Wheel to Augment Discovery in Cancer Research

Click here for PDF of presentation

Maulik R. Kamdar, National University of Ireland, Ireland
Aftab Iqbal, National University of Ireland, Ireland
Muhammad Saleem, Universität Leipzig, Germany
Helena F. Deus, Foundation Medicine Inc., United States
Stefan Decker, National University of Ireland, Ireland

Cancer genomics researchers have greatly benefitted from high-throughput technologies for the characterization of genomic alterations in patients. These voluminous genomics datasets, when supplemented with the appropriate computational tools, have led towards the identification of 'oncogenes' and cancer pathways. However, if researchers wish to exploit the datasets in conjunction with this extracted knowledge, their cognitive abilities need to be augmented through advanced visualizations. In this paper, we present GenomeSnip, a visual analytics platform, which facilitates the intuitive exploration of the human genome and displays the relationships between different genomic features. Knowledge pertaining to the hierarchical categorization of the human genome, oncogenes and abstract, co-occurring relations has been retrieved from multiple data sources and transformed a priori. We demonstrate how cancer experts could use this platform to interactively isolate genes or relations of interest and perform a comparative analysis on the 20.4 billion triples of the Linked Cancer Genome Atlas (TCGA) datasets.



Conference on Semantics in Healthcare and Life Sciences

FULL AGENDA

Updated March 04, 2014


Go directly to: [Thursday - February 27] [Friday - February 28]


 

WEDNESDAY, February 26
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
TIME EVENT LOCATION
11:00 a.m. - 1:00 p.m. Registration Carver Foyer
1:00 p.m. - 3:00 p.m. Tutorial:
Enterprise and Scientific Data Sharing using Callimachus
Presenter: Dr. David Wood,
3 Round Stones Inc.

Click here for presentation slides
Carver 1
3:00 p.m. - 3:15 p.m. Break Carver Foyer
3:15 p.m. - 5:00 p.m. Tutorial continues Carver 1
4:00 p.m. - 7:00 p.m. Registration Carver Foyer
4:00 p.m. - 5:00 p.m. Poster (Author) Set-up Carver 2 & 3
5:00 p.m. - 6:30 p.m. Poster Reception Carver 2 & 3

THURSDAY, February 27
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
TIME EVENT LOCATION
7:30 a.m. - 10:00 a.m. Registration Carver Foyer
8:00 a.m. - 8:45 a.m. Breakfast (continental) Carver 2 & 3
8:45 a.m. - 9:00 a.m. Welcome & Overview
- Bonnie Berger
ISCB Board Member
- Jonas Almeida
Conference Chair
- Ted Slater
Conference Chair
Carver 1
9:00 a.m. - 10:00 a.m. Keynote Presentation
RON COLLETTE

The Carpenter to the Screw: Well, I Have a Hammer.
You Must Be a Nail
Carver 1
10:05 a.m. - 10:55 a.m. Health Analytics
Lattices for Representing and Analyzing Organogenesis
Presenters: Chimezie Ogbuji, Metacognition LLC, United States
Rong Xu, Case Western University, United States

Click here for PDF of presentation
Carver 1
A Semantic-driven Surveillance Model to Enhance Population Health Decision-making
Presenters: Anya Okhmatovskaia, Arash Shaban-Nejad, and David L. Buckeridge (28)

Click here for PDF of presentation
Carver 1
10:55 a.m. - 11:15 a.m. Break Carver 2 & 3
11:15 a.m. - 11:35 a.m. Tech Talk 1
High Availability and Graph Mining using bigdata® - an Open-Source Graph Database
Presenter: Bryan Thompson, SYSTAP LLC

Click here for PDF of presentation.
Carver 1
11:40 a.m. - 12:00 p.m. Tech Talk 2
Successfully Navigating Diagnosis And Treatment In The Age Of Targeted Cancer Therapy
Presenter: Helena Deus, Foundation Medicine
Carver 1
12:00 p.m. - 1:00 p.m. Lunch Carver 2 & 3
1:00 p.m. - 2:00 p.m. Keynote Presentation
DEBORAH MCGUINNESS

Towards Semantic Health Assistants

Click here for PDF of presentation
Carver 1
2:05 p.m. - 2:55 p.m. Interoperability in Healthcare
Towards Semantic Interoperability of the CDISC Foundational Standards
Presenters: Scott Bahlavooni, Geoff Low and Frederik Malfait

Click here for PDF of presentation
Carver 1
Therapeutic Areas Ontology for Study Data
Presenters: Eric Prud'Hommeaux, Charles Mead and Sajjad Hussain (22)
Carver 1
2:55 p.m. - 3:20 p.m. Break Carver 2 & 3
3:20 p.m. - 4:35 p.m. Linked Data
The RDF Platform of the European Bioinformatics Institute
Presenters: Andrew M. Jenkinson, Simon Jupp, Jerven Bolleman, Marco Brandizi, Mark Davies, Leyla Garcia, Anna Gaulton, Sebastien Gehant, Camille Laibe, James Malone, Nicole Redaschi, Sarala M. Wimalaratne, Maria Martin, Helen Parkinson and Ewan Birney (11)

Click here for PDF of presentation
Carver 1
PubChemRDF: Towards a semantic description of PubChem
Presenters: Gang Fu, Bo Yu,
Evan Bolton
Carver 1
Evolving from Data-analytics to Data-applications: How Modular Software Environments Built on Semantic Linked-Data Enable a New Generation of Collaborative Life-Science Applications
Presenters: David King and Robert McBurney (20)
Carver 1
4:35 p.m. - 5:00 p.m. Daily Closing Remarks
Jonas Almeida & Ted Slater, Conference Chairs
Carver 1
FRIDAY, February 28
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
TIME EVENT LOCATION
7:30 a.m. - 10:00 a.m. Registration Carver Foyer
8:00 a.m. - 8:45 a.m. Breakfast (continental) Carver 2 & 3
8:45 a.m. - 9:00 a.m. Review - Previous Day
Jonas Almeida & Ted Slater, Conference Chairs
Carver 1
9:00 a.m. - 10:00 a.m. Keynote Presentation
HILMAR LAPP

Semantics of and for the Diversity of Life: Opportunities and Perils of Trying to Reason on the Frontier

Click here for presentation slides
Carver 1
10:05 a.m. - 10:55 a.m. Semantics in Pharma
The Open PHACTS Project: Progress and Future Sustainability
Presenters: Tom Plasterer, Lee Harland and Bryn Williams-Jones (7)

Click here for PDF of presentation
Carver 1
Safety Information Evaluation and Visual Exploration
(“SIEVE”)
Presenters: Suzanne Tracy, Stephen Furlong, Robert Stanley, Peter Bogetti, Jason Eshleman and Michael Goodman (16)


Click here for PDF of presentation
Carver 1
10:55 a.m. - 11:20 a.m. Break Carver 2 & 3
11:20 a.m. - 11:40 a.m. Tech Talk 3
Making Sense of Big Data in Pharma
Presenter: Andreas Matern
Thomson Reuters Life Sciences
 
11:45 a.m. - 12:05 p.m. Tech Talk 4
Using Supercomputer Architecture for Practical Semantic Web Research in the Life Sciences
Presenter: Matt Gianni, YarcData
 
12:05 p.m. - 1:00 p.m. Lunch Carver 2 & 3
1:00 p.m. - 2:00 p.m. Keynote Presentation
BAREND MONS

Semantics Based Biomedical Knowledge Search, Integration and Discovery
Carver 1
2:05 p.m. - 2:55 p.m. Discovery Infrastructure
Scalable Ontological Query Processing over Semantically Integrated Life Science Datasets using MapReduce
Presenters: Hyeongsik Kim and Kemafor Anyanwu (18)

Click here for PDF of presentation
Carver 1
Multi-Domain Collaboration for Web-Based Literature Browsing and Curation
Presenters: David H. Mason, Marie-Jean Meurs, Erin McDonnell, Ingo Morgenstern, Carol Nyaga, Vahé Chahinian, Greg Butler, Adrian Tsang (9)
 
2:55 p.m. - 3:10 p.m. Break Carver 2 & 3
3:10 p.m. - 4:00 p.m. Improving Scientific Information Sharing by Fostering Reuse of Presentation Material
Presenters: Bernadette Hyland, David Wood, Luke Ruth, James Leigh (6)

Click here for presentation slides
Carver 1
***AWARDED BEST PAPER***

GenomeSnip: Fragmenting the Genomic Wheel to Augment Discovery in Cancer Research

Presenters: Maulik R. Kamdar, Aftab Iqbal, Muhammad Saleem, Helena F. Deus and Stefan Decker

Click here for PDF of presentation
Carver 1
4:00 p.m. - 4:15 p.m. Future Actions and Conference Closing Remarks
Jonas Almeida & Ted Slater, Conference Chairs
Carver 1
4:15 p.m. Conference Ends Carver 1


Conference on Semantics in Healthcare and Life Sciences

POSTER PRESENTATIONS

Updated May 08, 2014



. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

POSTER DETAILS:



Posters will be on display throughout the conference, beginning 5 p.m. Wednesday, February 26, with a special reception for poster authors to present their work to conference delegates:

  • Poster Set-up: Wednesday – February 26, 4:00 p.m. – 5:00 p.m.
  • Poster Reception: Wednesday – February 26, 5:00 p.m. – 6:30 p.m.

When preparing and designing your poster please note that it should be no larger than 44 inches wide by 44 inches high (there are two posters per side).
  • Posters must be removed by 3:00 p.m., Friday, February 28.


POSTERS
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Poster 01
A Semantic Computing Platform to Enable Translational and Clinical Omics


Jonathan Hirsch, Syapse, United States
Andro Hsu, Syapse, United States
Tony Loeser, Syapse, United States

To bring omics data from benchtop to point of care, labs and clinics must be able to handle three types of data with very different properties and requirements. The first, biomedical knowledge, is highly complex, continually evolving, and comprises millions of concepts and relationships. The second, medical data including clinical health and outcomes records, is temporal, unstructured, and hard to access. The third, omics data such as whole-genome sequence, is structured but voluminous. Attempts to bridge the three have had limited success. No single data architecture allows efficient querying of these types of data. The lack of scalable infrastructure that can integrate complex biomedical knowledge, temporal medical data, and omics data is a barrier to widespread use of omics data in clinical decision-making. Syapse has built a data platform that enables capture, modeling, and query of all three data types, along with applications and programming interfaces for seamless integration with lab and clinical workflows. Using a proprietary semantic layer based on Resource Description Framework (RDF) and related standards, the Syapse platform enables assembly of a dynamic knowledgebase from biomedical ontologies such as SNOMED CT and OMIM as well as customers’ internally curated knowledge. Similarly, HL7-formatted medical data can be imported and represented as RDF objects. Lastly, Syapse enables federated queries that associate RDF-represented knowledge with omics data while retaining the benefits of scalable storage and indexing techniques. Biologists and clinicians can access the platform through a rich web application layer that enables role-specific customization at any point in the clinical omics workflow. We will describe how biologists and clinicians use Syapse as the infrastructure of an omics learning healthcare system. Clinical R&D performs data mining queries, e.g., selecting patients who share disease and treatment characteristics to identify associations between omics profiles and clinical outcomes. Organizations update clinical evidence such as variant interpretation or pharmacogenetic associations in a knowledgebase that triggers alerts in affected patient records. At point of care, clinical decision support interfaces present internal or external treatment guidelines based on patients’ omics profiles. The Syapse platform is a cloud-based solution that allows labs and clinics to deliver translational and clinical omics with minimal IT resources.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Poster 02:
Optimization of RDF Data for Graph-based Applications


Ryota Yamanaka, The University of Tokyo, Japan
Tazro Ohta, Database Center for Life Science, Japan
Hiroyuki Aburatani, The University of Tokyo, Japan

Various omics datasets are publicly available in RDF via their data repositories or endpoints, which makes it easier to obtain datasets from different sources that can be integrated. Meanwhile, when we use parts of linked data in our own applications for data analysis or visualization, the data format does not have to be RDF; it can be processed into appropriate formats according to usage. In fact, most web applications handle table-format data rather than RDF.

There are two reasons for application developers to convert semantic networks into other data models rather than keeping the original RDF in backend databases. One is the difficulty of understanding complex RDF schemas and writing SPARQL queries; the other is that the data model described in RDF is not always optimized for search performance. Consequently, we need practical methods to convert RDF into the data model optimized for each application in order to build an efficient database using parts of linked data.

The simplest method of optimizing RDF data for most applications is to convert it into table format and store it in relational databases. With this method, however, we need to consider not only table definitions but also de-normalization and indices to reduce the cost of table-join operations. As a result, we are focusing on graph databases instead. Their data models naturally describe semantic networks and enable network search operations, such as traversal, just as triplestores do.

Although the data models in graph databases are similar in structure to RDF-based semantic networks, they are different in some aspects. For example, in the graph database management system we used in this project, Neo4j, relationships can hold properties, while edges in RDF-based semantic networks do not have properties. We are therefore researching how to fit RDF data to effectively use graph database features for better search performance and more efficient application development.
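To make that difference concrete, here is a toy, stdlib-only Python sketch (not the authors' conversion tools; all names and data are invented) of folding RDF-style reified statements into edges that carry properties, as a property graph such as Neo4j allows:

```python
# Toy sketch: fold RDF-style triples into a property-graph form in which
# relationships themselves can carry properties. Reified statements
# (rdf:subject / rdf:predicate / rdf:object) become edge properties.

def triples_to_property_graph(triples):
    """Map (s, p, o) triples to {edge: {property: value}}."""
    edges = {}       # (s, p, o) -> dict of edge properties
    statements = {}  # statement node -> the (s, p, o) edge it reifies
    parts = {}
    # First pass: collect the pieces of each reified statement.
    for s, p, o in triples:
        if p in ("rdf:subject", "rdf:predicate", "rdf:object"):
            parts.setdefault(s, {})[p] = o
    for node, d in parts.items():
        if len(d) == 3:
            statements[node] = (d["rdf:subject"], d["rdf:predicate"], d["rdf:object"])
    # Second pass: plain triples become edges; any other triple about a
    # statement node becomes a property on the corresponding edge.
    for s, p, o in triples:
        if p.startswith("rdf:"):
            continue
        if s in statements:
            edges.setdefault(statements[s], {})[p] = o
        else:
            edges.setdefault((s, p, o), {})
    return edges

# Example: a pathway edge qualified with a confidence score.
triples = [
    ("geneA", "activates", "geneB"),
    ("_:st1", "rdf:subject", "geneA"),
    ("_:st1", "rdf:predicate", "activates"),
    ("_:st1", "rdf:object", "geneB"),
    ("_:st1", "confidence", "0.9"),
]
# triples_to_property_graph(triples) ->
# {("geneA", "activates", "geneB"): {"confidence": "0.9"}}
```

In the property-graph form the qualifier lands directly on the relationship, whereas plain RDF needs the intermediate statement node.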

We have developed tools to convert RDF data (as well as table-format data) and loaded sample data into a graph database. Currently, we are developing demo applications to search and visualize graph data such as pathway networks. These resources are available at http://sem4j.org/

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Poster 03:
A Phenome-Guided Drug Repositioning through a Latent Variable Model


Halil Bisgin, National Center for Toxicological Research, United States
Zhichao Liu, National Center for Toxicological Research, United States    
Hong Fang, National Center for Toxicological Research, United States
Reagan Kelly, National Center for Toxicological Research, United States    
Xiaowei Xu, University of Arkansas at Little Rock, United States    
Weida Tong, National Center for Toxicological Research, United States

The phenome has been widely studied to find overlaps with the genome, which in turn has identified correlations for diseases. Besides its explanatory power for causalities, the phenome has also been explored for drug repositioning, the process of identifying new uses of existing drugs. However, most current phenome-based approaches limit the search space for candidate drugs to the most similar drug. For a comprehensive analysis of the phenome, we assumed that all phenotypes (indications and side effects) were generated with a probabilistic distribution that can provide the likelihood of new therapeutic indications for a given drug. We employed Latent Dirichlet Allocation (LDA), which introduces latent variables (topics) assumed to govern the phenome distribution. We developed our model on the phenome information in the Side Effect Resource (SIDER). We first examined the recovery potential of LDA by perturbing the drug-phenotype matrix for each of the 11,183 drug-indication pairs. Those indications were assumed to be unknown, and we attempted to recover them based on the remaining drug-phenotype pairs. We were able to recover known indications masked during the model development phase with a 70% success rate on the portion of drug-indication pairs (5,516 out of 11,183) that have probabilities greater than random chance (p>0.005). After obtaining a decision criterion that considers both probability and rank, we applied the model to the whole phenome to suggest alternative indications. We were able to retrieve FDA-approved indications of 6 drugs whose indications were not listed in SIDER. For 907 drugs that are present with their indication information, our model suggested at least one alternative treatment option for further investigation. Several of the suggested new uses can be supported with information from the scientific literature. 
The results demonstrated that the phenome can be further analyzed by a generative model, which can discover the probabilistic associations between drugs and therapeutic uses. In this regard, LDA stands as a promising tool to explore new uses of existing drugs by narrowing down the search space.
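The leave-one-out recovery protocol described above can be sketched in a few lines of Python; note this is only the evaluation loop, with a trivial phenotype co-occurrence scorer standing in for the LDA topic model the authors actually trained, and entirely invented data:

```python
# Sketch of a masking/recovery evaluation: hide one known indication,
# re-rank candidates from the remaining drug-phenotype pairs, and check
# whether the hidden indication is recovered. The scorer below is a toy
# co-occurrence count, NOT the paper's LDA model.

def score(drug_phenotypes, drug, candidate):
    """Count drugs that carry `candidate` and share a phenotype with `drug`."""
    return sum(
        1
        for other, phens in drug_phenotypes.items()
        if other != drug and candidate in phens and drug_phenotypes[drug] & phens
    )

def recovered(drug_phenotypes, drug, held_out, top_k=1):
    """Mask one known indication; is it ranked in the top-k suggestions?"""
    masked = dict(drug_phenotypes)
    masked[drug] = drug_phenotypes[drug] - {held_out}
    candidates = set().union(*masked.values()) - masked[drug]
    ranked = sorted(candidates, key=lambda c: score(masked, drug, c), reverse=True)
    return held_out in ranked[:top_k]

# Invented example: hiding drugA's "hypertension" indication still recovers
# it, because two other drugs pair hypertension with drugA's phenotypes.
data = {
    "drugA": {"headache", "nausea", "hypertension"},
    "drugB": {"headache", "hypertension"},
    "drugC": {"nausea", "dizziness"},
    "drugD": {"nausea", "hypertension"},
}
# recovered(data, "drugA", "hypertension") -> True
```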

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Poster 04:
Cross-Linking DOI Author URIs Using Research Networking Systems


- Click here for PDF of poster

Nick Benik, Harvard Medical School, United States
Timothy Lebo, Rensselaer Polytechnic Institute, United States
Griffin Weber, Harvard Medical School, United States

A proof-of-concept application was created to automatically cross-link publications that were written by the same person by harvesting linked open data from institution-based research networking systems. This is important because it (1) helps people identify related articles when exploring the biomedical literature, (2) gives scientists appropriate credit for the work they have done, and (3) makes it easier to find experts in a subject area. Our presentation will include a demo of an interactive network visualization that allows exploration of these newly created links.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Poster 05:
Simplified Web 3.0 Query Portal for Biomedical Research


Justin Lancaster, Hydrojoule, LLC, United States

We will report on progress to develop a simpler, generalized web interface to provide a broader group of biomedical researchers useful access to one or more RDF triplestores for biomedical information (Bio2RDF, etc.). The approach "wraps" a SPARQL and/or Virtuoso-type query structure as a tool called by the more user-friendly query interface. The build employs a lightweight client, using JS, JS frameworks and JSON for exchange, with one or more servers providing backend routings and intermediary, user-specific storage (node.JS and a specialized graph-store DB). This is a first step toward a longer-term prototype to support hypothesis generation related to the research query and to the first rounds of data returned from the triplestore.
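The "wrapping" idea can be sketched as a function that maps friendly form fields onto a generated SPARQL string; the field names and predicate map below are invented for illustration, since the Hydrojoule interface itself is not described in detail here:

```python
# Hypothetical field-to-predicate map for a user-friendly query form.
PREDICATES = {
    "gene": "bio:aboutGene",
    "disease": "bio:aboutDisease",
}

def build_query(filters, limit=25):
    """Turn a dict like {"gene": "TP53"} into a SPARQL SELECT string
    that a backend could forward to a triplestore endpoint."""
    patterns = [
        f'?article {PREDICATES[field]} "{value}" .'
        for field, value in sorted(filters.items())
    ]
    body = "\n  ".join(patterns)
    return (
        "SELECT ?article WHERE {\n"
        f"  {body}\n"
        f"}} LIMIT {limit}"
    )
```

A client would submit the form values as JSON; the server renders the query text and relays the endpoint's results back, so end users never touch SPARQL directly.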

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Poster 06:
A Minimal Governance Layer for the Web of Linked Data


- Click here for PDF of poster

Alexander Grüneberg, University of Alabama at Birmingham, United States
Jonas Almeida, University of Alabama at Birmingham, United States

The absence of governance engines capable of governing semantic web constructs by responding solely to embedded RDF (Resource Description Framework) descriptions of the governance rules remains one of the major obstacles to a web of linked data that can traverse domains with modulated privacy. This is particularly true when the governance is as fluid as the data models described by the RDF assertions. In part based on our previous experience developing and using S3DB (http://en.wikipedia.org/wiki/S3DB), we have realized that the most scalable solutions will place minimalist requirements on the governance mechanism and will maximize the use of existing dereferencing conventions. Accordingly, we propose a generalized rule-based authorization system decoupled from services exposed over HTTP.

The authorization rules are written using JSON-LD (JavaScript Object Notation for Linked Data), apply to resources identified by URIs, and can be dynamically inserted and removed. The solution uses a simple inheritance model in which resources inherit rules from their parent resource. This approach seeks to advance the recently proposed Linked Data Platform (http://www.w3.org/TR/ldp/) by weaving it with widely supported web authentication standards. The management of sensitive patient-derived molecular pathology data is used as an illustrative case study in which the proposed solution is validated. The mediating governance engine is made available as an open source NodeJS module at https://github.com/ibl/Bouncer
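The parent-inheritance model can be sketched in a few lines; this assumes rules keyed by resource path and does not reproduce Bouncer's actual JSON-LD rule syntax:

```python
# Minimal sketch of rule inheritance: a resource without its own rule
# inherits from the nearest ancestor on its URI path. Rule contents and
# paths here are invented for illustration.

def effective_rule(rules, path):
    """Walk from `path` up toward the root until a rule is found."""
    while True:
        if path in rules:
            return rules[path]
        if path in ("", "/"):
            return None  # no rule anywhere on the path
        path = path.rsplit("/", 1)[0] or "/"

# Example: everything under /patients is restricted to clinicians,
# while the rest of the site inherits the public root rule.
rules = {
    "/": {"read": ["public"]},
    "/patients": {"read": ["clinician"]},
}
# effective_rule(rules, "/patients/123/pathology") -> {"read": ["clinician"]}
# effective_rule(rules, "/about")                  -> {"read": ["public"]}
```

Because rules resolve at request time, inserting or removing a rule on one path immediately re-governs the whole subtree beneath it.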

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
***F1000 POSTER PRIZE***

Poster 07:
BAO Reveal: Assay Analysis Using the BioAssay Ontology and Open PHACTS


- Click here for PDF of poster

Simon Rakov, AstraZeneca, United States
Linda Zander Balderud, AstraZeneca, Sweden
Tom Plasterer, AstraZeneca, United States

Biochemical assay data is complex and highly variable. It contains attributes and terms that are not applicable to all assays, and uses controlled vocabularies inconsistently. Sometimes relevant data is entered in the wrong attribute, or appears in a different document altogether. The complexity and inconsistency of assay data creates challenges for those who wish to query assay data for a given set of attributes, such as the technology chosen to conduct a given assay.

The BioAssay Ontology (BAO) is an ontology developed by the University of Miami and extended by the Open PHACTS consortium, itself part of the European Union’s Innovative Medicine Initiative (IMI). The purpose of the BAO is to standardize how we represent assay data. It incorporates many standard public ontologies, importing sections of the NCBI taxonomy, Uniprot, the Unit Ontology, the Ontology of Biomedical Investigation and the Gene Ontology, among others. More than 900 PubChem assays have been annotated according to BAO.

We have converted over 400 AstraZeneca primary HTS assays into BAO format in order to evaluate whether this common model can improve project success analyses based on assay technologies, help us to better understand the impact of technology artifacts such as frequent hitters, and improve our ability to employ data mining methodologies against assay data. We have created static visualizations that combine our internal data with the annotated PubChem assays. Most recently, this project has created a dynamic interface, “BAO Reveal,” for querying and visualizing BAO data.

Our frequent hitter analysis methodology has found twice as many frequently-hitting assays when assay data is structured using BAO than with previous methods that did not have the granularity of BAO. This has suggested improvements to the data capture process from these assays. The dynamic faceting features and linked biochemical information in BAO Reveal provide researchers ways to investigate the underlying causes of broad assay patterns. This will allow us to focus assay development efforts on the most promising approaches.

BAO Reveal facilitates identification of screening technologies used for similar targets and helps analyze the robustness of a specific assay technology for a biological target. It can identify screening data to confirm assay reproducibility, and also assist frequent hitter analysis. As a linked data application built on Open PHACTS methodologies and other semantic web standards, BAO Reveal is well positioned for exploitation in multiple directions by multiple communities.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Poster 08:
A High Level API for Fast Development of High Performance Graph Analytics on GPUs


- Click here for PDF of poster

Zhisong Fu, SYSTAP LLC., United States

High performance graph analytics are critical for a long list of application domains, including social networks, information systems, security, biology, healthcare and life sciences. In recent years, the rapid advancement of many-core processors, in particular graphical processing units (GPUs), has sparked a broad interest in developing high performance graph analytics on these architectures. However, the single instruction multiple thread (SIMT) architecture used in GPUs places particular constraints on both the design and implementation of graph analytics algorithms and data structures, making the development of such programs difficult and time-consuming.

We present an open source library (MPGraph) that provides a high level abstraction which makes it easy to develop high performance graph analytics on massively parallel hardware. This abstraction is based on the Gather-Apply-Scatter (GAS) model as used in GraphLab. To deliver high performance computation and efficiently utilize the high memory bandwidth of GPUs, the underlying CUDA kernels use multiple sophisticated strategies, such as vertex-degree-dependent dynamic parallelism granularity and frontier compaction. Our experiments show that for many graph analytics algorithms, an implementation built with our abstraction is up to two orders of magnitude faster than parallel CPU implementations on up to 24 CPU cores, and has performance comparable to a state-of-the-art manually optimized GPU implementation. In addition, with our abstraction, new algorithms can be implemented in a few hours that fully exploit the data-level parallelism of the GPU and offer throughput of up to 3 billion traversed edges per second on a single GPU. We will explain the concepts behind the high-level abstraction and provide a starting point for people who want to write high throughput analytics.
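As a CPU-side toy of the programming model (not MPGraph's CUDA kernels), a level-synchronous BFS can be phrased in gather/apply/scatter terms:

```python
# Toy illustration of the Gather-Apply-Scatter pattern using BFS.
# Each iteration, frontier vertices scatter depth+1 to out-neighbors
# (gather), and only vertices whose value is new join the next frontier
# (apply) -- a sequential stand-in for GPU frontier compaction.

def gas_bfs(edges, source):
    """Return {vertex: depth} for vertices reachable from `source`."""
    out = {}
    for u, v in edges:
        out.setdefault(u, []).append(v)
    depth = {source: 0}
    frontier = {source}
    while frontier:
        gathered = {}
        for u in frontier:                 # scatter along out-edges
            for v in out.get(u, []):
                d = depth[u] + 1
                if v not in gathered or d < gathered[v]:
                    gathered[v] = d        # gather: keep the best value
        frontier = {v for v in gathered if v not in depth}  # apply
        for v in frontier:
            depth[v] = gathered[v]
    return depth

# gas_bfs([(1, 2), (2, 3), (1, 3), (3, 4)], 1) -> {1: 0, 2: 1, 3: 1, 4: 2}
```

On a GPU the scatter loop runs as thousands of parallel threads, which is where the SIMT constraints discussed above come into play.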

MPGraph is now in its second release.  Future work will extend the platform to multi-GPU workstations and GPU compute clusters.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
***F1000 POSTER PRIZE***

Poster 09:
A Nanopublication Framework for Systems Biology and Drug Repurposing


Jim McCusker, Rensselaer Polytechnic Institute, United States
Kusum Solanki, Rensselaer Polytechnic Institute, United States
Cynthia Chang, Rensselaer Polytechnic Institute, United States
Michel Dumontier, Stanford University, United States
Jonathan Dordick, Rensselaer Polytechnic Institute, United States
Deborah McGuinness, Rensselaer Polytechnic Institute, United States

Systems biology studies interactions between proteins, genes, drugs, and other molecular entities. A number of databases have been developed that serve as a patchwork across the landscape of systems biology, focusing on different experimental methods, many species, and a wide diversity of inclusion criteria. Systems biology has been used in the past to generate hypotheses for drug effects, but has become fragmented under the large number of disparate and disconnected databases. In our efforts to create a systematic approach to discovering new uses for existing drugs, we have developed Repurposing Drugs with Semantics (ReDrugS). Our ReDrugS framework can accept data from nearly any database that contains biological or chemical entity interactions. We represent this information as sets of nanopublications, fine-grained assertions that are tied to descriptions of their attribution and supporting provenance. These nanopublications are required to have descriptions of the experimental methods used to justify their assertions. By inferring the probability of truth from those experimental methods, we are able to create consensus assertions, along with a combined probability. Those consensus assertions can be searched for via a set of Semantic Automated Discovery and Integration (SADI) web services, which are used to drive a demonstration web interface. We then show how associations between exemplar drugs and cancer-driving genes can be explored and discovered. Future work will incorporate protein/disease associations, perform hypothesis generation on indirect drug targets, and test the resulting hypotheses using high throughput drug screening.
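The abstract does not give the formula used to combine per-method probabilities into a consensus; one common assumption, sketched here purely for illustration, is a noisy-OR over independent supporting methods:

```python
# Hedged illustration: combine per-method truth probabilities for the same
# assertion with a noisy-OR (assertion is false only if every supporting
# method is wrong independently). The actual ReDrugS scoring may differ.

def consensus_probability(method_probs):
    """Return 1 - prod(1 - p_i) over the supporting methods' probabilities."""
    p_all_wrong = 1.0
    for p in method_probs:
        p_all_wrong *= 1.0 - p
    return 1.0 - p_all_wrong

# Two methods at 0.5 each -> 0.75; adding evidence only raises the score.
```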

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
***IO INFORMATICS BEST POSTER PRIZE***

Poster 10:
An Interoperable Framework for Biomedical Image Retrieval and Knowledge Discovery

Syed Ahmad Chan Bukhari, University of New Brunswick, Canada
Mate Levente Nagy, Yale University, United States
Paolo Ciccarese, Harvard University, United States
Artjom Klein, IPSNIP Computing Inc., Canada
Michael Krauthammer, Yale University, United States
Christopher Baker, University of New Brunswick, Canada

Biomedical images have an irrefutably central role in life science discoveries. Ongoing challenges associated with knowledge management and utility operations unique to biomedical image data are only recently gaining recognition. Making biomedical image content explicit is essential for medical decision making such as diagnosis, treatment plans and follow-up, as well as for data management, data reuse in biomedical research and the assessment of care delivery. In our previous work, we developed the Yale Image Finder (YIF), a novel biomedical image search engine that indexes around two million biomedical images along with associated metadata. While YIF is considered a veritable source of easily accessible biomedical images, a number of usability and interoperability challenges have yet to be addressed, including provenance and cross-platform accessibility.

To overcome these issues and to accelerate the adoption of YIF for next generation biomedical applications, we have developed a publicly accessible Biomedical Image API with multiple modalities. The core API is powered by a dedicated semantic architecture that exposes Yale Image Finder (YIF) content as linked data, permitting integration with related information resources and consumption by linked data-aware data services. We have established a protocol to transform image data according to linked open data recommendations, and exposed it through a SPARQL endpoint and linked data explorer.

To facilitate the ad-hoc integration of image data with other online data resources, we built semantic web services compatible with the SADI semantic web service framework. The utility of the combined infrastructure is illustrated with a number of compelling use cases and further extended through the incorporation of Domeo, a well-known tool for open annotation. Domeo facilitates enhanced search over the images using annotations provided through crowdsourcing. In the current configuration, our triplestore holds more than thirty-five million triples and can be accessed and operated through syntactic or semantic solutions. Core features of the framework, namely data reusability, system interoperability, semantic image search, automatic updates and a dedicated semantic infrastructure, make the system a state-of-the-art resource for image data discovery and retrieval.
A demo can be accessed at: http://cbakerlab.unbsj.ca:8080/icyrus/

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .


Poster 11:
HYDRA8: A Graphical Query Interface for SADI Semantic Web Services

Christopher J. O. Baker, CEO
IPSNP Computing Inc.
Saint John, NB, Canada

HYDRA is a high-performance query engine operating on networks of SADI services representing various distributed resources. Here we present an intuitive, end-user-oriented querying and data browsing tool designed to construct SPARQL queries for issue to the HYDRA back end. The current prototype permits users to (i) enter key phrases and generate suitable query graphs corresponding to them; (ii) select between the suggested graphs; (iii) manually extend the automatically suggested query graphs, adding new relations and entities according to the current semantic data schema (predicate map); (iv) run a query with HYDRA and see the results in tabular form. A demo will be available at the conference.
 

