Conference on Semantics in Healthcare and Life Sciences (CSHALS)

PRESENTERS

Updated March 19, 2014




Thursday, February 27

HEALTH ANALYTICS

Time: 10:05 a.m. – 10:55 a.m.

Lattices for Representing and Analyzing Organogenesis

Click here for PDF of presentation

Chimezie Ogbuji, Metacognition LLC, United States
Rong Xu, Case Western Reserve University, United States

A systems-based understanding of the molecular processes and interactions that drive the development of the human heart, and other organs, is an important component of improving the treatment of common congenital defects. In this work, we present an application of Formal Concept Analysis (FCA) on molecular interaction networks obtained from the integration of two central biomedical ontologies (the Gene Ontology and Foundational Model of Anatomy) and a subset of a cross-species anatomy ontology. We compare the formal concept lattice produced by our method against a cardiac developmental (CD) protein database to verify the biological significance of the structure of the resulting lattice. Our method provides a unique and unexplored computational framework for understanding the molecular processes underlying human anatomy development.
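To make the FCA machinery concrete, here is a minimal sketch of how formal concepts are derived from a binary context relating objects (here, genes) to attributes (here, developmental processes). The gene and process names are illustrative placeholders, not data from the presented work, and real implementations use efficient lattice-construction algorithms rather than this brute-force enumeration.

```python
from itertools import combinations

# Toy binary context: genes (objects) x developmental processes (attributes).
# Names are illustrative only, not taken from the paper's datasets.
context = {
    "GATA4": {"heart_development", "transcription"},
    "NKX2-5": {"heart_development", "transcription"},
    "TBX5": {"heart_development", "limb_development"},
}

objects = set(context)
attributes = set().union(*context.values())

def common_attrs(objs):
    """Attributes shared by every object in objs (the ' operator on objects)."""
    return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

def common_objs(attrs):
    """Objects possessing every attribute in attrs (the ' operator on attributes)."""
    return {o for o in objects if attrs <= context[o]}

# Brute-force enumeration of formal concepts: pairs (A, B) with A' = B and B' = A.
concepts = set()
for r in range(len(objects) + 1):
    for objs in combinations(sorted(objects), r):
        B = common_attrs(set(objs))   # intent: shared attributes
        A = common_objs(B)            # extent: closure of the object set
        concepts.add((frozenset(A), frozenset(B)))

# The concepts, ordered by extent size, form the nodes of the concept lattice.
for A, B in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(A), "<->", sorted(B))
```

Ordering these concepts by set inclusion of their extents yields the lattice structure that the authors compare against the cardiac developmental protein database.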

[Top] [Full Agenda]

...............................................................................................................

A Semantic-driven Surveillance Model to Enhance Population Health Decision-making

Click here for PDF of presentation

Anya Okhmatovskaia, McGill University, Canada
Arash Shaban-Nejad, McGill University, Canada
Maxime Lavigne, McGill University, Canada
David L. Buckeridge, McGill University, Canada

The Population Health Record (PopHR) is a web-based software infrastructure that retrieves and integrates heterogeneous data from multiple sources (administrative, clinical, and survey) in near-real time and supports intelligent analysis and visualization of these data to create a coherent portrait of population health through a variety of indicators. Focused on improving population health decision-making, PopHR addresses common flaws of existing web portals with similar functionality, such as the lack of structure in presenting available indicators, insufficient transparency of computational algorithms and underlying data sources, overly complicated user interface, and poor support for linking different indicators together to draw conclusions about population health. PopHR presents population health indicators in a meaningful way, generates results that are contextualized by public health knowledge, and interacts with the user through a simple and intuitive natural language interface.

...............................................................................................................

INTEROPERABILITY IN HEALTHCARE

Time: 2:05 p.m. – 2:55 p.m.

Towards Semantic Interoperability of the CDISC Foundational Standards

Click here for PDF of presentation

Scott Bahlavooni, Biogen Idec, United States
Geoff Low, Medidata Solutions, United Kingdom
Frederik Malfait, IMOS Consulting, Switzerland

The mission of the Clinical Data Interchange Standards Consortium (CDISC) is "to develop and support global, platform-independent data standards that enable information system interoperability to improve medical research and related areas of healthcare." The CDISC Foundational Standards support the clinical research life cycle from protocol development through analysis and reporting to regulatory agencies. Unfortunately, the majority of the Foundational Standards utilized by pharmaceutical, biologic, and device development companies are published in formats that are not machine-processable (e.g., Microsoft Excel, PDF). The PhUSE Computational Science Symposium Semantic Technology (ST) project, in consultation with CDISC, has been working on representing the CDISC Foundational Standards in RDF based on an ISO 11179-type metamodel. Draft RDF representations are available for five of the CDISC Foundational Standards - CDASH, SDTM, SEND, ADaM, and controlled terminology - and have been published on GitHub. In Q1 2014, these representations will undergo CDISC public review to facilitate their adoption as a CDISC standard and a standard representation of the CDISC Foundational Standards. Additional activities are ongoing to create RDF representations of conformance checks, the CDISC Protocol Representation Model (PRM), and the Study Design Model (SDS-XML). Further activities are planned around analysis metadata, linking to EHRs, and representing clinical trial data in RDF. The presentation will provide an overview of the different models, their relationships to one another, and how they can be used to manage data standards in an ISO 11179-type metadata registry. It explains how to represent the CDISC Foundational Standards in a machine-readable format that enables full semantic interoperability.
The presentation will also highlight how this work can facilitate a planned RDF export format for the CDISC SHARE environment, a metadata registry currently being developed by CDISC to develop and manage CDISC standards in a consistent way.


...............................................................................................................

Therapeutic Areas Ontology for Study Data

Eric Prud'Hommeaux, W3C, United States
Charles Mead, W3C, France
Sajjad Hussain, INSERM U1142, France

In December 2012, the FDA solicited advice on how to address major challenges in cross-study data integration. A significant part of the feedback they received was to leverage Semantic Web technologies to capture embedded context for machine processing. They combined this with an already existing effort to standardize a set of around 60 "Therapeutic Areas" (TAs) -- attributes commonly captured for submissions for a particular disease.

In the fall of 2013, the FDA asked IBM to provide ontologies for 12 TAs. Initial work on a Renal Transplantation TA has provided a shared ontology for all TAs, grounded in BRIDG, as well as a set of emerging practices for efficiently extracting expertise from Subject Matter Experts (SMEs). The resulting test-driven development approach could establish practice for the production of future FDA TAs and, ideally, study design for the long tail of trials not covered by current TAs.

Further, the development of such a corpus of ontologies and the trivial expression of submission data as RDF can effectively create a standard for clinical study metadata and data repositories, subsuming both CDISC SHARE and ODM-based repositories.


...............................................................................................................

LINKED DATA

Time: 3:20 p.m. – 4:35 p.m.

The RDF Platform of the European Bioinformatics Institute

Click here for PDF of presentation

Andrew M. Jenkinson, European Bioinformatics Institute, United Kingdom
Simon Jupp, European Bioinformatics Institute, United Kingdom
Jerven Bolleman, Swiss Institute of Bioinformatics, Switzerland
Marco Brandizi, European Bioinformatics Institute, United Kingdom
Mark Davies, European Bioinformatics Institute, United Kingdom
Leyla Garcia, European Bioinformatics Institute, United Kingdom
Anna Gaulton, European Bioinformatics Institute, United Kingdom
Sebastien Gehant, Swiss Institute of Bioinformatics, Switzerland
Camille Laibe, European Bioinformatics Institute, United Kingdom
James Malone, European Bioinformatics Institute, United Kingdom
Nicole Redaschi, Swiss Institute of Bioinformatics, Switzerland
Sarala M. Wimalaratne, European Bioinformatics Institute, United Kingdom
Maria Martin, European Bioinformatics Institute, United Kingdom
Helen Parkinson, European Bioinformatics Institute, United Kingdom
Ewan Birney, European Bioinformatics Institute, United Kingdom

The breadth and diversity of data available to support research in the life sciences is a valuable asset. Integrating these complex and disparate data does, however, present a challenge. The Resource Description Framework (RDF) technology stack offers a mechanism for storing, integrating, and querying across such data in a flexible and semantically accurate manner, and is increasingly being used in this domain. However, the technology is still maturing and has a steep learning curve. As a major provider of bioinformatics data and services, the European Bioinformatics Institute (EBI) is committed to making data readily accessible to the community in ways that meet existing demand. To support further adoption of RDF for molecular biology, the EBI RDF platform (https://www.ebi.ac.uk/rdf) has therefore been developed. In addition to coordinating RDF activities across the institute, the platform provides a new entry point for querying and exploring the integrated resources available at the EBI.


...............................................................................................................

PubChemRDF: Towards a semantic description of PubChem

Gang Fu, National Center for Biotechnology Information, United States
Bo Yu, National Center for Biotechnology Information, United States
Evan Bolton, National Center for Biotechnology Information, United States

PubChem is a community-driven chemical biology resource, hosted at the National Center for Biotechnology Information (NCBI), containing information about the biological activities of small molecules. With over 250 contributors, PubChem is a sizeable resource with over 125 million chemical substance descriptions, 48 million unique small molecules, and 220 million biological activity results. PubChem integrates this information not only with other NCBI internal resources (such as PubMed, Gene, Taxonomy, BioSystems, OMIM, and MeSH) but also with external resources (such as KEGG, DrugBank, and patent documents).

Semantic Web technologies are emerging as an increasingly important approach to distribute and integrate scientific data. These technologies include the Resource Description Framework (RDF). RDF is a family of World Wide Web Consortium (W3C) specifications used as a general method for concept description. The RDF data model can encode semantic descriptions in so-called triples (subject-predicate-object). For example, in the phrase “atorvastatin may treat hypercholesterolemia,” the subject is “atorvastatin,” the predicate is “may treat,” and the object is “hypercholesterolemia.”
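The triple model described above can be sketched with plain Python tuples. The CURIE-style names and the "vocab:may_treat" predicate below are illustrative shorthand, not PubChemRDF's actual vocabulary; a real triple store would use full IRIs and a SPARQL engine rather than this toy pattern matcher.

```python
# A minimal sketch of the RDF triple model using plain Python tuples.
# URIs are shortened to CURIE-style strings for readability; the vocabulary
# terms below are illustrative, not PubChemRDF's actual predicates.
triples = {
    ("compound:atorvastatin", "vocab:may_treat", "disease:hypercholesterolemia"),
    ("compound:atorvastatin", "rdf:type", "cheminf:drug"),
    ("disease:hypercholesterolemia", "rdfs:label", "hypercholesterolemia"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a (subject, predicate, object) pattern;
    None acts as a wildcard, like a variable in a SPARQL basic graph pattern."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

# "What may atorvastatin treat?"
print(match(s="compound:atorvastatin", p="vocab:may_treat"))
```

Because every statement shares this uniform subject-predicate-object shape, triples from independent sources can be merged into one graph and queried together, which is the integration property the PubChemRDF project exploits.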

The PubChemRDF project provides a great wealth of PubChem information in RDF format. One of the aims is to help researchers work with PubChem data on local computing resources using semantic web technologies. Another aim is to harness ontological frameworks to help facilitate PubChem data sharing, analysis, and integration with resources external to NCBI and across scientific domains.

This talk will give an overview of the recently released PubChemRDF project scope and some examples of its use.


...............................................................................................................

Evolving from Data-analytics to Data-applications: How Modular Software Environments Built on Semantic Linked-Data Enable a New Generation of Collaborative Life-Science Applications

David King, Exaptive Inc., United States
Robert McBurney, The Accelerated Cure Project, United States

This talk proposes a new approach to solving one of the paramount challenges of gaining valuable insight from complex data environments like those found in the life sciences – the integration not only of distributed and disparate data, but also of diverse, specialized, and often siloed scientists and their analytical tools. Traditional tightly-coupled architectures, relying on data-warehousing and monolithic software, are ill-equipped to handle a diverse and dynamic landscape of data, users, and tools. Fragmented data cannot be leveraged by equally fragmented software. A successful approach must dynamically link scientists and their data and facilitate their cross-disciplinary collaboration. This talk will explore the technical details of implementing architectures that accomplish this through linked-data and linked-visualization, and will show how this approach is being leveraged by The Accelerated Cure Project and the Orion Bionetworks Alliance in their efforts to lower the barriers to collaborative Multiple Sclerosis research. The combination of Object-Oriented and Flow-Based programming with Semantic Data standards and recent HTML5 Web advances creates the potential for a new breed of life-science analytics applications. Applications built from this style of combinatorial architecture achieve a loosely-coupled modularity that allows them to leverage the wide variety of components present in complex data systems instead of being hindered by it. By treating datasets more modularly, these applications give liquidity to otherwise frozen and siloed data. By treating algorithms and visualizations as data-agnostic modules, these applications lower the barriers to creative experimentation, a key component of hypothesis generation and insight.
By leveraging recent advances in HTML5 Web capabilities, these techniques can launch in web-based, highly cohesive exploratory interfaces that can be rapidly deployed, evolved, and redeployed to a distributed and dynamic workforce of collaborating scientists, analysts, and subject matter experts. The technical details of this talk will be balanced by two real-world use-cases: how it is being applied by The Accelerated Cure Project as a next-generation interface to their unique biosample repository, and how it is being applied by the Orion Bionetworks Alliance as a means to link and visualize data from multiple large-scale clinical studies. A goal of the talk will be to strike a balance between the macro and micro views of modular data-application development to provide attendees with two sets of useful takeaways. At the macro level, the audience will be exposed to a generic architecture for combinatorial interactive interface development that can inform some of their future design decisions. At the micro level, the audience will be introduced to some specific linked-data and visualization techniques - exploratory interfaces that are both thought-provoking and inspiring for the attendees’ future work.




Friday, February 28

SEMANTICS IN PHARMA

Time: 10:05 a.m. – 10:55 a.m.

The Open PHACTS Project: Progress and Future Sustainability

Click here for PDF of presentation

Tom Plasterer, AstraZeneca, United States
Lee Harland, Connected Discovery, United Kingdom
Bryn Williams-Jones, Connected Discovery, United Kingdom

The Open PHACTS project, an Innovative Medicines Initiative funded effort, has delivered a platform designed to reduce barriers to drug discovery and accelerate collaboration within and outside of biopharma. It has done so by building a community of life science and computational researchers around a use-case driven model, taking advantage of existing community efforts while pushing life science content providers towards the use and provisioning of linked data. The project is pursuing an ENSO (explore new scientific opportunities) call to expand the available content set and look at new distribution models. It is also looking to complete the project in the fall of 2014 and to turn these successes into a self-sustaining foundation, the Open PHACTS Foundation (OPF). We’ll provide a travelogue of this journey along with a look into some of the major contributions, from the Open PHACTS API to the exemplar applications built on top of this platform. From there a snapshot of the impact of Open PHACTS at AstraZeneca will be discussed, from the vantage point of characterising internal and external biochemistry assays using the BioAssay Ontology and Open PHACTS assay content.

...............................................................................................................

Safety Information Evaluation and Visual Exploration (“SIEVE”)


Click here for PDF of presentation

Suzanne Tracy, AstraZeneca, United States
Stephen Furlong, AstraZeneca, United States
Robert Stanley, IO Informatics, United States
Peter Bogetti, AstraZeneca, United States
Jason Eshleman, IO Informatics, United States
Michael Goodman, AstraZeneca, United States

AstraZeneca (“AZ”) Patient Safety Science wanted to improve retrieval of clinical trial data and biometric assessments across studies. Traditionally, evaluating clinical trials data across studies required manual intervention to deliver the desired datasets. A proposal titled Safety Information Evaluation and Visual Exploration (“SIEVE”) was sponsored by Patient Safety Science and took the form of a collaboration between AZ and IO Informatics (“IO”). AZ provided the project environment, data resources, subject matter expertise (“SME”), and business expertise. IO provided semantic software and data modeling and integration services, including solutions architecture, knowledge engineering, and software engineering. The project goal was to improve search and retrieval of clinical trials data. SIEVE was to provide a web-based environment suitable for cross-study analysis, aligned across the biomarkers, statistics, and bioinformatics groups. Over-arching goals included decision-making for biomarker qualification, trial design, concomitant medication analysis, and translational medicine. The team analyzed approximately 42,000 trials records, identified by unique subjectIDs. IO’s Knowledge Explorer software was used by IO’s knowledge engineers, in collaboration with AZ’s SMEs, to explore the content of these records as linked RDF networks. Reference metadata such as studyID, subjectID, rowID, gender, and DoB was central to assuring valid integration. Because almost all documents contained both a subjectID and a studyID, concatenating these items into a single identifier allowed connections that bridged multiple documents for data traversal. 36,000 records contained valid data, each including a unique trial, patient, and at least one row of laboratory data. IO created a semantic data model, or “application ontology,” to meet SIEVE requirements.
The resulting data model and instances were harmonized through SPARQL-based rules and inference and were aligned with AZ standards. Data was integrated under this ontology, loaded into a semantic database, and connected to IO’s “Web Query” software. The result is a web-based user interface accessible to end users for cross-study searching, reporting, charting, and sub-querying. Methods include “Quick Search” options, shared searches, and query building with nesting, inclusion/exclusion, ranges, etc. Advanced queries are presented as filters for user entry to search subjects (“views” or “facets”), including Clinical Assays, Therapy Areas, Adverse Events, and Subject Demographics. Reports include exporting, charting, hyperlink mapping, and results-list-based searches. Results include reduced time to evaluate data from clinical trials and to facilitate forward-looking decisions relevant to portfolios. Alternatives are less efficient: trial data could previously be evaluated within a study, but there was no method to evaluate trials data across studies without manual intervention. Semantic technologies applied to data description, manipulation, and linking provided mission-critical value. This was particularly apparent for integration and harmonization, in light of differences discovered across resources. IO’s Knowledge Explorer supported data visualization and manipulation and applied inference and SPARQL-based rules during RDF creation. This resulted in efficient data modeling, transformation, harmonization, and integration, and helped assure a successful project.
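The composite-identifier idea described above can be sketched as follows. The field names and example values are assumptions for illustration, not AZ's actual data model: the point is only that concatenating studyID and subjectID yields a key that links rows about the same trial participant across documents.

```python
# Sketch of the record-linking idea: concatenate studyID and subjectID into one
# composite identifier so rows from different documents about the same trial
# participant can be joined. Field names and values are illustrative assumptions.
demographics = [
    {"studyID": "D1234C001", "subjectID": "0042", "gender": "F"},
    {"studyID": "D1234C002", "subjectID": "0042", "gender": "M"},
]
labs = [
    {"studyID": "D1234C001", "subjectID": "0042", "assay": "ALT", "value": 31},
]

def composite_key(rec):
    # subjectIDs repeat across studies, so neither field is unique on its own
    return f"{rec['studyID']}|{rec['subjectID']}"

# Group every record under its composite key, bridging the two documents.
linked = {}
for rec in demographics + labs:
    linked.setdefault(composite_key(rec), []).append(rec)

for key, recs in sorted(linked.items()):
    print(key, "->", len(recs), "record(s)")
```

In the RDF setting, the composite key plays the role of a shared resource identifier, so triples derived from separate source documents attach to the same node in the integrated graph.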


...............................................................................................................

DISCOVERY INFRASTRUCTURE

Time: 2:05 p.m. – 4:00 p.m.

Scalable Ontological Query Processing over Semantically Integrated Life Science Datasets using MapReduce

Click here for PDF of presentation

Hyeongsik Kim, North Carolina State University, United States
Kemafor Anyanwu, North Carolina State University, United States

While addressing the challenges of join-intensive Semantic Web workloads has been a key research focus, processing disjunctive queries has not. Disjunctive queries arise frequently when querying heterogeneous datasets that have been integrated using rich and complex ontologies, as in biological data warehouses such as UniProt, Bio2RDF, and Chem2Bio2RDF. Here, the same or similar concepts may be modeled in different ways in the different datasets. Therefore, several disjunctions are often required in queries to describe the different related expressions relevant to a query, either included explicitly or generated as part of an inferencing-based query expansion process. Often, the width (#branches) of such disjunctive queries can be large, making them expensive to evaluate. They pose particularly significant challenges when cloud platforms based on MapReduce, such as Apache Hive and Pig, are used to scale up processing, translating to long execution workflows with large amounts of I/O, sorting, and network traffic costs. This paper presents an algebraic interpretation of such queries that produces query rewritings that are more amenable to efficient processing on MapReduce platforms than traditional relational query rewritings.
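A toy sketch of the underlying cost issue: under the traditional union rewriting, a disjunction over k class names becomes k conjunctive sub-queries, each scanning the data, whereas an algebraic rewriting can fold the disjunction into a single pass. The class names below are invented for illustration, and the authors' actual rewriting operates on MapReduce workflows rather than Python lists.

```python
# Why disjunctive queries are costly on MapReduce-style engines: a query with
# k disjuncts is typically rewritten as a union of k conjunctive sub-queries,
# each scanning the data. Predicate and class names here are invented examples.
triples = [
    ("p1", "rdf:type", "uniprot:Protein"),
    ("p2", "rdf:type", "bio2rdf:Polypeptide"),
    ("p3", "rdf:type", "uniprot:Enzyme"),
]

# The same concept modelled differently across integrated datasets leads to a
# disjunction over class names (written explicitly, or added by query expansion).
protein_classes = {"uniprot:Protein", "bio2rdf:Polypeptide"}

# Naive union rewriting: one pass over the data per disjunct.
union_result = []
passes = 0
for cls in protein_classes:
    passes += 1
    union_result += [s for (s, p, o) in triples if p == "rdf:type" and o == cls]

# Algebraic alternative: a single scan with a set-membership test.
single_pass = [s for (s, p, o) in triples if p == "rdf:type" and o in protein_classes]

print(sorted(union_result), sorted(single_pass), passes)
```

Both strategies return the same answers, but the union rewriting's per-branch scans translate on Hive or Pig into extra MapReduce jobs, with the attendant I/O, sorting, and shuffle costs the abstract describes.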


...............................................................................................................

Multi-Domain Collaboration for Web-Based Literature Browsing and Curation

David H. Mason, Concordia University, Canada
Marie-Jean Meurs, Concordia University, Canada
Erin McDonnell, Concordia University, Canada
Ingo Morgenstern, Concordia University, Canada
Carol Nyaga, Concordia University, Canada
Vahé Chahinian, Concordia University, Canada
Greg Butler, Concordia University, Canada
Adrian Tsang, Concordia University, Canada

We present Proxiris, a web-based tool developed at the Centre for Structural and Functional Genomics, Concordia University. Proxiris is an Open Source, easily extensible annotation system that supports teams of researchers in literature curation on the Web via a browser proxy. The most important Proxiris features are iterative annotation refinement using stored documents, Web scraping, strong search capabilities, and a team approach that includes specialized software agents. Proxiris is designed in a modular way, using a collection of Free and Open Source components best suited to each task.


..............................................................................................................

Improving Scientific Information Sharing by Fostering Reuse of Presentation Material

Bernadette Hyland, 3 Round Stones Inc., United States
David Wood, 3 Round Stones Inc., United States
Luke Ruth, 3 Round Stones Inc., United States
James Leigh, 3 Round Stones Inc., United States

Most scientific developments are recorded in published papers and communicated via presentations. Scientific findings are presented within organizations, at conferences, via webinars, and in other fora. Yet after delivery to an audience, important information is often left to wither on hard drives, in document management systems, and even on the Web. Accessing the data underlying scientific findings has been the Achilles' heel of researchers due to closed and proprietary systems. Because there is no comprehensive ecosystem for scientific findings, important research and discovery work is repeatedly performed and communicated. In an ideal world, published papers and presentations would be the start of an ongoing dialogue with peers and colleagues. By definition this dialogue spans geographical boundaries and therefore must involve a global network, international data exchange standards, a universal address scheme, and provision for open annotations. Security and privacy are key concerns in discussing and sharing scientific research. 3 Round Stones has created a platform-agnostic collaboration system relying on the architecture and standards of the Web, using the Open Source Callimachus Linked Data Management system. The collaboration system supports the full ecosystem for sharing and annotating scientific findings. It includes modern social network capabilities (e.g., bio, links to social media sites, contact information), Open Annotation support for associating distinct pieces of information, and metrics on content usage. Papers and presentations can be annotated and iterated using a familiar review cycle involving authors, editors, and peer reviewers. Proper attribution for content is handled automatically. The 3 Round Stones collaboration system has the potential to foster discovery, access, and reuse of scientific findings.


...............................................................................................................

***IO INFORMATICS BEST PAPER PRIZE***

GenomeSnip: Fragmenting the Genomic Wheel to Augment Discovery in Cancer Research

Click here for PDF of presentation

Maulik R. Kamdar, National University of Ireland, Ireland
Aftab Iqbal, National University of Ireland, Ireland
Muhammad Saleem, Universität Leipzig, Germany
Helena F. Deus, Foundation Medicine Inc., United States
Stefan Decker, National University of Ireland, Ireland

Cancer genomics researchers have greatly benefitted from high-throughput technologies for the characterization of genomic alterations in patients. These voluminous genomics datasets, when supplemented with the appropriate computational tools, have led to the identification of 'oncogenes' and cancer pathways. However, if researchers wish to exploit the datasets in conjunction with this extracted knowledge, their cognitive abilities need to be augmented through advanced visualizations. In this paper, we present GenomeSnip, a visual analytics platform that facilitates the intuitive exploration of the human genome and displays the relationships between different genomic features. Knowledge pertaining to the hierarchical categorization of the human genome, oncogenes, and abstract co-occurring relations has been retrieved from multiple data sources and transformed a priori. We demonstrate how cancer experts could use this platform to interactively isolate genes or relations of interest and perform comparative analyses on the 20.4 billion triples of the Linked Cancer Genome Atlas (TCGA) datasets.

