Conference on Semantics in Healthcare and Life Sciences (CSHALS)

Conference Hotel

Updated August 20, 2010

ROYAL SONESTA HOTEL BOSTON
40 Edwin Land Boulevard
Cambridge, MA, USA 02142
www.sonesta.com/Boston/

Delegates registered for CSHALS 2011 (Conference on Semantics in Healthcare and Life Sciences) may take advantage of the special conference rate of $130.00 plus taxes (see below) until February 1, 2011. Delegates may reserve their preferred accommodations in any of the following ways:

1) Call the Hotel directly at 617-806-4200 and ask for Reservations

2) Email the Hotel directly (email address protected)

3) Book on-line at: www.iscb.org/cshals2011reservations

ROYAL SONESTA CONFERENCE RATES AND INFORMATION
Group Rate: $130.00 single / $130.00 double occupancy

The group rate is for a standard guestroom type. Should you be interested in upgrading to our other accommodations, the following upgrade charges would apply:

Upgrade to a Deluxe Riverview room for an additional $50.00.

Upgrade to an Executive Suite for an additional $100.00.

The above rates are subject to 5.7% Massachusetts State tax, 6.0% Cambridge City tax, and 2.75% CCF tax, for a total tax of 14.45%. Please note that group rates may not be available before or after the dates contracted above. Upgrade requests are based on availability and upgrade fees are subject to change. For more than two guests, there will be a $25.00 charge for each additional person in the room. There is no charge for children 17 years or under when sharing a room with parents.
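
As a rough illustration of the arithmetic above, the following sketch adds the three quoted tax rates and applies them to the group rate. The function name `nightly_total` and the rounding rule (half-up to the nearest cent on the combined total) are assumptions made for this example; the hotel may round each tax separately.

```python
from decimal import Decimal, ROUND_HALF_UP

# Tax rates as quoted in the hotel information above.
TAX_RATES = {
    "Massachusetts State tax": Decimal("0.057"),
    "Cambridge City tax": Decimal("0.06"),
    "CCF tax": Decimal("0.0275"),
}


def nightly_total(room_rate: Decimal) -> Decimal:
    """Return the room rate plus all three taxes, rounded to cents.

    Assumption: taxes are summed first and the total is rounded half-up.
    """
    combined = sum(TAX_RATES.values(), Decimal("0"))  # 0.1445, i.e. 14.45%
    total = room_rate * (Decimal("1") + combined)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


group_rate = Decimal("130.00")
print(nightly_total(group_rate))  # 148.79
```

Under these assumptions, one night at the group rate comes to $148.79 including tax; upgrade charges and additional-person charges would be added to the room rate before tax is applied only if the hotel taxes them the same way, which the text above does not state.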

CANCELLATION POLICY FOR INDIVIDUAL RESERVATIONS
To avoid an individual guestroom cancellation charge equal to one night's room and tax, please notify the hotel of any cancellations by 6 p.m. on the day prior to arrival. Please be sure to obtain a cancellation number at the time of cancellation and retain this number for your files.

CHECK-IN/CHECK-OUT
Hotel check-in time is after 3:00 p.m. and check-out time is 12 noon. Anyone arriving earlier than 3:00 p.m. will be checked in as soon as rooms become available, but there may be a wait. Guests who check out after 12 noon on their departure date may incur a late check-out fee. The hotel is pleased to arrange baggage storage for attendees' luggage.

EARLY DEPARTURE FEES
An early departure fee of $50 will be assessed to any guest who departs earlier than the date confirmed at check-in. The fee will be automatically posted to the guest folio on or after departure. Emergencies and special circumstances will be reviewed on a case-by-case basis.

Full Agenda*

(*Schedule subject to change. Updated February 25, 2011)

Entries marked "Presentation slides (.pdf)" have presentation slides available.

Wednesday – February 23, 2011
7:30 am - 1:00 pm	Registration
8:30 am - 10:30 am	W3C Tutorial - Multi-stakeholder Perspectives on Translational Medicine Presenter: Eric Prud'hommeaux, W3C
10:30 am - 11:00 am	Break
11:00 am - 12:15 pm	RPI-led Hands-on Tutorial - Semantic Healthcare and Life Sciences Tutorial: Mashing HC and LS Data. RPI Tutorial Coordinator: Joanne Luciano. Presenters: Tim Lebo, Dominic DiFranzo, Jim McCusker
12:15 pm - 1:00 pm	Lunch
1:00 pm - 3:00 pm	RPI-led Hands-on Tutorial (continues) - Mashup Workflows
3:00 pm - 3:15 pm	Break
3:15 pm - 5:00 pm	RPI-led Hands-on Tutorial (continues) - Data Visualization
4:00 pm - 7:00 pm	Registration
4:00 pm - 5:00 pm	Poster (Author) Set-up
5:00 pm - 7:00 pm	Poster Reception

Thursday – February 24, 2011
7:30 am - 10:00 am	Registration
7:30 am - 8:30 am	Breakfast (continental)
9:00 am - 9:15 am	Welcome & Overview BJ Morrison McKay, ISCB & Ted Slater, Conference Chair
9:15 am - 10:00 am	Keynote 1 How to Argue for Semantics Toby Segaran, Metaweb Technologies Presentation slides (.pdf)
10:05 am - 10:30 am	Biologics, Compounds & Chemistry Presenter: Christopher Baker Presentation slides (.pdf)
10:30 am - 10:45 am	Coffee Break (sponsored)
10:45 am - 11:10 am	Biomolecular Semantics Presenter: James McCusker Presentation slides (.pdf)
11:15 am - 11:40 am	Biomolecular Semantics Presenter: Dexter Pratt
11:45 am - 12:10 pm	New & Innovative Presenter: Martin Romacker Presentation slides (.pdf)
12:15 pm - 12:25 pm	Tech Talk 1 RDF Browser for Pharma Discovery and Visual Query Building Presenter: Jan Aasman, Franz Inc.
12:30 pm - 12:40 pm	Tech Talk 2 SciVerse Platform Embraces Semantic Applications Presenters: Vishal Gupta, Elsevier and Ari Tuchman, Quantified Presentation slides (.pdf)
12:40 pm - 1:30 pm	Lunch
1:30 pm - 2:15 pm	Keynote 2 Computational Acceleration of Biomedical Discovery Lawrence Hunter, University of Colorado School of Medicine Presentation slides (.pdf)
2:20 pm - 2:45 pm	Safety, Efficacy & Outcomes Presenter: Sherri Matis-Mitchell Presentation slides (.pdf)
2:50 pm - 3:15 pm	Safety, Efficacy & Outcomes Presenter: Vicki Seyfert-Margolis
3:20 pm - 3:45 pm	Clinical Harmonization Presenter: Eric Neumann
3:45 pm - 4:00 pm	Break
4:00 pm - 4:25 pm	Genomics & Genetics Presenter: James Balhoff Presentation slides (.pdf)
4:30 pm - 4:55 pm	New & Innovative Presenter: Therese Vachon
5:00 pm - 5:45 pm	Keynote 3 Next-generation Architecture for caBIG Charles Mead, Center for Biomedical Informatics and Information Technology Presentation slides (.pdf)
5:45 pm - 5:50 pm	Daily Closing Remarks Ted Slater, Conference Chair

Friday – February 25, 2011
7:30 am - 10:00 am	Registration
7:30 am - 8:30 am	Breakfast (continental)
8:30 am - 8:45 am	Review - Previous Day Ted Slater, Conference Chair
8:45 am - 9:30 am	Keynote 4 Semantics for Computational Workflows: A Top Ten List Yolanda Gil, University of Southern California, Los Angeles Presentation slides (.pdf)
9:35 am - 10:00 am	Ontologies & Knowledge Bases Presenter: Mike Miller Presentation slides (.pdf)
10:05 am - 10:30 am	Ontologies & Knowledge Bases Presenter: Nigam Shah
10:30 am - 10:45 am	Break
10:45 am - 11:10 am	Emerging & Established Standards Presenter: Eric Prud'hommeaux
11:15 am - 11:40 am	Emerging & Established Standards Presenter: Dietrich Rebholz-Schuhmann
11:45 am - 12:10 pm	Emerging & Established Standards Presenter: John Madden Presentation slides (.pdf)
12:15 pm - 12:25 pm	Tech Talk 3 Analysis of Omics Data Using Reverse Causal Reasoning (RCR) in an Integrated Analysis Environment Presenter: Dexter Pratt, Selventa Presentation slides (.pdf)
12:30 pm - 12:40 pm	Tech Talk 4 A 'Killer App' for Semantic Technologies: Point-and-Click Data Integration Tools Make it Easy to Deliver Targeted Semantic Knowledge Bases Presenter: Chuck Rockey, IO Informatics
12:40 pm - 1:30 pm	Lunch
1:30 pm - 1:55 pm	Translational Medicine Presenter: Sudeshna Das
2:00 pm - 2:25 pm	Semantic Web and Pharma Presenter: Lakshmish Ramaswamy
2:30 pm - 2:55 pm	Semantic Web and Pharma Presenter: Chris Bouton
3:00 pm - 3:15 pm	Future Actions Ted Slater, Conference Chair
3:15 pm	Conference Adjourns

Discussion Questions

Conference participants will take part in discussion sessions that explore answers to technology questions such as the following:

Where are semantic applications having the biggest impact?

What support exists for using semantics for automated reasoning and agent technologies?

How well do semantic tools enable the visualization and utilization of information and knowledge?

What are the measurable benefits of Semantic Web standards, such as RDF, OWL, and SPARQL?

Themes of past CSHALS conferences:

Clinical Information Management

Discovery Information Integration

Integrated Healthcare and Semantics in Electronic Health Records

Translational Medicine / Safety

Search and Document Management/Business Intelligence/Text Mining

Text Mining/Information Extraction

Please check back for updated information as it becomes available.

Sponsor Opportunities

Gold: $20,000
	Three (3) complimentary conference registrations
	Three (3) sponsor/VIP dinner invitations
	One (1) Exhibitor Showcase Display space
	One (1) 10 minute tech-talk (scheduled by the organizers)
	Logo recognition with hyperlink on home page of conference web site
	Logo recognition during conference opening session
	Logo recognition on conference sponsor signage*
	Gold Sponsor recognition with organization logo and 100 word description in conference program
	Full-page black and white advertisement in conference program
	Option to provide company brochure/flyer for placement in delegate packet

Silver: $15,000
	Two (2) complimentary conference registrations
	Two (2) sponsor/VIP dinner invitations
	One (1) Exhibitor Showcase Display space
	One (1) 10 minute tech-talk (scheduled by the organizers)
	Logo recognition with hyperlink on sponsors page of conference web site
	Logo recognition during conference opening session
	Logo recognition on conference sponsor signage*
	Silver Sponsor recognition with organization logo and 50 word description in conference program
	Half-page black and white advertisement in conference program

Bronze: $10,000
	One (1) complimentary conference registration
	One (1) sponsor/VIP dinner invitation
	One (1) Exhibitor Showcase Display space
	One (1) 10 minute tech-talk (scheduled by the organizers)
	Company name recognition with hyperlink on sponsors page of conference web site
	Logo recognition during conference opening session
	Logo recognition on conference sponsor signage*
	Bronze Sponsor recognition with organization name and company URL in conference program
	Quarter-page black and white advertisement in conference program

Exhibitor Showcase
	Not for Profit Organization: $1500.00
	For Profit Organization: $2500.00
	**Add $1000.00 to include tech-talk presentation (see description below).

	CSHALS 2011 offers organizations an opportunity to showcase products and services as part of the conference exhibitor showcase. A limited number of spaces are available on a first come, first served basis. The exhibitor showcase includes the following:
	Conference registration for one representative
	Exhibit showcase space, with 6 ft table
	Please note the showcase is designed for pop-up displays approximately 8 ft wide, or a table-top exhibit.

Tech Talks
	Tech Talks are opportunities to showcase products and services to conference delegates. The cost for a Tech Talk is $1000. This does not include registration, and the presenter must be registered to attend the conference. Tech Talks are 10 minutes in length and are designed to allow organizations to create awareness of new technologies, services, etc., in a format that is not a hard sell. Requests for Tech Talks will be reviewed for approval by the Organizing Committee. Space is limited and it may not be possible to accept all Tech Talk requests.

To confirm your space contact:
	Steven Leard, Conferences Coordinator
	Phone: +1 780-414-1663

	* Logo recognition on conference sponsor signage will be proportionately sized for each sponsorship level, with Gold Sponsor logos being the largest and appearing at the top of the signage.

Organizing Committee

Updated January 11, 2011

Ted Slater, Merck & Co., Conference Chair

Jonas S. Almeida, University of Alabama at Birmingham, US

Mike Bevil, Merck & Co.

Lee Feigenbaum, Cambridge Semantics

Joanne Luciano, Rensselaer Polytechnic Institute, Tetherless World Constellation

Eric Neumann, Clinical Semantics Group

Logistical Organizers

Steven Leard, ISCB Conferences Director

BJ Morrison McKay, ISCB Executive Officer


Keynote Speakers

Updated February 25, 2011

	Dr. Yolanda Gil
	Associate Director for Research, Intelligent Systems Division, USC/ISI
	Research Professor of Computer Science
	Information Sciences Institute, University of Southern California, Los Angeles, USA

	Dr. Lawrence Hunter
	Director of the Computational Bioscience Program and of the Center for Computational Pharmacology
	University of Colorado School of Medicine, Aurora, USA
	Presentation slides (.pdf)

	Dr. Charles Mead
	National Cancer Institute, Center for Biomedical Informatics and Information Technology (CBIIT), Rockville, USA
	Presentation slides (.pdf)

	Toby Segaran
	Data Magnate, Metaweb Technologies, San Francisco, CA, USA
	Presentation slides (.pdf)

Tutorial (hands on)

Updated February 21, 2011

CSHALS will be preceded by a full-day Tutorial on current Semantic Web standards and RDF tools, showing participants how this technology meets drug development needs.

Note: You must be registered for the CSHALS 2011 conference to register for the full-day tutorial on Wednesday, February 23.

Multi-stakeholder Perspectives on Translational Medicine Tutorial
Presented by W3C

The W3C's Health Care and Life Sciences Interest Group (HCLS IG) has been working since 2005 with multiple communities and Semantic Web technologies towards goals such as immediate availability of scientific publications; improved synthesis between scientific findings; better patient recruitment for drug trials; and early redirection of non-promising clinical trials.

Today, there is a new convergence of communities in health care and life sciences. Pharmaceutical companies, clinical care providers and individual patients have intersecting interests in Translational Medicine. Pharmaceutical companies have a new interest in more detailed patient records rather than aggregate data because of the shift towards tailored therapeutics.

Consumers advocating for personally controlled health care records are a new audience interested in health care data. Clinicians, pharmaceutical companies and individuals will benefit from health care data that is easy to integrate with genomics, bioinformatics, cheminformatics and environmental data.

In this tutorial session, we will show you how W3C's HCLS IG is integrating data across these domains. We will demonstrate the use of commodity Semantic Web tools to ask valuable questions of a corpus of health care data, and show how this corpus draws on such systems as the Indivo EHR system, the I2B2 clinical information exchange protocols, and databases backing conventional clinical data stores.

Attendees will learn to use and customize these open source tools to meet their clinical or research needs.

Semantic Healthcare and Life Sciences Tutorial: Mashing HC and LS Data
Hands-on Tutorial Synopsis

Presented by RPI

**********************************
The tutorial team from RPI have provided instructions for your use prior to attending the Tutorial on Wednesday, Feb 23 at the Royal Sonesta Hotel.

Please take a moment to visit and review the instructions posted at:
http://sparql.tw.rpi.edu/?page_id=57

If you have questions or problems, please email presenters Dominic, Tim, or Jim, identified below.

RPI Tutorial Coordinator: Joanne Luciano

Presenters:
Tim Lebo
Dominic DiFranzo
Jim McCusker
**********************************

CSHALS has always been about the practical application of semantic technology to life sciences and pharmaceutical R&D. In keeping with that spirit, this hands-on tutorial will give participants practical experience using Semantic Web tools and technologies to develop mashups using data from the Linked Open Data cloud together with semantic data they create themselves from raw data. Participants will load data into a triple store, query it using SPARQL, use inference to expand the experimental knowledge, and build dynamic visualizations from their results. The tutorial will be loosely based on the book Programming the Semantic Web*, by Toby Segaran (one of our keynote speakers), Colin Evans, & Jamie Taylor, and will be led by Rensselaer Polytechnic Institute’s Tetherless World Constellation. Bring your laptops! It’s going to be fun.

(* All CSHALS participants will receive an eBook copy of Programming the Semantic Web, courtesy of O’Reilly Media.)

Tutorial Agenda

1. Part I – Semantic Data (11:00 a.m. - 12:15 p.m.)
Brief introduction to the basics of the Semantic Web, its principles, and its technologies. Here we will discuss what the Semantic Web is, why we need it, what technologies make up the Semantic Web, and how we can use them to link and query data.

1.1. Why do we need a Semantic Web?
1.2. Intro to basic Semantic Web technology
1.3. Just Enough RDF - Converting data to RDF
- a few technologies that can be used to convert data to RDF
1.4. Data Linking
1.5. SPARQL
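
For readers who want a feel for items 1.3 to 1.5 before the session, here is a minimal sketch in plain Python of what "data as triples" and "querying by pattern" mean. Everything in it is made up for illustration: the `ex:` names, the `TRIPLES` data, and the helper functions `match` and `query` are hypothetical and are not the tutorial's materials. A real triple store and SPARQL engine do far more (IRIs, literals with datatypes, optional patterns, filters), so treat this only as a toy model of the idea.

```python
# A toy in-memory "triple store": each fact is (subject, predicate, object).
TRIPLES = {
    ("ex:drugA", "rdf:type", "ex:Drug"),
    ("ex:drugA", "ex:targets", "ex:proteinX"),
    ("ex:drugB", "rdf:type", "ex:Drug"),
    ("ex:drugB", "ex:targets", "ex:proteinY"),
    ("ex:proteinX", "rdf:type", "ex:Protein"),
}


def match(pattern, triples=TRIPLES):
    """Yield variable bindings for one triple pattern.

    Terms that start with '?' are variables, as in SPARQL.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in one pattern must bind consistently.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            yield binding


def query(patterns, triples=TRIPLES):
    """Join several triple patterns, like a SPARQL basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Substitute variables already bound by earlier patterns.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(bound, triples):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return solutions


# Roughly: SELECT ?drug ?target
#          WHERE { ?drug rdf:type ex:Drug . ?drug ex:targets ?target }
results = query([
    ("?drug", "rdf:type", "ex:Drug"),
    ("?drug", "ex:targets", "?target"),
])
for row in sorted(results, key=lambda r: r["?drug"]):
    print(row["?drug"], "targets", row["?target"])
```

The second pattern reuses `?drug`, so only targets of things already known to be drugs are returned; this shared-variable join is the core of how a SPARQL query links facts together.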

2. Part II - Mashup Workflows (1:00 p.m. - 3:00 p.m.)
In this section we will discuss the workflow of building a mashup using semantic data. We will discuss the iterative process of discovering, exploring, linking our data, and publishing on the web. The aim is to learn the semantics and syntax of the data, and how to bring them together.

2.1. Data identification – what data sets to utilize
2.2. Data understanding – the semantics and syntax of your data
2.3. Data linking – how to link together your data

3. Part III - Data Visualization (3:15 p.m. - 5:00 p.m.)
Once we’ve gone through the data and discovered what’s inside it and how to link it to other datasets, we can begin to query the data and visualize the results. This allows us to see new patterns and correlations that weren’t visible before. We can also use our visualization to communicate these patterns and ideas to others. We will also cover some popular visualization APIs.

3.1. Uses
3.1.1. Data discovery - exploring your data through visualization. See new patterns that emerge from visually exploring your data
3.1.2. Data communication - communicate the story that is within your data, and see more effective ways to visually present your data for better communication
3.2. Visualization APIs
3.2.1. Google Visualization API
3.2.2. MIT Simile Exhibit

Reading List and References (as of November 3, 2010)

"What is RDF and what is it good for?" by Joshua Tauberer. www.rdfabout.com/intro/
"Quick Intro to RDF" by Joshua Tauberer. www.rdfabout.com/quickintro.xpd
RDF Primer, Turtle version. www.w3.org/2007/02/turtle/primer/
Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL, by Dean Allemang and James Hendler.
Ontology Development 101: A Guide to Creating Your First Ontology, by Natalya F. Noy and Deborah L. McGuinness. Stanford Knowledge Systems Laboratory Technical Report KSL-01-05 and Stanford Medical Informatics Technical Report SMI-2001-0880, March 2001.
Semantic Web Programming, by John Hebeler et al., Wiley, 2009.
Foundations of Semantic Web Technologies, by Hitzler et al., Chapman & Hall/CRC, 2009.

Presenters

Updated February 7, 2011

Semantic Infrastructure for Automated Small Molecule Classification and Data Mining for Lipidomics

Christopher Baker
University of New Brunswick
Saint John, CA

Presentation Abstract:

Background: The development of high-throughput experimentation and combinatorial chemistry has led to astronomical growth in the number of biologically relevant lipids and lipid derivatives identified, screened, and deposited in numerous online databases. At the same time, efforts to annotate, classify, analyze, and link these chemical entities to disparate data sources have largely remained in the hands of human curators using manual or semi-automated protocols. Since a chemical's function is often closely linked to its structure and, concomitantly, its position within a chemical ontology, the accurate classification and annotation of chemical entities is of primary importance in understanding their functionality as well as the full spectrum of potential applications. Unfortunately, neither the expressivity of formal ontologies nor the potential of Semantic Web Technologies (SWT) to integrate disparate computational services has been fully exploited within the lipidomics and metabolomics communities.

Results: As part of a case study in the utility of SWT for chemical classification, we have developed a prototype framework for automated lipid classification and annotation. This framework comprises the following components. First, a formal lipid ontology developed in OWL-DL, which is based in part on the lipid class hierarchy from the LIPIDMAPS database and relevant literature. The Lipid Ontology [ICBO2009] relies on structural features of small molecules to formally describe lipid classes. Second, a set of federated Semantic Web services deployed within the SADI framework, which is used to invoke the automated logical classification task. The first service, a structural annotation service, detects and enumerates relevant chemical subgraphs on a given input chemical graph. The second service, a classifier service, assigns chemical entities to appropriate ontology classes by reasoning over the class descriptions in the ontology and checking them against the set of chemical subgroups provided by the structural annotation service. We illustrate the utility of these core services using the use case of Eicosanoid classification and combine them with additional SADI services linking the annotated lipids to related proteins found in the biomedical literature or within the public databases. Using these services we further contrast the performance of automated Eicosanoid classification with the existing lipid nomenclature systems and curated lipid databases and reflect on the contribution of our methodology in the context of high-throughput lipidomics.

Conclusions: The prototype semantic web service framework we have developed is capable of accurate automatic classification of lipids and integration of information on given chemical entities from relevant databases. The services we provide within this framework can also be reused within other contexts and adapted to diverse lipidomics computational workflows. We conclude that SWT can provide an accurate and versatile means of classification and annotation of chemical entities.
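
The two-service pipeline described in this abstract (annotate structural features, then classify against class descriptions) can be pictured with a small sketch. This is not the authors' SADI services or the Lipid Ontology: the class names, the required-feature sets in `CLASS_DEFINITIONS`, and the functions `annotate_structure` and `classify` are hypothetical simplifications, and real OWL-DL reasoning is far richer than the subset test used here.

```python
# Hypothetical class descriptions: each class lists the structural
# features a molecule must have to belong to it.
CLASS_DEFINITIONS = {
    "Lipid": {"hydrocarbon_chain"},
    "FattyAcid": {"hydrocarbon_chain", "carboxyl_group"},
    "HydroxyFattyAcid": {"hydrocarbon_chain", "carboxyl_group", "hydroxyl_group"},
}


def annotate_structure(molecule: dict) -> set:
    """Stand-in for a structural annotation service.

    A real service would detect subgraphs in a chemical graph; here the
    features are simply read from the input record.
    """
    return set(molecule["features"])


def classify(features: set) -> list:
    """Stand-in for a classifier service.

    Returns every class whose required features are all present,
    most specific (most requirements) first.
    """
    matches = [
        name
        for name, required in CLASS_DEFINITIONS.items()
        if required <= features  # subset test: all requirements satisfied
    ]
    return sorted(matches, key=lambda name: len(CLASS_DEFINITIONS[name]), reverse=True)


molecule = {
    "name": "example_molecule",
    "features": ["hydrocarbon_chain", "carboxyl_group"],
}
print(classify(annotate_structure(molecule)))  # ['FattyAcid', 'Lipid']
```

Keeping annotation and classification as separate steps mirrors the abstract's point that such services can be reused independently in other workflows.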

The Phenoscape Knowledgebase: Linking Evolutionary Diversity to Genetic Data Using Phenotype Ontologies

James Balhoff
National Evolutionary Synthesis Center
Durham, US

Presentation Abstract:

Objectives and motivation: Phenotypic differences among species have long been systematically itemized and described by biologists in the process of investigating phylogenetic relationships and trait evolution. Traditionally, these descriptions have been expressed in natural language within the context of individual journal publications or monographs. As such, this rich store of phenotype data has been largely unavailable for statistical and computational comparisons across studies or integration with other biological knowledge. We have created the Phenoscape Knowledgebase, which consists of a database and web application (http://kb.phenoscape.org/). The database combines ontologically annotated phenotypic character data for a large and diverse group of fishes with phenotypic annotations from the ZFIN model organism database. The web application provides query and browsing interfaces which allow users to exploit the logical framework provided by the ontologies which underpin the data.

Method: We used OBD ("Ontology-based Database") to store phenotypic data, from ~50 phylogenetic publications, as statements using terms from ten different OBO ontologies. The phenotypic data, taxa, and specimens in these published data sets were annotated with ontology terms using our curation application, Phenex. In this process free-text phenotype descriptions were converted to semantic representations using an Entity-Quality (EQ) model, combining terms from separate anatomical and qualitative ontologies. The ontologies and annotated data sets, along with EQ phenotype annotations for zebrafish genes, exported from the ZFIN database, were loaded into OBD using its own triple-based schema. We used the SQL-based OBD reasoner to pre-compute inferred statements and add them to the Knowledgebase. We developed a web services API providing access to the Knowledgebase using the Restlet Java framework. We also developed a Ruby on Rails-based end-user web interface, which allows biologists to query the Knowledgebase, accessing the data via these public web services.

Results: The Phenoscape Knowledgebase integrates over 500,000 asserted phenotype statements, concerning ~2500 fish species, with over 20,000 phenotype statements linked to over 3700 zebrafish genes. Users can discover fish species matching arbitrary phenotypic profiles, which can be expressed as queries making use of the hierarchical nature of anatomical, qualitative, and taxonomic ontologies. Moreover, genes influencing these phenotypes can be simultaneously returned. At the same time users can visualize the structure and explore term definitions of the included ontologies. The Knowledgebase has been used to investigate patterns of anatomical coverage within published phylogenetic characters, as well as to generate hypotheses for candidate genes underlying evolutionary losses of both scales and skeletal elements.

Conclusion: Ontological annotations of free-text phenotypic data, built with shared community-driven ontologies, constitute a powerful resource when aggregated within a database system which makes full use of the semantic framework provided by those ontologies. For the first time, scientists can search phenotypic content from dozens of phylogenetic publications, querying across anatomical, qualitative, and taxonomic axes.
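
To make the Entity-Quality (EQ) idea and the "hierarchical" queries mentioned above more concrete, here is a minimal sketch. It is not the Phenoscape or OBD implementation: the `IS_A` links, the `ANNOTATIONS` records, the species labels, and the functions `ancestors` and `find_taxa` are all invented for illustration, and the hierarchy is assumed to give each term at most one parent.

```python
# Hypothetical is_a links (child -> parent) covering both anatomy terms
# and quality terms. Assumption: each term has at most one parent.
IS_A = {
    "pectoral fin": "paired fin",
    "pelvic fin": "paired fin",
    "paired fin": "fin",
    "elongated": "shape",
}

# Hypothetical EQ annotations: (taxon, entity, quality).
ANNOTATIONS = [
    ("species A", "pectoral fin", "elongated"),
    ("species B", "pelvic fin", "absent"),
    ("species C", "scale", "absent"),
]


def ancestors(term: str) -> list:
    """Return the term itself followed by all of its is_a ancestors."""
    found = [term]
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def find_taxa(entity: str, quality: str) -> list:
    """Taxa with an annotation whose entity and quality are each equal to,
    or a descendant of, the requested terms."""
    return [
        taxon
        for taxon, ann_entity, ann_quality in ANNOTATIONS
        if entity in ancestors(ann_entity) and quality in ancestors(ann_quality)
    ]


print(find_taxa("paired fin", "absent"))  # ['species B']
print(find_taxa("fin", "shape"))          # ['species A']
```

A query for the general term "paired fin" finds an annotation made with the more specific "pelvic fin"; precomputing such inferred statements, as the abstract describes, trades storage for faster queries.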

Identifying Unexpected Associations in Integrated Biomedical Data Sets: Novel Navigation, Analysis & Visualization Interaction Patterns for Semantic TripleStores

Christopher Bouton
Entagen, LLC
Newburyport, US

Presentation Abstract: A promise of semantic technologies is the facile integration of large quantities of disparate data. In the biomedical research and development (R&D) sector this type of integration is essential for the potential identification of connections across entity domains (e.g. compounds to targets, targets to indications, pathways to indications). However, the vast majority of data currently utilized in biomedical R&D settings is not integrated in ways which make it possible for researchers to intuitively navigate, analyze and visualize these types of interconnections. Using the Linking Open Drug Data (http://esw.w3.org/HCLSIG/LODD) data sets, we have been experimenting with novel forms of biomedical data integration, navigation, analysis & visualization through the development of a web-based, rich-internet application (RIA). An essential goal of this work is the creation of user interface paradigms which enable "bench" researchers to intuitively identify unexpected associations which may drive their research forward through the iterative process of effective hypothesis generation and subsequent testing.

Semantic Repository of Genomics Experiments

Sudeshna Das
Mass. General Hospital, Harvard Medical School
Cambridge, US

Presentation Abstract:

Objectives: Genome-wide experiments are routinely conducted to study gene expression, DNA-protein binding and epigenetic status. The importance of structured metadata for the integration and reuse of these experiments is widely recognized. For this purpose, the MIAME standard was first developed for microarrays, and more recently the ISA-TAB format was published as a generalized format for experiments employing omics technologies. Several MIAME-compliant repositories exist for genomics data, notably ArrayExpress and GEO. However, these are not yet widely available as Linked Data compliant with standard biomedical ontologies such as the MGED Ontology (MO), the Ontology for Biomedical Investigations (OBI) and the Experimental Factor Ontology (EFO). Researchers need friendly, useful and reusable software environments that can automatically produce such Linked Data.

Method: We have developed reusable software to build semantic repositories of genomics experiments. Our software is based on the open source content-management system Drupal (www.drupal.org). The primary content type is an experiment, which has a title, researcher and design details, and comprises one or more bioassays. The experiment can be linked to publication(s). Bioassays are processes that have biomaterials and technologies as participants and data files as output. The main classes are mapped to MO & OBI. Biomaterials have various characteristics such as organism, disease state and cell types. These characteristics are mapped to existing published biomedical concepts. The data is entered in a structured format, thus eliminating the need for future curation. We then use RDF modules in Drupal to produce Linked Data & a SPARQL endpoint.

Results: We have developed two repositories using this software in separate domains. One of them is a repository for hematopoietic stem cell data (http://bloodprogram.hsci.harvard.edu). It contains over 100 microarray, transcription factor binding and histone modification experiments. The majority of the data is from microarray experiments performed on model organisms (mouse and zebrafish) and encompasses various cell types and disease states. The cell types were mapped to the Cell Type (CL) Ontology. The other repository comprises microarray profiles from Parkinson’s disease patients (http://pdexpression.org/). The disease subtypes and tissue of subjects were mapped to standard terms with the help of the NCBO (National Center for Biomedical Ontology) annotation tool. The use of standard terminologies to describe the biomaterials allows interoperability with other repositories. However, finding the most appropriate mappings still remains a challenge. For example, when mapping the cell types, there were quite a few missing entries, whereas “Parkinson’s Disease” was found in over 20 systems. Addressing these issues is as much a social process as a technological one.

Conclusions: The main benefit of our software is the ability to create Linked Data in a synchronous manner that eliminates the need for later curation. For each domain we can deploy an instance of the software that is pre-populated with relevant terms (mapped to existing terminologies) from that field. As more communities begin to adopt such reusable infrastructure and make Linked Data available, we will begin to address the integration challenge that is currently posed to biomedical researchers.
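
The data model in this abstract (an experiment made up of bioassays, with characteristics mapped to standard terms) can be sketched as a small serializer that emits N-Triples. This is not the authors' Drupal software: the `example.org` namespaces, the property names, the `CELL_TYPE_TERMS` mapping, and the classes and functions below are placeholders invented for this sketch and are not MO, OBI, or CL identifiers.

```python
from dataclasses import dataclass, field

BASE = "http://example.org/repo/"    # hypothetical repository namespace
VOCAB = "http://example.org/vocab#"  # placeholder vocabulary, not MO or OBI

# Hypothetical pre-populated mapping from free-text labels to term IRIs.
CELL_TYPE_TERMS = {
    "hematopoietic stem cell": "http://example.org/terms/hematopoietic-stem-cell",
}


@dataclass
class Bioassay:
    ident: str
    organism: str
    cell_type: str


@dataclass
class Experiment:
    ident: str
    title: str
    bioassays: list = field(default_factory=list)


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal (escaping backslash and quote)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(exp: Experiment) -> list:
    """Serialize an experiment and its bioassays as N-Triples lines."""
    subject = f"<{BASE}experiment/{exp.ident}>"
    lines = [f"{subject} <{VOCAB}title> {literal(exp.title)} ."]
    for assay in exp.bioassays:
        node = f"<{BASE}bioassay/{assay.ident}>"
        lines.append(f"{subject} <{VOCAB}hasBioassay> {node} .")
        lines.append(f"{node} <{VOCAB}organism> {literal(assay.organism)} .")
        term = CELL_TYPE_TERMS.get(assay.cell_type)
        # Use the mapped term when one exists; otherwise fall back to plain text.
        cell = f"<{term}>" if term else literal(assay.cell_type)
        lines.append(f"{node} <{VOCAB}cellType> {cell} .")
    return lines


experiment = Experiment(
    "exp1",
    "Example microarray study",
    [Bioassay("assay1", "mouse", "hematopoietic stem cell")],
)
for line in to_ntriples(experiment):
    print(line)
```

The fallback to a plain-text literal when no term is found corresponds to the "missing entries" problem noted in the Results: unmapped values can still be published, but they do not link to anything outside the repository.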

Rendering Medical Documents in RDF: Strategies and Gotchas

John F. Madden
Duke University, Durham, US

Presentation Abstract: Clinical medical records consist of documents such as laboratory reports, physician's progress notes, admission summaries, etc. They often contain a mixture of full-sentence, natural language text and "bullet-point" or form-like content.

Non-explicit knowledge (the document’s purpose, genre, temporal context, author’s background knowledge, etc.) as well as references to external assertions found in other documents heavily condition the meaning of such documents. Rendering the content of such a document in RDF is a complex act of interpretation, akin to translation. There is no single "correct" RDF rendering.

We will examine sample medical documents and study some possible renderings into RDF/OWL, with the purpose of highlighting common challenges including the following:

dealing with anaphora, i.e., candidate triples whose appropriate subject is ambiguous or multiple ("sodium 142 mM": Whose sodium? The patient's? The patient’s serum? The sample of the patient’s serum delivered to the laboratory? etc.)
instances versus classes, especially when using legacy vocabularies ("Jim has influenza": Does Jim have SNOMED-influenza, or does he have an instance of SNOMED-influenza?)
dealing with references to assertions in other documents ("My colleague Dr. Smith diagnosed pneumonia last week": Is pneumonia the relevant fact, or is the diagnosis of pneumonia the relevant fact? How do I represent the difference?)
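
The instance-versus-class question above can be made concrete with a small sketch. The Python below models triples as plain tuples rather than using an RDF library. All names (`ex:Jim`, `ex:hasCondition`, `snomed:Influenza`, `ex:episode1`) are placeholders invented for illustration; they are not real SNOMED identifiers and are not taken from the talk.

```python
# Two candidate renderings of "Jim has influenza" as (subject, predicate, object) triples.
# Every identifier here is an illustrative placeholder, not a real vocabulary term.

RDF_TYPE = "rdf:type"


def class_rendering():
    """Jim is linked directly to the vocabulary class."""
    return [("ex:Jim", "ex:hasCondition", "snomed:Influenza")]


def instance_rendering():
    """Jim is linked to one particular illness episode, which is typed by the class."""
    return [
        ("ex:Jim", "ex:hasCondition", "ex:episode1"),
        ("ex:episode1", RDF_TYPE, "snomed:Influenza"),
    ]


def conditions_of(triples, person):
    """Return the condition classes for a person under either rendering."""
    found = set()
    for s, p, o in triples:
        if s == person and p == "ex:hasCondition":
            # If the object is itself typed, report its types; otherwise treat
            # the object as the class.
            types = {o2 for s2, p2, o2 in triples if s2 == o and p2 == RDF_TYPE}
            found |= types or {o}
    return found
```

Both renderings answer "what does Jim have?" identically, but only the instance rendering leaves room to attach an onset date or a diagnosing physician to the individual episode.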

Advancing Regulatory Science for Public Health – An FDA Perspective

Vicki Seyfert-Margolis
US Food and Drug Administration

Presentation Abstract: For breakthroughs in science and technology to reach their full potential, FDA must play an increasingly integral role as an agency dedicated not just to ensuring safe and effective products, but also to promoting public health and participating more actively in the scientific research enterprise directed towards new treatments and interventions. We must also modernize our evaluation and approval processes to ensure that innovative products reach the patients who need them, when they need them. These new scientific tools, technologies, and approaches form the bridge to critical 21st century advances in public health. They form what we call regulatory science: the science of developing new tools, standards and approaches to assess the safety, efficacy, quality and performance of FDA-regulated products.

NoSQL: New Possibilities for Distributed Scientific Data Management, Workflow and Collaboration

Mike Miller
Assistant Research Professor, U. Washington
Founder/Chief Scientist, Cloudant Inc.
Boston, USA

Presentation Abstract: Inspired by new problems (exploding sensor data, complex workflows, geo-distribution, etc.), there has been a dramatic renaissance of alternatives to classic relational database management systems. We briefly review these “NoSQL” implementations including key/value stores, big tables, document stores and graph stores. Next we focus on specific qualities that enable new possibilities for scientific data management, processing and analysis, in particular: flexibility, scalability, expressiveness, REST interfaces, concurrency, replication and cloud hosting. Finally, we discuss relevant applications in physical and biological sciences.

PharmaConnect: Development of an Integrated Knowledge Platform by Extracting, Integrating and Analyzing Information to Support Systematic, Evidence Based Decision Making in R&D

Sherri Matis-Mitchell
AstraZeneca Pharmaceuticals
Wilmington, US

Presentation Abstract: The Knowledge Engineering initiative within AstraZeneca has recently delivered the first version of a knowledgebase that integrates internal and external evidence for connections between key concepts such as targets, pathways, compounds, diseases, and preclinical and clinical outcomes from the Chemistry, Competitive, Disease and Safety Intelligence workstreams. This talk will describe the system, its architecture, and its development; demonstrate the impact of this new platform with specific examples; and discuss lessons learned during its development. We will also detail linkages to additional data sources and systems as well as plans for the future.

Conceptual Interoperability and Biomedical Data

James McCusker
Rensselaer Polytechnic Institute
Troy, US

Presentation Abstract: Computable semantic interoperability among domain models in biomedicine, as well as interoperability with cross-cutting models, has become a major concern in biomedical research. The National Cancer Institute Center for Biomedical Informatics and Information Technology (NCI CBIIT) has begun the next phase of developing caBIG semantic interoperability through the adoption of layered semantics and data models. We discuss a possible mapping between conceptual and logical models. This mapping technique leverages OWL annotation capabilities paired with SKOS representations of existing biomedical ontologies. We show how this technique might provide interoperability among domain and cross-cutting models in caBIG and in other semantic environments. We demonstrate three capabilities that this mapping provides: conversion between domain models and cross-cutting models, conversion between domain models, and domain model-agnostic queries across multiple models. We discuss the application of this technique to the existing caBIG semantics, the proposed caBIG semantics, and to interoperability of biomedical data through the proposed translational research provenance vision.

Semantic Analysis and Visualization of Clinical Data

Eric Neumann
Clinical Semantics
Bedford, US

Presentation Abstract: Biomedical data generation is continuously growing both in terms of size and complexity. Clinical Study data is complicated by the fact that new forms of associated data are continuously created as technologies emerge, including biomarkers, pathway (mechanistic) knowledge, assay platforms, and model systems. W3C semantic standards such as RDF and OWL have been around for several years, but most informatics specialists are unsure where they can be applied effectively. Semantically Linked Data (SLD) can significantly change the organization and re-use of data without requiring a concomitant investment in data systems. SLD is especially fine-tuned for handling information extracted from literature, and relating it to structured data, even if they exist in other data systems.

BEL (Biological Expression Language): Using Causal Relationships to Represent Scientific Findings in Molecular Biology in Support of Applications

Dexter Pratt
Selventa
Cambridge, US

Presentation Abstract: The intent of scientific publication is to share knowledge. To do this effectively, scientific documents should be accessible to semantically enabled applications, with critical information encoded in a computationally accessible knowledge representation. This presentation describes the knowledge representation language BEL (Biological Expression Language), a language designed to pragmatically represent scientific findings in molecular biology as causal relationships. BEL was designed to capture knowledge about biological scientific findings as well as their contexts in a user friendly, intuitive way. Findings can be encoded via the representation of experimentally demonstrated causal relationships which are further annotated with information describing biological context, experimental methodology, literature source and curation process. Biological models appropriate for a given analysis or application can be created in a knowledge assembly process in which BEL-encoded findings are integrated, selected, transformed and augmented by inference. Knowledge can be selected based on provenance and biological context information associated with each finding, enabling a strategy where knowledge capture can be well separated from the design of useful models. Each relationship in a BEL-derived model can be justified by reference to its supporting findings. BEL closely links the represented knowledge to measurable quantities by focusing the ontology on terms denoting abundances and activities of entities at the molecular scale, facilitating the use of BEL-derived models in the interpretation of experimental data sets. BEL terms can be defined by reference to external vocabularies or ontologies, thereby supporting the integration of knowledge from multiple sources. 
Following eight years of development and proprietary use, BEL has proven to be an intuitive and effective language for scientists, supporting the creation of a large knowledgebase used in the interpretation of 'omics data sets via causal relationship-based analytics. BEL and supporting tools are now being made publicly available to the research community through the introduction of the BEL Web Portal™. The BEL Web Portal™ provides public access to BEL language specifications, documentation, knowledge representation examples, and BEL software tools.

Using SWObjects to Create and Query RDF Views

Eric Prud'hommeaux
W3C
Cambridge, US

Presentation Abstract: SPARQL CONSTRUCTs, RIF and other rule forms allow us to trivially tailor views of RDF data from sources like Turtle files, GRDDL'd XML documents, RDF databases, or conventional relational databases. Views over databases are especially practical if they can be virtual, that is, SPARQL queries over the virtual graph are mechanically transformed into SPARQL queries for RDF databases or SQL queries for conventional relational data (e.g. an Employees table and an Address table).

This talk will discuss the utility of such an architecture, including efficient access to RDBs and pipelines of transformation services supported by parties other than the custodians of the final data resources. Real-world examples will include using the SWObjects toolbox to view Gene Ontology as BioPAX and to ask questions which unify across UniProt and GO databases.
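
The idea of a virtual RDF view over relational data can be sketched in a few lines. The Python below is not SWObjects and does not use its API; it is a toy illustration, using the standard-library `sqlite3` module, of how a single triple pattern might be rewritten mechanically into SQL. The column names, the `ex:` predicates and the sample rows are all assumptions; the abstract names only an Employees table and an Address table.

```python
import sqlite3

# Hypothetical mapping: each RDF predicate is defined by SQL yielding (s, o) pairs.
# No triples are materialized; the "graph" exists only as these definitions.
PREDICATE_SQL = {
    "ex:name": "SELECT 'ex:employee/' || id AS s, name AS o FROM Employees",
    "ex:city": (
        "SELECT 'ex:employee/' || e.id AS s, a.city AS o "
        "FROM Employees e JOIN Address a ON e.address_id = a.id"
    ),
}


def build_demo_db():
    """Create an in-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Address (id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT, address_id INTEGER);
        INSERT INTO Address VALUES (1, 'Cambridge');
        INSERT INTO Address VALUES (2, 'Boston');
        INSERT INTO Employees VALUES (1, 'Ada', 1);
        INSERT INTO Employees VALUES (2, 'Bob', 2);
        """
    )
    return conn


def match(conn, predicate, obj=None):
    """Answer the triple pattern (?s, predicate, obj) by rewriting it to SQL."""
    sql = f"SELECT s, o FROM ({PREDICATE_SQL[predicate]})"
    params = ()
    if obj is not None:
        # A bound object becomes a WHERE clause on the view's object column.
        sql += " WHERE o = ?"
        params = (obj,)
    sql += " ORDER BY s"
    return [(s, predicate, o) for s, o in conn.execute(sql, params)]
```

Calling `match(build_demo_db(), "ex:city", "Cambridge")` runs a join over the two tables and returns the matching employee as a triple, without any RDF copy of the data being stored.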

Semantics-enabled Proactive and Targeted Dissemination of New Medical Knowledge

Lakshmish Ramaswamy
University of Georgia
Athens, US

Presentation Abstract: The body of knowledge in the field of medicine is expanding at a tremendous pace. The number of citations in MEDLINE grew by more than 700,000 in 2009 and it is expected to grow by 1 million this year. This includes discovery of new drugs, previously unknown reactions to existing drugs, and new treatments. Some of the discoveries are so important that they have to reach the end-practitioners quickly so that they can act upon new knowledge, possibly by altering the course of treatment of relevant patients. Typically, medical knowledge dissemination occurs through channels such as conferences, medical journals, and memos. In the past decade, the Web has, to some extent, revolutionized medical knowledge dissemination by providing advanced search capabilities. However, this mode of knowledge dissemination is passive, and it has significant limitations. First, it requires doctors to periodically search the online databases, which places an additional burden on them. Second, the time lag for a doctor to become aware of a research finding depends upon how often she searches the online databases. Third, even after a doctor becomes aware of certain medical information, it takes additional time for her to search through the patient records to identify the patients to whom the information is relevant. These limitations highlight the need for a proactive medical information dissemination paradigm. Our vision is to design and develop a semantics-enabled framework for proactive and targeted dissemination of new medical knowledge. We believe that such a system has to achieve two major design goals. First, in order to prevent information overload, information dissemination has to be targeted in the sense that a doctor should receive alerts about new discoveries only if the information is likely to be relevant to one or more of her patients. Second, the additional workload on the doctor for participating in the system should be minimal. 
In other words, the system should function based upon the information that is recorded during the examination and treatment of patients. Towards achieving these two goals, our main idea is to utilize patients’ electronic medical records (EMRs) to identify information in scientific articles, memos, etc. that is relevant to a particular patient and alert her doctor accordingly. Several research challenges have to be addressed in order to make such a system efficient and scalable, including: (1) EMRs, research publications (from PubMed, etc.), and memos from organizations such as CDC and FDA have to be automatically annotated with medical ontology-based, semantics-rich metadata; (2) novel, semantics-driven algorithms for retrieving, filtering and ranking information relevant to a particular EMR have to be designed; (3) in the interest of system scalability, techniques to cluster EMRs based upon their semantic similarity need to be utilized; and (4) the system has to be continuously tuned based upon explicit and implicit feedback from the users to maintain and improve its effectiveness. In this talk, we will motivate this work through real-world examples. We will elaborate upon the above challenges, discuss our ideas towards addressing them, and present an architecture for our framework.

Publishers' Content Linked with Bioinformatics Data Resources: Working Towards Brokering Standards in the SESL Pilot Project.

Dietrich Rebholz-Schuhmann
European Bioinformatics Institute, Hinxton, UK

Presentation Abstract: The SESL pilot project explores the technical feasibility of federated querying across full text literature and bioinformatics databases. Five Life Science and Pharmaceutical companies have collaborated with four publishers and the Rebholz group (EMBL-EBI) to extract selected data from bioinformatics databases (UniProt, OMIM and ArrayExpress) and full text literature, with a focus on human diseases related to Type 2 diabetes mellitus. Gene-to-disease assertions have been delivered to the scientist users through a single point of query.

The pilot implements the integration of content from public resources and extracted information from the scientific literature into a shared infrastructure based on Semantic Web technology. The SPARQL endpoint is hosted at the EBI and can be accessed remotely through SPARQL queries, a Web browser based graphical user interface or through a SOAP Web services client. The project delivers a preliminary set of standards describing the minimal infrastructure necessary to support a biology brokering service and the provision of a prototype instance of that infrastructure as a public demonstrator.

Semantic Representation of Events in the Pharmaceutical Industry

Martin Romacker, Samuel Läubli & Marc Bux
Novartis Pharma AG, NIBR-IT

Presentation Abstract: Data feeds from commercial content providers contain information highly relevant to pharmaceutical research. Processing and normalizing the data for in-depth analysis plays an important role in areas like competitive intelligence, strategic alliances or modeling and simulation.

Unfortunately, the data is not easy to integrate or to syndicate semantically. The standard mode of knowledge transfer, XML files, clearly lacks semantics. Additionally, many facts are locked in natural language statements instead of being accessible in a machine-readable and semantically valid representation. The challenge is even larger when content needs to be combined from different feeds. Heterogeneous ways of naming, different semantic typing and different content structures prevent the users from fully exploiting the rich knowledge contained in the feeds.

At NIBR-IT, we have implemented an automatic pipeline to process and normalize company names. At the same time, we have created an NLP pipeline which is able to derive facts from statements around phase transitions, mergers and acquisitions or licensing events. By doing so, we transform natural language statements into a normalized semantic representation which uses a Neo-Davidsonian-style notation. The different types of events are captured in a high-level ontology around the event types using OWL. With this kind of representation, it is now possible to ask queries such as "What are the licensing events where Novartis gave a license to any company?". The company-centric events are complemented by knowledge around indications, products and other semantic types. A secondary aim of this project is to demonstrate to the content providers that it might be worthwhile to change to a semantically richer and computer-accessible way of delivering data.

In the presentation, we will first outline the business rationale behind our project. In the second part, we will give an overview of how we process and normalize the free text sentences and will explain the Semantic Web approach we have taken to represent data.
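
A Neo-Davidsonian-style event representation, as described above, treats each event as its own node and attaches participants through role predicates. The Python below is a minimal sketch of that idea using plain tuples. The event identifiers, role names (`ex:licensor`, `ex:licensee`, and so on) and the companies `co:ExampleBio` and `co:OtherCo` are invented for illustration; the project's actual OWL ontology is not reproduced here.

```python
# Each event is a node; its type and participants are separate triples about that node.
# All identifiers and facts below are hypothetical sample data.
EVENTS = [
    ("ev:1", "rdf:type", "ex:LicensingEvent"),
    ("ev:1", "ex:licensor", "co:Novartis"),
    ("ev:1", "ex:licensee", "co:ExampleBio"),
    ("ev:2", "rdf:type", "ex:AcquisitionEvent"),
    ("ev:2", "ex:acquirer", "co:Novartis"),
    ("ev:2", "ex:target", "co:OtherCo"),
    ("ev:3", "rdf:type", "ex:LicensingEvent"),
    ("ev:3", "ex:licensor", "co:OtherCo"),
    ("ev:3", "ex:licensee", "co:Novartis"),
]


def licenses_granted_by(triples, company):
    """Find licensing events in which `company` fills the licensor role."""
    facts = set(triples)
    licensing = sorted(
        s for s, p, o in facts if p == "rdf:type" and o == "ex:LicensingEvent"
    )
    results = []
    for event in licensing:
        if (event, "ex:licensor", company) in facts:
            licensees = sorted(
                o for s, p, o in facts if s == event and p == "ex:licensee"
            )
            results.append((event, licensees))
    return results
```

Because the roles are explicit, the query distinguishes the event where the company grants a license from the one where it receives a license, a distinction that a flat "company A, company B, licensing" record would lose.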

NCBO Resource Index: Ontology-Based Search and Mining of Biomedical Resources

Nigam H. Shah
Stanford School of Medicine
California, USA

Presentation Abstract: The volume of publicly available data in biomedicine is constantly increasing. However, this data is stored in different formats on different platforms. Integrating this data will enable us to accelerate the pace of medical discoveries by providing scientists with a unified view of this diverse information. Under the auspices of the National Center for Biomedical Ontology, we have developed the Resource Index, a growing, large-scale index of more than twenty diverse biomedical resources. The resources include heterogeneous data from a variety of repositories maintained by different researchers from around the world. Furthermore, we use a set of 200 publicly available ontologies, also contributed by researchers in various domains, to annotate and to aggregate these descriptions. We use the semantics that the ontologies encode, such as different properties of classes, the class hierarchies, and the mappings between ontologies, in order to improve the search experience for the Resource Index user. Our user interface enables scientists to search the multiple resources quickly and efficiently using domain terms, without even being aware that there is semantics under the hood.

Fueling Knowledge Federation Using Terminological Services

Therese Vachon
Novartis Pharma AG
Basel, CH

Presentation Abstract: Knowledge proliferation and data silos are well-known buzzwords which characterize the way data is produced and stored in the pharmaceutical industry. Most efforts in knowledge mining try to make the knowledge buried in applications and databases accessible. These efforts are both expensive and tedious. Additionally, not all knowledge can be recovered, as the stored information tends to be ambiguous and incomplete. At Novartis, we have been working on a principled approach to overcome these shortcomings. The basic idea is to create a federation layer based on well controlled terminologies aiming at a uniform wording within and across data repositories. Thus, we have been collecting and defining meaningful atomic units (basic concepts) together with their lexical representations (terms) in a knowledge integration framework. Within that framework we maintain a number of terminologies (like indication, company, target, gene, assay method). The terminologies are organized in terms of taxonomies and complemented by referential knowledge, so-called cross references or pointers which link out to other repositories. One of our objectives is to stay compatible with the major resources of the open biomedical community. With regard to coverage, our terminologies focus on the terms which are really relevant to research at NIBR. Cross referencing is a powerful but formally simple means to link out to other knowledge repositories to get access to additional information. Our methods to maintain and enhance the different terminologies in our framework depend mainly on the concept type. We have different levels of automatic generation versus intellectual curation of the content related to indications, companies or genes. We believe that for each of these concept types there is an optimal balance between automation and curation, the former being prone to errors and the latter being time-consuming and therefore expensive. 
Furthermore, we intend to make the maintenance process more and more a collaborative task where scientists can access, review and modify the content according to their role profile. An important success factor for the widespread usage of terminologies is to bring them seamlessly to the point of usage. Consequently, we have implemented a service layer providing SOAP and JSON Web Services as well as a REST API. Importantly, the users have access to the knowledge without slowing their work and without having to leave the active application. The increasing usage of these services, both in number of applications and in number of calls, clearly demonstrates the importance of the flexible integration layer. It is important to mention that for some of the concept types we have reached a critical mass in usage which allows us to run queries across systems or provide concept-centric views joining internal and external data. As we can demonstrate the benefits of using our resources, more and more people and organizations are starting to buy in. In our oral presentation, we would like to give an overview of our approach to “Terminology Management” and illustrate how we represent knowledge (terminological and referential). Finally, some use cases demonstrate how the services are applied.
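
The combination of concepts, terms and cross references described above can be pictured with a small data-structure sketch. The Python below is an assumption-laden illustration, not the Novartis framework or its service API: the class names, the concept identifier `IND:0001`, the repository name `ExampleRepo` and its identifier `EX-42` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """An atomic unit with its lexical representations and outbound pointers."""
    concept_id: str
    preferred_term: str
    synonyms: list = field(default_factory=list)
    cross_refs: dict = field(default_factory=dict)  # repository name -> external id


class Terminology:
    """A toy lookup layer that maps any known term to a single concept."""

    def __init__(self):
        self._by_term = {}

    def add(self, concept):
        # Index the preferred term and every synonym, case-insensitively.
        for term in [concept.preferred_term, *concept.synonyms]:
            self._by_term[term.lower()] = concept

    def normalize(self, term):
        """Return the preferred wording for a term, or None if it is unknown."""
        concept = self._by_term.get(term.strip().lower())
        return concept.preferred_term if concept else None

    def cross_reference(self, term, repository):
        """Follow a pointer from a term to an identifier in another repository."""
        concept = self._by_term.get(term.strip().lower())
        return concept.cross_refs.get(repository) if concept else None


indications = Terminology()
indications.add(
    Concept(
        concept_id="IND:0001",  # hypothetical identifier
        preferred_term="Parkinson's disease",
        synonyms=["Parkinson disease", "PD"],
        cross_refs={"ExampleRepo": "EX-42"},  # hypothetical external pointer
    )
)
```

With this in place, `indications.normalize("pd")` yields the uniform wording, and `indications.cross_reference("PD", "ExampleRepo")` follows the pointer to the external identifier.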
