Conference on Semantics in Healthcare and Life Sciences (CSHALS)

Technical Talks

Updated February 02, 2012

Tech Talks showcase products and services of relevance to the CSHALS audience. Each Tech Talk is 10 minutes in length and designed to allow organizations to create awareness of new technologies, services, etc., in an informational presentation format.

Organizations interested in presenting a Tech Talk can find further information on our Sponsor Opportunities page.

Thursday, February 23
11:20 am - 11:40 am

Jans Aasman, Franz Inc.
Oakland, CA, US

Javascript - The Key to Successful SemWeb Deployments

Ideally, declarative query languages (e.g., SQL, SPARQL) would be so powerful that we would never need to perform any procedural server-side programming in the database. In reality, however, every enterprise application backed by a relational database relies heavily on a server-side programming language like PL/SQL or on a very vendor-specific binding to Java/C++.

Currently, neither the W3C nor the Semantic Web community has a proposal for server-side scripting languages or named services. From our view in the trenches, Javascript has all the right properties to be both this scripting language and the basis for named services.

During this presentation we will demonstrate our Javascript compiler and a Javascript library that can perform all the basic handling of RDF quads, indices, and databases. This server-side Javascript can also include SPARQL and Prolog, has rich functionality for temporal and spatial functions, and can be used to write graph-based algorithms that run at server speed.

MongoDB’s wide adoption has created a large demand for working with JSON objects. Our proposed library supports the MongoDB API to make working with JSON objects in combination with RDF nearly transparent. On top of all this, we can take any server-defined program or function and make it instantly available as a REST call, which opens the Semantic Web to a host of programming talent.

Thursday, February 23
11:45 am - 12:05 pm

Bryan Thompson, SYSTAP LLC.
Greensboro, NC, US

Managing Bigdata® in Bioinformatics

Bigdata® is a horizontally scaled open source semantic web database platform running on a single machine or a cluster. I will introduce the bigdata architecture, summarize some of its key differentiators and new features, including analytic query support, and show how different groups are using Bigdata® to tackle bioinformatics problems.

SYSTAP, LLC leads the development of the bigdata open source platform and offers consulting services related to scalable information architectures, as well as services and support for the bigdata platform. Bigdata® is available under both open source and OEM licenses.

www.systap.com

www.systap.com/bigdata.htm

Poster Presentations

Updated April 03, 2012

POSTER DETAILS:

Posters will be on display throughout the conference, beginning 5 p.m. Wednesday, February 22, with a special reception for poster authors to present their work to conference delegates:

Poster set-up: Wednesday, February 22, 4:00 p.m. – 5:00 p.m.

Poster Reception: Wednesday, February 22, 5:00 p.m. – 7:00 p.m.

When preparing and designing your poster please note that it should be no larger than 44 inches wide by 44 inches high (there are two posters per side).

Posters must be removed between 1:30 p.m. and 3:00 p.m. on Friday, February 24.

Poster 1:
SDlink: An Integrated System for Linking Biological and Biomedical Semantic Data

Presenter: Alexandre Francisco
INESC-ID / IST, Technical University of Lisbon, Portugal

Additional Authors:
Pedro Reis, INESC-ID / IST, Technical University of Lisbon, Portugal
Dário Abdulrehman, INESC-ID / IST, Technical University of Lisbon, Portugal
Cátia Vaz, INESC-ID / ISEL, Poly Inst of Lisbon, Portugal
Mauro Santos, INESC-ID / IST, Technical University of Lisbon, Portugal
Ana Freitas, INESC-ID / IST, Technical University of Lisbon, Portugal

With the decreasing cost and increasing availability of high-throughput technologies, an enormous amount of biological and biomedical data is becoming available. Such data is usually represented and stored in different formats and platforms, most of the time offline and not standardized. The automatic integration of data from different databases suffers from several problems, the most notable being the lack of interfaces for automatic querying and for running integration and analysis tools. To solve some of these issues, semantic technologies have been proposed and used with great success. In this work we propose an integrated environment for querying, retrieving and analyzing linked data that is suitable for users unfamiliar with such technologies, addressing an issue that has been hindering a more general adoption of semantic methodologies in biology and biomedicine.

Method: The new sdlink system (http://kdbio.inesc-id.pt/sdlink) assumes that data is annotated following a given ontology and provides data views, including graphical representations, and a friendly querying interface. The querying interface was developed for users unfamiliar with semantic technologies: one can, for instance, define a query through a simple point-and-click interface, and the query is then translated to SPARQL. The sdlink system uses Virtuoso OSE as the underlying triplestore. To address user concerns with respect to security and privacy, the system supports user/project access control, based on OpenID for authentication and FOAF+WAC for authorization. The system is being used by two FP7 European projects, with good results in terms of both scalability and usability by non-expert users. We have also made a public project available for evaluation purposes (http://kdbio.inesc-id.pt/sdlink/lubm/).
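
The translation from a point-and-click selection to SPARQL can be pictured roughly as follows. This is only a minimal sketch and not the sdlink implementation: the function `build_sparql`, the `http://example.org/onto#` namespace, and the class and property names are all hypothetical, chosen purely for illustration.

```python
from typing import List, Tuple


def build_sparql(class_iri: str, filters: List[Tuple[str, str]], limit: int = 100) -> str:
    """Turn a class choice plus (property, literal value) picks into a SPARQL SELECT query."""
    lines = ["SELECT ?item WHERE {", f"  ?item a <{class_iri}> ."]
    for prop_iri, value in filters:
        # Escape backslashes and double quotes so the string literal stays well-formed.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  ?item <{prop_iri}> "{escaped}" .')
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


# Hypothetical selection: the user clicked a "Sample" class and one organism filter.
query = build_sparql(
    "http://example.org/onto#Sample",
    [("http://example.org/onto#organism", "Streptococcus pneumoniae")],
)
print(query)
```

The resulting string could then be sent to any SPARQL endpoint; how sdlink itself submits queries to Virtuoso is not described in the abstract.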

Results: Our results were twofold. First, through the development and deployment of sdlink, we were able to use semantic technologies and linked data on two large projects where most people were unaware of these technologies or of any reason to use them. The main contribution was an interface that allows users to retrieve and query linked data without losing expressiveness, efficiency or scalability. In particular, the system adapts itself to ontology changes and data transformations, depending only on the update of the underlying ontologies. The projects where the system was tested deal with heterogeneous data, including sequence data and experiment results, produced by several teams and work packages, which in the end became integrated, browsable and queryable. The data stored comprises about one million triples, which can be queried in less than one second for most usual queries.

Conclusions: The development of sdlink, and its deployment in a real scenario, allowed us to draw conclusions about the importance and usefulness of semantic technologies, namely for domain representation and data integration. More importantly, it was possible to show that, by developing suitable interfaces, any user can benefit from such technologies. Currently, the unfriendliness of most semantic technologies, in particular in the fields of biology and biomedicine, has hampered their adoption. The sdlink system is proposed to overcome this problem and to bring semantic technologies and linked data to a broader audience.

Poster 2:
Spo: An Ontology for Describing Host-pathogen Interactions Inherent to Streptococcus Pneumoniae Infections

Presenter: Cátia Vaz
INESC-ID / ISEL, Poly Institute of Lisbon, Portugal

Additional Authors:
Pedro Reis, INESC-ID / IST, Technical University of Lisbon, Portugal
Alexandre Francisco, INESC-ID / IST, Technical University of Lisbon, Portugal
Susana Vinga, INESC-ID, Lisbon, Portugal
Ana Freitas, INESC-ID / IST, Technical University of Lisbon, Portugal

Abstract: Over the past twenty years, the study of infection has tended to consider individual virulence factors or host factors. The Pneumopath project (www.pneumopath.org/), an FP7 European research project, has the objective of studying the host-pathogen interactions during Streptococcus pneumoniae infection and finding new targets for diagnosis and treatment. This research aims to identify the most important and consistently involved host and pneumococcal factors, in contrast to previous approaches, where factors were studied in isolation. The transmission of Streptococcus pneumoniae to a new host can result in asymptomatic colonization or progress to invasive disease. The infection can be determined by multiple attributes of both host and pathogen, so it is important to take into account the epidemiological and genomic characterization of pneumococcal strains, the results from experiments that evaluate host or pneumococcal responses to infection or different environmental challenges, and also the results from experiments that identify host genetic susceptibility factors. In this work we propose Spo (kdbio.inesc-id.pt/~cvaz/pneumopath/), an ontology developed in the context of the Pneumopath project, which provides terms and semantic constructs for annotating all aspects of host-pneumococcal interactions.

Method: The data considered includes the characterization of pneumococcal strains, typing information, and data from in vitro and in vivo experiments with animals and cell models, relevant for identifying new targets to combat pneumococcal diseases. Some of these data are scattered across numerous information systems and repositories, each with its own terminologies, identifier schemes, and data formats. The need to share such data brings challenges for both data management and annotation, such as the need for a common understanding of the concepts that describe host-pneumococcal interactions. Thus, semantic annotation and interoperability become an absolute necessity for the integration of such diverse biomolecular data. Moreover, given the heterogeneous environment inherent to the project, the ontology construction took into consideration contributions from all partners, leading to a well-grounded set of concepts and annotations.

Results: Spo provides a framework to represent the aforementioned host-pneumococcal interactions and is flexible enough to accommodate rapid changes and advances in research and to achieve data interoperability and interchange. This has been possible only because of Semantic Web recommended practices for clearly specifying names for things and relationships, and for expressing data using standardized and well-specified knowledge representation languages. The ontology, described in OWL Lite v1.0, includes 36 classes, 24 object properties and 43 data properties.

Conclusion: The main contribution of this work is not only Spo, but also the approach and methodology for its construction in the context of a large research project in which many people were not aware of semantic technologies. The proposed ontology not only describes knowledge in this field, but also allows existing knowledge to be validated and aggregated, which is essential for data integration. Furthermore, the ability to accurately describe host-pneumococcal interactions through the use of Spo has facilitated the implementation of information systems capable of coping with the heterogeneous types of data and, by using well-known semantic technologies, has allowed users to query data and discover new knowledge.

Poster 3:
Chem2Bio2RDF: Linked Open Data for Drug Discovery

Presenter: Bin Chen
Indiana University, Bloomington, United States

Additional Authors:
Ying Ding, Indiana University, United States
Philip Yu, University of Illinois, Chicago, United States
Eric Gifford, Pfizer, United States
David Wild, Indiana University, United States

A critical barrier in current drug discovery is the inability to utilize public datasets in an integrated fashion to fully understand the actions of drugs and chemical compounds on biological systems. There is a need not only for a resource to intelligently integrate the heterogeneous datasets now available pertaining to compounds, drugs, targets, genes, diseases, and drug side effects, but also for robust, effective network data mining algorithms that can be applied to such integrative data sets to extract important biological relationships. In this talk, we discuss (i) the creation of Chem2Bio2RDF for drug discovery data, integrating chemical compounds, protein targets, genes, metabolic pathways, diseases and side effects using Semantic Web technologies, and (ii) the development of innovative data mining algorithms to facilitate drug discovery. Chem2Bio2RDF incorporates 25 public datasets related to systems chemical biology, grouped into 6 domains: chemical (PubChem Compound, ChEBI, PDB Ligand); chemogenomics (KEGG Ligand, CTD Chemical, BindingDB, MATADOR, PubChem BioAssay, QSAR, TTD, DrugBank, ChEMBL, Binding MOAD, PDSP, PharmGKB); biological (UNIPROT, HGNC, PDB, GI); systems (KEGG Pathway, Reactome, PPI, DIP); phenotypes (OMIM, Diseasome, SIDER, CTD diseases); and literature (MEDLINE/PubMed). The number of RDF triples is approximately 110 million. We developed a domain ontology (called Chem2Bio2OWL) to better integrate these 25 datasets. The primary classes of this ontology are SmallMolecule, MacroMolecule, Disease, SideEffect, Pathway, BioAssay, Literature and Interaction, based partially on the BioPAX classes. The primary classes were further refined in accordance with the current instance data structure. We proposed and tested several graph mining and machine learning algorithms (e.g., Bio-LDA, path finding, subgraph mining and diversity ranking) on the generated Chem2Bio2RDF linked open dataset to facilitate drug discovery.
We found that our Bio-LDA model, which uses bio-terms, journal information and word information to characterize each topic, provides a better representation of topics than the simple LDA model, which can provide only the word representation. Rosiglitazone is one of several thiazolidinediones on the market for diabetes. For this drug, our path finding algorithm presents the set of most informative and diverse associations between the drug and its potential side effects, showing different causes of the hepatitis side effect. Our constraint-based subgraph and diversity ranking algorithm can detect the inhibition of Catechol O-methyltransferase (COMT) in Parkinson's disease. By combining information from DrugBank, PubChem and UniProt, we can find information regarding the gene that Tolcapone and Entacapone target, its name, the protein it encodes, PubMed articles related to their interaction with COMT, and the structure of the protein they target. In this talk, we demonstrate the potential of data mining and graph mining algorithms to identify hidden associations that could provide valuable directions for further exploration at the experimental level. In the future, we will focus on using the identified associations and paths existing between various bio terms to predict the potential connections of other unknown bio terms.
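
To make the idea of path finding between a drug and a side effect more concrete, here is a minimal sketch using plain breadth-first enumeration of simple paths. It is not the authors' algorithm and does no ranking by informativeness or diversity. The function `find_paths` and every node identifier in `toy_graph` are hypothetical placeholders, not data from Chem2Bio2RDF.

```python
from collections import deque
from typing import Dict, List


def find_paths(graph: Dict[str, List[str]], start: str, goal: str, max_hops: int = 3) -> List[List[str]]:
    """Enumerate simple paths from start to goal with at most max_hops edges, shortest first."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        if len(path) - 1 >= max_hops:
            continue  # hop budget used up without reaching the goal
        for neighbor in graph.get(node, []):
            if neighbor not in path:  # keep paths simple (no repeated nodes)
                queue.append(path + [neighbor])
    return paths


# Made-up toy graph: one drug, two targets, one pathway, one side effect.
toy_graph = {
    "drug:D1": ["target:T1", "target:T2"],
    "target:T1": ["pathway:P1"],
    "target:T2": ["side_effect:S1"],
    "pathway:P1": ["side_effect:S1"],
}

paths = find_paths(toy_graph, "drug:D1", "side_effect:S1")
for p in paths:
    print(" -> ".join(p))
```

Each returned path is one candidate explanation of how the drug could be connected to the side effect. A real system would still need to score and filter these paths.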

Poster 4:
The VIVO Ontology: Enabling Networking of Scientists

Presenter: Ying Ding
Indiana University, Bloomington, Indiana, United States

Additional Authors:
Stella Mitchell, Cornell University, United States
Jon Corson-Rikert, Cornell University, United States
Brian Lowe, Cornell University, United States
Bing He, Johns Hopkins University, United States

VIVO, funded by NIH, utilizes Semantic Web technologies to model scientists and provides federated search to enhance the discovery of researchers and collaborators across disciplines and organizations. The VIVO ontology is designed with a focus on modeling scientists, publications, resources, grants, locations, and services. It incorporates classes from popular ontologies, such as BIBO, Dublin Core, Event, FOAF, geopolitical, and SKOS. VIVO data is annotated based on the VIVO ontology to semantically represent and integrate information about faculty research (i.e., educational background, publications, expertise, grants), teaching (i.e., courses, seminars, training), and service (i.e., organizing conferences, editorial boards, other community services). The VIVO ontology has been adopted nationally and internationally, and enables national and international federated search for finding experts. VIVO is an open source Semantic Web application that, when populated with researcher interests, activities, and accomplishments, enables discovery of research and scholarship across disciplines and organizations. The VIVO core ontology models the academic community in order to provide a consistent and connected perspective on the research community to various stakeholders, including students, administrative and service officials, prospective faculty, donors, funding agencies, and the public. The major impetus for NIH to fund the VIVO effort was to "develop, enhance, or extend infrastructure for connecting people and resources to facilitate national discovery of individuals and of scientific resources by scientists and students to encourage interdisciplinary collaboration and scientific exchange".
The application is in use at the seven institutions of the NIH VIVO project and has been or is to be adopted by several other universities (e.g., Harvard University) and organizations in the USA (e.g., the United States Department of Agriculture), and by several universities or institutions in Australia and China (e.g., Queensland University of Technology, Chinese National Academy of Sciences) (Gewin, 2009). More specifically, VIVO can support discovering potential collaborators with complementary expertise or skills, suggesting appropriate courses, programs, and faculty members according to students’ interests, and facilitating research currency, maintenance and communication. For example, a Computer-Aided Drug Discovery (CADD) group may want to find and team up with a computer specialist and a group using in vivo experiments in drug discovery. If the VIVO core ontology were implemented in this hypothetical situation, the group leader could search across experts in computer science and molecular biology. In this paper, we present a relatively comprehensive discussion of the development of the VIVO core ontology, including the latest updates.

Poster 5:
BioPAX Community Update

Presenter: Nadia Anwar
General Bioinformatics, Berkshire, United Kingdom

Additional Authors:
Gary Bader, University of Toronto, Canada
Emek Demir, Memorial Sloan-Kettering Cancer Center, United States
Igor Rodchenkov, University of Toronto, Canada
Chris Sander, Memorial Sloan-Kettering Cancer Center, United States

BioPAX, Biological Pathway Exchange, is an OWL ontology modelling biological pathway data. Biological pathways are constructs that biologists use to represent relationships between and within chains of cellular events. For example, metabolic pathways typically represent the flow of chemical reactions, while signal transduction pathways represent the chain of interactions that transmit external signals received by a cell to deliver some response within the cell. These data are as heterogeneous as the numerous data sources (pathguide.org) that supply the data. Exchange, integration and annotation of these data are a considerable challenge. BioPAX was developed to ease the access, use, exchange and aggregation of pathway data. This poster will highlight the recent community developments. The current specification, BioPAX Level 3, and its supporting API, PaxTools, were released by the community in 2009. Since this release, the BioPAX community has focused on supporting developers with the transition from L2 to L3, community organisation, interoperability with other standards and future directions from user feedback. In 2010, BioPAX joined forces with SBML, SBGN and other standards to form the 'COmputational Modeling in BIology' NEtwork (COMBINE). This initiative aims to coordinate the development of the various community standards and formats. By learning from the experiences of community organization in other successful standards, the BioPAX community has reorganised itself. An invited Scientific Advisory Board and an elected Editorial Committee are now in place and are co-ordinating governance and proposal development with the COMBINE network. The annual BioPAX meetings are also now co-ordinated with the COMBINE network, providing economy of scale in the development of standards through a shared forum to share experiences, and enabling the standardization efforts to work co-operatively.
These meetings are organized into two events: a Hackathon (Harmony, May 21–25, 2012, in Maastricht) and the COMBINE forum (Combine, August 15-19, 2012, Toronto). The BioPAX community is also responding to feedback from a survey undertaken in 2011. Community members, consumers and data providers gave valuable information on how they use BioPAX, the difficulties they faced and how they want to see the specification progress in the future. This feedback will be used by the new governance teams to help establish the specification in the areas where it is currently used, to help extend the community beyond current usage and also to determine future directions for the community. To get involved or find out more about BioPAX, see www.biopax.org, join the mailing list, or attend a meeting in 2012.

Poster 7:
Dynamic Enhancement of Drug Product Labels through Semantic Web Technologies (.pdf)

Presenter: Richard Boyce
University of Pittsburgh, PA, United States

Additional Authors:
Jodi Schneider, Digital Enterprise Research Institute, Ireland
Michael Taylor, Microsoft, United States
Maria Liakata, EBI, United Kingdom
Anita De Waard, Elsevier, United States

FDA-approved drug product labeling (the package insert, or PI) is a major source of information intended to help clinicians prescribe drugs in a safe and effective manner. Unfortunately, drug PIs have been identified as often lagging behind the drug knowledge expressed in the scientific literature, especially when it has been several years since a drug was released to the market. Out-of-date or incomplete PI information can increase the risk of otherwise preventable adverse drug events. This can occur directly if the PI fails to provide information that is needed for safe dosing or to properly manage drugs known to interact. Clinicians might also be indirectly affected if they depend on third-party drug information sources, and these sources fail to add information that is available in the scientific literature but not present in the PI. We are creating a Linked Data store that will enable the drug PI to be expanded as new information becomes available in the scientific literature. The goal of the Linked Data store will be to provide clinicians, patients, and the maintainers of drug information resources with the most complete and up-to-date information on particular claims made within a PI. We are focusing on 25 currently marketed psychotropic medications (nine antipsychotics, twelve antidepressants, and four sedative hypnotics). To construct this Linked Data repository, we aim to use Natural Language Processing (NLP) technologies to identify core claims in the scientific literature and various web-based data sources that pertain to pharmacokinetic drug-drug interactions, age-related changes in clearance, metabolic clearance pathways, and genetic polymorphisms that can affect metabolism. This work aligns with the CSHALS themes "Linked Data", "Text Analysis, NLP, Question Answering", "Data Modeling: Ontologies, Taxonomies", and "Clinical Applications."

Method: We will identify the core rhetorical components of the content sources using a basic Scientific Discourse ontology constructed from (and compatible with) biomedical discourse ontologies (i.e., SWAN, OAC and AO) and discourse annotation metadata (specifically CoreSC). The ensuing discourse annotations will distinguish between facts, hypotheses, and evidence statements, and will be automatically recognised in text following an information extraction approach similar to conceptualisation zoning. The expected result is a Linked Open Data node, a triple store and a SPARQL endpoint available for use by different patient, clinician, and pharmacoepidemiology-centered data sources. Human-readable summaries will also be generated to expand on existing PI information.

Results: While we are in the early planning phases of the project, we have built a prototype system that demonstrates the concept by identifying how claims on metabolic clearance and drug-drug interactions could be updated in two drug PIs with evidence from the scientific literature.

Conclusions: We envision using the resulting Linked Data store as the back end for a system that provides pharmacokinetic information on age-related clearance changes, metabolic clearance pathways, pharmacokinetic drug-drug interactions, and genetic polymorphisms. After developing a demonstrator for the 25 psychotropics, we anticipate that it will be feasible to subsequently deploy our system for any given drug.

Poster 8:
A Case Study in Using Literature to Find Predicate Relationships and Indirect Associations (.pdf)

Presenter: James Dixon
Linguamatics Ltd., Newton, MA, United States

Additional Authors:
David Milward, Linguamatics, United Kingdom

Objectives and Motivation: Gene expression has been the focus of much research, especially for the treatment of carcinomas. Recently, attention has turned to microRNAs, smaller RNA molecules that are involved in post-transcriptional regulation. MicroRNAs (miRNAs) are known to bind to complementary sequences on target messenger RNA transcripts (mRNAs). MiRNA-expression profiling of different neoplasms has identified signatures associated with diagnosis, staging, progression, prognosis and response to treatment. In addition, profiling has been exploited to identify miRNA genes that might be involved in cancer or oncogenic pathways. Obtaining better insight into the connection between miRNAs and diseases requires an understanding of the relationships between miRNAs and genes, and of the relationships between the relevant genes and diseases. This paper compares and links together data from different sources: algorithmic predictions, experimental evidence and text-mined literature.

Method: There are a number of publicly available databases that have the miRNA to gene mapping, usually via statistical calculations. However, few have established the mechanism of action. Each mechanism is different and may matter to a researcher. Using the Linguamatics I2E text mining platform, we were able to mine research literature (Medline abstracts) using natural language processing (NLP) to add relational information to the miRNA-gene combinations. A particular challenge was the nomenclature for miRNAs, which may include prefixes and suffixes. For example, they may be prefixed to distinguish species, such as hsa-miR-19a for human and mmu-miR-19a for mouse. Our approach treated them as a single family since, in general, their function is very similar. This also allowed us to extract literature results where the species is not identified and only the family name is used. Since our interests lie in connecting miRNAs to carcinomas, we also used the same literature source, Medline, to extract gene to carcinoma relationships, to allow linking between the miRNAs and the diseases via the genes they affect.
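
The family-level treatment of miRNA names can be illustrated with a small normalization sketch. This is an assumption about how such collapsing might be done with a regular expression, not a description of the I2E vocabulary. The pattern, the function name `normalize_mirna`, and the choice to drop trailing suffixes such as `-3p` are all hypothetical, and names outside the `miR-<number>` scheme (for example the let-7 family) are simply not matched.

```python
import re
from typing import Optional

# Optional three-letter species prefix (e.g. "hsa-", "mmu-"), then "miR"/"mir",
# the family number, and an optional variant letter. Anything after that is ignored.
MIRNA_PATTERN = re.compile(r"^(?:[a-z]{3}-)?mir-?(\d+)([a-z]?)", re.IGNORECASE)


def normalize_mirna(name: str) -> Optional[str]:
    """Collapse species-prefixed or suffixed miRNA names onto one family-level label."""
    match = MIRNA_PATTERN.match(name.strip())
    if match is None:
        return None  # not recognised as a miR-style name
    number, letter = match.groups()
    return f"miR-{number}{letter.lower()}"


for raw in ["hsa-miR-19a", "mmu-miR-19a", "miR-19a-3p", "TP53"]:
    print(raw, "->", normalize_mirna(raw))
```

With this normalization, human and mouse mentions of the same miRNA, and mentions with no species at all, end up under a single key.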

Results: Using I2E, we found over 6000 miRNA to gene relationships from Medline abstracts. These relationships overlapped to some extent with commonly used databases in the genomic field, for instance TargetScan (1004), TarBase (135) and miRecords (316). The overlap among the three databases themselves was similar to their overlap with the I2E results. Focusing on a single carcinoma, non-small cell lung cancer (NSCLC), as an example, we were able to extract over 400 indirect relationships between miRNAs and NSCLC, where other public databases had fewer than 50 miRNA to NSCLC associations.

Conclusions: Since all of the public database information used had modest overlap with the results from the literature, we are confident that we not only added relational information to the miRNA-gene interactions, but also added novel relationships to the miRNA-disease connections. In addition, we have extended our network from miRNA to gene and gene to disease to a more interesting relationship of miRNA to disease via their indirect links. Creating these associations will provide researchers with new avenues to explore and lead to new target identification and, hopefully, new disease treatments.
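
The indirect miRNA to disease links described here amount to joining two relation sets on the shared gene. The sketch below shows that join under simple assumptions: the function `link_via_genes` is hypothetical, and the miRNA, gene and disease names are placeholders rather than extracted results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def link_via_genes(
    mirna_gene: Iterable[Tuple[str, str]],
    gene_disease: Iterable[Tuple[str, str]],
) -> Dict[Tuple[str, str], Set[str]]:
    """Join miRNA-gene and gene-disease pairs; keep the genes that support each indirect link."""
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    links = defaultdict(set)
    for mirna, gene in mirna_gene:
        for disease in diseases_by_gene.get(gene, ()):
            links[(mirna, disease)].add(gene)
    return dict(links)


# Placeholder relations, standing in for text-mined pairs.
mirna_gene_pairs = [("miR-A", "GENE1"), ("miR-A", "GENE2"), ("miR-B", "GENE3")]
gene_disease_pairs = [("GENE1", "disease X"), ("GENE2", "disease X")]

indirect = link_via_genes(mirna_gene_pairs, gene_disease_pairs)
for (mirna, disease), genes in indirect.items():
    print(mirna, "->", disease, "via", sorted(genes))
```

Keeping the supporting genes alongside each link preserves the evidence trail, so a researcher can see why a given miRNA is associated with a disease.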

Poster 9:
Image Retrieval in Controlled English (.pdf)

Presenter: Tobias Kuhn
Yale University School of Medicine, New Haven, CT, United States

Additional Authors:
Michael Krauthammer, Yale University, United States

The Yale Image Finder (YIF) project aims at improving biomedical image and document retrieval by developing advanced image parsing and indexing strategies. To this end, we have deployed a YIF search engine, which allows for keyword searches against indexed PubMed Central open access images. Authors often follow well-accepted layouts when depicting experiment results as gels, graphs or plots, and use image text in an equally structured fashion for labeling different image elements. Image text placement often conveys higher-level semantics, such as the names of proteins being studied under different experimental conditions. We are currently exploring innovative ways of allowing YIF users to access such structured image text content. Here, we propose the use of a controlled language interface that guides users in composing natural language queries ("Find an image where X is measured under the condition Y") that are subsequently mapped to indexed image text content. Our approach is based on controlled natural language, i.e. a restricted subset of English with a precise and unambiguous mapping to logic. We present a prototype called Rice (Retrieving Images through Controlled English) that is based on an interface we developed for a different domain (annotated text corpora) and adapted for image mining. Users can write seemingly natural queries like "Find an image that is a Western blot and where 'p38' is compared to 'MKK3'", which are subsequently translated into a logical representation like "western-blot & compared(p38,MKK3)". Such logical representations can then be matched with the formal model that we extract from images found in biomedical papers. One serious problem with controlled natural language is that it is very easy to read and understand but hard to write. Our prototype solves this problem by providing a predictive editor, with which users construct syntactically correct sentences in an iterative and guided way.
For any partial sentence, the predictive editor of Rice shows the possible continuations in the form of different menu boxes. In this way, users do not need to know about the restrictions of our language beforehand. Previous evaluation has shown that this editor is very easy to use after very little or no training. Typical users of search engines are not familiar with logic notations and rarely have the time to learn one. Existing query interfaces are either very simple (i.e. keyword-based) or too complex to be usable without training. With Rice, complex queries can be written in a natural and intuitive way. The interface should be immediately accessible to researchers interested in the results represented in images of the biomedical literature. Rice supports queries with directed relationships "... where A is measured under the condition B", resulting in the retrieval of highly specific image sets. In contrast, keyword searches cannot build such refined query representations, and cannot easily tell apart a related query "… where B is measured under condition A". Our prototype is still incomplete, but we believe that it nicely demonstrates the potential of our approach, and the positive results of previous work make us confident of its practicality.
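
The mapping from a controlled English query to a logical form can be illustrated with a toy translator. This is only a sketch for two sentence patterns, written with regular expressions; Rice itself is described as using a controlled language with a predictive editor, which this does not reproduce. The function `to_logic` and the `measured(...)` predicate name are assumptions; only the `compared` example and its output come from the abstract.

```python
import re

# Patterns for a tiny fragment of the query language (assumed, not the Rice grammar).
IMAGE_TYPE = re.compile(r"that is an? (?P<kind>[A-Za-z ]+?)(?= and where|$)")
COMPARED = re.compile(r"where '(?P<a>[^']+)' is compared to '(?P<b>[^']+)'")
MEASURED = re.compile(r"where '(?P<a>[^']+)' is measured under the condition '(?P<b>[^']+)'")


def to_logic(sentence: str) -> str:
    """Translate a supported controlled English query into a conjunction of logical terms."""
    body = sentence.strip().rstrip(".")
    prefix = "Find an image "
    if not body.startswith(prefix):
        raise ValueError("unsupported sentence")
    body = body[len(prefix):]

    terms = []
    kind = IMAGE_TYPE.search(body)
    if kind:
        terms.append(kind.group("kind").lower().replace(" ", "-"))
    # Argument order is preserved, so the relationships stay directed.
    for pattern, predicate in ((COMPARED, "compared"), (MEASURED, "measured")):
        for match in pattern.finditer(body):
            terms.append(f"{predicate}({match.group('a')},{match.group('b')})")
    if not terms:
        raise ValueError("unsupported sentence")
    return " & ".join(terms)


print(to_logic("Find an image that is a Western blot and where 'p38' is compared to 'MKK3'"))
print(to_logic("Find an image where 'A' is measured under the condition 'B'"))
```

Because the arguments keep their order, the queries for "A measured under B" and "B measured under A" yield different logical forms, which is the distinction a plain keyword search cannot express.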

Poster 10:
A Simplified Method for Creating a Cell Cycle Ontology for the Laboratory Mouse (.pdf)

Presenter: Mary E Dolan
MGI, The Jackson Laboratory, Bar Harbor, ME, United States

Additional Authors:
Chris J. Mungall, Lawrence Berkeley National Laboratory, United States
Heiko Dietze, Lawrence Berkeley National Laboratory, United States
Judith Blake, MGI, The Jackson Laboratory, United States

The cell cycle is an essential, highly conserved, complex process. Understanding the cell cycle is important in understanding development, aging, and the progression of many diseases including cancer. Mouse Genome Informatics (MGI) is the international database resource for the laboratory mouse, providing genetic, genomic, and biological data to facilitate the study of the mouse as a model for human health and disease. We have recently developed a mouse cell cycle ontology as a novel approach to data integration for the diverse data on the laboratory mouse that is available at MGI. Currently at MGI, 1070 mouse genes are functionally associated with the cell cycle and have been annotated to the Gene Ontology (GO) term ‘cell cycle’ and its descendants. This mouse cell cycle gene set also has a large body of additional experimental annotation: 8126 experimental GO annotations in addition to ‘cell cycle’; 581 genes have phenotypic alleles with 31,134 phenotype annotations describing 10,129 affected anatomical systems; 512 genes have curated OMIM (Online Mendelian Inheritance in Man) associations to mouse models; 58 genes have pathway (MouseCyc) annotations; and 1055 genes have human orthologs. Many of these data are described by different ontologies from the Open Biomedical Ontologies (OBO): gene product function data is annotated using the Gene Ontology; mouse phenotype data using the Mammalian Phenotype Ontology; expression data using the Adult Mouse Anatomical Dictionary and the Edinburgh Mouse Atlas for embryonic stages. Our mouse cell cycle ontology offers a view across these distinct ontologies and thus a richer description of the data. The analysis of data related to cell cycle processes requires an integrated view that pulls together as much data as possible. Our approach adapts and extends a method that has been used by other groups to develop cell cycle ontologies for other organisms, including human, yeast, and Arabidopsis.
In this work, we describe the structure and content of our mouse cell cycle ontology, Mouse_CCO, as an ‘application’ ontology built on experimental evidence-based annotations for the specific purpose (application) of studying the cell cycle. The structure of Mouse_CCO provides the generic template for the ontology, which is then populated using 1070 mouse cell cycle genes along with all their annotations from MGI and several additional data resources. The data drives the structure and allows a user to ‘discover’ connections. As an experimental evidence-based ontology, it is particularly important to keep the ontology up to date. The two newly developed tools also described in this work simplify maintenance of the ontology: the first allows a user to download mouse genes and selected annotations in OBO format that is then used by the second tool, Oort (OBO Ontology Release Tool), to perform MIREOT-like procedures to create a merged ontology bringing in subsets of external ontologies. The final product is Mouse_CCO in both OBO and OWL formats that can be queried and explored using a variety of free, publicly available tools. Our hope is that this resource will facilitate hypothesis generation based on the cell cycle as a biological system.

Poster 11:
OpenBEL, the BEL Framework, and the BEL Portal

Presenter: Julian Ray
Selventa, Cambridge, MA, United States

Additional Authors:
Ted Slater, Selventa, United States
Natalie Catlett, Selventa, United States
David De graaf, Selventa, United States

The Biological Expression Language (BEL) and supporting technology platform, the BEL Framework, will be released by Selventa, in conjunction with Pfizer, to the life sciences community in Q1 of 2012. BEL and the BEL Framework are designed to promote the collection, sharing, and interchange of structured knowledge within and among organizations. The BEL Portal, at http://belframework.org/, is the online community home for BEL and the BEL Framework. The Biological Expression Language (BEL) is a language for representing scientific findings in the life sciences in a computable form. BEL is designed to represent scientific findings by capturing causal and correlative relationships in context, where context can include information about the biological and experimental system in which the relationships were observed, the supporting publications cited, and the process of curation. BEL is intended as a knowledge capture and interchange medium, supporting the operation of systems that integrate knowledge derived from independent efforts. The BEL Language has been designed and used by our scientists and our customers for almost a decade. The language has been specifically designed to help scientists record life science facts in a way that is intuitive, easy to learn, concise, and appealing. A good language should help the user articulate an idea in a manner that is unambiguous, terse, and conveys the facts and associated contexts without loss or ambiguity. BEL is designed to do just this for life science applications. The current version of the language is small, which makes it easy to learn. BEL supports both causal and correlative relationships as well as negative relationships, which makes it suitable for recording a variety of experimental and clinical findings, and it can be used with almost any set of vocabularies and ontologies, which makes it highly adaptable and easy to adopt. 
BEL can be easily extended to annotate findings with use-specific contexts such as experimental and clinical parameters. The BEL Framework is an emerging open-platform technology specifically designed to overcome many of the challenges associated with capturing, integrating, and storing knowledge within an organization, and sharing the knowledge across the organization and between business partners. The BEL Framework provides mechanisms for knowledge capture and management; integration of knowledge from multiple, disparate knowledge streams; knowledge representation and standardization in an open, use-neutral format; creating customizable, computable biological networks from captured knowledge; and quickly enabling knowledge-aware applications using standardized application programming interfaces (APIs) across all major development platforms. Registering on the BEL Portal gives you access to more detailed documentation about BEL and the BEL Framework, and also allows you to participate in our community section and offer your views, opinions, and suggestions on the language and framework as well as keeping you informed on the progress of the official launch. Once you register you will have access to example documents, best practices, technical specifications, configuration guides, code examples, a wiki, and discussion groups.

Poster 12:
Semantic Integration to Characterize Microbial Pathogens: Multi-resource Enrichment of Experimental Proteomic and Genomic Datasets (.pdf)

Presenter: Erich Gombocz
IO Informatics, Inc., Berkeley, CA, United States

Additional Authors:
James Candlin, Sage-n Research, United States

Bacterial and viral infectious diseases are major health threats globally, yet characterizing, identifying and understanding them has been scientifically challenging. This is mainly because, while a wealth of information (including complete genomes) is available, its integrated use in the context of the biological system to better understand causes and similarities in infectious diseases is still in its infancy. This poster addresses some of the many obstacles involved in this endeavor: it attempts to identify peptides from different microorganisms with common disease-causing mechanisms of action, and to use them as biomarkers to detect pathogenic microbial threats prior to the onset of disease symptoms, to help in outbreak prevention. The presented workflow consists of five steps. The first step is a thorough peptide analysis of microorganisms via mass spectrometry and their identification by sequence scoring (Sorcerer, indexed SEQUEST search, BioWorks). The second step is the annotation of peptides with genes and genomic sequences relevant to protein expression to qualify the accuracy of the identification. Step 3 involves the use of public domain microbial databases (PATRIC, ICTV, VIDA, Viral ORFeome, miRBase) to semantically integrate the experiments with organism taxon-specific functional genomic and pathway information relevant to diseases caused by the pathogens. Based on sequence similarity, sequences are clustered into homologous protein families (HPFs), and those protein families are enriched with annotations including functional classification, related protein structures, taxonomy, protein length, boundaries of conserved regions and bacterial or virus-specific genes. Further enrichment is achieved through addition of disease-related pathways (BioCyc, KEGG).
The resulting knowledgebase provides a network with functional annotations to peptides and their relationships to diseases (Sentient Knowledge Explorer). In Step 4, peptides in the network that have similar disease-causing functions and appear in several pathogens are identified. Interrogating the network via semantic queries (SPARQL) results in the discovery of key pathway intersections commonly involved in the disease. The last step is the creation of molecular marker signatures (SPARQL, Applied Semantic Knowledgebases - ASK) and the testing of their validity as decision support in multiplexed assays. Future applications will apply this technology to the rapid detection of biological threats, to characterize the origin and type of disease outbreaks and to develop preventive measures (such as broadly applicable drugs or vaccines) effective for entire classes of pathogenic organisms.

Poster 13:
The Quad Economy of a Semantic Web Ontology Repository

Presenter: Trish Whetzel
National Center for Biomedical Ontology, Stanford, CA, United States

Additional Authors:
Paul R. Alexander, Stanford University, United States
Mark A. Musen, Stanford University, United States
Natalya F. Noy, Stanford University, United States

BioPortal is an open library of biomedical ontologies that can be accessed using a Web-based user interface or RESTful Web services. The Web-based user interface allows users to browse, search, and visualize ontologies and facilitates community participation in the ontology lifecycle, including reviews of ontologies, mappings between terms, comments and new term proposals. A suite of Web services, including Web services that expose information about terms in ontologies, mappings, notes, and metadata about the ontologies themselves, drives the Web-based interface. The NCBO Web services provide a common XML output of ontology content regardless of the ontology representation format; however, there is no single uniform storage for the ontologies and their metadata. As the amount of information in BioPortal and the number of hits to the NCBO Web services increase, a more scalable solution is needed. To address these issues, we analyzed the use of a quad store, since quad stores easily scale to millions of triples and provide SPARQL query access to the ontologies. Currently each ontology in BioPortal includes the materialization of all owl:imports. Thus, if a small ontology imports a large ontology, then the former becomes a large ontology. Because BioPortal stores multiple versions of an ontology, the problem is reproduced for every version. Our hypothesis was that we could optimize the number of quads in the system using a more granular model where owl:imports are not materialized and every ontology graph contains its own RDF triples without the triples from the imported ontologies. One of the questions to be answered is the optimization ratio, in number of triples, when using an ontology-per-graph model versus a closure-materialized model. Of the 149 OWL ontologies reviewed, there are 299 ontologies in the import closure (i.e., if we follow all the owl:imports links from the 149 ontologies, we will create a set of 299 ontologies).
These 299 OWL ontologies contain 303 owl:imports; the materialized import closure is a set of 495 owl:imports. We also reviewed the number of re-used triples. Ontologies with no imports gather 5.4M triples in the system; ontologies with one import, 1.7M; ontologies with 2-9 imports, 0.5M; and ontologies with more than 10 imports, 2.1M. To conclude, our analysis shows that while ontology reuse is still far from being the norm, effective reuse is a goal worth pursuing and the level of reuse can have significant implications for the scalability of ontology storage systems.
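
The difference between the two storage models can be made concrete with a small sketch. The three ontologies and their triple counts below are invented, the function names are hypothetical, and the count is a simplification that assumes imported ontologies share no triples with the importer.

```python
from typing import Dict, Set, Tuple

# Toy data, invented for illustration: name -> (own triple count, direct owl:imports).
ONTOLOGIES: Dict[str, Tuple[int, Set[str]]] = {
    "small": (1_000, {"large"}),
    "large": (500_000, {"upper"}),
    "upper": (2_000, set()),
}


def import_closure(name: str, ontologies: Dict[str, Tuple[int, Set[str]]]) -> Set[str]:
    """Return every ontology reachable from `name` via owl:imports, excluding `name`."""
    seen: Set[str] = set()
    stack = list(ontologies[name][1])
    while stack:
        current = stack.pop()
        if current in seen or current == name:
            continue  # already visited, or an import cycle back to the start
        seen.add(current)
        stack.extend(ontologies[current][1])
    return seen


def materialized_total(ontologies: Dict[str, Tuple[int, Set[str]]]) -> int:
    """Triples stored when each ontology also carries its full import closure."""
    total = 0
    for name, (own, _) in ontologies.items():
        imported = sum(ontologies[dep][0] for dep in import_closure(name, ontologies))
        total += own + imported
    return total


def per_graph_total(ontologies: Dict[str, Tuple[int, Set[str]]]) -> int:
    """Triples stored when each ontology graph holds only its own triples."""
    return sum(own for own, _ in ontologies.values())


if __name__ == "__main__":
    materialized = materialized_total(ONTOLOGIES)
    granular = per_graph_total(ONTOLOGIES)
    print(materialized, granular, round(materialized / granular, 2))
```

In this toy case the small ontology alone accounts for 503,000 stored triples under the closure-materialized model, against 1,000 in the ontology-per-graph model, which is the effect described above.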

Poster 14:
A PubMed Search Engine for Rat Genome Curation at RGD

Presenter: Weisong Liu
The Medical College of Wisconsin, Milwaukee, WI, United States

Additional Authors:
Mary Shimoyama, The Medical College of Wisconsin, United States
Melinda Dwinell, The Medical College of Wisconsin, United States
Howard Jacob, The Medical College of Wisconsin, United States

The Rat Genome Database (RGD) is a collaborative effort between leading research institutions involved in rat genetic and genomic research. One of the main tasks of RGD is to curate literature related to rat genes and enter the information into our database. In this work, we built a PubMed search engine to help our curators locate papers of interest more efficiently. Using NCBI’s Entrez Utilities for Java, we have created a pipeline to download PubMed data weekly in XML format. We parse the XML files using a parser generated from NCBI’s efetch_pubmed.xsd file to extract information such as PMID, title, abstract, publication date and authors. The parsed information is stored in a MySQL database, which makes it easier for us to use the information further. By making use of the GATE and the UIMA frameworks, we built another pipeline to extract ontology (gene ontology, rat strain ontology, disease ontology, sequence ontology and organism ontology) terms and synonyms, and gene names/symbols from the PubMed titles and abstracts stored in the database. Some third-party plugins, such as Abner, OrganismTagger and MetaMap, were also used in this pipeline. The output of this pipeline includes ontology IDs, term positions within an abstract, and matching types. In order to make our framework scalable, we set up a small Hadoop cluster. The XML files are compressed and stored on Hadoop HDFS. Using the MapReduce framework, we can run our XML parsing and information extraction pipeline in many parallel threads. This dramatically reduced the total processing time compared to a single-threaded program. The pipeline can also run on Amazon Web Services’ Elastic MapReduce. Along with the stored PubMed information, the pipeline output is fed into a Solr server. All information is indexed by Apache Lucene. With a web-based user interface, a user can search for PubMed abstracts by entering PMIDs, authors, publication dates, terms, ontologies, ontology IDs, gene names and/or gene symbols.
The search results are ranked by relevance. Matched terms are sorted by frequency of appearance in an abstract.
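
The field-extraction step of such a pipeline can be sketched as follows. This is not the authors' Java parser generated from efetch_pubmed.xsd: it swaps in Python's standard xml.etree.ElementTree for illustration, the sample record (PMID, title, abstract, author) is invented, and parse_records is a hypothetical helper name. The element names follow the PubMed article XML layout.

```python
import xml.etree.ElementTree as ET

# Invented sample record; only the element names mirror PubMed's XML layout.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000000</PMID>
      <Article>
        <ArticleTitle>A made-up title about a rat gene</ArticleTitle>
        <Abstract><AbstractText>Invented abstract text.</AbstractText></Abstract>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_records(xml_text: str) -> list:
    """Extract PMID, title, abstract and author names from PubMed-style XML."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        if citation is None:
            continue  # skip malformed entries rather than fail the whole batch
        # An abstract may be split into several AbstractText sections.
        abstract = " ".join(
            (part.text or "") for part in citation.iterfind("Article/Abstract/AbstractText")
        )
        authors = [
            f"{a.findtext('ForeName', '')} {a.findtext('LastName', '')}".strip()
            for a in citation.iterfind("Article/AuthorList/Author")
        ]
        records.append({
            "pmid": citation.findtext("PMID"),
            "title": citation.findtext("Article/ArticleTitle"),
            "abstract": abstract,
            "authors": authors,
        })
    return records


if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record)
```

Each resulting dictionary corresponds to one row that could be stored in the relational database before term extraction and indexing.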

Poster 16:
Turning Biological Knowledge into Mathematical Models, Automated (.pdf)

Presenter: Oliver Ruebenacker
Center for Cell Analysis and Modeling, University of Connecticut Health Center, United States

Additional Authors:
Michael Blinov, Center for Cell Analysis and Modeling, University of Connecticut Health Center
Ion Moraru, Center for Cell Analysis and Modeling, University of Connecticut Health Center
James Schaff, Center for Cell Analysis and Modeling, University of Connecticut Health Center

Living organisms are so enormously complex that we need computer simulations to understand the consequences of their vast biochemical reaction networks. As we uncover an increasing part of these networks, our established knowledge is increasingly stored in free web databases and available for query and download in machine-readable formats, especially in the RDF/OWL-based community standard Biological Pathways Exchange (BioPAX) [1]. The available data is massive and growing, e.g. Pathway Commons [2] stores 1,700 pathways, 414 organisms, 440,000 interactions and 86,000 substances. This data is fully linked with open controlled terminologies such as gene ontology (e.g. anatomical features) [3] and other free online databases such as ChEBI (chemicals) [4], KEGG (genes a.o.) [5], UniProt (proteins) [6] and PubMed (publications) [7].

Automatic use of this knowledge for computer simulations of biological organisms has been an ongoing challenge [8,9,10]. Now, Systems Biology Pathway Exchange (SBPAX) [11], a BioPAX extension, allows the inclusion of quantitative data and systems biology terms, especially the Systems Biology Ontology (SBO) [12]. SBPAX support has been implemented by the Virtual Cell [13], Signaling Gateway Molecule Pages [14] and System for the Analysis of Biochemical Pathways - Reaction Kinetics (SABIO-RK) [15]. For the first time, a mathematical model can be automatically built and fully annotated from a pathway of interest.
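
As a rough illustration of what turning a pathway into a mathematical model involves, the sketch below derives rate equations from a list of reactions. It assumes mass-action kinetics, which the abstract does not specify, and it does not use BioPAX, SBPAX or Virtual Cell APIs; the reaction, rate constant, concentrations and the function name derivatives are all invented.

```python
from collections import defaultdict

# Hypothetical pathway fragment: A + B -> C with an invented rate constant k.
REACTIONS = [
    {"reactants": {"A": 1, "B": 1}, "products": {"C": 1}, "k": 0.5},
]


def derivatives(reactions: list, conc: dict) -> dict:
    """Return d[X]/dt for every species under assumed mass-action kinetics."""
    dxdt = defaultdict(float)
    for rxn in reactions:
        # Mass-action rate: k times the product of reactant concentrations,
        # each raised to its stoichiometric coefficient.
        rate = rxn["k"]
        for species, stoich in rxn["reactants"].items():
            rate *= conc[species] ** stoich
        for species, stoich in rxn["reactants"].items():
            dxdt[species] -= stoich * rate  # reactants are consumed
        for species, stoich in rxn["products"].items():
            dxdt[species] += stoich * rate  # products are formed
    return dict(dxdt)


if __name__ == "__main__":
    print(derivatives(REACTIONS, {"A": 2.0, "B": 3.0, "C": 0.0}))
```

In an automated setting the reaction list and the quantitative parameters would come from the pathway data rather than being typed in by hand.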

Citations:
[1] Biological Pathway Exchange (BioPAX), www.biopax.org
[2] Pathway Commons, www.pathwaycommons.org
[3] Gene Ontology (GO), www.geneontology.org/
[4] Chemical Entities of Biological Interest (ChEBI), www.ebi.ac.uk/chebi/
[5] Kyoto Encyclopedia of Genes and Genomes (KEGG), www.genome.jp/kegg/
[6] UniProt, www.uniprot.org
[7] PubMed, www.ncbi.nlm.nih.gov/pubmed/
[8] Modeling without Borders: Creating and Annotating VCell Models Using the Web, Michael L. Blinov, Oliver Ruebenacker, James C. Schaff and Ion I. Moraru, Lecture Notes in Computer Science, 2010, Volume 6053 (2010).
[9] Using views of Systems Biology Cloud: application for model building, Oliver Ruebenacker, Michael Blinov, Theory in Biosciences, Volume 130, Number 1, 45-54 (2010).
[10] Integrating BioPAX pathway knowledge with SBML models, Michael L Blinov, Oliver Ruebenacker, Ion I Moraru, IET Syst. Biol., 2009, Vol. 3, Iss. 5, pp. 317-328 (2009).
[11] Systems Biology Pathway Exchange (SBPAX), www.sbpax.org
[12] Systems Biology Ontology, www.ebi.ac.uk/sbo/main/
[13] Virtual Cell, http://vcell.org
[14] Signaling Gateway Molecule Pages, www.signaling-gateway.org/molecule/
[15] System for the Analysis of Biochemical Pathways - Reaction Kinetics (SABIO-RK), http://sabio.villa-bosch.de/

Poster 17:
BioQuery-ASP: Querying Biomedical Ontologies in Natural Language Using Answer Set Programming (.pdf)

http://krr.sabanciuniv.edu/projects/BioQuery-ASP/

Presenter: Esra Erdem
Sabanci University, Istanbul, Turkey

Additional Authors:
Yelda Erdem, Research and Development Department
Sanovel Pharmaceutical Inc., Istanbul, Turkey
Halit Erdogan, Faculty of Engineering and Natural Sciences
Sabanci University, Istanbul, Turkey
Umut Oztok, Faculty of Engineering and Natural Sciences
Sabanci University, Istanbul, Turkey

Storing biomedical data in various structured forms, such as biomedical ontologies, and at different locations has brought about many challenges for answering queries about the knowledge represented in these ontologies. One of the challenges is to represent a complex query in a natural language and get its answers in an understandable form. Another challenge is to answer complex queries that require appropriate integration of relevant knowledge stored in different places and in various forms, and/or that require auxiliary definitions, such as chains of drug-drug interactions, cliques of genes based on gene-gene relations, and similarity/diversity of genes/drugs. Furthermore, once an answer is found for a complex query, the experts may need further explanations about the answer. We have built a software system, called BIOQUERY-ASP, that handles all these challenges. Method: We have addressed the three challenges described above using the high-level knowledge representation formalism and efficient automated reasoners of Answer Set Programming (ASP), a declarative programming paradigm that supports various Semantic Web technologies. To address the first challenge, we have developed a controlled natural language for biomedical queries about drug discovery; this language is called BIOQUERY-CNL. Then we have built an intelligent user interface that allows users to enter biomedical queries in BIOQUERY-CNL and that presents the answers (possibly with explanations or related links, if requested) in BIOQUERY-CNL. To address the second challenge, we have developed a rule layer over biomedical ontologies and databases that not only integrates the concepts in these knowledge resources but also provides definitions of auxiliary concepts. We have introduced an algorithm to identify the relevant parts of the rule layer and the knowledge resources with respect to the given query, and used automated reasoners of ASP to answer queries considering these relevant parts.
To address the third challenge, we have developed an intelligent algorithm to generate an explanation for a given answer, with respect to the query and the relevant parts of the rule layer and the knowledge resources. The overall system architecture for BIOQUERY-ASP is presented in the figure included in the supporting document. Results: We have shown the applicability of BIOQUERY-ASP to answer complex queries (specified by experts) over large biomedical knowledge resources.

Poster 18:
Proposed Ontology for Seizure and Epilepsy

Presenter: Robert Yao
Arizona State University, United States

Additional Authors:
Graciela Gonzalez, Arizona State University, United States
Jeffrey Buchhalter, Phoenix Children's Hospital, United States

The understanding and classification of seizures and epilepsy syndromes have constantly changed with the advent of new knowledge from new technologies. Ontologies provide a structured knowledge framework that could aid in more precisely defining and standardizing terminologies and diagnoses. This in turn could enhance the abilities of researchers and clinicians to pinpoint the causes of a disorder, discover new treatment measures, and improve patient outcomes.

Hypothesis: We hypothesize that a more refined ontology for seizures and epilepsy syndromes that adequately reflects the latest measurements, observations and medical findings can be used to assist empirical diagnosis of epilepsy and to potentially differentiate new syndromes in a logical and standardized format.

Methods: A review of previously proposed Seizure and Epilepsy classifications is being done to determine the most general way to classify each seizure, syndrome, and epilepsy. By analyzing and defining the building blocks of Epilepsy, an Epilepsy Ontology is iteratively formalized using Protege. Each seizure and syndrome will be instantiated to the ontology to determine if it provides a reasoning framework on epilepsy knowledge.

Results: A poly-axial ontology is being defined to encode the conceptual building blocks of seizures and Epilepsy. The ontology will be open for both qualitative and quantitative evaluation when the data/evidence is available in preference over consensus expert opinion.

Conclusions: The aim of this ongoing work is to help clinicians better understand the etiology of seizures and definitions of and relationships between seizures and epilepsy syndromes, and to provide a more helpful path towards research, diagnosis, and treatment of the disorder. Eventually, this ontology could be expanded for use with other diseases, providing more structured definitions. Such a standard framework could also help pinpoint knowledge deficits which in turn should drive laboratory and clinical experiments to discover missing knowledge.

Poster 19:
Exploiting Ontology Information for Extracting Keyphrases from Biomedical Articles

Presenter: Kyu-Baek Hwang
Soongsil University, Seoul, Korea

Additional Authors:
Sun Gon Kim, University of Seoul, Korea
Eunok Paek, University of Seoul, Korea

Keyphrases (or keywords) of a document compactly represent its content. They can be used for indexing or summarization purposes. Our method for keyphrase extraction is based on supervised machine learning combined with ontology information. It consists of two stages: (1) keyphrase candidate generation and (2) keyphrase selection. In the first stage, keyphrase candidates are generated by extracting every unigram, bigram, and trigram of the words in the title and abstract of each article. Also, a set of ontology terms is assigned to each article. For this, any automated method for ontology term assignment, e.g., a vector space model, can be adopted. Ontology terms are used for expanding the set of keyphrase candidates. Specifically, keyphrases that frequently co-occur with the ontology terms assigned to a document are added to its candidate set. In the second stage, keyphrases are selected from the expanded candidate set by supervised machine learning. Features for supervised learning include term and inverse document frequencies, length, first/last occurrence positions, and relationships with ontology terms. The confidence and lift of an association rule between keyphrases and ontology terms are used for representing their relationships. Because multiple ontology terms are usually assigned to a document, ontology-related feature values are averaged across all of them. Results: The proposed method was applied to a dataset consisting of 1,799 articles from three journals in the biomedical literature, i.e., IEEE/ACM Transactions on Computational Biology and Bioinformatics, Journal of Computational Biology, and Journal of Proteome Research. The MeSH (Medical Subject Heading) descriptors, which constitute a biomedical ontology, are manually assigned to the articles published in these journals for PubMed indexing. These MeSH descriptors represent the subject content.
In our experiments, MeSH descriptors were automatically assigned to each document of our dataset by a vector space model-based method. In addition, each article of these journals is annotated with about four to six author-provided keyphrases. These author keyphrases were used as a gold standard for keyphrase extraction evaluation. We conducted a 10-fold cross-validation experiment using several supervised machine learning methods including naïve Bayes classifiers and Bayesian networks. The experimental results showed that the inclusion of ontology information improved the keyphrase extraction performance by about 100% in terms of the F1-measure. When the number of extracted keyphrases was set to five, our method achieved an F1-measure of about 0.185 and the performance increase was 129%. We also compared our method with KEA, a method for keyphrase extraction using syntactic features (which is accessible at www.nzdl.org/Kea/index.html). Our method was always better than KEA regardless of the number of extracted keyphrases (the performance increase ranged from 2% to 98%). These results indicate that semantic information about document topics plays a central role in keyphrase extraction. Conclusions: We proposed a method for keyphrase extraction from documents using ontology information. Through a set of experiments, we showed that the inclusion of ontology information about document topics could greatly improve the performance in keyphrase extraction.
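
Two of the building blocks named in this abstract, n-gram candidate generation and the confidence and lift of an association rule, can be sketched briefly. The sketch makes assumptions the abstract leaves open: the rule is read as "ontology term implies keyphrase", the four-document corpus is invented, and the function names are hypothetical.

```python
def ngram_candidates(text: str, max_n: int = 3) -> set:
    """Return every unigram, bigram and trigram of the whitespace-split words."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }


def confidence_and_lift(docs: list, term: str, phrase: str) -> tuple:
    """Confidence and lift of the assumed rule: ontology term => keyphrase."""
    n_docs = len(docs)
    with_term = sum(1 for d in docs if term in d["terms"])
    with_phrase = sum(1 for d in docs if phrase in d["phrases"])
    with_both = sum(1 for d in docs if term in d["terms"] and phrase in d["phrases"])
    if with_term == 0 or with_phrase == 0:
        return 0.0, 0.0  # rule is undefined without any support
    confidence = with_both / with_term          # P(phrase | term)
    lift = confidence / (with_phrase / n_docs)  # P(phrase | term) / P(phrase)
    return confidence, lift


# Invented toy corpus: ontology terms and keyphrases per document.
DOCS = [
    {"terms": {"Proteomics"}, "phrases": {"mass spectrometry"}},
    {"terms": {"Proteomics"}, "phrases": {"mass spectrometry"}},
    {"terms": {"Genomics"}, "phrases": {"sequence alignment"}},
    {"terms": {"Genomics"}, "phrases": {"gene expression"}},
]

if __name__ == "__main__":
    print(sorted(ngram_candidates("gene ontology terms")))
    print(confidence_and_lift(DOCS, "Proteomics", "mass spectrometry"))
```

A lift above 1 indicates that the keyphrase appears with the ontology term more often than its overall frequency would suggest, which is the kind of signal used here to expand and score candidates.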

Conference on Semantics in Healthcare and Life Sciences (CSHALS)

Presenters

Updated February 01, 2012

Target Identification Using an Integrated Subset of the Yeast Interactome with Chemical Genomic Data in RDF

Nadia Anwar
General Bioinformatics
Reading, United Kingdom

Abstract: Semantic web technologies provide a well-established, efficient and cost-effective data integration strategy. We demonstrate the advantages these technologies offer in target discovery through combining experimental evidence from several target discovery technologies using the linked data paradigm. Drug discovery methods such as drug-induced HaploInsufficiency Profiling (HIP) are here combined with other chemical genomic data and genetic interaction networks to improve the sensitivity and specificity of target identification. We demonstrate the value of this integration using the yeast interactome with complementary experimental evidence within a fungal target discovery pipeline for triaging of hit lists, target identification and target deconvolution. Genetic interaction profiles generated from yeast deletion strains, using methods such as SGA, describe the relationships between genes. These profiles, integrated into a network of genetic interactions, are used to uncover and predict the functions of uncharacterized genes. Chemical genomic data describe the influence of small molecules on biological systems and are used to characterize the effect of compounds at the cellular level. Chemical genomic methods including HaploInsufficiency Profiling (HIP), Homozygous Profiling (HOP) and Multi-copy Suppression Profiling (MSP) are commonly used together to overcome the limitations of individual technologies. For example, HIP is used to identify small molecule (drug) targets; however, HIP is limited to molecules that inhibit cell growth and will fail to identify targets with functional paralogs. While HIP identifies direct targets, HOP is especially useful for providing insight into potential drug interactions. A combination of these approaches provides a more complete view, specifically identifying both on-target effects and off-target effects.
Since HIP, HOP and MSP are based on the same principles, a combined approach, although more time-consuming and expensive, delivers more comprehensive data. An alternative combined approach is based on the idea of re-using established genomic-scale data. Costanzo et al. re-use their clustered genetic interactions (GI) by correlating these with chemical-genetic interactions (CGI) from HOP. They successfully re-use their GI and CGI data and demonstrate that their combined profiles are complementary to HIP. Following Costanzo et al., we used semantic technologies to integrate the genetic interaction profiles described in their paper with a test set of compounds assayed using HIP. Specifically, we have followed the linked data approach using clustered networks of GI and CGI in yeast, layered over chemical genomic experiments. We show that this method is not only a cost-effective integration strategy for these data but also simplifies the discovery of the target as well as relevant interactions. Creating a foundational resource of these data in this fashion allows new experimental results and clusters to be layered into the network efficiently, moving from boutique, case-by-case integration to a scalable and robust integration resource. We demonstrate how this integrated data set can be used to identify profiles for compounds of interest and their targets, and to aid the visualization of the target's network proximal to the compound’s immediate target. Finally, we demonstrate how providing such a comprehensive view of the data eases the investigation of the compound’s mechanism of action.

Intelligent Surveillance of Health Care-associated Infections with SADI Semantic Web Services

Christopher Baker
University of New Brunswick
Saint John, Canada

Abstract: 1. Objectives and Motivation Clinical Intelligence (CI) tools support data analysis for the purposes of clinical research, surveillance and rational health care management. Ad-hoc querying of clinical data is one desirable type of functionality. Since most of the data is currently stored in relational form, ad-hoc querying is problematic as it requires specialized technical skills and the knowledge of particular data schemas. A possible solution is semantic querying where the user formulates queries in terms of domain ontologies that are much easier to navigate and comprehend than data schemas. Existing approaches to semantic querying of relational data, based on declarative semantic mappings from data schemas to ontologies cannot cope with situations when some computation is required in query time. We are reporting preliminary progress on a project dedicated to the use of SADI Semantic Web services [1] for semantic querying of clinical data for the surveillance of hospital-acquired infections (HAI) [2]. 2. Method We implement semantic access to a Relational DB by using an ontology for HAI and modeling the RDB in it. The modeling is implemented by SADI Semantic Web services that can be automatically discovered and invoked based on the needs of a particular query. The main services draw data from the DB, but services bringing data from external resources are also used. Users formulate SPARQL queries using primitives from the ontology and execute them via a SADI query engine. The querying can be both ad-hoc and self-service because the users need not know RDB programming. 3. Results To test our approach in a CI scenario dedicated to the surveillance for HAI, we are prototyping a SADI-based infrastructure for semantic querying of The Ottawa Hospital datawarehouse (see, e. g., [3]). Our infrastructure includes an ontology defining concepts suitable for reasoning about Hospital-Acquired Infections and a number of SADI services on the datawarehouse. 
To test the infrastructure, we write SPARQL queries representing questions a HAI surveillance professional would like to ask, such as "Which patients were diagnosed with SSI while they were taking corticosteroids?" or "How many diabetic patients were diagnosed with SSI?". To facilitate the temporal comparisons required by many competency questions, we created a time ontology and wrote a set of SADI services implementing temporal reasoning. 4. Conclusions: The main conclusion from our work on semantic querying so far is that the use of SADI services via a SPARQL interface is a viable general direction. Our approach will add to the pool of existing practical methods for semantic querying of RDBs, at least in CI. 5. References: [1] M. D. Wilkinson, B. Vandervalk, and L. McCarthy. SADI Semantic Web Services - 'cause you can't always GET what you want! Proceedings of the IEEE APSCC 2009. Singapore; 2009. [2] A. Shaban-Nejad, G.W. Rose, A. Okhmatovskaia, A. Riazanov, C.J. Baker, R. Tamblyn, A.J. Forster, and D.L. Buckeridge. Knowledge-based surveillance for preventing postoperative surgical site infection. Proceedings of MIE, Oslo, Norway; 2011. [3] G.W. Rose. Use of an Electronic Data Warehouse to Enhance Cardiac Surgical Site Infection Surveillance at a Large Canadian Centre. MSc thesis, University of Ottawa.
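The temporal comparison behind a question such as "Which patients were diagnosed with SSI while they were taking corticosteroids?" can be illustrated with a minimal Python sketch. It is not the authors' SADI services or time ontology; the `Interval` class, the patient identifiers and all dates are hypothetical toy values chosen only to show the interval check.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Interval:
    """A closed time interval, for example one medication course."""
    start: date
    end: date

    def contains(self, day: date) -> bool:
        return self.start <= day <= self.end


# Hypothetical toy records; not drawn from any real data warehouse.
medication_courses = {
    "patient-1": [Interval(date(2011, 3, 1), date(2011, 3, 20))],
    "patient-2": [Interval(date(2011, 1, 5), date(2011, 1, 15))],
}
diagnosis_dates = {
    "patient-1": date(2011, 3, 10),
    "patient-2": date(2011, 2, 1),
}


def diagnosed_during_medication(diagnoses, courses):
    """Return patients whose diagnosis date falls inside any medication course."""
    return sorted(
        patient
        for patient, day in diagnoses.items()
        if any(course.contains(day) for course in courses.get(patient, []))
    )


result = diagnosed_during_medication(diagnosis_dates, medication_courses)
print(result)
```

In the approach described in the abstract, a check of this kind would sit behind a service that the query engine invokes on demand, so the user only states the temporal relation in the SPARQL query.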


Dynamic Enhancement of Drug Product Labels Through Semantic Web Technologies

Richard Boyce
University of Pittsburgh
Pittsburgh, United States

Abstract: FDA-approved drug product labeling (package insert or PI) is a major source of information intended to help clinicians prescribe drugs in a safe and effective manner. Unfortunately, drug PIs have been identified as often lagging behind the drug knowledge expressed in the scientific literature, especially when it has been several years since a drug was released to the market. Out-of-date or incomplete PI information can increase the risk of otherwise preventable adverse drug events. This can occur directly if the PI fails to provide information that is needed for safe dosing or to properly manage drugs known to interact. Clinicians might also be indirectly affected if they depend on third-party drug information sources, and these sources fail to add information that is available in the scientific literature but not present in the PI. We are creating a Linked Data store that will enable the drug PI to be expanded as new information becomes available in the scientific literature. The goal of the Linked Data store will be to provide clinicians, patients, and the maintainers of drug information resources with the most complete and up-to-date information on particular claims made within a PI. We are focusing on 25 currently marketed psychotropic medications (nine antipsychotics, twelve antidepressants, and four sedative hypnotics). To construct this Linked Data repository, we aim to use Natural Language Processing (NLP) technologies to identify core claims in the scientific literature and various web-based data sources that pertain to pharmacokinetic drug-drug interactions, age-related changes in clearance, metabolic clearance pathways, and genetic polymorphisms that can affect metabolism. This work aligns with the CSHALS themes "Linked Data", "Text Analysis, NLP, Question Answering", "Data Modeling: Ontologies, Taxonomies", and "Clinical Applications."
Method: We will identify the core rhetorical components of the content sources using a basic Scientific Discourse ontology constructed from (and compatible with) biomedical discourse ontologies (i.e., SWAN, OAC and AO) and discourse annotation metadata (specifically CoreSC). The ensuing discourse annotations will distinguish between facts, hypotheses, and evidence statements, and will be automatically recognised in text following an information extraction approach similar to conceptualisation zoning. The expected result is a Linked Open Data node, a triple store and a SPARQL endpoint available for use by different patient-, clinician-, and pharmacoepidemiology-centered data sources. Human-readable summaries will also be generated to expand on existing PI information. Results: While we are in the early planning phases of the project, we have built a prototype system that demonstrates the concept by identifying how claims on metabolic clearance and drug-drug interactions could be updated in two drug PIs with evidence from the scientific literature. Conclusions: We envision using the resulting Linked Data store as the back end for a system that provides pharmacokinetic information on age-related clearance changes, metabolic clearance pathways, pharmacokinetic drug-drug interactions, and genetic polymorphisms. After developing a demonstrator for the 25 psychotropics, we anticipate that it will be feasible to subsequently deploy our system for any given drug.


E-Diary Data Collection in Neurology and Psychiatry: Computational Achievements and Challenges

Ron Calvanio
Massachusetts General Hospital & Harvard Medical School
Cambridge, United States

Additional Authors: F. Buonanno, MD

Dr. Calvanio will present e-diary data recorded by patients undergoing outpatient treatment in a neurology clinic at the Massachusetts General Hospital. Patients had, or were suspected of having, a sudden-onset disorder such as a stroke or a traumatic brain injury. Patient symptom complaints were: sensory or motor spells, headaches, emotional outbursts, fatigue, sleep disturbances, cognitive lapses, or odd behavior. Personalized e-diaries were designed to identify routine events that may have influenced symptom expression. Identification of symptom influences was then used to resolve diagnostic issues and to enhance treatment outcomes. Dr. Calvanio will show: 1) how e-diary data reveal symptom influence patterns, many of which patients are not aware of; 2) how identification of these influences improves care; 3) what the computational challenges are in data coding, data analysis, and data pattern representation.


Chem2Bio2RDF: Linked Open Data for Drug Discovery

Bin Chen
Indiana University
Bloomington, United States

Abstract: A critical barrier in current drug discovery is the inability to utilize public datasets in an integrated fashion to fully understand the actions of drugs and chemical compounds on biological systems. There is a need not only for a resource to intelligently integrate the heterogeneous datasets pertaining to compounds, drugs, targets, genes, diseases, and drug side effects now available, but also for robust, effective network data mining algorithms that can be applied to such integrative data sets to extract important biological relationships. In this talk, we discuss (i) the creation of Chem2Bio2RDF for drug discovery data, integrating chemical compounds, protein targets, genes, metabolic pathways, diseases and side effects using Semantic Web technologies, and (ii) the development of innovative data mining algorithms to facilitate drug discovery. Chem2Bio2RDF incorporates 25 public datasets related to systems chemical biology, grouped into 6 domains: chemical (PubChem Compound, ChEBI, PDB Ligand); chemogenomics (KEGG Ligand, CTD Chemical, BindingDB, MATADOR, PubChem BioAssay, QSAR, TTD, DrugBank, ChEMBL, Binding MOAD, PDSP, PharmGKB); biological (UNIPROT, HGNC, PDB, GI); systems (KEGG Pathway, Reactome, PPI, DIP); phenotypes (OMIM, Diseasome, SIDER, CTD diseases); and literature (MEDLINE/PubMed). The number of RDF triples is approximately 110 million. We developed a domain ontology (called Chem2Bio2OWL) to better integrate these 25 datasets. The primary classes of this ontology are SmallMolecule, MacroMolecule, Disease, SideEffect, Pathway, BioAssay, Literature and Interaction, based partially on the BioPAX classes. The primary classes were further refined in accordance with the current instance data structure. We proposed and tested several graph mining and machine learning algorithms (e.g., Bio-LDA, path finding, subgraph mining and diversity ranking) on the generated Chem2Bio2RDF linked open dataset to facilitate drug discovery.
We found that our Bio-LDA model uses bio-terms, journal information and word information to characterize a topic, providing a better representation of topics than the simple LDA model, which can provide only the word representation. Rosiglitazone is one of several thiazolidinediones on the market for diabetes. Our path finding algorithm presents the set of most informative and diverse associations between the drug and its potential side effects, which shows different causes of the hepatitis side effect. Our constraint-based subgraph and diversity ranking algorithm can detect the inhibition of catechol O-methyltransferase (COMT) in Parkinson's disease. By combining information from DrugBank, PubChem and UniProt, we can find information regarding the gene that Tolcapone and Entacapone target, its name, the protein it encodes, PubMed articles related to their interaction with COMT, and the structure of the protein they target. In this talk, we demonstrate the potential of data mining and graph mining algorithms to identify hidden associations that could provide valuable directions for further exploration at the experimental level. In the future, we will focus on using the identified associations and paths between various bio-terms to predict potential connections among other, as yet unlinked, bio-terms.
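As a rough illustration of path finding between a drug and a side effect over linked data, the following Python sketch enumerates simple paths in a tiny triple set. It is not the Chem2Bio2RDF algorithm, which also ranks paths by informativeness and diversity; the identifiers, predicates and the `max_hops` parameter are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not Chem2Bio2RDF data.
triples = [
    ("drug:D1", "targets", "protein:P1"),
    ("drug:D1", "targets", "protein:P2"),
    ("protein:P1", "participatesIn", "pathway:W1"),
    ("pathway:W1", "associatedWith", "sideEffect:S1"),
    ("protein:P2", "associatedWith", "sideEffect:S1"),
]


def build_graph(edges):
    """Index outgoing (predicate, object) pairs by subject."""
    graph = defaultdict(list)
    for subject, predicate, obj in edges:
        graph[subject].append((predicate, obj))
    return graph


def find_paths(graph, source, target, max_hops=4):
    """Enumerate simple directed paths of at most max_hops edges, shortest first.

    Each path alternates nodes and predicates: [node, predicate, node, ...].
    """
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            paths.append(path)
            continue
        if (len(path) - 1) // 2 >= max_hops:
            continue
        for predicate, neighbour in graph.get(node, []):
            if neighbour not in path:  # avoid revisiting a node
                stack.append((neighbour, path + [predicate, neighbour]))
    return sorted(paths, key=len)


paths = find_paths(build_graph(triples), "drug:D1", "sideEffect:S1")
for path in paths:
    print(" -> ".join(path))
```

Each returned path is one candidate explanation linking the drug to the side effect; a ranking step would then choose which of them to present.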


Adverse Events Following Immunization: Standardization, Automatic Case Classification and Signal Detection

Mélanie Courtot
British Columbia Cancer Research Centre
Vancouver, Canada

Additional Authors:
Ryan R. Brinkman
BC Cancer Agency, Vancouver, BC, Canada
Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada

Alan Ruttenberg
University at Buffalo, Buffalo, NY, USA

Abstract: Analysis of spontaneous reports of Adverse Events Following Immunization (AEFIs) is an important way to identify potential problems in vaccine safety and efficacy and to summarize experience for dissemination to health care authorities. However, current reporting methods are not sufficiently controlled. While there is general adoption of the Medical Dictionary for Regulatory Activities (MedDRA) in the reporting systems we consider, definitions are not provided for MedDRA terms, reports are not annotated in a consistent manner, annotators differ in experience, and annotation is done either at entry time or post hoc. Sometimes only the final adverse event code is saved, discarding the evidence supporting the diagnosis. Because of these practices, interpretation of such spontaneous reports is tedious, costly and time-consuming. The Adverse Event Reporting Ontology (AERO) we are building plays a role in increasing the accuracy and quality of reporting, ultimately improving response time to adverse event signals. Methods: In order to address these deficiencies, we work with the Brighton Collaboration, which has done extensive work towards standardization of case definitions and diagnostic criteria for vaccine adverse events. Based on our initial results with AERO, a working group has been established within the Brighton network, including representation from the Public Health Agency of Canada (PHAC) and the US Food and Drug Administration (FDA), to incorporate logical representations of Brighton case definitions into AERO, with the aim of increasing the quality and accuracy of AEFI reporting. As an example, only 9% of the Vaccine Adverse Event Reporting System (VAERS) anaphylaxis reports following H1N1 vaccination in early 2010 were correctly annotated with the MedDRA anaphylaxis term.
Working within the framework being established by the Open Biological and Biomedical Ontologies (OBO) Foundry, the Adverse Event Reporting Ontology (AERO) first documents assessments of relevant signs and symptoms textually. These elements of AEFI reports are then logically defined by being positioned in a hierarchy and related to each other in a way that supports computing an overall diagnosis. Our system allows automatic inference of a diagnosis according to the Brighton criteria based on the evidence encoded in the MedDRA annotations. As an additional test of our approach, we will also attempt to parse the textual section of VAERS reports and annotate them with AERO terms, with the aim of using the logic encoded in AERO to determine diagnoses as defined in the Brighton guidelines. Results: Our approach allows us to refer unambiguously to a specific set of carefully defined signs and symptoms at the time of data entry, as well as to an overall diagnosis that remains linked to its associated signs and symptoms. The adverse event diagnosis is formally expressed, making it amenable to further querying, for example for statistical analysis ("what percentage of patients presented with motor manifestations?"), and at different levels of granularity. Finally, by enabling automatic processing of adverse event reports, we will decrease the time and money needed for their evaluation. This may allow earlier detection of adverse event signals in the datasets and trigger a warning for experts to investigate further.
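The idea of computing a diagnosis from encoded signs and symptoms, while keeping the diagnosis linked to its evidence, can be sketched in Python as follows. This is only an illustration: the finding names, the major/minor grouping and the level thresholds are hypothetical placeholders and are NOT the actual Brighton case definitions or AERO axioms.

```python
# Hypothetical finding groups; placeholders, not real Brighton criteria.
MAJOR_FINDINGS = {"finding:A", "finding:B", "finding:C"}
MINOR_FINDINGS = {"finding:X", "finding:Y"}


def classify(report_findings):
    """Return (certainty level, supporting evidence) for one report.

    The thresholds below are illustrative assumptions only.
    """
    findings = set(report_findings)
    major = sorted(findings & MAJOR_FINDINGS)
    minor = sorted(findings & MINOR_FINDINGS)
    evidence = major + minor
    if len(major) >= 2:
        return "level 1", evidence
    if len(major) == 1 and len(minor) >= 1:
        return "level 2", evidence
    if len(minor) >= 2:
        return "level 3", evidence
    return "insufficient evidence", evidence


level, evidence = classify(["finding:A", "finding:X", "finding:unrelated"])
print(level, evidence)
```

Because the evidence list is returned together with the level, later queries can still ask which findings supported a given classification.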


Executing Semantics Across Documents: Bringing Science Into Context

Anita de Waard
Disruptive Technology Director, Elsevier Labs
Utrecht, Netherlands

Abstract: In my presentation I will show how semantic technologies and Linked Data are forming the backbone of a new form of science publishing, where a paper is presented within three types of context. The first type of context is that of the research process. As we find better ways to integrate research data, executable components and workflow representations with the scientific narrative, we hope to add richness, depth and accountability to publications and improve the reader’s ability to evaluate and replicate the findings. The second type of context is that of the specific features of the object of study such as patient characteristics for clinical reports, species and subspecies for animal studies, or other experimental parameters such as instrumentation details. The third type of context we wish to enable the reader to have access to is the knowledge preceding and succeeding a given paper. By identifying the key claims the authors make and linking them to their supporting evidence both within and across papers, we hope to allow an infrastructure that will enable more straightforward ways of assessing trust and validity when assessing new information. I will demonstrate these principles with three use cases which we are working on together with our academic collaborations, pertaining to clinical guidelines, drug-drug interactions, and neuroscientific knowledge integration.

Some related references:

Bourne P, Clark T, Dale R, de Waard A, Herman I, Hovy E and Shotton D, on behalf of the Force11 community (2011). Force11 White Paper: Improving the Future of Research Communication and e-Scholarship. 27 October 2011. Available from http://force11.org/
de Waard, A. (2010). ‘The Future of the Journal? Integrating research data with scientific discourse’, LOGOS: The Journal of the World Book Community, Volume 21, Numbers 1-2, 2010, pp. 7-11.
de Waard, A. (2010). From Proteins to Fairytales: Directions in Semantic Publishing. IEEE Intelligent Systems 25(2): 83-88 (2010).
de Waard, A., Buckingham Shum, S., Carusi, A., Park, J., Samwald, M. and Sándor, Á. (2009). Hypotheses, Evidence and Relationships: The HypER Approach for Representing Scientific Knowledge Claims, Proceedings of the Workshop on Semantic Web Applications in Scientific Discourse (SWASD 2009), co-located with the 8th International Semantic Web Conference (ISWC-2009).


The VIVO Ontology: Enabling Networking of Scientists

Ying Ding
Indiana University
Bloomington, United States

Abstract: VIVO, funded by NIH, utilizes Semantic Web technologies to model scientists and provides federated search to enhance the discovery of researchers and collaborators across disciplines and organizations. The VIVO ontology is designed with a focus on modeling scientists, publications, resources, grants, locations, and services. It incorporates classes from popular ontologies, such as BIBO, Dublin Core, Event, FOAF, geopolitical, and SKOS. VIVO data is annotated based on the VIVO ontology to semantically represent and integrate information about faculty research (i.e., educational background, publications, expertise, grants), teaching (i.e., courses, seminars, training), and service (i.e., organizing conferences, editorial boards, other community services). The VIVO ontology has been adopted nationally and internationally, and enables national and international federated search for finding experts. VIVO is an open source Semantic Web application that, when populated with researcher interests, activities, and accomplishments, enables discovery of research and scholarship across disciplines and organizations. The VIVO core ontology models the academic community in order to provide a consistent and connected perspective on the research community to various stakeholders, including students, administrative and service officials, prospective faculty, donors, funding agencies, and the public. The major impetus for NIH to fund the VIVO effort is to "develop, enhance, or extend infrastructure for connecting people and resources to facilitate national discovery of individuals and of scientific resources by scientists and students to encourage interdisciplinary collaboration and scientific exchange".
The application is in use at the seven institutions of the NIH VIVO project and has been adopted or is to be adopted by several other universities (e.g., Harvard University) and organizations in the USA (e.g., the United States Department of Agriculture), and by several universities or institutions in Australia and China (e.g., Queensland University of Technology, Chinese National Academy of Sciences) (Gewin, 2009). More specifically, VIVO can support discovering potential collaborators with complementary expertise or skills, suggesting appropriate courses, programs, and faculty members according to students’ interests, and facilitating research currency, maintenance and communication. For example, a Computer-Aided Drug Discovery (CADD) group may want to find and team up with a computer specialist and a group using in vivo experiments in drug discovery. If the VIVO core ontology is implemented in this hypothetical situation, the group leader can search across experts in computer science and molecular biology. In this paper, we present a relatively comprehensive discussion of the development of the VIVO core ontology, including the latest updates.


sdlink: An Integrated System for Linking Biological and Biomedical Semantic Data

Alexandre Francisco
Technical University of Lisbon
Lisboa, Portugal

Abstract: Nowadays, with the decreasing cost and increasing availability of high-throughput technologies, an enormous amount of biological and biomedical data is becoming available. Such data is usually represented and stored in different formats and platforms, most of the time offline and not standardized. The automatic integration of data from different databases suffers from several caveats, the most notable being the lack of interfaces for automatic querying and for running integration and analysis tools. In order to solve some of these issues, semantic technologies have been proposed and used with great success. In this work we propose an integrated environment for querying, retrieving and analyzing linked data, suitable for users unfamiliar with such technologies, solving an issue that has been hindering a more widespread adoption of semantic methodologies in biology and biomedicine. Method: The new sdlink system (http://kdbio.inesc-id.pt/sdlink) assumes that data is annotated following a given ontology and provides data views, including graphical representations, and a friendly querying interface. The querying interface was developed for users unfamiliar with semantic technologies: one can, for instance, define a query by means of a simple point-and-click interface, and the query is then translated to SPARQL. The sdlink system uses Virtuoso OSE as the underlying triplestore. To address user concerns with respect to security and privacy, the system supports user/project access control, based on OpenID for authentication and FOAF+WAC for authorization. The system is being used by two FP7 European projects, with good results with respect to both scalability and usability by non-expert users. We have also made available a public project for evaluation purposes (http://kdbio.inesc-id.pt/sdlink/lubm/). Results: Our results were twofold.
First, through the development and deployment of sdlink, we were able to use semantic technologies and linked data on two large projects where most people were unaware of these technologies or of any reason to use them. The main contribution was an interface that simultaneously allows users to retrieve and query linked data, and does not lose expressiveness, efficiency or scalability. In particular, the system adapts itself to ontology changes and data transformations, depending only on the update of the underlying ontologies. The projects where the system was tested deal with heterogeneous data, including sequence data and experiment results, produced by several teams and work packages, which in the end became integrated, browsable and queryable. The data stored comprises about one million triples, which can be queried in less than one second for most usual queries. Conclusions: The development of sdlink, and its deployment in a real scenario, allowed us to draw conclusions about the importance and usefulness of semantic technologies, namely for domain representation and data integration. More importantly, it was possible to show that, by developing suitable interfaces, any user can benefit from such technologies. Currently, the unfriendliness of most semantic technologies, in particular in the fields of biology and biomedicine, has hindered the adoption of these technologies. The sdlink system is proposed to overcome this problem and to bring semantic technologies and linked data to a broader audience.
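The translation from point-and-click selections to SPARQL described for the querying interface can be sketched in Python roughly as follows. This is not sdlink's implementation; the function name `build_select_query`, the example IRIs under `http://example.org/` and the restriction to literal-valued filters are assumptions made for illustration.

```python
def escape_literal(value):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_select_query(class_iri, property_filters, limit=100):
    """Build a SPARQL SELECT query from a chosen class and (property, value) pairs.

    class_iri and the property IRIs stand in for what a user would pick
    by clicking on ontology terms in a graphical interface.
    """
    lines = ["SELECT ?item WHERE {", f"  ?item a <{class_iri}> ."]
    for property_iri, value in property_filters:
        lines.append(f'  ?item <{property_iri}> "{escape_literal(value)}" .')
    lines.append("}")
    lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


query = build_select_query(
    "http://example.org/onto#Experiment",  # hypothetical class
    [("http://example.org/onto#organism", "E. coli")],  # hypothetical filter
    limit=10,
)
print(query)
```

The resulting string could then be sent to any SPARQL endpoint; a fuller translator would also need to handle IRI-valued filters, optional patterns and joins across classes.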


Exploitation of Semantic Methods to Cluster Pharmacovigilance Terms

Natalia Grabar
Universite Lille
Villeneuve d’Ascq, France

Abstract: Pharmacovigilance is concerned with the collection, analysis and prevention of adverse drug reactions (ADRs) likely to be caused by drugs. This activity relies on case reporting to the pharmacovigilance authorities and the pharmaceutical industry. Before their inclusion in pharmacovigilance databases, the ADRs in these case reports are coded with terms from dedicated terminologies, such as MedDRA. The analysis of the collected ADRs supports safety surveillance within these databases. It relies on the identification of relations between a drug and an ADR. It has been observed that some {drug, ADR} pairs are not detected when they should be. The main cause is that MedDRA is a fine-grained terminology and that the encoding of adverse reactions with MedDRA may contribute to signal dilution: similar and close ADRs may be encoded with different terms, so that during the analysis of the databases they remain isolated and the safety risk may be underestimated. METHODS. We propose to exploit semantic resources and methods provided by Natural Language Processing and by Computer Science for the automatic generation of clusters of MedDRA terms that have close semantic and clinical meaning. We exploit the ontological resource ontoEIM and MedDRA terms. The SMQs are used as the gold standard. Among the methods, we use semantic distance approaches, lexically based methods for the detection of hierarchical and synonymy relations between terms, as well as several clustering methods. The obtained clusters of terms are compared with the existing SMQs, both hierarchical and non-hierarchical. The results are evaluated with three metrics: precision, recall and F-measure. Results are evaluated quantitatively (against the gold standard) and qualitatively (by medical and pharmacovigilance experts).

RESULTS. Various factors have been tested: exploitation of formal definitions, several semantic distance approaches, weighting of the semantic axes within formal definitions, clustering methods, and comparison and combination of semantic distance and lexical methods. Our results indicate that the generated clusters can assist the creation of new SMQs or the hierarchical structuring of terms within SMQs. Depending on the SMQ, we obtain interesting results with the semantic distance approach (precision between 36% and 87%, recall between 15% and 77%) and with the lexical approach (precision between 10% and 92%, recall between 3% and 33%). Moreover, these two methods provide complementary results: a given safety topic is better modeled with one or the other of the methods. For instance, the generation of the Agranulocytosis cluster has poor results with the semantic distance approach because the relevant terms are spread across the ontoEIM resource. However, this grouping shows high performance with the lexical method because the relevant terms have semantic similarities that can be detected at the lexical level. CONCLUSION. The experiments performed indicate that it is possible to generate meaningful clusters of terms on new safety topics in order to assist the creation of new SMQs. The methods used can also be exploited for the refinement of the hierarchical structure of the existing SMQs.
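For reference, the three evaluation metrics named above can be computed for one generated cluster against one gold-standard term set as in the following Python sketch. The term identifiers are hypothetical, and the balanced F-measure (harmonic mean of precision and recall) is assumed, since the abstract does not state a weighting.

```python
def evaluate_cluster(cluster, gold):
    """Return (precision, recall, f_measure) of a cluster against a gold set."""
    cluster, gold = set(cluster), set(gold)
    true_positives = len(cluster & gold)
    precision = true_positives / len(cluster) if cluster else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical term identifiers, for illustration only.
generated_cluster = ["term-1", "term-2", "term-3", "term-4"]
gold_standard = ["term-1", "term-2", "term-5", "term-6",
                 "term-7", "term-8", "term-9", "term-10"]

precision, recall, f_measure = evaluate_cluster(generated_cluster, gold_standard)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

Here two of the four clustered terms appear in the gold standard, so precision is 0.50, while only two of the eight gold terms were recovered, so recall is 0.25.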


The Biospecimen Repository as Library: How HeLa is like Moby Dick

James McCusker
Rensselaer Polytechnic Institute
Troy, United States

Abstract: Provenance-oriented data models are becoming critical for fostering interoperability among scientific workflow systems. Tools to manage laboratory systems and biorepositories record the actions of people and equipment in order to keep track of exactly what has happened to experimental artifacts. We explore the similarities between the library science standard Functional Requirements for Bibliographic Records (FRBR) and the requirements for biospecimen management in research settings. Abstractive provenance, or the ability to describe entities and their history at multiple levels of abstraction, is used in FRBR to describe the relationship between a particular copy of a book and the concept of that book. A similar treatment can describe the relationship between a cell line, various physical colonies of cells from that cell line, and the originating organisms. We propose a similar standard, Functional Requirements for Biological Resources (FRBioR), to describe those requirements, together with an ontology that integrates with the W3C draft PROV provenance model and ontology.


Using Ontologies in the Age-Phenome Knowledge-base (APK)

Eitan Rubin
Ben Gurion University
Beer Sheva, Israel

Abstract: The importance of age in biomedical research and clinical care has resulted in an abundance of publications linking age and phenotypes. However, these data are organized such that searching for age-phenotype relationships is prohibitively difficult. Recently, we described the Age-Phenome Knowledge-base (APK), a computational platform for storage and retrieval of information concerning age-related phenotypic patterns. Here we present and discuss the incorporation and use of ontologies and standardized vocabularies in the APK. Methods and results: The Age-Phenome Knowledge-base contains evidence, such as scientific publications and clinical data analysis, connecting specific ages or age groups and phenotypes such as diseases. It makes extensive use of ontologies and fixed vocabularies in order to describe ages, diseases and other forms of phenotypes. Ages and age groups are described using the Age Ontology, a simple ontology developed for this purpose and based on the description of age-ranges in the Medical Subject Headings (MeSH). The Disease Ontology (DO) is used in APK to represent diseases while other forms of phenotypes are described by a subset of the Unified Medical Language System (UMLS) Metathesaurus. Complex searches are made possible by abstracting over the Age Ontology and the Disease Ontology's hierarchical structures. Conclusions: APK provides an example of how ontologies can be used in rapid development of new knowledge models. It makes integral use of ontologies and vocabularies to represent diseases and age groups in a standard, unambiguous way. Furthermore, the use of ontologies allows abstraction, which in turn makes it easy to develop/conduct complex queries.
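The kind of abstraction over ontology hierarchies mentioned above can be sketched in Python: a query for a general term is expanded to all of its descendants before matching evidence records. The hierarchies, term names and evidence records below are hypothetical stand-ins, not content of the Age Ontology, the Disease Ontology or the APK.

```python
from collections import defaultdict

# Hypothetical is-a hierarchies (child -> parent), for illustration only.
AGE_PARENT = {
    "infant": "child",
    "adolescent": "child",
    "child": "all ages",
    "adult": "all ages",
    "aged": "adult",
}
DISEASE_PARENT = {
    "disease:A1": "disease:A",
    "disease:A": "disease:root",
}

# Hypothetical evidence records: (evidence id, age term, disease term).
EVIDENCE = [
    ("evidence-1", "infant", "disease:A1"),
    ("evidence-2", "aged", "disease:A"),
    ("evidence-3", "adolescent", "disease:root"),
]


def expand(term, parent_of):
    """Return the term together with all of its descendants."""
    children = defaultdict(set)
    for child, parent in parent_of.items():
        children[parent].add(child)
    found, todo = set(), [term]
    while todo:
        current = todo.pop()
        if current not in found:
            found.add(current)
            todo.extend(children[current])
    return found


def search(age_term, disease_term):
    """Find evidence whose age and disease fall under the given terms."""
    ages = expand(age_term, AGE_PARENT)
    diseases = expand(disease_term, DISEASE_PARENT)
    return [eid for eid, age, disease in EVIDENCE
            if age in ages and disease in diseases]


print(search("child", "disease:A"))
```

A query for the general term "child" matches evidence annotated with the more specific term "infant", which is the benefit of querying through the hierarchy rather than by exact term.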


Using Linked Open Data to Inform the Drug Discovery Process

James Snowden
UCB Celltech
Slough, United Kingdom

Abstract: The treatment of disease and the identification of new targets via which the symptoms or causes of disease can be treated are among the cornerstones of drug discovery research in the pharmaceutical sector. Whilst much of the information for these areas is available, it is distributed across many systems both internally and externally. Therefore, the main issue with gathering the required information is actually one of time and resource. In response to this, at UCB we have developed the Target Information Platform (TIP) and Disease Information Platform (DIP) systems to collate key information relating to targets and diseases respectively and make this available in a single portal for easy access by our scientists. This approach is underpinned by the capabilities provided by semantic technology and in particular Linked Open Data (LOD), which allows complete querying of available data sources in a quick and automated manner. The public LOD system, which comprises SPARQL endpoints over key biological data sources, is queried using SPARQLMotion scripts through the TopBraid Composer system. A script takes in a single data item (UniProt ID for a target, disease name for a disease), which is used to pull data from an initial endpoint (UniProt / Diseasome). The results are parsed and, where relevant, additional calls are made out to endpoints for other data sources. The end result is an RDF data package that collates relevant information from multiple sources in the public domain. Additionally, for DIP, literature, patent and omics data are queried and stored in a triplestore. Web pages are generated from this information and provided to the scientists. The key benefits derived from this approach so far have been speed, completeness of data searching and increased availability of target / disease information. A target search that may have taken 2-3 days of work for 1-2 scientists can now be done in 5-10 minutes.
This frees up scientist time, provides target information faster and allows many more targets to be queried. The data searching is done in a standardised manner, with human error removed and with more consistency in the data returned for targets and diseases. Finally, providing the information in a central portal means that scientists always know where to go to access it. All of these benefits are in some way related to the semantic / LOD approach used. Disadvantages of this approach are mostly related to technical issues of endpoint uptime and availability, and to the updating of information within the endpoints. This work has demonstrated that it is possible to utilise the public LOD framework in an automated manner that exemplifies the linked data principle of starting from a single point of information to gather detailed data. It has returned information relating to key concepts of vital importance to drug discovery, has helped to optimise this process at UCB, and has demonstrated practical utility for semantics and LOD.


Domain Knowledge and Provenance-Integrated Knowledge Organization System Represented with RDFS and SPARQL

Young Soo Song
University of Alabama at Birmingham
Birmingham, United States

Abstract: Objectives and Motivation: Although semantic web technologies are expanding as a framework for constructing knowledge organization systems (KOS), their adoption will be limited unless data flow is controlled on the basis of rules and consensus. We have previously addressed this problem by defining a Markov process for user operators associated with a KOS, S3DB. The idea was that by annotating existing assertions with the domain-neutral S3DB tags, the user-operator states describing the provenance could then be tracked by a parallel algebraic process. That solution includes a mechanism for resolving both the merging and the migration of multiple, often conflicting, provenance. This mechanism is currently supported by an open source prototype (s3db.org) with a SPARQL endpoint and a query language, S3QL. Although the core concept of S3DB includes both a domain knowledge model and a provenance model, and its implementation is currently used in several institutions, the two models are loosely coupled because the domain knowledge model was expressed as RDFS and the provenance model as numeric computation. In this study we seek to bridge the logic and algebraic representations by describing user operators as an RDFS model such that the integrated representation can be resolved by a SPARQL 1.1 engine. Method: The semantics of the S3DB provenance model were thoroughly analyzed and represented as a semantically equivalent SPARQL query. While the S3DB domain knowledge model is a pure RDFS model, its provenance model is a numerical model, which receives arguments from asserted RDF triples and produces outputs as inferred triples through successive procedures. In this model, asserted triples represent assigned user-operator relationships between users and entities, and new relationships between them are inferred through numerical procedures. Semantically reinforced SPARQL 1.1 can simulate these numerical procedures.
In particular, propagation of user operators corresponded to SPARQL designed with property paths, and merging to SPARQL designed with aggregation functions. These pieces could be assembled into a single SPARQL query containing a subquery. Although merging was performed during propagation of user operators in the original numerical model, our SPARQL model performed merging after propagation was completed, producing the same results while not being affected by the order of the procedures. Results: As a proof of concept, our integrated model was implemented with ARQ version 2.8.8, although any triple store or application supporting SPARQL 1.1 should deliver the same results. The provenance model was tested on The Cancer Genome Atlas (TCGA) data, containing microarray and sequencing data for over 500 cancer patients. Verifying the validity of our model needs only three steps: 1) installing ARQ, or an equivalent application supporting SPARQL 1.1; 2) importing the knowledge model and the test TCGA data; and 3) executing the query representing our provenance model. Effective user operators inferred from the query could be stored in namespaces separate from the assigned user operators. Conclusions: As a consequence, it is argued, complex provenance scenarios can be accommodated by data stores equipped with a SPARQL endpoint. This result signifies that the proposed solution can be handled in a scalable and distributed manner by regular triple stores.
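
The propagate-then-merge idea in this abstract can be illustrated with a toy Python sketch. It is not the S3DB model or the authors' SPARQL query: the entity hierarchy, the numeric permission levels and the use of max as the merging rule are all assumptions chosen only to show why merging after propagation does not depend on the order of the assignments.

```python
from typing import Dict, List, Set, Tuple

Assignment = Tuple[str, str, int]  # (user, entity, level); levels are hypothetical


def descendants(children: Dict[str, List[str]], root: str) -> Set[str]:
    """Return root plus everything reachable through child links.

    This plays the role that a zero-or-more property path plays in SPARQL 1.1.
    """
    seen = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def effective_operators(
    children: Dict[str, List[str]], assigned: List[Assignment]
) -> Dict[Tuple[str, str], int]:
    """Propagate every assignment first, then merge per (user, entity)."""
    # Step 1: propagation down the hierarchy.
    propagated: List[Assignment] = []
    for user, entity, level in assigned:
        for target in descendants(children, entity):
            propagated.append((user, target, level))

    # Step 2: merging, here an aggregate (max) grouped by user and entity.
    merged: Dict[Tuple[str, str], int] = {}
    for user, target, level in propagated:
        key = (user, target)
        merged[key] = max(level, merged.get(key, level))
    return merged


# Made-up hierarchy and assignments.
hierarchy = {"project": ["collection"], "collection": ["item"]}
assignments = [("alice", "project", 1), ("alice", "collection", 2)]

result = effective_operators(hierarchy, assignments)
reordered = effective_operators(hierarchy, list(reversed(assignments)))
print(result == reordered)
```

Because the merge step is a commutative aggregate applied to the complete set of propagated values, reordering the input assignments cannot change the outcome, which mirrors the order-independence claimed for the SPARQL formulation.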

Spo: An Ontology for Describing Host-pathogen Interactions Inherent to Streptococcus Pneumoniae Infections

Cátia Vaz
Poly Institute of Lisbon
Lisbon, Portugal

Abstract: Over the past twenty years, the study of infection has tended to consider individual virulence factors or host factors. The Pneumopath project (www.pneumopath.org/), an FP7 European research project, has the objective of studying the host-pathogen interactions during infection by Streptococcus pneumoniae and finding new targets for diagnosis and treatment. This research aims to identify the most important and consistently involved host and pneumococcal factors, in contrast to previous approaches, where factors were studied in isolation. The transmission of Streptococcus pneumoniae to a new host can result in asymptomatic colonization or progress to invasive disease. The infection can be determined by multiple attributes of both host and pathogen, so it is important to take into account the epidemiological and genomic characterization of pneumococcal strains, the results from experiments that evaluate host or pneumococcal responses to infection or different environmental challenges, and also the results from experiments that identify host genetic susceptibility factors. In this work we propose Spo (http://kdbio.inesc-id.pt/~cvaz/pneumopath/), an ontology developed in the context of the Pneumopath project, which provides terms and semantic constructs for annotating all aspects of host-pneumococcal interactions. Method: The data considered include the characterization of pneumococcal strains, typing information, and data from in vitro and in vivo experiments with animals and cell models, relevant for identifying new targets to combat pneumococcal diseases. Some of these data are scattered across numerous information systems and repositories, each with its own terminologies, identifier schemes, and data formats. The need to share such data brings challenges for both data management and annotation, such as the need for a common understanding of the concepts that describe host-pneumococcal interactions.
Thus, semantic annotation and interoperability become an absolute necessity for the integration of such diverse biomolecular data. Moreover, given the heterogeneous environment inherent to the project, the ontology construction took into consideration contributions from all partners, leading to a well-grounded set of concepts and annotations. Results: Spo provides a framework to represent the host-pneumococcal interactions mentioned above, and is flexible enough to accommodate the rapid changes and advancement of research and to achieve data interoperability and interchange. This has only been possible because of Semantic Web recommended practices for clearly specifying names for things and relationships, and for expressing data using standardized and well-specified knowledge representation languages. The ontology, described in OWL Lite v1.0, includes 36 classes, 24 object properties and 43 data properties. Conclusion: The main contribution of this work is not only Spo, but also the whole approach and methodology for its construction in the context of a large research project, where many people were not aware of semantic technologies. The proposed ontology not only describes knowledge in this field, but also allows existing knowledge to be validated and aggregated, which is essential for data integration. Furthermore, the ability to accurately describe host-pneumococcal interactions through the use of Spo has facilitated the implementation of information systems capable of coping with the heterogeneous types of data and, by using well-known semantic technologies, has allowed users to query data and discover new knowledge.

Annotation Analysis for Testing Drug Safety Signals

Trish Whetzel
Stanford University
Stanford, United States

Abstract: Introduction: Changes in biomedical science, public policy, and electronic health record (EHR) adoption have converged recently to enable a transformation in health care. While analyzing structured EHRs has proven useful in different contexts, the true richness and complexity of health records (roughly 80 percent) lies within the free-text clinical notes, and it is crucial to develop methods to test for drug safety signals throughout the EHR. Using ontology-based approaches, we computed the risk of having a myocardial infarction (MI) on taking Vioxx for rheumatoid arthritis (RA) using the annotations created on the textual notes for over 1 million patients in the Stanford Clinical Data Warehouse (STRIDE). Methods: Based on the NCBO Annotator Web service, we created a standalone NCBO Annotator Workflow that is highly optimized for both time and space. The workflow was extended to incorporate negation detection and the concept recognizer Unitex, and it uses ontologies from BioPortal. To reproduce the risk of MI following Vioxx treatment, we identified patients in STRIDE with a pattern of RA, who are taking Vioxx, and then suffer MI. To identify patients with RA and MI, we scanned through structured data of 25 million coded ICD9 diagnoses for codes beginning with the ICD9 codes for RA and MI. We also scanned through the normalized annotations of the unstructured data to look for non-negated mentions of MI and RA. We denote the first occurrence or mention of the condition as t0(RA) and t0(MI). We did not have access to the structured medication data; therefore, we relied upon annotations derived from the textual notes to identify patients taking Vioxx. We scanned through the normalized annotations of the unstructured data to look for non-negated mentions of Vioxx or rofecoxib. We denote the first occurrence or mention of the drug as t0(Vioxx).
Results: The Annotator Workflow was optimized for both time and space and processed 9.5 million patient notes in 7 hours using 4.5 GB of disk space. From the observed patient counts, we constructed a contingency table and obtained a reporting odds ratio (ROR) of 2.058 with a confidence interval (CI) of [1.804, 2.349] and a proportional reporting ratio (PRR) of 1.828 with a CI of [1.645, 2.032]. The uncorrected χ² statistic was significant, with a p-value < 10^-7. In comparison, without the unstructured data and using only the ICD9 coded data, the results were more ambiguous. The corresponding risks without the unstructured data were: ROR=1.524 with CI=[0.872, 2.666]; PRR=1.508 with CI=[0.8768, 2.594]; and χ²=0.06816. Conclusions: We have significantly scaled the NCBO Annotator Workflow to computationally annotate the free-text narrative of over 9.5 million reports from STRIDE. Our results demonstrate that unstructured data in the EHR provide a viable source for testing drug safety signals using annotations created from the textual notes. Our analysis recapitulated the latent Vioxx risk signal and found that the risk is far more perceptible when ontology-based analysis methods of unstructured data in the EHR are used than when coded data alone are used.
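
The two disproportionality measures reported above can be computed from a 2x2 contingency table as in the following Python sketch. The abstract does not give the underlying patient counts or state how the confidence intervals were derived, so the counts below are invented and the 95% Wald interval on the log scale is an assumption; the function and field names are likewise hypothetical.

```python
import math
from typing import NamedTuple


class Estimate(NamedTuple):
    value: float
    ci_low: float
    ci_high: float


def _with_ci(value: float, se_log: float, z: float = 1.96) -> Estimate:
    """Attach a Wald confidence interval computed on the log scale."""
    log_value = math.log(value)
    return Estimate(value, math.exp(log_value - z * se_log), math.exp(log_value + z * se_log))


def reporting_odds_ratio(a: int, b: int, c: int, d: int) -> Estimate:
    """ROR = (a*d)/(b*c).

    a: exposed with event      b: exposed without event
    c: unexposed with event    d: unexposed without event
    """
    ror = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return _with_ci(ror, se_log)


def proportional_reporting_ratio(a: int, b: int, c: int, d: int) -> Estimate:
    """PRR = [a/(a+b)] / [c/(c+d)], same cell layout as above."""
    prr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return _with_ci(prr, se_log)


# Invented counts, for illustration only (not the STRIDE data).
ror = reporting_odds_ratio(20, 80, 10, 90)
prr = proportional_reporting_ratio(20, 80, 10, 90)
print(f"ROR={ror.value:.3f} CI=[{ror.ci_low:.3f}, {ror.ci_high:.3f}]")
print(f"PRR={prr.value:.3f} CI=[{prr.ci_low:.3f}, {prr.ci_high:.3f}]")
```

A signal is conventionally read as present when the lower bound of the interval lies above 1, which is why the intervals reported with the unstructured data are described as clearer than those from the coded data alone.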

Keynote Speakers

Updated February 24, 2012

	Rinke Hoekstra, PhD Knowledge Representation & Reasoning Group VU University Amsterdam Leibniz Center for Law University of Amsterdam Netherlands Presentation Title: The Knowledge Reengineering Bottleneck
	Isaac (Zak) Kohane, MD, PhD Director, Children’s Hospital Informatics Program Henderson Professor of Pediatrics and Health Sciences and Technology Harvard Medical School (HMS) Co-Director, HMS Center for Biomedical Informatics Director of the HMS Countway Library of Medicine Cambridge, MA, USA Presentation Title: SMArt Semantics for Clinical Healthcare Delivery
	Barend Mons, PhD Scientific Director Netherlands Bioinformatics Center Biosemantics Group Leader, Leiden University Medical Centre Netherlands Presentation Title: E-Science Dictates E-Publication - Nanopublications as a Substrate for In-silico Knowledge Discovery
	Chris Welty, PhD IBM Research Scientist T.J. Watson Research Center New York, USA Presentation Title: Inside the Mind of Watson
	Stephen Wolfram, PhD Founder & CEO Wolfram Research Champaign, IL - USA Presentation Title: Wolfram|Alpha and the Quest for Computational Knowledge

Discussion Questions

Please check back for updates.

Full Agenda*

(*Updated April 20, 2012. Schedule subject to change.)

Wednesday – February 22, 2012
7:30 am - 1:00 pm	Registration
8:30 am - 10:30 am	RPI-led Hands-on Tutorial - Semantic Healthcare and Life Sciences Tutorial: Mashing HC and LS Data CSHALS Tutorial Coordinator: Lee Feigenbaum, Cambridge Semantics RPI Tutorial Coordinator: Joanne Luciano Presenters: Dominic DiFranzo, Presentation (.pdf); Jim McCusker, d3 Tutorial (web); Joshua Shinavier, http://services.fortytwo.net/linked-data, Linked Data diagram (.pdf) Location: Charles Suite - 2nd Floor, Royal Sonesta Hotel
10:30 am - 11:00 am	Break
11:00 am - 12:15 pm	RPI-led Hands-on Tutorial (continues)
12:15 pm - 1:00 pm	Lunch
1:00 pm - 3:00 pm	RPI-led Hands-on Tutorial (continues)
3:00 pm - 3:15 pm	Break
3:15 pm - 5:00 pm	RPI-led Hands-on Tutorial (continues)
4:00 pm - 7:00 pm	Registration
4:00 pm - 5:00 pm	Poster (Author) Set-up
5:00 pm - 7:00 pm	Poster Reception

Thursday – February 23, 2012
7:30 am - 10:00 am	Registration
7:30 am - 8:30 am	Breakfast (continental)
9:00 am - 9:15 am	Welcome & Overview from Conference Chairs: Jill Mesirov (ISCB Board Member) Presentation (.pdf) Mike Bevil and Joanne Luciano Presentation (.pdf)
9:15 am - 10:00 am	Keynote 1 SMArt Semantics for Clinical Healthcare Delivery Dr. Isaac Kohane Director, Children’s Hospital Informatics Program Henderson Professor of Pediatrics and Health Sciences and Technology Harvard Medical School (HMS) Co-Director, HMS Center for Biomedical Informatics Director of the HMS Countway Library of Medicine Cambridge, MA - USA Presentation (.pdf)
10:05 am - 11:00 am	Literature	Executing Semantics Across Documents: Bringing Science Into Context Presenter: Anita de Waard Disruptive Technology Director, Elsevier Labs Utrecht, Netherlands
		The Biospecimen Repository as Library: How HeLa is like Moby Dick Presenter: James McCusker Rensselaer Polytechnic Institute Troy, USA Presentation (web)
11:00 am - 11:20 am	Break
11:20 am - 11:40 am	Tech Talk 1	Javascript - The Key to Successful SemWeb Deployments Presenter: Jans Aasman Franz Inc. Oakland, CA, USA
11:45 am - 12:05 pm	Tech Talk 2	Managing Bigdata® in Bioinformatics Presenter: Bryan Thompson SYSTAP LLC Greensboro, NC USA Presentation (.pdf)
12:05 pm - 1:15 pm	Lunch
1:15 pm - 2:00 pm	Keynote 2 E-Science Dictates E-Publication - Nanopublications as a Substrate for In-silico Knowledge Discovery Dr. Barend Mons Scientific Director Netherlands Bioinformatics Center Biosemantics Group Leader, Leiden University Medical Centre Netherlands
2:05 pm - 3:30 pm	Community/Knowledge Management	The VIVO Ontology: Enabling Networking of Scientists Presenter: Ying Ding Indiana University Bloomington, USA Presentation (.pdf)
		Domain Knowledge and Provenance-Integrated Knowledge Organization System Represented with RDFS and SPARQL Presenter: Young Soo Song Birmingham, USA Presentation (.pdf)
		E-Diary Data Collection in Neurology and Psychiatry: Computational Achievements and Challenges Presenter: Ron Calvanio Massachusetts General Hospital & Harvard Medical School Cambridge, USA Presentation (.pdf)
3:30 pm - 3:45 pm	Break
3:45 pm - 5:10 pm	Drug Discovery, Linked Data	Target Identification Using an Integrated Subset of the Yeast Interactome with Chemical Genomic Data in RDF Presenter: Nadia Anwar General Bioinformatics Reading, United Kingdom Presentation (.pdf)
		Chem2Bio2RDF: Linked Open Data for Drug Discovery Presenter: Bin Chen Bloomington, USA Presentation (.pdf)
		Using Linked Open Data to Inform the Drug Discovery Process Presenter: James Snowden UCB Celltech Slough, United Kingdom Presentation (.pdf)
5:15 pm - 6:00 pm	Keynote 3 Title: Inside the Mind of Watson Dr. Chris Welty IBM Research Scientist T.J. Watson Research Center New York, USA
6:00 pm - 6:05 pm	Daily Closing Remarks	Mike Bevil and Joanne Luciano, Conference Chairs

Friday – February 24, 2012
7:30 am - 10:00 am	Registration
7:30 am - 8:30 am	Breakfast (continental)
8:30 am - 8:45 am	Review - Previous Day Mike Bevil and Joanne Luciano, Conference Chairs Presentation (.pdf)
8:45 am - 9:30 am	Keynote 4 The Knowledge Reengineering Bottleneck Dr. Rinke Hoekstra Knowledge Representation & Reasoning Group VU University Amsterdam Leibniz Center for Law, University of Amsterdam Netherlands Presentation (.pdf)
9:30 am - 10:25 am	Semantics in Research	sdlink: An Integrated System for Linking Biological and Biomedical Semantic Data Presenter: Alexandre Francisco Technical University of Lisbon Lisboa, Portugal
		Spo: An Ontology for Describing Host-pathogen Interactions Inherent to Streptococcus Pneumoniae Infections Presenter: Cátia Vaz Poly Institute of Lisbon Lisbon, Portugal
10:25 am - 10:40 am	Break
10:40 am - 12:35 pm	Pharmacovigilance & Safety	Dynamic Enhancement of Drug Product Labels Through Semantic Web Technologies Presenter: Richard Boyce University of Pittsburgh Pittsburgh, USA Presentation (.pdf)
		Adverse Events Following Immunization: Standardization, Automatic Case Classification and Signal Detection Presenter: Mélanie Courtot British Columbia Cancer Research Centre Vancouver, Canada Presentation (.pdf)
		Exploitation of Semantic Methods to Cluster Pharmacovigilance Terms Presenter: Natalia Grabar Universite Lille Villeneuve d’Ascq, France Presentation (.pdf)
		Annotation Analysis for Testing Drug Safety Signals Presenter: Trish Whetzel Stanford University Stanford, USA Presentation (.pdf)
12:35 pm - 1:30 pm	Lunch
1:30 pm - 2:15 pm	Keynote 5 Wolfram|Alpha and the Quest for Computational Knowledge Dr. Stephen Wolfram Founder & CEO Wolfram Research Champaign, IL - USA
2:20 pm - 3:15 pm	Clinical Healthcare	Using Ontologies in the Age-Phenome Knowledge-base (APK) Presenter: Eitan Rubin Ben Gurion University Beer Sheva, Israel
		Intelligent Surveillance of Health Care-associated Infections with SADI Semantic Web Services Presenter: Christopher Baker University of New Brunswick Saint John, Canada Presentation (.pdf)
3:15 pm - 4:15 pm	Panel Discussion	Moderators: Mike Bevil and Joanne Luciano, Conference Chairs
4:15 pm - 4:30 pm	Future Actions Mike Bevil and Joanne Luciano, Conference Chairs
4:30 pm	Conference Adjourns

Frequently Asked Questions (FAQ)

What is CSHALS?
What is ISCB?
Where will CSHALS be held?
How do I make my hotel reservation?
Where is Boston?
Where can I find a map of Greater Boston?
Boston's Logan International Airport
How do I get from Logan International Airport to the conference hotel?

What is CSHALS?

The Conference on Semantics in Healthcare and Life Sciences (CSHALS) presented by the International Society for Computational Biology is considered the premier conference focusing on the pharmaceutical applications of Semantic Technologies. It serves as a forum for the presentation and discussion of practical semantics-based approaches to Drug R&D, organized along specific topics and moderated to induce interactive discussions around sets of key questions. The conference is intended for anyone interested in understanding how best to apply intelligent information technologies in Pharmaceutical R&D.

What is ISCB?

CSHALS is an official conference of the International Society for Computational Biology (ISCB). ISCB is dedicated to advancing the scientific understanding of living systems through computation and organizes annual conferences such as ISMB and the Rocky Mountain Bioinformatics Conference. The ISCB communicates the significance of our science to the larger scientific community, governments, and the public at large. The ISCB serves a global membership by influencing government and scientific policies, providing high-quality publications and meetings, and distributing valuable information about training, education, employment and relevant news from related fields. ISCB membership offers many benefits, including reduced conference registration fees for several high-impact events and reduced subscription prices for a selection of journals in computational biology and bioinformatics.

Where will CSHALS be held?

CSHALS 2012 will be held at the Royal Sonesta Hotel Boston:

ROYAL SONESTA HOTEL BOSTON
40 Edwin Land Boulevard
Cambridge, MA, USA 02142
www.sonesta.com/Boston/

How do I make my hotel reservation?

You must book your room directly with the hotel. Details are available here: www.iscb.org/cshals2012-genl/cshals2012-hotel

Delegates registered for CSHALS 2012 – Conference on Semantics in Healthcare and Life Sciences can take advantage of the special conference rate of $135.00 plus taxes (see below) until January 31, 2012. Delegates may reserve their preferred accommodations in a variety of ways:

a) Call the Hotel directly at 617-806-4200 and ask for Reservations
b) Email the Hotel directly
c) Book on-line at:
https://gc.synxis.com/rez.aspx?Hotel=12050&Chain=5157&arrive=2/21/2012&depart=2/24/2012&adult=1&child=0&group=CSHB12A

Where is Boston?

Boston is on the east coast of the United States in the State of Massachusetts.

Where can I find a map of Greater Boston?

The following maps are available:

Boston Map (Attractions/Points of Interest)
Boston Convention Housing Map

Boston's Logan International Airport

For details on Logan International Airport visit:
www.massport.com/logan/default.aspx

How do I get from Logan International Airport to the conference hotel?

You can take a taxi from the airport. The approximate cost is $30.00. If you are driving to the hotel you can find directions at:
www.sonesta.com/Boston/index.cfm?fa=gettinghere.home

Conference Hotel

ROYAL SONESTA HOTEL BOSTON
40 Edwin Land Boulevard
Cambridge, MA, USA 02142
www.sonesta.com/Boston/

Delegates registered for CSHALS 2012 – Conference on Semantics in Healthcare and Life Sciences can take advantage of the special conference rate of $135.00 plus taxes (see below) until January 31, 2012. Delegates may reserve their preferred accommodations in a variety of ways:

1) Call the Hotel directly at 617-806-4200 and ask for Reservations

2) Email the Hotel directly

3) Book on-line here:

ROYAL SONESTA CONFERENCE RATES AND INFORMATION
Group Rate: $135.00 single / $135.00 double occupancy

The group rate is for a standard guestroom type. Should you be interested in upgrading to our other accommodations, the following upgrade charges would apply:

Upgrade to a Deluxe Riverview room for an additional $50.00.

Upgrade to an Executive Suite for an additional $100.00.

The above rates are subject to 5.7% Massachusetts State tax, 6.0% Cambridge City tax, and 2.75% CCF tax, for a total tax of 14.45%. Please note that group rates may not be available before or after the dates contracted above. Upgrade requests are based on availability and upgrade fees are subject to change. For more than two guests, there will be a $25.00 charge for each additional person in the room. There is no charge for children 17 years or under when sharing a room with parents.

CANCELLATION POLICY FOR INDIVIDUAL RESERVATIONS
To avoid an individual guestroom cancellation charge equal to one night's room and tax, please notify the hotel of any cancellations by 6 p.m. on the day prior to arrival. Please be sure to obtain a cancellation number at the time of cancellation and retain this number for your files.

CHECK-IN/CHECK-OUT
Hotel check-in time is after 3:00 p.m. and check-out time is 12 noon. Anyone arriving earlier than 3:00 p.m. will be checked in as soon as rooms become available, but should be aware that there may be a wait. Guests who check out after 12 noon on their departure date may incur a late check-out fee. The hotel is pleased to arrange baggage storage for attendees' luggage.

EARLY DEPARTURE FEES
An early departure fee of $50 will be assessed to any guest who departs earlier than the date confirmed at check-in. The fee will be automatically posted to the guest folio on or after departure. Emergency or special circumstance situations will be reviewed on a case by case basis.
