Ontologies in CellML: A Versatile Method to Describe Cellular Models

Poul Nielsen¹, Matt Halstead², Autumn Cuellar, Michael Dunstan, David Bullivant, Peter Hunter
¹p.nielsen@auckland.ac.nz, University of Auckland; ²matt.halstead@auckland.ac.nz, University of Auckland

CellML is an XML-based exchange language used to describe the mathematics and topology underlying a wide variety of biological models. Knowledge implicitly associated with a model, however, is not normally included in its CellML representation. To address this problem, facilities for including ontologies have been added to CellML. An ontology is, in essence, a controlled vocabulary of terms that are related to one another through class hierarchies and associated by a set of rules. Ontologies are powerful because they allow computer applications to infer meaning from a data set based on how that data set is associated with the ontology. Ontologies extend the current capabilities of CellML by adding classing mechanisms, that is, the ability to assign CellML components and variables to ontology classes.
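To make the idea of a class hierarchy with rules concrete, here is a minimal Python sketch. It is not how CellML or Protégé store ontologies; the class names, the hierarchy linking them, and the component name `reaction_1` are hypothetical, loosely modelled on the reaction classes named in this abstract. The single rule implemented is transitive class membership: a component classed under one term is inferred to belong to every superclass of that term.

```python
# Hypothetical vocabulary fragment: each term maps to its parent class (None = root).
SUBCLASS_OF = {
    "Reaction": None,
    "EnzymeCatalysedReaction": "Reaction",
    "MichaelisMenten": "EnzymeCatalysedReaction",
    "CompetitiveInhibition": "MichaelisMenten",
}


def ancestors(term: str) -> list[str]:
    """Return the chain of superclasses of `term`, nearest first."""
    chain = []
    parent = SUBCLASS_OF[term]
    while parent is not None:
        chain.append(parent)
        parent = SUBCLASS_OF[parent]
    return chain


def is_a(term: str, cls: str) -> bool:
    """Rule: class membership is inherited transitively up the hierarchy."""
    return term == cls or cls in ancestors(term)


# "Classing" a (hypothetical) CellML component by attaching an ontology term to it.
component_classes = {"reaction_1": "CompetitiveInhibition"}


def component_is_a(component: str, cls: str) -> bool:
    """Infer whether a classed component is an instance of `cls`."""
    return is_a(component_classes[component], cls)


print(component_is_a("reaction_1", "Reaction"))  # True
```

Here the component was only classed as `CompetitiveInhibition`, yet an application can infer that it is also a `Reaction`; that inferred membership is the kind of meaning an ontology adds on top of the model's mathematics.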

We are exploring how CellML may benefit from ontologies by defining the base CellML language, the reaction subset of the CellML language, and a conceptual rendering of a reaction as ontologies, together with rules about how they interact. A biochemical reaction is broken down into its participants and the expressions that relate them. These three branches of the ontology are part of a wider effort to build a formal knowledge representation of the physiome, in which every entry into the ontology is peer reviewed. The ontologies were defined using Protégé (http://protege.stanford.edu/), which can export the user's vocabularies in a variety of standard formats, such as DAML+OIL and the W3C's Web Ontology Language, OWL (http://www.w3c.org/2001/sw/WebOnt/).
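The decomposition of a reaction into participants and relating expressions can be sketched as a small data structure. This is a minimal Python sketch under assumptions, not the CellML reaction ontology: a reaction is taken to be a list of participants (each with a role) plus a list of expressions naming the participants they relate, and the class names, field names, and the `undeclared` check are all hypothetical. The check illustrates one rule an application might enforce: every name an expression uses must be a declared participant.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    name: str  # hypothetical identifier, e.g. a CellML variable name
    role: str  # e.g. "substrate", "enzyme", "product"


@dataclass
class Expression:
    relates: tuple[str, ...]  # names of the participants this expression links
    description: str


@dataclass
class Reaction:
    participants: list[Participant] = field(default_factory=list)
    expressions: list[Expression] = field(default_factory=list)

    def undeclared(self) -> list[str]:
        """Names used by an expression but not declared as a participant."""
        declared = {p.name for p in self.participants}
        return [n for e in self.expressions for n in e.relates if n not in declared]


rxn = Reaction(
    participants=[
        Participant("S", "substrate"),
        Participant("E", "enzyme"),
        Participant("P", "product"),
    ],
    expressions=[Expression(("S", "E", "P"), "rate of conversion of S to P catalysed by E")],
)
print(rxn.undeclared())  # []
```

Keeping participants and expressions as separate lists mirrors the separation described in the text, so an expression can be checked against the participants without knowing anything about its mathematical form.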

The CellML ontologies offer several benefits. The reaction ontologies serve as an interface between the scientist and the programmer: they allow the scientist to describe reaction pathways in biologically familiar terms, and they break a reaction down into components that are conceptually meaningful and easy for the programmer to implement. For instance, using a pathway editor, a biologist can describe an enzyme-catalysed reaction with competitive inhibition by creating an instance of the competitive inhibition class, a subclass of the Michaelis-Menten class. Because the competitive inhibition class is part of the reaction ontology, the editor knows that the reaction involves a substrate, an enzyme, a product, and an inhibitor, and that certain other parameters must be entered before the component is complete. What differentiates this methodology from existing software is that the ontology is not application-specific: the same ontology may be shared and processed by many applications, provided they can read standard ontology representations. Furthermore, once an application can process ontologies, users may define and integrate their own ontologies for use by the program, or incorporate a number of existing ones.
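The competitive inhibition example can be sketched in Python, with class inheritance standing in for the ontology's subclass relation. This is an illustration under stated assumptions, not the CellML reaction ontology itself: the attributes `own_roles` and `own_parameters`, and the parameter names `Vmax`, `Km`, and `Ki` (the conventional Michaelis-Menten and inhibition constants), are chosen here for illustration. The editor's completeness check reduces to collecting the requirements declared on a class and all of its superclasses, then reporting those not yet entered.

```python
class MichaelisMenten:
    """Hypothetical ontology class for a Michaelis-Menten reaction."""

    # Participants and parameters declared directly on this class.
    own_roles = ("substrate", "enzyme", "product")
    own_parameters = ("Vmax", "Km")

    def __init__(self, **entries):
        # Values entered so far in a (hypothetical) pathway editor.
        self.entries = dict(entries)

    @classmethod
    def required(cls) -> set[str]:
        """Everything demanded by this class and all of its superclasses."""
        names: set[str] = set()
        for klass in cls.__mro__:
            # vars() looks only at the class's own namespace, so each
            # requirement is collected from the class that declares it.
            names.update(vars(klass).get("own_roles", ()))
            names.update(vars(klass).get("own_parameters", ()))
        return names

    def missing(self) -> set[str]:
        """Entries still needed before the component is complete."""
        return self.required() - set(self.entries)


class CompetitiveInhibition(MichaelisMenten):
    """Subclass: inherits the Michaelis-Menten requirements and adds its own."""

    own_roles = ("inhibitor",)
    own_parameters = ("Ki",)


rxn = CompetitiveInhibition(substrate="S", enzyme="E", product="P", Vmax=1.0, Km=0.5)
print(sorted(rxn.missing()))  # ['Ki', 'inhibitor']
```

In an ontology-driven tool these requirements would be read from the shared ontology (for example, as OWL restrictions) rather than hard-coded in a program, which is what would let several applications reuse the same definitions.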

The ontologies currently created for use with CellML are both powerful and versatile. In the future, further ontologies will be constructed to allow graphical information to be assigned to a component, to provide better model validation techniques, and to associate a model with other models or templates. For updates on how ontologies are being incorporated into CellML, see http://www.cellml.org/.