Bio–Ontologies: Tools, Techniques, and Examples

Carole Goble and Robert Stevens, TAMBIS team, University of Manchester, UK
Peter Karp, Bioinformatics Research Group, SRI International, USA

Synopsis

Ontologies are playing increasingly important roles within a range of bioinformatics applications. A tutorial at ISMB–98 set out to introduce the notion of ontologies and to motivate their adoption. That tutorial was followed by ontology workshops at ISMB–98 and ISMB–99 for the specialist community, but these were not aimed at the bio–ontology community as a whole.

The intended audience for this tutorial is (a) developers of bioinformatics databases who wish to have the development of their DB schema guided by ontology methodologies, and (b) developers of bioinformatics ontologies. Hard computer science detail, such as logical forms, is inappropriate for this forum, but references to appropriate material will be given. This tutorial seeks to inform the whole community on how to construct and deliver ontologies within the bioinformatics domain, in the setting of real, practical examples.

The University of Manchester and SRI International are leaders in the development and use of bio–ontologies. The two groups represent different approaches and have developed different types of applications. This gives the tutorial the wide coverage and multi–dimensional approach needed for such an introductory tutorial.

The tutorial will cover the following topics, with approximate timings:

The need for, and nature of, ontologies (30 minutes);
A survey of current bio–ontologies and how they are used (60 minutes);
Methodologies for building ontologies (60 minutes);
Ontology development tools (30 minutes);
Exchanging and re–using ontologies (30 minutes).

The tutorial aims to be interactive, with some time set aside for audience participation in the development of a small part of an ontology. This will be centered around the conceptualisation of proteins and their function -- features common to many of the current bio–ontologies. The tutorial will be rich in examples taken from the various bio–ontologies, examining the different conceptualisations, encodings and delivery styles and how these are linked to their purpose.

Details

The Need For, and Nature of, Ontologies

The tutorial will not take a deep philosophical approach to the nature and use of ontologies. Instead, a practical approach will be adopted, driven by the need to include knowledge of biological and related disciplines within bioinformatics applications. The tutorial will describe the type of knowledge needed, how it can be captured and what must remain within the human domain. Knowledge is couched in terms of a domain's concepts and the relationships held by those concepts. The importance of conceptualisation as a major stage in ontology development will be stressed through the tutorial.
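As a minimal sketch of this idea, assuming a toy fragment about proteins and their functions, concepts can be held as nodes and relationships as labelled links between them. The class, the concept names and the relationship names below are illustrative assumptions and are not taken from any of the ontologies discussed in this tutorial.

```python
from collections import defaultdict


class Ontology:
    """A toy store of concepts and labelled relationships between them."""

    def __init__(self):
        self.concepts = set()
        # Maps (subject, relationship) to the set of related concepts.
        self.links = defaultdict(set)

    def relate(self, subject, relationship, obj):
        """Record that `subject` holds `relationship` to `obj`."""
        self.concepts.update((subject, obj))
        self.links[(subject, relationship)].add(obj)

    def related(self, subject, relationship):
        """Return the concepts directly linked from `subject`."""
        return set(self.links.get((subject, relationship), set()))

    def ancestors(self, concept):
        """Return every more general concept reachable through is_a links."""
        found = set()
        stack = [concept]
        while stack:
            for parent in self.links.get((stack.pop(), "is_a"), set()):
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found


# Illustrative fragment only; these names are assumptions for the example.
onto = Ontology()
onto.relate("Enzyme", "is_a", "Protein")
onto.relate("Kinase", "is_a", "Enzyme")
onto.relate("Protein", "is_a", "Macromolecule")
onto.relate("Enzyme", "has_function", "Catalysis")
```

Calling onto.ancestors("Kinase") walks the is_a links to collect the more general concepts, much as a taxonomy would. The has_function link is the kind of additional relationship that takes the structure beyond a plain hierarchy.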

Motivations for ontologies include their use in database interoperation, in machine learning (to provide a generalisation hierarchy to guide the learner), and in the development of bioinformatics databases.

The tutorial will compare and contrast ontologies with controlled vocabularies, taxonomies, and database schemas.

A Survey of Current Bio–Ontologies

The bio–ontologies that will be covered by the tutorial include, but will not be limited to:

The TAMBIS Ontology (TaO) of bioinformatics
The EcoCyc ontology
The Gene Ontology (GO)
Monica Riley's classification of gene function
The RiboWeb ontology

The whole tutorial will be illustrated with examples taken from these ontologies, in particular to highlight the different approaches that can be taken to the stages of ontology building, delivery and usage. In all these examples it will be shown how the use to which the ontology is put drives the type of knowledge represented, how it is represented and how it is delivered. Again, guidance will be given on the costs and benefits of each approach and on when each approach is valid.

Methodologies for Building Ontologies

A variety of techniques exist for knowledge representation -- acquiring, encoding and delivering a conceptualisation. Various methods of representing knowledge in an ontology will be discussed. A framework will be used to describe these schemes, working along the axes of expressivity and well–foundedness. The former dimension will be divided into informal, structured and formal. The issues of consistency and interpretation will be discussed. For application developers, the role of an API that hides the representational scheme from the application itself will be explored.
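The role of such an API can be sketched as follows, assuming a deliberately small interface with a single primitive. The names OntologyAPI, TableBackend, TripleBackend and describe are hypothetical and do not correspond to any tool covered in the tutorial; the point is only that the application function never sees how the ontology is encoded.

```python
from abc import ABC, abstractmethod


class OntologyAPI(ABC):
    """What the application sees, whatever the underlying encoding."""

    @abstractmethod
    def parents(self, concept):
        """Return the direct, more general concepts of `concept`."""

    def is_kind_of(self, concept, general):
        """True if `general` is reachable from `concept` via parents()."""
        seen, stack = set(), [concept]
        while stack:
            current = stack.pop()
            if current == general:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents(current))
        return False


class TableBackend(OntologyAPI):
    """One encoding: a lookup table of child -> parents."""

    def __init__(self, table):
        self.table = table

    def parents(self, concept):
        return list(self.table.get(concept, []))


class TripleBackend(OntologyAPI):
    """A different encoding: a flat list of (child, 'is_a', parent) triples."""

    def __init__(self, triples):
        self.triples = triples

    def parents(self, concept):
        return [o for s, p, o in self.triples if s == concept and p == "is_a"]


def describe(api, concept):
    """Application code: depends only on OntologyAPI, not on a backend."""
    kind = "a protein" if api.is_kind_of(concept, "Protein") else "not a protein"
    return f"{concept} is {kind}"


# Illustrative data; the concept names are assumptions for the example.
table = TableBackend({"Kinase": ["Enzyme"], "Enzyme": ["Protein"]})
triples = TripleBackend([("Kinase", "is_a", "Enzyme"), ("Enzyme", "is_a", "Protein")])
```

Both describe(table, "Kinase") and describe(triples, "Kinase") give the same answer, so the encoding could be swapped without touching the application.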

The correctness and consistency of each method will be investigated and a brief introduction given to each type of encoding method. References to resources will be given for each type of encoding.

Irrespective of the knowledge representation used, the stages in building an ontology are much the same. Within the tutorial, a modified version of the well known software engineering V–model will be used to describe the process of building an ontology:

Identify purpose and scope: identifying the intended range of uses of the ontology

Knowledge Acquisition: the process of acquiring domain knowledge from which the ontology will be built

Building the ontology – conceptualisation: identifying the key concepts that exist in the domain, their properties and the relationships that hold between them

Building the ontology – integrating: use or specialise an existing ontology as a foundation

Building the ontology – encoding: representing the conceptualisation in some form of language

Documentation: informal and formal complete definitions

Evaluation: determining the appropriateness of an ontology for its intended application

The ontology fragment of proteins and their functions will run through this part of the tutorial. It will be shown that different conceptualisations can be used for different ontological tasks and how the properties of the encoding can alter the initial conceptualisation. Examples from each of the three types of encoding will be used to exhibit the different ontologies arising from building the same ontology in different ways. This section will also highlight the use of upper–level, organising ontologies.

The tutorial will present principles of ontology design, and will warn the tutee of potential pitfalls that they should avoid. The tutee should gain an overview of the advantages and disadvantages of encoding styles and the costs of each stage of building.

Ontology Development Tools

For a system as complex as an ontology, where consistency and clarity are important, the availability of tools that assist the process of conceptualisation and encoding is invaluable. Tools are also important for allowing developers to view their ontology and check development criteria, especially fitness for purpose. The tutorial will survey several existing ontology–development tools, including:

The GKB Editor (SRI International);
Ontolingua (Stanford University);
The tools for building description logic ontologies (Manchester University and others).

Exchanging and Re–using Ontologies

An important aspect of ontology building is the re–use of existing ontologies when designing new ontologies. The implications of ontology re–use will be explored, with particular reference to the biases imposed on an ontology by its conceptualisation, encoding and delivery. Attendees will find out whether ontologies can be exchanged so that the conceptualisation represented in one ontology in one representation can be transferred to another ontology using a different method. The difference between exchange and interchange will be explained. These issues will be explored so that bio–ontologists are fully aware of what is possible within exchange and where work has to be done.

At the 1999 Bio–Ontologies workshop, Peter Karp presented the XML Ontology Language (XOL). This is an XML–based exchange language that can be used to describe the structure of an ontology. Work has progressed on XOL during the past year and the latest incarnation (Ontology Interchange Language: OIL) will be presented during the tutorial. The XML DTD will be described and participants will be introduced to its use in exchanging ontologies.
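To give a feel for what XML–based exchange involves, the following is a minimal sketch that writes a small class hierarchy to XML and reads it back. The element and attribute names are chosen for illustration and the output should not be read as conforming to the XOL or OIL DTD; the functions to_xml and from_xml are likewise hypothetical.

```python
import xml.etree.ElementTree as ET


def to_xml(name, classes):
    """Serialise {class_name: [parent names]} to an XML string."""
    root = ET.Element("ontology", {"name": name})
    for class_name in sorted(classes):
        node = ET.SubElement(root, "class", {"name": class_name})
        for parent in classes[class_name]:
            ET.SubElement(node, "subclass-of").text = parent
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the XML produced by to_xml back into (name, classes)."""
    root = ET.fromstring(text)
    classes = {
        node.get("name"): [p.text for p in node.findall("subclass-of")]
        for node in root.findall("class")
    }
    return root.get("name"), classes


# Illustrative fragment; the class names are assumptions for the example.
fragment = {"Protein": [], "Enzyme": ["Protein"], "Kinase": ["Enzyme"]}
document = to_xml("protein-fragment", fragment)
name, restored = from_xml(document)
```

The round trip preserves the class structure, but only the structure: whatever meaning the original encoding attached to its classes and links is not carried by the XML itself.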

Tutorial Staff

The following three people have developed and will present the tutorial:

Peter Karp: Dr. Peter Karp received the Ph.D. in Computer Science from Stanford University in 1989. He has held positions at the US National Center for Biotechnology Information, at Pangea Systems, and at SRI International, where he directs the Bioinformatics Research Group. He has developed tools for developing ontologies and large knowledge bases, such as the GKB Editor, and has developed a large bioinformatics ontology as part of the EcoCyc project. He has also published several papers on interoperation of bioinformatics databases.

Carole Goble: Carole Goble is a senior lecturer in Computer Science and co–leads the Information Management Group at the University of Manchester. She has been, or is, an investigator on a number of projects using ontologies represented using Description Logics: medical information systems (PEN&PAD, GALEN), mediation of disparate bioinformatics information sources (TAMBIS, TAMBIS–II), and improved protein function prediction using ontologies (Irbane). She is a co–investigator on a basic research project on Description Logic–based ontology servers (CAMELOT). She is currently involved in the development of the Ontology Interchange Language, which the Information Management Group is developing with Peter Karp and the Free University of Amsterdam.

Robert Stevens: Robert is a bioinformatics researcher at the University of Manchester and has degrees in both Biochemistry and Computer Science. He has experience in the characterisation, modelling and interoperation of many bioinformatics resources, as well as skills in user requirements analysis and user interface design. He is one of the developers of the TAMBIS bioinformatics source mediation system.