Bioinformatics, Genes, Proteins and Computers
Author(s): C.A. Orengo, D.T. Jones and J.M. Thornton eds.

ISBN 1-85996-0545

Bios Scientific Publishers, Oxford. (July 2003)
Paperback: 298 pages

Bios have three series of titles that they call Instant Notes, Advanced Text and Advanced Methods. I recently reviewed Instant Notes in Bioinformatics by D.R. Westhead, J.H. Parish and R.M. Twyman (ref. to previous review). I would not say that the present volume, under the Advanced Text title, is more advanced than the Instant Notes, but it is certainly more thought-provoking, as you might expect. The major emphasis is on proteins, and the major contribution is by staff of University College London and their co-workers.

In her preface, Janet Thornton struggles a little to define bioinformatics. Her possible succinct definition is: ‘the collecting, archiving, organisation and interpretation of biological data’. This is too broad as it could be taken to include Ecology and Systematics (Taxonomy). I prefer her other thoughts: ‘a necessary evil’ … ‘whereby all biologists will become applied bioinformaticians at some level.’ However, in the very first chapter, Sylvia Nagl explains exactly what bioinformatics is about under the title ‘Molecular evolution.’ Organisms and cells contain information largely encoded in DNA, RNA and proteins. This information is manipulated by complex systems and is subject to evolutionary change. We can manipulate the same information in computers and thereby try to understand how these complex systems function in terms of the processes that are going on. For me, this is the essence of bioinformatics that distinguishes it from other branches of computational biology or biostatistics.

Chapter 2 on ‘Gene finding’ by J.G. Sgouros and R.M. Twyman is a succinct account of genetic and physical mapping (strangely lacking from the Instant Notes book), transcript mapping and the use of bioinformatics to predict regions of DNA coding for both RNA and proteins. The point is well made that experimental confirmation of both transcript and protein sequences is essential and that prediction can only guide these experiments.

Chapters 3-5 concentrate on protein sequences: ‘Sequence comparison methods’ by C. Orengo, ‘Amino acid residue conservation’ by W.S.J. Valdar and D.T. Jones and ‘Function prediction from protein sequence’ by S.B. Nagl. The material is well known and well presented, and it emphasises that, if you already know what a protein sequence is doing, similar sequences are likely to be doing the same.

Chapters 6-11 relay the same message for protein structures: ‘Protein structure comparison’ by I. Sillitoe and C. Orengo, ‘Protein structure classifications’ by F. Pearl and C. Orengo, ‘Comparative modeling’ by A.C.R. Martin, ‘Protein structure prediction’ by D.T. Jones, ‘From protein structure to function’ by A.E. Todd and ‘From structure-based genome annotation to understanding genes and proteins’ by S.A. Teichmann. I used to find this area hard to understand, but I think that may have been because the people explaining it did not understand it very well either. A tremendous amount of progress has been made in the last 20 years and I recommend this as an excellent introduction for someone new to the subject. Annabel Todd asks ‘What is function?’ I think it is pretty clear that function cannot be described in isolation but only as part of a process in a working system. There must be examples of the same molecule doing different things according to the system and the environment in which it finds itself. Sarah Teichmann shows how structural knowledge of entire proteomes can help us to understand the evolution of protein families in complete genomes, in multidomain proteins and in their functional context in cells, such as in metabolic pathways.

Chapters 12-16 are based on newer technologies relating to protein-protein interactions, gene expression and protein expression: ‘Global approaches for studying protein-protein interaction’ by S.A. Teichmann, ‘Predicting the structure of protein-biomolecular interactions’ by R.M. Jackson, ‘Experimental use of DNA arrays’ by P. Kellam and X. Liu, ‘Mining gene expression data’ by X. Liu and P. Kellam and ‘Proteomics’ by M.P. Weir, W.P. Blackstock and R.M. Twyman. These are hot topics in bioinformatics but are less well integrated into the intellectual flow. It might have been better to put gene expression after gene finding and proteomics before the protein chapters.

Chapter 17, ‘Data management of biological information’ by N.J. Martin, with a bit of RDBS and SQL, is oddly placed; it could have been Chapter 2, or omitted. Chapter 18, ‘Internet technologies of bioinformatics’ by A.C.R. Martin, could certainly have been omitted, especially Figures 18.1 and 18.2. There is a brief glossary that seems rather too elementary.

Apart from the minor reservations mentioned above, I can strongly recommend this book to those who require an up-to-date introductory account of established and emerging procedures for the study of proteins with the help of computer programs.

Martin Bishop. HGMP, UK