Keynote Abstracts - Intelligent Systems for Molecular Biology

ISMB | For additional assistance, please consult the Contact Information page.

Gerald M. Edelman
The Neurosciences Institute

HOW MATTER BECOMES IMAGINATION: FROM BRAIN DYNAMICS TO CONSCIOUSNESS

Most approaches to understanding consciousness focus on the contributions of specific brain areas or groups of neurons. By contrast, in this talk I consider what kinds of neural processes can account for key properties of conscious experience, including its unity and its diversity. Applying measures of neural integration and complexity, together with an analysis of extensive neurological data, leads to a testable proposal, the dynamic core hypothesis, about the properties of the neural substrate of consciousness. This hypothesis is built on cortical mechanisms involving reentrant signaling. Supporting evidence from MEG studies of human subjects will be presented.

Leroy Hood
Institute for Systems Biology, University of Washington

COMPUTING LIFE AND BIOLOGICAL COMPLEXITY

Living organisms express many different levels of biological information: DNA, mRNA, proteins, informational pathways, informational networks, etc. The challenge for biology of the 21st century is to globally decipher and integrate information at these various levels. Ultimately the objective is to model biological complexity so that its informational structures can be deduced and its emergent properties predicted. The human genome project has revolutionized biology, catalysed a series of paradigm changes and led to the view that the post-genomic world is all about systems biology. I will review these paradigm changes, discuss systems approaches to biology and provide several examples of biological systems so analyzed.

Minoru Kanehisa
Kyoto University

GRAPH COMPARISON AND PATH COMPUTATION METHODS FOR PREDICTING MOLECULAR NETWORKS FROM GENOMIC INFORMATION

The link from the amino acid sequence to the native 3D structure is essential for understanding the function of a protein molecule. The link can be established computationally by threading and other prediction methods based on the knowledge of experimentally determined protein structures. Similarly, the link from the complete genome sequence to the network of interacting molecules is essential for understanding higher order functions involving various cellular processes. The computational link would become feasible once the knowledge of actual molecular networks in the living cell is sufficiently accumulated and knowledge-based prediction methods are properly developed. We have been computerizing current knowledge on cellular processes in KEGG (http://www.genome.ad.jp/kegg/) in terms of the abstract network of gene products, mostly proteins but including functional RNAs. The network prediction as we define it here involves a conversion from a complete set of genes in the genome to a network of gene products in the cell, which is considered as a conversion between two graphs: from the genome graph to the network graph. The genome is a one-dimensionally connected graph with genes as nodes, while the network is another graph with gene products as nodes and interactions as edges. The network prediction is based not only on the reference knowledge in KEGG but also on the integrated analysis of additional data, which are also represented as different types of graphs or sets of binary relations. They include protein-protein interaction data determined by yeast two-hybrid systems, gene-gene relations derived from microarray gene expression profiles, and sequence similarity and other relations obtained from sequence information. We have developed a heuristic graph comparison algorithm to detect what we call correlated clusters, which can be used as empirical rules to relate two or more graphs. We are also developing other graph feature detection methods and path computation methods for network prediction.
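
The two-graph view described above can be illustrated with a small sketch. This is not the heuristic algorithm from the talk; it is a deliberately simplified stand-in that treats a "correlated cluster" as a group of genes that are neighbors on the genome and whose products also interact in the network. The gene names, the example edges, and the functions `path_graph_edges` and `correlated_clusters` are all hypothetical, and the sketch assumes each gene and its product share one identifier.

```python
from collections import defaultdict

def path_graph_edges(gene_order):
    """Genome graph: a one-dimensional chain linking each gene to its neighbor."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}

def correlated_clusters(genome_edges, network_edges):
    """Simplified illustration: connected components of the edges that
    appear in BOTH graphs (adjacent on the genome and interacting)."""
    shared = genome_edges & network_edges
    adjacency = defaultdict(set)
    for edge in shared:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, clusters = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return sorted(clusters)

# Hypothetical data: six genes in chromosomal order, four interactions.
genome = path_graph_edges(["g1", "g2", "g3", "g4", "g5", "g6"])
network = {frozenset(e) for e in [("g1", "g2"), ("g2", "g3"),
                                  ("g5", "g6"), ("g1", "g5")]}
clusters = correlated_clusters(genome, network)
print(clusters)  # [['g1', 'g2', 'g3'], ['g5', 'g6']]
```

The interaction between g1 and g5 is dropped because those genes are not adjacent on the genome. A realistic method would presumably tolerate gaps and mismatches between the graphs, which this exact-intersection sketch does not.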

J. Andrew McCammon
University of California at San Diego

DYNAMICS OF MOLECULAR RECOGNITION

The selective character of the binding and reactivity of key biological molecules is essential for life. Properly understood, such selectivity can be exploited in the design of drugs, novel antibodies or enzymes, sensors, or a host of other materials or devices. This talk will provide a brief overview of how computer simulations can be used quantitatively to interpret the selectivity of molecular behavior. Both thermodynamic and kinetic selectivity will be considered. The potential of new generations of computing hardware and methodology to dramatically transform this area of work will be emphasized. Images and animations related to this work can be found at the website http://mccammon.ucsd.edu/.

Gene Myers
Celera Genomics Corp.

A WHOLE GENOME ASSEMBLY OF DROSOPHILA AND A PROGRESS REPORT ON THE HUMAN GENOME

We report on the design of a whole genome shotgun assembler and its application to the sequencing of the Drosophila genome. Celera’s whole genome strategy consists of randomly sampling pairs of sequence reads of length 500-600 that are at approximately known distances from each other: short pairs at a distance of 2K, long pairs at 10K, and BAC-end pairs at 150K. For Drosophila, we collected 1.6 million pairs whereby the sum of the lengths of the reads is roughly 13 times the length of the genome (~120 million bases), a so-called 13X shotgun data set. The reads were further collected so that there are two short read pairs to every long read pair, with a sprinkling of roughly 12,000 BAC-end pairs. The experimental accuracy of the read sequences averages 99.5%. Given this data set, the problem is to determine the sequence of Drosophila’s 4 chromosomes, which are roughly 10% repetitive sequence. The assembler computes all overlaps between the reads in under 18 hours on a 4-processor Compaq platform, and completes the entire assembly process in under 72 hours. We layer the ideas of uncontested interval graph collapsing, confirmed read pairs, and mutually confirming paths to yield a strategy that makes remarkably few errors. The assembler correctly identifies all unique stretches of the genome, correctly building contigs for each and ordering them into scaffolds spanning each of the chromosomes. Thus all useful proteomic information is firmly assembled. Overall, the results of assembly, without any of the finishing effort that ensues for all projects, meet the community standards set by Chromosome 22 and C. elegans for completion and accuracy of finished sequence. Our assembler has been further engineered to scale another 30-fold in order to perform a whole genome assembly of the human genome. We will report on the progress of this assembly. We will further report on our algorithms and the results of a hybrid assembly of a 3.5X whole-genome shotgun data set and the publicly generated 4X draft-shotgun of BACs covering 90% of the genome.
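
The 13X figure quoted above can be checked with simple arithmetic: fold coverage is the total number of bases read divided by the genome length. The sketch below uses only the numbers given in the abstract. The function name `shotgun_coverage` is hypothetical, and treating every read as exactly 500 or exactly 600 bases long is an assumption made to bracket the stated 500-600 range.

```python
def shotgun_coverage(num_pairs, read_length, genome_length):
    """Fold coverage = total bases read / genome length.

    Each pair contributes two reads, so total bases = pairs * 2 * read length.
    """
    total_bases = num_pairs * 2 * read_length
    return total_bases / genome_length

NUM_PAIRS = 1_600_000        # read pairs collected for Drosophila
GENOME_LENGTH = 120_000_000  # approximate genome length in bases

# Assumption: uniform read lengths at each end of the stated 500-600 range.
low = shotgun_coverage(NUM_PAIRS, 500, GENOME_LENGTH)
high = shotgun_coverage(NUM_PAIRS, 600, GENOME_LENGTH)

print(f"coverage at 500-base reads: {low:.1f}X")   # 13.3X
print(f"coverage at 600-base reads: {high:.1f}X")  # 16.0X
```

Reads of about 500 bases reproduce the "roughly 13 times" figure, which suggests the effective read length in the data set sat near the lower end of the stated range.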

Harold A. Scheraga
Cornell University

AB INITIO FOLDING OF PROTEINS

Computation of protein structure by an ab initio method, i.e., by global optimization of a potential energy function without use of secondary structure predictions, threading, homology modeling, etc., requires (1) a reasonably reliable potential energy function and (2) an efficient method to surmount the multiple-minima problem which arises from the ruggedness of the multi-dimensional energy surface. This talk will be concerned with the evolution of procedures to deal with these two requirements, culminating (at present) with a hierarchical procedure. This procedure starts with global conformational analysis of a protein, using the Conformational Space Annealing method and the united-residue force field UNRES, developed recently in our laboratory. This is the crucial stage of the procedure. Next, the lowest-energy conformation [C-alpha trace] of each family is converted to an all-atom chain by (a) converting the C-alpha trace to all-atom backbones by optimal alignment of peptide-group dipoles; (b) energy optimization of the backbone conformation for a given C-alpha trace; (c) attaching all-atom side chains subject to the condition of non-overlap; (d) final energy refinement of the all-atom conformation. The latter stage is carried out by using the EDMC method with the ECEPP/3 all-atom force field plus the SRFOPT surface-hydration model. The approach has been successful in the recent blind-prediction CASP3 experiment: the lowest-energy conformations of globular proteins contained about 60-residue-long contiguous segments of mostly helical proteins with good native topology and secondary structure. With recently derived analytical formulas for higher-order terms in the cumulant expansion of the free energy, and re-calibration of the weights of the various terms by Z-score optimization, it now appears possible to predict right-twisted beta-sheets as well.

(Coworkers in development of the hierarchical procedure are A. Liwo, C. Czaplewski, J. Pillardy, J. Lee, and D. R. Ripoll).

David B. Searls
SmithKline Beecham Pharmaceuticals

G. CHRISTIAN OVERTON LECTURE: READING THE BOOK OF LIFE

ISMB 2000 will feature the first lecture to commemorate the work of G. Christian Overton, founding director of the Center for Bioinformatics at the University of Pennsylvania. The lecture, "Reading the Book of Life," will be given by David B. Searls, a longtime colleague of Overton. Overton, who died May 31, was well known as a pioneer of the emerging field of bioinformatics. He recognized early the potential impact of applying computational technology to the study of biological problems. In 1997, he established Penn's Center for Bioinformatics (http://www.pcbi.upenn.edu), which houses the Computational Biology and Informatics Laboratory (CBIL).

With the completion of the human genome sequence, we are passing into a new phase in the analysis of what is popularly being called the "Book of Life". However symbolic the shift, it is nevertheless palpable: the data will be in a more or less contiguous, more or less stable form, constituting a reference text against which polymorphisms, model organism sequences, genetic phenomena, etc., can be systematically and reliably pinned. The role of the bioinformatics practitioner may also be expected to change, by degrees; one may become less like an archaeologist, discovering and poring over shards of evidence to piece together rudimentary translations, and more like a literary critic, attuned to theme and variation, elucidating ever more subtle nuances of meaning and interrelationship in a well-worn textus receptus. Indeed, the integrative task at hand has been characterized as "biosequence exegesis" [Boguski (1999) Science 286:453-5].

The notion of genome as literature may be seen as an extension of the linguistic metaphor that has dominated molecular biology from its inception, and which is evident in terminology such as 'transcription', 'translation', etc. Tools and techniques of a linguistic character have proven useful in biological sequence analysis, especially in the trend toward algorithms that model the syntactic features of the domain with increasing sophistication. Might the methodologies of textual criticism and literary theory also carry over to analysis of the genome? Will academic factions arise (The Bayesian School? Sequence/Structuralism?) and perhaps compete? Will there be a Comparative Literature of species? Will Post-Genomics partake of Post-Modernism? While the comparison may seem fanciful, there are some instructive analogies to be drawn between genomic and literary texts. This talk will comment on the grammar of genes, the poetics of proteins, and some correspondences between philology and phylogenetics.
