ISMB 2005: Michigan, June 25-29

biosketch: Ewan Birney trained as a biochemist at Oxford University, and did a Ph.D. in gene prediction with Richard Durbin at the Wellcome Trust Sanger Institute. He moved to the EBI in 2000 to coordinate the EBI’s contribution to Ensembl, a joint project with the Sanger Institute to provide a comprehensive, automatically generated annotation for the genomes of higher animals. Ensembl is widely used by biomedical researchers, serving around a million pages a week, and has been used to generate gene sets for several genomes, including human, mouse, rat and chicken. In a collaboration with Lincoln Stein at the Cold Spring Harbor Laboratory (NY, USA), Ewan’s team also produces Reactome – a knowledgebase of human biological pathways. Other collaborations include the ENCODE project, a detailed gene anatomy of a specified region of the human genome; and the BioSapiens Network of Excellence. Ewan actively supports the open source movement: he is co-leader of the open-source bioinformatics toolkit Bioperl and president of the Open Bioinformatics Foundation, which supports the development of several bioinformatics toolkits.

talk title: Genomes to Systems Biology

abstract: Modern biology has been revolutionized by the sequencing of genomes across the tree of life. However, this immensely rich data has brought its own challenges. These range from conceptually mundane and yet critical engineering tasks through to genuine changes in our scientific understanding of how life works. In this talk I will present two projects, Ensembl (www.ensembl.org) and Reactome (www.reactome.org). The first focuses on analyzing genome sequence; the second is a starting point for building rich structures representing human pathways on top of the genome.

homepage: http://www.ebi.ac.uk/~birney

biosketch: Janet Thornton has been Director of the EBI since October 2001. Her active research group focuses on using computational approaches to understand biology (especially proteins) at the molecular level, and her research combines the use of genomic, transcriptomic, structural and metabolomic data with the aim of discovering how molecules interact to perform their functions, and how these functions evolved. Under her directorship, the EBI has expanded into several new research areas and has secured funding to provide space for its burgeoning staff base. She works tirelessly to raise awareness of the need for a stable bioinformatics infrastructure in Europe. BioSapiens, the European-Union-funded Network of Excellence that she coordinates, is enabling bioinformaticians throughout Europe to work together and with experimental biologists to annotate genome data. She is a Fellow of the Royal Society, a Member of the European Molecular Biology Organization, a Foreign Associate of the US National Academy of Sciences and a Commander of the British Empire.

talk title: From Proteins to Life - Old and New Challenges

abstract: Since the early days of my research, when 'bioinformatics' was not yet a recognised discipline and almost no biologists used computers, the challenge of understanding how the sequence of a protein determines its structure, and how each structure performs its own biological function and works together with other proteins to orchestrate life, was already clearly stated. From having only 20 protein structures when I started, to over 30,000 available in the Worldwide Protein Data Bank (wwPDB) today, our understanding has grown enormously, though the original challenges still remain. Initially we struggled even to find words and robust parameters to describe the structures and to develop computer tools to display, simplifying where appropriate, and analyse these beautiful but complex arrangements of atoms and molecules. New approaches were developed to validate the structures (PROCHECK) and to compare molecules quantitatively in three dimensions, allowing for insertions, deletions and mutations. Using our tools (Promotif), many analyses of motifs were published, defining common patterns that recur in proteins and may be markers of biological function (e.g. metal binding sites) or structural motifs that are energetically stable (e.g. β-turns). From the beginning our approach contrasted with that common amongst structural biologists who were determining structures (a process which was arduous and often took 5 years or more and many graduate students!). We analysed many structures, rather than focussing on a single protein or structure, leading us to seek better ways to store and query the data and thus to use relational databases in the mid-80s. We despaired about the lack of data consistency of the old PDB files and lack of clarity in defining data items.
Repeatedly we tried to use the approaches common in physics and chemistry to model structures, but were continuously forced towards a more heuristic data-driven approach by the complexity, size and subtlety of these biological molecules and their interactions. Ultimately this led us to develop a heuristic classification of protein structure domains (CATH), partly to organise the data and make it manageable, but also to better understand how proteins evolve to perform their functions. Today we focus increasingly on understanding higher order complexes and especially the relationship between structure and function.

The progress towards improvements in handling, analysing and understanding the structural data is mirrored in other types of data now available to biologists (such as transcriptome and metabolome data), even in other branches of science, like astronomy, where stars are classified, or chemistry, where molecular databases are essential. At the EBI we are tackling all these issues for the core biomolecular data resources we host, seeking to improve data validation, quality, accessibility and integration.

My initial studies have led me down paths that were only distant dreams when I started out as an undergraduate physicist. Today we not only consider structural and biophysical data, but are drawn in to look at other high-throughput data, such as expression data, metabolic data and biological pathways and networks. Our goals have broadened and become increasingly ambitious in trying to use these data, not just to understand the molecules, but also to understand more about complex biological systems, such as bacterial evolution, catalysis, the molecular basis of diseases and ageing.

In this award lecture, which I am honoured to have been asked to present, I shall look back over the major challenges and developments we have faced in structural bioinformatics, acknowledging the many scientists with whom I have had the pleasure to collaborate, and look forward to our current interests and future challenges. See http://www.ebi.ac.uk/Thornton for references and a summary of current research.

homepage: http://www.ebi.ac.uk/Thornton/

biosketch: Howard Cash, President of Gene Codes and Gene Codes Forensics, Inc. Howard Cash was born in Detroit, studied musical composition and conducting at the University of Pennsylvania and, after a period as Assistant Conductor with the Pennsylvania Opera Theater, studied psychoacoustics at Stanford.

He has been at the forefront of commercial bioinformatics development since 1984. He joined IntelliGenetics where some of the seminal biotech software tools were developed including the "IG-Suite" set of DNA and protein analysis modules and the "Stratagene" expert system for clone management. In 1988, he founded Gene Codes Corporation where he remains as President and CEO. He designed and developed the “Sequencher” program used in thousands of academic and commercial DNA sequencing labs in forty-four countries.

In 1997, Governor John Engler appointed him to the Michigan State Commission on Genetics, Privacy and Progress. The commission recommended legislation on a host of issues related to genetic information and privacy and Cash chaired the committee on Property Rights, Ownership, Collection, Use and Storage [POCUS]. All recommendations that have come from the thirteen-member commission have been signed into State law.

Shortly after 9-11, Cash was asked to put his company at the disposal of the New York City Office of Chief Medical Examiner and to develop new software for DNA analysis and data handling for the purpose of identifying the remains of those killed at the World Trade Center. A new corporation called Gene Codes Forensics, Inc. was formed to focus exclusively on this project. It has been a daunting task from a technical standpoint, and has also raised ethical and legal issues involving jurisdiction, family rights and genetic privacy. The Mass-Fatality Identification System ("M-FISys," pronounced like "emphasis") was created and remains the most advanced tool in the world for combining DNA technologies for human identification including autosomal Short Tandem Repeat [STR] analysis, mitochondrial sequence profiling and forensic SNP matching.

In January 2005, following the Boxing Day earthquake and ensuing tsunami, Cash and Gene Codes Forensics were enlisted to help identify those killed in Thailand. Information technology tools developed for 9-11 are a tremendous advantage in the response to this disaster, and political challenges have proven to be greater than scientific ones.

Among many awards, Cash has received the Arthur Andersen/MTC "Leading Edge Technology Award," the prestigious Ernst and Young “Entrepreneur of the Year” award for S.E. Michigan, the “Person of the Year” award from Genome Technology magazine, the "Medal for Extraordinary Service to Humanity" from the Bear Search and Rescue Foundation, and in 2005, the Merlanti Prize for "Best Practices in Business Ethics."

Cash has served on several boards, including the Hot Springs Music Festival, 9-11 WVFA and CEBOS Corporation. He is a member of the HUGO Council.

talk title: Biology of Life and Death: Disaster, DNA and the Information Science of Human Identification

abstract: I have been working professionally in bioinformatics since joining IntelliGenetics in 1984. That same year, the remains of a U.S. serviceman from the Vietnam War were interred in the Tomb of the Unknown Soldier at Arlington National Cemetery. My interest in the scientific niche of DNA forensics began in the 1990s with speculation that the remains of that soldier might be identified. Air Force Lt. Michael J. Blassie was identified in June 1998 through mitochondrial DNA [mtDNA] testing and returned to his family for burial in St. Louis.

By the time the Vietnam Unknown was identified, I was at Gene Codes Corporation. There we developed tools for mtDNA profiling which became standards at major forensic biology centers such as the Armed Forces DNA Identification Laboratory [AFDIL], the FBI laboratory and the Institute for Forensic Medicine in Innsbruck. The community of forensic users was small, but the analysis functions created to support sequencing for comparison to a reference mtDNA sequence had other applications including comparative genomics and clinical HIV genotyping.

Forensics was a tiny part of our work until Sept 11, 2001. When the World Trade Center towers fell, it was initially thought that five- to ten-thousand people might have been killed, though the final number of fatalities is now believed to be 2,749. Because of the sheer mechanical violence of the collapse, nearly 20,000 samples were delivered to the Medical Examiner's office for identification. In most cases, DNA was the only possible way to identify the remains, and existing DNA profile matching tools were not designed to handle a problem of this scale. Because we had both the domain experience and the engineering capacity, we were asked by the City of New York to make available essentially all of our technical resources to meet their needs for DNA profile information management.

The Mass-Fatality Identification System, or "M-FISys" (pronounced like "emphasis") was developed on a brutally accelerated timeline using Extreme Programming methodologies and close collaboration with forensic biologists in the NYC Office of Chief Medical Examiner [OCME]. Programming began in early November, driven by constantly evolving priorities and needs of the agency's front line scientists, the WTC DNA Identification Unit. By December 12, 2001, only 105 identifications had been made using DNA methodologies. The next day, when the first version of M-FISys was delivered to the OCME, 55 matches were found that would be confirmed as new identifications by Dr. Charles Hirsch, the city's Chief Medical Examiner.

M-FISys continued its rapid development, combining mtDNA sequencing with autosomal Short Tandem Repeat [STR] analysis and more recently autosomal SNP profiling and Y-STR typing. Since persons would be identified either to direct references (such as DNA recovered from a victim's toothbrush) or familial profiles, both direct matching and complex kinship analysis had to be supported. As metadata errors were discovered (e.g., toothbrushes brought in from the wrong family member, family donors reporting erroneous blood relationships, or commingled remains with multiple profiles) we experienced a continuous race to implement data QC tools to catch errors before they could result in a misidentification. Badly degraded samples were tested and retested with ever more sensitive assays. The efforts were exhaustive and it was not until February 2005 that the Medical Examiner declared that every scientifically reasonable attempt had been made to identify each bone and tissue sample. By then, 1,594 victims had been identified, along with 10,769 individual remains, of which 9,728 (90.33%) could be identified by no means other than DNA typing.
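
The direct-matching case described above (a profile from recovered remains compared with a direct reference such as a toothbrush) can be illustrated with a minimal sketch. This is not the M-FISys implementation: the function names, the `min_loci` threshold and all allele values are hypothetical, and real casework relies on statistical match probabilities and kinship likelihoods rather than simple equality. The sketch only shows why partial profiles from degraded samples have to be compared locus by locus.

```python
from typing import Dict, FrozenSet, Tuple

# A profile maps an STR locus name to the set of alleles (repeat counts)
# observed there. A homozygous locus has a single allele in its set.
# Loci that failed to type (e.g. in a degraded sample) are simply absent.
Profile = Dict[str, FrozenSet[float]]


def compare_profiles(sample: Profile, reference: Profile) -> Tuple[int, int]:
    """Return (loci compared, loci agreeing) over loci typed in both profiles."""
    shared = sample.keys() & reference.keys()
    agreeing = sum(1 for locus in shared if sample[locus] == reference[locus])
    return len(shared), agreeing


def is_direct_match(sample: Profile, reference: Profile, min_loci: int = 3) -> bool:
    """Call a match only if enough loci were compared and all of them agree.

    min_loci is a hypothetical threshold chosen for this illustration.
    """
    compared, agreeing = compare_profiles(sample, reference)
    return compared >= min_loci and agreeing == compared


# Illustrative data only: the allele values are invented for the example.
toothbrush = {
    "D3S1358": frozenset({15, 16}),
    "vWA": frozenset({17}),
    "FGA": frozenset({21, 24}),
    "TH01": frozenset({6, 9.3}),
}
remains_a = {  # partial profile: the vWA locus dropped out
    "D3S1358": frozenset({16, 15}),
    "FGA": frozenset({21, 24}),
    "TH01": frozenset({9.3, 6}),
}
remains_b = {  # full profile that disagrees at vWA
    "D3S1358": frozenset({15, 16}),
    "vWA": frozenset({18}),
    "FGA": frozenset({21, 24}),
    "TH01": frozenset({6, 9.3}),
}
```

Here `is_direct_match(remains_a, toothbrush)` is true because all three shared loci agree, while `is_direct_match(remains_b, toothbrush)` is false because of the single disagreement at vWA.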

We believed that the close of the World Trade Center effort meant a much-needed respite and a return to a normal work schedule for my staff, but our rest was short-lived. The earthquake and tsunami in South Asia on Dec 26, 2004 presented new challenges in human identification, and once again our phones began to ring. The DNA analysis tools created to respond to a terrorist attack would be applicable to a natural disaster, but new analytical functions would be needed to address new and different laboratory challenges and to interact with the local systems.

This keynote address will cover some of the software engineering methodologies, the design and computational strategies, and the startlingly intense geopolitical pressures that characterized the efforts to apply dispassionate scientific methods to terrible human tragedy. It also highlights just one of the ways that the field we all work in can dramatically impact organizations, society and individuals.

Background:

http://www.bio-itworld.com/archive/091103/soul.html
http://www.freep.com/news/metro/walsh11_20020911.htm
http://www.bio-itworld.com/news/050905_report8343.html

biosketch: Peter Hunter completed an engineering degree in 1971 in Theoretical and Applied Mechanics at the University of Auckland, New Zealand, a Master of Engineering degree in 1972 (Auckland) on solving the equations of arterial blood flow and a DPhil (PhD) in Physiology at the University of Oxford in 1975 on finite element modeling of ventricular mechanics. His major research interests since then have been modelling many aspects of the human body using specially developed computational algorithms and an anatomically and biophysically based approach which incorporates detailed anatomical and microstructural measurements and material properties into the continuum models. The interrelated electrical, mechanical and biochemical functions of the heart, for example, have been modelled in the first 'physiome' model of an organ. As the current co-Chair of the Physiome & Bioengineering Committee of the International Union of Physiological Sciences he is helping to lead the international Physiome Project which aims to use computational methods for understanding the integrated physiological function of the body in terms of the structure and function of tissues, cells and proteins. He established the first undergraduate biomedical engineering program in New Zealand in 2000 and the Bioengineering Institute in 2001. He is currently Director of the Bioengineering Institute at the University of Auckland and Director of Computational Physiology at Oxford University.

talk title: Computational Physiology and the IUPS Physiome Project

abstract: The International Union of Physiological Sciences (IUPS) Physiome Project is an internationally collaborative open-source project to provide a public domain framework for computational physiology, including the development of modeling standards, computational tools and web-accessible databases of models of structure and function at all spatial scales [1,2,3]. It aims to develop an infrastructure for linking models of biological structure and function across multiple levels of spatial organization and multiple time scales. The levels of biological organisation, from genes to the whole organism, include gene regulatory networks, protein-protein and protein-ligand interactions, protein pathways, integrative cell function, tissue and whole heart structure-function relations. The whole heart models include the spatial distribution of protein expression.

The project requires the creation of web-accessible databases of mathematical models of structure and function at spatial scales which encompass nano-scale molecular events to the meter scale of the intact heart and torso, a range of 10^9, and temporal scales from Brownian motion (microseconds) to a human lifetime (10^9 s), a range of 10^15. Clearly this cannot be represented by one model but rather a hierarchy of models and modeling approaches such as stochastic models of ion channels and receptors for ligand binding calculations, ordinary differential equation lumped cell models, and partial differential equation continuum models at the tissue and organ levels. It also requires the model parameters at one scale to be linked to detailed models of structure and function at a smaller spatial scale – hence the need for "multi-scale modeling."

The long term challenge for the Physiome Project is to build a modeling framework in which the effect of a gene mutation can be modeled all the way from its effect on protein structure and function to how the altered properties of the protein affect a cellular process such as signal transduction, and how the changed properties of that process alter the function of tissues and organs. There will be many other benefits from this integrative framework. Understanding how model parameters are affected by individual variation, by embryological growth, by ageing and by disease, for example, will bring benefits to the design of medical devices, the diagnosis and treatment of disease and the development of new drugs.

References

1. Hunter, P.J., Robbins, P. and Noble, D. The IUPS Human Physiome Project. European Journal of Physiology. 445 (1), 1-9, 2002.
2. Hunter, P.J. and Borg, T.K. Integration from proteins to organs: The Physiome Project. Nature Reviews Molecular Cell Biology. 4, 237-243, 2003.
3. Crampin, E.J., Halstead, M., Hunter, P.J., Nielsen, P.M.F., Noble, D., Smith, N.P. and Tawhai, M. Computational physiology and the Physiome Project. Exp. Physiol. 89, 1-26, 2004.

homepage: http://www.bioeng.auckland.ac.nz/home/home.php

biosketch: Jill P. Mesirov is associate director and chief informatics officer of the Broad Institute of MIT and Harvard where she directs the Bioinformatics and Computational Biology Organization. She is also adjunct professor of bioinformatics at Boston University.

Mesirov is a computational scientist who has spent many years working in the area of high performance computing on problems that arise in science, engineering, and business applications. Her current research interest is computational biology with a focus on algorithms and analytic methodologies for pattern recognition and discovery with applications to cancer genomics, genome analysis and interpretation, and comparative genomics. In addition, Mesirov is committed to the development of practical, accessible software tools to bring these methods to the general biomedical research community.

Mesirov came to the Whitehead Institute/MIT Center for Genome Research, now part of the Broad Institute, in 1997 from IBM where she was manager of computational biology and bioinformatics in the Healthcare/Pharmaceutical Solutions Organization. Before joining IBM in 1995, she was Director of Research at Thinking Machines Corporation for ten years.

Mesirov is a trustee of the Institute for Defense Analyses, Vice Chair of the Interoperable Informatics Infrastructure Consortium (I3C) and a member of review committees for the Department of Energy’s Argonne and Los Alamos National Laboratories. She is a fellow of the American Association for the Advancement of Science, and serves on numerous academic and corporate scientific advisory and journal editorial boards.

talk title: Gene Expression Analysis: A Knowledge-based Approach

abstract: DNA microarrays now make it possible to capture the expression pattern of all the genes in the genome in a single experiment. Genome-wide expression analysis is at the heart of global genomic approaches to biomedical research and appears in over 1000 published papers a year. The challenge that now faces us is not obtaining these molecular profiles, but interpreting them to gain a better understanding of underlying biological processes.

We will describe how prior biological knowledge can be incorporated into a robust, quantitative approach for analyzing mRNA profile data and used to shed light on the mechanisms of disease.

biosketch: Satoru Miyano, Ph.D., is a Professor at the Human Genome Center, Institute of Medical Science, University of Tokyo. He obtained his Ph.D. in Mathematics from Kyushu University in 1984. His research group is developing computational methods for inferring gene networks from microarray data and other biological data, e.g., protein-protein interactions and promoter sequences. The group has also developed a software tool called Genomic Object Net for modeling and simulation of various biological systems. This software is now commercialized as Cell Illustrator. Currently, his research group is working intensively on developing a gene network of human endothelial cells by knocking down hundreds of genes. With these technological achievements, his research is now heading toward the creation of Systems Pharmacology.

talk title: Computational Challenges for Gene Networks

abstract: Gene networks play a central role in systems biology. This talk presents two computational approaches related to gene networks.

First, computational methods for estimating gene networks from microarray gene expression data are presented. We consider microarray data obtained by various perturbations such as gene disruptions, shocks, drug responses, time-course measurements, etc. The idea is to combine the Bayesian network approach with nonparametric regression, where genes are regarded as random variables and the nonparametric regression enables us to capture relationships between genes ranging from linear to nonlinear. As a criterion for choosing good networks, we defined an information criterion called the BNRC (Bayesian network and Nonparametric Regression Criterion) score. Naturally, the sole use of microarray data has limitations for gene network estimation. To improve the biological accuracy of estimated gene networks, we built a general framework by extending this method to use other genome-wide biological information such as sequence information on promoter regions and protein-protein interactions. The problem of finding optimal Bayesian networks is known to be computationally intractable. We also developed an algorithm for searching and enumerating optimal and suboptimal Bayesian networks in feasible time on supercomputers. Computational experiments with this search algorithm have provided evidence of the biological rationality of our computational strategy. Gene networks were then applied to the search for drug target genes. By exploring gene networks estimated from microarray data based on gene disruptions and drug doses, a novel drug target gene was identified and validated. For this purpose, we developed software for visualizing and analyzing gene networks, which played an important role in the discovery. This suggests that our gene network approach can be a strong tactic for searching for drug target genes.
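
To make the role of nonparametric regression in parent selection concrete, here is a minimal sketch under stated assumptions. It does not compute the BNRC score. As a simpler stand-in, it scores each candidate parent gene by the leave-one-out squared error of a Gaussian-kernel (Nadaraya-Watson) regression of the child gene on that parent. The gene names, the bandwidth and the data are all invented for the illustration.

```python
import math


def loo_kernel_error(x, y, bandwidth=0.5):
    """Leave-one-out mean squared error of a Gaussian-kernel regression of y on x."""
    n = len(x)
    total = 0.0
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(n):
            if j == i:
                continue  # leave sample i out when predicting it
            w = math.exp(-0.5 * ((x[i] - x[j]) / bandwidth) ** 2)
            num += w * y[j]
            den += w
        pred = num / den if den > 0 else 0.0
        total += (y[i] - pred) ** 2
    return total / n


def best_parent(expression, child):
    """Score every other gene as the single parent of `child`; lower is better."""
    scores = {
        gene: loo_kernel_error(values, expression[child])
        for gene, values in expression.items()
        if gene != child
    }
    return min(scores, key=scores.get), scores


# Invented data: `target` depends on geneA through a purely nonlinear (quadratic)
# relationship, while geneB is unrelated to it.
gene_a = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
expression = {
    "geneA": gene_a,
    "geneB": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0],
    "target": [a * a for a in gene_a],
}
parent, scores = best_parent(expression, "target")
```

In this toy data the covariance between geneA and the target is zero, so a straight-line fit would see no relationship, yet the kernel regression still ranks geneA ahead of geneB. A full method would also search over sets of parents and penalize model complexity, which this sketch omits.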

Second, a software tool for modeling and simulating gene networks, based on the notion of Petri nets, is presented. An important challenge is to create a software platform with which scientists in biology and medicine can comfortably model and simulate dynamic causal interactions and processes in cells. To this end, we developed the software Cell Illustrator (http://www.gene-networks.com), which uses the notion of Hybrid Functional Petri Net with extension (HFPNe) as its architecture. Cell Illustrator has a biology-oriented GUI, so that very complex biological processes can be modeled much as one would work with a drawing tool. Further, we can create a personalized visualization of a simulation by developing an XML document for animation. Its effectiveness has been demonstrated by modeling various biological processes.
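
For readers unfamiliar with Petri nets, the following minimal sketch shows the basic discrete formalism: places hold tokens, and a transition fires when its input places hold enough tokens. It is a plain discrete Petri net, not HFPNe, and has no connection to Cell Illustrator's file format or interface. The place and transition names are invented for the illustration.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n
               for place, n in transition["consume"].items())


def fire(marking, transition):
    """Return the new marking after firing; the input marking is not modified."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new = dict(marking)
    for place, n in transition["consume"].items():
        new[place] = new.get(place, 0) - n
    for place, n in transition["produce"].items():
        new[place] = new.get(place, 0) + n
    return new


# Toy gene-expression net (illustrative names only). The gene and the mRNA act
# as catalysts: they are consumed and immediately produced again.
transcription = {"consume": {"gene": 1}, "produce": {"gene": 1, "mRNA": 1}}
translation = {"consume": {"mRNA": 1}, "produce": {"mRNA": 1, "protein": 1}}
degradation = {"consume": {"mRNA": 1}, "produce": {}}

m0 = {"gene": 1, "mRNA": 0, "protein": 0}
m1 = fire(m0, transcription)  # one mRNA token appears
m2 = fire(m1, translation)    # one protein token appears
m3 = fire(m2, degradation)    # the mRNA token is removed
```

After these three firings the marking is one gene token, no mRNA and one protein, and translation is no longer enabled. Hybrid extensions such as HFPNe add continuous places and rate functions on top of this basic token game.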

homepage: http://bonsai.ims.u-tokyo.ac.jp

biosketch: Pavel Pevzner holds the Ronald R. Taylor Chair in Computer Science. He joined the UCSD faculty in 2000, following five years in the University of Southern California's Mathematics and Computer Science departments. From 1992-95, he was an associate professor at Pennsylvania State University. From 1990-92 Pevzner was a postdoctoral researcher at USC. He received his Ph.D. in 1988 from the Moscow Institute of Physics and Technology. Pevzner is the author of the book "Computational Molecular Biology: An Algorithmic Approach" (MIT Press, 2000) and also "Introduction to Bioinformatics Algorithms", co-authored with Neil Jones (MIT Press, 2004). He is an executive editor of the "Journal of Computational Biology," and co-founder of the International Conference on Research in Computational Biology (RECOMB).

homepage: http://www-cse.ucsd.edu/users/ppevzner

talk title: Transforming Mice into Men: Fragile versus Random Breakage Models of Chromosome Evolution

abstract: Despite some differences in appearance and habits, men and mice are genetically very similar. In a pioneering paper, Nadeau and Taylor (1984) estimated that surprisingly few genomic rearrangements (about 200) have happened since the divergence of human and mouse 75 million years ago. The genomic sequences of human and mouse provide evidence for a larger number of rearrangements than previously thought and shed some light on previously unknown features of mammalian evolution. In particular, they provide evidence for extensive re-use of breakpoints from the same relatively short regions and reveal a great variability in the rate of micro-rearrangements along the genome. Our analysis also implies the existence of a large number of very short "hidden" synteny blocks that were invisible in comparative mapping data and were ignored in previous studies of chromosome evolution. These results suggest a new model of chromosome evolution that postulates that breakpoints are chosen from relatively short fragile regions that have much higher propensity for rearrangements than the rest of the genome.
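
The contrast between the two models can be illustrated with a toy simulation. It is a sketch under stated assumptions, not the analysis behind the talk: breakpoints are drawn uniformly either from every site in a genome (random breakage) or from a small set of fragile regions, and the code counts how often a draw lands on a site that was already broken. The number of breakpoints, the genome size and the number of fragile regions are all hypothetical.

```python
import random


def breakpoint_reuse(n_breaks, n_sites, rng):
    """Draw n_breaks breakpoints uniformly from n_sites candidate sites and
    return how many draws landed on a site that had already been broken."""
    used = set()
    reused = 0
    for _ in range(n_breaks):
        site = rng.randrange(n_sites)
        if site in used:
            reused += 1
        else:
            used.add(site)
    return reused


rng = random.Random(0)  # fixed seed so the run is repeatable
N_BREAKS = 400          # hypothetical number of breakpoints

# Random breakage: any of a million sites may break.
reuse_random = breakpoint_reuse(N_BREAKS, 1_000_000, rng)

# Fragile breakage: breaks are confined to 50 fragile regions.
reuse_fragile = breakpoint_reuse(N_BREAKS, 50, rng)
```

With 400 draws confined to 50 regions, at least 350 draws must reuse a region, whereas reuse among a million equally likely sites is expected to be rare. Extensive breakpoint reuse in real data is therefore hard to reconcile with purely random breakage.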

biosketch: Gunnar von Heijne has a long-standing interest in membrane proteins, and has in particular contributed to the understanding of their membrane assembly and topology. In addition to experimental molecular biology work, he has also taken part in the development of widely used bioinformatics prediction methods such as SignalP, TargetP, TopPred and TMHMM. He has published around 240 scientific articles, and is listed in the ISI ‘Highly Cited’ database.

talk title: Membrane Proteins in vivo and in silico - Getting the Best of Two Worlds

abstract: Membrane protein research has gained a lot of momentum in recent years: high-resolution structures are being produced at an increasing rate, membrane proteomics is coming on line, and membrane proteins are recognized as drug targets of major importance. Bioinformatics has always been an integral part of the developments in the field, and today provides the tools necessary to identify the membrane complement of proteomes and to predict topologies and – in lucky cases – full 3D models of membrane proteins.

As in so many other areas, much is to be gained from a tighter integration between bioinformatics and experimental studies of membrane proteins. In our own work, we are reaching both towards proteome-wide studies of membrane proteins and towards a quantitative understanding of the cellular processes underlying the integration of proteins into biological membranes; in both cases, experimental and theoretical approaches must be combined to push forward.

References

Hessa, T., Kim, H., Bihlmaier, K., Lundin, C., Boekel, J., Andersson, H., Nilsson, I.M., White, S.H., and von Heijne, G. (2005) Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature 433, 377-381.
Daley, D.O., Rapp, M., Granseth, E., Melén, K., Drew, D., and von Heijne, G. (2005) Global topology analysis of the Escherichia coli inner membrane proteome. Science, in press.

homepage: http://www.sbc.su.se/gunnar