16th Annual
International Conference on
Intelligent Systems
for Molecular Biology

Metro Toronto Convention Centre (South Building)
Toronto, Canada


Special Session Details

Health and Diseases in our Genomes: Mining Genetic Variation
Organizer(s): Thomas Hudson, PhD
Ontario Institute for Cancer Research
Toronto, Canada

Date: Sunday, July 20
Time: 10:45 a.m. - 12:40 p.m.
Room: 701A
*Schedule subject to change

Tremendous advances in human genetics have resulted from recent characterization of the landscape of human genetic variation, particularly as a result of the International HapMap project. With over 3 million single-nucleotide polymorphisms (SNPs) genotyped in samples from multiple populations, patterns of linkage disequilibrium have enabled the design of genome-wide SNP genotyping platforms with tag-SNPs that provide comprehensive analyses of common genetic variation across the genome. These platforms have enabled genome-wide association studies of a large number of common human diseases, which have identified many new disease loci in the past two years.

In this Special Session, speakers have been selected to describe various methods in genetic epidemiology, statistical genetics and population genetics that are being applied in genome-wide association studies of complex human traits. After the talks, a panel discussion will address current challenges in validating the loci observed to date, the frequent lack of protein-coding genes at the epicentre of association signals, and the other approaches needed to detect rarer or structural variants that may predispose to disease.

Confirmed presenters to date:

-Mark Lathrop (France)
-Mark Daly (USA)
-David Balding (UK)


Frontiers in cell imaging and automated image analysis

Organizer(s): Robert F. Murphy, Ph.D.
Ray and Stephanie Lane Professor of Computational Biology
Director, Lane Center for Computational Biology
Director, Joint Carnegie Mellon University-University of Pittsburgh Ph.D. Program in Computational Biology
Carnegie Mellon University
Sponsored by Cellumen, Inc., Pittsburgh, PA

Date: Sunday, July 20
Time: 2:15 p.m. - 4:10 p.m.
Room: 701A
*Schedule subject to change

Dramatic advances in fluorescent probe development, new fluorescence microscope designs that greatly improve temporal and spatial resolution, and significant advances in digital camera and computer technology have enabled increasing use of fluorescence microscopy for quantitative, large-scale studies of cell behavior. The high volume and high quality of the images produced by these studies have created, and will continue to create, many opportunities for computational analysis, especially in the realm of machine learning. A particular growth area is the systematic imaging of specific aspects of cell behavior across many conditions, such as proteome-wide determination of subcellular location. This special session will focus on recent advances in cell imaging capabilities and on automated analysis of cell images to discover new biological knowledge about cell structure and function.


Title: Using Fluorescent Proteins and Automated Live Cell Imaging to Study Membrane Biogenesis

Presenters: Mathew P.A. Henderson and David W. Andrews

Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, L8N 3Z5, Canada

Fluorescent proteins are most often used as tags to identify the subcellular localization of proteins in live cells. They can also be used as organelle-specific markers and as reporters for both protein:protein interactions and cellular processes such as organelle assembly. When fluorescent proteins are used as reporters, assays need not be end-point assays. The advantages of maintaining cell viability throughout screening include facilitating functional cloning and time-course experiments that can extend over prolonged periods. We are exploring the use of fluorescent proteins for image-based genetic screens in yeast and human cells. I will highlight some surprising effects on the regulation of the expression of exogenous genes in cells that create difficulties for screening. Using high-resolution automated fluorescence imaging and the yeast gene deletion library, we examined the effect of deleting each of 5,100 non-essential genes on the assembly of proteins into the endoplasmic reticulum (ER). Automated analysis of colocalization, texture and morphology features was carried out for more than 75,000 images. Multidimensional clustering of the images and support vector machines were used to classify the images of the different yeast strains. This revealed an array of phenotypes, including changes in ER morphology and in other processes required to maintain the unique protein and lipid composition of the organelle. Our preliminary data suggest that a surprising diversity of information can be recorded in image-based genetic screens using live cells and fluorescent proteins.


Title: Inference of morphogenic pathways from live cell images


Presenter: Gaudenz Danuser

The Scripps Research Institute


Morphogenic pathways such as those implicated in cell migration are regulated in space and time. Often they integrate mechanical signals with long-range effects and chemical signals with shorter-range effects. One of the prime challenges in the analysis of pathways is to define the hierarchy and kinetics of signal transduction between spatially and temporally distributed pathway components. My lab is building a novel image analysis paradigm by which we multiplex image measurements across many experiments to register time courses of signaling events relative to cellular outputs. Indirectly, this also defines the sequence of activation of pathway components that are not measured simultaneously. Key to our multiplexing concept is the local, high-resolution analysis of constitutive stochastic fluctuations in pathway components, i.e. at length scales below the diffusion radius of signaling molecules. Under these conditions, the correlation of time courses reveals precise information about the timing of, and the spatial relationships between, pathway activities, independent of intracellular and intercellular heterogeneity. From the timing, we can then infer causality and derive maps of the pathway hierarchy. In this presentation I will outline the idea of image fluctuation analysis and multiplexing and present first examples of pathway inference that underpin the potential of this data analysis approach.
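One generic way to read timing from correlated fluctuations is to find the time shift that maximizes the correlation between two activity time courses. The sketch below illustrates only that idea on synthetic data; it is not the lab's fluctuation analysis or multiplexing pipeline, and the signals, the 3-frame delay and the helper names `pearson` and `best_lag` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_lag(upstream, downstream, max_lag):
    """Return (lag, r) maximizing corr(upstream[t], downstream[t + lag]).

    A positive lag means fluctuations in `downstream` follow those in
    `upstream` by `lag` frames; a negative lag means they precede them.
    """
    best_shift, best_r = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = upstream[: len(upstream) - lag], downstream[lag:]
        else:
            a, b = upstream[-lag:], downstream[: len(downstream) + lag]
        r = pearson(a, b)
        if r > best_r:
            best_shift, best_r = lag, r
    return best_shift, best_r


# Synthetic example: "downstream" activity repeats the upstream
# fluctuations with a hypothetical delay of 3 frames.
rng = random.Random(0)
upstream = [rng.gauss(0.0, 1.0) for _ in range(200)]
downstream = [rng.gauss(0.0, 1.0) for _ in range(3)] + upstream[:-3]
lag, r = best_lag(upstream, downstream, max_lag=10)
print(f"estimated lag: {lag} frames (r = {r:.3f})")
```

Because the delayed copy here is noise-free, the 3-frame lag is recovered exactly; with real image measurements the correlation peak is broader and its significance would have to be assessed.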


Title: Towards Describing the Systems Architecture of Cell Morphogenesis

Presenter: Chris Bakal, Harvard Medical School


Cell shape is determined by genetically encoded complex adaptive systems that interface with the extracellular environment and engage in continual exploration of morphological space. Although most of the essential components of these systems have been identified, their architecture remains poorly understood. We have developed technologies to quantify hundreds of cellular features following systematic, high-throughput inhibition of gene function by RNAi. Each gene can then be assigned a "Quantitative Morphological Signature" (QMS) that describes its contribution to cell shape. Computational analysis of QMS compendiums reveals the structure of local networks, or subsystems, that act in a hierarchical manner to control different aspects of cell shape such as size, protrusion, and adhesion. QMSes can be used to infer the subcellular localization of local networks, as well as direct physical interactions between components. Furthermore, we have analyzed QMSes to describe how genomes have evolved to maximize effective morphological complexity.


Title: BISQUE: A Web based Database Infrastructure to Store, Organize and Analyze Bioimages

Presenter: B.S. Manjunath, University of California, Santa Barbara


Recent advances in imaging technologies have produced large volumes of image and video data, yet most of the analysis is still done manually and qualitatively. Manual analysis is not only time-consuming but also often irreproducible. Further, there is little, if any, database support for managing these image and video collections, that is, for storing, searching and retrieving image-related information within an integrated framework. In this talk, I will present recent progress at the UCSB Center for Bioimage Informatics in building an image informatics infrastructure. UCSB's BISQUE database system tightly integrates image analysis with traditional database functionality, and offers a rich set of tools for biologists to store, access, analyze, annotate, and share their image collections. I will discuss some of the embedded tools, including image segmentation, tracking and registration, that enable quantitative analysis of bioimage data, as well as efforts to benchmark and validate some of these image processing methods. This work is in collaboration with Professors Ambuj Singh (databases), Ken Rose (pattern recognition), Steve Fisher (cell biology/retinal detachment), Stu Feinstein and Les Wilson (both on microtubule dynamics).


Interaction Networks and Disease
Organizer(s): Shoshana Wodak, PhD (1) and Gary Bader (2)
1 Professor, Departments of Biochemistry and Medical Genetics,
University of Toronto, Canada
Senior Scientist, Hospital for Sick Children, Toronto
Canada Research Chair in Computational Biology and Bioinformatics
2 University of Toronto

Date: Monday, July 21
Time: 10:45 a.m. - 12:40 p.m.
Room: 701A
*Schedule subject to change

The recent explosion of high-throughput experimental technologies for characterizing protein interactions has generated large amounts of data describing interactions among thousands of genes and proteins, enabling, for the first time, system-level investigations of the proteome and the underlying biological processes. These developments, together with new discoveries from genome sequencing efforts, notably on the conservation of cellular processes across organisms and the multifaceted functions of proteins, have fundamentally altered the landscape of protein science. Exploring this landscape has become increasingly multidisciplinary and challenging. Impressive progress in analytical and imaging techniques, easy access to information, and ever more powerful computers to interpret and integrate it are helping us meet the challenge of finding the routes that lead to new knowledge.

The session will cover exciting achievements and frontiers in interactome analysis. Topics will include the use of structural information to help interpret data on protein interaction networks and their evolution; the integration of biochemical, proteomics, mass spectrometry and cryo-EM data to build models of large molecular machines; learning about human diseases from the interactomes of model organisms; and mapping protein complexes onto pathways using information on genetic interactions and phenotypes.


Understanding Protein Function on a Genome-scale using Networks
Mark Gerstein
MB&B Department, Bass 432, 266 Whitney Ave., Yale University, New Haven, CT 06520, USA
My talk will be concerned with topics in proteomics, in particular predicting protein function on a genomic scale. We approach this through the prediction and analysis of biological networks, focusing on protein-protein interaction and transcription-factor-target networks. I will describe how these networks can be determined through integration of many genomic features and how they can be analyzed in terms of various simple topological statistics. In particular, I will discuss a number of specific analyses: (1) integrating gene expression data with the regulatory network illuminates transient hubs; (2) integrating the protein interaction network with 3D molecular structures reveals different types of hubs, depending on the number of interfaces involved in interactions (one or many); (3) analysis of betweenness in biological networks reveals that this quantity is more strongly correlated with essentiality than degree is; (4) analysis of the structure of the regulatory network shows that it has a hierarchical layout, with the "middle managers" acting as information bottlenecks; and (5) development of useful web-based tools for the analysis of networks, TopNet and tYNA.
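Betweenness counts how many shortest paths between other pairs of nodes pass through a node, whereas degree simply counts a node's neighbours. The minimal sketch below, which is not code from the talk or from TopNet/tYNA, computes both with Brandes' standard algorithm for unweighted graphs; the toy network (two four-node cliques joined only through a linker node `x`) is a hypothetical example chosen to show how a low-degree bottleneck can have the highest betweenness.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    `adj` maps each node to the set of its neighbours. Returns the raw
    (unnormalized) betweenness of every node, counting each unordered
    pair of endpoints once.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2.0 for v, b in bc.items()}


def clique(nodes):
    return {u: {v for v in nodes if v != u} for u in nodes}


# Hypothetical toy network: two 4-node cliques joined only through "x".
adj = {**clique(["a1", "a2", "a3", "a4"]), **clique(["b1", "b2", "b3", "b4"])}
adj["x"] = {"a1", "b1"}
adj["a1"].add("x")
adj["b1"].add("x")

bc = betweenness(adj)
for node in sorted(adj, key=bc.get, reverse=True)[:3]:
    print(node, "degree:", len(adj[node]), "betweenness:", bc[node])
```

In this toy graph `x` has the lowest degree (2) but the highest betweenness (16, versus 15 for `a1` and `b1`), which is the kind of divergence between the two measures that a bottleneck analysis exploits.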

Relevant links and publications

Integrated prediction of the helical membrane protein interactome in yeast. Y Xia, LJ Lu, M Gerstein (2006) J Mol Biol 357: 339-49.

Relating three-dimensional structures to protein networks provides evolutionary insights. PM Kim, LJ Lu, Y Xia, MB Gerstein (2006) Science 314: 1938-41.

The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks. KY Yip, H Yu, PM Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

The importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics. H Yu, PM Kim, E Sprecher, V Trifonov, M Gerstein (2007) PLoS Comput Biol 3: e59.

Genomic analysis of the hierarchical structure of regulatory networks. H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A 103: 14724-31.

The role of disorder in interaction networks: a structural analysis. PM Kim, A Sboner, Y Xia, M Gerstein (2008) Mol Syst Biol 4: 179.

Positive selection at the protein network periphery: evaluation in terms of structural constraints and cellular context. PM Kim, JO Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

High Quality Binary Protein Interaction Map Reveals Properties of Yeast Interactome Network
Haiyuan Yu and Marc Vidal
Center for Cancer Systems Biology, Dana-Farber Cancer Institute, 1 Jimmy Fund Way
Smith 858, Boston, MA 02115, USA

Current yeast interactome network maps contain several hundred molecular complexes with limited and somewhat controversial representation of direct binary interactions. We developed an empirically controlled mapping framework to produce a "second-generation" high-quality interactome dataset covering ~20% of all binary interactions. Compared with co-complex interactome models, this new map is enriched for transient signaling interactions and inter-complex connections. We find highly significant clustering among essential proteins, an unexpected result that might relate to network evolution. Surprisingly, protein connectivity correlates with genetic pleiotropy rather than with essentiality. The wealth and quality of biological information obtained by our new approach argue for more comprehensive network maps for yeast and other organisms.

Mouse Functional Linkage Graphs and Complex Disease
Murat Tasan1, Weidong Tian1, Mikkel Oestergaard2, Jonathan Tyrer3, Jonathan Morrison3, David P Hill4, Judith A Blake4, Bruce AJ Ponder3, Douglas F Easton5, Paul D. Pharoah3, and Frederick P. Roth
1Harvard Medical School, Biological Chemistry & Molecular Pharmacology Dept, Boston, MA 02115, USA
2Dept. of Public Health and Primary Care, Strangeways Research Laboratories, Cambridge CB1 8RN, UK
3Cancer Research UK Dept. of Oncology, Strangeways Research Laboratories, Cambridge CB1 8RN, UK
4The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
5Cancer Research UK Genetic Epidemiology Group, Strangeways Research Laboratories, Cambridge CB1 8RN, UK

Objective: Several years after the sequencing of the human and mouse genomes, the functions of human and mouse genes remain largely undiscovered. A major challenge is to focus limited experimental resources on the most likely hypotheses using computational predictions of gene function and functional relationships.
Results: We predicted Gene Ontology (GO) and phenotype terms for 21,603 Mus musculus genes, using a diverse collection of integrated data sources (including expression, interaction, and sequence-based data) and an optimized combination of 'guilt-by-profiling' and 'guilt-by-association' inference methodologies. Predictions were evaluated using a held-out gene set, and top predictions were examined manually using the available literature. A set of 'functionally linked' genes and corresponding SNP pairs was generated from the prediction scores and used to focus a search for human SNP pairs associated with breast cancer status. In a sample of 4,000 breast cancer cases and 4,000 controls, with ~13,500 SNP pairs tested, the false discovery rate was 83% among SNP pairs that were not functionally linked but fell to 50% among functionally linked SNP pairs (at a nominal p-value of 2 × 10^-4).
Conclusions: We assigned a confidence score to each gene/term combination. Nearly every GO term achieved greater than 40% precision at 1% recall. Among the 36 novel predictions for GO terms and 40 for phenotypes that were studied manually, >80% and >40%, respectively, were identified as accurate. A combination of 'guilt-by-profiling' and 'guilt-by-association' outperforms either approach alone. We have shown that functional linkage graphs can be used to enrich for SNP pairs exhibiting complex association with breast cancer susceptibility.
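'Guilt-by-association' assigns a function to a gene according to the annotations of the genes it is linked to. The sketch below is a deliberately simple weighted neighbour vote on a functional linkage graph; it is not the authors' optimized combination of guilt-by-profiling and guilt-by-association, and the function name `guilt_by_association`, the gene names and the edge weights are hypothetical.

```python
from collections import defaultdict


def guilt_by_association(edges, positives):
    """Score genes for one term by a weighted vote of their neighbours.

    edges: iterable of (gene_a, gene_b, weight) with weight > 0, i.e. a
        functional linkage graph.
    positives: set of genes already annotated with the term.
    Returns {gene: score in [0, 1]} for genes not yet annotated, where the
    score is the fraction of the gene's total edge weight that connects
    it to annotated genes.
    """
    total = defaultdict(float)
    hits = defaultdict(float)
    for a, b, w in edges:
        for gene, neighbour in ((a, b), (b, a)):
            total[gene] += w
            if neighbour in positives:
                hits[gene] += w
    return {g: hits[g] / total[g] for g in total if g not in positives}


# Hypothetical linkage graph and annotations, for illustration only.
edges = [
    ("g1", "g3", 1.0),
    ("g2", "g3", 1.0),
    ("g3", "g4", 2.0),
    ("g4", "g5", 1.0),
]
scores = guilt_by_association(edges, positives={"g1", "g2"})
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(gene, round(score, 2))
```

Ranking unannotated genes by such scores is one way to prioritize hypotheses for experimental follow-up; the abstract additionally evaluates predictions on a held-out gene set, which this sketch does not attempt.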

Protein networks for classification of disease
Trey Ideker
Department of Bioengineering, University of California San Diego,
9500 Gilman Drive, Mail Code 0412, La Jolla, CA 92093-0412
During a decade of proof-of-principle analysis in model organisms, protein networks have been used to further the study of molecular evolution, to gain insight into the robustness of cells to perturbation, and for assignment of new protein functions. Following on these analyses, and with the recent rise of protein interaction measurements in mammals, protein networks are increasingly serving as tools to unravel the molecular basis of disease. We discuss promising applications of protein networks to disease in three major areas: identifying new disease genes; identifying disease-related subnetworks; and network-based disease classification. Applications in infectious disease, personalized medicine, and pharmacology are also forthcoming as the available protein network information improves in quality and coverage.
Relevant Publications and links:
Ideker T and Sharan R. Protein networks in disease. Genome Research 18(4):644-52 (2008).
Chuang, HY, Lee, E, Liu, YT, Lee, D, and Ideker, T. Network-based classification of breast cancer metastasis. Mol Syst Biol. 3:140 (2007).
Suthram S, Beyer A, Karp RM, Eldar Y, Ideker T. eQED: an efficient method for interpreting eQTL associations using protein networks. Mol Syst Biol. 4:162 (2008).
Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N, Workman C, Christmas R, Avila-Campilo I, Creech M, Gross B, Hanspers K, Isserlin R, Kelley R, Killcoyne S, Lotia S, Maere S, Morris J, Ono K, Pavlovic V, Pico AR, Vailaya A, Wang PL, Adler A, Conklin BR, Hood L, Kuiper M, Sander C, Schmulevich I, Schwikowski B, Warner GJ, Ideker T*, and Bader GD. (Ideker is corresponding author). Integration of biological networks and gene expression data using Cytoscape. Nature Protocols 2(10):2366-82 (Oct 1, 2007).

We gratefully acknowledge funding through NIH/NIGMS grant GM070743-01, NSF grant CCF-0425926, Unilever, and the Packard Foundation.

Unbiased Biology: Functional Insights from Genetic and Protein-Protein Interaction Maps
Nevan Krogan
Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA, USA.
Pathways and complexes can be considered fundamental units of cell biology, but their relationship to each other is difficult to define. Comprehensive tagging and purification experiments have generated networks of interactions that represent most stable protein complexes in yeast cells. We describe this work and show how the analysis of pairwise epistatic relationships between genes complements the physical interaction data and, furthermore, can be used to classify gene products into parallel and linear pathways.

Determining the architectures of macromolecular assemblies by integrating spatial restraints from proteomic data
Frank Alber1, Svetlana Dokudovskaya5, Liesbeth M. Veenhoff6, Damien Devos7, Brian T. Chait3, Michael P. Rout2 and Andrej Sali4
1Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089
2Laboratory of Cellular and Structural Biology and 3Laboratory of Mass Spectrometry and Gaseous Ion Chemistry, The Rockefeller University, New York, NY 10021
4Department of Biopharmaceutical Sciences and California Institute for Quantitative Biosciences (QB3), UCSF, San Francisco, CA 94158
5Institut Jacques Monod, Paris
6University of Groningen
7EMBL Heidelberg
To understand the workings of the living cell, we need a detailed description of the architectures of its macromolecular assemblies. Here we show how low-resolution proteomic and biophysical data can be used to determine such structures. The process involves collection of sufficient and diverse proteomic and biophysical data, translation of these data into spatial restraints, and an optimization that uses these restraints to generate an ensemble of structures consistent with the data. Analysis of the ensemble produces a detailed architectural map and interaction network of the assembly. We developed our approach on a challenging model system, the nuclear pore complex (NPC). The NPC acts as a dynamic barrier to control access to and from the nucleus, and in yeast is a 50 MDa assembly of 456 proteins. Our resulting structure reveals the configuration of the proteins in the NPC and provides insights into its evolution and architectural principles. The present approach should be applicable to many other macromolecular assemblies.


Structural Bioinformatics: Deciphering the proteome
Organizer(s): Igor Jurisica, PhD (1) and Ryan Lilien, PhD (2)
1 Ontario Cancer Institute and University of Toronto
Toronto, Canada
2 University of Toronto
Department of Computer Science & Medicine

The Structural Bioinformatics special session was supported in part by MITACS and IBM.

Date: Monday, July 21
Time: 2:15 p.m. - 4:10 p.m.
Room: 701A
*Schedule subject to change

Structural bioinformatics is the branch of bioinformatics concerned with the analysis, prediction, and experimental determination of the three-dimensional structures of biological macromolecules such as proteins.

While there has been significant progress in individual sub-areas, a growing trend is to integrate data and algorithms across different fields to achieve a more systematic study of the proteome. For example, sequence analysis has improved motif space coverage for structural genomics, machine learning and data mining have helped optimize crystallization conditions, image analysis has enhanced high-throughput crystallization, and prediction of protein complexes through various experimental and computational docking methods has improved our understanding of protein-protein interactions.

The three speakers selected for this special session are leaders in experimental (X-ray crystallography and NMR spectroscopy) and computational (ab initio) techniques. They will present their thoughts on current bottlenecks and deficiencies in experimental protein structure determination, the experimental study of protein structural dynamics, and tertiary structure prediction. The session will close with a panel discussion on structural bioinformatics: its challenges, impact and future development.

Dr. Richard Bonneau is the technical lead on the Human Proteome Folding project. Dr. Bonneau's expertise is primarily in ab initio protein structure prediction, protein folding, and regulatory network inference. He is currently focused on applying structure prediction and structural information to functional annotation and to the modeling and prediction of regulatory and physical networks. Dr. Bonneau is working to develop general methods to solve the structures of proteins and protein complexes using small sets of distance constraints derived from chemical cross-linking. At NYU, Dr. Bonneau also works on a number of systems biology data-integration and analysis algorithms, including algorithms designed to infer global regulatory networks from systems biology data.

Title: Reading the Crystallization Tea Leaves With A Little Help from My Web Friends
Presenter: George De Titta, Ph.D. (joint work with C. A. Cumbaa, J. Luft, A. Lauricella, M. Malkowski, R. Nagel, E. Snell, I. Jurisica)

George De Titta took his Ph.D. from the University of Pittsburgh working with Bryan Craven and joined the scientific staff of the Hauptman-Woodward Institute (then the Medical Foundation of Buffalo) in 1973. He has served in a number of roles at HWI, most recently as its chief executive officer, a position he held for nine years. He is now a Principal Research Scientist within the Institute. In addition, he is Professor and Chairman of the Department of Structural Biology at the State University of New York at Buffalo. His research interests include algorithm development for the crystallographic phase problem, the structures of vitamins and prostaglandins, and the macromolecular crystallization problem.

Title: On creating publicly available datasets in drug discovery science. Future opportunities for data mining
Presenter: Aled Edwards, Ph.D.

Aled Edwards is the Chief Executive of the Structural Genomics Consortium, an Anglo-Canadian-Swedish public-private partnership created to substantially increase the number of protein structures of relevance to human health available in the public domain. Dr. Edwards received his Ph.D. in Biochemistry from McGill University and did his postdoctoral training at Stanford University with Roger Kornberg. He moved to McMaster University in 1992 and then to the University of Toronto in 1997. He is now Banbury Professor of Medical Research in the Banting and Best Department of Medical Research at the University of Toronto.


Sequencing thousands of human genomes - perspectives, challenges, and analysis
Organizer(s): Francisco De La Vega (1) and Gabor Marth (2)
1 Applied Biosystems, Foster City, CA, USA
2 Boston College, Boston, MA, USA

Date: Tuesday, July 22
Time: 10:45 a.m. - 12:40 p.m.
Room: 701A
*Schedule subject to change

DNA sequencing based on capillary electrophoresis revolutionized biological science and led to the sequencing of the human genome and the genomes of other important model organisms. The success of these genome projects has only whetted the appetite of the scientific community, creating demand for cheaper, faster, and more sensitive sequencing technologies. In 2004, NHGRI initiated a coordinated effort to support the development of technologies to dramatically reduce the cost of DNA sequencing, a move aimed at broadening the applications of genomic information in medical research and health care. NHGRI has so far awarded over $83 million to investigators to develop both "near term" and revolutionary sequencing technologies (a.k.a. next generation sequencing technologies). Many of the technologies funded by NIH under this program, and others developed independently, are now beginning to bear fruit, and new next generation sequencing technologies are being commercialized.

The term next generation sequencing (NGS) has been used in two ways: (a) to describe the potential technologies that will achieve the $100,000 human genome and eventually the $1,000 genome, and (b) to describe new sequencing methodologies that do not use the Sanger (dideoxy) sequencing method and are massively parallel. To achieve these ambitious goals, throughput must be increased dramatically, which is done by carrying out many reactions in parallel. Although read lengths are short (down to 25-50 bp), the overall throughput is enormous, with each instrument run producing up to several hundred million reads and billions of base pairs of sequence data.
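The yield of a run is the number of reads multiplied by the read length, and dividing that yield by the genome size gives the mean depth of coverage. The sketch below works through this arithmetic. The run size (400 million reads of 35 bp) is an illustrative value within the ranges quoted above, the genome size of about 3.1 × 10^9 bp is an approximation, and the uncovered-fraction estimate assumes reads land uniformly at random (a Poisson, Lander-Waterman style model), which real short-read data do not fully satisfy.

```python
import math


def run_yield(n_reads, read_length_bp):
    """Total sequenced bases from one run."""
    return n_reads * read_length_bp


def mean_coverage(total_bases, genome_size_bp):
    """Average number of reads covering each genome position."""
    return total_bases / genome_size_bp


def fraction_uncovered(coverage):
    """Expected fraction of bases with zero reads under a Poisson model."""
    return math.exp(-coverage)


# Illustrative values only: 400 million 35-bp reads, ~3.1 Gb genome.
total = run_yield(400_000_000, 35)
cov = mean_coverage(total, 3_100_000_000)
print(f"yield: {total / 1e9:.1f} Gb")
print(f"mean coverage: {cov:.1f}x")
print(f"expected uncovered fraction: {fraction_uncovered(cov):.3f}")
```

Repeats and mapping ambiguity for short reads make effective coverage lower than this idealized figure, which is one reason determining the uniquely resequenceable regions of the genome matters.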

Given the commercial availability of many NGS platforms, sequencing centers supported by the Wellcome Trust and NHGRI have started to plan the sequencing of up to 1000 genomes to enable the unbiased identification of sequence variants with frequencies down to 1% in major human populations (the "1000 Genomes Project"). Previous surveys of sequence variation, such as the International HapMap Project, focused on common genetic variation through the genotyping of three million SNPs in a panel of samples representing four continental population groups. The SNPs for genotyping were selected with a strong ascertainment bias favoring highly polymorphic SNPs. Although these reference data have proved invaluable for designing and interpreting genome-wide association studies of common complex diseases, the role of rare and structural variation is still poorly understood. The next wave of human genome resequencing therefore aims to better understand and catalog rarer variation in an extended panel of HapMap samples (about 1,000 DNAs), and to lay the groundwork for personalized medicine based on complete genome information.

The magnitude of such sequencing projects presents many bioinformatics challenges that need to be addressed quickly. Among these are dealing with the diverse outputs and error models of the new sequencing machines; managing huge volumes of data; determining the regions of genome sequence that can be uniquely resequenced with shorter reads; developing fast and accurate reference-guided read realignment algorithms; adapting SNP and short-INDEL discovery software to next-generation read types; and developing new software for structural variation discovery. Clearly, the next frontier for next-generation sequencing instruments will be individual human resequencing projects.

Title: Challenges in the analysis of next generation sequencing data to enable sequencing of thousands of human genomes
Presenter: Gabor Marth, D.Sc., Boston College, Boston, MA, USA

Title: Analysis of Low Coverage Shotgun Sequence Data in Hundreds or Thousands of Individuals
Presenter: Goncalo Abecasis, Ph.D., University of Michigan, Ann Arbor, MI, USA

Title: Population genetic analysis of shotgun assemblies of genomic sequences from multiple individuals
Presenter: Rasmus Nielsen, Ph.D., U.C. Berkeley, Berkeley, CA, USA

Title: Genome-wide surveys of sequence and structural variation with short-read, next generation sequencing
Presenter: Francisco De La Vega, D.Sc., Applied Biosystems, Foster City, CA, USA

Francisco M. De La Vega is Scientific Fellow in Genetics and Senior Director, Computational Genetics, at Applied Biosystems in Foster City, California. Dr. De La Vega held an Assistant Professor appointment at the Center for Research and Advanced Studies of the National Polytechnic Institute in Mexico City from 1989 to 1997, where he developed and headed the Computational Biology unit. There, he was involved in mining protein families from genomic databases, scientific visualization, and statistical analysis of gene expression patterns in the bacteria-bacteriophage system. A pioneer in distance education, he was on the faculty of the first internet-based biocomputing course in 1996. In 1998 he moved to industry to develop the bioinformatics efforts for the microarray products of Applied Biosystems. He then led the design, SNP selection, and analysis of a large pioneering project that genotyped over 160,000 gene-centric SNPs to survey the patterns of genetic variation. Recently he has been involved in the development of genotyping platforms and in the design and analysis of genetic association studies. His current interests include the application of ultra-high throughput sequencing technologies in genetic epidemiology and population studies. His team is working on algorithms for the identification of sequence variations (both single-base and structural) with the recently released Applied Biosystems SOLiD System, a next-generation, high-throughput, massively parallel sequencing platform based on oligonucleotide ligation chemistry. He has previously organized four session tracks at the Pacific Symposium on Biocomputing, including one on "Computational Tools for Next-Generation Sequencing Applications", held in January 2008.

Gabor Marth is Assistant Professor and Head of the Bioinformatics Program in the Biology Department at Boston College. He received his doctorate in 1995 from the Department of Systems Science and Mathematics at Washington University in St. Louis. He was a postdoc under the guidance of Robert Waterston at the Washington University School of Medicine, where he trained as a bioinformatician and developed software algorithms for genome sequencing. He also developed PolyBayes, a novel Bayesian polymorphism discovery program that is still in wide use today. In 2000 he moved to the National Center for Biotechnology Information (NCBI) at the National Institutes of Health, working as a Staff Scientist with Stephen Altschul. During his tenure at the NCBI, Dr. Marth led a number of SNP discovery projects that contributed to the first dense SNP marker map of the human genome. He also developed theoretical population genetic methods for improved demographic inference from SNP data, and participated in the study design of the International HapMap project. He joined Boston College as a tenure-track faculty member in 2003. His current research includes the development of

Computational Challenges and Opportunities in Host-Pathogen Systems Biology
Organizer(s): T. M. Murali [1] and Matthew D. Dyer [2,3]
[1] Department of Computer Science,
[2] Genetics, Bioinformatics, and Computational Biology Program, and
[3] Virginia Bioinformatics Institute,
Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061

Date: Tuesday, July 22
Time: 2:15 p.m. - 4:10 p.m.
Room: 701A
*Schedule subject to change

Description of the field
Sophisticated high-throughput biological experiments to interrogate a cell provide a wide range of information about cell state. These advances, combined with the public availability of the resulting datasets, herald the era of systems biology. Numerous, diverse, and rich resources exist for studying the systems biology of model organisms such as E. coli, S. cerevisiae, C. elegans, and D. melanogaster. Experimental and computational methods developed for these organisms are now being extended to mammals and are rapidly helping to shed light on the systems biology of complex human diseases such as cancer and diabetes.

Despite these advances, the computational molecular systems biology of infectious diseases has not received as much attention. Pathogen-related diseases result in millions of deaths each year and cause billions of dollars' worth of damage to crops and livestock. Millions of dollars are spent annually to better understand how pathogens infect their hosts and to identify potential targets for therapeutics. The systems biology of infectious disease is intrinsically more difficult than that of model organisms because it involves the interaction of two complex systems. Furthermore, a significant hurdle to progress in host-pathogen systems biology is the severe lack of large-scale datasets detailing the interactions between host and pathogen molecules during infection and the events triggered by these interactions.

This special session will stimulate research on the interactions that occur at the interface between host and pathogen. The session will bring together researchers working at the forefront of this interface, whether they study comparative genomics, gene expression, protein interactions, infection pathways, immune response, or other related aspects. The recent availability of experimental data for host-pathogen systems makes the development of novel computational techniques and bioinformatic resources important. The prominent scientists speaking at this session will discuss the challenges that arise in the experimental generation of cross-species molecular datasets and in the computational analysis of these data. The potentially transformative research that ensues is likely to yield novel datasets and computational techniques that will enable us to better understand mechanisms of infection and to identify key molecular targets for potential therapeutics.

Title of Talk | Presenter
Functional genomics will save the world from killer viruses (but it better happen soon!) | Michael G. Katze
Comparative interactomics of bacteria | Peter Uetz
Genome evolution of fungal pathogens | Christina Cuomo
Inferring genetic regulatory networks in host-pathogen interactions | Brett Tyler

Title: Functional genomics will save the world from killer viruses (but it better happen soon!)
Michael G. Katze, Department of Microbiology and the Washington National Primate Research Center, University of Washington.

To understand more fully the molecular events associated with influenza virus infections, we have studied both low-pathogenicity viruses and the more highly pathogenic 1918 and H5 avian-related viruses in animal models, including wild-type and knockout mice and non-human primates. Using histopathology, classic virology and immunology, and genomic and bioinformatics tools, we have been able to correlate the evolution of infection and disease with gene expression changes at a global level. We previously examined and compared transcriptional profiles in the lungs, bronchial brushes, and whole blood of 1918-infected animals. Higher-order bioinformatics analysis revealed dramatic differences in the expression of inflammatory, cytokine, and cell death genes at both a qualitative and a quantitative level; the differences were most dramatic and accelerated in the lungs of animals infected with the completely reconstructed 1918 viruses. Molecular events occurring extremely early after infection appear to dictate the pathogenic outcome of infection. Similar experiments have now been performed with highly pathogenic H5 viruses in mice and macaques, and we are now completing a detailed genomic and clinical comparison of 1918 and highly pathogenic H5 infections.

A long-range goal of ours remains to develop an "Influenza and Respiratory Virus Compendium": a centralized database cataloguing all transcriptional events in cells and organs infected by influenza and other respiratory viruses, such as SARS viruses, spanning a wide range of virulence. Identification of new host pathways affected by viruses should ultimately reveal novel therapeutic targets and lead to improved, much-needed antiviral therapies and vaccines.

About the Speaker
Michael G. Katze is Professor of Microbiology and Associate Director and Core Staff Scientist at the Washington National Primate Research Center. He received his Ph.D. from Hahnemann Medical College and was a postdoctoral fellow at the University of Uppsala in Sweden as part of a fellowship with the European Molecular Biology Organization. Prior to joining the faculty at the University of Washington, Dr. Katze conducted research in molecular biology and virology at the Memorial Sloan-Kettering Cancer Center in New York City. Dr. Katze has studied virus-host interactions for more than 25 years and is a recognized leader in applying genomic and proteomic technologies to the study of virus-host interactions and the interferon response. He is an author of over 170 papers and reviews and received the Milstein Award from the International Society of Interferon and Cytokine Research for his contributions to the interferon field. He was also recently awarded the prestigious Dozor Scholar Award by the Israeli Microbiology Society for his research accomplishments. Dr. Katze heads a laboratory of over 30 individuals, including graduate students, postdoctoral fellows, research scientists, technologists, software engineers, and bioinformatics and information technology specialists.

Title: Comparative interactomics of bacteria
Peter Uetz, the J. Craig Venter Institute and the Institute of Toxicology and Genetics, Forschungszentrum Karlsruhe

Proteins act by interacting with small chemical compounds and macromolecules, especially other proteins. We have been studying protein-protein interactions with a particular emphasis on proteins of unknown function, assuming that their interactions can tell us something about their activities. We recently completed the protein interaction network of Treponema pallidum, which contains about 3,600 interactions, although the real number may easily be twice as high. I will provide examples from the bacterial flagellar apparatus of how these interactions can be used to infer the functions of individual proteins. We have also set up genome-wide screens of Streptococcus pneumoniae and Francisella tularensis, with additional genomes in the pipeline. First data from these screens will be compared to our Treponema data and to published data from E. coli and Campylobacter jejuni. Yeast two-hybrid screens appear to recover only about a quarter of previously known interactions; however, this fraction can be increased to one third or more by screening multiple genomes. It remains unclear how many of these interactions are biologically relevant, and this issue will be discussed in more detail. Protein interaction maps of bacteria will also be discussed in the context of human-bacterial interactions.

About the Speaker
Peter Uetz is a group leader at the Institute of Genetics, Research Center Karlsruhe, Germany, and an Associate Investigator at the J. Craig Venter Institute. He received his Ph.D. in 1997 from the European Molecular Biology Laboratory (EMBL) in Heidelberg, Germany. Dr. Uetz was a postdoctoral fellow at the University of Washington under the direction of Stan Fields. Dr. Uetz is a co-creator of the high-throughput yeast two-hybrid method commonly used for identifying protein-protein interactions. His current research focuses on elucidating the interactomes of bacterial and viral pathogens as well as their interactions with human proteins.

Title: Genome evolution of fungal pathogens
Christina Cuomo, Broad Institute of MIT and Harvard

By examining pathogen genomes and how they evolve, we can identify possible genomic signatures of pathogen-host interactions. I will describe recent findings from our genome-wide analyses of fungal pathogens from diverse hosts. For each of these we have compared the protein sets to those from other fungal genomes, including pathogens and non-pathogens, as well as to those of other eukaryotes. We have examined orthologs of known pathogenesis-related proteins as well as expanded gene families. Batrachochytrium dendrobatidis, a fungal pathogen of amphibians, is the only basal chytrid fungus to be sequenced. We found several expanded gene families in B. dendrobatidis that have known roles in pathogenesis in other animal and plant pathogens. These include the fungalysin protease family, secreted proteins that could play a role in host cell degradation. Surprisingly, we also identified orthologs of crinkling-necrosis (Crn) effector genes from Phytophthora, which appear largely specific to these two organisms. In a separate analysis, we compared the genomes of eight related Candida species, which are common human pathogens as well as commensals. In this comparison, we examined more closely the differences between the pathogen and non-pathogen genomes, to help understand the evolution of pathogenesis in these species. By examining gene families across the Candida species for patterns of gene gain and loss, as well as rates of gene evolution, we identified gene families that correlate with the more aggressively pathogenic Candida species. These include genes involved in adhesion, secreted proteases, and transporters, as well as uncharacterized genes, which together provide a candidate list of new pathogenesis genes. We are also examining the recent evolution of pathogenesis in the genome of Puccinia graminis, the wheat stem rust fungus. Because P. graminis grows on a non-wheat host for part of its life cycle, we are examining gene expression on these alternate hosts.
In addition to an assembly of a strain from North America, we have generated 35X coverage of Solexa sequence from a highly virulent strain (Ug99) that is currently causing an expanding epidemic in Africa. Comparing these strains provides a first look at genomic differences that could help uncover the molecular basis of a recent pathogen outbreak.

About the Speaker
Christina Cuomo is a research scientist for the Fungal Genome Initiative at the Broad Institute of MIT and Harvard, where she leads fungal genome analysis. Her current projects focus on fungal genome evolution in a range of pathogenic species, including B. dendrobatidis, P. graminis, and S. sclerotiorum, and on a comparative analysis of Candida spp. Dr. Cuomo joined the Whitehead Institute/MIT Center for Genome Research, now part of the Broad Institute, in 2002. For the first year, she co-led the genome closure team as part of the public Human Genome Project. In 2003, she joined the Fungal Genome Initiative and participated in the analysis of the A. nidulans and N. crassa genomes. Her analysis of the F. graminearum genome found that discrete regions of the genome are highly polymorphic. These regions are also enriched for genes implicated in plant-fungus interactions; the high variability of these genes may allow rapid adaptation to challenges from the environment, particularly the plant host. Dr. Cuomo received her Ph.D. in genetics from Harvard University, working with Marjorie Oettinger on the V(D)J recombinase activating genes. Her postdoctoral work, at UCSF and at Harvard, was with Andrew Murray on yeast chromosome segregation.

Title: Inferring genetic regulatory networks in host-pathogen interactions
Brett M. Tyler, Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University

The outcome of a host-pathogen interaction may be considered to be governed by a genetic regulatory network that encompasses both organisms. High-throughput functional genomics data can be generated that describe the concentrations of mRNAs, proteins and metabolites during the interaction. However, deconvoluting this information into a computational network model with useful predictive value remains a major challenge. One of the most severe challenges is that functional genomics data typically contain drastically fewer samples (e.g. time points) than variables (e.g. genes). I will report progress on two approaches we are using to address this challenge. This work is in collaboration with Ina Hoeschele, Pedro Mendes and Reinhard Laubenbacher.

In the first project, on quantitative disease resistance in soybean against the oomycete pathogen Phytophthora sojae, we are using genetical genomics to infer genetic regulatory networks associated with disease resistance. We have assayed 297 recombinant inbred lines of soybean segregating for P. sojae resistance, using 2600 Affymetrix GeneChips that contain probes for both host and pathogen genes. Using methods refined on yeast data, we are analyzing our data to identify networks of expression QTLs associated with the disease resistance QTLs.

In the second project, we are using transcriptional profiles of the oxidative stress responses of yeast, of Arabidopsis plant tissue and of P. sojae to evaluate the use of summary variables, such as those obtained using Principal Components Analysis, to create sequential dynamical models of the responses. We have created an approach called biologically plausible interpolation to infer families of models consistent with the data and to predict the additional experiments that most cost-effectively refine the models.

About the Speaker
Brett Tyler received his B.Sc. (Hons) degree from Monash University, Australia, in 1977, and his Ph.D. in medical biology from the University of Melbourne, Australia, in 1981. He was a postdoctoral fellow at the University of Georgia from 1982 to 1984 and a Research Fellow at the Australian National University, Australia, from 1984 to 1988. He was appointed an Associate Professor in the Department of Plant Pathology at the University of California, Davis, in 1988, and promoted to full professor in 1994. In 2002 he accepted a position at Virginia Polytechnic Institute and State University as Professor in the Virginia Bioinformatics Institute and in the Department of Plant Pathology, Physiology and Weed Science.


Future of Scientific Publishing

Organizer(s): Scott Markel, PhD
ISCB Publications Committee
California, USA

Date: Wednesday, July 23
Time: 10:45 a.m. - 12:40 p.m.
Room: TBD
*Schedule subject to change

This special session will include representation from scientific journals and community members to examine the future of scientific publishing and how it will impact conference attendees. The session will examine what can be done to facilitate positive change. Three speakers from the research community will address the theme "As a publishing scientist how do/will these topics impact my work?" Following the three speakers will be three shorter talks by publishers. They will address the question "How do you see scientific publishing changing in the next five years and, accordingly, what are your journal's plans?"

Title: Mining the Combination of Images and Text in Literature Archives

Presenter: Robert Murphy (Carnegie Mellon University)

Title: Mining Publications to Study the Structure of Science

Presenter: Mark Gerstein (Yale University)

Title: FEBS Letters initiative

Presenter: Gianni Cesareni (University of Rome, Tor Vergata)

Publishing Presenters: Matt Cockerill (BioMed Central), Claire Bird (Oxford University Press), Catherine Nancarrow (Public Library of Science)

Title: Mining the Combination of Images and Text in Literature Archives

Presenter: Robert Murphy (Carnegie Mellon University)

There has been significant interest in the development of systems for automatically extracting biological information from literature, but most of this work has focused on analysis of text (either captions or full text). Much less work has been done on extracting structured information from the combination of text and images, even though images typically provide crucial supporting evidence for the assertions in an article. The need goes beyond simply cataloguing or labeling images in articles. We have therefore worked for a number of years on a prototype system for this task, the Subcellular Location Image Finder (SLIF). SLIF automatically extracts information about protein subcellular locations from figure-caption pairs in biological literature. SLIF separates figures into panels and decides which panels contain fluorescence microscope images (FMI). It applies image processing methods to analyze the patterns in the FMI. The associated captions are also processed to identify which portions of each caption refer to which panels and to identify the names of the proteins mentioned in the captions. The results of this analysis are stored in the SLIF database, which can be queried either interactively or via external program-generated links. The current release contains all papers in PubMed Central, and work on improving and extending SLIF is ongoing. Generalizable lessons from the SLIF project will be discussed.


Title: Mining Publications to Study the Structure of Science

Presenter: Mark Gerstein (Yale University)

My talk will focus on a vision for scientific publishing: how we could potentially mine all the information in publications to glean new scientific facts and to study the structure of science itself. I'll provide some illustrations of the latter. I will also talk about some impediments to realizing this vision and potential solutions.


Open access: taking full advantage of the content.

PE Bourne, JL Fink, M Gerstein (2008) PLoS Comput Biol 4: e1000037.

Uncovering trends in gene naming.

MR Seringhaus, PD Cayting, MB Gerstein (2008) Genome Biol 9: 401.

Structured digital abstract makes text mining easy.

M Gerstein, M Seringhaus, S Fields (2007) Nature 447: 142.

RNAi development.

M Gerstein, SM Douglas (2007) PLoS Comput Biol 3: e80.

Chemistry Nobel rich in structure.

M Seringhaus, M Gerstein (2007) Science 315: 40-1.

Data mining on the web.

A Smith, M Gerstein (2006) Science 314: 1682; author reply 1682.

Tools needed to navigate landscape of the genome.

M Gerstein (2006) Nature 440: 740.

PubNet: a flexible system for visualizing literature derived networks.

SM Douglas, GT Montelione, M Gerstein (2005) Genome Biol 6: R80.

Annotation of the human genome.

M Gerstein (2000) Science 288: 1590.

E-publishing on the Web: promises, pitfalls, and payoffs for bioinformatics.

M Gerstein (1999) Bioinformatics 15: 429-31.


Title: FEBS Letters initiative

Presenter: Gianni Cesareni (University of Rome, Tor Vergata)

I will present the FEBS Letters Structured Digital Abstract (SDA) experiment, aimed at integrating each manuscript with a structured summary that precisely reports, using database identifiers and predefined controlled vocabularies, the protein interactions described in the manuscript. Authors play an important role in this process, as they are asked to provide structured information to be appended, in the form of human-readable paragraphs, at the end of traditional summaries. We are working toward having structured digital abstracts become an integral part of Medline abstracts. The experiment started in January 2008, and I will report on the experience gained in these first six months. We hope that this initiative can lay the basis for a community discussion aimed at proposing a widely accepted strategy for information storage and retrieval.

Robert F. Murphy is the Ray and Stephanie Lane Professor of Computational Biology and director of the Ray and Stephanie Lane Center for Computational Biology at Carnegie Mellon University. He is also Professor of Biological Sciences, Biomedical Engineering, and Machine Learning, and Director (with Jelena Kovacevic) of the Center for Biomedical Image Informatics at Carnegie Mellon. He also directs (with Ivet Bahar) the joint CMU-Pitt Ph.D. Program in Computational Biology. From 2005 to 2007, he served as the first full-term chair of NIH's Biodata Management and Analysis Study Section.

He was named a Fellow of the American Institute for Medical and Biological Engineering in 2006, and he received an Alexander von Humboldt Foundation Research Award in 2008. Dr. Murphy has received research grants from the National Institutes of Health, the National Science Foundation, the American Cancer Society, the American Heart Association, the Arthritis Foundation, and the Rockefeller Brothers Fund. He has co-edited two books and two special journal issues on "Cell and Molecular Imaging," and published over 150 research papers. He is President-elect of the International Society for the Advancement of Cytometry.

Dr. Murphy's career has centered on combining fluorescence-based cell measurement methods with quantitative and computational methods. In the mid-1990s, his group at Carnegie Mellon pioneered the application of machine learning methods to high-resolution fluorescence microscope images depicting subcellular location patterns. This work led to the development of the first systems for automatically recognizing all major organelle patterns in 2D and 3D images. He currently leads NIH-funded projects for proteome-wide determination of subcellular location in 3T3 cells (with Peter Berget and Jonathan Jarvik) and for continued development of the SLIF system for automated extraction of information from text and images in online journal articles (with William Cohen and Eric Xing). His group is also responsible for providing image informatics tools for the NIH-funded Technology Center for Networks and Pathways (headquartered at Carnegie Mellon) and for providing structured, image-based information on subcellular location for the National Center for Integrative Biomedical Informatics (headquartered at the University of Michigan).

Mark Gerstein is the Albert L. Williams Professor of Biomedical Informatics at Yale University. He is co-director of the Yale Computational Biology and Bioinformatics Program, and has appointments in the Department of Molecular Biophysics and Biochemistry and the Department of Computer Science. He received his AB in physics summa cum laude from Harvard College in 1989 and his PhD in chemistry from Cambridge in 1993. He did post-doctoral work at Stanford and took up his post at Yale in early 1997. Since then he has received a number of young investigator awards (e.g. from the Navy and the Keck Foundation). He has more than 250 publications in total, a number of them in prominent journals such as Science, Nature, and Scientific American. His research is focused on bioinformatics, and he is particularly interested in large-scale integrative surveys, biological database design, macromolecular geometry, molecular simulation, human genome annotation, gene expression analysis, and data mining.

Gianni Cesareni is a Full Professor of Genetics at the University of Rome Tor Vergata (Italy). After obtaining a degree in physics at the University of Rome La Sapienza, he spent three years in Cambridge in the laboratory of Sidney Brenner. He then moved to the EMBL in Heidelberg, where he led a group working on the mechanisms controlling plasmid DNA replication. Since 1989 he has taught and worked in Rome. He is interested in the interplay between specificity and promiscuity in the protein interaction network mediated by protein recognition modules.