ISMB ECCB 2009

Tutorials Program

ISMB/ECCB 2009 features ten (10) half-day introductory to advanced tutorial sessions. The tutorials will be given on Sunday, June 28, 2009, one day prior to the conference scientific program. Tutorials are held on the same day as the second day of the SIG and Satellite meetings. Tutorial programs provide participants with lectures and instruction on either well-established or new "cutting-edge" topics relevant to the bioinformatics field. They offer participants an opportunity to learn about new areas of bioinformatics research, to get an introduction to important established topics, or to develop higher skill levels in areas in which they are already knowledgeable.
Tutorial attendees should register using the on-line registration system when registration opens on February 6, 2009.
Attendees will receive a Tutorial Entry Pass (coupon) at the time they register on site. Tutorial handouts and a CD can be picked up at the door of each tutorial session in exchange for the coupon. Lunch is included in the registration fee for attendees registering for two tutorials. Those attending only one tutorial have the option to purchase a lunch ticket during on-line registration.
Tutorial participants must be registered for the conference.

*Please note: start and finish times are subject to change.

Morning Tutorials: 8:30 a.m. - 12:30 p.m.

AM 1: Text mining for non text miners: state of the art for practical use
AM 2: From PPIs to networks and pathways
AM 3: Support Vector Machines and Kernels for Computational Biology
AM 4: Immunological Bioinformatics: From fundamental bioinformatics to clinical validation
AM 5: High-throughput genome-scale sequence analysis and mapping using compressed data structures

Lunch Break: 12:30 p.m. - 1:30 p.m.

Afternoon Tutorials: 1:30 p.m. - 5:30 p.m.

PM 6: Metabolic Engineering - Metabolic Reconstruction, Strain Optimization, and Flux Measurements
PM 7: Comparative structural modeling of proteins and protein-DNA complexes: What can and what can’t be predicted
PM 8: IBM Systems Update 2009 (Cancelled)
PM 9: How to understand the cell by breaking it - computational analysis of single and combinatorial gene perturbations
PM 10: Ensemble Based Statistical Inference in Genomics and Molecular Biology

Tutorial Details

AM 1: Text mining for non text miners: state of the art for practical use

Presenter(s): Raul Rodriguez-Esteban, Pfizer, Cambridge, MA, USA, Raul.Rodriguez-Esteban@pfizer.com

Abstract: Familiarity with the tools and methods available in text mining offers computational biologists an opportunity to develop new approaches in their area of expertise. This tutorial will offer a broad survey of practical text mining.

Tutorial level: Introductory - Basic knowledge of biology and statistics is suggested - no prior knowledge of text mining needed.

AM 2: From PPIs to networks and pathways

Presenter(s): Javier De Las Rivas, Consejo Superior de Investigaciones Cientificas (CSIC), Salamanca, Spain, jrivas@usal.es
Carlos Prieto, Consejo Superior de Investigaciones Cientificas (CSIC), Salamanca, Spain

Abstract: The tutorial aims to show how to explore and analyse data and information about protein-protein interactions (PPIs) to build complete or partial interactomes as networks and correlate them with specific cellular states, mapping pathways and specific biological processes. The information about the interaction between proteins is present in several major biological databases (BIND, BioGRID, DIP, HPRD, IntAct and MINT) and there are some very useful bioinformatic tools that allow adequate integrated exploration and analysis of such data (APID, APID2NET, NetworkAnalyzer, MCODE).

The tutorial is aimed at both biologists and bioinformaticians. A basic background in biology is needed, but programming expertise is not required.

AM 3: Support Vector Machines and Kernels for Computational Biology

Presenter(s): Gunnar Rätsch, Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany, Gunnar.Raetsch@tuebingen.mpg.de
Sören Sonnenburg, Fraunhofer Institute FIRST, Berlin, Germany, Soeren.Sonnenburg@first.fraunhofer.de
Cheng Soon Ong, Max Planck Institute for Biological Cybernetics and Friedrich Miescher Laboratory, Tübingen, Germany, chengsoon.ong@inf.eth.ch

Abstract: Support vector machines (SVMs) are very popular in data mining and bioinformatics. This tutorial introduces SVMs and kernel algorithms and illustrates their application to typical problems in computational biology. It covers advances in kernels on strings and graphs and in predicting structured outputs, and discusses how to derive biological insight from the classifiers.
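As an illustration of what a kernel on strings can look like, here is a minimal sketch of the k-mer spectrum kernel, one well-known string kernel for biological sequences. It is not taken from the tutorial material; the function names and the default k are assumptions chosen for this example. The kernel value is the inner product of the k-mer count vectors of the two sequences.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of sequences a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Only k-mers present in both sequences contribute to the sum.
    return sum(n * cb[kmer] for kmer, n in ca.items() if kmer in cb)


if __name__ == "__main__":
    # Hypothetical toy sequences, used only to show the call.
    print(spectrum_kernel("ACGTACGT", "CGTACG", k=3))
```

A matrix of such kernel values over a set of sequences could then be passed to any SVM implementation that accepts a precomputed kernel.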

AM 4: Immunological Bioinformatics: From fundamental bioinformatics to clinical validation

Presenter(s): Morten Nielsen, The Technical University of Denmark, Lyngby, Denmark, mniel@cbs.dtu.dk
Claus Lundegaard, The Technical University of Denmark, Lyngby, Denmark, lunde@cbs.dtu.dk

Abstract: A major task is to develop methods for identifying epitopes in pathogen proteomes. Bioinformatics and high-throughput screening can be used to develop reliable prediction systems using artificial neural networks, Gibbs sampling, weight matrices, integrative models and protein structural analysis. Predictions are verified clinically against pathogens such as influenza, tuberculosis and HIV.

This tutorial is suitable for those already working with epitope data with prior knowledge of basic and statistical bioinformatics.

AM 5: Genome-scale sequence analysis using compressed index structures

Presenter(s): Veli Mäkinen, University of Helsinki, Finland, vmakinen@cs.helsinki.fi

Abstract: The tutorial focuses on tools and techniques that make genome-scale sequence analysis and mapping feasible with reasonable computing resources. The classical algorithms exploiting generalized suffix trees are reviewed and then modern data structure compression techniques are introduced that enable the simulation of those algorithms on genome-scale sequence collections.

Applications to high-throughput sequence mapping for short sequence reads (e.g. ChIP-seq data) are covered, including the backward search backtracking mechanism on Burrows-Wheeler-based indexes, which is the principle behind the state-of-the-art software tools for the task. For genome-scale sequence analysis tasks, the tutorial covers the technique of mining discriminative patterns, which can be used to produce good candidate "seeds" for the discovery of regulatory motifs or as a tool for comparative genomics. Future applications in the storage, retrieval, and analysis of collections of individual genomes are discussed.

In addition to the theoretical foundations, the practical performance of the data structures and algorithms is demonstrated by experiments on genome-scale tasks.

The tutorial is suitable for those familiar with the basics of sequence analysis. Prior experience with compression is not necessary; there will be a brief introductory part on elementary data compression and information theory concepts as a prelude to data structure compression.
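The backward search on a Burrows-Wheeler-based index mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it builds the index naively by sorting all suffixes and stores uncompressed occurrence tables, which is only feasible for small inputs, and it covers exact matching only (no backtracking for mismatches). The function names and the `$` terminator are assumptions made for this example.

```python
def bwt_index(text):
    """Build the C table and occurrence counts of the BWT of text (naive construction)."""
    text += "$"  # terminator, assumed absent from text and smaller than all its symbols
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # suffix array
    bwt = "".join(text[i - 1] for i in sa)  # symbol preceding each sorted suffix
    alphabet = sorted(set(text))
    # C[c] = number of symbols in text that are strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ, len(bwt)


def count_occurrences(pattern, index):
    """Count exact occurrences of pattern by backward search over the index."""
    C, occ, n = index
    lo, hi = 0, n  # half-open suffix array interval matching the suffix read so far
    for c in reversed(pattern):  # extend the match one symbol to the left
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt_index("ACGTACGTAC")  # hypothetical toy text
    print(count_occurrences("ACG", index))
```

Each pattern symbol costs two table lookups regardless of text length, which is why compressed representations of the occurrence table make this approach practical at genome scale.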

PM 6: Metabolic Engineering - Metabolic Reconstruction, Strain Optimization, and Flux Measurements

Presenter(s): Isabel Rocha, Portuguese Institute of Biotechnology and Bioengineering (IBB), Braga, Portugal, irocha@deb.uminho.pt
Marcelinus Pont, Dupont Inc.

Abstract: This tutorial is targeted at researchers wishing to apply Bioinformatics and Systems Biology tools to the field of Metabolic Engineering and Industrial Biotechnology. The main topics to be covered are Metabolic Reconstruction and Analysis (including Metabolic Flux Analysis, Flux Balance Analysis and Elementary Flux Modes) and in silico Metabolic Engineering tools. Participants should therefore be familiar with mathematical tools including Linear Algebra. Basic knowledge of microbial metabolism is also required.

PM 7: Comparative structural modeling of proteins and protein-DNA complexes: What can and what can’t be predicted

Presenter(s): Jan Kosinski, International Institute of Molecular and Cell Biology, Warsaw, Poland, kosa@genesilico.pl

Abstract: The increasing number of complexes with known structures allows modeling whole biological assemblies using comparative methods. This tutorial will provide a detailed overview of the state-of-the-art methods for constructing comparative structural models of proteins and protein-DNA complexes. Several alternative methods will be presented for protein comparative modeling, editing of DNA molecules, and optimization of protein-DNA interfaces.

The tutorial is especially targeted at an audience with general knowledge of protein and nucleic acid structure, but little or no experience in protein structure modeling. A basic understanding of sequence similarity and sequence alignments, general knowledge of sequence and structural databases, and familiarity with structural molecular viewers will be advantageous, although not critical.

PM 8: IBM Systems Update 2009 (Cancelled)

PM 9: How to understand the cell by breaking it - computational analysis of single and combinatorial gene perturbations

Presenter(s): Florian Markowetz, Cancer Research UK Cambridge Research Institute, Cambridge, UK, florian@genomics.princeton.edu
Chad Myers, University of Minnesota, Minneapolis, USA, cmyers@cs.umn.edu

Abstract: The analysis of genome-wide single or combinatorial gene perturbation screens is moving to the center-stage of computational systems biology as more and better experimental systems are established in humans and model organisms. This tutorial will provide a comprehensive overview of recent advances both in experimental technologies and computational analysis methods. The tutorial will serve as a thorough introduction to gene perturbation screens and their analysis, but some advanced issues will also be introduced.

PM 10: Ensemble Based Statistical Inference in Genomics and Molecular Biology

Presenter(s): Charles (Chip) E. Lawrence, Brown University, Providence, RI, USA, Charles_Lawrence@brown.edu

Abstract: In the past decade, high-throughput data-acquisition technologies have produced datasets of sizes unimaginable to our predecessors, including the sequences of many genomes and the products of numerous high-throughput technologies of the post-genome era. Although the emergence of such large datasets seems to imply more precise parameter estimates, paradoxically just the opposite is becoming increasingly common. This paradox emerged because these technologies simultaneously opened opportunities to draw inferences on previously unanswerable high-dimensional (high-D) questions. Novel ensemble-based statistical inference procedures, grounded in statistical decision theory and addressing this paradox, have recently been developed. The resulting centroid and maximum expected accuracy estimators have been shown to outperform traditional mode-based procedures in the prediction of RNA secondary structure, protein structure prediction by homology, and prediction of transcription factor binding sites. In all three of these cases, identical probabilistic models were used for the mode-based estimates and the ensemble-based estimates. Thus the reported improvements stem exclusively from the novel means that these procedures employ to extract information from posterior ensembles. These ensemble-based procedures also provide the means to describe the shape of posterior distributions, and interestingly they have shown that these distributions are frequently multimodal.

Arguably the best description of the importance and novelty of these ensemble-based approaches has been presented in independent reviews. Specifically, in a review entitled “Revolutions in RNA Secondary Structure Prediction” Mathews (J. Molecular Biology. (2006) 359, 526-532) describes the ensemble based sampling approach of Ding et al. (J. of Molecular Biology, (2006) 359: 554-571) as follows. “Recent results show the power of sampling in predicting regions in an RNA that are most likely to be accessible to hybridization, in predicting secondary structures with fewer false positive base-pairs, and to understanding the folding landscape. The impact of this is enormous because, for the first time, the set of predicted secondary structures is a statistical sample of the complete ensemble of structures. The probability of sampling any given structure is exactly its probability of occurring in the thermodynamic ensemble.” In the “Faculty of 1000” Michael Zhang (July 16, 2008) reported that the paper of Carvalho and Lawrence (PNAS: USA, 105: 3209-3214, 2008) is a “must read article” and stated that “I found this article interesting because it challenges common practice in many areas of computational genomics data analyses. This centroid estimation method was first discovered to give substantial improvement in RNA structure prediction. Now the authors have provided further theoretical foundations and demonstrate its successful application to other genomics problems.”
The goal of this tutorial is to raise serious questions about the conceptual foundations of the traditional mode-based estimation procedures that now dominate our field, and to introduce students to the theory, methods, and algorithms of alternative ensemble-based estimation procedures.

This tutorial will be at an intermediate level on the theory and methods for prediction and estimation in computational molecular biology and computational genomics, and at a beginning level on ensemble-based statistical inference procedures.
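The contrast between mode-based and centroid estimation described in the abstract can be illustrated with a toy sketch. It assumes the posterior ensemble is available as a sample of binary vectors (for instance, indicators of which features are present) and that the estimate may be any binary vector; the structural constraints that apply in real problems such as RNA folding are ignored here. Under these assumptions, the vector closest on average to the sample in Hamming distance is the componentwise majority vote. The function names and the sample are hypothetical.

```python
from collections import Counter


def mode_estimate(samples):
    """Most frequent configuration in the sample (empirical posterior mode)."""
    return Counter(samples).most_common(1)[0][0]


def centroid_estimate(samples):
    """Componentwise majority vote; minimizes mean Hamming distance to the sample."""
    n = len(samples)
    return tuple(int(2 * sum(column) > n) for column in zip(*samples))


def mean_hamming(estimate, samples):
    """Average number of components in which estimate differs from the samples."""
    return sum(sum(a != b for a, b in zip(estimate, s)) for s in samples) / len(samples)


# Hypothetical posterior sample: the single most frequent vector is all zeros,
# but most of the probability mass lies on vectors whose first component is 1.
SAMPLE = (
    [(0, 0, 0, 0)] * 3
    + [(1, 1, 1, 0)] * 2
    + [(1, 1, 0, 1)] * 2
    + [(1, 0, 1, 1)] * 2
)

if __name__ == "__main__":
    for name, est in (("mode", mode_estimate(SAMPLE)),
                      ("centroid", centroid_estimate(SAMPLE))):
        print(name, est, mean_hamming(est, SAMPLE))
```

In this toy sample the mode and the centroid differ, and the centroid lies closer on average to the ensemble, even though both are computed from the same sample.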
