ISMB 2005: Michigan, June 25-29

TUTORIAL PROGRAM - Saturday, June 25: ISMB 2005 will feature half-day introductory and advanced tutorials. The tutorials will be given prior to the meeting. The purpose of the tutorial program is to provide participants with lectures and demos on either well-established or new "cutting-edge" topics relevant to the bioinformatics field. It offers participants the opportunity to learn about new areas of bioinformatics research, to get an introduction to important established topics, or to develop higher skill levels in areas in which they are already knowledgeable. Tutorial attendees should register using the on-line ISMB 2005 registration form.

Attendees will receive a Tutorial Entry Pass at the time they register on site. Tutorial handouts can be picked up at the door of each tutorial session.

Morning Tutorials: 8:30am-12:30pm

AM1: Integration and Analysis of Diverse Genomic Data, Olga Troyanskaya, Princeton University
AM2: Optimal Design and Analysis of Microarray Experiments, Ernst Wit, University of Glasgow and John McClure, University of Glasgow
AM3: RNA: Algorithms for Structure Prediction and Gene-finders, Peter Clote, Boston College
AM4: Developing and Using Special Purpose Hidden Markov Model Databases, Martin Gollery, University of Nevada, Reno
AM5: Weighted Finite-State Transducers in Computational Biology, Corinna Cortes, Google, Inc. and Mehryar Mohri, New York University
AM6: Semantic Aggregation, Integration and Inference of Pathway Data, Joanne Luciano and Jeremy Zucker, Harvard Medical School
AM7: Attacking Performance Bottlenecks, Ruud van der Pas, Sun Microsystems and Karl V. Steiner, University of Delaware

12:30-2pm Lunch break (lunch on your own)

Afternoon Tutorials: 2-6pm

PM8: Principles of Ontology Construction, Suzanna Lewis, University of California Berkeley
PM9: Introduction to Phylogenetic Networks, Daniel Huson, Center for Bioinformatics, Tuebingen University
PM10: Mining the Biomedical Literature: State of the Art, Challenges and Evaluation Issues, Hagit Shatkay, School of Computing, Queen's University, Kingston, Ontario
PM11: Gene Expression Levels as Traits in Genetic Linkage Analysis, Steffen Möller, University of Rostock and Robert Hoffmann, European Bioinformatics Institute
PM12: A Bioinformatics Introduction to Cluster Computing, Andrew Boyd and Abhijit Bose, University of Michigan
PM13: Computational Geometry of Protein Structure and Function, Iosif Vaisman, George Mason University
PM14: A Massively Parallel High Performance Computing Environment for Computational Biology, Gyan Bhanot and Bob Germain, IBM T.J. Watson Research Center

AM1: Integration and Analysis of Diverse Genomic Data (view proposal, pdf)

Presenter: Olga Troyanskaya, Princeton University, ogt@princeton.edu
Olga Troyanskaya is an Assistant Professor in the Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics at Princeton University. Her laboratory researches data integration, gene function prediction, and biological pathway modeling based on heterogeneous data. She has recently written an invited review paper about data integration for Briefings in Bioinformatics. In addition to multiple seminar and workshop presentations, Dr. Troyanskaya developed and taught two bioinformatics courses at Princeton: Analysis and Visualization of Large-Scale Biological Data and Computational Modeling of Biological Networks. She teaches microarray analysis for the bioinformatics course at CSHL and taught bioinformatics at CalState Hayward.

Abstract: In recent years, multiple types of high-throughput functional genomic data have become available that facilitate rapid functional annotation and pathway modeling in the sequenced genomes. Gene expression microarrays are the most commonly available source of such data, and an increasing amount of other data, including protein-protein interactions, sequence, literature, and localization data, is being generated. However, genomic data sacrifice specificity for scale compared to traditional experimental methods, yielding large quantities of relatively lower quality measurements.

This problem has generated much interest in bioinformatics in the past two years, as sophisticated computational methods are necessary for accurate functional interpretation of these large-scale datasets. This tutorial will present an overview of recently developed methods for integrated analysis of functional genomic data and outline current challenges in the field. The focus will be on the development and use of such methods for gene function prediction, understanding of protein regulation, and modeling of biological networks. This tutorial will be of interest to computational researchers interested in contributing to the field of data integration and analysis of heterogeneous data and to biologists with some computational background who are interested in using the methods on their experimental data and understanding their properties and limitations.

AM2: Optimal Design and Analysis of Microarray Experiments (view proposal, pdf)

Presenters: Ernst Wit, University of Glasgow, ernst@stats.gla.ac.uk
A Reader in Statistics at the University of Glasgow with over 10 years of university teaching experience. Currently, he is teaching Statistics for Psychology Students, a 2nd year Probability Distributions course, and a 4th year honours Time Series and Spatial Statistics course. Dr Wit also coordinates postgraduate teaching in the Statistics Department. He has completed an Institute for Learning and Teaching in Higher Education accredited programme. In September 2003 Dr Wit taught a 1.5 day course on microarray data analysis at the regional International Biometric Society conference, which was repeated in May 2004 in Verona (Italy).

John McClure, University of Glasgow, jdmc4w@clinmed.gla.ac.uk
A Lecturer in Statistical Genetics at the University of Glasgow. He has worked in bioinformatics for over three years, principally on statistical analysis methods for microarray experiments. At present he is teaching statistics to BSc Medical Science students. In 2002-2003 he taught a Practical Statistics course (S1B) for Science, Arts and Social Science students in the University of Glasgow. Dr McClure has also taught the statistics module in the Basic Bioinformatics course of the M.Sc. in IT (Bioinformatics Strand) in 2003-2004. He is at present completing an ILT accredited Postgraduate Certificate in Academic Practice.

Abstract: Gene expression profiling has become a routine technique that can be useful to many applied life scientists at some stage of their research. The ease with which it is possible to generate thousands of gene profiles bears no comparison to the difficult and often treacherous path of getting reliable conclusions or even interesting research directions from these data. The data are very noisy and many traditional computational techniques are not designed to deal with large numbers of variables, i.e. genes. Further, appropriate statistical design is essential to make the best use of the limited number of arrays and samples that are in general available.

This course will cover three of the main aspects of microarray experiments: designing optimal or near-optimal microarray experiments; applying simple but effective microarray data cleaning techniques; and performing effective hypothesis tests with accurate error rate control.

The R computing environment will be used to implement the different techniques discussed.

AM3: RNA: Algorithms for Structure Prediction and Gene-finders (view proposal, pdf)

Presenter: Peter Clote, Boston College, clote@bc.edu
PhD 1979, Mathematics, Duke University. Full Professor of Computer Science at Boston College since 1990. Gentzen Chair of Theoretical Computer Science at University of Munich, 1995-2000. 50 journal or peer-refereed proceedings publications; editor of 3 books (Oxford Univ Press, Springer-Verlag); author of 2 books (Computational Molecular Biology: An Introduction at Wiley & Sons, and Boolean Functions and Computation Models with Springer-Verlag). Editorial Board of Notre Dame J. of Formal Logic 1991-2003; Program Committee and Organizing Committee of numerous meetings. Semi-professional jazz alto saxophonist (played at Recomb 2003 in Berlin).

Abstract: RNA is a current focus of interest in molecular biology, due to post-transcriptional regulatory action of micro-RNA (miRNA) and small interfering RNA (siRNA), which allow geneticists to knock down protein translational products and better understand gene interactions. RNA secondary structure plays important roles in retranslation events such as incorporating selenocysteine using the UGA stop codon and in ribosomal frameshift slippage events.

This tutorial surveys some of the recent biology of RNA, obtained from ribosomal conformation analysis, and some important algorithms concerning secondary structure prediction and non-coding gene finders. We additionally include recent findings of the author concerning the computation of the landscape of kinetic traps for RNA and the energy profile of random RNA.

AM4: Developing and Using Special Purpose Hidden Markov Model Databases (view proposal, pdf)

Presenter: Martin Gollery, University of Nevada, Reno, mgollery@unr.edu
Martin Gollery is the Associate Director of Bioinformatics at the University of Nevada at Reno. He has developed several custom HMM databases, all of which are publicly available. In his former role as Director of Research at TimeLogic corporation he was involved with the development of accelerated HMM search algorithms, as well as other methods such as TeraBLAST and GeneBLAST.

Abstract: Hidden Markov models provide a probabilistic model of protein domain or family data. Databases of Hidden Markov Models can be extremely useful tools for the analysis of sequence data. While the Pfam database is unquestionably the most popular collection of HMMs in existence today, there are many other useful collections that are currently available, and some of them might be more suitable for a particular need. In addition, there are several versions of Pfam, and it is useful to know which version to use, and why.

In this tutorial we will examine the varieties of software methods that are currently available, for HMM searches and for related methods such as profile and PSSM algorithms. We will study many different HMM databases, with specific tips and pointers for optimizing results from each one. Finally, we will look at methods for the development of customized collections of HMMs, with the associated benefits and pitfalls.

AM5: Weighted Finite-State Transducers in Computational Biology (view proposal, pdf)

Presenters: Corinna Cortes, Google, Inc., corinna@google.com
Corinna Cortes is a Research Scientist at Google, Inc. where she is working on a broad range of theoretical and applied large-scale machine learning problems. Dr. Cortes is well known in particular for her contributions to data mining in very large data sets, for which she was awarded the AT&T Science and Technology Medal in the year 2000, and for her work on the theoretical foundations of support vector machines (SVMs) and kernel techniques for the analysis of variable-length sequences and weighted automata. She has given numerous talks and presentations in machine learning.

Mehryar Mohri, Courant Institute - New York University, mohri@cs.nyu.edu
Mehryar Mohri is a Professor of Computer Science at the Courant Institute of Mathematical Sciences. Before joining NYU, Dr. Mohri worked for ten years at AT&T Bell Labs and AT&T Labs - Research where he served as the Head of a Research Department, leading and directly contributing to a broad range of work in machine learning, automata theory, text and speech processing, and the design of general algorithms. He has taught dozens of tutorials on topics related to weighted finite-state transducers, their theory and algorithms, and their applications to sequence modeling.

Abstract: Projects such as genome sequencing and DNA microarray studies produce an ever-increasing amount of data, and the area of computational biology now poses some of the biggest challenges in computer science and data mining, such as data storage, visualization, and modeling. Statistical learning techniques are being applied to modeling with increasing success, but they often require substantial algorithmic expertise: the conventional software packages do not naturally encompass variable-length sequences and general structures; thus, special-purpose algorithms must be implemented to solve the associated optimization problems.

This tutorial introduces a general framework, weighted finite-state transducers (WFSTs), that naturally matches a wealth of data in computational biology, together with a set of algorithms that efficiently allow for a series of operations on WFSTs, enabling biologists even with modest computer skills to successfully apply sophisticated alignment techniques and powerful kernel methods. It familiarizes the audience with WFST algorithms and techniques, their use and application to computational biology problems, and a software library (FSM Library) incorporating these algorithms and representations that is freely available for research and academic use.

AM6: Semantic Aggregation, Integration and Inference of Pathway Data (view proposal, pdf)

Presenters: Joanne Luciano, Massachusetts General Hospital / Harvard Medical School, jluciano@nmr.mgh.harvard.edu
Joanne Luciano is an active leader in several community-based initiatives, including BioPAX, the BioPathways Consortium, and the emerging Semantic Web for Life Sciences. She co-developed the BioPAX ontology that is on its way to becoming the standard for biological pathway knowledge representation. She is an authority in pathway databases and modelling languages. Joanne’s teaching experience ranges over 20 years, and includes such courses as artificial intelligence, data communications and networks, and data structures. Joanne has been an active member of the computational and systems biology community since 1996, presenting at many international conferences.

Jeremy Zucker, Dana-Farber Cancer Institute, Harvard Medical School, zucker@research.dfci.harvard.edu
Jeremy Zucker holds degrees in Computer Science and Applied Mathematics from the University of Colorado. He is an expert in semantic aggregation, integration and inference. Currently a bioinformatics specialist at the Dana-Farber Cancer Institute, and a computational biologist for the Church lab at Harvard Medical School, Jeremy is a lead developer for projects such as DARPA’s BioSPICE, the DOE Genomes to Life, and BioPAX. His teaching experience includes a course on systems biology at the Harvard Department of Molecular Cell Biology, and he is a contributing author of the soon-to-be-released textbook: Introduction to Systems Biology (Science of Knowledge Press).

Abstract: The objective of this tutorial is to provide the student with the knowledge and tools necessary to perform semantic aggregation, integration and inference on biological pathway data. The Pathway Resource List at http://cbio.mskcc.org/prl contains over 150 biological pathway databases and is growing. However, to consolidate all the knowledge for a particular organism, one must extract the pathways from each database, transform those pathways into a standard data representation, and load the data into an integrated repository. This tutorial will provide the student with the ability to avoid common pitfalls and reduce the amount of work that must be performed in order to accomplish pathway integration, thus leading to more productive and efficient scientific research overall.

The intended audience consists of bioinformaticians, computational biologists and database developers. Participants should be familiar with intermediate-level programming concepts (APIs, UML, XML parsing, object models) and intermediate-level database concepts (SQL, data modeling, Entity-Relation diagrams, etc.). Topics covered include issues in biological knowledge representation, semantic web technologies, data cleaning, and fundamentals of the extract, transform and load (ETL) methodology, illustrated through real world examples.

AM7: Attacking Performance Bottlenecks (view proposal, pdf)

Presenters: Ruud van der Pas, Sun Microsystems, ruud.vanderpas@sun.com
Ruud van der Pas has studied mathematics and physics. He has a strong background in High-Performance and Technical Computing and has been with Sun since 1998. Ruud is a Senior Staff Engineer in the Scalable Systems Group and operates on a world-wide basis. His expertise is in serial performance, shared memory parallelization and performance analysis. At Sun, he provides application tuning consultancy. Additionally, he regularly gives technical presentations and workshops at conferences and seminars. He is also responsible for the Sun Application Tuning Seminar and works closely with several groups within Sun to discuss enhancements to current and future products.

Karl V. Steiner, Delaware Biotechnology Institute, University of Delaware, steiner@dbi.udel.edu
Karl V. Steiner is the associate director of the Delaware Biotechnology Institute (DBI) and a professor of electrical and computer engineering at the University of Delaware. Prior to his current position, he served as executive director of the University of Delaware’s Center for Composite Materials, and the founding executive director of the Fraunhofer Center for Applied Materials Research. Dr. Steiner received his Ph.D. in mechanical engineering from the University of Kaiserslautern in Germany and his master’s degree in electrical and computer engineering from the University of Delaware. As DBI associate director he is responsible for the strategic planning and operational oversight of the Institute’s core instrumentation centers, including the bioinformatics center.

Bogdan Vasiliu, Sun Microsystems, bogdan.vasiliu@sun.com
Bogdan Vasiliu has been a member of the Vertical Markets and Solutions technical group at Sun Microsystems since 2000. His expertise is in grid and parallel computing, performance tuning and analysis. Bogdan studied computer science and engineering and has been involved with High Performance Computing since 1995. Prior to joining Sun, he worked at Ames Laboratory, Iowa State University and “Politehnica” University of Bucharest. At Sun, Bogdan is responsible for architecting grid solutions for life science partners and customers, porting, optimizing and benchmarking numerically-intensive Independent Software Vendor (ISV) applications on Sun systems, and acting as a liaison to product engineering, driving resolution to problems affecting ISVs, and providing feedback on future product directions for Sun.

Abstract: With CPU chip speeds peaking, simply upgrading the processor no longer offers the performance benefit that it did, leaving many of us facing a brick wall. This performance tutorial will teach you how to overcome this barrier and get more out of your applications using your existing system(s). Basically, there are two approaches to address a performance bottleneck: 1) software - modify the code itself to parallelize and optimize it; 2) hardware - increase the efficiency of utilization through scheduled resource allocation. Both approaches will be covered in this tutorial through detailed case studies.

After a short introduction to application performance tuning, we will zoom in on parallelization. The OpenMP programming model will be introduced through relevant examples that will highlight how and when to apply its simple but powerful features. OpenMP is a widely available open standard (http://www.openmp.org).

Key DRM technologies (LSF, SGE, PBS) and how they differ, as well as key grid concepts including policy allocation, resource reservation, backfilling, accounting, reporting, and billing will be discussed. DBI (University of Delaware) will present several of their current problems in computational biology with solution instructions that attendees will be able to replicate and apply. Live demonstrations will be used throughout the tutorial to better illustrate how to execute the instructions that will be provided.

Additional Tutorial Authors: Douglas O'Neal, Delaware Biotechnology Institute, University of Delaware, USA; Loralyn Mears, Life Science Market Segment Manager, Sun Microsystems, Inc., USA

PM8: Principles of Ontology Construction (view proposal, pdf)

Presenters: Suzanna Lewis, University of California Berkeley, suzi@fruitfly.org
Head of the Berkeley Drosophila Genome Project bioinformatics group and a founder of the Gene Ontology Consortium. She, together with Martin Reese, taught one previous ISMB tutorial, the Genome Annotation Assessment Project held in 1999 in Heidelberg.

Barry Smith, Universität des Saarlandes, phismith@buffalo.edu
President of European Centre for Ontological Research (ECOR), a new research program aimed at applying the theories developed by philosophers to a variety of problems in information science and related areas.

Abstract: Biomedical advances are increasingly stymied, not by a lack of experimental data, but by a lack of data that is available in computable form. Biological data must be readily accessible, comparable, and fully correlated, and it must be captured in a language that allows the formulation of coherent testable hypotheses if it is to efficiently provide relevant answers to scientific inquiries and thus enable discoveries. The bottleneck in research today, in other words, is the task of sifting through large data sets to decide what experiments should be done.

Ontological frameworks can provide a shared language for communicating biological information and thereby integrating biological knowledge and removing this data bottleneck. These rigorous semantic descriptions of the entities and relationships between these entities can then be used to formulate hypotheses about and navigate through the volumes of data. To this end there are an increasing number of groups developing ontologies in assorted biological domains. However, these efforts will only be beneficial and aid biological data integration if certain criteria are met. These prerequisites are that the ontologies are non-overlapping, that they are accepted and used by the community, and that they are well-principled.

Additional Tutorial Authors: Michael Ashburner, University of Cambridge; Mark Musen, Stanford University; Rama Balakrishnan, Stanford University; David Hill, Jackson Laboratory

PM9: Introduction to Phylogenetic Networks (view proposal, pdf)

Presenter: Daniel Huson, Center for Bioinformatics, Tuebingen University, huson@informatik.uni-tuebingen.de
1990 PhD Mathematics Bielefeld University
1990-97 Research Assistant Bielefeld University
1997-99 Post-Doc in Bioinformatics at UPenn and Princeton, with Tandy Warnow
1999-02 Senior Research Scientist Celera Genomics, Gene Myers group
2002- Professor of Algorithms in Bioinformatics at Tuebingen University.

Abstract: The evolutionary history of species is best represented by a phylogenetic tree and there exist many well-known methods for reconstructing such trees, in particular, from bio-molecular sequences. However, for some types of data, this may not be appropriate and approaches that generate phylogenetic networks may be more suitable. For example, hybridization between different plant species or recombination between closely related bacteria can lead to sequence data whose evolutionary history is best represented by a network.

There are two types of phylogenetic networks: ones that are interpretable as non-tree-like models of evolution, and ones that do not possess such an interpretation, but should rather be considered as visualizations of incompatibilities within a data set. Examples of the former are ancestral recombination graphs, hybridization networks and reticulation networks, whereas examples of the latter are splits graphs, consensus networks and neighbor-nets. This tutorial will discuss both types.

We will first give a brief introduction to reticulate evolution and phylogenetic networks. Then, a number of important network reconstruction methods will be discussed. Finally, the application of some phylogenetic methods will be illustrated using typical data sets (haplotypes, plant hybrids, and whole genome data for a set of prokaryotes).

PM10: Mining the Biomedical Literature: State of the Art, Challenges and Evaluation Issues (view proposal, pdf)

Presenter: Hagit Shatkay, School of Computing, Queen's University, Kingston, Ontario, shatkay@cs.queensu.ca
Hagit Shatkay is an assistant professor at the School of Computing, Queen’s University in Kingston, Ontario. Her research is in the area of machine learning as it applies to biomedical data mining, and she is an active member of the biomedical text-mining research community. Prior to joining Queen’s University, she was an Informatics Research scientist with the Informatics Research group at Celera/Applied Biosystems, and before that a postdoctoral fellow at the National Center for Biotechnology Information (NCBI). She has a PhD in Computer Science from Brown University, and an MSc and BSc in Computer Science from the Hebrew University in Jerusalem.

Abstract: Much knowledge about genes and proteins is reported in the vast amount of biomedical literature. The recent advancement of high-throughput biological techniques is accompanied by a steady and overwhelming increase in the number of related publications. Thus the ability to rapidly and effectively survey the literature forms a necessary step for both the design and the interpretation of any large-scale experiment, as well as for the timely curation of biomedical knowledge in public databases. Consequently, the past few years have seen a surge of interest in biomedical text mining. Several text-related disciplines are harnessed in such efforts, including natural language processing, information extraction, and information retrieval.

The objective of this tutorial is to provide a structured introduction to the biomedical text mining field, from both the biomedical-application and the text-mining perspectives. It will present general and biomedical-specific text mining methods, examine and analyze existing work in biomedical literature mining, and place it in the framework of the explicit text mining disciplines. The tutorial will especially emphasize critical assessment, validation and evaluation methods that are used in text mining and information retrieval, and survey recent evaluation initiatives.

PM11: Gene Expression Levels as Traits in Genetic Linkage Analysis (view proposal, pdf)

Presenters: Steffen Möller, University of Rostock, Institute of Immunology, moeller@pzr.uni-rostock.de
1996 Diploma in Computer Science from the University of Hildesheim, Germany
1997-2001 EMBL fellowship for the EMBL-EBI European Bioinformatics Institute, Cambridge, UK
PhD from the University of Cambridge
since 2001 Postdoctoral position at the Institute of Immunology, University of Rostock, Germany

Robert Hoffmann, European Bioinformatics Institute, hoffmann@ebi.ac.uk
1999 Diploma in Biology from the University of Vienna, Austria
since 2001 PhD Student at the National Center of Biotechnology, Madrid, Spain
currently Visitor at the EMBL-EBI European Bioinformatics Institute, Cambridge, UK

Abstract: Diseases that individuals develop late in life have no direct influence on their reproduction. Variants of genes that contribute to the susceptibility, onset or severity of these diseases are therefore likely to be kept in the population. Together with environmental factors, combinations of such defects lead to polygenic diseases. Some common autoimmune diseases (e.g. Multiple Sclerosis, Rheumatoid Arthritis) but also heart diseases and obesity are examples. Genes in polygenic diseases need to be investigated in combination for their phenotypic effects. However, genetic studies in humans are extremely difficult. Animal models came to the rescue with shorter generation times, common environments and directed breeding.

Strains that differ in their susceptibility to a disease are sought. Genetic markers determine differences in the genotype. Statistical association of markers with differences in the trait of the disease among the individuals of the F2 or later generations led to the concept of quantitative trait loci.

This presentation explains how to combine the genotyping and disease phenotypes with RNA expression levels. In this connexion, the RNA expression level of genes is interpreted as a trait of the disease. Equivalent analyses yield expression QTLs representing candidates for cis- and trans-controlling regions for gene expression. With genes interacting, the effect on the expression of genes may be changed by more than the sum of the individual loci (epistatic effects). The cross-comparison with information on molecular pathways, extracted from Medline abstracts or manually curated, yields explicit candidates for further evaluation.

The approach is illustrated using the example of a murine model of Multiple Sclerosis.

PM12: A Bioinformatics Introduction to Cluster Computing (view proposal, pdf)

Presenters: Andrew Boyd, University of Michigan/Michigan Center for Biological Information (MCBI), adboyd@umich.edu
Dr. Boyd is a postdoctoral research fellow in Biomedical Information at the University of Michigan. He is under the mentorship of Dr. Brian Athey, who is an Associate Professor in Biomedical Informatics and the Director of the Michigan Center for Biological Information. His research is on bioinformatics applications on cluster computing. Dr. Boyd presented a similar workshop to the University of Michigan Bioinformatics Students in March 2004. The feedback from the students was positive; after the course, they were able to understand the basic principles of cluster computing and apply them to future research topics.

Abhijit Bose, University of Michigan/Michigan Grid Research and Infrastructure Development (MGrid), abose@engin.umich.edu
Dr. Bose is the associate director of MGrid (Michigan Grid Research and Infrastructure Development) and a scientist at The University of Michigan. He is also a member of the DARPA-funded Virtual Soldier project team at Michigan. He has extensive experience in cluster computing and parallel algorithms. He has organized and taught eight NPACI (National Partnership for Advanced Computational Infrastructure) Parallel Computing workshops at Michigan since 2000. He has also taught a number of courses at the Annual Summer Institute organized by the San Diego Supercomputer Center. These popular workshops have always been oversubscribed and additional sessions are often organized.

Abstract: Commodity processor-based clusters have rapidly become the compute systems of choice for computational modeling and simulation in many scientific fields. Many biological computations such as genomic and protein analysis are increasingly being performed on large-scale clusters based on Linux. There are many ways to configure a cluster, including 32- and 64-bit CPUs, interconnects such as Myrinet, Ethernet, and InfiniBand, and storage systems such as commodity RAIDs and SANs. This tutorial focuses on clusters and their application to solving large-scale bioinformatics problems. The participants will learn the basics of Linux clusters, programming environments, batch and interactive computations, and a series of examples from Bioinformatics including how to use parallel versions of BLAST, Smith-Waterman, INTERPROSCAN, etc. The basics of parallel computing on clusters such as Message Passing Interface (MPI) and threads will be introduced. The participants will learn how to interpret common debugging, system and application-related error messages on clusters. A unique feature of the tutorial is a hands-on session on a high-performance Linux cluster at The University of Michigan that the participants will be able to use during the course of the tutorial.

PM13: Computational Geometry of Protein Structure and Function (view proposal, pdf)

Presenter: Iosif Vaisman, George Mason University, ivaisman@gmu.edu
Dr. Iosif Vaisman is Associate Professor of Bioinformatics in the School of Computational Sciences at George Mason University. His research focuses on computational protein structure analysis. Dr. Vaisman has participated in bioinformatics curriculum and course development and he has been teaching bioinformatics, pharmacoinformatics, and computational biology at the University of North Carolina at Chapel Hill and George Mason University since 1995.

Abstract: The tutorial covers theoretical approaches, techniques, and tools for the computational geometry approach to protein structure analysis, classification, and prediction. Students will acquire knowledge of fundamental principles and methods for protein structure analysis, as well as the practical skills necessary to use modern computational tools for such analysis.

PM14: A Massively Parallel High Performance Computing Environment for Computational Biology (view proposal, pdf)

Presenters: Gyan Bhanot, IBM T.J. Watson Research Center, gyan@us.ibm.com
Gyan Bhanot received his PhD in Theoretical Physics from Cornell University in 1979 and has held research positions at IAS, Princeton, Brookhaven Lab, CERN and ITP Santa Barbara. After a few years as Associate Professor of Physics at Florida State University, he joined Thinking Machines Corporation to develop applications for the Connection Machine. He joined IBM in 1994 as a Research Staff Member and has been there since. His interests are in Computational Biology, Simulations and Parallel Algorithms. In addition to his position at IBM Research, he is an Adjunct Professor in the Biomedical Engineering Department at Boston University and a Visiting Member in the Center for Systems Biology at the Institute for Advanced Study in Princeton.

Bob Germain, IBM T.J. Watson Research Center, rgermain@us.ibm.com
Robert S. Germain received his A.B. in physics from Princeton University in 1982 and his M.S. and Ph.D. in physics from Cornell University. After receiving his doctorate in 1989, he joined the IBM Thomas J. Watson Research Center as a Research Staff Member in the Physical Sciences Department. From 1995 to 1998, he was project leader for development of a large-scale fingerprint identification system using an indexing scheme (FLASH) developed at IBM Research. Since 2000, Dr. Germain has been responsible for the science and associated application portions of the Blue Gene project (http://www.research.ibm.com/bluegene) as well as managing the Biomolecular Dynamics and Scalable Modeling group. His current research interests include the parallel implementation of algorithms for high performance scientific computing, the development of new programming models for parallel computing, and applications of high performance computing to challenging scientific problems in computational biology. Dr. Germain is a member of the IEEE and the American Physical Society.

Kirk Jordan, IBM Deep Computing, kjordan@us.ibm.com
Kirk E. Jordan received his Ph.D. in Applied Mathematics from the University of Delaware in 1980. He has more than 20 years of experience in high performance and parallel computing, having held computational science positions at Exxon Research and Engineering, Argonne National Laboratory, Thinking Machines and Kendall Square Research before joining IBM in 1994. At IBM, he has held several positions in which he promoted high performance computing and high performance visualization. Currently, Dr. Jordan is Emerging Solutions Executive in IBM’s Strategic Growth Business Deep Computing Group, where he is responsible for overseeing development of applications for IBM’s advanced computing architectures; investigating and developing concepts for new areas of growth for IBM, especially in the life sciences involving high performance computing; and providing leadership in high-end computing and simulation in such areas as systems biology and high-end visualization. Dr. Jordan is a Research Affiliate in MIT’s Department of Aeronautics and Astronautics and holds several leadership positions in the Society for Industrial and Applied Mathematics (SIAM), including Vice President for Industry.

Abstract: Computation is playing an ever-increasing and vital role in biology, creating demand for new machines. Vendors strive to meet this demand with advanced computer architectures. In this tutorial, we will give an overview of IBM's Blue Gene system, covering both the hardware and software architecture. We will emphasize the key features that allow thousands of processors to work together on a user's problem. We will present the programming model used. We will explain ways to take advantage of the Blue Gene nodes and associated networks. We will provide a foundation for attendees to begin thinking about their own problems and about how to design and implement solutions that scale out and take full advantage of the computational power of thousands of processors.

With hundreds of thousands of processors, scientists will tackle problems that to date they have not considered. This will happen in ways we cannot predict today. We will discuss alternative approaches to designing the computation for a given problem and present examples that have been run on Blue Gene systems, showing actual performance results. More importantly, we will point out how having an unprecedented number of processors changes one's approach to the computational problem.