TUTORIAL PROGRAM - Saturday, June 25
ISMB 2005 will feature half-day introductory
and advanced tutorials. The tutorials will be given prior
to the meeting. The purpose of the tutorial program is
to provide participants with lectures and demos on either
well-established or new "cutting-edge" topics
relevant to the bioinformatics field. It offers participants
the chance to learn about new areas of bioinformatics research, to
get an introduction to important established topics, or
to develop higher skill levels in areas where they are already
knowledgeable. Tutorial attendees should register using
the on-line ISMB 2005
registration form.
Attendees will receive a Tutorial
Entry Pass at the time they register on site. Tutorial
handouts can be picked up at the door of each tutorial
session.
Presenter: Olga Troyanskaya,
Princeton University, ogt@princeton.edu
Olga Troyanskaya is an Assistant Professor in
the Department of Computer Science and Lewis-Sigler
Institute for Integrative Genomics at Princeton
University. Her laboratory researches data integration,
gene function prediction, and biological pathway
modeling based on heterogeneous data. She has
recently written an invited review paper about
data integration for Briefings in Bioinformatics.
In addition to multiple seminar and workshop
presentations, Dr. Troyanskaya developed and
taught two bioinformatics courses at Princeton:
Analysis and Visualization of Large-Scale Biological
Data and Computational Modeling of Biological
Networks. She teaches microarray analysis for
the bioinformatics course at CSHL and taught
bioinformatics at CalState Hayward.
Abstract: In
recent years, multiple types of high-throughput
functional genomic data have become available
that facilitate rapid functional annotation
and pathway modeling in the sequenced genomes.
Gene expression microarrays are the most commonly
available source of such data, and an increasing
amount of other data, including protein-protein
interaction, sequence, literature, and localization
data, is being generated. However, genomic data
sacrifice specificity for scale compared to
traditional experimental methods, yielding large
quantities of relatively lower quality measurements.
This problem has generated much interest in
bioinformatics in the past two years, as sophisticated
computational methods are necessary for accurate
functional interpretation of these large-scale
datasets. This tutorial will present an overview
of recently developed methods for integrated
analysis of functional genomic data and outline
current challenges in the field. The focus will
be on the development and use of such methods
for gene function prediction, understanding
of protein regulation, and modeling of biological
networks. This tutorial will be of interest
to computational researchers interested in contributing
to the field of data integration and analysis
of heterogeneous data and to biologists with
some computational background who are interested
in using the methods on their experimental data
and understanding their properties and limitations.
Presenters: Ernst Wit, University of Glasgow, ernst@stats.gla.ac.uk
A Reader in Statistics at the University of
Glasgow with over 10 years of university teaching
experience. Currently, he is teaching Statistics
for Psychology Students, a 2nd year Probability
Distributions course, and a 4th year honours
Time Series and Spatial Statistics course. Dr
Wit also coordinates postgraduate teaching in
the Statistics Department. He has completed
an Institute for Learning and Teaching in Higher
Education accredited programme. In September
2003 Dr Wit taught a 1.5 day course on microarray
data analysis at the regional International
Biometric Society conference, which was repeated
in May 2004 in Verona (Italy).
John McClure, University of
Glasgow, jdmc4w@clinmed.gla.ac.uk
A Lecturer in Statistical Genetics at the University
of Glasgow. He has worked in bioinformatics for
over three years, principally on statistical
analysis methods for microarray experiments.
At present he is teaching statistics to BSc
Medical Science students. In 2002-2003 he taught
a Practical Statistics course (S1B) for Science,
Arts and Social Science students in the University
of Glasgow. Dr McClure has also taught the statistics
module in the Basic Bioinformatics course of
the M.Sc. in IT (Bioinformatics Strand) in 2003-2004.
He is at present completing an ILT accredited
Postgraduate Certificate in Academic Practice.
Abstract: Gene
expression profiling has become a routine technique
that can be useful to many applied life scientists
at some stage of their research. The ease with
which it is possible to generate thousands of
gene profiles stands in sharp contrast to the
difficult and often treacherous path to
reliable conclusions or even interesting research
directions from these data. The data are very
noisy, and many traditional computational techniques
are not designed to deal with large numbers of variables,
i.e. genes. Further, appropriate statistical
design is essential to make the best use of
the limited number of arrays and samples that
are in general available.
This course will cover three of the main aspects
of microarray experiments: designing optimal
or near-optimal microarray experiments; applying
simple but effective microarray data cleaning
techniques; and performing effective hypothesis
tests with accurate error rate control.
The R computing environment will be used to
implement the different techniques discussed.
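As a rough illustration of what "error rate control" can mean when thousands of genes are tested at once, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of p-values. This is an assumption made for illustration only: the abstract does not say which procedure the course teaches, the course itself uses R rather than Python, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Controls the false discovery rate at level alpha using the
    Benjamini-Hochberg step-up rule (valid for independent tests).
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Reject every hypothesis whose rank is at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical per-gene p-values, e.g. from gene-wise t-tests.
    gene_p_values = [0.212, 0.001, 0.039, 0.008, 0.074]
    print(benjamini_hochberg(gene_p_values, alpha=0.05))
```

Because the rule is step-up, a gene can be rejected even if its own p-value misses its threshold, as long as a larger p-value further down the sorted list meets its threshold.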
AM3:
RNA: Algorithms for Structure Prediction and Gene-finders
Presenter: Peter
Clote, Boston College, clote@bc.edu
PhD in Mathematics, Duke University, 1979.
Full Professor of Computer Science at Boston College since 1990.
Gentzen Chair of Theoretical Computer Science at University of Munich, 1995-2000.
50 journal or peer-refereed proceedings publications;
editor of 3 books (Oxford Univ Press, Springer-Verlag);
author of 2 books (Computational Molecular Biology:
An Introduction at Wiley & Sons, and Boolean
Functions and Computation Models with Springer-Verlag).
Editorial Board of Notre Dame J. of Formal Logic,
1991-2003; Program Committee and Organizing
Committee of numerous meetings.
Semi-professional jazz alto saxophonist (played at Recomb 2003
in Berlin).
Abstract: RNA
is a current focus of interest in molecular
biology, due to the post-transcriptional regulatory
action of micro-RNA (miRNA) and small interfering
RNA (siRNA), which allow geneticists to knock
down protein translational products and better
understand gene interactions. RNA secondary structure
plays important roles in retranslation events
such as incorporating selenocysteine using the
UGA stop codon and in ribosomal frameshift
slippage events.
This tutorial surveys some of the recent biology
of RNA, obtained from ribosomal conformation
analysis, and some important algorithms concerning
secondary structure prediction and non-coding
gene finders. We additionally include recent
findings of the author concerning the
computation of the landscape of kinetic traps
for RNA and the energy profile of random RNA.
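For readers new to secondary structure prediction, the following minimal sketch shows the classic Nussinov-style dynamic program, which maximizes the number of nested base pairs in a sequence. It is an assumption chosen for illustration: the abstract does not name the algorithms covered, practical predictors minimize free energy rather than count pairs, and the function name, the allowed pair set (Watson-Crick plus G-U wobble) and the minimum hairpin loop length of 3 are choices made here.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in an RNA sequence.

    A pair (k, j) is allowed only if at least `min_loop` unpaired bases
    lie between k and j, and the bases are complementary (including G-U).
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0

    # dp[i][j] = maximum number of pairs within the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]

    # Fill the table by increasing subsequence span (j - i).
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j is left unpaired
            # case 2: base j pairs with some base k, with i <= k < j - min_loop
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    # A small hairpin: three stacked pairs closing a three-base loop.
    print(nussinov_max_pairs("GGGAAAUCC"))
```

The table has O(n^2) entries and each takes O(n) work, so the run time is cubic in the sequence length, which is also the usual cost of energy-based folding algorithms.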
AM4:
Developing and Using Special Purpose Hidden Markov
Model Databases
Presenter: Martin
Gollery, University of Nevada, Reno, mgollery@unr.edu
Martin Gollery is the Associate Director of
Bioinformatics at the University of Nevada at
Reno. He has developed several custom HMM databases,
all of which are publicly available. In his
former role as Director of Research at TimeLogic
corporation he was involved with the development
of accelerated HMM search algorithms, as well
as other methods such as TeraBLAST and GeneBLAST.
Abstract: Hidden
Markov models provide a probabilistic model
of protein domain or family data. Databases
of Hidden Markov Models can be extremely useful
tools for the analysis of sequence data. While
the Pfam database is unquestionably the most
popular collection of HMMs in existence today,
there are many other useful collections that
are currently available, and some of them might
be more suitable for a particular need. In addition,
there are several versions of Pfam, and it is
useful to know which version to use, and why.
In this tutorial we will examine the variety
of software methods that are currently available
for HMM searches and for related methods such
as profile and PSSM algorithms. We will study
many different HMM databases, with specific
tips and pointers for optimizing results from
each one. Finally, we will look at methods for
the development of customized collections of
HMMs, with the associated benefits and pitfalls.
AM5:
Weighted Finite-State Transducers in Computational
Biology
Presenters: Corinna
Cortes, Google, Inc., corinna@google.com
Corinna Cortes is a Research Scientist at Google,
Inc. where she is working on a broad range of
theoretical and applied large-scale machine
learning problems. Dr. Cortes' research work
is well-known in particular for her contributions
to data-mining in very large data sets for which
she was awarded the AT&T Science and Technology
Medal in the year 2000, and her work on the
theoretical foundations of support vector machines
(SVMs) and kernel techniques for the analysis
of variable-length sequences and weighted automata.
She has given numerous talks and presentations
on machine learning.
Mehryar Mohri, Courant Institute
- New York University, mohri@cs.nyu.edu
Mehryar Mohri is a Professor of Computer Science
at the Courant Institute of Mathematical Sciences.
Before joining NYU, Dr. Mohri worked for ten
years at AT&T Bell Labs and AT&T Labs
- Research where he served as the Head of a
Research Department, leading and directly contributing
to a broad range of work in machine learning,
automata theory, text and speech processing,
and the design of general algorithms. He has
taught dozens of tutorials on topics related
to weighted finite-state transducers, their
theory and algorithms, and their applications
to sequence modeling.
Abstract: Projects
such as genome sequencing and DNA microarray
studies produce an ever-increasing amount of
data and the area of computational biology now
poses some of the biggest challenges in computer
science and data mining such as data storage,
visualization, and modeling. Statistical learning
techniques are increasingly successfully applied
for modeling, but they often require substantial
algorithmic expertise: the conventional software
packages do not naturally encompass variable-length
sequences and general structures; thus, special-purpose
algorithms must be implemented to solve the
associated optimization problems.
This tutorial introduces a general framework,
weighted finite-state transducers (WFSTs), that
naturally matches a wealth of data in computational
biology, together with a set of algorithms that
efficiently allow for a series of operations
on WFSTs, enabling biologists even with modest
computer skills to successfully apply sophisticated
alignment techniques and powerful kernel methods.
It familiarizes the audience with WFST algorithms
and techniques, their use and application to
computational biology problems, and a software
library (FSM Library) incorporating these algorithms
and representations that is freely available
for research and academic use.
AM6:
Semantic Aggregation, Integration and Inference
of Pathway Data
Presenters: Joanne
Luciano, Massachusetts General Hospital / Harvard
Medical School, jluciano@nmr.mgh.harvard.edu
Joanne Luciano is an active leader in several
community-based initiatives, including BioPAX,
the BioPathways Consortium, and the emerging
Semantic Web for Life Sciences. She co-developed
the BioPAX ontology that is on its way to becoming
the standard for biological pathway knowledge
representation. She is an authority in pathway
databases and modelling languages. Joanne’s
teaching experience ranges over 20 years, and
includes such courses as artificial intelligence,
data communications and networks, and data structures.
Joanne has been an active member of the computational
and systems biology community since 1996, presenting
at many international conferences.
Jeremy Zucker, Dana-Farber
Cancer Institute, Harvard Medical School, zucker@research.dfci.harvard.edu
Jeremy Zucker holds degrees in Computer Science
and Applied Mathematics from the University
of Colorado. He is an expert in semantic aggregation,
integration and inference. Currently a bioinformatics
specialist at the Dana-Farber Cancer Institute,
and a computational biologist for the Church
lab at Harvard Medical School, Jeremy is a lead
developer for projects such as DARPA’s
BioSPICE, the DOE Genomes to Life, and BioPAX.
His teaching experience includes a course on
systems biology at the Harvard Department of
Molecular Cell Biology, and he is a contributing
author of the soon-to-be-released textbook:
Introduction to Systems Biology (Science of
Knowledge Press).
Abstract: The
objective of this tutorial is to provide the student
with the knowledge and tools necessary to perform
semantic aggregation, integration and inference
on biological pathway data. The Pathway Resource
List at http://cbio.mskcc.org/prl contains over
150 biological pathway databases and is growing.
However, to consolidate all the knowledge for
a particular organism, one must extract the
pathways from each database, transform those
pathways into a standard data representation,
and load the data into an integrated repository.
This tutorial will provide the student with
the ability to avoid common pitfalls and reduce
the amount of work that must be performed in
order to accomplish pathway integration, thus
leading to more productive and efficient scientific
research overall.
The intended audience consists of bioinformaticians,
computational biologists and database developers.
Participants should be familiar with intermediate-level
programming concepts (APIs, UML, XML parsing,
object models) and intermediate-level database
concepts (SQL, data modeling, Entity-Relationship
diagrams, etc.). Topics covered include issues
in biological knowledge representation, semantic
web technologies, data cleaning, and fundamentals
of the extract, transform and load (ETL) methodology,
illustrated through real world examples.
Presenters: Ruud
van der Pas, Sun Microsystems, ruud.vanderpas@sun.com
Ruud van der Pas studied mathematics and
physics. He has a strong background in High-Performance
and Technical Computing and has been with Sun
since 1998. Ruud is a Senior Staff Engineer
in the Scalable Systems Group and operates on
a world-wide basis. His expertise is in serial
performance, shared memory parallelization and
performance analysis. At Sun, he provides application
tuning consultancy. Additionally, he regularly
gives technical presentations and workshops
at conferences and seminars. He is also responsible
for the Sun Application Tuning Seminar and works
closely with several groups within Sun to discuss
enhancements to current and future products.
Karl V. Steiner, Delaware
Biotechnology Institute, University of Delaware,
steiner@dbi.udel.edu
Karl V. Steiner is the associate director of
the Delaware Biotechnology Institute (DBI) and
a professor of electrical and computer engineering
at the University of Delaware. Prior to his
current position, he served as executive director
of the University of Delaware’s Center
for Composite Materials, and the founding executive
director of the Fraunhofer Center for Applied
Materials Research. Dr. Steiner received his
Ph.D. in mechanical engineering from the University
of Kaiserslautern in Germany and his master’s
degree in electrical and computer engineering
from the University of Delaware. As DBI associate
director he is responsible for the strategic
planning and operational oversight of the Institute’s
core instrumentation centers, including the
bioinformatics center.
Bogdan Vasiliu, Sun Microsystems, bogdan.vasiliu@sun.com
Bogdan Vasiliu has been a member of the Vertical
Markets and Solutions technical group at Sun
Microsystems since 2000. His expertise is in
grid and parallel computing, performance tuning
and analysis. Bogdan studied computer science
and engineering and has been involved with High
Performance Computing since 1995. Prior to joining
Sun, he worked at Ames Laboratory, Iowa State
University and “Politehnica” University
of Bucharest. At Sun, Bogdan is responsible
for architecting grid solutions for life science
partners and customers, porting, optimizing
and benchmarking numerically-intensive Independent
Software Vendor (ISV) applications on Sun systems,
and acting as a liaison to product engineering,
driving resolution to problems affecting ISVs,
and providing feedback on future product directions
for Sun.
Abstract: With
CPU chip speeds peaking, simply upgrading the
processor no longer offers the performance benefit
that it did, leaving many of us facing a brick
wall. This performance tutorial will teach you
how to overcome this barrier and get more out
of your applications using your /existing/ system(s).
Basically, there are two approaches to address
a performance bottleneck: 1) software - modify
the code itself to parallelize and optimize
it; 2) hardware - increase the efficiency of
utilization through scheduled resource allocation.
Both approaches will be covered in this tutorial
through detailed case studies.
After a short introduction to application performance
tuning, we will zoom in on parallelization.
The OpenMP programming model will be introduced
through relevant examples that will highlight
how and when to apply its simple but powerful
features. OpenMP is a widely available open
standard (http://www.openmp.org).
Key distributed resource management (DRM) technologies (LSF, SGE,
PBS) and how they differ, as well as key grid
concepts including policy allocation, resource
reservation, backfilling, accounting, reporting,
and billing will be discussed. DBI (University
of Delaware) will present several of their current
problems in computational biology with solution
instructions that attendees will be able to
replicate and apply. Live demonstrations will
be used throughout the tutorial to better illustrate
how to execute the instructions that will be
provided.
Additional Tutorial Authors:
Douglas O'Neal, Delaware Biotechnology Institute,
University of Delaware, USA; Loralyn Mears,
Life Science Market Segment Manager, Sun Microsystems,
Inc., USA
Presenters:
Suzanna Lewis, University of California
Berkeley, suzi@fruitfly.org
Head of the Berkeley Drosophila Genome Project
bioinformatics group and a founder of the Gene
Ontology Consortium. She, together with Martin
Reese, taught one previous ISMB tutorial, the
Genome Annotation Assessment Project held in
1999 in Heidelberg.
Barry Smith, Universität des Saarlandes,
phismith@buffalo.edu
President of the European Centre for Ontological
Research (ECOR), a new research program aimed
at applying the theories developed by philosophers
to a variety of problems in information science
and related areas.
Abstract: Biomedical
advances are increasingly stymied, not by a
lack of experimental data, but by a lack of
data that is available in computable form. Biological
data must be readily accessible, comparable,
and fully correlated, and it must be captured
in a language that allows the formulation of
coherent testable hypotheses if it is to efficiently
provide relevant answers to scientific inquiries
and thus enable discoveries. The bottleneck
in research today, in other words, is the task
of sifting through large data sets to decide
what experiments should be done.
Ontological frameworks can provide
a shared language for communicating biological
information and thereby integrating biological
knowledge and removing this data bottleneck.
These rigorous semantic descriptions of the
entities and relationships between these entities
can then be used to formulate hypotheses about
and navigate through the volumes of data. To
this end there are an increasing number of groups
developing ontologies in assorted biological
domains. However, these efforts will only be
beneficial and aid biological data integration
if certain criteria are met. These prerequisites
are that the ontologies are non-overlapping,
that they are accepted and used by the community,
and that they are well-principled.
Additional Tutorial Authors:
Michael Ashburner, University of Cambridge;
Mark Musen, Stanford University; Rama Balakrishnan,
Stanford University; David Hill, Jackson Laboratory
Presenter: Daniel
Huson, Center for Bioinformatics, Tuebingen
University, huson@informatik.uni-tuebingen.de
1990 PhD Mathematics Bielefeld
University
1990-97 Research Assistant Bielefeld University
1997-99 Post-Doc in Bioinformatics at UPenn
and Princeton, with Tandy Warnow
1999-02 Senior Research Scientist Celera Genomics,
Gene Myers group
2002- Professor of Algorithms
in Bioinformatics at Tuebingen University.
Abstract: The
evolutionary history of species is best represented
by a phylogenetic tree and there exist many
well-known methods for reconstructing such trees,
in particular, from bio-molecular sequences.
However, for some types of data, this may not
be appropriate and approaches that generate
phylogenetic networks may be more suitable.
For example, hybridization between different
plant species or recombination between closely
related bacteria can lead to sequence data whose
evolutionary history is best represented by
a network.
There are two types of phylogenetic networks:
ones that are interpretable as non-tree-like
models of evolution, and ones that do not possess
such an interpretation, but should rather be
considered as visualizations of incompatibilities
within a data set. Examples of the former are
ancestral recombination graphs, hybridization
networks and reticulation networks, whereas examples
of the latter are splits graphs, consensus networks
and neighbor-nets. This tutorial will discuss
both types.
We will first give a brief introduction to reticulate
evolution and phylogenetic networks. Then, a
number of important network reconstruction methods
will be discussed. Finally, the application
of some phylogenetic methods will be illustrated
using typical data sets (haplotypes, plant hybrids,
and whole genome data for a set of prokaryotes).
PM10:
Mining the Biomedical Literature: State of
the Art, Challenges and Evaluation Issues
Presenter: Hagit
Shatkay, School of Computing, Queen's University,
Kingston, Ontario, shatkay@cs.queensu.ca
Hagit Shatkay is an assistant professor at the
School of Computing, Queen’s University
in Kingston, Ontario. Her research is in the
area of machine learning as it applies to biomedical
data mining, and she is an active member of
the biomedical text-mining research community.
Prior to joining Queen’s University, she
was an Informatics Research scientist with the
Informatics Research group at Celera/Applied
Biosystems, and before that a postdoctoral fellow
at the National Center for Biotechnology Information
(NCBI). She has a PhD in Computer Science from
Brown University, and an MSc and BSc in Computer
Science from the Hebrew University in Jerusalem.
Abstract: Much
knowledge about genes and proteins is reported
in the vast amount of biomedical literature.
The recent advancement of high-throughput biological
techniques is accompanied by a steady and overwhelming
increase in the number of related publications.
Thus the ability to rapidly and effectively
survey the literature forms a necessary step
for both the design and the interpretation of
any large-scale experiment, as well as for the
timely curation of biomedical knowledge in public
databases. Consequently, the past few years
have seen a surge of interest in biomedical
text mining. Several text-related disciplines
are harnessed in such efforts, including natural
language processing, information extraction,
and information retrieval.
The objective of this tutorial is to provide
a structured introduction to the biomedical
text mining field, from both the biomedical-application
and the text-mining perspectives. It will present
general and biomedical-specific text mining
methods, examine and analyze existing work in
biomedical literature mining, and place it in
the framework of the explicit text mining disciplines.
The tutorial will especially emphasize critical
assessment, validation and evaluation methods
that are used in text mining and information
retrieval, and survey recent evaluation initiatives.
PM11:
Gene Expression Levels as Traits in Genetic
Linkage Analysis
Presenters:
Steffen Möller, University
of Rostock, Institute of Immunology, moeller@pzr.uni-rostock.de
1996 Diploma in Computer Science from the University
of Hildesheim, Germany
1997-2001 EMBL fellowship at the EMBL-EBI European Bioinformatics
Institute, Cambridge, UK; PhD from the University of Cambridge
since 2001 Postdoctoral position at the Institute
of Immunology, University of Rostock, Germany
Robert Hoffmann, European
Bioinformatics Institute, hoffmann@ebi.ac.uk
1999 Diploma in Biology from the University
of Vienna, Austria
since 2001 PhD Student at the National Center of Biotechnology, Madrid,
Spain
currently Visitor at the EMBL-EBI European
Bioinformatics Institute, Cambridge, UK
Abstract: Diseases
that individuals develop late in life have no
direct influence on their reproduction. Variants
of genes that contribute to the susceptibility,
onset or severity of these diseases are therefore
likely to be kept in the population. Together
with environmental factors, combinations of
such defects lead to polygenic diseases. Examples
include some common autoimmune diseases (e.g. Multiple Sclerosis,
Rheumatoid Arthritis), as well as heart disease
and obesity. Genes in polygenic
diseases need to be investigated in combination
for their phenotypic effects. However, genetic
studies in humans are extremely difficult. Animal
models come to the rescue with shorter generation
times, common environments and directed breeding.
Strains are sought that differ
in their susceptibility to a disease. Genetic
markers determine differences in the genotype.
Statistical association of markers with differences
in the trait of the disease among the individuals
of the F2 or later generations led to the concept
of quantitative trait loci (QTLs).
This presentation explains how
to combine the genotyping and disease phenotypes
with RNA expression levels. In this context,
the RNA expression level of genes is interpreted
as a trait of the disease. Equivalent analyses
yield expression QTLs representing candidates
for cis- and trans-controlling regions for gene
expression. With genes interacting, the effect
on the expression of genes may be changed by
more than the sum of the individual loci (epistatic
effects). The cross-comparison with information
on molecular pathways, extracted from Medline
abstracts or manually curated, yields explicit
candidates for further evaluation.
The approach is illustrated using the
example of a murine model of Multiple Sclerosis.
Presenters: Andrew
Boyd, University of Michigan/Michigan Center
for Biological Information (MCBI), adboyd@umich.edu
Dr. Boyd is a postdoctoral research fellow
in Biomedical Information at the University
of Michigan. He is under the mentorship of Dr.
Brian Athey, who is an Associate Professor in
Biomedical Informatics and the Director of the
Michigan Center for Biological Information.
His research focuses on bioinformatics applications
of cluster computing. Dr. Boyd presented
a similar workshop to University of Michigan
bioinformatics students in March 2004. The feedback
from the students was positive; after the course,
they were able to understand the basic principles
of cluster computing and apply them to future research
topics.
Abhijit Bose, University of
Michigan/Michigan Grid Research and Infrastructure
Development (MGrid), abose@engin.umich.edu
Dr. Bose is the associate director of MGrid
(Michigan Grid Research and Infrastructure Development)
and a scientist at The University of Michigan.
He is also a member of the DARPA-funded Virtual
Soldier project team at Michigan. He has extensive
experience in cluster computing and parallel
algorithms. He has organized and taught eight
NPACI (National Partnership for Advanced Computing
Infrastructure) Parallel Computing workshops
at Michigan since 2000. He has also taught a
number of courses at the Annual Summer Institute
organized by the San Diego Supercomputer Center.
These popular workshops have always been oversubscribed
and additional sessions are often organized.
Abstract: Commodity
processor-based clusters have rapidly become
the compute systems of choice for computational
modeling and simulation in many scientific fields.
Many biological computations such as genomic
and protein analysis are increasingly being
performed on large-scale clusters based on Linux.
There are many ways to configure a cluster,
including 32- and 64-bit CPUs, interconnects
such as Myrinet, Ethernet, and InfiniBand, and storage
systems such as commodity RAIDs and SANs. This
tutorial focuses on clusters and their application
to solving large-scale bioinformatics problems.
The participants will learn the basics of Linux
clusters, programming environments, batch and
interactive computations, and a series of examples
from bioinformatics, including how to use parallel
versions of BLAST, Smith-Waterman, InterProScan,
etc. The basics of parallel computing on clusters,
such as the Message Passing Interface (MPI) and
threads, will be introduced. The participants
will learn how to interpret common debugging,
system and application-related error messages
on clusters. A unique feature of the tutorial
is a hands-on session on a high-performance
Linux cluster at The University of Michigan
that the participants will be able to use during
the course of the tutorial.
PM13:
Computational Geometry of Protein Structure
and Function
Presenter: Iosif
Vaisman, George Mason University, ivaisman@gmu.edu
Dr. Iosif Vaisman is Associate Professor of
Bioinformatics in the School of Computational
Sciences at George Mason University. His research
focuses on computational protein structure analysis.
Dr. Vaisman has participated in bioinformatics
curriculum and course development and he has
been teaching bioinformatics, pharmacoinformatics,
and computational biology at the University
of North Carolina at Chapel Hill and George
Mason University since 1995.
Abstract: The
tutorial covers theoretical approaches, techniques
and computational tools for the computational geometry
approach to protein structure analysis, classification,
and prediction. Students will acquire knowledge
of fundamental principles and methods for protein
structure analysis, as well as practical skills
necessary to use modern computational tools
for such analysis.
PM14:
A Massively Parallel High Performance
Computing Environment for Computational Biology
Presenters:
Gyan Bhanot, IBM T.J. Watson Research
Center, gyan@us.ibm.com
Gyan Bhanot received his PhD in Theoretical Physics
from Cornell University in 1979 and has held
research positions at IAS, Princeton, Brookhaven
Lab, CERN and ITP Santa Barbara. After a few
years as Associate Professor of Physics at Florida
State University, he joined Thinking Machines
Corporation to develop applications for the
Connection Machine. He joined IBM in 1994 as
a Research Staff Member and has been there since.
His interests are in Computational Biology,
Simulations and Parallel Algorithms. In addition
to his position at IBM Research, he is an Adjunct
Professor in the Biomedical Engineering Department
at Boston University and a Visiting Member in
the Center for Systems Biology at the Institute
for Advanced Study in Princeton.
Bob Germain, IBM T.J. Watson
Research Center, rgermain@us.ibm.com
Robert S. Germain received his A.B. in physics
from Princeton University in 1982 and his M.S.
and Ph.D. in physics from Cornell University.
After receiving his doctorate in 1989, he joined
the IBM Thomas J. Watson Research Center as
a Research Staff Member in the Physical Sciences
Department. From 1995 to 1998, he was project
leader for development of a large scale fingerprint
identification system using an indexing scheme
(FLASH) developed at IBM Research. Since 2000,
Dr. Germain has been responsible for the science
and associated application portions of the Blue
Gene project (http://www.research.ibm.com/bluegene)
as well as managing the Biomolecular Dynamics
and Scalable Modeling group. His current research
interests include the parallel implementation
of algorithms for high performance scientific
computing, the development of new programming
models for parallel computing, and applications
of high performance computing to challenging
scientific problems in computational biology.
Dr. Germain is a member of the IEEE and the
American Physical Society.
Kirk Jordan, IBM Deep Computing,
kjordan@us.ibm.com
Kirk E. Jordan received his Ph.D. in Applied
Mathematics from the University of Delaware
in 1980. He has more than 20 years of experience
in high performance and parallel computing, having
held computational science positions at Exxon
Research and Engineering, Argonne National Laboratory,
Thinking Machines and Kendall Square Research
before joining IBM in 1994. At IBM, he has held
several positions in which he promoted high
performance computing and high performance visualization.
Currently, Jordan is Emerging Solutions Executive
in IBM’s Strategic Growth Business Deep
Computing Group where he is responsible for
overseeing development of applications for IBM’s
advanced computing architectures, investigating
and developing concepts for new areas of growth
for IBM especially in the life sciences involving
high performance computing, and providing leadership
in high-end computing and simulation in such
areas as systems biology and high-end visualization.
Dr. Jordan is a Research Affiliate in MIT’s
Department of Aeronautics and Astronautics and
holds several leadership positions in the Society
for Industrial and Applied Mathematics (SIAM)
including Vice President for Industry.
Abstract: Computation
is playing an ever-increasing and vital role
in biology, creating demand for new machines.
Vendors strive to meet demand with advanced
computer architectures. In this tutorial, we
will give an overview of IBM's Blue Gene system,
both the hardware and software architecture.
We will emphasize the key features that allow
thousands of processors to work together on
a user's problem. We will present the programming
model used. We will explain ways to take advantage
of the Blue Gene nodes and associated networks.
We will provide a foundation for attendees to
begin to think about their problems and how to
design and implement them to scale out and take
full advantage of the computational power of
thousands of processors.
With hundreds of thousands of processors, scientists
will tackle problems that to date they had not
considered. This will happen in some ways we
cannot predict today. We will discuss alternative
approaches to designing the computation for a problem
and discuss some examples which have been run
on Blue Gene systems, showing actual performance
results. More importantly, we will point out how
having an unprecedented number of processors changes
one's approach to the computational problem.