Special Session Details
ISMB/ECCB 2011 features Special Session presentations throughout the conference, July 17 - July 19. Special Sessions introduce the scientific community to relevant scientific issues and topics that are typically not within the focus of the conference. Preliminary program information on Special Sessions is noted below. (Schedule subject to change.)
Special Session 1: BioLINK: Integration across Genomics and Medicine
Christian Blaschke, Bioalma, Madrid, Spain
Lynette Hirschman, Mitre, Bedford, United States
Hagit Shatkay, University of Delaware, United States
Alfonso Valencia, Spanish National Cancer Research Centre, Madrid, Spain
Date: Sunday, July 17
Start Time: 10:45 a.m. - 12:40 p.m.
Room: Hall F2
This year, the BioLINK meeting reaches out to the diverse ISMB community with a highly relevant interdisciplinary topic: data integration and interoperability across the computational, biological and medical fields. Instead of the traditional one-day SIG before ISMB, which typically focused on various aspects of text mining, this year we take a broad focus at the interface of bioinformatics and medicine, and we are holding the meeting as a special session within the conference.
Much information in both biology and medicine exists in the form of text, including reports, published literature, database entries and medical patient records. Hence, text mining methods (which have traditionally been the focus of BioLINK) remain relevant to both the biological and the medical communities, and to the interdisciplinary research between the two.
However, many other types of relevant data, ranging from genomic mutations to chemical reactions and vital signs in patients, are not textual. They come in the form of time series, images, and genomic and proteomic sequences, among others.
Translating biological advances into practical medical solutions is the focus of much current research in several communities, and discussions of the needs and the challenges have recently been taking place at AMIA, particularly at the Summit on Translational Bioinformatics, as well as in the EU, at the workshop on Bridging the Gap in Biomedical Genetics (October, 2010, Cambridge, UK). This special session is specifically centered on the role and challenges of multiple types of data and models, and the way such diverse sources of information can be integrated and used effectively.
Talks and Speakers:
Speaker: Søren Brunak, PhD.
Technical University of Denmark &
University of Copenhagen
Integrating phenotypic data from electronic patient records with molecular level systems biology
Transition to parallel tracks
Speaker: Russ Altman, MD, PhD.
Issues in annotating whole-genome data with clinically relevant pharmacogenomics knowledge
Transition to parallel tracks
Speaker: Yves Moreau, PhD.
Katholieke Universiteit Leuven
Instrumenting the Healthcare System to Define the True Names of Disease
Transition to parallel tracks
Panel Discussion (Drs. Altman, Brunak and Moreau)
Interoperability across Information Extraction and Computational Bio-medicine Platforms
Led by Florian Leitner, CNIO, Spain,
and Karin Verspoor, PhD, University of Colorado, Denver
Talk Abstracts and Speakers Bios:
Integrating Phenotypic Data from Electronic Patient Records with Molecular Level Systems Biology
Speaker: Søren Brunak, Technical University of Denmark & University of Copenhagen
Abstract: Electronic patient records remain a rather unexplored, but potentially rich data source for discovering correlations between diseases. We describe a general approach for gathering phenotypic descriptions of patients from medical records in a systematic and non-cohort dependent manner. By extracting phenotype information from the free-text in such records we demonstrate that we can extend the information contained in the structured record data, and use it for producing fine-grained patient stratification and disease co-occurrence statistics. The approach uses a dictionary based on the International Classification of Disease ontology and is therefore in principle language independent. As a use case we show how records from a Danish psychiatric hospital lead to the identification of disease correlations, which subsequently are mapped to systems biology frameworks.
Søren Brunak, Ph.D., is professor of Bioinformatics at the Technical University of Denmark and professor of Disease Systems Biology at the University of Copenhagen. Prof. Brunak is the founding Director of the Center for Biological Sequence Analysis, which was formed in 1993 as a multi-disciplinary research group of molecular biologists, biochemists, medical doctors, physicists, and computer scientists. Søren Brunak has been highly active within data integration, where machine learning techniques have often been used to integrate predicted or experimentally established functional genome and proteome annotation. His current research combines molecular level systems biology and healthcare sector data such as electronic patient records and biobank questionnaires. The aim is to group and stratify patients not only by their genotype, but also phenotypically, based on the clinical descriptions in the medical records.
Prof. Brunak completed his early studies in physics at the Niels Bohr Institute, University of Copenhagen, and his doctoral work in Computational Biology at the Technical University of Denmark. He has published more than 175 peer-reviewed papers, which together have more than 18,500 citations; his most cited paper (from 1997) has more than 3,800 citations, and his h-index is 53. Søren Brunak is a member of the Danish Academy for the Natural Sciences (1997), the Danish Academy of Technical Sciences (2002), the Danish Royal Society of Science and Letters (2004), and EMBO (2009). More about his research group can be found at www.cbs.dtu.dk
Issues in Annotating Whole-Genome Data with Clinically Relevant Pharmacogenomics Knowledge
Speaker: Russ Biagio Altman. Professor of Bioengineering, Genetics, & Medicine; Chairman of the Bioengineering Department. Stanford University.
Abstract: The mission of the Pharmacogenomics Knowledgebase (PharmGKB, http://www.pharmgkb.org/) is to collect and encode all available information about how human genetic variation impacts drug response. We have recently reported using the PharmGKB to annotate the whole genome of an individual. This initial annotation was done manually and taught us what would be required for routine markup of genomes in a relatively high-throughput fashion. We have therefore revisited much of the PharmGKB contents and re-curated the data in order to make clinical annotation more facile and accurate. I will discuss the issues associated with annotating whole human genomes, particularly focusing on the challenges of annotating rare or novel mutations that have never been (and may never be) studied, but which look potentially important for the individual whose genome is being annotated.
Russ Biagio Altman is professor of bioengineering, genetics, & medicine (and of computer science, by courtesy) and chairman of the Bioengineering Department at Stanford University. His primary research interests are in the application of computing technology to basic molecular biological problems of relevance to medicine. He is particularly interested in informatics methods for advancing pharmacogenomics, the study of how human genetic variation impacts drug response (e.g. http://www.pharmgkb.org/). Other work focuses on the analysis of functional sites within macromolecules and the application of algorithms for determining the structure, dynamics and function of biological macromolecules (http://features.stanford.edu/). Dr. Altman holds an M.D. from Stanford Medical School, a Ph.D. in Medical Information Sciences from Stanford, and an A.B. from Harvard College. He has been the recipient of the U.S. Presidential Early Career Award for Scientists and Engineers and a National Science Foundation CAREER Award. He is a fellow of the American College of Physicians, the American College of Medical Informatics, and the American Institute of Medical and Biological Engineering. He is a past-president and founding board member of the International Society for Computational Biology, and an organizer of the annual Pacific Symposium on Biocomputing. He leads one of seven NIH-supported National Centers for Biomedical Computation, focusing on physics-based simulation of biological structures (http://simbios.stanford.edu/). He won the Stanford Medical School graduate teaching award in 2000. He is a member of the Institute of Medicine of the National Academies.
Instrumenting the Healthcare System to Define the True Names of Disease
Speaker: Yves Moreau, Katholieke Universiteit Leuven
Abstract: Most disciplines of medicine, with the notable exception of infectious disease, continue to classify and treat disease at the phenomenological level. The promise of genomic medicine includes personalizing diagnoses and therapies but also redefining diseases based on molecular markers to bring us closer to an etiological classification. The availability of commodity-priced genome-scale assays and large public data sets makes the goal of finding these “true names” of diseases now feasible. Several such approaches are described, ranging from cancer to autism. Also discussed are several structural impediments to achieving this goal, including: A) the growth of the Incidentalome, the tsunami of false positives that inevitably result from application of massively parallel tests; B) the lack of systematic interpretations of genomic tests and evaluation of their performance; and C) the absence of a mechanism to transfer the growing knowledge of genomics to the physician at the point of care. The opportunity in instrumenting the healthcare enterprise for discovery research to overcome these impediments is reviewed.
About the Organizers:
Christian Blaschke is technical coordinator of the Spanish Institute of Bioinformatics (INB), Madrid, Spain. He obtained his PhD in the Group of Alfonso Valencia and was part of the organization of BioLINK SIG meetings starting in 2002. His academic work has been focused on text mining applied to molecular biology and biomedicine where he contributed in the areas of protein-protein interactions, DNA array analysis and automatic ontology learning. Christian Blaschke was also one of the organizers of the BioCreAtIvE (Critical Assessment of Information Extraction for Biology) challenge carried out in 2004.
Lynette Hirschman is Director, Biomedical Informatics for the Information Technology Center at the MITRE Corporation in Bedford, MA, USA. She has been an organizer of the BioLINK SIG on Text Mining in Biology for ISMB since 2002 and is a founder of BioCreAtIvE (Critical Assessment of Information Extraction for Biology), the first Challenge Evaluation for text mining applied to biological applications. Her research focuses on the application of natural language processing technology to biomedical applications, including removal of personally identifiable information (PII) for protection of privacy, automated fact extraction for medical records, and the coupling of genomic and clinical data to support translational medicine. She is a member of the board of the Genomic Standards Consortium, working on metadata capture for metagenomics, and a member of the organizing team for the i2b2 2011 challenge evaluation for information extraction from medical records.
Hagit Shatkay is an Associate Professor at the Dept. of Computer and Information Sciences at the University of Delaware, with cross-appointments at the Biomedical Engineering Program, and at the Delaware Biotechnology Institute. Her research is in the area of machine learning as it applies to biomedical data mining. Among her many organizational roles, she has been on the organizing committee of the BioLINK SIG at ISMB since 2005, on the steering committee for TREC Genomics, and multiple times chair of the text mining area at ISMB.
She has been an active member of the bio-text research community since its early days (1999), where her work focuses primarily on biomedical information retrieval, and more recently on integrating text, image and sequence data within the biological data mining process. Recent major projects in her lab include work on understanding and predicting heart disease using multiple sources of information, document retrieval through image and text data, SherLoc – protein subcellular location prediction by integrating text and sequence data, and F-SNP – a comprehensive system for ranking SNPs by potential deleterious effects.
Alfonso Valencia is the director of the Structural and Computational Biology Programme of the Spanish National Cancer Research Centre and Director of the Spanish Bioinformatics Institute. His group works on the development and application of Computational Biology methods for the study of problems related to the evolution of molecular interactions. In the field of text mining, his group developed new approaches for the detection of protein function and interactions in biological text and a number of applications combining text mining with other genomic resources.
Dr. Valencia is co-executive editor of the journal "Bioinformatics", member of EMBO, co-organizer of ECCB 05, chair of the text mining/Biological networks areas of ISMB (03-07) and founder, former VP, and member of the board of directors of ISCB. Dr. Valencia has been part of the organization of BioLINK (Text Mining in Biology) since 2001 and co-organizer of the Critical Assessment of Information Extraction for Biology (BioCreAtIvE 2003-4, 2006-07, 2009-2012).
Special Session 2: Insights & outlook for individual genome interpretation: lessons from the Critical Assessment of Genome Interpretation
John Moult, Baltimore, University of Maryland, United States
Susanna Repo, University of California, Berkeley, United States
Steven E. Brenner, University of California, Berkeley, United States
Date: Sunday, July 17
Start Time: 2:30 p.m. – 4:25 p.m.
Room: Hall F2
This session will provide an overview and assessment of the state of the art in prediction of phenotype from genotype. The session is motivated by the burgeoning availability of individuals' genomes, and the desire to interpret them for research and clinical applications.
The Critical Assessment of Genome Interpretation (CAGI, \'kā-jē\) is a community experiment to assess the current methods of predicting phenotypic impacts of genomic variation. Based on the same model as CASP (Critical Assessment of Structure Prediction), the experimental goals of CAGI are to evaluate the current methods to make useful predictions of phenotypes from genomics data, to identify bottlenecks in genome interpretation that suggest especially critical areas of future research, and to engage and connect researchers from the diverse disciplines whose expertise is essential to methods for genome interpretation.
The first round of CAGI, which was organized during the latter half of 2010, included predictions of phenotype for two sets of cancer mutations, a set of mutations in a monogenic-disease-related enzyme, and 10 personal genomes, as well as identification of the molecular mechanisms underlying the results of genome-wide association studies. In total, 17 research groups in 8 countries submitted 108 predictions to the first CAGI experiment.
In this session, an overview of the initial CAGI experiment will be presented. We will discuss the broad successes and failures, and describe the prediction challenges of the different datasets of CAGI. Current methods for predicting phenotype from genomic data will be presented and the limitations of these methods discussed. The future directions for CAGI and genome interpretation will also be discussed.
2:30PM Introduction to the Critical Assessment of Genome Interpretation (CAGI) experiment
2:55PM Transition to parallel tracks
3:00PM Results from CAGI 2010: successes and challenges
3:25PM Transition to parallel tracks
3:30PM Methods to predict individuals’ medical phenotypes from genomic data
3:55PM Transition to parallel tracks
4:00PM Future directions for CAGI and Genome Interpretation, Panel Discussion
4:20PM Conclusions on panel discussion
4:25PM Transition to parallel tracks
Introduction to the Critical Assessment of Genome Interpretation (CAGI) experiment
Steven E. Brenner, University of California Berkeley
Susanna Repo, University of California Berkeley
In this session, CAGI organizers will provide an introduction to the experiment and survey the mechanisms by which genetic variation can yield phenotypic impacts and will define the different levels of phenotypes – organismal, cellular and molecular – that are predicted in the CAGI effort. The speakers will discuss variation phenomena, observation methods and resulting data, and exemplary biological phenotypes that result from each. The first CAGI experiment will also be described, with a description of the prediction datasets of that experiment.
Results from CAGI 2010: successes and challenges
Speaker: Pauline Ng, Genome Institute of Singapore
Pauline Ng, an assessor for CAGI, will present an overview of results of the initial experiment. She will discuss broad successes and failures, and describe the prediction challenges of the different CAGI datasets, focusing on the single nonsynonymous variations. She will identify bottlenecks of the current methods and datasets and suggest future research directions.
Methods to predict individuals’ medical phenotypes from genomic data
Sean Mooney, Buck Institute
Rachel Karchin, Johns Hopkins University
Hannah Carter, Johns Hopkins University
One of the most challenging CAGI datasets involved predicting the phenotypes of individuals from the Personal Genome Project. These individuals have made many of their genomes and medical characteristics publicly available. We solicited dozens of additional as-yet unpublished phenotypes from these individuals and presented these to predictors. The assessor for this dataset, Sean Mooney, will present an overview of the challenge and the nature of the results. Hannah Carter and Rachel Karchin will be shown in a video explaining their approach to making predictions for this dataset.
Future directions for CAGI and Genome Interpretation, Panel Discussion
Russ Altman, Stanford University
Rita Casadio, University of Bologna
Iddo Friedberg, Miami University
Mauno Vihinen, University of Tampere
John Moult, University of Maryland (Discussion moderator)
In this last session, panelists will discuss the future directions for CAGI and genome interpretation. In order to facilitate the participation of the ISMB community in the discussion, we will collect questions, comments, and suggestions from the audience throughout the special session using electronic communications.
Special Session 3: RNA structure: from genomes to nanotechnology
Peter Clote, Boston College, United States
Ivo L. Hofacker, University of Vienna, Austria
David Mathews, Rochester Medical Center, New York, United States
Peter Schuster, Professor emeritus, University of Vienna, Austria
Date: Monday, July 18
Start Time: 10:45 a.m. – 12:40 p.m.
Room: Hall F2
This Special Session includes both computational and experimental methods at the cutting edge of RNA research. The interdisciplinary aspects focus on the theory and practice of engineering and controlling RNA molecules for performing computations, for controlling genes, for self-assembly, for binding as aptamers to specific targets, and ultimately for serving as pharmaceuticals.
Niles Pierce, Caltech
Rhiju Das, Stanford
Susan Gottesman, NIH
Peter Stadler, Leipzig
Engineering Nucleic Acid Devices
Niles Pierce, Caltech
The programmable chemistry of nucleic acid base-pairing provides a versatile framework for engineering molecular devices and systems. This talk will describe efficient algorithms for nucleic acid sequence design and illustrate their use in engineering programmable molecular amplifiers for multiplexed imaging of mRNA expression within intact vertebrate embryos.
Atomic accuracy and blind predictions through enumerative 3D RNA modeling
Prof Rhiju Das, Stanford
High-resolution 3D structure modeling of RNAs and other macromolecular systems is a long sought but still unachieved goal of computational biology. Inspired by successful protein modeling approaches, we recently showed that the Rosetta framework permits modeling and design of small non-canonical RNA motifs at near-atomic accuracy. Nevertheless, a major bottleneck remains: the difficulty of comprehensively sampling these molecules’ many degrees of freedom. Towards eliminating fundamental issues in this sampling bottleneck, we have postulated a ‘StepWise Ansatz’ (SWA) for constructing well-packed models in small steps, enumerating several million conformations for each nucleotide and covering all possible build-up paths. We present results on non-canonical RNA motifs as well as highly irregular protein loops that have been intractable for prior fragment assembly or analytic loop closure approaches. In all cases, the enumerative SWA method either reaches atomic accuracy or exposes flaws in Rosetta’s high-resolution energy function, suggesting that the approach is ready for more rigorous tests. Over the last year, we have therefore helped organize and participated in blind de novo modeling trials on RNAs with previously unsolved 3D structures. For a few cases, chemical mapping and crystallographic data have become available, and we report high accuracy de novo modeling in at least two non-canonical motifs.
Bacterial small RNAs in regulatory networks
Susan Gottesman, National Cancer Institute, NIH
Bacteria such as E. coli contain on the order of 100 small regulatory RNAs. A large number of these regulate gene expression by pairing with target mRNAs to activate or repress gene expression; the sRNAs are themselves subject to stringent regulation of their transcription. One challenge has been to identify all the targets of a given sRNA, although both experimental and computational methods continue to improve. In an alternative approach, we have developed rapid methods to query the pairing sRNAs for their effects on a given target. The combined outcome of experiments in our lab and others demonstrates an important role for sRNAs in connecting transcriptional networks, in feedback regulation of networks, and in mediating multiple regulatory inputs to many targets.
Folding Algorithms for RNAs with Pseudoknots
Peter Stadler, Leipzig
A large variety of dynamic programming algorithms for RNA structures with pseudoknots have been proposed that differ dramatically from one another in the classes of structures considered. Mostly, these are motivated by considerations of algorithmic tractability. An alternative is the use of the natural topological classification of RNA structures in terms of irreducible components that are embeddable in surfaces of fixed genus. For genus-1 structures, this adds four additional building blocks to the conventional secondary structures.
With a single exception, the resulting class of structures encompasses all known pseudoknotted structures. A corresponding unambiguous multiple context-free grammar provides an efficient dynamic programming approach for energy minimization, partition function, and stochastic sampling. The topology-based approach is not only appealing from a theoretical point of view: it admits a parametrization of pseudoknot penalties that depends on topological complexity, which increases sensitivity and positive predictive value for base pairs by 10-20%.
Reidys CM, Huang FWD, Andersen JE, Penner RC, Stadler PF, Nebel ME. Topology and prediction of RNA pseudoknots. Bioinformatics 27: 1076-1085 (2011)
Special Session 4: ELIXIR
Janet Thornton, European Bioinformatics Institute, United Kingdom
Monday, July 18
2:30 p.m. - 4:25 p.m.
Description: The mission of ELIXIR is to build a sustainable European infrastructure for biological information supporting life science research and its translation to:
the bioindustries, and
ELIXIR has completed a three-year, Europe-wide consultation involving academic and industrial users, data providers and international collaborators, including three stakeholder meetings, two surveys, and fourteen work packages.
ELIXIR will be a distributed infrastructure arranged as a hub and nodes, with the hub at the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK.
The ELIXIR Steering Committee recently issued a Request for Suggestions for ELIXIR nodes, seeking input from organisations that are interested in hosting one of its ‘nodes’, in order to help shape ELIXIR's construction. Organisations wishing to contribute their input were requested to consider how they might contribute: data resources; bio-computing capacity; infrastructure for data integration; and services for the research community, including training and standards development.
The infrastructure is currently moving forward towards construction.
The ELIXIR special session will comprise:
2:30 p.m. - 2:55 p.m. Building ELIXIR presented by Janet Thornton (ELIXIR Project Coordinator) to introduce the audience to ELIXIR and the latest developments towards its construction
3:00 p.m. - 3:25 p.m. Two short presentations on:
'ELIXIR and the other ESFRI BMS Projects' presented by ELIXIR Project Manager, Andrew Lyall
'ELIXIR and Industry' presented by Dominic Clark, EBI Industry Programme Manager.
3:30 p.m. - 3:55 p.m. A panel session during which representatives from the ELIXIR Steering Committee will talk about how their countries are working towards a pan-European infrastructure and how ELIXIR has fostered the construction of an internal network within each country.
Chaired by: Janet Thornton (ELIXIR Project Coordinator)
Anna Tramontano (University of Rome) representing Italy
Alfonso Valencia (Spanish National Cancer Research Centre- CNIO) representing Spain
Søren Brunak (Technical University of Denmark- DTU) representing Denmark
Bengt Persson (Linköping University) representing Sweden
Tim Hubbard (Wellcome Trust Sanger Institute) representing the UK
Time will be allowed after each presentation and panel session for discussion.
4:00 p.m. - 4:25 p.m. Panel Continues
Special Session 5: Bioinformatics for Synthetic Biology: Scaling to Complex Biological Systems
C. Forbes Dewey, Jr., Massachusetts Institute of Technology, Cambridge, United States
Richard I. Kitney, Imperial College, London, United Kingdom
Date: Tuesday, July 19
Start Time: 10:45 a.m. - 12:40 p.m.
Room: Hall F2
Synthetic Biology is an emerging discipline that learns from natural biological systems and attempts to construct novel living devices that are based on biological principles. These constructs are self-replicating systems that have special functions. Making ethanol from switchgrass would be an example. Many of the constructs resemble electrical circuit elements such as AND/NOR gates, adders and multipliers, and feedback loops, so one can conceive of creating “living” entities that have complex decision capabilities as well as useful functions.
This Special Session will be devoted to exploring the role of bioinformatics in synthetic biology. One of the key concepts driving current research is the ability to combine multiple synthetic biological “parts” to achieve complex goals. With the publication and curation of these synthetic biology constructs, one can create a database of Standard Biological Parts. If the curation includes computable descriptions (e.g. ODEs that include known rate constants written in an accepted machine-readable format such as SBML), then one could predict the complex behavior of collections of these modules working as a single entity.
Presentations (in presentation order):
Richard I. Kitney. Systematic design challenges in synthetic biology.
Ron Weiss. Synthetic biology: from parts to modules to therapeutic systems.
Christopher Voigt. Refactoring complex gene clusters in bacteria.
C. Forbes Dewey, Jr. Managing the "BRICKS" of synthetic biology.
Abstract - Systematic design challenges in synthetic biology.
The paper will discuss the systematic design approach to synthetic biology. Key elements of this approach are the design cycle and the use of modularity. The paper will examine these principles in relation to two examples: biological logic gate design and a biosensor for urinary tract infection.
Abstract - Synthetic biology: from parts to modules to therapeutic systems.
Synthetic biology is revolutionizing how we conceptualize and approach the engineering of biological systems. Recent advances in the field are allowing us to expand beyond the construction and analysis of small gene networks towards the implementation of complex multicellular systems with a variety of applications. In this talk I will describe our integrated computational / experimental approach to engineering complex behavior in living systems ranging from bacteria to stem cells. In our research, we appropriate design principles from electrical engineering and other established fields. These principles include abstraction, standardization, modularity, and computer aided design. But we also spend considerable effort towards understanding what makes synthetic biology different from all other existing engineering disciplines and discovering new design and construction rules that are effective for this unique discipline. We will briefly describe the implementation of genetic circuits and modules with finely-tuned digital and analog behavior and the use of artificial cell-cell communication to coordinate the behavior of cell populations. The first system to be presented is an RNAi-based logic circuit that can detect and destroy specific cancer cells based on their microRNA expression profiles. We will also discuss preliminary experimental results for obtaining precise spatiotemporal control over stem cell differentiation for tissue engineering applications. We will conclude by discussing the design and preliminary results for creating an artificial tissue homeostasis system where genetically engineered stem cells maintain indefinitely a desired level of pancreatic beta cells despite attacks by the autoimmune response, relevant for diabetes.
Abstract - Managing the "BRICKS" of synthetic biology.
One of the most exciting opportunities in synthetic biology is to assemble individual biological parts to achieve new functionality. Because the behavior of each part is, in general, highly nonlinear, simple methods of combination will not be productive. We examine current efforts to define the semantics of the “BRICKS” (Clotho, SBOL, MIRIAM, DICOM) and what is required of the underlying mathematical models that carry the quantitative information on temporal behavior.