9th Annual Rocky Mountain Bioinformatics Conference

Keynote Speakers

Updated November 21, 2011

Judith A. Blake, PhD
Associate Professor
The Jackson Laboratory
Bar Harbor, Maine - USA

CV: http://research.jax.org/faculty/judith_blake.html

Title: The Future of Research Communication: From Surfing to Deep Diving

Abstract: The Internet has revolutionized the transmission of data and of knowledge. Scientific journal publishers are reeling under the impact of on-line publishing, the inability to handle very large datasets for peer review, and the emergence of download tracking and post-publication commentary as impact metrics. Electronic management of research data and results provides a mechanism for open access to electronic project directories. Very large data sets are increasingly common, and testing the reproducibility of reported results is increasingly difficult to undertake. Nonetheless, ontologies, semantic and accession ID mapping, and author-tagging provide immediate points for on-line integration of scientific results, both experimental and inferred, although the distinction is often missing. For scientists associated with large institutions, access to copyrighted biomedical literature continues unabated, although the function of the university library is changing radically. Meanwhile, scientists and citizens without access to journal subscriptions have limited access to publicly financed scientific results, although plenty of scientific discourse is available on the Web. Open access publication, digital data repositories, and electronic journals will help in the dissemination of scientific research results. I will discuss these topics and the impact of these changes on the scientific enterprise.

Peter Clark, PhD
Vulcan Inc.
Seattle, Washington - USA

CV: www.cs.utexas.edu/users/pclark

Title: Project Halo: Constructing and Exploiting a Formal Representation of a Biology Textbook to Understand and Answer Users' Questions

Abstract: As part of Project Halo at Vulcan Inc., we are building a large-scale, broad-coverage, formal (logic-based) knowledge base (KB) that represents a substantial portion of the knowledge in an AP-level biology textbook and supports question interpretation, reasoning, and question answering. Because users pose questions in English, we are not spared the huge challenge of interpreting natural language; however, the KB does provide significant advantages for this task. In particular, it creates knowledge-based expectations of what would be coherent to ask, which can be used to coerce a user's question into something meaningful to the computer.

In this talk I will describe the project, our approach and progress in constructing the KB, and our successes and challenges in interpreting and answering users' biology questions with it. I will then speculate on the longer-term picture of using the knowledge base to guide interpretation of (parts of) biology texts themselves, with the potential to further expand the KB semi-automatically and ultimately create more knowledgeable machines.

Emek Demir, PhD
Computational Biology Center
Memorial Sloan Kettering Cancer Center
New York, New York - USA

CV: http://cbio.mskcc.org/~demir/cv.html

Title: Building Cell Maps: Status and Challenges

Abstract: Advances in molecular technologies have led to the generation of data and information about cellular processes at an ever-increasing rate. Current means of knowledge representation and scientific communication in biology cannot adequately deal with the complexity and volume of this information, which creates a serious bottleneck for developing a causal, predictive understanding of the cell.

To address this problem we have developed BioPAX (Biological Pathway Exchange), a standard language for representing and exchanging pathway information. The recently released BioPAX Level 3 can represent signaling and metabolic pathways, gene regulation, and molecular and genetic interactions in great detail.

The latest version of Pathway Commons, our pathway data integration and aggregation server, aims to provide a "merged" network of publicly available databases that support BioPAX. Through an iterative process of aggregation, alignment, matching, and merging, conceptually not very different from putting the pieces of a puzzle together, we are building a cell map.

Pathway alignment, that is, finding similar or equivalent portions of two pathways, is a crucial step toward this goal. PATCH is an algorithm that can align pathways even when knowledge is missing or omitted. We have successfully applied PATCH to find similar pathways between Reactome and NCI/PID, and detected several curation errors in both databases during this process.

As we build pathway resources and infrastructure, it is becoming increasingly possible to take a knowledge-driven approach to biological problems. With a mechanistic understanding of cellular events, we can better subtype diseases, predict drug responses, and choose combinations of drugs to optimally interfere with the disease. This would be revolutionary, especially for complex, multi-causal diseases such as cancer.

Hans Peter Graf, PhD
NEC Laboratories America, Inc.
Princeton, New Jersey - USA


Title: SOMAmer-based Proteomic Analysis with Machine Learning

- Yanjun Qi, PhD, NEC Laboratories America
- Alexandru Niculescu-Mizil, PhD, NEC Laboratories America
- Igor Durdanovic, NEC Laboratories America
- Shintaro Kato, PhD, NEC Tokyo
- Iwao Waga, PhD, NEC Tokyo
- Alex Stewart, PhD, SomaLogic, Boulder

Abstract: SOMAmer technology opens the possibility for a low-cost analysis of thousands of plasma proteins from a small sample of blood. Finding reliable biomarkers among such a large number of proteins is challenging since the plasma proteome has a complex composition and conditions can change quickly, affecting the concentrations of proteins we try to interpret.

Finding reliable biomarkers is a problem of feature selection, where we search for combinations of proteins that consistently indicate certain conditions, such as a disease, with high sensitivity and specificity. Typical approaches are univariate algorithms or algorithms that greedily search for groups of proteins. Recently, research has made considerable progress in feature learning, where we try to automatically learn features that are more indicative of the underlying phenomena. We have successfully applied methods based on deep learning and kernel techniques in various image, text, and protein analysis applications.
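
As a reminder of the two figures of merit named above, the following minimal sketch computes sensitivity and specificity from binary labels. The function name and the toy labels are illustrative assumptions, not data or code from the studies described here.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = disease, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diseased, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diseased, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Hypothetical toy labels: 4 cases and 6 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Sensitivity is the fraction of true cases that the marker panel flags; specificity is the fraction of controls that it correctly clears.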

Here we demonstrate several multivariate feature selection algorithms based on L1 regularization and kernel techniques for the detection of biomarkers in data from three different cancer studies. Each data set consists of several hundred samples with between 800 and 1,000 different proteins. Because of the small sample sizes, individual runs of feature learning can be unstable, with the selected proteins varying from one run to the next. Yet with statistical sampling, groups of proteins that provide high sensitivity and selectivity are identified reproducibly. These results are very encouraging, since the ability to analyze large numbers of proteins holds great promise for a wide range of proteomic analysis applications.
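
The idea of combining L1 regularization with statistical sampling can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the algorithms used, so the sketch substitutes a plain lasso (L1-penalized least squares on +1/-1 labels) fitted by coordinate descent on repeated half-size subsamples, and it runs on synthetic data in which only features 0 and 1 carry signal. All function names, the penalty `lam=0.25`, and the 0.8 frequency cut-off are hypothetical choices.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; returns exactly 0.0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def center(X, y):
    """Subtract the column means from X and the mean from y."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    return [[row[j] - means[j] for j in range(p)] for row in X], [v - y_mean for v in y]


def lasso_cd(X, y, lam, n_iter=25):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw for the starting point w = 0
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that ignores w[j].
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def selection_frequencies(X, y, lam, n_resamples=40, seed=0):
    """Fraction of half-size subsamples in which each feature gets a nonzero weight."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resamples):
        idx = rng.sample(range(n), n // 2)  # subsample without replacement
        Xs, ys = center([X[i] for i in idx], [y[i] for i in idx])
        w = lasso_cd(Xs, ys, lam)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / n_resamples for c in counts]


# Synthetic stand-in data (assumption): 200 samples, 8 "proteins",
# with the class label determined by features 0 and 1 only.
data_rng = random.Random(42)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
y = [1.0 if row[0] + row[1] > 0 else -1.0 for row in X]

freqs = selection_frequencies(X, y, lam=0.25)
stable = [j for j, f in enumerate(freqs) if f >= 0.8]
print("selection frequencies:", [round(f, 2) for f in freqs])
print("features kept at the 0.8 cut-off:", stable)
```

A single lasso fit may include or drop a borderline feature depending on which samples it sees; counting how often each feature survives across many subsamples is one common way to turn unstable individual runs into a reproducible selection.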

Kirk E. Jordan, PhD
Emerging Solutions Executive & Associate Program Director
Computational Science Center
IBM T.J. Watson Research
Member, IBM Academy of Technology
Massachusetts - USA


Title: A New Day, A New High Performance Computer, What Does It Mean for the Life Sciences?

Collaborators: Vipin Sachdeva, Michael Perrone

Abstract: A new day is dawning with IBM’s latest High Performance Computing (HPC) system as core counts continue to rise. In this talk, I will briefly describe this system and explain how it fits into the road to Exascale. More importantly, I will describe some of the work underway that is pertinent to the life sciences. I will describe some of our experience on this new system, including bringing up Rosetta, a code that predicts protein structures from amino acid sequences. In addition, and related to our HPC work, I will describe continuing work to make HPC accessible to a wider audience, eventually targeting healthcare and life science practitioners directly, and explain why this work is important.

Michael R. Mehan, PhD
SomaLogic Inc.
Boulder, Colorado - USA

Title: Advances in Protein Biomarker Discovery: Controlling for Sample Handling Artifacts and Confounding Effects

Collaborators: Rachel Ostroff, Alex Stewart, Glenn Sanders, Dom Zichi, Ed Brody, Steve Williams

Abstract: Many biomarker discovery studies may fail to validate because the clinical population does not represent the intended clinical use or because hidden preanalytic variability in the discovery samples contaminates the apparent disease-specific information in the biomarkers. This preanalytic variability can arise from differences in blood sample processing between study sites or, worse, can introduce case/control bias when samples are collected differently at the same study site. To better understand the effect of different blood sample processing procedures, we evaluated protein measurement bias in a large multi-center lung cancer study. These analyses revealed that perturbations in serum protocols result in coordinated changes to many proteins.

Using the SomaLogic SOMAscan platform, we developed protein biomarker signatures of processes such as cell lysis, platelet activation, and complement activation, and assembled these preanalytic signatures into quantitative multi-dimensional Sample Mapping Vectors (SMV). The underlying platform technology uses SOMAmers (Slow Off-rate Modified Aptamers) as affinity reagents to quantify approximately 850 proteins. The SMV score provides a critical evaluation of the quality of every blood sample used in discovery and also enables the evaluation of candidate protein biomarkers for resistance to preanalytic variability.

Anna Panchenko, PhD
Associate Investigator
National Center for Biotechnology Information, NLM, NIH
Maryland - USA

CV: www.ncbi.nlm.nih.gov/CBBresearch/Panchenko/

Title: Deciphering of Human Protein Interactome using Structural Complexes

Abstract: Proteins function by interacting with other biomolecules, and knowledge of the entire set of interactions, combined with the properties of protein binding sites, is essential for our understanding of cellular functions and the origins of many diseases. Recently we developed a method (IBIS) that analyzes and predicts interaction partners and locations of binding sites in proteins based on the evolutionary conservation of binding sites in homologous structural complexes. IBIS imposes a number of rigorous criteria in order to increase the reliability of homology-based inference of interactions and provides binding site annotations for five different types of interaction partners (proteins, small molecules, nucleic acids, peptides and ions). It facilitates the mapping of the entire biomolecular interaction network for a given organism, and we use this framework to map the human protein interactome and analyze its properties. We show that the structurally inferred interaction network is highly modular and has small-world characteristics. Moreover, it is more functionally coherent and reliable than high-throughput interaction networks. Since the structurally inferred interaction network provides the details of binding interfaces, we analyze the effect of cancer-associated point mutations on protein-protein binding. We show that cancer-related mutations can either destabilize the complex or make it more stable, leading to excessive activation or inactivation.
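
The two quantities usually meant by "small-world characteristics" are a high average clustering coefficient together with a short average path length. The sketch below computes both for an undirected graph stored as adjacency sets. It is a generic illustration on a toy ring lattice, not the analysis performed on the human interactome, and the helper names are assumptions.

```python
from collections import deque


def ring_lattice(n, k_half):
    """Undirected ring of n nodes, each linked to k_half neighbours on either side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k_half + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def average_clustering(adj):
    """Mean over nodes of the fraction of neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs (BFS from each node)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


ring = ring_lattice(12, 2)
ring_c = average_clustering(ring)
ring_l = average_path_length(ring)

# Add one long-range shortcut and recompute the path length.
shortcut = ring_lattice(12, 2)
shortcut[0].add(6)
shortcut[6].add(0)
shortcut_l = average_path_length(shortcut)

print(f"ring: C={ring_c:.3f} L={ring_l:.3f}; with one shortcut: L={shortcut_l:.3f}")
```

A network is typically called small-world when its clustering stays close to that of a regular lattice while its path length approaches that of a random graph with the same number of nodes and edges.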

Cellular regulatory mechanisms provide a sensitive and specific response to external stimuli, and such dynamic regulation can be achieved through reversible covalent modifications. We study the effect of phosphorylation on protein binding and function for different types of complexes from the human proteome. Our analysis of the molecular mechanisms of phosphorylation shows that phosphorylation may modulate binding affinity and trigger transitions between different conformer and oligomeric states. We also show that phosphorylation sites are not only more likely to be evolutionarily conserved than surface residues, but are even more conserved than the binding interface.

Gabriele Scheler, PhD
Mountain View, California - USA

CV: www.theoretical-biology.org/people/scheler.html

Title: Transfer Function Analysis of Signal Transduction - The PSF System

Abstract: We present a new approach towards a modular and systematic analysis of biochemical reaction models using a modified steady-state assumption.

An ordinary differential equation (ODE) system for both complex formation and enzymatic reactions is automatically transformed into a set of signal-response transfer functions, called protein signaling functions (PSFs). Elementary PSFs represent individual biochemical reactions out of context; systemic PSFs are the transformations of the elementary PSFs in the context of a system of equations.
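
To make the notion of a signal-response transfer function concrete, the sketch below gives one possible elementary transfer function for a single complex-formation reaction A + B <-> AB, taken out of context. It uses the ordinary textbook steady state with dissociation constant Kd = k_off / k_on. The modified steady-state assumption of the PSF approach is not specified in this abstract, so this is an assumed stand-in rather than the authors' formulation, and the function name and parameter values are hypothetical.

```python
import math


def complex_formation_response(a_total, b_total, kd):
    """Steady-state [AB] for A + B <-> AB, given total A (the signal), total B, and Kd.

    Solves [A_free] * [B_free] = Kd * [AB] with mass conservation
    (A_free = a_total - AB, B_free = b_total - AB), keeping the root that
    satisfies 0 <= AB <= min(a_total, b_total).
    """
    s = a_total + b_total + kd
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0


# Response curve: complex concentration as a function of the input signal,
# for an assumed target concentration of 1.0 and Kd of 0.5 (arbitrary units).
signals = [0.0, 0.5, 1.0, 2.0, 4.0]
responses = [complex_formation_response(a, 1.0, 0.5) for a in signals]
for a, r in zip(signals, responses):
    print(f"signal {a:4.1f} -> response {r:.3f}")
```

The resulting curve rises monotonically with the signal and saturates at the total concentration of the binding partner, which is the kind of input-output relation that can be compared across models once each reaction is expressed this way.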

The use of systemic PSFs reduces the complexity of biological signal systems to manageable chunks which allow modular parameter adjustment.

The poster uses two published moderate-sized ODE models on striatal neural plasticity to present the analysis. The models use a different selection of proteins and interactions, but aim to model the same biological system. They are derived from essentially the same experimental data by standard methods of parameter tuning.

Re-analysis of the ODEs as PSF systems allows shared components of the systems, such as the centrally important cAMP-PKA-DARPP32 connection, to be compared directly. The results show that individual pathway components have become tuned to radically different quantitative transfer functions and concentration ranges because of the different system embeddings. By tuning transfer functions directly in the PSF system, current methods of system construction can be improved and cross-model consistency becomes achievable.