### Paper Presentation Schedule

All Highlights and Proceedings Track presentations are presented by scientific area as part of the combined Paper Presentation schedule.

PP01 (PT) - Simple Topological Properties Predict Functional Misannotations in a Metabolic Network
Date: Sunday, July 21, 10:30 a.m. - 10:55 a.m.
Room: Hall 4/5
Presenting author: John Pinney, Imperial College London, United Kingdom

Rodrigo Liberal, Imperial College London, United Kingdom

Session Chair: Erik Bongcam-Rudloff

Presentation Overview:

Motivation: Misannotation in sequence databases is an important obstacle for automated tools for gene function annotation, which rely extensively on comparison to sequences with known function. To improve current annotations and prevent future propagation of errors, sequence-independent tools are therefore needed to assist in the identification of misannotated gene products. In the case of enzymatic functions, each functional assignment implies the existence of a reaction within the organism’s metabolic network; a first approximation to a genome-scale metabolic model can be obtained directly from an automated genome annotation. Any obvious problems in the network, such as dead-end or disconnected reactions, can therefore be strong indications of misannotation.

Results: We demonstrate that a machine learning approach using only network topological features can successfully predict the validity of enzyme annotations. The predictions are tested at three different levels. A random forest using topological features of the metabolic network and trained on curated sets of correct and incorrect enzyme assignments was found to have an accuracy of up to 86% in 5-fold cross validation experiments. Further cross validation against unseen enzyme superfamilies indicates that this classifier can successfully extrapolate beyond the classes of enzyme present in the training data. The random forest model was applied to several automated genome annotations, achieving an accuracy of 60% in most cases when validated against recent genome-scale metabolic models. We also observe that, when the model is applied to draft metabolic networks for multiple species, predicted annotation quality shows a clear negative correlation with phylogenetic distance to the major model organism for biochemistry (Escherichia coli for prokaryotes and Homo sapiens for eukaryotes).

Contact: j.pinney@imperial.ac.uk

Keyword: Metabolic networks, Network topology, Enzyme function
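
As an illustration of the kind of topological signal the PP01 abstract refers to, the sketch below flags dead-end metabolites in a toy reaction list. It is a minimal example under stated assumptions, not the authors' feature set or classifier: the function name `dead_end_metabolites`, the toy reactions, and the treatment of every reaction as irreversible are all hypothetical.

```python
from collections import defaultdict


def dead_end_metabolites(reactions):
    """Return metabolites that are only produced or only consumed.

    `reactions` maps a reaction name to a (substrates, products) pair.
    Assumption for this sketch: every reaction is irreversible and there
    are no transport or exchange reactions.
    """
    produced_by = defaultdict(set)
    consumed_by = defaultdict(set)
    for name, (substrates, products) in reactions.items():
        for metabolite in substrates:
            consumed_by[metabolite].add(name)
        for metabolite in products:
            produced_by[metabolite].add(name)

    metabolites = set(produced_by) | set(consumed_by)
    # A metabolite is a dead end if no reaction makes it or none uses it.
    return {
        m for m in metabolites
        if not produced_by[m] or not consumed_by[m]
    }


# Hypothetical toy network: a closed cycle A -> B -> C -> A,
# plus an isolated reaction X -> Y that connects to nothing else.
toy_network = {
    "r1": (["A"], ["B"]),
    "r2": (["B"], ["C"]),
    "r3": (["C"], ["A"]),
    "r4": (["X"], ["Y"]),
}

dead_ends = dead_end_metabolites(toy_network)
print(sorted(dead_ends))
```

In this toy case only the metabolites of the isolated reaction are flagged, which is the sort of network inconsistency the abstract describes as a possible indication of misannotation.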


PP02 (HT) - Heart Attacks: Leveraging A Cardiovascular Systems Biology Strategy To Predict Future Outcomes
Date: Sunday, July 21, 10:30 a.m. - 10:55 a.m.
Room: Hall 7
Presenting author: Carlo Vittorio Cannistraci, King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Timothy Ravasi, King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Enrico Ammirati, San Raffaele Scientific Institute, Vita-Salute San Raffaele University, Italy

Presentation Overview:

Inflammation is likely involved in ST-elevation acute myocardial infarction (STEMI), and patients with STEMI can present with high levels of circulating interleukin-6 (IL6) at the onset of symptoms. We used machine learning techniques to identify characteristic inflammatory cytokine patterns in the blood of emergency-room patients with STEMI, and observed two functional modules characterizing the reciprocal behaviours of the cytokines in patients with high IL6 levels. Next, exploiting reverse engineering techniques, we inferred which cytokines were crucial inside the respective modules. Combining them together with IL6 in a unique formula yielded a risk-index – a kind of composed-biomarker – that outperformed any single cytokine and classical prognostic factors in the prediction of cardiac dysfunction at discharge and death at six months.
Our methodology was considered a translational research innovation for the definition of composed-inflammatory-markers in cardiology, while our findings have potential implications for risk-oriented patient stratification and design of immune-modulating therapies.

Keyword: Applied Bioinformatics


PP03 (HT) - Computational identification of a transiently open L1/S3 pocket for reactivation of mutant p53
Date: Sunday, July 21, 10:30 a.m. - 10:55 a.m.
Room: Hall 14.2
Presenting author: Richard Lathrop, University of California, Irvine, United States

Christopher Wassman, Google Inc., United States
Roberta Baronio, University of California, Irvine, United States
Özlem Demir, University of California, San Diego, United States
Brad Wallentine, University of California, Irvine, United States
Chiung-Kuang Chen, University of California, Irvine, United States
Linda Hall, University of California, Irvine, United States
Faezeh Salehi, University of California, Irvine, United States
Da-Wei Lin, University of California, Irvine, United States
Benjamin Chung, University of California, Irvine, United States
Wesley Hatfield, University of California, Irvine, United States
Richard Chamberlin, University of California, Irvine, United States
Hartmut Luecke, University of California, Irvine, United States
Peter Kaiser, University of California, Irvine, United States
Rommie Amaro, University of California, San Diego, United States

Session Chair: Russell Schwartz

Presentation Overview:

The tumour suppressor p53 is the most frequently mutated gene in human cancer. Reactivation of mutant p53 by small molecules is an exciting potential cancer therapy. Although several compounds restore wild-type function to mutant p53, their binding sites and mechanisms of action are elusive. Here computational methods identify a transiently open binding pocket between loop L1 and sheet S3 of the p53 core domain. Mutation of residue Cys124, located at the centre of the pocket, abolishes p53 reactivation of mutant R175H by PRIMA-1, a known reactivation compound. Ensemble-based virtual screening against this newly revealed pocket selects stictic acid as a potential p53 reactivation compound. In human osteosarcoma cells, stictic acid exhibits dose-dependent reactivation of p21 expression for mutant R175H more strongly than does PRIMA-1. These results indicate the L1/S3 pocket as a target for pharmaceutical reactivation of p53 mutants.

Keyword: Applied Bioinformatics, Disease Models & Epidemiology


PP04 (PT) - Stability selection for regression-based models of transcription factor-DNA binding specificity
Date: Sunday, July 21, 11:00 a.m. - 11:25 a.m.
Room: Hall 4/5
Presenting author: Fantine Mordelet, Duke University, United States

John Horton, Duke University, United States
Alexander Hartemink, Duke University, United States
Barbara Engelhardt, Duke University, United States
Raluca Gordan, Duke University, United States

Session Chair: Erik Bongcam-Rudloff

Presentation Overview:

Motivation: The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix (PWM) model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, more complex models of binding specificity have been developed. However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret.

Results: We propose novel regression-based models of TF-DNA binding specificity, trained using high resolution in vitro data from custom protein binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max, and Mad2) in their native genomic context. These high-throughput, quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF-DNA binding specificity. To further illustrate the accuracy of our regression models, we show that even in the case of paralogous TFs with highly similar PWMs, our new models can distinguish the specificities of individual factors. Thus, our work represents an important step towards better sequence-based models of individual TF-DNA binding specificity.

Availability: Our code is available at http://genome.duke.edu/labs/gordan/ISMB2013. The PBM data used in this paper are available in the Gene Expression Omnibus under accession number GSE44604.

Keyword: DNA binding specificity, Regression models, LASSO, Protein binding microarr
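
The independence assumption mentioned in the PP04 abstract can be made concrete with a small sketch of additive PWM scoring. This is only an illustration under stated assumptions: the matrix values, the uniform background of 0.25, and the function name `pwm_log_odds` are hypothetical and are not taken from the paper or its code.

```python
import math


def pwm_log_odds(pwm, site, background=0.25):
    """Score a site as a sum of per-position log-odds terms.

    `pwm` is a list of dicts, one per position, mapping each base to its
    probability. Because the score is a plain sum, each position
    contributes independently of all the others.
    """
    if len(site) != len(pwm):
        raise ValueError("site length must match the PWM length")
    return sum(
        math.log2(pwm[i][base] / background)
        for i, base in enumerate(site)
    )


# Hypothetical 3-position matrix (each row sums to 1).
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.6, "G": 0.1, "T": 0.2},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

# The effect of changing position 2 from C to T is the same whatever
# base sits at position 1: this is the independence assumption.
effect_with_a = pwm_log_odds(toy_pwm, "ACG") - pwm_log_odds(toy_pwm, "ATG")
effect_with_c = pwm_log_odds(toy_pwm, "CCG") - pwm_log_odds(toy_pwm, "CTG")
print(math.isclose(effect_with_a, effect_with_c))
```

A model with di- or trinucleotide features, as the abstract describes, adds terms that depend on two or three bases at once, so the effect of a substitution can vary with its neighbours.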


PP05 (HT) - Of Men and Not Mice: Comparative Genomic Analysis of Human Diseases and Mouse Models
Date: Sunday, July 21, 11:00 a.m. - 11:25 a.m.
Room: Hall 7
Presenting author: Wenzhong Xiao, Massachusetts General Hospital/Harvard Medical School and Stanford University, United States

Presentation Overview:

A cornerstone of modern biomedical research is the use of mouse models to explore basic disease mechanisms, evaluate new therapeutic approaches, and make decisions to carry new drug candidates forward into clinical trials. However, few of these human trials have shown success. Here we systematically compared the genomic response from publicly available datasets of patients with different acute inflammatory diseases and corresponding murine models, and show that, although inflammation from different etiologies results in highly similar genomic responses in humans, the responses in mouse models correlate poorly with the human disease and also with one another. Among genes changed significantly in humans, the murine orthologs are close to random in matching their human counterparts. In addition to improvements in the current animal model systems, our study supports higher priority for translational research to focus on the more complex human conditions rather than relying on mouse models to study human inflammatory diseases.

Keyword: Disease Models & Epidemiology, Evolution & Comparative Genomics


PP06 (HT) - Virtual ligand screening against comparative structural models of membrane transporters
Date: Sunday, July 21, 11:00 a.m. - 11:25 a.m.
Room: Hall 14.2
Presenting author: Avner Schlessinger, Mount Sinai School of Medicine, United States

Ethan Geier, University of California, San Francisco, United States
Hao Fan, University of California, San Francisco, United States
Jonathan Gable, University of California, San Francisco, United States
John Irwin, University of California, San Francisco, United States
Kathleen Giacomini, University of California, San Francisco, United States
Andrej Sali, University of California, San Francisco, United States

Session Chair: Russell Schwartz

Presentation Overview:

We describe a structure-based discovery approach to identify small molecule ligands for pharmacologically important membrane proteins. Here, we focus on LAT-1, a transporter of amino acids, thyroid hormones, and prescription drugs that is highly expressed in the blood-brain barrier (BBB) and various types of cancer. LAT-1 is important for cancer development as well as for mediating drug and nutrient delivery across the BBB, making it a key drug target. We identify four LAT-1 ligands, including one chemically novel substrate, by comparative modeling, virtual screening, and experimental testing. These results may rationalize the enhanced brain permeability of two drug-like molecules, including the anti-cancer agent acivicin. Two of our hits inhibited proliferation of a cancer cell line by distinct molecular mechanisms, providing useful chemical tools to characterize the role of LAT-1 in cancer metabolism. Finally, our integrated approach is generally applicable to characterization of other protein families and their interactions with small molecule ligands.

Keyword: Protein Structure & Function, Applied Bioinformatics


PP07 (PT) - A Graph Kernel Approach for Alignment-Free Domain-Peptide Interaction Prediction with an Application to Human SH3 Domains
Date: Sunday, July 21, 11:30 a.m. - 11:55 a.m.
Room: Hall 4/5
Presenting author: Kousik Kundu, University of Freiburg, Germany

Fabrizio Costa, University of Freiburg, Germany
Rolf Backofen, University of Freiburg, Germany

Session Chair: Erik Bongcam-Rudloff

Presentation Overview:

State-of-the-art experimental data for determining binding specificities of peptide recognition modules (PRMs) are obtained by high-throughput approaches like peptide arrays. Most prediction tools applicable to this kind of data are based on an initial multiple alignment of the peptide ligands. Building an initial alignment can be error-prone, especially in the case of the proline-rich peptides bound by the SH3 domains. Here we present a machine learning approach based on an efficient graph-kernel technique to predict the specificity of a large set of 70 human SH3 domains, which are a very important class of PRMs. The graph-kernel strategy allows us to 1) integrate several types of physico-chemical information for each amino acid, 2) consider high order correlations between these features and 3) eliminate the need for an initial peptide alignment. We build specialized models for each human SH3 domain and achieve competitive predictive performance of 0.73 area under precision-recall curve (AUC PR), compared to 0.27 AUC PR for state-of-the-art methods based on position weight matrices. We show that better models can be obtained when we use information on the non-interacting peptides (negative examples), which is currently not used by the state-of-the-art approaches based on position weight matrices. To this end, we analyze two strategies to identify subsets of high confidence negative data. The techniques introduced here are more general and hence can also be used for any other protein domains which interact with short peptides (i.e., other PRMs).

Keyword: PRM: Protein Recognition module, SVM: Support Vector Machine, PWM: Position Weight


PP08 (HT) - Impact of genetic dynamics and single-cell heterogeneity on development of nonstandard personalized medicine strategies for cancer
Date: Sunday, July 21, 11:30 a.m. - 11:55 a.m.
Room: Hall 7
Presenting author: Chen-Hsiang Yeang, Academia Sinica, Taiwan

Robert Beckman, University of California, San Francisco, United States
Gunter Schemmann, World Water and Solar Technologies, United States

Presentation Overview:

Cancers are heterogeneous and genetically unstable. Current practice of personalized medicine tailors therapy to heterogeneity between cancers of the same organ type. However, it does not yet systematically address heterogeneity within a single individual’s cancer. We developed a mathematical model of personalized cancer therapy incorporating genetic evolutionary dynamics and single-cell heterogeneity, and examined simulated clinical outcomes. Analyses of an illustrative case and a virtual clinical trial of over 3 million evaluable “patients” demonstrate that augmented nonstandard personalized medicine strategies may lead to superior outcomes compared with the current personalized medicine approach. Current personalized medicine generally focuses on the average, static, and current properties of the sample. In contrast, nonstandard strategies also consider minor subclones, dynamics, and predicted future tumor states. Our methods allow systematic study and evaluation of nonstandard personalized medicine strategies. These findings may, in turn, suggest global adjustments and enhancements to translational oncology research paradigms.

Keyword: Applied Bioinformatics, Disease Models & Epidemiology


PP09 (HT) - Extensive changes in DNA methylation are associated with expression of mutant huntingtin
Date: Sunday, July 21, 11:30 a.m. - 11:55 a.m.
Room: Hall 14.2
Presenting author: Christopher Ng, Massachusetts Institute of Technology, United States

Ferah Yildirim, Massachusetts Institute of Technology, United States
Yoon Yap, Massachusetts Institute of Technology, United States
Simona Dalin, Massachusetts Institute of Technology, United States
Bryan Matthews, Massachusetts Institute of Technology, United States
Patricio Velez, Massachusetts Institute of Technology, United States
Ernest Fraenkel, Massachusetts Institute of Technology, United States
David Housman, Massachusetts Institute of Technology, United States

Session Chair: Russell Schwartz

Presentation Overview:

With technological advances, it is becoming increasingly clear that DNA methylation has a role in a wide range of biological processes, including neuronal activity, learning, and memory. In this paper, we explored the hypothesis that DNA methylation is altered in Huntington’s disease and used reduced representation bisulfite sequencing (RRBS) to map sites of DNA methylation in cells carrying either wild-type or mutant huntingtin (HTT). We found that a large fraction of the genes that change in expression in the presence of mutant HTT demonstrate significant changes in DNA methylation. Regions with low CpG content, which have previously been shown to undergo methylation changes in response to neuronal activity, were disproportionately affected. Using motif analysis, we identified transcriptional regulators associated with DNA methylation changes, and we confirmed these hypotheses using genome-wide chromatin immunoprecipitation sequencing (ChIP-Seq). Our findings suggest new mechanisms for the effects of polyglutamine-expanded HTT on DNA methylation and transcriptional dysregulation.

Keyword: Gene Regulation & Transcriptomics, Applied Bioinformatics


PP10 (HT) - Systems-based metatranscriptomic analysis
Date: Sunday, July 21, 12:00 p.m. - 12:25 p.m.
Room: Hall 7
Presenting author: Xuejian Xiong, Hospital for Sick Children, Canada

John Parkinson, Hospital for Sick Children, Canada
Daniel Frank, University of Colorado, United States
Charles Robertson, University of Colorado, United States
Stacy Hung, Hospital for Sick Children, Canada
Janet Markle, Hospital for Sick Children, Canada
Jayne Danska, Hospital for Sick Children, Canada
Philippe Poussier, Sunnybrook Health Sciences Centre Research Institute, Canada
Kathy McCoy, University of Bern, Switzerland
Andrew MacPherson, University of Bern, Switzerland

Presentation Overview:

The emerging science of metagenomics is transforming our understanding of the relationships of microbes with their environments. Moving beyond cataloguing the organisms and genes present, metatranscriptomics offers the exciting prospect of providing a more mechanistic understanding of these relationships. Exploiting metatranscriptomic data from microbiomes of increasing complexity, generated using the Illumina platform, we are developing novel software pipelines to process and interpret these datasets. Key to these analyses is adopting protein-protein interaction and other systems datasets as frameworks onto which metatranscriptomic data may be integrated and interpreted. In this presentation I will outline some of the significant challenges we have encountered in analysing metatranscriptomic data generated by next generation sequencing platforms and discuss how these challenges may be addressed.

Keyword: Sequence Analysis, Applied Bioinformatics


PP11 (HT) - Metabolic phenotypic analysis uncovers reduced proliferation associated with oxidative stress in progressed breast cancer
Date: Sunday, July 21, 12:00 p.m. - 12:25 p.m.
Room: Hall 14.2
Presenting author: Livnat Jerby Arnon, Tel Aviv University, Israel

Lior Wolf, Tel Aviv University, Israel
Carsten Denkert, Charité Hospital, Germany
Gideon Y Stein, Beilinson Hospital, Rabin Medical Center, Israel
Mika Hilvo, VTT Technical Research Centre of Finland, Finland
Matej Oresic, VTT Technical Research Centre of Finland, Finland
Tamar Geiger, Tel Aviv University, Israel
Eytan Ruppin, Tel Aviv University, Israel

Session Chair: Russell Schwartz

Presentation Overview:

The importance of metabolic reprogramming in cancer is being increasingly recognized. However, whole metabolic flux measurements in cancer are still scarce. Hence, we developed a novel Metabolic Phenotypic Analysis (MPA) method that profiles the metabolic phenotype of a tumor based on its gene or protein expression. We applied MPA to conduct the first genome-scale study of breast cancer metabolism based on the gene expression of a large cohort of cell lines and clinical samples. The modeling correctly predicted cell lines' growth rates, tumor lipid levels, and amino acid biomarkers, outperforming other metabolic modeling methods. MPA revealed that tumor proliferation decreases as the tumor evolves metastatic capability. We experimentally validated this "go or grow" dichotomy in vitro, and linked the proliferation decrease to oxidative stress. Finally, we found fundamental metabolic differences between estrogen receptor (ER)+ and ER- tumors. These findings provide new insights into core metabolic aberrations in breast cancer.

Keyword: Applied Bioinformatics, Disease Models & Epidemiology


PP12 (HT) - Mapping the Strategies of Viruses Hijacking Human Host Cells – An Experimental and Computational Comparative Study
Date: Sunday, July 21, 2:10 p.m. - 2:35 p.m.
Room: Hall 4/5
Presenting author: Jacques Colinge, CeMM, Austria

Session Chair: Olga Vitek

Presentation Overview:

It is well known that viral proteins interfere with the innate immune system of the infected host to block detection or prevent response. Is that all viruses do to human cells? Do they share common strategies? In a pan-viral study that used mass spectrometry to map the protein interactions of 70 viral proteins from 30 viruses known to modulate the innate immune system, we tried to answer these questions. In particular, we found that viruses reprogram a broad range of biological functions through interactions with multifunctional general regulators. We proposed that size-limited virus genomes dictate such strategies, which we could support by comparing the functional and human interactome impact of diverse viral proteins, showing non-redundancy within a single genome and convergent evolution within virus families. In recent work, we are focusing on a smaller number of viruses whose host interactions have been mapped for almost all their proteins.

Keyword: Protein Interactions & Molecular Networks, Mass Spectrometry & Proteomics


PP13 (HT) - Ultrashort and progressive 4sU-tagging reveals key characteristics of RNA processing at nucleotide resolution
Date: Sunday, July 21, 2:10 p.m. - 2:35 p.m.
Room: Hall 7
Presenting author: Caroline Friedel, Ludwig-Maximilians-Universität München, Germany

Lukas Windhager, Ludwig-Maximilians-Universität München, Germany
Thomas Bonfert, Ludwig-Maximilians-Universität München, Germany
Kaspar Burger, Helmholtz-Zentrum München, Germany
Zsolt Ruzsics, Ludwig-Maximilians-Universität München, Germany
Stefan Krebs, Ludwig-Maximilians-Universität München, Germany
Stefanie Kaufmann, Ludwig-Maximilians-Universität München, Germany
Georg Malterer, Ludwig-Maximilians-Universität München, Germany
Anne L’Hernault, University of Cambridge, United Kingdom
Markus Schilhabel, Christian-Albrechts-Universität Kiel, Germany
Stefan Schreiber, Christian-Albrechts-Universität Kiel, Germany
Philip Rosenstiel, Christian-Albrechts-Universität Kiel, Germany
Ralf Zimmer, Ludwig-Maximilians-Universität München, Germany
Dirk Eick, Helmholtz-Zentrum München, Germany
Lars Dölken, University of Cambridge, United Kingdom

Session Chair: Ivo Hofacker

Presentation Overview:

Metabolic tagging of newly transcribed RNA by 4-thiouridine (4sU) can reveal the relative contributions of RNA synthesis and decay rates. Recently, we showed that ultra-short 4sU-tagging combined with RNA-seq determines global RNA processing kinetics at nucleotide resolution. This allowed identification of classes of rapidly and slowly spliced/degraded introns characterized by a distinct association with intron length, gene length and splice site strength. For one class of introns, we also observed long lasting retention in the primary transcript, but efficient secondary splicing/degradation at later time points. Finally, we showed that processing of most small nucleolar (sno)RNA-containing introns is remarkably inefficient with the majority of introns being spliced and degraded rather than processed into mature snoRNAs. In summary, our study yielded unparalleled insights into the kinetics of RNA processing and provides the tools to study molecular mechanisms of RNA processing and their contribution to gene expression regulation at the nucleotide level.

Keyword: Gene Regulation & Transcriptomics, other


PP14 (HT) - Identifying differentially expressed transcripts from RNA-seq data with biological variation
Date: Sunday, July 21, 2:10 p.m. - 2:35 p.m.
Room: Hall 14.2
Presenting author: Peter Glaus, University of Manchester, United Kingdom

Antti Honkela, University of Helsinki, Finland
Magnus Rattray, University of Manchester, United Kingdom

Session Chair: Cenk Sahinalp

Presentation Overview:

Analysing RNA-seq data poses multiple challenges due to base mismatches, non-uniform read distribution, reads shared by multiple splice variants and other factors which make the expression analysis especially difficult. The BitSeq method uses a Bayesian approach to model the read generation and sequencing processes and infers expression estimates of individual transcripts. Transcript expression levels can be used to obtain more accurate gene expression estimates, in comparison to popular count based methods, or for identifying differentially expressed transcripts or genes. Our differential expression model combines the uncertainty of the expression estimates with variances estimated from biologically replicated experiments to identify significantly differentially expressed transcripts with improved precision.
We present advantages of using BitSeq in RNA-seq datasets dealing with multi-mapping reads and non-uniform read distribution. Experiments with real and synthetic datasets show that BitSeq produces state-of-the-art results in both expression estimation and differential expression analysis.

Keyword: Gene Regulation & Transcriptomics, Sequence Analysis


PP15 (PT) - Multi-task learning for Host-Pathogen protein interactions
Date: Sunday, July 21, 2:40 p.m. - 3:05 p.m.
Room: Hall 4/5
Presenting author: Meghana Kshirsagar, Carnegie Mellon University, United States

Jaime Carbonell, Carnegie Mellon University, United States
Judith Klein-Seetharaman, University of Pittsburgh School of Medicine, United States

Session Chair: Olga Vitek

Presentation Overview:

Motivation: An important aspect of infectious disease research involves understanding the differences and commonalities in the infection mechanisms underlying various diseases. Systems biology based approaches study infectious diseases by analyzing the interactions between the host species and the pathogen organisms. This work aims to combine the knowledge from experimental studies of host-pathogen interactions in several diseases in order to build stronger predictive models. Our approach is based on a formalism from machine learning called 'multi-task learning', which considers the problem of building models across tasks that are related to each other. A 'task' in our scenario is the set of host-pathogen protein interactions involved in one disease. To integrate interactions from several tasks (i.e., diseases), our method exploits the similarity in the infection process across the diseases. In particular, we use the biological hypothesis that similar pathogens target the same critical biological processes in the host, in defining a common structure across the tasks.

Results: Our current work on host-pathogen protein interaction prediction focuses on human as the host, and four bacterial species as pathogens. The multi-task learning technique we develop uses a task based regularization approach. We find that the resulting optimization problem is a difference of convex (DC) functions. To optimize, we implement a Convex-Concave procedure based algorithm. We compare our integrative approach to baseline methods that build models on a single host-pathogen protein interaction dataset. Our results show that our approach outperforms the baselines on the training data. We further analyse the protein interaction predictions generated by the models, and find some interesting insights.

Keyword: Host-pathogen protein interaction, multi-task learning, machine learning, bacteria hu


PP16 (HT) - Gene expression anti-profiles as a basis for accurate universal cancer signatures
Date: Sunday, July 21, 2:40 p.m. - 3:05 p.m.
Room: Hall 7
Presenting author: Hector Corrada Bravo, University of Maryland, United States

Session Chair: Ivo Hofacker

Presentation Overview:

Gene expression anti-profiles are a new computational approach for developing cancer genomic signatures that specifically take advantage of gene expression heterogeneity. This presentation will describe the biological basis for this method derived from experimental findings suggesting that stochastic across-sample hyper-variability in the expression of specific genes is a stable and general property of cancer. Application of this methodology in screening patients for colon cancer based on expression measurements obtained from peripheral blood samples will be presented. We will also present results from development of a universal cancer anti-profile that accurately distinguishes cancer from normal regardless of tissue type. This method uses single-chip normalization and quality assessment methods so no further retraining of signatures would be required before their application in clinical settings. These results suggest that anti-profiles may be used to develop inexpensive and non-invasive universal cancer screening tests.

Keyword: Applied Bioinformatics, Gene Regulation & Transcriptomics


PP17 (PT) - GeneScissors: a comprehensive approach to detecting and correcting spurious transcriptome inference due to RNAseq reads misalignment
Date: Sunday, July 21, 2:40 p.m. - 3:05 p.m.
Room: Hall 14.2
Presenting author: Wei Wang, UCLA, United States

Shunping Huang, UNC Chapel Hill, United States
Jack Wang, UNC Chapel Hill, United States
Xiang Zhang, Case Western Reserve University, United States
Fernando Pardo Manuel De Villena, UNC Chapel Hill, United States
Leonard McMillan, UNC Chapel Hill, United States
Zhaojun Zhang, UNC Chapel Hill, United States

Session Chair: Cenk Sahinalp

Presentation Overview:

Motivation: RNA-seq techniques provide an unparalleled means for exploring a transcriptome with deep coverage and base pair level resolution. Various analysis tools have been developed to align and assemble RNA-seq data, such as the widely used TopHat/Cufflinks pipeline. A common observation is that a sizable fraction of the fragments/reads align to multiple locations of the genome. These multiple alignments pose substantial challenges to existing RNA-seq analysis tools. Inappropriate treatment may result in reporting spurious expressed genes (false positives), and missing the real expressed genes (false negatives). Such errors impact the subsequent analysis, such as differential expression analysis. In our study, we observe that about 3.5% of transcripts reported by TopHat/Cufflinks pipeline correspond to annotated nonfunctional pseudogenes. Moreover, about 10.0% of reported transcripts are not annotated in the Ensembl database. These genes could be either novel expressed genes or false discoveries. Results: We examine the underlying genomic features that lead to multiple alignments and investigate how they generate systematic errors in RNA-seq analysis. We develop a general tool, GeneScissors, which exploits machine learning techniques guided by biological knowledge to detect and correct spurious transcriptome inference by existing RNA-seq analysis methods. In our simulated study, GeneScissors can predict spurious transcriptome calls due to misalignment with an accuracy close to 90%. It provides substantial improvement over the widely used TopHat/Cufflinks or MapSplice/Cufflinks pipelines in both precision and F-measurement. On real data, GeneScissors reports 53.6% less pseudogenes and 0.97% more expressed and annotated transcripts, when compared with the TopHat/Cufflinks pipeline. In addition, among the 10.0% unannotated transcripts reported by TopHat/Cufflinks, GeneScissors finds that more than 16.3% of them are false positives. 
Availablility: The software can be downloaded at http://csbio.unc.edu/genescissors/
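
For readers less familiar with the metrics named in the abstract, the sketch below shows how precision, recall and the F-measure are commonly computed from counts of true positive, false positive and false negative calls. It is only an illustration: the function name and the example counts are hypothetical and are not taken from GeneScissors.

```python
def precision_recall_f(tp: int, fp: int, fn: int, beta: float = 1.0):
    """Return (precision, recall, F_beta) from raw classification counts.

    tp, fp, fn are counts of true positives, false positives and
    false negatives. beta=1.0 gives the usual balanced F-measure.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta * beta * precision + recall
    f_beta = (1 + beta * beta) * precision * recall / denom if denom else 0.0
    return precision, recall, f_beta


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    p, r, f = precision_recall_f(tp=90, fp=10, fn=30)
    print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```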

Keyword: Pseudogene, RNA-seq, RNA-seq Alignment, RNA-seq Assembling

PP18 (HT) - A Conserved Map of Genetic Interactions Induced by DNA Damage
Cancelled
Date: Sunday, July 21, 3:10 p.m. - 3:35 p.m.
Room: Hall 4/5
Presenting author: Rohith Srivas, University of California, San Diego, United States

Aude Guenole, Leiden University Medical Center, Netherlands
Kees Vreeken, Leiden University Medical Center, Netherlands
Ze Zhong Wang, University of California, San Diego, United States
Shuyi Wang, University of California, San Francisco, United States
Nevan Krogan, University of California, San Francisco, United States
Trey Ideker, University of California, San Diego, United States
Haico van Attikum, Leiden University Medical Center, Netherlands

Session Chair: Olga Vitek

Presentation Overview:

To protect the genome, cells have evolved a diverse set of pathways designed to sense, signal, and repair multiple types of DNA damage. To assess the degree of coordination and crosstalk among these pathways, we systematically mapped changes in the cell’s genetic network across a panel of different DNA-damaging agents, resulting in ~1,800,000 differential measurements. Each agent was associated with a distinct interaction pattern, which, unlike single-mutant phenotypes or gene expression data, has high statistical power to pinpoint the specific repair mechanisms at work. The agent-specific networks revealed roles for the histone acetyltransferase Rtt109 in the mutagenic bypass of DNA lesions and the neddylation machinery in cell cycle regulation and genome stability, while the network induced by multiple agents implicates Irc21, an uncharacterized protein, in checkpoint control and DNA repair. Our multiconditional genetic interaction map provides a unique resource that identifies agent-specific and general DNA damage response pathways.

Keyword: Protein Interactions & Molecular Networks, Disease Models & Epidemiology

PP19 (HT) - Newborn screening for SCID identifies patients with ataxia telangiectasia
Date: Sunday, July 21, 3:10 p.m. - 3:35 p.m.
Room: Hall 7
Presenting author: Steven Brenner, University of California, Berkeley, United States

Jacob Mallott, UCSF, United States
Antonia Kwan, UCSF, United States
Joseph Church, USC, United States
Diana Gonzalez, UCSF, United States
Fred Lorey, Public Health Institute, United States
Ling Tang, UCSF, United States
Rajgopal Srinivasan, Tata Consultancy Services, India
Sadhna Rana, Tata Consultancy Services, India
Uma Sunderam, Tata Consultancy Services, India

Session Chair: Ivo Hofacker

Presentation Overview:

Severe combined immunodeficiency (SCID) is characterized by failure of T lymphocyte development. Newborn screening to identify SCID is now performed in several states. In addition to infants with typical SCID, screening identifies infants with T lymphocytopenia who appear healthy and in whom a SCID diagnosis cannot be confirmed. Deep sequencing was employed to find causes of T lymphocytopenia in such infants. Whole exome sequencing and analysis were performed in infants and their parents. Upon finding deleterious mutations in the ataxia telangiectasia mutated (ATM) gene, we confirmed the diagnosis of ataxia telangiectasia (AT) in two infants. AT is usually not diagnosed until much later in life, after symptoms are manifest. Although there is no current cure for the progressive neurological impairment of AT, early detection permits avoidance of infectious complications, while providing information for families regarding reproductive recurrence risks and increased cancer risks in patients and carriers.

Keyword: Sequence Analysis, Disease Models & Epidemiology

PP20 (PT) - Poly(A) motif prediction using spectral latent features from human DNA sequences
Date: Sunday, July 21, 3:10 p.m. - 3:35 p.m.
Room: Hall 14.2
Presenting author: Bo Xie, Georgia Institute of Technology, United States

Boris Yankovic, King Abdullah University of Science and Technology
Vladimir Bajic, King Abdullah University of Science and Technology
Le Song, Georgia Institute of Technology, United States
Xin Gao, King Abdullah University of Science and Technology

Session Chair: Cenk Sahinalp

Presentation Overview:

Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge. Results: We propose a novel machine learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we employed hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance. We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14,740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false negative rate and false positive rate by 26%, 15% and 35%, respectively. Meanwhile, our method made about 30% fewer erroneous predictions relative to the other string kernels. Furthermore, our method can be used to visualize the importance of oligomers and positions in predicting poly(A) motifs, from which we can observe a number of characteristics in the surrounding regions of true and false motifs that have not been reported before. Availability: http://sfb.kaust.edu.sa/Pages/Software.aspx

Keyword: Poly(A) motif, classification, generative learning, discriminativ

PP21 (HT) - Synthetic lethality between gene defects affecting a single non-essential molecular pathway with reversible steps
Date: Sunday, July 21, 3:40 p.m. - 4:05 p.m.
Room: Hall 4/5
Presenting author: Inna Kuperstein, Institut Curie, France

Andrei Zinovyev, Institut Curie, France
Emmanuel Barillot, Institut Curie, France
Wolf-Dietrich Heyer, University of California, Davis, United States

Session Chair: Olga Vitek

Presentation Overview:

Synthetic lethality (SL) is a framework to decipher molecular pathways and to develop new treatment strategies. The canonical explanation of SL considers two genes functioning in parallel, mutually compensatory pathways (between-pathway SL). We classify all known types of synthetic lethal interactions and propose a novel mechanism of SL in a single pathway. The new within-reversible-pathway SL (wrpSL) involves a pathway with reversible steps and kinetic trapping of a toxic intermediate or of an essential resource. Mathematical modeling recapitulates the possibility of kinetic trapping leading to lethality and reveals the potential contributions of synthetic dosage and positive masking interactions in a single pathway. Experimental data from the homologous recombination DNA repair pathway validate the concept. Analysis of yeast gene interactions and pathways suggests broad applicability of this novel concept in many biological processes. These observations extend the interpretation of synthetic lethality and contribute to pathway reconstruction and the improvement of therapeutic approaches.

Keyword: Protein Interactions & Molecular Networks, Disease Models & Epidemiology

PP22 (HT) - BioJS: An Open Source JavaScript Framework for Biological Data Visualization. Bioinformatics
Date: Sunday, July 21, 3:40 p.m. - 4:05 p.m.
Room: Hall 7
Presenting author: Manuel Corpas, The Genome Analysis Centre, United Kingdom

John Gómez, EBI, United Kingdom
Leyla García, EBI, United Kingdom
Gustavo Salazar, University of Cape Town, South Africa
Jose Villaveces, Max Planck Institute, Germany
Swanand Gore, EBI, United Kingdom
Alexander García, Florida State University, United States
Maria Martín, EBI, United Kingdom
Guillaume Launay, Lyon1 University, France
Rafael Alcántara, EBI, United Kingdom
Noemi Del Toro Ayllón, EBI, United Kingdom
Marine Dumousseau, EBI, United Kingdom
Sandra Orchard, EBI, United Kingdom
Sameer Velankar, EBI, United Kingdom
Henning Hermjakob, EBI, United Kingdom
Chenggong Zong, UCLA, United States
Peipei Ping, UCLA, United States
Rafael Jiménez, EBI, United Kingdom

Session Chair: Ivo Hofacker

Presentation Overview:

This presentation first sets the scene for the problem: dynamic web visualization of bioinformatics data, which depends heavily on JavaScript, has had no coordination of efforts to date. Available applications in JavaScript are difficult to discover, develop, test, maintain, use, customize, extend or combine. BioJS provides a common specification to document, develop and register JavaScript graphical components in bioinformatics. Next, I will briefly talk about how components are developed to comply with our purposely-defined implementation guidelines. The rest of the talk is mostly taken up by a practical demonstration of representative functionalities already available in the BioJS registry. Examples include a) the Sequence component to visualize proteins in FASTA format in a variety of ways, b) the GeneExpressionSummary that links genes to phenotypes, c) the ChEBICompound and d) the InteractionTable. To conclude, I briefly show the portal for the project, how to contribute to this effort and who is involved.

Keyword: Bioimaging & Data Visualization, Databases & Ontologies

PP23 (PT) - Genome-wide identification and predictive modeling of tissue-specific alternative polyadenylation
Date: Sunday, July 21, 3:40 p.m. - 4:05 p.m.
Room: Hall 14.2
Presenting author: Dina Hafez, Duke University, United States

Uwe Ohler, Max Delbrück Center for Molecular Medicine, Germany
Jun Zhu, National Institutes of Health
Ting Ni, National Institutes of Health
Sayan Mukherjee, Duke University, United States

Session Chair: Cenk Sahinalp

Presentation Overview:

Motivation: Pre-mRNA cleavage and polyadenylation is an essential step for 3' end maturation, and subsequent stability and degradation of mRNAs. This process is highly controlled by cis-regulatory elements surrounding the cleavage site (polyA site), which are frequently constrained by sequence content and position. More than 50% of human transcripts have multiple functional polyA sites, and the specific use of alternative polyA sites (APA) results in isoforms with varying 3'UTRs, thus affecting gene regulation. Elucidating the regulatory mechanisms underlying differential polyA preferences in multiple cell types has been hindered both by the lack of suitable data on the precise location of cleavage sites and by the lack of appropriate tests for determining APAs with significant differences across multiple libraries. Results: We applied a tailored paired-end RNA-seq protocol to specifically probe the position of polyA sites in three adult cell types. We specified a linear effects regression model to identify tissue-specific biases indicating regulated alternative polyadenylation; the significance of differences between cell types was assessed by an appropriately designed permutation test. This combination allowed us to identify highly specific subsets of APA events in the individual cell types. Predictive models successfully distinguished constitutive polyA sites from a biologically relevant background (auROC = 99.6%), as well as tissue-specific regulated sets from each other. We found that the main cis-regulatory elements described for polyadenylation are a strong, and highly informative, hallmark for constitutive sites only. Tissue-specific regulated sites were found to contain other regulatory motifs, with the canonical PAS signal being nearly absent at brain-specific polyA sites. Together, our results contribute to the understanding of the diversity of post-transcriptional gene regulation.
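
The abstract does not describe how its permutation test is constructed. As a generic illustration of the idea only, the sketch below runs a two-sample permutation test on the difference of group means. The choice of test statistic, the function name and the example values (imagined per-library usage fractions of one polyA site in two cell types) are all assumptions made here, not details of the authors' method.

```python
import random


def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Group labels are shuffled n_perm times; the p-value is the share of
    shuffles whose absolute mean difference is at least as large as the
    observed one (with the usual +1 correction so p is never zero).
    """
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical usage fractions of one polyA site across libraries.
    tissue_1 = [0.10, 0.12, 0.15, 0.11, 0.14, 0.13, 0.12, 0.16]
    tissue_2 = [0.40, 0.42, 0.38, 0.45, 0.41, 0.39, 0.44, 0.43]
    print(permutation_test(tissue_1, tissue_2, n_perm=2000))
```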

Keyword: Alternative polyadenylation, tissue-specific regulation, RNA-seq, predictive mo

PP24 (HT) - Efficient Computation of Gene Tree Probability based on Coalescent Theory under Incomplete Lineage Sorting
Date: Monday, July 22, 10:30 a.m. - 10:55 a.m.
Room: Hall 4/5
Presenting author: Yufeng Wu, University of Connecticut, United States

Session Chair: Russell Schwartz

Presentation Overview:

Incomplete lineage sorting is a genealogical phenomenon that is caused by the inherent stochasticity of population genealogical processes. With incomplete lineage sorting, gene tree topologies may be different from the species tree topologies and thus may cause difficulty in inferring species phylogeny or population evolutionary history. An established topic in incomplete lineage sorting is computing the probability (called gene tree probability) of a gene tree topology for a given species tree based on coalescent theory. However, until now there has been no practical algorithm for computing the gene tree probability for large trees. In this talk, I will present an algorithm for computing the gene tree probability. This algorithm is much faster than an existing algorithm and can be applied to larger trees. Thus, this new algorithm may be useful in large-scale phylogenetic studies.
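
To make "gene tree probability" concrete, the smallest non-trivial case has a well-known closed form: for the species tree ((A,B),C) with one sampled lineage per species and an internal branch of length t in coalescent units, the gene tree matching the species tree has probability 1 - (2/3)e^(-t) and each of the two discordant topologies has probability (1/3)e^(-t). The sketch below evaluates that textbook formula. It is not the algorithm presented in the talk, which targets much larger trees, and the function name is made up for this example.

```python
import math


def three_taxon_gene_tree_probs(t: float) -> dict:
    """Gene tree topology probabilities for the species tree ((A,B),C).

    t is the internal branch length in coalescent units. Lineages A and B
    fail to coalesce on that branch with probability exp(-t); in that case
    all three lineages enter the root population and each of the three
    topologies is equally likely.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0
    concordant = 1.0 - 2.0 * discordant
    return {
        "((A,B),C)": concordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 3.0):
        print(t, three_taxon_gene_tree_probs(t))
```

A short internal branch pushes all three topologies towards probability 1/3, which is why incomplete lineage sorting is most troublesome for rapid successive speciation events.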

Keyword: Population Genomics, Evolution & Comparative Genomics

PP25 (PT) - Predicting protein contact map using evolutionary and physical constraints by integer programming
Date: Monday, July 22, 10:30 a.m. - 10:55 a.m.
Room: Hall 7
Presenting author: Jinbo Xu, Toyota Technological Institute at Chicago, United States

Zhiyong Wang, Toyota Technological Institute at Chicago

Session Chair: Alex Bateman

Presentation Overview:

Motivation. A protein contact map describes the pairwise spatial and functional relationships of residues in a protein and contains key information for protein 3D structure prediction. Although studied extensively, it remains very challenging to predict a contact map using only sequence information. Most existing methods predict the contact map matrix element-by-element, ignoring correlation among contacts and physical feasibility of the whole contact map. A couple of recent methods predict contact maps by using mutual information (MI) and enforcing a sparsity restraint (i.e., the contact matrix shall be very sparse), but these methods demand a very large number of sequence homologs and the resultant contact map may still be physically infeasible. Results. This paper presents a novel method for contact map prediction, integrating both evolutionary and physical restraints by machine learning and integer linear programming (ILP). The evolutionary restraints are much more informative than MI and the physical restraints specify more concrete relationships among contacts than the sparsity restraint. As such, our method greatly reduces the solution space of the contact map matrix and thus significantly improves prediction accuracy. Experimental results show that our method outperforms currently popular methods no matter how many sequence homologs are available for the protein under consideration.
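
As a reminder of what a contact map is (the object being predicted, not the prediction method), the sketch below builds a binary contact matrix from known 3D coordinates. The 8 Å cutoff is a common convention assumed here rather than a value stated in the abstract, and the function name and toy coordinates are hypothetical.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=1):
    """Binary residue-residue contact matrix from 3D coordinates.

    coords: one (x, y, z) tuple per residue (for example C-alpha or
    C-beta atoms), in Angstroms. Residues i and j are in contact when
    their distance is at most `cutoff` and |i - j| >= min_separation.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1  # contact maps are symmetric
    return cmap


if __name__ == "__main__":
    # Toy chain: four residues on a straight line, 3.8 Angstroms apart.
    chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    for row in contact_map(chain):
        print(row)
```

A predicted map has the same symmetric 0/1 structure, which is what makes it natural to express physical feasibility as constraints over binary variables.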

Keyword: Protein contact map prediction, integer programming, physical constraint, evolutio

PP26 (HT) - Interpreting Personal Transcriptomes: Personalized Mechanism-Scale Profiling Predicts Survival in Oral, Prostate, Lung and Gastric Cancers
Date: Monday, July 22, 10:30 a.m. - 10:55 a.m.
Room: Hall 14.2
Presenting author: Yves Lussier, The University of Illinois, United States

Xinan Yang, The University of Chicago, United States
Kelly Regan, Ohio State University, United States
Yong Huang, The University of Chicago, United States
Jianrong Li, The University of Illinois at Chicago, United States
Ezra Cohen, The University of Chicago, United States
Tanguy Seiwert, The University of Chicago, United States

Session Chair: Serafim Batzoglou

Presentation Overview:

Gene expression signatures that are predictive of therapeutic response or prognosis are increasingly useful in clinical care; however, mechanistic interpretation of expression arrays remains an unmet challenge. We developed a novel approach to generate “personal mechanism signatures” of molecular pathways and functions from gene expression arrays. FAIME, the Functional-Analysis-of-Individual-Microarray-Expression, computes mechanism scores using rank-weighted gene expression of an individual sample. In oral squamous cell carcinoma samples, the overlap of “Oncogenic Mechanisms of OSCC” (deregulated FAIME-derived scores of pathways and biological functions) accurately discriminates clinical samples in two additional datasets (n=35;91, F-accuracy=100%;97%, p<0.001), and predicts patients’ survival in two studies (p=0.0018;p=0.032). Previous approaches depending on group assignment of individual samples before selecting features or learning a classifier are limited by design to discrete-class prediction. FAIME is more amenable to clinical deployment since it translates the gene-level measurements of each given sample into pathway and molecular function profiles that can be applied to analyze continuous phenotypes (e.g., survival time).
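
The abstract says only that mechanism scores are computed from rank-weighted gene expression within a single sample. The sketch below shows one way such a per-sample score could look: genes are ranked by expression, given a weight that decays with rank, and a gene set is scored by the mean weight of its members minus the mean weight of all other genes. The exponential weighting, the inside-minus-outside contrast, the function name and the gene labels are all assumptions made for illustration and should not be read as the published FAIME formula.

```python
import math


def mechanism_score(expression: dict, gene_set: set) -> float:
    """Rank-weighted score of one gene set in one sample (illustrative).

    expression: gene -> expression value for a single sample.
    gene_set: genes annotated to the pathway or function of interest.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    n = len(ranked)
    # Assumed weighting: the top-ranked gene gets weight 1, lower ranks decay.
    weight = {gene: math.exp(-rank / n) for rank, gene in enumerate(ranked)}
    inside = [weight[g] for g in ranked if g in gene_set]
    outside = [weight[g] for g in ranked if g not in gene_set]
    if not inside or not outside:
        raise ValueError("gene set must overlap the sample but not cover it")
    return sum(inside) / len(inside) - sum(outside) / len(outside)


if __name__ == "__main__":
    # Hypothetical single-sample expression values.
    sample = {"g1": 10.0, "g2": 8.0, "g3": 1.0, "g4": 0.5}
    print(mechanism_score(sample, {"g1", "g2"}))  # highly expressed set
    print(mechanism_score(sample, {"g3", "g4"}))  # lowly expressed set
```

Because the score uses only ranks within one sample, it needs no reference cohort or group labels, which matches the abstract's point about applicability to continuous phenotypes.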

Keyword: Applied Bioinformatics, Databases & Ontologies

PP27 (HT) - Deconvolution of targeted protein-protein interaction maps
Date: Monday, July 22, 10:30 a.m. - 10:55 a.m.
Room: ICC Lounge 81
Presenting author: Alexey Stukalov, CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Austria

Session Chair: Hagit Shatkay

Presentation Overview:

Guided by current knowledge on the modular structure of protein complexes, we propose BI-MAP, a novel statistical approach to analyze targeted medium-scale affinity purification-mass spectrometry (AP-MS) datasets. It allows confident identification of protein modules, i.e. groups of proteins in strong interaction that are shared by multiple complexes. We show that BI-MAP can be applied to datasets ranging from small and very detailed maps to large, sparse, and much noisier ones. In the latter case, the analysis of the inferred posterior distribution helps identify robust components that frequently recur in the most probable data models. Detailed performance analysis shows that BI-MAP clearly outperforms alternative algorithms addressing the same problem. A new graphical grammar representing the inferred modules and their interactions provides a convenient visual representation of the very complex underlying data that facilitates data interpretation by biologists. BI-MAP is open source with exports to R, Cytoscape and GraphML.

Keyword: Protein Interactions & Molecular Networks, Mass Spectrometry & Proteomics

PP28 (PT) - IBD-Groupon: An Efficient Method for Detecting Group-wise Identity-by-Descent regions simultaneously in Multiple Individuals based on Pairwise IBD relationships
Date: Monday, July 22, 11:00 a.m. - 11:25 a.m.
Room: Hall 4/5
Presenting author: Dan He, IBM T.J. Watson, United States

Session Chair: Russell Schwartz

Presentation Overview:

Detecting Identity-by-Descent (IBD) is a very important problem in genetics. Most of the existing methods focus on detecting pairwise IBDs, which have relatively low power to detect short IBDs. Methods to detect IBDs among multiple individuals simultaneously, or group-wise IBDs, have better performance for short IBD detection. Moreover, group-wise IBDs can be applied to a wide range of applications such as disease mapping and pedigree reconstruction. The existing group-wise IBD detection method is computationally inefficient and is only able to handle small data sets, such as 20 or 30 individuals with hundreds of SNPs. It also requires prior specification of the number of IBD groups, which may not be realistic in many cases. The method can only handle a small number of IBD groups, such as two or three, due to scalability issues. Furthermore, it does not take linkage disequilibrium (LD) into consideration. In this work, we developed a very efficient method, IBD-Groupon, which detects group-wise IBDs based on pairwise IBD relationships and is able to address all the drawbacks mentioned above. To our knowledge, our method is the first group-wise IBD detection method that is scalable to very large data sets, for example, hundreds of individuals with thousands of SNPs, while remaining powerful enough to detect short IBDs. Our method does not need the number of IBD groups to be specified, as it is detected automatically. Our method also takes LD into consideration, as it is based on pairwise IBDs where LD can be easily incorporated.

Keyword: Identity-by-Descent, HMM, MCMC, group-wise IBD

PP29 (PT) - ThreaDom: Extracting Protein Domain Boundary Information from Multiple Threading Alignments
Date: Monday, July 22, 11:00 a.m. - 11:25 a.m.
Room: Hall 7
Presenting author: Zhidong Xue, University of Michigan, United States

Dong Xu, University of Michigan
Yan Wang, University of Michigan
Yang Zhang, University of Michigan

Session Chair: Alex Bateman

Presentation Overview:

Motivation: Protein domains are subunits that can fold and function independently. Identification of domain boundary locations is often the first step in protein folding and function annotations. Most of the current methods deduce domain boundaries by sequence-based analysis where accuracy is low. There is no efficient method for predicting discontinuous domains that consist of segments from separated sequences. Since template-based methods are most efficient for protein 3D structure modeling, combining multiple threading alignment information should increase the accuracy and reliability of computational domain predictions. Results: We develop a new domain predictor, ThreaDom, which deduces protein domain boundary locations based on multiple threading alignments. The core of the method development is the derivation of a domain conservation score that combines composite information from template domain structures and terminal and internal alignment gaps. Tested on 630 non-redundant sequences, without using homologous templates, ThreaDom generates correct single- and multi-domain classifications in 81% of cases where 78% have the domain linker location assigned within 20 residues. In a second test on 486 proteins with discontinuous domains, ThreaDom achieves an average precision of 84% and a recall of 65% in domain boundary prediction. Finally, ThreaDom was examined on 56 targets from CASP8 and had a domain overlap rate of 73%, 87% and 85% with the target structure for Free Modeling, Hard multiple-domain and discontinuous domain proteins, respectively, which are significantly higher than most of the domain predictors in the CASP8 experiment.
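
The abstract reports boundary predictions "within 20 residues" along with precision and recall. One plausible way to score boundary predictions with such a tolerance is sketched below: a predicted boundary counts as correct if some true boundary lies within the tolerance, and a true boundary counts as recovered if some prediction lies within the tolerance. The exact evaluation protocol used for ThreaDom is not given in the abstract, so this matching rule, the function name and the example positions are assumptions.

```python
def boundary_precision_recall(predicted, actual, tolerance=20):
    """Precision and recall of predicted domain boundary positions.

    predicted, actual: lists of residue indices marking domain boundaries.
    tolerance: maximum allowed distance, in residues, for a match.
    """
    correct_predictions = sum(
        1 for p in predicted if any(abs(p - a) <= tolerance for a in actual)
    )
    recovered_boundaries = sum(
        1 for a in actual if any(abs(p - a) <= tolerance for p in predicted)
    )
    precision = correct_predictions / len(predicted) if predicted else 0.0
    recall = recovered_boundaries / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical boundary positions for a three-domain protein.
    print(boundary_precision_recall([100, 250, 400], [110, 300]))
```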

Keyword: Protein structure prediction, Protein domain prediction, Threading, CASP experim

PP30 (HT) - Compressive Genomics
Date: Monday, July 22, 11:00 a.m. - 11:25 a.m.
Room: Hall 14.2
Presenting author: Michael Baym, Harvard Medical School, United States

Po-Ru Loh, MIT, United States
Bonnie Berger, MIT, United States

Session Chair: Serafim Batzoglou

Presentation Overview:

The past two decades have seen an exponential increase in sequencing capabilities, outstripping advances in computing power. Extracting new insights from the datasets currently being generated will require not only faster computers; it will require smarter algorithms. However, most genomes currently sequenced are highly similar to ones already collected; thus the amount of novel sequence information is growing much more slowly. We show that this redundancy can be exploited by compressing the data so as to allow direct computation on the compressed data. This approach reduces the computational task of operating on many similar genomes to slightly more than that of operating on just one. Moreover, its relative advantage over existing algorithms grows with the accumulation of future genomic data. We demonstrate this compressive architecture by implementing versions of both BLAST and BLAT, and emphasize how compressive genomics, more generally, will enable biologists to keep pace with current data.

Keyword: Sequence Analysis, Databases & Ontologies

PP31 (PT) - Minimum curvilinearity to enhance topological prediction of protein interactions by network embedding
Date: Monday, July 22, 11:00 a.m. - 11:25 a.m.
Room: ICC Lounge 81
Presenting author: Carlo Vittorio Cannistraci, King Abdullah University of Science and Technology, Saudi Arabia

Gregorio Alanis-Lobato, King Abdullah University of Science and Technology
Timothy Ravasi, King Abdullah University of Science and Technology

Session Chair: Hagit Shatkay

Presentation Overview:

Motivation: Most functions within the cell emerge thanks to protein-protein interactions (PPIs), yet their experimental determination is both expensive and time consuming. PPI-networks present significant levels of noise and incompleteness. Prediction of interactions using solely PPI-network-topology (topological prediction) is difficult but essential when biological prior-knowledge is absent or unreliable. Methods: Network-embedding emphasizes relations between network proteins embedded in a low-dimensional space, where protein-pairs closer to each other represent potential candidate interactions to predict. Network denoising, which boosts the prediction performance, is here achieved by minimum-curvilinear-embedding (MCE), combined with the shortest-path (SP) adopted in the reduced space for assigning likelihood scores to candidate interactions. Furthermore, we introduce: (i) a new valid variation of MCE named non-centred-MCE (ncMCE); (ii) two automatic strategies for the selection of the appropriate embedding-dimension; (iii) two new randomised procedures for prediction evaluation. Results: We compared our method against several unsupervised and supervised embedding approaches, and node-neighbourhood techniques. Despite its computational simplicity, ncMCE-SP was the overall leader outperforming the current methods for topological link prediction. Conclusion: Minimum curvilinearity is a valuable nonlinear framework, which we successfully applied in embedding of protein networks for unsupervised prediction of novel PPIs. The rationale is that biological and evolutionary prior-information is imprinted in the nonlinear patterns hidden behind the protein network topology, and can be exploited for prediction of new protein links. The predicted PPIs represent good candidates to test in high-throughput experiments or to exploit in systems biology tools such as those used for network-based inference and prediction of disease-related functional modules.

Keyword: Network topology, network embedding, topological prediction, nonline

PP32 (PT) - Inference of historical migration rates via haplotype sharing
Date: Monday, July 22, 11:30 a.m. - 11:55 a.m.
Room: Hall 4/5
Presenting author: Pier Francesco Palamara, Columbia University, United States

Itsik Pe'er, Columbia University, United States

Session Chair: Russell Schwartz

Presentation Overview:

Pairs of individuals from a study cohort will often share long-range haplotypes identical-by-descent (IBD). Such haplotypes are transmitted from common ancestors that lived tens to hundreds of generations in the past, and can now be efficiently detected in high-resolution genomic datasets, providing a novel source of information in several domains of genetic analysis. Recently, haplotype sharing distributions were studied in the context of demographic inference, and were used to reconstruct recent demographic events in several populations. Here we extend this framework to handle demographic models that contain multiple demes interacting through migration. We extensively test our formalism in several demographic scenarios, and provide a freely available software tool for demographic inference.

Keyword: Population genetics, Demographic inference, Identity by descent, Haplot

PP33 (PT) - Protein Threading Using Context-Specific Alignment Potential
Date: Monday, July 22, 11:30 a.m. - 11:55 a.m.
Room: Hall 7
Presenting author: Sheng Wang, Toyota Technological Institute at Chicago, United States

Jinbo Xu, Toyota Technological Institute at Chicago, United States
Feng Zhao, Toyota Technological Institute at Chicago, United States
Jianzhu Ma, Toyota Technological Institute at Chicago, United States

Session Chair: Alex Bateman

Presentation Overview:

Motivation: Template-based modeling (TBM) including homology modeling and protein threading is the most reliable method for protein 3D structure prediction. However, alignment errors and template selection are still the main bottlenecks for current TBM methods, especially when proteins under consideration are distantly related. Results: We present a novel context-specific alignment potential for protein threading including alignment and template selection. Our alignment potential measures the log odds ratio of one alignment being generated from two related proteins to being generated from two unrelated proteins, by integrating both local and global context-specific information. The local alignment potential quantifies how well one sequence residue can be aligned to one template residue based upon context-specific information of the residues. The global alignment potential quantifies how well two sequence residues can be placed into two template positions at a given distance, again based upon context-specific information. By accounting for correlation among a variety of protein features and making use of context-specific information, our alignment potential is much more sensitive than the widely used context-independent or profile-based scoring function. Experimental results confirm that our method generates significantly better alignments and threading results than the best profile-based methods on several very large benchmarks. Our method works particularly well for distantly-related proteins or proteins with sparse sequence profiles due to the effective integration of context-specific, structure and global information.

Keyword: Protein Threading, Alignment Potential, Protein Pairwise Information

PP34 (PT) - Predicting Drug-Target Interactions Using Restricted Boltzmann Machines
Date: Monday, July 22, 11:30 a.m. - 11:55 a.m.
Room: Hall 14.2
Presenting author: Jianyang Zeng, Tsinghua University, China

Yuhao Wang, Tsinghua University, China

Session Chair: Serafim Batzoglou

Presentation Overview:

Motivation: In silico prediction of drug-target interactions plays an important role towards identifying and developing new uses of existing or abandoned drugs. Network-based approaches have recently become a popular tool for discovering new drug-target interactions. Unfortunately, most of these network-based approaches can only predict binary interactions between drugs and targets, and information about different types of interactions has not been well exploited for drug-target interaction prediction in previous studies. On the other hand, incorporating additional information about drug-target relationships or drug modes of action can improve prediction of drug-target interactions. Furthermore, the predicted types of drug-target interactions can broaden our understanding about the molecular basis of drug action. Results: We propose a first machine learning approach to integrate multiple types of drug-target interactions and predict unknown drug-target relationships or drug modes of action. We cast the new drug-target interaction prediction problem into a two-layer graphical model, called restricted Boltzmann machine (RBM), and apply a practical learning algorithm to train our model and make predictions. Tests on two public databases show that our RBM model can effectively capture the latent features of a drug-target interaction network, and achieve excellent performance on predicting different types of drug-target interactions, with the area under precision-recall curve (AUPR) up to 89.6. In addition, we demonstrate that integrating multiple types of drug-target interactions can significantly outperform other predictions either by simply mixing multiple types of interactions without distinction or using only a single interaction type. Further tests show that our approach can infer a high fraction of novel drug-target interactions that have been validated by known experiments in the literature or other databases. These results indicate that our approach can have highly practical relevance to drug-target interaction prediction and drug repositioning, and hence advance the drug discovery process. Availability: Software and datasets are available upon request.

Keyword: Drug-Target Interaction, Drug Repositioning, Restricted Boltzmann Machine,

PP35 (PT) - Supervised de novo reconstruction of metabolic pathways from metabolome-scale compound sets
Date: Monday, July 22, 11:30 a.m. - 11:55 a.m.
Room: ICC Lounge 81
Presenting author: Masaaki Kotera, Kyoto University, Japan

Yasuo Tabei,
Yoshihiro Yamanishi, Kyushu University, Japan
Toshiaki Tokimatsu, Kyoto University, Japan
Susumu Goto, Kyoto University, Japan

Session Chair: Hagit Shatkay

Presentation Overview:

Motivation: The metabolic pathway is an important biochemical reaction network involving enzymatic reactions among chemical compounds. However, it is assumed that a large number of metabolic pathways remain unknown, and many reactions are still missing even in known pathways. Therefore, the most important challenge in metabolomics is the automated de novo reconstruction of metabolic pathways, which includes the elucidation of previously unknown reactions to bridge the metabolic gaps. Results: In this paper we develop a novel method to reconstruct metabolic pathways from a large compound set in the reaction-filling framework. We define feature vectors representing the chemical transformation patterns of compound-compound pairs in enzymatic reactions using chemical fingerprints. We apply a sparsity-induced classifier to learn what we refer to as “enzymatic-reaction likeness”, i.e., whether or not compound pairs are possibly converted to each other by enzymatic reactions. The originality of our method lies in the search for potential reactions among many compounds at a time, in the extraction of reaction-related chemical transformation patterns, and in the large-scale applicability owing to the computational efficiency. In the results, we demonstrate the usefulness of our proposed method on the de novo reconstruction of 134 metabolic pathways in KEGG. Our comprehensively predicted reaction networks of 15,698 compounds enable us to suggest many potential pathways and to increase research productivity in metabolomics.

Keyword: Metabolic network, de novo metabolic pathway reconstruction, enzymati

PP36 (PT) - Efficient network-guided multi-locus association mapping with graph cuts
Date: Monday, July 22, 12:00 p.m. - 12:25 p.m.
Room: Hall 4/5
Presenting author: Chloé-Agathe Azencott, Max-Planck-Institutes Tübingen, Germany

Dominik Grimm, Max-Planck-Institutes Tübingen, Germany
Mahito Sugiyama, Max-Planck-Institutes Tübingen, Germany
Yoshinobu Kawahara, Osaka University, Japan
Karsten Borgwardt, Max-Planck-Institutes Tübingen, Germany

Session Chair: Russell Schwartz

Presentation Overview:

As an increasing number of genome-wide association studies reveal the limitations of the attempt to explain phenotypic heritability by single genetic loci, there is a recent focus on associating complex phenotypes with sets of genetic loci. While several methods for multi-locus mapping have been proposed, it is often unclear how to relate the detected loci to the growing knowledge about gene pathways and networks. The few methods that take biological pathways or networks into account are either restricted to investigating a limited number of predetermined sets of loci, or do not scale to genome-wide settings. We present SConES, a new efficient method to discover sets of genetic loci that are maximally associated with a phenotype, while being connected in an underlying network. Our approach is based on a minimum cut reformulation of the problem of selecting features under sparsity and connectivity constraints, which can be solved exactly and rapidly. SConES outperforms state-of-the-art competitors in terms of runtime, scales to hundreds of thousands of genetic loci and exhibits higher power in detecting causal SNPs in simulation studies than other methods. On flowering time phenotypes and genotypes from Arabidopsis thaliana, SConES detects loci that enable accurate phenotype prediction and that are supported by the literature.

Keyword: Feature selection, statistical genetics, network biology, graph minin

PP37 (HT) - The role of proteins encoded by chimeric RNAs in eukaryotes
Date: Monday, July 22, 12:00 p.m. - 12:25 p.m.
Room: Hall 7
Presenting author: Milana Frenkel-Morgenstern, Spanish National Cancer Research Centre (CNIO), Spain

Alfonso Valencia, Spanish National Cancer Research Centre (CNIO), Spain

Session Chair: Alex Bateman

Presentation Overview:

Chimeric RNAs of two or more genes are distinct from conventional alternatively spliced isoforms, because they result from the trans-splicing of pre-mRNAs or gene fusion following translocations. Only a limited number of chimeric transcripts and their associated proteins have been characterized; most of these result from chromosomal translocations and are associated with cancers. Therefore, it is important to extend these observations so as to catalog the chimeric transcripts expressed in different types of cancers, and to study the potential functions of their corresponding chimeric proteins, including the alterations they produce in protein-protein interaction networks. Indeed, we have already found evidence that chimeric transcripts are translated into functional chimeric proteins, that they can change the cellular localization of parental proteins, and that they can be identified in cancer patients using specific and unique peptides. Finally, we collected the chimeric transcripts of human, mouse and fly in the ChiTaRS database to study the evolutionary conservation of chimeras.

Keyword: Evolution & Comparative Genomics, Sequence Analysis

PP38 (HT) - Navigating chemical and biological space – in the search of novel pharmaceuticals
Date: Monday, July 22, 12:00 p.m. - 12:25 p.m.
Room: Hall 14.2
Presenting author: Paula Petrone, Hoffmann-La Roche, Switzerland

Ben Simms, Novartis NIBR, United States
Anne Mai Wassermann, Novartis NIBR, United States
Eugen Lounkine, Novartis NIBR, United States
Peter Kutchukian, Novartis NIBR, United States
Paul Selzer, Novartis NIBR, United States
Florian Nigsch, Novartis NIBR, United States
Jeremy Jenkins, Novartis NIBR, United States
Allen Cornett, Novartis NIBR, United States
Zhan Deng, Novartis NIBR, United States
John W Davies, Novartis NIBR, United States

Session Chair: Serafim Batzoglou

Presentation Overview:

Typically, virtual screening of compound libraries is based on the assumption that structurally similar compounds are likely to share similar properties and bind to the same group of proteins. This model often fails due to the rugged nature of the activity landscape. Furthermore, similarity in chemical space cannot explain the activity of compounds against a specific pathway or groups of pathways. Compounds that incur similar phenotypes and yet are structurally diverse are therefore often overlooked in automated searches. Our alternative perspective on virtual screening and library design is based solely on the interactions of compounds with the proteome. Ligands may be quantitatively grouped by the biological closeness of their targets by means of their biological fingerprints. We study similarity and diversity in biological space as necessary ingredients for compounds in screening libraries. We demonstrate here how compound-target interaction networks can be steered to find novel and biologically relevant chemical matter.

Keyword: Protein Interactions & Molecular Networks, other

PP39 (PT) - A framework for scalable parameter estimation of gene circuit models using structural information
Date: Monday, July 22, 12:00 p.m. - 12:25 p.m.
Room: ICC Lounge 81
Presenting author: Xin Gao, King Abdullah University of Science and Technology, Saudi Arabia

Ming Fan, King Abdullah University of Science and Technology, Saudi Arabia
Suojin Wang, Texas A&M University, United States
Hiroyuki Kuwahara, King Abdullah University of Science and Technology, Saudi Arabia

Session Chair: Hagit Shatkay

Presentation Overview:

Motivation: Systematic and scalable parameter estimation is key to constructing complex gene regulatory models and, ultimately, to facilitating an integrative systems biology approach for quantitatively understanding the molecular mechanisms underpinning gene regulation. Results: Here, we report a novel framework for efficient and scalable parameter estimation that focuses specifically on modeling of gene circuits. Exploiting the structure commonly found in gene circuit models, this framework decomposes a system of coupled rate equations into individual ones and efficiently integrates them separately to reconstruct the mean time evolution of the gene products. The accuracy of the parameters is refined by iteratively increasing the accuracy of numerical integration using the model structure. As a case study, we applied our framework to four gene circuit models with complex dynamics based on three synthetic data sets and one time-series microarray data set. We compared our framework to three state-of-the-art parameter estimation methods and found that our approach consistently and efficiently generated higher-quality parameter solutions. While many general-purpose parameter estimation methods have been applied to the modeling of gene circuits, our results suggest that more tailored approaches that employ domain-specific information may be key to reverse-engineering complex biological systems. Availability: Website: http://sfb.kaust.edu.sa/Pages/Software.aspx

Keyword: Parameter estimation, gene circuits, systems biology, synthetic biology

PP40 (PT) - Identifying proteins controlling key disease signaling pathways
Date: Monday, July 22, 2:10 p.m. - 2:35 p.m.
Room: Hall 4/5
Presenting author: Anthony Gitter, Carnegie Mellon University, United States

Ziv Bar-Joseph, Carnegie Mellon University

Session Chair: Reinhard Schneider

Presentation Overview:

Several types of studies, including genome-wide association studies and RNA interference screens, strive to link genes to diseases. Although these approaches have had some success, genetic variants are often only present in a small subset of the population and screens are noisy with low overlap between experiments in different labs. Neither provides a mechanistic model explaining how identified genes impact the disease of interest or the dynamics of the pathways those genes regulate. Such mechanistic models could be used to accurately predict downstream effects of knocking down pathway members and allow comprehensive exploration of the effects of targeting pairs or higher-order combinations of genes. We developed methods to model the activation of signaling and dynamic regulatory networks involved in disease progression. Our model, SDREM, integrates static and time series data to link proteins and the pathways they regulate in these networks. SDREM utilizes prior information about proteins' likelihood of involvement in a disease (e.g. from screens) to improve the quality of the predicted signaling pathways. We used our algorithms to study the human immune response to H1N1 influenza infection. The resulting networks correctly identified many of the known pathways and transcriptional regulators of this disease. Furthermore, they accurately predict RNA interference effects and can be used to infer genetic interactions, greatly improving over other methods suggested for this task. Applying our method to the more pathogenic H5N1 influenza allowed us to identify several strain-specific targets of this infection.

Keyword: Pathway inference, viral infection, RNAi screens, genetic interaction

PP41 (PT) - Automated Cellular Annotation for High Resolution Images of Adult C. elegans
Date: Monday, July 22, 2:10 p.m. - 2:35 p.m.
Room: Hall 7
Presenting author: Sarah Aerni, Stanford University, United States

Xiao Liu, Stanford University, United States
Chuong Do, 23andMe, Inc., United States
Samuel Gross, Stanford University, United States
Andy Nguyen, Stanford University School of Medicine, United States
Stephen Guo, Stanford University, United States
Fuhui Long, Howard Hughes Medical Institute, United States
Hanchuan Peng, Allen Institute for Brain Science, United States
Stuart Kim, Stanford University School of Medicine, United States
Serafim Batzoglou, Stanford University, United States

Session Chair: Stefan Kramer

Presentation Overview:

Motivation: Advances in high-resolution microscopy have recently made possible the analysis of gene expression at the level of individual cells. The fixed lineage of cells in the adult worm C. elegans makes this organism an ideal model for studying complex biological processes like development and aging. However, annotating individual cells in images of adult C. elegans typically requires expertise and significant manual effort. Automation of this task is therefore critical to enabling high-resolution studies of a large number of genes. Results: In this paper, we describe an automated method for annotating a subset of 154 cells (including various muscle, intestinal, and hypodermal cells) in high-resolution images of adult C. elegans. We formulate the task of labeling cells within an image as a combinatorial optimization problem, where the goal is to minimize a scoring function that compares cells in a test input image with cells from a training atlas of manually annotated worms according to various spatial and morphological characteristics. We propose an approach for solving this problem based on reduction to minimum-cost maximum flow and apply a cross-entropy based learning algorithm to tune the weights of our scoring function. We achieve 84% median accuracy across a set of 154 cell labels in this highly variable system. These results demonstrate the feasibility of the automatic annotation of microscopy-based images in adult C. elegans.

Keyword: Bioinformatics, Gene expression, Image analysis, Machine learning

PP42 (PT) - Haplotype assembly in polyploid genomes and identical by descent shared tracts
Date: Monday, July 22, 2:10 p.m. - 2:35 p.m.
Room: Hall 14.2
Presenting author: Derek Aguiar, Brown University, United States

Sorin Istrail, Brown University, United States

Session Chair: Sean O'Donoghue

Presentation Overview:

Motivation: Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms processing these high-throughput sequencing data must scale favorably in terms of both accuracy and computational efficiency. Furthermore, current models and methodologies for haplotype assembly (1) do not jointly consider individuals sharing haplotypes, which reduces the size and accuracy of assembled haplotypes, and (2) are unable to model genomes having more than two sets of homologous chromosomes (polyploidy). In particular, polyploid organisms are becoming the target of many research groups interested in studying the genomics of disease, phylogenetics, botany, and evolution, but there is an absence of theory and methods for polyploid haplotype reconstruction. Results: In this work, we present a number of results, extensions, and generalizations of Compass graphs and our HapCompass framework (Aguiar et al. 2012). We prove the theoretical complexity of two haplotype assembly optimizations, thereby motivating the use of heuristics. We present graph theory-based algorithms for the problem of haplotype assembly from sequencing data using our previously developed HapCompass framework for (1) novel implementations of haplotype assembly optimizations (minimum error correction), (2) assembly of a pair of individuals sharing a tract identical by descent, and (3) assembly of polyploid genomes. We demonstrate the accuracy of each method on the 1000 Genomes Project, Pacific Biosciences, and simulated sequence data. HapCompass is available for download at http://www.brown.edu/Research/Istrail_Lab/

Keyword: Haplotype, haplotype assembly, single individual haplotyping,

PP43 (HT) - Emerging methods in protein co‐evolution
Date: Monday, July 22, 2:10 p.m. - 2:35 p.m.
Room: ICC Lounge 81
Presenting author: David Juan, Spanish National Cancer Research Centre, Spain

Florencio Pazos, Spanish National Centre for Biotechnology, Spain
Alfonso Valencia, Spanish National Cancer Research Centre, Spain

Session Chair: Burkhard Rost

Presentation Overview:

Co‐evolution is an essential component of evolution that helps maintain the structure of ecological and molecular networks while allowing species, proteins and genes to change and adapt over time. A wide range of co‐evolution‐inspired computational methods has been designed for protein modeling, detection of binding sites, deciphering protein mechanisms of action, prediction of protein–protein interaction partners, and reconstruction of protein complexes and interaction networks. Interestingly, recent important breakthroughs in the field have resulted in a remarkably improved capacity to predict interactions between proteins and contacts between different protein residues. While co‐evolution‐based approaches have been developed independently over the last several decades, we propose that unification under a common framework would be a major step forward in the understanding of the molecular basis of co‐evolution.

Keyword: Evolution & Comparative Genomics, Sequence Analysis

PP44 (PT) - Compressive genomics for protein databases
Date: Monday, July 22, 2:40 p.m. - 3:05 p.m.
Room: Hall 4/5
Presenting author: Noah Daniels, Tufts University, United States

Andrew Gallant, Tufts University, United States
Jian Peng, Massachusetts Institute of Technology, United States
Lenore Cowen, Tufts University, United States
Michael Baym, Harvard Medical School
Bonnie Berger, Massachusetts Institute of Technology, United States

Session Chair: Reinhard Schneider

Presentation Overview:

Motivation: The exponential growth of protein sequence databases has increasingly made the fundamental question of searching for homologs a computational bottleneck. The amount of unique data, however, is not growing nearly as fast; we can exploit this fact to greatly accelerate homology search. Acceleration of programs in the popular PSI/DELTA-BLAST family of tools will not only speed up homology search directly, but also the huge collection of other current programs that primarily interact with large protein databases via precisely these tools. Results: We introduce a suite of homology search tools, powered by compressively-accelerated protein BLAST (CaBLASTP), which are significantly faster than and comparably accurate to all known state-of-the-art tools including HHblits, DELTA-BLAST, and PSI-BLAST. Further, our tools are implemented in a manner that allows direct substitution into existing analysis pipelines. The key idea is that we introduce a local similarity-based compression scheme that allows us to operate directly on the compressed data. Importantly, CaBLASTP’s runtime scales almost linearly in the amount of unique data, as opposed to current BLASTP variants which scale linearly in the size of the full protein database being searched. Our compressive algorithms will speed up many tasks such as protein structure prediction and orthology mapping which rely heavily on homology search. Availability: CaBLASTP is available under the GNU Public License at http://cablastp.csail.mit.edu/

Keyword: Sequence Analysis, protein search, BLAST

PP45 (PT) - FuncISH: Learning a functional representation of neural ISH images
Date: Monday, July 22, 2:40 p.m. - 3:05 p.m.
Room: Hall 7
Presenting author: Noa Liscovitch, Bar Ilan University, Israel

Uri Shalit, Hebrew University of Jerusalem, Israel
Gal Chechik, Stanford University, United States

Session Chair: Stefan Kramer

Presentation Overview:

High spatial resolution imaging datasets of mammalian brains have recently become available in unprecedented amounts. Images now reveal highly complex patterns of gene expression varying on multiple scales. The challenge in analyzing these images is both in extracting the patterns that are most relevant functionally, and in providing a meaningful representation that allows neuroscientists to interpret the extracted patterns. Here we present FuncISH – a method to learn functional representations of neural in situ hybridization (ISH) images. We represent images using a histogram of local descriptors (SIFT) in several scales, and use this representation to learn detectors of functional (GO) categories for every image. As a result, each image is represented as a point in a low dimensional space whose axes correspond to meaningful functional annotations. The resulting representations define similarities between ISH images that can be easily explained by functional categories. We applied our method to the genomic set of mouse neural ISH images available at the Allen Brain Atlas, finding that the majority of GO biological processes can be inferred from spatial expression patterns with high accuracy. Using functional representations, we predict several gene interaction properties such as protein-protein interactions and cell type specificity more accurately than competing methods based on global correlations. We used FuncISH to identify similar expression patterns of GABAergic neuronal markers that were not previously identified, and to infer new gene function based on image-image similarities.

Keyword: PCA, censored data, censoring, single-cell qPCR, Gaussi

PP46 (PT) - Using State Machines to Model the IonTorrent Sequencing Process and Improve Read Error-Rates
Date: Monday, July 22, 2:40 p.m. - 3:05 p.m.
Room: Hall 14.2
Presenting author: David Golan, Tel Aviv University, Israel

Paul Medvedev, The Pennsylvania State University, United States

Session Chair: Sean O'Donoghue

Presentation Overview:

Motivation: The importance of fast and affordable DNA sequencing methods for present-day life sciences, medicine and biotechnology is hard to overstate. A major player is IonTorrent, a pyrosequencing-like technology which produces flowgrams – sequences of incorporation values – which are converted into nucleotide sequences by a base-calling algorithm. Because of its exploitation of ubiquitous semiconductor technology and innovation in chemistry, IonTorrent has been gaining popularity since its debut in 2011. Despite the advantages, however, IonTorrent read accuracy remains a significant concern. Results: We present FlowgramFixer, a new algorithm for converting flowgrams into reads. Our key observation is that the incorporation signals of neighboring flows, even after normalization and phase correction, carry considerable mutual information and are important in making the correct base-call. We therefore propose that base-calling of flowgrams should be done on a read-wide level, rather than one flow at a time. We show that this can be done in linear time by combining a state machine with a Viterbi algorithm to find the nucleotide sequence that maximizes the likelihood of the observed flowgram. FlowgramFixer is applicable to any flowgram-based sequencing platform. We demonstrate FlowgramFixer’s superior performance on Ion Torrent E. coli data, with a 4.8% improvement in the number of high-quality mapped reads and a 7.1% improvement in the number of uniquely mappable reads. Availability: Binaries and source code of FlowgramFixer are freely available at: http://www.cs.tau.ac.il/~davidgo5/flowgramfixer.html
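
The idea of decoding a whole read at once with a state machine and a Viterbi pass can be illustrated with a generic Viterbi decoder. This is only a minimal sketch and not FlowgramFixer's actual model: the two states ("0mer", "1mer"), the discretized "low"/"high" signal levels, every probability value, and the function name `viterbi` are hypothetical placeholders chosen for illustration.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state path for a sequence of observations."""
    # best[t][s] = (log-probability of the best path ending in s at step t,
    #               predecessor state on that path, or None at step 0)
    best = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
        for s in states
    }]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path

# Hypothetical toy model: "0mer" = no base incorporated in a flow,
# "1mer" = one base incorporated. All numbers are made up.
STATES = ("0mer", "1mer")
START_P = {"0mer": 0.5, "1mer": 0.5}
TRANS_P = {
    "0mer": {"0mer": 0.7, "1mer": 0.3},
    "1mer": {"0mer": 0.4, "1mer": 0.6},
}
EMIT_P = {
    "0mer": {"low": 0.9, "high": 0.1},
    "1mer": {"low": 0.2, "high": 0.8},
}

print(viterbi(["low", "high", "high", "low"], STATES, START_P, TRANS_P, EMIT_P))
# prints ['0mer', '1mer', '1mer', '0mer']
```

The running time is linear in the number of observations for a fixed set of states, which matches the linear-time claim in general terms; a realistic flowgram model would need continuous incorporation values and a richer state space.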

Keyword: Sequencing, Viterbi, IonTorrent, Flowgram, Base calling

PP47 (HT) - Short Toxin-like Proteins Abound in Cnidaria Genomes
Date: Monday, July 22, 2:40 p.m. - 3:05 p.m.
Room: ICC Lounge 81
Presenting author: Michal Linial, The Hebrew University of Jerusalem, Israel

Isaak Tirosh, The Hebrew University of Jerusalem, Israel
Manor Askenazi, The Hebrew University of Jerusalem, Israel
Itai Linial, The Hebrew University of Jerusalem, Israel

Session Chair: Burkhard Rost

Presentation Overview:

The publication of Tirosh et al. (2012) deals with a neglected niche in functional genomics. The main finding is the identification of short active sequences that escaped detection by classical alignment-based approaches. This research lies at the interface of computational biology and automatic functional annotation schemes.
Cnidaria is a rich phylum that includes thousands of marine species. In this study, we focused on the Nematostella vectensis and Hydra magnipapillata genomes. We present a method for ranking toxin-like candidates. Toxin-like functions were revealed using ClanTox. Among 83,000 proteins from Cnidaria, we found 170 candidates that fulfill the properties of toxin-like proteins. Remarkably, only 11% of the predicted toxin-like proteins were previously classified as toxins. Our prediction methodology inferred functions for protease inhibitors, membrane pore formation, ion channel blockers and metal binding proteins. We conclude that the evolutionary expansion of toxin-like proteins in Cnidaria contributes to their fitness in the complex environment of the aquatic ecosystem.

Keyword: Evolution & Comparative Genomics, Sequence Analysis

PP48 (HT) - Predicting the molecular complexity of sequencing libraries
Date: Monday, July 22, 3:10 p.m. - 3:35 p.m.
Room: Hall 4/5
Presenting author: Andrew Smith, University of Southern California, United States

Session Chair: Reinhard Schneider

Presentation Overview:

Predicting the molecular complexity of a genomic sequencing library has emerged as a critical but difficult problem in modern applications of genome sequencing. Methods to determine how deeply to sequence, or to predict the benefits of additional sequencing, are almost completely lacking. We introduce an empirical Bayesian method to implicitly model any source of bias and accurately characterize the molecular complexity of a DNA sample or library in almost any sequencing application.

Keyword: Applied Bioinformatics, other

PP49 (PT) - Automated annotation of gene expression image sequences via nonparametric factor analysis and conditional random fields
Date: Monday, July 22, 3:10 p.m. - 3:35 p.m.
Room: Hall 7
Presenting author: Iulian Pruteanu-Malinici, Duke University, United States

William Majoros, Duke University, United States
Uwe Ohler, Duke University, United States

Session Chair: Stefan Kramer

Presentation Overview:

Motivation: Computational approaches for the annotation of phenotypes from image data have shown promising results across many applications, and provide rich and valuable information for studying gene function and interactions. While data are often available both at high spatial resolution and across multiple time points, phenotypes are frequently annotated independently, for individual time points only. In particular, for the analysis of developmental gene expression patterns, it is biologically sensible when images across multiple time points are jointly accounted for, such that spatial and temporal dependencies are captured simultaneously. Methods: We describe a discriminative, undirected graphical model to label gene-expression time-series image data, with an efficient training and decoding method based on the junction tree algorithm. The approach is based on an effective feature selection technique, consisting of a nonparametric sparse Bayesian factor analysis model. The result is a flexible framework, which can handle large-scale data with noisy, incomplete samples, i.e. it can tolerate data missing from individual time points. Results: Using the annotation of gene expression patterns across stages of Drosophila embryonic development as an example, we demonstrate that our method achieves superior accuracy, gained by jointly annotating phenotype sequences, when compared to previous models that annotate each stage in isolation. The experimental results on missing data indicate that our joint learning method successfully annotates genes for which no expression data are available for one or more stages.

Keyword: Nonparametric sparse Bayesian factor analysis, undirected graphical models, gene annotation, miss

PP50 (HT) - A complete mass-spectrometric map of the yeast proteome applied to quantitative trait analysis
Date: Monday, July 22, 3:10 p.m. - 3:35 p.m.
Room: Hall 14.2
Presenting author: Mathieu Clément-Ziza, Biotec, Technische Universitaet Dresden, Germany

Paola Picotti, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Henry Lam, The Hong Kong University of Science and Technology, Hong Kong
David Campbell, Institute for Systems Biology, United States
Alexander Schmidt, University of Basel, Switzerland
Eric Deutsch, Institute for Systems Biology, United States
Hannes Röst, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Zhi Sun, Institute for Systems Biology, Seattle, United States
Olivier Rinner, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Lukas Reiter, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Qin Shen, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Jacob Michaelson, Technische Universitaet Dresden, Germany
Andreas Frei, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Simon Alberti, Max Planck Institute of Molecular Cell Biology and Genetics, Germany
Ulrike Kusebauch, Institute for Systems Biology, Seattle, United States
Bernd Wollscheid, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Robert Moritz, Institute for Systems Biology, Seattle, United States
Andreas Beyer, BIOTEC, Technische Universitaet Dresden, Germany
Ruedi Aebersold, Institute of Molecular Systems Biology, ETH Zurich, Switzerland

Session Chair: Sean O'Donoghue

Presentation Overview:

Using a combination of new proteomics methods and novel computational algorithms, we investigated the impact of natural genetic variation on protein concentrations. To accomplish this task we generated an almost complete reference map of the yeast proteome for shotgun and targeted proteomics. We used this map in a series of shotgun and targeted proteomics experiments in a panel of 78 budding yeast strains in order to identify protein-QTL, i.e. genomic regions associated with protein abundance changes. These experiments were informed by computational network analysis. Using a powerful new machine-learning approach, we could identify a surprisingly large fraction of protein-QTL that are in epistasis with each other.
The network-based analysis facilitated the identification of protein modules, whose members are affected by several independent genetic variants in a coordinated way. This suggests that selective pressure favors the acquisition of sets of polymorphisms that adapt protein abundances at the pathway level.

Keyword: Mass Spectrometry & Proteomics, Gene Regulation & Transcriptomics

PP51 (PT) - Predicting protein interactions via parsimonious network history inference
Date: Monday, July 22, 3:10 p.m. - 3:35 p.m.
Room: ICC Lounge 81
Presenting author: Robert Patro, Carnegie Mellon University, United States

Carl Kingsford, Carnegie Mellon University, United States

Session Chair: Burkhard Rost

Presentation Overview:

Motivation: Reconstruction of the network-level evolutionary history of protein-protein interactions provides a principled way to relate interactions in several present-day networks. Here, we present a general framework for inferring such histories and demonstrate how it can be used to determine what interactions existed in the ancestral networks, which present-day interactions we should expect to exist based on evolutionary evidence, and what information extant networks contain about the order of ancestral protein duplications. Results: Our framework characterizes the space of likely parsimonious network histories. It results in a structure that can be used to find probabilities for a number of events associated with the histories. The framework is based on a directed hypergraph formulation of dynamic programming that we extend to enumerate many optimal and near-optimal solutions. The algorithm is applied to reconstructing ancestral interactions among bZIP transcription factors, imputing missing present-day interactions among the bZIPs and among proteins from 5 herpes viruses, and determining relative protein duplication order in the bZIP family. Our approach more accurately reconstructs ancestral interactions compared with existing approaches. In cross-validation tests, we find that our approach ranks the majority of the left-out present-day interactions among the top 2% and 17% of possible edges for the bZIP and herpes networks, respectively, making it a competitive approach for edge imputation. It also estimates, from interaction data alone, relative bZIP protein duplication orders that are significantly correlated with sequence-based estimates. Availability: The algorithm is implemented in C++, is open source, and available at http://www.cs.cmu.edu/~ckingsf/software/parana2. Contact: robp@cs.cmu.edu and carlk@cs.cmu.edu

Keyword: Protein interaction evolution, ancestral network reconstruction, interaction pred


PP52 (HT) - Interpreting genomic data via entropic dissection
Date: Monday, July 22, 3:40 p.m. - 4:05 p.m.
Room: Hall 4/5
Presenting author: Rajeev Azad, University of North Texas, United States

Session Chair: Reinhard Schneider

Keyword: Sequence Analysis, Applied Bioinformatics


PP53 (PT) - A High-Throughput Framework to Detect Synapses in Electron Microscopy Images
Date: Monday, July 22, 3:40 p.m. - 4:05 p.m.
Room: Hall 7
Presenting author: Saket Navlakha, Carnegie Mellon University, United States

Joseph Suhan, Carnegie Mellon University
Alison Barth, Carnegie Mellon University
Ziv Bar-Joseph, Carnegie Mellon University

Session Chair: Stefan Kramer

Presentation Overview:

Motivation: Synaptic connections underlie learning and memory in the brain and are dynamically formed and eliminated during development and in response to stimuli. Quantifying changes in overall density and strength of synapses is an important pre-requisite for studying connectivity and plasticity in these cases or in diseased conditions. Unfortunately, most techniques to detect such changes are either low-throughput (e.g. electrophysiology), prone to error and difficult to automate (e.g. standard electron microscopy), or too coarse (e.g. MRI) to provide accurate and large-scale measurements. Results: To facilitate high-throughput analyses, we used a 50-year-old experimental technique to selectively stain for synapses in electron microscopy (EM) images, and we developed a machine learning framework to automatically detect synapses in these images. To validate our method we experimentally imaged brain tissue of the somatosensory cortex in six mice. We detected thousands of synapses in these images and demonstrate the accuracy of our approach using cross-validation with manually labeled data and by comparing against existing algorithms and against tools that process standard EM images. We also used a semi-supervised algorithm that leverages unlabeled data to overcome sample heterogeneity and improve performance. Our algorithms are highly efficient and scalable and are freely available for others to use.

Keyword: Image processing, Machine learning, Semi-supervised, Synapses, Elect


PP54 (HT) - A probabilistic histone modification map of the human genome and its implications for gene regulation
Date: Monday, July 22, 3:40 p.m. - 4:05 p.m.
Room: Hall 14.2
Presenting author: Misook Ha, Samsung Advanced Institute of Technology, Korea, Rep

Soondo Hong, Samsung Display Corporation, Korea, Rep
Wen-Hsing Li, University of Chicago, United States

Session Chair: Sean O'Donoghue

Presentation Overview:

Histone modifications play an important role in chromatin structure and gene regulation. To understand the relationship between genome sequence and chromatin structure we studied DNA sequences at histone modification sites in various human cell types. We found sequence specificity for histone modifications. Using the sequence specificities of H3 and H3K4me3 nucleosomes, we developed a model that computes the probability of H3K4me3 occupation at each base-pair from the genome sequence context. A comparison of our predictions with in vivo data suggests a high performance of our method. The predicted H3K4me3 sequence signature preferentially occurs at binding sites of transcription regulators involved in chromatin modification activities, including histone acetylases and enhancer- and insulator-associated factors. Clearly, the human genome sequence contains signatures for chromatin modifications essential for gene regulation and development. Our method may be applied to find new regulatory elements functioning by chromatin modifications and disease-causing impaired chromatin structures.

Keyword: Sequence Analysis, Gene Regulation & Transcriptomics


PP55 (HT) - Identification of small RNA pathway genes using patterns of phylogenetic conservation and divergence.
Date: Monday, July 22, 3:40 p.m. - 4:05 p.m.
Room: ICC Lounge 81
Presenting author: Yuval Tabach, Massachusetts General Hospital / Harvard Medical School, United States

Session Chair: Burkhard Rost

Presentation Overview:

Small RNAs such as microRNAs and small interfering RNAs (siRNAs) require protein cofactors to promote their biogenesis and mediate their silencing functions. Even though small RNA pathways are widely distributed among animal, plant, fungal, and protist phyla, these pathways diverge or are lost in particular taxonomic clades. We used phylogenetic conservation patterns to identify new small RNA cofactor genes. We compared 86 divergent eukaryotic genome sequences to discern the sets of genes that show similar phylogenetic profiles with known small RNA cofactor genes. The top predictions from this phylogenetic screen were tested for defects in RNA interference and a large fraction of the candidate genes showed defects as strong as validated small RNA cofactor genes, revealing new components in the pathway. RNA splicing components were the most enriched class of new small RNA cofactors identified, suggesting a deep connection between the mechanism of RNA splicing and small RNA-mediated gene silencing.
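The profile comparison described in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the abstract does not say which similarity measure was used, so Jaccard similarity is an assumption here, and the gene names and six-genome presence/absence profiles are made up for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two presence/absence profiles of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def rank_by_profile(query, candidates):
    """Return candidate names sorted by decreasing similarity to the query profile."""
    scored = [(jaccard(query, profile), name) for name, profile in candidates.items()]
    return [name for score, name in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Hypothetical data: 1 = ortholog present in that genome, 0 = absent.
known_cofactor = (1, 1, 0, 1, 0, 0)
candidates = {
    "geneA": (1, 1, 0, 1, 0, 0),  # lost in the same clades as the known cofactor
    "geneB": (1, 1, 1, 1, 1, 1),  # present everywhere, so less informative
    "geneC": (0, 0, 1, 0, 1, 1),  # opposite pattern
}
ranking = rank_by_profile(known_cofactor, candidates)
print(ranking)
```

Genes whose loss pattern tracks that of a known cofactor rank first, which is the intuition behind using shared conservation and divergence as evidence of shared function.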

Keyword: Evolution & Comparative Genomics, other


PP56 (PT) - IDBA-Tran: A More Robust de novo de Bruijn Graph Assembler for Transcriptomes with Uneven Expression Levels
Date: Tuesday, July 23, 10:30 a.m. - 10:55 a.m.
Room: Hall 4/5
Presenting author: Henry C.M. Leung, The University of Hong Kong

S.M. Yiu, The University of Hong Kong
Xin-Guang Zhu, Shanghai Institutes for Biological Sciences, China
Ming-Zhu Lv, Shanghai Institutes for Biological Sciences, China
Francis Chin, The University of Hong Kong
Yu Peng, The University of Hong Kong, Hong Kong

Session Chair: Debra Goldberg

Presentation Overview:

Motivation: RNA sequencing based on next-generation sequencing technology is an effective approach for analyzing transcriptomes. Similar to de novo genome assembly, de novo transcriptome assembly does not rely on a reference genome or additional annotated information. It is well-known that the transcriptome assembly problem is more difficult. In particular, isoforms can have very uneven expression levels (e.g. 1:100) which make it very difficult to identify low-expressed isoforms. Technically, a core issue is to remove erroneous vertices/edges with high multiplicity (produced by high-expressed isoforms) in the de Bruijn graph without removing those correct ones with not so high multiplicity corresponding to low-expressed isoforms. Failing to do so will result in the loss of low-expressed isoforms or having complicated subgraphs with transcripts of different genes mixed together due to the erroneous vertices and edges. Contributions: Unlike existing tools which usually remove erroneous vertices/edges if their multiplicities are lower than a global threshold, we developed a probabilistic progressive approach with local thresholds to iteratively remove those erroneous vertices/edges. This enables us to decompose the graph into disconnected components, each of which contains a few, if not single, genes, while keeping a lot of correct vertices/edges of low-expressed isoforms. Combined with existing techniques, IDBA-Tran is able to assemble both high-expressed and low-expressed transcripts and outperforms existing assemblers in terms of sensitivity and specificity for both simulated and real data. Availability: http://www.cs.hku.hk/~alse/idba_tran

Keyword: Transcriptome assembling, paired-end reads, isoforms, RNA-Seq, de Bruijn gra


PP57 (HT) - Visual Exploration for Cancer Subtype Analysis
Date: Tuesday, July 23, 10:30 a.m. - 10:55 a.m.
Room: Hall 7
Presenting author: Nils Gehlenborg, Harvard Medical School, United States

Alexander Lex, Harvard University, United States
Marc Streit, Johannes Kepler University Linz, Austria
Hans-Joerg Schulz, University of Rostock, Germany
Christian Partl, Graz University of Technology, Austria
Dieter Schmalstieg, Graz University of Technology, Austria
Peter Park, Harvard Medical School, United States

Session Chair: Thomas Lengauer

Presentation Overview:

This talk will introduce the promises and challenges of identifying and characterizing tumor subtypes in cancer genomics data sets from patient cohorts with hundreds of patients and how our visual exploration system Caleydo StratomeX (http://stratomex.caleydo.org) supports these processes. Heterogeneous data sets including multiple genomic (mRNA, miRNA, RPPA, copy number, gene mutations) and clinical data types can be loaded into the software to efficiently generate and confirm hypotheses about tumor subtypes and their functional and clinical effects.

In order to help analysts to identify promising candidate subtypes, StratomeX has been extended with computational methods to rank stratifications and identify stratifications that provide corroborating evidence for candidate subtypes. This previously unpublished feature as well as a new interactive website with large heterogeneous data sets from The Cancer Genome Atlas (TCGA) will be presented, too.

The talk will demonstrate the utility of StratomeX through a comprehensive case study from TCGA.

Keyword: Bioimaging & Data Visualization, Applied Bioinformatics


PP58 (HT) - Simulating Delta/Notch Signaling in Somitogenesis and Pancreas Development
Date: Tuesday, July 23, 10:30 a.m. - 10:55 a.m.
Room: Hall 14.2
Presenting author: Hendrik Tiedemann, Helmholtz Center Munich, Germany

Elida Schneltzer, Helmholtz Center Munich, Germany
Gerhard Przemeck, Helmholtz Center Munich, Germany
Martin Hrabě De Angelis, Helmholtz Center Munich, Germany

Session Chair: Lonnie Welch

Presentation Overview:

The Delta-Notch signal transduction pathway is involved in numerous processes in embryogenesis and adult organisms. After binding of the Delta or Jagged ligand to the Notch receptors on the membrane of neighboring cells, the cleaved-off intracellular domain of Notch activates genes of the Hey/Hes transcription factor family, which show ultradian expression in somitogenesis and some neural progenitor cells. While in somitogenesis D/N-signaling enforces the synchronization of ultradian oscillators and is important for boundary formation, in neurogenesis it acts by lateral inhibition to give some cells a different developmental fate than their neighbors. Similar processes destine some cells in intestinal crypts, the developing airways of the lung, and the epithelial ducts of the developing pancreas to different fates. With our gene- and cell-based computer model we simulated boundary formation in somitogenesis and islet progenitor cell formation in pancreas and examined which parameters steer the systems toward lateral inhibition or synchronization, respectively.

Keyword: Gene Regulation & Transcriptomics, Protein Interactions & Molecular Networks


PP59 (HT) - From sequence co-evolution to protein (complex) structure prediction
Date: Tuesday, July 23, 10:30 a.m. - 10:55 a.m.
Room: ICC Lounge 81
Presenting author: Martin Weigt, Universite Pierre and Marie Curie, France

Session Chair: Janet Kelso

Presentation Overview:

Biological research has been revolutionized by high-throughput experiments. Unprecedented amounts of large-scale data have to be complemented by computational methods unveiling the information hidden in raw data, to increase our understanding of complex biological processes.

As an example, proteins show a remarkable degree of structural and functional conservation in the course of evolution, despite large sequence divergence. We have developed a statistical-inference approach, Direct Coupling Analysis, to link sequence variability to protein structure. Using sequence alone, we infer directly co-evolving residue pairs, to detect native residue-residue contacts. This information is used to guide tertiary and quaternary structure prediction. As a specific case study, I will discuss the auto-phosphorylation complex of histidine kinases, which are involved in the majority of signal transduction systems in bacteria. Only a multidisciplinary approach integrating statistical genomics, biophysical protein simulation, and mutagenesis experiments allows us to predict and verify the previously unknown active kinase structure.

Keyword: Sequence Analysis, Protein Structure & Function


PP60 (PT) - Short Read Alignment with Populations of Genomes
Date: Tuesday, July 23, 11:00 a.m. - 11:25 a.m.
Room: Hall 4/5
Presenting author: Victoria Popic, Stanford University, United States

Lin Huang, Stanford University, United States
Serafim Batzoglou, Stanford University, United States

Session Chair: Debra Goldberg

Presentation Overview:

The increasing availability of high-throughput sequencing technologies has led to thousands of human genomes having been sequenced in the past years. Efforts such as the 1000 Genomes Project further add to the availability of human genome variation data. However, to date there is no method that can map reads of a newly sequenced human genome to a large collection of genomes. Instead, methods rely on aligning reads to a single reference genome. This leads to inherent biases and lower accuracy. To tackle this problem, a new alignment tool BWBBLE is introduced in this paper. We (1) introduce a new compressed representation of a collection of genomes, which explicitly tackles the genomic variation observed at every position, and (2) design a new alignment algorithm based on the Burrows-Wheeler transform that maps short reads from a newly sequenced genome to an arbitrary collection of 2 or more (up to millions of) genomes with high accuracy and no inherent bias to one specific genome.
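For readers unfamiliar with the Burrows-Wheeler transform mentioned above, the following is a minimal textbook sketch of exact pattern counting by backward search over a single text. It is not BWBBLE and does not model the multi-genome representation the abstract describes; the function names are illustrative, and the quadratic rotation sort is only suitable for toy inputs. It assumes the text does not contain the sentinel character `$`.

```python
def bwt(text):
    """Burrows-Wheeler transform of text, using '$' as the end sentinel."""
    text = text + "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(text, pattern):
    """Count exact occurrences of pattern in text via backward search."""
    last = bwt(text)
    first = sorted(last)
    # first_row[c]: index of the first sorted rotation that starts with c.
    first_row = {}
    for i, ch in enumerate(first):
        first_row.setdefault(ch, i)

    def occ(ch, i):
        # Number of times ch occurs in the first i characters of the BWT.
        return last[:i].count(ch)

    lo, hi = 0, len(last)  # half-open range of matching sorted rotations
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + occ(ch, lo)
        hi = first_row[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


print(bwt("banana"))                       # annb$aa
print(count_occurrences("banana", "ana"))  # 2
```

Production aligners replace the `occ` scan with sampled rank tables and add inexact matching; this sketch only shows how the range of matching suffixes narrows one pattern character at a time.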

Keyword: Short read alignment, genome collection, burrows-wheeler transform


PP61 (HT) - The cBio Portal for Cancer Genomics
Date: Tuesday, July 23, 11:00 a.m. - 11:25 a.m.
Room: Hall 7
Presenting author: Nikolaus Schultz, Memorial Sloan-Kettering Cancer Center, United States

Jianjiong Gao, Memorial Sloan-Kettering Cancer Center, United States
B. Arman Aksoy, Memorial Sloan-Kettering Cancer Center, United States
Benjamin Gross, Memorial Sloan-Kettering Cancer Center, United States
Gideon Dresdner, Memorial Sloan-Kettering Cancer Center, United States
S. Onur Sumer, Memorial Sloan-Kettering Cancer Center, United States
Ethan Cerami, Memorial Sloan-Kettering Cancer Center, United States
Anders Jacobsen, Memorial Sloan-Kettering Cancer Center, United States
Ugur Dogrusoz, Bilkent University, Turkey
Erik Larsson, University of Gothenburg, Sweden
Chris Sander, Memorial Sloan-Kettering Cancer Center, United States

Session Chair: Thomas Lengauer

Presentation Overview:

The cBio Portal for Cancer Genomics (cbioportal.org) provides an integrated and easy-to-use web resource for exploring, visualizing and analyzing multidimensional cancer genomics data. The portal reduces massive molecular profiling data from cancer tissues and cell lines to a readily understandable form as genetic, epigenetic, gene expression and proteomic events. The combination of a convenient query interface and customized data storage enables researchers to interactively explore genetic alterations across samples, genes and pathways and to link these to clinical outcomes, when available. The portal provides graphical summaries of gene-level data from multiple platforms, network visualization and analysis, survival analysis, and patient-centric queries. With its simple yet powerful and flexible interface and programmatic access, the portal makes complex cancer genomics profiles accessible to researchers and clinicians without requiring bioinformatics expertise, thus facilitating biological discoveries.

Keyword: Bioimaging & Data Visualization, Databases & Ontologies


PP62 (HT) - ATARiS: Computational quantification of gene suppression phenotypes from multisample RNAi screens
Date: Tuesday, July 23, 11:00 a.m. - 11:25 a.m.
Room: Hall 14.2
Presenting author: Aviad Tsherniak, Broad Institute of MIT and Harvard, United States

Diane Shao, Broad Institute, United States
William Hahn, Broad Institute, United States
Jill Mesirov, Broad Institute, United States

Session Chair: Lonnie Welch

Presentation Overview:

Genome-scale RNAi libraries enable the systematic interrogation of gene function. However, the interpretation of RNAi screens is complicated by the observation that RNAi reagents designed to suppress the mRNA transcripts of the same gene often produce a spectrum of phenotypic outcomes due to differential on-target gene suppression or perturbation of off-target transcripts. Here we present ATARiS, a computational method that takes advantage of patterns in RNAi data across multiple samples in order to enrich for RNAi reagents whose phenotypic effects relate to suppression of their intended targets. By summarizing only such reagent effects for each gene, ATARiS produces quantitative, gene-level phenotype values, which provide an intuitive measure of the effect of gene suppression in each sample. This method is robust for datasets that contain as few as ten samples and can be used to analyze screens of any number of targeted genes. ATARiS is available at http://broadinstitute.org/ataris

Keyword: Gene Regulation & Transcriptomics


PP63 (HT) - Accurate prediction of peptide-induced dynamical changes within the second PDZ domain of PTP1e
Date: Tuesday, July 23, 11:00 a.m. - 11:25 a.m.
Room: ICC Lounge 81
Presenting author: Elisa Cilia, Université Libre de Bruxelles, Belgium

Tom Lenaerts, Université Libre de Bruxelles, Belgium
Geerten Vuister, University Of Leicester, United Kingdom

Session Chair: Janet Kelso

Presentation Overview:

Experimental NMR relaxation studies have shown that peptide binding induces dynamical changes at the side-chain level throughout the second PDZ domain of PTP1e, identifying as such the residues involved in long-range communication. Even though different computational approaches have identified qualitatively similar subsets of these residues, no quantitative analysis of the accuracy of these predictions has thus far been performed.
We show that our own approach based on Monte-Carlo sampling and information theoretical analysis gives significantly more accurate results than the methods that aimed to tackle the same question earlier. Moreover, a network is inferred that clearly captures the residues involved in the process. We show furthermore that these predictions are consistent within both the human and mouse variants of this domain.
Together, these results improve the understanding of intra-protein communication and allostery in PDZ domains, underlining at the same time the necessity of producing similar data sets for further validation purposes.

Keyword: Protein Structure & Function


PP64 (PT) - Design of Shortest Double-Stranded DNA Sequences Covering All K-mers with Applications to Protein Binding Microarrays and Synthetic Enhancers
Date: Tuesday, July 23, 11:30 a.m. - 11:55 a.m.
Room: Hall 4/5
Presenting author: Yaron Orenstein, Tel-Aviv University, Israel

Ron Shamir, Tel-Aviv University, Israel

Session Chair: Debra Goldberg

Presentation Overview:

Novel technologies can generate large sets of short double-stranded DNA sequences that can be used to measure their regulatory effects. Microarrays can measure in vitro the binding intensity of a protein to thousands of probes. Synthetic enhancer sequences inserted into an organism's genome allow us to measure in vivo the effect of such sequences on the phenotype. In both applications, by using sequence probes that cover all k-mers, a comprehensive picture of the effect of all possible short sequences on gene regulation is obtained. The value of k that can be used in practice is, however, severely limited by cost and space considerations. A key challenge is therefore to cover all k-mers with a minimal number of probes. The standard way to do this uses the de Bruijn sequence of length 4^k. However, since probes are double stranded, when a k-mer is included in a probe, its reverse complement k-mer is accounted for as well. Here we show how to efficiently create a shortest possible sequence with the property that it contains each k-mer or its reverse complement, but not necessarily both. The length of the resulting sequence approaches half that of the de Bruijn sequence as k increases. By reducing the total sequence length, experimental limitations can be overcome; alternatively, additional sequences with redundant k-mers of interest can be added.
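The "roughly half" claim can be checked with a short counting sketch. This does not construct the shortest sequence described in the abstract; it only counts k-mers up to reverse complement, derives a simple lower bound on the length of any linear covering sequence, and provides a checker for the coverage property. All function names are illustrative.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Representative of a k-mer and its reverse complement (the smaller one)."""
    return min(kmer, reverse_complement(kmer))


def canonical_kmers(k):
    """All k-mers over ACGT, with each reverse-complement pair merged."""
    return {canonical("".join(p)) for p in product("ACGT", repeat=k)}


def length_lower_bound(k):
    """A sequence of length L has L - k + 1 windows, each covering one class."""
    return len(canonical_kmers(k)) + k - 1


def covers_all(sequence, k):
    """True if every k-mer or its reverse complement occurs in sequence."""
    seen = {canonical(sequence[i:i + k]) for i in range(len(sequence) - k + 1)}
    return seen == canonical_kmers(k)


for k in (2, 3, 4):
    print(k, 4 ** k, len(canonical_kmers(k)), length_lower_bound(k))
```

For odd k no k-mer equals its own reverse complement, so the number of classes is exactly 4^k / 2; for even k the palindromic k-mers add 4^(k/2) / 2 extra classes, a share that shrinks as k grows.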

Keyword: de Bruijn sequence, de Bruijn graph, protein binding microarray, oligo


PP65 (HT) - Visualizing and Mining Chemical-Biological Space
Date: Tuesday, July 23, 11:30 a.m. - 11:55 a.m.
Room: Hall 7
Presenting author: Stefan Kramer, Johannes Gutenberg University Mainz, Germany

Andreas Karwath, Johannes Gutenberg University Mainz, Germany
Martin Gütlein, University of Freiburg, Germany

Session Chair: Thomas Lengauer

Presentation Overview:

It is generally agreed that a better understanding of chemical space and its bioactive compounds requires a better set of tools for the visualization and the mining of structures and associated activities. In the talk, I will present some progress towards this goal. In the first part, I will present the visualization tool CheS-Mapper (Chemical Space Mapping and Visualization in 3D), which arranges sets of chemical structures in 3D space, such that spatially close structures share more common properties than remote ones. In the second part of the talk, I will present new methods for predicting the bioactivities of compounds. These methods build upon a recently developed clustering scheme that clusters chemical structures by common "scaffolds", i.e., the existence of one large substructure shared by all cluster elements. With the help of such a structural clustering, prediction performance can be improved substantially, in particular on heterogeneous sets of structures.

Keyword: Applied Bioinformatics, Protein Structure & Function


PP66 (PT) - Learning Subgroup-Specific Regulatory Interactions and Regulator Independence with PARADIGM
Date: Tuesday, July 23, 11:30 a.m. - 11:55 a.m.
Room: Hall 14.2
Presenting author: Andrew J. Sedgewick, University of Pittsburgh, United States

Stephen Benz, Five3 Genomics, LLC
Patrick Soon-Shiong, Chan Soon-Shiong Institute for Advanced Health

Session Chair: Lonnie Welch

Presentation Overview:

High-dimensional “-omics” profiling provides a detailed molecular view of individual cancers; however, understanding the mechanisms by which tumors evade cellular defenses requires deep knowledge of the underlying cellular pathways within each cancer sample. We extended the PARADIGM algorithm (Vaske et al., 2010), a pathway analysis method for combining multiple “-omics” data types, to learn the strength and direction of 9139 gene and protein interactions curated from the literature. Using genomic and mRNA expression data from 1936 samples in The Cancer Genome Atlas (TCGA) cohort, we learned interactions that provided support for and relative strength of 7138 (78%) of the curated links. Gene set enrichment found that genes involved in the strongest interactions were significantly enriched for transcriptional regulation, apoptosis, cell cycle regulation, and response to tumor cells. Within the TCGA breast cancer cohort we assessed different interaction strengths between breast cancer subtypes, and found interactions associated with the MYC pathway and the ER alpha network to be among the most differential between basal and luminal A subtypes. PARADIGM with the Naive Bayesian assumption produced gene activity predictions that, when clustered, found groups of patients with better separation in survival than both the original version of PARADIGM and a version without the assumption. We found that this Naive Bayes assumption was valid for the vast majority of co-regulators, indicating that most co-regulators act independently on their shared target. Availability: http://paradigm.five3genomics.com

Keyword: Cancer, pathway, gene expression, copy number, probabilist


PP67 (HT) - A large‐scale evaluation of computational protein function prediction
Date: Tuesday, July 23, 11:30 a.m. - 11:55 a.m.
Room: ICC Lounge 81
Presenting author: Predrag Radivojac, Indiana University, United States

Session Chair: Janet Kelso

Presentation Overview:

The presentation will first provide the motivation for and challenges of predicting protein function. This will include both biological significance and precise computational problem formulation. We will then present details (at an appropriate level for a highlight presentation) of the CAFA experiment as described in the paper, discuss the current state of the art in protein function prediction, and lay out possible avenues for improvements and accuracy assessment of computational function prediction. Finally, we intend to briefly discuss the next CAFA challenge, whose start will coincide with the ISMB 2013 conference.

Keyword: Protein Structure & Function


PP68 (HT) - Combinatorial Pooling Enables Selective Sequencing of the Barley Gene Space
Date: Tuesday, July 23, 12:00 p.m. - 12:25 p.m.
Room: Hall 4/5
Presenting author: Denisa Duma, University of California Riverside, United States

Stefano Lonardi, University of California Riverside, United States
Matthew Alpert, University of California Riverside, United States
Gianfranco Ciardo, University of California Riverside, United States
Timothy J. Close, University of California Riverside, United States
Steve Wanamaker, University of California Riverside, United States
Yaqin Ma, University of California Riverside, United States
Ming-Cheng Luo, University of California Davis, United States
Yonghui Wu, University of California Riverside, United States
Francesca Cordero, University of Torino, Italy
Marco Beccuti, University of Torino, Italy
Serdar Bozdag, Marquette University, United States
Prasanna R. Bhat, University of California Riverside, United States
Burair Alsaihati, University of California Riverside, United States
Josh Resnik, University of California Riverside, United States

Session Chair: Debra Goldberg

Presentation Overview:

The problem of obtaining the full genomic sequence of an organism has been solved either via a global brute-force approach (WGS) or by a divide-and-conquer strategy (clone-by-clone). While the advent of NGS instruments made the WGS approach the preferred choice, the clone-by-clone strategy is still relevant, especially for large complex genomes for which clone libraries and physical maps are available. In this paper, we demonstrate the feasibility of the clone-by-clone approach on the gene-space of a large, very repetitive plant genome. The novelty of our approach consists in exploiting the high throughput of NGS instruments by pooling together hundreds of clones using a special type of combinatorial pooling design and a companion decoding algorithm. Our method allows accurate determination of the source clone(s) of each sequenced read. I will present extensive simulations and experimental results on the genomes of rice and barley, as well as new developments on decoding algorithms using Compressive Sensing ideas.

Keyword: Sequence Analysis, Applied Bioinformatics


PP69 (HT) - Designing with the user in mind: how UCD can work for bioinformatics
Date: Tuesday, July 23, 12:00 p.m. - 12:25 p.m.
Room: Hall 7
Presenting author: Jennifer Cham, European Bioinformatics Institute, United Kingdom

Katrina Pavelin, European Bioinformatics Institute, United Kingdom
Paula de Matos, European Bioinformatics Institute, United Kingdom
Cath Brooksbank, European Bioinformatics Institute, United Kingdom
Graham Cameron, European Bioinformatics Institute, United Kingdom
Hong Cao, European Bioinformatics Institute, United Kingdom
Rafael Alcantara, European Bioinformatics Institute, United Kingdom
Francis Rowland, European Bioinformatics Institute, United Kingdom
Brendan Vaughan, European Bioinformatics Institute, United Kingdom
Silvano Squizzato, European Bioinformatics Institute, United Kingdom
Youngmi Park, European Bioinformatics Institute, United Kingdom
Rodrigo Lopez, European Bioinformatics Institute, United Kingdom
Christoph Steinbeck, European Bioinformatics Institute, United Kingdom

Session Chair: Thomas Lengauer

Presentation Overview:

It is recognised that bioinformatics resources often suffer from usability problems: for example, they can be too complex for the infrequent user to navigate, and they can “lack sophistication” compared to other websites that people use in their daily lives. In this presentation, Dr. Jenny Cham, User-Experience Analyst at the European Bioinformatics Institute, UK, will describe specific case studies to show how user-centred design (UCD) principles can be applied to bioinformatics services.

As well as improved usability, the benefits of UCD can include more effective decision-making for design ideas and technologies during development; enhanced team-working and communication; cost effectiveness; and ultimately a bioinformatics service that more closely meets the needs of its target research community.

Keyword: other


PP70 (PT) - Hard-wired heterogeneity in blood stem cells revealed using a dynamic regulatory network model
Date: Tuesday, July 23, 12:00 p.m. - 12:25 p.m.
Room: Hall 14.2
Presenting author: Nicola Bonzanni, VU University Amsterdam, Netherlands

Abhishek Garg, Swiss Institute of Bioinformatics, Switzerland
K. Anton Feenstra, VU University Amsterdam, Netherlands
Judith Schütte, University of Cambridge, United Kingdom
Sarah Kinston, University of Cambridge
Diego Miranda-Saavedra, University of Cambridge
Jaap Heringa, VU University Amsterdam / Netherlands Bioinformatics Centre
Ioannis Xenarios, Swiss Institute of Bioinformatics
Berthold Göttgens, University of Cambridge, United Kingdom

Session Chair: Lonnie Welch

Presentation Overview:

Motivation: Combinatorial interactions of transcription factors with cis-regulatory elements control the dynamic progression through successive cellular states and thus underpin all metazoan development. The construction of network models of cis-regulatory elements therefore has the potential to generate fundamental insights into cellular fate and differentiation. Haematopoiesis has long served as a model system to study mammalian differentiation, yet modelling based on experimentally informed cis-regulatory interactions has so far been restricted to pairs of interacting factors. Here we have generated a Boolean network model based on detailed cis-regulatory functional data connecting 11 haematopoietic stem/progenitor cell (HSPC) regulator genes. Results: Despite its apparent simplicity, the model exhibits surprisingly complex behaviour that we charted using strongly connected components and shortest-path analysis in its Boolean state space. This analysis of our model predicts that HSPCs display heterogeneous expression patterns and possess many intermediate states that can act as ‘stepping stones’ for the HSPC to achieve a final differentiated state. Importantly, an external perturbation or ‘trigger’ is required to exit the stem cell state, with distinct triggers characterising maturation into the various different lineages. By focussing on intermediate states occurring during erythrocyte differentiation, from our model we predicted a novel negative regulation of Fli1 by Gata1 which we confirmed experimentally thus validating our model. In conclusion, we demonstrate that an advanced mammalian regulatory network model based on experimentally validated cis-regulatory interactions has allowed us to make novel, experimentally testable hypotheses about transcriptional mechanisms that control differentiation of mammalian stem cells.
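The kind of state-space analysis described in this abstract can be sketched on a toy example. The two-gene mutual-inhibition network below is hypothetical and is not the 11-gene HSPC model; the asynchronous update scheme (one gene changes per step) is also an assumption made for illustration. The sketch enumerates steady states and computes shortest paths between states with breadth-first search.

```python
from collections import deque
from itertools import product

# Hypothetical toy network: X and Y repress each other.
GENES = ("X", "Y")
RULES = {
    "X": lambda s: not s["Y"],
    "Y": lambda s: not s["X"],
}


def successors(state):
    """States reachable in one asynchronous step (exactly one gene updated)."""
    current = dict(zip(GENES, state))
    result = []
    for i, gene in enumerate(GENES):
        new_value = bool(RULES[gene](current))
        if new_value != state[i]:
            result.append(state[:i] + (new_value,) + state[i + 1:])
    return result


def steady_states():
    """States in which no rule changes any gene."""
    all_states = product((False, True), repeat=len(GENES))
    return [s for s in all_states if not successors(s)]


def shortest_path_length(start, goal):
    """Breadth-first search in the state graph; None if goal is unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        state, dist = queue.popleft()
        if state == goal:
            return dist
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


print(steady_states())
print(shortest_path_length((False, False), (True, False)))
print(shortest_path_length((True, False), (False, True)))
```

In this toy network neither steady state can reach the other through the rules alone, which loosely mirrors the abstract's observation that an external trigger is needed to leave a stable state.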

Keyword: Haematopoietic stem cell, cis-regulatory elements, transcription factor netw


PP71 (HT) - Comparative proteomics reveals a significant bias toward alternative protein isoforms with conserved structure and function
Date: Tuesday, July 23, 12:00 p.m. - 12:25 p.m.
Room: ICC Lounge 81
Presenting author: Michael Liam Tress, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain

Iakes Ezkurdia, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain
Angela del Pozo, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain
Jose Manuel Rodriguez, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain
Alfonso Valencia, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain
Jennifer Harrow, Wellcome Trust Sanger Centre, United Kingdom
Adam Frankish, Wellcome Trust Sanger Centre, United Kingdom
Keith Ashman, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain

Session Chair: Janet Kelso

Presentation Overview:

As part of a comprehensive analysis of experimental spectra from two large publicly available mass spectrometry databases we provide a detailed overview of the population of alternatively spliced protein isoforms detectable by peptide identification methods. We found that 150 genes expressed multiple alternative protein isoforms. This constitutes the largest set of reliably confirmed alternatively spliced proteins yet discovered.

Alternative isoforms generated from interchangeable homologous exons and from short indels were significantly enriched, both in human experiments and parallel analyses of mouse and Drosophila proteomics experiments. Our results show that a surprisingly high proportion (25%) of the detected alternative isoforms are only subtly different from their constitutive counterparts.
The evidence of a strong bias towards subtle differences in coding sequence and likely conserved cellular function and structure is remarkable and strongly suggests that the translation of alternative transcripts may be subject to selective constraints.

Keyword: Mass Spectrometry & Proteomics, Evolution & Comparative Genomics


PP72 (HT) - Genetic variants in the next generation: detection, reprioritizing and function annotation
Date: Tuesday, July 23, 2:10 p.m. - 2:35 p.m.
Room: Hall 4/5
Presenting author: Junwen Wang, The University of Hong Kong, China

Feng Xu, The University of Hong Kong, China
Mulin Li, The University of Hong Kong, China
Weixin Wang, The University of Hong Kong, China
Pak Sham, The University of Hong Kong, China
Panwen Wang, The University of Hong Kong, China

Session Chair: Reinhard Schneider

Presentation Overview:

In this talk, I will first introduce a fast and accurate genetic variants detection (FaSD) program we recently developed for NGS data [1]. We assessed this program and compared its performance with several state-of-the-art programs on normal and cancer NGS data. We found that FaSD is a fast and highly accurate SNP detection method, particularly when the sequence depth is low.

Next, I will introduce GWASdb, a database we manually curated to catalog the genetic variants (GVs) discovered by GWAS and WGS [2]. In addition, we developed GWASrap, a tool that re-prioritizes genetic variants by combining the GWAS statistical value and a variant prioritization score based on the additive effect principle [3]. Our evaluations demonstrated that this prioritization method is very effective in selecting disease susceptibility regions.

In summary, our algorithm, database and tools will greatly facilitate NGS studies and benefit the scientific community in general.

Keyword: Databases & Ontologies, Sequence Analysis


PP73 (HT) - Metagenomic inference and biomarker discovery for the gut microbiome in inflammatory bowel disease
Date: Tuesday, July 23, 2:10 p.m. - 2:35 p.m.
Room: Hall 7
Presenting author: Timothy Tickle, Harvard School of Public Health, United States

Xochitl Morgan, Harvard School of Public Health, United States
Harry Sokol, University of Paris, France
Dirk Gevers, Broad Institute, United States
Kathryn Devaney, Massachusetts General Hospital, United States
Doyle Ward, Broad Institute, United States
Joshua Reyes, Harvard School of Public Health, United States
Samir Shah, Brown University, United States
Neal LeLeiko, Brown University, United States
Scott Snapper, Children's Hospital and Brigham and Women's Hospital, United States
Athos Bousvaros, Children's Hospital and Brigham and Women's Hospital, United States
Joshua Korzenik, Children's Hospital and Brigham and Women's Hospital, United States
Bruce Sands, Mount Sinai School of Medicine, United States
Ramnik Xavier, Massachusetts General Hospital, United States
Curtis Huttenhower, Harvard School of Public Health, United States

Session Chair: Alfonso Valencia

Presentation Overview:

The inflammatory bowel diseases have been consistently linked to dysbiosis in the gut microbiota. This microbial dysfunction has not been fully characterized, however, due to the lack of methods assessing community functional activity and statistically associating it with disease. In this study, "virtual" metagenomes were inferred using 16S rRNA gene sequencing of 231 biopsies and stool samples. This incorporated analysis of 1,119 microbial genomes and was validated by shotgun metagenomics. A multivariate approach linking microbiome shifts to disease, treatment, or environment recovered dysbioses in ~2% of microbial clades, including depletion of Clades IV and XIVa Clostridia and enrichment of Enterobacteriaceae. However, microbial functional activity was more consistently disrupted in disease, with 12% of pathways associated with IBD. These included decreases in short-chain fatty acid production, oxidative stress, and shifts from amino acid biosynthesis towards transport. These results provide initial methods for assessing biomolecular functions corresponding to changes in microbial community ecology.

Keyword: Disease Models & Epidemiology, Applied Bioinformatics


PP74 (HT) - Interplay of microRNAs, transcription factors and target genes: linking dynamic expression changes to function
Date: Tuesday, July 23, 2:10 p.m. - 2:35 p.m.
Room: Hall 14.2
Presenting author: Petr Nazarov, Centre de Recherche Public de la Sante, Luxembourg

Susanne Reinsbach, University of Luxembourg, Luxembourg
Arnaud Muller, Centre de Recherche Public de la Sante, Luxembourg
Nathalie Nicot, Centre de Recherche Public de la Sante, Luxembourg
Demetra Philippidou, University of Luxembourg, Luxembourg
Laurent Vallar, Centre de Recherche Public de la Sante, Luxembourg
Stephanie Kreis, University of Luxembourg, Luxembourg

Session Chair: Ralf Zimmer

Presentation Overview:

MicroRNAs (miRNAs), small non-coding RNAs that negatively regulate gene expression at the post-transcriptional level, are involved in fine-tuning fundamental cellular processes and are believed to confer robustness to biological responses. Using microarray data we investigated simultaneously the transcriptional changes of miRNA and mRNA expression levels over time after activation of the Jak/STAT pathway by IFN-γ stimulation of melanoma cells. We observed delayed responses of miRNAs (after 24-48 h) with respect to mRNAs (12-24 h) and identified biological functions involved at each step of the cellular response. Inference of the upstream regulators allowed for identification of transcriptional regulators involved in cellular reactions to IFN-γ stimulation. Linking expression profiles of transcriptional regulators and miRNAs with their annotated functions, we demonstrate the dynamic interplay of miRNAs and upstream regulators with biological functions. Finally, our data revealed network motifs in the form of feed-forward loops involving transcriptional regulators, mRNAs and miRNAs.
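As a rough illustration of the feed-forward loop motif mentioned above, the sketch below enumerates every triple (x, y, z) in which x regulates y, y regulates z, and x also regulates z directly. The edge list and node names are hypothetical placeholders, not results from the study, and the sketch ignores edge signs and timing.

```python
from itertools import permutations

# Hypothetical directed regulatory edges (regulator, target); placeholder names.
EDGES = {
    ("TF1", "miR-x"),
    ("TF1", "geneA"),
    ("miR-x", "geneA"),
    ("TF2", "geneB"),
}

def feed_forward_loops(edges):
    """Return all (x, y, z) such that x->y, y->z and x->z are all edges."""
    nodes = {node for edge in edges for node in edge}
    return sorted(
        (x, y, z)
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edges and (y, z) in edges and (x, z) in edges
    )

print(feed_forward_loops(EDGES))  # [('TF1', 'miR-x', 'geneA')]
```

Testing all ordered triples is cubic in the number of nodes, which is adequate for a small regulator set but would need an adjacency-based search on a genome-scale network.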

Keyword: Gene Regulation & Transcriptomics


PP75 (HT) - Systematic Computational Drug Repositioning
Date: Tuesday, July 23, 2:10 p.m. - 2:35 p.m.
Room: ICC Lounge 81
Presenting author: Philippe Sanseau, GlaxoSmithKline, United Kingdom

Mark Hurle, GlaxoSmithKline, United States
Lon Cardon, GlaxoSmithKline, United States
Pankaj Agarwal, GlaxoSmithKline, United States

Session Chair: Donna Slonim

Presentation Overview:

Systematic drug repositioning is perhaps one of the best ways for computational biology to show clear translational value in the pharmaceutical and biotech industry. Bioinformatics methods that use genome-wide association studies (GWAS), side effects and connectivity map data are proving to have value. We built a computational pipeline to examine the relationship between the disease indications of drugs and genetic findings such as GWAS traits. When the drug indication was different from the GWAS disease trait, we hypothesized that the drug could potentially be repositioned. We identified almost 100 GWAS genes with at least one associated drug that suggest potential drug repositioning opportunities. Further investigations provided additional evidence for some of these opportunities. We will also show some recent developments in connectivity map and side effect methods to rapidly reposition drugs and ultimately benefit patients.
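The core comparison in this pipeline (a drug's indication versus the GWAS trait of its target gene) can be sketched as a simple join. Everything below is a hypothetical placeholder: the gene, drug and disease names are invented for illustration, and exact string comparison of disease names is an assumption that a real pipeline would likely replace with a mapping to a disease vocabulary.

```python
# Hypothetical GWAS associations: (gene, trait).
GWAS = [
    ("GENE1", "disease X"),
    ("GENE2", "disease Y"),
]

# Hypothetical drug records: (drug, target gene, approved indication).
DRUGS = [
    ("drug_a", "GENE1", "disease Z"),
    ("drug_b", "GENE2", "disease Y"),
]

def repositioning_candidates(gwas, drugs):
    """List (drug, gene, indication, trait) where the trait differs from the indication."""
    traits_by_gene = {}
    for gene, trait in gwas:
        traits_by_gene.setdefault(gene, set()).add(trait)

    candidates = []
    for drug, target, indication in drugs:
        for trait in sorted(traits_by_gene.get(target, ())):
            if trait != indication:
                candidates.append((drug, target, indication, trait))
    return candidates

print(repositioning_candidates(GWAS, DRUGS))
# [('drug_a', 'GENE1', 'disease Z', 'disease X')]
```

In this toy input, drug_b is not reported because its indication already matches the GWAS trait of its target.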

Keyword: Applied Bioinformatics, Disease Models & Epidemiology


PP76 (PT) - Information-theoretic evaluation of predicted ontological annotations
Date: Tuesday, July 23, 2:40 p.m. - 3:05 p.m.
Room: Hall 4/5
Presenting author: Wyatt Clark, Indiana University, United States

Predrag Radivojac, Indiana University, United States

Session Chair: Reinhard Schneider

Presentation Overview:

The development of effective methods for the prediction of ontological annotations is an important goal in computational biology, with protein function prediction and disease gene prioritization gaining wide recognition. While various algorithms have been proposed for these tasks, evaluating their performance is difficult due to problems caused both by the structure of biomedical ontologies and biased or incomplete experimental annotations of genes and gene products. In this work, we propose an information-theoretic framework to evaluate the performance of computational protein function prediction. We use a Bayesian network, structured according to the underlying ontology, to model the prior probability of a protein's function. We then define two concepts, misinformation and remaining uncertainty, that can be seen as information-theoretic analogs of precision and recall. Finally, we propose a single statistic, referred to as semantic distance, that can be used to rank or train classification models. We evaluate our approach by analyzing the performance of three protein function predictors of Gene Ontology terms and provide evidence that we address several weaknesses of currently used metrics. We believe this framework provides useful insights into the performance of protein function prediction tools.
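A minimal sketch of the two quantities named above, under stated assumptions: each ontology term carries an information content value, remaining uncertainty sums the information of true terms that were not predicted, misinformation sums the information of predicted terms that are not true, and the two are combined through a norm of order k. The term names and information values are hypothetical, and the sketch scores a single protein at a single decision threshold rather than averaging over a data set.

```python
import math

# Hypothetical information content (in bits) per ontology term; illustration only.
IA = {"root": 0.0, "binding": 1.0, "catalysis": 1.5, "kinase": 3.0, "hydrolase": 4.0}

def remaining_uncertainty(true_terms, predicted_terms, ia):
    """Information of true terms missing from the prediction (recall analog)."""
    return sum(ia[term] for term in true_terms - predicted_terms)

def misinformation(true_terms, predicted_terms, ia):
    """Information of predicted terms that are not true (precision analog)."""
    return sum(ia[term] for term in predicted_terms - true_terms)

def combined_distance(ru, mi, k=2):
    """Combine both quantities with a k-norm (k=2 gives the Euclidean distance)."""
    return (ru ** k + mi ** k) ** (1.0 / k)

true_terms = {"root", "catalysis", "kinase"}
predicted_terms = {"root", "catalysis", "hydrolase"}

ru = remaining_uncertainty(true_terms, predicted_terms, IA)  # only "kinase" is missed
mi = misinformation(true_terms, predicted_terms, IA)         # only "hydrolase" is wrong
print(ru, mi, combined_distance(ru, mi))
```

Weighting errors by information content means that missing a specific term such as the hypothetical "kinase" costs more than missing a shallow one such as "binding".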

Keyword: Gene Ontology, Bayesian Network, Information Content, Protein Fun


PP77 (PT) - CAMPways: Constrained Alignment Framework for the Comparative Analysis of a Pair of Metabolic Pathways
Date: Tuesday, July 23, 2:40 p.m. - 3:05 p.m.
Room: Hall 7
Presenting author: Cesim Erten, Kadir Has University

Turker Biyikoglu, Izmir Institute of Technology
Gamze Abaka, Kadir Has University, Turkey

Session Chair: Alfonso Valencia

Presentation Overview:

Given a pair of metabolic pathways, an alignment of the pathways corresponds to a mapping between similar substructures of the pair. Successful alignments may provide useful applications in phylogenetic tree reconstruction and drug design, and overall may enhance our understanding of cellular metabolism. We consider the problem of providing one-to-many alignments of reactions in a pair of metabolic pathways. We first provide a constrained alignment framework applicable to the problem. We show that the constrained alignment problem, even in a very primitive setting, is computationally intractable, which justifies efforts for designing efficient heuristics. We present our Constrained Alignment of Metabolic Pathways (CAMPways) algorithm designed for this purpose. Through extensive experiments involving a large pathway database we demonstrate that, when compared to a state-of-the-art alternative, the CAMPways algorithm provides better alignment results on metabolic networks as far as measures based on same-pathway inclusion are concerned. The execution speed of our algorithm constitutes yet another important improvement over alternative algorithms.

Keyword: Metabolic pathways, Network alignment, Graph matching, Algorithms


PP78 (PT) - Integrating sequence, expression and interaction data to determine condition-specific miRNA regulation
Date: Tuesday, July 23, 2:40 p.m. - 3:05 p.m.
Room: Hall 14.2
Presenting author: Hai-Son Le, Carnegie Mellon, United States

Ziv Bar-Joseph, Carnegie Mellon

Session Chair: Ralf Zimmer

Presentation Overview:

Motivation: MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression post-transcriptionally. MiRNAs were shown to play an important role in development and disease, and accurately determining the networks regulated by these miRNAs in a specific condition is of great interest. Early work on miRNA target prediction has focused on utilizing static sequence information. More recently, researchers have combined sequence and expression data to identify such targets in various conditions. Results: Here we propose a regression-based probabilistic method that integrates sequence, expression and interaction data to identify modules of mRNAs controlled by small sets of miRNAs. We formulate an optimization problem and develop a learning framework to determine the module regulation and membership. Applying our method to cancer data, we show that by adding protein interaction data and modeling combinatorial regulation our method can accurately identify both miRNAs and their targets, improving upon prior methods. We next used our method to jointly analyze a number of different types of cancers and identified both common and cancer-type-specific miRNA regulators.

Keyword: microRNA, gene regulation, transcriptomics, regulatory netwo


PP79 (PT) - Phylogenetic analysis of multiprobe fluorescence in situ hybridization data from tumor cell populations
Date: Tuesday, July 23, 2:40 p.m. - 3:05 p.m.
Room: ICC Lounge 81
Presenting author: Russell Schwartz, Carnegie Mellon University, United States

Stanley Shackney, Intelligent Oncotherapeutics
Thomas Ried, National Institutes of Health
Alejandro Schäffer, National Institutes of Health
Salim Akhter Chowdhury, Carnegie Mellon University, United States

Session Chair: Donna Slonim

Presentation Overview:

Motivation: Development and progression of solid tumors can be attributed to a process of mutations, which typically includes changes in the number of copies of genes or genomic regions. Although comparisons of cells within single tumors show extensive heterogeneity, recurring features of their evolutionary process may be discerned by comparing multiple regions or cells of a tumor. A particularly useful source of data for studying likely progression of individual tumors is fluorescence in situ hybridization (FISH), which allows one to count copy numbers of several genes in hundreds of single cells. Novel algorithms for interpreting such data phylogenetically are needed, however, to reconstruct likely evolutionary trajectories from states of single cells and facilitate analysis of their evolutionary trajectories. Results: In this paper, we develop phylogenetic methods to infer likely models of tumor progression using FISH copy number data and apply them to a study of FISH data from two cancer types. Statistical analyses of topological characteristics of the tree-based model provide insights into likely tumor progression pathways consistent with the prior literature. Furthermore, tree statistics from the resulting phylogenies can be used as features for prediction methods. This results in improved accuracy, relative to unstructured gene copy number data, at predicting tumor state and future metastasis. Availability: A package of source code for FISH tree building (FISHtrees) and the data on cervical cancer and breast cancer examined here are publicly available at the site ftp://ftp.ncbi.nlm.nih.gov/pub/FISHtrees.

Keyword: dose-response analysis, antioxidant mechanisms of interactions, simulation


PP80 (HT) - Turning networks into ontologies of gene function
Date: Tuesday, July 23, 3:10 p.m. - 3:35 p.m.
Room: Hall 4/5
Presenting author: Janusz Dutkowski, University of California, San Diego, United States

Michael Kramer, University of California, San Diego, United States
Michal Surma, Max Planck Institute, Germany
Rama Balakrishnan, Stanford University, United States
J. Michael Cherry, Stanford University, United States
Nevan Krogan, University of California, San Francisco, United States
Trey Ideker, University of California, San Diego, United States

Session Chair: Reinhard Schneider

Presentation Overview:

Ontologies are of key importance to many domains of biological research. The Gene Ontology (GO), in particular, has proven instrumental in unifying knowledge about biological processes, cellular components, and molecular functions through a hierarchy of concepts and their interrelationships. However, given only partial biological knowledge and inconsistency in how this knowledge is curated, it has been difficult to construct, extend and validate GO in an unbiased manner. To address this problem we have recently developed a new computational system that infers ontological representations automatically from large-scale maps of gene and protein interactions. The result is a network-extracted ontology (NeXO), which contains 4,123 biological concepts and 5,766 hierarchical concept relations, capturing the majority of known cellular components and identifying approximately 600 new components and relationships. As we show, many new components can be validated using a combination of experimental and bioinformatic approaches, and used directly to update the Gene Ontology structure.

Keyword: Protein Interactions & Molecular Networks, Applied Bioinformatics


PP81 (HT) - Along Signal Paths: Connecting Pathway Annotation to Topological Analyses
Date: Tuesday, July 23, 3:10 p.m. - 3:35 p.m.
Room: Hall 7
Presenting author: Gabriele Sales, Università di Padova, Italy

Paolo Martini, Università di Padova, Italy
Enrica Calura, Università di Padova, Italy
Chiara Romualdi, Università di Padova, Italy

Session Chair: Alfonso Valencia

Presentation Overview:

Gene expression analysis increasingly relies on information about pathway topology to enhance result interpretation. However, the connection between pathway annotation and analysis remains limited. Pathway representation formats have grown richer, but at the same time they have gained a great deal of complexity that offers no direct advantage to data modelling. As a result, most analysis methods completely discard the information about topology and instead focus on simple gene lists.
Our recent efforts have been directed at filling this gap between annotation and analysis. We developed a new computational platform that exploits both the richness of the latest pathway data formats (such as BioPAX 3) and the sensitivity of topological analyses.
Our software is able to convert topological information into gene networks. From these, it can dissect the complexity of a pathway by identifying the portions associated with a biological process, providing easy visualization, access and interpretation of expression data.

Keyword: Protein Interactions & Molecular Networks, Gene Regulation & Transcriptomics


PP82 (PT) - The RNA Newton Polytope and Learnability of Energy Parameters
Date: Tuesday, July 23, 3:10 p.m. - 3:35 p.m.
Room: Hall 14.2
Presenting author: Hamidreza Chitsaz, Wayne State University, United States

Elmirasadat Forouzmand, Wayne State University, United States

Session Chair: Ralf Zimmer

Presentation Overview:

Motivation: Computational RNA structure prediction is a mature, important problem which has received a new wave of attention with the discovery of regulatory non-coding RNAs and the advent of high-throughput transcriptome sequencing. Despite nearly two score years of research on RNA secondary structure and RNA-RNA interaction prediction, the accuracy of the state-of-the-art algorithms is still far from satisfactory. So far, researchers have proposed increasingly complex energy models and improved parameter estimation methods, experimental and/or computational, in anticipation of endowing their methods with enough power to solve the problem. The output has disappointingly been only modest improvements, not matching the expectations. Even recent massively featured machine learning approaches were not able to break the barrier. Why is that? Approach: The first step towards high accuracy structure prediction is to pick an energy model that is inherently capable of predicting each and every one of the known structures to date. In this paper, we introduce the notion of learnability of the parameters of an energy model as a measure of such an inherent capability. We say that the parameters of an energy model are learnable if and only if there exists at least one set of such parameters that renders every known RNA structure to date the minimum free energy structure. We derive a necessary condition for the learnability and give a dynamic programming algorithm to assess it. Our algorithm computes the convex hull of the feature vectors of all feasible structures in the ensemble of a given input sequence. Interestingly, that convex hull coincides with the Newton polytope of the partition function as a polynomial in energy parameters. To the best of our knowledge, this is the first approach towards computing the RNA Newton polytope and a systematic assessment of the inherent capabilities of an energy model. The worst-case complexity of our algorithm is exponential in the number of features. However, one could employ dimensionality reduction techniques to avoid the curse of dimensionality. Results: We demonstrated the application of our theory to a simple energy model consisting of a weighted count of A-U, C-G, and G-U base pairs. Our results show that this simple energy model satisfies the necessary condition for more than half of the input unpseudoknotted sequence-structure pairs (55%) chosen from the RNA STRAND v2.0 database and severely violates the condition for about 13%, which provides a set of hard cases that require further investigation. Of 1350 RNA strands, the observed three-dimensional feature vector for 749 strands is on the surface of the computed polytope. For 289 RNA strands, the observed feature vector is not on the boundary of the polytope but its distance from the boundary is not more than one. A distance of one essentially means one base pair difference between the observed structure and the closest point on the boundary of the polytope, which need not be the feature vector of a structure. For 171 sequences, this distance is larger than 2, and for only 11 sequences, this distance is larger than 5.
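The boundary test behind these counts can be illustrated in two dimensions. With an energy that is linear in the feature vector, a minimum over the convex hull is always attained on its boundary, so an observed structure whose feature vector lies strictly inside the hull cannot be the minimum free energy structure for any parameter choice. The sketch below is an assumption-laden toy: it uses made-up two-dimensional feature vectors (the model in the abstract has three features) and builds the hull by explicit enumeration with the monotone chain method, not with the dynamic programming algorithm of the paper.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def on_boundary(point, hull):
    """True if the point lies on an edge (or vertex) of the hull polygon."""
    for a, b in zip(hull, hull[1:] + hull[:1]):
        collinear = cross(a, b, point) == 0
        within_x = min(a[0], b[0]) <= point[0] <= max(a[0], b[0])
        within_y = min(a[1], b[1]) <= point[1] <= max(a[1], b[1])
        if collinear and within_x and within_y:
            return True
    return False

# Hypothetical feature vectors (counts of two base-pair types) of all
# feasible structures of one toy sequence.
FEATURES = [(0, 0), (4, 0), (0, 4), (3, 3), (1, 1), (2, 1)]
HULL = convex_hull(FEATURES)

print(HULL)                       # [(0, 0), (4, 0), (3, 3), (0, 4)]
print(on_boundary((3, 3), HULL))  # True: passes the necessary condition
print(on_boundary((2, 1), HULL))  # False: interior, cannot be an energy minimum
```

Integer coordinates keep the collinearity test exact; with real-valued features a tolerance would be needed.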

Keyword: RNA structure prediction, Energy parameter estimation, Computational algebra


PP83 (PT) - Automated target segmentation and fast alignment methods for high-throughput classification and averaging of crowded cryo-electron subtomograms
Date: Tuesday, July 23, 3:10 p.m. - 3:35 p.m.
Room: ICC Lounge 81
Presenting author: Min Xu, University of Southern California, United States

Frank Alber, University of Southern California

Session Chair: Donna Slonim

Presentation Overview:

Motivation: Cryo-electron tomography allows the imaging of macromolecular complexes in near living conditions. To enhance the nominal resolution of a structure it is necessary to align and average individual subtomograms each containing identical complexes. However, if the sample of complexes is heterogeneous, it is necessary to first classify subtomograms into groups of identical complexes. This task becomes challenging when tomograms contain mixtures of unknown complexes extracted from a crowded environment. Two main challenges must be overcome: First, classification of subtomograms must be performed without knowledge of template structures. However, most alignment methods are too slow to perform reference-free classification of a large number (e.g. tens of thousands) of subtomograms. Second, subtomograms extracted from crowded cellular environments often contain fragments of other structures besides the target complex. However, alignment methods generally assume that each subtomogram only contains one complex. Automatic methods are needed to identify the target complex in a subtomogram even when its shape is unknown. Results: In this paper, we propose an automatic and systematic method for the isolation and masking of target complexes in subtomograms extracted from crowded environments. Moreover, we also propose a fast alignment method using fast rotational matching in real space. Our experiments show that, compared to our previously proposed fast alignment method in reciprocal space, our new method significantly improves the alignment accuracy for highly distorted and especially crowded subtomograms. Such improvements are important for achieving successful and unbiased high-throughput reference-free structural classification of complexes inside whole cell tomograms.

Keyword: Cryo-electron tomography, Subtomogram alignment, Subtomogram classification


PP84 (PT) - A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text
Date: Tuesday, July 23, 3:40 p.m. - 4:05 p.m.
Room: Hall 4/5
Presenting author: Sophia Ananiadou, The University of Manchester

Tomoko Ohta, The University of Manchester
Rafal Rak, The University of Manchester
Andrew Rowley, The University of Manchester
Douglas B. Kell, The University of Manchester
Sampo Pyysalo, The University of Manchester
Makoto Miwa, The University of Manchester, United Kingdom

Session Chair: Reinhard Schneider

Presentation Overview:

Motivation: In order to create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge. Method: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models, and then turns them into queries for three text-mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine learning approaches. Results: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The successful query extraction and ranking methods are used to update our existing pathway search system, PathText. Availability: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/. Contact: makoto.miwa@manchester.ac.uk

Keyword: Text mining, Pathway, Ranking


PP85 (PT) - A Context-Sensitive Framework for the Analysis of Human Signalling Pathways in Molecular Interaction Networks
Date: Tuesday, July 23, 3:40 p.m. - 4:05 p.m.
Room: Hall 7
Presenting author: Alex Lan, Ben Gurion University, Israel

Michal Ziv-Ukelson, Ben Gurion University of the Negev, Israel
Esti Yeger-Lotem, Ben Gurion University, Israel

Session Chair: Alfonso Valencia

Presentation Overview:

A major challenge in systems biology is to reveal the cellular pathways that give rise to specific phenotypes and behaviours. Current techniques often rely on a network representation of molecular interactions, where each node represents a protein or a gene and each interaction is assigned a single static score. However, the use of single interaction scores fails to capture the tendency of proteins to favour different partners under distinct cellular conditions. Here we propose a novel context-sensitive network model, in which genes and protein nodes are assigned multiple contexts based on their gene ontology annotations, and their interactions are associated with multiple context-sensitive scores. Using this model we developed a new approach and a corresponding tool, ContextNet, based on a dynamic programming algorithm for identifying signalling paths linking proteins to their downstream target genes. ContextNet finds high-ranking context-sensitive paths in the interactome, thereby revealing the intermediate proteins in the path and their path-specific contexts. We validated the model using 18,348 manually-curated cellular paths derived from the SPIKE database. We next applied our framework to elucidate the responses of human primary lung cells to influenza infection. Top-ranking paths were much more likely to contain infection-related proteins, and this likelihood was highly correlated with path score. Moreover, the contexts assigned by the algorithm pointed to putative as well as previously known responses to viral infection. Thus context-sensitivity is an important extension to current network biology models and can be efficiently used to elucidate cellular response mechanisms. ContextNet is publicly available at http://netbio.bgu.ac.il/ContextNet.

Keyword: PPI-Network, Context-Sensitive-Path, Systems-Biology


PP86 (PT) - A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotides distribution
Date: Tuesday, July 23, 3:40 p.m. - 4:05 p.m.
Room: Hall 14.2