### Theme Presentation Schedule

Attention Conference Presenters - please review the Speaker Information Page available here.

Highlights, Late Breaking Research and Proceedings Track submissions are presented by scientific theme as part of the combined Theme Presentation schedule.

TP001 (HT) - Synthetic long read technologies in genome phasing and beyond
Theme: Genes
Date: Sunday, July 12, 10:10 am - 10:30 am
Room: The Liffey A

Presenting author: Volodymyr Kuleshov, Stanford University, United States

Dan Xie, Stanford University, United States
Rui Chen, Stanford University, United States
Dmitry Pushkarev, Stanford University, United States
Zhihai M, Stanford University, United States
Tim Blauwkamp, Illumina, Inc., United States
Michael Kertesz, Stanford University, United States
Serafim Batzoglou, Stanford University, United States
Michael Snyder, Stanford University, United States

Session Chair: Siu Ming Yiu

Presentation Overview:
New synthetic long read technologies are finally offering us tools for studying unresolved aspects of the human genome such as structural variation and genomic phase. In this talk, we will present a new phasing technology based on Tru-seq synthetic long reads that places more than 99% of human genomic variants into highly accurate, megabase-long haplotypes. Its low cost and excellent performance bring haplotyping one step closer to being a routine tool for studying allele-specific phenomena such as differential methylation.

At the core of this technology is a novel phasing algorithm called Prism that augments long-read phasing with statistical methods; this idea dramatically reduces sequencing requirements, increases haplotype length by almost 10x, and supports any long-read phasing technology. More generally, we will demonstrate through this as well as other ongoing work in metagenomics and de novo assembly how synthetic long reads combined with sophisticated algorithms can help solve important problems in genomics.

TP002 (PT) - PAGER: Constructing PAGs and new PAG-PAG Relationships for Network Biology
Theme: Systems
Date: Sunday, July 12, 10:10 am - 10:30 am
Room: Liffey Hall 2

Presenting author: Zongliang Yue, Indiana University-Purdue University Indianapolis, United States

Madhura Kshirsagar, Indiana University-Purdue University Indianapolis, United States
Thanh Nguyen, Indiana University–Purdue University Indianapolis, United States
Michael Neylon, Indiana University-Purdue University Indianapolis, United States
Liugen Zhu, Indiana University–Purdue University Indianapolis, United States
Timothy Ratliff, Purdue University, United States
Jake Chen, Indiana University-Purdue University Indianapolis, United States

Session Chair: Igor Jurisica

Presentation Overview:
In this paper, we describe a new database framework to perform integrative “gene-set, network, and pathway analysis” (GNPA). In this framework, we integrated heterogeneous data on pathways, annotated lists, and gene-sets (PAGs) into a PAG electronic repository (PAGER). PAGs in the database are organized into P-type, A-type, and G-type PAGs with a three-letter-code standard naming convention. The PAGER database currently compiles 44,313 genes from 5 species including human, 38,663 PAGs, 324,830 gene-gene relationships, and 3,174,323 PAG-PAG regulatory relationships of two types: co-membership based and regulatory relationship based. To help users assess each PAG’s biological relevance, we developed a cohesion measure called Cohesion Coefficient (CoCo), which distinguishes biologically significant PAGs from random PAGs with an Area-Under-Curve (AUC) performance of 0.98. The PAGER database was set up to help users search and retrieve PAGs from its online web interface. PAGER enables advanced users to build PAG-PAG regulatory networks that provide complementary biological insights not found in gene set analysis or individual gene network analysis. We provide a case study using cancer functional genomics data sets to demonstrate how integrative GNPA helps improve network biology data coverage and therefore biological interpretability.

The PAGER database is openly accessible at http://discovery.informatics.iupui.edu/PAGER/.

TP003 (PT) - MSProGene - Integrative proteogenomics beyond six-frames and single nucleotide polymorphisms
Theme: Proteins
Date: Sunday, July 12, 10:10 am - 10:30 am
Room: The Liffey B

Presenting author: Franziska Zickmann, Robert Koch Institute, Germany

Bernhard Renard, Robert Koch Institute, Germany

Session Chair: Anna Tramontano

Presentation Overview:
Summary: Ongoing advances in high-throughput technologies have facilitated accurate proteomic measurements and provide a wealth of information at the genomic and transcript levels. In proteogenomics, these multi-omics data are combined to analyze unannotated organisms and to allow more accurate sample-specific predictions.
Existing analysis methods still mainly depend on six-frame translations or reference protein databases that are extended by transcriptomic information or known single nucleotide polymorphisms (SNPs). However, six-frame translations introduce an artificial six-fold increase of the target database, and SNP integration requires a suitable database summarizing results from previous experiments.
We overcome these limitations by introducing MSProGene, a new method for integrative proteogenomic analysis based on customized RNA-Seq driven transcript databases. MSProGene is independent from existing reference databases or annotated SNPs and avoids large six-frame translated databases by constructing sample-specific transcripts. In addition, it creates a network combining RNA-Seq and peptide information that is optimized by a maximum-flow algorithm. It thereby also allows resolving the ambiguity of shared peptides for protein inference.
We applied MSProGene to three data sets and show that it facilitates database-independent, reliable, and accurate prediction at the gene and protein levels and additionally identifies novel genes.

Availability: MSProGene is written in Java and Python. It is open source and available at http://sourceforge.net/projects/msprogene/

TP004 (HT) - Computational dissection of transcriptional heterogeneity in single-cell RNA-Seq studies
Theme: Genes
Date: Sunday, July 12, 10:30 am - 10:50 am
Room: The Liffey A

Presenting author: Oliver Stegle, EMBL European Bioinformatics Institute, United Kingdom

Session Chair: Siu Ming Yiu

Presentation Overview:
Many key biological processes are driven by differences in the regulatory landscape between single cells. Recent technical developments have enabled the transcriptomes of hundreds of cells to be assayed in an unbiased manner, opening up the possibility that new, physiologically relevant sub-populations of cells can be found. A key bioinformatics challenge in analyzing these data is to comprehensively account for the different sources of variation between cells such that biologically relevant signatures can be reliably identified.

To address this, we here develop a computational approach to dissect single-cell transcriptome variation data, accounting for known and hidden sources of variation. We validate this latent variable model on single-cell data from labeled cellular states before applying it to study data generated from asynchronously differentiating T cells. By accounting for cell-to-cell correlations due to the cell cycle, we show how single-cell RNA-Seq data can be used to place individual cells on the trajectory between undifferentiated and differentiated cells.

TP005 (PT) - Metabolome-scale de novo pathway reconstruction using regioisomer-sensitive graph alignments
Theme: Systems
Date: Sunday, July 12, 10:30 am - 10:50 am
Room: Liffey Hall 2

Presenting author: Masaaki Kotera, Tokyo Institute of Technology, Japan

Yasuo Tabei, Japan Science and Technology Agency, Japan
Yoshihiro Yamanishi, Kyushu University, Japan

Session Chair: Igor Jurisica

Presentation Overview:
Motivation: Recent advances in mass spectrometry and the related metabolomics technology enable rapid and comprehensive analysis of a huge number of metabolites. However, biosynthetic and biodegradation pathways are known only for a small portion of metabolites, and most metabolic pathways remain uncharacterized.
Results: In this study, we develop a novel method for supervised de novo metabolic pathway reconstruction with an improved graph alignment-based approach in the reaction-filling framework. We propose a novel chemical graph alignment algorithm, which we call PACHA (Pairwise Chemical Aligner), in order to detect regioisomer-sensitive connectivities between aligned substructures of two compound structures. Unlike other existing graph alignment methods, PACHA can efficiently detect only one common subgraph between two compounds. Our results show that the proposed method outperforms previous descriptor-based methods and existing graph alignment-based methods in enzymatic reaction-likeness prediction for isomer-enriched reactions, and it is also useful for reaction annotation that assigns potential reaction characteristics such as EC numbers and PIERO terms to substrate-product pairs. Finally, we make a comprehensive enzymatic reaction-likeness prediction for all possible uncharacterized compound pairs, suggesting potential metabolic pathways of newly predicted substrate-product pairs.

TP006 (HT) - Revising human protein coding gene numbers
Theme: Proteins
Date: Sunday, July 12, 10:30 am - 10:50 am
Room: The Liffey B

Presenting author: Michael Tress, Spanish National Cancer Research Centre (CNIO), Spain

Alfonso Valencia, Spanish National Cancer Research Centre (CNIO), Spain
David Juan, Spanish National Cancer Research Centre (CNIO), Spain
Iakes Ezkurdia, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Spain
Jesus Vazquez, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Spain
Adam Frankish, Wellcome Trust Sanger Institute, United Kingdom
Jennifer Harrow, Wellcome Trust Sanger Institute, United Kingdom
Mark Diekhans, University of California Santa Cruz (UCSC), United States
Jose Manuel Rodriguez, Spanish National Cancer Research Centre (CNIO), Spain

Session Chair: Anna Tramontano

Presentation Overview:
In this paper we mapped peptides from 7 large-scale proteomics studies to protein coding genes from the human genome. While we identified peptides for more than 96% of genes that evolved before bilateria, we did not find peptides for primate-specific genes, for genes without protein-like features or for genes with poor cross-species conservation. We described a set of 2,001 genes that were potentially non-coding based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We show that many of these genes behave more like non-coding genes than protein-coding genes, and suggest that many may not code for proteins. Their inclusion in the human protein coding gene catalogue is being revised as part of the ongoing human genome annotation effort.

TP007 (LT) - Scaffolding Draft Genomes with Nanopore Reads
Theme: Genes
Date: Sunday, July 12, 10:50 am - 11:10 am
Room: The Liffey A

Presenting author: Rene Warren, BC Cancer Agency, Canada

Rene Warren, BC Cancer Agency, Canada
Benjamin Vandervalk, BC Cancer Agency, Canada
Steven Jones, BC Cancer Agency, Canada
Inanc Birol, BC Cancer Agency, Canada

Session Chair: Siu Ming Yiu

Presentation Overview:

TP008 (HT) - Unbiased Metabolic Pathway Analysis of Large Networks by Metabolomics Integration
Theme: Systems
Date: Sunday, July 12, 10:50 am - 11:10 am
Room: Liffey Hall 2

Presenting author: Christian Jungreuthmayer, Austrian Centre of Industrial Biotechnology, Austria

Matthias Gerstl, Austrian Centre of Industrial Biotechnology, Austria
Juergen Zanghellini, Austrian Centre of Industrial Biotechnology, Austria

Session Chair: Igor Jurisica

Presentation Overview:
In the presentation we will introduce the theoretical concept of our novel approach, discuss the main aspects of its numerical implementation and illustrate the biological relevance. Then, we will give a brief demonstration of our toolkit, which is open source software and freely available for everyone from our website. Our presentation will go beyond published work in that we show that the number of relevant pathways can be reduced even further. By means of a novel method based on linear programming we show that only small subsets of all pathways can simultaneously carry a thermodynamically feasible flux.
We identify these phenotypically relevant subsets in a medium-scale E. coli model and show that they are characterized by their ability to maximize biomass and ATP production, consistent with evolutionary interpretations of cell behavior.

TP009 (PT) - IgRepertoireConstructor: a novel algorithm for antibody repertoire construction and immunoproteogenomics analysis
Theme: Proteins
Date: Sunday, July 12, 10:50 am - 11:10 am
Room: The Liffey B

Presenting author: Yana Safonova, St. Petersburg State University, Russian Federation

Stefano Bonissone, University of California, San Diego, United States
Eugene Kurpilyansky, St. Petersburg Academic University, Russian Federation
Ekaterina Starostina, St. Petersburg State University, Russian Federation
Alla Lapidus, St. Petersburg State University, Russian Federation
Jeremy Stinson, Genentech, United States
Laura Depalatis, Genentech, United States
Wendy Sandoval, Genentech, United States
Jennie Lill, Genentech, United States
Pavel Pevzner, University of California, San Diego, United States

Session Chair: Anna Tramontano

Presentation Overview:
The analysis of concentrations of circulating antibodies in serum (antibody repertoire) is a fundamental, yet poorly studied, problem in immunoinformatics. The two current approaches to the analysis of antibody repertoires (next generation sequencing (NGS) and mass spectrometry (MS)) present difficult computational challenges since antibodies are not directly encoded in the germline but are extensively diversified by somatic recombination and hypermutations. Therefore, the protein database required for the interpretation of spectra from circulating antibodies is custom for each individual. While such a database can be constructed via NGS, the reads generated by NGS are error-prone and even a single nucleotide error precludes identification of a peptide by the standard proteomics tools. Here, we present the IgRepertoireConstructor algorithm that performs error-correction of immunosequencing reads and uses mass spectra to validate the constructed antibody repertoires.

Availability: IgRepertoireConstructor is open source and freely available as a C++ and Python program running on all Unix-compatible platforms.
The source code is available from http://bioinf.spbau.ru/igtools.
Contact: ppevzner@ucsd.edu

TP010 (HT) - Quality score compression improves genotyping accuracy
Theme: Genes
Date: Sunday, July 12, 11:40 am - 12:00 pm
Room: The Liffey A

Presenting author: Deniz Yorukoglu, Massachusetts Institute of Technology, United States

Y. William Yu, Massachusetts Institute of Technology, United States
Jian Peng, Massachusetts Institute of Technology, United States

Session Chair: Siu Ming Yiu

Presentation Overview:
In this presentation, we show how to recover quality information directly from sequence data using the compression tool “Quartz,” rendering such scores redundant and yielding substantially better space and time efficiencies for storage and analysis. Quartz is designed to operate on reads in FASTQ format but can be trivially modified to discard quality scores in other formats with sequence-quality score pairs.

Discarding 95% of quality scores counterintuitively resulted in improved genotyping, implying that compression need not come at the expense of accuracy. In contrast to previous results, we show that although completely discarding quality scores comes at the cost of accuracy and quality score recalibration to improve variant calling accuracy generally decreases compressibility, there is a happy medium at which we can get both good compression and improved accuracy.

TP011 (PT) - Refined elasticity sampling for Monte Carlo-based identification of stabilizing network patterns
Theme: Systems
Date: Sunday, July 12, 11:40 am - 12:00 pm
Room: Liffey Hall 2

Presenting author: Dorothee Childs, European Molecular Biology Laboratory, Heidelberg, Germany

Sergio Grimbs, Jacobs University Bremen, Germany
Joachim Selbig, University of Potsdam and Max-Planck Institute for Molecular Plant Physiology, Germany

Session Chair: Igor Jurisica

Presentation Overview:
Motivation: Structural kinetic modeling (SKM) is a framework to analyse whether a metabolic steady state remains stable under perturbation, without requiring detailed knowledge about individual rate equations.
It provides a representation of the system's Jacobian matrix that depends solely on the network structure, steady state measurements, and the elasticities at the steady state.
For a measured steady state, stability criteria can be derived by generating a large number of structural kinetic models (SK-models) with randomly sampled elasticities and evaluating the resulting Jacobian matrices. The elasticity space can be analysed statistically in order to detect network positions that contribute significantly to the perturbation response.
Here we extend this approach by examining the kinetic feasibility of the elasticity combinations created during Monte Carlo sampling.
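
The sampling procedure described above can be illustrated with a minimal sketch. It assumes the commonly cited SKM form of the Jacobian, J = Λ·Θ with Λ[i][j] = N[i][j]·v_j/x_i, and a hypothetical two-metabolite pathway (influx, conversion, efflux, with feedback inhibition of the influx). The network, the sampling intervals, and all function names are illustrative assumptions and are not taken from the paper; the kinetic feasibility check that the paper adds is not reproduced here.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def skm_jacobian(stoich, fluxes, concs, theta):
    """Build J = Lambda * Theta, with Lambda[i][j] = N[i][j] * v_j / x_i."""
    lam = [[stoich[i][j] * fluxes[j] / concs[i] for j in range(len(fluxes))]
           for i in range(len(concs))]
    return matmul(lam, theta)


def is_stable_2x2(jac):
    """A 2x2 Jacobian is stable iff its trace is negative and its determinant positive."""
    trace = jac[0][0] + jac[1][1]
    det = jac[0][0] * jac[1][1] - jac[0][1] * jac[1][0]
    return trace < 0 and det > 0


def stable_fraction(n_samples=1000, seed=0):
    """Sample elasticities at random and return the fraction of stable models."""
    rng = random.Random(seed)
    stoich = [[1, -1, 0],   # X1: produced by v1, consumed by v2
              [0, 1, -1]]   # X2: produced by v2, consumed by v3
    fluxes = [1.0, 1.0, 1.0]  # hypothetical steady-state fluxes
    concs = [2.0, 0.5]        # hypothetical steady-state concentrations
    stable = 0
    for _ in range(n_samples):
        theta = [
            [0.0, -rng.uniform(0.0, 1.0)],  # v1 is inhibited by X2
            [rng.uniform(0.01, 1.0), 0.0],  # v2 is activated by its substrate X1
            [0.0, rng.uniform(0.01, 1.0)],  # v3 is activated by its substrate X2
        ]
        if is_stable_2x2(skm_jacobian(stoich, fluxes, concs, theta)):
            stable += 1
    return stable / n_samples
```

For this toy pathway every sampled model is stable, because negative feedback cannot destabilize a two-variable chain; larger networks such as the TCA cycle would need a general eigenvalue solver in place of the 2x2 criterion.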

Results: Using a set of small example systems, we show that the majority of sampled SK-models would yield negative kinetic parameters if they were translated back into kinetic models. To overcome this problem, a simple criterion is formulated that mitigates such infeasible models.
After evaluating the small example pathways, the methodology was used to study two steady states of the neuronal TCA cycle and the intrinsic mechanisms responsible for their stability or instability. The findings of the statistical elasticity analysis confirm that several elasticities are jointly coordinated to control stability and that the main source of potential instabilities is mutations in the enzyme alpha-ketoglutarate dehydrogenase.

TP012 (HT) - Bumps and traffic lights along the translation of secretory proteins
Theme: Proteins
Date: Sunday, July 12, 11:40 am - 12:00 pm
Room: The Liffey B

Presenting author: Michal Linial, The Hebrew University of Jerusalem, Israel

Session Chair: Anna Tramontano

Presentation Overview:
Protein translation is the most expensive operation. Therefore, its speed and the allocation of resources are tightly controlled. In this study we show that the entire proteomes of yeast, fly, human, worm, plant and cow do not show unique properties at the N-terminal segment, whereas a signal is associated with signal peptide (SP)-containing proteins. We found a pattern in the N-terminal segment that slows down the translation rate for the SP proteome. We critically analyze these observations from statistical and evolutionary perspectives. We generalize our observation to other groups of proteins that are governed by ‘speed controls’. Specifically, the pattern of codons and their prevalence was tested for GPI-anchored and mitochondrial transit peptide-containing proteins. In all cases, a “speed control” pattern is recorded for all tested organisms. We conclude that tuning the translation of a nascent protein is essential for coping with the constraints imposed by proteins’ cellular fate.

TP013 (PT) - De Novo Meta-Assembly of Ultra-deep Sequencing Data
Theme: Genes
Date: Sunday, July 12, 12:00 pm - 12:20 pm
Room: The Liffey A

Presenting author: Stefano Lonardi, University of California, Riverside, United States

Hamid Mirebrahim, University of California, Riverside, United States
Timothy J. Close, University of California, Riverside, United States

Session Chair: Siu Ming Yiu

Presentation Overview:
We introduce a new divide-and-conquer approach to deal with the problem of de novo genome assembly in the presence of ultra-deep sequencing data (i.e., coverage of 1,000x or higher). Our proposed meta-assembler SLICEMBLER partitions the input data into optimal-sized “slices” and uses a standard assembly tool (e.g., Velvet, SPAdes, IDBA, Ray) to assemble each slice individually. SLICEMBLER uses majority voting among the individual assemblies to identify long contigs that can be merged into the consensus assembly.

To improve its efficiency, SLICEMBLER uses a generalized suffix tree to identify these frequent contigs (or fractions thereof). Extensive experimental results on real ultra-deep sequencing data (8,000x coverage) and simulated data show that SLICEMBLER significantly improves the quality of the assembly compared to the performance of the base assembler. In fact, most of the time SLICEMBLER generates error-free assemblies. We also show that SLICEMBLER is much more robust to high sequencing error rates than the base assembler. SLICEMBLER can be accessed at http://slicembler.cs.ucr.edu/
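
The slice-and-vote idea can be sketched in a few lines. This is a simplified illustration and not the SLICEMBLER implementation: it assumes fixed-length reads, votes only on exactly identical contigs (the abstract describes a generalized suffix tree that also finds shared fractions of contigs), and leaves the per-slice assembly to an external tool. The function names and parameters are hypothetical.

```python
from collections import Counter


def partition_into_slices(reads, read_length, genome_size, slice_coverage):
    """Split reads into consecutive slices of roughly `slice_coverage`x each."""
    reads_per_slice = max(1, (slice_coverage * genome_size) // read_length)
    return [reads[i:i + reads_per_slice]
            for i in range(0, len(reads), reads_per_slice)]


def majority_contigs(assemblies):
    """Return contigs that appear in more than half of the per-slice assemblies."""
    votes = Counter()
    for contigs in assemblies:
        votes.update(set(contigs))  # each assembly votes once per contig
    needed = len(assemblies) / 2
    return sorted(contig for contig, count in votes.items() if count > needed)
```

Each slice would be passed to a base assembler, and the resulting contig lists would be fed to `majority_contigs`; a contig that arises from sequencing errors in one slice is unlikely to win the vote.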

TP014 (HT) - Dynamic networks reveal key players in aging
Theme: Systems
Date: Sunday, July 12, 12:00 pm - 12:20 pm
Room: Liffey Hall 2

Presenting author: Tijana Milenkovic, University of Notre Dame, United States

Fazle Faisal, University of Notre Dame, United States
Han Zhao, University of Notre Dame, United States

Session Chair: Igor Jurisica

Presentation Overview:
Studying human aging is of societal importance. Analyses of gene expression or sequence data have been indispensable for studying human aging. But these typically ignore interconnectivities between genes (proteins). Since proteins interact to keep us alive, and since this is what biological networks (BNs) model, BN research will further our understanding of aging. Because different data types can give complementary biological insights, we integrate current static BNs with aging-related expression data to form dynamic, age-specific BNs. Then, we study cellular changes with age from such BNs to identify key players in aging. Also, analogous to sequence alignment, we use BN alignment to transfer aging-related knowledge from well-studied model species to poorly studied human between conserved network regions. In the process, we propose a novel superior BN alignment method. We validate the aging-related candidates resulting from our integrative, dynamic, and comparative BN analyses by linking them to aging-related cellular processes and diseases.

TP015 (PT) - Protein (Multi-) Location Prediction: Utilizing Interdependencies via a Generative Model
Theme: Proteins
Date: Sunday, July 12, 12:00 pm - 12:20 pm
Room: The Liffey B

Presenting author: Hagit Shatkay, University of Delaware, United States

Sebastian Briesemeister, University of Tuebingen, Germany
Oliver Kohlbacher, University of Tuebingen, Germany
Ramanuja Simha, University of Delaware, United States

Session Chair: Anna Tramontano

Presentation Overview:
Motivation: Proteins are responsible for a multitude of vital tasks in all living organisms. Given that a protein's function and role are strongly related to its subcellular location, protein location prediction is an important research area. While proteins move from one location to another and can localize to multiple locations, most existing location prediction systems assign only a single location per protein. A few recent systems attempt to predict multiple locations for proteins; however, their performance leaves much room for improvement. Moreover, such systems do not capture dependencies among locations and usually consider locations as independent. We hypothesize that a multi-location predictor that captures location inter-dependencies can improve location predictions for proteins.

Results:
We introduce a probabilistic generative model for protein localization, and develop a system based on it – which we call MDLoc – that utilizes inter-dependencies among locations to predict multiple locations for proteins. The model captures location inter-dependencies using Bayesian networks and represents dependency between features and locations using a mixture model. We use iterative processes for learning model parameters and for estimating protein locations. We evaluate our classifier MDLoc on a dataset of single- and multi-localized proteins derived from the DBMLoc dataset, which is the most comprehensive protein multi-localization dataset currently available. Our results, obtained by using MDLoc, significantly improve upon results obtained by an initial simpler classifier, as well as on results reported by other top systems.

MDLoc is available at: http://www.eecis.udel.edu/~compbio/mdloc.

TP016 (PT) - Misassembly Detection using Paired-End Sequence Reads and Optical Mapping Data
Theme: Genes
Date: Sunday, July 12, 12:20 pm - 12:40 pm
Room: The Liffey A

Presenting author: Martin Muggli, Colorado State University, United States

Simon Puglisi, University of Helsinki, Finland
Roy Ronen, University of California, San Diego, United States
Christina Boucher, Colorado State University, United States

Session Chair: Siu Ming Yiu

Presentation Overview:
Motivation: A crucial problem in genome assembly is the discovery and correction of misassembly errors in draft genomes. We develop a method called MISSEQUEL that enhances the quality of draft genomes by identifying misassembly errors and their breakpoints using paired-end sequence reads and optical mapping data. Our method also fulfills the critical need for open source computational methods for analyzing optical mapping data. We apply our method to various assemblies of the loblolly pine, Francisella tularensis, rice and budgerigar genomes. We generated and used simulated optical mapping data for loblolly pine and Francisella tularensis, and used real optical mapping data for rice and budgerigar.

Results: Our results demonstrate that we detect more than 54% of extensively misassembled contigs and more than 60% of locally misassembled contigs in assemblies of Francisella tularensis, and between 31% and 100% of extensively misassembled contigs and between 57% and 73% of locally misassembled contigs in assemblies of loblolly pine. Using the real optical mapping data, we correctly identified 75% of extensively misassembled contigs and 100% of locally misassembled contigs in rice, and 77% of extensively misassembled contigs and 80% of locally misassembled contigs in budgerigar.

Availability: MISSEQUEL can be used as a post-processing step in combination with any genome assembler and is freely available at http://www.cs.colostate.edu/seq/

TP017 (HT) - Understanding multicellular function and disease with human tissue-specific networks
Theme: Systems
Date: Sunday, July 12, 12:20 pm - 12:40 pm
Room: Liffey Hall 2

Presenting author: Aaron Wong, Princeton University, United States

Arjun Krishnan, Princeton University, United States
Casey Greene, Dartmouth, United States
Emanuela Ricciotti, University of Pennsylvania, United States
Rene Zelaya, Dartmouth, United States
Daniel Himmelstein, University of California, San Francisco, United States
Ran Zhang, Princeton University, United States
Boris Hartmann, Icahn School of Medicine at Mount Sinai, United States
Elena Zaslavsky, Icahn School of Medicine at Mount Sinai, United States
Stuart Sealfon, Icahn School of Medicine at Mount Sinai, United States
Daniel Chasman, Brigham and Women's Hospital and Harvard Medical School, United States
Garret FitzGerald, University of Pennsylvania, Perelman School of Medicine, United States
Kara Dolinski, Princeton University, United States
Tilo Grosser, University of Pennsylvania, United States
Olga Troyanskaya, Princeton University, United States

Session Chair: Igor Jurisica

Presentation Overview:
Tissue and cell-type identity lie at the core of human physiology and disease. Understanding the genetic underpinnings of complex tissues and individual cell lineages is crucial for developing improved diagnostics and therapeutics. We present genome-wide functional interaction networks for 144 human tissues and cell types developed using a data-driven Bayesian methodology that integrates thousands of diverse experiments spanning tissue and disease states. Tissue-specific networks predict lineage-specific responses to perturbation and reveal genes’ changing functional roles across tissues. We introduce NetWAS, which combines genes with nominally significant GWAS p-values and tissue-specific networks to identify disease-gene associations more accurately than GWAS alone. Our webserver, GIANT (http://giant.princeton.edu), provides an interface to human tissue networks through multi-gene queries, network visualization, analysis tools including NetWAS, and downloadable networks. GIANT enables systematic exploration of the landscape of interacting genes across more than one hundred human tissues and cell types.

TP018 (HT) - Global view of the protein universe
Theme: Proteins
Date: Sunday, July 12, 12:20 pm - 12:40 pm
Room: The Liffey B

Presenting author: Rachel Kolodny, University of Haifa, Israel

Nir Ben-Tal, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel
Sergey Nepomnyachiy, Polytechnic Institute of New York University, United States

Session Chair: Anna Tramontano

Presentation Overview:
To globally explore protein space, we represent all similarities among a representative set of domains as networks. In the “domain network” edges connect domains that share “motifs,” i.e., significantly sized segments of similar sequence and structure, and in the “motif network” edges connect recurring motifs that appear in the same domain. These networks offer a way to organize protein space, and examine how the definition of “evolutionary relatedness” among domains influences their structure. At excessively strict thresholds the network falls apart; for very lax thresholds, there are network paths between virtually all domains. Interestingly, at intermediate thresholds the network comprises two regions: “discrete” versus “continuous.” The discrete region consists of isolated islands, each generally corresponding to a fold; the continuous region is dominated by domains with alternating alpha and beta elements. The networks can also suggest evolutionary paths between domains, and be used for protein search and design.
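
The threshold behaviour described above can be illustrated with a small sketch that counts connected components of a domain network as the edge threshold is relaxed. It assumes each candidate edge carries a single similarity score and that an edge is kept when the score meets the threshold; the scoring scheme, the function name, and the example data are hypothetical and not taken from the paper.

```python
def count_components(nodes, scored_edges, threshold):
    """Count connected components using only edges with score >= threshold."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the root, halving the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b, score in scored_edges:
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two components
    return len({find(node) for node in nodes})
```

With a strict threshold every domain is its own island, and with a lax one all domains join a single component, which mirrors the two extremes described in the abstract.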

TP019 (HT) - Developments to the Combined Annotation Dependent Depletion (CADD) framework for estimating deleteriousness of human genetic variation
Theme: Disease
Date: Sunday, July 12, 2:00 pm - 2:20 pm
Room: The Auditorium

Presenting author: Martin Kircher, University of Washington, United States

Daniela Witten, University of Washington, United States
Preti Jain, Columbia University, United States
Brian O'Roak, Oregon Health and Science University, United States
Gregory Cooper, HudsonAlpha Institute for Biotechnology, United States
Jay Shendure, University of Washington, United States

Session Chair: Yana Bromberg

Presentation Overview:
The interpretation of human genetic variation on a genome-wide scale is a crucial challenge in both research and clinical settings. Available annotations tend to exploit a single information type (e.g. conservation) and/or are restricted in scope (e.g. missense changes). We developed a broadly applicable metric that objectively weights and integrates the large, diverse, and otherwise unwieldy collection of annotation data available. Combined Annotation Dependent Depletion (CADD) integrates these annotations by contrasting variants that survived natural selection with simulated mutations. We show that CADD-based scores correlate with allelic diversity, pathogenicity of both coding and non-coding variants, and experimentally measured regulatory effects, and also highly rank causal variants within individual genome sequences. We pre-compute SNV scores for the whole human genome and enable scoring of short InDels (http://cadd.gs.washington.edu). We describe our method and discuss the integration of additional annotations as well as methodological improvements that we have made over the last year.

TP020 (LT) - Structural features of the 5-colors Drosophila chromatin types
Theme: Genes
Date: Sunday, July 12, 2:00 pm - 2:20 pm
Room: The Liffey A

Presenting author: Davide Bau, National Center for Genomic Analysis, Spain

Session Chair: Reinhard Schneider

Presentation Overview:
Advances in genomic technologies and the development of new analytical methods (e.g. Hi-C) have provided better insight into how the genome is organized inside the cell nucleus. Recently, it has been shown that chromatin is organized in Topologically Associating Domains (TADs), large interacting domains that are conserved among different cell types.
The Drosophila genome is also folded into TADs, which are packaged into a mosaic of five principal chromatin types, each defined by a unique combination of proteins. The five types of chromatin differ substantially in their genome coverage, numbers of domains, and numbers of genes [1]. To determine whether these TADs correspond to functional domains defined by epigenetic marks, Hou et al. [2] examined the composition of chromatin types within physical domains, following the 5-colors classification described in [1]. To determine whether these “chromatin color blocks” have characteristic structural features, we studied the relationship between the 3D architecture of selected regions of the Drosophila genome and their chromatin color. Using Hi-C data at 10 kb resolution, we found that the analyzed regions have structural features characteristic of their functional signatures. Although with the present data resolution it is not possible to unambiguously distinguish between different chromatin types by simple comparison of their structural features, our results show that different chromatin types have specific structural characteristics that correlate with their functional roles, with active and inactive chromatin types showing significantly different structural characteristics.

[1] Filion et al. Cell, 143(2), 212–224.
[2] Hou et al. Molecular Cell, 48(3), 471–484.

TP021 (PT) - Genome-wide detection of intervals of genetic heterogeneity associated with complex traits
Theme: Systems
Date: Sunday, July 12, 2:00 pm - 2:20 pm
Room: Liffey Hall 2

Presenting author: Felipe Llinares-Lopez, ETH Zurich, Switzerland

Dominik Grimm, ETH Zurich, Switzerland
Dean Bodenham, ETH Zurich, Switzerland
Udo Gieraths, ETH Zurich, Switzerland
Mahito Sugiyama, Osaka University, Japan
Beth Rowan, Max Planck Institute for Developmental Biology, Germany
Karsten Borgwardt, ETH Zurich, Switzerland

Session Chair: Nicolas Le Novere

Presentation Overview:
Motivation: Genetic heterogeneity, the fact that several sequence variants give rise to the same phenotype, is a phenomenon that is of the utmost interest in the analysis of complex phenotypes. Current approaches for finding regions in the genome that exhibit genetic heterogeneity suffer from at least one of two shortcomings: 1) they require the definition of an exact interval in the genome that is to be tested for genetic heterogeneity, potentially missing intervals of high relevance, or 2) they suffer from an enormous multiple hypothesis testing problem due to the large number of potential candidate intervals being tested, which results in either many false positives or a lack of power to detect true intervals.
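The scale of the multiple testing problem described above can be made concrete with a short sketch. It counts the contiguous intervals that can be formed from n SNPs, n(n+1)/2, and shows the per-test threshold that a plain Bonferroni correction would impose. The Bonferroni correction and the SNP counts are illustrative assumptions only; this is not the correction used by the presented method.

```python
def count_contiguous_intervals(n_snps: int) -> int:
    """Number of contiguous SNP intervals: one per (start, end) pair with start <= end."""
    return n_snps * (n_snps + 1) // 2


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a plain Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical SNP counts, chosen only to show how fast the threshold shrinks.
    for n_snps in (1_000, 100_000):
        n_intervals = count_contiguous_intervals(n_snps)
        print(n_snps, n_intervals, bonferroni_threshold(0.05, n_intervals))
```

Because the number of candidate intervals grows quadratically with the number of SNPs, a naive correction quickly drives the per-test threshold toward zero, which is the loss of power the abstract refers to.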

Results: Here, we present an approach that overcomes both problems: It allows one to automatically find all contiguous sequences of SNPs in the genome that are jointly associated with the phenotype. It also solves both the inherent computational efficiency problem and the statistical problem of multiple hypothesis testing, which are both caused by the huge number of candidate intervals. We demonstrate on Arabidopsis thaliana GWAS data that our approach can discover regions that exhibit genetic heterogeneity and would be missed by single-locus mapping.

Conclusions: Our novel approach can contribute to the genome-wide discovery of intervals that are involved in the genetic heterogeneity underlying complex phenotypes.

Availability: The code can be obtained at: http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/sis.html
Contact: felipe.llinares@bsse.ethz.ch

TP022 (LT) - Inferring mechanism of DNA double-strand break formation using sequencing data
Theme: Genes / Disease
Date: Sunday, July 12, 2:00 pm - 2:20 pm
Room: The Liffey B

Presenting author: Maga Rowicka, University of Texas Medical Branch, United States

Maga Rowicka, University of Texas Medical Branch, United States

Session Chair: Janet Kelso

Presentation Overview:
Double-stranded DNA breaks (DSBs) are the most dangerous form of DNA damage. Despite many studies on the mechanisms of DSB formation, our knowledge of them is very incomplete, owing to the lack of appropriate techniques to detect DSBs accurately genome-wide. We recently developed a method to label DSBs in situ followed by deep sequencing (BLESS), and used it to map DSBs in human cells with a resolution 2-3 orders of magnitude better than previously achieved. Here, we will show how mathematical modelling and numerical simulations can elucidate and quantify various mechanisms of DSB formation. This paradigm of using in silico experiments as a method of choice for discovery and quantification of global, genome-wide rules and chromatin context dependence should also be beneficial for other systems studied using omics data.

TP023 (HT) - Big Data, AI, and Evolution: Towards a Calculus for Precision Medicine
Theme: Disease
Date: Sunday, July 12, 2:20 pm - 2:40 pm
Room: The Auditorium

Presenting author: Olivier Lichtarge, Baylor College of Medicine, United States

Martin Lisewski, Baylor College of Medicine, United States
Angela Wilkins, Baylor College of Medicine, United States
Panagiotis Katsonis, Baylor College of Medicine, United States

Session Chair: Yana Bromberg

Presentation Overview:
Slide 1 will break the problem of computing personalized therapy into steps, each one a paper. Slides 2-4 will discuss the Cell paper: a network compression approach to integrate and analyze structured Big Data from databases, culminating in the discovery of the target and mechanism of the best anti-malarial drug, with use for future drug screens. Slides 5-6 will expand integration to unstructured data from the entire literature using AI, with an application to p53 biology. Slides 7-9 will turn to the inclusion of personalized information into the network by accurately scoring individual genome variations. Illustrations will summarize the winning performance in predicting deleterious mutations at the CAGI blind competition and an application to head and neck cancer. Slide 10 will summarize the strategy, key results and future directions.

TP024 (HT) - A comparative encyclopedia of DNA elements in the mouse genome
Theme: Genes
Date: Sunday, July 12, 2:20 pm - 2:40 pm
Room: The Liffey A

Presenting author: Feng Yue, The Pennsylvania State University, United States

Yong Cheng, Stanford University, United States
Alessandra Breschi, Centre for Genomic Regulation and UPF, Spain
Jeff Vierstra, University of Washington, United States
Weisheng Wu, Computational Medicine & Bioinformatics, United States
Tyrone Ryba, New College of Florida, United States
Ricard Sandstrom, University of Washington, United States
Zhihai Ma, Stanford University, United States
Carrie Davis, Cold Spring Harbor Laboratory, United States
Benjamin Pope, Florida State University, United States
Yin Shen, University of California San Diego, United States
John Stamatoyannopoulos, University of Washington, United States
Michael Snyder, Stanford University, United States
Roderic Guigo, Centre for Genomic Regulation and UPF, Spain
Thomas Gingeras, Cold Spring Harbor Laboratory, United States
David Gilbert, Florida State University, United States
Ross Hardison, The Pennsylvania State University, United States

Session Chair: Reinhard Schneider

Presentation Overview:
As the premier model organism in biomedical research, the laboratory mouse shares the vast majority of protein-coding genes with humans, but significant differences exist between the two mammals, posing considerable challenges in the modeling of human diseases. The mouse ENCODE consortium produced more than 1000 coordinated datasets, including transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains, in over 100 mouse cell types and tissues. By comparative analysis with the data from human ENCODE, we found that although the majority of gene expression and cis-regulatory elements are conserved between the two species, a large share of gene regulatory elements appear to be species-specific. These species-specific elements are enriched for genes involved in certain pathways, such as immune system and metabolic processes, suggesting that different gene pathways evolve at distinct rates. Our work also provides a great resource for research into mammalian biology and mechanisms of human disease.

TP025 (PT) - Identification of causal genes for complex traits
Theme: Systems
Date: Sunday, July 12, 2:20 pm - 2:40 pm
Room: Liffey Hall 2

Presenting author: Farhad Hormozdiari, University of California, Los Angeles, United States

Gleb Kichaev, University of California, Los Angeles, United States
Wen-Yun Yang, University of California, Los Angeles, United States
Bogdan Pasaniuc, University of California, Los Angeles, United States
Eleazar Eskin, University of California, Los Angeles, United States

Session Chair: Nicolas Le Novere

Presentation Overview:
Motivation: Although genome-wide association studies (GWAS) have identified thousands of variants associated with common diseases and complex traits, only a handful of these variants are validated to be causal. We consider “causal variants” as variants which are responsible for the association signal at a locus. As opposed to association studies that benefit from linkage disequilibrium (LD), the main challenge in identifying causal variants at associated loci lies in distinguishing among the many closely correlated variants due to LD. This is particularly important for model organisms such as inbred mice, where LD extends much further than in human populations, resulting in large stretches of the genome with significantly associated variants. Furthermore, these model organisms are highly structured, and require correction for population structure to remove potential spurious associations.

Results: In this work, we propose CAVIAR-Gene, a novel method that is able to operate across large LD regions of the genome while also correcting for population structure. A key feature of our approach is that it provides as output a minimally sized set of genes that captures the genes which harbor causal variants with a specified probability. Through extensive simulations, we demonstrate that our method not only speeds up computation, but also achieves on average a 10% higher recall rate compared with existing approaches. We validate our method using real mouse high-density lipoprotein (HDL) data and show that CAVIAR-Gene is able to identify Apoa2 (a gene known to harbor causal variants for HDL), while reducing the number of genes that need to be tested for functionality by a factor of 2.

The software is freely available for download at genetics.cs.ucla.edu/caviar

TP026 (PT) - Comparing Genomes with Rearrangements and Segmental Duplications
Theme: Genes / Disease
Date: Sunday, July 12, 2:20 pm - 2:40 pm
Room: The Liffey B

Presenting author: Mingfu Shao, EPFL, Switzerland

Bernard Moret, EPFL, Switzerland

Session Chair: Janet Kelso

Presentation Overview:
Motivation: Large-scale evolutionary events such as genomic rearrangements and segmental duplications form an important part of the evolution of genomes and are widely studied from both biological and computational perspectives. A basic computational problem is to infer these events in the evolutionary history for given modern genomes, a task for which many algorithms have been proposed under various constraints. Algorithms that can handle both rearrangements and content-modifying events such as duplications and losses remain few and limited in their applicability.

Results: We study the comparison of two genomes under a model including general rearrangements (through DCJ) and segmental duplications. We formulate the comparison as an optimization problem, and describe an exact algorithm to solve it by using an integer linear program (ILP). We also devise a sufficient condition and an efficient algorithm to identify optimal substructures, which can simplify the problem while preserving optimality. Using the optimal substructures with the ILP formulation yields an exact yet practical algorithm, the first practical method to provide exact solutions to the problem of comparing two arbitrary genomes under rearrangements and duplications. We then apply our algorithm to assign in-paralogs and orthologs (a necessary step in handling duplications), and compare its performance with that of the state-of-the-art method MSOAR (an approximation method), using both simulations and real data. On simulated datasets our method outperforms MSOAR by a significant margin; on 5 well-annotated species, MSOAR achieves high accuracy, yet our method performs slightly better on each of the 10 pairwise comparisons.

Availability: http://lcbb.epfl.ch/softwares/coser
Contact: mingfu.shao@epfl.ch

TP027 (LT) - Optimizing cancer genome sequencing and analysis
Theme: Disease
Date: Sunday, July 12, 2:40 pm - 3:00 pm
Room: The Auditorium

Presenting author: Malachi Griffith, Washington University, United States

Malachi Griffith, Washington University, United States
Christopher Miller, Washington University, United States
Obi Griffith, Washington University, United States
Kilannin Krysiak, Washington University, United States
Zachary Skidmore, Washington University, United States
Avinash Ramu, Washington University, United States
Jason Walker, Washington University, United States
Ha Dang, Washington University, United States
Lee Trani, Washington University, United States
David Larson, Washington University, United States
Ryan Demeter, Washington University, United States
Michael Wendl, Washington University, United States
Rachel Austin, Washington University, United States
Vincent Magrini, Washington University, United States
Sean McGrath, Washington University, United States
Amy Ly, Washington University, United States
Shashikant Kulkarni, Washington University, United States
Joshua McMichael, Washington University, United States
Matt Cordes, Washington University, United States
Catrina Fronick, Washington University, United States
Robert Fulton, Washington University, United States
Christopher Maher, Washington University, United States
Li Ding, Washington University, United States
Jeffery Klco, Washington University, United States
Elaine Mardis, Washington University, United States
Timothy Ley, Washington University, United States
Richard Wilson, Washington University, United States

Session Chair: Yana Bromberg

Presentation Overview:
Tumors are typically sequenced to depths of 75-100x (exome) or 30-50x (whole genome). We demonstrate that current sequencing paradigms based on this coverage are inadequate for tumors that are impure, aneuploid, and/or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312x) whole genome sequencing and exome capture (up to ~433x) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested 7 alignment algorithms and 7 single-nucleotide variant callers, and validated ~200,000 putative SNVs by sequencing them to mean depths of ~1,000x. Additional targeted sequencing provided over 10,000x coverage and ddPCR assays provided up to ~250,000x sampling of selected sites (of up to 2 µg of input DNA per assay). Using these data, we evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource.

TP028 (PT) - Cypiripi: exact genotyping of CYP2D6 using High Throughput Sequencing Data
Theme: Genes
Date: Sunday, July 12, 2:40 pm - 3:00 pm
Room: The Liffey A

Presenting author: Salem Malikic, Simon Fraser University, Canada

Ibrahim Numanagić, Simon Fraser University, Canada
Victoria Pratt, Indiana University School of Medicine, United States
Todd Skaar, IUPUI, United States
David A. Flockhart, Indiana University School of Medicine, United States
S. Cenk Sahinalp, Simon Fraser University, Canada

Session Chair: Reinhard Schneider

Presentation Overview:
Motivation: CYP2D6 is a highly polymorphic gene which encodes the CYP2D6 enzyme, involved in the metabolism of 20-25% of all clinically prescribed drugs and other xenobiotics in the human body. CYP2D6 genotyping is recommended prior to treatment decisions involving one or more of the numerous drugs sensitive to CYP2D6 allelic composition. In this context, High Throughput Sequencing (HTS) technologies provide a promising time-efficient and cost-effective alternative to currently used genotyping techniques. In order to achieve accurate interpretation of HTS data, however, one needs to overcome several obstacles such as high sequence similarity and genetic recombinations between CYP2D6 and the evolutionarily related pseudogenes CYP2D7 and CYP2D8, high copy number variation among individuals, and short read lengths generated by HTS technologies.

Results: In this work, we present the first algorithm to computationally infer CYP2D6 genotype at basepair resolution from HTS data. Our algorithm is able to resolve complex genotypes, including alleles that are the products of duplication, deletion and fusion events involving CYP2D6 and its evolutionarily related cousin CYP2D7. Through extensive experiments using simulated and real datasets we show that our algorithm accurately solves this important problem with potential clinical implications.

Availability: Cypiripi is available at http://sfu-compbio.github.io/cypiripi.
Contact: S. Cenk Sahinalp (cenk@sfu.ca)

TP029 (LT) - Exploring disease etiology through a large-scale mapping of deleterious genes to cell types
Theme: Systems
Date: Sunday, July 12, 2:40 pm - 3:00 pm
Room: Liffey Hall 2

Presenting author: Alex Cornish, Imperial College London, United Kingdom

Ioannis Filippis, Imperial College London, United Kingdom
Alessia David, Imperial College London, United Kingdom
Michael Sternberg, Imperial College London, United Kingdom

Session Chair: Nicolas Le Novere

Presentation Overview:
While the majority of diseases are manifested within a specific anatomical structure, known disease-associated alleles are often inherited and therefore present throughout the body. Understanding how these ubiquitous alleles produce localized disease is key to understanding the mechanisms that drive disease. We have developed a novel approach, called gene set compactness (GSC), that contrasts the relative positions of disease-associated genes on cell type-specific interactomes to identify the cell types most likely to be affected by the alleles. Cell type-specific interactomes were created through the integration of protein-protein interaction (PPI) data and cell type-specific expression data from the FANTOM5 project. We conducted text-mining of the PubMed database to produce an independent map of disease-associated cell types, which we used to validate our method. Our method identifies previously suggested associations, along with associations that warrant further study. These include mast cells and multiple sclerosis (MS), a population of cells that is currently being targeted in an MS phase 2 clinical trial. Furthermore, we used the associations identified by our method to construct a pathogenic cell type-based diseasome, offering insight into diseases linked by common etiology. The dataset produced represents the first large-scale mapping of diseases to their pathogenic cell types. Overall, we demonstrate that the GSC method links disease-associated genes to the phenotypes they produce, one of the key goals of systems biology.

TP030 (HT) - 3D hotspots of recurrent retroviral insertions reveal long-range interactions with cancer genes
Theme: Genes / Disease
Date: Sunday, July 12, 2:40 pm - 3:00 pm
Room: The Liffey B

Presenting author: Jeroen de Ridder, Delft University of Technology, Netherlands

Sepideh Babaei, Delft University of Technology, Netherlands
Waseem Akhtar, Netherlands Cancer Institute, Netherlands
Johann de Jong, Netherlands Cancer Institute, Netherlands
Marcel Reinders, Delft University of Technology, Netherlands

Session Chair: Janet Kelso

Presentation Overview:
Genomically distal mutations can contribute to deregulation of cancer genes by engaging in chromatin interactions. To study this, we overlay viral cancer-causing insertions obtained in a murine retroviral insertional mutagenesis screen with genome-wide chromatin conformation capture data. In this talk, we show that insertions tend to cluster in 3D hotspots within the nucleus. The identified hotspots are significantly enriched for known cancer genes, and bear the expected characteristics of bona fide regulatory interactions, such as enrichment for transcription factor binding sites. Additionally, we observe a striking pattern of mutually exclusive integration. This is an indication that insertions in these loci target the same gene, either in their linear genomic vicinity or in their 3D spatial vicinity. Our findings shed new light on the repertoire of targets obtained from insertional mutagenesis screening and underline the importance of considering the genome as a 3D structure when studying effects of genomic perturbations.

TP031 (PT) - Dynamic changes of RNA-sequencing expression for precision medicine: N-of-1-pathways Mahalanobis distance within pathways of single subjects predicts breast cancer survival
Theme: Disease
Date: Sunday, July 12, 3:30 pm - 3:50 pm
Room: The Auditorium

Presenting author: A. Grant Schissler, The University of Arizona, United States

Vincent Gardeux, The University of Arizona, United States
Qike Li, The University of Arizona, United States
Ikbel Achour, The University of Arizona, United States
Haiquan Li, The University of Arizona, United States
Walter W Piegorsch, The University of Arizona, United States
Yves A Lussier, The University of Arizona, United States

Session Chair: Yana Bromberg

Presentation Overview:
Motivation: The conventional approach to personalized medicine relies on molecular data analytics across multiple patients. The path to precision medicine lies with molecular data analytics that can discover interpretable single-subject signals (N-of-1). We developed a global framework, N-of-1-pathways, for a mechanistic-anchored approach to single-subject gene expression data analysis. We previously employed a metric that could prioritize the statistical significance of a deregulated pathway in single subjects; however, it lacked quantitative interpretability (e.g., the equivalent to a gene expression fold-change).

Results: In this study, we extend our previous approach with the application of statistical Mahalanobis distance to quantify personal pathway-level deregulation. We demonstrate that this approach, N-of-1-pathways Paired Samples Mahalanobis Distance (N-OF-1-PATHWAYS-MD), detects deregulated pathways (empirical simulations), while not inflating the false positive rate, using a study with biological replicates. Finally, we establish that N-OF-1-PATHWAYS-MD scores are biologically significant, clinically relevant, and predictive of breast cancer survival (p<0.05, n=80 invasive carcinoma; TCGA RNA-sequences).
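For readers unfamiliar with the Mahalanobis distance mentioned above, the following is a minimal sketch of the general two-dimensional definition, d = sqrt((x - m)^T S^-1 (x - m)), where m is a mean vector and S a covariance matrix. It is not the N-of-1-pathways implementation; the function name and the example values are hypothetical.

```python
import math


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2D point x from `mean` under a 2x2 covariance `cov`."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    quad = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.sqrt(quad)


# With an identity covariance the distance reduces to the Euclidean distance.
print(mahalanobis_2d((3.0, 4.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))))
# A larger variance along the first axis shrinks distances measured along that axis.
print(mahalanobis_2d((2.0, 0.0), (0.0, 0.0), ((4.0, 0.0), (0.0, 1.0))))
```

Unlike a plain Euclidean distance, this measure scales each direction by the observed variability, which is one reason it lends itself to a quantitative, fold-change-like reading.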

Conclusion: N-of-1-pathways MD provides a practical approach towards precision medicine. The method generates the magnitude and the biological significance of personal deregulated pathway results derived solely from the patient’s transcriptome. These pathways offer opportunities for deriving clinically actionable decisions that have the potential to complement the clinical interpretability of personal polymorphisms obtained from DNA acquired or inherited polymorphisms and mutations. In addition, it offers an opportunity for applicability to diseases in which DNA changes may not be relevant, and thus expands the “interpretable ‘omics” of single subjects (e.g. personalome).

Availability: http://www.lussierlab.net/publications/N-of-1-pathways
Contact: yves@email.arizona.edu

TP032 (HT) - Insights into Secondary Metabolism from a Global Analysis of Prokaryotic Biosynthetic Gene Clusters
Theme: Genes / Proteins
Date: Sunday, July 12, 3:30 pm - 3:50 pm
Room: The Liffey A

Presenting author: Marnix Medema, Wageningen University, Netherlands

Peter Cimermancic, University of California, San Francisco, United States
Jan Claesen, University of California, San Francisco, United States
Kenji Kurita, University of California, Santa Cruz, United States
Eriko Takano, University of Manchester, United Kingdom
Andrej Sali, University of California, San Francisco, United States
Roger Linington, University of California, Santa Cruz, United States
Michael Fischbach, University of California, San Francisco, United States

Session Chair: Reinhard Schneider

Presentation Overview:
Bacterial secondary metabolism is of major importance to society, as it is the source of large numbers of antibiotics, anticancer agents, and other important bioactive compounds. The genes encoding the biosynthetic pathways to make these molecules are usually grouped together on the chromosome in so-called biosynthetic gene clusters (BGCs). In our recent paper (Cell 158: 412-421, 2014), we describe a novel algorithm to effectively identify BGCs, and apply this to perform a systematic analysis of BGCs throughout the prokaryotic tree of life. Network analysis of the predicted BGCs revealed numerous large gene cluster families, most of which are uncharacterized. We experimentally characterized the largest of these, which is widespread among bacteria and encodes the biosynthesis of molecules that appear to protect their hosts against oxidative stress. Finally, a detailed evolutionary genomic analysis of all known and predicted BGCs revealed how the astonishing molecular diversity of microbial secondary metabolism continuously evolves.

TP033 (PT) - Integrative Random Forest for Gene Regulatory Network Inference
Theme: Systems
Date: Sunday, July 12, 3:30 pm - 3:50 pm
Room: Liffey Hall 2

Presenting author: Francesca Petralia, Icahn School of Medicine at Mount Sinai, United States

Pei Wang, Icahn School of Medicine at Mount Sinai, United States
Jialiang Yang, Icahn School of Medicine at Mount Sinai, United States
Zhidong Tu, Icahn School of Medicine at Mount Sinai, United States

Session Chair: Nicolas Le Novere

Presentation Overview:

TP034 (PT) - In silico phenotyping via co-training for improved phenotype prediction from genotype
Theme: Genes / Disease
Date: Sunday, July 12, 3:30 pm - 3:50 pm
Room: The Liffey B

Presenting author: Damian Roqueiro, ETH Zurich, Switzerland

Menno Witteveen, ETH Zurich, Switzerland
Verneri Anttila, Broad Institute of MIT and Harvard, United States
Gisela Terwindt, Leiden University Medical Center, Netherlands
Arn van den Maagdenberg, Leiden University Medical Center, Netherlands
Karsten Borgwardt, ETH Zurich, Switzerland

Session Chair: Janet Kelso

Presentation Overview:
Motivation: Predicting disease phenotypes from genotypes is a key challenge in medical applications in the postgenomic era. Large training datasets of patients that have been both genotyped and phenotyped are the key requisite when aiming for high prediction accuracy. With current genotyping projects producing genetic data for hundreds of thousands of patients, large-scale phenotyping has become the bottleneck in disease phenotype prediction.

Results: Here we present an approach for imputing missing disease phenotypes given the genotype of a patient. Our approach is based on co-training, which predicts the phenotype of unlabeled patients based on a second class of information, e.g. clinical health record information. Augmenting training datasets by this type of in silico phenotyping can lead to significant improvements in prediction accuracy. We demonstrate this on a dataset of patients with two diagnostic types of migraine, termed migraine with aura and migraine without aura, from the International Headache Genetics Consortium.

Conclusions: Imputing missing disease phenotypes for patients via co-training leads to larger training datasets and improved prediction accuracy in phenotype prediction.

TP035 (PT) - FERAL: Network Based Classifier with Application to Breast Cancer Outcome Prediction
Theme: Disease
Date: Sunday, July 12, 3:50 pm - 4:10 pm
Room: The Auditorium

Presenting author: Amin Allahyar, Delft University of Technology, Netherlands

Jeroen De Ridder, Delft University of Technology, Netherlands

Session Chair: Yana Bromberg

Presentation Overview:
Motivation: Breast cancer outcome prediction based on gene expression profiles is an important strategy for personalized patient care. To improve on the performance and marker consistency of the initial molecular classifiers, Network-based Outcome Prediction methods (NOPs) have been proposed. In spite of the initial claims, recent studies revealed that neither performance nor consistency can be improved using these methods. NOPs typically rely on the construction of meta-genes by averaging the expression of several genes connected in a network that encodes protein interactions or pathway information. In this paper, we expose several fundamental issues in NOPs that impede prediction power and the consistency of discovered markers, and that obscure biological interpretation.

Results: To overcome these issues, we propose FERAL, a network-based classifier that hinges upon the Sparse Group Lasso, which performs simultaneous selection of marker genes and training of the prediction model. An important feature of FERAL, and a significant departure from existing NOPs, is that it uses multiple operators to summarize genes into meta-genes. This gives the classifier the opportunity to select the most relevant meta-gene for each gene set. Extensive evaluation revealed that the discovered markers are markedly more stable across independent datasets. Moreover, interpretation of the marker genes detected by FERAL reveals valuable mechanistic insight into the aetiology of breast cancer.
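The idea of summarizing a gene set with several operators rather than a single average can be sketched as follows. This is an illustration only: the abstract does not list the operators FERAL uses, so the four shown here (mean, median, minimum, maximum), the gene names and the expression values are assumptions.

```python
from statistics import mean, median

# Hypothetical operator set; the operators actually used by FERAL are not given here.
OPERATORS = {"mean": mean, "median": median, "min": min, "max": max}


def meta_genes(expression, gene_set):
    """Summarize one sample's expression over a gene set, one meta-gene per operator."""
    values = [expression[gene] for gene in gene_set]
    return {name: op(values) for name, op in OPERATORS.items()}


# Toy expression values for a single sample and a three-gene set.
sample = {"G1": 2.0, "G2": 4.0, "G3": 9.0}
print(meta_genes(sample, ["G1", "G2", "G3"]))
```

A downstream sparse classifier can then choose, per gene set, whichever of these candidate meta-genes is most informative, instead of being restricted to the average.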

TP036 (HT) - Understanding operon evolution using an event-driven model and phylogenetic visualizations
Theme: Genes / Proteins
Date: Sunday, July 12, 3:50 pm - 4:10 pm
Room: The Liffey A

Presenting author: Iddo Friedberg, Miami University, United States

David Ream, Miami University, United States
Asma Bankapur, Miami University, United States

Session Chair: Reinhard Schneider

Presentation Overview: Show
Gene blocks are genes co-located on the chromosome. In many cases, genes blocks are conserved between bacterial species, sometimes as operons, when genes are co-transcribed. The conservation is rarely absolute: gene loss, gain, duplication,
block splitting, and block fusion are frequently observed. An open question in bacterial molecular evolution is that of the formation and breakup of gene blocks, for which several models have been proposed. These models, however, are not generally applicable to all types of gene blocks, and consequently cannot be used to broadly compare and study gene block evolution. To address this problem we introduce an event-based
method for tracking gene block evolution in bacteria.

In my talk I will explain this method and demonstrate a new visualization technique we call phylomatrices. I will show how we can easily gauge operon conservation and discover interesting clade-based aberrations as well as horizontal gene transfers.

TP037 (PT) - Gene network inference by fusing data from diverse distributions
Theme: Systems
Date: Sunday, July 12, 3:50 pm - 4:10 pm
Room: Liffey Hall 2

Presenting author: Marinka Zitnik, University of Ljubljana, Slovenia

Blaz Zupan, University of Ljubljana, Slovenia

Session Chair: Nicolas Le Novere

Presentation Overview:
Motivation: Markov networks are undirected graphical models that are widely used to infer relations between genes from experimental data. Their state-of-the-art inference procedures assume the data arise from a Gaussian distribution. High-throughput omics data, such as that from next generation sequencing, often violates this assumption. Furthermore, when collected data arise from multiple related but otherwise nonidentical distributions, their underlying networks are likely to have common features. New principled statistical approaches are needed that can deal with different data distributions and jointly consider collections of data sets.

Results: We present FuseNet, a Markov network formulation that infers networks from a collection of nonidentically distributed data sets. Our approach is computationally efficient and general: given any number of distributions from an exponential family, FuseNet represents model parameters through shared latent factors that define neighborhoods of network nodes. In a simulation study we demonstrate good predictive performance of FuseNet in comparison to several popular graphical models. We show its effectiveness in an application to breast cancer RNA-sequencing and somatic mutation data, a novel application of graphical models. Fusion of data sets offers substantial gains relative to inference of separate networks for each data set. Our results demonstrate that network inference methods for non-Gaussian data can help in accurate modeling of the data generated by emergent high-throughput technologies.

TP038 (HT) - The human splicing code reveals new insights into the genetic determinants of disease
Theme: Genes / Disease
Date: Sunday, July 12, 3:50 pm - 4:10 pm
Room: The Liffey B

Presenting author: Brendan Frey, University of Toronto, Canada

Hui Xiong, University of Toronto, Canada
Babak Alipanahi, University of Toronto, Canada
Leo Lee, University of Toronto, Canada
Hannes Bretschneider, University of Toronto, Canada
Daniele Merico, University of Toronto, Canada
Ryan Yuen, University of Toronto, Canada
Yimin Hua, University of Toronto, Canada
Serge Gueroussov, University of Toronto, Canada
Tim Hughes, University of Toronto, Canada
Quaid Morris, University of Toronto, Canada
Yoseph Barash, University of Toronto, Canada
Nebojsa Jojic, University of Toronto, Canada
Steve Scherer, University of Toronto, Canada
Ben Blencowe, University of Toronto, Canada

Session Chair: Janet Kelso

Presentation Overview:
To facilitate precision medicine and whole-genome annotation, we developed a machine-learning technique that scores how strongly genetic variants affect RNA splicing, whose alteration contributes to many diseases. Analysis of more than 650,000 intronic and exonic variants revealed widespread patterns of mutation-driven aberrant splicing. Intronic disease mutations that are more than 30 nucleotides from any splice site alter splicing nine times as often as common variants, and missense exonic disease mutations that have the least impact on protein function are five times as likely as others to alter splicing. We detected tens of thousands of disease-causing mutations, including those involved in cancers and spinal muscular atrophy. Examination of intronic and exonic variants found using whole-genome sequencing of individuals with autism revealed misspliced genes with neurodevelopmental phenotypes. Our approach provides evidence for causal variants and should enable new discoveries in precision medicine.

TP039 (PT) - Integrating Different Data Types by Regularized Unsupervised Multiple Kernel Learning with Application to Cancer Subtype Discovery
Theme: Disease
Date: Sunday, July 12, 4:10 pm - 4:30 pm
Room: The Auditorium

Presenting author: Nora Katharina Speicher, Max Planck Institute for Informatics, Germany

Nico Pfeifer, Max Planck Institute for Informatics, Germany

Session Chair: Yana Bromberg

Presentation Overview:
Despite ongoing cancer research, available therapies are still limited in quantity and effectiveness, and making treatment decisions for individual patients remains a hard problem. Established subtypes, which help guide these decisions, are mainly based on individual data types. However, the analysis of multidimensional patient data involving the measurements of various molecular features could reveal intrinsic characteristics of the tumor. Large-scale projects accumulate this kind of data for various cancer types, but we still lack the computational methods to reliably integrate this information in a meaningful manner. Therefore, we apply and extend current multiple kernel learning approaches for dimensionality reduction. On the one hand, we add a regularization term to avoid overfitting during the optimization procedure; on the other hand, we show that one can even use several kernels per data type and thereby relieve the user of having to choose the best kernel functions and kernel parameters for each data type beforehand.

We have identified biologically meaningful subgroups for five different cancer types. Survival analysis has revealed significant differences between the survival times of the identified subtypes, with P-values comparable to or even better than those of state-of-the-art methods. Moreover, our resulting subtypes reflect combined patterns from the different data sources, and we demonstrate that input kernel matrices with only little information have less impact on the integrated kernel matrix. Our subtypes show different responses to specific therapies, which could eventually assist in treatment decision making.

TP040 (PT) - Deconvolving Molecular Signatures of Interactions Between Microbial Colonies
Theme: Genes / Proteins
Date: Sunday, July 12, 4:10 pm - 4:30 pm
Room: The Liffey A

Presenting author: Yeu-Chern Harn, University of North Carolina, Chapel Hill, United States

Matthew Powers, University of North Carolina, Chapel Hill, United States
Elizabeth Shank, University of North Carolina, Chapel Hill, United States
Vladimir Jojic, University of North Carolina, Chapel Hill, United States

Session Chair: Reinhard Schneider

Presentation Overview:
Motivation: The interactions between microbial colonies through chemical signaling are not well understood. A microbial colony can use different molecules to inhibit or accelerate the growth of other colonies. A better understanding of the molecules involved in these interactions could lead to advancements in health and medicine. Imaging mass spectrometry (IMS) applied to co-cultured microbial communities aims to capture the spatial characteristics of the colonies' molecular fingerprints. These data are high-dimensional and require computational analysis methods to interpret.

Results: Here we present a dictionary learning method that deconvolves spectra of different molecules from IMS data. We call this method MOLecular Dictionary Learning (MOLDL). Unlike standard dictionary learning methods, which assume Gaussian-distributed data, our method uses the Poisson distribution to capture the count nature of the mass spectrometry data. Also, our method incorporates universally applicable information on common ion types of molecules in MALDI mass spectrometry. This greatly reduces model parametrization and increases deconvolution accuracy by eliminating spurious solutions. Moreover, our method leverages the spatial nature of IMS data by assuming that nearby locations share similar abundances, thus avoiding overfitting to noise. Tests on simulated data sets show that this method has good performance in recovering molecule dictionaries. We also tested our method on real data measured on a microbial community composed of two species. We confirmed through follow-up validation experiments that our method recovered true and complete signatures of molecules. These results indicate that our method can discover molecules in IMS data reliably, and hence can help advance the study of interaction of microbial colonies.

Availability: The code used in this paper is available at: https://github.com/frizfealer/IMS_project

TP041 (PT) - Inferring orthologous gene regulatory networks using interspecies data fusion
Theme: Systems
Date: Sunday, July 12, 4:10 pm - 4:30 pm
Room: Liffey Hall 2

Presenting author: Christopher Penfold, University of Warwick, United Kingdom

Jonathan Millar, University of Warwick, United Kingdom
David Wild, University of Warwick, United Kingdom

Session Chair: Nicolas Le Novere

Presentation Overview:
Motivation: The ability to jointly learn gene regulatory networks (GRNs) in, or leverage GRNs between, related species would allow the vast amount of legacy data obtained in model organisms to inform the GRNs of more complex, or economically or medically relevant counterparts. Examples include transferring information from Arabidopsis thaliana into related crop species for food security purposes, or from mice into humans for medical applications. Here we develop two related Bayesian approaches to network inference that allow GRNs to be jointly inferred in, or leveraged between, several related species: in the first framework, network information is directly propagated between species; in the second, hierarchical approach, network information is propagated via an unobserved “hypernetwork”. In both frameworks information about network similarity is captured via graph kernels, with the networks additionally informed by species-specific time series gene expression data, when available, using Gaussian processes to model the dynamics of gene expression.

Results: Results on in silico benchmarks demonstrate that joint inference, and leveraging of known networks between species, offers better accuracy than stand-alone inference. The direct propagation of network information via the non-hierarchical framework is more appropriate when there are relatively few species, whilst the hierarchical approach is better suited when there are many species. Both methods are robust to small amounts of mislabelling of orthologues. Finally, the use of S. cerevisiae data and networks to inform inference of networks in the fission yeast S. pombe predicts a novel role in cell cycle regulation for Gas1 (SPAC19B12.02c), a 1,3-beta-glucanosyltransferase.

Availability: MATLAB code is available from a temporary anonymous URL for peer review: http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/

TP042 (HT) - Associating enhancers with TH2 memory differentiation and asthma susceptibility
Theme: Genes / Disease
Date: Sunday, July 12, 4:10 pm - 4:30 pm
Room: Liffey Hall 2

Presenting author: Lukas Chavez, German Cancer Research Institute, Germany

Session Chair: Janet Kelso

Presentation Overview:
A characteristic feature of asthma is the aberrant accumulation, differentiation or function of memory CD4(+) T cells that produce type 2 cytokines (TH2 cells). By mapping genome-wide histone modification profiles for subsets of T cells isolated from peripheral blood of healthy and asthmatic individuals, we identified enhancers with known and potential roles in the normal differentiation of human TH1 and TH2 cells. We discovered disease-specific enhancers in T cells that differ between healthy and asthmatic individuals. Enhancers that gained the histone H3 Lys4 dimethyl (H3K4me2) mark during TH2 cell development showed the highest enrichment for asthma-associated single nucleotide polymorphisms (SNPs), which supported a pathogenic role for TH2 cells in asthma. In silico analysis of cell-specific enhancers revealed transcription factors, microRNAs and genes potentially linked to human TH2 cell differentiation. Our results establish the feasibility and utility of enhancer profiling in well-defined populations of specialized cell types involved in disease pathogenesis.

TP043 (PT) - Reconstructing 16S rRNA genes in metagenomic data
Theme: Genes
Date: Monday, July 13, 10:10 am - 10:30 am
Room: The Liffey A

Presenting author: Yanni Sun, Michigan State University, United States

Jikai Lei, Michigan State University, United States
James Cole, Michigan State University, United States
Cheng Yuan, Michigan State University, United States

Session Chair: Jerome Waldispuhl

Presentation Overview:
Metagenomic data, which contain sequenced DNA reads of uncultured microbial species from environmental samples, provide a unique opportunity to thoroughly analyze microbial species that have never been identified before. Reconstructing 16S ribosomal RNA, a phylogenetic marker gene, is usually required to analyze the composition of the metagenomic data. However, the massive volume of data, high sequence similarity between related species, skewed microbial abundance, and lack of reference genes make 16S rRNA reconstruction difficult. Generic de novo assembly tools are not optimized for assembling 16S rRNA genes.

In this work, we introduce a targeted rRNA assembly tool, REAGO (REconstruct 16S ribosomal RNA Genes from metagenOmic data). It addresses the above challenges by combining secondary structure-aware homology search, properties of rRNA genes, and de novo assembly. Our experimental results show that our tool can correctly recover more rRNA genes than several popular generic metagenomic assembly tools and specially designed rRNA construction tools.

Availability: The source code of REAGO is freely available on GitHub.
Contact: chengy@msu.edu and yannisun@msu.edu

TP044 (PT) - Inferring parental genomic ancestries using pooled semi-Markov processes
Theme: Systems
Date: Monday, July 13, 10:10 am - 10:30 am
Room: The Liffey B

Presenting author: James Zou, Microsoft Research, United States

Eran Halperin, Tel Aviv University, Israel
Esteban Burchard, University of California San Francisco, United States
Sriram Sankararaman, Harvard Medical School, United States

Session Chair: Hidde de Jong

Presentation Overview:
Motivation: A basic problem of broad public and scientific interest is to use the DNA of an individual to infer the genomic ancestries of the parents. In particular, we are often interested in the fraction of each parent's genome that comes from specific ancestries (e.g. European, African, Native American, etc.). This has many applications ranging from understanding the inheritance of ancestry-related risks and traits to quantifying human assortative mating patterns.

Results: We model the problem of parental genomic ancestry inference as a pooled semi-Markov process. We develop a general mathematical framework for pooled semi-Markov processes and construct efficient inference algorithms for these models. Applying our inference algorithm to genotype data from 231 Mexican trios and 258 Puerto Rican trios where we have the true genomic ancestry of each parent, we demonstrate that our method accurately infers parameters of the semi-Markov processes and parents' genomic ancestries. We additionally validated the method on simulations. Our pooled semi-Markov process model and inference algorithms may be of independent interest in other settings in genomics and machine learning.

TP045 (PT) - A hierarchical Bayesian model for flexible module discovery in three-way time series data
Theme: Disease / Other
Date: Monday, July 13, 10:10 am - 10:30 am
Room: Liffey Hall 2

Presenting author: David Amar, Tel Aviv University, Israel

Daniel Yekutieli, Tel Aviv University, Israel
Adi Maron-Katz, Tel Aviv University, Israel
Talma Hendler, Tel Aviv University, Israel
Ron Shamir, Tel Aviv University, Israel

Session Chair: Yves Moreau

Presentation Overview:
Motivation: Detecting modules of coordinated activity is fundamental in the analysis of large biological studies. For two-dimensional data (e.g. genes x patients) this is often done via clustering or biclustering. More recently, studies monitoring patients over time have added another dimension. Analysis is much more challenging in this case, especially when time measurements are not synchronized. New methods that can analyze 3-way data are thus needed.

Results: We present a new algorithm for finding coherent and flexible modules in 3-way data. Our method can identify both core modules that appear in multiple patients and patient-specific augmentations of these core modules that contain additional genes. Our algorithm is based on a hierarchical Bayesian data model and Gibbs sampling. The algorithm outperforms extant methods on both simulated and real data. The method successfully dissected key components of septic shock response from time series measurements of gene expression. Detected patient-specific module augmentations were informative for disease outcome. In analyzing brain fMRI time series of subjects at rest, it detected the pertinent brain regions involved.

Availability: R code and data are available at http://acgt.cs.tau.ac.il/twigs/

TP046 (HT) - A genome-wide map of hyper-edited RNA reveals numerous new sites
Theme: Genes
Date: Monday, July 13, 10:30 am - 10:50 am
Room: The Liffey A

Presenting author: Erez Levanon, Bar-Ilan University, Israel

Hagit Porath, Bar-Ilan University, Israel
Shai Carmi, Columbia University, United States

Session Chair: Jerome Waldispuhl

Presentation Overview:
Adenosine-to-inosine editing is one of the most frequent post-transcriptional modifications, manifested as A-to-G mismatches when comparing RNA sequences with their source DNA. Recently, a number of RNA-seq data sets have been screened for the presence of A-to-G editing, and hundreds of thousands of editing sites have been identified. Here we show that existing screens missed the majority of sites by ignoring reads with excessive ('hyper') editing that do not easily align to the genome. We show that careful alignment and examination of the unmapped reads in human RNA-seq studies reveal numerous new sites, usually many more than originally discovered, and in precisely those regions that are most heavily edited. Specifically, we more than double the number of detected sites in several published screens. We also identify thousands of new sites in mouse, rat, opossum and fly. Our results establish that hyper-editing events account for the majority of editing sites.

TP047 (PT) - Adapt-Mix: Learning local genetic correlation structure improves summary statistics based analyses
Theme: Systems
Date: Monday, July 13, 10:30 am - 10:50 am
Room: The Liffey B

Presenting author: Brielin Brown, University of California at Berkeley, United States

Celeste Eng, University of California San Francisco, United States
Scott Huntsman, University of California San Francisco, United States
Donglei Hu, University of California San Francisco, United States
Dara Torgerson, University of California San Francisco, United States
Esteban Burchard, University of California San Francisco, United States
Noah Zaitlen, University of California San Francisco, United States
Danny Park, University of California San Francisco, United States

Session Chair: Hidde de Jong

Presentation Overview:
Motivation: Approaches to identifying new risk loci, training risk prediction models, imputing untyped variants, and fine-mapping causal variants from summary statistics of genome-wide association studies are playing an increasingly important role in the human genetics community. Current summary statistics based methods rely on global “best guess” reference panels in order to model the genetic correlation structure of the dataset being studied. This approach, especially in admixed populations, has the potential to produce misleading results, ignores variation in local structure, and is not feasible when appropriate reference panels are missing or small. Here we develop a method, Adapt-Mix, that combines information across all available reference panels to produce estimates of local genetic correlation structure for summary statistics based methods in arbitrary populations.

Results: We applied Adapt-Mix to estimate the genetic correlation structure of both admixed and non-admixed individuals using simulated and real data. We evaluated our method by measuring the performance of two summary statistics based methods: imputation and joint-testing. When using our method as opposed to the current standard of “best guess” reference panels, we observed a 28% decrease in mean-squared error for imputation and a 73.7% decrease in mean-squared error for joint-testing.

Availability: Our method is publicly available in a software package called ADAPT-Mix available at https://github.com/dpark27/adapt mix

TP048 (LT) - CANDL: Coarsely Aligning Networks with Diffusion and Landmarks
Theme: Disease / Other
Date: Monday, July 13, 10:30 am - 10:50 am
Room: Liffey Hall 2

Presenting author: Benjamin Hescott, Tufts University, United States

Inbar Fried, Tufts University, United States
Anthony Cannistra, Tufts University, United States
Carter Casey, Tufts University, United States
Adam Piel, Tufts University, United States
Mark Crovella, Boston University, United States
Benjamin Hescott, Tufts University, United States

Session Chair: Yves Moreau

Presentation Overview:
In this work we shift focus in the global network alignment problem, moving away from identifying local structural similarities, and focusing instead on finding coherent, functionally related groups of genes across species. We introduce CANDL (Coarsely Aligning Networks with Diffusion and Landmarks). Unlike previous methods that seek to conserve local motifs, CANDL identifies neighborhoods that are functionally similar. To do this, CANDL incorporates two key innovations. First, it uses a small set of known homologs to establish a set of landmarks that form the basis for a metric embedding of network nodes. Second, CANDL embeds the network using metrics known to capture functionally relevant network structure, namely random walk commute time and eigenvectors of the Laplacian heat kernel. We show that CANDL captures functionally coherent neighborhood mappings considerably better than current state-of-the-art aligners. To do so we introduce two new validation tests based on functional coherence: cross validation using known homologs, and similarity of GO terms in neighborhoods. In the process, we also identify and quantify previously overlooked limitations of structural network alignment techniques that arise due to network automorphisms.
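
To make the landmark idea concrete, here is a minimal sketch of describing every node by its random-walk commute time to a few landmark nodes. It uses the standard identity C(i, j) = vol(G) · (L⁺ᵢᵢ + L⁺ⱼⱼ − 2 L⁺ᵢⱼ), where L⁺ is the pseudoinverse of the graph Laplacian and vol(G) is the sum of degrees. The function names, the toy path graph and the choice of landmarks are assumptions for illustration; this covers a single network only and is not CANDL's actual alignment procedure.

```python
import numpy as np


def commute_times(adj):
    """All-pairs random-walk commute times for a connected undirected graph."""
    degrees = adj.sum(axis=1)
    laplacian = np.diag(degrees) - adj
    l_pinv = np.linalg.pinv(laplacian)  # Moore-Penrose pseudoinverse of L
    diag = np.diag(l_pinv)
    volume = degrees.sum()              # sum of degrees (2 * edges if unweighted)
    return volume * (diag[:, None] + diag[None, :] - 2.0 * l_pinv)


def landmark_embedding(adj, landmarks):
    """Represent each node by its commute time to each landmark node."""
    return commute_times(adj)[:, landmarks]


# Toy path graph 0 - 1 - 2 - 3 with nodes 0 and 3 as hypothetical landmarks.
adj = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)
embedding = landmark_embedding(adj, landmarks=[0, 3])
```

If the landmarks in two networks are known homolog pairs, nodes from both networks are described in comparable coordinates, so they can be compared by distance in this embedding rather than by local motif matching.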

TP049 (HT) - Simultaneous reconstruction of microRNA-target and ceRNA networks
Theme: Genes
Date: Monday, July 13, 10:50 am - 11:10 am
Room: The Liffey A

Presenting author: Pavel Sumazin, Baylor College of Medicine, United States

Session Chair: Jerome Waldispuhl

Presentation Overview:
We introduce a method for simultaneous prediction of microRNA-target interactions and their mediated competitive endogenous RNA (ceRNA) interactions. Using high-throughput validation assays in breast cancer cell lines, we show that our integrative approach significantly improves on microRNA-target prediction accuracy as assessed by both mRNA and protein level measurements. Our biochemical assays support nearly 500 microRNA-target interactions with evidence for regulation in breast-cancer tumors. Moreover, these assays constitute the most extensive validation platform for computationally inferred networks of microRNA-target interactions in breast-cancer tumors, providing a useful benchmark to ascertain future improvements.

TP050 (HT) - Natural genetic variation impacts expression levels of coding, non-coding, and antisense transcripts in fission yeast
Theme: Systems
Date: Monday, July 13, 10:50 am - 11:10 am
Room: The Liffey B

Presenting author: Mathieu Clément-Ziza, University of Cologne, Germany

Francesc X Marsellach, University College London, United Kingdom
Sandra Codlin, University College London, United Kingdom
Manos A Papadakis, Technical University of Denmark, Denmark
Susanne Reinhardt, Technische Universität Dresden, Germany
Maria Rodriguez-Lopez, University College London, United Kingdom
Stuart Martin, University College London, United Kingdom
Samuel Marguerat, Imperial College London, United Kingdom
Alexander Schmidt, University of Basel, Switzerland
Eunhye Grace Lee, University College London, United Kingdom
Christopher T Workman, Technical University of Denmark, Denmark
Jürg Bähler, University College London, United Kingdom
Andreas Beyer, University of Cologne, Germany

Session Chair: Hidde de Jong

Presentation Overview:
Our current understanding of how natural genetic variation affects gene expression beyond well annotated coding genes is still limited. The use of deep sequencing technologies for the study of expression quantitative trait loci (eQTLs) has the potential to close this gap. Here, we generated the first recombinant strain library for fission yeast and conducted an RNA-seq-based QTL study of the coding, non-coding, and antisense transcriptomes. We show that the frequency of distal effects (trans-eQTLs) greatly exceeds the number of local effects (cis-eQTLs) and that non-coding RNAs are as likely to be affected by eQTLs as protein-coding RNAs. We identified a genetic variation of swc5 that modifies the levels of many RNAs, with effects on both sense and antisense transcription, and downstream effects on the histone composition at promoters. The strains, methods, and datasets generated here provide a rich resource for future studies.

TP051 (HT) - Bayesian inference of viral fitness landscapes in the quasispecies model
Theme: Disease / Other
Date: Monday, July 13, 10:50 am - 11:10 am
Room: Liffey Hall 2

Presenting author: David Seifert, ETH Zurich, Switzerland

Francesca Di Giallonardo, University Hospital Zurich, Switzerland
Karin J. Metzner, University Hospital Zurich, Switzerland
Huldrych F. Günthard, University Hospital Zurich, Switzerland
Niko Beerenwinkel, ETH Zurich, Switzerland

Session Chair: Yves Moreau

Presentation Overview:
QuasiFit is a Bayesian MCMC sampler for inferring intra-host viral fitness landscapes from next-generation sequencing data. To estimate fitness, QuasiFit uses cross-sectional genetic data and assumes the viral quasispecies to be in mutation-selection equilibrium. With the inferred posterior fitness distribution, effects such as epistasis and neutral genotype networks can be determined, which will be helpful in judging which viral strains are highly fit and driving intra-host evolution. We applied QuasiFit to infer the viral fitness landscapes in two HIV-infected patients. By using intra-host data, QuasiFit enables learning of host-specific, personalized viral fitness landscapes.
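
The mutation-selection equilibrium assumption can be illustrated with a small deterministic sketch. In the quasispecies model the equilibrium genotype frequencies p satisfy Q · diag(f) · p = φ · p, where Q is the mutation matrix and f the fitness vector, so relative fitness can be read off exact equilibrium frequencies as f ∝ (Q⁻¹p) / p. The three-genotype mutation matrix, the fitness values and the function names below are toy assumptions; QuasiFit itself is a Bayesian sampler working from noisy sequencing counts, which this sketch does not attempt to reproduce.

```python
import numpy as np


def equilibrium(Q, fitness, n_iter=2000):
    """Power iteration for the dominant eigenvector of W = Q @ diag(fitness)."""
    W = Q * fitness[None, :]  # scales column j of Q by fitness[j]
    p = np.full(len(fitness), 1.0 / len(fitness))
    for _ in range(n_iter):
        p = W @ p
        p /= p.sum()          # keep p a frequency vector
    return p


def fitness_from_equilibrium(Q, p):
    """Invert Q diag(f) p = phi p for f, normalized so the fittest type is 1."""
    f = np.linalg.solve(Q, p) / p
    return f / f.max()


# Toy setting: three genotypes, uniform mutation probability mu between them.
mu = 0.01
Q = np.full((3, 3), mu)
np.fill_diagonal(Q, 1.0 - 2.0 * mu)  # columns sum to 1
true_fitness = np.array([1.0, 0.8, 0.5])

p_eq = equilibrium(Q, true_fitness)
recovered = fitness_from_equilibrium(Q, p_eq)
```

With observed read counts instead of exact frequencies, p is uncertain, which is where a posterior distribution over fitness values becomes the natural output.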

TP052 (HT) - NGS-Logistics: federated analysis of NGS sequence variants across multiple locations.
Theme: Genes
Date: Monday, July 13, 11:40 am - 12:00 pm
Room: The Liffey A

Presenting author: Amin Ardeshirdavani, KU Leuven, Belgium

Erika Souche, KU Leuven, Belgium
Luc Dehaspe, KU Leuven, Belgium
Jeroen Van Houdt, KU Leuven, Belgium
Joris Robert Vermeesch, KU Leuven, Belgium
Yves Moreau, KU Leuven, Belgium

Session Chair: Jerome Waldispuhl

Presentation Overview:
As many personal genomes are now being sequenced, collaborative analysis of those genomes has become essential to effectively gain biomedical knowledge from those sequencing efforts. However, analysis of personal genomic data raises important confidentiality issues. We propose a methodology called NGS-Logistics for federated analysis of sequence variants from personal genomes that helps alleviate those problems. Our method allows querying the genome both for a set of samples to which the user has authorized direct access (the active data set) and for the whole set of samples. The query results are statistics that do not breach data confidentiality but allow further exploration of the data. Relevant samples outside the active data set can be identified through pseudonymous identifiers so that researchers can negotiate access to these samples with the authorized party. This approach minimizes the impact on data confidentiality while enabling powerful data analysis by gaining access to important rare samples.

TP053 (LT) - Co-analysis of transcriptome, exome, and protein interaction network information in cancers points to therapeutically targetable mutations
Theme: Systems / Disease
Date: Monday, July 13, 11:40 am - 12:00 pm
Room: The Liffey B

Presenting author: Sarah-Jane Schramm, The University of Sydney, Australia

Shila Ghazanfar, The University of Sydney, Australia
Sarah-Jane Schramm, The University of Sydney, Australia
John T. Ormerod, The University of Sydney, Australia
Graham J. Mann, The University of Sydney, Australia
Jean Yee Hwa Yang, The University of Sydney, Australia

Session Chair: Hidde de Jong

Presentation Overview:
A long-standing goal in cancer research is to describe the landscape of mutations responsible for neoplastic development and progression. Improved understanding of how gene and protein networks function in cancers would lead to identification of potential therapeutic targets, paving the way for advances in disease management at the level of individual patients. Using melanoma as a model disease, we recently found that differences in the coordination of gene co-expression among protein-protein interaction (PPI) networks were significantly associated (p<0.05) with patient survival. Moreover, these survival-related networks showed significant increases in the number of functional mutations present, relative to networks without such gene co-expression disruption. These findings suggest that increased functional mutation burden may be a pathogenic mechanism behind the differential network behavior observed. If true, these mutations would form a selectable basis of accumulation of disturbances during tumorigenesis, and be important drivers of disease progression/clinical outcome. Extending these analyses, we have recently shown in unpublished work that our original findings are reproducible in other cancers, including lung squamous cell carcinoma (p<0.02) and serous ovarian cancer (p<0.05). Subsequent literature-based analysis reveals these survival-related networks are highly relevant to the biology underlying tumor behaviour. These findings may guide the identification of therapeutically targetable mutations, including outside the exome.

TP054 (HT) - Finding Novel Molecular Connections between Developmental Processes and Disease
Theme: Disease / Other
Date: Monday, July 13, 11:40 am - 12:00 pm
Room: Liffey Hall 2

Presenting author: Donna Slonim, Tufts University, United States

Heather Wick, Johns Hopkins University, United States
Daniel Kee, Braintree Payments, United States
Keith Noto, Ancestry, Inc, United States
Jill Maron, Tufts University, United States
Donna Slonim, Tufts University, United States

Session Chair: Yves Moreau

Presentation Overview:
Experiences during early development can affect lifelong health and disease risk. In our study, we have identified significant and surprising links between diseases and several tissue-specific developmental processes. Our work relies on a novel approach whose strength comes from pooling disease genes across related diseases, overcoming problems posed by limited information about gene-disease associations. We demonstrate the efficacy of the pooling method by evaluation on withheld data. We further validate the links between developmental processes and disease by demonstrating that our results, collectively, recover expected connections, such as those between heart development and cardiovascular disorders. We also describe some of the more surprising connections we found, several of which are consistent with other molecular evidence or recent literature. Finally, we present a web-based application that enables users to perform the same analysis for any set of genes of interest, and includes a visualization tool for exploration of the results.

TP055 (HT) - The functional importance of synonymous mutations in cancer and microbes analyzed in massive genomic data sets
Theme: Genes
Date: Monday, July 13, 12:00 pm - 12:20 pm
Room: The Liffey A

Presenting author: Fran Supek, Centre for Genomic Regulation, Spain

Belén Miñana, Centre For Genomic Regulation, Barcelona, Spain
Juan Valcárcel, Centre For Genomic Regulation, Barcelona, Spain
Toni Gabaldón, Centre For Genomic Regulation, Barcelona, Spain
Ben Lehner, Centre For Genomic Regulation, Barcelona, Spain
Anita Kriško, Mediterranean Institute for Life Sciences, Split, Croatia
Tea Copić, Mediterranean Institute for Life Sciences, Split, Croatia

Session Chair: Jerome Waldispuhl

Presentation Overview:
Synonymous mutations do not change the encoded amino acids, but are known to have subtle effects on protein translation and on regulation of splicing. We have examined the prevalence of synonymous mutations among somatic changes catalogued across ~3800 human cancer genomes (Supek et al, Cell 2014). Oncogenes harbor an excess of synonymous mutations when compared against the broader gene set and intronic mutation rates. Such mutations were likely to alter exonic splicing enhancer/silencer motifs; RNA-Seq data indicated this leads to aberrantly spliced transcripts. Next, we analyzed the synonymous codon usage biases in 910 prokaryotic genomes (Krisko et al, Genome Biol 2014). Here, we found associations of codon biases within orthologous gene clusters to environmental preferences of microbes, and used this to predict the adaptive value of genes for aerobic, hot, or hypersaline environments. Out of 200 novel functional annotations for COG groups thus obtained, we experimentally validated 35/44 tested predictions.

TP056 (HT) - A multiscale statistical mechanical framework integrates biophysical and genomic data to assemble cancer networks.
Theme: Systems / Disease
Date: Monday, July 13, 12:00 pm - 12:20 pm
Room: The Liffey B

Presenting author: Mohammed AlQuraishi, Harvard Medical School, United States

Grigoriy Koytiger, Harvard Medical School, United States
Anne Jenney, Harvard Medical School, United States
Gavin MacBeath, Harvard Medical School, United States
Peter Sorger, Harvard Medical School, United States

Session Chair: Hidde de Jong

Presentation Overview:
Functional interpretation of genomic variation is critical to understanding human disease, but it remains difficult to predict the effects of specific mutations on protein interaction networks and the phenotypes they regulate. We describe an analytical framework based on multiscale statistical mechanics that integrates genomic and biophysical data to model the human SH2-phosphoprotein network in normal and cancer cells. We apply our approach to data in The Cancer Genome Atlas (TCGA) and test model predictions experimentally. We find that mutations mapping to phosphoproteins often create new interactions but that mutations altering SH2 domains result almost exclusively in loss of interactions. Some of these mutations eliminate all interactions, but many cause more selective loss, thereby rewiring specific edges in highly connected subnetworks. Moreover, idiosyncratic mutations appear to be as functionally consequential as recurrent mutations. By synthesizing genomic, structural and biochemical data, our framework represents a new approach to the interpretation of genetic variation.

TP057 (LT) - Uncovering the mechanisms modulating cardiac electrophysiology using systems genetics approaches in recombinant inbred rat strains
Theme: Disease / Other
Date: Monday, July 13, 12:00 pm - 12:20 pm
Room: Liffey Hall 2

Presenting author: Michiel Adriaens, Maastricht University, Netherlands

Aida Moreno-Moral, Imperial College London, United Kingdom
Elisabeth Lodder, AMC, Netherlands
Carol Ann Remme, AMC, Netherlands
Rianne Wolswinkel, AMC, Netherlands
Enrico Petretto, Imperial College London, United Kingdom
Stuart Cook, Imperial College London, United Kingdom
Connie Bezzina, AMC, Netherlands

Session Chair: Yves Moreau

Presentation Overview:
Genome-wide association studies have identified many common genetic variants impacting on susceptibility to cardiac arrhythmias and sudden cardiac death (SCD). But uncovering the underlying disease mechanisms remains a substantial challenge, as the required resources for the human heart are sparse and underpowered. Hence, the only means to paint the full picture is to complement insights derived from human studies with systems genetics approaches in statistically powerful animal models. In this study we use 29 BXH/HXB recombinant inbred (RI) rat strains, a strong model to uncover the mechanisms modulating cardiac electrical function. Prolonged ECG indices of conduction and repolarization are risk factors for cardiac arrhythmias and SCD, and here we combine such indices with genotyping and RNA-seq transcriptomics data. In this data we hunt for quantitative trait loci (QTL): genetic markers associated with changes in a quantitative trait, i.e. an ECG index or gene expression level. Using a Bayesian systems genetics framework, we identified multiple candidate genes and networks. One of these genes is Acbd4: a nearby genetic marker appears to modulate the expression of this gene (eQTL). Additionally, the same marker is associated with PR prolongation (ecgQTL). The protein product of Acbd4 plays a role in vesicle formation, deregulation of which is known to be linked to heart disease. Acbd4’s co-expression network is significantly positively correlated with PR duration and partly conserved in human, suggesting that the underlying mechanism may be of clinical relevance as well. Validation of our findings is currently ongoing.

TP058 (PT) - Robust reconstruction of gene expression profiles from reporter gene data using linear inversion
Theme: Genes
Date: Monday, July 13, 12:20 pm - 12:40 pm
Room: The Liffey A

Presenting author: Valentin Zulkower, INRIA Grenoble-Rhône-Alpes, France

Michel Page, INRIA Grenoble-Rhône-Alpes, IAE Grenoble, France
Delphine Ropers, INRIA Grenoble-Rhône-Alpes, UJF Grenoble, France
Johannes Geiselmann, INRIA Grenoble-Rhône-Alpes, UJF Grenoble, France
Hidde de Jong, INRIA Grenoble-Rhône-Alpes, France

Session Chair: Jerome Waldispuhl

Presentation Overview:
Motivation: Time-series observations from reporter gene experiments
are commonly used for inferring and analyzing dynamical models
of regulatory networks. The robust estimation of promoter activities
and protein concentrations from primary data is a difficult problem
due to measurement noise and the indirect relation between the
measurements and quantities of biological interest.

Results: We propose a general approach based on regularized linear
inversion to solve a range of estimation problems in the analysis of
reporter gene data, notably the inference of growth rate, promoter
activity, and protein concentration profiles. We evaluate the validity
of the approach using in-silico simulation studies, and observe
that the methods are more robust and less biased than indirect
approaches usually encountered in the experimental literature based
on smoothing and subsequent processing of the primary data. We
apply the methods to the analysis of fluorescent reporter gene data
acquired in kinetic experiments with Escherichia coli. The methods
are capable of reliably reconstructing time-course profiles of growth
rate, promoter activity, and protein concentration from weak and noisy
signals at low population volumes. Moreover, they capture critical
features of those profiles, notably rapid changes in gene expression
during growth transitions.

Availability: The methods described in this paper are made available
as a Python package (LGPL licence) and also accessible through a
fr/ibis/wellinverter.
Contact: Hidde.de-Jong@inria.fr

TP059 (HT) - Biological network modeling helps finding genetic determinants of metastatic colon cancer
Theme: Systems / Disease
Date: Monday, July 13, 12:20 pm - 12:40 pm
Room: The Liffey B

Presenting author: Inna Kuperstein, Institut Curie – U900 INSERM - Mines ParisTech, France

Maia Chanrion, Institut Curie-CNRS UMR 144, France
David Cohen, Institut Curie – U900 INSERM - Mines ParisTech, France
Emmanuel Barillot, Institut Curie – U900 INSERM - Mines ParisTech, France
Daniel Louvard, Institut Curie-CNRS UMR 144, France
Sylvie Robine, Institut Curie-CNRS UMR 144, France
Andrei Zinovyev, Institut Curie – U900 INSERM - Mines ParisTech, France

Session Chair: Hidde de Jong

Presentation Overview:
Epithelial-to-mesenchymal transition (EMT) initiates metastases in cancer; however, the key players in the process are still debated. We constructed a comprehensive map of the EMT signaling network and performed a structural analysis that highlighted the network's organization principles and allowed its complexity to be reduced to core regulatory routes. Using the reduced network, we compared combinations of single and double mutants for achieving the EMT phenotype, predicted that a combination of p53 knock-out and overexpression of Notch would induce EMT, and suggested the molecular mechanism. This prediction led to the generation of a colon cancer mouse model with metastases in distant organs. We confirmed in invasive human colon cancer samples that EMT markers are associated with modulation of Notch and p53 gene expression in a similar manner as in the mouse model, supporting a synergy between these genes to permit EMT induction. The computational and experimental approaches led to the discovery of a new metastasis mechanism in colon cancer.

TP060 (LT) - Relating Essential Proteins To Drug Side-Effects Using Canonical Component Analysis
Theme: Disease / Other
Date: Monday, July 13, 12:20 pm - 12:40 pm
Room: Liffey Hall 2

Presenting author: Tianyun Liu, Stanford University, United States

Session Chair: Yves Moreau

Presentation Overview:
We identified molecular mechanisms of drug side-effects by associating drugs to essential proteins using canonical component analysis.

TP061 (HT) - Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes
Theme: Disease
Date: Monday, July 13, 2:00 pm - 2:20 pm
Room: The Auditorium

Presenting author: Mark Leiserson, Brown University, United States

Fabio Vandin, Brown University, United States
Hsin-Ta Wu, Brown University, United States
Jason Dobson, Brown University, United States
Alexandra Papoutsaki, Brown University, United States
Beifang Niu, Washington University in St. Louis, United States
Michael McLellan, Washington University in St. Louis, United States
Michael Lawrence, Broad Institute of MIT and Harvard, United States
Abel Gonzalez-Perez, Pompeu Fabra University, Spain
David Tamborero, Pompeu Fabra University, Spain
Gregory Ryslik, Yale University, United States
Yuwei Cheng, Yale University, United States
Nuria Lopez-Bigas, Pompeu Fabra University, Spain
Li Ding, Washington University in St. Louis, United States
Benjamin Raphael, Brown University, United States

Session Chair: Paul Horton

Presentation Overview:
A key challenge in cancer genomics is to identify mutations that drive cancer in a cohort of tumor samples. These mutations often target genetic regulatory and signaling pathways and protein complexes, each including multiple genes. We present the HotNet2 (diffusion oriented subnetworks) algorithm for identifying significantly mutated subnetworks in a protein interaction network. HotNet2 uses an insulated heat diffusion process to simultaneously encode the local topology of a protein and its mutations when identifying significantly mutated (hot) subnetworks. We applied HotNet2 to The Cancer Genome Atlas Pan-Cancer dataset, including 3110 tumor samples from twelve cancer types. HotNet2 identified significantly mutated subnetworks overlapping well-known cancer pathways, protein complexes with recently characterized roles in cancer (e.g. SWI/SNF and BAP1), and less characterized complexes (including the condensin and cohesin complexes). Our presentation will also include recent extensions and applications of the HotNet2 algorithm.
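
The idea of combining network topology with mutation scores through heat diffusion can be illustrated on a toy network. The sketch below is not the HotNet2 implementation: it assumes the diffusion can be treated as a random walk with restart (restart probability `beta`), and the four-gene path network, the mutation scores, and the function names are all hypothetical.

```python
def diffuse_from(adj, source, beta=0.4, n_iter=200):
    """Heat left on every node when one unit of heat starts at `source`.

    Iterates f <- (1 - beta) * W f + beta * e_source, where W spreads a node's
    heat evenly over its neighbours. Every node needs at least one neighbour.
    """
    heat = {v: 0.0 for v in adj}
    heat[source] = 1.0
    for _ in range(n_iter):
        nxt = {v: 0.0 for v in adj}
        for u, neighbours in adj.items():
            share = (1.0 - beta) * heat[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        nxt[source] += beta  # restart: a fraction beta of the heat returns to the source
        heat = nxt
    return heat


def exchanged_heat(adj, scores, beta=0.4):
    """E[u][v]: heat that gene u, weighted by its mutation score, sends to gene v."""
    return {
        u: {v: f * scores[u] for v, f in diffuse_from(adj, u, beta).items()}
        for u in adj
    }


def hot_edges(adj, scores, delta, beta=0.4):
    """Directed pairs (u, v), u != v, whose exchanged heat exceeds delta."""
    exchanged = exchanged_heat(adj, scores, beta)
    return sorted((u, v) for u in adj for v in adj
                  if u != v and exchanged[u][v] > delta)


# Hypothetical path network A - B - C - D with made-up mutation scores.
TOY_NETWORK = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
TOY_SCORES = {"A": 0.30, "B": 0.25, "C": 0.01, "D": 0.02}
```

Calling `hot_edges(TOY_NETWORK, TOY_SCORES, delta=0.04)` keeps only pairs where a highly scored gene sends enough heat to a nearby gene. A fuller treatment would then extract connected subnetworks from these pairs and assess their significance, which this sketch leaves out.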

TP062 (PT) - Reconstructing gene regulatory dynamics from high-dimensional single-cell snapshot data
Theme: Genes
Date: Monday, July 13, 2:00 pm - 2:20 pm
Room: The Liffey A

Presenting author: Fabian J. Theis, Helmholtz Zentrum München; German Research Center for Environmental Health, Germany

Laleh Haghverdi, Helmholtz Center Munich, Germany
Nikola S. Mueller, Helmholtz Center Munich, Germany
Andrea Ocone, Helmholtz Center Munich, Germany

Session Chair: Uwe Ohler

Presentation Overview:
Motivation: High-dimensional single-cell snapshot data is becoming widespread in the systems biology community as a means of understanding biological processes at the cellular level. However, as temporal information is lost with such data, mathematical models have been limited to capturing only static features of the underlying cellular mechanisms.

Results: Here, we present a modular framework that recovers the temporal behaviour from single-cell snapshot data and reverse-engineers the dynamics of gene expression. The framework combines a dimensionality reduction method with a cell time-ordering algorithm to generate pseudo time-series observations. These are in turn used to learn transcriptional ODE models and perform model selection on structural network features. We apply it to synthetic data and then to real hematopoietic stem cell data, to reconstruct gene expression dynamics during differentiation pathways and infer the structure of a key gene regulatory network.

TP063 (HT) - Inference of interactions between chromatin modifiers and histone modifications: from ChIP-Seq data to chromatin-signaling
Theme: Systems
Date: Monday, July 13, 2:00 pm - 2:20 pm
Room: The Liffey B

Presenting author: Ho-Ryun Chung, Max-Planck-Institut F. Molekulare Genetik, Germany

Juliane Perner, Max Planck Institute for Molecular Genetics, Germany
Julia Lasserre, Max Planck Institute for Molecular Genetics, Germany
Sarah Kinkley, Max Planck Institute for Molecular Genetics, Germany
Martin Vingron, Max Planck Institute for Molecular Genetics, Germany

Session Chair: Russell Schwartz

Presentation Overview:
Chromatin modifiers and histone modifications form chromatin-signaling networks that regulate and drive transcription. In many cases, interactions between chromatin modifiers and histone modifications have only been studied in vitro or are based on the analysis of a few genes. Due to the biased nature of these experimental approaches and the dynamic complexity of chromatin signaling networks many interactions remain undisclosed. To recover novel interactions between chromatin modifiers and histone modifications, we applied computational methods to genome-wide ChIP-Seq data. The identified chromatin-signaling network recovered several previously described interactions and revealed as of yet unknown interactions. We experimentally verified two of these interactions, linking H4K20me1 with members of the Polycomb Repressive Complexes 1 and 2. These findings demonstrate that our computational method identifies interactions with experimental support and leads to novel biological insights, underlining its power in unraveling the connectivity of highly dynamic chromatin signaling networks.

TP064 (HT) - Using the Power of Big Data and Crowdsourcing for Catalyzing Breakthroughs in Amyotrophic Lateral Sclerosis (ALS)
Theme: Systems / Disease
Date: Monday, July 13, 2:00 pm - 2:20 pm
Room: Liffey Hall 2

Presenting author: Robert Kueffner, Helmholtz Center Munich, Germany

Zach Neta, Prize4Life, Israel
Gustavo Stolovitzky, IBM, United States

Session Chair: Knut Reinert

Presentation Overview:
We developed a crowdsourced DREAM Challenge to predict ALS disease progression using clinical trial data. The data are complex and non-uniform as they were measured by different laboratories. Therefore an important step was the harmonization of the different data sets.
On this clinical data, tree-based ensemble regression techniques proved to be most effective for machine learning. Based on the accuracy of the winning algorithms, we will present a simulation model to estimate the expected reduction in the number of patients needed for a clinical trial. The best performing submissions also outperformed the predictions of a group of world leading clinicians. One important outcome of the challenge was the identification of novel predictors of progression rate, potentially offering novel insights about disease mechanisms. We will also discuss our registrant survey where we determined factors that motivated or discouraged potential solvers to participate.

TP065 (LT) - Statistical Assessment of Darwinian Selection for Mitochondrial Mutations in Cancer
Theme: Disease
Date: Monday, July 13, 2:20 pm - 2:40 pm
Room: The Auditorium

Presenting author: Thomas LaFramboise, Case Western Reserve University, United States

Session Chair: Paul Horton

Presentation Overview:
Somatic mitochondrial DNA (mtDNA) mutations accumulate in human cancers, although the mutations’ roles in tumorigenesis are unclear and subject to some debate. In contrast to the nuclear genome’s two copies per cell, the mitochondrial genome – although very small at 16,568 bp – is typically present at hundreds to thousands of copies per cell. This complicates analysis of mtDNA level variants since they may be present at a continuous range of abundances between 0% and 100%, as opposed to the 0%, 50% or 100% discrete levels for nuclear genome variants. Furthermore, the per-cell copy number of the mitochondrial genome often shifts dramatically between the tumor and surrounding normal tissue, although the reasons for this phenomenon and its role, if any, in tumor development are unclear. To address these issues in a rigorous manner, we perform an analysis of cancer-specific mutational patterns and copy number changes in the whole mitochondrial genomes of 7,817 patient samples across 14 tumor types. We develop and apply statistical tests to query selection for somatic and inherited variants in mitochondrial DNA. We find specific tumor types and specific genes that show particularly prominent signals of positive selection. Since selection implies function, our results support the role of mtDNA mutations as causative factors in the initiation and development of human cancer.

TP066 (HT) - Large-scale Imputation of Epigenomic Datasets for Systematic Annotation of Diverse Human Tissues
Theme: Genes
Date: Monday, July 13, 2:20 pm - 2:40 pm
Room: The Liffey A

Presenting author: Jason Ernst, University of California Los Angeles, United States

Manolis Kellis, Massachusetts Institute of Technology, United States

Session Chair: Uwe Ohler

Presentation Overview:
With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information.

TP067 (LT) - Genome-wide modelling of transcription kinetics reveals patterns of RNA processing delays
Theme: Systems
Date: Monday, July 13, 2:20 pm - 2:40 pm
Room: The Liffey B

Presenting author: Antti Honkela, University of Helsinki, Finland

Antti Honkela, University of Helsinki, Finland
Jaakko Peltonen, Aalto University, Finland
Hande Topa, Aalto University, Finland
Iryna Charapitsa, Institute for Molecular Biology Mainz, Germany
Filomena Matarese, Radboud University Nijmegen, Netherlands
Korbinian Grote, Genomatix Software GmbH, Germany
Hendrik G. Stunnenberg, Radboud University Nijmegen, Netherlands
George Reid, Institute for Molecular Biology Mainz, Germany
Neil D. Lawrence, University of Sheffield, United Kingdom
Magnus Rattray, University of Manchester, United Kingdom

Session Chair: Russell Schwartz

Presentation Overview:
Genes with similar transcriptional activation kinetics can display very different temporal mRNA profiles due to differences in transcription time, degradation rate and RNA processing kinetics. Recent studies have shown that a splicing-associated RNA processing delay can be significant. We introduce a joint model of transcriptional activation and mRNA accumulation which can be used for inference of transcription rate, RNA processing delay and degradation rate given genome-wide data from high-throughput sequencing time course experiments. We combine a mechanistic differential equation model with a non-parametric statistical modelling approach which allows us to capture a broad range of activation kinetics, and use Bayesian parameter estimation to quantify the uncertainty in the estimates of the kinetic parameters.

We apply the model to data from estrogen receptor (ER-α) activation in the MCF-7 breast cancer cell line. We use RNA polymerase II (pol-II) ChIP-Seq time course data to characterise transcriptional activation and mRNA-Seq time course data to quantify mature transcripts. We find that 11% of genes with a good signal in the data display a delay of more than 20 minutes between completing transcription and mature mRNA production. The genes displaying these long delays are significantly more likely to be short. We also find a statistical association between high delay and late intron retention in pre-mRNA data, indicating significant splicing-associated processing delays in many genes.
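
To make the role of the processing delay concrete, the following sketch simulates a simple delayed linear model of mRNA accumulation. It assumes the form dm/dt = beta * p(t - delay) - alpha * m(t), where p is the pol-II activity, beta a production rate and alpha a degradation rate. These symbols, the fixed-step Euler scheme, and the step-shaped input are illustrative assumptions rather than the authors' inference procedure, which is Bayesian and non-parametric.

```python
def simulate_mrna(pol2, dt, beta, alpha, delay, m0=0.0):
    """Euler integration of dm/dt = beta * p(t - delay) - alpha * m(t).

    pol2  : pol-II activity sampled every `dt` minutes
    delay : processing delay in minutes (rounded to a whole number of steps)
    Activity before the first sample is taken to be zero.
    """
    lag = int(round(delay / dt))
    m = [m0]
    for k in range(len(pol2) - 1):
        p_delayed = pol2[k - lag] if k >= lag else 0.0
        m.append(m[-1] + dt * (beta * p_delayed - alpha * m[-1]))
    return m


# Hypothetical input: transcription switched on at t = 0 and held constant.
STEP_INPUT = [1.0] * 400
no_delay = simulate_mrna(STEP_INPUT, dt=1.0, beta=1.0, alpha=0.05, delay=0.0)
delayed = simulate_mrna(STEP_INPUT, dt=1.0, beta=1.0, alpha=0.05, delay=20.0)
```

With a constant input the delayed curve is the undelayed curve shifted by the delay, and both approach the steady state beta / alpha. In real data the delay has to be inferred jointly with the other parameters, because a slow rise in mRNA can also come from a low degradation rate.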

TP068 (HT) - A community computational challenge to predict the activity of pairs of compounds
Theme: Systems / Disease
Date: Monday, July 13, 2:20 pm - 2:40 pm
Room: Liffey Hall 2

Presenting author: Gustavo Stolovitzky, IBM Research / Mt Sinai Hospital, United States

Andrea Califano, Columbia University, United States
James Costello, University of Colorado, United States
Mukesh Bansal, Columbia University, United States

Session Chair: Knut Reinert

Presentation Overview:
Recent therapeutic successes have renewed interest in drug combinations, but experimental screening approaches are costly and often identify only small numbers of synergistic combinations. The DREAM consortium launched an open challenge to foster the development of in silico methods to computationally rank 91 compound pairs, from the most synergistic to the most antagonistic, based on gene-expression profiles of human B cells treated with individual compounds at multiple time points and concentrations. Using scoring metrics based on experimental dose-response curves, we assessed 32 methods, four of which performed significantly better than random guessing. We highlight similarities between the methods. Although the accuracy of predictions was not optimal, we find that computational prediction of compound-pair activity is possible, and that community challenges can be useful to advance the field of in silico compound-synergy prediction.

TP069 (LT) - Identifying driver genomic alterations in cancers by searching minimum-weight, mutually exclusive sets
Theme: Disease
Date: Monday, July 13, 2:40 pm - 3:00 pm
Room: The Auditorium

Presenting author: Xinghua Lu, University of Pittsburgh, United States

Session Chair: Paul Horton

Presentation Overview:
An important goal of cancer genomic research is to identify the driving pathways underlying disease mechanisms. It is well known that somatic genome alterations (SGAs) affecting the genes that encode the proteins within a common signaling pathway exhibit mutual exclusivity, in which these SGAs usually do not co-occur in a tumor. With some success, this property has been utilized as an objective function to guide the search for driver mutations. However, the mutual exclusivity alone is not sufficient to indicate that genes affected by such SGAs are in common pathways. Here, we propose a novel, signal-oriented framework for identifying driver SGAs, such that our new method constrains the mutual exclusivity only on tumors that have SGAs to perturb a common signal (not on all tumors as previous methods used). We apply this framework to the OV and GBM data from TCGA, and perform systematic evaluations. Our results indicate that the signal-oriented approach enhances the ability to find informative sets of driver SGAs that likely constitute signaling pathways.
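
The difference between scoring mutual exclusivity over all tumors and scoring it only over tumors that share a perturbed signal can be shown with a small count. The sketch below is a hypothetical illustration, not the authors' objective function: the tumor identifiers, gene names, and the `exclusivity_summary` helper are invented for the example.

```python
def exclusivity_summary(alterations, gene_set, tumors):
    """Return (covered, overlapping) counts for `gene_set` within `tumors`.

    covered:     tumors with at least one altered gene from the set
    overlapping: tumors with two or more, i.e. violations of mutual exclusivity
    """
    covered = overlapping = 0
    for tumor in tumors:
        hits = len(gene_set & alterations.get(tumor, set()))
        covered += hits >= 1
        overlapping += hits >= 2
    return covered, overlapping


# Hypothetical data: altered genes per tumor, and the tumors showing the signal.
TOY_ALTERATIONS = {
    "t1": {"G1"}, "t2": {"G2"}, "t3": {"G3"},
    "t4": {"G1", "G2"}, "t5": {"G2", "G3"}, "t6": set(),
}
SIGNAL_TUMORS = ["t1", "t2", "t3", "t6"]
CANDIDATE_SET = {"G1", "G2", "G3"}

over_all = exclusivity_summary(TOY_ALTERATIONS, CANDIDATE_SET, TOY_ALTERATIONS)
over_signal = exclusivity_summary(TOY_ALTERATIONS, CANDIDATE_SET, SIGNAL_TUMORS)
```

In this toy data the candidate set violates exclusivity in two tumors when all tumors are considered, but in none of the tumors that carry the signal, which is the kind of contrast a signal-restricted constraint is meant to exploit.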

TP070 (HT) - Correcting for sample heterogeneity in epigenome-wide association studies.
Theme: Genes
Date: Monday, July 13, 2:40 pm - 3:00 pm
Room: The Liffey A

Presenting author: James Zou, Microsoft Research, United States

Session Chair: Uwe Ohler

Presentation Overview:
In epigenome-wide association studies, cell-type composition
often differs between cases and controls, yielding associations
that simply tag cell type rather than reveal fundamental
biology. Current solutions require actual or estimated
cell-type composition—information not easily obtainable
for many samples of interest. We propose a method,
FaST-LMM-EWASher, that automatically corrects for cell-type
composition without the need for explicit knowledge of it,
and then validate our method by comparison with the
state-of-the-art approach.

TP071 (HT) - Widespread degradation of transcripts by splicing and nonsense-mediated mRNA decay (NMD) includes ultraconserved targets whose regulation by alternative splicing and NMD is conserved between kingdoms
Theme: Systems
Date: Monday, July 13, 2:40 pm - 3:00 pm
Room: The Liffey B

Presenting author: Steven Brenner, University of California, Berkeley, United States

Liana Lareau, University of California, Berkeley, United States

Session Chair: Russell Schwartz

Presentation Overview:
Ultraconserved elements, unusually long regions of perfect sequence identity, are found in genes encoding numerous RNA-binding proteins including SR splicing factors. Expression of these genes is regulated via alternative splicing of the ultraconserved regions to yield mRNAs that are degraded by nonsense-mediated mRNA decay (NMD), a process termed unproductive splicing. We have found that unproductive splicing affects all human SR genes, but rather than being ancestral, it arose independently in nearly every case. We demonstrate that unproductive splicing of the splicing factor SRSF5 is conserved among all animals and even observed in fungi; this is a rare example of alternative splicing conserved between kingdoms, yet its effect is to trigger mRNA degradation. As the gene duplicated, the ancient unproductive splicing was lost in paralogs, and distinct unproductive splicing evolved rapidly and repeatedly to take its place.

TP072 (PT) - Improving compound-protein interaction prediction by building up highly credible negative samples
Theme: Systems / Disease
Date: Monday, July 13, 2:40 pm - 3:00 pm
Room: Liffey Hall 2

Presenting author: Yanni Sun, Fudan University, China

Jianjiang Sun, Fudan University, China
Jihong Guan, Tongji University, China
Jie Zheng, Nanyang Technological University, Singapore
Shuigeng Zhou, Fudan University, China
Hui Liu, Changzhou University, China

Session Chair: Knut Reinert

Presentation Overview:
Motivation: Computational prediction of compound-protein interactions is of great importance for drug design and development, as genome-scale experimental validation of compound-protein interactions is not only time-consuming but also prohibitively expensive. With the availability of an increasing number of validated interactions, the performance of computational prediction approaches is severely impeded by the lack of reliable negative compound-protein interaction samples. A systematic method of screening reliable negative samples becomes critical to improving the performance of in silico prediction methods.

Results: This paper aims at building up a set of highly credible negative samples of compound-protein interactions via an in silico screening method. As most existing computational models assume that similar compounds are likely to interact with similar target proteins and achieve remarkable performance, it is rational to identify potential negative samples based on the converse negative proposition that the proteins dissimilar to every known/predicted target of a compound are not likely to be targeted by the compound, and vice versa. We integrated various resources, including chemical structures, chemical expression profiles and side effects of compounds, amino acid sequences, protein-protein interaction network, and functional annotations of proteins, into a systematic screening framework. We first tested the screened negative samples on six classical classifiers, and all these classifiers achieved remarkably higher performance on our negative samples than on randomly generated negative samples for both human and C. elegans. We then verified the negative samples on three existing prediction models, namely bipartite local model, Gaussian kernel profile, and Bayesian matrix factorization, and found that the performance of these models is also significantly improved on the screened negative samples. Moreover, we validated the screened negative samples on a drug bioactivity dataset. Finally, we derived two sets of new interactions by training an SVM classifier on the positive interactions annotated in DrugBank and our screened negative interactions. The screened negative samples and the predicted interactions provide the research community with a useful resource for identifying new drug targets and a helpful supplement to the current curated compound-protein databases.

Availability: Supplementary files and a preliminary Web server of this work are available at: http://admis.fudan.edu.cn/negative-cpi/
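
The screening idea described in the Results paragraph, that a protein dissimilar to every known target of a compound is a plausible negative for that compound, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' framework: it uses a single similarity function and a fixed threshold, covers only the protein side of the argument, and all names, scores, and the threshold value are hypothetical.

```python
def candidate_negatives(known_targets, proteins, similarity, threshold=0.3):
    """List (compound, protein) pairs whose protein is dissimilar to all known targets.

    known_targets : dict mapping a compound to the set of its known target proteins
    similarity    : function (protein, protein) -> score in [0, 1]
    threshold     : a protein counts as dissimilar when every score is below this value
    """
    negatives = []
    for compound, targets in known_targets.items():
        for protein in proteins:
            if protein in targets:
                continue  # known positives are never negatives
            if all(similarity(protein, t) < threshold for t in targets):
                negatives.append((compound, protein))
    return negatives


# Hypothetical pairwise protein similarities and known interactions.
TOY_SIMILARITY = {
    frozenset(("P1", "P2")): 0.9,
    frozenset(("P1", "P3")): 0.1,
    frozenset(("P2", "P3")): 0.2,
}


def toy_similarity(a, b):
    """Look up a symmetric similarity score; identical proteins score 1.0."""
    return 1.0 if a == b else TOY_SIMILARITY[frozenset((a, b))]


TOY_TARGETS = {"drugX": {"P1"}, "drugY": {"P3"}}
toy_negatives = candidate_negatives(TOY_TARGETS, ["P1", "P2", "P3"], toy_similarity)
```

Here P2 is excluded as a negative for drugX because it closely resembles the known target P1. A stricter threshold yields fewer but more credible negatives, which is the trade-off such a screen has to tune.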

TP073 (LT) - An optimized chemical genomics pipeline for genome-wide discovery of new molecular probes from large compound collections
Theme: Disease
Date: Monday, July 13, 3:30 pm - 3:50 pm
Room: The Auditorium

Presenting author: Chad Myers, University of Minnesota, United States

Scott Simpkins, University of Minnesota, United States
Justin Nelson, University of Minnesota, United States
Jeff Piotrowski, University of Wisconsin-Madison, United States
Raamesh Deshpande, University of Minnesota, United States
Sheena Li, RIKEN, Japan
Jacqueline Barber, RIKEN, Japan
Hamid Safizadeh, University of Minnesota, United States
Reika Okamoto, RIKEN, Japan
Mami Yoshimura, RIKEN, Japan
Tamio Saito, RIKEN, Japan
Minoru Yoshida, RIKEN, Japan
Charles Boone, University of Toronto, Canada
Chad Myers, University of Minnesota, United States

Session Chair: Paul Horton

Presentation Overview:
As an alternative to “target-centric” approaches to drug discovery, we have developed an ultra high-throughput yeast chemical genomics assay that allows the prediction of a compound’s gene- and process-level targets across the entire genome. This methodology provides a novel and informative way to screen compounds for specific bioactivities. This methodology was applied to screen more than 13,000 compounds with diverse origins (synthetic, natural product and derivative, and clinically-relevant compounds). We obtain high confidence process-level predictions for over 10% of the screened compounds. At the current level of throughput, we can screen more than 10,000 compounds and generate genome-wide target predictions within a few months’ time, demonstrating that we have developed an efficient, high-throughput method to assess genome-wide bioactivities.

TOP

TP074 (LT) - Histone variants delineate the transcription orientation at enhancers
Theme: Genes
Date: Monday, July 13, 3:30 pm - 3:50 pmRoom: The Liffey A

Presenting author: Kyoung-Jae Won, University of Pennsylvania, United States

Kyoung-Jae Won, University of Pennsylvania, United States
Inchan Choi, University of Pennsylvania, United States
Benjamin Garcia, University of Pennsylvania, United States

Session Chair: Uwe Ohler

Presentation Overview: Show
Genome-wide localization analyses using chromatin immunoprecipitation followed by next generation sequencing (ChIP-seq) against the four histone variants (H3.1, H3.3, H2A.Z and macroH2A) identified various combinations of histone variants (histone variant codes). While H2A.Z was highly enriched at promoters, H3.3 and H3.1 were observed at the body and the 3’UTR of active genes. While the majority of distal regulatory regions were enriched for H3.3 and/or H2A.Z, we newly identified a group of regulatory regions enriched in H3.1 and macroH2A, the histone variant associated with repressive marks, indicating that histone variants are deposited at regulatory regions to assist gene regulation. Systematic analysis identified both symmetric and asymmetric patterns of histone variant (H3.3 and H2A.Z) occupancy at intergenic regulatory regions. Strikingly, these directional patterns were associated with RNA Polymerase II (PolII). These asymmetric patterns correlated with the enhancer activities measured by global run-on sequencing (GRO-seq) data. We also showed that enhancers with skewed histone variant patterns effectively facilitate enhancer activity. Our study indicates that H2A.Z and H3.3 delineate the orientation of transcription at enhancers as observed at promoters.

TOP

TP075 (LT) - Genome-wide ceRNA networks
Theme: Systems
Date: Monday, July 13, 3:30 pm - 3:50 pmRoom: The Liffey B

Presenting author: Mario Flores, University of Texas at San Antonio, United States

Session Chair: Russell Schwartz

Presentation Overview: Show
Post-transcriptional regulation of gene expression can be modeled as a competitive endogenous RNA (ceRNA) network in which mRNAs compete for miRNA binding. Previous research shows that this competition maintains and fine-tunes the levels of protein-coding genes and that disruption of the network contributes to phenotypic conditions such as cancer. Based on our previous studies, we provided a tool (TraceRNA) for the reconstruction of ceRNA networks around a gene of interest (GoI). The approach used in TraceRNA, although practical and useful for gene-based studies, provides only a partial landscape of the ceRNA mechanisms and phenotypes. Moreover, TraceRNA offers only an ad hoc approach to the study of the ceRNA phenomenon. In this work we present a formal genome-wide approach to the study of ceRNA networks. This novel and formal treatment of the ceRNA phenomenon provides new perspectives on the study of ceRNA networks and their specific phenotypes. We divide the study of genome-wide ceRNA networks into three main sections: network construction, analysis of network components by network perturbation, and network stability.

TOP

TP076 (PT) - ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes
Theme: Genes / Other
Date: Monday, July 13, 3:30 pm - 3:50 pmRoom: Liffey Hall 2

Presenting author: Siavash Mirarab, University of Texas at Austin, United States

Tandy Warnow, The University of Illinois at Urbana-Champaign, United States

Session Chair: Knut Reinert

Presentation Overview: Show
Motivation: The estimation of species phylogenies requires multiple loci, since different loci can have different trees due to incomplete lineage sorting (ILS), modelled by the multi-species coalescent model. We recently developed a coalescent-based method, ASTRAL (ECCB 2014), which is statistically consistent under the multi-species coalescent model and which is more accurate than other coalescent-based methods on the datasets we examined (Mirarab et al., 2014a). ASTRAL heuristically solves an NP-hard problem in polynomial time, by constraining the search space through a set of allowed “bipartitions”. Despite the limitation to allowed bipartitions, ASTRAL is statistically consistent.

Results: We present a new version of ASTRAL, which we call ASTRAL-II. We will show that ASTRAL-II has substantial advantages over ASTRAL: it is faster, can analyze much larger datasets (up to 1000 species and 1000 genes), and has substantially better accuracy under some conditions. ASTRAL’s running time is $O(n^2k|X|^2)$, and ASTRAL-II’s running time is $O(nk|X|^2)$, where n is the number of species, k is the number of loci, and X is the set of allowed bipartitions for the search space.
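
As a rough illustration of the stated bounds (ignoring constant factors, and assuming for the sake of comparison that both versions search the same set $X$, which the abstract does not state), dividing one bound by the other suggests a speed-up that grows linearly with the number of species:

$$\frac{n^2 k |X|^2}{n k |X|^2} = n$$

Under that assumption, for a dataset at the upper end quoted above ($n = 1000$ species), the asymptotic saving would be on the order of a factor of 1000.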

Availability: ASTRAL-II is available in open source at https://github.com/smirarab/ASTRAL.
Contact: smirarab@gmail.com

TOP

TP077 (PT) - Using collective expert judgements to evaluate quality measures of mass spectrometry images
Theme: Data
Date: Monday, July 13, 3:30 pm - 3:50 pmRoom: Wicklow Hall 2A

Presenting author: Andrew Palmer, EMBL, Germany

Ekaterina Ovchinnikova, EMBL, Germany
Mikael Thune, Denator, Sweden
Regis Lavigne, Inserm U1085, France
Blandine Guevel, Inserm U1085, France
Andrey Dyatlov, Uni Bremen, Germany
Olga Vitek, Northeastern University, United States
Charles Pineau, Inserm U1085, France
Mats Boren, Denator, Sweden
Theodore Alexandrov, EMBL, Germany

Session Chair: Robert F. Murphy

Presentation Overview: Show
Motivation: Imaging Mass Spectrometry (IMS) is a maturing technique of molecular imaging. Confidence in the reproducible quality of IMS data is essential for its integration into routine use. However, the predominant method for assessing quality is visual examination, a time-consuming, unstandardised and non-scalable approach. So far, the problem of assessing the quality has only been marginally addressed and existing measures do not account for the spatial information of IMS data. Importantly, no approach exists for unbiased evaluation of potential quality measures.

Results: We propose a novel approach for evaluating potential measures by creating a gold-standard set using collective expert judgements upon which we evaluated image-based measures. To produce a gold standard, we engaged 80 IMS experts, each to rate the relative quality between 52 pairs of ion images from MALDI-TOF IMS datasets of rat brain coronal sections. Experts’ optional feedback on their expertise, the task and the survey showed that (i) they had diverse backgrounds and sufficient expertise, (ii) the task was properly understood, and (iii) the survey was comprehensible. A moderate inter-rater agreement was achieved with Krippendorff’s alpha of 0.5. A gold-standard set of 634 pairs of images with accompanying ratings was constructed and showed a high agreement of 0.85. Eight families of potential measures with a range of parameters and statistical descriptors, giving 143 in total, were evaluated. Both signal-to-noise and spatial chaos based measures performed highly with a correlation of 0.7 to 0.9 with the gold standard ratings. Moreover, we showed that a composite measure with the linear coefficients (trained on the gold standard with regularised least squares optimisation and lasso) showed a strong linear correlation of 0.94 and an accuracy of 0.98 in predicting which image in a pair was of higher quality.
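
To make the pairwise accuracy figure above concrete, the following is a minimal sketch of how a quality measure could be scored against a gold-standard set of image pairs. It is not the authors' Matlab code; the function name `pairwise_accuracy`, the image identifiers and the toy scores are all hypothetical.

```python
def pairwise_accuracy(scores, gold_pairs):
    """Fraction of gold-standard pairs in which the measure picks the same
    image as the experts.

    scores     -- dict mapping image id -> quality score (higher = better)
    gold_pairs -- list of (image_a, image_b, expert_winner) tuples
    """
    correct = 0
    for image_a, image_b, expert_winner in gold_pairs:
        # The measure "predicts" whichever image it scores higher.
        predicted = image_a if scores[image_a] > scores[image_b] else image_b
        correct += predicted == expert_winner
    return correct / len(gold_pairs)


# Toy data (hypothetical): four images scored by some quality measure.
scores = {"img1": 0.9, "img2": 0.4, "img3": 0.7, "img4": 0.2}
gold_pairs = [
    ("img1", "img2", "img1"),
    ("img3", "img4", "img3"),
    ("img2", "img3", "img3"),
    ("img1", "img4", "img4"),  # experts disagree with the measure here
]
accuracy = pairwise_accuracy(scores, gold_pairs)
print(accuracy)  # 0.75
```

In this toy example the measure agrees with the experts on three of the four pairs.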

Availability: The anonymised data collected from the survey and the Matlab source code for data processing can be found at: https://github.com/alexandrovteam/IMS_quality.

TOP

TP078 (PT) - Phenome-driven Disease Genetics Prediction Towards Drug Discovery
Theme: Disease
Date: Monday, July 13, 3:50 pm - 4:10 pmRoom: The Auditorium

Presenting author: Rong Xu, Case Western Reserve University, United States

Li Li, Case Western Reserve University, United States
Guo-Qiang Zhang, Case Western Reserve University, United States
Yang Chen, Case Western Reserve University, United States

Session Chair: Paul Horton

Presentation Overview: Show
Motivation: Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease associated genes from integrated phenotypic and genomic data.

Methods: To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed a Disease Manifestation Network (DMN). We combined DMN with mimMiner, a phenotype database widely used in disease gene prediction studies. We developed a network analysis approach to predict disease-gene associations from the integrated disease phenotype networks and a gene network.

Results: Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross validation and de novo gene prediction analysis, our approach achieved the area under the curves (AUCs) of 90.7% and 90.3%, which are significantly higher than 84.2% (p

TOP

TP079 (LT) - Integrated reporters reveal distinct pathways of gene silencing in Drosophila
Theme: Genes
Date: Monday, July 13, 3:50 pm - 4:10 pmRoom: The Liffey A

Presenting author: Guillaume Filion, Center for Genomic Regulation, Spain

Session Chair: Uwe Ohler

Presentation Overview: Show
Recent genome-wide mapping studies in eukaryotes have shown that most transcriptionally silent domains lack repressive histone marks and repressors of transcription, raising the question of what makes genes in these regions silent. Here we set out to answer this question by assaying position effects genome-wide for several reporters of transcription. To this end, we used a shotgun approach called TRIP (Thousands of Reporters Integrated in Parallel) to insert identical reporter genes at different loci of the Drosophila genome and measure their expression. We obtained expression data for more than 85,000 integrated reporters under eight different promoters, constituting the largest dataset of position effects available to date. We identified 10-100 kb domains of either high or low reporter activity. These domains are similar for different reporter constructs, showing that they correspond to the underlying organization of the genome. While these domains are similar between constructs, the degree of response to the context of each promoter is variable, yet the constructs are equally permeable to the neighboring chromatin. We identified novel protein signatures associated with the repression of reporter genes. One of them consists of chromatin proteins associated with transcriptionally active regions with a deficit of DMAP1, which suggests that this protein is critical for the expression of reporters. Overall, our results reveal that the effect of the chromatin context on transcription results from multiple processes at work simultaneously.

TOP

TP080 (LT) - Alternatively spliced homologous exons have ancient origins and are highly expressed at the protein level
Theme: Proteins
Date: Monday, July 13, 3:50 pm - 4:10 pmRoom: The Liffey B

Presenting author: Michael Liam Tress, Spanish National Cancer Research Centre (CNIO), Spain

Michael Tress, Spanish National Cancer Research Centre (CNIO), Spain
Federico Abascal, Spanish National Cancer Research Centre (CNIO), Spain
Alfonso Valencia, Spanish National Cancer Research Centre (CNIO), Spain
Juan Rodriguz, Spanish National Cancer Research Centre (CNIO), Spain
Jose Manuel Rodriguez, Spanish National Cancer Research Centre (CNIO), Spain
Iakes Ezkurdia, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Spain
Jesus Vazquez, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Spain
Angela del Pozo, Hospital Universitario La Paz, Spain

Session Chair: Russell Schwartz

Presentation Overview: Show
Alternative splicing of messenger RNA can generate a wide variety of mature RNA transcripts, and these transcripts may produce protein isoforms with diverse cellular functions. While there is much supporting evidence for the expression of alternative transcripts, the same is not true for the alternatively spliced protein products. Although large-scale mass spectrometry experiments have identified evidence of alternative splicing at the protein level, results have been contradictory.

Here we carried out a rigorous analysis of the peptide evidence from eight large-scale proteomics experiments to assess the scale of alternative splicing detectable by high-resolution mass spectrometry. While we identified peptides for almost 64% of human protein-coding genes, we detected just 282 splice events. We demonstrate that this is fewer splice events than would be expected, and show that most genes have a single dominant isoform at the protein level.
The most striking result was that more than 20% of the splice isoforms we identified were generated by substituting one homologous exon for another. This is significantly more than would be expected from their frequency in the genome. These homologous exon substitution events were remarkably conserved - all the homologous exons we identified evolved over 460 million years ago - and eight of the fourteen tissue-specific splice isoforms we identified were generated from homologous exons. The combination of proteomics evidence, ancient origin and tissue-specific splicing is a clear indication that isoforms generated from homologous exons may have important cellular roles.

TOP

TP081 (HT) - Pooled Assembly of Metagenomic Data: Chimeric Contigs Enable Better Annotation and Discovery of New Marine Bacteria
Theme: Genes / Other
Date: Monday, July 13, 3:50 pm - 4:10 pmRoom: Liffey Hall 2

Presenting author: Dietlind Gerloff, Foundation for Applied Molecular Evolution, United States

Session Chair: Knut Reinert

Presentation Overview: Show
A metagenomic sample can contain billions of cells and thousands of different genomes. The set of sequencing reads derived from it will be sparse by comparison and underrepresent this complexity by orders of magnitude. Additionally, metagenome annotation is confounded by short reads that capture only small fragments of genes, and by the small fraction of known microbes represented in sequence databases, often described as "the culturable 1%". Difficulties include distinguishing known from novel species and often affect the majority of reads in a data set. In our paper, we demonstrate quantitatively how careful assembly of marine metagenomic pyrosequencing reads within, but also across, datasets can alleviate annotation problems. Our results outline exciting prospects for data sharing in the metagenomics community. In follow-on work, we have developed a new "geographic profiling" approach that allows us to use chimeric contigs obtained through pooled assembly for (low-cost) discovery of new species in old data.

TOP

TP082 (HT) - Interactive and exploratory visual analytics of epigenome-wide data
Theme: Data
Date: Monday, July 13, 3:50 pm - 4:10 pmRoom: Wicklow Hall 2A

Presenting author: Hector Corrada Bravo, University of Maryland, United States

Florin Chelaru, University of Maryland, United States

Session Chair: Robert F. Murphy

Presentation Overview: Show
Data visualization is an integral aspect of the analysis of epigenomic experimental results. Commonly, the data visualized in visualization tools is the output of analyses performed in computing environments like Bioconductor. These two essential aspects of data analysis, algorithmic/statistical analysis and visualization, are usually distinct and disjoint but are most effective when used iteratively. We will introduce epigenomics data visualization tools that provide tight-knit integration with computational and statistical modeling and data analysis: Epiviz (http://epiviz.cbcb.umd.edu), a web-based genome browser application, and the Epivizr Bioconductor package that provides interactive integration with R/Bioconductor sessions. This combination of technologies permits interactive visualization within a state-of-the-art functional genomics analysis platform. The web-based design of our tools facilitates the reproducible dissemination of interactive data analyses in a user-friendly platform.

TOP

TP083 (LT) - Interactome based drug discovery
Theme: Disease
Date: Monday, July 13, 4:10 pm - 4:30 pmRoom: The Auditorium

Presenting author: Gaurav Chopra, University of California, San Francisco, United States

Gaurav Chopra, UCSF & SUNY-Buffalo, United States
Ram Samudrala, SUNY-Buffalo, United States

Session Chair: Paul Horton

Presentation Overview: Show
We have developed a Computational Analysis of Novel Drug Opportunities (CANDO) platform (http://protinfo.org/cando/), funded by a 2010 NIH Director's Pioneer Award, that analyzes compound-proteome interaction signatures to determine drug behavior, in contrast to traditional single (or few) target approaches. Our platform implements a modeling pipeline that generates an interaction matrix between 3,733 human approved drugs and 48,278 proteins using a hierarchical chem- and bio-informatic fragment-based docking with dynamics protocol (~ 1 billion predicted interactions evaluated, considering multiple binding sites per protein). The platform then treats similarity of interaction signatures across all proteins as indicative of similar functional behavior, and nonsimilar signatures as indicative of off- and anti-target (side) effects, in effect inferring homology of compound/drug behavior at a proteomic level. The benchmarking accuracy using this approach to rank compounds for over 650 indications/diseases is ~36%, in contrast to accuracies of ~0.2% obtained when using scrambled control matrices. We prospectively validated “high value” predictions in in vitro and in vivo preclinical studies for more than a dozen indications, including type 1 diabetes, herpes, dental caries, dengue, tuberculosis, malaria, hepatitis B, and different cancers. Our drug prediction accuracy is ~35% across the nine indications, where 57/162 compounds validated thus far show comparable or better activity than an existing drug, or micromolar inhibition at the cellular level, and serve as novel repurposeable therapies. Our approach is broadly applicable beyond repurposing, enables personalized and precision medicine, and foreshadows a new era of faster, safer, and cheaper drug discovery.
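
The idea of ranking compounds by the similarity of their proteome-wide interaction signatures can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which similarity metric CANDO uses, so cosine similarity is swapped in here, and the drug names, scores and the helper `rank_by_signature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_signature(query, signatures):
    """Rank all other compounds by signature similarity to the query."""
    q = signatures[query]
    others = [(name, cosine(q, sig))
              for name, sig in signatures.items() if name != query]
    return sorted(others, key=lambda item: item[1], reverse=True)


# Toy interaction matrix (hypothetical): one row per compound,
# one column per protein, values are predicted interaction scores.
signatures = {
    "drug_A": [0.9, 0.1, 0.8, 0.0],
    "drug_B": [0.8, 0.2, 0.7, 0.1],
    "drug_C": [0.0, 0.9, 0.1, 0.8],
}
ranking = rank_by_signature("drug_A", signatures)
for name, similarity in ranking:
    print(name, round(similarity, 3))
```

Here drug_B, whose signature closely tracks that of drug_A, ranks first; in a repurposing setting, a compound ranked highly against a drug approved for some indication would be a candidate for that same indication.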

TOP

TP084 (HT) - Predicting the human epigenome from DNA motifs
Theme: Genes
Date: Monday, July 13, 4:10 pm - 4:30 pmRoom: The Liffey A

Presenting author: John Whitaker, Janssen Pharmaceutical Companies of Johnson & Johnson, United States

Wei Wang, UCSD, United States
Zhou Chen, UCSD, United States
Kai Zhang, UCSD, United States

Session Chair: Uwe Ohler

Presentation Overview: Show
The epigenome is established and maintained by the site-specific recruitment of chromatin-modifying enzymes and their cofactors. Identifying the cis elements that regulate epigenomic modification is critical for understanding the regulatory mechanisms that control gene expression patterns. We present Epigram, an analysis pipeline that predicts histone modification and DNA methylation patterns from DNA motifs. The identified cis elements represent interactions with the site-specific DNA-binding factors that establish and maintain epigenomic modifications. We cataloged the cis elements in embryonic stem cells and four derived lineages and found numerous motifs that have location preference, such as at the center of H3K27ac or at the edges of H3K4me3 and H3K9me3, which provides mechanistic insight about the shaping of the epigenome.

TOP

TP085 (LT) - Modeling Ribosome Profiling Data with Bayesian Hidden Markov Models
Theme: Systems
Date: Monday, July 13, 4:10 pm - 4:30 pmRoom: The Liffey B

Presenting author: Brandon Malone, Max Planck Institute for Biology of Ageing, Germany

Brandon Malone, Max Planck Institute for Biology of Ageing, Germany
Florian Aeschimann, Friedrich Miescher Institute for Biomedical Research, Switzerland
Jieyi Xiong, Max Planck Institute for Biology of Ageing, Germany
Helge Grosshans, Friedrich Miescher Institute for Biomedical Research, Switzerland
Christoph Dieterich, Max Planck Institute for Biology of Ageing, Germany

Session Chair: Russell Schwartz

Presentation Overview: Show
Ribosome profiling via high-throughput sequencing (ribo-seq) is a promising new technique for characterizing the occupancy of ribosomes on messenger RNA (mRNA) at base-pair resolution. The ribosome is responsible for translating mRNA into proteins, so information about its occupancy offers a detailed view of ribosome density and position which could be used to discover new upstream open reading frames, alternative start codons and new isoforms. Furthermore, this data allows the study of translational dynamics, such as decoding speed and ribosome pausing. Despite the wealth of information offered by ribo-seq, current analysis techniques have focused on coarse, gene-level statistics. In this work, we propose a hidden Markov model (HMM) approach to predict, at base-pair resolution, ribosome occupancy and translation. We use state-of-the-art learning algorithms to fit the parameters of our model, which correspond to biologically meaningful quantities, such as expected ribosome occupancy. Furthermore, we extend the model with Bayesian hyperparameters to quantify the uncertainty of the learned parameters. Preliminary evaluation shows that the HMM achieves a much higher true positive rate, and overall higher AUC, in identifying proteomics-verified coding regions compared to using the raw profile.
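
To illustrate the general idea of decoding a per-position state from a ribo-seq profile, the sketch below runs Viterbi decoding on a toy two-state HMM with Poisson emissions. It is not the authors' model: the states, rates, transition probabilities and read counts are hypothetical, and the parameters are fixed by hand rather than learned or given Bayesian priors.

```python
import math


def poisson_logpmf(k, rate):
    """Log-probability of observing count k under a Poisson(rate)."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def viterbi(counts, rates, log_trans, log_start):
    """Most likely state path for a count profile (log-space Viterbi)."""
    n_states = len(rates)
    # score[s] = best log-probability of any path ending in state s
    score = [log_start[s] + poisson_logpmf(counts[0], rates[s])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for k in counts[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda p: prev[p] + log_trans[p][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + poisson_logpmf(k, rates[s]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical profile: state 0 = untranslated, state 1 = translated.
counts = [0, 1, 0, 8, 9, 7, 10, 0, 1, 0]
rates = [0.5, 8.0]  # assumed mean read counts per state
stay, switch = math.log(0.9), math.log(0.1)
log_trans = [[stay, switch], [switch, stay]]
log_start = [math.log(0.5), math.log(0.5)]
path = viterbi(counts, rates, log_trans, log_start)
print(path)  # [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
```

The decoded path marks the four high-coverage positions as translated, which is the kind of position-level call that gene-level summary statistics cannot provide.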

TOP

TP086 (HT) - Deciphering cocktail party of biological, technical and artefactual signals in tumoural transcriptomes
Theme: Genes / Other
Date: Monday, July 13, 4:10 pm - 4:30 pmRoom: Liffey Hall 2

Presenting author: Emmanuel Barillot, Institut Curie, France

Anne Biton, University of California, United States
Emmanuel Barillot, Institut Curie, France

Session Chair: Knut Reinert

Presentation Overview: Show
Large-scale projects are generating massive amounts of molecular profiles for tumoural samples. It remains a challenge to unravel their complexity into the action of relatively few independent signals. This ambitious task can be approached by blind source separation methods such as Independent Component Analysis (ICA). We analysed data on nine different cancers from 21 patient cohorts and 6671 tumours and identified their commonalities, as well as the cancer type-specific characteristics. By careful interpretation of ICA results, we managed to distinguish the signals coming from tumoural cells from those coming from the tumour microenvironment, and clearly identified signals associated with technology and with biases related to different treatments of tumour tissue. We showed that the information captured in independent components is also reflected in anatomopathological staining microscopy images. Analysis of one of the bladder cancer-specific ICA components led to the formulation of a new hypothesis on the role of the PPARG gene, which was experimentally verified.

TOP

TP087 (PT) - A generic methodological framework for studying single cell motility in high-throughput time-lapse data
Theme: Data
Date: Monday, July 13, 4:10 pm - 4:30 pmRoom: Wicklow Hall 2A

Presenting author: Alice Schoenauer Sebag, Mines ParisTech - INSERM - Agro Paristech, France

Céline Raulet-Tomkiewicz, INSERM - Paris V, France
Robert Barouki, INSERM - Paris V, France
Jean-Philippe Vert, Mines ParisTech - Institut Curie, France
Thomas Walter, Institut Curie, France

Session Chair: Robert F. Murphy

Presentation Overview: Show
Motivation: Motility is a fundamental cellular attribute, which plays a major part in processes ranging from embryonic development to metastasis. Traditionally, single cell motility is often studied by live cell imaging. Yet, such studies were so far limited to low throughput. In order to systematically study cell motility at a large scale, we need robust methods to quantify cell trajectories in live cell imaging data.

Results: The primary contribution of this paper is to present MotIW, a generic workflow for the study of single cell motility in High-Throughput (HT) time-lapse screening data. It is composed of cell tracking, cell trajectory mapping to an original feature space, and hit detection according to a new statistical procedure. We show that this workflow is scalable and demonstrate its power by application to simulated data, as well as large-scale live cell imaging data. This application enables the identification of an ontology of cell motility patterns in a fully unsupervised manner.

Availability: Python code and examples available at http://cbio.ensmp.fr/~aschoenauer/motiw.html
Contact: thomas.walter@mines-paristech.fr

TOP

TP088 (LT) - A Study Of Common Disease Using The Human Phenotype Ontology
Theme: Data
Date: Tuesday, July 14, 10:10 am - 10:30 amRoom: Liffey Hall 2

Presenting author: Tudor Groza, Garvan Institute of Medical Research, Australia

Tudor Groza, Garvan Institute of Medical Research, Australia
Sebastian Köhler, Charité-Universitätsmedizin Berlin, Germany
Dawid Moldenhauer, Charité-Universitätsmedizin Berlin, Germany
Nicole Vasilevsky, Oregon Health & Science University, United States
Gareth Baynam, King Edward Memorial Hospital, Australia
Lynn Schriml, University of Maryland School of Medicine, United States
Warren Kibbe, National Cancer Institute, United States
Tim Beck, University of Leicester, United Kingdom
Anthony Brookes, University of Leicester, United Kingdom
Andreas Zankl, The Children's Hospital at Westmead, Australia
Nicole Washington, Lawrence Berkeley National Laboratory, United States
Christopher Mungall, Lawrence Berkeley National Laboratory, United States
Suzanna Lewis, Lawrence Berkeley National Laboratory, United States
Melissa Haendel, Oregon Health & Science University, United States
Peter Robinson, Charité-Universitätsmedizin Berlin, Germany

Session Chair: Ioannis Xenarios

Presentation Overview: Show
Deep phenotyping, the precise and comprehensive analysis of individual phenotypic abnormalities for the purpose of translational research, diagnostics, or personalized care, depends on computational resources to capture the phenotype of patients or diseases and integrate it with other relevant information such as genomic variation. The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence variation data, and translational research, but a comparable resource has not been available for common disease. This presentation introduces disease models for 3,145 common human diseases comprising a total of 132,006 annotations to terms of the HPO, which enabled us to build a common disease phenotypic network, as well as to study the phenotypic and genetic overlap across common diseases.

TOP

TP089 (LT) - Protein Structures in the PDB Show the Temperature Dependence of Hydrophobicity
Theme: Proteins
Date: Tuesday, July 14, 10:10 am - 10:30 amRoom: The Liffey B

Presenting author: Sanne Abeln, VU University, Netherlands

Session Chair: Francisco Melo Ledermann

Presentation Overview: Show
The hydrophobic effect is the main driving force in protein folding. One can estimate the relative strength of this hydrophobic effect for each amino acid by mining a large set of experimentally determined protein structures. However, the hydrophobic force is known to be strongly temperature dependent. This temperature dependence is thought to explain the denaturation of proteins at low temperatures. Here we investigate if it is possible to extract this temperature dependence directly from a large set of protein structures determined at different temperatures.
Using NMR structures filtered for sequence identity, we were able to extract hydrophobicity propensities for all amino acids at five different temperature ranges (spanning 265-340 K). These propensities show that the hydrophobicity becomes weaker at lower temperatures, in line with current theory. Alternatively, one can conclude that the temperature dependence of the hydrophobic effect has a measurable influence on protein structures. Moreover, this work provides a method for probing the individual temperature dependence of the different amino acid types, which is difficult to obtain by direct experiment.

TOP

TP090 (HT) - Pancancer analysis of DNA methylation-driven genes using MethylMix
Theme: Disease
Date: Tuesday, July 14, 10:10 am - 10:30 amRoom: The Liffey A

Presenting author: Olivier Gevaert, Stanford University, United States

Session Chair: Louxin Zhang

Presentation Overview: Show
Aberrant DNA methylation is an important mechanism that contributes to oncogenesis. Yet, few algorithms exist that exploit this vast dataset to identify hypo- and hyper-methylated genes in cancer. We developed a novel computational algorithm called MethylMix to identify differentially methylated genes that are also predictive of transcription. We apply MethylMix to twelve individual cancer sites, and additionally combine all cancer sites in a pancancer analysis. We discover pancancer hypo- and hyper-methylated genes and identify novel methylation-driven subgroups with clinical implications. MethylMix analysis on combined cancer sites reveals ten pancancer clusters reflecting new similarities across malignantly transformed tissues.

TOP

TP091 (PT) - Exploiting Ontology Graph for Predicting Sparsely Annotated Gene Function
Theme: Data
Date: Tuesday, July 14, 10:30 am - 10:50 amRoom: Liffey Hall 2

Presenting author: Sheng Wang, University of Illinois at Urbana-Champaign, United States

Hyunghoon Cho, Massachusetts Institute of Technology, United States
Chengxiang Zhai, University of Illinois at Urbana-Champaign, United States
Bonnie Berger, Massachusetts Institute of Technology, United States
Jian Peng, University of Illinois at Urbana-Champaign, United States

Session Chair: Ioannis Xenarios

Presentation Overview: Show
Motivation: Systematically predicting gene (or protein) function based on molecular interaction networks has become an important tool in refining and enhancing the existing annotation catalogs, such as the Gene Ontology (GO) database. However, functional labels with only a few (<10) annotated genes, which constitute about half of the GO terms in yeast, mouse and human, pose a unique challenge in that any prediction algorithm that independently considers each label faces a paucity of information and thus is prone to capture non-generalizable patterns in the data, resulting in poor predictive performance. There exist a variety of algorithms for function prediction, but none properly address this “overfitting” issue of sparsely annotated functions, or do so in a manner scalable to tens of thousands of functions in the human catalog.

Results: We propose a novel function prediction algorithm, clusDCA, which transfers information between similar functional labels to alleviate the overfitting problem for sparsely annotated functions. Our method is scalable to datasets with a large number of annotations. In a cross-validation experiment in yeast, mouse and human, our method greatly outperformed previous state-of-the-art function prediction algorithms in predicting sparsely annotated functions, without sacrificing the performance on labels with sufficient information. Furthermore, we show that our method can accurately predict genes that will be assigned a functional label that has no known annotations, based only on the ontology graph structure and genes associated with other labels, which further suggests that our method effectively utilizes the similarity between gene functions.

Availability: https://github.com/wangshenguiuc/clusDCA

TOP

TP092 (LT) - Protein Structure Novelty has Regressed 20 Years
Theme: Proteins
Date: Tuesday, July 14, 10:30 am - 10:50 amRoom: The Liffey B

Presenting author: John-Marc Chandonia, Lawrence Berkeley National Laboratory, United States

John-Marc Chandonia, Berkeley National Lab, United States
Steven Brenner, University of California, Berkeley, United States

Session Chair: Francisco Melo Ledermann

Presentation Overview: Show
The number of new protein structures deposited every month in the PDB has steadily increased, and is now at over 750 structures per month. On average, fewer than 15 of these structures (i.e., 2%) represent the first solved structure from a Pfam protein family. Fifteen families per month is the lowest rate at which families have been structurally characterized in nearly 20 years, despite vastly more efficient technology. Today, less than half as many families are newly structurally characterized every month as during the heyday of Structural Genomics, between 2003 and 2007. Because the rate of sequencing has outpaced the rate of structural characterization of families, the fraction of large protein families with a known structure peaked 7 years ago, and is 10% lower today than it was at its peak. This makes curation of protein structure classification databases easier, but interpretation of sequence variation is more challenging than would otherwise be the case.

TOP

TP093 (PT) - MEMCover: Integrated Analysis of Mutual Exclusivity and Functional Network Reveals Dysregulated Pathways Across Multiple Cancer Types
Theme: Disease
Date: Tuesday, July 14, 10:30 am - 10:50 amRoom: The Liffey A

Presenting author: Yoo-Ah Kim, NCBI/NLM/NIH, United States

Dongyeon Cho, NCBI/NLM/NIH, United States
Phuong Dao, NCBI/NLM/NIH, United States
Teresa Przytycka, NCBI/NLM/NIH, United States

Session Chair: Louxin Zhang

Presentation Overview: Show
The data gathered by the Pan-Cancer initiative has created an unprecedented opportunity for illuminating common features across different cancer types. However, separating tissue-specific features from across-cancer signatures has proven challenging. One of the often-observed properties of the mutational landscape of cancer is the mutual exclusivity of cancer-driving mutations. Even though studies based on individual cancer types suggested that mutually exclusive pairs often share the same functional pathway, the relationship between across-cancer mutual exclusivity and functional connectivity has not been previously investigated. Here we introduce a classification of mutual exclusivity into three basic classes: within tissue type exclusivity, across tissue type exclusivity, and between tissue type exclusivity. We then combined across-cancer mutual exclusivity with interaction data to uncover pan-cancer dysregulated pathways. Our new method, Mutual Exclusivity Module Cover (MEMCover), not only identified previously known Pan-Cancer dysregulated subnetworks but also novel subnetworks whose across-cancer role has not been well appreciated before. In addition, we demonstrate the existence of mutual exclusivity hubs, putatively corresponding to cancer drivers with strong growth advantages. Finally, we show that while mutually exclusive pairs within or across cancer types are predominantly functionally interacting, the pairs in the between-cancer mutual exclusivity class are more often disconnected in functional networks.

TOP

TP094 (HT) - SwissTargetPrediction: a web server for target prediction of bioactive small molecules
Theme: Data
Date: Tuesday, July 14, 10:50 am - 11:10 amRoom: Liffey Hall 2

Presenting author: David Gfeller, University of Lausanne, Switzerland

Aurelien Grosdidier, Swiss Institute of Bioinformatics, Switzerland
Matthias Wirth, Swiss Institute of Bioinformatics, Switzerland
Antoine Daina, Swiss Institute of Bioinformatics, Switzerland
Olivier Michielin, Swiss Institute of Bioinformatics, Switzerland
Vincent Zoete, Swiss Institute of Bioinformatics, Switzerland

Session Chair: Ioannis Xenarios

Presentation Overview: Show
Large-scale phenotypic screening initiatives increasingly allow researchers to test the functional impact of small molecules in different eukaryotic species. However, for most bioactive compounds the targets are only partially known. Here, we introduce a new computational approach to predict the targets of bioactive small molecules based on a combination of chemical similarity measures [Gfeller et al. Bioinformatics, Dec 2013]. We further investigate the use of target homology to transfer small molecule-target interactions across organisms. Interestingly, when considering separately orthology and paralogy relationships, we find that mapping small molecule interactions among orthologs significantly improves prediction accuracy, while including paralogs leads to lower prediction accuracy. Overall, our work provides a novel approach to accurately predict the targets of small molecules by combining different kinds of chemical similarity measures and, for the first time, integrates target homology to leverage data from different species. The method is accessible at http://www.swisstargetprediction.ch.

TOP

TP095 (PT) - Large-Scale Model Quality Assessment for Improving Protein Tertiary Structure Prediction
Theme: Proteins
Date: Tuesday, July 14, 10:50 am - 11:10 amRoom: The Liffey B

Presenting author: Jianlin Cheng, University of Missouri-Columbia, United States

Debswapna Bhattacharya, University of Missouri-Columbia, United States
Jilong Li, University of Missouri-Columbia, United States
Renzhi Cao, University of Missouri-Columbia, United States

Session Chair: Francisco Melo Ledermann

Presentation Overview: Show
Motivation: Sampling structural models and ranking them are the two major challenges of protein structure prediction. Traditional protein structure prediction methods generally use one or a few quality assessment methods to select the best-predicted models, which cannot consistently select relatively better models and rank a large number of models well.

Results: Here, we develop a novel large-scale model quality assessment method in conjunction with model clustering to rank and select protein structural models. It applies 14 model quality assessment methods, an unprecedented number, to generate consensus model rankings, followed by model refinement based on model combination (i.e., averaging). Our experiment demonstrates that the large-scale model quality assessment approach is more consistent and robust in selecting models of better quality than any individual quality assessment method. Our method was blindly tested during the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as the MULTICOM group. It was officially ranked 3rd out of all 143 human and server predictors according to the total scores of the first models predicted for 78 CASP11 protein domains and 2nd according to the total scores of the best of the five models predicted for these domains. MULTICOM’s outstanding performance in the extremely competitive 2014 CASP11 experiment proves that our large-scale quality assessment approach together with model clustering is a promising solution to one of the two major problems in protein structure modeling.
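
The consensus-ranking idea described above can be illustrated with a minimal sketch. This is not the MULTICOM implementation: the abstract does not say how the individual rankings are combined, so the sketch assumes a simple average of per-method ranks, and the method names, model names, and scores are all hypothetical.

```python
def consensus_ranking(scores_by_method):
    """Order models by their average rank across several assessment methods.

    scores_by_method maps a method name to a dict of model -> score,
    where a higher score means a better model (an assumption of this sketch).
    """
    models = sorted(next(iter(scores_by_method.values())))
    rank_sums = {model: 0 for model in models}
    for scores in scores_by_method.values():
        # Rank 1 goes to the model this method scores highest.
        ordered = sorted(models, key=lambda m: scores[m], reverse=True)
        for rank, model in enumerate(ordered, start=1):
            rank_sums[model] += rank
    n_methods = len(scores_by_method)
    average_rank = {m: rank_sums[m] / n_methods for m in models}
    # Best consensus model first (lowest average rank).
    return sorted(models, key=lambda m: average_rank[m])


# Hypothetical scores from three assessment methods for three candidate models.
qa_scores = {
    "qa_a": {"model1": 0.70, "model2": 0.65, "model3": 0.40},
    "qa_b": {"model1": 0.55, "model2": 0.60, "model3": 0.30},
    "qa_c": {"model1": 0.80, "model2": 0.62, "model3": 0.75},
}
ranking = consensus_ranking(qa_scores)
print(ranking)
```

Averaging ranks rather than raw scores sidesteps the fact that different assessment methods may report scores on different scales.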

Availability: The web server is available at: http://sysbio.rnet.missouri.edu/multicom_cluster/human/.
Contact: chengji@missouri.edu

TOP

TP096 (HT) - Integrated exome and transcriptome sequencing reveals ZAK isoform usage in gastric cancer
Theme: Disease
Date: Tuesday, July 14, 10:50 am - 11:10 amRoom: The Liffey A

Presenting author: Jinfeng Liu, Genentech, United States

Mark McCleland, Genentech, United States
Eric Stawiski, Genentech, United States
Oleg Mayba, Genentech, United States
Peter Haverty, Genentech, United States
Steffen Durinck, Genentech, United States
Ying-Jiun Chen, Genentech, United States
Christiaan Klijn, Genentech, United States
Suchit Jhunjhunwala, Genentech, United States
Michael Lawrence, Genentech, United States
Hanbin Liu, Genentech, United States
Yinan Wan, Genentech, United States
Vivek Chopra, Genentech, United States
Murat Yaylaoglu, Genentech, United States
Wenlin Yuan, Genentech, United States
Connie Ha, Genentech, United States
Houston Gilbert, Genentech, United States
Jens Reeder, Genentech, United States
Gregoire Pau, Genentech, United States
Jeremy Stinson, Genentech, United States
Howard Stern, Genentech, United States
Gerard Manning, Genentech, United States
Thomas Wu, Genentech, United States
Richard Neve, Genentech, United States
Frederic de Sauvage, Genentech, United States
Zora Modrusan, Genentech, United States
Somasekar Seshagiri, Genentech, United States
Ron Firestein, Genentech, United States
Zemin Zhang, Genentech, United States

Session Chair: Louxin Zhang

Presentation Overview: Show
Integrative data analysis of genomic and transcriptomic alterations has become critical towards our understanding of disease drivers and personalized cancer therapy. Here, we describe the first comprehensive characterization of paired exomes and transcriptomes of 48 primary tumors and 21 cell lines from gastric cancer, the second leading cause of worldwide cancer mortality. We found that more than half of our patient collection could potentially benefit from targeted therapies. We performed systematic analysis of both mutation-dependent aberrant splicing and mutation-independent splicing isoforms in gastric cancer, and identified 55 splice-site mutations accompanied by aberrant splicing products and about 200 genes with differential isoform usage between tumors and normals. Among genes in cancer pathways found to have altered splicing in tumors, we discovered that the long isoform of ZAK kinase was preferentially upregulated in several cancer types, and isoform-specific oncogenic properties of ZAK were subsequently confirmed by functional validation.

TOP

TP097 (PT) - Inferring Models of Multiscale Copy Number Evolution for Single-Tumor Phylogenetics
Theme: Genes / Disease
Date: Tuesday, July 14, 11:40 am - 12:00 pmRoom: The Auditorium

Presenting author: Russell Schwartz, Carnegie Mellon University, United States

E. Michael Gertz, NCBI/NLM/NIH, United States
Darawalee Wangsa, NCI/NIH, United States
Thomas Ried, NCI/NIH, United States
Alejandro Schaffer, NCBI/NLM/NIH, United States
Salim Akhter Chowdhury, Carnegie Mellon University, United States

Session Chair: Niko Beerenwinkel

Presentation Overview: Show
Motivation: Phylogenetic algorithms have begun to see widespread use in cancer research to reconstruct processes of evolution in tumor progression. Developing reliable phylogenies for tumor data requires quantitative models of cancer evolution that include the unusual genetic mechanisms by which tumors evolve, such as chromosome abnormalities, and allow for heterogeneity between tumor types and individual patients. Previous work on inferring phylogenies of single tumors by copy number evolution assumed models of uniform rates of genomic gain and loss across different genomic sites and scales, a substantial oversimplification necessitated by a lack of algorithms and quantitative parameters for fitting to more realistic tumor evolution models.

Results: We propose a framework for inferring models of tumor progression from single-cell gene copy number data, including variable rates for different gain and loss events. We propose a new algorithm for identification of most parsimonious combinations of single gene and single chromosome events. We extend it via dynamic programming to include genome duplications. We implement an expectation maximization (EM)-like method to estimate mutation-specific and tumor-specific event rates concurrently with tree reconstruction. Application of our algorithms to real cervical cancer data identifies key genomic events in disease progression consistent with prior literature. Classification experiments on cervical and tongue cancer datasets lead to improved prediction accuracy for the metastasis of primary cervical cancers and for tongue cancer survival.

Availability: Our software (FISHtrees) and two datasets are available at ftp://ftp.ncbi.nlm.nih.gov/pub/FISHtrees.

TOP

TP098 (LT) - Entropy-scaling search of massive biological data
Theme: Data
Date: Tuesday, July 14, 11:40 am - 12:00 pmRoom: Liffey Hall 2

Presenting author: Noah Daniels, Massachusetts Institute of Technology, United States

Y. William Yu, MIT, United States
Bonnie Berger, MIT, United States
David Danko, MIT, United States

Session Chair: Ioannis Xenarios

Presentation Overview: Show
The continual onslaught of new omics data has forced upon scientists the fortunate problem of having too much data to analyze. Luckily, it turns out that many datasets exhibit well-defined structure that can be exploited for the design of smarter analysis tools. We introduce an entropy-scaling data structure—which given a low fractal dimension database, scales in both time and space with the entropy of that underlying database—to perform similarity search, a fundamental operation in data science. Using these ideas, we present accelerated versions of standard tools for use by practitioners in the three domains of high-throughput drug screening, metagenomics, and protein structure search, none of which have any loss in specificity or significant loss in sensitivity: a 12x speedup of small molecule similarity search (SMSD) with less than 4% loss in sensitivity; a 673x speedup of BLASTX with less than 5% loss in sensitivity; and a 10x speedup of protein structure search (FragBag) with less than 0.2% loss in sensitivity.

TOP

TP099 (PT) - cNMA: A framework of encounter complex-based normal mode analysis to model conformational changes in protein interactions
Theme: Proteins
Date: Tuesday, July 14, 11:40 am - 12:00 pmRoom: The Liffey B

Presenting author: Yang Shen, Texas A&M University, United States

Tomasz Oliwa, Toyota Technological Institute at Chicago, United States

Session Chair: Francisco Melo Ledermann

Presentation Overview: Show
Motivation: It remains both a fundamental and practical challenge to understand and anticipate motions and conformational changes of proteins during their associations. Conventional normal mode analysis (NMA) based on the anisotropic network model (ANM) addresses the challenge by generating normal modes reflecting intrinsic flexibility of proteins, which follows a conformational selection model for protein-protein interactions. But earlier studies have also found cases where conformational selection alone could not adequately explain conformational changes, and other models have been proposed. Moreover, there is a pressing demand for constructing a much reduced but still relevant subset of protein conformational space in order to improve computational efficiency and accuracy in protein docking, especially for the difficult cases with significant conformational changes.

Method and Results: With both conformational selection and induced fit models considered, we extend ANM to include concurrent but differentiated intra- and inter-molecular interactions and develop an encounter complex-based NMA (cNMA) framework. Theoretical analysis and empirical results over a large data set of significant conformational changes indicate that cNMA is capable of generating conformational vectors considerably better at approximating conformational changes with contributions from both intrinsic flexibility and inter-molecular interactions than conventional NMA only considering intrinsic flexibility does. The empirical results also indicate that a straightforward application of conventional NMA to an encounter complex often does not improve upon NMA for an individual protein under study and intra- and inter-molecular interactions need to be differentiated properly. Moreover, in addition to induced motions of a protein under study, the induced motions of its binding partner as well as the coupling between the two sets of protein motions present in a near-native encounter complex lead to the improved performance. A study to isolate and assess the sole contribution of intermolecular interactions towards improvements against conventional NMA further validates the additional benefit from induced-fit effects. Taken together, these results provide new insights into molecular mechanisms underlying protein interactions and new tools for dimensionality reduction for flexible protein docking.

Availability: Source code is available upon request.

TOP

TP100 (PT) - Reconstruction of clonal trees and tumor composition from multi-sample sequencing data
Theme: Genes / Disease
Date: Tuesday, July 14, 12:00 pm - 12:20 pmRoom: The Auditorium

Presenting author: Mohammed El-Kebir, Brown University, United States

Layla Oesper, Brown University, United States
Hannah Acheson-Field, Brown University, United States
Ben Raphael, Brown University, United States

Session Chair: Niko Beerenwinkel

Presentation Overview: Show
Motivation: DNA sequencing of multiple samples from the same tumor provides data to analyze the process of clonal evolution in the population of cells that give rise to a tumor.

Results: We formalize the problem of reconstructing the clonal evolution of a tumor using single-nucleotide mutations as the Variant Allele Frequency Factorization Problem (VAFFP). We derive a combinatorial characterization of the solutions to this problem and show that the problem is NP-complete. We derive an integer linear programming solution to the VAFFP in the case of error-free data and extend this solution to real data with a probabilistic model for errors. The resulting AncesTree algorithm is better able to identify ancestral relationships between individual mutations than existing approaches, particularly in ultra-deep sequencing data when high read counts for mutations yield high confidence variant allele frequencies.

TOP

TP101 (PT) - MeSHLabeler: Improving the Accuracy of Large-scale MeSH indexing by Integrating Diverse Evidence
Theme: Data
Date: Tuesday, July 14, 12:00 pm - 12:20 pmRoom: Liffey Hall 2

Presenting author: Shanfeng Zhu, Fudan University, China

Shengwen Peng, Fudan University, China
Junqiu Wu, Central South University, China
Chengxiang Zhai, UIUC, United States
Hiroshi Mamitsuka, Kyoto University, Japan
Ke Liu, Fudan University, China

Session Chair: Ioannis Xenarios

Presentation Overview: Show
Motivation: Medical Subject Headings (MeSH) are used by the National Library of Medicine (NLM) to index almost all citations in MEDLINE, which greatly facilitates the applications of biomedical information retrieval and text mining. To reduce the time and financial cost of manual annotation, NLM has developed a software package, Medical Text Indexer (MTI), for assisting MeSH annotation, which uses k-nearest neighbors (KNN), pattern matching and indexing rules. Other types of information, such as prediction by MeSH classifiers (trained separately), can also be used for automatic MeSH annotation. However, existing methods cannot effectively integrate multiple types of evidence for MeSH annotation.

Methods: We propose a novel framework, MeSHLabeler, to integrate multiple types of evidence for accurate MeSH annotation by using "learning to rank". Evidence includes numerous predictions from MeSH classifiers, KNN, pattern matching, MTI and the correlation between different MeSH terms, etc. Each MeSH classifier is trained independently, and thus prediction scores from different classifiers are incomparable. To address this issue, we have developed an effective score normalization procedure to improve the prediction accuracy.

Result: MeSHLabeler won first place in Task 2A of the 2014 BioASQ challenge, achieving a Micro F-measure of 0.6248 for 9,040 citations provided by the BioASQ challenge. Note that this accuracy is around 9.15% higher than 0.5724, obtained by MTI.

Availability: The software is available upon request.
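
The 9.15% figure in the Result above is a relative improvement, not a difference in percentage points. A short check of the arithmetic, using only the two Micro F-measure values quoted in the abstract (the function and variable names are illustrative):

```python
def relative_improvement(new, baseline):
    """Return the gain of `new` over `baseline` as a percentage of `baseline`."""
    return (new - baseline) / baseline * 100.0


meshlabeler_f = 0.6248  # Micro F-measure reported for MeSHLabeler
mti_f = 0.5724          # Micro F-measure reported for MTI
gain = relative_improvement(meshlabeler_f, mti_f)
print(f"{gain:.2f}%")
```

The absolute difference between the two scores is 0.0524, which is about 9.15% of the MTI baseline.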

TOP

TP102 (HT) - Sequence co-evolution gives 3D contacts and structures of protein complexes
Theme: Proteins
Date: Tuesday, July 14, 12:00 pm - 12:20 pmRoom: The Liffey B

Presenting author: Charlotta Schaerfe, University of Tübingen/Harvard Medical School, Germany

Thomas Hopf, Technische Universität München/Harvard Medical School, Germany
João Rodrigues, Utrecht University, Netherlands
Anna Green, Harvard Medical School, United States
Oliver Kohlbacher, University of Tübingen, Germany
Chris Sander, Memorial Sloan Kettering Cancer Center, United States
Alexandre Bonvin, Utrecht University, Netherlands
Debora Marks, Harvard Medical School, United States

Session Chair: Francisco Melo Ledermann

Presentation Overview: Show
The interactions of proteins with other biomolecules are essential for all biological activity, and thus the accurate prediction of protein-protein interaction partners and interface residues has been of great interest to the scientific community. Here we present a method, EVcomplex, that predicts such data from the evolutionary sequence record alone by making use of residue coevolution between proteins.
This method can have stark implications for various topics from the determination of the actual binding partners and binding sites in large protein complexes to whole genome interactome predictions. In the presentation I will show that the evolutionary record allows us to predict novel protein-protein interactions as well as alternate binding conformations without additional external knowledge of the protein’s 3D structure.

TOP

TP103 (HT) - Reconstructing the Evolutionary History of Tumors
Theme: Genes / Disease
Date: Tuesday, July 14, 12:20 pm - 12:40 pmRoom: The Auditorium

Presenting author: Amit Deshwar, University of Toronto, Canada

Shankar Vembu, University of Toronto, Canada
Christina Yung, Ontario Institute for Cancer Research, Canada
Gun Ho Jang, Ontario Institute for Cancer Research, Canada
Lincoln Stein, Ontario Institute for Cancer Research, Canada

Session Chair: Niko Beerenwinkel

Presentation Overview: Show
Tumors often contain multiple, genetically diverse subpopulations. Reconstructing the genotype of these subpopulations by determining which of the somatic tumor-associated mutations they contain is a problem of considerable interest to aid in the understanding of tumor development and treatment response. While there has been considerable progress in automated methods for reconstruction, many fundamental questions about this problem remain unanswered. Many subclonal reconstruction methods, including ours, attempt to reconstruct the evolutionary history of the tumour as a means to assign complete genotypes to each subpopulation. I will discuss the current state of the field and our latest work on this problem. I will introduce PhyloWGS, a Bayesian method that is the first to use CNVs and SNVs to perform phylogenetic subclonal reconstruction. PhyloWGS returns a distribution over possible subclonal reconstructions, enabling the identification of portions of the reconstruction that are highly certain and those that are not.

TOP

TP104 (PT) - Knowledge-driven geospatial location resolution for phylogeographic models of virus migration
Theme: Data
Date: Tuesday, July 14, 12:20 pm - 12:40 pmRoom: Liffey Hall 2

Presenting author: Graciela Gonzalez, Arizona State University, United States

Tasnia Tahsin, Arizona State University, United States
Rachel Beard, Arizona State University, United States
Mari Firago, Arizona State University, United States
Robert Rivera, Arizona State University, United States
Matthew Scotch, Arizona State University, United States
Davy Weissenbacher, Arizona State University, United States

Session Chair: Ioannis Xenarios

Presentation Overview: Show
Diseases caused by zoonotic viruses (viruses transmittable between humans and animals) are a major threat to public health throughout the world. By studying virus migration and mutation patterns, the field of phylogeography provides a valuable tool for improving their surveillance. A key component in phylogeographic analysis of zoonotic viruses involves identifying the specific locations of relevant viral sequences. This is usually accomplished by querying public databases such as GenBank and examining the geospatial metadata in the record. When sufficient detail is not available, a logical next step is for the researcher to conduct a manual survey of the corresponding published articles.

In this paper, we present a system for detection and disambiguation of locations (toponym resolution) in full-text articles in order to automate the retrieval of sufficient metadata. Our system has been tested on a manually annotated corpus of journal articles related to phylogeography using integrated heuristics for location disambiguation including a distance heuristic, a population heuristic, and a novel heuristic utilizing knowledge obtained from GenBank metadata (i.e. a "metadata heuristic").

For detecting and disambiguating locations, our system performed best using the metadata heuristic (0.54 Precision, 0.89 Recall and 0.68 F-score). Precision reaches 0.88 when examining only the disambiguation of location names. Our error analysis showed that a noticeable increase in the accuracy of toponym resolution is possible by improving the geospatial location detection. By improving these fundamental automated tasks, our system can be a useful resource to phylogeographers who rely on geospatial metadata of GenBank sequences.
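
The relationship between the precision, recall and F-score quoted above can be checked with a small sketch. It assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted for the metadata heuristic.
f = f_score(0.54, 0.89)
print(f"{f:.3f}")
```

From the rounded inputs this gives roughly 0.67 rather than the reported 0.68. The small gap is consistent with the published precision and recall having been rounded to two decimals before being listed.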

TOP

TP105 (PT) - Finding Optimal Interaction Interface Alignments between Biological Complexes
Theme: Proteins
Date: Tuesday, July 14, 12:20 pm - 12:40 pmRoom: The Liffey B

Presenting author: Xuefeng Cui, King Abdullah University of Science and Technology, Saudi Arabia

Hammad Naveed, King Abdullah University of Science and Technology, Saudi Arabia
Xin Gao, King Abdullah University of Science and Technology, Saudi Arabia

Session Chair: Francisco Melo Ledermann

Presentation Overview: Show
Motivation: Biological molecules perform their functions through interactions with other molecules. Structure alignment of interaction interfaces between biological complexes is an indispensable step in detecting their structural similarities, which are key to understanding their evolutionary histories and functions. Although various structure alignment methods have been developed to successfully assess the similarities of protein structures or certain types of interaction interfaces, existing alignment tools cannot directly align arbitrary types of interfaces formed by protein, DNA or RNA molecules. Specifically, they require a "blackbox preprocessing" to standardize interface types and chain identifiers. Yet their performance is limited and sometimes unsatisfactory.

Results: Here we introduce a novel method, PROSTA-inter, that automatically determines and aligns interaction interfaces between two arbitrary types of complex structures. Our method uses sequentially remote fragments to search for the optimal superimposition. The optimal residue matching problem is then formulated as a maximum weighted bipartite matching problem to detect the optimal sequence order-independent alignment. Benchmark evaluation on all non-redundant protein-DNA complexes in PDB shows significant performance improvement of our method over TM-align and iAlign (with the "blackbox preprocessing"). Two case studies where our method discovers, for the first time, structural similarities between two pairs of functionally related protein-DNA complexes are presented. We further demonstrate the power of our method on detecting structural similarities between a protein-protein complex and a protein-RNA complex, which is biologically known as a protein-RNA mimicry case.
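
The maximum weighted bipartite matching formulation mentioned above can be illustrated on a toy example. The sketch below is not PROSTA-inter: it solves the matching by exhaustive search, which is only practical for a handful of residues, and the weight matrix is a hypothetical set of residue-pair similarity scores. A real tool would presumably use a polynomial-time method such as the Hungarian algorithm.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Find the one-to-one row-to-column assignment with maximum total weight.

    weights[i][j] is a hypothetical similarity between residue i of one
    interface and residue j of the other. Every row is matched to a distinct
    column, so the number of rows must not exceed the number of columns.
    """
    n_rows = len(weights)
    n_cols = len(weights[0])
    best_score = float("-inf")
    best_pairs = []
    # Try every way of assigning distinct columns to the rows.
    for cols in permutations(range(n_cols), n_rows):
        score = sum(weights[i][j] for i, j in enumerate(cols))
        if score > best_score:
            best_score = score
            best_pairs = list(enumerate(cols))
    return best_score, best_pairs


# Toy scores in which the best matching does not follow sequence order.
scores = [
    [0.1, 0.9, 0.2],
    [0.8, 0.2, 0.3],
    [0.3, 0.4, 0.7],
]
best_score, best_pairs = max_weight_matching(scores)
print(best_score, best_pairs)
```

In this toy matrix the best assignment pairs residue 0 with residue 1 and residue 1 with residue 0, which shows why a matching formulation can recover alignments that are independent of sequence order.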

TOP

TP106 (LT) - Importance of rare copy number alterations for personalized tumor characterization
Theme: Disease
Date: Tuesday, July 14, 2:00 pm - 2:20 pmRoom: The Auditorium

Presenting author: Andreas Beyer, University of Cologne, Germany

Andreas Beyer, University of Cologne, Germany
Betty Friedrich, ETH Zurich, Switzerland
Michael Seifert, TU Dresden, Germany

Session Chair: Natasa Przulj

Presentation Overview: Show
Copy number alterations (CNAs) of large genomic regions are frequent in many tumor types, but only few of them are assumed to be relevant for the cancerous phenotype. It has proven exceedingly difficult to ascertain rare mutations that might have strong effects in individual patients. Here, we show that a genome-wide transcriptional regulatory network inferred from gene expression and gene copy number data of 768 human cancer cell lines can be used to quantify the impact of individual patient-specific gene CNAs on cancer-specific survival signatures. The model was highly predictive for gene expression in 4,548 clinical samples originating from 13 different tissues. Focused analysis of tumors from six tissues revealed that in an individual patient a combination of up to 100 gene CNAs directly or indirectly affected the expression of clinically relevant survival signature genes. Importantly, rare patient-specific mutations (< 1% in a given cohort) often had stronger effects on signature genes than frequent mutations. Subsequent integration with genomic data suggests that frequency variation among high-impact genes is mainly driven by gene location rather than gene function. Our framework contributes to the individualized quantification of cancer risk, along with determining individual key risk factors and their downstream targets.

TOP

TP107 (LT) - An Integrated Mass Spectrometry-Computational Approach for Modelling Large Protein Assemblies
Theme: Proteins
Date: Tuesday, July 14, 2:00 pm - 2:20 pmRoom: The Liffey B

Presenting author: Argyris Politis, King's College London, United Kingdom

Session Chair: Donna Slonim

Presentation Overview: Show
We present an integrated mass spectrometry (MS)-computational method for modelling the structure and dynamics of large protein assemblies. This method computationally integrates orthogonal data sets derived from native MS, ion mobility MS and labelling MS experiments with different levels of resolution and information content. We assessed the method on its ability to reproduce the native structures in a set of five benchmark complexes with varying levels of MS-derived data. Then we applied the method to characterizing the 3D architecture of the yeast eukaryotic initiation factor eIF3 in complex with eIF5.

TOP

TP108 (LT) - Accurate phasing of allele-specific copy-numbers for inferring tumour evolution with probe-level resolution
Theme: Disease
Date: Tuesday, July 14, 2:20 pm - 2:40 pmRoom: The Auditorium

Presenting author: Roland Schwarz, European Molecular Biology Laboratory - European Bioinformatics Institute, United Kingdom

Roland Schwarz, European Molecular Biology Laboratory, United Kingdom

Session Chair: Natasa Przulj

Presentation Overview: Show
Accurate reconstruction of the evolutionary history of cancer in the patient and quantification of intra-tumour heterogeneity are current challenges in cancer genomics. The accuracy of tree inference from genomic rearrangements depends on the quality of the phasing of copy-numbers: the assignment of major and minor copy-numbers to the two physical parental alleles. So far phasing has been done using evolutionary criteria alone, a heuristic and computationally expensive procedure which impedes probe-level resolution tree reconstruction.

We here present a novel phasing algorithm, which extends our previous work on allele-specific segmentation of copy-numbers. Using the shared genetic background of multiple samples from the same patient we assign copy-numbers to physical alleles based on the bi-allelic frequency distribution of heterozygous SNPs. In combination with our previously established evolutionary phasing algorithm this provides a new, accurate and fast phasing method which leverages the available SNP data effectively. This is a crucial step towards probe-level resolution tree inference on genomic rearrangement events in cancer and exact quantification of genetic heterogeneity for routine applications in translational cancer research.

TP109 (HT) - In silico prediction of physical protein interactions and characterization of interactome orphans
Theme: Proteins
Date: Tuesday, July 14, 2:20 pm - 2:40 pm
Room: The Liffey B

Presenting author: Igor Jurisica, Princess Margaret Cancer Centre, Canada

Flavia Pivetta, CRO, Italy
A Losardo, CRO, Italy
Z Ding, MDA, United States
Yun Niu, Nanjing University, China
F Vafaee, USW, Australia
Julia Petschnigg, UCL, United Kingdom
Gordon Mills, MDA, United States

Session Chair: Donna Slonim

Presentation Overview:
Protein interaction networks represent an essential infrastructure for systems biology. However, about 20% of human proteins have no known interactions and another 33% have five or fewer. Many of these proteins play important roles in disease and are potential drug targets. To reduce this “disease-related sparseness” of the human interactome, we introduced a data mining-based method, FpClass, and predicted 250,452 high-confidence PPIs among 10,529 proteins, including 1,089 interactome orphans. Compared to previous methods, FpClass achieved better agreement with experimentally detected PPIs. Using three bioassays we validated 137 of 233 tested predictions; 5 of them involve orphans now shown to interact with P53. Overall, validation achieved 74% sensitivity with 53% FDR. To better understand why some proteins have few known interactions, we investigated their properties and discovered that they are significantly younger, more tissue specific, and more likely to be extracellular than other proteins. However, additional challenges prevent systematic study of these proteins.

TP110 (LT) - Inferring clonal evolution from single-cell sequencing data
Theme: Disease
Date: Tuesday, July 14, 2:40 pm - 3:00 pm
Room: The Auditorium

Presenting author: Edith Ross, University of Cambridge, United Kingdom

Edith Ross, University of Cambridge, United Kingdom
Florian Markowetz, University of Cambridge, United Kingdom

Session Chair: Natasa Przulj

Presentation Overview:
Tumour evolution leads to genetic intra-tumour heterogeneity, which poses major challenges to cancer therapy. While this heterogeneity has been documented in several cases, many details of the underlying evolutionary processes are still unknown.

Studying pathways of tumour evolution promises to provide insights into early stages of cancer development and to allow predictions about whether or not early-stage tumours are likely to progress to more aggressive forms. So far, most methods for inferring tumour phylogenies use bulk sequencing data. However, they struggle to deconvolute the mixed signal into separate clones and their corresponding genotypes.

Here, we present oncoNEM, a probabilistic method for inferring intra-tumour evolutionary lineage trees from noisy exome- or genome-wide single-cell sequencing data. OncoNEM is based on the nested structure of mutations observed between cells and jointly infers the tree structure, the number of clones and their composition.

We evaluate the accuracy of oncoNEM in the controlled setting of a simulation study and demonstrate that (i) our method can accurately infer trees of tumour evolution despite the high allelic dropout rates of current single-cell sequencing technologies, (ii) it is robust to inaccuracies in the estimation of model parameters and (iii) it substantially outperforms competing methods.

TP111 (HT) - Cereblon as a gateway for pharmacologically induced teratogenicity
Theme: Proteins
Date: Tuesday, July 14, 2:40 pm - 3:00 pm
Room: The Liffey B

Presenting author: Andrei Lupas, Max-Planck-Institute for Developmental Biology, Germany

Iuliia Boichenko, Max-Planck-Institute for Developmental Biology, Germany
Mateusz Korycinski, Max-Planck-Institute for Developmental Biology, Germany
Hongbo Zhu, Max-Planck-Institute for Developmental Biology, Germany
Murray Coles, Max-Planck-Institute for Developmental Biology, Germany
Fabio Zanini, Max-Planck-Institute for Developmental Biology, Germany
Marcus Hartmann, Max-Planck-Institute for Developmental Biology, Germany
Birte Hernandez Alvarez, Max-Planck-Institute for Developmental Biology, Germany

Session Chair: Donna Slonim

Presentation Overview:
In the public perception, thalidomide mainly evokes children with stunted limbs. Less well known is its ongoing importance for treating multiple myeloma and leprosy. Interest in its further pharmacological development thus remains high, but is hindered by the limited understanding of its teratogenic side-effects. Even the main target of thalidomide in the human body, cereblon, was unknown until recently. Given the intractability of cereblon for biochemical studies, we analyzed the evolution of its thalidomide-binding domain and used sequence-structure relationships to identify a prokaryotic model system, which we validated in vitro and in vivo, in a zebrafish fin development assay. In computational and experimental searches we identified uridine as the first biological, universally available ligand. We also found that a surprisingly large number of pharmacologically important substances with known teratogenic effects act through the same binding site as thalidomide, identifying cereblon as a gateway for teratogenicity in the human body.

TP112 (PT) - Exploring the structure and function of temporal networks with dynamic graphlets
Theme: Systems
Date: Tuesday, July 14, 3:30 pm - 3:50 pm
Room: The Auditorium

Presenting author: Tijana Milenkovic, University of Notre Dame, United States

Huili Chen, University of Notre Dame, United States
Yuriy Hulovatyy, University of Notre Dame, United States

Session Chair: Natasa Przulj

Presentation Overview:
Motivation: With the increasing availability of temporal real-world networks, how can these data be studied efficiently? One can model a temporal network as a single aggregate static network, or as a series of time-specific snapshots, each being an aggregate static network over the corresponding time window. One can then apply established static analysis methods to the resulting aggregate network(s), but this loses valuable temporal information, either completely (single aggregate network) or at the interface between different snapshots (snapshot series). Here, we develop a novel approach for studying a temporal network more explicitly, by capturing inter-snapshot relationships.

Results: We base our methodology on well-established graphlets (subgraphs), which have been proven in numerous contexts in static network research. We develop new theory to allow for graphlet-based analyses of temporal networks. Our new notion of dynamic graphlets is different from existing dynamic network approaches that are based on temporal motifs (statistically significant subgraphs). The latter have limitations: their results depend on the choice of a null network model that is required to evaluate the significance of a subgraph, and choosing a good null model is non-trivial. Our dynamic graphlets overcome the limitations of the temporal motifs. Also, when we aim to characterize the structure and function of an entire temporal network or of individual nodes, our dynamic graphlets outperform the static graphlets. Clearly, accounting for temporal information helps. We apply dynamic graphlets to temporal age-specific molecular network data to deepen our limited knowledge about human aging.
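
The two static representations named in the Motivation can be illustrated with a minimal sketch. It is not the authors' dynamic graphlet method; the function names `aggregate` and `snapshots`, the integer timestamps, and the toy event list are all assumptions made for illustration.

```python
from collections import defaultdict

def aggregate(events):
    """Collapse timestamped edges (u, v, t) into one static edge set."""
    return {frozenset((u, v)) for u, v, _ in events}

def snapshots(events, window):
    """Group timestamped edges into consecutive windows of length `window`."""
    by_window = defaultdict(set)
    for u, v, t in events:
        # Integer division assigns each integer timestamp to a window index.
        by_window[t // window].add(frozenset((u, v)))
    return [by_window[k] for k in sorted(by_window)]

# Hypothetical toy data: (node, node, time) triples.
events = [("a", "b", 0), ("b", "c", 1), ("a", "b", 2), ("c", "d", 3)]
static_edges = aggregate(events)       # one network, all timing discarded
windows = snapshots(events, window=2)  # one edge set per time window
```

In the aggregate network the order of the a-b and b-c events is gone entirely. In the snapshot series the order is kept between windows but not within one, which is the loss of information the abstract refers to.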

TP113 (HT) - Computational saturated mutagenesis for mapping protein binding landscapes and identifying affinity- and specificity-enhancing mutations
Theme: Proteins
Date: Tuesday, July 14, 3:30 pm - 3:50 pm
Room: The Liffey B

Presenting author: Julia Shifman, Hebrew University of Jerusalem, Israel

Yonatan Aizner, Hebrew University of Jerusalem, Israel
Jason Shirian, Hebrew University of Jerusalem, Israel
Oz Sharabi, Hebrew University of Jerusalem, Israel

Session Chair: Donna Slonim

Presentation Overview:
We developed an in silico saturation mutagenesis protocol that allows us to scan any binding interface with all amino acids and to predict changes in free energy of binding due to all single mutations, thereby constructing binding landscapes for various protein-protein interactions (PPIs). We tested the performance of the protocol on two evolutionarily different classes of PPIs, high-affinity and multispecific, and demonstrated that their binding landscapes are remarkably different. Wild-type sequences of high-affinity complexes are nearly optimized for binding and contain only a handful of mutations that enhance binding affinity further. In contrast, sequences of multispecific proteins lie far from the fitness maximum, presenting multiple possibilities for improvement. In both examples we show that our computational predictions agree well with experimental results and allow for successful identification of affinity- and specificity-enhancing mutations and of cold-spot positions, where mutations to several amino acids improve affinity.
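
The scanning step described above can be sketched as follows. This only enumerates single substitutions and flags cold-spot positions from a table of energies. The sequence, the interface positions, the `min_improving` threshold, and the energy values are hypothetical placeholders, and the sketch assumes the convention that a negative change in binding free energy means tighter binding. The energy prediction itself, which is the substance of the authors' protocol, is not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_mutants(sequence, interface_positions):
    """Yield (position, wild_type, mutant) for every single substitution."""
    for pos in interface_positions:
        wt = sequence[pos]
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield pos, wt, aa

def cold_spots(ddg, min_improving=3, cutoff=0.0):
    """Positions where at least `min_improving` substitutions have ddG < cutoff."""
    counts = {}
    for (pos, _wt, _aa), value in ddg.items():
        if value < cutoff:
            counts[pos] = counts.get(pos, 0) + 1
    return sorted(p for p, n in counts.items() if n >= min_improving)

sequence = "MKTAYIAK"   # hypothetical sequence
interface = [2, 5]      # hypothetical 0-based interface positions
mutants = list(single_mutants(sequence, interface))

# Placeholder energies, not predictions: three substitutions at position 5
# are marked as improving, every other substitution as destabilizing.
ddg = {m: 1.0 for m in mutants}
for aa in "FWY":
    ddg[(5, "I", aa)] = -0.5

print(len(mutants), cold_spots(ddg))
```

Each interface position contributes 19 substitutions, so the scan grows linearly with the size of the interface.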

TP114 (LT) - Data visualization and modeling using Atlas of Cancer Signaling Network predicts clinical outcome
Theme: Systems
Date: Tuesday, July 14, 3:50 pm - 4:10 pm
Room: The Auditorium

Presenting author: Inna Kuperstein, Institut Curie - U900 INSERM - Mines ParisTech, France

Inna Kuperstein, Institut Curie - U900 INSERM - Mines ParisTech, France
Eric Bonnet, Institut Curie - U900 INSERM - Mines ParisTech, France
Eric Viara, Institut Curie - U900 INSERM - Mines ParisTech, France
Maia Chanrion, Institut Curie - U900 INSERM - Mines ParisTech, France
Hien-Anh Nguyen, Institut Curie - U900 INSERM - Mines ParisTech, France
David Cohen, Institut Curie - U900 INSERM - Mines ParisTech, France
Laurence Calzone, Institut Curie - U900 INSERM - Mines ParisTech, France
Luca Grieco, Institut Curie - U900 INSERM - Mines ParisTech, France
Christophe Russo, Institut Curie - U900 INSERM - Mines ParisTech, France
Maria Kondratova, Institut Curie - U900 INSERM - Mines ParisTech, France
Marie Dutreix, Institut Curie - U900 INSERM - Mines ParisTech, France
Sylvie Robine, Institut Curie - U900 INSERM - Mines ParisTech, France
Emmanuel Barillot, Institut Curie - U900 INSERM - Mines ParisTech, France
Andrei Zinovyev, Institut Curie - U900 INSERM - Mines ParisTech, France

Session Chair: Natasa Przulj

Presentation Overview:
The successful application of bioinformatics and systems biology methods to the analysis of high-throughput data in cancer research depends on the availability of global and detailed reconstructions of signaling networks amenable to computational analysis. The Atlas of Cancer Signaling Network (ACSN) is an interactive and comprehensive map of molecular mechanisms implicated in cancer that includes tools for map navigation, visualization and analysis of molecular data in the context of signaling network maps. Constructing and updating ACSN involves manual literature curation and participation of experts in the corresponding fields. The cancer-oriented content of ACSN is original and covers major mechanisms involved in cancer progression. Cell signaling mechanisms are depicted in detail, together creating a seamless ‘geographic-like’ map of molecular interactions frequently deregulated in cancer. The map is browsable through the NaviCell web interface, which uses the Google Maps engine and the semantic zooming principle. The associated blog provides a forum for commenting on and curating the ACSN content. ACSN allows users to upload heterogeneous omics data onto the maps for visualization and functional analysis. We suggest several scenarios for applying ACSN in cancer research to visualize high-throughput data. In addition, we show a study on drug sensitivity prediction using the ACSN. Finally, we describe how the epithelial to mesenchymal transition (EMT) signaling network from the ACSN collection has been used for finding metastasis inducers in colon cancer through network analysis. ACSN may support data analysis and interpretation, patient stratification, prediction of treatment response and resistance to cancer drugs, and design of novel treatment strategies.

TP115 (PT) - Using Kernelized Partial Canonical Correlation Analysis To Study Directly Coupled Side Chains and Allostery in Small G Proteins
Theme: Proteins
Date: Tuesday, July 14, 3:50 pm - 4:10 pm
Room: The Liffey B

Presenting author: Mu Zhu, University of Waterloo, Canada

Forbes Burkowski, University of Waterloo, Canada
Mu Zhu, University of Waterloo, Canada

Session Chair: Donna Slonim

Presentation Overview:
Motivation: Inferring structural dependencies among a protein’s side chains helps us understand their coupled motions. It is known that coupled fluctuations can reveal pathways of communication used for information propagation in a molecule. Side-chain conformations are commonly represented by multivariate angular variables, but existing partial correlation methods that can be applied to this inference task are not capable of handling multivariate angular data. We propose a novel method to infer direct couplings from this type of data, and show that this method is useful for identifying functional regions and their interactions in allosteric proteins.

Results: We developed a novel extension of canonical correlation analysis (CCA), which we call “kernelized partial CCA” (or simply KPCCA), and used it to infer direct couplings between side chains, while disentangling these couplings from indirect ones. Using the conformational information and fluctuations of the inactive structure alone for allosteric proteins in the Ras and other Ras-like families, our method identified allosterically important residues not only as strongly coupled ones but also in densely connected regions of the interaction graph formed by the inferred couplings. Our results were in good agreement with other empirical findings. By studying distinct members of the Ras, Rho, and Rab sub-families, we show further that KPCCA was capable of inferring common allosteric characteristics in the small G protein super-family.
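
The distinction between direct and indirect couplings can be illustrated with ordinary linear partial correlation on scalar variables. This is a minimal sketch of the general idea only, not KPCCA, which targets multivariate angular data and uses kernels. The simulated chain x → y → z, the sample size, and the seed are assumptions chosen for the example.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def partial_corr(x, z, y):
    """Correlation between x and z after removing the linear effect of y."""
    rxz, rxy, ryz = pearson(x, z), pearson(x, y), pearson(y, z)
    return (rxz - rxy * ryz) / math.sqrt((1 - rxy ** 2) * (1 - ryz ** 2))

rng = random.Random(0)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [a + rng.gauss(0, 1) for a in x]   # y depends on x
z = [b + rng.gauss(0, 1) for b in y]   # z depends on y only, not on x directly

marginal = pearson(x, z)         # clearly nonzero: x and z are coupled via y
direct = partial_corr(x, z, y)   # close to zero: no direct x-z coupling
print(round(marginal, 2), round(direct, 2))
```

The marginal correlation reports a coupling between x and z even though z depends on x only through y. Conditioning on y removes it, which is the sense in which a partial method separates direct couplings from indirect ones.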

TP116 (LT) - Detecting Molecular Similarities Between Allergenic And Metazoan Parasitic Proteins: Allergy In The Light of Immunity
Theme: Proteins
Date: Tuesday, July 14, 4:10 pm - 4:30 pm
Room: The Liffey B

Presenting author: Nicholas Furnham, London School of Hygiene and Tropical Medicine, United Kingdom

Nidhi Tyagi, European Molecular Biology Laboratory, United Kingdom
Edward Farnell, University of Cambridge, United Kingdom
Colin Fitzsimmons, University of Cambridge, United Kingdom
Stephanie Ryan, University of Edinburgh, United Kingdom
Rick Maizels, University of Edinburgh, United Kingdom
David Dunne, University of Cambridge, United Kingdom
Janet Thornton, European Molecular Biology Laboratory, United Kingdom
Nicholas Furnham, London School of Hygiene & Tropical Medicine, United Kingdom

Session Chair: Donna Slonim

Presentation Overview:
Allergic reactions are observed to be very similar to those implicated in the acquisition of an important degree of immunity against metazoan parasites, eliciting a similar immunoglobulin E (IgE) immune response. Based on the hypothesis that IgE-mediated immune responses evolved to provide extra protection against metazoan parasites rather than to cause allergy, we predict that environmental allergens will share key molecular properties with metazoan parasite antigens that are specifically targeted by IgE. Using large-scale computational studies, we have established molecular similarity between parasite proteins and allergens and are able to predict the regions of parasite proteins that potentially share similarity with the IgE-binding region(s) of allergens. Nearly half of the 2445 parasite proteins that show significant similarity with allergenic proteins fall within the 10 most abundant allergenic protein domain families. Our experimental studies support the predictions: we present the first confirmed example in a worm of a protein resembling the commonest plant pollen allergen, and confirm that it is targeted by IgE in people exposed to infection in a schistosomiasis-endemic area of Uganda. The identification of such similarities explains the ‘off-target’ effects of the IgE-mediated immune system in allergy.
