Paper Presentation Schedule

Attention Conference Presenters - please review the Speaker Information Page available here.

All Highlights and Proceedings Track presentations are presented by scientific area as part of the combined Paper Presentation schedule.


PP01 (HT) Simultaneous Identification of Multiple Driver Pathways in Cancer
Date: Sunday, July 13, 10:30 am - 10:55 am
Room: 311

Presenting author: Mark Leiserson, Brown University, United States

Additional authors:
Dima Blokh, Tel-Aviv University, Israel
Roded Sharan, Tel-Aviv University, Israel
Benjamin Raphael, Brown University, United States

Session Chair: Terry Gaasterland

Presentation Overview:

An important challenge in cancer genome sequencing is to distinguish the small subset of somatic driver mutations that cause cancer from the multitude of random passenger mutations in a tumor. Since patients with the same cancer type typically have different collections of mutations, single-gene tests of recurrence are insufficient for this task. We present Multi-Dendrix, an algorithm to identify combinations of mutations with combinatorial properties consistent with cancer pathways. Multi-Dendrix does not use prior knowledge of pathways, and finds multiple sets of mutations simultaneously since driver mutations target multiple pathways in a patient. We applied Multi-Dendrix to glioblastoma and breast cancer data from The Cancer Genome Atlas. In both cancers, Multi-Dendrix identified gene sets overlapping major signaling pathways -- including Rb, PI(3)K, and p53 -- that were manually annotated in the TCGA publications, as well as novel gene sets that include transcription factors and regulators.

Keyword: Applied Bioinformatics, Protein Interactions & Molecular Networks


PP02 (PT) Ragout - A reference-assisted assembly tool for bacterial genomes
Date: Sunday, July 13, 10:30 am - 10:55 am
Room: 304

Presenting author: Mikhail Kolmogorov, Saint-Petersburg Academic University, Russia

Additional authors:
Benedict Paten, University of California, Santa Cruz, United States
Brian Raney, University of California, Santa Cruz, United States
Son Pham, University of California, San Diego, United States

Session Chair: Serafim Batzoglou

Presentation Overview:

Bacterial genomes are simpler than mammalian ones, and yet assembling them from the data currently generated by high-throughput short-read sequencing machines still results in hundreds of contigs. To improve assembly quality, recent studies have utilized longer Pacific Biosciences (PacBio) reads or jumping libraries to connect contigs into larger scaffolds or help assemblers resolve ambiguities in repetitive regions of the genome. However, their popularity in contemporary genomic research is still limited by high cost and error rates. In this work, we explore the possibility of improving assemblies by using complete genomes from closely related species/strains. We present Ragout, a genome rearrangement approach, to address this problem. In contrast with most reference-guided algorithms, where only one reference genome is used, Ragout uses multiple references along with the evolutionary relationship among these references in order to determine the correct order of the contigs. Additionally, Ragout uses the assembly graph and multi-scale synteny blocks to reduce assembly gaps caused by small contigs from the input assembly. Based on simulations as well as real datasets, we believe that for common bacterial species, where many complete genome sequences from related strains are available, the current high-throughput short-read sequencing paradigm is sufficient to obtain a single high-quality scaffold for each chromosome. The Ragout software is freely available at: https://github.com/fenderglass/Ragout.

Keyword: Sequence Analysis


PP03 (HT) Linking hypothetical patterns to disease molecular signatures in Alzheimer's disease
Date: Sunday, July 13, 11:00 am - 11:25 am
Room: 311

Presenting author: Ashutosh Malhotra, Fraunhofer Institute for Algorithms and Scientific Computing, Germany

Additional authors:
Martin Hofmann-Apitius, Fraunhofer Institute for Algorithms and Scientific Computing, Germany
Erfan Younesi, Fraunhofer Institute for Algorithms and Scientific Computing, Germany

Session Chair: Terry Gaasterland

Presentation Overview:

Automated information extraction and knowledge acquisition technologies (“text mining”) share the potential to reduce manual reading and human curation efforts for the construction of knowledge bases. Particularly in reference to complex, mostly idiopathic diseases like Alzheimer’s disease (AD), automatic recognition of stage-specific speculative statements communicating experimental findings can provide new insights into the directions of disease etiology and progression. However, a systematic gathering of all scientific speculation that exists in a given context is a non-trivial task and, if done manually, is laborious and time-consuming. This work presents a methodology that demonstrates how using a dictionary of speculative patterns (HypothesisFinder approach) in combination with a specifically designed Alzheimer's disease ontology (ADO) enables the collection, interpretation, curation and discovery of a broad spectrum of knowledge needed for efficient and systematic AD research.

Keyword: Text Mining, Databases & Ontologies


PP04 (PT) AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references
Date: Sunday, July 13, 11:00 am - 11:25 am
Room: 304

Presenting author: Ergude Bao, University of California, Riverside, United States

Additional authors:
Tao Jiang, University of California, Riverside, United States
Thomas Girke, University of California, Riverside, United States

Session Chair: Serafim Batzoglou

Presentation Overview:

De novo assemblies of genomes remain one of the most challenging applications in next generation sequencing. Usually, their results are incomplete and fragmented into hundreds of contigs. Repeats in genomes and sequencing errors are the main reasons for these complications. With the rapidly growing number of sequenced genomes, it is now feasible to improve assemblies by guiding them with genomes from related species. Here we introduce AlignGraph, an algorithm for extending and joining de novo assembled contigs or scaffolds guided by closely related reference genomes. It aligns paired-end (PE) reads and pre-assembled contigs or scaffolds to a close reference. From the obtained alignments, it builds a novel data structure, called the paired-end multi-positional de Bruijn graph. The incorporated positional information from the alignments and PE reads allows us to extend the initial assemblies, while avoiding incorrect extensions and early terminations. In our performance tests, AlignGraph was able to substantially improve the contigs and scaffolds from several assemblers. For instance, 28.7-62.3% of the contigs of Arabidopsis thaliana and human could be extended, resulting in improvements of common assembly metrics, such as an increase of the N50 of the extendable contigs by 89.9-94.5% and 80.3-165.8%, respectively. In another test, AlignGraph was able to improve the assembly of a published genome (Arabidopsis strain Landsberg) by increasing the N50 of its extendable scaffolds by 86.6%. These results demonstrate AlignGraph’s efficiency in improving genome assemblies by taking advantage of closely related references.
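The N50 metric cited in the overview above can be illustrated with a short sketch. This is not taken from AlignGraph; it assumes the common definition of N50 (the largest contig length L such that contigs of length at least L cover at least half of the total assembly length), and the function name `n50` is chosen here for illustration only.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: sort contigs from longest to shortest and return
    the length of the contig at which the running total first reaches
    half of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy assembly: total length 32, so half is 16; the two 8-long contigs reach it.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
```

Under this definition, joining or extending contigs can only keep N50 the same or raise it, which is why it is a natural metric for reporting the effect of contig extension.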

Keyword: Sequence Analysis


PP05 (PT) Cross-study validation for assessment of prediction models and algorithms
Date: Sunday, July 13, 11:00 am - 11:25 am
Room: 302

Presenting author: Christoph Bernau, Leibniz Supercomputing Center, Germany

Additional authors:
Markus Riester, Harvard School of Public Health, United States
Anne-Laure Boulesteix, LMU Munich, Germany
Giovanni Parmigiani, Dana-Farber Cancer Institute, United States
Curtis Huttenhower, Harvard School of Public Health, United States
Levi Waldron, City University of New York School of Public Health, United States
Lorenzo Trippa, Dana-Farber Cancer Institute, United States

Session Chair: Bonnie Berger

Presentation Overview:

Motivation: Numerous competing algorithms for prediction modeling in high-dimensional settings have been developed in the statistical and machine learning literature. Learning algorithms and the prediction models they generate are typically evaluated on the basis of cross-validation error estimates in a few exemplary datasets. However, in most applications, the ultimate goal of prediction modeling is to provide accurate predictions for independent samples processed in different laboratories, and cross-validation within exemplary datasets may not adequately reflect performance in this context. Methods: Systematic cross-study validation is performed in simulations and in a collection of eight estrogen-receptor positive breast cancer microarray gene expression datasets, with the objective of predicting distant metastasis-free survival (DMFS). An evaluation statistic, in this paper the C-index, is computed for all pairwise combinations of training and validation datasets. We evaluate several alternatives for summarizing the pairwise validation statistics, and compare these to conventional cross-validation. Results: We develop a systematic approach to “cross-study validation” to replace or supplement conventional cross-validation for evaluation of high-dimensional prediction models when independent datasets are available. In data-driven simulations and in our application to survival prediction with eight breast cancer microarray datasets, standard cross-validation suggests inflated discrimination accuracy for all competing algorithms when compared to cross-study validation. Furthermore, the ranking of learning algorithms differs, suggesting that algorithms performing best in cross-validation may be suboptimal when evaluated through independent validation. Availability: The survHD: Survival in High Dimensions package (http://www.bitbucket.org/lwaldron/survhd) will be made available through Bioconductor.
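The pairwise train/validate scheme described in the overview above can be sketched as follows. This is an illustrative outline only, not the survHD implementation: the function names, the dataset layout (dicts with `"x"`, `"time"` and `"event"` keys), the `fit`/`predict` callables, and the simplified C-index (tied survival times are skipped) are all assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Simplified C-index: fraction of comparable pairs ranked correctly.

    A pair is comparable when the subject with the shorter time had an
    event (event flag truthy). Pairs with tied times are skipped here.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored: pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def cross_study_matrix(datasets, fit, predict):
    """Train on dataset i, validate on dataset j, for every i != j.

    Returns a square matrix whose off-diagonal entry [i][j] is the C-index
    of the model trained on study i and evaluated on study j. Diagonal
    entries are left as None (within-study cross-validation is not done here).
    """
    k = len(datasets)
    matrix = [[None] * k for _ in range(k)]
    for i, train in enumerate(datasets):
        model = fit(train)
        for j, test in enumerate(datasets):
            if i == j:
                continue
            risks = predict(model, test)
            matrix[i][j] = concordance_index(test["time"], test["event"], risks)
    return matrix
```

Any learning algorithm can be plugged in through `fit` and `predict`; the off-diagonal entries can then be summarized (for example by row or overall average) and compared against within-study cross-validation estimates.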

Keyword: Gene Regulation and Transcriptomics


PP06 (HT) Constructing module maps for integrated analysis of heterogeneous biological networks
Date: Sunday, July 13, 11:30 am - 11:55 am
Room: 311

Presenting author: David Amar, Tel Aviv University, Israel

Session Chair: Terry Gaasterland

Presentation Overview:

We developed a method that takes as input two types of gene interactions and constructs a summary module map, which integrates the two sources. The presentation will start with a thorough introduction to the concept of module maps as tools for summarizing heterogeneous interaction networks. Then we will discuss extant and novel methods to construct such maps. We shall show that our novel algorithm considerably improves over prior art on simulated and real data. We shall demonstrate the method in analyses of data from three distinct domains: (1) yeast protein-protein interactions and negative genetic interactions, (2) protein-protein interactions and DNA damage-specific positive genetic interactions in yeast, and (3) gene expression profiles of lung cancer patients. Each analysis provides confirmatory and novel insights, and demonstrates the power of module maps for deeper analysis of heterogeneous high throughput data.

Keyword: Protein Interactions & Molecular Networks, Gene Regulation & Transcriptomics


PP07 (PT) ExSPAnder: a Universal Repeat Resolver for DNA Fragment Assembly
Date: Sunday, July 13, 11:30 am - 11:55 am
Room: 304

Presenting author: Andrey D. Prjibelski, St. Petersburg Academic University, Russia

Additional authors:
Irina Vasilinetc, St. Petersburg Academic University, Russia
Anton Bankevich, St. Petersburg Academic University, Russia
Alexey Gurevich, St. Petersburg Academic University, Russia
Tatiana Krivosheeva, St. Petersburg Academic University, Russia
Sergey Nurk, St. Petersburg Academic University, Russia
Son Pham, University of California, San Diego, United States
Anton Korobeynikov, St. Petersburg Academic University, Russia
Alla Lapidus, St. Petersburg Academic University, Russia
Pavel Pevzner, University of California, San Diego, United States

Session Chair: Serafim Batzoglou

Presentation Overview:

Next-generation sequencing (NGS) technologies have raised a challenging de novo genome assembly problem that is further amplified in recently emerged single-cell sequencing projects. While various NGS assemblers can utilize information from several libraries of read-pairs, most of them were originally developed for a single library and do not fully benefit from multiple libraries. Moreover, most assemblers assume uniform read coverage, a condition that does not hold for single-cell projects where utilization of read-pairs is even more challenging. We have developed the exSPAnder algorithm that accurately resolves repeats in the case of both single and multiple libraries of read-pairs in both standard and single-cell assembly projects.

Keyword: Sequence Analysis


PP08 (PT) Large scale analysis of signal reachability
Date: Sunday, July 13, 12:00 pm - 12:25 pm
Room: 311

Presenting author: Andrei Todor, University of Florida, United States

Additional authors:
Haitham Gabr, University of Florida, United States
Alin Dobra, University of Florida, United States
Tamer Kahveci, University of Florida, United States

Session Chair: Terry Gaasterland

Presentation Overview:

Motivation: Major disorders, such as leukemia, have been shown to alter the transcription of genes. Understanding how gene regulation is affected by such aberrations is of utmost importance. One promising strategy towards this objective is to compute whether signals can reach the transcription factors through the transcription regulatory network. Due to the uncertainty of the regulatory interactions, this is a #P-complete problem and thus solving it for very large transcription regulatory networks remains a challenge. Results: We develop a novel and scalable method to compute the probability that a signal originating at any given set of source genes can arrive at any given set of target genes (i.e., transcription factors) when the topology of the underlying signaling network is uncertain. Our method tackles this problem for large networks while providing a provably accurate result. Our method follows a divide-and-conquer strategy. We break down the given network into a sequence of non-overlapping subnetworks such that reachability can be computed autonomously and sequentially on each subnetwork. We represent each interaction using a small polynomial. The product of these polynomials expresses different scenarios when a signal can or cannot reach the target genes from the source genes. We introduce polynomial collapsing operators for each subnetwork. These operators reduce the size of the resulting polynomial and thus the computational complexity dramatically. We show that our method scales to entire human regulatory networks in only seconds, while the existing methods fail beyond a few tens of genes and interactions. We demonstrate that our method can successfully characterize key reachability characteristics of the entire transcription regulatory networks of patients affected by eight different subtypes of leukemia, as well as those from healthy control samples.
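To make the problem statement above concrete, the sketch below computes a reachability probability by brute force on a tiny uncertain network. It is explicitly not the polynomial-collapsing method of the talk: it enumerates all 2^m edge subsets, which is exactly the exponential cost a scalable method must avoid. The edge format `(source, target, probability)`, the independence of edges, and the reading of "reach" as "reach at least one target gene" are assumptions made for this example.

```python
from itertools import product


def reachability_probability(edges, sources, targets):
    """Probability that some target is reachable from some source.

    `edges` is a list of (u, v, p) tuples: a directed interaction u -> v
    that is present independently with probability p (an assumption).
    Naive baseline: enumerates every possible network topology.
    """
    sources, targets = set(sources), set(targets)
    total = 0.0
    for present in product([False, True], repeat=len(edges)):
        weight = 1.0
        adjacency = {}
        for (u, v, p), is_present in zip(edges, present):
            weight *= p if is_present else 1.0 - p
            if is_present:
                adjacency.setdefault(u, []).append(v)
        if weight == 0.0:
            continue  # this topology cannot occur
        # Depth-first search from all sources in this fixed topology.
        stack = list(sources)
        seen = set(sources)
        reached = False
        while stack:
            node = stack.pop()
            if node in targets:
                reached = True
                break
            for nxt in adjacency.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if reached:
            total += weight
    return total


# Two-step chain a -> b -> c, each interaction present with probability 0.5.
chain = [("a", "b", 0.5), ("b", "c", 0.5)]
print(reachability_probability(chain, ["a"], ["c"]))  # prints 0.25
```

Because the loop visits 2^m topologies for m uncertain interactions, this baseline becomes infeasible beyond a few tens of interactions, which matches the limitation of existing methods noted in the overview.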

Keyword: Gene Regulation and Transcriptomics


PP09 (HT) Complete Genome Assembly with Long Reads
Date: Sunday, July 13, 12:00 pm - 12:25 pm
Room: 304

Presenting author: Adam Phillippy, National Biodefense Analysis and Countermeasures Center, United States

Additional authors:
Sergey Koren, National Biodefense Analysis and Countermeasures Center, United States
Gregory Harhay, U.S. Department of Agriculture, United States
Timothy Smith, U.S. Department of Agriculture, United States
James Bono, U.S. Department of Agriculture, United States
Dayna Harhay, U.S. Department of Agriculture, United States
Scott McVey, U.S. Department of Agriculture, United States
Diana Radune, National Biodefense Analysis and Countermeasures Center, United States
Nicholas Bergman, National Biodefense Analysis and Countermeasures Center, United States

Session Chair: Serafim Batzoglou

Presentation Overview:

The short reads generated by first- and second-generation sequencing often produce highly fragmented assemblies, even for small genomes. Single-molecule sequencing addresses this problem by greatly increasing read length, which simplifies assembly. By analyzing the repeat complexity of 2,267 complete microbial genomes, we have shown that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio sequencing library. This reduces the cost of microbial finishing by an order of magnitude. More recently, we assembled the eukaryotic genomes of Saccharomyces cerevisiae, Arabidopsis thaliana, and Drosophila melanogaster using only PacBio reads. In the case of D. melanogaster, our PacBio Corrected Reads (PBcR) algorithm assembled the genome more completely than the current reference, which involved over a decade of manual finishing. I will present both these past and present results, as well as a new approach for scaling long-read assembly to gigabase-sized genomes.

Keyword: Sequence Analysis, Applied Bioinformatics


PP10 (PT) GRASP: Analysis of genotype-phenotype results from 1,390 genome-wide association studies and corresponding open access database
Date: Sunday, July 13, 3:05 pm - 3:30 pm
Room: 311

Presenting author: Andrew Johnson, National Institutes of Health, United States

Additional authors:
Christopher O'Donnell, National Institutes of Health, United States
Richard Leslie, University of Massachusetts Medical School, United States

Session Chair: Fran Lewitter

Presentation Overview:

We created a deeply extracted and annotated database of GWAS results. GRASP v1.0 contains >6.2 million SNP-phenotype associations from 1,390 GWAS. We re-annotated GWAS results with 16 annotation sources, including some rarely compared to GWAS results (e.g., RNA editing sites, lincRNAs, PTMs). RESULTS. GWAS have grown exponentially, with increases in sample sizes and markers tested, and continuing bias toward European ancestry samples. GRASP contains >100,000 phenotypes, roughly: eQTLs (71.5%), metabolite QTLs (21.2%), methylation QTLs (4.4%), and diseases, biomarkers and other traits (2.8%). cis-eQTLs, meQTLs, mQTLs and MHC region SNPs are highly enriched among significant results. After removing these categories, GRASP still contains a greater proportion of studies and results than comparable GWAS catalogs. Cardiovascular disease and related risk factors predominate among the remaining GWAS results, followed by immunological, neurological and cancer traits. Significant results in GWAS display a highly gene-centric tendency. Sex chromosome X (OR=0.18[0.16-0.20]) and Y (OR=0.003[0.001-0.01]) genes are depleted for GWAS results. Gene length is correlated with GWAS results at nominal significance (P<0.05) levels. We show this gene length correlation decays at increasingly stringent P-value thresholds. Potential pleiotropic genes and SNPs enriched for multi-phenotype association in GWAS are identified. However, we note possible population stratification at some of these loci. Finally, via re-annotation we identify compelling functional hypotheses at GWAS loci, in some cases unrealized in studies to date. CONCLUSION. Pooling summary-level GWAS results and re-annotating with bioinformatics predictions and molecular features provides a good platform for new insights. The GRASP database is available at http://apps.nhlbi.nih.gov/grasp.

Keyword: Population Genomics


PP11 (HT) Capturing short tandem repeat variation from paired-end sequencing data
Date: Sunday, July 13, 3:05 pm - 3:30 pm
Room: 304

Presenting author: Mikael Boden, The University of Queensland, Australia

Additional authors:
Minh Cao, The University of Queensland, Australia
Edward Tasker, Monash University, Australia
Sailaja Vishwanathan, Monash University, Australia
Sridevi Sureshkumar, Monash University, Australia
Sureshkumar Balasubramanian, Monash University, Australia
Kai Willadsen, The University of Queensland, Australia
Michael Imelfort, The University of Queensland, Australia

Session Chair: Cenk Sahinalp

Presentation Overview:

Expansion of tri-nucleotide repeats is known to cause over twenty neurological diseases. While next-generation sequencing technologies offer unprecedented opportunities to assess variation in genomes, they have limitations in regard to repeat regions. We review options scientists have to estimate length variation of short tandem repeats of biological significance, and to investigate what causes their instability. We present a Bayesian method to statistically detect variation from paired-end sequence data to (for the first time) analyse repeat tracts of sizes beyond the read length of current technology. Using strains of A. thaliana, we experimentally validate estimates, and recover the only known unstable repeat locus IIL1. Extensive quantitative comparisons of alternative analysis pipelines provide guidance to the likely outcome in terms of repeat variant calling accuracy.

Keyword: Sequence Analysis, Population Genomics


PP12 (HT) How antibodies chase antigens, how antigens try to escape and how we can use this to predict antibody specificity
Date: Sunday, July 13, 3:05 pm - 3:30 pm
Room: 302

Presenting author: Inbal Sela-Culang, Bar-Ilan University, Israel

Additional authors:
Yanay Ofran, Bar-Ilan University, Israel
Vered Kunik, Bar-Ilan University, Israel
Anat Burkovitz, Bar-Ilan University, Israel
Guy Nimrod, Bar-Ilan University, Israel
Mohammed Rafii-El-Idrissi Benhnia, La Jolla Institute for Allergy and Immunology, United States
Michael H. Matho, La Jolla Institute for Allergy and Immunology, United States
Thomas Kaever, La Jolla Institute for Allergy and Immunology, United States
Matt Maybeno, La Jolla Institute for Allergy and Immunology, United States
Andrew Schlossman, La Jolla Institute for Allergy and Immunology, United States
Dirk Zajonc, La Jolla Institute for Allergy and Immunology, United States
Shane Crotty, La Jolla Institute for Allergy and Immunology, United States
Bjoern Peters, La Jolla Institute for Allergy and Immunology, United States
Sheng Li, University of Texas Health Science Center, United States
Yan Xiang, University of Texas Health Science Center, United States

Session Chair: Toni Kazic

Presentation Overview:

Antibodies (Abs) must bind indistinct patches on proteins that attempt to escape recognition. They must be able to recognize virtually any surface while strictly maintaining their own fold. Little is known about the mechanisms that allow Abs to do this. Thus, while most drugs that are in clinical development are Abs, there is currently no simple way to determine experimentally or computationally what exactly they bind.
We will review a series of studies that revealed key mechanisms that enable Abs to perform these tasks. We will present a novel prediction approach that utilizes these findings, combined with simple competition assays, to predict where on an antigen (Ag) a given Ab will bind. The accuracy of these predictions is verified experimentally using crystallography and other methods. To conclude, we will present more examples and discuss the power of combining sophisticated predictions with simple experiments.

Keyword: Protein Structure & Function, Protein Interactions & Molecular Networks


PP13 (HT) A comparative meta-analysis of prognostic gene signatures for late-stage ovarian cancer
Date: Sunday, July 13, 3:35 pm - 4:00 pm
Room: 311

Presenting author: Levi Waldron, City University of New York, United States

Additional authors:
Benjamin Haibe-Kains, Princess Margaret Cancer Centre, Canada
Aedin Culhane, Dana-Farber Cancer Institute, United States
Markus Riester, Dana-Farber Cancer Institute, United States
Thomas Risch, Dana-Farber Cancer Institute, United States
Svitlana Tyekucheva, Dana-Farber Cancer Institute, United States
Ina Jazic, Dana-Farber Cancer Institute, United States
Xin Victoria Wang, Dana-Farber Cancer Institute, United States
Mahnaz Ahmadifar, Dana-Farber Cancer Institute, United States
Benjamin Frederick Ganzfried, Dana-Farber Cancer Institute, United States
Giovanni Parmigiani, Dana-Farber Cancer Institute, United States
Curtis Huttenhower, Harvard School of Public Health, United States
Michael Birrer, Massachusetts General Hospital, United States
Christoph Bernau, LMU Munich, Germany

Session Chair: Fran Lewitter

Presentation Overview:

The growth of genomic technologies has generated a bioinformatics cottage industry creating gene signatures of disease and disease outcomes. Specialized tools for regression and machine learning now make it relatively easy to tune and train high-dimensional models for the prediction of patient outcome from genomic data, but most such published models remain orphaned in the literature, without follow-up validation or clinical application. We therefore undertook a systematic evaluation of 14 published gene expression-based outcome prediction models for late-stage, high-grade, serous ovarian cancer, in a curated database of 1,251 patients from 10 microarray datasets. This work assesses: 1) the accuracy of published predictive models when applied to independent datasets, 2) which modeling approaches have been most and least effective, and 3) the influence of popular validation datasets on the literature. This talk argues for changes in what constitutes “validation” of prediction models generated from genomic data.

Keyword: Applied Bioinformatics, Disease Models & Epidemiology


PP14 (PT) BlockClust: efficient clustering and classification of non-coding RNAs from short read profiles
Date: Sunday, July 13, 3:35 pm - 4:00 pm
Room: 304

Presenting author: Fabrizio Costa, University of Freiburg, Germany

Additional authors:
Dominic Rose, University of Freiburg, Germany
Rolf Backofen, University of Freiburg, Germany
Pavankumar Videm, University of Freiburg, Germany

Session Chair: Cenk Sahinalp

Presentation Overview:

Non-coding RNAs (ncRNAs) play a vital role in many cellular processes such as RNA splicing, translation, and gene regulation. However, the vast majority of ncRNAs still have no functional annotation. One prominent approach for putative function assignment is clustering of transcripts according to sequence and secondary structure. However, sequence information is changed by post-transcriptional modifications, and secondary structure is only a proxy for the true three-dimensional conformation of the RNA polymer. A different type of information that does not suffer from these issues and that can be used for the detection of RNA classes is the pattern of processing and its traces in small RNA-seq read data. Here we introduce BlockClust, an efficient approach to detect transcripts with similar processing patterns. We propose a novel way to encode expression profiles in compact discrete structures, which can then be processed using fast graph kernel techniques. We perform unsupervised clustering and also develop family-specific discriminative models; finally, we show that the proposed approach is scalable, accurate and robust across different organisms, tissues and cell lines.

Keyword: RNA Bioinformatics


PP15 (PT) Tertiary structure-based prediction for conformational B-cell epitopes through B factors
Date: Sunday, July 13, 3:35 pm - 4:00 pm
Room: 302

Presenting author: Jing Ren, University of Technology Sydney, Australia

Additional authors:
Qian Liu, University of Technology Sydney, Australia
John Ellis, University of Technology Sydney, Australia
Jinyan Li, University of Technology Sydney, Australia

Session Chair: Toni Kazic

Presentation Overview:

Motivation: A B-cell epitope is a small area on the surface of an antigen that binds to an antibody. Accurately locating epitopes is of critical importance for vaccine development. Compared with wet-lab methods, computational methods have strong potential for efficient and large-scale epitope prediction for antigen candidates at much lower cost. However, it is still not clear which features are good determinants for accurate epitope prediction, leading to the unsatisfactory performance of existing prediction methods. Method and results: We propose a much more accurate B-cell epitope prediction method. Our method uses a new feature, the B factor (obtained from X-ray crystallography), combined with other basic physicochemical, statistical, evolutionary and structural features of each residue. These basic features are extended by a sequence window and a structure window. All these features are then learned by a two-stage random forest model to identify clusters of antigenic residues and to remove isolated outliers. Tested on a dataset of 55 epitopes from 45 tertiary structures, we show that our method significantly outperforms all three existing structure-based epitope predictors. Following comprehensive analysis, it is found that features such as B factor, relative accessible surface area and protrusion index play an important role in characterizing B-cell epitopes. Our detailed case studies on an HIV antigen and an influenza antigen confirm that our second-stage learning is effective for clustering true antigenic residues and for eliminating self-made prediction errors introduced by the first-stage learning.

Keyword: Protein Structure and Function


PP16 (HT) Dissecting Cancer Heterogeneity with a network-based approach
Date: Sunday, July 13, 4:05 pm - 4:30 pm
Room: 311

Presenting author: Teresa Przytycka, National Institutes of Health, United States

Additional authors:
DongYeon Cho, NIH, United States

Session Chair: Fran Lewitter

Presentation Overview:

One of the major obstacles in developing cancer treatment is cancer heterogeneity. Heterogeneity of genetic and epigenetic alterations leads to heterogeneity in gene expression, making the discovery of genetic drivers and key genes dysregulated by their aberration very challenging.
Pathway-centric approaches have emerged as methods that can empower studies of cancer heterogeneity. I will describe two approaches we have recently developed. First, combining the utility of algorithmic techniques with the power of network-centric approaches, we designed a novel approach that allows unsupervised detection of subnetworks that are dysregulated in a subgroup of patients. The second, complementary approach builds on topic modeling and utilizes a mixture model. Our model is based on two components: (i) a measure of phenotypic similarity between the patients and (ii) a list of features, that is, possible disease causes such as mutations and copy number variations. This work complements the appreciation of cancer diversity with the ability to represent it.

Keyword: Protein Interactions & Molecular Networks, Disease Models & Epidemiology


PP17 (HT) WISECONDOR detects small fetal chromosomal aberrations in low-coverage NGS data of maternal plasma.
Date: Sunday, July 13, 4:05 pm - 4:30 pm
Room: 304

Presenting author: Roy Straver, VU University Medical Center Amsterdam, Netherlands

Additional authors:
Erik Sistermans, VU University Medical Center Amsterdam, Netherlands
Henne Holstege, VU University Medical Center Amsterdam, Netherlands
Daphne van Beek, VU University Medical Center Amsterdam, Netherlands
Allerdien Visser, VU University Medical Center Amsterdam, Netherlands
Cees Oudejans, VU University Medical Center Amsterdam, Netherlands
Marcel Reinders, Delft University of Technology, Netherlands

Session Chair: Cenk Sahinalp

Presentation Overview:

The presentation gives a background of non-invasive prenatal testing with NGS and addresses the necessity of using low-coverage data, as well as the resulting disadvantages for downstream analysis. We highlight our algorithmic contribution to the detection of fetal copy number aberrations, which is based on a within-sample read depth comparison. We also address some of the underlying basic assumptions incorporated in our approach. Then we show that our method, called WISECONDOR, reliably detects small copy number changes in a cohort of more than 200 pregnancies. We take a few exemplary cases that highlight the potential of WISECONDOR and show how clinical geneticists can use this tool. Finally, we discuss how the tool is currently being implemented throughout hospitals in the Netherlands.

Keyword: Applied Bioinformatics, Sequence Analysis


PP18 (PT) An Efficient Parallel Algorithm for Accelerating Computational Protein Design
Date: Sunday, July 13, 4:05 pm - 4:30 pm
Room: 302

Presenting author: Yichao Zhou, Tsinghua University, China

Additional authors:
Wei Xu, Tsinghua University, China
Bruce R. Donald, Duke University, United States
Jianyang Zeng, Tsinghua University, China

Session Chair: Toni Kazic

Presentation Overview:

Motivation: Structure-based computational protein design is an important topic in protein engineering. Under the assumption of a rigid backbone and a finite set of discrete conformations for side-chains, various methods have been proposed to address this problem. A popular method is to combine the Dead-End Elimination (DEE) and A* tree search algorithms, which provably finds the Global Minimum Energy Conformation (GMEC) solution. Results: In this paper, we improve the efficiency of computing A* heuristic functions for protein design and also propose a variant of the A* algorithm in which the search process can be performed on GPUs in a massively parallel fashion. In addition, we address the problem of A* search exceeding the available memory. As a result, our enhancements achieve a significant speedup of four orders of magnitude over the original A* search for protein design on large-scale test data, while still maintaining an acceptable memory overhead. Our parallel A* search algorithm can be combined with iMinDEE, a recent DEE criterion for rotamer pruning, to further improve structure-based computational protein design with the consideration of side-chain flexibility.
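
For readers unfamiliar with A* search over rotamer assignments, the following is a minimal serial sketch, not the GPU-parallel variant described in the abstract. It assumes a total energy made of per-residue self terms and pairwise terms, and uses a standard optimistic lower bound as the heuristic. The energy values and the names `assigned_energy`, `lower_bound`, and `astar_gmec` are hypothetical.

```python
import heapq

def assigned_energy(assign, self_e, pair_e):
    """Energy of the residues assigned so far (self terms plus pairwise terms)."""
    d = len(assign)
    e = sum(self_e[i][assign[i]] for i in range(d))
    e += sum(pair_e[(i, j)][assign[i]][assign[j]]
             for i in range(d) for j in range(i + 1, d))
    return e

def lower_bound(assign, self_e, pair_e):
    """f = g + h: exact energy of the partial assignment plus an optimistic
    estimate for every residue that has no rotamer yet."""
    n, d = len(self_e), len(assign)
    h = 0.0
    for j in range(d, n):
        h += min(
            self_e[j][r]
            + sum(pair_e[(i, j)][assign[i]][r] for i in range(d))
            + sum(min(pair_e[(j, k)][r]) for k in range(j + 1, n))
            for r in range(len(self_e[j]))
        )
    return assigned_energy(assign, self_e, pair_e) + h

def astar_gmec(self_e, pair_e):
    """Best-first search; the first complete assignment popped is optimal
    because the bound never overestimates the remaining energy."""
    n = len(self_e)
    heap = [(lower_bound((), self_e, pair_e), ())]
    while heap:
        f, assign = heapq.heappop(heap)
        if len(assign) == n:
            return assign, f
        for r in range(len(self_e[len(assign)])):
            child = assign + (r,)
            heapq.heappush(heap, (lower_bound(child, self_e, pair_e), child))
    return None

# Hypothetical toy problem: 3 residues, 2 rotamers each.
self_e = [[0.0, 1.0], [2.0, 0.5], [1.0, 1.5]]
pair_e = {
    (0, 1): [[1.0, 3.0], [0.0, 2.0]],   # pair_e[(i, j)][rot_i][rot_j]
    (0, 2): [[0.5, 0.0], [2.0, 1.0]],
    (1, 2): [[0.0, 1.0], [1.5, 0.0]],
}
best, energy = astar_gmec(self_e, pair_e)
```

In practice DEE pruning would first shrink the rotamer lists passed to the search; that step is omitted here.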

Keyword: Protein Structure and Function


PP19 (PT) Inductive Matrix Completion for Predicting Gene-Disease Associations
Date: Sunday, July 13, 4:35 pm - 5:00 pm
Room: 311

Presenting author: Nagarajan Natarajan, University of Texas at Austin, United States

Additional authors:
Inderjit Dhillon, University of Texas at Austin, United States

Session Chair: Fran Lewitter

Presentation Overview:

Motivation: Most existing methods for predicting causal disease genes rely on a specific type of evidence, and are therefore limited in terms of applicability. More often than not, the type of evidence available for diseases varies: for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms by observing patients. Similarly, the type of evidence available for genes varies: for example, specific microarray probes convey information only for certain sets of genes. In this paper, we apply a novel matrix completion method recently developed by Jain (2013) to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive; it can be applied to diseases not seen at training time, unlike traditional matrix completion approaches and network-based inference methods that are transductive. Results: Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has close to a one-in-four chance of recovering a true association in the top 100 predictions, compared to the recently proposed method (second best) that has less than a 15% chance. We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations, and for predicting novel genes, i.e., genes that were not previously linked to diseases. Thus the method is capable of predicting novel genes even for well-characterized diseases. We also validate the novelty of predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature curated by Bornigen. Availability: Source code and datasets at http://www.cs.utexas.edu/~naga86/research/IMC
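
The inductive property can be made concrete with a small sketch. In inductive matrix completion as generally formulated, the score for gene i and disease j is x_i^T W H^T y_j, where x_i and y_j are feature vectors and W, H are learned low-rank factors. The code below only shows the scoring step; the factor values, feature vectors, and the names `project`, `imc_score`, and `rank_genes` are made up for illustration, and training of W and H is omitted.

```python
def project(features, factor):
    """Map a feature vector (length f) into latent space using a factor (f x k)."""
    k = len(factor[0])
    return [sum(features[a] * factor[a][c] for a in range(len(features)))
            for c in range(k)]

def imc_score(gene_features, disease_features, W, H):
    """Score x^T W H^T y as the inner product of the two latent vectors."""
    g = project(gene_features, W)
    d = project(disease_features, H)
    return sum(a * b for a, b in zip(g, d))

def rank_genes(gene_table, disease_features, W, H):
    """Rank all genes for one disease, best score first."""
    scores = {name: imc_score(x, disease_features, W, H)
              for name, x in gene_table.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical learned factors: 3 gene features and 2 disease features, rank 2.
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
H = [[1.0, 0.0], [0.0, 2.0]]

gene_table = {"g1": [1, 0, 2], "g2": [0, 1, 0], "g3": [0, 0, 0]}

# A disease never seen in training still gets scores, because only its
# feature vector is needed, not a row in the training matrix.
new_disease = [0.5, 1]
ranking = rank_genes(gene_table, new_disease, W, H)
```

A transductive method would have no latent representation for `new_disease`; here the features alone supply one.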

Keyword: Disease Models and Epidemiology


PP20 (PT) Probabilistic Method for Detecting Copy Number Variation in a Fetal Genome using Maternal Plasma Sequencing
Date: Sunday, July 13, 4:35 pm - 5:00 pm
Room: 304

Presenting author: Ladislav Rampášek, University of Toronto, Canada

Additional authors:
Aryan Arbabi, University of Toronto, Canada
Michael Brudno, University of Toronto, Canada

Session Chair: Cenk Sahinalp

Presentation Overview:

Motivation: The past several years have seen the development of methodologies to identify genomic variation within a fetus through the non-invasive sequencing of maternal blood plasma. These methods are based on the observation that maternal plasma contains a fraction of DNA (typically 5-15%) originating from the fetus, and such methodologies have already been used for the detection of whole-chromosome events (aneuploidies), and to a more limited extent for smaller (typically several megabases long) Copy Number Variants (CNVs). Results: Here we present a probabilistic method for non-invasive analysis of de novo CNVs in the fetal genome based on maternal plasma sequencing. Our novel method combines three types of information within a unified Hidden Markov Model: the imbalance of allelic ratios at SNP positions, the use of parental genotypes to phase nearby SNPs, and depth of coverage to better differentiate between various types of CNVs and improve precision. Our simulation results, based on in silico introduction of novel CNVs into plasma samples with 13% fetal DNA concentration, demonstrate a sensitivity of 90% for CNVs >400 kilobases (with 13 calls in an unaffected genome), and 40% for 50-400kb CNVs (with 108 calls in an unaffected genome). Availability: Implementation of our model and data simulation method is available at http://github.com/compbio-UofT/fCNV

Keyword: Population Genomics


PP21 (HT) BioC: a minimalist approach to interoperability for biomedical text processing
Date: Sunday, July 13, 4:35 pm - 5:00 pm
Room: 302

Presenting author: Rezarta Islamaj Doğan, U.S. National Library of Medicine, United States

Additional authors:
Paolo Ciccarese, Harvard University, United States
Kevin Bretonnel Cohen, University of Colorado, United States
Martin Krallinger, Spanish National Cancer Research Centre, Spain
Florian Leitner, Spanish National Cancer Research Centre, Spain
Zhiyong Lu, U.S. National Library of Medicine, United States
Yifan Peng, University of Delaware, United States
Fabio Rinaldi, University of Zurich, Switzerland
Manabu Torii, University of Delaware, United States
Alfonso Valencia, Spanish National Cancer Research Centre, Spain
Karin Verspoor, The University of Melbourne, Australia
Thomas C. Wiegers, North Carolina State University, United States
Cathy H. Wu, University of Delaware, United States
W. John Wilbur, U.S. National Library of Medicine, United States
Donald C. Comeau, U.S. National Library of Medicine, United States

Session Chair: Toni Kazic

Presentation Overview:

After a brief motivation, there will be an overview of the BioC format and supporting input/output libraries. Then there will be a summary of the BioC implementations, tools, services, and corpora currently available. Implementations of BioC to hold this data, read it from and write it back to XML files are available in C++, Go, Java, Python, and Ruby. Online services using the format are available for semantic role labeling, sentence simplification, and entity labeling. Text preprocessing pipelines for sentence segmentation, tokenization, part-of-speech tagging, lemmatization, and parsing are available in C++ and Java. Named entity recognizers are available for diseases, genes, chemicals, species and mutations. Annotated corpora are available for abbreviation definition detection, disease mentions, protein-protein interaction events and metabolites. The examples will focus on tools and applications that demonstrate the features and flexibility of BioC.

Keyword: Text Mining, Databases & Ontologies


PP22 (HT) Country-specific antibiotic use practices impact the human gut resistome
Date: Monday, July 14, 10:30 am - 10:55 am
Room: 311

Presenting author: Kristoffer Forslund, European Molecular Biology Laboratory, Germany

Additional authors:
Shinichi Sunagawa, EMBL, Germany
Jens Roat Kultima, EMBL, Germany
Daniel Mende, EMBL, Germany
Manimozhiyan Arumugam, Copenhagen University, Denmark
Athanasios Typas, EMBL, Germany
Peer Bork, EMBL, Germany

Session Chair: Janet Kelso

Presentation Overview:

Despite increasing concerns over inappropriate use of antibiotics in medicine and food production, population-level resistance transfer into the human gut microbiota has not been demonstrated beyond individual case studies. To determine the "antibiotic resistance potential" for entire microbial communities, we employ metagenomic data and quantify the totality of known resistance genes in each community (its resistome) for 68 classes and subclasses of antibiotics. In 252 fecal metagenomes, we show that the most abundant resistance determinants are those for antibiotics also used in animals, and for antibiotics longer in use. Resistance potential is higher in samples from Spain, Italy and France than from Denmark, the US, or Japan. Differences in country-level data on antibiotic use in both humans and animals, where available, match the observed resistance potential differences. Antibiotic resistance determinants of individuals persist in the human gut flora for at least a year.

Keyword: Disease Models & Epidemiology, Evolution & Comparative Genomics


PP23 (PT) RNA-Skim: a rapid method for RNA-Seq quantification at transcript level
Date: Monday, July 14, 10:30 am - 10:55 am
Room: 304

Presenting author: Zhaojun Zhang, UNC - Chapel Hill, United States

Additional authors:
Wei Wang, University of California, Los Angeles, United States

Session Chair: Bernard Moret

Presentation Overview:

Motivation: The RNA-Seq technique has been demonstrated as a revolutionary means for exploring the transcriptome because it provides deep coverage and base-pair level resolution. RNA-Seq quantification has proven to be an efficient alternative to the microarray technique in gene expression studies, and it is a critical component in RNA-Seq differential expression analysis. Most existing RNA-Seq quantification tools require the alignment of fragments to either a genome or a transcriptome, entailing a time-consuming and intricate alignment step. In order to improve the performance of RNA-Seq quantification, an alignment-free method, Sailfish, has been recently proposed to quantify transcript abundances using all k-mers in the transcriptome, demonstrating the feasibility of designing an efficient alignment-free method for transcriptome quantification. Even though Sailfish is substantially faster than alternative alignment-dependent methods such as Cufflinks, using all k-mers in the transcriptome quantification impedes the scalability of the method. Results: We propose a novel RNA-Seq quantification method, RNA-Skim, which partitions the transcriptome into disjoint transcript clusters based on sequence similarity and introduces the notion of sig-mers, a special type of k-mers uniquely associated with each cluster. We demonstrate that the sig-mer counts within a cluster are sufficient for estimating transcript abundances with accuracy comparable to any state-of-the-art method. This enables RNA-Skim to perform transcript quantification on each cluster independently, reducing a complex optimization problem into smaller optimization tasks that can be run in parallel. As a result, RNA-Skim uses less than 4% of the k-mers and less than 10% of the CPU time required by Sailfish. It is able to finish transcriptome quantification in less than 10 minutes per sample by using just a single thread on a commodity computer, which represents more than 100× speedup over the state-of-the-art alignment-based methods, while delivering comparable or higher accuracy. Availability: The software is available at http://www.csbio.unc.edu/rs
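
The notion of a k-mer "uniquely associated with each cluster" can be sketched as follows. This illustration takes the simplest reading, namely a k-mer that occurs in the transcripts of exactly one cluster, and ignores reverse complements and any further selection or indexing the actual tool may perform. The sequences, cluster names, and the function names `kmers` and `sig_mers` are hypothetical.

```python
from collections import defaultdict

def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def sig_mers(clusters, k):
    """For each cluster, the k-mers that appear in no other cluster."""
    owners = defaultdict(set)
    for name, transcripts in clusters.items():
        for seq in transcripts:
            for km in kmers(seq, k):
                owners[km].add(name)
    result = {name: set() for name in clusters}
    for km, names in owners.items():
        if len(names) == 1:
            result[next(iter(names))].add(km)
    return result

# Hypothetical transcript clusters (one short transcript each).
clusters = {"c1": ["ACGTAC"], "c2": ["GTACGG"]}
signature = sig_mers(clusters, k=4)
```

Here `GTAC` occurs in both clusters and is therefore dropped, which is what lets counts of the remaining k-mers be attributed to a single cluster and each cluster be quantified independently.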

Keyword: RNA Bioinformatics


PP24 (PT) Metabolite Identification through Multiple Kernel Learning on Fragmentation Trees
Date: Monday, July 14, 10:30 am - 10:55 am
Room: 302

Presenting author: Huibin Shen, Aalto University, Finland

Additional authors:
Kai Dührkop, Friedrich-Schiller-University Jena, Germany
Sebastian Böcker, Friedrich-Schiller-University Jena, Germany
Juho Rousu, Aalto University, Finland

Session Chair: Yanay Ofran

Presentation Overview:

Motivation: Metabolite identification from tandem mass spectrometric data is a key task in metabolomics. Various computational methods have been proposed for the identification of metabolites from tandem mass spectra. Fragmentation tree methods explore the space of possible ways the metabolite can fragment, and base the metabolite identification on scoring of these fragmentation trees. Machine learning methods have been used to map mass spectra to molecular fingerprints; predicted fingerprints, in turn, can be used to score candidate molecular structures. Results: Here, we combine fragmentation tree computations with kernel-based machine learning to predict molecular fingerprints and identify molecular structures. We introduce a family of kernels capturing the similarity of fragmentation trees, and combine these kernels using recently proposed multiple kernel learning approaches. Experiments on two large reference datasets show that the new methods significantly improve molecular fingerprint prediction accuracy. These improvements result in better metabolite identification, doubling the number of metabolites ranked at the top position of the candidate list.

Keyword: Mass Spectrometry and Proteomics


PP25 (PT) Methods for time series analysis of RNA-seq data with application to human Th17 cell differentiation
Date: Monday, July 14, 10:30 am - 10:55 am
Room: 312

Presenting author: Tarmo Äijö, Aalto University, Finland

Additional authors:
Vincent Butty, Massachusetts Institute of Technology, United States
Zhi Chen, University of Turku, Finland
Verna Salo, University of Turku, Finland
Subhash Tripathi, University of Turku / Åbo Akademi University, Finland
Christopher Burge, Massachusetts Institute of Technology, United States
Riitta Lahesmaa, University of Turku, Finland
Harri Lähdesmäki, Aalto University, Finland

Session Chair: Robert F. Murphy

Presentation Overview:

Motivation: Gene expression profiling using RNA-seq is a powerful technique for screening RNA species’ landscapes and their dynamics in an unbiased way. While several advanced methods exist for differential expression analysis of RNA-seq data, proper tools to analyze RNA-seq time-course data have not been proposed. Results: In this study, we use RNA-seq to measure gene expression during early human T helper 17 (Th17) cell differentiation and T cell activation (Th0). To quantify Th17-specific gene expression dynamics, we present a novel statistical methodology, DyNB, for analyzing time-course RNA-seq data. We use a non-parametric Gaussian process to model temporal correlation in gene expression and combine that with a negative binomial likelihood for the count data. To account for experiment-specific biases in gene expression dynamics, such as differences in cell differentiation efficiencies, we propose a method to rescale the dynamics between replicated measurements. We develop an MCMC sampling method to infer differential expression dynamics between conditions. DyNB identifies several known and novel genes involved in Th17 differentiation. Analysis of differentiation efficiencies revealed consistent patterns in gene expression dynamics between different cultures. We use qRT-PCR to validate differential expression and differentiation efficiencies for selected genes. Comparison of the results with those obtained via traditional time-point-wise analysis shows that time-course analysis together with time rescaling between cultures identifies differentially expressed genes which would not otherwise be detected. Availability: An implementation of the proposed computational methods will be available at http://research.ics.aalto.fi/csb/software/

Keyword: Gene Regulation and Transcriptomics


PP26 (HT) Relating the metatranscriptome and metagenome of the human gut
Date: Monday, July 14, 11:00 am - 11:25 am
Room: 311

Presenting author: Eric Franzosa, Harvard School of Public Health, United States

Additional authors:
Xochitl Morgan, Harvard School of Public Health, United States
Nicola Segata, Harvard School of Public Health, United States
Levi Waldron, Harvard School of Public Health, United States
Joshua Reyes, Harvard School of Public Health, United States
Ashlee Earl, The Broad Institute, United States
Georgia Giannoukos, The Broad Institute, United States
Dawn Ciulla, The Broad Institute, United States
Dirk Gevers, The Broad Institute, United States
Matthew Boylan, Division of Gastroenterology, United States
Andrew Chan, Division of Gastroenterology, United States
Jacques Izard, Department of Microbiology, United States
Wendy Garrett, Department of Immunology and Infectious Diseases, United States

Session Chair: Janet Kelso

Presentation Overview:

We have conducted one of the first human microbiome studies in a well-described large prospective cohort incorporating taxonomic, metagenomic, and metatranscriptomic profiling at multiple body sites. Systematic comparison of the gut metagenome and metatranscriptome revealed that a substantial fraction of microbial transcripts were not differentially regulated relative to their genomic abundances. Of the remainder, consistently under-expressed pathways included sporulation and amino acid biosynthesis, while upregulated pathways included ribosome biogenesis and methanogenesis. Across subjects, metatranscriptional profiles were significantly more individualized than DNA-level functional profiles, indicative of subject-specific whole-community regulation. This work also identified a subset of abundant oral microbes that routinely survive transit to the gut, but with minimal transcriptional activity there. Together, these results provide a community-wide profile of biomolecular regulatory processes in the gut, as well as validating one of the first protocols appropriate for large-scale functional profiling of the microbiome in human populations.

Keyword: Gene Regulation & Transcriptomics, Applied Bioinformatics


PP27 (HT) Power and Limitations of RNA-Seq: findings from the SEQC (MAQC-III) consortium
Date: Monday, July 14, 11:00 am - 11:25 am
Room: 304

Presenting author: David Kreil, Boku University Vienna, Austria

Session Chair: Bernard Moret

Presentation Overview:

In the US-FDA led SEQC/MAQC-III project, different sequencing platforms were tested at over ten sites using well-established reference RNA samples with built-in truths to assess the discovery and expression-profiling performances of platforms and analysis pipelines. The results demonstrate that novel exon-exon junctions can still be discovered beyond existing comprehensive annotations and at high sequencing depths. Extensive investigations encompassing diverse performance metrics characterizing reproducibility, accuracy, and information content were combined with comparisons to qPCR and microarray platforms showing that good inter-site and cross-platform concordances for differentially expressed genes are possible, which is particularly critical in clinical and regulatory settings. In general, however, performance is application, platform, and pipeline dependent, with transcript-level profiling affected more strongly. Together with data from applications of RNA-Seq from several preclinical and clinical problems, the entire SEQC data sets comprise >100 billion reads (10Tb) and provide a unique resource for testing future developments of RNA-Seq.

Keyword: Gene Regulation & Transcriptomics, other


PP28 (HT) Computational Biology in Medicine: Novel Targets and Drug Repositioning Use Cases
Date: Monday, July 14, 11:00 am - 11:25 am
Room: 302

Presenting author: Pankaj Agarwal, GSK, United States

Additional authors:
Philippe Sanseau, GSK, United States
Mark Hurle, GSK, United States

Session Chair: Yanay Ofran

Presentation Overview:

Identifying the protein to target with a medicine is a critical step in drug discovery, and it is commonly thought that innovation in drug discovery is limited because pharmaceutical companies tend to work on the same drug targets, leading to ‘me too’ drugs. However, we found that 42% of targets were innovative and not duplicative at all. In fact, competition on targets increased with more target validation, as should be the case (Nat Rev Drug Discov. 2013 Aug;12(8):575-6). We also discuss systematic drug repositioning techniques based on computational analysis of data from transcriptomics (such as Connectivity Map), side effects, phenotypic screens, and genome-wide association studies (Clin Pharmacol Ther. 2013 Apr;93(4):335-41). I will also present some key bioinformatics problems in medicine discovery.

Keyword: Applied Bioinformatics, other


PP29 (HT) Cell-selective labeling using amino acid precursors for proteomic studies of multicellular environments
Date: Monday, July 14, 11:00 am - 11:25 am
Room: 312

Presenting author: Nicholas Gauthier, Memorial Sloan Kettering Cancer Center, United States

Additional authors:
Boumediene Soufi, University of Tübingen, Germany
William Walkowicz, Memorial Sloan-Kettering Cancer Center, United States
Virginia Pedicord, Memorial Sloan-Kettering Cancer Center, United States
Konstantinos Mavrakis, Memorial Sloan-Kettering Cancer Center, United States
Boris Macek, University of Tübingen, Germany
Chris Sander, Memorial Sloan Kettering Cancer Center, United States
Martin Miller, Memorial Sloan Kettering Cancer Center, United States

Session Chair: Robert F. Murphy

Presentation Overview:

Tissue development, homeostasis, and pathogenesis involve complex signaling between many cell types through both secreted factors and direct cell-cell contact. We report a new technique to selectively and continuously label the proteomes of individual cell types in co-culture, named cell type-specific labeling using amino acid precursors (CTAP). In short, mammalian cell expression of exogenous amino acid biosynthesis enzymes from lower organisms allows specific populations of cells to produce their own supply of amino acids from supplemented amino acid precursors. The conversion of heavy isotope-labeled precursors to heavy labeled amino acids is restricted to enzyme-expressing populations, providing a way to genetically control protein labeling. Using quantitative mass spectrometry, we demonstrate the method’s ability to differentiate the cell-of-origin of intra- and intercellular proteins derived from multicellular cultures. Linking proteins to their cellular source using CTAP facilitates cell-cell communication studies and the discovery of cell type-specific biomarkers.

Keyword: Mass Spectrometry & Proteomics, Protein Interactions & Molecular Networks


PP30 (PT) Pipasic: Similarity and Expression Correction for Strain-Level Identification and Quantification in Metaproteomics
Date: Monday, July 14, 11:30 am - 11:55 am
Room: 311

Presenting author: Bernhard Y. Renard, Robert Koch Institute, Germany

Additional authors:
Martin S. Lindner, Robert Koch Institute, Germany
Joerg Doellinger, Robert Koch Institute, Germany
Piotr Wojtek Dabrowski, Robert Koch Institute, Germany
Andreas Nitsche, Robert Koch Institute, Germany
Anke Penzlin, Robert Koch Institute, Germany

Session Chair: Janet Kelso

Presentation Overview:

Motivation: Metaproteomic analysis allows studying the interplay of organisms or functional groups and has become increasingly popular also for diagnostic purposes. However, difficulties arise due to the high sequence similarity between related organisms. Further, the state of conservation of proteins between species can be correlated with their expression level which can lead to significant bias in results and interpretation. These challenges are similar but not identical to the challenges arising in the analysis of metagenomic samples and require specific solutions. Results: We introduce Pipasic (peptide intensity-weighted proteome abundance similarity correction) as a tool which corrects identification and spectral counting based quantification results using peptide similarity estimation and expression level weighting within a non-negative lasso framework. Pipasic has distinct advantages over approaches only regarding unique peptides or aggregating results to the lowest common ancestor, as demonstrated on examples of viral diagnostics and an acid mine drainage dataset.

Keyword: Mass Spectrometry and Proteomics


PP31 (PT) Deep learning of the tissue-regulated splicing code
Date: Monday, July 14, 11:30 am - 11:55 am
Room: 304

Presenting author: Michael Leung, University of Toronto, Canada

Additional authors:
Hui Xiong, University of Toronto, Canada
Leo Lee, University of Toronto, Canada
Brendan Frey, University of Toronto, Canada

Session Chair: Bernard Moret

Presentation Overview:

Motivation: Alternative splicing is a regulated process that directs the generation of different transcripts from single genes. A computational model that can accurately predict splicing patterns based on genomic features and cellular context is highly desirable, both in understanding this widespread phenomenon, and in exploring the effects of genetic variations on alternative splicing. Methods: Using a deep neural network, we developed a model inferred from mouse RNA-Seq data that can predict splicing patterns in individual tissues and differences in splicing patterns across tissues. Our architecture uses hidden variables that jointly represent features in genomic sequences and tissue types when making predictions. A graphics processing unit was used to greatly reduce the training time of our models with millions of parameters. Results: We show that the deep architecture surpasses the performance of the previous Bayesian method for predicting alternative splicing patterns. With the proper optimization procedure and selection of hyperparameters, we demonstrate that deep architectures can be beneficial, even with a moderately sparse dataset. An analysis of what the model has learned in terms of the genomic features is presented.

Keyword: Gene Regulation and Transcriptomics


PP32 (PT) DrugComboRanker: Drug Combination Discovery Based on Target Network Analysis
Date: Monday, July 14, 11:30 am - 11:55 am
Room: 302

Presenting author: Lei Huang, Peking University, China

Additional authors:
Fuhai Li, Houston Methodist Hospital Research Institute, United States
Jianting Sheng, Houston Methodist Hospital Research Institute, United States
Xiaofeng Xia, Houston Methodist Hospital Research Institute, United States
Jinwen Ma, Peking University, China
Ming Zhan, Houston Methodist Hospital Research Institute, United States
Stephen Wong, Houston Methodist Hospital Research Institute, United States

Session Chair: Yanay Ofran

Presentation Overview:

Motivation: Currently there are no curative anti-cancer drugs, and drug resistance is often acquired after drug treatment. One of the reasons is that cancers are complex diseases, regulated by multiple signaling pathways and cross-talk among the pathways. It is expected that drug combinations can reduce drug resistance and improve patients’ outcomes. In clinical practice, the ideal and feasible drug combinations are combinations of existing FDA-approved drugs or bioactive compounds that are already used on patients or have entered clinical trials and passed safety tests. These drug combinations could be used directly on patients with less concern about toxic effects. However, there is so far no effective computational approach to search for effective drug combinations among the enormous number of possibilities. Results: In this study, we propose a novel systematic computational tool, DrugComboRanker, to prioritize synergistic drug combinations and uncover their mechanisms of action. We first build a drug functional network based on the drugs' genomic profiles, and partition the network into numerous drug network communities by using a Bayesian non-negative matrix factorization approach. As drugs within overlapping communities share common mechanisms of action, we next uncover potential targets of drugs by applying a recommendation system on drug communities. Meanwhile, we build disease-specific signaling networks based on patients’ genomic profiles and interactome data. We then identify drug combinations by searching for drugs whose targets are enriched in the complementary signaling modules of the disease signaling network. The novel method was evaluated on lung adenocarcinoma and endocrine receptor (ER) positive breast cancer, and compared with other drug combination approaches. These case studies discovered a set of effective drug combinations ranked at the top of our prediction list, and mapped the drug targets onto the disease signaling network to highlight the mechanisms of action of the drug combinations.

Keyword: Protein Interactions and Molecular Networks


PP33 (PT) Automated detection and tracking of many cells by using 4D live-cell imaging data
Date: Monday, July 14, 11:30 am - 11:55 am
Room: 312

Presenting author: Terumasa Tokunaga, The Institute of Statistical Mathematics, Japan

Additional authors:
Osamu Hirose, Kanazawa University, Japan
Shotaro Kawaguchi, Kanazawa University, Japan
Yu Toyoshima, The University of Tokyo, Japan
Takayuki Teramoto, Kyushu University, Japan
Hisaki Ikebata, Graduate University of Advanced Studies, Japan
Sayuri Kuge, Kyushu University, Japan
Takeshi Ishihara, Kyushu University, Japan
Yuichi Iino, The University of Tokyo, Japan
Ryo Yoshida, The Institute of Statistical Mathematics, Japan

Session Chair: Robert F. Murphy

Presentation Overview:

Motivation: Automated fluorescence microscopes produce massive amounts of images observing cells, often in four dimensions of space and time. This study addresses two tasks of time-lapse imaging analysis: detection and tracking of many imaged cells, especially intended for 4D live-cell imaging of neuronal nuclei of C. elegans. Cells of interest appear in forms little more generic than ellipsoids. They are densely distributed and move rapidly in a series of 3D images. In such cases, existing tracking methods often fail because, for instance, many trackers jump from one object to another during rapid moves. Results: The present method starts by converting each 3D image to a smooth continuous function through kernel density estimation. Cell bodies in an image are assumed to lie in regions around multiple local maxima of the density function. Then, the tasks of detecting and tracking cells are addressed with two hill-climbing algorithms that we derive. By applying the cell detection method to an image at the first frame, the positions of trackers are initialized. The tracking algorithm keeps attracting them to around local maxima changing over time in a subsequent image sequence. To prevent the trackers from turnovers and coalescences, we employ Markov random fields (MRFs) to model spatial and temporal covariation of cells, and maximize the image forces and the MRF-induced constraints on transitions of the trackers. The tracking procedure is demonstrated with dynamic 3D images containing more than one hundred neurons of C. elegans.
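
Hill climbing on a kernel density estimate can be illustrated with the classic mean-shift update, shown here in one dimension. This is a stand-in technique chosen for brevity, not the authors' derived algorithms, and it omits the MRF constraints entirely. The Gaussian kernel, the bandwidth, the sample positions, and the names `mean_shift_step` and `climb` are assumptions.

```python
import math

def mean_shift_step(x, points, bandwidth):
    """Move x to the kernel-weighted mean of the data, which is an ascent
    step on the Gaussian kernel density estimate."""
    weights = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points]
    return sum(w * p for w, p in zip(weights, points)) / sum(weights)

def climb(x, points, bandwidth, tol=1e-6, max_iter=500):
    """Repeat the ascent step until the position stops changing."""
    for _ in range(max_iter):
        nxt = mean_shift_step(x, points, bandwidth)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return x

# Hypothetical 1D intensity samples from two "cells" in the first frame.
frame1 = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
trackers = [climb(x0, frame1, bandwidth=0.3) for x0 in (1.4, 4.6)]

# In the next frame both cells have moved slightly; each tracker restarts
# from its previous position and climbs to the nearby maximum.
frame2 = [p + 0.2 for p in frame1]
trackers_next = [climb(x, frame2, bandwidth=0.3) for x in trackers]
```

With dense, fast-moving cells a tracker started this way can climb to a neighbouring cell's maximum, which is the failure mode the abstract's MRF constraints are meant to prevent.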

Keyword: Bioimaging and Data Visualization



PP34 (HT) Primate Transcript and Protein Expression Levels Evolve under Compensatory Selection Pressures
Date: Monday, July 14, 12:00 pm - 12:25 pm
Room: 311

Presenting author: Zia Khan, University of Maryland, United States

Additional authors:
Michael Ford, MS Bioworks, LLC, United States
Darren Cusanovich, University of Chicago, United States
Amy Mitrano, University of Chicago, United States
Jonathan Pritchard, Stanford University, United States
Yoav Gilad, University of Chicago, United States

Session Chair: Janet Kelso

Presentation Overview:

Due to the technical and computational challenges of conducting comparative, genome-scale proteomics, essentially all studies of gene regulatory evolution across primates and other mammals have focused on mRNA levels rather than protein levels. Yet, proteins perform much of the work of the cell and are subject to regulation not revealed by mRNA levels alone. Using quantitative mass spectrometry and novel computational analysis methods, we obtained thousands of comparative mRNA and protein expression measurements from human, chimpanzee, and rhesus macaque lymphoblastoid cell lines. We used data from all three species to identify genes whose regulation might have evolved under natural selection, and considered jointly, our data allowed us to identify genes where lineage-specific changes might specifically affect post-transcriptional or post-translational regulation. Our analyses indicate that on an evolutionary timescale, there is surprising flexibility in primate mRNA levels, as these changes are often either buffered or compensated for at the protein level.

Keyword: Mass Spectrometry & Proteomics, Gene Regulation & Transcriptomics



PP35 (HT) Systematically Differentiating Functions for Alternatively Spliced Isoforms through Integrating RNA-seq Data
Date: Monday, July 14, 12:00 pm - 12:25 pm
Room: 304

Presenting author: Yuanfang Guan, University of Michigan, United States

Additional authors:
Hongdong Li, University of Michigan, United States
Rajasree Menon, University of Michigan, United States
Yuchen Wen, University of Michigan, United States
Gilbert S. Omenn, University of Michigan, United States
Matthias Kretzler, University of Michigan, United States
Yuanfang Guan, University of Michigan, United States

Session Chair: Bernard Moret

Presentation Overview:

Integrating large-scale functional genomic data has significantly accelerated our understanding of gene functions. However, no algorithm has been developed to differentiate functions for isoforms of the same gene using high-throughput genomic data. This is because standard supervised learning requires ‘ground-truth’ functional annotations, which are lacking at the isoform level. To address this challenge, we developed a generic framework that interrogates public RNA-seq data at the transcript level to differentiate functions for alternatively spliced isoforms. For a specific function, our algorithm identifies the ‘responsible’ isoform(s) of a gene and generates classifying models at the isoform level instead of at the gene level. We identified genes in the mouse whose isoforms are predicted to have disparate functionalities and experimentally validated the ‘responsible’ isoforms using data from mammary tissue. With protein structure modeling and experimental evidence, we further validated the predicted isoform functional differences for the genes Cdkn2a and Anxa6.

Keyword: Protein Structure & Function, Gene Regulation & Transcriptomics



PP36 (HT) Drug synergy screen and network modeling in dedifferentiated liposarcoma identifies CDK4 and IGF1R as synergistic drug targets
Date: Monday, July 14, 12:00 pm - 12:25 pm
Room: 302

Presenting author: Martin Miller, Memorial Sloan Kettering Cancer Center, United States

Additional authors:
Evan Molinelli, Memorial Sloan Kettering Cancer Center, United States
Jayasree Nair, Memorial Sloan Kettering Cancer Center, United States
Tahir Sheikh, Memorial Sloan Kettering Cancer Center, United States
Rita Samy, Memorial Sloan Kettering Cancer Center, United States
Xiaohong Jing, Memorial Sloan Kettering Cancer Center, United States
Qin He, Memorial Sloan Kettering Cancer Center, United States
Anil Korkut, Memorial Sloan Kettering Cancer Center, United States
Aimee Crago, Memorial Sloan Kettering Cancer Center, United States
Samuel Singer, Memorial Sloan Kettering Cancer Center, United States
Gary Schwartz, Memorial Sloan Kettering Cancer Center, United States
Chris Sander, Memorial Sloan Kettering Cancer Center, United States

Session Chair: Yanay Ofran

Presentation Overview:

In this study, entitled “Drug synergy screen and network modeling in dedifferentiated liposarcoma identifies CDK4 and IGF1R as synergistic drug targets”, we have successfully deployed a powerful perturbation-based systems biology approach to discover and characterize drug combinations with translational relevance for cancer treatment. We demonstrate its applicability in dedifferentiated liposarcoma via the discovery of a synergistic IGF1R and CDK4 combination therapy, which we will further test in a clinical setting. From a drug combination screen with 14 anti-cancer drugs, proteomic and phenotypic response profiles serve as input for our de novo network inference method that is subsequently used to predict pathway-based mechanisms of drug synergy. We believe that this study is highly relevant for the interdisciplinary and systems biology-oriented participants of the ISMB conference. We anticipate that such integrated approaches to combinatorial therapeutics will be widely used and provide opportunities to bridge basic cancer research and clinical drug development.

Keyword: Protein Interactions & Molecular Networks, Disease Models & Epidemiology



PP37 (HT) NaviCell: a web-based environment for navigation, curation, maintenance and data analysis in the context of large molecular interaction maps
Date: Monday, July 14, 12:00 pm - 12:25 pm
Room: 312

Presenting author: Emmanuel Barillot, Institut Curie, France

Additional authors:
Inna Kuperstein, Institut Curie, France
David Cohen, Institut Curie, France
Stuart Pook, Institut Curie, France
Eric Viara, Institut Curie, France
Laurence Calzone, Institut Curie, France
Emmanuel Barillot, Institut Curie, France
Andrei Zinovyev, Institut Curie, France

Session Chair: Robert F. Murphy

Presentation Overview:

Biological knowledge can be systematically represented in a computer-readable form as a comprehensive map of molecular interactions. NaviCell is a web-based environment for exploiting large maps of molecular interactions created in CellDesigner. NaviCell is characterized by a combination of three essential features: (1) efficient map browsing based on Google Maps; (2) semantic zooming for viewing different levels of detail or abstraction of the map; and (3) an integrated web-based blog for collecting community feedback. NaviCell can be used for studying molecular entities of interest in the context of signaling pathways and crosstalk between pathways within a global signaling network. It greatly facilitates the curation, maintenance and updating of comprehensive maps of molecular interactions in an interactive and user-friendly fashion, thanks to an embedded blogging system. In addition, NaviCell provides tools for integrating omics data and overlaying several types of data for visualization and analysis in the context of signaling networks.

Keyword: Protein Interactions & Molecular Networks, Gene Regulation & Transcriptomics



PP38 (HT) Efficient Modeling and Active Learning Discovery of Biological Responses
Date: Monday, July 14, 2:10 pm - 2:35 pm
Room: 311

Presenting author: Armaghan Naik, Carnegie Mellon University, United States

Additional authors:
Joshua Kangas, Carnegie Mellon University, United States
Christopher Langmead, Carnegie Mellon University, United States

Session Chair: Reinhard Schneider

Presentation Overview:

High throughput and high content screening involve determination of the effect of many compounds on a given target. As currently practiced, screening for each new target typically makes little use of information from screens of prior targets. Further, choices of compounds to advance to drug development are made without significant screening against off-target effects. The overall drug development process could be made more effective if potential effects of all compounds on all possible targets could be considered, yet the cost of complete experimentation is prohibitive. Here we describe a potential solution: probabilistic models that can be used to predict results for unmeasured combinations, and active learning algorithms for efficiently selecting which experiments to perform in order to build those models and determining when to stop. Using simulated and experimental data, we show that our approaches can produce accurate estimates of unmeasured experiments much faster than by selecting experiments at random.

Keyword: Applied Bioinformatics, other



PP39 (HT) Analyzing DNase I biases reveals a novel DNA methylation readout mechanism
Date: Monday, July 14, 2:10 pm - 2:35 pm
Room: 304

Presenting author: Harmen Bussemaker, Columbia University, United States

Additional authors:
Allan Lazarovici, Columbia University, United States
Tianyin Zhou, University of Southern California, United States
Anthony Shafer, University of Washington, United States
Ana Carolina Dantas Machado, University of Southern California, United States
Richard Sandstrom, University of Washington, United States
Peter Sabo, University of Washington, United States
Yan Lu, University of Southern California, United States
Remo Rohs, University of Southern California, United States
John Stamatoyannopoulos, University of Washington, United States

Session Chair: Michal Linial

Presentation Overview:

We have uncovered a novel and general mechanism by which cytosine methylation can dramatically strengthen specific protein-DNA interactions. By analyzing DNase I digests of purified human genomic DNA, we discovered that (i) cleavage rate varies over a thousand-fold range with the surrounding sequence, and (ii) cleavage near CpG dinucleotides is ten-fold higher when the cytosine is methylated. Combining all-atom computer simulation predictions of DNA shape with statistical analysis of massively parallel sequencing data, we were able to find a unified explanation for these phenomena. It turns out that cytosine methylation narrows the DNA minor groove, which in turn strengthens interactions with positively charged amino-acid side chains. Such minor groove contacts occur for a wide range of transcription factors, as well as nucleosomes. The novel structural mechanism put forward in this study therefore has the potential to significantly deepen our understanding of how epigenetic information is "read" by the cell.

Keyword: Gene Regulation & Transcriptomics, Protein Interactions & Molecular Networks



PP40 (HT) Statistical challenges in whole-exome sequencing of 7000 individuals with schizophrenia and controls
Date: Monday, July 14, 2:10 pm - 2:35 pm
Room: 302

Presenting author: Menachem Fromer, Icahn School of Medicine at Mount Sinai, United States

Additional authors:
Shaun Purcell, Icahn School of Medicine at Mount Sinai, United States

Session Chair: Dietlind Gerloff

Presentation Overview:

In dealing with large next-generation sequencing data in the study of disease, the sheer number of potentially relevant neutral genetic variants results in a decreased signal-to-noise ratio. In this talk, we will discuss how we dealt with these issues in two recently published studies of schizophrenia. In one, ~2500 schizophrenia cases and ~2500 matched controls were whole-exome-sequenced. The magnitude of neutral mutations largely overwhelms the number of genetic variants more likely related to disease, which we focus on by frequency filtering, biological impact, and pathway analysis. In the second study, de novo mutations were sought out in father-mother-child trios to find mutations not yet subjected to selective pressures. In this instance, the overwhelming majority of such potential mutations (seemingly arising as new in the children) are false positives. We carefully sifted through these using sequencing and other metrics to find the real ones most likely associated with disease.

Keyword: Sequence Analysis, Disease Models & Epidemiology



PP41 (PT) Pareto-Optimal Phylogenetic Tree Reconciliation
Date: Monday, July 14, 2:10 pm - 2:35 pm
Room: 312

Presenting author: Yi-Chieh Wu, Massachusetts Institute of Technology, United States

Additional authors:
Mukul S. Bansal, University of Connecticut, United States
Manolis Kellis, Massachusetts Institute of Technology, United States
Ran Libeskind-Hadas, Harvey Mudd College, United States

Session Chair: Russell Schwartz

Presentation Overview:

Motivation: Phylogenetic tree reconciliation is a widely used method for reconstructing the evolutionary histories of gene families and species, hosts and parasites, and other dependent pairs of entities. Reconciliation is typically performed using maximum parsimony, in which each evolutionary event type is assigned a cost and the objective is to find a reconciliation of minimum total cost. It is generally understood that reconciliations are sensitive to event costs, but little is understood about the relationship between event costs and solutions. Moreover, choosing appropriate event costs is a notoriously difficult problem. Results: We address this problem by giving an efficient algorithm for computing Pareto-optimal sets of reconciliations, thus providing the first systematic method for understanding the relationship between event costs and reconciliations. This, in turn, results in new techniques for computing event support values and, for cophylogenetic analyses, performing robust statistical tests. We provide new software tools and demonstrate their use on a number of datasets from evolutionary genomic and cophylogenetic studies. Availability: Our Python tools are freely available at www.cs.hmc.edu/~hadas/xscape Contact: mukul@engr.uconn.edu

Keyword: Evolution & Comparative Genomics



PP42 (PT) Stochastic EM-based TFBS motif discovery with MITSU
Date: Monday, July 14, 2:40 pm - 3:05 pm
Room: 311

Presenting author: Alastair Kilpatrick, The University of Edinburgh, United Kingdom

Additional authors:
Bruce Ward, The University of Edinburgh, United Kingdom
Stuart Aitken, The University of Edinburgh, United Kingdom

Session Chair: Reinhard Schneider

Presentation Overview:

Motivation: The Expectation-Maximisation (EM) algorithm has been successfully applied to the problem of transcription factor binding site (TFBS) motif discovery and underlies the most widely used motif discovery algorithms. In the wider field of probabilistic modelling, the stochastic EM (sEM) algorithm has been used to overcome some of the limitations of the EM algorithm; however, the application of sEM to motif discovery has not been fully explored. Results: We present MITSU, a novel algorithm for motif discovery which combines sEM with an improved approximation to the likelihood function which is unconstrained with regard to the distribution of motif occurrences within the input dataset. The algorithm is evaluated quantitatively on realistic synthetic data and several collections of characterised prokaryotic TFBS motifs and shown to outperform EM and an alternative sEM-based algorithm, particularly in terms of site-level positive predictive value.

Keyword: Sequence Analysis



PP43 (HT) Constructing Hepitypes: Phasing Local Genotyping and DNA Methylation
Date: Monday, July 14, 2:40 pm - 3:05 pm
Room: 304

Presenting author: Wen-Yu Chung, National Kaohsiung University of Applied Sciences, Taiwan

Additional authors:
Robert Schmitz, The Salk Institute for Biological Studies, United States
Tanya Biorac, Life Technologies Corp.-Ion Torrent, United States
Delia Ye, Life Technologies Corp.-Ion Torrent, United States
Miroslav Dudas, Life Technologies Corp.-Ion Torrent, United States
Gavin Meredith, Life Technologies Corp.-Ion Torrent, United States
Christopher Adams, Life Technologies Corp.-Ion Torrent, United States
Joseph Ecker, The Salk Institute for Biological Studies, United States
Michael Zhang, University of Texas at Dallas, United States

Session Chair: Michal Linial

Presentation Overview:

Whole-genome DNA methylation sequencing provides both methylation patterns and genetic information. We utilized base-resolution methylomes to directly identify allelic linkage of DNA methylation and genomic variants. The paired association was further extended to construct hepitypes by the simultaneous phasing of genotype and methylation. Using this approach, the sequencing reads provide direct statistics of the interdependence between methylcytosines and nucleotide variations; consequently, the detailed patterns of genetic and epigenetic variations can be readily inferred from the data. Moreover, the analysis is not limited to known single nucleotide variants. In addition to imprinted regions and SNV-in-CpG sites, we show numerous cis-regulatory sequence-associated DNA methylation sites. We extended this strategy to incorporate multiple nucleotide and methylation sites and ranked hepitypes according to the observed frequency. The top-ranked hepitypes indicate that methylated sites are often observed from the same allele.

Keyword: Sequence Analysis, Gene Regulation & Transcriptomics



PP44 (HT) Genome leaks
Date: Monday, July 14, 2:40 pm - 3:05 pm
Room: 302

Presenting author: Steven Brenner, University of California, Berkeley, United States

Session Chair: Dietlind Gerloff

Presentation Overview:

Genome science is reaching a critical juncture. More than 10,000 genetic variants have been associated with traits, allowing breakthroughs in basic biological research and medical applications. However, privacy concerns have excluded most researchers from directly analyzing the vast wealth of human genomic information. Nonetheless, risks of vast data breaches are rapidly rising—and further progress requires ever-larger cohorts. We currently inhibit research without effectively protecting human subjects; prospects for harm to both individuals and to medical research are growing.

An extended discussion of the genome leaks issues may be found at http://compbio.berkeley.edu/proj/leak/

Keyword: other, Population Genomics



PP45 (HT) Human disease locus discovery and mapping to molecular pathways through phylogenetic profiling
Date: Monday, July 14, 2:40 pm - 3:05 pm
Room: 312

Presenting author: Yuval Tabach, Massachusetts General Hospital, United States

Additional authors:
Gary Ruvkun, Massachusetts General Hospital, United States
Carmit Levy, Tel Aviv University, Israel

Session Chair: Russell Schwartz

Presentation Overview:

Genes with common profiles of presence and absence in disparate genomes tend to function in the same pathway. By mapping all human genes into about 1000 clusters of genes with similar patterns of conservation across eukaryotic phylogeny, we determined that sets of genes associated with particular diseases have similar phylogenetic profiles. By focusing on those human phylogenetic gene clusters that significantly overlap some of the thousands of human gene sets defined by their coexpression or annotation to pathways or other molecular attributes, we reveal the evolutionary map that connects molecular pathways and human diseases. The other genes in the phylogenetic clusters enriched for particular known disease genes or molecular pathways identify candidate genes for roles in those same disorders and pathways. Focusing on proteins coevolved with the microphthalmia-associated transcription factor (MITF), we identified the Notch pathway suppressor of hairless (RBP-Jk/SuH) transcription factor, and showed that RBP-Jk functions as an MITF cofactor.

Keyword: Evolution & Comparative Genomics, Disease Models & Epidemiology



PP46 (PT) Gene network inference by probabilistic scoring of relationships from a factorized model of interactions
Date: Monday, July 14, 3:10 pm - 3:35 pm
Room: 311

Presenting author: Marinka Zitnik, University of Ljubljana, Slovenia

Additional authors:
Blaz Zupan, University of Ljubljana, Slovenia

Session Chair: Reinhard Schneider

Presentation Overview:

Motivation: Epistasis analysis is an essential tool of classical genetics for inferring the order of function of genes in a common pathway. Typically, it considers single and double mutant phenotypes and for a pair of genes observes if a change in the first gene masks the effects of the mutation in the second gene. Despite the recent emergence of biotechnology techniques that can provide gene interaction data on a large, possibly genomic scale, very few methods are available for quantitative epistasis analysis and epistasis-based network reconstruction. Results: We here propose a conceptually new probabilistic approach to gene network inference from quantitative interaction data. The approach is founded on epistasis analysis. Its features are joint treatment of the mutant phenotype data with a factorized model and probabilistic scoring of pairwise gene relationships that are inferred from the latent gene representation. The resulting gene network is assembled from scored pairwise relationships. In an experimental study, we show that the proposed approach can accurately reconstruct several known pathways and that it surpasses the accuracy of current approaches.

Keyword: Protein Interactions & Molecular Networks



PP47 (HT) A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes
Date: Monday, July 14, 3:10 pm - 3:35 pm
Room: 304

Presenting author: Erez Levanon, Bar-Ilan University, Israel

Additional authors:
Yishay Pinto, Bar-Ilan University, Israel
Haim Cohen, Bar-Ilan University, Israel
Lily Bazak, Bar-Ilan University, Israel
Ami Haviv, Bar-Ilan University, Israel
Michal Barak, Bar-Ilan University, Israel
Jasmine Jacob-Hirsch, Bar-Ilan University, Israel
Patricia Deng, Stanford University, United States
Rui Zhang, Stanford University, United States
Jin Billy Li, Stanford University, United States
Gidi Rechavi, Chaim Sheba Medical Center, Israel

Session Chair: Michal Linial

Presentation Overview:

RNA molecules transmit the information encoded in the genome and generally reflect its content. Adenosine-to-inosine RNA editing by ADAR proteins converts a genomically encoded adenosine into inosine. It is known that most editing in human takes place in the primate-specific Alu sequences, but the extent of this phenomenon is not yet clear. Here, we analyzed large-scale RNA-seq data and detected ∼1.6 million editing sites. As detection sensitivity increases with sequencing coverage, we performed ultradeep sequencing of selected Alu sequences and showed that the scope of editing is much larger than anticipated. We found that virtually all adenosines within Alu that form double-stranded RNA undergo editing, although most sites exhibit editing at only low levels. We estimate that there are over 100 million human Alu editing sites, located in the majority of human genes. These findings set the stage for exploring how this primate-specific massive diversification of the transcriptome is utilized.

Keyword: Gene Regulation & Transcriptomics, Evolution & Comparative Genomics



PP48 (PT) Privacy Preserving Protocol for Detecting Genetic Relatives Using Rare Variants
Date: Monday, July 14, 3:10 pm - 3:35 pm
Room: 302

Presenting author: Farhad Hormozdiari, University of California, Los Angeles, United States

Additional authors:
Jong Wha Joo, University of California, Los Angeles, United States
Feng Guan, University of California, Los Angeles, United States
Akshay Wadia, University of California, Los Angeles, United States
Rafail Ostrovsky, University of California, Los Angeles, United States
Amit Sahai, University of California, Los Angeles, United States
Eleazar Eskin, University of California, Los Angeles, United States

Session Chair: Dietlind Gerloff

Presentation Overview:

Motivation: High-throughput sequencing technologies have impacted many areas of genetic research. One such area is the identification of relatives from genetic data. The standard approach for the identification of genetic relatives collects the genomic data of all individuals and stores it in a database. Then, each pair of individuals is compared to detect the set of genetic relatives, and the matched individuals are informed. The main drawback of this approach is that individuals must share their genetic data with a trusted third party to perform the relatedness test. Results: In this work, we propose a secure protocol to detect genetic relatives from sequencing data without exposing any information about their genomes. We assume that individuals have access to their genome sequences but do not want to share their genomes with anyone else. Unlike previous approaches, our approach uses both common and rare variants, which provides the ability to detect much more distant relationships securely. We use simulated data generated from the 1000 Genomes data and illustrate that we can easily detect up to fifth-degree cousins, which was not possible using existing methods. We also show, in the 1000 Genomes data with cryptic relationships, that our method can detect these individuals.

Keyword: Population Genomics



PP49 (HT) Arboretum: reconstruction and analysis of the evolutionary history of condition-specific transcriptional modules
Date: Monday, July 14, 3:10 pm - 3:35 pm
Room: 312

Presenting author: Sushmita Roy, University of Wisconsin, Madison, United States

Additional authors:
Ilan Wapinski, Harvard Medical School, United States
Jenna Pfiffner, Broad Institute, United States
Courtney French, University of California, United States
Amanda Socha, Dartmouth College, United States
Jay Konieczka, Broad Institute, United States
Naomi Habib, Broad Institute, United States
Manolis Kellis, MIT, United States
Dawn Thompson, Broad Institute, United States
Aviv Regev, MIT & Broad Institute, United States

Session Chair: Russell Schwartz

Presentation Overview:

Comparative functional genomics seeks to measure and compare functional data, such as mRNA levels and chromatin states, across multiple species. A major challenge is to develop effective tools to systematically compare these data across multiple species. In this talk, we will present a new computational approach, Arboretum, to systematically identify modules of co-expressed genes in a species phylogeny. Arboretum is based on a probabilistic model of expression data that is applicable to complex phylogenies with multiple gene duplication and loss events. We applied Arboretum to study the evolution of transcriptional modules in yeast and mammalian species. In yeast, we find substantial conservation in the module expression patterns, although the specific genes in each module diverge in a lifestyle- or clade-specific manner. We will also present some recent results on the application of Arboretum to identify conservation and divergence of tissue-specific modules in mammalian species.

Keyword: Evolution & Comparative Genomics, Gene Regulation & Transcriptomics



PP50 (PT) Functional Association Networks as Priors for Gene Regulatory Network Inference
Date: Monday, July 14, 3:40 pm - 4:05 pm
Room: 311

Presenting author: Erik Sonnhammer, SciLifeLab, Sweden

Additional authors:
Andreas Tjärnberg, SciLifeLab, Sweden
Torbjörn Nordling, SciLifeLab, Sweden
Sven Nelander, Uppsala University, Sweden
Matthew Studham, SciLifeLab, Sweden

Session Chair: Reinhard Schneider

Presentation Overview:

Gene regulatory network (GRN) inference reveals the influences genes have on one another in cellular regulatory systems. If the experimental data is inadequate to fully explain the network, informative priors have been shown to improve the accuracy of inferences. This study explores the potential of undirected, confidence-weighted networks, such as those in functional association databases, as a prior source for GRN inference. Such networks often erroneously indicate symmetric interaction between genes and may contain mostly correlation-based interaction information. Despite these drawbacks, our testing on synthetic data sets indicates that if the prior networks have enough causal information then they can improve GRN inference accuracy, and if not then accuracy may decrease. This opens the door to the possibility that functional association databases can be used as priors to make GRN inference more reliable.

Keyword: Gene Regulation & Transcriptomics



PP51 (HT) Ubiquitously transcribed genes use alternative polyadenylation to achieve tissue-specific expression
Date: Monday, July 14, 3:40 pm - 4:05 pm
Room: 304

Presenting author: Steve Lianoglou, Memorial Sloan Kettering Cancer Center, United States

Additional authors:
Christina Leslie, Memorial Sloan Kettering Cancer Center, United States
Julie Yang, Memorial Sloan Kettering Cancer Center, United States
Christine Mayr, Memorial Sloan Kettering Cancer Center, United States
Vidur Garg, Memorial Sloan Kettering Cancer Center, United States

Session Chair: Michal Linial

Presentation Overview:

More than half of human genes use alternative cleavage and polyadenylation (ApA) to generate mRNA transcripts that differ in the lengths of their 3′ untranslated regions (UTRs), thus altering the post-transcriptional fate of the message and likely the protein output. We developed a sequencing method called 3′-seq to quantitatively map the 3′ ends of the transcriptome of diverse human tissues and isogenic transformation systems. We found that most tissue-restricted genes have single 3′ UTRs, whereas most ubiquitously transcribed genes generate multiple 3′ UTRs. During transformation and differentiation, single-UTR genes change their mRNA abundance levels, while multi-UTR genes typically change 3′ UTR isoform ratios to achieve tissue specificity. However, these regulation programs target genes that function in the same pathways and processes that characterize the new cell type. Finally, tissue-specific usage of ApA sites appears to be a mechanism for changing the landscape targetable by ubiquitously expressed microRNAs.

Keyword: Gene Regulation & Transcriptomics, Applied Bioinformatics



PP52 (HT) Detecting chromatin modifications in cancer samples: challenges and solutions
Date: Monday, July 14, 3:40 pm - 4:05 pm
Room: 302

Presenting author: Valentina Boeva, Institut Curie, France

Additional authors:
Haitham Ashoor, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Aurelie Herault, UMR 144 CNRS, Subcellular Structure and Cellular Dynamics, France
Aurelie Kamoun, Institut Curie, France
Francois Radvanyi, Institut Curie, France
Vladimir B. Bajic, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Emmanuel Barillot, Institut Curie, Mines ParisTech, France

Session Chair: Dietlind Gerloff

Presentation Overview:

Changes in gene expression in cancer cells are usually associated with specific changes in epigenetic profiles (e.g., histone modification profiles). In order to characterize these changes, ChIP-seq experiments are often employed. Our recent work (Ashoor et al., 2013) has demonstrated that for detection of histone modifications in cancer cells one should apply specific methods that take into account possible DNA copy number aberrations. We apply the method we developed for the detection of histone modifications to reanalyze ENCODE ChIP-seq datasets generated for cancer cell lines. We show that current ENCODE histone modification profiles and called regions with histone modifications have a systematic copy number bias: irrespective of the particular histone modification, regions of genomic gain tend to contain more called histone modifications than regions of genomic loss. Our results suggest that the ENCODE cancer cell datasets should be reanalyzed in order to eliminate the copy number bias we have observed.

Keyword: Gene Regulation & Transcriptomics, Applied Bioinformatics


PP53 (HT) Proteomic Universal Correlate of Evolution
Date: Monday, July 14, 3:40 pm - 4:05 pm Room: 312

Presenting author: David Horn, Tel-Aviv University, Israel

Session Chair: Russell Schwartz

Presentation Overview:

We introduce a novel unifying methodology for the investigation of Compositional Order (CO) of protein sequences. It accounts for all types of low-complexity regions and repetitive phenomena, including the existence of large periodic structures in protein sequences. We define new CO measures providing insights into the correlation of CO with protein function and with evolution. In particular, a large-scale analysis of 94 proteomes shows that the CO vocabulary of frequently appearing amino acid triplets serves as a measure of taxonomic ordering separating major clades from each other. It serves as a novel phylogenetic tool and suggests that major CO generation occurs during the creation of a completely new species, i.e., during macroevolutionary events. It provides an alternative to the traditional ordering of species based on effective population size × mutation rate, Neu, with which it anti-correlates well, signifying that an increasing frequent-triplet (FT) vocabulary is associated with low evolutionary pressure.

Keyword: Evolution & Comparative Genomics, Sequence Analysis


PP54 (HT) Novel Burkholderia mallei Virulence Factors Linked to Specific Host-Pathogen Protein Interactions
Date: Tuesday, July 15, 10:30 am - 10:55 am Room: 311

Presenting author: Jaques Reifman, U.S. Army Medical Research and Materiel Command, United States

Additional authors:
Vesna Memisevic, U.S. Army Medical Research and Materiel Command, United States
Nela Zavaljevski, U.S. Army Medical Research and Materiel Command, United States
Rembert Pieper, J. Craig Venter Institute, United States
Seesandra Rajagopala, J. Craig Venter Institute, United States
Keehwan Kwon, J. Craig Venter Institute, United States
Katherine Townsend, J. Craig Venter Institute, United States
Chenggang Yu, U.S. Army Medical Research and Materiel Command, United States
Xueping Yu, U.S. Army Medical Research and Materiel Command, United States
David DeShazer, U.S. Army Medical Research Institute of Infectious Diseases, United States
Jaques Reifman, U.S. Army Medical Research and Materiel Command, United States
Anders Wallqvist, U.S. Army Medical Research and Materiel Command, United States

Session Chair: Scott Markel

Presentation Overview:

Bacterial proteins required for virulence, i.e., virulence factors, are a key component of bacterial pathogenicity, as they control and promote pathogenic infection and intracellular survival. Here, we present a combined in silico, in vitro, and in vivo strategy to identify and characterize novel virulence factors of Burkholderia mallei, an infectious intracellular pathogen and the causative agent of glanders. First, we used bioinformatics approaches to identify 49 putative virulence factors involved in B. mallei pathogenicity. Using yeast two-hybrid assays against normalized whole human and whole murine proteome libraries, we identified interactions between each of the putative virulence factors and host proteins. The analysis of these interactions helped us identify and characterize three novel B. mallei virulence factors, as well as host processes and pathways that can be exploited for drug and vaccine design. Finally, using murine aerosol challenge model experiments we verified that three novel virulence factors did indeed attenuate virulence.

Keyword: Protein Interactions & Molecular Networks


PP55 (HT) Mapping Functional Transcription Factor Networks from Gene Expression Data
Date: Tuesday, July 15, 10:30 am - 10:55 am Room: 304

Presenting author: Michael Brent, Washington University, United States

Session Chair: Robert F. Murphy

Presentation Overview:

I will present an algorithm, NetProphet, for inferring transcriptional regulatory networks from expression profiling of transcription factor (TF) deletion mutants. I will then show that a network constructed from this type of expression data identifies direct binding targets more accurately than one constructed from a large chromatin immunoprecipitation (ChIP) data set. However, ChIP networks contain many edges with no evidence of functional effect on target expression levels; in our network, every edge is functional. Furthermore, gene expression experiments are much easier, more reliable, and higher throughput than ChIP experiments. We conclude that gene expression, not ChIP, is currently the optimal method for network reconstruction. Finally, we show some new biological discoveries we've made with NetProphet, and describe a large deletion and expression profiling experiment that is being driven by NetProphet. TFs likely to participate in a specific biological process are targeted for deletion using a method we call PhenoProphet.

Keyword: Gene Regulation & Transcriptomics


PP56 (HT) Protein Expansion Is Primarily due to Indels in Intrinsically Disordered Regions.
Date: Tuesday, July 15, 10:30 am - 10:55 am Room: 302

Presenting author: Arne Elofsson, Stockholm University, Sweden

Additional authors:
Sara Light, Stockholm University, Sweden
Rauan Sagit, Stockholm University, Sweden
Oxana Sachenkova, Stockholm University, Sweden
Diana Ekman, Stockholm University, Sweden

Session Chair: Predrag Radivojac

Presentation Overview:

Proteins evolve not only through point mutations but also by insertion and deletion events, which affect the length of the protein. It is known that such indel events most frequently occur in surface-exposed loops. However, detailed analysis of indel events in distantly related and fast evolving proteins is hampered by the difficulty involved in correctly aligning such sequences. We circumvent this problem by first only analyzing homologous proteins based on length variation rather than pairwise alignments. Using this approach we find a surprisingly strong relationship between difference in length and difference in the number of intrinsically disordered residues, where up to 75% of the length variation can be explained by changes in the number of intrinsically disordered residues. Further, we find that disorder is common in both insertions and deletions. A more detailed analysis reveals that indel events do not induce disorder but rather that already disordered regions accrue indels, suggesting that there is a lowered selective pressure for indels within intrinsically disordered regions.

Keyword: Evolution & Comparative Genomics, Protein Structure & Function


PP57 (PT) Graph Regularized Dual Lasso for Robust eQTL Mapping
Date: Tuesday, July 15, 10:30 am - 10:55 am Room: 312

Presenting author: Wei Cheng, UNC at Chapel Hill, United States

Additional authors:
Xiang Zhang, Case Western Reserve University, United States
Zhishan Guo, UNC at Chapel Hill, United States
Yu Shi, University of Science and Technology of China, China
Wei Wang, University of California, Los Angeles, United States

Session Chair: Toni Kazic

Presentation Overview:

As a promising tool for dissecting the genetic basis of complex traits, expression quantitative trait loci (eQTL) mapping has attracted increasing research interest. An important issue in eQTL mapping is how to effectively integrate networks representing interactions among genetic markers and genes. Recently, several Lasso-based methods have been proposed to leverage such network information. Despite their success, existing methods have three common limitations: 1) a preprocessing step is usually needed to cluster the networks; 2) the incompleteness of the networks and the noise in them are not considered; 3) other available information, such as location of genetic markers and pathway information, is not integrated. To address the limitations of the existing methods, we propose Graph-regularized Dual Lasso (GDL), a robust approach for eQTL mapping. GDL integrates the correlation structures among genetic markers and traits simultaneously. It also takes into account the incompleteness of the networks and is robust to the noise. GDL utilizes graph-based regularizers to model the prior networks and does not require an explicit clustering step. Moreover, it enables further refinement of the partial and noisy networks. We further generalize GDL to incorporate the location of genetic markers and gene pathway information. We perform extensive experimental evaluations using both simulated and real datasets. Experimental results demonstrate that the proposed methods can effectively integrate various available prior knowledge and significantly outperform the state-of-the-art eQTL mapping methods.

Keyword: Gene Regulation & Transcriptomics


PP58 (HT) Mathematical Modeling of Virus-Host Interactions
Date: Tuesday, July 15, 11:00 am - 11:25 am Room: 311

Presenting author: Lars Kaderali, Technische Universität Dresden, Germany

Additional authors:
Marco Binder, University of Heidelberg, Germany
Nurgazy Sulaimanov, University of Heidelberg, Germany
Diana Clausznitzer, Technische Universität Dresden, Germany
Manuel Schulze, Technische Universität Dresden, Germany
Cristian Hüber, University of Heidelberg, Germany
Simon Lenz, University of Heidelberg, Germany
Johannes Schloeder, University of Heidelberg, Germany
Martin Trippler, University Hospital Essen, Germany
Ralf Bartenschlager, University of Heidelberg, Germany
Volker Lohmann, University of Heidelberg, Germany

Session Chair: Scott Markel

Presentation Overview:

As obligate intracellular parasites, viruses rely on host factors for every single step of their lifecycle. This gives rise to complex interaction networks between virus and host cell, constituting a prime example necessitating a systems biology approach. I will show how we tightly integrated mathematical modeling, bioinformatics and wetlab experiments to decipher interactions between hepatitis C virus and its host cell. Through an iterative cycle between modeling and experiment, including genome-wide siRNA screening and expression profiling, we identified key host processes determining differences in infection between different cell lines, and set up predictive mathematical models that quantitatively and mechanistically explain differences in viral replication. Mathematical model analysis has implications for drug design for hepatitis C virus infection, which I will discuss in the presentation. I will furthermore present ongoing work on integrating cellular immune response, as well as extensions of the model to other viruses, and implications for antiviral treatment.

Keyword: Disease Models & Epidemiology, Applied Bioinformatics


PP59 (HT) Chromatin landscapes and long-range interactions of retroviral and transposon integrations
Date: Tuesday, July 15, 11:00 am - 11:25 am Room: 304

Presenting author: Jeroen de Ridder, Delft University of Technology, Netherlands

Additional authors:
Johann de Jong, Netherlands Cancer Institute, Netherlands
Lodewyk Wessels, Netherlands Cancer Institute, Netherlands
Sepideh Babaei, Delft University of Technology, Netherlands
Marcel Reinders, Delft University of Technology, Netherlands
Waseem Akhtar, Netherlands Cancer Institute, Netherlands

Session Chair: Robert F. Murphy

Presentation Overview:

The ability of retroviruses and transposons to insert their genetic material into host DNA makes them widely used tools for cancer gene discovery and gene therapy. These integrating elements have distinct integration biases. To study these biases, we generated very large datasets consisting of ∼120000 to ∼180000 unselected genomic integrations for 3 types of integrating elements. We overlaid these integration profiles with ∼80 (epi)genomic features to generate bias maps at both local and genome-wide scales. We moreover overlaid a large collection of retroviral cancer-causing insertions with genome-wide chromosome conformation capture (Hi-C) data. This enables the exploration of the occurrence of 3D hot-spots of recurrent mutations that are in spatial proximity of putative cancer genes. Taken together, our results provide an assessment of integration bias at unprecedented resolution and provide new insights into the mechanisms through which retroviral integrations deregulate cellular processes in cancer cells.

Keyword: Gene Regulation & Transcriptomics, Disease Models & Epidemiology


PP60 (HT) Characterizing changes in the rate of protein-protein dissociation upon interface mutation using hotspot energy and organization
Date: Tuesday, July 15, 11:00 am - 11:25 am Room: 302

Presenting author: Rudi Agius, Cancer Research UK, United Kingdom

Additional authors:
Mieczyslaw Torchala, Cancer Research UK, United Kingdom
Iain H. Moal, Joint BSC-IRB Research Program in Computational Biology, Spain
Juan Fernández-Recio, Joint BSC-IRB Research Program in Computational Biology, Spain
Paul A, Cancer Research UK, United Kingdom

Session Chair: Predrag Radivojac

Presentation Overview:

Protein-protein interactions vary considerably in their degree of stickiness. Mutations at protein interfaces can alter the interaction between protein pairs, causing them to dissociate faster or slower and rework the dynamics of the cellular networks. Therefore, the calculation and interpretation of mutants, which affect the rate of dissociation, is critical to our understanding of complex networks and disease. In this work, we exploit the energy and distribution of key binding ‘hotspot’ residues for the calculation of off-rate changes upon mutations. This enables us to pin-point the critical regions of stability and how they change for complexes of different sizes. Moreover, we provide a comprehensive map of the key determinants responsible for the accurate characterization of different classes of mutations, complexes and interface regions. This paves the way for more intelligent computational-interface-design algorithms and provides new insight into the interpretation of destabilizing mutations involved in complex diseases.

Keyword: Protein Structure & Function, Protein Interactions & Molecular Networks


PP61 (PT) EPIQ - Efficient detection of SNP-SNP epistatic interactions for quantitative traits
Date: Tuesday, July 15, 11:00 am - 11:25 am Room: 312

Presenting author: Yaara Arkin, Tel-Aviv University, Israel

Additional authors:
Elior Rahmani, Tel-Aviv University, Israel
Marcus E. Kleber, University of Heidelberg, Germany
Reijo Laaksonen, University of Tampere, Finland
Winfried Maerz, University of Heidelberg, Germany
Eran Halperin, Tel-Aviv University, Israel

Session Chair: Toni Kazic

Presentation Overview:

Motivation: Gene-gene interactions are of potential biological and medical interest, as they can shed light on both the inheritance mechanism of a trait and on the underlying biological mechanisms. Evidence of epistatic interactions has been reported in both humans and other organisms. Unlike single-locus genome-wide association studies (GWAS), which proved efficient in detecting numerous genetic loci associated with various traits, interaction-based GWAS have so far produced very few reproducible discoveries. Such studies introduce a great computational and statistical burden by necessitating a large number of hypotheses to be tested, including all pairs of SNPs. Thus, many software tools have been developed for interaction-based case-control studies, some leading to reliable discoveries. For quantitative data, on the other hand, only a handful of tools exist, and the computational burden is still substantial. Results: We present an efficient algorithm for detecting epistasis in quantitative GWAS, achieving a substantial runtime speedup by avoiding the need to exhaustively test all SNP pairs using metric embedding and random projections. Unlike previous metric embedding methods for case-control studies, we introduce a new embedding, where each SNP is mapped to two Euclidean spaces. We implemented our method in a tool named EPIQ (EPIstasis detection for Quantitative GWAS), and we show by simulations that EPIQ requires hours of processing time where other methods require days and sometimes weeks. Applying our method to a dataset from the Ludwigshafen Risk and Cardiovascular Health study discovered a pair of SNPs with a near-significant interaction (p = 2.2×10^-13), in only 1.5 hours on 10 processors. Availability: https://github.com/yaarasegre/EPIQ
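To give a sense of the burden the abstract describes, the sketch below counts the SNP pairs an exhaustive scan would have to test and the matching Bonferroni-corrected per-test threshold. It illustrates the baseline that EPIQ avoids, not EPIQ itself. The function name, the panel size of 1,000,000 SNPs, and the 0.05 family-wise error rate are illustrative assumptions, not figures from the study.

```python
from math import comb


def exhaustive_pair_burden(n_snps: int, alpha: float = 0.05):
    """Return (number of unordered SNP pairs, Bonferroni per-test threshold).

    Hypothetical helper for illustration only.
    """
    n_pairs = comb(n_snps, 2)        # every unordered pair of SNPs is one hypothesis
    return n_pairs, alpha / n_pairs  # Bonferroni: split alpha across all tests


# Assumed panel size; the abstract does not state the number of SNPs.
n_pairs, threshold = exhaustive_pair_burden(1_000_000)
print(f"{n_pairs:,} pairs; per-test threshold {threshold:.1e}")
```

The pair count grows quadratically with the number of SNPs, which is why methods that avoid testing every pair can save so much runtime.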

Keyword: Applied Bioinformatics


PP62 (PT) Accurate viral population assembly from ultra-deep sequencing data
Date: Tuesday, July 15, 11:30 am - 11:55 am Room: 311

Presenting author: Serghei Mangul, University of California, Los Angeles, United States

Additional authors:
Nicholas Wu, University of California, Los Angeles, United States
Nicholas Mancuso, Georgia State University, United States
Alex Zelikovsky, Georgia State University, United States
Ren Sun, University of California, Los Angeles, United States
Eleazar Eskin, University of California, Los Angeles, United States

Session Chair: Toni Kazic

Presentation Overview:

Next-generation sequencing technologies sequence viruses with ultra-deep coverage, thus promising to revolutionize our understanding of the underlying diversity of viral populations. While the sequencing coverage is high enough that even rare viral variants are sequenced, the presence of sequencing errors makes it difficult to distinguish between rare variants and sequencing errors. In this paper we present a method to overcome the limitations of sequencing technologies and assemble a diverse viral population, which allows for the detection of previously undiscovered rare variants. The proposed method consists of a high-fidelity sequencing protocol and an accurate viral population assembly method, referred to as Viral Genome Assembler (VGA). The proposed protocol is able to eliminate sequencing errors by using individual barcodes attached to the sequencing fragments. Highly accurate data in combination with deep coverage allows VGA to assemble rare variants. VGA utilizes an expectation-maximization algorithm to estimate abundances of the assembled viral variants in the population. Results on both synthetic and real datasets show that our method is able to accurately assemble an HIV viral population and detect rare variants previously undetectable due to sequencing errors. VGA outperforms state-of-the-art methods for genome-wide viral assembly. Furthermore, our method is the first viral assembly method which scales to millions of sequencing reads. The open source C++/Python implementation of VGA is freely available for download at http://genetics.cs.ucla.edu/vga/ Contact: serghei@cs.ucla.edu
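The abundance-estimation step mentioned above can be illustrated with a generic expectation-maximization sketch. This is not VGA's implementation: the input format (each read given as the set of variant indices it is consistent with), the function name, and the fixed iteration count are assumptions made for illustration, and every read is assumed to be consistent with at least one variant.

```python
def em_abundances(read_compat, n_variants, n_iter=200):
    """Estimate variant abundances from ambiguous read assignments.

    read_compat: list of sets; each set holds the indices of the variants
    a read is consistent with (hypothetical input format).
    """
    theta = [1.0 / n_variants] * n_variants  # start from uniform abundances
    for _ in range(n_iter):
        expected = [0.0] * n_variants
        for compat in read_compat:
            norm = sum(theta[v] for v in compat)
            for v in compat:
                # E-step: share the read among its compatible variants
                # in proportion to the current abundance estimates.
                expected[v] += theta[v] / norm
        total = sum(expected)
        # M-step: new abundances are the normalized expected read counts.
        theta = [e / total for e in expected]
    return theta


# Toy data: 6 reads unique to variant 0, 2 unique to variant 1, 4 ambiguous.
reads = [{0}] * 6 + [{1}] * 2 + [{0, 1}] * 4
print(em_abundances(reads, 2))
```

In this toy case the ambiguous reads end up split in the same 3:1 ratio as the uniquely assigned reads, which is the behaviour expected of a maximum-likelihood estimate.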

Keyword: Sequence Analysis


PP63 (HT) Large-scale reconstruction of 3D structures of human chromosomes from chromosomal contact data
Date: Tuesday, July 15, 11:30 am - 11:55 am Room: 304

Presenting author: Jianlin Cheng, University of Missouri, Columbia, United States

Additional authors:
Tuan Trieu, University of Missouri, Columbia, United States

Session Chair: Robert F. Murphy

Presentation Overview:

The three-dimensional (3D) structure of a genome is critical for studying genome folding, genome function, and spatial gene regulation, but it has not been well studied. In this presentation, I will first describe a novel chromosomal-contact-driven method that takes Hi-C chromosomal interaction data as input in order to reconstruct the 3D structures of chromosomes. The description of the method will be followed by a live video demonstrating how the 3D shape of a chromosome is constructed from the Hi-C data of human B-cells. Then I will show that the 3D chromosomal structures reconstructed from the Hi-C data of human B-cells not only satisfy the observed Hi-C chromosomal contact data and some known chromatin organization features well, but also predict new Hi-C contacts accurately according to the validation test. Finally, I will describe how to assemble chromosomal structures into the 3D shape of the whole genome and discuss its dynamics.

Keyword: Evolution & Comparative Genomics


PP64 (HT) Why N-Terminal domains tend to be shorter than C-Terminal domains?
Date: Tuesday, July 15, 11:30 am - 11:55 am Room: 302

Presenting author: Ron Unger, Bar-Ilan University, Israel

Additional authors:
Etai Jacob, Bar-Ilan University, Israel
Amnon Horovitz, Weizmann Institute, Israel

Session Chair: Predrag Radivojac

Presentation Overview:

Computational analysis of proteomes in all kingdoms of life reveals a strong tendency for N-terminal domains in two-domain proteins to have shorter sequences than their neighboring C-terminal domains. Given that folding rates are affected by chain length, we asked whether the tendency for N-terminal domains to be shorter than their neighboring C-terminal domains reflects selection for faster folding N-terminal domains. Calculations of contact order, another predictor of folding rate, provide additional evidence that N-terminal domains tend to fold faster than their C-terminal neighboring domains. A possible explanation for this bias, which is more pronounced in prokaryotes than in eukaryotes, is that faster folding of N-terminal domains reduces the risk of protein aggregation during folding by preventing formation of non-native interdomain interactions. This explanation is supported by our finding that two-domain proteins with a shorter N-terminal domain are more abundant than those with a shorter C-terminal domain.

Keyword: Protein Structure & Function, Sequence Analysis


PP65 (PT) A combinatorial approach for analyzing intra-tumor heterogeneity from high-throughput sequencing data
Date: Tuesday, July 15, 11:30 am - 11:55 am Room: 312

Presenting author: Iman Hajirasouliha, Brown University, United States

Additional authors:
Ahmad Mahmoody, Brown University, United States
Ben Raphael, Brown University, United States

Session Chair: Toni Kazic

Presentation Overview:

High-throughput sequencing of tumor samples has shown that most tumors exhibit extensive intra-tumor heterogeneity, with multiple subpopulations of tumor cells containing different somatic mutations. Recent studies have quantified this intra-tumor heterogeneity by clustering mutations into subpopulations according to the observed counts of DNA sequencing reads containing the variant allele. However, these clustering approaches do not consider that the population frequencies of different tumor subpopulations are correlated by their shared ancestry in the same population of cells. In this paper, we introduce the binary tree partition, a novel combinatorial formulation of the problem of constructing the subpopulations of tumor cells from the variant allele frequencies of somatic mutations. We show that finding a binary tree partition is an NP-complete problem; derive an approximation algorithm for an optimization version of the problem; and present a recursive algorithm to find a binary tree partition with errors in the input. We show that the resulting algorithm outperforms existing clustering approaches on simulated and real sequencing data.

Keyword: Evolution & Comparative Genomics


PP66 (HT) Deciphering human disease mutations through the atomic-resolution protein interactome network
Date: Tuesday, July 15, 12:00 pm - 12:25 pm Room: 311

Presenting author: Haiyuan Yu, Cornell University, United States

Additional authors:
Yu Guo, Cornell University, United States
Jishnu Das, Cornell University, United States
Hao Ran Lee, Cornell University, United States
Xiaomu Wei, Cornell University, United States
Jin Liang, Cornell University, United States
Robert Fragoza, Cornell University, United States
Adithya Sagar, Cornell University, United States
Xiujuan Wang, Cornell University, United States
Matthew Mort, Cardiff University, United Kingdom
Peter Stenson, Cardiff University, United Kingdom
David Cooper, Cardiff University, United Kingdom
Andrew Grimson, Cornell University, United States
Steven Lipkin, Weill Cornell Medical College, United States
Andrew Clark, Cornell University, United States

Session Chair: Toni Kazic

Presentation Overview:

To better understand the molecular mechanisms and genetic basis of human disease, we combined the massive scale of network systems biology with the supreme resolution of traditional structural biology to generate the first comprehensive atomic-resolution interactome-network comprising 3,398 interactions between 2,890 proteins with structurally-defined interface residues for each interaction. We found that disease mutations are significantly enriched both among interface residues and other non-interface ones within the same domains, contradicting the previous assumption that only a few interface residues are mutation hot spots for disease. We further classified 94,476 disease-associated mutations according to their inheritance modes and found that the widely-accepted “guilt-by-association” principle does not apply to dominant mutations. Furthermore, recessive truncating mutations on the same interface are much more likely to cause the same disease, even if they are close to the N-terminus of the protein, indicating that a significant fraction of truncating mutations can generate functional protein products.

Keyword: Protein Interactions & Molecular Networks, Disease Models & Epidemiology


PP67 (PT) A statistical approach for inferring the 3D structure of the genome
Date: Tuesday, July 15, 12:00 pm - 12:25 pm Room: 304

Presenting author: Nelle Varoquaux, Mines ParisTech, France

Additional authors:
Ferhat Ay, University of Washington, United States
William Noble, University of Washington, United States
Jean-Philippe Vert, Mines ParisTech, France

Session Chair: Robert F. Murphy

Presentation Overview:

Motivation: Recent technological advances allow the measurement, in a single Hi-C experiment, of the frequencies of physical contacts among pairs of genomic loci at a genome-wide scale. The next challenge is to infer, from the resulting DNA-DNA contact maps, accurate three-dimensional models of how chromosomes fold and fit into the nucleus. Many existing inference methods rely upon multidimensional scaling (MDS), in which the pairwise distances of the inferred model are optimized to resemble pairwise distances derived directly from the contact counts. These approaches, however, often optimize a heuristic objective function and require strong assumptions about the biophysics of DNA to transform interaction frequencies to spatial distance, and thereby may lead to incorrect structure reconstruction. Methods: We propose a novel approach to infer a consensus three-dimensional structure of a genome from Hi-C data. The method incorporates a statistical model of the contact counts, assuming that the counts between two loci follow a Poisson distribution whose intensity decreases with the physical distances between the loci. The method can automatically adjust the transfer function relating the spatial distance to the Poisson intensity and infer a genome structure that best explains the observed data. Results: We compare two variants of our Poisson method, with or without optimization of the transfer function, to four different MDS-based algorithms (two metric MDS methods using different stress functions, a nonmetric version of MDS, and ChromSDE, a recently described, advanced MDS method) on a wide range of simulated datasets. We demonstrate that the Poisson models reconstruct better structures than all MDS-based methods, particularly at low coverage and high resolution, and we highlight the importance of optimizing the transfer function.
On publicly available Hi-C data from mouse embryonic stem cells, we show that the Poisson methods lead to more reproducible structures than MDS-based methods when we use data generated using different restriction enzymes, and when we reconstruct structures at different resolutions. Availability: A Python implementation of the proposed method is available at http://cbio.ensmp.fr/pastis.
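The statistical model described in the Methods can be sketched as a log-likelihood over candidate 3D coordinates. This is a minimal illustration, not the authors' implementation: the power-law transfer function (intensity = beta * distance ** alpha with alpha < 0), the parameter names, the default values, and the input format are all assumptions chosen so that the intensity decreases with distance as the abstract states.

```python
import math


def poisson_log_likelihood(coords, counts, alpha=-3.0, beta=1.0):
    """Log-likelihood of contact counts given 3D positions of loci.

    coords: list of (x, y, z) tuples, one per locus.
    counts: dict mapping locus pairs (i, j) to observed contact counts.
    alpha, beta: parameters of the assumed power-law transfer function.
    """
    total = 0.0
    for (i, j), c in counts.items():
        d = math.dist(coords[i], coords[j])  # Euclidean distance between loci
        lam = beta * d ** alpha              # assumed Poisson intensity
        # Log of the Poisson pmf: c*log(lam) - lam - log(c!)
        total += c * math.log(lam) - lam - math.lgamma(c + 1)
    return total


# Toy data: loci 0 and 1 contact often, the other pairs rarely.
counts = {(0, 1): 8, (0, 2): 1, (1, 2): 1}
close_01 = [(0, 0, 0), (1, 0, 0), (3, 0, 0)]  # places loci 0 and 1 close together
far_01 = [(0, 0, 0), (3, 0, 0), (1, 0, 0)]    # places loci 0 and 1 far apart
print(poisson_log_likelihood(close_01, counts, beta=8.0))
print(poisson_log_likelihood(far_01, counts, beta=8.0))
```

A structure-inference method of this kind would search for the coordinates (and, optionally, the transfer-function parameters) that maximize this quantity; here the structure that keeps the frequently contacting loci close scores higher.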

Keyword: Applied Bioinformatics


PP68 (PT) New Directions for Diffusion-Based Network Prediction of Protein Function: Incorporating Pathways with Confidence
Date: Tuesday, July 15, 12:00 pm - 12:25 pm Room: 302

Presenting author: Mengfei Cao, Tufts University, United States

Additional authors:
Christopher Pietras, Tufts University, United States
Xian Feng, Tufts University, United States
Kathryn Doroschak, University of Minnesota, United States
Thomas Schaffner, Tufts University, United States
Jisoo Park, Tufts University, United States
Hao Zhang, Tufts University, United States
Lenore Cowen, Tufts University, United States
Benjamin Hescott, Tufts University, United States

Session Chair: Predrag Radivojac

Presentation Overview:

Motivation: It has long been hypothesized that incorporating models of network noise as well as edge directions and known pathway information into the representation of protein-protein interaction networks might improve their utility for functional inference. However, a simple way to do this has not been obvious. We find that DSD, our recent diffusion-based metric for measuring dissimilarity in protein-protein interaction (PPI) networks, has natural extensions that incorporate confidence and edge directions, and can even express coherent pathways by calculating DSD on an augmented graph. Results: We define three incremental versions of DSD which we term cDSD, caDSD, and capDSD, where the capDSD matrix incorporates confidence, known directed edges, and pathways into the measure of how similar each pair of nodes is according to the structure of the PPI network. We test four popular function prediction methods (majority vote, weighted majority vote, multiway cut, and functional flow) using these different matrices on the baker's yeast PPI network in cross-validation. The best performing method is weighted majority vote using capDSD. We then test the performance of our augmented DSD methods on an integrated heterogeneous set of protein association edges from the STRING database. The superior performance of capDSD in this context confirms that treating the pathways as probabilistic units is more powerful than simply incorporating pathway edges independently into the network. Availability: All source code for calculating the confidences, for extracting pathway information from KEGG XML files, and for calculating the cDSD, caDSD and capDSD matrices is available from http://dsd.cs.tufts.edu/capdsd

Keyword: Protein Interactions & Molecular Networks


PP69 (HT) Reconstructing tumor evolution using simple somatic mutation frequencies
Date: Tuesday, July 15, 12:00 pm - 12:25 pm Room: 312

Presenting author: Quaid Morris, University of Toronto, Canada

Additional authors:
Wei Jiao, Ontario Institute for Cancer Research, Canada
Shankar Vembu, University of Toronto, Canada
Amit Deshwar, University of Toronto, Canada
Lincoln Stein, Ontario Institute for Cancer Research, Canada

Session Chair: Toni Kazic

Presentation Overview:

Tumors often contain multiple, genetically diverse subclonal populations of cells. To aid in the identification of driver mutations and improve understanding of tumor development, there is considerable interest in reconstructing the evolutionary history of these subclonal populations. I will describe when it is possible to do this reconstruction using only the allelic frequencies of individual ‘simple somatic mutations (SSMs)’ (i.e., single nucleotide variants or small indels) from one or more tumor samples. I will also describe a new model, PhyloSub, that automatically performs this reconstruction. PhyloSub uses Bayesian inference, so it explicitly represents its uncertainty when multiple phylogenies are consistent with the frequency data. PhyloSub has promising results on real and simulated data, including one example where PhyloSub provides a near perfect reconstruction of three subclonal populations based on a single set of SSM frequencies from acute myeloid leukemia.

Keyword: Evolution & Comparative Genomics, Disease Models & Epidemiology


PP70 (HT) Genomic underpinnings for network patterns and evolution
    Cancelled
Date: Tuesday, July 15, 2:00 pm - 2:25 pm
Room: 311

Presenting author: Luay Nakhleh, Rice University, United States

Session Chair: Cenk Sahinalp

Presentation Overview:

In this talk, we discuss novel studies of the evolution of regulatory networks by tightly connecting them to the underlying genomes and shedding "the light of evolution" on the combined genome-network genotype. In particular, we conduct extensive population genetic simulations and show how network motifs arise due to neutral evolutionary forces when accounting for genomic features. Further, we use data on whole-genome duplication pairs in yeast to estimate the rate of evolution of protein interactions.

Keyword: Protein Interactions & Molecular Networks, Gene Regulation & Transcriptomics


PP71 (PT) Inferring Gene Ontologies from Pairwise Similarity Data
Date: Tuesday, July 15, 2:00 pm - 2:25 pm
Room: 304

Presenting author: Michael Kramer, University of California, San Diego, United States

Additional authors:
Janusz Dutkowski, University of California, San Diego, United States
Michael Yu, University of California, San Diego, United States
Vineet Bafna, University of California, San Diego, United States
Trey Ideker, University of California, San Diego, United States

Session Chair: Lenore Cowen

Presentation Overview:

Motivation: While the manually curated Gene Ontology (GO) is widely used, inferring a GO directly from -omics data is a compelling new problem. Recognizing that an ontology is a directed acyclic graph (DAG) of terms and hierarchical relations, algorithms are needed that: (1) analyze a full matrix of gene–gene pairwise similarities from -omics data; (2) infer true hierarchical structure in these data rather than enforcing hierarchy as a computational artifact; and (3) respect biological pleiotropy, by which a term in the hierarchy can relate to multiple higher level terms. Methods addressing these requirements are just beginning to emerge; none has been evaluated for GO inference. Methods: We consider two algorithms [Clique Extracted Ontology (CliXO), LocalFitness] that uniquely satisfy these requirements, compared with methods including standard clustering. CliXO is a new approach that finds maximal cliques in a network induced by progressive thresholding of a similarity matrix. We evaluate each method’s ability to reconstruct the GO biological process ontology from a similarity matrix based on (a) semantic similarities for GO itself or (b) three -omics datasets for yeast. Results: For task (a) using semantic similarity, CliXO accurately reconstructs GO (>99% precision, recall) and outperforms other approaches (<20% precision, <20% recall). For task (b) using -omics data, CliXO outperforms other methods using two -omics datasets and achieves ~30% precision and recall using YeastNet v3, similar to an earlier approach (Network Extracted Ontology) and better than LocalFitness or standard clustering (20–25% precision, recall). Conclusion: This study provides an algorithmic foundation for building gene ontologies by capturing hierarchical and pleiotropic structure embedded in biomolecular data.
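
Illustration: the idea of "maximal cliques in a network induced by progressive thresholding of a similarity matrix" can be sketched in a few lines of Python. This is only a loose illustration and not the CliXO algorithm itself, which involves further steps that the abstract does not describe. The function names, the toy gene names and the similarity values are all assumptions made up for the example.

```python
from itertools import combinations

def bron_kerbosch(r, p, x, adj, out):
    """Collect all maximal cliques of the graph given by adjacency sets adj."""
    if not p and not x:
        out.append(frozenset(r))
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}

def cliques_by_threshold(sim, genes, thresholds):
    """Map each maximal clique (size >= 2) to the highest threshold at which it appears."""
    terms = {}
    for t in sorted(thresholds, reverse=True):
        # Build the thresholded graph: an edge joins two genes with similarity >= t.
        adj = {g: set() for g in genes}
        for a, b in combinations(genes, 2):
            if sim[frozenset((a, b))] >= t:
                adj[a].add(b)
                adj[b].add(a)
        found = []
        bron_kerbosch(set(), set(genes), set(), adj, found)
        for clique in found:
            if len(clique) >= 2:
                terms.setdefault(clique, t)  # keep the first (highest) threshold
    return terms

# Hypothetical toy similarities between four genes.
genes = ["A", "B", "C", "D"]
sim = {frozenset(pair): 0.5 for pair in combinations(genes, 2)}
sim[frozenset("AB")] = 0.9
sim[frozenset("CD")] = 0.8

terms = cliques_by_threshold(sim, genes, [0.9, 0.8, 0.5])
for clique, t in terms.items():
    print(sorted(clique), t)
```

In this toy input, the cliques {A, B} and {C, D} appear at high thresholds and are later contained in {A, B, C, D}, which gives a small two-level hierarchy. Because a gene may sit in several maximal cliques at once, a clique-based construction of this kind can place it under more than one parent term.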

Keyword: Applied Bioinformatics


PP72 (PT) Evaluating Synteny for Improved Comparative Studies
Date: Tuesday, July 15, 2:00 pm - 2:25 pm
Room: 302

Presenting author: Cristina G. Ghiurcuta, EPFL, Switzerland

Additional authors:
Bernard M.E. Moret, EPFL, Switzerland

Session Chair: Alex Bateman

Presentation Overview:

Motivation: Comparative genomics aims to understand the structure and function of genomes by translating knowledge gained about some genomes to the object of study. Early approaches used pairwise comparisons, but today researchers are attempting to leverage the larger potential of multiway comparisons. Comparative genomics relies on the structuring of genomes into syntenic blocks: blocks of sequence that exhibit conserved features across the genomes. Syntenic blocks are required for complex computations to scale to the billions of nucleotides present in many genomes; they enable comparisons across broad ranges of genomes because they filter out much of the individual variability; they highlight candidate regions for in-depth studies; and they facilitate whole-genome comparisons through visualization tools. However, the concept of syntenic block remains loosely defined. Tools for the identification of syntenic blocks yield quite different results, thereby preventing a systematic assessment of the next steps in an analysis. Current tools do not include measurable quality objectives and thus cannot be benchmarked against themselves. Comparisons among tools have also been neglected: what few results are given use superficial measures unrelated to quality or consistency. Results: We present a theoretical model as well as an experimental basis with quality measures for comparing syntenic blocks and thus also for improving or designing tools for the identification of syntenic blocks. We illustrate the application of the model and the measures by applying them to syntenic blocks produced by 3 different contemporary tools (DRIMM-Synteny, i-ADHoRe and Cyntenator) on a dataset of 8 yeast genomes. Our findings highlight the need for a well-founded, systematic approach to the decomposition of genomes into syntenic blocks. Our experiments demonstrate widely divergent results among these tools, throwing into question the robustness of the basic approach in comparative genomics. We have taken the first step towards a formal approach to the construction of syntenic blocks by developing a simple quality criterion based on sound evolutionary principles.

Keyword: Applied Bioinformatics


PP73 (HT) Network-based stratification of tumor mutations
Date: Tuesday, July 15, 2:00 pm - 2:25 pm
Room: 312

Presenting author: Matan Hofree, University of California, San Diego, United States

Additional authors:
John P. Shen, University of California, San Diego, United States
Andrew Gross, University of California, San Diego, United States
Hannah Carter, University of California, San Diego, United States

Session Chair: Paul Horton

Presentation Overview:

Classification of cancer is predominantly organ based and fails to account for considerable heterogeneity of clinical outcomes such as survival or response to therapy. Somatic tumor genomes provide a rich new source of data for uncovering subtypes, but have proven difficult to compare, as tumors rarely share the same mutations. Here we introduce network-based stratification (NBS), a method to integrate somatic tumor genomes with gene networks. This approach allows for stratification of cancer into subtypes by clustering together patients with mutations in similar network regions. We demonstrate NBS in multiple cancer cohorts from The Cancer Genome Atlas. In each case, NBS identifies subtypes that are predictive of clinical outcomes such as patient survival, response to therapy or histology. We identify network regions characteristic of each subtype and show how mutation-derived subtypes can be used to train an mRNA expression signature, which provides similar information in the absence of DNA sequence.

Keyword: Applied Bioinformatics, Disease Models & Epidemiology


PP74 (PT) Scale-space measures for graph topology link protein network architecture to function
Date: Tuesday, July 15, 2:30 pm - 2:55 pm
Room: 311

Presenting author: Marc Hulsman, Delft University of Technology, Netherlands

Additional authors:
Christos Dimitrakopoulos, ETH Zurich, Switzerland
Jeroen de Ridder, Delft University of Technology, Netherlands

Session Chair: Cenk Sahinalp

Presentation Overview:

Motivation: The network architecture of physical protein interactions is an important determinant for the molecular functions that are carried out within each cell. To study this relation, the network architecture can be characterized by graph-topological characteristics such as shortest paths and network hubs. These characteristics have an important shortcoming: they do not take into account that interactions occur across different scales. This is important since some cellular functions may involve a single direct protein interaction (small scale) while others require more and/or indirect interactions, such as protein complexes (medium scale) and interactions between large modules of proteins (large scale). Results: In this work, we derive generalized, scale-aware versions of known graph-topological measures based on diffusion kernels. We apply these to characterize the topology of networks across all scales simultaneously, generating a so-called graph-topological scale-space. The comprehensive physical interaction network in yeast is used to show that scale-space based measures consistently give superior performance when distinguishing protein functional categories and three major types of functional interactions: genetic interaction, co-expression and perturbation interactions. Moreover, we demonstrate that graph-topological scale-spaces capture biologically meaningful features that provide new insights into the link between function and protein network architecture. Availability: Matlab code to calculate the STMs is available from: http://bioinformatics.tudelft.nl/TSSA Contact: j.deridder@tudelft.nl

Keyword: Protein Interactions and Molecular Networks


PP75 (PT) Using association rule mining to determine promising secondary phenotyping hypotheses
    Cancelled
Date: Tuesday, July 15, 2:30 pm - 2:55 pm
Room: 304

Presenting author: Anika Oellrich, Wellcome Trust Sanger Institute, United Kingdom

Additional authors:
Julius Jacobsen, Wellcome Trust Sanger Institute, United Kingdom
Irene Papatheodorou, Wellcome Trust Sanger Institute, United Kingdom
The Sanger Mouse Genetics Project, Wellcome Trust Sanger Institute, United Kingdom
Damian Smedley, Wellcome Trust Sanger Institute, United Kingdom

Session Chair: Lenore Cowen

Presentation Overview:

Motivation: Large-scale phenotyping projects such as the Sanger Mouse Genetics project are ongoing efforts to help to identify the influences of genes and their modification on phenotypes. Gene–phenotype relations are crucial to the improvement of our understanding of human heritable diseases as well as the development of drugs. However, given that there are about 20,000 genes in higher vertebrate genomes and the experimental verification of gene–phenotype relations requires a lot of resources, methods are needed that determine good candidates for testing. Results: In this study, we applied an association rule mining approach to the identification of promising secondary phenotype candidates. The predictions rely on a large gene–phenotype annotation set that is used to find occurrence patterns of phenotypes. Applying an association rule mining approach, we could identify 1,967 secondary phenotype hypotheses that cover 243 genes and 136 phenotypes. Using two automated and one manual evaluation strategies, we demonstrate that the secondary phenotype candidates possess biological relevance to the genes they are predicted for. From the results we conclude that the predicted secondary phenotypes constitute good candidates to be experimentally tested and confirmed. Availability: The secondary phenotype candidates can be browsed through at http://www.sanger.ac.uk/resources/databases/phenodigm/gene/secondaryphenotype/list. Contact: ao5@sanger.ac.uk
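
Illustration: association rule mining over gene–phenotype annotations can be sketched as follows. This is a generic textbook-style example and not the authors' pipeline; the gene identifiers, phenotype labels and function names are hypothetical. It computes the support and confidence of a rule "phenotype X implies phenotype Y" and lists genes that carry X but not yet Y, which would be the candidates for a secondary phenotype test.

```python
def rule_stats(annotations, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n = len(annotations)
    with_a = [g for g, ph in annotations.items() if antecedent in ph]
    with_both = [g for g in with_a if consequent in annotations[g]]
    support = len(with_both) / n
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return support, confidence

def candidates(annotations, antecedent, consequent):
    """Genes annotated with the antecedent but not yet with the consequent."""
    return sorted(
        g for g, ph in annotations.items()
        if antecedent in ph and consequent not in ph
    )

# Hypothetical gene -> phenotype annotation sets.
annotations = {
    "g1": {"P1", "P2"},
    "g2": {"P1", "P2"},
    "g3": {"P1"},
    "g4": {"P3"},
}

support, confidence = rule_stats(annotations, "P1", "P2")
print(support, confidence)                   # rule P1 -> P2
print(candidates(annotations, "P1", "P2"))   # genes to test for P2
```

In practice a rule would only be used to generate hypotheses when its support and confidence exceed chosen cutoffs; the abstract does not state which cutoffs were used.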

Keyword: Databases, Ontologies and Text Mining


PP76 (PT) Robust Clinical Outcome Prediction based on Bayesian Analysis of Transcriptional Profiles and Prior Causal Networks
Date: Tuesday, July 15, 2:30 pm - 2:55 pm
Room: 302

Presenting author: Kourosh Zarringhalam, UMass Boston/Pfizer, United States

Additional authors:
Ahmed Enayetallah, Biogen Idec, United States
Padmalatha Reddy, Pfizer, United States
Daniel Ziemek, Pfizer, Germany

Session Chair: Alex Bateman

Presentation Overview:

Motivation: Understanding and predicting an individual’s response in a clinical trial is key to better treatments and cost-effective medicine. Over the coming years, more and more large-scale omics datasets will become available to characterize patients with complex and heterogeneous diseases at a molecular level. Unfortunately, genetic, phenotypical, and environmental variation is much higher in a human trial population than currently modeled or measured in most animal studies. In our experience, this high variability can lead to failure of trained predictors in independent studies and undermines the credibility and utility of promising high-dimensional datasets. Methods: We propose a method that utilizes patient-level genome-wide expression data in conjunction with causal networks based on prior knowledge. Our approach infers a differential expression profile for each patient and uses a Bayesian approach to infer corresponding upstream regulators. These regulators and their corresponding posterior probabilities of activity are used in a regularized regression framework to predict response. Results: We validated our approach using two clinically relevant phenotypes, namely acute rejection in kidney transplantation and response to Infliximab in ulcerative colitis. To demonstrate pitfalls in translating trained predictors across independent trials, we analyze performance characteristics of our and alternative approaches on two independent datasets for each phenotype and show that the proposed approach is able to successfully incorporate causal prior knowledge to give robust performance estimates.

Keyword: Disease Models and Epidemiology


PP77 (PT) Detecting independent and recurrent copy number aberrations using interval graphs
Date: Tuesday, July 15, 2:30 pm - 2:55 pm
Room: 312

Presenting author: Hsin-Ta Wu, Brown University, United States

Additional authors:
Iman Hajirasouliha, Brown University, United States
Benjamin Raphael, Brown University, United States

Session Chair: Paul Horton

Presentation Overview:

Somatic copy number aberrations are frequent in cancer genomes, but many of these are random, passenger events. A common strategy to distinguish functional aberrations from passengers is to identify those aberrations that are recurrent across multiple samples. However, the extensive variability in the length and position of copy number aberrations makes the problem of identifying recurrent aberrations notoriously difficult. We introduce a combinatorial approach to the problem of identifying independent and recurrent copy number aberrations, focusing on the key challenge of separating the overlaps in aberrations across individuals into independent events. We derive independent and recurrent copy number aberrations as maximal cliques in an interval graph constructed from overlaps between aberrations. We efficiently enumerate all such cliques, and derive a dynamic programming algorithm to find an optimal selection of non-overlapping cliques, resulting in a very fast algorithm, which we call RAIG (Recurrent Aberrations from Interval Graphs). We show that RAIG outperforms other methods on simulated data and performs well on data from three cancer types from The Cancer Genome Atlas (TCGA). In contrast to existing approaches that employ various heuristics to select independent aberrations, RAIG optimizes a well-defined objective function. We show that this allows RAIG to identify rare aberrations that are likely functional, but are obscured by overlaps with larger passenger aberrations.
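
Illustration: the two combinatorial steps named in the abstract, enumerating maximal cliques of an interval graph and then choosing non-overlapping cliques by dynamic programming, can be sketched in Python. This is not the RAIG implementation. Two things are assumptions made for the example: cliques are treated as "overlapping" when they share an aberration, and a clique is scored by the number of aberrations it contains. The abstract does not specify RAIG's actual objective function, and all names below are hypothetical.

```python
def maximal_cliques(intervals):
    """Return maximal cliques of the interval graph as (start, end, members).

    intervals: list of (start, end) closed intervals with start <= end.
    [start, end] is the region shared by all members of the clique.
    """
    events = []
    for idx, (s, e) in enumerate(intervals):
        events.append((s, 0, idx))  # 0 = open; opens sort before closes at a tie
        events.append((e, 1, idx))  # 1 = close
    events.sort()
    active, cliques, grew, last_start = set(), [], False, None
    for pos, kind, idx in events:
        if kind == 0:
            active.add(idx)
            grew, last_start = True, pos
        else:
            if grew:  # first close after some opens: the active set is maximal
                cliques.append((last_start, pos, frozenset(active)))
                grew = False
            active.discard(idx)
    return cliques

def best_disjoint_cliques(cliques, weight=len):
    """Pick cliques (given in sweep order) sharing no member, maximizing total weight."""
    n = len(cliques)
    best = [0] * (n + 1)       # best[j]: optimum over the first j cliques
    choice = [None] * (n + 1)  # predecessor index if clique j is taken, else None
    for j in range(1, n + 1):
        members = cliques[j - 1][2]
        p = j - 1
        # Walk back to the latest earlier clique that shares no member with clique j.
        while p > 0 and cliques[p - 1][2] & members:
            p -= 1
        take, skip = weight(members) + best[p], best[j - 1]
        if take > skip:
            best[j], choice[j] = take, p
        else:
            best[j] = skip
    picked, j = [], n
    while j > 0:
        if choice[j] is None:
            j -= 1
        else:
            picked.append(cliques[j - 1])
            j = choice[j]
    return best[n], picked[::-1]

# Hypothetical aberration segments (start, end) from five samples.
segments = [(1, 5), (2, 6), (4, 8), (7, 9), (10, 12)]
cliques = maximal_cliques(segments)
total, picked = best_disjoint_cliques(cliques)
print(cliques)
print(total, picked)
```

The dynamic program relies on a property of interval graphs: in sweep order, each interval belongs to a consecutive run of maximal cliques, so the set of earlier cliques compatible with a given clique is always a prefix.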

Keyword: Population Genomics


PP78 (HT) GraphProt: How to make sense out of CLIP-seq data
Date: Tuesday, July 15, 3:00 pm - 3:25 pm
Room: 311

Presenting author: Rolf Backofen, University of Freiburg, Germany

Additional authors:
Sita Lange, University of Freiburg, Germany
Daniel Maticzka, University of Freiburg, Germany
Fabrizio Costa, University of Freiburg, Germany

Session Chair: Cenk Sahinalp

Presentation Overview:

The paper deals with one of today's hottest topics in biology, namely the analysis of RNA-protein interactions. Recent studies revealed that hundreds of RNA-binding proteins (RBPs) regulate a plethora of post-transcriptional processes. The gold standard for identifying RBP targets is experimental CLIP-seq. However, a large number of binding sites remain unidentified, which is a major yet underestimated problem. The reason is simply that CLIP-seq is sensitive to expression levels. Thus, available CLIP-seq experiments for a specific protein in liver cells cannot be used to infer targets in, say, kidney cells.

We provide a solution by learning an accurate protein-binding model based on an efficient graph-kernel approach that learns sequence-structure properties from several thousand binding sites. Transcripts targeted in any other cells can be identified with high specificity. For example, we show that the up-regulation in an AGO-knockdown cannot be explained with existing AGO-CLIP-seq data, but it can when using our predictions.

Keyword: Sequence Analysis, Protein Interactions & Molecular Networks


PP79 (HT) Improved exome prioritization of disease genes through cross-species phenotype comparison
Date: Tuesday, July 15, 3:00 pm - 3:25 pm
Room: 304

Presenting author: Peter Robinson, Charité University Hospital, Germany

Additional authors:
Sebastian Köhler, Charité, Germany
Anika Oellrich, Sanger Institute, United Kingdom
Kai Wang, UCS, United States
Christopher Mungall, Lawrence Berkeley National Laboratory, United States
Suzanna Lewis, Lawrence Berkeley National Laboratory, United States
Nicole Washington, Lawrence Berkeley National Laboratory, United States
Sebastian Bauer, Charité, Germany
Dominik Seelow, Charité, Germany
Peter Krawitz, Charité, Germany
Christian Gilissen, Nijmegen, Netherlands
Melissa Haendel, U Oregon, United States
Damian Smedley, Sanger Institute, United Kingdom

Session Chair: Lenore Cowen

Presentation Overview:

I will explain how cross-species phenotype analysis works. The International Mouse Phenotyping Consortium is currently creating knockouts (KOs) for all mammalian genes, resulting in an extremely useful resource for computational analysis. This is particularly interesting for human genetics, since about 3000 Mendelian disease genes are known in humans, but ca. 8000 genes have phenotypes in mouse KO models. Our work shows how to exploit this information to identify novel disease genes. We will present some examples of disease gene identifications in current projects.

Keyword: Databases & Ontologies, Sequence Analysis


PP80 (PT) MIRA: Mutual Information-Based Reporter Algorithm for Metabolic Networks
Date: Tuesday, July 15, 3:00 pm - 3:25 pm
Room: 302

Presenting author: A. Ercument Cicek, Carnegie Mellon University, United States

Additional authors:
Kathryn Roeder, Carnegie Mellon University, United States
Gultekin Ozsoyoglu, Case Western Reserve University, United States

Session Chair: Alex Bateman

Presentation Overview:

Motivation: Discovering the transcriptional regulatory architecture of metabolism has been an important topic in understanding the implications of transcriptional fluctuations on metabolism. The reporter algorithm (RA) was proposed to determine the hot spots in metabolic networks, around which transcriptional regulation is focused due to a disease or a genetic perturbation. Using a z-score based scoring scheme, RA calculates the average statistical change in the expression levels of genes that are neighbors to a target metabolite in the metabolic network. The RA approach has been used in numerous studies to analyze cellular responses to the downstream genetic changes. In this paper, we propose a mutual information-based multivariate reporter algorithm (MIRA) with the goal of eliminating the following problems in detecting reporter metabolites: (1) conventional statistical methods suffer from small sample sizes, (2) as z-scores range from minus to plus infinity, calculating average scores can lead to canceling out opposite effects, and (3) analyzing genes one by one, then aggregating results can lead to information loss. MIRA is a multivariate and combinatorial algorithm that calculates the aggregate transcriptional response around a metabolite using mutual information. We show that MIRA’s results are biologically sound, empirically significant and more reliable than RA. Results: We apply MIRA to gene expression analysis of six knock-out strains of E. coli, and show that MIRA captures the underlying metabolic dynamics of the switch from aerobic to anaerobic respiration. We also apply MIRA to an Autism Spectrum Disorder gene expression dataset. Results indicate that MIRA reports metabolites that highly overlap with recently found metabolic biomarkers in the autism literature. Overall, MIRA is a promising algorithm for detecting metabolic drug targets and understanding the relation between gene expression and metabolic activity.
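
Illustration: problem (2) above, that averaging z-scores can cancel opposite effects, and the multivariate alternative can be shown with a toy example in Python. This is not the MIRA implementation. The z-score values, the low/high discretization of expression and the function name are assumptions made for the example; the mutual information estimate is the standard plug-in formula for discrete variables.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        c / n * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

# Eight hypothetical samples: four controls (0) and four perturbed (1).
condition = [0, 0, 0, 0, 1, 1, 1, 1]
# Two genes around one metabolite, discretized to low (0) / high (1):
gene_a = [0, 0, 0, 0, 1, 1, 1, 1]  # up-regulated under perturbation
gene_b = [1, 1, 1, 1, 0, 0, 0, 0]  # down-regulated under perturbation

# Made-up z-scores of opposite sign: their average hides the strong response.
z_scores = [3.0, -3.0]
mean_z = sum(z_scores) / len(z_scores)

# Treat the two genes jointly as one multivariate state per sample.
joint_state = list(zip(gene_a, gene_b))
mi = mutual_information(joint_state, condition)

print(mean_z, mi)
```

Here the averaged z-score is zero even though both genes respond strongly, while the joint expression state carries one full bit of information about the condition.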

Keyword: Metabolomic Networks


PP81 (HT) Emerging landscape of oncogenic signatures across human cancers
Date: Tuesday, July 15, 3:00 pm - 3:25 pm
Room: 312

Presenting author: Giovanni Ciriello, Memorial Sloan Kettering Cancer Center, United States

Additional authors:
Martin Miller, Memorial Sloan Kettering Cancer Center, United States
Bulent Arman Aksoy, Memorial Sloan Kettering Cancer Center, United States
Yasin Senbabaoglu, Memorial Sloan Kettering Cancer Center, United States
Nikolaus Schultz, Memorial Sloan Kettering Cancer Center, United States
Chris Sander, Memorial Sloan Kettering Cancer Center, United States

Session Chair: Paul Horton

Presentation Overview:

Cancer therapy is challenged by the diversity of molecular implementations of oncogenic processes and by the resulting variation in therapeutic responses. Projects such as The Cancer Genome Atlas (TCGA) provide molecular tumor maps in unprecedented detail. The interpretation of these maps remains a major challenge. Here we distilled thousands of genetic and epigenetic features altered in cancers to ~500 selected functional events (SFEs). Using this simplified description, we derived a hierarchical classification of 3,299 TCGA tumors from 12 cancer types. The top classes are dominated by either mutations (M class) or copy number changes (C class). This distinction is clearest at the extremes of genomic instability, reflecting different oncogenic processes. The full hierarchy shows event signatures characteristic of cross-tissue tumor classes. Targetable functional events are suggestive of class-specific combination therapy. These results may assist in the definition of clinical trials to match actionable oncogenic signatures with personalized therapies.

Keyword: Applied Bioinformatics, Disease Models & Epidemiology


PP82 (HT) From 1D to 3D and back: Genome scaffolding from DNA interaction frequency
Date: Tuesday, July 15, 3:30 pm - 3:55 pm
Room: 311

Presenting author: Noam Kaplan, University of Massachusetts Medical School, United States

Additional authors:
Job Dekker, University of Massachusetts Medical School, United States

Session Chair: Cenk Sahinalp

Presentation Overview:

Despite the advancement of DNA sequencing technologies, assembly of complex genomes remains a major challenge. Surprisingly, the quality of published complex genomes has decreased, due to the growing use of short read sequencing.

We have developed a high-throughput scaffolding approach, based on the notion that loci that are near each other in the genomic sequence have a high probability of interacting with each other. We demonstrate that genome-wide in vivo chromatin interaction frequency measurements can be used as genomic distance proxies to accurately detect the positions of contigs over large distances without requiring any sequence overlap. Furthermore, we demonstrate our approach can karyotype and scaffold an entire genome de novo. Applying our approach to incomplete regions of the human genome, we predict the positions of 65 previously unplaced contigs, in agreement with alternative methods. Our approach can theoretically bridge any gap size, is simple, robust, scalable and applicable to any species.

Keyword: Other


PP83 (HT) A community effort to assess drug sensitivity prediction algorithms identifies approaches for improved performance
Date: Tuesday, July 15, 3:30 pm - 3:55 pm
Room: 304

Presenting author: James Costello, University of Colorado Anschutz Medical Campus, United States

Additional authors:
Laura Heiser, OHSU, United States
Elisabeth Georgii, Aalto University, Finland
Michael Menden, EMBL, United Kingdom
Nicholas Wang, OHSU, United States
Mukesh Bansal, Columbia University, United States
Mohammad Ammad-ud-din, Aalto University, Finland
Petteri Hintsanen, University of Helsinki, Finland
Suleiman Khan, Aalto University, Finland
John-Patrick Mpindi, University of Helsinki, Finland
Olli Kallioniemi, University of Helsinki, Finland
Antti Honkela, University of Helsinki, Finland
Tero Aittokallio, University of Helsinki, Finland
Krister Wennerberg, University of Helsinki, Finland
James Collins, Boston University, United States
Dan Gallahan, NIH, United States
Dinah Singer, NIH, United States
Julio Saez-Rodriguez, EMBL, United Kingdom
Samuel Kaski, Aalto University, Finland
Joe Gray, OHSU, United States
Gustavo Stolovitzky, IBM, United States
Mehmet Gonen, Aalto University, Finland

Session Chair: Lenore Cowen

Presentation Overview:

Predicting the best treatment strategy from genomic information is a core goal of personalized medicine. Here we focus on predicting drug response based on a cohort of genomic, epigenomic and proteomic profiling datasets measured in human breast cancer cell lines. Through a collaborative effort between the NCI and the DREAM project, we present a total of 44 drug sensitivity prediction algorithms. We identify characteristics of top-performing methodologies, namely modeling nonlinear relationships and the application of biological pathway information. We found that gene expression microarrays consistently provided the best predictive power of the individual profiling datasets; however, performance was increased by including multiple, independent datasets. We present the top-performing methodology, Bayesian Multitask MKL, which implements kernelized regression, multiview learning, multitask learning and Bayesian inference. This study establishes benchmarks for drug sensitivity prediction and identifies features that can be leveraged for future method development. We provide detailed descriptions of all methods at: http://www.the-dream-project.org/

Keyword: Applied Bioinformatics, Disease Models & Epidemiology


PP84 (PT) Metabolome-scale prediction of intermediate compounds in multi-step metabolic pathways with a recursive supervised approach
Date: Tuesday, July 15, 3:30 pm - 3:55 pm
Room: 302

Presenting author: Masaaki Kotera, Kyoto University, Japan

Additional authors:
Yasuo Tabei, Japan Science and Technology Agency, Japan
Yoshihiro Yamanishi, Kyushu University, Japan
Ai Muto, Kyoto University, Japan
Yuki Moriya, Kyoto University, Japan
Toshiaki Tokimatsu, Kyoto University, Japan
Susumu Goto, Kyoto University, Japan

Session Chair: Alex Bateman

Presentation Overview:

Motivation: Metabolic pathway analysis is crucial not only in systematic metabolic engineering but also in rational drug design. However, the biosynthetic/biodegradation pathways are known only for a small portion of metabolites, and a vast amount of pathways remain uncharacterized. Therefore, an important challenge in metabolomics is the de novo reconstruction of potential reaction networks on a metabolome-scale. Results: In this paper we develop a novel method to predict the multi-step reaction sequences for de novo reconstruction of metabolic pathways in the reaction-filling framework. We propose a supervised approach to learn what we refer to as “multi-step reaction sequence likeness”, i.e., whether or not a compound-compound pair is possibly converted to each other by a sequence of enzymatic reactions. In the algorithm we propose a recursive procedure of using step-specific classifiers to predict the intermediate compounds in the multi-step reaction sequences, based on chemical substructure fingerprints of compounds. In the results, we demonstrate the usefulness of our proposed method on the prediction of enzymatic reaction networks from a metabolome-scale compound set, and discuss characteristic features of the extracted chemical substructure transformation patterns in multi-step reaction sequences. Our comprehensively predicted reaction networks help to fill the metabolic gap and to infer new reaction sequences in metabolic pathways.

Keyword: Metabolomic Networks


PP85 (HT) MicroRNA-gene association as a prognostic biomarker in cancer exposes disease mechanisms
Date: Tuesday, July 15, 3:30 pm - 3:55 pm
Room: 312

Presenting author: Rotem Ben-Hamo, Bar Ilan University, Israel

Session Chair: Paul Horton

Presentation Overview:

This work demonstrates a new metric to uncover clinical stratifications hidden in the association between microRNAs and genes. We will explain the methods and algorithms used in this paper and highlight the importance of finding these underlying mechanisms that may be at the core of disease progression. The potential of microRNAs to act both as therapeutic agents and as disease biomarkers places this family of molecules at the forefront of biomedical interest. The identification of genomic regulatory mechanisms, their affiliation with clinical outcome, and the association between specific modifications in genome sequences that may explain gain and loss of such regulatory activity combine to suggest specific disease mechanisms and possible means of intervention in the course of the disease. This discovery has been made possible by employing regulation as a quantifiable metric, combined with the availability of whole genome sequences.

Keyword: Applied Bioinformatics
