Highlights Track Presentations

All Highlights and Proceedings Track presentations are grouped by scientific area as part of the combined Paper Presentation schedule.


Applied Bioinformatics
Databases & Ontologies
Disease Models & Epidemiology
Evolution & Comparative Genomics
Gene Regulation & Transcriptomics
Mass Spectrometry & Proteomics
Protein Interactions & Molecular Networks
Protein Structure & Function
Sequence Analysis
Text Mining
Other


Highlights Track: Applied Bioinformatics
Presenting author: Mark Leiserson, Brown University, United States
Date: Sunday, July 13, 10:30 am - 10:55 am
Room: 311

Additional authors:
Dima Blokh, Tel-Aviv University, Israel
Roded Sharan, Tel-Aviv University, Israel
Benjamin Raphael, Brown University, United States

Area Session Chair: Terry Gaasterland

Presentation Overview:
An important challenge in cancer genome sequencing is to distinguish the small subset of somatic driver mutations that cause cancer from the multitude of random passenger mutations in a tumor. Since patients with the same cancer type typically have different collections of mutations, single-gene tests of recurrence are insufficient for this task. We present Multi-Dendrix, an algorithm to identify combinations of mutations with combinatorial properties consistent with cancer pathways. Multi-Dendrix does not use prior knowledge of pathways, and it finds multiple sets of mutations simultaneously because driver mutations in a single patient target multiple pathways. We applied Multi-Dendrix to glioblastoma and breast cancer data from The Cancer Genome Atlas (TCGA). In both cancers, Multi-Dendrix identified gene sets overlapping major signaling pathways that were manually annotated in the TCGA publications (including Rb, PI(3)K, and p53), as well as novel gene sets that include transcription factors and regulators.
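To make the combinatorial criterion concrete, here is a minimal Python sketch of the weight used by the original Dendrix algorithm, which Multi-Dendrix builds on: W(M) = 2|Γ(M)| - Σ_{g∈M} |Γ(g)|, where Γ(g) is the set of patients with a mutation in gene g. It rewards gene sets that cover many patients while penalizing patients mutated in more than one gene of the set. The exhaustive search, the function names, and the toy mutation data are illustrative assumptions; Multi-Dendrix itself finds several such sets simultaneously by solving an integer linear program rather than by enumeration.

```python
from itertools import combinations

def dendrix_weight(gene_set, mutations):
    """Dendrix-style weight W(M) = 2*|Gamma(M)| - sum over g in M of |Gamma(g)|.

    mutations maps each gene to the set of patients in which it is mutated.
    The score equals coverage minus overlap, so mutually exclusive genes score highest.
    """
    covered = set()
    total = 0
    for gene in gene_set:
        patients = mutations.get(gene, set())
        covered |= patients
        total += len(patients)
    return 2 * len(covered) - total

def best_gene_set(mutations, k):
    """Exhaustively score every k-gene set; only feasible for toy-sized inputs."""
    return max(combinations(sorted(mutations), k),
               key=lambda gene_set: dendrix_weight(gene_set, mutations))

# Hypothetical toy data: gene -> set of patient IDs carrying a mutation in it.
mutations = {"A": {1, 2, 3}, "B": {4, 5}, "C": {1, 2, 4}, "D": {6}}
print(best_gene_set(mutations, 2))  # ('A', 'B'): covers 5 patients with no overlap
```

Genes A and C share patients 1 and 2, so their pair is penalized (weight 2), whereas A and B are perfectly mutually exclusive (weight 5).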

Presenting author: Levi Waldron, City University of New York, United States
Date: Sunday, July 13, 3:35 pm - 4:00 pm
Room: 311

Additional authors:
Benjamin Haibe-Kains, Princess Margaret Cancer Centre, Canada
Aedin Culhane, Dana-Farber Cancer Institute, United States
Markus Riester, Dana-Farber Cancer Institute, United States
Thomas Risch, Dana-Farber Cancer Institute, United States
Svitlana Tyekucheva, Dana-Farber Cancer Institute, United States
Ina Jazic, Dana-Farber Cancer Institute, United States
Xin Victoria Wang, Dana-Farber Cancer Institute, United States
Mahnaz Ahmadifar, Dana-Farber Cancer Institute, United States
Benjamin Frederick Ganzfried, Dana-Farber Cancer Institute, United States
Giovanni Parmigiani, Dana-Farber Cancer Institute, United States
Curtis Huttenhower, Harvard School of Public Health, United States
Michael Birrer, Massachusetts General Hospital, United States
Christoph Bernau, LMU Munich, Germany

Area Session Chair: Fran Lewitter

Presentation Overview:
The growth of genomic technologies has generated a bioinformatics cottage industry creating gene signatures of disease and disease outcomes. Specialized tools for regression and machine learning now make it relatively easy to tune and train high-dimensional models for the prediction of patient outcome from genomic data, but most such published models remain orphaned in the literature, without follow-up validation or clinical application. We therefore undertook a systematic evaluation of 14 published gene expression-based outcome prediction models for late-stage, high-grade, serous ovarian cancer, in a curated database of 1,251 patients from 10 microarray datasets. This work assesses: 1) the accuracy of published predictive models when applied to independent datasets, 2) which modeling approaches have been most and least effective, and 3) the influence of popular validation datasets on the literature. This talk argues for changes in what constitutes “validation” of prediction models generated from genomic data.
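One common way to measure the accuracy of a survival prediction model on an independent dataset is Harrell's concordance index (C-index): the fraction of comparable patient pairs in which the patient with the higher predicted risk has the earlier observed event. The abstract does not name the metric used, so treat this as an illustrative assumption. The Python sketch below is a simplified version that skips pairs with tied follow-up times.

```python
def concordance_index(times, events, risk):
    """Simplified Harrell's C-index.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    risk:   model-predicted risk score (higher means worse predicted outcome)
    A pair (i, j) is comparable when i has an observed event and j was followed longer.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # higher risk failed first: concordant
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy example: a model that orders these patients perfectly scores 1.0.
print(concordance_index([2, 5, 9, 12], [1, 1, 0, 1], [0.9, 0.6, 0.4, 0.1]))  # 1.0
```

A value near 0.5 means the model ranks patients no better than chance, which is the baseline against which a published signature applied to new datasets would be compared.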

Presenting author: Roy Straver, VU University Medical Center Amsterdam, Netherlands
Date: Sunday, July 13, 4:05 pm - 4:30 pm
Room: 304

Additional authors:
Erik Sistermans, VU University Medical Center Amsterdam, Netherlands
Henne Holstege, VU University Medical Center Amsterdam, Netherlands
Daphne van Beek, VU University Medical Center Amsterdam, Netherlands
Allerdien Visser, VU University Medical Center Amsterdam, Netherlands
Cees Oudejans, VU University Medical Center Amsterdam, Netherlands
Marcel Reinders, Delft University of Technology, Netherlands

Area Session Chair: Cenk Sahinalp

Presentation Overview:
The presentation gives background on non-invasive prenatal testing (NIPT) with NGS, explains why low-coverage data must be used, and describes the resulting drawbacks for downstream analysis. We highlight our algorithmic contribution to the detection of fetal copy number aberrations, which is based on a within-sample read depth comparison. We also address some of the basic assumptions underlying our approach. We then show that our method, called WISECONDOR, reliably detects small copy number changes in a cohort of more than 200 pregnancies. We present a few illustrative cases that highlight the potential of WISECONDOR and show how clinical geneticists can use this tool. Finally, we discuss how the tool is currently being implemented in hospitals throughout the Netherlands.
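The within-sample comparison can be illustrated with a minimal Python sketch. It assumes that, for every target bin, a set of reference bins elsewhere in the genome with similar read-depth behavior has already been chosen from control samples, and that read counts are already normalized (for example, GC-corrected). A bin is flagged when its depth deviates strongly from its reference bins in the same sample. The z-score threshold of 3, the bin names, and the data layout are hypothetical choices for illustration, not WISECONDOR's actual parameters.

```python
from statistics import mean, stdev

def within_sample_z(sample, target_bin, reference_bins):
    """z-score of one bin against its reference bins measured in the same sample.

    sample maps bin IDs to normalized read depth; comparing bins within one sample
    removes sample-wide effects such as overall sequencing depth.
    """
    ref = [sample[b] for b in reference_bins]
    sd = stdev(ref)
    if sd == 0:
        raise ValueError("reference bins have zero variance")
    return (sample[target_bin] - mean(ref)) / sd

def aberrant_bins(sample, references, threshold=3.0):
    """Return bins whose |z| exceeds the (hypothetical) threshold."""
    return sorted(b for b, refs in references.items()
                  if abs(within_sample_z(sample, b, refs)) > threshold)

# Toy sample: the chr21 bin carries about 10% extra depth relative to its references.
sample = {"chr1:0": 1.00, "chr2:0": 0.98, "chr3:0": 1.02, "chr4:0": 1.00, "chr21:0": 1.10}
references = {
    "chr1:0": ["chr2:0", "chr3:0", "chr4:0"],
    "chr21:0": ["chr1:0", "chr2:0", "chr3:0", "chr4:0"],
}
print(aberrant_bins(sample, references))  # ['chr21:0']
```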

Presenting author: Pankaj Agarwal, GSK, United States
Date: Monday, July 14, 11:00 am - 11:25 am
Room: 302

Additional authors:
Philippe Sanseau, GSK, United States
Mark Hurle, GSK, United States

Area Session Chair: Yanay Ofran

Presentation Overview:
Identifying the protein to target with a medicine is a critical step in drug discovery. It is commonly thought that innovation in drug discovery is limited because pharmaceutical companies tend to work on the same drug targets, leading to ‘me too’ drugs. However, we found that 42% of targets were innovative and not duplicative at all. In fact, competition on targets increased with the degree of target validation, as it should (Nat Rev Drug Discov. 2013 Aug;12(8):575-6). We also discuss systematic drug repositioning techniques based on computational analysis of data from transcriptomics (such as the Connectivity Map), side effects, phenotypic screens, and genome-wide association studies (Clin Pharmacol Ther. 2013 Apr;93(4):335-41). Finally, I will present some key bioinformatics problems in medicine discovery.

Presenting author: Armaghan Naik, Carnegie Mellon University, United States
Date: Monday, July 14, 2:10 pm - 2:35 pm
Room: 311

Additional authors:
Joshua Kangas, Carnegie Mellon University, United States
Christopher Langmead, Carnegie Mellon University, United States
Robert Murphy, Carnegie Mellon University, United States

Area Session Chair: Reinhard Schneider

Presentation Overview:
High-throughput and high-content screening involve determining the effect of many compounds on a given target. As currently practiced, screening for each new target typically makes little use of information from screens of prior targets. Further, compounds are chosen to advance to drug development without significant screening for off-target effects. The overall drug development process could be made more effective if the potential effects of all compounds on all possible targets could be considered, yet the cost of complete experimentation is prohibitive. Here we describe a potential solution: probabilistic models that predict results for unmeasured combinations, and active learning algorithms that efficiently select which experiments to perform to build those models and determine when to stop. Using simulated and experimental data, we show that our approaches can produce accurate estimates of unmeasured experiments much faster than selecting experiments at random.

Presenting author: Matan Hofree, University of California, San Diego, United States
Date: Tuesday, July 15, 2:00 pm - 2:25 pm
Room: 312

Additional authors:
John P. Shen, University of California, San Diego, United States
Andrew Gross, University of California, San Diego, United States
Hannah Carter, University of California, San Diego, United States
Trey Ideker, University of California, San Diego, United States

Area Session Chair: Paul Horton

Presentation Overview:
Classification of cancer is predominantly organ-based and fails to account for considerable heterogeneity in clinical outcomes such as survival or response to therapy. Somatic tumor genomes provide a rich new source of data for uncovering subtypes, but have proven difficult to compare because tumors rarely share the same mutations. Here we introduce network-based stratification (NBS), a method to integrate somatic tumor genomes with gene networks. This approach stratifies cancer into subtypes by clustering together patients with mutations in similar network regions. We demonstrate NBS in multiple cancer cohorts from The Cancer Genome Atlas. In each case, NBS identifies subtypes that are predictive of clinical outcomes such as patient survival, response to therapy, or tumor histology. We identify network regions characteristic of each subtype and show how mutation-derived subtypes can be used to train an mRNA expression signature, which provides similar information in the absence of DNA sequence data.
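The network-smoothing step can be sketched as network propagation (a random walk with restart): each patient's binary mutation vector F0 is iterated as F ← αFA + (1 - α)F0 until convergence, where A is the degree-normalized adjacency matrix of the gene network. The pure-Python sketch below assumes a small unweighted network with no isolated genes and uses α = 0.7 as an illustrative value. In NBS the smoothed profiles are subsequently clustered (with network-regularized non-negative matrix factorization) to obtain subtypes, which is not shown here.

```python
def propagate(f0, adjacency, alpha=0.7, tol=1e-10, max_iter=1000):
    """Smooth one patient's mutation vector over a gene network.

    f0:        0/1 mutation indicator per gene
    adjacency: symmetric 0/1 matrix (list of lists); every gene needs >= 1 neighbor
    Iterates F <- alpha * F A + (1 - alpha) * F0, with A[i][j] = adjacency[i][j] / degree[i],
    so each gene passes its score evenly to its neighbors.
    """
    n = len(f0)
    degree = [sum(row) for row in adjacency]
    f = [float(x) for x in f0]
    for _ in range(max_iter):
        new = [
            alpha * sum(f[i] * adjacency[i][j] / degree[i] for i in range(n))
            + (1 - alpha) * f0[j]
            for j in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, f)) < tol:
            return new
        f = new
    return f

# Toy path network 0-1-2-3 with a single mutated gene (gene 0).
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
scores = propagate([1, 0, 0, 0], path)
print([round(s, 3) for s in scores])  # scores decay with network distance from gene 0
```

Because A is row-stochastic, the total score is conserved, so two patients with mutations in neighboring genes end up with overlapping smoothed profiles even if they share no mutated gene.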

Presenting author: Giovanni Ciriello, Memorial Sloan Kettering Cancer Center, United States
Date: Tuesday, July 15, 3:00 pm - 3:25 pm
Room: 312

Additional authors:
Martin Miller, Memorial Sloan Kettering Cancer Center, United States
Bulent Arman Aksoy, Memorial Sloan Kettering Cancer Center, United States
Yasin Senbabaoglu, Memorial Sloan Kettering Cancer Center, United States
Nikolaus Schultz, Memorial Sloan Kettering Cancer Center, United States
Chris Sander, Memorial Sloan Kettering Cancer Center, United States

Area Session Chair: Paul Horton

Presentation Overview:
Cancer therapy is challenged by the diversity of molecular implementations of oncogenic processes and by the resulting variation in therapeutic responses. Projects such as The Cancer Genome Atlas (TCGA) provide molecular tumor maps in unprecedented detail. The interpretation of these maps remains a major challenge. Here we distilled thousands of genetic and epigenetic features altered in cancers to ~500 selected functional events (SFEs). Using this simplified description, we derived a hierarchical classification of 3,299 TCGA tumors from 12 cancer types. The top classes are dominated by either mutations (M class) or copy number changes (C class). This distinction is clearest at the extremes of genomic instability, reflecting different oncogenic processes. The full hierarchy shows event signatures characteristic of cross-tissue tumor classes. Targetable functional events suggest class-specific combination therapies. These results may assist in the design of clinical trials that match actionable oncogenic signatures with personalized therapies.

Presenting author: James Costello, University of Colorado Anschutz Medical Campus, United States
Date: Tuesday, July 15, 3:30 pm - 3:55 pm
Room: 304

Additional authors:
Laura Heiser, OHSU, United States
Elisabeth Georgii, Aalto University, Finland
Michael Menden, EMBL, United Kingdom
Nicholas Wang, OHSU, United States
Mukesh Bansal, Columbia University, United States
Mohammad Ammad-ud-din, Aalto University, Finland
Petteri Hintsanen, University of Helsinki, Finland
Suleiman Khan, Aalto University, Finland
John-Patrick Mpindi, University of Helsinki, Finland
Olli Kallioniemi, University of Helsinki, Finland
Antti Honkela, University of Helsinki, Finland
Tero Aittokallio, University of Helsinki, Finland
Krister Wennerberg, University of Helsinki, Finland
James Collins, Boston University, United States
Dan Gallahan, NIH, United States
Dinah Singer, NIH, United States
Julio Saez-Rodriguez, EMBL, United Kingdom
Samuel Kaski, Aalto University, Finland
Joe Gray, OHSU, United States
Gustavo Stolovitzky, IBM, United States
Mehmet Gonen, Aalto University, Finland

Area Session Chair: Lenore Cowen

Presentation Overview:
Predicting the best treatment strategy from genomic information is a core goal of personalized medicine. Here we focus on predicting drug response from a set of genomic, epigenomic and proteomic profiling datasets measured in human breast cancer cell lines. Through a collaborative effort between the NCI and the DREAM project, we present a total of 44 drug sensitivity prediction algorithms. We identify characteristics of top-performing methods, namely modeling nonlinear relationships and incorporating biological pathway information. Among the individual profiling datasets, gene expression microarrays consistently provided the best predictive power; however, performance improved when multiple, independent datasets were combined. We present the top-performing method, Bayesian Multitask MKL, which implements kernelized regression, multiview learning, multitask learning and Bayesian inference. This study establishes benchmarks for drug sensitivity prediction and identifies features that can be leveraged for future method development. We provide detailed descriptions of all methods at: http://www.the-dream-project.org/

Presenting author: Rotem Ben-Hamo, Bar Ilan University, Israel
Date: Tuesday, July 15, 3:30 pm - 3:55 pm
Room: 312

Area Session Chair: Paul Horton

Presentation Overview:
This work demonstrates a new metric to uncover clinical stratifications hidden in the association between microRNAs and genes. We will explain the methods and algorithms used in this paper and highlight the importance of finding these underlying mechanisms, which may be at the core of disease progression. The potential of microRNAs to act both as therapeutic agents and as disease biomarkers places this family of molecules at the forefront of biomedical interest. Identifying genomic regulatory mechanisms, relating them to clinical outcome, and finding specific modifications in genome sequences that may explain gain or loss of such regulatory activity together suggest specific disease mechanisms and possible means of intervention in the course of the disease. This discovery was made possible by treating regulation as a quantifiable metric, combined with the availability of whole-genome sequences.

Highlights Track: Databases & Ontologies
Presenting author: Peter Robinson, Charité University Hospital, Germany
Date: Tuesday, July 15, 3:00 pm - 3:25 pm
Room: 304

Additional authors:
Sebastian Köhler, Charité, Germany
Anika Oellrich, Sanger Institute, United Kingdom
Kai Wang, USC, United States
Christopher Mungall, Lawrence Berkeley National Laboratory, United States
Suzanna Lewis, Lawrence Berkeley National Laboratory, United States
Nicole Washington, Lawrence Berkeley National Laboratory, United States
Sebastian Bauer, Charité, Germany
Dominik Seelow, Charité, Germany
Peter Krawitz, Charité, Germany
Christian Gilissen, Nijmegen, Netherlands
Melissa Haendel, U Oregon, United States
Damian Smedley, Sanger Institute, United Kingdom

Area Session Chair: Lenore Cowen

Presentation Overview:
I will explain how cross-species phenotype analysis works. The International Mouse Phenotyping Consortium is currently creating knockouts (KOs) for all mammalian genes, resulting in an extremely useful resource for computational analysis. This is particularly interesting for human genetics: about 3000 Mendelian disease genes are known in humans, but ca. 8000 genes have phenotypes in mouse knockout models. Our work shows how to exploit this information to identify novel disease genes. We will present some examples of disease gene identification in current projects.

Highlights Track: Disease Models & Epidemiology
Presenting author: Kristoffer Forslund, European Molecular Biology Laboratory, Germany
Date: Monday, July 14, 10:30 am - 10:55 am
Room: 311

Additional authors:
Shinichi Sunagawa, EMBL, Germany
Jens Roat Kultima, EMBL, Germany
Daniel Mende, EMBL, Germany
Manimozhiyan Arumugam, Copenhagen University, Denmark
Athanasios Typas, EMBL, Germany
Peer Bork, EMBL, Germany

Area Session Chair: Janet Kelso

Presentation Overview:
Despite increasing concerns over inappropriate use of antibiotics in medicine and food production, population-level resistance transfer into the human gut microbiota has not been demonstrated beyond individual case studies. To determine the "antibiotic resistance potential" of entire microbial communities, we use metagenomic data to quantify the totality of known resistance genes in each community (its resistome) for 68 classes and subclasses of antibiotics. In 252 fecal metagenomes, we show that the most abundant resistance determinants are those for antibiotics also used in animals and for antibiotics that have been in use longer. Resistance potential is higher in samples from Spain, Italy and France than in those from Denmark, the US, or Japan. Where available, country-level differences in antibiotic use in both humans and animals match the observed differences in resistance potential. An individual's antibiotic resistance determinants persist in the gut flora for at least a year.
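As a minimal sketch of the resistome quantification, assuming each sample has already been reduced to normalized abundances of known resistance genes and that each gene is annotated with the antibiotic classes it confers resistance to, the resistance potential per class is the summed abundance of its genes. The annotation mapping and the toy numbers below are hypothetical; they are not the study's reference database or data.

```python
from collections import defaultdict

def resistance_potential(gene_abundance, gene_to_classes):
    """Summed abundance of known resistance genes per antibiotic class in one metagenome.

    gene_abundance:  gene ID -> normalized abundance in the sample
    gene_to_classes: gene ID -> antibiotic classes the gene confers resistance to
    Genes without a resistance annotation are ignored.
    """
    potential = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        for antibiotic_class in gene_to_classes.get(gene, ()):
            potential[antibiotic_class] += abundance
    return dict(potential)

# Hypothetical annotation and abundances for a single fecal sample.
annotation = {"tetW": ["tetracycline"], "ermB": ["macrolide", "lincosamide"]}
sample = {"tetW": 2.0, "ermB": 0.5, "housekeeping_gene": 10.0}
print(resistance_potential(sample, annotation))
# {'tetracycline': 2.0, 'macrolide': 0.5, 'lincosamide': 0.5}
```

A gene that confers resistance to several classes contributes to each of them, so class-level totals are not additive across classes.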

Presenting author: Lars Kaderali, Technische Universität Dresden, Germany
Date: Tuesday, July 15, 11:00 am - 11:25 am
Room: 311

Additional authors:
Marco Binder, University of Heidelberg, Germany
Nurgazy Sulaimanov, University of Heidelberg, Germany
Diana Clausznitzer, Technische Universität Dresden, Germany
Manuel Schulze, Technische Universität Dresden, Germany
Cristian Hüber, University of Heidelberg, Germany
Simon Lenz, University of Heidelberg, Germany
Johannes Schloeder, University of Heidelberg, Germany
Martin Trippler, University Hospital Essen, Germany
Ralf Bartenschlager, University of Heidelberg, Germany
Volker Lohmann, University of Heidelberg, Germany

Area Session Chair: Scott Markel

Presentation Overview:
As obligate intracellular parasites, viruses rely on host factors for every single step of their life cycle. This gives rise to complex interaction networks between virus and host cell, a prime example of a problem that requires a systems biology approach. I will show how we tightly integrated mathematical modeling, bioinformatics and wet-lab experiments to decipher interactions between hepatitis C virus and its host cell. Through an iterative cycle of modeling and experiment, including genome-wide siRNA screening and expression profiling, we identified key host processes that determine differences in infection between cell lines, and we set up predictive mathematical models that quantitatively and mechanistically explain differences in viral replication. Analysis of the mathematical model has implications for drug design against hepatitis C virus infection, which I will discuss in the presentation. I will also present ongoing work on integrating the cellular immune response, extensions of the model to other viruses, and implications for antiviral treatment.

Highlights Track: Evolution & Comparative Genomics
Presenting author: Yuval Tabach, Massachusetts General Hospital, United States
Date: Monday, July 14, 2:40 pm - 3:05 pm
Room: 312

Additional authors:
Gary Ruvkun, Massachusetts General Hospital, United States
Carmit Levy, Tel Aviv University, Israel

Area Session Chair: Russell Schwartz

Presentation Overview:
Genes with similar patterns of presence and absence across disparate genomes tend to function in the same pathway. By mapping all human genes into about 1000 clusters of genes with similar patterns of conservation across eukaryotic phylogeny, we determined that sets of genes associated with particular diseases have similar phylogenetic profiles. By focusing on those human phylogenetic gene clusters that significantly overlap some of the thousands of human gene sets defined by coexpression, pathway annotation, or other molecular attributes, we reveal the evolutionary map that connects molecular pathways and human diseases. The other genes in phylogenetic clusters enriched for particular known disease genes or molecular pathways are candidates for roles in those same disorders and pathways. Focusing on proteins that coevolved with the microphthalmia-associated transcription factor (MITF), we identified the Notch pathway transcription factor suppressor of hairless (RBP-Jk/SuH) and showed that RBP-Jk functions as an MITF cofactor.
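The core idea of phylogenetic profiling can be sketched in a few lines: represent each gene by its presence (1) or absence (0) across a fixed list of genomes, and rank other genes by how well their profiles correlate with a query gene's profile. This Python sketch uses Pearson correlation and a cutoff of 0.8 as illustrative assumptions; the study's actual profiles, similarity measure, and clustering procedure may differ, and the gene names are placeholders.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def coevolving_genes(profiles, query, min_r=0.8):
    """Genes whose presence/absence profile correlates with the query's, best first."""
    hits = [(gene, pearson(profiles[query], p))
            for gene, p in profiles.items() if gene != query]
    return sorted((h for h in hits if h[1] >= min_r), key=lambda h: -h[1])

# Placeholder profiles over six genomes (1 = ortholog present, 0 = absent).
profiles = {
    "query": [1, 1, 0, 0, 1, 0],
    "geneA": [1, 1, 0, 0, 1, 0],
    "geneB": [1, 1, 0, 0, 0, 0],
    "geneC": [0, 0, 1, 1, 0, 1],
}
print(coevolving_genes(profiles, "query"))  # [('geneA', 1.0)]
```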

Presenting author: Sushmita Roy, University of Wisconsin, Madison, United States
Date: Monday, July 14, 3:10 pm - 3:35 pm
Room: 312

Additional authors:
Ilan Wapinski, Harvard Medical School, United States
Jenna Pfiffner, Broad Institute, United States
Courtney French, University of California, United States
Amanda Socha, Dartmouth College, United States
Jay Konieczka, Broad Institute, United States
Naomi Habib, Broad Institute, United States
Manolis Kellis, MIT, United States
Dawn Thompson, Broad Institute, United States
Aviv Regev, MIT & Broad Institute, United States

Area Session Chair: Russell Schwartz

Presentation Overview:
Comparative functional genomics seeks to measure and compare functional genomic data, such as mRNA levels and chromatin states, across multiple species. A major challenge is to develop effective tools to systematically compare these data across species. In this talk, we will present a new computational approach, Arboretum, that systematically identifies modules of co-expressed genes in a species phylogeny. Arboretum is based on a probabilistic model of expression data that is applicable to complex phylogenies with multiple gene duplication and loss events. We applied Arboretum to study the evolution of transcriptional modules in yeast and mammalian species. In yeast, we find substantial conservation in module expression patterns, although the specific genes in each module diverge in a lifestyle- or clade-specific manner. We will also present some recent results on applying Arboretum to identify conservation and divergence of tissue-specific modules in mammalian species.

Presenting author: David Horn, Tel-Aviv University, Israel
Date: Monday, July 14, 3:40 pm - 4:05 pm
Room: 312

Additional authors:
Erez Persi, Tel-Aviv University, Israel

Area Session Chair: Russell Schwartz

Presentation Overview:
We introduce a novel unifying methodology for investigating the Compositional Order (CO) of protein sequences. It accounts for all types of low-complexity regions and repetitive phenomena, including large periodic structures in protein sequences. We define new CO measures that provide insights into the correlation of CO with protein function and with evolution. In particular, a large-scale analysis of 94 proteomes shows that the CO vocabulary of frequently appearing amino acid triplets (FT vocabulary) serves as a measure of taxonomic ordering that separates major clades from each other. It serves as a novel phylogenetic tool and suggests that major CO generation occurs during the creation of a completely new species, i.e. during macroevolutionary events. It provides an alternative to the traditional ordering of species based on effective population size × mutation rate (Neμ), with which it anti-correlates well, signifying that increasing FT vocabulary is associated with low evolutionary pressure.
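A toy version of a triplet vocabulary can be computed by counting overlapping amino acid triplets across a proteome and keeping those that occur at least a minimum number of times; two proteomes can then be compared by the overlap of their vocabularies. The count threshold and the Jaccard distance in this Python sketch are hypothetical simplifications; the CO measures defined in the work are more elaborate.

```python
from collections import Counter

def triplet_vocabulary(proteome, min_count=2):
    """Set of amino acid triplets occurring at least min_count times (overlapping windows)."""
    counts = Counter(
        seq[i:i + 3] for seq in proteome for i in range(len(seq) - 2)
    )
    return {triplet for triplet, c in counts.items() if c >= min_count}

def vocabulary_distance(vocab_a, vocab_b):
    """Jaccard distance between two triplet vocabularies (0 = identical, 1 = disjoint)."""
    union = vocab_a | vocab_b
    if not union:
        return 0.0
    return 1.0 - len(vocab_a & vocab_b) / len(union)

# Toy proteomes: the poly-Q stretches make "QQQ" a frequent triplet in both.
vocab_1 = triplet_vocabulary(["MQQQQQK", "AQQQA"])
vocab_2 = triplet_vocabulary(["MQQQQ", "SSSSS"])
print(sorted(vocab_1), sorted(vocab_2), vocabulary_distance(vocab_1, vocab_2))
# ['QQQ'] ['QQQ', 'SSS'] 0.5
```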

Presenting author: Arne Elofsson, Stockholm University, Sweden
Date: Tuesday, July 15, 10:30 am - 10:55 am
Room: 302

Additional authors:
Sara Light, Stockholm University, Sweden
Rauan Sagit, Stockholm University, Sweden
Oxana Sachenkova, Stockholm University, Sweden
Diana Ekman, Stockholm University, Sweden

Area Session Chair: Predrag Radivojac

Presentation Overview:
Proteins evolve not only through point mutations but also by insertion and deletion (indel) events, which affect the length of the protein. It is known that such indel events most frequently occur in surface-exposed loops. However, detailed analysis of indel events in distantly related and fast-evolving proteins is hampered by the difficulty of correctly aligning such sequences. We circumvent this problem by first analyzing homologous proteins based only on length variation rather than on pairwise alignments. Using this approach, we find a surprisingly strong relationship between difference in length and difference in the number of intrinsically disordered residues: up to 75% of the length variation can be explained by changes in the number of intrinsically disordered residues. Further, we find that disorder is common in both insertions and deletions. A more detailed analysis reveals that indel events do not induce disorder; rather, already disordered regions accrue indels, suggesting that selective pressure against indels is reduced within intrinsically disordered regions.
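The statement that up to 75% of the length variation can be explained by changes in disorder corresponds to the coefficient of determination (R²) of a regression of length difference on disorder difference across homolog pairs. A minimal least-squares sketch in Python, using made-up numbers rather than the study's data:

```python
def r_squared(x, y):
    """Fraction of the variance of y explained by an ordinary least-squares line on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1.0 - ss_res / syy

# Made-up homolog pairs: difference in disordered residues vs. difference in length.
disorder_diff = [1, 2, 3, 4]
length_diff = [1, 3, 2, 4]
print(round(r_squared(disorder_diff, length_diff), 2))  # 0.64
```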

Presenting author: Jianlin Cheng, University of Missouri, Columbia, United States
Date: Tuesday, July 15, 11:30 am - 11:55 am
Room: 304

Additional authors:
Tuan Trieu, University of Missouri, Columbia, United States

Area Session Chair: Robert F. Murphy

Presentation Overview:
The three-dimensional (3D) structure of a genome is critical for studying genome folding, genome function, and spatial gene regulation, but it has not been well studied. In this presentation, I will first describe a novel chromosomal-contact-driven method that takes Hi-C chromosomal interaction data as input to reconstruct the 3D structures of chromosomes. The description of the method will be followed by a live video demonstrating how the 3D shape of a chromosome is constructed from the Hi-C data of human B-cells. I will then show that the 3D chromosomal structures reconstructed from the Hi-C data of human B-cells not only fit the observed Hi-C contact data and some known chromatin organization features well, but also accurately predict new Hi-C contacts in a validation test. Finally, I will describe how to assemble chromosomal structures into the 3D shape of the whole genome and discuss its dynamics.

Presenting author: Quaid Morris, University of Toronto, Canada
Date: Tuesday, July 15, 12:00 pm - 12:25 pm
Room: 312

Additional authors:
Wei Jiao, Ontario Institute for Cancer Research, Canada
Shankar Vembu, University of Toronto, Canada
Amit Deshwar, University of Toronto, Canada
Lincoln Stein, Ontario Institute for Cancer Research, Canada

Area Session Chair: Toni Kazic

Presentation Overview:
Tumors often contain multiple, genetically diverse subclonal populations of cells. To aid in the identification of driver mutations and improve understanding of tumor development, there is considerable interest in reconstructing the evolutionary history of these subclonal populations. I will describe when it is possible to perform this reconstruction using only the allelic frequencies of individual ‘simple somatic mutations’ (SSMs; i.e., single nucleotide variants or small indels) from one or more tumor samples. I will also describe a new model, PhyloSub, that performs this reconstruction automatically. PhyloSub uses Bayesian inference, so it explicitly represents its uncertainty when multiple phylogenies are consistent with the frequency data. PhyloSub shows promising results on real and simulated data, including one example from acute myeloid leukemia where it provides a near-perfect reconstruction of three subclonal populations based on a single set of SSM frequencies.
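Part of the question of when reconstruction is possible can be illustrated with a constraint that frequency-based methods commonly rely on: under the infinite-sites assumption, a parent subclone must contain at least as many cells as all of its child subclones combined. The Python sketch below assumes heterozygous SSMs in diploid, copy-number-neutral regions of a pure tumor sample, so the fraction of cells carrying a mutation is twice its variant allele frequency (VAF). The read counts are made up. When several trees satisfy the constraint, as in this example, the frequencies alone cannot decide between them, which is why representing uncertainty over phylogenies matters.

```python
def cellular_prevalence(alt_reads, total_reads):
    """Fraction of cells carrying a heterozygous SSM in a diploid region: 2 * VAF, capped at 1."""
    return min(1.0, 2.0 * alt_reads / total_reads)

def satisfies_sum_rule(tree, prevalence):
    """Check that each node's prevalence is at least the sum of its children's.

    tree: node -> list of child nodes; prevalence: node -> fraction of cells.
    """
    return all(prevalence[parent] >= sum(prevalence[c] for c in children)
               for parent, children in tree.items())

# Made-up read counts (alt, total) for three SSM clusters.
prev = {m: cellular_prevalence(*reads)
        for m, reads in {"A": (48, 100), "B": (25, 100), "C": (20, 100)}.items()}
branching = {"A": ["B", "C"]}      # B and C are sibling subclones
linear = {"A": ["B"], "B": ["C"]}  # C arose within B
impossible = {"B": ["A"]}          # child larger than parent
print(satisfies_sum_rule(branching, prev), satisfies_sum_rule(linear, prev),
      satisfies_sum_rule(impossible, prev))  # True True False
```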

Highlights Track: Gene Regulation & Transcriptomics
Presenting author: Eric Franzosa, Harvard School of Public Health, United States
Date: Monday, July 14, 11:00 am - 11:25 am
Room: 311

Additional authors:
Xochitl Morgan, Harvard School of Public Health, United States
Nicola Segata, Harvard School of Public Health, United States
Levi Waldron, Harvard School of Public Health, United States
Joshua Reyes, Harvard School of Public Health, United States
Ashlee Earl, The Broad Institute, United States
Georgia Giannoukos, The Broad Institute, United States
Dawn Ciulla, The Broad Institute, United States
Dirk Gevers, The Broad Institute, United States
Matthew Boylan, Division of Gastroenterology, United States
Andrew Chan, Division of Gastroenterology, United States
Jacques Izard, Department of Microbiology, United States
Wendy Garrett, Department of Immunology and Infectious Diseases, United States
Curtis Huttenhower, Harvard School of Public Health, United States

Area Session Chair: Janet Kelso

Presentation Overview:
We have conducted one of the first human microbiome studies in a well-described large prospective cohort incorporating taxonomic, metagenomic, and metatranscriptomic profiling at multiple body sites. Systematic comparison of the gut metagenome and metatranscriptome revealed that a substantial fraction of microbial transcripts were not differentially regulated relative to their genomic abundances. Of the remainder, consistently under-expressed pathways included sporulation and amino acid biosynthesis, while up-regulated pathways included ribosome biogenesis and methanogenesis. Across subjects, metatranscriptional profiles were significantly more individualized than DNA-level functional profiles, indicating subject-specific whole-community regulation. This work also identified a subset of abundant oral microbes that routinely survive transit to the gut but show minimal transcriptional activity there. Together, these results provide a community-wide profile of biomolecular regulatory processes in the gut and validate one of the first protocols appropriate for large-scale functional profiling of the microbiome in human populations.
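Comparing a metatranscriptome with its matching metagenome amounts to asking, for each pathway or gene family, whether its share of transcripts differs from its share of genomic DNA. A minimal Python sketch of that comparison as a log2 RNA/DNA ratio follows; the pseudocount, the category names, and the counts are illustrative assumptions rather than the study's pipeline.

```python
from math import log2

def rna_dna_log_ratios(rna, dna, pseudocount=1e-6):
    """log2 of relative transcript abundance over relative genomic abundance per feature.

    Positive values suggest over-expression relative to gene copy number, negative values
    under-expression; each profile is first normalized to sum to 1.
    """
    rna_total, dna_total = sum(rna.values()), sum(dna.values())
    return {
        feature: log2((rna.get(feature, 0.0) / rna_total + pseudocount)
                      / (dna[feature] / dna_total + pseudocount))
        for feature in dna
    }

# Made-up read counts for three functional categories in one stool sample.
dna = {"ribosome": 10, "sporulation": 10, "other": 20}
rna = {"ribosome": 40, "sporulation": 2, "other": 38}
for feature, ratio in rna_dna_log_ratios(rna, dna).items():
    print(f"{feature}: {ratio:+.2f}")
```

In this toy sample, ribosomal functions are transcribed at roughly twice their genomic share, while sporulation is strongly under-expressed.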

Presenting author: David Kreil, Boku University Vienna, Austria
Date: Monday, July 14, 11:00 am - 11:25 am
Room: 304

Area Session Chair: Bernard Moret

Presentation Overview:
In the US FDA-led SEQC/MAQC-III project, different sequencing platforms were tested at over ten sites using well-established reference RNA samples with built-in truths to assess the discovery and expression-profiling performance of platforms and analysis pipelines. The results demonstrate that novel exon-exon junctions can still be discovered beyond existing comprehensive annotations and at high sequencing depths. Extensive investigations encompassing diverse performance metrics characterizing reproducibility, accuracy, and information content, combined with comparisons to qPCR and microarray platforms, showed that good inter-site and cross-platform concordance for differentially expressed genes is achievable, which is particularly critical in clinical and regulatory settings. In general, however, performance is application-, platform-, and pipeline-dependent, with transcript-level profiling more strongly affected. Together with data from applications of RNA-Seq to several preclinical and clinical problems, the complete SEQC data sets comprise >100 billion reads (10 Tb) and provide a unique resource for testing future developments of RNA-Seq.

Presenting author: Harmen Bussemaker, Columbia University, United States
Date: Monday, July 14, 2:10 pm - 2:35 pm
Room: 304

Additional authors:
Allan Lazarovici, Columbia University, United States
Tianyin Zhou, University of Southern California, United States
Anthony Shafer, University of Washington, United States
Ana Carolina Dantas Machado, University of Southern California, United States
Richard Sandstrom, University of Washington, United States
Peter Sabo, University of Washington, United States
Yan Lu, University of Southern California, United States
Remo Rohs, University of Southern California, United States
John Stamatoyannopoulos, University of Washington, United States

Area Session Chair: Michal Linial

Presentation Overview:
We have uncovered a novel and general mechanism by which cytosine methylation can dramatically strengthen specific protein-DNA interactions. By analyzing DNase I digests of purified human genomic DNA, we discovered that (i) the cleavage rate varies over a thousand-fold range depending on the surrounding sequence, and (ii) cleavage near CpG dinucleotides is ten-fold higher when the cytosine is methylated. Combining all-atom computer simulation predictions of DNA shape with statistical analysis of massively parallel sequencing data, we found a unified explanation for these phenomena: cytosine methylation narrows the DNA minor groove, which in turn strengthens interactions with positively charged amino acid side chains. Such minor groove contacts occur for a wide range of transcription factors, as well as nucleosomes. The novel structural mechanism put forward in this study therefore has the potential to significantly deepen our understanding of how epigenetic information is "read" by the cell.
TOP

Presenting author: Erez Levanon, Bar-Ilan University, Israel
Date:Monday, July 14 3:10 pm - 3:35 pmRoom: 304

Additional authors:
Yishay Pinto, Bar-Ilan University, Israel
Haim Cohen, Bar-Ilan University, Israel
Lily Bazk, Bar-Ilan University, Israel
Ami Haviv, Bar-Ilan University, Israel
Michal Barak, Bar-Ilan University, Israel
Jasmine Jacob-Hirsch, Bar-Ilan University, Israel
Patricia Deng, Stanford University, United States
Rui Zhang, Stanford University, United States
Jin Billy Li, Stanford University, United States
Gidi Rechavi, Chaim Sheba Medical Center, Israel

Area Session Chair: Michal Linial

Presentation Overview:
RNA molecules transmit the information encoded in the genome and generally reflect its content. Adenosine-to-inosine RNA editing by ADAR proteins converts a genomically encoded adenosine into inosine. It is known that most editing in humans takes place in the primate-specific Alu sequences, but the extent of this phenomenon is not yet clear. Here, we analyzed large-scale RNA-seq data and detected ∼1.6 million editing sites. As detection sensitivity increases with sequencing coverage, we performed ultradeep sequencing of selected Alu sequences and showed that the scope of editing is much larger than anticipated. We found that virtually all adenosines within Alu that form double-stranded RNA undergo editing, although most sites exhibit editing at only low levels. We estimate that there are over 100 million human Alu editing sites, located in the majority of human genes. These findings set the stage for exploring how this primate-specific massive diversification of the transcriptome is utilized.
TOP

Presenting author: Steve Lianoglou, Memorial Sloan Kettering Cancer Center, United States
Date:Monday, July 14 3:40 pm - 4:05 pmRoom: 304

Additional authors:
Christina Leslie, Memorial Sloan Kettering Cancer Center, United States
Julie Yang, Memorial Sloan Kettering Cancer Center, United States
Christine Mayr, Memorial Sloan Kettering Cancer Center, United States
Vidur Garg, Memorial Sloan Kettering Cancer Center, United States

Area Session Chair: Michal Linial

Presentation Overview:
More than half of human genes use alternative cleavage and polyadenylation (ApA) to generate mRNA transcripts that differ in the lengths of their 3′ untranslated regions (UTRs), thus altering the post-transcriptional fate of the message and likely the protein output. We developed a sequencing method called 3′-seq to quantitatively map the 3′ ends of the transcriptome of diverse human tissues and isogenic transformation systems. We found that most tissue-restricted genes have single 3′ UTRs, whereas most ubiquitously transcribed genes generate multiple 3′ UTRs. During transformation and differentiation, single-UTR genes change their mRNA abundance levels, while multi-UTR genes typically change 3′ UTR isoform ratios to achieve tissue specificity. However, these two regulatory programs target genes that function in the same pathways and processes that characterize the new cell type. Finally, tissue-specific usage of ApA sites appears to be a mechanism for changing the landscape targetable by ubiquitously expressed microRNAs.
TOP

Presenting author: Valentina Boeva, Institut Curie, France
Date:Monday, July 14 3:40 pm - 4:05 pmRoom: 302

Additional authors:
Haitham Ashoor, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Aurelie Herault, UMR 144 CNRS, Subcellular Structure and Cellular Dynamics, France
Aurelie Kamoun, Institut Curie, France
Francois Radvanyi, Institut Curie, France
Vladimir B. Bajic, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Emmanuel Barillot, Institut Curie, Mines ParisTech, France

Area Session Chair: Dietlind Gerloff

Presentation Overview:
Changes in gene expression in cancer cells are usually associated with specific changes in epigenetic profiles (e.g., histone modification profiles). In order to characterize these changes, ChIP-seq experiments are often employed. Our recent work (Ashoor et al., 2013) has demonstrated that for detection of histone modifications in cancer cells one should apply specific methods that take into account possible DNA copy number aberrations. We apply the method we developed for the detection of histone modifications to reanalyze ENCODE ChIP-seq datasets generated for cancer cell lines. We show that current ENCODE histone modification profiles and called regions with histone modifications have a systematic copy number bias: irrespective of the particular histone modification, regions of genomic gain tend to contain more called histone modifications than regions of genomic loss. Our results suggest that the ENCODE cancer cell datasets should be reanalyzed in order to eliminate the copy number bias we have observed.
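The copy number bias described above can be checked with a simple diagnostic: count called regions per megabase within segments of each copy-number state. The sketch below is only that diagnostic, not the copy-number-aware detection method of Ashoor et al. (2013); the `Segment` type, the state labels, and the midpoint-overlap rule are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A copy-number segment (hypothetical representation, half-open coordinates)."""
    chrom: str
    start: int
    end: int
    state: str  # e.g. "gain", "neutral" or "loss"


def regions_per_mb(segments, regions):
    """Return called regions per megabase for each copy-number state.

    Each called region (chrom, start, end) is assigned to the segment that
    contains its midpoint; regions outside every segment are ignored.
    Segments are assumed to have non-zero length.
    """
    lengths = {}
    counts = {}
    for seg in segments:
        lengths[seg.state] = lengths.get(seg.state, 0) + (seg.end - seg.start)
        counts.setdefault(seg.state, 0)
    for chrom, start, end in regions:
        mid = (start + end) // 2
        for seg in segments:
            if seg.chrom == chrom and seg.start <= mid < seg.end:
                counts[seg.state] += 1
                break
    return {state: counts[state] / (lengths[state] / 1e6) for state in lengths}


# Toy input: a 1 Mb gained segment followed by a 2 Mb lost segment.
segments = [
    Segment("chr1", 0, 1_000_000, "gain"),
    Segment("chr1", 1_000_000, 3_000_000, "loss"),
]
regions = [("chr1", 100, 600), ("chr1", 200_000, 201_000),
           ("chr1", 900_000, 900_400), ("chr1", 2_500_000, 2_501_000)]
density = regions_per_mb(segments, regions)
print(density)
```

Here three of the four toy regions fall in the gained segment and one in the lost segment, giving 3.0 versus 0.5 regions per Mb. In real data, a gain-to-loss density ratio well above 1 across many histone marks would be consistent with the systematic bias reported above, although biological differences between regions could also contribute.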
TOP

Presenting author: Michael Brent, Washington University, United States
Date:Tuesday, July 15 10:30 am - 10:55 amRoom: 304

Area Session Chair: Robert F. Murphy

Presentation Overview:
I will present an algorithm, NetProphet, for inferring transcriptional regulatory networks from expression profiling of transcription factor (TF) deletion mutants. I will then show that a network constructed from this type of expression data identifies direct binding targets more accurately than one constructed from a large chromatin immunoprecipitation (ChIP) data set. However, ChIP networks contain many edges with no evidence of functional effect on target expression levels; in our network, every edge is functional. Furthermore, gene expression experiments are much easier, more reliable, and higher throughput than ChIP experiments. We conclude that gene expression, not ChIP, is currently the optimal method for network reconstruction. Finally, we show some new biological discoveries we have made with NetProphet, and describe a large deletion and expression profiling experiment that is being driven by NetProphet. TFs likely to participate in a specific biological process are targeted for deletion using a method we call PhenoProphet.
TOP

Presenting author: Jeroen de Ridder, Delft University of Technology, Netherlands
Date:Tuesday, July 15 11:00 am - 11:25 amRoom: 304

Additional authors:
Johann de Jong, Netherlands Cancer Institute, Netherlands
Lodewyk Wessels, Netherlands Cancer Institute, Netherlands
Sepideh Babaei, Delft University of Technology, Netherlands
Marcel Reinders, Delft University of Technology, Netherlands
Waseem Akhtar, Netherlands Cancer Institute, Netherlands

Area Session Chair: Robert F. Murphy

Presentation Overview:
The ability of retroviruses and transposons to insert their genetic material into host DNA makes them widely used tools for cancer gene discovery and gene therapy. These integrating elements have distinct integration biases. To study these biases, we generated very large datasets of ∼120,000 to ∼180,000 unselected genomic integrations for three types of integrating elements. We overlaid these integration profiles with ∼80 (epi)genomic features to generate bias maps at both local and genome-wide scales. Moreover, we overlaid a large collection of retroviral cancer-causing insertions with genome-wide chromosome conformation capture (Hi-C) data. This enables exploration of 3D hot spots of recurrent mutations that lie in spatial proximity to putative cancer genes. Taken together, our results provide an assessment of integration bias at unprecedented resolution and new insights into the mechanisms through which retroviral integrations deregulate cellular processes in cancer cells.
TOP

Highlights Track: Mass Spectrometry & Proteomics
Presenting author: Nicholas Gauthier, Memorial Sloan Kettering Cancer Center, United States
Date:Monday, July 14 11:00 am - 11:25 amRoom: 312

Additional authors:
Boumediene Soufi, University of Tübingen, Germany
William Walkowicz, Memorial Sloan Kettering Cancer Center, United States
Virginia Pedicord, Memorial Sloan Kettering Cancer Center, United States
Konstantinos Mavrakis, Memorial Sloan Kettering Cancer Center, United States
Boris Macek, University of Tübingen, Germany
Chris Sander, Memorial Sloan Kettering Cancer Center, United States
Martin Miller, Memorial Sloan Kettering Cancer Center, United States

Area Session Chair: Robert F. Murphy

Presentation Overview:
Tissue development, homeostasis, and pathogenesis involve complex signaling between many cell types through both secreted factors and direct cell-cell contact. We report a new technique to selectively and continuously label the proteomes of individual cell types in co-culture, named cell type-specific labeling using amino acid precursors (CTAP). In short, mammalian cell expression of exogenous amino acid biosynthesis enzymes from lower organisms allows specific populations of cells to produce their own supply of amino acids from supplemented amino acid precursors. The conversion of heavy isotope-labeled precursors to heavy labeled amino acids is restricted to enzyme-expressing populations, providing a way to genetically control protein labeling. Using quantitative mass spectrometry, we demonstrate the method’s ability to differentiate the cell-of-origin of intra- and intercellular proteins derived from multicellular cultures. Linking proteins to their cellular source using CTAP facilitates cell-cell communication studies and the discovery of cell type-specific biomarkers.
TOP

Presenting author: Zia Khan, University of Maryland, United States
Date:Monday, July 14 12:00 pm - 12:25 pmRoom: 311

Additional authors:
Michael Ford, MS Bioworks, LLC, United States
Darren Cusanovich, University of Chicago, United States
Amy Mitrano, University of Chicago, United States
Jonathan Prichard, Stanford University, United States
Yoav Gilad, University of Chicago, United States

Area Session Chair: Janet Kelso

Presentation Overview:
Due to the technical and computational challenges of conducting comparative, genome-scale proteomics, essentially all studies of gene regulatory evolution across primates and other mammals have focused on mRNA levels rather than protein levels. Yet proteins perform much of the work of the cell and are subject to regulation not revealed by mRNA levels alone. Using quantitative mass spectrometry and novel computational analysis methods, we obtained thousands of comparative mRNA and protein expression measurements from human, chimpanzee, and rhesus macaque lymphoblastoid cell lines. We used data from all three species to identify genes whose regulation might have evolved under natural selection. Considered jointly, our data also allowed us to identify genes in which lineage-specific changes might specifically affect post-transcriptional or post-translational regulation. Our analyses indicate that, on an evolutionary timescale, there is surprising flexibility in primate mRNA levels, as these changes are often either buffered or compensated for at the protein level.
TOP

Highlights Track: Protein Interactions & Molecular Networks
Presenting author: David Amar, Tel Aviv University, Israel
Date:Sunday, July 13 11:30 am - 11:55 amRoom: 311

Additional authors:
Ron Shamir, Tel Aviv University, Israel

Area Session Chair: Terry Gaasterland

Presentation Overview:
We developed a method that takes as input two types of gene interactions and constructs a summary module map, which integrates the two sources. The presentation will start with a thorough introduction to the concept of module maps as tools for summarizing heterogeneous interaction networks. Then we will discuss extant and novel methods to construct such maps. We shall show that our novel algorithm considerably improves over prior art on simulated and real data. We shall demonstrate the method in analyses of data from three distinct domains: (1) yeast protein-protein interactions and negative genetic interactions, (2) protein-protein interactions and DNA damage-specific positive genetic interactions in yeast, and (3) gene expression profiles of lung cancer patients. Each analysis provides confirmatory and novel insights, and demonstrates the power of module maps for deeper analysis of heterogeneous high throughput data.
TOP

Presenting author: Teresa Przytycka, National Institutes of Health, United States
Date:Sunday, July 13 4:05 pm - 4:30 pmRoom: 311

Additional authors:
DongYeon Cho, NIH, United States

Area Session Chair: Fran Lewitter

Presentation Overview:
One of the major obstacles in developing cancer treatment is cancer heterogeneity. Heterogeneity of genetic and epigenetic alterations leads to heterogeneity in gene expression, making the discovery of genetic drivers, and of key genes dysregulated by their aberrations, very challenging.
Pathway-centric approaches have emerged as methods that can empower studies of cancer heterogeneity. I will describe two approaches we have recently developed. First, combining the utility of algorithmic techniques with the power of network-centric approaches, we designed a novel approach that allows unsupervised detection of subnetworks that are dysregulated in a subgroup of patients. The second, complementary approach incorporates topic modeling and uses a mixture model. Our model is based on two components: (i) a measure of phenotypic similarity between patients, and (ii) a list of features, that is, possible disease causes such as mutations and copy number variations. This work complements the appreciation of cancer diversity with the ability to represent it.
TOP

Presenting author: Martin Miller, Memorial Sloan Kettering Cancer Center, United States
Date:Monday, July 14 12:00 pm - 12:25 pmRoom: 302

Additional authors:
Evan Molinelli, Memorial Sloan Kettering Cancer Center, United States
Jayasree Nair, Memorial Sloan Kettering Cancer Center, United States
Tahir Sheikh, Memorial Sloan Kettering Cancer Center, United States
Rita Samy, Memorial Sloan Kettering Cancer Center, United States
Xiaohong Jing, Memorial Sloan Kettering Cancer Center, United States
Qin He, Memorial Sloan Kettering Cancer Center, United States
Anil Korkut, Memorial Sloan Kettering Cancer Center, United States
Aimee Crago, Memorial Sloan Kettering Cancer Center, United States
Samuel Singer, Memorial Sloan Kettering Cancer Center, United States
Gary Schwartz, Memorial Sloan Kettering Cancer Center, United States
Chris Sander, Memorial Sloan Kettering Cancer Center, United States

Area Session Chair: Yanay Ofran

Presentation Overview:
In this study, entitled “Drug synergy screen and network modeling in dedifferentiated liposarcoma identifies CDK4 and IGF1R as synergistic drug targets”, we have successfully deployed a powerful perturbation-based systems biology approach to discover and characterize drug combinations with translational relevance for cancer treatment. We demonstrate its applicability in dedifferentiated liposarcoma via the discovery of a synergistic IGF1R and CDK4 combination therapy, which we will further test in a clinical setting. From a drug combination screen with 14 anti-cancer drugs, proteomic and phenotypic response profiles serve as input for our de novo network inference method that is subsequently used to predict pathway-based mechanisms of drug synergy. We believe that this study is highly relevant for the interdisciplinary and systems biology-oriented participants of the ISMB conference. We anticipate that such integrated approaches to combinatorial therapeutics will be widely used and provide opportunities to bridge basic cancer research and clinical drug development.
TOP

Presenting author: Emmanuel Barillot, Institut Curie, France
Date:Monday, July 14 12:00 pm - 12:25 pmRoom: 312

Additional authors:
Inna Kuperstein, Institut Curie, France
David Cohen, Institut Curie, France
Stuart Pook, Institut Curie, France
Eric Viara, Institut Curie, France
Laurence Calzone, Institut Curie, France
Emmanuel Barillot, Institut Curie, France
Andrei Zinovyev, Institut Curie, France

Area Session Chair: Robert F. Murphy

Presentation Overview:
Biological knowledge can be systematically represented in a computer-readable form as a comprehensive map of molecular interactions. NaviCell is a web-based environment for exploiting large maps of molecular interactions created in CellDesigner. NaviCell combines three essential features: (1) efficient map browsing based on Google Maps; (2) semantic zooming for viewing different levels of detail or abstraction of the map; and (3) an integrated web-based blog for collecting community feedback. NaviCell can be used to study molecular entities of interest in the context of signaling pathways and crosstalk between pathways within a global signaling network. Thanks to its embedded blogging system, it greatly facilitates the curation, maintenance, and updating of comprehensive maps of molecular interactions in an interactive and user-friendly fashion. In addition, NaviCell provides tools for integrating omics data and for overlaying, visualizing, and analyzing several types of data in the context of signaling networks.
TOP

Presenting author: Jaques Reifman, U.S. Army Medical Research and Materiel Command, United States
Date:Tuesday, July 15 10:30 am - 10:55 amRoom: 311

Additional authors:
Vesna Memisevic, U.S. Army Medical Research and Materiel Command, United States
Nela Zavaljevski, U.S. Army Medical Research and Materiel Command, United States
Rembert Pieper, J. Craig Venter Institute, United States
Seesandra Rajagopala, J. Craig Venter Institute, United States
Keehwan Kwon, J. Craig Venter Institute, United States
Katherine Townsend, J. Craig Venter Institute, United States
Chenggang Yu, U.S. Army Medical Research and Materiel Command, United States
Xueping Yu, U.S. Army Medical Research and Materiel Command, United States
David DeShazer, U.S. Army Medical Research Institute of Infectious Diseases, United States
Jaques Reifman, U.S. Army Medical Research and Materiel Command, United States
Anders Wallqvist, U.S. Army Medical Research and Materiel Command, United States

Area Session Chair: Scott Markel

Presentation Overview:
Bacterial proteins required for virulence, i.e., virulence factors, are a key component of bacterial pathogenicity, as they control and promote pathogenic infection and intracellular survival. Here, we present a combined in silico, in vitro, and in vivo strategy to identify and characterize novel virulence factors of Burkholderia mallei, an infectious intracellular pathogen and the causative agent of glanders. First, we used bioinformatics approaches to identify 49 putative virulence factors involved in B. mallei pathogenicity. Using yeast two-hybrid assays against normalized whole human and whole murine proteome libraries, we identified interactions between each of the putative virulence factors and host proteins. The analysis of these interactions helped us identify and characterize three novel B. mallei virulence factors, as well as host processes and pathways that can be exploited for drug and vaccine design. Finally, using murine aerosol challenge model experiments, we verified that mutants lacking the three novel virulence factors were indeed attenuated in virulence.
TOP

Presenting author: Haiyuan Yu, Cornell University, United States
Date:Tuesday, July 15 12:00 pm - 12:25 pmRoom: 311

Additional authors:
Yu Guo, Cornell University, United States
Jishnu Das, Cornell University, United States
Hao Ran Lee, Cornell University, United States
Xiaomu Wei, Cornell University, United States
Jin Liang, Cornell University, United States
Robert Fragoza, Cornell University, United States
Adithya Sagar, Cornell University, United States
Xiujuan Wang, Cornell University, United States
Matthew Mort, Cardiff University, United Kingdom
Peter Stenson, Cardiff University, United Kingdom
David Cooper, Cardiff University, United Kingdom
Andrew Grimson, Cornell University, United States
Steven Lipkin, Weill Cornell Medical College, United States
Andrew Clark, Cornell University, United States

Area Session Chair: Toni Kazic

Presentation Overview:
To better understand the molecular mechanisms and genetic basis of human disease, we combined the massive scale of network systems biology with the supreme resolution of traditional structural biology to generate the first comprehensive atomic-resolution interactome network, comprising 3,398 interactions between 2,890 proteins, with structurally defined interface residues for each interaction. We found that disease mutations are significantly enriched both among interface residues and among non-interface residues within the same domains, contradicting the previous assumption that only a few interface residues are mutation hot spots for disease. We further classified 94,476 disease-associated mutations according to their inheritance modes and found that the widely accepted “guilt-by-association” principle does not apply to dominant mutations. Furthermore, recessive truncating mutations on the same interface are much more likely to cause the same disease, even if they are close to the N-terminus of the protein, indicating that a significant fraction of truncating mutations can generate functional protein products.
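Enrichment of disease mutations among interface residues, as reported above, is commonly assessed with a one-sided hypergeometric (Fisher's exact) test on counts. The sketch below computes that upper-tail probability with the standard library only; it is not necessarily the statistic used in the study, and the function and argument names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(k, n_drawn, n_category, n_total):
    """One-sided hypergeometric p-value P(X >= k).

    n_total:    all residues considered (e.g. residues in the interacting domains)
    n_category: residues in the category of interest (e.g. interface residues)
    n_drawn:    residues carrying at least one disease mutation
    k:          mutated residues that fall in the category
    """
    if not (0 <= n_category <= n_total and 0 <= n_drawn <= n_total):
        raise ValueError("inconsistent counts")
    upper = min(n_category, n_drawn)
    # Sum the probabilities of observing k or more category members among the drawn residues.
    tail = sum(
        comb(n_category, x) * comb(n_total - n_category, n_drawn - x)
        for x in range(max(k, 0), upper + 1)
    )
    return tail / comb(n_total, n_drawn)


print(hypergeom_enrichment_p(5, 5, 5, 10))
```

For example, if all 5 mutated residues fall among the 5 interface residues of a 10-residue toy protein, the p-value is 1/252 (about 0.004). Working with exact integer binomial coefficients avoids underflow in the individual terms, although for proteome-scale counts a library routine such as SciPy's hypergeometric survival function would be faster.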
TOP

    Cancelled
Presenting author: Luay Nakhleh, Rice University, United States
Date:Tuesday, July 15 2:00 pm - 2:25 pmRoom: 311

Area Session Chair: Cenk Sahinalp

Presentation Overview:
In this talk, we discuss novel studies of the evolution of regulatory networks by tightly connecting them to the underlying genomes and shedding "the light of evolution" on the combined genome-network genotype. In particular, we conduct extensive population genetic simulations and show how network motifs arise due to neutral evolutionary forces when accounting for genomic features. Further, we use data on whole-genome duplication pairs in yeast to estimate the rate of evolution of protein interactions.
TOP

Highlights Track: Protein Structure & Function
Presenting author: Inbal Sela-Culang, Bar-Ilan University, Israel
Date:Sunday, July 13 3:05 pm - 3:30 pmRoom: 302

Additional authors:
Vered Kunik, Bar Ilan University, Israel
Anat Burkovitz, Bar Ilan University, Israel
Guy Nimrod, Bar Ilan University, Israel
Mohammed Rafii-El-Idrissi Benhnia, La Jolla Institute for Allergy and Immunology, United States
Michael H. Matho, La Jolla Institute for Allergy and Immunology, United States
Thomas Kaever, La Jolla Institute for Allergy and Immunology, United States
Matt Maybeno, La Jolla Institute for Allergy and Immunology, United States
Andrew Schlossman, La Jolla Institute for Allergy and Immunology, United States
Dirk Zajonc, La Jolla Institute for Allergy and Immunology, United States
Shane Crotty, La Jolla Institute for Allergy and Immunology, United States
Bjoern Peters, La Jolla Institute for Allergy and Immunology, United States
Sheng Li, University of Texas Health Science Center, United States
Yan Xiang, University of Texas Health Science Center, United States
Yanay Ofran, Bar-Ilan University, Israel

Area Session Chair: Toni Kazic

Presentation Overview:
Antibodies (Abs) must bind indistinct patches on proteins that attempt to escape recognition. They must be able to recognize virtually any surface while strictly maintaining their own fold. Little is known about the mechanisms that allow Abs to do this. Although most drugs in clinical development are Abs, there is currently no simple way to determine experimentally or computationally what exactly they bind.
We will review a series of studies that revealed key mechanisms enabling Abs to perform these tasks. We will present a novel prediction approach that uses these findings, combined with simple competition assays, to predict where on an antigen (Ag) a given Ab will bind. The accuracy of these predictions is verified experimentally using crystallography and other methods. To conclude, we will present further examples and discuss the power of combining sophisticated predictions with simple experiments.
TOP

Presenting author: Yuanfang Guan, University of Michigan, United States
Date:Monday, July 14 12:00 pm - 12:25 pmRoom: 304

Additional authors:
Hongdong Li, University of Michigan, United States
Rajasree Menon, University of Michigan, United States
Yuchen Wen, University of Michigan, United States
Gilbert S. Omenn, University of Michigan, United States
Matthias Kretzler, University of Michigan, United States
Yuanfang Guan, University of Michigan, United States

Area Session Chair: Bernard Moret

Presentation Overview:
Integrating large-scale functional genomic data has significantly accelerated our understanding of gene functions. However, no algorithm has been developed to differentiate functions for isoforms of the same gene using high-throughput genomic data. This is because standard supervised learning requires ‘ground-truth’ functional annotations, which are lacking at the isoform level. To address this challenge, we developed a generic framework that interrogates public RNA-seq data at the transcript level to differentiate functions for alternatively spliced isoforms. For a specific function, our algorithm identifies the ‘responsible’ isoform(s) of a gene and generates classifying models at the isoform level instead of at the gene level. We identified genes in the mouse whose isoforms are predicted to have disparate functionalities and experimentally validated the ‘responsible’ isoforms using data from mammary tissue. With protein structure modeling and experimental evidence, we further validated the predicted isoform functional differences for the genes Cdkn2a and Anxa6.
TOP

Presenting author: Rudi Agius, Cancer Research UK, United Kingdom
Date:Tuesday, July 15 11:00 am - 11:25 am Room: 302

Additional authors:
Mieczyslaw Torchala, Cancer Research UK, United Kingdom
Iain H. Moal, Joint BSC-IRB Research Program in Computational Biology, Spain
Juan Fernández-Recio, Joint BSC-IRB Research Program in Computational Biology, Spain
Paul Bates, Cancer Research UK, United Kingdom

Area Session Chair: Predrag Radivojac

Presentation Overview:
Protein-protein interactions vary considerably in their degree of stickiness. Mutations at protein interfaces can alter the interaction between protein pairs, causing them to dissociate faster or slower and reshaping the dynamics of cellular networks. Therefore, calculating and interpreting the effects of mutations that alter the rate of dissociation is critical to our understanding of complex networks and disease. In this work, we exploit the energy and distribution of key binding ‘hotspot’ residues to calculate off-rate changes upon mutation. This enables us to pinpoint the critical regions of stability and how they change for complexes of different sizes. Moreover, we provide a comprehensive map of the key determinants responsible for accurately characterizing different classes of mutations, complexes, and interface regions. This paves the way for more intelligent computational interface-design algorithms and provides new insight into the interpretation of destabilizing mutations involved in complex diseases.
TOP

Presenting author: Ron Unger, Bar Ilan University, Israel
Date:Tuesday, July 15 11:30 am - 11:55 amRoom: 302

Additional authors:
Etai Jacob, Bar-Ilan University, Israel
Amnon Horovitz, Weizmann Institute, Israel

Area Session Chair: Predrag Radivojac

Presentation Overview:
Computational analysis of proteomes in all kingdoms of life reveals a strong tendency for N-terminal domains in two-domain proteins to have shorter sequences than their neighboring C-terminal domains. Given that folding rates are affected by chain length, we asked whether the tendency for N-terminal domains to be shorter than their neighboring C-terminal domains reflects selection for faster-folding N-terminal domains. Calculations of contact order, another predictor of folding rate, provide additional evidence that N-terminal domains tend to fold faster than their neighboring C-terminal domains. A possible explanation for this bias, which is more pronounced in prokaryotes than in eukaryotes, is that faster folding of N-terminal domains reduces the risk of protein aggregation during folding by preventing the formation of non-native interdomain interactions. This explanation is supported by our finding that two-domain proteins with a shorter N-terminal domain are more abundant than those with a shorter C-terminal domain.
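Relative contact order, introduced by Plaxco, Simons and Baker, is the average sequence separation of contacting residues divided by chain length: RCO = (1/(L·N)) Σ ΔS_ij, where L is the number of residues, N the number of contacts and ΔS_ij = j - i the separation of contacting residues i and j; higher values correlate with slower folding of two-state proteins. The original formulation counts atom-level contacts. The sketch below simplifies this to residue-level contacts between C-alpha coordinates, and the 8 Å cutoff and minimum separation of 3 residues are assumptions, not values taken from the study.

```python
import math
from itertools import combinations


def ca_contacts(coords, cutoff=8.0, min_separation=3):
    """Residue pairs (i, j), i < j, whose C-alpha atoms lie within `cutoff` angstroms.

    Pairs closer than `min_separation` in sequence are skipped, since with this
    cutoff i/i+1 and i/i+2 neighbours are always in contact along the backbone.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if j - i >= min_separation and math.dist(a, b) <= cutoff
    ]


def relative_contact_order(contacts, length):
    """RCO = sum of sequence separations / (chain length * number of contacts)."""
    if not contacts:
        raise ValueError("at least one contact is required")
    return sum(j - i for i, j in contacts) / (length * len(contacts))


# Toy 6-residue hairpin on a 3.8 angstrom grid (illustrative coordinates only).
hairpin = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (7.6, 3.8, 0), (3.8, 3.8, 0), (0, 3.8, 0)]
contacts = ca_contacts(hairpin)
print(contacts, relative_contact_order(contacts, len(hairpin)))
```

For the toy hairpin, the four long-range contacts have separations 4, 5, 3 and 4, so RCO = 16/24 ≈ 0.67. Computing this value separately for the N- and C-terminal domains of each two-domain protein would be one way to set up the kind of comparison described above.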
TOP

Highlights Track: Sequence Analysis
Presenting author: Adam Phillippy, National Biodefense Analysis and Countermeasures Center, United States
Date:Sunday, July 13 12:00 pm - 12:25 pmRoom: 304

Additional authors:
Sergey Koren, National Biodefense Analysis and Countermeasures Center, United States
Gregory Harhay, U.S. Department of Agriculture, United States
Timothy Smith, U.S. Department of Agriculture, United States
James Bono, U.S. Department of Agriculture, United States
Dayna Harhay, U.S. Department of Agriculture, United States
Scott McVey, U.S. Department of Agriculture, United States
Diana Radune, National Biodefense Analysis and Countermeasures Center, United States
Nicholas Bergman, National Biodefense Analysis and Countermeasures Center, United States

Area Session Chair: Serafim Batzoglou

Presentation Overview:
The short reads generated by first- and second-generation sequencing often produce highly fragmented assemblies, even for small genomes. Single-molecule sequencing addresses this problem by greatly increasing read length, which simplifies assembly. By analyzing the repeat complexity of 2,267 complete microbial genomes, we have shown that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio sequencing library. This reduces the cost of microbial finishing by an order of magnitude. More recently, we assembled the eukaryotic genomes of Saccharomyces cerevisiae, Arabidopsis thaliana, and Drosophila melanogaster using only PacBio reads. In the case of D. melanogaster, our PacBio Corrected Reads (PBcR) algorithm assembled the genome more completely than the current reference, which involved over a decade of manual finishing. I will present both these past and present results, as well as a new approach for scaling long-read assembly to gigabase-sized genomes.
TOP

Presenting author: Mikael Boden, The University of Queensland, Australia
Date:Sunday, July 13 3:05 pm - 3:30 pmRoom: 304

Additional authors:
Minh Cao, The University of Queensland, Australia
Edward Tasker, Monash University, Australia
Sailaja Vishwanathan, Monash University, Australia
Sridevi Sureshkumar, Monash University, Australia
Sureshkumar Balasubramanian, Monash University, Australia
Kai Willadsen, The University of Queensland, Australia
Michael Imelfort, The University of Queensland, Australia

Area Session Chair: Cenk Sahinalp

Presentation Overview:
Expansion of trinucleotide repeats is known to cause over twenty neurological diseases. While next-generation sequencing technologies offer unprecedented opportunities to assess variation in genomes, they have limitations with regard to repeat regions. We review the options scientists have to estimate length variation of short tandem repeats of biological significance, and to investigate what causes their instability. We present a Bayesian method to statistically detect variation from paired-end sequence data that, for the first time, allows analysis of repeat tracts longer than the read length of current technology. Using strains of A. thaliana, we experimentally validate estimates and recover the only known unstable repeat locus, IIL1. Extensive quantitative comparisons of alternative analysis pipelines provide guidance on the likely outcome in terms of repeat variant calling accuracy.
TOP

Presenting author: Menachem Fromer, Icahn School of Medicine at Mount Sinai, United States
Date: Monday, July 14, 2:10 pm - 2:35 pm
Room: 302

Additional authors:
Shaun Purcell, Icahn School of Medicine at Mount Sinai, United States

Area Session Chair: Dietlind Gerloff

Presentation Overview:
When large next-generation sequencing datasets are used to study disease, the sheer number of neutral genetic variants that could appear relevant lowers the signal-to-noise ratio. In this talk, we will discuss how we dealt with this problem in two recently published studies of schizophrenia. In the first, ~2,500 schizophrenia cases and ~2,500 matched controls underwent whole-exome sequencing. Neutral mutations vastly outnumber the variants more likely to be related to disease; we focused on the latter through frequency filtering, assessment of biological impact, and pathway analysis. In the second study, we searched for de novo mutations in father-mother-child trios to find mutations not yet subject to selective pressure. Here, the overwhelming majority of candidate mutations (those that seem to arise anew in the children) are false positives. We carefully sifted through these candidates using sequencing and other metrics to find the genuine mutations most likely to be associated with disease.
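The frequency and impact filtering described above can be sketched as follows. This is only an illustration under stated assumptions: the `Variant` fields, the consequence labels in `DAMAGING`, the 0.1% frequency cutoff, and the gene names are all hypothetical, and the gene-set count is a crude stand-in for a real pathway analysis.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    pop_freq: float   # allele frequency in a reference population (hypothetical field)
    consequence: str  # predicted effect, e.g. "missense" or "synonymous" (hypothetical labels)


# Hypothetical set of consequences treated as likely damaging.
DAMAGING = {"nonsense", "frameshift", "splice_site", "missense"}


def filter_candidates(variants, max_freq=0.001):
    """Keep rare variants whose predicted consequence is likely damaging."""
    return [v for v in variants if v.pop_freq <= max_freq and v.consequence in DAMAGING]


def genes_hit(variants, gene_set):
    """Count variants falling in a gene set (a crude stand-in for pathway analysis)."""
    return sum(1 for v in variants if v.gene in gene_set)
```

Applied to a common missense variant, a rare synonymous variant, and two rare loss-of-function variants, only the last two survive the filter.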
TOP

Presenting author: Wen-Yu Chung, National Kaohsiung University of Applied Sciences, Taiwan
Date: Monday, July 14, 2:40 pm - 3:05 pm
Room: 304

Additional authors:
Robert Schmitz, The Salk Institute for Biological Studies, United States
Tanya Biorac, Life Technologies Corp.-Ion Torrent, United States
Delia Ye, Life Technologies Corp.-Ion Torrent, United States
Miroslav Dudas, Life Technologies Corp.-Ion Torrent, United States
Gavin Meredith, Life Technologies Corp.-Ion Torrent, United States
Christopher Adams, Life Technologies Corp.-Ion Torrent, United States
Joseph Ecker, The Salk Institute for Biological Studies, United States
Michael Zhang, University of Texas at Dallas, United States

Area Session Chair: Michal Linial

Presentation Overview:
Whole-genome DNA methylation sequencing provides both methylation patterns and genetic information. We used base-resolution methylomes to directly identify allelic linkage between DNA methylation and genomic variants. We further extended these paired associations to construct hepitypes by simultaneously phasing genotype and methylation. With this approach, the sequencing reads provide direct statistics on the interdependence between methylcytosines and nucleotide variants; consequently, detailed patterns of genetic and epigenetic variation can be readily inferred from the data. Moreover, the analysis is not limited to known single nucleotide variants. In addition to imprinted regions and SNV-in-CpG sites, we show numerous DNA methylation sites associated with cis-regulatory sequences. We extended this strategy to incorporate multiple nucleotide and methylation sites and ranked hepitypes by their observed frequency. The top-ranked hepitypes indicate that methylated sites are often observed on the same allele.
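A minimal sketch of ranking two-site hepitypes by frequency is shown below. It assumes that base and methylation calls have already been made per read; it ignores the bisulfite C/T ambiguity, where a C-to-T SNV can look like an unmethylated cytosine. The read representation and function name are hypothetical.

```python
from collections import Counter


def rank_hepitypes(reads, snv_pos, cpg_pos):
    """Rank (allele, methylation state) combinations observed on the same read.

    reads: list of dicts mapping a genomic position to the state observed on that read.
    At snv_pos the state is a base (e.g. "A" or "G"); at cpg_pos it is "M"
    (methylated) or "U" (unmethylated). Only reads covering both sites count.
    """
    counts = Counter(
        (read[snv_pos], read[cpg_pos])
        for read in reads
        if snv_pos in read and cpg_pos in read
    )
    return counts.most_common()  # most frequent hepitype first
```

If three reads carry allele A with a methylated CpG, two carry G unmethylated, and one carries A unmethylated, the ranking is `[(("A", "M"), 3), (("G", "U"), 2), (("A", "U"), 1)]`, suggesting methylation linked to the A allele.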
TOP

Presenting author: Rolf Backofen, University of Freiburg, Germany
Date: Tuesday, July 15, 3:00 pm - 3:25 pm
Room: 311

Additional authors:
Sita Lange, University of Freiburg, Germany
Daniel Maticzka, University of Freiburg, Germany
Fabrizio Costa, University of Freiburg, Germany

Area Session Chair: Cenk Sahinalp

Presentation Overview:
This work addresses one of today's most active topics in biology: the analysis of RNA-protein interactions. Recent studies revealed that hundreds of RNA-binding proteins (RBPs) regulate a plethora of post-transcriptional processes. The gold standard for identifying RBP targets is experimental CLIP-seq. However, a large number of binding sites remain unidentified, which is a major yet underestimated problem. The reason is that CLIP-seq is sensitive to expression levels: CLIP-seq data for a given protein in liver cells, for example, cannot be used to infer its targets in kidney cells.

We provide a solution by learning an accurate protein-binding model with an efficient graph-kernel approach that learns sequence-structure properties from several thousand binding sites. Transcripts targeted in any other cell type can then be identified with high specificity. For example, we show that the up-regulation observed in an AGO knockdown cannot be explained with existing AGO CLIP-seq data, but can be explained using our predictions.
TOP

Highlights Track: Text Mining
Presenting author: Ashutosh Malhotra, Fraunhofer Institute for Algorithms and Scientific Computing, Germany
Date: Sunday, July 13, 11:00 am - 11:25 am
Room: 311

Additional authors:
Martin Hofmann-Apitius, Fraunhofer Institute for Algorithms and Scientific Computing, Germany
Erfan Younesi, Fraunhofer Institute for Algorithms and Scientific Computing, Germany

Area Session Chair: Terry Gaasterland

Presentation Overview:
Automated information extraction and knowledge acquisition technology (“text mining”) has the potential to reduce manual reading and human curation efforts in the construction of knowledge bases. Particularly for complex, mostly idiopathic diseases such as Alzheimer’s disease (AD), automatic recognition of stage-specific speculative statements that communicate experimental findings can provide new insights into disease etiology and progression. However, systematically gathering all the scientific speculation that exists in a given context is a non-trivial task and, if done manually, is laborious and time-consuming. This work presents a methodology that demonstrates how a dictionary of speculative patterns (the HypothesisFinder approach), combined with a purpose-built Alzheimer's disease ontology (ADO), enables the collection, interpretation, curation, and discovery of a broad spectrum of knowledge needed for efficient and systematic AD research.
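The general idea of pairing a dictionary of speculative patterns with ontology concepts can be sketched as below. This does not reproduce HypothesisFinder or the ADO: the hedge-cue patterns and the three-term `CONCEPTS` set are hypothetical placeholders, and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical hedge cues; the actual HypothesisFinder dictionary is not reproduced here.
SPECULATIVE_PATTERNS = [
    r"\bmay\b", r"\bmight\b", r"\bsuggests?\b", r"\bwe hypothesi[sz]e\b",
    r"\bit is possible that\b", r"\blikely\b",
]
SPEC_RE = re.compile("|".join(SPECULATIVE_PATTERNS), re.IGNORECASE)

# Hypothetical stand-in for concepts an AD ontology might contain.
CONCEPTS = {"amyloid", "tau", "neuroinflammation"}


def speculative_sentences(text):
    """Return (sentence, matched concepts) for speculative sentences mentioning a concept."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())  # naive sentence splitting
    hits = []
    for s in sentences:
        if SPEC_RE.search(s):
            terms = sorted(
                c for c in CONCEPTS
                if re.search(rf"\b{re.escape(c)}\b", s, re.IGNORECASE)
            )
            if terms:
                hits.append((s, terms))
    return hits
```

On the text "Amyloid deposition may precede tau pathology. Tau was measured in 40 patients. These data suggest a role for neuroinflammation.", the first and third sentences are returned and the purely factual second sentence is skipped.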
TOP

Presenting author: Donald C. Comeau, U.S. National Library of Medicine, United States
Date: Sunday, July 13, 4:35 pm - 5:00 pm
Room: 302

Additional authors:
Rezarta Islamaj Doğan, U.S. National Library of Medicine, United States
Paolo Ciccarese, Harvard University, United States
Kevin Bretonnel Cohen, University of Colorado, United States
Martin Krallinger, Spanish National Cancer Research Centre, Spain
Florian Leitner, Spanish National Cancer Research Centre, Spain
Zhiyong Lu, U.S. National Library of Medicine, United States
Yifan Peng, University of Delaware, United States
Fabio Rinaldi, University of Zurich, Switzerland
Manabu Torii, University of Delaware, United States
Alfonso Valencia, Spanish National Cancer Research Centre, Spain
Karin Verspoor, The University of Melbourne, Australia
Thomas C. Wiegers, North Carolina State University, United States
Cathy H. Wu, University of Delaware, United States
W. John Wilbur, U.S. National Library of Medicine, United States
Donald Comeau, U.S. National Library of Medicine, United States

Area Session Chair: Toni Kazic

Presentation Overview:
After a brief motivation, there will be an overview of the BioC format and its supporting input/output libraries, followed by a summary of the BioC implementations, tools, services, and corpora currently available. Implementations that hold BioC data and read it from and write it back to XML files are available in C++, Go, Java, Python, and Ruby. Online services using the format are available for semantic role labeling, sentence simplification, and entity labeling. Text preprocessing pipelines for sentence segmentation, tokenization, part-of-speech tagging, lemmatization, and parsing are available in C++ and Java. Named entity recognizers are available for diseases, genes, chemicals, species, and mutations. Annotated corpora are available for abbreviation definition detection, disease mentions, protein-protein interaction events, and metabolites. The examples will focus on tools and applications that demonstrate the features and flexibility of BioC.
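To give a feel for the format, the sketch below writes and reads a minimal BioC-style XML collection using only Python's standard `xml.etree.ElementTree`, not any of the BioC libraries mentioned above. The element layout (collection, document, passage, annotation, infon, location) follows our understanding of the BioC DTD and should be checked against the official specification; the source, date, key, and annotation values are made up for illustration.

```python
import xml.etree.ElementTree as ET


def make_collection(doc_id, passage_text, annotations):
    """Build a one-document BioC-style collection.

    annotations: list of (annotation_id, type, offset, length) tuples.
    """
    coll = ET.Element("collection")
    ET.SubElement(coll, "source").text = "example"    # illustrative values only
    ET.SubElement(coll, "date").text = "20140713"
    ET.SubElement(coll, "key").text = "example.key"
    doc = ET.SubElement(coll, "document")
    ET.SubElement(doc, "id").text = doc_id
    passage = ET.SubElement(doc, "passage")
    ET.SubElement(passage, "offset").text = "0"
    ET.SubElement(passage, "text").text = passage_text
    for ann_id, ann_type, off, length in annotations:
        ann = ET.SubElement(passage, "annotation", id=ann_id)
        ET.SubElement(ann, "infon", key="type").text = ann_type
        ET.SubElement(ann, "location", offset=str(off), length=str(length))
        ET.SubElement(ann, "text").text = passage_text[off:off + length]
    return ET.tostring(coll, encoding="unicode")


def read_annotations(xml_string):
    """Parse the XML back into (id, type, offset, length, text) tuples."""
    root = ET.fromstring(xml_string)
    result = []
    for ann in root.iter("annotation"):
        loc = ann.find("location")
        result.append((
            ann.get("id"),
            ann.find("infon[@key='type']").text,
            int(loc.get("offset")),
            int(loc.get("length")),
            ann.find("text").text,
        ))
    return result
```

Round-tripping "BRCA1 mutations cause breast cancer." with a Gene annotation at offset 0 (length 5) and a Disease annotation at offset 22 (length 13) recovers the spans "BRCA1" and "breast cancer".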
TOP

Highlights Track: Other
Presenting author: Steven Brenner, University of California, Berkeley, United States
Date: Monday, July 14, 2:40 pm - 3:05 pm
Room: 302

Additional authors:
Steven Brenner, University of California, Berkeley, United States

Area Session Chair: Dietlind Gerloff

Presentation Overview:
Genome science is reaching a critical juncture. More than 10,000 genetic variants have been associated with traits, enabling breakthroughs in basic biological research and medical applications. However, privacy concerns have excluded most researchers from directly analyzing the vast wealth of human genomic information. Nonetheless, the risk of large data breaches is rising rapidly, and further progress requires ever-larger cohorts. We currently inhibit research without effectively protecting human subjects; the prospects for harm to both individuals and medical research are growing.

An extended discussion of genome leak issues can be found at http://compbio.berkeley.edu/proj/leak/
TOP

Presenting author: Noam Kaplan, University of Massachusetts Medical School, United States
Date: Tuesday, July 15, 3:30 pm - 3:55 pm
Room: 311

Additional authors:
Job Dekker, University of Massachusetts Medical School, United States

Area Session Chair: Cenk Sahinalp

Presentation Overview:
Despite advances in DNA sequencing technologies, assembly of complex genomes remains a major challenge. Surprisingly, the quality of published complex genomes has declined because of the growing use of short-read sequencing.

We have developed a high-throughput scaffolding approach based on the notion that loci near each other in the genomic sequence have a high probability of interacting with each other. We demonstrate that genome-wide in vivo chromatin interaction frequency measurements can serve as proxies for genomic distance, accurately determining the positions of contigs over large distances without requiring any sequence overlap. Furthermore, we demonstrate that our approach can karyotype and scaffold an entire genome de novo. Applying our approach to incomplete regions of the human genome, we predict the positions of 65 previously unplaced contigs, in agreement with alternative methods. Our approach can in principle bridge gaps of any size, and it is simple, robust, scalable, and applicable to any species.
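The core intuition, that contact frequency decays with genomic distance, can be turned into a toy placement rule. The sketch below is not the authors' method: it assumes expected contacts between an unplaced contig and chromosome bin i fall off as a power law, c * (|i - p| + 1) ** -alpha, for a contig located at bin p, fits the scale c by Poisson maximum likelihood, and returns the best-scoring p. The exponent `alpha=1.0` and the function name are hypothetical choices.

```python
import math


def place_contig(counts, alpha=1.0):
    """Return the bin index that best explains an unplaced contig's contact profile.

    counts[i]: Hi-C contacts between the contig and bin i of a chromosome.
    Assumed model: contacts with bin i ~ Poisson(c * (|i - p| + 1) ** -alpha)
    for a contig located at bin p, with c fitted by maximum likelihood.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no contacts observed; the contig cannot be placed")
    best_p, best_ll = None, -math.inf
    for p in range(len(counts)):
        shape = [(abs(i - p) + 1) ** -alpha for i in range(len(counts))]
        c = total / sum(shape)  # Poisson MLE of the scale for this candidate position
        # Poisson log-likelihood, dropping log(k!) terms that do not depend on p
        ll = sum(k * math.log(c * s) - c * s for k, s in zip(counts, shape))
        if ll > best_ll:
            best_p, best_ll = p, ll
    return best_p
```

For a contact profile `[2, 3, 6, 15, 40, 14, 7, 4, 2, 1]`, the contig is placed at bin 4, where contacts peak; multiplying the index by the bin size would give an approximate coordinate.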
TOP