Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


#ISMB2016

Sponsors

Bronze:
F1000
Recursion Pharmaceuticals

Copper:
Iowa State University

General and Travel Fellowship Sponsors:
Seven Bridges GBP GigaScience OverLeaf PLOS Computational Biology BioMed Central 3DS Biovia GenenTech HiTSeq IRB-Group Schrodinger TOMA Biosciences

Highlights Track Presentations

Highlights, Late Breaking Research and Proceedings Track presentations will be presented by Theme.

Attention Conference Presenters: please review the Speaker Information Page.

Presenters' names are in bold (for updates and changes, email steven@iscb.org).

TP001: Robust Detection of Alternative Splicing in a Population of Single Cells
Date: Sunday, July 10 10:10 am - 10:30 am
Room: Northern Hemisphere A1/A2
Topic: GENES
  • Joshua Welch, UNC Chapel Hill, United States
  • Yin Hu, Sage Bionetworks, United States
  • Jan Prins, UNC Chapel Hill, United States

Area Session Chair: Ioannis Xenarios

Presentation Overview:

Single cell RNA-seq data promises to be an invaluable tool for characterizing cellular heterogeneity, but study of alternative splicing in single cells has been limited by the unique challenges of single cell data and lack of suitable analysis methods. We present SingleSplice, which is to our knowledge the first algorithm for identifying alternative splicing in a population of single cells. SingleSplice uses a statistical model trained on the technical noise profile of synthetic spike-in transcripts to identify genes exhibiting biological variation in isoform composition. We applied SingleSplice to data from 279 mouse embryonic stem cells and discovered genes that show significant alternative splicing across the set of cells. A subset of these genes are linked to cell cycle stage, suggesting a novel connection between alternative splicing and the cell cycle. Using SingleSplice, we also characterized the isoform usage heterogeneity of 466 adult and fetal human cortical cells.

TP005: Unexpected Features of the Dark Proteome
Date: Sunday, July 10 10:30 am - 10:50 am
Room: Northern Hemisphere A3/A4
Topic: PROTEINS
  • Nelson Perdigão, Universidade de Lisboa, Portugal
  • Julian Heinrich, CSIRO, Australia
  • Christian Stolte, CSIRO, Australia
  • Kenneth Sabir, Garvan Institute of Medical Research, Australia
  • Michael Buckley, CSIRO, Australia
  • Bruce Tabor, CSIRO, Australia
  • Beth Signal, Garvan Institute of Medical Research, Australia
  • Brian Gloss, Garvan Institute of Medical Research, Australia
  • Christopher Hammang, Garvan Institute of Medical Research, Australia
  • Burkhard Rost, Technische Universität München, Germany
  • Andrea Schafferhans, Technische Universität München, Germany
  • Sean O'Donoghue, CSIRO & Garvan Institute, Australia

Area Session Chair: Lenore Cowen

Presentation Overview:

We surveyed the "dark" proteome - that is, regions of proteins never observed by experimental structure determination and inaccessible to homology modeling. For 546,000 Swiss-Prot proteins, we found that 44-54% of the proteome in eukaryotes and viruses was dark, compared with only 14% in archaea and bacteria. Surprisingly, most of the dark proteome could not be accounted for by conventional explanations, such as intrinsic disorder or transmembrane regions. Nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure. Dark proteins fulfill a wide variety of functions, but a subset showed distinct and largely unexpected features, such as association with secretion, specific tissues, the endoplasmic reticulum, disulfide bonding, and proteolytic cleavage. Dark proteins also had short sequence length, low evolutionary reuse, and few known interactions with other proteins. These results suggest new research directions in structural and computational biology.

TP008: Widespread Expansion of Protein Interaction Capabilities by Alternative Splicing
Date: Sunday, July 10 10:50 am - 11:10 am
Room: Northern Hemisphere A3/A4
Topic: PROTEINS
  • Xinping Yang, Dana-Farber Cancer Institute, United States
  • Jasmin Coulombe-Huntington, McGill University, Canada
  • Shuli Kang, University of California, San Diego, United States
  • Gloria M. Sheynkman, Dana-Farber Cancer Institute, United States
  • Tong Hao, Dana-Farber Cancer Institute, United States
  • Aaron Richardson, Dana-Farber Cancer Institute, United States
  • Song Sun, University of Toronto, Canada
  • Fan Yang, University of Toronto, Canada
  • Yun A. Shen, Dana-Farber Cancer Institute, United States
  • Ryan R. Murray, Dana-Farber Cancer Institute, United States
  • Kerstin Spirohn, Dana-Farber Cancer Institute, United States
  • Bridget E. Begg, Dana-Farber Cancer Institute, United States
  • Miquel Duran-Frigola, Institute for Research in Biomedicine (IRB Barcelona), Spain
  • Andrew MacWilliams, Dana-Farber Cancer Institute, United States
  • Samuel J. Pevzner, Dana-Farber Cancer Institute, United States
  • Quan Zhong, Dana-Farber Cancer Institute, United States
  • Shelly A. Trigg, Dana-Farber Cancer Institute, United States
  • Stanley Tam, Dana-Farber Cancer Institute, United States
  • Lila Ghamsari, Dana-Farber Cancer Institute, United States
  • Nidhi Sahni, Dana-Farber Cancer Institute, United States
  • Song Yi, Dana-Farber Cancer Institute, United States
  • Maria D. Rodriguez, Dana-Farber Cancer Institute, United States
  • Dawit Balcha, Dana-Farber Cancer Institute, United States
  • Guihong Tan, University of Toronto, Canada
  • Michael Costanzo, University of Toronto, Canada
  • Brenda Andrews, University of Toronto, Canada
  • Charles Boone, University of Toronto, Canada
  • Xianghong J. Zhou, University of Southern California, United States
  • Kourosh Salehi-Ashtiani, Dana-Farber Cancer Institute, United States
  • Benoit Charloteaux, Dana-Farber Cancer Institute, United States
  • Alyce A. Chen, Dana-Farber Cancer Institute, United States
  • Michael A. Calderwood, Dana-Farber Cancer Institute, United States
  • Patrick Aloy, Institute for Research in Biomedicine (IRB Barcelona), Spain
  • Frederick P. Roth, University of Toronto, Canada
  • David E. Hill, Dana-Farber Cancer Institute, United States
  • Lilia M. Iakoucheva, University of California, San Diego, United States
  • Yu Xia, McGill University, Canada
  • Marc Vidal, Dana-Farber Cancer Institute, United States

Area Session Chair: Lenore Cowen

Presentation Overview:

While alternative splicing is known to diversify the functional characteristics of some genes, the extent to which protein isoforms globally contribute to functional complexity on a proteomic scale remains unknown. To address this systematically, we cloned full-length open reading frames of alternatively spliced transcripts for a large number of human genes, and combined protein-protein interaction profiling with computer modeling to functionally compare hundreds of protein isoform pairs. The majority of isoform pairs share less than 50% of their interactions. In the global context of interactome network maps, alternative isoforms tend to behave like distinct proteins rather than minor variants of each other. Interaction partners specific to alternative isoforms tend to be expressed in a highly tissue-specific manner and belong to distinct functional modules. Our integrated experimental and computational strategy reveals a widespread expansion of protein interaction capabilities through alternative splicing and suggests that many alternative isoforms are functionally divergent.
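
The "share less than 50% of their interactions" comparison above can be illustrated with a set-overlap calculation. This is a minimal sketch that assumes sharing is measured as the Jaccard index of the two isoforms' partner sets; the abstract does not state the exact metric, and the function name and partner labels are hypothetical.

```python
def shared_interaction_fraction(partners_a: set[str], partners_b: set[str]) -> float:
    """Jaccard index of two interaction-partner sets (assumed sharing metric)."""
    union = partners_a | partners_b
    if not union:
        # Two isoforms with no detected partners share nothing by convention.
        return 0.0
    return len(partners_a & partners_b) / len(union)


# Hypothetical partner sets for two isoforms of the same gene.
isoform_1 = {"P1", "P2", "P3"}
isoform_2 = {"P2", "P3", "P4", "P5"}
fraction = shared_interaction_fraction(isoform_1, isoform_2)
print(f"shared fraction: {fraction:.2f}")
```

Under this assumed definition, a pair counts as sharing less than 50% of its interactions when the returned value is below 0.5.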

TP009: Single molecule-level characterization of bacterial epigenomes, heterogeneity and gene regulation
Date: Sunday, July 10 10:50 am - 11:10 am
Room: Northern Hemisphere E1/E2
Topic: GENES
  • John Beaulaurier, Icahn School of Medicine at Mount Sinai, United States
  • Xue-Song Zhang, New York University Medical School, United States
  • Shijia Zhu, Icahn School of Medicine at Mount Sinai, United States
  • Robert Sebra, Icahn School of Medicine at Mount Sinai, United States
  • Chaggai Rosenbluh, Icahn School of Medicine at Mount Sinai, United States
  • Gintaras Deikus, Icahn School of Medicine at Mount Sinai, United States
  • Nan Shen, Icahn School of Medicine at Mount Sinai, United States
  • Diana Munera, Harvard Medical School, United States
  • Matthew Waldor, Harvard Medical School, United States
  • Andrew Chess, Icahn School of Medicine at Mount Sinai, United States
  • Martin Blaser, New York University Medical School, United States
  • Eric Schadt, Icahn School of Medicine at Mount Sinai, United States
  • Gang Fang, Icahn School of Medicine at Mount Sinai, United States

Area Session Chair: Alex Bateman

Presentation Overview:

Beyond its role in host defense, bacterial DNA methylation also plays important roles in the regulation of gene expression, virulence and antibiotic resistance. Bacterial cells in a clonal population can generate epigenetic heterogeneity to increase population-level phenotypic plasticity. Single molecule, real-time (SMRT) sequencing enables the detection of N6-methyladenine and N4-methylcytosine, two major types of DNA modifications comprising the bacterial methylome. However, existing SMRT sequencing-based methods for studying bacterial methylomes rely on a population-level consensus that lacks the single-cell resolution required to observe epigenetic heterogeneity. Here, we present SMALR (single-molecule modification analysis of long reads), a novel framework for single molecule-level detection and phasing of DNA methylation. Using seven bacterial strains, we show that SMALR yields significantly improved resolution and reveals distinct types of epigenetic heterogeneity. SMALR is a powerful new tool that enables de novo detection of epigenetic heterogeneity and empowers investigation of its functions in bacterial populations.

TP011: Large-scale Text Mining Web Services for Bioinformatics Research
Date: Sunday, July 10 11:40 am - 12:00 pm
Room: Northern Hemisphere A3/A4
Topic: DATA
  • Chih-Hsuan Wei, NCBI, United States
  • Robert Leaman, NCBI, United States
  • Zhiyong Lu, NCBI, United States

Area Session Chair: Lenore Cowen

Presentation Overview:

Processing the biomedical literature with automated tools becomes more important as its growth accelerates. We present NCBI text-mining web services, an online version of our text mining suite for biomedical concept recognition and information extraction. Our service incorporates five state-of-the-art tools we developed previously: DNorm (for diseases), GNormPlus (genes/proteins), SR4GN (species), tmChem (chemicals and drugs), and tmVar (variants). Using our service, users can instantly retrieve results from all five tools for any abstract in PubMed. Users may also process arbitrary text – such as full-text articles or non-PubMed publications – using our asynchronous batch mode, or easily visualize results through our web-based application PubTator. We simplify interoperability by supporting multiple data formats, and handle large requests through a computer cluster to ensure scalability. Our web service is already in wide use, supporting research projects in biocuration, crowdsourcing and translational bioinformatics. The web service is freely available at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#curl

TP012: Genetic Architectures of Quantitative Variation in RNA Editing Pathways
Date: Sunday, July 10 11:40 am - 12:00 pm
Room: Northern Hemisphere E1/E2
Topic: GENES
  • Tongjun Gu, University of Florida, United States
  • Daniel Gatti, The Jackson Laboratory, United States
  • Anuj Srivastava, The Jackson Laboratory, United States
  • Elizabeth Snyder, The Jackson Laboratory, United States
  • Narayanan Raghupathy, The Jackson Laboratory, United States
  • Petr Simecek, The Jackson Laboratory, United States
  • Karen Svenson, The Jackson Laboratory, United States
  • Ivan Dotu, The Jackson Laboratory, United States
  • Jeffrey Chuang, The Jackson Laboratory, United States
  • Mark Keller, University of Wisconsin, United States
  • Alan Attie, University of Wisconsin, United States
  • Robert Braun, The Jackson Laboratory, United States
  • Gary Churchill, The Jackson Laboratory, United States

Area Session Chair: Alex Bateman

Presentation Overview:

RNA editing refers to post-transcriptional processes that alter the base sequence of RNA. Recently, hundreds of new RNA editing targets have been reported. However, the mechanisms that determine the specificity and degree of editing are not well understood. We examined quantitative variation of site-specific editing in a genetically diverse multiparent population, Diversity Outbred mice, and mapped polymorphic loci that alter editing ratios globally for C-to-U editing and at specific sites for A-to-I editing. An allelic series in the C-to-U editing enzyme Apobec1 influences the editing efficiency of Apob and 58 additional C-to-U editing targets. We identified 49 A-to-I editing sites with polymorphisms in the edited transcript that alter editing efficiency. In contrast to the shared genetic control of C-to-U editing, most of the variable A-to-I editing sites were determined by local nucleotide polymorphisms in proximity to the editing site in the RNA secondary structure. Our results indicate that RNA editing is a quantitative trait subject to genetic variation and that evolutionary constraints have given rise to distinct genetic architectures in the two canonical types of RNA editing.

TP014: Text as Data: Using text-based features for proteins representation and for computational prediction of their characteristics
Date: Sunday, July 10 12:00 pm - 12:20 pm
Room: Northern Hemisphere A3/A4
Topic: DATA / PROTEINS
  • Hagit Shatkay, University of Delaware, United States
  • Scott Brady, University of Toronto, Canada
  • Andrew Wong, Mount Sinai Hospital, Canada

Area Session Chair: Lenore Cowen

Presentation Overview:

The current era of large-scale biology is characterized by a fast-paced growth in the number of sequenced genomes and, consequently, by a multitude of identified proteins whose function has yet to be determined.
Simultaneously, any known or postulated information concerning genes and proteins is part of the ever-growing published scientific literature, which is expanding at a rate of over a million new publications per year.
Computational tools that attempt to automatically predict and annotate protein characteristics, such as function and localization patterns, are being developed along with systems that aim to support the process via text mining.
Most work on protein characterization focuses on features derived directly from protein sequence data. Protein-related work that does aim to utilize the literature typically concentrates on extracting specific facts (e.g., protein interactions) from text.
In the past few years we have taken a different route, treating the literature as a source of text-based features, which can be employed just as sequence-based protein-features were used in earlier work, for predicting protein subcellular location and possibly also function. We discuss here in detail the overall approach, along with results from work we have done in this area demonstrating the value of this method and its potential use.

TP019: Temporal dynamics of collaborative networks in large scientific consortia
Date: Sunday, July 10 2:00 pm - 2:20 pm
Room: Northern Hemisphere A1/A2
Topic: SYSTEMS / DATA
  • Daifeng Wang, Yale University, United States
  • Koon-Kiu Yan, Yale University, United States
  • Joel Rozowsky, Yale University, United States
  • Eric Pan, Yale University, United States
  • Mark Gerstein, Yale University, United States

Area Session Chair: Hagit Shatkay

Presentation Overview:

The emergence of collective creative enterprises such as large scientific consortia is a unique feature of modern scientific research, especially in the biomedical field. Recent examples include the ENCyclopedia Of DNA Elements (ENCODE) consortium annotating the human genome and the 1000 Genomes consortium generating a catalog of uniformly called variants for the biomedical community. To ensure that the scientific community can benefit from these efforts, it is important to understand the connections between consortium members and researchers outside of the consortium. To address this issue, we analyzed the temporal co-authorship network structures of the ENCODE and modENCODE consortia [1]. Our analysis revealed their publication patterns, showing that consortium members work closely as a community whereas non-members collaborate on the scale of a few laboratories. We also identified a few brokers who play an important role in facilitating collaborations with outside researchers, which suggests that large scientific consortia should set up a formal outreach group to communicate with outside researchers.

[1] Daifeng Wang, Koon-Kiu Yan, Joel Rozowsky, Eric Pan, Mark Gerstein, "Temporal dynamics of collaborative networks driven by large scientific consortia," in press, Trends in Genetics, 2016, doi: 10.1016/j.tig.2016.02.006

TP022: Positive and negative forms of replicability in gene network analysis
Date: Sunday, July 10 2:20 pm - 2:40 pm
Room: Northern Hemisphere A1/A2
Topic: SYSTEMS / DATA
  • Wim Verleyen, Cold Spring Harbor Laboratory, United States
  • Sara Ballouz, Cold Spring Harbor Laboratory, United States
  • Jesse Gillis, Cold Spring Harbor Laboratory, United States

Area Session Chair: Hagit Shatkay

Presentation Overview:

Presentation description
In this work, we build a model of scientific communities in which simulated researchers characterize gene function through an individual analysis of particular network data. We model each researcher by sampling from a pool of machine learning algorithms, each of which then samples individually from various public resources. By simulating groups of researchers operating under different constraints, we are able to assess practices leading to successful group decisions. Our analysis reveals an important principle limiting the value of replication, namely that targeting it directly causes ‘easy’ or uninformative replication to dominate analyses. We provide examples of this problem in action and walk through seminal results which replicate precisely because they are unlikely to be true. We also show that this bias has a strong impact in protein-protein interaction data, leading to negative correlations between replicability and good quality control. We discuss some implications for public discourse, particularly on scientific matters.

Scientific Justification
Our recent work analyzes what is usually considered a fundamental basis of science – replication – and shows that not only can it be useless as a general heuristic for discovering the truth, it can be damaging when applied naively. Intuitively, the idea is close to that of overfitting in machine learning. Two researchers both of whom overfit to some data might obtain more replicable results, but this form of replicability is of little value. Using real data and analysis techniques, we show this problem is apparent in the field of gene network analysis as a whole.

While we focus on the field-wide meta-analysis, the detailed examples in the paper are particularly important:

A) We show that a seminal result in autism genetics replicates because it is false. Our detailed walk-through makes results that are otherwise very surprising into intuitive principles.

B) We show that the negative relationship our model predicts between replicability and quality control can be seen directly even in reports for individual protein-protein interactions.

Our research in this area is ongoing and our talk will discuss additional examples, drawn principally from medically important cases (e.g., point (A)) which I think will be of high interest at ISMB, as well as methods for identifying these problems.

Although the focus is on networks, the model and examples are of relevance to any knowledge-base (hence our area choice). This is work that repays careful consideration and I’m confident that discussing it at ISMB will provide exceptional value to our colleagues.

TP023: COSMOS: accurate detection of somatic structural variations through asymmetric comparison between tumor and normal samples
Date: Sunday, July 10 2:20 pm - 2:40 pm
Room: Northern Hemisphere A3/A4
Topic: DISEASE / GENES
  • Koichi Yamagata, AIST, Japan
  • Ayako Yamanishi, Graduate School of Medicine, Osaka University, Japan
  • Chikara Kokubu, Graduate School of Medicine, Osaka University, Japan
  • Junji Takeda, Graduate School of Medicine, Osaka University, Japan
  • Jun Sese, AIST, Japan

Area Session Chair: Paul Horton

Presentation Overview:

An important challenge in cancer genomics is precise detection of structural variations (SVs) by high-throughput short-read sequencing, which is hampered by the high false discovery rates of existing analysis tools. Here we propose an accurate SV detection method named COSMOS, which compares the statistics of the mapped read pairs in tumor samples with isogenic normal control samples in a distinct asymmetric manner. COSMOS also prioritizes the candidate SVs using strand-specific read-depth information. Performance tests on modeled tumor genomes revealed that COSMOS outperformed existing methods in terms of F-measure. We also applied COSMOS to an experimental mouse cell-based model, in which SVs were induced by genome engineering and gamma-ray irradiation, followed by polymerase chain reaction-based confirmation. The precision of COSMOS was 84.5%, while that of the next best existing method was 70.4%. Moreover, the sensitivity of COSMOS was the highest, indicating that COSMOS has great potential for cancer genome analysis.
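
The precision, sensitivity, and F-measure reported above are related by simple formulas. The sketch below assumes the standard F1 definition (harmonic mean of precision and recall), which the abstract does not spell out; the function name and the counts are hypothetical and are not taken from the COSMOS evaluation.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # recall is also called sensitivity
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts of structural-variant calls, for illustration only.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Because F1 is a harmonic mean, a caller with many false positives cannot reach a high score through sensitivity alone.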

TP027: Covariation Is a Poor Measure of Molecular Coevolution
Date: Sunday, July 10 2:40 pm - 3:00 pm
Room: Northern Hemisphere E1/E2
Topic: PROTEINS
  • David Talavera, University of Manchester, United Kingdom
  • Simon Lovell, University of Manchester, United Kingdom
  • Simon Whelan, Uppsala University, Sweden

Area Session Chair: Jianlin Cheng

Presentation Overview:

Covariation of amino-acid residues is widely studied for applications such as protein structure prediction, protein design and analysis of protein-protein interactions. However, there is no consensus as to the underlying evolutionary mechanisms that give rise to covariation. We have developed a theoretical model with the aim of understanding the origins of covariation. Our model predicts that covariation is generated only if strong selective pressure is present for extremely long periods of time. Our empirical analyses confirm this expectation as we demonstrate 1) that covariation methods select pairs of residues with slow evolutionary rates; and 2) that the location of conserved residues in the core of the protein structure explains the precision of these methods at finding residues in close proximity. Altogether, our results explain the relative performance and limitations of current covariation methods, and the difficulties in developing evolutionary models for detecting coevolution.
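
For readers unfamiliar with covariation scores, one widely used measure is the mutual information between two alignment columns. The sketch below is only an illustration of that generic measure, not the theoretical model described in the abstract; the function name and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_a: str, col_b: str) -> float:
    """Mutual information (in bits) between two equally long alignment columns."""
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        # Compare the joint frequency with the product of the marginals.
        mi += p_ab * log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


# Toy columns: each character is the residue seen in one sequence of the alignment.
print(column_mutual_information("AAVV", "LLII"))  # residues change together
print(column_mutual_information("AAVV", "LILI"))  # residues change independently
```

Raw mutual information of this kind takes no account of phylogeny or evolutionary rate, the factors the abstract argues shape what covariation methods detect.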

TP028: Quantitative analysis of microRNA mediated regulation on competing endogenous RNAs
Date: Sunday, July 10 3:30 pm - 3:50 pm
Room: Northern Hemisphere A1/A2
Topic: SYSTEMS / GENES
  • Ye Yuan, Bioinformatics Division, Center for Synthetic and Systems Biology, Tsinghua National Laboratory for Information Science and Technology/Department of Automation, Tsinghua University, China
  • Bing Liu, Bioinformatics Division, Center for Synthetic and Systems Biology, Tsinghua National Laboratory for Information Science and Technology/Department of Automation, Tsinghua University, China
  • Peng Xie, Bioinformatics Division, Center for Synthetic and Systems Biology, Tsinghua National Laboratory for Information Science and Technology/Department of Automation, Tsinghua University, China
  • Michael Zhang, Department of Molecular and Cell Biology, Center for Systems Biology, University of Texas, Dallas, United States
  • Yanda Li, Bioinformatics Division, Center for Synthetic and Systems Biology, Tsinghua National Laboratory for Information Science and Technology/Department of Automation, Tsinghua University, China
  • Zhen Xie, Bioinformatics Division, Center for Synthetic and Systems Biology, Tsinghua National Laboratory for Information Science and Technology/Department of Automation, Tsinghua University, China
  • Xiaowo Wang, Bioinformatics Division, Center for Synthetic and Systems Biology, Tsinghua National Laboratory for Information Science and Technology/Department of Automation, Tsinghua University, China

Area Session Chair: Hagit Shatkay

Presentation Overview:

Each microRNA species can bind various types of target RNAs. Therefore, target RNAs could indirectly regulate each other by sequestering shared microRNAs. This phenomenon is called the competing endogenous RNA (ceRNA) effect. The off-target phenomenon in RNAi technology is also closely related to this effect. Combining systems biology modeling with synthetic biology experiments, we established a mathematical model to describe microRNA regulation and built related synthetic gene circuits in cultured human cells to quantify the ceRNA effect under variable conditions. The results suggested that the ceRNA effect is affected by the abundance of microRNA and targets, the number and affinity of binding sites, and the mRNA degradation pathway determined by the degree of microRNA-mRNA complementarity. Furthermore, a non-reciprocal competing effect of microRNA and RNAi was also demonstrated, providing a new direction for the improvement of RNAi technology.

TP034: Identification of essential molecular and cellular processes controlling the response time and intensity of inflammation
Date: Sunday, July 10 4:10 pm - 4:30 pm
Room: Northern Hemisphere A1/A2
Topic: SYSTEMS / DISEASE
  • Alexander Mitrophanov, Department of Defense Biotechnology High Performance Computing Software Applications Institute, United States
  • Sridevi Nagaraja, Department of Defense Biotechnology High Performance Computing Software Applications Institute, United States
  • Jaques Reifman, Department of Defense Biotechnology High Performance Computing Software Applications Institute, United States

Area Session Chair: Hagit Shatkay

Presentation Overview:

Pathological inflammation, including inflammatory response with exaggerated intensity (sepsis) or with delayed resolution (chronic inflammation), has defied attempts at efficacious treatment. Here, we developed and applied a computational strategy to demonstrate how specific molecular and cellular components can be manipulated to achieve targeted modulation of the inflammatory response time and intensity. The strategy was based on comprehensive sensitivity and correlation analyses using our recently developed kinetic model that can represent thousands of possible inflammation scenarios. We identified three molecular mediators whose inhibition may robustly restore pathological inflammation to its normal course. We found that inflammation timing was more difficult to control than its intensity. Yet, simultaneous inhibition of two distinct targets suggested a reliable means to normalize both excessively strong and abnormally prolonged inflammatory responses. Our model was validated with existing experimental data and suggested new in vivo experiments.

TP035: Robust discrimination of cell types from tissue expression profiles
Date: Sunday, July 10 4:10 pm - 4:30 pm
Room: Northern Hemisphere A3/A4
Topic: DISEASE / DATA
  • Aaron M. Newman, Stanford University, United States
  • Andrew J. Gentles, Stanford University, United States
  • Chih Long Liu, Stanford University, United States
  • Michael R. Green, University of Nebraska Medical Center, United States
  • Weiguo Feng, Stanford University, United States
  • Scott V. Bratman, University of Toronto, Canada
  • Dongkyoon Kim, Stanford University, United States
  • Yue Xu, Stanford University, United States
  • Amanda Khuong, Stanford University, United States
  • Chuong D. Hoang, National Cancer Institute, United States
  • Viswam S. Nair, Stanford University, United States
  • Robert B. West, Stanford University, United States
  • Sylvia K. Plevritis, Stanford University, United States
  • Maximilian Diehn, Stanford University, United States
  • Ash A. Alizadeh, Stanford University, United States

Area Session Chair: Paul Horton

Presentation Overview:

Changes in cellular composition underlie diverse physiological states. While flow cytometry and immunohistochemistry are commonly used to characterize tissue heterogeneity, the former requires cell dissociation, which can alter representation, while the latter is generally limited to one marker per section. To complement these methods, we developed CIBERSORT, an in silico deconvolution approach that robustly enumerates cell subsets of interest from gene expression profiles (GEPs) of bulk tissues. We evaluated CIBERSORT using fresh, frozen, and fixed specimens, including solid tumors, and found that it outperforms previous deconvolution methods with respect to noise, unknown mixture content, and closely related cell types. When applied to GEPs from 25 tumor types in a pan-cancer analysis, CIBERSORT revealed complex associations between 22 tumor-infiltrating leukocyte subsets and clinical outcomes. Predictions linking specific immune phenotypes to survival were validated in lung adenocarcinoma. CIBERSORT provides a novel platform for tissue characterization without requiring antibodies, disaggregation, or living cells.

TP042: Core Regulatory Circuitry of the Plant Circadian System
Date: Monday, July 11 10:30 am - 10:50 am
Room: Northern Hemisphere E1/E2
Topic: SYSTEMS / GENES
  • Mathias Foo, University of Warwick, United Kingdom
  • David Somers, The Ohio State University, United States
  • Pan-Jun Kim, Asia Pacific Center for Theoretical Physics, Korea, Republic of

Area Session Chair: Nicola Mulder

Presentation Overview:

Sleep/wake cycles in animals exemplify daily biological rhythms driven by internal molecular clocks, known as circadian clocks, which are important for plant life as well. The plant circadian clock is much more complex than that of any other organism, and its design principle has eluded our understanding. Based on mechanistic modeling and simulation of Arabidopsis thaliana, we successfully identified a kernel of the plant circadian system, the critical gene regulatory circuitry for clock function. The kernel integrates four major negative feedback loops for molecular circadian oscillations. Strikingly, the kernel structure, as well as the whole clock circuitry, was found to be overwhelmingly composed of inhibitory, not activating, interactions among genes. This fact facilitates the global coordination of plant circadian molecular profiles to often exhibit sharply-shaped, cuspidate waveforms, which indicate clock events that are markedly peaked at very specific times of day. Our approach elucidates a design principle of biological clockwork, with implications for synthetic biology.

TP043: DNA editing of LTR retrotransposons reveals the impact of APOBECs on vertebrate genomes
Date: Monday, July 11 10:50 am - 11:10 am
Room: Northern Hemisphere A1/A2
Topic: GENES
  • Binyamin Knisbacher, The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Israel
  • Erez Levanon, The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Israel

Area Session Chair: Yana Bromberg

Presentation Overview:

LTR retrotransposons are retrovirus-like entities widespread in vertebrate genomes. These replicating endogenous retroviruses (ERVs) must be restricted to prevent deleterious mutations and maintain genome integrity. The APOBEC DNA-editing enzymes can do so by inflicting C-to-U hypermutation in retrotransposon DNA during their mobilization. In some cases, hypermutated retrotransposons successfully integrate into the genome, introducing unique sequences, which increase retrotransposon diversity and the probability of developing new function at the loci of insertion. We developed a computational approach to identify such events, applied it to genomes of 123 diverse species and identified numerous DNA edited sites in humans and various vertebrate lineages. Unexpectedly, DNA editing is exceptionally prevalent in some birds, including one of Darwin's finches. Edited ERVs are enriched in genic regions, thereby raising the probability of their exaptation for novel function. Our results show that DNA editing has a substantial role in vertebrate innate immunity and may accelerate genome evolution.

TP044: Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning
Date:Monday, July 11 10:50 am - 11:10 am
Room: Northern Hemisphere A3/A4
Topic: DATA / PROTEINS
  • Hannes Bretschneider, University of Toronto, Canada
  • Brendan Frey, University of Toronto, Canada
  • Andrew Delong, Deep Genomics, Canada
  • Babak Alipanahi, University of Toronto, Canada

Area Session Chair: Bruno Gaeta

Presentation Overview:

Knowing the sequence specificities of DNA- and RNA-binding proteins is essential for developing models of the regulatory processes in biological systems and for identifying causal disease variants. Here we show that sequence specificities can be ascertained from experimental data with ‘deep learning’ techniques, which offer a scalable, flexible and unified computational approach for pattern discovery. Using a diverse array of experimental data and evaluation metrics, we find that deep learning outperforms other state-of-the-art methods, even when training on in vitro data and testing on in vivo data. We call this approach DeepBind and have built a stand-alone software tool that is fully automatic and handles millions of sequences per experiment. Specificities determined by DeepBind are readily visualized as a weighted ensemble of position weight matrices or as a ‘mutation map’ that indicates how variations affect binding within a specific sequence.
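
The 'mutation map' idea can be illustrated with a small sketch. This is not DeepBind's code: a toy position weight matrix (`TOY_PWM`, with made-up weights) stands in for the trained model, and the function names are hypothetical. Only the bookkeeping is shown, that is, scoring every single-base substitution relative to the reference sequence.

```python
# Hypothetical stand-in for a trained model: a log-odds position weight
# matrix (PWM) scored at its best-matching window. DeepBind itself uses a
# deep network; only the mutation-map bookkeeping is illustrated here.
BASES = "ACGT"


def pwm_score(seq, pwm):
    """Return the best window score of `seq` under `pwm` (list of base->weight dicts)."""
    width = len(pwm)
    return max(
        sum(pwm[j][seq[i + j]] for j in range(width))
        for i in range(len(seq) - width + 1)
    )


def mutation_map(seq, score):
    """For each position and base, return score(mutant) - score(reference)."""
    reference = score(seq)
    rows = []
    for i in range(len(seq)):
        row = {}
        for base in BASES:
            mutant = seq[:i] + base + seq[i + 1:]
            row[base] = score(mutant) - reference
        rows.append(row)
    return rows


# Toy 3-bp motif favouring "TGA" (weights are invented for illustration).
TOY_PWM = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 2.0},
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
]


def toy_score(seq):
    return pwm_score(seq, TOY_PWM)


example_map = mutation_map("CCTGACC", toy_score)
```

Each row of `example_map` corresponds to one sequence position; entries for the reference base are zero, and negative entries mark substitutions that weaken the predicted binding under the toy model.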

TP047: Revisiting the computational analysis of DNase sequencing
Date:Monday, July 11 11:40 am - 12:00 pm
Room: Northern Hemisphere A3/A4
Topic: GENES
  • Ivan G. Costa, RWTH Aachen University, Germany
  • Eduardo Gadde Gusmao, RWTH Aachen University, Germany
  • Manuel Allhoff, RWTH Aachen University, Germany
  • Martin Zenke, RWTH Aachen University, Germany

Area Session Chair: Bruno Gaeta

Presentation Overview:

DNase-seq is a powerful technique for genome-wide detection of cell-specific binding sites. Computational footprinting methods, which search for footprint-like DNase I cleavage patterns on the DNA, allow the detection of binding sites at base-pair resolution. There is, however, a debate in the literature on the influence of experimental artifacts, such as DNase I cleavage bias and transcription factor residence time, on computational footprinting methods. We investigated these artifacts in a comprehensive panel of DNase-seq data sets, 10 footprinting methods and 88 transcription factors. Our comparative analysis indicates the advantage of HINT, DNase2TF and PIQ over other footprinting methods. We demonstrate that correcting the DNase-seq signal based on cleavage bias estimation significantly improves the accuracy of computational footprinting. We also propose a score to detect footprints arising from transcription factors with short residence time, as footprints of such factors have low predictive performance.

TP052: Deciphering evolutionary strata on plant sex chromosomes and fungal mating-type chromosomes through compositional segmentation
Date:Monday, July 11 12:20 pm - 12:40 pm
Room: Northern Hemisphere A1/A2
Topic: GENES / SYSTEMS
  • Rajeev Azad, University of North Texas, United States
  • Ravi Shanker Pandey, University of North Texas, United States

Area Session Chair: Yana Bromberg

Presentation Overview:

Abstract:
Sex chromosomes have evolved from a pair of homologous autosomes which differentiated into sex determination systems, such as XY or ZW systems, as a consequence of successive recombination suppression between gametologous chromosomes. To identify regions of recombination suppression, the “evolutionary strata”, even when only the sequence of the sex chromosome in the homogametic sex (i.e. X or Z chromosome) is available, we have developed an integrated segmentation and clustering method. In order to understand the early evolution of sex chromosomes, we applied our method to recently evolved plant sex chromosomes. Our method deciphered all known evolutionary strata on the papaya and Silene latifolia X chromosomes, and deciphered two previously unknown evolutionary strata on an incipient sex chromosome of Populus trichocarpa. Application to sex chromosome V of the brown alga Ectocarpus sp. recovered sex-determining and pseudoautosomal regions, and application to mating-type chromosomes of the anther-smut fungus Microbotryum lychnidis-dioicae uncovered five new strata.

Justification:
Evolution of sex chromosomes in animals and birds is better studied than in plants, although 48 dioecious plants have already been reported. A key aspect in understanding sex chromosome evolution is to decipher the successive regions of recombination suppression between the gametologous sex chromosomes. However, until now, only two plants, Silene latifolia and papaya, have been examined for the recombination-suppressed regions, namely, the evolutionary strata, on their X chromosomes. This was made possible by sequencing of sex-linked genes on both X and Y chromosomes, which is a requirement of all current methods that determine strata structure based on comparison of gametologous sex chromosomes. To circumvent this limitation and detect strata even in the absence of Y chromosome sequence, we have developed an integrated segmentation and clustering method, which recapitulated the previously identified strata on the Silene latifolia and papaya X chromosomes without X-Y comparison, and deciphered two previously unknown strata on an incipient sex chromosome of Populus trichocarpa.

The emergence and evolution of sex chromosomes in many plants are much more recent than the mammalian sex chromosome histories, and therefore our approach provides a much-needed tool for understanding the early evolution of sex chromosomes using dioecious plants as model systems. The paucity of heterogametic sex chromosome sequence (Y or W sequence) makes our approach even more relevant, and perhaps the only available tool, for understanding sex chromosome evolution without being constrained by the unavailability of Y or W sequence, or by the loss of Y-linked or W-linked genes.

TP053: Predicting effects of noncoding variants with deep learning-based sequence model
Date:Monday, July 11 12:20 pm - 12:40 pm
Room: Northern Hemisphere A3/A4
Topic: GENES / DATA
  • Jian Zhou, Princeton University, United States
  • Olga Troyanskaya, Princeton University, United States

Area Session Chair: Bruno Gaeta

Presentation Overview:

Identifying functional effects of noncoding variants is a major challenge in human genetics. To predict the noncoding-variant effects de novo from sequence, we developed a deep learning-based algorithmic framework, DeepSEA (http://deepsea.princeton.edu/), that directly learns a regulatory sequence code from large-scale chromatin-profiling data, enabling prediction of chromatin effects of sequence alterations with single-nucleotide sensitivity. We further used this capability to improve prioritization of functional variants including expression quantitative trait loci (eQTLs) and disease-associated variants.

TP054: Integrative genomics analyses unveil downstream biological effectors of disease-specific polymorphisms buried in intergenic regions
Date:Monday, July 11 12:20 pm - 12:40 pm
Room: Northern Hemisphere E1/E2
Topic: GENES / DISEASE
  • Haiquan Li, University of Arizona, United States
  • Ikbel Achour, University of Arizona Center for Biomedical Informatics and Biostatistics, United States
  • Lisa Bastarache, Vanderbilt University, United States
  • Joanne Berghout, The University of Arizona, United States
  • Vincent Gardeux, The University of Illinois at Chicago, France
  • Jianrong Li, University of Arizona, United States
  • Younghee Lee, University of Utah, United States
  • Lorenzo Pesce, The University of Chicago, United States
  • Xinan Yang, the University of Chicago, United States
  • Kenneth Ramos, The University of Arizona, United States
  • Ian Foster, Argonne National Laboratory & The University of Chicago, United States
  • Joshua Denny, Vanderbilt University, United States
  • Jason Moore, University of Pennsylvania, United States
  • Yves Lussier, The University of Arizona, United States

Area Session Chair: Nicola Mulder

Presentation Overview:

Altered biological mechanisms arising from disease-associated polymorphisms remain difficult to characterize when those variants are intergenic. We developed a computational method that identifies shared downstream mechanisms by which inter- and intragenic SNPs contribute to a specific physiopathology. Modelling 2,000,000 pairs of disease-associated SNPs (GWAS) with eQTL and Gene Ontology functional annotations, we predicted 3,870 inter-intra and inter-intra SNP-pairs with convergent biological mechanisms (FDR<0.05). These SNP-pairs with overlapping mRNA targets or similar functional annotations were more associated with the same disease than unrelated pathologies (OR>12). We independently confirmed synergistic and antagonistic genetic interactions for prioritized SNP-pairs of Alzheimer’s (p=0.046), cancer (p=0.039), and rheumatoid arthritis (p<10^-4). Using ENCODE, we validated that the biological mechanisms shared within prioritized SNP-pairs are frequently governed by matching transcription factor binding sites and long-range chromatin interactions. These results provide a “roadmap” of disease mechanisms emerging from GWAS and further identify downstream candidate therapeutic targets of intergenic SNPs.

TP056: Alignment-free scaffolding of large genome drafts using long sequences and jumping library MPET reads
Date:Monday, July 11 2:00 pm - 2:20 pm
Room: Northern Hemisphere A1/A2
Topic: GENES
  • Rene Warren, BC Cancer Agency, Genome Sciences Centre, Canada
  • Lauren Coombe, BC Cancer Agency, Genome Sciences Centre, Canada
  • Sarah Yeo, BC Cancer Agency, Genome Sciences Centre, Canada
  • Chen Yang, BC Cancer Agency, Genome Sciences Centre, Canada
  • Justin Chu, BC Cancer Agency, Genome Sciences Centre, Canada
  • Austin Hammond, BC Cancer Agency, Genome Sciences Centre, Canada
  • Hamid Mohamadi, BC Cancer Agency, Genome Sciences Centre, Canada
  • Ben Vandervalk, BC Cancer Agency, Genome Sciences Centre, Canada
  • Erdi Kucuk, BC Cancer Agency, Genome Sciences Centre, Canada
  • Inanc Birol, BC Cancer Agency, Genome Sciences Centre, Canada

Area Session Chair: Pedja Radivojac

Presentation Overview:

150-word description of the presentation

Over the past months, single-molecule long reads from established and emerging technologies have proven valuable for the assembly of complete bacterial draft genomes and for helping to track viral outbreaks. At the moment, the use of those technologies on their own is still often too costly for de novo assembly of mammalian-size genomes. Last year, we demonstrated that despite the lower base accuracy associated with long-read sequencing platforms, long reads are indisputably effective for scaffolding small and large high-quality draft genomes, as they increase the contiguity and completeness of low-cost assemblies and thereby reduce the complexity of genome drafts. During the course of the year, a new read-linking technology from 10X Genomics has emerged and holds promise for genome scaffolding. We will present advances in scaffolding and genome finishing, describing further developments to the LINKS scaffolder and how we applied these technologies to the large genomes of American bullfrog and spruce.

250-word justification-like argument

We submit the enclosed manuscript entitled “LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads” for consideration as a presentation for the highlights track of ISMB.
Long sequence reads are of prime importance to genome assembly, which is in turn a cornerstone of genome characterization. Although long reads from existing and upcoming technologies still have some way to go before being used routinely in de novo genome assembly projects, their utility for scaffolding existing good-quality assemblies is paramount. The scaffolding problem has been explored by many, including our group, but has only recently been applied to emerging long DNA sequence reads from Oxford Nanopore Technologies (ONT) Ltd.
In our presentation we discuss an effective and elegant method for genome scaffolding with long and imperfect sequences that uses linked k-mers at set distance intervals. We present new developments since publication, including native scaffolding with jumping library (MPET) reads and the use of an improved Bloom filter to exclude erroneous k-mer pairs. We demonstrate that even low-accuracy sequence data has tremendous potential for increasing genome assembly contiguity without the need for error correction or pre-processing, and show how our alignment-free solution scales up to large eukaryotic genomes.
We anticipate that this timely work will be of broad interest to ISMB attendees as the uptake of genomics in research labs and in the clinic increases with the affordability of DNA sequencing. We expect LINKS to have utility in helping assemble large genomes, as we enter the era of long DNA sequence reads.
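
The notion of linked k-mers at set distance intervals can be sketched as follows. This is a simplified illustration under stated assumptions, not the LINKS implementation: the function names, the dictionary-based k-mer index, and the definition of `distance` as the offset between k-mer start positions are all hypothetical, and strand orientation, k-mer uniqueness checks and the Bloom filter are left out.

```python
def kmer_pairs(read, k, distance, step):
    """Extract k-mer pairs whose start positions lie `distance` bases apart."""
    pairs = []
    for start in range(0, len(read) - distance - k + 1, step):
        first = read[start:start + k]
        second = read[start + distance:start + distance + k]
        pairs.append((first, second))
    return pairs


def index_contigs(contigs, k):
    """Map each contig k-mer to its contig name (later contigs overwrite repeats)."""
    index = {}
    for name, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]] = name
    return index


def link_contigs(pairs, kmer_to_contig):
    """Count k-mer pairs whose two ends land in different contigs."""
    links = {}
    for first, second in pairs:
        a = kmer_to_contig.get(first)
        b = kmer_to_contig.get(second)
        if a is not None and b is not None and a != b:
            links[(a, b)] = links.get((a, b), 0) + 1
    return links


# Toy data: one long read spanning two short contigs separated by a gap.
contigs = {"c1": "ACGTACGGTC", "c2": "TTGACCATGA"}
long_read = contigs["c1"] + "AAAA" + contigs["c2"]

pairs = kmer_pairs(long_read, k=4, distance=14, step=2)
links = link_contigs(pairs, index_contigs(contigs, k=4))
```

In this toy case every extracted pair supports placing `c1` before `c2`; a scaffolder would merge contigs only when enough such pairs agree.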

TP059: Translation of Genotype to Phenotype by a Hierarchy of Cell Subsystems
Date:Monday, July 11 2:20 pm - 2:40 pm
Room: Northern Hemisphere BCD
Topic: DATA / SYSTEMS
  • Michael Ku Yu, UCSD, United States
  • Michael Kramer, UCSD, United States
  • Janusz Dutkowski, UCSD, Data4Cure, United States
  • Rohith Srivas, UCSD, Stanford University, United States
  • Katherine Licon, UCSD, United States
  • Jason F. Kreisberg, UCSD, United States
  • Cherie Ng, aTyr Pharmaceuticals, United States
  • Nevan Krogan, UCSF, United States
  • Roded Sharan, Tel Aviv University, Israel
  • Trey Ideker, UCSD, United States

Area Session Chair: Russell Schwartz

Presentation Overview:

Accurately translating genotype to phenotype requires accounting for the functional impact of genetic variation at many biological scales. Here, we present a strategy for genotype-phenotype reasoning based on existing knowledge of cellular subsystems. These subsystems and their hierarchical organization are defined by the Gene Ontology or a complementary ontology inferred directly from previously published datasets. Guided by the ontology’s hierarchical structure, we organize genotype data into an “ontotype,” that is, a hierarchy of perturbations representing the effects of genetic variation at multiple cellular scales. The ontotype is then interpreted using logical rules generated by machine learning to predict phenotype. This approach substantially outperforms previous non-hierarchical methods for translating yeast genotype to cell growth phenotype, and it accurately predicts the growth outcomes of two new screens of 2,503 double gene knockouts affecting DNA repair or nuclear lumen. Ontotypes also generalize to larger knockout combinations, setting the stage for interpreting the complex genetics of disease.
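
A minimal sketch of how a genotype might be organized into an "ontotype" is given below. It assumes, for illustration only, that the state of each ontology term is the number of knocked-out genes annotated to that term or to any of its descendants; the toy ontology, gene names and function names are hypothetical, and the machine-learning step that maps the ontotype to a phenotype is not shown.

```python
def ancestors(term, parents):
    """Return `term` together with all terms reachable through parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def ontotype(knocked_out, gene_terms, parents):
    """Count, for every ontology term, the knocked-out genes falling under it."""
    counts = {}
    for gene in knocked_out:
        hit_terms = set()
        for term in gene_terms.get(gene, []):
            hit_terms |= ancestors(term, parents)
        # Each gene contributes at most once per term, even via several paths.
        for term in hit_terms:
            counts[term] = counts.get(term, 0) + 1
    return counts


# Toy hierarchy (child -> parents) and gene annotations, invented for illustration.
parents = {
    "DNA repair": ["DNA metabolism"],
    "replication": ["DNA metabolism"],
    "DNA metabolism": ["cell"],
}
gene_terms = {
    "geneA": ["DNA repair"],
    "geneB": ["replication"],
    "geneC": ["DNA repair"],
}

double_knockout = ontotype({"geneA", "geneB"}, gene_terms, parents)
```

Here the two knockouts hit different low-level subsystems but converge on the shared parent term, which is the kind of multi-scale signal a downstream predictor could use.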

TP066: SynLethDB: synthetic lethality database toward discovery of selective and sensitive anticancer drug targets
Date:Monday, July 11 2:40 pm - 3:00 pm
Room: Northern Hemisphere E1/E2
Topic: DISEASE / SYSTEMS
  • Jing Guo, School of Computer Engineering, Nanyang Technological University, Singapore
  • Hui Liu, Changzhou University, China
  • Jie Zheng, School of Computer Engineering, Nanyang Technological University, Singapore

Area Session Chair: Judith Blake

Presentation Overview:

250-word Scientific Justification

Synthetic lethality (SL) is a type of genetic interaction between two genes such that simultaneous perturbations of the two genes result in cell death, while a perturbation of either gene alone is not lethal. Hence, the inhibition of SL partners of genes with cancer-specific mutations could selectively kill cancer cells but spare normal cells. Therefore, SL is emerging as a promising anticancer strategy that could potentially overcome the drawbacks of traditional chemotherapies by reducing severe side effects. However, there has not been a comprehensive database dedicated to collecting SL pairs and related knowledge. In this paper, we propose a comprehensive database, SynLethDB (http://histone.sce.ntu.edu.sg/SynLethDB/), which contains SL pairs collected from biochemical assays, computational predictions and text mining results on human and four model species, i.e. mouse, fruit fly, worm and yeast. For each SL pair, a confidence score was calculated by integrating individual scores derived from different evidence sources. We also developed a statistical analysis module to estimate the sensitivity of cancer cells to drugs targeting human SL partners, based on large-scale genomics data, gene expression profiles and drug sensitivity profiles on more than 1000 cancer cell lines. To help users access and mine the wealth of data, functionalities such as search and filtering, orthology search and gene set enrichment analysis, as well as a user-friendly web interface, have been implemented. SynLethDB should be a useful resource for the biomedical research community and the pharmaceutical industry.



150-word Presentation Description

Synthetic lethality (SL) is a type of genetic interaction between two genes such that simultaneous perturbations of the two genes result in cell death, while a perturbation of either gene alone is not lethal. Hence, the inhibition of SL partners of genes with cancer-specific mutations could selectively kill cancer cells but spare normal cells. Therefore, SL is an emerging anticancer strategy that could potentially overcome the drawbacks of traditional chemotherapies by reducing severe side effects. However, there has not been a comprehensive database dedicated to collecting SL pairs and related knowledge. In this talk, I will present the SynLethDB database (http://histone.sce.ntu.edu.sg/SynLethDB/), which contains SL pairs collected from chemical assays and computational predictions on human and model species. I will introduce the computational problem of SL prediction, with SynLethDB as benchmark data. Biologists can use the knowledge and data resources to guide wet-lab screenings of SL using the newest technologies (e.g. CRISPR-Cas9).



Source of Original Publication:
Jing Guo, Hui Liu, Jie Zheng. SynLethDB: synthetic lethality database toward discovery of selective and sensitive anticancer drug targets. Nucleic Acids Research, 44 (D1): D1011 – D1017, 2016 (Impact Factor = 9.112).

TP067: CellCODE: a robust latent variable approach to differential expression analysis for heterogeneous cell populations
Date:Monday, July 11 3:30 pm - 3:50 pm
Room: Northern Hemisphere BCD
Topic: DATA / DISEASE
  • Maria Chikina, University of Pittsburgh, United States
  • Stuart Sealfon, Icahn School of Medicine at Mount Sinai, United States
  • Elena Zaslavsky, Icahn School of Medicine at Mount Sinai, United States

Area Session Chair: Russell Schwartz

Presentation Overview:

Identifying alterations in gene expression associated with different clinical states is important for the study of human biology. However, clinical samples used in gene expression studies are often derived from heterogeneous mixtures with variable cell-type composition, complicating statistical analysis.

Considerable effort has been devoted to modeling sample heterogeneity, and presently there are many methods that can estimate cell proportions or pure cell-type expression from mixture data. However, there is no method that comprehensively addresses mixture analysis in the context of differential expression without relying on additional proportion information, which can be inaccurate and is frequently unavailable.

In this study we consider a clinically relevant situation where neither accurate proportion estimates nor pure cell expression is of direct interest, but where we are rather interested in detecting and interpreting relevant differential expression in mixture samples. We develop a method, cell-type COmputational Differential Estimation (CellCODE), that addresses the specific statistical question directly, without requiring a physical model for mixture components. Our approach is based on latent variable analysis and is computationally transparent, requires no additional experimental data, yet outperforms existing methods that use independent proportion measurements. CellCODE has few parameters that are robust and easy to interpret. The method can be used to track changes in proportion, improve power to detect differential expression and assign the differentially expressed genes to the correct cell-type.

TP070: Gene essentiality and synthetic lethality in haploid human cells
Date:Monday, July 11 3:30 pm - 3:50 pm
Room: Northern Hemisphere E1/E2
Topic: GENES / SYSTEMS
  • Jacques Colinge, IRCM Inserm U1194, University of Montpellier, ICM, France
  • Vincent Blomen, NKI, Netherlands
  • Peter Májek, CeMM, Austria
  • Lucas Jae, NKI, Netherlands
  • Johannes Bigenzahn, CeMM, Austria
  • Joppe Nieuwenhuis, NKI, Netherlands
  • Jacqueline Staring, NKI, Netherlands
  • Roberto Sacco, CeMM, Austria
  • Ferdy van Diemen, NKI, Netherlands
  • Nadine Olk, CeMM, Austria
  • Alexey Stukalov, CeMM, Austria
  • Caleb Marceau, Stanford University School of Medicine, United States
  • Hans Janssen, NKI, Netherlands
  • Jan Carette, Stanford University School of Medicine, United States
  • Keiryn Bennett, CeMM, Austria
  • Giulio Superti-Furga, CeMM, Austria
  • Thijn Brummelkamp, NKI, Netherlands

Area Session Chair: Judith Blake

Presentation Overview:

Among the many things one might want to know about a human cell, the list of its indispensable components, i.e. genes, is of great interest. Due to technical barriers, transposition of pioneering work done in yeast has taken years. We present a first genome-wide mutational screen conducted in human haploid cells that uncovered ~2000 genes required for fitness under culture conditions. Bioinformatic analyses were performed to extract global characteristics of human essential genes and the interactions they have with other genes. By performing similar screens on cells depleted of specific genes, we could obtain a synthetic lethality network around the secretory pathway, thus providing a first genetic interaction network in human cells obtained by mutagenesis.

Finally, we will comment on differences and similarities with concomitant essential gene lists published by two other groups (Wang et al., Science, 2015; Hart et al., Cell, 2015).

TP078: Mogrify: a predictive system for cell reprogramming
    Cancelled
Date:Monday, July 11 4:10 pm - 4:30 pm
Room: Northern Hemisphere E1/E2
Topic: GENES / SYSTEMS
  • Owen Rackham, Duke-NUS, Singapore
  • Jaber Firas, Monash University, Australia
  • Jose Polo, Monash University, Australia
  • Julian Gough, University of Bristol, United Kingdom

Area Session Chair: Judith Blake

Presentation Overview:

Transdifferentiation, the process of converting from one cell type to another without going through a pluripotent state, has great promise for regenerative medicine. The identification of key transcription factors for reprogramming is currently limited by the cost of exhaustive experimental testing of plausible sets of factors, an approach that is inefficient and unscalable. Here we present a predictive system (Mogrify http://mogrify.net) that combines gene expression data with regulatory network information to predict the reprogramming factors necessary to induce cell conversion. We have applied Mogrify to over 300 human cell types and tissues, defining an atlas of cellular reprogramming. Mogrify correctly predicts the transcription factors used in known transdifferentiations. Furthermore, we validated two new transdifferentiations predicted by Mogrify. We provide a practical and efficient mechanism for systematically implementing novel cell conversions, facilitating the generalization of reprogramming of human cells. Predictions are made available to help rapidly further the field of cell conversion.

TP079: Compressive Mapping for Next-Generation Sequencing
Date:Tuesday, July 12 10:10 am - 10:30 am
Room: Northern Hemisphere A1/A2
Topic: GENES
  • Deniz Yorukoglu, Massachusetts Institute of Technology, United States
  • Yun William Yu, Massachusetts Institute of Technology, United States
  • Jian Peng, University of Illinois at Urbana-Champaign, United States
  • Bonnie Berger, Massachusetts Institute of Technology, United States

Area Session Chair: Scott Markel

Presentation Overview:

The high cost of mapping next-generation sequencing (NGS) read data onto a reference is a major bottleneck to sequencing analysis pipelines. We introduce COmpressive Read-mapping Accelerator (CORA), a framework that first maps reads to reads and reference to reference, exploiting inherent redundancies in both read and reference sequences, to accelerate read to reference mapping. We use this framework to map paired-end reads from the 1000 Genomes Project to the human reference, eliminating redundant sequence comparisons and improving time and sensitivity by orders of magnitude, particularly for multi-reads. The relative speed advantage of our approach will increase with the explosion of NGS data and advances in sequencing technologies, allowing researchers to keep pace with this data onslaught.

TP085: ADAGE-Based Extraction of Biological Context from Public Gene Expression Data
Date:Tuesday, July 12 10:50 am - 11:10 am
Room: Northern Hemisphere A1/A2
Topic: GENES / DATA
  • Jie Tan, Geisel School of Medicine at Dartmouth, United States
  • John Hammond, Geisel School of Medicine at Dartmouth, United States
  • Deborah Hogan, Geisel School of Medicine at Dartmouth, United States
  • Casey Greene, University of Pennsylvania, United States

Area Session Chair: Scott Markel

Presentation Overview:

In this talk, I will introduce the overarching question that I’m addressing in my thesis: “How do we extract biological patterns from heterogeneous public gene expression data using unsupervised methods?” To address this challenge, we recently developed and published ADAGE (Analysis using Denoising Autoencoders for Gene Expression) in the journal mSystems. ADAGE is a method based on deep learning that extracts features representing biological states of an organism from the organism’s complete expression compendium without requiring pathway annotations or other curated knowledge. In this talk, I’ll primarily highlight the ADAGE method, and I’ll demonstrate how ADAGE can be applied to analyzing new RNA-Seq datasets. I’ll cover how ADAGE can be used to generate new hypotheses about how different environments activate distinct pathways. I’ll wrap up by mentioning an upcoming contribution: an approach that we call eADAGE that significantly improves the abundance and completeness of pathways extracted by ADAGE.

TP086: Precision drug repurposing and multi-target drug design using structural systems pharmacology
Date:Tuesday, July 12 10:50 am - 11:10 am
Room: Northern Hemisphere A3/A4
Topic: PROTEINS / DISEASE
  • Thomas Hart, Rockefeller University, United States
  • Shihab Dider, Hunter College, CUNY, United States
  • Weiwei Han, Jilin University, China
  • Hua Xu, University of Texas Health Center, United States
  • Zhongming Zhao, University of Texas Health Center, United States
  • Philip Bourne, National Institutes of Health, United States
  • Lei Xie, Hunter College, The City University of New York, United States

Area Session Chair: Natasa Przulj

Presentation Overview:

Precision medicine is an emerging method for disease treatment. However, its advance is hindered by a lack of mechanistic understanding of the energetics and dynamics of genome-wide drug-target and genetic interactions. To address this challenge, we have developed a novel structural systems pharmacology approach to elucidate the molecular basis and genetic biomarkers of drug action. We have applied our approach to repurposing metformin, an anti-diabetes drug, for precision anti-cancer therapy. By searching the human structural proteome, we identified putative metformin binding targets and experimentally verified the predictions. Subsequently, we linked these binding targets, through protein-protein interactions, to genes whose expression is altered by metformin, and identified network biomarkers of drug phenotypic response. The key nodes in the genetic networks are largely consistent with the existing experimental evidence. Their interactions can be affected by the observed cancer mutations. This study demonstrates that structural systems pharmacology is a powerful tool for precision medicine.

TP093: Leveraging electronic medical records for systematic drug repositioning
Date:Tuesday, July 12 12:00 pm - 12:20 pm
Room: Northern Hemisphere E1/E2
Topic: DISEASE / DATA
  • Hyojung Paik, UCSF, United States
  • Ah-Young Chung, Korea University, Korea, Republic of
  • Hae-Chul Park, Korea University, Korea, Republic of
  • Rae Woong Park, Ajou University, Korea, Republic of
  • Kyoungho Suk, Kyungpook National University, Korea, Republic of
  • Atul Butte, UCSF, United States
  • Jihyun Kim, Ajou University, Korea, Republic of
  • Hyosil Kim, Ajou University, Korea, Republic of

Area Session Chair: Yves Moreau

Presentation Overview:

Prediction of new disease indications for approved drugs by computational methods has been based largely on the genomics signatures of drugs and diseases. We propose a method for drug repositioning that uses the clinical signatures extracted from electronic medical records (EMRs) of a tertiary hospital, including > 9.4 M laboratory tests from > 530,000 patients, in addition to diverse genomics signatures. Cross-validation shows this approach outperforms various predictive models based on genomics signatures. The prediction suggests that terbutaline sulfate, which is widely used for asthma, is a promising candidate for amyotrophic lateral sclerosis, for which there are few therapeutic options. In in vivo tests, terbutaline sulfate prevents defects in neuromuscular degeneration and also has therapeutic potential. Cotreatment with a β2-adrenergic receptor antagonist, butoxamine, suggests that the effect of terbutaline is mediated by activation of β2-adrenergic receptors. Our approach suggests that EMRs are valuable resources for discovering novel indications of drugs.

TP094: Fast and accurate computation of differential splicing across multiple conditions
Date:Tuesday, July 12 12:20 pm - 12:40 pm
Room: Northern Hemisphere A1/A2
Topic: GENES
  • Jc Entizne, Pompeu Fabra University, Spain
  • A Pages, Pompeu Fabra University, Spain
  • Jl Trincado, Pompeu Fabra University, Spain
  • Gp Alamancos, Pompeu Fabra University, Spain
  • M Skalic, Pompeu Fabra University, Spain
  • N Bellora, Pompeu Fabra University, Spain
  • Eduardo Eyras, Pompeu Fabra University, Spain

Area Session Chair: Scott Markel

Presentation Overview:

Abstract

Alternative splicing plays an essential role in many cellular processes in eukaryotes, and high-throughput RNA sequencing has allowed genome-wide studies of splicing across multiple conditions. However, the increasing number of data sets represents a major computational challenge, and there are no dedicated tools for the study of splicing changes across multiple conditions. We describe SUPPA (Alamancos et al. 2015), a computational tool to calculate relative inclusion values of alternative splicing events from transcript quantification. On simulated and experimental datasets, SUPPA achieves accuracy similar to that of standard methodologies but is a thousand times faster. We extended SUPPA to calculate differential splicing across multiple conditions. Applied to data across different stages of cell differentiation, SUPPA uncovers new splicing regulatory networks governing specific cell fates. SUPPA facilitates the study of splicing regulation across multiple conditions with large numbers of samples and limited computational resources.
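
As a rough illustration of computing a relative inclusion value from transcript quantification, the sketch below takes the inclusion value of an event to be the summed abundance of transcripts that include the event divided by the summed abundance of all transcripts that define it. This is a simplified reading and not SUPPA's code; the function name, transcript identifiers and abundance values are hypothetical.

```python
def inclusion_value(inclusion_ids, total_ids, abundance):
    """Return inclusion abundance / total abundance, or None if the total is zero."""
    total = sum(abundance.get(t, 0.0) for t in total_ids)
    if total == 0:
        return None  # event not expressed in this sample
    included = sum(abundance.get(t, 0.0) for t in inclusion_ids)
    return included / total


# Hypothetical exon-skipping event: tx1 includes the exon, tx2 skips it.
inclusion_ids = ["tx1"]
total_ids = ["tx1", "tx2"]

# Invented transcript abundances (e.g. TPM) for two conditions.
condition_a = {"tx1": 30.0, "tx2": 10.0, "tx3": 5.0}
condition_b = {"tx1": 10.0, "tx2": 30.0, "tx3": 5.0}

psi_a = inclusion_value(inclusion_ids, total_ids, condition_a)
psi_b = inclusion_value(inclusion_ids, total_ids, condition_b)
delta_psi = psi_a - psi_b  # change in inclusion between the two conditions
```

Because the calculation only sums precomputed transcript abundances, it needs no read-level processing, which is consistent with the speed advantage described in the abstract.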

Impact

Alternative pre-mRNA splicing diversifies the repertoire of transcripts in multicellular organisms, thereby providing a complex layer of gene regulation. There is increasing evidence that alternative splicing plays a crucial role in development and disease, and it has been identified as a key regulatory mechanism capable of triggering undifferentiated cell states (Gabut et al. 2011, Han et al. 2013). High-throughput sequencing technologies allow the determination of splicing patterns across multiple conditions, but pose major computational challenges. SUPPA meets these challenges by allowing for fast computation of splicing patterns across multiple conditions. SUPPA's accuracy has been extensively tested using RNA sequencing data for a 23-point time-course of Arabidopsis plants transferred from 20°C to 4°C, and by comparison with an RT-PCR platform using the same samples (Zhang et al. 2015). This has moreover facilitated the identification of new splicing changes in response to temperature. We have applied SUPPA to data across different stages of cell differentiation in human to uncover novel regulatory programs of pluripotency controlled by RNA binding proteins. In summary, SUPPA provides a powerful means to uncover new relevant gene regulatory mechanisms and allows the systematic analysis of splicing by small labs with limited computational resources (Sebestyen et al. 2016). Finally, SUPPA is developed in Python and is an open source project with multiple contributors (https://bitbucket.org/regulatorygenomicsupf/suppa).

Alamancos et al. (2015). RNA 21(9):1521-31.
Zhang et al. (2015). New Phytol 208(1):96-101.
Sebestyen et al. (2016). http://biorxiv.org/content/early/2015/08/02/023010
Gabut et al. (2011). Cell 147:132-146.
Han et al. (2013). Nature 498(7453):241-5.

TP100: GeneiASE: Detection of condition-dependent and static allele-specific expression from RNA-seq data without haplotype information
Date:Tuesday, July 12 2:20 pm - 2:40 pm
Room: Northern Hemisphere A1/A2
Topic: GENES
  • Daniel Edsgärd, KTH Royal Institute of Technology, Sweden
  • Maria Jesus Iglesias, KTH Royal Institute of Technology, Sweden
  • Sarah-Jayne Reilly, Karolinska Institute, Sweden
  • Anders Hamsten, Karolinska Institute, Sweden
  • Per Tornvall, Karolinska Institutet, Sweden
  • Jacob Odeberg, Karolinska Institutet, Sweden
  • Olof Emanuelsson, KTH Royal Institute of Technology, Sweden

Area Session Chair: Janet Kelso

Presentation Overview:

Allele-specific expression (ASE) is the imbalance in transcription between maternal and paternal alleles at a locus and can be probed in single individuals using massively parallel DNA sequencing technology. Assessing ASE within a single sample provides a static picture of the ASE, but the magnitude of ASE for a given transcript may vary between different biological conditions in an individual. Such condition-dependent ASE could indicate a genetic variation with a functional role in the phenotypic difference. We developed a method, GeneiASE, to detect genes exhibiting static or condition-dependent ASE in single individuals. GeneiASE performed consistently over a range of read depths and ASE effect sizes, and did not require phasing of variants to estimate haplotypes. We applied GeneiASE to both our own and publicly available data sets, and validated a number of ASE cases using qPCR. GeneiASE is available at https://sourceforge.net/projects/geneiase/.
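To make the idea of allelic imbalance concrete, the sketch below tests whether the read counts for the two alleles at a single heterozygous variant depart from the 50:50 split expected without ASE. It is a generic exact binomial test and is not GeneiASE's statistic; the function name and the counts are hypothetical, and the abstract does not describe how GeneiASE combines variants within a gene.

```python
from math import comb


def allelic_imbalance_p(alt_reads, ref_reads, expected=0.5):
    """Two-sided exact binomial p-value for one heterozygous variant.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count under the null allelic fraction `expected`.
    """
    n = alt_reads + ref_reads
    pmf = [
        comb(n, k) * expected**k * (1 - expected) ** (n - k)
        for k in range(n + 1)
    ]
    observed = pmf[alt_reads]
    # Small tolerance so that ties are included despite rounding.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical counts: a balanced variant and a fully imbalanced one.
p_balanced = allelic_imbalance_p(5, 5)
p_imbalanced = allelic_imbalance_p(10, 0)
```

A test of this kind works on unphased counts at each variant, so it needs no haplotype information. Turning per-variant evidence into a gene-level call, and comparing two conditions, requires further modelling that this sketch does not attempt.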

TP105: CD30 cell graphs of Hodgkin lymphoma are not scale-free—an image analysis approach
Date:Tuesday, July 12 2:40 pm - 3:00 pm
Room: Northern Hemisphere E1/E2
Topic: DATA / DISEASE
  • Hendrik Schäfer, Johann Wolfgang Goethe Universität, Germany
  • Tim Schäfer, Institute of Computer Science, Department of Molecular Bioinformatics, Germany
  • Joerg Ackermann, Johann Wolfgang Goethe Universität, Germany
  • Norbert Dichter, Institute of Computer Science, Department of Molecular Bioinformatics, Germany
  • Claudia Döring, Senckenberg Institute of Pathology, Germany
  • Sylvia Hartmann, Senckenberg Institute of Pathology, Germany
  • Martin-Leo Hansmann, Senckenberg Institute of Pathology, Germany
  • Ina Koch, Johann Wolfgang Goethe University Frankfurt am Main, Germany

Area Session Chair: Curtis Huttenhower

Presentation Overview:

In this talk, we present an investigation from the field of digital pathology. Using whole slide images, we analyzed the cell distribution of CD30 positive cells in Hodgkin lymphoma (HL). HL is a malignancy of the immune system that usually originates from B cells. For diagnosis, biopsies are taken from patients and immunostained. We detected cells in digitized versions of the images using a custom imaging pipeline. The spatial distribution of CD30 cells in the tissue was modeled as a CD30 cell graph. We found that the cell distribution in the tissue is not random. The cells show pronounced clustering in the tissue, which is higher for the lymphoma cases. The vertex degree distributions of the graphs could be modeled by the Gamma distribution, and thus were not scale-free. Our findings are a first step towards modeling the complex spatial interactions of different cell types in the lymph node.
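The notion of a cell graph and its vertex degree distribution can be sketched as follows. Cells are vertices, and two cells are connected when they lie within a distance threshold of each other. The threshold rule, the function name and the coordinates are assumptions made for illustration; the abstract does not state how edges were defined in the study.

```python
from collections import Counter
from math import hypot


def cell_graph_degrees(cells, max_dist):
    """Vertex degrees of a graph linking cells closer than max_dist.

    cells    : list of (x, y) cell positions (hypothetical units)
    max_dist : assumed distance threshold for drawing an edge
    """
    degrees = [0] * len(cells)
    for i in range(len(cells)):
        for j in range(i + 1, len(cells)):
            dx = cells[i][0] - cells[j][0]
            dy = cells[i][1] - cells[j][1]
            if hypot(dx, dy) <= max_dist:
                degrees[i] += 1
                degrees[j] += 1
    return degrees


# Three clustered cells and one isolated cell.
cells = [(0, 0), (1, 0), (0, 1), (10, 10)]
degrees = cell_graph_degrees(cells, max_dist=1.5)

# Degree distribution: how many vertices have each degree.
degree_distribution = Counter(degrees)
```

A distribution obtained this way can then be compared with candidate models, such as the Gamma distribution mentioned in the abstract or the power law expected of a scale-free graph.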

TP109: PSAMM: A Portable System for the Analysis of Metabolic Models
Date:Tuesday, July 12 3:50 pm - 4:10 pm
Room: Northern Hemisphere A3/A4
Topic: SYSTEMS
  • Jon Steffensen, University of Rhode Island, United States
  • Keith Dufault-Thompson, University of Rhode Island, United States
  • Ying Zhang, University of Rhode Island, United States

Area Session Chair: Trey Ideker

Presentation Overview:

The broad application of genome-scale metabolic modeling has made it a useful technique for tackling fundamental questions in biological research and engineering. Today over 100 models have been constructed for organisms of diverse metabolic activities spanning all three kingdoms of life. These models, however, have been curated independently following different conventions. The maintenance of model consistency has been challenging due to the lack of consensus in model representation and the absence of integrated modeling software for associating mathematical simulations with the annotation and biological interpretation of metabolic models. To solve this problem, we developed a new software package, PSAMM, and a new model format that incorporates heterogeneous, model-specific annotation information into modular representations of model definitions and simulation settings. PSAMM provides significant advances in standardizing the workflow of model annotation and consistency checking. Compared to existing tools, PSAMM supports more flexible configurations and is more efficient in running constraint-based simulations.
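One simple form of consistency checking on a metabolic model is to verify that every reaction conserves each chemical element. The sketch below shows that idea on a toy reaction. It does not use PSAMM's API or model format; the data layout, function name and compound identifiers are hypothetical, and the abstract does not say which checks PSAMM performs.

```python
from collections import Counter


def element_imbalance(stoichiometry, formulas):
    """Net element counts for one reaction; empty dict means balanced.

    stoichiometry : dict compound id -> coefficient
                    (negative for substrates, positive for products)
    formulas      : dict compound id -> dict element -> atom count
    """
    net = Counter()
    for compound, coefficient in stoichiometry.items():
        for element, atoms in formulas[compound].items():
            net[element] += coefficient * atoms
    return {element: n for element, n in net.items() if n != 0}


# Toy compounds with their elemental formulas.
formulas = {
    "h2": {"H": 2},
    "o2": {"O": 2},
    "h2o": {"H": 2, "O": 1},
}

# 2 H2 + O2 -> 2 H2O conserves both elements.
balanced = element_imbalance({"h2": -2, "o2": -1, "h2o": 2}, formulas)

# H2 + O2 -> H2O consumes one more oxygen atom than it produces.
unbalanced = element_imbalance({"h2": -1, "o2": -1, "h2o": 1}, formulas)
```

Running a check like this across all reactions flags curation errors before any simulation is attempted, which is the kind of workflow step the abstract describes standardizing.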

TP110: A Low-Latency, Big Database System and Browser for Storage, Querying and Visualization of 3D Genomic Data
Date:Tuesday, July 12 4:10 pm - 4:30 pm
Room: Northern Hemisphere A1/A2
Topic: GENES / DATA
  • Alexander Butyaev, McGill University, Canada
  • Ruslan Mavlyutov, University of Fribourg, Switzerland
  • Mathieu Blanchette, McGill University, Canada
  • Philippe Cudré-Mauroux, University of Fribourg, Switzerland
  • Jérôme Waldispühl, McGill University, Canada

Area Session Chair: Janet Kelso

Presentation Overview:

Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, storage technology and visualization tools need to evolve to offer the scientific community fast and convenient access to these data. We introduce both a database system to store and query 3D genomic data (3DBG) and a 3D genome browser to visualize and explore 3D genome structures (3DGB). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions and, importantly, scales gracefully with the size of the data. We also illustrate the usefulness of our 3D genome Web browser for exploring human genome structures.
The 3D genome browser is available at http://3dgb.cs.mcgill.ca/.