All Highlights and Proceedings Track presentations are presented by scientific area as part of the combined Paper Presentation schedule.
A full schedule of Paper Presentations can be found here.
KN1 - How Chromatin organization and epigenetics talk with alternative splicing
Room: Hall 1
Date: Sunday, July 21
Author(s): Gil Ast, Tel Aviv University, Israel
KN3 - Sequencing based functional genomics (analysis)
Room: TBA
Date: Monday, July 22
Author(s): Lior Pachter, University of California, Berkeley, United States
KN4 - Searching for Signals in Sequences
Room: Hall 1
Date: Monday, July 22
Author(s): Gary Stormo, Washington University in St. Louis, United States
KN5 - Results may vary: what is reproducible? why do open science and who gets the credit?
Room: Hall 1
Date: Tuesday, July 23
Author(s): Carole Goble, University of Manchester, United Kingdom
KN6 - Protein Interactions in Health and Disease
Room: Hall 1
Date: Tuesday, July 23
Author(s): David Eisenberg, UCLA, United States
LBR01 - Ontology-aware classification of tissue and cell-type signals in gene expression profiles across platforms and technologies
Room: Hall 15.2
Date: Sunday, July 21
Author(s): Olga Troyanskaya, Princeton University, United States
Qian Zhu, Princeton University, United States
Arjun Krishnan, Princeton University, United States
Young-suk Lee, Princeton University, United States
Directly dealing with the multicellularity and heterogeneity of human gene expression samples is paramount for understanding human homeostasis, disease manifestation and pharmacokinetics/pharmacodynamics. However, leveraging gene expression data through large-scale integrative analyses is challenging because most samples are not fully annotated to their tissue/cell-type of origin. A computational method to classify samples using their entire gene expression profiles is needed. Such a method must be applicable across thousands of independent studies, hundreds of gene expression technologies, and hundreds of diverse human tissues and cell-types. We present URSA (Unveiling RNA Sample Annotation), which leverages the complex tissue/cell-type relationships and simultaneously estimates the probabilities associated with hundreds of tissues/cell-types for any given gene expression profile. URSA provides accurate and intuitive probability values for expression profiles across independent studies and outperforms other methods irrespective of data preprocessing techniques. Moreover, without re-training, URSA can be used to classify samples from diverse microarray platforms and even from next generation sequencing technology. Finally, we provide a molecular interpretation for the tissue and cell-type models as the biological basis for URSA’s classifications.
LBR02 - A Model-Based Analysis of GC-Biased Gene Conversion
Room: Hall 15.2
Date: Sunday, July 21
Author(s): John Capra, Vanderbilt University, United States
Melissa Hubisz, Cornell University, United States
Dennis Kostka, University of Pittsburgh, United States
Katherine Pollard, University of California, San Francisco, United States
Adam Siepel, Cornell University, United States
Interpreting patterns of DNA sequence variation between the genomes of closely related species is critically important to understanding the causes and functional effects of nucleotide substitutions. In addition to well-studied adaptive processes, like natural selection, other forces influence substitution patterns. GC-biased gene conversion (gBGC) is a recombination-associated evolutionary process that favors the fixation of strong (G/C) over weak (A/T) alleles. In mammals, gBGC is thought to promote variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations. It also has the potential to produce false positives in common tests for positive selection. However, because it is difficult to incorporate gBGC into existing statistical models of evolution, its genome-wide influence is poorly understood. In this work, we describe a new phylogenetic hidden Markov model that jointly models the effects of selection and gBGC and apply it to the human and chimpanzee genomes. We find that gBGC has influenced a small, but important fraction of these genomes. Fast evolving regions and disease-associated polymorphisms show significant enrichment for gBGC. Overall, our analyses indicate that gBGC has been an important force in recent human evolution, and our publicly available algorithms and predictions will enable other researchers to consider gBGC in their analyses.
LBR03 - Determination of hormone induced structural changes in genomic topological domains
Room: Hall 15.2
Date: Sunday, July 21
Author(s): Davide Bau, Centro Nacional de Analisis Genomica, Spain
Marc Marti-Renom, Centro Nacional de Analisis Genomica, Spain
Advances in genomic technologies have yielded better insights into how the genome is organized inside the cell nucleus. Recently, it has been shown that chromatin is organized in Topologically Associating Domains (TADs), large interaction domains that appear to be conserved among different cell types. To determine whether these TADs have a functional role during the dynamic changes of gene expression in terminally differentiated cells, we studied the relationship between the spatial position of Progesterone (Pg) responsive genes and the TAD structure in breast cancer cells. Using Hi-C data, we found that the genome is organized into about 2,000 TADs. TADs were similarly positioned before and after hormone treatment; nonetheless, Pg induced some changes in the intra-TAD chromatin interactions. Unexpectedly, a large proportion of genes that responded similarly to Pg treatment were clustered within individual TADs, indicating a topological segregation of Pg up- and down-regulation sites. Remarkably, the hormone induced correlated epigenetic changes that spread over several hundred kb, revealing regional remodeling of chromatin. Although consecutive TADs can be covered by one or more similar epigenetic changes, their combination differs among individual consecutive TADs, reflecting topologically restrained combinatory chromatin signatures. Integrative 3D modeling of the intra-TAD contacts before and after Pg stimulation further supports this hypothesis, showing dynamic structural changes correlated with the transcriptional response. Given the segregation of target genes in TADs and the fine-tuning of Pg induced chromatin changes, we propose that TADs behave as regulons enabling spatially proximal genes to be coordinately transcribed in response to hormone.
LBR04 - Maximum Parsimony Interpretation of Chromatin Capture Experiments
Room: Hall 15.2
Date: Sunday, July 21
Author(s): Andrzej Kudlicki, University of Texas Medical Branch, United States
Genome-wide chromatin conformation capture experiments allow characterizing the spatial structure of the genome; however, existing methods of data processing provide no means of appreciating the variability between the cells in the sample. We present a novel algorithmic framework that addresses this problem by analyzing the geometric and topological characteristics of an experimental DNA contact network. Our method, applied to the measurement of interactions in the yeast genome of Duan et al. (2010), proves that indeed no homogeneous conformation can agree with the observed 3C contacts, and attempting to construct a homogeneous 3D model will lead to thousands of geometrically impossible structural motifs. The topological properties of the DNA contact network, along with Occam’s razor principle, are used to reconstruct the chromatin conformations characteristic of uniform subpopulations of cells confounding the experimental sample. Specifically, the individual chromatin states are inferred by analyzing and coloring a line graph representing geometrical conflicts within the DNA contact network, i.e., loci whose direct interpretation will lead to violation of the triangle inequality. We show that hundreds of thousands of conflicting interactions can be resolved by just a handful of chromatin states, and that the properties of these states point to different transcriptional programs being executed.
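The notion of a geometric conflict mentioned above can be illustrated with a small sketch. It is not the author's algorithm: it only checks, for a set of pairwise distance estimates between loci, which triples cannot be realized by any single 3D conformation because they violate the triangle inequality. The function name `triangle_violations`, the locus labels and all distance values are hypothetical.

```python
from itertools import combinations


def triangle_violations(dist):
    """Return triples of loci whose pairwise distances violate the
    triangle inequality (longest side > sum of the other two sides).

    `dist` maps unordered locus pairs, given as 2-tuples, to distances.
    Triples with a missing pairwise distance are skipped.
    """
    loci = sorted({locus for pair in dist for locus in pair})

    def d(a, b):
        # Look the pair up in either orientation.
        return dist.get((a, b), dist.get((b, a)))

    violations = []
    for a, b, c in combinations(loci, 3):
        sides = [d(a, b), d(b, c), d(a, c)]
        if None in sides:
            continue
        longest = max(sides)
        if longest > sum(sides) - longest:
            violations.append((a, b, c))
    return violations


# Invented distances between four hypothetical loci.
example = {
    ("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 5.0,
    ("A", "D"): 2.0, ("B", "D"): 2.0, ("C", "D"): 4.0,
}
print(triangle_violations(example))
```

In this toy input, the triples (A, B, C) and (B, C, D) are inconsistent with a single conformation, which is the kind of conflict that a mixture of several conformations could explain.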
LBR05 - A protein domain-centric approach for the comparative analysis of human and yeast phenotypically relevant mutations
Room: Hall 15.2
Date: Sunday, July 21
Author(s): Maricel Kann, UMBC, United States
The body of disease mutations with known phenotypic relevance continues to increase and is expected to do so even faster with the advent of new experimental techniques such as whole-genome sequencing coupled with disease association studies. However, genomic association studies are limited by the molecular complexity of the phenotype being studied and the population size needed to have adequate statistical power. One way around this problem, which is critical for the study of rare diseases, is to study the functional patterns of known disease mutations. We have previously shown that the functional patterns of known human disease mutations have a significant tendency to cluster at protein domain positions, namely position-based domain hotspots of disease mutations. However, the limited number of known disease mutations remains the main factor hindering the advancement of mutation studies at a functional level. In this paper, we address this problem by incorporating mutations known to be disruptive of phenotypes in other species. Focusing on two evolutionarily distant organisms, human and yeast, we describe the first inter-species analysis of mutations of phenotypic relevance at the protein domain level. Our results show that phenotypic mutations from yeast cluster at specific positions on protein domains, a characteristic previously revealed to be displayed by human disease mutations. This first-of-a-kind study of phenotypically relevant yeast mutations in relation to human disease mutations demonstrates the utility of a multi-species analysis for advancing the understanding of the relationship between genetic mutations and phenotypic changes at the organismal level.
LBR06 - Predicting the biochemical consequences of missense mutations using genome-wide homology modeling
Room: Hall 15.2
Date: Sunday, July 21
Author(s): Andrew Bordner, Mayo Clinic, United States
Barry Zorman, Mayo Clinic, United States
The discovery of which mutations contribute to a particular disease is an important biomedical problem with potential applications in drug discovery, disease diagnosis and prognosis, and the development of improved personalized therapies. To this end, we have developed a computational method that integrates complementary approaches for predicting the biochemical effects of missense mutations using genome-wide generation of homology models for human protein complexes. Mutations affecting diverse types of binding sites are identified by homology to available X-ray structures of complexes and machine learning classifiers while spatial clustering of mutations is used to detect other compact regions of the protein structure important for its function. A Random Forest classifier trained on results from these structure-based methods, as well as annotations from online databases, evolutionary conservation, and predicted stability changes was found to outperform current popular prediction methods. Finally, the predicted biochemical effects of mutations showed good agreement with experimental assays.
LBR07 - Integrative modelling coupled with mass spectrometry (MS)-based approaches reveals the structure and dynamics of protein assemblies
Room: Hall 15.2
Date: Sunday, July 21
Author(s): Argyris Politis, University of Oxford, United Kingdom
In recent years, integrative structure determination of protein complexes has garnered great interest as a result of the vast amount of data obtained by different experiments. In particular, integrative approaches have gained attention for studying highly heterogeneous and dynamic systems which remain refractory to structure determination by conventional methods. Key developments in emerging mass spectrometry (MS)-based techniques, such as native MS and ion mobility (IM)-MS, have led to their integration into the structural biologist’s pipeline. Here we present an integrative approach for structure determination of protein assemblies by combining native MS, IM-MS and chemical cross-linking MS. The accuracy and confidence levels of this approach are demonstrated by encoding data from MS techniques into restraints for assembling a set of known hetero-complexes from their building blocks. This method enabled us to characterize the structures of two unknown precursors acting en route to the assembly of the AAA-ATPase base subcomplex within the proteasome, a macromolecule responsible for the controlled degradation of intracellular proteins.
LBR08 - The next generation of SCOP and ASTRAL
Cancelled
Room: Hall 15.2
Date: Sunday, July 21
Author(s): John-Marc Chandonia, Berkeley National Lab, United States
The Structural Classification of Proteins (SCOP) database is a manually curated, near-comprehensive ordering of domains from proteins of known structure in a hierarchy according to their structural and evolutionary relationships. The ASTRAL compendium is a collection of software and databases, closely related to SCOP, that is used to aid research into protein structure and evolution. We released new versions of both SCOP and ASTRAL (1.75B) in January 2013. The new releases are the second in a series of stable SCOP and ASTRAL releases based on SCOP 1.75. New versions of both databases are presented to the public through a single, unified interface (http://scop.berkeley.edu/). New features include a SQL-based infrastructure and build procedure, a fully automated classification scheme for new PDB entries that are similar to previously classified entries, and periodic incremental releases to supplement the stable releases. More than 11,300 new PDB entries have been added since SCOP 1.75, without sacrificing the reliability that SCOP has accumulated through years of careful manual curation. We plan to introduce additional features in a series of stable releases, while a major reclassification (SCOP 2.0) is in progress.
LBR09 - Computational methods to preclude switch-like behavior: analysis of the Biomodels database
Room: Hall 15.2
Date: Monday, July 22
Author(s): Elisenda Feliu, University of Copenhagen, Denmark
Miguel A Alejo, University of Copenhagen, Denmark
Carsten Wiuf, University of Copenhagen, Denmark
The number of states in which a cell can be at any given time is linked to the flexibility in its decision making and to cell-to-cell variability. In particular, bi- and multistable cellular systems provide mechanisms for rapidly switching between different responses. Identifying whether a system exhibits multistable behavior or not is, however, challenging. The theoretical determination of small motifs in gene regulatory networks and signaling pathways that can exhibit multistationarity has been the focus of several studies in the past. However, it remains unclear to what extent these motifs are actually highly represented in living cells.
We have developed a computational method that gives a necessary condition for a system to exhibit multistationarity. If a system is multistationary, we can screen all small subnetworks and determine the key components in multistationarity. We have applied the method to 365 models extracted from the publicly available database Biomodels with data precomputed in PoCab. In this way, we have obtained a catalog of small motifs responsible for multistationarity in real systems.
At the conference, the method will be briefly described and the exhaustive analysis of the Biomodels database, including the small structures causing multistationarity, will be presented.
LBR10 - Efficient Modeling and Active Learning of Biological Responses: Learning without Prior Knowledge
Room: Hall 15.2
Date: Monday, July 22
Author(s): Robert Murphy, Carnegie Mellon University, United States
Armaghan Naik, Carnegie Mellon University, United States
Joshua Kangas, Carnegie Mellon University, United States
Devin Sullivan, Carnegie Mellon University, United States
Christopher Langmead, Carnegie Mellon University, United States
Robert Murphy, Carnegie Mellon University, United States
High throughput screening involves determination of the effect of many chemical compounds on a given cellular target. As currently practiced, a full set of measurements for all compounds for each new target is typically made, with little use of information from previous screens. To efficiently study compound effects on many targets, a means is needed for determining and exploiting similarities in the effects of compounds and/or behavior of targets such that measurements of all combinations of compounds and targets are not needed to achieve high accuracy. Here, we describe probabilistic models that can be used to predict results for unmeasured combinations, and active learning algorithms for selecting future informative batches of experiments. Through extensive simulated experiments we showed that our approaches can produce powerful predictive models and learn them significantly faster than can be done by random choice. We further characterized our method’s performance experimentally using a collection of 48 compounds and 48 NIH 3T3 cell clones expressing different GFP-tagged proteins; the learner’s task was to efficiently build a model of the effects of each compound on each clone. Since none of the effects were known prior to beginning the experiments, each clone and compound was silently duplicated to provide the ability to check how well duplicates were recognized. The learner could request acquisition of batches of image data for specific combinations of drugs and clones using liquid handling robotics and an automated microscope. Our method achieved 92% accuracy having sampled only 28% of the experiment space.
LBR11 - De novo reconstruction of cell cycle progression using Tour-Recovered Automatic models for Cellular Continuums (TRACC) on multiparameter flow cytometry data
Room: Hall 15.2
Date: Monday, July 22
Author(s): Tiffany Chen, Stanford University, United States
Matthew Clutter, Stanford University, United States
Nikesh Kotecha, Stanford University, United States
Karen Sachs, Stanford University, United States
Wendy Fantl, Stanford University, United States
Garry Nolan, Stanford University, United States
Serafim Batzoglou, Stanford University, United States
Most cell-based drug screening methods identify and evaluate potential drug candidates based on measurements of cell death or target inhibition. Using these approaches, the global impact of these drug candidates on cell cycle and signaling networks is greatly deemphasized, even though quantitative analysis of the cell cycle is fundamental to most anti-cancer drug development. Single-cell multiparameter flow cytometry can simultaneously measure intracellular proteins including those participating in the cell cycle and signaling pathways. To date, however, no automated, data-driven method exists for processing such biologically complex measurements. To address this need, we developed Tour-Recovered Automatic models for Cellular Continuums (TRACC), a computational methodology for automatically reconstructing the cell cycle de novo from flow cytometry data. TRACC reconstructs cell cycle progression without prior expert knowledge, thus setting a foundation for automated cell cycle analysis.
LBR12 - Network-based stratification of tumor mutations
Room: Hall 15.2
Date: Monday, July 22
Author(s): Matan Hofree, UCSD, United States
John Shen, University of California, San Diego, United States
Hannah Carter, University of California, San Diego, United States
Andy Gross, University of California, San Diego, United States
Trey Ideker, University of California, San Diego, United States
Many forms of cancer consist of multiple subtypes with different molecular causes and clinical outcomes. Somatic tumor genomes provide a rich new source of data for uncovering these subtypes, but have proven difficult to compare as two tumors rarely share the same mutations. Here, we introduce ‘network-based stratification’ (NBS), which integrates somatic tumor genomes with gene networks. This approach allows for stratification of cancer into informative subtypes by clustering together patients who have mutations within similar network regions. We demonstrate the validity of this approach in simulation. Next, we apply the method to somatic mutation data from three cancer patient cohorts collected as part of The Cancer Genome Atlas, namely ovarian cancer (OV), breast cancer (BRCA) and uterine cancer (UCEC), and are able to discover a robust cluster assignment significantly associated with important clinical phenotypes. In BRCA, we recover subtypes significantly correlated with known subtypes and other clinical markers. In UCEC, subtypes segregate patients into distinct sets enriched for tumor grade and histology. In OV, subtypes are associated with patient survival and acquired resistance to platinum chemotherapy. We use the OV subtypes to define a predictive signature based on gene expression which successfully recovers the somatic mutation-derived subtypes in an independent expression cohort. Finally, we use the subtypes derived in each cohort to highlight potentially dysregulated subnetworks characteristic of each mutation-derived subtype. This study provides a proof of principle for the utility of combining somatic mutation genotypes with interaction networks, enabling the discovery of clinically meaningful mutation-based subtypes.
LBR13 - Comparison of D. melanogaster and C. elegans Developmental Stages by modENCODE RNA-Seq data
Room: Hall 15.2
Date: Monday, July 22
Author(s): Steven Brenner, University of California, Berkeley, United States
Jingyi Jessica Li, University of California, Berkeley, United States
Haiyun Huang, University of California, Berkeley, United States
Peter Bickel, University of California, Berkeley, United States
Steven Brenner, University of California, Berkeley, United States
Drosophila melanogaster and Caenorhabditis elegans are two well-studied model organisms in developmental biology. Their morphological development differs greatly, yet we postulated that there may nonetheless be underlying shared developmental programs employing orthologous genes. We used modENCODE RNA-Seq data to perform a transcriptome-wide comparison of their developmental time courses to address this question. Our approach centers on using stage-associated orthologous genes to link the two organisms. For every stage in each organism, we select stage-associated genes, which are defined as relatively highly expressed at that stage compared with others. We tested the dependence of a pair of D. melanogaster and C. elegans stages in terms of orthologous gene expression, that is, the number of orthologous gene pairs associated with both stages.
We first carried out the test on pairs of stages within D. melanogaster and C. elegans respectively, and we found that temporally adjacent stages in both species exhibit high dependence in gene expression, supporting the validity of this approach. When comparing fly with worm, we observed a strong collinearity of their developmental time courses from early embryos to late larvae. Another parallel collinear pattern is found between fly white prepupae through adults and worm late embryos through adults. Investigating stage-associated genes that overlap between stages shows that many-to-one fly-worm orthologs are key factors leading to the two collinear patterns. Some orthologs are known to play similar roles in both organisms, and their mapping in this study may help inform their functions in the development of D. melanogaster and C. elegans.
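A test of dependence based on the number of ortholog pairs associated with both stages can be sketched as an overlap test. The abstract does not state which test the authors used; the hypergeometric tail probability below is one common choice and is an assumption here, as are the function name `hypergeom_sf` and all counts in the usage example.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total number of ortholog pairs considered
    K: pairs whose fly gene is associated with the fly stage
    n: pairs whose worm gene is associated with the worm stage
    k: pairs associated with both stages (observed overlap)
    """
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Invented counts: 1000 ortholog pairs, 100 associated with a fly stage,
# 80 with a worm stage, 20 with both.
p_value = hypergeom_sf(20, 1000, 100, 80)
print(p_value)
```

A small tail probability would indicate that the two stages share more stage-associated ortholog pairs than expected if the associations were independent.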
LBR14 - Phylogenetic quantification of intra-tumour heterogeneity
Room: Hall 15.2
Date: Monday, July 22
Author(s): Roland Schwarz, European Molecular Biology Laboratory, United Kingdom
Intra-tumour heterogeneity (ITH) is currently the focus of cancer research due to its implications for disease progression, resistance development and its impact on personalised medicine approaches. Understanding the aetiology of ITH involves reconstructing the evolutionary history of cancer within the patient. Especially with respect to genomic rearrangements, this is impeded by changing cellularity, unknown phasing of genomic variants and the fact that genomic rearrangement events cover large, often overlapping segments of the genome.
In this study we have assembled a novel clinical dataset of 170 copy number (CN) profiles from 20 patients undergoing neoadjuvant chemotherapy for high-grade serous ovarian cancer. Patients were sampled at multiple distinct sites at biopsy, interval debulking surgery and relapse. We have developed MEDICC, a novel phylogenetic method for reconstruction of evolutionary trees based on genomic rearrangements. Employing state-of-the-art machine learning techniques, we phase parental alleles, reconstruct trees and ancestral genomes and at the same time numerically quantify the degree of ITH and clonal expansion in each patient. Correlation of these indices with clinical endpoints such as progression-free survival shows how the amount of genomic change in the course of chemotherapy and the degree of clonal expansion determine patient survival times.
Our study is the first to combine rigorous evolutionary methodology with a novel clinical dataset of a large patient cohort to quantify ITH in a rigorous and unbiased manner. We combine insights from natural language processing with spatial statistics to quantify biologically meaningful indices of cancer progression in a coherent translational setting.
LBR15 - The Yule-Simpson effect casts doubt on DNA methylation differences at functional boundaries
Room: Hall 15.2
Date: Monday, July 22
Author(s): Lior Pachter, University of California, Berkeley, United States
Meromit Singer, UC Berkeley, United States
Genome-wide functional assays based on high-throughput sequencing now allow for experimental probing of a wide variety of molecular phenotypes. Among these is DNA methylation, which can be probed at all CpG sites in the genome using bisulfite sequencing. This has allowed for comparisons of methylation extent in different functional regions by first averaging methylation states within region types and then comparing averages between regions. Such comparisons have become commonplace in genome-wide DNA methylation studies. For example, it has been repeatedly reported that the methylation extent is significantly higher in coding regions as compared to introns or UTRs. We report and characterize a bias present in these seemingly straightforward comparisons that is a special case of the Yule-Simpson effect and show it has extensively altered the magnitude and significance of DNA methylation differences observed and reported from such comparative studies. The bias we discuss arises from the dependence of the sparsity of CpG sites on the extent of evolutionary pressure at a region, together with its overall methylation state. We present a correction utilizing a matrix completion algorithm that is based on a methylation model and show how it affects reported results regarding differences in DNA methylation across functional regions.
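The Yule-Simpson effect in averaged comparisons can be illustrated with a toy calculation. The sketch below does not use the authors' data or their correction method; the two strata (`cpg_sparse`, `cpg_dense`) and every count are invented solely to show how pooling can reverse a comparison that holds within each stratum.

```python
# Hypothetical (methylated, total) CpG site counts per region type and
# stratum. All numbers are invented for illustration only.
counts = {
    "coding": {"cpg_sparse": (810, 900), "cpg_dense": (20, 100)},
    "intron": {"cpg_sparse": (95, 100), "cpg_dense": (270, 900)},
}


def stratum_rate(region, stratum):
    """Fraction of methylated sites within one stratum of a region type."""
    methylated, total = counts[region][stratum]
    return methylated / total


def pooled_rate(region):
    """Fraction of methylated sites after pooling all strata of a region type."""
    methylated = sum(m for m, _ in counts[region].values())
    total = sum(t for _, t in counts[region].values())
    return methylated / total


for stratum in ("cpg_sparse", "cpg_dense"):
    print(stratum, stratum_rate("coding", stratum), stratum_rate("intron", stratum))
print("pooled", pooled_rate("coding"), pooled_rate("intron"))
```

With these invented counts, introns are more methylated than coding regions within each stratum (0.95 vs 0.90 and 0.30 vs 0.20), yet the pooled averages point the other way (0.83 for coding vs 0.365 for introns), because the two region types contribute very different numbers of sites to each stratum.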
LBR16 - Epigenetic mechanisms underlying human T helper cell differentiation
Room: Hall 15.2
Date: Monday, July 22
Author(s): Harri Lähdesmäki, Aalto University, Finland
David Hawkins, University of Washington School of Medicine, United States
Antti Larjo, Aalto University, Finland
Subhash Tripathi, University of Turku and Åbo Akademi University, Finland
Ulrich Wagner, Ludwig Institute for Cancer Research, University of California San Diego, United States
Ying Luu, Ludwig Institute for Cancer Research, University of California San Diego, United States
Tapio Lönnberg, University of Turku and Åbo Akademi University, Finland
Sunil Raghav, University of Turku and Åbo Akademi University, Finland
Leonard Lee, Ludwig Institute for Cancer Research, University of California San Diego, United States
Riikka Lund, University of Turku and Åbo Akademi University, Finland
Harri Lähdesmäki, Aalto University, Finland
Bing Ren, Ludwig Institute for Cancer Research, University of California San Diego, United States
Riitta Lahesmaa, University of Turku and Åbo Akademi University, Finland
Multipotent CD4+ T cells are central to the adaptive immune system. CD4+ T cells can differentiate to functionally distinct effector subtypes such as T helper 1 (Th1), Th2, Th17, and iTreg. In this study, we have focused on identification of histone modifications (H3K4me1, H3K27ac, H3K4me3) that define the cell-type specific functional cis-regulatory repertoire for early differentiating human Th1 and Th2 cells. Additionally, we have integrated genome-wide digital gene expression analysis from the Helicos platform to correlate epigenetic information with gene expression. We also overlay the identified enhancer regions with open chromatin sites (DNase-seq) from fully differentiated T cells to characterize whether early enhancers are active only during the early lineage specification or remain active in committed Th cells. By analyzing transcription factor binding sites at enhancers we are able to identify known and novel transcriptional regulators which drive the lineage determination. Lastly, under the principle that improper cell fate specification can lead to immunopathogenesis, we found within these lineage-specific enhancers a great number of SNPs from genome-wide association studies (GWAS) that were associated with various autoimmune disorders including T1D, rheumatoid arthritis, Crohn’s disease, and asthma. Several alter transcription factor binding site motifs, and using DAPA experiments we show for a subset of such SNPs within these predicted sites that they influence transcription factor binding. This study provides the first look at how enhancers can contribute to early human T cell lineage specification. Our results also provide insight into how regulatory SNPs may contribute to the disease pathogenesis.
LBR17 - An assessment of the recovery of curated genetic variants through text mining
Room: Hall 10
Date: Tuesday, July 23
Author(s): Antonio Jimeno Yepes, National Information Communications Technology Australia, Australia
Karin Verspoor, NICTA, Australia
We assess a mutation extraction tool with respect to the task of curation of the literature for the purpose of populating a database of genetic variation information. Our analysis shows that the ability of text mining tools to recover the mutations catalogued in the databases is far less than what would be expected based on the typically excellent performance of such tools on intrinsic evaluation. While lack of access to the full text of publications has been argued to explain this phenomenon, we show that the effect persists even when the full text article that was indicated to be the direct source of a mutation in a curated resource is available for processing. We explore several possible explanations for these results, including difficulties in linking genetic variants to specific genes, and the inclusion of data from high-throughput experiments. The results of our work have implications for the future development of text mining systems for genetic variation.
LBR18 - Quantification of Cell-to-cell Variability in Protein Spatial Spread from Fluorescence Microscopy of Unsynchronized Budding Yeast
Room: Hall 10
Date: Tuesday, July 23
Author(s): Louis-Francois Handfield, University of Toronto, Canada
Alan Moses, University of Toronto, Canada
The characterization of protein abundance and stochastic abundance has been systematically defined in budding yeast using fluorescently tagged proteins. Subcellular location can also be systematically uncovered using supervised machine learning approaches that have been trained to recognize predefined image classes based on statistical features. As an alternative, we capture cell stage dependence of protein spatial expression within automatically identified cells. We use the identified bud area as a cell-stage indicator. We show that similarities between the inferred expression patterns contain more information about protein function than can be explained by a previous manual categorization of subcellular localization. Further analysis reveals that such a characterization allows identification of 12% of the 4004 proteins by finding the protein that is closest in expression pattern in a replicate experiment. This characterization includes stochasticity levels in measurement, which are correlated with previous reports in the case of stochasticity in protein abundance. Other stochasticity levels, such as in compactness for protein expression, are shown to be reproducible. Changes in cell morphology due to the alpha factor mating pheromone or changes of fluorescent markers required for segmentation also have a limited impact on the measured variability levels. Our results suggest that quantitative cell-stage dependent representations of protein spread discriminate protein spatial expressions without requiring predefined subcellular location classes. We show that some major quantified deviations, such as high spatial variability, are systematically detected under a spectrum of experimental conditions.
TOPLBR19 - ARepA: automated repository acquisition for standardized high-throughput data retrieval, normalization, and analysis
Room: Hall 10Date: Tuesday, July 23
Author(s): Daniela Boernigen, Harvard School of Public Health, Harvard University, us
Daniela Boernigen, Harvard School of Public Health, United States
Yo Sup Moon, Harvard School of Public Health, United States
Levi Waldron, Harvard School of Public Health, United States
Eric Franzosa, Harvard School of Public Health, United States
Curtis Huttenhower, Harvard School of Public Health, United States
Biological databases of high-throughput experimental results provide vast and growing resources for medical and bioinformatic research. Open questions remain in how best to maintain such resources, access them computationally, meta-analyze their contents from hundreds of experiments, and do so reproducibly while maintaining computational best practices.
We present ARepA, an extensible, modular Automated Repository Acquisition system for reproducible biological data acquisition and processing. ARepA allows configurable data access for any organism(s) from the GEO, IntAct, BioGRID, RegulonDB, STRING, Bacteriome, and MPIDB databases. A user can retrieve raw data and metadata from these repositories, normalize data files, and automatically process them in standardized ways (e.g. for network analysis). When retrieving data from six model organisms, ARepA currently produces more than 2M interactions (600K physical interactions, 4K regulatory interactions, 1.5M functional associations) and 2.7K gene expression data sets covering approx. 800K samples, accompanied by corresponding metadata and derived network data.
We include biological examples demonstrating the utility of ARepA for integrative analyses. When focusing on human data, ARepA's metadata database allowed us to identify and standardize 12 human prostate cancer gene expression datasets from GEO, which were subsequently meta-analyzed across six different platforms. A subsequent co-expression network analysis correctly recovered the NF-κB signaling pathway along with new candidate genes with roles in prostate cancer. A similar example in mouse integrates 11 gene expression datasets selected by querying ARepA for metadata indicating germ-free and intestinal tissue conditions. Finally, multiple data types from three model microbes were integrated to assess differences in peptide secretion systems.
TOPLBR20 - Deciphering the Gene Expression Code via a Combined Synthetic-Computational Biology Approach
Room: Hall 10Date: Tuesday, July 23Author(s): Tamir Tuller, Tel Aviv University, il
One of the greatest challenges of functional genomics is to decipher the way information encoded in the transcript affects various aspects of its expression regulation. Since it is impossible to determine causality based on the analysis of endogenous sequence features and expression levels, we suggest a novel combined computational-synthetic biology approach. The talk will survey large-scale synthetic biology experiments for understanding three aspects of gene expression: 1) splicing; 2) translation elongation; 3) translation initiation from out-of-frame codons. In each experiment, a specific library including hundreds of heterologous genes was tailored to tackle the corresponding question, the expression levels of all the library genes were measured in S. cerevisiae, and the results were computationally analyzed.
Among others, our analyses emphasize the contribution of local folding strength in different parts of the transcript, and the position and distribution of codons to splicing and translation efficiency and fidelity. In addition, we report novel sets of enhancer and silencer sequence motifs that contribute to various aspects of translation and splicing regulation.
I will also explain how the results inferred in the three studies are integrated, and compared to existing computational biophysical models of gene expression, and will compare the obtained results to the ones reported recently via an evolutionary systems biology analysis of endogenous genes.
TOPLBR21 - Experimental characterization of the human non sequence-specific nucleic acid interactome
Room: Hall 10Date: Tuesday, July 23Author(s): Jacques Colinge, CeMM, at
Gerhard Dürnberger, CeMM, Austria
Tilmann Bürckstümmer, CeMM, Austria
Kilian Huber, CeMM, Austria
Roberto Giambruno, CeMM, Austria
Evren Karayel, CeMM, Austria
Thomas Burkard, CeMM, Austria
Ines Kaupe, CeMM, Austria
Andre Müller, CeMM, Austria
Keiryn Bennett, CeMM, Austria
Tobias Doerks, EMBL, Germany
Peer Bork, EMBL, Germany
Andreas Schönegger, CeMM, Austria
Gerhard Ecker, Uni Wien, Austria
Hans Lohninger, TU Wien, Austria
Giulio Superti-Furga, CeMM, Austria
Interactions between proteins and nucleic acids (NAs) play a pivotal role in a wide variety of essential biological processes. Transcription factors that recognize specific DNA motifs only constitute part of the NA-binding proteins (NABPs). In this study, we present the first large-scale effort to systematically map human NABPs with generic classes of nucleic acids. Using 25 carefully designed synthetic DNA and RNA oligonucleotides as baits and affinity purification mass spectrometry (AP-MS), we performed pulldowns in three cell lines that yielded 10,000+ protein-NA interactions and involved 900+ proteins. Bioinformatic analysis allowed us to identify 139 new NABPs, to provide first experimental evidence for another 98, and to determine 513 specificities for 219 distinct NABPs for different subtypes of NAs.
Successful validation of 7/8 chosen new specificities confirmed the affinity of YB-1 for methylated cytosine. YB-1 is over-expressed in tumors and is associated with multiple drug resistance. Network analysis of YB-1 ChIP-seq peak nearest genes identified a subnetwork of 73 genes strongly associated with cancer pathways, thereby suggesting a potential epigenetic role of YB-1 in resistant tumors.
We could also show that non-sequence-specific DNA-binding proteins interact with nucleic acid chains through an interface that is more constrained in its geometry than that of proteins binding mRNA, which are known to contain more disordered regions.
To extend the experimental data we undertook a machine learning approach to derive a method of automatically inferring nucleic acid binding. We employed a family of support vector machines (SVMs) to predict NA binding de novo.
TOPLBR22 - Sequence Determinants Govern the Translation Efficiency of the Secretory Proteome
Room: Hall 10Date: Tuesday, July 23Author(s): Michal Linial, The Hebrew University of Jerusalem, il
Shelly Mahlab, The Hebrew University, Israel
Translation must be tightly controlled to cope with the cell's demand and its limited resources. Energetically, translation is the most expensive operation in dividing cells. We applied the tRNA adaptation index (tAI) as an indirect proxy for the translation rate. We tested the possibility that sequence determinants are encoded along the transcripts to govern translational efficiency. The secretory proteome comprises about 30% of the proteins in human and other multicellular model systems. Many of these proteins contain at their N-terminus a segment called the signal peptide (SP), which determines translocation to the ER. Indeed, all SP-proteins are translated by ER-membrane-bound ribosomes. We anticipated that proteins translated by free or bound ribosomes differ with respect to their overall translation speed. We demonstrate that clusters of poorly adapted codons followed by abundant codons specify the N-terminus of secreted and SP-membranous proteins. The phenomenon generalizes to the proteomes of yeast, fly and worm despite a poor correlation among their codon tAI values. We propose that translation determinants have evolved to match the cellular needs for translational rate. The codons' arrangement along transcripts is crucial for management of synaptic sites and poorly folded protein translation. The appearance of low-tAI codons at the N-terminus of SP proteins attenuates the elongation rate. We conclude that processes such as translocation through the ER membrane, processing, maturation and folding are dependent on a specific codon arrangement that dictates a delay in translational elongation.
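As a rough illustration of the tAI measure used above: the tAI of a coding sequence is commonly defined as the geometric mean of per-codon adaptation weights, and a sliding window over codons gives a local profile in which a cluster of poorly adapted codons shows up as a dip. The sketch below assumes this standard definition; the weight table is a made-up toy, not real tRNA-derived values, and skipping codons without a weight is a simplification.

```python
import math


def split_codons(cds):
    """Split a coding sequence into codons, ignoring a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def tai(cds, weights):
    """Geometric mean of codon weights; codons without a weight are skipped."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


def tai_profile(cds, weights, window):
    """Local tAI in a sliding window of `window` codons along the sequence."""
    codons = split_codons(cds)
    return [
        tai("".join(codons[i:i + window]), weights)
        for i in range(len(codons) - window + 1)
    ]


# Hypothetical toy weights: GCG is "poorly adapted", GCT is "well adapted".
TOY_WEIGHTS = {"GCT": 1.0, "GCG": 0.25}

# Two poorly adapted codons followed by two well-adapted codons.
sequence = "GCGGCGGCTGCT"
print(tai(sequence, TOY_WEIGHTS))
print(tai_profile(sequence, TOY_WEIGHTS, window=2))
```

The profile rises from the start of the toy sequence to its end, which is the kind of N-terminal pattern the abstract describes for SP proteins.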
TOPLBR23 - SH3 Interactome Conserves General Function Over Specific Form
Room: Hall 10Date: Tuesday, July 23Author(s): David Gfeller, Swiss Institute of Bioinformatics, ch
David Gfeller, Swiss Institute of Bioinformatics, Switzerland
Xiaofeng Xin, University of Toronto, Canada
Jackie Cheng, University of California Berkeley, United States
Raffi Tonikian, University of Toronto, Canada
Charles Boone, University of Toronto, Canada
Sachdev Sidhu, University of Toronto, Canada
Gary Bader, University of Toronto, Canada
SH3 domains bind peptides to mediate protein-protein interactions that assemble and regulate dynamic biological processes. We surveyed the repertoire of SH3 binding specificity using peptide phage display in a metazoan, the worm Caenorhabditis elegans, and discovered that it structurally mirrors that of the budding yeast Saccharomyces cerevisiae. We then mapped the worm SH3 interactome using stringent yeast two-hybrid and compared it to the equivalent map for yeast. We found that the worm SH3 interactome resembles the analogous yeast network because it is significantly enriched for proteins with roles in endocytosis. Nevertheless, orthologous SH3 domain mediated interactions are highly rewired. Our results suggest a model of network evolution where general function of the SH3 domain network is conserved over its specific form.
TOPOPT01 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT02 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT03 -
CancelledRoom: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT04 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT05 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT06 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT07 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT08 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT09 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT10 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT11 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT12 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT13 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT14 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT15 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT16 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT17 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT18 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT19 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT20 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT21 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT22 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT23 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPOPT24 -
Room: ICC Lounge 81Date: Sunday, July 21Author(s): ,
TOPPP01 (PT) - Simple Topological Properties Predict Functional Misannotations in a Metabolic Network
Room: Hall 4/5Date: Sunday, July 21Author(s):Rodrigo Liberal, Imperial College London, United Kingdom
John Pinney, Imperial College London, United Kingdom
Session Chair: Erik Bongcam-Rudloff
Motivation: Misannotation in sequence databases is an important obstacle for automated tools for gene function annotation, which rely extensively on comparison to sequences with known function. To improve current annotations and prevent future propagation of errors, sequence-independent tools are therefore needed to assist in the identification of misannotated gene products. In the case of enzymatic functions, each functional assignment implies the existence of a reaction within the organism’s metabolic network; a first approximation to a genome-scale metabolic model can be obtained directly from an automated genome annotation. Any obvious problems in the network, such as dead-end or disconnected reactions, can therefore be strong indications of misannotation.
Results: We demonstrate that a machine learning approach using only network topological features can successfully predict the validity of enzyme annotations. The predictions are tested at 3 different levels. A random forest using topological features of the metabolic network and trained on curated sets of correct and incorrect enzyme assignments was found to have an accuracy of up to 86% in 5-fold cross validation experiments. Further cross validation against unseen enzyme superfamilies indicates that this classifier can successfully extrapolate beyond the classes of enzyme present in the training data. The random forest model was applied to several automated genome annotations, achieving an accuracy of 60% in most cases when validated against recent genome-scale metabolic models. We also observe that when applied to draft metabolic networks for multiple species, a clear negative correlation is observed between predicted annotation quality and phylogenetic distance to the major model organism for biochemistry (Escherichia coli for prokaryotes and Homo sapiens for eukaryotes).
Contact: j.pinney@imperial.ac.uk
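One of the topological warning signs named in the Motivation, dead-end reactions, can be sketched in a few lines. This is only an illustration of the idea, not the authors' feature set: it assumes irreversible reactions given as (substrates, products) pairs, ignores exchange with the environment, and uses a made-up toy network.

```python
def dead_end_metabolites(reactions):
    """Metabolites that are only produced or only consumed in the network."""
    consumed, produced = set(), set()
    for substrates, products in reactions.values():
        consumed.update(substrates)
        produced.update(products)
    return (produced - consumed) | (consumed - produced)


def dead_end_reactions(reactions):
    """Names of reactions that touch at least one dead-end metabolite."""
    dead = dead_end_metabolites(reactions)
    return sorted(
        name
        for name, (substrates, products) in reactions.items()
        if dead & (set(substrates) | set(products))
    )


# Hypothetical toy network: a cycle A -> B -> C -> A plus a branch C -> D.
toy_network = {
    "R1": (["A"], ["B"]),
    "R2": (["B"], ["C"]),
    "R3": (["C"], ["A"]),
    "R4": (["C"], ["D"]),
}

print(dead_end_metabolites(toy_network))
print(dead_end_reactions(toy_network))
```

Here D is produced but never consumed, so R4 is flagged. In a real draft network, a flag like this would be one input feature among several rather than a verdict on its own.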
TOPPP02 (HT) - Heart Attacks: Leveraging A Cardiovascular Systems Biology Strategy To Predict Future Outcomes
Room: Hall 7Date: Sunday, July 21Author(s): Carlo Vittorio Cannistraci, King Abdullah University of Science and Technology (KAUST), sa
Timothy Ravasi, King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Enrico Ammirati, San Raffaele Scientific Institute, Vita-Salute San Raffaele University, Italy
Session Chair: Predrag Radivojac
TOPPP03 (HT) - Computational identification of a transiently open L1/S3 pocket for reactivation of mutant p53.
Room: Hall 14.2Date: Sunday, July 21Author(s): Richard Lathrop, University of California, Irvine, us
Christopher Wassman, Google Inc., United States
Roberta Baronio, University of California, Irvine, United States
Özlem Demir, University of California, San Diego, United States
Brad Wallentine, University of California, Irvine, United States
Chiung-Kuang Chen, University of California, Irvine, United States
Linda Hall, University of California, Irvine, United States
Faezeh Salehi, University of California, Irvine, United States
Da-Wei Lin, University of California, Irvine, United States
Benjamin Chung, University of California, Irvine, United States
Wesley Hatfield, University of California, Irvine, United States
Richard Chamberlin, University of California, Irvine, United States
Hartmut Luecke, University of California, Irvine, United States
Peter Kaiser, University of California, Irvine, United States
Rommie Amaro, University of California, San Diego, United States
Session Chair: Russell Schwartz
TOPPP04 (PT) - Stability selection for regression-based models of transcription factor-DNA binding specificity
Room: Hall 4/5Date: Sunday, July 21Author(s): Fantine Mordelet, Duke University, United States
John Horton, Duke University, United States
Alexander Hartemink, Duke University, United States
Barbara Engelhardt, Duke University, United States
Raluca Gordan, Duke University, United States
Session Chair: Erik Bongcam-Rudloff
Motivation: The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix (PWM) model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, more complex models of binding specificity have been developed. However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret.
Results: We propose novel regression-based models of TF-DNA binding specificity, trained using high resolution in vitro data from custom protein binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max, and Mad2) in their native genomic context. These high-throughput, quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF-DNA binding specificity. To further illustrate the accuracy of our regression models, we show that even in the case of paralogous TFs with highly similar PWMs, our new models can distinguish the specificities of individual factors. Thus, our work represents an important step towards better sequence-based models of individual TF-DNA binding specificity.
Availability: Our code is available at http://genome.duke.edu/labs/gordan/ISMB2013. The PBM data used in this paper are available in the Gene Expression Omnibus under accession number GSE44604.
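To make the contrast with a PWM concrete, the sketch below encodes a binding site as position-specific mono-, di- and trinucleotide indicator features and scores it with a linear model. It is a minimal illustration under stated assumptions: the feature naming (`"CG@2"`), the function names and the toy weights are all hypothetical, and the feature selection step the abstract describes is not shown.

```python
def sequence_features(site, max_k=3):
    """Indicator features for every k-mer (k = 1..max_k) at every position."""
    features = {}
    for k in range(1, max_k + 1):
        for pos in range(len(site) - k + 1):
            features[f"{site[pos:pos + k]}@{pos}"] = 1
    return features


def linear_score(site, weights, max_k=3):
    """Linear model: sum of weights of the features present in the site.

    With max_k=1 this reduces to an additive, PWM-style score in which each
    position contributes independently.
    """
    return sum(weights.get(name, 0.0) for name in sequence_features(site, max_k))


# Hypothetical sparse model, as might remain after feature selection.
toy_weights = {"C@0": 1.0, "CG@2": 0.5, "TTT@0": 9.0}

site = "CACGTG"
print(len(sequence_features(site)))
print(linear_score(site, toy_weights))
print(linear_score(site, toy_weights, max_k=1))
```

The dinucleotide feature `CG@2` contributes only when `max_k` is at least 2, which is exactly the kind of dependency between neighbouring bases that a PWM cannot express.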
TOPPP05 (HT) - Of Men and Not Mice: Comparative Genomic Analysis of Human Diseases and Mouse Models
Room: Hall 7Date: Sunday, July 21Author(s): Wenzhong Xiao, Massachusetts General Hospital/Harvard Medical School and Stanford University, us
Session Chair: Predrag Radivojac
TOPPP06 (HT) - Virtual ligand screening against comparative structural models of membrane transporters
Room: Hall 14.2Date: Sunday, July 21Author(s): Avner Schlessinger, Mount Sinai School of Medicine, us
Ethan Geier, University of California, San Francisco, United States
Hao Fan, University of California, San Francisco, United States
Jonathan Gable, University of California, San Francisco, United States
John Irwin, University of California, San Francisco, United States
Kathleen Giacomini, University of California, San Francisco, United States
Andrej Sali, University of California, San Francisco, United States
Session Chair: Russell Schwartz
TOPPP07 (PT) - A Graph Kernel Approach for Alignment-Free Domain-Peptide Interaction Prediction with an Application to Human SH3 Domains
Room: Hall 4/5Date: Sunday, July 21Author(s): Kousik Kundu, University of Freiburg, Germany
Fabrizio Costa, University of Freiburg, Germany
Rolf Backofen, University of Freiburg, Germany
Session Chair: Erik Bongcam-Rudloff
State-of-the-art experimental data for determining binding specificities of peptide recognition modules (PRMs) is obtained by high-throughput approaches like peptide arrays. Most prediction tools applicable to this kind of data are based on an initial multiple alignment of the peptide ligands. Building an initial alignment can be error-prone, especially in the case of the proline-rich peptides bound by the SH3 domains. Here we present a machine learning approach based on an efficient graph-kernel technique to predict the specificity of a large set of 70 human SH3 domains, which are a very important class of PRMs. The graph-kernel strategy allows us to 1) integrate several types of physico-chemical information for each amino acid, 2) consider high order correlations between these features and 3) eliminate the need for an initial peptide alignment. We build specialized models for each human SH3 domain and achieve competitive predictive performance of 0.73 area under precision-recall curve (AUC PR), compared to 0.27 AUC PR for state-of-the-art methods based on position weight matrices. We show that better models can be obtained when we use information on the non-interacting peptides (negative examples), which is currently not used by the state-of-the-art approaches based on position weight matrices. To this end, we analyze two strategies to identify subsets of high confidence negative data. The techniques introduced here are more general and hence can also be used for any other protein domains which interact with short peptides (i.e., other PRMs).
TOPPP08 (HT) - Impact of genetic dynamics and single-cell heterogeneity on development of nonstandard personalized medicine strategies for cancer
Room: Hall 7Date: Sunday, July 21Author(s): Chen-Hsiang Yeang, Academia Sinica, tw
Robert Beckman, University of California, San Francisco, United States
Gunter Schemmann, World Water and Solar Technologies, United States
Session Chair: Predrag Radivojac
TOPPP09 (HT) - Extensive changes in DNA methylation are associated with expression of mutant huntingtin
Room: Hall 14.2Date: Sunday, July 21Author(s): Christopher Ng, Massachusetts Institute of Technology, us
Ferah Yildirim, Massachusetts Institute of Technology, United States
Yoon Yap, Massachusetts Institute of Technology, United States
Simona Dalin, Massachusetts Institute of Technology, United States
Bryan Matthews, Massachusetts Institute of Technology, United States
Patricio Velez, Massachusetts Institute of Technology, United States
Adam Labadorf, Massachusetts Institute of Technology, United States
Ernest Fraenkel, Massachusetts Institute of Technology, United States
David Housman, Massachusetts Institute of Technology, United States
Session Chair: Russell Schwartz
TOPPP10 (HT) - Systems-based metatranscriptomic analysis
Room: Hall 7Date: Sunday, July 21Author(s): Xuejian Xiong, Hospital for Sick Children, ca
John Parkinson, Hospital For Sick Children, Canada
Daniel Frank, University of Colorado, United States
Charles Robertson, University of Colorado, United States
Stacy Hung, Hospital for Sick Children, Canada
Janet Markle, Hospital for Sick Children, Canada
Jayne Danska, Hospital for Sick Children, Canada
Philippe Poussier, Sunnybrook Health Sciences Centre Research Institute, Canada
Angelo Canty, McMaster University, Canada
Kathy McCoy, University of Bern, Switzerland
Andrew MacPherson, University of Bern, Switzerland
Session Chair: Predrag Radivojac
TOPPP11 (HT) - Metabolic phenotypic analysis uncovers reduced proliferation associated with oxidative stress in progressed breast cancer
Room: Hall 14.2Date: Sunday, July 21Author(s): Livnat Jerby Arnon, Tel Aviv University, il
Lior Wolf, Tel Aviv University, Israel
Carsten Denkert, Charité Hospital, Germany
Gideon Y Stein, Beilinson Hospital, Rabin Medical Center, Israel
Mika Hilvo, VTT Technical Research Centre of Finland, Finland
Matej Oresic, VTT Technical Research Centre of Finland, Finland
Tamar Geiger, Tel Aviv University, Israel
Eytan Ruppin, Tel Aviv University, Israel
Session Chair: Russell Schwartz
TOPPP12 (HT) - Mapping the Strategies of Viruses Hijacking Human Host Cells – An Experimental and Computational Comparative Study
Room: Hall 4/5Date: Sunday, July 21Author(s): Jacques Colinge, CeMM, at
Session Chair: Olga Vitek
TOPPP13 (HT) - Ultrashort and progressive 4sU-tagging reveals key characteristics of RNA processing at nucleotide resolution
Room: Hall 7Date: Sunday, July 21Author(s): Caroline Friedel, Ludwig-Maximilians-Universität München, de
Lukas Windhager, Ludwig-Maximilians-Universität München, Germany
Thomas Bonfert, Ludwig-Maximilians-Universität München, Germany
Kaspar Burger, Helmholtz-Zentrum München, Germany
Zsolt Ruzsics, Ludwig-Maximilians-Universität München, Germany
Stefan Krebs, Ludwig-Maximilians-Universität München, Germany
Stefanie Kaufmann, Ludwig-Maximilians-Universität München, Germany
Georg Malterer, Ludwig-Maximilians-Universität München, Germany
Anne L’Hernault, University of Cambridge, United Kingdom
Markus Schilhabel, Christian-Albrechts-Universität Kiel, Germany
Stefan Schreiber, Christian-Albrechts-Universität Kiel, Germany
Philip Rosenstiel, Christian-Albrechts-Universität Kiel, Germany
Ralf Zimmer, Ludwig-Maximilians-Universität München, Germany
Dirk Eick, Helmholtz-Zentrum München, Germany
Lars Dölken, University of Cambridge, United Kingdom
Session Chair: Ivo Hofacker
TOPPP14 (HT) - Identifying differentially expressed transcripts from RNA-seq data with biological variation
Room: Hall 14.2Date: Sunday, July 21Author(s): Peter Glaus, University of Manchester, uk
Antti Honkela, University of Helsinki, Finland
Magnus Rattray, University of Manchester, United Kingdom
Session Chair: Cenk Sahinalp
TOPPP15 (PT) - Multi-task learning for Host-Pathogen protein interactions
Room: Hall 4/5Date: Sunday, July 21Author(s): Meghana Kshirsagar, Carnegie Mellon University, United States
Jaime Carbonell, Carnegie Mellon University, United States
Judith Klein-Seetharaman, University of Pittsburgh School of Medicine, United States
Session Chair: Olga Vitek
Motivation:
An important aspect of infectious disease research involves understanding the differences and commonalities in the infection mechanisms underlying various diseases. Systems biology based approaches study infectious diseases by analyzing the interactions between the host species and the pathogen organisms. This work aims to combine the knowledge from experimental studies of host-pathogen interactions in several diseases in order to build stronger predictive models. Our approach is based on a formalism from machine learning called 'multi-task learning', which considers the problem of building models across tasks that are related to each other. A 'task' in our scenario is the set of host-pathogen protein interactions involved in one disease. To integrate interactions from several tasks (i.e., diseases), our method exploits the similarity in the infection process across the diseases. In particular, we use the biological hypothesis that similar pathogens target the same critical biological processes in the host, in defining a common structure across the tasks.
Results:
Our current work on host-pathogen protein interaction prediction focuses on human as the host, and four bacterial species as pathogens. The multi-task learning technique we develop uses a task based regularization approach. We find that the resulting optimization problem is a difference of convex (DC) functions. To optimize, we implement a Convex-Concave procedure based algorithm. We compare our integrative approach to baseline methods that build models on a single host-pathogen protein interaction dataset. Our results show that our approach outperforms the baselines on the training data. We further analyse the protein interaction predictions generated by the models, and find some interesting insights.
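The convex-concave procedure mentioned in the Results can be illustrated on a one-dimensional toy problem. This is not the authors' objective: the sketch assumes the difference-of-convex function f(x) = g(x) - h(x) with g(x) = x^4 and h(x) = 2x^2, chosen because each step has a closed form. At every iteration h is replaced by its tangent at the current point and the resulting convex problem is solved exactly.

```python
import math


def f(x):
    """Toy difference-of-convex objective: g(x) - h(x) = x**4 - 2*x**2."""
    return x ** 4 - 2.0 * x ** 2


def ccp_minimize(x0, iterations=50):
    """Convex-concave procedure for the toy objective above.

    Each step minimizes the convex surrogate x**4 - h'(x_k) * x, where
    h'(x_k) = 4 * x_k. Setting the derivative to zero gives
    4 * x**3 = 4 * x_k, so the update is the real cube root of x_k.
    """
    x = x0
    for _ in range(iterations):
        slope = 4.0 * x            # gradient of the concave part's negation
        target = slope / 4.0       # solve 4 * x**3 = slope for x**3
        x = math.copysign(abs(target) ** (1.0 / 3.0), target)
    return x


start = 0.2
solution = ccp_minimize(start)
print(solution, f(solution))
```

Starting from a positive point the iterates approach the local minimum at x = 1, and from a negative point the one at x = -1; the objective never increases along the way, which is the property that makes the procedure attractive for DC problems.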
TOPPP16 (HT) - Gene expression anti-profiles as a basis for accurate universal cancer signatures
Room: Hall 7Date: Sunday, July 21Author(s): Hector Corrada Bravo, University of Maryland, us
Session Chair: Ivo Hofacker
TOPPP17 (PT) - GeneScissors: a comprehensive approach to detecting and correcting spurious transcriptome inference due to RNAseq reads misalignment
Room: Hall 14.2Date: Sunday, July 21Author(s):Zhaojun Zhang, UNC Chapel Hill, United States
Shunping Huang, UNC Chapel Hill, United States
Jack Wang, UNC Chapel Hill, United States
Xiang Zhang, Case Western Reserve University, United States
Fernando Pardo Manuel De Villena, UNC Chapel Hill, United States
Leonard McMillan, UNC Chapel Hill, United States
Wei Wang, UCLA, United States
Session Chair: Cenk Sahinalp
Motivation:
RNA-seq techniques provide an unparalleled means for exploring a transcriptome with deep coverage and base pair level resolution. Various analysis tools have been developed to align and assemble RNA-seq data, such as the widely used TopHat/Cufflinks pipeline. A common observation is that a sizable fraction of the fragments/reads align to multiple locations of the genome. These multiple alignments pose substantial challenges to existing RNA-seq analysis tools. Inappropriate treatment may result in reporting spurious expressed genes (false positives) and missing the real expressed genes (false negatives). Such errors impact subsequent analyses, such as differential expression analysis. In our study, we observe that about 3.5% of transcripts reported by the TopHat/Cufflinks pipeline correspond to annotated nonfunctional pseudogenes. Moreover, about 10.0% of reported transcripts are not annotated in the Ensembl database. These genes could be either novel expressed genes or false discoveries.
Results:
We examine the underlying genomic features that lead to multiple alignments and investigate how they generate systematic errors in RNA-seq analysis. We develop a general tool, GeneScissors, which exploits machine learning techniques guided by biological knowledge to detect and correct spurious transcriptome inference by existing RNA-seq analysis methods. In our simulated study, GeneScissors can predict spurious transcriptome calls due to misalignment with an accuracy close to 90%. It provides substantial improvement over the widely used TopHat/Cufflinks or MapSplice/Cufflinks pipelines in both precision and F-measure. On real data, GeneScissors reports 53.6% fewer pseudogenes and 0.97% more expressed and annotated transcripts, when compared with the TopHat/Cufflinks pipeline. In addition, among the 10.0% unannotated transcripts reported by TopHat/Cufflinks, GeneScissors finds that more than 16.3% of them are false positives.
Availability:
The software can be downloaded at http://csbio.unc.edu/genescissors/
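The precision and F-measure comparison in the Results can be made concrete with a small sketch. It assumes transcript calls and the ground truth are available as sets of identifiers, as in a simulation; the identifiers below are made up, and the filtering step merely stands in for whatever a correction tool would remove.

```python
def precision_recall_f(predicted, truth):
    """Precision, recall and F-measure (harmonic mean) for two sets of calls."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical simulated truth and pipeline output; "p1" is a spurious
# pseudogene call caused by misaligned reads.
truth = {"t1", "t2", "t3", "t4"}
raw_calls = {"t1", "t2", "t3", "p1"}
filtered_calls = raw_calls - {"p1"}

print(precision_recall_f(raw_calls, truth))
print(precision_recall_f(filtered_calls, truth))
```

Removing the spurious call raises precision without changing recall, so the F-measure improves as well.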
TOPPP18 (HT) - A Conserved Map of Genetic Interactions Induced by DNA Damage
CancelledRoom: Hall 4/5Date: Sunday, July 21Author(s): Rohith Srivas, University of California, San Diego, us
Aude Guenole, Leiden University Medical Center, Netherlands
Kees Vreeken, Leiden University Medical Center, Netherlands
Ze Zhong Wang, University of California, San Diego, United States
Shuyi Wang, University of California, San Francisco, United States
Nevan Krogan, University of California, San Francisco, United States
Trey Ideker, University of California, San Diego, United States
Haico van Attikum, Leiden University Medical Center, Netherlands
Session Chair: Olga Vitek
TOPPP19 (HT) - Newborn screening for SCID identifies patients with ataxia telangiectasia
Room: Hall 7Date: Sunday, July 21Author(s): Steven Brenner, University of California, Berkeley, us
Jacob Mallott, UCSF, United States
Antonia Kwan, UCSF, United States
Joseph Church, USC, United States
Diana Gonzalez, UCSF, United States
Fred Lorey, Public Health Institute, United States
Ling Tang, UCSF, United States
Rajgopal Srinivisan, Tata Consultancy Services, India
Sadhna Rana, Tata Consultancy Services, India
Uma Sunderam, Tata Consultancy Services, India
Session Chair: Ivo Hofacker
TOPPP20 (PT) - Poly(A) motif prediction using spectral latent features from human DNA sequences
Room: Hall 14.2Date: Sunday, July 21Author(s): Bo Xie, Georgia Institute of Technology, United States
Boris Yankovic, King Abdullah University of Science and Technology, Saudi Arabia
Vladimir Bajic, King Abdullah University of Science and Technology, Saudi Arabia
Le Song, Georgia Institute of Technology, United States
Xin Gao, King Abdullah University of Science and Technology, Saudi Arabia
Session Chair: Cenk Sahinalp
Motivation:
Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA.
Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge.
Results:
We propose a novel machine learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we employed hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine tune the classification performance.
We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14,740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false negative rate and false positive rate by 26%, 15% and 35%, respectively. Meanwhile, our method made about 30% fewer erroneous predictions relative to the other string kernels. Furthermore, our method can be used to visualize the importance of oligomers and positions in predicting poly(A) motifs, from which we can observe a number of characteristics in the surrounding regions of true and false motifs that have not been reported before.
Availability:
Website: http://sfb.kaust.edu.sa/Pages/Software.aspx
PP21 (HT) - Synthetic lethality between gene defects affecting a single non-essential molecular pathway with reversible steps
Room: Hall 4/5
Date: Sunday, July 21
Author(s): Inna Kuperstein, Institut Curie, France
Andrei Zinovyev, Institut Curie, France
Emmanuel Barillot, Institut Curie, France
Wolf-Dietrich Heyer, University of California, Davis, United States
Session Chair: Olga Vitek
PP22 (HT) - BioJS: An Open Source JavaScript Framework for Biological Data Visualization. Bioinformatics
Room: Hall 7
Date: Sunday, July 21
Author(s): Manuel Corpas, The Genome Analysis Centre, United Kingdom
John Gómez, EBI, United Kingdom
Leyla García, EBI, United Kingdom
Gustavo Salazar, University of Cape Town, South Africa
Jose Villaveces, Max Planck Institute, Germany
Swanand Gore, EBI, United Kingdom
Alexander García, Florida State University, United States
Maria Martín, EBI, United Kingdom
Guillaume Launay, Lyon1 University, France
Rafael Alcántara, EBI, United Kingdom
Noemi Del Toro Ayllón, EBI, United Kingdom
Marine Dumousseau, EBI, United Kingdom
Sandra Orchard, EBI, United Kingdom
Sameer Velankar, EBI, United Kingdom
Henning Hermjakob, EBI, United Kingdom
Chenggong Zong, UCLA, United States
Peipei Ping, UCLA, United States
Rafael Jiménez, EBI, United Kingdom
Session Chair: Ivo Hofacker
PP23 (PT) - Genome-wide identification and predictive modeling of tissue-specific alternative polyadenylation
Room: Hall 14.2
Date: Sunday, July 21
Author(s): Dina Hafez, Duke University, United States
Uwe Ohler, Max Delbrück Center for Molecular Medicine, Germany
Jun Zhu, National Institutes of Health
Ting Ni, National Institutes of Health
Sayan Mukherjee, Duke University, United States
Session Chair: Cenk Sahinalp
Motivation:
Pre-mRNA cleavage and polyadenylation is an essential step for 3' end maturation, and subsequent stability and degradation of mRNAs. This process is highly controlled by cis-regulatory elements surrounding the cleavage site (polyA site), which are frequently constrained by sequence content and position. More than 50% of human transcripts have multiple functional polyA sites, and the specific use of alternative polyA sites (APA) results in isoforms with varying 3'UTRs, thus affecting gene regulation. Elucidating the regulatory mechanisms underlying differential polyA preferences in multiple cell types has been hindered both by the lack of suitable data on the precise location of cleavage sites, as well as of appropriate tests for determining APAs with significant differences across multiple libraries.
Results:
We applied a tailored paired-end RNA-seq protocol to specifically probe the position of polyA sites in three adult cell types. We specified a linear effects regression model to identify tissue-specific biases indicating regulated alternative polyadenylation; the significance of differences between cell types was assessed by an appropriately designed permutation test. This combination allowed us to identify highly specific subsets of APA events in the individual cell types. Predictive models successfully classified constitutive polyA sites from a biologically relevant background (auROC = 99.6%), as well as tissue-specific regulated sets from each other. We found that the main cis-regulatory elements described for polyadenylation are a strong, and highly informative, hallmark for constitutive sites only. Tissue-specific regulated sites were found to contain other regulatory motifs, with the canonical PAS signal being nearly absent at brain-specific polyA sites. Together, our results contribute to the understanding of the diversity of post-transcriptional gene regulation.
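The abstract does not spell out how its permutation test was designed, so the following is only a generic sketch of the idea: shuffle the cell-type labels many times and ask how often the shuffled difference is at least as large as the observed one. The function name, the usage fractions and the choice of the difference in means as test statistic are all illustrative assumptions, not the authors' procedure.

```python
import random

def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values simulates "no difference between groups".
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / n_a - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical usage fractions of one polyA site in two cell types.
brain = [0.81, 0.78, 0.85, 0.80]
liver = [0.42, 0.47, 0.39, 0.45]
p_value = permutation_test(brain, liver)
```

With only four replicates per group the smallest achievable p-value is bounded by the number of distinct label assignments, which is one reason a real design would need to consider library counts carefully.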
PP24 (HT) - Efficient Computation of Gene Tree Probability based on Coalescent Theory under Incomplete Lineage Sorting
Room: Hall 4/5
Date: Monday, July 22
Author(s): Yufeng Wu, University of Connecticut, United States
Session Chair: Russell Schwartz
PP25 (PT) - Predicting protein contact map using evolutionary and physical constraints by integer programming
Room: Hall 7
Date: Monday, July 22
Author(s): Jinbo Xu, Toyota Technological Institute at Chicago, United States
Zhiyong Wang, Toyota Technological Institute at Chicago
Session Chair: Alex Bateman
Motivation. A protein contact map describes the pairwise spatial and functional relationship of residues in a protein and contains key information for protein 3D structure prediction. Although studied extensively, it remains very challenging to predict a contact map using only sequence information. Most existing methods predict the contact map matrix element-by-element, ignoring correlation among contacts and physical feasibility of the whole contact map. A couple of recent methods predict the contact map by using mutual information (MI) and enforcing a sparsity restraint (i.e., the contact matrix shall be very sparse), but these methods demand a very large number of sequence homologs and the resultant contact map may still be physically infeasible.
Results. This paper presents a novel method for contact map prediction, integrating both evolutionary and physical restraints by machine learning and integer linear programming (ILP). The evolutionary restraints are much more informative than MI and the physical restraints specify more concrete relationships among contacts than the sparsity restraint. As such, our method greatly reduces the solution space of the contact map matrix and thus significantly improves prediction accuracy. Experimental results show that our method outperforms currently popular methods no matter how many sequence homologs are available for the protein under consideration.
PP26 (HT) - Interpreting Personal Transcriptomes: Personalized Mechanism-Scale Profiling Predicts Survival in Oral, Prostate, Lung and Gastric Cancers
Room: Hall 14.2
Date: Monday, July 22
Author(s): Yves Lussier, The University of Illinois, United States
Xinan Yang, The University of Chicago, United States
Kelly Regan, Ohio State University, United States
Yong Huang, The University of Chicago, United States
Jianrong Li, The University of Illinois at Chicago, United States
Ezra Cohen, The University of Chicago, United States
Tanguy Zeiwert, The University of Chicago, United States
Session Chair: Serafim Batzoglou
PP27 (HT) - Deconvolution of targeted protein-protein interaction maps
Room: ICC Lounge 81
Date: Monday, July 22
Author(s): Alexey Stukalov, CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Austria
Session Chair: Hagit Shatkay
PP28 (PT) - IBD-Groupon: An Efficient Method for Detecting Group-wise Identity-by-Descent regions simultaneously in Multiple Individuals based on Pairwise IBD relationships
Room: Hall 4/5
Date: Monday, July 22
Author(s): Dan He, IBM T.J. Watson, United States
Session Chair: Russell Schwartz
Detecting Identity-by-Descent (IBD) is a very important problem in genetics. Most of the existing methods focus on detecting pairwise IBDs, which have relatively low power to detect short IBDs. Methods to detect IBDs among multiple individuals simultaneously, or group-wise IBDs, have better performance for short IBD detection. Group-wise IBDs can also be applied to a wide range of applications such as disease mapping and pedigree reconstruction. The existing group-wise IBD detection method is computationally inefficient and is only able to handle small data sets, such as 20 to 30 individuals with hundreds of SNPs. It also requires a prior specification of the number of IBD groups, which may not be realistic in many cases. The method can only handle a small number of IBD groups, such as two or three, due to scalability issues. Moreover, it does not take LD into consideration. In this work, we developed a very efficient method, IBD-Groupon, which detects group-wise IBDs based on pairwise IBD relationships and is able to address all the drawbacks mentioned above. To our knowledge, our method is the first group-wise IBD detection method that is scalable to very large data sets, for example, hundreds of individuals with thousands of SNPs, while remaining powerful enough to detect short IBDs. Our method does not need the number of IBD groups to be specified, as it is detected automatically. Our method also takes LD into consideration, as it is based on pairwise IBDs where LD can be easily incorporated.
PP29 (PT) - ThreaDom: Extracting Protein Domain Boundary Information from Multiple Threading Alignments
Room: Hall 7
Date: Monday, July 22
Author(s): Zhidong Xue, University of Michigan, United States
Dong Xu, University of Michigan
Yan Wang, University of Michigan
Yang Zhang, University of Michigan
Session Chair: Alex Bateman
Motivation: Protein domains are subunits that can fold and function independently. Identification of domain boundary locations is often the first step in protein folding and function annotations. Most of the current methods deduce domain boundaries by sequence-based analysis where accuracy is low. There is no efficient method for predicting discontinuous domains that consist of segments from separated sequences. Since template-based methods are most efficient for protein 3D structure modeling, combining multiple threading alignment information should increase the accuracy and reliability of computational domain predictions.
Result: We develop a new domain predictor, ThreaDom, which deduces protein domain boundary locations based on multiple threading alignments. The core of the method development is the derivation of a domain conservation score that combines composite information from template domain structures and terminal and internal alignment gaps. Tested on 630 non-redundant sequences, without using homologous templates, ThreaDom generates correct single- and multi-domain classifications in 81% of cases, where 78% have the domain linker location assigned within 20 residues. In a second test on 486 proteins with discontinuous domains, ThreaDom achieves an average precision of 84% and a recall of 65% in domain boundary prediction. Finally, ThreaDom was examined on 56 targets from CASP8 and had a domain overlap rate of 73%, 87% and 85% with the target structure for Free Modeling, Hard multiple-domain and discontinuous domain proteins, respectively, which are significantly higher than most of the domain predictors in the CASP8 experiment.
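To make the precision and recall figures above concrete, here is a minimal sketch of how boundary predictions could be scored with a residue tolerance. The matching rule is an assumption: the abstract only says that linker locations were assigned "within 20 residues", so the 20-residue window is borrowed from that statement, and the function name and example positions are hypothetical.

```python
def boundary_precision_recall(predicted, actual, tolerance=20):
    """Score predicted domain boundaries against annotated ones.

    A predicted boundary counts as correct when it lies within `tolerance`
    residues of some annotated boundary; an annotated boundary counts as
    recovered when some prediction lies within `tolerance` residues of it.
    """
    correct = sum(
        any(abs(p - a) <= tolerance for a in actual) for p in predicted
    )
    recovered = sum(
        any(abs(p - a) <= tolerance for p in predicted) for a in actual
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = recovered / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical residue positions: two predictions, three annotated boundaries.
precision, recall = boundary_precision_recall([118, 300], [125, 240, 410])
```

In this toy case only the prediction at residue 118 falls inside a window, so half of the predictions are correct and one of the three annotated boundaries is recovered.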
PP30 (HT) - Compressive Genomics
Room: Hall 14.2
Date: Monday, July 22
Author(s): Michael Baym, Harvard Medical School, United States
Po-Ru Loh, MIT, United States
Bonnie Berger, MIT, United States
Session Chair: Serafim Batzoglou
PP31 (PT) - Minimum curvilinearity to enhance topological prediction of protein interactions by network embedding
Room: ICC Lounge 81
Date: Monday, July 22
Author(s): Carlo Vittorio Cannistraci, King Abdullah University of Science and Technology, Saudi Arabia
Gregorio Alanis-Lobato, King Abdullah University of Science and Technology
Timothy Ravasi, King Abdullah University of Science and Technology
Session Chair: Hagit Shatkay
Motivation: Most functions within the cell emerge thanks to protein-protein-interactions (PPIs), yet their experimental determination is both expensive and time-consuming. PPI-networks present significant levels of noise and incompleteness. Prediction of interactions using solely PPI-network-topology (topological prediction) is difficult but essential when biological prior-knowledge is absent or unreliable.
Methods: Network-embedding emphasizes relations between network proteins embedded in a low-dimensional space, where protein-pairs closer to each other represent potential candidate interactions to predict. Network denoising, which boosts the prediction performance, is here achieved by minimum-curvilinear-embedding (MCE), combined with the shortest-path (SP) adopted in the reduced space for assigning likelihood scores to candidate interactions. Furthermore, we introduce: (i) a new valid variation of MCE named non-centred-MCE (ncMCE); (ii) two automatic strategies for the selection of the appropriate embedding-dimension; (iii) two new randomised procedures for prediction evaluation.
Results: We compared our method against several unsupervised and supervised embedding approaches, and node-neighbourhood techniques. Despite its computational simplicity, ncMCE-SP was the overall leader, outperforming the current methods for topological link prediction.
Conclusion: Minimum curvilinearity is a valuable nonlinear framework, which we successfully applied in embedding of protein networks for unsupervised prediction of novel PPIs. The rationale is that biological and evolutionary prior-information is imprinted in the nonlinear patterns hidden behind the protein network topology, and can be exploited for prediction of new protein links. The predicted PPIs represent good candidates to test in high-throughput experiments or to exploit in systems biology tools such as those used for network-based inference and prediction of disease-related functional modules.
PP32 (PT) - Inference of historical migration rates via haplotype sharing
Room: Hall 4/5
Date: Monday, July 22
Author(s): Pier Francesco Palamara, Columbia University, United States
Itsik Pe'er, Columbia University, United States
Session Chair: Russell Schwartz
Pairs of individuals from a study cohort will often share long-range haplotypes identical-by-descent (IBD). Such haplotypes are transmitted from common ancestors that lived tens to hundreds of generations in the past, and can now be efficiently detected in high-resolution genomic datasets, providing a novel source of information in several domains of genetic analysis. Recently, haplotype sharing distributions were studied in the context of demographic inference, and were used to reconstruct recent demographic events in several populations. Here we extend this framework to handle demographic models that contain multiple demes interacting through migration. We extensively test our formalism in several demographic scenarios, and provide a freely available software tool for demographic inference.
PP33 (PT) - Protein Threading Using Context-Specific Alignment Potential
Room: Hall 7
Date: Monday, July 22
Author(s): Jianzhu Ma, Toyota Technological Institute at Chicago, United States
Sheng Wang, Toyota Technological Institute at Chicago, United States
Jinbo Xu, Toyota Technological Institute at Chicago, United States
Feng Zhao, Toyota Technological Institute at Chicago, United States
Session Chair: Alex Bateman
Motivation: Template-based modeling (TBM) including homology modeling and protein threading is the most reliable method for protein 3D structure prediction. However, alignment errors and template selection are still the main bottleneck for current TBM methods, especially when proteins under consideration are distantly related.
Results: We present a novel context-specific alignment potential for protein threading including alignment and template selection. Our alignment potential measures the log odds ratio of one alignment being generated from two related proteins to being generated from two unrelated proteins, by integrating both local and global context-specific information. The local alignment potential quantifies how well one sequence residue can be aligned to one template residue based upon context-specific information of the residues. The global alignment potential quantifies how well two sequence residues can be placed into two template positions at a given distance, again based upon context-specific information. By accounting for correlation among a variety of protein features and making use of context-specific information, our alignment potential is much more sensitive than the widely used context-independent or profile-based scoring function. Experimental results confirm that our method generates significantly better alignments and threading results than the best profile-based methods on several very large benchmarks. Our method works particularly well for distantly-related proteins or proteins with sparse sequence profiles due to the effective integration of context-specific, structure and global information.
PP34 (PT) - Predicting Drug-Target Interactions Using Restricted Boltzmann Machines
Room: Hall 14.2
Date: Monday, July 22
Author(s): Yuhao Wang, Tsinghua University, China
Jianyang Zeng, Tsinghua University, China
Session Chair: Serafim Batzoglou
Motivation:
In silico prediction of drug-target interactions plays an important role towards identifying and developing new uses of existing or abandoned drugs. Network-based approaches have recently become a popular tool for discovering new drug-target interactions. Unfortunately, most of these network-based approaches can only predict binary interactions between drugs and targets, and information about different types of interactions has not been well exploited for drug-target interaction prediction in previous studies. On the other hand, incorporating additional information about drug-target relationships or drug modes of action can improve prediction of drug-target interactions. Furthermore, the predicted types of drug-target interactions can broaden our understanding about the molecular basis of drug action.
Results:
We propose a first machine learning approach to integrate multiple types of drug-target interactions and predict unknown drug-target relationships or drug modes of action. We cast the new drug-target interaction prediction problem into a two-layer graphical model, called restricted Boltzmann machine (RBM), and apply a practical learning algorithm to train our model and make predictions. Tests on two public databases show that our RBM model can effectively capture the latent features of a drug-target interaction network, and achieve excellent performance on predicting different types of drug-target interactions, with the area under precision-recall curve (AUPR) up to 89.6. In addition, we demonstrate that integrating multiple types of drug-target interactions can significantly outperform other predictions either by simply mixing multiple types of interactions without distinction or using only a single interaction type. Further tests show that our approach can infer a high fraction of novel drug-target interactions that have been validated by known experiments in the literature or other databases. These results indicate that our approach can have highly practical relevance to drug-target interaction prediction and drug repositioning, and hence advance the drug discovery process.
Availability: Software and datasets are available upon request.
PP35 (PT) - Supervised de novo reconstruction of metabolic pathways from metabolome-scale compound sets
Room: ICC Lounge 81
Date: Monday, July 22
Author(s): Masaaki Kotera, Kyoto University, Japan
Yasuo Tabei,
Yoshihiro Yamanishi, Kyushu University, Japan
Toshiaki Tokimatsu, Kyoto University, Japan
Susumu Goto, Kyoto University, Japan
Session Chair: Hagit Shatkay
Motivation: The metabolic pathway is an important biochemical reaction network involving enzymatic reactions among chemical compounds. However, it is assumed that a large number of metabolic pathways remain unknown, and many reactions are still missing even in known pathways. Therefore, the most important challenge in metabolomics is the automated de novo reconstruction of metabolic pathways, which includes the elucidation of previously unknown reactions to bridge the metabolic gaps.
Results: In this paper we develop a novel method to reconstruct metabolic pathways from a large compound set in the reaction-filling framework. We define feature vectors representing the chemical transformation patterns of compound-compound pairs in enzymatic reactions using chemical fingerprints. We apply a sparsity-induced classifier to learn what we refer to as "enzymatic-reaction likeness", i.e., whether or not compound pairs are possibly converted to each other by enzymatic reactions. The originality of our method lies in the search for potential reactions among many compounds at a time, in the extraction of reaction-related chemical transformation patterns, and in the large-scale applicability owing to the computational efficiency. In the results, we demonstrate the usefulness of our proposed method on the de novo reconstruction of 134 metabolic pathways in KEGG. Our comprehensively predicted reaction networks of 15,698 compounds enable us to suggest many potential pathways and to increase research productivity in metabolomics.
PP36 (PT) - Efficient network-guided multi-locus association mapping with graph cuts
Room: Hall 4/5
Date: Monday, July 22
Author(s): Chloé-Agathe Azencott, Max-Planck-Institutes Tübingen, Germany
Dominik Grimm, Max-Planck-Institutes Tübingen, Germany
Mahito Sugiyama, Max-Planck-Institutes Tübingen, Germany
Yoshinobu Kawahara, Osaka University, Japan
Karsten Borgwardt, Max-Planck-Institutes Tübingen, Germany
Session Chair: Russell Schwartz
As an increasing number of genome-wide association studies reveal the limitations of the attempt to explain phenotypic heritability by single genetic loci, there is a recent focus on associating complex phenotypes with sets of genetic loci. While several methods for multi-locus mapping have been proposed, it is often unclear how to relate the detected loci to the growing knowledge about gene pathways and networks. The few methods that take biological pathways or networks into account are either restricted to investigating a limited number of predetermined sets of loci, or do not scale to genome-wide settings.
We present SConES, a new efficient method to discover sets of genetic loci that are maximally associated with a phenotype, while being connected in an underlying network. Our approach is based on a minimum cut reformulation of the problem of selecting features under sparsity and connectivity constraints, which can be solved exactly and rapidly.
SConES outperforms state-of-the-art competitors in terms of runtime, scales to hundreds of thousands of genetic loci and exhibits higher power in detecting causal SNPs in simulation studies than other methods. On flowering time phenotypes and genotypes from Arabidopsis thaliana, SConES detects loci that enable accurate phenotype prediction and that are supported by the literature.
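The trade-off described above (association with the phenotype, balanced against sparsity and connectivity in a network) can be illustrated on a toy example. The sketch below is not SConES itself: the exact form of the objective, the parameter names lam and eta, and the scores are assumptions made for illustration, and the subsets are enumerated exhaustively where the abstract describes a minimum cut reformulation that is solved exactly and rapidly.

```python
from itertools import product

def best_subset(scores, edges, lam, eta):
    """Exhaustively maximise association - lam * cut_edges - eta * n_selected."""
    n = len(scores)
    best_value, best_mask = float("-inf"), None
    for mask in product((0, 1), repeat=n):
        association = sum(s for s, m in zip(scores, mask) if m)
        # An edge is "cut" when exactly one of its endpoints is selected,
        # so selecting connected loci together is cheaper.
        cut = sum(1 for u, v in edges if mask[u] != mask[v])
        value = association - lam * cut - eta * sum(mask)
        if value > best_value:
            best_value, best_mask = value, mask
    return [i for i, m in enumerate(best_mask) if m]

# Toy network: a path 0-1-2-3-4 in which loci 1 and 3 carry most of the signal.
scores = [0.1, 2.0, 0.4, 2.0, 0.1]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
selected = best_subset(scores, edges, lam=0.3, eta=0.6)
```

Here the weakly associated locus 2 is selected because it connects loci 1 and 3, while the end loci 0 and 4 are dropped by the sparsity penalty. Enumeration costs 2^n evaluations, which is why it only serves as an illustration.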
PP37 (HT) - The role of proteins encoded by chimeric RNAs in eukaryotes
Room: Hall 7
Date: Monday, July 22
Author(s): Milana Frenkel-Morgenstern, Spanish National Cancer Research Centre (CNIO), Spain
Alfonso Valencia, Spanish National Cancer Research Centre (CNIO), Spain
Session Chair: Alex Bateman
PP38 (HT) - Navigating chemical and biological space – in the search of novel pharmaceuticals
Room: Hall 14.2
Date: Monday, July 22
Author(s): Paula Petrone, Hoffmann-La Roche, Switzerland
Ben Simms, Novartis NIBR, United States
Anne Mai Wassermann, Novartis NIBR, United States
Eugen Lounkine, Novartis NIBR, United States
Peter Kutchukian, Novartis NIBR, United States
Paul Selzer, Novartis NIBR, United States
Florian Nigsch, Novartis NIBR, United States
Jeremy Jenkins, Novartis NIBR, United States
Allen Cornett, Novartis NIBR, United States
Zhan Deng, Novartis NIBR, United States
John W Davies, Novartis NIBR, United States
Session Chair: Serafim Batzoglou
PP39 (PT) - A framework for scalable parameter estimation of gene circuit models using structural information
Room: ICC Lounge 81
Date: Monday, July 22
Author(s): Xin Gao, King Abdullah University of Science and Technology, Saudi Arabia
Ming Fan, King Abdullah University of Science and Technology, Saudi Arabia
Suojin Wang, Texas A&M University, United States
Hiroyuki Kuwahara, King Abdullah University of Science and Technology, Saudi Arabia
Session Chair: Hagit Shatkay
Motivation:
Systematic and scalable parameter estimation is key to constructing complex gene regulatory models and ultimately to facilitating an integrative systems biology approach for quantitatively understanding the molecular mechanisms underpinning gene regulation.
Results:
Here, we report a novel framework for efficient and scalable parameter estimation that focuses specifically on modeling of gene circuits.
Exploiting the structure commonly found in gene circuit models, this framework decomposes a system of coupled rate equations into individual ones and efficiently integrates them separately to reconstruct the mean time evolution of the gene products. The accuracy of the parameters is refined by iteratively increasing the accuracy of numerical integration using the model structure. As a case study, we applied our framework to four gene circuit models with complex dynamics based on three synthetic data sets and one time-series microarray data set. We compared our framework to three state-of-the-art parameter estimation methods and found that our approach consistently generated higher quality parameter solutions efficiently.
While many general-purpose parameter estimation methods have been applied for modeling of gene circuits, our results suggest that the use of more tailored approaches to employ domain specific information may be a key to reverse-engineering of complex biological systems.
Availability:
Website: http://sfb.kaust.edu.sa/Pages/Software.aspx
PP40 (PT) - Identifying proteins controlling key disease signaling pathways
Room: Hall 4/5
Date: Monday, July 22
Author(s): Anthony Gitter, Carnegie Mellon University, United States
Ziv Bar-Joseph, Carnegie Mellon University
Session Chair: Reinhard Schneider
Several types of studies, including genome-wide association studies and RNA interference screens, strive to link genes to diseases. Although these approaches have had some success, genetic variants are often only present in a small subset of the population and screens are noisy with low overlap between experiments in different labs. Neither provides a mechanistic model explaining how identified genes impact the disease of interest or the dynamics of the pathways those genes regulate. Such mechanistic models could be used to accurately predict downstream effects of knocking down pathway members and allow comprehensive exploration of the effects of targeting pairs or higher-order combinations of genes.
We developed methods to model the activation of signaling and dynamic regulatory networks involved in disease progression. Our model, SDREM, integrates static and time series data to link proteins and the pathways they regulate in these networks. SDREM utilizes prior information about proteins' likelihood of involvement in a disease (e.g. from screens) to improve the quality of the predicted signaling pathways. We used our algorithms to study the human immune response to H1N1 influenza infection. The resulting networks correctly identified many of the known pathways and transcriptional regulators of this disease. Furthermore, they accurately predict RNA interference effects and can be used to infer genetic interactions, greatly improving over other methods suggested for this task. Applying our method to the more pathogenic H5N1 influenza allowed us to identify several strain-specific targets of this infection.
PP41 (PT) - Automated Cellular Annotation for High Resolution Images of Adult C. elegans
Room: Hall 7
Date: Monday, July 22
Author(s): Sarah Aerni, Stanford University, United States
Xiao Liu, Stanford University, United States
Chuong Do, 23andMe, Inc., United States
Samuel Gross, Stanford University, United States
Andy Nguyen, Stanford University School of Medicine, United States
Stephen Guo, Stanford University, United States
Fuhui Long, Howard Hughes Medical Institute, United States
Hanchuan Peng, Allen Institute for Brain Science, United States
Stuart Kim, Stanford University School of Medicine, United States
Serafim Batzoglou, Stanford University, United States
Session Chair: Stefan Kramer
Motivation:
Advances in high-resolution microscopy have recently made possible the analysis of gene expression at the level of individual cells. The fixed lineage of cells in the adult worm C. elegans makes this organism an ideal model for studying complex biological processes like development and aging. However, annotating individual cells in images of adult C. elegans typically requires expertise and significant manual effort. Automation of this task is therefore critical to enabling high-resolution studies of a large number of genes.
Results:
In this paper, we describe an automated method for annotating a subset of 154 cells (including various muscle, intestinal, and hypodermal cells) in high-resolution images of adult C. elegans. We formulate the task of labeling cells within an image as a combinatorial optimization problem, where the goal is to minimize a scoring function that compares cells in a test input image with cells from a training atlas of manually annotated worms according to various spatial and morphological characteristics. We propose an approach for solving this problem based on reduction to minimum-cost maximum flow and apply a cross-entropy based learning algorithm to tune the weights of our scoring function. We achieve 84% median accuracy across a set of 154 cell labels in this highly variable system. These results demonstrate the feasibility of the automatic annotation of microscopy-based images in adult C. elegans.
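The labeling task described above can be pictured as an assignment problem: each detected cell receives a distinct atlas label so that the total score is as small as possible. The sketch below uses exhaustive search over label orderings as a stand-in for the minimum-cost maximum flow reduction, so it only works for a handful of cells. The squared distance between positions stands in for the scoring function, and the label names and coordinates are hypothetical.

```python
from itertools import permutations

def label_cells(cells, atlas):
    """Give each detected cell a distinct atlas label, minimising total cost."""
    def cost(cell, label):
        # Squared Euclidean distance to the atlas position of this label.
        return sum((c - a) ** 2 for c, a in zip(cell, atlas[label]))

    best = min(
        permutations(list(atlas), len(cells)),
        key=lambda perm: sum(cost(cell, lab) for cell, lab in zip(cells, perm)),
    )
    return list(best)

# Hypothetical atlas positions and detected cell coordinates.
atlas = {
    "muscle_1": (0.0, 0.0),
    "intestinal_1": (5.0, 1.0),
    "hypodermal_1": (9.0, 0.0),
}
cells = [(8.6, 0.3), (0.4, -0.2), (5.2, 1.1)]
assignment = label_cells(cells, atlas)
```

A flow-based or Hungarian-style solver would return the same optimum for this cost while scaling to the 154 labels mentioned in the abstract.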
PP42 (PT) - Haplotype assembly in polyploid genomes and identical by descent shared tracts
Room: Hall 14.2
Date: Monday, July 22
Author(s): Derek Aguiar, Brown University, United States
Sorin Istrail, Brown University, United States
Session Chair: Sean O'Donoghue
Motivation: Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms processing these high-throughput sequencing data must scale favorably in terms of both accuracy and computational efficiency. Furthermore, current models and methodologies for haplotype assembly (1) do not consider individuals sharing haplotypes jointly which reduces the size and accuracy of assembled haplotypes and (2) are unable to model genomes having more than two sets of homologous chromosomes (polyploidy). Particularly, polyploid organisms are becoming the target of many research groups interested in studying the genomics of disease, phylogenetics, botany, and evolution but there is an absence of theory and methods for polyploid haplotype reconstruction.
Results: In this work, we present a number of results, extensions, and generalizations of Compass graphs and our HapCompass framework (Aguiar et al. 2012). We prove the theoretical complexity of two haplotype assembly optimizations, thereby motivating the use of heuristics. We present graph theory-based algorithms for the problem of haplotype assembly from sequencing data using our previously developed HapCompass framework for (1) novel implementations of haplotype assembly optimizations (minimum error correction), (2) assembly of a pair of individuals sharing a tract identical by descent, and (3) assembly of polyploid genomes. We demonstrate the accuracy of each method on the 1000 Genomes Project, Pacific Biosciences, and simulated sequence data.
HapCompass is available for download at http://www.brown.edu/Research/Istrail_Lab/
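For readers unfamiliar with the minimum error correction (MEC) optimization named above, the sketch below computes the MEC cost of a set of reads against one candidate pair of haplotypes: each read is charged the number of allele calls that would have to be flipped for it to agree with the closer haplotype. This only scores a given diploid solution and is not the HapCompass algorithm. The read encoding and the example data are assumptions for illustration.

```python
def mec_score(reads, hap_a, hap_b):
    """Minimum error correction cost of reads against one haplotype pair.

    Each read is a dict mapping SNP index to the observed allele. A read is
    charged the number of alleles that disagree with the closer haplotype.
    """
    total = 0
    for read in reads:
        mismatches_a = sum(allele != hap_a[i] for i, allele in read.items())
        mismatches_b = sum(allele != hap_b[i] for i, allele in read.items())
        total += min(mismatches_a, mismatches_b)
    return total

# Hypothetical complementary haplotypes over four SNPs, and three reads.
hap_a, hap_b = "0110", "1001"
reads = [
    {0: "0", 1: "1"},          # agrees with hap_a
    {1: "0", 2: "0", 3: "1"},  # agrees with hap_b
    {2: "1", 3: "1"},          # one error against either haplotype
]
score = mec_score(reads, hap_a, hap_b)
```

Haplotype assembly under MEC searches for the haplotype pair that minimizes this score over all reads.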
PP43 (HT) - Emerging methods in protein co‐evolution
Room: ICC Lounge 81
Date: Monday, July 22
Author(s): David Juan, Spanish National Cancer Research Centre, Spain
Florencio Pazos, Spanish National Centre for Biotechnology, Spain
Alfonso Valencia, Spanish National Cancer Research Centre, Spain
Session Chair: Burkhard Rost
PP44 (PT) - Compressive genomics for protein databases
Room: Hall 4/5
Date: Monday, July 22
Author(s): Noah Daniels, Tufts University, United States
Andrew Gallant, Tufts University, United States
Jian Peng, Massachusetts Institute of Technology, United States
Lenore Cowen, Tufts University, United States
Michael Baym, Harvard Medical School
Bonnie Berger, Massachusetts Institute of Technology, United States
Session Chair: Reinhard Schneider
Motivation: The exponential growth of protein sequence databases has increasingly made the fundamental question of searching for homologs a computational bottleneck. The amount of unique data, however, is not growing nearly as fast; we can exploit this fact to greatly accelerate homology search. Acceleration of programs in the popular PSI/DELTA-BLAST family of tools will not only speed up homology search directly, but also the huge collection of other current programs that primarily interact with large protein databases via precisely these tools.
Results: We introduce a suite of homology search tools, powered by compressively-accelerated protein BLAST (CaBLASTP), which are significantly faster than and comparably accurate to all known state-of-the-art tools including HHblits, DELTA-BLAST, and PSI-BLAST. Further, our tools are implemented in a manner that allows direct substitution into existing analysis pipelines. The key idea is that we introduce a local similarity-based compression scheme that allows us to operate directly on the compressed data. Importantly, CaBLASTP’s runtime scales almost linearly in the amount of unique data, as opposed to current BLASTP variants which scale linearly in the size of the full protein database being searched. Our compressive algorithms will speed up many tasks such as protein structure prediction and orthology mapping which rely heavily on homology search.
Availability: CaBLASTP is available under the GNU Public License at http://cablastp.csail.mit.edu/
PP45 (PT) - FuncISH: Learning a functional representation of neural ISH images
Room: Hall 7
Date: Monday, July 22
Author(s): Noa Liscovitch, Bar Ilan University, Israel
Uri Shalit, Hebrew University of Jerusalem, Israel
Gal Chechik, Stanford University, United States
Session Chair: Stefan Kramer
High spatial resolution imaging datasets of mammalian brains have recently become available in unprecedented amounts. Images now reveal highly complex patterns of gene expression varying on multiple scales. The challenge in analyzing these images is both in extracting the patterns that are most relevant functionally, and in providing a meaningful representation that allows neuroscientists to interpret the extracted patterns.
Here we present FuncISH – a method to learn functional representations of neural in situ hybridization (ISH) images. We represent images using a histogram of local descriptors (SIFT) in several scales, and use this representation to learn detectors of functional (GO) categories for every image. As a result, each image is represented as a point in a low dimensional space whose axes correspond to meaningful functional annotations. The resulting representations define similarities between ISH images that can be easily explained by functional categories.
We applied our method to the genomic set of mouse neural ISH images available at the Allen Brain Atlas, finding that the majority of GO biological processes can be inferred from spatial expression patterns with high accuracy. Using functional representations, we predict several gene interaction properties such as protein-protein interactions and cell type specificity more accurately than competing methods based on global correlations. We used FuncISH to identify similar expression patterns of GABAergic neuronal markers that were not previously identified, and to infer new gene function based on image-image similarities.
PP46 (PT) - Using State Machines to Model the IonTorrent Sequencing Process and Improve Read Error-Rates
Room: Hall 14.2
Date: Monday, July 22
Author(s): David Golan, Tel Aviv University, Israel
Paul Medvedev, The Pennsylvania State University, United States
Session Chair: Sean O'Donoghue
Motivation:
The importance of fast and affordable DNA sequencing methods for current day life sciences, medicine and biotechnology is hard to overstate. A major player is IonTorrent, a pyrosequencing-like technology which produces flowgrams – sequences of incorporation values – which are converted into nucleotide sequences by a base-calling algorithm. Because of its exploitation of ubiquitous semiconductor technology and innovation in chemistry, IonTorrent has been gaining popularity since its debut in 2011. Despite the advantages, however, IonTorrent read accuracy remains a significant concern.
Results:
We present FlowgramFixer, a new algorithm for converting flowgrams into reads. Our key observation is that the incorporation signals of neighboring flows, even after normalization and phase correction, carry considerable mutual information and are important in making the correct base-call. We therefore propose that base-calling of flowgrams should be done on a read-wide level, rather than one flow at a time. We show that this can be done in linear time by combining a state machine with a Viterbi algorithm to find the nucleotide sequence that maximizes the likelihood of the observed flowgram. FlowgramFixer is applicable to any flowgram-based sequencing platform. We demonstrate FlowgramFixer’s superior performance on Ion Torrent E. coli data, with a 4.8% improvement in the number of high-quality mapped reads and a 7.1% improvement in the number of uniquely mappable reads.
Availability:
Binaries and source code of FlowgramFixer are freely available at:
http://www.cs.tau.ac.il/~davidgo5/flowgramfixer.html
PP47 (HT) - Short Toxin-like Proteins Abound in Cnidaria Genomes
Room: ICC Lounge 81
Date: Monday, July 22
Author(s): Michal Linial, The Hebrew University of Jerusalem, Israel
Isaak Tirosh, The Hebrew University of Jerusalem, Israel
Manor Askenazi, The Hebrew University of Jerusalem, Israel
Itai Linial, The Hebrew University of Jerusalem, Israel
Session Chair: Burkhard Rost
PP48 (HT) - Predicting the molecular complexity of sequencing libraries
Room: Hall 4/5
Date: Monday, July 22
Author(s): Andrew Smith, University of Southern California, United States
Session Chair: Reinhard Schneider
PP49 (PT) - Automated annotation of gene expression image sequences via nonparametric factor analysis and conditional random fields
Room: Hall 7
Date: Monday, July 22
Author(s): Iulian Pruteanu-Malinici, Duke University, United States
William Majoros, Duke University, United States
Uwe Ohler, Duke University, United States
Session Chair: Stefan Kramer
Motivation: Computational approaches for the annotation of phenotypes from image data have shown promising results across many applications, and provide rich and valuable information for studying gene function and interactions. While data are often available both at high spatial resolution and across multiple time points, phenotypes are frequently annotated independently, for individual time points only. In particular, for the analysis of developmental gene expression patterns, it is biologically sensible when images across multiple time points are jointly accounted for, such that spatial and temporal dependencies are captured simultaneously.
Methods: We describe a discriminative, undirected graphical model to label gene-expression time-series image data, with an efficient training and decoding method based on the junction tree algorithm. The approach is based on an effective feature selection technique, consisting of a nonparametric sparse Bayesian factor analysis model. The result is a flexible framework, which can handle large-scale data with noisy, incomplete samples, i.e. it can tolerate data missing from individual time points.
Results: Using the annotation of gene expression patterns across stages of Drosophila embryonic development as an example, we demonstrate that our method achieves superior accuracy, gained by jointly annotating phenotype sequences, when compared to previous models that annotate each stage in isolation. The experimental results on missing data indicate that our joint learning method successfully annotates genes for which no expression data are available for one or more stages.
PP50 (HT) - A complete mass-spectrometric map of the yeast proteome applied to quantitative trait analysis
Room: Hall 14.2
Date: Monday, July 22
Author(s): Mathieu Clément-Ziza, Biotec, Technische Universitaet Dresden, Germany
Paola Picotti, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Henry Lam, The Hong Kong University of Science and Technology, Hong Kong
David Campbell, Institute for Systems Biology, United States
Alexander Schmidt, University of Basel, Switzerland
Eric Deutsch, Institute for Systems Biology, United States
Hannes Röst, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Zhi Sun, Institute for Systems Biology, Seattle, United States
Olivier Rinner, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Lukas Reiter, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Qin Shen, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Jacob Michaelson, Technische Universitaet Dresden, Germany
Andreas Frei, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Simon Alberti, Max Planck Institute of Molecular Cell Biology and Genetics, Germany
Ulrike Kusebauch, Institute for Systems Biology, Seattle, United States
Bernd Wollscheid, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Robert Moritz, Institute for Systems Biology, Seattle, United States
Andreas Beyer, BIOTEC, Technische Universitaet Dresden, Germany
Ruedi Aebersold, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Session Chair: Sean O'Donoghue
PP51 (PT) - Predicting protein interactions via parsimonious network history inference
Room: ICC Lounge 81
Date: Monday, July 22
Author(s): Robert Patro, Carnegie Mellon University, United States
Carl Kingsford, Carnegie Mellon University, United States
Session Chair: Burkhard Rost
Motivation: Reconstruction of the network-level evolutionary history of protein-protein interactions provides a principled way to relate interactions in several present-day networks. Here, we present a general framework for inferring such histories and demonstrate how it can be used to determine what interactions existed in the ancestral networks, which present-day interactions we should expect to exist based on evolutionary evidence, and what information extant networks contain about the order of ancestral protein duplications.
Results: Our framework characterizes the space of likely parsimonious network histories. It results in a structure that can be used to find probabilities for a number of events associated with the histories. The framework is based on a directed hypergraph formulation of dynamic programming that we extend to enumerate many optimal and near-optimal solutions. The algorithm is applied to reconstructing ancestral interactions among bZIP transcription factors, imputing missing present-day interactions among the bZIPs and among proteins from 5 herpes viruses, and determining relative protein duplication order in the bZIP family. Our approach more accurately reconstructs ancestral interactions compared with existing approaches. In cross-validation tests, we find that our approach ranks the majority of the left-out present-day interactions among the top 2% and 17% of possible edges for the bZIP and herpes networks, respectively, making it a competitive approach for edge imputation. It also estimates, from interaction data alone, relative bZIP protein duplication orders that are significantly correlated with sequence-based estimates.
Availability: The algorithm is implemented in C++, is open source, and available at http://www.cs.cmu.edu/~ckingsf/software/parana2.
Contact: robp@cs.cmu.edu and carlk@cs.cmu.edu
PP52 (HT) - Interpreting genomic data via entropic dissection
Room: Hall 4/5
Date: Monday, July 22
Author(s): Rajeev Azad, University of North Texas, United States
Session Chair: Reinhard Schneider
PP53 (PT) - A High-Throughput Framework to Detect Synapses in Electron Microscopy Images
Room: Hall 7
Date: Monday, July 22
Author(s): Saket Navlakha, Carnegie Mellon University, United States
Joseph Suhan, Carnegie Mellon University
Alison Barth, Carnegie Mellon University
Ziv Bar-Joseph, Carnegie Mellon University
Session Chair: Stefan Kramer
Motivation: Synaptic connections underlie learning and memory in the brain and are dynamically formed and eliminated during development and in response to stimuli. Quantifying changes in overall density and strength of synapses is an important pre-requisite for studying connectivity and plasticity in these cases or in diseased conditions. Unfortunately, most techniques to detect such changes are either low-throughput (e.g. electrophysiology), prone to error and difficult to automate (e.g. standard electron microscopy), or too coarse (e.g. MRI) to provide accurate and large-scale measurements.
Results: To facilitate high-throughput analyses, we used a 50-year-old experimental technique to selectively stain for synapses in electron microscopy (EM) images, and we developed a machine learning framework to automatically detect synapses in these images. To validate our method we experimentally imaged brain tissue of the somatosensory cortex in six mice. We detected thousands of synapses in these images and demonstrate the accuracy of our approach using cross-validation with manually labeled data and by comparing against existing algorithms and against tools that process standard EM images. We also used a semi-supervised algorithm that leverages unlabeled data to overcome sample heterogeneity and improve performance. Our algorithms are highly efficient and scalable and are freely available for others to use.
PP54 (HT) - A probabilistic histone modification map of the human genome and its implications for gene regulation
Room: Hall 14.2
Date: Monday, July 22
Author(s): Misook Ha, Samsung Advanced Institute of Technology, Korea, Rep
Soondo Hong, Samsung Display Corporation, Korea, Rep
Wen-Hsing Li, University of Chicago, United States
Session Chair: Sean O'Donoghue
PP55 (HT) - Identification of small RNA pathway genes using patterns of phylogenetic conservation and divergence.
Room: ICC Lounge 81
Date: Monday, July 22
Author(s): Yuval Tabach, Massachusetts General Hospital/ Harvard Medical School, United States
Session Chair: Burkhard Rost
PP56 (PT) - IDBA-Tran: A More Robust de novo de Bruijn Graph Assembler for Transcriptomes with Uneven Expression Levels
Room: Hall 4/5
Date: Tuesday, July 23
Author(s): Yu Peng, The University of Hong Kong, Hong Kong
Henry C.M. Leung, The University of Hong Kong
S.M. Yiu, The University of Hong Kong
Xin-Guang Zhu, Shanghai Institutes for Biological Sciences, China
Ming-Zhu Lv, Shanghai Institutes for Biological Sciences, China
Francis Chin, The University of Hong Kong
Session Chair: Debra Goldberg
Motivation: RNA sequencing based on next-generation sequencing technology is an effective approach for analyzing transcriptomes. Similar to de novo genome assembly, de novo transcriptome assembly does not rely on a reference genome or additional annotated information. It is well-known that the transcriptome assembly problem is more difficult. In particular, isoforms can have very uneven expression levels (e.g. 1:100) which make it very difficult to identify low-expressed isoforms. Technically, a core issue is to remove erroneous vertices/edges with high multiplicity (produced by high-expressed isoforms) in the de Bruijn graph without removing those correct ones with not so high multiplicity corresponding to low-expressed isoforms. Failing to do so will result in the loss of low-expressed isoforms or having complicated subgraphs with transcripts of different genes mixed together due to the erroneous vertices and edges.
Contributions: Unlike existing tools which usually remove erroneous vertices/edges if their multiplicities are lower than a global threshold, we developed a probabilistic progressive approach with local thresholds to iteratively remove those erroneous vertices/edges. This enables us to decompose the graph into disconnected components, each of which contains a few, if not single, genes, while keeping a lot of correct vertices/edges of low-expressed isoforms. Combined with existing techniques, IDBA-Tran is able to assemble both high-expressed and low-expressed transcripts and outperforms existing assemblers in terms of sensitivity and specificity for both simulated and real data.
Availability: http://www.cs.hku.hk/~alse/idba_tran
PP57 (HT) - Visual Exploration for Cancer Subtype Analysis
Room: Hall 7
Date: Tuesday, July 23
Author(s): Nils Gehlenborg, Harvard Medical School, United States
Alexander Lex, Harvard University, United States
Marc Streit, Johannes Kepler University Linz, Austria
Hans-Joerg Schulz, University of Rostock, Germany
Christian Partl, Graz University of Technology, Austria
Dieter Schmalstieg, Graz University of Technology, Austria
Peter Park, Harvard Medical School, United States
Session Chair: Thomas Lengauer
PP58 (HT) - Simulating Delta/Notch Signaling in Somitogenesis and Pancreas Development
Room: Hall 14.2
Date: Tuesday, July 23
Author(s): Hendrik Tiedemann, Helmholtz Center Munich, Germany
Elida Schneltzer, Helmholtz Center Munich, Germany
Gerhard Przemeck, Helmholtz Center Munich, Germany
Martin Hrabě De Angelis, Helmholtz Center Munich, Germany
Session Chair: Lonnie Welch
PP59 (HT) - From sequence co-evolution to protein (complex) structure prediction
Room: ICC Lounge 81
Date: Tuesday, July 23
Author(s): Martin Weigt, Universite Pierre and Marie Curie, France
Session Chair: Janet Kelso
PP60 (PT) - Short Read Alignment with Populations of Genomes
Room: Hall 4/5
Date: Tuesday, July 23
Author(s): Victoria Popic, Stanford University, United States
Lin Huang, Stanford University, United States
Serafim Batzoglou, Stanford University, United States
Session Chair: Debra Goldberg
The increasing availability of high throughput sequencing technologies has led to thousands of human genomes having been sequenced in recent years. Efforts such as the 1000 Genomes Project further add to the availability of human genome variation data. However, to date there is no method that can map reads of a newly sequenced human genome to a large collection of genomes. Instead, methods rely on aligning reads to a single reference genome. This leads to inherent biases and lower accuracy. To tackle this problem, a new alignment tool, BWBBLE, is introduced in this paper. We (1) introduce a new compressed representation of a collection of genomes, which explicitly tackles the genomic variation observed at every position, and (2) design a new alignment algorithm based on the Burrows-Wheeler transform that maps short reads from a newly sequenced genome to an arbitrary collection of 2 or more (up to millions of) genomes with high accuracy and no inherent bias to one specific genome.
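For readers unfamiliar with Burrows-Wheeler-based alignment, the sketch below shows the standard exact-match backward search on a single reference string. It is not the BWBBLE algorithm: the multi-genome representation described in the abstract is not reproduced here, the suffix array is built naively for brevity, and the names bwt_index and find are hypothetical.

```python
def bwt_index(text):
    """Build a small BWT index (naive construction, for illustration only)."""
    text += "$"  # sentinel, sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # suffix array
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix

    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1

    # first[c]: number of characters in text strictly smaller than c
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    # occ[c][i]: occurrences of c in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in counts:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, first, occ


def find(pattern, index):
    """Return sorted start positions of exact matches of pattern."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # half-open suffix array interval
    for ch in reversed(pattern):  # extend the match one character leftwards
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = bwt_index("GATTACAGATTACC")
print(find("ATTA", index))
```

Each pattern character narrows the suffix array interval in constant time given the precomputed tables, so a read of length m is matched in O(m) steps regardless of reference length. A single-reference index like this one is exactly the setting that the abstract argues introduces reference bias.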
PP61 (HT) - The cBio Portal for Cancer Genomics
Room: Hall 7
Date: Tuesday, July 23
Author(s): Nikolaus Schultz, Memorial Sloan-Kettering Cancer Center, United States
Jianjiong Gao, Memorial Sloan-Kettering Cancer Center, United States
B. Arman Aksoy, Memorial Sloan-Kettering Cancer Center, United States
Benjamin Gross, Memorial Sloan-Kettering Cancer Center, United States
Gideon Dresdner, Memorial Sloan-Kettering Cancer Center, United States
S. Onur Sumer, Memorial Sloan-Kettering Cancer Center, United States
Ethan Cerami, Memorial Sloan-Kettering Cancer Center, United States
Anders Jacobsen, Memorial Sloan-Kettering Cancer Center, United States
Ugur Dogrusoz, Bilkent University, Turkey
Erik Larsson, University of Gothenburg, Sweden
Chris Sander, Memorial Sloan-Kettering Cancer Center, United States
Session Chair: Thomas Lengauer
PP62 (HT) - ATARiS: Computational quantification of gene suppression phenotypes from multisample RNAi screens
Room: Hall 14.2
Date: Tuesday, July 23
Author(s): Aviad Tsherniak, Broad Institute of MIT and Harvard, United States
Diane Shao, Broad Institute, United States
William Hahn, Broad Institute, United States
Jill Mesirov, Broad Institute, United States
Session Chair: Lonnie Welch
PP63 (HT) - Accurate prediction of peptide-induced dynamical changes within the second PDZ domain of PTP1e
Room: ICC Lounge 81
Date: Tuesday, July 23
Author(s): Elisa Cilia, Université Libre de Bruxelles, Belgium
Tom Lenaerts, Université Libre de Bruxelles, Belgium
Geerten Vuister, University Of Leicester, United Kingdom
Session Chair: Janet Kelso
PP64 (PT) - Design of Shortest Double-Stranded DNA Sequences Covering All K-mers with Applications to Protein Binding Microarrays and Synthetic Enhancers
Room: Hall 4/5
Date: Tuesday, July 23
Author(s): Yaron Orenstein, Tel-Aviv University, Israel
Ron Shamir, Tel-Aviv University, Israel
Session Chair: Debra Goldberg
Novel technologies can generate large sets of short double-stranded DNA sequences that can be used to measure their regulatory effects. Microarrays can measure in vitro the binding intensity of a protein to thousands of probes. Synthetic enhancer sequences inserted into an organism's genome allow us to measure in vivo the effect of such sequences on the phenotype. In both applications, by using sequence probes that cover all k-mers, a comprehensive picture of the effect of all possible short sequences on gene regulation is obtained. The value of k that can be used in practice is, however, severely limited by cost and space considerations. A key challenge is therefore to cover all k-mers with a minimal number of probes. The standard way to do this uses the de Bruijn sequence of length 4^k. However, since probes are double stranded, when a k-mer is included in a probe, its reverse complement k-mer is accounted for as well. Here we show how to efficiently create a shortest possible sequence with the property that it contains each k-mer or its reverse complement, but not necessarily both. The length of the resulting sequence approaches half that of the de Bruijn sequence as k increases. By reducing the total sequence length, experimental limitations can be overcome; alternatively, additional sequences with redundant k-mers of interest can be added.
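The counting argument behind the "half the length" claim can be checked on small k. The Python sketch below builds a standard (cyclic) de Bruijn sequence over ACGT, confirms that it covers all 4^k k-mers, and counts how many k-mers remain once each k-mer is identified with its reverse complement. It does not implement the authors' construction of the shortest sequence; the function names are hypothetical, and the de Bruijn generator is the textbook recursive Lyndon-word method.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(kmer):
    """Reverse complement of a DNA string."""
    return kmer.translate(COMPLEMENT)[::-1]


def de_bruijn(alphabet, n):
    """Cyclic de Bruijn sequence of order n (textbook recursive method)."""
    k = len(alphabet)
    a = [0] * (k * n)
    out = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)


def cyclic_kmers(seq, k):
    """All k-mers of seq when read as a circular sequence."""
    extended = seq + seq[:k - 1]
    return {extended[i:i + k] for i in range(len(seq))}


def canonical_classes(k):
    """k-mers up to reverse complement (one representative per pair)."""
    return {min(m, revcomp(m)) for m in map("".join, product("ACGT", repeat=k))}


seq = de_bruijn("ACGT", 3)
print(len(seq), len(cyclic_kmers(seq, 3)), len(canonical_classes(3)))
```

For odd k no k-mer equals its own reverse complement, so the number of classes is exactly 4^k / 2. For even k there are 4^(k/2) self-complementary k-mers, giving (4^k + 4^(k/2)) / 2 classes, a ratio that tends to one half as k grows.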
PP65 (HT) - Visualizing and Mining Chemical-Biological Space
Room: Hall 7
Date: Tuesday, July 23
Author(s): Stefan Kramer, Johannes Gutenberg University Mainz, Germany
Andreas Karwath, Johannes Gutenberg University Mainz, Germany
Madeleine Seeland, TU München, Germany
Martin Gütlein, University of Freiburg, Germany
Session Chair: Thomas Lengauer
PP66 (PT) - Learning Subgroup-Specific Regulatory Interactions and Regulator Independence with PARADIGM
Room: Hall 14.2
Date: Tuesday, July 23
Author(s): Andrew J. Sedgewick, University of Pittsburgh, United States
Stephen Benz, Five3 Genomics, LLC
Shahrooz Rabizadeh, Chan Soon-Shiong Institute for Advanced Health
Patrick Soon-Shiong, Chan Soon-Shiong Institute for Advanced Health
Charles Vaske, Five3 Genomics, LLC
Session Chair: Lonnie Welch
High-dimensional “-omics” profiling provides a detailed molecular view of individual cancers; however, understanding the mechanisms by which tumors evade cellular defenses requires deep knowledge of the underlying cellular pathways within each cancer sample. We extended the PARADIGM algorithm (Vaske et al., 2010), a pathway analysis method for combining multiple “-omics” data types, to learn the strength and direction of 9139 gene and protein interactions curated from the literature. Using genomic and mRNA expression data from 1936 samples in The Cancer Genome Atlas (TCGA) cohort, we learned interactions that provided support for and relative strength of 7138 (78%) of the curated links. Gene set enrichment found that genes involved in the strongest interactions were significantly enriched for transcriptional regulation, apoptosis, cell cycle regulation, and response to tumor cells. Within the TCGA breast cancer cohort we assessed different interaction strengths between breast cancer subtypes, and found interactions associated with the MYC pathway and the ER alpha network to be among the most differential between basal and luminal A subtypes. PARADIGM with the Naive Bayesian assumption produced gene activity predictions that, when clustered, found groups of patients with better separation in survival than both the original version of PARADIGM and a version without the assumption. We found that this Naive Bayes assumption was valid for the vast majority of co-regulators, indicating that most co-regulators act independently on their shared target.
Availability: http://paradigm.five3genomics.com
PP67 (HT) - A large‐scale evaluation of computational protein function prediction
Room: ICC Lounge 81
Date: Tuesday, July 23
Author(s): Predrag Radivojac, Indiana University, United States
Session Chair: Janet Kelso
PP68 (HT) - Combinatorial Pooling Enables Selective Sequencing of the Barley Gene Space
Room: Hall 4/5
Date: Tuesday, July 23
Author(s): Denisa Duma, University of California Riverside, United States
Stefano Lonardi, University of California Riverside, United States
Matthew Alpert, University of California Riverside, United States
Gianfranco Ciardo, University of California Riverside, United States
Timothy J. Close, University of California Riverside, United States
Steve Wanamaker, University of California Riverside, United States
Yaqin Ma, University of California Riverside, United States
Ming-Cheng Luo, University of California Davis, United States
Yonghui Wu, University of California Riverside, United States
Francesca Cordero, University of Torino, Italy
Marco Beccuti, University of Torino, Italy
Serdar Bozdag, Marquette University, United States
Prasanna R. Bhat, University of California Riverside, United States
Burair Alsaihati, University of California Riverside, United States
Josh Resnik, University of California Riverside, United States
Session Chair: Debra Goldberg
PP69 (HT) - Designing with the user in mind: how UCD can work for bioinformatics
Room: Hall 7
Date: Tuesday, July 23
Author(s): Jennifer Cham, European Bioinformatics Institute, United Kingdom
Katrina Pavelin, European Bioinformatics Institute, United Kingdom
Paula de Matos, European Bioinformatics Institute, United Kingdom
Cath Brooksbank, European Bioinformatics Institute, United Kingdom
Graham Cameron, European Bioinformatics Institute, United Kingdom
Hong Cao, European Bioinformatics Institute, United Kingdom
Rafael Alcantara, European Bioinformatics Institute, United Kingdom
Francis Rowland, European Bioinformatics Institute, United Kingdom
Brendan Vaughan, European Bioinformatics Institute, United Kingdom
Silvano Squizzato, European Bioinformatics Institute, United Kingdom
Youngmi Park, European Bioinformatics Institute, United Kingdom
Rodrigo Lopez, European Bioinformatics Institute, United Kingdom
Christoph Steinbeck, European Bioinformatics Institute, United Kingdom
Session Chair: Thomas Lengauer
PP70 (PT) - Hard-wired heterogeneity in blood stem cells revealed using a dynamic regulatory network model
Room: Hall 14.2
Date: Tuesday, July 23
Author(s): Nicola Bonzanni, VU University Amsterdam, Netherlands
Abhishek Garg, Swiss Institute of Bioinformatics, Switzerland
K. Anton Feenstra, VU University Amsterdam, Netherlands
Judith Schütte, University of Cambridge, United Kingdom
Sarah Kinston, University of Cambridge
Diego Miranda-Saavedra, University of Cambridge
Jaap Heringa, VU University Amsterdam / Netherlands Bioinformatics Centre
Ioannis Xenarios, Swiss Institute of Bioinformatics
Berthold Göttgens, University of Cambridge, United Kingdom
Session Chair: Lonnie Welch
Motivation:
Combinatorial interactions of transcription factors with cis-regulatory elements control the dynamic progression through successive cellular states and thus underpin all metazoan development. The construction of network models of cis-regulatory elements therefore has the potential to generate fundamental insights into cellular fate and differentiation. Haematopoiesis has long served as a model system to study mammalian differentiation, yet modelling based on experimentally informed cis-regulatory interactions has so far been restricted to pairs of interacting factors. Here we have generated a Boolean network model based on detailed cis-regulatory functional data connecting 11 haematopoietic stem/progenitor cell (HSPC) regulator genes.
Results:
Despite its apparent simplicity, the model exhibits surprisingly complex behaviour that we charted using strongly connected components and shortest-path analysis in its Boolean state space. This analysis of our model predicts that HSPCs display heterogeneous expression patterns and possess many intermediate states that can act as ‘stepping stones’ for the HSPC to achieve a final differentiated state. Importantly, an external perturbation or ‘trigger’ is required to exit the stem cell state, with distinct triggers characterising maturation into the various different lineages. By focussing on intermediate states occurring during erythrocyte differentiation, from our model we predicted a novel negative regulation of Fli1 by Gata1 which we confirmed experimentally thus validating our model.
In conclusion, we demonstrate that an advanced mammalian regulatory network model based on experimentally validated cis-regulatory interactions has allowed us to make novel, experimentally testable hypotheses about transcriptional mechanisms that control differentiation of mammalian stem cells.
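The kind of state-space analysis described above can be sketched on a toy example. The three-gene network below is hypothetical and far smaller than the 11-gene HSPC model; the genes, rules, and function names are assumptions chosen only to show how steady states and shortest paths are found in a Boolean state space under asynchronous updates.

```python
from collections import deque
from itertools import product

# Hypothetical rules for genes (X, Y, Z): X and Y repress each other,
# and Z is activated by X. These are not the rules of the published model.
RULES = (
    lambda s: not s[1],  # X is on when Y is off
    lambda s: not s[0],  # Y is on when X is off
    lambda s: s[0],      # Z follows X
)


def target(state):
    """Value each gene would take if its rule were applied now."""
    return tuple(int(bool(rule(state))) for rule in RULES)


def successors(state):
    """Asynchronous update: exactly one unstable gene changes per step."""
    nxt = target(state)
    return [
        state[:i] + (nxt[i],) + state[i + 1:]
        for i in range(len(state))
        if nxt[i] != state[i]
    ]


def steady_states():
    """States that no rule wants to change."""
    return [s for s in product((0, 1), repeat=len(RULES)) if target(s) == s]


def shortest_path(start, goal):
    """Breadth-first search in the state transition graph."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None  # goal unreachable without an external perturbation


print(steady_states())
print(shortest_path((0, 0, 0), (1, 0, 1)))
```

In this toy network the two steady states cannot reach each other, so shortest_path between them returns None. That mirrors, in miniature, the abstract's observation that an external trigger is needed to leave a stable state.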
PP71 (HT) - Comparative proteomics reveals a significant bias toward alternative protein isoforms with conserved structure and function
Room: ICC Lounge 81
Date: Tuesday, July 23
Author(s): Michael Liam Tress, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain
Iakes Ezkurdia, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain
Angela del Pozo, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain
Jose Manuel Rodriguez, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain
Alfonso Valencia, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain
Jennifer Harrow, Wellcome Trust Sanger Centre, United Kingdom
Adam Frankish, Wellcome Trust Sanger Centre, United Kingdom
Keith Ashman, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain
Session Chair: Janet Kelso
PP72 (HT) - Genetic variants in the next generation: detection, reprioritizing and function annotation
Room: Hall 4/5
Date: Tuesday, July 23
Author(s): Junwen Wang, The University of Hong Kong, China
Feng Xu, The University of Hong Kong, China
Mulin Li, The University of Hong Kong, China
Weixin Wang, The University of Hong Kong, China
Pak Sham, The University of Hong Kong, China
Panwen Wang, The University of Hong Kong, China
Session Chair: Reinhard Schneider
PP73 (HT) - Metagenomic inference and biomarker discovery for the gut microbiome in inflammatory bowel disease
Room: Hall 7
Date: Tuesday, July 23
Author(s): Timothy Tickle, Harvard School of Public Health, United States
Xochitl Morgan, Harvard School of Public Health, United States
Harry Sokol, University of Paris, France
Dirk Gevers, Broad Institute, United States
Kathryn Devaney, Massachusetts General Hospital, United States
Doyle Ward, Broad Institute, United States
Joshua Reyes, Harvard School of Public Health, United States
Samir Shah, Brown University, United States
Neal LeLeiko, Brown University, United States
Scott Snapper, Children's Hospital and Brigham and Women's Hospital, United States
Athos Bousvaros, Children's Hospital and Brigham and Women's Hospital, United States
Joshua Korzenik, Children's Hospital and Brigham and Women's Hospital, United States
Bruce Sands, Mount Sinai School of Medicine, United States
Ramnik Xavier, Massachusetts General Hospital, United States
Curtis Huttenhower, Harvard School of Public Health, United States
Session Chair: Alfonso Valencia
PP74 (HT) - Interplay of microRNAs, transcription factors and target genes: linking dynamic expression changes to function
Room: Hall 14.2
Date: Tuesday, July 23
Author(s): Petr Nazarov, Centre de Recherche Public de la Sante, Luxembourg
Susanne Reinsbach, University of Luxembourg, Luxembourg
Arnaud Muller, Centre de Recherche Public de la Sante, Luxembourg
Nathalie Nicot, Centre de Recherche Public de la Sante, Luxembourg
Demetra Philippidou, University of Luxembourg, Luxembourg
Laurent Vallar, Centre de Recherche Public de la Sante, Luxembourg
Stephanie Kreis, University of Luxembourg, Luxembourg
Session Chair: Ralf Zimmer
PP75 (HT) - Systematic Computational Drug Repositioning
Room: ICC Lounge 81
Date: Tuesday, July 23
Author(s): Philippe Sanseau, GlaxoSmithKline, United Kingdom
Mark Hurle, GlaxoSmithKline, United States
Brent Richards, McGill University, Canada
Lon Cardon, GlaxoSmithKline, United States
Pankaj Agarwal, GlaxoSmithKline, United States
Session Chair: Donna Slonim
PP76 (PT) - Information-theoretic evaluation of predicted ontological annotations
Room: Hall 4/5
Date: Tuesday, July 23
Author(s): Wyatt Clark, Indiana University, United States
Predrag Radivojac, Indiana University, United States
Session Chair: Reinhard Schneider
The development of effective methods for the prediction of ontological annotations is an important goal in computational biology, with protein function prediction and disease gene prioritization gaining wide recognition. While various algorithms have been proposed for these tasks, evaluating their performance is difficult due to problems caused both by the structure of biomedical ontologies and biased or incomplete experimental annotations of genes and gene products. In this work, we propose an information-theoretic framework to evaluate the performance of computational protein function prediction. We use a Bayesian network, structured according to the underlying ontology, to model the prior probability of a protein's function. We then define two concepts, misinformation and remaining uncertainty, that can be seen as information-theoretic analogs of precision and recall. Finally, we propose a single statistic, referred to as semantic distance, that can be used to rank or train classification models. We evaluate our approach by analyzing the performance of three protein function predictors of Gene Ontology terms and provide evidence that we address several weaknesses of currently used metrics. We believe this framework provides useful insights into the performance of protein function prediction tools.
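The two quantities and the combined statistic described above can be illustrated with a minimal sketch. It assumes that each ontology term already carries a precomputed information value (here a hypothetical dictionary `ia`, standing in for values that would be derived from the Bayesian network), that true and predicted annotations are given as sets of terms already propagated to their ancestors, and that the single statistic combines the two quantities as a Euclidean-style norm. The function names and the toy numbers are illustrative, not taken from the abstract.

```python
def remaining_uncertainty(true_terms, predicted_terms, ia):
    """Information in the true annotation that the prediction misses
    (recall-like: 0 means nothing was missed)."""
    return sum(ia[t] for t in true_terms - predicted_terms)


def misinformation(true_terms, predicted_terms, ia):
    """Information in the prediction that is not in the true annotation
    (precision-like: 0 means nothing wrong was predicted)."""
    return sum(ia[t] for t in predicted_terms - true_terms)


def semantic_distance(ru, mi, k=2):
    """Combine both quantities into one number; k=2 is an assumed default."""
    return (ru ** k + mi ** k) ** (1.0 / k)


# Hypothetical information values for a four-term toy ontology.
ia = {"root": 0.0, "A": 1.0, "B": 2.0, "C": 3.0}
true_terms = {"root", "A", "B"}
predicted_terms = {"root", "A", "C"}

ru = remaining_uncertainty(true_terms, predicted_terms, ia)  # term B was missed
mi = misinformation(true_terms, predicted_terms, ia)         # term C was wrong
print(ru, mi, semantic_distance(ru, mi))
```

A predictor that outputs a score per term would be evaluated by sweeping a decision threshold and recomputing both quantities at each threshold; the sketch shows a single threshold only.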
PP77 (PT) - CAMPways: Constrained Alignment Framework for the Comparative Analysis of a Pair of Metabolic Pathways
Room: Hall 7
Date: Tuesday, July 23
Author(s): Gamze Abaka, Kadir Has University, Turkey
Turker Biyikoglu, Izmir Institute of Technology
Cesim Erten, Kadir Has University
Session Chair: Alfonso Valencia
Given a pair of metabolic pathways, an alignment of the pathways corresponds to a mapping between similar substructures of the pair. Successful alignments may provide useful applications in phylogenetic tree reconstruction and drug design, and overall may enhance our understanding of cellular metabolism. We consider the problem of providing one-to-many alignments of reactions in a pair of metabolic pathways. We first provide a constrained alignment framework applicable to the problem. We show that the constrained alignment problem, even in a very primitive setting, is computationally intractable, which justifies efforts to design efficient heuristics. We present our Constrained Alignment of Metabolic Pathways (CAMPways) algorithm designed for this purpose. Through extensive experiments involving a large pathway database, we demonstrate that, compared to a state-of-the-art alternative, the CAMPways algorithm provides better alignment results on metabolic networks as far as measures based on same-pathway inclusion are concerned. The execution speed of our algorithm constitutes yet another important improvement over alternative algorithms.
PP78 (PT) - Integrating sequence, expression and interaction data to determine condition-specific miRNA regulation
Room: Hall 14.2
Date: Tuesday, July 23
Author(s): Hai-Son Le, Carnegie Mellon, United States
Ziv Bar-Joseph, Carnegie Mellon
Session Chair: Ralf Zimmer
Motivation: MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression post-transcriptionally. MiRNAs were shown to play an important role in development and disease, and accurately determining the networks regulated by these miRNAs in a specific condition is of great interest. Early work on miRNA target prediction has focused on utilizing static sequence information. More recently, researchers have combined sequence and expression data to identify such targets in various conditions.
Results: Here we propose a regression-based probabilistic method that integrates sequence, expression and interaction data to identify modules of mRNAs controlled by small sets of miRNAs. We formulate an optimization problem and develop a learning framework to determine the module regulation and membership. Applying our method to cancer data, we show that by adding protein interaction data and modeling combinatorial regulation, our method can accurately identify both miRNAs and their targets, improving upon prior methods. We next used our method to jointly analyze a number of different types of cancers and identified both common and cancer-type-specific miRNA regulators.
PP79 (PT) - Phylogenetic analysis of multiprobe fluorescence in situ hybridization data from tumor cell populations
Room: ICC Lounge 81
Date: Tuesday, July 23
Author(s): Salim Akhter Chowdhury, Carnegie Mellon University, United States
Stanley Shackney, Intelligent Oncotherapeutics
Kerstin Heselmeyer-Haddad, National Institutes of Health
Thomas Ried, National Institutes of Health
Alejandro Schäffer, National Institutes of Health
Russell Schwartz, Carnegie Mellon University, United States
Session Chair: Donna Slonim
Motivation: Development and progression of solid tumors can be attributed to a process of mutations, which typically includes changes in the number of copies of genes or genomic regions. Although comparisons of cells within single tumors show extensive heterogeneity, recurring features of their evolutionary process may be discerned by comparing multiple regions or cells of a tumor. A particularly useful source of data for studying likely progression of individual tumors is fluorescence in situ hybridization (FISH), which allows one to count copy numbers of several genes in hundreds of single cells. Novel algorithms for interpreting such data phylogenetically are needed, however, to reconstruct likely evolutionary trajectories from the states of single cells and to facilitate analysis of tumor evolution.
Results: In this paper, we develop phylogenetic methods to infer likely models of tumor progression using FISH copy number data and apply them to a study of FISH data from two cancer types. Statistical analyses of topological characteristics of the tree-based model provide insights into likely tumor progression pathways consistent with the prior literature. Furthermore, tree statistics from the resulting phylogenies can be used as features for prediction methods. This results in improved accuracy, relative to unstructured gene copy number data, at predicting tumor state and future metastasis.
Availability: A package of source code for FISH tree building (FISHtrees) and the data on cervical cancer and breast cancer examined here are publicly available at the site ftp://ftp.ncbi.nlm.nih.gov/pub/FISHtrees.
PP80 (HT) - Turning networks into ontologies of gene function
Room: Hall 4/5
Date: Tuesday, July 23
Author(s): Janusz Dutkowski, University of California, San Diego, United States
Michael Kramer, University of California San Diego, United States
Michal Surma, Max Planck Institute, Germany
Rama Balakrishnan, Stanford University, United States
J. Michael Cherry, Stanford University, United States
Nevan Krogan, University of California, San Francisco, United States
Trey Ideker, University of California San Diego, United States
Session Chair: Reinhard Schneider
PP81 (HT) - Along Signal Paths: Connecting Pathway Annotation to Topological Analyses
Room: Hall 7
Date: Tuesday, July 23
Author(s): Gabriele Sales, Università di Padova, Italy
Paolo Martini, Università di Padova, Italy
Enrica Calura, Università di Padova, Italy
Chiara Romualdi, Università di Padova, Italy
Session Chair: Alfonso Valencia
PP82 (PT) - The RNA Newton Polytope and Learnability of Energy Parameters
Room: Hall 14.2
Date: Tuesday, July 23
Author(s): Elmirasadat Forouzmand, Wayne State University, United States
Hamidreza Chitsaz, Wayne State University, United States
Session Chair: Ralf Zimmer
Motivation: Computational RNA structure prediction is a mature and important problem that has received a new wave of attention with the discovery of regulatory non-coding RNAs and the advent of high-throughput transcriptome sequencing. Despite nearly four decades of research on RNA secondary structure and RNA-RNA interaction prediction, the accuracy of state-of-the-art algorithms is still far from satisfactory. So far, researchers have proposed increasingly complex energy models and improved parameter estimation methods, experimental and/or computational, in anticipation of endowing their methods with enough power to solve the problem. Disappointingly, the outcome has been only modest improvements that do not match expectations. Even recent massively featured machine learning approaches were not able to break the barrier. Why is that?
Approach: The first step towards high-accuracy structure prediction is to pick an energy model that is inherently capable of predicting every structure known to date. In this paper, we introduce the notion of learnability of the parameters of an energy model as a measure of such an inherent capability. We say that the parameters of an energy model are learnable iff there exists at least one set of such parameters that renders every RNA structure known to date the minimum free energy structure. We derive a necessary condition for learnability and give a dynamic programming algorithm to assess it. Our algorithm computes the convex hull of the feature vectors of all feasible structures in the ensemble of a given input sequence. Interestingly, that convex hull coincides with the Newton polytope of the partition function as a polynomial in energy parameters. To the best of our knowledge, this is the first approach towards computing the RNA Newton polytope and a systematic assessment of the inherent capabilities of an energy model. The worst-case complexity of our algorithm is exponential in the number of features. However, one could employ dimensionality reduction techniques to avoid the curse of dimensionality.
Results: We demonstrated the application of our theory to a simple energy model consisting of a weighted count of A-U, C-G, and G-U base pairs. Our results show that this simple energy model satisfies the necessary condition for more than half of the input unpseudoknotted sequence-structure pairs (55%) chosen from the RNA STRAND v2.0 database and severely violates the condition for about 13%, which provides a set of hard cases that require further investigation. Of 1350 RNA strands, the observed three-dimensional feature vector for 749 strands is on the surface of the computed polytope. For 289 RNA strands, the observed feature vector is not on the boundary of the polytope, but its distance from the boundary is not more than one. A distance of one essentially means one base pair difference between the observed structure and the closest point on the boundary of the polytope, which need not be the feature vector of a structure. For 171 sequences, this distance is larger than 2, and for only 11 sequences, this distance is larger than 5.
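The idea behind the necessary condition can be sketched for the simple three-feature model (counts of A-U, C-G and G-U pairs). The sketch below is not the authors' algorithm: instead of maintaining convex hulls, it keeps the full set of feature vectors, whose convex hull is the polytope in question, so it is only practical for short sequences. The minimum hairpin loop length of 3, the function names and the example weights are all assumptions made for illustration.

```python
from functools import lru_cache

# Index of each canonical pair in the feature vector (AU, CG, GU).
CANONICAL = {
    ("A", "U"): 0, ("U", "A"): 0,
    ("C", "G"): 1, ("G", "C"): 1,
    ("G", "U"): 2, ("U", "G"): 2,
}
MIN_LOOP = 3  # assumed minimum number of unpaired bases inside a hairpin


def feature_vectors(seq, min_loop=MIN_LOOP):
    """Return every (AU, CG, GU) count vector reachable by some
    pseudoknot-free secondary structure of `seq`."""
    seq = seq.upper()

    @lru_cache(maxsize=None)
    def vectors(i, j):
        # Feature vectors of structures on seq[i..j] (inclusive).
        if j - i < 1:
            return frozenset({(0, 0, 0)})
        result = set(vectors(i + 1, j))  # case: base i is unpaired
        for k in range(i + min_loop + 1, j + 1):  # case: i pairs with k
            kind = CANONICAL.get((seq[i], seq[k]))
            if kind is None:
                continue
            for inner in vectors(i + 1, k - 1):
                for outer in vectors(k + 1, j):
                    vec = [a + b for a, b in zip(inner, outer)]
                    vec[kind] += 1
                    result.add(tuple(vec))
        return frozenset(result)

    return set(vectors(0, len(seq) - 1))


def attains_minimum(observed, vectors, weights):
    """True if `observed` has the lowest weighted pair count under `weights`."""
    def energy(v):
        return sum(w * c for w, c in zip(weights, v))
    return energy(observed) <= min(energy(v) for v in vectors)


vecs = feature_vectors("GAAAC")
print(sorted(vecs))
# Hypothetical weights: more negative means more stabilizing.
print(attains_minimum((0, 1, 0), vecs, (-2.0, -3.0, -1.0)))
```

An observed structure can be the minimum free energy structure for some choice of weights only if its feature vector lies on the boundary of the convex hull of `vecs`; `attains_minimum` checks one candidate set of weights rather than searching over all of them.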
PP83 (PT) - Automated target segmentation and fast alignment methods for high-throughput classification and averaging of crowded cryo-electron subtomograms
Room: ICC Lounge 81
Date: Tuesday, July 23
Author(s): Min Xu, University of Southern California, United States
Frank Alber, University of Southern California
Session Chair: Donna Slonim
Motivation: Cryo-electron tomography allows the imaging of macromolecular complexes in near-living conditions. To enhance the nominal resolution of a structure, it is necessary to align and average individual subtomograms, each containing identical complexes. However, if the sample of complexes is heterogeneous, it is necessary to first classify subtomograms into groups of identical complexes. This task becomes challenging when tomograms contain mixtures of unknown complexes extracted from a crowded environment. Two main challenges must be overcome. First, classification of subtomograms must be performed without knowledge of template structures. However, most alignment methods are too slow to perform reference-free classification of a large number (e.g. tens of thousands) of subtomograms. Second, subtomograms extracted from crowded cellular environments often contain fragments of other structures besides the target complex. However, alignment methods generally assume that each subtomogram contains only one complex. Automatic methods are needed to identify the target complex in a subtomogram even when its shape is unknown.
Results: In this paper, we propose an automatic and systematic method for the isolation and masking of target complexes in subtomograms extracted from crowded environments. Moreover, we also propose a fast alignment method using fast rotational matching in real space. Our experiments show that, compared to our previously proposed fast alignment method in reciprocal space, our new method significantly improves the alignment accuracy for highly distorted and especially crowded subtomograms. Such improvements are important for achieving successful and unbiased high-throughput reference-free structural classification of complexes inside whole cell tomograms.
PP84 (PT) - A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text
Room: Hall 4/5
Date: Tuesday, July 23
Author(s): Makoto Miwa, The University of Manchester, United Kingdom
Tomoko Ohta, The University of Manchester
Rafal Rak, The University of Manchester
Andrew Rowley, The University of Manchester
Douglas B. Kell, The University of Manchester
Sampo Pyysalo, The University of Manchester
Sophia Ananiadou, The University of Manchester
Session Chair: Reinhard Schneider
Motivation: In order to create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge.
Method: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models, and then turns them into queries for three text-mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine learning approaches.
Results: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The successful query extraction and ranking methods are used to update our existing pathway search system, PathText.
Availability: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/.
Contact: makoto.miwa@manchester.ac.uk
PP85 (PT) - A Context-Sensitive Framework for the Analysis of Human Signalling Pathways in Molecular Interaction Networks
Room: Hall 7
Date: Tuesday, July 23
Author(s): Alex Lan, Ben Gurion University, Israel
Michal Ziv-Ukelson, Ben Gurion University of the Negev, Israel
Esti Yeger-Lotem, Ben Gurion University, Israel
Session Chair: Alfonso Valencia
A major challenge in systems biology is to reveal the cellular pathways that give rise to specific phenotypes and behaviours. Current techniques often rely on a network representation of molecular interactions, where each node represents a protein or a gene and each interaction is assigned a single static score. However, the use of single interaction scores fails to capture the tendency of proteins to favour different partners under distinct cellular conditions. Here we propose a novel context-sensitive network model, in which genes and protein nodes are assigned multiple contexts based on their gene ontology annotations, and their interactions are associated with multiple context-sensitive scores. Using this model we developed a new approach and a corresponding tool, ContextNet, based on a dynamic programming algorithm for identifying signalling paths linking proteins to their downstream target genes. ContextNet finds high-ranking context-sensitive paths in the interactome, thereby revealing the intermediate proteins in the path and their path-specific contexts. We validated the model using 18,348 manually-curated cellular paths derived from the SPIKE database. We next applied our framework to elucidate the responses of human primary lung cells to influenza infection. Top-ranking paths were much more likely to contain infection-related proteins, and this likelihood was highly correlated with path score. Moreover, the contexts assigned by the algorithm pointed to putative as well as previously known responses to viral infection. Thus context-sensitivity is an important extension to current network biology models and can be efficiently used to elucidate cellular response mechanisms.
ContextNet is publicly available at http://netbio.bgu.ac.il/ContextNet.
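A path search over context-sensitive scores can be pictured as a layered dynamic program over (protein, context) states. The sketch below is an illustration only and does not reproduce ContextNet: the edge representation, the multiplicative path score, the requirement that scores lie in (0, 1], the function name and the toy network are all assumptions.

```python
def best_context_path(edges, source, target, max_edges):
    """Find the highest-scoring path from `source` to `target` using at most
    `max_edges` interactions.

    `edges` maps a state (protein, context) to a list of
    ((next_protein, next_context), score) pairs, with scores in (0, 1].
    The path score is the product of its interaction scores.
    """
    # Layer 0: every context in which the source protein has interactions.
    best = {state: (1.0, [state]) for state in edges if state[0] == source}
    answer = None
    for _ in range(max_edges):
        nxt = {}
        for state, (score, path) in best.items():
            for new_state, s in edges.get(state, []):
                cand = score * s
                # Keep only the best way of reaching each state in this layer.
                if new_state not in nxt or cand > nxt[new_state][0]:
                    nxt[new_state] = (cand, path + [new_state])
        for state, (score, path) in nxt.items():
            if state[0] == target and (answer is None or score > answer[0]):
                answer = (score, path)
        best = nxt
    return answer


# Hypothetical toy network: receptor R, kinase K, transcription factor TF.
edges = {
    ("R", "signaling"): [(("K", "signaling"), 0.9), (("K", "metabolism"), 0.4)],
    ("K", "signaling"): [(("TF", "transcription"), 0.8)],
    ("K", "metabolism"): [(("TF", "transcription"), 0.95)],
}
score, path = best_context_path(edges, "R", "TF", max_edges=3)
print(score, path)
```

In the toy network the interaction from K to TF has the higher score in the metabolism context, yet the best overall path keeps K in the signaling context because the score of the preceding interaction depends on context as well. A single static score per interaction cannot express this.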
PP86 (PT) - A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotides distribution
Room: Hall 14.2
Date: Tuesday, July 23
Author(s): Vladimir Reinharz, McGill University, Canada
Yann Ponty, CNRS/LIX, Polytechnique, France
Jerome Waldispuhl, McGill University, Canada
Session Chair: Ralf Zimmer
Motivations: The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and, more importantly, does not allow the user to control explicitly the nucleotide distribution, such as the GC-content, in their sequences. However, the latter is an important criterion for large-scale applications, as it could presumably be used to design sequences with better transcription rates and/or structural plasticity.
Results: In this paper, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable to or better than local search methods), seed-less (we remove the bias of the seed in local search heuristics), and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology outperforms both local and global approaches.
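How a weight on G and C nucleotides can steer the GC-content of sampled sequences is sketched below. This is a deliberately simplified illustration, not IncaRNAtion itself: it ignores the energy model entirely and only requires that paired positions form canonical base pairs. The function names, the dot-bracket input format and the bisection search for the weight are assumptions.

```python
import random

PAIRS = ["AU", "UA", "GU", "UG", "GC", "CG"]  # allowed base pairs


def parse_pairs(structure):
    """Map each opening position to its closing position (dot-bracket input)."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs[stack.pop()] = i
    return pairs


def expected_gc(structure, x):
    """Expected GC fraction when each G or C multiplies a choice's weight by x."""
    n_pairs = len(parse_pairs(structure))
    unpaired = len(structure) - 2 * n_pairs
    per_unpaired = x / (1 + x)                      # weights A=U=1, G=C=x
    per_pair = (x + 2 * x * x) / (1 + x + x * x)    # weights 1, x, x^2 per pair
    return (unpaired * per_unpaired + n_pairs * per_pair) / len(structure)


def fit_weight(structure, target_gc, iters=60):
    """Bisection (in log space) for the weight x giving the target GC fraction."""
    lo, hi = 1e-6, 1e6
    for _ in range(iters):
        mid = (lo * hi) ** 0.5
        if expected_gc(structure, mid) < target_gc:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5


def sample_sequence(structure, x, rng):
    """Draw one sequence compatible with `structure` under weight x."""
    seq = [None] * len(structure)
    pair_weights = [x ** sum(c in "GC" for c in p) for p in PAIRS]
    for i, j in parse_pairs(structure).items():
        seq[i], seq[j] = rng.choices(PAIRS, weights=pair_weights)[0]
    for i in range(len(seq)):
        if seq[i] is None:
            seq[i] = rng.choices(list("AUGC"), weights=[1, 1, x, x])[0]
    return "".join(seq)


structure = "((((...))))"
x = fit_weight(structure, target_gc=0.7)
print(sample_sequence(structure, x, random.Random(0)))
```

Because every position or pair is drawn independently, no seed sequence is needed, and the GC-content of the samples matches the target on average rather than exactly for each individual sequence.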
SS03_PartB - Inference, visualization and evaluation of signaling networks: A literature based framework
Room: Hall 1
Date: Monday, July 22
Author(s): Hayssam Soueidan, NKI-AVL, United States
SS03_PartD - Using the rxncon framework for network definition, visualisation and modelling
Room: Hall 1
Date: Monday, July 22
Author(s): Marcus Krantz, Humboldt University, United States
SS06_PartA1 - ELIXIR
Room: Hall 1
Date: Tuesday, July 23
Author(s): Niklas Blomberg
SS06_PartA2 - BioMedBridges - providing data and services bridges between the biomedical sciences infrastructures
Room: Hall 1
Date: Tuesday, July 23
Author(s): Janet Thornton, EMBL-EBI, United Kingdom
SS06_PartC3 - ELIXIR Swedish Node
Room: Hall 1
Date: Tuesday, July 23
Author(s): Bengt Persson, Linköping University, Sweden
TOPTT01 -
Room: Hall 9Date: Sunday, July 21Author(s): ,
Session Chair: Geoff Barton
TOPTT02 -
Room: Hall 10Date: Sunday, July 21Author(s): ,
Session Chair: Dominic Clark
TOPTT03 -
Room: Hall 9Date: Sunday, July 21Author(s): ,
Session Chair: Geoff Barton
TOPTT04 -
Room: Hall 10Date: Sunday, July 21Author(s): ,
Session Chair: Dominic Clark
TOPTT05 -
Room: Hall 9Date: Sunday, July 21Author(s): ,
Session Chair: Geoff Barton
TOPTT06 -
Room: Hall 10Date: Sunday, July 21Author(s): ,
Session Chair: Dominic Clark
TOPTT07 -
Room: Hall 9Date: Sunday, July 21Author(s): ,
Session Chair: Geoff Barton
TOPTT08 -
Room: Hall 9Date: Sunday, July 21Author(s): ,
Session Chair: Rodrigo Lopez
TOPTT09 -
Room: Hall 10Date: Sunday, July 21Author(s): ,
Session Chair: Dominic Clark
TOPTT10 -
Room: Hall 9Date: Sunday, July 21Author(s): ,
Session Chair: Rodrigo Lopez
TOPTT11 -
Room: Hall 10Date: Sunday, July 21Author(s): ,
Session Chair: Dominic Clark
TOPTT12 -
Room: Hall 10Date: Sunday, July 21Author(s): ,
Session Chair: Dominic Clark
TOPTT13 -
Room: Hall 9Date: Sunday, July 21Author(s): ,
Session Chair: Rodrigo Lopez
TOPTT14 -
Room: Hall 10Date: Sunday, July 21Author(s): ,
Session Chair: Dominic Clark
TT15 -
Room: Hall 9
Date: Monday, July 22
Session Chair: Rodrigo Lopez
TT16 -
Room: Hall 10
Date: Monday, July 22
Session Chair: Christophe Blanchet
TT17 -
Room: Hall 9
Date: Monday, July 22
Session Chair: Rodrigo Lopez
TT18 -
Room: Hall 10
Date: Monday, July 22
Session Chair: Christophe Blanchet
TT19 -
Room: Hall 9
Date: Monday, July 22
Session Chair: Rodrigo Lopez
TT20 -
Room: Hall 10
Date: Monday, July 22
Session Chair: Christophe Blanchet
TT21 -
Room: Hall 10
Date: Monday, July 22
Session Chair: Christophe Blanchet
TT22 -
Room: Hall 9
Date: Monday, July 22
Session Chair: Rodrigo Lopez
TT23 -
Room: Hall 10
Date: Monday, July 22
Session Chair: Christophe Blanchet
TT24 -
Room: Hall 9
Date: Monday, July 22
Session Chair: Johannes Soedling
TT25 -
Room: Hall 10
Date: Monday, July 22
Session Chair: Christophe Blanchet
TT26 -
Room: Hall 9
Date: Monday, July 22
Session Chair: Johannes Soedling
TT27 -
Room: Hall 9
Date: Monday, July 22
Session Chair: Johannes Soedling
TT28 -
Room: Hall 10
Date: Monday, July 22
Session Chair: Christophe Blanchet
TT29 -
Room: Hall 9
Date: Tuesday, July 23
Session Chair: Dominic Clark
TT30 -
Room: Hall 9
Date: Tuesday, July 23
Session Chair: Dominic Clark
TT31 -
Room: Hall 9
Date: Tuesday, July 23
Session Chair: Dominic Clark
TT32 -
Room: Hall 9
Date: Tuesday, July 23
Session Chair: Rodrigo Lopez
TT33 -
Room: Hall 9
Date: Tuesday, July 23
Session Chair: Rodrigo Lopez
TT34 -
Room: ICC Lounge 81
Date: Tuesday, July 23
TT35 -
Room: Hall 10
Date: Tuesday, July 23
Session Chair: Johannes Soedling
TT36 -
Room: Hall 9
Date: Tuesday, July 23
Session Chair: Rodrigo Lopez