Late Breaking Research Presentation Schedule

As of May 1, 2013 (schedule subject to change)

LBR01 - Ontology-aware classification of tissue and cell-type signals in gene expression profiles across platforms and technologies
Room: Hall 15.2
Date: Sunday, July 21 10:30 a.m. - 10:55 a.m.

Presenting Author: Young-suk Lee, Princeton University, United States

Additional Authors:
Qian Zhu, Princeton University, United States
Arjun Krishnan, Princeton University, United States
Olga Troyanskaya, Princeton University, United States

Abstract
Directly dealing with multicellularity and heterogeneity of human gene expression samples is paramount for understanding human homeostasis, disease manifestation and pharmacokinetics/pharmacodynamics. However, leveraging gene expression data through large-scale integrative analyses is challenging because most samples are not fully annotated to their tissue/cell-type of origin. A computational method to classify samples using their entire gene expression profiles is needed. Such a method must be applicable across thousands of independent studies, hundreds of gene expression technologies, and hundreds of diverse human tissues and cell-types. We present URSA (Unveiling RNA Sample Annotation), which leverages the complex tissue/cell-type relationships and simultaneously estimates the probabilities associated with hundreds of tissues/cell-types for any given gene expression profile. URSA provides accurate and intuitive probability values for expression profiles across independent studies and outperforms other methods irrespective of data preprocessing techniques. Moreover, without re-training, URSA can be used to classify samples from diverse microarray platforms and even from next generation sequencing technology. Finally, we provide a molecular interpretation for the tissue and cell-type models as the biological basis for URSA’s classifications.

Keywords: Functional Genomics, Systems Biology and Networks

LBR02 - A Model-Based Analysis of GC-Biased Gene Conversion
Room: Hall 15.2
Date: Sunday, July 21 11:00 a.m. - 11:25 a.m.

Presenting Author: John Capra, Vanderbilt University, United States

Additional Authors:
Melissa Hubisz, Cornell University, United States
Dennis Kostka, University of Pittsburgh, United States
Katherine Pollard, University of California, San Francisco, United States
Adam Siepel, Cornell University, United States

Abstract
Interpreting patterns of DNA sequence variation between the genomes of closely related species is critically important to understanding the causes and functional effects of nucleotide substitutions. In addition to well-studied adaptive processes, like natural selection, other forces influence substitution patterns. GC-biased gene conversion (gBGC) is a recombination-associated evolutionary process that favors the fixation of strong (G/C) over weak (A/T) alleles. In mammals, gBGC is thought to promote variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations. It also has the potential to produce false positives in common tests for positive selection. However, because it is difficult to incorporate gBGC into existing statistical models of evolution, its genome-wide influence is poorly understood. In this work, we describe a new phylogenetic hidden Markov model that jointly models the effects of selection and gBGC and apply it to the human and chimpanzee genomes. We find that gBGC has influenced a small, but important fraction of these genomes. Fast evolving regions and disease-associated polymorphisms show significant enrichment for gBGC. Overall, our analyses indicate that gBGC has been an important force in recent human evolution, and our publicly available algorithms and predictions will enable other researchers to consider gBGC in their analyses.

Keywords: Comparative Genomics, Population Genetics Variation and Evolution

LBR03 - Determination of hormone induced structural changes in genomic topological domains
Room: Hall 15.2
Date: Sunday, July 21 11:30 a.m. - 11:55 a.m.

Presenting Author: Davide Bau, Centro Nacional de Analisis Genomica, Spain

Additional Authors:
Marc Marti-Renom, Centro Nacional de Analisis Genomica, Spain

Abstract
Advances in genomic technologies have provided better insights into how the genome is organized inside the cell nucleus. Recently, it has been shown that chromatin is organized in Topologically Associating Domains (TADs), large interaction domains that appear to be conserved among different cell types. To determine whether these TADs have a functional role during the dynamic changes of gene expression in terminally differentiated cells, we studied the relationship between the spatial position of Progesterone (Pg) responsive genes and the TAD structure in breast cancer cells. Using Hi-C data, we found that the genome is organized into about 2,000 TADs. TADs were similarly positioned before and after hormone treatment; nonetheless, Pg induced some changes in the intra-TAD chromatin interactions. Unexpectedly, a large proportion of genes that responded similarly upon Pg treatment was clustered within individual TADs, indicating a topological segregation of Pg up- and down-regulation sites. Remarkably, hormone induced correlated epigenetic changes that spread over several hundred kb, revealing regional remodeling of chromatin. Although consecutive TADs can be covered by one or more similar epigenetic changes, their combination differs among individual consecutive TADs, reflecting topologically restrained combinatory chromatin signatures. Integrative 3D modeling of the intra-TAD contacts before and after Pg stimulation further supports this hypothesis, showing dynamic structural changes correlated with the transcriptional response. Given the segregation of target genes in TADs and the fine-tuning of Pg induced chromatin changes, we propose that TADs behave as regulons enabling spatially proximal genes to be coordinately transcribed in response to hormone.

Keywords: Genome Organization and Annotation, Functional Genomics

LBR04 - Maximum Parsimony Interpretation of Chromatin Capture Experiments
Room: Hall 15.2
Date: Sunday, July 21 12:00 p.m. - 12:25 p.m.

Presenting Author: Andrzej Kudlicki, University of Texas Medical Branch, United States

Abstract
Genome-wide chromatin conformation capture experiments allow characterizing the spatial structure of the genome; however, existing methods of data processing provide no means of appreciating the variability between the cells in the sample. We present a novel algorithmic framework that addresses this problem by analyzing the geometric and topological characteristics of an experimental DNA contact network. Our method, applied to the measurement of interactions in the yeast genome by Duan et al. (2010), proves that indeed no homogeneous conformation can agree with the observed 3C contacts, and that attempting to construct a homogeneous 3D model will lead to thousands of geometrically impossible structural motifs. The topological properties of the DNA contact network, along with Occam’s razor principle, are used to reconstruct the chromatin conformations characteristic of uniform subpopulations of cells confounding the experimental sample. Specifically, the individual chromatin states are inferred by analyzing and coloring a line graph representing geometrical conflicts within the DNA contact network, i.e., loci whose direct interpretation will lead to violation of the triangle inequality. We show that hundreds of thousands of conflicting interactions can be resolved by just a handful of chromatin states, and that the properties of these states point to different transcriptional programs being executed.
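The conflict-coloring idea in this abstract can be illustrated with a toy sketch. It is not the author's implementation: the rule used to flag a conflict (two contacts that share a locus while their other ends are known to lie far apart), the `min_separation` table, the `contact_radius` value and both function names are assumptions made for illustration, and a plain greedy coloring stands in for whatever parsimony criterion the method actually applies.

```python
from itertools import combinations


def conflicting_contact_pairs(contacts, min_separation, contact_radius=1.0):
    """Return pairs of contacts that cannot hold in one conformation.

    Assumption: a contact (x, y) means dist(x, y) <= contact_radius.
    If contacts (a, c) and (b, c) both held, the triangle inequality
    would give dist(a, b) <= 2 * contact_radius, so a known lower bound
    on dist(a, b) above that value marks the two contacts as conflicting.
    """
    conflicts = set()
    for e1, e2 in combinations(sorted(contacts), 2):
        shared = set(e1) & set(e2)
        if len(shared) != 1:
            continue  # the two contacts do not meet at exactly one locus
        (a,) = set(e1) - shared
        (b,) = set(e2) - shared
        bound = min_separation.get(frozenset((a, b)))
        if bound is not None and bound > 2 * contact_radius:
            conflicts.add((e1, e2))
    return conflicts


def greedy_coloring(contacts, conflicts):
    """Give each contact a state label so conflicting contacts differ."""
    neighbours = {c: set() for c in contacts}
    for e1, e2 in conflicts:
        neighbours[e1].add(e2)
        neighbours[e2].add(e1)
    color = {}
    # Visit the most-conflicted contacts first (a common greedy heuristic).
    for c in sorted(contacts, key=lambda c: -len(neighbours[c])):
        used = {color[n] for n in neighbours[c] if n in color}
        color[c] = next(k for k in range(len(contacts)) if k not in used)
    return color


# Hypothetical loci A-D; A and B are assumed to be at least 5 units apart.
contacts = [("A", "C"), ("B", "C"), ("A", "D")]
min_separation = {frozenset(("A", "B")): 5.0}
conflicts = conflicting_contact_pairs(contacts, min_separation)
states = greedy_coloring(contacts, conflicts)
print(states)
```

In this toy input the contacts A-C and B-C receive different state labels, while A-D is free to share a label with either; each label plays the role of one candidate chromatin state.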

Keywords: Genome Organization and Annotation, Computational aspects

LBR05 - A protein domain-centric approach for the comparative analysis of human and yeast phenotypically relevant mutations
Room: Hall 15.2
Date: Sunday, July 21 2:10 p.m. - 2:35 p.m.

Presenting Author: Maricel Kann, UMBC, United States

Abstract
The body of disease mutations with known phenotypic relevance continues to increase and is expected to do so even faster with the advent of new experimental techniques such as whole-genome sequencing coupled with disease association studies. However, genomic association studies are limited by the molecular complexity of the phenotype being studied and the population size needed to have adequate statistical power. One way around this problem, which is critical for the study of rare diseases, is to study the functional patterns of known disease mutations. We have previously shown that the functional patterns of known human disease mutations have a significant tendency to cluster at protein domain positions, namely position-based domain hotspots of disease mutations. However, the limited number of known disease mutations remains the main factor hindering the advancement of mutation studies at a functional level. In this paper, we address this problem by incorporating mutations known to be disruptive of phenotypes in other species. Focusing on two evolutionarily distant organisms, human and yeast, we describe the first inter-species analysis of mutations of phenotypic relevance at the protein domain level. Our results show that phenotypic mutations from yeast cluster at specific positions on protein domains, a characteristic previously revealed to be displayed by human disease mutations. This first-of-a-kind study of phenotypically relevant yeast mutations in relation to human disease mutations demonstrates the utility of a multi-species analysis for advancing the understanding of the relationship between genetic mutations and phenotypic changes at the organismal level.

Keywords: Bioinformatics of Disease and Treatment, Genetic Variation Analysis

LBR06 - Predicting the biochemical consequences of missense mutations using genome-wide homology modeling
Room: Hall 15.2
Date: Sunday, July 21 2:40 p.m. - 3:05 p.m.

Presenting Author: Andrew Bordner, Mayo Clinic, United States

Additional Authors:
Barry Zorman, Mayo Clinic, United States

Abstract
The discovery of which mutations contribute to a particular disease is an important biomedical problem with potential applications in drug discovery, disease diagnosis and prognosis, and the development of improved personalized therapies. To this end, we have developed a computational method that integrates complementary approaches for predicting the biochemical effects of missense mutations using genome-wide generation of homology models for human protein complexes. Mutations affecting diverse types of binding sites are identified by homology to available X-ray structures of complexes and machine learning classifiers while spatial clustering of mutations is used to detect other compact regions of the protein structure important for its function. A Random Forest classifier trained on results from these structure-based methods, as well as annotations from online databases, evolutionary conservation, and predicted stability changes was found to outperform current popular prediction methods. Finally, the predicted biochemical effects of mutations showed good agreement with experimental assays.

Keywords: Protein Structure and Function Prediction and Anal, Genetic Variation Analysis

LBR07 - Integrative modelling coupled with mass spectrometry (MS)-based approaches reveals the structure and dynamics of protein assemblies
Room: Hall 15.2
Date: Sunday, July 21 3:10 p.m. - 3:35 p.m.

Presenting Author: Argyris Politis, University of Oxford, United Kingdom

Abstract
In recent years, integrative structure determination of protein complexes has garnered great interest as a result of the vast amount of data obtained by different experiments. In particular, integrative approaches have gained attention for studying highly heterogeneous and dynamic systems which remain refractory to structure determination by conventional methods. Key developments in emerging mass spectrometry (MS)-based techniques, such as native MS and ion mobility (IM)-MS, have led to their integration into the structural biologist’s pipeline. Here we present an integrative approach for structure determination of protein assemblies by combining native MS, IM-MS and chemical cross-linking MS. The accuracy and confidence levels of this approach are demonstrated by encoding data from MS techniques into restraints for assembling a set of known hetero-complexes from their building blocks. This method enabled us to characterize the structures of two unknown precursors acting en route to the assembly of the AAA-ATPase base subcomplex within the proteasome, a macromolecule responsible for the controlled degradation of intracellular proteins.

Keywords: Protein Structure and Function Prediction and Anal, Computational aspects

LBR08 - The next generation of SCOP and ASTRAL (CANCELLED)



LBR09 - Computational methods to preclude switch-like behavior: analysis of the Biomodels database
Room: Hall 15.2
Date: Monday, July 22 10:30 a.m. - 10:55 a.m.

Presenting Author: Elisenda Feliu, University of Copenhagen, Denmark

Additional Authors:
Miguel A Alejo, University of Copenhagen, Denmark
Carsten Wiuf, University of Copenhagen, Denmark

Abstract
The number of states in which a cell can be at any given time is linked to the flexibility in its decision making and to cell-to-cell variability. In particular, bi- and multistable cellular systems provide mechanisms for rapidly switching between different responses. Identifying whether a system exhibits multistable behavior or not is, however, challenging. The theoretical determination of small motifs in gene regulatory networks and signaling pathways that can exhibit multistationarity has been the focus of several studies in the past. However, it remains unclear to what extent these motifs are actually highly represented in living cells.

We have developed a computational method that gives a necessary condition for a system to exhibit multistationarity. If a system is multistationary, we can screen all small subnetworks and determine the key components in multistationarity. We have applied the method to 365 models extracted from the publicly available database Biomodels with data precomputed in PoCab. In this way, we have obtained a catalog of small motifs responsible for multistationarity in real systems.

At the conference, the method will be briefly described and the exhaustive analysis of the Biomodels database, including the small structures causing multistationarity, will be presented.

Keywords: Systems Biology and Networks, Computational aspects

LBR10 - Efficient Modeling and Active Learning of Biological Responses: Learning without Prior Knowledge
Room: Hall 15.2
Date: Monday, July 22 11:00 a.m. - 11:25 a.m.

Presenting Author: Armaghan Naik, Carnegie Mellon University, United States

Additional Authors:
Joshua Kangas, Carnegie Mellon University, United States
Devin Sullivan, Carnegie Mellon University, United States
Christopher Langmead, Carnegie Mellon University, United States
Robert Murphy, Carnegie Mellon University, United States

Abstract
High throughput screening involves determination of the effect of many chemical compounds on a given cellular target. As currently practiced, a full set of measurements for all compounds for each new target is typically made, with little use of information from previous screens. To efficiently study compound effects on many targets, a means is needed for determining and exploiting similarities in the effects of compounds and/or behavior of targets such that measurements of all combinations of compounds and targets are not needed to achieve high accuracy. Here, we describe probabilistic models that can be used to predict results for unmeasured combinations, and active learning algorithms for selecting future informative batches of experiments. Through extensive simulated experiments we showed that our approaches can produce powerful predictive models and learn them significantly faster than can be done by random choice. We further characterized our method’s performance experimentally using a collection of 48 compounds and 48 NIH 3T3 cell clones expressing different GFP-tagged proteins; the learner’s task was to efficiently build a model of the effects of each compound on each clone. Since none of the effects were known prior to beginning the experiments, each clone and compound was silently duplicated to provide the ability to check how well duplicates were recognized. The learner could request acquisition of batches of image data for specific combinations of drugs and clones using liquid handling robotics and an automated microscope. Our method achieved 92% accuracy having sampled only 28% of the experiment space.

Keywords: Systems Biology and Networks, Proteomics

LBR11 - De novo reconstruction of cell cycle progression using Tour-Recovered Automatic models for Cellular Continuums (TRACC) on multiparameter flow cytometry data
Room: Hall 15.2
Date: Monday, July 22 11:30 a.m. - 11:55 a.m.

Presenting Author: Tiffany Chen, Stanford University, United States

Additional Authors:
Matthew Clutter, Stanford University, United States
Nikesh Kotecha, Stanford University, United States
Karen Sachs, Stanford University, United States
Wendy Fantl, Stanford University, United States
Garry Nolan, Stanford University, United States
Serafim Batzoglou, Stanford University, United States

Abstract
Most cell-based drug screening methods identify and evaluate potential drug candidates based on measurements of cell death or target inhibition. Using these approaches, the global impact of these drug candidates on cell cycle and signaling networks is greatly deemphasized, even though quantitative analysis of the cell cycle is fundamental to most anti-cancer drug development. Single-cell multiparameter flow cytometry can simultaneously measure intracellular proteins including those participating in the cell cycle and signaling pathways. To date, however, no automated, data-driven method exists for processing such biologically complex measurements. To address this need, we developed Tour-Recovered Automatic models for Cellular Continuums (TRACC), a computational methodology for automatically reconstructing the cell cycle de novo from flow cytometry data. TRACC reconstructs cell cycle progression without prior expert knowledge, thus setting a foundation for automated cell cycle analysis.

Keywords: Systems Biology and Networks, Computational aspects

LBR12 - Network-based stratification of tumor mutations
Room: Hall 15.2
Date: Monday, July 22 12:00 p.m. - 12:25 p.m.

Presenting Author: Matan Hofree, UCSD, United States

Additional Authors:
John Shen, University of California, San Diego, United States
Hannah Carter, University of California, San Diego, United States
Andy Gross, University of California, San Diego, United States
Trey Ideker, University of California, San Diego, United States

Abstract
Many forms of cancer consist of multiple subtypes with different molecular causes and clinical outcomes. Somatic tumor genomes provide a rich new source of data for uncovering these subtypes, but have proven difficult to compare as two tumors rarely share the same mutations. Here, we introduce ‘network-based stratification’ (NBS), which integrates somatic tumor genomes with gene networks. This approach allows for stratification of cancer into informative subtypes by clustering together patients who have mutations within similar network regions. We demonstrate the validity of this approach in simulation. Next, we apply the method to somatic mutation data from three cancer patient cohorts collected as part of The Cancer Genome Atlas: ovarian cancer (OV), breast cancer (BRCA) and uterine cancer (UCEC). In each cohort we are able to discover a robust cluster assignment significantly associated with important clinical phenotypes. In BRCA we recover subtypes significantly correlated with known subtypes and other clinical markers. In UCEC, subtypes segregate patients into distinct sets enriched for tumor grade and histology. In OV, subtypes are associated with patient survival and acquired resistance to platinum chemotherapy. We use the OV subtypes to define a predictive signature based on gene expression which successfully recovers the somatic mutation-derived subtypes in an independent expression cohort. Finally, we use the subtypes derived in each cohort to highlight potentially dysregulated subnetworks characteristic of each mutation-derived subtype. This study provides a proof of principle for the utility of combining somatic mutation genotypes with interaction networks, enabling the discovery of clinically meaningful mutation-based subtypes.

Keywords: Bioinformatics of Disease and Treatment, Systems Biology and Networks

LBR13 - Comparison of D. melanogaster and C. elegans Developmental Stages by modENCODE RNA-Seq data
Room: Hall 15.2
Date: Monday, July 22 2:10 p.m. - 2:35 p.m.

Presenting Author: Steven Brenner, University of California, Berkeley, United States

Additional Authors:
Jingyi Jessica Li, University of California, Berkeley, United States
Haiyun Huang, University of California, Berkeley, United States
Peter Bickel, University of California, Berkeley, United States

Abstract
Drosophila melanogaster and Caenorhabditis elegans are two well-studied model organisms in developmental biology. Their morphological development differs greatly, yet we postulated that there may nonetheless be underlying shared developmental programs employing orthologous genes. We used modENCODE RNA-Seq data to perform a transcriptome-wide comparison of their developmental time courses to address this question. Our approach centers on using stage-associated orthologous genes to link the two organisms. For every stage in each organism, we select stage-associated genes, which are defined as relatively highly expressed at that stage compared with others. We tested the dependence of a pair of D. melanogaster and C. elegans stages in terms of orthologous gene expression, measured as the number of orthologous gene pairs associated with both stages.
We first carried out the test on pairs of stages within D. melanogaster and C. elegans respectively, and we found that temporally adjacent stages in both species exhibit high dependence in gene expression, supporting the validity of this approach. When comparing fly with worm, we observed a strong collinearity of their developmental time courses from early embryos to late larvae. Another parallel collinear pattern is found between fly white prepupae through adults and worm late embryos through adults. Investigating the stage-associated genes shared between stages shows that many-to-one fly-worm orthologs are key factors leading to the two collinear patterns. Some orthologs are known to play similar roles in both organisms, and their mapping in this study may help inform their functions in the development of D. melanogaster and C. elegans.
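One common way to test whether two gene sets overlap more than expected by chance is a one-sided hypergeometric test, sketched below. The abstract does not state which statistic the authors used, so the choice of test, the function name `overlap_p_value`, and the example counts are all assumptions for illustration. The sketch also treats each ortholog pair as a single unit, which ignores the many-to-one orthologs the abstract mentions.

```python
from math import comb


def overlap_p_value(total_pairs, fly_assoc, worm_assoc, both):
    """One-sided hypergeometric tail P(X >= both).

    total_pairs: number of ortholog pairs considered (assumed universe)
    fly_assoc:   pairs whose fly gene is associated with the fly stage
    worm_assoc:  pairs whose worm gene is associated with the worm stage
    both:        pairs associated with both stages
    """
    upper = min(fly_assoc, worm_assoc)
    tail = sum(
        comb(fly_assoc, k) * comb(total_pairs - fly_assoc, worm_assoc - k)
        for k in range(both, upper + 1)
    )
    return tail / comb(total_pairs, worm_assoc)


# Hypothetical counts, chosen only to keep the arithmetic easy to check.
p = overlap_p_value(total_pairs=10, fly_assoc=5, worm_assoc=5, both=5)
print(p)
```

A small p-value for a stage pair would indicate that the two stages share more stage-associated orthologs than independent sets of the same sizes would; applying the function to every fly-worm stage pair gives a matrix in which collinear patterns could be looked for.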

Keywords: Comparative Genomics, Functional Genomics

LBR14 - Phylogenetic quantification of intra-tumour heterogeneity
Room: Hall 15.2
Date: Monday, July 22 2:40 p.m. - 3:05 p.m.

Presenting Author: Roland Schwarz, European Molecular Biology Laboratory, United Kingdom

Abstract

Intra-tumour heterogeneity (ITH) is currently a focus of cancer research due to its implications for disease progression, resistance development and its impact on personalised medicine approaches. Understanding the aetiology of ITH involves reconstructing the evolutionary history of cancer within the patient. Especially with respect to genomic rearrangements, this is impeded by changing cellularity, unknown phasing of genomic variants and the fact that genomic rearrangement events cover large, often overlapping segments of the genome.

In this study we have assembled a novel clinical dataset of 170 copy number (CN) profiles from 20 patients undergoing neoadjuvant chemotherapy for high-grade serous ovarian cancer. Patients were sampled at multiple distinct sites at biopsy, interval debulking surgery and relapse. We have developed MEDICC, a novel phylogenetic method for reconstruction of evolutionary trees based on genomic rearrangements. Employing state-of-the-art machine learning techniques, we phase parental alleles, reconstruct trees and ancestral genomes and at the same time numerically quantify the degree of ITH and clonal expansion in each patient. Correlation of these indices with clinical endpoints such as progression-free survival shows how the amount of genomic change in the course of chemotherapy and the degree of clonal expansion determine patient survival times.

Our study is the first to combine rigorous evolutionary methodology with a novel clinical dataset of a large patient cohort to quantify ITH in a rigorous and unbiased manner. We combine insights from natural language processing with spatial statistics to quantify biologically meaningful indices of cancer progression in a coherent translational setting.


Keywords: Bioinformatics of Disease and Treatment, Comparative Genomics

LBR15 - The Yule-Simpson effect casts doubt on DNA methylation differences at functional boundaries
Room: Hall 15.2
Date: Monday, July 22 3:10 p.m. - 3:35 p.m.

Presenting Author: Meromit Singer, UC Berkeley, United States

Additional Authors:
Lior Pachter, University of California, Berkeley, United States

Abstract
Genome-wide functional assays based on high-throughput sequencing now allow for experimental probing of a wide variety of molecular phenotypes. Among these is DNA methylation, which can be probed at all CpG sites in the genome using bisulfite sequencing. This has allowed for comparisons of methylation extent in different functional regions by first averaging methylation states within region types and then comparing averages between regions. Such comparisons have become commonplace in genome-wide DNA methylation studies. For example, it has been repeatedly reported that the methylation extent is significantly higher in coding regions as compared to introns or UTRs. We report and characterize a bias present in these seemingly straightforward comparisons that is a special case of the Yule-Simpson effect and show it has extensively altered the magnitude and significance of DNA methylation differences observed and reported from such comparative studies. The bias we discuss arises from the dependence of the sparsity of CpG sites on the extent of evolutionary pressure at a region, together with its overall methylation state. We present a correction utilizing a matrix completion algorithm that is based on a methylation model and show how it affects reported results regarding differences in DNA methylation across functional regions.
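The Yule-Simpson effect itself can be shown with a small numeric sketch. The counts below are invented purely for illustration and are not data from this study; the two strata are a hypothetical grouping of regions (for example by CpG density), and the sketch shows only the paradox, not the authors' matrix completion correction.

```python
# (methylated CpG sites, total CpG sites) per region type and stratum.
# All numbers are made up to illustrate the effect.
counts = {
    "coding": {"stratum_A": (80, 100), "stratum_B": (2, 10)},
    "intron": {"stratum_A": (9, 10), "stratum_B": (30, 100)},
}


def pooled_fraction(strata):
    """Methylated fraction after pooling all sites of a region type."""
    methylated = sum(m for m, _ in strata.values())
    total = sum(n for _, n in strata.values())
    return methylated / total


def per_stratum_fraction(strata):
    """Methylated fraction computed separately within each stratum."""
    return {name: m / n for name, (m, n) in strata.items()}


pooled = {region: pooled_fraction(s) for region, s in counts.items()}
within = {region: per_stratum_fraction(s) for region, s in counts.items()}

print("pooled:", pooled)
print("within strata:", within)
```

With these toy counts, introns are more methylated than coding regions inside each stratum, yet pooling the sites reverses the ordering, because the two region types contribute very different numbers of sites to each stratum.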

Keywords: Epigenetics, Computational aspects

LBR16 - Epigenetic mechanisms underlying human T helper cell differentiation
Room: Hall 15.2
Date: Monday, July 22 3:40 p.m. - 4:05 p.m.

Presenting Author: Harri Lähdesmäki, Aalto University, Finland

Additional Authors:
David Hawkins, University of Washington School of Medicine, United States
Antti Larjo, Aalto University, Finland
Subhash Tripathi, University of Turku and Åbo Akademi University, Finland
Ulrich Wagner, Ludwig Institute for Cancer Research, University of California San Diego, United States
Ying Luu, Ludwig Institute for Cancer Research, University of California San Diego, United States
Tapio Lönnberg, University of Turku and Åbo Akademi University, Finland
Sunil Raghav, University of Turku and Åbo Akademi University, Finland
Leonard Lee, Ludwig Institute for Cancer Research, University of California San Diego, United States
Riikka Lund, University of Turku and Åbo Akademi University, Finland
Bing Ren, Ludwig Institute for Cancer Research, University of California San Diego, United States
Riitta Lahesmaa, University of Turku and Åbo Akademi University, Finland

Abstract
Multipotent CD4+ T cells are central to the adaptive immune system. CD4+ T cells can differentiate into functionally distinct effector subtypes such as T helper 1 (Th1), Th2, Th17, and iTreg. In this study, we have focused on identification of histone modifications (H3K4me1, H3K27ac, H3K4me3) that define the cell-type specific functional cis-regulatory repertoire for early differentiating human Th1 and Th2 cells. Additionally, we have integrated genome-wide digital gene expression analysis from the Helicos platform to correlate epigenetic information with gene expression. We also overlay the identified enhancer regions with open chromatin sites (DNase-seq) from fully differentiated T cells to characterize whether early enhancers are active only during the early lineage specification or remain active in committed Th cells. By analyzing transcription factor binding sites at enhancers, we are able to identify known and novel transcriptional regulators which drive the lineage determination. Lastly, under the principle that improper cell fate specification can lead to immunopathogenesis, we found within these lineage-specific enhancers a great number of SNPs from genome-wide association studies (GWAS) that were associated with various autoimmune disorders including T1D, rheumatoid arthritis, Crohn’s disease, and asthma. Several alter transcription factor binding site motifs, and using DAPA experiments we show for a subset of such SNPs within these predicted sites that they influence transcription factor binding. This study provides the first look at how enhancers can contribute to early human T cell lineage specification. Our results also provide insight into how regulatory SNPs may contribute to disease pathogenesis.

Keywords: Epigenetics, Functional Genomics

LBR17 - An assessment of the recovery of curated genetic variants through text mining
Room: Hall 10
Date: Tuesday, July 23 10:30 a.m. - 10:55 a.m.

Presenting Author: Karin Verspoor, NICTA, Australia

Additional Authors:
Antonio Jimeno Yepes, National Information Communications Technology Australia, Australia

Abstract
We assess a mutation extraction tool with respect to the task of curation of the literature for the purpose of populating a database of genetic variation information. Our analysis shows that the ability of text mining tools to recover the mutations catalogued in the databases is far less than what would be expected based on the typically excellent performance of such tools on intrinsic evaluation. While lack of access to the full text of publications has been argued to explain this phenomenon, we show that the effect persists even when the full text article that was indicated to be the direct source of a mutation in a curated resource is available for processing. We explore several possible explanations for these results, including difficulties in linking genetic variants to specific genes, and the inclusion of data from high-throughput experiments. The results of our work have implications for the future development of text mining systems for genetic variation.

Keywords: Genetic Variation Analysis, other

LBR18 - Quantification of Cell-to-cell Variability in Protein Spatial Spread from Fluorescence Microscopy of Unsynchronized Budding Yeast
Room: Hall 10
Date: Tuesday, July 23 11:00 a.m. - 11:25 a.m.

Presenting Author: Louis-Francois Handfield, University of Toronto, Canada

Additional Authors:
Alan Moses, University of Toronto, Canada

Abstract
The characterization of protein abundance and of stochasticity in abundance has been systematically carried out in budding yeast using fluorescently tagged proteins. Subcellular location can also be systematically uncovered using supervised machine learning approaches that have been trained to recognize predefined image classes based on statistical features. As an alternative, we capture cell stage dependence of protein spatial expression within automatically identified cells. We use the identified bud area as a cell-stage indicator. We show that similarities between the inferred expression patterns contain more information about protein function than can be explained by a previous manual categorization of subcellular localization. Further analysis reveals that such a characterization allows 12% of the 4004 proteins to be identified by finding the protein that is closest in expression pattern in a replicate experiment. This characterization includes stochasticity levels in measurement, which are correlated with previous reports in the case of stochasticity in protein abundance. Other stochasticity levels, such as in compactness for protein expression, are shown to be reproducible. Changes in cell morphology due to the alpha factor mating pheromone or changes of the fluorescent markers required for segmentation also have a limited impact on the measured variability levels. Our results suggest that quantitative cell-stage-dependent representations of protein spread discriminate between protein spatial expression patterns without requiring predefined subcellular location classes. We show that some major quantified deviations, such as high spatial variability, are systematically detected under a spectrum of experimental conditions.

Keywords: Proteomics

Presentation PDF: Download Abstract

LBR19 - ARepA: automated repository acquisition for standardized high-throughput data retrieval, normalization, and analysis
Room: Hall 10
Date: Tuesday, July 23 11:30 a.m. - 11:55 a.m.

Presenting Author: Daniela Boernigen, Harvard School of Public Health, Harvard University, United States

Additional Authors:
Yo Sup Moon, Harvard School of Public Health, United States
Levi Waldron, Harvard School of Public Health, United States
Eric Franzosa, Harvard School of Public Health, United States
Curtis Huttenhower, Harvard School of Public Health, United States

Abstract
Biological databases of high-throughput experimental results provide vast and growing resources for medical and bioinformatic research. Open questions remain in how best to maintain such resources, access them computationally, meta-analyze their contents from hundreds of experiments, and do so reproducibly while maintaining computational best practices.

We present ARepA, an extensible, modular Automated Repository Acquisition system for reproducible biological data acquisition and processing. ARepA allows configurable data access for any organism(s) from the GEO, IntAct, BioGRID, RegulonDB, STRING, Bacteriome, and MPIDB databases. A user can retrieve raw data and metadata from these repositories, normalize data files, and automatically process them in standardized ways (e.g. for network analysis). When retrieving data from six model organisms, ARepA currently produces more than 2M interactions (600K physical interactions, 4K regulatory interactions, 1.5M functional associations) and 2.7K gene expression data sets covering approx. 800K samples, accompanied by corresponding metadata and derived network data.

We include biological examples demonstrating the utility of ARepA for integrative analyses. When focusing on human data, ARepA's metadata database allowed us to identify and standardize 12 human prostate cancer gene expression datasets from GEO, which were subsequently meta-analyzed across six different platforms. A subsequent co-expression network analysis correctly recovered the NF-κB signaling pathway along with new candidate genes with roles in prostate cancer. A similar example in mouse integrates 11 gene expression datasets selected by querying ARepA for metadata indicating germ-free and intestinal tissue conditions. Finally, multiple data types from three model microbes were integrated to assess differences in peptide secretion systems.

Keywords: Functional Genomics, Computational aspects

Presentation PDF: Download Abstract

LBR20 - Deciphering the Gene Expression Code via a Combined Synthetic-Computational Biology Approach
Room: Hall 10
Date: Tuesday, July 23 12:00 p.m. - 12:25 p.m.

Presenting Author: Tamir Tuller, Tel Aviv University, Israel

Abstract
One of the greatest challenges of functional genomics is to decipher the way information encoded in the transcript affects various aspects of its expression regulation. Since it is impossible to determine causality based on the analysis of endogenous sequence features and expression levels, we suggest a novel combined computational-synthetic biology approach. The talk will survey large-scale synthetic biology experiments for understanding three aspects of gene expression: 1) splicing, 2) translation elongation, and 3) translation initiation from out-of-frame codons. In each experiment, a specific library including hundreds of heterologous genes has been tailored to tackle the corresponding question, the expression levels of all the library genes have been measured in S. cerevisiae, and the results were computationally analyzed.
Among others, our analyses emphasize the contribution of local folding strength in different parts of the transcript, and the position and distribution of codons to splicing and translation efficiency and fidelity. In addition, we report novel sets of enhancer and silencer sequence motifs that contribute to various aspects of translation and splicing regulation.

I will also explain how the results inferred in the three studies are integrated, and compared to existing computational biophysical models of gene expression, and will compare the obtained results to the ones reported recently via an evolutionary systems biology analysis of endogenous genes.

Keywords: Functional Genomics, Systems Biology and Networks

Presentation PDF: Download Abstract

LBR21 - Experimental characterization of the human non-sequence-specific nucleic acid interactome
Room: Hall 10
Date: Tuesday, July 23 2:10 p.m. - 2:35 p.m.

Presenting Author: Jacques Colinge, CeMM, Austria

Additional Authors:
Gerhard Dürnberger, CeMM, Austria
Tilmann Bürckstümmer, CeMM, Austria
Kilian Huber, CeMM, Austria
Roberto Giambruno, CeMM, Austria
Evren Karayel, CeMM, Austria
Thomas Burkard, CeMM, Austria
Ines Kaupe, CeMM, Austria
Andre Müller, CeMM, Austria
Keiryn Bennett, CeMM, Austria
Tobias Doerks, EMBL, Germany
Peer Bork, EMBL, Germany
Andreas Schönegger, CeMM, Austria
Gerhard Ecker, Uni Wien, Austria
Hans Lohninger, TU Wien, Austria
Giulio Superti-Furga, CeMM, Austria

Abstract
Interactions between proteins and nucleic acids (NAs) play a pivotal role in a wide variety of essential biological processes. Transcription factors that recognize specific DNA motifs only constitute part of the NA-binding proteins (NABPs). In this study, we present the first large-scale effort to systematically map human NABPs with generic classes of nucleic acids. Using 25 carefully designed synthetic DNA and RNA oligonucleotides as baits and affinity purification mass spectrometry (AP-MS), we performed pulldowns in three cell lines that yielded 10,000+ protein-NA interactions and involved 900+ proteins. Bioinformatic analysis allowed us to identify 139 new NABPs, to provide first experimental evidence for another 98, and to determine 513 specificities for 219 distinct NABPs for different subtypes of NAs.

Successful validation of 7/8 chosen new specificities confirmed the affinity of YB-1 for methylated cytosine. YB-1 is over-expressed in tumors and is associated with multiple drug resistance. Network analysis of YB-1 ChIP-seq peak nearest genes identified a subnetwork of 73 genes strongly associated with cancer pathways, thereby suggesting a potential epigenetic role of YB-1 in resistant tumors.

We could also show that non-sequence-specific DNA-binding proteins interact with nucleic acid chains through an interface that is more constrained in its geometry than that of mRNA-binding proteins, which are known to contain more disordered regions.

To extend the experimental data we undertook a machine learning approach to derive a method of automatically inferring nucleic acid binding. We employed a family of support vector machines (SVMs) to predict NA binding de novo.

Keywords: Systems Biology and Networks, Protein Structure and Function Prediction and Anal

Presentation PDF: Download Abstract

LBR22 - Sequence Determinants Govern the Translation Efficiency of the Secretory Proteome
Room: Hall 10
Date: Tuesday, July 23 2:40 p.m. - 3:05 p.m.

Presenting Author: Michal Linial, The Hebrew University of Jerusalem, Israel

Additional Authors:
Shelly Mahlab, The Hebrew University, Israel

Abstract
Translation must be tightly controlled to cope with the cell's demand and its limited resources. Energetically, translation is the most expensive operation in dividing cells. We applied the tRNA adaptation index (tAI) as an indirect proxy for the translation rate. We tested the possibility that sequence determinants are encoded along the transcripts to govern translational efficiency. The secretory proteome comprises about 30% of the proteins in human and other multicellular model systems. Many of these proteins contain at their N-terminus a segment called the signal peptide (SP), which determines translocation to the ER. Indeed, all SP-proteins are translated by ER-membrane bound ribosomes. We anticipated that proteins translated by free or bound ribosomes differ with respect to their overall translation speed. We demonstrate that clusters of poorly adapted codons followed by abundant codons specify the N-terminus of secreted and SP-membranous proteins. The phenomenon generalizes to the proteomes of yeast, fly and worm despite a poor correlation among their codon tAI values. We propose that translation determinants have evolved to match the cellular needs for translational rate. The codons' arrangement along transcripts is crucial for management of synaptic sites and poorly folded protein translation. The appearance of low tAI codons at the N-terminus of SP proteins attenuates the elongation rate. We conclude that processes such as translocation through the ER membrane, processing, maturation and folding are dependent on a specific codon arrangement that dictates a delay in translational elongation.
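
As an illustration of the measure mentioned in the abstract, the following is a minimal sketch of a tAI calculation, assuming the commonly used definition in which a gene's tAI is the geometric mean of per-codon relative adaptiveness weights. The abstract does not state how the authors computed it. The weight values, the function names (codons, tai, windowed_tai) and the window size are hypothetical and chosen only for the example; real weights are derived from tRNA gene copy numbers and wobble pairing rules for each organism.

```python
import math

# Hypothetical per-codon relative adaptiveness weights (0 < w <= 1).
# Illustrative values only, not taken from any organism.
WEIGHTS = {
    "ATG": 0.5, "GCT": 1.0, "GCG": 0.25,
    "AAA": 0.5, "CTG": 1.0, "CGA": 0.125,
}

def codons(seq):
    """Split a coding sequence into complete codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def tai(seq, weights=WEIGHTS):
    """Geometric mean of the weights of the codons in seq (codons without a weight are skipped)."""
    ws = [weights[c] for c in codons(seq) if c in weights]
    if not ws:
        raise ValueError("no codons with known weights")
    return math.exp(sum(math.log(w) for w in ws) / len(ws))

def windowed_tai(seq, window=2, weights=WEIGHTS):
    """tAI over a sliding window of codons, giving a profile along the transcript."""
    cs = codons(seq)
    return [tai("".join(cs[i:i + window]), weights)
            for i in range(len(cs) - window + 1)]

# A toy transcript that starts with poorly adapted codons and ends with well adapted ones.
profile = windowed_tai("GCGCGAGCTCTG", window=2)
```

A profile that rises from the 5' end, as in the toy sequence above, is the kind of pattern the abstract describes for clusters of poorly adapted codons followed by abundant codons.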

Keywords: Proteomics, Sequence Analysis

Presentation PDF: Download Abstract

LBR23 - SH3 Interactome Conserves General Function Over Specific Form
Room: Hall 10
Date: Tuesday, July 23 3:10 p.m. - 3:35 p.m.

Presenting Author: David Gfeller, Swiss Institute of Bioinformatics, Switzerland

Additional Authors:
Xiaofeng Xin, University of Toronto, Canada
Jackie Cheng, University of California Berkeley, United States
Raffi Tonikian, University of Toronto, Canada
Charles Boone, University of Toronto, Canada
Sachdev Sidhu, University of Toronto, Canada
Gary Bader, University of Toronto, Canada

Abstract
SH3 domains bind peptides to mediate protein-protein interactions that assemble and regulate dynamic biological processes. We surveyed the repertoire of SH3 binding specificity using peptide phage display in a metazoan, the worm Caenorhabditis elegans, and discovered that it structurally mirrors that of the budding yeast Saccharomyces cerevisiae. We then mapped the worm SH3 interactome using stringent yeast two-hybrid and compared it to the equivalent map for yeast. We found that the worm SH3 interactome resembles the analogous yeast network in that it is significantly enriched for proteins with roles in endocytosis. Nevertheless, orthologous SH3 domain-mediated interactions are highly rewired. Our results suggest a model of network evolution where the general function of the SH3 domain network is conserved over its specific form.

Keywords: Protein Structure and Function Prediction and Anal, Systems Biology and Networks

Presentation PDF: Download Abstract
