Poster numbers will be assigned May 30th.
If you cannot find your poster below, you have probably not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the poster acceptance email, which contains a confirmation link. Click on it and follow the instructions.

If you need further assistance please contact us and provide your poster title or submission ID.

Category E - 'Functional Genomics'
E01 - Functional Genomics Resources at EMBL-EBI: Expression Atlas and ArrayExpress
Maria Keays, European Bioinformatics Institute
Elisabet Barrera, European Bioinformatics Institute
Marco Brandizi, European Bioinformatics Institute
Miroslaw Dylag, European Bioinformatics Institute
Nuno Fonseca, European Bioinformatics Institute
Nikolay Kolesnikov, European Bioinformatics Institute
Oliver Mannion, European Bioinformatics Institute
Robert Petryszak, European Bioinformatics Institute
Alfonso Munoz-Pomer Fuentes, European Bioinformatics Institute
Catherine Snow, European Bioinformatics Institute
Amy Tang, European Bioinformatics Institute
Ugis Sarkans, European Bioinformatics Institute
Alvis Brazma, European Bioinformatics Institute
Ekaterina Pilicheva, European Bioinformatics Institute
Andrew Tikhonov, European Bioinformatics Institute
Olga Melnichuk, European Bioinformatics Institute
Short Abstract: Expression Atlas consists of pre-analyzed RNA-seq and expression microarray
data, to enable users to discover which genes are expressed in which tissues,
cell types, developmental stages, and hundreds of other experimental
conditions. This could be either in a baseline context, e.g. to find genes
expressed in the human kidney, or in a differential context, e.g. to find
genes that are upregulated in response to a drug treatment. All datasets
are manually curated to a high standard, by our in-house curators and using
advice from experts around the world. At the time of writing, Expression Atlas
consists of 981 datasets, including 86 RNA-seq experiments. All data in
Expression Atlas is free to browse, download, and re-use. ArrayExpress is an
archive of functional genomics experiments, submitted by scientists working in
diverse fields. Submission to ArrayExpress is free of charge, and datasets can
be kept private until the submitter publishes their findings. Submission is
now quicker and easier than ever before, thanks to our new tool, Annotare:
most submissions are accessioned within 24 hours. ArrayExpress also imports
all new releases from NCBI GEO weekly. As of March 2015, ArrayExpress
consists of 56892 datasets studying a wide variety of organisms, from human
and mouse to rice and maize. The majority of ArrayExpress data is microarray
based, though our RNA-seq coverage is growing daily. Public data in
ArrayExpress is freely available for download, either from the website or via
programmatic access. Both resources employ ontology-driven query expansion,
enabling powerful searching across thousands of datasets.
E02 - A Context-Specific Machine Learning Method to Predict Novel Osteoporosis Related Pathways
Jacob Luber, Trinity University, United States
Catherine Sharp, The Jackson Laboratory, United States
KB Choi, The Jackson Laboratory, United States
Cheryl Ackert-Bicknell, The University of Rochester Medical Center, United States
Matthew Hibbs, Trinity University, United States
Short Abstract: Identifying novel pathways related to bone development is important for developing osteoporosis treatments, but is difficult due to noise from unrelated pathways observed in RNA-Seq and other gene expression data exploring osteoblast differentiation, which can lead to overfitting in machine learning models. We propose a Support Vector Machine based approach using input from 661 mouse gene expression experiments. The bulk of data was obtained from the Gene Expression Omnibus and supplemented by our lab's RNA-seq time course data measuring mouse mesenchymal stem cells differentiating into mature osteoblasts in culture. Data was converted into ~500B pairwise data points using multiple distance metrics in order to predict pairwise relationships between all genes. SVMs were trained using our manually curated set of known osteoporosis and bone density gene pair relationships. Two models were generated: one utilizing all available gene expression data, and one using only data generated in osteoblasts and bone related cells. Both models were applied to unclassified gene pairs to generate probabilities (by converting distances to the separating hyperplane using Platt's method) that novel pairs are functionally related in the context of bone development and maintenance. We then constructed graphs with nodes as genes and edges as probabilities of functional relationship and compared the resulting networks from the two models to determine bone-specific pathway components. The proposed comparative learning approach is powerful because it inherently takes into account the high levels of correlation between data due to reuse of biological pathways and components in different cellular contexts.
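The conversion of SVM hyperplane distances into probabilities mentioned in the E02 abstract can be sketched as follows. This is only a minimal illustration of Platt-style sigmoid calibration, not the authors' implementation: the function names `fit_platt` and `platt_probability`, the plain gradient-descent fit, the learning rate, and the toy scores are all assumptions, and the regularized target values used in Platt's original formulation are omitted.

```python
import math


def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit P(y=1 | f) = 1 / (1 + exp(A*f + B)) by gradient descent on log loss.

    scores: signed distances to the separating hyperplane (hypothetical input).
    labels: 1 for a known related gene pair, 0 otherwise.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for f, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * f + b))
            # Derivatives of the log loss with respect to A and B.
            grad_a += (y - p) * f
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_probability(score, a, b):
    """Map one decision value to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(a * score + b))


if __name__ == "__main__":
    # Toy, deliberately non-separable decision values and labels.
    toy_scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
    toy_labels = [0, 0, 1, 0, 1, 1]
    a, b = fit_platt(toy_scores, toy_labels)
    for s in toy_scores:
        print(s, round(platt_probability(s, a, b), 3))
```

With positive scores on the side of the related pairs, the fitted A is negative, so larger distances map to higher probabilities.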
E03 - Models of Sequence Conservation Reveal Errors in the CHO Genome
Qixun Fang, The University of Sheffield
Paul Dobson, The University of Manchester
Short Abstract: Chinese Hamster Ovary (CHO) cells have long been used to manufacture valuable biopharmaceuticals but their cell biology remains obscure. The level of annotation of the published CHO-K1 genome remains low. In this project we are interested in identifying mutations within CHO genes that are highly similar by sequence to functionally-annotated homologues, but which differ at critical sites that might impinge upon function, in which case the homologue’s function should not be assigned to the CHO counterpart. The criticality of sites is estimated using models that capture patterns of residue conservation across evolution. The lack of such highly-conserved sites in the CHO homologue is taken as indicative of a loss of gene function. Predictions are mapped onto Gene Ontology (GO) and term enrichments calculated to generate a higher-level view of the genes and pathways that do or do not function normally, giving a richer view of CHO cell biology. The models predicted that about 30% of the genes in the CHO genome have lost their function, with substantial loss of function among signal recognition particles, ribosomal proteins, mitochondrial proteins and cell differentiation.
E04 - GenomeRNAi: A Phenotype Database for Large-scale RNAi Screens
Esther Schmidt, German Cancer Research Center (DKFZ), Germany
Oliver Pelz, German Cancer Research Center (DKFZ), Germany
Svetlana Buhlmann, German Cancer Research Center (DKFZ) and Heidelberg University, Germany
Hannah Fleckenstein, German Cancer Research Center (DKFZ), Germany
Johanna Mehl, German Cancer Research Center (DKFZ), Germany
Michael Boutros, German Cancer Research Center (DKFZ) and Heidelberg University, Germany
Short Abstract: RNA interference (RNAi) represents a popular approach for the systematic perturbation of gene expression. Large-scale screening experiments can be performed, employing a wide variety of biological assays, resulting in the observation of loss-of-function phenotypes across many fields in biology. These phenotypes constitute a rich source of functional gene annotation.

The GenomeRNAi database is a repository for RNAi phenotype and reagent data, aiming to provide a platform for data mining and comparisons. Data is extracted from the literature by manual curation, or directly submitted by data producers, followed by curatorial review according to structured annotation guidelines. Currently, the database contains 214 experiments in human and 200 experiments in Drosophila, totaling more than 1.2 million individual gene-phenotype associations. It also holds information on more than 400,000 RNAi reagents, along with quality assessment data such as efficiency and specificity.

GenomeRNAi provides functionalities for searching (by gene, reagent, or phenotype), browsing and downloading experiments. A DAS server allows visualization of reagents and phenotypes in their genomic context. The website features a “frequent hitter” page, a function for overlaying genes that share the same phenotype onto known gene networks provided by the STRING database, and a beta version of a screen comparison tool, implemented as a heat map-like overview of genes and their phenotypes in different screens. GenomeRNAi data are well integrated with external resources, providing e.g. mutual links with FlyBase, GeneCards and UniProt. GenomeRNAi functional data have also been incorporated into the FlyMine tool.
E05 - Human paralog genes share regulatory elements and co-localize in the three-dimensional chromatin architecture
Jonas Ibn-Salem, Johannes Gutenberg University, Germany
Miguel Andrade-Navarro, Faculty of Biology, Johannes Gutenberg University Mainz, Germany
Short Abstract: Paralog genes arise from gene duplication events during evolution. The resulting sequence similarity between paralogs often leads to proteins of similar structures and functions in common pathways. Therefore it might be useful for the cell to have paralog genes co-regulated. However, since paralog genes often also show slightly different functions, for example alternative domains, it might also be useful for cells to express only one out of several paralogs for a specific function or response.
Eukaryotic genes are regulated by binding of transcription factors to distal enhancer elements, which perform looping interactions to the transcription machinery at gene promoters. We hypothesised that paralog genes share common regulatory mechanisms that allow both co-regulation and exclusive expression.

To test this hypothesis, we integrated paralogy annotations with genome-wide data-sets of enhancer-promoter associations and genome-wide chromatin interaction data from Hi-C experiments in human cells.

With carefully sampled control data sets that take linear co-localisation of paralogs into account, we show that paralog gene pairs share a significant amount of common enhancer elements. Furthermore they are located significantly more often in the same topological association domain than expected and therefore cluster not only in the linear genome but also in the three-dimensional chromatin structure of the nucleus.

Together our results indicate that human paralog gene pairs share common regulatory mechanisms. We will further integrate expression data from different tissues and functional annotation of genes to support our finding that paralog genes tend to be expressed either collectively or exclusively depending on the cell's functional needs.
E06 - Comprehensive analysis of association between heterogeneity and translation of 5’ leaders
Paul Korir, University College Cork, Ireland
Pavel Baranov, University College Cork, Ireland
Short Abstract: There is overwhelming evidence of translation of upstream Open Reading Frames (uORFs) in the 5’ leaders of many mammalian mRNAs. The translation of uORFs often inhibits translation of annotated coding ORFs (acORFs) and allows for regulation in response to changes in cellular conditions. We hypothesised that 5’ leader heterogeneity of alternative transcripts (due to alternative transcription initiation and splicing) is associated with the synthesis of mRNAs that code for the same protein product but are regulated differently depending on the uORF organization of their 5’ leaders.

To explore the relationship between translation of 5’ leaders and their heterogeneity, we carried out bioinformatic analyses using publicly available ribosome footprinting data. Our analyses involve identifying high-confidence translated regions and then estimating various facets of heterogeneity of 5’ leaders across alternative transcripts. We devised a simple peak-calling method on ribosome footprints in 5’ leaders, treating such peaks as a proxy for 5’ leader translation. We defined heterogeneity on the set of transcript isoforms associated with a pair of translation termination sites for non-overlapping genes. We reasoned that such an approach would emphasise the effect of heterogeneity because such transcripts differ only in mRNA leader regions, which confer the bulk of regulatory activity. We examined the effect on translation of several key aspects of heterogeneity, such as alternative initiation and/or splicing, mean leader length across isoforms, sequence content (uAUGs, GC content, regulatory motifs such as terminal-oligopyrimidine (TOP) tracts, codon bias), and secondary structure. Finally, we performed functional analyses on extreme cases of heterogeneity to identify enriched gene categories.
E07 - hitSeekR: A High-throughput Screening web application kit for R
Markus List, University of Southern Denmark, Denmark
Steffen Schmidt, University of Southern Denmark, Denmark
Helle Christiansen, University of Southern Denmark, Denmark
Mads Thomassen, University of Southern Denmark, Denmark
Torben A. Kruse, University of Southern Denmark, Denmark
Qihua Tan, University of Southern Denmark, Denmark
Jan Baumbach, University of Southern Denmark, Denmark
Jan Mollenhauer, University of Southern Denmark, Denmark
Short Abstract: Formerly applied exclusively by pharmaceutical companies, industrial-scale high-throughput screening (HTS) is nowadays performed by academic institutions, which generate a wealth of publicly available data. It is well known that HTS data suffers from variation caused by, for instance, plate, batch, library, time and positional effects. The robust identification of putative hits from a single screen or comparison of several independent screens thus crucially depends on normalizing these data appropriately. While a number of plate and control based normalization methods have been developed, there is currently no standard for processing HTS data, since different normalization strategies can lead to largely varying results. Additionally, advanced normalization strategies can be difficult to apply due to a lack of suitable and user-friendly analysis software. To overcome these issues, we developed hitSeekR, a web application that utilizes the R statistical framework for processing HTS raw data. It provides rich interactive visualizations and supports the user in identifying sources of variation. This allows the user to select an optimal processing strategy from a large panel of available normalization and hit identification methods. In addition to compound screens, hitSeekR supports siRNA knock-down and CRISPR/Cas9 knock-out screens, as well as microRNA inhibitor and mimic screens. After normalization, hit lists are generated based on a single method or as a consensus of several methods. Where applicable, hitSeekR reports drug and miRNA target genes and offers systems biology analysis of the resulting gene sets. In conclusion, hitSeekR enables experimental scientists without R knowledge to effectively analyze complex HTS data sets.
E08 - A better false discovery rate controlling procedure for identifying key functional genes in gene set analysis
Takafumi Narise, Tohoku University, Japan
Takeshi Obayashi, Tohoku University, Japan
Hiroyuki Ohta, Tokyo Institute of Technology, Japan
Kengo Kinoshita, Tohoku University, Japan
Short Abstract: Gene set analysis (GSA) methods, which are used to facilitate biological interpretation of expression data, can determine statistically significant gene sets (SSGSs) without the need to first determine differentially expressed genes (DEGs). The methods can therefore detect subtle but coordinated expression changes in functionally related genes, unlike overrepresentation analysis of DEGs. Although this is an advantage of GSA methods, identifying the DEGs within SSGSs remains a challenging and important task, because these DEGs are likely to contribute to the functional differences between the sample groups and are thus biologically important.
The leading-edge subsets in Gene Set Enrichment Analysis are similar to the DEGs in SSGSs; however, the subsets are determined by ad hoc criteria and may contain many non-DEGs. This problem can be resolved by controlling the overall false discovery rate (FDR) when testing multiple gene sets and genes, but doing so causes low sensitivity and inflexibility in the identification of the DEGs in SSGSs.
In order to sensitively and flexibly identify the DEGs in SSGSs, we propose an approach that controls the FDR of the DEGs in each SSGS. This approach was able to sensitively detect DEGs that show coordinated expression changes in functionally related genes. We demonstrate that the proposed FDR controlling procedure has several advantages in identification of the DEGs in SSGSs compared with the approach using the conventional FDR controlling procedure, and that the subsequent clustering analysis based on the DEGs in SSGSs is appropriate for identifying biologically important genes in GSA.
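The idea of controlling the FDR within each significant gene set, rather than across all genes at once, can be illustrated with a small sketch. The E08 abstract does not name the underlying procedure, so the Benjamini-Hochberg adjustment used here is an assumption, as are the function names, the gene and set identifiers, and the p-values.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def degs_per_gene_set(gene_sets, gene_pvalues, alpha=0.05):
    """Call DEGs separately inside each gene set (assumed to be significant).

    The multiple-testing correction only counts the genes of that set,
    which is the per-set FDR control illustrated here.
    """
    result = {}
    for name, genes in gene_sets.items():
        q = benjamini_hochberg([gene_pvalues[g] for g in genes])
        result[name] = [g for g, qv in zip(genes, q) if qv <= alpha]
    return result


if __name__ == "__main__":
    # Hypothetical per-gene p-values and gene set memberships.
    pvals = {"g1": 0.001, "g2": 0.02, "g3": 0.8, "g4": 0.03}
    sets = {"setA": ["g1", "g2", "g3"], "setB": ["g2", "g4"]}
    print(degs_per_gene_set(sets, pvals))
```

Because each set is corrected only for its own members, a gene can pass the threshold within a small set even if it would not survive a correction across every gene tested.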
E09 - Systematic integration of molecular signatures identifies novel components of the antiviral RIG-I-like receptor pathway
Robin van der Lee, Radboud university medical center, Netherlands
Qian Feng, Virology Division, Department of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, University of Utrecht, Netherlands
Martijn A. Langereis, Virology Division, Department of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, University of Utrecht, Netherlands
Rob ter Horst, Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud university medical center, Netherlands
Radek Szklarczyk, Department of Clinical Genetics, Unit Clinical Genomics, Maastricht University Medical Centre, Netherlands
Mihai G. Netea, Department of Internal Medicine, Radboud university medical center, Netherlands
Arno C. Andeweg, Department of Viroscience, Erasmus Medical Center, Netherlands
Frank J. M. van Kuppeveld, Virology Division, Department of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, University of Utrecht, Netherlands
Martijn A. Huynen, Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud university medical center, Netherlands
Short Abstract: The RIG-I-like receptor (RLR) system is critical for the innate defense against invading viruses. Recognition of viral RNA by the RLRs leads to the production of type I interferons (IFNα/β), which initiate the immune response. Here, through systematic assessment of a wide variety of available genomics data, we identify 10 molecular signatures of RLR pathway components. Five of these signatures are based on the relationship of RLR signaling with viruses, while the other five are based on properties of the pathway itself. We demonstrate that RLR pathway genes, among other characteristics, tend to evolve at a high rate, interact with viral proteins, contain a limited set of protein domains, are regulated by a specific set of transcription factors, and form a tightly connected physical interaction network. By weighing the signatures for their ability to predict known RLR pathway genes, and integrating them, we propose 187 novel genes with a likely role in the RLR system in humans. Using two RNAi screens, we validate an effect on RIG-I-mediated IFNβ production for about half of these genes. Among these, a conservative set of 19 new genes have strong effects on RLR signaling in multiple experiments. Finally, by connecting the results with the known protein interaction network, we suggest for several newly identified RLR genes where in the pathway they could function. We provide the genome-wide prioritization of RLR pathway components as a new resource for identifying genes involved in the RLR system and for the evaluation of custom data sets relevant to antiviral innate immunity.
E10 - Dissecting highly similar transcriptomes of the fruitfly imaginal discs
Roderic Guigo, Center for Genomic Regulation (CRG), Spain
Cecilia Coimbra Klein, Center for Genomic Regulation (CRG), Spain
Alessandra Breschi, Center for Genomic Regulation (CRG), Spain
Silvia Pérez-Lluch, Center for Genomic Regulation (CRG), Spain
Amaya Abad, Center for Genomic Regulation (CRG), Spain
Emilio Palumbo, Center for Genomic Regulation (CRG), Spain
Short Abstract: The exploration of high-dimensional data sets requires methods to extract relevant biological knowledge that may be obscured by the presence of non-informative variables. In the present study, we aim to trace the spatial and temporal transcriptional profile of the eye and the wing imaginal discs of Drosophila melanogaster in three developmental stages: third instar larva, white pupa and late pupa. We further aim to explore the signatures of five compartments of the wing imaginal disc. The preliminary RNA-seq data is, however, highly similar, which may be due to the partial overlap of the wing compartments. Thus, we used the projection score (Fontes and Soneson, BMC Bioinformatics 2011) to find the informative structures of these transcriptomes in order to detect tissue-specific genes. Such an approach, without any class labels, measures the informativeness of a variable subset based on the variance contained in the sample configuration obtained by PCA. From the preliminary dataset of 13,832 protein coding genes, the projection score attains its maximal value of 0.4578 for a variance threshold of 37.5% of the maximal variance, corresponding to the 335 genes with the highest variances. These genes overlap strongly with the set of differentially expressed genes found using the GLM functionality of edgeR (Robinson MD et al., Bioinformatics 2010; McCarthy et al., Nucleic Acids Research 2012). The projection score proves useful for obtaining a cleaner signal at such fine resolution of the transcription profile of wing compartments.
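The variance-threshold step described in the E10 abstract (keeping genes whose variance reaches a given fraction of the maximal variance) can be sketched as below. This covers only the filtering step, not the projection score itself; the function name, the use of population variance, and the toy expression values are assumptions for illustration.

```python
from statistics import pvariance


def filter_by_variance(expression, fraction):
    """Keep genes whose variance is at least `fraction` of the maximal variance.

    expression: dict mapping a gene name to its expression values across samples.
    fraction: threshold relative to the largest per-gene variance (e.g. 0.375).
    """
    variances = {gene: pvariance(values) for gene, values in expression.items()}
    cutoff = fraction * max(variances.values())
    return [gene for gene, var in variances.items() if var >= cutoff]


if __name__ == "__main__":
    # Hypothetical expression values for three genes in four samples.
    toy = {
        "geneA": [0.0, 0.0, 0.0, 0.0],
        "geneB": [0.0, 4.0, 0.0, 4.0],
        "geneC": [0.0, 2.0, 0.0, 2.0],
    }
    print(filter_by_variance(toy, 0.375))
```

In the projection score setting, a filter of this kind would be evaluated over a range of thresholds, with the score computed on the PCA of each retained subset.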
E11 - ALCOdb: a gene coexpression database for microalgal species
Yuichi Aoki, Tohoku University, Japan
Takeshi Obayashi, Tohoku University, Japan
Kengo Kinoshita, Tohoku University, Japan
Short Abstract: Microalgal species are gathering worldwide attention as a promising source of biofuel feedstock or animal feed, and also as model organisms to investigate the evolution of phototrophs. Numerous molecular and biochemical studies have been conducted to reveal a variety of phenomena such as lipid metabolism and organelle development; however, the biological functions of most microalgal genes remain unclear. Gene coexpression, a similarity of gene expression profiles, has been widely applied to inferring the function of uncharacterized genes in animal or plant sciences. Here, we present a gene coexpression database specialized for microalgal species, a green alga Chlamydomonas reinhardtii and a red alga Cyanidioschyzon merolae. Algae Gene Coexpression Database (ALCOdb) provides coexpressed gene sets deduced from microarray and RNA sequencing data. One disadvantage of calculating microalgal gene coexpression is the limited amount of expression data compared with other model organisms. To avoid making wrong assumptions arising from poor data quality, the reliability of gene coexpression was carefully evaluated. We calculated the coincidence degree between a coexpressed gene list deduced from expression data and that deduced from codon usage similarity, and applied it as a reliability measure. ALCOdb also provides comprehensive annotation, orthologous gene information and interactive network analysis tools. In this poster, we will present an overview of ALCOdb with an application to comparative analysis of lipid metabolism between Chlamydomonas reinhardtii and a model plant, Arabidopsis thaliana.
E12 - The relationship between the replication axis and genome behavior on imbalanced Bacillus subtilis chromosome
Nobuaki Kono, Keio University Institute for Advanced Biosciences, Japan
Masaru Tomita, Institute for Advanced Biosciences, Keio University, Japan
Kazuharu Arakawa, Institute for Advanced Biosciences, Keio University, Japan
Short Abstract: In bacterial chromosomes, the location of the replication origin appears to be physically identified, and is genetically regulated by relevant molecular factors. In almost all bacterial replicons, the replication terminus is located at the opposite site from the replication origin. The replication origin and terminus axis is a central structure in circular genomes, and in biotechnology, maintaining the stability of the axis is one of the important goals. The purpose of this research is to control the replication axis. Here, we investigated Bacillus subtilis 168 strains whose axes were hindered, and found that the native replication terminus function was robust. However, eradication of the terminus region specific binding protein resulted in the natural terminus not being used for termination; instead, new termini were selected at a location exactly opposite to the replication origin. Next, we focused on features of strand properties (leading and lagging strands). The relocation of the replication terminus leads to a switching of leading and lagging strands. Hence, the appearance of a new replication terminus implies a change in the strand property. Therefore, we observed various alterations such as the mutation rate, fork speed, and gene expression profile. We concluded that replication generally terminates at the loci where the two approaching replisomes meet. This site was automatically selected, and two replisomes moving at the same rate supported symmetrical chromosome structures relative to the replication origin. Furthermore, various profiles of the strand property will contribute to the next generation of biotechnologies.
E13 - GenomeSpace: An environment for frictionless bioinformatics
Sara Garamszegi, Broad Institute of MIT and Harvard, United States
Jill Mesirov, Broad Institute of MIT and Harvard, United States
The GenomeSpace Team, Broad Institute of MIT and Harvard, United States
Short Abstract: Genomic research increasingly involves the integrative analysis of multiple data types, such as sequence, expression, epigenetic and proteomic datasets. Comprehensive analysis of these datasets frequently requires the coordinated use of Web-based applications and data repositories, desktop analysis tools, and visualizers. However, the effort required to transfer data between tools, convert between data formats, and manage results often prevents researchers from utilizing the wealth of methods available to them. Many integrative genomics and translational “bench-to-bedside” discoveries are possible with combinations of existing tools, but the necessary transitions between them put these analyses out of the reach of most investigators.

GenomeSpace is a lightweight, cloud-based environment that brings together diverse computational tools, enabling scientists without programming skills to easily combine their capabilities. It offers a common space to create, manipulate and share an ever-growing range of genomic analysis tools. GenomeSpace features support for cloud-based data storage and analysis, multi-tool analytic workflows, automatic conversion of data formats, and ease of connecting new tools to the environment.

The GenomeSpace Recipe Resource is an online repository of analysis “recipes”, which are short, standalone guides for accomplishing common bioinformatics analysis tasks in GenomeSpace using 2-3 tools or data resources. Recipes aim to make bioinformatics analyses accessible and easily understandable to investigators, regardless of computational proficiency. Recipes are curated and added to the Recipe Resource on an ongoing basis.

Here, we show how researchers can use GenomeSpace to combine the capabilities of multiple bioinformatics tools in several research scenarios.
E14 - Power gain: how normalization affects reproducibility and biological insight of RNA-seq studies in Neuroscience
Davide Risso, University of California, Berkeley, United States
Lucia Peixoto, Department of Biology, University of Pennsylvania, United States
Shane G. Poplawski, Department of Biology, University of Pennsylvania, United States
Mathieu E. Wimmer, Department of Biology, University of Pennsylvania, United States
Terence P. Speed, Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Australia
Marcelo A. Wood, Department of Neurobiology and Behavior, University of California, Irvine, United States
Ted Abel, Department of Biology, University of Pennsylvania, United States
Short Abstract: The sequencing of the full transcriptome (RNA-seq) has become the preferred choice for the measurement of genome-wide gene expression. Despite its widespread use, current RNA-seq normalization methods only account for differences in sequencing depth and thus unwanted variation (e.g., batch effects) can dominate the signal of interest, leading to low detection power and reproducibility. This is particularly problematic for the study of the brain in vivo, since a lot of experimental variables cannot be controlled and may have effect sizes above the treatment of interest.
We show, using publicly available data and our own RNA-seq data obtained from the mouse hippocampus following learning, that commonly used normalization methods are largely unable to remove unwanted variation in brain in vivo data. This in turn may lead to false negatives and/or false positives in the reported results. We present a novel application of the RNA-seq normalization method RUVSeq and show that it is able to properly normalize the samples leading to a more accurate and reproducible picture of gene expression changes due to the learning experience.
We show that RUVSeq performs better than other available methods to remove batch effects. RUVSeq is also able to normalize across different laboratories and learning paradigms in mice, showing great potential for multi-site data integration and meta-analysis.
E16 - A high-resolution gene expression atlas of epistasis between gene-specific transcription factors reveals new mechanisms for genetic interactions
Patrick Kemmeren, UMC Utrecht, Netherlands
Katrin Sameith, UMC Utrecht, Netherlands
Marian Groot Koerkamp, UMC Utrecht, Netherlands
Dik van Leenen, UMC Utrecht, Netherlands
Mariel Brok, UMC Utrecht, Netherlands
Nathalie Brabers, UMC Utrecht, Netherlands
Philip Lijnzaad, UMC Utrecht, Netherlands
Sander van Hooff, UMC Utrecht, Netherlands
Joris Benschop, UMC Utrecht, Netherlands
Tineke Lenstra, UMC Utrecht, Netherlands
Eva Apweiler, UMC Utrecht, Netherlands
Sake van Wageningen, UMC Utrecht, Netherlands
Berend Snel, Utrecht University, Netherlands
Frank Holstege, UMC Utrecht, Netherlands
Short Abstract: Recent studies have systematically exposed large numbers of non-additive genetic interactions, the majority of which are functionally uncharacterized. To investigate such genetic interactions between gene-specific transcription factors (GSTFs) in Saccharomyces cerevisiae, we systematically analysed 72 GSTF pairs by DNA microarray analysis of double and single deletion mutants. These pairs were selected through previously published growth-based genetic interactions as well as through similarity in DNA binding properties. The result is a high-resolution atlas of gene expression-based genetic interactions that provides systems-level insight into GSTF epistasis. The atlas confirms known genetic interactions and exposes new ones. Importantly, the data can be used to elucidate the mechanisms that underlie individual genetic interactions. Evidence is provided for two previously uncharacterized mechanisms, "Buffering by induced dependency" and "Alleviation by derepression". These mechanisms demonstrate how negative genetic interactions can occur between seemingly unrelated pathways and how positive genetic interactions can indirectly expose parallel- rather than same-pathway relationships. The study provides general insights into the complex nature of epistasis and results in new models for genetic interactions, the majority of which do not fall into easily recognizable within- or between-pathway relationships.
E18 - Identification of the set of transcripts up-regulated during DNA replication and Mitosis
Bruno Giotti, Roslin Institute,
Mark Barnett, Roslin Institute,
Tom Freeman, Roslin Institute,
David Chen, Roslin Institute,
Short Abstract: The cell cycle is a central pathway of the eukaryotic kingdom which coordinates genome duplication and cell proliferation (mitosis). Comparisons of the numerous transcriptomics studies on cell cycle-associated gene expression suggest large discrepancies between the transcriptional networks underpinning this system between cell types. We therefore set out to test this hypothesis and in so doing have identified a core set of genes associated with S-, G2- and M-phase. To do so, we performed a new series of time-course microarray experiments on human fibroblasts entering cell division as a synchronised population of cells. These data were analysed and genes of interest selected using BioLayout Express3D, an open-source tool that visualises and clusters data matrices as correlation graphs. We validated the co-expression of these genes across a large collection of human primary cells and tissue types, which refined the initial list to 705 genes highly enriched for cell-cycle annotations (according to GO and Reactome). Of these, 70 are not functionally characterised. Localisation studies of GFP-tagged clones identified four to be localised at centrosomes during mitotic division in HEK293T cells and many others to be expressed in the nucleus. Also, RNA silencing experiments identified 19 as inhibiting cell proliferation after their knock-down. These observations support the quality of our cell cycle gene list, which represents a catalogue of key genes up-regulated during DNA replication and mitosis, including a set of uncharacterised proteins that may play a role in human cell division.
E19 - Predicting Microbial Gene Function on a Massive Scale Reveals Extensive Complementarity between Genome Context Methods
Vedrana Vidulin, Ruđer Bošković Institute, Croatia
Tomislav Šmuc, Ruđer Bošković Institute, Croatia
Fran Supek, Ruđer Bošković Institute, Croatia
Short Abstract: We present a novel pipeline for annotating prokaryotic genes with Gene Ontology functions based on supervised machine learning in a hierarchical multi-label setting. 14,945,154 genes from 5,271 Bacteria/Archaea are used to form a single learning dataset via mappings of the genes to 60,892 COG/NOG groups, predicting 8,005 different GO terms. Four classifiers are trained on: (i) phyletic profiles indicating presence/absence of COGs in genomes; (ii) signatures of remote homology across genomes; (iii) conserved gene neighborhoods; and (iv) a novel method where evolutionary change in codon adaptation is tracked across orthologs. We measured the accuracy of the classifiers in cross-validation (out-of-bag error in CLUS-HMC Random Forest), and combined their predictions in a late fusion scheme. This resulted in accuracy higher than that of the individual components, allowing many genes to receive annotations: 61.17% of E. coli COGs received at least one novel and likely correct (precision >50%) function. The four ‘genome context’ methods were complementary to a large extent: out of 216,694 function predictions made for various E. coli genes, 86,285 (40%) were unique to a single method. Moreover, out of 3,041 GO functions that were successfully learned (cross-validation AUPRC>0.05), 1,316 were learned exclusively by one method. This underscores the need to combine multiple approaches. Finally, we used information accretion to compare the amount of past vs. newly predicted knowledge on gene function, and found that model bacteria have ~90–100 bits/gene of known annotations, while our pipeline typically annotates ~30 additional bits/gene. Thus, a comprehensive use of genome context methods allows a sizable increase in our knowledge regarding gene function.
E21 - Gene Functional Neighborhoods as Predictors of Gene Function
Tomislav Smuc, Rudjer Boskovic Institute, Croatia
Matej Mihelcic, Rudjer Boskovic Institute, Croatia
Fran Supek, Rudjer Boskovic Institute, Croatia
Short Abstract: The roles of many genes in a typical genome are still unknown or only partially explored. Experiments that determine gene function are expensive and time-consuming, necessitating the development of novel computational methods to predict gene functions using machine learning and knowledge discovery techniques. It is known from comparative genomics that certain functionally interacting groups of genes reside in conserved neighborhoods in phylogenetically distant bacterial genomes. In this work we present a novel feature construction method which uses the functional composition of a gene's neighborhood on the chromosome as features for inferring the gene's functions. We integrate data across approx. 1000 bacterial and archaeal genomes using COG gene families; each COG is then described by the average frequency of gene functions in its close neighborhood across many genomes. We use a Random Forest of the CLUS-HMC multilabel classifiers (predictive clustering trees), which is particularly adapted to learning hierarchical sets of labels - here, the Gene Ontology annotations of COGs. We experiment with different definitions of the local neighborhood in terms of its size and genome composition, and compare our method to the common 'guilt-by-association' approach, in which the functions of the physically closest gene(s) across bacterial genomes are assigned to a query gene. Our analysis shows that the proposed approach gives substantially more accurate predictions across many GO functions, both general and specific ones.
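The feature construction described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: genomes are treated as linear ordered lists of COG identifiers (real bacterial chromosomes are usually circular), the neighborhood is a fixed window of k genes on each side, and the function name neighborhood_features and the toy identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def neighborhood_features(genomes, functions, k=2):
    """Describe each COG by the average frequency of functions near it.

    genomes   : list of genomes, each an ordered list of COG identifiers
    functions : dict mapping a COG identifier to a set of function labels
    k         : number of neighbors considered on each side (an assumption)
    """
    sums = defaultdict(Counter)   # per-COG running sum of function frequencies
    counts = Counter()            # number of occurrences seen per COG

    for genome in genomes:
        for i, cog in enumerate(genome):
            # Genes within k positions on either side, excluding the gene itself.
            neighbors = genome[max(0, i - k):i] + genome[i + 1:i + 1 + k]
            if not neighbors:
                continue
            freq = Counter()
            for nb in neighbors:
                for term in functions.get(nb, ()):
                    freq[term] += 1
            # Frequency of each function among the neighbors of this occurrence.
            for term, c in freq.items():
                sums[cog][term] += c / len(neighbors)
            counts[cog] += 1

    # Average the per-occurrence frequencies over all occurrences of the COG.
    return {
        cog: {term: s / counts[cog] for term, s in sums[cog].items()}
        for cog in counts
    }


# Toy data: two tiny "genomes" and two annotated COGs (all hypothetical).
toy_genomes = [["A", "B", "C"], ["B", "A", "C"]]
toy_functions = {"A": {"GO:1"}, "C": {"GO:2"}}
toy_features = neighborhood_features(toy_genomes, toy_functions, k=1)
```

The resulting per-COG dictionaries could then serve as feature vectors for any multi-label classifier; the abstract uses CLUS-HMC predictive clustering trees, which are not reproduced here.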
E22 - Regulatory response of CHO cells to serum withdrawal from a novel genome-scale probe design
Smriti Shridhar, BOKU University Vienna, Austria
Gerald Klanert, BOKU University Vienna, Austria
Norbert Auer, BOKU University Vienna, Austria
Johannes Grillari, BOKU University Vienna, Austria
Nicole Borth, BOKU University Vienna, Austria
David Kreil, BOKU University Vienna, Austria
Short Abstract: Chinese Hamster Ovary (CHO) cells are the preferred choice of cell line for production of biopharmaceuticals. The increasing availability of genome-scale experiments now facilitates a better understanding of metabolic pathways and gene networks. In general, RNA-Seq provides better sensitivity and specificity for strongly expressed transcripts, whereas micro-arrays can obtain a better signal-to-noise ratio for low expressors. Recent work increasingly suggests that the two technologies have rather different strengths and weaknesses. When target sequences are known, micro-arrays thus remain a highly attractive tool in the quest to systematically and sensitively assess and analyze the regulation of the transcriptome. In our design, unique uniformity, sensitivity, and specificity are achieved by optimization at the probe set level across the entire array instead of greedy selection by probe parameter thresholds. Probe properties are computed using a full signal model that has recently been updated to incorporate the latest thermodynamic estimates. Our design covers 98.5% of all CHO transcripts, and 95.9% with high specificity (RefSeq genome assembly: GCF_000223135.1). These probes have also been mapped to another recent addition of transcripts from an alternate assembly (GCF_000419365.1).

The performance of the array is demonstrated by assessing changes in gene expression of CHO cells. In particular, we report on the results of an experiment comparing four replicate cultures each of CHO-K1 cells grown with serum or adapted to protein free medium. The availability of a validated high-performance CHO specific array will be a valuable new resource for the CHO community facilitating system level cell line engineering.
E23 - Definition of conserved expression signatures of human immune cells for deconvolution analysis of mixed cell population data – a network-based approach
Ajit Johnson Nirmal, Roslin Institute/Edinburgh University,
Tim Regan, Roslin Institute/Edinburgh University,
Tom Freeman, Roslin Institute/Edinburgh University,
Andrew Sims, Edinburgh University,
Short Abstract: Transcriptomics data is a valuable resource for understanding a system’s response to disease or other perturbation. Frequently, pathological states are associated with the recruitment of immune cells to the site of disease and/or changes in the relative numbers or activation state of other cell types. Identifying these alterations is key to the correct interpretation of data derived from clinical samples. In order to identify the presence of immune cells in a sample, previous studies have attempted to identify modules of cell type-specific gene signatures. Our analysis of these signatures suggests that the gene sets identified by these studies are not optimal. We therefore set out to generate immune cell type-specific gene signatures using a network-based approach.
First, an extensive meta-analysis of 39 datasets containing 276 samples of different human immune cell populations was performed using the network analysis tool BioLayout Express3D. We extracted the most cell-specific clusters from these data and then examined their co-expression across eight blood and four tissue datasets. In this way we refined and validated marker signatures for B cells (75 genes), macrophages/dendritic cells (91), monocytes (69), T cells (67), neutrophils (148), NK cells (69) and platelets (87). These signatures can be used to deconvolute data derived from mixed cell populations, e.g. blood and tissue data, providing a way to estimate the relative abundance and activation state of immune cells within such data.
E24 - Functional transcription factor target discovery via compendia of binding and expression profiles
Christopher Banks, University of Edinburgh,
Anagha Joshi, Roslin Institute, University of Edinburgh,
Tom Michoel, Roslin Institute, University of Edinburgh,
Short Abstract: Transcriptional regulation by sequence-specific transcription factors (TFs) determines all aspects of cell behaviour and TFs are known to be essential for a wide range of important cellular and organismal phenotypes. Genome-wide experiments to map the DNA-binding locations of TFs show without exception that thousands of genes are bound by any TF, far exceeding the number of possible direct target genes. How to distinguish biologically functional from non-functional binding has therefore become a major bottleneck in the process of annotating the regulatory genome.

We hypothesized that functional TF-binding sites can be discovered by correlating binding and expression profiles across multiple experimental conditions. To test this hypothesis, we obtained integrated datasets of ChIP-seq and RNA-seq experiments in matching cell types from the human ENCODE resource and considered both promoter-proximal and distal cumulative regulatory models to map binding peaks to genes. Compared to the traditional approach where functional binding is inferred from the presence of multiple binding sites in a gene locus in a cell type of interest, we found that a high degree of correlation between a gene's TF-binding and expression profiles was significantly more predictive of the gene being differentially expressed upon knockdown of that TF. Gene ontology analysis further showed that the correlation-based method is able to identify highly relevant functional target genes among the thousands of genes bound by a given transcription factor. Remarkably, these results were confirmed when using correlation across a time course of ChIP-seq and RNA-seq experiments during mouse circadian rhythm.
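The correlation-based idea in this abstract can be illustrated with a short sketch. It assumes that, for one TF, each gene has a binding score and an expression value per cell type, in the same order; it ranks genes by the Pearson correlation between the two profiles. The function names, the input layout, and the choice to rank by signed (rather than absolute) correlation are assumptions made for illustration and are not taken from the authors' method.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant profile carries no correlation signal; score it as 0.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_candidate_targets(binding, expression):
    """Rank genes by correlation of binding and expression across conditions.

    binding, expression : dicts mapping gene -> list of values, one per
    cell type, in the same cell-type order (hypothetical layout).
    """
    scores = {
        gene: pearson(binding[gene], expression[gene])
        for gene in binding
        if gene in expression
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy profiles over four cell types (hypothetical values).
toy_binding = {"geneA": [0, 1, 2, 3], "geneB": [3, 1, 2, 0]}
toy_expression = {"geneA": [1, 2, 3, 4], "geneB": [1, 1, 1, 1]}
toy_ranking = rank_candidate_targets(toy_binding, toy_expression)
```

In this toy case geneA, whose expression rises with binding, is ranked above geneB, whose expression does not vary. A TF acting as a repressor would produce negative correlations, so ranking by absolute value may be more appropriate in practice.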
E26 - Characterizing the differential usage of Spliced Leader Trans-Splicing acceptor sites in polycistronic units of Trypanosoma cruzi
Gloria Franco, Universidade Federal de Minas Gerais, Brazil
Andre Luiz Reis, Universidade Federal de Minas Gerais, Brazil
Bitar Maina, Universidade Federal de Minas Gerais, Brazil
Short Abstract: Trypanosomatids such as Trypanosoma cruzi have no introns in their genes and perform polycistronic transcription. The maturation of individual mRNAs from polycistronic pre-mRNAs relies on the spliced leader trans-splicing (SLTS) mechanism, in which a 39-nt sequence is attached to the 5’ end of individual cistrons upon recognition of acceptor sites. Alternative SLTS has been previously observed in this parasite and shown to be important for the regulation of gene expression. Many roles for this mechanism have been proposed, and deeper knowledge of this topic will contribute not only to a better understanding of how splice acceptor sites are selected under specific conditions, but also to finding experimental evidence that supports these roles. Unfortunately, neither the polycistronic units nor the promoters for RNA polymerase II are well annotated in the genome of this parasite. We herein attempted to better characterize the possible regulatory roles of SLTS in T. cruzi through the analysis of RNA-seq data. Our method was able to define 592 polycistronic units, comprising 21,626 genes. We also identified 42,879 putative splice acceptor sites, 33,711 of which were successfully associated with the regulatory regions of 9,520 different annotated genes. The frequency of use of each splice acceptor site and the conservation of AG dinucleotides and the upstream polypyrimidine tract allowed the definition of the main acceptor site for each gene. Finally, we attempted to correlate the presence of uORFs and alternative start codons with the differential usage of splice acceptor sites in each gene.
E27 - Unsupervised Discovery of Biological Properties of Pseudomonas aeruginosa using ADAGE
Casey Greene, Geisel School of Medicine at Dartmouth, United States
Jie Tan, Geisel School of Medicine at Dartmouth, United States
John Hammond, Geisel School of Medicine at Dartmouth, United States
Deborah Hogan, Geisel School of Medicine at Dartmouth, United States
Short Abstract: Extracting the rich biological information that is embedded in high-throughput genome-scale data will require new analytical methods that succeed when curated knowledge is sparse or unavailable. Addressing this challenge requires unsupervised algorithms that identify biological patterns present within complex and noisy data. We developed a deep-learning-based approach for gene expression termed ADAGE (Analysis using Denoising Autoencoders of Gene Expression) and applied it to publicly available gene expression data from studies on the bacterium Pseudomonas aeruginosa. The resulting ADAGE model revealed characteristics of genome organization, genome divergence, transcriptional responses, gene function, and gene-gene relationships. In addition, the model accurately predicted transcriptional responses in newly performed genome-scale assays, and re-analysis of published data using the ADAGE model clearly revealed subtle but critical patterns masked by other gene expression changes in the initial study. We provide ADAGE model values for publicly available P. aeruginosa gene expression studies to facilitate discovery by the deep examination of results across multiple experiments by different groups. The unsupervised ADAGE approach can be applied to any large publicly available or newly generated gene expression compendia to characterize genomic and transcriptional features.
E28 - New toxicogenomic predictive model for decreased reticulocytes based on gene expressions in liver of rats built with class association rule mining.
Keisuke Nagata, Astellas Pharma Inc., Japan
Takashi Washio, Osaka University, Japan
Yoshinobu Kawahara, Osaka University, Japan
Akira Unami, Astellas Pharma Inc., Japan
Short Abstract: Toxicogenomic predictive models, which forecast toxicological effects of chemical compounds in humans or animals, typically based on earlier transcriptome-wide gene expression data obtained from DNA microarrays, have so far usually been built with multivariate analytic techniques such as k-nearest neighbors, linear discriminant analysis (LDA) and support vector machines (SVM). However, building a classifier that is accurate and understandable at the same time is not an easy task, as predictive accuracy, understandability, and computational demands often need to be traded off against one another.

Recently, we reported the application of the Classification Based on Association (CBA) algorithm, one of the class association rule mining techniques, to the construction of a predictive model of liver weight gain in rats after 14-day repetitive doses of chemical compounds from gene expression on day 4, using the TG-GATEs database. The cross-validation study of the generated classifiers demonstrated CBA's superiority over LDA in terms of both predictive performance and interpretability.

In this study, we further explored CBA's applicability to various endpoints of toxicity studies other than liver weight gain, and proposed a new toxicogenomic predictive model for decreased reticulocytes after 4-day repetitive doses based on gene expression in the liver of rats 24 hours after a single dose. The new model could be useful, in terms of both predictive performance and understandability, especially in compound screening during early stages of research and development in the pharmaceutical industry.
E30 - A novel approach to identify highly connected and differentially expressed subnetworks reveals underlying biological processes in endometrial cancer metastasis
Kanthida Kusonmano, University of Bergen, Norway
Mari K Halle, University of Bergen, Norway
Helga B Salvesen, University of Bergen, Norway
Kjell Petersen, University of Bergen, Norway
Short Abstract: Differential expression analyses based on high-throughput data have been used to study molecular changes between phenotypes of interest. Beyond the typical analysis, which derives a ranked list of individual genes that are significantly different between the studied conditions, several methods have been developed to identify differentially expressed genes as a set to facilitate functional interpretation. One main approach is gene set analysis, which evaluates functional enrichment of differentially expressed genes based on publicly available gene sets. Meanwhile, network analysis tries to identify functional modules of genes and their interactions based on the studied data. Here we present a novel approach, which combines the features of both gene set and network analyses by detecting subnetworks based on internal relations of the studied data and assessing their differential expression using a well-known gene set method, Gene Set Enrichment Analysis (GSEA). The subnetworks are derived by integrating a priori gene-gene interactions (here we used protein-protein interactions) and expression correlations. We demonstrate our approach on endometrial cancer data comparing aggressive primary tumors and metastases. The detected differentially expressed subnetworks provide biological insights into metastatic settings and display interesting expression trends across levels of tumor aggressiveness. A few subnetworks also have significant links to patient disease-specific survival. The study provides discoveries in the metastatic context that merit follow-up studies.
E31 - Charting the unexpected complexity of the transcriptome
Pawel Labaj, Boku University Vienna, Austria
David Kreil, Boku University Vienna, Austria
Short Abstract: Exploiting the latest methods for high-throughput expression profiling, recent studies are beginning to fully reveal the astonishing complexity of the transcript world. In particular, the rapid improvement of next-generation sequencing (NGS) platforms has triggered a wave of new findings based on whole transcriptome sequencing (RNA-seq). RNA-seq utilizes the capabilities of NGS to reveal a snapshot of transcript presence and abundances in a sample at a given moment in time. Systematic surveys confirm that a large number of alternative transcripts show sex-specific, developmental stage or age dependent, or tissue or organ specific expression and regulation – reflecting their different functional roles.

The latest discoveries in the ENCODE or SEQC projects have shown that there are tens of thousands of unannotated genes as well as hundreds of thousands of additional alternative transcripts. It remains a mystery what their biological function might be, or if many of them just constitute ‘biological noise’ and, if so, how we could discern between the biologically relevant and irrelevant new exon junctions and gene transcripts. Quantitative transcript-specific expression profiling will play a key role in researching these questions.

Even for strongly expressed transcripts, moreover, gene level expression profiling is not really appropriate, as alternative transcripts are known to often be expressed or regulated specifically. In fact, in pilot experiments, we see substantial differences between gene and transcript level differential expression. The general question arises, how the introduction of alternative transcript discrimination affects differential expression analyses and their functional interpretations.
E32 - ARH/ARH-Seq: Discovery tool for differential splicing in High-throughput data
Axel Rasche, MPI Molecular Genetics, Germany
Matthias Lienhard, MPI Molecular Genetics, Germany
Ralf Herwig, MPI Molecular Genetics, Germany
Short Abstract: Alternative splicing (AS) is a key mechanism for generating the complex proteome of an organism. AS has been observed within a variety of biological conditions, for example, in tissue expression, with respect to human diseases and in protein modification. The computational prediction of alternative splicing from high-throughput data is inherently difficult and necessitates robust statistical measures because the differential splicing signal is overlaid by influencing factors such as gene expression differences and simultaneous expression of multiple isoforms, amongst others. We propose ARH, a discovery tool for differential splicing in case-control studies that is based on the information-theoretic concept of entropy. ARH-seq works on high-throughput sequencing data and is an extension of the ARH method that was originally developed for exon microarrays. We show that the method has inherent features, such as independence of transcript exon number and independence of differential expression, which make it particularly suited for detecting alternative splicing events from sequencing data. In order to test and validate our workflow we challenged it with publicly available sequencing data derived from human tissues and conducted a comparison with eight alternative computational methods. In order to judge the performance of the different methods we constructed a benchmark data set of true positive splicing events across different tissues, agglomerated from public databases, and show that ARH is an accurate, computationally fast and high-performing method for detecting differential splicing events.
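To give a feel for how an entropy-based measure can separate differential splicing from differential expression, here is a small sketch. It is an illustrative score built on assumptions, not the published ARH formula: exon-level log2 fold changes are centred on their median (so a uniform change of the whole gene cancels out), turned into a probability distribution, and compared against the maximum entropy for that number of exons. The function name and the toy read counts are hypothetical, and all inputs are assumed to be strictly positive (for example after adding a pseudo-count).

```python
import math
from statistics import median


def splicing_entropy_score(case, control):
    """Entropy-based deviation score for one gene (illustrative only).

    case, control : per-exon expression values (same exon order, all > 0).
    Returns log2(m) - H, which is 0 when every exon changes by the same
    factor and grows as the change concentrates on a few exons.
    """
    # Exon-level log2 fold changes between the two conditions.
    log_fc = [math.log2(c / k) for c, k in zip(case, control)]
    # Subtracting the median removes the gene-level expression difference.
    centre = median(log_fc)
    weights = [2 ** abs(fc - centre) for fc in log_fc]
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log2(p) for p in probs)
    # Distance from the maximum entropy attainable with this many exons.
    return math.log2(len(probs)) - entropy


# Gene doubled uniformly across exons: expression change, no splicing change.
uniform_score = splicing_entropy_score([20, 40, 60, 80], [10, 20, 30, 40])
# Only the last exon changes: candidate differential splicing.
skewed_score = splicing_entropy_score([10, 20, 30, 320], [10, 20, 30, 40])
```

In this sketch the uniformly doubled gene scores zero while the gene with a single changed exon scores above zero, which mirrors the independence from differential expression that the abstract highlights.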
E33 - Contribution of robustness and condition-dependency to non-responsive behaviour
Saman Amini, UMC Utrecht, Netherlands
Patrick Kemmeren, UMC Utrecht, Netherlands
Short Abstract: Investigating the role and interplay between individual proteins in biological processes is often performed by assessing the functional consequences of removing one or more genes. This is however hampered by a large number of deletions that result in no detectable phenotype. In Saccharomyces cerevisiae over 80% of deletion mutants have little or no detectable effect on growth under a single laboratory condition. To investigate the regulatory network of S. cerevisiae, we have recently determined the effects on mRNA expression levels for 1,484 deletion mutants. This exposed that 53% of mutants have a phenotype similar to wildtype, confirming previous conclusions about the degree of non-responsiveness. Several studies have investigated the contribution of redundancy and condition-dependency to this phenomenon. However, these focused either on a single explanation or have insufficiently investigated the underlying mechanism. Here, we aim to provide a systematic classification of the relative contributions of the underlying causes to the non-responsive behaviour observed for 784 out of 1,484 deletion mutants. Among the non-responsive mutants, 350 have a close paralog including duplicates from the Whole Genome Duplication (WGD) and Small Scale Duplication (SSD) events. Not all close paralog pairs are expected to be redundant due to functional and sequence divergence. Negative genetic interaction scores and lack of sensitivity in multiple conditions suggested that some of the non-responsive close paralog pairs are likely candidates for providing compensation. Based on these results, we estimate that the contribution of redundancy is somewhere between 10% and 20%, attributing most of the non-responsive behaviour to condition-dependency.
E34 - Wiring lncRNAs to mRNAs and miRNAs in epithelial ovarian cancer stage I
Enrica Calura, University of Padova, Italy
Paolo Martini, University of Padova, Italy
Lara Paracchini, Mario Negri Institute, Italy
Maurizio D'Incalci, Mario Negri Institute, Italy
Sergio Marchini, Mario Negri Institute, Italy
Chiara Romualdi, University of Padova, Italy
Short Abstract: The global knowledge and discovery of non-coding RNAs (ncRNAs) has grown exponentially in the last few years, demonstrating that a wide range of biological processes, both physiological and pathological, are controlled by ncRNAs. Together with miRNAs, long non-coding RNAs (lncRNAs) have been shown to be important players in oncogenesis, opening a new era in cancer research. LncRNAs are non-coding RNAs longer than 200 bp that have been shown to be extremely tissue/condition specific. For lncRNAs, four main molecular functions have been proposed so far: gene expression regulators, miRNA decoys, mediators of mRNA translation, and modulators of protein activities.
Here we propose a method to infer the role of lncRNAs and to link them to the biological pathways in stage I Epithelial Ovarian Cancer (EOC).
We analyzed the expression profiles of lncRNAs, mRNAs and miRNAs from 73 EOC stage I patients. We selected 82 lncRNAs according to their survival predictive power and, using a network reconstruction approach based on mutual information, we defined putative lncRNA interactors among miRNAs and mRNAs. Each lncRNA was linked to 15 pathways on average (min 0 - max 70 pathways with at least 3 genes per pathway). Conversely, “Pathways in cancer” from KEGG was the most targeted pathway. Then, using these networks, we identified promising regulatory circuits associated with prognosis.
E35 - Inferring Directional Genetic Interactions from combinatorial, multi-parametric data
Bernd Fischer, German Cancer Research Center (DKFZ), Germany
Thomas Sandmann, German Cancer Research Center (DKFZ), Germany
Thomas Horn, German Cancer Research Center (DKFZ), Germany
Max Billmann, German Cancer Research Center (DKFZ), Germany
Varun Chaudhary, German Cancer Research Center (DKFZ), Germany
Wolfgang Huber, EMBL, Germany
Michael Boutros, German Cancer Research Center (DKFZ), Germany
Short Abstract: Genes display epistatic (genetic) interactions, whereby the presence of one genetic variant can mask, alleviate or amplify the phenotypic effect of other variants. Such a directional relationship is present, for instance, if one gene product positively or negatively regulates the activity of the other, if its function temporally precedes that of the other, or if its function is a necessary requirement for the action of the other. Large-scale synthetic genetic interaction screens have been performed and have been predictive for functional relationships between genes in yeast, E. coli, C. elegans and metazoan cells. To date, all large-scale genetic interaction studies have been designed to detect gene-gene interactions based on the definition of an interaction as a departure from the combination of the genes’ individual effects. This statistical definition provides limited information on the directional relationship between the genes. Here, we present a method that combines genetic interactions on multiple phenotypes to reveal directional relationships, and report a dense regulatory network covering 1367 genes. It reveals the directional, temporal and logical relationships between genes and allows us to dissect regulatory networks using high-throughput intervention experimentation. The network could reconstruct the sequence of protein activities in mitosis, and revealed that the Ras pathway interacts with the SWI/SNF chromatin-remodelling complex, which we show is conserved in human cancer cells. Our work presents a powerful approach for reconstructing directional regulatory networks, and provides a resource for the interpretation of functional consequences of genomic alterations in disease.
E36 - HINT-BC - HMM-based Identification of Transcription Factor Footprints on Bias-Corrected DNase-seq Data
Eduardo Gusmao, RWTH University Aachen, Germany
Martin Zenke, RWTH University Aachen Medical School, Germany
Ivan Costa, RWTH University Aachen, Germany
Short Abstract: After the development of the high-throughput DNase I footprinting technique (DNase-seq), many computational methods have been proposed to automatically detect transcription factor (TF) footprints using such data. However, recent research shows that DNase-seq exhibits an intrinsic cleavage bias: the DNase I enzyme is more likely to digest DNA around certain k-mers. We modified our method - HINT (HMM identification of TF footprints) - to include a cleavage bias-correction step to test whether DNase I cleavage bias correction could improve computational TF footprint identification. We evaluated our novel approach, termed HINT-BC, together with seven competing methods on a comprehensive evaluation data set which comprises 83 ChIP-seq data sets. We observed that our bias correction strategy mitigated the cleavage bias by evaluating the correlation between the amount of bias and the area under the ROC curve (AUC) statistics for all TFs (p-value < 0.05). Interestingly, we find that the bias-corrected DNase-seq signal better fits the DNA sequence binding affinity of the TFs than the uncorrected signal. As expected, our method presents a significantly higher AUC than the competing methods and the uncorrected version of our method (Friedman-Nemenyi hypothesis test; p-value < 0.05). These findings suggest that proper correction of DNase-seq cleavage bias mitigates the impact of such bias on the performance of computational footprinting. Our method is available as a command-line tool within the framework of the Regulatory Genomics Toolbox (RGT).
E37 - The LAILAPS search engine – travel through the network of plant genome databases
Matthias Lange, Leibniz-Institute of Plant Genetics and Crop Plant Research, Germany
Jinbo Chen, Leibniz-Institute of Plant Genetics and Crop Plant Research, Germany
Christian Colmsee, Leibniz-Institute of Plant Genetics and Crop Plant Research, Germany
Uwe Scholz, Leibniz-Institute of Plant Genetics and Crop Plant Research, Germany
Daniel Arend, Leibniz-Institute of Plant Genetics and Crop Plant Research, Germany
Short Abstract: We will present LAILAPS, an information retrieval system to mine plant genomic data in the context of phenotypic attributes. The software has been under development since 2007 as an information retrieval (IR) research project and is now part of the European transPlant consortium, which aims to build a transnational plant genomics infrastructure.
We will show LAILAPS’ capabilities as an integrated IR environment to associate candidate genes, related –omics facts and traits of interest. Using interactive query assistance and support for evidence-ranked cross references, the scientist is guided through the network of genomic databases. Using an artificial neural network, the extracted knowledge excerpts are sorted by their relevance. We will discuss the recall and precision of this relevance prediction and argue which are the best-performing discriminating relevance features in life science database records. Finally, we will demonstrate LAILAPS’ application for knowledge extraction based on two agronomic traits.
E38 - The regulatory landscape of alcohol response in liver cells
Anat Kreimer, Columbia University, United States
Nadav Ahituv, UCSF, United States
Nir Yosef, UC Berkeley, United States
Short Abstract: Alcohol abuse is one of the leading causes of death and disability in the world, killing over 3 million people a year. The main tissue damaged is the liver, causing what is termed alcoholic liver disease (ALD). Alcohol is thought to cause significant changes in gene transcription in the liver, leading to numerous diseases. Yet, while we know that alcohol has potent effects on transcription, no study has comprehensively identified how it exerts these effects, nor systematically mapped the downstream targets and binding sites of the affected TFs.

Here we systematically study the molecular components and pathways that control the transcriptional response of human hepatocytes to ethanol using four different approaches: (1) Genomic methods to profile the regulatory landscape of hepatocytes during the response to ethanol; (2) High-throughput screens to measure the transcriptional activity of fully synthesized regulatory DNA regions using reporter mRNA constructs; (3) Novel computational algorithms to characterize the key alcohol response regulatory DNA regions, identify the key transcriptional regulators, and predict the logic of their joint activity in cis; (4) A combined computational-experimental strategy to explore how genetic variation in the regions we identify can affect susceptibility to liver diseases, by comparing against relevant genome-wide association studies, and using genome editing tools (CRISPR/Cas9 system).

Ultimately, we generate a comprehensive map of the genomic regulatory landscape and genetic pathways of alcohol response. This provides a better understanding of how alcohol can lead to various liver diseases and identifies potential therapeutic candidates.
E39 - Finding interspecies patterns of response to toxins in chemical genomics data
Oleg Moskvin, University of Wisconsin-Madison, United States
Li Hinchman, University of Wisconsin-Madison, United States
Chad Myers, University of Minnesota-Twin Cities, United States
Hirotada Mori, Nara Institute of Science and Technology, Japan
Jeffrey Piotrowski, University of Wisconsin-Madison, United States
Short Abstract: Using barcoded single-gene knockout libraries for Saccharomyces cerevisiae, Escherichia coli and Zymomonas mobilis, we performed selection tests for improved fitness in the presence of 29 chemical compounds that are relevant to the process of cellulosic biomass decomposition within a biofuel production chain. We used pathway- and metabolite-centric views of the derived genetic fitness profiles to analyze interspecies patterns of cellular responses to toxins in the growth medium. We discuss strategies for interspecies comparisons in chemical genomics that were learned from this experiment and are potentially applicable to a wider range of experimental setups and biological questions.
E40 - Reference-based and de novo assembly as a combined strategy to identify canonical transcripts and potential novel splice variants in proteogenomics
Raphael Tavares Silva, Fundação Oswaldo Cruz, Brazil
Fabio Passetti, Laboratory of Functional Genomics and Bioinformatics, Oswaldo Cruz Institute, FIOCRUZ, Brazil
Nicole Scherer, Bioinformatics Unit, Clinical Research Coordination, Instituto Nacional de Câncer (INCA), Brazil
Carlos Gil Ferreira, Clinical Research Coordination, Instituto Nacional de Câncer (INCA), Brazil
Short Abstract: Transcriptome assembly from RNA-Seq data has been a challenging field. In conjunction with the emerging field of proteogenomics, transcriptome assembly may be a promising strategy to obtain the comprehensive proteome of an organism. Reference-based and de novo assembly methods have been developed and used in transcriptome analysis for many organisms. Here, we used a strategy combining reference-based assembly to detect canonical transcripts and de novo assembly to identify potential novel splice variants, with a focus on proteogenomics. Publicly available mRNA-Seq reads from mouse liver (SRR1462347) were aligned against the mouse genome (mm9) and submitted to a reference-based assembly with Cufflinks. We filtered reads to obtain only those discarded in the reference-based assembly, which were aligned to known mouse gene sequences to cluster related reads. Each cluster was then separately used by the Trinity de novo assembler for isoform detection. From a single run, 18,355 transcripts were found with the reference-based assembly, while with the de novo assembly we could reconstruct a further 11,074 transcripts from 3,498 genes, resulting in an average of approximately 3 alternative transcripts per gene. These preliminary results indicate that this combination of reference-based and de novo assembly is a viable and promising strategy to explore transcriptome assembly with a focus on proteogenomics. We will use these data as input to the in silico transcriptome translation method developed by our group, termed SpliceProt. Financial support: INCA/MS, FIOCRUZ, CAPES, Fundação do Câncer, FAPERJ and CNPq.
E41 - De novo transcriptome assembly and in silico expression profiles of Sebastes schlegeli
Seung Jae Noh, Ph.D., Korea, Rep
Sathiyamoorthy Subramaniyam, Codes Division, Insilicogen, Inc. , Korea, Rep
Seungil Yoo, Codes Division, Insilicogen, Inc. , Korea, Rep
Jehee Lee, School of Marine Biomedical Sciences, Jeju National University, Korea, Rep
Jae-Koo Noh, Genetics & Breeding Research Center, National Fisheries Research & Development Institute , Korea, Rep
Bohye Nam, Biotechnology Research Division, National Fisheries Research & Development Institute, Korea, Rep
Short Abstract: Korean rockfish, Sebastes schlegeli, is one of the most commercially important marine finfish and is widely produced through aquaculture owing to its viviparous reproduction and its adaptability to cold and varying water conditions. In spite of its economic value, no genomic information has been available for Korean rockfish to date. In this study, we generated a complete transcriptome from three sampling conditions, i.e. control, Streptococcus iniae-infected, and PolyI:C-treated, at two time points (6 hr & 24 hr). Poly-A positive RNAs collected from five immune-related tissues, i.e. head kidney, spleen, liver, gill and peripheral blood, were sequenced on the Illumina NextSeq platform. From 30 conditions, a total of 118.4 Gbp of processed clean reads were obtained and used for de novo transcriptome assembly, resulting in 210,321 transcripts with a total size of 326 Mbp and a length range of 500-27,669 bases. Among them, 74,891 transcripts (35.6%) were functionally annotated from the UniProt, Gene Ontology, and KEGG pathway databases using BLAST2GO. In silico expression analyses between tissues and between different treatments revealed that 3,014 transcripts were regulated more than 2-fold in treated conditions relative to controls. With the aid of annotations and predictions, 46 transcripts associated with pattern recognition receptors (lectins, TLRs, NLRs) and immune signaling pathways (NF-kappaB, TNFalpha, cytokines) were validated by real-time PCR. In addition, polymorphic microsatellite markers were predicted by different computational methods. To the best of our knowledge, this is the first report of a comprehensive transcriptome and expression profiles for Korean rockfish, which can aid further genomic and functional studies.
E42 - Gene Prioritization through Bayesian matrix factorization
Pooya Zakeri, Katholieke Universiteit Leuven, Belgium
Jaak Simm, 1-Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing, and Data Analytics, KU Leuven. 2- iMinds Medical IT, Kasteelpark Arenberg 10, box 2446, 3001 Le, Belgium
Adam Arany, 1-Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing, and Data Analytics, KU Leuven. 2- iMinds Medical IT, Kasteelpark Arenberg 10, box 2446, 3001 Le, Belgium
Sara Elshal, 1-Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing, and Data Analytics, KU Leuven. 2- iMinds Medical IT, Kasteelpark Arenberg 10, box 2446, 3001 Le, Belgium
Yves Moreau, 1-Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing, and Data Analytics, KU Leuven. 2- iMinds Medical IT, Kasteelpark Arenberg 10, box 2446, 3001 Le, Belgium
Short Abstract: In biology, there is often a need to identify the most promising genes among a large list of candidate genes for further investigation. While a single data source might not be effective enough, fusing several complementary genomic data sources results in more accurate prediction. Moreover, fusing the phenotypic similarity of diseases and sharing information about known disease genes across both diseases and genes through a multi-task approach enables us to handle gene prioritization for diseases with very few known genes (or even no known gene) and for genes with limited available information.
The standard approaches for gene prioritization often model each disease individually, which fails to capture the common patterns in the data. This motivates us to formulate the gene prioritization task as a factorization of an incompletely filled gene-disease matrix, where the objective is to predict unknown values.
We propose an extended Bayesian matrix factorization that can work with multiple side information sources for the gene prioritization task, which allows ranking of genes and diseases that lie outside the gene-disease matrix. In our proposed framework we are also able to integrate both genomic data sources and disease information, whereas earlier approaches for gene prioritization are limited to fusing genomic information. Moreover, although existing gene prioritization methods have difficulty with the treatment of missing values, our approach can handle missing values easily without the typical missing value imputation.
Experimental results on the updated version of the Endeavour benchmark demonstrate that our proposed model can effectively improve on the accuracy of the state-of-the-art gene prioritization model.
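As a minimal illustration of the factorization idea described in this abstract, the sketch below fits a low-rank model to only the observed entries of a toy gene-disease matrix and then scores an unobserved pair, so no missing value imputation is needed. It is not the authors' Bayesian model and uses no side information; it substitutes plain regularized stochastic gradient descent. All names (`factorize`, `predict`, `observed_mse`) and the toy data are hypothetical.

```python
import random


def factorize(observed, n_genes, n_diseases, rank=2, lr=0.05, reg=0.01,
              epochs=500, seed=0):
    """Fit gene and disease factors using only the observed matrix entries.

    `observed` maps (gene_index, disease_index) -> association score.
    Missing entries are simply absent from the dict and never touched.
    """
    rng = random.Random(seed)
    # Small random initial factors; one row per gene / per disease.
    G = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_genes)]
    D = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_diseases)]
    for _ in range(epochs):
        for (i, j), y in observed.items():
            err = y - sum(g * d for g, d in zip(G[i], D[j]))
            for k in range(rank):
                g, d = G[i][k], D[j][k]
                # Gradient step on squared error with L2 regularization.
                G[i][k] += lr * (err * d - reg * g)
                D[j][k] += lr * (err * g - reg * d)
    return G, D


def predict(G, D, i, j):
    """Score a gene-disease pair, observed or not, as a dot product."""
    return sum(g * d for g, d in zip(G[i], D[j]))


def observed_mse(G, D, observed):
    """Mean squared error over the observed entries only."""
    errors = [(y - predict(G, D, i, j)) ** 2 for (i, j), y in observed.items()]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy 4-gene x 2-disease matrix; the entry (3, 0) is unobserved.
    observed = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 0.0,
                (2, 0): 0.0, (2, 1): 1.0, (3, 1): 1.0}
    G, D = factorize(observed, n_genes=4, n_diseases=2)
    print("fit error on observed entries:", observed_mse(G, D, observed))
    print("score for unobserved pair (3, 0):", predict(G, D, 3, 0))
```

Ranking candidate genes for a disease then amounts to sorting genes by their predicted score in that disease's column.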
E43 - Co-clustering of the phenome-genome in mice for causative mutation discovery
Michelle Simon, Medical Research Council,
Saumya Kumar, Medical Research Council,
Simon Greenaway, Medical Research Council,
Andrew Blake, Medical Research Council,
Siddharth Sethi, Medical Research Council,
Ann-Marie Mallon, Medical Research Council,
Short Abstract: The risk factors that underlie the metabolic reactions, cell signaling and developmental pathways in human aging remain largely unknown. Discovery of candidate genes and phenotypes associated with aging in mice will provide a wealth of information on age-related changes and gene molecular function. The phenotype-driven screen after chemical mutagenesis of males with N-ethyl-N-nitrosourea (ENU) at MRC Harwell has been a productive method for identifying mouse models of aging.

Mutagenized mice are aged for 18 months with phenotypic assessments repeated at regular intervals; to date, more than 8,000 G3 mice have completed the screen. Using robust statistical techniques and ontology terms for detecting phenodeviants, we can determine different types of aging phenotypes, from early to late onset and progressive, chronic or idiopathic phenotypes. Aging mice harboring interesting phenotypic changes are subsequently characterised by whole genome next generation sequencing (WGS) and the causative mutation is identified.

Our aim is to effectively integrate all our datasets to discover interesting and novel mouse models of human disease; here we present our efforts in clustering the phenotypes identified in each pedigree alongside the genotype data found in the G1 mice. This study provides basic data on the large-scale phenotyping of aging ENU mice and their associated genomes. Aging data can be found at
E44 - Genomics analysis of respiration in the microbiota of the human intestine
Dmitry Ravcheev, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Luxembourg
Ines Thiele, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Luxembourg
Short Abstract: The intestinal microbiota has been intensively studied in recent years; however, the respiratory capacities of the gut microbiota have been investigated for only a limited number of model organisms. Here, we present a systematic analysis of respiration genes encoded by the genomes of human gut inhabitants. Our study included an analysis of genes for ATP synthases, respiratory reductases for electron acceptors, and quinone biosynthesis.
We applied genomic analysis to 254 microorganisms commonly found in the human gut. The investigated genomes belong mostly to the Firmicutes, Bacteroidetes, Proteobacteria, Actinobacteria, and Fusobacteria phyla. We found ATP synthases of F- and/or V-type in all analyzed genomes. Additionally, the reference genomes demonstrated perceptible variations in the distribution of respiratory reductases. The analysis of the studied genomes revealed aerobic reductases and anaerobic reductases for tetrathionate, thiosulfate, polysulfide, sulfite, adenylyl sulfate, heterodisulfides, fumarate, trimethylamine N-oxide, dimethyl sulfoxide, nitrate, nitrite, nitrogen oxide, nitrous oxide, selenate, and arsenate. We also predicted one novel aerobic reductase and one novel anaerobic thiosulfate reductase.
The analysis of the quinone pathway distribution also revealed one alternative enzyme for ubiquinone biosynthesis and three alternative enzymes for menaquinone biosynthesis. Additionally, we predicted three previously unknown enzymes to be involved in the futalosine pathway for menaquinone biosynthesis. Overall, the comparison of our predictions with known distributions of quinone biosynthetic pathways and of respiratory reductases showed an agreement of 88.6% for the studied genomes.
Taken together, this work substantially expands our knowledge of respiratory pathways in bacteria and of the physiology of the human gut microbiome.
