Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


Accepted Posters

If you need assistance please contact submissions@iscb.org and provide your poster title or submission ID.


Track: RNA

Session A-189: Evolution of Mutually Exclusive Splicing in Drosophila
COSI: RNA
  • Ivan Molodtsov, Skolkovo Institute of Science and Technology, Russia
  • Dmitri Pervouchine, Skolkovo Institute of Science and Technology, Russia

Short Abstract: Mutually exclusive splicing is an important splicing pattern that gives rise to functionally distinct proteins. It appears to be highly specific, since it gives rise to mRNA variants that contain exactly one exon from a set of mutually exclusive exons. Several mechanisms have been recognized to drive mutually exclusive splicing. Of these, the formation of long-range secondary structures in mRNA is of particular interest, since it can explain mutually exclusive splicing of numerous cassette exons.
Our goal is to examine mutually exclusive splicing events in a set of closely related Drosophila species. To this end, we built a tool for joint visualization of splicing events and competitive RNA secondary structures. We hypothesize that since mutually exclusive exons in most cases result from tandem exon duplication, such competitive regulatory elements may also result from duplications of selector sequences.
Our hypothesis suggests an evolutionary mechanism that generates mutually exclusive splicing patterns by tandem exon duplication together with duplication of neighbouring selector sequences, which creates competitive RNA secondary structures. In theory, this mechanism might also apply to other complex splicing events such as multiple exon skipping, where tandem exon duplication would happen within the base-pairing region, leading to a mutually inclusive splicing pattern. However, here we show that in Drosophila this mechanism is likely to be unique to mutually exclusive exons. For instance, in other complex splicing events such as multiple exon skipping there is little or no evidence of tandem exon duplication.

Session A-191: A Snapshot of the Escherichia coli RNA-RNA Interactome
COSI: RNA
  • Christoph Schaal, University of Stuttgart, Institute of Biochemical Engineering, Computational Biology, Germany
  • Richard Schäfer, University of Stuttgart, Institute of Biochemical Engineering, Computational Biology, Germany
  • Björn Voß, University of Stuttgart, Institute of Biochemical Engineering, Computational Biology, Germany

Short Abstract: Recent years saw an explosion in the discovery of ncRNAs in all kingdoms of life. ncRNAs commonly interact with other RNAs, mainly mRNAs, by complementary base pairing to carry out their regulatory function. The diversity and large number of ncRNAs demand a high-throughput method to identify and unravel their targets and functions. Psoralen crosslinking is a very promising tool for the detection of RNA-RNA interactions in vivo and in a transcriptome-wide fashion. Direct-Duplex-Detection methods (DDD) [1] rely on the psoralen-mediated reversible crosslinking of interacting RNAs, their ligation to obtain a single RNA molecule, and subsequent sequencing and mapping of reads to the genome. The duplex RNAs, obtained after nuclease digestion to enrich for crosslinked RNAs, may often be (nearly) blunt-ended and thus a poor substrate for the ligation of the 5´-end to the 3´-end at one side of the duplex. We improved the ligation efficiency by adding nucleotides at the 3´-ends through terminal deoxynucleotidyl transferase (TdT) treatment. The overhang thereby gains the flexibility necessary for efficient ligation. Here we present the DDD protocol optimized in this way and the results we obtained for the RNA-RNA interactome of E. coli under standard growth conditions. In comparison to results from another group [2], we detected more interactions in total and recovered more known interactions when compared with sRNATarBase, which holds experimentally verified RNA-RNA interactions. References: 1. Weidmann et al. (2016) Trends in Biochemical Sciences, 41(9). 2. Liu et al. (2017) BMC Genomics, 18, 343.

Session A-193: Analysis of chimeric reads for the detection of RNA-RNA Interactions
COSI: RNA
  • Richard Schäfer, University of Stuttgart, Institute of Biochemical Engineering, Computational Biology, Germany
  • Christoph Schaal, University of Stuttgart, Institute of Biochemical Engineering, Computational Biology, Germany
  • Björn Voß, University of Stuttgart, Institute of Biochemical Engineering, Computational Biology, Germany

Short Abstract: The ability of RNA to base-pair with itself and other RNAs is crucial for its function in vivo. While there are reasonable approaches to map RNA secondary structures genome-wide, understanding how different RNAs interact to carry out their regulatory functions requires mapping of intermolecular base pairs. Recently, different strategies to detect RNA-RNA duplexes in cells, termed direct duplex detection (DDD) methods (reviewed in [1]), have been developed. Common to all is that they rely on psoralen-mediated in vivo crosslinking and RNA Proximity Ligation (RPL) [2], which covalently links the interacting RNA strands. Subsequently, the RNA is sequenced using RNA-seq and analyzed with respect to inter- and intramolecular RNA-RNA interactions. The methods used so far implement strict algorithms that lack sophisticated processing of the reads and tend to miss captured interactions. In this work, we present a general pipeline for the inference of RNA-RNA interactions from raw DDD reads. We applied our pipeline to data from different DDD methods and compared our results to the original ones. This showed that our method, owing to its tolerant primary data analysis, reconstructs more information about known and novel RNA-RNA interactions that would otherwise have been lost. To ensure comparability between established and future DDD methods, there is a need for a standardized pipeline that analyzes chimeric reads to infer inter- and intramolecular interactions and guarantees the reproducibility of the analysis. References: 1. Weidmann C.A. et al. (2016) Trends in Biochemical Sciences. 2. Ramani V. et al. (2015) Nature Biotechnology.

Session A-195: miRNAmotif – a tool for prediction of pre-miRNA:protein interactions
COSI: RNA
  • Martyna Urbanek, Institute of Bioorganic Chemistry PAS, Poland
  • Edyta Jaworska, Institute of Bioorganic Chemistry PAS, Poland
  • Wlodzimierz Krzyzosiak, Institute of Bioorganic Chemistry PAS, Poland

Short Abstract: In human cells, mature microRNAs (miRNAs) are produced from primary precursors (pri-miRNAs) through an intermediate pre-miRNA step, with or without the use of the canonical protein machinery that includes Drosha/DGCR8 and Dicer. The complexity of the miRNA maturation process arises from the involvement of multiple other regulatory proteins that bind directly to distinct miRNA precursors in a sequence- or structure-dependent manner. Thus far, a number of proteins have been shown to bind to the terminal loop of miRNA precursors (e.g., hnRNPA1, HuR, KSRP, Lin28, MBNL1, MCPIP1), and other auxiliary proteins have been demonstrated to interact with the stem portion or flanking sequences of pre- and pri-miRNAs. In plants, different proteins are involved in miRNA biogenesis, and in both animal and plant systems multiple auxiliary components of the miRNA biogenesis machinery remain to be identified. To facilitate their discovery, we present a web server that searches for miRNA precursors that can be recognized by diverse RNA-binding proteins, based on known sequence motifs. The database used by the server contains known human, murine and A. thaliana pre-miRNAs. The server may also be used to predict new RNA-binding protein motifs from a group of user-provided sequences. We show examples of miRNAmotif applications, presenting precursors that contain the motif recognized by Lin28 and predicting motifs within pre-miRNAs that are recognized by DDX1.

Session A-197: SMARTIV: a Novel Method for RNA Sequence and Structure Motif Discovery from In-vivo Binding Data
COSI: RNA
  • Maya Polishchuk, Technion - Israel Institute of Technology, Israel
  • Inbal Paz, Technion - Israel Institute of Technology, Israel
  • Yael Mandel-Gutfreund, Technion - Israel Institute of Technology, Israel

Short Abstract: RNA binding proteins (RBPs) are essential for cell processes. Many RBPs recognize specific RNA binding sites characterized by specific short sequences known as binding motifs. Besides primary RNA sequence, the structure of the RNA target is known to play a major role in RBP-RNA recognition. Inferring both sequence and structure preferences of RBPs remains a big challenge. Here we present a novel method, named SMARTIV (Sequence and Structure Motif Enrichment Analysis for Ranked RNA daTa generated from In-Vivo binding experiments), for enriched motif discovery from in-vivo high-throughput RNA binding data. SMARTIV uses ranked numerical sequence scores from results of CrossLinking and ImmunoPrecipitation (CLIP) experiments and predicted secondary structure of the sequences to generate motifs. SMARTIV motifs are concisely represented in a combined sequence and structure 8-letter alphabet ACGUacgu (upper case for unpaired and lower case for paired nucleotides). SMARTIV is an extremely fast algorithm representing motif sequence and structure in an informative single logo and is available both as a stand-alone program and a user-friendly web-server (http://smartiv.technion.ac.il). The method is based on the ranked CLIP-data with no requirement to split the input data into bound and unbound datasets. SMARTIV provides data-driven p-value assessment for the detected motifs. We tested our method on CLIP-seq data for a variety of RBPs and show that our results are highly consistent with previously known sequence and structure binding preferences of the proteins.
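The combined 8-letter alphabet described above can be illustrated with a minimal sketch. It assumes the predicted secondary structure is supplied in dot-bracket notation, which the abstract does not specify, and the function name is hypothetical rather than part of SMARTIV.

```python
def encode_sequence_structure(seq: str, dot_bracket: str) -> str:
    """Merge an RNA sequence and its structure into one ACGUacgu string.

    Assumption: the structure is given in dot-bracket notation, where
    '.' marks an unpaired base and '(' or ')' marks a paired base.
    """
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    encoded = []
    # Normalise to upper-case RNA letters before applying the case rule.
    for base, state in zip(seq.upper().replace("T", "U"), dot_bracket):
        if base not in "ACGU":
            raise ValueError(f"unexpected base: {base!r}")
        if state == ".":
            encoded.append(base)          # unpaired -> upper case
        elif state in "()":
            encoded.append(base.lower())  # paired -> lower case
        else:
            raise ValueError(f"unexpected structure symbol: {state!r}")
    return "".join(encoded)


if __name__ == "__main__":
    # A toy hairpin: three paired bases, a four-base loop, three paired bases.
    print(encode_sequence_structure("GGGAAAUCCC", "(((....)))"))
```

With this encoding a single string carries both layers of information, so ordinary k-mer counting over the 8-letter alphabet captures sequence and pairing state together.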

Session A-199: Sparse sequencing-based profiles of isomiRs for non-invasive diagnostics of inflammatory phenotypes
COSI: RNA
  • Matthias Hübenthal, Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
  • Simonas Juzėnas, Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
  • Sebastian Zeißig, Department of Internal Medicine I, University Hospital Schleswig-Holstein, Kiel, Germany
  • Nina Strüning, Department of Internal Medicine I, University Hospital Schleswig-Holstein, Kiel, Germany
  • Andreas Keller, Chair for Clinical Bioinformatics, Saarland University, Saarbrücken, Germany
  • Dominik Schulte, Department of Internal Medicine I, University Hospital Schleswig-Holstein, Kiel, Germany
  • Hanja Kramer, Department of Internal Medicine I, University Hospital Schleswig-Holstein, Kiel, Germany
  • Mauro D’amato, Department of Medicine Solna, Karolinska Institutet, Stockholm, Sweden
  • Jonas Halfvarson, Department of Gastroenterology, Faculty of Medicine and Health, Örebro University, Örebro, Sweden
  • Stefan Schreiber, Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
  • Georg Hemmrich-Stanisak, Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
  • Andre Franke, Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany

Short Abstract: Inflammatory bowel disease (IBD) is a chronic intestinal disease entity comprising two major subtypes: Crohn’s disease (CD) and ulcerative colitis (UC). Because of extensive heterogeneity in disease presentation, behavior, and response to treatment, the diagnostic distinction of CD and UC remains a clinical challenge. In this study we aimed to evaluate the suitability of sequencing-based isomiR expression profiling and state-of-the-art machine learning techniques for non-invasive diagnostic tests. Whole blood was drawn from a total of 672 German and Swedish individuals in order to profile isomiRs for the following traits: CD (138 treated, 44 untreated samples), UC (108 treated, 49 untreated samples) and different types of controls (333 samples). After normalization, sequencing read count data were corrected for biological variation (country of origin and sex) as well as non-biological experimental variation (sequencing machine, run, chemistry and technician performing library preparation) using empirical Bayes adjustments. Subsequently, various types of (penalized) support vector machines (SVMs) were employed to solve binary classification problems, considering main diagnoses (CD, UC, controls) as well as subphenotypes (CD location, CD behavior, UC extent), and were evaluated with respect to model performance, stability and sparsity. In terms of median Matthews correlation coefficient (MCC), the resulting models showed remarkable predictive performance, estimated at 1.00 (main diagnoses) and ranging from 0.66 to 0.76 (CD behavior), 0.68 to 0.76 (CD location) and 0.69 to 0.76 (UC extent), incorporating a median number of 754 to 1298 isomiRs (non-penalized models) and 1 to 39 isomiRs (penalized models).
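For readers unfamiliar with the performance measure reported above, the following is a minimal sketch of the standard Matthews correlation coefficient for a binary confusion matrix. The function name and the example counts are illustrative only and are not taken from the study.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (no better than
    chance) to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(matthews_corrcoef(tp=10, tn=10, fp=0, fn=0))
    print(round(matthews_corrcoef(tp=6, tn=3, fp=1, fn=2), 3))
```

Unlike plain accuracy, the MCC uses all four cells of the confusion matrix, which makes it a reasonable choice when case and control groups differ in size.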

Session A-201: High-resolution miRNome analysis of the human peripheral blood
COSI: RNA
  • Simonas Juzenas, Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Germany
  • Geetha Venkatesh, Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Germany
  • Matthias Hübenthal, Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Germany
  • Zhipei Gracie Du, Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Germany
  • Maren Paulsen, Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Germany
  • Philip Rosenstiel, Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Germany
  • Philipp Senger, Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Germany
  • Martin Hofmann-Apitius, Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Germany
  • Andreas Keller, Clinical Bioinformatics, Saarland University, Germany
  • Limas Kupcinskas, Institute for Digestive Research, Academy of Medicine, Lithuanian University of Health Sciences, Lithuania
  • Andre Franke, Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Germany
  • Georg Hemmrich-Stanisak, Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Germany

Short Abstract: MicroRNAs (miRNAs) are small noncoding regulatory RNAs that are involved in complex regulatory processes, including inhibition of translation and modulation of transcript stability. These molecules are implicated in the pathogenesis of oncological, immune-related, cardiac and other diseases. The discovery of circulating miRNAs in serum, plasma, and other body fluids has attracted great interest in biomarker research. To date, numerous studies have reported circulating miRNAs as biomarkers for a variety of diseases. However, the origin of these circulating miRNAs has been poorly examined. With this study, we provide a comprehensive reference dataset of detailed miRNA expression profiles from seven types of human peripheral blood cells, serum, exosomes and whole blood. The peripheral blood cells from buffy coats were typed and sorted using FACS/MACS. The overall dataset was generated from 450 small RNA libraries using high-throughput sequencing. We define the cell lineage-specific miRNA/isomiR expression and modification patterns. Furthermore, we identify novel cell-type-specific miRNA candidates. The study provides the most comprehensive contribution to date towards a complete miRNA catalogue of human peripheral blood, which can be used as a reference for future studies. The dataset is publicly available on GEO and can also be explored interactively at http://134.245.63.235/ikmb-tools/bloodmiRs.

Session A-203: A comprehensive meta-analysis of the induced immediate early response reveals a conserved order of promoter activation
COSI: RNA
  • Annalaura Vacca, MRC HGU, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom
  • Stuart Aitken, MRC HGU, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom
  • Colin A Semple, MRC HGU, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom

Short Abstract: Background: Human cells respond to a broad range of stimuli with a characteristic burst of transcription within minutes at many sites across the genome; this underlies differentiation, responses to cellular stress, and inflammation. The earliest events involve the transient activation of the promoters of immediate-early genes (IEGs), a special class of genes dysregulated in developmental diseases and cancer [1, 2]. However, the core IEG repertoire active across cellular responses and the mechanisms underlying IEG induction remain controversial. Results: Here we present a rigorous meta-analysis of 8 genome-wide FANTOM5 CAGE time course expression datasets [3]. These unique datasets measure the responses of different human cell types (including embryonic, immune and cancer cells) to many different stimuli over the first 5 hours post stimulus. Using novel approaches, we classify promoter expression profiles into several predefined patterns, including a transient early peak representing IEG dynamics, to identify the likely core complement of IEGs. These genes are strongly enriched for known IEGs and relevant Gene Ontology terms, but also include a number of compelling novel IEG candidates, including transcription factors and noncoding RNAs. However, comparing genes classified as IEGs between datasets, we found differences in their numbers and identity, which explain the heterogeneity and inconsistencies of the literature related to the immediate-early response. Furthermore, we show that the promoters of many candidate and known IEGs are activated in a consistent order across datasets, suggesting deeper levels of conservation in the regulation of immediate-early responses. Conclusions: Here we exploit unusual, densely sampled promoter expression datasets using novel approaches to estimate the core IEG response of human cells to cellular stimuli. We discuss surprising candidate IEGs that may be important new factors in disease-related processes and the potential roles of non-coding RNAs in the immediate-early response. We also uncover unexpected conservation in the temporal order of promoter activation across stimuli and cell types.
References: 1. Healy, S., P. Khan, and J.R. Davie, Immediate early response genes and cell transformation. Pharmacology & Therapeutics. 137(1): p. 64-77 (2013). 2. Fowler, T., R. Sen, and A.L. Roy, Regulation of primary response genes. Molecular Cell. 44(3): p. 348-360 (2011). 3. Lizio, M., et al., Update of the FANTOM web resource: high resolution transcriptome of diverse cell types in mammals. Nucleic Acids Research. 45(D1): p. D737-D743 (2017).

Session A-205: Revealing the role of Hfq C-terminus in sRNA regulation through RNAseq profiling
COSI: RNA
  • Chin-Hsien Tai, National Cancer Institute, NIH, United States
  • Kumari Kavita, National Cancer Institute, NIH, United States
  • Aixia Zhang, NICHD, NIH, United States
  • Gisela Storz, NICHD, NIH, United States
  • Susan Gottesman, National Cancer Institute, United States

Short Abstract: Background and Methods: Hfq is a homohexameric RNA chaperone that stabilizes small noncoding RNAs (sRNAs) and facilitates riboregulation by promoting sRNA base pairing with target mRNAs. Recent studies showed that the C-terminus contributes to the stability of a subset of sRNAs (Class II sRNAs) and to the release of RNA from Hfq. Here we further investigate the global effect of the C-terminus on Hfq interactions in E. coli by comparing RNAseq data from wild-type Hfq and the C-terminus-deleted Hfq65 mutant, using both total RNA and Hfq-immunoprecipitated (IP) RNA samples. Results: Comparing the IP Hfq65 mutant to IP wild-type samples, 82 genes are 2-fold down-regulated in Hfq65. Among them, 1 is an antisense RNA, 3 are tRNAs, and 12 are Hfq-binding sRNAs, including 3 of 4 Class II sRNAs. Many of the sRNA targets are among the down-regulated mRNAs. Further investigation is being undertaken to confirm these findings and to examine the effects on tRNAs. Differences were also observed for the regions flanking mature tRNAs, possibly reflecting changes in tRNA processing or independent roles of these RNA regions. Conclusion: About 1/5 of the sRNAs in the database used here are 2-fold down-regulated in Hfq65. In vivo results thus far are consistent with the in vitro role of the C-terminus in modulating on and off rates for RNAs, but suggest that this is not rate-limiting for steady-state RNA levels for most genes. A better computational tool is needed to capture and analyze RNAseq signals for tRNA precursors and non-coding regions.

Session A-207: Scalable and accessible clustering of ncRNAs based on sequence and secondary structures
COSI: RNA
  • Milad Miladi, University of Freiburg, Germany
  • Eteri Sokhoyan, University of Freiburg, Germany
  • Torsten Houwaart, University of Freiburg, Germany
  • Rolf Backofen, Albert-Ludwigs-University Freiburg, Germany
  • Bjoern Gruening, University of Freiburg, Germany

Short Abstract: A large portion of the human transcriptome does not code for proteins. These non-coding RNAs (ncRNAs) have been shown to be associated with a large range of important cellular functions. Clustering of RNA sequences is currently one of the prevalent approaches for detecting and annotating the function of putative ncRNAs and regulatory elements. The structure of ncRNAs often plays a crucial role and is usually better conserved than the sequence, which makes them computationally more expensive to compare with traditional algorithms. Given the pervasive availability of transcriptomic and metatranscriptomic data generated by high-throughput sequencing (HTS), efficient and easy-to-use approaches are in high demand. Here we introduce Galaxy-GraphClust, a workflow for large-scale clustering of RNAs based on sequence and structure similarity in linear time, provided via the Galaxy framework. This extension of GraphClust considerably simplifies clustering and analysing large amounts of RNA sequences by making it possible to: a) interactively perform the clustering via a web interface, b) support back-end computation on diverse platforms, ranging from personal computers to large-scale computer clusters and the cloud, c) run both HTS transcriptomic analysis and structural clustering in a homogeneous manner. The highly modular design of Galaxy-GraphClust has made it possible to enhance the clustering performance by offering alternative RNA structure prediction and annotation tools and by incorporating chemical structure probing data. We also present the applicability of the tool for predicting conserved structural motifs in the presence of noisy unrelated sequences and long surrounding contexts. Availability: https://github.com/BackofenLab/docker-galaxy-graphclust

Session A-209: isomiR-SEA: miRNA and isomiR expression level detection in seven RNA-Seq datasets
COSI: RNA
  • Gianvito Urgese, Politecnico di Torino, Italy
  • Giulia Paciello, Politecnico di Torino, Italy
  • Enrico Macii, Politecnico di Torino, Italy
  • Andrea Acquaviva, Politecnico di Torino, Italy
  • Elisa Ficarra, Politecnico di Torino, Italy

Short Abstract: Massively parallel sequencing of transcriptomes revealed the presence of miRNA variants named isomiRs. The sequence variations identified within isomiR molecules, relative to the miRNA sequences with which they share the same seed, can affect their targeting activity, with consequences for gene expression and a potential impact on multi-factorial diseases. miRNAs are considered good biomarkers, making their adoption for disease characterization highly desirable. Several methodologies and tools have been devised to identify and quantify miRNAs from sequencing data. However, all these tools are built on top of general-purpose alignment algorithms, providing results of limited accuracy and no information concerning isomiRs and conserved miRNA-mRNA interaction sites. To overcome these limitations we developed the isomiR-SEA algorithm. By implementing a miRNA-specific alignment procedure, isomiR-SEA provides accurate miRNA/isomiR expression levels and a precise evaluation of the conserved interaction sites. First, isomiR-SEA identifies miRNA seeds within the tags. If the seed is found, the alignment is extended and the positions of the encountered mismatches are recorded. Then, the collected information is evaluated to distinguish between miRNAs and isomiRs and to assess the conservation of the interaction sites. isomiR-SEA performance was assessed on 7 public RNA-Seq datasets. 40% of reads attributed to miRNAs (189M) come from mature miRNAs, 50% derive instead from 3’ isomiRs, and the remaining reads account for 5’/SNP isomiRs or combinations of them. Furthermore, about 2% of reads lost some interaction sites. This demonstrates the importance of a miRNA-specific alignment algorithm for correctly evaluating miRNA targeting activity. For further information, please see eda.polito.it/isomir-sea/
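The seed-first alignment idea described above can be sketched as follows. This is a simplified illustration, not the isomiR-SEA implementation: it assumes the seed is nucleotides 2-8 of the reference miRNA, allows only ungapped extension, and uses hypothetical function names and a toy reference sequence.

```python
# Assumption: the seed spans nucleotides 2-8 (0-based slice 1:8).
SEED_START, SEED_END = 1, 8


def align_tag_to_mirna(tag: str, mirna: str):
    """Locate the miRNA seed in a tag, then extend without gaps.

    Returns (offset, mismatches), where offset is the tag position that
    lines up with miRNA position 0 (negative if the tag starts inside
    the miRNA) and mismatches lists 0-based miRNA positions that differ.
    Returns None when the seed is absent from the tag.
    """
    seed = mirna[SEED_START:SEED_END]
    hit = tag.find(seed)  # first exact seed occurrence only
    if hit < 0:
        return None
    offset = hit - SEED_START
    mismatches = []
    for i, base in enumerate(mirna):
        j = offset + i
        if 0 <= j < len(tag) and tag[j] != base:
            mismatches.append(i)
    return offset, mismatches


def classify_tag(tag: str, mirna: str) -> str:
    """Label a tag as mature miRNA or as a 5'/3'/SNP isomiR combination."""
    result = align_tag_to_mirna(tag, mirna)
    if result is None:
        return "no seed"
    offset, mismatches = result
    labels = []
    if offset != 0:
        labels.append("5' isomiR")   # tag start differs from reference
    if offset + len(mirna) != len(tag):
        labels.append("3' isomiR")   # tag end differs from reference
    if mismatches:
        labels.append("SNP isomiR")  # internal substitution(s)
    return " + ".join(labels) if labels else "mature miRNA"


if __name__ == "__main__":
    reference = "UGAGGUAGUAGGUUGUAUAGUU"  # toy reference sequence
    for tag in (reference, reference + "A", reference[1:]):
        print(tag, "->", classify_tag(tag, reference))
```

Anchoring on the seed before extending keeps end variation and internal mismatches separate, which is what allows 5', 3' and SNP variants to be told apart in this sketch.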

Session A-211: PureCLIP: capturing target-specific protein-RNA interaction footprints from single-nucleotide CLIP-seq data
COSI: RNA
  • Sabrina Krakau, Max Planck Institute for Molecular Genetics, Germany
  • Hugues Richard, University Pierre and Marie Curie, France
  • Annalisa Marsico, Max Planck Institute for Molecular Genetics, Germany

Short Abstract: RNA binding sites for a protein of interest can now be detected genome-wide and at high resolution thanks to the development of CLIP-seq technologies. Among these methods, iCLIP and eCLIP provide single-nucleotide resolution and are particularly powerful in characterizing protein-RNA interaction landscapes. However, current methods do not address the problems of peak calling and crosslink site detection simultaneously, and fail to model the various sources of bias, such as transcript abundances or unspecific crosslink (CL) sequence motifs. We developed an approach based on a non-homogeneous hidden Markov model, which calls individual crosslink sites taking into account both regions enriched in protein-bound fragments and the specifics of iCLIP truncation patterns. Our modeling framework also incorporates information from various covariates, such as RNA abundances or information from CL-motifs. We extensively validated the superiority of our approach over other common strategies, both within a realistic iCLIP simulation setup (using real RNA-seq data) and on five published iCLIP/eCLIP datasets where the protein's predominant binding regions are known. Over a large range of simulation parameters, our tool recovers binding sites with better accuracy than other methods. Further, on all real datasets our approach is more precise in determining the bona fide binding sites. Our results show the importance of combining peak calling and crosslink site detection when analyzing iCLIP or eCLIP data. We also show that the incorporation of covariates (input signals, as well as CL-motifs) clearly improves the accuracy of the calls.

Session A-213: Deep-learning-based prediction of functional RNA binding sites using primary sequence
COSI: RNA
  • Daria Romanovskaia, Skolkovo Institute of Science and Technology, Russia
  • Dmitry Svetlichnyy, Skolkovo Institute of Science and Technology, Russia
  • Dmitri Pervouchine, Skolkovo Institute of Science and Technology, Russia

Short Abstract: Background: RNA-binding proteins (RBPs) play an important role in alternative splicing and other RNA processing steps. Mutations that occur in splice sites are the major risk factor for genetic diseases such as neurological disorders or cancer. Experimental validation of a mutation’s impact is expensive and time-consuming. Thus, in silico predictions using primary RNA sequence provide a convenient way to score and prioritize mutations [1], but require a highly accurate computational model capturing the RBP binding “code” of cis-regulatory RNA elements. Results: Using a large collection of functional RBP binding sites derived for K562 and HepG2 cell lines from eCLIP-seq experiments within the ENCODE project [2], we developed a machine learning method that predicts RBP binding. The dataset contains nearly 80 RBPs profiled in a genome-wide manner with the eCLIP-seq method. As a background, we used regions sampled randomly from the human genome and matched by GC content to the foreground set. Notably, we used an unbalanced dataset with 20-100 times more negative regions in order to obtain a classifier with high specificity. To train our model we used convolutional neural networks. Several topologies were tested in order to select the network structure yielding the highest predictive power. To this end, we varied the filter size, the number of filters (to learn long RNA binding motifs from primary sequence), and the number of convolutional layers to select the best-performing models for predicting binding of a given RBP. Conclusion: Performance of our models varies depending on the RBP. We also found that, for the majority of RBPs, the optimal network topology converges to a certain architecture, and that the number of filters, rather than the number of convolutional layers, is the most sensitive hyperparameter with respect to classification performance. We applied our classifiers to identify variants that may affect binding of regulatory proteins to RNA.
References: [1] Li, X., Kazan, H., Lipshitz, H. D., & Morris, Q. D. (2014). Finding the target sites of RNA-binding proteins. WIREs RNA, 5, 111–130. http://doi.org/10.1002/wrna.1201 [2] Redesigning CLIP for efficiency, accuracy and speed. Nature Methods, 13(6), 482. http://doi.org/10.1038/NMETH.3870
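The GC-matched, deliberately unbalanced background set described in this abstract could be assembled roughly as sketched below. The function names, the tolerance value and the greedy matching strategy are assumptions made for illustration; the abstract does not describe how the matching was actually performed.

```python
import random


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def sample_gc_matched_background(foreground, candidates, ratio=20,
                                 tolerance=0.05, seed=0):
    """Greedily pick up to `ratio` negatives per positive region.

    Assumptions: a candidate matches when its GC content lies within
    `tolerance` of the positive region, and each candidate is used at
    most once. `ratio` mirrors the 20-100x imbalance in the abstract.
    """
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)  # random sampling order over candidate regions
    used = set()
    background = []
    for region in foreground:
        target = gc_content(region)
        picked = 0
        for idx, cand in enumerate(pool):
            if picked == ratio:
                break
            if idx in used:
                continue
            if abs(gc_content(cand) - target) <= tolerance:
                used.add(idx)
                background.append(cand)
                picked += 1
    return background


if __name__ == "__main__":
    positives = ["GGCCAATT"]  # toy bound region, GC = 0.5
    pool = ["GCGCATAT", "AAAAAAAA", "GGGGCCCC", "ACGTACGT"]
    print(sample_gc_matched_background(positives, pool, ratio=2))
```

Matching on GC content keeps a classifier from separating bound and unbound regions by base composition alone, so that what it learns is more likely to reflect binding motifs.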

Session A-215: Fast and efficient alignment-free comparative genomic screen for structured RNAs with DotcodeR
COSI: RNA
  • Yuki Kato, Osaka University, Japan
  • Jan Gorodkin, University of Copenhagen, Denmark
  • Jakob Hull Havgaard, University of Copenhagen, Denmark

Short Abstract: A class of structured non-coding RNAs is known to play various roles in the cells, but the annotation of these RNAs is still lacking even within the human genome. Part of the reason is ascribed to the fact that the currently available computational tools are either too computationally heavy for use in full genomic screens or rely on pre-aligned sequences. In this poster we present DotcodeR for detecting a set of structurally similar RNA pairs of predefined window length in a pair of genomic sequences by comparing their corresponding coarse-grained secondary structure dot plots at string level. This allows us to perform an all-against-all scan of all window pairs from two genomes without alignment. Our computational experiments with simulated data and real chromosomes show that DotcodeR has good sensitivity while reducing the search space drastically. Considering the results, DotcodeR can be useful as a prefilter in a comparative genomic scan for structured RNAs, which could be followed by a more rigorous approach to structural alignment for functional analysis of predicted RNA regions.

Session A-217: Modeling condition-specific alternative splicing
COSI: RNA
  • Martin Strazar, University of Ljubljana, Faculty of Computer and Information Science, Slovenia
  • Jernej Ule, The Francis Crick Institute, London, United Kingdom
  • Tomaz Curk, University of Ljubljana, Faculty of Computer and Information Science, Slovenia

Short Abstract: The role of most RNA-binding proteins (RBPs) in alternative splicing (AS) remains unclear. Next-generation sequencing (NGS) assays enable the search for the splicing code, a model that can relate multiple cis- and trans-acting factors to differential exon usage. Contemporary differential exon usage (DEU) statistical tests compare multiple experimental conditions (e.g. RBP knockdowns) to a single reference condition. The emergence of datasets including hundreds of experimental conditions calls for tailored models to detect condition-specific changes in AS and uncover RBP-specific regulation. We design a novel statistical model, named Condition-specific differential exon expression (csDEX), to discover changes in exon usage that occur only in a small subset of conditions. The package supports both read count- and percent spliced-in (PSI)-based exon expression quantification. We test for splicing changes on a real-size public dataset with 189 shRNA knockdown samples of different RBPs (including SRSF1, U2AF1/2, PTBP1, hnRNPs, TARDBP) provided by the ENCODE project. We demonstrate the advantages of PSI-based quantification when seeking changes in AS that are not due to gene expression. The precision of related methods is evaluated using the UCSC knownAlt annotation, where the csDEX PSI-based model retrieves known AS events with the highest precision (98%). The causal effect of RBP binding on AS is further validated by multiple independent data sources, such as RBP binding assays (eCLIP) and motif analysis. For TARDBP, the functional relevance was further verified by successfully retrieving cryptic exons known to be specifically TARDBP-regulated. We provide the first statistical package for computationally efficient detection of condition-specific AS changes in RNA-seq datasets with hundreds of experimental conditions. The predicted condition-specific changes in AS, verified by multiple independent data sources, provide functionally relevant candidates.
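
The point about PSI-based quantification being insensitive to gene expression can be seen with the standard definition of percent spliced-in, PSI = inclusion reads / (inclusion reads + exclusion reads). The sketch below uses that textbook definition only; the function name `psi` and the read counts are assumptions for illustration and do not reflect how csDEX computes or models PSI internally.

```python
def psi(inclusion_reads, exclusion_reads):
    """Percent spliced-in as a fraction in [0, 1]; None when the exon has no coverage."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


# Same exon in two hypothetical conditions: the gene is expressed twice as
# strongly in the knockdown, but the splicing ratio is unchanged.
control_psi = psi(inclusion_reads=30, exclusion_reads=10)
knockdown_psi = psi(inclusion_reads=60, exclusion_reads=20)

# A genuine splicing change: same total coverage, different ratio.
shifted_psi = psi(inclusion_reads=10, exclusion_reads=30)
```

Raw exon read counts would differ between the first two conditions, whereas PSI is identical; only the third case reflects a change in exon usage.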

Session A-219: Automated analysis and comparison of multiple small RNA datasets with RAPID
COSI: RNA
  • Sivarajan Karunanithi, Cluster of Excellence on Multimodal Computing and Interaction, and Department of Computational Biology and Applied Algorithms, Max Planck Institute for Informatics, Saarbruecken, Germany, Germany
  • Martin Simon, Molecular Cell Dynamics Saarland University, Centre for Human and Molecular Biology, Saarbrücken, Germany, Germany
  • Marcel H. Schulz, Cluster of Excellence on Multimodal Computing and Interaction, and Department of Computational Biology and Applied Algorithms, Max Planck Institute for Informatics, Saarbruecken, Germany, Germany

Short Abstract: The role of small RNA (sRNA) molecules in genome regulation is not fully understood. Micro RNAs (miRNA), small interfering RNAs (siRNA), piwi-interacting RNAs (piRNA) and trans-acting RNAs (taRNA) are some members of the sRNA family. Existing sRNA analysis tools predominantly focus on predicting and quantifying novel miRNAs and piRNAs. As a result, other classes of sRNA are either ignored or require custom-made scripts. Understanding the role of these sRNAs in diverse biological processes requires paying attention to minor details; for example, sRNAs originating from different strands and with varying lengths can have different targets and be involved in different downstream pathways. No integrated computational solution exists to investigate novel sRNA data in an unbiased way. Hence, we developed a generic sRNA analysis tool that captures read counts along with strand specificity, length distribution, and base modification. Our tool also automatically produces numerous visualizations covering multiple categories required for sRNA analysis. Our tool offers various normalization techniques for comparing different sRNA samples, tailored to different scenarios, e.g. knockdown of RNA interference components in the model organism. All analyses can be restricted to certain read lengths. For ease of use, our tool integrates an automated differential expression analysis using DESeq2. Finally, we present analyses of multiple datasets from different organisms. Our tool is designed to simplify the work of data analysts and offers a different perspective on available data. Our tool is available at: https://github.com/SchulzLab/RAPID.

Session A-221: Random Sample Consensus (RANSAC) for the robust identification of outliers in transcriptomic data
COSI: RNA
  • André Veríssimo, IDMEC, IST, Univ. Lisboa, Portugal
  • Eunice Carrasquinha, IDMEC, IST, Univ. Lisboa, Portugal
  • Marta Lopes, IDMEC, IST, Univ. Lisboa, Portugal
  • Arlindo Oliveira, INESC-ID, IST, Univ. Lisboa, Portugal
  • Marie-France Sagot, INRIA Grenoble Rhône-Alpes and Université de Lyon 1, Villeurbanne, France, France
  • Susana Vinga, IDMEC, IST, Univ. Lisboa, Portugal

Short Abstract: Random Sample Consensus (RANSAC) is a technique that has been widely and successfully used in areas such as computer vision for modeling data with a large amount of noise. The RANSAC algorithm randomly selects a number of observations and creates a model that is expanded with the observations that fall below a distance threshold, named inliers. This procedure is repeated k times, leading to a consensus model. We applied this technique in the biomedical area to both synthetic and clinical datasets, namely the breast invasive carcinoma dataset from The Cancer Genome Atlas (TCGA-BRCA), including RNA-seq expression levels for both tumour and non-tumour tissues. The results of a baseline regularized logistic model, trained with all observations, are then compared against RANSAC. To evaluate the robustness of this method, the original dataset is perturbed by randomly changing the response class for 0% to 25% of the observations. At each step, 10 replicates were generated and the average misclassification rates and standard errors were obtained. In RANSAC, an observation is considered an outlier if it is not included in the model. Outlier observations are compared against misclassifications of the baseline model. The results show that RANSAC has high precision with the inlier observations, as expected, and is robust for increasingly perturbed data. In conclusion, the RANSAC results for these experiments show that this algorithm can identify a subset of observations for which the model is highly accurate, while simultaneously identifying outlier observations.
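
The sample, fit, expand and repeat loop described above can be sketched in a few lines. To keep the example self-contained it swaps in an ordinary least-squares line for the regularized logistic model used in the poster, and the function names, threshold, sample size and toy data are all assumptions made for illustration.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:  # all x identical: fall back to a horizontal line
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def ransac(points, sample_size=2, threshold=1.0, k=100, seed=0):
    """Return (model, inliers, outliers) for the largest consensus set found."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(k):
        # 1. Fit a candidate model on a small random sample.
        a, b = fit_line(rng.sample(points, sample_size))
        # 2. Expand it with every observation within the distance threshold.
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= threshold]
        # 3. Keep the largest consensus set, refitting the model on its inliers.
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = fit_line(inliers), inliers
    # Observations not included in the consensus model are the outliers.
    outliers = [p for p in points if p not in best_inliers]
    return best_model, best_inliers, outliers


# Ten points on y = 2x + 1 plus two gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)] + [(3.0, 40.0), (7.0, -25.0)]
model, inliers, outliers = ransac(points)
```

In the poster's setting the distance would be a classification residual rather than a vertical distance to a line, but the definition of an outlier as an observation left out of the consensus model is the same.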

Session A-223: Reconstructing the inner structure of circular RNAs
COSI: RNA
  • Franziska Metge, MPI Biology of Ageing, Germany
  • Christoph Dieterich, Klaus Tschira Institute for Integrative Computational Cardiology, Germany
  • Jorge Boucas, MPI Biology of Ageing, Germany

Short Abstract: Motivation: Circular RNAs are a special class of RNA forming a covalently closed loop through a process called back-splicing. Potential functions have been shown for only a few well-studied circRNAs; these include miRNA sponging, RNA-binding protein (RBP) sponging, and regulation of their host gene’s transcription. Circular RNAs can be identified in rRNA-depleted RNA-Seq data by detecting chimeric reads that span a back-splice junction. A variety of circRNA detection tools exist, but no tool is able to summarize and characterize the identified circRNAs. To perform accurate downstream analyses after circRNA detection, it is crucial to know the exact exon-intron structure of circRNAs. Here, I present FUCHS and FUCHSdenovo, which summarize circRNAs and reconstruct their exon-intron chain based on linear-splice signals of back-splice junction anchored reads. Results: Running FUCHS on mouse samples revealed that heart circRNAs are less diverse but more abundant than liver circRNAs. The average length of circRNAs was 500 bp. A de novo reconstruction of the inner circle structure using FUCHSdenovo showed a gain of information of 15%. Furthermore, FUCHSdenovo identified alternative splicing in 8-10% of circRNAs. To exemplify the value of the reconstructed circRNA models in downstream analyses, I performed a miRNA seed search and an RBP motif search. Comparing the seed density of circRNAs and mRNAs showed that circRNAs were more densely populated with both miRNA seeds and RBP motifs, suggesting that circRNAs could form an additional layer in the gene-regulatory network by competing with their host genes for miRNA or RBP binding. Availability: https://github.com/dieterich-lab/FUCHS.git

Session A-225: Detection and mitigation of spurious antisense transcripts in RNA-Seq experiments
COSI: RNA
  • Kira Mourão, University of Dundee, United Kingdom
  • Radoslaw Lukoszek, University of Dundee, United Kingdom
  • Kimon Froussios, University of Dundee, United Kingdom
  • Nicholas Schurch, University of Dundee, United Kingdom
  • Katarzyna MacKinnon, University of Dundee, United Kingdom
  • Céline Duc, Université Clermont Auvergne, France
  • Gordon Simpson, University of Dundee, United Kingdom
  • Geoff Barton, University of Dundee, United Kingdom

Short Abstract: Antisense transcripts impact gene transcription in several different ways, affecting transcription initiation, such as overlapping antisense transcripts repressing initiation; transcription itself, where antisense transcripts can limit the length of the sense transcripts to shorter isoforms; or post-transcriptionally, when, for example, antisense transcripts can compete with sense transcripts for binding sites. Stranded RNA-Seq determines the strand from which an RNA fragment originates, and so can be used to identify where antisense transcription may be implicated in gene regulation. However, by analysing over 100 experiments across multiple organisms from both ENCODE and our own work, we show that spurious antisense reads are often present in experiments, and can manifest at levels greater than 1% of sense transcript levels. This is enough to disrupt analyses by causing spurious antisense counts to dominate the set of genes with high antisense transcription levels. Our tool RoSA (Removal of Spurious Antisense) detects the presence of high levels of non-authentic antisense transcripts by analysing ERCC spike-in data to find the ratio of antisense:sense transcripts in the spike-ins. Similarly, RoSA will calculate a correction to the antisense counts based on either the spike-in antisense:sense ratio or, where possible, antisense and sense counts around splice sites to provide a gene-specific correction. We demonstrate the utility of our tool to filter authentic antisense transcript counts in an Arabidopsis thaliana RNA-Seq experiment.
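
A minimal sketch of the spike-in idea follows. Because ERCC spike-ins have no genuine antisense transcription, any antisense reads mapped to them can be treated as spurious, and their antisense:sense ratio gives a global estimate of the artefact rate. The abstract does not state the correction formula RoSA applies, so the subtraction used here, the function names and all counts are assumptions for illustration only.

```python
def spurious_ratio(spikein_counts):
    """Global antisense:sense ratio from (sense, antisense) pairs of spike-in counts."""
    sense = sum(s for s, _ in spikein_counts)
    antisense = sum(a for _, a in spikein_counts)
    return antisense / sense if sense else 0.0


def corrected_antisense(sense, antisense, ratio):
    """Subtract the antisense counts expected to be spurious (assumed correction)."""
    return max(0.0, antisense - ratio * sense)


# Hypothetical (sense, antisense) read counts for three ERCC spike-ins.
spikeins = [(10000, 120), (5000, 40), (5000, 40)]
ratio = spurious_ratio(spikeins)

# A highly expressed gene: most of its apparent antisense signal is spurious.
gene_a = corrected_antisense(sense=3000, antisense=50, ratio=ratio)

# A gene whose antisense counts are fully explained by the artefact rate.
gene_b = corrected_antisense(sense=8000, antisense=60, ratio=ratio)
```

The gene-specific correction around splice sites mentioned in the abstract would replace the single global `ratio` with a per-gene estimate.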

Session A-227: A tool and a warning: Differential isoform usage with RATs
COSI: RNA
  • Kimon Froussios, University of Dundee, United Kingdom
  • Kira Mourão, University of Dundee, United Kingdom
  • Gordon Simpson, University of Dundee, United Kingdom
  • Geoffrey Barton, University of Dundee, United Kingdom
  • Nicholas Schurch, University of Dundee, United Kingdom

Short Abstract: We present the R package “RATs” (Relative Abundance of Transcripts), which identifies transcriptome-wide Differential Transcript Usage (DTU) directly from transcript abundance estimations, without requiring access to alignment or assembly information. RATs is agnostic to quantification methods and unique in that it exploits bootstrapped quantifications, if available, to inform the significance of detected DTU events. In addition, RATs shows the DTU results graphically, and achieves a median False Discovery Rate ≤0.05 even at low replication levels. The package is available through GitHub at https://github.com/bartongroup/Rats. We applied RATs to a public human RNA-seq dataset for which three DTU events had previously been validated by qRT-PCR. We found that the ability to reproduce the reported DTU events depended on the genome annotation used for quantification of the data. The isoform abundance profiles of two of the three genes change radically between Ensembl v60 and v87. Of the >500 and >400 DTU events identified by RATs with the two annotations, respectively, only 141 were in common and only 8 were among those reported by the original study. Investigation of this discrepancy revealed that the effect size of most of the originally reported events was small and below our threshold. More importantly, our analysis revealed that the qRT-PCR probes designed based on Ensembl v60 no longer corresponded to the intended transcripts according to Ensembl v87, but rather matched an incompatible multitude of isoforms. As a consequence, interpretation of the original qRT-PCR quantifications is impossible with the newer annotation.

Session A-229: An inescapable truth for differential gene expression from RNA-seq data
COSI: RNA
  • Nicholas Schurch, University of Dundee, United Kingdom
  • Kimon Froussios, University of Dundee, United Kingdom
  • Pietá Schofield, Cancer Research UK Manchester Institute, United Kingdom
  • Marek Gierliński, University of Dundee, United Kingdom
  • Christian Cole, University of Dundee, United Kingdom
  • Alexander Sherstnev, University of Dundee, United Kingdom
  • Katarzyna Mackinnon, University of Dundee, United Kingdom
  • Céline Duc, Université Clermont Auvergne, France
  • Vijender Singh, University of Dundee, United Kingdom
  • Nicola Wrobel, University of Edinburgh, United Kingdom
  • Karim Gharbi, Edinburgh Genomics, United Kingdom
  • Gordon Simpson, University of Dundee, United Kingdom
  • Tom Owen-Hughes, University of Dundee, United Kingdom
  • Mark Blaxter, University of Edinburgh, United Kingdom
  • Geoff Barton, University of Dundee, United Kingdom

Short Abstract: There is no escaping it; identifying the differentially expressed genes between two experimental conditions requires replicates. The question is, how many do you need? The answer to this deceptively simple question is not straightforward. By conducting the most highly replicated RNA-seq experiment to date, we show that the answer depends on: 1) the statistical character of the underlying RNA-seq expression measurements, 2) the differential gene expression (DGE) tool you use, and 3) the effect size you are looking to detect. Expanding this analysis beyond the simple transcriptome of yeast and into a higher eukaryote (Arabidopsis thaliana), we show that transcriptomic complexity does not appear to be a strong factor in answering this question. Based on the analysis of these two highly replicated datasets, for future RNA-seq DGE experiments we recommend 1) DGE tools based around a negative binomial count distribution that use a shrinkage variance approach, 2) >6 replicates in all conditions, 3) a minimum effect size threshold of ~10-20% (depending on the replication level), and 4) >12 replicates per condition when it is important to identify cases of DGE with effect sizes in the ~10-20% range.

Session A-231: Analysis of transcriptomic profiles of beagle bone tissues identified novel long noncoding RNAs
COSI: RNA
  • Youngseok Yu, Kyung Hee University, South Korea
  • Jae-Hyung Lee, Kyung Hee University, South Korea

Short Abstract: Dogs have lived with humans for thousands of years, sharing much of their lives with them. Over the past decade, a variety of functional genomic studies have been conducted using one dog breed, the beagle. Beagle genes are similar to human genes, which makes the beagle a good model organism for studying human diseases. For this reason, more accurate gene annotation studies of the beagle genome have been performed using various next-generation sequencing techniques. In this study, we compared gene expression profiles among various beagle tissues, including RNA-Seq datasets obtained from four bone tissues (alveolar bone, maxilla, skull and tibia). Principal component analysis reveals that 78% of the variation in gene expression is explained by the first three principal components, and the first principal component separates the data according to tissue. Focusing on the bone tissues, we identified a total of 19,360 differentially expressed genes among the different bone tissues. Gene ontology and KEGG pathway analyses showed different functional gene expression profiles among the bone tissue types. Interestingly, we found a total of 3,789 long noncoding RNAs (lncRNAs) in the four tissue types; 3,744 of them were not listed in known lncRNA databases, and functional annotation of these novel lncRNAs will be performed. The present study provides valuable information on novel lncRNAs, especially regarding the functional roles of lncRNAs in the various bone tissues.

Session A-233: The landscape of circRNA expression in luminal breast cancer cell lines and tissue
COSI: RNA
  • Lucia Coscujuela Tarrero, University of Turin, Italy
  • Valentina Miano, University of Turin, Italy
  • Giulio Ferrero, University of Turin, Italy
  • Laura Ricci, University of Turin, Italy
  • Maddalena Arigoni, University of Turin, Italy
  • Federica Riccardo, University of Turin, Italy
  • Laura Annaratone, University of Turin, Italy
  • Carlo De Intinis, University of Turin, Italy
  • Marco Beccuti, University of Turin, Italy
  • Anna Sapino, University of Turin, Italy
  • Raffaele Adolfo Calogero, University of Turin, Italy
  • Francesca Cordero, University of Turin, Italy
  • Michele De Bortoli, University of Turin, Italy

Short Abstract: Circular RNAs (circRNAs) are an emerging class of RNAs originating from exon back-splicing (BS) and, given their extracellular stability, they represent a promising set of biomarkers for several diseases, including breast cancer. We analysed in depth the circRNA transcriptome of the luminal breast cancer model MCF-7 using twelve paired-end poly(A-) RNA-Seq experiments performed in four different cell growth conditions. We predicted 3,271 circRNAs using the CIRI algorithm and characterized their genomic properties using CircHunter, a novel algorithm for circRNA post-discovery analysis developed by our group. We confirmed that circRNAs are predominantly formed by two exons, but we also identified intergenic, intronic, and monoexonic BS events. We also observed that circRNA host genes are longer, generate a higher number of transcripts, have longer introns, and show significant enrichment in the H3K36me3 histone modification compared to control gene sets. Then, to extend the circRNA expression analysis to public total RNA-Seq datasets of breast cancer tissues, we developed an alignment-free method to directly compare sequencing reads with reconstructed BS sequences. As a result, we identified 113 circRNAs differentially expressed between Triple Negative and ER+ tumors and 622 circRNAs differentially expressed between ER+ and normal tissue. We experimentally analysed a set of 28 circRNAs in breast cancer cell lines and in 52 breast tumor samples. We identified circRNAs that show higher expression than their host gene and that are significantly highly expressed in ER+ luminal breast cancer cell lines and tissues.

Session A-235: Grapevine phenology classifier based on transcriptomic signatures
COSI: RNA
  • Francisco Altimiras, Telefonica Research and Development. Faculty of Engineering and Sciences at Universidad Adolfo Ibañez., Chile
  • Claudio Galaz, Telefonica Research and Development. Informatics Department at Universidad Técnica Federico Santa María., Chile

Short Abstract: In agricultural production, it is fundamental to characterize the phenological stage of the plants to ensure a good evaluation of the development, growth and health of the crops. In viticulture, phenological characterization allows early detection of nutritional deficiencies in the plants, which diminish growth and productive yield and drastically affect the quality of the fruit. Currently, phenological estimation in grapevine (Vitis vinifera) is done using the scale of Eichhorn and Lorenz and its derivatives. For this estimation, seven phenological stages of the plant are divided into two categories: vegetative growth and reproductive growth. The phenological stage of the plant is determined according to the visualization of certain structures. This system, which has been widely used for the last 20 years, requires exhaustive evaluation of crops, which makes it intensive in terms of labor, personnel and time. There are several genomic information databases for Vitis vinifera, and the function of its genes has been widely characterized. The application of advanced molecular biology, including massively parallel sequencing of RNA (RNA-seq), and the handling of large volumes of highly complex data provide state-of-the-art tools for determining phenological stages from a global view of the molecular functions of plants. The main objective of this work is to create a phenological classifier to accurately estimate the stage of development of the grapevine. This estimation will help to improve crop productivity and reduce the costs associated with the use of fertilizers and pesticides to obtain quality fruit.

Session A-237: reactIDR: Statistical approach to robust RNA reactivity classification based on reproducible high-throughput structure analyses
COSI: RNA
  • Risa Kawaguchi, The National Institute of Advanced Industrial Science and Technology, Japan
  • Hisanori Kiryu, The University of Tokyo, Japan

Short Abstract: More than a dozen research studies on novel high-throughput structure analyses have been published to date. While high-throughput structure analyses can be practical for quite a few subjects beyond scalability problems, their estimates of RNA reactivity at each nucleotide can be inconsistent between analyses. This inconsistency is thought to come from differences in detectability and systematic biases of individual high-throughput structure analyses, as well as from the sparseness of the sequencing read distribution. To establish a statistical methodology for robust structure analyses, I present a novel pipeline, reactIDR, which is designed to extract reliable structure information from general high-throughput structure analyses. To evaluate the reliability of each reactivity score, reactIDR computes the irreproducible discovery rate (IDR) by modeling the joint probability distribution among replicates. Moreover, reactIDR can take the local consistency of IDR into account by combining a hidden Markov model with the IDR model, using the EM algorithm for parameter optimization. The efficiency of IDR filtering and classification for reproducible structure prediction was evaluated on the reference structure of human 18S rRNA and on computationally estimated stem probabilities for the whole transcriptome. According to the results, IDR-based classification showed higher consistency with the reference structure and with stem probabilities calculated by in silico structure prediction, indicating that reactIDR would be of significant help in extracting condition-specific differences in secondary structure, with a view to deciphering the global landscape of RNA secondary structure.

Session A-239: Single Cell Expression Data Reveal Human Genes that Escape X-Chromosome Inactivation
COSI: RNA
  • Kerem Wainer Katsir, Hebrew University of Jerusalem, Israel
  • Michal Linial, The Hebrew University of Jerusalem, Israel

Short Abstract: Background: In mammals, sex chromosomes are a source of an inherent genetic difference between the sexes. A balance between the sexes is reached by random inactivation of one of the X-chromosomes in each female somatic cell, resulting in a tissue mosaic in which some of the cells express one of the X-chromosomes and some the other. While most genes from the inactivated X-chromosome are silenced, about 15-25% of them have been shown to escape inactivation (referred to as escapees). These escapees have so far been identified using multiple indirect methods that account for the mosaic nature of female tissues. Results: We use single-cell RNA-seq to directly quantify the extent of escape from X-inactivation. Analyzing data from single cells is preferable, as in each cell only one of the X chromosomes is inactivated. Our method relies on allele-specific expression of genes with heterozygous SNPs, thus enabling us to discriminate between the expression of genes from the active X-chromosome (Xa) and from the inactive one (Xi). We apply our method to datasets from (i) primary fibroblasts without genomic phasing (n=104), and (ii) clonal lymphoblast cells with phased parental genomes (n=25). Applying our single-cell analysis, we identify 27 and 34 escapees from fibroblasts and lymphoblasts, respectively. On the other hand, when analyzing a pool of lymphoblasts, only 14 escapees are discovered. Altogether, we report 51 escapees, many of which are known escapees discovered by indirect methods (p=2.74e-06). We identify a few overlooked escapees, and propose to revise the annotations associated with some others. Conclusions: X-chromosome inactivation and escape from it are robust phenomena detected at single-cell resolution. These phenomena are apparent in isolated primary fibroblasts as well as in cell lines. Genomic phasing substantially improves the detection of escapees.
Cumulative data from individual cells increases the sensitivity of detecting escapees compared to data extracted from a pooled sample. Our results support the notion that different cell types express different escapees with only a handful of consistent escapees across cell types and experimental settings.

Session A-241: Transcriptome profiling of brain lesion evolution in MS
COSI: RNA
  • Maria Louise Elkjær, Odense University Hospital, Denmark
  • Mark Burton, Odense University Hospital, Denmark
  • Tobias Frisch, University of Southern Denmark, Denmark
  • Richard Reynolds, Imperial College, London, United Kingdom
  • Torben Kruse, Odense University Hospital, Denmark
  • Jan Baumbach, University of Southern Denmark, Denmark
  • Mads Thomassen, Odense University Hospital, Denmark
  • Zsolt Illes, Odense University Hospital, Denmark

Short Abstract: Multiple sclerosis (MS) is the most common cause of neurological disability among young adults. Early in the disease course, in the relapsing and remitting phase, influx of peripheral inflammatory cells induces focal demyelinating lesions in the central nervous system (CNS). Within a few decades, ongoing inflammatory demyelination eventually leads to diffuse axonal/neuronal degeneration with clinical decline, resulting in a progressive phase. Unfortunately, at this stage the effectiveness of immunotherapies is lost, and there are no sufficiently good criteria to determine the conversion from the relapsing and remitting phase to the progressive phase. We hypothesize that specific transcriptome signatures can characterize lesion evolution in the progressive MS brain and that this can be used for stage-specific biomarker discovery. In order to address gene expression changes during lesion evolution/expansion, we identified 20 normal-appearing white matter areas and 15 active, 17 chronic active, 14 inactive and 6 repairing lesions from 10 brains of patients with MS. We characterized tissues for cellular changes (i.e. microglia/macrophage activity), myelin integrity and phagocytosis of myelin proteins using standard histology and immunohistochemistry. As controls, we chose 25 white matter (WM) areas from five brains without neurological disease. We microdissected the areas of interest, extracted the RNA, and performed next-generation RNA sequencing of the different lesion types using paired-end sequencing of 2x80 bases on Illumina’s NextSeq500/550. Transcriptome assembly of the RNA reads was done using reference sequences from Ensembl and the Bowtie/TopHat2 alignment programs. We counted reads and analyzed differentially expressed genes using HTSeq and the edgeR package. In the preliminary data, comparing the chronic active MS lesions with the WM control areas, 1301 significantly differentially expressed genes and 62 pre-defined pathways that differed significantly were found.
The most changed genes belong to the immunoglobulin family, which supports the autoantibody-mediated theory of MS damage and correlates with the oligoclonal bands detected in the CSF of MS patients. Other differentially expressed genes and pathways confirmed known key factors in MS, such as the presence of CD8+ T cells and CD20+ B cells, oxidative stress markers, Ca2+/Na+-induced K-channels and metabolic pathways. Differentially expressed genes involved in degeneration and cell death/survival were also found, such as growth factors and components for axonal regeneration. The transcriptome signatures of the different lesion types will be compared to our CSF proteome database, obtained from 97 patients with early inflammatory and late progressive disease, to explore whether lesion type signatures are reflected in MS CSF and can be used as composite biomarkers related to disease stages. The multi-omics approach to MS brain lesions may provide radically new insight into MS pathogenesis, reveal novel potential treatment targets, and contribute to the discovery of composite biomarkers predicting irreversible disease progression.

Session A-243: Prediction algorithm vs. energy model: an empirical performance comparison of two algorithms with different hypotheses but same energy model
COSI: RNA
  • Hosna Jabbari, University of Alberta, Canada
  • Carlo Montemagno, University of Alberta, Canada

Short Abstract: Motivation: RNA is a biopolymer with many different applications inside the cell and in biotechnology. The structure of an RNA molecule largely determines its function. Accurate prediction of RNA structure is therefore important. RNA secondary structure prediction has received attention in the past decade. However, a long-standing question for improving the prediction accuracy of RNA secondary structure is whether to focus on the prediction algorithm or the energy model. This problem is particularly pronounced for complex pseudoknotted structures, for which there is an even greater trade-off between the computational cost of the prediction algorithm and its generality. The aim of this paper is to draw attention to the importance of the energy model and to invite more research in this area. Results: In this work, we thoroughly compare the performance of two of the most general RNA pseudoknotted secondary structure prediction algorithms, which have two different folding hypotheses but the same underlying energy model, on a large data set of known structures. Based on the methods compared in this work, we hypothesize that the energy model contributes more to the prediction accuracy of a method than its folding hypothesis.

Session A-245: Whole transcriptome analysis reveals that Zika virus halts cell cycle progression and disrupts neuronal differentiation in human neurospheres
COSI: RNA
  • Patrícia Garcez, D’Or Institute for Research and Education (IDOR)/Institute of Biomedical Sciences, Federal University of Rio de Janeiro - Rio de Janeiro, Brazil, Brazil
  • Joao Lidio Vianez Junior, Evandro Chagas Institute, Brazil
  • Janaina Vasconcelos, Evandro Chagas Institute, Brazil
  • Stevens Rehen, D’Or Institute for Research and Education (IDOR)/Institute of Biomedical Sciences, Federal University of Rio de Janeiro - Rio de Janeiro, Brazil, Brazil

Short Abstract: Brazil is facing an unprecedented growth in the number of microcephaly cases in babies. This phenomenon coincided with the recent Zika virus (ZIKV) outbreak in the country. Although the Brazilian Ministry of Health was quick to recognize that ZIKV was probably the cause of microcephaly in newborns, the underlying mechanisms leading to the development of this pathology have not been established. To tackle this problem at the molecular level, we employed whole transcriptome sequencing of human neurospheres derived from neural stem cells exposed to ZIKV isolated in Brazil, which belongs to the Asian genotype. Differential gene expression analysis of control (MOCK) and ZIKV-infected neurospheres generated a list of 26 down-regulated and 64 up-regulated genes. Among the up-regulated genes were the Cyclin-dependent kinase inhibitor 1A gene (CDKN1A) and the Glial fibrillary acidic protein gene (GFAP). CDKN1A prevents the activation of the Cyclin E/CDK2 complex, acting as a regulator of cell cycle progression during G1, and GFAP is a known marker of astrocytes. We also observed a decrease in the expression of the neurogenic differentiation 1 gene (NEUROD1), which is directly involved in the neurogenic program. These findings suggest that ZIKV infection induces cell cycle arrest and inhibits neuronal differentiation, resulting not only in a reduction in size but in a deeper disruption of the normal development of the human brain.

Session A-247: De novo annotation and characterization of the translatome with ribosome profiling data
COSI: RNA
  • Zhengtao Xiao, Tsinghua University, China
  • Rongyao Huang, Tsinghua University, China
  • Xuerui Yang, Tsinghua University, China

Short Abstract: By capturing and sequencing the RNA fragments protected by translating ribosomes, ribosome profiling sketches the landscape of translation at subcodon resolution. We developed a new method, RiboCode, which uses ribosome profiling data to assess the translation of each RNA transcript genome-wide. As supported by multiple tests with simulated data and cell type-specific QTI-seq and mass spectrometry data, RiboCode exhibits superior efficiency, sensitivity, and accuracy for de novo annotation of the full translatome, which covers various types of novel ORFs in the previously annotated coding and non-coding regions and overlapping ORFs. Finally, to showcase its application, we applied RiboCode to a published ribosome profiling dataset and assembled the context-dependent translatomes of yeast under normal conditions, heat shock, and oxidative stress. Comparisons among these translatomes revealed stress-activated novel upstream and downstream ORFs, some of which are associated with potential translational dysregulation of the main protein-coding ORFs in response to the stress signals.

Session A-249: Understanding retrotransposons expression in human somatic cells
COSI: RNA
  • Gm Jonaid, University of Nevada, Las Vegas, United States
  • Mira V. Han, University of Nevada, Las Vegas, United States
  • Sophia Quinton, UNLV, United States
  • Daphnie Churchill, UNLV, United States
  • Nicky Chung, UNLV, United States
  • Cody Clymer, UNLV, United States
  • Adrian Alberto, UNLV, United States
  • Austin Ross, UNLV, United States
  • Omar Navarro Leija, UNLV, United States

Short Abstract: Insertions of retrotransposons can disrupt genes and cause dysregulation of gene expression. Our objective is to understand retrotransposon expression in cancer, and to identify miRNAs and somatic mutations of genes that control retrotransposons. We measured retrotransposon expression levels in the RNA-seq data of cancer samples in The Cancer Genome Atlas (TCGA). We used different statistics to test the association between miRNA/gene expression and L1HS expression, and between somatic mutations and L1HS expression, across 634 patients. We found that, unlike other transposon families, L1HS transcripts are always overexpressed in cancer compared to normal tissue, although the degree of overexpression varied across patients and cancer types. We identified a list of candidate miRNAs and genes that may control transposon expression. The list of genes we identified includes several of the known host factors in L1HS activity.

Session A-251: SSMART: Sequence-structure motif identification for RNA-binding proteins
COSI: RNA
  • Alina Munteanu, Berlin Institute for Medical Systems Biology, Max Delbruck Center, Berlin, Germany, Germany
  • Neelanjan Mukherjee, MDC Berlin, Germany
  • Uwe Ohler, Max Delbrueck Center & Humboldt University, Germany
Session A-253: Stacking interactions between ribose and nucleobases in functional RNAs: occurrence, structural context and stability
COSI: RNA
  • Mohit Chawla, King Abdullah University of Science and Technology (KAUST), Saudi Arabia
  • Romina Oliva, University Parthenope of Naples, Italy
  • Luigi Cavallo, King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Session A-255: NeuroExpresso: Cross laboratory database of brain cell type specific gene expression
COSI: RNA
  • B Ogan Mancarci, University of British Columbia, Canada
  • Lilah Toker, University of British Columbia, Canada
  • Shreejoy J Tripathy, University of British Columbia, Canada
  • Brenna Li, University of British Columbia, Canada
  • Brad Rocco, University of Toronto, Canada
  • Etienne Sibille, University of Toronto, Canada
  • Paul Pavlidis, University of British Columbia, Canada

Short Abstract: Due to the heterogeneous nature of the brain, molecular classification of individual cell types is often challenging because individual studies focus on a limited set of cell types or on specific brain regions. The data from such studies, however, are constantly accumulating, allowing us to aggregate them for a more comprehensive view of the brain cell types. Here we present NeuroExpresso, a curated database of mouse brain cell type-specific expression profiles representing 35 major cell types from 10 brain regions, acquired from independent expression profiling experiments using both pooled cell microarray data and single cell RNA sequencing. We make this database available to the community at neuroexpresso.org, a website that allows visualization and basic analysis (differential expression) of gene expression in brain cell types. The database is a valuable resource as it allows researchers to identify novel properties of brain cell types in the context of the whole brain or specific brain regions, not detectable by individual studies. Further, we used this database to identify marker genes specific to individual cell types and identified a substantial number of previously unknown cellular markers. These markers were then validated using in silico analyses and in situ hybridization. Finally, we demonstrate that the summarized expression of marker genes (marker gene profiles, MGPs) in bulk tissue correlates with changes in cellular proportions. We use MGPs to recapture the known loss of dopaminergic cells in Parkinson’s disease (PD) patients and discover that a substantial proportion of genes previously reported as differentially expressed in PD patients can be attributed to the reduction of dopaminergic cells.
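The idea of summarizing marker gene expression into a single per-sample score can be illustrated with a minimal sketch. The abstract does not say how the summary is computed, so the approach below (z-scoring each marker gene across samples and averaging the z-scores) is only an assumption chosen for simplicity, not the authors' method. The function name `marker_gene_profile`, the gene names, and the expression values are all hypothetical.

```python
from statistics import mean, pstdev


def marker_gene_profile(expression, markers):
    """Summarize marker gene expression into one score per sample.

    expression: dict mapping gene name -> list of per-sample expression values
    markers:    list of marker gene names for one cell type
    Returns a list with one summary score per sample (mean z-score).
    NOTE: averaging z-scores is an illustrative assumption, not the
    method described in the abstract.
    """
    z_rows = []
    for gene in markers:
        values = expression[gene]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # a constant gene carries no information about samples
        z_rows.append([(v - mu) / sd for v in values])
    if not z_rows:
        raise ValueError("no informative marker genes")
    # Average the z-scores of all marker genes within each sample.
    return [mean(column) for column in zip(*z_rows)]


# Hypothetical bulk-tissue values for two illustrative marker genes
# measured in three samples.
bulk = {"TH": [1.0, 2.0, 3.0], "SLC6A3": [2.0, 4.0, 6.0]}
profile = marker_gene_profile(bulk, ["TH", "SLC6A3"])
```

Under this sketch, a sample with a lower score than the others would be read as having a smaller proportion of the corresponding cell type, which is the kind of signal the abstract describes for dopaminergic cells in PD.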

Session A-257: T2GO: Deciphering the functional and regulatory impact of differential splicing
COSI: RNA
  • Lorena de La Fuente Lorente, Centro de Investigación Príncipe Felipe (CIPF), Spain
  • Manuel Tardaguila, UNIVERSITY OF FLORIDA, United States
  • Hector Del Risco, University of Florida - Dr. Conesa's Lab, United States
  • Ana Conesa, Genomics of Gene Expression Lab, Spain
Session A-259: Cardiac transcriptome analysis reveals splicing and allele specific expression differences in patients with dilated cardiomyopathy
COSI: RNA
  • Michiel Adriaens, Maastricht Centre for Systems Biology, Maastricht University, Netherlands
  • Matthias Heinig, Institute of Computational Biology, Helmholtz Zentrum München, Germany
  • Sebastian Schafer, Division of Cardiovascular & Metabolic Disorders, Duke-National University of Singapore, Singapore
  • Hanneke Van Deutekom, Department of Clinical and Experimental Cardiology, Heart Center, Academic Medical Center, University of Amsterdam, Netherlands
  • Elisabeth Lodder, Department of Clinical and Experimental Cardiology, Heart Center, Academic Medical Center, University of Amsterdam, Netherlands
  • Enrico Petretto, Division of Cardiovascular & Metabolic Disorders, Duke-National University of Singapore, Singapore
  • Paul Barton, National Heart and Lung Institute, Imperial College London, United Kingdom
  • Stuart Cook, Division of Cardiovascular & Metabolic Disorders, Duke-National University of Singapore, Singapore
  • Yigal Pinto, Department of Clinical and Experimental Cardiology, Heart Center, Academic Medical Center, University of Amsterdam, Netherlands
  • Connie Bezzina, Department of Clinical and Experimental Cardiology, Heart Center, Academic Medical Center, University of Amsterdam, Netherlands
  • Norbert Hubner, Cardiovascular and Metabolic Sciences, Max-Delbrück-Center for Molecular Medicine (MDC), Germany
Session A-261: PyScope: Detecting oscillatory gene networks
COSI: RNA
  • Alexis Boukouvalas, University of Manchester, United Kingdom
  • Luisa Cutillo, University of Sheffield, United Kingdom
  • Magnus Rattray, University of Manchester, United Kingdom
  • Elli Marinopoulou, University of Manchester, United Kingdom
Session A-263: Transcriptomic profiling of human iPSC-derived neurons
COSI: RNA
  • Kerstin Lenk, DFG-Center for Regenerative Therapies Dresden/ Technische Universität Dresden, Germany
  • Lisa Kutsche, DFG-Center for Regenerative Therapies Dresden/ Technische Universität Dresden, Germany
  • Volker Busskamp, DFG-Center for Regenerative Therapies Dresden/ Technische Universität Dresden, Germany
Session A-265: Choosing the best sequencing methodology for simultaneous detection and relative quantification of coding and non-coding RNA.
COSI: RNA
  • Vincent Boivin, Université de Sherbrooke, Canada
  • Gabrielle Deschamps-Francoeur, Université de Sherbrooke, Canada
  • Sonia Couture, Université de Sherbrooke, Canada
  • Sherif Abou Elela, Université de Sherbrooke, Canada
  • Michelle Scott, University of Sherbrooke, Canada

Short Abstract: The ability to compare the abundance of one RNA molecule to another is a crucial step for understanding how gene expression is modulated to shape the transcriptome landscape. However, little information is available about the relative expression of the different classes of coding and non-coding RNA or even between RNA of the same class. We have determined the most accurate experimental and bioinformatic sequencing methodology to date to elaborate a complete portrait of the human transcriptome that depicts the relationship of all classes of non-ribosomal RNA longer than sixty nucleotides. The results show that the most abundant RNA in the human rRNA-depleted transcriptome is tRNA followed by spliceosomal RNA. Surprisingly, the signal recognition particle RNA 7SL by itself occupies 8% of the ribodepleted transcriptome producing a similar number of transcripts as that produced by all snoRNA genes combined. In general, the most abundant RNA are non-coding but many more protein coding than non-coding genes produce more than 1 transcript per million. Examination of gene functions suggests that RNA abundance reflects both gene and cell function. Together, the data indicate that the human transcriptome is shaped by a small number of highly expressed non-coding genes and a large number of moderately expressed protein coding genes that reflect cellular phenotypes.
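The "transcripts per million" unit used above can be made concrete with a short worked sketch. The code follows the standard TPM definition (length-normalize each count, then scale so that all values sum to one million); it is not the authors' quantification pipeline, and the function name `tpm` and the example counts and lengths are hypothetical.

```python
def tpm(counts, lengths):
    """Convert raw read counts to transcripts per million (TPM).

    counts:  list of read counts, one per gene or transcript
    lengths: list of effective lengths (in nucleotides), same order
    """
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same length")
    # Reads per nucleotide: removes the bias towards longer transcripts.
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Rescale so that the values across all genes sum to one million.
    return [r / total * 1e6 for r in rates]


# Hypothetical example: a short, highly covered RNA and a long mRNA.
values = tpm(counts=[300, 1000], lengths=[300, 2000])
```

In this example the short RNA has twice the per-nucleotide coverage of the long mRNA, so it receives two thirds of the total TPM even though it has fewer reads.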

Session A-267: Transcriptome-wide modelling of RNA life cycle
COSI: RNA
  • Alina Selega, University of Edinburgh, United Kingdom
  • David Schnoerr, University of Edinburgh, United Kingdom
  • Sander Granneman, University of Edinburgh, centre for Synthetic and Systems Biology (SynthSys), United Kingdom
  • Guido Sanguinetti, University of Edinburgh, United Kingdom
Session A-269: A systems level view on miR-124 function during neuronal differentiation from human iPS cells
COSI: RNA
  • Lisa K. Kutsche, DFG Research Center for Regenerative Therapies, Technische Universität Dresden, Dresden, Germany, Germany
  • Deisy M. Gysi, Department of Computer Science, TFome Research Group, Bioinformatics Group, Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Germany, Germany
  • Kerstin Lenk, DFG Research Center for Regenerative Therapies, Technische Universität Dresden, Dresden, Germany, Germany
  • Rebecca Petri, Lab of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center and Lund Stem Cell Center, Lund University, Lund, Sweden, Sweden
  • Johan Jakobsson, Lab of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center and Lund Stem Cell Center, Lund University, Lund, Sweden, Sweden
  • Katja Nowick, Department of Computer Science, TFome Research Group, Bioinformatics Group, Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Germany, Germany
  • Volker Busskamp, DFG Research Center for Regenerative Therapies, Technische Universität Dresden, Dresden, Germany, Germany
Session A-271: pulseR: Versatile computational analysis of RNA turnover from metabolic labeling experiments
COSI: RNA
  • Alexey Uvarovskii, University Hospital Heidelberg, German Center for Cardiovascular Research, Germany
  • Christoph Dieterich, University Hospital Heidelberg, German Center for Cardiovascular Research, Germany
Session A-273: The impact of alternative splicing in human-pathogenic fungi during host infection
COSI: RNA
  • Patricia Sieber, Department of Bioinformatics, Faculty of Biology and Pharmacy, Friedrich Schiller University Jena, Germany
  • Stefan Schuster, University of Jena, Germany
  • Jörg Linde, Research Group Systems Biology and Bioinformatics, Hans Knöll Institute, Leibniz Institute for Natural Product Research and Infection Biology, Jena, Germany
Session A-275: Dissecting the complexity and cell-to-cell variability of alternative splicing regulation
COSI: RNA
  • Martin Mikl, Weizmann Institute of Science, Israel
  • Eran Segal, Weizmann Institute of Science, Israel
Session A-277: Genome-wide analysis for identification of lncRNAs that sponge RNA-binding proteins
COSI: RNA
  • Hilal Kazan, Antalya International University, Turkey
  • Saber Hafezqorani, Middle East Technical University, Turkey
Session A-279: Dissecting the cell heterogeneity of skin in inflammatory dermatological conditions through deconvolution of tissue expression profiles
COSI: RNA
  • Zandra Félix Garza, Eindhoven University of Technology, Netherlands
  • Michael Lenz, Maastricht Centre for Systems Biology (MaCSBio), Netherlands
  • Joerg Liebmann, Philips GmbH, Germany
  • Matthias Born, Philips GmbH, Germany
  • Ilja Arts, Maastricht Centre for Systems Biology (MaCSBio), Netherlands
  • Peter Hilbers, Technische Universiteit Eindhoven, Netherlands
  • Natal van Riel, Eindhoven University of Technology, Netherlands
Session A-281: Application of refined brain cell type marker gene analysis identifies differentially expressed genes in Alzheimer’s disease
COSI: RNA
  • Xue Wang, Mayo Clinic, United States
  • Mariet Allen, Mayo Clinic, United States
  • Jeremy Burgess, Mayo Clinic, United States
  • Minerva Carrasquillo, Mayo Clinic, United States
  • Shaoyu Li, University of North Carolina at Charlotte, United States
  • Yan Asmann, Mayo Clinic, United States
  • Nilüfer Ertekin-Taner, Mayo Clinic, United States
Session A-283: Using machine learning to identify site of origin of metastatic tumours
COSI: RNA
  • Jasleen Grewal, Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, Canada, Canada
  • Steven Jones, Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, Canada, Canada
  • Sitanshu Gakkhar, Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, Canada, Canada
  • Yussanne Ma, Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, Canada, Canada
  • Yongjun Zhao, Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, Canada, Canada
  • Andrew Mungall, Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, Canada, Canada
  • Richard Moore, Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, Canada, Canada
  • Howard Lim, Department of Medical Oncology, British Columbia Cancer Agency, Vancouver, Canada, Canada
  • Daniel Renouf, Department of Medical Oncology, British Columbia Cancer Agency, Vancouver, Canada, Canada
  • Karen Gelmon, Department of Medical Oncology, British Columbia Cancer Agency, Vancouver, Canada, Canada
  • Stephen Yip, Department of Pathology and Laboratory Medicine, Faculty of Medicine, University of British Columbia, Vancouver, Canada, Canada
  • Janessa Laskin, Department of Medical Oncology, British Columbia Cancer Agency, Vancouver, Canada, Canada
  • Marco Marra, Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, Canada, Canada
Session A-285: Dysregulation of Human Endogenous Retroviruses in Primary CD4 T-Cells following Vorinostat treatment
COSI: RNA
  • Cory White, University of Southampton, United Kingdom
  • Nadejda Beliakova-Bethell, University of California, San Diego, United States
  • Steven Lada, University of California, San Diego, United States
  • Michael Breen, Icahn School of Medicine at Mt. Sinai, United States
  • Gkikas Magiorkinis, University of Oxford, United Kingdom
  • Douglas Richman, University of California, San Diego, United States
  • John Frater, University of Oxford, United Kingdom
  • John Holloway, University of Southampton, United Kingdom
  • Christopher Woelk, University of Southampton, United Kingdom
Session A-287: Recurrent neural network models to quantitatively predict RNA-RNA interactions
COSI: RNA
  • Michelle Wu, Stanford University School of Medicine, United States
  • Johan Andreasson, Stanford University School of Medicine, United States
  • Wipapat Kladwang, Stanford University School of Medicine, United States
  • William Greenleaf, Stanford University School of Medicine, United States
  • Rhiju Das, Stanford University School of Medicine, United States
Session A-289: Integrative Deep Models for Alternative Splicing
COSI: RNA
  • Yoseph Barash, University of Pennsylvania, United States
  • Anupama Jha, University of Pennsylvania, United States
  • Matthew Gazzara, University of Pennsylvania Perelman School of Medicine, United States

Short Abstract: Motivation: Advancements in sequencing technologies have highlighted the role of alternative splicing (AS) in increasing transcriptome complexity. This role of AS, combined with the relation of aberrant splicing to malignant states, motivated two streams of research, experimental and computational. The first involves a myriad of techniques such as RNA-Seq and CLIP-Seq to identify splicing regulators and their putative targets. The second involves probabilistic models, also known as splicing codes, which infer regulatory mechanisms and predict splicing outcome directly from genomic sequence. To date, these models have utilized only expression data. In this work we address two related challenges: Can we improve on previous models for AS outcome prediction, and can we integrate additional sources of data to improve predictions for AS regulatory factors? Results: We perform a detailed comparison of two previous modeling approaches, Bayesian and Deep Neural networks, dissecting the confounding effects of datasets and target functions. We then develop a new target function for AS prediction in exon skipping events and show it significantly improves model accuracy. Next, we develop a modeling framework that leverages transfer learning to incorporate CLIP-Seq, knockdown and overexpression experiments, which are inherently noisy and suffer from missing values. Using several datasets involving key splice factors in mouse brain, muscle and heart, we demonstrate both the prediction improvements and biological insights offered by our new models. Overall, the framework we propose offers a scalable integrative solution to improve splicing code modeling as vast amounts of relevant genomic data become available. Availability: code and data available at: majiq.biociphers.org/jha_et_al_2017/

Session A-291: Effect of de novo transcriptome assembly on read mapping correctness for transcript quantification
COSI: RNA
  • Ping-Han Hsieh, National Taiwan University, Taiwan
  • Yen-Jen Oyang, National Taiwan University, Taiwan
  • Chien-Yu Chen, National Taiwan University, Taiwan

Short Abstract: Correct quantification of transcript abundance is essential to understand the functional products of the genome in different physiological conditions and developmental stages. Recently, the development of high-throughput RNA sequencing (RNA-Seq) has allowed researchers to perform transcriptome analysis for organisms without a reference genome or transcriptome. For these practical projects, de novo transcriptome assembly must be carried out prior to quantification. However, the large number of fragmented contigs and redundant sequences produced by the assembler may result in ambiguous read mapping or unreliable abundance estimation, decreasing the reliability of transcript quantification. In this regard, this study aims at investigating how assembly quality might affect the quality of read mapping and count estimation. Through the experiments and analyses conducted in this study, several important factors that might seriously affect the accuracy of the RNA-Seq analysis workflow were comprehensively discussed. The effects of ambiguous regions originally present in the transcriptome on mapping and assembly quality were examined. Factors such as assembly completeness and types of mis-assembly were examined on both synthetic and practical sequencing reads. By assessing the quantification quality for each sequence category, this study has shown that the ambiguous regions present in the reference transcriptome only slightly influence mapping quality, but lead to un-eliminated mis-assembly. The resultant mis-assembly then heavily decreased the reliability of read mapping and count estimation. Among all the wrongly assembled contigs, the mistaken aggregation of several transcripts into one was shown to cause the most serious damage to the reliability of quantification. Fortunately, it is generally believed that this category can be largely reduced by post-modifications in the advanced transcriptome assembly pipelines developed recently.

Session A-293: Advancing parasite transcriptomics with spliced-leader sequencing experimental and computational workflows
COSI: RNA
  • Bart Cuypers, University of Antwerp, Belgium
  • Malgorzata Domagalksa, Institute of Tropical Medicine, Antwerp, Belgium
  • Geraldine de Muylder, Institute of Tropical Medicine, Antwerp, Belgium
  • Pieter Meysman, University of Antwerp, Belgium
  • Manu Vanaerschot, Columbia University in the City of New York, United States
  • Hideo Imamura, Institute of Tropical Medicine, Belgium
  • Franck Dumetz, Institute of Tropical Medicine, Belgium
  • Thomas-Wolf Verdonckt, Institute of Tropical Medicine, Belgium
  • Peter J. Myler, Center for Infectious Disease Research, United States
  • Gowthaman Ramasamy, Center for Infectious Disease Research, United States
  • Kris Laukens, University of Antwerp, Belgium
  • Jean-Claude Dujardin, Institute of Tropical Medicine, Antwerp, Belgium
Session A-295: Single cell transcriptomics reveals specific RNA editing signatures in the human brain
COSI: RNA
  • Ernesto Picardi, University of Bari & IBIOM-CNR, Italy
  • Anna Maria D'Erchia, University of Bari & IBIOM-CNR, Italy
  • Graziano Pesole, University of Bari & IBIOM-CNR, Italy
Session A-297: REDIportal: a comprehensive database of A-to-I RNA editing events in humans
COSI: RNA
  • Ernesto Picardi, University of Bari & IBIOM-CNR, Italy
  • Anna Maria D'Erchia, University of Bari & IBIOM-CNR, Italy
  • Claudio Lo Giudice, University of Bari, Italy
  • Graziano Pesole, University of Bari & IBIOM-CNR, Italy
Session A-473: PIPELINER - A Flexible High Throughput Sequencing Data Analysis Framework
COSI: RNA
  • Tanya Karagiannis, Boston University, United States
  • Kritika Karri, Boston University, United States
  • Dileep Kishore, Boston University, United States
  • Gary Benson, Boston University, United States
  • Josh Campbell, Boston University School of Medicine, United States
  • Stefano Monti, Boston University School of Medicine, United States

Short Abstract: High-throughput sequencing (HTS) technology has become essential in the study of genomics and has made it possible to obtain millions of sequencing reads in a single experiment in a cost-effective manner. However, the analysis of HTS data requires heavy utilization of computationally intensive techniques, since millions of sequencing samples have to undergo various processing steps, from read quality assessment and alignment to quantification. Each step in the analysis needs a specialized tool or algorithm and all of these steps need to be streamlined due to their computationally demanding nature. Both commercial and open source platforms are available for analysis of HTS data; however, they are either too limited in terms of tools and functionality (hence, rigid), or incorporate a wide array of tools but are difficult to use (hence, too complex). The aim of our project was to develop a framework that would address these concerns, and be user-friendly while retaining its flexibility and reproducibility. We have developed “PIPELINER”, a framework that is efficient for the user and can generate flexible and modular workflows for quality assessment and processing of HTS data. PIPELINER is based on Nextflow, a portable, scalable, parallelizable, domain-specific language (DSL) that enables our pipelines to have language and platform independence, implicit parallelism, and automatic failure recovery. PIPELINER also incorporates an Anaconda virtual environment, which allows for the pre-compilation of all the tools involved with the pipeline being generated. This makes the deployment process and execution platform-independent and less cumbersome. As a proof of concept, we developed a pipeline for processing of bulk RNA-sequencing (RNA-seq) data. Based on the lessons learned, we will next develop a pipeline for single cell RNA sequencing (scRNA-seq) data.

Session A-475: Conserved Long-Range RNA Secondary Structure Implicated in Eukaryotic pre-mRNA Processing
COSI: RNA
  • Dmitri Pervouchine, Skolkovo Institute of Science and Technology, Moscow, Russia
  • Carme Arnan Ros, Center for Genomic Regulation, Barcelona, Spain
  • Danila Bredikhin, Faculty of Bioengineering and Bioinformatics, Moscow State University, Russia
  • Andrei Mironov, Faculty of Bioengineering and Bioinformatics, Moscow State University, Russia
  • Roderic Guigo, Center for Genomic Regulation, Barcelona, Spain

Short Abstract: Eukaryotic RNAs undergo extensive processing at the post-transcriptional level, including capping, 3’-cleavage and polyadenylation, and splicing. These steps happen synergistically and also concurrently with each other and with transcription, generating multiple alternative products arising from the same locus. Here, by using comparative genomics, we identified a robust set of ~15,000 pairs of conserved complementary regions in non-coding regions of human protein-coding genes. These conserved regions have a large non-random overlap with eCLIP peaks, and in particular with those of RBFOX2, suggesting widespread structural regulatory mechanisms similar to RNA bridges. The complementary regions tend to avoid mutations, and even if polymorphisms occur in them, the corresponding impact on the free energy is smaller than that of random mutations. Conserved regions also contain cryptic splice sites and are possibly involved in the suppression of aberrant splicing. Interestingly, the complementary pairs are located preferentially close to splice junctions and contain an unexpectedly high number of internal transcription termination and transcription start sites. This leads to a hypothesis that intramolecular RNA structure in combination with splicing could serve to suppress premature cleavage and polyadenylation by holding RNA parts together while the spliceosome excises the intron containing the cleavage site. This mechanism could as well be responsible for seemingly alternative transcription start sites that are generated through premature cleavage and nuclear re-capping of initial introns. Overall, we find a highly non-random distribution of conserved complementary regions with respect to mammalian gene structure including not only splicing signal demarcation, but also transcriptional start and stop sites.

Session A-477: TurboFold II: RNA Structural Alignment and Secondary Structure Prediction Informed by Multiple Homologs
COSI: RNA
  • Zhen Tan, University of Rochester, United States
  • Yinghan Fu, United States
  • Guarav Sharma, University of Rochester, United States
  • David Mathews, University of Rochester, United States

Short Abstract: In this study, I developed TurboFold II, an extension of the TurboFold algorithm for predicting secondary structures for multiple RNA homologs. TurboFold II makes several improvements upon the original TurboFold algorithm. Whereas TurboFold only provided secondary structure predictions, TurboFold II also provides a multiple sequence alignment that incorporates information from secondary structure conservation. In contrast with TurboFold that used fixed alignment probabilities computed at the start using only sequence information, TurboFold II updates the alignment probabilities for inter-sequence alignment at each iteration. The updates incorporate secondary structure conservation information in the alignment by using a match score, calculated from estimated base pairing probabilities to represent the secondary structural similarity between nucleotide positions in the two sequences. Upon completion of the iterations, in addition to structure predictions computed as in TurboFold, TurboFold II computes a multiple sequence alignment that is progressively computed based on a probabilistic consistency transformation and a hierarchically computed guide tree, adopted from other sequence alignment methods. The TurboFold II algorithm is modified for prediction of RNA secondary structures to utilize base pairing probabilities guided by SHAPE experimental data. Results demonstrate that the SHAPE mapping data for a sequence improves structure prediction accuracy of other homologous sequences beyond the accuracy obtained by sequence comparison alone.
TurboFold II has alignment accuracy comparable to that of MAFFT and higher accuracy than other tools. TurboFold II also has structure prediction accuracy comparable to that of the original TurboFold algorithm, which is one of the most accurate methods. TurboFold II is part of the RNAstructure software package (http://rna.urmc.rochester.edu).

