Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


RNA: Computational RNA Biology

COSI Track Presentations

Schedule subject to change
Wednesday, July 24th
10:15 AM-10:20 AM
Opening Remarks
  • Yoseph Barash, Klemens Hertel, Michelle Scott
10:20 AM-11:00 AM
Translation is required to initiate miRNA-dependent decay of endogenous transcripts.
  • Ana Claudia Marques, Adriano Biasini, Stefano de Pretis, Jennifer Y. Tan, Baroj Abdulkarim, Harry Wischnewski, Rene Dreos, Mattia Pelizzola and Constance Ciaudo

Presentation Overview:

Posttranscriptional repression by microRNA (miRNA) occurs through transcript destabilization or translation inhibition. Whereas the general consensus is that RNA degradation explains most miRNA-dependent repression, transcript decay occurs co-translationally, raising questions regarding the requirement of target translation for miRNA-dependent transcript destabilization. To assess the importance of translation to miRNA-mediated RNA destabilization, we decoupled these two molecular processes by dissecting the impact of miRNA loss of function on cytosolic long noncoding RNAs (lncRNAs). We show that, despite interacting with the miRNA-loaded RNA-induced silencing complex, the steady-state abundance and degradation rates of these endogenously expressed non-translated transcripts are minimally impacted by miRNA loss. Forcing the association between lncRNAs and translating ribosomes, by fusing candidate noncoding transcripts to the 3’ end of a protein-coding gene reporter, results in their miRNA-dependent transcript destabilization. Furthermore, closer analysis of lncRNAs undergoing miRNA-dependent destabilization revealed that these are associated with translating ribosomes and are likely misannotated micropeptide-coding transcripts. Our analysis revealed that coding and noncoding transcripts are differently affected by miRNAs and supports the strict requirement of translation for miRNA-dependent transcript destabilization.

11:00 AM-11:20 AM
Single cell transcriptomics of liver-expressed long non-coding RNAs
  • Kritika Karri, Boston University, United States

Presentation Overview:

The liver exhibits striking metabolic zonation, with distinct functions and expression patterns for hepatocytes proximal to the portal vein compared to cells surrounding the central vein. However, little is known about the zonation of liver lncRNAs, which have diverse functions and regulatory activities. We analyzed mouse liver single cell RNA-seq data (Smart-seq2, Drop-Seq) to detect protein-coding mRNAs and lncRNAs at single cell resolution. 4,500 liver-expressed lncRNAs were detectable at >4 transcripts per million reads. Seurat analysis identified 130 lncRNAs as markers specific to individual liver cell types: hepatocytes, endothelial cells, Kupffer cells, B cells and NK cells. A further 28 lncRNAs showed zonated hepatocyte expression patterns based on established landmark genes. We also identified lncRNAs showing sex-biased expression. In addition, 234 of 412 lncRNAs responsive to the non-genotoxic hepatocarcinogen TCPOBOP were detected; these include lncRNAs specific to non-hepatocyte clusters and lncRNAs showing zonated expression in hepatocytes. These findings are being used to identify protein-coding gene targets of zone co-localized regulatory lncRNAs and obtain insight into the zone-based biological pathways they may regulate. These analyses also demonstrate that xenobiotic exposure can dysregulate lncRNAs expressed in multiple liver cell types.

11:20 AM-11:30 AM
MiR-205-5p and miR-342-3p cooperate in the repression of the E2F1 transcription factor in the context of anticancer chemotherapy resistance
  • Xin Lai, Universitätsklinikum Erlangen, Germany
  • Julio Vera, Universitätsklinikum Erlangen, Germany

Presentation Overview:

High rates of lethal outcome in tumour metastasis are associated with the acquisition of chemoresistance. Several clinical studies indicate that E2F1 overexpression culminates in unfavourable prognosis and chemoresistance in patients. Thus, fine-tuning the expression of E2F1 could be a promising approach for treating patients showing chemoresistance. We integrated bioinformatics, structural and kinetic modelling, and experiments to study cooperative regulation of E2F1 by microRNAs in the context of chemoresistance. We showed that an enhanced E2F1 repression efficiency can be achieved in chemoresistant tumour cells through two cooperating microRNAs. We then employed molecular dynamics simulations to show that miR-205-5p and miR-342-3p can form the most stable triplex with E2F1 mRNA. A mathematical model simulating the E2F1 regulation by the cooperative microRNAs predicted enhanced E2F1 repression, a feature that was verified by in vitro experiments. Finally, we integrated this cooperative microRNA regulation into a more comprehensive network to account for E2F1-related chemoresistance in tumour cells. The network model simulations and experimental data indicate the ability of enhanced expression of both miR-205-5p and miR-342-3p to decrease tumour chemoresistance by cooperatively repressing E2F1. Our results suggest that pairs of cooperating microRNAs could be used as potential RNA therapeutics to reduce E2F1-related chemoresistance.

11:30 AM-11:40 AM
Transcriptional Landscape of Human Progenitor Cell Populations
  • Maina Bitar, QIMR Berghofer, Australia
  • Isabela Pimentel de Almeida, Universidade de Sao Paulo and QIMR Berghofer, Brazil
  • Elizabeth O'Brien, QIMR Berghofer, Australia
  • Guy Barry, QIMR Berghofer, Australia

Presentation Overview:

Somatic progenitor cells are crucial for human tissue development and maintenance. These unspecialized, often tissue-specific cells have recently begun to be used clinically for organ repair and regeneration. As unique cell populations, their molecular composition is of great interest and panoramic views of their transcriptional landscape can support further developments in the field. Here we investigate for the first time a set of five distinct human progenitor cell types using state-of-the-art bioinformatics tools. We uncover similarities and differences between the cell types based on transcript expression profiles revealed by RNA-Seq. Our analyses suggest very high overall similarity among these progenitor cell populations at the transcriptome level. Although these cell populations have high transcriptomic similarity, we explored the differences between them and found unique transcriptional signatures and cell type-specific coding and non-coding transcripts. This study shows that progenitor cells have high transcriptomic similarity but that a minority of specialized and uniquely expressed transcripts are able to differentiate each cell type. Functional exploration of the transcriptomic similarities and differences between progenitor cells provided essential knowledge about unique cellular markers, and shared and distinct functions required for progenitor cell differentiation into the defined cell populations that constitute our tissues and organs.

11:40 AM-12:00 PM
scSLAM-seq and GRAND-SLAM reveal core features of CMV-induced regulation in single cells
  • Florian Erhard, Institut für Virologie und Immunbiologie, Julius-Maximilians-Universität Würzburg, Germany

Presentation Overview:

Current single-cell RNA sequencing (scRNA-seq) approaches analyze total RNA profiles at a single time point but convey little information about the underlying temporal dynamics. Thus, (i) responses to perturbations cannot be measured directly, (ii) kinetics of transcription cannot be investigated, (iii) short-term changes due to a perturbation within a timescale of a few hours are masked by pre-existing RNA and (iv) changes in RNA synthesis and decay cannot be differentiated.
We present single-cell SLAM-seq (scSLAM-seq), which integrates metabolic RNA labeling, biochemical nucleoside conversion and scRNA-seq to record transcriptional activity. A new computational approach (GRAND-SLAM) that we recently developed allowed us to precisely quantify both new and old RNA for thousands of genes in hundreds of individual cells.
We applied scSLAM-seq to the initial response to lytic cytomegalovirus (CMV) infection. Our data allowed us to perform dose-response analyses at the single cell level. Most of the variability of infection efficacy as well as the interferon and NF-κB responses is due to a combination of the cell cycle state at the time of infection and the infection dose. Moreover, scSLAM-seq visualizes transcriptional bursts. We show that these are associated with promoter-intrinsic features indicating that DNA methylation renders promoters non-permissive in between transcriptional bursts.

12:00 PM-12:20 PM
Cellular proteostasis collapse and decoupling between transcription and translation regulation in mammalian senescence
  • Reut Shalgi, Technion, Israel

Presentation Overview:

Collapse of proteostasis, the decline in the ability to cope with stress and maintain protein homeostasis in the face of external challenges, has been established as a hallmark of aging. The aging proteostasis collapse has been characterized in nematodes; however, whether it occurs in human aging remains controversial.
Here, using RNA-seq and ribosome footprint profiling, we characterized transcription and translation in young vs. senescent human fibroblasts, and asked how these cells respond to proteotoxic stress at various mRNA expression regulatory levels. We found that senescent cells showed a marked deterioration in their ability to mount a heat-shock related transcriptional response. Interestingly, stress-related alternative splicing regulation was also diminished in senescent cells. Surprisingly, we observed a decoupling between different regulatory branches of the Unfolded Protein Response (UPR) in senescence. While young cells initiated both UPR-related translational and transcriptional responses, senescent cells showed enhanced translational regulation, as well as ER stress sensing; however, they were unable to activate the UPR transcriptional response.
Together, our data established a first transcriptome-wide characterization of senescence proteostasis collapse at various mRNA expression regulatory levels, and revealed a general deterioration in the cellular ability to mount stress response transcriptional and alternative splicing programs upon senescence.

12:20 PM-12:40 PM
Proceedings Presentation: FunDMDeep-m6A: Identification and prioritization of functional differential m6A methylation genes.
  • Song-Yao Zhang, Northwestern Polytechnical University, China
  • Shao-Wu Zhang, Northwestern Polytechnical University, China
  • Xiao-Nan Fan, Northwestern Polytechnical University, China
  • Teng Zhang, Northwestern Polytechnical University, China
  • Jia Meng, Department of Biological Sciences, HRINU, SUERI, Xi’an Jiaotong-Liverpool University, China
  • Yufei Huang, Department of Electrical and Computer Engineering, the University of Texas at San Antonio, United States

Presentation Overview:

Motivation: As the most abundant mammalian mRNA methylation, N6-methyladenosine (m6A) exists in >25% of human mRNAs and is involved in regulating many different aspects of mRNA metabolism, stem cell differentiation and diseases like cancer. However, our current knowledge about dynamic changes of m6A levels and how the change of m6A levels for a specific gene can play a role in certain biological processes like stem cell differentiation and diseases like cancer is largely elusive.
Results: To address this, we propose in this paper FunDMDeep-m6A, a novel pipeline for identifying context-specific (e.g., disease vs. normal, differentiated cells vs. stem cells or gene knockdown cells vs. wild type cells) m6A-mediated functional genes. FunDMDeep-m6A includes, as its first step, DMDeep-m6A, a novel method based on a deep learning model and a statistical test for identifying differential m6A methylation (DmM) sites from MeRIP-Seq data at single-base resolution. FunDMDeep-m6A then identifies and prioritizes functional DmM genes (FDmMGenes) by combining the DmM genes (DmMGenes) with differential expression analysis using a network-based method. This proposed network method includes a novel m6A-signaling bridge (MSB) score to quantify the functional significance of DmMGenes by assessing functional interaction of DmMGenes with their signaling pathways using a heat diffusion process in protein-protein interaction (PPI) networks. The test results on 4 context-specific MeRIP-Seq datasets showed that FunDMDeep-m6A can identify more context-specific and functionally significant FDmMGenes than m6A-Driver. The functional enrichment analysis of these genes revealed that m6A targets key genes of many important context-related biological processes including embryonic development, stem cell differentiation, transcription, translation, cell death, cell proliferation and cancer-related pathways. These results demonstrate the power of FunDMDeep-m6A for elucidating m6A regulatory functions and its roles in biological processes and diseases.
Availability: The R-package for DMDeep-m6A is freely available from https://github.com/NWPU-903PR/DMDeepm6A1.0.
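
Illustration: the abstract above mentions a heat diffusion process on a PPI network. The following is a minimal, generic sketch of heat diffusion on a small undirected graph, not the MSB score or the FunDMDeep-m6A implementation. The function name heat_diffusion, the step size dt, the step count and the toy four-protein chain are all assumptions made for illustration.

```python
def heat_diffusion(edges, n_nodes, heat, dt=0.1, steps=10):
    """Explicit Euler steps of dh/dt = -L h, where L = D - A is the graph Laplacian.

    Generic textbook diffusion; all parameters here are illustrative assumptions.
    """
    neighbors = [[] for _ in range(n_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    h = list(heat)
    for _ in range(steps):
        # (L h)[u] = degree(u) * h[u] - sum of h over the neighbours of u
        lap = [len(neighbors[u]) * h[u] - sum(h[v] for v in neighbors[u])
               for u in range(n_nodes)]
        h = [h[u] - dt * lap[u] for u in range(n_nodes)]
    return h


# Hypothetical chain of four proteins 0-1-2-3; all heat starts on protein 0.
toy_edges = [(0, 1), (1, 2), (2, 3)]
final_heat = heat_diffusion(toy_edges, 4, [1.0, 0.0, 0.0, 0.0])
```

In this toy run the total heat is conserved and the amount reaching each protein falls with its network distance from the source, which is the intuition behind using diffusion to relate a gene to nearby pathway members.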

2:00 PM-2:20 PM
Global analysis of human mRNA folding demonstrates significant population constraint of disruptive synonymous variants
  • Peter White, The Institute for Genomic Medicine at Nationwide Children's Hospital, United States
  • Jeffrey Gaither, The Institute for Genomic Medicine at Nationwide Children's Hospital, United States
  • David Gordon, The Institute for Genomic Medicine at Nationwide Children's Hospital, United States
  • Grant Lammi, The Institute for Genomic Medicine at Nationwide Children's Hospital, United States
  • Blythe Moreland, The Institute for Genomic Medicine at Nationwide Children's Hospital, United States

Presentation Overview:

Current guidelines for variant interpretation classify most synonymous variants (sSNVs) as benign. RNA folding studies suggest mRNA secondary structure is essential for transcription and translation, yet the potential for pathogenic sSNVs impacting RNA folding in human disease is largely unknown. We therefore set out to test the hypothesis that sSNVs predicted to disrupt RNA stability would show significant constraint in the human population.

We performed a systematic study of SNPs impacting RNA stability. First, we developed novel cloud-based software using Apache Spark, deriving RNA folding metrics for every possible polymorphism in the human transcriptome (~0.5 billion variants). Second, we utilized population allele frequencies to determine if highly disruptive SNP mRNA folding values were constrained. Third, these metrics were utilized to construct a Structural Predictivity Index (SPI score).

We observed that sSNVs predicted to disrupt mRNA structure are highly constrained, supporting the hypothesis for their role in human genetic disease. To our knowledge, SPI is the first metric of its kind to allow assessment of sSNVs. Given that ~75% of rare disease patients have no clinically relevant finding using current variant interpretation approaches that ignore sSNVs, SPI has the potential to enable discovery of new pathogenic variants that impact RNA stability.

2:20 PM-2:40 PM
Integrative analysis of untranslated regions in human messenger RNAs uncovers G-quadruplexes as constrained regulatory features
  • David S.M. Lee, University of Pennsylvania, United States
  • Louis R. Ghanem, Children's Hospital of Philadelphia, United States
  • Yoseph Barash, University of Pennsylvania, United States

Presentation Overview:

Identifying regulatory elements in the noncoding genome is a fundamental challenge in biology. G-quadruplex (G4) sequences are abundant in untranslated regions (UTRs) of human messenger RNAs, but their functional importance remains unclear. By integrating multiple sources of genetic and genomic data, we show that putative G-quadruplex forming sequences (pG4) in 5’ and 3’ UTRs are selectively constrained, and enriched for cis-eQTLs and RNA-binding protein (RBP) interactions. Using over 15,000 whole-genome sequences, we uncover patterns of selection at single-nucleotide resolution in UTR pG4s supporting their role in mediating protein-binding via secondary-structure formation. In parallel, we identify new proteins with evidence for preferential binding at pG4s from ENCODE annotations, and delineate putative regulatory networks composed of shared binding targets. Finally, by mapping variants in the NIH GWAS Catalogue and ClinVar, we find enrichment for disease-associated variation in 3’UTR pG4s. At a GWAS pG4-variant associated with hypertension in HSPB7, we uncover robust allelic imbalance in GTEx RNA-seq across multiple tissues, suggesting that changes in gene expression associated with pG4 disruption underlie the observed phenotypic association. Taken together, our results establish UTR G-quadruplexes as important cis-regulatory features, and point to a putative link between disruption within UTR pG4 and susceptibility to human disease.

2:40 PM-3:00 PM
RNA 2D/3D structure prediction with a consensus of contact methods
  • Russell Hamilton, University of Cambridge, United Kingdom
  • Anne Ferguson-Smith, University of Cambridge, United Kingdom
  • William Taylor, Francis Crick Institute, United Kingdom

Presentation Overview:

RNA structures are formed from canonical base-pairings (Watson-Crick A:U and C:G, plus wobble G:U) forming structural elements such as stem-loops. However, non-canonical base-pairings mediated through the Hoogsteen and sugar edges of the nucleotides permit many more base-pairing combinations, enabling more elaborate 3D structures. Motifs such as the G-quadruplex, i-motif and kink-turn have been suggested to mediate the translation of the RNA; however, the roles of these motifs remain enigmatic. Therefore, accurate prediction of these motifs is essential for furthering our understanding of the non-canonical base-paired motif functions.

RNA base-pairings predicted using correlated mutation approaches can provide powerful restrictions on the 2D/3D conformation of RNA motifs. However, we have previously shown predicted contacts can be too erratic to provide generalised RNA 3D predictions. To further assess this limitation, we evaluate contact predictions made by four methods (CCMpred, R-Scape, Plmc, pySCA) and benchmark their individual performance against databases of RNAs with known structures (Rfam & RNA-puzzles). We then produce a consensus of restraints, supplemented with minimum-free energy calculations taking into account base stacking. Performance of the consensus restraints is assessed by their ability to accurately predict 3D structures. We discuss strengths and weaknesses of each of the methods and present the results as a dynamic web resource.

3:00 PM-3:20 PM
Proceedings Presentation: LinearFold: Linear-Time Approximate RNA Folding by 5’-to-3’ Dynamic Programming and Beam Search
  • Liang Huang, Oregon State University and Baidu Research USA, United States
  • He Zhang, Baidu Research USA, United States
  • Dezhong Deng, Oregon State University, United States
  • Kai Zhao, Oregon State University, United States
  • Kaibo Liu, Baidu Research USA, United States
  • David Hendrix, Oregon State University, United States
  • David Mathews, University of Rochester, United States

Presentation Overview:

Motivation: Predicting the secondary structure of an RNA sequence is useful in many applications. Existing algorithms (based on dynamic programming) suffer from a major limitation: their runtimes scale cubically with the RNA length, and this slowness limits their use in genome-wide applications.

Results: We present a novel alternative O(n^3)-time dynamic programming algorithm for RNA folding that is amenable to heuristics that make it run in O(n) time and O(n) space, while producing a high-quality approximation to the optimal solution. Inspired by incremental parsing for context-free grammars in computational linguistics, our alternative dynamic programming algorithm scans the sequence in a left-to-right (5’-to-3’) direction rather than in a bottom-up fashion, which allows us to employ the effective beam pruning heuristic. Our work, though inexact, is the first RNA folding algorithm to achieve linear runtime (and linear space) without imposing constraints on the output structure. Surprisingly, our approximate search results in even higher overall accuracy on a diverse database of sequences with known structures. More interestingly, it leads to significantly more accurate predictions on the longest sequence families in that database (16S and 23S Ribosomal RNAs), as well as improved accuracies for long-range base pairs (500+ nucleotides apart), both of which are well known to be challenging for the current models.

Availability: Our source code is available at https://github.com/LinearFold/LinearFold, and our webserver is at http://linearfold.org (max. sequence length on server: 100,000nt).
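
Illustration: to give a feel for left-to-right scanning with beam pruning, here is a toy sketch that maximizes the number of base pairs over dot-bracket prefixes. It is not the LinearFold algorithm: it does no dynamic-programming state merging and uses no energy or learned scoring model. The names beam_fold and PAIRS, the min_loop value and the pair-count score are assumptions made for this example.

```python
# Toy scoring: one point per closed pair; Watson-Crick and G:U wobble pairs allowed.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def beam_fold(seq, beam=50, min_loop=3):
    """Scan seq 5'-to-3', keeping at most `beam` partial structures per position."""
    # A state is (closed_pairs, stack_of_open_positions, structure_prefix).
    states = [(0, (), "")]
    for j, base in enumerate(seq):
        candidates = []
        for pairs, stack, struct in states:
            candidates.append((pairs, stack, struct + "."))           # leave j unpaired
            candidates.append((pairs, stack + (j,), struct + "("))    # tentatively open at j
            if stack:
                i = stack[-1]
                if j - i > min_loop and (seq[i], base) in PAIRS:
                    candidates.append((pairs + 1, stack[:-1], struct + ")"))  # close i..j
        # Beam pruning: keep only the best-scoring prefixes.
        candidates.sort(key=lambda state: -state[0])
        states = candidates[:beam]
    pairs, stack, struct = states[0]
    # Any bracket that was opened but never closed is reported as unpaired.
    chars = list(struct)
    for i in stack:
        chars[i] = "."
    return pairs, "".join(chars)


exact = beam_fold("GGGAAACCC", beam=10000)   # beam wide enough to keep every prefix
narrow = beam_fold("GGGAAACCC", beam=1)      # aggressive pruning, may miss pairs
```

With a beam wide enough to retain every prefix, the search is exhaustive for this short sequence; with a narrow beam the number of states handled per position is bounded by the beam size, at the cost of possibly missing the best structure.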

3:20 PM-3:30 PM
High resolution analysis of functional regions of mRNA folding in protein-coding sequences across the tree of life
  • Michael Peeri, Tel Aviv University, Israel
  • Tamir Tuller, Tel Aviv University, Israel

Presentation Overview:

mRNA can form local structures, and the local folding strength affects the interaction with the ribosome and is thought to influence many additional aspects of gene expression. However, the way evolution shapes local mRNA structure strength in coding sequences is still poorly understood. In this study, we performed the first analysis of selection on secondary-structure strength in the coding sequences of 513 species considering their phylogenetic relationships. We show that coding sequences in phyla from the three domains of life contain consistent regions of increased or decreased secondary-structure strength. These regions coincide with mRNA sections involved in different gene expression processes and in particular different stages of protein translation (initiation, elongation and termination), indicating folding strength has a role in these processes. The increase or decrease in secondary-structure strength in different parts of the coding sequence correlates with genomic and environmental traits and is disrupted in species expected to have weak selection for efficient gene expression (such as species with weak codon-bias or intracellular replication). Our results suggest that mRNA secondary-structure strength is maintained under selection to improve gene expression efficiency. This mechanism complements other synonymous features of the coding sequence to regulate mRNA concentrations and optimize the translation process.

3:30 PM-3:40 PM
Feature reduction of CRISPR-Cas9 on-target efficiency prediction improves the accuracy
  • Jan Gorodkin, University of Copenhagen, Denmark
  • Giulia Corsi, University of Copenhagen, Denmark
  • Ferhat Alkan, University of Copenhagen, Denmark

Presentation Overview:

The CRISPR-Cas9 system has become a highly popular DNA scissor in genome editing. However, not all guide RNAs (gRNAs) cleave equally efficiently. Consequently, a wide variety of (machine learning) methods predicting the efficiency have been developed, but they typically encode >600 features of the gRNA as input along with experimentally determined efficiencies as output. The features are typically derived directly from the primary sequence only, and potentially relevant features from gRNA self-folding and binding interaction with the DNA are omitted, while redundant features can hamper the prediction accuracy. Here, we analyzed the 629 input features used to predict the cleavage efficiency of the Cas9-gRNA tool. We employed five-fold cross-validation with an independent test set for different architectures of a gradient boosted tree and a feature elimination strategy to reduce the number of input features to 129 without performance loss. Adding five energy-related features yields a reduction to 98 features, while increasing the performance on the independent test set. When comparing the 15 highest weighted features among the trained models, known features like the cutting position are among these. Interestingly, our five energy-related features are consistently weighted as top features in all five resulting models.

3:40 PM-3:50 PM
LinearCoFold: Two-Strand RNA Folding in Linear Time
  • Liang Huang, Oregon State University and Baidu Research USA, United States
  • He Zhang, Baidu Research USA, United States

Presentation Overview:

Most ncRNAs function through RNA-RNA interactions. Fast and reliable RNA secondary structure prediction that takes RNA-RNA interaction into account is therefore desirable. Some existing tools, such as RNAhybrid and RNAduplex, are not only less informative but also less accurate because they omit the competition between intermolecular and intramolecular base pairs. Another group of tools, such as RNAup, focuses on predicting the binding region rather than the two-strand co-folding structure. Other tools, like RNAcofold, are slow. We present LinearCoFold, which is able to predict a pseudoknot-free co-folding structure in linear runtime and space. LinearCoFold is a global co-folding approach without restriction on base pair length, and can output both intermolecular and intramolecular base pairs. LinearCoFold extends LinearFold to two-strand co-folding by concatenating two interacting RNAs, and adopts a left-to-right dynamic programming (DP) scheme. This alternative DP order allows it to apply a beam pruning heuristic to improve performance. LinearCoFold is 6 times faster than RNAcofold for an RNA-RNA complex with 6000+ nucleotides. Even though it uses approximate search, LinearCoFold is also more accurate than RNAcofold: overall PPV/Sensitivity increases by 0.99/3.39. LinearCoFold also significantly improves accuracy on the longest families. For the longest family, PPV/Sensitivity is improved by 13.33/15.38, respectively.
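
Illustration: when two strands are concatenated and folded as one sequence, each predicted base pair can be labelled as intramolecular or intermolecular from the strand boundary alone. The sketch below assumes a balanced, pseudoknot-free dot-bracket string for the concatenated strands; the function name classify_pairs and the toy structure are hypothetical and are not taken from LinearCoFold.

```python
def classify_pairs(structure, len_a):
    """Split the base pairs of a concatenated two-strand dot-bracket string.

    Positions 0 .. len_a-1 belong to strand A, the remainder to strand B.
    Assumes brackets are balanced; an unbalanced ')' raises IndexError.
    """
    stack, intra, inter = [], [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            i = stack.pop()
            # A pair is intramolecular when both ends lie on the same strand.
            same_strand = (i < len_a) == (j < len_a)
            (intra if same_strand else inter).append((i, j))
    return intra, inter


# Hypothetical example: strand A has 7 nucleotides, strand B has 2.
intra_pairs, inter_pairs = classify_pairs("(...)((" + "))", len_a=7)
```

Here the hairpin inside strand A is reported as intramolecular, while the two pairs spanning the strand boundary are reported as intermolecular.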

4:40 PM-5:00 PM
What can we do with RNA-protein interaction?
  • Rolf Backofen, Albert-Ludwigs-University Freiburg, Germany

Presentation Overview:

It is becoming increasingly clear that RNA-binding proteins are key elements in regulating the cell's transcriptome. CLIP-seq is one of the major tools to determine binding sites but suffers from a high false negative rate due to its expression dependency. This critically hinders the use of public CLIP data. We will show in several examples how the use of raw public CLIP-seq data can lead to false biological reasoning and how an advanced machine learning approach can overcome this problem. I will also introduce some recent applications of this approach that allow us to determine mechanisms underlying post-transcriptional regulation that can be determined only by using predictions based on CLIP data. We have introduced the MechRNA approach, which combines prediction evidence to determine the molecular mechanisms of RNA-based regulation.

Tugce Aktas, Ibrahim Avsar Ilik, ... , Thomas Manke, Rolf Backofen, and Asifa Akhtar.
DHX9 suppresses RNA processing defects originating from the Alu invasion of the human genome. Nature, 2017.

Alexander R. Gawronski, Michael Uhl, ... , S. Cenk Sahinalp, and Rolf Backofen.
MechRNA: prediction of lncRNA mechanisms from RNA-RNA and RNA-protein interactions. Bioinformatics, 2018.

5:00 PM-5:20 PM
Predicting canonical and non-canonical box C/D snoRNA interactions using machine learning
  • Gabrielle Deschamps-Francoeur, Université de Sherbrooke, Canada
  • Michelle Scott, Université de Sherbrooke, Canada

Presentation Overview:

Small nucleolar RNAs (snoRNAs) are small non-coding RNAs divided into two families, box C/D and box H/ACA, that are respectively known to guide methylation and pseudouridylation of ribosomal and small nuclear RNAs. These functions are well characterized and require an interaction with specific regions of the snoRNAs. In humans, however, some snoRNAs do not have known canonical targets. Also, some snoRNAs exhibit non-canonical functions, such as regulation of alternative splicing and of mRNA stability, the deregulation of which has been implicated in diseases such as Prader-Willi syndrome and cancer. New methodologies have been developed that allow the high-throughput detection of RNA-RNA interactions. The study of these datasets revealed that canonical interactions account for only 5% of all snoRNA interactions.
The aim of this project is to develop a tool to predict snoRNA interactions, both canonical and non-canonical, using an artificial neural network. The datasets obtained using high-throughput interaction identification methodologies were analyzed and compared, together with a curation of the literature. The interaction sequences were fed to the algorithm, resulting in an accuracy of 0.78.
With this tool, we will be able to predict novel potential snoRNA interactions, shedding light on their non-canonical functions and their implication in different diseases.

5:20 PM-5:40 PM
Chromatin-enriched RNAs mark both active and repressive cis-regulation: a computational analysis of nuclear RNA-seq
  • Xiangying Sun, Purdue University, United States
  • Zhezhen Wang, The University of Chicago, United States
  • Carlos Perez-Cervantes, The University of Chicago, United States
  • Alex Ruthenburg, The University of Chicago, United States
  • Ivan Moskowitz, The University of Chicago, United States
  • Michael Gribskov, Purdue University, United States
  • Xinan Yang, The University of Chicago, United States

Presentation Overview:

Most long noncoding RNAs (lncRNAs) localize in the cell nucleus to influence important biological processes. Modern RNA sequencing of nuclei and/or their components has revealed cis-regulatory lncRNAs, including chromatin-enriched nuclear RNA (cheRNA) that is tightly bound to chromatin. However, a rigorous analytic pipeline for nuclear RNA-seq is lacking. In this study, we survey four computational strategies for nuclear RNA-seq data analysis and show that a new pipeline (Tuxedo) outperforms the others in complete transcriptome assembly and accurate cheRNA identification. Analyzing well-studied K562 datasets with the Tuxedo pipeline, we characterize genomic features of intergenic cheRNA (icheRNA) that are similar to those of enhancer RNA (eRNA). Moreover, we quantify the transcriptional correlation of icheRNA and adjacent genes, affirming that icheRNA is more positively associated with its neighbor gene in expression than the eRNA predicted by state-of-the-art methods or CAGE (cap analysis of gene expression) signals. We further propose icheRNA coincident with H3K9me3 marks as a very effective predictor for novel chromatin-based eRNA, and a potential cis-repressive function of antisense cheRNA (as-cheRNAs); these activities are likely to be involved in transiently modulating cell type-specific cis-regulation. These findings demonstrate that a rigorous computational analysis of nuclear RNA-seq will shed new light on cis-regulation.

5:40 PM-5:50 PM
miRCoop: Identifying Cooperating Pairwise miRNAs via Kernel Based Interaction Test
  • Oznur Tastan, Sabanci University, Turkey
  • Gulden Olgun, Bilkent University, Turkey

Presentation Overview:

MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression post-transcriptionally by binding the complementary sequence of their target messenger RNAs (mRNAs). Recent studies reveal that miRNA pairs can repress the translation of a target mRNA in a synergistic fashion; when bound together they induce a stronger down-regulation of their target mRNA. Our knowledge of synergistic miRNA pairs is very limited. In order to identify cooperative miRNA pairs, we propose a new method: miRCoop. miRCoop makes use of miRNA-mRNA target prediction tools to find miRNA pairs that are predicted to target the same mRNA with non-overlapping binding sites. Using these potential triplets, we conduct kernel-based statistical interaction tests on the expression profiles of miRNAs and mRNAs to identify triplets for which the miRNAs’ expressions are statistically independent from the mRNA’s expression when taken individually but are dependent when taken together. When applied to kidney cancer patient expression profiles, miRCoop finds 503 potentially cooperative miRNA:miRNA:mRNA interactions. Several of these miRNAs are regulated by the same transcription factor. Furthermore, there are pairs clustered on the genome. We hope miRCoop will facilitate the mapping of the miRNA functional landscape.
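
Illustration: the first step described above, collecting miRNA pairs predicted to bind the same mRNA at non-overlapping sites, can be sketched as follows. This is only a sketch of the candidate-enumeration idea, not miRCoop's code, and the kernel-based interaction test is not shown. The function name candidate_triplets, the (miRNA, mRNA, start, end) tuple layout with an exclusive end coordinate, and the toy site list are all assumptions.

```python
from itertools import combinations


def candidate_triplets(sites):
    """Return sorted (miRNA, miRNA, mRNA) candidates with non-overlapping sites.

    `sites` holds hypothetical predicted binding sites as
    (mirna, mrna, start, end) tuples, with `end` exclusive.
    """
    by_mrna = {}
    for mirna, mrna, start, end in sites:
        by_mrna.setdefault(mrna, []).append((mirna, start, end))
    triplets = set()
    for mrna, hits in by_mrna.items():
        for (m1, s1, e1), (m2, s2, e2) in combinations(hits, 2):
            if m1 == m2:
                continue  # two sites of the same miRNA do not form a pair
            if e1 <= s2 or e2 <= s1:  # the two intervals do not overlap
                triplets.add((min(m1, m2), max(m1, m2), mrna))
    return sorted(triplets)


# Made-up predictions, for illustration only.
toy_sites = [
    ("miR-a", "GENE1", 10, 32),
    ("miR-b", "GENE1", 40, 61),
    ("miR-c", "GENE1", 25, 47),   # overlaps both other sites on GENE1
    ("miR-a", "GENE2", 5, 27),
]
toy_triplets = candidate_triplets(toy_sites)
```

Only the miR-a/miR-b pair on GENE1 survives in this toy input, because the miR-c site overlaps both of the others and GENE2 has a single predicted site.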

8:00 PM-10:00 PM
  • Hotel Sorell Merian
Thursday, July 25th
8:30 AM-8:50 AM
Is this the “end”? Termin(A)ntor: Transcriptome annotation with deep learning
  • Chenkai Li, BC Cancer Genome Sciences Centre, Canada
  • Chen Yang, BC Cancer Genome Sciences Centre, Canada
  • Ka Ming Nip, BC Cancer Genome Sciences Centre, Canada
  • Rene Warren, BC Cancer, Genome Sciences Centre, Canada
  • Inanc Birol, BC Cancer Genome Sciences Centre, Canada

Presentation Overview: Show

Recent advances in high-throughput sequencing technologies have enabled comprehensive transcriptome analysis with base-level resolution. However, intrinsic biases, such as GC content and PCR cycles, make it challenging to assemble and annotate the 5’ and 3’ ends of transcript isoforms. Consequently, reconstructed transcript sequences may be incomplete, potentially missing untranslated regions, resulting in unreliable functional and regulatory analyses. Thus, it is desirable to have a completeness annotation pipeline for assembled transcripts and to incorporate it into RNA-seq analysis routines. Here we present Termin(A)ntor, a transcript annotation utility that is built upon two deep neural network classifiers. With as little as 20 bp of sequence from the 5’ or 3’ end of an assembled transcript, our models achieve >81% and >87% annotation accuracy for the 5’ transcription start site and the 3’ polyadenylation site (Poly(A) site), respectively. In a benchmark of Poly(A) site prediction performance on two human RNA-seq samples, Termin(A)ntor demonstrates both higher sensitivity and higher precision than state-of-the-art methods. We performed a cross-species experiment to show the capability of Termin(A)ntor to accurately annotate species without a good reference annotation or genome.
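
As a rough illustration of the input described in this abstract, the sketch below extracts a 20 bp window from either end of an assembled transcript and one-hot encodes it, which is a common way to feed sequence to a neural network classifier. The function names, the `"5p"`/`"3p"` labels, and the encoding itself are assumptions made for this example; the abstract does not describe how Termin(A)ntor actually encodes its input.

```python
import numpy as np

BASES = "ACGT"


def terminal_window(seq, end="3p", width=20):
    """Return the first (5') or last (3') `width` bases of a transcript."""
    if end not in ("5p", "3p"):
        raise ValueError("end must be '5p' or '3p'")
    seq = seq.upper()
    if len(seq) < width:
        raise ValueError("sequence is shorter than the requested window")
    return seq[:width] if end == "5p" else seq[-width:]


def one_hot(window):
    """Encode a DNA string as a (length, 4) matrix; non-ACGT bases stay all-zero."""
    mat = np.zeros((len(window), 4), dtype=np.float32)
    for i, base in enumerate(window):
        j = BASES.find(base)
        if j >= 0:
            mat[i, j] = 1.0
    return mat


# Made-up transcript ending in a poly(A) signal followed by an A stretch.
transcript = "ACGT" * 10 + "AATAAA" + "A" * 14
features = one_hot(terminal_window(transcript, end="3p"))
print(features.shape)
```

The resulting matrix would be the input to a classifier; the model architecture is outside the scope of this sketch.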

8:50 AM-9:00 AM
RNA-seq methodological landscape: the ignored importance of the choice of genome annotations
  • Joël Simoneau, Université de Sherbrooke, Canada
  • Simon Dumontier, Université de Sherbrooke, Canada
  • Ryan Gosselin, Université de Sherbrooke, Canada
  • Michelle Scott, University of Sherbrooke, Canada

Presentation Overview: Show

The process of transforming RNA-seq sequencing data into meaningful quantification of gene features can be decomposed into a series of defined steps. Hundreds of software tools and several biological resources exist to fulfill the different steps. Therefore, users must define a set of steps (e.g. trimming, alignment and quantification software, in addition to a genome assembly and annotation) to process RNA-seq datasets. However, users cannot currently rely on an extensive assessment of the importance of every design choice to create their own suitable analysis. Our objective is to characterize the relative importance of each methodological step in RNA-seq on gene quantification. First, the current usage of software, genome and genomic annotation was characterized throughout the literature by performing a methodological review. Second, using different permutations of software and references highlighted by the methodological review, we explored the biases of the steps using statistical approaches. Through a methodological review and statistical analyses of RNA-seq data, we show that the choice of genome annotation not only has the biggest impact on gene quantification, it is also the least well-described design choice in the literature. We believe that the importance of genome annotation in quantification has been underestimated and not thoroughly characterized.

9:00 AM-9:10 AM
Learning biologically meaningful representations of cancer transcriptomes with hierarchical Variational Bayes
  • Jasleen Grewal, Canada's Michael Smith Genome Sciences Centre, Canada
  • Steven Jones, BC Cancer, Genome Sciences Centre, Canada

Presentation Overview: Show

RNA-Seq experiments yield unbiased gene expression measurements with high dynamic range and comprehensive transcriptome coverage. In spite of this, most interpretation tasks from RNA-Seq data use gene subsets selected by statistical prioritization or background knowledge. This approach overlooks the impact of understudied genes and diffused biological changes, and can influence our understanding of unusual etiologies. This is particularly relevant when dissecting the biological underpinnings of rare and advanced cancers.
Our method leverages Bayesian inference to approximate the generative distribution from which individual RNA-Seq transcriptomes originate. We use this method to develop compact representations of >10,000 primary cancer transcriptomes. These representations can be decoded back into the whole transcriptome, and our findings suggest that they capture biological hierarchies within the data. Upon analysing decoded representations, we found that genes associated with cellular functionality, and cancer onset and maintenance are highly correlated with their scaled input RPKM values, whereas short non-coding RNAs, keratin associated protein family members, and a subset of pseudogenes are reproduced with less fidelity. The immediate application of this method lies in identifying genes that are under strong transcriptional control in a set of samples, and in generating biologically meaningful, compressed representations of the whole transcriptome for further analysis.

9:10 AM-9:30 AM
Complexity and evolution of the mammalian transcriptome: the architecture of alternative transcription and processing
  • Svetlana Shabalina, NCBI/NLM/NIH, United States

Presentation Overview: Show

Alternative transcription and splicing create extraordinary complexity of transcriptomes and lay the basis for structural and functional diversity of mammalian proteomes. We present evidence that the acquisition of new exons in spliced genes can occur by mosaic extension of gene functional domains, where new alternative coding exons can be incorporated through the course of evolution, preferentially at the ends of CDSs. Additionally, the acquisition of novel exons at the boundaries of CDSs and UTRs is primarily mediated by alternative transcription events, where extended 5' and 3' ends make major contributions to the diversity of the protein isoforms. Alternative transcription makes a five times larger contribution to human transcriptome and proteome diversity than alternative splicing, specifically for tissue- and condition-specific genes and their expression. Our results suggest that differential processing of the 5' and 3' ends reflects two regulatory strategies at the level of i) transcription initiation using variable promoters and ii) post-transcriptional regulation with C-terminal variability of protein isoforms. We showed that during evolution, compact protein domains are typically encoded by highly structured mRNAs, suggesting that alternative mRNA structures might control protein folding of alternative isoforms. The role of alternative isoforms as biomarkers will be discussed for disease states.

9:30 AM-9:40 AM
Reference-free transcriptome assembly of nanopore RNA-seq data
  • Chen Yang, BC Cancer Genome Sciences Centre, Canada
  • Saber Hafezqorani, BC Cancer Genome Sciences Centre, Canada
  • Rene Warren, BC Cancer, Genome Sciences Centre, Canada
  • Inanc Birol, BC Cancer Genome Sciences Centre, Canada

Presentation Overview: Show

In recent years, there has been a growth in the number of sequence assembly solutions for nanopore sequencing data. These methods are designed for genome assembly and do not work well with RNA-seq data, if at all. Existing methods for transcriptome assembly with nanopore RNA-seq reads are either reference-guided or intended for assembling a hybrid of long and short reads. To fill this gap, we introduce a de novo assembly method that only uses nanopore RNA-seq data. In our approach, reads are first corrected for errors and then stratified by length and k-mer coverage. Then, error-corrected reads are retrieved from each stratum for clustering into groups of reads that belong to the same gene. Finally, the reads in each cluster are assembled into transcript isoforms. Using a simulated mouse transcriptome dataset, we show that our method was able to correct a significant proportion of errors in the nanopore reads and then assemble full-length isoforms from clusters of reads predominantly representing single genes. Since our method does not rely on a reference for transcript sequence reconstruction, it sets up the groundwork for large-scale comparative transcriptomics where high-quality draft genome assemblies are not readily available.
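
The stratification step mentioned in this abstract (grouping corrected reads by length and k-mer coverage before clustering) might look roughly like the sketch below. The k-mer size, the length bin width, the coverage cutoff, and the use of the median k-mer count as a per-read coverage estimate are all assumptions chosen for illustration; the abstract does not specify how the strata are defined.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def stratify(reads, k=3, length_step=10, coverage_cutoff=2):
    """Group reads into (length_bin, coverage_class) strata.

    Coverage of a read is estimated as the median count of its k-mers across
    the whole read set. Reads shorter than k are skipped.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))
    strata = {}
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # too short to estimate coverage
        cov = median(counts[km] for km in read_kmers)
        label = "high" if cov >= coverage_cutoff else "low"
        strata.setdefault((len(read) // length_step, label), []).append(read)
    return strata


# Toy reads: two identical short reads and one longer read with unique k-mers.
toy_reads = ["AAAAA", "AAAAA", "ACGTTGCATCGA"]
for key, members in sorted(stratify(toy_reads).items()):
    print(key, members)
```

Real nanopore reads are thousands of bases long, so practical values of `k` and `length_step` would be much larger than in this toy example.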

10:15 AM-10:20 AM
  • Yoseph Barash, Klemens Hertel, Michelle Scott
10:20 AM-11:00 AM
Computational approaches to dissect post-transcriptional gene regulation
  • Uwe Ohler

Presentation Overview: Show

Deep sequencing protocols enable the large-scale identification of the targets of regulatory non-coding RNAs as well as RNA-binding proteins, and they also enable the profiling of different steps of gene regulation, from nascent transcription to translation. I will present ongoing work on detecting translation at the isoform level from ribosome profiling data, as well as deep learning approaches to identify functional binding sites and their features from cross-linking and immunoprecipitation compendia.

11:00 AM-11:20 AM
Innovative advanced computational solutions for improved gene and transcript level analysis using RNA-seq
  • Runxuan Zhang, The James Hutton Institute, United Kingdom
  • Wenbin Guo, University of Dundee, United Kingdom
  • Cristiane Calixto, University of Dundee, Brazil
  • Allan James, University of Glasgow, United Kingdom
  • Hugh Nimmo, University of Glasgow, United Kingdom
  • John Brown, University of Dundee, United Kingdom

Presentation Overview: Show

Understanding the current limitations of RNA-seq is crucial for reliable expression analysis. We have developed:
1) Novel methods to construct high-quality transcript references using RNA-seq (Zhang et al. 2017 NAR) and Iso-seq data. Comprehensive and accurate transcript references ensure the accuracy of transcript quantification which underpins research on post-transcriptional regulation (AS, APA, translation, etc).
2) A cutting-edge pipeline (3D RNA-seq) that detects differential gene expression, alternative splicing, and transcript usage. It allows simple and rapid expression/AS analysis of RNA-seq experiments by biologists with no programming skills (https://github.com/wyguo/ThreeDRNAseq).
3) A shiny app (TSIS) (Guo et al. 2017 Bioinformatics) to detect and characterize significant transcript isoform switches for time-series RNA-seq.
4) Tools that accurately identify open reading frames (ORFs) avoiding mis-annotation of ORFs found in many databases (Brown et al. 2015 Plant Cell) (Transfix) and characterize transcripts encoding protein variants, AS events, premature termination codons and nonsense-mediated decay features (Transfeature).

These tools/methods enabled the large scale investigation of expression/AS in a cold time-series in Arabidopsis showing a massive and rapid AS response and identifying novel cold-responsive transcription and splicing factors regulated only by AS (Calixto et al. 2018 Plant Cell).

11:20 AM-11:40 AM
ShiRlOc: A robust computational approach to analyze Polysome Profiling RNA-Seq Data
  • Charles Blatti, University of Illinois at Urbana-Champaign, United States
  • Mikel Heranez, University of Illinois at Urbana-Champaign, United States
  • Waqar Arif, University of Illinois at Urbana-Champaign, United States
  • Auinash Kalsotra, University of Illinois at Urbana-Champaign, United States

Presentation Overview: Show

There has been a growing interest in understanding the regulation of translating mRNAs within a cell, or the translatome. Recently, researchers have coupled RNA-Seq with polysome profiling, a well-established technique in which intact mRNA can be fractionated based on the number of associated ribosomes, or its ribosomal occupancy. Data obtained from this method provide a transcriptome-wide view of translating mRNA and have the potential to shed light on mechanisms of regulation. However, a robust computational pipeline to analyze these data has been lacking. Previous approaches have utilized clustering techniques on normalized read counts to identify transcripts with differential ribosomal association. A major drawback of this method is the lack of statistical testing to discriminate transcripts with significant differences. In the present work, we propose a robust computational approach for the analysis of Polysome Profiling RNA-Seq data and identification of transcripts exhibiting translation control. We call our pipeline Shirloc, or Shifts in Ribosomal Occupancy. Utilizing publicly available datasets, we have found our methodology is able to identify thousands of transcripts with varying degrees of ribosomal occupancy. However, our pipeline has also revealed that a significant portion of expressed transcripts display large variability and their relative ribosomal occupancy cannot be confidently determined.

11:40 AM-12:00 PM
Proceedings Presentation: ShaKer: RNA SHAPE prediction using graph kernel
  • Stefan Mautner, Albert-Ludwigs-University Freiburg, Germany
  • Soheila Montaseri, Albert-Ludwigs-University Freiburg, Iran
  • Milad Miladi, University of Freiburg, Germany
  • Martin Mann, University of Freiburg, Germany
  • Fabrizio Costa, University of Exeter, United Kingdom
  • Rolf Backofen, Albert-Ludwigs-University Freiburg, Germany

Presentation Overview: Show

SHAPE experiments are used to probe the structure of RNA molecules.
We present ShaKer to predict SHAPE data for RNA using a graph-kernel-based machine learning approach that is trained on experimental SHAPE information.
While other available methods require a manually curated reference structure, ShaKer predicts reactivity data based on sequence input only and by sampling the ensemble of possible structures.
Thus, ShaKer is well placed to enable experiment-driven, transcriptome-wide SHAPE data prediction to enable the study of RNA structuredness and to improve RNA structure and RNA-RNA interaction prediction.

For performance evaluation we use accuracy and accessibility, comparing against experimental SHAPE data and competing methods. We show that ShaKer outperforms its competitors and is able to predict high-quality SHAPE annotations even when no reference structure is provided.

12:00 PM-12:20 PM
Proceedings Presentation: Prediction of mRNA subcellular localization using deep recurrent neural networks
  • Zichao Yan, McGill University, Canada
  • Eric Lécuyer, Institut de Recherche Clinique de Montréal, Canada
  • Mathieu Blanchette, McGill University, Canada

Presentation Overview: Show

Motivation: Messenger RNA subcellular localization mechanisms play a crucial role in post-transcriptional gene regulation. This trafficking is mediated by trans-acting RNA-binding proteins interacting with cis-regulatory elements called zipcodes. While new sequencing-based technologies allow the high-throughput identification of RNAs localized to specific subcellular compartments, the precise mechanisms at play, and their dependency on specific sequence elements, remain poorly understood.
Results: We introduce RNATracker, a novel deep neural network built to predict, from their sequence alone, the distributions of mRNA transcripts over a predefined set of subcellular compartments. RNATracker integrates several state-of-the-art deep learning techniques (e.g. CNN, LSTM and attention layers) and can make use of both sequence and secondary structure information. We report on a variety of evaluations showing RNATracker's strong predictive power, which is significantly superior to a variety of baseline predictors. Despite its complexity, several aspects of the model can be isolated to yield valuable, testable mechanistic hypotheses, and to locate candidate zipcode sequences within transcripts.
Availability: Code and data can be accessed at https://www.github.com/HarveyYan/RNATracker

12:20 PM-12:30 PM
Novel bioinformatics tools to assess the functional impact of alternative isoform usage
  • Francisco Pardo Palacios, Centro de Investigación Príncipe Felipe (CIPF), Spain
  • Lorena de La Fuente Lorente, Centro de Investigación Príncipe Felipe (CIPF), Spain
  • Pedro Salguero, Centro de Investigación Príncipe Felipe (CIPF), Spain
  • Cristina Marti, CIPF, Spain
  • Manuel Tardaguila, Sanger Institute, United Kingdom
  • Hector del Risco, University of Florida, United States
  • Ana Conesa, University of Florida, United States

Presentation Overview: Show

Post-transcriptional mechanisms such as Alternative Splicing (AS) and Alternative Polyadenylation (APA) regulate the maturation of pre-mRNA molecules and may result in different transcripts arising from the same gene. AS and APA increase the diversity and regulation capacity of transcriptomes and proteomes. AS and APA have been extensively characterised at the mechanistic level but to a lesser extent in terms of functional impact. User-friendly tools for functional profiling at isoform resolution are missing, limiting our capacity for investigating the functional consequences of posttranscriptional RNA maturation. We have developed a novel analysis framework for functional iso-transcriptomics consisting of a pipeline for isoform-resolved functional annotation (IsoAnnot) and user-friendly software to analyse the potential impact of AS and APA (tappAS). IsoAnnot incorporates more than 15 functional databases while tappAS implements novel algorithms to interrogate the interplay between AS and function. We applied these methods to characterise the iso-transcriptome in neural differentiation and in plant tissues. Our analysis framework reveals the functional motifs differentially included in isoforms to regulate the function of specific biological processes and offers new avenues to investigate the functional consequences of post-transcriptional regulation.

12:30 PM-12:40 PM
Adjusting for known and unknown confounding factors in RNASeq based splicing analysis
  • Barry Slaff, University of Pennsylvania, United States
  • Yoseph Barash, University of Pennsylvania, United States

Presentation Overview: Show

Correcting for confounding factors when studying gene expression has received much attention. Remarkably, there are no equivalent methods for RNA splicing analysis. There is therefore a strong need for a method which integrates with existing splicing tools, scales to datasets with thousands of samples and millions of junctions, and produces adjusted counts enabling greater insights from exploratory analyses and differential splicing tests.

We developed MOCCASIN, a splicing confounding factor adjustment method. The method can correct for known and unknown confounding factors, scales to very large datasets, and produces adjusted counts which can be integrated with splicing quantification and visualization tools such as MAJIQ. Using RNASeq from hundreds of cancer samples (AML, ALL), we demonstrate the effectiveness of MOCCASIN and show it compares favorably with a procedure recently used by ENCODE on splicing data. We show that corrected splicing quantifications improve signal for both known (treatment) and unknown (gender) signals.

MOCCASIN is the first confounding factor correction method integrated with splicing quantification and analysis software (MAJIQ). The method successfully removes variation from known and unknown confounders, reduces false positives in differential splicing analysis, and scales to consortium-level RNA-seq datasets such as TCGA and GTEx.

2:00 PM-2:40 PM
Regulation of translation in relation to cell fate
  • Mihaela Zavolan

Presentation Overview: Show

Cell type specification has been primarily attributed to epigenetic and transcriptional regulation. However, translation regulation is prominent in early development and several recent studies have linked alterations in protein synthesis to changes in cell functionality. Furthermore, evidence is accumulating that ribosome variants contribute to cell type-specific programs of gene expression. In this talk I would like to present our efforts in characterizing the regulation of mRNA translation in relation to cellular states, from yeast to man. I will describe our work on inferring determinants of protein synthesis rates in yeast, as well as the relationship between protein synthesis capacity and lifespan. I will also discuss aspects of translation regulation in mammalian system, specifically relative to cell type and the proliferation rate of cells.

2:40 PM-3:00 PM
A landscape of circadian and ultradian alternative splicing in mammalian tissues
  • Rukeia El-Athman, Institute for Theoretical Biology, Humboldt University Berlin and Charité Medical University Berlin, Germany

Presentation Overview: Show

Mounting evidence points to a role of the circadian clock in the temporal regulation of pre-mRNA splicing. To investigate whether the same gene can give rise to transcripts with divergent oscillatory patterns, we analyzed circadian and ultradian transcriptional rhythms of individual isoforms and compared them to those observed at the gene level in 12 mouse and 64 olive baboon tissues. We found various splicing-related genes with consistently 24-h rhythmic transcriptional activity across tissues and species that displayed a bimodal phase distribution. We further identified 24-h and 12-h rhythmic putative alternative splicing events in murine tissues and pairs of differentially 24-h and 12-h rhythmic splice isoforms of the same gene in baboon tissues whose expression peaked at opposing times of the day. Several of the candidate genes were associated with mRNA splicing processes, hinting at a reciprocal interplay between the observed circadian rhythmicity of splicing-related genes and time-of-day-dependent isoform production. We extended our findings by analyzing a novel dataset of two colorectal cancer cell lines in different progression stages from the same patient. Both displayed 24-h and 12-h rhythmic phase-shifted isoforms that differed between the primary tumor and the metastatic cell line, pointing to a role of rhythmic alternative splicing in tumor progression.

3:00 PM-3:20 PM
Isoforms across single cells and brain cell types.
  • Hagen Tilgner, Cornell University, United States

Presentation Overview: Show

We recently published distinct long-read isoform methods, including a) single-cell isoform RNA sequencing (ScISOr-Seq) [1] and b) synthetic-long-read RNA sequencing (SLR-RNA-Seq) [2]. ScISOr-Seq operates on single-cell suspensions from bulk tissue, employs 3’ end sequencing to determine the cell type of each single cell and then isoform sequencing (PacBio or Oxford Nanopore) to determine the complete isoforms of single cells and cell populations. SLR-RNA-seq determines full-length sequences of millions of single molecules using deep short-read sequencing of very few molecules (which statistically almost certainly originate from different genes) at a time. Importantly, SLR-RNA-seq can work from less than 1 nanogram of material, thus requiring much less PCR. Here, I will describe so far unpublished applications of these technologies in the mammalian brain. I will report on the coordination of RNA processing events that do not involve RNA splice sites, in single cells, cell lines and central nervous system cell types. Furthermore, our newer datasets reveal an order of magnitude more cell-type specific isoform expression patterns than our previous datasets. Thus cell-type specific isoforms will be a more easily addressed topic in the near future.

1. Gupta*, Collier* et al., Nature Biotechnol, 2018
2. Tilgner*, Jahanbani* et al., Nature Biotechnol, 2015

3:20 PM-3:40 PM
A Comprehensive Map of Intron Branchpoints and Lariat RNAs in Plants
  • Xiaotuo Zhang, Fudan University, China
  • Binglian Zheng, Fudan University, China
  • Junqiang Guo, Kunming University of Science and Technology, China
  • Chenyu Lu, Kunming University of Science and Technology, China
  • Li Liu, Kunming University of Science and Technology, China
  • Kun Chen, Kunming University of Science and Technology, China
  • Qi Tang, Fudan University, China
  • Haoran Ge, Fudan University, China
  • Jinping Cheng, Fudan University, China
  • Ziwei Li, Fudan University, China
  • Taiyun Wang, Fudan University, China
  • Yong Zhang, Fudan University, China
  • Yun Zheng, Kunming University of Science and Technology, China

Presentation Overview: Show

Lariats are formed by excised introns, when the 5’ splice site joins with the branchpoint (BP) during splicing. Although lariat RNAs are usually degraded by RNA debranching enzyme 1, recent findings in animals detected many lariat RNAs under physiological conditions. By contrast, the features of BPs and to what extent lariat RNAs accumulate naturally are largely unexplored in plants. Here, we analyzed 948 RNA sequencing datasets to document plant BPs and lariat RNAs on a genome-wide scale. In total, we identified 13872, 5199, 29582, and 13478 BPs in Arabidopsis thaliana, tomato (Solanum lycopersicum), rice (Oryza sativa), and maize (Zea mays), respectively. Features of plant BPs are highly similar to those in yeast and human, in that BPs are adenine-preferred and flanked by uracil-enriched sequences. Intriguingly, ~20% of introns harbor multiple BPs, and BP usage is tissue-specific. Furthermore, 10,580 lariat RNAs accumulate in wild-type Arabidopsis plants, and most of these lariat RNAs originate from longer or retroelement-depleted introns. Moreover, the expression of these lariat RNAs is accompanied by the incidence of back-splicing of parent exons. Collectively, our results provide a comprehensive map of intron BPs and lariat RNAs in four plant species, and uncover a link between lariat turnover and splicing.

3:40 PM-4:00 PM
Measuring isoform co-expression in single-cell RNAseq successfully decodes splicing coordination as a key determinant of neural cell-type identity
  • Ángeles Arzalluz-Luque, Universitat Politècnica de València, Spain
  • Sonia Tarazona, Universitat Politècnica de València, Spain
  • Ana Conesa, University of Florida, United States

Presentation Overview: Show

Single-cell RNAseq studies have mainly focused on the discovery and characterization of new cell types, leaving areas such as isoform expression dynamics unexplored. In this study, we develop the first pipeline to infer co-expression relationships between modules of full-length isoforms using single-cell data, and explore the biological implications of alternative splicing as a coordinated mechanism that changes transcript and protein properties simultaneously depending on the cell type. Furthermore, our pipeline overcomes three main limitations of previous analyses: (1) assessment of differential expression and differential splicing across multiple groups, avoiding pairwise comparisons, (2) a new correlation measure that reduces the interference of noise and dropouts seen with traditional metrics, successfully capturing co-expression relationships, and (3) multi-group analysis of genes with co-isoform usage, via the clustering of transcripts with similar expression profiles across all groups. By looking at gene sets whose isoforms are assigned to different clusters, we demonstrate that splicing acts coordinately to increase 3’UTR length and the expression of associated miRNA and RBP binding motifs in neural vs glial cells, as well as to promote the inclusion of exons coding for specific protein domains in neurons, and propose a role for the differential splicing of these genes in cytoskeleton remodelling and vesicle transport.

4:00 PM-4:20 PM
Genome wide quantification of ADAR A-to-I RNA editing activity
  • Shalom Hillel Roth, The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Israel
  • Erez Y. Levanon, The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Israel
  • Eli Eisenberg, Raymond and Beverly Sackler School of Physics and Astronomy and Sagol School of Neuroscience, Tel Aviv University, Israel

Presentation Overview: Show

Adenosine (A)-to-inosine (I) RNA editing (inosines are interpreted as guanosines (G)) is a ubiquitous and critical RNA modification, catalyzed by the ADAR protein family in dsRNA. Editing affects coding sequences, mRNA processing and regulation, and inflammation of the tissue. Aberrant RNA editing has recently been associated with cancer, autoimmune disorders such as psoriasis and SLE, autism, and the efficacy of PDL-1 inhibitors. Thus, a global editing quantification is an important tool for the study of these conditions. As a typical example, over 90% of all human editing activity occurs within Alu regions, encompassing millions of sites, but with a very low mean editing level (< 1%). Therefore, quantification of editing rates at these sites requires an almost unachievable ultra-high coverage, such that de novo detection schemes are biased, picking up only a random fraction of the editing signal. We created a publicly available software package enabling a straightforward calculation of global editing levels from raw RNA alignment files. We applied it to the GTEx project, analyzing global editing patterns for 8848 different tissues. We found it robust and consistent with respect to varying read length, coverage and alignment scheme. To demonstrate the versatility of this method, we adapted and used it with murine data as well.
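
One way to picture a global editing level is as a single ratio pooled over all reference adenosines in a region set, rather than a per-site estimate. The sketch below assumes that definition (G reads divided by A plus G reads at reference A positions, as a percentage) and a simplified, hypothetical pileup input; it does not parse alignment files, ignores strand, and is not the authors' software.

```python
def editing_index(pileup):
    """Percentage of G reads among A+G reads, pooled over reference-A positions.

    `pileup` is a hypothetical iterable of (ref_base, counts) pairs, where
    `counts` maps each observed read base to its count at that position.
    """
    a_reads = 0
    g_reads = 0
    for ref_base, counts in pileup:
        if ref_base != "A":
            continue  # only reference adenosines can show A-to-G mismatches
        a_reads += counts.get("A", 0)
        g_reads += counts.get("G", 0)
    total = a_reads + g_reads
    return 100.0 * g_reads / total if total else 0.0


# Made-up positions: two reference adenosines and one cytosine (ignored).
toy_pileup = [
    ("A", {"A": 98, "G": 2}),
    ("A", {"A": 100}),
    ("C", {"C": 50, "T": 1}),
]
print(editing_index(toy_pileup))
```

Pooling counts over many low-coverage sites is what makes a sub-1% average editing level measurable without ultra-high coverage at any individual site.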

4:20 PM-4:35 PM
PolyA-miner: Accurate Estimation of Alternative Poly-Adenylation from 3’Seq data using Non-negative matrix factorization and Vector algebra
  • Hari Krishna Yalamanchili, Baylor College of Medicine, United States
  • Zhandong Liu, Baylor College of Medicine, United States

Presentation Overview: Show

More than half of human genes exercise alternative polyadenylation (APA) to generate different mRNA transcripts. The increasing significance of APA in disease contexts has propelled the development of several 3’ sequencing techniques. In spite of this, there are no computational tools that are designed precisely for 3’seq data. Here we present PolyA-miner, a novel alternative polyadenylation quantification algorithm based on Non-negative matrix factorization (NMF). A gene is abstracted as a matrix of polyadenylation sites and an iterative Consensus NMF is executed to extract a robust dichotomization of samples. Statistical significance is evaluated as the goodness-of-fit of the dichotomization over a null model. We evaluated PolyA-miner on Glioblastoma cell line PAC-seq data. Strikingly, 1418 genes with APA changes are identified in contrast to 695 genes reported in the original study. In addition, 157 genes with novel polyadenylation sites are identified. PolyA-miner is the first computational tool specifically designed for 3’Seq data. Iterative Consensus NMF makes it less susceptible to sample variation. It can effectively identify novel APA sites and account for all APA changes including non-proximal to non-distal changes. With the emerging importance of APA in human diseases, PolyA-miner can significantly accelerate analysis and help decode the missing pieces of underlying APA dynamics.
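
To make the matrix abstraction in this abstract concrete, the sketch below factorizes a small poly(A)-site-by-sample count matrix with a plain rank-2 NMF using the standard multiplicative updates. It is a single NMF run, not the iterative consensus procedure or the significance test the abstract describes; the function name, the toy counts, the rank, and the argmax-based sample grouping are assumptions for illustration only.

```python
import numpy as np


def nmf(V, rank=2, n_iter=200, seed=0, eps=1e-9):
    """Factorize non-negative V (sites x samples) as W @ H by multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_sites, n_samples = V.shape
    W = rng.random((n_sites, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each column of W sums to 1; the product W @ H is unchanged.
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]


# Toy gene: rows are two poly(A) sites, columns are four samples.
V = np.array([[90.0, 85.0, 20.0, 15.0],
              [10.0, 15.0, 80.0, 85.0]])
W, H = nmf(V)

# Crude grouping: assign each sample to its dominant component. A single run
# like this is not guaranteed to be stable, which is one motivation for
# consensus approaches.
labels = H.argmax(axis=0)
print(np.round(W, 2))
print(labels)
```

In a consensus setting, the factorization would be repeated from many random starts and the sample groupings aggregated before any statistical test is applied.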

4:35 PM-4:40 PM
Wrap-up and Poster Prizes
  • Yoseph Barash, Klemens Hertel, Michelle Scott