Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


iRNA COSI

Presentations

Schedule subject to change
Monday, July 13th
10:40 AM-11:20 AM
iRNA Keynote: The Functional Iso-Transcriptomics toolset to leverage long-read sequencing for unraveling isoform transcriptional networks from single cells.
Format: Live-stream

  • Ana Conesa, University of Florida, United States

Presentation Overview:

Long-read sequencing is revolutionizing our ability to study transcriptome diversity and dynamics by enabling the detection of full-length transcripts. However, long reads pose novel bioinformatics challenges, both for removing technology biases and for leveraging the discovery potential of the new data. In this talk I will present the Functional Iso-Transcriptomics analysis toolset, composed of SQANTI, IsoAnnot and tappAS, which covers quality control, annotation and statistical analysis of long-read transcriptome data. I will also illustrate how to use these tools to unravel isoform transcriptional networks by combining long-read and short-read single-cell RNA-seq data.

11:20 AM-11:30 AM
Swan: a Python library for the analysis and visualization of long-read transcriptomes
Format: Pre-recorded talk with panel discussion

  • Ali Mortazavi, University of California, Irvine, United States
  • Fairlie Reese, University of California, Irvine, United States

Presentation Overview:

Long-read RNA-sequencing platforms such as PacBio and Oxford Nanopore have led to an explosion in the discovery of transcript isoforms that were impossible to assemble with short reads. Current transcript model visualization tools are difficult to interpret on a genomic scale and make it hard to distinguish similar isoforms.
We introduce the Swan Python library, which is designed for the analysis and visualization of transcript models. Swan offers a robust visualization suite for easily differentiating splicing events. Using a graphical model approach, Swan provides a platform to visually discriminate between transcript models and to identify novel exon skipping and intron retention events that are commonly missed in short-read transcriptomics. Furthermore, Swan is integrated with flexible statistical tools for differential gene and transcript expression that enable the analysis of full-length transcript models in different biological settings. We demonstrate the utility of this software by applying Swan to the HepG2 and HFFc6 human cell lines, which have full-length PacBio transcriptome data available on the ENCODE portal. Swan found 4,503 differentially expressed transcripts, including 280 transcripts that are differentially expressed even though their parent gene is not. Swan provides a comprehensive environment to analyze long-read transcriptomes and produce high-quality, publication-ready figures.
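To make the graph-based idea concrete, here is a minimal sketch in Python. It is not Swan's API: the exon-coordinate representation and the helper names (`build_splice_graph`, `novel_exon_skipping`, `novel_intron_retention`) are hypothetical. It assumes each transcript model is a list of (start, end) exons on one strand; intron edges join consecutive exons, an unannotated intron that jumps over a whole annotated exon is flagged as exon skipping, and a query exon that spans an annotated intron is flagged as intron retention.

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]    # (start, end) genomic coordinates, one strand
Intron = Tuple[int, int]  # (end of upstream exon, start of downstream exon)


def introns(exons: List[Exon]) -> List[Intron]:
    """Intron edges joining consecutive exons of one transcript model."""
    ordered = sorted(exons)
    return [(up[1], down[0]) for up, down in zip(ordered, ordered[1:])]


def build_splice_graph(transcripts: Dict[str, List[Exon]]) -> Tuple[Set[Exon], Set[Intron]]:
    """Collect the exon and intron edges used by a set of reference models."""
    exon_edges: Set[Exon] = set()
    intron_edges: Set[Intron] = set()
    for model in transcripts.values():
        exon_edges.update(model)
        intron_edges.update(introns(model))
    return exon_edges, intron_edges


def novel_exon_skipping(query: List[Exon], ref_exons: Set[Exon], ref_introns: Set[Intron]):
    """Unannotated query introns that jump over at least one whole annotated exon."""
    hits = []
    for donor, acceptor in introns(query):
        if (donor, acceptor) in ref_introns:
            continue  # annotated junction, nothing novel
        skipped = sorted(e for e in ref_exons if donor < e[0] and e[1] < acceptor)
        if skipped:
            hits.append(((donor, acceptor), skipped))
    return hits


def novel_intron_retention(query: List[Exon], ref_introns: Set[Intron]):
    """Query exons that span an entire annotated intron."""
    return [(exon, intron) for exon in sorted(query) for intron in sorted(ref_introns)
            if exon[0] <= intron[0] and intron[1] <= exon[1]]


# Toy reference with three exons, and two query models.
exons, intron_set = build_splice_graph({"tx1": [(100, 200), (300, 400), (500, 600)]})
print(novel_exon_skipping([(100, 200), (500, 600)], exons, intron_set))
# [((200, 500), [(300, 400)])]
print(novel_intron_retention([(100, 400), (500, 600)], intron_set))
# [((100, 400), (200, 300))]
```

A real splice graph would also merge nodes across genes and strands and record which transcripts use each edge; the sketch keeps only what is needed to show how skipping and retention appear as distinctive edges.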

11:30 AM-11:40 AM
IsoQuant: isoform analysis and quantification with long error-prone transcriptomic reads
Format: Pre-recorded talk with panel discussion

  • Andrey D. Prjibelski, Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, Russia
  • Alla Mikheenko, Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, Russia
  • Anoushka Joglekar, Tri-Institutional Computational Biology & Medicine, Weill Cornell Medicine, United States
  • Dmitry Meleshko, Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, Russia
  • Alla L. Lapidus, Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, Russia
  • Hagen Tilgner, Brain and Mind Research Institute, Weill Cornell Medicine, United States

Presentation Overview:

Third-generation (PacBio/Oxford Nanopore) transcriptomics generates long reads, which, in contrast to short reads, make it possible to analyse and quantify complex alternative isoforms. Because long-read RNA sequencing is still new, few computational methods analyse such data; the SQANTI pipeline is an exception, but it is designed primarily for PacBio data.

Here, we present IsoQuant, a tool for reference-based analysis of long error-prone reads. IsoQuant assigns reads to annotated isoforms based on their intron and exon structure, and then performs gene and isoform quantification. For high-error-rate data, the algorithm uses inexact intron and exon matching, which accurately resolves various error-induced alignment artifacts, such as skipped short exons or shifted splice sites.

To estimate the accuracy of IsoQuant, we simulated several Nanopore and PacBio datasets based on mouse and human transcriptomes. For low-error reads (e.g. PacBio CCS), both IsoQuant and SQANTI2 show near-perfect accuracy, but for high-error data with complex artifacts (such as Oxford Nanopore data, for which SQANTI2 was not designed), IsoQuant’s inexact intron/exon matching yields a strong improvement.

IsoQuant is open-source software implemented in Python and is available at https://github.com/ablab/IsoQuant.
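The inexact intron matching described above can be illustrated with a simplified sketch; it is not IsoQuant's implementation. Here a read is reduced to its intron chain of (donor, acceptor) coordinates, and a hypothetical tolerance `delta` lets splice sites shifted by a few bases through alignment errors still match an annotated isoform. Skipped short exons, partial chains and ambiguous assignments, which IsoQuant also handles, are left out, and `count_unique` is only a naive stand-in for its quantification.

```python
from collections import Counter
from typing import Dict, List, Tuple

Intron = Tuple[int, int]  # (donor, acceptor) genomic coordinates


def introns_match(read: Intron, ref: Intron, delta: int) -> bool:
    """Both splice sites agree within +/- delta bp (tolerates shifted junctions)."""
    return abs(read[0] - ref[0]) <= delta and abs(read[1] - ref[1]) <= delta


def chain_matches(read_chain: List[Intron], ref_chain: List[Intron], delta: int) -> bool:
    """Same number of introns, each pair matching within the tolerance."""
    return len(read_chain) == len(ref_chain) and all(
        introns_match(r, t, delta) for r, t in zip(read_chain, ref_chain))


def assign_read(read_chain: List[Intron], isoforms: Dict[str, List[Intron]],
                delta: int = 0) -> List[str]:
    """Names of annotated isoforms whose full intron chain matches the read."""
    return sorted(name for name, chain in isoforms.items()
                  if chain_matches(read_chain, chain, delta))


def count_unique(reads: List[List[Intron]], isoforms: Dict[str, List[Intron]],
                 delta: int = 0) -> Counter:
    """Naive quantification: count only reads assigned to exactly one isoform."""
    counts: Counter = Counter()
    for chain in reads:
        hits = assign_read(chain, isoforms, delta)
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


isoforms = {"A": [(100, 200), (300, 400)], "B": [(100, 200), (350, 400)]}
noisy_read = [(102, 198), (301, 400)]  # junctions shifted by 1-2 bp
print(assign_read(noisy_read, isoforms, delta=0))  # []
print(assign_read(noisy_read, isoforms, delta=5))  # ['A']
```

With exact matching the noisy read is lost; a small tolerance recovers it while still separating it from isoform B, whose second junction lies 50 bp away.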

12:00 PM-12:10 PM
Full-length transcriptome reconstruction reveals a large diversity of RNA and protein isoforms in rat hippocampus
Format: Pre-recorded talk with panel discussion

  • Xi Wang, German Cancer Research Center, Germany
  • Xintian You, Max Delbrück Center for Molecular Medicine, Germany
  • Julian Langer, Max Planck Institute for Brain Research, Germany
  • Jingyi Hou, Max Delbrück Center for Molecular Medicine, Germany
  • Fiona Rupprecht, Max Planck Institute for Brain Research, Germany
  • Irena Vlatkovic, Max Planck Institute for Brain Research, Germany
  • Claudia Quedenau, Max Delbrück Center for Molecular Medicine, Germany
  • Erin Schuman, Max Planck Institute for Brain Research, Germany
  • Wei Chen, Southern University of Science and Technology, China

Presentation Overview:

Gene annotation is a critical resource in genomics research. Many computational approaches have been developed to assemble transcriptomes from high-throughput short-read sequencing data, but only with limited accuracy. Here, we combine next-generation and third-generation sequencing to reconstruct a full-length transcriptome of the rat hippocampus, which we further validate using independent 5′-end and 3′-end profiling approaches. In total, we detect 28,268 full-length transcripts (FLTs), covering 6,380 RefSeq genes and 849 unannotated loci. Based on these FLTs, we discover co-occurring alternative RNA processing events. By integrating polysome profiling and ribosome footprinting data, we predict isoform-specific translational status and reconstruct an open reading frame (ORF)-eome. Notably, a high proportion of the predicted ORFs are validated by mass spectrometry-based proteomics. Moreover, we identify isoforms with specific subcellular localization patterns in neurons. Collectively, our data advance our knowledge of RNA and protein isoform diversity in the rat brain and provide a rich resource for functional studies.

12:10 PM-12:20 PM
Freddie: Annotation-free Isoform Discovery Using Long-Read Sequencing
Format: Pre-recorded talk with panel discussion

  • Baraa Orabi, The University of British Columbia, Canada
  • Brian McConeghy, Vancouver Prostate Centre, Canada
  • Cedric Chauve, Simon Fraser University, Canada
  • Faraz Hach, The University of British Columbia, Canada

Presentation Overview:

Background
Alternative splicing (AS) events are essential to understanding the development of cancer and may serve as targets for personalized cancer therapeutics. However, detecting novel AS events remains challenging: existing reference transcriptome annotation databases are far from comprehensive, and traditional sequencing technologies are limited by their short read lengths. Given these challenges, transcriptomic long-read sequencing (LRS) holds great promise for novel AS discovery.

Methods
We present Freddie, a computational, annotation-free isoform discovery and detection tool. Freddie takes genome alignments of transcriptomic LRS reads as input and generates isoform clusters of these reads for a given gene of interest. Freddie then uses the alignments to segment the gene interval into canonical exon segments. Finally, Freddie clusters the reads into isoform clusters that satisfy a set of expected transcriptomic LRS constraints. We formulate this clustering as an optimization problem, which we name the Minimum Error Clustering into Isoforms (MErCi) problem, and solve it as an Integer Linear Program (ILP).

Results
We compare the performance of Freddie on both simulated and real datasets with state-of-the-art isoform detection tools with varying dependence on reference annotations. We show that both the segmentation and clustering steps of Freddie are highly accurate and computationally efficient.
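The segmentation and read-encoding steps can be sketched as follows, under simplifying assumptions: every alignment block boundary becomes a segment breakpoint, and a read "covers" a segment if its aligned blocks span at least a hypothetical fraction `min_frac` of it. Freddie's actual segmentation is designed to be robust to noisy boundaries, and its clustering solves the MErCi Integer Linear Program; the `naive_clusters` function below only groups reads with identical segment vectors as a placeholder for that step.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open aligned block [start, end)


def segment(reads: Dict[str, List[Interval]]) -> List[Interval]:
    """Cut the gene interval at every alignment block boundary seen in any read."""
    points = sorted({p for blocks in reads.values() for block in blocks for p in block})
    return list(zip(points, points[1:]))


def coverage_vector(blocks: List[Interval], segments: List[Interval],
                    min_frac: float = 0.5) -> Tuple[int, ...]:
    """1 where the read's aligned blocks cover >= min_frac of a segment, else 0."""
    vector = []
    for seg_start, seg_end in segments:
        covered = sum(max(0, min(seg_end, end) - max(seg_start, start)) for start, end in blocks)
        vector.append(1 if covered >= min_frac * (seg_end - seg_start) else 0)
    return tuple(vector)


def naive_clusters(reads: Dict[str, List[Interval]], min_frac: float = 0.5):
    """Group reads with identical segment vectors (placeholder for the MErCi ILP)."""
    segments = segment(reads)
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for name, blocks in reads.items():
        groups[coverage_vector(blocks, segments, min_frac)].append(name)
    return segments, dict(groups)


reads = {
    "r1": [(0, 100), (200, 300)],  # spliced: skips 100-200
    "r2": [(0, 100), (200, 300)],
    "r3": [(0, 300)],              # retains the 100-200 region
}
segments, groups = naive_clusters(reads)
print(segments)  # [(0, 100), (100, 200), (200, 300)]
print(groups)    # {(1, 0, 1): ['r1', 'r2'], (1, 1, 1): ['r3']}
```

Encoding reads as binary vectors over shared segments is what makes an optimization formulation possible: with real, error-prone reads the vectors rarely agree exactly, which is why an error-tolerant clustering objective is needed instead of exact grouping.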

12:20 PM-12:30 PM
Prioritizing genes likely to have functionally distinct splice isoforms using long read RNA-seq data
Format: Pre-recorded talk with panel discussion

  • Shamsuddin Bhuiyan, The University of British Columbia, Canada
  • John Tyson, The University of British Columbia, Canada
  • Manuel Belmadani, The University of British Columbia, Canada
  • Jordan Sicherman, The University of British Columbia, Canada
  • Terrance Snutch, The University of British Columbia, Canada
  • Paul Pavlidis, The University of British Columbia, Canada

Presentation Overview:

RNA sequencing continues to detect an increasing number of splice variants, yet few genes have evidence of functionally distinct splice isoforms (FDSIs). One challenge is that splice variants are often reconstructed computationally from short sequence fragments, an error-prone process. Here we identify likely candidate genes with FDSIs using long-read RNA-sequencing data, which removes much of the current ambiguity around transcript structures.
We developed a computational pipeline for prioritizing splice isoforms in MinION long-read RNA-seq data. The prioritization approach uses splice variant-specific conservation (PhastCons, PhyloP and BLAST homology searches), expression, coding potential, and protein domain annotations. Based on these annotations, the pipeline outputs a prioritized list of genes likely to have FDSIs. We then applied this pipeline to publicly available and novel mouse brain and liver transcriptomes. Of 6,799 genes with multiple splice variants, our approach prioritized a set of 44 putative genes with FDSIs. This candidate set includes genes with published evidence of FDSIs (Cdc42) and genes with promising literature evidence of FDSIs (Tpd52l1, Gstz1). The limited amount of long-read data and the low sequencing depth hinder our prioritization. Nevertheless, our work helps establish guidelines for high-throughput prioritization of genes with FDSIs.
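A toy version of such a filter-then-rank prioritization is sketched below. Every threshold, feature name and the mean-rank aggregation are assumptions made for illustration; the abstract does not say how the authors combine conservation, expression, coding potential and domain annotations, and the gene names and values are invented toy data.

```python
from statistics import mean
from typing import Dict, List


def rank(values: Dict[str, float]) -> Dict[str, int]:
    """Rank keys by value, 1 = highest (ties keep insertion order)."""
    ordered = sorted(values, key=lambda k: -values[k])
    return {k: i + 1 for i, k in enumerate(ordered)}


def prioritize(genes: Dict[str, List[dict]], min_expr: float = 1.0,
               min_coding: float = 0.5) -> List[str]:
    """Genes with >= 2 credible coding isoforms, best mean feature rank first."""
    features: Dict[str, Dict[str, float]] = {}
    for gene, isoforms in genes.items():
        good = [i for i in isoforms
                if i["expression"] >= min_expr and i["coding_potential"] >= min_coding]
        if len(good) < 2:
            continue  # need at least two expressed, likely-coding isoforms
        features[gene] = {
            "conservation": min(i["conservation"] for i in good),  # weakest isoform
            "expression": min(i["expression"] for i in good),
            "domain_change": float(len({i["domains"] for i in good}) > 1),
        }
    if not features:
        return []
    ranks = [rank({g: f[name] for g, f in features.items()})
             for name in ("conservation", "expression", "domain_change")]
    return sorted(features, key=lambda g: (mean(r[g] for r in ranks), g))


genes = {  # toy values, not data from the study
    "GeneA": [
        {"expression": 5.0, "coding_potential": 0.9, "conservation": 0.8, "domains": ("PH",)},
        {"expression": 3.0, "coding_potential": 0.8, "conservation": 0.7, "domains": ()},
    ],
    "GeneB": [
        {"expression": 2.0, "coding_potential": 0.9, "conservation": 0.3, "domains": ("SH3",)},
        {"expression": 2.5, "coding_potential": 0.7, "conservation": 0.2, "domains": ("SH3",)},
    ],
    "GeneC": [  # filtered out: neither isoform looks coding
        {"expression": 10.0, "coding_potential": 0.1, "conservation": 0.9, "domains": ()},
        {"expression": 9.0, "coding_potential": 0.2, "conservation": 0.9, "domains": ()},
    ],
}
print(prioritize(genes))  # ['GeneA', 'GeneB']
```

Scoring each gene by its weakest qualifying isoform is one conservative design choice: a gene only ranks high if more than one of its isoforms looks conserved and expressed.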

12:30 PM-12:40 PM
iRNA Panel Discussion: Long-read RNA-seq
Format: Live-stream

  • Yoseph Barash, University of Pennsylvania, United States

Presentation Overview:

Long-read discussion going into the lunch break

2:00 PM-2:20 PM
Inferring snoRNA characteristics from their abundance profile in healthy human tissues
Format: Pre-recorded with live Q&A

  • Étienne Fafard-Couture, Université de Sherbrooke, Canada
  • Sonia Couture, Université de Sherbrooke, Canada
  • Michelle S Scott, Université de Sherbrooke, Canada

Presentation Overview:

Small nucleolar RNAs (snoRNAs) are noncoding RNAs known to regulate ribosome biogenesis and splicing. SnoRNAs have a highly stable structure, which impairs their quantification by conventional high-throughput RNA sequencing (RNA-seq). RNA-seq using a thermostable group II intron reverse transcriptase (TGIRT-Seq) was recently shown to quantify their abundance accurately. To faithfully characterize the determinants of snoRNA abundance, we performed TGIRT-Seq on seven healthy human tissues and conducted subsequent bioinformatic analyses. We found that snoRNAs fall into two abundance profiles with distinct characteristics: uniformly expressed (UE) and tissue-specific (TS) snoRNAs. UE snoRNAs are encoded in protein-coding host genes (HGs), are highly conserved and target rRNA, whereas TS snoRNAs are encoded in noncoding HGs, are poorly conserved and are orphan snoRNAs. We also found that the abundance of UE snoRNAs can be either anticorrelated with that of their HG (in which case the HG is mostly involved in ribosome biogenesis and translation) or correlated with it (in which case the HG mostly codes for a ribosomal protein). Conversely, the abundance of TS snoRNAs is well correlated with that of their HG. These results suggest a model in which a snoRNA’s target, conservation and functional relationship with its HG can be clearly inferred from its abundance profile across tissues.
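One way to operationalize the two abundance profiles is sketched below, assuming abundance values for a snoRNA and its host gene across the seven tissues: a low coefficient of variation marks a snoRNA as uniformly expressed, and Pearson correlation with the host gene separates the correlated and anticorrelated cases. The CV cutoff of 0.5 and the choice of Pearson correlation are hypothetical, not necessarily the criteria used in the study, and the abundances are toy values.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import List, Tuple


def coefficient_of_variation(values: List[float]) -> float:
    """Population SD divided by the mean; inf for an all-zero profile."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else float("inf")


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either profile is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def abundance_profile(sno: List[float], host: List[float],
                      cv_cutoff: float = 0.5) -> Tuple[str, float]:
    """('UE' or 'TS', correlation with the host gene) across tissues."""
    label = "UE" if coefficient_of_variation(sno) <= cv_cutoff else "TS"
    return label, pearson(sno, host)


# Toy abundances across seven tissues.
print(abundance_profile([10, 11, 9, 10, 10, 11, 9], [5, 3, 7, 5, 5, 3, 7]))  # UE, anticorrelated
print(abundance_profile([0, 0, 50, 0, 0, 0, 1], [1, 1, 40, 1, 1, 1, 2]))     # TS, correlated
```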

2:20 PM-2:40 PM
Dissecting the role of SINE non-coding RNAs in amyloid pathology: An integrative RNA genomics approach.
Format: Pre-recorded with live Q&A

  • Yubo Cheng, University of Lethbridge, Canada
  • Luke Saville, University of Lethbridge, Canada
  • Babita Gollen, University of Lethbridge, Canada
  • Chris Isaac, University of Lethbridge, Canada
  • Liam R Mitchell, University of Lethbridge, Canada
  • Jogender Mehla, University of Lethbridge, Canada
  • Majid Mohajerani, University of Lethbridge, Canada
  • Athanasios Zovoilis, University of Lethbridge, Canada

Presentation Overview:

As the human life span increases, the number of people in Canada suffering from aging-associated cognitive impairment and Alzheimer’s disease (AD) is expected to rise dramatically. In previous work, we showed that learning impairment is connected with epigenetic changes in stress response genes (SRGs) (Peleg*, Sananbenesi*, Zovoilis* et al., Science 2010). However, the molecular mechanisms associated with this epigenetic deregulation remain unknown. Among the mechanisms that have recently attracted attention are those involving non-protein-coding RNAs (non-coding RNAs), including RNAs derived from repetitive DNA (Zovoilis et al., Cell 2016). Repetitive DNA accounts for ~50% of noncoding sequences, with Short Interspersed Nuclear Elements (SINEs) being among the most frequent repeats. Here, we applied an integrative RNA genomics and bioinformatics approach to dissect the connection between SINE non-coding RNAs and amyloid pathology. Using short and long RNA sequencing, we demonstrate that SINE RNAs are associated with amyloid pathology in the brain, revealing a potential biomarker and a novel molecular mechanism associated with this condition.

2:40 PM-3:00 PM
Inferring competing endogenous RNA (ceRNA) interactions in cancer
Format: Pre-recorded with live Q&A

  • Serdar Bozdag, Marquette University, United States
  • Ziynet Nesibe Kesimoglu, Marquette University, United States

Presentation Overview:

To understand the biological factors driving cancer, the regulatory circuitry of genes needs to be uncovered. Recently, a new gene regulation mechanism called competing endogenous RNA (ceRNA) interactions has been discovered. RNAs targeted by common microRNAs (miRNAs) “compete” for these miRNAs and thereby regulate each other, since sequestering a shared miRNA frees the other RNAs from its regulation. Several computational tools have been published to infer ceRNA networks. In most existing tools, however, the expression abundance and group-wise effect of ceRNAs are not considered. In this study, we developed a computational pipeline named Crinet that infers cancer-associated ceRNA networks while addressing these critical drawbacks. Crinet considers lncRNAs, pseudogenes and mRNAs as potential ceRNAs and incorporates a network deconvolution method to exclude the amplifying effect of ceRNA pairs. We tested Crinet on breast cancer data from TCGA. Crinet inferred reproducible ceRNA interactions and groups, which were significantly enriched in cancer-related genes and biological processes. We validated our ceRNA interactions using protein expression data. Crinet outperformed existing tools in predicting gene expression changes in knockdown assays. The top high-degree genes in the inferred network included lncRNAs known to act as tumor suppressors or oncogenes in breast cancer, showing the importance of including noncoding RNAs in ceRNA inference.
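Network deconvolution can be summarized by its standard closed form; this assumes Crinet applies it in that form, which the abstract does not spell out. If the observed association matrix G_obs accumulates the direct effects G_dir over indirect paths of every length, the direct part can be recovered as follows, provided the spectral radius of G_dir is below 1 so that the series converges:

```latex
G_{\mathrm{obs}} = G_{\mathrm{dir}} + G_{\mathrm{dir}}^{2} + G_{\mathrm{dir}}^{3} + \cdots
               = G_{\mathrm{dir}}\,(I - G_{\mathrm{dir}})^{-1}
\quad\Longrightarrow\quad
G_{\mathrm{dir}} = G_{\mathrm{obs}}\,(I + G_{\mathrm{obs}})^{-1}
```

In eigenvalue terms, each observed eigenvalue λ_obs maps to λ_dir = λ_obs / (1 + λ_obs), which shrinks the inflated contributions of ceRNA pairs that are only linked indirectly.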

3:20 PM-3:40 PM
Single Cell Chromatin Accessibility Delineates Cellular Identities of the Neonatal Organ of Corti
Format: Pre-recorded with live Q&A

  • Shuze Wang, University of Michigan, United States
  • Joerg Waldhaus, University of Michigan, United States
  • Jie Liu, University of Michigan, United States
  • Mary Lee, University of Michigan, United States
  • Yujuan Fu, University of Michigan, United States
  • Scott Jones, University of Michigan, United States
  • Jenna Diegel, University of Michigan, United States

Presentation Overview:

The organ of Corti, the receptor organ for hearing, is formed by a variety of sensory hair cells (HCs) and supporting cells (SCs) within the cochlea. However, the gene regulation mechanisms of cochlea development are not fully understood.

The aim of this study is to identify regulatory elements controlling the differentiation and maturation of the organ of Corti. To achieve this goal, we generated scATAC-seq and scRNA-seq libraries from postnatal day 2 organ of Corti preparations divided into apical and basal compartments. By integrating the scRNA-seq data, we identified the cell types in the scATAC-seq data by calculating a Jaccard similarity matrix, identified cell type-specific transcription factors (TFs), classified them as activators or repressors based on their function, and further validated them by footprinting.

Focusing on HCs, we reconstructed the organ’s one-dimensional architecture from both scRNA-seq and scATAC-seq data. We identified novel differentially expressed genes along the tonotopic axis and validated them by RNAscope. Additionally, we identified TFs that drive HC differentiation and maturation by reconstructing developmental trajectories.

The results of this study enable us to understand how the epigenomic landscape delineates cellular identities and functions within the organ of Corti. Further studies will investigate the regulatory elements driving SC maturation, which will contribute to regenerative strategies.
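The Jaccard-based cell type assignment mentioned above can be sketched as follows, assuming each scATAC-seq cluster is summarized by a set of genes with high gene-activity scores and each scRNA-seq cell type by a set of marker genes; both representations and the argmax labeling rule are assumptions. The marker genes in the example (e.g. Myo7a and Pou4f3 for hair cells, Sox2 and Prox1 for supporting cells) are illustrative only.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_matrix(atac: Dict[str, Set[str]],
                   rna: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Rows: scATAC-seq clusters; columns: scRNA-seq cell types."""
    return {cluster: {cell_type: jaccard(genes, markers) for cell_type, markers in rna.items()}
            for cluster, genes in atac.items()}


def label_clusters(atac: Dict[str, Set[str]], rna: Dict[str, Set[str]]) -> Dict[str, str]:
    """Give each scATAC-seq cluster the cell type with the highest Jaccard index."""
    return {cluster: max(row, key=row.get) for cluster, row in jaccard_matrix(atac, rna).items()}


atac_clusters = {"c0": {"Myo7a", "Pou4f3", "Gfi1"}, "c1": {"Sox2", "Prox1", "Fgfr3"}}
rna_types = {"HC": {"Myo7a", "Pou4f3", "Atoh1"}, "SC": {"Sox2", "Prox1", "Hes5"}}
print(label_clusters(atac_clusters, rna_types))  # {'c0': 'HC', 'c1': 'SC'}
```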

3:40 PM-4:00 PM
Proceedings Presentation: A Bayesian framework for inter-cellular information sharing improves dscRNA-seq quantification
Format: Pre-recorded with live Q&A

  • Robert Patro, University of Maryland, United States
  • Avi Srivastava, Stony Brook University, United States
  • Laraib Iqbal Malik, Stony Brook University, United States
  • Hirak Sarkar, University of Maryland, United States

Presentation Overview:

Motivation: Droplet-based single-cell RNA-seq (dscRNA-seq) data are being generated at an unprecedented pace, and accurate estimation of gene-level abundances for each cell is a crucial first step in most dscRNA-seq analyses. When preprocessing raw dscRNA-seq data to generate a count matrix, care must be taken to account for the potentially large number of multi-mapping locations per read. The sparsity of dscRNA-seq data and the strong 3’ sampling bias make it difficult to disambiguate cases where no read maps uniquely to any of the candidate target genes.
Results: We introduce a Bayesian framework for sharing information across cells within a sample, or across multiple data modalities from the same sample, to improve gene quantification estimates for dscRNA-seq data. We use an anchor-based approach to connect cells with similar gene expression patterns and learn informative, empirical priors, which we provide to alevin’s gene multi-mapping resolution algorithm. This improves the quantification estimates for genes with no uniquely mapping reads (i.e. when there is no unique intra-cellular information). We show that our new model improves the per-cell gene-level estimates and provides a principled framework for information sharing across multiple modalities. We test our method on a combination of simulated and real datasets under various setups.
Availability: The information sharing model is included in alevin and is implemented in C++14. It is available as open-source software, under GPL v3, at https://github.com/COMBINE-lab/salmon as of version 1.1.0.
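A simplified picture of how an empirical prior can resolve multi-mapping UMIs is an EM over gene equivalence classes in which the M-step adds pseudo-counts learned from similar cells. The sketch below is not alevin's algorithm or code: `eq_classes` maps a tuple of candidate genes to the number of UMIs compatible with exactly those genes, and `prior` holds hypothetical pseudo-counts (for instance, averaged from anchor-linked neighboring cells).

```python
from typing import Dict, Tuple


def em_with_prior(eq_classes: Dict[Tuple[str, ...], int],
                  prior: Dict[str, float], n_iter: int = 100) -> Dict[str, float]:
    """Split UMI counts among candidate genes; prior adds per-gene pseudo-counts."""
    genes = sorted({g for cls in eq_classes for g in cls})
    abundance = {g: 1.0 for g in genes}  # uniform start
    expected = {g: 0.0 for g in genes}
    for _ in range(n_iter):
        # E-step: share each class's UMIs in proportion to current abundances
        expected = {g: 0.0 for g in genes}
        for cls, n_umis in eq_classes.items():
            total = sum(abundance[g] for g in cls)
            if total == 0:
                continue  # no support for any gene in this class
            for g in cls:
                expected[g] += n_umis * abundance[g] / total
        # M-step (MAP-style): this cell's expected counts plus prior pseudo-counts
        abundance = {g: expected[g] + prior.get(g, 0.0) for g in genes}
    return expected  # allocation of this cell's own UMIs (sums to the UMI total)


shared = {("GeneA", "GeneB"): 10}  # 10 UMIs, no unique evidence for either gene
print(em_with_prior(shared, prior={}))  # {'GeneA': 5.0, 'GeneB': 5.0}
print(em_with_prior(shared, prior={"GeneA": 9.0, "GeneB": 1.0}))  # converges to ~9 / ~1
```

Without a prior, the within-cell data cannot break the symmetry between the two genes; pseudo-counts from similar cells tilt the allocation toward the gene those cells actually express, which is the intuition behind sharing information across cells.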

4:00 PM-4:40 PM
iRNA Keynote: The race to the 3’ end: tracking mRNA cleavage diversity in real time
Format: Live-stream

  • Athma A. Pai

Presentation Overview:

Alternative mRNA processing is increasingly appreciated as playing a large role in driving transcriptome variability, disease etiology, cellular identity, and the molecular response to diverse environmental stresses. Although we have extensive insights into the regulatory factors and sequence elements that influence alternative isoform usage, less is known about the temporal dynamics and co-regulation of RNA processing decisions. Intermediate processing events (splicing and 3’ end cleavage) often occur co-transcriptionally, and the interplay between transcriptional elongation rates and the rates of each processing event often affects the choices that lead to alternative isoform production. Consequently, measuring the kinetics of these processes may shed light on how early gene regulatory decisions are made. The recent development of high-throughput sequencing techniques that capture nascent RNA over defined temporal intervals has made genome-wide kinetic profiling of RNA maturation possible. Though rates of mRNA splicing have been estimated globally, the rate at which an mRNA is cleaved and polyadenylated to complete the maturation process has never been investigated. Here, we present a novel computational method to estimate genome-wide kinetic parameters for mRNA cleavage rates. This method capitalizes on short-read sequencing data from nascent mRNAs isolated after a time course of 4sU metabolic labeling to model the rate of mRNA maturation over time. To specifically measure cleavage rates, we first use patterns of read coverage from our sequencing data to approximate the position at which cleavage occurs. We then estimate the fraction of reads derived from cleaved or uncleaved molecules at that site across time to model the rate of 3’ end cleavage. We applied this method to nascent RNA-seq data from Drosophila melanogaster S2 cells to estimate polyadenylation-site (PAS)-specific rates of mRNA cleavage. Our findings shed light on the timing of decisions involved in alternative PAS usage within genes and the variable efficiency of 3’ end cleavage and polyadenylation across genes.
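Under a simple first-order decay model, the uncleaved fraction at a PAS falls as u(t) = exp(-k t) over labeling time t, so k can be estimated by least squares on ln u(t) = -k t. This model is an assumption made for illustration; the abstract does not give the model used, and a continuous-labeling design would call for a somewhat different curve. In the sketch, the uncleaved fraction is approximated, hypothetically, as the share of reads at the site that extend past the PAS, and the time points are toy values.

```python
from math import exp, log
from typing import List


def uncleaved_fraction(reads_past_pas: int, reads_at_site: int) -> float:
    """Share of reads at a PAS that continue downstream (hypothetical proxy)."""
    return reads_past_pas / reads_at_site


def fit_cleavage_rate(times: List[float], uncleaved: List[float]) -> float:
    """Least-squares slope of ln u(t) = -k t, a line through the origin."""
    num = sum(t * log(u) for t, u in zip(times, uncleaved))
    den = sum(t * t for t in times)
    return -num / den


def half_life(k: float) -> float:
    """Time for half of the uncleaved molecules to be cleaved under the model."""
    return log(2) / k


times = [2.0, 4.0, 8.0, 16.0]       # minutes of 4sU labeling (toy values)
u = [exp(-0.2 * t) for t in times]  # noiseless data with k = 0.2 per minute
k = fit_cleavage_rate(times, u)
print(round(k, 6), round(half_life(k), 3))  # 0.2 3.466
```

Fitting in log space keeps the estimator a one-line closed form; with noisy real data, a nonlinear fit on u(t) itself would weight late, low-fraction time points less aggressively.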

5:00 PM-5:20 PM
RBP-Pokedex: Prediction of RBP knockdown effect via DNN experiment modeling
Format: Pre-recorded with live Q&A

  • Matthew R. Gazzara, University of Pennsylvania, United States
  • Anupama Jha, University of Pennsylvania, United States
  • Caleb M. Radens, University of Pennsylvania, United States
  • Paul R. Jewell, University of Pennsylvania, United States
  • Yoseph Barash, University of Pennsylvania, United States

Presentation Overview:

Eukaryotic genomes encode hundreds of RNA-binding proteins (RBPs), which regulate the fate of RNA from synthesis to degradation. These RBPs form an extensive network of condition-specific regulation, including direct and indirect regulation of other RBPs. This fact, combined with technological limitations, makes systematic experimental characterization of their effect on gene expression infeasible. Here, we propose to leverage the available expression measurements in diverse conditions to instead build predictive models for the effect of RBP knockdown on the expression of other RBPs, capturing a condition-specific “RBP state”. We develop a model for the prediction of knockdown effects via DNN experiments (Pokedex). We use an unsupervised learning approach in which we construct a variational autoencoder for the expression of 531 RBPs. After training it on the naturally occurring variation of RBP expression across 53 GTEx tissues, we test its ability to predict knockdown effects in ENCODE experiments. Using the expression changes of RBPs compared to control as the test statistic, we show that this DNN performs significantly better than a standard PCA-based linear model of variation. Finally, we make the learned model available to the RNA community through a web tool, RBP-Pokedex (https://tools.biociphers.org/rbp-pokedex), to predict the effect of pan-tissue RBP knockdowns.
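The evaluation idea, comparing predicted and observed expression changes relative to control, can be sketched as below. The pseudo-count, the use of log2 fold changes and Pearson correlation as the agreement score are assumptions for illustration, not the exact statistic used by RBP-Pokedex; gene names and values are toy data.

```python
from math import log2, sqrt
from statistics import mean
from typing import Dict, List


def log2_fold_changes(knockdown: Dict[str, float], control: Dict[str, float],
                      pseudo: float = 1.0) -> Dict[str, float]:
    """Per-RBP log2((knockdown + pseudo) / (control + pseudo))."""
    return {g: log2((knockdown[g] + pseudo) / (control[g] + pseudo)) for g in control}


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def agreement(predicted: Dict[str, float], observed: Dict[str, float]) -> float:
    """Correlation between predicted and observed changes over shared RBPs."""
    shared = sorted(set(predicted) & set(observed))
    return pearson([predicted[g] for g in shared], [observed[g] for g in shared])


control = {"A": 15.0, "B": 7.0, "C": 3.0}    # toy expression values
knockdown = {"A": 7.0, "B": 15.0, "C": 3.0}
observed = log2_fold_changes(knockdown, control)  # {'A': -1.0, 'B': 1.0, 'C': 0.0}
predicted = {"A": -0.5, "B": 0.8, "C": 0.1}       # e.g. output of a model
print(observed, round(agreement(predicted, observed), 3))
```

The same `agreement` score can be computed for any competing predictor, such as a linear baseline, so that models are compared on identical knockdown experiments.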

5:20 PM-5:40 PM
Proceedings Presentation: Finding the Direct Optimal RNA Barrier Energy and Improving Pathways with an Arbitrary Energy Model
Format: Pre-recorded with live Q&A

  • Kiyoshi Asai, The University of Tokyo, Japan
  • Hiroki Takizawa, The University of Tokyo, Japan
  • Junichi Iwakiri, The University of Tokyo, Japan
  • Goro Terai, The University of Tokyo, Japan

Presentation Overview:

RNA folding kinetics plays an important role in the biological functions of RNA molecules. An important goal in investigating the kinetic behavior of RNAs is to find the folding pathway with the lowest energy barrier. For this purpose, most existing methods employ heuristics, because the number of possible pathways is huge even if only the shortest (direct) folding pathways are considered. In this study, we propose a new method that uses a best-first search strategy to efficiently compute the exact minimum barrier energy of direct pathways. Using our method, we can find the exact direct pathways within a Hamming distance of 20, whereas previous methods miss the exact solution even for short pathways. Moreover, our method can be used to improve the pathways found by existing methods for exploring indirect pathways. The source code and datasets created and used in this research are available at https://github.com/eukaryo/czno.
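A minimal best-first search over direct pathways can be written as follows; it is a sketch, not the code in the repository above. Structures are sets of base pairs, each step removes one pair of the source or adds one pair of the target (so the path length equals their Hamming distance), intermediates must be pseudoknot-free with no base in two pairs, and `energy` is any callable, which is what makes the energy model arbitrary. Expanding states in order of the highest energy seen so far means that the first time the target is popped, its path has the minimal peak. This naive version has no pruning and is exponential in the Hamming distance in the worst case; `toy_energy` is a hypothetical stand-in for a real energy model.

```python
import heapq
from itertools import count
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple

Pair = Tuple[int, int]
Structure = FrozenSet[Pair]


def compatible(p: Pair, s: Structure) -> bool:
    """No shared base and no crossing (pseudoknot) with the existing pairs."""
    i, j = p
    for k, l in s:
        if len({i, j, k, l}) < 4 or i < k < j < l or k < i < l < j:
            return False
    return True


def min_barrier_direct(src: Iterable[Pair], dst: Iterable[Pair],
                       energy: Callable[[Structure], float]
                       ) -> Optional[Tuple[float, List[Structure]]]:
    """Minimum-barrier direct path; returns (barrier, list of structures)."""
    src, dst = frozenset(src), frozenset(dst)
    to_remove, to_add = src - dst, dst - src
    tie = count()  # tiebreaker so the heap never compares frozensets
    heap = [(energy(src), next(tie), src, [src])]
    best = {src: energy(src)}  # lowest known peak energy per visited structure
    while heap:
        peak, _, cur, path = heapq.heappop(heap)
        if cur == dst:
            return peak - energy(src), path  # barrier relative to the start
        if peak > best.get(cur, float("inf")):
            continue  # stale heap entry
        moves = [cur - {p} for p in cur & to_remove]
        moves += [cur | {p} for p in to_add - cur if compatible(p, cur)]
        for nxt in moves:
            new_peak = max(peak, energy(nxt))
            if new_peak < best.get(nxt, float("inf")):
                best[nxt] = new_peak
                heapq.heappush(heap, (new_peak, next(tie), nxt, path + [nxt]))
    return None  # unreachable if dst itself is a valid structure


def toy_energy(structure: Structure) -> float:
    """Hypothetical energy model: -1 per base pair."""
    return -float(len(structure))


barrier, path = min_barrier_direct({(1, 10), (2, 9)}, {(2, 9), (11, 20)}, toy_energy)
print(barrier, [sorted(s) for s in path])  # adding (11, 20) first avoids any uphill step
```

Because the bottleneck cost never decreases along a path, this is Dijkstra's algorithm with `max` in place of `+`; plugging in a nearest-neighbor energy function instead of `toy_energy` changes nothing else in the search.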

5:40 PM-6:00 PM
Proceedings Presentation: The locality dilemma of Sankoff-like RNA alignments
Format: Pre-recorded with live Q&A

  • Ivo Hofacker, University of Vienna, Austria
  • Sebastian Will, University of Vienna, Austria
  • Teresa Müller, University of Freiburg, Bioinformatics, Germany
  • Milad Miladi, University of Freiburg, Germany
  • Frank Hutter, University of Freiburg, Germany
  • Rolf Backofen, Albert-Ludwigs-University Freiburg, Germany

Presentation Overview:

Motivation:
Elucidating the functions of non-coding RNAs by homology has been strongly limited by fundamental computational and modeling issues. While existing simultaneous alignment and folding (SA&F) algorithms successfully align homologous RNAs with precisely known boundaries (global SA&F), the more pressing problem of finding homologous RNAs in the genome (local SA&F) is intrinsically more difficult and much less understood. Typically, the length of local alignments is strongly overestimated and alignment boundaries are dramatically mispredicted. We hypothesize that local SA&F approaches are compromised in this way by a score bias, caused by the contribution of RNA structure similarity to the overall alignment score.
Results:
In light of this hypothesis, we present the first systematic study of local SA&F, based on a novel local RNA alignment benchmark set and quality measure. First, we vary the relative influence of structure similarity compared to sequence similarity. Putting more emphasis on the structure component leads to overestimating the length of local alignments. This clearly shows the bias of current scores and strongly hints at the structure component as its origin. Second, we study the interplay of several important scoring parameters by learning parameters for local and global SA&F. The divergence of these optimized parameter sets underlines the fundamental obstacles for local SA&F. Third, by introducing a position-wise correction term in local SA&F, we constructively solve its principal issues.
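The score-bias argument can be illustrated without any structure at all. The sketch below is plain Smith-Waterman local alignment, not a Sankoff-style simultaneous alignment and folding algorithm; a hypothetical per-column `bias` stands in for the structure-similarity contribution, and a constant per-column `correction` is a much-simplified stand-in for the position-wise correction term. A positive bias makes even mismatches score positively, so the local alignment grows past the true homologous core; the correction restores it.

```python
from typing import Tuple


def local_align(a: str, b: str, match: float = 2.0, mismatch: float = -1.0,
                gap: float = -2.0, bias: float = 0.0,
                correction: float = 0.0) -> Tuple[float, int]:
    """Smith-Waterman; returns (best score, aligned columns incl. gaps).

    `bias` is added to every match/mismatch column (a stand-in for a
    structure-similarity bonus); `correction` is subtracted per such column.
    """
    n, m = len(a), len(b)
    per_column = bias - correction
    # cell[i][j] = (score, length) of the best local alignment ending at (i, j)
    cell = [[(0.0, 0)] * (m + 1) for _ in range(n + 1)]
    best = (0.0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = (match if a[i - 1] == b[j - 1] else mismatch) + per_column
            diag, up, left = cell[i - 1][j - 1], cell[i - 1][j], cell[i][j - 1]
            cell[i][j] = max((0.0, 0),                        # start a new alignment
                             (diag[0] + s, diag[1] + 1),      # align a[i-1] with b[j-1]
                             (up[0] + gap, up[1] + 1),        # gap in b
                             (left[0] + gap, left[1] + 1))    # gap in a
            best = max(best, cell[i][j])  # ties favour the longer alignment
    return best


# An 8-nt homologous core flanked by unrelated sequence.
a = "TTTTTT" + "ACGTACGT" + "TTTTTT"
b = "GGGGGG" + "ACGTACGT" + "GGGGGG"
print(local_align(a, b))                              # (16.0, 8): just the core
print(local_align(a, b, bias=1.5))                    # (34.0, 20): spills into the flanks
print(local_align(a, b, bias=1.5, correction=1.5))    # (16.0, 8): boundaries restored
```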

6:00 PM-7:00 PM
iRNA: Social Hour
Format: Live-stream

  • Yoseph Barash, University of Pennsylvania, United States
  • Michelle S Scott, Université de Sherbrooke, Canada
  • Klemens Hertel

Presentation Overview:

Social Hour

Tuesday, July 14th
10:40 AM-11:00 AM
Detection of differential RNA modifications from direct RNA sequencing of human cell lines
Format: Pre-recorded with live Q&A

  • Ploy Pratanwanich, Genome Institute of Singapore, Singapore
  • Ying Chen, Genome Institute of Singapore, Singapore
  • Christopher Hendra, Genome Institute of Singapore, Singapore
  • Fei Yao, Genome Institute of Singapore, Singapore
  • Boon Hsi Sarah Ng, Genome Institute of Singapore, Singapore
  • Alexandre Thiery, Department of Statistics and Applied Probability, National University of Singapore, Singapore
  • Wee Siong Sho Goh, Genome Institute of Singapore, Singapore
  • Jonathan Göke, Genome Institute of Singapore, Singapore

Presentation Overview:

Differences in RNA expression can provide insights into the molecular identity of a cell and the pathways involved in human diseases. RNA modifications such as m6A have been found to contribute to the molecular functions of RNAs. However, quantifying differences in RNA modifications has been challenging. Here, we present a computational method, xPore, to identify differential RNA modifications from direct RNA sequencing data. Based solely on the nanopore current intensity profiles, we extend a standard two-Gaussian mixture model to accommodate multi-sample comparisons. For each site, the model learns two distributions, corresponding to unmodified and modified RNA, whose signal properties are shared across samples, while the probability of each read being modified is inferred separately for each sample. By incorporating prior knowledge into the model, we are able to determine the signal distributions of the modified k-mers and quantitatively estimate the modification rates accordingly. We evaluate our method on transcriptome-wide m6A profiling, demonstrating that we can accurately prioritize differentially modified sites. Together, we demonstrate that RNA modifications can be quantitatively identified from direct RNA-sequencing data with high accuracy, opening many new opportunities for large-scale applications in precision medicine. xPore is available at https://github.com/GoekeLab/xpore.
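The core modeling idea, two Gaussian components whose means and standard deviations are shared across samples while each sample has its own modification probability, can be sketched with a plain EM loop. This is not xPore's implementation: the initial means and SDs are hypothetical stand-ins for the prior knowledge the abstract mentions, component 1 is simply assumed to be the modified one, and the current values are toy data.

```python
from math import exp, pi, sqrt
from typing import Dict, List, Tuple


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * sqrt(2 * pi))


def fit_shared_mixture(samples: Dict[str, List[float]], mu=(0.0, 1.0), sd=(1.0, 1.0),
                       n_iter: int = 200) -> Tuple[List[float], List[float], Dict[str, float]]:
    """EM for a 2-Gaussian mixture: shared (mu, sd), per-sample P(modified)."""
    mu, sd = list(mu), list(sd)
    w = {s: 0.5 for s in samples}  # P(read is modified), i.e. component 1
    for _ in range(n_iter):
        # E-step: posterior probability that each read is modified
        resp: Dict[str, List[float]] = {}
        for s, xs in samples.items():
            resp[s] = []
            for x in xs:
                p0 = (1 - w[s]) * normal_pdf(x, mu[0], sd[0])
                p1 = w[s] * normal_pdf(x, mu[1], sd[1])
                resp[s].append(p1 / max(p0 + p1, 1e-300))
        # M-step: sample-specific mixing weights ...
        w = {s: sum(r) / len(r) for s, r in resp.items()}
        # ... but component means and SDs pooled over all samples
        pooled = [(x, r) for s in samples for x, r in zip(samples[s], resp[s])]
        for k in (0, 1):
            weights = [r if k == 1 else 1 - r for _, r in pooled]
            total = sum(weights)
            if total <= 0:
                continue  # component received no reads; keep previous estimate
            mu[k] = sum(wt * x for wt, (x, _) in zip(weights, pooled)) / total
            var = sum(wt * (x - mu[k]) ** 2 for wt, (x, _) in zip(weights, pooled)) / total
            sd[k] = max(sqrt(var), 1e-3)  # floor keeps a component from collapsing
    return mu, sd, w


ko = [99.0, 100.0, 101.0] * 10                               # toy: unmodified reads only
wt = [99.0, 100.0, 101.0] * 5 + [109.0, 110.0, 111.0] * 5    # toy: half shifted by modification
mu, sd, w = fit_shared_mixture({"ko": ko, "wt": wt}, mu=(100.0, 110.0), sd=(2.0, 2.0))
print([round(m, 1) for m in mu], round(w["wt"] - w["ko"], 2))
```

The difference between samples' estimated modification rates (here `w["wt"] - w["ko"]`) could then serve as a per-site differential-modification statistic; sharing the component parameters is what lets a sample with few modified reads still borrow a well-estimated "modified" signal from the others.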

11:00 AM-11:20 AM
RNA editing landscapes: a new model for biomarkers discovery in neurological disease
Format: Pre-recorded with live Q&A

  • Noel-Marie Plonski, Kent State University, United States
  • Ain Shajihan, Kent State University, United States
  • Caroline Nitirahardjo, Kent State University, United States
  • Heather Milliken Mercer, Kent State University, United States
  • Richard Meindl, Kent State University, United States
  • Helen Piontkivska, Kent State University, United States

Presentation Overview:

Current studies look at genomic variants or differential gene expression to predict genetic predisposition and find biomarkers for many neurodevelopmental, psychiatric and degenerative disorders. Here we explore a novel approach to elucidating potential diagnostic, prognostic and/or therapeutic biomarkers for major depressive disorder and suicide that focuses on a more nuanced aspect of transcriptome diversity: RNA editing. RNA editing, more specifically adenosine deaminase acting on RNA (ADAR) editing, contributes to transcriptome diversity by dynamically altering the ratios of differentially functioning proteins, underpinning the “fine-tuning” of neural signaling and synaptic plasticity. We apply a novel approach that uses an item response theory model, the Guttman scale, to create ADAR editing landscape profiles, which are then used to map differential editing in major depressive disorder and suicide. We identified a handful of genes of interest for further investigation into their direct contribution to neurological symptoms. We also highlight pathways, including ion homeostasis, that are altered in depression and suicide and warrant further investigation for their role in synaptic plasticity. Furthermore, we provide evidence that this model can be used for biomarker discovery in many other neurological disorders in which transcriptome diversity plays an important role.
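A Guttman scale can be checked with a few lines of code. This sketch assumes each sample has already been reduced to binary edited/not-edited calls per site (a hypothetical preprocessing step; the study's exact construction is not described here). Sites are ordered from most to least frequently edited, each sample's ideal pattern is "edited at its k most frequently edited sites", and the coefficient of reproducibility counts cell-wise deviations from that pattern, one common convention. A coefficient of about 0.9 or more is the conventional threshold for an acceptable Guttman scale.

```python
from typing import Dict, List, Tuple


def guttman_scale(calls: Dict[str, Dict[str, int]]) -> Tuple[List[str], float]:
    """Order sites by editing frequency and compute the coefficient of reproducibility."""
    samples = list(calls)
    sites = list(calls[samples[0]])
    # Most frequently edited sites first; ties broken by site name
    order = sorted(sites, key=lambda site: (-sum(calls[s][site] for s in samples), site))
    errors = 0
    for s in samples:
        k = sum(calls[s].values())  # number of sites edited in this sample
        ideal = {site: int(rank < k) for rank, site in enumerate(order)}
        errors += sum(calls[s][site] != ideal[site] for site in order)
    return order, 1 - errors / (len(samples) * len(order))


calls = {  # toy binary editing calls
    "s1": {"site1": 1, "site2": 1, "site3": 1},
    "s2": {"site1": 1, "site2": 1, "site3": 0},
    "s3": {"site1": 1, "site2": 0, "site3": 0},
    "s4": {"site1": 0, "site2": 0, "site3": 1},  # breaks the cumulative pattern
}
order, cr = guttman_scale(calls)
print(order, round(cr, 3))  # ['site1', 'site2', 'site3'] 0.833
```

Samples that deviate strongly from the cumulative pattern, like `s4` here, are exactly the ones whose editing landscape differs from the population ordering, which is the kind of signal a landscape profile can expose.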

11:20 AM-11:30 AM
BORE - Detecting RNA Editing Events comfortably
Format: Pre-recorded with live Q&A

  • Ali Imami, University of Toledo, United States
  • Rammohan Shukla, University of Toledo, United States
  • Robert McCullumsmith, University of Toledo, United States

Presentation Overview:

RNA editing and its regulation are increasingly recognized as an important mechanism in disease pathogenesis. With the growing availability of deep RNA-seq datasets, global detection of RNA editing events has become feasible. However, despite these advances, reliably detecting RNA editing events in the transcriptome remains difficult.
Here we introduce Balanced Output of RNA Editing (BORE), a cloud-native application that lets researchers process their own high-throughput sequencing data and find potential RNA editing sites. The application is written in Python and Go and is hosted on Amazon Web Services (AWS).
The application uses AWS Simple Storage Service (S3) to store the raw FASTQ files. It then processes the files through the BORE pipeline to generate candidate RNA editing sites, and the results are made available for the researcher to download.
The internal workflow of BORE is as follows: download a FASTQ file, align it to a reference genome with HISAT2, and preprocess the alignments into a filtered, high-quality BAM file. The main processing workflow then takes over: known single-nucleotide variants (SNVs) are filtered out, an internal representation of the putative editing sites is generated, and the sites are exported as machine-readable files such as VCF, together with a summary report.
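The last steps of that workflow (drop known SNVs, keep plausible editing sites, write VCF) can be sketched as below. BORE's actual filters, thresholds and code are not described in the abstract; the input tuples, the `min_depth`/`min_alt_frac` cut-offs and the known-SNV set are all hypothetical, and the alignment step is out of scope.

```python
KNOWN_SNVS = {("chr1", 1050)}  # hypothetical; in practice loaded from a SNV database

# A-to-I editing is read as A>G; on reverse-strand transcripts it appears as T>C
ADAR_CHANGES = {("A", "G"), ("T", "C")}


def filter_candidates(candidates, known_snvs, min_depth=10, min_alt_frac=0.1):
    """candidates: (chrom, pos, ref, alt, alt_reads, depth) tuples from a pileup."""
    kept = []
    for chrom, pos, ref, alt, alt_reads, depth in candidates:
        if (chrom, pos) in known_snvs:
            continue  # likely a genomic variant rather than an editing event
        if (ref, alt) not in ADAR_CHANGES:
            continue
        if depth < min_depth or alt_reads / depth < min_alt_frac:
            continue
        kept.append((chrom, pos, ref, alt, alt_reads, depth))
    return kept


def to_vcf(sites):
    """Render kept sites as minimal VCF 4.2 text with DP and AF INFO fields."""
    lines = [
        "##fileformat=VCFv4.2",
        '##INFO=<ID=DP,Number=1,Type=Integer,Description="Read depth">',
        '##INFO=<ID=AF,Number=A,Type=Float,Description="Fraction of reads supporting ALT">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for chrom, pos, ref, alt, alt_reads, depth in sites:
        lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\tPASS\tDP={depth};AF={alt_reads / depth:.3f}")
    return "\n".join(lines) + "\n"


candidates = [
    ("chr1", 1000, "A", "G", 12, 40),  # kept
    ("chr1", 1050, "A", "G", 20, 40),  # known SNV, dropped
    ("chr2", 500, "C", "T", 15, 30),   # not an A-to-I change, dropped
    ("chr2", 900, "T", "C", 1, 50),    # alt fraction 0.02, dropped
]
sites = filter_candidates(candidates, KNOWN_SNVS)
vcf_text = to_vcf(sites)
```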

11:30 AM-2:00 PM
iRNA: Poster Session
Format: Live-stream

  • Yoseph Barash, University of Pennsylvania, United States
  • Michelle S Scott, Université de Sherbrooke, Canada
  • Klemens Hertel

Presentation Overview:

Poster session running through lunch

2:00 PM-2:40 PM
iRNA Keynote: Non-canonical base pair interactions improve the scalability and accuracy of the prediction and analysis of RNA 3D structures
Format: Live-stream

  • Jérôme Waldispühl, McGill University, Canada

Presentation Overview:

A vast and complex network of base interactions stabilizes the 3D architecture of RNAs. Beyond the canonical Watson-Crick and Wobble base pairs, each pair of nucleotides can interact in up to 12 different ways. The frequency of each type of base pair varies widely, but in most cases their occurrence is essential to shape the local or global geometry of structured RNAs. Non-canonical base pairs are primarily found within or between unpaired regions of the secondary structure, commonly referred to as loops. Their concentration in loops creates sophisticated base interaction patterns that are representative of the 3D structure supported by this network.
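The figure of 12 interaction types matches the standard Leontis-Westhof classification (assuming that is the nomenclature meant here): each nucleotide can use its Watson-Crick, Hoogsteen or Sugar edge, and the two glycosidic bonds can be cis or trans, giving 6 edge combinations × 2 orientations. The short enumeration below shows the arithmetic; `is_canonical` is a hypothetical helper that treats only cis Watson-Crick/Watson-Crick AU, GC and GU pairs as canonical.

```python
from itertools import combinations_with_replacement

EDGES = ("W", "H", "S")        # Watson-Crick, Hoogsteen, Sugar edge
ORIENTATIONS = ("c", "t")      # cis / trans glycosidic bond orientation

# 6 unordered edge combinations (WW, WH, WS, HH, HS, SS) x 2 orientations = 12 families
families = [o + e1 + e2
            for o in ORIENTATIONS
            for e1, e2 in combinations_with_replacement(EDGES, 2)]

CANONICAL_BASES = {frozenset("AU"), frozenset("GC"), frozenset("GU")}


def is_canonical(family, b1, b2):
    """Canonical Watson-Crick or wobble pair: cWW geometry with AU, GC or GU bases."""
    return family == "cWW" and frozenset(b1 + b2) in CANONICAL_BASES
```

Every family other than cWW, and cWW between other base combinations, counts as non-canonical in this simplified view.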

In this talk, we show that working with a graphical model based on the set of canonical and non-canonical base pairs offers novel opportunities to develop efficient algorithms for predicting and analyzing physical and biochemical properties of RNA 3D structures. We present applications of this framework to (i) the automated discovery of structural motifs in databases, (ii) the prediction of small molecules binding RNAs, and (iii) the prediction of the 3D structure of large RNAs. These results suggest novel avenues to study the evolution of RNAs or to accelerate RNA drug discovery.

2:40 PM-2:50 PM
RNA Secondary Structure Prediction By Learning Unrolled Algorithms
Format: Pre-recorded with live Q&A

  • Xinshi Chen, Georgia Tech, United States
  • Yu Li, King Abdullah University of Science and Technology, Saudi Arabia
  • Ramzan Umarov, King Abdullah University of Science and Technology, Saudi Arabia
  • Xin Gao, King Abdullah University of Science and Technology, Saudi Arabia
  • Le Song, Georgia Tech, United States

Presentation Overview:

RNA secondary structure prediction is one of the oldest computational problems in bioinformatics and has been studied for more than 40 years. It is usually solved with dynamic programming, which can be relatively slow, achieves F1 scores of only around 0.6, and has difficulty handling pseudoknots. In this paper, we address the problem from an entirely new angle, viewing it as a translation problem with constraints. We propose a novel end-to-end deep learning model, called E2Efold, which embeds the problem-specific constraints in the network architecture. The core idea of E2Efold is to predict the RNA base-pairing matrix directly and to use an unrolled algorithm for constrained programming as the template for deep architectures that enforce the constraints. With comprehensive experiments on benchmark datasets, we demonstrate the superior performance of E2Efold: it predicts significantly better structures than previous state-of-the-art methods (especially for pseudoknotted structures), with F1 scores around 0.8, while being as efficient as the fastest algorithm in terms of inference time. The original paper was presented as an oral paper at ICLR 2020. The code of E2Efold is available at https://github.com/ml4bio/e2efold.
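To make "predicting the base-pairing matrix under constraints" concrete, the sketch below assumes the usual hard constraints for this setting (only AU, GC and GU pairs; no sharp loops, i.e. |i - j| ≥ 4; at most one partner per base) and enforces them with a simple greedy decoder. This greedy step is a swapped-in illustration, not E2Efold's unrolled constrained-programming algorithm, and the score matrix stands in for hypothetical network outputs. The base-pair F1 score used for evaluation is included.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def constraint_mask(seq, min_dist=4):
    """1 where pair (i, j) is allowed: canonical/wobble bases and |i - j| >= min_dist."""
    n = len(seq)
    return [[1 if (seq[i], seq[j]) in CANONICAL and abs(i - j) >= min_dist else 0
             for j in range(n)] for i in range(n)]


def greedy_decode(scores, seq, threshold=0.5):
    """Turn a symmetric score matrix into a set of pairs (i, j) with i < j.

    Allowed pairs above `threshold` are accepted from highest to lowest score,
    skipping any pair whose bases are already paired. Crossing pairs
    (pseudoknots) are not excluded.
    """
    mask = constraint_mask(seq)
    n = len(seq)
    candidates = sorted(
        ((scores[i][j], i, j) for i in range(n) for j in range(i + 1, n)
         if mask[i][j] and scores[i][j] > threshold),
        reverse=True,
    )
    used, pairs = set(), set()
    for _, i, j in candidates:
        if i not in used and j not in used:
            pairs.add((i, j))
            used.update((i, j))
    return pairs


def pair_f1(pred, true):
    """F1 score of predicted vs. reference base pairs."""
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(true)
    return 2 * precision * recall / (precision + recall)


seq = "GGGAAAUCCC"
true_pairs = {(0, 9), (1, 8), (2, 7)}
scores = [[0.0] * len(seq) for _ in seq]
# Hypothetical outputs: (3, 6) is A-U but too close; (2, 6) conflicts with (2, 7)
for i, j, s in [(0, 9, 0.9), (1, 8, 0.8), (2, 7, 0.7), (2, 6, 0.6), (3, 6, 0.95)]:
    scores[i][j] = scores[j][i] = s
pred = greedy_decode(scores, seq)
```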

2:50 PM-3:00 PM
Elucidating the Automatically Detected Features Used By Deep Neural Networks for RFAM family classification
Format: Pre-recorded with live Q&A

  • Sebastien Lemieux, IRIC / Université de Montréal, Canada
  • Tom MacDougall, Université de Montréal, Canada
  • Léonard Sauvé, Université de Montréal, Canada
  • Francois Major, University of Montreal, Canada

Presentation Overview:

In this study, we show that deep feed-forward neural networks can accurately classify RNA families directly from RNA sequences alone. By selectively obfuscating combinations of features, we demonstrate to what degree these models use length, nucleotide and dinucleotide composition, and higher-order subsequences present in the RNA to make accurate predictions. We report the area under the receiver operating characteristic curve (ROC-AUC) for the classification of a diverse selection of RNA families, showing how randomizing various implicit sequence features affects the performance of these models and suggesting which features they are able to detect. We hope these findings will encourage the use of artificial neural network models for reliable, data-driven detection of RNA families directly from primary structure, and the integration of these models into other sequence-based bioinformatics tasks, such as de novo genome annotation.
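The abstract does not specify the obfuscation procedures. A minimal sketch, assuming two simple null models: a shuffle that keeps length and mononucleotide composition but destroys dinucleotide and higher-order order, and a uniform random sequence that keeps only the length. Dinucleotide-preserving shuffles also exist but are not shown; all names here are illustrative.

```python
import random
from collections import Counter


def shuffle_preserving_composition(seq, rng):
    """Keep length and nucleotide counts; scramble dinucleotide and higher-order order."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def random_same_length(seq, rng, alphabet="ACGU"):
    """Keep only the length; composition is resampled uniformly."""
    return "".join(rng.choice(alphabet) for _ in seq)


def dinucleotide_counts(seq):
    """Overlapping dinucleotide counts, e.g. to check what a shuffle has changed."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


rng = random.Random(42)
seq = "GGGAAACCCUUUGGGAAACCCUUU"
shuffled = shuffle_preserving_composition(seq, rng)
randomized = random_same_length(seq, rng)
```

If a classifier's ROC-AUC stays high on composition-preserving shuffles but collapses on length-only random sequences, that points to composition, rather than ordered motifs, as the feature it relies on.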

3:20 PM-3:40 PM
Proceedings Presentation: LinearPartition: Linear-Time Approximation of RNA Folding Partition Function and Base Pairing Probabilities
Format: Pre-recorded with live Q&A

  • David Mathews, University of Rochester, United States
  • He Zhang, Baidu Research USA, United States
  • Liang Zhang, Oregon State University, United States
  • Liang Huang, Baidu Research USA; Oregon State University, United States

Presentation Overview:

RNA secondary structure prediction is widely used to understand RNA function. Recently, there has been a shift away from the classical minimum free energy (MFE) methods to partition function-based methods that account for folding ensembles and can therefore estimate structure and base pair probabilities. However, the classical partition function algorithm scales cubically with sequence length and is therefore slow for long sequences. It is even slower than cubic-time MFE-based methods because of a larger constant factor in runtime. Inspired by the success of our recently proposed LinearFold algorithm, which predicts the approximate MFE structure in linear time, we design a similar linear-time heuristic algorithm, LinearPartition, to approximate the partition function and base pairing probabilities. LinearPartition is orders of magnitude faster than Vienna RNAfold and CONTRAfold (e.g., 2.5 days vs. 1.3 minutes on a sequence of length 32,753 nt). More interestingly, the resulting base pairing probabilities are even better correlated with the ground truth structures. LinearPartition also leads to a small accuracy improvement when used for downstream structure prediction on the families with the longest sequences (16S and 23S rRNA), as well as a substantial improvement on long-distance base pairs (500+ nt apart).
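For context on the cubic baseline that LinearPartition approximates, here is a toy inside recursion in the McCaskill style. It is not LinearPartition and not a real energy model: it assumes every allowed pair (AU, GC, GU) contributes the same Boltzmann weight and that hairpin loops need at least 3 unpaired bases. With a weight of 1 the partition function simply counts secondary structures. The two nested loops over (i, j) plus the inner loop over k give the O(n³) runtime mentioned in the abstract.

```python
import math

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq, pair_weight=1.0, min_loop=3):
    """Toy McCaskill-style inside recursion, O(n^3) time.

    Q[i][j] is the partition function of seq[i:j] (half-open); every base pair
    contributes the same weight (a simplifying assumption, not a real energy model).
    """
    n = len(seq)
    Q = [[1.0] * (n + 1) for _ in range(n + 1)]  # empty subsequences have Q = 1
    for span in range(1, n + 1):
        for i in range(n - span + 1):
            j = i + span
            b = j - 1  # last base of seq[i:j]
            total = Q[i][b]  # case 1: base b is unpaired
            # case 2: b pairs with k, leaving at least min_loop bases inside the loop
            for k in range(i, b - min_loop):
                if (seq[k], seq[b]) in PAIRS:
                    total += Q[i][k] * Q[k + 1][b] * pair_weight
            Q[i][j] = total
    return Q[0][n]


n_structures = partition_function("GGAAACC")  # 6.0: with weight 1, Q counts structures
RT = 0.0019872 * 310.15                        # kcal/mol at 37 degrees C
w = math.exp(1.0 / RT)                         # hypothetical -1 kcal/mol per pair
Q = partition_function("GGAAACC", pair_weight=w)
```

For scale, 2.5 days is 3,600 minutes, so the reported example corresponds to roughly a 2,770-fold speedup (3,600 / 1.3).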

3:40 PM-4:00 PM
Proceedings Presentation: Graph neural representational learning of RNA secondary structures for predicting RNA-protein interactions
Format: Pre-recorded with live Q&A

  • Mathieu Blanchette, McGill University, Canada
  • William Hamilton, McGill University, Canada
  • Zichao Yan, McGill University, Canada

Presentation Overview:

Motivation: RNA-protein interactions are key effectors of post-transcriptional regulation. Significant experimental and bioinformatics efforts have been expended on characterizing protein binding mechanisms on the molecular level, and on highlighting the sequence and structural traits of RNA that impact the binding specificity for different proteins. Yet our ability to predict these interactions in silico remains relatively poor.
Results: In this study, we introduce RPI-Net, a graph neural network approach for RNA-protein interaction prediction. RPI-Net learns and exploits a graph representation of RNA molecules, yielding significant performance gains over existing state-of-the-art approaches. We also introduce an approach to rectify a particular type of sequence bias present in many CLIP-Seq data sets, and we show that correcting this bias is essential to learn meaningful predictors and properly evaluate their accuracy. Finally, we provide new approaches to interpret the trained models and extract simple, biologically interpretable representations of the learned sequence and structural motifs.
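A common way to turn an RNA secondary structure into a graph, assumed here since the abstract does not describe RPI-Net's exact construction, is to make each nucleotide a node and connect consecutive nucleotides (backbone edges) and paired nucleotides (base-pair edges). The sketch parses a dot-bracket string and runs one round of fixed mean-neighbour aggregation over one-hot features; a trained GNN would use learned weights instead. All names are hypothetical.

```python
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "U": [0, 0, 0, 1]}


def dotbracket_to_graph(seq, structure):
    """Return node labels and (i, j, kind) edges: backbone plus base pairs."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    edges = [(i, i + 1, "backbone") for i in range(len(seq) - 1)]
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            edges.append((stack.pop(), i, "pair"))
        elif ch != ".":
            raise ValueError(f"unsupported symbol {ch!r}")
    if stack:
        raise ValueError("unbalanced '('")
    return list(seq), edges


def adjacency(n, edges):
    """Undirected adjacency lists built from the edge list."""
    adj = [[] for _ in range(n)]
    for i, j, _ in edges:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def message_pass(labels, adj):
    """One round of mean-neighbour aggregation added to each node's own one-hot feature."""
    feats = [ONE_HOT[b] for b in labels]
    out = []
    for i, nbrs in enumerate(adj):
        agg = [sum(feats[j][d] for j in nbrs) / max(len(nbrs), 1) for d in range(4)]
        out.append([f + a for f, a in zip(feats[i], agg)])
    return out


labels, edges = dotbracket_to_graph("GGAAACC", "((...))")
pair_edges = [e for e in edges if e[2] == "pair"]
h = message_pass(labels, adjacency(len(labels), edges))
```

Because base-pair edges connect nucleotides that are far apart in sequence, a single aggregation step already mixes information across a stem, which a purely sequential model would need many steps to do.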

4:00 PM-4:10 PM
Practical Guidance for Genome-Wide RNA:DNA Triple Helix Prediction
Format: Pre-recorded with live Q&A

  • Elena Matveishina, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119234 Moscow, Russia
  • Ivan Antonov, Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Science, 117312 Moscow, Russia
  • Yulia Medvedeva, Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Science, 117312 Moscow, Russia

Presentation Overview:

Long noncoding RNAs (lncRNAs) play a key role in many cellular processes, including chromatin regulation. To modify chromatin, lncRNAs often interact with DNA in a sequence-specific manner, forming RNA:DNA triple helices. Computational tools for triple-helix search do not always provide genome-wide predictions of sufficient quality. Here, we used four human lncRNAs (MEG3, DACOR1, TERC and HOTAIR) and their experimentally determined binding regions to evaluate the triple helix parameters used by the Triplexator software. We find that a minimum length of 10 nt, a maximum error rate of 20%, and a minimum G-content of 70% or 40% provide the highest accuracy of triple helix predictions in terms of area under the curve (AUC). Additionally, we combined triple helix prediction with the lncRNA secondary structure and demonstrated that considering only the single-stranded fragments of the lncRNA predicted by RNAplfold, with 0.95 or 0.5 thresholds for the probability of pairing, can further improve the quality of RNA:DNA triplex prediction, especially for MEG3. This improvement can be explained by the number and characteristics of DNA-binding domains (DBDs), the regions of the lncRNA that form the majority of the triplexes, as detected by the TDF software.
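The reported parameter choices can be expressed as two simple filters. This is a hypothetical sketch, not Triplexator's or RNAplfold's interface: it assumes G-content is measured on the candidate target segment, that the error rate is mismatches divided by segment length, and that "single-stranded" means a run of lncRNA positions whose pairing probability is below the chosen threshold (0.5 or 0.95 in the abstract).

```python
def g_content(seq):
    """Fraction of G in a segment (assumed here to be the duplex target segment)."""
    return seq.upper().count("G") / len(seq) if seq else 0.0


def passes_triplex_filters(target, mismatches, min_len=10, max_error_rate=0.20, min_g=0.70):
    """Hypothetical filter mirroring the reported thresholds (min_g 0.70 or 0.40)."""
    return (len(target) >= min_len
            and mismatches / len(target) <= max_error_rate
            and g_content(target) >= min_g)


def single_stranded_segments(pair_prob, threshold=0.5, min_len=10):
    """Half-open (start, end) runs of positions with pairing probability below threshold."""
    segments, start = [], None
    for i, p in enumerate(list(pair_prob) + [1.0]):  # sentinel closes a trailing run
        if p < threshold and start is None:
            start = i
        elif p >= threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments


target = "GGAGGGAGAGGG"  # 12 nt, 9 G => G-content 0.75
ok_two_errors = passes_triplex_filters(target, mismatches=2)     # error rate ~0.17
ok_three_errors = passes_triplex_filters(target, mismatches=3)   # error rate 0.25
pair_prob = [0.9] * 5 + [0.1] * 12 + [0.8] * 3  # hypothetical per-position probabilities
ss_segments = single_stranded_segments(pair_prob, threshold=0.5)
```

Candidate triplex-forming regions of the lncRNA would then be restricted to the returned segments before running the triplex search.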

4:10 PM-4:20 PM
Splicing variations contribute to the functional dysregulation of genes in acute myeloid leukemia.
Format: Pre-recorded with live Q&A

  • Yoseph Barash, University of Pennsylvania, United States
  • Osvaldo Rivera, University of Pennsylvania, United States
  • Kristen Lynch, University of Pennsylvania, United States

Presentation Overview:

Altered pre-mRNA splicing may result in aberrations that phenocopy classical somatic mutations. Despite the importance of RNA splicing, most studies of acute myeloid leukemia (AML) have not broadly explored the means by which altered splicing may functionally disrupt genes associated with AML. To address this gap, we investigated the splicing variability of 70 AML-associated genes within RNA-Seq data from 29 in-house AML patient samples (PENN cohort). Using the MAJIQ splicing quantification algorithm, we detected 40 highly variable splicing events across the patients of the PENN cohort, many of which are novel and reduce protein expression without changing overall transcript abundance. Splicing variability occurred independently of known cis-mutations, highlighting pathogenic mechanisms overlooked by standard genetic analyses. We also find these 40 splicing events to be significantly more variable within the ~400-patient BEAT-AML cohort than in normal CD34+ cells. Furthermore, hierarchical clustering revealed a high degree of correlation between 23 of these 40 splicing events in both the PENN and BEAT-AML cohorts, suggesting a pathogenic co-regulation that is not observed in normal CD34+ cells. Overall, our findings highlight the underlying transcriptomic complexity across AML populations and demonstrate how previously unreported splicing variations contribute to protein dysregulation in AML.
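The notion of a "highly variable splicing event" can be illustrated with percent spliced in (PSI) values. MAJIQ itself quantifies local splicing variations with a Bayesian model; the sketch below swaps in a plain ratio of junction reads and a standard-deviation cut-off, both assumptions, with hypothetical event identifiers and counts.

```python
from statistics import mean, pstdev


def psi(inclusion_reads, exclusion_reads):
    """Percent spliced in: fraction of junction reads that support inclusion."""
    total = inclusion_reads + exclusion_reads
    return inclusion_reads / total if total else float("nan")


def variable_events(counts, min_sd=0.1):
    """Keep events whose PSI standard deviation across patients is >= min_sd.

    counts maps a (hypothetical) event id to a list of (inclusion, exclusion)
    junction read counts, one tuple per patient; patients with no reads are skipped.
    """
    result = {}
    for event, per_patient in counts.items():
        values = [psi(inc, exc) for inc, exc in per_patient if inc + exc > 0]
        if len(values) >= 2 and pstdev(values) >= min_sd:
            result[event] = (mean(values), pstdev(values))
    return result


counts = {
    "GENE_X:exon5": [(90, 10), (50, 50), (10, 90)],  # PSI 0.9, 0.5, 0.1
    "GENE_Y:exon2": [(80, 20), (82, 18), (79, 21)],  # PSI about 0.8 in every patient
}
variable = variable_events(counts)
```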

4:20 PM-4:40 PM
PEGASAS: A pathway-guided approach for analyzing pre-mRNA alternative splicing during cancer progression
Format: Pre-recorded with live Q&A

  • John Phillips, UCLA, United States
  • Yang Pan, UCLA, United States
  • Brandon Tsai, UCLA, United States
  • Zhijie Xie, UCLA, United States
  • Levon Demirdjian, The Children’s Hospital of Philadelphia, United States
  • Wen Xiao, UCLA, United States
  • Harry Yang, UCLA, United States
  • Yida Zhang, UCLA, United States
  • Chia Ho Lin, UCLA, United States
  • Donghui Cheng, UCLA, United States
  • Qiang Hu, Roswell Park Comprehensive Cancer Center, United States
  • Song Liu, Roswell Park Comprehensive Cancer Center, United States
  • Douglas Black, UCLA, United States
  • Owen Witte, UCLA, United States
  • Yi Xing, The Children’s Hospital of Philadelphia, United States

Presentation Overview:

Aberrant pre-mRNA alternative splicing (AS) is widespread in cancer, but the causes and consequences of AS dysregulation during cancer progression are not well understood. We developed a novel computational framework, PEGASAS, as a pathway-guided approach for examining the effects of oncogenic signaling on exon incorporation. PEGASAS was designed to study the interplay among oncogenic signaling, AS, and the affected biological processes. In this study, we applied PEGASAS to define the AS landscape across prostate cancer disease states and the relationship between splicing and known driver alterations. We compiled a meta-dataset of RNA-seq data from 876 tissue samples from publicly available sources, covering a range of disease states from normal tissues to aggressive metastatic tumors. PEGASAS analysis revealed a correlation between Myc signaling and splicing changes in RNA binding proteins (RBPs), suggestive of a previously undescribed auto-regulatory phenomenon. We experimentally verified this result in a human prostate cell transformation assay. Our findings establish a role for Myc in regulating RNA processing by controlling the incorporation of nonsense-mediated decay (NMD)-determinant exons in RBP-encoding genes. In conclusion, PEGASAS can mine large-scale transcriptomic data to connect changes in pre-mRNA AS with oncogenic alterations that are common to many cancer types.
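A pathway-guided analysis of this kind can be sketched as correlating a per-sample pathway activity score with each exon's PSI across samples. The abstract does not say how PEGASAS scores pathways or measures association; the mean z-score signature and Spearman rank correlation below are assumptions, and the gene names and values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one gene's expression across samples."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s else 0.0 for v in values]


def signature_score(expr, genes):
    """Per-sample mean z-score of the pathway genes; expr maps gene -> values per sample."""
    z = [zscore(expr[g]) for g in genes]
    return [mean(col) for col in zip(*z)]


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


expr = {"GENE_A": [1, 2, 3, 4, 5], "GENE_B": [2, 4, 6, 8, 10]}  # hypothetical pathway genes
pathway = signature_score(expr, ["GENE_A", "GENE_B"])
exon_psi = [0.9, 0.7, 0.6, 0.4, 0.2]  # hypothetical PSI of one exon in the same samples
rho = spearman(pathway, exon_psi)     # strongly negative: skipping rises with pathway activity
```

A rank correlation is used in this sketch because PSI values are bounded and often skewed, so a monotonic association is a safer default than a linear one.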

5:00 PM-5:40 PM
iRNA Keynote: Systematic approaches to study the subcellular localization properties of RNAs and RNA Binding Proteins
Format: Live-stream

  • Eric Lécuyer

Presentation Overview:

Our laboratory seeks to understand the biological functions and regulatory mechanisms of RNA intracellular localization. Our projects aim to elucidate the normal functions of RNA trafficking in the maintenance of genome stability and cell polarity, and how disruption of these pathways can contribute to the aetiology of diseases such as cancer and neuromuscular disorders. For this work, we combine the versatility of Drosophila genetics with high-throughput molecular imaging and functional genomics approaches in fly and human cellular models. During this presentation, I will provide an update on our efforts i) to assess the global cellular distribution properties of the human and fly transcriptomes using fractionation-sequencing methodologies, ii) to systematically map the cytotopic distribution properties of human RNA binding proteins, and iii) to characterize perturbations in post-transcriptional regulatory pathways that may be linked to disease aetiology.

5:40 PM-6:00 PM
iRNA: Concluding Remarks
Format: Live-stream

  • Yoseph Barash, University of Pennsylvania, United States
  • Michelle S Scott, Université de Sherbrooke, Canada
  • Klemens Hertel

Presentation Overview:

Wrap-up