iRNA

Schedule subject to change
All times listed are in CEST
Wednesday, July 26th
10:30-10:40
Introduction to iRNA COSI session
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Michelle Scott

  • Michelle Scott
10:40-11:20
Invited Presentation: Methods for investigating the dancing transcriptome
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Michelle Scott

  • Irmtraud Meyer


Presentation Overview:

Many computational studies assume a rather static view of the transcriptome. In vivo, however, the transcriptome is rather dynamic, with transcripts assuming and expressing a variety of different functional roles depending on their specific molecular environment at a given space and time in the living cell. I will introduce several computational methods that allow us to investigate some of these features in silico despite the obvious complexities in vivo.

11:20-11:40
LinearDesign: Algorithm for Optimized mRNA Design Improves Stability and Immunogenicity (Nature paper)
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Irmtraud Meyer

  • He Zhang, Baidu Research, United States
  • Liang Zhang, Baidu Research USA (formerly), United States
  • Ang Lin, China Pharmaceutical University, China
  • Congcong Xu, StemiRNA Therapeutics, China
  • Hangwen Li, StemiRNA Therapeutics, China
  • David Mathews, University of Rochester, United States
  • Yujian Zhang, StemiRNA Therapeutics (formerly), United States
  • Liang Huang, Oregon State University, United States


Presentation Overview:

(To Appear in Nature)

Messenger RNA (mRNA) vaccines are being used to contain COVID-19, but still suffer from the critical limitation of mRNA instability and degradation, which is a major obstacle in the storage, distribution, and efficacy of the vaccine products. Previous work showed that increasing secondary structure lengthens mRNA half-life, which, together with optimal codons, improves protein expression. Therefore, a principled mRNA design algorithm must optimize both structural stability and codon usage. However, due to synonymous codons, the mRNA design space is prohibitively large (e.g., ~10^632 candidates for the SARS-CoV-2 Spike protein). Here we provide a simple and unexpected solution using a classical concept in computational linguistics, where finding the optimal mRNA sequence is akin to identifying the most likely sentence among similar sounding alternatives. Our algorithm takes only 11 minutes for the Spike protein, and can jointly optimize stability and codon usage. On both COVID-19 and VZV vaccines, LinearDesign substantially improves mRNA half-life and protein expression, and dramatically increases antibody titer by up to 128× in vivo, compared to the codon-optimization benchmark. This surprising result reveals the great potential of principled mRNA design, and enables the exploration of previously unreachable but highly stable and efficient designs.
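To give a sense of where a figure such as ~10^632 comes from: the number of candidate mRNA sequences for a protein is the product, over its residues, of the number of synonymous codons for each amino acid. The sketch below computes that count under the standard genetic code. The function name and the toy peptide are illustrative assumptions, and this is only the counting argument, not the LinearDesign algorithm.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of synonymous codons per amino acid (for example L has 6, M has 1).
DEGENERACY = Counter(CODON_TABLE.values())


def log10_design_space(protein: str) -> float:
    """Return log10 of the number of distinct coding sequences for `protein`."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein)


if __name__ == "__main__":
    peptide = "MLSW"  # hypothetical toy peptide, not a real antigen
    print(f"about 10^{log10_design_space(peptide):.2f} candidate sequences")
```

Working in log space keeps the count representable for full-length proteins, where the raw product would be far too large to enumerate.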

11:40-12:00
Towards In-Silico CLIP-seq: Predicting Protein-RNA Interaction via Sequence-to-Signal Learning
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Irmtraud Meyer

  • Marc Horlacher, Helmholtz Center Munich, Germany
  • Nils Wagner, Department of Informatics, Technical University of Munich, Germany
  • Lambert Moyon, Helmholtz Center Munich, Germany
  • Klara Kuret, The Francis Crick Institute, London, United Kingdom
  • Nicolas Goedert, Helmholtz Center Munich, Germany
  • Marco Salvatore, Department of Biology, University of Copenhagen, Denmark
  • Jernej Ule, The Francis Crick Institute, London, United Kingdom
  • Julien Gagneur, Department of Informatics, Technical University of Munich, Germany
  • Ole Winther, Department of Biology, University of Copenhagen, Denmark
  • Annalisa Marsico, Helmholtz Center Munich, Germany


Presentation Overview:

RNA-binding proteins (RBPs) play a vital role in post-transcriptional regulation, including RNA modification, stabilization, localization, and translation. Knowing their RNA targets and binding preferences is important for understanding the mechanisms of post-transcriptional regulation and their implications in human diseases.
We present RBPNet, a novel deep learning method, which, given an RNA sequence, predicts the CLIP-seq signal distribution at single-nucleotide resolution. RBPNet utilizes a dilated convolutional neural network (CNN) architecture and achieves high generalization performance on eCLIP, iCLIP and miCLIP assays, outperforming state-of-the-art classifiers. Crucially, RBPNet directly operates on the raw CLIP-seq signal, eliminating the need for preprocessing steps, such as peak calling. RBPNet performs bias correction by modeling the CLIP-seq signal as a mixture of the protein-specific and control signal, thus mitigating technical biases which may otherwise hinder downstream analysis. Through model interrogation via Integrated Gradients feature importance scores, RBPNet identifies predictive sub-sequences corresponding to known and novel binding motifs. Using in silico mutagenesis, RBPNet scores the impact of single-nucleotide variants on RBP-binding, thus aiding in prioritizing potentially disease-causing variants.
RBPNet is the first method to directly model the raw CLIP-seq signal at nucleotide-resolution, thus improving both computational inference of protein-RNA interaction and interpretation of predictions over the state-of-the-art.
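The idea of modeling the observed signal as a mixture of a protein-specific and a control component can be sketched in a few lines. This is a minimal illustration, assuming each component is a per-nucleotide probability profile obtained by a softmax and that a single sigmoid-transformed weight mixes them. The function names and the parameterization are assumptions made for this example, not RBPNet's actual implementation.

```python
import math


def softmax(logits):
    """Turn per-nucleotide logits into a probability profile that sums to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_profile(target_logits, control_logits, mixing_logit):
    """Combine target and control profiles with weight alpha = sigmoid(mixing_logit)."""
    alpha = 1.0 / (1.0 + math.exp(-mixing_logit))
    target = softmax(target_logits)
    control = softmax(control_logits)
    return [alpha * t + (1.0 - alpha) * c for t, c in zip(target, control)]


if __name__ == "__main__":
    # Hypothetical logits over a 4-nt window; the control profile is flat.
    total = mixture_profile([2.0, 0.5, -1.0, 0.0], [0.0, 0.0, 0.0, 0.0], mixing_logit=1.0)
    print([round(p, 3) for p in total])
```

Because both components are normalized, the mixture is itself a valid distribution over positions, and the target component can be inspected on its own once the control contribution has been accounted for.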

12:00-12:20
Probabilistic models of RNA•DNA:DNA triplex formation accurately predict genome-wide RNA-DNA interactions
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Irmtraud Meyer

  • Timothy Warwick, Goethe University Frankfurt, Institute for Cardiovascular Physiology, Germany
  • Sandra Seredinski, Goethe University Frankfurt, Institute for Cardiovascular Physiology, Germany
  • Nina M. Krause, Goethe University Frankfurt, Institute for Organic Chemistry and Chemical Biology, Germany
  • Jasleen Kaur Bains, Goethe University Frankfurt, Institute for Organic Chemistry and Chemical Biology, Germany
  • Harald Schwalbe, Goethe University Frankfurt, Institute for Organic Chemistry and Chemical Biology, Germany
  • Matthias S. Leisegang, Goethe University Frankfurt, Institute for Cardiovascular Physiology, Germany
  • Marcel H. Schulz, Goethe University Frankfurt, Institute for Cardiovascular Regeneration, Germany
  • Ralf P. Brandes, Goethe University Frankfurt, Institute for Cardiovascular Physiology, Germany


Presentation Overview:

Background:

RNA•DNA:DNA triple helix (triplex) formation enables RNA transcripts to modulate local chromatin environment. Molecular detection of triplex formation is complex, making computational prediction of triplex formation important. Previous predictive methods relied upon Hoogsteen base pairing. We explored whether machine learning in conjunction with triplex-sequencing data generated in vitro could improve prediction of triplex formation.

Methods:

Triplex-enriched DNA and RNA motifs were identified from unpaired triplex-sequencing data, and input to an Expectation-Maximisation algorithm which learned probabilistic matrices linking sets of DNA and RNA motifs. Matrix error was calculated per iteration, and minimised. Final matrices and motif sets were output upon minimisation. Output matrices were implemented as score matrices in the local alignment program TriplexAligner, which uses Karlin-Altschul statistics to predict triplex formation between user-provided DNA and RNA.

Results:

TriplexAligner significantly outperformed previously published methods in the accurate recall of genome-wide RNA-DNA interactions identified by RADICL-sequencing or RedC, as well as specific interactions of lncRNA SARRAH. Predicted triplex DNA and RNA sequences were evaluated biophysically, and appeared to form valid triplexes.

Outlook:

DNA-RNA pairing rules learned from triplex-sequencing data accurately predict RNA-DNA interactions. Applications of TriplexAligner could elucidate mechanisms of RNA action and potential importance of triplex formation.
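Karlin-Altschul statistics, mentioned in the Methods, relate a local alignment score S to the number of alignments expected by chance, E = K·m·n·exp(−λS), for sequences of lengths m and n. The sketch below evaluates that formula and the equivalent bit score. The values of λ and K depend on the scoring matrix, and the numbers used here are placeholders chosen for illustration, not TriplexAligner's parameters.

```python
import math


def expected_hits(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance local alignments with score >= `score` (E-value)."""
    return k * m * n * math.exp(-lam * score)


def bit_score(score: float, lam: float, k: float) -> float:
    """Normalized score S' such that E = m * n * 2 ** (-S')."""
    return (lam * score - math.log(k)) / math.log(2)


def p_value(e_value: float) -> float:
    """Probability of at least one chance hit, assuming Poisson-distributed hits."""
    return -math.expm1(-e_value)


if __name__ == "__main__":
    lam, k = 0.3, 0.1  # placeholder parameters (assumption)
    e = expected_hits(score=50, m=1_000, n=2_000, lam=lam, k=k)
    print(f"E = {e:.3g}, bits = {bit_score(50, lam, k):.1f}, P = {p_value(e):.3g}")
```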

12:20-12:30
On the predictibility of A-minor motifs from their local contexts
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Irmtraud Meyer

  • Coline Gianfrotta, DAVID, UVSQ, Université Paris-Saclay, France & LISN, CNRS, Univ. Paris-Saclay, France
  • Vladimir Reinharz, Université du Québec à Montréal, Canada
  • Olivier Lespinet, I2BC, UMR 9198, CNRS - Univ. Paris-Saclay, France
  • Dominique Barth, UVSQ, France
  • Alain Denise, LISN, CNRS, Univ. Paris-Saclay & I2BC, UMR 9198, CNRS - Univ. Paris-Saclay, France


Presentation Overview:

Our study concerns the classification and prediction of type I/II A-minor motifs in RNA 3D structures. It investigates the importance of several kinds of structural context in the formation of this motif, such as the 3D substructure around a motif. Our purpose is to determine what kind of information in the structural context can be useful to characterize and predict the presence and the position of these motifs.

First, we develop an automated method to classify and characterize A-minor motifs according to their 3D context similarities. Second, we model the topological context of A-minor motifs and of their classes as graphs, and use these graphs to study how well A-minor motifs can be predicted from the topological context and sequence information alone.

We thus uncovered new subclasses of A-minor motifs according to their local 3D similarities. Most classes are composed of homologous occurrences, but some are composed of non-homologous occurrences, which could suggest convergent evolution. We also showed that, for some A-minor motifs, the topology combined with a sequence signal is sufficient to predict their presence and position. In most other cases, these signals are not sufficient for predicting A-minor motifs; however, we show that they remain interesting signals for this purpose.

13:50-14:10
Proceedings Presentation: RNA Design via Structure-Aware Multi-Frontier Ensemble Optimization
Room: Pasteur Lounge
Format: Live-stream

Moderator(s): Julien Gagneur

  • Tianshuo Zhou, Oregon State University, United States
  • Ning Dai, Oregon State University, United States
  • Sizhen Li, Oregon State University, United States
  • Max Ward, University of Western Australia, Australia
  • David H. Mathews, University of Rochester Medical Center, United States
  • Liang Huang, Oregon State University, United States


Presentation Overview:

Motivation: RNA design is the selection of a sequence or set of sequences that will fold into a desired structure, i.e., the inverse problem of RNA folding. However, the sequences designed by existing algorithms often suffer from low ensemble stability, and this problem is even worse for long sequences. Additionally, many methods find only a small number of sequences satisfying the MFE criterion in each design. These drawbacks limit their use in both applications and advanced research.
Results: We propose an innovative optimization paradigm, SAMFEO, which optimizes ensemble objectives (equilibrium probability or ensemble defect) and yields successfully designed RNA sequences as byproducts. We develop a search method that leverages structure-level and ensemble-level information at different stages of the optimization: initialization, sampling, mutation, and updating. Our work, though less complicated than others, is the first algorithm able to design thousands of RNA sequences for the puzzles from the Eterna100 benchmark. In addition, our method solves the most Eterna100 puzzles among all the optimization-based methods in our study. The only baseline that solves more puzzles than ours depends on human rules. Surprisingly, our approach performs markedly better on long sequences when using structures adapted from the 16S ribosomal RNA database.

14:10-14:20
A systematic benchmark of machine learning methods for protein-RNA interaction prediction
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Julien Gagneur

  • Marc Horlacher, Computational Health Center, Helmholtz Center Munich, Germany
  • Giulia Cantini, Computational Health Center, Helmholtz Center Munich, Germany
  • Julian Hesse, Computational Health Center, Helmholtz Center Munich, Germany
  • Patrick Schinke, Computational Health Center, Helmholtz Center Munich, Germany
  • Nicolas Goedert, Computational Health Center, Helmholtz Center Munich, Germany
  • Shubhankar Londhe, Computational Health Center, Helmholtz Center Munich, Germany
  • Lambert Moyon, Computational Health Center, Helmholtz Center Munich, Germany
  • Annalisa Marsico, Computational Health Center, Helmholtz Center Munich, Germany


Presentation Overview:

RNA-binding proteins (RBPs) are central actors of RNA post-transcriptional regulation, for which profiling of binding sites on targets can be experimentally evaluated in vivo. As such profiles are limited to the transcripts expressed in the experimental cell type, numerous machine-learning based methods have been developed to infer missing binding information. However, heterogeneity of training and evaluation datasets across various sets of RBPs and CLIP-seq protocols prevents a direct comparison of their performance.

To address this, we systematically benchmarked 11 representative methods across hundreds of CLIP-seq datasets and RBPs. Using homogenized sample pre-processing and two negative-class sample generation strategies, we evaluated the predictive performance of these methods and assessed the contribution of neural network architectures and input modalities to model performance. We show that the negative sampling strategy significantly affects model performance, notably with a strong impact on the contribution of secondary structure. Additionally, we show various degrees of performance degradation in cross-cell-type prediction settings, with some models being more sensitive, reflecting RBP-specific context dependence.

We believe that this study will guide future methods development in the field of computational modeling of protein-RNA interaction by serving as a reference for method design with regard to architecture, input modalities and generation of negative controls.

14:20-14:40
miRarmature: a time series analysis framework for paired miRNA and RNA-seq data reveals new regulatory dynamics
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Julien Gagneur

  • Ranjan Kumar Maji, Goethe University and Uniklinikum Frankfurt, Germany
  • Eva-Maria Rogg, Institute of Cardiovascular Regeneration, Goethe University, Frankfurt, Germany
  • Ariane Fisher, Institute of Cardiovascular Regeneration, Goethe University, Frankfurt, Germany
  • Gilles Gasparoni, Epigenetics, Saarland University, Saarbrücken, Germany
  • Melanie Möller, Molecular Cell Biology & Microbiology, Wuppertal University, Wuppertal, Germany
  • Martin Simon, Molecular Cell Biology & Microbiology, Wuppertal University, Wuppertal, Germany
  • Stefanie Dimmeler, Institute of Cardiovascular Regeneration, Goethe University, Frankfurt, Germany
  • Marcel H. Schulz, Institute of Cardiovascular Regeneration, Goethe University, Frankfurt, Germany


Presentation Overview:

MicroRNAs (miRNAs) are the most versatile small RNA regulators in the cell. Cellular stress alters canonical miRNA processing by their biogenesis factors, which in turn causes the generation of their isomiR (modified miRNA) forms.
In our study, we perform paired small and ribo-depleted RNA-seq from induced AMI stress experiments, within 4 major cell types of the heart at 6 time points. We observe significant dynamics in expression patterns of mature miRNA 3p and 5p arms. In this context, we investigate associations of miRNA target gene regulation with the dynamics of mature miRNA arm ratios (3p / 5p) at every time point. Further, we assess the isomiR expression ratios, their biogenesis factors (Drosha, Dicer, etc.), and TDMD (Target RNA-Directed MicroRNA Degradation) factors (Dis3l2, Tents, etc.) as plausible associative causes for canonical arm ratio changes. Finally, we predict co-regulated functional pairs as an effect of changing miRNA arm ratios and changing median expression of the target genes within functional gene sets. We implement this novel analysis framework in miRarmature, an R package to enable systematic investigation of the interesting dynamics of these differentially processed miRNAs, their associated factors, and predicted functional roles with statistical inferences and visualizations from time series experiments.
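The arm ratio (3p / 5p) tracked at each time point can be illustrated with a small calculation. The sketch below is written in Python rather than R and does not use the miRarmature API. The pseudocount, the function name, and the read counts are all hypothetical values chosen for the example.

```python
import math


def log2_arm_ratio(count_3p: float, count_5p: float, pseudocount: float = 1.0) -> float:
    """log2 of the 3p / 5p arm ratio, with a pseudocount to tolerate zero counts."""
    return math.log2((count_3p + pseudocount) / (count_5p + pseudocount))


if __name__ == "__main__":
    # Hypothetical (3p, 5p) read counts for one miRNA across a time course.
    time_course = {"t0": (120, 480), "t1": (200, 400), "t2": (350, 350)}
    for time_point, (c3p, c5p) in time_course.items():
        print(time_point, round(log2_arm_ratio(c3p, c5p), 2))
```

A value moving from negative towards zero across the time course, as in this toy series, would indicate a shift from 5p dominance towards balanced arm usage.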

14:40-15:00
High-throughput analysis of microRNA-binding thermodynamics and kinetics by RNA Bind-n-Seq (RBNS)
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Julien Gagneur

  • Karina Jouravleva, RNA Therapeutics Institute, University of Massachusetts Medical School, United States
  • Joel Vega-Badillo, RNA Therapeutics Institute, University of Massachusetts Medical School, United States
  • Phillip Zamore, RNA Therapeutics Institute, University of Massachusetts Medical School, United States


Presentation Overview:

Predicting the regulatory consequences of molecular interactions based solely on their measured biochemical properties is a long-standing goal in biology. The strength of an interaction is quantitatively described by its equilibrium dissociation constant, the substrate concentration required for half-maximal complex formation. Interactions with the same affinity may arise from different kinetic behaviors that may vary by orders of magnitude: one set of interactions may be driven by rapid recognition and binding, while another may be driven by slower association but increased complex stability. Thus, the affinities and the dynamics of molecular interaction provide critical information for developing quantitative models of a regulatory network. RBNS determines the specificity of proteins for nucleic acids in vitro using a single-step binding assay and a high-throughput sequencing readout, making the method widely accessible and cost-effective. Here, we present an analytical strategy to estimate absolute binding affinities from RBNS data, extend RBNS to kinetic studies, and develop a framework to compute relative association and dissociation rate constants. Using these approaches, we have begun to define quantitative targeting rules for individual microRNAs (miRNAs) bound to mammalian Argonaute proteins. Our data may facilitate the development of siRNA, miRNA, and antagomir therapeutics with high potency and target specificity.
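The point that equal affinities can hide very different kinetics follows from K_D = k_off / k_on. The sketch below, assuming simple one-to-one binding with the free substrate in excess, builds two hypothetical interactions whose rate constants differ by a factor of 100 yet share the same K_D. The class name and all rate constants are illustrative assumptions, not measurements from the talk.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    k_on: float   # association rate constant, M^-1 s^-1
    k_off: float  # dissociation rate constant, s^-1

    @property
    def k_d(self) -> float:
        """Equilibrium dissociation constant in M."""
        return self.k_off / self.k_on

    def fraction_bound(self, conc: float) -> float:
        """Equilibrium fraction of complex formed at free substrate concentration `conc` (M)."""
        return conc / (conc + self.k_d)

    @property
    def complex_half_life(self) -> float:
        """Time in seconds for half of the pre-formed complexes to dissociate."""
        return math.log(2) / self.k_off


if __name__ == "__main__":
    fast = Interaction(k_on=1e8, k_off=1e-2)  # rapid binding, short-lived complex
    slow = Interaction(k_on=1e6, k_off=1e-4)  # slow binding, long-lived complex
    for name, x in (("fast", fast), ("slow", slow)):
        print(name, f"K_D = {x.k_d:.1e} M, half-life = {x.complex_half_life:.0f} s")
```

At a substrate concentration equal to K_D, `fraction_bound` returns 0.5 for both interactions, matching the half-maximal definition in the abstract, while their complex half-lives differ by two orders of magnitude.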

15:00-15:20
Molecular function of the non-coding RNAs snord116 involved in Prader Willi syndrome
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Julien Gagneur

  • Laeya Baldini, Université de Lorraine, CNRS, IMoPA, F-54000 Nancy, France
  • Hélène Marty-Capelle, Université de Lorraine, CNRS, IMoPA, F-54000 Nancy, France
  • Anne Robert, Université de Lorraine, CNRS, IMoPA, F-54000 Nancy, France
  • Bruno Charpentier, Université de Lorraine, CNRS, IMoPA, F-54000 Nancy, France
  • Stéphane Labialle, Université de Lorraine, CNRS, IMoPA, F-54000 Nancy, France


Presentation Overview:

C/D-box small nucleolar RNAs (SNORDs) classically direct the post-transcriptional methylation of nucleosides in ribosomal RNAs, small nuclear RNAs, and transfer RNAs. However, the human genome produces numerous orphan SNORDs that lack the ability to interact with classical RNA targets and whose function is poorly understood. The eutherian-specific, orphan SNORD116 genes, which are organized in a large tandem repeat at 15q11-13, are strongly suspected to play a major role in the rare disease called Prader Willi syndrome (PWS), but their molecular function remains unknown. We combined phylogenetic and computational interaction analyses to reveal that a subset of snord116 copies use an antisense element, which is typically involved in classical target recognition, to interact with messenger RNA (mRNA) targets. Target status was confirmed by transient knockdown and compensation experiments in human and mouse cells. To go further, we are working to characterize the molecular mechanism of snord116 action and to identify the extent of their target repertoire. This combination of computational and experimental approaches expands the description of the molecular bases of PWS and opens new avenues for therapy. More generally, this approach could be considered for the functional characterization of other noncoding RNAs, in particular when expressed from multiple gene copies.

15:20-15:30
Poster flash talks
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Yoseph Barash

  • Multiple Authors
16:00-16:40
Invited Presentation: Calling and predicting aberrant splicing
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Yoseph Barash

  • Julien Gagneur


Presentation Overview:

Genetic variants causing aberrant splicing are a major cause of genetic disorders. However, identifying which genetic variants cause aberrant splicing has remained a daunting task. Here, I will present a series of machine learning algorithms to reliably detect aberrant splicing in RNA-sequencing samples as well as to predict those events from genomic and, optionally, transcriptomic data of clinically accessible tissues. Collectively, these results substantially contribute to non-coding loss-of-function variant identification and to genetic diagnostics design and analytics.

16:40-17:00
CamoTSS: analysis of alternative transcription start sites for cellular phenotypes and regulatory patterns from 5' scRNA-seq data
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Yoseph Barash

  • Ruiyan Hou, The University of Hong Kong, Hong Kong
  • Chung-Chau Hon, RIKEN Center for Integrative Medical Sciences, Yokohama City, Kanagawa 230-0045, Japan
  • Yuanhua Huang, The University of Hong Kong, Hong Kong


Presentation Overview:

Five-prime single-cell RNA-seq (scRNA-seq) has been widely employed to profile cellular transcriptomes; however, its power for analysing transcription start sites (TSS) has not been fully utilised. Here, we present a computational method suite, CamoTSS, to precisely identify TSS and quantify their expression by leveraging the cDNA on read 1, which enables effective detection of alternative TSS usage. With various experimental data sets, we have demonstrated that CamoTSS can accurately identify TSS and that the detected alternative TSS usage shows strong specificity in different biological processes, including cell types across human organs, the development of human thymus, and cancer conditions. As evidenced in nasopharyngeal cancer, alternative TSS usage can also reveal regulatory patterns including systematic TSS dysregulations.

17:00-17:10
TSS-Captur - A Transcription Starting Site-based Characterization Pipeline for Transcribed but Unclassified Prokaryotic RNA transcripts
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Yoseph Barash

  • Mathias Witte Paz, Institute for Bioinformatics and Medical Informatics - University of Tübingen, Germany
  • Kay Nieselt, Institute for Bioinformatics and Medical Informatics - University of Tübingen, Germany


Presentation Overview:

RNA-seq and its enrichment-based variants, such as differential RNA-seq, have enabled the base-exact identification of transcription starting sites (TSSs) and have improved gene expression analysis. However, some TSSs cannot be associated with known annotated genes and are thus called orphan TSSs. Characterizing transcripts starting at these positions is challenging for existing computational annotation pipelines. TSS-Captur, a novel pipeline, uses different computational approaches to characterize transcripts starting from experimentally confirmed orphan TSSs, with a specific focus on non-coding RNA gene characterization. TSS-Captur uses two methods to classify extracted transcripts as coding or non-coding genes and predicts the transcription termination site of each putative transcript. For each predicted ncRNA gene, the secondary structure is computed. Furthermore, putative promoter regions are analyzed for the existence of known transcription regulation motifs. The results are presented in an interactive interface for easy exploration. TSS-Captur was tested on Streptomyces coelicolor data and successfully characterized unlabeled ncRNAs overlooked by common genome annotation pipelines. TSS-Captur also characterized more unannotated transcripts, and in greater detail, than another similar pipeline. In summary, starting from experimental TSS data, TSS-Captur characterizes unclassified signals and complements prokaryotic annotation tools, contributing to the understanding of bacterial transcriptomes.

17:10-17:30
Poster flash talks
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Yoseph Barash

  • Multiple Authors
17:30-18:00
Extra time for poster viewing
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Yoseph Barash

Thursday, July 27th
8:30-8:50
Machine learning methods for decoding subcellular RNA organization from spatial transcriptomics data
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Maayan Salton

  • Clarence Mah, University of California San Diego, United States
  • Noorsher Ahmed, University of California San Diego, United States
  • Gene Yeo, University of California San Diego, United States
  • Hannah Carter, University of California San Diego, United States
  • Nicole Lopez, University of California San Diego, United States


Presentation Overview:

The spatial organization of molecules in a cell is essential for performing their functions. Spatial transcriptomics technologies have opened the door to characterization of cellular and subcellular organization. While current computational methods focus on discerning tissue architecture, cell-cell interactions and spatial expression patterns, these approaches are limited to investigating spatial variation at the multicellular scale. We present Bento, a Python toolkit that fully takes advantage of single-molecule information to enable spatial analysis at the subcellular scale. Bento ingests molecular coordinates and segmentation boundaries to perform three fundamental analyses: defining subcellular domains, annotating localization patterns, and quantifying gene-gene colocalization. To demonstrate the toolkit, we apply these methods to a variety of datasets including U2-OS cells (MERFISH), 3T3 cells (seqFISH+), and treated cardiomyocytes (Molecular Cartography). We quantify RNA localization changes in cardiomyocytes identifying mRNA depletion of critical cardiac disease-associated genes RBM20 and CACNB2 from the endoplasmic reticulum upon doxorubicin treatment. The Bento package is a member of the open-source Scverse ecosystem, enabling integration with other single-cell omics analysis tools.

8:50-9:10
CRISPRon-ABE: Enhanced CRISPR adenine base editing design from data generation and deep learning
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Maayan Salton

  • Ying Sun, University of Copenhagen, Denmark
  • Kunli Qu, Lars Bolund Institute of Regenerative Medicine / Aarhus University, Denmark
  • Giulia Corsi, University of Copenhagen, Denmark
  • Christian Anthon, University of Copenhagen, Denmark
  • Xiaoguang Pan, Lars Bolund Institute of Regenerative Medicine / Copenhagen University, Denmark
  • Yonglun Luo, Lars Bolund Institute of Regenerative Medicine / Aarhus University, Denmark
  • Jan Gorodkin, University of Copenhagen, Denmark


Presentation Overview:

CRISPR base editing holds the promise to overcome the DNA repair challenges observed in “traditional” CRISPR-based gene editing. In base editing, for example with the Adenine Base Editor (ABE), all As in an editing window of ~8 nt inside the guide RNA (gRNA) can be changed to Gs with different frequencies. The design task is then in part to predict the frequency with which these A-to-G outcomes appear, and in part to predict the gRNA efficiency itself. We generated new data which roughly doubled the ~5,000 efficient gRNAs in the public domain and used these ~10,000 gRNAs to construct a deep learning-based ABE gRNA predictor, CRISPRon-ABE. We find simultaneous evaluation of gRNA efficiency and the outcome frequency to be the most meaningful. We use the RK (K=2) correlation coefficient for that. We tested schemes from data fusing, transfer learning and dual training on both data sets and evaluated the performances on test/held-out sets from both sources. Overall, the performances benefit from the data and model fusing: for example, on the existing test set we obtain R2=0.82, clearly exceeding the R2=0.72 of the current best method, with both evaluated on the same independent test set.

9:10-9:30
Kinetic barcoding: A novel tool to estimate multi-temporal RNA biogenesis kinetics
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Maayan Salton

  • Ezequiel Calvo-Roitberg, UMass Chan Medical School, United States
  • Adam Hedger, UMass Chan Medical School, United States
  • Jonathan K Watts, UMass Chan Medical School, United States
  • Athma A Pai, UMass Chan Medical School, United States


Presentation Overview:

mRNA production speed is determined by the time it takes to transcribe and process pre-mRNA molecules. Methods for measuring RNA maturation involve sequencing RNA intermediates at different time points. However, current techniques have limited temporal resolution, which makes it difficult to measure very fast biogenesis rates. Additionally, these methods do not allow measurement of the variability in elongation rates within the same gene. To address these issues, we developed "kinetic barcoding", which involves stepwise labeling of nascent RNA with multiple nucleosides to measure multiple time points within a single sequencing library. By sequentially adding 5-ethynyl-uridine (5eU), 6-thio-guanosine (6sG), and 4-thio-uridine (4sU) at different time points, we can measure nascent RNA intermediates at multiple time scales in a single experiment. We isolate nascent RNA by biotinylating and pulling down 5eU-labeled RNA, followed by alkylation of the 6sG and 4sU thiol groups to generate nucleotide-specific substitutions where these nucleotides were incorporated. This allows us to distinguish molecules transcribed during the first, second, and final labeling windows by their substitution patterns. We applied this kinetic barcoding approach to measure transcription elongation rates in human cells and showed that it provides increased temporal resolution for measuring the variation in elongation rates between genes.

10:00-10:20
grandR: a comprehensive package for nucleotide conversion RNA-seq data analysis
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Athma Pai

  • Teresa Rummel, Institute of Virology and Immunobiology, Julius-Maximilians-Universität, Würzburg, Germany
  • Lygeri Sakellaridi, Institute of Virology and Immunobiology, Julius-Maximilians-Universität, Würzburg, Germany
  • Florian Erhard, Computational Immunology, Universität Regensburg, Regensburg, Germany


Presentation Overview:

Metabolic RNA labeling is a powerful method to investigate the temporal dynamics of gene expression. The introduction of nucleotide conversion RNA-seq, such as SLAM-seq, has greatly facilitated the experimental effort but has also brought new challenges for data analysis. Another layer of complexity is added when elaborate experimental designs such as time courses with multiple genotypes and treatments are required to answer complex research questions. Yet, appropriate computational tools for analyzing this kind of data are lacking. To address this need, we developed grandR, a comprehensive toolkit for the analysis of nucleotide conversion RNA-seq data.
With our software, we also introduce new quality control measures to exclude effects of 4sU on transcription and describe the need for recalibration of effective labeling times that would otherwise bias results. grandR enables researchers to perform differential gene expression analysis and estimate synthesis and degradation rates for both progressive labeling and snapshot experiments. Additionally, our software provides a web-based interface for exploratory data analysis.
grandR represents a significant advance in the analysis of nucleotide conversion RNA-seq data, enabling researchers to gain deeper insights into the temporal dynamics of gene expression and accelerating progress in many areas of biomedical research.
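
As a rough illustration of the kind of rate estimate described above, the sketch below assumes steady-state expression and first-order decay, so that the new-to-total RNA ratio (NTR) after a labeling time t is 1 - exp(-d * t). This is a generic textbook model and not grandR's actual implementation (grandR is an R package); all function names here are hypothetical.

```python
import math


def degradation_rate(ntr, labeling_time_h):
    """Estimate a degradation rate (per hour) from a new-to-total ratio.

    Assumption: steady state and first-order decay, NTR = 1 - exp(-d * t).
    """
    if not 0.0 <= ntr < 1.0:
        raise ValueError("NTR must be in [0, 1)")
    if labeling_time_h <= 0:
        raise ValueError("labeling time must be positive")
    return -math.log(1.0 - ntr) / labeling_time_h


def half_life(rate):
    """RNA half-life (hours) implied by a first-order degradation rate."""
    return math.log(2.0) / rate


def synthesis_rate(total_expression, rate):
    """At steady state, synthesis balances degradation: s = total * d."""
    return total_expression * rate


if __name__ == "__main__":
    d = degradation_rate(ntr=0.5, labeling_time_h=2.0)
    print(f"rate={d:.4f}/h, half-life={half_life(d):.2f} h")
```

Under this model, a gene whose RNA is half new after 2 hours of labeling has a half-life of 2 hours. The estimate is only as good as the effective labeling time, which is why the recalibration mentioned in the abstract matters.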

10:20-11:00
Invited Presentation: Decoding the Masters of Gene Expression: Unraveling the Influence of Promoters and Enhancers on Alternative Splicing
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Athma Pai

  • Aveksha Sharma, Hebrew University, Israel
  • Sara Dahan, Hebrew University, Israel
  • Maayan Salton, Hebrew University, Israel


Presentation Overview:

Traditionally perceived as regulators of gene expression, promoters and enhancers are now recognized to possess a more intricate role. This talk focuses on their emerging significance in alternative splicing, a key regulatory step in gene expression. Through our research, we challenge the conventional understanding and explore the transformative impact of promoters and enhancers on alternative splicing outcomes. By investigating chromatin structure and manipulating it at these regulatory elements, we uncover their dynamic influence on RNA Pol II elongation rates, resulting in consequential alterations in alternative splicing patterns.

11:00-11:20
Discovery of new immunotherapy targets in cancer from transcriptomic data
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Athma Pai

  • Mathieu Quesnel-Vallières, University of Pennsylvania, United States
  • Caleb Radens, University of Pennsylvania, United States
  • Jacinta Davis, Children's Hospital of Philadelphia, United States
  • Katharina Hayer, Children's Hospital of Philadelphia, United States
  • Kristen Lynch, University of Pennsylvania, United States
  • Andrei Thomas-Tikhonenko, Children's Hospital of Philadelphia, United States
  • Yoseph Barash, University of Pennsylvania, United States


Presentation Overview:

The use of engineered T cells or antibodies to specifically target tumors is a recent form of immunotherapy that has revolutionized the treatment of certain cancer types. Unfortunately, these new technologies cannot be applied to most cancers because there is a lack of cancer-specific molecules to target. Splicing variations abound in cancer and happen in a manner that distinguishes tumors from normal cells. In order to discover new immunotherapy targets, we developed a computational pipeline that relies on large cohorts of short-read RNA sequencing (RNA-Seq) to detect molecular patterns that are enriched or exclusively found in cancer and not in normal tissues. The pipeline surveys and quantifies splicing variations and gene expression in individual cancer samples and compares them to an array of over 7,000 samples covering 86 normal tissues and blood cell types. Detection is then followed by targeted long-read sequencing for full isoform validation. We employed the pipeline to identify between 53 and 133 cancer-specific splice junctions in four types of pediatric cancers. Our method can be applied to any cancer dataset for which RNA-Seq data is available and, within a few days, generates a list of candidate immunotherapy targets that can then be validated in the lab.
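
The core filtering idea, keeping junctions that are well supported in a tumor sample and absent from a normal reference panel, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name, the junction representation as (chromosome, start, end, strand) tuples, and both read-count thresholds are assumptions made for the example.

```python
def cancer_specific_junctions(tumor_counts, normal_panel,
                              min_tumor_reads=10, max_normal_reads=0):
    """Return junctions supported in the tumor and absent from all normals.

    tumor_counts: dict mapping junction -> read count in one tumor sample.
    normal_panel: list of dicts (one per normal sample), same key format.
    Thresholds are illustrative assumptions, not values from the abstract.
    """
    hits = []
    for junction, reads in tumor_counts.items():
        if reads < min_tumor_reads:
            continue  # too weakly supported in the tumor
        # Highest read count observed for this junction in any normal sample.
        max_normal = max(
            (sample.get(junction, 0) for sample in normal_panel), default=0
        )
        if max_normal <= max_normal_reads:
            hits.append(junction)
    return sorted(hits)


if __name__ == "__main__":
    tumor = {
        ("chr1", 100, 200, "+"): 25,
        ("chr2", 50, 90, "-"): 40,
        ("chr3", 5, 9, "+"): 3,
    }
    normals = [{("chr2", 50, 90, "-"): 12}, {}]
    print(cancer_specific_junctions(tumor, normals))
```

In practice, a filter like this would be followed by the long-read isoform validation described in the abstract, since a junction alone does not define a full targetable isoform.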

11:20-11:40
Nutritional Control of Splicing Fidelity Contributes to Methionine Dependence of Cancer
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Athma Pai

  • Francisco G Carranza, UC Irvine, United States
  • Da-Wei Lin, UC Irvine, United States
  • Stacey Borrego, UC Irvine, United States
  • Linda Lauinger, UC Irvine, United States
  • Peter Kaiser, UC Irvine, United States
  • Klemens Hertel, University of California, Irvine, United States


Presentation Overview:

Many cancer cells depend on exogenous methionine for proliferation, whereas non-tumorigenic cells can divide in media supplemented with the metabolic precursor homocysteine. This phenomenon is known as methionine dependence of cancer or the Hoffman effect. The underlying mechanisms for this cancer-specific metabolic addiction are unknown. Here we find that methionine dependence is associated with severe dysregulation of pre-mRNA splicing. When cultured in homocysteine medium, cancer cells failed to efficiently methylate the spliceosomal snRNP component SmD1, which resulted in reduced binding to the Survival-of-Motor-Neuron protein SMN, leading to aberrant splicing. These effects were specific for cancer cells, as neither Sm protein methylation nor splicing fidelity was affected when non-tumorigenic cells were cultured in homocysteine medium. Sm protein methylation is catalyzed by Protein Arginine Methyl Transferase 5 (Prmt5), and reducing methionine concentrations in the culture medium sensitized cancer cells to Prmt5 inhibition. These results mechanistically connect splicing fidelity to nutrient availability in cancer cells.

11:40-12:00
Analyzing human population differences in alternative splicing at single-cell resolution
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Athma Pai

  • Rubén Chazarra-Gil, Barcelona Supercomputing Center, Spain
  • Marta Melé-Messeguer, Barcelona Supercomputing Center, Spain
  • Martin Hemberg, Harvard Medical School, United States


Presentation Overview:

Alternative splicing (AS) is a crucial mechanism for gene expression regulation and a major generator of proteome diversity. Alternative splicing has been mostly studied in bulk RNA-seq data, thus masking cellular heterogeneity. Single-cell RNA-seq can reveal cell-type- and trajectory-specific AS patterns by leveraging large cell numbers and fine-grained cell type identification. However, commonly used droplet-based technologies for single-cell RNA-seq pose challenges due to high data sparsity and positional bias in sequencing reads across mRNA transcripts.
Here, we conduct a benchmark study on transcript quantification and differential splicing analysis using simulated 3’ scRNA-seq data. We evaluate several methods designed for AS analysis of both single-cell and bulk RNA-seq data. Our results show that, in general, most methods have high precision but limited recall in transcript quantification due to undetected lowly expressed features in single cells, which can be improved through pseudo-bulking.
We perform differential transcript usage and percent spliced-in analyses in a PBMC dataset from individuals of different genetic backgrounds. We detect population-specific splicing differences in the context of influenza virus infection across several cell types. Overall, our study highlights the strengths and limitations of alternative splicing characterization in 3’ scRNA-seq data while providing insights into splicing differences between populations.
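
To make the pseudo-bulking idea concrete, the sketch below pools sparse per-cell read counts by cell type before computing percent spliced-in (PSI) as inclusion / (inclusion + exclusion). It is a minimal illustration under stated assumptions: the input layout, the function name, and the minimum-read threshold are hypothetical and do not come from any of the benchmarked tools.

```python
from collections import defaultdict


def pseudobulk_psi(cells, min_reads=10):
    """Compute PSI per cell type after pooling reads across cells.

    cells: iterable of (cell_type, inclusion_reads, exclusion_reads),
           one entry per cell for a single splicing event.
    Returns a dict cell_type -> PSI, or None when the pooled read count is
    below min_reads (an illustrative threshold, assumed for this sketch).
    """
    inclusion = defaultdict(int)
    exclusion = defaultdict(int)
    for cell_type, inc_reads, exc_reads in cells:
        inclusion[cell_type] += inc_reads
        exclusion[cell_type] += exc_reads

    psi = {}
    for cell_type in inclusion:
        total = inclusion[cell_type] + exclusion[cell_type]
        psi[cell_type] = inclusion[cell_type] / total if total >= min_reads else None
    return psi


if __name__ == "__main__":
    cells = [("T cell", 3, 1), ("T cell", 5, 1), ("B cell", 1, 0)]
    print(pseudobulk_psi(cells))
```

Individual cells here carry too few reads for a stable PSI estimate, but the pooled T cell counts (8 inclusion, 2 exclusion) pass the threshold, while the single B cell does not.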

13:20-13:30
Detecting aberrant splicing events in isolated patient samples using short read RNA-seq with SAMI
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Hagen Tilgner

  • Sylvain Mareschal, NGS-HCL platform, Bioinformatics group, Hospices Civils de Lyon, Bron, France
  • Valentin Wucher, NGS-HCL platform, Bioinformatics group, Hospices Civils de Lyon, Bron, France
  • Sarah Huet, Hematology laboratory, Hospices Civils de Lyon, Pierre-Bénite, France
  • Camille Léonce, Anatomical pathology department, Hospices Civils de Lyon, Bron, France
  • Kaddour Chabane, Hematology laboratory, Hospices Civils de Lyon, Pierre-Bénite, France
  • Sandrine Hayette, Hematology laboratory, Hospices Civils de Lyon, Pierre-Bénite, France
  • Pierre-Paul Bringuier, Anatomical pathology department, Hospices Civils de Lyon, Bron, France
  • Stéphane Pinson, Genetic department, Hospices Civils de Lyon, Bron, France
  • Marc Barritault, Anatomical pathology department, Hospices Civils de Lyon, Bron, France
  • Claire Bardel, NGS-HCL platform, Bioinformatics group, Hospices Civils de Lyon, Bron, France


Presentation Overview:

Although RNA-sequencing solutions have been around for 15 years and have successfully replaced microarrays for gene expression, their potential to analyze splicing has yet to be realized in clinical settings. Powerful tools were developed to quantify known isoforms (RSEM, Kallisto), compare sample groups (rMATS) or compare one sample to a population (MAJIQ), but most fail to address the most common situation in routine diagnostics: non-recurring events in scarce samples. To this end, we developed a UMI-aware Nextflow pipeline named SAMI (Splicing Analysis with Molecular Indexes), including original scripts to detect and plot unannotated splicing junctions.

In a commercial control sequenced in various conditions with a 17.6 kb lung-cancer panel on MiSeq, 13/13 MET exon 14 and 12/13 EGFRvIII exon 2-7 skips were found supported by at least 3 distinct RNA molecules. In a second 1.12 Mbp multi-cancer panel sequenced on NextSeq, aberrant splicing events matching DNA mutations predicted to impact splicing were found in 5 cases, further confirming good sensitivity toward expected events. In a third 2.44 Mbp onco-hematological panel, the preferential inclusion of the EZH2 “poison exon” in SRSF2-mutated myelodysplastic syndromes could be confirmed, illustrating the pipeline's suitability for cohort comparison as well. Finally, application to full-blown whole-transcriptome data (120 million reads) validated the pipeline's scalability.
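
The "supported by at least 3 distinct RNA molecules" criterion above can be sketched as counting unique UMIs per junction rather than raw reads. This is a simplified illustration and not SAMI's code: the function name and input layout are hypothetical, and UMIs are compared by exact match, whereas a production pipeline would typically also tolerate sequencing errors within UMIs.

```python
from collections import defaultdict


def umi_supported_junctions(reads, min_molecules=3):
    """Count distinct UMIs per splice junction and keep supported junctions.

    reads: iterable of (junction_id, umi) pairs, one per junction-spanning read.
    PCR duplicates share a UMI, so they count as a single RNA molecule.
    Returns a dict junction_id -> number of distinct molecules.
    """
    umis_per_junction = defaultdict(set)
    for junction_id, umi in reads:
        umis_per_junction[junction_id].add(umi)
    return {
        junction_id: len(umis)
        for junction_id, umis in umis_per_junction.items()
        if len(umis) >= min_molecules
    }


if __name__ == "__main__":
    reads = [
        ("MET_e13-e15", "AACG"), ("MET_e13-e15", "AACG"),  # PCR duplicates
        ("MET_e13-e15", "TTGC"), ("MET_e13-e15", "GGAT"),
        ("other_junction", "CCCC"), ("other_junction", "CCCC"),
    ]
    print(umi_supported_junctions(reads))
```

Counting molecules instead of reads is what makes a low threshold such as 3 meaningful in small targeted panels, where a single amplified fragment could otherwise produce many supporting reads.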

13:30-13:40
A Unified MAJIQ-L View of Transcriptome Complexity from Short and Long RNA-seq Reads
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Hagen Tilgner

  • Seong Woo Han, University of Pennsylvania, United States
  • San Jewell, University of Pennsylvania, United States
  • Yoseph Barash, University of Pennsylvania, United States


Presentation Overview:

High-throughput short-read RNA sequencing has given researchers unprecedented detection and quantification capabilities of splicing variations across biological conditions and disease states. However, short-read technology is limited in its ability to identify which isoforms are responsible for the observed sequence fragments and how splicing variations across a gene are related. In contrast, more recent long-read sequencing technology offers improved detection of underlying full or partial isoforms but is limited by high error rates and lower throughput, hindering its ability to accurately detect and quantify all splicing variations in a given condition.

To better understand the underlying isoforms and splicing changes in a given biological condition, it’s important to be able to combine the results of both short and long-read sequencing, together with the annotation of known isoforms. To address this need, we developed MAJIQ-L, a tool to visualize and quantify splicing variations from multiple data sources. MAJIQ-L combines transcriptome annotation, the output of long-read-based isoform detection tools, and MAJIQ-based (Vaquero-Garcia et al., 2016, 2023) short-read RNA-Seq analysis of local splicing variations (LSVs). We analyze which splice junction is supported by which type of evidence (known isoforms, short reads, long reads), followed by the analysis of matched short and long-read human cell line datasets.
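
The question of which junction is supported by which type of evidence can be illustrated with simple set operations. The sketch below is not MAJIQ-L's implementation: the function names, source labels, and junction identifiers are hypothetical, and it assumes junctions from all three sources have already been normalized to comparable identifiers.

```python
from collections import Counter


def junction_evidence(annotation, short_reads, long_reads):
    """Map each junction to the set of sources that support it.

    Each argument is an iterable of junction identifiers from one source.
    """
    sources = {
        "annotation": set(annotation),
        "short": set(short_reads),
        "long": set(long_reads),
    }
    all_junctions = set().union(*sources.values())
    return {
        junction: frozenset(
            name for name, junctions in sources.items() if junction in junctions
        )
        for junction in all_junctions
    }


def evidence_summary(evidence):
    """Count how many junctions fall into each combination of sources."""
    return Counter(evidence.values())


if __name__ == "__main__":
    evidence = junction_evidence(
        annotation={"j1", "j2"}, short_reads={"j2", "j3"}, long_reads={"j2", "j4"}
    )
    for combo, count in evidence_summary(evidence).items():
        print(sorted(combo), count)
```

Junctions seen only in short reads or only in long reads are the interesting cases, since they point to the respective limitations of each technology described in the abstract.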

13:40-14:00
Evidence for the role of transcription factors in the co-transcriptional regulation of intron retention
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Hagen Tilgner

  • Fahad Ullah, Colorado State University, United States
  • Saira Jabeen, Department of Computer Science, Colorado State University, United States
  • Maayan Salton, Hebrew University of Jerusalem, Israel
  • Anireddy SN Reddy, Department of Biology, Colorado State University, United States
  • Asa Ben-Hur, Department of Computer Science, Colorado State University, United States


Presentation Overview:

The recent discovery that the majority of splicing is co-transcriptional has led to the finding that chromatin state affects alternative splicing. Therefore, it is plausible that transcription factors can regulate splicing outcomes. In this talk, we review our recently published results that provide strong evidence for the hypothesis that transcription factors are involved in the regulation of intron retention (IR) by studying regions of open chromatin. Using deep learning models designed to distinguish between regions of open chromatin in retained introns and non-retained introns, we identified motifs enriched in IR events. Most of the motifs learned by the network had significant hits to known human transcription factors, more so than to RNA-binding proteins. Our model predicts that the majority of transcription factors that affect intron retention come from the zinc finger family. We demonstrate the validity of these predictions using ChIP-seq data for multiple zinc finger transcription factors and find strong over-representation of their peaks in intron retention events. This work opens up opportunities for further studies that elucidate the mechanisms by which transcription factors affect intron retention and other forms of splicing.

14:00-14:20
Pre-mRNA splicing order across long multi-intronic transcripts
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Hagen Tilgner

  • Karine Choquet, Department of Genetics, Blavatnik Institute, Harvard Medical School, United States
  • Autum R. Baxter-Koenigs, Department of Genetics, Blavatnik Institute, Harvard Medical School, United States
  • Brendan M. Smalec, Department of Genetics, Blavatnik Institute, Harvard Medical School, United States
  • L. Stirling Churchman, Department of Genetics, Blavatnik Institute, Harvard Medical School, United States


Presentation Overview:

Pre-mRNA splicing requires the excision of multiple introns within the same nascent transcript. Combinatorially, the order of intron excision could proceed down thousands of different paths, each of which would expose different landscapes of cis-elements that contribute to alternative splicing (AS). How intron splicing proceeds across human pre-mRNAs is not well understood due to technical limitations in the quantitative analysis of long RNA molecules. Here, we investigated post-transcriptional splicing order in human cells using direct RNA nanopore sequencing. We found that multi-intron splicing order is not stochastic, but largely pre-determined, with most genes using only a few predominant splicing orders to reach a fully spliced transcript. Strikingly, splicing orders were conserved across cell types and during motor neuron differentiation, indicating that for the studied introns, splicing order is fixed. Moreover, splicing orders did not change to accommodate AS; rather, introns flanking alternatively spliced exons during differentiation were largely excised last, after their neighboring introns. Interestingly, sequencing of human lymphoblast cell lines from different individuals revealed several examples of allele-specific splicing order, suggesting that genetic sequence contributes to splicing order determination. Together, our results demonstrate that multi-intron splicing order is predetermined in human cells and is partially regulated by RNA sequence.
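
The combinatorial point above is easy to check numerically. Assuming introns are excised one at a time, a transcript with n introns has n! possible splicing orders, so "thousands of paths" is reached with as few as 7 introns. The helper names in this short sketch are hypothetical.

```python
import math
from itertools import permutations


def count_splicing_orders(n_introns):
    """Number of possible excision orders, assuming one intron at a time."""
    return math.factorial(n_introns)


def enumerate_splicing_orders(introns):
    """List every possible excision order for a small set of introns."""
    return list(permutations(introns))


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(n, "introns:", count_splicing_orders(n), "possible orders")
    print(enumerate_splicing_orders(["i1", "i2", "i3"]))
```

Against this background of 5,040 possible orders for 7 introns, the finding that most genes use only a few predominant orders is a strong constraint.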

14:20-14:40
Deregulation of Epigenetic Marks is Associated with Differential Exon Usage of Developmental Genes
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Hagen Tilgner

  • Hoang Thu Trang Do, Universität des Saarlandes, Germany
  • Siba Shanak, Arab American University, Palestine
  • Ahmad Barghash, German Jordanian University, Jordan
  • Volkhard Helms, Universität des Saarlandes, Germany


Presentation Overview:

Alternative splicing generates a vast variety of splice isoforms which, at the protein level, often give rise to distinctively different functions in cells. Remarkably, splicing decisions have occasionally been associated with proximal epigenetic marks on the DNA. In this study, we investigated the landscape of alternative splicing and histone marks at exon boundary regions on a genome-wide level, while excluding the effects of histone modifications on transcription. We considered data for 11 different human adult tissues and for 8 cultured cells available from the Human Epigenome Atlas. The tool DEXSeq was used to identify differential exon usage and MAnorm to characterize deregulated histone marks. We aimed to identify so-called "epispliced" genes where exon usage and histone marks at the exon flanks show concerted differential changes between two specific tissue/cell types. On a global level, we found a statistically significant association between these two features, and this association is enriched in multiple subgroups of developmental processes. "Epispliced" genes were particularly detected in cell lines associated with early embryonic development. Functional enrichment analysis showed that "epispliced" genes are often annotated with developmental or metabolic processes. We suggest that connecting alternative splicing with epigenetic marks may be one general means of establishing cell fate.

14:40-15:00
Transcriptome sequencing suggests that pre-mRNA splicing counteracts widespread intronic cleavage and polyadenylation
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Hagen Tilgner

  • Maria Vlasenok, Skolkovo Institute of Science and Technology, Russia
  • Dmitri Pervouchine, Skolkovo Institute of Science and Technology, Russia


Presentation Overview:

Alternative splicing (AS) and alternative polyadenylation (APA) are two crucial steps in the post-transcriptional regulation of eukaryotic gene expression. Protocols capturing and sequencing RNA 3'-ends have uncovered widespread intronic polyadenylation (IPA) in normal and disease conditions, where it is currently attributed to stochastic variations in pre-mRNA processing. Here, we took advantage of the massive amount of RNA-seq data generated by the Genotype-Tissue Expression (GTEx) project to simultaneously identify and match tissue-specific expression of intronic polyadenylation sites with tissue-specific splicing. A combination of computational methods, including the analysis of short reads with non-templated adenines, revealed that APA events are more abundant in introns than in exons. While the rate of IPA in composite terminal exons and skipped terminal exons expectedly correlates with splicing, we observed a considerable fraction of IPA events that lack AS support and attributed them to spliced polyadenylated introns (SPIs). We hypothesize that SPIs represent transient byproducts of a dynamic coupling between APA and AS, in which the spliceosome removes the intron while it is being cleaved and polyadenylated. These findings indicate that cotranscriptional pre-mRNA splicing could serve as a rescue mechanism to suppress premature transcription termination at intronic polyadenylation sites.

15:30-15:50
Proceedings Presentation: isONform: reference-free transcriptome reconstruction from Oxford Nanopore data
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Klemens Hertel

  • Alexander Jürgen Petri, Department of Mathematics, Stockholm University, Sweden
  • Kristoffer Sahlin, Department of Mathematics, Stockholm University, Sweden


Presentation Overview:

With advances in long-read transcriptome sequencing, we can now fully sequence transcripts, which greatly improves our ability to study transcription processes. A popular long-read transcriptome sequencing technique is Oxford Nanopore Technologies (ONT) sequencing, which, through its cost-effectiveness and high throughput, has the potential to characterize the transcriptome in a cell. However, due to transcript variability and sequencing errors, long cDNA reads need substantial bioinformatic processing to produce a set of isoform predictions from the reads. Several genome and annotation-based methods exist to produce transcript predictions. However, such methods require high-quality genomes and annotations and are limited by the accuracy of long-read splice aligners. In addition, gene families with high heterogeneity may not be well represented by a reference genome and would benefit from reference-free analysis. Reference-free methods to predict transcripts from ONT data, such as RATTLE, exist, but their sensitivity is not comparable to reference-based approaches.

We present isONform, a high-sensitivity algorithm to construct isoforms from ONT cDNA sequencing data. The algorithm is based on iterative bubble popping on gene graphs built from fuzzy seeds from the reads. Using simulated, synthetic, and biological ONT cDNA data, we show that isONform has substantially higher sensitivity than RATTLE albeit with some loss in precision. On biological data, we show that isONform's predictions have much higher consistency with the annotation-based method StringTie2 compared to RATTLE. We believe isONform can be used both for isoform construction for organisms without well-annotated genomes and as an orthogonal method to verify predictions of reference-based methods.

15:50-16:20
Invited Presentation: Technologies for RNA isoform investigations across mouse brain development and brain regions as well as human brain structures
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Klemens Hertel

  • Hagen Tilgner


Presentation Overview:

To untangle cell-type isoform profiles in complex tissue, we developed ScISOr-Seq[1] (fresh tissues), SnISOr-Seq[2] (frozen tissue) and Sl-ISO-Seq[3] (spatial resolution). These long-read technologies are now aided by a single-molecule error view in PacBio/ONT[4] and an accurate long-read pipeline[5]. Collectively, we found differences between coordinated exon pairs with in-between exons (“distant”) and without in-between exons (“adjacent”): The former often distinguish cell types, the latter rarely – in mouse[1] and human brain[2]. Moreover, coordinated TSS-exon and exon-polyA-site pairs behave similarly to distant exon pairs[2]. Autism-associated (agreeing with prior research) and, albeit weaker, ALS-associated exons are cell-type specific[2]. Isoform regulation between bulk hippocampus and PFC usually arises from a precisely definable cell type (e.g., excitatory neurons) and region-specific isoforms correlate with brain-structure boundaries, while Snap25 regulates its isoforms in a gradient throughout the developing brain[3].
In unpublished work[6], we dissected the mouse brain across (i) development (P14, …, P56), (ii) brain regions (hippocampus, thalamus, visCTX, cerebellum, striatum) and (iii) cell types. For 75% of genes, full-length isoforms are regulated along >=1 of these three axes and additionally splicing varies strongly between cell types. However, genes related to neurotransmitter release/reuptake and synapse-turnover distinguish a fixed cell type by isoforms across anatomical regions. Developmental isoform regulation is stronger than across adult anatomical regions. As most cell-type specific exons in P56 mouse hippocampus behave similarly in new human hippocampal data, these principles may be extrapolated to human brain. However, human brains have evolved additional cell-type splicing specificity, suggesting gain-of-function isoforms. Taken together, we now possess a broad isoform-regulation view across brain development and anatomical regions.

1. Gupta*, Collier* et al., Nature Biotechnology, 2018
2. Hardwick*, Hu*, Joglekar* et al., Nature Biotechnology, 2022
3. Joglekar et al., Nature Communications, 2021
4. Mikheenko*, Prjibelski* et al., Genome Research, 2022
5. Prjibelski*, Mikheenko* et al., Nature Biotechnology, 2023
6. Joglekar et al., bioRxiv, 2023

16:20-16:30
Poster prize and closing remarks
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Klemens Hertel

  • Klemens Hertel