Schedule subject to change
All times listed are in EDT
Saturday, July 13th
10:40-10:50
Introduction to iRNA track
Room: 519
Format: In person

Moderator(s): Athma Pai


Authors List: Show

  • Athma Pai
10:50-11:30
Invited Presentation: SPLASH is a reference-free statistical algorithm, unifying biological discovery in RNA-seq, single cell sequencing and beyond
Confirmed Presenter: Julia Salzman, Stanford University, United States

Room: 519
Format: In person

Moderator(s): Athma Pai


Authors List: Show

  • Julia Salzman, Stanford University, United States

Presentation Overview: Show

Myriad mechanisms diversify the sequence content of RNA transcripts and are of great interest to single cell biology. Currently, these events are detected using tools that first require alignment to a necessarily incomplete reference genome; this incompleteness is especially prominent in diseases such as cancer. Second, the next step in analysis requires a custom choice of bioinformatic procedure: for example, to detect splicing, RNA editing or V(D)J recombination, among others. I will present collaborative work based on a new statistics-first analytic method, SPLASH (Statistically Primary aLignment Agnostic Sequence Homing), that performs unified, reference-free inference directly on raw sequencing reads without a reference genome or cell metadata. SPLASH is highly efficient and simple to run. As a snapshot of SPLASH, applying it to 10,326 primary human single cells in 19 tissues profiled with SmartSeq2, we discover a set of splicing and histone regulators with highly conserved intronic regions that are themselves targets of complex splicing regulation, unreported transcript diversity in the heat shock protein HSP90AA1, and diversification in centromeric RNA expression, V(D)J recombination, RNA editing, and repeat expansions missed by existing methods. I will also present unpublished extensions to 10x Genomics data.

11:30-11:50
Hybrid exons build genome-wide proteomic complexity
Confirmed Presenter: Zachary Wakefield, Boston University, United States

Room: 519
Format: In Person

Moderator(s): Athma Pai


Authors List: Show

  • Zachary Wakefield, Boston University, United States
  • Steven Mick, Boston University, United States
  • Ana Fiszbein, Boston University, United States

Presentation Overview: Show

Alternative splicing (AS) is a highly regulated process occurring in approximately 95% of encoded proteins; however, the global implications for the proteome are largely unknown. To explore how AS impacts the proteome on a genome-wide level, we systematically identified every possible isoform switch annotated in the human genome, resulting from alternative first/last exons, alternative splice sites, and retained introns, in a pairwise manner. Additionally, we characterized isoform swaps due to the regulation of a newly identified class of exons known as hybrid exons, which can act as terminal or internal exons. We then performed sequence alignment between each protein pair and classified changes as frame shifts, partial expansions/reductions, and identical proteins. We observed that the changing use of hybrid exons between internal and last exons had the most significant impact on protein sequences across all different splicing events. To elucidate the proteomic consequences of AS across phenotypes, we developed SpliceImpactR, a novel open-source R package for protein-domain analysis and isoform-specific domain-derived protein-protein interactions (ISPPI). Using adapted functionality from SpliceImpactR, we quantified the changes in ISPPI and the domain enrichment using the previously identified isoform swaps, revealing varied and unique impacts across each AS type. Applying SpliceImpactR to brain and heart samples from the GTEx database showed over 700 genes with significantly differentially used terminal exons – 33% of the identified swaps classified as strong swaps are caused by changing usage of hybrid exons. Our findings underscore the significance of hybrid exon usage in shaping the proteome diversity expressed in human cells.

11:50-12:00
Splicing-derived neo-epitopes in pediatric high-grade glioma
Confirmed Presenter: Priyanka Sehgal, Division of Cancer Pathobiology, Children's Hospital Of Philadelphia, United States

Room: 519
Format: In Person

Moderator(s): Athma Pai


Authors List: Show

  • Priyanka Sehgal, Division of Cancer Pathobiology, Children's Hospital Of Philadelphia, United States
  • Ammar Naqvi, Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, United States
  • Katharina Hayer, Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, United States
  • Makenna Higgins, Division of Cancer Pathobiology, Children's Hospital Of Philadelphia, United States
  • Julien Jarroux, Center for Neurogenetics, Brain and Mind Research Institute, Weill Cornell Medicine, New York, United States
  • Taewoo Kim, Center for Neurogenetics, Brain and Mind Research Institute, Weill Cornell Medicine, New York, United States
  • Pamela Mishra, Division of Cancer Pathobiology, Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, United States
  • Jacinta Davis, Division of Cancer Pathobiology, Children's Hospital Of Philadelphia, United States
  • Charles Drummer, Division of Cancer Pathobiology, Children's Hospital Of Philadelphia, United States
  • Manuel Torres Diz, Division of Cancer Pathobiology, Children's Hospital Of Philadelphia, United States
  • Zhiwei Ang, Division of Cancer Pathobiology, Children's Hospital Of Philadelphia, United States
  • Mathieu Quesnel-Vallieres, Departments of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, United States
  • Yoseph Barash, Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, United States
  • Jo Lynne Rokita, Center for Data-Driven Discovery in Biomedicine, Children's Hospital of Philadelphia, United States
  • Hagen Tilgner, Center for Neurogenetics, Brain and Mind Research Institute, Weill Cornell Medicine, New York, United States
  • Adam Resnick, Center for Data-Driven Discovery in Biomedicine, Children's Hospital of Philadelphia, United States
  • Andrei Thomas-Tikhonenko, Division of Cancer Pathobiology, Children's Hospital Of Philadelphia, United States

Presentation Overview: Show

Pediatric high-grade gliomas (pHGG) respond poorly to standard therapies, and the development of novel immunotherapeutics (such as chimeric antigen receptor (CAR)-armed T cells) is hindered by the paucity of tumor-specific surface antigens. To overcome this problem, we used various algorithms to compare and contrast splicing patterns in 142 pHGGs vs. adult and fetal brain samples, yielding a list of pHGG-specific splice junctions. After prioritizing events corresponding to extracellular domains, we found that ~40% of them mapped to 3-51 nucleotide-long microexons. One salient example is neural cell adhesion molecule (NRCAM) mRNA, which exhibits skipping of the 18-nt microexon 9 and 30-nt microexon 23 (GTEx nomenclature) in ~70% of pHGG samples. Consequently, the corresponding junctions show much higher expression levels in pHGGs compared to normal tissues of both neural and non-neural origins. Bulk and single-nuclei (SnISOr) long-read RNA-seq of pHGG organoids using the Oxford Nanopore platform revealed coordinated skipping of both microexons and a uniform expression pattern of the Δex9Δex23 NRCAM isoform across different cell clusters. We validated the surface expression of the corresponding proteoform using a live cell biotinylation assay and demonstrated that it increases migration and invasion of KNS42 pHGG cells. We also developed a mouse monoclonal antibody with significantly higher avidity for the Δex9Δex23 vs. the full-length NRCAM isoform. Therefore, the pHGG-specific NRCAM (and possibly other microexon-derived proteoforms) are highly selective and feasible targets for CAR T cell-based immunotherapies.

12:00-12:20
Flash talks to advertise the posters
Room: 519
Format: In person

Moderator(s): Athma Pai


Authors List: Show

Presentation Overview: Show

A-180 Roni Cohen-Fultheim
A-186 Étienne Fafard-Couture
A-163 Andrew Tapia
A-184 Sumit Tarafder
A-157 Ihor Arefiev
A-183 Fozia Masood
A-188 Arsham Mikaeili Namini

14:20-14:40
Proceedings Presentation: Accurate Assembly of Multiple RNA-seq Samples with Aletsch
Confirmed Presenter: Qian Shi, Department of Computer Science and Engineering, The Pennsylvania State University, United States

Room: 519
Format: In Person

Moderator(s): Ashley Laughney


Authors List: Show

  • Qian Shi, Department of Computer Science and Engineering, The Pennsylvania State University, United States
  • Qimin Zhang, Department of Computer Science and Engineering, The Pennsylvania State University, United States
  • Mingfu Shao, Department of Computer Science and Engineering, The Pennsylvania State University, United States

Presentation Overview: Show

High-throughput RNA sequencing has become indispensable for decoding gene activities, yet the challenge of reconstructing full-length transcripts persists. Traditional single-sample assemblers frequently produce fragmented transcripts, especially in single-cell RNA-seq data. While algorithms designed for assembling multiple samples exist, they encounter various limitations. We present Aletsch, a new assembler for multiple bulk or single-cell RNA-seq samples. Aletsch incorporates several algorithmic innovations, including a “bridging” system that can effectively integrate multiple samples to restore missed junctions in individual samples, and a new graph-decomposition algorithm that leverages “supporting” information across multiple samples to guide the decomposition of complex vertices. A standout feature of Aletsch is its application of a random forest model with 50 well-designed features for scoring transcripts. We demonstrate its robust adaptability across different chromosomes, datasets, and species. Our experiments, conducted on RNA-seq data from several protocols, firmly demonstrate Aletsch’s significant outperformance over existing meta-assemblers. As an example, when measured with the partial area under the precision-recall curve (pAUC), Aletsch surpasses the leading assemblers TransMeta by 21.2%-57.4% and PsiCLASS by 21.9%-172.5% on human datasets. Aletsch is freely available at https://github.com/Shao-Group/aletsch.

14:40-15:00
Detecting differential transcript usage in heterogenous populations with SPIT
Confirmed Presenter: Beril Erdogdu, Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, United States

Room: 519
Format: In Person

Moderator(s): Ashley Laughney


Authors List: Show

  • Beril Erdogdu, Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, United States
  • Ales Varabyou, Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, United States
  • Stephanie Hicks, Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, United States
  • Steven Salzberg, Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, United States
  • Mihaela Pertea, Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, United States

Presentation Overview: Show

Differential transcript usage (DTU) plays a crucial role in shaping gene expression diversity across different biological scenarios, influencing cellular functionality and disease development. However, current DTU analysis methods often fail to consider the inherent population heterogeneity seen in complex human traits and diseases. Filling this important gap, our study introduces SPIT, a statistical method specifically designed to identify predominant subgroups and their unique DTU events within populations.

Utilizing an over-smoothed kernel density estimator (KDE), SPIT effectively mitigates technical and biological noise inherent in RNA-Seq data, and detects multimodality without assumptions about expression pattern distributions. Additionally, SPIT generates an empirical null distribution of isoform abundance variability across datasets, enhancing its accuracy and versatility compared to existing tools.

Applying SPIT to a diverse array of human brain samples, our analysis unveils six significant DTU events associated with Schizophrenia subgroups, underscoring its efficacy in capturing disease heterogeneity. Furthermore, exploration of prenatal and adult brain samples reveals thousands of genes where the dominant isoform undergoes a complete shift between developmental stages and post-birth, providing novel evidence of this phenomenon in human brain development. These findings provide biological significance to specific isoforms previously lacking comprehensive functional understanding, and valuable insights into neurodevelopmental disorders.
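
The idea of detecting multimodality with a smoothed KDE can be illustrated with a minimal sketch. This is not SPIT's implementation: the Gaussian kernel, the grid-based peak count, the bandwidth values, and the toy isoform fractions are all assumptions chosen for illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian KDE of `samples` at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def count_modes(samples, bandwidth, grid_points=200):
    """Count strict local maxima of the KDE evaluated on an even grid."""
    density = gaussian_kde(samples, bandwidth)
    lo = min(samples) - 3.0 * bandwidth
    hi = max(samples) + 3.0 * bandwidth
    step = (hi - lo) / (grid_points - 1)
    ys = [density(lo + i * step) for i in range(grid_points)]
    return sum(
        1 for i in range(1, grid_points - 1) if ys[i - 1] < ys[i] > ys[i + 1]
    )


# Hypothetical isoform fractions for one transcript across ten samples:
# two subgroups, one near 0.1 and one near 0.9.
fractions = [0.09, 0.10, 0.10, 0.11, 0.12, 0.88, 0.90, 0.90, 0.91, 0.92]
modes_narrow = count_modes(fractions, bandwidth=0.05)
modes_wide = count_modes(fractions, bandwidth=1.0)
```

On this toy data a narrow bandwidth keeps the two subgroups apart, while a very wide one merges them into a single peak. Smoothing more heavily therefore trades sensitivity to small subgroups for robustness against noise-driven spurious modes.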

15:00-15:30
Bias analysis for long-reads transcriptomics multi-sample datasets
Confirmed Presenter: Ana Victoria Conesa Cegarra, Spanish National Research Council, Spain

Room: 519
Format: In Person

Moderator(s): Ashley Laughney


Authors List: Show

  • Alejandro Paniagua, Spanish National Research Council, Spain
  • Jorge Mestre-Tomas, Spanish National Research Council, Spain
  • Liudmyla Kondratova, University of Florida, United States
  • Fabian Jetzinger, Biobam Bioinformatics, Spain
  • Stanley Cormack, Imperial College London, United Kingdom
  • Natalia Vega, University of Valencia, Spain
  • Luis Ferrández-Peral, Spanish National Research Council, Spain
  • Carolina Monzó, Spanish National Research Council, Spain
  • Ana Victoria Conesa Cegarra, Spanish National Research Council, Spain

Presentation Overview: Show

Long-read sequencing technologies such as PacBio and Oxford Nanopore are reshaping transcriptomics. The enhanced precision and depth of sequencing from these methods are proving critical for differential isoform expression studies across various conditions. This shift towards long-reads necessitates new best practices for experimental designs, preprocessing, and normalization tailored to these data types.
We set out to provide analysis guidelines for multi-sample long-read transcriptomics experiments. Utilizing a replicated dataset from mouse tissues and three long-read cDNA protocols, including the newest PacBio Kinnex, we explored biases when constructing count tables.
We evaluated two main approaches: Call&Join (call transcript models for each sample and then combine results) and Join&Call (merge reads from different samples, then call transcript models and re-quantify), finding that each strategy renders a different transcriptome composition depending on the analysis tool. Sequencing depth and replicate number significantly affect transcript identification, with known transcripts quickly stabilizing and novel ones requiring more depth, and most transcripts detected either by all samples or by just one.
We detected variable biases in quantification due to read length and GC content across technologies. For instance, PacBio data showed a parabolic length bias and increased expression levels with higher GC content, although this greatly varied by sample, challenging differential analyses.
Our study highlights that experimental and preprocessing choices profoundly affect the long-read transcriptome count-tables. Length and GC content biases impact quantification, influenced by sample and technology. The results underscore the importance of thoughtful experimental design and preprocessing to ensure accurate transcriptome dataset composition and comparable quantification.

RISE: Relative Impact of Splicing and Expression in RNA-seq studies
Confirmed Presenter: Yu-Jen Lin, University of California, Berkeley, United States

Room: 519
Format: In Person

Moderator(s): Ashley Laughney


Authors List: Show

  • Yu-Jen Lin, University of California, Berkeley, United States
  • Amr Alazali, University of California, Berkeley, United States
  • Zhiqiang Hu, University of California, Berkeley. Currently at: Illumina, Foster City, California, United States
  • Steven Brenner, University of California, Berkeley, United States

Presentation Overview: Show

RNA-seq has been widely used to quantify expression and splicing changes in transcriptomes. Although biological consequences arise from changes in both expression and splicing aspects, researchers usually use their impressions to choose only one aspect to analyze, potentially overlooking significant impacts of the other. Even if researchers investigate both, the measurement scales of expression and splicing are different, and thus, their impacts are incomparable. To compare the relative impact of expression and splicing, we have developed RISE.

RISE quantifies the relative impact of expression and splicing changes caused by the treatment. To place the impact of expression and splicing changes on the same scale for comparison, we developed the Normalized Variation (NV) measure. NV is defined as the proportion of the between-group variation to the total variation. Finally, we assess whether expression NV (eNV) or splicing NV (sNV) is significantly larger to understand the comparative influence of expression versus splicing alterations in the transcriptome.

To validate our method, we performed RISE analysis on RNA-seq data from knockdown or overexpression experiments of 11 transcription and splicing factors. RISE effectively categorizes transcription and splicing factors by their relative impacts on expression and splicing. As an example application, we applied RISE to 4 studies involving proteins with complex or previously unknown roles in regulating transcriptomes to understand their functions. In summary, RISE enables researchers to systematically compare the relative impact of expression and splicing.
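
As a rough illustration of the NV definition above (between-group variation divided by total variation), the following sketch computes the ratio from sums of squares. The abstract does not say how variation is measured, so the sum-of-squares form, the function name, and the toy values are assumptions, not the RISE implementation.

```python
from statistics import fmean


def normalized_variation(groups):
    """Between-group sum of squares divided by total sum of squares.

    `groups` is a list of lists, one list of measurements per condition
    (for example control and treatment). Returns a value in [0, 1].
    """
    values = [v for group in groups for v in group]
    grand_mean = fmean(values)
    total = sum((v - grand_mean) ** 2 for v in values)
    if total == 0:
        return 0.0  # no variation at all, so nothing to attribute
    between = sum(
        len(group) * (fmean(group) - grand_mean) ** 2 for group in groups
    )
    return between / total


# Hypothetical values for one gene: expression (log scale) and
# splicing (percent spliced in), each as [control, treatment].
expression_nv = normalized_variation([[5.0, 5.2, 4.8], [7.0, 7.1, 6.9]])
splicing_nv = normalized_variation([[0.40, 0.55, 0.50], [0.45, 0.52, 0.48]])
```

Because both ratios are unitless and bounded between 0 and 1, an expression value and a splicing value computed this way can be compared directly, which is the property the NV measure is described as providing.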

From Noise to Signal: Quantifying Stochasticity in mRNA Splicing
Confirmed Presenter: Eraj Khokhar, RTI, UMass Chan Medical School, United States

Room: 519
Format: In Person

Moderator(s): Ashley Laughney


Authors List: Show

  • Eraj Khokhar, RTI, UMass Chan Medical School, United States
  • Kaitlyn Brokaw, RTI, UMass Chan Medical School, United States
  • Nida Javeed, RTI, UMass Chan Medical School, United States
  • Zachary Kartje, RTI, UMass Chan Medical School, United States
  • Valeria Sanabria, RTI, UMass Chan Medical School, United States
  • Jonathan Watts, RTI, UMass Chan Medical School, United States
  • Athma Pai, RTI, UMass Chan Medical School, United States

Presentation Overview: Show

Splicing is likely a major contributor to noise in mRNA regulation, with errors in splicing leading to reduced transcriptional efficiency and wasted transcriptional output. Cryptic splicing involves use of low-fidelity or infrequently bound splice sites that often leads to non-productive transcripts, likely targeted for degradation. Substantial evidence suggests that splicing noise is prevalent in homeostatic cell conditions, but the extent to which it occurs is likely underappreciated due to the challenges of identifying cryptic, low-fidelity splice site usage in mature mRNA data. Characterizing splicing noise has become increasingly important since blocking or redirecting the use of noisy splice sites in favor of productive splice sites may provide a novel strategy for up-regulating gene expression in healthy or disease contexts with high levels of splicing noise (e.g., cancer). Here, we tackle these challenges by performing high-throughput sequencing on selectively enriched nuclear nascent RNA, which greatly increases the global detection of cryptic splice sites. We further developed a python package to systematically identify and analyze cryptic, low-fidelity sites in high-throughput sequencing data from nascent, nuclear RNA and RNA from cycloheximide-treated cells. We use these experimental and computational methods to analyze cryptic splicing events in cancer cell lines and identify genomic features, sequence elements, and gene properties associated with the occurrence of cryptic splice sites across and between cell types. Our findings uncover a previously under-appreciated role for stochasticity in regulation of mRNA splicing, identify features predictive of splicing noise, and will aid in developing novel disease therapeutics to inhibit cryptic splicing.

15:30-15:50
Deciphering Transcriptional Bursting Using Single-Cell Metabolic Labeling Data
Confirmed Presenter: Teresa Rummel, Faculty for Informatics and Data Science, University Regensburg; Institute of Virology, University Würzburg, Germany

Room: 519
Format: In Person

Moderator(s): Ashley Laughney


Authors List: Show

  • Teresa Rummel, Faculty for Informatics and Data Science, University Regensburg; Institute of Virology, University Würzburg, Germany
  • Yiliam Cruz Garcia, Institute of Biochemistry, University Kiel, Germany
  • Juliane Müller, Institute of Biochemistry, University Kiel, Germany
  • Christophe Toussaint, HIRI, Helmholtz Center for Infection Research Würzburg; Institute of Molecular Infection Biology, University Würzburg, Germany
  • Bhupesh Prusty, Institute of Virology, University Würzburg, Germany
  • Antoine-Emmanuel Saliba, HIRI, Helmholtz Center for Infection Research Würzburg; Institute of Molecular Infection Biology, University Würzburg, Germany
  • Elmar Wolf, Institute of Biochemistry, University Kiel, Germany
  • Florian Erhard, Faculty for Informatics and Data Science, University Regensburg; Institute of Virology, University Würzburg, Germany

Presentation Overview: Show

In single cells, transcription is governed by bursts. The kinetics of transcriptional bursting are defined by the burst frequency, describing how often bursts occur, and the burst size, describing how many mRNA molecules are synthesized during one burst. Knowledge of these parameters of transcriptional bursting is key to gaining a better understanding of gene expression and its regulation.

Transcriptional bursting can be studied in a transcriptome-wide manner using single-cell RNA-seq. However, this approach only works under steady-state conditions and assumes a uniform RNA degradation rate across genes.

To overcome these limitations, we developed a new mathematical model that utilizes temporal data from single-cell metabolic labeling (scSLAM-seq) to quantitatively assess transcriptional bursting. This model enables the estimation of burst frequencies, sizes, and gene-specific degradation rates, applicable also under dynamic, non-steady state conditions such as upon viral infections or cytokine stimulation.

Using scSLAM-seq data we studied the role of MYC in transcriptional regulation in two cell lines with different MYC levels, A375 (high) and MaMel63a (low), after MYC depletion via an auxin inducible degron system. Our findings challenge the view that MYC primarily impacts burst size. Instead, we discovered that changes in burst frequency or a combination of both frequency and size are drivers of transcriptional changes, indicating a more complex role for MYC in gene regulation.

These insights illustrate the benefits of combining advanced sequencing techniques with dynamic modeling to study gene expression, enhancing our understanding of transcriptional mechanisms and providing a framework for analyzing gene responses under various conditions.
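
For context on the steady-state limitation mentioned above, here is a minimal sketch of the classical moment-based estimate of burst parameters from snapshot counts. It is not the model presented in this talk. It assumes the bursty limit of the two-state (telegraph) model at steady state, where the Fano factor equals 1 + burst size and the mean equals burst frequency × burst size, with frequency expressed in units of the degradation rate. The function name and counts are hypothetical.

```python
from statistics import fmean, pvariance


def burst_moments(counts):
    """Moment estimates of (burst_frequency, burst_size) for one gene.

    Assumes steady state and the bursty limit of the telegraph model.
    The returned frequency is relative to the (unknown) degradation rate.
    """
    mean = fmean(counts)
    variance = pvariance(counts)
    if mean <= 0 or variance <= mean:
        raise ValueError("counts are not over-dispersed; no burst estimate")
    burst_size = variance / mean - 1.0   # Fano factor = 1 + burst size
    burst_frequency = mean / burst_size  # mean = frequency * size
    return burst_frequency, burst_size


# Hypothetical mRNA counts for one gene across eight cells.
frequency, size = burst_moments([0, 0, 0, 10, 0, 10, 0, 0])
```

The frequency estimate is only defined relative to the degradation rate, so comparing genes this way requires assuming a shared rate. Time-resolved labeling data, as described in the abstract, is what allows gene-specific degradation rates to be estimated instead.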

15:50-16:00
Coordinated regulation by lncRNAs results in tight lncRNA-target couplings
Confirmed Presenter: Pavel Sumazin, Baylor College of Medicine, United States

Room: 519
Format: In Person

Moderator(s): Ashley Laughney


Authors List: Show

  • Pavel Sumazin, Baylor College of Medicine, United States
  • Hua-Sheng Chiu, Baylor College of Medicine, United States
  • Sonal Somvanshi, Baylor College of Medicine, United States

Presentation Overview: Show

The characterization of long noncoding RNA (lncRNA) function is a major challenge in RNA biology with applications to basic, translational, and medical research. Our efforts to characterize the regulatory roles of lncRNAs in cancer identified lncRNA species that coordinately regulate both the transcriptional and post-transcriptional processing of their targets. This coordinated regulation results in tight couplings between lncRNAs and their targets and is easier to identify and verify. Our analyses suggested that hundreds of cancer genes are coordinately regulated by lncRNAs in multiple tumor types. As proof of principle, we studied the regulation of DICER1—a cancer gene that regulates microRNA biogenesis—by the lncRNA ZFAS1. ZFAS1 activates DICER1 transcription and blocks its post-transcriptional repression to control the expression of DICER1 and its target microRNAs. Both genes regulate tumor growth and DNA repair. Our analyses suggested that coordinated lncRNA regulation can propagate genomic alterations at lncRNAs to physiologically dysregulate cancer genes.

16:40-17:00
SWARM: Single-molecule Workflow for Analysing RNA Modifications
Confirmed Presenter: Stefan Prodic, Australian National University, Australia

Room: 519
Format: In Person

Moderator(s): Julia Salzman


Authors List: Show

  • Stefan Prodic, Australian National University, Australia
  • Alice Cleynen, Université de Montpellier, France
  • Akanksha Srivastava, Australian National University, Australia
  • Shafi Mahmud, Australian National University, Australia
  • Madhu Kanchi, Australian National University, Australia
  • Agin Ravindran, Australian National University, Australia
  • Nikolay Shirokikh, Australian National University, Australia
  • Eduardo Eyras, Australian National University, Australia

Presentation Overview: Show

The epitranscriptome contains over 170 chemical modifications that play a pivotal role in regulating RNA properties and function across various RNA classes. High-throughput methods for RNA modification detection remain limited, and current approaches are hindered by extensive protocols that lack isoform-level resolution and restrict studies to a single modification per experiment, limiting comprehensive exploration of the dynamic and diverse epitranscriptome. Here we describe SWARM, a robust approach for the detection of m6A, m5C, pseudouridine, and ac4C from the same sample in individual RNA isoforms. SWARM exploits nanopore direct RNA sequencing signals that capture continuous native individual RNA molecules. SWARM attains unmatched accuracy in single-molecule modification detection for multiple RNA modifications through an innovative neural network and training strategy applied to a broad array of diverse nanopore signals. We apply SWARM to numerous independent datasets and highlight replicable and accurate detection of modified sites in the transcriptome (messenger RNA and long non-coding RNA) and their modification rates, showing extensive agreement with experimentally validated sites. Our analysis shows that SWARM delivers confident detection of multiple RNA modifications from a single sample and provides a robust framework for comparing RNA modification landscapes between samples. We also provide an efficient workflow that opens a wealth of possibilities towards uncovering diverse RNA modification landscapes in countless contexts. SWARM enables a significant leap towards deciphering the dynamics and functional relevance of the epitranscriptome.

17:00-17:20
Refinement of SARS-CoV-2 Intra-host Mutations Using Explainable Representations
Confirmed Presenter: Fatima Mostefai, Université de Montréal; Montreal Heart Institute, Canada

Room: 519
Format: In Person

Moderator(s): Julia Salzman


Authors List: Show

  • Fatima Mostefai, Université de Montréal; Montreal Heart Institute, Canada
  • Jean-Christophe Grenier, Montreal Heart Institute, Canada
  • Raphaël Poujol, Montreal Heart Institute, Canada
  • Julie Hussin, Université de Montréal; Montreal Heart Institute, Canada

Presentation Overview: Show

SARS-CoV-2, an RNA virus, has evolved into multiple variants by accumulating mutations during transmission (inter-host) and infection (intra-host). De novo mutations arise in viral genomes during infection, and analyzing these mutations in sequencing data may predict emerging variants. Intra-host single nucleotide variants (iSNVs) can be identified by analyzing RNA sequencing (RNA-seq) reads from infections. However, sequencing artifacts introduced during the RNA-seq process can result in erroneous iSNVs. We aim to identify true intra-host mutations from viral RNA-seq data and propose metrics to refine RNA-seq analysis.

We developed a two-step workflow to isolate de novo iSNVs, focusing on the SARS-CoV-2 RNA-seq dataset. Initially, we processed a dataset of RNA-seq libraries, ensuring high-quality library preparation through whole-genome quality control. We then used these libraries for iSNV calling, using metrics such as Alternative Allele Frequency (AAF) and Strand Bias Likelihood (S) to distinguish iSNVs from sequencing artifacts. We also used dimensionality reduction representations, such as PHATE and t-SNE, to visualize and analyze library structures, complemented with an explainability metric.

We applied our workflow to a comprehensive SARS-CoV-2 RNA-seq dataset, distinguishing between de novo and consensus iSNVs, which is crucial for understanding viral intra-host evolution. We identified batch effects from sequencing centers and refined the AAF and S metrics for artifact resolution. Analyzing libraries from 2020 to 2023, we observed low intra-host diversity per infection, significant diversity in the spike gene, and strong purifying selection. This workflow enhances the precision and depth of RNA-seq and viral genomic analyses, advancing studies in RNA viruses.

17:20-18:00
Invited Presentation: Tackling the genotype-to-phenotype problem in cancer evolution
Confirmed Presenter: Ashley Laughney

Room: 519
Format: In Person

Moderator(s): Hagen Tilgner


Authors List: Show

  • Ashley Laughney

Presentation Overview: Show

Predicting protein function from sequence, also known as genotype-to-phenotype mapping, remains a central challenge in biology. This is because most proteins are highly pleiotropic, meaning they can perform more than one function and participate in a wide range of biological processes. As such, perturbations to a single gene often affect multiple, independent cellular responses. Integrating innovative systems and synthetic biology approaches with a hypothesis-driven framework, I will describe tools my lab has developed to map genome-encoded components to complex cellular and in vivo functions at scale. We focus on cancer metastasis as our model of a multicellular, evolutionary process and develop approaches that ask how activation of the very same protein or signaling pathway can lead to diverse functional outputs through (i) the evolution of distinct modular domains, (ii) intra-cellular genetic interactions (epistasis) and (iii) inter-cellular signaling networks (multicellular programs). We apply these emerging techniques to understand how highly pleiotropic proteins, such as an immune-related protein called Stimulator of Interferon Genes (STING), switch from a tumor-suppressor to pro-tumoral function during the evolution of cancer metastasis.

Sunday, July 14th
10:40-11:20
Invited Presentation: Interpretable models to understand regulation of RNA splicing
Confirmed Presenter: Christopher Burge

Room: 519
Format: In Person

Moderator(s): Hagen Tilgner


Authors List: Show

  • Christopher Burge

Presentation Overview: Show

We are developing fully interpretable models of RNA splicing and its regulation for improved understanding and various applications. We recently described a model called SMsplice that predicts the splicing patterns of primary transcripts in a variety of animal and plant species, using just core splice site motifs, exon and intron length preferences, and learned scores for splicing regulatory elements (SREs) that act locally on splice sites. This model enables automatic learning of candidate SREs from any organism, and achieves accuracy of 83-86% in fish, insects, and plants and about 70% in mammals. A new direction is the inference of the splicing regulatory activity of a splicing factor from just knockdown/RNA-seq data and a model of its intrinsic binding preferences such as an RNA Bind-n-Seq, RNACompete or SELEX motif, but without using crosslinking data. Application to data from the ENCODE RNA-binding protein dataset and other data yields models that are reproducible across cell lines and species, and which can distinguish direct from indirect regulatory targets and can be used to infer cooperative splicing regulation.

11:20-11:40
IsoCLR: Contrastive learning for RNA foundation models
Confirmed Presenter: Ruian Shi, University of Toronto, Vector Institute, Canada

Room: 519
Format: In Person

Moderator(s): Hagen Tilgner


Authors List: Show

  • Philip Fradkin, Vector Institute, Canada
  • Ruian Shi, University of Toronto, Vector Institute, Canada
  • Keren Isaev, New York Genome Centre, Columbia University, United States
  • Quaid Morris, Computational and Systems Biology Program, Sloan Kettering Institute, United States
  • Bo Wang, University Health Network, Canada
  • Brendan Frey, University of Toronto, Canada
  • Leo Lee, University of Toronto, Canada

Presentation Overview: Show

In the face of rapidly accumulating genomic data, our understanding of the RNA regulatory code remains incomplete. Recent self-supervised methods in other domains have demonstrated the ability to learn rules underlying the data-generating process such as sentence structure in language. Inspired by this, we extend contrastive learning techniques to genomic data by utilizing functional similarities between sequences generated through alternative splicing and gene duplication. We introduce IsoCLR, a model trained on a novel dataset with a contrastive objective enabling the learning of generalized RNA isoform representations. We validate representation utility on downstream tasks such as RNA half-life and mean ribosome load prediction. Our pre-training strategy yields competitive results using linear probing across 6 tasks, along with up to a two-fold increase in Pearson correlation in low-data conditions. Importantly, our exploration of the learned latent space reveals that our contrastive objective yields semantically meaningful representations, underscoring its potential as a valuable initialization technique for RNA property prediction.
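
The abstract does not state the exact loss, so the following is only a minimal sketch of a generic InfoNCE-style contrastive objective, not IsoCLR's implementation. It assumes that each anchor embedding has one positive (for example, another isoform of the same gene) and that the other positives in the batch act as negatives. The names `cosine` and `info_nce` and the `temperature` value are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    positives[i] is treated as the positive for anchors[i]; every other
    entry of `positives` serves as a negative for that anchor.
    (Hypothetical helper, not taken from the IsoCLR code base.)
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: matched pairs should give a much lower loss than mismatched ones.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(anchors, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce(anchors, [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing such a loss pulls embeddings of functionally related sequences together while pushing unrelated ones apart, which is the general idea the abstract describes.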

11:40-12:00
Explaining Deep Neural Networks for the Prediction of Translation Initiation
Confirmed Presenter: Uwe Ohler, Max Delbrueck Center & Humboldt University, Germany

Room: 519
Format: In Person

Moderator(s): Hagen Tilgner


Authors List: Show

  • Frederick Korbel, Max Delbruck Center, Germany
  • Gabriel Villamil, Max Delbruck Center, Germany
  • Ekaterina Eroshok, Max Delbruck Center & Humboldt University, Germany
  • Uwe Ohler, Max Delbrueck Center & Humboldt University, Germany

Presentation Overview: Show

Regulation of mRNA translation enables rapid and local control of gene expression. As the rate-limiting step, translation initiation is primarily controlled by the 5’ untranslated region (5’ UTR). In it, regulatory sequence elements including RNA structural motifs and upstream open reading frames (uORFs) dictate the efficiency of translation. A recent convolutional neural network model accurately quantifies the relationship between massively parallel synthetic 5’ UTRs and translation levels, but the underlying sequence determinants remain elusive.
To uncover the input features most important for prediction, feature attribution methods compute importance scores for input features, making it possible to explain a prediction with respect to its input. Hence, model interpretation can be applied as a tool to uncover functional sequence patterns and generate novel biological hypotheses. Applying model interpretation, we extract representations of regulatory logic, revealing a complex interplay of regulatory sequence elements. Guided by insights from model interpretation, we adapt the model using human reporter data to obtain superior performance.
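
One simple feature attribution scheme is in silico mutagenesis, sketched below. The abstract does not say which attribution methods the authors use, so this is an illustrative assumption. `toy_model` is a hypothetical stand-in for a trained predictor (it merely penalizes upstream AUGs), and `ism_importance` is an assumed helper name.

```python
def toy_model(utr: str) -> float:
    """Hypothetical stand-in for a trained model: each upstream AUG lowers the score."""
    return 1.0 - 0.5 * utr.count("AUG")


def ism_importance(seq: str, model) -> list[float]:
    """Per-position importance: the largest absolute change in the model
    output over all single-nucleotide substitutions at that position."""
    baseline = model(seq)
    scores = []
    for i, ref in enumerate(seq):
        changes = [
            abs(baseline - model(seq[:i] + alt + seq[i + 1:]))
            for alt in "ACGU"
            if alt != ref
        ]
        scores.append(max(changes))
    return scores


# The three positions of the upstream AUG receive the highest scores.
importance = ism_importance("CCAUGCC", toy_model)
```

With a real model, positions whose mutation changes the prediction most are candidates for functional elements such as uORF start codons.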

12:00-12:20
Translational efficiency covariation across cell types is a conserved organizing principle of mammalian transcriptomes
Confirmed Presenter: Can Cenik, UT Austin, United States

Room: 519
Format: In Person

Moderator(s): Hagen Tilgner


Authors List: Show

  • Can Cenik, UT Austin, United States

Presentation Overview: Show

Characterization of shared patterns of RNA expression between genes across conditions has led to the discovery of novel biological functions and regulatory networks. These RNA co-expression relationships have illuminated the higher-order organization of transcriptomes, yet we currently do not know if patterns of coordination in other gene expression modalities are similarly informative. In particular, translational covariation across cell types has remained unexplored, primarily due to the scarcity of comprehensive translational measurements across a large compendium of biological contexts. Here, we uniformly analyzed 2277 matched ribosome profiling and RNA-seq datasets from 90 human and 81 mouse tissues and cell lines. We introduce the concept of Translational Efficiency Covariation (TEC), identifying mRNAs that demonstrate coordinated translation patterns across cell types. We demonstrate that TEC is conserved across human and mouse cells and uncover novel gene functions that rely on translational covariation information alone. Moreover, our observations indicate that proteins exhibiting positive covariation at both translational and transcriptional levels are significantly more likely to physically interact. We finally discover TEC patterns indicative of RNA-binding protein (RBP) involvement, suggesting potential mechanisms of shared translational regulation. Our findings establish translational covariation across various conditions as a pervasive and conserved organizing principle of mammalian transcriptomes.
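
As a rough illustration of the idea, translational efficiency can be approximated as the log ratio of ribosome footprint counts to RNA-seq counts, and covariation between two genes as the correlation of that quantity across cell types. This is a simplified sketch under those assumptions. The abstract does not specify the normalization or the covariation statistic, so the pseudocount, the Pearson correlation, and the function names are all illustrative choices rather than the authors' method.

```python
import math


def translational_efficiency(ribo, rna, pseudocount=1.0):
    """Log2 ratio of ribosome profiling to RNA-seq counts per cell type."""
    return [
        math.log2((r + pseudocount) / (m + pseudocount))
        for r, m in zip(ribo, rna)
    ]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy counts for two genes across three cell types (made-up numbers).
te_gene_a = translational_efficiency([3, 7, 15], [1, 1, 1])
te_gene_b = translational_efficiency([7, 15, 31], [1, 1, 1])
covariation = pearson(te_gene_a, te_gene_b)
```

In this toy case the two genes change translational efficiency in step across cell types, so their covariation is close to 1.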

14:20-14:40
CellRBP: Improving Protein-RNA Binding Prediction In Vivo Using Cell-Type-Specific Features
Confirmed Presenter: Yaron Orenstein, Bar-Ilan University, Israel

Room: 519
Format: In Person

Moderator(s): Jérôme Waldispühl


Authors List: Show

  • Ori Feldman, Ben-Gurion University, Israel
  • Yaron Orenstein, Bar-Ilan University, Israel

Presentation Overview: Show

RNA-binding proteins play important roles in various cellular processes. For this reason, researchers have developed experimental assays to measure protein–RNA binding in vivo. However, obtaining these measurements for every protein across various cell types is infeasible due to the high cost and long duration of these experiments. Thus, researchers rely on computational methods to predict protein–RNA binding, but so far methods have had limited success in predicting RNA binding across cell types. In this work, we present CellRBP, a novel method to accurately and efficiently predict protein–RNA binding across cell types. CellRBP is based on a convolutional neural network that uniquely receives as input cell-type-specific information, such as experimentally measured RNA structure and RNA abundance, which enables accurate generalization across cell types (Figure 1). We trained CellRBP on 196 eCLIP experiments and evaluated prediction performance in both cross-validation and across cell types. CellRBP achieved superior performance compared to the state of the art, with an average AUROC score of 0.889 in cross-validation and 0.772 across cell types (Figure 2A,B). We interrogated the trained models for the important features they learned using both local and global interpretability techniques and discovered known and novel RNA-binding preferences (Figure 2C). CellRBP is expected to help many researchers in predicting protein–RNA binding over various cell types and conditions. CellRBP is available via https://github.com/OrensteinLab/CellRBP.

14:40-15:00
Reconstructing the sequence specificities of RNA-binding proteins across eukaryotes
Confirmed Presenter: Kaitlin Laverty, Memorial Sloan Kettering Cancer Center, United States

Room: 519
Format: In Person

Moderator(s): Jérôme Waldispühl


Authors List: Show

  • Alexander Sasse, University of Washington, United States
  • Debashish Ray, University of Toronto, Canada
  • Kaitlin Laverty, Memorial Sloan Kettering Cancer Center, United States
  • Cyrus Tam, Memorial Sloan Kettering Cancer Center, United States
  • Mihai Albu, University of Toronto, Canada
  • Hong Zheng, University of Toronto, Canada
  • Olga Lyudovyk, Memorial Sloan Kettering Cancer Center, United States
  • Kate Nie, University of Toronto, Canada
  • Cedrik Magis, The Barcelona Institute of Science and Technology, Spain
  • Cedric Notredame, The Barcelona Institute of Science and Technology, Spain
  • Matthew Weirauch, Cincinnati Children’s Hospital, United States
  • Timothy Hughes, University of Toronto, Canada
  • Quaid Morris, Memorial Sloan Kettering Cancer Center, United States

Presentation Overview: Show

RNA-binding proteins (RBPs) are key regulators of gene expression. Here, we introduce RBPzoo — a resource of RNAcompete-derived in vitro RNA-binding data for 379 RBPs from 33 diverse eukaryotes. We develop a new method, Joint Protein-Ligand Embedding (JPLE), to map specificity-determining peptides to corresponding RNA motifs for 28,667 RBPs from 690 eukaryotes. We illustrate the broad utility of this resource by inferring post-transcriptional function for 12 eukaryotic RBPs in mRNA stability and reconstructing the evolution of 2,568 RNA motifs. For the latter, we identify a universal set of 19 RNA motifs conserved between plants and metazoa and observe rapid motif evolution arising from whole genome duplications in vertebrate ancestors. RBPzoo represents a powerful resource for the study of gene regulation for any organism with an annotated genome.

15:00-15:10
A novel NLP-based RBP binding motif and context discovery method using multiple-instance learning
Confirmed Presenter: Shaimae Elhajjajy, University of Massachusetts Chan Medical School, United States

Room: 519
Format: In Person

Moderator(s): Jérôme Waldispühl


Authors List: Show

  • Shaimae Elhajjajy, University of Massachusetts Chan Medical School, United States
  • Zhiping Weng, University of Massachusetts Chan Medical School, United States

Presentation Overview: Show

RNA-binding proteins (RBPs) are the primary mediators of mRNA regulation, dynamically governing complex processes such as splicing, cleavage, and degradation. Previous studies have shown that structurally diverse RBPs recognize similar motifs but can still bind distinct sites within the transcriptome. While in vitro evidence suggests that motif context plays an important role in RBP binding specificity, the precise underlying mechanisms remain unclear. Despite recent advances in machine learning models to predict RBP binding, current methods are often difficult to interpret and do not categorically investigate motif contexts. Thus, there remains a need for interpretable predictive models to disambiguate the contextual determinants of RBP binding specificity. Here, we present, to the best of our knowledge, the first formulation of the RBP binding prediction task as an NLP-based multiple-instance learning problem. We introduce a novel sequence decomposition strategy to generate entities termed “contexts”, which we use to train and test our deep learning models. We also develop a deterministic motif discovery algorithm that is fast, accurate, and specialized to handle our data structure, recapitulating the motifs of well-characterized RBPs as validation. Importantly, we discover and characterize the in vivo sequence binding contexts for a collection of RBPs. Finally, by integrating motif and context similarity measures with a cross-prediction approach, we propose novel RBP-RBP interaction partners and hypothesize whether these interactions are cooperative or competitive. In summary, we present a comprehensive computational strategy for illuminating contextual determinants of specific RBP binding and demonstrate the implications of our findings in delineating RBP function.

15:10-15:30
snoFlake: Discovery of a snoRNA-guided splicing regulatory complex via the snoRNA-RBP interactome
Confirmed Presenter: Kristina Sungeun Song, Université de Sherbrooke, Canada

Room: 519
Format: In Person

Moderator(s): Jérôme Waldispühl


Authors List: Show

  • Kristina Sungeun Song, Université de Sherbrooke, Canada
  • Bernice Yeo, Université de Sherbrooke, Canada
  • Vivian Seow, Université de Sherbrooke, Canada
  • Laurence Faucher-Giguère, Université de Sherbrooke, Canada
  • Gabrielle Deschamps-Francoeur, Université de Sherbrooke, Canada
  • Sherif Abou Elela, Université de Sherbrooke, Canada
  • Michelle Scott, Université de Sherbrooke, Canada

Presentation Overview: Show

Box C/D small nucleolar RNAs (snoRNAs) are noncoding RNAs crucial for guiding 2’-O-ribose methylation in ribosomal RNA during ribosome biogenesis, primarily through the formation of ribonucleoprotein (snoRNP) complexes with core RNA-binding proteins (RBPs). Additional roles were proposed for box C/D snoRNAs, including the regulation of alternative splicing of protein-coding transcripts, yet few validated examples exist, with unclear mechanisms. Some noncanonical functions are thought to involve interactions with additional RBPs beyond the core snoRNA binders, indicating diverse regulatory roles of snoRNAs by interacting with various RBPs, collectively modulating protein-coding target RNAs. To explore these interactions and their functional implications, we introduce snoFlake, an interaction network of 191 box C/D snoRNAs and 166 human RBPs, showing direct binding interactions and significant overlap of binding sites on shared protein-coding target RNAs, reinforcing their concerted role in gene regulation. Focusing on snoRNAs that target groups of functionally related RNAs also bound by snoRNA-associated RBPs led to a hub region composed of SNORD22 and the U2- and U5-associated splicing factors SF3B4, PRPF8, EFTUD2 and AQR. SNORD22, PRPF8 and AQR exhibited an enrichment of overlapping binding sites at both 5’ and 3’ splice sites with the highest number of shared protein-coding target RNAs, suggesting their involvement in a splicing regulatory model. Knockdown experiments and differential alternative splicing analysis further highlighted the potential role of the SNORD22 complex in splicing, marking the first snoRNP splicing regulatory complex. This reshapes the understanding of snoRNA biology, emphasizing snoFlake's potential as a foundation for unravelling the impact of snoRNA-RBP interactions in gene regulation.

15:30-15:40
scTail: precise polyadenylation site detection and its alternative usage analysis from reads 1 preserved 3' scRNA-seq data
Confirmed Presenter: Ruiyan Hou, The University of Hong Kong, China

Room: 519
Format: In Person

Moderator(s): Jérôme Waldispühl


Authors List: Show

  • Ruiyan Hou, The University of Hong Kong, China
  • Yuanhua Huang, The University of Hong Kong, China

Presentation Overview: Show

Three-prime single-cell RNA-seq (scRNA-seq) has been widely employed to profile cellular transcriptomes; however, its power for analysing polyadenylation sites (PAS) has not been fully utilised. Here, we present a computational method, scTail, to precisely identify PAS using read 1 and quantify their expression using read 2, which enables effective detection of alternative PAS usage. When compared with other methods, PAS detected by scTail are more accurate. With various experimental data sets, we have demonstrated that scTail can accurately identify PAS and that the detected alternative PAS usage shows strong specificity in different biological processes.

15:40-16:00
G4mer: Transcriptome-wide prediction of RNA G-quadruplexes with a deep RNA language model
Confirmed Presenter: Farica Zhuang, University of Pennsylvania, United States

Room: 519
Format: In Person

Moderator(s): Jérôme Waldispühl


Authors List: Show

  • Farica Zhuang, University of Pennsylvania, United States
  • Danielle Gutman, University of Pennsylvania, United States
  • Nathaniel Islas, University of Pennsylvania, United States
  • Bryan Guzmán, University of North Carolina at Chapel Hill, United States
  • San Jewell, University of Pennsylvania, United States
  • Daniel Dominguez, University of North Carolina at Chapel Hill, United States
  • Yoseph Barash, University of Pennsylvania, United States

Presentation Overview: Show

RNA G-quadruplexes (rG4s) are RNA secondary structures known to play an important role in gene regulation. Despite their importance, the effects of genetic variants on rG4 formation and function remain unexplored. To address this challenge, we introduce G4mer, a deep learning model that predicts transcriptome-wide rG4 formation using high-throughput RT-stop experimental data, rG4-seeker. While computational methods have been developed to predict whether rG4s are likely to form on a given sequence, we show that G4mer outperforms other state-of-the-art models, especially for non-canonical rG4s that do not conform to the consensus GGG-{N-1:7}(3)-GGG motif. Additionally, G4mer offers a computational approach to study the effect of variants on rG4 formation and the association of these variants with diseases. With G4mer, we map variants in the 5’ and 3’ untranslated regions that are predicted to alter rG4 formation. Then, using the Penn Medicine BioBank, we identify those associated with diseases such as breast cancer. By carefully interpreting the learned G4mer model, we identify rG4 length as a significant factor that deviates between experimental data and human rG4s. Finally, we validate the effect of disease-associated rG4-altering variants on protein expression using a dual luciferase assay, and assess the effect of variants on structure formation using Circular Dichroism and RT-stop assays. These experiments point to a potential interplay between structure and sequence motifs affecting downstream gene translation. Overall, our work offers a compelling framework for detecting and validating the functional effects of rG4-altering variants that are significantly associated with diseases.
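
For reference, the canonical consensus mentioned above can be scanned for with a regular expression. The sketch below assumes that GGG-{N-1:7}(3)-GGG means four runs of three guanines separated by three loops of 1 to 7 nucleotides. It is a baseline pattern matcher only, not G4mer, and the function name is an illustrative assumption.

```python
import re

# Four GGG runs separated by three loops of 1-7 nucleotides each.
CANONICAL_RG4 = re.compile(r"G{3}(?:[ACGU]{1,7}G{3}){3}")


def find_canonical_rg4(sequence: str) -> list[tuple[int, str]]:
    """Return (start, match) pairs for non-overlapping canonical rG4 motifs.

    DNA input is accepted: T is converted to U before matching.
    """
    rna = sequence.upper().replace("T", "U")
    return [(m.start(), m.group(0)) for m in CANONICAL_RG4.finditer(rna)]


hits = find_canonical_rg4("AAGGGAGGGAGGGAGGGAA")
no_hits = find_canonical_rg4("AAGGAGGAGGAGGAA")  # GG runs only: not canonical
```

Sequences such as the second example, with two-guanine runs, fall outside this pattern, which is the kind of non-canonical case the abstract says a learned model handles better.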

16:40-17:20
Invited Presentation: Fast and accurate RNA virtual screening using non-canonical RNA base pair interaction networks and graph machine learning
Confirmed Presenter: Jérôme Waldispühl

Room: 519
Format: In Person

Moderator(s): Michelle Scott


Authors List: Show

  • Jérôme Waldispühl

Presentation Overview: Show

RNAs constitute a vast reservoir of mostly untapped drug targets. Structure-based virtual screening (VS) methods are key to screening molecular targets at scale and identifying promising candidate binding molecules. However, this strategy does not scale well with the size of the small-molecule databases and the number of potential RNA targets. Furthermore, this approach is also hampered by the scarcity of RNA 3D structural data.
In this talk, we show that using an augmented classification of RNA base pairs combined with graph machine learning methods enables us to design a new class of algorithms for screening RNAs and promising molecular compounds. We describe a data-driven VS pipeline that deals with the unique challenges of RNA molecules through coarse-grained modeling of 3D structures and heterogeneous training regimes. We demonstrate strong prediction and generalizability of our framework and discuss further expansion of this platform.

17:20-17:40
Proceedings Presentation: Partial RNA Design
Confirmed Presenter: Frederic Runge, University of Freiburg, Germany

Room: 519
Format: In Person

Moderator(s): Michelle Scott


Authors List: Show

  • Frederic Runge, University of Freiburg, Germany
  • Jörg K.H. Franke, University of Freiburg, Germany
  • Daniel Fertmann, University of Freiburg, Germany
  • Rolf Backofen, University of Freiburg, Germany
  • Frank Hutter, University of Freiburg, Germany

Presentation Overview: Show

RNA design is a key technique to achieve new functionality in fields like synthetic biology or biotechnology. Computational tools could help to find such RNA sequences, but they are often limited in their formulation of the search space. In this work, we propose partial RNA design, a novel RNA design paradigm that addresses the limitations of current RNA design formulations. Partial RNA design describes the problem of designing RNAs from arbitrary RNA sequences and structure motifs with different design goals. By separating the design space from the objectives, our formulation enables the design of RNAs with variable lengths and desired properties, while still allowing precise control over sequence and structure constraints at individual positions. Based on this formulation, we introduce a new algorithm, libLEARNA, capable of efficiently solving different constrained RNA design tasks. A comprehensive analysis of various problems, including a realistic riboswitch design task, reveals the outstanding performance of libLEARNA and its robustness.

17:40-17:50
High resolution deconvolution of RNA secondary structure via long read nanopore technology
Confirmed Presenter: J. White Bear, McGill University, Canada

Room: 519
Format: In Person

Moderator(s): Michelle Scott


Authors List: Show

  • J. White Bear, McGill University, Canada
  • Gregoire De Bisschop, IRCM, Canada
  • Eric Lecuyer, IRCM, Canada
  • Jérôme Waldispühl, McGill University, Canada

Presentation Overview: Show

RNAs are known to be highly flexible and take on multiple conformations to perform various tasks and binding in vivo. This makes structural analysis more challenging than with larger, lower-entropy molecules. Structure probing with chemical reagents has been a key tool for developing a deeper understanding of RNA secondary structure. Traditional probing methods use an averaged mutational profile to detect modifications and infer secondary structural features using reverse transcription. However, this requires shortened segment lengths, which can obscure key structural information. Moreover, mutational profiles do not express alternative conformations or indicate optimality and conserved features that may be equally viable or relevant to function. Indeed, the averaged profile may be suboptimal. In our study, we use long-read nanopore technology to directly sequence RNA probed with the reagent acetylimidazole (AcIm). Our software, Dashing Turtle (DT), can identify AcIm modifications at high resolution, inferring secondary structural features and determining diverse conformations. DT applies a unique deconvolution of large RNA samples and examines both conservation and dominance across the sample, potentially yielding optimal conformations. Identifying dominant conformations may further lead to a better understanding of key RNA hybridization strategies that can only be observed in transitional interactions and co-modifications. Additionally, DT can identify conserved states across these conformations that may have key functional implications. Furthermore, our method has the potential for application to time- or phase-based strategies that can help us understand intermediate structures that play key roles in binding or other in vivo activities for both drug delivery and pathogenesis.

17:50-18:00
Conclusion and awards
Room: 519
Format: In person

Moderator(s): Michelle Scott


Authors List: Show

  • Michelle Scott