Posters - Schedules

Monday, July 24, between 18:00 CEST and 19:00 CEST
Tuesday, July 25, between 18:00 CEST and 19:00 CEST
Session A Poster Set-up and Dismantle
Session A Posters set up:
Monday, July 24, between 08:00 CEST and 08:45 CEST
Session A Posters dismantle:
Monday, July 24, at 19:00 CEST
Session B Poster Set-up and Dismantle
Session B Posters set up:
Tuesday, July 25, between 08:00 CEST and 08:45 CEST
Session B Posters dismantle:
Tuesday, July 25, at 19:00 CEST
Wednesday, July 26, between 18:00 CEST and 19:00 CEST
Session C Poster Set-up and Dismantle
Session C Posters set up:
Wednesday, July 26, between 08:00 CEST and 08:45 CEST
Session C Posters dismantle:
Wednesday, July 26, at 19:00 CEST
Virtual
C-301: Pre-mRNA splicing order across long multi-intronic transcripts
Track: iRNA
  • Karine Choquet, Department of Genetics, Blavatnik Institute, Harvard Medical School, United States
  • Autum R. Baxter-Koenigs, Department of Genetics, Blavatnik Institute, Harvard Medical School, United States
  • Brendan M. Smalec, Department of Genetics, Blavatnik Institute, Harvard Medical School, United States
  • L. Stirling Churchman, Department of Genetics, Blavatnik Institute, Harvard Medical School, United States


Presentation Overview:

Pre-mRNA splicing requires the excision of multiple introns within the same nascent transcript. Combinatorially, intron excision could proceed along thousands of different paths, each of which would expose a different landscape of cis-elements that contribute to alternative splicing (AS). How intron splicing proceeds across human pre-mRNAs is not well understood, owing to technical limitations in the quantitative analysis of long RNA molecules. Here, we investigated post-transcriptional splicing order in human cells using direct RNA nanopore sequencing. We found that multi-intron splicing order is not stochastic but largely predetermined, with most genes using only a few predominant splicing orders to reach a fully spliced transcript. Strikingly, splicing orders were conserved across cell types and during motor neuron differentiation, indicating that, for the studied introns, splicing order is fixed. Moreover, splicing orders did not change to accommodate AS; rather, introns flanking exons that were alternatively spliced during differentiation were largely excised last, after their neighboring introns. Interestingly, sequencing of human lymphoblastoid cell lines from different individuals revealed several examples of allele-specific splicing order, suggesting that genetic sequence contributes to splicing order determination. Together, our results demonstrate that multi-intron splicing order is predetermined in human cells and is partially regulated by RNA sequence.
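The contrast between thousands of possible excision orders and a few predominant ones can be illustrated with a small scoring sketch. It assumes, hypothetically, that each partially spliced read has already been reduced to the set of introns it has lost, and it scores every excision order as a product of step-wise transition frequencies estimated from those intermediates. This is a toy illustration of the idea, not the authors' pipeline; `order_scores` and the input format are hypothetical.

```python
from collections import Counter
from itertools import permutations


def order_scores(reads, n_introns):
    """Score every intron excision order from partially spliced intermediates.

    reads: iterable of sets, each holding the indices of introns already
    excised in one read (hypothetical input format).
    """
    counts = Counter(frozenset(r) for r in reads)
    scores = {}
    for order in permutations(range(n_introns)):
        score = 1.0
        spliced = frozenset()
        # The last excision is forced once n-1 introns are gone,
        # so only the first n-1 steps carry information.
        for intron in order[:-1]:
            nxt = spliced | {intron}
            # Observed states with exactly one more intron excised than `spliced`.
            denom = sum(
                c for state, c in counts.items()
                if len(state) == len(spliced) + 1 and spliced < state
            )
            score *= (counts[nxt] / denom) if denom else 0.0
            spliced = nxt
        scores[order] = score
    return scores


# Toy data for a three-intron gene in which intron 0 is usually excised first.
reads = [{0}] * 8 + [{1}] * 2 + [{0, 1}] * 5 + [{0, 2}] * 5
for order, s in sorted(order_scores(reads, 3).items(), key=lambda kv: -kv[1]):
    print(order, round(s, 2))
```

Because it enumerates all n! orders, this brute-force version is only practical for genes with a handful of introns.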

C-302: Redefining normal breast cell populations using long noncoding RNAs
Track: iRNA
  • Mainá Bitar, QIMR Berghofer, Brazil
  • Isabela Almeida, USP (Brazil) / QIMR Berghofer (Australia), Australia
  • Stacey Edwards, QIMR Berghofer, Australia
  • Juliet French, QIMR Berghofer, Australia


Presentation Overview:

Single-cell RNA-seq has allowed unprecedented insight into gene expression across different cell populations in normal tissue and disease states. However, almost all studies rely on annotated gene sets to capture gene expression levels, and sequencing reads that do not align to known genes are discarded. Here, we discover thousands of long noncoding RNAs (lncRNAs) expressed in human mammary epithelial cells and analyze their expression in individual cells of the normal breast. The human mammary epithelium is a highly dynamic tissue composed of three main cell populations (basal, luminal progenitor and luminal mature cells) that can give rise to different breast cancer subtypes. We show that lncRNA expression alone can discriminate between luminal and basal cell types and define subpopulations of both compartments. Clustering cells on lncRNA expression identified additional basal subpopulations compared with clustering on annotated gene expression, suggesting that lncRNAs provide an additional layer of information for distinguishing breast cell subpopulations. In contrast, breast-specific lncRNAs poorly distinguish brain cell populations, highlighting the need to annotate tissue-specific lncRNAs prior to expression analyses. Overall, our results suggest that lncRNAs are an unexplored resource for biomarker and therapeutic target discovery in the normal breast and in breast cancer subtypes.

C-303: Identification and characterization of key long non-coding RNAs involved in cervical cancer progression using a systems biology approach
Track: iRNA
  • Mallikarjuna Thippana, Department of Biotechnology and Bioinformatics, School of Life Sciences, University of Hyderabad, India
  • Vaibhav Vindal, Department of Biotechnology and Bioinformatics, School of Life Sciences, University of Hyderabad, India


Presentation Overview:

Cervical cancer is one of the most common gynecological malignancies worldwide. Its main cause is persistent infection with high-risk human papillomavirus (HPV). However, not all HPV-infected women develop cervical cancer, suggesting that other factors are involved in disease progression. Non-coding elements, such as long non-coding RNAs (lncRNAs), have been implicated in multiple biological processes and diseases, including cervical cancer. In this study, we used a systems biology approach to identify and characterize the key lncRNAs involved in cervical cancer progression. We integrated multiple data types, including gene expression, DNA methylation and clinical information, to construct a comprehensive lncRNA-mRNA co-expression network and a lncRNA-miRNA-mRNA competing endogenous RNA (ceRNA) network. We then applied network analysis methods to prioritize the most relevant lncRNAs and their target genes and miRNAs. We also performed functional enrichment and survival analyses to validate the biological and clinical significance of the identified lncRNAs. Our results revealed several novel lncRNAs that may play essential roles in cervical cancer progression by regulating key pathways and interacting with oncogenes or tumor suppressors. These lncRNAs may serve as potential biomarkers or therapeutic targets and provide new insights into the molecular mechanisms of cervical cancer.
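The abstract does not specify which network statistics were used for prioritization. As one simple, assumed illustration, a lncRNA-mRNA co-expression network can be built by thresholding Pearson correlations across samples and lncRNAs ranked by their degree (number of co-expressed mRNAs). The threshold `r_min=0.8`, the function names and the toy data are hypothetical, and the ceRNA layer is not modeled.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_lncrnas(lnc_expr, mrna_expr, r_min=0.8):
    """Build lncRNA-mRNA co-expression edges and rank lncRNAs by degree.

    lnc_expr / mrna_expr: {gene: [expression per sample]} (hypothetical input).
    """
    edges = []
    degree = {lnc: 0 for lnc in lnc_expr}
    for lnc, x in lnc_expr.items():
        for gene, y in mrna_expr.items():
            r = pearson(x, y)
            if abs(r) >= r_min:  # keep strong positive or negative co-expression
                edges.append((lnc, gene, r))
                degree[lnc] += 1
    ranking = sorted(degree, key=degree.get, reverse=True)
    return ranking, edges


lnc_expr = {"lncA": [1, 2, 3, 4], "lncB": [1, 3, 2, 1]}
mrna_expr = {"m1": [2, 4, 6, 8], "m2": [4, 3, 2, 1], "m3": [1, 1, 2, 1]}
ranking, edges = rank_lncrnas(lnc_expr, mrna_expr)
print(ranking, len(edges))
```

Degree is only one of many centrality measures; hub lncRNAs found this way would still need the enrichment and survival checks described above.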

C-304: Refining snoRNA annotation across eukaryote genomes using machine learning
Track: iRNA
  • Étienne Fafard-Couture, Université de Sherbrooke, Canada
  • Pierre-Étienne Jacques, Université de Sherbrooke, Canada
  • Michelle S Scott, Université de Sherbrooke, Canada


Presentation Overview:

Background/motivation: Noncoding gene annotation often lags behind that of protein-coding genes, particularly for small nucleolar RNAs (snoRNAs), a type of noncoding RNA that regulates ribosome biogenesis. The latest human genome annotation contains only 50% of all snoRNAs reported in the specialized database snoDB. SnoRNA annotations are also often inadequate: we recently showed that more than two-thirds of all annotated snoRNAs are not expressed in humans and are characterized by degenerate motifs and unstable predicted structures. Although these features are incompatible with snoRNA expression (and challenge the very definition of a snoRNA), these non-expressed genes are still annotated as snoRNAs.
Methods/Results: To provide accurate snoRNA annotations across eukaryotes, we designed a neural network-based snoRNA predictor that takes as input the sequence, the structural stability and the presence of characteristic motifs. A total of 1,055 expressed snoRNAs covering 15 organisms (animals, plants, fungi and protists) were retrieved from RNA-seq datasets and the literature. Negative examples were gathered from shuffled snoRNA sequences, random genomic regions and other types of mid-size noncoding RNAs. After optimization, the predictor will be compared to existing tools and applied across all eukaryote genomes.
Conclusion: Our predictor will improve annotations and facilitate comparative studies of snoRNAs across all eukaryote kingdoms.
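Two of the inputs mentioned above, motif presence and shuffled negatives, can be sketched for C/D box snoRNAs, whose consensus boxes are RUGAUGA (C box, near the 5' end) and CUGA (D box, near the 3' end). The 20-nt search window, the function names and the simple mononucleotide shuffle are illustrative assumptions; H/ACA box snoRNAs and the structure-stability feature are not covered, and this is not the authors' feature extractor.

```python
import random
import re

# Consensus C box (RUGAUGA, R = A/G) and D box (CUGA) of C/D box snoRNAs.
C_BOX = re.compile(r"[AG]UGAUGA")
D_BOX = re.compile(r"CUGA")


def motif_features(seq, window=20):
    """Presence of C/D box motifs near the 5' and 3' ends of a candidate."""
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA input
    return {
        "c_box": C_BOX.search(seq[:window]) is not None,
        "d_box": D_BOX.search(seq[-window:]) is not None,
        "length": len(seq),
    }


def shuffled_negative(seq, rng):
    """Mononucleotide shuffle: same base composition, motifs usually destroyed."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


candidate = "AAUGAUGA" + "C" * 30 + "CUGAAA"
print(motif_features(candidate))
print(motif_features(shuffled_negative(candidate, random.Random(0))))
```

A dinucleotide-preserving shuffle is a common, stricter alternative, since it keeps local sequence composition closer to that of real snoRNAs.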

C-305: Splicing Factor Proteins That Connect Differential Exon Usage to Epigenetic Deregulation
Track: iRNA
  • Hanah Robertson, Universität des Saarlandes, Germany


Presentation Overview:

An important source of protein diversity is the alternative inclusion of exons in the mRNA transcripts of a large proportion of human genes. The goal of this study was to examine the association between the transcriptome and the epigenome in the context of alternative splicing. To this end, we analyzed RNA-seq data and histone modification ChIP-seq data for human embryonic stem cells and four in vitro differentiated cell lineages from the ENCODE project, focusing on the exon-intron boundaries of the precursor mRNA transcripts. We identified ‘epispliced’ genes, which show a significant correlation between differential exon usage and differential histone modification; genes with no histone modifications enriched at the exon level were labeled ‘non-epispliced’. We trained a random forest model that classifies exon flanks as belonging to epispliced or non-epispliced genes with 80-90% accuracy, using as features the binding affinities of splicing factors to these regions as predicted by RBPmap. Each inspected splicing factor was predicted to bind to the exon flanks of either epispliced or non-epispliced genes. The classifiers showed good transferability across closely related biosamples.

C-306: Time-series Single-cell RNA-seq Data Analysis using Pseudo-bulk TO-GCN Method
Track: iRNA
  • Yao-Ming Chang, Institute of Biomedical Sciences, Academia Sinica, Taiwan


Presentation Overview:

Time-series single-cell RNA-seq is widely used to study tissue development or how cells respond to stimuli. However, current trajectory inference methods do not account for low RNA detection efficiency, even though fewer than 10% of the RNA molecules in an individual cell can be detected. Instead, these methods construct trajectories based on the similarity of the expression profiles of a few abundant genes. Additionally, they do not take advantage of data continuity but pool together cells from all time points. In this study, we developed a new method that applies TO-GCN (Time-Ordered Gene Co-expression Network) to pseudo-bulk expression profiles of each cell cluster. This method overcomes the low RNA detection efficiency and helps identify key cell clusters. Furthermore, it can uncover regulatory relationships and reveal the dynamics of emerging pathways across the experimental time points.
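The pseudo-bulk step can be sketched as summing raw counts over all cells that share a time point and cluster, then normalizing each summed profile, so that sparse per-cell detection is pooled into more robust cluster-level profiles. The input layout, function names and the counts-per-million normalization are illustrative assumptions; the downstream TO-GCN network construction is not shown.

```python
from collections import defaultdict


def pseudo_bulk(cells):
    """Sum raw counts over all cells sharing a (time point, cluster) label.

    cells: iterable of (time_point, cluster, {gene: count}) tuples.
    """
    bulk = defaultdict(lambda: defaultdict(int))
    for time_point, cluster, counts in cells:
        for gene, count in counts.items():
            bulk[(time_point, cluster)][gene] += count
    return {key: dict(genes) for key, genes in bulk.items()}


def cpm(profile):
    """Counts-per-million normalization of one pseudo-bulk profile."""
    total = sum(profile.values())
    if total == 0:
        return {gene: 0.0 for gene in profile}
    return {gene: count * 1e6 / total for gene, count in profile.items()}


cells = [
    (0, "A", {"g1": 1, "g2": 0}),
    (0, "A", {"g1": 2, "g2": 1}),
    (1, "A", {"g1": 0, "g2": 3}),
]
bulk = pseudo_bulk(cells)
print(bulk[(0, "A")], cpm(bulk[(0, "A")]))
```

Ordering the normalized profiles of each cluster by time point would then yield the time series on which a time-ordered co-expression analysis can operate.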

C-307: Variability of nonsense-mediated mRNA decay (NMD) pathway efficiency in human cancers
Track: iRNA
  • Guillermo Palou Márquez, Institute for Research in Biomedicine (IRB Barcelona), Spain
  • Fran Supek, Institute for Research in Biomedicine (IRB Barcelona), Spain


Presentation Overview:

The nonsense-mediated mRNA decay (NMD) pathway is a critical mRNA surveillance mechanism responsible for degrading transcripts that contain premature termination codons (PTCs), thereby reducing the production of potentially harmful truncated proteins. It can also regulate natural endogenous (non-PTC) transcripts that carry specific NMD-eliciting features. There is some evidence that NMD efficiency (NMDeff) can vary across human individuals and tissues, which might affect the phenotypic severity of several diseases, including cancer. Here, we present a systematic quantification of NMDeff variability across 33 tumor types and 54 normal tissues. We use matched whole-exome sequencing (WES) and RNA-seq data to estimate per-sample NMDeff with two independent statistical methods based on i) the levels of endogenous natural NMD target transcripts and ii) allele-specific expression (ASE) of PTCs. We show that NMDeff varies significantly across tumors and normal tissues. Through somatic genetic association analyses, we find that copy-number variation plays a role in dysregulating NMD pathway activity in some cancers. For instance, a common 1q amplification, which encompasses the genes encoding the NMD factors SMG5 and SMG7, is associated with reduced NMDeff. In conclusion, we have implemented two statistical methods to quantify NMDeff variability across individuals and tissues and to detect some of its genetic underpinnings.
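The intuition behind an allele-specific expression estimate of NMD efficiency can be sketched as follows. At a heterozygous PTC, NMD depletes the PTC-carrying allele in RNA, so the log ratio of wild-type to PTC-allele reads grows with NMD efficiency, and a per-sample summary can be taken across that sample's PTCs. This is an assumed simplification, not the authors' statistical model: it ignores DNA allele fractions, tumor purity and copy number, and the pseudocount and function names are hypothetical.

```python
import math
import statistics


def ase_nmd_score(wt_reads, ptc_reads, pseudocount=0.5):
    """log2 ratio of wild-type to PTC-allele RNA reads at one heterozygous PTC.

    Larger values mean the PTC allele is depleted, as expected under NMD.
    """
    return math.log2((wt_reads + pseudocount) / (ptc_reads + pseudocount))


def sample_nmdeff(variants):
    """Median score over a sample's (wt_reads, ptc_reads) PTC variants."""
    return statistics.median([ase_nmd_score(wt, ptc) for wt, ptc in variants])


sample = [(30, 10), (20, 20), (40, 5)]
print(round(sample_nmdeff(sample), 3))
```

The median is used here as a robust summary because individual PTCs can escape NMD (for example, PTCs in the last exon), which would otherwise pull a mean downward.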

C-308: Probabilistic models of RNA•DNA:DNA triplex formation accurately predict genome-wide RNA-DNA interactions
Track: iRNA
  • Timothy Warwick, Goethe University Frankfurt, Institute for Cardiovascular Physiology, Germany
  • Sandra Seredinski, Goethe University Frankfurt, Institute for Cardiovascular Physiology, Germany
  • Nina M. Krause, Goethe University Frankfurt, Institute for Organic Chemistry and Chemical Biology, Germany
  • Jasleen Kaur Bains, Goethe University Frankfurt, Institute for Organic Chemistry and Chemical Biology, Germany
  • Harald Schwalbe, Goethe University Frankfurt, Institute for Organic Chemistry and Chemical Biology, Germany
  • Matthias S. Leisegang, Goethe University Frankfurt, Institute for Cardiovascular Physiology, Germany
  • Marcel H. Schulz, Goethe University Frankfurt, Institute for Cardiovascular Regeneration, Germany
  • Ralf P. Brandes, Goethe University Frankfurt, Institute for Cardiovascular Physiology, Germany


Presentation Overview:

Background:

RNA•DNA:DNA triple helix (triplex) formation enables RNA transcripts to modulate the local chromatin environment. Molecular detection of triplex formation is complex, making computational prediction of triplex formation important. Previous predictive methods relied upon Hoogsteen base-pairing rules. We explored whether machine learning in conjunction with triplex-sequencing data generated in vitro could improve prediction of triplex formation.

Methods:

Triplex-enriched DNA and RNA motifs were identified from unpaired triplex-sequencing data and used as input to an Expectation-Maximisation algorithm, which learned probabilistic matrices linking sets of DNA and RNA motifs. Matrix error was calculated at each iteration and minimised, and the final matrices and motif sets were output once the error was minimised. The output matrices were implemented as score matrices in the local alignment program TriplexAligner, which uses Karlin-Altschul statistics to predict triplex formation between user-provided DNA and RNA sequences.
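The alignment-plus-statistics step can be sketched as a Smith-Waterman local alignment driven by a DNA-RNA score function, followed by the Karlin-Altschul E-value E = K m n e^(-λS), the expected number of chance local alignments scoring at least S between sequences of lengths m and n. TriplexAligner's learned matrices and its fitted λ and K are not reproduced here: `toy_score`, the gap penalty and the λ = 0.5, K = 0.1 values are hypothetical placeholders.

```python
import math


def toy_score(dna_base, rna_base):
    """Hypothetical stand-in for a learned DNA-RNA score matrix."""
    rna_base = "T" if rna_base == "U" else rna_base
    return 2 if dna_base == rna_base else -1


def local_alignment_score(dna, rna, score=toy_score, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(rna) + 1)
    best = 0
    for i in range(1, len(dna) + 1):
        curr = [0] * (len(rna) + 1)
        for j in range(1, len(rna) + 1):
            curr[j] = max(
                0,  # local alignment: restart instead of going negative
                prev[j - 1] + score(dna[i - 1], rna[j - 1]),  # (mis)match
                prev[j] + gap,  # DNA base aligned to a gap
                curr[j - 1] + gap,  # RNA base aligned to a gap
            )
            best = max(best, curr[j])
        prev = curr
    return best


def karlin_altschul_evalue(s, m, n, lam, k):
    """Expected number of chance local alignments with score >= s."""
    return k * m * n * math.exp(-lam * s)


dna, rna = "GGAGGA", "GGAGGA"
s = local_alignment_score(dna, rna)
print(s, karlin_altschul_evalue(s, len(dna), len(rna), lam=0.5, k=0.1))
```

Because E scales with m n, the same alignment score becomes less significant when searching a whole genome than a single locus, which is why genome-wide predictions rely on E-values rather than raw scores.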

Results:

TriplexAligner significantly outperformed previously published methods in recalling genome-wide RNA-DNA interactions identified by RADICL-seq or RedC, as well as specific interactions of the lncRNA SARRAH. Predicted triplex-forming DNA and RNA sequences were evaluated biophysically and appeared to form valid triplexes.

Outlook:

DNA-RNA pairing rules learned from triplex-sequencing data accurately predict RNA-DNA interactions. Applications of TriplexAligner could elucidate the mechanisms of RNA action and the potential importance of triplex formation.

C-309: Single cell transcriptome sequencing of stimulated and frozen human peripheral blood mononuclear cells
Track: iRNA
  • Céline Derbois, CEA/CNRGH, France
  • Marie-Ange Palomares, CEA/CNRGH, France
  • Jean-François Deleuze, CEA/CNRGH, France
  • Eric Cabannes, CEA/CNRGH, France
  • Eric Bonnet, CEA/CNRGH, France


Presentation Overview:

Peripheral blood mononuclear cells (PBMCs) are a critical part of the immune system, fighting infection and defending the body against harmful pathogens. In biomedical research, PBMCs are commonly used to study global immune responses to disease onset and progression and to pathogen infections, as well as for vaccine development and a multitude of other clinical applications. Over the past few years, the revolution in single-cell RNA sequencing (scRNA-seq) has enabled unbiased quantification of gene expression in thousands of individual cells, providing a more efficient tool to decipher the immune system in human diseases. In this work, we generated scRNA-seq data from human PBMCs at high sequencing depth (>100,000 reads/cell) for more than 30,000 cells in resting, stimulated, fresh and frozen conditions. These data can be used to benchmark batch correction and data integration methods and to study the effect of freeze-thaw cycles on the quality of immune cell populations and their transcriptomic profiles.

C-310: grandR: a comprehensive package for nucleotide conversion RNA-seq data analysis
Track: iRNA
  • Teresa Rummel, Institute of Virology and Immunobiology, Julius-Maximilians-Universität, Würzburg, Germany
  • Lygeri Sakellaridi, Institute of Virology and Immunobiology, Julius-Maximilians-Universität, Würzburg, Germany
  • Florian Erhard, Computational Immunology, Universität Regensburg, Regensburg, Germany


Presentation Overview:

Metabolic RNA labeling is a powerful method to investigate the temporal dynamics of gene expression. The introduction of nucleotide conversion RNA-seq methods, such as SLAM-seq, has greatly reduced the experimental effort but has also brought new challenges for data analysis. Another layer of complexity is added when elaborate experimental designs, such as time courses with multiple genotypes and treatments, are required to answer complex research questions. Yet appropriate computational tools for analyzing such data are lacking. To address this need, we developed grandR, a comprehensive toolkit for the analysis of nucleotide conversion RNA-seq data.
With our software, we also introduce new quality control measures to exclude effects of 4sU on transcription, and we describe the need to recalibrate effective labeling times, which would otherwise bias results. grandR enables researchers to perform differential gene expression analysis and to estimate synthesis and degradation rates for both progressive labeling and snapshot experiments. Additionally, our software provides a web-based interface for exploratory data analysis.
grandR represents a significant advance in the analysis of nucleotide conversion RNA-seq data, enabling researchers to gain deeper insights into the temporal dynamics of gene expression and accelerating progress in many areas of biomedical research.
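For a snapshot experiment, the link between labeling data and kinetic rates can be sketched with the standard steady-state model. If a gene is in steady state with degradation rate k, the new-to-total RNA ratio (NTR) after an effective labeling time t is NTR = 1 - e^(-kt), so k = -ln(1 - NTR)/t, the half-life is ln 2 / k, and the synthesis rate is k times the total expression. This is a Python illustration of that textbook model under the steady-state assumption, not grandR's R interface; the function names are hypothetical.

```python
import math


def degradation_rate(ntr, labeling_time):
    """Degradation rate k from a new-to-total ratio at steady state.

    NTR = 1 - exp(-k * t)  =>  k = -ln(1 - NTR) / t
    """
    if not 0 <= ntr < 1:
        raise ValueError("NTR must lie in [0, 1) for a finite rate")
    return -math.log(1 - ntr) / labeling_time


def half_life(k):
    """RNA half-life implied by a first-order degradation rate k."""
    return math.log(2) / k


def synthesis_rate(ntr, labeling_time, total_expression):
    """At steady state, synthesis s = k * total expression."""
    return degradation_rate(ntr, labeling_time) * total_expression


k = degradation_rate(0.5, 2.0)  # 50% new RNA after a 2 h effective labeling time
print(k, half_life(k), synthesis_rate(0.5, 2.0, 100.0))
```

Since t enters k directly, any error in the effective labeling time propagates one-to-one into the estimated rates, which is why recalibrating it matters.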

C-311: Molecular function of the non-coding RNAs snord116 involved in Prader-Willi syndrome
Track: iRNA
  • Laeya Baldini, Université de Lorraine, CNRS, IMoPA, F-54000 Nancy, France
  • Hélène Marty-Capelle, Université de Lorraine, CNRS, IMoPA, F-54000 Nancy, France
  • Anne Robert, Université de Lorraine, CNRS, IMoPA, F-54000 Nancy, France
  • Bruno Charpentier, Université de Lorraine, CNRS, IMoPA, F-54000 Nancy, France
  • Stéphane Labialle, Université de Lorraine, CNRS, IMoPA, F-54000 Nancy, France


Presentation Overview:

C/D-box small nucleolar RNAs (SNORDs) classically direct the post-transcriptional methylation of nucleosides in ribosomal RNAs, small nuclear RNAs and transfer RNAs. However, the human genome produces numerous orphan SNORDs that lack the ability to interact with classical RNA targets and whose function is poorly understood. The eutherian-specific orphan SNORD116 genes, which are organized in a large tandem repeat at 15q11-q13, are strongly suspected to play a major role in the rare disease Prader-Willi syndrome (PWS), but their molecular function remains unknown. We combined phylogenetic and computational interaction analyses to reveal that a subset of snord116 copies use an antisense element, the region typically involved in classical target recognition, to interact with messenger RNA (mRNA) targets. Target status was confirmed by transient knockdown and compensation experiments in human and mouse cells. We are now working to characterize the molecular mechanism of snord116 action and to determine the extent of its target repertoire. This combination of computational and experimental approaches expands the description of the molecular bases of PWS and opens new avenues for therapy. More generally, this approach could be applied to the functional characterization of other noncoding RNAs, particularly those expressed from multiple gene copies.
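A first-pass screen for mRNA sites that an antisense element could base-pair with can be sketched as a scan for windows matching the element's reverse complement, optionally tolerating a few mismatches. This is a deliberately minimal, assumed illustration: it ignores G-U wobble pairs, RNA structure and conservation, it is not the authors' phylogenetic or interaction-analysis pipeline, and the sequences and `max_mismatches` parameter are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA sequence (DNA input is converted first)."""
    return rna.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


def find_target_sites(antisense, mrna, max_mismatches=0):
    """Start positions of mRNA windows complementary to an antisense element."""
    site = reverse_complement(antisense)
    mrna = mrna.upper().replace("T", "U")
    hits = []
    for start in range(len(mrna) - len(site) + 1):
        window = mrna[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


antisense = "AUGCUA"  # hypothetical antisense element
mrna = "GGUAGCAUCC"  # hypothetical mRNA fragment
print(reverse_complement(antisense), find_target_sites(antisense, mrna))
```

Real SNORD antisense elements are typically 10-21 nt long, so exhaustive scans over a transcriptome produce many chance hits; filtering by conservation across species is one way to reduce them.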

C-312: TSS-Captur - A Transcription Starting Site-based Characterization Pipeline for Transcribed but Unclassified Prokaryotic RNA transcripts
Track: iRNA
  • Mathias Witte Paz, Institute for Bioinformatics and Medical Informatics - University of Tübingen, Germany
  • Kay Nieselt, Institute for Bioinformatics and Medical Informatics - University of Tübingen, Germany


Presentation Overview:

RNA-seq and its enrichment-based variants, such as differential RNA-seq, have enabled the base-exact identification of transcription starting sites (TSSs) and have improved gene expression analysis. However, some TSSs cannot be associated with annotated genes; these are called orphan TSSs. Characterizing transcripts that start at these positions is challenging for existing computational annotation pipelines. TSS-Captur, a novel pipeline, uses different computational approaches to characterize transcripts starting from experimentally confirmed orphan TSSs, with a specific focus on non-coding RNA genes. TSS-Captur uses two methods to classify extracted transcripts as coding or non-coding and predicts the transcription termination site of each putative transcript. For each predicted ncRNA gene, the secondary structure is computed. Furthermore, putative promoter regions are scanned for known transcription regulatory motifs. The results are presented in an interactive interface for easy exploration. TSS-Captur was tested on Streptomyces coelicolor data and successfully characterized unannotated ncRNAs overlooked by common genome annotation pipelines. TSS-Captur also characterized more unannotated transcripts, in greater detail, than another similar pipeline. In summary, starting from experimental TSS data, TSS-Captur characterizes unclassified transcription signals and complements prokaryotic annotation tools, contributing to the understanding of bacterial transcriptomes.
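The abstract does not name TSS-Captur's two coding-potential methods. As a generic illustration only, one classic heuristic is the length of the longest open reading frame in the transcript, sketched below with the bacterial start codons ATG, GTG and TTG. The 90-nt threshold and the function names are hypothetical, and real classifiers also weigh codon usage and conservation.

```python
STARTS = {"ATG", "GTG", "TTG"}
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Length in nt (stop codon included) of the longest ORF on the given strand."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i  # first start codon after the previous stop
            elif start is not None and codon in STOPS:
                best = max(best, i + 3 - start)
                start = None
    return best


def classify_transcript(seq, min_orf_nt=90):
    """Label a transcript 'coding' if it holds a sufficiently long ORF."""
    return "coding" if longest_orf(seq) >= min_orf_nt else "non-coding"


short = "CCATG" + "A" * 15 + "TAAGG"
long_orf = "ATG" + "AAA" * 30 + "TAA"
print(longest_orf(short), classify_transcript(short))
print(longest_orf(long_orf), classify_transcript(long_orf))
```

ORFs that run off the end of the transcript without a stop codon are ignored here, which is a simplifying choice.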

C-313: Exploring the Role of Alternative Splicing in Multiple Sclerosis: Identifying sQTLs through Bioinformatic Analysis
Track: iRNA
  • Cesar Lobato-Fernandez, University of Navarra, Spain
  • Angel Rubio, University of Navarra, Spain
  • David Otaegui, Biodonostia Health Research Institute, Spain
  • Maider Muñoz-Culla, Biodonostia Health Research Institute, Spain


Presentation Overview:

Alternative splicing (AS) may be involved in genetic diseases, both as a causal factor and as a potential therapeutic avenue. In autoimmune diseases such as multiple sclerosis, AS behaves aberrantly and plays an important role in symptom characterization. Despite significant progress, the regulatory mechanisms of AS remain an open area of research involving several related biological phenomena.
Our aim is to identify single-nucleotide polymorphisms (SNPs) that influence the regulation of AS events. To this end, we studied three groups: sporadic multiple sclerosis, relapsing multiple sclerosis and a control group.
We developed a bioinformatic pipeline that identifies both annotated and de novo AS events using our EventPointer software. This pipeline combines information on differential events, the differential usage of alleles at cis-acting SNPs, SpliceAI (a machine learning method that predicts mutations that affect splicing) and sQTL databases.
As a result, 299 SNPs were found in possible regulatory regions of 275 AS events that are differentially expressed between patient groups, yielding a set of potential sQTLs that may influence the progression or symptoms of multiple sclerosis. Interestingly, an enrichment analysis of the genes harboring these events showed that immune system response was overrepresented.
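A common first-pass check for differential allele usage at a heterozygous SNP is an exact two-sided binomial test of the allele-specific read counts against the 50:50 expectation. The abstract does not state which test the pipeline applies, so this is an assumed, generic version; `allelic_imbalance_p` is a hypothetical name.

```python
from math import comb


def allelic_imbalance_p(ref_reads, alt_reads, p=0.5):
    """Two-sided exact binomial test of ref vs alt allele read counts."""
    n = ref_reads + alt_reads
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the probabilities of all outcomes at least as unlikely as the
    # observed one (small relative tolerance guards against float ties).
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


print(allelic_imbalance_p(10, 0), allelic_imbalance_p(5, 5), allelic_imbalance_p(18, 6))
```

Across many SNPs the resulting p-values would still need multiple-testing correction, and reference-mapping bias can inflate apparent imbalance.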

C-314: Kinetic barcoding: A novel tool to estimate multi-temporal RNA biogenesis kinetics
Track: iRNA
  • Ezequiel Calvo-Roitberg, UMass Chan Medical School, United States
  • Adam Hedger, UMass Chan Medical School, United States
  • Jonathan K Watts, UMass Chan Medical School, United States
  • Athma A Pai, UMass Chan Medical School, United States


Presentation Overview:

The speed of mRNA production is determined by the time it takes to transcribe and process pre-mRNA molecules. Methods for measuring RNA maturation involve sequencing RNA intermediates at different time points. However, current techniques have limited temporal resolution, which makes it difficult to measure very fast biogenesis rates. Additionally, these methods cannot measure the variability of elongation rates within the same gene. To address these issues, we developed "kinetic barcoding", an approach that uses stepwise labeling of nascent RNA with multiple nucleoside analogs to capture multiple time points within a single sequencing library. By sequentially adding 5-ethynyl-uridine (5eU), 6-thio-guanosine (6sG) and 4-thio-uridine (4sU) at different time points, we can measure nascent RNA intermediates at multiple time scales in a single experiment. We isolate nascent RNA by biotinylating and pulling down 5eU-labeled RNA, followed by alkylation of the 6sG and 4sU thiol groups to generate nucleotide-specific substitutions where these nucleosides were incorporated. This allows us to distinguish molecules transcribed during the first, second and final labeling windows by their substitution patterns. We applied this kinetic barcoding approach to measure transcription elongation rates in human cells and showed that it provides increased temporal resolution for measuring variation in elongation rates between genes.
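The decoding of labeling windows from substitution patterns can be sketched as a simple rule on per-read substitution counts. The abstract does not give the decoding rules, so this assumes that each analog stays in the medium once added (later windows carry all earlier labels) and that 4sU and 6sG are read out as T>C and G>A substitutions respectively, the conversions commonly reported for these analogs after thiol chemistry; 5eU serves for enrichment rather than substitutions. The `min_events` threshold and function name are hypothetical.

```python
def assign_window(t_to_c, g_to_a, min_events=1):
    """Assign a 5eU-enriched read to a labeling window from its substitutions.

    Assumes 4sU -> T>C, 6sG -> G>A, and that labels accumulate over time.
    """
    if t_to_c >= min_events:
        return "final"  # 4sU was added last, so any 4sU signal means window 3
    if g_to_a >= min_events:
        return "second"  # 6sG signal without 4sU signal
    return "first"  # no substitution signature: only 5eU present


reads = [(2, 1), (0, 3), (0, 0), (1, 0)]  # (T>C count, G>A count) per read
print([assign_window(tc, ga) for tc, ga in reads])
```

Hard thresholds misassign reads when analog incorporation is sparse (a read from a later window can carry no substitutions by chance), so a practical analysis would model per-window incorporation rates probabilistically instead.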

C-315: A single-cell and spatio-temporal characterization of astrocyte heterogeneity in brain metastasis
Track: iRNA
  • Carolina Hernández-Oliver, Spanish National Cancer Research Center, Spain
  • Fátima Al-Shahrour, Spanish National Cancer Research Center, Spain
  • Manuel Valiente, Spanish National Cancer Research Center, Spain


Presentation Overview:

Brain metastasis (BrM)-associated astrocytes exert a modulatory effect on the innate and acquired immune systems. However, little is known about how general and region-specific functions are aligned at the single-cell level, or about how sub-states are spatially distributed in the brain. In this study, we isolated adult mouse astrocytes by ACSA-2-mediated magnetic-activated cell sorting (MACS) to investigate transcriptome changes in three different brain metastasis mouse models. Single-cell RNA sequencing revealed seven gene expression clusters of BrM-associated astrocytes shared across metastases of breast, lung and melanoma origin, revealing a previously unappreciated complexity within this glial cell type. Unsupervised uniform manifold approximation and projection (UMAP) analysis and the expression of canonical markers after clustering allowed us to characterize them further. We uncovered an interferon (IFN)-responsive astrocyte (IRA) subpopulation characterized by IFN-γ-induced JAK/STAT1 axis dysregulation, as well as high expression of the complement component C4b, previously associated with response to injury. Integration of single-cell and spatial transcriptomics at early and late disease timepoints by SPOTlight deconvolution showed that some sub-states are widespread while others are regionally restricted. Specifically, the IRA subpopulation was present exclusively at the late timepoint and restricted to the peritumoral area. This study provides unprecedented insights into the glial compartment of the brain metastasis microenvironment.

C-316: RNA splicing analysis using heterogeneous and large RNA-seq datasets
Track: iRNA
  • Jorge Vaquero-Garcia, Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA, Spain
  • Joseph Aicher, Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA, United States
  • San Jewell, Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA, United States
  • Matthew Gazzara, Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA, United States
  • Caleb Radens, Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA, United States
  • Anupama Jha, Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA, United States
  • Scott Norton, Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA., United States
  • Nicholas Lahens, Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA., United States
  • Gregory Grant, Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA., United States
  • Yoseph Barash, Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA., United States


Presentation Overview:

The ubiquity of RNA-seq has led to many methods that use RNA-seq data to analyze variations in RNA splicing. However, available methods are not well suited for handling heterogeneous and large datasets. Such datasets scale to thousands of samples across dozens of experimental conditions, exhibit greater variability than biological replicates, and involve thousands of unannotated splice variants that increase transcriptome complexity. In a recent publication [1], we describe a suite of algorithms and tools implemented in the MAJIQ v2 package to address challenges in the detection, quantification and visualization of splicing variations from such datasets. Using both large-scale synthetic data and GTEx v8 as benchmark datasets, we assess the advantages of MAJIQ v2 compared to existing methods. We then apply the MAJIQ v2 package to analyze differential splicing across 2,335 samples from 13 brain subregions, demonstrating its ability to offer insights into brain subregion-specific splicing regulation.
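Per-junction inclusion levels (PSI) within a local splicing variation, and their change between conditions (dPSI), can be sketched with a simplified Bayesian estimate: the posterior mean of a symmetric Dirichlet prior updated with junction read counts. MAJIQ v2's actual model is more involved and is not reproduced here; the prior strength `alpha=1.0`, the function names and the read counts are hypothetical.

```python
def psi_posterior_mean(junction_reads, alpha=1.0):
    """Posterior-mean PSI of each junction in one local splicing variation.

    junction_reads: {junction_id: read count}; alpha: symmetric Dirichlet prior.
    """
    total = sum(junction_reads.values()) + alpha * len(junction_reads)
    return {j: (r + alpha) / total for j, r in junction_reads.items()}


def delta_psi(cond_a, cond_b, alpha=1.0):
    """PSI change (B minus A) per junction; both inputs share junction ids."""
    psi_a = psi_posterior_mean(cond_a, alpha)
    psi_b = psi_posterior_mean(cond_b, alpha)
    return {j: psi_b[j] - psi_a[j] for j in psi_a}


cond_a = {"j1": 30, "j2": 10}
cond_b = {"j1": 10, "j2": 30}
print(psi_posterior_mean(cond_a), delta_psi(cond_a, cond_b))
```

The prior pulls estimates toward equal usage when coverage is low, which matters for large heterogeneous datasets where many samples have few reads per junction.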

C-317: A Unified MAJIQ-L View of Transcriptome Complexity from Short and Long RNA-seq Reads
Track: iRNA
  • Seong Woo Han, University of Pennsylvania, United States
  • San Jewell, University of Pennsylvania, United States
  • Yoseph Barash, University of Pennsylvania, United States


Presentation Overview:

High-throughput short-read RNA sequencing has given researchers unprecedented capabilities for detecting and quantifying splicing variations across biological conditions and disease states. However, short-read technology is limited in its ability to identify which isoforms give rise to the observed sequence fragments and how splicing variations across a gene are related. In contrast, more recent long-read sequencing technology offers improved detection of the underlying full or partial isoforms but is limited by high error rates and low throughput, hindering its ability to accurately detect and quantify all splicing variations in a given condition.

To better understand the underlying isoforms and splicing changes in a given biological condition, it is important to combine the results of both short- and long-read sequencing with the annotation of known isoforms. To address this need, we developed MAJIQ-L, a tool to visualize and quantify splicing variations from multiple data sources. MAJIQ-L combines transcriptome annotation, the output of long-read-based isoform detection tools, and MAJIQ-based (Vaquero-Garcia et al. (2016, 2023)) short-read RNA-seq analysis of local splicing variations (LSVs). We analyze which splice junctions are supported by which types of evidence (known isoforms, short reads, long reads), followed by an analysis of matched short- and long-read human cell line datasets.
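The evidence comparison described above can be sketched as set operations over splice junctions from the three sources, mapping each junction to the evidence types that support it and tallying the combinations. Representing junctions as (donor, acceptor) coordinate tuples, the function name and the example coordinates are hypothetical; MAJIQ-L's own data model is not reproduced.

```python
from collections import Counter


def junction_evidence(annotated, short_read, long_read):
    """Map each splice junction to the set of evidence types supporting it.

    Junctions are hypothetical (donor, acceptor) coordinate tuples.
    """
    sources = {
        "annotation": set(annotated),
        "short reads": set(short_read),
        "long reads": set(long_read),
    }
    all_junctions = set().union(*sources.values())
    return {
        junction: frozenset(name for name, js in sources.items() if junction in js)
        for junction in all_junctions
    }


evidence = junction_evidence(
    annotated={(100, 200), (300, 400)},
    short_read={(100, 200), (100, 400)},
    long_read={(100, 200), (500, 600)},
)
# Tally how many junctions fall into each evidence combination.
summary = Counter(" + ".join(sorted(kinds)) for kinds in evidence.values())
print(summary)
```

Junctions supported only by long reads are natural candidates for closer inspection, since they may be novel isoforms or sequencing errors.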

C-318: Dysregulated alternative splicing and RNA-binding proteins as potential biomarkers and therapeutic targets for glioblastoma therapy
Track: iRNA
  • Laura Jareño, Tecnun - University of Navarra, Spain
  • Ana Azcue, Tecnun - University of Navarra, Spain
  • Angel Rubio, Tecnun - University of Navarra, Spain
  • Lorea Blazquez, Biodonostia Health Research Institute, Spain
  • Juan Angel Ferrer-Bonsoms, Tecnun - University of Navarra, Spain


Presentation Overview

Glioblastoma is an aggressive type of brain tumour with a poor prognosis. Alternative splicing of pre-mRNA has been widely reported to be associated with the progression of malignant tumours, including glioblastoma. We therefore reasoned that dysregulated alternative splicing and its regulatory RNA-binding proteins (RBPs) may provide potential biomarkers and therapeutic targets in glioblastoma. Using FASTQ files provided by the Chinese Glioma Genome Atlas (CGGA), we quantified alternative splicing events with EventPointer (an R package developed in our group). In the CGGA dataset, we confirmed previous observations of dysregulated alternative splicing, with autophagosome membrane docking being the most enriched function among the genes associated with the dysregulated events. In addition, we performed an enrichment analysis of the RBPs potentially regulating these events and identified brain-abundant and brain-specific RBPs required for neuronal development and function. Specifically, CELF4, HNRNPR, NOVA2, RBFOX1, QKI, NOVA1 and RBFOX2 appeared to be the RBPs regulating the dysregulated events, leading to poor overall survival and directly affecting proliferation, migration and invasion in glioblastoma. Among the dysregulated events, retained introns stand out; together with the aforementioned RBPs, they may lead to the discovery of potential therapeutic targets.

C-319: Natural therapeutics: Cross-kingdom miRNA and its use to target the RNA landscape of glioblastoma
Track: iRNA
  • Vanessza Fentor, Cancer Research UK Edinburgh Centre, Institute of Genetics & Cancer, University of Edinburgh, United Kingdom
  • Ted Hupp, Cancer Research UK Edinburgh Centre, Institute of Genetics & Cancer, University of Edinburgh, United Kingdom
  • Javier Alfaro, International Centre for Cancer Vaccine Science, University of Gdansk, Poland
  • Paul Brennan, Centre for Clinical Brain Sciences, Edinburgh Neuro-Oncology, University of Edinburgh, United Kingdom


Presentation Overview

The development of cancer cells is frequently influenced by the surrounding microenvironment. Thus, understanding how to disrupt the microenvironmental stressors that promote tumour growth through RNA expression is crucial. In glioblastoma multiforme (GBM), a hostile tumour microenvironment (TME) generated by hypoxia and tumour-promoting immune cells has been shown to increase the stem-like state of tumour cells by activating pro-migratory and pro-invasive factors. In addition, hypoxia increases the risk of poor therapy response or resistance. The resulting immunosuppression in the TME potentiates tumour maintenance, progression, recurrence and resistance to therapy, posing a major dilemma for immunotherapy and new drug development. Here, we explore alternative natural medicines to combat tumour-supportive microenvironments and enhance therapeutic outcomes in glioblastoma. We use transcriptomics to define novel hypoxia-induced RNA transcripts in GBM cell models and use a deep learning approach to identify medicinal plant miRNAs that could be used to regulate the hypoxia-induced RNA response. Our research has revealed numerous miRNAs in various edible plant species that exhibit potential regulatory effects on genes of interest from our tissue data. We are currently selecting targets for experimental validation in hypoxic stem cell lines.

C-320: Maturation study of organoids from single-cell RNA-seq data analysis
Track: iRNA
  • Solène Brohard, CEA/Jacob/CNRGH, France
  • Camille Lemercier, CEA/Jacob/CNRGH, France
  • Alexandre Hubert, CEA/Jacob/CNRGH, France
  • Jean-Francois Deleuze, CEA/Jacob/CNRGH, France
  • Eric Bonnet, CEA/Jacob/CNRGH, France


Presentation Overview

Organoids are complex 3D structures derived from stem cells, with architectures and functionalities similar to those of in vivo organs. Organ-specific studies have confirmed the structure and composition of mature organoids by comparing the cell populations found in organoids with those found in the target organ. In this context, we aim to study organoid maturation by applying descriptive and statistical methods to single-cell RNA-seq data from organoids at different stages of maturation.
We used public single-cell RNA-seq datasets of kidney organoids at different stages (time points) of maturation (Subramanian et al., Nature Communications, 2019). We integrated all the datasets to highlight the cell specificities of each maturation stage. We also inferred trajectories to examine correlations between cell differentiation and maturation time point. Finally, we used statistical methods to explore the variability of cell proportions across maturation stages.
The long-term goal of the project is to establish a computational method that can predict the maturation time point of organoids.
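A minimal sketch of one way to quantify how cell-type proportions vary across maturation stages: tabulate annotated cells per stage and compute a Pearson chi-square statistic of independence between stage and cell type. The (stage, cell_type) input format and the stage labels are hypothetical, and the poster does not state which statistical test was used; a p-value would come from the chi-square distribution with the returned degrees of freedom (for example via SciPy).

```python
from collections import Counter


def count_table(cells):
    """cells: iterable of (stage, cell_type) pairs -> {stage: Counter(cell_type -> n)}."""
    table = {}
    for stage, cell_type in cells:
        table.setdefault(stage, Counter())[cell_type] += 1
    return table


def proportions(table):
    """Convert per-stage counts into per-stage fractions."""
    return {
        stage: {ct: n / sum(counts.values()) for ct, n in counts.items()}
        for stage, counts in table.items()
    }


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for stage x cell type."""
    stages = sorted(table)
    types = sorted({ct for counts in table.values() for ct in counts})
    row_tot = {s: sum(table[s].values()) for s in stages}
    col_tot = {t: sum(table[s][t] for s in stages) for t in types}  # Counter gives 0 if absent
    total = sum(row_tot.values())
    stat = 0.0
    for s in stages:
        for t in types:
            expected = row_tot[s] * col_tot[t] / total
            stat += (table[s][t] - expected) ** 2 / expected
    return stat, (len(stages) - 1) * (len(types) - 1)


# Hypothetical annotated cells from two maturation time points
cells = ([("day7", "progenitor")] * 8 + [("day7", "podocyte")] * 2
         + [("day14", "progenitor")] * 2 + [("day14", "podocyte")] * 8)
table = count_table(cells)
print(proportions(table)["day7"])
print(chi_square(table))
```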

C-321: A Systematic Identification of RBPs Driving Aberrant Splicing in Cancer
Track: iRNA
  • César Lobato-Fernández, TECNUN-Universidad de Navarra, Spain
  • Marian Gimeno, TECNUN-Universidad de Navarra, Spain
  • Ane San Martín, TECNUN-Universidad de Navarra, Spain
  • Ana Anorbe, TECNUN-Universidad de Navarra, Spain
  • Angel Rubio, TECNUN-Universidad de Navarra, Spain
  • Juan Ferrer-Bonsoms, TECNUN-Universidad de Navarra, Spain


Presentation Overview

Alternative splicing (AS) is a co- and post-transcriptional process that generates multiple mRNA and protein isoforms from a single gene. AS is regulated by various mechanisms, with RNA-binding proteins (RBPs) among the key regulators. Besides their role in splicing regulation, RBPs are also implicated in cancer prognosis and represent promising therapeutic targets for cancer treatment. CLIP-seq experiments target specific RBPs and reveal the sites in the nascent transcriptome where each RBP binds. A fruitful approach to establishing a causal connection is to analyze changes in splicing patterns around these sites. However, selecting the appropriate RBP(s) to study in a CLIP-seq experiment can be challenging and often relies on prior assumptions.
In this study, we developed an algorithm that integrates CLIP-seq experiments with differential splicing analysis to identify RBPs likely related to splicing changes. Our refined algorithm improves prediction accuracy. We tested the algorithm on four different experiments and found significant relationships between RBPs and splicing alterations in cancer. We then used our model to analyse 19 cancer types from the TCGA and TARGET databases. For each cancer type, we obtained a ranking of the RBPs most likely responsible for the splicing changes, showing the algorithm's potential for identifying therapeutic targets.
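One common way to connect CLIP-seq binding with differential splicing is an over-representation test: for each RBP, ask whether events near its binding sites are enriched among the differentially spliced events. The sketch below ranks RBPs by a one-sided hypergeometric p-value computed with the standard library. This is a generic enrichment test chosen for illustration, not the authors' refined algorithm, and the input structures are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def rank_rbps(events, regulated, bound_by):
    """
    events: set of all tested splicing event IDs
    regulated: subset of events called as differentially spliced
    bound_by: RBP name -> set of events with a CLIP-seq site nearby (hypothetical input)
    Returns (rbp, p_value, overlap) tuples, most significant first.
    No multiple-testing correction is applied here.
    """
    results = []
    for rbp, bound in bound_by.items():
        bound = bound & events
        overlap = len(bound & regulated)
        p = hypergeom_sf(overlap, len(events), len(bound), len(regulated))
        results.append((rbp, p, overlap))
    return sorted(results, key=lambda r: r[1])


events = set(range(100))
regulated = set(range(10))
bound_by = {
    "RBP_A": set(range(0, 10)) | set(range(50, 55)),  # binds near every regulated event
    "RBP_B": set(range(20, 35)),                      # binds only unregulated events
}
for rbp, p, overlap in rank_rbps(events, regulated, bound_by):
    print(rbp, overlap, f"{p:.2e}")
```

In a real analysis, the p-values would typically be adjusted for the number of RBPs tested before ranking.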

C-322: RNA-RNA interactions in alternative splicing regulation
Track: iRNA
  • Egor Semenchenko, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Germany
  • Irmtraud M. Meyer, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Germany


Presentation Overview

Alternative splicing (AS) is a major source of proteome and transcriptome diversity. In addition to RNA-binding proteins and their cis-motifs within the pre-mRNA, it is widely accepted that RNA secondary structure and trans-RNA interactions are important for AS regulation. However, little is known about the molecular mechanisms involved. Given the importance of AS in the context of disease and normal tissue development, there is a need to elucidate the role of RNA-RNA interactions in the regulation of AS.
To investigate this, we use data from RNA cross-linking and proximity ligation experiments. We developed a computational pipeline for RNA proximity ligation data that facilitates the detection of interactions with regulatory RNAs. By combining the derived RNA-RNA interactions with complementary RNA-seq time-course data, we identified several candidate RNA-pre-mRNA interactions that may be involved in splicing regulation. Our results demonstrate the importance of RNA-RNA interactions for AS and highlight the potential utility of interaction probing experiments for investigating the role of RNA-RNA interactions in other biological processes.

C-323: Single-Cell Full-Length Isoform Sequencing of patient-derived organoid cells of clear cell Renal Cell Carcinoma
Track: iRNA
  • Tülay Karakulak, University of Zurich, University Hospital Zurich, Swiss Institute of Bioinformatics, Switzerland
  • Hella Bolck, University Hospital Zurich, Switzerland
  • Anna Bratus-Neuenschwander, ETH Zurich, University of Zurich, Switzerland
  • Qin Zhang, ETH Zurich, University of Zurich, Switzerland
  • Natalia Zajac, ETH Zurich, University of Zurich, Switzerland
  • Weihong Qi, ETH Zurich, University of Zurich, Switzerland
  • Hubert Rehrauer, ETH Zurich, University of Zurich, Switzerland
  • Christian von Mering, University of Zurich, Swiss Institute of Bioinformatics, Switzerland
  • Holger Moch, University of Zurich, University Hospital Zurich, Switzerland
  • Abdullah Kahraman, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Swiss Institute of Bioinformatics, Switzerland


Presentation Overview

Clear cell renal cell carcinoma (ccRCC) is a lethal and aggressive cancer type that arises in the kidney tubules. We used single-cell long-read sequencing to identify potentially pathogenic and druggable cancer-specific transcripts in four patient-derived ccRCC organoid samples.

Multiplexed Arrays Sequencing (MAS-Seq) is a new single-cell full-length isoform sequencing technology that can accurately detect full-length transcripts across single cells with high throughput. We applied MAS-Seq to patient-derived organoid cells from ccRCC samples and assessed their alternative splicing landscape. To gain a deeper view of the transcriptome of each cell, we barcoded as few cells as possible.

Preliminary results revealed, on average, the expression of 5,756 genes in 958 cells per sample, with a mean of 325,688 reads per cell. On average, 15,000 transcripts per sample were detected in at least three cells, considering only cells expressing a minimum of 100 genes. More than 22% of the sequenced transcripts were novel isoforms relative to GENCODE v39. Interestingly, the transcripts with the highest cell-to-cell variation belonged to genes with roles in ccRCC development.

Our results and analyses will likely reveal new diagnostic and predictive biomarker candidates and give ccRCC patients new hope for novel precision-medicine treatments against this detrimental disease.
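The per-sample filtering described above (transcripts detected in at least three cells, cells expressing at least 100 genes) can be sketched as follows. The sparse dictionary layout, the transcript-to-gene map and the order of the two filters (cells first, then transcripts) are assumptions for illustration; the poster does not describe its pipeline at this level.

```python
from collections import Counter


def filter_cells_and_transcripts(counts, tx2gene, min_genes=100, min_cells=3):
    """
    counts: cell barcode -> {transcript ID -> read count} (sparse layout, assumed)
    tx2gene: transcript ID -> gene ID
    Returns (kept cell barcodes, kept transcript IDs).
    """
    # 1) Keep cells in which at least `min_genes` distinct genes are detected.
    kept_cells = {
        cell: tx_counts
        for cell, tx_counts in counts.items()
        if len({tx2gene[tx] for tx, n in tx_counts.items() if n > 0}) >= min_genes
    }
    # 2) Keep transcripts detected in at least `min_cells` of the retained cells.
    detected_in = Counter(
        tx for tx_counts in kept_cells.values() for tx, n in tx_counts.items() if n > 0
    )
    kept_tx = {tx for tx, n_cells in detected_in.items() if n_cells >= min_cells}
    return set(kept_cells), kept_tx


# Toy example with thresholds lowered so the effect is visible
tx2gene = {"t1": "g1", "t2": "g1", "t3": "g2", "t4": "g3"}
counts = {
    "c1": {"t1": 1, "t3": 2},
    "c2": {"t1": 3, "t2": 1},
    "c3": {"t1": 1, "t3": 1, "t4": 5},
    "c4": {"t4": 1},
}
cells, transcripts = filter_cells_and_transcripts(counts, tx2gene, min_genes=2, min_cells=2)
print(sorted(cells), sorted(transcripts))
```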

C-324: Computational analysis of transcriptomic data to identify PTEN-loss associated alternative splicing (AS) derived neoantigens in cancer datasets
Track: iRNA
  • Mosammat Antara Labiba, Queen Mary University of London, United Kingdom
  • Prabhakar Rajan, Barts Cancer Institute, United Kingdom
  • Conrad Bessant, Queen Mary University of London, United Kingdom


Presentation Overview

Neoantigens are a type of tumour-specific antigen that can result from aberrations in DNA or RNA. New non-catalytic functions of the tumour suppressor gene PTEN are emerging, including in alternative splicing (AS), where interactions between PTEN and the splicing machinery and spliceosome have been described. Furthermore, PTEN-loss-associated alterations to the immune tumour microenvironment (TME) have been observed. Preliminary findings from our group have revealed a role for PTEN-loss-associated aberrant AS, and we hypothesise that neoantigens arising from these events could alter tumour cell recognition by immune cells and/or that AS-derived proteins from these events could affect immune cell function in the TME. Using machine-learning neoantigen prediction algorithms, we have extracted putative PTEN-loss-associated, AS-derived MHC-I neoantigens from data of DU145 cells depleted of PTEN. Matched mass spectrometry data have been processed against the samples' own reference proteome to identify translated neoantigens. Finally, TCGA prostate cancer data are currently being processed through the same pipelines to identify strong computationally predicted neoantigens under PTEN-loss conditions, which will be correlated with the altered immune TME. This novel research will elucidate the effects of PTEN loss on the transcriptome and proteome through neoantigens presented by MHC molecules.
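MHC class I molecules typically present peptides of 8 to 11 amino acids, so a common first step when looking for AS-derived neoantigens is to enumerate the peptides that span a novel splice junction in the translated isoform, before scoring them with a binding predictor. The sketch below does only that enumeration; the function name and the junction-position convention are hypothetical, and the binding-prediction and mass-spectrometry steps of the poster's pipeline are not shown.

```python
def junction_peptides(protein, junction_pos, lengths=(8, 9, 10, 11)):
    """
    protein: amino-acid sequence translated from the alternative isoform
    junction_pos: index of the first residue encoded downstream of the novel junction
                  (hypothetical convention: protein[:junction_pos] is upstream)
    Returns every k-mer that contains residues from both sides of the junction.
    """
    peptides = set()
    for k in lengths:
        # Window [start, start + k) spans the junction iff start < junction_pos < start + k.
        first = max(0, junction_pos - k + 1)
        last = min(junction_pos - 1, len(protein) - k)
        for start in range(first, last + 1):
            peptides.add(protein[start:start + k])
    return peptides


print(sorted(junction_peptides("MKTAYIAKQR", 5, lengths=(3,))))
```

Peptides lying entirely on one side of the junction are excluded, because they are also encoded by the canonical isoform and are therefore not splicing-specific.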

C-325: miRarmature: a time series analysis framework for paired miRNA and RNA-seq data reveals new regulatory dynamics
Track: iRNA
  • Ranjan Kumar Maji, Goethe University and Uniklinikum Frankfurt, Germany
  • Eva-Maria Rogg, Institute of Cardiovascular Regeneration, Goethe University, Frankfurt, Germany
  • Ariane Fisher, Institute of Cardiovascular Regeneration, Goethe University, Frankfurt, Germany
  • Gilles Gasparoni, Epigenetics, Saarland University, Saarbrücken, Germany
  • Melanie Möller, Molecular Cell Biology & Microbiology, Wuppertal University, Wuppertal, Germany
  • Martin Simon, Molecular Cell Biology & Microbiology, Wuppertal University, Wuppertal, Germany
  • Stefanie Dimmeler, Institute of Cardiovascular Regeneration, Goethe University, Frankfurt, Germany
  • Marcel H. Schulz, Institute of Cardiovascular Regeneration, Goethe University, Frankfurt, Germany


Presentation Overview

MicroRNAs (miRNAs) are the most versatile small RNA regulators in the cell. Cellular stress alters canonical miRNA processing by biogenesis factors, which in turn generates isomiRs (modified miRNA forms).
In our study, we perform paired small and ribo-depleted RNA-seq from induced AMI stress experiments in four major cardiac cell types at six time points. We observe significant dynamics in the expression patterns of the mature miRNA 3p and 5p arms. In this context, we investigate associations between the regulation of miRNA target genes and the dynamics of mature miRNA arm ratios (3p/5p) at every time point. Further, we assess isomiR expression ratios, biogenesis factors (Drosha, Dicer, etc.) and TDMD (target RNA-directed microRNA degradation) factors (Dis3l2, Tents, etc.) as plausible associative causes of canonical arm ratio changes. Finally, we predict co-regulated functional pairs as an effect of changing miRNA arm ratios and changing median expression of the target genes within functional gene sets. We implement this novel analysis framework in miRarmature, an R package enabling systematic investigation of the dynamics of these differentially processed miRNAs, their associated factors and their predicted functional roles, with statistical inference and visualizations for time-series experiments.
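A minimal sketch of the kind of association described above: compute a log2 3p/5p arm ratio per time point and correlate it with the median expression of a target gene set. The pseudocount, the use of Pearson correlation and the input layout are assumptions chosen for illustration; this is not miRarmature's implementation (which is an R package) and omits its statistical testing.

```python
from math import log2
from statistics import median


def log2_arm_ratio(counts_3p, counts_5p, pseudocount=1.0):
    """Per-time-point log2((3p + pc) / (5p + pc)) from normalized mature-arm counts."""
    return [log2((a + pseudocount) / (b + pseudocount)) for a, b in zip(counts_3p, counts_5p)]


def median_target_expression(expr, targets):
    """expr: gene -> expression values per time point; returns the per-time-point median."""
    n_timepoints = len(expr[targets[0]])
    return [median([expr[g][t] for g in targets]) for t in range(n_timepoints)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical normalized counts for one miRNA and three of its predicted 3p-arm targets
ratio = log2_arm_ratio([1, 3, 7], [1, 1, 1])
expr = {"g1": [5, 4, 3], "g2": [6, 5, 4], "g3": [7, 6, 5]}
targets = ["g1", "g2", "g3"]
print(ratio, pearson(ratio, median_target_expression(expr, targets)))
```

With only a handful of time points, such a correlation is a descriptive summary rather than evidence of regulation.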

C-326: Oleaginous yeast biology elucidated with comparative transcriptomics
Track: iRNA
  • Sarah Weintraub, Worcester Polytechnic Institute, United States
  • Zekun Li, Worcester Polytechnic Institute, United States


Presentation Overview

Nonconventional yeasts naturally produce molecules useful in agriculture, medicine and sustainable energy; however, their metabolism is difficult to engineer because fundamental biological knowledge is lacking. In this study, we used comparative transcriptomics to investigate the metabolism of the oleaginous yeasts Yarrowia lipolytica, Debaryomyces hansenii and Debaryomyces subglobosus to understand phenotypes observed under stress conditions. Because no -omics methods existed for comparisons across so many yeasts and conditions, we evaluated two approaches: “network first”, where a metabolic map is used as a scaffold onto which genes are mapped, and “cluster first”, where a clustering algorithm groups genes with similar expression patterns across conditions. Our results showed that D. hansenii extensively and exclusively rewired its transcriptome in response to NaCl stress, while Y. lipolytica responded to multiple conditions, including NaCl stress, (NH4)2SO4 starvation and FeCl3 starvation. In D. hansenii, flavinogenesis decreased under salt stress, but the transcriptome did not follow the same pattern. Thus, both the network-first and cluster-first approaches provided novel insights into nonconventional yeast biology. These approaches have the potential to simplify transcriptomics analysis for poorly annotated genomes and to generate hypotheses for the creation of cell factories based on Debaryomyces yeasts.
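To illustrate the “cluster first” idea, the sketch below groups genes by their expression profiles across conditions with a plain k-means loop. The profile format (for example, log fold changes per condition), the choice of k-means and its deterministic initialisation are assumptions; the poster does not name the clustering algorithm used.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(profiles, k, max_iter=100):
    """
    profiles: gene -> tuple of expression values across conditions (e.g. log fold changes)
    Returns gene -> cluster index. Initial centroids are the first k genes in sorted order,
    a deterministic simplification; k-means++ or repeated random starts are usual in practice.
    """
    genes = sorted(profiles)
    centroids = [list(profiles[g]) for g in genes[:k]]
    assignment = {}
    for _ in range(max_iter):
        # Assign every gene to its nearest centroid.
        new_assignment = {
            g: min(range(k), key=lambda c: squared_distance(profiles[g], centroids[c]))
            for g in genes
        }
        if new_assignment == assignment:  # converged: no gene changed cluster
            break
        assignment = new_assignment
        # Move each centroid to the mean profile of its members.
        for c in range(k):
            members = [profiles[g] for g in genes if assignment[g] == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


profiles = {
    "geneA": (5.0, 5.0, 0.0),  # responds to the first two conditions
    "geneB": (5.2, 4.8, 0.0),
    "geneC": (0.0, 0.0, 5.0),  # responds to the third condition only
    "geneD": (0.1, 0.2, 4.9),
}
print(kmeans(profiles, k=2))
```

Each resulting cluster can then be tested for enriched pathways, which is where a “cluster first” analysis connects back to the metabolic map used in the “network first” approach.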

C-327: e-RNA: a collection of web-servers for the prediction and visualisation of RNA secondary structure and their functional features
Track: iRNA
  • Volodymyr Tsybulskyi, MDC and Freie University, Berlin, Germany
  • Egor Semenchenko, Max Delbrück Center, Germany
  • Irmtraud M. Meyer, Max-Delbrück-Centrum für Molekulare Medizin (MDC), Berlin, and Free University, Berlin, Germany


Presentation Overview

e-RNA is a collection of web-servers for the prediction and visualisation of RNA secondary structures and their functional features, including in particular RNA-RNA interactions. In this updated version, we have added novel tools for RNA secondary structure prediction and have significantly updated the visualisation functionality. The new method COBOLD can identify transient RNA structure features and their potential functional effects on a known RNA structure during co-transcriptional structure formation. The new tool SHAPESORTER can predict evolutionarily conserved RNA secondary structure features while simultaneously taking experimental SHAPE probing evidence into account. The web-server R-CHIE, which visualises RNA secondary structure information as arc diagrams, can now also be used to visualise and intuitively compare RNA-RNA, RNA-DNA and DNA-DNA interactions alongside multiple sequence alignments and quantitative information. Predictions generated by any method in e-RNA can be readily visualised on the web-server. For completed tasks, users can download their results and visualise them later with R-CHIE without having to re-run the predictions. e-RNA can be found at http://www.e-rna.org.
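Arc diagrams such as those drawn by R-CHIE represent each base pair as an arc between two sequence positions. As a sketch of the underlying data, the function below converts a dot-bracket string into a list of 0-based base pairs, treating (), [], {} and <> as separate bracket types so that simple pseudoknots are supported. This is an illustrative parser, not R-CHIE's code or input specification.

```python
OPENERS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def dot_bracket_to_pairs(structure):
    """Return sorted 0-based (i, j) base pairs; '.' marks an unpaired position."""
    stacks = {opener: [] for opener in OPENERS}
    pairs = []
    for j, ch in enumerate(structure):
        if ch in OPENERS:
            stacks[ch].append(j)
        elif ch in CLOSERS:
            stack = stacks[CLOSERS[ch]]
            if not stack:
                raise ValueError(f"unmatched '{ch}' at position {j}")
            pairs.append((stack.pop(), j))  # each pair would be drawn as one arc i -> j
        elif ch != ".":
            raise ValueError(f"unexpected character '{ch}' at position {j}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket(s)")
    return sorted(pairs)


# A hairpin plus a pseudoknotted helix written with square brackets
print(dot_bracket_to_pairs("((..[[..))..]]"))
```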

C-328: ShapeSorter: a fully probabilistic method for detecting conserved RNA structure features supported by SHAPE evidence
Track: iRNA
  • Volodymyr Tsybulskyi, MDC and Freie University, Berlin, Germany
  • Irmtraud M. Meyer, Max-Delbrück-Centrum für Molekulare Medizin (MDC), Berlin, and Free University, Berlin, Germany


Presentation Overview

There is increasing interest in determining RNA structures, as it is now possible to probe them in a high-throughput manner, e.g. using SHAPE protocols. Computational methods already exist that integrate experimental SHAPE-probing evidence into RNA secondary structure prediction. The current state of the art in this field is provided by methods that employ the minimum-free-energy strategy for predicting RNA secondary structures with SHAPE-probing evidence. These methods, however, rely on the assumption that transcripts fold in vivo into the thermodynamically most stable configuration, and they ignore evolutionary evidence for conserved RNA structure features. Here, we present a new computational method, SHAPESORTER, that predicts RNA structure features without employing the thermodynamic strategy. Instead, SHAPESORTER employs a fully probabilistic framework to identify RNA structure features that are supported by evolutionary and SHAPE-probing evidence. Our method can capture RNA structure heterogeneity, pseudo-knotted RNA structures, and transient and mutually exclusive RNA structure features. Moreover, it estimates P-values for the predicted RNA structure features, which allows for easy filtering and ranking. We investigate the merits of our method in a comprehensive performance benchmark and conclude that SHAPESORTER performs significantly better at predicting base pairs than existing state-of-the-art methods.
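For context on the minimum-free-energy baseline that the abstract contrasts with, a widely used way to add SHAPE evidence to thermodynamic folding is the Deigan et al. (2009) pseudo-free-energy term, m·ln(reactivity + 1) + b, applied to nucleotides in base-pair stacks. The sketch below evaluates it with the commonly cited defaults m = 2.6 and b = -0.8 kcal/mol; treat those values, and the clamping of negative reactivities, as assumptions. SHAPESORTER, as described above, does not rely on such a thermodynamic term.

```python
from math import log


def shape_pseudo_energy(reactivity, m=2.6, b=-0.8):
    """
    Deigan-style SHAPE pseudo-free-energy term (kcal/mol) added per nucleotide
    in a base-pair stack: m * ln(reactivity + 1) + b.
    Negative reactivities are clamped to zero here (an assumption; tools differ).
    """
    r = max(reactivity, 0.0)
    return m * log(r + 1.0) + b


# Unreactive nucleotides receive a small pairing bonus; highly reactive ones a penalty.
for r in (0.0, 0.5, 2.0):
    print(r, round(shape_pseudo_energy(r), 3))
```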

C-329: Isoform investigation of hemostatic genes in the liver
Track: iRNA
  • Alina Orozco, University of Gothenburg, Sweden
  • Tara Stanne, University of Gothenburg, Sweden


Presentation Overview

Analysis of high-resolution long-read RNA-seq data (e.g. Pacific Biosciences single-molecule real-time [PacBio] sequencing) is the most straightforward and reliable method to detect alternative splicing. Its main advantage over conventional short-read NGS is that it eliminates the need for transcript reconstruction, because full-length RNA transcripts (up to 10 kb) are read, and full-length isoform detection overcomes the haplotyping issues associated with short-read data. Both of these features expedite allele-specific isoform determination compared with short-read sequencing studies (Deonovic et al., 2017).
The Liver Project (approved by the local ethics committee: 665-13) complements ongoing clinical stroke cohort studies in order to better understand how 32 hemostatic genes that are predominantly expressed in the liver are regulated. For this project, liver tissue and blood samples were collected from patients undergoing liver surgery at the Dept. of Surgery at Sahlgrenska University Hospital. Previously, we used short-read targeted (NGS) sequencing: DNA-seq for SNP calling, RNA-seq for gene-level mRNA quantification, and bisulfite sequencing to determine DNA methylation at CpG sites. We will now use cDNA capture followed by PacBio sequencing to analyze full-length isoforms of the 32 genes from the same liver samples.

C-330: Muex: a method to identify microexons from long-read RNA sequencing data
Track: iRNA
  • Kamil Hepak, Earlham Institute, United Kingdom
  • Nicola Hall, University of Oxford, United Kingdom
  • Sofia Kudasheva, Earlham Institute, United Kingdom
  • Elizabeth Tunbridge, University of Oxford, United Kingdom
  • Wilfried Haerty, Earlham Institute, United Kingdom


Presentation Overview

The average human exon is approximately 140 nt long. However, a sizeable number of exons are shorter than 22 nt; these are highly conserved and play important biological roles. These microexons lack splicing regulator motifs and rely on other regulatory mechanisms, which makes them a useful lens through which to study splicing regulation. However, their short size makes them hard to detect, align and annotate accurately. Advances in long-read RNA sequencing (lrRNAseq) technology allow the sequencing of full-length transcripts, placing these microexons in their appropriate transcriptional context, but at the cost of lower accuracy relative to short-read approaches.
We have developed Muex, a method to identify microexons from lrRNAseq data. To test our method’s accuracy and sensitivity, and to fine-tune alignment parameters to better detect microexons, we used a dataset combining short- and long-read sequences from a human cell line. Finally, we applied Muex to lrRNAseq data from post-mortem disordered human brain samples. From a panel of 389 genes, we detected 55 putative novel microexons not previously annotated. Our work enables the prediction and functional characterisation of novel microexons, improving annotation and facilitating downstream investigations into their splicing regulation across tissues, cells, and development in health and disease.
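In a spliced alignment, a microexon appears as a short aligned block bounded by intron (N) operations on both sides. The sketch below scans a SAM-style CIGAR string for such internal blocks shorter than 22 nt. It only illustrates the signal that a microexon detector looks for; Muex's actual method, which also tunes alignment parameters, is not reproduced here, and in noisy long reads such short blocks can also be alignment artefacts.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def exon_blocks(ref_start, cigar):
    """Split an alignment into reference blocks (0-based, half-open) separated by N operations."""
    blocks = []
    pos = block_start = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            blocks.append((block_start, pos))
            pos += length
            block_start = pos
        elif op in "MD=X":  # these operations consume the reference
            pos += length
        # I, S, H and P do not consume the reference
    blocks.append((block_start, pos))
    return blocks


def candidate_microexons(ref_start, cigar, max_len=22):
    """Internal blocks (introns on both sides) shorter than max_len nt."""
    internal = exon_blocks(ref_start, cigar)[1:-1]
    return [(start, end) for start, end in internal if end - start < max_len]


print(candidate_microexons(100, "5S50M200N15M300N40M"))
```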

C-331: On the predictability of A-minor motifs from their local contexts
Track: iRNA
  • Coline Gianfrotta, DAVID, UVSQ, Université Paris-Saclay, France & LISN, CNRS, Univ. Paris-Saclay, France
  • Vladimir Reinharz, Université du Québec à Montréal, Canada
  • Olivier Lespinet, I2BC, UMR 9198, CNRS - Univ. Paris-Saclay, France
  • Dominique Barth, UVSQ, France
  • Alain Denise, LISN, CNRS, Univ. Paris-Saclay & I2BC, UMR 9198, CNRS - Univ. Paris-Saclay, France


Presentation Overview

Our study concerns the classification and prediction of type I/II A-minor motifs in RNA 3D structures. It investigates the importance of several kinds of structural context in the formation of this motif, such as the 3D substructure around a motif. Our purpose is to determine what kind of information in the structural context can be useful to characterize and predict the presence and the position of these motifs.

First, we develop an automated method to classify and characterize A-minor motifs according to the similarity of their 3D contexts. Second, we model the topological context of A-minor motifs and of their classes as graphs, and use these graphs to study how well A-minor motifs can be predicted knowing only the topological context and sequence information.

We thus uncovered new subclasses of A-minor motifs based on their local 3D similarities. Most classes are composed of homologous occurrences, but some are composed of non-homologous occurrences, which could suggest convergent evolution. We also showed that, for some A-minor motifs, the topology combined with a sequence signal is sufficient to predict their presence and position. In most other cases, these signals are not sufficient to predict A-minor motifs; however, we show that they remain informative signals for this purpose.

C-332: splicekit: comprehensive toolkit for splicing analysis from short-read RNA-seq
Track: iRNA
  • Gregor Rot, Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland
  • Arne Wehling, Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland
  • Roland Schmucki, Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland
  • Nikolaos Berntenis, Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland
  • Jitao David Zhang, Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland
  • Martin Ebeling, Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland


Presentation Overview

* Equal contribution: Gregor Rot and Arne Wehling

Splicing of RNA is a fundamental biological process. Dysregulation of splicing has been implicated in many human diseases and has been successfully exploited as a therapeutic target. Splicing analysis using short-read RNA sequencing is a powerful technique to triage the mechanism of action and safety profiles of drug candidates. However, there are currently no open-source software pipelines for such applications.
To address this need, we introduce splicekit, a Python package that provides a comprehensive set of splicing analysis tools. A prototypical pipeline built with splicekit starts by computing count data at the junction, exon and gene levels and then identifies regulated features. Downstream analysis includes sample clustering, event visualization, RNA-protein binding and sequence motif analysis to elucidate both cis and trans regulatory events involving regulated splice sites.
In summary, splicekit provides a user-friendly and powerful toolset for comprehensive splicing analysis from short-read RNA-seq data. We anticipate that it will be valuable for researchers in both basic and translational research of splicing modulation.

C-333: Comparative analysis of single-cell RNA-seq protocols for transcript quantification
Track: iRNA
  • Chit Tong Lio, Technical University of Munich, Germany
  • Jan Baumbach, University of Hamburg, Germany
  • Markus List, Technical University of Munich, Germany
  • Olga Tsoy, University of Hamburg, Germany
  • Ana Conesa, Institute for Integrative Systems Biology, Spanish National Research Council, Spain


Presentation Overview

Single-cell RNA sequencing (scRNA-seq) is a powerful technique for studying cellular heterogeneity and identifying biomarkers across various cell types. However, current scRNA-seq protocols have limitations in isoform-level analysis due to biases such as 3' coverage bias and low RNA capture efficiency. To address these limitations, Smart-Seq technologies have been developed; albeit with limited throughput, they capture full-length transcripts in single cells, which is particularly useful for single-cell alternative splicing analysis. Meanwhile, single-cell long-read sequencing (MAS-seq) covers the entire length of transcripts, overcoming coverage bias, but suffers from high sequencing error and low throughput.

Since it is currently unclear how these new technologies perform in isoform-level analysis, we systematically compared full-length scRNA-seq methods, including Smart-Seq2/3/3xpress and MAS-seq. We analyzed five datasets from human PBMCs and performed downsampling analysis to account for differences in cell numbers. Our study provides insights into the effectiveness of different technologies for isoform-level analysis. We found that Smart-Seq3 shows superior performance in mapping quality, coverage bias, transcript counting and cell type composition analysis. In conclusion, Smart-Seq3 is currently a superior technology for isoform-level analysis in single-cell RNA sequencing, while long-read methods (MAS-seq) still require improvement in RNA capture efficiency.
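Downsampling to a common number of cells, as mentioned above, can be sketched as sampling the same number of barcodes without replacement from each dataset with a fixed random seed. The function name, input layout and the default of using the smallest dataset size are assumptions for illustration.

```python
import random


def downsample_cells(datasets, n_cells=None, seed=0):
    """
    datasets: dataset name -> list of cell barcodes
    Draws n_cells barcodes per dataset without replacement (default: size of the smallest
    dataset), using a fixed seed so the subsample is reproducible.
    """
    rng = random.Random(seed)
    if n_cells is None:
        n_cells = min(len(cells) for cells in datasets.values())
    return {name: sorted(rng.sample(cells, n_cells)) for name, cells in datasets.items()}


datasets = {
    "smartseq3": [f"s3_{i}" for i in range(50)],
    "masseq": [f"mas_{i}" for i in range(20)],
}
subsampled = downsample_cells(datasets)
print({name: len(cells) for name, cells in subsampled.items()})
```

Repeating the comparison over several seeds gives a sense of how much the downstream metrics depend on which cells were drawn.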

C-334: Identification of cancer cells from de novo SNV calls in single-cell transcriptomes
Track: iRNA
  • Valérie Marot-Lassauzaie, Berlin Institute for Medical Systems Biology (BIMSB), Germany
  • Laleh Haghverdi, Berlin Institute for Medical Systems Biology (BIMSB), Germany


Presentation Overview

Single-cell RNA (scRNA) sequencing is widely used to study cancer. However, it can be challenging to separate healthy from malignant cells based on gene expression alone. Cancer cells are clearly distinguished by somatic mutations, but these mutations are difficult to capture in scRNA data because they must lie within actively transcribed regions with sufficient coverage across cells.

In our work, we directly use single nucleotide variants (SNVs) identified in scRNA data to separate healthy from cancer cells. We first show that it is possible to reliably call SNVs in scRNA data by leveraging information across cells, allowing us to capture even low coverage SNVs. We also identify patterns distinguishing false positive from true genomic SNVs and use these for filtering. We developed a clustering approach that accounts for the confounding effect of gene expression and the high drop-out rate. We validate our results on two datasets with known clonal structure: an acute myeloid leukemia dataset, and a colorectal cancer dataset. In both cases, we recover the known clonal structure using only the new SNVs. The method also returns a set of SNVs enriched in the different clones which we show can give valuable biological insight into the sample.
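The idea of calling SNVs by leveraging information across cells can be sketched as pooling reference and alternative read counts over all cells at each candidate site, then applying thresholds on pooled alternative reads, the number of cells carrying the alternative allele and the pooled allele fraction. All thresholds and the input layout below are hypothetical; the poster's false-positive filters and clustering approach are not shown.

```python
def call_pooled_snvs(allele_counts, min_alt_reads=5, min_alt_cells=2, min_vaf=0.1):
    """
    allele_counts: site -> {cell barcode -> (ref_reads, alt_reads)}
    Reads are pooled across cells, so sites too sparsely covered to call in any
    single cell can still be detected. Thresholds are illustrative.
    """
    calls = {}
    for site, per_cell in allele_counts.items():
        ref = sum(r for r, _ in per_cell.values())
        alt = sum(a for _, a in per_cell.values())
        alt_cells = sum(1 for _, a in per_cell.values() if a > 0)
        depth = ref + alt
        if (depth > 0 and alt >= min_alt_reads and alt_cells >= min_alt_cells
                and alt / depth >= min_vaf):
            calls[site] = {"depth": depth, "alt": alt, "alt_cells": alt_cells}
    return calls


allele_counts = {
    "chr1:1000:A>G": {"c1": (2, 1), "c2": (0, 2), "c3": (3, 2)},  # low coverage per cell
    "chr2:500:C>T": {"c1": (10, 6)},                              # seen in one cell only
    "chr3:42:G>A": {"c1": (40, 3), "c2": (40, 3)},                # low allele fraction
}
print(call_pooled_snvs(allele_counts))
```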

C-335: Identification and characterization of new unannotated human snoRNAs using an integrative transcriptomics strategy.
Track: iRNA
  • Alphonse Birane Thiaw, Université de Sherbrooke, Canada
  • Étienne Fafard-Couture, Université de Sherbrooke, Canada
  • Danny Bergeron, Université de Sherbrooke, Canada
  • Michelle Scott, Université de Sherbrooke, Canada


Presentation Overview

Small nucleolar RNAs (snoRNAs) are non-coding RNAs present in all eukaryotes, known for their involvement in ribosome biogenesis and gene expression regulation. Generally, snoRNAs are located in introns of longer genes or in intergenic regions. The majority of snoRNAs are poorly characterized, and there are currently only a few predictors for their identification, which typically suffer from high rates of false positives and false negatives. We aim to carry out a much wider screen to test the completeness of the human snoRNA annotation. To do so, we applied StringTie to TGIRT-seq data from diverse normal human tissues to identify expressed genes missing from current annotations that have an intronic or intergenic location and a length between 50 and 200 nucleotides. These data were integrated with immunoprecipitation sequencing studies of core snoRNA-binding proteins (PAR-CLIP and eCLIP), as well as RNA-seq studies following depletion of these proteins, to identify snoRNA candidates. This methodology allowed us to identify 119 potential snoRNAs, including 65 box C/D and 55 box H/ACA snoRNAs, which we are currently validating and characterizing further. These results further demonstrate that the annotation of human snoRNAs is far from exhaustive, highlighting the need for more efficient and more reliable identification pipelines.
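The candidate selection described above can be sketched as a filter over assembled transcripts: keep unannotated transcripts of 50 to 200 nt in intronic or intergenic regions that overlap at least one CLIP peak of a core snoRNA-binding protein. The record fields, the peak representation and the single-overlap criterion are assumptions; the poster's integration of knockdown RNA-seq data is not modelled.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def snorna_candidates(transcripts, clip_peaks, min_len=50, max_len=200):
    """
    transcripts: dicts with 'id', 'chrom', 'start', 'end', 'context', 'novel' (assumed fields)
    clip_peaks: chrom -> list of (start, end) peaks for core snoRNA-binding proteins
    """
    selected = []
    for t in transcripts:
        length = t["end"] - t["start"]
        if not t["novel"] or not (min_len <= length <= max_len):
            continue
        if t["context"] not in ("intronic", "intergenic"):
            continue
        peaks = clip_peaks.get(t["chrom"], [])
        if any(overlaps(t["start"], t["end"], ps, pe) for ps, pe in peaks):
            selected.append(t["id"])
    return selected


transcripts = [
    {"id": "t1", "chrom": "chr1", "start": 1000, "end": 1120, "context": "intronic", "novel": True},
    {"id": "t2", "chrom": "chr1", "start": 5000, "end": 5300, "context": "intronic", "novel": True},
    {"id": "t3", "chrom": "chr2", "start": 200, "end": 290, "context": "exonic", "novel": True},
    {"id": "t4", "chrom": "chr3", "start": 700, "end": 780, "context": "intergenic", "novel": True},
    {"id": "t5", "chrom": "chr1", "start": 1000, "end": 1120, "context": "intronic", "novel": False},
]
clip_peaks = {"chr1": [(1050, 1080), (5100, 5150)], "chr2": [(210, 240)]}
print(snorna_candidates(transcripts, clip_peaks))
```

Here t2 is too long, t3 is exonic, t4 lacks CLIP support and t5 is already annotated, so only t1 is kept.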

C-336: Towards In-Silico CLIP-seq: Predicting Protein-RNA Interaction via Sequence-to-Signal Learning
Track: iRNA
  • Marc Horlacher, Helmholtz Center Munich, Germany
  • Nils Wagner, Department of Informatics, Technical University of Munich, Germany
  • Lambert Moyon, Helmholtz Center Munich, Germany
  • Klara Kuret, The Francis Crick Institute, London, United Kingdom
  • Nicolas Goedert, Helmholtz Center Munich, Germany
  • Marco Salvatore, Department of Biology, University of Copenhagen, Denmark
  • Jernej Ule, The Francis Crick Institute, London, United Kingdom
  • Julien Gagneur, Department of Informatics, Technical University of Munich, Germany
  • Ole Winther, Department of Biology, University of Copenhagen, Denmark
  • Annalisa Marsico, Helmholtz Center Munich, Germany


Presentation Overview:

RNA-binding proteins (RBPs) play a vital role in post-transcriptional regulation, including RNA modification, stabilization, localization, and translation. Knowing their RNA targets and binding preferences is important for understanding the mechanisms of post-transcriptional regulation and their implications in human diseases.
We present RBPNet, a novel deep learning method, which, given an RNA sequence, predicts the CLIP-seq signal distribution at single-nucleotide resolution. RBPNet utilizes a dilated convolutional neural network (CNN) architecture and achieves high generalization performance on eCLIP, iCLIP and miCLIP assays, outperforming state-of-the-art classifiers. Crucially, RBPNet directly operates on the raw CLIP-seq signal, eliminating the need for preprocessing steps, such as peak calling. RBPNet performs bias correction by modeling the CLIP-seq signal as a mixture of the protein-specific and control signal, thus mitigating technical biases which may otherwise hinder downstream analysis. Through model interrogation via Integrated Gradients feature importance scores, RBPNet identifies predictive sub-sequences corresponding to known and novel binding motifs. Using in silico mutagenesis, RBPNet scores the impact of single-nucleotide variants on RBP-binding, thus aiding in prioritizing potentially disease-causing variants.
RBPNet is the first method to directly model the raw CLIP-seq signal at nucleotide resolution, improving on the state of the art in both the computational inference of protein-RNA interactions and the interpretation of predictions.
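The bias-correction idea, modeling observed CLIP-seq counts as a mixture of a protein-specific and a control profile, can be illustrated with a minimal sketch. It assumes a multinomial count model over positions and a scalar mixing weight `alpha`; the network that would produce the logits, and RBPNet's actual loss and parameterization, are not reproduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over positions."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def mixture_profile(target_logits, control_logits, alpha):
    """Per-nucleotide distribution as a mixture of protein-specific and control profiles.

    alpha is the (assumed) fraction of reads attributed to the protein itself.
    """
    p_target = softmax(target_logits)
    p_control = softmax(control_logits)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_target, p_control)]


def multinomial_nll(counts, probs):
    """Multinomial negative log-likelihood of read counts, dropping the
    multinomial coefficient, which does not depend on the model."""
    return -sum(k * math.log(p) for k, p in zip(counts, probs) if k > 0)


profile = mixture_profile([2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], alpha=0.7)
print([round(p, 3) for p in profile])
print(round(multinomial_nll([5, 1, 0, 1], profile), 3))
```

After training, the control component can be dropped and the protein-specific profile used on its own, which is the sense in which a mixture separates technical background from binding signal.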

C-337: A systematic benchmark of machine learning methods for protein-RNA interaction prediction
Track: iRNA
  • Marc Horlacher, Computational Health Center, Helmholtz Center Munich, Germany
  • Giulia Cantini, Computational Health Center, Helmholtz Center Munich, Germany
  • Julian Hesse, Computational Health Center, Helmholtz Center Munich, Germany
  • Patrick Schinke, Computational Health Center, Helmholtz Center Munich, Germany
  • Nicolas Goedert, Computational Health Center, Helmholtz Center Munich, Germany
  • Shubhankar Londhe, Computational Health Center, Helmholtz Center Munich, Germany
  • Lambert Moyon, Computational Health Center, Helmholtz Center Munich, Germany
  • Annalisa Marsico, Computational Health Center, Helmholtz Center Munich, Germany


Presentation Overview:

RNA-binding proteins (RBPs) are central actors in post-transcriptional RNA regulation, and their binding sites on target RNAs can be profiled experimentally in vivo. Because such profiles are limited to the transcripts expressed in the profiled cell type, numerous machine-learning-based methods have been developed to infer missing binding information. However, the heterogeneity of training and evaluation datasets across RBPs and CLIP-seq protocols prevents a direct comparison of their performance.

To address this, we systematically benchmarked 11 representative methods across hundreds of CLIP-seq datasets and RBPs. Using homogenized sample pre-processing and two strategies for generating negative-class samples, we evaluated the predictive performance of these methods and assessed the contribution of neural network architectures and input modalities to model performance. We show that the negative sampling strategy significantly affects model performance and, notably, strongly affects the apparent contribution of secondary structure. Additionally, we observe varying degrees of performance degradation in cross-cell-type prediction settings, with some models being more sensitive than others, reflecting RBP-specific context dependence.

We believe that this study will guide future method development in the computational modeling of protein-RNA interactions by serving as a reference for method design with regard to architecture, input modalities, and the generation of negative controls.
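Because the benchmark found the choice of negatives to matter, it may help to see how two common strategies differ in practice. The abstract does not name its two strategies, so the pair below (composition-preserving shuffles of positives versus unbound windows drawn from bound transcripts) is an assumption chosen for illustration, and the function names are hypothetical.

```python
import random


def shuffled_negative(seq, rng):
    """Negative that keeps the exact nucleotide composition of a positive."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def background_negatives(transcript, sites, width, n, rng, max_tries=10_000):
    """Sample windows of a bound transcript that overlap no positive site.

    sites: list of (start, end) half-open intervals of CLIP-seq binding sites.
    Returns at most n windows; fewer if max_tries is exhausted.
    """
    negatives = []
    for _ in range(max_tries):
        if len(negatives) == n:
            break
        s = rng.randrange(0, len(transcript) - width + 1)
        e = s + width
        if all(e <= a or s >= b for a, b in sites):
            negatives.append(transcript[s:e])
    return negatives


rng = random.Random(0)
transcript = "A" * 20 + "C" * 10 + "G" * 20  # toy transcript; the bound site is the C run
print(shuffled_negative("AACGU", rng))
print(background_negatives(transcript, [(20, 30)], width=5, n=3, rng=rng))
```

Shuffled negatives differ from positives only in local order, whereas background windows also differ in transcript context, so a model can learn different shortcuts depending on which set it is trained against.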

C-338: Discovery of new immunotherapy targets in cancer from transcriptomic data
Track: iRNA
  • Mathieu Quesnel-Vallières, University of Pennsylvania, United States
  • Caleb Radens, University of Pennsylvania, United States
  • Jacinta Davis, Children's Hospital of Philadelphia, United States
  • Katharina Hayer, Children's Hospital of Philadelphia, United States
  • Kristen Lynch, University of Pennsylvania, United States
  • Andrei Thomas-Tikhonenko, Children's Hospital of Philadelphia, United States
  • Yoseph Barash, University of Pennsylvania, United States


Presentation Overview:

The use of engineered T cells or antibodies to specifically target tumors is a recent form of immunotherapy that has revolutionized the treatment of certain cancer types. Unfortunately, these technologies cannot be applied to most cancers because of a lack of cancer-specific molecules to target. Splicing variations are abundant in cancer and can distinguish tumors from normal cells. To discover new immunotherapy targets, we developed a computational pipeline that uses large cohorts of short-read RNA sequencing (RNA-Seq) data to detect molecular patterns that are enriched in, or exclusive to, cancer rather than normal tissues. The pipeline surveys and quantifies splicing variations and gene expression in individual cancer samples and compares them with over 7,000 samples covering 86 normal tissues and blood cell types. Detected candidates are then validated as full-length isoforms by targeted long-read sequencing. Applying the pipeline to four types of pediatric cancer, we identified between 53 and 133 cancer-specific splice junctions. Our method can be applied to any cancer dataset for which RNA-Seq data are available and, within a few days, generates a list of candidate immunotherapy targets for laboratory validation.
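The core comparison, keeping splice junctions that are well supported in a tumor but essentially absent from the normal reference panel, can be sketched with raw split-read counts. This is a simplification: the pipeline quantifies splicing variations (not just raw junction counts) against more than 7,000 normal samples, and the thresholds and data layout here are hypothetical.

```python
def cancer_specific_junctions(tumor_counts, normal_counts,
                              min_tumor_reads=10, max_normal_reads=2):
    """Return junctions well supported in a tumor but (nearly) absent in all normals.

    tumor_counts:  {junction: split-read count in one tumor sample}
    normal_counts: {junction: [split-read counts across normal samples]}
    Junctions are (chrom, donor, acceptor) tuples. Thresholds are hypothetical.
    """
    hits = []
    for junction, n_tumor in tumor_counts.items():
        if n_tumor < min_tumor_reads:
            continue  # too little support in the tumor
        # A junction never seen in the normal panel counts as 0 reads there.
        if max(normal_counts.get(junction, []), default=0) <= max_normal_reads:
            hits.append(junction)
    return sorted(hits)


tumor = {("chr1", 100, 500): 25, ("chr1", 100, 700): 40, ("chr2", 10, 90): 3}
normals = {("chr1", 100, 700): [30, 12, 0], ("chr2", 10, 90): [0, 0]}
print(cancer_specific_junctions(tumor, normals))  # [('chr1', 100, 500)]
```

Taking the maximum over normal samples, rather than the mean, is what makes the filter strict: a junction expressed in even one normal tissue is excluded, which matters when the goal is to avoid on-target, off-tumor toxicity.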

C-339: Analyzing human population differences in alternative splicing at single-cell resolution
Track: iRNA
  • Rubén Chazarra-Gil, Barcelona Supercomputing Center, Spain
  • Marta Melé-Messeguer, Barcelona Supercomputing Center, Spain
  • Martin Hemberg, Harvard Medical School, United States


Presentation Overview:

Alternative splicing (AS) is a crucial mechanism of gene expression regulation and a major generator of proteome diversity. AS has mostly been studied in bulk RNA-seq data, which masks cellular heterogeneity. Single-cell RNA-seq can reveal cell-type- and trajectory-specific AS patterns by leveraging large numbers of cells and fine-grained cell type identification. However, commonly used droplet-based single-cell RNA-seq technologies pose challenges due to high data sparsity and positional bias of sequencing reads along mRNA transcripts.
Here, we conduct a benchmark study of transcript quantification and differential splicing analysis using simulated 3’ scRNA-seq data. We evaluate several methods designed for AS analysis in single-cell and in bulk RNA-seq. Our results show that most methods have high precision but limited recall in transcript quantification, because lowly expressed features go undetected in single cells; recall can be improved through pseudobulking.
We perform differential transcript usage and percent spliced-in (PSI) analyses on a PBMC dataset from individuals of different genetic backgrounds, and detect population-specific splicing differences in the context of influenza virus infection across several cell types. Overall, our study highlights the strengths and limitations of characterizing alternative splicing in 3’ scRNA-seq data, while providing insights into splicing differences between populations.
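Why pseudobulking helps recall can be seen with a toy percent spliced-in (PSI) calculation: a single cell often has no informative reads for an event, while the summed counts of a cell group do. The read-count layout and the plain inclusion / (inclusion + exclusion) ratio are simplifying assumptions; the benchmarked tools use their own quantification models.

```python
from collections import defaultdict


def psi(inclusion, exclusion):
    """Percent spliced-in as a fraction; None when there are no informative reads."""
    total = inclusion + exclusion
    return inclusion / total if total > 0 else None


def pseudobulk(cell_counts, cell_to_group):
    """Sum per-cell (inclusion, exclusion) read counts within each group."""
    sums = defaultdict(lambda: [0, 0])
    for cell, (inc, exc) in cell_counts.items():
        group = cell_to_group[cell]
        sums[group][0] += inc
        sums[group][1] += exc
    return {g: (inc, exc) for g, (inc, exc) in sums.items()}


cells = {"c1": (1, 0), "c2": (0, 0), "c3": (2, 1), "c4": (0, 3)}
groups = {"c1": "T cell", "c2": "T cell", "c3": "T cell", "c4": "B cell"}
print({c: psi(*v) for c, v in cells.items()})  # c2 is undefined (dropout)
print({g: psi(*v) for g, v in pseudobulk(cells, groups).items()})
# {'T cell': 0.75, 'B cell': 0.0}
```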

C-340: Machine learning methods for decoding subcellular RNA organization from spatial transcriptomics data
Track: iRNA
  • Clarence Mah, University of California San Diego, United States
  • Noorsher Ahmed, University of California San Diego, United States
  • Gene Yeo, University of California San Diego, United States
  • Hannah Carter, University of California San Diego, United States
  • Nicole Lopez, University of California San Diego, United States


Presentation Overview:

The spatial organization of molecules in a cell is essential for their function. Spatial transcriptomics technologies have opened the door to characterizing cellular and subcellular organization. While current computational methods focus on discerning tissue architecture, cell-cell interactions and spatial expression patterns, these approaches are limited to investigating spatial variation at the multicellular scale. We present Bento, a Python toolkit that takes full advantage of single-molecule information to enable spatial analysis at the subcellular scale. Bento ingests molecular coordinates and segmentation boundaries to perform three fundamental analyses: defining subcellular domains, annotating localization patterns, and quantifying gene-gene colocalization. To demonstrate the toolkit, we apply these methods to a variety of datasets, including U2-OS cells (MERFISH), 3T3 cells (seqFISH+), and treated cardiomyocytes (Molecular Cartography). Quantifying RNA localization changes in cardiomyocytes, we find that mRNAs of the critical cardiac disease-associated genes RBM20 and CACNB2 are depleted from the endoplasmic reticulum upon doxorubicin treatment. The Bento package is a member of the open-source scverse ecosystem, enabling integration with other single-cell omics analysis tools.
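One established way to quantify gene-gene colocalization from molecular coordinates is the nearest-neighbour co-location quotient (CLQ). The function below is a brute-force, single-cell sketch of that statistic, assuming points are given as `(x, y, gene)` tuples; it does not use Bento's API, and the abstract does not say whether Bento computes exactly this quantity.

```python
import math


def colocation_quotient(points, a, b):
    """Nearest-neighbour co-location quotient CLQ(a -> b) for a != b.

    points: list of (x, y, gene) transcript coordinates from one cell.
    CLQ = (C_ab / N_a) / (N_b / (N - 1)), where C_ab counts a-points whose
    nearest other point is a b-point. Values > 1 suggest colocalization.
    Ties are broken by list order (a simplification).
    """
    n = len(points)
    n_a = sum(1 for p in points if p[2] == a)
    n_b = sum(1 for p in points if p[2] == b)
    if n < 2 or n_a == 0 or n_b == 0:
        return float("nan")
    c_ab = 0
    for i, (x, y, gene) in enumerate(points):
        if gene != a:
            continue
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: math.hypot(points[j][0] - x, points[j][1] - y),
        )
        if points[nearest][2] == b:
            c_ab += 1
    return (c_ab / n_a) / (n_b / (n - 1))


pts = [(0, 0, "A"), (10, 0, "A"), (0, 1, "B"), (10, 1, "B"), (50, 50, "C"), (51, 50, "C")]
print(colocation_quotient(pts, "A", "B"))  # 2.5: every A has a B as nearest neighbour
```

The O(N^2) nearest-neighbour search is only for readability; a spatial index such as a k-d tree would be the usual choice for realistic transcript counts.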

C-341: CamoTSS: analysis of alternative transcription start sites for cellular phenotypes and regulatory patterns from 5' scRNA-seq data
Track: iRNA
  • Ruiyan Hou, The University of Hong Kong, Hong Kong
  • Chung-Chau Hon, RIKEN Center for Integrative Medical Sciences, Yokohama City, Kanagawa 230-0045, Japan
  • Yuanhua Huang, The University of Hong Kong, Hong Kong


Presentation Overview:

Five-prime single-cell RNA-seq (scRNA-seq) has been widely employed to profile cellular transcriptomes; however, its power to analyse transcription start sites (TSSs) has not been fully exploited. Here, we present a computational method suite, CamoTSS, that precisely identifies TSSs and quantifies their expression by leveraging the cDNA sequence on read 1, enabling effective detection of alternative TSS usage. Using various experimental datasets, we demonstrate that CamoTSS accurately identifies TSSs and that the detected alternative TSS usage is highly specific to different biological processes, including cell types across human organs, human thymus development, and cancer conditions. As shown in nasopharyngeal cancer, alternative TSS usage can also reveal regulatory patterns, including systematic TSS dysregulation.
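To make the idea concrete: because read 1 in 5' scRNA-seq carries cDNA from near the transcript's 5' end, the genomic start of read 1 approximates a TSS, so nearby starts can be clustered and each cluster's share of reads compared between cell groups. The gap-based clustering, the `max_gap` value and the function names below are illustrative assumptions, not CamoTSS's actual algorithm.

```python
def cluster_positions(positions, max_gap=10):
    """Single-linkage clustering of read 5' ends on one strand of one chromosome:
    a new cluster starts whenever the gap to the previous position exceeds max_gap."""
    clusters = []
    for p in sorted(positions):
        if clusters and p - clusters[-1][-1] <= max_gap:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters


def tss_usage(read_starts, clusters):
    """Fraction of a cell group's reads falling into each TSS cluster."""
    spans = [(c[0], c[-1]) for c in clusters]
    counts = [0] * len(spans)
    for p in read_starts:
        for i, (lo, hi) in enumerate(spans):
            if lo <= p <= hi:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


all_starts = [100, 102, 105, 300, 301, 303]
clusters = cluster_positions(all_starts)
print(clusters)                                   # [[100, 102, 105], [300, 301, 303]]
print(tss_usage([100, 105, 300], clusters))       # cell group A favours the first TSS
print(tss_usage([301, 303, 303, 102], clusters))  # cell group B favours the second TSS
```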

C-342: Long non-coding RNAs: The missing link to understanding the complex etiology of congenital heart disease?
Track: iRNA
  • Jacqueline Penaloza, Nationwide Children's Hospital, United States
  • Blythe Moreland, Nationwide Children's Hospital, United States
  • Jeffrey Gaither, Nationwide Children's Hospital, United States
  • Peter White, Nationwide Children's Hospital, United States


Presentation Overview:

Congenital heart disease (CHD) is the most common type of birth defect, yet its genetic contributors are poorly understood. There is growing evidence for the role of long non-coding RNAs (lncRNAs) in disease. Single nucleotide variants (SNVs) can alter the function of lncRNAs; however, no heart-specific in silico tools exist to predict which of these variants are pathogenic and which are benign. We therefore aim to develop a machine learning classifier that predicts the pathogenic impact of lncRNA variants contributing to CHD etiology.

We used three components to inform our classifier. First, we identified expressed lncRNAs using single-cell RNA-sequencing data from the developing human heart. Second, we used the Genome Aggregation Database (gnomAD) to determine SNV frequencies. Third, given that structure plays a role in lncRNA function, we computed the impact of SNVs on lncRNA structure. From the expression profiles, we identified lncRNAs that are co-expressed with known CHD genes. We found context-specific lncRNA SNVs that occur at lower rates in the human population and are more likely to destabilize RNA structure. In conclusion, by combining heart expression profiles, population frequencies, RNA stability metrics, and sequence-specific context, we are developing the first lncRNA variant pathogenicity score for CHD.
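The co-expression step can be illustrated with a plain correlation screen between lncRNAs and known CHD genes across cells. This is a sketch under assumptions: Pearson correlation and the 0.5 cutoff are illustrative choices, the gene names and values are placeholders, and the abstract does not state which co-expression measure was used.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def coexpressed_pairs(lnc_expr, gene_expr, min_r=0.5):
    """List (lncRNA, gene, r) pairs whose correlation across cells is at least min_r."""
    pairs = []
    for lnc, x in lnc_expr.items():
        for gene, y in gene_expr.items():
            r = pearson(x, y)
            if r >= min_r:  # NaN compares False, so constant vectors are skipped
                pairs.append((lnc, gene, r))
    return pairs


# Placeholder names and toy expression values across four cells.
lnc_expr = {"LNC_1": [0.0, 1.0, 2.0, 3.0], "LNC_2": [3.0, 3.0, 3.0, 3.0]}
chd_gene_expr = {"GENE_A": [0.5, 1.5, 2.5, 3.0], "GENE_B": [3.0, 2.0, 1.0, 0.0]}
print(coexpressed_pairs(lnc_expr, chd_gene_expr))
```

Real single-cell data would normally be normalized first, and the many zeros it contains often call for a sparsity-aware measure instead of raw Pearson correlation.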

C-343: LinearSankoff: Linear-time Simultaneous Folding and Alignment of RNA Homologs
Track: iRNA
  • Sizhen Li, Oregon State University, United States
  • Ning Dai, Oregon State University, United States
  • He Zhang, Baidu Research USA, United States
  • Apoorv Malik, Oregon State University, United States
  • David Mathews, University of Rochester Medical Center, United States
  • Liang Huang, Oregon State University, United States


Presentation Overview:

The classical Sankoff algorithm for the simultaneous folding and alignment of homologous RNA sequences is highly influential, but it suffers from two major limitations in efficiency and modeling power. First, it takes O(n^6) time for two sequences, where n is the average sequence length. Most implementations and variants reduce the runtime to O(n^3) by restricting the alignment search space, but this is still too slow for long sequences such as full-length viral genomes. Second, the Sankoff algorithm and all its existing implementations use a rather simplistic alignment model, which can result in poor alignment accuracy. To address these problems, we propose LinearSankoff, which seamlessly integrates the original Sankoff algorithm with a powerful Hidden Markov Model-based alignment model. This extension substantially improves alignment quality, which in turn benefits secondary structure prediction quality, as confirmed over a diverse set of RNA families. LinearSankoff also applies beam search heuristics and an A*-like algorithm so that its runtime scales linearly with sequence length. LinearSankoff is the first linear-time algorithm for global simultaneous folding and alignment, and the first such algorithm to scale to coronavirus genomes (n ≃ 30,000 nt). It takes only 10 minutes for a pair of SARS-CoV-2 and SARS-related genomes.
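LinearSankoff's dynamic program handles folding and alignment jointly; the toy below isolates only the beam-pruning idea, applied to plain pairwise sequence alignment (no structure, no HMM alignment model, no A*-like heuristic). It is a hypothetical illustration of how keeping a fixed number of states per step makes total work grow with sequence length rather than with the full DP table.

```python
import heapq


def beam_alignment_score(a, b, beam_size, match=1, mismatch=-1, gap=-2):
    """Global alignment score with beam pruning (beam_size >= 1).

    DP states (i, j) are processed in order of s = i + j; before step s is
    expanded, only its beam_size best-scoring states are kept. With a large
    enough beam this equals exact Needleman-Wunsch; smaller beams trade
    accuracy for speed.
    """
    n, m = len(a), len(b)
    buckets = [{} for _ in range(n + m + 1)]
    buckets[0][(0, 0)] = 0

    def relax(i, j, score):
        bucket = buckets[i + j]
        if score > bucket.get((i, j), float("-inf")):
            bucket[(i, j)] = score

    for s in range(n + m + 1):
        kept = heapq.nlargest(beam_size, buckets[s].items(), key=lambda kv: kv[1])
        for (i, j), score in kept:
            if i < n and j < m:  # align a[i] with b[j]
                relax(i + 1, j + 1, score + (match if a[i] == b[j] else mismatch))
            if i < n:            # gap in b
                relax(i + 1, j, score + gap)
            if j < m:            # gap in a
                relax(i, j + 1, score + gap)
    return buckets[n + m].get((n, m))


print(beam_alignment_score("ACGT", "ACGT", beam_size=100))  # 4
print(beam_alignment_score("GATTACA", "GCATGCU", beam_size=1),
      beam_alignment_score("GATTACA", "GCATGCU", beam_size=1000))
```

With `beam_size=1` the search is greedy and can return a lower score than the exact alignment; that loss of optimality is the price beam search pays for bounded work per step.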

C-344: LinearDesign: Algorithm for Optimized mRNA Design Improves Stability and Immunogenicity (Nature paper)
Track: iRNA
  • He Zhang, Baidu Research, United States
  • Liang Zhang, Baidu Research USA (formerly), United States
  • Ang Lin, China Pharmaceutical University, China
  • Congcong Xu, StemiRNA Therapeutics, China
  • Hangwen Li, StemiRNA Therapeutics, China
  • David Mathews, University of Rochester, United States
  • Yujian Zhang, StemiRNA Therapeutics (formerly), United States
  • Liang Huang, Oregon State University, United States


Presentation Overview:

(To Appear in Nature)

Messenger RNA (mRNA) vaccines are being used to contain COVID-19 but still suffer from the critical limitation of mRNA instability and degradation, a major obstacle to the storage, distribution, and efficacy of vaccine products. Previous work showed that increasing secondary structure lengthens mRNA half-life, which, together with optimal codons, improves protein expression. Therefore, a principled mRNA design algorithm must optimize both structural stability and codon usage. However, because of synonymous codons, the mRNA design space is prohibitively large (e.g., ~10^632 candidates for the SARS-CoV-2 Spike protein). Here we provide a simple and unexpected solution using a classical concept in computational linguistics, where finding the optimal mRNA sequence is akin to identifying the most likely sentence among similar-sounding alternatives. Our algorithm takes only 11 minutes for the Spike protein and can jointly optimize stability and codon usage. On both COVID-19 and VZV vaccines, LinearDesign substantially improves mRNA half-life and protein expression, and dramatically increases antibody titer, by up to 128× in vivo, compared with the codon-optimization benchmark. This surprising result reveals the great potential of principled mRNA design and enables the exploration of previously unreachable but highly stable and efficient designs.
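The size of the design space follows directly from codon degeneracy: the number of candidate coding sequences is the product, over residues, of the number of synonymous codons for each amino acid. The sketch below computes that count, and its log10 for long proteins, using the standard genetic code. It does not reproduce LinearDesign's optimization, only the combinatorics that rule out brute-force enumeration.

```python
import math

# Number of sense codons per amino acid in the standard genetic code (61 in total).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def design_space_size(protein):
    """Number of distinct mRNA coding sequences for a protein (stop codon excluded)."""
    return math.prod(CODON_DEGENERACY[aa] for aa in protein)


def design_space_log10(protein):
    """log10 of the design-space size; avoids huge integers for long proteins."""
    return sum(math.log10(CODON_DEGENERACY[aa]) for aa in protein)


print(design_space_size("MAL"))  # 24 = 1 * 4 * 6
```

`math.prod` requires Python 3.8 or later. For a protein of ~1,000 residues the exact integer already has hundreds of digits, which is why the log form is the practical one.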

C-345: IDEAS: Integrative Differential Expression and Alternative Splicing Analysis
Track: iRNA
  • Armin Hadziahmetovic, LMU Munich, Germany
  • Leonie Pohl, LMU Munich, Germany
  • Alexandra Schubö, LMU Munich, Germany
  • Ralf Zimmer, LMU Munich, Germany


Presentation Overview:

attached long abstract

C-346: Computational resources at RTH for CRISPR gRNA design and off-target analysis with post editing RNAseq data
Track: iRNA
  • Christian Anthon, Center for non-coding RNA in Technology and Health, IVH, University of Copenhagen, Denmark
  • Giulia Corsi, Center for non-coding RNA in Technology and Health, IVH, University of Copenhagen, Denmark
  • Stefan Seemann, Center for non-coding RNA in Technology and Health, IVH, University of Copenhagen, Denmark
  • Xiaoguang Pan, Lars Bolund Institute of Regenerative Medicine, China / University of Copenhagen, Denmark
  • Kunli Qu, Lars Bolund Institute of Regenerative Medicine, China / Aarhus University, Denmark
  • Veerendra Gadekar, Center for non-coding RNA in Technology and Health, IVH, University of Copenhagen, Denmark
  • Yonglun Luo, Lars Bolund Institute of Regenerative Medicine, China / Aarhus University, Denmark
  • Jan Gorodkin, Center for non-coding RNA in Technology and Health, IVH, University of Copenhagen, Denmark
  • Ying Sun, University of Copenhagen, Denmark


Presentation Overview:

At the Center for non-coding RNA in Technology and Health (RTH), we provide tools for CRISPR/Cas9 gRNA design and for the analysis of on- and off-target effects of edits followed by transcriptome analysis. For gRNA design, our tools CRISPRon[1] and CRISPRoff[2] are combined into a user-friendly webserver: CRISPRon/off[3]. CRISPRon is a state-of-the-art deep learning model trained on human indel data, which outperforms other methods, including those based on loss-of-function data used in several popular webservers. CRISPRoff, which is among the best-performing off-target models, is based on a nucleotide binding energy model and is not organism-specific. For post-editing analysis involving RNA-seq data, we provide CRISPRroots[4], which combines CRISPRoff predictions with variant calling and differential expression analysis to establish whether the observed changes between a CRISPR-edited cell and a wild-type cell are influenced by one or more off-targets. The tools are available at http://rth.dk/resources/crispr.

References:
[1] Xiang X†, Corsi GI†, Anthon C†, Qu K†, et al. Nat Commun. 2021.

[2] Alkan F, et al. Genome Biol. 2018.

[3] Anthon C, et al. Bioinformatics. 2022.

[4] Corsi GI, et al. Nucleic Acids Res. 2022.
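As a point of reference for what off-target analysis starts from, the sketch below enumerates candidate SpCas9 sites by mismatch counting next to an NGG PAM. It is a hypothetical baseline, not CRISPRoff: CRISPRoff scores candidates with a nucleotide binding-energy model, which this code does not attempt.

```python
def find_candidate_sites(guide, genome, max_mismatches=3):
    """Scan the + strand for protospacers followed by an NGG PAM.

    Returns (position, mismatches) for every site within max_mismatches of the
    guide. Real tools also scan the - strand, allow bulges and score sites;
    none of that is done here.
    """
    k = len(guide)
    hits = []
    for i in range(len(genome) - k - 3 + 1):
        pam = genome[i + k:i + k + 3]
        if pam[1:] != "GG":  # NGG: any base followed by GG
            continue
        protospacer = genome[i:i + k]
        mismatches = sum(1 for g, t in zip(guide, protospacer) if g != t)
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


guide = "ACGTACGTAC"  # 10 nt for readability; SpCas9 guides are typically 20 nt
genome = "TT" + "ACGTACGTAC" + "TGG" + "AA" + "ACGTACGTTC" + "AGG"
print(find_candidate_sites(guide, genome))  # [(2, 0), (17, 1)]
```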

C-347: Deciphering common Alternative Splicing biomarkers across multiple Cancer Cohorts
Track: iRNA
  • Tülay Karakulak, University of Zurich, Switzerland
  • Tumor Profiler Consortium, University of Zurich, University Hospital Zurich, Switzerland
  • Christian von Mering, University of Zurich, Switzerland
  • Holger Moch, University Hospital Zurich, Switzerland
  • Abdullah Kahraman, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Switzerland


Presentation Overview:

Isoform switching events have been implicated in tumor progression. The Tumor Profiler (TuPro) Study, a multi-omics study of metastatic melanoma, metastatic ovarian cancer, and acute myeloid leukemia (AML), provides short-read RNA-seq data and matched DIA-MS proteomics data. Here, we aimed to identify cancer-specific isoforms (CSIs) in all three cancer types by integrating multi-omics data from TuPro and TCGA and using GTEx samples as a normal reference. Isoform switches were calculated with our in-house R tool MDTSwitchAnalyzer. In total, we analyzed 1112 cancer samples and identified 1644 CSIs across the three cancer types. Notably, the most frequently observed CSIs were transcripts of the ADD3, TRAPPC5, and HIPK1 genes in ovarian cancer, melanoma, and AML, respectively. Over 82% of ovarian cancer samples exhibited the longest isoform of ADD3, which has already been associated with cancer cell proliferation. The CSI of TRAPPC5 was identified in 60% of melanoma samples, causing the loss of all protein interactions with the tumorigenesis-associated TRAPP complex.
Furthermore, a CSI of the HIPK1 gene, linked to angiogenesis, was observed in over 58% of AML samples. Validation analyses using DIA-MS proteomics are ongoing for all identified transcripts. Our study reveals biomarker candidates across three cancer types by using multi-level data.
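Isoform switch analyses commonly summarize a gene by the change in isoform fraction (dIF) between conditions. The sketch below computes dIF from per-isoform abundances; it is a generic illustration under assumed inputs, and the abstract does not describe how MDTSwitchAnalyzer defines or tests a switch.

```python
def isoform_fractions(expression):
    """Isoform fractions (IF) of one gene from per-isoform abundances, e.g. TPM."""
    total = sum(expression.values())
    return {iso: (v / total if total else 0.0) for iso, v in expression.items()}


def delta_if(cancer, normal):
    """dIF = IF(cancer) - IF(normal) for every isoform seen in either condition."""
    f_c = isoform_fractions(cancer)
    f_n = isoform_fractions(normal)
    return {iso: f_c.get(iso, 0.0) - f_n.get(iso, 0.0)
            for iso in sorted(set(f_c) | set(f_n))}


# Hypothetical abundances for one gene with two isoforms.
print(delta_if({"iso_long": 30.0, "iso_short": 10.0},
               {"iso_long": 10.0, "iso_short": 30.0}))
# {'iso_long': 0.5, 'iso_short': -0.5}
```

Because fractions are computed within a gene, a switch can be detected even when the gene's total expression is unchanged, which is what distinguishes it from ordinary differential expression.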

C-348: TranSipedia: a novel framework for large scale RNAseq data analysis with applications in cancer from research to diagnosis
Track: iRNA
  • Chloé Bessiere, Centre de Recherches en Cancérologie de Toulouse, INSERM, France
  • Benoit Guibert, INSERM U1183, Institute for Regenerative Medicine and Biotherapy, France
  • Hao Liang Xue, I2BC Paris Saclay, France
  • Florence Ruffle, Université de Montpellier, France
  • Anthony Boureux, IRMB, Bio2M, University of Montpellier, France
  • Camille Marchet, CRIStAL - CNRS/Université de Lille, France
  • Rayan Chikhi, Institut Pasteur Paris, France
  • Daniel Gautheret, I2BC, Université Paris-Saclay, France
  • Thérèse Commes, Université de Montpellier, France


Presentation Overview:

Public RNA-seq databases are valuable resources for identifying specific transcriptional events. We therefore want to make them accessible in a way that better captures transcriptome complexity. Computational methods that index k-mers are attractive solutions for interrogating large omics datasets. We developed TranSipedia, a new k-mer-based framework composed of several modules: RNA-seq indexing with Reindeer (Marchet et al., 2020), a novel method to query all transcribed information; a module that generates k-mers as transcript signatures (Kmerator; Riquier et al., 2021); and a supporting website.
Reindeer indexes k-mers and records their counts across large dataset collections. It provides ultra-fast queries while indexing thousands of RNA-seq datasets. Moreover, it retains all the information contained in the raw data (annotation-free). For applications that require gene expression levels, the k-mer counts must be sufficiently representative of the target; we therefore developed Kmerator to construct specific k-mers. Lastly, the TranSipedia website facilitates queries and sharing by biologists. It now includes several thousand datasets, mainly for cancer applications, for example the whole CCLE cohort (1019 RNA-seq datasets, ~10 TB), and makes it possible to check biomarker specificity by comparing normal and tumor datasets.
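The query pattern described above, selecting k-mers specific to one transcript and then looking up their counts in every indexed dataset, can be sketched with ordinary dictionaries. This is only a conceptual stand-in with hypothetical function names: Reindeer uses a compact index designed for thousands of datasets, and Kmerator's selection criteria are more involved (for example, reverse complements are ignored here).

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(datasets, k):
    """Toy in-memory index: dataset name -> Counter of k-mer occurrences."""
    index = {}
    for name, reads in datasets.items():
        counts = Counter()
        for read in reads:
            counts.update(kmers(read, k))
        index[name] = counts
    return index


def specific_kmers(target, others, k):
    """k-mers of the target transcript absent from every other sequence (order kept)."""
    background = set()
    for seq in others:
        background.update(kmers(seq, k))
    return [km for km in dict.fromkeys(kmers(target, k)) if km not in background]


def query(index, signature):
    """Median count of the signature k-mers in each dataset (0 if no signature)."""
    return {name: (median(counts[km] for km in signature) if signature else 0)
            for name, counts in index.items()}


sig = specific_kmers("ACGTT", ["CGTAA"], k=3)
index = build_index({"d1": ["ACGTT", "ACGTT"], "d2": ["CGTAA"]}, k=3)
print(sig, query(index, sig))  # ['ACG', 'GTT'] {'d1': 2.0, 'd2': 0.0}
```

The median makes the per-dataset estimate robust to a few k-mers that are inflated by repeats or sequencing errors.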

C-349: Deduplication of reads using UMIs influences RNA-Seq based prediction of small RNAs in non-model bacteria
Track: iRNA
  • Karel Sedlar, Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Germany
  • Armin Hadziahmetovic, Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Germany
  • Ralf Zimmer, Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Germany


Presentation Overview:

Small RNAs (sRNAs) play important regulatory roles in bacteria, affecting their phenotype and tolerance to various chemicals; this could be exploited in genome engineering to improve fermentation processes for the production of bio-based chemicals. Although specialized techniques for studying sRNAs, such as GRIL-Seq, RIP-Seq, or RIL-Seq, exist, their implementation in non-model bacteria is cumbersome, and sRNA prediction is usually performed with computational tools that process standard RNA-Seq data. Here, we studied the influence of RNA-Seq deduplication using unique molecular identifiers (UMIs) on sRNA prediction in two non-model bacteria, Caldimonas thermodepolymerans and Rhodospirillum rubrum. Even after quality and adapter trimming, the RNA-Seq data contained erroneous reads that caused an unexplained bias in per-sequence GC content. Deduplication using UMIs removed this bias; in a test dataset of 48 RNA-Seq samples, it removed roughly 50% of all reads. While the expression of protein-coding genes remained almost unaffected, the numbers of predicted sRNAs differed greatly. This was explained by the change in sequencing depth, which strongly influences sRNA prediction methods that identify peaks in the coverage of non-coding regions. Therefore, algorithms for predicting sRNAs from standard RNA-Seq data need to be revised to allow fully automatic, reliable predictions.
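UMI-based deduplication collapses reads that share an alignment position, strand and UMI, because such reads most likely derive from the same original molecule. A minimal exact-match version is sketched below under an assumed tuple layout; it is not the pipeline used in the study.

```python
def deduplicate(reads):
    """Keep one read per (chrom, 5' position, strand, UMI) key.

    reads: iterable of (read_id, chrom, pos, strand, umi) for aligned reads.
    UMIs are matched exactly; tools such as UMI-tools additionally merge UMIs
    that differ by sequencing errors.
    """
    seen = set()
    kept = []
    for read_id, chrom, pos, strand, umi in reads:
        key = (chrom, pos, strand, umi)
        if key not in seen:
            seen.add(key)
            kept.append(read_id)
    return kept


reads = [
    ("r1", "chr", 100, "+", "AACGT"),
    ("r2", "chr", 100, "+", "AACGT"),  # PCR duplicate of r1
    ("r3", "chr", 100, "+", "GGTCA"),  # same position, different molecule
    ("r4", "chr", 100, "-", "AACGT"),  # opposite strand
]
print(deduplicate(reads))  # ['r1', 'r3', 'r4']
```

Every collapsed duplicate also lowers local coverage, so any peak-calling threshold tuned on undeduplicated data will behave differently afterwards, consistent with the shift in predicted sRNA numbers reported above.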

C-350: Rules for synonymous codon choice: When do suboptimal codons matter?
Track: iRNA
  • Helen Sakharova, University of California, Berkeley, United States
  • Liana Lareau, University of California, Berkeley, United States


Presentation Overview:

Genes can be encoded with seemingly equivalent synonymous codons, but codon choice can have dramatic effects on gene output. Naive codon optimization rules replace slowly translated codons with synonymous, optimally translated codons, with the goal of increasing protein production. However, slowly translated codons can have important functional roles, for instance by facilitating co-translational folding. While it is widely acknowledged that synonymous codons are not exact synonyms, the basic molecular rules governing codon choice are still poorly understood. We have explored when and where codon choice is most strongly constrained using computational and experimental methods. We conducted a genome-wide screen in yeast that targets positions of conserved slow translation. Using Cas9 retron editing, we created thousands of slow-to-fast synonymous codon substitutions and grew them together in a pooled competition. Careful controls, including slow-to-slow substitutions and multiple guides targeting each site, allow confident identification of synonymous variants that significantly decrease or increase fitness. In parallel, we are training large language models on hundreds of thousands of eukaryotic genes to identify constraints on codon sequences. Combined with our large-scale experimental data, our model will produce general rules for predicting the rare but important positions where ‘optimal’ codons are detrimental.
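Pooled competition screens are often summarized by each variant's log2 change in relative frequency, centred on control variants. The sketch below assumes read counts per variant at the start and end of the competition and uses slow-to-slow substitutions as the neutral reference; the abstract does not specify the study's actual fitness statistic, so this choice and the variant names are illustrative assumptions.

```python
import math
import statistics


def log2_enrichment(start, end, pseudocount=0.5):
    """log2 change in relative frequency of each variant over the competition."""
    total_start = sum(start.values())
    total_end = sum(end.values())
    return {
        v: math.log2(((end.get(v, 0) + pseudocount) / total_end)
                     / ((start[v] + pseudocount) / total_start))
        for v in start
    }


def relative_fitness(enrichment, control_ids):
    """Centre enrichments on the median of control (e.g. slow-to-slow) variants."""
    baseline = statistics.median(enrichment[c] for c in control_ids)
    return {v: e - baseline for v, e in enrichment.items()}


start = {"ctrl_1": 100, "ctrl_2": 100, "slow_to_fast_1": 100}
end = {"ctrl_1": 100, "ctrl_2": 100, "slow_to_fast_1": 400}
fitness = relative_fitness(log2_enrichment(start, end, pseudocount=0), ["ctrl_1", "ctrl_2"])
print({v: round(f, 3) for v, f in fitness.items()})
# {'ctrl_1': 0.0, 'ctrl_2': 0.0, 'slow_to_fast_1': 2.0}
```

Centring on controls removes effects shared by all edited strains (for example, the editing process itself), so what remains reflects the synonymous substitution.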

C-351: RNA methylation landscape of innate immune genes
Track: iRNA
  • Carmen Maria Livi, STORM Therapeutics, United Kingdom
  • Yaara Ofir-Rosenfeld, STORM Therapeutics, United Kingdom
  • Hendrik Weisser, STORM Therapeutics, United Kingdom


Presentation Overview:

Post-transcriptional chemical modifications regulate RNA biology. The most abundant modification in eukaryotic mRNA is N6-methyladenosine (m6A) which is deposited by the N6-methyltransferase (METTL3/METTL14) complex co-transcriptionally and affects splicing, stability, transport and translation. m6A dysregulation is implicated in cancer and other diseases. The precise mapping of m6A is therefore crucial, with recent profiling techniques enabling the detection of methylation sites at single-base resolution.

METTL3 inhibition impacts the expression of genes associated with innate immune pathways. To investigate the role of RNA methylation in this response, we collected and analysed publicly available high-resolution m6A data from human cell lines: HEK293, profiled using m6A individual-nucleotide-resolution crosslinking and immunoprecipitation (miCLIP); HEK293T, profiled using chemical deamination of unmethylated adenosines (GLORI); and HEK293, HeLa and HepG2, profiled using m6A-selective allyl chemical labelling (m6A-SAC-seq).

Our results show good reproducibility between recent antibody-free methods, which detect far more m6A sites than previous techniques. In most datasets, immune-associated genes do not differ in overall methylation levels from other genes. However, GLORI data show significant hypomethylation of innate immune genes in HEK293T cells (Kolmogorov–Smirnov test, p < 0.01). Specifically, in this dataset only 25% of immune genes have more than 10 m6A sites, compared with 35% of non-immune genes.
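The comparison reported above reduces to two summaries of per-gene m6A site counts: the share of genes above a site-count threshold, and the two-sample Kolmogorov–Smirnov statistic between the immune and non-immune distributions. The sketch computes both on hypothetical counts; for a p-value, `scipy.stats.ks_2samp` returns the same statistic together with a p-value.

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x(v) - F_y(v)|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in sorted(set(xs) | set(ys)):
        fx = bisect_right(xs, v) / len(xs)  # empirical CDF of x at v
        fy = bisect_right(ys, v) / len(ys)  # empirical CDF of y at v
        d = max(d, abs(fx - fy))
    return d


def fraction_above(site_counts, threshold=10):
    """Share of genes with more than `threshold` m6A sites."""
    return sum(1 for c in site_counts if c > threshold) / len(site_counts)


# Hypothetical per-gene m6A site counts.
immune = [2, 5, 8, 12, 3]
non_immune = [4, 11, 15, 9, 20, 6]
print(fraction_above(immune), fraction_above(non_immune))  # 0.2 0.5
print(round(ks_statistic(immune, non_immune), 3))
```

Evaluating both empirical CDFs at every observed value is enough, because the difference between two step functions can only change at those points.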

C-352: Annotation free identification of differentially expressed smallRNAs from RNA-Seq
Track: iRNA
  • Armin Hadziahmetovic, LMU Munich, Germany
  • Alexandra Schubö, LMU Munich, Germany
  • Leonie Pohl, LMU Munich, Germany
  • Samuel Klein, LMU Munich, Germany
  • Ralf Zimmer, LMU Munich, Germany


Presentation Overview:

Small non-coding RNAs (sncRNAs) are crucial regulators of transcript expression. While the annotation of human miRNAs is comprehensive, other unannotated sncRNAs and exogenous RNA molecules such as small viral RNAs (svRNAs) may be overlooked in small RNA and transcriptome analyses. Yet they can play a role in regulating gene expression and, in particular, infection progression. By performing annotation-free differential comparisons of small RNAs, even before the origin of the RNA is identified, we can bypass poor or missing annotation, which can be especially useful for novel viral infections such as the recent SARS-CoV-2 outbreak. Our approach uses a methodology that is independent of any count model and works for all signals, including skewed miRNA expression and UMI-barcoded, deduplicated data. The identified sequences can be used in further analyses, e.g., to assist and guide computational predictions, or compared with other known occurrences of the identified or similar sncRNAs. By gathering evidence for novel sncRNAs from RNA-seq data in this way, it may be possible to fill gaps in common analyses and further our understanding of causal interactions in complex regulatory networks, especially in infectious disease progression.
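Working at the level of distinct read sequences lets small RNAs be compared before anyone knows what they are. The sketch below uses a simple counts-per-million (CPM) fold change with a pseudocount between two conditions. It is an illustrative assumption, not the authors' count-model-independent method, which the abstract does not detail; clustering of near-identical sequences and replicate handling are ignored.

```python
import math
from collections import Counter


def cpm(reads):
    """Counts per million for each distinct read sequence (no annotation needed)."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: n * 1e6 / total for seq, n in counts.items()}


def log2_fold_changes(reads_a, reads_b, pseudocount=1.0):
    """log2(CPM_b / CPM_a) per sequence, with a pseudocount for absent sequences."""
    a, b = cpm(reads_a), cpm(reads_b)
    return {
        seq: math.log2((b.get(seq, 0.0) + pseudocount) / (a.get(seq, 0.0) + pseudocount))
        for seq in sorted(set(a) | set(b))
    }


# Toy sequences; "CCCCGGGGAAAA" appears only in the second condition.
mock = ["AAAAUUUUCCCC", "AAAAUUUUCCCC", "GGGGAAAAUUUU"]
infected = ["AAAAUUUUCCCC", "GGGGAAAAUUUU", "CCCCGGGGAAAA", "CCCCGGGGAAAA"]
for seq, lfc in log2_fold_changes(mock, infected).items():
    print(seq, round(lfc, 2))
```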

C-353: DIANA-microT 2023 webserver expands miRNA interactomes beyond miRBase and host miRNAs
Track: iRNA
  • Spyros Tastsoglou, DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, Univ. of Thessaly, Hellenic Pasteur Institute, Greece
  • Athanasios Alexiou, DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, Univ. of Thessaly, Hellenic Pasteur Institute, Greece
  • Dimitra Karagkouni, Dept. of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Broad Institute, United States
  • Giorgos Skoufos, DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, Univ. of Thessaly, Hellenic Pasteur Institute, Greece
  • Elissavet Zacharopoulou, DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, Univ. of Thessaly, Hellenic Pasteur Institute, Greece
  • Artemis Hatzigeorgiou, DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, Univ. of Thessaly, Hellenic Pasteur Institute, Greece


Presentation Overview:

DIANA-microT-CDS is a state-of-the-art miRNA target prediction algorithm and one of the first algorithms to predict miRNA binding sites in both the 3’ untranslated region (3’-UTR) and the coding sequence (CDS) of transcripts, with improved prediction performance. The current version of the microT webserver (DIANA-microT 2023, www.microrna.gr/microt_webserver) provides a substantially updated set of interactions. DIANA-microT-CDS was run using annotation information from Ensembl v102, miRBase 22.1 and, for the first time, MirGeneDB 2.1, yielding more than 83 million interactions in human, mouse, rat, chicken, fly and worm. Additionally, this version provides predicted interactions of miRNAs encoded by 20 viruses with host transcripts from human, mouse and chicken. DIANA-microT integrates supplementary computational resources, including interactions from DIANA-TarBase and TargetScan, miRNA-disease links from plasmiR and HMDD, variant information from dbSNP and ClinVar, and miRNA/gene abundance values in numerous cellular/tissue contexts. The redesigned server interface allows users to apply smart filtering options, identify abundance patterns of interest, pinpoint known SNPs residing in binding sites and obtain miRNA-disease information. The contents of the DIANA-microT webserver are freely accessible and can also be downloaded locally, without any login requirement.
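To make "binding sites in both the 3’-UTR and the CDS" concrete, the sketch below scans a transcript for canonical 7mer-m8 seed matches (the reverse complement of miRNA nucleotides 2-8) and labels each hit by region. This is a simplified seed-matching illustration only; DIANA-microT-CDS uses its own, more elaborate scoring model, which is not reproduced here. The function names and the CDS coordinates are hypothetical inputs.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA sequence (ACGU alphabet only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_seed_sites(mirna, transcript, cds_start, cds_end):
    """Return (position, region) for every 7mer-m8 seed match in the transcript.

    The site is the reverse complement of miRNA nucleotides 2-8.
    cds_start/cds_end are hypothetical 0-based, end-exclusive CDS coordinates;
    each hit is labelled by the region containing its first nucleotide.
    """
    site = reverse_complement(mirna[1:8])
    hits = []
    pos = transcript.find(site)
    while pos != -1:
        if pos < cds_start:
            region = "5'-UTR"
        elif pos < cds_end:
            region = "CDS"
        else:
            region = "3'-UTR"
        hits.append((pos, region))
        pos = transcript.find(site, pos + 1)
    return hits

# Mature let-7a-5p; toy transcript with one CDS site and one 3'-UTR site.
let7a = "UGAGGUAGUAGGUUGUAUAGUU"
toy = "AUGCUACCUCGGGUAA" + "GGCUACCUCAA"
print(find_seed_sites(let7a, toy, cds_start=0, cds_end=16))
```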

C-354: DIANA-miRPath v4.0: Context-specific analysis of combined miRNA functions
Track: iRNA
  • Artemis Hatzigeorgiou, DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, Univ. of Thessaly, Hellenic Pasteur Institute, Greece
  • Spyros Tastsoglou, DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, Univ. of Thessaly, Hellenic Pasteur Institute, Greece
  • Giorgos Skoufos, DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, Univ. of Thessaly, Hellenic Pasteur Institute, Greece
  • Marios Miliotis, DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, Univ. of Thessaly, Hellenic Pasteur Institute, Greece


Presentation Overview:

The DIANA-miRPath webserver enables the exploration of combined miRNA effects using predicted or experimentally supported miRNA interactions. Its latest version (DIANA-miRPath v4.0, http://www.microrna.gr/miRPathv4) introduces the capacity to tailor its target-based miRNA functional analysis engine to specific biological/experimental contexts. Via a redesigned modular interface with rich interaction, annotation and parameterization options, users can perform enrichment analysis on Gene Ontology (GO) terms, KEGG and REACTOME pathways, and gene sets from the Molecular Signatures Database (MSigDB) and PFAM. The included miRNA interaction sets are derived either from state-of-the-art resources of experimentally supported interactions (DIANA-TarBase v8.0, miRTarBase and microCLIP cell-type-specific interactions) or from in silico miRNA-target predictions (DIANA-microT-2023 and TargetScan). Bulk and single-cell expression datasets from The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression project (GTEx) and single-cell expression atlases can be used to assess expression changes of targeted genes within terms across a wide range of states. A dedicated module for miRNA-tailored CRISPR knock-out screen analyses makes it possible to investigate selected miRNAs under the conditions of interest. Lastly, the option to upload custom interaction, term, expression and screen sets further expands miRPath’s utility.
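Target-based functional analysis of combined miRNA effects can be illustrated with a standard over-representation test: take the union of the targets of the selected miRNAs and ask whether a term (e.g., a GO term or pathway) contains more of them than expected by chance under a hypergeometric model. The sketch below is a minimal, generic version of that test; it is an assumption, not miRPath's exact statistical engine, and the input dictionaries and function names are hypothetical.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, min(successes, draws) + 1)
    )
    return tail / total

def combined_enrichment(mirna_targets, term_genes, universe):
    """Over-representation of one term among the union of several miRNAs' targets.

    mirna_targets: hypothetical dict mapping miRNA name -> set of target genes.
    Returns (overlap size, one-sided hypergeometric p-value).
    """
    targets = set().union(*mirna_targets.values()) & universe
    term = set(term_genes) & universe
    overlap = len(targets & term)
    p = hypergeom_sf(overlap, len(universe), len(term), len(targets))
    return overlap, p

# Toy example: 10-gene universe, two miRNAs whose combined targets fill a 3-gene term.
universe = {f"G{i}" for i in range(10)}
mirna_targets = {"miR-a": {"G0", "G1"}, "miR-b": {"G1", "G2"}}
print(combined_enrichment(mirna_targets, {"G0", "G1", "G2"}, universe))
```

In practice, p-values across many terms would also be corrected for multiple testing (e.g., false discovery rate).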