Posters - Schedules

Monday, July 11 and Tuesday, July 12 between 12:30 PM CDT and 2:30 PM CDT
Wednesday, July 13 between 12:30 PM CDT and 2:30 PM CDT
Session A Poster Set-up and Dismantle
Session A Posters set up:
Monday, July 11 between 7:30 AM CDT and 10:00 AM CDT
Session A Posters dismantle:
Tuesday, July 12 at 6:00 PM CDT
Session B Poster Set-up and Dismantle
Session B Posters set up:
Wednesday, July 13 between 7:30 AM CDT and 10:00 AM CDT
Session B Posters dismantle:
Thursday, July 14 at 2:00 PM CDT

Virtual: A systemics view of PURA’s cellular function from omics data gives insights into PURA-related diseases
COSI: iRNA
  • Melina Klostermann, Buchmann Institute for Molecular Life Sciences (BMLS) and Faculty Biological Sciences, Goethe University Frankfurt, Germany
  • Lena Molitor, Department of Molecular Neuroscience, Weizmann Institute of Science, Tel Aviv, Israel
  • Dierk Niessing, Institute of Pharmaceutical Biotechnology, Ulm University, Germany
  • Kathi Zarnack, Buchmann Institute for Molecular Life Sciences (BMLS) and Faculty Biological Sciences, Goethe University Frankfurt, Germany


The protein PURA (purine-rich element binding protein A) is associated with two different types of neuronal diseases. It has been described as bound to neuronal aggregates in the neurodegenerative RNA-repeat expansion disorders ALS (amyotrophic lateral sclerosis) and FTD (frontotemporal dementia). At the same time, spontaneous de novo mutations of the PURA gene cause a neurodevelopmental disease called PURA Syndrome. However, in both cases very little is known about the PURA-affected molecular pathways.

To elucidate PURA binding and its function in a cellular context, we present an in-depth characterization of PURA-bound RNAs in vivo as well as the effects of PURA depletion on cellular RNA and protein levels from high-throughput data. We find that PURA globally binds protein-coding RNAs in the 3’UTR or CDS and that depletion of PURA causes widespread changes in the transcriptome. The RNAs regulated by PURA connect it to mitochondrial function, cytoplasmic granules and autophagy. Focusing more closely on cytoplasmic granules, we show that PURA localizes to P-bodies (processing bodies), which are lost upon PURA depletion. These facets of PURA function in healthy cells might be indicative of PURA’s role in disease.

Virtual: An integer programming framework for simultaneous prediction of RNA structure with pseudoknots and insertion of local 3D motifs
COSI: iRNA
  • Gabriel Loyer, Université du Québec à Montréal, Canada
  • Vladimir Reinharz, Université du Québec à Montréal, Canada


Motivation: Predicting the canonical base pairs of an RNA structure from a single sequence, especially pseudoknotted ones, remains challenging with thermodynamic models that grossly approximate the energy of the local 3D motifs joining them. It has become increasingly apparent in recent years that the structural motifs in loops, composed of non-canonical interactions, are essential for the multiple functions of RNAs and for shaping their global structure. Our capacity to predict accurate 3D structures is also limited when it comes to the organization of the large, intricate network of interactions that form inside those loops.
Results: We previously developed the integer programming framework RNA-MoIP (RNA Motifs over Integer Programming) to reconcile RNA secondary structure and local 3D motif information available in databases. We extend our model to simultaneously predict the canonical base pairs (with pseudoknots) from base pair probability matrices. We benchmarked our new method on a non-redundant set of 26 RNAs with pseudoknots and known 3D structure. We show that the joint prediction of canonical base pairs and local conserved motifs (i) improves the ratio of well-predicted interactions and (ii) avoids the prediction of overly knotted structures.
Availability: https://gitlab.info.uqam.ca/cbe/RNAMoIP
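
As a rough illustration of the kind of formulation described above, the sketch below selects canonical base pairs from a base pair probability matrix as a 0/1 program: one binary decision per candidate pair, an objective that sums the probabilities of the selected pairs, and a constraint that each nucleotide pairs at most once. This is not RNA-MoIP itself. The motif-insertion terms are omitted, pseudoknots are neither rewarded nor penalized, the probabilities in `toy_probs` are invented, and the program is solved by exhaustive enumeration rather than by an integer programming solver.

```python
from itertools import combinations


def select_base_pairs(pair_probs):
    """Choose a subset of candidate pairs with maximal total probability.

    pair_probs maps (i, j) position pairs to a pairing probability.
    Constraint: every position takes part in at most one selected pair.
    Solved by brute force, so only suitable for a handful of candidates.
    """
    candidates = list(pair_probs)
    best_score, best_pairs = 0.0, ()
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            used = [pos for pair in subset for pos in pair]
            if len(used) != len(set(used)):
                continue  # some position would pair twice: infeasible
            score = sum(pair_probs[pair] for pair in subset)
            if score > best_score:
                best_score, best_pairs = score, subset
    return sorted(best_pairs), best_score


# Hypothetical probabilities for a 10-nucleotide toy sequence.
toy_probs = {(0, 9): 0.9, (1, 8): 0.8, (1, 5): 0.6, (2, 7): 0.7, (5, 8): 0.3}
pairs, score = select_base_pairs(toy_probs)
print(pairs, round(score, 3))
```

In a real setting the enumeration would be replaced by a solver, and crossing pairs such as (1, 5) and (2, 7) would be where pseudoknot-specific terms enter the objective.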

Virtual: Asymmetric inheritance of RNA toxicity in C. elegans expressing CTG repeats
COSI: iRNA
  • Maya Braun, Hebrew University, Israel
  • Shachar Shoshani, Hebrew University, Israel
  • Yuval Tabach, Hebrew University, Israel


Nucleotide repeat expansions are a hallmark of over 40 neurodegenerative diseases and cause RNA toxicity and multisystemic symptoms that worsen with age. Through an unclear mechanism, RNA toxicity can trigger severe disease manifestation in infants if the repeats are inherited from their mother. Here we use Caenorhabditis elegans bearing expanded CUG repeats to show that this asymmetric intergenerational inheritance of toxicity contributes to disease pathogenesis. In addition, we show that this mechanism is dependent on small RNA pathways with maternal repeat-derived small RNAs causing transcriptomic changes in the offspring, reduced motility, and shortened lifespan. We rescued the toxicity phenotypes in the offspring by perturbing the RNAi machinery in the affected hermaphrodites. This points to a novel mechanism linking maternal bias and the RNAi machinery and suggests that toxic RNA is transmitted to offspring, causing disease phenotypes through intergenerational epigenetic inheritance.

Virtual: CRISPRon: Enhanced CRISPR/Cas9 gRNA design by deep neural networks, data aggregation and interactive user interface
COSI: iRNA
  • Xiuqing Zhang, BGI-Shenzhen, China
  • Jan Gorodkin, University of Copenhagen, Denmark
  • Yonglun Luo, Lars Bolund Institute of Regenerative Medicine, China
  • Lin Lin, Aarhus University, Denmark
  • George Church, Harvard Medical School, United States
  • Lars Bolund, Lars Bolund Institute of Regenerative Medicine, China
  • Huanming Yang, BGI-Shenzhen, China
  • Jian Wang, BGI-Shenzhen, China
  • Xun Xu, BGI-Shenzhen, China
  • Xin Liu, BGI-Shenzhen, China
  • Fengping Xu, Lars Bolund Institute of Regenerative Medicine, China
  • Hui Jiang, BGI-Shenzhen, China
  • Giulia Corsi, University of Copenhagen, Denmark
  • Jinbao Wang, MGI, BGI-Shenzhen, China
  • Tao Ma, MGI, BGI-Shenzhen, China
  • Jiayan Zhong, MGI, BGI-Shenzhen, China
  • Lijun Liu, Lars Bolund Institute of Regenerative Medicine, China
  • Zhanying Dong, Lars Bolund Institute of Regenerative Medicine, China
  • Peng Han, Lars Bolund Institute of Regenerative Medicine, China
  • Xue Liang, Lars Bolund Institute of Regenerative Medicine, China
  • Xiaoguang Pan, Lars Bolund Institute of Regenerative Medicine, China
  • Kunli Qu, Lars Bolund Institute of Regenerative Medicine, China
  • Xi Xiang, Lars Bolund Institute of Regenerative Medicine, China
  • Christian Anthon, University of Copenhagen, Denmark


One of the most widely used genome editing tools is the CRISPR/Cas9 system, which makes use of a guide RNA (gRNA) containing a stretch of 20 nucleotides that matches the genomic region at which editing is carried out. Hence, editing requires selecting a specific gRNA out of several possible ones in a given target region, so that efficiency is maximized on the intended target (the on-target) while off-target effects are minimized. Here, we focus on on-target design and present the deep learning-based method CRISPRon, trained on ~24,000 gRNAs: ~13,000 publicly available and an additional ~11,000 generated in-house. The data were carefully split into sets with minimal sequence overlap for cross-validation, leaving out one set as a completely independent test set. Comparing CRISPRon to other methods on independent test sets that have minimal sequence overlap with the gRNAs used to train the respective methods, we find that CRISPRon outperforms them. Methods like CRISPRon that are trained on indel-based data substantially outperform models built from loss-of-function data, which are still widely used. CRISPRon is available as a stand-alone tool or as a web server where on-target predictions are combined with off-target assessment by our CRISPR/Cas9 energy-based binding model CRISPRoff.

Virtual: CSSR: assignment of secondary structure to coarse-grained RNA tertiary structures
COSI: iRNA
  • Chengxin Zhang, Yale University, United States
  • Anna Marie Pyle, Yale University, United States


RNA secondary-structure (rSS) assignment is one of the most routine forms of analysis of RNA tertiary structures. However, traditional rSS assignment programs require full-atomic structures of the individual RNA nucleotides. This prevents their application to the analysis of coarse-grained RNA structures with missing base atoms, which include 6% of all RNA chains from the PDB as well as computational models from popular structure prediction programs, such as NAST and 3dRNA. To address this issue, we developed CSSR (Coarse-grained Secondary Structure of RNA), an automated program for rSS assignment. CSSR assigns rSS to any PDB or mmCIF format structure with one or any combination of the following atom types: P, C5’, C4’, C3’, C2’, C1’, O5’, O4’, O3’, and N, where N is atom N9 for purines and N1 for pyrimidines. On both experimental structures from the PDB and computational models from RNA-Puzzles, CSSR-assigned rSS achieves >90% agreement with standard full-atomic rSS. CSSR should therefore be a useful tool for analysis of computational and experimental RNA structures alike. The source code and pre-compiled programs for CSSR are available at https://github.com/pylelab/CSSR.

Virtual: Exploring DEG calling in the presence of limited patient samples
COSI: iRNA
  • Pia Lange, Institute of Medical Informatics, University of Münster, Münster, Germany
  • Julian Varghese, Institute of Medical Informatics, University of Münster, Münster, Germany
  • Sarah Sandmann, Institute of Medical Informatics, University of Münster, Münster, Germany


For detection of differentially expressed genes (DEGs) in RNA-sequencing data, approaches such as DESeq2 apply statistical tests to compare two conditions. Although accurate statistics always call for large group sizes, resources are often limited and researchers can only offer a limited number of samples.

Exploring current constraints of DEG calling in the context of limited data, we evaluated the performance of DESeq2 at a low number of samples being analyzed (< 7 samples per group). To base our analysis on a set of validated DEGs, we considered synthetically generated count data.

Even though recall of true DEGs steadily decreases along with the number of samples, high precision can be retained, depending on the proportion of true DEGs. For instance, data featuring high proportions of true DEGs (> 50%) already reach maximum precision at only three samples.
We also developed a strategy to further increase the ratio of true positives by intersecting the discoveries with results of a second analysis that is performed on a subset of the original samples.

In conclusion, we show that precise DEG calling results can be generated with low sample numbers, provided that the planned downstream analyses are feasible in the context of low recall.
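
The intersection strategy and the precision/recall trade-off described above can be made concrete with a small sketch. The gene identifiers and call sets below are invented for illustration, and the two call sets simply stand in for the output of a DEG caller run once on all samples and once on a subset; no DESeq2 call is made here.

```python
def precision_recall(called, truth):
    """Return (precision, recall) of a set of called DEGs against known true DEGs."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical ground truth, as would be known for synthetic count data.
true_degs = {"g1", "g2", "g3", "g4", "g5"}

# Hypothetical DEG calls from the full sample set and from a subset of it.
full_run = {"g1", "g2", "g3", "g4", "g8", "g9"}
subset_run = {"g1", "g2", "g3", "g7", "g9"}

# Keep only the genes reported by both analyses.
consensus = full_run & subset_run

full_precision, full_recall = precision_recall(full_run, true_degs)
cons_precision, cons_recall = precision_recall(consensus, true_degs)
print(f"full run:  precision={full_precision:.2f} recall={full_recall:.2f}")
print(f"consensus: precision={cons_precision:.2f} recall={cons_recall:.2f}")
```

In this toy case the intersection drops one false positive and one true positive, so precision rises while recall falls, which is the trade-off the abstract notes for downstream analyses.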

Virtual: Feature and model selection for non-coding RNA classification
COSI: iRNA
  • Ibrahim Chegrane, CoBIUS lab, Department of Computer Science, University of Sherbrooke, QC, Canada
  • Nabil Benjaa, CoBIUS lab, Department of Computer Science, University of Sherbrooke, QC, Canada
  • Aïda Ouangraoua, CoBIUS lab, Department of Computer Science, University of Sherbrooke, QC, Canada


Non-coding RNA (ncRNA) classification is important for genome annotation and for functional analyses of these molecules. Efficient large-scale RNA classification remains challenging. Existing methods often rely on structure similarities, and their use of ncRNA secondary structure information leads to high time complexity, which prevents them from being used at large scale. We present a sequence-based method that computes common sequence motifs to provide a set of features for fast and accurate classification of ncRNA families using a supervised learning approach.
The experimental analysis shows that the method achieves comparable or more accurate classification than existing structure-based methods with drastically reduced processing times. For instance, compared to Infernal (Nawrocki-Eddy, Bioinformatics 2013), BlastN (Altschul et al., Journal of molecular biology 1990), and ncrna-deep (Noviello et al., PLoS computational biology 2020), our method classifies all RNAs of the Rfam database (v14.1) families that contain at least 2 members (2636 families) with the highest accuracy (accuracy of 92.4% and F-beta score of 86.7) and the lowest processing time (one hour).
These results demonstrate that, thanks to an appropriate selection of informative and discriminative conserved sequence motifs, it is possible to efficiently and accurately classify ncRNAs.
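
To make the idea of motif-derived features concrete, here is a minimal sketch under strong simplifying assumptions: a "common motif" is taken to be a k-mer present in every member of a family, each sequence is encoded as a binary vector of motif presence, and classification picks the family whose motifs are best covered. The sequences, the value of `k`, and the scoring rule are all invented for illustration, and the scoring rule is a stand-in for the supervised learner mentioned above, not the authors' model.

```python
def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def common_motifs(family_seqs, k):
    """k-mers shared by every sequence of a family (a crude notion of 'motif')."""
    return set.intersection(*(kmers(s, k) for s in family_seqs))


def motif_features(seq, motif_list):
    """Binary feature vector: 1 if the motif occurs in seq, else 0."""
    return [1 if motif in seq else 0 for motif in motif_list]


# Hypothetical toy families; real input would be Rfam family members.
families = {
    "famA": ["GGGAUCCAUU", "CGGGAUCCAA", "GGGAUCCGCU"],
    "famB": ["AUUCGCGAAC", "UUCGCGAAGG", "CAUUCGCGAA"],
}
k = 5
family_motifs = {name: sorted(common_motifs(seqs, k)) for name, seqs in families.items()}
motif_list = sorted({m for motifs in family_motifs.values() for m in motifs})


def classify(seq):
    """Assign seq to the family with the largest fraction of its motifs present."""
    present = dict(zip(motif_list, motif_features(seq, motif_list)))
    scores = {
        name: sum(present[m] for m in motifs) / max(len(motifs), 1)
        for name, motifs in family_motifs.items()
    }
    return max(scores, key=scores.get)


print(family_motifs)
print(classify("ACGGGAUCCU"), classify("GGUUCGCGAAU"))
```

The feature vectors produced by `motif_features` are what a trained classifier would consume in place of the simple coverage score used here.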

Virtual: Interpretable RNA Foundation Model from Unannotated Data for Highly Accurate RNA Structure and Function Predictions
COSI: iRNA
  • Jiayang Chen, The Chinese University of Hong Kong, Hong Kong
  • Zhihang Hu, The Chinese University of Hong Kong, Hong Kong
  • Siqi Sun, Fudan University, China
  • Qingxiong Tan, The Chinese University of Hong Kong, Hong Kong
  • Yixuan Wang, Harbin Institute of Technology, China
  • Qinze Yu, University of Electronic Science and Technology of China, China
  • Licheng Zong, The Chinese University of Hong Kong, Hong Kong
  • Jin Xiao, The Chinese University of Hong Kong, Hong Kong
  • Irwin King, The Chinese University of Hong Kong, Hong Kong
  • Yu Li, CUHK, Hong Kong


Non-coding RNA structure and function are essential to understanding biological processes such as cell signaling, gene expression, and post-transcriptional regulation, and are among the core problems in the RNA field. With the rapid growth of sequencing technology, a massive amount of unannotated RNA sequences has accumulated. On the other hand, costly experimental characterization yields only a limited amount of annotated data and 3D structures. Hence, it is still challenging to design supervised computational methods for predicting RNA structures and functions, and the lack of annotated data and systematic study leads to inferior performance. To resolve the issue, we propose a novel RNA foundation model (RNA-FM) that takes advantage of all 23 million non-coding RNA sequences through self-supervised learning. With this approach, we find that the pre-trained RNA-FM can infer sequential and evolutionary information of non-coding RNAs without using any labels. Furthermore, we demonstrate RNA-FM's effectiveness by applying it to downstream secondary/3D structure prediction, protein-RNA binding preference modeling, and 5' UTR-based mean ribosome loading prediction. Comprehensive experiments show that the proposed method improves RNA structural and functional modeling results significantly and consistently. Despite being trained only on unlabelled data, RNA-FM can serve as a foundation model for the field.

Virtual: Metabolic responsive landscape of hepatic RNA metabolism revealed by single molecular RNA seq in humanized liver
COSI: iRNA
  • Chengfei Jiang, NIH/NHLBI, United States
  • Ping Li, NIH/NHLBI, United States
  • Haiming Cao, NIH/NHLBI, United States


As the central metabolic organ of the human body, the liver can promptly adjust its transcriptome in response to fluctuations in systemic metabolism. However, the transcriptional information generated from short-read RNA sequencing mainly reflects gene-level expression. We therefore used a humanized mouse model, whose liver was repopulated with human hepatocytes, to perform long-read direct RNA sequencing (DRS) with Nanopore sequencing and reveal the metabolic regulation of transcriptome architecture in both human and mouse hepatocytes. In total, we obtained 31573 isoforms for human and 64402 isoforms for mouse, of which 45.8% of human and 64.6% of mouse isoforms differ from the annotation. Next, we identified 460 transcripts with dynamic poly(A) tails and discovered 14929 N6-methyladenosine (m6A) RNA modification sites during fasting in human. Nevertheless, we noticed that human and mouse displayed significant divergence in transcript regulation during physiological regulation. In addition, we found large differences in transcript regulation among individuals in isoform structure, transcript mediation, poly(A) dynamics and m6A modifications. Overall, these results may provide a valuable resource for human liver studies, for translation from mouse to human, and for personalized medicine.

Virtual: snoDB 2.0: An enhanced interactive database, specializing in snoRNAs
COSI: iRNA
  • Danny Bergeron, Université de Sherbrooke, Canada
  • Gabrielle Deschamps-Francoeur, Université de Sherbrooke, Canada
  • Étienne Fafard-Couture, Université de Sherbrooke, Canada
  • Philia Bourelle-Bouchard, Université de Sherbrooke, Canada
  • Michelle Scott, Université de Sherbrooke, Canada


Small nucleolar RNAs (snoRNAs) are highly expressed non-coding RNAs whose canonical function is to guide the 2’-O-methylation or pseudouridylation of ribosomal RNAs. In human, most of them are embedded in other genes. During the last two decades, several non-canonical functions in diverse cellular processes have been discovered, but only for a handful of snoRNAs. This is in part due to technical difficulties in detecting and quantifying these embedded, high-copy, highly structured RNA molecules. In addition, useful, high-quality information on snoRNAs is often missing or difficult to find in general and specialized publicly available resources.

To address this issue, we created snoDB, a specialized, interactive snoRNA database listing a multitude of relevant information on all known human snoRNAs. In this second and updated version, we included, among others, newly discovered snoRNAs and snoRNA expression data of diverse tissues and cell lines. We also included snoRNA-protein interactions from ENCODE, predicted secondary structure, snoRNA motifs and functional regions. Furthermore, we now provide an easy way to find all the different copies of a snoRNA.

SnoDB 2.0 aims to give up-to-date and quality data on snoRNAs, to help the scientific community to further characterize and understand these small versatile regulatory non-coding RNAs.

Virtual: Using machine learning methods to explore significant alternative splicing events in murine melanoma with single cell RNA-seq data
COSI: iRNA
  • Catherine Zhou, Lynbrook High School, United States
  • Menglan Xiang, Stanford University, United States
  • Eugene Butcher, Stanford University, United States


Alternative splicing (AS) is a regulatory process that generates different isoforms from a single gene, and studies have shown that abnormal splicing events are linked to the progression of cancer. While differential alternative splicing has mainly been analyzed in bulk RNA-seq data, more robust methods are needed for analyzing splicing variants in single-cell data. In this study, we present the use of machine learning models to detect differentially expressed exons and subsequently classify cells based on their splicing profile, revealing the pathological mechanisms behind cancer-associated alternative splicing in murine melanoma. The pipeline takes in files from scRNA-seq experiments and performs sequence alignment, AS event quantification, clustering, and filtering. The exons are then passed to a hierarchical machine learning model consisting of a random forest feature detector and a multi-layer perceptron, which identifies significant exons and classifies cancerous cells with high accuracy. The model was trained and validated with sets of scRNA-seq data derived from melanoma tissue, wherein pathway analysis and protein network analysis of the output verified established AS events in melanoma and pinpointed receptors and molecular pathways that were differentially spliced. Specifically, pathways regulating the spliceosome were highly enriched, and the exons can be further explored as targets of melanoma treatment.

C-002: Long-read transcriptome sequencing analysis with IsoTools
COSI: iRNA
  • Matthias Lienhard, Max Planck Institute for Molecular Genetics, Germany
  • Ralf Herwig, Max Planck Institute for Molecular Genetics, Germany


Long-read transcriptome sequencing (LRTS) enables sequencing of individual mRNA molecules at full length, facilitating the discovery, characterization, and quantification of novel genes, isoforms, and alternative splicing events. However, to realize the full potential of LRTS, computational tools that explore the data at all scales are pivotal, ranging from single-nucleotide information through the isoform and gene level to transcriptome-wide statistics.
With this in mind, we developed IsoTools, a comprehensive Python package for the analysis of LRTS data, which implements data structures integrating all relevant information from LRTS transcripts and reference annotation, together with broad analysis functionality to explore, analyze, and interpret the data. In particular, we implemented a graph-based method for the identification of alternative splicing events (ASE) and a statistical approach to detect differential events. This approach adds a valuable perspective on alternative splicing, especially for genes with complex splicing structure that covers several independent ASE.
To demonstrate our methods, we generated PacBio Iso-Seq data of human hepatocytes treated with the HDAC inhibitor VPA, a compound known to induce widespread transcriptional changes. Contrasted with short read RNA-seq, this analysis provides additional insights for a better understanding of alternative splicing, in particular with respect to complex novel and differential splicing events.

C-003: Context-Specific Long-Read Transcriptomics with Bambu
COSI: iRNA
  • Andre Sim, Genome Institute Singapore, Singapore
  • Ying Chen, Genome Institute Singapore, Singapore
  • Jonathan Göke, Genome Institute Singapore, Singapore


Transcriptomics is moving towards sequencing more and more samples and conditions, but capturing novel transcripts while reducing the impact of non-expressed genes remains an ever-present challenge. To address this problem, we developed Bambu, an R software package that uses long-read RNA-seq data for both transcript discovery and quantification, enabling context-specific transcriptomics in a seamless way. Because Bambu uses long reads, it simplifies the discovery and read assignment of new complex transcripts with multiple exons, which was not possible with standard short-read sequencing. To do this, Bambu uses two modules: (1) for transcript discovery, Bambu trains a model on the known transcripts to reduce the effect of sample-to-sample variation on commonly used static thresholds; (2) for transcript quantification, an expectation-maximization algorithm estimates full-length and partial-length read support per transcript. With these two features, we show that Bambu not only greatly improves transcript discovery but that using these context-specific transcripts also improves the accuracy of transcript quantification as a whole.
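
For readers unfamiliar with expectation maximization in transcript quantification, the following is a generic textbook-style sketch, written in Python for illustration (Bambu itself is an R package and its actual model is not shown here). Each read is represented only by the set of transcripts it is compatible with, and the algorithm alternates between splitting ambiguous reads according to the current abundance estimates and re-estimating abundances from those fractional assignments. The read data are invented, and refinements such as separate full-length and partial-length support or length normalization are left out.

```python
def em_quantify(read_compat, transcripts, n_iter=200):
    """Estimate relative transcript abundances from read compatibility sets.

    read_compat: list of sets, one per read, naming compatible transcripts.
    Returns a dict mapping each transcript to its estimated read fraction.
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    n_reads = len(read_compat)
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: share each read among its compatible transcripts,
        # in proportion to the current abundance estimates.
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            for t in compat:
                expected[t] += theta[t] / total
        # M-step: abundances are the normalized expected read counts.
        theta = {t: expected[t] / n_reads for t in transcripts}
    return theta


# Hypothetical reads: 6 unique to tx_A, 2 unique to tx_B, 4 compatible with both.
reads = [{"tx_A"}] * 6 + [{"tx_B"}] * 2 + [{"tx_A", "tx_B"}] * 4
theta = em_quantify(reads, ["tx_A", "tx_B"])
print({t: round(v, 3) for t, v in theta.items()})
```

With these toy reads the ambiguous reads end up split 3:1, following the ratio of the uniquely assigned reads.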

C-004: The kinetics of RNA flow across subcellular compartments
COSI: iRNA
  • Robert Ietswaart, Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA; Equal contribution, United States
  • Brendan M. Smalec, Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA; Equal contribution, United States
  • Karine Choquet, Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA, United States
  • Erik McShane, Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA, United States
  • L. Stirling Churchman, Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA, United States


The life of an mRNA is a dynamic enterprise. Eukaryotic mRNAs travel from the nucleus, where they are synthesized, to the cytoplasm, where they are translated. To quantitatively estimate the rates at which mRNAs flow across the cell, we developed subcellular TimeLapse-seq, a method that combines RNA metabolic labeling and biochemical cell fractionation. Next, we developed a system of ordinary differential equations in a Bayesian framework to mathematically model the observed fraction of newly-synthesized RNA across subcellular compartments to estimate the rate posterior distributions and paths of RNA flow genome-wide in human and mouse cells. For each gene, we measured the rate at which RNAs are released from chromatin into the nucleoplasm, exported from the nucleus into the cytoplasm, and loaded onto polysomes. We observed substantial variability between flow rates of different transcripts across all compartments. Transcripts from genes with related functions, such as ribosomal protein and histone genes, flow across compartments with similar kinetics. Finally, using machine learning we identified several molecular features that explain the rates of RNA flow. Overall, our study comprehensively characterizes the spatiotemporal life cycle of mammalian mRNAs, revealing the many lives of RNA transcripts and the molecular features underlying their fates.
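
To illustrate how a system of ordinary differential equations links flow rates to the observed fraction of newly synthesized RNA, the sketch below uses a deliberately simplified two-compartment chain (chromatin, then nucleoplasm) with first-order rates at steady state. This is an assumed toy model, not the authors' model: the rate names and values are hypothetical, only two compartments are included, and no Bayesian inference is performed. Under these assumptions the pre-existing (unlabeled) fractions obey dc/dt = -k_release * c and dn/dt = k_export * (c - n), which has a closed-form solution that the code checks against a simple Euler integration.

```python
import math


def new_fraction_chromatin(t, k_release):
    """Fraction of chromatin RNA that is newly synthesized after labeling time t."""
    return 1.0 - math.exp(-k_release * t)


def new_fraction_nucleoplasm(t, k_release, k_export):
    """Same quantity for the downstream nucleoplasm compartment."""
    if k_release == k_export:
        raise ValueError("this closed form assumes distinct rates")
    old = (k_export * math.exp(-k_release * t)
           - k_release * math.exp(-k_export * t)) / (k_export - k_release)
    return 1.0 - old


def simulate_new_fractions(t_end, k_release, k_export, dt=1e-3):
    """Euler integration of the unlabeled fractions in both compartments."""
    chrom_old, nuc_old = 1.0, 1.0  # at t = 0 all RNA is unlabeled
    for _ in range(int(round(t_end / dt))):
        d_chrom = -k_release * chrom_old
        d_nuc = k_export * (chrom_old - nuc_old)
        chrom_old += dt * d_chrom
        nuc_old += dt * d_nuc
    return 1.0 - chrom_old, 1.0 - nuc_old


# Hypothetical rates (per minute) and a 30-minute labeling time.
k_release, k_export = 0.10, 0.05
closed = (new_fraction_chromatin(30.0, k_release),
          new_fraction_nucleoplasm(30.0, k_release, k_export))
simulated = simulate_new_fractions(30.0, k_release, k_export)
print([round(x, 3) for x in closed], [round(x, 3) for x in simulated])
```

Fitting such curves to measured new-RNA fractions at several labeling times is what allows rates to be estimated per gene; the downstream compartment always lags the upstream one.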

C-005: Correlated evolution-guided binding sequence specificity predictions for experimentally unexplored RNA-binding proteins
COSI: iRNA
  • Shu Yang, Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, United States
  • Jiahang Sha, Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, United States
  • Kefei Liu, Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, United States
  • Sumita Garai, Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, United States
  • Jingxuan Bao, Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, United States
  • Zixuan Wen, Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, United States
  • Raymond T. Ng, Department of Computer Science, University of British Columbia, Canada
  • Li Shen, Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, United States


Characterizing the binding preferences of RNA-binding proteins (RBPs) for their target RNAs is essential to decipher the mechanism underlying the cross-talk between RNAs and proteins. Although RNAs can naturally fold into various structures, many RBPs are reported to display intrinsic preferences for specific nucleotide sequence motifs. Wet-lab experiments have generated a large amount of data on RBPs’ binding specificities; however, only a relatively small subset of RBPs has been covered. The binding information for the vast majority of RBPs is still unknown.

In this work, we propose to improve the situation by utilizing the correlated evolutionary relationship between RBPs and their target RNA sequence motifs. Intuitively, similar to the setting of zero-shot learning, here we leverage the existing binding data from experimentally explored RBPs to infer the binding sequence specificities for RBPs without binding data, using evolution as auxiliary information. Specifically, we take the correlated evolution as the objective function and formulate the prediction of sequence specificities as optimization problems. We evaluated our method on the benchmarking CISBP-RNA database. The preliminary results showed that our predictions are better than the previous nearest neighbor alternatives and comparable to the true specificity model.

C-006: Trichotomous classification on small, limited datasets enables predictive model development for therapeutic small interfering RNA
COSI: iRNA
  • Kathryn Monopoli, University of Massachusetts Medical School | RNA Therapeutics Institute, United States
  • Dmitry Korkin, Worcester Polytechnic Institute | Bioinformatics & Computational Biology, United States
  • Anastasia Khvorova, University of Massachusetts Medical School | RNA Therapeutics Institute, United States


Presentation Overview:

While designing potent nonmodified small interfering RNAs (siRNAs) is trivial, no methods exist to accurately predict the activity of therapeutic, chemically modified siRNAs. Building predictive models for modified siRNA design is challenging because publicly available modified siRNA datasets are small, which limits the application of potentially powerful machine learning methods. We present a framework for building supervised classification models for siRNA efficacy prediction from a small dataset (356 siRNA sequences with corresponding efficacies). A trichotomous partitioning approach overcomes data noise and enables exploration of several classification threshold combinations. To assess these thresholds, we present a novel evaluation metric that enables model performance comparison despite large class imbalances. We identified a threshold pair yielding a random forest model that outperformed models developed from the same data using previously published linear methods. We employed a novel method for extracting features by proxy from the random forest model that can be applied to any classification model type, enabling simple feature preference comparisons. Extracted sequence preferences were consistent with the current understanding of the siRNA-mediated silencing mechanism, demonstrating the utility of feature extraction for exploring biological mechanisms. The presented framework applies to any classification problem where datasets are limited, enabling exploration of a wide range of biological questions.
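
To make the trichotomous partitioning concrete, here is a minimal sketch. It assumes that each siRNA has a single numeric efficacy score where higher means more potent, that two thresholds split the data into three classes, and that the middle class is set aside before a binary classifier is trained. The function name, labels, threshold values, and example data are hypothetical and are not taken from the study.

```python
def trichotomize(scores, lower, upper):
    """Label each score as 'ineffective', 'intermediate', or 'effective'."""
    if lower >= upper:
        raise ValueError("lower threshold must be below upper threshold")
    labels = []
    for score in scores:
        if score <= lower:
            labels.append("ineffective")
        elif score >= upper:
            labels.append("effective")
        else:
            labels.append("intermediate")
    return labels


# Hypothetical efficacy scores (percent target knockdown).
scores = [12, 35, 48, 52, 71, 90]
labels = trichotomize(scores, lower=40, upper=60)

# Keep only the two outer classes for training (assumed design choice).
training = [(s, l) for s, l in zip(scores, labels) if l != "intermediate"]
print(training)
```

Trying several `(lower, upper)` pairs changes the class balance, which is why a metric that tolerates imbalance matters when comparing them.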

C-007: Roles of mouse liver lncRNAs in non-alcoholic steatohepatitis (NASH) and liver fibrosis identified by single-cell transcriptomics-based gene co-expression regulatory network analysis
COSI: iRNA
  • Kritika Karri, Boston University, United States
  • David Waxman, Boston University, United States


Presentation Overview:

Long non-coding RNAs (lncRNAs) comprise non-coding genes with diverse mechanisms and regulatory roles. LncRNAs show highly tissue- and cell type-specific expression patterns, yet their functions remain largely uncharacterized. Here, we assembled the non-coding transcriptome from >2,000 mouse liver RNA-seq datasets and discovered ~48,000 liver-expressed lncRNAs. These lncRNAs were used to analyze the single-cell transcriptome of two liver disease models: diet-induced NASH and chemically induced liver fibrosis. We applied trajectory inference algorithms to elucidate lncRNA zonation and discovered genes whose zonation was dysregulated during NASH pathogenesis. Changes in lncRNA expression emerged as a characteristic of macrophage expansion and the differentiation to NASH-associated macrophages, which is strongly linked to disease progression. Hundreds of lncRNAs were expressed in myofibroblasts, a key source of the fibrous scar in fibrotic liver. Regulatory network analysis using bigSCale2 was used to associate lncRNAs with biological functions and to predict key regulatory lncRNAs in NASH and liver fibrosis. Lnc10922 (Meg3) and lnc47443 (Fendrr) emerged as central regulators of Wnt signaling and immunity during liver fibrosis. Finally, we used Triplex Domain Finder to identify regulatory gene targets for lncRNAs. Thus, we have characterized thousands of lncRNAs by their cell-type specificity and spatial location, and used network analysis to predict their roles in liver disease.

C-008: A cell-type centric view of isoform expression reveals combination patterns of transcript elements across spatiotemporal axes of the brain
COSI: iRNA
  • Anoushka Joglekar, Weill Cornell Medicine, United States
  • Wen Hu, Weill Cornell Medicine, United States
  • Mark Diekhans, University of California Santa Cruz, United States
  • Alexander Stein, Weill Cornell Medicine, United States
  • Hagen Tilgner, Weill Cornell Medicine, United States


Presentation Overview:

Complex systems such as the brain leverage alternative splicing to expand the proteome. To enable splicing studies in distinct cell types, we developed methods for the identification of full-length isoforms from single cells. Using our robust algorithmic framework, we demonstrated that regional identity can sometimes override cell-type specificity, and our efforts to spatially resolve isoform sequencing revealed that some developmentally regulated genes display regional splicing gradients throughout the mouse brain. Our ongoing work applies these methods to more developmental timepoints and brain regions where we have identified temporally and regionally mediated patterns of splicing.
However, the conservation of cell-type specific isoform expression in human brain is underexplored. Not only did we develop single-nuclei isoform sequencing (SnISOr-Seq) to allow sequencing of frozen tissue, but we applied this technique to human frontal cortex to uncover coordination patterns, and the disparity of exon expression variability across different neurological conditions. Lastly, to allow the extrapolation of model organism results to human, our current efforts identify conserved and divergent cell-type specific splicing patterns in the human and mouse hippocampus, and evaluate the extent of inter-individual variability in splicing. Taken together, this provides a comprehensive view of spatio-temporal splicing patterns in the mammalian brain.

C-009: Isoform-specific structural characterization of RNA elements by long- and short-read sequencing
COSI: iRNA
  • Eric Pederson, University of Massachusetts Amherst, United States
  • Zhengqing Ouyang, University of Massachusetts Amherst, United States


Presentation Overview:

We present a new software pipeline that integrates modules developed in our lab to predict RNA secondary structures based on chemical probing data, such as icSHAPE. The joint Poisson-Gamma mixture model (JPGM) incorporates experimental data to estimate the probabilistic structure preference profile (SPP) across the transcriptome, which is applied as a soft constraint for structure prediction by SeqFold. We apply our pipeline to publicly available human icSHAPE datasets and compare long-read (Nanopore) and short-read (Illumina) sequencing platforms. Using this pipeline, we characterize the secondary structures of 18 regulatory elements across 14 isoforms, as well as the 3D dynamics of the 5.8S rRNA and two MAT2A 3’ UTR hairpins. SeqFold secondary structure predictions from both Illumina and Nanopore data are highly specific and sensitive in predicting base pairing. A simulated SPP constructed from replica exchange Monte Carlo simulations of the 5.8S rRNA correlates highly with the SeqFold ensemble SPP. Our modular JPGM-SeqFold pipeline can be adapted for other RNA structure probing experiments and structure sampling approaches.

C-010: Unraveling the roles of 5’ transcript leaders in gene regulation
COSI: iRNA
  • Christina Akirtava, Carnegie Mellon University, United States
  • Gemma E. May, Carnegie Mellon University, United States
  • Hunter Kready, Carnegie Mellon University, United States
  • Lauren Nazzaro, Carnegie Mellon University, United States
  • Matt Agar-Johnson, Carnegie Mellon University, United States
  • C. Joel McManus, Carnegie Mellon University, United States


Presentation Overview:

Translation initiation typically follows a cap-dependent scanning model, and thus is limited by the 5’ transcript leader (TL). The 5’TL contains cis-regulatory sequences and trans-acting factors which influence translation initiation efficiency (TE). Previous studies using in vivo reporters of synthetic TLs identified upstream AUGs as repressors of expression (Cuperus et al, 2017; Dvir et al, 2013; Sample et al, 2018). However, the relative contributions of these features to TE in natural TLs have not been determined. To address this, we use high-throughput assays (Noderer et al, 2014) to quantify in vivo regulation from 86% of natural yeast TLs. First, we find alternative start sites greatly impact initiation. We also measure yeast Kozak strengths to develop a leaky scanning model that predicts TE. Using an elastic net model, we explain ~64% of TE and quantify sequence feature effects in vivo. Additionally, we measure expression in an eIF2a mutant that limits translation. Computational modeling suggests ribosomes scan the TL less effectively and are impacted by TL length and structure in the mutant. Finally, we compared our results to an in vitro study of initiation (Niederer et al, 2022). Together, we identify the influence of cis-acting sequences and structures on TE in vivo.
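
The idea behind a leaky scanning model can be sketched in a few lines. The version below is a textbook simplification, not the model fitted in the study: it assumes each start codon has a Kozak strength between 0 and 1, read as the probability that a scanning ribosome initiates there, and it ignores reinitiation, TL length, and structure. The function name and the example strengths are hypothetical.

```python
def main_orf_initiation(upstream_strengths, main_strength):
    """Fraction of scanning ribosomes that initiate at the main start codon.

    Each strength is the assumed probability (0 to 1) that a scanning
    ribosome initiates at that start codon rather than scanning past it.
    """
    for strength in list(upstream_strengths) + [main_strength]:
        if not 0.0 <= strength <= 1.0:
            raise ValueError("strengths must lie in [0, 1]")
    reaching_main = 1.0
    for strength in upstream_strengths:
        reaching_main *= 1.0 - strength  # ribosomes that leak past this uAUG
    return reaching_main * main_strength


# Hypothetical TL with two upstream AUGs in moderate Kozak context.
print(main_orf_initiation([0.5, 0.5], main_strength=0.8))
```

Under these assumptions, every additional upstream AUG multiplies the main-ORF initiation rate by a factor below one, which matches the repressive effect of upstream AUGs noted above.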

C-011: The chronology of mRNP remodeling
COSI: iRNA
  • Yeon Choi, Institute for Basic Science, Seoul National University, South Korea
  • Buyeon Um, Institute for Basic Science, Seoul National University, South Korea
  • Yongwoo Na, Institute for Basic Science, Seoul National University, South Korea
  • Jeesoo Kim, Institute for Basic Science, Seoul National University, South Korea
  • Jong-Seo Kim, Institute for Basic Science, Seoul National University, South Korea
  • V. Narry Kim, Institute for Basic Science, Seoul National University, South Korea


Presentation Overview:

Throughout the journey from synthesis to decay, mRNA changes its protein partners, dynamically remodeling the ribonucleoprotein complex (mRNP). Yet previous studies captured unsynchronized pools of the mRNA interactome from mixed stages, without the temporal resolution that is critical for dissecting mRNP remodeling and understanding the functions of RNA-binding proteins (RBPs). Here, we provide a time-resolved mRNA interactome. Using 4-thiouridine (4sU) pulse-labeling, we captured mRNPs at 10 time points and quantified ≥700 RBPs using liquid chromatography with tandem mass spectrometry. The chronological order of mRNA interactions is consistent with the known functions and localizations of RBPs: nuclear RBPs involved in pre-mRNA processing are “early” binders, while translation factors and decay factors are detected at later time points.
Interestingly, some RBPs showed clear disagreement between mRNA binding dynamics and their reported functions, hinting at yet-unknown functions. We developed a supervised machine learning method that predicts mRNA binding dynamics based on Gene Ontology annotations and systematically identified RBPs with unexpected dynamics. RBPs with high prediction errors were not functionally well-characterized, indicating mRNA binding dynamics offer a useful clue for understanding these RBPs. Our study introduces a temporal dimension to the mRNA interactome research, and provides new insights into RBP functions and posttranscriptional regulation.

C-012: Characterisation of full-length transcripts with ‘Nexons’ uncovers the regulation of poison exons in splicing factors in human germinal centre B cells
COSI: iRNA
  • Ozge Gizlenci, Immunology Programme, The Babraham Institute, University of Cambridge, Cambridge, UK, United Kingdom
  • Louise Matheson, Immunology Programme, The Babraham Institute, University of Cambridge, Cambridge, UK, United Kingdom
  • Simon Andrews, Bioinformatics Group, The Babraham Institute, University of Cambridge, Cambridge, UK, United Kingdom
  • Laura Biggins, Bioinformatics Group, The Babraham Institute, University of Cambridge, Cambridge, UK, United Kingdom
  • Jingyu Chen, Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK, United Kingdom
  • Rebecca Berrens, University of Oxford, Oxford, UK, United Kingdom
  • Daniel J. Hodson, Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK, United Kingdom
  • Martin Turner, Immunology Programme, The Babraham Institute, University of Cambridge, Cambridge, UK, United Kingdom


Presentation Overview:

Alternative splicing (AS) plays a major role in the differentiation of immune cells during an immune response, as 29% of AS genes are specific to the immune system. Although the role of AS has been extensively investigated in T cells, its role in B cell activation is less well characterised. We sought to develop a long-read Oxford Nanopore Technologies (ONT) workflow to understand post-transcriptional regulation at both the gene and isoform levels in human germinal centre (GC) B cells. As one of the challenges of ONT sequencing is the accurate computational analysis of isoforms, we developed the ‘Nexons’ pipeline to identify differentially spliced transcript variants from long-read sequencing data. An in-depth analysis of splicing regulators with Nexons revealed differential regulation of the poison exon (PE) in splicing regulators (e.g. SRSF3) in GC B cells. In GC B cells, the PEs of splicing factors were preferentially spliced out, whereas naïve B cells expressed isoforms carrying the PE, leading to nonsense-mediated mRNA decay. Moreover, we identified novel spliced variants of these genes that were undetectable with short-read data. Altogether, our findings validate the combination of Nexons with a Smart-seq2-adapted ONT RNA-sequencing workflow as a suitable method for the identification and quantification of complex isoforms.

C-013: RISER: Real-time in-silico enrichment of RNA species from Nanopore signals
COSI: iRNA
  • Alexandra Sneddon, Australian National University, EMBL Australia, Australia
  • Agin Ravindran, Australian National University, EMBL Australia, Australia
  • Nikolay Shirokikh, Australian National University, Australia
  • Eduardo Eyras, Australian National University, EMBL Australia, Australia


Presentation Overview:

Nanopore technology makes it possible to sequence RNA with single-molecule sensitivity. Standard library preparation protocols for nanopore direct RNA sequencing (DRS) enrich for polyadenylated transcripts to facilitate ligation of the sequencing adapter. The resulting libraries therefore typically contain a medley of RNA species with 3’ poly(A) tails. Specialised biochemical enrichment protocols are currently required if only a specific RNA species is of interest. Here we describe the first method for Real-time In-Silico Enrichment of RNA species (RISER) during DRS. RISER accurately distinguishes protein-coding from non-coding species using only 4 seconds of raw DRS signal, with an independent test accuracy of 88% and an AUROC of 0.96. RISER has also been integrated with Oxford Nanopore’s ReadUntil API to enact targeted real-time RNA sequencing. The potential applications of this novel technology are numerous, allowing for the first time: the enrichment of mRNAs by cleansing the sequencing data of unwanted non-coding RNA species; the enrichment of non-coding RNA species that are lowly expressed and difficult to detect in an unbiased sequencing experiment; or the tagging of reads with the RNA species to which they belong in real time during sequencing. Additionally, the real-time nature of RISER makes it useful for time-sensitive applications.

C-014: Visualizing position-dependent effects of RNA binding proteins on alternatively spliced events
COSI: iRNA
  • Marina Yurieva, The Jackson Laboratory for Genomic Medicine, United States
  • Nathan Leclair, The Jackson Laboratory for Genomic Medicine; Graduate Program in Genetics and Development, UConn Health, United States
  • Brittany Angarola, The Jackson Laboratory for Genomic Medicine, United States
  • Laura Urbanski, The Jackson Laboratory for Genomic Medicine; Graduate Program in Genetics and Development, UConn Health, United States
  • Olga Anczukow, The Jackson Laboratory for Genomic Medicine; Department of Genetics and Genome Sciences, UConn Health, United States


Presentation Overview:

Alternative RNA splicing (AS) is a post-transcriptional mechanism that generates multiple mRNA isoforms from a single gene. AS is regulated by the splicing machinery along with trans-acting RNA-binding proteins (RBPs) that bind cis-regulatory exonic or intronic motifs in the pre-mRNA. AS is a tunable process that is critical in development and often dysregulated in disease, including cancer. Predicting which RBPs regulate specific splicing events and defining their position-dependent effects is needed to understand how AS is regulated in healthy tissues and in disease. Therefore, better tools are needed for the prediction and visualization of RBP motifs enriched in splicing events.
We developed a computational pipeline that takes as input AS events detected from RNA-seq data using rMATS, searches for known RBP binding motifs within exons and introns using RBPmap, and calculates their frequencies along with Bayesian probabilities based on background sequences without AS. The use of Bayesian probability enables a more straightforward interpretation and comparison of binding motifs across RBPs. Motif enrichments are visualized as position-dependent probabilities along intronic or exonic sequences for each AS event type.
In sum, this visualization pipeline helps to map the positions of RBP motifs and to define the role of RBPs in regulating AS in cell- and disease-specific conditions.
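
The abstract does not state how the Bayesian probabilities are computed, so the following is only one plausible reading, sketched under stated assumptions: for each position bin, Bayes' rule combines how often a motif occurs in AS events with how often it occurs in background sequences. The function name, the prior, the pseudocount, and the counts are all hypothetical.

```python
def posterior_as_given_motif(as_hits, n_as, bg_hits, n_bg,
                             prior_as=0.5, pseudocount=1.0):
    """P(sequence is an AS event | motif present), per position bin.

    as_hits / bg_hits: number of AS / background sequences with the motif
    in each bin; n_as / n_bg: total sequences in each set.
    """
    posteriors = []
    for hits_as, hits_bg in zip(as_hits, bg_hits):
        # Smoothed likelihoods of observing the motif in this bin.
        p_motif_as = (hits_as + pseudocount) / (n_as + 2 * pseudocount)
        p_motif_bg = (hits_bg + pseudocount) / (n_bg + 2 * pseudocount)
        numerator = p_motif_as * prior_as
        evidence = numerator + p_motif_bg * (1.0 - prior_as)
        posteriors.append(numerator / evidence)
    return posteriors


# Hypothetical counts for three bins along an upstream intron.
profile = posterior_as_given_motif(
    as_hits=[30, 10, 5], n_as=100, bg_hits=[10, 10, 12], n_bg=100
)
print([round(p, 2) for p in profile])
```

With an even prior, values above 0.5 mark bins where the motif is more frequent in AS events than in the background, which puts every RBP on the same 0 to 1 scale for plotting.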

C-015: tRuffle: an R/Shiny app for exploring unique DEGs in scRNA-seq clusters
COSI: iRNA
  • Linda Ebbert, Institute of Medical Informatics, University of Münster, Germany
  • Julian Varghese, Institute of Medical Informatics, University of Münster, Germany
  • Sarah Sandmann, Institute of Medical Informatics, University of Münster, Germany


Presentation Overview:

Single-cell RNA-sequencing (scRNA-seq) is a powerful method for investigating differential gene expression patterns among distinct cell types. Although the majority of identified differentially expressed genes (DEGs) reflect biological differences, some result from technical noise.
We present tRuffle, a user-friendly R/Shiny app that not only helps to exclude systematic artifacts but also makes it easy to recognize the most meaningful DEGs for each scRNA-seq cluster. Guided by an intuitive graphical user interface, users can upload DEG lists generated by commonly used tools for each cluster in comparison to the remaining data. Based on the DEG occurrence among all clusters, a uniqueness score is calculated for every gene. This score then serves as an additional evaluation parameter, excluding genes that are differentially expressed in many clusters and are thus unspecific. Furthermore, specifying individual thresholds enables the user to adjust for varying levels of background noise and to explore only those genes that show cluster-specific expression. The combination of tabular and graphical output makes tRuffle a convenient, fast and adaptable tool for generating valid biological hypotheses from scRNA-seq data.
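
The exact uniqueness score is not given in the abstract. As a hedged sketch of how an occurrence-based score could work, the Python snippet below assigns 1.0 to a gene that is a DEG in exactly one cluster and 0.0 to a gene that is a DEG in every cluster. The formula, function name, cluster names, and threshold are assumptions for illustration, and tRuffle itself is an R/Shiny app.

```python
def uniqueness_scores(degs_by_cluster):
    """Map each gene to 1.0 (DEG in one cluster) down to 0.0 (DEG in all)."""
    n_clusters = len(degs_by_cluster)
    if n_clusters < 2:
        raise ValueError("need at least two clusters")
    counts = {}
    for genes in degs_by_cluster.values():
        for gene in set(genes):  # count each gene once per cluster
            counts[gene] = counts.get(gene, 0) + 1
    return {g: 1.0 - (k - 1) / (n_clusters - 1) for g, k in counts.items()}


# Hypothetical DEG lists for three clusters.
degs = {
    "cluster0": ["CD3E", "ACTB", "MALAT1"],
    "cluster1": ["MS4A1", "ACTB"],
    "cluster2": ["LYZ", "ACTB", "MALAT1"],
}
scores = uniqueness_scores(degs)

# Keep only genes above a user-chosen (hypothetical) threshold.
specific = sorted(g for g, s in scores.items() if s >= 0.75)
print(specific)
```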

C-016: Bento: A toolkit for subcellular analysis of spatial transcriptomics data
COSI: iRNA
  • Clarence Mah, University of California San Diego, United States
  • Noorsher Ahmed, University of California San Diego, United States
  • Dylan Lam, University of California San Diego, United States
  • Hannah Carter, University of California San Diego, United States
  • Gene Yeo, University of California San Diego, United States


Presentation Overview:

Advances in spatial transcriptomics technologies produce RNA imaging data at increasingly higher throughput and scale, enabling insight into the spatial organization of RNA transcripts. While methods have been developed for tasks such as identifying cell types, characterizing tissue heterogeneity and predicting ligand-receptor interactions, the ability to study RNA organization at the subcellular level remains elusive. We present Bento, a Python toolkit for characterizing subcellular spatial RNA organization, implementing tools for platform-agnostic ingestion of high-throughput multiplexed spatial transcriptomics data, visualization, detection of subcellular RNA localization patterns, and identification of colocalizing transcripts. We applied Bento to analyze spatial transcriptomics datasets generated by seqFISH+ (10k genes) and MERFISH (130 genes) to understand the interplay between gene function, cellular morphology and RNA localization. We also utilize Bento to demonstrate how cis-acting sequence features and trans-acting RNA binding proteins can work together to differentially localize RNA molecules into subcellular compartments. The toolkit is implemented parallel to the existing ecosystem of single-cell and spatial analysis toolkits, thereby reducing the technical barrier to entry, as well as enabling the community to continue developing and maintaining the toolkit. We provide Bento as an open-source tool for the community to further expand our understanding of subcellular biology.

C-017: Predicting in-vivo binding preferences of RNA-binding proteins using transfer learning
COSI: iRNA
  • Ilyes Baali, Memorial Sloan Kettering Cancer Center, United States
  • Alexander Sasse, University of Washington, United States
  • Quaid Morris, Memorial Sloan Kettering Cancer Center, United States


Presentation Overview:

Predicting in vivo binding sites for RNA-binding proteins (RBPs) is challenging. Crosslinking immunoprecipitation (CLIP) assays allow the characterization of in vivo binding preferences of many RBPs, but these assays only query a single genomic background and cellular context, often limited to cell lines. Here we explore a different approach: pairing in vitro RBP binding data from RNAcompete with transfer learning to predict in vivo binding sites given the in vitro binding preference. Specifically, we present a new multi-task deep learning-based model that predicts in vivo binding sites for an RBP for which only the in vitro binding preferences are available. In short, the model is pre-trained on a set of RBPs for which both in vitro and in vivo sequences are present, in order to learn extrinsic factors that affect binding in a cellular environment. Then, after training on in vitro binding data for additional RBPs, the model can be queried to predict the in vivo binding preferences for unseen RBPs. This methodology has the potential to infer the binding preferences for many RBPs for which in vivo characterizations are yet to be established.

C-018: Improved prediction of RBP-RNA interactions using AlphaFold-derived RBP representations
COSI: iRNA
  • Ilyes Baali, Memorial Sloan Kettering Cancer Center, United States
  • Cyrus Tam, Memorial Sloan Kettering Cancer Center, United States
  • Quaid Morris, Memorial Sloan Kettering Cancer Center, United States


Presentation Overview:

RBP-RNA interactions are essential to post-transcriptional regulation. Accurate prediction of their interactions could provide us with mechanistic insights into processes such as alternative splicing and facilitate the identification of potential therapeutic targets. Multiple methods have been proposed to predict such interactions, yet a key challenge is to effectively represent RBPs under the context of RNA binding, which is important as similarities between effective RBP representations would likely be a better predictor of similar RNA specificities than amino acid conservation. Inspired by the near-experimental accuracy achieved by AlphaFold in predicting protein structures, we hypothesize that intermediate features generated by AlphaFold during the RBP structure prediction process would serve as effective RBP representations.
We extracted the intermediate representations for 355 RBPs at multiple stages of the AlphaFold structure inference process, and used them to predict the binding specificities of their respective RBPs to 47 7-mer RNAs from an RNACompete assay, using various deep learning architectures. Our preliminary results indicate that models using AlphaFold-derived representations generally outperform those that use bag-of-words features, and are especially capable of generalizing across RBPs with distinct RNA-binding domains.

C-019: Improved long non-coding RNA annotation and detection in human transcriptomic profiles
COSI: iRNA
  • Phillip McCown, University of Michigan, United States
  • Felix Eichinger, University of Michigan, United States
  • Matthew Manninen, University of Michigan, United States
  • Fadhl Alakwaa, University of Michigan, United States
  • Damian Fermin, University of Michigan, United States
  • Abhijit Naik, University of Michigan, United States
  • Sean Eddy, University of Michigan, United States
  • Matthias Kretzler, University of Michigan, United States


Presentation Overview:

Introduction: The importance of long non-coding (lnc) RNA involvement in biology is increasingly recognized. By remapping transcriptomic datasets from human kidneys, we can identify lncRNAs involved in disease processes with cell type selectivity.

Methods: We evaluated different lncRNA annotation sources. A modified Gene Transfer Format (GTF) file was created by merging the lncRNAKB (version 6) and Ensembl (GRCh38) GTFs into a non-redundant GTF, named lEG (lncRNAKB-Ensembl GTF). Three kidney biopsy datasets, comprising 59 patients, were remapped using lEG, effectively annotating genes and lncRNAs within one pipeline.

Results: Our non-redundant lEG contained 102,420 mRNAs and lncRNAs. In kidney biopsy datasets, we mapped ≥25,000 putative lncRNAs in bulk and scRNA-seq profiles, including previously undetected lncRNAs. In scRNA-seq, at least 91 lncRNAs clustered discretely in glomerular, tubular, or immune cells with up to 84.5% specificity.

Conclusion: The lEG allows efficient gene-lncRNA mapping in a single pipeline, enabling rapid quantitation of lncRNA expression alongside gene profiles and reducing the need to remap sequencing profiles for different use cases. We demonstrate lncRNAs with selective expression in specific cell types and disease states in human kidneys. Use of this approach can further our understanding of the etiology of several kidney diseases, while also possibly revealing therapeutic targets or diagnostic markers.

C-020: Opposing gradients of retinoic acid and sonic hedgehog specify tonotopic identity in the murine cochlea
COSI: iRNA
  • Shuze Wang, University of Michigan, United States
  • Saikat Chakraborty, University of Michigan, United States
  • Yujuan Fu, University of Washington, United States
  • Mary Lee, University of Michigan, United States
  • Jie Liu, University of Michigan, United States
  • Joerg Waldhaus, University of Michigan, United States


Presentation Overview:

In the mammalian auditory system, frequency discrimination depends on morphological and physiological properties of the organ of Corti that gradually change along the tonotopic axis of the organ and therefore shape the tuning properties of hearing. At the molecular level, those frequency-specific characteristics are mirrored in gene expression gradients, which require tonotopic patterning of the cochlea. However, molecular mechanisms that specify tonotopic identity remain poorly understood. To infer molecular mechanisms that pattern the organ of Corti along the frequency axis, we reconstructed the embryonic cochlear duct in 3D space from scRNA-seq data and proposed two hypotheses regarding spatial patterning. Analyzing two developmental time points suggested that morphogens, rather than a timing-related mechanism, confer spatial identity in the cochlea. Subsequently, retinoic acid (RA) signaling was identified as a morphogen with a tonotopic gradient in the cochlear floor. Using cochlear explants, functionality of the RA cascade was confirmed and an inverse relation with sonic hedgehog (SHH) signaling was predicted. Cell culture experiments indicated that SHH is involved in shaping the RA gradient via transcriptional regulation of Cyp26b1, an RA-degrading enzyme. In summary, the findings suggest that RA and SHH form opposing morphogen gradients patterning tonotopic identity of the developing cochlea.

C-021: Full-length direct RNA sequencing identifies differential RNA length upon cellular stress
COSI: iRNA
  • Sulochan Malla, National Institute on Aging, United States
  • Showkat Dar, National Institute on Aging, United States
  • Christopher Lee, National Institute on Aging, United States
  • Jessica Martin, National Institute on Aging, United States
  • Aditya Khandeshi, National Institute on Aging, United States
  • Jennifer Martindale, National Institute on Aging, United States
  • Cedric Belair, National Institute on Aging, United States
  • Manolis Maragkakis, National Institute on Aging, United States


Presentation Overview:

Cells respond to endogenous and exogenous stressors by activating stress response pathways that restore cellular homeostasis. The ability of cells to respond to stress reduces with age and has been associated with diseases such as neurodegeneration. Nevertheless, the molecular mechanisms underlying stress response and RNA regulation are poorly characterized. Here we employ end-to-end, direct RNA sequencing with ligated adaptors to capture the complete transcriptome in human cells upon stress. We developed NanopLen, a software package that identifies RNA length differences across conditions using linear mixed models. We find that upon stress, RNAs are subject to 5’ to 3’ end shortening. We show that RNA shortening is coupled to translation and ribosome occupancy and can be partially rescued by restoring translation initiation or preventing ribosome run-off. Our results show that RNA shortening is independent of the poly(A) tail length but dependent on XRN1, the primary 5’ to 3’ exonuclease. Finally, we show that upon stress, shortened RNAs are enriched in the stress granules and the splicing program shifts towards shorter gene isoforms. Taken together, this work provides a computational tool for differential RNA length analysis and reveals the dynamics of response mechanisms regulating the RNA state during oxidative stress.

C-022: LSV-Seq: A Novel Targeted Sequencing Method to Detect and Quantify Alternative Splicing Across Human Tissues
COSI: iRNA
  • Kevin Yang, University of Pennsylvania, United States
  • Yoseph Barash, University of Pennsylvania, United States
  • Peter Choi, Children's Hospital of Philadelphia, United States


Presentation Overview:

Alternative splicing is a key regulatory process that allows multiple transcripts to be produced from a single gene. Splicing has been primarily studied on a high-throughput scale via RNA sequencing (RNA-Seq). However, most of the reads in standard RNA-Seq are not the junction-spanning reads required for detecting splicing changes and splice isoforms. To address this, we developed a cost-effective, targeted RNA-Seq method to quantify splicing variations across human tissues. Building on the previous MPE-Seq method (Xu et al 2019) for detecting splicing in yeast, our LSV-Seq method uses thousands of reverse transcription primers anchored near 3’ splice sites. We created a new pipeline to identify targetable regions from previous RNA-Seq data using the MAJIQ algorithm and predict high yield primers. The library preparation protocol uses highly specific reverse transcription conditions to prevent off-target amplification. LSV-Seq achieves an overall median enrichment of ~500-fold compared to standard RNA-Seq and a median enrichment of ~800-fold for lowly expressed genes. Furthermore, LSV-Seq quantifications correlate well with RNA-Seq and detect numerous de novo junctions not found with RNA-Seq. We envision that LSV-Seq will be used to quantify splicing in large patient cohorts, detect splicing variation in lowly expressed genes, and detect transient splicing intermediates.

C-023: Assessing the diversity of transcription from the ENCODE4 long-read RNA-seq dataset
COSI: iRNA
  • Fairlie Reese, University of California, Irvine, United States
  • Brian Williams, California Institute of Technology, United States
  • Muhammed Hasan Çelik, University of California, Irvine, United States
  • Elisabeth Rebboah, University of California, Irvine, United States
  • Narges Rezaie, University of California, Irvine, United States
  • Diane Trout, California Institute of Technology, United States
  • Heidi Liang, University of California, Irvine, United States
  • Barbara Wold, California Institute of Technology, United States
  • Ali Mortazavi, University of California, Irvine, United States


Presentation Overview:

Alternative splicing allows a gene to encode multiple isoforms, which can also differ at the 5’ and 3’ ends, and is an important method of gene regulation in higher eukaryotes. Long-read RNA-seq (lrRNA-seq) allows for sequencing of entire transcripts which provides the putative 5’ and 3’ ends and the internal structure of each transcript. This allows us to explore the diversity of isoform usage across different samples.

As part of the final phase of the ENCODE Consortium, we have sequenced 216 lrRNA-seq libraries for 60 unique human and mouse samples using the PacBio platform. We detect a large portion of annotated protein-coding human genes and transcripts. We also identify unannotated transcripts, for which we predict and characterize the coding potential. We assess the post-transcriptional regulatory impact of each isoform by predicting which are subject to NMD or include microRNA binding sites. We use this collection of known and novel transcripts to define a reference set of 5’/3’ ends and intron chains that are used for each gene. We characterize each gene based on how much variation is seen in end usage and internal splicing and investigate genes whose transcriptional preferences differ between different tissues, cell lines, or species.
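
The per-gene characterization described above can be pictured with a minimal sketch that counts the distinct 5’ ends, 3’ ends, and intron chains observed for each gene. The record layout (keys `gene`, `start`, `end`, `introns`) is hypothetical, and exact coordinates are compared here for simplicity; a real analysis would likely merge nearby ends within some window.

```python
from collections import defaultdict

def summarize_gene_diversity(transcripts):
    """Count distinct 5' ends, 3' ends, and intron chains per gene.

    Each transcript is a dict with hypothetical keys:
    'gene', 'start' (5' end), 'end' (3' end),
    'introns' (list of (donor, acceptor) coordinate pairs).
    """
    features = defaultdict(
        lambda: {"five_prime": set(), "three_prime": set(), "intron_chains": set()}
    )
    for tx in transcripts:
        entry = features[tx["gene"]]
        entry["five_prime"].add(tx["start"])
        entry["three_prime"].add(tx["end"])
        # The ordered tuple of introns identifies the internal structure
        entry["intron_chains"].add(tuple(tx["introns"]))
    return {
        gene: {name: len(values) for name, values in feats.items()}
        for gene, feats in features.items()
    }

# Toy transcripts (coordinates are made up)
transcripts = [
    {"gene": "geneA", "start": 100, "end": 900, "introns": [(200, 300), (400, 500)]},
    {"gene": "geneA", "start": 100, "end": 900, "introns": [(200, 500)]},
    {"gene": "geneA", "start": 150, "end": 950, "introns": [(200, 300), (400, 500)]},
    {"gene": "geneB", "start": 10, "end": 80, "introns": []},
]
summary = summarize_gene_diversity(transcripts)
print(summary)
```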

C-024: Integrative analysis of hundreds of RNA binding proteins suggests known and novel regulators of 3’UTR diversity
COSI: iRNA
  • Matthew Gazzara, University of Pennsylvania Perelman School of Medicine, United States
  • Kristen Lynch, University of Pennsylvania Perelman School of Medicine, United States
  • Yoseph Barash, University of Pennsylvania Perelman School of Medicine, United States


Presentation Overview:

3’ untranslated regions (3’UTRs) play a critical role in controlling gene expression by altering mRNA stability, translation, and localization. Diversity in 3’UTR isoforms is generated through use of alternative polyadenylation sites (APA) and splicing of alternative last exons (ALEs). Although the breadth of alternative 3’UTR isoforms in different biological contexts is known, the mechanisms driving their regulation remain poorly understood. To identify novel regulators of 3’UTR diversity, we performed an integrative analysis using diverse algorithms to detect multiple patterns of APA and ALEs in RNA-seq data from ENCODE. We applied this approach to the large set of over 350 RBP depletion experiments to detect significant shifts in 3’UTR isoforms. Integrating binding (eCLIP) and motif data with these functional targets identified several novel regulators. We show that subsets of these regulated events influence mRNA stability and localization. Finally, co-expression analysis shows altered expression of specific regulators associated with shifts in specific cancers and across tissues in GTEx. We experimentally validated several targets of one of these novel regulators, the largely unstudied RNA helicase DDX55, which showed altered expression in ovarian cancer and acute myeloid leukemia.

C-025: Using machine learning to understand the determinants of mRNA stability in mycobacteria
COSI: iRNA
  • Huaming Sun, Worcester Polytechnic Institute, United States
  • Ying Zhou, Worcester Polytechnic Institute, United States
  • Diego Vargas Blanco, Massachusetts General Hospital, United States
  • Catherine Masiello, Worcester Polytechnic Institute, United States
  • Jessica Kelly, Worcester Polytechnic Institute, United States
  • Justin Moy, Worcester Polytechnic Institute, United States
  • Scarlet Shell, Worcester Polytechnic Institute, United States
  • Dmitry Korkin, Worcester Polytechnic Institute, United States


Presentation Overview:

As a highly successful pathogen, M. tuberculosis is able to infect, survive and proliferate within harsh microenvironments created by the human host, aided by regulation of mRNA degradation. Previous studies have shown that mRNA degradation varies not only among genes but also between conditions. Here we developed a computational pipeline using RNAseq and machine learning to identify the features that determine mRNA degradation in mycobacteria. First, we performed RNAseq to quantify mRNA degradation profiles transcriptome-wide using the non-pathogenic model M. smegmatis under normal and stress conditions. Next, we clustered mRNAs according to their degradation time. Then we trained a random forest classifier to explore the mRNA features that are associated with different degradation times. Our results show that instead of one dominant feature, various types of features, including nucleotide and codon content, secondary structure, ribosome occupancy and other sequence features, all contribute to differentiating degradation times. Our results also demonstrate that the determinants of degradation differ between leadered and leaderless mRNAs and between normal and stress conditions. Together, these results suggest that mRNA degradation in mycobacteria is governed by complex regulatory mechanisms.

C-026: A pan-cancer transcriptome analysis of 3’UTR exitron splicing
COSI: iRNA
  • Joshua Fry, University of Minnesota, United States
  • Rendong Yang, University of Minnesota, United States


Presentation Overview:

Much attention has been paid to alternative splicing and aberrant gene isoforms in protein coding regions, especially in cancer. However, less is known about splicing in untranslated regions (UTR). Here we present the first systematic pan-cancer study of introns within the UTR. Our data show that cryptic introns (called exitrons) in the 3'UTR are widespread in cancer. We analyze and call novel exitrons in over 9,000 tumor samples from The Cancer Genome Atlas (TCGA) pan-cancer dataset and over 9,000 normal samples from the Genotype-Tissue Expression (GTEx) database. We calculate the nonsense-mediated decay (NMD) efficiency of 3'UTR exitrons, controlling for tumor expression heterogeneity and molecular subtype, and show that for many exitrons, including one in the AR gene, percent spliced is positively correlated with NMD efficiency. However, there are some genes where the inverse relation holds. We focus on one such gene, IGF2, and show that IGF2 gene expression in spliced samples can be explained by RNA binding proteins associated with the 3'UTR. We demonstrate that exitrons, while potentially triggering NMD, also transform the regulatory landscape of the 3'UTR, resulting in yet another avenue for the cancer cell to fine-tune gene expression and pursue advantageous cellular programs.

C-027: RNA splicing analysis using heterogeneous and large RNA-seq datasets
COSI: iRNA
  • Yoseph Barash, University of Pennsylvania Biociphers Lab, United States
  • Matthew Gazzara, University of Pennsylvania Biociphers Lab, United States
  • Jorge Vaquero-Garcia, University of Pennsylvania Biociphers Lab, Spain
  • Joseph Aicher, University of Pennsylvania Biociphers Lab, United States
  • Paul Jewell, University of Pennsylvania Biociphers Lab, United States
  • Caleb Radens, University of Pennsylvania Biociphers Lab, United States
  • Anupama Jha, University of Pennsylvania, United States
  • Christopher Green, University of Pennsylvania, United States
  • Scott Norton, University of Pennsylvania, United States
  • Nicholas Lahens, University of Pennsylvania, United States
  • Gregory Grant, University of Pennsylvania, United States


Presentation Overview:

The ubiquity of RNA-seq has led to many methods that use RNA-seq data to analyze variations in RNA splicing. However, available methods are not well suited for handling heterogeneous and large datasets. Such datasets scale to thousands of samples across dozens of experimental conditions, exhibit increased variability compared to biological replicates, and involve thousands of unannotated splice variants resulting in increased transcriptome complexity. We describe here a suite of algorithms and tools implemented in the MAJIQ v2 package to address challenges in detection, quantification, and visualization of splicing variations from such datasets. Using both large-scale synthetic data and GTEx v8 as benchmark datasets, we demonstrate that the approaches in MAJIQ v2 outperform existing methods. We then apply the MAJIQ v2 package to analyze differential splicing across 2,335 samples from 13 brain subregions, demonstrating its ability to offer new insights into brain subregion-specific splicing regulation.

C-028: A Unified View of Transcriptome Complexity by Combining Transcriptome Annotation with Long and Short Reads RNA Sequencing
COSI: iRNA
  • Yoseph Barash, University of Pennsylvania, United States
  • Seong Woo Han, University of Pennsylvania, United States
  • Paul Jewell, University of Pennsylvania, United States


Presentation Overview:

Although high-throughput short-read RNA sequencing has given researchers unprecedented detection and quantification capabilities of splicing variation, it is not able to directly capture full-length isoforms. In contrast, long-read sequencing offers improved detection of full or partial isoforms, but is limited by high error rates and low throughput. Thus, in order to better understand the underlying isoforms and splicing changes in a given biological condition, we developed MAJIQ-COMB, a tool to visualize and quantify splicing variations which combines transcriptome annotation, long-read isoform detection from available tools, and MAJIQ (Vaquero et al., 2016, 2021) short-read RNA-Seq analysis in a modular manner. A “module” is a subunit of the gene splice graph that has a single point of entry and exit, breaking down a complex gene splice graph to its subunits. We analyzed human cell-line samples from the Long-read RNA-seq Genome Annotation Assessment Project dataset, enumerating possible paths (i.e. parts of isoforms) within each module, analyzing which path is supported by which type of evidence (known isoforms, short-reads, long-reads), quantifying the splicing variations involved, and combining them back into full transcripts. Our results demonstrate the utility of MAJIQ-COMB and the importance of such combined analysis for mapping transcriptome variations accurately.
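
To make the notion of a module concrete, the following sketch enumerates every path between the single entry and single exit of a small splice graph and reports which evidence sources contain all edges of each path. The graph encoding, function names, and evidence labels are assumptions for illustration; they do not reflect MAJIQ-COMB's actual data structures, and treating a path as supported when every one of its junctions is individually supported is a simplification.

```python
def enumerate_paths(edges, entry, exit_node):
    """Return all paths from entry to exit_node in an acyclic splice graph.

    edges: dict mapping a node (exon) to a list of successor nodes.
    """
    paths = []
    stack = [(entry, [entry])]
    while stack:
        node, path = stack.pop()
        if node == exit_node:
            paths.append(path)
            continue
        for successor in edges.get(node, []):
            stack.append((successor, path + [successor]))
    return sorted(paths)

def path_support(path, evidence):
    """Names of evidence sources whose edge sets contain every edge of the path."""
    path_edges = set(zip(path, path[1:]))
    return sorted(
        source for source, source_edges in evidence.items()
        if path_edges <= source_edges
    )

# Toy module: E1 is the entry, E4 the exit
module_edges = {"E1": ["E2", "E3", "E4"], "E2": ["E4"], "E3": ["E4"]}
evidence = {
    "annotation": {("E1", "E2"), ("E2", "E4"), ("E1", "E4")},
    "short_reads": {("E1", "E2"), ("E2", "E4"), ("E1", "E3"), ("E3", "E4"), ("E1", "E4")},
    "long_reads": {("E1", "E3"), ("E3", "E4")},
}
paths = enumerate_paths(module_edges, "E1", "E4")
for path in paths:
    print(path, path_support(path, evidence))
```

In this toy module the path through E3 is absent from the annotation but is covered by both read types, which is the kind of case where combining evidence sources is informative.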

C-029: scMAPA: Identification of cell-type–specific alternative polyadenylation in complex tissues
COSI: iRNA
  • Yulong Bai, University of Pittsburgh, United States
  • Yidi Qin, University of Pittsburgh, United States
  • Zhenjiang Fan, University of Pittsburgh, United States
  • Robert Morrison, University of Pittsburgh, United States
  • Kyongnyon Nam, University of Pittsburgh, United States
  • Hassane Zarour, University of Pittsburgh, United States
  • Radosveta Koldamova, University of Pittsburgh, United States
  • Quasar Padiath, University of Pittsburgh, United States
  • Soyeon Kim, University of Pittsburgh, United States
  • Hyun Jung Park, University of Pittsburgh, United States


Presentation Overview:

Alternative polyadenylation (APA) causes shortening or lengthening of the 3’-untranslated region (3’-UTR) of genes (APA genes) in diverse cellular processes. Current bioinformatic methods for identifying cell-type-specific APA genes in scRNA-Seq data have several limitations. First, they assume certain read coverage shapes, which can be violated in multiple APA genes and in various technologies. Second, their identification is limited to comparisons between two cell types and is not directly applicable when multiple cell types exist. Third, they do not control for undesired sources of variance, which potentially introduces noise into the cell-type-specific identification of APA genes.

We developed a bioinformatics tool, single-cell Multi-group identification of APA (scMAPA). To avoid assumptions about the read coverage shape, scMAPA formulates a change-point problem after transforming the 3’-biased scRNA-Seq data to represent the full-length 3’-UTR signal. To identify cell-type-specific APA genes while adjusting for undesired sources of variation, scMAPA models the abundance of APA isoforms by multinomial logistic regression. In our novel APA simulation pipeline and human PBMC data, scMAPA outperforms existing methods in sensitivity, robustness, and stability. In mouse brain data consisting of multiple cell types sampled from multiple regions, scMAPA identifies cell-type-specific APA genes, elucidating novel roles of APA for dividing immune cells and differentiated neurons.
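
For readers unfamiliar with change-point problems, here is a minimal generic sketch: it finds the single position that best splits a coverage vector into two segments, each well described by its own mean. This is a textbook least-squares formulation chosen for illustration; the abstract does not specify scMAPA's exact formulation, and the function name and toy coverage values are assumptions.

```python
def best_change_point(coverage):
    """Return the split index k (1 <= k < n) minimizing the total squared
    error of coverage[:k] and coverage[k:] around their own means."""
    n = len(coverage)
    if n < 2:
        raise ValueError("need at least two positions")
    best_k, best_cost = None, float("inf")
    for k in range(1, n):
        left, right = coverage[:k], coverage[k:]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        cost = (
            sum((x - mean_left) ** 2 for x in left)
            + sum((x - mean_right) ** 2 for x in right)
        )
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Toy 3'-UTR coverage: high upstream of a proximal poly(A) site, lower downstream
coverage = [50, 52, 49, 51, 20, 19, 21, 18]
split = best_change_point(coverage)
print(split)
```

The returned index marks where the coverage level drops, which in this toy setting would correspond to a candidate proximal poly(A) site.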

C-030: Temporally controlled expression of a splicing factor in single cells coordinates the metabolic and proliferative activities of regenerating livers
COSI: iRNA
  • Nick Baker, University of Illinois Urbana-Champaign, United States
  • Auinash Kalsotra, University of Illinois Urbana-Champaign, United States


Presentation Overview:

The liver maintains an extensive ability to regenerate following injury, typically via cellular proliferation of the main parenchymal cell type of the liver, the hepatocyte. The exact mechanics of regeneration, such as how quiescent hepatocytes transition into a proliferative state and how regenerating livers sustain normal metabolic activities while the tissue recovers from injury, remain largely unknown. Epithelial splicing regulatory protein 2 (ESRP2) is an RNA splicing factor that acts as the developmental switch for splicing targets included in the Hippo signaling pathway, which is critical for organ development and regeneration. ESRP2 promotes production of these adult splice variants to turn off hepatocytes’ proliferation during maturation of the liver. Previous studies, as well as our own data, have shown that ESRP2 is temporally downregulated in the liver during regeneration, and then re-expressed during termination. We hypothesized that ESRP2 is temporally modulated during liver regeneration in individual hepatocytes to promote a proliferative or metabolic state. Induced overexpression of ESRP2 inhibited proliferation of hepatocytes, while ESRP2 KO hepatocytes showed increased proliferative index during liver regeneration. We used single-cell RNA sequencing to determine altered cell states and gene expression changes in ESRP2 KO cells compared to WT mice during liver regeneration and termination.

C-031: Post-transcriptional silencing of the nuclear poly(A) binding protein expression is essential for postnatal cardiac maturation and function
COSI: iRNA
  • Sandip Chorghade, University of Illinois at Urbana-Champaign, United States
  • Joe Seimetz, University of Illinois at Urbana-Champaign, United States
  • Subhashis Natua, University of Illinois at Urbana-Champaign, United States
  • Bo Zhang, University of Illinois at Urbana-Champaign, United States
  • Chaitali Misra, University of Illinois at Urbana-Champaign, United States
  • Ullas Chambazhi, University of Illinois at Urbana-Champaign, United States
  • Qinyu Hao, University of Illinois at Urbana-Champaign, United States
  • Kannanganattu Prasanth, University of Illinois at Urbana-Champaign, United States
  • Xander Wehrens, Baylor College of Medicine, United States
  • Jiwang Chen, University of Illinois Chicago, United States
  • Auinash Kalsotra, University of Illinois at Urbana-Champaign, United States


Presentation Overview:

The nuclear poly(A) binding protein (PABPN1) is a ubiquitously expressed, evolutionarily conserved protein that regulates various important aspects of mRNA metabolism, including poly(A) tail elongation and alternative polyadenylation (ApA). Here we report that PABPN1 protein level is diminished post-transcriptionally during postnatal heart development in mammals and that this silencing is critical for achieving adult, cardiac-specific gene expression profiles. Combining single-molecule RNA imaging and nucleo-cytoplasmic fractionations, we demonstrate that PABPN1 silencing during development occurs through a dramatic reorganization in its subcellular mRNA localization. While the PABPN1 mRNA is predominantly cytoplasmic in neonatal cardiomyocytes, it becomes nuclear-retained, partially spliced, and translation-inaccessible in adult cardiomyocytes. Importantly, cardiomyocyte-specific conditional loss- and gain-of-function studies revealed that premature elimination or persistent expression of PABPN1 in the mouse myocardium results in major structural and functional abnormalities, leading to heart failure and death. Direct RNA sequencing on nanopore arrays and high-resolution Illumina sequencing revealed widespread changes in ApA and poly(A) tail lengths of cardiac transcripts during development, a significant portion of which is reliant on the postnatal silencing of PABPN1. Overall, these results indicate that developmental silencing of PABPN1 through its regulated mRNA splicing and localization is critical for postnatal cardiac maturation and function.

C-032: Comprehensive and scalable analysis of RNA splicing for examining splicing signatures
COSI: iRNA
  • Dennis Mulligan, University of California, Santa Cruz, United States
  • Mayra Gaspariano-Cholula, Benemérita Universidad Autónoma de Puebla, Mexico
  • Rebeca Martínez-Contreras, Benemérita Universidad Autónoma de Puebla, Mexico
  • Angela Brooks, University of California, Santa Cruz, United States


Presentation Overview:

We developed MESA (Mutually Exclusive Splicing Analysis) to quickly and comprehensively quantify alternative splicing from RNA-seq data. MESA calculates a single percent-spliced value for each splice interval, which can be used to analyze global splicing signatures or local events.
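
One plausible way to compute a percent-spliced value per splice interval is sketched below: each junction's read count is divided by the sum of its own count and the counts of all junctions whose intervals overlap it, on the assumption that overlapping junctions cannot be used in the same transcript. This is an illustrative reading of the idea, not MESA's documented implementation; the function name and toy counts are hypothetical.

```python
def percent_spliced(junction_counts):
    """Percent-spliced value for each junction.

    junction_counts: dict mapping (start, end) -> read count.
    Two junctions are treated as mutually exclusive when their
    intervals overlap (an assumption made for this sketch).
    """
    values = {}
    for (start, end), count in junction_counts.items():
        competing = sum(
            other_count
            for (other_start, other_end), other_count in junction_counts.items()
            if (other_start, other_end) != (start, end)
            and other_start < end
            and start < other_end
        )
        total = count + competing
        values[(start, end)] = 100.0 * count / total if total else float("nan")
    return values

# Toy exon-skipping event: two inclusion junctions and one skipping junction
counts = {(100, 200): 30, (300, 400): 30, (100, 400): 10}
ps = percent_spliced(counts)
print(ps)
```

Here each inclusion junction competes only with the skipping junction, while the skipping junction competes with both inclusion junctions.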

We tested MESA and 6 other splicing quantification tools on Sequins synthetic RNA, evaluating each tool’s ability to detect differences in splicing. MESA was the most sensitive tool, finding 92.6% of the expected differences.

We created a unique resource of splicing signatures from the intropolis dataset, a collection of ~50,000 samples from the Sequence Read Archive (SRA). This resource can be queried with an input signature to find conditions with similar splicing. We used the signature for the U2AF1-S34F mutation in The Cancer Genome Atlas (TCGA) lung adenocarcinoma samples, and found similarity to multiple prostate cancer samples, which have been reported to have rare U2AF1 mutations.
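
Querying a collection of signatures for similar splicing can be pictured as a nearest-neighbor search. The sketch below ranks stored signatures by cosine similarity to a query over their shared splice intervals; the choice of cosine similarity, the dictionary layout, and all names and values are assumptions for illustration rather than a description of the actual resource.

```python
from math import sqrt

def cosine_similarity(sig_a, sig_b):
    """Cosine similarity of two signatures over their shared intervals.

    Each signature maps a splice-interval ID to a change in percent spliced.
    """
    shared = sig_a.keys() & sig_b.keys()
    if not shared:
        return 0.0
    dot = sum(sig_a[k] * sig_b[k] for k in shared)
    norm_a = sqrt(sum(sig_a[k] ** 2 for k in shared))
    norm_b = sqrt(sum(sig_b[k] ** 2 for k in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, database):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = {name: cosine_similarity(query, sig) for name, sig in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical query signature and two stored signatures
query = {"i1": 20.0, "i2": -10.0, "i3": 5.0}
database = {
    "sample_A": {"i1": 18.0, "i2": -12.0, "i3": 4.0},
    "sample_B": {"i1": -15.0, "i2": 8.0, "i3": 0.0},
}
ranked = rank_by_similarity(query, database)
print(ranked)
```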

We generated a splicing profile for TCGA breast tumors and compared it to 56 cell line samples from the Cancer Cell Line Encyclopedia (CCLE). We found certain cell lines to be more representative of tumor-specific splicing patterns and identified tumor-specific alternative splicing events that are candidates for further experimental investigation.

We believe MESA’s high scalability makes it a valuable tool for future RNA splicing analyses, including cross-study comparisons from large datasets.

C-033: Alternative Splicing Based Classification of Heterogeneous Cancers Reveals Novel Disease Subtypes
COSI: iRNA
  • Yoseph Barash, University of Pennsylvania, United States
  • David Wang, University of Pennsylvania, United States
  • Mathieu Quesnel-Vallieres, University of Pennsylvania, United States
  • Paul Jewell, University of Pennsylvania, United States
  • Kristen Lynch, University of Pennsylvania, United States
  • Andrei Thomas-Tikhonenko, University of Pennsylvania, United States


Presentation Overview:

Identification of cancer subtypes characterized by actionable genetic lesions is a pivotal step for developing treatment and improving clinical care. However, in heterogeneous diseases such as Acute Myeloid Leukemia (AML), subtype discovery can be challenging since mutation burden, which has traditionally been prioritized for this task, is low. Recent studies pointing to splicing aberrations in AML motivate splicing-based detection of cancer subtypes. We thus developed CHESSBOARD, an unsupervised machine learning algorithm, to identify “tiles” defined by a subset of splicing events and patient samples that represent disease subtypes. We applied our method to the beatAML dataset and found tiles that are enriched for splicing events affected by upstream regulatory factors, including SRSF1 and HNRNPC, that have strong evidence of binding and differential behavior. We also show that the tiles correlate with therapeutic response to Sorafenib and with mutations in FLT3, NPM1 and CEBPA that define known subtypes. Finally, to further explore CHESSBOARD’s utility, we analyze the TARGET B-ALL data, discovering novel splicing subtypes characterized by patients with low relapse rates harboring a RUNX1-ETV6 fusion. Our results reveal novel mechanisms that drive tile formation and further confirm the translational importance of splicing aberrations in AML and related diseases.