Presentation Overview: Show 
While long-read sequencing offers highly informative data for gene isoform identification and quantification, there is no rigorous theoretical foundation for modeling and quantifying the new and unique information that long reads provide. Here I will present the mathematical basics of long-read information and demonstrate how long reads can improve gene isoform identification and quantification, i.e., I will examine in a rigorous manner the intuitive idea that unambiguous alignment of long reads is helpful.
Presentation Overview: Show 
Cells respond to endogenous and exogenous stressors by activating stress response pathways that restore cellular homeostasis. The ability of cells to respond to stress reduces with age and has been associated with diseases such as neurodegeneration. Nevertheless, the molecular mechanisms underlying stress response and RNA regulation are poorly characterized. Here we employ end-to-end, direct RNA sequencing with ligated adaptors to capture the complete transcriptome in human cells upon stress. We developed NanopLen, a software package that identifies RNA length differences across conditions using linear mixed models. We find that upon stress, RNAs are subject to 5’ to 3’ end shortening. We show that RNA shortening is coupled to translation and ribosome occupancy and can be partially rescued by restoring translation initiation or preventing ribosome run-off. Our results show that RNA shortening is independent of the poly(A) tail length but dependent on XRN1, the primary 5’ to 3’ exonuclease. Finally, we show that upon stress, shortened RNAs are enriched in the stress granules and the splicing program shifts towards shorter gene isoforms. Taken together, this work provides a computational tool for differential RNA length analysis and reveals the dynamics of response mechanisms regulating the RNA state during oxidative stress.
Presentation Overview: Show 
Alternative splicing allows a gene to encode multiple isoforms, which can also differ at the 5’ and 3’ ends, and is an important method of gene regulation in higher eukaryotes. Long-read RNA-seq (lrRNA-seq) allows for sequencing of entire transcripts which provides the putative 5’ and 3’ ends and the internal structure of each transcript. This allows us to explore the diversity of isoform usage across different samples.
As part of the final phase of the ENCODE Consortium, we have sequenced 216 lrRNA-seq libraries for 60 unique human and mouse samples using the PacBio platform. We detect a large portion of annotated protein-coding human genes and transcripts. We also identify unannotated transcripts, for which we predict and characterize the coding potential. We assess the post-transcriptional regulatory impact of each isoform by predicting which are subject to NMD or include microRNA binding sites. We use this collection of known and novel transcripts to define a reference set of 5’/3’ ends and intron chains that are used for each gene. We characterize each gene based on how much variation is seen in end usage and internal splicing and investigate genes whose transcriptional preferences differ between different tissues, cell lines, or species.
Presentation Overview: Show 
Alternative splicing (AS) plays a major role in the differentiation of immune cells during an immune response, as 29% of AS genes are specific to the immune system. Although the role of AS has been extensively investigated in T cells, its role in B cell activation is less well characterised. We sought to develop a long-read Oxford Nanopore Technologies (ONT) workflow to understand post-transcriptional regulation at both the gene and isoform levels in human germinal centre (GC) B cells. As one of the challenges of ONT is the accurate computational analysis of isoforms, we developed the ‘Nexons’ pipeline to identify differentially spliced transcript variants using long-read sequencing. An in-depth analysis with Nexons revealed differential regulation of the poison exon (PE) in splicing regulators (e.g. SRSF3) in GC B cells. In GC B cells, PEs of the splicing factors were preferentially spliced out, whereas naïve B cells expressed isoforms carrying the PE, leading to nonsense-mediated mRNA decay. Moreover, we identified novel spliced variants of these genes, which were undetectable due to the limitations of short-read data. Altogether, our findings validate the combination of Nexons with a Smart-seq2-adapted ONT RNA-sequencing workflow as a suitable method for the identification and quantification of complex isoforms.
Presentation Overview: Show 
Nanopore technology makes it possible to sequence RNA with single-molecule sensitivity. Standard library preparation protocols for nanopore direct RNA sequencing (DRS) enrich for transcripts that are polyadenylated to facilitate ligation of the sequencing adapter. The resultant libraries therefore typically contain a medley of RNA species with 3’ polyA tails. Specialised biochemical enrichment protocols are currently required if only a specific RNA species is of interest. Here we describe the first method for Real-time In-Silico Enrichment of RNA species (RISER) during DRS. RISER accurately classifies protein-coding from non-coding species directly from only 4 seconds of raw DRS signal, with an independent test accuracy of 88% and an AUROC of 0.96. RISER has also been integrated with Oxford Nanopore’s ReadUntil API to enact targeted real-time RNA sequencing. The potential applications of this novel technology are numerous. For the first time, it allows the enrichment of mRNAs by cleansing the sequencing data of unwanted non-coding RNA species; the enrichment of non-coding RNA species that are lowly expressed and difficult to detect in an unbiased sequencing experiment; or the tagging of reads with the RNA species to which they belong in real time during sequencing. Additionally, the real-time nature of RISER makes it useful for time-sensitive applications.
Presentation Overview: Show 
Long-read transcriptome sequencing (LRTS) enables sequencing of individual mRNA molecules at full length, facilitating the discovery, characterization, and quantification of novel genes, isoforms, and alternative splicing events. However, realizing the full potential of LRTS requires computational tools that explore the data at all scales, ranging from single-nucleotide information through the isoform and gene level to transcriptome-wide statistics.
With this in mind, we developed IsoTools, a comprehensive Python package for the analysis of LRTS data, which implements data structures integrating all relevant information from LRTS transcripts and reference annotation, together with broad analysis functionality to explore, analyze, and interpret the data. In particular, we implemented a graph-based method for the identification of alternative splicing events (ASEs) and a statistical approach to detect differential events. This approach adds a valuable perspective on alternative splicing, especially for genes with complex splicing structures that cover several independent ASEs.
To demonstrate our methods, we generated PacBio Iso-Seq data of human hepatocytes treated with the HDAC inhibitor VPA, a compound known to induce widespread transcriptional changes. Contrasted with short-read RNA-seq, this analysis provides additional insights for a better understanding of alternative splicing, in particular with respect to complex novel and differential splicing events.
Presentation Overview: Show 
Transcriptomics is moving towards sequencing more and more samples and conditions, but capturing novel transcripts while reducing the impact of non-expressed genes remains an ever-present challenge. To address this problem, we developed Bambu, an R software package that uses long-read RNA-seq data for both transcript discovery and quantification, enabling context-specific transcriptomics in a seamless way. Because Bambu uses long reads, it simplifies the discovery of, and read assignment to, new complex transcripts with multiple exons, which was not possible with standard short-read sequencing. Bambu uses two modules: (1) for transcript discovery, Bambu trains a model on the preexisting known transcripts to reduce the impact of the sample-to-sample variation that affects commonly used static thresholds; (2) for transcript quantification, an expectation-maximization algorithm estimates full-length and partial-length read support per transcript. With these two features, we show not only that Bambu greatly improves transcript discovery but also that using these context-specific transcripts improves the accuracy of transcript quantification as a whole.
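The expectation-maximization idea behind the quantification module can be illustrated with a generic sketch. This is not Bambu's implementation (Bambu is an R package and additionally distinguishes full-length from partial-length read support); the function name `em_quantify` and the read-compatibility input format are assumptions made for illustration only.

```python
def em_quantify(read_compat, n_transcripts, n_iter=200, tol=1e-9):
    """Generic EM for transcript abundance (illustrative, not Bambu's code).

    read_compat: one list per read, holding the indices of the transcripts
                 that the read is compatible with (hypothetical input format).
    Returns (theta, counts): relative abundances and expected read counts.
    """
    if n_transcripts < 1 or not any(read_compat):
        raise ValueError("need at least one transcript and one assignable read")
    theta = [1.0 / n_transcripts] * n_transcripts  # uniform starting point
    counts = [0.0] * n_transcripts
    for _ in range(n_iter):
        # E-step: split each read across its compatible transcripts
        # in proportion to the current abundance estimates.
        counts = [0.0] * n_transcripts
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            if total == 0.0:
                continue  # read cannot be assigned under the current estimate
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: abundances are the normalised expected counts.
        assigned = sum(counts)
        new_theta = [c / assigned for c in counts]
        delta = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if delta < tol:
            break
    return theta, counts


# Toy data: 6 reads unique to transcript 0, 2 unique to transcript 1,
# and 4 reads compatible with both.
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
theta, counts = em_quantify(reads, n_transcripts=2)
```

In this toy case the ambiguous reads are shared out according to the evidence from the uniquely assigned reads, which is the behaviour that makes EM attractive when many reads are compatible with more than one transcript.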
Presentation Overview: Show 
Alternative splicing is a key regulatory process that allows multiple transcripts to be produced from a single gene. Splicing has been primarily studied on a high-throughput scale via RNA sequencing (RNA-Seq). However, most of the reads in standard RNA-Seq are not the junction-spanning reads required for detecting splicing changes and splice isoforms. To address this, we developed a cost-effective, targeted RNA-Seq method to quantify splicing variations across human tissues. Building on the previous MPE-Seq method (Xu et al 2019) for detecting splicing in yeast, our LSV-Seq method uses thousands of reverse transcription primers anchored near 3’ splice sites. We created a new pipeline to identify targetable regions from previous RNA-Seq data using the MAJIQ algorithm and predict high yield primers. The library preparation protocol uses highly specific reverse transcription conditions to prevent off-target amplification. LSV-Seq achieves an overall median enrichment of ~500-fold compared to standard RNA-Seq and a median enrichment of ~800-fold for lowly expressed genes. Furthermore, LSV-Seq quantifications correlate well with RNA-Seq and detect numerous de novo junctions not found with RNA-Seq. We envision that LSV-Seq will be used to quantify splicing in large patient cohorts, detect splicing variation in lowly expressed genes, and detect transient splicing intermediates.
Presentation Overview: Show 
Complex systems such as the brain leverage alternative splicing to expand the proteome. To enable splicing studies in distinct cell types, we developed methods for the identification of full-length isoforms from single cells. Using our robust algorithmic framework, we demonstrated that regional identity can sometimes override cell-type specificity, and our efforts to spatially resolve isoform sequencing revealed that some developmentally regulated genes display regional splicing gradients throughout the mouse brain. Our ongoing work applies these methods to more developmental timepoints and brain regions where we have identified temporally and regionally mediated patterns of splicing. 
However, the conservation of cell-type specific isoform expression in human brain is underexplored. Not only did we develop single-nuclei isoform sequencing (SnISOr-Seq) to allow sequencing of frozen tissue, but we applied this technique to human frontal cortex to uncover coordination patterns, and the disparity of exon expression variability across different neurological conditions. Lastly, to allow the extrapolation of model organism results to human, our current efforts identify conserved and divergent cell-type specific splicing patterns in the human and mouse hippocampus, and evaluate the extent of inter-individual variability in splicing. Taken together, this provides a comprehensive view of spatio-temporal splicing patterns in the mammalian brain.
Presentation Overview: Show 
Alternative splicing (AS) is a regulatory process that generates different isoforms from a single gene, and studies have shown that abnormal splicing events are linked to the progression of cancer. While differential alternative splicing has mainly been analyzed in bulk RNA-seq data, more robust methods are needed for analyzing splicing variants in single-cell data. In this study, we present the use of machine learning models to detect differentially expressed exons and subsequently classify cells based on splicing profile, revealing the pathological mechanisms behind cancer-associated alternative splicing in murine melanoma. The pipeline takes in files from scRNA-seq experiments and performs sequence alignment, AS event quantification, clustering, and filtering. The exons are then inputted into a hierarchical machine learning model consisting of a random forest feature detector and multi-layer perceptron, which identifies significant exons and classifies cancerous cells with high accuracy. The model was trained and validated with sets of scRNA-seq data derived from melanoma tissue, wherein pathway analysis and protein network analysis of the output verified established AS events in melanoma and pinpointed receptors and molecular pathways that were differentially spliced. Specifically, pathways regulating the spliceosome were highly enriched and the exons can be further explored as targets of melanoma treatment.
Presentation Overview: Show 
Motivation: Single cell RNA sequencing (scRNA-seq) data makes studying the development of cells possible at unparalleled resolution. Given that many cellular differentiation processes are hierarchical, their scRNA-seq data is expected to be approximately tree-shaped in gene expression space. Inference and representation of this tree-structure in two dimensions is highly desirable for biological interpretation and exploratory analysis.
Results: Our two contributions are an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data, and a visualization method respecting the tree-structure. We extract the tree structure by means of a density based minimum spanning tree on a vector quantization of the data and show that it captures biological information well. We then introduce DTAE, a tree-biased autoencoder that emphasizes the tree structure of the data in low dimensional space. We compare to other dimension reduction methods and demonstrate the success of our method both qualitatively and quantitatively on real and toy data.
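To make the tree-extraction step concrete, here is a minimal sketch of a minimum spanning tree computed over cluster centroids, as might be obtained from a vector quantization such as k-means. It uses plain Euclidean distances and Prim's algorithm; the density-based edge weighting described in the abstract is not reproduced, and the function name and input format are assumptions.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on a complete Euclidean graph.

    points: list of equal-length coordinate tuples (e.g. cluster centroids).
    Returns a list of edges (parent_index, child_index, distance).
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection to the tree
    best_parent = [-1] * n       # tree node providing that connection
    best_dist[0] = 0.0           # grow the tree from point 0
    edges = []
    for _ in range(n):
        # Pick the closest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u, best_dist[u]))
        # Update the cheapest connection for the remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


# Four hypothetical centroids forming a small branching structure.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
tree_edges = minimum_spanning_tree(centroids)
```

Replacing `math.dist` with a density-aware weight would bring the sketch closer to the approach the abstract describes; the exact weighting is not specified there, so it is left out here.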
Presentation Overview: Show 
Long non-coding RNAs (lncRNAs) comprise non-coding genes with diverse mechanisms and regulatory roles. LncRNAs show highly tissue- and cell-type-specific expression patterns, and their functions are uncharacterized. Here, we assembled the non-coding transcriptome from >2,000 mouse liver RNA-seq datasets and discovered ~48,000 liver-expressed lncRNAs. These lncRNAs were used to analyze the single-cell transcriptome of two liver disease models: diet-induced NASH and chemical-induced liver fibrosis. We applied trajectory inference algorithms to elucidate lncRNA zonation and discovered genes whose zonation was dysregulated during NASH pathogenesis. Changes in lncRNA expression emerged as a characteristic of macrophage expansion and the differentiation to NASH-associated macrophages, which is strongly linked to disease progression. Hundreds of lncRNAs were expressed in myofibroblasts, a key source of the fibrous scar in fibrotic liver. Regulatory network analysis with bigSCale2 was used to associate lncRNAs with biological functions and to predict key regulatory lncRNAs in NASH and liver fibrosis. Lnc10922 (Meg3) and lnc47443 (Fendrr) emerged as central regulators of Wnt signaling and immunity during liver fibrosis. Finally, we used triplex domain finder to identify regulatory gene targets for lncRNAs. Thus, we have characterized thousands of lncRNAs based on their cell-type specificity and spatial location, and used network analysis to predict their roles in liver disease.
Presentation Overview: Show 
In the mammalian auditory system, frequency discrimination depends on morphological and physiological properties of the organ of Corti that gradually change along the tonotopic axis of the organ and therefore shape the tuning properties of hearing. At the molecular level, those frequency-specific characteristics are mirrored in gene expression gradients, which require tonotopic patterning of the cochlea. However, molecular mechanisms that specify tonotopic identity remain poorly understood. To infer molecular mechanisms that pattern the organ of Corti along the frequency axis, we reconstructed the embryonic cochlear duct in 3D-space from scRNA-seq data and proposed two hypotheses regarding spatial patterning. Analyzing two developmental time points suggested that morphogens, rather than a timing-related mechanism, confer spatial identity in the cochlea. Subsequently, retinoic acid (RA) signaling was identified as a morphogen with a tonotopic gradient in the cochlear floor. Utilizing cochlear explants, functionality of the RA cascade was confirmed and an inverse relation with sonic hedgehog (SHH) signaling was predicted. Cell culture experiments indicated that SHH is involved in shaping the RA gradient via transcriptional regulation of Cyp26b1, which is a RA degrading enzyme. In summary, the findings suggest that RA and SHH form opposing morphogen gradients patterning tonotopic identity of the developing cochlea.
Presentation Overview: Show 
Motivation: The computational prediction of regulatory function associated with a genomic sequence is of great importance in -omics studies, as it facilitates our understanding of the mechanisms underpinning the vast gene regulatory network. Prominent examples in this area include predicting transcription factor binding in DNA regulatory regions and predicting RNA-protein interactions in the context of post-transcriptional gene expression. However, existing computational methods have suffered from high false positive rates and have seldom used any evolutionary information, despite the vast amount of orthologous data available across multitudes of extant and ancestral genomes, which presents a ready opportunity to improve their accuracy.
Results: In this study, we present a novel probabilistic approach called PhyloPGM that leverages previously trained TFBS or RNA-RBP binding predictors by aggregating their predictions from various orthologous regions, in order to boost the overall prediction accuracy on human sequences. Throughout our experiments, PhyloPGM has shown significant improvement over baselines such as the sequence-based RNA-RBP binding predictor RNATracker and the sequence-based TFBS predictor FactorNet. PhyloPGM is simple in principle and easy to implement, yet yields impressive results.
Presentation Overview: Show 
Alternative splicing shapes the transcriptome and contributes to each cell’s unique identity, but single-cell RNA sequencing has struggled to capture the impact of alternative splicing. We previously showed that low recovery of mRNAs from single cells led to erroneous conclusions about the cell-to-cell variability of alternative splicing (Buen Abad Najar et al. 2020). We have now developed a method, Psix, to confidently identify splicing that changes across a landscape of single cells, using a probabilistic model that is robust against the data limitations of scRNA-seq. Its autocorrelation-inspired approach finds patterns of alternative splicing that correspond to patterns of cell identity, such as cell type or developmental stage, without the need for explicit cell clustering, labeling, or trajectory inference. Applying Psix to data that follow the trajectory of mouse brain development, we identify exons whose alternative splicing patterns cluster into modules of co-regulation. We show that the exons in these modules are enriched for binding by distinct neuronal splicing factors, and that their changes in splicing correspond to changes in expression of these splicing factors. Thus, Psix reveals cell-type-dependent splicing patterns and the wiring of the splicing regulatory networks that control them. Our new method will enable scRNA-seq analysis to go beyond transcription to understand the roles of post-transcriptional regulation in determining cell identity.
Presentation Overview: Show 
TCP-seq provides the footprints of the small subunit (SSU) of the ribosome (with additional factors) across the entire transcriptome of the analyzed organism. Based on TCP-seq data, we developed, for the first time, a predictive model of SSU density and analyzed the effect of transcript features on the dynamics of the SSU scan in the 5’UTR. Our model is based on novel tools for detecting complex statistical relations tailored to TCP-seq. We quantitatively estimated the effect of several important features, including the context of the upstream AUG, the upstream ORF length, and the mRNA folding strength. Specifically, we suggest that around 50% of the variance in the read count distribution near a start codon can be attributed to the AUG context score.
We provide the first large-scale direct quantitative evidence that the AUG context indeed affects small subunit movement. We suggest that strong folding may cause the detachment of the SSU from the mRNA. We also identified a number of novel sequence motifs that can affect the SSU scan; some of these motifs affect transcription factors and RNA binding proteins. The results presented in this study are a fundamental step toward a comprehensive modeling of initiation.
Presentation Overview: Show 
Small nucleolar RNAs (snoRNAs) are conserved non-coding RNAs that guide modifications on their target RNA, thereby regulating processes such as ribosome biogenesis. Most human snoRNAs are embedded in introns of host genes. It was shown for a few snoRNAs that their relative distance to the branch point, and the formation of a terminal stem are key regulators of their production. However, we recently showed that less than 30% of all annotated snoRNAs are expressed in human tissues, with several snoRNAs being expressed even in non-optimal genomic context. To identify which factors modulate the abundance of human snoRNAs, we trained 3 classifiers to predict snoRNA abundance status (i.e. expressed or not expressed in our human tissue expression datasets) based on the two aforementioned snoRNA characteristics and new potential abundance determinants. Our models are highly performant (AUC>0.88) and suggest that 4 features modulate snoRNA abundance status: motif conservation, global and terminal stem stability, and host gene expression level. When applied to mouse, they predict with great accuracy that >70% of all annotated snoRNAs are never expressed. Altogether, we developed a workflow which highlights 4 key factors regulating snoRNA abundance, and that can be applied to other species to refine existing gene annotations.
Presentation Overview: Show 
Much attention has been paid to alternative splicing and aberrant gene isoforms in protein-coding regions, especially in cancer. However, less is known about splicing in untranslated regions (UTRs). Here we present the first systematic pan-cancer study of introns within the UTR. The data presented here show that cryptic introns (called exitrons) in the 3'UTR are widespread in cancer. We analyze and call novel exitrons in over 9,000 tumor samples from The Cancer Genome Atlas (TCGA) pan-cancer dataset and over 9,000 normal samples from the Genotype-Tissue Expression (GTEx) database. We calculate the nonsense-mediated decay (NMD) efficiency of 3'UTR exitrons, controlling for tumor expression heterogeneity and molecular subtype, and show that for many exitrons, including one in the AR gene, percent spliced is positively correlated with NMD efficiency. However, there are some genes where the inverse relation holds. We focus on one such gene, IGF2, and show that IGF2 gene expression in spliced samples can be explained by RNA binding proteins associated with the 3'UTR. We demonstrate that exitrons, while potentially triggering NMD, also transform the regulatory landscape of the 3'UTR, resulting in yet another avenue for the cancer cell to fine-tune gene expression and pursue advantageous cellular programs.
Presentation Overview: Show 
3’ untranslated regions (3’UTRs) play a critical role in controlling gene expression by altering mRNA stability, translation, and localization. Diversity in 3’UTR isoforms is generated through the use of alternative polyadenylation (APA) sites and the splicing of alternative last exons (ALEs). Although the breadth of alternative 3’UTR isoforms in different biological contexts is known, the mechanisms driving their regulation remain poorly understood. To identify novel regulators of 3’UTR diversity, we performed an integrative analysis with diverse algorithms to detect multiple patterns of APA and ALEs in RNA-seq data from ENCODE. We applied this approach to a large set of over 350 RBP depletion experiments to detect significant shifts in 3’UTR isoforms. Integrating binding (eCLIP) and motif data with these functional targets identified several novel regulators. We show that subsets of these regulated events influence mRNA stability and localization. Finally, co-expression analysis shows altered expression of specific regulators associated with shifts in specific cancers and across tissues in GTEx. We experimentally validated several targets of one of these novel regulators, the largely unstudied RNA helicase DDX55, which showed altered expression in ovarian cancer and acute myeloid leukemia.
Presentation Overview: Show 
One of the most widely used genome editing tools is the CRISPR/Cas9 system, which makes use of a guide RNA (gRNA) containing a stretch of 20 nucleotides that matches the genomic region at which editing is carried out. Hence, editing requires selecting a specific gRNA out of several possible ones in a given target region, so that efficiency is maximized at the intended target (the on-target) while off-target effects are minimized. Here, we focus on on-target design and present the deep learning-based method CRISPRon, trained on ~24,000 gRNAs: ~13,000 publicly available ones and ~11,000 additional in-house ones. The data were carefully split into sets with minimal sequence overlap for cross-validation, leaving out one set as a completely independent test set. Comparing with other methods on individual independent test sets that have minimal sequence overlap with the gRNAs used to train the respective methods, we find that CRISPRon outperforms them. Methods like CRISPRon trained on indel-based data substantially outperform models built from loss-of-function data, which are still widely used. CRISPRon is available as a stand-alone tool or as a web server where on-target predictions are combined with off-target assessment by our CRISPR/Cas9 energy-based binding model CRISPRoff.
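The choice among "several possible" gRNAs can be illustrated by enumerating candidates in a target region. The sketch below assumes the commonly used SpCas9 enzyme, whose 20-nt target must be followed by an NGG PAM; the abstract does not state the PAM, and the function name, the forward-strand-only scan, and the toy sequence are illustrative assumptions, not part of CRISPRon.

```python
def find_grna_candidates(seq, spacer_len=20):
    """List candidate gRNA target sites on the forward strand.

    Assumes an SpCas9-style NGG PAM immediately 3' of the target
    (an assumption; the reverse strand is not scanned in this sketch).
    Returns tuples of (start_index, spacer_sequence, pam_sequence).
    """
    seq = seq.upper()
    hits = []
    # The last valid start leaves room for the spacer plus a 3-nt PAM.
    for start in range(len(seq) - spacer_len - 2):
        pam = seq[start + spacer_len:start + spacer_len + 3]
        if pam[1:] == "GG":  # N can be any base, so only check the GG
            hits.append((start, seq[start:start + spacer_len], pam))
    return hits


# Toy target region (hypothetical sequence) containing a single NGG site.
region = "A" * 20 + "TGG" + "C" * 5
candidates = find_grna_candidates(region)
```

Each candidate returned by such a scan would then be scored by an on-target efficiency predictor and checked for off-target risk before a gRNA is chosen.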
Presentation Overview: Show 
Motivation: In the last few years, computational methods have been used to predict the editing efficiency of CRISPR/Cas9 gene editing for any guide RNA (gRNA) of interest. High-throughput datasets were collected to train machine-learning models for this task, but they have a low correlation with functional or endogenous editing. 
Results: To better utilize high-throughput datasets of CRISPR/Cas9 editing for functional and endogenous editing predictions, we developed DeepCRISTL, a deep learning model to predict the on-target efficiency given a gRNA sequence. The DeepCRISTL model is based on pre-training on more than 150,000 gRNAs over three enzymes, and improving prediction performance by multi-task and ensemble techniques. Our new model achieves state-of-the-art results over all three enzymes: up to 0.89 in Spearman correlation between predicted and measured on-target efficiencies. To fine-tune the model for functional or endogenous prediction tasks, we tested several transfer-learning (TL) approaches, with gradual-learning being the overall best performer. Our final model DeepCRISTL has been evaluated and compared versus popular extant methods, and achieved state-of-the-art results over all datasets. 
Availability: DeepCRISTL is publicly available at github.com/OrensteinLab/DeepCRISTL/.
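The Spearman correlation used above as the evaluation metric is the Pearson correlation of the rank-transformed values. The following self-contained sketch shows the computation with average ranks for ties; it is independent of the DeepCRISTL code base, and the function names are chosen here for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Hypothetical predicted vs. measured efficiencies for five gRNAs.
predicted = [1, 2, 3, 4, 5]
measured = [2, 1, 4, 3, 5]
rho = spearman(predicted, measured)
```

Because only ranks matter, the metric rewards a model for ordering gRNAs correctly even when its predicted efficiencies are on a different scale from the measurements.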
Presentation Overview: Show 
While designing potent nonmodified small interfering RNAs (siRNAs) is trivial, no methods exist to accurately predict the activity of therapeutic, chemically modified siRNAs. Building predictive models for modified siRNA design is challenging because publicly available modified siRNA datasets are small, which limits the application of potentially powerful machine learning methods. We present a framework for building supervised classification models for siRNA efficacy prediction using a small dataset (356 siRNA sequences with corresponding efficacies). A trichotomous partitioning approach overcame data noise and enabled exploration of several classification threshold combinations. In assessing these thresholds, we present a novel evaluation metric that enables model performance comparison despite large class imbalances. We identified a threshold pair yielding a random forest model that outperformed models developed from the same data using previously published linear methods. We employed a novel method for extracting features by proxy from the random forest model that can be applied to any classification model type, enabling simple feature preference comparisons. Extracted sequence preferences were consistent with the current understanding of the siRNA-mediated silencing mechanism, demonstrating the utility of feature extraction for exploring biological mechanisms. The presented framework applies to any classification problem where datasets are limited, enabling exploration of a great range of biological questions.
Presentation Overview: Show 
As a highly successful pathogen, M. tuberculosis is able to infect, survive, and proliferate within the harsh microenvironments created by the human host with the help of regulated mRNA degradation. Previous studies have shown that mRNA degradation varies not only among genes but also between conditions. Here we developed a computational pipeline using RNA-seq and machine learning to identify the features that determine mRNA degradation in mycobacteria. First, we performed RNA-seq to quantify mRNA degradation profiles transcriptome-wide in the non-pathogenic model M. smegmatis under normal and stress conditions. Next, we clustered mRNAs according to their degradation time. We then trained a random forest classifier to explore the mRNA features associated with different degradation times. Our results show that, instead of one dominant feature, various types of features, including nucleotide and codon content, secondary structure, ribosome occupancy, and other sequence features, all contribute to differentiating degradation times. Our results also demonstrate that the determinants of degradation differ between leadered and leaderless mRNAs and between mRNAs under normal and stress conditions. Together, these results suggest that mRNA degradation in mycobacteria is subject to complex regulatory mechanisms.
Presentation Overview: Show 
The life of an mRNA is a dynamic enterprise. Eukaryotic mRNAs travel from the nucleus, where they are synthesized, to the cytoplasm, where they are translated. To quantitatively estimate the rates at which mRNAs flow across the cell, we developed subcellular TimeLapse-seq, a method that combines RNA metabolic labeling and biochemical cell fractionation. Next, we developed a system of ordinary differential equations in a Bayesian framework to mathematically model the observed fraction of newly-synthesized RNA across subcellular compartments to estimate the rate posterior distributions and paths of RNA flow genome-wide in human and mouse cells. For each gene, we measured the rate at which RNAs are released from chromatin into the nucleoplasm, exported from the nucleus into the cytoplasm, and loaded onto polysomes. We observed substantial variability between flow rates of different transcripts across all compartments. Transcripts from genes with related functions, such as ribosomal protein and histone genes, flow across compartments with similar kinetics. Finally, using machine learning we identified several molecular features that explain the rates of RNA flow. Overall, our study comprehensively characterizes the spatiotemporal life cycle of mammalian mRNAs, revealing the many lives of RNA transcripts and the molecular features underlying their fates.
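As a rough illustration of the kind of ODE system involved, consider a simplified linear chain (chromatin, then nucleoplasm, then cytoplasm) at steady state, in which all RNA leaving one compartment enters the next. Under that assumption the fraction of newly synthesized RNA in compartment i obeys df_i/dt = k_i (f_{i-1} - f_i), with f_0 = 1. The sketch below only simulates this forward model with a Runge-Kutta integrator; it is not the authors' Bayesian inference, and the rate values and names are hypothetical.

```python
import math


def simulate_fraction_new(rates, t_end, dt=0.01):
    """Fraction of labeled (new) RNA per compartment after t_end.

    rates: exit rate k_i of each compartment in a linear chain
           (simplifying assumption: no branching between compartments).
    Integrates df_i/dt = k_i * (f_{i-1} - f_i), f_0 = 1, with classic RK4.
    """
    def deriv(f):
        upstream = 1.0  # everything entering the first compartment is new
        out = []
        for k, fi in zip(rates, f):
            out.append(k * (upstream - fi))
            upstream = fi
        return out

    f = [0.0] * len(rates)  # no labeled RNA at the start of the pulse
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(f)
        k2 = deriv([a + 0.5 * dt * b for a, b in zip(f, k1)])
        k3 = deriv([a + 0.5 * dt * b for a, b in zip(f, k2)])
        k4 = deriv([a + dt * b for a, b in zip(f, k3)])
        f = [a + dt / 6.0 * (b + 2 * c + 2 * d + e)
             for a, b, c, d, e in zip(f, k1, k2, k3, k4)]
    return f


# Hypothetical exit rates (per hour) for chromatin, nucleoplasm, cytoplasm.
fractions = simulate_fraction_new([0.5, 0.2, 0.1], t_end=2.0)
# For the first compartment the closed form is 1 - exp(-k_1 * t).
closed_form_first = 1.0 - math.exp(-0.5 * 2.0)
```

Fitting such a model to observed fractions of new RNA at several labeling times is what would yield per-gene rate estimates; a Bayesian treatment, as described in the abstract, would additionally return posterior distributions over the rates.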
Presentation Overview: Show 
RNA-protein complexes (RNPs) are critical in biological regulatory networks. Identifying and characterizing these networks can be difficult, especially in cells. Chemical probing has been a useful tool for footprinting protein interactions on RNA, but many strategies rely on the indirect protection of RNA from reactivity with structure-sensitive probes. We have been applying a new direct strategy called RNP interaction network mapping by mutational profiling (RNP-MaP) to comprehensively identify protein-binding sites and to characterize protein interaction networks on noncoding RNAs ranging in length from 100 to 20,000 nucleotides inside live cells. 
In RNP-MaP, a chemical crosslinker marks RNA at sites of protein binding, and specialized reverse transcription and sequencing locates these marks with single nucleotide resolution. Importantly, multiple protein binding sites can be mapped on single molecules of RNA, defining coordinated protein interaction networks on RNA targets. More recently, new advances in probing, crosslink enrichment, and analysis now offer enhanced signal, increased confidence in assignments, and lower input requirements for RNP-MaP experiments.
Presentation Overview: Show 
Throughout the journey from synthesis to decay, mRNA changes its protein partners, dynamically remodeling the ribonucleoprotein complex (mRNP). Yet previous studies captured unsynchronized pools of the mRNA interactome from mixed stages, without the temporal resolution that is critical for dissecting mRNP remodeling and understanding the functions of RNA-binding proteins (RBPs). Here, we provide an mRNA interactome in a time-resolved fashion. By 4-thiouridine (4sU) pulse-labeling, we captured mRNPs at 10 time points and quantified ≥700 RBPs using liquid chromatography with tandem mass spectrometry. The chronological order of mRNA interactions is consistent with the known functions and localizations of RBPs: nuclear RBPs involved in pre-mRNA processing are “early” binders, while translation factors and decay factors are detected at later time points.
Interestingly, some RBPs showed clear disagreement between mRNA binding dynamics and their reported functions, hinting at yet-unknown functions. We developed a supervised machine learning method that predicts mRNA binding dynamics based on Gene Ontology annotations and systematically identified RBPs with unexpected dynamics. RBPs with high prediction errors were not functionally well-characterized, indicating mRNA binding dynamics offer a useful clue for understanding these RBPs. Our study introduces a temporal dimension to the mRNA interactome research, and provides new insights into RBP functions and posttranscriptional regulation.
