All times listed are in UTC
- Manuel Irimia, mirimia@gmail.com
Presentation Overview:
One major challenge for the development of complex multicellular organisms is the generation of dozens of cell and tissue types from a single genomic sequence. Alternative splicing can produce protein isoforms through the differential processing of introns and exons, allowing optimization of specific cellular roles or even the emergence of novel functions in a tissue-restricted manner. Perhaps the most striking example of this is the program of neuronal microexons, regulated by Srrm3/4 in mammals. However, other (longer) exons also show largely restricted expression in neurons or other cell types, which is driven by distinct tissue-specific master splicing regulators (e.g. Nova, Esrp). Interestingly, although these alternative splicing programs are relatively well conserved within vertebrates, their conservation across larger evolutionary distances is minimal, despite the remarkable conservation of the tissue-specific expression and biochemical activity of their master regulators. I will discuss our results investigating the evolution and assembly of various exon programs in closely and distantly related species, as well as the tools and resources (ExOrthist and VastDB) we have generated to address these questions.
- Gabriela Santos Rodriguez, Garvan Institute of Medical Research, Australia
- Irina Voineagu, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
- Robert Weatheritt, Garvan Institute of Medical Research, Australia
Presentation Overview:
The conservation of circular RNAs (circRNAs) between closely related species remains unclear. In the particular case of primates, it is known that many primate genes produce circular RNAs, but their extent of conservation is unknown. By comparing tissue-specific transcriptomes across over 70 million years of primate evolution, we find that within 3 million years circRNA expression profiles diverged such that they are more closely related to species identity than to organ type. Nonetheless, our analysis also revealed a subset of circRNAs with conserved neural expression across tens of millions of years of evolution. These circRNAs are defined by an extended downstream intron that has shown dramatic lengthening during evolution due to the insertion of novel retrotransposons. Our work provides comparative analyses of the mechanisms promoting circRNAs to generate increased transcriptomic complexity in primates.
- Michay Diez, Stowers Institute for Medical Research, United States
- Santiago Gerardo Medina-Muñoz, Stowers Institute for Medical Research, United States
- Luciana Andrea Castellano, Stowers Institute for Medical Research, United States
- Gabriel da Silva Pescador, Stowers Institute for Medical Research, United States
- Qiushuang Wu, Stowers Institute for Medical Research, United States
- Ariel Alejandro Bazzini, Stowers Institute for Medical Research, United States
Presentation Overview:
The codon composition of messenger RNAs (mRNAs) imposes regulatory information that strongly affects transcript stability, allowing cells to fine-tune protein expression. Current codon optimization methods revolve around codon usage frequency, despite the fact that it correlates only weakly with mRNA stability. Here, we trained a machine learning model with mRNA stability profiles from several vertebrate species to predict mRNA stability based on the regulatory properties of codon composition. Using this model, we developed www.iCodon.org, a web interface that predicts mRNA stability and customizes gene expression by introducing synonymous codon substitutions. To validate the potential of iCodon, we constructed twelve EGFP variants ranging in levels of predicted mRNA stability. Transfection of these variants in human cells revealed that mRNA stability predictions correlated with fluorescence intensity and captured a range of nearly 50-fold differences in gene expression. Additionally, zebrafish embryos injected with these EGFP variants recapitulated the human cell results, demonstrating that iCodon can also modulate gene expression in vivo. In conclusion, iCodon provides a powerful tool to interrogate mRNA stability and design strategies to modulate gene expression in vertebrates, for a wide range of research applications and for the potential optimization of RNA-based therapeutics and vaccines.
- Marie Coutelier, Lady Davis Institute for Medical Research, Jewish General Hospital, McGill University, Canada
- Selin Jessa, Lady Davis Institute for Medical Research, Jewish General Hospital, McGill University, Canada
- Nisha Kabir, Lady Davis Institute for Medical Research, Jewish General Hospital, McGill University, Canada
- Steven Hébert, Lady Davis Institute for Medical Research, Jewish General Hospital, McGill University, Canada
- Véronique Lisi, The Research Institute of the McGill University Health Center, McGill University, Canada
- Damien Faury, The Research Institute of the McGill University Health Center, McGill University, Canada
- Brian Krug, The Research Institute of the McGill University Health Center, McGill University, Canada
- Nicolas De Jay, Lady Davis Institute for Medical Research, Jewish General Hospital, McGill University, Canada
- Nada Jabado, The Research Institute of the McGill University Health Center, McGill University, Canada
- Claudia L Kleinman, Lady Davis Institute for Medical Research, Jewish General Hospital, McGill University, Canada
Presentation Overview:
Introduction: Repeated genomic elements (RGE) make up more than half the human genome. They are involved in the regulation of coding genes during development and cell type differentiation. They are also de-repressed in several tumour types, e.g. in high-grade gliomas with a lysine-to-methionine substitution at position 27 of histone H3.
Material and methods: To investigate the intra-sample heterogeneity of RGE expression, we used single-cell or single-nuclei sequencing in K27M-mutated gliomas and in a developmental atlas including time points from E10 to P6 in the mouse forebrain.
Results: In several K27M-mutated gliomas, we identified heterogeneity in RGE expression, correlating with gene signatures specific to K27M gliomas and with targets of the repressive methylation complex PRC2. Cells displaying high RGE expression showed a transcriptomic profile resembling less differentiated progenitor cells. Along development, we identified repeat families with defined expression dynamics in some cell types and developmental windows.
Conclusions: Our pipeline for RGE quantification allows us to reliably detect repeat families in single-cell data. It opens promising avenues towards a better understanding of the functional consequences of this unexplored fraction of the genome at the single-cell level.
- Frédérique White, Université de Sherbrooke, Canada
- Groleau Marika, Université de Sherbrooke, Canada
- Catherine Allard, Université de Sherbrooke, Canada
- Cécilia Légaré, Université de Sherbrooke, Canada
- Marie-France Hivert, Harvard Medical School, United States
- Luigi Bouchard, Université de Sherbrooke, Canada
- Pierre-Étienne Jacques, Université de Sherbrooke, Canada
Presentation Overview:
MicroRNAs are involved in many biological contexts, including pregnancy. The identification of genetic variants influencing gene expression, known as expression quantitative trait loci (eQTL), helps to understand the underlying molecular determinants of complex disease. Using data from the Genetics of Glucose Regulation in Gestation and Growth (Gen3G) Cohort, we investigated associations between SNPs and microRNA plasma levels during the first trimester of pregnancy. We used 369 samples from which maternal genotypes and full microRNA quantification were both available to identify 22,634 eQTLs involving 149 unique microRNAs.
Using these eQTLs we built genetic risk scores (GRS) using elastic-net regressions to select the most relevant SNPs. For about half of the selected microRNAs, the GRS capture more than 10% of the plasma level variance.
We also applied Mendelian randomization using eQTLs involved in GRS and found associations between the levels of circulating microRNAs during the first trimester of pregnancy and pregnancy complications reported in the Gen3G cohort, including gestational diabetes mellitus.
Our results highlight the potential of genetic instruments in predicting circulating microRNA levels associated with pregnancy complications. Such instruments can help in understanding the regulation of microRNA expression and the etiology of complex traits such as pregnancy complications.
- Mathieu Quesnel-Vallieres, University of Pennsylvania, United States
- Anupama Jha, University of Pennsylvania, United States
- Andrei Thomas-Tikhonenko, Children's Hospital of Philadelphia, United States
- Kristen Lynch, University of Pennsylvania, United States
- Yoseph Barash, University of Pennsylvania, United States
Presentation Overview:
Cancer is a set of diseases characterized by unchecked cell proliferation and invasion of surrounding tissues. The many genes that have been genetically associated with cancer or shown to directly contribute to oncogenesis vary widely between tumor types and it is not clear whether there exists a set of genes or other transcriptomic features commonly deregulated across several cancer types. We trained three feed-forward neural networks to predict the cancer state (healthy tissue versus tumor) of RNA-seq samples using either gene expression (protein-coding or lncRNA) or splice junction usage data on a set comprising 17 healthy tissue types and 18 solid tumor types. All three models achieve high precision (95.7% ± 2.1%) and high recall (97.3% ± 1.3%) across 14 datasets. Analysis of attribution values extracted from our models reveals that genes with high attribution values are evolutionarily conserved and are under strong selective pressure against loss of function. These findings suggest that the features making up the transcriptomic profile of cancer have essential cellular functions. Our results also highlight that deregulation of RNA-regulating genes and aberrant splicing are pervasive features across a large array of solid tumor types.
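As a reminder of what the reported precision and recall measure, here is a minimal sketch for a binary tumor versus healthy classifier. The labels and predictions are invented toy values, and the function name `precision_recall` is hypothetical; this is not the authors' evaluation code.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 = tumor, 0 = healthy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Precision: fraction of predicted tumors that are real tumors.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of real tumors that were predicted as tumors.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy data (assumption, for illustration only).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```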
- Regan Hayward, Helmholtz Institute for RNA-based Infection Research (HIRI), Germany
- Bozena Mika-Gospodorz, Helmholtz Institute for RNA-based Infection Research (HIRI), Germany
- Ming Kang, Helmholtz Institute for RNA-based Infection Research (HIRI), Germany
- Lars Barquist, Helmholtz Institute for RNA-based Infection Research (HIRI), Germany
Presentation Overview:
Infection is a dynamic process centered on the interaction between host and pathogen. Dual RNA-seq provides a method to simultaneously capture both transcriptomes; it has been applied across a wide range of bacterial pathogens, assisting in the identification of key infection processes. However, there has been little investigation defining an optimal method for simultaneous quantification across diverse genomes. To address this, we have created a Nextflow pipeline called dualrnaseq, implementing multiple approaches to assign reads to each organism's transcriptome.
Most tools applied to dual RNA-seq data cannot accurately assign multi-mapping reads or assign reads to repetitive elements. To investigate this, we compared a traditional workflow with our quantification-based workflow, which has built-in functionality to aid in assigning many of these challenging reads. We benchmarked both methods across different host/pathogen data sets covering diverse bacterial genomes, including Salmonella Typhimurium and Orientia tsutsugamushi. We show that across different read lengths and library types, we were able to recover and assign more host and pathogen reads with higher accuracy.
Our dual RNA-seq pipeline is available as part of the nf-core project utilizing the Nextflow infrastructure (https://github.com/nf-core/dualrnaseq), and can be run natively on a PC, a compute cluster, or in the cloud, and is fully containerized ensuring reproducibility.
- Malindrie Dharmaratne, University of Queensland, Australia
- Atefeh Taherian Fard, University of Queensland, Australia
- Jessica Mar, University of Queensland, Australia
Presentation Overview:
We present a novel statistical framework for identifying differential distributions in single-cell RNA-sequencing (scRNA-seq) data between treatment conditions by modelling gene expression read counts using generalized linear models. We model each gene independently under each treatment condition using the Poisson, Negative Binomial, Zero-inflated Poisson and Zero-inflated Negative Binomial error distributions with a log link function and model-based normalization for differences in sequencing depth. Model selection is done by calculating the Bayesian Information Criterion and the likelihood ratio test statistic. While most methods for differential gene expression analysis aim to detect a shift in the mean of expressed values, single-cell data are driven by over-dispersion and dropouts, requiring statistical distributions that can handle the excess zeros. By modelling gene expression distributions, our framework, scShapes, can identify subtle variations that do not involve a change in the mean. It also has the flexibility to adjust for covariates and perform multiple comparisons while explicitly modelling the variability between samples. Through simulation, we show that this framework is able to detect zero-inflated genes, and when applied to real scRNA-seq datasets, our framework was able to identify genes and pathways linked to the phenotype of interest that were not discovered through traditional analysis of transcriptomic data.
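The model-selection idea can be illustrated with a small sketch that compares a Poisson fit with a zero-inflated Poisson (ZIP) fit of one gene's counts using the Bayesian Information Criterion. Everything here is an assumption made for illustration: the toy counts, the function names, the use of a simple EM fit, and the omission of covariates, normalization, the negative binomial models and the likelihood ratio test. It is not the scShapes implementation.

```python
import math

def poisson_loglik(counts, lam):
    """Log-likelihood of counts under Poisson(lam)."""
    return sum(-lam + k * math.log(lam) - math.lgamma(k + 1) for k in counts)

def zip_loglik(counts, pi, lam):
    """Log-likelihood under a zero-inflated Poisson with zero-inflation weight pi."""
    ll = 0.0
    for k in counts:
        pois = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        ll += math.log(pi + (1 - pi) * pois) if k == 0 else math.log((1 - pi) * pois)
    return ll

def fit_zip(counts, n_iter=200):
    """Fit (pi, lam) by EM; assumes at least one non-zero count."""
    n = len(counts)
    n_zero = sum(1 for k in counts if k == 0)
    total = sum(counts)
    pi, lam = 0.5, total / n
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        p_struct = pi / (pi + (1 - pi) * math.exp(-lam))
        expected_struct = n_zero * p_struct
        # M-step: update the mixture weight and the Poisson mean.
        pi = expected_struct / n
        lam = total / (n - expected_struct)
    return pi, lam

def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion; lower is better."""
    return n_params * math.log(n_obs) - 2 * loglik

# Toy counts for one gene (assumption): 60 zeros plus 40 expressing cells.
counts = [0] * 60 + [3, 4, 5, 2, 6, 4, 3, 5, 4, 4] * 4
lam_pois = sum(counts) / len(counts)          # Poisson MLE is the sample mean
pi_zip, lam_zip = fit_zip(counts)
bic_pois = bic(poisson_loglik(counts, lam_pois), 1, len(counts))
bic_zip = bic(zip_loglik(counts, pi_zip, lam_zip), 2, len(counts))
best = "ZIP" if bic_zip < bic_pois else "Poisson"
print(best, round(bic_pois, 1), round(bic_zip, 1))
```

With this many excess zeros the ZIP model is preferred despite its extra parameter, which is the kind of distributional difference a mean-shift test would not describe.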
- Jyun-Yu Jiang, University of California, Los Angeles, United States
- Chelsea J.-T. Ju, University of California, Los Angeles, United States
- Junheng Hao, University of California, Los Angeles, United States
- Muhao Chen, Information Sciences Institute, USC, United States
- Wei Wang, University of California, Los Angeles, United States
Presentation Overview:
Circular RNAs are a novel class of long non-coding RNAs that have been broadly discovered in the eukaryotic transcriptome. The circular structure arises from a non-canonical splicing process, where the donor site is backspliced to an upstream acceptor site. These circular RNA sequences are conserved across species.
More importantly, rising evidence suggests their vital roles in gene regulation and association with diseases. As the fundamental effort toward elucidating their functions and mechanisms, several computational methods have been proposed to predict the circular structure from the primary sequence. Recently, advanced computational methods have leveraged deep learning to capture the relevant patterns from RNA sequences and model their interactions to facilitate the prediction. However, these methods fail to fully explore positional information of splice junctions and their deep interaction.
Results: We present a robust end-to-end framework, JEDI, for circular RNA prediction using only nucleotide sequences. JEDI first leverages the attention mechanism to encode each junction site based on deep bidirectional recurrent neural networks and then presents the novel cross-attention layer to model deep interaction among these sites for backsplicing. Finally, JEDI can not only predict circular RNAs but also interpret relationships among splice sites to discover backsplicing hotspots within a gene region. Experiments demonstrate that JEDI significantly outperforms state-of-the-art approaches in circular RNA prediction at both the isoform level and the gene level. Moreover, JEDI also shows promising results on zero-shot backsplicing discovery, which none of the existing approaches can achieve.
Availability: The implementation of our framework is available at https://github.com/hallogameboy/JEDI
- Jerome Waldispuhl, McGill University, Canada
- Carlos G. Oliver, McGill University, Canada
- Vladimir Reinharz, Université du Québec à Montréal, Canada
- Eric Westhof, IBMC-CNRS, France
- Rhiju Das, Stanford University, United States
- Vincent Mallet, Pasteur Institute, Les Mines-Paristech, France
- Jonathan Broadbent, McGill University, Canada
Presentation Overview:
The development of reliable computational methods for predicting RNA 3D structures and their interactions is key for advancing our understanding of RNA folding and therapeutics.
First, we introduce an open-source Python package that supports the development of statistical models and analyses of RNA 3D base pairing networks. This resource aims to facilitate the loading, sampling, visualization, and structural comparison of RNA base pair graphs, which should ease the development and democratization of machine learning frameworks for RNA structure prediction tasks.
Then, we present RNA-puzzles (https://www.rnapuzzles.org), a collective experiment for computational RNA 3D structure prediction methods launched in 2011. As in CASP, the organizers provide participants with sequences for which experimentalists have obtained 3D models that are still unpublished. Beyond a blind assessment of the performance of participating teams and algorithms, the consortium aims to promote discussions between researchers, identify bottlenecks as well as promising avenues, coordinate the efforts of the community, and agree on models and representations.
We discuss the specificity of RNA 3D structure prediction and highlight current challenges. We also emphasize the importance of this exercise and its impact for the broad RNA community.
- Kathi Zarnack
Presentation Overview:
N6-methyladenosine (m6A) is the most abundant internal RNA modification in eukaryotic mRNAs and influences many aspects of RNA processing. miCLIP (m6A individual-nucleotide resolution UV crosslinking and immunoprecipitation) is an antibody-based approach to map m6A sites with single-nucleotide resolution. However, due to broad antibody reactivity, reliable identification of m6A sites from miCLIP data remains challenging. Here, we present miCLIP2 in combination with machine learning to significantly improve m6A detection. The optimised miCLIP2 results in high-complexity libraries from less input material. Importantly, we established a robust computational pipeline to tackle the inherent issue of false positives in antibody-based m6A detection. The analyses are calibrated with Mettl3 knockout cells to learn the characteristics of m6A deposition, including m6A sites outside of DRACH motifs. To make our results universally applicable, we trained a machine learning model, m6Aboost, based on the experimental and RNA sequence features. Importantly, m6Aboost allows prediction of genuine m6A sites in miCLIP2 data without filtering for DRACH motifs or the need for Mettl3 depletion. Using m6Aboost, we identify thousands of high-confidence m6A sites in different murine and human cell lines, which provide a rich resource for future analysis. Collectively, our methodology greatly improves m6A identification.
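For readers unfamiliar with the DRACH motif mentioned above, it can be expanded with the standard IUPAC codes (D = A, G or U; R = A or G; H = A, C or U) and scanned for with a regular expression. The sketch below is only an illustration of motif matching; the function name and example sequence are hypothetical, and it is unrelated to how m6Aboost scores sites.

```python
import re

# DRACH with IUPAC codes expanded; T is accepted alongside U so that
# DNA-style transcript sequences also work. The lookahead allows overlaps.
DRACH = re.compile(r"(?=([AGUT][AG]AC[ACUT]))")

def find_drach_sites(seq):
    """Return 0-based positions of the central A of every DRACH match."""
    seq = seq.upper()
    return [m.start() + 2 for m in DRACH.finditer(seq)]

# Toy sequence (assumption): contains GGACU and AGACA.
sites = find_drach_sites("CCGGACUAAGACAUU")
print(sites)
```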
- Ruiyan Hou, School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, China
- Yuanhua Huang, School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, China
Presentation Overview:
RNA splicing is a key step of gene expression in higher organisms. Accurate quantification of the two-step splicing kinetics is of high interest not only for understanding the regulatory machinery, but also for estimating the RNA velocity in single cells. However, the kinetic rates remain poorly understood due to the intrinsically low content of unspliced RNAs and its stochasticity across contexts. Here, we estimated the relative splicing efficiency across a variety of single-cell RNA-Seq data with scVelo. We further extracted three large feature sets, comprising 92 basic genomic sequence features, 65,536 octamers and 120 RNA-binding protein features, and found that they are highly predictive of RNA splicing efficiency across multiple tissues in human and mouse. A set of important features with strong regulatory potential on splicing efficiency has been identified. This predictive power promises to reveal the complexity of RNA processing and to enhance the estimation of single-cell RNA velocity.
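The figure of 65,536 octamers corresponds to every possible 8-mer over a four-letter alphabet (4^8 = 65,536). A minimal sketch of how such a count vector could be built is shown below; the function names and the toy sequence are assumptions, and the authors' actual feature extraction may differ (for example in normalization or in the regions scanned).

```python
from itertools import product

def kmer_index(k=8, alphabet="ACGT"):
    """Map every possible k-mer to a column index (4**k entries)."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}

def kmer_counts(seq, index, k=8):
    """Count overlapping k-mers in seq; windows with other letters (e.g. N) are skipped."""
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    return counts

index = kmer_index()
features = kmer_counts("ACGTACGTACGTAC", index)  # toy sequence (assumption)
print(len(features), sum(features))
```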
- Salma Sohrabi-Jahromi, Max Planck Institute for Biophysical Chemistry, Germany
- Johannes Söding, Max Planck Institute for Biophysical Chemistry, Germany
Presentation Overview:
Motivation: Understanding how proteins recognize their RNA targets is essential to elucidate regulatory processes in the cell. Many RNA-binding proteins (RBPs) form complexes or have multiple domains that allow them to bind to RNA in a multivalent, cooperative manner. They can thereby achieve higher specificity and affinity than proteins with a single RNA-binding domain. However, current approaches to de-novo discovery of RNA binding motifs do not take multivalent binding into account.
Results: We present Bipartite Motif Finder (BMF), which is based on a thermodynamic model of RBPs with two cooperatively binding RNA-binding domains. We show that bivalent binding is a common strategy among RBPs, yielding higher affinity and sequence specificity. We furthermore illustrate that the spatial geometry between the binding sites can be learned from bound RNA sequences. These discovered bipartite motifs are consistent with previously known motifs and binding behaviors. Our results demonstrate the importance of multivalent binding for RNA-binding proteins and highlight the value of bipartite motif models in representing the multivalency of protein-RNA interactions.
Availability: BMF source code is available at https://github.com/soedinglab/bipartite_motif_finder under a GPL license. The BMF web server is accessible at https://bmf.soedinglab.org.
- Eddie Park, Children's Hospital of Philadelphia, United States
- Yi Xing, Children's Hospital of Philadelphia, United States
Presentation Overview:
A-to-I RNA editing diversifies the transcriptome and has multiple downstream functional effects. We analyze matched genetic and transcriptomic data in 49 tissues across 437 individuals to identify RNA editing events that are associated with genetic variation. Using an RNA editing quantitative trait loci (edQTL) mapping approach, we identify 3117 unique RNA editing events associated with a cis genetic polymorphism. Fourteen percent of these edQTL events are also associated with genetic variation in their gene expression. A subset of these events are associated with genome-wide association study signals of complex traits or diseases. We find that certain microRNAs are able to differentiate between the edited and unedited isoforms of their targets. Furthermore, microRNAs can generate an expression quantitative trait loci (eQTL) signal from an edQTL locus by microRNA-mediated transcript degradation in an editing-specific manner. By integrative analyses of edQTL, eQTL, and microRNA expression profiles, we computationally discover and experimentally validate edQTL-microRNA pairs for which the microRNA may generate an eQTL signal from an edQTL locus in a tissue-specific manner. Our work suggests a mechanism in which RNA editing variability can influence the phenotypes of complex traits and diseases by altering the stability and steady-state level of critical RNA molecules.
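The core quantity behind an edQTL can be sketched as follows. Because inosine is read as guanosine during sequencing, the editing level at a site is commonly estimated as G reads divided by A plus G reads, and it can then be related to genotype dosage across individuals. The code below is a simplified illustration with invented numbers and hypothetical function names; a plain least-squares slope stands in for whatever statistical model the authors used, and no covariates or significance testing are included.

```python
def editing_level(a_reads, g_reads):
    """Fraction of reads showing G (edited) at an A-to-I site; None if uncovered."""
    total = a_reads + g_reads
    return g_reads / total if total else None

def ols_slope(x, y):
    """Least-squares slope of y on x (change in editing level per allele)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

# Toy data (assumption): alternate-allele dosage and (A, G) read counts per individual.
genotypes = [0, 0, 1, 1, 2, 2]
reads = [(90, 10), (80, 20), (70, 30), (60, 40), (50, 50), (40, 60)]
levels = [editing_level(a, g) for a, g in reads]
slope = ols_slope(genotypes, levels)
print(levels, round(slope, 3))
```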
- Eduardo Eyras, Australian National University, Australia
- Pablo Acera, Australian National University, Australia
- Jiajia Xu, Australian National University, Australia
Presentation Overview:
Direct RNA-seq with Oxford Nanopore Technologies offers the unprecedented opportunity to measure RNA modifications at single-molecule resolution. Existing tools to detect RNA modifications are based on statistical frameworks that compare a WT with a KO/KD sample depleted of modifications, and are not designed to detect modifications at the single-molecule level in transcriptomes. Lack of appropriate training datasets is one of the limiting factors in developing better predictive algorithms. To yield maximum performance with the available datasets, we developed CHEUI, a two-stage deep-learning model combined with feature engineering of Nanopore signals to detect m6A RNA modifications. CHEUI accurately predicts methylation status at both the single-read and single-site level in individual transcriptome samples. Our tool achieves high accuracy in independent datasets containing sequence motifs not seen during training, achieving an average area under the ROC curve of 0.875 for predictions on individual reads. Using controlled mixture datasets with different proportions of modified and unmodified reads, CHEUI outperformed Xpore, Nanocompore and EpiNano in stoichiometry and true positive rate prediction. CHEUI provides the opportunity to investigate functional roles of RNA modifications at the isoform level without the need for KO/KDs and has the potential to be extended to other modifications.
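The area under the ROC curve used to summarize per-read predictions equals the probability that a randomly chosen modified read receives a higher score than a randomly chosen unmodified read. A minimal sketch of that pairwise definition follows; the labels and scores are invented toy values, the function name is hypothetical, and this is not CHEUI's evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy per-read data (assumption): 1 = modified read, 0 = unmodified read.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.55, 0.3, 0.5, 0.1, 0.7, 0.6]
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise form is quadratic in the number of reads, which is fine for illustration; a rank-based computation would be preferable for large datasets.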
- Sara H Rouhanifard, Northeastern University, United States
- Sepideh Tavakoli, Northeastern University, United States
- Mohammad Nabizadehmashhadtoroghi, Northeastern University, United States
- Amr Makhamreh, Northeastern University, United States
- Neda Rezapour, Northeastern University, United States
- Meni Wanunu, Northeastern University, United States
Presentation Overview:
Mammalian cells generate >100 different RNA modifications that can change base-pairing, RNA structures, or the recruitment of RNA-binding proteins. Pseudouridine-modified mRNAs are more resistant to RNase-mediated degradation and also have the potential to modulate immunogenicity and enhance translation in vivo. However, we have yet to understand the precise biological function of pseudouridine on mRNAs due to a lack of tools for its direct detection and quantification.
We have recently developed an algorithm for identifying pseudouridylated sites directly on mammalian mRNA transcripts using nanopore sequencing. We use our algorithm to classify 3 types of pseudouridine hyper-modification that may occur on mRNAs: Type 1 has a high percentage of pseudouridine at a given site; type 2 has >1 pseudouridine on a single read; type 3 has pseudouridine in addition to other modifications on a given read.
Our pipeline enables the direct identification and quantification of the pseudouridine modification on native RNA molecules. Further, the long read lengths allow multiple modifications to be detected on the same transcript, which allows layering of RNA modification data and RNA sequence information. Future applications of this pipeline will enhance our understanding of the biological impacts of pseudouridylation as they pertain to disease and development.
- Daiyun Huang, Xi'an Jiaotong-Liverpool University, China
- Bowen Song, Xi'an Jiaotong-Liverpool University, China
- Jingjue Wei, Xi'an Jiaotong-Liverpool University, China
- Jionglong Su, Xi'an Jiaotong-Liverpool University, China
- Frans Coenen, University of Liverpool, United Kingdom
- Jia Meng, Xi'an Jiaotong-Liverpool University, China
Presentation Overview:
Motivation: Increasing evidence suggests that post-transcriptional RNA modifications regulate essential biomolecular functions and are related to the pathogenesis of various diseases. Precise identification of RNA modification sites is essential for understanding the regulatory mechanisms of RNAs. To date, many computational approaches have been developed for the prediction of RNA modifications, most of which were based on strong supervision enabled by base-resolution epitranscriptome data. However, high-resolution data may not be available.
Results: We propose WeakRM, the first weakly supervised learning framework for predicting RNA modifications from low-resolution epitranscriptome datasets, such as those generated from acRIP-seq and hMeRIP-seq. Evaluations on three independent datasets (corresponding to three different RNA modification types and their respective sequencing technologies) demonstrated the effectiveness of our approach in predicting RNA modifications from low-resolution data. WeakRM outperformed state-of-the-art multi-instance learning methods for genomic sequences, such as WSCNN, which was originally designed for transcription factor binding site prediction. Additionally, our approach captured motifs that are consistent with existing knowledge, and visualization of the predicted modification-containing regions revealed the potential for detecting RNA modifications with improved resolution.
- Michael Piechotta, Humboldt University of Berlin, Germany
- Qi Wang, Klaus Tschira Institute for Integrative Computational Biology, Germany
- Christoph Dieterich, University Hospital Heidelberg, Germany
Presentation Overview:
A whole series of high-throughput antibody-free methods for RNA modification detection from sequencing data has been proposed lately.
We present JACUSA2 as a versatile software solution and comprehensive analysis framework for reverse transcription (RT) signatures in mapping experiments for pseudouridine or N6-methyladenosine (m6A).
The successor of JACUSA (https://github.com/dieterich-lab/JACUSA2) features better running time performance and is able to capture more complex read signatures involving base substitutions, insertions, deletions and read truncations.
JACUSA2 is able to process data from any read technology as long as it can be provided in the binary sequence alignment map format (BAM). We have applied JACUSA2 to m6A site prediction using 2 Illumina and 1 ONT Nanopore platform datasets, which were all generated in HEK293 cells.
Our benchmark results show high specificity and good sensitivity in comparison to other available prediction tools.
- Nicole Martinez
- Shengdong Ke
- Schragi Schwartz
- Yue Wan, Genome Institute of Singapore
Presentation Overview:
SARS-CoV-2 has emerged as a major threat to global public health, resulting in global societal and economic disruptions. Here, we investigate the intramolecular and intermolecular RNA interactions of wildtype (WT) and a mutant (Δ382) SARS-CoV-2 virus in cells using high throughput structure probing on Illumina and Nanopore platforms. We identified twelve potentially functional structural elements within the SARS-CoV-2 genome, observed that identical sequences can fold into divergent structures on different subgenomic RNAs, and that WT and Δ382 virus genomes can fold differently. Proximity ligation sequencing experiments identified hundreds of intramolecular and intermolecular pair-wise interactions within the virus genome and between virus and host RNAs. SARS-CoV-2 binds strongly to mitochondrial and small nucleolar RNAs and is extensively 2’-O-methylated. 2’-O-methylation sites in the virus genome are enriched in the untranslated regions and are associated with increased pair-wise interactions. SARS-CoV-2 infection results in a global decrease of 2’-O-methylation sites on host mRNAs, suggesting that binding to snoRNAs could be a pro-viral mechanism to sequester methylation machinery from host RNAs towards the virus genome. Collectively, these studies deepen our understanding of the molecular basis of SARS-CoV-2 pathogenicity, cellular factors important during infection and provide a platform for targeted therapy.
- Victoire Fort, Université Laval - CRCHU de Québec, Canada
- Gabriel Khelifi, Université Laval - CRCHU de Québec, Canada
- Samer Hussein, Université Laval - CRCHU de Québec, Canada
Presentation Overview:
Long non-coding RNAs (lncRNAs) are emerging as new players in gene regulation. Here, we identified a novel lncRNA, Tapir, that regulates the pluripotent state of embryonic stem cells (ESCs), cells that are necessary for proper embryo development. Knock-down of Tapir in ESCs leads to a decrease in pluripotency gene expression, whereas up-regulation of Tapir increases their expression. Moreover, Tapir accelerates the reprogramming of differentiated cells towards induced pluripotent stem cells, which is accompanied by rapid up-regulation of chromatin regulators and pluripotency gene expression. We also found that it directly interacts with several mRNAs implicated in pluripotency and chromatin regulation in ESCs. By screening for sequence complementarity, we identified a SINE retrotransposable element in the Tapir lncRNA that is also present in the intronic regions and 3’UTRs of several pluripotency genes in the antisense orientation, suggesting that Tapir functions through intermolecular hybridization with these mRNAs. Overexpression of a SINE-depleted form of Tapir in ESCs or during reprogramming significantly altered the pluripotency-enhancing effect of Tapir. This suggests that the SINE element is essential for Tapir’s function, possibly by affecting the regulation or translation of pluripotency-associated mRNAs. Our study highlights a new aspect of the regulation of pluripotent stem cells by lncRNAs.
- Danny Bergeron, Université de Sherbrooke, Canada
- Gabrielle Deschamps-Francoeur, Université de Sherbrooke, Canada
- Étienne Fafard-Couture, Université de Sherbrooke, Canada
- Laurence Faucher Giguere, Université de Sherbrooke, Canada
- Sonia Couture, Université de Sherbrooke, Canada
- Sherif Abou Elela, University of Sherbrooke, Canada
- Michelle Scott, University of Sherbrooke, Canada
Presentation Overview: Show
Small nucleolar RNAs (snoRNAs) are highly expressed short non-coding RNAs subdivided into two groups: box C/D and box H/ACA snoRNAs. The canonical function of the former is to guide 2’-O-methylation of ribosomal RNAs (rRNAs), whereas that of the latter is to guide pseudouridylation of rRNAs. During the last two decades, a growing body of evidence has shown that snoRNAs can also have non-canonical functions in a variety of cellular processes. To gain more insight into snoRNA non-canonical functions, we reanalysed datasets from high-throughput methods aimed at capturing RNA-RNA interactions in cells, to create a snoRNA-RNA interaction network.
Surprisingly, we found that one third of the snoRNAs detected in our network interact with their host gene. Among these candidates, we identified SNORD2, a snoRNA embedded in an intronic sequence of the EIF4A2 gene. We discovered an alternative folding of the snoRNA with the downstream EIF4A2 intron sequence, producing a stable intermediate whose expression is highly anti-correlated with the percent spliced in of a cassette exon located immediately downstream of the SNORD2 intron. In conclusion, we unveiled a novel alternative splicing mechanism that involves regulation by embedded snoRNAs and could be used by the cell to alter gene expression.
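The anti-correlation described above relates two per-sample quantities: the abundance of the folding intermediate and the percent spliced in (PSI) of the downstream cassette exon. The following is a minimal sketch of how such a comparison could be set up, not the authors' pipeline. It assumes PSI is estimated from inclusion and exclusion junction read counts, and the function names `psi` and `pearson` are hypothetical.

```python
import math

def psi(inclusion_reads, exclusion_reads):
    """Percent spliced in: share of junction reads that support exon inclusion."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None  # no junction coverage, PSI is undefined
    return 100.0 * inclusion_reads / total

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

Computing `psi` for each sample and passing the resulting values to `pearson` together with the intermediate's abundance gives a coefficient near -1 when the two are strongly anti-correlated. In practice, a rank-based correlation may be preferable for read-count data.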
- Xin Lai, Universitätsklinikum Erlangen, Germany
- Julio Vera, Universitätsklinikum Erlangen, Germany
- Martin Eberhardt, Universitätsklinikum Erlangen, Germany
- Christopher Lischer, Universitätsklinikum Erlangen, Germany
Presentation Overview: Show
Dendritic cells (DCs) are professional antigen-presenting cells that induce and regulate adaptive immunity by presenting antigens to T cells. The capacity of DCs to induce a therapeutic immune response against cancer can be enhanced by re-wiring of cellular signaling pathways with microRNAs (miRNAs). We developed a systems biology approach that combines RNA sequencing data and computational methods to delineate miRNA-based strategies that enhance DC-elicited immune responses. Through RNA sequencing of IKKβ-matured DCs that are currently being tested in a clinical trial on therapeutic anti-cancer vaccination, we identified 44 differentially expressed miRNAs. According to network analysis, most of these miRNAs regulate targets that are linked to immune pathways, such as cytokine and interleukin signaling. We employed a network topology-oriented scoring model to rank the miRNAs, analyzed their impact on the immunogenic potency of DCs, and identified dozens of promising miRNA candidates, with miR-15a and miR-16 as the top ones. The results of our analysis are presented in a database that constitutes a tool to identify DC-relevant miRNA-gene interactions with therapeutic potential. Our approach enables the systematic analysis and identification of functional miRNA-gene interactions that can be experimentally tested for improving DC immunogenic potency.
- Francisco Pardo-Palacios, Polytechnical University of Valencia, Spain
- Rocio Amorin, University of Florida, Spain
- Angeles Arzalluz-Luque, Polytechnical University of Valencia, Spain
- Leandro Balzano-Nogueira, University of Florida, United States
- Adalena Nanni, University of Florida, United States
- Liudmyla Kondratova, University of Valencia, United States
- Pedro Salguero, Polytechnical University of Valencia, Spain
- Ben Jordan, University of Virginia, United States
- Gloria Sheynkman, University of Virginia, United States
- Lauren McIntyre, University of Florida, United States
- Elizabeth Tseng, Pacific Biosciences, United States
- Ana Conesa, University of Florida, United States
Presentation Overview: Show
Long-read sequencing technologies have created new possibilities in transcriptome analysis due to their ability to sequence full-length transcript molecules. SQANTI, a methodology based on reference genome mapping and splice junction analysis, has become an essential quality-control and annotation tool for long-read RNA-seq (lrRNA-seq). However, as the use of the technology expands, additional QC and analysis needs, such as evaluation of 3’ and 5’ end diversity, quantification of gene expression, and transcriptome description lacking a reference genome, have become evident. We present SQANTI3 to address these challenges (Figure 1). SQANTI3 implements new quality metrics to assess 3’/5’ end diversity, which are subsequently incorporated into the tool’s machine-learning-based quality filter, ultimately improving the curation of false transcripts generated due to inaccuracies at transcript model ends. SQANTI3 includes a novel quantification strategy that leverages SQANTI categories and employs both full-length and non-full-length reads to improve transcript expression estimates. SQANTI3 provides a novel classification scheme to describe long-read transcriptomes in the absence of a reference genome. Finally, as a community-developed resource, SQANTI3 incorporates third-party contributions, such as sqanti_protein, which facilitates long-read support of proteomics studies. These new developments were evaluated using datasets from several species and tested on simulated data.
- Alla Mikheenko, Saint Petersburg State University, Russia
- Andrey Prjibelski, Saint Petersburg State University, Russia
- Anoushka Joglekar, Weill Cornell Medicine, United States
- Hagen Tilgner, Weill Cornell Medicine, United States
Presentation Overview: Show
Long-read transcriptomics requires understanding the error sources inherent to each technology. Current approaches cannot compare methods for an individual RNA molecule. Here, we combined barcoding strategies and long-read sequencing to sequence cDNA copies representing an individual RNA molecule on both Pacific Biosciences and Oxford Nanopore. We compared these long-read pairs in terms of sequence content and splicing structure. Although individual read pairs show high similarity, we found differences in (i) aligned length, (ii) polyA-tail length, (iii) TSS assignment, (iv) polyA-site assignment and (v) exon-intron structures. Overall, 25% of read pairs disagreed on either TSS, polyA-site or a splice site. Intron-chain disagreement typically arises from microexons and complicated splice sites. Our single-molecule technology comparison revealed that inconsistencies are often caused by sequencing-error-induced inaccurate ONT alignments, especially to downstream GTNNGT donor motifs. However, annotation-disagreeing upstream shifts in NAGNAG acceptors are often confirmed by both technologies and are thus real. We also analyzed non-barcoded ONT reads and confirmed that the intron number and proximity of other GT/AGs better predict inconsistency with the annotation than the read quality alone. Taken together, our novel technology comparison approach enables an accurate delineation of true isoform characteristics from sequencing and analysis errors in individual reads.
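The intron-chain comparison and the two splice-site motifs named in this abstract can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the authors' software. Reads are assumed to be given as lists of aligned exon blocks in 0-based, half-open coordinates, and all function names are hypothetical.

```python
def intron_chain(exons):
    """Introns as (start, end) gaps between consecutive aligned exon blocks."""
    exons = sorted(exons)
    return [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(exons, exons[1:])]

def chain_disagreements(exons_a, exons_b):
    """Introns present in only one of the two reads of a pair."""
    a, b = set(intron_chain(exons_a)), set(intron_chain(exons_b))
    return sorted(a ^ b)

def has_gtnngt_donor(genome, donor):
    """True if the intron starting at index `donor` begins with GTNNGT,
    i.e. a second candidate GT donor lies 4 nt downstream."""
    motif = genome[donor:donor + 6]
    return len(motif) == 6 and motif[:2] == "GT" and motif[4:] == "GT"

def has_nagnag_acceptor(genome, acceptor_end):
    """True if the intron ending at `acceptor_end` (exclusive) ends with NAGNAG,
    i.e. a second candidate AG acceptor lies 3 nt upstream."""
    if acceptor_end < 6:
        return False
    motif = genome[acceptor_end - 6:acceptor_end]
    return motif[1:3] == "AG" and motif[4:6] == "AG"
```

For a read pair that disagrees, checking the disputed splice site with `has_gtnngt_donor` or `has_nagnag_acceptor` indicates whether a nearby alternative site could explain the shift.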
- Luke Saville, University of Lethbridge, Canada
- Yubo Cheng, University of Lethbridge, Canada
- Babita Gollen, University of Lethbridge, Canada
- Liam Mitchell, University of Lethbridge, Canada
- Matthew Stuart-Edwards, University of Lethbridge, Canada
- Travis Haight, University of Lethbridge, Canada
- Majid Mohajerani, University of Lethbridge, Canada
- Athanasios Zovoilis, University of Lethbridge, Canada
Presentation Overview: Show
The new next-generation sequencing platforms by Oxford Nanopore Technologies for direct RNA sequencing (direct RNA-seq) allow for an in-depth and comprehensive study of the epitranscriptome by enabling direct base calling of RNA modifications. Non-coding RNAs constitute the most frequently documented targets for RNA modifications. However, the current standard direct RNA-seq approach is unable to detect many of these RNAs. Here we present NERD-seq, a sequencing approach which enables the detection of multiple classes of non-coding RNAs excluded by the current standard approach. Using total RNA from a tissue with high known transcriptional and non-coding RNA activity in mouse, the brain hippocampus, we show that, in addition to detecting polyadenylated coding and non-coding transcripts as the standard approach does, NERD-seq is able to significantly expand the representation for other classes of RNAs such as snoRNAs, snRNAs, scRNAs, srpRNAs, tRNAs, rRFs and non-coding RNAs originating from LINE L1 elements. Thus, NERD-seq presents a new comprehensive direct RNA-seq approach for the study of epitranscriptomes in brain tissues and beyond.
- Pierre-Luc Germain, ETH Zurich, Switzerland
- Federico Marini, Mainz University, Germany
- Yoseph Barash, University of Pennsylvania, United States
- Dominik Burri, University of Basel, Switzerland
- Weronika Danecka, University of Edinburgh, United Kingdom
- Mervin Fansier, Memorial Sloan Kettering Cancer Center, United States
- Christina Fitzsimmons, National Institutes of Health, United States
- Matthew Gazzara, University of Pennsylvania, United States
- Sam Haynes, University of Edinburgh, United Kingdom
- Chelsea Herdman, University of Utah, United States
- Christina Herrmann, University of Basel, Switzerland
- Alexander Kanitz, University of Basel, Switzerland
- Maria Katsantoni, University of Basel, Switzerland
- Euan McDonnel, University of Leeds, United Kingdom
- Ben Nicolet, Sanquin Research, Netherlands
- Chi Lam Poon, Cornell University, United States
- Leonard Schärfen, Yale University, United States
- Denis Seyres, University of Basel, Switzerland
- Yuk Kei Wan, A-Star Institute, Singapore
- Pin-Jou Wu, National Yang Ming Chiao Tung University, China
- Farica Zhuang, University of Pennsylvania, United States
- Michelle Meyer, Boston College, United States
- Sharon Aviran, UC Davis, United States
Presentation Overview: Show
RNA-binding proteins (RBPs) play critical roles in post-transcriptional regulation. However, a detailed understanding of the sequence and/or secondary structure required for specific binding is lacking. We organized the RBP Footprint Challenge at RNA 2021 with two primary goals: (1) engage computational biologists in developing, improving, and assessing computational methods used in RNA research, and (2) draw attention to recent high-throughput structural datasets to encourage more comprehensive and rigorous analysis of these data. The challenge organized twenty participants into five teams of two to six. The teams were tasked with using existing high-throughput datasets (particularly structure probing data) to identify signatures associated with the binding of a protein of their choice. Subsequently, the teams used these patterns to detect and rank putative binding sites and compared these findings with CLIP-detected sites. The final product of the challenge was a list of five predicted binding sites not previously detected by CLIP. During RNA 2021, the teams presented their analyses, which varied considerably in both the data utilized and the computational approach used. In the coming year, predictions will be experimentally validated by Gene Yeo’s group. The final results will be presented at RNA 2022.
- Liang Huang, School of EECS, Oregon State University, USA
Presentation Overview: Show
Since 1978 or so, the standard algorithms for RNA folding (Nussinov, Zuker, etc.) have scaled cubically with sequence length. These algorithms are OK for short RNAs, but as we start to study more and more long non-coding RNAs and mRNAs, they become too slow. In particular, the outbreak of the COVID-19 pandemic demands an efficient algorithm that can scale to the full-length SARS-CoV-2 genome, which contains ~30,000 nucleotides, so that we can study the RNA structure of the virus and find potential targets for diagnostics and therapeutics. Existing workarounds run folding locally on short windows with limited pairing distance, but they cannot predict any non-local interactions, and it is well known that end-to-end interactions are prevalent and important for RNAs.
To address this need, starting from LinearFold (2019), our group has designed a set of linear-time (approximate) algorithms for RNA folding, partition function calculation, stochastic sampling, mRNA design, and homologous folding. Unlike “local folding” methods, our algorithms are global, i.e., without any constraints on pairing distance. More interestingly, these algorithms have even higher accuracy than the cubic-time (exact search) counterparts. Thanks to the linear runtime, these algorithms can easily scale up to the full SARS-CoV-2 genomes, and predict global (end-to-end) interactions across the whole genome supported by experimental data.
I will highlight two recent and exciting developments along this line. First, to address the stability challenge of mRNA vaccines, our LinearDesign algorithm can efficiently design the most stable mRNA vaccine sequence for any protein target, and our designs for the COVID-19 spike protein have been verified in vivo to produce ~30x neutralizing antibody titers. Second, our LinearTurboFold algorithm can jointly fold and align a set of RNA homologs in linear time and, when applied to SARS-CoV-2 genomes, identifies not only conserved structures, but also conserved and accessible regions as potential targets for small-molecule drugs, antisense oligonucleotides, siRNAs, CRISPR-Cas13 gRNAs and RT-PCR primers.
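To make the cubic scaling of the classical algorithms concrete, here is a minimal sketch of Nussinov-style dynamic programming, the baseline this abstract contrasts with and not LinearFold itself. It only maximizes the number of base pairs rather than minimizing free energy. The function name `nussinov_max_pairs` and the `min_loop` parameter are choices made for this illustration.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in `seq`.

    Fills an n x n table over all intervals (i, j) and, for each interval,
    scans every possible pairing partner k of j: O(n^3) time, O(n^2) space.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):        # interval length minus one
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]             # case 1: j stays unpaired
            for k in range(i, j - min_loop):   # case 2: j pairs with some k
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

# A 9-nt hairpin: three G-C pairs closing a 3-nt loop.
print(nussinov_max_pairs("GGGAAACCC"))  # 3
```

The three nested loops over `span`, `i` and `k` are what make the run time grow with the cube of the sequence length, which is why a genome of ~30,000 nucleotides is impractical for this family of algorithms.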
- Chengxin Zhang, Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06511, United States
- Yang Zhang, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, United States
- Anna M. Pyle, Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06511, United States
Presentation Overview: Show
The multiple sequence alignment (MSA) is the entry point for many RNA structure modeling tasks, such as prediction of RNA secondary structure (rSS), molecular contacts and solvent accessibility (SA). Yet, there are few automated programs for consistent generation of high-quality MSAs for a target RNA. We developed rMSA, an automated five-stage approach for sensitive search and accurate alignment of RNA homologs from the standard RNAcentral and NCBI nucleotide collection (nt) databases for a target RNA. rMSA is benchmarked on a diverse set of 365 non-redundant and high-resolution RNA structures against four state-of-the-art programs (RNAcmap, Infernal, nhmmer, blastn). It significantly outperforms these programs, with approximately 20% and 5% higher F1-scores for rSS and contact prediction, respectively. Moreover, it is comparable to the state of the art for SA prediction. Our program enables the detection of conserved rSS in 3 lncRNAs (RepA, SRA and HOTAIR) previously claimed to lack evolutionarily conserved base pairs. Detailed analysis suggests that the advantage of rMSA lies in its hierarchical search strategy, which progressively incorporates more diverse homologs at each stage while avoiding attraction of unrelated sequences. rMSA is available at https://github.com/pylelab/rMSA.
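The F1-score used to compare secondary structure predictions can be illustrated as follows. This is a generic sketch of the metric, assuming structures are given in dot-bracket notation without pseudoknots. It is not taken from the rMSA benchmark code, and the function names are hypothetical.

```python
def pairs_from_dot_bracket(structure):
    """Set of (i, j) base pairs, 0-based, from a dot-bracket string."""
    stack, pairs = [], set()
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            pairs.add((stack.pop(), pos))
    return pairs

def base_pair_f1(predicted, reference):
    """Harmonic mean of precision and recall over predicted base pairs."""
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A prediction that recovers two of three reference pairs and adds one wrong pair has precision and recall of 2/3 each, and therefore an F1-score of 2/3.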