Monday, July 24, between 18:00 CEST and 19:00 CEST | Tuesday, July 25, between 18:00 CEST and 19:00 CEST |
---|---|
Session A Poster Set-up and Dismantle. Session A posters set up: Monday, July 24, between 08:00 CEST and 08:45 CEST. Session A posters dismantle: Monday, July 24, at 19:00 CEST. | Session B Poster Set-up and Dismantle. Session B posters set up: Tuesday, July 25, between 08:00 CEST and 08:45 CEST. Session B posters dismantle: Tuesday, July 25, at 19:00 CEST. |
Wednesday, July 26, between 18:00 CEST and 19:00 CEST | |
---|---|
Session C Poster Set-up and Dismantle. Session C posters set up: Wednesday, July 26, between 08:00 CEST and 08:45 CEST. Session C posters dismantle: Wednesday, July 26, at 19:00 CEST. | |
Virtual |
|
---|
Presentation Overview: Show
Pre-mRNA splicing requires the excision of multiple introns within the same nascent transcript. Combinatorially, the order of intron excision could proceed down thousands of different paths, each of which would expose different landscapes of cis-elements that contribute to alternative splicing (AS). How intron splicing proceeds across human pre-mRNAs is not well understood due to technical limitations in the quantitative analysis of long RNA molecules. Here, we investigated post-transcriptional splicing order in human cells using direct RNA nanopore sequencing. We found that multi-intron splicing order is not stochastic, but largely predetermined, with most genes using only a few predominant splicing orders to reach a fully spliced transcript. Strikingly, splicing orders were conserved across cell types and during motor neuron differentiation, indicating that for the studied introns, splicing order is fixed. Moreover, splicing orders did not change to accommodate AS; rather, introns flanking alternatively spliced exons during differentiation were largely excised last, after their neighboring introns. Interestingly, sequencing of human lymphoblast cell lines from different individuals revealed several examples of allele-specific splicing order, suggesting that genetic sequence contributes to splicing order determination. Together, our results demonstrate that multi-intron splicing order is predetermined in human cells and is partially regulated by RNA sequence.
Presentation Overview: Show
Single-cell RNA-seq has allowed unprecedented insight into gene expression across different cell populations in normal tissue and disease states. However, almost all studies rely on annotated gene sets to capture gene expression levels, and sequencing reads that do not align to known genes are discarded. Here, we discover thousands of long noncoding RNAs (lncRNAs) expressed in human mammary epithelial cells and analyze their expression in individual cells of the normal breast. The human mammary epithelium is a highly dynamic tissue composed of three main cell populations (basal, luminal progenitor and luminal mature cells) that can give rise to different subtypes of breast cancer. We show that lncRNA expression alone can discriminate between luminal and basal cell types and define subpopulations of both compartments. Clustering cells based on lncRNA expression identified additional basal subpopulations compared to clustering based on annotated gene expression, suggesting that lncRNAs can provide an additional layer of information to better distinguish breast cell subpopulations. In contrast, breast-specific lncRNAs poorly distinguish brain cell populations, highlighting the need to annotate tissue-specific lncRNAs prior to expression analyses. Overall, our results suggest that lncRNAs are an unexplored resource for new biomarker and therapeutic target discovery in the normal breast and breast cancer subtypes.
Presentation Overview: Show
Cervical cancer is one of the most common gynecological malignancies worldwide. The main cause of cervical cancer is persistent infection with high-risk human papillomavirus (HPV). However, not all HPV-infected women develop cervical cancer, suggesting that other factors are involved in the progression of this disease. Non-coding elements, such as long non-coding RNAs (lncRNAs), have been implicated in multiple biological processes and diseases, including cervical cancer. In this study, we used a systems biology approach to identify and characterize the key lncRNAs involved in cervical cancer progression. We integrated multiple types of data, such as gene expression, DNA methylation, and clinical information, to construct a comprehensive lncRNA-mRNA co-expression network and a lncRNA-miRNA-mRNA competing endogenous RNA (ceRNA) network. We then applied network analysis methods to prioritize the most relevant lncRNAs and their target genes and miRNAs. We also performed functional enrichment and survival analyses to validate the biological and clinical significance of the identified lncRNAs. Our results revealed several novel lncRNAs that may play essential roles in cervical cancer progression by regulating key pathways and interacting with oncogenes or tumor suppressors. These lncRNAs may serve as potential biomarkers or therapeutic targets and provide new insights into the molecular mechanisms of cervical cancer.
Presentation Overview: Show
Background/motivation: Noncoding gene annotation often lags behind that of protein-coding genes, particularly for small nucleolar RNAs (snoRNAs), a type of noncoding RNA regulating ribosome biogenesis. The latest human genome annotation contains only 50% of all snoRNAs reported in the specialized database snoDB. SnoRNA annotations are also often inadequate: we recently showed that more than two-thirds of all annotated snoRNAs are not expressed in humans and are characterized by degenerate motifs and unstable predicted structures. While these features are incompatible with snoRNA expression (and challenge the very definition of a snoRNA), these non-expressed genes are still annotated as snoRNAs.
Methods/Results: To provide accurate snoRNA annotations across eukaryotes, we designed a neural network-based snoRNA predictor that takes as input the sequence, the structural stability and the presence of characteristic motifs. A total of 1055 expressed snoRNAs were retrieved from RNA-Seq datasets and the literature, covering 15 organisms (animals, plants, fungi and protists). Negative examples were gathered from shuffled snoRNA sequences, random genomic regions and other types of mid-size noncoding RNAs. After optimization, the predictor will be compared to existing tools and applied across all eukaryote genomes.
Conclusion: Our predictor will improve annotations and facilitate comparative studies of snoRNAs across all eukaryote kingdoms.
Presentation Overview: Show
An important aspect of protein diversity is the alternative inclusion of exons in the mRNA transcripts of a large population of human genes. The goal of this study was to observe the association of the transcriptome and the epigenome in the context of alternative splicing. To this end, we analyzed RNA-Seq data and histone modification ChIP-Seq data for human embryonic stem cells and four in vitro differentiated cell lineages from the ENCODE project. We focused on the exon-intron boundaries of the precursor mRNA transcripts. We identified ‘epispliced’ genes showing a significant correlation between differential exon usage and differential histone modification. Genes with no histone modifications enriched at the exon level were labeled as ‘non-epispliced’ genes. We trained a random forest model that classifies exon flanks as belonging to epispliced or non-epispliced genes with 80-90% accuracy, using as features the binding affinities of splicing factors to these regions as predicted by the tool RBPmap. The inspected splicing factors turned out to be predicted to bind either to the exon flanks of epispliced genes or to those of non-epispliced genes. The classifiers showed good transferability across closely related biosamples.
Presentation Overview: Show
Time-series single-cell RNA-seq is widely used to study tissue development processes or how cells respond to stimuli. However, current trajectory inference methods do not consider the issue of low RNA detection capability, as less than 10% of total RNA molecules can be detected in an individual cell. Instead, these methods construct a path based on the similarity of a few abundant gene expression profiles. Additionally, they do not take advantage of the data continuity, but rather pool together all cells from different time points. In this study, we developed a new method that applies the TO-GCN (Time-Ordered Gene Coexpression Network) method to pseudo-bulk expression profiles computed for each cell cluster. This method overcomes the issue of low RNA detection capability and helps us identify key cell clusters. Furthermore, it can uncover regulatory relationships and reveal the dynamics of emerging pathways across the experimental time points.
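The pseudo-bulk step mentioned above can be illustrated with a minimal sketch. It assumes that pseudo-bulk means summing raw counts over all cells of a cluster and scaling to counts per million; the abstract does not specify the aggregation or normalization, and the function name `pseudo_bulk` and the toy data are hypothetical.

```python
from collections import defaultdict


def pseudo_bulk(counts, cluster_of):
    """Sum per-cell gene counts within each cluster, then scale to counts per million.

    counts:     {cell_id: {gene: raw_count}}  (hypothetical input layout)
    cluster_of: {cell_id: cluster_label}
    """
    totals = defaultdict(lambda: defaultdict(float))
    for cell, gene_counts in counts.items():
        cluster = cluster_of[cell]
        for gene, n in gene_counts.items():
            totals[cluster][gene] += n

    profiles = {}
    for cluster, gene_totals in totals.items():
        depth = sum(gene_totals.values())
        if depth == 0:
            profiles[cluster] = {}
        else:
            profiles[cluster] = {g: 1e6 * n / depth for g, n in gene_totals.items()}
    return profiles


# Toy example: three sparse cells in two clusters (made-up values).
counts = {
    "cell1": {"GeneA": 1, "GeneB": 0},
    "cell2": {"GeneA": 0, "GeneB": 3},
    "cell3": {"GeneA": 2},
}
cluster_of = {"cell1": "c0", "cell2": "c0", "cell3": "c1"}
profiles = pseudo_bulk(counts, cluster_of)
```

Pooling cells this way trades single-cell resolution for denser per-cluster profiles, which is why it can help when few molecules are detected per cell.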
Presentation Overview: Show
The nonsense-mediated mRNA decay (NMD) pathway is a critical mRNA surveillance mechanism responsible for the degradation of transcripts containing premature termination codons (PTCs), reducing the production of potentially harmful truncated proteins. It can also regulate other natural (non-PTC) endogenous genes through specific NMD-eliciting features. There is some evidence suggesting that NMD efficiency (NMDeff) can vary across human individuals and tissues, which might impact the phenotype severity of several diseases, including cancer. Here, we present a systematic quantification of NMDeff variability across 33 tumor types and 54 normal tissues. We use matched whole-exome sequencing (WES) and RNA-seq data to estimate per-sample NMDeff values using two independent statistical methods: i) endogenous natural NMD target transcript levels and ii) allele-specific expression of PTCs. We show that NMDeff varies significantly across tumors and normal tissues. Through somatic genetic associations we find that copy-number variation plays a role in dysregulating NMD pathway activity in some cancers. For instance, a common 1q amplification, where the SMG5 and SMG7 factors are located, is associated with a reduction in NMDeff. In conclusion, we have implemented two statistical methods to quantify NMDeff variability across individuals and tissues and to detect some of its genetic underpinnings.
Presentation Overview: Show
Background:
RNA•DNA:DNA triple helix (triplex) formation enables RNA transcripts to modulate local chromatin environment. Molecular detection of triplex formation is complex, making computational prediction of triplex formation important. Previous predictive methods relied upon Hoogsteen base pairing. We explored whether machine learning in conjunction with triplex-sequencing data generated in vitro could improve prediction of triplex formation.
Methods:
Triplex-enriched DNA and RNA motifs were identified from unpaired triplex-sequencing data, and input to an Expectation-Maximisation algorithm which learned probabilistic matrices linking sets of DNA and RNA motifs. Matrix error was calculated per iteration, and minimised. Final matrices and motif sets were output upon minimisation. Output matrices were implemented as score matrices in the local alignment program TriplexAligner, which uses Karlin-Altschul statistics to predict triplex formation between user-provided DNA and RNA.
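As background on the statistics named above, the following sketch shows how Karlin-Altschul theory turns a raw local alignment score into a bit score and an expected number of chance hits. The formulas are the standard ones; the function names are hypothetical, the parameters `lam` and `k` depend on the score matrix and must be estimated for it, and none of this is taken from the TriplexAligner code.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, lam, k, m, n):
    """Expected number of chance local alignments scoring at least raw_score.

    E = K * m * n * exp(-lambda * S), where m and n are the lengths of the
    two sequences (here, for illustration, an RNA and a DNA region).
    """
    return k * m * n * math.exp(-lam * raw_score)


# Illustrative placeholder values only; they are not parameters of any real score matrix.
example_e = e_value(raw_score=50, lam=0.3, k=0.1, m=1000, n=2000)
example_bits = bit_score(raw_score=50, lam=0.3, k=0.1)
```

The two quantities are linked by E = m * n * 2^(-S'), so a hit can be ranked either by its bit score or by its E-value for a given search space.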
Results:
TriplexAligner significantly outperformed previously published methods in the accurate recall of genome-wide RNA-DNA interactions identified by RADICL-sequencing or RedC, as well as specific interactions of the lncRNA SARRAH. Predicted triplex DNA and RNA sequences were evaluated biophysically, and appeared to form valid triplexes.
Outlook:
DNA-RNA pairing rules learned from triplex-sequencing data accurately predict RNA-DNA interactions. Applications of TriplexAligner could elucidate mechanisms of RNA action and potential importance of triplex formation.
Presentation Overview: Show
Peripheral blood mononuclear cells (PBMCs) are blood cells that form a critical part of the immune system, fighting off infection and defending our bodies from harmful pathogens. In biomedical research, PBMCs are commonly used to study the global immune response to disease outbreak and progression and to pathogen infections, for vaccine development, and for a multitude of other clinical applications. Over the past few years, the revolution in single-cell RNA sequencing (scRNA-seq) has enabled an unbiased quantification of gene expression in thousands of individual cells, which provides a more efficient tool to decipher the immune system in human diseases. In this work, we generate scRNA-seq data from human PBMCs at high sequencing depth (>100,000 reads/cell) for more than 30,000 cells, in resting, stimulated, fresh and frozen conditions. The data generated can be used for benchmarking batch correction and data integration methods, and to study the effect of freezing-thawing cycles on the quality of immune cell populations and their transcriptomic profiles.
Presentation Overview: Show
Metabolic RNA labeling is a powerful method to investigate the temporal dynamics of gene expression. The introduction of nucleotide conversion RNA-seq, such as SLAM-seq, has greatly facilitated the experimental effort but has also brought new challenges for data analysis. Another layer of complexity is added when elaborate experimental designs such as time courses with multiple genotypes and treatments are required to answer complex research questions. Yet, appropriate computational tools for analyzing this kind of data are lacking. To address this need, we developed grandR, a comprehensive toolkit for the analysis of nucleotide conversion RNA-seq data.
With our software we also introduce new quality control measures to exclude effects of 4sU on transcription and describe the need for recalibration of effective labeling times that would otherwise bias results. grandR enables researchers to perform differential gene expression analysis and estimate synthesis and degradation rates for both progressive labeling and snapshot experiments. Additionally, our software provides a web-based interface for exploratory data analysis.
grandR represents a significant advance in the analysis of nucleotide conversion RNA-seq data, enabling researchers to gain deeper insights into the temporal dynamics of gene expression and accelerating progress in many areas of biomedical research.
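To illustrate the kind of rate estimate mentioned above, here is a minimal sketch of the textbook first-order relation between the new-to-total RNA ratio (NTR) and the degradation rate. It assumes steady-state expression and a known effective labeling time, so that NTR(t) = 1 - exp(-d * t). The function names are hypothetical and this is not the grandR API (grandR is an R package).

```python
import math


def degradation_rate(ntr, labeling_time_h):
    """Degradation rate d (per hour) from NTR(t) = 1 - exp(-d * t), assuming steady state."""
    if not 0.0 <= ntr < 1.0:
        raise ValueError("ntr must be in [0, 1)")
    if labeling_time_h <= 0:
        raise ValueError("labeling_time_h must be positive")
    return -math.log(1.0 - ntr) / labeling_time_h


def half_life(rate):
    """RNA half-life (hours) for a first-order degradation rate."""
    return math.log(2) / rate


def synthesis_rate(rate, steady_state_level):
    """At steady state, synthesis balances degradation: s = d * level."""
    return rate * steady_state_level


# Example: half of a gene's RNA is labeled (new) after 2 hours of labeling.
d = degradation_rate(ntr=0.5, labeling_time_h=2.0)
t_half = half_life(d)
```

Because the labeling time enters the estimate directly, a miscalibrated effective labeling time shifts every rate, which is consistent with the recalibration need described in the abstract.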
Presentation Overview: Show
C/D-box small nucleolar RNAs (SNORDs) classically direct the post-transcriptional methylation of nucleosides in ribosomal RNAs, small nuclear RNAs, and transfer RNAs. However, the human genome produces numerous orphan SNORDs that lack the ability to interact with classical RNA targets and whose function is poorly understood. The eutherian-specific, orphan SNORD116 genes, which are organized in a large tandem repeat at 15q11-13, are strongly suspected to play a major role in the rare disease Prader-Willi syndrome (PWS), but their molecular function remains unknown. We combined phylogenetic and computational interaction analyses to reveal that a subset of SNORD116 copies use an antisense element, which is typically involved in classical target recognition, to interact with messenger RNA (mRNA) targets. Target status was confirmed by transient knockdown and compensation experiments in human and mouse cells. To go further, we are working to characterize the molecular mechanism of SNORD116 action and to identify the extent of their target repertoire. This combination of computational and experimental approaches expands the description of the molecular bases of PWS and opens new avenues for therapy. More generally, this approach could be considered for the functional characterization of other noncoding RNAs, in particular when expressed from multiple gene copies.
Presentation Overview: Show
RNA-seq and its enrichment-based variants, such as differential RNA-seq, have enabled the base-exact identification of transcription start sites (TSSs) and have improved gene expression analysis. However, some TSSs cannot be associated with known annotated genes and are therefore called orphan TSSs. Characterizing transcripts starting at these positions is challenging for existing computational annotation pipelines. TSS-Captur, a novel pipeline, uses different computational approaches to characterize transcripts starting from experimentally confirmed orphan TSSs, with a specific focus on non-coding RNA gene characterization. TSS-Captur uses two methods to classify extracted transcripts as coding or non-coding genes and predicts the transcription termination site of each putative transcript. For each predicted ncRNA gene, the secondary structure is computed. Furthermore, putative promoter regions are analyzed for the existence of known transcription regulation motifs. The results are presented in an interactive interface for easy exploration. TSS-Captur was tested on Streptomyces coelicolor data and successfully characterized unlabeled ncRNAs overlooked by common genome annotation pipelines. TSS-Captur also characterized more unannotated transcripts, and in greater detail, than another similar pipeline. In summary, starting from experimental TSS data, TSS-Captur characterizes unclassified signals and complements prokaryotic annotation tools, contributing to the understanding of bacterial transcriptomes.
Presentation Overview: Show
Alternative splicing (AS) may be related to genetic diseases, both in terms of its causal role and its potential as a treatment. In autoimmune diseases such as multiple sclerosis, AS exhibits aberrant behavior and plays an important role in symptom characterization. Despite significant progress, the regulatory mechanism of AS remains an open area of research involving several related biological phenomena.
Our aim is to identify single-nucleotide polymorphisms (SNPs) that influence the regulation of AS events. To this end, we studied three patient groups: sporadic multiple sclerosis, relapsing multiple sclerosis and a control group.
To do this, we developed a bioinformatic pipeline that identifies both annotated and de novo AS events using our EventPointer software. This pipeline combines information on the differential events, the differential usage of alleles in cis-acting SNPs, SpliceAI (a machine learning method that predicts mutations that affect splicing) and sQTL databases.
As a result, 299 SNPs were found in possible regulatory regions of 275 AS events that are differentially expressed between patient groups, generating a set of potential sQTLs that may influence the progression or symptoms of multiple sclerosis. Interestingly, immune system response was overrepresented in an enrichment analysis of the genes that include them.
Presentation Overview: Show
mRNA production speed is determined by the time it takes to transcribe and process pre-mRNA molecules. Methods for measuring RNA maturation involve sequencing RNA intermediates at different time points. However, current techniques have limited temporal resolution, which makes it difficult to measure very fast biogenesis rates. Additionally, these methods do not allow measurement of the variability in elongation rates within the same gene. To address these issues, we developed "kinetic barcoding", which involves stepwise labeling of nascent RNA with multiple nucleosides to measure multiple time points within a single sequencing library. By sequentially adding 5-ethynyl-uridine (5eU), 6-thio-guanosine (6sG), and 4-thio-uridine (4sU) at different time points, we can measure nascent RNA intermediates at multiple time scales in a single experiment. We isolate nascent RNA by biotinylating and pulling down 5eU-labeled RNA, followed by alkylation of the 6sG and 4sU thiol groups to generate nucleotide-specific substitutions where these nucleotides were incorporated. This allows us to distinguish molecules transcribed during the first, second, and final labeling windows by their substitution patterns. We applied this kinetic barcoding approach to measure transcription elongation rates in human cells and showed that it provides increased temporal resolution for measuring the variation in elongation rates between genes.
Presentation Overview: Show
Brain metastasis (BrM)-associated astrocytes orchestrate a modulatory effect in the innate and acquired immune systems. However, little is known about how general and region-specific functions are aligned at the single-cell level, or how sub-states are spatially distributed in the brain. In this study, we isolated adult mouse astrocytes by ACSA-2-mediated magnetic-activated cell sorting (MACS) to investigate transcriptome changes in three different brain metastatic mouse models. Single-cell RNA-sequencing revealed 7 gene expression clusters of BrM-associated astrocytes commonly present in breast, lung and melanoma primary origins, revealing a previously unappreciated complexity within this glial cell type. Unsupervised uniform manifold approximation and projection analysis and the expression of canonical markers upon clustering allowed us to further characterize them. We uncover an interferon (IFN)-responsive astrocyte (IRA) subpopulation, which was characterized by IFN-γ-induced JAK/STAT1 axis dysregulation, as well as high expression of the complement component C4b, previously associated with response to injury. Integration of single-cell and spatial transcriptomics for early and late disease timepoints by SPOTlight deconvolution showed that some sub-states are widespread while others are regionally restricted. Specifically, IRA was exclusively present at the late timepoint and restricted to the peritumoral area. This study provides unprecedented insights into the glial compartment of the brain metastasis microenvironment.
Presentation Overview: Show
The ubiquity of RNA-seq has led to many methods that use RNA-seq data to analyze variations in RNA splicing. However, available methods are not well suited for handling heterogeneous and large datasets. Such datasets scale to thousands of samples across dozens of experimental conditions, exhibit increased variability compared to biological replicates, and involve thousands of unannotated splice variants resulting in increased transcriptome complexity. In a recent publication [1] we describe a suite of algorithms and tools implemented in the MAJIQ v2 package to address challenges in detection, quantification, and visualization of splicing variations from such datasets. Using both large-scale synthetic data and GTEx v8 as benchmark datasets, we assess the advantages of MAJIQ v2 compared to existing methods. We then apply the MAJIQ v2 package to analyze differential splicing across 2,335 samples from 13 brain subregions, demonstrating its ability to offer insights into brain subregion-specific splicing regulation.
Presentation Overview: Show
High-throughput short-read RNA sequencing has given researchers unprecedented detection and quantification capabilities of splicing variations across biological conditions and disease states. However, short-read technology is limited in its ability to identify which isoforms are responsible for the observed sequence fragments and how splicing variations across a gene are related. In contrast, more recent long-read sequencing technology offers improved detection of underlying full or partial isoforms but is limited by high error rates and low throughput, hindering its ability to accurately detect and quantify all splicing variations in a given condition.
To better understand the underlying isoforms and splicing changes in a given biological condition, it is important to be able to combine the results of both short- and long-read sequencing, together with the annotation of known isoforms. To address this need, we developed MAJIQ-L, a tool to visualize and quantify splicing variations from multiple data sources. MAJIQ-L combines transcriptome annotation, the output of long-read-based isoform detection tools, and MAJIQ-based (Vaquero-Garcia et al., 2016, 2023) short-read RNA-Seq analysis of local splicing variations (LSVs). We analyze which splice junction is supported by which type of evidence (known isoforms, short reads, long reads), followed by the analysis of matched short- and long-read human cell line datasets.
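The per-junction evidence comparison described above can be sketched with plain set operations. This is only an illustration under simple assumptions: a junction is represented as a hypothetical `(chromosome, strand, donor, acceptor)` tuple, the three input sets are assumed to be already extracted from the annotation, the short-read analysis and the long-read tool output, and none of the names correspond to the MAJIQ-L interface.

```python
def evidence_by_junction(annotated, short_read, long_read):
    """Map each splice junction to the set of evidence sources that support it."""
    sources = {
        "annotation": annotated,
        "short_reads": short_read,
        "long_reads": long_read,
    }
    table = {}
    for name, junctions in sources.items():
        for junction in junctions:
            table.setdefault(junction, set()).add(name)
    return table


def junctions_with_exactly(table, wanted):
    """Return the junctions whose support is exactly the given set of sources."""
    wanted = set(wanted)
    return {junction for junction, support in table.items() if support == wanted}


# Toy junctions (made-up coordinates).
j1 = ("chr1", "+", 1000, 2000)
j2 = ("chr1", "+", 1000, 3000)
j3 = ("chr1", "+", 2500, 3000)

table = evidence_by_junction(
    annotated={j1},
    short_read={j1, j2},
    long_read={j2, j3},
)
novel_in_both = junctions_with_exactly(table, {"short_reads", "long_reads"})
```

Junctions supported by both read types but absent from the annotation (such as `j2` here) are the natural candidates for novel, cross-validated splicing events.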
Presentation Overview: Show
Glioblastoma is an aggressive type of brain tumour with a poor prognosis. Alternative splicing of pre-mRNA has been widely reported to be associated with the progression of malignant tumours including glioblastoma. Therefore, we reasoned that dysregulated alternative splicing and the regulatory RNA-binding proteins (RBPs) may provide potential biomarkers and therapeutic targets in glioblastoma. Using FASTQ files provided by the Chinese Glioma Genome Atlas (CGGA), we quantified alternative splicing events using EventPointer (an R package developed in our group). In the CGGA dataset, we confirmed previous observations of dysregulated alternative splicing, where autophagosome membrane docking was the most enriched function among the genes associated with the dysregulated events. In addition, we performed an enrichment analysis of the potential RBPs regulating these events, which identified brain-abundant and brain-specific RBPs required for neuronal development and function. Specifically, CELF4, HNRNPR, NOVA2, RBFOX1, QKI, NOVA1 and RBFOX2 appeared to be the RBPs regulating dysregulated events, leading to poor overall survival and a direct effect on proliferation, migration and invasion in glioblastoma. Among dysregulated events, retained introns stand out, and these, together with the aforementioned RBPs, may lead to the discovery of potential therapeutic targets.
Presentation Overview: Show
The development of cancer cells is frequently influenced by surrounding microenvironments. Thus, understanding how to disrupt the micro-environmental stressors promoting tumour growth through RNA expression is crucial. In glioblastoma multiforme (GBM), a hostile tumour microenvironment (TME) generated by hypoxia as well as tumour-promoting immune cells has been shown to increase the stem-like state of tumour cells by activating pro-migratory and pro-invasive factors. In addition, hypoxia poses a risk of poor therapy response or resistance. The resulting immunosuppression of the TME potentiates tumour maintenance, progression, recurrence, and resistance to therapy, causing a major dilemma for immunotherapy and new drug development. Here, we explore alternative natural medicines to combat tumour-supportive microenvironments and enhance therapeutic outcomes in glioblastoma. We use transcriptomics to define novel hypoxia-induced RNA transcripts in GBM cell models and use a deep learning approach to identify medicinal plant miRNAs that could be used to regulate the hypoxia-induced RNA response. Our research has revealed numerous miRNAs in various edible plant species that exhibit potential regulatory effects on genes of interest from our tissue data. We are currently in the process of selecting targets for experimental validation on hypoxic stem cell lines.
Presentation Overview: Show
Organoids are complex 3D structures derived from stem cells that have architectures and functionalities similar to in vivo organs. Organ-specific studies were able to confirm the structure and composition of mature organoids by comparing cell populations found in organoids with those found in the target organ. In this context, we aim to study organoid maturation by exploring descriptive and statistical methods applied to single-cell RNA-seq data of organoids at different stages of maturation.
We used public single-cell RNA-seq datasets of kidney organoids at different stages (time points) of maturation (Subramanian et al., Nature Communications, 2019). We integrated all the datasets to highlight cell specificities for each stage of maturation. We also inferred trajectories to observe correlations between cell differentiation and the date of maturation. Finally, we used statistical methods to explore the variability of cell proportions between the different stages of maturation.
The perspective of the project is to establish a computational method that can predict the date of maturation of organoids.
Presentation Overview: Show
Alternative splicing (AS) is a co- and post-transcriptional process that generates multiple mRNA and protein isoforms from a single gene. The regulation of AS involves various mechanisms, with RNA-binding proteins (RBPs) being one of the key regulators. Besides their role in splicing regulation, RBPs are also implicated in cancer prognosis and represent promising therapeutic targets for cancer treatment. CLIP-seq experiments target specific RBPs and reveal the sites in the nascent transcriptome where the RBP attaches. A fruitful approach to establishing a causal connection is to analyze changes in splicing patterns around these sites. However, selecting the appropriate RBP(s) to study in a CLIP-seq experiment can be challenging and often relies on prior assumptions.
In this study, we have developed an algorithm that integrates CLIP-seq experiments with differential splicing analysis to identify RBPs likely related to splicing changes. Our refined algorithm improves prediction accuracy. We tested the algorithm on four different experiments and found significant relationships between RBPs and splicing alterations in cancer types. Afterwards, we used our model to analyse 19 cancer types in TCGA and TARGET databases. For each, we obtained a ranking identifying the RBP responsible for splicing changes, showing the algorithm's potential in identifying therapeutic targets.
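One simple way to relate CLIP-seq binding sites to splicing changes is an over-representation test, sketched below. This is a generic hypergeometric enrichment test under stated assumptions, not the algorithm developed in the study: each splicing event is assumed to be flagged as "has a binding site of the RBP nearby" and "differentially spliced", and the function name and all numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_events, events_with_site, changed_events, changed_with_site):
    """Upper-tail hypergeometric p-value.

    Probability of seeing at least `changed_with_site` events carrying a binding
    site among `changed_events` differentially spliced events, when
    `events_with_site` of the `total_events` events carry a site.
    """
    upper = min(events_with_site, changed_events)
    denominator = comb(total_events, changed_events)
    tail = sum(
        comb(events_with_site, k) * comb(total_events - events_with_site, changed_events - k)
        for k in range(changed_with_site, upper + 1)
    )
    return tail / denominator


# Hypothetical counts per RBP: (events with a site, changed events with a site).
total_events, changed_events = 1000, 50
rbp_counts = {"RBP_X": (100, 20), "RBP_Y": (100, 5)}

p_values = {
    rbp: hypergeom_enrichment_p(total_events, with_site, changed_events, overlap)
    for rbp, (with_site, overlap) in rbp_counts.items()
}
ranking = sorted(p_values, key=p_values.get)  # most enriched RBP first
```

Sorting RBPs by such a p-value gives a ranking of candidate regulators; a real analysis would also need multiple-testing correction across RBPs.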
Presentation Overview: Show
Alternative splicing (AS) is a major source of proteome and transcriptome diversity. In addition to RNA-binding proteins and their cis-motifs within the pre-mRNA, it is widely accepted that RNA secondary structure and trans-RNA interactions are important for AS regulation. However, little is known about the molecular mechanisms involved. Given the importance of AS in the context of disease and normal tissue development, there is a need to elucidate the role of RNA-RNA interactions in the regulation of AS.
To investigate this, we use data from RNA cross-linking and proximity ligation experiments. We developed a computational pipeline for RNA proximity ligation data that facilitates the detection of interactions with regulatory RNAs. By combining the derived RNA-RNA interactions with complementary RNA-seq time course data, we identified several candidate RNA-pre-mRNA interactions which may be involved in splicing regulation. Our results demonstrate the importance of RNA-RNA interactions for AS and highlight the potential utility of interaction probing experiments for investigating the role of RNA-RNA interactions in other biological processes.
Presentation Overview: Show
Clear cell renal cell carcinoma (ccRCC) is a lethal and aggressive cancer type that arises in kidney tubules. We have utilized single-cell long-read sequencing to identify potentially pathogenic and druggable cancer-specific transcripts in 4 patient-derived ccRCC organoid samples.
Multiplexed Arrays Sequencing (MAS-Seq) is a new single-cell full-length isoform sequencing technology that can accurately detect full-length transcripts across single cells at high throughput. We have applied the MAS-Seq technology to patient-derived organoid cells of ccRCC samples and assessed their alternative splicing landscape. To gain a deeper view of the transcriptome of each cell, we barcoded as few cells as possible.
Preliminary results revealed, on average, the expression of 5756 genes in 958 cells per sample. The mean read number per cell was 325,688. On average, 15,000 transcripts were found in at least three cells per sample expressing a minimum of 100 genes. More than 22% of the sequenced transcripts were novel isoforms compared to GENCODE v39. Interestingly, the transcripts with the highest cell-to-cell variation belonged to genes that have roles in ccRCC development.
Our results and analyses will likely reveal new diagnostic and predictive biomarker candidates and give ccRCC patients new hope for novel precision medicine treatments against their detrimental disease.
Presentation Overview: Show
Neoantigens are a type of tumour-specific antigen that can result from aberrations in DNA or RNA. New non-catalytic functions for the tumour suppressor gene PTEN are emerging, including in alternative splicing (AS), where interactions of PTEN with the splicing machinery and the spliceosome have been described. Furthermore, PTEN-loss associated alterations to the immune tumour microenvironment (TME) have been observed. Preliminary findings in our group point to a role for PTEN-loss associated aberrant AS, and we hypothesise that neoantigens arising from these events could alter tumour cell recognition by immune cells and/or that AS-derived proteins from these events could affect immune cell function in the TME. Using machine learning neoantigen prediction algorithms, we have extracted putative PTEN-loss associated MHC-I AS-derived neoantigens from data on DU145 cells that have been depleted of PTEN. Matched mass spectrometry data have been processed to identify translated neoantigens in the samples’ own reference proteome. Finally, TCGA prostate cancer data are currently being processed through the same pipelines to identify strong computationally predicted neoantigens in PTEN-loss conditions, which will be correlated with the altered immune TME. This research is novel and will elucidate PTEN-loss associated alterations of the transcriptome and proteome through neoantigens presented by MHC molecules.
Presentation Overview: Show
MicroRNAs (miRNAs) are the most versatile small RNA regulators in the cell. Cellular stress causes altered canonical miRNA processing by their biogenesis factors which in turn causes the generation of their isomiRic (modified miRNA) forms.
In our study, we perform paired small and ribo-depleted RNA-seq from induced AMI stress experiments, within 4 major cell types of the heart at 6 time points. We observe significant dynamics in the expression patterns of mature miRNA 3p and 5p arms. In this context, we investigate associations between miRNA target gene regulation and the dynamics of mature miRNA arm ratios (3p / 5p) at every time point. Further, we assess the isomiR expression ratios, their biogenesis factors (Drosha, Dicer, etc.), and TDMD (Target RNA-Directed MicroRNA Degradation) factors (Dis3l2, Tents, etc.) as plausible associative causes for canonical arm ratio changes. Finally, we predict co-regulated functional pairs as an effect of changing miRNA arm ratios and changing median expression of the target genes within functional gene sets. We implement this novel analysis framework in miRarmature, an R package that enables systematic investigation of the dynamics of these differentially processed miRNAs, their associated factors, and their predicted functional roles, with statistical inferences and visualizations from time series experiments.
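The arm ratio (3p / 5p) tracked above can be illustrated with a small calculation. The following is only a minimal sketch, not the miRarmature implementation: the function name, the pseudocount, and the example counts are all assumptions made for illustration.

```python
import math

def arm_log_ratio(count_3p, count_5p, pseudocount=1.0):
    """Return log2((3p + pseudocount) / (5p + pseudocount)).

    The pseudocount (an assumption of this sketch) avoids division by
    zero when one arm is not detected.
    """
    return math.log2((count_3p + pseudocount) / (count_5p + pseudocount))

# Hypothetical normalized read counts (3p, 5p) for one miRNA over time.
time_course = {"0h": (120, 480), "6h": (300, 300), "24h": (900, 225)}

# Positive values mean the 3p arm dominates, negative values the 5p arm.
ratios = {t: arm_log_ratio(c3, c5) for t, (c3, c5) in time_course.items()}
```

A change of sign in `ratios` across time points would correspond to an arm switch of the kind the abstract describes.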
Presentation Overview: Show
Nonconventional yeasts naturally produce molecules useful in agriculture, medicine, and sustainable energy; however, it is difficult to engineer their metabolism because fundamental biological knowledge is lacking. In this study, we used comparative transcriptomics to investigate the metabolism of the oleaginous yeasts Yarrowia lipolytica, Debaryomyces hansenii, and Debaryomyces subglobosus to understand phenotypes observed under stress conditions. Since no -omics methods existed for comparisons across so many yeasts and conditions, we evaluated two different approaches: “network first”, where a metabolic map is used as a scaffold to map genes, and “cluster first”, where a clustering algorithm is used to group genes with similar expression patterns across conditions. Our results showed that D. hansenii extensively and exclusively rewired its transcriptome in response to NaCl stress, while Y. lipolytica responded to multiple conditions including NaCl stress, (NH4)2SO4 starvation, and FeCl3 starvation. For D. hansenii, flavinogenesis decreased in salt stress, but the transcriptome did not follow the same pattern. Thus, both the network first and cluster first approaches provided novel insights into nonconventional yeast biology. This has the potential to simplify transcriptomics analysis for poorly annotated genomes and to generate hypotheses for the creation of cell factories based on Debaryomyces yeasts.
Presentation Overview: Show
e-RNA is a collection of web-servers for the prediction and visualisation of RNA secondary structures and their functional features, including in particular RNA-RNA interactions. In this updated version, we have added novel tools for RNA secondary structure prediction and have significantly updated the visualisation functionality. The new method COBOLD can identify transient RNA structure features and their potential functional effects on a known RNA structure during co-transcriptional structure formation. The new tool SHAPESORTER can predict evolutionarily conserved RNA secondary structure features while simultaneously taking experimental SHAPE probing evidence into account. The web-server R-CHIE, which visualises RNA secondary structure information in terms of arc diagrams, can now also be used to visualise and intuitively compare RNA-RNA, RNA-DNA and DNA-DNA interactions alongside multiple sequence alignments and quantitative information. The prediction generated by any method in e-RNA can be readily visualised on the web-server. For completed tasks, users can download their results and readily visualise them later on with R-CHIE without having to re-run the predictions. e-RNA can be found at http://www.e-rna.org.
Presentation Overview: Show
There is an increased interest in determining RNA structures, as it is now possible to probe them in a high-throughput manner, e.g. using SHAPE protocols. There already exist computational methods that integrate experimental SHAPE-probing evidence into computational RNA secondary structure prediction. The state of the art in this field is currently provided by computational methods that employ the minimum-free energy strategy for predicting RNA secondary structures with SHAPE-probing evidence. These methods, however, rely on the assumption that transcripts in vivo fold into the thermodynamically most stable configuration and ignore evolutionary evidence for conserved RNA structure features. We here present a new computational method, SHAPESORTER, that predicts RNA structure features without employing the thermodynamic strategy. Instead, SHAPESORTER employs a fully probabilistic framework to identify RNA structure features that are supported by evolutionary and SHAPE-probing evidence. Our method can capture RNA structure heterogeneity, pseudo-knotted RNA structures as well as transient and mutually exclusive RNA structure features. Moreover, it estimates P-values for the predicted RNA structure features, which allows for easy filtering and ranking. We investigate the merits of our method in a comprehensive performance benchmark and conclude that SHAPESORTER performs significantly better at predicting base-pairs than the existing state-of-the-art methods.
Presentation Overview: Show
Analysis of high-resolution long-read RNA-seq data (e.g., Pacific Biosciences [PacBio] single-molecule real-time sequencing) is the most straightforward and reliable method to detect alternative splicing. The main advantage over conventional short-read NGS is that it eliminates the need for transcript reconstruction, as full-length RNA transcripts are read (up to 10 kb), and full-length isoform detection overcomes haplotyping issues associated with short-read data. Both of these features expedite analyses for allele-specific isoform determination compared to short-read sequencing studies (Deonovic et al, 2017).
The Liver Project (approved by the local ethics committee: 665-13) complements ongoing clinical stroke cohort studies, in order to better understand how 32 hemostatic genes that are predominantly expressed in the liver are regulated. For this project, liver tissue and blood samples were collected from patients undergoing liver surgery at the Dept. of Surgery at the Sahlgrenska University Hospital. Previously, we used short-read targeted (NGS) sequencing: DNA-seq for SNP calling, RNA-seq for gene-level mRNA quantification, and bisulfite sequencing for determination of DNA methylation at CpG sites. We will now use cDNA capture followed by PacBio sequencing to analyze full-length isoforms of the 32 genes from the same liver samples.
Presentation Overview: Show
The average human exon is approximately 140nt long. However, a sizeable number of exons are shorter than 22nt, highly conserved, and play an important biological role. These microexons lack splicing regulator motifs and rely on other regulatory mechanisms. They are therefore a useful lens through which to study splicing regulation. However, their short size makes them hard to accurately detect, align, and annotate. Advances in long-read RNA sequencing (lrRNAseq) technology allow the sequencing of full-length transcripts, placing those microexons in their appropriate transcriptional context, but at the cost of lower accuracy relative to short-read approaches.
We have developed Muex, a method to identify microexons from lrRNAseq data. To test our method’s accuracy and sensitivity, and to fine-tune alignment parameters to better detect microexons, we used a dataset combining short- and long-read sequences from a human cell line. Finally, we applied Muex to lrRNAseq data from post-mortem disordered human brain samples. From a panel of 389 genes, we detected 55 putative novel microexons not previously annotated. Our work enables the prediction and functional characterisation of novel microexons, improving annotation and facilitating downstream investigations into their splicing regulation across tissues, cells, and development in health and disease.
Presentation Overview: Show
Our study concerns the classification and prediction of type I/II A-minor motifs in RNA 3D structures. It investigates the importance of several kinds of structural context in the formation of this motif, such as the 3D substructure around a motif. Our purpose is to determine what kind of information in the structural context can be useful to characterize and predict the presence and the position of these motifs.
Firstly, we develop an automated method to classify and characterize A-minor motifs according to their 3D context similarities. Secondly, we model the topological context of A-minor motifs and of their classes by graphs, and use these graphs to study how well A-minor motifs can be predicted knowing only the topological context and sequence information.
We thus uncovered new subclasses of A-minor motifs according to their local 3D similarities. Most classes are composed of homologous occurrences, but some are composed of non-homologous occurrences, which could suggest evolutionary convergence. We also showed that, for some A-minor motifs, the topology combined with a sequence signal is sufficient to predict their presence and position. In most other cases, these signals are not sufficient on their own for predicting A-minor motifs; however, we show that they remain informative for this purpose.
Presentation Overview: Show
* Equal contribution: Gregor Rot and Arne Wehling
Splicing of RNA is a fundamental biological process. Dysregulation of splicing has been implicated in many human diseases and successfully exploited as a therapeutic target. Splicing analysis using short-read RNA-sequencing is a powerful technique to triage the mechanism of action and safety profiles of drug candidates. There are, however, currently no open-source software pipelines for such applications.
To address this need, we introduce splicekit, a Python package that provides a comprehensive set of splicing analysis tools. A prototypical pipeline built with splicekit starts with the identification of count data at the levels of junctions, exons and genes, and identifies regulated features. Downstream analysis includes sample clustering, event visualization, RNA-protein binding and sequence motif analysis to elucidate both cis and trans regulatory events involving regulated splice sites.
In summary, splicekit provides a user-friendly and powerful toolset for comprehensive splicing analysis from short-read RNA-seq data. We anticipate that it will be valuable for researchers in both basic and translational research of splicing modulation.
Presentation Overview: Show
Single-cell RNA sequencing (scRNA-seq) is a powerful technique for studying cellular heterogeneity and identifying biomarkers across various cell types. However, current scRNA-seq protocols have limitations in isoform-level analysis due to biases such as 3'-coverage bias and low RNA capture efficiency. To address these limitations, Smart-Seq technologies have been developed that capture full-length transcripts in single cells, albeit with limited throughput, which makes them particularly useful for single-cell alternative splicing analysis. Meanwhile, single-cell long-read sequencing (MAS-seq) covers the entire length of transcripts, overcoming coverage bias, but suffers from high sequencing error and low throughput.
Since it is currently unclear how these new technologies perform in isoform-level analysis, we systematically compared scRNA-seq full-length transcript methods, including Smart-Seq2/3/3xpress and MAS-seq. We analyzed 5 data sets from human PBMC and performed downsample analysis to account for cell number variations. Our study provides insights into the effectiveness of different technologies in isoform-level analysis. We found that Smart-Seq3 shows superior performance in mapping quality, coverage bias, transcript counting, and cell type composition analysis. In conclusion, Smart-Seq3 is currently a superior technology for isoform-level analysis in single-cell RNA sequencing, while long-read methods (MAS-seq) still require improvement in RNA capture efficiency.
Presentation Overview: Show
Single-cell RNA sequencing (scRNA) is widely used to study cancer. However, it can be challenging to separate healthy from malignant cells based on gene expression alone. Cancer cells are clearly distinguished by somatic mutations, but these mutations are difficult to capture in scRNA data because they have to lie within actively transcribed regions with sufficient coverage across cells.
In our work, we directly use single nucleotide variants (SNVs) identified in scRNA data to separate healthy from cancer cells. We first show that it is possible to reliably call SNVs in scRNA data by leveraging information across cells, allowing us to capture even low coverage SNVs. We also identify patterns distinguishing false positive from true genomic SNVs and use these for filtering. We developed a clustering approach that accounts for the confounding effect of gene expression and the high drop-out rate. We validate our results on two datasets with known clonal structure: an acute myeloid leukemia dataset, and a colorectal cancer dataset. In both cases, we recover the known clonal structure using only the new SNVs. The method also returns a set of SNVs enriched in the different clones which we show can give valuable biological insight into the sample.
Presentation Overview: Show
Small nucleolar RNAs (snoRNAs) are non-coding RNAs present in all eukaryotes, known for their involvement in ribosome biogenesis and gene expression regulation. Generally, snoRNAs are located in introns of longer genes or in intergenic regions. The majority of snoRNAs are poorly characterized, and there are currently only a few predictors for their identification, which unfortunately typically suffer from high rates of false positives and false negatives. We aim to carry out a much wider screen to test the completeness of human snoRNA annotation. To do so, we applied StringTie to TGIRT-seq data from diverse normal human tissues to identify expressed genes that are missing from current annotations, have an intronic or intergenic location, and are between 50 and 200 nucleotides in size. These data were integrated with immunoprecipitation sequencing studies of core snoRNA binding proteins (PAR-CLIP and eCLIP) as well as RNA-seq studies following depletion of these proteins to identify snoRNA candidates. This methodology allowed us to identify 119 potential snoRNAs including 65 box C/D and 55 box H/ACA snoRNAs, which we are currently validating and characterizing further. These results further demonstrate that the annotation of snoRNAs in humans is far from exhaustive, hence the interest in implementing more efficient and more reliable pipelines for their identification.
Presentation Overview: Show
RNA-binding proteins (RBPs) play a vital role in post-transcriptional regulation, including RNA modification, stabilization, localization, and translation. Knowing their RNA targets and binding preferences is important for understanding the mechanisms of post-transcriptional regulation and their implications in human diseases.
We present RBPNet, a novel deep learning method, which, given an RNA sequence, predicts the CLIP-seq signal distribution at single-nucleotide resolution. RBPNet utilizes a dilated convolutional neural network (CNN) architecture and achieves high generalization performance on eCLIP, iCLIP and miCLIP assays, outperforming state-of-the-art classifiers. Crucially, RBPNet directly operates on the raw CLIP-seq signal, eliminating the need for preprocessing steps, such as peak calling. RBPNet performs bias correction by modeling the CLIP-seq signal as a mixture of the protein-specific and control signal, thus mitigating technical biases which may otherwise hinder downstream analysis. Through model interrogation via Integrated Gradients feature importance scores, RBPNet identifies predictive sub-sequences corresponding to known and novel binding motifs. Using in silico mutagenesis, RBPNet scores the impact of single-nucleotide variants on RBP-binding, thus aiding in prioritizing potentially disease-causing variants.
RBPNet is the first method to directly model the raw CLIP-seq signal at nucleotide-resolution, thus improving both computational inference of protein-RNA interaction and interpretation of predictions over the state-of-the-art.
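The in silico mutagenesis step mentioned above can be sketched generically: substitute every position with every alternative base and record the change in predicted score. This is a minimal illustration only. The scoring function below is a hypothetical motif counter standing in for a trained model; it is not RBPNet or its API, and the motif and sequence are made up for the example.

```python
def toy_binding_score(seq, motif="UGCAUG"):
    """Hypothetical stand-in for a model prediction: counts motif hits."""
    return float(seq.count(motif))

def in_silico_mutagenesis(seq, score_fn, alphabet="ACGU"):
    """Score every single-nucleotide substitution relative to the reference.

    Returns a dict mapping (position, ref_base, alt_base) to the change
    in score (mutant minus reference).
    """
    reference_score = score_fn(seq)
    effects = {}
    for i, ref_base in enumerate(seq):
        for alt_base in alphabet:
            if alt_base == ref_base:
                continue
            mutant = seq[:i] + alt_base + seq[i + 1:]
            effects[(i, ref_base, alt_base)] = score_fn(mutant) - reference_score
    return effects

# Example RNA sequence containing one copy of the toy motif at positions 2-7.
effects = in_silico_mutagenesis("AAUGCAUGAA", toy_binding_score)
```

Variants with the most negative effect are the ones predicted to disrupt binding; with a real model, `score_fn` would wrap the model's prediction for the sequence window.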
Presentation Overview: Show
RNA-binding proteins (RBPs) are central actors of RNA post-transcriptional regulation, for which profiling of binding sites on targets can be experimentally evaluated in vivo. As such profiles are limited to the transcripts expressed in the experimental cell type, numerous machine-learning based methods have been developed to infer missing binding information. However, heterogeneity of training and evaluation datasets across various sets of RBPs and CLIP-seq protocols prevents a direct comparison of their performance.
To address this, we systematically benchmarked 11 representative methods across hundreds of CLIP-seq datasets and RBPs. Using homogenized sample pre-processing and two negative-class sample generation strategies, we evaluated the predictive performance of these methods and assessed the contribution of neural network architectures and input modalities on model performance. We show that the negative sampling strategy significantly affects model performance, with notably a strong impact on secondary structure’s contribution. Additionally, we show various degrees of performance degradation in cross-cell-type prediction settings, with some models being more sensitive, reflecting RBP-specific context dependence.
We believe that this study will guide future method development in the field of computational modeling of protein-RNA interactions by serving as a reference for method design with regard to architecture, input modalities and the generation of negative controls.
Presentation Overview: Show
The use of engineered T cells or antibodies to specifically target tumors is a recent form of immunotherapy that has revolutionized the treatment of certain cancer types. Unfortunately, these new technologies cannot be applied to most cancers because there is a lack of cancer-specific molecules to target. Splicing variations abound in cancer and occur in a manner that distinguishes tumors from normal cells. In order to discover new immunotherapy targets, we developed a computational pipeline that relies on large cohorts of short-read RNA sequencing (RNA-Seq) data to detect molecular patterns that are enriched or exclusively found in cancer and not in normal tissues. The pipeline surveys and quantifies splicing variations and gene expression in individual cancer samples and compares them to an array of over 7,000 samples covering 86 normal tissues and blood cell types. Detection is then followed by targeted long-read sequencing for full isoform validation. We employed the pipeline to identify between 53 and 133 cancer-specific splice junctions in four types of pediatric cancers. Our method can be applied to any cancer dataset for which RNA-Seq data is available and, within a few days, generates a list of candidate immunotherapy targets that can then be validated in the lab.
Presentation Overview: Show
Alternative splicing (AS) is a crucial mechanism for gene expression regulation and a major generator of proteome diversity. Alternative splicing has mostly been studied in bulk RNA-seq data, which masks cellular heterogeneity. Single-cell RNA-seq can reveal cell type-specific and trajectory-specific AS patterns by leveraging large cell numbers and fine-grained cell type identification. However, commonly used droplet-based technologies for single-cell RNA-seq pose challenges due to high data sparsity and positional bias in sequencing reads across mRNA transcripts.
Here, we conduct a benchmark study on transcript quantification and differential splicing analysis using simulated 3’ scRNA-seq data. We evaluate several methods designed for AS analysis both for single-cell and bulk RNA-seq. Our results show that, in general, most methods have high precision but limited recall in transcript quantification due to undetected lowly expressed features in single-cells, which can be improved through pseudo-bulking.
We perform differential transcript usage and percent spliced-in analyses in a PBMC dataset from individuals of different genetic backgrounds. We detect population-specific splicing differences in the context of influenza virus infection across several cell types. Overall, our study highlights the strengths and limitations of alternative splicing characterization in 3’ scRNAseq data while providing insights into splicing differences between populations.
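To make the percent spliced-in (PSI) and pseudo-bulking ideas above concrete, here is a minimal sketch, assuming PSI is computed from inclusion and exclusion read counts for a single exon and that pseudo-bulking means summing counts over cells of the same type before taking the ratio. The function names and the example counts are hypothetical and do not come from the benchmarked tools.

```python
from collections import defaultdict

def psi(inclusion, exclusion):
    """Percent spliced-in as a fraction; None when the event has no reads."""
    total = inclusion + exclusion
    return inclusion / total if total > 0 else None

def pseudobulk_psi(cells):
    """Sum reads per cell type, then compute one PSI per cell type.

    cells: iterable of (cell_type, inclusion_reads, exclusion_reads).
    """
    sums = defaultdict(lambda: [0, 0])
    for cell_type, inclusion, exclusion in cells:
        sums[cell_type][0] += inclusion
        sums[cell_type][1] += exclusion
    return {ct: psi(inc, exc) for ct, (inc, exc) in sums.items()}

# Hypothetical sparse per-cell counts for one exon-skipping event.
cells = [("T", 1, 0), ("T", 0, 1), ("T", 2, 2), ("B", 0, 0), ("B", 3, 1)]
result = pseudobulk_psi(cells)
```

Per-cell PSI values here would be 1.0, 0.0, 0.5, undefined and 0.75, which shows why sparse single-cell counts are noisy and why aggregating first can recover lowly expressed features.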
Presentation Overview: Show
The spatial organization of molecules in a cell is essential for performing their functions. Spatial transcriptomics technologies have opened the door to characterization of cellular and subcellular organization. While current computational methods focus on discerning tissue architecture, cell-cell interactions and spatial expression patterns, these approaches are limited to investigating spatial variation at the multicellular scale. We present Bento, a Python toolkit that fully takes advantage of single-molecule information to enable spatial analysis at the subcellular scale. Bento ingests molecular coordinates and segmentation boundaries to perform three fundamental analyses: defining subcellular domains, annotating localization patterns, and quantifying gene-gene colocalization. To demonstrate the toolkit, we apply these methods to a variety of datasets including U2-OS cells (MERFISH), 3T3 cells (seqFISH+), and treated cardiomyocytes (Molecular Cartography). We quantify RNA localization changes in cardiomyocytes identifying mRNA depletion of critical cardiac disease-associated genes RBM20 and CACNB2 from the endoplasmic reticulum upon doxorubicin treatment. The Bento package is a member of the open-source Scverse ecosystem, enabling integration with other single-cell omics analysis tools.
Presentation Overview: Show
Five-prime single-cell RNA-seq (scRNA-seq) has been widely employed to profile cellular transcriptomes; however, its power for analysing transcription start sites (TSS) has not been fully utilised. Here, we present a computational method suite, CamoTSS, to precisely identify TSSs and quantify their expression by leveraging the cDNA on read 1, which enables effective detection of alternative TSS usage. With various experimental data sets, we have demonstrated that CamoTSS can accurately identify TSSs and that the detected alternative TSS usage shows strong specificity in different biological processes, including cell types across human organs, the development of the human thymus, and cancer conditions. As evidenced in nasopharyngeal cancer, alternative TSS usage can also reveal regulatory patterns including systematic TSS dysregulation.
Presentation Overview: Show
Congenital Heart Disease (CHD) is the most common type of birth defect, yet its genetic contributors are poorly understood. There is growing evidence for the role of long non-coding RNAs (lncRNAs) in causing disease. Single nucleotide variants (SNVs) can alter the function of lncRNAs; however, there are no heart-specific in silico tools that can predict which of these variants are pathogenic versus benign. Therefore, we aim to develop a machine learning classifier to predict the pathogenic impact of lncRNA variants that contribute to CHD etiology.
We used three components to inform our classifier. First, using developing human heart single-cell RNA-sequencing we identified expressed lncRNAs. Second, we used the Genome Aggregation Database to determine the frequency of SNVs. Third, given that structure plays a role in lncRNA function, we computed the impact of SNVs on lncRNA structure. From the expression profiles we revealed lncRNAs which are co-expressed with known CHD genes. We found context-specific lncRNA SNVs which appear in lower rates in the human population and are more likely to destabilize RNA structure. In conclusion, by combining heart expression profiles, population frequencies, RNA stability metrics, and sequence-specific context we are developing the first lncRNA variant pathogenic score for CHD.
Presentation Overview: Show
The classical Sankoff algorithm for the simultaneous folding and alignment of homologous RNA sequences is highly influential, but it suffers from two major limitations, in efficiency and in modeling power. First, it takes O(n^6) time for two sequences, where n is the average sequence length. Most implementations and variations reduce the runtime to O(n^3) by restricting the alignment search space, but this is still too slow for long sequences such as full-length viral genomes. Second, the Sankoff algorithm and all its existing implementations use a rather simplistic alignment model, which can result in poor alignment accuracy. To address these problems, we propose LinearSankoff, which seamlessly integrates the original Sankoff algorithm with a powerful Hidden Markov Model-based alignment model. This extension substantially improves alignment quality, which in turn benefits secondary structure prediction quality, as confirmed over a diverse set of RNA families. LinearSankoff also applies beam search heuristics and an A⋆-like algorithm so that its runtime scales linearly with sequence length. LinearSankoff is the first linear-time algorithm for global simultaneous folding and alignment, and the first such algorithm to scale to coronavirus genomes (n ≃ 30,000 nt). It takes only 10 minutes for a pair of SARS-CoV-2 and SARS-related genomes.
Presentation Overview: Show
(To Appear in Nature)
Messenger RNA (mRNA) vaccines are being used to contain COVID-19, but still suffer from the critical limitation of mRNA instability and degradation, which is a major obstacle in the storage, distribution, and efficacy of the vaccine products. Previous work showed that increasing secondary structure lengthens mRNA half-life, which, together with optimal codons, improves protein expression. Therefore, a principled mRNA design algorithm must optimize both structural stability and codon usage. However, due to synonymous codons, the mRNA design space is prohibitively large (e.g., ~10^632 candidates for the SARS-CoV-2 Spike protein). Here we provide a simple and unexpected solution using a classical concept in computational linguistics, where finding the optimal mRNA sequence is akin to identifying the most likely sentence among similar-sounding alternatives. Our algorithm, LinearDesign, takes only 11 minutes for the Spike protein and can jointly optimize stability and codon usage. On both COVID-19 and VZV vaccines, LinearDesign substantially improves mRNA half-life and protein expression, and dramatically increases antibody titer by up to 128× in vivo, compared to the codon-optimization benchmark. This surprising result reveals the great potential of principled mRNA design, and enables the exploration of previously unreachable but highly stable and efficient designs.
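The size of the design space quoted above follows from multiplying, over all residues, the number of synonymous codons available for each amino acid. The sketch below shows that counting argument with the standard genetic code; it is not part of LinearDesign, the function names are hypothetical, and the toy peptide is an assumption (the Spike sequence itself is not reproduced here).

```python
import math

# Number of codons per amino acid in the standard genetic code
# (61 sense codons in total; stop codons are not included).
SYNONYMOUS_CODONS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

def design_space_size(protein):
    """Exact number of mRNA coding sequences that encode the protein."""
    return math.prod(SYNONYMOUS_CODONS[aa] for aa in protein)

def log10_design_space(protein):
    """Same quantity on a log10 scale, convenient for long proteins."""
    return sum(math.log10(SYNONYMOUS_CODONS[aa]) for aa in protein)

# Toy peptide: three residues with six codons each give 6 * 6 * 6 candidates.
toy_size = design_space_size("LRS")
```

Because the count grows exponentially with protein length, exhaustive enumeration is infeasible for a protein of more than a thousand residues, which is the motivation the abstract gives for a dedicated algorithm.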
Presentation Overview: Show
attached long abstract
Presentation Overview: Show
At the Center for non-coding RNA in Technology and Health (RTH), we provide tools for CRISPR/Cas9 gRNA design and for the analysis of on-targets and off-targets for edits that were followed by transcriptome analysis. For gRNA design, our tools CRISPRon[1] and CRISPRoff[2] are combined into a user-friendly webserver: CRISPRon/off[3]. CRISPRon is a state-of-the-art deep learning model trained on human indel data, which outperforms other methods, including those based on loss-of-function data used in several popular webservers. CRISPRoff, which is among the best performing off-target models, is based on a nucleotide binding energy model and is not organism specific. For post-editing analysis involving RNAseq data, we provide CRISPRroots[4], which combines CRISPRoff predictions with variant calling and differential expression analysis to establish whether the observed changes between a CRISPR-edited cell and a wildtype cell are influenced by one or more off-targets. The tools are available at http://rth.dk/resources/crispr.
References:
[1] Xiang X†, Corsi GI†, Anthon C†, Qu K†, et al. Nat Commun. 2021.
[2] Alkan F, et al. Genome Biol. 2018.
[3] Anthon C, et al. Bioinformatics. 2022.
[4] Corsi GI, et al. Nucleic Acids Res. 2022.
Presentation Overview: Show
Isoform switching events have been implicated in causing tumor progression. The Tumor Profiler (TuPro) Study, a multi-omics study of metastatic Melanoma, metastatic Ovarian cancer, and Acute Myeloid Leukemia, provides short-RNA-seq data and matched DIA-MS proteomics data. Here, we aimed to identify cancer-specific isoforms (CSI) in all three cancer types by integrating multi-omics data from TuPro and TCGA and using GTEx samples as a normal reference. Isoform switches were calculated with our in-house R tool MDTSwitchAnalyzer. In total, we analyzed 1112 cancer samples and identified 1644 CSI across three cancer types. Notably, the most frequently observed CSI were transcripts of the ADD3, TRAPPC5, and HIPK1 genes in Ovarian Cancer, Melanoma, and AML, respectively. Over 82% of Ovarian cancer samples exhibited the longest isoform of ADD3, which is already associated with cancer cell proliferation. The CSI of TRAPPC5 was identified in 60% of Melanoma samples and causes the loss of all protein interactions with the tumorigenesis-associated TRAPP complex.
Furthermore, a CSI of the HIPK1 gene, linked to angiogenesis, was observed in over 58% of AML samples. Validation analyses utilizing DIA-MS proteomics are ongoing for all identified transcripts. Our study reveals biomarker candidates across three cancer types by using multi-level data.
Presentation Overview: Show
Public RNAseq databases are valuable resources for identifying specific transcriptional events. We therefore want to make them accessible in a way that better captures transcriptome complexity. Computational methods that index k-mers offer a promising way to interrogate large omics datasets. We developed TranSipedia, a new k-mer-based framework built from several modules: RNAseq indexing with Reindeer (Marchet et al, 2020), a novel method for querying all transcribed information; a module that generates k-mers as transcript signatures (Kmerator; Riquier et al, 2021); and a supporting website.
Reindeer indexes k-mers and records their counts across large dataset collections. It answers queries very quickly even when thousands of RNAseq samples are indexed. Moreover, because it is annotation-free, it retains all the information contained in the raw data. For applications that require gene expression levels, the k-mer counts must be sufficiently representative; to this end, we developed Kmerator to construct specific k-mers. Lastly, the Transipedia website is available to facilitate queries and sharing by biologists. It now includes several thousand datasets, mainly for cancer applications, for example the whole CCLE cohort (1019 RNAseq, ~10 TB), and can be used to check biomarker specificity by comparing normal and tumor datasets.
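As a rough illustration of the idea of indexing k-mers together with their counts across several datasets, the following minimal Python sketch builds a toy in-memory index and queries it. It is not Reindeer's data structure or API: the function names (`kmers`, `build_index`, `query`), the toy reads, and the choice of the mean k-mer count as the query score are assumptions made for this example only.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer of seq (no canonicalisation, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(datasets, k):
    """Map each k-mer to a {dataset_name: count} table over all reads."""
    index = defaultdict(lambda: defaultdict(int))
    for name, reads in datasets.items():
        for read in reads:
            for kmer in kmers(read, k):
                index[kmer][name] += 1
    return index


def query(index, sequence, k, names):
    """Return, per dataset, the mean count of the query's k-mers."""
    query_kmers = list(kmers(sequence, k))
    if not query_kmers:
        return {name: 0.0 for name in names}
    result = {}
    for name in names:
        # .get avoids creating new entries in the defaultdict while querying.
        total = sum(index.get(km, {}).get(name, 0) for km in query_kmers)
        result[name] = total / len(query_kmers)
    return result


# Toy data, invented for illustration only.
datasets = {
    "tumor": ["ACGTACGT", "CGTACGTA"],
    "normal": ["ACGTTTTT"],
}
index = build_index(datasets, k=4)
scores = query(index, "GTACG", k=4, names=["tumor", "normal"])
print(scores)
```

In this toy setting, a query sequence whose k-mers occur only in the tumor reads receives a non-zero score for the tumor dataset and zero for the normal one, which is the kind of normal-versus-tumor comparison the abstract describes.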
Presentation Overview: Show
Small RNAs (sRNAs) play important regulatory roles in bacteria, affecting their phenotype and tolerance to various chemicals. This could be exploited in genome engineering to improve fermentation processes for the production of bio-based chemicals. Although specialized techniques to study sRNAs exist, such as GRIL-Seq, RIP-Seq, or RIL-Seq, their implementation in non-model bacteria is cumbersome, and sRNA prediction is usually performed with computational tools that process standard RNA-Seq data. Here, we studied the influence of RNA-Seq deduplication using unique molecular identifiers (UMIs) on the prediction of sRNAs in two non-model bacteria, Caldimonas thermodepolymerans and Rhodospirillum rubrum. Even after quality and adapter trimming, the RNA-Seq data contained erroneous reads that caused an unexplained bias in per-sequence GC content. This bias was removed by deduplication using UMIs, which removed roughly 50% of all reads in a test dataset of 48 RNA-Seq samples. While the expression of protein-coding genes remained almost unaffected, the numbers of predicted sRNAs differed greatly. The explanation lies in the change in sequencing depth, which strongly influences sRNA prediction based on identifying peaks in the coverage of non-coding regions. Therefore, algorithms for predicting sRNAs from standard RNA-Seq data need to be revised to allow fully automatic, reliable predictions.
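The deduplication step described above can be sketched in a few lines of Python. This is a simplified, exact-match illustration and not the pipeline used in the study: it assumes that each aligned read is represented as a dictionary with hypothetical keys `chrom`, `pos`, `strand`, and `umi`, and it treats reads sharing all four values as PCR duplicates. It makes no attempt to tolerate sequencing errors in the UMI itself.

```python
def deduplicate(reads):
    """Keep the first read seen for each (chrom, pos, strand, umi) key."""
    seen = set()
    kept = []
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"], read["umi"])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def removed_fraction(reads, kept):
    """Fraction of reads discarded by deduplication."""
    if not reads:
        return 0.0
    return 1 - len(kept) / len(reads)


# Toy alignments, invented for illustration only.
reads = [
    {"chrom": "chr1", "pos": 100, "strand": "+", "umi": "AACG"},
    {"chrom": "chr1", "pos": 100, "strand": "+", "umi": "AACG"},  # duplicate
    {"chrom": "chr1", "pos": 100, "strand": "+", "umi": "TTGC"},  # same site, new UMI
    {"chrom": "chr1", "pos": 250, "strand": "-", "umi": "AACG"},  # same UMI, new site
]
kept = deduplicate(reads)
print(len(kept), removed_fraction(reads, kept))
```

Because every discarded read lowers the coverage at its position, a peak in a non-coding region that passed a fixed depth threshold before deduplication may fall below it afterwards, which is consistent with the sensitivity of coverage-based sRNA prediction reported in the abstract.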
Presentation Overview: Show
Genes can be encoded with seemingly equivalent synonymous codons, but codon choice can have dramatic effects on gene output. Naive rules for codon optimization replace slowly-translated codons with synonymous, optimally-translated codons, with the goal of increasing protein production. However, slowly-translated codons can have important functional roles, for instance by facilitating co-translational folding. While it is widely acknowledged that synonymous codons are not exact synonyms, the basic molecular rules governing codon choice are still poorly understood. We have explored when and where codon choice is most strongly constrained using computational and experimental methods. We conducted a genome-wide screen in yeast that targets positions of conserved slow translation. Using Cas9 retron editing, we created thousands of slow-to-fast synonymous codon substitutions, and grew them together in a pooled competition. Careful controls, including slow-to-slow substitutions and multiple guides targeting each site, allow confident identification of synonymous variants that significantly decrease or increase fitness. In parallel, we are training large language models on hundreds of thousands of eukaryotic genes in order to identify constraints on codon sequences. Combined with our large scale experimental data, our model will produce general rules for predicting the rare but important positions where ‘optimal’ codons are detrimental.
Presentation Overview: Show
Post-transcriptional chemical modifications regulate RNA biology. The most abundant modification in eukaryotic mRNA is N6-methyladenosine (m6A) which is deposited by the N6-methyltransferase (METTL3/METTL14) complex co-transcriptionally and affects splicing, stability, transport and translation. m6A dysregulation is implicated in cancer and other diseases. The precise mapping of m6A is therefore crucial, with recent profiling techniques enabling the detection of methylation sites at single-base resolution.
METTL3 inhibition impacts the expression of genes associated with innate immune pathways. To investigate the role of RNA methylation in this response, we collected and analysed publicly available high-resolution m6A data from human cell lines: HEK293, profiled using individual-nucleotide crosslinking and immunoprecipitation (miCLIP); HEK293T, profiled using chemical deamination of unmethylated adenosines (GLORI); and HEK293, HeLa and HepG2, profiled using m6A-selective allyl chemical labelling (m6A-SAC-seq).
Our results show good reproducibility between recent antibody-free methods, which detect far more m6A sites than previous techniques. In most of the datasets, immune-associated genes do not differ in overall methylation levels from other genes. However, GLORI data show significant hypomethylation (Kolmogorov–Smirnov test, p < 0.01) of innate immune genes in HEK293T cells. Specifically, only 25% of immune genes have more than 10 m6A sites, compared to 35% of non-immune genes in this dataset.
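For readers unfamiliar with the comparison, the sketch below shows how the two quantities mentioned above could be computed from per-gene m6A site counts: the fraction of genes exceeding 10 sites, and the two-sample Kolmogorov–Smirnov statistic (the largest gap between the two empirical distribution functions). The function names and the site counts are invented for illustration and are not taken from the GLORI data; the sketch returns only the statistic, not a p-value.

```python
from bisect import bisect_right


def fraction_above(site_counts, threshold=10):
    """Fraction of genes with more than `threshold` m6A sites."""
    return sum(1 for c in site_counts if c > threshold) / len(site_counts)


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max distance between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Invented per-gene site counts, for illustration only.
immune = [2, 5, 12, 8]
non_immune = [11, 15, 3, 20]

print(fraction_above(immune), fraction_above(non_immune))
print(ks_statistic(immune, non_immune))
```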
Presentation Overview: Show
Small non-coding RNAs (sncRNAs) are crucial in the regulation of transcript expression. While the annotation of human miRNAs is comprehensive, other unannotated sncRNAs and exogenously induced RNA molecules such as small viral RNAs (svRNAs) may be overlooked in small RNA/transcriptome analyses. Yet they can play a role in the regulation of gene expression, especially during infection progression. By performing annotation-free differential comparisons of small RNAs, even before the origin of the RNA is identified, we can bypass poor or missing annotation, which is especially useful for novel viral infections such as the recent SARS-CoV-2 outbreak. Our approach uses a count-model-independent methodology that works for all signals, including skewed miRNA expression and UMI-barcoded, deduplicated data. The identified sequences can be combined with further analyses, for example to assist and guide computational predictions, or can be compared to other known occurrences of the same or similar sncRNAs. By gathering evidence for novel sncRNAs from RNA-seq data, this methodology can fill gaps left by common analyses and further our understanding of causal interactions in complex regulatory networks, especially in the case of infectious disease progression.
Presentation Overview: Show
DIANA-microT-CDS is a state-of-the-art miRNA target prediction algorithm and one of the first algorithms to predict miRNA binding sites in both the 3’ Untranslated Region (3’-UTR) and the coding sequence (CDS) of transcripts, with increased performance. The current version of the microT webserver (DIANA-microT 2023, www.microrna.gr/microt_webserver) brings forward a significantly updated set of interactions. DIANA-microT-CDS has been executed using annotation information from Ensembl v102, miRBase 22.1 and, for the first time, MirGeneDB 2.1, yielding more than 83 million interactions in human, mouse, rat, chicken, fly and worm species. Additionally, this version delivers predicted interactions of miRNAs encoded by 20 viruses against host transcripts from human, mouse and chicken species. DIANA-microT integrates supplemental computational resources, including interactions from DIANA-TarBase and TargetScan, miRNA-disease links from plasmiR and HMDD, variant information from dbSNP and ClinVar, as well as miRNA/gene abundance values in numerous cellular/tissue contexts. The server interface has been redesigned, allowing users to apply smart filtering options, identify abundance patterns of interest, pinpoint known SNPs residing on binding sites and obtain miRNA-disease information. The contents of the DIANA-microT webserver are freely accessible and can also be downloaded locally without any login requirements.
Presentation Overview: Show
The DIANA-miRPath webserver enables the exploration of combined miRNA effects using predicted or experimentally supported miRNA interactions. Its latest version (DIANA-miRPath v4.0, http://www.microrna.gr/miRPathv4) introduces the capacity to tailor its target-based miRNA functional analysis engine towards specific biological/experimental contexts. Via a redesigned modular interface with rich interaction, annotation and parameterization options, users can perform enrichment analysis on Gene Ontology (GO) terms, KEGG and REACTOME pathways, gene sets from the Molecular Signatures Database (MSigDB) and PFAM. The included miRNA interaction sets are derived from state-of-the-art resources of experimentally supported interactions (DIANA-TarBase v8.0, miRTarBase and microCLIP cell-type-specific interactions) or of in silico miRNA-target interactions (DIANA-microT-2023 and TargetScan predictions). Bulk expression datasets from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression project (GTEx), as well as single-cell expression atlases, can be used to assess expression changes of targeted genes within terms across a wide range of states. A discrete module for miRNA-tailored CRISPR knock-out screen analyses makes it possible to investigate selected miRNAs within the conditions under study. Lastly, the option to upload custom interaction, term, expression, and screen sets further expands miRPath’s utility.
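One common way to perform a target-based enrichment analysis of this kind is a one-sided hypergeometric (over-representation) test. The sketch below illustrates that general technique only; the abstract does not state which statistic miRPath v4.0 applies, and the function name and all numbers are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_genes, n_in_term, n_targets, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric.

    n_genes:   size of the background gene universe
    n_in_term: genes annotated to the term or pathway
    n_targets: genes targeted by the selected miRNA(s)
    n_overlap: targeted genes that are also in the term
    """
    total = comb(n_genes, n_targets)
    upper = min(n_in_term, n_targets)
    tail = sum(
        comb(n_in_term, i) * comb(n_genes - n_in_term, n_targets - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 background genes, 5 in the pathway,
# 5 miRNA targets, all 5 of which fall in the pathway.
p = enrichment_pvalue(n_genes=10, n_in_term=5, n_targets=5, n_overlap=5)
print(p)
```

A small p-value indicates that the miRNA's targets overlap the term more than expected by chance. In practice the test is repeated for every term, so the resulting p-values would normally need a multiple-testing correction.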