Bias analysis for long-reads transcriptomics multi-sample datasets
Confirmed Presenter: Ana Victoria Conesa Cegarra, Spanish National Research Council, Spain
Room: 519
Format: In Person
Moderator(s): Ashley Laughney
Authors List:
- Alejandro Paniagua, Spanish National Research Council, Spain
- Jorge Mestre-Tomas, Spanish National Research Council, Spain
- Liudmyla Kondratova, University of Florida, United States
- Fabian Jetzinger, Biobam Bioinformatics, Spain
- Stanley Cormack, Imperial College London, United Kingdom
- Natalia Vega, University of Valencia, Spain
- Luis Ferrández-Peral, Spanish National Research Council, Spain
- Carolina Monzó, Spanish National Research Council, Spain
- Ana Victoria Conesa Cegarra, Spanish National Research Council, Spain
Presentation Overview:
Long-read sequencing technologies such as PacBio and Oxford Nanopore are reshaping transcriptomics. The enhanced precision and depth of sequencing from these methods are proving critical for differential isoform expression studies across various conditions. This shift towards long reads necessitates new best practices for experimental design, preprocessing, and normalization tailored to these data types.
We set out to provide analysis guidelines for multi-sample long-read transcriptomics experiments. Using a replicated dataset from mouse tissues and three long-read cDNA protocols, including the newest PacBio Kinnex, we explored biases that arise when constructing count tables.
We evaluated two main approaches: Call&Join (call transcript models for each sample and then combine the results) and Join&Call (merge reads from different samples, then call transcript models and re-quantify), finding that each strategy yields a different transcriptome composition depending on the analysis tool. Sequencing depth and replicate number significantly affect transcript identification: known transcripts stabilize quickly, whereas novel ones require more depth, and most transcripts are detected either in all samples or in just one.
We detected variable biases in quantification due to read length and GC content across technologies. For instance, PacBio data showed a parabolic length bias and increased expression levels with higher GC content, although this varied greatly by sample, which complicates differential analyses.
Our study highlights that experimental and preprocessing choices profoundly affect long-read transcriptome count tables. Length and GC content biases impact quantification and are influenced by sample and technology. The results underscore the importance of thoughtful experimental design and preprocessing to ensure accurate transcriptome dataset composition and comparable quantification.
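As an illustration of what a parabolic length bias could look like in practice, the sketch below fits a second-degree polynomial to log-scaled counts against log-scaled transcript length for one sample. This is not the authors' procedure; the function name `fit_length_bias`, the pseudocount, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def fit_length_bias(lengths, counts, pseudocount=1.0):
    """Fit log10(count + pseudocount) = a*x^2 + b*x + c, with x = log10(length).

    Hypothetical helper: a clearly non-zero `a` suggests a parabolic trend
    of quantification with transcript length in this sample.
    """
    x = np.log10(np.asarray(lengths, dtype=float))
    y = np.log10(np.asarray(counts, dtype=float) + pseudocount)
    # np.polyfit returns coefficients ordered from highest degree to lowest.
    a, b, c = np.polyfit(x, y, deg=2)
    return a, b, c

# Synthetic sample (assumed data): counts peak near 10**3.3 (about 2 kb).
lengths = np.array([500, 1000, 2000, 4000, 8000], dtype=float)
log_counts = 3.0 - (np.log10(lengths) - 3.3) ** 2
counts = 10 ** log_counts - 1.0  # undo the pseudocount added inside the fit

a, b, c = fit_length_bias(lengths, counts)
peak_log_length = -b / (2 * a)  # vertex of the fitted parabola
```

Fitting each sample separately and comparing the coefficients is one simple way to see whether the bias differs between samples, which is the situation the abstract flags as problematic for differential analyses.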
RISE: Relative Impact of Splicing and Expression in RNA-seq studies
Confirmed Presenter: Yu-Jen Lin, University of California, Berkeley, United States
Room: 519
Format: In Person
Moderator(s): Ashley Laughney
Authors List:
- Yu-Jen Lin, University of California, Berkeley, United States
- Amr Alazali, University of California, Berkeley, United States
- Zhiqiang Hu, University of California, Berkeley. Currently at: Illumina, Foster City, California, United States
- Steven Brenner, University of California, Berkeley, United States
Presentation Overview:
RNA-seq has been widely used to quantify expression and splicing changes in transcriptomes. Although biological consequences arise from changes in both expression and splicing, researchers usually rely on intuition to choose only one aspect to analyze, potentially overlooking significant impacts of the other. Even when researchers investigate both, expression and splicing are measured on different scales, so their impacts cannot be compared directly. To compare the relative impact of expression and splicing, we have developed RISE.
RISE quantifies the relative impact of expression and splicing changes caused by a treatment. To place the impacts of expression and splicing changes on the same scale for comparison, we developed the Normalized Variation (NV) measure. NV is defined as the proportion of the between-group variation to the total variation. Finally, we assess whether expression NV (eNV) or splicing NV (sNV) is significantly larger to understand the comparative influence of expression versus splicing alterations in the transcriptome.
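The NV definition can be made concrete with a minimal sketch. The abstract does not say how variation is measured, so the version below assumes a sum-of-squares decomposition (between-group sum of squares divided by total sum of squares); the function name `normalized_variation` and the example values are hypothetical and are not taken from RISE itself.

```python
import numpy as np

def normalized_variation(values, groups):
    """Between-group variation as a proportion of total variation.

    Assumption: variation is measured as a sum of squared deviations,
    so the result lies in [0, 1].
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    grand_mean = values.mean()
    ss_total = np.sum((values - grand_mean) ** 2)
    if ss_total == 0:
        return 0.0  # no variation at all, so nothing to attribute to groups
    ss_between = 0.0
    for g in np.unique(groups):
        members = values[groups == g]
        ss_between += members.size * (members.mean() - grand_mean) ** 2
    return float(ss_between / ss_total)

# Made-up expression values for one gene: control vs. treated replicates.
control = [10.0, 11.0, 9.0]
treated = [20.0, 21.0, 19.0]
labels = ["ctrl"] * 3 + ["trt"] * 3
nv = normalized_variation(control + treated, labels)
```

Because the result is a unitless proportion, the same function could in principle be applied to expression values and to a splicing measure for the same gene, which is what makes the two comparable under this assumption.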
To validate our method, we performed RISE analysis on RNA-seq data from knockdown or overexpression experiments of 11 transcription and splicing factors. RISE effectively categorizes transcription and splicing factors by their relative impacts on expression and splicing. As an example application, we applied RISE to 4 studies involving proteins with complex or previously unknown roles in regulating transcriptomes to understand their functions. In summary, RISE enables researchers to systematically compare the relative impact of expression and splicing.
From Noise to Signal: Quantifying Stochasticity in mRNA Splicing
Confirmed Presenter: Eraj Khokhar, RTI, UMass Chan Medical School, United States
Room: 519
Format: In Person
Moderator(s): Ashley Laughney
Authors List:
- Eraj Khokhar, RTI, UMass Chan Medical School, United States
- Kaitlyn Brokaw, RTI, UMass Chan Medical School, United States
- Nida Javeed, RTI, UMass Chan Medical School, United States
- Zachary Kartje, RTI, UMass Chan Medical School, United States
- Valeria Sanabria, RTI, UMass Chan Medical School, United States
- Jonathan Watts, RTI, UMass Chan Medical School, United States
- Athma Pai, RTI, UMass Chan Medical School, United States
Presentation Overview:
Splicing is likely a major contributor to noise in mRNA regulation, with errors in splicing leading to reduced transcriptional efficiency and wasted transcriptional output. Cryptic splicing involves the use of low-fidelity or infrequently bound splice sites, which often leads to non-productive transcripts that are likely targeted for degradation. Substantial evidence suggests that splicing noise is prevalent in homeostatic cell conditions, but the extent to which it occurs is likely underappreciated due to the challenges of identifying cryptic, low-fidelity splice site usage in mature mRNA data. Characterizing splicing noise has become increasingly important, since blocking or redirecting the use of noisy splice sites in favor of productive splice sites may provide a novel strategy for up-regulating gene expression in healthy or disease contexts with high levels of splicing noise (e.g., cancer). Here, we tackle these challenges by performing high-throughput sequencing on selectively enriched nuclear nascent RNA, which greatly increases the global detection of cryptic splice sites. We further developed a Python package to systematically identify and analyze cryptic, low-fidelity sites in high-throughput sequencing data from nascent, nuclear RNA and RNA from cycloheximide-treated cells. We use these experimental and computational methods to analyze cryptic splicing events in cancer cell lines and identify genomic features, sequence elements, and gene properties associated with the occurrence of cryptic splice sites across and between cell types. Our findings uncover a previously underappreciated role for stochasticity in the regulation of mRNA splicing, identify features predictive of splicing noise, and will aid in developing novel disease therapeutics to inhibit cryptic splicing.