Adenosine-to-inosine (A-to-I) RNA editing is a post-transcriptional modification catalyzed by ADAR enzymes that can alter codons, splicing patterns, and RNA secondary structures. This process is essential for neuronal development and immune function, with dysregulation implicated in neurological disorders, cancers, and autoimmune diseases. Despite its biological importance, accurate detection of RNA editing from RNA-seq data remains technically challenging, and robust inference of differential editing between experimental conditions is not straightforward.
To address these challenges, we have developed EdiSetFlow, a reproducible and scalable pipeline for transcriptome-wide A-to-I RNA editing analysis from bulk RNA-seq data. Implemented in Nextflow, EdiSetFlow takes raw FASTQ files as input, performs read trimming and quality filtering, aligns reads to the reference genome, and identifies editing sites with JACUSA. Common genetic variants are excluded based on the gnomAD population database. Identified sites are annotated for gene context and predicted functional consequences, with results summarized in a user-friendly HTML report. The pipeline is designed to scale efficiently to hundreds or thousands of samples, making it suitable for large datasets such as GTEx.
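The site-filtering logic described above can be illustrated with a minimal Python sketch. It is not EdiSetFlow's implementation: the `Site` record, the function names, and the representation of common variants as a set of `(chrom, pos, ref, alt)` tuples are all hypothetical. It relies only on the fact that inosine is read as guanosine during sequencing, so an editing event appears as an A>G mismatch for plus-strand transcripts and as T>C for minus-strand transcripts when alleles are reported on the forward genomic strand.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Site(NamedTuple):
    """Hypothetical candidate site; alleles are given on the forward genomic strand."""
    chrom: str
    pos: int
    strand: str  # strand of the transcript: "+" or "-"
    ref: str
    alt: str


VariantKey = Tuple[str, int, str, str]


def is_a_to_i_candidate(site: Site) -> bool:
    """Return True if the mismatch is compatible with A-to-I editing."""
    if site.strand == "+":
        return site.ref == "A" and site.alt == "G"
    if site.strand == "-":
        return site.ref == "T" and site.alt == "C"
    return False


def filter_sites(sites: Iterable[Site], common_variants: Set[VariantKey]) -> List[Site]:
    """Keep A-to-I compatible sites that do not match a known common variant."""
    return [
        s
        for s in sites
        if is_a_to_i_candidate(s)
        and (s.chrom, s.pos, s.ref, s.alt) not in common_variants
    ]


# Toy input, invented for illustration only.
candidates = [
    Site("chr1", 100, "+", "A", "G"),  # compatible with editing
    Site("chr1", 200, "+", "A", "G"),  # coincides with a common variant
    Site("chr2", 300, "-", "T", "C"),  # minus-strand editing signature
    Site("chr2", 400, "+", "C", "T"),  # not an A-to-I mismatch
]
known_common = {("chr1", 200, "A", "G")}
kept = filter_sites(candidates, known_common)
```

In this toy example only the sites at `chr1:100` and `chr2:300` are retained. A real workflow would load the variant set from a population database file instead of hard-coding it.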
An accompanying R package enables advanced analyses, including model fitting, hypothesis testing, false discovery rate control, and visualization, facilitating reliable statistical comparisons of editing between experimental groups. Applying EdiSetFlow to GTEx brain RNA-seq data, we uncovered distinct RNA editing signatures across brain regions, identifying both known and previously uncharacterized regional editing patterns. EdiSetFlow provides researchers with a robust, end-to-end solution to efficiently discover and interpret biologically meaningful RNA editing events in diverse transcriptomic datasets.
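As a sketch of what false discovery rate control involves when many editing sites are tested at once, the Python function below applies the Benjamini-Hochberg adjustment to a list of per-site p-values. The text does not state which procedure the R package uses, so the choice of Benjamini-Hochberg is an assumption, and the function name and example p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented per-site p-values, for illustration only.
site_pvalues = [0.01, 0.04, 0.03, 0.005]
site_qvalues = benjamini_hochberg(site_pvalues)
significant = [q <= 0.05 for q in site_qvalues]
```

Sites whose adjusted value falls at or below the chosen threshold (0.05 here) would be reported as differentially edited, with the expected proportion of false positives among them bounded by that threshold.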