Spatial transcriptomics (ST) technologies profile the transcriptome while preserving spatial context, offering insight into tissue architecture and transcriptional patterns. However, spot-based ST data lack single-cell resolution, which makes cellular heterogeneity harder to identify. Higher-resolution wet-lab technologies exist but typically cost more, so experimentalists must decide what spot size they actually need. That decision depends in part on the computational tasks to be performed on the ST data. Here, we focus on cell type annotation of spots, which covers both cell type deconvolution and label transfer.
This study compares six methods: three deconvolution methods (Tangram, CARD, and cell2location) and three label transfer methods (scVI, SingleR, and SVM). All six use both spot-based ST and single-cell RNA sequencing (scRNA-seq) data, and we apply them to deconvolve and label spots of varying sizes and average cell counts. We aim to provide practical guidelines for selecting a spot size and the most appropriate method for a given requirement.
We simulated spots of varying sizes from labeled single-cell ST data. The evaluation covered four real-world datasets (mouse brain, mouse embryo E16.5, mouse gastrulation E7.5, and mouse olfactory bulb) spanning several resolutions and spatial transcriptomics technologies. We scored the methods with traditional metrics (accuracy, cosine similarity, and Euclidean distance) and with adjusted versions of each. Traditional metrics do not account for prediction diversity, so they can favor a method that assigns every spot to a single cell type. The adjusted metrics therefore incorporate an evenness score that penalizes such homogeneous predictions.
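The effect of an evenness adjustment can be illustrated with a minimal sketch. The exact evenness measure and the way it is combined with each metric are not given here, so the code below makes two assumptions: evenness is taken to be Pielou's index (Shannon entropy of the predicted labels divided by the log of the number of cell types), and the adjusted score is the plain product of accuracy and evenness. The function names are hypothetical.

```python
import numpy as np

def pielou_evenness(labels, n_classes):
    """Normalized Shannon entropy of the predicted-label distribution, in [0, 1].

    Assumption: Pielou's index stands in for the study's evenness score.
    """
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]  # classes that were never predicted contribute zero entropy
    entropy = -(p * np.log(p)).sum()
    return max(0.0, float(entropy / np.log(n_classes)))

def accuracy(y_true, y_pred):
    """Fraction of spots whose predicted label matches the true label."""
    return float(np.mean(y_true == y_pred))

def adjusted_accuracy(y_true, y_pred, n_classes):
    # Assumption: the adjustment is a simple product (illustrative only).
    return accuracy(y_true, y_pred) * pielou_evenness(y_pred, n_classes)

# Toy example: 10 spots, 3 cell types, with type 0 dominating the tissue.
y_true = np.array([0] * 8 + [1, 2])
collapsed = np.zeros(10, dtype=int)        # every spot predicted as type 0
mixed = np.array([0] * 7 + [1, 1, 2])      # predictions spread over all types

for name, y_pred in [("collapsed", collapsed), ("mixed", mixed)]:
    print(name,
          "accuracy:", round(accuracy(y_true, y_pred), 3),
          "adjusted:", round(adjusted_accuracy(y_true, y_pred, 3), 3))
```

In this toy case the collapsed predictor still reaches a high raw accuracy because one cell type dominates, but its evenness is zero, so its adjusted score drops to zero while the mixed predictor keeps most of its score.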
We consider two primary tasks: 1) estimating the proportions of cell types in each spot, and 2) assigning a single label to each spot. For estimating cell type proportions, we recommend spots containing 25 or more cells to ensure accurate deconvolution; Tangram and CARD are the best-performing methods for this task. For label transfer, a spot size of 5-10 cells is optimal, and SingleR and Tangram are the top-performing methods. These recommendations balance the cost of high-resolution spatial transcriptomics against the need for precise cellular composition data.
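To make the two tasks concrete, the sketch below bins labeled single cells into square spots and derives both ground-truth targets: the cell type proportions per spot and the majority label per spot. This is an illustration under stated assumptions, not the simulation procedure used in the study: the square grid, the majority-vote label, the function name `simulate_spots`, and the random toy coordinates are all hypothetical.

```python
import numpy as np

def simulate_spots(xy, labels, n_classes, spot_size):
    """Aggregate labeled cells into square spots of side `spot_size`.

    Returns per-spot cell type proportions, the majority label, and the
    number of cells in each spot. Only occupied grid squares become spots.
    """
    # Grid square that each cell falls into
    grid = np.floor(xy / spot_size).astype(int)
    # Give every occupied square a consecutive spot index
    _, spot_idx = np.unique(grid, axis=0, return_inverse=True)
    spot_idx = spot_idx.ravel()
    n_spots = spot_idx.max() + 1

    # Count cells of each type in each spot
    counts = np.zeros((n_spots, n_classes))
    np.add.at(counts, (spot_idx, labels), 1)

    n_cells = counts.sum(axis=1)
    proportions = counts / n_cells[:, None]   # task 1: deconvolution target
    majority = counts.argmax(axis=1)          # task 2: label transfer target
    return proportions, majority, n_cells

# Toy data (assumption): 500 cells placed uniformly at random, 4 cell types.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(500, 2))
labels = rng.integers(0, 4, size=500)

for spot_size in (10.0, 25.0, 50.0):
    props, major, n_cells = simulate_spots(xy, labels, 4, spot_size)
    print(f"spot size {spot_size}: {len(n_cells)} spots, "
          f"mean cells per spot {n_cells.mean():.1f}")
```

Larger spots contain more cells, so their proportion vectors become the more informative target, while small spots with only a few cells are dominated by one type and suit a single label better.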
This study offers a comprehensive analysis of the strengths and limitations of existing methods for spot-based spatial transcriptomics, and it provides guidelines that help researchers choose the spot size and computational method best suited to their research needs.