|11:00 - 11:30||Single Cell and Spatial Data Analysis Keynote: Single-cell eQTL mapping identifies cell type specific genetic control of autoimmune disease||Joseph Powell, Director of the Garvan-Weizmann Centre for Cellular Genomics, Australia||Yes|
|11:30 - 12:00||Single Cell and Spatial Data Analysis Keynote: Multi-omic data integration to investigate tissue architecture||Sarah A. Teichmann, FMedSci FRS, Cellular Genetics Programme Head, Wellcome Sanger Institute; Director of Research, Cavendish Laboratory, University of Cambridge||Yes|
|12:00 - 12:05||Comparison of Resources and Methods to infer Cell-Cell Communication from Single-cell RNA Data||Daniel Dimitrov, Dénes Türei, Charlotte Boys, James S. Nagai, Ricardo O. Ramirez Flores, Hyojin Kim, Bence Szalai, Ivan G. Costa, Aurelien Dugourd, Alberto Valdeolivas and Julio Saez-Rodriguez||No|
|12:05 - 12:10||DUBStepR: correlation-based feature selection for clustering single-cell RNA sequencing data||Bobby Ranjan, Wenjie Sun, Jinyu Park, Kunal Mishra, Ronald Xie, Fatemeh Alipour, Vipul Singhal, Florian Schmidt, Ignasius Joanito, Nirmala Arul Rayan, Michelle Gek Liang Lim and Shyam Prabhakar||No|
|12:10 - 12:15||CellRank for directed single-cell fate mapping||Marius Lange, Volker Bergen, Michal Klein, Manu Setty, Bernhard Reuter, Mostafa Bakhti, Heiko Lickert, Meshal Ansari, Janine Schniering, Herbert B. Schiller, Dana Pe'er and Fabian J. Theis||No|
|12:15 - 12:20||MAAPER: model-based analysis of alternative polyadenylation using 3’ end-linked reads||Wei Vivian Vivian, Dinghai Zheng, Ruijia Wang and Bin Tian||No|
|12:40 - 12:53||Sparcle: assigning transcripts to cells in multiplexed images||Sandhya Prabhakaran, Tal Nawy and Dana Pe'er||No|
|12:53 - 13:06||Giotto suite, a flexible and high-performing framework for spatial multi-modal analysis||Natalie Del Rossi, Jiaji George Chen, Guo-Cheng Yuan and Ruben Dries||No|
|13:06 - 13:19||Spatial multi-omic map of human myocardial infarction||Ricardo Omar Ramirez Flores, Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Germany||No|
|13:20 - 13:33||TUMOROSCOPE: Inferring a map of tumor subclones from Spatial Transcriptomics and bulk DNA sequencing data||Shadi Darvish Shafighi, Agnieszka Geras, Alireza Sahaf Naeini, Barbara Jurzysta, Hosein Toosi, Igor Filipiuk, Łukasz Rączkowski, Łukasz Koperski, Jens Lagergren and Ewa Szczurek||No|
|13:34 - 13:47||scJoint: transfer learning for data integration of atlas-scale single-cell RNA-seq and ATAC-seq||Yingxin Lin, Tung-Yu Wu, Sheng Wan, Jean Yang, Wing Wong and Rachel Wang||No|
|13:47 - 14:00||Inferring regulomes from multi-modal single-cell measurements with Pando||Jonas Simon Fleck, Sophie M.J. Jansen, Zhisong He, J. Gray Camp and Barbara Treutlein||No|
|14:20 - 14:50||Single Cell and Spatial Data Analysis Keynote: Analysis of large-scale single cell transcriptome data for tumor-infiltrating immune cells||Zemin Zhang, Biomedical Pioneering Innovation Centre Peking University, Beijing, China||Yes|
|14:50 - 15:20||Single Cell and Spatial Data Analysis Keynote: New single-cell technologies to dissect reprogramming and development||Samantha A Morris, Assistant Professor of Developmental Biology and Genetics, Allen Distinguished Investigator, New York Stem Cell Foundation Robertson Investigator, Washington University School of Medicine||Yes|
The human immune system displays remarkable variation between individuals, leading to differences in susceptibility to autoimmune disease.
We present single-cell RNA sequencing data from 1.3 million peripheral blood mononuclear cells from 982 healthy human subjects.
For 14 cell types, we identified 26,597 independent cis-expression quantitative trait loci (eQTLs) and 62,305 trans-eQTLs,
with the majority showing cell type specific effects on gene expression.
We subsequently show how eQTLs have dynamic allelic effects in B cells transitioning from naïve to memory states,
and demonstrate how commonly segregating alleles lead to inter-individual variation in immune function.
Finally, utilizing a Mendelian randomization approach, we identify the causal route by which 305 risk loci contribute to autoimmune disease at the cellular level.
This work brings together genetic epidemiology with scRNA-seq to uncover drivers of inter-individual variation in the immune system.
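As a rough illustration of the Mendelian randomization step, the simplest single-variant estimator is the Wald ratio: the variant's effect on the outcome (here, disease risk) divided by its effect on the exposure (here, expression of a gene in a given cell type). The abstract does not say which estimator was used, so the sketch below is an assumption, and the function name and input numbers are hypothetical.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant Wald ratio estimate of a causal effect.

    beta_exposure: effect of the variant on the exposure (e.g. an eQTL effect).
    beta_outcome:  effect of the same variant on the outcome (e.g. disease risk).
    se_outcome:    standard error of beta_outcome.
    """
    if beta_exposure == 0:
        raise ValueError("the variant must be associated with the exposure")
    estimate = beta_outcome / beta_exposure
    # First-order approximation that ignores uncertainty in beta_exposure.
    standard_error = se_outcome / abs(beta_exposure)
    return estimate, standard_error


# Made-up numbers, for illustration only.
estimate, standard_error = wald_ratio(
    beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02
)
print(f"causal effect estimate: {estimate:.2f} (SE {standard_error:.2f})")
```

The ratio is only interpretable as a causal effect under the usual instrumental-variable assumptions, which this sketch does not check.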
Multi-modal data sets are growing rapidly in single cell genomics, as well as other fields in science and engineering.
We introduce MultiMAP, an approach for dimensionality reduction and integration of multiple datasets.
MultiMAP embeds multiple datasets into a shared space so as to preserve both the manifold structure of each dataset independently
and the manifold structure in shared feature spaces. MultiMAP is based on the rich mathematical foundation of UMAP,
generalizing it to the setting of more than one data manifold.
MultiMAP can be used for visualization of multiple datasets as well as an integration approach that enables subsequent joint analyses.
Compared to other integration methods for single-cell data, MultiMAP is not restricted to a linear transformation, is extremely fast,
and is able to leverage features that may not be present in all datasets. We apply MultiMAP to the integration of a variety of single-cell transcriptomics,
chromatin accessibility, methylation, and spatial data, and show that it outperforms current approaches in run time, label transfer, and label consistency.
On a newly generated single cell ATAC-seq and RNA-seq dataset of the human thymus, we use MultiMAP to integrate cells across pseudotime.
This enables the study of chromatin accessibility and TF binding over the course of T cell differentiation.
The growing availability of single-cell data has sparked an increased interest in the inference of cell-cell communication from this data. Many tools have been developed for this purpose. Each of them consists of a resource of prior knowledge about intercellular interactions and a method to predict potential cell-cell communication events. Yet the impact of the choice of resource and method on the resulting predictions is largely unknown. To shed light on this, we created a framework, available at https://github.com/saezlab/liana, to facilitate a comparative assessment of methods for inferring cell-cell communication from single-cell transcriptomics data and then compared 15 resources and 6 methods. We found few unique interactions and a varying degree of overlap among the resources, and observed uneven coverage in terms of pathways and biological categories. We analysed a colorectal cancer single-cell RNA-Seq dataset using all possible combinations of methods and resources. We found major differences among the highest ranked intercellular interactions inferred by each method even when using the same resources. The varying predictions lead to fundamentally different biological interpretations, highlighting the need to benchmark resources and methods.
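One simple way to quantify the overlap and uniqueness of interactions across resources is to treat each resource as a set of ligand-receptor pairs and compare the sets. The sketch below uses the Jaccard index on toy data. The resource names, the interactions, and the choice of Jaccard as the overlap measure are all assumptions for illustration; the code does not use the LIANA API.

```python
from itertools import combinations

# Hypothetical toy resources: each is a set of (ligand, receptor) pairs.
resources = {
    "resource_A": {("L1", "R1"), ("L2", "R2"), ("L3", "R3")},
    "resource_B": {("L1", "R1"), ("L2", "R2"), ("L4", "R4")},
    "resource_C": {("L5", "R5")},
}


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(res):
    """Jaccard index for every unordered pair of resources."""
    return {
        (x, y): jaccard(res[x], res[y])
        for x, y in combinations(sorted(res), 2)
    }


def unique_interactions(res):
    """Interactions that appear in exactly one resource."""
    unique = {}
    for name, pairs in res.items():
        others = set().union(*(p for n, p in res.items() if n != name))
        unique[name] = pairs - others
    return unique


for pair, score in pairwise_overlap(resources).items():
    print(pair, round(score, 2))
print(unique_interactions(resources))
```

The same set-based comparison can be applied to the top-ranked interactions returned by different methods to see how much their predictions agree.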
Feature selection (marker gene selection) is widely believed to improve clustering accuracy, and is thus a key component of single cell clustering pipelines. However, we found that the performance of existing feature selection methods was inconsistent across benchmark datasets, and occasionally even worse than without feature selection. Moreover, existing methods ignored information contained in gene-gene correlations. We therefore developed DUBStepR (Determining the Underlying Basis using Stepwise Regression), a feature selection algorithm that leverages gene-gene correlations with a novel measure of inhomogeneity in feature space, termed the Density Index (DI). Despite selecting a relatively small number of genes, DUBStepR substantially outperformed existing single-cell feature selection methods across diverse clustering benchmarks. DUBStepR also demonstrated a significant improvement in additional benchmarking analyses focused on detection of rare cell types. DUBStepR is scalable to over a million cells, and can be straightforwardly applied to other data types such as single-cell ATAC-seq. We propose DUBStepR as a general-purpose feature selection solution for accurately clustering single-cell data.
Computational trajectory inference enables the reconstruction of cell-state dynamics from single-cell RNA sequencing experiments. However, trajectory inference requires that the direction of a biological process is known, largely limiting its application to differentiating systems in normal development. Here, we present CellRank (https://cellrank.org) for single-cell fate mapping in diverse scenarios, including regeneration, reprogramming and disease, for which direction is unknown. Our approach combines the robustness of trajectory inference with directional information from RNA velocity, taking into account the gradual and stochastic nature of cellular fate decisions, as well as uncertainty in velocity vectors. On pancreas development data, CellRank automatically detects initial, intermediate and terminal populations, predicts fate potentials and visualizes continuous gene expression trends along individual lineages. Applied to lineage-traced reprogramming data, fate probabilities correctly recover reprogramming outcomes. CellRank also predicts a novel dedifferentiation trajectory during post-injury lung regeneration, including previously unknown intermediate cell states, which we confirm experimentally.
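A standard way to formalize fate probabilities on a cell-to-cell transition matrix is as absorption probabilities of a Markov chain: the probability that a random walk started in a given state ends in each terminal state. The abstract does not describe the computation, so the following is a minimal sketch under that assumption. The toy transition matrix and the function name are hypothetical, and the code does not use the CellRank API.

```python
def fate_probabilities(P, terminal, n_iter=1000, tol=1e-12):
    """Absorption probabilities of a Markov chain, by fixed-point iteration.

    P:        row-stochastic transition matrix as a list of lists.
    terminal: set of indices of absorbing (terminal) states.
    Returns {t: [probability of ending in t when starting from state i]}.
    """
    n = len(P)
    result = {}
    for t in terminal:
        # Boundary conditions: 1 at the target terminal state, 0 at the others.
        a = [1.0 if i == t else 0.0 for i in range(n)]
        for _ in range(n_iter):
            new = [
                a[i] if i in terminal
                else sum(P[i][j] * a[j] for j in range(n))
                for i in range(n)
            ]
            delta = max(abs(x - y) for x, y in zip(new, a))
            a = new
            if delta < tol:
                break
        result[t] = a
    return result


# Toy chain: states 0 and 1 are transient, states 2 and 3 are terminal fates.
P = [
    [0.5, 0.3, 0.1, 0.1],
    [0.0, 0.4, 0.5, 0.1],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
probs = fate_probabilities(P, terminal={2, 3})
print([round(p, 3) for p in probs[2]])  # probability of reaching fate 2
print([round(p, 3) for p in probs[3]])  # probability of reaching fate 3
```

For each starting state the probabilities over terminal states sum to one, so a state with intermediate values can be read as not yet committed to either fate.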
Most eukaryotic genes harbor multiple cleavage and polyadenylation sites (PASs), leading to expression of alternative polyadenylation (APA) isoforms. APA regulation has been implicated in a diverse array of physiological and pathological conditions. While RNA sequencing tools that generate reads containing the PAS, named onSite reads, have been instrumental in identifying PASs, they have not been widely used. By contrast, a growing number of methods generate reads that are close to the PAS, named nearSite reads, including the 3’ end counting strategy commonly used in single cell analysis. How these nearSite reads can be used for APA analysis, however, is poorly studied. Here, we present a computational method, named model-based analysis of alternative polyadenylation using 3’ end-linked reads (MAAPER), to examine APA using nearSite reads. MAAPER uses a probabilistic model to predict PASs for nearSite reads with high accuracy and sensitivity, and examines different types of APA events, including those in 3’UTRs and introns, with robust statistics. We show MAAPER’s accuracy with data from both bulk and single cell RNA samples and its applicability in unpaired or paired experimental designs. Our results also highlight the importance of using well-annotated PASs for nearSite read analysis.
Imaging-based spatial transcriptomics has the power to reveal patterns of single-cell gene expression by detecting mRNA transcripts as individually resolved spots in multiplexed images. However, molecular quantification has been severely limited by the computational challenges of segmenting poorly outlined, overlapping cells, and of overcoming technical noise; the majority of transcripts are routinely discarded because they fall outside the segmentation boundaries. This lost information leads to less accurate gene count matrices and weakens downstream analyses, such as cell type or gene program identification.
Here, we present Sparcle, a probabilistic model that reassigns transcripts to cells based on gene covariation patterns and incorporates spatial features such as distance to nucleus. We demonstrate its utility on data from multiplexed error-robust fluorescence in situ hybridization (MERFISH), single-molecule FISH (smFISH), and spatially-resolved transcript amplicon readout mapping (STARmap).
Sparcle improves transcript assignment, providing more realistic per-cell quantification of each gene, better delineation of cell boundaries, and improved cluster assignments. Critically, our approach does not require an accurate segmentation and is agnostic to technological platform.
Previously our team introduced Giotto, an R package for the analysis and visualization of single-cell spatial data. Giotto represents an easy-to-use toolbox agnostic of the spatial platform used and offers a range of innovative algorithms to characterize tissue composition, identify spatial expression patterns, and find cellular interactions. To further support our overarching philosophy of creating a multi-functional tool, we built Giotto Suite, which is backwards compatible and incorporates a number of extensions and enhancements. It provides an improved framework to represent current and future spatial datasets. First, we designed data structures that capture cell morphology features (e.g. cell boundary) and that enable incorporation of individual transcript information at the subcellular level for one or more modalities. Second, to handle such large datasets, we integrated an HDF5 representation and optimized our computations. Finally, to improve (inter)operability, we i) created object converters between Giotto and other popular tools (e.g. Seurat, SpatialExperiment, etc.), ii) changed to a modular package format to promote contributions from external developers, and iii) developed a reactive object for interactive selection and analysis using the R/Shiny platform. Altogether, Giotto Suite represents a powerful toolbox ready to tackle the next generation of challenges in spatial data analysis and visualization.
Cardiovascular diseases, including myocardial infarction (MI), are the leading cause of mortality worldwide. After MI, inflammatory and reparative responses trigger widespread myocardial remodeling that affects cardiac function. Understanding the heart-specific intra- and intercellular signaling mechanisms that coordinate this remodeling is key to developing novel therapeutics. Here we present a multi-omics analysis of single cells and spatial transcriptomics to map human myocardial tissue in homeostasis and at different stages after MI. Eight left-ventricle samples of four MI patients and one healthy donor were profiled with 10x Visium spatial transcriptomics, single nuclear RNA-seq and single nuclear ATAC-seq. We defined a catalog of cell types that comprise the adult heart and explored their functional states. We mapped cell lineages in space and estimated pathway and transcription factor activities to increase the resolution of the spatial datasets. We studied how tissue structure influenced the location of cell types and their gene expression using MISTy, a machine-learning framework that models spatial interactions. Finally, we identified shared cellular niches between patients and compared their compositional differences. Our results provide novel links between cell composition and function that could not be achieved from single-cell technologies alone and allow a more detailed description of MI.
Tumor cell populations are highly heterogeneous and form clones with different genotypes. Geographically distinct parts of the tumor have different genetic and phenotypic compositions. Elucidating tumor heterogeneity is hampered by the fact that there is no technology available that would directly identify the localization of individual cells coming from different tumor clones. Spatial transcriptomics provides mini-bulk RNA-sequencing measurements in multiple spots of the tumor tissue. Here, we propose a probabilistic approach to tumor heterogeneity, accounting for the spatial resolution. The model, called Tumoroscope, efficiently combines information contained in spatial transcriptomics and bulk DNA-sequencing measurements. We first reconstruct the phylogeny of the tumor from DNA-sequencing data, inferring the clones and their corresponding genotypes. Second, the model maps variants found in RNA reads in each spot to the variants existing in the genotype of the clones and finds the most likely clonal structure of each spot. In this way, Tumoroscope identifies the location and abundance of each clone in the tissue. The model is highly accurate on simulated data and was applied to yield spatial composition of tumor clones in prostate cancer. Tumoroscope is a step forward in constructing a tool for high-resolution mapping of tumor subclones in the tumor tissue.
Single-cell multi-omics data continues to grow at an unprecedented pace, and effectively integrating different modalities holds the promise for better characterization of cell identities. Although a number of methods have demonstrated promising results in integrating multiple modalities from the same tissue, the complexity and scale of data compositions typically present in cell atlases still pose a significant challenge for existing methods. Here we present scJoint, a transfer learning method to integrate atlas-scale, heterogeneous collections of scRNA-seq and scATAC-seq data. scJoint leverages information from annotated scRNA-seq data in a semi-supervised framework and uses a neural network to simultaneously train on labeled and unlabeled data, enabling label transfer and joint visualization in an integrative framework. Using multiple atlas datasets and a biologically varying multi-modal dataset, we demonstrate that scJoint is computationally efficient and consistently achieves significantly higher cell type label accuracy than existing methods while providing meaningful joint visualizations. This suggests scJoint is effective in overcoming the heterogeneity in different modalities towards a more comprehensive understanding of cellular phenotypes.
Cell fate transitions in multicellular systems are coordinated through complex circuits of transcription factors (TFs) converging at regulatory elements to enable precise control of gene expression. Modern single-cell genomic approaches allow the simultaneous profiling of gene expression and chromatin accessibility in individual cells, which opens up new opportunities for the inference of cell fate regulomes. Here, we present a method called Pando, which leverages these rich multi-modal measurements to infer gene regulatory networks using a flexible linear model-based framework. By modeling the relationship between TF-binding site pairs with the expression of target genes, Pando simultaneously infers gene modules and sets of regulatory regions for each transcription factor. We use Pando to infer the regulatory network underlying the formation of human cerebral organoids to reveal fate- and state-specific genetic programs during neurogenesis. We further validate and refine the GRN using genetic perturbations and show that it can yield concrete mechanistic models of the perturbation effects. Taken together, our results highlight that comprehensive inference of regulomes requires integration of transcriptomic and epigenomic information as well as perturbation experiments.
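The idea of relating TF-binding site pairs to target gene expression can be sketched as a linear model in which the predictor is an interaction term: the TF's expression multiplied by the accessibility of a candidate regulatory region. The example below fits a single such term by ordinary least squares. It is a deliberately reduced sketch with made-up data and hypothetical names, and it does not reproduce Pando's actual model or API.

```python
def fit_interaction_model(tf_expr, peak_access, target_expr):
    """Least-squares fit of: target ~ intercept + slope * (tf_expr * peak_access).

    Each argument is a list with one value per cell.
    Returns (intercept, slope).
    """
    # Interaction term: the TF can only act where the region is accessible.
    x = [t * p for t, p in zip(tf_expr, peak_access)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(target_expr) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("interaction term is constant across cells")
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, target_expr))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up per-cell values, for illustration only.
tf_expr = [1.0, 2.0, 3.0, 4.0]      # expression of one TF
peak_access = [1.0, 1.0, 0.0, 1.0]  # accessibility of one candidate region
target_expr = [3.0, 5.0, 1.0, 9.0]  # expression of the target gene

intercept, slope = fit_interaction_model(tf_expr, peak_access, target_expr)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

In a full analysis there would be one such term per candidate TF-region pair for each target gene, and the fitted coefficients would be tested for significance before being included in the network.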
We developed several computational tools for single cell data processing, integration and in-depth analysis,
including SciBet, ROGUE and CSOmap, and applied such tools to understand the tumor microenvironment.
Combining published and newly generated single-cell RNA sequencing data, we performed analysis of tumor-infiltrating
immune cell sub-populations and delineated their abundance across 21 human cancer types.
Such pan-cancer analysis revealed a congruence of major myeloid lineages, dendritic cell subsets and monocyte subsets,
while T cells and macrophage subsets exhibited unique transcriptomic patterns across tumor types.
Our analyses reveal features that might shape how different tumors respond to immunotherapies.
Direct lineage reprogramming involves the remarkable conversion of cellular identity.
Single-cell technologies aid in deconstructing the considerable heterogeneity in transcriptional states that typically arises during lineage conversion.
However, lineage relationships are lost during cell processing, limiting accurate trajectory reconstruction.
We previously developed ‘CellTagging’, a combinatorial cell indexing methodology, permitting the parallel capture of clonal history and cell identity,
where sequential rounds of cell labeling enable the construction of multi-level lineage trees.
CellTagging and longitudinal tracking of fibroblast to induced endoderm progenitor (iEP) reprogramming reveal two distinct trajectories:
one leading to successfully reprogrammed cells, and one leading to a dead-end state.
Here, I present two new methods to enable the molecular mechanisms underlying reprogramming outcome to be dissected.
The first is an experimental method, 'Calling Cards', enabling transcription factor binding to be recorded, in individual cells, in the earliest stages of reprogramming.
The second method is a new computational platform, called 'CellOracle', that uses single-cell transcriptome and chromatin accessibility data to reconstruct changes in GRN configurations across the reprogramming process.
Together, these tools provide new mechanistic insights into how transcription factors can drive changes in cell identity, and help reveal new factors to enhance the efficiency and fidelity of reprogramming.