RSG

Schedule subject to change
All times listed are in PDT
Tuesday, November 28th
9:00-9:15
Welcome and Opening
Format: Live from venue

Moderator(s): Jason Ernst

9:15-10:00
Invited Presentation: Keynote: Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation
Format: Live from venue

Moderator(s): Ferhat Ay

  • David Kelley


Presentation Overview:

I will describe a neural network model, Borzoi, trained to predict cell- and tissue-specific RNA-seq coverage from DNA sequence. Statistics derived from Borzoi’s predicted coverage isolate and accurately score variant effects across multiple layers of regulation, including transcription, splicing, and polyadenylation. The wide availability of RNA-seq data across species, conditions, and assays profiling specific aspects of regulation emphasizes the potential of this approach to decipher the mapping from DNA sequence to regulatory function.

10:00-10:15
Session: Epigenome
Super-silencers are crucial for development and carcinogenesis.
Format: Live from venue

Moderator(s): Ferhat Ay

  • Di Huang, Intramural Research Program, National Library of Medicine, National Institutes of Health, United States
  • Hanna Petrykowska, National Human Genome Research Institute, National Institutes of Health, United States
  • Dhaneshwar Kumar, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, United States
  • Behdad Afzali, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, United States
  • Laura Elnitski, National Human Genome Research Institute, National Institutes of Health, United States
  • Ivan Ovcharenko, Intramural Research Program, National Library of Medicine, National Institutes of Health, United States


Presentation Overview:

The strength of the repressive histone H3 lysine 27 trimethylation modification signal varies drastically at individual silencers. Focusing on cases of an unusually strong repressive signal in regions that we refer to as super-silencers, we demonstrate that the regions that become B-cell super-silencers are originally associated with gene upregulation during development, and their target genes are highly expressed in stem cells, especially during early developmental stages. About 13% of B-cell super-silencers transmute to super-enhancers in B-cell lymphoma and 22% of these conversions recur across more than half of patients. Notably, genes, like BCL6 and BACH2, which are associated with these super-enhancer conversions are downregulated more swiftly than others when subjected to JQ1, a super-enhancer-disrupting bromodomain and extra-terminal domain inhibitor utilized in cancer chemotherapy. Furthermore, super-silencers are characterized by an over-representation of B-cell-cancer-associated mutations, both somatic and germline, and B-cell-cancer translocation breakpoints. This surpasses the prevalence found in other regulatory elements, such as CTCF binding sites, underlining the crucial role of super-silencers in forming and stabilizing regulatory topologies in standard B cells. For example, over 80% of cases involving the B-cell-lymphoma translocation t(3;14)(q27;q32) fuse super-silencers in the BCL6 locus with enhancer-rich domains. The repressive mechanisms of super-silencers are partially governed by the CpG content in their sequences. While CpG-rich super-silencers often prevent promoters from interacting with enhancers, CpG-depleted super-silencers typically suppress the chromatin looping of nearby enhancers. In summary, our findings accentuate the critical role super-silencers play in the normal function of B-cells, suggesting that sequence mutations and activity modifications in these elements could be primary factors in B-cell carcinogenesis.

10:15-10:30
Session: Epigenome
Cross-species and tissue imputation of species-level DNA methylation samples
Format: Live from venue

Moderator(s): Ferhat Ay

  • Emily Maciejewski, University of California, Los Angeles, United States
  • Steve Horvath, Altos Labs, United Kingdom
  • Jason Ernst, University of California, Los Angeles, United States


Presentation Overview:

DNA methylation data is highly informative to study a variety of aspects of mammalian biology. However, the availability of such data for many mammals has been historically limited due to a lack of applicable microarrays in species other than human and mouse. The availability of this data was recently vastly enhanced by the development and large-scale application of the mammalian methylation array, which allows DNA methylation to be measured across mammalian species at a set of 36k CpGs that are well conserved across mammals. We consider here 13,245 samples profiled on this array representing 348 species and 59 tissues from 746 species-tissue combinations. While having some coverage of many different species and tissue types, this data only captures 3.6% of potential species-tissue combinations. We thus developed CMImpute (Cross-species Methylation Imputation) which uses a Conditional Variational Autoencoder (CVAE), a conditional generative model implemented via neural networks, to impute DNA methylation of non-profiled species-tissue combinations. CMImpute specifically conditions the CVAE on species and tissue labels, allowing for direct control over the combination to be imputed. In cross-validation, we show that CMImpute yields high correlation with held-out observed values, outperforming multiple baselines in terms of agreement across methylation array probes with a mean correlation of 0.92 and across samples with a mean correlation of 0.69. CMImpute’s performance gains relative to baselines were most substantial for probes with higher observed variation in methylation levels across species and tissue types, resulting in a mean correlation across samples of 0.77 for higher variance probes. We additionally show CMImpute’s performance increases as the amount of same-species information available for a target species-tissue combination increases, resulting in a mean correlation of 0.95 when more training information is available. 
We then train CMImpute on all the data to impute 19,786 new species-tissue combinations representing the remaining 96.4% of potential combinations. With the new and cross-validation predictions, we show that CMImpute’s imputed samples contain species and tissue signals consistent with observed patterns. Finally, we demonstrate that we can predict the maximum lifespan of a species using the new imputed samples with similar accuracy as when using observed samples, resulting in correlations between the predicted and reported lifespan values across all species of 0.83 and 0.81 when using imputed and observed data, respectively. We expect CMImpute and our imputed data resource will be useful for DNA methylation analyses across mammalian species.
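
As a quick check of the coverage figures quoted in this abstract, the sketch below recomputes them from the stated counts. It assumes that "potential species-tissue combinations" means every pairing of the 348 species with the 59 tissues; the abstract does not define the denominator explicitly, and the variable names are illustrative.

```python
# Counts quoted in the abstract.
n_species = 348
n_tissues = 59
n_profiled = 746  # species-tissue combinations with observed samples

# Assumption: every species can be paired with every tissue.
n_potential = n_species * n_tissues
n_missing = n_potential - n_profiled

coverage_pct = 100 * n_profiled / n_potential
missing_pct = 100 * n_missing / n_potential

print(n_potential)             # 20532
print(n_missing)               # 19786
print(round(coverage_pct, 1))  # 3.6
print(round(missing_pct, 1))   # 96.4
```

Under that assumption, the numbers agree with the 3.6% coverage, the 19,786 imputed combinations, and the remaining 96.4% reported above.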

10:45-11:00
Session: Epigenome
Benchmarking deep and shallow methods for Hi-C count prediction within and across species
Format: Live from venue

Moderator(s): Oana Ursu

  • Elias DeVoe, University of Wisconsin – Madison, United States
  • Armando Serrato, National Chengchi University, United States
  • Rathnakumar Kumaragurubaran, The Hospital for Sick Children, Toronto General Hospital Research Institute, Canada
  • Liangxi Wang, The Hospital for Sick Children, University of Toronto, Canada
  • Jason Fish, Toronto General Hospital Research Institute, Canada
  • Michael Wilson, The Hospital for Sick Children, University of Toronto, Canada
  • Sushmita Roy, University of Wisconsin – Madison, United States


Presentation Overview:

The three-dimensional organization of the genome has emerged as an important layer of gene regulation and determines long-range regulatory interactions between enhancer elements and genes. High-throughput chromosomal conformation assays such as Hi-C technologies enable analysis of 3D genome organization; however, high-resolution Hi-C datasets across diverse biological contexts, especially in non-model organisms, remain limited. This has prompted the development of computational methods that can predict Hi-C contact maps from one-dimensional signals such as chromatin accessibility, transcription factor binding, and histone marks. Such predictive models can in turn provide insights into the relationship between one-dimensional regulatory signals and long-range gene regulatory interactions. While the first generation of these methods leveraged shallow learning models, a number of deep learning models have recently been developed. However, comprehensive benchmarking experiments are lacking that inform us about the tradeoffs between deep and shallow methods, as well as how different model architectures affect count prediction performance, especially across species. Here, we evaluate the performance of five methods for predicting Hi-C counts from one-dimensional signals, spanning different model architectures, within and across human and rat Hi-C data. These include two recent deep learning approaches, C.Origami, which uses a transformer, and Epiphany, a long short-term memory model, as well as two simpler deep models based on a convolutional neural network and a fully connected feed-forward network. We also include a previously developed shallow method based on Random Forests for predicting these counts within and across species.
When comparing methods within a single species, we found that deep methods significantly outperform shallow methods; however, the margin of performance depends greatly on the amount of training data and the feature encoding, in addition to the model architecture. Among the deep models, C.Origami, a transformer-based model, performed the best. Surprisingly, when we compare models for across-species predictions, deep models suffer substantially in count prediction, especially when training on a single chromosome as opposed to multiple chromosomes. With limited training data, their performance can degrade to be comparable to or worse than that of shallow models in some cases. We also carried out a feature ablation study for the deep models, comparing model performance using accessibility, CTCF, cohesin and histone marks. Among the genomic features, CTCF was the most important, and accessibility could be dispensed with if chromatin marks are available. Finally, incorporating sequence modeling could be beneficial in situations where fewer one-dimensional datasets are available, especially when predicting across species.

11:00-11:15
Session: Epigenome
A multiscale 3D chromatin architecture controls lineage commitment
Format: Live from venue

Moderator(s): Oana Ursu

  • Daniela Salgado Figueroa, La Jolla Institute for Immunology; Bioinformatics and Systems Biology Program, UCSD, United States
  • Yeguang Hu, Massachusetts General Hospital, Harvard Medical School, United States
  • Zhihong Zhang, Massachusetts General Hospital, Harvard Medical School, United States
  • Margaret Veselits, Gwen Knapp Center for Lupus and Immunology Research, United States
  • Sourya Bhattacharyya, La Jolla Institute for Immunology, United States
  • Mariko Kashiwagi, Massachusetts General Hospital, Harvard Medical School, United States
  • Marcus R Clark, Gwen Knapp Center for Lupus and Immunology Research, United States
  • Bruce A Morgan, Massachusetts General Hospital, Harvard Medical School, United States
  • Ferhat Ay, La Jolla Institute for Immunology, United States
  • Katia Georgopoulos, Massachusetts General Hospital, Harvard Medical School, United States


Presentation Overview:

A generic level of chromatin organization generated by cohesin and CTCF suffices to limit promiscuous interactions between regulatory elements, but a lineage-specific chromatin assembly that supersedes these constraints is required to configure the genome to guide gene expression changes that drive faithful lineage progression. To address how lineage-determining transcription factors organize the genome’s three-dimensional architecture to support and enforce lineage choice, we examine the effects of loss of function of IKAROS, a DNA binding protein required for lymphocyte differentiation. By performing ChIP-seq, Hi-C and HiChIP experiments, we identified a hierarchy of changes in a multiscale chromatin organization process that involves chromatin loops, contact domains, histone marks, and compartmental organization. We show a dynamic interplay between a lineage-determining transcription factor, CTCF and cohesin loop extrusion to prime lineage-specific genome organization in place before the induction of the transcriptional program needed for lineage commitment. Chromatin interactions emanating from IKAROS-bound enhancers override CTCF-imposed boundaries to assemble lineage-specific regulatory units built on a backbone of smaller invariant topological domains, and long-distance interactions between lymphoid-specific enhancers bound by IKAROS keep the associated regions affiliated with euchromatin. The effects of IKAROS loss on regulatory loops spanning the 3.2 Mb Immunoglobulin kappa (IgK) locus highlight the role of interactions involving IKAROS-bound regulatory elements in contraction and recombination of this locus.
Our gain of function experiments using ectopic expression in a human skin epithelial cell line, where IKAROS has no activity, confirmed the ability of IKAROS to promote interactions between distant enhancers and promoters, assemble them and adjacent DNA into topologically associated domains and confer euchromatic localization to regions associated with induced loops, in conjunction with upregulation of extra-lineage gene expression. By demonstrating a clear role of IKAROS in regulating 3D chromatin organization, we provide a paradigm for how lineage-defining DNA binding proteins can direct the machineries that configure the genome in nuclear space to specify lineage potential and guide appropriate development.

11:15-11:30
Flash Talks
Format: Live from venue

Moderator(s): Oana Ursu


Presentation Overview:

A lack of distinct cell identities in single-cell measurements: revisiting Waddington’s landscape

scDEED: a statistical method for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters

Integrating single-cell chromatin accessibility data with GWAS improves detection of relevant cell types in 59 complex phenotypes

MODEST: Modeling of Epigenomics and Spatial Transcriptomics Data

Integrative epigenomic and functional characterization assay based annotation of regulatory activity across diverse human cell types

11:30-13:30
Poster Session with Lunch
Format: Live from venue

13:30-14:15
Invited Presentation: Keynote: Simulator: the Swiss Army knife of biomedical data science
Format: Live from venue

Moderator(s): Jason Ernst

  • Jingyi Jessica Li


Presentation Overview:

Reference-based simulators, which generate realistic synthetic data as digital twins of reference real data, can help researchers imagine hypothetical experimental results, thus informing study design, method benchmarking, and scientific discovery. In this talk, I will introduce our recent development of simulators for single-cell and spatial multi-omics data, including count data and raw sequencing reads. Our simulators aim to balance two aspects: (1) mimicking real data and (2) allowing user-specified ground truths. Specifically, our latest count simulator scDesign3 (https://www.nature.com/articles/s41587-023-01772-1) uses a unified probabilistic model for single-cell and spatial multi-omics count data. Hence, scDesign3 can infer biologically meaningful parameters; assess the goodness-of-fit of inferred cell clusters, trajectories and spatial locations; and generate synthetic negative- and positive-control data for computational analysis. Coupled with scDesign3, our read simulator scReadSim (https://doi.org/10.1101/2022.05.29.493924) enables the robustness evaluation and performance benchmarking of read-level computational tools for single-cell RNA-seq and ATAC-seq data. I will provide use examples and discuss how simulators can help improve the reliability of data-driven discoveries.

14:15-14:30
Session: Single-cell
Voyager: exploratory single-cell genomics data analysis with geospatial statistics
Format: Live from venue

Moderator(s): Jason Ernst

  • Lambda Moses, California Institute of Technology, United States
  • Pétur Einarsson, University of Iceland, Iceland
  • Alik Huseynov, German Cancer Research Center, Germany
  • Kayla Jackson, California Institute of Technology, United States
  • Laura Luebbert, California Institute of Technology, United States
  • Sina Booeshaghi, California Institute of Technology, United States
  • Sindri Antonsson, University of Iceland, Iceland
  • Páll Melsted, University of Iceland, Iceland
  • Lior Pachter, California Institute of Technology, United States


Presentation Overview:

Exploratory spatial data analysis (ESDA) can be a powerful approach to understanding single-cell genomics datasets, but it is not yet part of standard data analysis workflows. In particular, geospatial analyses, which have been developed and refined for decades, have yet to be fully adapted and applied to spatial single-cell analysis. We introduce the Voyager platform, which systematically brings the geospatial ESDA tradition to (spatial) -omics, with local, bivariate, and multivariate spatial methods not yet commonly applied to spatial -omics, united by a uniform user interface. Using Voyager, we showcase biological insights that can be derived with its methods, such as biologically relevant negative spatial autocorrelation. Underlying Voyager is the SpatialFeatureExperiment data structure, which combines Simple Feature with SingleCellExperiment and AnnData to represent and operate on geometries bundled with gene expression data. Voyager has comprehensive tutorials demonstrating ESDA built on GitHub Actions to ensure reproducibility and scalability, using data from popular commercial technologies. Voyager is implemented in both R/Bioconductor and Python/PyPI, and features compatibility tests to ensure that both implementations return consistent results.
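
To illustrate what negative spatial autocorrelation means, here is a minimal sketch of global Moran's I, a standard geospatial statistic, on a small grid. This is not Voyager's API or implementation: the function name `morans_i`, the binary rook-neighbor weights, and the toy grids are all assumptions made for illustration.

```python
def morans_i(grid):
    """Global Moran's I on a 2D grid with rook (edge-sharing) neighbors.

    Assumes the grid values are not all equal (otherwise the variance is zero).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    mean = sum(values) / len(values)

    cross_sum = 0.0  # sum over neighbor pairs of (x_i - mean) * (x_j - mean)
    weight_sum = 0   # number of directed neighbor pairs (binary weights)
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    cross_sum += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1

    variance_sum = sum((v - mean) ** 2 for v in values)
    return len(values) * cross_sum / (weight_sum * variance_sum)


# Checkerboard: every spot differs from all of its neighbors.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
# Two solid blocks: most neighbors share the same value.
blocks = [[0, 0, 1, 1] for _ in range(4)]

print(morans_i(checkerboard))  # -1.0
print(morans_i(blocks))        # positive, about 0.67
```

A value near -1 indicates that neighboring spots tend to have dissimilar values, while a positive value indicates that neighbors tend to be similar.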

14:30-14:45
Session: Single-cell
A Semi-supervised Transformer-Based Model for Single Cell Type Classification
Format: Live from venue

Moderator(s): Jason Ernst

  • Dante Bolzan, La Jolla Institute for Immunology, United States
  • Abbas Ardakany, La Jolla Institute for Immunology, United States
  • Ferhat Ay, La Jolla Institute for Immunology, United States


Presentation Overview:

Single-cell RNA sequencing (scRNA-seq) is a powerful experimental technique that simultaneously measures the levels of thousands of transcripts for individual cells. In order to derive biological insights from the data, it is essential to classify or assign cells to known or unknown cell types using these measurements of gene expression. One approach is the manual annotation of identified clusters, which is limited by the choice of clustering method and is prone to individual bias or discordant naming conventions across different fields. Recent approaches use machine learning models that are designed to transfer labels from one or multiple datasets (reference atlases) to another by learning general representations of cell types. However, only a limited number of these approaches can handle different conditions or domains to identify representations that are robust or invariant to such domain shifts.
Here, we present scPerformer, a transformer-based, domain adversarial cell-type classification model. scPerformer makes use of semi-supervised learning, a technique that utilizes unlabeled data during training. scPerformer first performs a dimensionality reduction on gene expression values and then feeds these latent representations into a transformer encoder, which learns how each gene attends to every other gene. Simultaneously, a domain adversary is trained to distinguish data coming from the training and test sets. Owing to its transformer architecture, scPerformer captures long-range gene-gene interactions and thereby learns more complex representations of each cell type. We compared scPerformer to scNym, a state-of-the-art deep learning method, and a baseline logistic regression model using intra-dataset and cross-dataset tasks. For the intra-dataset tasks, we trained and tested across different domains (age, sequencing technology, donors, and experimental batches) and found scPerformer to be competitive. When we looked at classification performance more closely, we found that the cells scPerformer misclassified were transcriptionally similar to the predicted cell type, as seen for closely related CD4 T cell subsets. For the cross-dataset tasks, we selected three datasets generated from COVID-19 patients and controls and found that scPerformer was competitive in classifying coarsely defined cell types that are shared across datasets. scPerformer was also able to capture changes in the frequency of cell types and subsets that were found to correlate with COVID-19 severity in one publication using single-cell data from the other, thereby helping assess the generalizability of such correlative findings across different disease cohorts. We also observed that scPerformer can learn representations of rare cell types with a small number of cells.

14:45-15:00
Session: Single-cell
Single cell Orthogonal Non-negative Matrix Tri-factorization for identification of cell type and state-specific gene expression programs
Format: Live from venue

Moderator(s): Jason Ernst

  • Harmon Bhasin, Wisconsin Institute for Discovery, Department of Computer Science, University of Wisconsin-Madison, United States
  • Spencer Halberg-Spencer, Wisconsin Institute for Discovery, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, United States
  • Katherine Mueller, Wisconsin Institute for Discovery, Department of Biomedical Engineering, University of Wisconsin-Madison, United States
  • Junha Shin, Wisconsin Institute for Discovery, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, United States
  • Sunnie Grace McCalla, Wisconsin Institute for Discovery, Laboratory of Genetics, University of Wisconsin-Madison, United States
  • Elizabeth Capowski, Waisman Center, United States
  • David Gamm, Waisman Center, McPherson Eye Research Institute, Department of Ophthalmology, University of Wisconsin-Madison, United States
  • Krishanu Saha, Wisconsin Institute for Discovery, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, United States
  • Sushmita Roy, Wisconsin Institute for Discovery, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, United States


Presentation Overview:

Single cell genomics allows researchers to capture high-dimensional omic profiles such as gene expression and accessibility at the individual cell level and has transformed our ability to study heterogeneous populations of cells from diverse tissue, disease, and developmental contexts. A first step in the analysis of such datasets is to define cell clusters and annotate them to determine the cell type and state composition. Existing approaches first define cell clusters followed by identification of differentially expressed genes to annotate these clusters. This two-step approach may not accurately capture gene expression programs that are important for cellular state and identity and is especially limited for less studied systems such as organoids.

To address this gap, we have developed single cell orthogonal non-negative matrix tri-factorization (scONMTF), which allows for simultaneous identification of cell clusters and their associated gene programs. scONMTF identifies interpretable low-dimensional representations of cells, each associated with multiple driver gene expression programs, thereby generalizing NMF, which is restricted to a single program for each latent factor. We compared scONMTF to baseline NMF and the two-step clustering-based approaches on simulated, published, and novel single cell RNA-seq (scRNA-seq) data. On simulated data, scONMTF outperforms NMF by being flexible in identifying many-to-many relationships that are otherwise missed by NMF, while maintaining high accuracy in recovering known clusters. We next applied scONMTF to a real scRNA-seq dataset for PBMCs with known cell type labels, which has been widely used for benchmarking scRNA-seq clustering methods. scONMTF outperformed NMF and Louvain clustering by identifying hematopoietic cell types and further captured shared gene expression programs relating similar cell types such as CD4+ T-cells and natural killer cells.

We applied scONMTF to a multi-sample dataset from 2D and 3D in vitro organoid platforms of the human retina. While 2D organoids are more experimentally tractable, 3D organoids are expected to recapitulate the biology of human tissue more faithfully. Identification of gene expression programs that capture differences across these platforms could enable efficient production of organoids. Using scONMTF gene expression programs we find that the retinal cells in the 2D platform are developmentally less mature compared to those from the 3D platform. Regulators associated with these gene expression programs could help engineer 2D organoids with greater developmental maturity. Taken together, our results suggest that scONMTF is a powerful and flexible approach that can be applied to complex scRNA-seq datasets to identify informative gene expression programs across different contexts.
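
The idea of tri-factorization can be sketched as approximating a cells-by-genes matrix X by U S V^T, where U groups cells, V groups genes into programs, and the small matrix S links each cell group to one or more gene programs. The example below is a simplified illustration using generic multiplicative updates for non-negative tri-factorization; it omits the orthogonality constraints named in the talk title and is not the authors' scONMTF implementation. All function names, parameters, and the toy matrix are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def scale(factor, numer, denom, eps=1e-12):
    """Element-wise multiplicative update: factor * numer / denom."""
    return [[f * n / (d + eps) for f, n, d in zip(f_row, n_row, d_row)]
            for f_row, n_row, d_row in zip(factor, numer, denom)]


def reconstruction_error(x, u, s, v):
    """Frobenius norm of X - U S V^T."""
    approx = matmul(matmul(u, s), transpose(v))
    return sum((a - b) ** 2
               for x_row, a_row in zip(x, approx)
               for a, b in zip(x_row, a_row)) ** 0.5


def nmtf(x, k_cells, k_genes, n_iter=200, seed=0):
    """Factor X (cells x genes) as U S V^T with non-negative U, S, V."""
    rng = random.Random(seed)

    def random_matrix(rows, cols):
        # Strictly positive start so multiplicative updates can move every entry.
        return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]

    u = random_matrix(len(x), k_cells)      # cells x cell groups
    s = random_matrix(k_cells, k_genes)     # cell groups x gene programs
    v = random_matrix(len(x[0]), k_genes)   # genes x gene programs
    xt = transpose(x)
    for _ in range(n_iter):
        b = matmul(s, transpose(v))         # update U with S, V fixed
        u = scale(u, matmul(x, transpose(b)), matmul(u, matmul(b, transpose(b))))
        a = matmul(u, s)                    # update V with U, S fixed
        v = scale(v, matmul(xt, a), matmul(v, matmul(transpose(a), a)))
        ut, vt = transpose(u), transpose(v)  # update S with U, V fixed
        s = scale(s, matmul(matmul(ut, x), v),
                  matmul(matmul(matmul(ut, u), s), matmul(vt, v)))
    return u, s, v


# Toy matrix: 4 cells x 6 genes, two cell groups, one gene block shared by both.
x = [
    [5, 5, 1, 1, 0, 0],
    [4, 6, 1, 1, 0, 0],
    [0, 0, 1, 1, 6, 4],
    [0, 0, 1, 1, 5, 5],
]
u, s, v = nmtf(x, k_cells=2, k_genes=3)
print(round(reconstruction_error(x, u, s, v), 3))
```

In this sketch, a row of S with more than one large entry corresponds to a cell group that uses several gene programs, which is the many-to-many relationship the abstract contrasts with standard NMF.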

15:00-15:15
Session: Single-cell
Letting GRNs causally drive adversarial simulation of scRNA-seq data using GRouNdGAN
Format: Live from venue

Moderator(s): Jason Ernst

  • Yazdan Zinati, McGill University, Canada
  • Abdulrahman Takiddeen, McGill University, Canada
  • Amin Emad, McGill University, Canada


Presentation Overview:

Data-driven inference of gene regulatory networks (GRNs) from single-cell RNAseq (scRNAseq) data has been a topic of interest for many years. Nevertheless, benchmarking GRN inference algorithms remains challenging due to the absence of gold-standard ground truth. Though reference GRNs can be built based on experimental data such as ChIP-Seq, or curated from literature, interactions might not fully align with the biological context under investigation, necessitating lengthy and costly perturbation experiments.

To address this challenge, we developed GRouNdGAN, a causal implicit generative model based on the causal GAN framework for simulating scRNAseq data, in-silico perturbation experiments, and benchmarking GRN inference methods. Through the imposition of a user-provided GRN in its architecture, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating transcription factors (TFs). Training on three reference experimental datasets, we show that our model captures non-linear TF-gene dependencies, as well as technical and biological noise in real scRNAseq data to generate realistic datasets in which GRN properties are captured and pseudo-time ordering, cell trajectories, and gene identities are preserved with no user manipulation and solely implicit parameterization.

We evaluated the ability of GRouNdGAN to simulate cells that are indistinguishable from real ones using different datasets, GRN densities, and metrics. Despite imposing a rigid constraint on causality, GRouNdGAN outperforms state-of-the-art simulators in this regard by incorporating domain knowledge through the GRN while showing no sign of mode collapse.

GRouNdGAN learns meaningful causal regulatory dynamics, allowing it to sample from both observational and interventional distributions. This property enables it to synthesize, at inference time, cells under conditions that do not occur in the dataset, making it possible to perform in-silico TF knockout and perturbation experiments. Our results show that in-silico knockout of cell type-specific TFs significantly reduces the number of generated cells of that type. Furthermore, GRouNdGAN can be used to generate paired control-case samples for an intervention through a deterministic mode of operation.

Interactions imposed through the GRN are emphasized in the simulated datasets, resulting in GRN inference algorithms assigning them much higher scores than interactions not imposed but of equal importance in the experimental training dataset. Benchmarking various GRN inference algorithms reveals that GRouNdGAN effectively bridges the existing gap between simulated and biological data benchmarks of GRN inference algorithms. Our results show that GRouNdGAN is a stable, realistic, and effective simulator with various applications in single-cell RNA-seq analysis.

15:15-15:30
Session: Single-cell
scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics
Format: Live from venue

Moderator(s): Jason Ernst

  • Dongyuan Song, University of California, Los Angeles, United States
  • Qingyang Wang, University of California, Los Angeles, United States
  • Guanao Yan, University of California, Los Angeles, United States
  • Tianyang Liu, University of California, Los Angeles, United States
  • Tianyi Sun, Department of Statistics, UCLA, United States
  • Jingyi Jessica Li, University of California, Los Angeles, United States


Presentation Overview:

We present a statistical simulator, scDesign3, to generate realistic single-cell and spatial omics data, including various cell states, experimental designs and feature modalities, by learning interpretable parameters from real data. Using a unified probabilistic model for single-cell and spatial omics data, scDesign3 infers biologically meaningful parameters; assesses the goodness-of-fit of inferred cell clusters, trajectories and spatial locations; and generates in silico negative and positive controls for benchmarking computational tools.

16:00-16:15
Session: Single-cell
Batch integration and pre-trained novel cell type discovery with heterogeneous single-cell ATAC-seq features
Format: Live from venue

Moderator(s): Saurabh Sinha

  • Yuqi Cheng, Georgia Tech, United States
  • Xiuwei Zhang, Georgia Tech, United States


Presentation Overview:

Single-cell chromatin accessibility sequencing (scATAC-seq) has emerged in recent years as a powerful tool for deciphering cell-type-specific epigenetic regulation. The rapid expansion of scATAC-seq-derived cell atlases and increasing sequencing throughput have heralded the era of single-cell epigenomic atlases. This progress raises two pressing questions: (1) how to effectively integrate heterogeneous scATAC-seq datasets and (2) how to accurately learn cell types from existing reference atlases. Most current related methods designed for single-cell RNA-seq fail to address these questions because of two unique challenges of scATAC-seq data: (1) distinct feature sets across batches and (2) reference genome discrepancies between query data and published atlases. Traditionally, an aligned feature space is needed in such tasks. However, this often leads to both computational inefficiency, due to the high dimensionality of scATAC-seq data, and inaccurate results, due to the inherent heterogeneity of features. These problems have hampered batch integration and reference mapping tasks in the scATAC-seq community. Additionally, few cell label transfer methods exist for scATAC-seq data, and none can discover novel cell types from reference datasets. Thus, there is a pressing need for a new computational method that does not require consensus features and is sensitive to novel cell types. Here, we introduce EpiPack, a parallel autoencoder framework designed for the integration of heterogeneous scATAC-seq data and novel cell-type discovery. EpiPack leverages peak-embedding-regularized domain adaptation, offering a versatile framework to bridge the gap between multi-condition heterogeneous scATAC-seq datasets. The unique ability to perform cross-batch mapping without requiring consensus peak sets makes EpiPack an efficient solution for utilizing multi-source reference atlases without concern for feature alignment.
Being able to dissect novel cell types from reference data also makes EpiPack an important complement to the current cell annotation tools in scATAC-seq. Comprehensive benchmarking tests confirm EpiPack's robustness and efficiency in handling heterogeneous scATAC-seq data. EpiPack consistently outperforms the majority of existing integration and cell label transfer methods across all tasks. By solving the two fundamental questions outlined earlier, EpiPack promises to broaden the scope of applications for scATAC-seq technology and provide the community with a new perspective on addressing the heterogeneity inherent in scATAC-seq datasets.

16:15-16:30
Session: Single-cell
CMOT: Cross-Modality Optimal Transport for multimodal inference
Format: Live from venue

Moderator(s): Saurabh Sinha

  • Sayali Anil Alatkar, University of Wisconsin - Madison, United States
  • Daifeng Wang, University of Wisconsin - Madison, United States


Presentation Overview:

Multimodal measurements from single-cell sequencing technologies facilitate a comprehensive understanding of specific cellular and molecular mechanisms. However, simultaneous profiling of multiple modalities of single cells is challenging, and data integration remains elusive due to missing modalities and missing cell–cell correspondences. To address this, we developed a computational approach, Cross-Modality Optimal Transport (CMOT) [1], which aligns cells within available multi-modal data (source) onto a common latent space and infers the missing modalities of cells measured in another modality (target) from the mapped source cells. CMOT outperforms existing methods in applications ranging from the developing brain and cancers to immunology, and provides biological interpretations that improve cell-type or cancer classification.

[1] Sayali Alatkar, Daifeng Wang, CMOT: Cross-Modality Optimal Transport for multimodal inference, Genome Biology, 24, 163, 2023
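
As a rough illustration of the optimal-transport idea in the abstract above, the sketch below computes an entropic-regularized transport plan between source and target cells with Sinkhorn iterations, then fills in a missing modality for each target cell as a plan-weighted average of source profiles. This is a generic textbook construction, not the CMOT implementation: the function names, the squared-distance cost, the regularization value, and the toy data are all assumptions made for illustration.

```python
import math

def sinkhorn_plan(cost, a, b, reg=0.05, n_iter=200):
    """Entropic-regularized transport plan between weights a (source) and b (target)."""
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def infer_missing_modality(plan, source_profiles):
    """Barycentric projection: each target cell gets a plan-weighted mean of source profiles."""
    n, m = len(plan), len(plan[0])
    n_features = len(source_profiles[0])
    inferred = []
    for j in range(m):
        mass = sum(plan[i][j] for i in range(n))
        inferred.append([
            sum(plan[i][j] * source_profiles[i][f] for i in range(n)) / mass
            for f in range(n_features)
        ])
    return inferred

# Toy data (made up): two source and two target cells in a shared 1-D space.
source_shared = [0.0, 1.0]
target_shared = [0.1, 0.9]
source_other_modality = [[10.0], [20.0]]  # the modality the target cells lack
cost = [[(s - t) ** 2 for t in target_shared] for s in source_shared]
plan = sinkhorn_plan(cost, a=[0.5, 0.5], b=[0.5, 0.5])
inferred = infer_missing_modality(plan, source_other_modality)
```

In this toy case each target cell sits next to one source cell, so the inferred values land close to that neighbor's profile. A smaller `reg` makes the plan sharper, at the cost of numerical underflow in the kernel for large costs.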

16:30-17:15
Invited Presentation: Keynote: Everything as code
Format: Live from venue

Moderator(s): Saurabh Sinha

  • David Van Valen


Presentation Overview:

Biological systems are difficult to study because they consist of thousands of parts, vary in space and time, and their fundamental unit—the cell—displays remarkable variation in its behavior. These challenges have spurred the development of genomics and imaging technologies over the past 30 years that have revolutionized our ability to capture information about biological systems in the form of images. Excitingly, these advances are poised to place the microscope back at the center of the modern biologist’s toolkit. Because we can now access temporal, spatial, and “parts list” variation via imaging, images have the potential to be a standard data type for biology.
Biology needs a new data infrastructure for this vision to become a reality. Imaging methods are of little use if it is too difficult to convert the resulting data into quantitative, interpretable information. New deep learning methods are essential to the reliable interpretation of imaging data. These methods differ from conventional algorithms in that they learn how to perform tasks from labeled data; they have demonstrated immense promise, but they are challenging to use in practice. The expansive training data required to power them are sorely lacking, as are easy-to-use software tools for creating and deploying new models. Solving these challenges through open software is a key goal of the Van Valen lab. In this talk, I describe DeepCell, a collection of software tools that meet the data, model, and deployment challenges associated with deep learning. These include tools for distributed labeling of biological imaging data, a collection of modern deep learning architectures tailored for biological-image analysis tasks, and cloud-native software for making deep learning methods accessible to the broader life science community. I discuss how we have used DeepCell to label large-scale imaging datasets to power deep learning methods that achieve human-level performance and enable new experimental designs for imaging-based experiments.

Wednesday, November 29th
9:00-9:15
Day 2 Welcome
Format: Live from venue

Moderator(s): Sushmita Roy

9:15-10:00
Invited Presentation: Keynote: Comparative spatial omics data analysis
Format: Live from venue

Moderator(s): Sushmita Roy

  • Jean Fan


Presentation Overview:

Recent advances in high-throughput spatially resolved transcriptomics technologies now enable the identification and characterization of cell types and their molecular states in health versus disease while preserving the cells’ spatial context. Application of these technologies provides the opportunity to contribute to a more complete understanding of how cellular spatial organization relates to tissue function and how it is altered in disease. New statistical approaches and computational tools are needed to identify and compare these cellular spatial organizational differences. In this talk, I will provide an overview of the latest spatially resolved transcriptomics technologies and associated computational analysis methods developed by my lab. Specifically, to make molecular and cell-type compositional comparisons at matched spatial locations across structurally similar tissues, we developed STalign to align 2D spatially resolved transcriptomics datasets within and across technologies and to a 3D common coordinate framework. To make cell-type relational comparisons, we developed CRAWDAD, Cell-type Relationship Analysis Workflow Done Across Distances, to quantitatively evaluate cell-type spatial relationships across different length scales. We anticipate that such statistical approaches and computational methods for analyzing spatially resolved transcriptomic data will help identify and characterize spatial organizational differences and contribute to fundamental biological insights into how cell-type spatial organization differs in healthy and diseased settings.

10:00-10:15
Session: Variation
C-STEM: A Fast and Powerful Method for Robust Context-Specific trans-eQTL Mapping in Multi Context Studies
Format: Live from venue

Moderator(s): Sushmita Roy

  • Lena Krockenberger, Bioinformatics Interdepartmental Graduate Program, UCLA, United States
  • Mike Thompson, Systems Biology Department, Center for Genomic Regulation (CRG), Barcelona, Spain
  • Noah Zaitlen, Department of Computational Medicine, David Geffen School of Medicine, UCLA, United States
  • Xuanyao Liu, Section of Genetic Medicine, School of Medicine, University of Chicago, United States
  • Brunilda Balliu, Department of Computational Medicine, David Geffen School of Medicine, UCLA, United States


Presentation Overview:

The discovery that most disease-associated genetic variants lie outside exons has fueled extensive research into genetic mechanisms governing transcriptional regulation. While identifying associations between gene expression levels and proximal SNPs (cis-eQTLs) has become more feasible, identification of high-quality distal associations (trans-eQTLs) has been challenging. Trans effects are typically weaker and more context-specific than cis effects, making them harder to detect. In addition, existing methods for trans-eQTL mapping often incur a heavy multiple-testing burden and overlook the inherent intra-individual correlation of gene expression found in multi-context studies with repeated sampling, e.g., GTEx and single-cell RNA-Seq studies. These oversights can significantly diminish the power to detect context-specific trans-eQTLs.
To address these issues, we developed C-STEM, a fast and powerful method for context-specific trans-eQTL mapping. C-STEM first accounts for intra-individual correlation by decomposing the expression of a gene into context-shared and context-specific components, builds cross-validated cis-genetic predictors (CVGP) for each component, and creates a final predictor using both components. C-STEM then tests for association between all CVGP-trans gene pairs, significantly reducing the number of tests and improving power to detect trans-eQTLs that act through cis effects on a gene. Finally, C-STEM employs a hierarchical testing procedure to control FDR across and within contexts and boost power when a significant CVGP-trans gene pair association exists in multiple contexts.
Through simulations, we demonstrate C-STEM's increased power in detecting trans-gene regulation compared to other methods. We apply C-STEM to bulk multi-tissue RNA-seq data from the GTEx consortium (N=948) and peripheral-blood single-cell RNA-Seq data from the CLUES (N=234) and OneK1K (N=982) cohorts, generating a comprehensive tissue and peripheral blood cell type-specific trans-eQTL map. C-STEM identifies 89% of trans-eQTLs mapped by existing approaches, while providing a 65% increase in the number of trans-gene regulations identified. Existing approaches overestimate specificity of trans-eQTL effects across contexts; 12% of trans-eQTLs appear unique to a single context using existing methods, compared to only 6% using C-STEM.
In summary, C-STEM enables construction of context-specific trans-eQTL maps, aiding in understanding context-specific gene regulatory networks underlying complex human traits.
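
The decomposition step mentioned in the abstract above can be pictured with a small sketch. The abstract does not say how the split is computed. The version below assumes the simplest possible reading: the context-shared component is an individual's mean expression across their sampled contexts, and the context-specific component is the deviation from that mean. The function name and the toy values are hypothetical.

```python
def decompose_expression(expr):
    """Split one gene's expression into shared and context-specific parts.

    expr maps individual -> {context: expression value}; individuals may be
    sampled in different subsets of contexts.
    Assumption (not stated in the abstract): shared = per-individual mean,
    specific = deviation from that mean.
    """
    shared = {}
    specific = {}
    for individual, by_context in expr.items():
        mean = sum(by_context.values()) / len(by_context)
        shared[individual] = mean
        specific[individual] = {ctx: x - mean for ctx, x in by_context.items()}
    return shared, specific

# Toy values (made up) for one gene in two individuals.
expr = {
    "ind1": {"liver": 2.0, "blood": 4.0},
    "ind2": {"liver": 1.0, "blood": 1.0, "lung": 4.0},
}
shared, specific = decompose_expression(expr)
```

Under this reading, the context-specific values of each individual sum to zero by construction, which removes the between-individual offset that induces correlation across repeated samples of the same person.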

10:15-10:30
Session: Variation
Detection, characterization and prevention of homology-mediated deletions
Format: Live from venue

Moderator(s): Sushmita Roy

  • Aditee Kadam, Weizmann Institute of Science, India
  • Shay Shilo, Weizmann Institute of Science, Israel


Presentation Overview:

Background:

Detecting medium-sized deletions in short-read data is highly challenging due to reference biases and mapping issues. We developed Del-Read, an algorithm that enables de novo detection of medium-sized deletions. The algorithm focuses on a specific type of deterministic deletion with a well-defined genetic mechanism: microhomology-mediated end joining deletions (MMEJ-del). Using prior knowledge of the MMEJ mechanism, our algorithm compiles a complete set of potential deletions in the exome. It then maps these deletions to sequencing reads, thereby reducing reliance on mapping differences to a reference genome.

Aims:

To explore the somatic and germline MMEJ-del landscape using Del-Read, and provide insights into preventing somatic deletions through genome editing.

Methods:

The Del-Read algorithm was applied to two datasets - Beat AML and TCGA-breast - which comprised tumor-control paired exomes (N=359 and 225, respectively). A subset of these mutations underwent deep targeted sequencing in a cohort of 672 healthy individuals. In addition, the ASXL1 gene was modified at specific positions using the CRISPR-Cas9 technology.

Results:

The Del-Read algorithm identified reported (N=82), novel germline (N=486), and somatic (N=20) MMEJ-del in the datasets. A subset of these mutations (N=37) was validated with comparable population frequencies using targeted sequencing of healthy individuals (N=672) in ethnicity-matched controls.

The magnitude of novel MMEJ-del discovered allowed us to associate them with genomic features of replication stress, such as G-quadruplexes and minisatellites. Interestingly, we also observed a new class of MMEJ-del characterized by mismatches in the homologies, although not all mismatches were equally tolerated. Using this discovery, we demonstrated that a specific single-base substitution can restrict the occurrence of pre-leukemic MMEJ-del in the ASXL1 gene.

Summary:

Our findings highlight Del-Read's potential to uncover previously undetected deletions and provide insights into preventing somatic deletions through genome editing.
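
The Background paragraph above describes enumerating potential MMEJ deletions first and then looking for them in reads. A minimal sketch of that idea follows. It is not the Del-Read code: the homology length, deletion size limits, flank size, and exact-substring read check are all assumptions chosen for illustration. The sketch pairs identical short repeats in a reference sequence, deletes one repeat copy plus the intervening bases, and tests whether a read contains the resulting junction.

```python
def candidate_mmej_deletions(seq, homology_len=3, min_del=5, max_del=50):
    """List (i, j, product) where identical homology_len-mers start at i and j.

    The product removes seq[i:j], i.e. one homology copy plus the bases
    between the two copies, leaving a single copy at the junction.
    """
    candidates = []
    last_start = len(seq) - homology_len
    for i in range(last_start + 1):
        homology = seq[i:i + homology_len]
        for j in range(i + min_del, min(i + max_del, last_start) + 1):
            if seq[j:j + homology_len] == homology:
                candidates.append((i, j, seq[:i] + seq[j:]))
    return candidates

def read_supports_deletion(read, product, i, homology_len=3, flank=2):
    """True if the read contains the junction: flank + retained homology + flank."""
    probe = product[max(0, i - flank):i + homology_len + flank]
    return probe in read

# Toy reference (made up): "ACG" occurs twice, 8 bases apart.
reference = "GGACGTTTTTACGCC"
candidates = candidate_mmej_deletions(reference)
i, j, product = candidates[0]
supported = read_supports_deletion("TTGGACGCCTT", product, i)
```

Because the junction sequences are generated up front, reads are matched against explicit deletion products and not aligned to the intact reference, which is the property the abstract emphasizes.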

10:45-11:00
Session: Dynamics/Networks
Genetic control of the dynamic transcriptional response to immune stimuli and glucocorticoids at single-cell resolution
Format: Live from venue

Moderator(s): Anna Ritz

  • Justyna Resztak, Wayne State University, United States
  • Julong Wei, Wayne State University, United States
  • Samuele Zilioli, Wayne State University, United States
  • Adnan Alazizi, Wayne State University, United States
  • Henriette Mair-Meijers, Wayne State University, United States
  • Peijun Wu, University of Michigan, United States
  • Xiaoquan Wen, University of Michigan, United States
  • Richard Slatcher, University of Georgia Athens, United States
  • Xiang Zhou, University of Michigan, United States
  • Francesca Luca, Wayne State University, United States
  • Roger Pique-Regi, Wayne State University, United States


Presentation Overview:

Synthetic glucocorticoids, such as dexamethasone, have been used as a treatment for many immune conditions, such as asthma and, more recently, severe COVID-19. Single-cell data can capture more fine-grained details on transcriptional variability and dynamics to gain a better understanding of the molecular underpinnings of inter-individual variation in drug response. Here, we used single-cell RNA-seq to study the dynamics of the transcriptional response to glucocorticoids in activated peripheral blood mononuclear cells from 96 African American children. We used novel statistical approaches to calculate a mean-independent measure of gene expression variability and a measure of transcriptional response pseudotime. Using these approaches, we showed that glucocorticoids reverse the effects of immune stimulation on both gene expression mean and variability. Our novel measure of gene expression response dynamics, based on the diagonal linear discriminant analysis, separated individual cells by response status on the basis of their transcriptional profiles and allowed us to identify different dynamic patterns of gene expression along the response pseudotime. We identified genetic variants regulating gene expression mean and variability, including treatment-specific effects, and showed widespread genetic regulation of the transcriptional dynamics of the gene expression response.
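
To make the diagonal linear discriminant analysis (DLDA) mentioned above more concrete, here is a minimal two-class sketch. It assumes the standard DLDA form, in which genes are treated as independent with a pooled per-gene variance, and uses the discriminant score of each cell as a position along a response axis. Whether the authors compute their pseudotime exactly this way is not stated in the abstract. The function names, the `eps` stabilizer, and the toy data are assumptions.

```python
def fit_dlda(class0, class1, eps=1e-8):
    """Per-gene class means and pooled variances (diagonal covariance)."""
    n_genes = len(class0[0])
    mean0 = [sum(c[g] for c in class0) / len(class0) for g in range(n_genes)]
    mean1 = [sum(c[g] for c in class1) / len(class1) for g in range(n_genes)]
    dof = len(class0) + len(class1) - 2
    pooled_var = []
    for g in range(n_genes):
        ss = sum((c[g] - mean0[g]) ** 2 for c in class0)
        ss += sum((c[g] - mean1[g]) ** 2 for c in class1)
        pooled_var.append(ss / dof + eps)  # eps avoids division by zero
    return mean0, mean1, pooled_var

def dlda_score(cell, model):
    """Discriminant score: negative leans toward class 0, positive toward class 1."""
    mean0, mean1, pooled_var = model
    return sum(
        (m1 - m0) * (x - (m0 + m1) / 2) / v
        for x, m0, m1, v in zip(cell, mean0, mean1, pooled_var)
    )

# Toy expression (made up): 2 genes, two cells per condition.
untreated = [[0.0, 1.0], [2.0, 3.0]]
treated = [[4.0, 1.0], [6.0, 3.0]]
model = fit_dlda(untreated, treated)
scores = [dlda_score(c, model) for c in ([1.0, 2.0], [3.0, 2.0], [5.0, 2.0])]
```

Cells at the untreated mean, the midpoint, and the treated mean receive increasing scores, so sorting cells by score orders them along the response.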

11:00-11:15
Session: Dynamics/Networks
Modeling the heterogenous NFκB dynamics of single immune cells
Format: Live from venue

Moderator(s): Anna Ritz

  • Xiaolu Guo, University of California, Los Angeles, United States
  • Adewunmi Adelaja, University of California, Los Angeles, United States
  • Supriya Sen, University of California, Los Angeles, United States
  • Alexander Hoffmann, University of California, Los Angeles, United States


Presentation Overview:

Macrophages function as immune sentinel cells, initiating appropriate and specialized immune responses to a great variety of pathogens. The transcription factor NFκB controls macrophage gene expression responses, and its temporal dynamics enable stimulus-specificity of these responses. Using a fluorescent reporter mouse, our laboratory recently generated large amounts of single-cell NFκB dynamic data and identified dynamic features, termed ‘signaling codons’, that convey information to the nucleus about stimulus identity and dose. Here, we aimed to recapitulate the stimulus-specific but highly cell-to-cell heterogeneous NFκB dynamics with a mathematical model of the signaling network. The parameters that are subject to biological variation provide the potential to account for the heterogeneity in observed stimulus responses. We estimated parameter distributions using the Stochastic Approximation Expectation Maximization (SAEM) approach and then fit the individual cell data using Bayesian maximum a posteriori (MAP) estimation. Visual inspection revealed an excellent fit with the data. To quantitatively evaluate the fitting performance, we compared the experimental and predicted distributions of NFκB signaling codons. Further, we identified biochemical reactions that may account for the cellular heterogeneity in NFκB dynamics. We verified that the stimulus-specificity of the virtual macrophage NFκB responses was consistent with their live-cell counterparts, as assessed by mutual information and machine learning classification. Additionally, the mathematical model allowed us to extend experimental dose-response studies, revealing the doses that maximize information. Furthermore, the virtual NFκB macrophages enabled the exploration of individual cell responses to different ligands. Leveraging this capability, we made predictions regarding combinatorial ligands that were then experimentally tested.
Discrepancies between the experimental results and model predictions led to the identification of a competition mechanism between CpG and PolyIC for endosome trafficking, resulting in non-integrative response behavior. Our results establish a mathematical modeling tool that may be used to study the molecular determinants of response specificity and dynamical coding in immune sentinel cells at the single-cell level.

11:15-11:30
Session: Dynamics/Networks
Epiregulon: Inference of single-cell transcription factor activity to dissect mechanisms of lineage plasticity and drug response
Format: Live from venue

Moderator(s): Anna Ritz

  • Tomasz Wlodarczyk, Roche, Poland
  • Aaron Lun, Genentech Inc., United States
  • Diana Wu, Genentech Inc., United States
  • Kerstin Seidel, Genentech Inc., United States
  • Mark Wang, Genentech Inc., United States
  • Jenille Tan, Genentech Inc., United States
  • Shang-Yang Chen, Genentech Inc., United States
  • Timothy Keyes, Genentech Inc., United States
  • Marc Hafner, Genentech Inc., United States
  • Christopher Siebel, Genentech Inc., United States
  • Robert Yauch, Genentech Inc., United States
  • Shiqi Xie, Genentech Inc., United States
  • Xiaosai Yao, Genentech Inc., United States


Presentation Overview:

Transcription factors (TFs) represent an emerging and exciting class of oncology targets. By quantifying target gene modulation, inference of gene regulatory networks (GRNs) enables assessment of TF-targeting agents. However, none of the existing methods is specifically designed to measure effects of perturbations in which TF expression is decoupled from its activity. We present Epiregulon, a method that constructs gene regulatory networks and predicts TF activity from single-cell ATAC-seq and RNA-seq data. Our weight estimation, based on co-occurrence of TF expression and chromatin accessibility, avoids erroneous inflation of TF activity as seen with TF expression-based approaches. Furthermore, our utilization of ChIP-seq data extends inference of activity to chromatin modifiers lacking defined motifs. Our extensive network of regulators facilitates identification of cell-state specific interaction partners. Using Epiregulon, we uncover the divergent cell fate transitions of prostate cancer cells driven by NKX2-1 and GATA6 expression. We accurately predicted the effects of AR inhibition across various drug modalities. Finally, we inferred context-dependent activity of SMARCA4 and recapitulated the unique etiologies of the prostate cancer subtypes. By mapping out the network of key regulators across a multitude of perturbations, Epiregulon can accelerate discovery of new therapeutics targeting transcription factors.

11:30-13:30
Poster Session with Lunch
Format: Live from venue

13:30-14:15
Invited Presentation: Keynote: Deep learning reveals mechanisms of protein-DNA binding
Format: Live from venue

Moderator(s): Roger Pique-Regi

  • Remo Rohs


Presentation Overview:

With the availability of a large amount of experimentally solved three-dimensional structures of transcription factor-DNA complexes and DNA sequence data from high-throughput binding assays, we developed statistical machine learning and deep learning methods that identify features of nucleic acids and proteins that are essential for protein-DNA readout and binding specificity. This talk will introduce such approaches and the new possibilities that such computational approaches provide for answering biological questions that relate to molecular recognition and gene regulation.

14:15-14:30
Session: Transcriptome
Long-read RNA-seq demarcates cis- and trans-directed alternative RNA splicing.
Format: Live from venue

Moderator(s): Roger Pique-Regi

  • Giovanni Quinones-Valdez, UCLA, United States
  • Kofi Amoah, UCLA, United States
  • Grace Xiao, UCLA, United States


Presentation Overview:

RNA splicing is determined by the contribution of cis-acting genomic elements and trans-acting splicing factors; however, the specific contributions of these elements to distinct splicing events remain unclear. Here, we leverage long-read RNA-seq to demarcate splicing events that are primarily cis- or trans-directed, using a new method, isoLASER. IsoLASER performs de novo variant calling, gene-level phasing, and allele-specific transcript expression analysis. We applied isoLASER to data from the ENCODE consortium and identified 2,047 and 4,679 unique exonic regions exhibiting cis-directed splicing in human and mouse tissues, respectively.

While most alternatively spliced exons are predominantly regulated by tissue-specific trans-acting factors, a subset of cis-regulated exons remains similarly spliced across tissues with shared genetic backgrounds. These cis-directed regions exhibited low conservation across vertebrates, and intriguingly, the degree of conservation and genetic linkage were strongly correlated. Among genes harboring cis-directed events, there is an over-representation of immune-related genes, particularly those within the Human Leukocyte Antigen (HLA) family. These findings align with a model suggesting that allele-specific splicing introduces novel modes of phenotypic variation, contributing an added layer of complexity to the intricate processing of antigens.

Cohort-level analysis, facilitated by isoLASER's joint function, capitalizes on linkage information from multiple donors, revealing splicing-associated variants (SAVs). This approach offers an alternative to association studies for cohorts with limited sample sizes. Remarkably, these SAVs demonstrated significant functional enrichment compared to matched controls in splicing quantitative trait loci (sQTL) studies. This enrichment was further explained by the SAVs significantly disrupting the binding of multiple splicing factors. Another distinct advantage of our method is that it uncovers associations independently of the variant’s minor allele frequency. In fact, approximately 7% of the SAVs identified in the human tissues were categorized as rare or ultra-rare in the population.

In summary, we introduce isoLASER, a novel tool for characterizing genetically and non-genetically driven splicing events and associated variants. It exhibits methodological advantages, revealing associations independent of cohort size or minor allele frequency. We also characterize the patterns and biological features of genetically modulated splicing in human and mouse tissues, highlighting notable examples within the HLA family.

14:30-14:45
Session: Transcriptome
Uncovering the single-cell isoform dynamics of the aging mouse brain using long-read nanopore sequencing
Format: Live from venue

Moderator(s): Roger Pique-Regi

  • Abid Rehman, National Institute on Aging, United States
  • Megan Duffy, National Institute on Aging, United States
  • Na Yang, National Institute on Aging, United States
  • Cedric Belair, National Institute on Aging, United States
  • Lin Wang, National Institute on Aging, United States
  • Sulochan Malla, National Institute on Aging, United States
  • Payel Sen, National Institute on Aging, United States
  • Mark Cookson, National Institute on Aging, United States
  • Manolis Maragkakis, National Institute on Aging, United States


Presentation Overview:

Pre-mRNA splicing is a process used by eukaryotic cells to generate a diverse set of transcripts from a single gene. It involves the removal of noncoding introns and the joining of coding exons in varying combinations to generate mature mRNAs. The regulation of alternative splicing and gene isoform expression has been intimately linked to aging. In fact, an energy–splicing resilience axis has been proposed as an adaptive mechanism to support longevity and resilience. The human and mouse brain are among the most diverse organs in terms of isoform expression, and much interest has focused on discovering their splicing programs across aging. Recent developments have allowed the quantification of gene expression at single-cell resolution and led to discoveries that have greatly increased our understanding of gene regulation in different cell types. However, most such methods rely on short-read sequencing and thus cannot capture the long-range information required to characterize alternative splicing programs and quantify isoform diversity at single-cell resolution.

To address this, we couple 10X single-cell technology with nanopore long-read sequencing to capture full-length transcripts and measure isoform diversity in the aging mouse brain. Our analysis of more than 150,000 single cells identified isoform dynamics of the male and female hippocampus and cortex across 4 age points spanning young to extreme-old animals, enabling us to extract patterns of healthy aging. We cluster isoforms into distinct expression trajectories and reveal multidimensional heterogeneity and isoform expression switching across cell type, age, and sex. We find that extreme-old animals exhibit isoform expression patterns that resemble those of younger counterparts, highlighting patterns of healthy aging, especially in neurons. Finally, we show that the aging transcriptome is characterized by an isoform length imbalance that manifests in a cell-type-specific manner. Our work provides the first comprehensive full-length isoform profile of the aging mouse brain at single-cell resolution to date, shedding light on the dynamics of isoform usage and regulation in the context of aging.

14:45-15:00
Session: Transcriptome
dsRID: Editing-free in silico identification of dsRNA region using long-read RNA-seq data
Format: Live from venue

Moderator(s): Roger Pique-Regi

  • Ryo Yamamoto, University of California, Los Angeles, United States
  • Zhiheng Liu, University of California, Los Angeles, United States
  • Mudra Choudhury, University of California, Los Angeles, United States
  • Xinshu Xiao, University of California, Los Angeles, United States


Presentation Overview:

Endogenous dsRNAs, recognized by sensor proteins, falsely activate innate immune responses. Unwanted antiviral signaling can be prevented by Adenosine-to-Inosine (A-to-I) RNA editing. Accumulating evidence suggests that A-to-I editing on dsRNAs affects their immunogenicity, and is implicated in autoimmune diseases such as Alzheimer’s disease (AD).
Since the only known mechanism for A-to-I editing is through enzymes binding to dsRNA, A-to-I editing sites have been used to identify dsRNA regions. However, dsRNA structures that undergo low-level RNA editing may escape from identification by editing-based methods. For example, brain samples from AD patients exhibit a lower level of A-to-I editing globally. Such disease-specific dsRNAs with less editing may be potent activators for immune response impacting the disease condition.

To overcome these limitations, we developed a new approach, named double-stranded RNA Identifier (dsRID), to detect dsRNA regions in an editing-agnostic manner. This method is built upon a previous observation that dsRNA structures may induce region-skipping in RNA-seq reads, an artifact likely reflecting intramolecular template switching in reverse transcription. Leveraging this observation and long-read RNA-seq data, we constructed a machine-learning model that extracts features from mapped reads and outputs predictions of dsRNA regions.

dsRID achieved in silico identification of dsRNA regions independent of editing with high accuracy and precision (average AUC of 0.95, AUPRC of 0.94 across 11 datasets). By applying this method, we identified 32,391 novel dsRNA regions, 1.51 times more than the dsRNAs identified by the editing-based approach. We applied this method to long-read RNA-seq datasets derived from AD and control samples, which predicted novel dsRNAs with low RNA editing levels. Interestingly, a higher fraction of dsRNAs was predicted in AD samples than in control samples (p=0.017), and the overall expression of dsRNAs was higher in AD. This suggests that higher total production of dsRNAs might affect innate immune responses in the AD brain. Furthermore, we observed that dsRNAs found specifically in AD samples have significantly lower editing levels (FC = 0.76, p=4.9e-6), showing that hypoediting happens not only in well-characterized dsRNAs but also in dsRNAs that are not detected by the editing-based approach. Our findings emphasize the importance of identifying dsRNAs without relying on editing and demonstrate the utility of our editing-free dsRNA identification approach for studying dsRNAs associated with immune response and disease.

15:00-15:15
Session: Transcriptome
Pan-tissue Transcriptome Analysis Reveals Sex-dimorphic Human Aging
Format: Live from venue

Moderator(s): Roger Pique-Regi

  • Siqi Wang, Shanghai Institute of Nutrition and Health, CAS, China; Department of Integrative Biology and Physiology, UCLA, United States
  • Danyue Dong, Shanghai Institute of Nutrition and Health, CAS, China
  • Xin Li, Shanghai Institute of Nutrition and Health, CAS, China
  • Zefeng Wang, Shanghai Institute of Nutrition and Health, CAS, China


Presentation Overview:

Sex-dimorphic patterns are exhibited in many complex diseases, giving rise to sex-differential morbidity, mortality, and prognosis. Understanding the underlying mechanisms is crucial for developing effective disease treatments and preventative measures. The known molecular mechanisms of these differences mainly revolve around sex-differential genetics, epigenetics, and gene expression architectures. These mechanisms could be hormone-driven, mitochondria-related, or sex-chromosome-linked, contributing to the differences in human diseases among individuals. It is noteworthy that a large number of sex-dimorphic traits are also age-related, especially in neurodegenerative and cardiovascular diseases. Based on the current understanding, the aging process shows substantial variability between females and males, leading to different disease susceptibility and life expectancy. However, the underlying mechanisms remain unclear, with limited studies in multiple tissues. Moreover, since human transcriptomes show large dynamic changes under different conditions, it remains unknown whether alternative splicing, an essential process for increasing proteomic diversity, also plays a crucial role in sex or age differences.
To address these questions, we systematically analyzed ~17,000 transcriptomes derived from 35 human tissues and evaluated the contributions of sex and age to transcriptomic variation by designing a novel Principal Component-based Signal-to-Variation Ratio (pcSVR). We found that sex and age are critical drivers of global variation and that the age effects differ between females and males. Our findings also revealed extensive sex dimorphisms during aging, with distinct patterns in gene expression (GE) and alternative splicing (AS). Intriguingly, at splicing-isoform resolution, we found that the sex-biased age-associated AS events (sBASEs) have a stronger association with Alzheimer’s disease in males. Most of the sBASEs are regulated during aging by several sex-biased splicing factors, which are controlled by sex hormones. At the gene expression level, our breakpoint analysis showed sex-dimorphic aging rates, with males having larger and earlier transcriptome breakpoints, which is consistent with the decline of sex hormones. Collectively, our systematic study uncovered an essential role of sex during aging at the molecular and multi-tissue levels, providing new insights into sex-dimorphic regulatory patterns and paving the way toward the development of sex-specific therapeutic approaches for age-related diseases.

15:45-16:00
Session: Disease
Identifying associations of rare noncoding variants with autism through integration of gene expression, sequence and sex information
Format: Live from venue

Moderator(s): Amin Emad

  • Runjia Li, UCLA, United States
  • Jason Ernst, UCLA, United States


Presentation Overview: Show

The growth of whole-genome sequencing (WGS) data has facilitated genome-wide identification of rare noncoding variants. However, elucidating these variants’ associations with complex diseases remains challenging.

To better understand these associations, we first revisit a previous report of significant brain-related noncoding association signals of autism spectrum disorder (ASD) detected from de novo variants in the Simons Simplex Collection (SSC) WGS data when using a deep-learning-based pathogenicity score. We demonstrate that local GC content is sufficient to capture association signals in variants near brain-expressed genes similar to those previously reported based on deep learning. Additionally, we show that this local GC content signal is specific to male probands with a female sibling (minimum p-value in brain tissue = 1.3 × 10⁻⁴) compared to male probands with a male sibling (minimum p-value in brain tissue = 0.31). We further show that, among male probands with female siblings, the signal is specific to variants upstream of their assigned TSS (minimum p-value in brain tissue = 3.3 × 10⁻⁶ for upstream variants vs. 0.057 for downstream variants).
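As a rough illustration of what a local GC content feature could look like, the following minimal sketch computes the fraction of G and C bases in a window around a variant position. The function name, the 0-based position convention, and the `flank` parameter are assumptions made for this example; the abstract does not state the window size or the exact definition used by the authors.

```python
def local_gc_content(sequence, position, flank):
    """Fraction of G/C bases in a window centred on `position`.

    `position` is a 0-based index into `sequence`, and `flank` is the number
    of bases taken on each side. Both are hypothetical choices for this
    sketch, not parameters taken from the abstract.
    """
    if not 0 <= position < len(sequence):
        raise ValueError("position must lie inside the sequence")
    start = max(0, position - flank)
    window = sequence[start:position + flank + 1].upper()
    gc_bases = sum(1 for base in window if base in "GC")
    return gc_bases / len(window)


# Example: GC fraction of the 3-base window centred on index 3.
gc_fraction = local_gc_content("AAGCGCAA", 3, 1)
```

Windows that run past the start or end of the sequence are simply truncated here, which is one of several reasonable edge-case conventions.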

Based on these findings, we developed an approach, the k-mer-based gene expression neighborhood test (KGNT), to more systematically consider gene expression and sequence information when testing for association signals. KGNT first organizes variants into “neighborhoods” based on their assigned genes and pairwise gene expression correlations determined from a large compendium of expression data. Then, for each neighborhood, KGNT evaluates whether the proband and sibling variants show differing k-mer compositions. We applied KGNT to the SSC de novo variants upstream of the TSS in male proband-female sibling families to investigate ASD association signals beyond GC content, and from the top neighborhoods we extracted specific k-mers showing proband-sibling associations not explained by GC content. We found that genes associated with the top neighborhoods showed the strongest enrichment for brain-related gene ontology terms. In addition, we examined the chromatin state assignments of the variants in the top neighborhoods and observed differential enrichments between probands and siblings.
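The idea of comparing k-mer compositions between proband and sibling variants within one neighborhood can be sketched as follows. This is only an illustrative comparison of relative k-mer frequencies under assumed inputs (lists of sequence contexts around each variant); it is not the KGNT test statistic, which the abstract does not specify, and the function names are hypothetical.

```python
from collections import Counter


def kmer_frequencies(sequences, k):
    """Relative frequency of every k-mer found across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def composition_difference(proband_seqs, sibling_seqs, k):
    """Per-k-mer difference in relative frequency (proband minus sibling).

    A stand-in summary for illustration only; a real analysis would need a
    proper statistical test and a correction for GC content.
    """
    proband = kmer_frequencies(proband_seqs, k)
    sibling = kmer_frequencies(sibling_seqs, k)
    kmers = set(proband) | set(sibling)
    return {kmer: proband.get(kmer, 0.0) - sibling.get(kmer, 0.0) for kmer in kmers}


# Toy example with one sequence context per group.
diff = composition_difference(["ACGCG"], ["ACATA"], 2)
```

Positive values mark k-mers that are relatively more frequent around proband variants in this toy input, and negative values mark those more frequent around sibling variants.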

In summary, using local GC content, we identified an ASD association signal from de novo noncoding variants upstream of brain-expressed genes in male probands with female siblings, which we were able to further refine and enhance with KGNT.

16:00-16:15
Session: Disease
Molecular Origins of Prostate Cancer Lethality
Format: Live from venue

Moderator(s): Amin Emad

  • Takafumi Yamaguchi, University of California, Los Angeles, United States
  • Kathleen Houlahan, University of California, Los Angeles, United States
  • Helen Zhu, University of Toronto, Canada
  • Natalie Kurganovs, University of Melbourne, Australia
  • Julie Livingstone, University of California, Los Angeles, United States
  • Natalie Fox, University of California, Los Angeles, United States
  • Jiapei Yuan, University of Texas, Southwestern, United States
  • Jocelyn Penington, Walter and Eliza Hall Institute Parkville, Australia
  • Chol-Hee Jung, University of Melbourne, Australia
  • Tommer Schwarz, University of California, Los Angeles, United States
  • Weerachai Jaratlerdsiri, Garvan Institute of Medical Research, Australia
  • Job van Riet, Erasmus MC Cancer Institute, Netherlands
  • Peter Georgeson, University of Melbourne, Australia
  • Stefano Mangiola, Walter and Eliza Hall Institute, Australia
  • Kodi Taraszka, University of California, Los Angeles, United States
  • Robert Lesurf, Ontario Institute for Cancer Research, Canada
  • Jue Jiang, Garvan Institute of Medical Research, Australia
  • Ken Chow, University of Melbourne, Australia
  • Lawrence Heisler, Ontario Institute for Cancer Research, Canada
  • Yu-Jia Shiah, Ontario Institute for Cancer Research, Canada
  • Susmita Ramanand, University of Texas, Southwestern, United States
  • Michael Clarkson, University of Melbourne, Australia
  • Anne Nguyen, University of Melbourne, Australia
  • Shadrielle Melijah Espiritu, Ontario Institute for Cancer Research, Canada
  • Ryan Stuchbery, University of Melbourne, Australia
  • Richard Jovelin, Ontario Institute for Cancer Research, Canada
  • Vincent Huang, Ontario Institute for Cancer Research, Canada
  • Connor Bell, Dana Farber Cancer Institute, United States
  • Edward O'Connor, Dana Farber Cancer Institute, United States
  • Patrick McCoy, University of Melbourne, Australia
  • Christopher Lalansingh, Ontario Institute for Cancer Research, Canada
  • Marek Cmero, University of Melbourne, Australia
  • Adriana Salcedo, University of California, Los Angeles, United States
  • Eva Chan, Garvan Institute of Medical Research, Australia
  • Lydia Liu, University of California, Los Angeles, United States
  • Phillip Stricker, St. Vincent's Hospital, Australia
  • Vinayak Bhandari, University of Toronto, Canada
  • Riana Bornman, University of Pretoria, South Africa
  • Dorota Sendorek, Ontario Institute for Cancer Research, Canada
  • Andrew Lonie, University of Melbourne, Australia
  • Stephenie Prokopec, Ontario Institute for Cancer Research, Canada
  • Michael Fraser, Ontario Institute for Cancer Research, Canada
  • Justin Peters, University of Melbourne, Australia
  • Adrien Foucal, Ontario Institute for Cancer Research, Canada
  • Shingai Mutambirwa, Sefako Makgatho Health Science University, South Africa
  • Lachlan Mcintosh, Walter and Eliza Hall Institute, Australia
  • Michèle Orain, Centre of CHU de Québec-Université Laval, Canada
  • Matthew Wakefield, University of Melbourne, Australia
  • Valérie Picard, Centre of CHU de Québec-Université Laval, Canada
  • Daniel Park, University of Melbourne, Australia
  • Hélène Hovington, Centre of CHU de Québec-Université Laval, Canada
  • Michael Kerger, University of Melbourne, Australia
  • Alain Bergeron, Centre of CHU de Québec-Université Laval, Canada
  • Veronica Sabelnykova, Ontario Institute for Cancer Research, Canada
  • Ji-Heui Seo, Harvard, United States
  • Mark Pomerantz, Harvard, United States
  • Noah Zaitlen, University of California, Los Angeles, United States
  • Sebastian Waszak, Centre for Molecular Medicine Norway, Norway
  • Alexander Gusev, Harvard, United States
  • Louis Lacombe, Centre of CHU de Québec-Université Laval, Canada
  • Yves Fradet, Centre of CHU de Québec-Université Laval, Canada
  • Andrew Ryan, TissuPath Specialist Pathology Services, Australia
  • Amar Kishan, University of California, Los Angeles, United States
  • Martijn Lokema, Erasmus MC Cancer Institute, Netherlands
  • Joachim Weischenfeldt, University of Copenhagen, Denmark
  • Bernard Têtu, Centre of CHU de Québec-Université Laval, Canada
  • Anthony Costello, University of Melbourne, Australia
  • Vanessa Hayes, Garvan Institute of Medical Research, Australia
  • Rayjean Hung, University of Toronto, Canada
  • Housheng He, University of Toronto, Canada
  • John McPherson, University of California, Davis, United States
  • Bogdan Pasaniuc, University of California, Los Angeles, United States
  • Theodorus van der Kwast, University of Toronto, Canada
  • Anthony Papenfuss, Walter and Eliza Hall Institute, Australia
  • Matthew Freedman, Broad Institute, United States
  • Bernard Pope, University of Melbourne, Australia
  • Robert Bristow, University of Manchester, United Kingdom
  • Ram Mani, University of Texas, Southwestern, United States
  • Niall Corcoran, University of Melbourne, Australia
  • Jüri Reimand, University of Toronto, Canada
  • Christopher Hovens, University of Melbourne, Australia
  • Paul Boutros, University of California, Los Angeles, United States


Presentation Overview: Show

Prostate cancer is the most common non-skin malignancy in men worldwide, and its incidence is predicted to rise substantially due to an aging global population. A significant portion of prostate tumors grow slowly and may not require intensive treatment, while a minor fraction can be highly aggressive, with the potential to metastasize and be lethal. However, differentiating between these types of tumors using clinicopathological features such as serum concentrations of prostate-specific antigen (PSA), tumor tissue architecture, and tumor size (T category) often presents challenges. Among the methods to gauge the aggressiveness of prostate tumors, tissue architecture, quantified as ISUP Grade Group, is the most accurate. To elucidate the molecular drivers of prostate cancer lethality, we examined a set of 666 whole-genome sequenced tumor-normal pairs. Our analysis resulted in the development of an exhaustive compendium of driver aberrations, highlighting 223 recurrently mutated genomic regions. Many of these drivers are associated with changed activity of specific mutational signatures and gene expression, illustrating the influence of drivers on downstream mutational processes and transcriptomic dysregulation. Furthermore, a superset of driver events was identified in high-grade tumors, including a higher frequency of BRCA1 and MYC mutations. ISUP Grade-associated driver mutations occur early in tumor evolution, and their occurrence strongly coincides with cancer relapse and metastasis. Our data suggest that the composition of driver events not only shapes the trajectory of tumor evolution but also influences the lethality of prostate tumors.

16:15-16:30
Session: Disease
Characterization of Genomics Landscape and Natural History of Anaplastic Thyroid Cancer using High Depth WGS and Subclonal Reconstruction
Format: Live from venue

Moderator(s): Amin Emad

  • Mao Tian, University of California, Los Angeles, United States
  • Peter Zeng, Department of Otolaryngology - Head and Neck Surgery, Western University, London, Ontario, Canada
  • Stephen Lai, Department of Otolaryngology – Head and Neck Surgery, MD Anderson Cancer Center, Houston, Texas, United States
  • Michelle Williams, Department of Pathology, MD Anderson Cancer Center, Houston, Texas, United States
  • Christopher Howlett, Department of Pathology, Western University, London, Ontario, Canada
  • Paul Plantinga, Department of Pathology, Western University, London, Ontario, Canada
  • Matthew Cecchini, Department of Pathology, Western University, London, Ontario, Canada
  • Jianxin Wang, Center for Computational Research, University at Buffalo, Buffalo, New York, United States
  • Ren Sun, Ontario Institute for Cancer Research, Toronto, Canada
  • John Watson, Ontario Institute for Cancer Research, Toronto, Canada
  • Stephenie Prokopec, Ontario Institute for Cancer Research, Toronto, Canada
  • Reju Korah, Department of Surgery, Yale University, New Haven, Connecticut, United States
  • Tobias Carling, Carling Adrenal Center, Wesley Chapel, Florida, United States
  • Nishant Agrawal, Section of Otolaryngology–Head and Neck Surgery, University of Chicago, Chicago, Illinois, United States
  • Nicole Cipriani, Department of Pathology, University of Chicago, Chicago, United States
  • Cathie Garnis, BC Cancer Agency, Vancouver, British Columbia, Canada
  • Ken Berean, BC Cancer Agency, Vancouver, British Columbia, Canada
  • Norman Nicolson, Department of Surgery, Yale University, New Haven, Connecticut, United States
  • Ying Henderson, Department of Head and Neck Surgery, MD Anderson Cancer Center, Houston, Texas, United States
  • Christopher Lalansingh, Ontario Institute for Cancer Research, Toronto, Canada
  • Melinda Luo, Jonsson Comprehensive Cancer Center, University of California, Los Angeles, United States
  • Shupeng Luxu, Jonsson Comprehensive Cancer Center, University of California, Los Angeles, United States
  • Takafumi Yamaguchi, Jonsson Comprehensive Cancer Center, University of California, Los Angeles, United States
  • Julie Livingstone, Jonsson Comprehensive Cancer Center, University of California, Los Angeles, United States
  • Adriana Salcedo, Jonsson Comprehensive Cancer Center, University of California, Los Angeles, United States
  • Joe Mymryk, London Regional Cancer Program, London, Ontario, Canada
  • Thomas Giordano, Department of Pathology, University of Michigan, Ann Arbor, Michigan, United States
  • John Barrett, Department of Otolaryngology - Head and Neck Surgery, Western University, London, Ontario, Canada
  • Paul Boutros, Department of Human Genetics, University of California, Los Angeles, United States
  • Anthony Nichols, Department of Otolaryngology - Head and Neck Surgery, Western University, London, Ontario, Canada
  • Anthony Gill, Cancer Diagnosis and Pathology Group, Royal North Shore Hospital, St Leonards, NSW 2065, Australia
  • Gary Clayman, The Clayman Thyroid Surgery and Thyroid Cancer Center at The Thyroid Institute at Tampa General Hospital, United States


Presentation Overview: Show

Anaplastic thyroid cancer (ATC) is among the most lethal cancer types, with a median survival of approximately 12 weeks. ATC is resistant to both chemo- and radiotherapy, a characteristic attributed to its surrounding tissues and rapid progression. Although ATC is less common than other thyroid cancers, such as papillary thyroid carcinoma (PTC) and differentiated thyroid carcinoma (DTC), it frequently co-occurs with both PTC and DTC. In this study, we assembled a 108-sample cohort consisting of 46 ATC patients and 5 thyroid cancer cell lines. We conducted high-depth (90x) whole-genome sequencing on ATC samples, as well as on adjacent PTC or DTC tissues, and included blood controls when available. Our analysis identified both germline and somatic variants in ATC, revealing a moderate mutation burden compared to C-type tumors and a higher incidence of recurrent SNVs and CNVs than in PTC or DTC. Notably, ATC samples exhibited more active mutational signatures, such as SBS2 and SBS13, than DTC or PTC samples. Additionally, we employed subclonal reconstruction to model the natural history of ATC in relation to co-occurring DTC and PTC.

16:30-16:45
Session: Disease
Molecular Phenotyping of Rare Peripheral Immune Cells in Alzheimer’s Disease Brain using Single Cell Genomics
Format: Live from venue

Moderator(s): Amin Emad

  • Mai Yamakawa, Department of Neurology, University of California Los Angeles, United States
  • Xia Han, Department of Neurology, University of California Los Angeles, United States
  • Lawrence Chen, Department of Neurology, University of California Los Angeles, United States
  • Vivian Mitri, Department of Neurology, University of California Los Angeles, United States
  • Connor Webb, Department of Neurology, University of California Los Angeles, United States
  • Misty Knight, Department of Neurology, University of California Los Angeles, United States
  • Jessica Rexach, Department of Neurology, University of California Los Angeles, United States


Presentation Overview: Show

Introduction: Alzheimer’s dementia (AD) is a multi-stage neurodegenerative disorder characterized by amyloid accumulation occurring ~10 years earlier than phosphorylated Tau deposits and cognitive impairment. Microglia have been studied for their role in early neuroinflammation against amyloid accumulation; however, recent studies using animal and organoid models have suggested a different mechanism, involving CD8 (+) T cells, as a neuroimmune response against Tau accumulation in the late stage of the disease. This study aims to characterize CD8 (+) T cells in human postmortem AD brains using single-cell transcriptomics.
Methods: Single-nucleus RNA sequencing (snRNAseq) data from the SEA-AD cohort, comprising 89 donors with or without AD, were analyzed with a focus on microglia and CD8 (+) T cells. Analyses included differential gene expression, cell proportion analysis, weighted gene co-expression network analysis (WGCNA), and cell-cell interaction analysis using NATMI and MultiNicheNet. Select ligand-receptor pairs were validated internally and externally with the ROS/MAP snRNAseq dataset and immunohistochemistry.
Results: CD8 (+) T cells were more abundant in AD brains, and their abundance correlated positively with pathological stage. We identified two significantly disease-associated microglial network modules: an amyloid-responsive module and a degeneration-associated module. Cell-cell interaction analysis showed that, among all microglial states, pathology-associated microglia had the most interactions with CD8 (+) T cells. Validation results of select ligand-receptor pairs will be shown at the conference.
Conclusions: We demonstrated an increased abundance of CD8 (+) T cells in postmortem AD brains and identified their interaction partners and the key molecules responsible for these interactions. Further functional validation is needed for drug discovery targeting CD8 (+) T cells and their interactions.

16:45-17:00
Session: Disease
Single-nucleus chromatin accessibility uncovers glia activation of tau dementias
Format: Live from venue

Moderator(s): Amin Emad

  • Xia Han, University of California, Los Angeles, United States
  • Jessica Rexach, University of California, Los Angeles, United States


Presentation Overview: Show

The accumulation of abnormal tau protein can selectively affect different brain regions and specific populations of neurons and glial cells in various tau-related dementias, including Alzheimer’s disease (AD), behavioral variant frontotemporal dementia (bvFTD), and progressive supranuclear palsy (PSP). The fundamental regulatory mechanisms governing selective vulnerability of cell types remain elusive. Here, we present a single-nucleus study of 663,896 nuclei profiling chromatin accessibility from 3 brain regions (precentral gyrus, mid insula and calcarine) of 40 individuals (n = 19 control, n = 22 AD, n = 21 bvFTD, n = 24 PSP). Integrative analysis of transcriptomic and chromatin-accessibility data identified 238,126 cis-regulatory elements (CREs) and 25 functional modules underlying cell-type-specific regulation. We observed that groups of CRE modules are primarily activated in PSP astrocytes and FTD oligodendrocytes, respectively. Additionally, we noted an increase in cellular interactions involving astrocytes in tau pathology, while interactions involving oligodendrocytes were more frequent in FTD. In the mid insula, we identified an astrocyte subcluster (ast.C1) relevant to myelination with increased cell presence in PSP, and an oligodendrocyte subcluster (odc.C7) depleted in FTD. When using causal modeling to align candidate subclusters along the disease cascade, we found that neurodegeneration mediates 52% of the decrease in odc.C7 in FTD, suggesting it as a candidate vulnerable subcluster in FTD. Furthermore, we mapped disease risk variants to regions of chromatin accessibility for cell subclusters, identifying a myelination-related microglia subcluster (mg.C4) with reduced granulin that contributes to FTD risk heritability and is distinct from another, lipid-related microglia subcluster (mg.C11) that contributes to multiple sclerosis.
These findings bolster the multicellular model of neurodegeneration, wherein various glial types interact with disease-specific risk genes, pathology, and one another in disorder-specific cellular networks. This work defines the reproducible and distinct molecular pathological phenotypes of disease, expanding our understanding of the cellular basis of tau pathology in the human brain.