Schedule subject to change
All times listed are in CDT
Wednesday, October 2nd
9:00-9:15
Welcome - Day 2
Format: In Person

Moderator(s): Sushmita Roy



9:15-10:15
Invited Presentation: RSG Keynote 1 - Empirical Bayes matrix factorization, and genomic applications
Confirmed Presenter: Matthew Stephens, University of Chicago, United States

Format: In Person

Moderator(s): Sushmita Roy


Authors List:

  • Matthew Stephens, University of Chicago, United States

Presentation Overview:

Matrix factorization techniques, such as Principal Component Analysis (PCA) and Non-negative Matrix Factorization (NMF), are widely used to help summarize and interpret data that arise in many genomic applications. This talk will review some of the factors that can affect the interpretability of results from matrix factorization techniques, including both non-negativity and sparsity of the matrices involved. We will describe an Empirical Bayes approach to matrix factorization that can be used to induce both non-negativity and sparsity in the representation, and show results from several genomic applications to illustrate these ideas.
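
As a point of reference for the non-negativity constraint mentioned in this overview, the following is a minimal sketch of classical NMF using Lee-Seung multiplicative updates. It is not the Empirical Bayes method described in the talk; the toy matrix `V`, the function names, and the iteration count are all assumptions chosen for illustration.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m).

    Multiplicative updates keep every entry of W and H non-negative,
    because each step multiplies by a ratio of non-negative quantities.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors

# Hypothetical toy data: a 4 x 3 non-negative matrix of exact rank 2.
V = [[1, 2, 0],
     [2, 4, 0],
     [0, 1, 1],
     [0, 3, 3]]
W, H, errors = nmf(V, rank=2)
print("initial error:", errors[0], "final error:", errors[-1])
```

The zero entries that tend to appear in `W` and `H` on data like this hint at why non-negativity and sparsity are often discussed together when interpreting factors.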

10:45-11:00
Valid inference for machine learning-assisted GWAS
Confirmed Presenter: Jiacheng Miao, University of Wisconsin-Madison, United States

Format: In Person

Moderator(s): Sushmita Roy


Authors List:

  • Jiacheng Miao, University of Wisconsin-Madison, United States
  • Yixuan Wu, University of Wisconsin-Madison, United States
  • Zhongxuan Sun, University of Wisconsin-Madison, United States
  • Xinran Miao, University of Wisconsin-Madison, United States
  • Tianyuan Lu, University of Wisconsin-Madison, United States
  • Jiwei Zhao, University of Wisconsin-Madison, United States
  • Qiongshi Lu, University of Wisconsin-Madison, United States

Presentation Overview:

Machine learning (ML) has revolutionized analytical strategies in almost all scientific disciplines including human genetics and genomics. A rising trend in complex trait genetics research is the ML-assisted genome-wide association study (GWAS), which applies advanced ML techniques to predict phenotypes that are difficult/expensive to measure (e.g., undiagnosed diseases, imaging-derived traits, molecular traits in rare tissues), and then conducts GWAS on these ML-imputed outcomes. However, all existing approaches for ML-assisted GWAS treat imputed phenotypes as observed and completely neglect the inherent uncertainty associated with “black-box” ML algorithms. In this work, we first demonstrate the risk of pervasive false positive associations in existing ML-assisted GWAS. Using real data benchmarking, we find that 81% of the associations in GWAS of ML-imputed type 2 diabetes fail to be replicated by the GWAS of ground truth type 2 diabetes. We then introduce POP-GWAS, a principled statistical framework that reimagines ML-assisted GWAS, ensuring valid and powerful results irrespective of the accuracy of imputation, the choice of ML algorithm, and variables used for imputation. It is a versatile tool that can account for binary phenotypes, sample relatedness, and selection bias, making it suitable for broad applications. It also only requires GWAS summary statistics as input and is computationally fast. We even prove that POP-GWAS is the statistically optimal solution to ML-assisted GWAS. Using POP-GWAS, we performed the largest-ever GWAS and rare-variant association analysis on bone mineral density (BMD) derived from dual-energy X-ray absorptiometry imaging (DXA) at 14 skeletal sites, achieving a 9.7%-50.7% gain in effective sample size. We identified 89 novel genome-wide significant loci not previously implicated in BMD GWAS and revealed the skeletal site-specific genetic architecture of BMD. 
It also nominated novel head-specific BMD genes including LGR5 supported by evidence of Lgr5-deleted mice exhibiting a range of craniofacial abnormalities. We further identified 47 novel genes associated with BMD through ML-assisted rare-variant association analysis, pinpointing potential therapeutic targets for osteoporosis. Our framework may fundamentally reshape the analytical strategies in future ML-assisted genetic association studies.

11:00-11:15
Integrating Perturb-seq and GWAS summary statistics identifies core genes for human complex disease
Confirmed Presenter: Stephen Dorn, University of Wisconsin-Madison, United States

Format: In Person

Moderator(s): Sushmita Roy


Authors List:

  • Stephen Dorn, University of Wisconsin-Madison, United States
  • Jiacheng Miao, University of Wisconsin-Madison, United States
  • Zijie Zhao, University of Wisconsin-Madison, United States
  • Yuchang Wu, University of Wisconsin-Madison, United States
  • Qiongshi Lu, University of Wisconsin-Madison, United States

Presentation Overview:

For many diseases, genome-wide association studies (GWAS) have found hundreds of statistically significant SNPs with small effect sizes, which are often hypothesized to have regulatory effects on the expression of nearby genes. However, it remains challenging to identify disease genes with large effects and biological mechanisms underlying infinitesimal associations across many genomic loci. We hypothesize that there are a small number of core genes that are key regulators of many important disease genes with GWAS hits. However, since these core genes have large effects on disease risk, their expression levels and regulatory genomic elements are highly constrained, making them challenging to uncover through GWAS. In Perturb-seq, highly constrained genes can be perturbed, allowing us to experimentally infer these genes’ effects on expression levels. In this work, we present a statistical framework which identifies highly constrained core genes for complex disease through integrated analysis of Perturb-seq and GWAS summary statistics. Our approach measures the concordance of Perturb-seq effects, which quantify the direct impact of perturbations on downstream genes, and the transcriptome-wide gene-level associations for the disease in a matched cell type to identify crucial genes with large, synergistic regulatory effects on many downstream genes implicated in disease risk. We applied our method to study coronary artery disease (CAD) using the largest publicly available CAD GWAS and a Perturb-seq dataset from human-derived endothelial cells (hECs) focusing on genes near known CAD GWAS loci. We identified three significant genes associated with CAD after multiple testing correction, including NPR1 (p=7.24e-06), a gene encoding the receptor for natriuretic peptides ANP and BNP which are the targets of multiple therapeutic drugs for heart failure. 
NPR1 has not previously been implicated in CAD through GWAS, but rare variants in its promoter cause heart failure in humans, and NPR1 knockout in mice and hECs leads to increased vascular endothelial cell adhesion. Our model can be applied in future Perturb-seq studies to associate experimental perturbation effects with complex human traits, providing novel insights into disease mechanisms and therapeutic targets.

11:15-11:30
Supervised learning of enhancer–promoter specificity based on genome-wide perturbation studies highlights areas for improvement in learning
Confirmed Presenter: Mira Han, University of Nevada, Las Vegas, United States

Format: In Person

Moderator(s): Sushmita Roy


Authors List:

  • Dylan Barth, University of Nevada, Las Vegas, United States
  • Richard Van, University of Nevada, Las Vegas, United States
  • Jonathan Cardwell, University of Colorado, United States
  • Mira Han, University of Nevada, Las Vegas, United States

Presentation Overview:

Understanding enhancer-driven transcription is a key challenge in genomics. With the publication of multiple enhancer perturbation assays, we now have sufficient data to predict enhancer-promoter (EP) relationships using data-driven approaches. We applied machine learning to one of the largest enhancer perturbation studies by Gasperini et al., integrating it with transcription factor (TF) and histone modification ChIP-seq data. We used XGBoost for modeling, with feature selection using Boruta and hyperparameter optimization using Optuna.
Our analysis revealed discrepancies between genome-wide data and targeted experiment data. Prediction on genome-wide data improved when TF peaks were included among the features. But when the data were filtered for genes with at least one positive enhancer, additional TF information did not enhance performance; chromatin and contact features were the only information necessary. Factors contributing to successful EP pair prediction included stronger Hi-C contact, shorter distance between enhancer and target transcription start site (TSS), higher target gene expression, elevated H3K27ac at the enhancer, and fewer neighboring enhancers and promoters. The feature importance supported the ABC model and highlighted the significance of relative Hi-C contact strength.
Improved model performance seen in genome-wide data was driven by reduced false positives, identifying genes insensitive to enhancers. Strong H3K27ac, EMSY, and HCFC1 at the TSS predicted negative genes with self-sufficient promoters that are not regulated by distal enhancers. Conversely, well-known hematopoietic and cancer-related TFs such as KLF16, GATA1, GATA2, NR2F2, STAT5A, NFIC, PML, and FOXM1 at enhancers predicted functional EP pairs for the K562 cell line. TFs at enhancers and TSS predicting positive EPs also included chromatin-modifying complex proteins such as NuRD and SWI/SNF. One of the few TF clusters at the enhancer that predicted negative EP pairs was the cluster that included AP-1 complex proteins. We speculate that these are likely robust enhancer hubs not easily perturbed by CRISPR interference of a single enhancer.
Additionally, novel features like enhancer/promoter density were found to be important, revealing gaps in our understanding of how other elements in the region contribute to regulation. Sorting enhancers by contact identified a class of atypical enhancers in high-density genomic regions. Only among the indirect EPs without strong contact did we see a subset of EP pairs that showed the opposite trend from the rest of the data, i.e., stronger positive prediction when the enhancer-promoter density of the region is larger. In summary, integrating genomic assays with enhancer perturbation studies enhanced model accuracy and provided new insights into enhancer-driven transcription.
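
For readers unfamiliar with the ABC (Activity-by-Contact) model referenced in this overview, here is a minimal sketch of how an ABC score is commonly computed: each candidate element's activity (the geometric mean of accessibility and H3K27ac signal) is multiplied by its contact frequency with the gene's promoter, then normalized by the sum over all candidate elements near that gene. The element names and signal values below are hypothetical, and this is not the XGBoost model presented in the talk.

```python
from math import sqrt

def activity(accessibility, h3k27ac):
    """Element activity as the geometric mean of two signal tracks."""
    return sqrt(accessibility * h3k27ac)

def abc_scores(elements):
    """Return the ABC score of each candidate element for one target gene.

    `elements` is a list of dicts with keys 'name', 'accessibility',
    'h3k27ac', and 'contact' (contact frequency with the gene promoter).
    """
    weighted = {
        e["name"]: activity(e["accessibility"], e["h3k27ac"]) * e["contact"]
        for e in elements
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no element has non-zero activity-by-contact")
    return {name: value / total for name, value in weighted.items()}

# Hypothetical candidate elements near a single gene.
candidates = [
    {"name": "e1", "accessibility": 4.0, "h3k27ac": 9.0, "contact": 0.5},
    {"name": "e2", "accessibility": 16.0, "h3k27ac": 4.0, "contact": 0.1},
    {"name": "e3", "accessibility": 1.0, "h3k27ac": 1.0, "contact": 0.2},
]
scores = abc_scores(candidates)
for name, score in scores.items():
    print(name, round(score, 3))
```

Because the score is relative, an element with modest activity can still rank first when its contact with the promoter is much stronger than that of its neighbors, which matches the overview's emphasis on relative Hi-C contact strength.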

11:30-11:45
Airqtl: causal inference of gene regulatory networks from efficient single-cell eQTL mapping
Confirmed Presenter: Lingfei Wang, University of Massachusetts Chan Medical School, United States

Format: In Person

Moderator(s): Sushmita Roy


Authors List:

  • Lingfei Wang, University of Massachusetts Chan Medical School, United States

Presentation Overview:

Gene regulatory networks (GRNs) underpin cell identity, function, and response to external cues. Their systematic reconstruction from data not only improves our understanding of disease mechanisms at the molecular level, but also aids the translational search for therapeutic targets and disease subtypes.
Traditional methods for GRN reconstruction often rely on gene expression correlation, mutual information, or inter-predictivity, which struggle to distinguish causal regulations from reverse causality or confounding. Causal inference from large-scale single-cell perturbation studies provides a reliable solution with improved throughput and statistical power. However, their inferred causal GRNs (cGRNs) are restricted to engineered cell lines from special donors, granting limited generalizability to primary cells or the whole population.

We propose to infer cGRNs from population-scale scRNA-seq studies, where genetic variations serve as natural perturbations for causal inference or Mendelian randomization. This provides a way to systematically infer transcriptome-wide cGRNs for every sampled primary cell type/state and for the whole population. However, it requires mapping cis- and trans- single-cell expression quantitative trait loci (sceQTLs) efficiently and accurately, which is challenged by single-cell sparsity, scalability, and genetic relatedness between cells.

To address these challenges, we introduce airqtl, a highly efficient and accurate method for sceQTL mapping and cGRN inference. Airqtl first normalizes scRNA-seq read counts with Normalisr, which is designed to enable efficient and accurate linear modeling of gene expression compared to generalized linear models. Airqtl then employs linear mixed models to account for genetic relatedness between cells, which is unaccounted for by bulk eQTL mapping methods such as FastQTL. It also includes a novel exact algorithm that leverages the unique characteristics of single-cell data to improve on the computational complexity of traditional methods such as FaST-LMM. Further enhanced by Graphics Processing Units (GPUs), airqtl is eight orders of magnitude faster than existing methods for sceQTL mapping, offering comparable or superior statistical accuracy. Its efficient cis- and trans-sceQTL mapping capabilities have enabled the first inference of cell-type and state-specific cGRNs at the population scale. Moreover, airqtl’s unprecedented speed facilitated the comprehensive benchmarking and iterative improvement of cell-type-specific sceQTL mapping.

Airqtl's acceleration is fundamental and generalizable to other population-scale single-cell studies. We expect airqtl to become an indispensable tool to navigate the complex landscape of modern eQTL analysis and infer multi-modal cGRNs at the single-cell level.
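
To illustrate the general idea of using genetic variants as natural perturbations, the sketch below computes a Wald ratio, one of the simplest Mendelian randomization estimators: the effect of a regulator gene on a target gene is estimated as the variant's effect on the target divided by its effect on the regulator. This is an assumption-laden illustration, not airqtl's algorithm; the function name and the effect sizes are hypothetical, and the standard error uses a first-order approximation that ignores uncertainty in the variant-regulator effect.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Estimate the causal effect of an exposure on an outcome from one variant.

    beta_exposure: effect of the variant on the regulator gene (e.g., a cis-eQTL).
    beta_outcome:  effect of the same variant on the target gene (e.g., a trans-eQTL).
    se_outcome:    standard error of beta_outcome.
    """
    if beta_exposure == 0:
        raise ValueError("the variant must affect the exposure")
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats beta_exposure as known without error.
    standard_error = abs(se_outcome / beta_exposure)
    return estimate, standard_error

# Hypothetical effect sizes for a single variant.
effect, se = wald_ratio(beta_exposure=0.5, beta_outcome=0.2, se_outcome=0.05)
print("estimated regulator -> target effect:", effect, "+/-", se)
```

The estimate is only interpretable as causal under the usual instrumental-variable assumptions, for example that the variant influences the target gene only through the regulator.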

11:45-12:00
GET-nQTL: Gene Embedding Transformed Centrality Based Network QTL (nQTL) Analysis
Confirmed Presenter: Shuchen Yan, University of Wisconsin-Madison, United States

Format: In Person

Moderator(s): Sushmita Roy


Authors List:

  • Shuchen Yan, University of Wisconsin-Madison, United States
  • Zhongxuan Sun, University of Wisconsin-Madison, United States
  • Sunduz Keles, University of Wisconsin-Madison, United States

Presentation Overview:

Genome-wide association studies (GWAS) and molecular quantitative trait loci mapping studies (e.g., eQTL) are hallmark approaches for identifying genetic variants linked to diseases and other phenotypes. In the past several years, population cohort scRNA-seq studies have enabled the estimation of personalized co-expression networks. Recent research also implies that some genetic variants linked to diseases might not be associated with changes in individual gene expression but rather with variations in the co-expression patterns within the gene network, i.e., Network-QTL or nQTL. However, the methodologies currently available for studying nQTL remain limited. We leveraged Dozer, a debiased personalized gene co-expression network estimation method, and conducted nQTL analysis on CD4+ T cell single-cell datasets. Starting with directly testing network centrality changes associated with genetic variants, we developed a network QTL approach named GET-nQTL, which integrates prior biological information represented by gene embeddings from state-of-the-art (SOTA) large language models (LLMs) into network centrality, to identify the effects of genetic variants on gene co-expression network properties. Our approach addresses some challenges that previous methods have inadequately considered, thereby reducing the number of false positives and increasing the power to detect hidden signals of network changes. Similar to centrality-based methods, GET-nQTL can detect changes in the number of neighbors of a given gene. Even when the number of neighbors stays the same, GET-nQTL can capture changes in the correlation strength of the neighbors, which may imply different gene co-expression behaviors related to genetic variants. In conclusion, GET-nQTL discovers nQTLs with genetic variants linked to gene co-expression patterns and their potential association with phenotypes.
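
The distinction drawn in this overview between the number of neighbors and the correlation strength of those neighbors can be made concrete with a small sketch. It compares degree centrality (neighbor count) with strength centrality (sum of absolute correlations) for one gene in two co-expression networks. The matrices, the threshold, and the function name are hypothetical, and this is not the GET-nQTL statistic, which additionally incorporates gene embeddings.

```python
def degree_and_strength(corr, gene, threshold=0.3):
    """Return (neighbor count, summed |correlation|) for one gene.

    A gene j counts as a neighbor when |corr[gene][j]| >= threshold.
    """
    neighbors = [abs(r) for j, r in enumerate(corr[gene])
                 if j != gene and abs(r) >= threshold]
    return len(neighbors), sum(neighbors)

# Hypothetical co-expression matrices for carriers of two genotypes.
corr_ref = [[1.0, 0.4, 0.5, 0.1],
            [0.4, 1.0, 0.2, 0.0],
            [0.5, 0.2, 1.0, 0.3],
            [0.1, 0.0, 0.3, 1.0]]
corr_alt = [[1.0, 0.8, 0.9, 0.1],
            [0.8, 1.0, 0.2, 0.0],
            [0.9, 0.2, 1.0, 0.3],
            [0.1, 0.0, 0.3, 1.0]]

degree_ref, strength_ref = degree_and_strength(corr_ref, gene=0)
degree_alt, strength_alt = degree_and_strength(corr_alt, gene=0)
print("degree:", degree_ref, "vs", degree_alt)
print("strength:", strength_ref, "vs", strength_alt)
```

In this toy case the gene keeps the same two neighbors under both genotypes, so a degree-based test would see no change, while the strength measure picks up the tighter co-expression.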

14:00-15:00
Invited Presentation: RSG Keynote 2 - Compressed representations of epigenetic data
Confirmed Presenter: Maria Chikina

Format: In Person

Moderator(s): Tony Gitter


Authors List:

  • Maria Chikina

Presentation Overview:

One of the grand challenges of modern biology is understanding how the diverse array of cellular phenotypes emerges from a single genome. Large-scale efforts in epigenetic data generation have produced vast amounts of data, which continue to be analyzed, yielding new insights into gene regulation and cellular function. Given that every epigenetic assay is fundamentally the result of biochemical processes interacting with the DNA sequence, a central challenge in epigenetic analysis is inferring the details of these processes from the assay outputs. We approach this problem by seeking compressed representations of epigenetic readouts that capture the key biological features. We will explore several recent models designed to perform this compression task, producing compact and biologically interpretable representations of diverse epigenetic data.

15:00-15:15
SCOPRE: Sub-kilobase Compartment Prediction Using a Long Short Term Memory Model
Confirmed Presenter: Dante Bolzan, La Jolla Institute for Immunology, United States

Format: In Person

Moderator(s): Tony Gitter


Authors List:

  • Dante Bolzan, La Jolla Institute for Immunology, United States
  • Ferhat Ay, La Jolla Institute for Immunology, United States

Presentation Overview:

Chromatin can be broadly classified into two compartments, A and B, that correspond to active and inactive regions, respectively. A/B compartments are identified using pairwise contacts from Hi-C, a proximity ligation-based technique, mainly through PCA. Compartments are commonly called at 50kb to 1Mb resolution. Recently, Harris et al., 2023 called “sub-kb compartments” at an unprecedented 500bp resolution in GM12878 from a Hi-C map containing 33 billion contacts and a new algorithm called POSSUM. The high cost of sequencing tens of billions of reads prohibits the identification of such sub-kb compartments in most other cell lines. To enable this, we developed SCOPRE, a deep learning model that predicts sub-kb compartments from 1D epigenomic tracks and sequence features.
We formulate compartment prediction as a binary classification problem where each 500bp bin is classified as A or B. Each bin is represented by the mean signal of each of the seven epigenomic features (H3K27ac, H3K27me3, H3K36me3, H3K4me1, H3K4me3, H3K9me3, and ATAC-seq) and two genomic features (GC-content and gene overlap). SCOPRE consists of two modules: an LSTM (Long Short Term Memory) and a fully connected layer. LSTMs are a variant of recurrent neural networks that were designed to facilitate learning when to forget and when to remember relevant information in sequential data. SCOPRE’s LSTM module is bidirectional, allowing it to encode dependencies between bins to the left and right, and is multilayered, allowing it to learn complex global dependencies between bins. The sequence of binned features is fed into the LSTM module and the output of the last layer is fed into the fully connected module. The fully connected module applies a linear transformation to this output to assign a compartment label. SCOPRE accurately annotates sub-kilobase compartments using a leave-one-chromosome-out training strategy in two cell lines, achieving a median accuracy of 90% for GM12878 and 94% for HFFc6. SCOPRE significantly outperforms non-sequential baseline models such as logistic regression, SVM, and Random Forest, as well as simple baseline predictors that use one input feature at a time. SCOPRE’s misclassified bins have eigenvector values enriched around zero (i.e., bins with weak compartment association) compared to correctly classified bins, indicating that the model is accurate for bins with well-defined compartment association.
By applying integrated gradients to SCOPRE to determine which features are most salient when predicting on unseen sequences, we uncover cell line-specific as well as locus-specific differences that may help elucidate the relationship between these 1D features and sub-kilobase compartments.
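
The leave-one-chromosome-out strategy mentioned in this overview can be sketched as follows: every chromosome takes one turn as the held-out test set while the model is trained on bins from all other chromosomes, so that no test bin has a same-chromosome neighbor in training. The bin labels and the function name are hypothetical, and the model-fitting step is left out.

```python
def leave_one_chromosome_out(bin_chromosomes):
    """Yield (held_out, train_indices, test_indices) for each chromosome.

    `bin_chromosomes` gives the chromosome label of every genomic bin,
    in the same order as the feature matrix rows.
    """
    for held_out in sorted(set(bin_chromosomes)):
        train = [i for i, c in enumerate(bin_chromosomes) if c != held_out]
        test = [i for i, c in enumerate(bin_chromosomes) if c == held_out]
        yield held_out, train, test

# Hypothetical chromosome labels for six 500bp bins.
bins = ["chr1", "chr1", "chr2", "chr3", "chr3", "chr3"]
folds = list(leave_one_chromosome_out(bins))
for held_out, train, test in folds:
    print(held_out, "train:", train, "test:", test)
```

Splitting by chromosome rather than at random matters for sequential models, since adjacent bins share signal and a random split would leak information between training and test sets.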

15:15-15:30
Identification of regulatory mechanisms enabling escape from X-chromosome inactivation
Confirmed Presenter: Wyeth Wasserman, University of British Columbia, Canada

Format: In Person

Moderator(s): Tony Gitter


Authors List:

  • Wyeth Wasserman, University of British Columbia, Canada
  • Carolyn Brown, University of British Columbia, Canada
  • Samantha Peeters, University of British Columbia, Canada
  • Aditi Srinivasan, University of British Columbia, Canada

Presentation Overview:

In mammalian cells with two copies of the X chromosome, one copy is inactivated by an epigenetic process, forming the compressed Barr body. The process maintains comparable gene dosage between XX and XY cells. A subset of genes on the inactivated X (Xi) chromosome continues to be transcribed, albeit at reduced levels compared to genes on the active X (Xa). Understanding how these genes maintain transcriptional activity will provide insights into the regulatory processes. Using a genetics model, we test specific DNA sequences for their capacity to direct escape expression on the Xi chromosome. Specific essential regions have recently been identified that, when modified, block escape. Using bioinformatics methods, we seek to identify characteristic features of 55 escaping genes compared to a subset of genes subject to inactivation (selected to match properties of the escape genes). Enrichment analysis of predicted transcription factor (TF) motifs and databases of experimentally generated ChIP-seq peaks reveals sets of TFs associated with the escaping genes. An RPS4X “minigene” (i.e., a gene in which large portions of introns have been removed) is introduced which has the capacity to escape on the X chromosome. Swapping the immediate promoter with the promoter of another escape gene (KDM5C) retains escape, while exchange with a ~700bp promoter of a gene subject to inactivation blocks escape. Shuffling a ~1000bp intronic region blocks escape, but shuffling of the latter 500 bp of the region does not, indicating the presence of an essential cis-regulatory feature in the initial 500 bp. Highlighted bioinformatics analyses will include prediction of RNA structure and an analysis of the properties of transcription initiation in the intronic region based on capped RNA tags (CAGE) data.

15:30-15:45
MINTsC learns multi-way chromatin interactions from single cell high throughput chromatin conformation data
Confirmed Presenter: Kwangmoon Park, University of Wisconsin-Madison, United States

Format: In Person

Moderator(s): Tony Gitter


Authors List:

  • Kwangmoon Park, University of Wisconsin-Madison, United States
  • Tianchuan Gao, Indiana University, United States
  • Jingwen Yan, Indiana University, United States
  • Sunduz Keles, University of Wisconsin-Madison, United States

Presentation Overview:

A number of foundational analysis methods have emerged for single cell chromatin conformation (scHi-C) datasets capturing 3D organizations of genomes with pairwise measurements at the single cell or nuclei resolution; however, these datasets are currently under-utilized. The canonical analyses of scHi-C data encompass, beyond standard cell type identification, inference of chromosomal structures and pairwise interactions. However, multi-way regulatory interactions among genomic elements are entirely overlooked. We introduce MINTsC to learn Multi-way INTeractions from single cell Hi-C. MINTsC builds on a Dirichlet-multinomial spline model and yields multi-way interaction scores by aggregating pairwise interactions across cells of a context and summarizing them using order statistics of pairwise test statistics. MINTsC yields well-calibrated p-values for controlling the false discovery rate. Evaluation of MINTsC with scHi-C datasets from cell lines and complex tissues using multiple external genomic and epigenomic datasets supports the multi-way interactions inferred by MINTsC. Application of MINTsC to scHi-C data from human prefrontal cortex revealed multi-way interactions with biological implications of gene regulation by multiple enhancers. Most notably, MINTsC-inferred multi-way interactions demonstrate its potential for probing molecular QTL and association studies for epistatic SNP effects by significantly reducing the multiple-testing burden. MINTsC is publicly available at https://github.com/keleslab/mintsc.
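
Well-calibrated p-values matter because standard false discovery rate procedures assume them. As an illustration, the sketch below applies the Benjamini-Hochberg adjustment to a handful of p-values. The overview does not state which FDR procedure MINTsC uses, so the choice of Benjamini-Hochberg is an assumption, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four candidate multi-way interactions.
p_values = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(p_values)
discoveries = [i for i, q in enumerate(q_values) if q <= 0.05]
print("adjusted:", [round(q, 4) for q in q_values])
print("discoveries at FDR 0.05:", discoveries)
```

The adjustment scales with the number of tests, which is why restricting the hypothesis space to a smaller set of candidate interactions reduces the multiple-testing burden.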

15:45-16:00
The Phylogenetic Dynamic Regulatory Module Networks (P-DRMN) study infers Cis-regulatory features responsible for evolution of mammalian gene regulatory programs in aortic endothelium
Confirmed Presenter: Suvojit Hazra, Wisconsin Institute for Discovery, University of Wisconsin-Madison, United States

Format: In Person

Moderator(s): Tony Gitter


Authors List:

  • Suvojit Hazra, Wisconsin Institute for Discovery, University of Wisconsin-Madison, United States
  • Sara A Knaack, Wisconsin Institute for Discovery, University of Wisconsin-Madison, United States
  • Erika Da-Inn Lee, Wisconsin Institute for Discovery, University of Wisconsin-Madison, United States
  • Liangxi Wang, Genetics and Genome Biology section, Sickkids hospital, University of Toronto, Canada
  • Mohamed Hawash, Genetics and Genome Biology section, Sickkids hospital, University of Toronto, Canada
  • Huayun Hou, Genetics and Genome Biology section, Sickkids hospital, University of Toronto, Canada
  • Michael Wilson, Genetics and Genome Biology section, Sickkids hospital, University of Toronto, Canada
  • Sushmita Roy, Wisconsin Institute for Discovery, University of Wisconsin-Madison, United States

Presentation Overview:

Cis-regulatory elements such as promoters and enhancers are abundant in the genome and drive gene regulatory programs, contributing to morphological diversity across species. Comparative regulatory genomic studies, which collect and compare omic measurements across species, offer a powerful framework to systematically study the role of gene regulation in the evolution of complex traits. Recently, such studies have expanded to collect multi-omic measurements combining transcriptome with epigenome assays. However, phylogenetically aware computational tools to analyze such multi-omic, high-dimensional measurements across species remain scarce. Here, we introduce Phylogenetic Dynamic Regulatory Module Networks (P-DRMN), a novel multi-task regression-based learning algorithm that predicts dynamic gene module regulatory networks using transcriptome (RNA-seq), chromatin accessibility (ATAC-seq), and histone mark (ChIP-seq) data, leveraging phylogenetic relationships. P-DRMN models gene expression at the level of similarly expressing gene modules based on low-to-high expression of input genes and outputs gene regulatory programs for each species, where a gene's membership in a module is based on a regression function of upstream cis-regulatory features. We applied P-DRMN to basal level aortic endothelial cell expression, promoter and motif accessibility, and five histone marks (H3K27ac, H3K36me3, H3K4me3, H3K4me2, H3K27me3) measured in five mammalian species (human, rat, cow, pig, and dog). P-DRMN infers 19-65% conservation of gene modules across species, with the highly expressed modules exhibiting the greatest conservation and the low expression modules exhibiting significant divergence across species. Genes that change their expression state across species were grouped into 103 transitioning gene sets exhibiting diverse phylogenetic trends, including both species- and clade-specific patterns.
Several of these gene sets were further predicted to be regulated by a combination of transcription factors (TFs) and chromatin mark profiles. For instance, one gene set with human-specific high expression was predicted to be regulated by CTCF, while another with pig- and cow-specific high expression involved SHOX2, H3K27me3, and H3K4me3. This indicates that the key TFs of cis-regulatory features and their chromatin architecture differentially orchestrated the aortic endothelial cellular regulatory function and physiology across mammals. Taken together, P-DRMN offers a powerful framework to leverage multi-omics measurements (transcriptome and epigenome) across species to systematically examine the evolution of gene regulatory programs and networks.

16:30-16:45
Accessible chromatin maps of inflammatory bowel disease intestine nominate cell-type mediators of genetic disease risk
Confirmed Presenter: Zi Yang, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, United States

Format: In Person

Moderator(s): Hatice Osmanbeyoglu


Authors List:

  • Zi Yang, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, United States
  • Joseph A. Wayman, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, United States
  • Elizabeth Angerman, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, United States
  • Erin Bonkowski, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, United States
  • Ingrid Jurickova, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, United States
  • Xiaoting Chen, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, United States
  • Anthony T. Bejjani, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, United States
  • Lois Parks, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, United States
  • Sreeja Parameswaran, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, United States
  • Alexander G. Miethke, Cincinnati Children’s Hospital Medical Center; University of Cincinnati College of Medicine, Cincinnati, OH, United States
  • Kelli L. Vandussen, Cincinnati Children’s Hospital Medical Center; University of Cincinnati College of Medicine, Cincinnati, OH, United States
  • Jasbir Dhaliwal, Cincinnati Children’s Hospital Medical Center; University of Cincinnati College of Medicine, Cincinnati, OH, United States
  • Matthew T. Weirauch, Cincinnati Children’s Hospital Medical Center; University of Cincinnati College of Medicine, Cincinnati, OH, United States
  • Leah C. Kottyan, Cincinnati Children’s Hospital Medical Center; University of Cincinnati College of Medicine, Cincinnati, OH, United States
  • Lee A. Denson, Cincinnati Children’s Hospital Medical Center; University of Cincinnati College of Medicine, Cincinnati, OH, United States
  • Emily R. Miraldi, Cincinnati Children’s Hospital Medical Center; University of Cincinnati College of Medicine, Cincinnati, OH, United States

Presentation Overview:

Inflammatory Bowel Disease (IBD) is a chronic and often debilitating autoinflammatory condition, with an increasing incidence in children. Standard-of-care therapies lead to sustained transmural healing and clinical remission in fewer than one-third of patients. For children, TNFa inhibition remains the only FDA-approved biologic therapy, providing an even greater urgency to understanding mechanisms of response. Genome-wide association studies (GWAS) have identified 418 independent genetic risk loci contributing to IBD, yet the majority are noncoding and their mechanisms of action are difficult to decipher. If causal, they likely alter transcription factor (TF) binding and downstream gene expression in particular cell types and contexts. To bridge this knowledge gap, we built a novel resource: multiome-seq (tandem single-nuclei (sn)RNA-seq and chromatin accessibility (snATAC)-seq) of intestinal tissue from pediatric IBD patients, where anti-TNF response was defined by endoscopic healing. From the snATAC-seq data, we generated a first-time atlas of chromatin accessibility (putative regulatory elements) for diverse intestinal cell types in the context of IBD. For cell types/contexts mediating genetic risk, we reasoned that accessible chromatin will co-localize with genetic disease risk loci. We systematically tested for significant co-localization of our chromatin accessibility maps and risk variants for 758 GWAS traits. Globally, genetic risk variants for IBD, autoimmune and inflammatory diseases are enriched in accessible chromatin of immune populations, while other traits (e.g., colorectal cancer, metabolic) are enriched in epithelial and stromal populations. This resource opens new avenues to uncover the complex molecular and cellular mechanisms mediating genetic disease risk.
URL: https://www.biorxiv.org/content/10.1101/2024.02.09.579678v1

16:45-17:00
Predicting neurological disease status using neural somatic variants
Confirmed Presenter: Siddhant Sanghi, University of California, Davis, United States

Format: In Person

Moderator(s): Hatice Osmanbeyoglu


Authors List: Show

  • Siddhant Sanghi, University of California, Davis, United States
  • Aruna Nannapaneni, University of California, Davis, United States
  • Gerald Quon, University of California, Davis, United States

Presentation Overview: Show

Somatic mosaicism, in which different sets of cells harbor different somatic variants that originated post-zygotically due to mutations occurring during development, is prevalent in neuronal cells in the human brain. While a number of previous studies have characterized the number and types of somatic variants found in deeply sequenced brain samples across individuals, there is considerably less work exploring the relationship between these somatic variants and the development of neurological disorders.

In this work, we propose to use language modeling to test the hypothesis that the spectrum of somatic variants found in an individual’s brain is predictive of their neurological case-control status. To do so, we obtained sets of somatic variants identified across two case-control studies of Autism Spectrum Disorder and Tourette’s Syndrome. Using a combination of multimodal single cell datasets and histone modification profiles, we constructed a variant embedding space based on both coding and non-coding genome annotations. We then used a multi-layer transformer encoder to predict case-control status of each individual, given their collection of somatic variants and their respective embeddings.

Overall, we found a predictive signal suggesting systematic differences in somatic mutations between cases and controls during brain development, with embeddings for the allele frequency of each somatic mutation being one of the more predictive features. To further investigate the potential functional impacts of the somatic mutations, we generated embeddings for the mutations using coding and non-coding genomic data. We found disruptions in non-coding cis-regulatory elements were most predictive of disease status, and more predictive than target gene embeddings based on RNA-seq data. These findings in total suggest that somatic variants may play a role in neurological disease risk via disruption of non-coding regulatory elements. We are applying this same framework to determine potential mechanisms through which tumour somatic mutations may ultimately affect clinical phenotypes such as patient outcome.

In conclusion, our approach has the potential to provide a unified framework to study somatic mutations across various diseases.

17:00-17:15
NeuroTD: Deep Learning Approach to Analyze Time Delays in Neural Activities
Confirmed Presenter: Xiang Huang, University of Wisconsin-Madison, United States

Format: In Person

Moderator(s): Hatice Osmanbeyoglu


Authors List: Show

  • Xiang Huang, University of Wisconsin-Madison, United States
  • Athan Li, University of Wisconsin-Madison, United States
  • Noah Kalafut, University of Wisconsin-Madison, United States
  • Sayali Alatkar, University of Wisconsin-Madison, United States
  • Qiping Dong, University of Wisconsin-Madison, United States
  • Qiang Chang, University of Wisconsin-Madison, United States
  • Daifeng Wang, University of Wisconsin-Madison, United States

Presentation Overview: Show

Studying temporal features of neural activities is crucial for understanding the functions of neurons as well as the underlying neural circuits. To this end, recent research employs emerging techniques including calcium imaging in freely behaving animals, Neuropixels, depth electrodes, and Patch-seq to generate multimodal time-series data that depict the activities of single neurons, groups of neurons, and behavior. However, challenges persist, including the analysis of noisy, high-sampling-rate neuronal data and the modeling of temporal dynamics across various modalities.
To address these computational challenges, we developed NeuroTD, a novel deep learning approach to align multimodal time-series datasets and infer cross-modality temporal relationships such as time delays/shifts. In particular, NeuroTD integrates Siamese neural networks with frequency-domain transformations and complex-valued optimization for the inference.
We applied NeuroTD to three multimodal datasets to: (1) analyze electrophysiological (ephys) time series measured by depth electrodes, identifying time delays among neurons across various positions, (2) investigate neural activity and behavioral time series data derived from Neuropixels and 3D motion capture, establishing causal relationships between neural activities and corresponding behavioral activities, and (3) explore gene expression and ephys data of single neurons from Patch-seq, identifying gene expression signatures highly correlated with time shifts in ephys responses.
In conclusion, NeuroTD is a deep learning approach for analyzing multimodal time-series data and inferring time delays across neural activities. NeuroTD is open-source and designed for general use within the computational biology and bioinformatics community.
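
For readers unfamiliar with frequency-domain delay estimation, the following is a minimal sketch of the classical FFT-based cross-correlation baseline. It is not NeuroTD itself, which the abstract describes as combining Siamese networks with frequency-domain transformations. The function name `estimate_delay` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def estimate_delay(x, y, dt=1.0):
    """Estimate how far y trails x (positive result: y is delayed relative to x).

    Classical cross-correlation baseline, NOT the NeuroTD model.
    x, y: 1-D arrays sampled at the same rate; dt: sampling interval.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x) + len(y) - 1            # zero-pad so circular correlation has no wrap-around
    fx = np.fft.rfft(x - x.mean(), n)
    fy = np.fft.rfft(y - y.mean(), n)
    # Correlation theorem: corr[k] = sum_t x[t] * y[t + k]
    corr = np.fft.irfft(fy * np.conj(fx), n)
    # Indices 0..len(y)-1 hold non-negative lags; the remaining indices hold negative lags
    lags = np.concatenate([np.arange(0, len(y)), np.arange(-(len(x) - 1), 0)])
    return lags[np.argmax(corr)] * dt
```

A single peak lag of this kind assumes one global, constant shift between the two signals. Noisy, multimodal recordings such as those described above are presumably where a learned approach becomes useful.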

17:15-17:30
The contribution of silencer variants to human disease
Confirmed Presenter: Ivan Ovcharenko, NIH, United States

Format: In Person

Moderator(s): Hatice Osmanbeyoglu


Authors List: Show

  • Di Huang, NIH, United States
  • Ivan Ovcharenko, NIH, United States

Presentation Overview: Show

Although disease-causal genetic variants have been found within silencer sequences, we still lack a comprehensive analysis of the association of silencers with diseases. Here, we profiled GWAS variants in 2.8 million candidate silencers across 97 human samples derived from a diverse panel of tissues and developmental time points, using deep learning models.

We show that candidate silencers exhibit strong enrichment in disease-associated variants, and several diseases display a much stronger association with silencer variants than enhancer variants. Close to 52% of candidate silencers cluster, forming silencer-rich loci, and, in the loci of Parkinson’s-disease-hallmark genes TRIM31 and MAL, the associated SNPs densely populate clustered candidate silencers rather than enhancers, displaying an overall twofold enrichment in silencers versus enhancers. The disruption of apoptosis in neuronal cells is associated with both schizophrenia and bipolar disorder and can largely be attributed to variants within candidate silencers. Our model permits a mechanistic explanation of causative SNP effects by identifying altered binding of tissue-specific repressors and activators, validated with 70% directional concordance using SNP-SELEX. Narrowing the focus of the analysis to individual silencer variants, experimental data confirm the role of the rs62055708 SNP in Parkinson’s disease, rs2535629 in schizophrenia, and rs6207121 in type 1 diabetes.

In summary, our results indicate that advances in deep learning models for the discovery of disease-causal variants within candidate silencers effectively “double” the number of functionally characterized GWAS variants. This provides a basis for explaining mechanisms of action and designing novel diagnostics and therapeutics.

17:30-17:50
Invited Presentation: Accelerating Bioinformatics Workflows with Interactive High-Performance Computing
Confirmed Presenter: Camilo Buscaron

Format: In person

Moderator(s): Hatice Osmanbeyoglu


Authors List: Show

  • Camilo Buscaron

Presentation Overview: Show

Bioinformatics research is advancing rapidly thanks to techniques like next-generation high-throughput sequencing, computational mass spectrometry, and computational biophysics. Automation tools also play a significant role in scaling these processes. This surge in bioinformatics workloads has led to an unprecedented accumulation of biological data, necessitating high-performance and high-throughput computing technologies to process these massive datasets. Hardware accelerators and massively parallel heterogeneous computing systems are key to speeding up the processing of big data in high-performance environments. By enabling greater degrees of parallelism, these technologies significantly boost computational throughput. In this talk, we will explore the latest architectures that are driving the acceleration and growth of bioinformatics workflows.

Thursday, October 3rd
9:10-9:15
Welcome - Day 3
Format: In person

Moderator(s): Ferhat Ay


Authors List: Show

9:15-10:15
Invited Presentation: RSG Keynote 3 - Integration and exploration of the BRAIN Initiative data to elucidate impacts of 3D chromatin at the single cell level
Confirmed Presenter: Sunduz Keles

Format: In Person

Moderator(s): Ferhat Ay


Authors List: Show

  • Sunduz Keles

Presentation Overview: Show

The advent of single-cell technologies (e.g., sc/snRNA-seq, snATAC-seq, sn-mC-seq) has enhanced the resolution at which we can understand how genetic and environmental factors alter the transcriptomic and epigenomic profiles of cells. These developments are accompanied by advances in mapping of the 3D genome with single-cell high-throughput chromatin conformation sequencing (scHi-C) technologies (sn-3C-seq, dip-C) to elucidate the heterogeneity of long-range chromatin interactions that regulate the molecular programs of cells. Efforts within the BRAIN Initiative have led to multi-modal single-cell datasets that capture the transcriptomic and electrophysiologic profiles (Patch-seq) and morphologies of brain cells, in addition to datasets profiling their 3D chromatin organization and epigenomes. We developed two computational approaches that leverage these rich data from the BRAIN Initiative and explore the impacts of 3D chromatin on cell-level phenotypes. The first approach, GLEAM, integrates multiple single-cell datasets with incomplete or mismatched modalities across cells using a graph neural network approach with link prediction and infers features of the missing modalities. Application of GLEAM highlights connections between 3D chromatin and electrophysiological properties of neurons. In the second approach, we leverage the single-cell nature of the 3D chromatin interactions and infer multi-way chromatin interactions in the cells with MINTsC. Leveraging these interactions within the context of eQTL analysis of prefrontal cortex expression identifies epistasis effects of single nucleotide variants.

10:45-11:00
High-resolution profiling of the tumor microenvironment with spatial ecotypes
Confirmed Presenter: Wubing Zhang, Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA 94305, USA, United States

Format: In Person

Moderator(s): Ferhat Ay


Authors List: Show

  • Wubing Zhang, Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA 94305, USA, United States
  • Erin Brown, Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA 94305, USA, United States
  • Abul Usmani, Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63110, USA, United States
  • Hyun Soo Jeon, Department of Computer Science, Stanford University, Stanford, CA 94305, USA, United States
  • Janella Schwab, Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA 94305, USA, United States
  • Chloé Steen, Department of Medical Genetics, Institute of Clinical Medicine, Oslo University Hospital, Norway
  • Noah Earland, Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63110, USA, United States
  • Qingyuan Cai, School of Life Sciences, Peking University, Beijing 100871, China, China
  • Ryan Fields, Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110, USA, United States
  • David Chen, Division of Dermatology, Washington University School of Medicine, St. Louis, MO 63110, USA, United States
  • Aadel Chaudhuri, Department of Radiation Oncology, Mayo Clinic, Rochester, MN 55905, USA, United States
  • Aaron Newman, Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA 94305, USA, United States

Presentation Overview: Show

Multicellular programs in the tumor microenvironment (TME) drive cancer pathogenesis and response to therapy but remain challenging to identify and profile clinically. Here, we present Spatial EcoTyper, a multimodal machine learning framework for large-scale profiling of spatially dependent cell states and multicellular ecosystems, termed spatial ecotypes (SEs). By integrating five million single-cell and spot-level spatial transcriptomes from 10 diverse human neoplasms, including carcinomas and melanomas, we discovered nine SEs with broad conservation – each with unique biology, TME cell states, geospatial features, and clinical outcome associations. All SEs were well-validated in held-out tumors, whether profiled by spatial transcriptomics or single-cell RNA sequencing. By deconvolving SEs from nearly 1,200 bulk tumor RNA-seq profiles, we found striking associations between SE levels and response to immune checkpoint inhibitors (ICIs), outperforming previously described correlates of ICI response. Notably, baseline levels of a proinflammatory SE localized in the tumor core strongly forecasted ICI benefit, while an SE located at the tumor margin, characterized by myofibroblast and hypoxia-associated endothelial cells, portended ICI resistance. Our data reveal fundamental units of TME organization and demonstrate a platform for large-scale profiling of spatial cellular ecosystems in any tissue, with implications for improved risk stratification and therapy personalization.

11:00-11:15
Comparing Deconvolution and Label Transfer Methods for Spot-Based Spatial Transcriptomics Data to Optimize Experimental and Computational Practice
Confirmed Presenter: Chandra Sekhar Reddy Edula, Georgia Institute of Technology, United States

Format: In Person

Moderator(s): Ferhat Ay


Authors List: Show

  • Chandra Sekhar Reddy Edula, Georgia Institute of Technology, United States
  • Xiuwei Zhang, Georgia Institute of Technology, United States

Presentation Overview: Show

Spatial transcriptomics (ST) technologies enable detailed transcriptome profiling while preserving spatial context, offering insights into tissue architecture and transcriptional patterns. Spot-based ST data lacks single-cell resolution, challenging the identification of cellular heterogeneity. High-resolution wet-lab technologies exist but often at a higher cost, prompting experimentalists to question the necessary spot size. This decision partly depends on the computational tasks performed on ST data. Here, we focus on cell type annotation for spots, encompassing cell type deconvolution and label transfer analyses.
This study compares six methods—three deconvolution methods (Tangram, CARD, and cell2location) and three label transfer methods (scVI, SingleR, and SVM)—that use both spot-based ST and single-cell RNA sequencing (scRNA-seq) data for deconvoluting and labeling spots of varying sizes and average cell counts. We aim to provide practical guidelines for selecting spot sizes and the most appropriate method for specific requirements.
We simulated spots with varying sizes using labeled single-cell ST data. The evaluation included four real-world datasets (Mouse brain, Mouse embryo E16.5, Mouse Gastrulation E7.5, and Mouse olfactory bulb) and considered various metrics, resolutions, and spatial transcriptomics technologies. We evaluated the methods using traditional metrics (accuracy, cosine similarity, and Euclidean distance) and their adjusted versions incorporating an evenness measure to address prediction skew. Traditional metrics often fail to account for prediction diversity, favoring methods that predict all spots as a single cell type. To mitigate this, we incorporated an evenness score to penalize homogeneous predictions.
In this study, we consider two primary tasks: 1) obtaining proportions of cell types in each spot, and 2) assigning one label to each spot. For determining cell type proportions, we recommend using spots with 25+ cells to ensure accurate deconvolution, with Tangram and CARD being the best-performing methods. For the label transfer task, which involves assigning a single label to each spot, a spot size of 5-10 cells is optimal, balancing cost considerations with the resolution needed for effective spatial transcriptomics. Here, SingleR and Tangram emerge as the top-performing methods. These recommendations consider the cost of high-resolution spatial transcriptomics and the need for precise cellular composition data.
This study offers a comprehensive analysis of the strengths and limitations of existing methods for spot-based spatial transcriptomics, providing guidelines for selecting the optimal spot size and the best computational methods so that researchers can make informed decisions tailored to their specific needs.
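
The evenness adjustment described above can be illustrated with a small sketch. The abstract does not state the exact formula, so this example assumes Pielou's evenness (Shannon entropy of the predicted label distribution divided by its maximum) multiplied into plain accuracy. The function names `pielou_evenness` and `adjusted_accuracy` are hypothetical.

```python
import numpy as np

def pielou_evenness(labels, n_types):
    """Normalized Shannon entropy of integer labels (0 = one type only, 1 = uniform).

    Assumed evenness measure; the study's actual definition may differ.
    """
    if n_types < 2:
        return 1.0
    counts = np.bincount(np.asarray(labels), minlength=n_types).astype(float)
    p = counts[counts > 0] / counts.sum()
    entropy = float(np.sum(p * np.log(1.0 / p)))
    return entropy / float(np.log(n_types))

def adjusted_accuracy(y_true, y_pred, n_types):
    """Accuracy scaled by the evenness of the predictions (illustrative combination)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    return accuracy * pielou_evenness(y_pred, n_types)
```

Under this assumed definition, a method that labels every spot as the majority cell type can still score a high raw accuracy, but its adjusted score drops to zero because its predictions have zero evenness.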

11:15-11:30
Spatial Regulatory Landscape of Glioblastoma Tumor Immune Microenvironment
Confirmed Presenter: Hatice Osmanbeyoglu, University of Pittsburgh, United States

Format: In Person

Moderator(s): Ferhat Ay


Authors List: Show

  • Linan Zhang, Ningbo University, China
  • Matthew Lu, University of Pittsburgh, United States
  • Hatice Osmanbeyoglu, University of Pittsburgh, United States

Presentation Overview: Show

Glioblastoma (GBM) stands as a formidable challenge in oncology due to its aggressive nature and limited treatment options. Accounting for 48 percent of all primary malignant brain tumors, GBM claims the lives of over 10,000 individuals annually in the United States. The tumor microenvironment (TME) plays a pivotal role in driving cancer phenotypes, including proliferation, invasion, metastasis, and drug resistance. Spatial transcriptomics (ST) technologies offer a promising avenue to dissect the complex interplay between tumor cells and their microenvironment.
In GBM, the interaction between tumor cells and the immune microenvironment significantly impacts disease outcomes. Transcription factors (TFs) are pivotal in regulating gene expression and orchestrating cellular responses. Ligands from the TME interact with receptors to drive complex signaling pathways, which in turn affect the activity of TFs and consequently influence tumor behavior. We recently developed STAN (Spatially informed Transcription Factor Activity Network), a computational method to predict spot-specific TF activities by utilizing spatial transcriptomics datasets and cis-regulatory information. Importantly, our approach provides a statistically principled framework for identifying TFs associated with cell types/states, spatial domains (e.g. germinal centers), pathological regions and ligand-receptor pairs. Here, we extend this approach to predict spatially informed spot-specific pathway activities (SPAN - Spatially informed Pathway Activity Network). Then, we applied STAN and SPAN to publicly available GBM ST datasets (n=26) to map the regulatory landscape and ligand-receptor interactions within the GBM TME. We unraveled the regulatory networks governing cell states and their associated ligand-receptor interactions in GBM TME. For example, our analysis revealed a significant correlation between STAN-predicted SOX2 activity and expression levels of CD44 receptors and VIM (vimentin) ligand across samples. We validated co-expression of SOX2, VIM, and CD44 at the protein level using independent glioblastoma patient specimens. This validation suggests promising biomedical applications, such as targeted therapies, and encourages further exploration of these regulators in glioblastoma. We are currently extending the validation of our predictions and developing a web server that will provide access to predicted TF and pathway activities, as well as ligand and receptor expression across samples.
Our study holds great promise for advancing our understanding of GBM biology, identifying novel therapeutic targets, and ultimately improving treatment strategies and patient outcomes for this devastating disease. Overall, STAN and SPAN are applicable to any disease type (e.g., autoimmune diseases) or biological system (e.g., immune system) and will enable researchers to leverage spot-based ST datasets to predict context-specific TF and pathway activities.

11:30-11:45
CellNeighborEx V2 identifies the influence of diverse cell-cell communication from Visium data
Confirmed Presenter: Kyoung Jae Won, Cedars-Sinai Medical Center, United States

Format: In Person

Moderator(s): Ferhat Ay


Authors List: Show

  • Hyobin Kim, Cedars-Sinai Medical Center, United States
  • Kyoung Jae Won, Cedars-Sinai Medical Center, United States

Presentation Overview: Show

Cells continuously interact with their microenvironment through various mechanisms, including direct interaction. Direct interaction between different cell types can induce molecular signals that dictate lineage specification and cell fate decisions. However, current approaches to studying cell-cell interactions (CCIs) have relied on ligand-receptor co-expression and cannot effectively study diverse modes of cell-cell interaction.

Our previous work, CellNeighborEx, demonstrated that high-resolution SlideSeq data enables the identification of niche-specific genes influenced by neighboring cell types. These genes differed from those predictable through ligand-receptor interactions alone. For this, we used heterotypic spots (spots with transcriptomes from two different cell types) to study direct cell contact by comparing their transcriptomes with those of homotypic beads. Though this approach was successful, it is still hard to study cell-cell interactions from Visium data beyond what ligand-receptor pairs can dictate, because direct cell contact is not clearly observed in low-resolution Visium data.

To study the influence of CCIs in low-resolution Visium data, we developed CellNeighborEx V2. We identify genes influenced by diverse modes of CCI by comparing expected expression values against observed expression values in Visium data. Using a regression model, we identify the cell types expressing the niche-specific genes. By applying this method to artificial Visium data generated from high-resolution SlideSeq data, we demonstrate the ability of CellNeighborEx V2 to accurately detect both ligand-receptor and direct cell-contact associated genes. Furthermore, through the analysis of multiple public cancer datasets, we unveil a repertoire of novel niche-specific genes that have been overlooked by conventional ligand-receptor-centric approaches.
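
One plausible reading of the expected-versus-observed comparison is sketched below. This is an assumption for illustration, not the CellNeighborEx V2 implementation. It supposes that each spot comes with estimated cell type proportions and that reference mean expression profiles per cell type are available. The function name `expression_residuals` and all argument names are hypothetical.

```python
import numpy as np

def expression_residuals(observed, proportions, type_means):
    """Observed minus expected expression for each spot and gene.

    observed:    (n_spots, n_genes) measured expression
    proportions: (n_spots, n_types) estimated cell type fractions per spot
    type_means:  (n_types, n_genes) reference mean expression per cell type
    Assumes expected expression is a proportion-weighted mix of reference profiles.
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(proportions, dtype=float) @ np.asarray(type_means, dtype=float)
    return observed - expected
```

Large positive residuals in spots where two cell types co-occur would be candidate niche-specific signals under this simplified model. A downstream regression step, as the abstract mentions, would then be needed to attribute them to a cell type.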

11:45-12:00
Spatial Deconvolution of Cell Types and Cell States at Scale Utilizing TACIT
Confirmed Presenter: Khoa Huynh, Virginia Commonwealth University, United States

Format: In Person

Moderator(s): Ferhat Ay


Authors List: Show

  • Khoa Huynh, Virginia Commonwealth University, United States
  • Bruno Matuck, ADA Science & Research Institute, Gaithersburg, MD, USA, United States
  • Katarzyna Tyc, Virginia Commonwealth University, United States
  • Kevin Byrd, ADA Science & Research Institute, Gaithersburg, MD, USA, United States
  • Jinze Liu, Virginia Commonwealth University, United States

Presentation Overview: Show

Identifying cell types and states remains a time-consuming and error-prone challenge for spatial biology. While deep learning is increasingly used, it is difficult to generalize due to variability at the level of cells, neighborhoods, and niches in health and disease. To address this, we developed TACIT, an unsupervised algorithm for cell annotation using predefined signatures that operates without training data, using unbiased thresholding to distinguish positive cells from background and focusing on relevant markers to identify ambiguous cells in multiomic assays. Using five datasets (5,000,000 cells; 51 cell types) from three niches (brain, intestine, gland), TACIT outperformed existing unsupervised methods in accuracy and scalability. Integration of TACIT-identified cells with a novel Shiny app revealed new phenotypes in two inflammatory gland diseases. Finally, using combined spatial transcriptomics and proteomics, we discovered under- and overrepresented immune cell types and states in regions of interest, suggesting multimodality is essential for translating spatial biology to clinical applications.

14:00-15:00
Invited Presentation: RSG Keynote 4 - Understanding the sequence and chromatin determinants of transcription factor binding specificity
Confirmed Presenter: Shaun Mahony

Format: In Person

Moderator(s): Aly Azeem Khan


Authors List: Show

  • Shaun Mahony

Presentation Overview: Show

The DNA-binding activities of transcription factors (TFs) are influenced by both intrinsic sequence preferences and extrinsic interactions with cell-specific chromatin landscapes and other regulatory proteins. Disentangling the roles of sequence and chromatin TF binding determinants remains challenging. For example, the FoxA subfamily of Forkhead domain (Fox) TFs are known pioneer factors that can bind to relatively inaccessible sites during development. Yet FoxA TF binding also varies across cell types, pointing to a combination of intrinsic and extrinsic forces guiding their binding. In general, how sequence and chromatin features combine to influence the DNA-binding activities of TFs has not been systematically characterized. Here, we present a principled approach to compare the relative contributions of intrinsic DNA sequence preference and cell-specific chromatin environments to a TF's DNA-binding activities. We apply our approach to investigate how a selection of Fox TFs vary in their binding specificity. By applying neural networks to interpret the TF binding patterns, we evaluate how sequence and preexisting chromatin features jointly contribute to induced TF binding. We demonstrate that Fox TFs bind different DNA targets, and drive differential gene expression patterns, even when induced in identical chromatin settings. Despite the association between Forkhead domains and pioneering activities, the selected Fox TFs display a wide range of affinities for preexisting chromatin states. Using sequence and chromatin feature attribution techniques to interpret the neural network predictions, we show that differential sequence preferences combined with differential abilities to engage relatively inaccessible chromatin together explain Fox TF binding patterns at individual sites and genome-wide.

15:00-15:15
Utilizing Convolutional Neural Networks to Predict CRISPRi Efficacy
Confirmed Presenter: Renee Napoliello, University of California, Davis, United States

Format: In Person

Moderator(s): Aly Azeem Khan


Authors List: Show

  • Renee Napoliello, University of California, Davis, United States
  • Lucas Ferns, University of California, Davis, United States
  • Hongru Hu, University of California, Davis, United States
  • Lacey Walker, University of California, Davis, United States
  • Gerald Quon, University of California, Davis, United States

Presentation Overview: Show

CRISPRi systems are powerful molecular tools for transcript silencing, yet efficacy varies widely due to complex factors that drive transcriptional regulation. Such factors include promoter epigenetic landscape, sequence-based factors, and gRNA design. Understanding the determinants of CRISPRi perturbation will significantly aid research into universal “rulesets” governing promoters. Existing computational approaches exploring the contributions of some of these factors to CRISPRi-based transcriptional silencing and activation have focused on a handful of hand-engineered genomic features. Here we use a CNN-based framework to predict CRISPRi genome-wide gRNA activity, simultaneously leveraging promoter DNA sequence, epigenetic signals, and gRNA sequence.

Using CRISPRi genome-wide screens from two studies, we constructed a series of models to identify the optimal representation of genomic information from both the target locus and the guide RNA. We first explored the best design principles for the target genomic locus. Using only the gRNA location and transcription start site (TSS) location, the model predicted held-out gRNA activity with a Spearman correlation of 0.4, suggesting gRNA location has strong influence over inhibitory activity. We also found that centering our target locus on the gRNA, together with providing an explicit representation of the location of the transcript, significantly outperformed centering the target locus on the transcription start site, as often is done for models that predict gene expression levels from sequence.

We also identified gRNA features that influenced gRNA activity. For convolutional input we represented the full gRNA, i.e. the protospacer and the scaffold. As features we included sequence, opening energy, secondary structure and overall minimum free energy. Guide features alone were significantly predictive of gRNA activity (Spearman correlation of ~0.5), suggesting gRNA efficacy has some degree of independence from target locus context. We found protospacer sequence was most informative of gRNA activity, while secondary structure, opening energy, and minimum free energy did not add significant predictive power over protospacer sequence alone. Feature attribution analysis indicated heavy emphasis on protospacer features, particularly PAM-adjacent nucleotides. Specific scaffold sub-region opening energy/structure also exhibited a strong gradient effect, which has not been reported previously. These results support our hypothesis that gRNA content and kinetics have an impact on guide performance, and that incorporation of the scaffold is necessary for a complete assessment.
We anticipate further work exploring scPERTURB-seq datasets will enhance our understanding of CRISPRi mechanisms and significantly improve gRNA design for epigenetic interventions.

15:15-15:30
Advancing Cancer Prognosis: FFPE-CUTAC for Comprehensive and Affordable Transcription Profiling in Oncology
Confirmed Presenter: Ye Zheng, The University of Texas MD Anderson Cancer Center, United States

Format: In Person

Moderator(s): Aly Azeem Khan


Authors List: Show

  • Ye Zheng, The University of Texas MD Anderson Cancer Center, United States
  • Steven Henikoff, Fred Hutchinson Cancer Center, United States
  • Kami Ahmad, Fred Hutchinson Cancer Center, United States

Presentation Overview: Show

RNA-seq has been the method of choice for profiling patient samples. However, it predominantly detects genes expressed at moderate-to-high levels and is limited by the fragility of RNA, especially in Formalin-Fixed Paraffin-Embedded (FFPE) sections. We introduce Cleavage Under Targeted Accessible Chromatin (CUTAC) with an antibody to RNA Polymerase II (RNAPII) as a sensitive, cost-effective alternative for directly profiling transcription in FFPEs. Our findings reveal that FFPE-CUTAC libraries are significantly more comprehensive than RNA-seq libraries from both frozen and FFPE clinical sections, capturing low-level transcription of most active genes often missed by RNA-seq. Integrating RNAPII FFPE-CUTAC data from meningioma tumor samples with RNA-seq data from 1,298 patients, we demonstrate that RNAPII occupancy correlates strongly with transcriptional output. Embedding RNAPII FFPE-CUTAC datasets within a meningioma RNA-seq UMAP enhances the prediction of poor clinical outcomes, offering a powerful and economical replacement for RNA-seq in general cancer diagnosis.

Moreover, we observed widespread hypertranscription in human cancers, which correlates with poor prognosis. Using FFPE-CUTAC, we demonstrated genome-wide hypertranscription in transgene-driven mouse gliomas and various human tumors at active regulatory elements. Notably, hypertranscription challenges most normalization strategies that assume equal fragment volumes across samples. Tumor burden varies greatly across patients and tumor types, with significantly more cells in tumors compared to their corresponding normal samples. Within tumor cells, hypertranscription results in substantially higher FFPE-CUTAC fragment count. Assuming equal reads between tumor and normal samples can obscure hypertranscription signals. To address this, we developed a background-driven normalization strategy for FFPE-CUTAC data, accounting for the variations of cell numbers and fragment volumes to ensure accurate comparison across tumor and normal samples. We validated this strategy using both single-cell simulated data and specially designed experiments with known cell proportions and signal fold changes.

Our experimental and computational results affirm that FFPE-CUTAC provides a sensitive and affordable approach for detecting hypertranscription and classifying tumors using small tissue sections, advancing genome-wide strategies for personalized medicine.

15:30-15:45
SCOTCH: A tri-factorization approach for identification of cell states and driving genomic features from single cell omic data
Confirmed Presenter: Spencer Halberg-Spencer, University of Wisconsin-Madison, United States

Format: In Person

Moderator(s): Aly Azeem Khan


Authors List: Show

  • Spencer Halberg-Spencer, University of Wisconsin-Madison, United States
  • Harmon Bhasin, University of Wisconsin-Madison, United States
  • Katherine Mueller, University of Wisconsin-Madison, United States
  • Junha Shin, Wisconsin Institute for Discovery, University of Wisconsin-Madison, United States
  • Sunnie Grace McCalla, University of Wisconsin-Madison, United States
  • Elizabeth Capowski, University of Wisconsin-Madison, United States
  • Stefan Pietrzak, University of Wisconsin-Madison, United States
  • David Gamm, University of Wisconsin-Madison, United States
  • Kris Saha, University of Wisconsin-Madison, United States
  • Rupa Sridharan, University of Wisconsin-Madison, United States
  • Sushmita Roy, University of Wisconsin-Madison, United States

Presentation Overview: Show

Single cell genomics allows researchers to capture high-dimensional genomic feature profiles which characterize each cell. These data have transformed our ability to study heterogeneous populations of cells from diverse tissue, disease, and developmental contexts. Current pipelines for analyzing these data begin by first clustering cells and then performing differential analysis to identify which genomic features (e.g., genes or genomic loci) are cell cluster specific. While differential genomic features are sufficient to identify well characterized cell types from established platforms, this approach may not capture meaningful differences in more challenging systems such as patient-derived organoids. Simultaneous identification of cell clusters and discriminative feature sets provides a more direct route to the key molecular drivers of cellular state, which can be used for cell type annotation.

To address this problem, we developed single-cell orthogonal tri-factorization for clustering high-dimensional data (SCOTCH), which allows for simultaneous identification of cell clusters and genomic feature modules. SCOTCH is a broadly applicable method that can characterize cell types using either single-cell RNA-seq (scRNA-seq) or single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq). We performed comprehensive benchmarking to assess SCOTCH’s ability to cluster cells compared to common scRNA-seq pipelines including scVI, Seurat, and scNMF using two labeled peripheral blood mononuclear cell (PBMC) scRNA-seq datasets. For each method, we compared the ability to recover the original PBMC labels using the Adjusted Rand Index as well as the clustering quality using the inverse Davies-Bouldin Index and Silhouette Index. SCOTCH clustered cells as well as the other methods, while also simultaneously identifying gene modules which define each of these cell types. With our gene modules we were able to characterize and annotate each cell type directly and more accurately than by using differential expression analysis. Next, we applied SCOTCH to scATAC-seq data to define cell clusters and groups of accessible chromatin features from a labeled PBMC scATAC-seq dataset. SCOTCH clustered the scATAC-seq data well compared to the common pipelines ArchR and cisTopic. We utilized modules of accessible gene transcription start sites to characterize and annotate different cell types within the PBMC datasets. We further identified patterns in accessibility across non-genic regions that were both unique and shared between different cell types. Our benchmarking results show that SCOTCH performs comparably to many state-of-the-art pipelines for clustering cells, while providing immediately interpretable gene or accessible chromatin modules which define each cluster's phenotypic state.
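
As a rough illustration of the tri-factorization idea, the sketch below approximates a non-negative cells-by-features matrix X as U S V^T, then reads cell clusters from U and feature modules from V. It is not the SCOTCH implementation: it uses plain multiplicative updates without the orthogonality constraints the method's name implies, and the function name `tri_factorize`, its parameters, and the toy data are all assumptions made for this example.

```python
import numpy as np

def tri_factorize(X, k_cells, k_features, n_iter=200, seed=0, eps=1e-9):
    """Approximate non-negative X (cells x features) as U @ S @ V.T.

    Simplified sketch: standard multiplicative updates for the squared
    Frobenius error, with no orthogonality constraint on U or V.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k_cells))           # cell-to-cluster loadings
    S = rng.random((k_cells, k_features))  # cluster-to-module associations
    V = rng.random((m, k_features))        # feature-to-module loadings
    errors = []
    for _ in range(n_iter):
        U *= (X @ V @ S.T) / (U @ S @ V.T @ V @ S.T + eps)
        V *= (X.T @ U @ S) / (V @ S.T @ U.T @ U @ S + eps)
        S *= (U.T @ X @ V) / (U.T @ U @ S @ V.T @ V + eps)
        errors.append(np.linalg.norm(X - U @ S @ V.T))
    return U, S, V, errors

# Toy matrix (hypothetical): two cell groups, each with its own feature block.
rng = np.random.default_rng(1)
X = 0.1 * rng.random((30, 20))
X[:15, :10] += 1.0
X[15:, 10:] += 1.0

U, S, V, errors = tri_factorize(X, k_cells=2, k_features=2)
cell_clusters = U.argmax(axis=1)    # hard cluster label per cell
feature_modules = V.argmax(axis=1)  # hard module label per feature
```

Because both assignments come out of the same factorization, each cell cluster is tied to its feature modules through S, with no separate differential analysis step.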

15:45-16:00
Detection of PatIent-Level distances from single cell genomics and pathomics data with Optimal Transport (PILOT)
Confirmed Presenter: Mina Shaigan, Institute for Computational Genomics, RWTH University Hospital, Germany

Format: Live Stream

Moderator(s): Aly Azeem Khan


Authors List: Show

  • Mina Shaigan, Institute for Computational Genomics, RWTH University Hospital, Germany
  • Mehdi Joodaki, Institute for Computational Genomics, RWTH University Hospital, Germany
  • Victor Parra, Institute for Computational Genomics, RWTH University Hospital, Germany
  • Roman D. Bülow, Institute of Pathology, RWTH Aachen University Medical School, Aachen, Germany, Germany
  • Christoph Kuppe, Institute of Experimental Medicine and Systems Biology, RWTH Aachen University, Aachen, Germany, Germany
  • David L. Hölscher, Institute of Pathology, RWTH Aachen University Medical School, Aachen, Germany, Germany
  • Mingbo Cheng, Institute for Computational Genomics, RWTH University Hospital, Germany
  • James S. Nagai, Institute for Computational Genomics, RWTH University Hospital, Germany
  • Michaël Goedertier, Institute for Computational Genomics, RWTH University Hospital, Germany
  • Nassim Bouteldja, Institute of Pathology, RWTH Aachen University Medical School, Aachen, Germany, Germany
  • Vladimir Tesar, Department of Nephrology, 1st Faculty of Medicine Charles University Prague Czech Republic, Czechia
  • Jonathan Barratt, John Walls Renal Unit, University Hospital of Leicester National Health Service Trust, Leicester, UK, United Kingdom
  • Ian S.D. Roberts, Oxford University Hospitals National Health Services Foundation Trust Oxford UK, United Kingdom
  • Rosanna Coppo, Fondazione Ricerca Molinette, Regina Margherita Children’s University Hospital, Torino, Italy, Italy
  • Rafael Kramann, Institute of Experimental Medicine and Systems Biology, RWTH Aachen University, Aachen, Germany, Germany
  • Peter Boor, Institute of Pathology, RWTH Aachen University Medical School, Aachen, Germany, Germany
  • Ivan Gesteira Costa, Institute for Computational Genomics, RWTH University Hospital, Germany

Presentation Overview: Show

Single-cell genomics and spatial transcriptome profiling techniques provide an unprecedented opportunity to understand phase transitions and disease progression across samples by capturing molecular, cellular, and structural changes in human tissues. These techniques enable the identification of patients at the early stages of disease and the discovery of potential molecular markers for diagnosis, which can be explored for patient stratification and personalized treatments.
However, deep molecular profiling techniques are multiscale, meaning the data measured for one patient is represented by a distribution of cells, each in a distinct molecular state and cell type. To estimate the similarity of these multiscale molecular profiles between two patients, distinct computational methods and algorithms are required.
Here, we developed PILOT, an interpretable machine learning model for the analysis of multiscale spatial or single-cell data across samples. Using optimal transport, we compute the Wasserstein distance between two individual single-cell samples to define disease trajectories. We then use non-linear regression models to characterize the molecular and cellular changes driving disease trajectories. To lessen the effect of outliers and capture trends in cell clusters, genes, or morphological features through the estimated disease progression, we utilize a robust regression method called Huber regression. This approach can be considered a generalization of cluster-based discrete expression patterns along disease progression, allowing us to detect significant signaling pathways corresponding to each pattern. Additionally, to detect cell type-specific markers, we use the Wald statistical test to identify distinct differential expression patterns for one cell type versus others.
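
The patient-level distance described above can be sketched as a small discrete optimal transport problem. This is a minimal sketch, not PILOT's code, and it assumes that each sample is summarized as a vector of cell-cluster proportions and that a cost matrix between clusters is available. The function name `wasserstein` and the toy numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein(p, q, cost):
    """Optimal transport cost between discrete distributions p and q.

    Solves min <T, cost> subject to T >= 0, row sums of T equal to p,
    and column sums of T equal to q, with T flattened row by row.
    """
    k, l = len(p), len(q)
    row_sums = np.kron(np.eye(k), np.ones((1, l)))  # mass leaving each source bin
    col_sums = np.kron(np.ones((1, k)), np.eye(l))  # mass arriving at each target bin
    result = linprog(
        cost.ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([p, q]),
        bounds=(0, None),
        method="highs",
    )
    return result.fun

# Toy example: cluster proportions for two hypothetical patients.
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
# Assumed cost: absolute difference between cluster indices.
cost = np.abs(np.subtract.outer(np.arange(3), np.arange(3))).astype(float)

d_pq = wasserstein(p, q, cost)
d_pp = wasserstein(p, p, cost)
```

Computing this distance for every pair of samples gives a patient-by-patient distance matrix from which a trajectory ordering could then be derived.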
Finally, we use PILOT to analyze complex, clinically relevant deep molecular profiling data. By analyzing the myocardial infarction spatial atlas, we can recapitulate cellular and molecular changes related to tissue remodeling in patients with ischemia. Additionally, we apply PILOT to digital pathology data, where we have shown that PILOT trajectories can predict patients at risk of kidney failure.

16:30-16:45
Novel mechanism-centric network-based algorithm identifies molecular programs of treatment resistance in cancer
Confirmed Presenter: Sukanya Panja, Rutgers Health, SHP; New York University, United States

Format: In Person

Moderator(s): Aly Azeem Khan


Authors List: Show

  • Sukanya Panja, Rutgers Health, SHP; New York University, United States
  • Mihai Truica, Northwestern University Feinberg School of Medicine, United States
  • Christina Yu, Rutgers Health, SHP, United States
  • Vamshi Saggurthi, Rutgers Health, SHP, United States
  • Michael W. Craige, Rutgers Health, SHP, United States
  • Katie Whitehead, Rutgers Health, SHP, United States
  • Mayra Tuiche, Rutgers Health, SHP, United States
  • Aymen Alsaadi, Rutgers School of Engineering, United States
  • Riddhi Vyas, Rutgers Health, SHP, United States
  • Shridar Ganesan, Rutgers Cancer Institute of New Jersey, United States
  • Suril Gohel, Rutgers Health, SHP, United States
  • Frederick Coffman, Rutgers Health, SHP, United States
  • James Scott Parrott, Rutgers Health, SHP, United States
  • Songhua Quan, Northwestern University Feinberg School of Medicine, United States
  • Shantenu Jha, Rutgers School of Engineering, United States
  • Isaac Kim, Yale School of Medicine, United States
  • Edward Schaeffer, Northwestern University Feinberg School of Medicine, United States
  • Vishal Kothari, Northwestern University Feinberg School of Medicine, United States
  • Sarki Abdulkadir, Northwestern University Feinberg School of Medicine, United States
  • Antonina Mitrofanova, Rutgers University, United States

Presentation Overview: Show

We have developed a novel computational algorithm, TR-2-PATH, that reconstructs a first-of-its-kind mechanism-centric regulatory network, which connects molecular pathways to their upstream transcriptional regulatory programs, and prioritizes them as markers of therapeutic resistance in cancer. Such a network offers a novel way to identify biomarkers that are mechanism-centric, rather than based on individual genes or alterations, and thus a new way to identify functional interactions and valuable therapeutic targets. As a proof of concept, we have applied TR-2-PATH to metastatic castration-resistant prostate cancer (mCRPC). The network mining step addressed a knowledge gap of multi-collinearity among upstream transcriptional regulators (TRs) and identified TR groups that collaborate to regulate downstream pathways. Interrogating this network with signatures of resistance to Enzalutamide, a second-generation androgen-deprivation drug commonly administered to patients with mCRPC, identified a collaboration between the NME2 TR program and MYC molecular pathways as a biomarker of primary resistance to Enzalutamide. In vitro and in vivo experimental validation confirmed cooperation of these mechanisms and demonstrated that their joint therapeutic targeting not only prevents resistance to Enzalutamide but also re-sensitizes Enzalutamide-resistant tumors in vivo, allowing Enzalutamide to work longer. We propose to use MYC and NME2 as markers to identify patients at risk of Enzalutamide resistance and as effective therapeutic targets for patients who failed Enzalutamide. Our novel algorithm is generalizable and could be applied to study a multitude of biologically and clinically important questions, including (but not limited to) therapeutic resistance, metastatic progression, tumor heterogeneity and plasticity across cancer types and in other diseases. This work was published in Nature Communications in January 2024.

16:45-17:00
VAPOR: Variational autoencoder with transport operators decouples concurrent biological processes in development
Confirmed Presenter: Jie Sheng, University of Wisconsin - Madison, United States

Format: In Person

Moderator(s): Aly Azeem Khan


Authors List: Show

  • Jie Sheng, University of Wisconsin - Madison, United States
  • Daifeng Wang, University of Wisconsin - Madison, United States

Presentation Overview: Show

Background: Emerging single cell and spatial transcriptomic data enable the investigation of gene expression dynamics across various biological processes in development. To this end, existing computational methods typically infer trajectories that sequentially order cells to reveal gene expression changes in development, e.g., by assigning each cell a pseudotime that indicates its position in the ordering. However, these trajectories can mix different biological processes that cells undergo simultaneously but on different timescales, such as maturation for specialized function and differentiation into specific cell types. Therefore, a single pseudotime axis may not distinguish gene expression dynamics from concurrent processes.

Methods: We introduce a method, variational autoencoder with transport operators (VAPOR), to decouple dynamic patterns from developmental gene expression data. In particular, VAPOR learns a latent space for gene expression dynamics and decomposes the space into multiple subspaces. The dynamics on each subspace are governed by an ordinary differential equation model, attempting to recapitulate specific biological processes. Furthermore, we can infer the subspace-specific pseudotimes, revealing multidimensional timescales of the distinct processes that cells undergo during development.
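
To make the idea of subspace-specific dynamics concrete, the sketch below assumes a linear transport-operator form, dz/dt = A z, on each latent subspace, so that a state is advanced by a matrix exponential. The abstract does not specify the exact ODE model, so the linear form, the two example operators, and the helper name `flow` are assumptions for illustration only.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical operators, one per latent subspace.
A_rotation = np.array([[0.0, -1.0],
                       [1.0,  0.0]])  # cyclic dynamics in a 2-D subspace
A_decay = np.array([[-1.0]])          # exponential decay in a 1-D subspace

def flow(z0, operator, t):
    """Advance latent state z0 for time t under dz/dt = operator @ z."""
    return expm(operator * t) @ z0

# Each subspace carries its own pseudotime, so the two processes can
# progress by different amounts for the same cell.
z_rot = flow(np.array([1.0, 0.0]), A_rotation, t=np.pi / 2)
z_dec = flow(np.array([1.0]), A_decay, t=1.0)
```

Here the first subspace has completed a quarter turn while the second has decayed by a factor of e, which shows how separate pseudotimes can describe concurrent processes on different timescales.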

Results: Initially tested on synthetic datasets, VAPOR effectively recovered the topology and decoupled distinct dynamic patterns in the data. We then applied VAPOR to a developmental human brain scRNA-seq dataset across postconceptional weeks and identified gene expression dynamics for several key processes, such as differentiation and maturation. Additionally, we applied VAPOR to spatial transcriptomics data from the human dorsolateral prefrontal cortex. VAPOR captured the 'inside-out' pattern across cortical layers, potentially revealing the order in which layers were formed, characterized by their gene expression dynamics.

Conclusion: VAPOR is a new method to parameterize and infer developmental gene expression dynamics. In addition, it can be generalized for other single-cell and spatial omics such as chromatin accessibility to reveal developmental epigenomic dynamics.

17:00-17:15
Accelerating high-throughput characterization of regulatory variants with deep learning
Confirmed Presenter: Pyaree Mohan Dash, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany, Germany

Format: In Person

Moderator(s): Aly Azeem Khan


Authors List: Show

  • Pyaree Mohan Dash, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany, Germany
  • Chengyu Deng, Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA, United States
  • Jay Shendure, Department of Genome Sciences, University of Washington, Seattle, WA, USA, United States
  • Nadav Ahituv, Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA, United States
  • Max Schubach, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany, Germany
  • Martin Kircher, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany, Germany

Presentation Overview: Show

The need for functional characterization of regulatory elements and variants residing in the non-coding genome has given rise to high-throughput assays such as Massively Parallel Reporter Assays (MPRAs). As part of the NIH/NHGRI Impact of Genomic Variation on Function (IGVF) consortium, we aim to quantify activities of over a million regulatory elements and/or variants across different cellular systems.
Here, we characterized >160,000 variants falling within >50,000 cis-regulatory elements (CREs) by testing their MPRA activity in two human cell types. Our candidate CREs include variants prioritized for cell-type-specific or agnostic effects from a convolutional neural network (CNN) model. Our model was pretrained on open and closed chromatin regions, circumventing the common shortage of quantitative effect readouts, which are typically limited to either specific loci or standing variation. With an average library complexity of 24M constructs in our MPRA in HepG2 and HEK293T cells, we observed high reproducibility between three replicates.
The resulting dataset was utilized for re-training our CNN model as a multi-task regressor for prediction of MPRA activity. On hold-out data, our model achieved a state-of-the-art Pearson correlation of 65% in HepG2 and 63% in HEK293T cells. We further used variant cCREs from our library to compute variant effects, which yielded up to 36% Pearson correlation with the model predictions. Benchmarked against large models such as Enformer or DeepSEA Sei, our model outperformed them by 60% on variant effect prediction while offering much faster computation due to the smaller sequence context. From model interpretation of our CNN, we identified transcription factor (TF) motifs for HNF4A and HNF1B specific to HepG2, and SP1, FOSL2, NFYC, USF2 and NRF1 as generally expressed TFs in both cell types. Even though the allelic differences in our MPRA library were not designed deliberately to include standing variation, we observe some high effect alleles as singleton or rare variants (allele frequency < 1%) in gnomAD.
Further analysis of high effect variants within specific TF motifs will be crucial for improving prediction of regulation of gene expression. We are currently integrating additional functional data and predictions that map CREs to the genes they regulate to gain insight into enhancer-gene regulatory effects. We believe that the models and datasets generated here will be an important puzzle piece in the genome-wide interpretation of functional variants.

17:15-17:30
Cell signaling networks discovery from multi-modal data
Confirmed Presenter: Claire Simpson, Cell Signaling Technology, United States

Format: In Person

Moderator(s): Aly Azeem Khan


Authors List: Show

  • Claire Simpson, Cell Signaling Technology, United States
  • Changhan He, University of California, Irvine, United States
  • Ian Cossentino, Cell Signaling Technology, United States
  • Bin Zhang, Cell Signaling Technology, United States
  • Sasha Tkachev, Cell Signaling Technology, United States
  • Tyler Levy, Cell Signaling Technology, United States
  • Anthony Possemato, Cell Signaling Technology, United States
  • Ivan Gregoretti, Cell Signaling Technology, United States
  • Marco Colonna, Washington University School of Medicine, United States
  • Klarisa Rikova, Cell Signaling Technology, United States
  • Qing Nie, University of California, Irvine, United States
  • Darya Orlova, Cell Signaling Technology, United States

Presentation Overview: Show

Deciphering cell signaling networks is crucial for advancing our understanding of basic biology and disease mechanisms and for developing innovative therapeutic interventions. Recent advancements in multi-omics technologies enable us to capture cell signaling information in a more meaningful context. Nevertheless, omics data is complex: it is high-dimensional, heterogeneous, and extensive, making it challenging for the human brain to readily process.

Computational tools that can infer cell signaling networks from multi-omics data independently of prior knowledge are scarce, creating an acute need for such methods. Recent advances in this area include deep learning approaches (such as MultiVI, scMoGNN, scDART) that can effectively integrate and leverage information from multiple data modalities simultaneously. These methods focus on constructing joint representations that capture information from multiple modalities (such as scRNA-seq, scATAC-seq, etc.). However, these approaches do not construct a cell signaling network. Other methods that are independent of prior knowledge of signaling pathways and can be applied to multi-modal data are those based on correlation and stoichiometry scores, such as WGCNA and DMPA, but high stoichiometry scores might be misinterpreted as strong associations, even if the underlying biological relevance is weak.

To address this problem, we developed a method called Incytr (Inference of Cell Signal Transmission) that enables the efficient discovery of cell signaling networks from a combination of diverse data modalities, including scRNAseq, proteomics, phosphoproteomics, and kinase-substrate specificity. The core of the method builds on methods such as ExFINDER (He 2023), which identify cell-type-specific ligand-receptor-signaling molecule-target subpathways from scRNAseq data by conditioning gene co-expression on existing or newly generated knowledge of protein-protein interactions. Our method then calculates the differential expression of these subpathways between experimental groups and optionally allows the user to incorporate additional data modalities. We provide instructions for generating simulated cell-type-specific proteomic intensities from a combination of scRNAseq data and bulk proteomic data, and the method uses the simulated cell-type-specific proteomic data to augment the differential analysis of subpathways. Phosphoproteomics is incorporated in an analogous manner. Additionally, kinase predictions on phosphopeptides identified in the phosphoproteomic data are made using the Kinase Library (Johnson 2023) and combined with kinase expression in scRNAseq to further augment the differential analysis.

We illustrate Incytr’s application in elucidating cell signaling in the contexts of Alzheimer's disease and cancer. Incytr rediscovers known subpathways in these diseases and provides novel hypotheses for cell-type-specific signaling pathways supported by multiple data modalities.