The SciFinder tool lets you search Titles, Authors, and Abstracts of talks and panels. Enter your search term below and your results will be shown at the bottom of the page. You can also click on a track to see all the talks given in that track on that day.

Results

July 24, 2025
8:40-9:20
Invited Presentation: Gene regulation of human cell systems
Confirmed Presenter: Roser Vento-Tormo, Wellcome Sanger Institute, United Kingdom
Track: RegSys: Regulatory and Systems Genomics

Room: 11BC
Moderator(s): Annique Claringbould


Authors List:

  • Roser Vento-Tormo, Wellcome Sanger Institute

Presentation Overview:

The study of human tissues requires a systems biology approach. Their development starts in utero, and during adulthood they change their organization and cell composition. Our team has integrated comprehensive maps of human developing and adult tissues generated by us and others using a combination of single-cell and spatial transcriptomics, chromatin accessibility assays and fluorescent microscopy. We utilise these maps to guide the development and interpretability of in vitro models. To do so, we develop and apply bioinformatic tools that allow us to quantitatively compare both systems and predict changes.

July 24, 2025
9:20-9:40
Proceedings Presentation: Anomaly Detection in Spatial Transcriptomics via Spatially Localized Density Comparison
Confirmed Presenter: Gary Hu, Princeton University, United States
Track: RegSys: Regulatory and Systems Genomics

Room: 11BC
Format: In person
Moderator(s): Annique Claringbould


Authors List:

  • Gary Hu, Princeton University
  • Julian Gold, Princeton University
  • Uthsav Chitra, Broad Institute of MIT and Harvard
  • Sunay Joshi, University of Pennsylvania
  • Benjamin Raphael, Princeton University

Presentation Overview:

Motivation
Perturbations in biological tissues – e.g. due to inflammation, disease, or drug treatment – alter the composition of cell types and cell states in the tissue. These alterations are often spatially localized in different regions of a tissue, and can be measured using spatial transcriptomics technologies. However, current methods to analyze differential abundance in cell types or cell states either do not incorporate spatial information – and thus cannot identify spatially localized alterations – or use heuristic and inaccurate approaches.

Results
We introduce Spatial Anomaly Region Detection in Expression Manifolds (Sardine), a method to estimate spatially localized changes in spatial transcriptomics data obtained from tissue slices from two or more conditions. Sardine estimates the probability of a cell state being at the same (relative) spatial location between different conditions using spatially localized density estimation. On simulated data, Sardine recapitulates the spatial patterning of expression changes more accurately than existing approaches. On a Visium dataset of the mouse cerebral cortex before and after injury response, as well as on a Visium dataset of a mouse spinal cord undergoing electrotherapy, Sardine identifies regions of spatially localized expression changes that are more biologically plausible than those identified by alternative approaches.

July 24, 2025
9:40-10:00
Flash Talk Session 2
Track: RegSys: Regulatory and Systems Genomics

Room: 11BC
Format: In person
Moderator(s): Annique Claringbould


Authors List:

  • Maxime Christophe
  • Gabriela A Merino
  • Erick Isaac Navarro Delgado
  • Tomas Rube

Presentation Overview:

Session with 4 short talks:
Maxime Christophe - Interpretable deep learning reveals sequence determinants of nucleosome positioning in mammalian genomes
Gabriela A Merino - Ensembl’s multispecies catalogue of regulatory elements
Erick Isaac Navarro Delgado - RAMEN: A reproducible framework for dissecting individual, additive and interactive gene-environment contributions in genomic regions with variable DNA methylation
Tomas Rube - Accurate affinity models for SH2 domains from peptide binding assays and free-energy regression

July 24, 2025
11:20-11:40
Proceedings Presentation: GASTON-Mix: a unified model of spatial gradients and domains using spatial mixture-of-experts
Confirmed Presenter: Uthsav Chitra, Princeton University, United States
Track: RegSys: Regulatory and Systems Genomics

Room: 11BC
Format: In person
Moderator(s): Anthony Mathelier


Authors List:

  • Uthsav Chitra, Princeton University
  • Shu Dan, Princeton University
  • Fenna Krienen, Princeton University
  • Ben Raphael, Princeton University

Presentation Overview:

Motivation: Gene expression varies across a tissue due to both the organization of the tissue into spatial domains, i.e. discrete regions of a tissue with distinct cell type composition, and continuous spatial gradients of gene expression within different spatial domains. Spatially resolved transcriptomics (SRT) technologies provide high-throughput measurements of gene expression in a tissue slice, enabling the characterization of spatial gradients and domains. However, existing computational methods for quantifying spatial variation in gene expression either model only spatial domains – and do not account for continuous gradients of expression – or require restrictive geometric assumptions on the spatial domains and spatial gradients that do not hold for many complex tissues.

Results: We introduce GASTON-Mix, a machine learning algorithm to identify both spatial domains and spatial gradients within each domain from SRT data. GASTON-Mix extends the mixture-of-experts (MoE) deep learning framework to a spatial MoE model, combining the clustering component of the MoE model with a neural field model that learns a separate 1-D coordinate (“isodepth”) within each domain. The spatial MoE is capable of representing any geometric arrangement of spatial domains in a tissue, and the isodepth coordinates define continuous gradients of gene expression within each domain. We show using simulations and real data that GASTON-Mix identifies spatial domains and spatial gradients of gene expression more accurately than existing methods. GASTON-Mix reveals spatial gradients in the striatum and lateral septum that regulate complex social behavior, and GASTON-Mix reveals localized spatial gradients of hypoxia and TNF-α signaling in the tumor microenvironment.

July 24, 2025
11:40-12:00
Proceedings Presentation: Refinement Strategies for Tangram for Reliable Single-Cell to Spatial Mapping
Confirmed Presenter: Merle Stahl, Data Science in Systems Biology, TUM School of Life Sciences
Track: RegSys: Regulatory and Systems Genomics

Room: 11BC
Format: In person
Moderator(s): Anthony Mathelier


Authors List:

  • Merle Stahl, Data Science in Systems Biology
  • Lena J. Straßer, Data Science in Systems Biology
  • Chit Tong Lio, Data Science in Systems Biology
  • Judith Bernett, Data Science in Systems Biology
  • Richard Röttger, Department of Mathematics and Computer Science
  • Markus List, Data Science in Systems Biology

Presentation Overview:

Motivation: Single-cell RNA sequencing (scRNA-seq) provides comprehensive gene expression data at a single-cell level but lacks spatial context. In contrast, spatial transcriptomics captures both spatial and transcriptional information but is limited by resolution, sensitivity, or feasibility. No single technology combines both the high spatial resolution and deep transcriptomic profiling at the single-cell level without trade-offs. Spatial mapping tools that integrate scRNA-seq and spatial transcriptomics data are crucial to bridge this gap. However, we found that Tangram, one of the most prominent spatial mapping tools, provides inconsistent results over repeated runs.

Results: We refine Tangram to achieve more consistent cell mappings and investigate the challenges that arise from data characteristics. We find that the mapping quality depends on the gene expression sparsity. To address this, we (1) train the model on an informative gene subset, (2) apply cell filtering, (3) introduce several forms of regularization, and (4) incorporate neighborhood information. Evaluations on real and simulated mouse datasets demonstrate that this approach improves both gene expression prediction and cell mapping. Consistent cell mapping strengthens the reliability of the projection of cell annotations and features into space, gene imputation, and correction of low-quality measurements. Our pipeline, which includes gene set and hyperparameter selection, can serve as guidance for applying Tangram on other datasets, while our benchmarking framework with data simulation and inconsistency metrics is useful for evaluating other tools or Tangram modifications.

Availability: The refinements for Tangram and our benchmarking pipeline are available at https://github.com/daisybio/Tangram_Refinement_Strategies.

July 24, 2025
12:00-12:20
Encoding single-cell chromatin landscapes as probability distributions with optimal transport
Confirmed Presenter: Cassandra Burdziak, Memorial Sloan Kettering Cancer Center, United States
Track: RegSys: Regulatory and Systems Genomics

Room: 11BC
Format: In person
Moderator(s): Anthony Mathelier


Authors List:

  • Cassandra Burdziak, Memorial Sloan Kettering Cancer Center
  • Danielle Maydan, Columbia University
  • Doron Haviv, Memorial Sloan Kettering Cancer Center
  • Marisa Mariani, Memorial Sloan Kettering Cancer Center
  • Ronan Chaligne, Memorial Sloan Kettering Cancer Center
  • Dana Pe'er, Memorial Sloan Kettering Cancer Center

Presentation Overview:

Single-cell measurement of paired epigenetic and transcriptomic features is becoming routine, and promises to license more sophisticated models of gene regulation. Still, most existing models are limited to the cis-regulatory element representation (typically, averaged signal at pre-defined accessibility “peaks”), which shrouds much of the chromatin molecule’s fine-grained structure. To maximize chromatin’s explanatory power for cell-state (and fate) prediction, we sought to achieve a more unbiased, quantitative representation of the chromatin molecule by treating the accessibility landscape as a discrete (per-base pair) probability distribution. Given single-cell accessibility data, our approach embeds the chromatin landscape of each cell state according to the optimal transport (OT) distance between the empirical distribution of accessibility at particular loci, whilst controlling for sequence-related biases in DNA tagmentation. The resulting embeddings capture the precise shape of the accessibility distribution, which itself reflects transcription factor binding footprints, nucleosome positions, and RNA polymerase movement. Application of this model in the well-studied hematopoiesis system highlights its superior ability to explain cell-state: the latent accessibility distribution is more universally predictive of gene expression than promoter accessibility, and can define transcription factor binding modes active in specific branches of development. Most excitingly, position in latent space may closely correspond with the presence of certain activating or repressive chromatin marks, despite the model lacking such information during training. This representation may thus empower future models of gene regulation with a richer representation of epigenetic data with stronger ties to cellular phenotypes.
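
The idea of treating an accessibility landscape as a per-base-pair probability distribution and comparing landscapes with an optimal transport distance can be illustrated with a small sketch. It normalizes hypothetical insertion counts over one locus and computes the 1-D Wasserstein-1 distance from the cumulative distributions. The counts, the function names, and the use of a plain 1-D distance without any tagmentation-bias correction are assumptions made for illustration; this is not the presenters' model.

```python
from itertools import accumulate


def to_distribution(counts):
    """Normalize non-negative per-base counts into a probability vector."""
    total = float(sum(counts))
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return [c / total for c in counts]


def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two distributions on the same
    unit-spaced 1-D grid: the summed absolute difference of their CDFs."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same positions")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


# Hypothetical insertion counts over a 6 bp window for three cell states.
state_a = to_distribution([0, 4, 0, 0, 0, 0])
state_b = to_distribution([0, 0, 0, 4, 0, 0])  # same signal, shifted by 2 bp
state_c = to_distribution([0, 2, 2, 0, 0, 0])  # half the signal shifted by 1 bp

dist_ab = wasserstein_1d(state_a, state_b)
dist_ac = wasserstein_1d(state_a, state_c)
print(dist_ab, dist_ac)
```

Unlike a per-position difference, this distance grows with how far the signal has moved, which is why a transport-based comparison is sensitive to the shape of the landscape rather than only to its total signal.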

July 24, 2025
12:20-12:40
scooby: Modeling multi-modal genomic profiles from DNA sequence at single-cell resolution
Confirmed Presenter: Laura D. Martens, Technical University of Munich, Germany
Track: RegSys: Regulatory and Systems Genomics

Room: 11BC
Format: In person
Moderator(s): Anthony Mathelier


Authors List:

  • Johannes C. Hingerl, Technical University of Munich
  • Laura D. Martens, Technical University of Munich
  • Alexander Karollus, Technical University of Munich
  • Trevor Manz, Harvard Medical School
  • Jason D. Buenrostro, Harvard University
  • Fabian J. Theis, Helmholtz Center Munich
  • Julien Gagneur, Technical University of Munich

Presentation Overview:

Understanding how regulatory sequences shape gene expression across individual cells is a fundamental challenge in genomics. Joint RNA-seq and epigenomic profiling provides opportunities to build models capturing sequence determinants across steps of gene expression. However, current models, developed primarily for bulk omics data, fail to capture the cellular heterogeneity and dynamic processes revealed by single-cell multi-modal technologies. Here, we introduce scooby, a framework to model genomic profiles of scRNA-seq coverage and scATAC-seq insertions from sequence at single-cell resolution. For this, we leverage the pre-trained multi-omics profile predictor Borzoi and equip it with a cell-specific decoder. Scooby recapitulates cell-specific expression levels of held-out genes and identifies regulators and their putative target genes. Moreover, scooby allows resolving single-cell effects of bulk eQTLs and delineating their impact on chromatin accessibility and gene expression. We anticipate that scooby will aid in unraveling the complexities of gene regulation at the resolution of individual cells.

July 24, 2025
12:40-13:00
Proceedings Presentation: Soffritto: a deep-learning model for predicting high-resolution replication timing
Confirmed Presenter: Dante Bolzan, La Jolla Institute for Immunology, United States
Track: RegSys: Regulatory and Systems Genomics

Room: 11BC
Format: In person
Moderator(s): Anthony Mathelier


Authors List:

  • Dante Bolzan, La Jolla Institute for Immunology
  • Ferhat Ay, La Jolla Institute for Immunology

Presentation Overview:

Motivation: Replication Timing (RT) refers to the order in which DNA loci are replicated during S phase. RT is cell-type specific and implicated in cellular processes including transcription, differentiation, and disease. RT is typically quantified genome-wide using two-fraction assays (e.g., Repli-Seq), which sort cells into early and late S phase fractions followed by DNA sequencing, yielding a ratio as the RT signal. While two-fraction RT data is widely available in multiple cell lines, it is limited in its ability to capture high-resolution RT features. To address this, high-resolution Repli-Seq, which quantifies RT across 16 fractions, was developed, but it is costly and technically challenging, with very limited data generated to date.
Results: Here we developed Soffritto, a deep learning model that predicts high-resolution RT data using two-fraction RT data, histone ChIP-seq data, GC content, and gene density as input. Soffritto is composed of a Long Short-Term Memory (LSTM) module and a prediction module. The LSTM module learns long- and short-range interactions between genomic bins, while the prediction module is composed of a fully connected layer that outputs a 16-fraction probability vector for each bin using the LSTM module’s embeddings as input. By performing both within-cell-line and cross-cell-line training and testing for five human and mouse cell lines, we show that Soffritto is able to capture experimental 16-fraction RT signals with high accuracy and that the predicted signals allow detection of high-resolution RT patterns.

July 24, 2025
14:00-14:20
Proceedings Presentation: Detection of Cell-type-specific Differentially Methylated Regions in Epigenome-Wide Association Studies
Confirmed Presenter: Yingying Wei, The Chinese University of Hong Kong, Hong Kong
Track: RegSys: Regulatory and Systems Genomics

Room: 11BC
Format: In person
Moderator(s): Marcel Schulz


Authors List:

  • Ruofan Jia, The Chinese University of Hong Kong
  • Yingying Wei, The Chinese University of Hong Kong

Presentation Overview:

DNA methylation at cytosine-phosphate-guanine (CpG) sites is one of the most important epigenetic markers. Therefore, epidemiologists are interested in investigating DNA methylation in large cohorts through epigenome-wide association studies (EWAS). However, the observed EWAS data are bulk data with signals aggregated from distinct cell types. Deconvolution of cell-type-specific signals from EWAS data is challenging because phenotypes can affect both cell-type proportions and cell-type-specific methylation levels. Recently, there has been active research on detecting cell-type-specific risk CpG sites for EWAS data. However, existing methods all assume that the methylation levels of different CpG sites are independent and perform association detection for each CpG site separately. Although they significantly improve detection at the aggregated level (identifying a CpG site as a risk CpG site as long as it is associated with the phenotype in any cell type), they have low power in detecting cell-type-specific associations for EWAS with typical sample sizes. Here, we develop a new method, Fine-scale inference for Differentially Methylated Regions (FineDMR), to borrow strength from nearby CpG sites to improve the cell-type-specific association detection. Via a Bayesian hierarchical model built upon Gaussian process functional regression, FineDMR takes advantage of the spatial dependencies between CpG sites. FineDMR can provide cell-type-specific association detection as well as output subject-specific and cell-type-specific methylation profiles for each subject. Simulation studies and real data analysis show that FineDMR substantially improves the power in detecting cell-type-specific associations for EWAS data. FineDMR is freely available at https://github.com/JiaRuofan/Detection-of-Cell-type-specific-DMRs-in-EWAS.

July 24, 2025
14:20-14:40
Proceedings Presentation: MutBERT: Probabilistic Genome Representation Improves Genomics Foundation Models
Confirmed Presenter: Weicai Long, Hong Kong University of Science and Technology (Guangzhou), China
Track: RegSys: Regulatory and Systems Genomics

Room: 11BC
Format: In person
Moderator(s): Marcel Schulz


Authors List:

  • Weicai Long, Hong Kong University of Science and Technology (Guangzhou)
  • Houcheng Su, Hong Kong University of Science and Technology (Guangzhou)
  • Jiaqi Xiong, Hong Kong University of Science and Technology (Guangzhou)
  • Yanlin Zhang, Hong Kong University of Science and Technology (Guangzhou)

Presentation Overview:

Motivation: Understanding the genomic foundation of human diversity and disease requires models that effectively capture sequence variation, such as single nucleotide polymorphisms (SNPs). While recent genomic foundation models have scaled to larger datasets and multi-species inputs, they often fail to account for the sparsity and redundancy inherent in human population data, such as those in the 1000 Genomes Project. SNPs are rare in humans, and current masked language models (MLMs) trained directly on whole-genome sequences may struggle to efficiently learn these variations. Additionally, training on the entire dataset without prioritizing regions of genetic variation results in inefficiencies and negligible gains in performance.
Results: We present MutBERT, a probabilistic genome-based masked language model that efficiently utilizes SNP information from population-scale genomic data. By representing the entire genome as a probabilistic distribution over observed allele frequencies, MutBERT focuses on informative genomic variations while maintaining computational efficiency. We evaluated MutBERT against DNABERT-2, various versions of Nucleotide Transformer, and modified versions of MutBERT across multiple downstream prediction tasks. MutBERT consistently ranked as one of the top-performing models, demonstrating that this novel representation strategy enables better utilization of biobank-scale genomic data in building pretrained genomic foundation models.
Availability: https://github.com/ai4nucleome/mutBERT
Contact: yanlinzhang@hkust-gz.edu.cn
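
To make the notion of a probabilistic genome representation concrete, the sketch below encodes each reference position as a distribution over the four nucleotides, with alternate-allele frequencies taking mass away from the reference base. The toy reference, the SNP frequency, and the function name are hypothetical, and this is only one plausible reading of the abstract, not the MutBERT implementation.

```python
NUCLEOTIDES = "ACGT"


def probabilistic_encoding(reference, allele_freqs):
    """Encode a reference sequence as one probability vector per position.

    reference    -- string over A/C/G/T
    allele_freqs -- {position: {alt_base: frequency}} for variable sites
    Positions without an entry reduce to a one-hot vector for the reference.
    """
    encoded = []
    for pos, ref_base in enumerate(reference):
        alts = allele_freqs.get(pos, {})
        alt_total = sum(alts.values())
        if alt_total > 1.0 + 1e-9:
            raise ValueError(f"allele frequencies at {pos} exceed 1")
        row = [0.0] * len(NUCLEOTIDES)
        for base, freq in alts.items():
            row[NUCLEOTIDES.index(base)] += freq
        # The reference allele keeps whatever probability mass remains.
        row[NUCLEOTIDES.index(ref_base)] += 1.0 - alt_total
        encoded.append(row)
    return encoded


# Hypothetical 4 bp reference with one SNP: C>T at position 1, frequency 0.25.
reference = "ACGT"
snps = {1: {"T": 0.25}}
matrix = probabilistic_encoding(reference, snps)
for base, row in zip(reference, matrix):
    print(base, row)
```

Invariant positions stay one-hot, so only sites that vary in the population carry soft probabilities.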

July 24, 2025
14:40-15:00
Detecting and avoiding homology-based data leakage in genome-trained sequence models
Confirmed Presenter: Abdul Muntakim Rafi, University of British Columbia, Canada
Track: RegSys: Regulatory and Systems Genomics

Room: 11BC
Format: In person
Moderator(s): Marcel Schulz


Authors List:

  • Abdul Muntakim Rafi, University of British Columbia
  • Brett Kiyota, University of British Columbia
  • Nozomu Yachie, University of British Columbia
  • Carl de Boer, University of British Columbia

Presentation Overview:

Models that predict function from DNA sequence have become critical tools in deciphering the roles of genomic sequences and genetic variation within them. However, traditional approaches for dividing the genomic sequences into training data, used to create the model, and test data, used to determine the model’s performance on unseen data, fail to account for the widespread homology that permeates the genome. Using models that predict human gene expression from DNA sequence, we demonstrate that model performance on test sequences varies by their similarity with training sequences, consistent with homology-based ‘data leakage’ that influences model performance by rewarding overfitting of homologous sequences. Because the sequence and its function are inextricably linked, even a maximally overfit model with no understanding of gene regulation can predict the expression of sequences that are similar to its training data. To prevent leakage in genome-trained models, we introduce ‘hashFrag’, a scalable solution for partitioning data with minimal leakage. hashFrag improves estimates of model performance and can actually increase model performance by providing improved splits for model training. Altogether, we demonstrate how to account for homology-based leakage when partitioning genomic sequences for model training and evaluation, and highlight the consequences of failing to do so.
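
The general principle of homology-aware partitioning can be sketched as follows: group sequences that look similar, then assign whole groups to either the training or the test set so that no group straddles the split. The abstract does not describe how hashFrag works internally, so everything here is an assumption for illustration: k-mer Jaccard similarity stands in for a proper homology search, and the sequences, threshold, and function names are hypothetical.

```python
from itertools import combinations


def kmers(seq, k=4):
    """Set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_groups(seqs, k=4, threshold=0.5):
    """Single-linkage groups of sequence indices (union-find)."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    profiles = [kmers(s, k) for s in seqs]
    for i, j in combinations(range(len(seqs)), 2):
        if jaccard(profiles[i], profiles[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def split_by_group(seqs, test_fraction=0.3, k=4, threshold=0.5):
    """Assign whole groups to test while they fit the budget, else to train."""
    groups = sorted(similarity_groups(seqs, k, threshold), key=len, reverse=True)
    budget = test_fraction * len(seqs)
    train, test = [], []
    for group in groups:
        target = test if len(test) + len(group) <= budget else train
        target.extend(group)
    return train, test


# Hypothetical toy sequences: two near-duplicate pairs and one singleton.
seqs = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC", "TTTTGGGGCA", "GATTACAGAT"]
groups = similarity_groups(seqs)
train, test = split_by_group(seqs)
print("train:", sorted(train), "test:", sorted(test))
```

A random split of the same five sequences could place one member of a near-duplicate pair in each set, which is the kind of leakage the abstract describes.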

July 24, 2025
15:00-15:20
Predicting gene expression using millions of yeast promoters reveals cis-regulatory logic
Confirmed Presenter: Susanne Bornelöv, University of Cambridge, United Kingdom
Track: RegSys: Regulatory and Systems Genomics

Room: 11BC
Format: In person
Moderator(s): Marcel Schulz


Authors List:

  • Tirtharaj Dash, University of Cambridge
  • Susanne Bornelöv, University of Cambridge

Presentation Overview:

Gene expression is largely controlled by transcription factors and their binding and interactions in gene promoter regions. Early attempts to use deep learning to learn about this gene-regulatory logic were limited to training sets containing naturally occurring promoter sequences. However, using massively parallel reporter assays, potential training data can now be expanded by orders of magnitude, going beyond naturally occurring sequences. Nevertheless, a clear understanding of how to best use deep learning to study gene regulation is still lacking. Here we investigate the complex association between promoters and gene expression in S. cerevisiae using Camformer, a residual convolutional neural network that ranked 4th in the Random Promoter DREAM Challenge 2022. We explore the original Camformer model trained on 6.7 million random promoter sequences and investigate 270 alternative models to determine what factors contribute most to model performance. We show that Camformer accurately decodes the association between promoters and gene expression (r² = 0.914 ± 0.003, ρ = 0.962 ± 0.002) and provides a substantial improvement over previous state of the art. Using explainable AI techniques, such as in silico mutagenesis, we demonstrate that the model learns both individual motifs and their hierarchy. For example, while an IME1 motif on its own increases gene expression, the co-occurrence of IME1 and UME6 motifs strongly reduces gene expression, beyond the repressive effect of UME6 on its own. Thus, we demonstrate that Camformer can be used to provide detailed insights into cis-regulatory logic.
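
In silico mutagenesis, mentioned above as an explainability technique, substitutes every position of a sequence with each alternative base and records how the model's prediction changes. The sketch below shows the procedure with a toy scoring function standing in for a trained model; the motif, the sequence, and all function names are hypothetical, and none of this is Camformer code.

```python
NUCLEOTIDES = "ACGT"


def toy_model(seq, motif="GATA"):
    """Stand-in for a trained expression predictor (an assumption):
    returns the number of occurrences of a hypothetical activating motif."""
    width = len(motif)
    return float(sum(seq[i:i + width] == motif
                     for i in range(len(seq) - width + 1)))


def in_silico_mutagenesis(seq, predict):
    """For each position, map every alternative base to the change in
    prediction relative to the unmutated sequence."""
    baseline = predict(seq)
    effects = []
    for pos, ref_base in enumerate(seq):
        row = {}
        for alt in NUCLEOTIDES:
            if alt == ref_base:
                continue
            mutant = seq[:pos] + alt + seq[pos + 1:]
            row[alt] = predict(mutant) - baseline
        effects.append(row)
    return effects


def position_importance(effects):
    """Largest absolute effect of any substitution at each position."""
    return [max(abs(delta) for delta in row.values()) for row in effects]


sequence = "CCGATACC"  # hypothetical promoter fragment containing one motif
effects = in_silico_mutagenesis(sequence, toy_model)
importance = position_importance(effects)
print(importance)
```

Positions inside the motif stand out because any substitution there changes the score, while flanking positions have no effect; with a real model, the same scan is what exposes learned motifs.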

July 24, 2025
15:20-16:00
Invited Presentation: Learning the Regulatory Genome by Destruction and Creation
Confirmed Presenter: Luca Pinello
Track: RegSys: Regulatory and Systems Genomics

Room: 11BC
Format: In person
Moderator(s): Alejandra Medina Rivera


Authors List:

  • Luca Pinello

Presentation Overview:

The regulatory genome operates through a complex DNA language that controls gene expression. In this keynote, I will present two complementary approaches to decode this language: learning by precise perturbation and learning by generative design.

First, I will introduce CRISPR-CLEAR, which combines dense base editing with sequencing of resulting mutations to map regulatory elements at single-nucleotide resolution. We systematically