Schedule subject to change
All times listed are in BST
Monday, July 21st
11:20-11:40
Proceedings Presentation: Harnessing Deep Learning for Proteome-Scale Detection of Amyloid Signaling Motifs
Confirmed Presenter: Witold Dyrka, Politechnika Wrocławska, Poland

Room: 02N
Format: In person


Authors List:

  • Krzysztof Pysz, Politechnika Wrocławska, Poland
  • Jakub Gałązka, Politechnika Wrocławska, Poland
  • Witold Dyrka, Politechnika Wrocławska, Poland

Presentation Overview:

Amyloid signaling sequences adopt the cross-β fold that is capable of self-replication in the templating process. Propagation of the amyloid fold from the receptor to the effector protein is used for signal transduction in the immune response pathways in animals, fungi and bacteria. So far, a dozen families of amyloid signaling motifs (ASMs) have been classified. Unfortunately, due to the wide variety of ASMs, it is difficult to identify them in the large protein databases available, which limits the possibility of conducting experimental studies. To date, various deep learning (DL) models have been applied across a range of protein-related tasks, including domain family classification and the prediction of protein structure and protein-protein interactions. In this study, we develop tailor-made bidirectional LSTM and BERT-based architectures to model ASMs, and compare their performance against a state-of-the-art machine learning grammatical model. Our research is focused on developing a discriminative model of generalized amyloid signaling motifs, capable of detecting ASMs in large data sets. The DL-based models are trained on a diverse set of motif families and a global negative set, and used to identify ASMs from remotely related families. We analyze how both models represent the data and demonstrate that the DL-based approaches effectively detect ASMs, including novel motifs, even at the genome scale.

11:40-12:00
Proceedings Presentation: From High-Throughput Evaluation to Wet-Lab Studies: Advancing Mutation Effect Prediction with a Retrieval-Enhanced Model
Confirmed Presenter: Bingxin Zhou, Shanghai Jiao Tong University, China

Room: 02N
Format: In person


Authors List:

  • Yang Tan, East China University of Science and Technology, China
  • Ruilin Wang, East China University of Science and Technology, China
  • Banghao Wu, Shanghai Jiao Tong University, China
  • Liang Hong, Shanghai Jiao Tong University, China
  • Bingxin Zhou, Shanghai Jiao Tong University, China

Presentation Overview:

Enzyme engineering is a critical approach for producing enzymes that meet industrial and research demands by modifying wild-type proteins to enhance properties such as catalytic activity and thermostability. Beyond traditional methods like directed evolution and rational design, recent advancements in deep learning offer cost-effective and high-performance alternatives. By encoding implicit coevolutionary patterns, these pre-trained models have become powerful tools for mutation effect prediction, with the central challenge being to uncover the intricate relationships among protein sequence, structure, and function. In this study, we present VenusREM, a retrieval-enhanced protein language model designed to capture local amino acid interactions across both spatial and temporal scales. VenusREM achieves state-of-the-art performance on 217 assays from the ProteinGym benchmark. Beyond high-throughput open benchmark validations, we conducted a low-throughput post-hoc analysis on more than 30 mutants to verify the model’s ability to improve the stability and binding affinity of a VHH antibody. We also validated the practical effectiveness of VenusREM by designing 10 novel mutants of a DNA polymerase and performing wet-lab experiments to evaluate their enhanced activity at elevated temperatures. Both in silico and experimental evaluations not only confirm the reliability of VenusREM as a computational tool for enzyme engineering but also demonstrate a comprehensive evaluation framework for future computational studies in mutation effect prediction. The implementation is publicly available at https://github.com/tyang816/VenusREM.

12:00-12:20
BE3D: A Computational Workflow for Integrative Structure-Function Analysis of Base-Editor Tiling Mutagenesis Data
Confirmed Presenter: Yoochan Myung, Broad Institute of MIT and Harvard, United States

Room: 02N
Format: In person


Authors List:

  • Yoochan Myung, Broad Institute of MIT and Harvard, United States
  • Calvin Hu, Harvard University, United States
  • Surya Mani, Broad Institute of MIT and Harvard, United States
  • Annie Chen, Dana-Farber Cancer Institute, United States
  • Vivian Lu, Broad Institute of MIT and Harvard, United States
  • Brian Liau, Harvard University, United States
  • Guillaume Poncet-Montange, Broad Institute of MIT and Harvard, United States
  • Gabriel Griffin, Dana-Farber Cancer Institute, United States
  • Sumaiya Iqbal, Broad Institute of MIT and Harvard, United States

Presentation Overview:

Understanding functional consequences of single-nucleotide variants is critical for elucidating the genetic basis of diseases, yet current variant screening technologies have limitations. CRISPR base editors (BEs) efficiently generate transition mutations, enabling targeted variant screens. However, interpreting these screens in the context of protein structure-function relationships remains challenging due to technical constraints and biological variability. We introduce BE3D, an integrated workflow to systematically analyze BE tiling mutagenesis data within protein structural contexts. BE3D comprises three modules: (A) BE-QA, assessing screening quality based on biological hypotheses (e.g., knockout vs. neutral guides); (B) BE-Clust3D, identifying hits from BE screening with an expanded coverage using protein 3D structures and highlighting their clusters; and (C) BE-MetaClust3D, aggregating data from multiple screens, enhancing detection of functionally relevant sites across cell lines and species. Applying BE3D to published BE screens on DNMT3A and MEN1, we show that the BE-Clust3D method increased the coverage of functional residues by integrating structural data, yielding up to 3.5-fold improved detection of critical domains in DNMT3A and highlighting crucial drug-binding MEN1 residues (e.g., Met327, Trp346) that are inaccessible to and unidentifiable by BEs due to PAM limitations. Meta-aggregation of MEN1 BE screen readouts from two cell lines (MOLM-13, MV4-11) using BE-MetaClust3D further emphasized a drug-resistant mutational hotspot, achieving a stronger drug-binding site enrichment (3.43-fold) compared to individual screens (average odds ratio 2.2). In summary, BE3D is an open-source, scalable tool for integrative structure-function analysis and interpretation of BE tiling mutagenesis data (GitHub: https://github.com/broadinstitute/beclust3d-public). BE3D is expected to accelerate variant-to-function investigation and the discovery of drug-targetable sites.
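
The enrichment figures in this abstract can be read as odds ratios over a 2×2 table of residues (hit vs. non-hit, inside vs. outside the drug-binding site). The sketch below shows only that arithmetic. It assumes enrichment is measured as a plain odds ratio; the function name and the counts are hypothetical and are not taken from BE3D or its data.

```python
def odds_ratio(hit_in_site, hit_outside, nonhit_in_site, nonhit_outside):
    """Odds ratio for a 2x2 table of screen hits vs. binding-site membership.

    Hypothetical helper for illustration; not part of the BE3D code base.
    """
    if hit_outside == 0 or nonhit_in_site == 0:
        raise ValueError("odds ratio is undefined when an off-diagonal cell is zero")
    return (hit_in_site * nonhit_outside) / (hit_outside * nonhit_in_site)


# Made-up counts: 10 hits inside the site, 30 hits outside,
# 20 non-hits inside the site, 180 non-hits outside.
example_or = odds_ratio(10, 30, 20, 180)
print(f"odds ratio = {example_or:.2f}")
```

An odds ratio above 1 means hits are over-represented in the site; a real analysis would also report a confidence interval or a Fisher's exact test p-value.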

12:20-12:40
Enhanced protein evolution with inverse folding models using structural and evolutionary constraints
Confirmed Presenter: Yunjia Li, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, China

Room: 02N
Format: In person


Authors List:

  • Yunjia Li, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, China
  • Hongyuan Fei, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, China
  • Caixia Gao, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, China

Presentation Overview:

Protein engineering enables artificial protein evolution through iterative sequence changes, but current methods often suffer from low success rates and limited cost-effectiveness. Here, we present AiCE (AI-informed Constraints for protein Engineering), an approach that facilitates efficient protein evolution using generic protein inverse folding models, reducing dependence on human heuristics and task-specific models. By sampling sequences from inverse folding models and integrating structural and evolutionary constraints, AiCE identifies high-fitness single- and multi-mutations. We applied AiCE to eight protein engineering tasks, including deaminases, a nuclear localization sequence, nucleases, and a reverse transcriptase, spanning proteins from tens to thousands of residues, with success rates of 11%-88%. We also developed base editors for precision medicine and agriculture, including enABE8e (5 bp window), enSdd6-CBE (1.3-fold improved fidelity), and enDdd1-DdCBE (up to 14.3-fold enhanced mitochondrial activity). These results demonstrate that AiCE is a versatile, user-friendly mutation-design method that outperforms conventional approaches in efficiency, scalability, and generalizability.

12:40-13:00
Proceedings Presentation: Precise Prediction of Hotspot Residues in Protein-RNA Complexes Using Graph Attention Networks and Pre-trained Protein Language Models
Confirmed Presenter: Siyuan Shen, Central South University, China

Room: 02N
Format: In person


Authors List:

  • Siyuan Shen, Central South University, China
  • Jie Chen, Xinjiang University, China
  • Zhijian Huang, Central South University, China
  • Yuanpeng Zhang, Xinjiang University, China
  • Ziyu Fan, Central South University, China
  • Yuting Kong, Xinjiang Institute of Engineering, China
  • Lei Deng, Central South University, China

Presentation Overview:

Motivation: Protein-RNA interactions play a pivotal role in biological processes and disease mechanisms, with hotspot residues being critical for targeted drug design. Traditional experimental methods for identifying hotspot residues are often inefficient and expensive. Moreover, many existing prediction methods rely heavily on high-resolution structural data, which may not always be available. Consequently, there is an urgent need for an accurate and efficient sequence-based computational approach for predicting hotspot residues in protein-RNA complexes.
Results: In this study, we introduce DeepHotResi, a sequence-based computational method designed to predict hotspot residues in protein-RNA complexes. DeepHotResi leverages a pre-trained protein language model to predict protein structure and generate an amino acid contact map. To enhance feature representation, DeepHotResi integrates the Squeeze-and-Excitation (SE) module, which processes diverse amino acid-level features. Next, it constructs an amino acid feature network from the contact map and SE-Module-derived features. Finally, DeepHotResi employs a Graph Attention Network (GAT) to model hotspot residue prediction as a graph node classification task. Experimental results demonstrate that DeepHotResi outperforms state-of-the-art methods, effectively identifying hotspot residues in protein-RNA complexes with superior accuracy on the test set.

14:00-14:20
Proceedings Presentation: Trustworthy Causal Biomarker Discovery: A Multiomics Brain Imaging Genetics based Approach
Confirmed Presenter: Jin Zhang, Northwestern Polytechnical University, China

Room: 02N
Format: In person


Authors List:

  • Jin Zhang, Northwestern Polytechnical University, China
  • Yan Yang, Northwestern Polytechnical University, China
  • Muheng Shang, Northwestern Polytechnical University, China
  • Lei Guo, Northwestern Polytechnical University, China
  • Daoqiang Zhang, Nanjing University of Aeronautics and Astronautics, China
  • Lei Du, Northwestern Polytechnical University, China

Presentation Overview:

Discovering genetic variations underpinning brain disorders is important to understand their pathogenesis. Indirect associations or spurious causal relationships pose a threat to the reliability of biomarker discovery for brain disorders, potentially misleading or incurring bias in subsequent decision-making. Unfortunately, the stringent selection of reliable biomarker candidates for brain disorders remains a predominantly unexplored challenge. In this paper, to fill this gap, we propose a fresh and powerful scheme, referred to as the Causality-aware Genotype intermediate Phenotype Correlation Approach (Ca-GPCA). Specifically, we design a bidirectional association learning framework, integrated with a parallel causal variable decorrelation module and sparse variable regularizer module, to identify trustworthy causal biomarkers. A disease diagnosis module is further incorporated to ensure accurate diagnosis and identification of causal effects for pathogenesis. Additionally, considering the large computational burden incurred by high-dimensional genotype-phenotype covariances, we develop a fast and efficient strategy to reduce the runtime and promote practical availability and applicability. Extensive experimental results on four simulated datasets and real neuroimaging genetic data clearly show that Ca-GPCA outperforms state-of-the-art methods with excellent built-in interpretability. This can provide novel and reliable insights into the underlying pathogenic mechanisms of brain disorders.

14:20-14:40
Proceedings Presentation: SVQ-MIL: Small-Cohort Whole Slide Image Classification via Split Vector Quantization
Confirmed Presenter: Yao-Zhong Zhang, The University of Tokyo, Japan

Room: 02N
Format: In person


Authors List:

  • Dawei Shen, The University of Tokyo, Japan
  • Yao-Zhong Zhang, The University of Tokyo, Japan
  • Keita Tamura, Hiroshima University, Japan
  • Yohei Okubo, The University of Tokyo, Japan
  • Seiya Imoto, The University of Tokyo, Japan

Presentation Overview:

Whole Slide Images (WSIs) are high-resolution digital scans of microscope slides that play important roles in pathological analysis. Recent advancements in deep learning have significantly improved WSI classification.
However, challenges persist, particularly in small cohorts with limited training samples.
Multiple Instance Learning (MIL) has emerged as a leading framework for WSI classification. In MIL, each WSI is divided into image tiles, and each tile is represented by an embedding generated by a pretrained vision foundation model. Nevertheless, these embeddings are general-purpose and typically exhibit high variability, rendering them suboptimal for specific classification tasks.
In this study, we introduce SVQ-MIL, a generalized framework that leverages Split Vector Quantization (SVQ) with a learnable codebook to quantize instance embeddings. The learned codebook reduces embedding variability and abbreviates the input for the MIL model, making it advantageous for small-cohort datasets. Additionally, SVQ-MIL enhances model interpretability by providing a profiling of the WSI instances through the learned codebook. Experimental evaluations demonstrate that SVQ-MIL achieves competitive performance compared with state-of-the-art methods on two benchmark datasets. The source code is available at https://github.com/aCoalBall/SVQMIL.
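
The quantization step described above can be sketched as follows: an embedding is split into equal-length chunks, and each chunk is replaced by its nearest codeword. This is a minimal illustration only. It assumes one codebook shared by all chunks and squared Euclidean distance, and it omits codebook learning entirely; the function names are hypothetical and are not taken from the SVQ-MIL repository.

```python
def nearest_code(chunk, codebook):
    """Return the index of the codeword closest to `chunk` (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sqdist(chunk, codebook[k]))


def split_quantize(embedding, codebook, n_splits):
    """Split `embedding` into `n_splits` equal chunks and quantize each one.

    Returns the chosen codeword indices and the reassembled quantized vector.
    Assumption: a single codebook is shared by all chunks.
    """
    if len(embedding) % n_splits != 0:
        raise ValueError("embedding length must be divisible by n_splits")
    size = len(embedding) // n_splits
    indices, quantized = [], []
    for s in range(n_splits):
        chunk = embedding[s * size:(s + 1) * size]
        k = nearest_code(chunk, codebook)
        indices.append(k)
        quantized.extend(codebook[k])
    return indices, quantized


# Toy 6-dimensional embedding, three 2-dimensional chunks, three codewords.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
indices, quantized = split_quantize([0.1, -0.2, 0.9, 1.2, 0.2, 0.8], codebook, 3)
print(indices, quantized)
```

The index sequence gives a compact, discrete description of each tile, which is one way a codebook could support the kind of instance profiling the abstract mentions.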

14:40-15:00
Genetic Confounding and Comorbidity: Re-evaluating Causal Inference in Disease Associations
Confirmed Presenter: Hadasa Kaufman, The Hebrew University of Jerusalem, Israel

Room: 02N
Format: In person


Authors List:

  • Hadasa Kaufman, The Hebrew University of Jerusalem, Israel
  • Nadav Rapoport, Ben-Gurion University of the Negev, Israel
  • Michal Linial, The Hebrew University of Jerusalem, Israel

Presentation Overview:

Comorbidity analyses indicate that ~35% of common disease pairs tend to occur sequentially within the same individual. Understanding whether this comorbidity pairing reflects a causal relationship or results from shared (often unknown) external factors is crucial for clinical decisions. Although causal inference methods are increasingly used in clinical research, most methods fail to incorporate genetic information, despite the well-documented pleiotropy of single-nucleotide polymorphisms (SNPs). Herein, we develop methodologies aimed at addressing this knowledge gap. In an extensive analysis of 440×440 disease pairs (with ≥500 cases each) from the UK Biobank (UKB), we found that approximately 58% of disease pairs share at least one associated SNP. We compared and evaluated two complementary approaches for addressing the genetic confounding effects. In the first scheme (coined EXPO for Exclude Population), we removed all individuals that displayed shared associated SNPs for both diseases. When EXPO was applied to the 440×440 disease pairs, this method showed a significant shift in p-value distributions (p-value 6e-4), but failed to identify pairs confirming elimination of residual genetic signals. The second approach relied on a propensity score matching (PSM) protocol to balance genetic risk between matched groups. In a pilot test of 5×5 abundant disease pairs, we combined the PSM with polygenic risk scores (PRS). The PRS-PSM and classical PSM yielded consistent results in 80% of cases, and for another 15%, significant results were confirmed only in PRS-PSM. These findings suggest that incorporating genetic information via PRS-PSM will enhance genetic interpretation and validate the genuine causal relationship of outcomes.

15:00-15:20
Sparse modeling of interactions enables fast detection of genome-wide epistasis in biobank-scale studies
Confirmed Presenter: Julian Stamp, Brown University, United States

Room: 02N
Format: In person


Authors List:

  • Julian Stamp, Brown University, United States
  • Samuel Pattillo Smith, University of Texas, United States
  • Daniel Weinreich, Brown University, United States
  • Lorin Crawford, Microsoft Research, United States

Presentation Overview:

The lack of computational methods capable of detecting epistasis in biobanks has led to uncertainty about the role of non-additive genetic effects on complex trait variation. The marginal epistasis framework is a powerful approach because it estimates the likelihood of a SNP being involved in any interaction, thereby reducing the multiple testing burden. Current implementations of this approach have failed to scale to large human studies. To address this, we present the sparse marginal epistasis (SME) test, which concentrates the scans for epistasis to regions of the genome that have known functional enrichment for a trait of interest. By leveraging the sparse nature of this modeling setup, we develop a novel statistical algorithm that allows SME to run 10 to 90 times faster than state-of-the-art epistatic mapping methods. In a study of blood traits measured in 349,411 individuals from the UK Biobank, we show that reducing searches of epistasis to variants in accessible chromatin regions facilitates the identification of genetic interactions associated with regulatory genomic elements.

15:20-15:40
Pan-cancer analysis in the real-world setting uncovers immunogenomic drivers of acquired resistance post-immunotherapy
Confirmed Presenter: Mohamed Reda Keddar, AstraZeneca, United Kingdom

Room: 02N
Format: In person


Authors List:

  • Mohamed Reda Keddar, AstraZeneca, United Kingdom
  • Martin Miller, AstraZeneca, United Kingdom

Presentation Overview:

Immune checkpoint blockade (ICB) has transformed cancer care, providing long-lasting benefit to patients across various cancer types. However, >80% of patients fail to respond to ICB (primary resistant) or eventually develop resistance after initial clinical benefit (acquired resistant). Due to difficulty in accessing post-progression clinical samples, remarkably little is known about which immunogenomic features emerge as patients progress on therapy. Here, we use the Tempus AI real-world clinicogenomic database to build a pan-cancer and multimodal dataset of clinical and pre/post-treatment RNA/DNA-seq data from >5,000 patients across NSCLC, HNC, and TNBC. Using a systematic bioinformatics approach, we characterise and compare the clinical and molecular features of acquired vs. primary resistant patients in the post-progression setting. We find that acquired resistant patients consistently derive an ICB-specific prognostic advantage, as they survive significantly longer than their primary resistant counterparts even after progressing. At the molecular level, acquired resistant tumours show a universally inflamed tumour microenvironment (TME) post-progression, specifically maintained or induced by ICB. Using dN/dS to evaluate mutation selection from pre- to post-treatment, we identify ICB-specific mutations selected for post-acquired resistance. These mutations were involved in functionally relevant molecular processes, including loss of antigen processing and presentation, dysregulated metabolism, and putative immune escape via oncogenic signalling pathways. Altogether, our analysis of post-progression samples mapped out the molecular underpinnings of acquired vs. primary ICB resistance and offers an opportunity for improved patient selection strategies and positioning of next-generation immunotherapies to re-activate an effective anti-tumour response and optimise outcome.

15:40-16:00
Beyond Mutation Frequency: A Bayesian Framework for Identifying Functional Cancer Drivers from Single-Cell Data
Confirmed Presenter: Komlan Atitey, NIH, United States

Room: 02N
Format: In person


Authors List:

  • Komlan Atitey, NIH, United States
  • Benedict Anchang, National Institute of Environmental Health Sciences, United States

Presentation Overview:

Cancer is driven by genetic alterations, especially gain-of-function mutations in oncogenes (OGs) and loss-of-function mutations in tumor suppressor genes (TSGs). Traditional approaches to identifying cancer driver genes (CDGs) rely heavily on mutation frequency across patient cohorts. While effective at detecting common drivers, these methods often miss rare but functionally significant mutations, and they struggle with the complexity introduced by tumor heterogeneity. To address these limitations, we present PICDGI (Predict Immunosuppressive Cancer Driver Genes using gene-gene Interaction features), a Bayesian framework that integrates time-series single-cell RNA sequencing (scRNA-seq) data with gene-gene interaction dynamics. PICDGI moves beyond mutation frequency by modeling gene regulatory influence and functional impact within evolving tumor cell populations. PICDGI begins by identifying cancer progenitor cells across tumor stages and reconstructs gene expression trajectories during tumor development. It then uses variational Bayesian inference to infer dynamic gene interaction networks and introduces the gene driver coefficient, a novel metric that quantifies each gene’s regulatory influence on downstream targets. This enables the identification of both known and previously unrecognized driver genes based on their functional roles in tumor progression and immune evasion. When applied to scRNA-seq data from nine samples across three lung adenocarcinoma (LUAD) patients, PICDGI successfully recovered established OGs and TSGs (62%) and revealed novel candidate drivers (38%) with strong expression patterns and relevance to tumor evolution, as confirmed by Moran’s I test. Overall, PICDGI provides a biologically grounded, interaction-driven strategy for identifying functional cancer drivers from single-cell data, offering a powerful tool for advancing personalized cancer genomics.
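
Moran's I, named above as the confirmation test, measures whether similar values cluster among neighbouring observations. The abstract does not say how neighbours or weights were defined, so the sketch below only shows the textbook statistic on a user-supplied weight matrix. The function name and the toy chain of four cells are illustrative assumptions, not part of PICDGI.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` given a square neighbour-weight matrix.

    I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,
    where m is the mean of the values and W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    if w_sum == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_sum) * (num / denom)


# Toy example: four cells on a chain, each adjacent to its immediate neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 1, 0, 0], chain))  # clustered expression gives a positive value
print(morans_i([1, 0, 1, 0], chain))  # alternating expression gives a negative value
```

A significance test would normally compare the observed value against a permutation or analytical null distribution, which is left out here.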

16:40-17:00
AdaGenes: A streaming processor for high-throughput annotation and filtering of sequence variant data
Confirmed Presenter: Nadine S. Kurz, Department of Medical Bioinformatics, University Medical Center Göttingen, Germany

Room: 02N
Format: In person


Authors List:

  • Nadine S. Kurz, Department of Medical Bioinformatics, University Medical Center Göttingen, Germany
  • Klara Drofenik, Department of Medical Bioinformatics, University Medical Center Göttingen, Germany
  • Kevin Kornrumpf, Department of Medical Bioinformatics, University Medical Center Göttingen, Germany
  • Kirsten Reuter-Jessen, Institute of Pathology, University Medical Center Göttingen, Germany
  • Jürgen Dönitz, Department of Medical Bioinformatics, University Medical Center Göttingen, Germany

Presentation Overview:

The amount of sequencing data resulting from whole exome or genome sequencing (WES / WGS) presents challenges for annotation, filtering, and analysis.
We introduce the Adaptive Genes processor (AdaGenes), a sequence variant streaming processor designed to efficiently annotate, filter, LiftOver and transform large-scale VCF files. AdaGenes provides a unified solution for researchers to streamline VCF processing workflows and address common challenges in genomic data processing, e.g. filtering out non-relevant variants so that further processing can focus on the relevant positions. AdaGenes integrates genomic, transcript and protein data annotations, while maintaining scalability and performance for high-throughput workflows. Leveraging a streaming architecture, AdaGenes processes variant data incrementally, enabling high performance on large files due to low memory consumption and seamless handling of whole genome files.
The interactive front end provides the user with the ability to dynamically filter variants based on user-defined criteria.
It allows researchers and clinicians to efficiently analyze large genomic datasets, facilitating variant interpretation in diverse genomics applications, such as population studies, clinical diagnostics, and precision medicine.
AdaGenes is able to parse and convert multiple file formats while preserving metadata, and provides a report of the changes made to the variant file.
AdaGenes is available at https://mtb.bioinf.med.uni-goettingen.de/adagenes.
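
The streaming idea described above, processing variants one record at a time instead of loading the whole file, can be illustrated with a plain Python generator. This is a generic sketch and not the AdaGenes API: the function name, the QUAL-based criterion, and the in-memory example file are all assumptions made for illustration.

```python
import io


def stream_filter_vcf(lines, min_qual):
    """Yield VCF header lines unchanged and data lines with QUAL >= min_qual.

    Works on any iterable of lines (an open file, a decompressed stream, ...),
    so memory use stays constant regardless of file size.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # meta-information and column header pass through
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed records
        qual = fields[5]  # QUAL is the sixth fixed VCF column
        if qual != "." and float(qual) >= min_qual:
            yield line


# Tiny in-memory example standing in for a large file on disk.
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "chr1\t200\t.\tC\tT\t10\tPASS\t.\n"
)
for kept in stream_filter_vcf(io.StringIO(vcf_text), min_qual=30):
    print(kept, end="")
```

Because the generator never holds more than one record, the same function could be chained with further per-record steps (annotation, format conversion) without changing its memory profile.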

17:00-17:20
Comprehensive framework for assessing discrepancies in genomic content and species-level annotations across microbial reference genomes
Confirmed Presenter: Serghei Mangul, Department of Biomedical Sciences, University of Suceava, Romania

Room: 02N
Format: In person


Authors List:

  • Grigore Boldirev, Georgia State University, United States
  • Mohammed Alser, Georgia State University, United States
  • Peace Aguma, Georgia State University, United States
  • Viorel Munteanu, University of Suceava, Romania
  • Mihai Dimian, Department of Computers, Electronics and Automation, Ștefan cel Mare University of Suceava, Romania
  • Alex Zelikovsky, Georgia State University, United States
  • Serghei Mangul, Department of Biomedical Sciences, University of Suceava, Romania

Presentation Overview:

Metagenomics research provides insights into the composition, diversity, and functions of microbial communities in various environments. To identify bacterial species, sequencing reads from samples are typically mapped to reference genomes found in bacterial reference databases. However, multiple references may share the same taxonomic identifiers while containing different genomic information, which can lead to inconsistencies in downstream analyses. We have developed a novel comprehensive framework for assessing discrepancies in genomic content and species-level annotations across microbial reference genomes, and applied it to evaluate the two most widely used bacterial reference databases: PATRIC and RefSeq. NCBI’s taxonomic identifiers were used to assess the agreement between databases at the species level. Species found in both databases were identified by matching taxIDs. To compare genomic representation, the BLAST tool was used to align all contigs from one database to all contigs of the corresponding strain in the other database. This analysis was extended to all overlapping species where strain-level information was available. The study revealed substantial discrepancies between databases. Among single-contig genomes, 85.5% exhibited 100% genomic similarity, 14.4% demonstrated an average similarity of 94.3%, and 17 genomes showed less than 75% similarity. For genomes with 2–10 contigs, 82.6% had 100% similarity, 17% averaged 94.79% similarity, and 128 genomes fell below the 75% threshold. Our results emphasize significant variability in genome representation across reference databases, especially for multi-contig genomes. Our framework will provide a foundation for building a more consistent and comprehensive reference database, which will improve the accuracy, rigor, and reproducibility of metagenomics research.

17:20-17:40
Building Ultralarge Pangenomes Using Scalable and Compressive Techniques
Confirmed Presenter: Sumit Walia, University of California San Diego, United States

Room: 02N
Format: In person


Authors List:

  • Sumit Walia, University of California San Diego, United States
  • Harsh Motwani, University of California San Diego, United States
  • Yu-Hsiang Tseng, University of California San Diego, United States
  • Kyle Smith, University of California San Diego, United States
  • Russell Corbett-Detig, University of California San Diego, United States
  • Yatish Turakhia, University of California San Diego, United States

Presentation Overview:

Pangenomics studies intra-species genetic diversity by analyzing collections of genomes from the same species. As pangenomics scales to millions of sequences, efficient data formats become crucial to enabling future applications and ensuring efficient computational and memory performance for pangenomic analysis. Current pangenomic formats primarily store variation across genomes but fail to capture shared evolutionary and mutational histories, limiting their applicability. They also face scalability issues due to storage and computational inefficiencies. To address these limitations, we present PanMAN (Pangenome Mutation-Annotated Network), a novel pangenomic format that is the most compact, scalable, and information-rich among all variation-preserving formats. PanMAN encodes not only genome alignments and variations but also shared mutational and evolutionary histories inferred across genomes, making it the first format to unify multiple whole-genome alignment, phylogeny, and mutational histories into a single framework. By leveraging "evolutionary compression," PanMAN achieves 3.5X to 1391X compression over other formats (GFA, VG, GBZ, PanGraph, AGC, and tskit) across microbial datasets. To demonstrate scalability, we built the largest pangenome in terms of number of sequences, a PanMAN with 8 million SARS-CoV-2 genomes, requiring just 366 MB of disk space. Using SARS-CoV-2 as a case study, we show that PanMAN offers a detailed and accurate portrayal of the pathogen's evolutionary and mutational history, facilitating the discovery of new biological insights. We also present panmanUtils, a software toolkit for constructing, analyzing, and integrating PanMANs with existing pangenomic workflows. PanMANs are poised to enhance the scale, speed, resolution, and overall scope of pangenomic analyses and data sharing.
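
The intuition behind storing mutational histories rather than full sequences can be shown with a toy mutation-annotated tree: only the root sequence and the substitutions on each branch are kept, and any genome is rebuilt by replaying the mutations along its path from the root. This sketch is a strong simplification and not the PanMAN format. It handles substitutions only, uses a single tree rather than a network, and all names and data are hypothetical.

```python
def reconstruct(node, parent, mutations, root_seq):
    """Rebuild the sequence at `node` by replaying substitutions from the root.

    parent:    maps each node to its parent (the root maps to None)
    mutations: maps a node to the (position, new_base) substitutions on the
               branch leading to it; positions are 0-based
    """
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    seq = list(root_seq)
    for n in reversed(path):  # apply mutations from the root down to the node
        for pos, base in mutations.get(n, []):
            seq[pos] = base
    return "".join(seq)


# Toy tree: root -> A -> B, and root -> C.
parent = {"root": None, "A": "root", "B": "A", "C": "root"}
mutations = {"A": [(2, "T")], "B": [(0, "G")], "C": [(5, "A")]}
root_seq = "ACGTAC"

for leaf in ("A", "B", "C"):
    print(leaf, reconstruct(leaf, parent, mutations, root_seq))
```

Storage grows with the number of mutations rather than with the number of genomes times genome length, which is why closely related genomes compress so well under this kind of representation.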

17:40-18:00
Information Content as a metric to evaluate and compare DNA Language Models
Confirmed Presenter: Melissa Sanabria, Technische Universität Dresden, Germany

Room: 02N
Format: In person


Authors List:

  • Melissa Sanabria, Technische Universität Dresden, Germany
  • Anna R. Poetsch, Technische Universität Dresden, Germany

Presentation Overview:

Large language models have transformed the field of natural language processing by enabling the generation of coherent and meaningful text. This success has inspired researchers to apply similar approaches to biological sequences, particularly DNA, where the underlying "language" of the genome holds biological insight. DNA language models, such as GROVER, offer a promising avenue for advancing genomic analysis. Despite their potential, evaluating and comparing those models remains a significant challenge. Existing metrics often rely on genome-specific motifs, biological annotations, or the number of parameters used during model training. These limitations make it difficult to perform consistent and generalizable assessments across different models or genomic contexts. We propose the use of entropy and information content as general-purpose metrics to evaluate DNA language models. By computing these measures over whole-genome predictions, we can quantify how much information the model captures during training. It allows us to compare not only between different types of genomic elements and regions—such as coding vs. non-coding sequences or promoters vs. intergenic regions—but also across different versions of the human genome. We also introduce a set of pretrained DNA language models for three major human genome builds: hg19, hg38, and telomere-to-telomere (T2T). Our analysis reveals that, although T2T includes a substantially greater proportion of repetitive sequences, this increase does not adversely affect the information content observed in other genomic regions. Our approach provides a more interpretable and genome-agnostic framework for evaluating DNA language models and offers new insights into how different genome assemblies influence model learning and performance.
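
The entropy and information-content measures proposed above can be sketched for a single predicted distribution. The abstract does not give the exact formulas, so this example assumes the standard definitions: Shannon entropy in bits, and information content as the maximum possible entropy for the alphabet minus the observed entropy (as used for sequence logos). The alphabet could be nucleotides or model tokens; the function names are illustrative.

```python
import math


def entropy_bits(probs):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_content(probs):
    """Maximum entropy for the alphabet size minus the observed entropy."""
    return math.log2(len(probs)) - entropy_bits(probs)


# Predicted distributions over a four-letter alphabet (A, C, G, T).
uniform = [0.25, 0.25, 0.25, 0.25]   # the model has learned nothing here
confident = [1.0, 0.0, 0.0, 0.0]     # the model is certain of one base
two_way = [0.5, 0.5, 0.0, 0.0]       # the model has narrowed it to two bases

for name, dist in [("uniform", uniform), ("confident", confident), ("two_way", two_way)]:
    print(name, entropy_bits(dist), information_content(dist))
```

Averaging such per-position values over a genomic region is one plausible way to compare regions or genome builds, though the aggregation used in the talk is not specified here.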

Thursday, July 24th
8:40-9:00
Proceedings Presentation: ADME-Drug-Likeness: Enriching Molecular Foundation Models via Pharmacokinetics-Guided Multi-Task Learning for Drug-likeness Prediction
Confirmed Presenter: Dongmin Bang, Seoul National University, South Korea

Room: 03A
Format: In person


Authors List: Show

  • Dongmin Bang, Seoul National University, South Korea
  • Juyeon Kim, Seoul National University, South Korea
  • Haerin Song, Seoul National University, South Korea
  • Sun Kim, Seoul National University, South Korea

Presentation Overview: Show

Recent breakthroughs in AI-driven generative models enable the rapid design of extensive molecular libraries, creating an urgent need for fast and accurate drug-likeness evaluation. Traditional approaches, however, rely heavily on structural descriptors and overlook pharmacokinetic (PK) factors such as absorption, distribution, metabolism, and excretion (ADME). Furthermore, existing deep-learning models neglect the complex interdependencies among ADME tasks, which play a pivotal role in determining clinical viability.
We introduce ADME-DL (drug likeness), a novel two-step pipeline that first enhances a diverse range of Molecular Foundation Models (MFMs) via sequential ADME multi-task learning. By enforcing an A→D→M→E flow, grounded in a data-driven task dependency analysis that aligns with established pharmacokinetic principles, our method more accurately encodes PK information into the learned embedding space.
In Step 2, the resulting ADME-informed embeddings are leveraged for drug-likeness classification, distinguishing approved drugs from negative sets drawn from chemical libraries.
Through comprehensive experiments, our sequential ADME multi-task learning achieves an improvement of up to +2.4% over state-of-the-art baselines and enhances performance across tested MFMs by up to +18.2%. Case studies with clinically annotated drugs validate that respecting the PK hierarchy produces more relevant predictions, reflecting drug discovery phases. These findings underscore the potential of ADME-DL to significantly enhance the early-stage filtering of candidate molecules, bridging the gap between purely structural screening methods and PK-aware modeling.

9:00-9:20
Proceedings Presentation: Understanding the Sources of Performance in Deep Drug Response Models Reveals Insights and Improvements
Room: 03A
Format: In person


Authors List: Show

  • Nikhil Branson, Queen Mary University of London, United Kingdom
  • Pedro Rodriguez Cutillas, Barts Cancer Institute, QMUL, United Kingdom
  • Conrad Bessant, Queen Mary - University of London, United Kingdom

Presentation Overview: Show

Anti-cancer drug response prediction (DRP) using cancer cell lines (CLs) is crucial in stratified medicine and drug discovery. Recently, new deep learning models for DRP have improved performance over their predecessors. However, different models use different input data types and architectures, making it hard to find the source of these improvements. Here we consider published DRP models that report state-of-the-art performance predicting continuous response values. These models take chemical structures of drugs and omics profiles of CLs as input. By experimenting with these models and comparing them with our simple benchmarks, we show that no performance comes from drug features; instead, performance is due to the transcriptomics CL profiles. Furthermore, we show that, depending on the testing type, much of the current reported performance is a property of the training target values. We address these limitations by creating BinaryET and BinaryCB, which predict binary drug response values, guided by the hypothesis that this reduces the noise in the drug efficacy data, thus better aligning them with biochemistry that can be learnt from the input data. BinaryCB leverages a chemical foundation model, while BinaryET is trained from scratch using a transformer-type architecture. We show that these models learn useful chemical drug features, which, to our knowledge, is the first time this has been demonstrated for multiple testing types. We further show that binarising the drug response values causes the models to learn useful chemical drug features. We also show that BinaryET improves performance over BinaryCB and the published models that report state-of-the-art performance.

9:20-9:40
Proceedings Presentation: FACT: Feature Aggregation and Convolution with Transformers for predicting drug classification code
Confirmed Presenter: Gwang-Hyeon Yun, Yonsei University - Mirae Campus, South Korea

Room: 03A
Format: In person


Authors List: Show

  • Gwang-Hyeon Yun, Yonsei University - Mirae Campus, South Korea
  • Jong-Hoon Park, Yonsei University - Mirae Campus, South Korea
  • Young-Rae Cho, Yonsei University - Mirae Campus, South Korea

Presentation Overview: Show

Motivation: Drug repositioning, identifying new therapeutic applications for existing drugs, can significantly reduce the time and cost involved in drug development. Recent studies have explored the use of Anatomical Therapeutic Chemical (ATC) codes in drug repositioning, offering a systematic framework to predict ATC codes for a drug. The ATC classification system organizes drugs according to their chemical properties, pharmacological actions, and therapeutic effects. However, its complex hierarchical structure and the limited scalability at higher levels present significant challenges for achieving accurate ATC code prediction.
Results: We propose a novel approach to predict ATC codes of drugs, named Feature Aggregation and Convolution with Transformer models (FACT). This method computes three types of drug similarities, incorporating ATC code similarity with hierarchical weights and masked drug-ATC code associations. These features are then aggregated for each target drug-ATC code pair and processed through a convolution-transformer encoder to generate three embeddings. The embeddings are finally used to estimate the probability of an association between the target pair. The experimental results demonstrate that the proposed method achieves an AUROC of 0.9805 and an AUPRC of 0.9770 at level 4 of the ATC codes, outperforming the previous methods by 15.05% and 18.42%, respectively. This study highlights the effectiveness of integrating diverse drug features and the potential of transformer-based models in ATC code prediction.

9:40-10:00
Proceedings Presentation: Efficient 3D kernels for molecular property prediction
Confirmed Presenter: Ankit, Indian Institute of Technology Palakkad, India

Room: 03A
Format: In person


Authors List: Show

  • Ankit, Indian Institute of Technology Palakkad, India
  • Sahely Bhadra, Indian Institute of Technology Palakkad, India
  • Juho Rousu, Aalto University, Finland

Presentation Overview: Show

This paper addresses the challenge of incorporating 3-dimensional structural information in graph kernels for machine learning-based virtual screening, a crucial task in drug discovery. Existing kernels that capture 3D information often suffer from high computational complexity, which limits their scalability. To overcome this, we propose the 3-dimensional chain motif graph kernel (c-MGK), which effectively integrates essential 3D structural properties—bond length, bond angle, and torsion angle—within the three-hop neighborhood of each atom in a molecule. In addition, we introduce a more computationally efficient variant, the 3-dimensional graph hopper kernel (3DGHK), which reduces the complexity from the state-of-the-art $\mathcal{O}(n^{6})$ (for the 3D pharmacophore kernel) to $\mathcal{O}(n^{2}(m + \log(n) + \delta^2 +dT^{6}))$. Here, $n$ is the number of nodes, $T$ is the highest degree of the node, $m$ is the number of edges, $\delta$ is the diameter of the graph, and $d$ is the dimension of the attributes of the nodes. We conducted experiments on 21 datasets, demonstrating that 3DGHK not only outperforms state-of-the-art 2D and 3D graph kernels, but also surpasses deep learning models in classification accuracy, offering a powerful and scalable solution for virtual screening tasks.

11:20-11:40
Haplotype-specific copy number profiling of cancer genomes from long reads sequencing data
Confirmed Presenter: Tanveer Ahmad, NIH/NCI, United States

Room: 03A
Format: In person


Authors List: Show

  • Tanveer Ahmad, NIH/NCI, United States
  • Ayse Keskus, NIH/NCI, United States
  • Mikhail Kolmogorov, NIH/NCI, United States
  • Sergey Aganezov, Oxford Nanopore, United States
  • Midhat Farooqi, Childrens Mercy, United States
  • Anton Goretsky, University of Maryland, United States
  • Ataberk Donmez, University of Maryland, United States
  • Michael Dean, NIH/NCI, United States

Presentation Overview: Show

Attached as PDF

11:40-12:00
Multi-omics and liquid biopsy profiling of rapid autopsies reveals evolutionary dynamics and heterogeneity in metastatic bladder cancer
Confirmed Presenter: Pushpa Itagi, Public Health Sciences, Fred Hutch, Seattle, WA, USA. Human Biology Division, Fred Hutch, Seattle, WA, USA., United States

Room: 03A
Format: In person


Authors List: Show

  • Pushpa Itagi, Public Health Sciences, Fred Hutch, Seattle, WA, USA. Human Biology Division, Fred Hutch, Seattle, WA, USA., United States
  • Samantha Schuster, Public Health Sciences, Fred Hutch, Seattle, WA, USA. Human Biology Division, Fred Hutch, Seattle, WA, USA., United States
  • Sonali Arora, Public Health Sciences, Fred Hutch, Seattle, WA, USA. Human Biology Division, Fred Hutch, Seattle, WA, USA., United States
  • Patricia Galipeau, Public Health Sciences, Fred Hutch, Seattle, WA, USA. Human Biology Division, Fred Hutch, Seattle, WA, USA., United States
  • Hung-Ming Lam, Department of Urology, University of Washington, United States
  • Andrew Hsieh, Human Biology Division, Fred Hutch, Seattle, WA, USA. Department of Medicine, UW, Seattle,WA, USA., United States
  • Gavin Ha, Public Health Sciences, Fred Hutch, Seattle, WA, USA. Human Biology Division, Fred Hutch, Seattle, WA, USA., United States

Presentation Overview: Show

The extensive molecular, transcriptomic and genomic complexity of metastatic bladder cancer (mBLCA) significantly complicates clinical management. Approximately 75% of mBLCA cases are conventional urothelial carcinoma, while 25% display variant histologies, which have a poorer prognosis. We characterized heterogeneity and clonal evolution in a rapid autopsy cohort of 20 patients using tumor tissues, matched normal samples, and cell-free DNA (cfDNA). Clonal evolution and metastatic seeding and migration patterns were inferred from mutation data for all patients. We used COSMIC signatures linking mutation profiles to histological and clinical features for various subtypes. Custom approaches and frameworks were developed for analyzing mutations, copy number alterations (CNAs), and structural variants (SVs) in tumors and cfDNA. Mutational clonal evolution analyses and RNA-seq highlighted cisplatin resistance in the plasmacytoid urothelial carcinoma (PUC) subtype, driven by enhanced DNA damage response pathways. Most patients showed significant heterogeneity in mutations (~20–30% subclonal) and in CNAs/SVs (>40% subclonal), potentially driving therapy resistance and elevating tumor heterogeneity. cfDNA detected about 90% of founder, 85% of shared, and 25% of private mutations from matched tumors. Nucleosome profiling from cfDNA differentiated mBLCA from healthy controls and identified variant-specific transcription factors that are active in mBLCA. Integrating multi-omics with cfDNA effectively captures intra-patient and inter-patient tumor heterogeneity, providing a comprehensive view of clonal dynamics. Insights and findings from this work pave the way for targeted therapies against evolving tumor clones and offer strategies to overcome resistance mechanisms in mBLCA.

12:00-12:20
Using spatial transcriptomics to elucidate the primary site of Cancers of Unknown Primary (CUPs)
Confirmed Presenter: Oscar González Velasco, German Cancer Research Center DKFZ, Germany

Room: 03A
Format: In person


Authors List: Show

  • Oscar González Velasco, German Cancer Research Center DKFZ, Germany
  • Siao-Han Wong, German Cancer Research Center (DKFZ), Germany
  • Marta Casado, Institut de Recerca Contra la Leucèmia Josep Carreras, Spain
  • Veronica Davalos, Institut de Recerca Contra la Leucèmia Josep Carreras, Spain
  • Javier De Las Rivas Sanz, Salamanca Cancer Research Center, Spain
  • Manell Steller, Institut de Recerca Contra la Leucèmia Josep Carreras, Spain
  • Benedikt Brors, German Cancer Research Center (DKFZ), Germany

Presentation Overview: Show

Cancers of unknown primary (CUP) are a challenging group of poorly differentiated metastatic cancers for which, owing to their nature, limited treatment options are available, resulting in poor prognosis and overall survival. Recently, novel predictive models to characterize CUP patients showed encouraging results and suggested relevant therapeutic interventions, yet lacked the consistency and interpretability needed to be widely adopted in clinical care.
We have developed a state-of-the-art AI CNN using bulk RNA-Seq gene expression and prior knowledge in the form of curated gene signatures of transcription factors and their associated gene targets. The training corpus consists of more than 27,000 samples from cancer patients and healthy donors, targeting 28 primary sites. The model displayed an accuracy of 97.17% on validation data at predicting the primary sites. Additionally, we analyzed 40 spatial transcriptomic samples from a wide range of known primary sites, including distant metastases, and unambiguously located the correct primary site in 39 of them. We also analyzed 20 novel CUP spatial transcriptomics samples. Results show that, by using annotations from pathologists, our suggested primaries could help to identify plausible origins, yielding coherent results (in contrast with the homing tissue) for those which did not have any clinically derived hypothesis.
By identifying the true primary site of metastatic CUPs, we hope to provide future clinical benefits from site-specific therapies, opening the possibility for many existing treatment options.

12:20-12:40
Inherited genetic risk factors associated with young adult versus late-onset lung cancers
Room: 03A
Format: In person


Authors List: Show

  • Myvizhi Esai Selvan, Icahn School of Medicine at Mount Sinai, United States
  • Robert J. Klein, Icahn School of Medicine at Mount Sinai, United States
  • Zeynep H. Gümüş, Icahn School of Medicine at Mount Sinai, United States

Presentation Overview: Show

Genetics plays a key role in lung cancer risk. While lung cancer primarily affects older adults, incidence among young adults is increasing. However, whether the germline genetics differ between young adults (<45 years) and older lung cancer patients (≥45 years) remains unclear.

We performed whole-genome sequencing on 171 predominantly young lung cancer patients and integrated germline whole-exome sequencing datasets from existing lung cancer cohorts and biobanks, totaling 9,065 participants (the largest analysis of lung cancer patients to date), with 186 young adults and 6,359 older cases after sample QC. We compared the prevalence of rare pathogenic and likely pathogenic (P/LP) variants in cancer-related genes and 33,591 pathways from the Human Molecular Signatures Database (MSigDB) between the two age groups using Fisher’s exact test, accounting for histology, gender and smoking status.

Young adult lung cancer patients harbored significantly more rare P/LP variants in DNA damage response genes compared to older patients, especially in lung squamous cell carcinoma patients and females. This association persisted in lung adenocarcinoma patients after controlling for smoking status. Young adult patients showed enrichment of rare P/LP variants in cancer driver, Fanconi Anemia and complement pathway genes. Notably, rare P/LP variants in BRIP1, ERCC6 and MSH5 were significantly more prevalent in young adult patients.

Our results demonstrate that the inherited genetics of early-onset lung cancer differs significantly from late-onset lung cancer. These findings can inform age-specific risk assessment and guide precision prevention, screening and targeted treatment strategies for young adult individuals harboring these variants.
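
The group comparison described above rests on Fisher's exact test for a 2x2 table (carriers versus non-carriers, young versus older patients). The following is a minimal standard-library sketch of the two-sided test; the function name and the toy counts are illustrative only and are not taken from the study, and the stratification by histology, gender and smoking status mentioned in the overview is not reproduced here.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be young / older patients, columns carrier / non-carrier.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability of seeing x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    probabilities = [table_probability(x) for x in range(lowest, highest + 1)]
    observed = table_probability(a)
    # Sum every table that is at least as unlikely as the observed one.
    return sum(p for p in probabilities if p <= observed * (1 + 1e-9))

# Toy counts, invented purely for illustration.
print(fisher_exact_two_sided(3, 1, 1, 3))
```

In practice a library routine such as `scipy.stats.fisher_exact` would normally be preferred; the hand-written version only shows what the test computes.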

12:40-13:00
pC-SAC: Method for High-Resolution 3D Genome Reconstruction from Low-Resolution Hi-C Data
Confirmed Presenter: Carlos Angel, Department of Biomedical Informatics, Columbia University, United States

Room: 03A
Format: In person


Authors List: Show

  • Carlos Angel, Department of Biomedical Informatics, Columbia University, United States
  • Narjis El Amraoui, New York Genome Center, New York City, USA, United States
  • Gamze Gürsoy, Department of Biomedical Informatics, Columbia University, United States

Presentation Overview: Show

The three-dimensional (3D) organization of the genome is crucial for gene regulation, with disruptions linked to various diseases. High-throughput Chromosome Conformation Capture (Hi-C) and related technologies have advanced our understanding of genome architecture by mapping interactions between distant genomic regions. However, capturing enhancer-promoter interactions at high resolution remains challenging due to the high sequencing depth required. We introduce pC-SAC (probabilistically Constrained-Self-Avoiding-Chromatin), a novel computational method for producing accurate high-resolution Hi-C matrices from low-resolution data. pC-SAC uses adaptive importance sampling with sequential Monte Carlo to generate ensembles of 3D chromatin chains that satisfy physical constraints derived from low-resolution Hi-C data. Our method achieves over 95% accuracy in reconstructing high-resolution chromatin maps and identifies novel interactions enriched with candidate cis-regulatory elements (cCREs) and expression Quantitative Trait Loci (eQTLs). Benchmarking against state-of-the-art deep learning models demonstrates pC-SAC's superior performance in both short- and long-range interaction reconstruction. pC-SAC offers a cost-effective solution for enhancing the resolution of Hi-C data, thus enabling deeper insights into genome organization and its role in gene regulation and disease. Our tool can be found at https://github.com/G2Lab/pCSAC.

14:00-14:20
Proceedings Presentation: HIDE: Hierarchical cell-type Deconvolution
Confirmed Presenter: Franziska Görtler, Cancer Clinic, Haukeland University Hospital, Norway

Room: 03A
Format: In person


Authors List: Show

  • Dennis Völkl, Institute of Theoretical Physics, University of Regensburg, Germany
  • Malte Mensching-Buhr, Department of Medical Bioinformatics, University Medical Center Göttingen, Germany
  • Thomas Sterr, Institute of Theoretical Physics, University of Regensburg, Germany
  • Sarah Bolz, Institute of Human Anatomy and Embryology, University of Regensburg, Germany
  • Andreas Schäfer, Institute of Theoretical Physics, University of Regensburg, Germany, Germany
  • Nicole Seifert, Department of Medical Bioinformatics, University Medical Center Göttingen, Germany
  • Jana Tauschke, Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Germany
  • Austin Rayford, Department of Biomedicine and Centre for Cancer Biomarkers, University of Bergen, Norway
  • Oddbjørn Straume, Cancer Clinic, Haukeland University Hospital, Norway
  • Helena U. Zacharias, Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Germany
  • Sushma Nagaraja Grellscheid, Computational Biology Unit, University of Bergen, Norway
  • Tim Beissbarth, University Medicine Göttingen, Germany
  • Michael Altenbuchinger, Department of Medical Bioinformatics, University Medical Center Göttingen, Germany
  • Franziska Görtler, Cancer Clinic, Haukeland University Hospital, Norway

Presentation Overview: Show

Motivation: Cell-type deconvolution is a computational approach to infer cellular distributions from bulk transcriptomics data. Several methods have been proposed, each with its own advantages and disadvantages. Reference based approaches make use of archetypic transcriptomic profiles representing individual cell types. Those reference profiles are ideally chosen such that the observed bulks can be reconstructed as a linear combination thereof. This strategy, however, ignores the fact that cellular populations arise through the process of cellular differentiation, which entails the gradual emergence of cell groups with diverse morphological and functional characteristics.
Results: Here, we propose Hierarchical cell-type Deconvolution (HIDE), a cell-type deconvolution approach which incorporates a cell hierarchy for improved performance and interpretability. This is achieved by a hierarchical procedure that preserves estimates of major cell populations while inferring their respective subpopulations. We show in simulation studies that this procedure produces more reliable and more consistent results than other state-of-the-art approaches. Finally, we provide an example application of HIDE to explore breast cancer specimens from TCGA.
Availability: A Python implementation of HIDE is available at Zenodo: doi:10.5281/zenodo.14724906.
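
To make the reference-based strategy from the Motivation concrete, the sketch below reconstructs a bulk profile as a non-negative linear combination of reference profiles. It is a flat, single-level illustration solved by projected gradient descent, not the hierarchical HIDE procedure; the function names, reference matrix and mixing proportions are all invented for the example.

```python
def mix(reference, weights):
    """Bulk profile as a linear combination of reference profiles."""
    return [sum(r * w for r, w in zip(row, weights)) for row in reference]

def deconvolve_bulk(reference, bulk, steps=2000, lr=0.1):
    """Estimate non-negative cell-type weights by projected gradient descent.

    reference[g][k] is the expression of gene g in cell type k (toy values).
    Minimizes the squared reconstruction error of the bulk profile.
    """
    n_types = len(reference[0])
    weights = [1.0 / n_types] * n_types
    for _ in range(steps):
        residual = [m - b for m, b in zip(mix(reference, weights), bulk)]
        gradient = [2 * sum(row[k] * r for row, r in zip(reference, residual))
                    for k in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - lr * g) for w, g in zip(weights, gradient)]
    total = sum(weights)
    return [w / total for w in weights]  # report as cellular proportions

# Toy reference: 4 genes x 3 cell types, and a bulk mixed at 50/30/20.
reference = [[1.0, 0.1, 0.0],
             [0.1, 0.8, 0.1],
             [0.0, 0.2, 0.9],
             [0.3, 0.3, 0.3]]
bulk = mix(reference, [0.5, 0.3, 0.2])
print(deconvolve_bulk(reference, bulk))
```

A hierarchical variant would, in broad terms, apply such a step per level of the cell-type tree while keeping the parent-level estimates fixed; how HIDE does this exactly is described in the presented work, not here.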

14:20-14:40
Proceedings Presentation: RVINN: A Flexible Modeling for Inferring Dynamic Transcriptional and Post-Transcriptional Regulation Using Physics-Informed Neural Networks
Confirmed Presenter: Osamu Muto, Division of Cancer Informatics, Nagoya University Graduate School of Medicine, Japan

Room: 03A
Format: In person


Authors List: Show

  • Osamu Muto, Division of Cancer Informatics, Nagoya University Graduate School of Medicine, Japan
  • Zhongliang Guo, Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Japan
  • Rui Yamaguchi, Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Japan

Presentation Overview: Show

Dynamic gene expression is controlled by transcriptional and post-transcriptional regulation. Recent studies on transcriptional bursting and buffering have increasingly highlighted the dynamic gene regulatory mechanisms. However, direct measurement techniques still face various constraints and require complementary methodologies that are both comprehensive and versatile. To address this issue, inference approaches based on transcriptome data and differential equation models representing the messenger RNA lifecycle have been proposed. However, the inference of complex dynamics under diverse experimental conditions and biological scenarios remains challenging. In this study, we developed a flexible modeling approach using Physics-Informed Neural Networks and demonstrated its performance using simulation and experimental data. Our model can computationally revalidate and visualize dynamic biological phenomena, such as transcriptional ripple, co-bursting, and buffering in a breast cancer cell line. Furthermore, our results suggest putative molecular mechanisms underlying these phenomena. We propose a novel approach for inferring transcriptional and post-transcriptional regulation and expect it to offer valuable insights for experimental and systems biology.

14:40-15:00
A deep learning framework for predicting single gene expression from cell-free DNA
Confirmed Presenter: Robert Patton, Fred Hutchinson Cancer Center, United States

Room: 03A
Format: In person


Authors List: Show

  • Robert Patton, Fred Hutchinson Cancer Center, United States
  • Alexander Netzley, Fred Hutchinson Cancer Center, United States
  • Thomas Persse, Fred Hutchinson Cancer Center, United States
  • Akira Nair, Fred Hutchinson Cancer Center, United States
  • Peter Nelson, Fred Hutchinson Cancer Center, United States
  • Gavin Ha, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA, United States

Presentation Overview: Show

Liquid biopsy derived circulating tumor DNA (ctDNA) profiling is increasingly used as a minimally invasive alternative to traditional biopsies. Epigenetic inference from ctDNA has made considerable strides, but current methods struggle with single gene resolution and require specialized assays or ultra-deep, targeted sequencing. Herein we jointly introduce Triton, a tool for comprehensive fragmentomic and nucleosome profiling of cell-free DNA (cfDNA), and Proteus, a multi-modal deep learning framework for predicting single gene expression as a direct RNA-Seq analog, using standard depth whole genome sequencing of cfDNA. By synthesizing fragmentation and inferred nucleosome positioning patterns in the promoter and gene body, Proteus is capable of reproducing expression profiles from patient-derived xenograft (PDX) pure ctDNA with an accuracy similar to RNA-Seq technical replicates. Applying Proteus to cfDNA from four patient cohorts with matched tumor RNA-Seq, we show that the model can accurately predict the expression of specific prognostic and phenotype markers and therapeutic targets at as low as 3% tumor fraction. As a direct analog to RNA-Seq, we further confirm this method’s immediate applicability to existing tools through accurate prediction of gene set and pathway enrichment scores. Our results demonstrate the potential clinical utility of Triton and Proteus as minimally invasive tools for cancer monitoring and therapeutic guidance, without requiring specialized assays or targeted panels.

15:00-15:20
MAGPIE: Multi-modal alignment of genes and peaks for integrated exploration of spatial transcriptomics and spatial metabolomics data
Room: 03A
Format: In person


Authors List: Show

  • Eleanor Williams, Cambridge Stem Cell Institute, United Kingdom
  • Lovisa Franzén, Department of Gene Technology, KTH Royal Institute of Technology, Science for Life Laboratory, Stockholm, Sweden, Sweden
  • Martina Olsson Lindvall, Safety Sciences, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Gothenburg, Sweden, Sweden
  • Gregory Hamm, Integrated Bioanalysis, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Cambridge, UK, United Kingdom
  • Steven Oag, Animal Science and Technology, Clinical Pharmacology and Safety Sciences, R&D, AstraZeneca, Gothenburg, Sweden, Sweden
  • James Denholm, Integrated Bioanalysis, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Cambridge, UK, United Kingdom
  • Azam Hamidinekoo, Pathology, Clinical Pharmacology and Safety Sciences, R&D, AstraZeneca, Cambridge, UK, United Kingdom
  • Muntasir Mamun Majumder, Safety Sciences, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Gothenburg, Sweden, Sweden
  • Javier Escudero Morlanes, Department of Gene Technology, KTH Royal Institute of Technology, Science for Life Laboratory, Stockholm, Sweden, Sweden
  • Marco Vicari, Department of Gene Technology, KTH Royal Institute of Technology, Science for Life Laboratory, Stockholm, Sweden, Sweden
  • Joakim Lundeberg, Department of Gene Technology, KTH Royal Institute of Technology, Science for Life Laboratory, Stockholm, Sweden, Sweden
  • Laura Setyo, Pathology, Clinical Pharmacology and Safety Sciences, R&D, AstraZeneca, Cambridge, UK, United Kingdom
  • Aleksandr Zakirov, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK, United Kingdom
  • Jorrit Hornberg, Safety Sciences, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Gothenburg, Sweden, Sweden
  • Marianna Stamou, Safety Sciences, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Gothenburg, Sweden, Sweden
  • Patrik Ståhl, Department of Gene Technology, KTH Royal Institute of Technology, Science for Life Laboratory, Stockholm, Sweden, Sweden
  • Anna Ollerstam, Safety Sciences, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Gothenburg, Sweden, Sweden
  • Jennifer Tan, Predictive AI & Data, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Cambridge, UK, United Kingdom
  • Irina Mohorianu, Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK, United Kingdom

Presentation Overview: Show

Recent developments in spatially resolved -omics have enabled studies linking gene expression and metabolite levels to tissue morphology, offering new insights into biological pathways. By capturing multiple modalities on matched tissue sections, one can better probe how different biological entities interact in a spatially coordinated manner. However, such cross-modality integration presents experimental and computational challenges.

To align multimodal datasets into a shared coordinate system and facilitate enhanced integration and analysis, we propose MAGPIE (Multi-modal Alignment of Genes and Peaks for Integrated Exploration), a framework for co-registering spatially resolved transcriptomics, metabolomics, and tissue morphology from the same or consecutive sections.

We illustrate the generalisability and scalability of MAGPIE on spatial multi-omics data from multiple tissues, combining Visium with both MALDI and DESI mass spectrometry imaging. MAGPIE was also applied to newly generated multimodal datasets created using a specialised experimental sampling strategy to characterise the metabolic and transcriptomic landscape in an in vivo model of drug-induced pulmonary fibrosis, showcasing the linking of small-molecule co-detection with endogenous responses in lung tissue.

MAGPIE highlights the refined resolution and increased interpretability of spatial multimodal analyses in studying tissue injury, particularly in pharmacological contexts, and offers a modular, accessible computational workflow for data integration.

15:20-15:40
Randomized Spatial PCA (RASP): a computationally efficient method for dimensionality reduction of high-resolution spatial transcriptomics data
Confirmed Presenter: Ian Gingerich, Dartmouth College, United States

Room: 03A
Format: In person


Authors List: Show

  • Ian Gingerich, Dartmouth College, United States
  • Brittany Goods, Dartmouth College, Thayer School of Engineering, United States
  • H. Robert Frost, Dartmouth College, Department of Biomedical Data Science, United States

Presentation Overview: Show

Spatial transcriptomics (ST) provides critical insights into the complex spatial organization of gene expression in tissues, enabling researchers to unravel the intricate relationship between cellular environments and biological function. Identifying spatial domains within tissues is essential for understanding tissue architecture and the mechanisms underlying various biological processes, including development and disease progression. Here, we present Randomized Spatial PCA (RASP), a novel spatially aware dimensionality reduction method for ST data. RASP is designed to be orders of magnitude faster than existing techniques, scale to ST data with hundreds of thousands of locations, support the flexible integration of non-transcriptomic covariates, and enable the reconstruction of de-noised and spatially smoothed expression values for individual genes. To achieve these goals, RASP uses a randomized two-stage principal component analysis (PCA) framework that leverages sparse matrix operations and configurable spatial smoothing. We compared the performance of RASP against five alternative methods (BASS, GraphST, SEDR, spatialPCA, and STAGATE) on four publicly available ST datasets generated using diverse techniques and resolutions (10x Visium, Stereo-Seq, MERFISH, and 10x Xenium) on human and mouse tissues. Our results demonstrate that RASP achieves tissue domain detection performance comparable or superior to that of existing methods with an improvement in computational speed of several orders of magnitude. The efficiency of RASP enhances the analysis of complex ST data by facilitating the exploration of the increasingly high-resolution subcellular ST datasets that are being generated.
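
The spatial smoothing idea mentioned above can be pictured with a very small sketch: each location's reduced-dimension scores are replaced by the average over itself and its nearest spatial neighbours. This uses plain k-nearest-neighbour averaging as a stand-in; the overview does not specify the smoothing kernel RASP actually uses, and the function name, coordinates and scores below are invented. The brute-force neighbour search is for readability only and would not scale to hundreds of thousands of locations.

```python
import math

def knn_smooth(coords, scores, k=1):
    """Average each location's score vector with its k nearest spatial neighbours.

    coords: list of (x, y) positions; scores: list of per-location vectors,
    e.g. principal component scores. Both are toy inputs here.
    """
    n_dims = len(scores[0])
    smoothed = []
    for centre in coords:
        # Sort all locations by distance; the location itself comes first.
        order = sorted(range(len(coords)),
                       key=lambda j: math.dist(centre, coords[j]))
        neighbours = order[:k + 1]
        smoothed.append([sum(scores[j][d] for j in neighbours) / len(neighbours)
                         for d in range(n_dims)])
    return smoothed

# Two well-separated pairs of spots with one score dimension each.
coords = [(0, 0), (0, 1), (10, 10), (10, 11)]
scores = [[1.0], [3.0], [10.0], [20.0]]
print(knn_smooth(coords, scores, k=1))
```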

15:40-16:00
CAdir: Fast Clustering and Visualization of Single-Cell Transcriptomics Data by Direction in CA Space
Confirmed Presenter: Clemens Kohl, Max Planck Institute for Molecular Genetics, Germany

Room: 03A
Format: In person


Authors List:

  • Clemens Kohl, Max Planck Institute for Molecular Genetics, Germany
  • Martin Vingron, Max Planck Institute for Molecular Genetics, Germany

Presentation Overview:

Clustering of single-cell RNA-seq data aims to find similar cells and group them into biologically meaningful clusters. Many available clustering algorithms, however, do not provide the cluster-defining marker genes, cannot infer the number of clusters in an unsupervised manner, or lack tools to easily assess the quality of the label assignments. Clustering quality is therefore commonly evaluated by visually inspecting low-dimensional embeddings produced by, e.g., UMAP or t-SNE. These embeddings can, however, distort the true cluster structure and are known to vary radically depending on the chosen hyperparameters.

To improve the interpretability of clustering results, we developed CAdir, a clustering algorithm that infers the number of clusters in the data, determines cluster-specific genes, and provides easy-to-interpret diagnostic plots. CAdir exploits the geometry induced by correspondence analysis (CA) to cluster cells, as well as cluster-associated genes, based on their direction in CA space. Using the angle between the cluster directions, it automatically infers the number of clusters in the data by merging and splitting clusters. A comprehensive set of diagnostic and explanatory plots gives users feedback on the clustering decisions and on the quality of the final and intermediate clusters. CAdir scales to even the largest data sets and, in our benchmarking, achieves clustering performance similar to other state-of-the-art cell clustering algorithms.
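
Clustering by direction and merging by angle can be illustrated with a small NumPy sketch. It is a simplified stand-in and not the CAdir algorithm itself: the input is assumed to be coordinates already embedded in some low-dimensional space, the function names and the 20-degree threshold are hypothetical, and cluster splitting and gene assignment are left out.

```python
import numpy as np

def unit(v):
    """Scale vectors (last axis) to unit length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def cluster_by_direction(points, directions, n_iter=10):
    """Assign each point to the direction with the highest cosine similarity,
    then re-estimate each direction from its members."""
    directions = unit(np.asarray(directions, dtype=float))
    for _ in range(n_iter):
        labels = np.argmax(unit(points) @ directions.T, axis=1)
        for c in range(len(directions)):
            members = points[labels == c]
            if len(members):
                directions[c] = unit(members.sum(axis=0))
    labels = np.argmax(unit(points) @ directions.T, axis=1)
    return labels, directions

def merge_close_directions(directions, min_angle_deg=20.0):
    """Keep a direction only if it is at least min_angle_deg away from every
    direction kept so far (a crude form of angle-based merging)."""
    kept = []
    for d in directions:
        angles = [np.degrees(np.arccos(np.clip(d @ k, -1.0, 1.0))) for k in kept]
        if all(a >= min_angle_deg for a in angles):
            kept.append(d)
    return np.array(kept)
```

Starting with more directions than expected clusters and then merging those separated by a small angle gives a simple way to let the data decide the final cluster count, which mirrors the role the abstract assigns to the angle between cluster directions.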

CAdir can be downloaded from GitHub: https://github.com/VingronLab/CAdir