The SciFinder tool lets you search Titles, Authors, and Abstracts of talks and panels. Enter your search term below and your results will be shown at the bottom of the page. You can also click on a track to see all the talks given in that track on that day.

View Talks By Category

July 14, 2025
July 15, 2025
July 20, 2025
July 21, 2025
July 22, 2025
July 23, 2025
July 24, 2025

Results

July 24, 2025
8:40-9:00
Proceedings Presentation: ADME-Drug-Likeness: Enriching Molecular Foundation Models via Pharmacokinetics-Guided Multi-Task Learning for Drug-likeness Prediction
Confirmed Presenter: Dongmin Bang, Seoul National University, South Korea
Track: GenCompBio: General Computational Biology

Room: 03A
Format: In person

Authors List:

  • Dongmin Bang, Seoul National University
  • Juyeon Kim, Seoul National University
  • Haerin Song, Seoul National University
  • Sun Kim, Seoul National University

Presentation Overview:

Recent breakthroughs in AI-driven generative models enable the rapid design of extensive molecular libraries, creating an urgent need for fast and accurate drug-likeness evaluation. Traditional approaches, however, rely heavily on structural descriptors and overlook pharmacokinetic (PK) factors such as absorption, distribution, metabolism, and excretion (ADME). Furthermore, existing deep-learning models neglect the complex interdependencies among ADME tasks, which play a pivotal role in determining clinical viability.
We introduce ADME-DL (drug-likeness), a novel two-step pipeline. Step 1 enhances a diverse range of Molecular Foundation Models (MFMs) via sequential ADME multi-task learning. By enforcing an A→D→M→E flow, grounded in a data-driven task dependency analysis that aligns with established pharmacokinetic principles, our method more accurately encodes PK information into the learned embedding space.
In Step 2, the resulting ADME-informed embeddings are leveraged for drug-likeness classification, distinguishing approved drugs from negative sets drawn from chemical libraries.
Through comprehensive experiments, we show that our sequential ADME multi-task learning achieves up to +2.4% improvement over state-of-the-art baselines and enhances performance across the tested MFMs by up to +18.2%. Case studies with clinically annotated drugs validate that respecting the PK hierarchy produces more relevant predictions, reflecting drug discovery phases. These findings underscore the potential of ADME-DL to significantly enhance the early-stage filtering of candidate molecules, bridging the gap between purely structural screening methods and PK-aware modeling.

July 24, 2025
9:00-9:20
Proceedings Presentation: Understanding the Sources of Performance in Deep Drug Response Models Reveals Insights and Improvements
Confirmed Presenter: Nikhil Branson, Queen Mary University of London, United Kingdom
Track: GenCompBio: General Computational Biology

Room: 03A
Format: In person

Authors List:

  • Nikhil Branson, Queen Mary University of London
  • Pedro Rodriguez Cutillas, Barts Cancer Institute
  • Conrad Bessant, Queen Mary - University of London

Presentation Overview:

Anti-cancer drug response prediction (DRP) using cancer cell lines (CLs) is crucial in stratified medicine and drug discovery. Recently, new deep learning models for DRP have improved performance over their predecessors. However, different models use different input data types and architectures, making it hard to find the source of these improvements. Here we consider published DRP models that report state-of-the-art performance predicting continuous response values. These models take chemical structures of drugs and omics profiles of CLs as input. By experimenting with these models and comparing them with our simple benchmarks, we show that no performance comes from drug features; instead, performance is due to the transcriptomics CL profiles. Furthermore, we show that, depending on the testing type, much of the currently reported performance is a property of the training target values. We address these limitations by creating BinaryET and BinaryCB, which predict binary drug response values, guided by the hypothesis that this reduces the noise in the drug efficacy data, thus better aligning them with biochemistry that can be learnt from the input data. BinaryCB leverages a chemical foundation model, while BinaryET is trained from scratch using a transformer-type architecture. We show that these models learn useful chemical drug features, which, to our knowledge, is the first time this has been demonstrated for multiple testing types. We further show that binarising the drug response values causes the models to learn useful chemical drug features. We also show that BinaryET improves performance over BinaryCB and the published models that report state-of-the-art performance.
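
The binarisation step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state how the threshold is chosen, so the median cut-off, the "lower value means sensitive" convention, and the function name `binarise_responses` are all assumptions made for illustration, not the authors' procedure.

```python
from statistics import median


def binarise_responses(responses, threshold=None):
    """Map continuous drug response values to 0/1 labels.

    Assumption: if no threshold is given, the median of the values is used.
    Assumption: lower values (e.g. a lower IC50) mean "sensitive" and map to 1.
    """
    if threshold is None:
        threshold = median(responses)
    return [1 if r < threshold else 0 for r in responses]


# Hypothetical response values for one drug across five cell lines.
example_values = [0.2, 1.5, 3.1, 0.8, 2.4]
labels = binarise_responses(example_values)
```

With the median as threshold, values strictly below 1.5 are labelled 1 and the rest 0. A fixed, drug-specific threshold could be passed instead through the `threshold` argument.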

July 24, 2025
9:20-9:40
Proceedings Presentation: FACT: Feature Aggregation and Convolution with Transformers for predicting drug classification code
Confirmed Presenter: Gwang-Hyeon Yun, Yonsei University - Mirae Campus, South Korea
Track: GenCompBio: General Computational Biology

Room: 03A
Format: In person

Authors List:

  • Gwang-Hyeon Yun, Yonsei University - Mirae Campus
  • Jong-Hoon Park, Yonsei University - Mirae Campus
  • Young-Rae Cho, Yonsei University - Mirae Campus

Presentation Overview:

Motivation: Drug repositioning, identifying new therapeutic applications for existing drugs, can significantly reduce the time and cost involved in drug development. Recent studies have explored the use of Anatomical Therapeutic Chemical (ATC) codes in drug repositioning, offering a systematic framework to predict ATC codes for a drug. The ATC classification system organizes drugs according to their chemical properties, pharmacological actions, and therapeutic effects. However, its complex hierarchical structure and the limited scalability at higher levels present significant challenges for achieving accurate ATC code prediction.
Results: We propose a novel approach to predict ATC codes of drugs, named Feature Aggregation and Convolution with Transformer models (FACT). This method computes three types of drug similarities, incorporating ATC code similarity with hierarchical weights and masked drug-ATC code associations. These features are then aggregated for each target drug-ATC code pair and processed through a convolution-transformer encoder to generate three embeddings. The embeddings are finally used to estimate the probability of an association between the target pair. The experimental results demonstrate that the proposed method achieves an AUROC of 0.9805 and an AUPRC of 0.9770 at level 4 of the ATC codes, outperforming the previous methods by 15.05% and 18.42%, respectively. This study highlights the effectiveness of integrating diverse drug features and the potential of transformer-based models in ATC code prediction.

July 24, 2025
9:40-10:00
Proceedings Presentation: Efficient 3D kernels for molecular property prediction
Confirmed Presenter: Ankit, Indian Institute of Technology Palakkad, India
Track: GenCompBio: General Computational Biology

Room: 03A
Format: In person

Authors List:

  • Ankit, Indian Institute of Technology Palakkad
  • Sahely Bhadra, Indian Institute of Technology Palakkad
  • Juho Rousu, Aalto University

Presentation Overview:

This paper addresses the challenge of incorporating 3-dimensional structural information in graph kernels for machine learning-based virtual screening, a crucial task in drug discovery. Existing kernels that capture 3D information often suffer from high computational complexity, which limits their scalability. To overcome this, we propose the 3-dimensional chain motif graph kernel (c-MGK), which effectively integrates essential 3D structural properties—bond length, bond angle, and torsion angle—within the three-hop neighborhood of each atom in a molecule. In addition, we introduce a more computationally efficient variant, the 3-dimensional graph hopper kernel (3DGHK), which reduces the complexity from the state-of-the-art $\mathcal{O}(n^{6})$ (for the 3D pharmacophore kernel) to $\mathcal{O}(n^{2}(m + \log(n) + \delta^2 +dT^{6}))$. Here, $n$ is the number of nodes, $T$ is the highest degree of the node, $m$ is the number of edges, $\delta$ is the diameter of the graph, and $d$ is the dimension of the attributes of the nodes. We conducted experiments on 21 datasets, demonstrating that 3DGHK not only outperforms state-of-the-art 2D and 3D graph kernels, but also surpasses deep learning models in classification accuracy, offering a powerful and scalable solution for virtual screening tasks.
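
The three 3D properties named in this abstract (bond length, bond angle, and torsion angle) can all be computed from atomic coordinates along a chain of up to four atoms, which corresponds to a three-hop neighborhood. The following is a minimal sketch of those geometric quantities only; it is not the c-MGK or 3DGHK kernel, and the function names are illustrative.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def bond_length(p, q):
    """Euclidean distance between two bonded atoms."""
    return _norm(_sub(q, p))


def bond_angle(p, q, r):
    """Angle at atom q (in degrees) between bonds q-p and q-r."""
    u, v = _sub(p, q), _sub(r, q)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def torsion_angle(p, q, r, s):
    """Signed dihedral angle (in degrees) for the atom chain p-q-r-s."""
    b1, b2, b3 = _sub(q, p), _sub(r, q), _sub(s, r)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For example, `torsion_angle((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))` describes a planar cis arrangement, and moving the last atom to `(-1, 1, 0)` gives the trans arrangement. Collinear or coincident atoms are not handled in this sketch.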

July 24, 2025
11:20-11:40
Haplotype-specific copy number profiling of cancer genomes from long reads sequencing data
Confirmed Presenter: Tanveer Ahmad, NIH/NCI, United States
Track: GenCompBio: General Computational Biology

Room: 03A
Format: In person

Authors List:

  • Tanveer Ahmad, NIH/NCI
  • Ayse Keskus, NIH/NCI
  • Mikhail Kolmogorov, NIH/NCI
  • Sergey Aganezov, Oxford Nanopore
  • Midhat Farooqi, Childrens Mercy
  • Anton Goretsky, University of Maryland
  • Ataberk Donmez, University of Maryland
  • Michael Dean, NIH/NCI

Presentation Overview:

Attached as PDF

July 24, 2025
11:40-12:00
Multi-omics and liquid biopsy profiling of rapid autopsies reveals evolutionary dynamics and heterogeneity in metastatic bladder cancer
Confirmed Presenter: Pushpa Itagi, Public Health Sciences, Fred Hutch
Track: GenCompBio: General Computational Biology

Room: 03A
Format: In person

Authors List:

  • Pushpa Itagi, Public Health Sciences
  • Samantha Schuster, Public Health Sciences
  • Sonali Arora, Public Health Sciences
  • Patricia Galipeau, Public Health Sciences
  • Hung-Ming Lam, Department of Urology
  • Andrew Hsieh, Human Biology Division
  • Gavin Ha, Public Health Sciences

Presentation Overview:

The extensive molecular, transcriptomic and genomic complexity of metastatic bladder cancer (mBLCA) significantly complicates clinical management. Approximately 75% of mBLCA cases are conventional urothelial carcinoma, while 25% display variant histologies, which have a poorer prognosis. We characterized heterogeneity and clonal evolution in a rapid autopsy cohort of 20 patients using tumor tissues, matched normal samples, and cell-free DNA (cfDNA). Clonal evolution and metastatic seeding and migration patterns were inferred from mutation data for all patients. We used COSMIC signatures linking mutation profiles to histological and clinical features for various subtypes. Custom approaches and frameworks were developed for analyzing mutations, copy number alterations (CNAs), and structural variants (SVs) in tumors and cfDNA. Mutational clonal evolution analyses and RNA-seq highlighted cisplatin resistance in the plasmacytoid urothelial carcinoma (PUC) subtype, driven by enhanced DNA damage response pathways. Most patients showed significant heterogeneity in mutations (~20–30% subclonal) and in CNAs/SVs (>40% subclonal), potentially driving therapy resistance and elevating tumor heterogeneity. cfDNA detected about 90% of founder, 85% of shared, and 25% of private mutations from matched tumors. Nucleosome profiling from cfDNA differentiated mBLCA from healthy controls and identified variant-specific transcription factors that are active in mBLCA. Integrating multi-omics with cfDNA effectively captures intra-patient and inter-patient tumor heterogeneity, providing a comprehensive view of clonal dynamics. Insights and findings from this work pave the way for targeted therapies against evolving tumor clones and offer strategies to overcome resistance mechanisms in mBLCA.

July 24, 2025
12:00-12:20
Using spatial transcriptomics to elucidate the primary site of Cancers of Unknown Primary (CUPs)
Confirmed Presenter: Oscar González Velasco, German Cancer Research Center DKFZ, Germany
Track: GenCompBio: General Computational Biology

Room: 03A
Format: In person

Authors List:

  • Oscar González Velasco, German Cancer Research Center DKFZ
  • Siao-Han Wong, German Cancer Research Center (DKFZ)
  • Marta Casado, Institut de Recerca Contra la Leucèmia Josep Carreras
  • Veronica Davalos, Institut de Recerca Contra la Leucèmia Josep Carreras
  • Javier De Las Rivas Sanz, Salamanca Cancer Research Center
  • Manell Steller, Institut de Recerca Contra la Leucèmia Josep Carreras
  • Benedikt Brors, German Cancer Research Center (DKFZ)

Presentation Overview:

Cancers of unknown primary (CUP) are a challenging group of poorly differentiated metastatic cancers for which, owing to their nature, limited treatment options are available, resulting in poor prognosis and overall survival. Recently, novel predictive models to characterize CUP patients showed encouraging results and suggested relevant therapeutic interventions, yet lacked the consistency and interpretability to be widely adopted in clinical care.
We have developed a state-of-the-art AI CNN using bulk RNA-Seq gene expression and prior knowledge in the form of curated gene signatures of transcription factors and their associated gene targets. The training corpus consists of more than 27,000 samples from cancer patients and healthy donors, targeting 28 primary sites. The model displayed an accuracy of 97.17% on validation data at predicting the primary sites. Additionally, we analyzed 40 spatial transcriptomic samples from a wide range of known primary sites, including distant metastasis, for which we unambiguously located the correct primary site in 39 of them. We also analyzed 20 novel CUP spatial transcriptomics samples. Results show that, by using annotations from pathologists, our suggested primaries could help to identify plausible origins, yielding coherent results (in contrast with the homing tissue) for those which did not have any clinically derived hypothesis.
By identifying the true primary site of metastatic CUPs, we hope in the future to provide clinical benefits from site-specific therapies, opening up many existing treatment options.

July 24, 2025
12:20-12:40
Inherited genetic risk factors associated with young adult versus late-onset lung cancers
Confirmed Presenter: Zeynep H. Gümüş, Icahn School of Medicine at Mount Sinai, United States
Track: GenCompBio: General Computational Biology

Room: 03A
Format: In person

Authors List:

  • Myvizhi Esai Selvan, Icahn School of Medicine at Mount Sinai
  • Robert J. Klein, Icahn School of Medicine at Mount Sinai
  • Zeynep H. Gümüş, Icahn School of Medicine at Mount Sinai

Presentation Overview:

Genetics plays a key role in lung cancer risk. While lung cancer primarily affects older adults, incidence among young adults is increasing. However, whether the germline genetics differ between young adults (<45 years) and older lung cancer patients (≥45 years) remains unclear.

We performed whole-genome sequencing on 171 predominantly young lung cancer patients and integrated germline whole-exome sequencing datasets from existing lung cancer cohorts and biobanks, totaling 9,065 participants (the largest analysis of lung cancer patients to date), with 186 young adults and 6,359 older cases after sample QC. We compared the prevalence of rare pathogenic and likely pathogenic (P/LP) variants in cancer-related genes and 33,591 pathways from the Human Molecular Signatures Database (MSigDB) between the two age groups using Fisher’s exact test, accounting for histology, gender and smoking status.

Young adult lung cancer patients harbored significantly more rare P/LP variants in DNA damage response genes compared to older patients, especially in lung squamous cell carcinoma patients and females. This association persisted in lung adenocarcinoma patients after controlling for smoking status. Young adult patients showed enrichment of rare P/LP variants in cancer driver, Fanconi Anemia and complement pathway genes. Notably, rare P/LP variants in BRIP1, ERCC6 and MSH5 were significantly more prevalent in young adult patients.

Our results demonstrate that the inherited genetics of early-onset lung cancer differs significantly from late-onset lung cancer. These findings can inform age-specific risk assessment and guide precision prevention, screening and targeted treatment strategies for young adult individuals harboring these variants.
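
The group comparison in this abstract relies on Fisher's exact test. As a minimal sketch, a two-sided test on a single 2×2 table (variant carriers versus non-carriers, young versus older patients) can be written with the standard library alone. The function name is illustrative, and the sketch does not cover the adjustment for histology, gender, or smoking status mentioned in the abstract.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Made-up counts for illustration only (not data from the study):
# rows = young / older patients, columns = carriers / non-carriers.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

The same function would be applied once per gene or pathway, after which a multiple-testing correction would normally follow. That step is omitted here.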

July 24, 2025
12:40-13:00
pC-SAC: Method for High-Resolution 3D Genome Reconstruction from Low-Resolution Hi-C Data
Confirmed Presenter: Carlos Angel, Department of Biomedical Informatics, Columbia University
Track: GenCompBio: General Computational Biology

Room: 03A
Format: In person

Authors List:

  • Carlos Angel, Department of Biomedical Informatics
  • Narjis El Amraoui, New York Genome Center
  • Gamze Gürsoy, Department of Biomedical Informatics

Presentation Overview:

The three-dimensional (3D) organization of the genome is crucial for gene regulation, with disruptions linked to various diseases. High-throughput Chromosome Conformation Capture (Hi-C) and related technologies have advanced our understanding of genome architecture by mapping interactions between distant genomic regions. However, capturing enhancer-promoter interactions at high resolution remains challenging due to the high sequencing depth required. We introduce pC-SAC (probabilistically Constrained-Self-Avoiding-Chromatin), a novel computational method for producing accurate high-resolution Hi-C matrices from low-resolution data. pC-SAC uses adaptive importance sampling with sequential Monte Carlo to generate ensembles of 3D chromatin chains that satisfy physical constraints derived from low-resolution Hi-C data. Our method achieves over 95% accuracy in reconstructing high-resolution chromatin maps and identifies novel interactions enriched with candidate cis-regulatory elements (cCREs) and expression Quantitative Trait Loci (eQTLs). Benchmarking against state-of-the-art deep learning models demonstrates pC-SAC's superior performance in both short- and long-range interaction reconstruction. pC-SAC offers a cost-effective solution for enhancing the resolution of Hi-C data, thus enabling deeper insights into genome organization and its role in gene regulation and disease. Our tool can be found at https://github.com/G2Lab/pCSAC.

July 24, 2025
14:00-14:20
Proceedings Presentation: HIDE: Hierarchical cell-type Deconvolution
Confirmed Presenter:
Track: GenCompBio: General Computational Biology

Room: 03A
Format: In person

Authors List:

  • Dennis Völkl, Institute of Theoretical Physics
  • Malte Mensching-Buhr, Department of Medical Bioinformatics
  • Thomas Sterr, Institute of Theoretical Physics
  • Sarah Bolz, Institute of Human Anatomy and Embryology
  • Andreas Schäfer, Institute of Theoretical Physics
  • Nicole Seifert, Department of Medical Bioinformatics
  • Jana Tauschke, Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School
  • Austin Rayford, Department of Biomedicine and Centre for Cancer Biomarkers
  • Oddbjørn Straume, Cancer Clinic
  • Helena U. Zacharias

Presentation Overview:

Motivation: Cell-type deconvolution is a computational approach to infer cellular distributions from bulk transcriptomics data. Several methods have been proposed, each with its own advantages and disadvantages. Reference based approaches make use of archetypic transcriptomic profiles representing individual cell types. Those reference profiles are ideally chosen such that the observed bulks can be reconstructed as a linear combination thereof. This strategy, however, ignores the fact that cellular populations arise through the process of cellular differentiation, which entails the gradual emergence of cell groups with diverse morphological and functional characteristics.
Results: Here, we propose Hierarchical cell-type Deconvolution (HIDE), a cell-type deconvolution approach which incorporates a cell hierarchy for improved performance and interpretability. This is achieved by a hierarchical procedure that preserves estimates of major cell populations while inferring their respective subpopulations. We show in simulation studies that this procedure produces more reliable and more consistent results than other state-of-the-art approaches. Finally, we provide an example application of HIDE to explore breast cancer specimens from TCGA.
Availability: A Python implementation of HIDE is available at Zenodo: doi:10.5281/zenodo.14724906.
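
The idea of preserving a major population's estimate while resolving its subpopulations can be illustrated with a deliberately simple sketch. This is not the HIDE algorithm. It only shows one way to keep a parent fraction fixed, by proportionally rescaling child scores so they sum to the parent estimate. The function name and the cell-type labels are hypothetical.

```python
def split_parent_estimate(parent_fraction, raw_child_scores):
    """Distribute a fixed parent fraction among its child cell types.

    Assumption: raw_child_scores holds non-negative, unnormalised scores
    for the subpopulations. The returned fractions always sum to
    parent_fraction, so the parent-level estimate is preserved.
    """
    total = sum(raw_child_scores.values())
    if total <= 0:
        # No evidence for any child: split the parent fraction evenly.
        share = parent_fraction / len(raw_child_scores)
        return {name: share for name in raw_child_scores}
    return {name: parent_fraction * score / total
            for name, score in raw_child_scores.items()}


# Hypothetical example: 40% T cells, resolved into two subtypes.
children = split_parent_estimate(0.4, {"CD4 T": 3.0, "CD8 T": 1.0})
```

Here the two subtype fractions come out at roughly 0.3 and 0.1, and their sum stays at the parent estimate of 0.4.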

July 24, 2025
14:20-14:40
Proceedings Presentation: RVINN: A Flexible Modeling for Inferring Dynamic Transcriptional and Post-Transcriptional Regulation Using Physics-Informed Neural Networks
Confirmed Presenter: Osamu Muto, Division of Cancer Informatics, Nagoya University Graduate School of Medicine
Track: GenCompBio: General Computational Biology

Room: 03A
Format: In person

Authors List:

  • Osamu Muto, Division of Cancer Informatics
  • Zhongliang Guo, Division of Cancer Systems Biology
  • Rui Yamaguchi, Division of Cancer Systems Biology

Presentation Overview:

Dynamic gene expression is controlled by transcriptional and post-transcriptional regulation. Recent studies on transcriptional bursting and buffering have increasingly highlighted the dynamic gene regulatory mechanisms. However, direct measurement techniques still face various constraints and require complementary methodologies that are both comprehensive and versatile. To address this issue, inference approaches based on transcriptome data and differential equation models representing the messenger RNA lifecycle have been proposed. However, the inference of complex dynamics under diverse experimental conditions and biological scenarios remains challenging. In this study, we developed a flexible modeling approach using Physics-Informed Neural Networks and demonstrated its performance using simulation and experimental data. Our model can computationally revalidate and visualize dynamic biological phenomena, such as transcriptional ripple, co-bursting, and buffering in a breast cancer cell line. Furthermore, our results suggest putative molecular mechanisms underlying these phenomena. We propose a novel approach for inferring transcriptional and post-transcriptional regulation and expect it to offer valuable insights for experimental and systems biology.

July 24, 2025
14:40-15:00
A deep learning framework for predicting single gene expression from cell-free DNA
Confirmed Presenter: Robert Patton, Fred Hutchinson Cancer Center, United States
Track: GenCompBio: General Computational Biology

Room: 03A
Format: In person

Authors List:

  • Robert Patton, Fred Hutchinson Cancer Center
  • Alexander Netzley, Fred Hutchinson Cancer Center
  • Thomas Persse, Fred Hutchinson Cancer Center
  • Akira Nair, Fred Hutchinson Cancer Center
  • Peter Nelson, Fred Hutchinson Cancer Center
  • Gavin Ha, Division of Public Health Sciences

Presentation Overview:

Liquid biopsy derived circulating tumor DNA (ctDNA) profiling is increasingly used as a minimally invasive alternative to traditional biopsies. Epigenetic inference from ctDNA has made considerable strides, but current methods struggle with single gene resolution and require specialized assays or ultra-deep, targeted sequencing. Herein we jointly introduce Triton, a tool for comprehensive fragmentomic and nucleosome profiling of cell-free DNA (cfDNA), and Proteus, a multi-modal deep learning framework for predicting single gene expression as a direct RNA-Seq analog, using standard depth whole genome sequencing of cfDNA. By synthesizing fragmentation and inferred nucleosome positioning patterns in the promoter and gene body, Proteus is capable of reproducing expression profiles from patient-derived xenograft (PDX) pure ctDNA with an accuracy similar to RNA-Seq technical replicates. Applying Proteus to cfDNA from four patient cohorts with matched tumor RNA-Seq, we show that the model can accurately predict the expression of specific prognostic and phenotype markers and therapeutic targets at as low as 3% tumor fraction. As a direct analog to RNA-Seq, we further confirm this method’s immediate applicability to existing tools through accurate prediction of gene set and pathway enrichment scores. Our results demonstrate the potential clinical utility of Triton and Proteus as minimally invasive tools for cancer monitoring and therapeutic guidance, without requiring specialized assays or targeted panels.

July 24, 2025
15:00-15:20
MAGPIE: Multi-modal alignment of genes and peaks for integrated exploration of spatial transcriptomics and spatial metabolomics data
Track: GenCompBio: General Computational Biology

Room: 03A
Format: In person

Authors List:

  • Eleanor Williams, Cambridge Stem Cell Institute
  • Lovisa Franzén, Department of Gene Technology
  • Martina Olsson Lindvall, Safety Sciences
  • Gregory Hamm, Integrated Bioanalysis
  • Steven Oag, Animal Science and Technology
  • James Denholm, Integrated Bioanalysis
  • Azam Hamidinekoo, Pathology
  • Muntasir Mamun Majumder, Safety Sciences

Presentation Overview:

Recent developments in spatially resolved -omics have enabled studies linking gene expression and metabolite levels to tissue morphology, offering new insights into biological pathways. By capturing multiple modalities on matched tissue sections, one can better probe how different biological entities interact in a spatially coordinated manner. However, such cross-modality integration presents experimental and computational challenges.

To align multimodal datasets into a shared coordinate system and facilitate enhanced integration and analysis, we propose MAGPIE (Multi-modal Alignment of Genes and Peaks for Integrated Exploration), a framework for co-registering spatially resolved transcriptomics, metabolomics, and tissue morphology from the same or consecutive sections.

We illustrate the generalisability and scalability of MAGPIE on spatial multi-omics data from multiple tissues, combining Visium with both MALDI and DESI mass spectrometry imaging. MAGPIE was also applied to newly generated multimodal datasets created using a specialised experimental sampling strategy to characterise the metabolic and transcriptomic landscape in an in vivo model of drug-induced pulmonary fibrosis, showcasing the linking of small-molecule co-detection with endogenous responses in lung tissue.

MAGPIE highlights the refined resolution and increased interpretability of spatial multimodal analyses in studying tissue injury, particularly in pharmacological contexts, and offers a modular, accessible computational workflow for data integration.

July 24, 2025
15:20-15:40
CAdir: Fast Clustering and Visualization of Single-Cell Transcriptomics Data by Direction in CA Space
Confirmed Presenter: Clemens Kohl, Max Planck Institute for Molecular Genetics, Germany
Track: GenCompBio: General Computational Biology

Room: 03A
Format: In person

Authors List:

  • Clemens Kohl, Max Planck Institute for Molecular Genetics
  • Martin Vingron, Max Planck Institute for Molecular Genetics

Presentation Overview:

Clustering for single-cell RNA-seq aims at finding similar cells and grouping them into biologically meaningful clusters. Many available clustering algorithms, however, do not provide the cluster-defining marker genes, are unable to infer the number of clusters in an unsupervised manner, or lack tools to easily determine the quality of the label assignments. Therefore, clustering quality is commonly evaluated by visually inspecting low-dimensional embeddings as produced by, e.g., UMAP or t-SNE. These embeddings can, however, distort the true cluster structure and are known to differ radically depending on the chosen hyperparameters.

In order to improve the interpretability of clustering results, we developed CAdir, a clustering algorithm that can infer the number of clusters in the data, determine cluster-specific genes, and provide easy-to-interpret diagnostic plots. CAdir exploits the geometry induced by correspondence analysis (CA) to cluster cells as well as cluster-associated genes based on their direction in CA space. Using the angle between the cluster directions, it is able to automatically infer the number of clusters in the data by merging and splitting clusters. A comprehensive set of diagnostic and explanatory plots provides users with valuable feedback about the clustering decisions and the quality of the final as well as intermediary clusters. CAdir is scalable to even the largest data sets and provides similar clustering performance to other state-of-the-art cell clustering algorithms in our benchmarking.

CAdir can be downloaded from GitHub: https://github.com/VingronLab/CAdir
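
Clustering by direction, as described in this abstract, can be sketched in a few lines: each point is assigned to the cluster direction with which it forms the smallest angle, and two clusters become merge candidates when the angle between their directions is small. This is an illustration of the geometric idea only, not the CAdir implementation. The function names and the 30-degree cut-off are assumptions, and the input vectors stand in for coordinates in CA space.

```python
import math


def angle_deg(u, v):
    """Angle in degrees between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_t = dot / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against rounding error before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def assign_by_direction(points, directions):
    """Return, for each point, the index of the closest direction by angle."""
    return [min(range(len(directions)),
                key=lambda k: angle_deg(p, directions[k]))
            for p in points]


def should_merge(dir_a, dir_b, cutoff_deg=30.0):
    """Flag two cluster directions as merge candidates (cut-off is assumed)."""
    return angle_deg(dir_a, dir_b) < cutoff_deg


# Hypothetical 2D coordinates and two cluster directions.
cells = [(2.0, 0.1), (0.1, 3.0), (5.0, 0.5)]
directions = [(1.0, 0.0), (0.0, 1.0)]
assignments = assign_by_direction(cells, directions)
```

Because only the angle matters, a point far from the origin and a point close to it are treated alike as long as they lie along the same direction.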