July 24, 2025
8:40-9:00
Proceedings Presentation: GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search
Confirmed Presenter: Jiayu Shang, Department of Information Engineering, Chinese University of Hong Kong
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Fuchuan Qu, Department of Electrical Engineering
  • Cheng Peng, Department of Electrical Engineering
  • Jiaojiao Guan, Department of Electrical Engineering
  • Donglin Wang, School of Environmental Science & Engineering
  • Yanni Sun, Department of Electrical Engineering
  • Jiayu Shang, Department of Information Engineering

Presentation Overview:

Motivation: Nucleocytoplasmic large DNA viruses (NCLDVs) are notable for their large genomes and extensive gene repertoires, which contribute to their widespread environmental presence and critical roles in processes such as host metabolic reprogramming and nutrient cycling. Metagenomic sequencing has emerged as a powerful tool for uncovering novel NCLDVs in environmental samples. However, identifying NCLDV sequences in metagenomic data remains challenging due to their high genomic diversity, limited reference genomes, and shared regions with other microbes. Existing alignment-based and machine learning methods struggle with achieving optimal trade-offs between sensitivity and precision.

Results: In this work, we present GiantHunter, a reinforcement learning-based tool for identifying NCLDVs from metagenomic data. By employing a Monte Carlo tree search strategy, GiantHunter dynamically selects representative non-NCLDV sequences as the negative training data, enabling the model to establish a robust decision boundary. Benchmarking on rigorously designed experiments shows that GiantHunter achieves high precision while maintaining competitive sensitivity, improving the F1-score by 10% and reducing computational cost by 90% compared to the second-best method. To demonstrate its real-world utility, we applied GiantHunter to 60 metagenomic datasets collected from six cities along the Yangtze River, located both upstream and downstream of the Three Gorges Dam. The results reveal significant differences in NCLDV diversity correlated with proximity to the dam, likely influenced by reduced flow velocity caused by the dam. These findings highlight GiantHunter's potential to advance our understanding of NCLDVs and their ecological roles in diverse environments.

July 24, 2025
9:00-9:10
CAMI Benchmarking Portal: online evaluation and ranking of metagenomic software
Confirmed Presenter: Fernando Meyer, Helmholtz Centre for Infection Research, Braunschweig
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Fernando Meyer, Helmholtz Centre for Infection Research
  • Gary Robertson, Helmholtz Centre for Infection Research
  • Zhi-Luo Deng, Helmholtz Centre for Infection Research
  • David Koslicki, Penn State University
  • Alexey Gurevich, Helmholtz Institute for Pharmaceutical Research Saarland
  • Alice C. McHardy, Helmholtz Centre for Infection Research

Presentation Overview:

Finding appropriate software and parameter settings to process shotgun metagenome data is essential for meaningful metagenomic analyses. To enable objective and comprehensive benchmarking of metagenomic software, the community-led initiative for the Critical Assessment of Metagenome Interpretation (CAMI) promotes standards and best practices. Since 2015, CAMI has provided comprehensive datasets, benchmarking guidelines, and challenges. However, benchmarking had to be conducted offline, requiring substantial time and technical expertise and leading to gaps in results between challenges. We present the CAMI Benchmarking Portal — a central repository of CAMI resources and web server for the evaluation and ranking of metagenome assembly, binning, and taxonomic profiling software. The portal simplifies evaluation, enabling users to easily compare their results with previous and other users’ submissions through a variety of metrics and visualizations. The portal currently hosts 28,675 results and is freely available at https://cami-challenge.org/.

July 24, 2025
9:10-9:20
Invited Presentation: CAMI community exchange
Confirmed Presenter: Alice McHardy, Helmholtz Centre for Infection Research, Germany
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Alice McHardy, Helmholtz Centre for Infection Research

July 24, 2025
9:20-9:30
NanoGraph: Mapping Nanopore Squiggles to Graphs Enables Accurate Taxonomic Assignment
Confirmed Presenter: Wenhuan Zeng, University of Tuebingen, Germany
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Wenhuan Zeng, University of Tuebingen
  • Daniel H. Huson, University of Tuebingen

Presentation Overview:

Nanopore sequencing technology offers long sequencing reads and real-time analysis capabilities, making it a powerful tool for addressing diverse questions in the life sciences. This technology detects raw electrical signals from samples, which are converted into nucleotide sequences (A, T, G, and C) through a process known as basecalling. These sequences can subsequently be used for various types of analysis. To enhance the efficiency of taxonomic classification in Nanopore sequencing and explore the challenges of applying deep learning algorithms to ultra-long sequences, we developed NanoGraph, a graph-based deep learning framework designed to classify samples based on their taxonomic lineages. NanoGraph processes raw signals (of substantial length) by transforming them into topological graph structures using novel methods. We evaluated NanoGraph’s performance using a customized simulated dataset and benchmarked it against a previous study on public datasets, demonstrating superior results. Additionally, we assessed its practical usability after fine-tuning the trained model on real raw signal datasets generated in our wet lab. In summary, NanoGraph provides a robust and effective approach for the taxonomic classification of Nanopore-sequenced samples, offering insights that advance the application of graph neural networks to raw signal data and help bridge the gap between computational efficiency and ultra-long sequencing reads.

July 24, 2025
9:30-9:40
MEGAN7: Enhanced Optimization and Advanced Functionality for Metagenomic Analysis
Confirmed Presenter: Anupam Gautam, University of Tuebingen/ IMPRS "From Molecules to Organisms", Max Planck Institute for Biology Tuebingen
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Anupam Gautam, University of Tuebingen/ IMPRS "From Molecules to Organisms"
  • Daniel H. Huson, University of Tuebingen

Presentation Overview:

MEGAN is a widely used, user-friendly tool for metagenomic analysis, suitable for long and short read data, and remains the only tool with a GUI. MEGAN7 introduces optimized workflows and enhanced functionality. By utilizing smaller, clustered reference databases, MEGAN7 improves computational efficiency while maintaining high-quality taxonomic and functional assignments, making it a scalable solution for diverse datasets.
This study presents current work on MEGAN7, a major update of our MEGAN software, and highlights the impact of utilizing smaller reference databases on the computational efficiency and effectiveness of metagenomic sequencing data analysis, as integrated into MEGAN7.
Metagenomic analysis was conducted on short and long reads from ten diverse datasets. Reads were aligned to various resolutions of the UniRef database (100%, 90%, and 50%) and clustered NCBI-nr databases (90% and 50% identity) using DIAMOND. Taxonomic and functional binning of the aligned reads was carried out using MEGAN7.
Smaller reference databases, particularly at 90% and 50% identity, significantly accelerated processing times while maintaining high-quality alignment and assignment rates. The integration of DIAMOND's clustering capabilities further enhanced efficiency, demonstrating improved performance across all downsized databases. MEGAN7 achieved good and agreeable assignment rates for both taxonomic and functional binning, even with reduced database sizes.
These findings illustrate that downsizing reference databases effectively reduces the computational burden of metagenomic analysis without compromising result quality. The incorporation of DIAMOND's clustering features offers additional efficiency gains. With these optimized workflows, MEGAN7 presents a scalable and efficient tool for metagenomic data analysis, offering enhanced functionality for diverse datasets.

July 24, 2025
9:40-9:50
TaxSEA: Rapid Interpretation of Microbiome Alterations Using Taxon Set Enrichment Analysis and Public Databases
Confirmed Presenter: Feargal Ryan, Flinders University, Australia
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Feargal Ryan, Flinders University

Presentation Overview:

Microbial communities are essential regulators of ecosystem function, with their composition commonly assessed through DNA sequencing. Most current tools focus on detecting changes among individual taxa (e.g., species or genera). However, in other omics fields, such as transcriptomics, enrichment analyses like Gene Set Enrichment Analysis (GSEA) are commonly used to uncover patterns not seen with individual features. Here, we introduce TaxSEA, a taxon set enrichment analysis tool available as an R package, a web portal (https://shiny.taxsea.app), and a Python package. TaxSEA integrates taxon sets from five public microbiota databases (BugSigDB, MiMeDB, GutMGene, mBodyMap, and GMRepoV2) while also allowing users to incorporate custom sets such as taxonomic groupings. In-silico assessments show TaxSEA is accurate across a range of set sizes. When applied to differential abundance analysis output from Inflammatory Bowel Disease and Type 2 Diabetes metagenomic data, TaxSEA can rapidly identify changes in functional groups corresponding to known associations. We also show that TaxSEA is robust to the choice of differential abundance (DA) analysis package. In summary, TaxSEA enables researchers to efficiently contextualize their findings within the broader microbiome literature, facilitating rapid interpretation and advancing understanding of microbiome–host and environmental interactions.

July 24, 2025
9:50-10:00
SinProVirP: a Signature Protein-based Approach for Accurate and Efficient Profiling of the Human Gut Virome
Confirmed Presenter: Junhua Li, BGI Research, Belgrade 11000
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Junhua Li, BGI Research
  • Fangming Yang, BGI Research
  • Liwen Xiong, BGI Research
  • Min Li, BGI Research
  • Xuyang Feng, BGI Research
  • Huahui Ren, Institute of Intelligent Medical Research (IIMR)
  • Zhun Shi, Institute of Intelligent Medical Research (IIMR)
  • Huanzi Zhong, Institute of Intelligent Medical Research (IIMR)

Presentation Overview:

The human gut virome represents a critical yet underexplored microbial component that regulates bacterial communities, modulates host immunity, and maintains gut health. However, virome analysis remains challenging due to the vast diversity and genomic variability of viruses. Existing profiling methods often struggle with accuracy and efficiency, hindering their ability to detect novel viral species and perform large-scale analyses. Here, we present SinProVirP, a genus-level virome profiling tool based on signature proteins. By analyzing 275,202 phage genomes to establish a curated database of 109,221 signature proteins across 6,780 viral clusters (VCs), SinProVirP achieves genus-level phage quantification with precision and recall comparable to the benchmark method while reducing computational demands by over 80%. Crucially, SinProVirP significantly outperforms existing tools in detecting novel viruses, achieving over 80% recall by using a signature protein-based identification strategy. Applied to inflammatory bowel disease (IBD) cohorts, SinProVirP revealed disease-specific virome dysbiosis, identified phage-host interactions, and improved performance of bacteria-only disease classification models. This approach enables robust, large-scale virome analysis, facilitates the integrative analysis of viral and bacterial communities, and improves our understanding of the virome’s role in health.

July 24, 2025
11:20-11:40
Proceedings Presentation: Leveraging Large Language Models to Predict Antibiotic Resistance in Mycobacterium tuberculosis
Confirmed Presenter: Conrad Testagrose, University of Florida, United States
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Conrad Testagrose, University of Florida
  • Sakshi Pandey, University of Florida
  • Mohammadali Serajian, University of Florida
  • Simone Marini, University of Florida
  • Mattia Prosperi, University of Florida
  • Christina Boucher, University of Florida

Presentation Overview:

Antibiotic resistance in Mycobacterium tuberculosis (MTB) poses a significant challenge to global public health. Rapid and accurate prediction of antibiotic resistance can inform treatment strategies and mitigate the spread of resistant strains. In this study, we present a novel approach leveraging large language models (LLMs) to predict antibiotic resistance in MTB (LLMTB). Our model is trained on a large dataset of genomic data and associated resistance profiles, utilizing natural language processing techniques to capture patterns and mutations linked to resistance. The model's architecture integrates state-of-the-art transformer-based LLMs, enabling the analysis of complex genomic sequences and the extraction of critical features relevant to antibiotic resistance. We evaluate our model's performance using a comprehensive dataset of MTB strains, demonstrating its ability to achieve high performance in predicting resistance to various antibiotics. Unlike traditional machine learning methods, LLMs can adapt to new or emerging drugs through fine-tuning or few-shot learning, thereby reducing reliance on extensive data curation. Beyond predictive accuracy, LLMTB uncovers deeper biological insights, identifying critical genes, intergenic regions, and novel resistance mechanisms. This method marks a transformative shift in resistance prediction and offers significant potential for enhancing diagnostic capabilities and guiding personalized treatment plans, ultimately contributing to the global effort to combat tuberculosis and antibiotic resistance. All source code is publicly available at https://github.com/ctestagrose/LLMTB.

July 24, 2025
11:40-11:50
De novo discovery of conserved gene clusters in microbial genomes with Spacedust
Confirmed Presenter: Johannes Soeding, Max Planck Institute for Multidisciplinary Sciences, Germany
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Ruoshi Zhang, Max Planck Institute for Multidisciplinary Sciences
  • Milot Mirdita, Seoul National University
  • Johannes Soeding, Max Planck Institute for Multidisciplinary Sciences

Presentation Overview:

Metagenomics has revolutionized environmental and human-associated microbiome studies. However, the limited fraction of proteins with known biological processes and molecular functions presents a major bottleneck. In prokaryotes and viruses, evolution favors keeping genes participating in the same biological processes co-localized as conserved gene clusters. Conversely, conservation of gene neighborhood indicates functional association. Spacedust is a tool for systematic, de novo discovery of conserved gene clusters. To find homologous protein matches it uses fast and sensitive structure comparison with Foldseek. Partially conserved clusters are detected using novel clustering and order conservation P-values. We demonstrate Spacedust's sensitivity with an all-vs-all analysis of 1,308 bacterial genomes, identifying 72,843 conserved gene clusters containing 58% of the 4.2 million genes. It recovered 95% of antiviral defense system clusters annotated by a specialized tool. Spacedust's high sensitivity and speed will facilitate the large-scale annotation of the huge numbers of sequenced bacterial, archaeal and viral genomes.

July 24, 2025
11:50-12:00
Nerpa 2: linking biosynthetic gene clusters to nonribosomal peptide structures
Confirmed Presenter: Ilia Olkhovskii, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Ilia Olkhovskii, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS)
  • Azat Tagirdzhanov, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS)
  • Alexey Gurevich, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS)
  • Aleksandra Kushnareva, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS)
  • Petr Popov, Neapolis University Pafos

Presentation Overview:

Nonribosomal peptides (NRPs) are clinically important molecules produced by microbial specialized enzymes encoded in biosynthetic gene clusters (BGCs). Linking BGCs to their products is crucial for predicting and manipulating NRP production, yet BGC-to-NRP biosynthesis is often complex and non-unique, making prediction from the genome challenging. Here, we present Nerpa 2, a high-throughput BGC–NRP matching tool. Compared to its predecessor, we improved prediction of NRP monomers selected during synthesis, introduced a hidden Markov model–based alignment strategy for handling complex biosynthetic paths, and added interactive visualizations for result interpretation.
We evaluated Nerpa 2 on 191 BGCs and 1,205 NRP structures, demonstrating a notable accuracy improvement over both Nerpa 1 and a related tool BioCAT (50% vs. 42% and 8%). In addition to higher overall precision, Nerpa 2 performs significantly better on especially challenging cases.
Nerpa 2 streamlines a range of tasks in NRP research, including annotation of computationally predicted BGCs, prioritization of BGCs more likely to yield novel NRPs, and guiding bioengineering experiments by identifying BGCs that yield molecules close to user-specified target structures. The software is freely available at https://github.com/gurevichlab/nerpa.

July 24, 2025
12:00-12:10
Phylo-Spec: a phylogeny-fusion deep learning model advances microbiome status identification
Confirmed Presenter: Xiaoquan Su, Qingdao University, China
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Junhui Zhang, Qingdao University
  • Fan Meng, Qingdao University
  • Yangyang Sun, Qingdao University
  • Wenfei Xu, Qingdao University
  • Shunyao Wu, Qingdao University
  • Xiaoquan Su, Qingdao University

Presentation Overview:

Motivation: The human microbiome is crucial for health regulation and disease progression, presenting a valuable opportunity for health state classification. Traditional microbiome-based classification methods rely on pre-trained machine learning (ML) or deep learning (DL) models, which typically focus on microbial distribution patterns, neglecting the underlying relationships between microbes. As a result, model performance can be significantly affected by data sparsity, misclassified features, or incomplete microbial profiles.

Methods: To overcome these challenges, we introduce Phylo-Spec, a phylogeny-driven deep learning algorithm that integrates multi-aspect microbial information for improved status recognition. Phylo-Spec fuses convolutional features of microbes within a phylogenetic hierarchy via a bottom-up iteration, significantly alleviating the challenges due to sparse data and inaccurate profiling. Additionally, the model dynamically assigns unclassified species to virtual nodes on the phylogenetic tree based on higher-level taxonomy, minimizing interference from uncertain microbes. Phylo-Spec also captures feature importance via an information gain-based mechanism that propagates through the phylogenetic structure, enhancing the interpretability of classification decisions.

Results: Phylo-Spec demonstrated superior efficacy in microbiome status classification across two in-silico synthetic datasets that simulate the aforementioned cases, outperforming existing ML and DL methods. Validation with real-world metagenomic and amplicon data further confirmed the model’s performance in multiple status classification, establishing a powerful framework for microbiome-based health state identification and microbe-disease association.

July 24, 2025
12:10-12:20
Beyond Taxonomy and Function: Protein Language Models for Scalable Microbial Representations
Confirmed Presenter: Petra Matyskova, Utrecht University, Netherlands
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Petra Matyskova, Utrecht University
  • Gijs Selten, Utrecht University
  • Sanne Abeln, Utrecht University
  • Ronnie de Jonge, Utrecht University

Presentation Overview:

Traditional microbial representations based on taxonomy or functional annotations like KEGG Orthology (KOs) and OrthoFinder groups (OGs) suffer from low coverage, high dimensionality, or long computation times. In this work, we explore the use of protein large language models (PLLMs), specifically ESM-2, to generate compact and informative microbial embeddings. We benchmark these embeddings against KOs and OGs using a dataset of 988 microbial genomes. We compare the three approaches in terms of protein coverage, feature dimensionality, runtime, and predictive performance in a biologically relevant task: predicting the root competence of microbes on Arabidopsis thaliana. ESM-2 embeddings achieved full protein coverage and required less runtime than OGs or KOs while producing compact 320-dimensional feature sets. In the classification task, random forest and multi-layer perceptron based on ESM-2 embeddings outperformed traditional methods. Additionally, the results were replicated on external synthetic community datasets. Importantly, ESM-2 embeddings preserved relevant taxonomic and functional information, as confirmed through hierarchical clustering and PCA. Through analysing the embedding weights, we also identified key proteins predictive of root competence, including known and novel candidates. Our results suggest that PLLM-based microbial representations offer an efficient and scalable alternative to conventional functional annotation-based approaches, especially for small datasets common in microbiome studies. This approach lays the foundation for more advanced applications such as multi-modal embedding based data integration and the discovery of new biologically meaningful traits beyond taxonomic labels or annotated proteins.

July 24, 2025
12:20-12:30
Guided tokenizer enhances metagenomic language models performance
Confirmed Presenter: Ali Rahnavard, The George Washington University, United States
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Ali Rahnavard, The George Washington University
  • Vedant Mahangade, The George Washington University
  • Keith Crandall, The George Washington University

Presentation Overview:

Tokenization is a critical step in adapting language models for genomic and metagenomic sequence analysis. Traditional tokenization methods—such as fixed-length k-mers or statistical compression algorithms like byte-pair encoding (BPE)—often fail to capture the biological relevance embedded in DNA sequences. We introduce Guided Tokenization (GT), a novel, adaptive strategy that prioritizes biologically meaningful subsequences by leveraging importance scores derived from functional annotations, class distributions, and model attention mechanisms.

Unlike conventional approaches, GT dynamically selects high-importance tokens by integrating (1) class-specific unique k-mers, (2) frequently observed informative subsequences, (3) model-informed weighted tokens after fine-tuning, and (4) biologically annotated fragments such as promoters or coding regions. This token prioritization strategy is applied during pretraining, fine-tuning, and prediction phases of genomic language models (gLMs), enabling more efficient learning with fewer parameters and reduced sequence inflation.
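
To make the idea of token prioritization concrete, here is a minimal Python sketch. It is not the authors' implementation: the function name `guided_tokenize`, the greedy longest-match rule, and the fallback to fixed-length k-mers are all assumptions chosen for illustration. It only shows how a set of high-importance subsequences could take precedence over uniform k-mer splitting.

```python
def guided_tokenize(sequence, priority_tokens, k=3):
    """Greedy left-to-right tokenization (illustrative assumption, not the GT algorithm).

    At each position, emit the longest priority token that matches there;
    otherwise fall back to a fixed-length k-mer (shorter at the sequence end).
    """
    # Ignore empty strings so the scan always advances; try longest tokens first.
    by_length = sorted((t for t in priority_tokens if t), key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(sequence):
        match = next((t for t in by_length if sequence.startswith(t, i)), None)
        if match is None:
            match = sequence[i:i + k]  # fallback: plain k-mer
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical example: a promoter-like motif is kept as a single token.
tokens = guided_tokenize("ATGTATAATGGCC", {"TATAAT"}, k=3)
print(tokens)
```

One limitation of this simple greedy scan is that a fallback k-mer can run over the start of a priority token that begins inside it; a real tokenizer would need a strategy for that case.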

We evaluated GT across a range of metagenomic classification and sequence modeling tasks, including taxonomic profiling, antibiotic resistance gene classification, and read classification (e.g., host vs. microbial and plasmid vs. chromosome). Results consistently demonstrate that GT improves model performance, especially for small and mid-sized models, by enhancing classification accuracy, representation quality, and computational efficiency. These findings position guided tokenization as a scalable and biologically aware framework for advancing the next generation of metagenomic language models.

July 24, 2025
12:30-12:40
REMAG: recovery of eukaryotic genomes from metagenomes using reference-free contrastive learning
Confirmed Presenter: Daniel Gómez-Pérez, Earlham Institute, United Kingdom
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Daniel Gómez-Pérez, Earlham Institute
  • Sebastién Raguideau, Earlham Institute
  • Falk Hildebrand, Earlham Institute and Quadram Institute Bioscience
  • Christopher Quince, Earlham Institute

Presentation Overview:

Assembly-based metagenomic approaches, including generation of metagenome-assembled genome (MAG) catalogues, are pivotal for exploring and understanding microbial communities. Yet, despite the relevance of protists and fungi for ecological communities, eukaryotic MAG recovery lags behind that of prokaryotes. State-of-the-art binning pipelines rely on reference databases of single-copy core genes that are sparse for eukaryotes. This problem is further complicated as reference databases scale poorly as sequence diversity and dataset size increase. Here, we present REMAG (Recovery of Eukaryotic MAGs), a tool that learns from individual metagenomic datasets to recover eukaryotic bins. By embedding contig-level composition and coverage features into a shared latent space optimized by contrastive learning followed by hierarchical clustering, the method accurately extracts representative bins. In benchmarks based on real and simulated synthetic community datasets of varying sizes (including prokaryotes and eukaryotes), we show its ability to recover eukaryotic genomes with higher completeness and less contamination than similar state-of-the-art tools, which often result in high fragmentation of eukaryotic bins. Overall, our approach provides a reference-free method for eukaryotic binning that scales well with the increased growth and higher depth of diverse metagenomic datasets.

July 24, 2025
12:40-12:50
Flexible Log-odds Homology Features for Plasmid Identification
Confirmed Presenter: Tomas Vinar, Comenius University in Bratislava, Slovakia
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Brona Brejova, Faculty of Mathematics
  • Veronika Tordova, Faculty of Mathematics
  • Kristian Andrascik, Faculty of Mathematics
  • Cedric Chauve, Simon Fraser University
  • Tomas Vinar, Comenius University in Bratislava

Presentation Overview:

We study the problem of plasmid identification in short-read assemblies of bacterial isolates. The goal is to classify individual contigs as coming from a chromosome or a plasmid. This problem is typically addressed by machine learning methods combining features derived from input contigs. Some methods also use additional features based on homology to sequences typical for known plasmids or chromosomes. In this work we propose a method for creating such features using log-odds scores based on ideas similar to those traditionally used in sequence alignment scoring. The framework is flexible, as it can handle both close homologs and protein domains capturing distant homology. Inclusion of these features into the plASgraph2 graph neural network significantly improves its accuracy.
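
As a rough illustration of what a log-odds homology feature can look like, the following Python sketch scores each homology hit by how much more often it is seen on plasmids than on chromosomes. This is an assumption-laden toy, not the method presented in the talk: the function names, the pseudocount smoothing, and the example gene labels are all hypothetical.

```python
import math
from collections import Counter


def log_odds_scores(plasmid_hits, chromosome_hits, pseudocount=1.0):
    """Return a log2-odds score per homology feature (e.g. a gene family or domain ID).

    Positive scores mean the feature is relatively more frequent on plasmids,
    negative scores mean it is relatively more frequent on chromosomes.
    """
    plasmid_counts = Counter(plasmid_hits)
    chromosome_counts = Counter(chromosome_hits)
    features = set(plasmid_counts) | set(chromosome_counts)
    # Pseudocounts keep the ratio finite for features seen in only one class.
    p_total = sum(plasmid_counts.values()) + pseudocount * len(features)
    c_total = sum(chromosome_counts.values()) + pseudocount * len(features)
    scores = {}
    for f in features:
        p = (plasmid_counts[f] + pseudocount) / p_total
        c = (chromosome_counts[f] + pseudocount) / c_total
        scores[f] = math.log2(p / c)
    return scores


def contig_score(contig_features, scores):
    """Sum the log-odds of the features found on a contig; unseen features add 0."""
    return sum(scores.get(f, 0.0) for f in contig_features)


# Hypothetical training hits, for illustration only.
scores = log_odds_scores(
    plasmid_hits=["repA", "repA", "traG", "dnaA"],
    chromosome_hits=["dnaA", "dnaA", "gyrB", "repA"],
)
print(contig_score(["repA", "traG"], scores))   # positive: plasmid-like
print(contig_score(["gyrB", "dnaA"], scores))   # negative: chromosome-like
```

A per-contig sum like this could then be passed to a classifier as one additional input feature alongside the features derived from the contig itself.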

July 24, 2025
12:50-13:00
Accurate plasmid reconstruction from metagenomics data using assembly-alignment graphs and contrastive learning
Confirmed Presenter: Pau Piera Lindez, University of Copenhagen, Novo Nordisk Foundation Center for Basic Metabolic Research
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Pau Piera Lindez, University of Copenhagen
  • Jakob Nissen, University of Copenhagen
  • Simon Rasmussen, University of Copenhagen

Presentation Overview:

Plasmids are extrachromosomal DNA molecules that enable horizontal gene transfer in bacteria, often conferring advantages such as antibiotic resistance. Despite their significance, plasmids are underrepresented in genomic databases due to challenges in assembling them, caused by mosaicism and micro-diversity. Current plasmid assemblers rely on detecting circular paths in single-sample assembly graphs, but face limitations due to graph fragmentation and entanglement, and low coverage. We introduce PlasMAAG (Plasmid and organism Metagenomic binning using Assembly Alignment Graphs), a framework to recover plasmids and organisms from metagenomic samples that leverages an approach that we call "assembly-alignment graphs" alongside common binning features. On synthetic benchmark datasets, PlasMAAG reconstructed 50–121% more near-complete plasmids than competing methods and improved the Matthews Correlation Coefficient of geNomad contig classification by 28–106%. On hospital sewage samples, PlasMAAG outperformed all other methods, reconstructing 33% more plasmid sequences. PlasMAAG enables the study of organism-plasmid associations and intra-plasmid diversity across samples, offering state-of-the-art plasmid reconstruction with reduced computational costs.
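
For readers unfamiliar with the Matthews Correlation Coefficient (MCC) mentioned above, the Python sketch below computes it from confusion-matrix counts and shows how a relative improvement could be expressed as a percentage. The helper names and the example numbers are made up for illustration; they are not taken from the PlasMAAG benchmarks, and the abstract does not say exactly how its percentages were derived.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC for binary classification from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when the denominator is zero,
    a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def relative_improvement(baseline, improved):
    """Percentage change of `improved` relative to a non-zero `baseline`."""
    return 100.0 * (improved - baseline) / baseline


# Hypothetical counts for plasmid-vs-chromosome contig classification.
mcc = matthews_corrcoef(tp=80, tn=90, fp=10, fn=20)
print(round(mcc, 3))
print(relative_improvement(0.50, 0.64))  # about 28 (percent)
```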

July 24, 2025
14:00-14:20
Proceedings Presentation: Predicting coarse-grained representations of biogeochemical cycles from metabarcoding data
Confirmed Presenter: Arnaud Belcour, Univ. Grenoble Alpes, Inria
Track: MICROBIOME

Room: 01B
Format: In person

Authors List:

  • Arnaud Belcour, Univ. Grenoble Alpes
  • Loris Megy, Gricad
  • Sylvain Stephant, French Geological Survey (BRGM)
  • Caroline Michel, French Geological Survey (BRGM)
  • Sétareh Rad, French Geological Survey (BRGM)
  • Petra Bombach, Isodetect GmbH
  • Nicole Dopffel, NORCE Norwegian Research Center AS
  • Hidde de Jong, Univ. Grenoble Alpes
  • Delphine Ropers, Univ. Grenoble Alpes

Presentation Overview:

Motivation: Taxonomic analysis of environmental microbial communities is now routinely performed thanks to advances in DNA sequencing. Determining the role of these communities in global biogeochemical cycles requires the identification of their metabolic functions, such as hydrogen oxidation, sulfur reduction, and carbon fixation. These functions can be directly inferred from metagenomics data, but in many environmental applications metabarcoding is still the method of choice. The reconstruction of metabolic functions from metabarcoding data and their integration into coarse-grained representations of biogeochemical cycles remains a difficult bioinformatics problem.

Results: We developed a pipeline, called Tabigecy, which exploits taxonomic affiliations to predict the metabolic functions that constitute biogeochemical cycles. In a first step, Tabigecy uses the tool EsMeCaTa to predict consensus proteomes from input affiliations. To optimise this process, we generated a precomputed database containing information about 2,404 taxa from UniProt. The consensus proteomes are then searched using bigecyhmm, a newly developed Python package relying on Hidden Markov Models to identify key enzymes involved in the metabolic functions of biogeochemical cycles. The metabolic functions are then projected onto a coarse-grained representation of the cycles. We applied Tabigecy to two salt cavern datasets and validated its predictions with microbial activity and hydrochemistry measurements performed on the samples. The results highlight the utility of the approach to investigate the impact of microbial communities on biogeochemical processes.

Availability: The Tabigecy pipeline is available at https://github.com/ArnaudBelcour/tabigecy.
The Python package bigecyhmm and the precomputed EsMeCaTa database are also separately available at https://github.com/ArnaudBelcour/bigecyhmm and https://doi.org/10.5281/zenodo.13354073, respectively.

July 24, 2025
14:20-14:30
CroCoDeEL: accurate control-free detection of cross-sample contamination in metagenomic data
Confirmed Presenter: Florian Plaza Oñate, Université Paris-Saclay, INRAE
Track: MICROBIOME

Room: 01B
Format: Live stream

Authors List: Show

  • Florian Plaza Oñate, Florian Plaza Oñate, Université Paris-Saclay
  • Lindsay Goulet, Lindsay Goulet, Université Paris-Saclay
  • Pauline Barbet, Pauline Barbet, Université Paris-Saclay
  • Alexandre Famechon, Alexandre Famechon, Université Paris-Saclay
  • Benoît Quinquis, Benoît Quinquis, Université Paris-Saclay
  • Eugeni Belda, Eugeni Belda, UMMISCO
  • Edi Prifti, Edi Prifti, UMMISCO
  • Emmanuelle Le Chatelier, Emmanuelle Le Chatelier, Université Paris-Saclay
  • Guillaume Gautreau, Guillaume Gautreau, Université Paris-Saclay

Presentation Overview:Show

Metagenomic sequencing provides profound insights into microbial communities, but it is often compromised by technical biases, including cross-sample contamination. This phenomenon arises when microbial content is inadvertently exchanged among concurrently processed samples, distorting microbial profiles and compromising the reliability of metagenomic data and downstream analyses.
Existing detection methods often rely on negative controls, which are inconvenient and do not detect contamination within real samples. Meanwhile, strain-level bioinformatics approaches fail to distinguish contamination from natural strain sharing and lack sensitivity.
To fill this gap, we introduce CroCoDeEL, a decision-support tool for detecting and quantifying cross-sample contamination. Leveraging linear modeling and a pre-trained supervised model, CroCoDeEL identifies specific contamination patterns in species abundance profiles. It requires no negative controls or prior knowledge of sample processing positions, offering improved accuracy and versatility.
Benchmarks across three public datasets demonstrate that CroCoDeEL accurately detects contaminated samples and identifies their contamination sources, even at low rates (<0.1%), provided sufficient sequencing depth. Notably, we discovered critical contamination cases in highly cited studies, calling some of their results into question. Our findings suggest that cross-sample contamination is a widespread yet underexplored issue in metagenomics and emphasize the necessity of systematically integrating contamination detection into sequencing quality control. Future work will consist of developing an approach to remove the contamination signal detected by CroCoDeEL.
CroCoDeEL is freely available at https://github.com/metagenopolis/CroCoDeEL.
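
The idea of a linear contamination pattern in species abundance profiles can be illustrated with a toy example. The sketch below is not CroCoDeEL's algorithm: it assumes (as a simplification made here) that a contaminated profile is a mixture of the sample's own profile and a small fraction of the source profile, and that the set of species introduced by contamination is already known. All names and numbers are hypothetical.

```python
import numpy as np


def estimate_contamination_rate(source, target, introduced):
    """Estimate the mixing rate from species introduced by contamination.

    For introduced species, target = rate * source, so in log space
    log10(target) = log10(source) + log10(rate): a line with slope 1.
    With the slope fixed at 1, the least-squares intercept is the mean
    log ratio over the introduced species.
    """
    log_ratio = np.log10(target[introduced]) - np.log10(source[introduced])
    return float(10 ** log_ratio.mean())


# Hypothetical relative abundances for six species (each profile sums to 1).
source_profile = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])
clean_target = np.array([0.00, 0.00, 0.00, 0.50, 0.30, 0.20])

true_rate = 0.01  # assumed fraction of the source sample leaking into the target
contaminated_target = (1 - true_rate) * clean_target + true_rate * source_profile

# Simplifying assumption: species absent from the clean target are known.
introduced_species = clean_target == 0
estimated_rate = estimate_contamination_rate(
    source_profile, contaminated_target, introduced_species
)
print(f"estimated contamination rate: {estimated_rate:.4f}")
```

In practice the introduced species are not known in advance, which is why a tool needs an additional step to recognise such patterns; the abstract attributes that role to a pre-trained supervised model.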

Reference
Goulet, L. et al. "CroCoDeEL: accurate control-free detection of cross-sample contamination in metagenomic data." bioRxiv (2025). https://doi.org/10.1101/2025.01.15.633153.

July 24, 2025
14:30-14:40
Longflow: A comprehensive end-to-end solution for long-read metagenomics.
Confirmed Presenter: Sebastien Raguideau, Earlham Institute, United Kingdom
Track: MICROBIOME

Room: 01B
Format: In person

Authors List: Show

  • Sebastien Raguideau, Sebastien Raguideau, Earlham Institute
  • Chris Quince, Chris Quince, Earlham Institute

Presentation Overview:Show

Transitioning from short-read to long-read sequencing in metagenomics requires methodological refinements. We present Longflow, a versatile pipeline tailored for long-read data, supporting analysis from raw FASTQ/BAM files to annotated metagenome-assembled genomes (MAGs). Built with Snakemake and containerised for reproducibility, Longflow is robust and easily deployed on HPC systems.

Longflow enables flexible analysis, including per-sample or co-assembly schemes, and co-binning, which leverages samples that are not part of the assembly to enhance binning performance. It integrates tools for taxonomy (e.g., Silva, NR), functional annotation (KEGG, InterProScan), viral detection (geNomad), and SNV calling (Longshot). MAGs are curated using a consensus approach from multiple binners and classified via GTDB-Tk.

To address the issue of chimeric contigs, particularly problematic in long-read assemblies due to larger contig sizes, we created a visualisation tool to detect these artefacts and implemented a fragmentation heuristic, thus improving MAG recovery and removing one source of contamination.

Longflow also facilitates the incorporation of short-read data for co-binning. We improved read assignment and overall binning results by using a novel k-mer coverage estimation method to handle ambiguous mappings.

Longflow is a reliable and flexible tool for contemporary metagenomic research, and it is constantly being developed and maintained to increase its functionality.

July 24, 2025
14:40-14:50
Long-reads metagenome-assembled genomes can be higher quality than reference genomes: the case of the Shanghai pet dog microbiome catalog
Confirmed Presenter: Luis Pedro Coelho, Queensland University of Technology, Australia
Track: MICROBIOME

Room: 01B
Format: In person

Authors List: Show

  • Anna Cuscó, Anna Cuscó, Fudan University
  • Yiqian Duan, Yiqian Duan, Fudan University
  • Fernando Gil, Fernando Gil, Fudan University
  • Shaojun Pan, Shaojun Pan, Institute of Science and Technology for Brain-Inspired Intelligence
  • Nithya Kruthi, Nithya Kruthi, Queensland University of Technology
  • Alexei Chklovski, Alexei Chklovski, Queensland University of Technology
  • Xing-Ming Zhao, Xing-Ming Zhao, Tongji University
  • Luis Pedro Coelho, Luis Pedro Coelho, Queensland University of Technology

Presentation Overview:Show

We present a comprehensive analysis of the gut microbiome of 50 pet dogs living in Shanghai (China). Both long-read and short-read sequencing methods were employed to deeply sequence fecal samples, enabling high-quality metagenome-assembled genome (MAG) recovery. Polishing long-read assemblies with short reads notably improved MAG quality, particularly for genomes with lower sequencing coverage.

The final MAG collection comprises 2,676 MAGs (72% high-quality), representing 320 bacterial species, and captures global microbial diversity, evidenced by high read mapping rates (>90%) to external datasets from multiple countries. The predominant phyla were Bacillota, Bacteroidota, and Fusobacteriota.

Many of the resulting MAGs are of higher quality than reference genomes available for the same species. In particular, our MAGs more consistently contain ribosomal genes, tRNAs, and mobilome-associated genes, all classes that are known to be difficult to recover (even when sequencing isolates) using short reads.

Extra-chromosomal elements (e.g., plasmids or viruses) are another blind spot when using short reads. We recovered 185 circular elements (comprising 58 plasmids, 30 viruses, and 97 elements that cannot be confidently assigned). Several of these contain antibiotic resistance genes, including beta-lactamases.

One-third of identified bacterial species were novel, particularly within genera such as CAG-269 and Dysosmobacter. Additionally, this study demonstrated clear microbiome differences between pet dogs and colony-living dogs, the latter showing higher microbial diversity and higher abundance of probiotic-associated species.

Overall, this study provides the best known resource for pet dog microbiome studies and demonstrates the value of hybrid sequencing to build the highest quality resources.

July 24, 2025
14:50-15:00
Use of Long-Read SMRT PacBio Sequencing for Detailed Genomic and Epigenetic Studies of Complex Microbial Communities in the Wheat Rhizosphere to Abiotic Stress
Confirmed Presenter: Oleg Reva, Centre for Bioinformatics and Computational Biology (CBCB), BGM
Track: MICROBIOME

Room: 01B
Format: In person

Authors List: Show

  • Siphiwe Maseko, Siphiwe Maseko, Centre for Bioinformatics and Computational Biology (CBCB)
  • Nwabisa Ngwentle, Nwabisa Ngwentle, Centre for Microbial Ecology and Genomics (CMEG)
  • Teresa Coutinho, Teresa Coutinho, Centre for Microbial Ecology and Genomics (CMEG)
  • Ngwekazi Mehlomakulu, Ngwekazi Mehlomakulu, Dep. Consumer and Food Sciences
  • Oleg Reva, Oleg Reva, Centre for Bioinformatics and Computational Biology (CBCB)

Presentation Overview:Show

The wheat rhizosphere harbours complex microbial communities essential for plant health and soil fertility. Traditional sequencing reveals microbial diversity but often misses genomic and epigenetic interactions. Here, long-read SMRT PacBio sequencing was applied to a wheat field in South Africa (34.08551°S, 20.26628°E) to profile microbial communities across varying environmental conditions from August to November 2023, spanning heavy rains to extreme drought. This approach enabled a detailed reconstruction of microbial interactions and the identification of key taxa influencing soil fertility. Network analysis revealed species-specific associations shaping the microbial community. Epigenetic analysis of metagenome-assembled contigs demonstrated that Pseudomonas fluorescens, Flavobacterium pectinovorum, and Flavobacterium aquicola thrived in wet conditions but suffered during drought, as evidenced by increased oxidized guanine residues in their genomes under unfavourable conditions. Conversely, Amycolatopsis camponoti and some uncultured Alphaproteobacteria and Actinomycetota struggled in floods but flourished in arid conditions. These findings demonstrate the varying responses of rhizobacterial community members to environmental stressors, highlighting the need for a strategic selection of beneficial bacteria used in agro-biopreparations. Selecting microbial inoculants based on their optimal environmental conditions can enhance their efficacy in improving soil fertility and crop resilience. Long-read SMRT sequencing enables species-level identification and detailed genomic and epigenetic insights that could not be achieved before. Additionally, novel computational tools were developed for modelling microbial networks and predicting oxidized guanine distribution along metagenome-assembled contigs.
This study was conducted for the TRIBIOME Project (https://www.tribiome.eu/) and funded by the Horizon Europe research and innovation program (grant # 101084485).

July 24, 2025
15:00-15:10
proMGEflow: recombinase-based detection of mobile genetic elements in bacterial (meta)genomes
Confirmed Presenter: Anastasiia Grekova, EMBL Heidelberg, Technical University of Munich
Track: MICROBIOME

Room: 01B
Format: In person

Authors List: Show

  • Anastasiia Grekova, Anastasiia Grekova, EMBL Heidelberg
  • Supriya Khedkar, Supriya Khedkar, BioQuant
  • Christian Schudoma, Christian Schudoma, EMBL Heidelberg
  • Chan Yeong Kim, Chan Yeong Kim, European Molecular Biology Laboratory
  • Daniel Podlesny, Daniel Podlesny, EMBL Heidelberg
  • Anthony Noel Fullam, Anthony Noel Fullam, EMBL Heidelberg
  • Jonas Richter, Jonas Richter, EMBL Heidelberg
  • Thomas Sebastian Schmidt, Thomas Sebastian Schmidt, University College Cork
  • Daniel Mende, Daniel Mende, Keio University
  • Suguru Nishijima, Suguru Nishijima, University of Tokyo
  • Askarbek Orakov, Askarbek Orakov, Harvard T.H. Chan School of Public Health
  • Michael Kuhn, Michael Kuhn, EMBL Heidelberg
  • Ivica Letunic, Ivica Letunic, EMBL Heidelberg
  • Peer Bork, Peer Bork, EMBL Heidelberg

Presentation Overview:Show

Mobile Genetic Elements (MGEs) are drivers of bacterial adaptation and can increase the fitness of microbial communities in a changing environment. Yet the identification of MGEs remains challenging due to the fuzzy boundaries between MGE types and the incompleteness of metagenome-assembled genomes (MAGs). Here we present proMGEflow, a Nextflow pipeline designed to annotate full genomes and MAGs with discrete MGE categories: plasmids, integrons, phages and transposable elements. In contrast to other tools, proMGEflow takes a top-down approach to harmonize all MGEs in one go from a given (meta)genome. Our pipeline uses subfamilies of recombinases as universal MGE markers, as well as MGE type-specific mobility machinery, e.g. structural phage genes, for fine-grained assignment. The estimation of MGE boundaries is based on the joint bacterial species pangenome built from the MAG and the species cluster of high-quality reference genomes from the ProGenomes3 database. By decoupling the MGE boundary determination step into the Python package MGExpose, we can further annotate MGEs in user-provided genomic regions by rule-based classification of their machinery and recombinases. In total, we applied proMGEflow to around 200,000 MAGs from the Searchable Planetary-scale mIcrobiome REsource (SPIRE). This not only resulted in the discovery of around 3 million MGEs of different types but also provided the first functional insights into a global environmental mobilome. The availability of scalable and reproducible pipelines for unified MGE annotation from metagenomes will improve our understanding of the mechanisms of gene mobility as well as the cross-talk between MGEs and their prokaryotic hosts.

July 24, 2025
15:10-15:20
Extracting host-specific developmental signatures from longitudinal microbiome data
Confirmed Presenter: Balazs Erdos, Simula Metropolitan Center for Digital Engineering, Norway
Track: MICROBIOME

Room: 01B
Format: In person

Authors List: Show

  • Balazs Erdos, Balazs Erdos, Simula Metropolitan Center for Digital Engineering
  • Christos Chatzis, Christos Chatzis, Simula Metropolitan Center for Digital Engineering
  • Jonathan Thorsen, Jonathan Thorsen, COPSAC
  • Jakob Stokholm, Jakob Stokholm, COPSAC
  • Age K. Smilde, Age K. Smilde, Biosystems Data Analysis
  • Morten A. Rasmussen, Morten A. Rasmussen, COPSAC
  • Evrim Acar, Evrim Acar, Simula Metropolitan Center for Digital Engineering

Presentation Overview:Show

Longitudinal microbiome studies offer critical insights into microbial community dynamics, helping to distinguish true biological signals from interindividual variability. Tensor decompositions, such as CANDECOMP/PARAFAC (CP), have been applied to analyze longitudinal microbiome data by arranging temporal measurements as a third-order tensor with modes representing taxa, time, and hosts. While these methods have proven useful in revealing the underlying structures in such data, they are limited in their ability to capture host-specific microbial dynamics including individual accelerated or delayed phenomena. To address this limitation, we use the PARAFAC2 model, a more flexible tensor model, which can account for host-specific differences in temporal trajectories of microbial communities. We analyze longitudinal microbiome data from the COPSAC2010 (Copenhagen Prospective Studies on Asthma in Childhood) cohort, tracking gut microbiome maturation in children over their first six years of life, along with data from the FARMM (Food and Resulting Microbial Metabolites) study, examining dietary effects before and after microbiota depletion. We show that both CP and PARAFAC2 decompositions reveal meaningful microbial signatures, including compositional shifts associated with birth mode, presence of older siblings, and dietary interventions. However, while CP captures the main microbial trends in time, PARAFAC2 uncovers host- and subgroup-specific developmental trajectories, offering a more nuanced view of microbiome maturation, highlighting its potential to enhance longitudinal microbiome data analysis. In addition, we discuss the interpretability of the extracted patterns facilitated by the uniqueness properties of CP and PARAFAC2, and discuss potential challenges related to the generalization of the patterns through the concept of replicability.
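
For readers unfamiliar with the two models, their standard formulations can be written slice by slice as below. The assignment of modes is an assumption made here for illustration (one time-by-taxa slice per host), not a detail stated in the abstract; the symbols are likewise introduced only for this sketch.

```latex
% X_k : time x taxa data slice for host k (k = 1, ..., K)
% A   : taxa loadings (shared across hosts)
% c_k : weights of host k on the R components
% B, B_k : temporal profiles (shared in CP, host-specific in PARAFAC2)
\begin{align}
  \text{CP:} \quad
    & X_k \approx B \,\operatorname{diag}(c_k)\, A^{\top},
    && k = 1, \dots, K, \\
  \text{PARAFAC2:} \quad
    & X_k \approx B_k \,\operatorname{diag}(c_k)\, A^{\top},
    && B_k^{\top} B_k = \Phi \ \text{ for all } k .
\end{align}
```

In CP every host shares the same temporal profiles B, so a host whose microbiome matures earlier or later than average cannot be represented exactly. PARAFAC2 lets the temporal profiles B_k differ per host, and the constant cross-product constraint is what preserves the uniqueness properties mentioned in the abstract.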

July 24, 2025
15:20-15:30
Complex SynCom inoculations to study root community assembly
Confirmed Presenter: Gijs Selten, Utrecht University, Netherlands
Track: MICROBIOME

Room: 01B
Format: In person

Authors List: Show

  • Gijs Selten, Gijs Selten, Utrecht University
  • Florian Lamouche, Florian Lamouche, INRAE Angers-Nantes
  • Adrian Gomez Repolles, Adrian Gomez Repolles, Aarhus University
  • Simona Radutoiu, Simona Radutoiu, Aarhus University
  • Ronnie de Jonge, Ronnie de Jonge, Utrecht University

Presentation Overview:Show

The root microbiome is a complex system composed of millions of interacting microbes, some of which have plant-beneficial traits such as priming the plant’s defenses or promoting growth. To apply these traits for sustainable agriculture, however, an understanding of root microbiome assembly, dynamics, and functioning is required. To gain this understanding, we isolated hundreds of rhizobacterial strains from Arabidopsis, Barley, and Lotus grown in natural soil. These strains were then cultured and used to reconstitute highly complex Synthetic Communities (SynComs) comprising between 175 and 1,000 strains, which were subsequently inoculated onto the three hosts. After cultivation, the roots were harvested, and DNA was extracted and subjected to shotgun metagenomics to identify and quantify the SynCom members. Using the genomic sequences of the bacterial strains, we examined both communal functions (i.e., bacterial functions enriched in the root microbiome) and individual traits that enhance a strain’s competitiveness. Community analyses revealed that the three hosts select for functions irrespective of taxonomic origin, with enrichment of functions related to amino acid and vitamin metabolism, quorum sensing, and flagellar assembly. Furthermore, metagenome-wide association analysis of the most successful strains highlighted metabolic diversity, motility and secretion systems as key traits driving a strain’s competitiveness. Although root competence functions in rhizobacteria have been studied extensively, our dataset enables the investigation of these traits within a complex microbiome context that closely resembles natural communities. This unique feature offers new insights into how plant-microbe and microbe-microbe interactions shift across different environmental and community contexts.

July 24, 2025
15:30-15:40
Spatial and temporal variation of marine microbial interactions around the west Antarctic Peninsula
Confirmed Presenter: Julia C Engelmann, Royal Netherlands Institute for Sea Research (NIOZ), Netherlands
Track: MICROBIOME

Room: 01B
Format: In person

Authors List: Show

  • Julia C Engelmann, Julia C Engelmann, Royal Netherlands Institute for Sea Research (NIOZ)
  • Swan Ls Sow, Swan Ls Sow, Royal Netherlands Institute for Sea Research (NIOZ)
  • Willem H van de Poll, Willem H van de Poll, Faculty of Science and Engineering
  • Rachel Eveleth, Rachel Eveleth, Oberlin College
  • Jeremy J Rich, Jeremy J Rich, School of Marine Sciences
  • Hugh W Ducklow, Hugh W Ducklow, Dep. of Earth and Environmental Sciences and Lamont-Doherty Earth Observatory
  • Patrick D Rozema, Patrick D Rozema, Faculty of Science and Engineering
  • Catherine M Luria, Catherine M Luria, Laboratory of Systems Pharmacology
  • Henk Bolhuis, Henk Bolhuis, Royal Netherlands Institute for Sea Research (NIOZ)
  • M

Presentation Overview:Show

The west Antarctic Peninsula (WAP) has experienced more dramatic increases in temperature due to climate change than the rest of the continent and the global average. Moreover, the northern region of the WAP, which hosts the Palmer research station, has seen higher temperatures and lower sea ice extent than the south, where the Rothera research station is located. We assessed bacterial and microbial eukaryote communities and their seasonal variation at the Palmer and Rothera time-series sites between July 2013 and April 2014 and predicted inter- and intra-domain causal effects. We found that microbial communities were considerably different between the two sites, with differences being attributed to seawater temperature and sea ice coverage in combination with sea ice type differences. We predicted microbial interactions with causal effect modelling, which corrects for spurious correlations and takes the direction of information flow into account (using a directed acyclic graph reconstruction approach to identify confounders). Causal effect analysis suggested that bacteria were stronger drivers of ecosystem dynamics at Palmer, while microbial eukaryotes played a stronger role at Rothera. The parasitic Syndiniales persisted at both sites across the seasons, with Palmer and Rothera harbouring different key groups. However, at Rothera, Syndiniales dominated the set of negative causal effects, while this was not the case at Palmer, suggesting that parasitism drives community dynamics more strongly at Rothera than at Palmer. Our research sheds light on the dynamics of microbial community composition and potential microbial interactions at two sampling locations that represent different climate regimes along the WAP.

July 24, 2025
15:40-15:50
Associations between Microbiome-Associated Variants and Diseases
Confirmed Presenter: Tess Cherlin, University of Pennsylvania, United States
Track: MICROBIOME

Room: 01B
Format: In person

Authors List: Show

  • Tess Cherlin, Tess Cherlin, University of Pennsylvania
  • Jagyshila Das, Jagyshila Das, University of Pennsylvania
  • Colleen Morse, Colleen Morse, University of Pennsylvania
  • Regeneron Genetics Center, Regeneron Genetics Center, Regeneron Genetics Center
  • Penn Medicine Biobank, Penn Medicine Biobank, University of Pennsylvania
  • Seth Bordenstein, Seth Bordenstein, Pennsylvania State University
  • Anurag Verma, Anurag Verma, University of Pennsylvania
  • Shefali Setia-Verma, Shefali Setia-Verma, University of Pennsylvania

Presentation Overview:Show

Using high-throughput sequencing, studies have investigated the microbiome’s association with diseases and genetic variants. We aimed to 1) extend previously identified microbiome-associated variants (MAVs) from microbiome GWAS (mbGWAS) to include newer non-European population studies and 2) leverage biobank data with large sample sizes for genetically diverse population groups. We performed a phenome-wide association study (PheWAS) in the Penn Medicine Biobank (PMBB), which included 41,102 patients from two genetically inferred ancestry groups. Next, we mined MAV associations from PheWAS data in the NIH’s All of Us Biobank (n = 205,237) and the Million Veteran Program (MVP) Biobank (n = 630,969). We then meta-analyzed the MAV-by-PheWAS results from these three datasets as well as the results from each ancestry-specific meta-analysis. We found 13 significant associations from the AFR meta-analysis (p-value ≤ 4.6e-08), 205 significant associations from the EUR meta-analysis (p-value ≤ 3.4e-08), and 122 significant associations for 25 unique MAVs from the meta-analysis across all ancestries (p-value ≤ 6.6e-08). To extend the findings from our PheWAS, we performed QTL and causal inference analyses on these 25 loci using the SMR portal to determine whether these loci showed evidence of shared genetic signals across traits. We found several significant relationships, especially for significantly associated traits such as psoriasis, venous thromboembolism, type 2 diabetes, and gout. Future work will investigate microbiome-QTLs to understand the potential causal relationship between MAVs, the microbiome, and disease phenotypes. This research sets the stage for further investigations aiming to uncover the mechanisms and clinical implications of microbiome-disease associations.

July 24, 2025
15:50-16:00
Detecting Synergistic Associations in Microbial Communities via Multi-Dimensional Feature Selection
Confirmed Presenter: Witold Rudnicki, Faculty of Computer Science, University of Białystok
Track: MICROBIOME

Room: 01B
Format: In person

Authors List: Show

  • Sajad Shahbazi, Sajad Shahbazi, Computational Centre
  • Piotr Stomma, Piotr Stomma, Faculty of Computer Science
  • Tara Zakerali, Tara Zakerali, Computational Centre
  • Balakrishnan Subramanian, Balakrishnan Subramanian, Computational Centre
  • Kinga Zielinska, Kinga Zielinska, Malopolska Centre of Biotechnology
  • Paweł Łabaj, Paweł Łabaj, Malopolska Centre of Biotechnology
  • Izabela Święcicka, Izabela Święcicka, Faculty of Biology
  • Marek Bartoszewicz, Marek Bartoszewicz, Faculty of Biology
  • Krzysztof Mnich, Krzysztof Mnich, Computational Centre
  • Witold Rudnicki, Witold Rudnicki, Faculty of Computer Science

Presentation Overview:Show

The gut microbiome regulates host immunity, barrier function, and inflammatory processes. While many studies have identified individual taxa associated with disease, they often overlook higher-order dependencies within microbial communities. We present a methodology that combines information-theoretic feature selection and machine learning to identify taxa whose predictive relevance may depend on synergy with other community members.

We apply this framework to data from the American Gut Project, focusing on the presence or absence of self-reported food allergy. Taxonomic profiles were normalised and binarised using two thresholding strategies. After quality filtering, the dataset included samples from 1694 healthy and 1847 allergic individuals. We used the Multi-Dimensional Feature Selection (MDFS) algorithm to evaluate information gain in both univariate (1D) and pairwise (2D) settings. As a baseline, we performed U-tests to identify taxa with significantly different abundances between groups. Predictive models were built using Random Forest classifiers trained separately on features selected via each method.

MDFS outperformed the U-test in both sensitivity and robustness. Fifteen taxa were consistently selected by all methods, while MDFS variants uniquely recovered 42. The 2D analysis revealed 18 taxa that carried no predictive value alone but contributed significant information in combination with others, suggesting synergistic structure. Conventional univariate approaches would have overlooked these taxa.
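
A toy example can make the notion of synergy concrete. The sketch below does not use the MDFS implementation or the American Gut Project data; it builds hypothetical binarised taxa in which the outcome is the exclusive-or of two taxa, and computes information gain from Shannon entropies. The pairwise score shown (gain of a taxon in the context of a partner) follows the general idea of 2D analysis and is an assumption about how such a score can be defined, not a reproduction of the authors' code.

```python
import numpy as np


def entropy(columns):
    """Shannon entropy (bits) of the joint distribution of discrete columns."""
    table = np.column_stack(columns)
    _, counts = np.unique(table, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(y, features):
    """Mutual information I(Y; features) = H(Y) + H(features) - H(features, Y)."""
    return entropy([y]) + entropy(features) - entropy(list(features) + [y])


# Hypothetical presence/absence of two taxa in 100 balanced samples.
taxon_a = np.tile([0, 0, 1, 1], 25)
taxon_b = np.tile([0, 1, 0, 1], 25)
# Hypothetical outcome that depends only on the combination of both taxa.
outcome = taxon_a ^ taxon_b

gain_1d_a = information_gain(outcome, [taxon_a])  # univariate (1D)
gain_1d_b = information_gain(outcome, [taxon_b])
gain_pair = information_gain(outcome, [taxon_a, taxon_b])
# Gain contributed by taxon A in the context of taxon B (2D setting).
gain_2d_a = gain_pair - gain_1d_b

print(f"1D gain of A: {gain_1d_a:.2f} bits, 2D gain of A given B: {gain_2d_a:.2f} bits")
```

Each taxon alone carries no information about the outcome, yet the pair determines it completely, which is the kind of structure a purely univariate test cannot detect.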

The results demonstrate the utility of synergy-aware feature selection for capturing complex, non-additive associations in microbial communities. Similar patterns observed across other cohorts indicate the potential generalizability of this approach.