Schedule subject to change
All times listed are in BST
Wednesday, July 23rd
14:40-15:20
Invited Presentation: Microbiome multitudes and metadata madness
Format: In person


Authors List:

  • Fiona Brinkman

Presentation Overview:

Microbiome analysis is increasingly becoming a critical component of a wide range of health, agri-foods, and environmental studies. I will present case studies showing the benefits of integrating very diverse metadata into such analyses, as well as pitfalls to watch out for. I will also present the results of one such cohort study, illustrating the need for analyses that allow the metadata to be viewed flexibly in the context of microbiome data. The results support the multigenerational importance of “healthy”, diverse microbiomes, though defining what is “healthy” is complex.

15:20-15:30
Species-level taxonomic profiling of Earth’s microbiomes with mOTUs4
Format: In person


Authors List:

  • Marija Dmitrijeva, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zurich, Switzerland
  • Hans-Joachim Ruscheweyh, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zurich, Switzerland
  • Lilith Feer, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zurich, Switzerland
  • Kang Li, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zurich, Switzerland
  • Samuel Miravet-Verde, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zurich, Switzerland
  • Anna Sintsova, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zurich, Switzerland
  • Andrew Abi Younes, Institute of Microbiology, ETH Zurich, Switzerland
  • Wolf-Dietrich Hardt, Institute of Microbiology, ETH Zurich, Switzerland
  • Daniel Mende, Human Biology Microbiome Quantum Research Center (Bio2Q), Keio University, Japan
  • Georg Zeller, Leiden University Center for Infectious Diseases (LUCID), Leiden University Medical Center, Netherlands
  • Shinichi Sunagawa, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zurich, Switzerland

Presentation Overview:

Microbial communities are crucial to the health and functioning of diverse ecosystems on Earth. A key step in their analysis is taxonomic profiling, i.e., the identification and quantification of microbial community composition, typically done by comparing environmental samples to reference genome collections. However, species from underexplored ecosystems are poorly represented in public databases, limiting the accuracy of taxonomic profiling tools. Here, we present mOTUs4 and its accompanying online database, accessible at https://motus-db.org/. This resource comprises 2.83 million metagenome-assembled genomes (MAGs) recovered from over 50 environments using a unified genome reconstruction workflow. The MAGs are accompanied by 919,090 genomes from reference databases, totalling 3.75 million prokaryotic genomes. mOTUs4 can profile 124,295 species, expanding taxonomic coverage of underrepresented ecosystems. The associated genomic data can be interactively browsed online and filtered based on taxonomy, mOTUs identifiers, and genome quality metrics; the user-friendly interface minimizes the need for programming skills to link profiling results with genomic context. In addition, the output produced by mOTUs4 can serve as a proxy for the number of cells within a sample, allowing its use as a scaling factor for normalizing gene counts. This makes it possible to use the profiler output to calculate per-cell copy numbers of diverse gene functional groups, such as antimicrobial resistance genes. By improving accuracy and interpretability in taxonomic profiling across diverse ecosystems and standardising quantification of gene functional groups, mOTUs4 offers a scalable approach to microbial community analysis.
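
The normalisation described above can be illustrated with a short sketch. It assumes that the profiler's summed, length-normalised marker-gene coverage serves as the cell-number proxy and that read counts for a functional group (for example, antimicrobial resistance genes) come from a separate read-mapping step; the function and argument names are hypothetical and not part of the mOTUs4 interface.

```python
def per_cell_copy_number(gene_counts, gene_lengths, cell_coverage):
    """Estimate copies of a gene functional group per cell (illustrative only).

    gene_counts:   reads mapped to each gene of the functional group
    gene_lengths:  length of each gene in bp, in the same order
    cell_coverage: assumed proxy for cell number, e.g. the summed
                   length-normalised marker-gene coverage from the profiler
    """
    if cell_coverage <= 0:
        raise ValueError("cell_coverage must be positive")
    # Length-normalise each gene (reads per bp) and sum over the group
    group_coverage = sum(c / l for c, l in zip(gene_counts, gene_lengths))
    # Dividing by the per-cell marker coverage gives copies per cell
    return group_coverage / cell_coverage
```

For example, 300 reads on a 1,000 bp gene plus 150 reads on a 1,500 bp gene give a group coverage of 0.4 reads/bp; against a marker coverage of 2.0, this corresponds to 0.2 copies per cell.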

15:30-15:40
Accurate profiling of microbial communities for shotgun metagenomic sequencing with Meteor2
Format: In person


Authors List:

  • Amine Ghozlane, Institut Pasteur, France
  • Florence Thirion, INRAE, France
  • Florian Plaza Oñate, INRAE, France
  • Franck Gauthier, INRAE, France
  • Emmanuelle Le Chatelier, INRAE, France
  • Anita Annamalé, Institut Pasteur, France
  • Mathieu Almeida, INRAE, France
  • Stanislav Ehrlich, University College London, United Kingdom
  • Nicolas Pons, INRAE, France

Presentation Overview:

The characterization of complex microbial communities is a critical challenge in microbiome research. Metagenomic profiling has advanced to include taxonomic, functional, and strain-level profiling (TFSP) of microbial communities. We present Meteor2, a tool that leverages compact, environment-specific microbial gene catalogues to deliver comprehensive TFSP insights from metagenomic samples. Meteor2 currently supports ten ecosystems, with 63,494,365 microbial genes clustered into 11,653 metagenomic species pangenomes (MSPs).
In benchmark tests, Meteor2 demonstrated strong performance in TFSP, excelling in detecting low-coverage species. It improved species detection sensitivity by at least 45% compared to other tools, such as MetaPhlAn4 and sylph, in human and mouse gut microbiota simulations. For functional profiling, Meteor2 improved abundance estimation accuracy by at least 35% compared to HUMAnN3. Additionally, Meteor2 tracked more strain pairs than StrainPhlAn, capturing an additional 9.8% on the human dataset and 19.4% on the mouse dataset.
In its fast configuration, Meteor2 emerges as one of the fastest available tools for profiling, requiring only 2.3 minutes for taxonomic analysis and 10 minutes for strain-level analysis against the human microbial gene catalogue when processing 10M paired reads, while operating within a modest 5GB RAM footprint. We further validated Meteor2 using a published faecal microbiota transplantation (FMT) dataset, demonstrating its ability to deliver extensive and actionable metagenomic analysis. As an open-source, easy-to-install, and accurate analysis platform, Meteor2 is highly accessible to researchers, facilitating the exploration of complex microbial ecosystems. Meteor2 is available on GitHub (https://github.com/metagenopolis/meteor) and bioconda (bioconda/meteor). A preprint is currently available (DOI:21203/rs.3.rs-6122276/v1).
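
Gene-catalogue profilers commonly derive a species' abundance from the abundances of its marker genes, reporting it as absent when too few markers are detected. The sketch below illustrates that general scheme under stated assumptions: the 10% detection threshold, the plain mean, and all names are illustrative choices, not Meteor2's actual parameters or code.

```python
from statistics import mean


def msp_abundance(gene_abundance, msp_markers, min_detected_fraction=0.1):
    """Species-level (MSP) abundance from marker-gene abundances.

    gene_abundance:        gene id -> normalised abundance (e.g. reads per bp)
    msp_markers:           MSP id -> list of its marker gene ids
    min_detected_fraction: illustrative detection threshold (an assumption)
    """
    profile = {}
    for msp, markers in msp_markers.items():
        values = [gene_abundance.get(g, 0.0) for g in markers]
        detected = sum(1 for v in values if v > 0)
        if not values or detected < min_detected_fraction * len(values):
            # Too few markers seen: report the MSP as absent
            profile[msp] = 0.0
        else:
            # Otherwise average over all markers, including undetected ones
            profile[msp] = mean(values)
    return profile
```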

15:40-15:50
Benchmarking metagenomic binning tools on real datasets across sequencing platforms and binning modes
Format: In person


Authors List:

  • Haitao Han, Fudan University, China
  • Ziye Wang, Nankai University, China
  • Shanfeng Zhu, Fudan University, China

Presentation Overview:

Metagenomic binning is a culture-free approach that facilitates the recovery of metagenome-assembled genomes by grouping genomic fragments. However, a comprehensive benchmark evaluating the performance of metagenomic binning tools across combinations of data types and binning modes is still lacking. In this study, we benchmark 13 metagenomic binning tools on short-read, long-read, and hybrid data under co-assembly, single-sample, and multi-sample binning. The benchmark results demonstrate that multi-sample binning performs best across short-read, long-read, and hybrid data. Moreover, multi-sample binning outperforms the other binning modes in identifying potential antibiotic resistance gene hosts and near-complete strains containing potential biosynthetic gene clusters across diverse data types. This study also recommends three efficient binners across all data-binning combinations, as well as high-performance binners for each combination.
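
Binning benchmarks typically rank tools by how many recovered bins pass completeness and contamination cut-offs (as estimated by tools such as CheckM). A minimal sketch, assuming MIMAG-style thresholds; the exact criteria used in this benchmark may differ.

```python
def bin_tier(completeness, contamination):
    """Quality tier of a bin from completeness/contamination percentages.

    Cut-offs follow the MIMAG completeness/contamination thresholds; the rRNA
    and tRNA requirements of MIMAG high-quality drafts are ignored here.
    """
    if completeness > 90 and contamination < 5:
        return "near-complete"
    if completeness >= 50 and contamination < 10:
        return "medium-quality"
    return "low-quality"


def count_tiers(bins):
    """Count bins per tier; `bins` is an iterable of (completeness, contamination)."""
    counts = {"near-complete": 0, "medium-quality": 0, "low-quality": 0}
    for completeness, contamination in bins:
        counts[bin_tier(completeness, contamination)] += 1
    return counts
```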

15:50-16:00
Metagenomics-Toolkit: The Flexible and Efficient Cloud-Based Metagenomics Workflow
Format: In person


Authors List:

  • Nils Kleinbölting, Forschungszentrum Jülich GmbH, Germany
  • Peter Belmann, Forschungszentrum Jülich GmbH, Germany
  • Benedikt Osterholz, Forschungszentrum Jülich GmbH, Germany

Presentation Overview:

Metagenome analysis of complex environments with thousands of datasets, such as those available in the Sequence Read Archive, requires immense computational resources to complete within an acceptable time frame. Such large-scale analyses require that the underlying infrastructure be used efficiently. In addition, any analysis should be fully reproducible, and the workflow must be publicly available so that other researchers can understand the reasoning behind computed results. To address this challenge, we have developed the Metagenomics-Toolkit, a scalable, data-agnostic workflow that automates the analysis of short and long metagenomic reads obtained from Illumina and Oxford Nanopore Technologies devices, respectively. The Metagenomics-Toolkit offers not only standard features expected in a metagenome workflow, such as quality control, assembly, binning, and annotation, but also distinctive features, such as plasmid identification based on various tools, the recovery of unassembled microbial community members, and the discovery of microbial interdependencies through a combination of dereplication, co-occurrence, and genome-scale metabolic modeling. Furthermore, the Metagenomics-Toolkit includes a machine learning-optimized assembly step that tailors the peak RAM requested by a metagenome assembler to its actual requirements, thereby minimizing the dependency on dedicated high-memory hardware. While the Metagenomics-Toolkit can be executed on user workstations, it also offers several optimizations for efficient cloud-based cluster execution.
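
The idea of predicting an assembler's peak memory before launching it can be illustrated with a deliberately simple model. This sketch is an assumption throughout: it fits a one-feature linear regression on past runs (the feature, for example an estimate of distinct k-mers in the reads, is hypothetical) and pads the prediction with a safety margin; the Toolkit's actual model and features are not described in this abstract.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; needs at least two distinct xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def requested_ram_gb(model, feature, margin=1.2, floor_gb=4.0):
    """Memory to request: predicted peak RAM times a safety margin, with a floor."""
    a, b = model
    return max(floor_gb, (a + b * feature) * margin)
```

A larger margin lowers the risk of out-of-memory failures at the cost of requesting more RAM than needed, which is the trade-off such a step tries to tune.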

16:40-17:20
Invited Presentation: TBA
Format: In person


Authors List:

  • Rob Knight

17:20-17:40
Proceedings Presentation: DNABERT-S: Pioneering Species Differentiation with Species-Aware DNA Embeddings
Format: In person


Authors List:

  • Zhihan Zhou, Northwestern University, United States
  • Weimin Wu, Northwestern University, United States
  • Harrison Ho, University of California, Merced, United States
  • Jiayi Wang, Northwestern University, United States
  • Lizhen Shi, Northwestern University, United States
  • Ramana Davuluri, Stony Brook University, United States
  • Zhong Wang, Lawrence Berkeley National Laboratory, United States
  • Han Liu, Northwestern University, United States

Presentation Overview:

We introduce DNABERT-S, a tailored genome model that develops species-aware embeddings to naturally cluster and segregate DNA sequences of different species in the embedding space. Differentiating species from genomic sequences (i.e., DNA and RNA) is vital yet challenging, since many real-world species remain uncharacterized, lacking known genomes for reference. Embedding-based methods are therefore used to differentiate species in an unsupervised manner. DNABERT-S builds upon a pre-trained genome foundation model named DNABERT-2. To encourage effective embeddings for error-prone long-read DNA sequences, we introduce Manifold Instance Mixup (MI-Mix), a contrastive objective that mixes the hidden representations of DNA sequences at randomly selected layers and trains the model to recognize and differentiate these mixed proportions at the output layer. We further enhance it with the proposed Curriculum Contrastive Learning (C²LR) strategy. Empirical results on 23 diverse datasets show DNABERT-S's effectiveness, especially in realistic label-scarce scenarios. For example, it identifies twice as many species from a mixture of unlabeled genomic sequences, doubles the Adjusted Rand Index (ARI) in species clustering, and, with only 2-shot training, outperforms the top baseline's 10-shot performance in species classification.
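
The mixing step at the heart of MI-Mix can be sketched as follows, under simplifying assumptions: two hidden representations are combined with a proportion `lam`, and a soft-label contrastive loss rewards the output for matching each source in that proportion. This is not DNABERT-S's implementation; in the described method the mix happens at a randomly selected layer and passes through the remaining layers before the loss, which this sketch omits, and the temperature `tau` is an assumed value.

```python
import math


def mix(h_a, h_b, lam):
    """Convex combination of two hidden representations (the 'mixup' step)."""
    return [lam * x + (1 - lam) * y for x, y in zip(h_a, h_b)]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def soft_contrastive_loss(z_mixed, candidates, targets, tau=0.05):
    """Cross-entropy between mixing proportions and softmax-normalised similarities.

    candidates: output embeddings of the sequences in the batch
    targets:    soft labels, e.g. lam for one source, 1 - lam for the other, 0 elsewhere
    """
    logits = [cosine(z_mixed, c) / tau for c in candidates]
    m = max(logits)
    # log-sum-exp with the max subtracted for numerical stability
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return -sum(t * (l - log_norm) for t, l in zip(targets, logits))
```

The loss is lowest when the similarity of the mixed output to each source sequence tracks its mixing proportion, which is the "recognize the mixed proportions" behaviour the abstract describes.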

17:40-17:50
MGnify Genomes: generating richly annotated, searchable biome-specific genome catalogues
Format: In person


Authors List:

  • Tatiana Gurbich, EMBL-EBI, United Kingdom
  • Germana Baldi, EMBL-EBI, United Kingdom
  • Martin Beracochea, EMBL-EBI, United Kingdom
  • Alejandra Escobar-Zepeda, EMBL-EBI, United Kingdom
  • Varsha Kale, EMBL-EBI, United Kingdom
  • Jennifer Lu, EMBL-EBI, United Kingdom
  • Lorna Richardson, EMBL-EBI, United Kingdom
  • Alexander Rogers, EMBL-EBI, United Kingdom
  • Ekaterina Sakharova, EMBL-EBI, United Kingdom
  • Mahfouz Shehu, EMBL-EBI, United Kingdom
  • Robert Finn, EMBL-EBI, United Kingdom

Presentation Overview:

The generation of metagenome-assembled genomes (MAGs) has become a routine method for studying microbiomes. With the growing availability of MAGs in public repositories, MGnify—a free platform for metagenomic data assembly, analysis, and archiving—has introduced MGnify Genomes. This resource serves as a hub for systematically organising and annotating publicly available MAGs and isolate genomes into non-redundant, biome-specific catalogues.
The resource includes over half a million genomes and has recently expanded to incorporate eukaryotic genomes in addition to prokaryotic ones. These genomes are sourced from a wide range of biomes, including both host-associated and environmental contexts. Within each biome, genomes are organised into species-level clusters, with the highest-quality genome selected as the representative, prioritising isolate genomes over MAGs. Each representative genome is richly annotated with comprehensive functional information, including antimicrobial resistance. Additional annotations cover biosynthetic gene clusters, carbohydrate metabolism (including polysaccharide utilisation loci), non-coding RNAs, CRISPR, phage sequences, plasmids, and integrative mobile elements.
An open-source Nextflow pipeline is maintained for generating new catalogues and updating existing ones. The platform offers multiple ways to utilise these references: each biome-specific catalogue is accompanied by Kraken2, protein, and gene databases. A fast, k-mer-based search tool is available on the MGnify Genomes website, allowing users to quickly compare their genomes against the reference catalogues. The resource supports a wide range of applications, including the identification of novel genomes, analysis of species-level adaptation across environments, and research in agricultural, environmental, and health and disease fields.
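
Representative selection within a species cluster can be sketched as below. The strict preference for isolates and the score of completeness minus five times contamination are assumptions for illustration; the pipeline's actual scoring (which may also weigh assembly statistics) is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Genome:
    accession: str
    is_isolate: bool
    completeness: float   # percent
    contamination: float  # percent


def quality_score(g):
    # Assumed scoring: completeness penalised by five times contamination
    return g.completeness - 5 * g.contamination


def pick_representative(cluster):
    """Prefer isolate genomes; among the preferred pool, take the best score."""
    isolates = [g for g in cluster if g.is_isolate]
    pool = isolates if isolates else cluster
    return max(pool, key=quality_score)
```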

17:50-18:00
Rapid and Consistent Genome Clustering for Navigating Bacterial Diversity with Millions of Genomes
Format: In person


Authors List:

  • Johanna von Wachsmann, European Bioinformatics Institute, United Kingdom
  • John A. Lees, European Bioinformatics Institute, United Kingdom
  • Robert D. Finn, European Bioinformatics Institute, United Kingdom

Presentation Overview:

The exponential growth of bacterial genomic databases presents unprecedented challenges for researchers, with isolate genomes increasing from 661,405 samples in 2021 to 2,440,377 samples by August 2024, alongside expanding MAG repositories like those provided by MGnify. While removing genome redundancy at species or strain level is essential for navigating this vast landscape, current gold-standard tools like dRep have become computationally infeasible for datasets exceeding 50,000 genomes. For example, the human gut MAG catalogue in MGnify had to be artificially split into multiple chunks for processing, risking taxonomic inconsistencies and demanding extensive manual intervention. We present a novel sketching-based clustering approach that dramatically improves scalability while maintaining high biological accuracy. Our method is built on sketchlib.rust (approximately 100× faster than MASH) for sketching genomes and constructing genome similarity networks that effectively partition millions of genomes into species clusters. When benchmarked against dRep on a 1,125-genome dataset, our approach clusters the genomes in just 0.2 CPU hours compared to dRep's 92 CPU hours. More importantly, our method successfully processes 219,000 genomes in only 17.1 CPU hours, a task impossible for dRep. Quality assessment across multiple datasets demonstrates excellent taxonomic coherence, with monophyletic scores >99%. This breakthrough enables researchers to effectively navigate and utilise the unprecedented scale of available bacterial genomic data, facilitating analyses previously considered impractical or even impossible.
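
The general pattern (sketch every genome once, estimate pairwise similarity from the sketches, link pairs above a species threshold, and take connected components as clusters) can be illustrated in a few lines. This is a toy, single-threaded Mash-style sketch, not sketchlib.rust's algorithm or API; the 95% ANI threshold, k-mer size, and sketch size are assumptions. It also compares all pairs, which a method aimed at millions of genomes must avoid.

```python
import hashlib
import math


def bottom_sketch(seq, k=16, size=200):
    """Bottom-s MinHash sketch: the `size` smallest 64-bit k-mer hashes."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    hashes = {int.from_bytes(hashlib.blake2b(km.encode(), digest_size=8).digest(), "big")
              for km in kmers}
    return sorted(hashes)[:size]


def jaccard(s1, s2, size=200):
    """Estimate Jaccard from the bottom-`size` hashes of the merged sketches."""
    union = sorted(set(s1) | set(s2))[:size]
    if not union:
        return 0.0
    shared = set(s1) & set(s2)
    return sum(1 for h in union if h in shared) / len(union)


def mash_ani(j, k=16):
    """Approximate ANI as 1 minus the Mash distance -ln(2j / (1 + j)) / k."""
    return 0.0 if j == 0 else 1 + math.log(2 * j / (1 + j)) / k


def cluster(genomes, ani_threshold=0.95, k=16):
    """Link genomes above the ANI threshold; connected components are clusters."""
    names = list(genomes)
    sketches = {n: bottom_sketch(genomes[n], k) for n in names}
    parent = {n: n for n in names}

    def find(x):  # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if mash_ani(jaccard(sketches[a], sketches[b]), k) >= ani_threshold:
                parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```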

Thursday, July 24th
8:40-9:00
Proceedings Presentation: GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search
Format: In person


Authors List:

  • Fuchuan Qu, Department of Electrical Engineering, City University of Hong Kong, Hong Kong
  • Cheng Peng, Department of Electrical Engineering, City University of Hong Kong, Hong Kong
  • Jiaojiao Guan, Department of Electrical Engineering, City University of Hong Kong, Hong Kong
  • Donglin Wang, School of Environmental Science & Engineering, Shandong University, China
  • Yanni Sun, Department of Electrical Engineering, City University of Hong Kong, Hong Kong
  • Jiayu Shang, Department of Information Engineering, Chinese University of Hong Kong, Hong Kong

Presentation Overview:

Motivation: Nucleocytoplasmic large DNA viruses (NCLDVs) are notable for their large genomes and extensive gene repertoires, which contribute to their widespread environmental presence and critical roles in processes such as host metabolic reprogramming and nutrient cycling. Metagenomic sequencing has emerged as a powerful tool for uncovering novel NCLDVs in environmental samples. However, identifying NCLDV sequences in metagenomic data remains challenging due to their high genomic diversity, limited reference genomes, and shared regions with other microbes. Existing alignment-based and machine learning methods struggle to achieve an optimal trade-off between sensitivity and precision.

Results: In this work, we present GiantHunter, a reinforcement learning-based tool for identifying NCLDVs from metagenomic data. By employing a Monte Carlo tree search strategy, GiantHunter dynamically selects representative non-NCLDV sequences as the negative training data, enabling the model to establish a robust decision boundary. Benchmarking on rigorously designed experiments shows that GiantHunter achieves high precision while maintaining competitive sensitivity, improving the F1-score by 10% and reducing computational cost by 90% compared to the second-best method. To demonstrate its real-world utility, we applied GiantHunter to 60 metagenomic datasets collected from six cities along the Yangtze River, located both upstream and downstream of the Three Gorges Dam. The results reveal significant differences in NCLDV diversity correlated with proximity to the dam, likely influenced by reduced flow velocity caused by the dam. These findings highlight GiantHunter's potential to advance our understanding of NCLDVs and their ecological roles in diverse environments.
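
GiantHunter's full Monte Carlo tree search is not reproduced here. The sketch below shows only the UCB1 selection rule that MCTS uses to balance exploring new options against exploiting good ones, applied one level deep (a bandit, with no tree expansion) to hypothetical groups of candidate negative sequences; `evaluate` stands in for an assumed reward such as validation F1 after retraining.

```python
import math


def ucb1_select(stats, c=math.sqrt(2)):
    """Pick the child with the highest UCB1 score (the selection step of MCTS).

    stats: dict child -> (visits, total_reward); unvisited children are tried first.
    """
    total = sum(v for v, _ in stats.values())
    best, best_score = None, -math.inf
    for child, (visits, reward) in stats.items():
        if visits == 0:
            return child
        score = reward / visits + c * math.sqrt(math.log(total) / visits)
        if score > best_score:
            best, best_score = child, score
    return best


def search(groups, evaluate, iterations=50):
    """Repeatedly choose a candidate negative group, score it, and update its stats.

    Returns the most-visited group, as MCTS conventionally does.
    """
    stats = {g: (0, 0.0) for g in groups}
    for _ in range(iterations):
        g = ucb1_select(stats)
        visits, reward = stats[g]
        stats[g] = (visits + 1, reward + evaluate(g))
    return max(stats, key=lambda g: stats[g][0])
```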

9:00-9:10
CAMI Benchmarking Portal: online evaluation and ranking of metagenomic software
Format: In person


Authors List:

  • Fernando Meyer, Helmholtz Centre for Infection Research, Braunschweig, Germany
  • Gary Robertson, Helmholtz Centre for Infection Research, Braunschweig, Germany
  • Zhi-Luo Deng, Helmholtz Centre for Infection Research, Braunschweig, Germany
  • David Koslicki, Penn State University, University Park, PA, United States
  • Alexey Gurevich, Helmholtz Institute for Pharmaceutical Research Saarland, Helmholtz Centre for Infection Research, Saarbrücken, Germany
  • Alice C. McHardy, Helmholtz Centre for Infection Research, Braunschweig, Germany

Presentation Overview:

Finding appropriate software and parameter settings to process shotgun metagenome data is essential for meaningful metagenomic analyses. To enable objective and comprehensive benchmarking of metagenomic software, the community-led initiative for the Critical Assessment of Metagenome Interpretation (CAMI) promotes standards and best practices. Since 2015, CAMI has provided comprehensive datasets, benchmarking guidelines, and challenges. However, benchmarking had to be conducted offline, requiring substantial time and technical expertise and leading to gaps in results between challenges. We present the CAMI Benchmarking Portal — a central repository of CAMI resources and web server for the evaluation and ranking of metagenome assembly, binning, and taxonomic profiling software. The portal simplifies evaluation, enabling users to easily compare their results with previous and other users’ submissions through a variety of metrics and visualizations. The portal currently hosts 28,675 results and is freely available at https://cami-challenge.org/.
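
Profiling submissions are typically scored by comparing the predicted relative abundances with the gold standard at each taxonomic rank. The sketch below computes two common metrics of this kind, the L1 norm error and Bray-Curtis dissimilarity, assuming both profiles are relative abundances at one rank; it is an illustration, not the portal's evaluation code.

```python
def l1_error(gold, predicted):
    """Sum of absolute abundance differences over all taxa (0 to 2 for proportions)."""
    taxa = set(gold) | set(predicted)
    return sum(abs(gold.get(t, 0.0) - predicted.get(t, 0.0)) for t in taxa)


def bray_curtis(gold, predicted):
    """Bray-Curtis dissimilarity between two abundance profiles (0 to 1)."""
    taxa = set(gold) | set(predicted)
    num = sum(abs(gold.get(t, 0.0) - predicted.get(t, 0.0)) for t in taxa)
    den = sum(gold.get(t, 0.0) + predicted.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0
```

Taxa missing from one profile are treated as zero, so both false positives and missed taxa add to the error.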

9:10-9:20
Invited Presentation: CAMI community exchange
Format: In person


Authors List:

  • Alice McHardy

9:20-9:30
NanoGraph: Mapping Nanopore Squiggles to Graphs Enables Accurate Taxonomic Assignment
Format: In person


Authors List:

  • Wenhuan Zeng, University of Tuebingen, Germany
  • Daniel H. Huson, University of Tuebingen, Germany

Presentation Overview:

Nanopore sequencing technology offers long sequencing reads and real-time analysis capabilities, making it a powerful tool for addressing diverse questions in the life sciences. This technology records raw electrical signals from samples, which are converted into nucleotide sequences (A, T, G, and C) through a process known as basecalling. These sequences can subsequently be used for various types of analysis. To enhance the efficiency of taxonomic classification in Nanopore sequencing and explore the challenges of applying deep learning algorithms to ultra-long sequences, we developed NanoGraph, a graph-based deep learning framework designed to classify samples based on their taxonomic lineages. NanoGraph processes raw signals (of substantial length) by transforming them into topological graph structures using novel methods. We evaluated NanoGraph's performance using a customized simulated dataset and benchmarked it against a previous study on public datasets, demonstrating superior results. Additionally, we assessed its practical usability after fine-tuning the trained model on real raw signal datasets generated in our wet lab. In summary, NanoGraph provides a robust and effective approach for the taxonomic classification of Nanopore-sequenced samples, offering insights that advance the application of graph neural networks to raw signal data and help bridge the gap between computational efficiency and ultra-long sequencing reads.
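
The abstract does not detail how NanoGraph builds its graphs. As an illustration of the general idea of turning a one-dimensional signal into a graph, the sketch below implements the natural visibility graph, a well-known technique that is swapped in here and is not NanoGraph's method. Each signal sample becomes a node, and two samples are linked if the straight line between them passes above every sample in between.

```python
def visibility_graph(signal):
    """Natural visibility graph: samples i and j are linked if no sample between
    them reaches the straight line joining (i, y_i) and (j, y_j)."""
    edges = set()
    n = len(signal)
    for i in range(n - 1):
        for j in range(i + 1, n):
            yi, yj = signal[i], signal[j]
            visible = all(
                signal[k] < yj + (yi - yj) * (j - k) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:  # neighbouring samples are always visible
                edges.add((i, j))
    return edges
```

The naive version is cubic in the number of samples, so for long squiggles it would only be practical on short or downsampled windows.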

9:30-9:40
MEGAN7: Enhanced Optimization and Advanced Functionality for Metagenomic Analysis
Format: In person


Authors List:

  • Anupam Gautam, University of Tuebingen / IMPRS "From Molecules to Organisms", Max Planck Institute for Biology Tuebingen, Germany
  • Daniel H. Huson, University of Tuebingen, Germany

Presentation Overview:

MEGAN is a widely used, user-friendly tool for metagenomic analysis of both long- and short-read data, and remains the only such tool with a graphical user interface (GUI). MEGAN7 introduces optimized workflows and enhanced functionality. By utilizing smaller, clustered reference databases, MEGAN7 improves computational efficiency while maintaining high-quality taxonomic and functional assignments, making it a scalable solution for diverse datasets.
This study presents current work on MEGAN7, a major update of our MEGAN software, and highlights the impact of utilizing smaller reference databases on the computational efficiency and effectiveness of metagenomic sequencing data analysis, as integrated into MEGAN7.
Metagenomic analysis was conducted on short and long reads from ten diverse datasets. Reads were aligned to various resolutions of the UniRef database (100%, 90%, and 50%) and clustered NCBI-nr databases (90% and 50% identity) using DIAMOND. Taxonomic and functional binning of the aligned reads was carried out using MEGAN7.
Smaller reference databases, particularly at 90% and 50% identity, significantly accelerated processing while maintaining high-quality alignment and assignment rates. The integration of DIAMOND's clustering capabilities further enhanced efficiency, improving performance across all downsized databases. MEGAN7 achieved consistently good assignment rates for both taxonomic and functional binning, even with reduced database sizes.
These findings illustrate that downsizing reference databases effectively reduces the computational burden of metagenomic analysis without compromising result quality. The incorporation of DIAMOND's clustering features offers additional efficiency gains. With these optimized workflows, MEGAN7 presents a scalable and efficient tool for metagenomic data analysis, offering enhanced functionality for diverse datasets.
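
Taxonomic binning in MEGAN is based on the lowest common ancestor (LCA) of a read's significant hits. The sketch below shows the classic naive LCA with a top-percent filter on bit scores; the 10% default and the toy taxonomy are assumptions for illustration, and MEGAN7's actual defaults (and its separate long-read algorithm) may differ.

```python
def lineage(taxon, parent):
    """Path from a taxon up to the root, using a child -> parent map."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def naive_lca(hits, parent, top_percent=10.0):
    """Assign a read to the lowest common ancestor of its best-scoring hits.

    hits: list of (taxon, bit_score); only hits within `top_percent` of the
    best bit score are considered.
    """
    if not hits:
        return None
    best = max(score for _, score in hits)
    kept = [t for t, score in hits if score >= best * (1 - top_percent / 100)]
    common = lineage(kept[0], parent)  # ordered leaf -> root
    for t in kept[1:]:
        ancestors = set(lineage(t, parent))
        common = [a for a in common if a in ancestors]
    return common[0]  # first shared ancestor is the lowest one
```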

9:40-9:50
TaxSEA: Rapid Interpretation of Microbiome Alterations Using Taxon Set Enrichment Analysis and Public Databases
Format: In person


Authors List:

  • Feargal Ryan, Flinders University, Australia

Presentation Overview:

Microbial communities are essential regulators of ecosystem function, with their composition commonly assessed through DNA sequencing. Most current tools focus on detecting changes in individual taxa (e.g., species or genera); however, in other omics fields, such as transcriptomics, enrichment analyses like Gene Set Enrichment Analysis (GSEA) are commonly used to uncover patterns not visible at the level of individual features. Here, we introduce TaxSEA, a taxon set enrichment analysis tool available as an R package, a web portal (https://shiny.taxsea.app), and a Python package. TaxSEA integrates taxon sets from five public microbiota databases (BugSigDB, MiMeDB, GutMGene, mBodyMap, and GMRepoV2) while also allowing users to incorporate custom sets such as taxonomic groupings. In-silico assessments show TaxSEA is accurate across a range of set sizes. When applied to differential abundance analysis output from Inflammatory Bowel Disease and Type 2 Diabetes metagenomic data, TaxSEA can rapidly identify changes in functional groups corresponding to known associations. We also show that TaxSEA is robust to the choice of differential abundance (DA) analysis package. In summary, TaxSEA enables researchers to efficiently contextualize their findings within the broader microbiome literature, facilitating rapid interpretation and advancing understanding of microbiome–host and environmental interactions.
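
A simple form of taxon set enrichment compares the distribution of fold changes for taxa in a set against all other taxa. The sketch below uses the two-sample Kolmogorov-Smirnov statistic for this comparison; whether TaxSEA uses exactly this statistic is an assumption, and the sketch omits p-values and multiple-testing correction.

```python
def ks_statistic(in_set, background):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    a, b = sorted(in_set), sorted(background)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past all values <= x in both samples (handles ties)
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def taxon_set_shift(log_fc, taxon_set):
    """Compare fold changes of taxa in a set against all other taxa."""
    inside = [v for t, v in log_fc.items() if t in taxon_set]
    outside = [v for t, v in log_fc.items() if t not in taxon_set]
    return ks_statistic(inside, outside)
```

In practice a p-value could be obtained with `scipy.stats.ks_2samp` and adjusted across all tested sets. D is unsigned, so the direction of the shift has to be read from the fold changes themselves.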

9:50-10:00
SinProVirP: a Signature Protein-based Approach for Accurate and Efficient Profiling of the Human Gut Virome
Format: In person


Authors List:

  • Junhua Li, BGI Research, Belgrade 11000, Serbia
  • Fangming Yang, BGI Research, Belgrade 11000, Serbia
  • Liwen Xiong, BGI Research, Belgrade 11000, Serbia
  • Min Li, BGI Research, Belgrade 11000, Serbia
  • Xuyang Feng, BGI Research, Belgrade 11000, Serbia
  • Huahui Ren, Institute of Intelligent Medical Research (IIMR), BGI Genomics, Shenzhen 518083, China
  • Zhun Shi, Institute of Intelligent Medical Research (IIMR), BGI Genomics, Shenzhen 518083, China
  • Huanzi Zhong, Institute of Intelligent Medical Research (IIMR), BGI Genomics, Shenzhen 518083, China

Presentation Overview:

The human gut virome represents a critical yet underexplored microbial component that regulates bacterial communities, modulates host immunity, and maintains gut health. However, virome analysis remains challenging due to the vast diversity and genomic variability of viruses. Existing profiling methods often struggle with accuracy and efficiency, hindering their ability to detect novel viral species and perform large-scale analyses. Here, we present SinProVirP, a genus-level virome profiling tool based on signature proteins. By analyzing 275,202 phage genomes to establish a curated database of 109,221 signature proteins across 6,780 viral clusters (VCs), SinProVirP achieves genus-level phage quantification with precision and recall comparable to the benchmark method while reducing computational demands by over 80%. Crucially, SinProVirP significantly outperforms existing tools in detecting novel viruses, achieving over 80% recall by using a signature protein-based identification strategy. Applied to inflammatory bowel disease (IBD) cohorts, SinProVirP revealed disease-specific virome dysbiosis, identified phage-host interactions, and improved the performance of bacteria-only disease classification models. This approach enables robust, large-scale virome analysis, facilitates the integrative analysis of viral and bacterial communities, and improves our understanding of the virome's role in health.
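
The general idea of signature-protein profiling can be sketched as follows: a viral cluster is called present only when enough of its signature proteins are hit, and its abundance is summarised from those hits. The 30% detection threshold, the use of the median, and all names are assumptions, not SinProVirP's actual parameters.

```python
from statistics import median


def profile_vcs(protein_hits, vc_signatures, min_fraction=0.3):
    """Relative abundance of viral clusters (VCs) from reads hitting signature proteins.

    protein_hits:  protein id -> number of reads aligned to it
    vc_signatures: VC id -> list of its signature protein ids
    min_fraction:  assumed minimum fraction of signature proteins that must be hit
    """
    profile = {}
    for vc, proteins in vc_signatures.items():
        counts = [protein_hits.get(p, 0) for p in proteins]
        hit = sum(1 for c in counts if c > 0)
        if proteins and hit / len(proteins) >= min_fraction:
            # The median damps outliers such as proteins sharing domains with other VCs
            profile[vc] = median(counts)
    total = sum(profile.values())
    return {vc: v / total for vc, v in profile.items()} if total else {}
```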

11:20-11:40
Proceedings Presentation: Leveraging Large Language Models to Predict Antibiotic Resistance in Mycobacterium tuberculosis
Format: In person


Authors List:

  • Conrad Testagrose, University of Florida, United States
  • Sakshi Pandey, University of Florida, United States
  • Mohammadali Serajian, University of Florida, United States
  • Simone Marini, University of Florida, United States
  • Mattia Prosperi, University of Florida, United States
  • Christina Boucher, University of Florida, United States

Presentation Overview:

Antibiotic resistance in Mycobacterium tuberculosis (MTB) poses a significant challenge to global public health. Rapid and accurate prediction of antibiotic resistance can inform treatment strategies and mitigate the spread of resistant strains. In this study, we present a novel approach leveraging large language models (LLMs) to predict antibiotic resistance in MTB (LLMTB). Our model is trained on a large dataset of genomic data and associated resistance profiles, utilizing natural language processing techniques to capture patterns and mutations linked to resistance. The model's architecture integrates state-of-the-art transformer-based LLMs, enabling the analysis of complex genomic sequences and the extraction of critical features relevant to antibiotic resistance. We evaluate our model's performance using a comprehensive dataset of MTB strains, demonstrating its ability to achieve high performance in predicting resistance to various antibiotics. Unlike traditional machine learning methods, LLMs can be adapted to new or emerging drugs through fine-tuning or few-shot learning, thereby reducing reliance on extensive data curation. Beyond predictive accuracy, LLMTB uncovers deeper biological insights, identifying critical genes, intergenic regions, and novel resistance mechanisms. This method marks a transformative shift in resistance prediction and offers significant potential for enhancing diagnostic capabilities and guiding personalized treatment plans, ultimately contributing to the global effort to combat tuberculosis and antibiotic resistance. All source code is publicly available at https://github.com/ctestagrose/LLMTB.
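
Applying language models to genomes requires turning DNA into tokens. One common scheme, overlapping k-mer tokenization, is sketched below; LLMTB's actual tokenizer is not described in this abstract, so the choice of k-mers, the reserved ids, and the names here are assumptions.

```python
from itertools import product


def build_vocab(k):
    """Map every k-mer over ACGT to an integer id; ids 0 and 1 are reserved."""
    vocab = {"[PAD]": 0, "[UNK]": 1}
    for i, kmer in enumerate("".join(p) for p in product("ACGT", repeat=k)):
        vocab[kmer] = i + 2
    return vocab


def tokenize(seq, k, vocab, stride=1):
    """k-mer tokens taken every `stride` bases; k-mers with other symbols (e.g. N) map to [UNK]."""
    seq = seq.upper()
    return [vocab.get(seq[i:i + k], vocab["[UNK]"])
            for i in range(0, len(seq) - k + 1, stride)]
```

With `stride=1` the k-mers overlap, preserving every position at the cost of longer token sequences; `stride=k` gives non-overlapping tokens.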

11:40-11:50
De novo discovery of conserved gene clusters in microbial genomes with Spacedust
Format: In person


Authors List:

  • Ruoshi Zhang, Max Planck Institute for Multidisciplinary Sciences, Germany
  • Johannes Soeding, Max Planck Institute for Multidisciplinary Sciences, Germany
  • Milot Mirdita, Seoul National University, South Korea

Presentation Overview:

Metagenomics has revolutionized environmental and human-associated microbiome studies. However, the small fraction of proteins with known biological processes and molecular functions presents a major bottleneck. In prokaryotes and viruses, evolution favors keeping genes that participate in the same biological processes co-localized as conserved gene clusters. Conversely, conservation of gene neighborhood indicates functional association. Spacedust is a tool for systematic, de novo discovery of conserved gene clusters. To find homologous protein matches, it uses fast and sensitive structure comparison with Foldseek. Partially conserved clusters are detected using novel clustering and order conservation P-values. We demonstrate Spacedust's sensitivity with an all-vs-all analysis of 1,308 bacterial genomes, identifying 72,843 conserved gene clusters containing 58% of the 4.2 million genes. It recovered 95% of antiviral defense system clusters annotated by a specialized tool. Spacedust's high sensitivity and speed will facilitate the large-scale annotation of the huge numbers of sequenced bacterial, archaeal and viral genomes.
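
The core notion of a conserved gene cluster (homologous genes that stay close together in two genomes) can be illustrated with a greedy grouping of hits. This is a simplified stand-in, not Spacedust's clustering or its P-value statistics; `max_gap` and `min_size` are assumed parameters, and genes are identified by their order along each genome.

```python
def conserved_clusters(hits, max_gap=2, min_size=2):
    """Greedily group homologous gene pairs that stay close in both genomes.

    hits:     (query_gene_index, target_gene_index) pairs of homologous genes
    max_gap:  assumed maximum distance, in genes, between neighbouring members
    min_size: assumed minimum number of gene pairs for a cluster to be reported
    """
    clusters, current = [], []
    for q, t in sorted(hits):
        if current:
            prev_q, prev_t = current[-1]
            # abs() on the target side also accepts clusters in reverse orientation
            if q - prev_q <= max_gap and abs(t - prev_t) <= max_gap:
                current.append((q, t))
                continue
            if len(current) >= min_size:
                clusters.append(current)
        current = [(q, t)]
    if len(current) >= min_size:
        clusters.append(current)
    return clusters
```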

11:50-12:00
Nerpa 2: linking biosynthetic gene clusters to nonribosomal peptide structures
Format: In person


Authors List:

  • Ilia Olkhovskii, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, Germany
  • Azat Tagirdzhanov, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, Germany
  • Alexey Gurevich, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, Germany
  • Aleksandra Kushnareva, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, Germany
  • Petr Popov, Neapolis University Pafos, Pafos, Cyprus

Presentation Overview:

Nonribosomal peptides (NRPs) are clinically important molecules produced by specialized microbial enzymes encoded in biosynthetic gene clusters (BGCs). Linking BGCs to their products is crucial for predicting and manipulating NRP production, yet BGC-to-NRP biosynthesis is often complex and non-unique, which makes prediction from the genome challenging. Here, we present Nerpa 2, a high-throughput BGC–NRP matching tool. Compared with its predecessor, Nerpa 2 improves the prediction of NRP monomers selected during synthesis, introduces a hidden Markov model–based alignment strategy for handling complex biosynthetic paths, and adds interactive visualizations for result interpretation.
We evaluated Nerpa 2 on 191 BGCs and 1,205 NRP structures, demonstrating a notable accuracy improvement over both Nerpa 1 and the related tool BioCAT (50% vs. 42% and 8%, respectively). In addition to higher overall precision, Nerpa 2 performs significantly better on especially challenging cases.
Nerpa 2 streamlines a range of tasks in NRP research, including annotation of computationally predicted BGCs, prioritization of BGCs more likely to yield novel NRPs, and guiding bioengineering experiments by identifying BGCs that yield molecules close to user-specified target structures. The software is freely available at https://github.com/gurevichlab/nerpa.

12:00-12:10
Phylo-Spec: a phylogeny-fusion deep learning model advances microbiome status identification
Format: In person


Authors List:

  • Junhui Zhang, Qingdao University, China
  • Fan Meng, Qingdao University, China
  • Yangyang Sun, Qingdao University, China
  • Wenfei Xu, Qingdao University, China
  • Shunyao Wu, Qingdao University, China
  • Xiaoquan Su, Qingdao University, China

Presentation Overview:

Motivation: The human microbiome is crucial for health regulation and disease progression, presenting a valuable opportunity for health state classification. Traditional microbiome-based classification methods rely on pre-trained machine learning (ML) or deep learning (DL) models, which typically focus on microbial distribution patterns and neglect the underlying relationships between microbes. As a result, model performance can be significantly affected by data sparsity, misclassified features, or incomplete microbial profiles.

Methods: To overcome these challenges, we introduce Phylo-Spec, a phylogeny-driven deep learning algorithm that integrates multi-aspect microbial information for improved status recognition. Phylo-Spec fuses convolutional features of microbes within a phylogenetic hierarchy via a bottom-up iteration, which significantly alleviates the challenges posed by sparse data and inaccurate profiling. Additionally, the model dynamically assigns unclassified species to virtual nodes on the phylogenetic tree based on higher-level taxonomy, minimizing interference from uncertain microbes. Phylo-Spec also captures feature importance through an information gain-based mechanism that propagates through the phylogenetic structure, enhancing the interpretability of classification decisions.

Results: Phylo-Spec demonstrated superior efficacy in microbiome status classification on two in-silico synthetic datasets that simulate the aforementioned cases, outperforming existing ML and DL methods. Validation with real-world metagenomic and amplicon data further confirmed the model’s performance in multi-status classification, establishing a powerful framework for microbiome-based health state identification and microbe-disease association.
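Phylo-Spec's feature fusion is learned, but its data flow (leaf values propagated bottom-up through the phylogeny, with unclassified species hung on a virtual node beneath their known higher-level taxon) can be sketched with plain summation. The child-to-parent dictionary encoding, the `__virtual` naming, and summation in place of learned convolution are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Optional

Tree = Dict[str, Optional[str]]  # child -> parent; the root maps to None

def attach_unclassified(parent: Tree, species: str, higher_taxon: str) -> str:
    """Hang an unclassified species on a virtual node below its known higher-level taxon."""
    virtual = f"{higher_taxon}__virtual"
    parent.setdefault(virtual, higher_taxon)
    parent[species] = virtual
    return virtual

def bottom_up_sum(parent: Tree, leaf_values: Dict[str, float]) -> Dict[str, float]:
    """Add each leaf's value to every node on its path to the root.
    Summation stands in for Phylo-Spec's learned convolutional fusion."""
    totals: Dict[str, float] = defaultdict(float)
    for leaf, value in leaf_values.items():
        node: Optional[str] = leaf
        while node is not None:
            totals[node] += value
            node = parent.get(node)
    return dict(totals)

tree: Tree = {
    "Bacteria": None,
    "Bacillota": "Bacteria",
    "Bacteroidota": "Bacteria",
    "sp1": "Bacillota",
    "sp2": "Bacteroidota",
}
attach_unclassified(tree, "unclassified_sp", "Bacillota")
print(bottom_up_sum(tree, {"sp1": 0.5, "sp2": 0.3, "unclassified_sp": 0.2}))
```

The unclassified species still contributes to its phylum-level node, so its signal is not lost even though its exact position in the tree is unknown.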

12:10-12:20
Beyond Taxonomy and Function: Protein Language Models for Scalable Microbial Representations
Format: In person


Authors List:

  • Petra Matyskova, Utrecht University, Netherlands
  • Gijs Selten, Utrecht University, Netherlands
  • Sanne Abeln, Utrecht University, Netherlands
  • Ronnie de Jonge, Utrecht University, Netherlands

Presentation Overview:

Traditional microbial representations based on taxonomy or functional annotations such as KEGG Orthology (KOs) and OrthoFinder groups (OGs) suffer from low coverage, high dimensionality, or long computation times. In this work, we explore the use of protein large language models (PLLMs), specifically ESM-2, to generate compact and informative microbial embeddings. We benchmark these embeddings against KOs and OGs using a dataset of 988 microbial genomes. We compare the three approaches in terms of protein coverage, feature dimensionality, runtime, and predictive performance in a biologically relevant task: predicting the root competence of microbes on Arabidopsis thaliana. ESM-2 embeddings achieved full protein coverage and required less runtime than OGs or KOs while producing compact 320-dimensional feature sets. In the classification task, random forest and multi-layer perceptron models based on ESM-2 embeddings outperformed those based on the traditional representations. These results were also replicated on external synthetic community datasets. Importantly, ESM-2 embeddings preserved relevant taxonomic and functional information, as confirmed through hierarchical clustering and PCA. By analysing the embedding weights, we also identified key proteins predictive of root competence, including known and novel candidates. Our results suggest that PLLM-based microbial representations offer an efficient and scalable alternative to conventional functional annotation-based approaches, especially for the small datasets common in microbiome studies. This approach lays the foundation for more advanced applications, such as multi-modal, embedding-based data integration and the discovery of new biologically meaningful traits beyond taxonomic labels or annotated proteins.
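The abstract does not say how per-protein ESM-2 vectors are combined into one genome-level feature set. Mean pooling, sketched below with NumPy, is one common choice and is an assumption here. The 320 dimensions match the embedding width of the smallest ESM-2 checkpoint, and random numbers stand in for real embeddings so that no model needs to be loaded.

```python
import numpy as np

EMBED_DIM = 320  # per-protein embedding width, matching the 320-dimensional features above

def genome_embedding(protein_embeddings: np.ndarray) -> np.ndarray:
    """Mean-pool an (n_proteins, EMBED_DIM) matrix into one genome vector (assumed aggregation)."""
    if protein_embeddings.ndim != 2 or protein_embeddings.shape[0] == 0:
        raise ValueError("expected a non-empty (n_proteins, dim) array")
    return protein_embeddings.mean(axis=0)

rng = np.random.default_rng(0)
# Hypothetical genomes with different protein counts; random values replace ESM-2 output.
genomes = [rng.normal(size=(n, EMBED_DIM)) for n in (1200, 3400, 2500)]
X = np.vstack([genome_embedding(g) for g in genomes])
print(X.shape)  # (3, 320): one fixed-length row per genome, ready for a classifier
```

Unlike KO or OG count tables, whose width grows with the number of annotated families, this feature matrix stays 320 columns wide regardless of genome size or annotation coverage.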

12:20-12:30
Guided tokenizer enhances metagenomic language models performance
Format: In person


Authors List:

  • Ali Rahnavard, The George Washington University, United States
  • Vedant Mahangade, The George Washington University, United States
  • Keith Crandall, The George Washington University, United States

Presentation Overview:

Tokenization is a critical step in adapting language models for genomic and metagenomic sequence analysis. Traditional tokenization methods—such as fixed-length k-mers or statistical compression algorithms like byte-pair encoding (BPE)—often fail to capture the biological relevance embedded in DNA sequences. We introduce Guided Tokenization (GT), a novel, adaptive strategy that prioritizes biologically meaningful subsequences by leveraging importance scores derived from functional annotations, class distributions, and model attention mechanisms.

Unlike conventional approaches, GT dynamically selects high-importance tokens by integrating (1) class-specific unique k-mers, (2) frequently observed informative subsequences, (3) model-informed weighted tokens after fine-tuning, and (4) biologically annotated fragments such as promoters or coding regions. This token prioritization strategy is applied during pretraining, fine-tuning, and prediction phases of genomic language models (gLMs), enabling more efficient learning with fewer parameters and reduced sequence inflation.

We evaluated GT across a range of metagenomic classification and sequence modeling tasks, including taxonomic profiling, antibiotic resistance gene classification, and read classification (e.g., host vs. microbial and plasmid vs. chromosome). Results consistently demonstrate that GT improves model performance, especially for small and mid-sized models, by enhancing classification accuracy, representation quality, and computational efficiency. These findings position guided tokenization as a scalable and biologically aware framework for advancing the next generation of metagenomic language models.
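Two of the ingredients listed above, class-specific unique k-mers and prioritizing important subsequences during tokenization, can be sketched in a few lines. The greedy longest-match rule with a fixed-length k-mer fallback is an assumption chosen for brevity. It is not the published GT algorithm, and the importance scores derived from annotations and attention are omitted.

```python
from typing import Dict, List, Set

def class_unique_kmers(seqs_by_class: Dict[str, List[str]], k: int) -> Set[str]:
    """k-mers observed in exactly one class: a simplified version of criterion (1)."""
    owners: Dict[str, Set[str]] = {}
    for label, seqs in seqs_by_class.items():
        for seq in seqs:
            for i in range(len(seq) - k + 1):
                owners.setdefault(seq[i:i + k], set()).add(label)
    return {kmer for kmer, labels in owners.items() if len(labels) == 1}

def guided_tokenize(seq: str, priority: Set[str], k: int) -> List[str]:
    """Scan left to right; emit the longest prioritized token starting here,
    otherwise fall back to a plain k-mer chunk (shorter at the sequence end)."""
    max_len = max((len(t) for t in priority), default=0)
    tokens: List[str] = []
    i = 0
    while i < len(seq):
        token = seq[i:i + k]  # fallback chunk
        for length in range(min(max_len, len(seq) - i), 0, -1):
            if seq[i:i + length] in priority:
                token = seq[i:i + length]
                break
        tokens.append(token)
        i += len(token)
    return tokens

priority = class_unique_kmers({"plasmid": ["ACGTAC"], "chromosome": ["TTGACA"]}, k=4)
print(guided_tokenize("GGGACGTTT", priority, k=3))  # ['GGG', 'ACGT', 'TT']
```

The class-informative 4-mer ACGT survives as a single token instead of being split across fixed 3-mer boundaries, which is the kind of behavior the guided strategy aims for.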

12:30-12:40
Contrastive learning enables accurate recovery of eukaryotic genomes from metagenome assemblies
Format: In person


Authors List:

  • Daniel Gómez-Pérez, Earlham Institute, United Kingdom
  • Sebastién Raguideau, Earlham Institute, United Kingdom
  • Falk Hildebrand, Earlham Institute and Quadram Institute Bioscience, United Kingdom
  • Christopher Quince, Earlham Institute, United Kingdom

Presentation Overview:

Assembly-based metagenomic approaches, including the generation of metagenome‑assembled genome (MAG) catalogues, are pivotal for exploring and understanding microbial communities. Yet, despite the relevance of protists and fungi for communities such as soil and the rhizosphere, eukaryotic MAG recovery lags behind that of prokaryotes. State‑of‑the‑art binning pipelines depend on reference databases of single‑copy core genes that are sparse for eukaryotes and scale poorly as sequence diversity and dataset size increase.
Here, we present a tool written in Python that learns directly from individual metagenomic datasets to recover high-quality eukaryotic MAGs. By embedding contig‑level composition and coverage features into a shared latent space using contrastive learning and then clustering in a reduced-dimensional projection of that space, the method accurately extracts representative bins. In benchmarks based on real and simulated synthetic community datasets of varying sizes (including prokaryotes and eukaryotes), we show that it recovers every eukaryotic genome at more than 85% completeness. This is an improvement over other state-of-the-art binning tools, which often produce highly fragmented eukaryotic bins.
Overall, our approach provides a reference‑free method for eukaryotic binning that scales well with the growing size and sequencing depth of diverse metagenomic datasets.
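The abstract does not name the contrastive objective. The InfoNCE loss below is a widely used choice and is an assumption here. It expects two embedded "views" of the same contigs (for instance, features computed from differently subsampled reads, also an assumption) with rows paired by index, and it is low when each contig's two views are closer to each other than to any other contig. Only the loss is shown; the encoder network and the subsequent clustering are omitted.

```python
import numpy as np

def info_nce_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE over n paired rows: row i of z1 should match row i of z2 rather than any other row."""
    # L2-normalise so that dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # (n, n) scaled similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilise the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs lie on the diagonal; average their negative log-likelihood.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(1)
contigs = rng.normal(size=(8, 16))  # stand-in embeddings for 8 contigs
view = contigs + 0.01 * rng.normal(size=contigs.shape)
print(info_nce_loss(contigs, view) < info_nce_loss(contigs, view[::-1]))  # True
```

Correctly paired views give a much lower loss than the same views paired in reversed order; minimizing it pulls each contig's views together in the latent space that is later clustered.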

12:40-12:50
Flexible Log-odds Homology Features for Plasmid Identification
Format: In person


Authors List:

  • Brona Brejova, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Slovakia
  • Veronika Tordova, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Slovakia
  • Kristian Andrascik, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Slovakia
  • Cedric Chauve, Simon Fraser University, Canada
  • Tomas Vinar, Comenius University in Bratislava, Slovakia

Presentation Overview:

We study the problem of plasmid identification in short-read assemblies of bacterial isolates. The goal is to classify individual contigs as coming from a chromosome or a plasmid. This problem is typically addressed by machine learning methods combining features derived from the input contigs. Some methods also use additional features based on homology to sequences typical of known plasmids or chromosomes. In this work, we propose a method for creating such features using log-odds scores, based on ideas similar to those traditionally used in sequence alignment scoring. The framework is flexible, as it can handle both close homologs and protein domains capturing distant homology. Including these features in the plASgraph2 graph neural network significantly improves its accuracy.
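By analogy with substitution-matrix scoring, a homology feature can be scored as the log ratio of how often it occurs in reference plasmids versus reference chromosomes, and a contig's score is then the sum over its hits. The sketch below assumes presence/absence counts per reference sequence and add-one smoothing; the function names and toy counts are hypothetical, and the paper's exact scoring and its integration into plASgraph2 are not reproduced.

```python
import math
from typing import Dict, Iterable

def log_odds_table(plasmid_counts: Dict[str, int], chrom_counts: Dict[str, int],
                   n_plasmids: int, n_chroms: int, pseudocount: float = 1.0) -> Dict[str, float]:
    """log P(feature | plasmid) - log P(feature | chromosome), with additive smoothing.
    Counts are the number of reference sequences in which each feature occurs."""
    table = {}
    for feature in set(plasmid_counts) | set(chrom_counts):
        p = (plasmid_counts.get(feature, 0) + pseudocount) / (n_plasmids + 2 * pseudocount)
        q = (chrom_counts.get(feature, 0) + pseudocount) / (n_chroms + 2 * pseudocount)
        table[feature] = math.log(p / q)
    return table

def contig_score(hits: Iterable[str], table: Dict[str, float]) -> float:
    """Sum of log-odds over a contig's hits; positive leans plasmid, negative leans chromosome."""
    return sum(table.get(feature, 0.0) for feature in hits)

# Toy counts over 100 plasmids and 100 chromosomes (invented for illustration).
table = log_odds_table({"repA": 40, "dnaA": 1}, {"repA": 2, "dnaA": 90}, 100, 100)
print(round(contig_score(["repA"], table), 2))          # 2.61
print(round(contig_score(["dnaA", "repA"], table), 2))  # -1.2
```

Features never seen in either reference set contribute zero, so the score degrades gracefully for contigs with novel content; the same scheme applies whether a "feature" is a close homolog or a protein domain.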

12:50-13:00
Accurate plasmid reconstruction from metagenomics data using assembly-alignment graphs and contrastive learning
Format: In person


Authors List:

  • Pau Piera Lindez, University of Copenhagen, Novo Nordisk Foundation Center for Basic Metabolic Research, Denmark
  • Jakob Nissen, University of Copenhagen, Novo Nordisk Foundation Center for Basic Metabolic Research, Denmark
  • Simon Rasmussen, University of Copenhagen, Novo Nordisk Foundation Center for Basic Metabolic Research, Denmark

Presentation Overview:

Plasmids are extrachromosomal DNA molecules that enable horizontal gene transfer in bacteria, often conferring advantages such as antibiotic resistance. Despite their significance, plasmids are underrepresented in genomic databases because mosaicism and micro-diversity make them difficult to assemble. Current plasmid assemblers rely on detecting circular paths in single-sample assembly graphs, but are limited by graph fragmentation, entanglement, and low coverage. We introduce PlasMAAG (Plasmid and organism Metagenomic binning using Assembly Alignment Graphs), a framework that recovers plasmids and organisms from metagenomic samples by combining what we call "assembly-alignment graphs" with common binning features. On synthetic benchmark datasets, PlasMAAG reconstructed 50–121% more near-complete plasmids than competing methods and improved the Matthews Correlation Coefficient of geNomad contig classification by 28–106%. On hospital sewage samples, PlasMAAG outperformed all other methods, reconstructing 33% more plasmid sequences. PlasMAAG enables the study of organism-plasmid associations and intra-plasmid diversity across samples, offering state-of-the-art plasmid reconstruction with reduced computational costs.
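The Matthews correlation coefficient used above summarizes a binary confusion matrix (here, plasmid versus non-plasmid contigs) as a single value between -1 and 1, and it stays informative when one class, such as plasmid contigs, is much rarer than the other. A minimal implementation of the standard formula follows; returning 0 when a marginal total is zero is a common convention, not something the abstract specifies, and the counts in the example are hypothetical.

```python
import math

def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical contig counts: 40 plasmid contigs, 960 chromosomal contigs.
print(round(matthews_corrcoef(tp=30, tn=950, fp=10, fn=10), 2))  # 0.74
```

Plain accuracy on the same counts would be 0.98, which illustrates why MCC is the more telling metric for imbalanced contig classes.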

14:00-14:20
Proceedings Presentation: Predicting coarse-grained representations of biogeochemical cycles from metabarcoding data
Format: In person


Authors List:

  • Arnaud Belcour, Univ. Grenoble Alpes, Inria, France
  • Loris Megy, Gricad, Inria, CNRS, Université Grenoble Alpes, Grenoble INP, France
  • Sylvain Stephant, French Geological Survey (BRGM), France
  • Caroline Michel, French Geological Survey (BRGM), France
  • Sétareh Rad, French Geological Survey (BRGM), France
  • Petra Bombach, Isodetect GmbH, Germany
  • Nicole Dopffel, NORCE Norwegian Research Center AS, Norway
  • Hidde de Jong, Univ. Grenoble Alpes, Inria, France
  • Delphine Ropers, Univ. Grenoble Alpes, Inria, France

Presentation Overview:

Motivation: Taxonomic analysis of environmental microbial communities is now routinely performed thanks to advances in DNA sequencing. Determining the role of these communities in global biogeochemical cycles requires the identification of their metabolic functions, such as hydrogen oxidation, sulfur reduction, and carbon fixation. These functions can be directly inferred from metagenomics data, but in many environmental applications metabarcoding is still the method of choice. The reconstruction of metabolic functions from metabarcoding data and their integration into coarse-grained representations of biogeochemical cycles remains a difficult bioinformatics problem today.

Results: We developed a pipeline, called Tabigecy, which exploits taxonomic affiliations to predict the metabolic functions that constitute biogeochemical cycles. In a first step, Tabigecy uses the tool EsMeCaTa to predict consensus proteomes from input affiliations. To optimise this process, we generated a precomputed database containing information about 2,404 taxa from UniProt. The consensus proteomes are searched using bigecyhmm, a newly developed Python package relying on hidden Markov models to identify key enzymes involved in the metabolic functions of biogeochemical cycles. The metabolic functions are then projected onto coarse-grained representations of the cycles. We applied Tabigecy to two salt cavern datasets and validated its predictions with microbial activity and hydrochemistry measurements performed on the samples. The results highlight the utility of the approach for investigating the impact of microbial communities on biogeochemical processes.

Availability: The Tabigecy pipeline is available at https://github.com/ArnaudBelcour/tabigecy.
The Python package bigecyhmm and the precomputed EsMeCaTa database are also separately available at https://github.com/ArnaudBelcour/bigecyhmm and https://doi.org/10.5281/zenodo.13354073, respectively.
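The final projection step, from detected key enzymes to the functions drawn on a cycle diagram, amounts to evaluating presence rules per taxon and then aggregating by abundance. The sketch below assumes the HMM search has already produced a set of detected enzyme names per taxon. The rule table and marker genes (dsrA/dsrB for sulfate reduction, rbcL/prk for Calvin-cycle carbon fixation, hyaB or hybC for hydrogen oxidation) are illustrative choices, not bigecyhmm's actual definitions, and the taxon names are hypothetical.

```python
from typing import Dict, List, Set

# Illustrative rules: a function is present if ALL enzymes of ANY alternative set are detected.
FUNCTION_RULES: Dict[str, List[Set[str]]] = {
    "sulfate reduction": [{"dsrA", "dsrB"}],
    "carbon fixation (Calvin cycle)": [{"rbcL", "prk"}],
    "hydrogen oxidation": [{"hyaB"}, {"hybC"}],
}

def project_functions(detected: Dict[str, Set[str]],
                      rules: Dict[str, List[Set[str]]] = FUNCTION_RULES) -> Dict[str, Set[str]]:
    """Map each taxon's detected enzymes to the cycle functions it can carry out."""
    return {
        taxon: {fn for fn, alternatives in rules.items()
                if any(required <= enzymes for required in alternatives)}
        for taxon, enzymes in detected.items()
    }

def function_abundance(functions_by_taxon: Dict[str, Set[str]], abundance: Dict[str, float],
                       rules: Dict[str, List[Set[str]]] = FUNCTION_RULES) -> Dict[str, float]:
    """Sum the relative abundance of the taxa able to perform each function."""
    totals = {fn: 0.0 for fn in rules}
    for taxon, functions in functions_by_taxon.items():
        for fn in functions:
            totals[fn] += abundance.get(taxon, 0.0)
    return totals

functions = project_functions({"taxon_1": {"dsrA", "dsrB", "hyaB"}, "taxon_2": {"rbcL"}})
print(function_abundance(functions, {"taxon_1": 0.6, "taxon_2": 0.4}))
```

Here taxon_2 carries RuBisCO but no detected phosphoribulokinase, so under these rules it contributes nothing to carbon fixation; requiring complete enzyme sets is what keeps a single stray hit from switching on a whole function.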

14:20-14:30
CroCoDeEL: accurate control-free detection of cross-sample contamination in metagenomic data
Format: In person


Authors List:

  • Lindsay Goulet, Université Paris-Saclay, INRAE, MGP MetaGenoPolis, Jouy-en-Josas, 78350, France
  • Florian Plaza Oñate, Université Paris-Saclay, INRAE, MGP MetaGenoPolis, Jouy-en-Josas, 78350, France
  • Pauline Barbet, Université Paris-Saclay, INRAE, MGP MetaGenoPolis, Jouy-en-Josas, 78350, France
  • Alexandre Famechon, Université Paris-Saclay, INRAE, MGP MetaGenoPolis, Jouy-en-Josas, 78350, France
  • Benoît Quinquis, Université Paris-Saclay, INRAE, MGP MetaGenoPolis, Jouy-en-Josas, 78350, France
  • Eugeni Belda, UMMISCO, Sorbonne Université, IRD, F-93143, Bondy, France
  • Edi Prifti, UMMISCO, Sorbonne Université, IRD, F-93143, Bondy, France
  • Emmanuelle Le Chatelier, Université Paris-Saclay, INRAE, MGP MetaGenoPolis, Jouy-en-Josas, 78350, France
  • Guillaume Gautreau, Université Paris-Saclay, INRAE, MaIAGE, 78350, Jouy-en-Josas, France

Presentation Overview:

Metagenomic sequencing provides profound insights into microbial communities, but it is often compromised by technical biases, including cross-sample contamination. This phenomenon arises when microbial content is inadvertently exchanged among concurrently processed samples, distorting microbial profiles and compromising the reliability of metagenomic data and downstream analyses.
Existing detection methods often rely on negative controls, which are inconvenient and do not detect contamination within real samples. Meanwhile, strain-level bioinformatics approaches fail to distinguish contamination from natural strain sharing and lack sensitivity.
To fill this gap, we introduce CroCoDeEL, a decision-support tool for detecting and quantifying cross-sample contamination. Leveraging linear modeling and a pre-trained supervised model, CroCoDeEL identifies specific contamination patterns in species abundance profiles. It requires no negative controls or prior knowledge of sample processing positions, offering improved accuracy and versatility.
Benchmarks across three public datasets demonstrate that CroCoDeEL accurately detects contaminated samples and identifies their contamination sources, even at low rates (<0.1%), provided that sequencing depth is sufficient. Notably, we discovered critical contamination cases in highly cited studies, calling some of their results into question. Our findings suggest that cross-sample contamination is a widespread yet underexplored issue in metagenomics and emphasize the necessity of systematically integrating contamination detection into sequencing quality control. Future work will focus on developing an innovative approach to remove the contamination signal detected by CroCoDeEL.
CroCoDeEL is freely available at https://github.com/metagenopolis/CroCoDeEL.

Reference
Goulet, L. et al. "CroCoDeEL: accurate control-free detection of cross-sample contamination in metagenomic data." bioRxiv (2025). https://doi.org/10.1101/2025.01.15.633153.
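The contamination pattern described above can be motivated with a simple linear model: if a fraction r of sample B's reads comes from sample A, then species carried over from A appear in B at roughly r times their abundance in A, that is, on a line of slope 1 and intercept log10(r) when B is plotted against A on log scales. The sketch below estimates r from that offset using the median over shared species. It is a hypothetical illustration with an invented function name: it omits CroCoDeEL's pre-trained model, which is what separates this pattern from natural strain sharing.

```python
import math
import statistics
from typing import Dict

def estimate_contamination_rate(source: Dict[str, float], target: Dict[str, float]) -> float:
    """Fit log10(target) = log10(source) + log10(r) with the slope fixed at 1,
    using the median offset over species present in both samples."""
    offsets = [
        math.log10(target[s]) - math.log10(source[s])
        for s in source
        if source[s] > 0 and target.get(s, 0.0) > 0
    ]
    if not offsets:
        raise ValueError("no shared species to fit")
    return 10 ** statistics.median(offsets)

# Toy profiles: the target holds its own dominant species plus 0.2% carried over from the source.
source = {"sp_a": 0.4, "sp_b": 0.3, "sp_c": 0.2, "sp_d": 0.1}
target = {"own_sp": 0.998, **{s: 0.002 * a for s, a in source.items()}}
print(f"{estimate_contamination_rate(source, target):.4f}")  # 0.0020
```

The median keeps the estimate robust to a few species that both samples genuinely share, but when many species are shared naturally this simple fit breaks down, which is where a trained classifier becomes necessary.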

14:30-14:40
Longflow: A comprehensive end-to-end solution for long-read metagenomics
Format: In person


Authors List:

  • Sebastien Raguideau, Earlham Institute, United Kingdom

Presentation Overview:

Transitioning from short-read to long-read sequencing in metagenomics requires methodological refinements. We present Longflow, a versatile pipeline tailored for long-read data, supporting analysis from raw FASTQ/BAM files to annotated metagenome-assembled genomes (MAGs). Built with Snakemake and containerised for reproducibility, Longflow is robust and easily deployed on HPC systems.

Longflow enables flexible analysis, including per-sample or co-assembly schemes, and co-binning, which leverages samples not included in the assembly to enhance binning performance. It integrates tools for taxonomy (e.g., SILVA, NR), functional annotation (KEGG, InterProScan), viral detection (geNomad), and SNV calling (Longshot). MAGs are curated using a consensus approach across multiple binners and classified with GTDB-Tk.

To address the issue of chimeric contigs, particularly problematic in long-read assemblies due to larger contig sizes, we created a visualisation tool to detect these artefacts and implemented a fragmentation heuristic, thus improving MAG recovery and removing one source of contamination.

Longflow also facilitates the incorporation of short-read data for co-binning. We improved read assignment and overall binning results by using a novel k-mer coverage estimation method to handle ambiguous mappings.

Longflow is a reliable and flexible tool for contemporary metagenomic research, and it is constantly being developed and maintained to increase its functionality.

14:40-14:50
Long-read metagenome-assembled genomes can be of higher quality than reference genomes: the case of the Shanghai pet dog microbiome catalog
Format: In person


Authors List:

  • Anna Cuscó, Fudan University, China
  • Yiqian Duan, Fudan University, China
  • Fernando Gil, Fudan University, China
  • Shaojun Pan, Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  • Nithya Kruthi, Queensland University of Technology, Australia
  • Alexei Chklovski, Queensland University of Technology, Australia
  • Xing-Ming Zhao, Tongji University, China
  • Luis Pedro Coelho, Queensland University of Technology, Australia

Presentation Overview:

We present a comprehensive analysis of the gut microbiome of 50 pet dogs living in Shanghai (China). Both long-read and short-read sequencing methods were employed to deeply sequence fecal samples, enabling high-quality metagenome-assembled genome (MAG) recovery. Polishing long-read assemblies with short reads notably improved MAG quality, particularly for genomes with lower sequencing coverage.

The final MAG collection comprises 2,676 MAGs (72% high-quality), representing 320 bacterial species, and captures global microbial diversity, evidenced by high read mapping rates (>90%) to external datasets from multiple countries. The predominant phyla were Bacillota, Bacteroidota, and Fusobacteriota.

Many of the resulting MAGs are of higher quality than the reference genomes available for the same species. In particular, our MAGs more consistently contain ribosomal genes, tRNAs, and mobilome-associated genes, all of which are known to be difficult to recover with short reads (even when sequencing isolates).

Extrachromosomal elements (e.g., plasmids or viruses) are another blind spot when using short reads. We recovered 185 circular elements (58 plasmids, 30 viruses, and 97 elements that could not be confidently assigned). Several of these carry antibiotic resistance genes, including beta-lactamases.

One-third of identified bacterial species were novel, particularly within genera such as CAG-269 and Dysosmobacter. Additionally, this study demonstrated clear microbiome differences between pet dogs and colony-living dogs, the latter showing higher microbial diversity and higher abundance of probiotic-associated species.

Overall, this study provides the best known resource for pet dog microbiome studies and demonstrates the value of hybrid sequencing to build the highest quality resources.
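The abstract reports 72% of MAGs as high-quality without stating the thresholds. A widely used convention is the MIMAG standard, under which a high-quality draft needs more than 90% completeness, less than 5% contamination, the 23S, 16S and 5S rRNA genes, and at least 18 tRNAs; whether this study applied exactly these criteria is an assumption. The sketch shows why recovering ribosomal genes and tRNAs matters: a near-complete MAG missing part of its rRNA operon drops to medium quality.

```python
from dataclasses import dataclass
from typing import FrozenSet

REQUIRED_RRNAS = frozenset({"5S", "16S", "23S"})

@dataclass(frozen=True)
class MAG:
    completeness: float         # percent, e.g. as estimated by CheckM2
    contamination: float        # percent
    rrna_genes: FrozenSet[str]  # rRNA genes detected in the bin
    n_trnas: int                # number of distinct tRNAs detected

def mimag_tier(mag: MAG) -> str:
    """Assign a MIMAG-style quality tier (thresholds assumed, see text)."""
    if (mag.completeness > 90 and mag.contamination < 5
            and REQUIRED_RRNAS <= mag.rrna_genes and mag.n_trnas >= 18):
        return "high"
    if mag.completeness >= 50 and mag.contamination < 10:
        return "medium"
    if mag.contamination < 10:
        return "low"
    return "unclassified"  # 10% contamination or more falls outside the MIMAG tiers

print(mimag_tier(MAG(98.0, 1.0, REQUIRED_RRNAS, 20)))             # high
print(mimag_tier(MAG(98.0, 1.0, frozenset({"5S", "23S"}), 20)))   # medium
```

Because short-read assemblies often collapse or break multi-copy rRNA operons, completeness and contamination alone can look excellent while the MAG still fails the high-quality tier.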

14:50-15:00
Use of Long-Read SMRT PacBio Sequencing for Detailed Genomic and Epigenetic Studies of Complex Microbial Communities in the Wheat Rhizosphere to Abiotic Stress
Format: In person


Authors List:

  • Siphiwe Maseko, Centre for Bioinformatics and Computational Biology (CBCB), BGM, University of Pretoria, South Africa
  • Nwabisa Ngwentle, Centre for Microbial Ecology and Genomics (CMEG), BGM, University of Pretoria, South Africa
  • Teresa Coutinho, Centre for Microbial Ecology and Genomics (CMEG), BGM, University of Pretoria, South Africa
  • Ngwekazi Mehlomakulu, Dep. Consumer and Food Sciences, University of Pretoria, South Africa
  • Oleg Reva, Centre for Bioinformatics and Computational Biology (CBCB), BGM, University of Pretoria, South Africa

Presentation Overview:

The wheat rhizosphere harbours complex microbial communities essential for plant health and soil fertility. Traditional sequencing reveals microbial diversity but often misses genomic and epigenetic interactions. Here, long-read SMRT PacBio sequencing was applied to a wheat field in South Africa (34.08551°S, 20.26628°E) to profile microbial communities across varying environmental conditions from August to November 2023, a period spanning heavy rains to extreme drought. This approach enabled a detailed reconstruction of microbial interactions and the identification of key taxa influencing soil fertility. Network analysis revealed species-specific associations shaping the microbial community. Epigenetic analysis of metagenome-assembled contigs demonstrated that Pseudomonas fluorescens, Flavobacterium pectinovorum, and Flavobacterium aquicola thrived in wet conditions but suffered during drought, as evidenced by an increase in oxidized guanine residues in their genomes under unfavourable conditions. Conversely, Amycolatopsis camponoti and some uncultured Alphaproteobacteria and Actinomycetota struggled in floods but flourished in arid conditions. These findings demonstrate the varying responses of rhizobacterial community members to environmental stressors, highlighting the need for strategic selection of the beneficial bacteria used in agro-biopreparations. Selecting microbial inoculants based on their optimal environmental conditions can enhance their efficacy in improving soil fertility and crop resilience. Long-read SMRT sequencing enables species-level identification and detailed genomic and epigenetic insights that were previously unattainable. Additionally, novel computational tools were developed for modelling microbial networks and predicting the distribution of oxidized guanine along metagenome-assembled contigs.
This study was conducted for the TRIBIOME Project (https://www.tribiome.eu/) and funded by the Horizon Europe research and innovation program (grant Nº 101084485).

15:00-15:10
proMGEflow: recombinase-based detection of mobile genetic elements in bacterial (meta)genomes
Format: In person


Authors List:

  • Anastasiia Grekova, EMBL Heidelberg, Technical University of Munich, Germany
  • Supriya Khedkar, BioQuant, Heidelberg University, Germany
  • Christian Schudoma, EMBL Heidelberg, Germany
  • Chan Yeong Kim, European Molecular Biology Laboratory, Germany
  • Daniel Podlesny, EMBL Heidelberg, Germany
  • Anthony Noel Fullam, EMBL Heidelberg, Germany
  • Jonas Richter, EMBL Heidelberg, Germany
  • Thomas Sebastian Schmidt, University College Cork, Ireland
  • Daniel Mende, Keio University, Tokyo, Japan
  • Suguru Nishijima, University of Tokyo, Japan
  • Askarbek Orakov, Harvard T.H. Chan School of Public Health, United States
  • Michael Kuhn, EMBL Heidelberg, Germany
  • Ivica Letunic, EMBL Heidelberg, Germany
  • Peer Bork, EMBL Heidelberg, Germany

Presentation Overview:

Mobile Genetic Elements (MGEs) are drivers of bacterial adaptation and can increase the fitness of microbial communities in changing environments. Yet the identification of MGEs remains challenging due to the fuzzy boundaries between MGE types and the incompleteness of metagenome-assembled genomes (MAGs). Here we present proMGEflow, a Nextflow pipeline designed to annotate complete genomes and MAGs with discrete MGE categories: plasmids, integrons, phages and transposable elements. In contrast to other tools, proMGEflow takes a top-down approach that harmonizes the annotation of all MGE types in one go from a given (meta)genome. Our pipeline uses subfamilies of recombinases as universal MGE markers, together with MGE type-specific mobility machinery, e.g. structural phage genes, for fine-grained assignment. MGE boundary estimation is based on the joint bacterial species pangenome built from the MAG and the species cluster of high-quality reference genomes in the ProGenomes3 database. By decoupling the MGE boundary determination step into the Python package MGExpose, we can further annotate MGEs in user-provided genomic regions through rule-based classification of their machinery and recombinases. In total, we applied proMGEflow to around 200,000 MAGs from the Searchable Planetary-scale mIcrobiome REsource (SPIRE). This not only resulted in the discovery of around 3 million MGEs of different types but also provided the first functional insights into a global environmental mobilome. The availability of scalable and reproducible pipelines for unified MGE annotation from metagenomes will improve our understanding of the mechanisms of gene mobility as well as of the cross-talk between MGEs and their prokaryotic hosts.

15:10-15:20
Extracting host-specific developmental signatures from longitudinal microbiome data
Format: In person


Authors List:

  • Balazs Erdos, Simula Metropolitan Center for Digital Engineering, Norway
  • Christos Chatzis, Simula Metropolitan Center for Digital Engineering, Norway
  • Jonathan Thorsen, COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Denmark
  • Jakob Stokholm, COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Denmark
  • Age K. Smilde, Biosystems Data Analysis, University of Amsterdam, Netherlands
  • Morten A. Rasmussen, COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Denmark
  • Evrim Acar, Simula Metropolitan Center for Digital Engineering, Norway

Presentation Overview:

Longitudinal microbiome studies offer critical insights into microbial community dynamics, helping to distinguish true biological signals from interindividual variability. Tensor decompositions, such as CANDECOMP/PARAFAC (CP), have been applied to analyze longitudinal microbiome data by arranging temporal measurements as a third-order tensor with modes representing taxa, time, and hosts. While these methods have proven useful in revealing the underlying structures in such data, they are limited in their ability to capture host-specific microbial dynamics, such as individually accelerated or delayed developmental phenomena. To address this limitation, we use the PARAFAC2 model, a more flexible tensor model that can account for host-specific differences in the temporal trajectories of microbial communities. We analyze longitudinal microbiome data from the COPSAC2010 (Copenhagen Prospective Studies on Asthma in Childhood) cohort, tracking gut microbiome maturation in children over their first six years of life, along with data from the FARMM (Food and Resulting Microbial Metabolites) study, examining dietary effects before and after microbiota depletion. We show that both CP and PARAFAC2 decompositions reveal meaningful microbial signatures, including compositional shifts associated with birth mode, presence of older siblings, and dietary interventions. However, while CP captures the main microbial trends over time, PARAFAC2 uncovers host- and subgroup-specific developmental trajectories, offering a more nuanced view of microbiome maturation and highlighting its potential to enhance longitudinal microbiome data analysis. In addition, we discuss the interpretability of the extracted patterns, facilitated by the uniqueness properties of CP and PARAFAC2, as well as potential challenges related to the generalization of these patterns, which we assess through the concept of replicability.
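Written per host, the difference between the two models is compact. Below, X_k is the time-by-taxa abundance matrix of host k (one slice of the taxa × time × hosts tensor), A (taxa × R) holds taxa loadings shared by all hosts, and D_k is a diagonal matrix of host k's weights on the R components. The notation follows common tensor-decomposition conventions and may differ from the authors' own.

```latex
\begin{aligned}
\text{CP:}\quad & \mathbf{X}_k \approx \mathbf{B}\,\mathbf{D}_k\,\mathbf{A}^{\top}, \qquad k = 1,\dots,K,\\
\text{PARAFAC2:}\quad & \mathbf{X}_k \approx \mathbf{B}_k\,\mathbf{D}_k\,\mathbf{A}^{\top}
  \quad\text{subject to}\quad \mathbf{B}_k^{\top}\mathbf{B}_k = \boldsymbol{\Phi}\ \text{for all } k.
\end{aligned}
```

In CP a single time-mode matrix B is shared by every host, so all hosts follow the same temporal profiles up to scaling. PARAFAC2 gives each host its own B_k (which may even have a different number of time points) while requiring the cross-product B_k^T B_k to be identical across hosts; this constraint is what preserves the essential uniqueness of the decomposition while still allowing host-specific temporal behavior such as accelerated or delayed maturation.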

15:20-15:30
Complex SynCom inoculations to study root community assembly
Format: In person


Authors List:

  • Gijs Selten, Utrecht University, Netherlands
  • Florian Lamouche, INRAE Angers-Nantes, France
  • Adrian Gomez Repolles, Aarhus University, Denmark
  • Simona Radutoiu, Aarhus University, Denmark
  • Ronnie de Jonge, Utrecht University, Netherlands

Presentation Overview: Show

The root microbiome is a complex system composed of millions of interacting microbes, some of which have plant-beneficial traits such as priming the plant’s defenses or promoting growth. To apply these traits in sustainable agriculture, however, an understanding of root microbiome assembly, dynamics, and functioning is required. To gain this understanding, we isolated hundreds of rhizobacterial strains from Arabidopsis, barley, and Lotus plants grown in natural soil. These strains were then cultured and used to reconstitute highly complex synthetic communities (SynComs) comprising between 175 and 1,000 strains, which were subsequently inoculated onto the three hosts. After cultivation, the roots were harvested, and DNA was extracted and subjected to shotgun metagenomic sequencing to identify and quantify the SynCom members. Using the genome sequences of the bacterial strains, we examined both communal functions (i.e., bacterial functions enriched in the root microbiome) and individual traits that enhance a strain’s competitiveness. Community analyses revealed that all three hosts select for functions irrespective of taxonomic origin, enriching for functions related to amino acid and vitamin metabolism, quorum sensing, and flagellar assembly. Furthermore, metagenome-wide association analysis of the most successful strains highlighted metabolic diversity, motility, and secretion systems as key drivers of a strain’s competitiveness. Although root competence functions in rhizobacteria have been studied extensively, our dataset enables the investigation of these traits within a complex microbiome context that closely resembles natural communities. This unique feature offers new insights into how plant-microbe and microbe-microbe interactions shift across different environmental and community contexts.
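
One simple way to express "selection for functions irrespective of taxonomic origin" is to weight each function by the summed relative abundance of all strains that encode it, then compare root samples with the inoculum. The sketch below assumes strain relative abundances have already been estimated from the metagenomic reads; the strain IDs, function labels, abundances, and the pseudocount are hypothetical, and this is an illustration, not the analysis used in the study.

```python
import math

def community_function_abundance(rel_abund, functions):
    """Community-weighted abundance of each function: the summed relative
    abundance of all strains whose genomes encode it (taxonomy is ignored)."""
    totals = {}
    for strain, abundance in rel_abund.items():
        for func in functions.get(strain, ()):
            totals[func] = totals.get(func, 0.0) + abundance
    return totals

def log2_enrichment(root, inoculum, functions, pseudocount=1e-6):
    """log2 ratio of community-weighted function abundance, root vs inoculum."""
    in_root = community_function_abundance(root, functions)
    in_inoc = community_function_abundance(inoculum, functions)
    return {
        func: math.log2((in_root.get(func, 0.0) + pseudocount)
                        / (in_inoc.get(func, 0.0) + pseudocount))
        for func in set(in_root) | set(in_inoc)
    }

# Hypothetical strains, annotations, and relative abundances.
functions = {
    "S1": {"flagellar_assembly", "quorum_sensing"},
    "S2": {"flagellar_assembly"},
    "S3": {"quorum_sensing"},
    "S4": {"nitrate_reduction"},
}
inoculum = {"S1": 0.25, "S2": 0.25, "S3": 0.25, "S4": 0.25}
root = {"S1": 0.5, "S2": 0.3, "S3": 0.1, "S4": 0.1}

enrichment = log2_enrichment(root, inoculum, functions)
for func, score in sorted(enrichment.items(), key=lambda kv: -kv[1]):
    print(f"{func}: {score:+.2f}")
```

Because each function is scored by pooling every strain that carries it, two unrelated taxa sharing a function contribute to the same score, which is what makes the measure taxonomy-independent.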

15:30-15:40
Spatial and temporal variation of marine microbial interactions around the west Antarctic Peninsula
Format: In person


Authors List: Show

  • Julia C Engelmann, Royal Netherlands Institute for Sea Research (NIOZ), Netherlands
  • Swan Ls Sow, Royal Netherlands Institute for Sea Research (NIOZ), Netherlands
  • Willem H van de Poll, Faculty of Science and Engineering, University of Groningen, Netherlands
  • Rachel Eveleth, Oberlin College, Department of Geosciences, Ohio, United States
  • Jeremy J Rich, School of Marine Sciences, Darling Marine Centre, University of Maine, Maine, United States
  • Hugh W Ducklow, Department of Earth and Environmental Sciences and Lamont-Doherty Earth Observatory, Columbia University, New York, United States
  • Patrick D Rozema, Faculty of Science and Engineering, University of Groningen, Netherlands
  • Catherine M Luria, Laboratory of Systems Pharmacology, Harvard Medical School, Boston, Massachusetts, United States
  • Henk Bolhuis, Royal Netherlands Institute for Sea Research (NIOZ), Netherlands
  • Michael P Meredith, British Antarctic Survey, Cambridge, United Kingdom
  • Linda Amaral-Zettler, Royal Netherlands Institute for Sea Research (NIOZ), Netherlands

Presentation Overview: Show

The west Antarctic Peninsula (WAP) has experienced more dramatic warming due to climate change than the rest of the continent and the global average. Moreover, the northern WAP, where Palmer Station is located, has seen higher temperatures and lower sea ice extent than the south, where Rothera Research Station is located. We assessed bacterial and microbial eukaryote communities and their seasonal variation at the Palmer and Rothera time-series sites between July 2013 and April 2014 and predicted inter- and intra-domain causal effects. We found that microbial communities differed considerably between the two sites, with the differences attributed to seawater temperature and sea ice coverage in combination with sea ice type. We predicted microbial interactions with causal effect modelling, which corrects for spurious correlations and takes the direction of information flow into account (using a directed acyclic graph reconstruction approach to identify confounders). Causal effect analysis suggested that bacteria were stronger drivers of ecosystem dynamics at Palmer, while microbial eukaryotes played a stronger role at Rothera. The parasitic Syndiniales persisted at both sites across the seasons, with Palmer and Rothera harbouring different key groups. However, Syndiniales dominated the set of negative causal effects at Rothera but not at Palmer, suggesting that parasitism drives community dynamics more strongly at Rothera than at Palmer. Our research sheds light on the dynamics of microbial community composition and potential microbial interactions at two sampling locations that represent different climate regimes along the WAP.
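
The idea of using a DAG to remove spurious correlations can be illustrated with a linear toy model. Once a DAG is available, one valid choice of adjustment set for the effect of X on Y is the set of parents of X; regressing Y on X together with those parents removes associations that arise only from shared drivers. The sketch below assumes the DAG is already known, effects are linear, and the hand-built data (not Palmer or Rothera measurements) are noise-free; it is not the specific causal-effect method used in this study.

```python
def center(values):
    """Subtract the mean so regressions need no intercept term."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ols(y, predictors):
    """Least-squares slopes of y on the predictors, via the normal equations
    (X^T X) b = X^T y solved by Gauss-Jordan elimination with partial pivoting."""
    y = center(y)
    X = [center(p) for p in predictors]
    k = len(X)
    M = [[dot(X[i], X[j]) for j in range(k)] + [dot(X[i], y)] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

def causal_effect(data, dag, cause, effect):
    """Slope of `effect` on `cause`, adjusting for the parents of `cause`
    in a DAG given as {node: [parents]}; conditioning on the parents blocks
    every back-door path, so shared drivers no longer create spurious effects."""
    adjust = dag.get(cause, [])
    return ols(data[effect], [data[cause]] + [data[p] for p in adjust])[0]

# Hand-built toy data: temperature drives both taxa, and
# taxon_x has no direct effect on taxon_y.
data = {
    "temperature": [1.0, -1.0, 1.0, -1.0],
    "taxon_x": [2.0, 0.0, 0.0, -2.0],
    "taxon_y": [3.0, -3.0, 1.0, -1.0],
}
dag = {"taxon_x": ["temperature"], "taxon_y": ["temperature"]}

naive = ols(data["taxon_y"], [data["taxon_x"]])[0]
adjusted = causal_effect(data, dag, "taxon_x", "taxon_y")
print(f"naive slope: {naive:.2f}, adjusted effect: {adjusted:.2f}")
```

The naive regression reports a slope of 1.00 between the two taxa, while adjusting for their common driver (temperature) reduces the estimated effect to 0.00, which is the sense in which causal effect modelling "corrects for spurious correlations".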

15:40-15:50
Associations between Microbiome-Associated Variants and Diseases
Format: In person


Authors List: Show

  • Tess Cherlin, University of Pennsylvania, United States
  • Jagyshila Das, University of Pennsylvania, United States
  • Colleen Morse, University of Pennsylvania, United States
  • Regeneron Genetics Center, Regeneron Genetics Center, United States
  • Penn Medicine Biobank, University of Pennsylvania, United States
  • Seth Bordenstein, Pennsylvania State University, United States
  • Anurag Verma, University of Pennsylvania, United States
  • Shefali Setia-Verma, University of Pennsylvania, United States

Presentation Overview: Show

With the advent of high-throughput sequencing, studies have investigated the microbiome’s associations with diseases and genetic variants. We aimed to 1) extend the set of previously identified microbiome-associated variants (MAVs) from microbiome GWAS (mbGWAS) to include newer studies of non-European populations and 2) leverage biobank data with large sample sizes for genetically diverse population groups. We performed a phenome-wide association study (PheWAS) in the Penn Medicine Biobank (PMBB), which included 41,102 patients from two genetically inferred ancestry groups. Next, we mined MAV associations from PheWAS data in the NIH’s All of Us Biobank (n = 205,237) and the Million Veteran Program (MVP) Biobank (n = 630,969). We then meta-analyzed the MAV PheWAS results from these three datasets, both within each ancestry group and across all ancestries. We found 13 significant associations in the AFR meta-analysis (p-value ≤ 4.6e-08), 205 significant associations in the EUR meta-analysis (p-value ≤ 3.4e-08), and 122 significant associations for 25 unique MAVs in the meta-analysis across all ancestries (p-value ≤ 6.6e-08). To extend these PheWAS findings, we performed QTL and causal inference analyses on these 25 loci using the SMR portal to determine whether they showed evidence of shared genetic signals across traits. We found several significant relationships, especially for significantly associated traits such as psoriasis, venous thromboembolism, type 2 diabetes, and gout. Future work will investigate microbiome QTLs to understand the potential causal relationships between MAVs, the microbiome, and disease phenotypes. This research sets the stage for further investigations aiming to uncover the mechanisms and clinical implications of microbiome-disease associations.
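
Combining per-biobank effect estimates for the same variant-phenotype pair is often done with an inverse-variance-weighted fixed-effect meta-analysis. The abstract does not state which meta-analysis model was used, so the sketch below is only an assumed, minimal example of that standard approach; the per-biobank effect sizes and standard errors are hypothetical, and only the 6.6e-08 threshold comes from the abstract.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis of one variant-phenotype pair.
    Returns (pooled beta, pooled standard error, two-sided p-value)."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return pooled_beta, pooled_se, p_value

# Hypothetical per-biobank estimates for one MAV-phecode pair (e.g. PMBB, All of Us, MVP).
betas = [0.12, 0.09, 0.11]
ses = [0.03, 0.02, 0.025]
beta, se, p = fixed_effect_meta(betas, ses)
threshold = 6.6e-08  # cross-ancestry significance threshold quoted in the abstract
print(f"beta={beta:.3f}, se={se:.4f}, p={p:.2e}, significant={p <= threshold}")
```

Each cohort is weighted by the inverse of its squared standard error, so the largest biobanks dominate the pooled estimate; a random-effects model would be the usual alternative if effects are expected to differ between cohorts or ancestries.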

15:50-16:00
Detecting Synergistic Associations in Microbial Communities via Multi-Dimensional Feature Selection
Format: In person


Authors List: Show

  • Sajad Shahbazi, Computational Centre, University of Białystok, Poland
  • Piotr Stomma, Faculty of Computer Science, University of Białystok, Poland
  • Tara Zakerali, Computational Centre, University of Białystok, Poland
  • Balakrishnan Subramanian, Computational Centre, University of Białystok, Poland
  • Kinga Zielinska, Malopolska Centre of Biotechnology, Jagiellonian University, Poland
  • Paweł Łabaj, Malopolska Centre of Biotechnology, Jagiellonian University, Poland
  • Izabela Święcicka, Faculty of Biology, University of Białystok, Poland
  • Marek Bartoszewicz, Faculty of Biology, University of Białystok, Poland
  • Krzysztof Mnich, Computational Centre, University of Białystok, Poland
  • Witold Rudnicki, Faculty of Computer Science, University of Białystok, Poland

Presentation Overview: Show

The gut microbiome regulates host immunity, barrier function, and inflammatory processes. While many studies have identified individual taxa associated with disease, they often overlook higher-order dependencies within microbial communities. We present a methodology that combines information-theoretic feature selection and machine learning to identify taxa whose predictive relevance may depend on synergy with other community members.

We apply this framework to data from the American Gut Project, focusing on the presence or absence of self-reported food allergy. Taxonomic profiles were normalised and binarised using two thresholding strategies. After quality filtering, the dataset included samples from 1694 healthy and 1847 allergic individuals. We used the Multi-Dimensional Feature Selection (MDFS) algorithm to evaluate information gain in both univariate (1D) and pairwise (2D) settings. As a baseline, we performed U-tests to identify taxa with significantly different abundances between groups. Predictive models were built using Random Forest classifiers trained separately on features selected via each method.

MDFS outperformed the U-test in both sensitivity and robustness. Fifteen taxa were consistently selected by all methods, while 42 were recovered only by MDFS variants. The 2D analysis revealed 18 taxa that carried no predictive value alone but contributed significant information in combination with others, suggesting synergistic structure. Conventional univariate approaches would have overlooked these taxa.
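
A taxon that carries no information alone but is informative in combination with another can be shown with an XOR-style toy example. The sketch below computes a 1D information gain, H(y) - H(y | x), and a 2D gain, H(y | partner) - H(y | x, partner), on hypothetical binarised presence/absence data. It is a simplified illustration of the 1D versus 2D idea, not the MDFS implementation itself, which adds steps such as significance testing that are omitted here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def cond_entropy(y, *features):
    """H(y | features): entropy of y within each joint feature state, weighted by state frequency."""
    joint = list(zip(*features))
    n = len(y)
    return sum(
        (count / n) * entropy([label for label, s in zip(y, joint) if s == state])
        for state, count in Counter(joint).items()
    )

def gain_1d(y, x):
    """Information that x alone carries about y."""
    return entropy(y) - cond_entropy(y, x)

def gain_2d(y, x, partner):
    """Extra information x adds about y once `partner` is already known."""
    return cond_entropy(y, partner) - cond_entropy(y, x, partner)

# Hypothetical binarised presence/absence of two taxa and an allergy label (XOR pattern).
taxon_a = [0, 0, 1, 1]
taxon_b = [0, 1, 0, 1]
allergy = [0, 1, 1, 0]

print("1D gain of taxon_a:", gain_1d(allergy, taxon_a))
print("2D gain of taxon_a given taxon_b:", gain_2d(allergy, taxon_a, taxon_b))
```

Here taxon_a alone yields 0 bits about the label, yet once taxon_b is known it contributes the full 1 bit; a univariate test such as the U-test would discard it.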

The results demonstrate the utility of synergy-aware feature selection for capturing complex, non-additive associations in microbial communities. Similar patterns observed across other cohorts indicate the potential generalizability of this approach.