Schedule subject to change
All times listed are in EDT
Saturday, July 13th
10:40-11:25
Invited Presentation: Towards fully genome-resolved metagenomics
Confirmed Presenter: Christopher Quince, Earlham Institute, UK

Room: 520c
Format: Live Stream

Moderator(s): Mihai Pop


Authors List:

  • Christopher Quince, Earlham Institute, UK
  • Gaetan Benoit, Pasteur Institute, France
  • Rayan Chikhi, Pasteur Institute, France
  • Sebastien Raguideau, Earlham Institute, UK
  • Rob James, Quadram Institute, UK

Presentation Overview:

I will discuss the impact of long accurate sequencing reads generated by HiFi PacBio on the assembly of microbial genomes directly from metagenomes. I will present our recent assembler metaMDBG based on minimiser de Bruijn graphs for this application. I will also talk about the prospects for the use of the latest ONT reads which are approaching HiFi levels of accuracy. I will conclude with a discussion of Hi-C proximity ligation for metagenome binning and the linking of extra-chromosomal elements to genomes.

11:25-11:40
Effective binning of metagenomic contigs using contrastive multi-view representation learning
Confirmed Presenter: Shanfeng Zhu, Fudan University, China

Room: 520c
Format: Live Stream

Moderator(s): Mihai Pop


Authors List:

  • Ziye Wang, Fudan University, China
  • Ronghui You, Nankai University, China
  • Haitao Han, Fudan University, China
  • Wei Liu, Fudan University, China
  • Fengzhu Sun, University of Southern California, United States
  • Shanfeng Zhu, Fudan University, China

Presentation Overview:

Contig binning plays a crucial role in metagenomic data analysis by grouping contigs from the same or closely related genomes. However, existing binning methods face challenges in practical applications due to the diversity of data types and the difficulties in efficiently integrating heterogeneous information. Here, we introduce COMEBin, a binning method based on contrastive multi-view representation learning. COMEBin utilizes data augmentation to generate multiple fragments (views) of each contig and obtains high-quality embeddings of heterogeneous features (sequence coverage and k-mer distribution) through contrastive learning. Experimental results on multiple simulated and real datasets demonstrate that COMEBin outperforms state-of-the-art binning methods, particularly in recovering near-complete genomes from real environmental samples. COMEBin outperforms other binning methods remarkably when integrated into metagenomic analysis pipelines, including the recovery of potentially pathogenic antibiotic-resistant bacteria (PARB) and moderate or higher quality bins containing potential biosynthetic gene clusters (BGCs).

11:40-12:00
Proceedings Presentation: Floria: Fast and accurate strain haplotyping in metagenomes
Confirmed Presenter: Jim Shaw, University of Toronto, Canada

Room: 520c
Format: In Person

Moderator(s): Mihai Pop


Authors List:

  • Jim Shaw, University of Toronto, Canada
  • Jean-Sebastien Gounot, Genome Institute of Singapore, Singapore
  • Hanrong Chen, Genome Institute of Singapore, Singapore
  • Niranjan Nagarajan, Genome Institute of Singapore, Singapore
  • Yun William Yu, Carnegie Mellon University, United States

Presentation Overview:

Shotgun metagenomics allows for direct analysis of microbial community genetics, but scalable computational methods for the recovery of bacterial strain genomes from microbiomes remain a key challenge. We introduce Floria, a novel method designed for rapid and accurate recovery of strain haplotypes from short and long-read metagenome sequencing data, based on minimum error correction (MEC) read clustering and a strain-preserving network flow model. Floria can function as a standalone haplotyping method, outputting alleles and reads that co-occur on the same strain, as well as an end-to-end read-to-assembly pipeline (Floria-PL) for strain-level assembly. Benchmarking evaluations on synthetic metagenomes showed that Floria is > 3x faster and recovers 21% more strain content than base-level assembly methods (Strainberry), while being over an order of magnitude faster when only phasing is required. Applying Floria to a set of 109 deeply sequenced nanopore metagenomes took < 20 minutes on average per sample, and identified several species that have consistent strain heterogeneity. Applying Floria’s short-read haplotyping to a longitudinal gut metagenomics dataset revealed a dynamic multi-strain Anaerostipes hadrus community with frequent strain loss and emergence events over 636 days. With Floria, accurate haplotyping of metagenomic datasets takes mere minutes on standard workstations, paving the way for extensive strain-level metagenomic analyses.
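
As a rough illustration of the minimum error correction (MEC) objective mentioned in this abstract, the sketch below scores a set of reads against candidate haplotypes: each read is charged the number of allele mismatches to its closest haplotype, and the MEC score is the sum of those charges. The data layout (reads as position-to-allele dictionaries, haplotypes as allele lists) and all names are assumptions made for illustration; Floria's actual clustering and network flow model are not shown.

```python
def mismatches(read, haplotype):
    """Count positions where the read's allele differs from the haplotype's allele."""
    return sum(1 for pos, allele in read.items() if haplotype[pos] != allele)


def mec_score(reads, haplotypes):
    """Sum, over all reads, of the mismatches to the best-matching haplotype."""
    return sum(min(mismatches(read, hap) for hap in haplotypes) for read in reads)


# Hypothetical toy data: two haplotypes over four variant sites (alleles 0/1).
haplotypes = [[0, 0, 1, 1], [1, 1, 0, 0]]
reads = [
    {0: 0, 1: 0, 2: 1},  # matches the first haplotype exactly
    {1: 1, 2: 0, 3: 0},  # matches the second haplotype exactly
    {0: 0, 1: 1, 2: 1},  # one mismatch to the first haplotype
]
score = mec_score(reads, haplotypes)
print(score)
```

A lower score means the reads are better explained by the candidate haplotypes; phasing methods built on MEC search for haplotypes that minimize it.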

12:00-12:20
The impact of transitive annotation on the training of taxonomic classifiers
Confirmed Presenter: Mihai Pop, University of Maryland, United States

Room: 520c
Format: In Person

Moderator(s): Mihai Pop


Authors List:

  • Harihara Subrahmaniam Muralidharan, University of Maryland, United States
  • Noam Fox, University of Maryland, United States
  • Nathalie Bonin, University of Maryland, United States
  • Mihai Pop, University of Maryland, United States

Presentation Overview:

A common task in the analysis of microbial communities involves assigning taxonomic labels to the sequences derived from organisms found in the communities. Frequently, such labels are assigned using machine learning algorithms that are trained to recognize individual taxonomic groups based on training data sets that comprise sequences with known taxonomic labels. Ideally, the training data should rely on labels that are experimentally verified; formal taxonomic labels require knowledge of physical and biochemical properties of organisms that cannot be directly inferred from sequence alone. However, the labels associated with sequences in biological databases are most commonly computational predictions which themselves may rely on computationally-generated data, a process commonly referred to as “transitive annotation”. Here, we focus on taxonomic annotation using a naïve Bayes classifier developed for the annotation of 16S rRNA gene sequences: the Ribosomal Database Project (RDP) classifier. We chose this data set and classifier since they are established resources in microbial ecology; however, our general methodology and conclusions apply more broadly to any sequence-based machine-learning classifier. We demonstrate that even a few computationally-generated training data points can significantly skew the output of the classifier to the point where entire regions of the taxonomic space can be disturbed. For example, retraining the classifier with artificial sequences caused changes that spanned microbial phyla. We also discuss key factors that affect the resilience of classifiers to transitively-annotated training data, and propose best practices to avoid the artifacts of transitive annotation.

Sensitive, specific association of microbial functions with host phenotypes using Phylogenize2
Confirmed Presenter: Patrick Bradley, The Ohio State University, United States

Room: 520c
Format: In Person

Moderator(s): Mihai Pop


Authors List:

  • Kathryn Kananen, The Ohio State University, United States
  • Patrick Bradley, The Ohio State University, United States

Presentation Overview:

In metagenomics, a key challenge is to explain differences in microbial communities in terms of gene function. Many common approaches to this problem do not account for the fact that related species tend to share both genes and phenotypes. This makes them susceptible to discovering clade markers for differentially-abundant microbes, which often have weak evidence for functional relevance. We have developed a new major version, Phylogenize2, of a tool that allows researchers to link community-level changes to gene content while accounting for phylogeny. This revision leverages large, modern collections of isolate and metagenome-assembled genomes, allowing the method to be applied across a range of biomes. We have also substantially improved Phylogenize2's statistical power by combining new, microbiome-specific differential abundance methods with adaptive shrinkage. As a test case, we apply Phylogenize2 to a human cohort with liver cirrhosis, and discover a link between abundances of the Lachnospiraceae (a prevalent group of commensal Clostridia) and anaerobic oxidative stress. Our preliminary results suggest that Phylogenize2 is both more specific than linear modeling and more sensitive than competing methods. Phylogenize2 is a publicly available, open source tool that can extract specific functional information from a wide variety of environmental and host-associated microbiomes.

14:20-14:40
Proceedings Presentation: Reference-free Structural Variant Detection in Microbiomes via Long-read Co-assembly Graphs
Confirmed Presenter: Kristen Curry, Rice University, United States

Room: 520c
Format: In Person

Moderator(s): Mihai Pop


Authors List:

  • Kristen Curry, Rice University, United States
  • Feiqiao Yu, Arc Institute, United States
  • Summer Vance, University of California, Berkeley, United States
  • Santiago Segarra, Rice University, United States
  • Devaki Bhaya, Carnegie Institute for Science, United States
  • Rayan Chikhi, Institut Pasteur, Université Paris Cité, France
  • Eduardo Rocha, Institut Pasteur, Université Paris Cité, France
  • Todd Treangen, Rice University, United States

Presentation Overview:

Bacterial genome dynamics are vital for understanding the mechanisms underlying microbial adaptation, growth, and their broader impact on host phenotype. Structural variants (SVs), genomic alterations of 50 base pairs or more, play a pivotal role in driving evolutionary processes and maintaining genomic heterogeneity within bacterial populations. While SV detection in isolate genomes is relatively straightforward, metagenomes present broader challenges due to the absence of clear reference genomes and the presence of mixed strains. In response, our proposed method, rhea, forgoes reference genomes and metagenome-assembled genomes (MAGs) in favor of a single metagenome co-assembly graph constructed from all samples in a series. The log fold change in graph coverage between subsequent samples is then calculated to call SVs that are thriving or declining throughout the series. We show that rhea outperforms existing methods for SV and horizontal gene transfer (HGT) detection in two simulated mock metagenomes, an advantage that is particularly noticeable as the simulated reads diverge from reference genomes and strain diversity increases. We additionally demonstrate use cases for rhea on series metagenomic data of environmental and fermented food microbiomes to detect specific sequence alterations between successive time and temperature samples, suggesting host advantage. Our approach leverages previous work in assembly graph structural and coverage patterns to provide versatility in studying SVs across diverse and poorly characterized microbial communities for more comprehensive insights into microbial gene flux.
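
The coverage comparison described in this abstract can be sketched, under simplifying assumptions, as a log2 fold change between consecutive samples for a single graph node. The pseudocount, the threshold of 1.0, and all function and variable names below are hypothetical choices for illustration and are not taken from rhea.

```python
import math


def log_fold_changes(coverages, pseudocount=1.0):
    """Log2 fold change in coverage between each pair of consecutive samples."""
    return [
        math.log2((later + pseudocount) / (earlier + pseudocount))
        for earlier, later in zip(coverages, coverages[1:])
    ]


def classify(lfc, threshold=1.0):
    """Label a change as increasing, decreasing, or stable (threshold is an assumption)."""
    if lfc >= threshold:
        return "increasing"
    if lfc <= -threshold:
        return "decreasing"
    return "stable"


# Hypothetical coverage of one graph node across four samples in a series.
node_coverage = [3.0, 7.0, 7.0, 1.0]
changes = log_fold_changes(node_coverage)
labels = [classify(c) for c in changes]
print(list(zip(changes, labels)))
```

The pseudocount keeps the ratio defined when a node has zero coverage in one sample, which is common for sequence that appears or disappears partway through a series.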

14:40-14:55
Targeted Sequencing and Triplet Loss classification allow for microbiome-based inference of soil sample origin
Confirmed Presenter: Paweł P. Łabaj, Małopolska Centre of Biotechnology of Jagiellonian University, Poland

Room: 520c
Format: In Person

Moderator(s): Mihai Pop


Authors List:

  • Michał Kowalski, Jagiellonian University, Poland
  • Kamila Marszałek, Jagiellonian University, Poland
  • Alina Frolova, The Institute of Molecular Biology and Genetics of NASU, Ukraine
  • Kinga Herda, Jagiellonian University, Poland
  • Agata Jagiełło, Central Forensic Laboratory of the Police, Poland
  • Anna Woźniak, Central Forensic Laboratory of the Police, Poland
  • Kaja Milanowska-Zabel, Ardigen S.A., Poland
  • Rafał Płoski, Medical University of Warsaw, Poland
  • Andrzej Ossowski, Pomeranian Medical University, Poland
  • Renata Zbieć-Piekarska, Central Forensic Laboratory of the Police, Poland
  • Wojciech Branicki, Institute of Zoology and Biomedical Research of the Jagiellonian University, Poland
  • Paweł P. Łabaj, Małopolska Centre of Biotechnology of Jagiellonian University, Poland

Presentation Overview:

Microbiome characterization has been successfully applied in forensic studies. However, from the MetaSUB Consortium and CAMDA we know that the full forensic potential of environmental metagenomic data is yet to be unveiled. Thus, the aim of our SMAFT project was to develop a complete (wet-lab and dry-lab) solution for forensic laboratories in Poland.
Building on earlier experience, we first analyzed the climate and geographical properties of Poland to select 80 locations. Samples were collected from these locations in triplicate across four seasons, yielding almost 1000 samples in total. They were then profiled with WMS at about 120 million read pairs per sample. The resulting data, together with MetaSUB samples as a non-Polish negative reference, were fed to MetaGraph to obtain the set of metagenomic features (unitigs) used to distinguish between the respective locations.
We ultimately identified 1015 All Relevant Features. In the wet-lab part, these were used to design probes for a Targeted Metagenomics Sequencing panel; in the dry-lab part, they were the data source for the classification/prediction model.
For sample origin prediction, a Triplet Loss based solution was used to reduce the dimensionality of the metagenomic profiles, followed by a Deep Neural Network to obtain probabilities of the origin of the sample. The overall performance was: LRAP: 0.87; Weighted F1: 0.86; Balanced Accuracy: 0.94 on all validation samples (n=547).
Our work has confirmed that targeted microbiome sequencing of soil samples, together with appropriate data science and AI processing, allows for accurate sample origin classification even in a country as climate-homogeneous as Poland.

14:55-15:10
Integration and analysis of 168,000 human gut microbiome samples
Confirmed Presenter: Samantha Graham, University of Minnesota, United States

Room: 520c
Format: In Person

Moderator(s): Mihai Pop


Authors List:

  • Samantha Graham, University of Minnesota, United States
  • Richard Abdill, University of Chicago, United States
  • Vincent Rubinetti, University of Colorado School of Medicine, United States
  • Frank Albert, University of Minnesota, United States
  • Casey Greene, University of Colorado School of Medicine, United States
  • Sean Davis, University of Colorado School of Medicine, United States
  • Ran Blekhman, University of Chicago, United States

Presentation Overview:

The human microbiome, or the collection of microorganisms within the human body, plays an important role in modulating human health and disease. While some health-relevant patterns may only be discoverable with thousands of samples, the average gut microbiome study contains fewer than 100 samples; fortunately, hundreds of thousands of samples are available via public repositories such as NCBI SRA. Here, we leverage this publicly available data by uniformly processing and integrating 168,464 16S rRNA gene sequencing human gut microbiome samples. This resource, the Human Microbiome Compendium, is freely available via our website (MicroBioMap.org) and R package (MicroBioMap). We use this dataset to characterize global patterns of variation in microbiome composition.
We classified our samples into eight world regions to investigate patterns of variation, and found distinct differences in microbiome composition between regions. We sought to understand specific microbes driving these differences and found that the 65 most abundant genera were differentially abundant between at least one pair of regions. We also note the disparity in sampling between world regions. Over 90,000 of the samples originate from Europe and Northern America, while we identified only 4 samples from Oceania. We show that some regions likely have many taxa that remain undiscovered due to undersampling.
Here, we present a new, large-scale collection of human gut microbiome data, which we use to study global microbiome variation. We expect this compendium will be a valuable resource for the community and enable novel insights into the microbial ecology of the human gut.

15:10-15:30
MC-Funcformer: A foundational model of microbial community metabolism
Confirmed Presenter: Ananthan Nambiar, University of Illinois Urbana-Champaign, United States

Room: 520c
Format: In Person

Moderator(s): Mihai Pop


Authors List:

  • Ananthan Nambiar, University of Illinois Urbana-Champaign, United States
  • John Forsyth, University of Illinois Urbana-Champaign, United States
  • Zihan Wang, University of Illinois Urbana-Champaign, United States
  • Veronika Dubinkina, Gladstone Institute for Data Science and Biotechnology, United States
  • Sergei Maslov, University of Illinois Urbana-Champaign, United States

Presentation Overview:

Microbial communities are remarkably complex and encompass an incredible diversity of bacteria, archaea, fungi, and viruses. The metabolic functions of individual community members are at the core of this complexity. Despite the immense taxonomic diversity of microorganisms, there is a notable degree of functional redundancy and universality across taxa. Here, we leverage the recent advances in large language models (LLMs) and a large database of microbial community profiles across a wide range of environments to learn the principles of functional universality. In particular, we introduce MC-Funcformer, a novel approach that pretrains a language model on information extracted from microbial functional abundance data. By using functional information in lieu of taxonomic information, we are able to represent microbial community profiles derived from diverse environments. Our findings highlight the utility of MC-Funcformer in predicting various metadata associated with microbial communities, including host phenotypes and environmental properties. Notably, the embeddings generated by MC-Funcformer outperform traditional approaches based solely on functional abundance vectors, improving predictions of host diseases, diet, and environmental parameters. Furthermore, our analysis reveals the universal nature of the embeddings, enabling generalization across different microbiomes and prediction tasks.

Assessing Microbial Genome Representation Across Various Reference Databases: A Comprehensive Evaluation
Confirmed Presenter: Serghei Mangul, University of Southern California, United States

Room: 520c
Format: In Person

Moderator(s): Mihai Pop


Authors List:

  • Grigore Boldirev, Georgia State University, United States
  • Nitesh Sharma, University of Southern California, United States
  • Viorel Munteanu, Technical University of Moldova, Moldova
  • Arun Bhavatharini, University of Southern California, United States
  • David Koslicki, Pennsylvania State University, United States
  • Alex Zelikovsky, Georgia State University, United States
  • Serghei Mangul, University of Southern California, United States

Presentation Overview:

Metagenomics research can provide significant insights into the composition, diversity and functions of mixed microbial communities found in various environments. To identify bacterial species, reads from samples are mapped to references that are found in bacterial reference databases. Multiple references may be assigned the same taxonomic identifiers, yet these references may contain different genomic information. This project was designed to uncover and correct inconsistencies in bacterial reference databases by comparing species names and genomic representation for the two most commonly used bacterial reference databases, PATRIC and RefSeq. NCBI’s taxonomic identifiers were utilized to assess the agreement of reference databases at the species rank. The same species was identified across two databases by finding the same taxID in both. Comparison of genomic representation across databases was performed using the BLAST tool. After finding the exact same strain, all the contigs from one database were aligned to all contigs from another. This analysis was extended to all overlapping species where strain information was available. The results of the study revealed substantial discrepancies across the databases in the presence of bacterial species. 12.4% of species are present in all three databases, 29.49% are found only in two databases, and 58.46% are found only in one database. To compare genomic representation, we visualized gathered data on all observed alignment cases, showing that the quality of reference genomes can be improved through consolidation of contigs. This evaluation is a fundamental step towards creating a comprehensive reference database that will substantially improve the accuracy of metagenomics research.

15:30-16:00
A rigorous benchmarking of methods for SARS-CoV-2 lineage abundance estimation in wastewater
Confirmed Presenter: Victor Gordeev, Department of Computers, Informatics, and Microelectronics, Technical University of Moldova, Moldova

Room: 520c
Format: Live Stream

Moderator(s): Mihai Pop


Authors List:

  • Victor Gordeev, Department of Computers, Informatics, and Microelectronics, Technical University of Moldova, Moldova
  • Viorel Munteanu, Department of Computers, Informatics, and Microelectronics, Technical University of Moldova, Moldova
  • Shelesh Agrawal, Department of Civil and Environmental Engineering Sciences, Technical University of Darmstadt, Germany
  • Martin Hölzer, Genome Competence Center (MF1), Method Development and Research Infrastructure, Robert Koch Institute, Germany
  • Adam Smith, Astani Department of Civil and Environmental Engineering University of Southern California, United States
  • Dumitru Ciorba, Department of Computers, Informatics, and Microelectronics, Technical University of Moldova, Moldova
  • Serghei Mangul, Titus Family Department of Clinical Pharmacy, University of Southern California, United States

Presentation Overview:

Wastewater genomic surveillance of SARS-CoV-2 has emerged as a scalable, cost-effective, passive surveillance tool to monitor viral variants circulating in the human population. However, accurate estimation of viral lineage prevalence in communities relies on the performance of computational methods for analyzing wastewater sequencing data. We perform a comprehensive benchmarking of bioinformatics methods designed for estimating the relative abundance of SARS-CoV-2 (sub)lineages from wastewater sequencing data, along with RNA-Seq and metagenomics methods repurposed for this task. We systematically compare the accuracy of these computational methods in estimating the relative abundances of the (sub)lineages present in a sample, including closely related and low-abundance (sub)lineages. Our preliminary results on simulated data and a few computational methods show that RNA-Seq methods RSEM (most accurate), Kallisto, and Salmon consistently achieve lower L1 errors for lineages and particularly for sublineages when compared to wastewater-surveillance methods Alcov and PiGx. In particular, while the distribution of absolute errors for lineages is similar, for sublineages, roughly 80% of the absolute error values for RSEM, Kallisto, and Salmon are lower than 0.04%, compared to roughly 75% for Alcov and only 25% for PiGx. In addition to extensive simulated data, we will use in vitro mixtures of (sub)lineages of various complexity prepared from synthetic RNA genomes or inactivated viral particles and sequenced using short-read and long-read technologies. Using different experimental strategies, we will also investigate how the performance of these computational methods is impacted by the wastewater matrix or wastewater nucleic acid background, but also by the design of the sequencing experiment. 
Our study will inform the selection of the most accurate, robust, and sensitive methods for SARS-CoV-2 lineage prevalence estimation to enable effective wastewater-based genomic surveillance.
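
For readers unfamiliar with the metric, one common way to define the L1 error between true and estimated relative abundances is the sum of absolute differences over all lineages. The sketch below assumes that definition; the abstract does not spell out the exact formula used in the benchmark, and the abundance values are invented for illustration.

```python
def l1_error(truth, estimate):
    """Sum of absolute differences in relative abundance over all lineages.

    A lineage missing from either dictionary is treated as having abundance 0.
    """
    lineages = set(truth) | set(estimate)
    return sum(abs(truth.get(name, 0.0) - estimate.get(name, 0.0)) for name in lineages)


# Hypothetical sample: true mixture versus a method's estimate.
truth = {"BA.1": 0.5, "BA.2": 0.25, "XBB.1.5": 0.25}
estimate = {"BA.1": 0.5, "BA.2": 0.5}
error = l1_error(truth, estimate)
print(error)
```

Treating absent lineages as zero means that both missed lineages and falsely reported ones contribute to the error.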

MetaViz: Realistic assortment of novel metagenomics benchmarks with diverse biological and technological characteristics
Confirmed Presenter: Nitesh Kumar Sharma, Department of Clinical Pharmacy, University of Southern California, Los Angeles, CA 90089, United States

Room: 520c
Format: Live Stream

Moderator(s): Mihai Pop


Authors List:

  • Nitesh Kumar Sharma, Department of Clinical Pharmacy, University of Southern California, Los Angeles, CA 90089, United States
  • Karishma Chhugani, Department of Clinical Pharmacy, University of Southern California, Los Angeles, CA 90089, United States
  • Viorel Munteanu, Department of Computers, Informatics and Microelectronics, Technical University of Moldova, Chisinau, 2045, Moldova, Moldova
  • Pavel Skums, School of Computing, University of Connecticut, 371 Fairfield Way, Storrs, 06269, CT, USA, United States
  • Alex Zelikovsky, Department of Computer Science, Georgia State University, Atlanta, GA, USA, United States
  • Serghei Mangul, Department of Clinical Pharmacy, University of Southern California, Los Angeles, CA 90089, United States

Presentation Overview:

Metagenomics research relies heavily on bioinformatics methods for analyzing complex microbial communities, necessitating rigorous validation through benchmarking. However, creating high-quality experimental benchmarks can be costly and challenging. Current benchmarking efforts often rely on limited gold-standard samples or synthetic data, hindering comprehensive evaluations. To address this, we propose MetaViz, a tool for generating semi-real novel metagenomics benchmarks through in silico modification of existing experimental data. MetaViz offers a cost-effective alternative, combining elements of real data with simulated modifications, surpassing the limitations of purely simulated datasets. Our tool allows precise control over sample composition, diversity, and technological characteristics, enhancing benchmarking accuracy and applicability. We applied MetaViz to over 27 real metagenomics benchmarks, including in-vitro viral mock communities and intra-host clinical samples. Our tool allowed us to precisely control the composition and the abundance of microbial genomes in the in-vitro mixtures (mock community). We were also able to adjust their relative abundance with varying frequency ranging from 0.1% to 10%. Leveraging reference mapping, we introduced varying errors within the read data, thereby enhancing reliability. Our method introduces a novel approach to benchmarking in metagenomics, particularly valuable where traditional gold-standard creation is impractical. By capturing the complexity of actual datasets, MetaViz produces semi-real benchmarks that encompass a broader range of clinical and technological characteristics, ultimately enhancing benchmarking comprehensiveness. Adoption of our approach promises to significantly improve benchmarking studies' robustness and accuracy, advancing our understanding of microbial communities across diverse biological contexts.

Phage Host Prediction Using Novel Global-Scale Phage-Host Interaction Atlas and Genomic Language Models
Confirmed Presenter: Jonas Grove, Phase Genomics, United States

Room: 520c
Format: In Person

Moderator(s): Mihai Pop


Authors List:

  • Jonas Grove, Phase Genomics, United States
  • Samuel Bryson, Phase Genomics, United States
  • Benjamin Auch, Phase Genomics, United States
  • Bradley Nelson, Phase Genomics, United States
  • Cristiana Carpinteiro, Loka, Portugal
  • Zach Sisson, Phase Genomics, United States
  • Demi Glidden, Phase Genomics, United States
  • Emily Reister, Phase Genomics, United States
  • Ivan Liachko, Phase Genomics, United States

Presentation Overview:

Viruses, including bacteriophages and archaeal viruses, are the most abundant form of life on Earth (10^31), interacting with all life and shaping the global ecosystem. However, phage-host relationships have proven challenging to identify without culture-based experiments to generate unambiguous evidence for a phage’s presence in a given host. These experiments inherently require that all hosts are culturable, restricting the microbial diversity that can be surveyed.

Proximity ligation sequencing is a powerful metagenomic method for associating viruses with their hosts directly in native microbial communities. Proximity ligation captures, in vivo, physical interactions between the host microbial genome and the genetic material of both lytic and lysogenic phages. These linkages offer direct evidence that phage sequences were present within an intact host cell, establishing a phage-host pair without the propagation of living bacterial cells. The combination of intra-phage and phage-host signal enables us to simultaneously deconvolve viral and microbial genomes directly from metagenomes, and to assign microbial hosts to large numbers of viruses without culturing.

Our application of this technology to thousands of complex microbiome samples has yielded host assignments for hundreds of thousands of novel phage and archaeal viruses. Utilizing our expanded phage-host interaction training data, and leveraging advancements made in the field of natural language processing (NLP) and genomic large language models (LLMs), we have developed deep learning networks that model the dynamics between phages and microbial hosts at sequence-level resolution. We will report published and unpublished work highlighting the power of this approach in the field of metagenomic discovery.

16:40-17:00
Proceedings Presentation: Towards more accurate microbial source tracking via non-negative matrix factorization (NMF)
Confirmed Presenter: Yanni Sun, City University of Hong Kong, Hong Kong

Room: 520c
Format: In Person

Moderator(s): Serghei Mangul


Authors List:

  • Ziyi Huang, City University of Hong Kong, Hong Kong
  • Dehan Cai, City University of Hong Kong, Hong Kong
  • Yanni Sun, City University of Hong Kong, Hong Kong

Presentation Overview:

Motivation: The microbiome of a sampled habitat often consists of microbial communities from various sources, including potential contaminants. Microbial source tracking (MST) can be used to discern the contribution of each source to the observed microbiome data, thus enabling the identification and tracking of microbial communities within a sample. Therefore, MST has various applications, from monitoring microbial contamination in clinical labs to tracing the source of pollution in environmental samples. Despite promising results in MST development, there is still room for improvement, particularly for applications where precise quantification of each source’s contribution is critical.
Results: In this study, we introduce a novel tool called SourceID-NMF towards more precise microbial source tracking. SourceID-NMF utilizes a non-negative matrix factorization (NMF) algorithm to trace the microbial sources contributing to a target sample, without assuming specific probability distributions. By leveraging the taxa abundance in both available sources and the target sample, SourceID-NMF estimates the proportion of available sources present in the target sample. To evaluate the performance of SourceID-NMF, we conducted a series of benchmarking experiments using simulated and real data. The simulated experiments mimic realistic yet challenging scenarios for identifying highly similar sources, irrelevant sources, unknown sources, low abundance sources, and noise sources. The results demonstrate the superior accuracy of SourceID-NMF over existing methods. Particularly, SourceID-NMF accurately estimated the proportion of irrelevant and unknown sources while other tools either over- or under-estimated them. Additionally, the noise sources experiment also demonstrated the robustness of SourceID-NMF for MST.
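
To give a sense of how a non-negative factorization can recover source contributions, the sketch below fixes the source profiles and estimates only the non-negative mixing weights for one target sample, using Lee and Seung style multiplicative updates. This is a deliberately simplified stand-in, not SourceID-NMF itself: it has no unknown-source component, and the function name, iteration count, and toy abundance profiles are all assumptions.

```python
def estimate_proportions(sources, target, iterations=500, eps=1e-12):
    """Estimate non-negative source weights so that their mixture approximates the target.

    sources: list of abundance profiles, one list of taxa abundances per source.
    target:  list of taxa abundances for the sample being explained.
    """
    n_sources = len(sources)
    n_taxa = len(target)
    weights = [1.0 / n_sources] * n_sources  # positive start keeps weights non-negative
    for _ in range(iterations):
        # Current reconstruction of the target from the weighted sources.
        recon = [
            sum(weights[k] * sources[k][t] for k in range(n_sources))
            for t in range(n_taxa)
        ]
        # Multiplicative update: scale each weight by (S^T x) / (S^T S w).
        for k in range(n_sources):
            numer = sum(sources[k][t] * target[t] for t in range(n_taxa))
            denom = sum(sources[k][t] * recon[t] for t in range(n_taxa))
            weights[k] *= numer / (denom + eps)
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy data: two sources over three taxa, mixed 75% / 25%.
sources = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
target = [0.75 * a + 0.25 * b for a, b in zip(sources[0], sources[1])]
proportions = estimate_proportions(sources, target)
print([round(p, 3) for p in proportions])
```

Because the updates only multiply by non-negative ratios, the weights never become negative, which is the property that makes this family of methods a natural fit for proportion estimates.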

17:00-17:15
Carbohydrate-active enzyme annotation in microbiomes using dbCAN3
Confirmed Presenter: Yanbin Yin, University of Nebraska - Lincoln, United States

Room: 520c
Format: In Person

Moderator(s): Serghei Mangul


Authors List: Show

  • Yanbin Yin, University of Nebraska - Lincoln, United States

Presentation Overview: Show

Carbohydrate active enzymes (CAZymes) are made by various organisms for complex carbohydrate metabolism. Genome mining of CAZymes has become a routine data analysis in microbiome sequencing projects, owing to the importance of CAZymes in bioenergy, microbiome, nutrition, agriculture, and global carbon recycling. In 2012, dbCAN was provided as an online web server for automated CAZyme annotation. dbCAN2 (https://bcb.unl.edu/dbCAN2) was further developed in 2018 as a meta server to combine multiple tools for improved CAZyme annotation. dbCAN2 also included CGC-Finder, a tool for identifying CAZyme gene clusters (CGCs) in (meta-)genomes. We have updated the meta server to dbCAN3 with the following new functions and components: (i) dbCAN-sub as a profile Hidden Markov Model database (HMMdb) for substrate prediction at the CAZyme subfamily level; (ii) searching against experimentally characterized polysaccharide utilization loci (PULs) with known glycan substrates of the dbCAN-PUL database for substrate prediction at the CGC level; (iii) a majority voting method to consider all CAZymes with substrates predicted from dbCAN-sub for substrate prediction at the CGC level; (iv) improved data browsing and visualization of substrate prediction results on the website. In summary, dbCAN3 not only inherits all the functions of dbCAN2, but also integrates three new methods for glycan substrate prediction in microbiome sequencing data.

Publication: https://academic.oup.com/nar/article/51/W1/W115/7147496
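
The majority voting step mentioned in the dbCAN3 overview (item iii) can be illustrated with a minimal Python sketch. It assumes each CAZyme in a gene cluster carries at most one predicted substrate label and that ties yield no call. The actual voting rules and thresholds in dbCAN3 may differ, and the function name `vote_cgc_substrate` is hypothetical.

```python
from collections import Counter


def vote_cgc_substrate(cazyme_substrates):
    """Return the substrate predicted by most CAZymes in one gene cluster.

    cazyme_substrates: one entry per CAZyme, either a substrate label
    or None when no substrate was predicted for that CAZyme.
    Returns None when nothing was predicted or the top vote is tied
    (the tie rule is an assumption made for this sketch).
    """
    votes = Counter(s for s in cazyme_substrates if s is not None)
    if not votes:
        return None
    ranked = votes.most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # tie between the two best-supported substrates
    return top_label


# Toy cluster: two CAZymes vote for xylan, one for pectin, one has no call.
toy_call = vote_cgc_substrate(["xylan", "xylan", None, "pectin"])
```

Returning no call on a tie is a conservative choice here; a real pipeline might instead report all tied substrates.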

17:15-17:30
NUTRIclock, NEURAL NETWORKS ANALYSIS OF MICROBIOME FOR IMPLEMENTING PRECISION NUTRITION IN AGING.
Confirmed Presenter: Adrian Martin-Segura, IMDEA Food Institute, 28049 Madrid, Spain., Spain

Room: 520c
Format: In Person

Moderator(s): Serghei Mangul


Authors List: Show

  • Adrian Martin-Segura, IMDEA Food Institute, 28049 Madrid, Spain., Spain
  • Laura J. Marcos-Zambrano, IMDEA Food Institute, 28049 Madrid, Spain., Spain
  • Blanca Lacruz-Pleguezuelos, IMDEA Food Institute, 28049 Madrid, Spain., Spain
  • Alberto Diaz-Ruiz, IMDEA Food Institute, 28049 Madrid, Spain., Spain
  • Enrique Carrillo de Santa Pau, IMDEA Food Institute, 28049 Madrid, Spain., Spain

Presentation Overview: Show

Aging is the greatest risk factor for the development of chronic diseases such as neurodegenerative disorders and cancer. The increase in life expectancy makes it urgent to understand and develop mechanisms that promote healthy aging. In that sense, the microbiome has emerged as an important factor in aging processes. The gut microbiome comprises all gut microorganisms, and several lines of scientific evidence point towards microbiome alterations as important events in disease onset. Moreover, different nutritional interventions have proven effective at modulating the human microbiome and physiology, owing to their versatility. However, their efficacy is highly dependent on the individual’s genomic and metagenomic background. We propose using neural network (NN) algorithms and ~3700 whole genome shotgun samples from public repositories to elucidate microbiome changes with aging. Using the samples’ microbial profiles (relative abundance and gene families), we are developing NUTRIclock, a microbiome-based aging clock. We are testing different NN architectures (convolutional, GraphNN) to find the one that best accommodates the particularities of microbiome data, encoding in it the microbial and metabolic changes observed during aging. NUTRIclock will serve i) as a tool to calculate the biological age of a patient, as opposed to his/her chronological age; and ii) to find altered microbiome patterns in patients that could be driving that biological age. This will allow personalized nutritional interventions to be implemented according to a patient’s microbiome profile in order to influence its dynamics, advancing the fields of precision nutrition and personalized medicine.

17:30-17:40
MIOSTONE: Modeling microbiome-trait associations with taxonomy-adaptive neural networks
Confirmed Presenter: Yifan Jiang, Cheriton School of Computer Science, University of Waterloo, Canada

Room: 520c
Format: In Person

Moderator(s): Serghei Mangul


Authors List: Show

  • Yifan Jiang, Cheriton School of Computer Science, University of Waterloo, Canada
  • Matthew Aton, School of Life Sciences, Arizona State University, United States
  • Qiyun Zhu, School of Life Sciences, Arizona State University, United States
  • Yang Lu, Cheriton School of Computer Science, University of Waterloo, Canada

Presentation Overview: Show

The human microbiome, a complex ecosystem of microorganisms inhabiting the body, plays a critical role in human health. Investigating its association with host traits is essential for understanding its impact on various diseases. Although shotgun metagenomic sequencing technologies have produced vast amounts of microbiome data, analyzing such data is highly challenging due to its sparsity, noisiness, and high feature dimensionality. Here we propose MIOSTONE, a novel machine learning method that leverages the intercorrelation of microbiome features due to their phylogeny-based taxonomic relationships. MIOSTONE employs a novel taxonomy-encoded deep neural network (DNN) architecture that harnesses the capabilities of DNNs with mitigated concerns of overfitting. In addition, MIOSTONE has the ability to determine whether taxa within the corresponding taxonomic group provide a better explanation in a data-driven manner. We empirically assessed MIOSTONE's accuracy and interpretability on various real microbiome datasets, demonstrating its competitive performance and interpretability compared to existing methods.

17:40-18:00
Invited Presentation: Critical Assessment of Metagenome Interpretation - Updates and Future Benchmarking Challenges
Confirmed Presenter: Alice McHardy

Room: 520c
Format: Live Stream

Moderator(s): Serghei Mangul


Authors List: Show

  • Alice McHardy

Sunday, July 14th
10:40-11:25
Invited Presentation: Sequence-based interrogation of soil microbiomes and their ecosystem benefits
Confirmed Presenter: Susannah Tringe, Lawrence Berkeley National Laboratory, United States

Room: 520c
Format: In Person

Moderator(s): Zhong Wang


Authors List: Show

  • Susannah Tringe, Lawrence Berkeley National Laboratory, United States

Presentation Overview: Show

Plant roots and the soil they grow in are heavily colonized with microbes that play critical roles in nutrient cycling and transport as well as influencing plant growth and health. Molecular methods including DNA sequencing have begun to elucidate the forces governing the assembly and maintenance of plant and soil microbial communities, offering the opportunity for these microbial communities to be nurtured and manipulated to promote plant growth and health as well as soil health and ecosystem functions.
We have combined omics methods, biogeochemical assays, and gas flux measurements to investigate the factors influencing greenhouse gas emissions from natural and managed wetland systems. By integrating these datasets we find that gas fluxes represent a complex interplay of biological, chemical, and physical factors that vary across habitats. Our results suggest considerable heterogeneity in fluxes, even in physically proximate locations, which has implications for the success of wetland preservation and restoration as a carbon storage strategy, particularly in the context of sea level rise.
In agricultural systems, we find that different plant compartments (e.g. rhizosphere and root endosphere) harbor unique and dynamic microbial communities heavily influenced by the soil, surrounding environment and host genotype. Abiotic stress, such as drought and low nitrogen, can alter both the composition of these communities and their interactions with each other and the plant. Our sequence-based characterizations of plant-associated communities, leveraging a variety of bioinformatic tools, have identified key populations that structure the community and respond dynamically to environmental changes, representing potential targets for improvement of plant resilience.

11:25-11:40
Understanding the small proteins from the global microbiome
Confirmed Presenter: Luis Pedro Coelho, Queensland University of Technology, Australia

Room: 520c
Format: In Person

Moderator(s): Zhong Wang


Authors List: Show

  • Célio Dias Santos-Júnior, Fudan University, China
  • Marcelo Torres, University of Pennsylvania, United States
  • Yiqian Duan, Fudan University, China
  • Cesar de la Fuente Nunez, University of Pennsylvania, United States
  • Luis Pedro Coelho, Queensland University of Technology, Australia

Presentation Overview: Show

Small proteins, crucial across all life domains, have been overlooked in large-scale microbiome studies due to limitations in both wet lab and bioinformatics techniques. In particular, it is difficult to predict them without generating numerous false positives, and functional predictions based on homology fail without closely-related homologs. Recently, studies have begun addressing these challenges, with improved methods for managing small protein data in metagenomic analyses.

We tackled this by analyzing sequences shared across multiple metagenomes, to increase confidence in predictions. This method was applied in creating the Global Microbial smORF Catalogue, which includes almost one billion sequences. This is accessible online for users to identify homologs to smORFs identified in their own studies.

Additionally, we used machine learning to filter out false positives effectively, particularly in identifying active sequences within specific functional classes like antimicrobial peptides (AMPs). For this task, we designed Macrel to optimize for high precision, albeit at the potential cost of lower recall. Macrel was used to generate a catalog of one million potential AMPs from extensive genomic and metagenomic data, a dataset we termed AMPsphere.

Experimental validation of these methods included synthesizing and testing 100 AMPs. In total, 79 showed activity against pathogens or commensals. Some peptides also demonstrated efficacy comparable to the clinical antimicrobial, polymyxin B, in a preclinical mouse model, underscoring the potential of these novel bioinformatic approaches to contribute significantly to discovering novel antibiotics.

11:40-11:55
Multi-level analysis of the gut–brain axis shows autism spectrum disorder-associated molecular and microbial profiles
Confirmed Presenter: James Morton, Gutz Analytics, United States

Room: 520c
Format: In Person

Moderator(s): Zhong Wang


Authors List: Show

  • James Morton, Gutz Analytics, United States

Presentation Overview: Show

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by heterogeneous cognitive, behavioral and communication impairments. Disruption of the gut–brain axis (GBA) has been implicated in ASD although with limited reproducibility across studies. In this study, we developed a Bayesian differential ranking algorithm to identify ASD-associated molecular and taxa profiles across 10 cross-sectional microbiome datasets and 15 other datasets, including dietary patterns, metabolomics, cytokine profiles and human brain gene expression profiles. We found a functional architecture along the GBA that correlates with heterogeneity of ASD phenotypes, and it is characterized by ASD-associated amino acid, carbohydrate and lipid profiles predominantly encoded by microbial species in the genera Prevotella, Bifidobacterium, Desulfovibrio and Bacteroides and correlates with brain gene expression changes, restrictive dietary patterns and pro-inflammatory cytokine profiles. The functional architecture revealed in age-matched and sex-matched cohorts is not present in sibling-matched cohorts. We also show a strong association between temporal changes in microbiome composition and ASD phenotypes. In summary, we propose a framework to leverage multi-omic datasets from well-defined cohorts and investigate how the GBA influences ASD.

11:55-12:10
Metagenomic Mining Reveals Niche-Specific Bilirubin Reductases in the Gut Microbiome
Confirmed Presenter: Xiaofang Jiang, NLM/NIH, United States

Room: 520c
Format: In Person

Moderator(s): Zhong Wang


Authors List: Show

  • Xiaofang Jiang, NLM/NIH, United States
  • Keith Dufault-Thompson, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA, United States
  • Brantley Hall, Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, College Park, MD, United States

Presentation Overview: Show

The gut microbiome plays crucial roles in animal health and metabolism, including the biotransformation of host and diet-derived metabolites. The microbial reduction of bilirubin, a heme degradation product, to urobilinogen is a key process in maintaining bilirubin homeostasis in animals. This study employs phylogenetic and metagenomic mining approaches to unveil a novel family of gut-adapted bilirubin reductase enzymes within the Old Yellow Enzyme (OYE) family.

Through an integrated analysis combining experimental screening, comparative genomics, and advanced computational methodologies, we identified and characterized a putative bilirubin reductase enzyme family in anaerobic microbes associated with the gut. Through structural modeling, ancestral sequence reconstruction, and targeted mutation experiments, we confirmed the specificity and function of these enzymes, delineating them from other members of the OYE family. Our findings reveal three distinct forms of bilirubin reductase, characterized by unique domain compositions, that form separate clades within the enzyme's phylogeny.

Our analysis of 1373 gut metagenomes across 132 animal species illuminated the evolutionary divergence and niche-specific associations of the bilirubin reductase clades. We found that bilirubin reductase was significantly enriched in the anaerobic niche of the lower gut in multiple animals, being nearly absent in their upper gastrointestinal tracts. The broader distribution of bilirubin reductase clades highlights clear patterns of co-evolution with their animal hosts, underscoring the ecological and evolutionary interplay between gut microbes and their vertebrate hosts.

14:20-14:40
Proceedings Presentation: Scalable de novo Classification of Antimicrobial Resistance of Mycobacterium Tuberculosis
Confirmed Presenter: Christina Boucher, University of Florida, United States

Room: 520c
Format: In Person

Moderator(s): Luis Pedro Coelho


Authors List: Show

  • Mohammadali Serajian, University of Florida, United States
  • Simone Marini, University of Florida, United States
  • Jarno N. Alanko, University of Helsinki, Finland
  • Noelle R. Noyes, University of Minnesota, United States
  • Mattia Prosperi, University of Florida, United States
  • Christina Boucher, University of Florida, United States

Presentation Overview: Show

We develop a robust machine learning classifier using both linear and nonlinear models (i.e., LASSO logistic regression (LR) and random forests (RF)) to predict the phenotypic resistance of Mycobacterium tuberculosis (MTB) for a broad range of antibiotic drugs. We use data from the CRyPTIC consortium to train our classifier, which consists of whole genome sequencing and antibiotic susceptibility testing (AST) phenotypic data for 13 different antibiotics. To train our model, we assemble the sequence data into genomic contigs, identify all unique 31-mers in the set of contigs, and build a feature matrix M, where M[i,j] is equal to the number of times the i-th 31-mer occurs in the j-th genome. Due to the size of this feature matrix (over 350 million unique 31-mers), we build and use a sparse matrix representation. Our method, which we refer to as MTB++, leverages compact data structures and iterative methods to allow for the screening of all the 31-mers in the development of both LASSO LR and RF. MTB++ is able to achieve high discrimination (F-1 greater than 80%) for the first-line antibiotics. Moreover, MTB++ had the highest F-1 score in all but three classes and was the most comprehensive since it had an F-1 score greater than 75% in all but four (rare) antibiotic drugs. We use our feature selection to contextualize the 31-mers that are used for the prediction of phenotypic resistance, leading to some insights about sequence similarity to genes in MEGARes.
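
The feature matrix described above, where M[i,j] counts occurrences of the i-th k-mer in the j-th genome, can be sketched as a sparse structure in plain Python. This is only a toy illustration under stated assumptions: it stores non-zero counts in a dictionary keyed by (i, j), does not merge reverse-complement k-mers, and is not the compact data structure used by MTB++. The function names are hypothetical.

```python
from collections import Counter


def count_kmers(contigs, k):
    """Count every k-mer occurring in the contigs of one genome."""
    counts = Counter()
    for contig in contigs:
        for start in range(len(contig) - k + 1):
            counts[contig[start:start + k]] += 1
    return counts


def build_sparse_kmer_matrix(genomes, k=31):
    """Build a sparse k-mer count matrix.

    genomes: list of genomes, each a list of contig strings.
    Returns (kmer_index, entries) where kmer_index maps each k-mer to
    its row i, and entries maps (i, j) to the count of k-mer i in
    genome j. Zero counts are simply not stored.
    """
    kmer_index = {}
    entries = {}
    for j, contigs in enumerate(genomes):
        for kmer, n in count_kmers(contigs, k).items():
            i = kmer_index.setdefault(kmer, len(kmer_index))
            entries[(i, j)] = n
    return kmer_index, entries


# Toy example with k=3 instead of 31 so the counts are easy to check.
toy_genomes = [["ACGTACG"], ["ACGA", "CGTT"]]
toy_index, toy_entries = build_sparse_kmer_matrix(toy_genomes, k=3)
```

Storing only non-zero entries is what keeps the matrix tractable when most k-mers are absent from most genomes. At the scale quoted in the abstract, a dictionary like this would be replaced by a dedicated sparse or succinct structure.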

14:40-15:00
Genomic analysis reveals dysregulation of the intratumor microbiome related to immune response in lung cancer
Confirmed Presenter: Youping Deng, University of Hawaii at Manoa, United States

Room: 520c
Format: In Person

Moderator(s): Luis Pedro Coelho


Authors List: Show

  • Ba Thong Nguyen, University of Hawaii at Manoa, United States
  • Shaoqiu Chen, University of Hawaii at Manoa, United States
  • Donna Lee Kuehu, University of Hawaii at Manoa, United States
  • Isam Ibrahim, University of Hawaii at Manoa, United States
  • Yujia Qin, University of Hawaii at Manoa, United States
  • Youping Deng, University of Hawaii at Manoa, United States

Presentation Overview: Show

Background: Identifying factors underlying resistance to immune checkpoint therapy (ICT) remains challenging. Intratumor microbes (bacteria, fungi, and viruses) are found in the tumor tissues of many cancers. In this study, we examined the intratumor microbes of lung cancer patients treated with immune checkpoint inhibitors.
Methods: We downloaded whole exome sequencing (WXS) data from primary tumor and non-tumor tissue of non-small cell lung cancer (NSCLC) patients, together with clinical data, from the database of Genotypes and Phenotypes (dbGaP). The data came from three NSCLC cohorts with ICT response information (phs002244, phs000980 and phs001940), comprising 44 response (R) samples and 51 non-response (NR) samples. Microbial abundance, diversity, and significantly different microbes were determined through machine learning and microbiome analysis.
Results: WXS data from 95 patients (NR, 51; R, 44) were obtained and analyzed from February 2023 to April 2024. After data cleaning and microbiome analysis, we found significantly higher alpha diversity in the response group than in the non-response group for the three types of microbes in tumor samples (p<0.05), whereas no significant difference was found in non-tumor tissues. Differential analysis identified bacteria (Lactobacillus gasseri), fungi (Aspergillus_versicolor, GS01_phy_Incertae_sedis_sp) and viruses (Alphabaculovirus and Mardivirus) as the top significant species and genera in the response group compared with the non-response group in tumor tissues. The most abundant species and genera in tumor samples were bacteria (Lactobacillus gasseri and Ralstonia solanacearum), fungi (Aspergillus, Fungi_gen_Incertae_sedis) and viruses (Alphabaculovirus and Betapartitivirus).
Conclusion: Together, these microbial data have important implications for the treatment of lung cancer with immune checkpoint inhibitors.

Bioinformatics exploration of bacterial communities and plastic-degrading laccase from the gut microbiomes of plastic degrading beetle larvae
Confirmed Presenter: Jithin Sunny, Queen's University, Canada

Room: 520c
Format: In Person

Moderator(s): Luis Pedro Coelho


Authors List: Show

  • Jithin Sunny, Queen's University, Canada
  • Sabhjeet Kaur, Queen's University, Canada
  • Jeremie Alexander, Queen's University, Canada
  • George C. Dicenzo, Queen's University, Canada

Presentation Overview: Show

This study utilizes comprehensive bioinformatics approaches to investigate the gut bacterial population of mealworms and superworms and to mine for enzymes potentially involved in plastic degradation. A total of 46 metagenomes were assembled, annotated, and analyzed to characterize the bacterial population and identify taxa differentially abundant between insects fed plastics and those not fed plastics. Alpha and beta diversity metrics were first used to examine global differences between the diet groups, followed by non-symmetric analysis to explain the variation in the data. Binning of the metagenome assemblies led to the generation of 153 metagenome-assembled genomes (MAGs). Metabolic pathway analysis was performed on these MAGs to obtain counts of genes involved in aromatic compound degradation and belonging to the butanoate and propanoate metabolic pathways, amongst others. To further explore genes potentially associated with plastic biodegradation, we annotated all the metagenomes and extracted a non-redundant set of ~105,000 proteins. The non-redundant set of exported proteins included 129 putative laccases, which were of interest as previous studies have implicated this protein family in plastic degradation. We therefore performed sequence and structural analyses to explore the properties of the putative laccases identified in our study. Features were computed using site and domain-based information along with residue and enzyme backbone based structural similarity. Three different clustering methods along with evaluation metrics were employed to evaluate enzymes showing high similarity to laccases previously suggested to be active on plastics. Overall, this research employs different bioinformatics techniques to understand the bacterial groups and enzymes involved in plastic degradation.
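
The overview does not say which alpha and beta diversity metrics were used. As one common pairing, and purely as an assumption, the sketch below computes the Shannon index (alpha diversity within a sample) and Bray-Curtis dissimilarity (beta diversity between two samples) from taxon count vectors. The function names and example counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples over the same taxa.

    0 means identical composition, 1 means no shared taxa.
    """
    shared_total = sum(a + b for a, b in zip(counts_a, counts_b))
    if shared_total == 0:
        return 0.0  # two empty samples are treated as identical here
    difference = sum(abs(a - b) for a, b in zip(counts_a, counts_b))
    return difference / shared_total


# Toy counts for one plastic-fed and one control sample (made-up numbers).
toy_plastic_fed = [40, 30, 20, 10]
toy_control = [70, 20, 10, 0]
toy_alpha = shannon_index(toy_plastic_fed)
toy_beta = bray_curtis(toy_plastic_fed, toy_control)
```

In practice these metrics are usually computed on rarefied or otherwise normalized abundance tables, a step this sketch leaves out.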