The SciFinder tool lets you search Titles, Authors, and Abstracts of talks and panels. Enter your search term below and your results will be shown at the bottom of the page. You can also click on a track to see all the talks given in that track on that day.

Results

July 13, 2024
10:40-11:25
Invited Presentation: Towards fully genome-resolved metagenomics
Confirmed Presenter: Christopher Quince, Earlham Institute, UK
Track: MICROBIOME

Room: 520c
Format: Live Stream
Moderator(s): Mihai Pop


Authors List:

  • Christopher Quince, Earlham Institute
  • Gaetan Benoit, Pasteur Institute
  • Rayan Chikhi, Pasteur Institute
  • Sebastien Raguideau, Earlham Institute
  • Rob James, Quadram Institute

Presentation Overview:

I will discuss the impact of long accurate sequencing reads generated by HiFi PacBio on the assembly of microbial genomes directly from metagenomes. I will present our recent assembler metaMDBG based on minimiser de Bruijn graphs for this application. I will also talk about the prospects for the use of the latest ONT reads which are approaching HiFi levels of accuracy. I will conclude with a discussion of Hi-C proximity ligation for metagenome binning and the linking of extra-chromosomal elements to genomes.

July 13, 2024
11:25-11:40
Effective binning of metagenomic contigs using contrastive multi-view representation learning
Confirmed Presenter: Shanfeng Zhu, Fudan University, China
Track: MICROBIOME

Room: 520c
Format: Live Stream
Moderator(s): Mihai Pop


Authors List:

  • Ziye Wang, Fudan University
  • Ronghui You, Nankai University
  • Haitao Han, Fudan University
  • Wei Liu, Fudan University
  • Fengzhu Sun, University of Southern California
  • Shanfeng Zhu, Fudan University

Presentation Overview:

Contig binning plays a crucial role in metagenomic data analysis by grouping contigs from the same or closely related genomes. However, existing binning methods face challenges in practical applications due to the diversity of data types and the difficulties in efficiently integrating heterogeneous information. Here, we introduce COMEBin, a binning method based on contrastive multi-view representation learning. COMEBin utilizes data augmentation to generate multiple fragments (views) of each contig and obtains high-quality embeddings of heterogeneous features (sequence coverage and k-mer distribution) through contrastive learning. Experimental results on multiple simulated and real datasets demonstrate that COMEBin outperforms state-of-the-art binning methods, particularly in recovering near-complete genomes from real environmental samples. COMEBin outperforms other binning methods remarkably when integrated into metagenomic analysis pipelines, including the recovery of potentially pathogenic antibiotic-resistant bacteria (PARB) and moderate or higher quality bins containing potential biosynthetic gene clusters (BGCs).

July 13, 2024
11:40-12:00
Proceedings Presentation: Floria: Fast and accurate strain haplotyping in metagenomes
Confirmed Presenter: Jim Shaw, University of Toronto, Canada
Track: MICROBIOME

Room: 520c
Format: In Person
Moderator(s): Mihai Pop


Authors List:

  • Jim Shaw, University of Toronto
  • Jean-Sebastien Gounot, Genome Institute of Singapore
  • Hanrong Chen, Genome Institute of Singapore
  • Niranjan Nagarajan, Genome Institute of Singapore
  • Yun William Yu, Carnegie Mellon University

Presentation Overview:

Shotgun metagenomics allows for direct analysis of microbial community genetics, but developing scalable computational methods for the recovery of bacterial strain genomes from microbiomes remains a key challenge. We introduce Floria, a novel method designed for rapid and accurate recovery of strain haplotypes from short and long-read metagenome sequencing data, based on minimum error correction (MEC) read clustering and a strain-preserving network flow model. Floria can function as a standalone haplotyping method, outputting alleles and reads that co-occur on the same strain, as well as an end-to-end read-to-assembly pipeline (Floria-PL) for strain-level assembly. Benchmarking evaluations on synthetic metagenomes showed that Floria is > 3x faster and recovers 21% more strain content than base-level assembly methods (Strainberry), while being over an order of magnitude faster when only phasing is required. Applying Floria to a set of 109 deeply sequenced nanopore metagenomes took < 20 minutes on average per sample, and identified several species that have consistent strain heterogeneity. Applying Floria’s short-read haplotyping to a longitudinal gut metagenomics dataset revealed a dynamic multi-strain Anaerostipes hadrus community with frequent strain loss and emergence events over 636 days. With Floria, accurate haplotyping of metagenomic datasets takes mere minutes on standard workstations, paving the way for extensive strain-level metagenomic analyses.

July 13, 2024
12:00-12:20
The impact of transitive annotation on the training of taxonomic classifiers
Confirmed Presenter: Mihai Pop, University of Maryland, United States
Track: MICROBIOME

Room: 520c
Format: In Person
Moderator(s): Mihai Pop


Authors List:

  • Harihara Subrahmaniam Muralidharan, University of Maryland
  • Noam Fox, University of Maryland
  • Nathalie Bonin, University of Maryland
  • Mihai Pop, University of Maryland

Presentation Overview:

A common task in the analysis of microbial communities involves assigning taxonomic labels to the sequences derived from organisms found in the communities. Frequently, such labels are assigned using machine learning algorithms that are trained to recognize individual taxonomic groups based on training data sets that comprise sequences with known taxonomic labels. Ideally, the training data should rely on labels that are experimentally verified, since formal taxonomic labels require knowledge of physical and biochemical properties of organisms that cannot be directly inferred from sequence alone. However, the labels associated with sequences in biological databases are most commonly computational predictions which themselves may rely on computationally-generated data, a process commonly referred to as “transitive annotation”. Here, we focus on taxonomic annotation using a naïve Bayes classifier developed for the annotation of 16S rRNA gene sequences: the Ribosomal Database Project (RDP) classifier. We chose this data set and classifier since they are established resources in microbial ecology; however, our general methodology and conclusions apply more broadly to any sequence-based machine-learning classifier. We demonstrate that even a few computationally-generated training data points can significantly skew the output of the classifier to the point where entire regions of the taxonomic space can be disturbed. For example, retraining the classifier with artificial sequences caused changes that spanned microbial phyla. We also discuss key factors that affect the resilience of classifiers to transitively-annotated training data, and propose best practices to avoid the artifacts of transitive annotation.

July 13, 2024
12:00-12:20
Sensitive, specific association of microbial functions with host phenotypes using Phylogenize2
Confirmed Presenter: Patrick Bradley, The Ohio State University, United States
Track: MICROBIOME

Room: 520c
Format: In Person
Moderator(s): Mihai Pop


Authors List:

  • Kathryn Kananen, The Ohio State University
  • Patrick Bradley, The Ohio State University

Presentation Overview:

In metagenomics, a key challenge is to explain differences in microbial communities in terms of gene function. Many common approaches to this problem do not account for the fact that related species tend to share both genes and phenotypes. This makes them susceptible to discovering clade markers for differentially-abundant microbes, which often have weak evidence for functional relevance. We have developed a new major version, Phylogenize2, of a tool that allows researchers to link community-level changes to gene content while accounting for phylogeny. This revision leverages large, modern collections of isolate and metagenome-assembled genomes, allowing the method to be applied across a range of biomes. We have also substantially improved Phylogenize2's statistical power by combining new, microbiome-specific differential abundance methods with adaptive shrinkage. As a test case, we apply Phylogenize2 to a human cohort with liver cirrhosis, and discover a link between abundances of the Lachnospiraceae (a prevalent group of commensal Clostridia) and anaerobic oxidative stress. Our preliminary results suggest that Phylogenize2 is both more specific than linear modeling and more sensitive than competing methods. Phylogenize2 is a publicly available, open source tool that can extract specific functional information from a wide variety of environmental and host-associated microbiomes.

July 13, 2024
14:20-14:40
Proceedings Presentation: Reference-free Structural Variant Detection in Microbiomes via Long-read Co-assembly Graphs
Confirmed Presenter: Kristen Curry, Rice University, United States
Track: MICROBIOME

Room: 520c
Format: In Person
Moderator(s): Mihai Pop


Authors List:

  • Kristen Curry, Rice University
  • Feiqiao Yu, Arc Institute
  • Summer Vance, University of California
  • Santiago Segarra, Rice University
  • Devaki Bhaya, Carnegie Institute for Science
  • Rayan Chikhi, Institut Pasteur
  • Eduardo Rocha, Institut Pasteur
  • Todd Treangen, Rice University

Presentation Overview:

Bacterial genome dynamics are vital for understanding the mechanisms underlying microbial adaptation, growth, and their broader impact on host phenotype. Structural variants (SVs), genomic alterations of 50 base pairs or more, play a pivotal role in driving evolutionary processes and maintaining genomic heterogeneity within bacterial populations. While SV detection in isolate genomes is relatively straightforward, metagenomes present broader challenges due to the absence of clear reference genomes and the presence of mixed strains. In response, our proposed method, rhea, forgoes reference genomes and metagenome-assembled genomes (MAGs) by using a single metagenome co-assembly graph constructed from all samples in a series. The log fold change in graph coverage between subsequent samples is then calculated to call SVs that are thriving or declining throughout the series. We show that rhea outperforms existing methods for SV and horizontal gene transfer (HGT) detection in two simulated mock metagenomes, an advantage that is particularly noticeable as the simulated reads diverge from reference genomes and strain diversity increases. We additionally demonstrate use cases for rhea on series metagenomic data of environmental and fermented food microbiomes to detect specific sequence alterations between successive time and temperature samples, suggesting host advantage. Our approach leverages previous work in assembly graph structural and coverage patterns to provide versatility in studying SVs across diverse and poorly characterized microbial communities for more comprehensive insights into microbial gene flux.
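The log fold change step described in this overview can be illustrated with a minimal sketch. It is not rhea's implementation: the function names, the node identifiers, the base-2 logarithm, the pseudocount of 1, and the threshold of 1 are all assumptions chosen for illustration.

```python
import math


def log2_fold_changes(cov_before, cov_after, pseudocount=1.0):
    """Per-node log2 fold change in coverage between two consecutive samples.

    cov_before / cov_after map a graph node ID to its read coverage.
    The pseudocount (an assumption) avoids division by zero for nodes
    that are absent from one of the two samples.
    """
    changes = {}
    for node in cov_before.keys() | cov_after.keys():
        before = cov_before.get(node, 0.0) + pseudocount
        after = cov_after.get(node, 0.0) + pseudocount
        changes[node] = math.log2(after / before)
    return changes


def flag_nodes(changes, threshold=1.0):
    """Split nodes into those rising or falling by at least `threshold` (hypothetical cutoff)."""
    rising = sorted(n for n, c in changes.items() if c >= threshold)
    falling = sorted(n for n, c in changes.items() if c <= -threshold)
    return rising, falling


# Toy coverages for three hypothetical graph nodes at two time points.
sample_t0 = {"n1": 30.0, "n2": 31.0, "n3": 7.0}
sample_t1 = {"n1": 31.0, "n2": 3.0, "n3": 63.0}

changes = log2_fold_changes(sample_t0, sample_t1)
rising, falling = flag_nodes(changes)
print(rising, falling)
```

In this toy series, node n3 gains coverage and node n2 loses it, while n1 stays roughly flat and is not flagged.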

July 13, 2024
14:40-14:55
Targeted Sequencing and Triplet Loss classification allow for microbiome-based inference of soil sample origin
Confirmed Presenter: Paweł P. Łabaj, Małopolska Centre of Biotechnology of Jagiellonian University, Poland
Track: MICROBIOME

Room: 520c
Format: In Person
Moderator(s): Mihai Pop


Authors List:

  • Michał Kowalski, Jagiellonian University
  • Kamila Marszałek, Jagiellonian University
  • Alina Frolova, The Institute of Molecular Biology and Genetics of NASU
  • Kinga Herda, Jagiellonian University
  • Agata Jagiełło, Central Forensic Laboratory of the Police
  • Anna Woźniak, Central Forensic Laboratory of the Police
  • Kaja Milanowska-Zabel, Ardigen S.A.
  • Rafał Płoski, Medical University of Warsaw
  • Andrzej Ossowski, Pomeranian Medical University
  • Renata Zbieć-Piekarska, Central Forensic Laboratory of the Police
  • Wojciech Branicki, Institute of Zoology and Biomedical Research of the Jagiellonian University
  • Paweł P. Łabaj, Małopolska Centre of Biotechnology of Jagiellonian University

Presentation Overview:

Microbiome characterization has been successfully applied in forensic studies. However, from the MetaSUB Consortium and CAMDA we know that the full forensic potential of environmental metagenomic data is yet to be unveiled. The aim of our SMAFT project was therefore to develop a complete (wet-lab and dry-lab) solution for forensic laboratories in Poland.
Building on earlier experience, we first analyzed the climate and geographical properties of Poland to select 80 locations. Samples were collected from these locations in triplicate throughout four seasons, yielding almost 1000 samples in total. They were then profiled with WMS at about 120 million read pairs per sample. These data, together with MetaSUB samples as a non-Polish negative reference, were fed to MetaGraph to obtain the set of metagenomic features (unitigs) used to distinguish between the respective locations.
We ultimately identified 1015 All Relevant Features. In the wet-lab part these were used to design probes for a Targeted Metagenomics Sequencing panel, while in the dry-lab part they were the source of data for the classification/prediction model.
For sample origin prediction, a Triplet Loss based solution was used to reduce the dimensionality of the metagenomic profiles, followed by a Deep Neural Network to obtain the probabilities of the sample's origin. The overall performance was: LRAP: 0.87; Weighted F1: 0.86; Balanced Accuracy: 0.94 on all validation samples (n=547).
Our work confirms that targeted microbiome sequencing of soil samples, together with appropriate data science and AI processing, allows accurate classification of sample origin even in a country as climatically homogeneous as Poland.

July 13, 2024
14:55-15:10
Integration and analysis of 168,000 human gut microbiome samples
Confirmed Presenter: Samantha Graham, University of Minnesota, United States
Track: MICROBIOME

Room: 520c
Format: In Person
Moderator(s): Mihai Pop


Authors List:

  • Samantha Graham, University of Minnesota
  • Richard Abdill, University of Chicago
  • Vincent Rubinetti, University of Colorado School of Medicine
  • Frank Albert, University of Minnesota
  • Casey Greene, University of Colorado School of Medicine
  • Sean Davis, University of Colorado School of Medicine
  • Ran Blekhman, University of Chicago

Presentation Overview:

The human microbiome, or the collection of microorganisms within the human body, plays an important role in modulating human health and disease. While some health-relevant patterns may only be discoverable with thousands of samples, the average gut microbiome study contains fewer than 100 samples; fortunately, hundreds of thousands of samples are available via public repositories such as NCBI SRA. Here, we leverage this publicly available data by uniformly processing and integrating 168,464 16S rRNA gene sequencing human gut microbiome samples. This resource, the Human Microbiome Compendium, is freely available via our website (MicroBioMap.org) and R package (MicroBioMap). We use this dataset to characterize global patterns of variation in microbiome composition.
We classified our samples into eight world regions to investigate patterns of variation, and found distinct differences in microbiome composition between regions. We sought to understand specific microbes driving these differences and found that the 65 most abundant genera were differentially abundant between at least one pair of regions. We also note the disparity in sampling between world regions. Over 90,000 of the samples originate from Europe and Northern America, while we identified only 4 samples from Oceania. We show that some regions likely have many taxa that remain undiscovered due to undersampling.
Here, we present a new, large-scale collection of human gut microbiome data, which we use to study global microbiome variation. We expect this compendium will be a valuable resource for the community and enable novel insights into the microbial ecology of the human gut.

July 13, 2024
15:10-15:30
MC-Funcformer: A foundational model of microbial community metabolism
Confirmed Presenter: Ananthan Nambiar, University of Illinois Urbana-Champaign, United States
Track: MICROBIOME

Room: 520c
Format: In Person
Moderator(s): Mihai Pop


Authors List:

  • Ananthan Nambiar, University of Illinois Urbana-Champaign
  • John Forsyth, University of Illinois Urbana-Champaign
  • Zihan Wang, University of Illinois Urbana-Champaign
  • Veronika Dubinkina, Gladstone Institute for Data Science and Biotechnology
  • Sergei Maslov, University of Illinois Urbana-Champaign

Presentation Overview:

Microbial communities are remarkably complex and encompass an incredible diversity of bacteria, archaea, fungi, and viruses. The metabolic functions of individual community members are at the core of this complexity. Despite the immense taxonomic diversity of microorganisms, there is a notable degree of functional redundancy and universality across taxa. Here, we leverage the recent advances in large language models (LLMs) and a large database of microbial community profiles across a wide range of environments to learn the principles of functional universality. In particular, we introduce MC-Funcformer, a novel approach that pretrains a language model on information extracted from microbial functional abundance data. By using functional information in lieu of taxonomic information, we are able to represent microbial community profiles derived from diverse environments. Our findings highlight the utility of MC-Funcformer in predicting various metadata associated with microbial communities, including host phenotypes and environmental properties. Notably, the embeddings generated by MC-Funcformer outperform traditional approaches based solely on functional abundance vectors, improving predictions of host diseases, diet, and environmental parameters. Furthermore, our analysis reveals the universal nature of the embeddings, enabling generalization across different microbiomes and prediction tasks.

July 13, 2024
15:10-15:30
Assessing Microbial Genome Representation Across Various Reference Databases: A Comprehensive Evaluation
Confirmed Presenter: Serghei Mangul, University of Southern California, United States
Track: MICROBIOME

Room: 520c
Format: In Person
Moderator(s): Mihai Pop


Authors List:

  • Grigore Boldirev, Georgia State University
  • Nitesh Sharma, University of Southern California
  • Viorel Munteanu, Technical University of Moldova
  • Arun Bhavatharini, University of Southern California
  • David Koslicki, Pennsylvania State University
  • Alex Zelikovsky, Georgia State University
  • Serghei Mangul, University of Southern California

Presentation Overview:

Metagenomics research can provide significant insights into the composition, diversity and functions of mixed microbial communities found in various environments. To identify bacterial species, reads from samples are mapped to references that are found in bacterial reference databases. Multiple references may be assigned the same taxonomic identifiers, yet these references may contain different genomic information. This project was designed to uncover and correct inconsistencies in bacterial reference databases by comparing species names and genomic representation for the two most commonly used bacterial reference databases, PATRIC and RefSeq. NCBI’s taxonomic identifiers were utilized to assess the agreement of reference databases at the species rank. The same species was identified across the two databases by finding the same taxID in both. Comparison of genomic representation across databases was performed using the BLAST tool. After finding the exact same strain, all the contigs from one database were aligned to all contigs from the other. This analysis was extended to all overlapping species where strain information was available. The results of the study revealed substantial discrepancies across the databases in the presence of bacterial species. 12.4% of species are present in all three databases, 29.49% are found only in two databases, and 58.46% are found only in one database. To compare genomic representation, we visualized gathered data on all observed alignment cases, showing that the quality of reference genomes can be improved through consolidation of contigs. This evaluation is a fundamental step towards creating a comprehensive reference database that will substantially improve the accuracy of metagenomics research.
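The taxID-based agreement check described in this overview can be sketched as a set comparison. This is only an illustration under stated assumptions: the function name, the database labels, and the integer taxIDs below are hypothetical placeholders, and no real PATRIC or RefSeq data is used.

```python
from collections import Counter


def presence_breakdown(databases):
    """Fraction of species (taxIDs) found in exactly 1, 2, ... databases.

    `databases` maps a database name to the set of species-level taxIDs
    it contains. Returns {number_of_databases: fraction_of_all_taxIDs}.
    """
    seen_in = Counter()
    for taxids in databases.values():
        seen_in.update(set(taxids))  # each database counts a taxID once
    total = len(seen_in)
    if total == 0:
        return {}
    by_count = Counter(seen_in.values())
    return {n: by_count[n] / total for n in sorted(by_count)}


# Hypothetical toy inputs; the integers stand in for NCBI taxIDs.
toy_databases = {
    "db_a": {1, 2, 3, 4},
    "db_b": {3, 4, 5},
}

shared = toy_databases["db_a"] & toy_databases["db_b"]
breakdown = presence_breakdown(toy_databases)
print(sorted(shared), breakdown)
```

Here two of the five toy taxIDs appear in both databases and three appear in only one. The same function accepts any number of databases.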

July 13, 2024
15:30-16:00
A rigorous benchmarking of methods for SARS-CoV-2 lineage abundance estimation in wastewater
Confirmed Presenter: Victor Gordeev, Department of Computers, Informatics
Track: MICROBIOME

Room: 520c
Format: Live Stream
Moderator(s): Mihai Pop


Authors List:

  • Victor Gordeev, Department of Computers
  • Viorel Munteanu, Department of Computers
  • Shelesh Agrawal, Department of Civil and Environmental Engineering Sciences
  • Martin Hölzer, Genome Competence Center (MF1)
  • Adam Smith, Astani Department of Civil and Environmental Engineering University of Southern California
  • Dumitru Ciorba, Department of Computers
  • Serghei Mangul, Titus Family Department of Clinical Pharmacy

Presentation Overview:

Wastewater genomic surveillance of SARS-CoV-2 has emerged as a scalable, cost-effective, passive surveillance tool to monitor viral variants circulating in the human population. However, accurate estimation of viral lineage prevalence in communities relies on the performance of computational methods for analyzing wastewater sequencing data. We perform a comprehensive benchmarking of bioinformatics methods designed for estimating the relative abundance of SARS-CoV-2 (sub)lineages from wastewater sequencing data, along with RNA-Seq and metagenomics methods repurposed for this task. We systematically compare the accuracy of these computational methods in estimating the relative abundances of the (sub)lineages present in a sample, including closely related and low-abundance (sub)lineages. Our preliminary results on simulated data and a few computational methods show that RNA-Seq methods RSEM (most accurate), Kallisto, and Salmon consistently achieve lower L1 errors for lineages and particularly for sublineages when compared to wastewater-surveillance methods Alcov and PiGx. In particular, while the distribution of absolute errors for lineages is similar, for sublineages, roughly 80% of the absolute error values for RSEM, Kallisto, and Salmon are lower than 0.04%, compared to roughly 75% for Alcov and only 25% for PiGx. In addition to extensive simulated data, we will use in vitro mixtures of (sub)lineages of various complexity prepared from synthetic RNA genomes or inactivated viral particles and sequenced using short-read and long-read technologies. Using different experimental strategies, we will also investigate how the performance of these computational methods is impacted by the wastewater matrix or wastewater nucleic acid background, but also by the design of the sequencing experiment. 
Our study will inform the selection of the most accurate, robust, and sensitive methods for SARS-CoV-2 lineage prevalence estimation to enable effective wastewater-based genomic surveillance.
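The L1 and absolute errors used to compare methods in this overview can be written out as a short sketch. The function names and the lineage labels are hypothetical placeholders, and abundances are assumed to be fractions that sum to 1 per sample; the benchmark's own scoring code may differ.

```python
def absolute_errors(true_abund, est_abund):
    """Per-lineage absolute error between true and estimated relative abundances.

    A lineage missing from either mapping is treated as having abundance 0.
    """
    lineages = true_abund.keys() | est_abund.keys()
    return {
        name: abs(est_abund.get(name, 0.0) - true_abund.get(name, 0.0))
        for name in lineages
    }


def l1_error(true_abund, est_abund):
    """L1 error: the sum of per-lineage absolute errors for one sample."""
    return sum(absolute_errors(true_abund, est_abund).values())


# Toy sample with hypothetical lineage labels.
truth = {"lineage_A": 0.6, "lineage_B": 0.4}
estimate = {"lineage_A": 0.5, "lineage_B": 0.3, "lineage_C": 0.2}

errors = absolute_errors(truth, estimate)
total = l1_error(truth, estimate)
print(errors, total)
```

The estimate is penalized both for misjudging the two true lineages and for reporting a lineage that is absent from the sample.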

July 13, 2024
15:30-16:00
MetaViz: Realistic assortment of novel metagenomics benchmarks with diverse biological and technological characteristics
Confirmed Presenter: Nitesh Kumar Sharma, Department of Clinical Pharmacy, University of Southern California
Track: MICROBIOME

Room: 520c
Format: Live Stream
Moderator(s): Mihai Pop


Authors List:

  • Nitesh Kumar Sharma, Department of Clinical Pharmacy
  • Karishma Chhugani, Department of Clinical Pharmacy
  • Viorel Munteanu, Department of Computers
  • Pavel Skums, School of Computing
  • Alex Zelikovsky, Department of Computer Science
  • Serghei Mangul, Department of Clinical Pharmacy

Presentation Overview:

Metagenomics research relies heavily on bioinformatics methods for analyzing complex microbial communities, necessitating rigorous validation through benchmarking. However, creating high-quality experimental benchmarks can be costly and challenging. Current benchmarking efforts often rely on limited gold-standard samples or synthetic data, hindering comprehensive evaluations. To address this, we propose MetaViz, a tool for generating semi-real novel metagenomics benchmarks through in silico modification of existing experimental data. MetaViz offers a cost-effective alternative, combining elements of real data with simulated modifications, surpassing the limitations of purely simulated datasets. Our tool allows precise control over sample composition, diversity, and technological characteristics, enhancing benchmarking accuracy and applicability. We applied MetaViz to over 27 real metagenomics benchmarks, including in-vitro viral mock communities and intra-host clinical samples. Our tool allowed us to precisely control the composition and the abundance of microbial genomes in the in-vitro mixtures (mock community). We were also able to adjust their relative abundance with varying frequency ranging from 0.1% to 10%. Leveraging reference mapping, we introduced varying errors within the read data, thereby enhancing reliability. Our method introduces a novel approach to benchmarking in metagenomics, particularly valuable where traditional gold-standard creation is impractical. By capturing the complexity of actual datasets, MetaViz produces semi-real benchmarks that encompass a broader range of clinical and technological characteristics, ultimately enhancing benchmarking comprehensiveness. Adoption of our approach promises to significantly improve benchmarking studies' robustness and accuracy, advancing our understanding of microbial communities across diverse biological contexts.

July 13, 2024
15:30-16:00
Phage Host Prediction Using Novel Global-Scale Phage-Host Interaction Atlas and Genomic Language Models
Confirmed Presenter: Jonas Grove, Phase Genomics, United States
Track: MICROBIOME

Room: 520c
Format: In Person
Moderator(s): Mihai Pop


Authors List:

  • Jonas Grove, Phase Genomics
  • Samuel Bryson, Phase Genomics
  • Benjamin Auch, Phase Genomics
  • Bradley Nelson, Phase Genomics
  • Cristiana Carpinteiro, Loka
  • Zach Sisson, Phase Genomics
  • Demi Glidden, Phase Genomics
  • Emily Reister, Phase Genomics
  • Ivan Liachko, Phase Genomics

Presentation Overview:

Viruses, including bacteriophages and archaeal viruses, are the most abundant form of life on earth (10^31), interacting with all life and shaping the global ecosystem. However, phage-host relationships have proven challenging to identify without culture-based experiments to generate unambiguous evidence for a phage’s presence in a given host. These experiments inherently require that all hosts are culturable, restricting the microbial diversity that can be surveyed.

Proximity ligation sequencing is a powerful metagenomic method for associating viruses with their hosts directly in native microbial communities. Proximity ligation captures, in vivo, physical interactions between the host microbial genome and the genetic material of both lytic and lysogenic phages. These linkages offer direct evidence that phage sequences were present within an intact host cell, establishing a phage-host pair without the propagation of living bacterial cells. The combination of intra-phage and phage-host signal enables us to simultaneously deconvolve viral and microbial genomes directly from metagenomes, and to assign microbial hosts to large numbers of viruses without culturing.

Our application of this technology to thousands of complex microbiome samples has yielded host assignments for hundreds of thousands of novel phage and archaeal viruses. Utilizing our expanded phage-host interaction training data, and leveraging advancements made in the field of natural language processing (NLP) and genomic large language models (LLMs), we have developed deep learning networks that model the dynamics between phages and microbial hosts at sequence-level resolution. We will report published and unpublished work highlighting the power of this approach in the field of metagenomic discovery.

July 13, 2024
16:40-17:00
Proceedings Presentation: Towards more accurate microbial source tracking via non-negative matrix factorization (NMF)
Confirmed Presenter: Yanni Sun, City University of Hong Kong, Hong Kong
Track: MICROBIOME

Room: 520c
Format: In Person
Moderator(s): Serghei Mangul


Authors List:

  • Ziyi Huang, City University of Hong Kong
  • Dehan Cai, City University of Hong Kong
  • Yanni Sun, City University of Hong Kong

Presentation Overview:

Motivation: The microbiome of a sampled habitat often consists of microbial communities from various sources, including potential contaminants. Microbial source tracking (MST) can be used to discern the contribution of each source to the observed microbiome data, thus enabling the identification and tracking of microbial communities within a sample. Therefore, MST has various applications, from monitoring microbial contamination in clinical labs to tracing the source of pollution in environmental samples. Despite promising results in MST development, there is still room for improvement, particularly for applications where precise quantification of each source’s contribution is critical.
Results: In this study, we introduce a novel tool called SourceID-NMF towards more precise microbial source tracking. SourceID-NMF utilizes a non-negative matrix factorization (NMF) algorithm to trace the microbial sources contributing to a target sample, without assuming specific probability distributions. By leveraging the taxa abundance in both available sources and the target sample, SourceID-NMF estimates the proportion of available sources present in the target sample. To evaluate the performance of SourceID-NMF, we conducted a series of benchmarking experiments using simulated and real data. The simulated experiments mimic realistic yet challenging scenarios for identifying highly similar sources, irrelevant sources, unknown sources, low abundance sources, and noise sources. The results demonstrate the superior accuracy of SourceID-NMF over existing methods. Particularly, SourceID-NMF accurately estimated the proportion of irrelevant and unknown sources while other tools either over- or under-estimated them. Additionally, the noise sources experiment also demonstrated the robustness of SourceID-NMF for MST.
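The idea of estimating source proportions under non-negativity constraints can be illustrated with a minimal sketch. It is not SourceID-NMF itself: it holds the source profiles fixed, ignores unknown sources, and applies only the Lee-Seung multiplicative update for the mixing weights. The function name and the toy abundance vectors are assumptions made for illustration.

```python
def estimate_source_proportions(sources, target, iterations=500, eps=1e-12):
    """Non-negative mixing weights w that minimize ||target - sum_i w_i * sources[i]||^2.

    sources: list of taxa-abundance vectors, one per known source.
    target:  taxa-abundance vector of the sample being traced.
    Uses the multiplicative update w <- w * (S^T x) / (S^T S w), which
    keeps every weight non-negative without an explicit projection.
    """
    k = len(sources)
    # Gram matrix S^T S and projection S^T x, computed once up front.
    gram = [
        [sum(a * b for a, b in zip(sources[i], sources[j])) for j in range(k)]
        for i in range(k)
    ]
    proj = [sum(a * b for a, b in zip(sources[i], target)) for i in range(k)]

    weights = [1.0 / k] * k  # start from a uniform mixture
    for _ in range(iterations):
        denom = [sum(gram[i][j] * weights[j] for j in range(k)) for i in range(k)]
        weights = [weights[i] * proj[i] / (denom[i] + eps) for i in range(k)]
    return weights


# Two hypothetical source profiles over four taxa, and a target that is
# a 30% / 70% mixture of them.
source_profiles = [
    [0.7, 0.2, 0.1, 0.0],
    [0.1, 0.1, 0.3, 0.5],
]
target_profile = [0.28, 0.13, 0.24, 0.35]

proportions = estimate_source_proportions(source_profiles, target_profile)
print(proportions)  # approximately [0.3, 0.7]
```

A full NMF formulation would also update the source profiles and include a component for unknown sources, which this sketch deliberately leaves out.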

July 13, 2024
17:00-17:15
Carbohydrate-active enzyme annotation in microbiomes using dbCAN3
Confirmed Presenter: Yanbin Yin, University of Nebraska - Lincoln, United States
Track: MICROBIOME

Room: 520c
Format: In Person
Moderator(s): Serghei Mangul


Authors List:

  • Yanbin Yin, Yanbin Yin, University of Nebraska - Lincoln

Presentation Overview:

Carbohydrate active enzymes (CAZymes) are made by various organisms for complex carbohydrate metabolism. Genome mining of CAZymes has become a routine data analysis in microbiome sequencing projects, owing to the importance of CAZymes in bioenergy, microbiome, nutrition, agriculture, and global carbon recycling. In 2012, dbCAN was provided as an online web server for automated CAZyme annotation. dbCAN2 (https://bcb.unl.edu/dbCAN2) was further developed in 2018 as a meta server to combine multiple tools for improved CAZyme annotation. dbCAN2 also included CGC-Finder, a tool for identifying CAZyme gene clusters (CGCs) in (meta-)genomes. We have updated the meta server to dbCAN3 with the following new functions and components: (i) dbCAN-sub as a profile Hidden Markov Model database (HMMdb) for substrate prediction at the CAZyme subfamily level; (ii) searching against experimentally characterized polysaccharide utilization loci (PULs) with known glycan substrates of the dbCAN-PUL database for substrate prediction at the CGC level; (iii) a majority voting method to consider all CAZymes with substrate predicted from dbCAN-sub for substrate prediction at the CGC level; (iv) improved data browsing and visualization of substrate prediction results on the website. In summary, dbCAN3 not only inherits all the functions of dbCAN2, but also integrates three new methods for glycan substrate prediction in microbiome sequencing data.

Publication: https://academic.oup.com/nar/article/51/W1/W115/7147496
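The majority voting idea in item (iii) of the abstract above can be sketched in a few lines. This is a minimal illustration, not dbCAN3's code: the function name, the input format (one predicted substrate per CAZyme, or None when there is no prediction), and the rule of returning no call on a tie are assumptions, since the abstract does not state the exact voting criteria.

```python
from collections import Counter


def vote_cgc_substrate(cazyme_substrates):
    """Pick a CGC-level substrate by majority vote over its CAZymes.

    cazyme_substrates: one entry per CAZyme in the cluster, either a
    substrate name or None when no substrate was predicted.
    Returns the winning substrate, or None if there are no votes or the
    top two substrates tie (tie handling is an assumption of this sketch).
    """
    # Only CAZymes with a predicted substrate get a vote.
    votes = Counter(s for s in cazyme_substrates if s is not None)
    if not votes:
        return None
    ranked = votes.most_common()
    top_substrate, top_count = ranked[0]
    # Refuse to call a substrate when the leading candidates are tied.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_substrate


# Hypothetical cluster: four CAZymes, three with a predicted substrate.
print(vote_cgc_substrate(["xylan", "xylan", "pectin", None]))
```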

July 13, 2024
17:15-17:30
NUTRIclock, NEURAL NETWORKS ANALYSIS OF MICROBIOME FOR IMPLEMENTING PRECISION NUTRITION IN AGING.
Confirmed Presenter: Adrian Martin-Segura, IMDEA Food Institute, 28049 Madrid
Track: MICROBIOME

Room: 520c
Format: In Person
Moderator(s): Serghei Mangul


Authors List:

  • Adrian Martin-Segura, Adrian Martin-Segura, IMDEA Food Institute
  • Laura J. Marcos-Zambrano, Laura J. Marcos-Zambrano, IMDEA Food Institute
  • Blanca Lacruz-Pleguezuelos, Blanca Lacruz-Pleguezuelos, IMDEA Food Institute
  • Alberto Diaz-Ruiz, Alberto Diaz-Ruiz, IMDEA Food Institute
  • Enrique Carrillo de Santa Pau, Enrique Carrillo de Santa Pau, IMDEA Food Institute

Presentation Overview:

Aging is the greatest risk factor for the development of chronic diseases like neurodegenerative disorders or cancer. The increase in life expectancy makes it extremely urgent to understand and develop mechanisms to promote healthy aging. In that sense, the microbiome has emerged as an important factor in aging processes. The gut microbiome comprises all gut microorganisms, and multiple lines of scientific evidence point towards microbiome alterations as important events in disease onset. Moreover, different nutritional interventions have proven effective at modulating the human microbiome and physiology due to their versatility. However, their effectiveness is highly dependent on the individual’s genomic and metagenomic background. We propose using neural network (NN) algorithms and ~3700 whole genome shotgun samples from public repositories to elucidate microbiome changes with aging. Using the samples’ microbial profiles (relative abundance and gene families), we are developing NUTRIclock, an aging clock based on the microbiome. We are trying different NN architectures (convolutional, GraphNN) to find the one that best accommodates the particularities of microbiome data, encoding in it the microbial and metabolic changes observed during aging. NUTRIclock will serve i) as a tool to calculate the biological age of a patient, as opposed to his/her chronological age; and ii) to find microbiome patterns altered in patients that could be driving potential effects of that biological age. This will allow personalized nutritional interventions to be implemented according to a patient’s microbiome profile in order to influence its dynamics, advancing the fields of precision nutrition and personalized medicine.

July 13, 2024
17:30-17:40
MIOSTONE: Modeling microbiome-trait associations with taxonomy-adaptive neural networks
Confirmed Presenter: Yifan Jiang, Cheriton School of Computer Science, University of Waterloo
Track: MICROBIOME

Room: 520c
Format: In Person
Moderator(s): Serghei Mangul


Authors List:

  • Yifan Jiang, Yifan Jiang, Cheriton School of Computer Science
  • Matthew Aton, Matthew Aton, School of Life Sciences
  • Qiyun Zhu, Qiyun Zhu, School of Life Sciences
  • Yang Lu, Yang Lu, Cheriton School of Computer Science

Presentation Overview:

The human microbiome, a complex ecosystem of microorganisms inhabiting the body, plays a critical role in human health. Investigating its association with host traits is essential for understanding its impact on various diseases. Although shotgun metagenomic sequencing technologies have produced vast amounts of microbiome data, analyzing such data is highly challenging due to its sparsity, noisiness, and high feature dimensionality. Here we propose MIOSTONE, a novel machine learning method that leverages the intercorrelation of microbiome features due to their phylogeny-based taxonomic relationships. MIOSTONE employs a novel taxonomy-encoded deep neural network (DNN) architecture that harnesses the capabilities of DNNs with mitigated concerns of overfitting. In addition, MIOSTONE has the ability to determine whether taxa within the corresponding taxonomic group provide a better explanation in a data-driven manner. We empirically assessed MIOSTONE's accuracy and interpretability on various real microbiome datasets, demonstrating its competitive performance and interpretability compared to existing methods.

July 13, 2024
17:40-18:00
Invited Presentation: Critical Assessment of Metagenome Interpretation - Updates and Future Benchmarking Challenges
Confirmed Presenter: Alice McHardy
Track: MICROBIOME

Room: 520c
Format: Live Stream
Moderator(s): Serghei Mangul


Authors List:

  • Alice McHardy