Posters - Schedules


Monday, July 11 and Tuesday, July 12 between 12:30 PM CDT and 2:30 PM CDT
Wednesday, July 13 between 12:30 PM CDT and 2:30 PM CDT

Session A Poster Set-up and Dismantle
Session A Posters set up:
Monday, July 11 between 7:30 AM CDT and 10:00 AM CDT
Session A Posters dismantle:
Tuesday, July 12 at 6:00 PM CDT

Session B Poster Set-up and Dismantle
Session B Posters set up:
Wednesday, July 13 between 7:30 AM CDT and 10:00 AM CDT
Session B Posters dismantle:
Thursday, July 14 at 2:00 PM CDT

Virtual: A rat microbial BodyMap across 11 tissue types and 4 developmental stages
COSI: MICROBIOME
  • Lan Zhao, Stanford University, United States
  • Mark Nicolls, Stanford University, United States



The rat has been widely used as a model in a variety of fields. To determine whether specific microbiome patterns and signatures are associated with different developmental stages in rat organs, we systematically analyzed a cohort of RNA-Seq samples generated by the SEQC consortium from 11 organs of juvenile, adolescent, adult and aged healthy Fischer 344 rats. Raw sequencing data were mapped to the rat reference genome (Ensembl release 104) with the STAR aligner (v2.7.9a). The unmapped reads were then subjected to microbial taxonomic classification based on exact k-mer matches using Kraken (v2.0.8-beta). A biclustering algorithm based on consensus non-negative matrix factorization (cNMF) was applied to identify rat-microbe interaction patterns and signatures. A total of 4,647 taxa were identified across all rat tissue types, and 4 rat-microbe interaction patterns were subsequently determined. The lung's microbial profiles showed distinct separation among clusters and shifted significantly across developmental stages. Our study performed taxonomic profiling of 11 rat tissue types at four stages of development. The four identified rat-organ clusters, with 362 microbial signatures, may be useful for accessing the unculturable microbial communities and facilitating discovery of the roles the microbiome plays in human health.

Virtual: Genome-Metagenome Similarity Graph: a Containment-Based Approach to Metagenome Comparison
COSI: MICROBIOME
  • Isaac Thomas, Pennsylvania State University, United States
  • David Koslicki, Pennsylvania State University, United States



Well-known similarity/distance functions like the Jaccard Index, the UniFrac metric, and Bray-Curtis dissimilarity form a cornerstone of comparative metagenomics, enabling informative clustering and similarity network construction over real-world microbial community data. However, such methods are sensitive to sequencing noise in the first case or suffer indirect performance penalties due to the need for abundance profiles in the latter two cases. To address these shortcomings, we propose a containment index-based method that combines the robustness of reference-based techniques with the speed of locality-sensitive $k$-mer hashing approaches. We introduce a Genome-Metagenome Similarity Graph (GMSG) which relates two $k$-merized metagenomic samples by the containments of $k$-merized reference genomes within them, estimated efficiently using containment MinHash. From the GMSG's stored information, we then derive a maximum-flow-inspired similarity function which we call GMSG flow similarity. Through simulating microbial community read data with realistic sequencing noise, we found that the GMSG flow similarity scored pairs of communities more accurately than the Jaccard Index did. We plan to add the UniFrac Metric and Bray-Curtis dissimilarity to this comparison complete with performance profiling, then determine clustering purity for these functions on human microbiome data.
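
To illustrate why a containment-based measure can behave differently from the Jaccard Index, here is a minimal sketch that computes both on exact k-mer sets. It is not the authors' GMSG implementation: the abstract estimates containments with containment MinHash, whereas this toy uses exact sets, and the sequences and the value of k are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Shared k-mers divided by all k-mers in either set (symmetric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of the k-mers in a that also occur in b (asymmetric)."""
    return len(a & b) / len(a) if a else 0.0


# Hypothetical toy data: a short "reference genome" embedded in a larger "sample".
genome = "ACGTACGTGA"
sample = "TTACGTACGTGACCGGTTAA"
k = 4

genome_kmers = kmers(genome, k)
sample_kmers = kmers(sample, k)

print("containment:", containment(genome_kmers, sample_kmers))
print("jaccard:    ", jaccard(genome_kmers, sample_kmers))
```

The genome is fully contained in the sample, so its containment is 1, while the Jaccard Index is pulled down by the sample's extra k-mers. This size sensitivity is one reason containment is attractive when comparing a small genome against a large metagenome.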

Virtual: Machine-learning based identification of discriminatory microbial features for the classification of a diarrheal gut microbiota
COSI: MICROBIOME
  • Indumathi Palanikumar, Indian Institute of Technology Madras, India
  • Karthik Raman, Indian Institute of Technology Madras, India
  • Himanshu Sinha, Indian Institute of Technology Madras, India



Diarrhea, an enteric disease, is the second leading cause of under-5 mortality worldwide. This work explores the influence of enteric pathogens on the infant gut microbiota and investigates probable microbial candidates for gut microbiota recovery. Gut microbiome data from healthy and diarrhea-infected children from South-East Asian and African countries showed age- and condition-dependent variation in microbiome diversity. To differentiate between healthy and diseased states, we employed a supervised classification method, Random Forest, to identify several discriminatory microbial species, which allowed us to study country- and age-group-specific influences on microbiota development. These results suggested that microbial community disruption depends on the type of enteric pathogen and the existing community structure. Differential functional pathway analyses between the healthy and diseased microbial communities showed variation in the glycan degradation pathway due to the loss of glycan-degrading microbes during pathogen colonization. We are currently employing constraint-based modeling methods to study metabolism in the discriminatory microbial species, in order to decipher variations in metabolic interactions and identify potential microbial signatures for enhancing gut health.

Virtual: MetaBinner: a high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial communities
COSI: MICROBIOME
  • Ziye Wang, Fudan University, China
  • Pingqin Huang, Fudan University, China
  • Ronghui You, Fudan University, China
  • Fengzhu Sun, University of Southern California, United States
  • Shanfeng Zhu, Fudan University, China



Binning aims to recover microbial genomes from metagenomic data. For complex metagenomic communities, the available binning methods are far from satisfactory: they usually do not fully use different types of features and important biological knowledge. We developed a novel ensemble binner, MetaBinner, which generates component results from multiple types of features using k-means and utilizes single-copy gene (SCG) information for initialization. It then employs a two-stage ensemble strategy based on SCGs to integrate the component results efficiently and effectively. Extensive experimental results on three large-scale simulated datasets and one real-world dataset demonstrate that MetaBinner significantly outperforms state-of-the-art binners. MetaBinner is freely available at https://github.com/ziyewang/MetaBinner.

Virtual: PLR-GEN: a method for generating pseudo-long reads from metagenome short reads
COSI: MICROBIOME
  • Mikang Sim, Konkuk University, South Korea
  • Jongin Lee, Konkuk University, South Korea
  • Suyeon Wy, Konkuk University, South Korea
  • Nayoung Park, Konkuk University, South Korea
  • Daehwan Lee, Konkuk University, South Korea
  • Daehong Kwon, Konkuk University, South Korea
  • Jaebum Kim, Konkuk University, South Korea



Metagenome assembly using high-throughput sequencing data is a powerful method for constructing microbial genomes from environmental samples without cultivation. However, metagenome assembly is a complex and challenging task because the metagenome is a mixture of genomes from multiple microorganisms. Although long-read sequencing technologies have been developed and are beginning to be used for metagenome assembly, many metagenome studies have been performed with short reads. In this study, we present a new method called PLR-GEN, which creates pseudo-long reads from metagenome short reads based on given reference genome sequences by considering small sequence variations existing in individual genomes of the same or different species. When applied to a mock community dataset from the Human Microbiome Project, PLR-GEN extended 101 bp short reads to pseudo-long reads with an N50 of 33 Kbp and a 0.4% error rate. Using these pseudo-long reads improved metagenome assembly in terms of the number of sequences, assembly contiguity, and prediction of species and genes. PLR-GEN can be used to generate artificial long-read sequences without extra sequencing cost, thus aiding various studies using metagenomes.
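
For readers unfamiliar with the N50 statistic quoted above, the following sketch shows the usual definition: the length L such that reads (or contigs) of length at least L account for at least half of all bases. The function name and the example lengths are illustrative and are not taken from PLR-GEN.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length. Returns 0 for an empty input.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Hypothetical read lengths in bp.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

Because N50 is weighted by length, a few very long sequences raise it sharply, so it is best read alongside the error rate and the number of sequences, as the abstract does.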

V-001: Lasonolide A is synthesized by a trans-AT PKS pathway present in an uncultured Verrucomicrobiota
COSI: MICROBIOME
  • Jackie Metz, Florida Atlantic University, Harbor Branch Oceanographic Institute, United States
  • René Xavier, Florida Atlantic University, Harbor Branch Oceanographic Institute, United States
  • Guojun Wang, Valent BioSciences, United States
  • Amy Wright, Florida Atlantic University, Harbor Branch Oceanographic Institute, United States
  • Jason Kwan, University of Wisconsin-Madison, United States
  • Siddharth Uppal, Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, United States



Lasonolide A (LSA) is a bioactive polyketide isolated from the marine sponge Forcepia sp. LSA exhibits potent anticancer activity against certain cell lines in the National Cancer Institute-60 cell line screen. Furthermore, LSA acts by a unique mechanism, making it an excellent drug lead. However, the limited supply of the sponge and the laborious chemical synthesis of LSA have hampered its transition into clinical trials. Identifying the LSA producer and elucidating the biosynthetic genes that produce LSA could allow researchers to access higher quantities of LSA, thus facilitating its progression into the clinic.
We analyzed the metagenome of Forcepia sp. and uncovered a putative trans-AT PKS pathway (las BGC) proposed to produce LSA. The las BGC was found in a bacterium belonging to a novel genus of the phylum Verrucomicrobiota, which we named “Candidatus Thermopylae lasonolidus”. The significantly different pentanucleotide (5-mer) composition and GC percentage of the las BGC compared with the “Ca. T. lasonolidus” genome suggest horizontal acquisition of the gene cluster. Mapping of paired-end reads and analysis of the assembly graph revealed three copies of the las BGC in “Ca. T. lasonolidus”. The three repeats were not identical and contained differences including insertions and single nucleotide polymorphisms.

V-002: Unlocking capacities of genomics for the COVID-19 response and future pandemics
COSI: MICROBIOME
  • Karishma Chhugani, University of Southern California, United States
  • Serghei Mangul, University of Southern California, United States
  • Sergey Knyazev, University of California, Los Angeles, United States
  • Varuni Sarwal, University of California, Los Angeles, United States
  • Ram Ayyala, University of Southern California, United States
  • Angela Lu, University of Southern California, United States
  • Adam Smith, University of Southern California, United States



During the COVID-19 pandemic, genomics and bioinformatics have emerged as essential public health tools. The genomic data acquired using these methods have supported the global health response, facilitated development of testing methods, and allowed timely tracking of novel SARS-CoV-2 variants. Yet the virtually unlimited potential for rapid generation and analysis of genomic data is also coupled with unique technical, scientific, and organizational challenges. Here, we present the application of genomic and computational methods for an efficient, data-driven COVID-19 response, the advantages of democratizing viral sequencing around the world, and the challenges associated with viral genome data collection and processing.

V-003: Adversarial and variational autoencoders improve metagenomics binning
COSI: MICROBIOME
  • Pau Piera Lindez, University of Copenhagen, Novo Nordisk Foundation Center for Protein Research, Denmark
  • Joachim Johansen, University of Copenhagen, Novo Nordisk Foundation Center for Protein Research, Denmark
  • Jakob Nybo Nissen, University of Copenhagen, Novo Nordisk Foundation Center for Protein Research, Denmark
  • Simon Rasmussen, University of Copenhagen, Novo Nordisk Foundation Center for Protein Research, Denmark



Reconstruction of high-quality genomes from metagenomic samples is a hard problem, often resulting in highly fragmented genome assemblies. Metagenomic binning allows us to reconstruct genomes by re-grouping sequences by their organism of origin, and is thus a crucial bottleneck for exploring biological diversity in metagenomic samples. Here we present Adversarial Autoencoders for Metagenomics Binning (AAMB), a deep learning approach that integrates sequence co-abundances and tetranucleotide frequencies into a common denoised space that enables precise clustering of sequences into microbial genomes. When benchmarked, AAMB achieved similar or better results compared with the state-of-the-art binner VAMB, reconstructing 0-35% and 5-10% more near-complete (NC) genomes on simulated and real data, respectively. When integrating VAMB and AAMB NC bins with dRep, we obtained, on average, 30% additional NC bins across simulated and real datasets. In addition, the VAMB-AAMB integrated bins had higher completeness, greater taxonomic diversity, and covered a wider range of sample prevalence compared with VAMB. Finally, we implemented a pipeline integrating VAMB, AAMB, and dRep that enables efficient binning and integration without extensive additional runtime.
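
As a rough illustration of the tetranucleotide frequency feature mentioned above, the sketch below counts all 256 possible 4-mers in a sequence and normalizes them to frequencies. This is a simplified assumption for exposition, not the featurization used by AAMB or VAMB, which may differ (for example in how reverse complements are handled).

```python
from itertools import product

# All 256 possible 4-mers over the DNA alphabet, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return a 256-element list of 4-mer frequencies for seq.

    Windows containing characters other than A, C, G, T (such as N)
    are skipped. An all-zero vector is returned if no window is valid.
    """
    counts = dict.fromkeys(TETRAMERS, 0)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        window = seq[i:i + 4]
        if window in counts:
            counts[window] += 1
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in TETRAMERS]


# Hypothetical contig fragment.
freqs = tetranucleotide_frequencies("ACGTACGTNNACGT")
print(len(freqs), round(sum(freqs), 6))
```

Vectors like this capture sequence composition, which tends to be similar for contigs from the same genome; a binner combines them with per-sample abundance information before clustering.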

V-004: Host-microbiome protein-protein interactions capture disease-relevant pathways
COSI: MICROBIOME
  • Juan Felipe Beltrán, Cornell University, United States
  • Ilana Brito, Cornell University, United States
  • Hao Zhou, Cornell University, United States



Host-microbe interactions are crucial for normal physiological and immune system development and are implicated in a variety of diseases. To identify potential pathways through which human-associated bacteria impact host health, we leverage publicly-available interspecies protein-protein interaction (PPI) data to find clusters of microbiome-derived proteins with high sequence identity to known human-protein interactors. We observe differential targeting of putative human-interacting bacterial genes in nine independent metagenomic studies, finding evidence that the microbiome broadly targets human proteins involved in immune, oncogenic, apoptotic, and endocrine signaling pathways in relation to IBD, CRC, obesity, and T2D diagnoses. This host-centric analysis provides a mechanistic hypothesis-generating platform and extensively adds human functional annotation to commensal bacterial proteins.

V-005: The Gastric Microbiome and Gastric Carcinogenesis: Bacteria diversity, Co-occurrence patterns and Predictive Models
COSI: MICROBIOME
  • Edwin Moses Appiah, Department of Biochemistry and Biotechnology, KNUST, Ghana
  • Samson Pandam Salifu, Department of Biochemistry and Biotechnology, KNUST, Ghana



Changes in microbiome composition and interactions have been implicated in gastric cancer development. Many studies aimed at understanding how the microbiome affects the pathogenesis of the disease have provided relevant yet varying results. We present a comprehensive analysis of the gastric microbiome in gastric carcinogenesis, focusing on bacterial diversity, co-occurrence patterns, and ultimately the identification of potential microbial biomarkers. We combined raw 16S rRNA data from six studies comprising 985 samples from healthy individuals and individuals with gastritis, intestinal metaplasia, or cancer. Batch effects were corrected with the Herman package in R. Proteobacteria composition and diversity decreased, and Actinobacteria increased, with carcinogenesis. Transient oral pathogenic and intestinal bacteria, including Prevotella, Propionibacterium acnes, Acinetobacter baumannii, Lactobacillus, and Gordonia polyisoprenivorans, were highly enriched with increasing carcinogenesis from gastritis to cancer. Microbial co-occurrence analysis revealed essential keystone species, with Pseudoxanthomonas spadix and Sphingobium represented as hubs in healthy individuals. Filifactor alocis showed significant interaction with the pathogenic bacterium Fusobacterium nucleatum in gastric cancer communities. LASSO models revealed Bacteroides dorei, Hydrogenophilus hirschii, and Propionibacterium granulosum as potential biomarkers for gastric cancer. This study provides significant insight into gastric microbial communities and how they could serve as a potential tool for predicting gastric carcinogenesis.

V-006: Target-enriched long-read sequencing (TELSeq) contextualizes antimicrobial resistance risk in metagenomes.
COSI: MICROBIOME
  • Ilya B. Slizovskiy, University of Minnesota, United States
  • Marco Oliva, University of Florida, United States
  • Jonathen K. Settle, University of Florida, United States
  • Lidiya V. Zyskina, University of Maryland, United States
  • Mattia C. F. Prosperi, University of Florida, United States
  • Christina Boucher, University of Florida, United States
  • Noelle R. Noyes, University of Minnesota, United States



Metagenomic data can be used to profile high-importance functions within microbiomes. However, current metagenomic workflows produce data that suffer from low sensitivity and an inability to accurately reconstruct partial or full genomes. These limitations preclude colocalization analysis, i.e., the ability to characterize the genomic context of genes and functions within a metagenomic sample. Genomic context is especially crucial for functions associated with horizontal gene transfer (HGT) via mobile genetic elements (MGEs), for example antimicrobial resistance (AMR). To overcome this current limitation of metagenomics, we present a method for comprehensive and accurate reconstruction of antimicrobial resistance genes (ARGs) and MGEs from metagenomic DNA, termed target-enriched long-read sequencing (TELSeq).
Using replicates of diverse sample types, we compared TELSeq performance to that of non-enriched PacBio and short-read Illumina sequencing. TELSeq achieved much higher sensitivity than the other methods, revealing an extensive resistome profile comprising many low-abundance ARGs, including some with public health importance. Using the long reads generated by TELSeq, we identified numerous MGEs flanking the low-abundance ARGs, indicating that these ARGs could be transferred across bacterial taxa via HGT.

V-007: PhAME DB: A comprehensive catalog of phage auxiliary metabolic genes
COSI: MICROBIOME
  • Cody Martin, University of Wisconsin-Madison, United States
  • Karthik Anantharaman, University of Wisconsin-Madison, United States



Although bacteria are significant contributors to microbiomes and biogeochemical cycles, bacterial viruses, or phages, are poorly understood in these realms. Phages encode auxiliary metabolic genes (AMGs) that are used to exploit host metabolic niches, including photosynthesis, carbon, nitrogen, and sulfur metabolism, and virulence. However, the global diversity and distribution of AMGs is not well characterized due to the lack of a comprehensive AMG database, downplaying the importance of phage as manipulators of microbial metabolism and biogeochemical cycles. Therefore, we present a comprehensive database of AMGs, PhAME DB, to progress our understanding of the role phages play in modulating bacterial metabolic networks. PhAME DB was constructed from a diverse set of viral databases, including IMG/VR, mammalian gut viromes, PIGEON, and both Global Oceans Viromes. 1.8M out of 7.7M phage protein families (85.3M total sequences) were annotated by remote homology detection using HMMER3 queried against metabolic hidden Markov models from KEGG, dbCAN2, MEROPS, eggNOG, efam, and efam-XC. This database will enable large-scale analyses of AMG diversity, ecology, and evolution and provide context on the impacts and roles of phage in ecosystems and biogeochemistry. Additionally, a web application will be created to allow users to query the database and annotate input proteins. Furthermore, this database will serve as the foundation for the development of tools that 1) enable automated verification of AMGs in a metagenomic context and 2) aid in host prediction using AMG homology to host metabolic genes.

V-008: Applying UniFrac to Whole Genome Shotgun Data
COSI: MICROBIOME
  • Wei Wei, Pennsylvania State University, United States
  • David Koslicki, Pennsylvania State University, United States



The UniFrac metric has proven useful in revealing diversity across metagenomic communities. Because the metric is phylogeny-based, UniFrac has historically been applied only to 16S rRNA data. Meanwhile, whole genome shotgun (WGS) metagenomics has become increasingly widely used and has been shown to provide more information than 16S data, but a UniFrac-like diversity metric suitable for WGS data has not previously been developed. In this study, we demonstrate a method to overcome the absence of phylogenetic information in WGS data by assigning branch lengths to taxonomic trees. We conduct experiments using real and simulated data and test different branch length assignment methods, including reference-free models and taxonomy with phylogenetic information obtained from the GTDB database. Our results show that this WGSUniFrac method is comparably robust to traditional 16S UniFrac and is not highly sensitive to branch length assignments, be they data-derived or model-prescribed. In real WGS human data, WGSUniFrac clusters samples by body site, recapitulating the pattern exhibited in 16S data. Our study provides a method for direct and efficient beta-diversity measurement on WGS data and suggests that UniFrac has the potential to be applied to a wider range of data structures.
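
The idea of computing UniFrac on a taxonomic tree with assigned branch lengths can be sketched on a toy example. The code below implements the classic unweighted UniFrac definition (branch length unique to one sample divided by total branch length covered by either sample); it is not the WGSUniFrac implementation, and the tiny taxonomy and the unit branch lengths are assumptions chosen only for illustration.

```python
def unweighted_unifrac(tree, sample_a, sample_b):
    """Unweighted UniFrac distance between two sets of observed taxa.

    tree maps each non-root node to (parent, branch_length), where the
    branch is the edge from the node up to its parent. The root itself
    does not appear as a key.
    """
    def covered_branches(taxa):
        # Collect every branch on the path from each taxon up to the root.
        branches = set()
        for node in taxa:
            while node in tree:
                branches.add(node)
                node = tree[node][0]
        return branches

    a = covered_branches(sample_a)
    b = covered_branches(sample_b)
    total = sum(tree[n][1] for n in a | b)
    unique = sum(tree[n][1] for n in a ^ b)
    return unique / total if total else 0.0


# Hypothetical taxonomy; every branch is given length 1.0 (an assumption).
toy_tree = {
    "Firmicutes": ("root", 1.0),
    "Proteobacteria": ("root", 1.0),
    "Lactobacillus": ("Firmicutes", 1.0),
    "Clostridium": ("Firmicutes", 1.0),
    "Escherichia": ("Proteobacteria", 1.0),
}

print(unweighted_unifrac(toy_tree,
                         {"Lactobacillus", "Escherichia"},
                         {"Clostridium", "Escherichia"}))
```

The two samples share the Firmicutes, Proteobacteria, and Escherichia branches and differ only in one genus each, so they come out closer than two samples from different phyla would. Changing the branch lengths in `toy_tree` is a simple way to explore how sensitive the distance is to the assignment scheme.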

V-009: Advanced automation platform for microbiome community data analysis
COSI: MICROBIOME
  • Kwangmin Kim, 3BIGS CO.,LTD., South Korea
  • Sathishkumar Natarajan, 3BIGS CO.,LTD., South Korea
  • Nahyun Woo, 3BIGS CO.,LTD., South Korea
  • Bohyeon Park, 3BIGS CO.,LTD., South Korea
  • Hoyong Chung, 3BIGS CO.,LTD., South Korea
  • Junhyung Park, 3BIGS CO.,LTD., South Korea



Improvements in NGS technology have made microbiome community analysis widespread, and it has become a global standard procedure for analyzing the structure and functionality of microbiomes. However, accurately interpreting the resulting sequencing data requires a variety of bioinformatics analysis tools and pipelines. Many analysis programs for metagenome research have recently been released for free, but they are difficult to use because each program has its own characteristics, advantages, and disadvantages. We developed automation pipelines by integrating and linking dozens of analysis tools for metagenome research, and present each result in a graphical format so that researchers can easily understand it. The most important resource in a metagenome study is the database used to identify microbes. We use NCBI's database as well as known 16S rRNA-specific databases, and can also build a customized database for a specific study. We also built an AWS cloud-based automated process that runs from managing sample sequencing data to reporting all analysis results in HTML form. We have already handled thousands of analysis cases, and we expect the platform to be of great help to researchers doing metagenome research.

V-010: Microbiome Preprocessing Machine Learning Pipeline
COSI: MICROBIOME
  • Yoram Louzoun, Bar Ilan University, Israel



Background

16S sequencing results are often used for Machine Learning (ML) tasks. 16S gene sequences are represented as feature counts, which are associated with taxonomic representation. Raw feature counts may not be the optimal representation for ML.

Methods

We checked multiple preprocessing steps and tested the optimal combination for 16S sequencing-based classification tasks. We computed the contribution of each step to the accuracy as measured by the Area Under Curve (AUC) of the classification.

Results

We show that the log of the feature counts is much more informative than the relative counts. We further show that merging features associated with the same taxonomy at a given level, through a dimension reduction step for each group of bacteria, improves the AUC. Finally, we show that z-scoring has a very limited effect on the results.

Conclusions

The preprocessing of microbiome 16S data is crucial for optimal microbiome-based Machine Learning. These preprocessing steps are integrated into the MIPMLP - Microbiome Preprocessing Machine Learning Pipeline, which is available as a stand-alone version at https://github.com/louzounlab/microbiome/tree/master/Preprocess or as a service at http://mip-mlp.math.biu.ac.il/Home. Both contain the code and standard test sets.
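
The three representations compared in the Results (relative counts, log counts, and z-scores) can be sketched as follows. This is not the MIPMLP code: the function names, the pseudocount of 0.1, and the choice of base-10 logarithm are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def relative_abundance(counts):
    """Scale counts so that they sum to 1 (all zeros if the total is 0)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def log_transform(counts, pseudocount=0.1):
    """Base-10 log of each count; the pseudocount keeps zeros finite."""
    return [math.log10(c + pseudocount) for c in counts]


def z_score(values):
    """Center to mean 0 and scale to unit population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


# Hypothetical feature counts for one sample.
counts = [0, 3, 12, 250, 4000]
print(relative_abundance(counts))
print(log_transform(counts))
print(z_score(log_transform(counts)))
```

On skewed count data like this, the relative abundances are dominated by the largest feature, while the log transform spreads the values over a comparable range, which is one plausible reason a classifier might find the log representation more informative.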

V-011: Identifying microbial drivers in biological phenotypes with a Bayesian Network Regression model
COSI: MICROBIOME
  • Samuel Ozminkowski, University of Wisconsin-Madison, United States
  • Claudia Solis-Lemus, University of Wisconsin-Madison, United States



Understanding the composition of microbial communities and how these compositions shape biological phenotypes is crucial to comprehend complex biological processes in soil, plants, and humans. Standard approaches to study these connections do not account for correlations between microbes, and models to connect a microbial network to a biological phenotype remain unknown. A handful of new methods using a regression framework to identify associations between network predictors and a phenotype have been developed but have only been studied for dense networks.
We introduce a Bayesian Network Regression (BNR) model that uses the microbial network as the predictor of a biological phenotype. This model accounts for interactions among microbes and can identify influential interactions and microbes that drive phenotypic variability. While the model itself is not new, it has only been studied for brain networks. Its applicability to microbial networks, which are sparser and higher-dimensional, has not been studied. We develop the first thorough investigation of BNR models for microbial datasets on synthetic data generated under realistic biological scenarios. We show that this model can identify influential nodes and edges in the microbial networks that drive changes in the phenotype for most biological settings and identify scenarios where this method performs poorly.

V-012: SCRAPT: An Iterative Algorithm for Clustering Large 16S rRNA Data Sets
COSI: MICROBIOME
  • Tu Luan, University of Maryland, College Park, United States
  • Harihara Subrahmaniam Muralidharan, University of Maryland, United States
  • Marwan Alshehri, University of Maryland, College Park, United States
  • Mihai Pop, University of Maryland, United States



16S rRNA sequence clustering is an important tool in characterizing the diversity of microbial communities. As 16S rRNA data sets grow in size, existing sequence clustering algorithms become an analytical bottleneck. Existing methods spend a lot of time clustering singletons and produce fragmented clusters, leaving room for improvement. We propose an iterative sampling-based 16S rRNA sequence clustering approach that targets the largest clusters, allowing users to stop the clustering process when sufficient clusters are available for the specific analysis being targeted. We describe a probabilistic analysis of the iterative clustering process that supports the intuition that the clustering process identifies the larger clusters in the data set first. Using real data sets of 16S rRNA gene sequences, we show that our iterative algorithm SCRAPT, coupled with an adaptive sampling process and a mode-shifting strategy for identifying cluster representatives, substantially speeds up the clustering process while being effective at capturing the large clusters in the data set. The experiments also show that SCRAPT produces Operational Taxonomic Units (OTUs) that are less fragmented than those produced by the popular tools UCLUST, CD-HIT and DNACLUST.

Software Availability: The algorithm is implemented in the open-source package SCRAPT and is available at https://github.com/hsmurali/SCRAPT.

V-013: A protocol for studying metabolic interactions in a microbial community using graph-based approaches
COSI: MICROBIOME
  • Dinesh Kumar Kuppa Baskaran, Indian Institute of Technology Madras, India
  • Karthik Raman, Indian Institute of Technology Madras, India



A major part of research in microbial systems biology deals with revealing the ecological principles that shape a microbial community. Approaches that can help understand the microbial inter-species interactions open new landscapes for controlling, engineering and synthesizing microbial communities for various applications. Computational studies in this field often rely on genome-scale metabolic models built from genome sequences. The main bottleneck of using these models is the need for manual curation. Alternatively, genome-scale metabolic networks generated from draft metabolic models lend themselves to many graph-theoretic analyses. The method we exploited in our research is based on a Python package previously developed in our group, MetQuest, which employs graph-theoretic algorithms to study metabolic networks. We developed scripts for constructing individual and community metabolic networks and predicting possible pairwise microbial interactions, higher-order microbial interactions, metabolic exchanges, and unique contributors of metabolic capabilities in the community. The ability of this approach to capture the effect of change in the metabolic environment allows us to study the change in interaction patterns with a different environment. This method can help researchers focus wet-lab experiments on a specific aspect by pointing out possible interesting interactions in a microbiome. The link to MetQuest can be found at https://github.com/RamanLab/metquest

V-014: Characterization and integration of transcriptional and microbial profiles of oral lesions and cancer
COSI: MICROBIOME
  • Mohammed Muzamil Khan, Boston University, United States
  • Jennifer Frustino, Erie County Medical Center, United States
  • Alessandro Villa, University of California, San Francisco, United States
  • Cuc Bach-Nguyen, Boston University School of Medicine, United States
  • Sook Bin-Woo, Brigham and Women's Hospital, United States
  • Xaralabos Varelas, Boston University School of Medicine, United States
  • Maria Kukuruzinska, Boston University School of Medicine, United States
  • Stefano Monti, Boston University School of Medicine, United States



Head and neck cancer is a complex malignancy with its major anatomical subsite, cancer of the oral cavity, ranking among the most deadly and disfiguring cancers due to lack of early detection and effective treatments. Oral cancer (OC) presents primarily as HPV-negative oral squamous cell carcinoma, whose etiology includes tobacco and alcohol use. OC is thought to progress through a series of well-defined clinical and histopathological stages that start as premalignant lesions (PMLs). In this study, using total RNA sequencing and leveraging multiple bioinformatics methods, including differential gene and pathway enrichment analyses, we show that PMLs are characterized by the activation of major pro-inflammatory and tumor-promoting pathways, such as epithelial-to-mesenchymal transition, TNFa, and NFkB, along with anti-inflammatory pathways, such as FCGR and IL6, which may prevent PMLs from progressing further. Through mediation analysis integrating host transcriptome and microbiome, we further show that these pathways may be driven by a concomitant differential abundance of specific microbes previously shown to be associated with OC. These results suggest that the cross-talk between host and microbial activity may play a significant role in the malignant transformation of PMLs and may help uncover early detection markers and drivers of transformation of HPV(-) lesions and cancers.

V-015: Batch Effect Correction of Metagenomic Data using ComBat-Seq Improved for Microbiome Research
COSI: MICROBIOME
  • Howard Fan, Boston University, United States
  • Julie Palmer, Boston University, United States
  • Jessica Petrick, Boston University, United States
  • Evan Johnson, Boston University, United States


Presentation Overview:

A major obstacle for reproducibility of microbiome research is the high sensitivity of microbial compositions to external factors and batch-to-batch technical variability. This unwanted variation impacts the data during sample processing, resulting in batch effects that often hinder analysis of factors of interest. While batch effect adjustment methods have been developed for other biomedical data, including sequencing applications, they do not appropriately account for two unique features of microbiome data: 1) its compositional nature, and 2) extreme overdispersion and zero-inflation. We examined the effectiveness of existing methods for batch correction, e.g., ComBat-Seq, in removing batch effects from metagenomic data and propose improvements that address the needs in microbiome data. We used 640 saliva samples from participants in the Black Women’s Health Study, which were processed in two batches. After a filtering step to remove rare OTUs, batch effect adjustment using ComBat-Seq reduced the differences detected between batches using hierarchical cluster analysis (before: p < 0.001; after: p = 0.902). We also evaluated the use of log-ratio transformations commonly used for compositional data analyses. Overall, we conclude that our improvements to ComBat and ComBat-Seq are effective in removing batch effect from metagenomic data and improving the statistical power of downstream analysis.
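The log-ratio transformations mentioned above can be illustrated with the centered log-ratio (CLR), one common choice for compositional data. This is a minimal sketch, not the authors' pipeline: the abstract does not say which log-ratio transformation was used, and the pseudocount and the toy counts are assumptions made for illustration.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts."""
    # The pseudocount (an assumption of this sketch) avoids log(0) for zero counts.
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log equals dividing each count by the geometric mean.
    return [x - mean_log for x in logs]

# Hypothetical OTU counts for a single sample, including one zero.
sample = [120, 30, 0, 850]
transformed = clr(sample)
```

The transformed values sum to zero within a sample, which removes the dependence on sequencing depth that makes raw compositional counts hard to compare.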

V-017: Breastfeeding and Farming Lifestyle Promotes Predominant Bifidobacterium in Infants
COSI: MICROBIOME
  • Deborah Chasman, University of Wisconsin-Madison, United States
  • Krittisak Chaiyakul, University of Wisconsin-Madison, United States
  • Samantha Fye, University of Wisconsin-Madison, United States
  • James Gern, University of Wisconsin-Madison, United States
  • Susan Lynch, University of California San Francisco, United States
  • Christine Seroogy, University of Wisconsin-Madison, United States
  • Irene Ong, University of Wisconsin-Madison, United States


Presentation Overview:

Introduction: The inception of immune-mediated disorders, which have increased worldwide, typically occurs during early childhood and leads to chronic and lifelong diseases. Children exposed to microbes from pets, farm animals, or from traditional communities such as the Amish, have reduced rates of these diseases. The gut microbiome influences neonatal immune development; however, the contributing microbial features are unknown. We compared stool metagenomes from Wisconsin infants across three levels of farming exposure: traditionally-farming Amish (n=27), dairy farming (n=46), and rural non-farming (n=43). We hypothesized that microbiome composition would vary between the groups.

Methods: We analyzed farm group, diet, and metagenomic features using statistical tests and machine learning.

Results: Microbiome composition differed significantly by diet and farm group. Machine learning models successfully distinguished Amish from non-Amish infants (AUROC=0.94). Variable importance and statistical analysis highlighted a significantly greater abundance of Bifidobacterium longum in Amish infants. Gene families found uniquely in Amish samples included genes from B. longum infantis, which encodes a large complement of human milk utilization gene clusters.

Conclusion: Breastfeeding and Amish lifestyle influence early gut colonization. Pioneer microbes may protect against colonization by pathogens and aid immune maturation via metabolic products of human milk.

V-018: Pan-cancer characterization of microbiome signatures
COSI: MICROBIOME
  • Wei-Hao Lee, Systems, Synthetic, and Physical Biology Program, Rice University, United States
  • Ruth Dannenfelser, Department of Computer Science, Rice University, United States
  • Vicky Yao, Department of Computer Science, Rice University, United States


Presentation Overview:

Cancer has been studied at the molecular and genetic level in cells for decades, but non-cellular elements from the microenvironment, such as the microbiome, are rarely investigated. The tumor-resident microbiome has been linked to malignancies in both direct and indirect ways. To systematically characterize the cancer-associated microbiome, we re-examine the 32 cancer types from The Cancer Genome Atlas (TCGA) by matching non-human reads to microbial reference genomes. We then use semi-supervised non-negative matrix factorization to identify microbiome signatures. By analyzing these signatures, we successfully recapitulate known cancer-associated microbes and further identify several novel associations, including signatures that are linked with survival outcomes. This comprehensive investigation provides an overview of the microbiome spectrum across cancer types and establishes a new method for assessing the interplay between the microbiome and human disease.

V-019: Critical assessment of pan-genomics of metagenome-assembled genomes
COSI: MICROBIOME
  • Yanbin Yin, University of Nebraska - Lincoln, United States
  • Tang Li, University of Nebraska - Lincoln, United States


Presentation Overview:

Background: Large-scale metagenome assembly and binning to generate metagenome-assembled genomes (MAGs) has become possible in the past five years. As a result, millions of MAGs have been produced and are increasingly included in pan-genomics workflows. However, pan-genome analyses of MAGs may suffer from the known issues with MAGs: fragmentation, incompleteness, and contamination. Here, we conducted a critical assessment of including MAGs in pan-genome analysis.

Results: We found that incompleteness led to more significant core gene loss than fragmentation. Contamination had little effect on core genome size but had major influence on accessory genomes. The core gene loss remained when using different pan-genome analysis tools and when using a mixture of MAGs and complete genomes. Importantly, the core gene loss was partially alleviated by lowering the core gene threshold and using gene prediction algorithms that consider fragmented genes. The core gene loss also led to incorrect pan-genome functional predictions and inaccurate phylogenetic trees.

Conclusions: We conclude that lowering the core gene threshold and predicting genes in metagenome mode (as Anvi’o does with Prodigal) are necessary in pan-genome analysis of MAGs. Better quality control of MAGs and development of new pan-genome analysis tools specifically designed for MAGs are needed in future studies.
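The effect of the core gene threshold can be sketched with a toy presence/absence table. The genome names, gene names and the 0.75 threshold below are hypothetical and chosen only to show how one incomplete MAG removes a gene from a strict core while a relaxed threshold retains it. The sketch is not taken from any of the pan-genome tools assessed in the study.

```python
def core_genes(presence, threshold=1.0):
    """Return gene families present in at least `threshold` fraction of genomes.

    `presence` maps a genome name to the set of gene families found in it.
    """
    n_genomes = len(presence)
    counts = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {g for g, c in counts.items() if c / n_genomes >= threshold}

# Hypothetical toy data: the incomplete MAG is missing geneB.
presence = {
    "complete_1": {"geneA", "geneB", "geneC"},
    "complete_2": {"geneA", "geneB"},
    "complete_3": {"geneA", "geneB", "geneD"},
    "mag_1": {"geneA"},
}
strict_core = core_genes(presence, threshold=1.0)    # geneA only
relaxed_core = core_genes(presence, threshold=0.75)  # geneA and geneB
```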

V-020: Charcoal: filtering contamination in metagenome-assembled genome bins and other genomes
COSI: MICROBIOME
  • Taylor Reiter, University of Colorado Anschutz Medical Campus, United States
  • N. Tessa Pierce-Ward, Population Health And Reproduction, University of California, Davis, United States
  • Luiz Irber, Population Health And Reproduction, University of California, Davis, United States
  • Erich M. Schwarz, Department of Molecular Biology and Genetics, Cornell University, United States
  • C. Titus Brown, Population Health And Reproduction, University of California, Davis, United States


Presentation Overview:

Metagenomics has expanded our knowledge of microbial diversity, but contaminant sequences are frequently included by accident in metagenome-assembled genomes. Genome contamination is often estimated from the presence of marker genes, an approach that is biased against detecting contaminants lacking these sequences. Further, most contamination detection tools do not remove contamination. We present charcoal, a tool that rapidly identifies and removes contamination in metagenome-assembled genomes using k-mer based methods. K-mers are nucleotide sequences of length k. Sufficiently long k-mers are usually specific to a taxonomic lineage. Exploiting this property of k-mers, charcoal identifies majority and minority lineages for each contiguous sequence in a genome and removes contiguous sequences belonging to minority lineages when those lineages occur below a taxonomic threshold (by default, order). Applying charcoal to the GTDB rs207 database, we found approximately 25% of genomes in GTDB were contaminated, with contamination broadly distributed across species and occurring in both representative and RefSeq genomes. Genomes with longer contiguous sequences are less likely to be contaminated. Our results show concordance with CheckM on detecting the presence of contamination in a genome. Charcoal is a snakemake workflow developed around the tool sourmash. It is available at github.com/dib-lab/charcoal, and is pip installable.
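The majority/minority lineage idea can be sketched in plain Python. This is an illustration only and not charcoal's implementation: it does not use sourmash, it ignores the taxonomic-rank threshold, and the reference sequences, lineage labels and k=4 are hypothetical (real tools use much longer k-mers).

```python
from collections import Counter

def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def contig_lineage(seq, kmer_to_lineage, k):
    """Majority lineage among the contig's k-mers found in the reference."""
    hits = Counter(kmer_to_lineage[km] for km in kmers(seq, k) if km in kmer_to_lineage)
    return hits.most_common(1)[0][0] if hits else None

def flag_contaminants(contigs, kmer_to_lineage, k):
    """Return names of contigs whose lineage differs from the genome's majority."""
    lineages = {name: contig_lineage(seq, kmer_to_lineage, k) for name, seq in contigs.items()}
    assigned = [lin for lin in lineages.values() if lin is not None]
    if not assigned:
        return []
    majority = Counter(assigned).most_common(1)[0][0]
    return sorted(n for n, lin in lineages.items() if lin is not None and lin != majority)

# Hypothetical reference: two short sequences with made-up lineage labels.
K = 4
references = {"ACGTACGGTTAC": "lineage_A", "TTTTGGGGCCCC": "lineage_B"}
kmer_to_lineage = {km: lin for seq, lin in references.items() for km in kmers(seq, K)}

# Hypothetical genome bin with three contigs; c3 resembles the second reference.
contigs = {"c1": "ACGTACGG", "c2": "GTACGGTTAC", "c3": "TTTTGGGGCC"}
flagged = flag_contaminants(contigs, kmer_to_lineage, K)
```

Here two contigs agree on one lineage, so the third is flagged as belonging to a minority lineage.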

V-021: DL-TODA: A Deep Learning Tool for Omics Data Analysis
COSI: MICROBIOME
  • Cecile Cres, University of Rhode Island, United States
  • Andrew Tritt, Lawrence Berkeley National Laboratory, United States
  • Kristofer Bouchard, Lawrence Berkeley National Laboratory, United States
  • Ying Zhang, University of Rhode Island, United States


Presentation Overview:

Shotgun metagenomics supports the profiling of diversity and functional potentials of microbiota without the need to cultivate individual microbes. However, many computational challenges remain in fully exploiting the application of metagenomics. This project aims to improve the taxonomic classification of metagenomic reads. Deep learning algorithms were applied with an objective to enhance the rate and accuracy of taxonomic classification. We developed the Deep Learning Tool for Omics Data Analysis (DL-TODA), capable of classifying short DNA sequences using a convolutional neural network. Training and testing were carried out on GPU platforms and implemented with TensorFlow. Simulated reads of bacteria were used in building and testing the model. Care was taken to avoid duplicate DNA sequences between the training and testing datasets to provide a non-biased test. DL-TODA classified DNA sequences with comparable or higher accuracy across all taxa in six distinct taxonomic ranks compared to other state-of-the-art taxonomic profiling tools. However, the DL-TODA predictions for some taxa had lower precision and/or recall values when compared with other tools. Further development is underway to enhance the identification of reads from individual taxonomic groups.
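The distinction drawn above between overall accuracy and per-taxon precision and recall can be made concrete with a small sketch. The taxon labels and predictions below are invented for illustration and are not DL-TODA results.

```python
def precision_recall(true_labels, predicted_labels, taxon):
    """Precision and recall for one taxon, treating it as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == taxon and p == taxon)
    fp = sum(1 for t, p in pairs if t != taxon and p == taxon)
    fn = sum(1 for t, p in pairs if t == taxon and p != taxon)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical true and predicted genus labels for six reads.
true_labels = ["Escherichia", "Escherichia", "Bacillus", "Bacillus", "Bacillus", "Vibrio"]
predicted = ["Escherichia", "Bacillus", "Bacillus", "Bacillus", "Escherichia", "Vibrio"]

accuracy = sum(t == p for t, p in zip(true_labels, predicted)) / len(true_labels)
per_taxon = {tx: precision_recall(true_labels, predicted, tx) for tx in sorted(set(true_labels))}
```

In this toy case four of six reads are classified correctly, yet one taxon reaches only 0.5 precision and recall, which is the kind of per-taxon gap an aggregate accuracy figure can hide.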

V-022: Genome Resequencing of Laboratory Stocks of Marine Heterotrophic Bacteria to Understand Laboratory Domestication
COSI: MICROBIOME
  • Natasha Gurevich, Boston University, United States
  • Helen Scott, Boston University, United States
  • Joseline Velasquez-Reyes, Boston University, United States
  • Daniel Segre, Boston University, United States
  • Melisa Osborne, Boston University, United States


Presentation Overview:

Marine heterotrophic bacteria play critical roles in marine food webs, biogeochemical cycles, and nutrient cycling. However, studies to predict bacterial behavior and function are complicated by laboratory domestication. Propagation of bacteria in the laboratory can result in cultures distinct from their wild ancestors. We performed an analysis of an "Evolve and Resequence" scenario with 47 marine heterotrophic bacteria. We selected bacterial cultures previously propagated over varying time periods in several labs. Samples were sequenced and we identified single nucleotide variants by comparing the results to each organism’s reference genome. We analyzed mutations with respect to propagation time, genome size and individual genes, and used hierarchical clustering to analyze the similarity of mutational patterns (the frequency of the 12 possible single nucleotide substitutions) among the samples. We found that genome size did not influence mutation count and that the gltD, gltB, and dapD genes were most frequently mutated. Estimated time of lab propagation and mode of sample storage did not correlate with mutational patterns, while phylogenetic relationships among samples did. These results provide further understanding of which factors influence bacterial mutation within the laboratory.
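A mutational pattern of the kind described above, the frequency of each of the 12 possible single nucleotide substitutions, could be computed as in the following sketch. The variant list is hypothetical and the function is an illustration, not the authors' code.

```python
from collections import Counter
from itertools import permutations

# The 12 ordered (reference, alternate) base pairs, e.g. "A>C", "A>G", ...
SUBSTITUTIONS = [f"{ref}>{alt}" for ref, alt in permutations("ACGT", 2)]

def substitution_spectrum(variants):
    """Return the relative frequency of each of the 12 substitution types.

    `variants` is an iterable of (reference_base, alternate_base) tuples.
    """
    counts = Counter(f"{ref}>{alt}" for ref, alt in variants)
    total = sum(counts[s] for s in SUBSTITUTIONS)
    if total == 0:
        return [0.0] * len(SUBSTITUTIONS)
    return [counts[s] / total for s in SUBSTITUTIONS]

# Hypothetical single nucleotide variants called in one sample.
variants = [("C", "T"), ("C", "T"), ("G", "A"), ("A", "G")]
spectrum = substitution_spectrum(variants)
```

Each sample then becomes a 12-element vector, which is a natural input for hierarchical clustering.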

V-023: A constraint-based method to identify function-specific minimal microbiomes from large microbial communities
COSI: MICROBIOME
  • Aswathy K. Raghu, Northwestern University, United States
  • Karthik Raman, Indian Institute of Technology Madras, India


Presentation Overview:

Microorganisms thrive in large communities of diverse species, exhibiting various functionalities. The mammalian gut microbiome, for instance, has the functionality of digesting dietary fibre and producing different short-chain fatty acids. Not all microbes present in a community contribute to a given functionality; it is possible to find a minimal microbiome, which is a subset of the large microbiome, that is capable of performing the functionality while maintaining other properties such as growth rate. Such a minimal microbiome will also contain keystone species of that community. In the wake of perturbations of the gut microbiome that result in disease conditions, cultivated minimal microbiomes can be administered to restore lost functionalities. In this work, we present a systematic approach to find a minimal microbiome for a specific functionality, from a large community. We employ a top-down approach with sequential deletion followed by solving a mixed-integer linear programming problem with the objective to minimize the L1-norm of the membership vector. We demonstrate the utility of our algorithm by identifying the minimal microbiomes of some communities and discuss their validity based on the presence of the keystone species in the community.

Availability: The algorithm is available from https://github.com/RamanLab/MinMicrobiome
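The objective of minimizing the L1-norm of a 0/1 membership vector amounts to choosing the fewest species that still deliver the required function. The toy sketch below shows that objective with an exhaustive search. It is not the MinMicrobiome algorithm: it replaces the constraint-based community model and the MILP solver with additive, made-up contribution values, and it only scales to very small communities.

```python
from itertools import combinations

def minimal_microbiome(contribution, required):
    """Smallest species subset whose summed contribution reaches `required`.

    Subset size equals the L1 norm of the 0/1 membership vector, so the
    first feasible subset found at the smallest size is a minimizer.
    """
    species = sorted(contribution)
    for size in range(len(species) + 1):
        for subset in combinations(species, size):
            if sum(contribution[s] for s in subset) >= required:
                return list(subset)
    return None  # no subset reaches the required level

# Hypothetical per-species contributions to one function (arbitrary units).
contribution = {"sp_A": 5.0, "sp_B": 1.0, "sp_C": 3.0, "sp_D": 0.5}
minimal = minimal_microbiome(contribution, required=7.5)
```

In a real community, contributions are not additive because species exchange metabolites, which is why a constraint-based formulation is needed in practice.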

V-024: Using knowledge graphs to infer gut-brain axis interactions
COSI: MICROBIOME
  • Brook Santangelo, University of Colorado Anschutz Medical Campus, United States
  • Lawrence Hunter, University of Colorado Anschutz Medical Campus, United States
  • Catherine Lozupone, University of Colorado Anschutz Medical Campus, United States


Presentation Overview:

Knowledge graphs in the biomedical field have broad applications by enabling a simplified representation of existing knowledge. Evaluating the gut microbiome in the context of disease benefits from this global context and abstraction due to the multi-omic nature of microbial processes. Using a microbiome-relevant knowledge graph, we developed a pipeline to predict mechanisms between a microbe and a disease, neurotransmitter, phenotype, or other entity of interest. Over 1800 microbe-gene or microbe-metabolite relationships from the gutMGene database were incorporated into the large biomedically relevant knowledge graph PheKnowLator. The assertions were mapped to a statement of semantic representation following the W3C Web Ontology Language (OWL) representation scheme. This introduced nodes representing microbes in the context of their anatomical location (i.e. the gastrointestinal tract) and the species in which that interaction took place (humans or mice). We used vector embeddings of the knowledge graph generated by Node2Vec to infer microbe-disease relationships. We performed a shortest path search between microbe-disease pairs by including each first order neighbor of the microbes in the path and weighting edges by relevance. This method enables multiple mechanistic hypotheses surrounding complex interactions between gut microbes and neurological disorders to be generated in a scalable and comprehensive manner.
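A weighted shortest path search between a microbe and a disease node can be sketched with Dijkstra's algorithm. The graph, node names and weights below are hypothetical, and the sketch assumes that a lower edge weight stands for higher relevance. It does not reproduce the PheKnowLator graph, the Node2Vec embeddings, or the authors' handling of first order neighbors.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; `graph` maps node -> {neighbor: edge_weight}."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None  # target not reachable from source

# Hypothetical directed knowledge graph with relevance-derived weights.
graph = {
    "microbe_X": {"gene_1": 1.0, "metabolite_1": 0.5},
    "gene_1": {"disease_Y": 1.0},
    "metabolite_1": {"pathway_1": 0.5},
    "pathway_1": {"disease_Y": 0.5},
}
result = shortest_path(graph, "microbe_X", "disease_Y")
```

The returned path lists the intermediate entities, which is what makes it readable as a candidate mechanism rather than a bare association score.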

V-025: Critical Assessment of Metagenome Interpretation - the second round of challenges
COSI: MICROBIOME
  • Fernando Meyer, Helmholtz Centre for Infection Research, Germany
  • Adrian Fritz, Helmholtz Centre for Infection Research, Germany
  • Alice C. McHardy, Helmholtz Centre for Infection Research, Germany


Presentation Overview:

Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains were still challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses.

V-026: Clusters of interactions between operational taxonomical units in human gut microbiome
COSI: MICROBIOME
  • Witold Wydmański, Jagiellonian University, Poland
  • Valentyn Bezshapkin, Małopolska Centre of Biotechnology of Jagiellonian University, Poland
  • Krzysztof Mnich, University of Białystok, Poland
  • Michał Kowalski, Jagiellonian University, Poland
  • Dagmara Błaszczyk, Jagiellonian University, Poland
  • Katarzyna Kopera, Małopolska Centre of Biotechnology of Jagiellonian University, Poland
  • Klas Udekwu, Swedish University of Agricultural Sciences, Sweden
  • Tomasz Kosciolek, Jagiellonian University in Kraków, Poland
  • Witold Rudnicki, Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Poland
  • Paweł P. Łabaj, Małopolska Centre of Biotechnology of Jagiellonian University, Poland
  • Ymke de Jong, Stockholm University, Sweden


Presentation Overview:

OTUs, one of the most common metagenomic data description techniques, are formed by grouping 16S rRNA sequences into clusters based on their similarity. Researchers have discovered a number of associations between various illnesses, such as schizophrenia, and the prevalence of particular taxa. There are also many health problems that should intuitively correlate with microbiome status, but the mechanism of these relations is not clear.

One possible explanation is that the OTU classification provides too coarse a resolution and obscures important data about the samples, focusing on classification instead of function.

In this study we focus on identifying interaction relationships between different OTUs using the principles of information theory. We use the multidimensional feature selection (MDFS) algorithm to identify possible interactions, which together form an interaction graph. We then sample this graph to find functional siblings of specified taxa and group them to define functionally unique clusters.
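As an illustration of the information-theoretic viewpoint, the sketch below computes the mutual information between the presence/absence profiles of two OTUs. It shows only a pairwise building block under invented data and is not the MDFS algorithm, which evaluates information gain for combinations of variables.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi

# Hypothetical presence (1) / absence (0) of three OTUs across eight samples.
otu_a = [1, 1, 1, 1, 0, 0, 0, 0]
otu_b = [1, 1, 1, 1, 0, 0, 0, 0]  # always co-occurs with otu_a
otu_c = [1, 0, 1, 0, 1, 0, 1, 0]  # independent of otu_a

mi_ab = mutual_information(otu_a, otu_b)
mi_ac = mutual_information(otu_a, otu_c)
```

Perfectly co-occurring OTUs share one full bit of information, while independent OTUs share none.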

We assess the validity of those clusters using metabolic pathway prediction via PICRUSt. Additionally, we use a novel method based on automated literature research with NLP techniques to explore historical research about specific taxa. The results show that publications mentioning specific taxa convey enough information to recreate the clustering, further supporting our hypothesis of the importance of interactions within functional groups.