Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide



Print your poster in Chicago

Session A: (July 7 and July 8)
Session B: (July 9 and July 10)
B-578: AMBER: Assessment of Metagenome BinnERs
  • Fernando Meyer, Helmholtz Centre for Infection Research, Germany
  • Peter Hofmann, Helmholtz Centre for Infection Research, Germany
  • Peter Belmann, Helmholtz Centre for Infection Research, Germany
  • Ruben Garrido-Oter, Max Planck Institute for Plant Breeding Research, Germany
  • Adrian Fritz, Helmholtz Centre for Infection Research, Germany
  • Alexander Sczyrba, Bielefeld University, Germany
  • Alice McHardy, Helmholtz Centre for Infection Research, Germany

Short Abstract: Reconstructing the genomes of microbial community members is key to the interpretation of shotgun metagenome samples. Genome binning programs deconvolute reads or assembled contigs of such samples into individual bins, but assessing their quality is difficult due to the lack of evaluation software and standardized metrics. We present AMBER, an evaluation package for the comparative assessment of genome reconstructions from metagenome benchmark data sets. It calculates the performance metrics and comparative visualizations used in the first benchmarking challenge of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). As an application, we show the outputs of AMBER for eleven different binning software options on two CAMI benchmark data sets. AMBER is implemented in Python and available under the Apache 2.0 license on GitHub (https://github.com/CAMI-challenge/AMBER).
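To make the kind of bin-level metrics involved more concrete, here is a minimal sketch. It assumes each contig has a known length, an assigned bin (or none) and a gold-standard source genome. It maps each bin to the genome that contributes the most base pairs and reports length-weighted purity (precision) and completeness (recall). This is a simplified illustration of common binning metrics, not AMBER's implementation or API. The function name and input layout are hypothetical.

```python
from collections import defaultdict

def bin_metrics(contigs):
    """Length-weighted purity and completeness per predicted bin.

    contigs: iterable of (length, bin_id, genome_id) tuples, where genome_id
    is the gold-standard source genome and bin_id may be None (unbinned).
    """
    bin_genome_bp = defaultdict(lambda: defaultdict(int))  # bin -> genome -> bp
    genome_total_bp = defaultdict(int)                     # genome -> total bp
    for length, bin_id, genome_id in contigs:
        genome_total_bp[genome_id] += length  # unbinned contigs still count
        if bin_id is not None:
            bin_genome_bp[bin_id][genome_id] += length

    results = {}
    for bin_id, per_genome in bin_genome_bp.items():
        # Map the bin to the genome contributing the most base pairs.
        best_genome, best_bp = max(per_genome.items(), key=lambda kv: kv[1])
        bin_size = sum(per_genome.values())
        results[bin_id] = {
            "genome": best_genome,
            "purity": best_bp / bin_size,
            "completeness": best_bp / genome_total_bp[best_genome],
        }
    return results

# Hypothetical toy gold standard with two genomes and two bins.
contigs = [
    (5000, "bin1", "gA"),
    (3000, "bin1", "gA"),
    (2000, "bin1", "gB"),
    (4000, "bin2", "gB"),
    (2000, None, "gA"),
]
for b, m in sorted(bin_metrics(contigs).items()):
    print(b, m["genome"], round(m["purity"], 3), round(m["completeness"], 3))
# bin1 gA 0.8 0.8
# bin2 gB 1.0 0.667
```

Real evaluations also report averages across bins and genomes and handle ties and unassigned sequence more carefully. The sketch only shows the per-bin core.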

B-580: mi-faser, fast and accurate annotation of microbiome sequencing reads
  • Chengsheng Zhu, Rutgers University, United States
  • Maximilian Miller, Rutgers University, United States
  • Srinayani Marpaka, Rutgers University, United States
  • Pavel Vaysberg, Rutgers University, United States
  • Malte Rühlemann, Christian-Albrechts-University of Kiel, Germany
  • Guojun Wu, Shanghai Jiao Tong University, China
  • Benoît Pinto, Université de Lyon, France
  • Femke-Anouska Heinsen, Christian-Albrechts-University of Kiel, Germany
  • Marie Tempel, Christian-Albrechts-University of Kiel, Germany
  • Catherine Larose, Université de Lyon, France
  • Timothy Vogel, Ecole Centrale de Lyon, France
  • Liping Zhao, Shanghai Jiao Tong University, China
  • Wolfgang Lieb, Christian-Albrechts-University of Kiel, Germany
  • Andre Franke, Christian-Albrechts-University of Kiel, Germany
  • Yana Bromberg, Rutgers University, United States

Short Abstract: Molecular functionality of microbiomes is often assessed via metagenomic or metatranscriptomic sequencing. We recently created mi-faser, a computational method for very fast (minutes per microbiome) and accurate (90% precision) mapping of sequencing reads to the molecular functions of the genes they originate from, augmented with a manually curated reference database. Comparing microbiome function profiles between different conditions, we identified previously unseen oil degradation-specific functions in BP oil-spill data, as well as functional signatures of individual-specific gut microbiome responses to a dietary intervention in children with Prader-Willi syndrome. The method also distinguished the microbiomes of Crohn's disease (CD) patients from those of related healthy individuals, highlighting the role of the microbiome in CD pathogenicity. In a subsequent, soon-to-be-published study of snow from Svalbard, Norway, we identified higher microbiome dissimilarity in early vs. late spring samples, suggesting a community recovery hypothesis. The observed correlation between organic acid levels and the geraniol degradation pathway further indicates that members of these communities can degrade complex organic compounds at temperatures below 0°C. Such capabilities are potentially valuable for both industrial applications and bioremediation and will be followed up experimentally. In short, owing to its speed, accuracy, and robustness to evolutionary differences, mi-faser is useful for generating testable hypotheses about emergent microbiome molecular functionality.

B-582: Multivariable Association in Population-scale Meta'omic Surveys
  • Himel Mallick, Harvard University, United States
  • Timothy L. Tickle, Harvard University, United States
  • Lauren J. McIver, Harvard University, United States
  • George Weingart, Harvard University, United States
  • Joseph N. Paulson, Genentech, United States
  • Siyuan Ma, Harvard University, United States
  • Boyu Ren, Harvard University, United States
  • Emma Schwager, Harvard University, United States
  • Ayshwarya Subramanian, Harvard University, United States
  • Eric Franzosa, Harvard University, United States
  • Hector Corrada Bravo, University of Maryland, United States
  • Curtis Huttenhower, Harvard University, United States

Short Abstract: It is challenging to relate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to the quantitative properties of these measurements. Microbiome multi’omic data are typically noisy, sparse (zero-inflated), high-dimensional, and extremely non-normal, often in the form of either counts or compositional measurements. Here, we introduce an optimal combination of established methods to assess multivariable associations of microbial community features with complex metadata in population-scale epidemiological studies. Our approach, MaAsLin2 (Multivariable Association with Linear Models), relies on multiple statistical models to account for the inherent characteristics of modern meta’omic epidemiology study designs, including repeated measures and multiple covariates. To construct this method, we conducted a large-scale evaluation across a broad range of data settings under which straightforward identification of meta’omic associations can be challenging. These simulation studies reveal that MaAsLin2 preserves statistical power in the presence of repeated measures and multiple covariates while accounting for the nuances of meta’omic features and controlling false discoveries. Finally, we applied MaAsLin2 to a microbial multi’omic dataset from the Integrative Human Microbiome Project (HMP2), which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel disease (IBD) across multiple time points and ‘omics profiles.
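Association screens of this kind fit one model per microbial feature, so they produce many p-values at once. A standard way to control false discoveries across those tests is the Benjamini-Hochberg procedure. The sketch below shows that step on its own, applied to a hypothetical list of per-feature p-values. It illustrates the general technique and is not MaAsLin2's code. The abstract does not specify which correction MaAsLin2 applies.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Indices sorted by ascending p-value; rank r corresponds to order[r - 1].
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so q-values are monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values, e.g. one per taxon from per-feature linear models.
pvals = [0.01, 0.04, 0.03, 0.20]
print([round(q, 4) for q in benjamini_hochberg(pvals)])
# [0.04, 0.0533, 0.0533, 0.2]
```

Features whose adjusted value falls below a chosen level (for example 0.25 or 0.05) would then be reported as associations.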

B-584: CAMISIM: Simulating metagenomes and microbial communities
  • Adrian Fritz, Helmholtz Centre for Infection Research, Germany
  • Peter Hofmann, Helmholtz Centre for Infection Research, Germany
  • Stephan Majda, Heinrich Heine University Dusseldorf, Germany
  • Eik Dahms, Helmholtz Centre for Infection Research, Germany
  • Johannes Dröge, Chalmers University of Technology, Sweden
  • Jessika Fiedler, Heinrich Heine University Dusseldorf, Germany
  • Till R. Lesker, Helmholtz Centre for Infection Research, Germany
  • Peter Belmann, Helmholtz Centre for Infection Research, Germany
  • Matthew Z. Demaere, University of Technology, Sydney, Australia
  • Aaron E. Darling, University of Technology, Sydney, Australia
  • Alexander Sczyrba, Bielefeld University, Germany
  • Andreas Bremges, Helmholtz Centre for Infection Research, Germany
  • Alice McHardy, Helmholtz Centre for Infection Research, Germany

Short Abstract: Studies like the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI) have shown that, while metagenomic software already produces promising results, there is still considerable room for improvement and a strong need for standardized benchmarking data sets. To address these needs, we present CAMISIM, a software package for the automatic in silico generation of complete microbial communities. CAMISIM was used successfully to create the data sets of the first CAMI challenge and offers extensive options for customizing data sets so that they represent specific microbial compositions, sampling strategies and experimental setups as closely as possible. In addition to full microbial sequence samples, CAMISIM always provides a ground truth for the assembly, binning and profiling of the produced metagenome, which can then be used to measure the performance of different metagenomic software. We used CAMISIM to create different data sets, showing its value for producing both small, specialised data sets for testing metagenomic software and large, realistic benchmarking data sets. CAMISIM is implemented in Python and available under the Apache 2.0 license on GitHub (https://github.com/CAMI-challenge/CAMISIM).
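One ingredient of a community simulator is drawing genome abundances and allocating sequencing effort to match them. The sketch below assumes log-normally distributed abundances, a common modelling choice for microbial communities. The mu and sigma values are illustrative. It then splits a fixed read budget in proportion to abundance times genome size. Genome names, sizes and the read budget are hypothetical, and this is not CAMISIM's code or configuration.

```python
import random

def simulate_read_allocation(genome_sizes, total_reads, mu=1.0, sigma=2.0, seed=0):
    """Draw log-normal genome abundances and allocate reads to genomes.

    genome_sizes: dict genome_id -> genome length in bp (hypothetical input).
    Returns (relative_abundances, reads_per_genome).
    """
    rng = random.Random(seed)
    abundance = {g: rng.lognormvariate(mu, sigma) for g in genome_sizes}
    total_abundance = sum(abundance.values())
    rel_abundance = {g: a / total_abundance for g, a in abundance.items()}

    # The share of sequenced DNA scales with abundance * genome size.
    dna = {g: abundance[g] * genome_sizes[g] for g in genome_sizes}
    total_dna = sum(dna.values())
    exact = {g: total_reads * d / total_dna for g, d in dna.items()}

    # Largest-remainder rounding so integer read counts sum to total_reads.
    reads = {g: int(x) for g, x in exact.items()}
    leftover = total_reads - sum(reads.values())
    by_remainder = sorted(exact, key=lambda g: exact[g] - reads[g], reverse=True)
    for g in by_remainder[:leftover]:
        reads[g] += 1
    return rel_abundance, reads

sizes = {"genome_A": 2_000_000, "genome_B": 4_500_000, "genome_C": 3_000_000}
rel, reads = simulate_read_allocation(sizes, total_reads=100_000)
print(reads, sum(reads.values()))
```

Because the true abundance of every genome is known by construction, the same values can later serve as the ground truth for evaluating profilers.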

B-585: Microbiome Search Engine: Enabling Platform for Large-scale Comparative Microbiome Research
  • Xiaoquan Su, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, China
  • Gongchao Jing, Qingdao Institute of Bioenergy and Bioprocess Technology, China
  • Honglei Wang, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, China
  • Zheng Sun, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, China
  • Shi Huang, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, China
  • Jian Xu, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, China

Short Abstract: With the rapid growth of microbiome research across the globe, one major challenge is the large-scale analysis of metagenome datasets in the context of all metagenomes known to date. Here we describe Microbiome Search Engine (MSE), a database engine that enables rapid sample search against large collections of microbiome datasets. The database now contains 101,983 curated microbiome samples with clear scientific background from 293 studies. Users can upload a query microbiome via the web or standalone interface and search it against the entire database for structurally or functionally similar microbiomes in a ‘BLAST-like’ manner. The query speed is independent of database size: a search against the whole database takes less than 0.3 seconds, which is 80 times faster than exhaustive pairwise comparison. The search results provide visualized organismal or functional alignment patterns between queries and matches, with quantitative similarity scores. Searching for ‘best matches’ and ‘top N matches’ among the vast number of microbiomes accumulated so far represents a novel way not only to annotate new microbiome datasets but also to identify scientific hypotheses that probe the complex interplay between microbiome features and ecosystem parameters. More information is available at:

B-586: Predicting Microbial Ecology from Shotgun Metagenomic Data
  • Jamie Strampe, Boston University, United States
  • Anthony Federico, Boston University, United States
  • Aaron Chevalier, Boston University, United States

Short Abstract: Thousands of bacterial species can coexist in one gram of soil, but little is currently known about the structure and metabolic interactions of these communities. When grown in carbon-limited medium, bacteria cultured ex situ from soil samples formed stable family-level communities at steady state. This result is surprising because, according to the competitive exclusion principle, species competing for a single limiting resource (in our case, a single carbon source) should not be able to coexist stably. Spent media experiments indicate that metabolites are exchanged in this microbial community, but we do not know which metabolites are involved or how these cross-feeding interactions contribute to a stable and reproducible community structure. To address these questions, we used whole genome shotgun sequencing and experimentally derived phenotypic data to build constraint-based models of core carbon metabolism for five bacterial community members. Using flux balance analysis (FBA), we simulated growth and predicted pairwise cross-feeding interactions. Simulations showed that the model of each organism could grow on the spent media of all other organisms. The models further exhibited expected levels of carbon-conversion efficiency and cross-feeding preferences consistent with experimental results.

B-587: Metagenomics for the ImMiGeNe Project
  • Sina Beier, University of Tuebingen, Germany
  • Daniel H. Huson, University of Tuebingen, Germany
  • Silke Peter, Institute for Medical Microbiology and Hygiene, University Clinics Tuebingen, Germany
  • Peter Lang, University Clinics Tuebingen, Germany
  • Alexander N.R. Weber, University of Tuebingen, Germany

Short Abstract: The goal of the ImMiGeNe Project is to implement a streamlined pipeline for high-throughput sampling, sequencing, analysis and integrative interpretation of clinical data collected from patients undergoing hematopoietic stem cell transplantation. Gut metagenomic samples, whole exome sequencing data, immunogenic characteristics and inflammatory biomarkers will be collected from patients before and after they receive their transplant. Donors' gut microbiomes will also be sampled, so that the communities can be compared before and after transplantation and their changes monitored in relation to changes in the patient’s immune system. The bioinformatics analysis pipeline we established is highly automated, with configuration files provided for different scenarios that guide the analysis. This design should make it easy to use in the clinical diagnostic workflow in the future to determine the microbial community match between a patient and their potential donor. We hope that the gut microbiome can provide markers to predict the outcome of the procedure and aid the donor selection process. The pipeline includes raw read preprocessing and uses fast alignment tools such as DIAMOND and MALT to process samples quickly. Taxonomic and functional analysis is conducted using MEGAN, together with additional scripts for in-depth analysis using the available metadata.

B-588: National Center for Genome Analysis Support and eXtreme Science and Engineering Discovery Environment as platforms for metagenomic analysis
  • Bhavya Papudeshi, National Center for Genome Analysis Support, United States
  • Sheri Sanders, National Center for Genome Analysis Support, United States
  • Carrie Ganote, National Center for Genome Analysis Support, United States
  • Phil Blood, Pittsburgh Supercomputing Center, United States
  • Thomas G. Doak, National Center for Genome Analysis Support, United States

Short Abstract: Culture-independent metagenomic methods are commonly employed in microbiome research to study the millions of microbes living in and around a host and to understand how they impact host function. Rapidly decreasing sequencing costs and growing interest in microbiome studies have increased the number of metagenome samples per study. The critical bottleneck in analyzing large metagenome samples is access to computational resources: high-performance computing (HPC) and high-performance data analytics (HPDA) systems with the large memory, fast I/O, and multithreaded and distributed parallel computing necessary to analyze these complex communities. The National Center for Genome Analysis Support (NCGAS) and the eXtreme Science and Engineering Discovery Environment (XSEDE) are NSF-funded organizations that collaborate to enable the biological research community to analyze genomic data. Through NCGAS and XSEDE, users have access to a range of resources for bioinformatic analysis. NCGAS and XSEDE also offer support in the form of online materials, training and workshops, project consultation, and software installation, optimization, and maintenance on these clusters. Metagenomic analysis workflows evolve constantly as additional steps and new or upgraded software are introduced, and organizations such as NCGAS and XSEDE are resources for staying current with these developments.

B-589: Batch Effects Correction for Microbiome Data with Dirichlet-multinomial Regression
  • Fangda Song, The Chinese University of Hong Kong, Hong Kong
  • Zhenwei Dai, The Chinese University of Hong Kong, Hong Kong
  • Yingying Wei, The Chinese University of Hong Kong, Hong Kong
  • Jun Yu, The Chinese University of Hong Kong, Hong Kong
  • Hei Wong, The Chinese University of Hong Kong, Hong Kong

Short Abstract: Metagenomic sequencing techniques enable quantitative analyses of the microbiome. However, combining microbial data across experiments is challenging because of the variation between experiments. Existing methods for correcting batch effects do not consider the interactions between variables (the microbial taxa, in microbiome studies) or the overdispersion of microbiome data, and are therefore not appropriate for microbiome data. We developed a new method, Bayesian Dirichlet-multinomial regression meta-analysis (BDMMA), to simultaneously model batch effects and detect the microbial taxa associated with phenotypes. BDMMA automatically models the dependence among microbial taxa and is robust in detecting associations in high-dimensional, overdispersed microbiome data with sparse associations. Simulation studies and real data analysis have shown that BDMMA can successfully adjust for batch effects and substantially reduce false discoveries in microbial meta-analyses. BDMMA is a powerful tool for meta-analysis of metagenomic studies and detects taxa that are truly associated with phenotypes with high accuracy. We envision that BDMMA will be widely applied in practice, especially with the rise of large consortium projects such as the American Gut Project and the MetaHIT project.
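Overdispersion is the main reason a plain multinomial model fits microbiome counts poorly. The sketch below samples count vectors from a Dirichlet-multinomial distribution: it draws a Dirichlet probability vector via normalized gamma variates and then performs a multinomial draw. It then compares the empirical variance of one taxon's count with the multinomial variance n p (1 - p) and with the theoretical Dirichlet-multinomial variance n p (1 - p) (n + a0) / (1 + a0), where a0 is the sum of the concentration parameters. The concentrations are hypothetical. The code illustrates only the distribution, not BDMMA's Bayesian regression or its batch-effect model.

```python
import random

def dirichlet_multinomial_sample(alpha, n, rng):
    """Draw one count vector with total n from a Dirichlet-multinomial(alpha)."""
    # Dirichlet draw: independent Gamma(alpha_k, 1) variates, normalized.
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    probs = [g / total for g in gammas]
    # Multinomial draw: assign each of the n reads to a taxon.
    counts = [0] * len(alpha)
    for k in rng.choices(range(len(alpha)), weights=probs, k=n):
        counts[k] += 1
    return counts

def variance(xs):
    """Unbiased sample variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(42)
alpha = [1.0, 0.6, 0.4]      # hypothetical concentrations, a0 = 2
n, draws = 100, 2000
a0 = sum(alpha)
p0 = alpha[0] / a0           # expected proportion of taxon 0

taxon0 = [dirichlet_multinomial_sample(alpha, n, rng)[0] for _ in range(draws)]
multinomial_var = n * p0 * (1 - p0)
dm_var = multinomial_var * (n + a0) / (1 + a0)  # theoretical DM variance

print(f"empirical variance:   {variance(taxon0):.1f}")
print(f"multinomial variance: {multinomial_var:.1f}")
print(f"DM theoretical:       {dm_var:.1f}")
```

With these parameters the theoretical variance is 850 versus 25 under a multinomial model, so the inflation factor (n + a0) / (1 + a0) dominates. A model that ignores it will tend to understate uncertainty.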

B-590: Population structure discovery in meta-analyzed microbial communities
  • Siyuan Ma, Harvard University, United States
  • Dmitry Shugin, Harvard University, United States
  • Himel Mallick, Harvard University, United States
  • Raivo Kolde, Philips Research, United States
  • Eric A. Franzosa, Harvard University, United States
  • Hera Vlamakis, Harvard University, United States
  • Ramnik Xavier, Harvard University, United States
  • Curtis Huttenhower, Harvard University, United States

Short Abstract: Human microbiome studies have now achieved a scale at which it is practical to associate features of the microbiome with health outcomes and covariates in multiple large populations. This permits the development of rigorous meta-analysis and population structure analysis methods. We have developed MMUPHin (Meta-analysis Methods with Uniform Pipeline for Heterogeneity in Microbiome Studies), a set of normalization, meta-analysis, and population structure discovery methods appropriate for microbiome taxonomic and functional profiles. By applying our methods to a combination of eight inflammatory bowel disease (IBD) cohorts (5,232 total samples), we characterized consistent population structure in patients’ gut microbiomes. Evaluation of data handling practices identified those most sensitive to biological variation and most robust to batch and technical differences, including known effects of Bacteroides and Prevotella species. Linear mixed effects models revealed consistent enrichment and depletion in the IBD population versus controls. Finally, multiple unsupervised clustering methods, combined with different clustering strength metrics, agreed on a lack of discrete microbiome “types” in the IBD gut microbiome. As these results are consistent across datasets, we anticipate they will provide a reference for the IBD microbiome and a future framework for human microbiome meta-analyses more broadly.
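Combining per-cohort associations into one estimate is typically done with a random-effects model, which allows the true effect to differ between cohorts. The sketch below implements the standard DerSimonian-Laird estimator for a single microbial feature, using hypothetical per-cohort effect sizes and standard errors. MMUPHin's own normalization, batch correction and choice of heterogeneity estimator are not reproduced here, and the abstract does not state that it uses this exact estimator.

```python
import math

def dersimonian_laird(effects, std_errors):
    """Random-effects pooled estimate via the DerSimonian-Laird method.

    effects: per-study effect estimates (e.g. regression coefficients).
    std_errors: their standard errors.
    Returns (pooled_effect, pooled_se, tau2).
    """
    k = len(effects)
    w = [1.0 / se ** 2 for se in std_errors]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures between-study heterogeneity.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, truncated at 0
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    pooled_se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled_se, tau2

# Hypothetical log-fold changes of one taxon (IBD vs. control) in four cohorts.
effects = [0.8, 0.5, 1.1, 0.2]
ses = [0.2, 0.25, 0.3, 0.2]
est, se, tau2 = dersimonian_laird(effects, ses)
print(f"pooled effect = {est:.3f} (SE {se:.3f}), tau^2 = {tau2:.3f}")
```

When the cohorts agree, tau^2 collapses to zero and the result equals the ordinary inverse-variance (fixed-effect) estimate.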

B-591: Combining 16S amplicon and shotgun sequencing to investigate gut microbiota difference between Type-2 diabetic and non-diabetic obese adults
  • Tam Tran, University College Cork, Viet Nam
  • Edel Cormac, University College Cork, Ireland
  • Céline Ribière, University College Cork, France
  • Werner Frei, University College Cork, Italy
  • Paul O'Toole, University College Cork, Ireland

Short Abstract: The human gut microbiota interacts with host metabolism in conditions including insulin resistance, type-2 diabetes (T2D) and obesity. However, the exact contribution of the gut microbiota to the development of T2D is not fully understood, owing to the complexity and diversity of gut microbes, ethnic variation and large differences between the individuals studied. The aim of this study was to characterize the gut microbiome of obese adults with T2D versus non-diabetic obese adults using 16S rRNA gene amplicon sequencing and shotgun metagenomics. Both sequencing approaches identified Firmicutes and Bacteroidetes as the dominant phyla, but 16S analysis was biased towards detecting higher proportions of Firmicutes. Compared with non-diabetic subjects, T2D subjects showed an increased abundance of Bacilli, in particular Streptococcus, with both approaches, and a decrease of Clostridiales in the metagenomic data. Furthermore, functional profiling of the shotgun data using HUMAnN2 revealed significant differences in the abundances of pathways involving short-chain fatty acids and branched-chain amino acids, linked to specific species. These findings suggest that shotgun sequencing offers complementary advantages relative to the 16S amplicon approach for studying the association between gut microbiota and T2D, providing comprehensive insights into bacterial communities and their functional repertoires.

B-592: Fecal Microbiota Analysis of Dairy Cattle over 12 Stages of Dairy Production Lines
  • Lei Zhao, UNC Charlotte, United States
  • Xunde Li, University of California, Davis, United States
  • Zhengchang Su, UNC Charlotte, United States

Short Abstract: The bacterial communities in the gut of dairy cattle are very important because they relate to host health, milk production and food safety. However, a comprehensive analysis of the gut microbiota of dairy cattle at each dairy production stage is still lacking. Here we report a systematic analysis of fecal microbiota from 90 dairy cattle over 12 dairy production stages using the DADA2 package, which models and corrects amplicon sequencing errors to infer sequence variants instead of relying on traditional OTU clustering approaches, which can easily mask biological variation. The study identified 236 genera in 21 phyla, predominated by Firmicutes, Patescibacteria, and Verrucomicrobia. The next-generation sequencing data revealed a high level of heterogeneity in diversity, richness, and composition among cattle at different stages, especially between parous and nulliparous animals. Additionally, we summarized patterns of compositional change in the overall bacterial community across the stages, as well as patterns for taxa of particular interest such as Ruminococcaceae, an active plant degrader. Overall, this study provides comprehensive insights into the stability, variability, and composition of the gut microbiota in dairy cattle over the entire dairy production line, and it may lay a foundation for future research on dairy food safety, ruminant management, and disease control.

B-593: Prediction of Host–Pathogen Interactions for Helicobacter pylori by Interface Mimicry and Implications to Gastric Cancer
  • Emine Guven Maiorov, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, National Cancer Institute, United States
  • Ruth Nussinov, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, National Cancer Institute, United States
  • Chung-Jung Tsai, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, National Cancer Institute, United States
  • Buyong Ma, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, National Cancer Institute, United States

Short Abstract: There is a strong correlation between some pathogens and certain cancer types. One example is Helicobacter pylori and gastric cancer. Exactly how such pathogens contribute to host tumorigenesis, however, remains unclear. Pathogens often interact with the host through proteins. To subvert host defenses, they may mimic host proteins at the sequence, structure, motif, or interface level. Interface similarity permits pathogen proteins to compete with host proteins for a target protein and thereby alter host signaling. Detecting host–pathogen interactions (HPIs) and mapping the rewired superorganism HPI network, with structural details, can provide unprecedented clues to the underlying mechanisms and aid therapeutic development. Here, we describe the first computational approach that exploits solely interface mimicry to model potential HPIs. Interface mimicry can identify more HPIs than sequence or complete structural similarity because it appears to be more common than the other mimicry types. We illustrate the usefulness of this concept by modeling HPIs of H. pylori to understand how it modulates host immunity, persists lifelong, and contributes to tumorigenesis. H. pylori proteins interfere with multiple host pathways because they target several host hub proteins. Our results help illuminate the structural basis of the resistance to apoptosis, immune evasion, and loss of cell junctions seen in infected host cells.

B-594: Ampliclust: A Fully Probabilistic Model-Based Approach Denoising Illumina Amplicon Data
  • Xiyu Peng, Iowa State University, United States
  • Heliang Shi, Pfizer, United States
  • Karin Dorman, Iowa State University, United States

Short Abstract: Next-generation amplicon sequencing is a powerful tool for understanding microbial communities. Downstream analysis is often based on the construction of Operational Taxonomic Units (OTUs) at a 3% dissimilarity threshold. The arbitrary threshold and reliance on OTU references can lead to low resolution, false positives, and misestimation of alpha and beta microbial diversity. We introduce Ampliclust, a reference-free method to resolve the number, abundance and identity of error-free sequences in Illumina amplicon data. Unlike existing methods, Ampliclust is a fully probabilistic model, allowing the data, rather than an algorithm or an external database, to drive the conclusions. We use a modified Bayesian information criterion to estimate the number of sequence variants and obtain maximum likelihood estimates of the abundance and identity of the error-free sequences. Our model matches the performance of DADA2 on well-separated mock communities, and on simulated communities with more similar true sequences, Ampliclust can achieve better accuracy. The major challenge is computational scalability, which we have begun to address through principled iterative schemes and improved initialization methods.
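The resolution problem of a fixed 3% threshold shows up even in a toy greedy clustering of equal-length reads by Hamming distance. Two genuinely distinct variants that differ at one position out of 100 collapse into a single OTU. The sketch is a simplified stand-in for OTU picking: real tools use alignment-based identity, and the abundance-sorted centroid order is an assumption here. It illustrates the problem Ampliclust targets, not Ampliclust's probabilistic model.

```python
def hamming_fraction(a, b):
    """Fraction of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs_with_counts, threshold=0.03):
    """Greedy centroid clustering; the most abundant sequences become centroids first."""
    centroids = []  # list of (centroid_seq, total_count)
    for seq, count in sorted(seqs_with_counts, key=lambda sc: -sc[1]):
        for i, (cent, total) in enumerate(centroids):
            if hamming_fraction(seq, cent) <= threshold:
                centroids[i] = (cent, total + count)  # absorb into existing OTU
                break
        else:
            centroids.append((seq, count))  # no centroid close enough: new OTU
    return centroids

base = "ACGT" * 25                        # 100 bp variant
variant = base[:50] + "T" + base[51:]     # distinct variant, 1 mismatch (1%)
distant = "".join("A" if c != "A" else "C" for c in base)  # differs everywhere

reads = [(base, 500), (variant, 300), (distant, 200)]
otus = greedy_otus(reads)
print(len(otus), [count for _, count in otus])
# 2 [800, 200]
```

Lowering the threshold to 0 separates the two close variants, but on real data every sequencing error would then become its own OTU. Model-based denoising aims to resolve this trade-off.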

B-595: Genomics-based prediction of metabolic phenotypes in microbial communities
  • Stanislav Iablokov, Institute for Information Transmission Problems and P.G. Demidov Yaroslavl State University, Russia
  • Pavel Novichkov, Lawrence Berkeley National Laboratory, United States
  • Andrei Osterman, Sanford Burnham Prebys Institute, United States
  • Dmitry Rodionov, Sanford Burnham Prebys Institute and IITP RAS, United States

Short Abstract: High-throughput genomic and metagenomic sequencing has revolutionized the exploration of complex microbial communities such as the human or soil microbiota. We have developed an approach for describing and comparing microbial communities in terms of their metabolic (in addition to phylogenetic) signatures. Using a subsystems-based metabolic reconstruction methodology, we infer phenotypic features (nutrient requirements, carbohydrate utilization capabilities, quorum sensing, etc.) directly from microbial genomes. The resulting collection of binary metabolic phenotypes for ~2,300 reference bacterial genomes representing the human gut microbiota was used in a two-step pipeline for predicting phenotype profiles of 16S rRNA samples. The upstream module determines the taxonomic composition of input samples using classifiers implemented in QIIME2 and various 16S databases (Greengenes, NCBI, RDP, SILVA), which were assessed for coverage of the reference collection. The downstream module calculates the matrix of cumulative phenotypes, normalized by species abundance, for each sample and each metabolic feature. It uses a three-step taxonomic mapping procedure and computes averaged phenotype indices at the species, genus and family levels for probabilistic assessment of the metabolic features of taxa that cannot be mapped to currently available reference genomes of individual species. We also implemented sequence-based weighted mapping to reference genomes and compared it with taxonomy-based approaches for community phenotype inference.
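The downstream calculation, a community phenotype value per sample and metabolic feature, can be sketched as an abundance-weighted sum of binary phenotypes. This minimal version assumes phenotypes are known for reference species. For taxa without a reference genome, it falls back to the mean phenotype of reference species in the same genus, a simplified two-level version of the species/genus/family averaging described in the abstract. It also assumes normalization by the mapped abundance. All taxon names, phenotype values and the fallback and normalization rules are hypothetical.

```python
def community_phenotype(sample_abundance, species_phenotype, genus_of):
    """Abundance-weighted phenotype index for one sample and one metabolic feature.

    sample_abundance: dict taxon -> relative abundance in the sample.
    species_phenotype: dict reference species -> 0/1 phenotype (e.g. uses a sugar).
    genus_of: dict taxon -> genus name (must cover all reference species).
    Taxa without a reference genome get the mean phenotype of reference
    species in their genus; taxa with no information at all are skipped.
    """
    genus_values = {}
    for sp, value in species_phenotype.items():
        genus_values.setdefault(genus_of[sp], []).append(value)

    index, mapped = 0.0, 0.0
    for taxon, abundance in sample_abundance.items():
        if taxon in species_phenotype:
            value = species_phenotype[taxon]       # species-level phenotype
        elif genus_of.get(taxon) in genus_values:
            vals = genus_values[genus_of[taxon]]
            value = sum(vals) / len(vals)          # genus-level average
        else:
            continue                               # unmappable taxon
        index += abundance * value
        mapped += abundance
    # Normalize by the abundance that could be mapped (NaN if none).
    return index / mapped if mapped else float("nan")

genus_of = {"sp_A1": "G1", "sp_A2": "G1", "sp_A3": "G1", "sp_B1": "G2", "sp_C1": "G3"}
species_phenotype = {"sp_A1": 1, "sp_A2": 0, "sp_B1": 1}  # reference genomes only
sample = {"sp_A1": 0.4, "sp_A3": 0.2, "sp_B1": 0.3, "sp_C1": 0.1}
result = community_phenotype(sample, species_phenotype, genus_of)
print(round(result, 3))
# 0.889
```

Running this for every sample and every metabolic feature yields the sample-by-phenotype matrix.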

B-596: Constructing Lightweight and Flexible Pipelines Using Plugin-Based Microbiome Analysis (PluMA)
  • Trevor Cickovski, Florida International University, United States
  • Giri Narasimhan, Florida International University, United States

Short Abstract: Software pipelines have become almost standard tools for microbiome analysis. Many pipelines are currently available, and they often share some of the same algorithms as stages. This duplication arises largely because each pipeline has its own source language and file formats, which typically makes it more economical to reinvent the wheel than to learn and interface with an existing package. We present Plugin-Based Microbiome Analysis (PluMA), which addresses this problem by providing a lightweight back end that can be extended indefinitely using dynamically loaded plugins. These can be written in one of many compiled or scripting languages. With PluMA and its online plugin pool, algorithm designers can easily plug and play existing pipeline stages without knowledge of their underlying implementation, allowing them to efficiently test a new algorithm alongside these stages or to combine them in new and creative ways. We demonstrate the usefulness of PluMA through an example pipeline (P-M16S) that extends an obesity study involving mouse gut microbiome samples by integrating multiple plugins written in a variety of source languages and using a variety of file formats, producing new results.
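The plug-and-play idea can be sketched as a small registry of stage objects that share one interface and are chained by an ordered list of stage names. The input/run/output method names, the registry and the in-memory hand-off between stages are assumptions for illustration. This sketch does not reproduce PluMA's actual plugin loading mechanism, languages or configuration format.

```python
from typing import Dict, List

class Plugin:
    """Hypothetical stage interface: receive input, compute, hand back output."""
    def input(self, data):
        self.data = data

    def run(self):
        raise NotImplementedError

    def output(self):
        return self.result

class Normalize(Plugin):
    """Convert raw counts to relative abundances."""
    def run(self):
        total = sum(self.data.values())
        self.result = {k: v / total for k, v in self.data.items()}

class Filter(Plugin):
    """Drop taxa below a fixed relative-abundance cutoff (hypothetical 10%)."""
    def run(self):
        self.result = {k: v for k, v in self.data.items() if v >= 0.1}

# Stages are looked up by name, standing in for dynamically loaded plugins.
REGISTRY: Dict[str, type] = {"Normalize": Normalize, "Filter": Filter}

def run_pipeline(stages: List[str], data):
    """Run the named stages in order, feeding each output to the next stage."""
    for name in stages:
        plugin = REGISTRY[name]()
        plugin.input(data)
        plugin.run()
        data = plugin.output()
    return data

counts = {"taxon_a": 60, "taxon_b": 35, "taxon_c": 5}
print(run_pipeline(["Normalize", "Filter"], counts))
# {'taxon_a': 0.6, 'taxon_b': 0.35}
```

Because each stage only depends on the shared interface, a new algorithm can be swapped in by registering another class under a new name and changing the stage list.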

B-597: Associating Holobiont Pathways with Host Disease Traits Using Microbiome Data via an Adaptive Canonical-correlation Model
  • Yongzhong Zhao, Lerner Research Institute, United States
  • Zeneng Wang, Lerner Research Institute, United States

Short Abstract: Holobiont pathways, i.e., the pathways of the holobiont (the host and its microbial symbionts), encompass genes and proteins encoded by the hologenome (the collective genomic content of a host and its microbiome), as well as metabolites and other molecules. Holobiont pathways underpin the host-microbiota interactions that etiologically underlie complex diseases. Because metagenomic data generally combine limited sample sizes with millions of microbial gene variables, it remains challenging to correlate holobiont pathways with host traits, especially disease and intermediate phenotypes. Here we present a framework for associating holobiont pathways with complex diseases, including type II diabetes mellitus (T2DM), atherosclerotic cardiovascular disease (ACVD) and inflammatory bowel disease (IBD), using microbiome data and an adaptive canonical-correlation analysis model (Acam). In this model, we weight the metagene variables using enzyme kinetic data, such as the Michaelis constants of the corresponding encoded enzymes, and gut microbiome metagene abundance. We implemented this model in an R package, HolobiontR, and assessed it with synthetic and real microbiome datasets. By applying this model to the MetaHIT and HMP2 metagenome and metatranscriptome data, we show potential associations of the short-chain fatty acid pathway with T2DM, the secondary bile acid pathway with IBD, and trimethylamine holobiont pathways with ACVD.

B-598: Metagenomics-Based Ecosystem Biomonitoring: EcoBiomics
  • Christine Lowe, Agriculture and Agri-Food Canada, Canada
  • Donald Baird, Environment and Climate Change Canada, Canada
  • Guillaume Bilodeau, Canadian Food Inspection Agency, Canada
  • Charles Greer, National Research Council of Canada, Canada
  • Armand Seguin, Natural Resources Canada, Canada
  • Iyad Kandalaft, Agriculture and Agri-Food Canada, Canada
  • Keith Glen Newton, Agriculture and Agri-Food Canada, Canada
  • Thomas Edge, Environment and Climate Change Canada, Canada
  • James Macklin, Agriculture and Agri-Food Canada, Canada

Short Abstract: Through the Genomics Research and Development Initiative, the EcoBiomics Project addresses the urgent need to better understand the extent and significance of ongoing changes to microbial and invertebrate biodiversity in soil and aquatic ecosystems in response to anthropogenic stressors. To sustain ecosystem services and economically important natural resources such as fisheries, forests, and agriculture, the Government of Canada has engaged eight science-based departments and agencies to address this challenge collaboratively. The EcoBiomics Project has three overarching objectives: i) develop standard methods for sample collection, DNA extraction, and next-generation DNA sequencing, together with a federal bioinformatics platform for harmonizing the analysis of metagenomics data across federal departments; ii) pilot genomic observatories to establish comprehensive metagenomics baselines at nationally important long-term environmental monitoring sites in Canada; and iii) apply next-generation sequencing to comprehensively characterize aquatic microbiomes, soil microbiomes, and invertebrate zoobiomes, and test hypotheses for improving environmental monitoring, assessment, and remediation activities for water quality and soil health across Canada. By using robust, standardized metadata, protocols, and data analysis methods, the EcoBiomics Project will contribute high-quality datasets. These datasets will enable environmental monitoring initiatives and comparative studies across geographically diverse sites.

B-599: Utilizing Longitudinal Gut Microbiome Profiles to Predict Allergies via Autoencoder and Long Short-Term Memory Network
  • Ahmed Metwally, University of Illinois at Chicago, United States
  • Philip Yu, University of Illinois at Chicago, United States
  • Derek Reiman, University of Illinois at Chicago, United States
  • Yang Dai, University of Illinois at Chicago, United States
  • Patricia Finn, University of Illinois at Chicago, United States
  • David Perkins, University of Illinois at Chicago, United States

Short Abstract: Deep learning has transformed many fields by offering powerful strategies for extracting abstract nonlinear features that are refractory to traditional methods. In particular, Long Short-Term Memory (LSTM) networks can learn the dynamic temporal behavior of time-sequence events. Meanwhile, allergic asthma and food allergy are usually hard to diagnose at young ages, and failing to diagnose these atopic diseases earlier may lead to severe complications. Recently, many studies have linked the infant gut microbiome to allergy development. In this work, we investigate the use of an autoencoder and an LSTM to predict various types of allergies in young children (0-3 years) from their longitudinal stool microbiome profiles. Our results demonstrate the effectiveness of the proposed model, which shows a significant increase in predictive power compared to SVM and logistic regression.
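The model described here compresses each time point's microbiome profile with an autoencoder, then feeds the resulting codes, in order, to an LSTM. The NumPy sketch below shows only the forward pass of that arrangement, using the standard LSTM gate equations. The layer sizes, the random untrained weights, and the logistic read-out are illustrative assumptions, not the authors' configuration. Training is omitted, which would cover both the autoencoder reconstruction loss and the classification loss.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_taxa, n_code=8, n_hidden=16, seed=0):
    """Random, untrained weights for the encoder, the LSTM and the read-out."""
    rng = np.random.default_rng(seed)
    return {
        "W_enc": rng.normal(0, 0.1, (n_code, n_taxa)), "b_enc": np.zeros(n_code),
        "W": rng.normal(0, 0.1, (4 * n_hidden, n_code)),    # input weights, 4 gates stacked
        "U": rng.normal(0, 0.1, (4 * n_hidden, n_hidden)),  # recurrent weights
        "b": np.zeros(4 * n_hidden),
        "w_out": rng.normal(0, 0.1, n_hidden),
    }

def encode(profile, params):
    """Encoder half of an autoencoder: compress one abundance profile to a short code."""
    return np.tanh(params["W_enc"] @ profile + params["b_enc"])

def lstm_step(x, h, c, W, U, b):
    """One LSTM step with input (i), forget (f), output (o) and candidate (g) gates."""
    i, f, o, g = np.split(W @ x + U @ h + b, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g        # keep part of the old memory, add new candidate content
    h = o * np.tanh(c)       # expose a gated view of the memory
    return h, c

def predict_allergy(params, profiles):
    """Allergy probability from a time-ordered list of relative-abundance profiles."""
    n_hidden = params["U"].shape[1]
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for profile in profiles:  # one stool sample per time point
        h, c = lstm_step(encode(profile, params), h, c, params["W"], params["U"], params["b"])
    return float(sigmoid(params["w_out"] @ h))  # logistic read-out of the final state
```

The LSTM carries information across visits through its cell state `c`. This is what lets the classifier use the trajectory of a child's microbiome rather than a single snapshot.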

B-600: The effect of host sex on gut microbiome alterations in response to the xenobiotic risperidone
  • Samantha Atkinson, Medical College of Wisconsin, United States
  • Tomye Ollinger, Medical College of Wisconsin, United States
  • Justin L. Grobe, University of Iowa, United States
  • John R. Kirby, Medical College of Wisconsin, United States

Short Abstract: Risperidone, a commonly prescribed antipsychotic, causes weight gain in humans and mice. We have shown that risperidone-induced shifts in the gut microbiome are mechanistically involved in this weight gain phenotype. Males and females are known to have inherent differences in their microbiomes. We therefore hypothesize that risperidone alters the microbiome differently in the two sexes, which in turn differentially affects weight gain. We co-assembled metagenomic reads, assigned taxonomy, and analyzed the data with Anvi’o to characterize the microbiota of male and female mice in response to risperidone. Female mice showed a loss of species associated with a healthy gut during risperidone treatment, which correlated with a gain in body weight. Conversely, male mice did not lose these organisms in response to risperidone and did not gain weight compared to controls. When we assessed the functional capabilities of each gut microbiota, we observed higher protein counts associated with antibiotic and secondary metabolite biosynthesis in male mice than in female mice in response to risperidone. We speculate that the gut microbiome of male mice protects against risperidone-induced weight gain through a fitness advantage conferred by increased antibiotic and metabolite biosynthesis. Future biological and informatics studies will explore this hypothesis.

B-601: Phylogenetic placement of exact amplicon sequences improves associations with clinical information
  • Stefan Janssen, University of California San Diego, United States
  • Daniel McDonald, UCSD, United States
  • Antonio Gonzalez, UCSD, United States
  • Lingjing Jiang, UCSD, United States
  • Zhenjiang Zech Xu, UCSD, United States
  • Kevin Winker, University of Alaska Museum and Department of Biology and Wildlife, Fairbanks, Alaska, United States
  • Deborah M Kado, University of California, San Diego, United States
  • Eric Orwoll, Oregon Health & Science University, United States
  • Mark Manary, Washington University in Saint Louis, United States
  • Siavash Mirarab, UCSD, United States
  • Rob Knight, UCSD, United States

Short Abstract: Recent algorithmic advances in amplicon-based microbiome studies enable the inference of exact amplicon sequence fragments. By removing erroneous sequences, these new methods allow investigation of sub-operational taxonomic units (sOTUs). However, short DNA sequence fragments do not contain enough phylogenetic signal to reconstruct a reliable tree. This creates a barrier to using critical phylogenetically aware metrics such as Faith's PD or UniFrac. Fragment insertion methods do exist, but they have not been tested on sOTUs from high-throughput amplicon studies when inserting into a broad reference phylogeny. We benchmark the SATé-enabled phylogenetic placement (SEPP) technique specifically on 16S V4 sequence fragments. We show that it outperforms the conceptually problematic but commonly used practice of reconstructing de novo phylogenies. In addition, we provide a BSD-licensed QIIME2 plugin for SEPP (https://github.com/biocore/q2-fragment-insertion) and integrate it into the microbial study management platform QIITA. The move from OTU-based to sOTU-based analysis provides additional resolution but also introduces computational challenges. One popular way of handling sOTUs is to build a de novo tree from the short sequences. We demonstrate that this can yield incorrect results in human gut metagenomic studies. We also show that placing the new sequences phylogenetically with SEPP resolves this problem and offers other benefits over existing methods.
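Faith's PD, one of the phylogenetically aware metrics mentioned above, is the total length of the branches that connect a sample's observed features to the root. Its value is therefore only as meaningful as the tree the sOTUs sit on. Below is a minimal pure-Python sketch on a hypothetical toy tree, stored as a child-to-(parent, branch length) map. In practice the tree would be a reference phylogeny with the sOTUs inserted, for example by SEPP.

```python
from typing import Dict, Iterable, Optional, Tuple

# child -> (parent, length of the branch leading to child); the root's parent is None
Tree = Dict[str, Tuple[Optional[str], float]]

def faith_pd(tree: Tree, observed: Iterable[str]) -> float:
    """Sum the lengths of all distinct branches on paths from observed tips to the root."""
    counted = set()
    total = 0.0
    for tip in observed:
        node = tip
        # walk toward the root, stopping at a branch that was already counted
        while node is not None and node not in counted:
            parent, length = tree[node]
            if parent is not None:
                total += length
            counted.add(node)
            node = parent
    return total

# Hypothetical toy tree: two clades under the root.
toy_tree: Tree = {
    "root": (None, 0.0),
    "A": ("root", 1.0), "B": ("root", 2.0),
    "sOTU1": ("A", 0.5), "sOTU2": ("A", 0.5), "sOTU3": ("B", 1.0),
}
```

Here `faith_pd(toy_tree, ["sOTU1", "sOTU2"])` counts branches 0.5 + 0.5 + 1.0. Observing one sOTU from each clade, as in `["sOTU1", "sOTU3"]`, also adds the long branches to A and B. Misplacing a single sOTU on a bad de novo tree therefore shifts the metric directly.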

B-602: ATGC Database and ATGC-COGs: an Updated Resource for Micro- and Macro-evolutionary Studies of Prokaryotic Genomes and Protein Family Annotation
  • David Kristensen, University of Iowa, United States

Short Abstract: As genome sequencing continues at an inexorably rapid pace, the resulting information provides unprecedented glimpses into the diversity of life in our biosphere. Numerous novel organisms with new metabolic pathways and unique nanostructural designs continue to be discovered. Alongside this breadth of diversity, the depth of coverage is also increasing. As multiple genome assemblies become available for more organisms, pangenomic analysis enables detailed studies of microevolutionary change. The ATGC (Alignable Tight Genomic Clusters) database is a collection of data for closely related prokaryotic genomes, with several tools to aid research into evolutionary processes in the microbial world. It contains millions of proteins from thousands of genomes, organized into hundreds of clusters. The clusters are objectively defined by local gene order (synteny) and synonymous substitutions in protein-coding genes. These criteria are used because, at small evolutionary distances, traditional phylogenetic markers such as 16S rRNA become uninformative. Synteny and synonymous substitutions, by contrast, remain highly informative, far more so than raw DNA similarity. As such, each ATGC is suited to analyzing microevolutionary variation within a cohesive group of organisms (e.g., a species), whereas the entire collection of ATGCs is useful for macroevolutionary studies.
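The cluster definition above combines two pairwise criteria: conserved gene order and low synonymous divergence. The Python sketch below groups genomes by single linkage over pairs that pass both tests, using a union-find structure. The threshold values and input format are hypothetical placeholders, not the cutoffs used to build the ATGC database. The real procedure may also differ in linkage rule, for example by requiring every pair within a cluster to pass.

```python
from typing import Dict, List, Tuple

def cluster_genomes(genomes: List[str],
                    pairs: Dict[Tuple[str, str], Tuple[float, float]],
                    min_synteny: float = 0.95,
                    max_ds: float = 1.0) -> List[List[str]]:
    """Single-linkage clusters of genomes whose pairwise (synteny, dS) pass both cutoffs.

    pairs maps (genome_a, genome_b) -> (fraction of conserved local gene order,
    synonymous substitution distance dS). Both cutoffs are illustrative only.
    """
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]   # path halving keeps trees shallow
            g = parent[g]
        return g

    for (a, b), (synteny, ds) in pairs.items():
        if synteny >= min_synteny and ds <= max_ds:
            parent[find(a)] = find(b)       # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(clusters.values())
```

A pair with near-identical gene order but saturated synonymous divergence is rejected, as is a pair with low divergence but disrupted synteny. Both signals must agree before genomes are linked.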

B-603: Microbiome alteration in Inflammatory Bowel Disease and Depression comorbidity
  • Pedro Morell, Denmark Technical University Bioinformatics, Denmark
  • Haja Kadarmideen, Denmark Technical University Bioinformatics, Denmark

Short Abstract: Patients with Inflammatory Bowel Disease (IBD) are more likely than healthy individuals to develop symptoms of depression. The traditional explanation has been that chronic illness harms the patient's mental health. However, the link between depression and the microbiome is now fairly well established, and IBD is also closely related to microbiome structure. We therefore aimed to determine which microbiome alterations explain this comorbidity. To this end, we used data from the Integrative Human Microbiome Project. The data comprise multiple stool samples from 70 participants (patients with Crohn's disease or ulcerative colitis, and controls), together with a mental health questionnaire answered by all of them. A machine learning approach was used to determine which bacteria most influence the different phenotypes, and we identified several significantly distinct taxa. We then analyzed the metabolic pathways of Burkholderiales bacterium 1_1_47, because it is present only in IBD patients who do not show symptoms of depression. These results could help us better understand the relationship between the microbiome and the brain. They may also explain why some cases of dysbiosis lead to mental health problems while others do not.

B-604: Non-digestible carbohydrates cause distinct and targeted shifts in the human microbiome as revealed by longitudinal clinical studies
  • Jonathan Leff, Kaleido Biosciences, United States
  • Adarsh Jose, Kaleido Biosciences, United States
  • Jie Tan, Kaleido Biosciences, United States
  • Steven Fukuda, Kaleido Biosciences, United States
  • Michael Mahowald, Kaleido Biosciences, United States
  • Ruth Thieroff-Ekerdt, Kaleido Biosciences, United States

Short Abstract: Non-digestible carbohydrate (NDC) intake is associated with beneficial health outcomes. Several lines of evidence suggest that NDCs modulate the microbial community in the gastrointestinal tract, promoting microbes with beneficial traits. However, it remains unclear how much different NDCs modulate the microbiome and how these effects evolve over time. We designed three complementary studies to thoroughly characterize the effects of two NDCs, oligofructose (FOS) and polydextrose (PDX), on the structure and function of the human microbiome. Using dense time-series metagenomic data, we determined responses to each NDC in two studies. We then validated them in a third, cross-over study featuring both. FOS and PDX had distinct effects on microbiome structure: FOS decreased diversity and PDX increased it. FOS consistently promoted Bifidobacterium spp., while PDX promoted Parabacteroides spp., among other significant taxa shifts. These shifts were recapitulated in the cross-over study independent of the order of NDC intake. Taxonomic shifts were further linked to changes in functional gene profiles, suggesting differences in carbohydrate utilization and metabolic capacity. Taken together, these results demonstrate that NDCs can modulate the human gut microbiome in distinct, targeted, and predictable ways, which could be used to promote specific health-related outcomes. These important findings support further research with proprietary compounds.

B-605: PopPhy-CNN: A Convolutional Neural Network Approach Using Embedded Phylogenetic Trees for Analyzing the Association of Host Microbiome and Phenotype
  • Derek Reiman, University of Illinois at Chicago, United States
  • Ahmed Metwally, University of Illinois at Chicago, United States
  • Yang Dai, University of Illinois at Chicago, United States

Short Abstract: Accurate prediction of host phenotype from a metagenomic sample, and identification of the associated bacterial markers, are important goals in metagenomic studies. We introduce PopPhy-CNN, a novel convolutional neural network (CNN) architecture that effectively exploits the phylogenetic structure of microbial taxa. PopPhy-CNN represents each sample as a 2D matrix: an image of the phylogenetic tree populated with the relative abundances of the microbial taxa in that sample. This representation allows CNNs to explore both the spatial relationships of the taxonomic annotations on the tree and their quantitative characteristics in metagenomic data. We evaluate PopPhy-CNN on three metagenomic datasets of moderate size. It outperforms random forests, support vector machines, LASSO, and a baseline 1D-CNN built on microbial relative-abundance feature vectors. In addition, we design a novel scheme for extracting features from the trained CNN models. Using the extracted features to train support vector machines improves performance. PopPhy-CNN enables both the retrieval of informative microbial taxa from the trained CNN models and their visualization on the phylogenetic tree.
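The key step in PopPhy-CNN is turning an abundance-populated phylogenetic tree into an image-like 2D matrix. The sketch below shows one simple way to do this under two assumptions: each internal node holds the summed abundance of its descendants, and each tree level becomes one matrix row, filled left to right and zero-padded. This layout is an illustrative simplification. The published method may place nodes differently, for example keeping children directly beneath their parents. The function name and example taxa are hypothetical.

```python
from typing import Dict, List

def tree_matrix(children: Dict[str, List[str]], leaf_abundance: Dict[str, float],
                root: str = "root") -> List[List[float]]:
    """Populate a taxonomy tree with abundances and lay it out one row per depth."""
    values: Dict[str, float] = {}

    def populate(node: str) -> float:
        kids = children.get(node, [])
        if kids:
            values[node] = sum(populate(k) for k in kids)   # internal node: sum of children
        else:
            values[node] = leaf_abundance.get(node, 0.0)    # leaf: measured abundance
        return values[node]

    populate(root)
    rows: List[List[float]] = []
    frontier = [root]
    while frontier:                                         # breadth-first traversal
        rows.append([values[n] for n in frontier])
        frontier = [k for n in frontier for k in children.get(n, [])]
    width = max(len(row) for row in rows)
    return [row + [0.0] * (width - len(row)) for row in rows]  # zero-pad to a rectangle
```

Neighboring cells in such a matrix correspond to related taxa, which is what gives a 2D convolution something meaningful to detect.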

B-606: NBBD: A Network-based Biomarkers Discovery Framework from Metagenomics Data
  • Mostafa Abbas, Qatar Computing Research Institute, Qatar
  • Vasant Honavar, The Pennsylvania State University, United States
  • Yasser El-Manzalawy, The Pennsylvania State University, United States

Short Abstract: Biomarker discovery is one of the most successful means of translating genomic data into clinical practice. Changes in gut microbial composition have been associated with disease states such as Type 2 Diabetes (T2D), obesity, and Inflammatory Bowel Disease (IBD). Reliably identifying the most informative features (i.e., microbes) for discriminating metagenomic samples from two or more groups (i.e., phenotypes) is a major challenge in computational metagenomics. In this work, we propose a comparative network-based framework for detecting biomarkers from metagenomics data. Our framework has two customizable components. The first, i) a network inference component, applies any existing tool to infer ecological networks from the abundances of microbial operational taxonomic units (OTUs). The second, ii) a node importance scoring component, compares the networks constructed for the two phenotypes and scores each node by the change in its topological properties between the two networks. We report preliminary results for identifying IBD biomarkers in a large cohort of 657 IBD and 316 healthy-control metagenomic samples. They show that our network-based approach is very competitive with some state-of-the-art feature selection methods, including the widely used method based on random forest variable importance scores.
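The framework's two components can be illustrated with deliberately simple stand-ins. For network inference, the sketch uses a thresholded Pearson-correlation network. For node importance, it scores each OTU by the absolute change in its degree between the two phenotype networks. Both choices, the 0.6 cutoff, and all function names are hypothetical. The framework itself is designed to accept any existing network-inference tool and other topological properties.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def infer_network(abund: Dict[str, List[float]], cutoff: float = 0.6) -> Set[Tuple[str, str]]:
    """Network-inference stand-in: connect OTUs whose |Pearson r| >= cutoff."""
    return {(a, b) for a, b in combinations(sorted(abund), 2)
            if abs(pearson(abund[a], abund[b])) >= cutoff}

def degree(edges: Set[Tuple[str, str]], node: str) -> int:
    return sum(node in edge for edge in edges)

def score_nodes(group1: Dict[str, List[float]],
                group2: Dict[str, List[float]]) -> Dict[str, int]:
    """Node-importance stand-in: absolute change in degree between the two networks."""
    net1, net2 = infer_network(group1), infer_network(group2)
    return {otu: abs(degree(net1, otu) - degree(net2, otu)) for otu in group1}
```

OTUs that gain or lose many ecological partners between phenotypes rank highest, even if their mean abundance barely changes. That is the kind of signal an abundance-only feature ranking can miss.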

B-607: Identifying important uncharacterized genes using metagenomes and metatranscriptomes
  • Gholamali Rahnavard, Broad Institute of MIT and Harvard, United States
  • Afrah Shafquat, Harvard University, United States
  • Kevin Bonham, Harvard University, United States
  • Himel Mallick, Harvard University, United States
  • Eric Franzosa, Harvard University, United States
  • Curtis Huttenhower, Harvard University, United States

Short Abstract: The discovery of novel microbial genes from metagenomes and, increasingly, metatranscriptomes has outpaced our ability to characterize those genes functionally. In this work, we present PPANINI (Prioritization and Prediction of functional Annotation for Novel and Important genes via automated data Network Integration), a method that prioritizes genes by an “importance” score calculated across microbial communities. We first validated PPANINI by assessing homologs of known essential genes, achieving high accuracy (e.g., AUC = 0.74, 0.82, and 0.94). This held across a range of microbial habitats, including four human body sites (skin, vagina, gut, and mouth), marine, and prairie soil metagenomes. Applied to these environments, the method prioritized a total of 463,044 novel and 274,913 uncharacterized gene families, in addition to 124,332 already-characterized genes. These results differed strikingly from isolate genome analysis: 722,304 gene families were identified solely from metagenomes. Finally, applying PPANINI to the Crohn’s disease metatranscriptome revealed enriched functional categories important in the disease, including viral release from host cells. This method thus provides an efficient strategy for identifying potentially important, undercharacterized genes from microbial communities, paving the way for improved bioinformatic and biochemical characterization efforts. http://huttenhower.sph.harvard.edu/ppanini.
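The AUC values above measure how well an importance score ranks true positives, here homologs of known essential genes, above all other genes. The minimal pure-Python sketch below computes AUC as the Mann-Whitney probability that a random positive outscores a random negative, with ties counting one half. The function name is illustrative, and any scores passed to it are made-up examples rather than PPANINI outputs.

```python
from typing import Sequence

def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, `auc([0.9, 0.7, 0.4], [0.5, 0.2])` wins five of six pairs. An AUC of 0.5 means the scores rank positives no better than chance. The pairwise loop is quadratic, and for genome-scale gene sets a rank-sum formulation over sorted scores gives the same value faster.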

B-608: SigmaW: Utilizing Amazon Web Service Cloud Computing to Enhance Taxonomic Profiling in Metagenomics Analysis
  • Dylan Lawrence, Program of Computational and Systems Biology, Washington University in St. Louis, United States
  • Taehee Kim, Hanyang University, South Korea
  • Matthew Mosior, Program of Bioinformatics and Computational Biology, Saint Louis University, United States
  • Chongle Pan, Oak Ridge National Laboratory, United States
  • Mina Rho, Hanyang University, South Korea
  • Tae-Hyuk Ahn, Saint Louis University, United States

Short Abstract: Over the last decade, cultivation-independent metagenomics has been widely applied to understand the crucial roles of microbes in human health. In this approach, the entire set of microorganisms in a sample is sequenced directly together. In previous work, Sigma was proposed for strain-level identification and quantification of microbes in metagenomic analysis using their reference genomes. Here we present SigmaW, a fast and accurate cloud-based taxonomy profiler for metagenomic analysis. SigmaW uses Amazon Web Services (AWS) to provide its primary cloud computing capabilities. Cloud computing makes SigmaW more user-friendly: users can run the metagenomic profiling tool quickly and easily without initial software setup or command-line execution. Elastic Beanstalk (EB), Relational Database Service (RDS), and EC2 are the central services used in SigmaW. In addition, the small size of the NCBI reference genomes enabled quick analysis of the metagenomic datasets to obtain a sketch of microbiome composition. The algorithm's performance was evaluated using simulated mock communities and human microbiome samples.

B-609: Comparison of read-based annotation and assembly methods for taxonomic and functional inferences in shotgun metagenomic data
  • Mark Maienschein-Cline, University of Illinois at Chicago, United States
  • George Chlipala, University of Illinois at Chicago, United States
  • Zhengdeng Lei, University of Illinois at Chicago, United States
  • Pinal Kanabar, University of Illinois at Chicago, United States
  • Hong Hu, University of Illinois at Chicago, United States
  • Kyle Wong, University of Illinois at Chicago, United States

Short Abstract: Shotgun metagenomic sequencing creates incredibly rich datasets, with abundant information about the different organisms and biological functions present in a system. Because these datasets are so diverse, measuring this information accurately and in a computationally tractable manner remains a tremendous challenge. There are two general approaches, each with many alternative strategies. (1) Read-based annotation methods annotate each raw read separately against a database. (2) Assembly-based methods first perform a de novo assembly of the raw reads and then annotate the resulting contigs. Here, we compare these two approaches in terms of the computational requirements and scaling of each strategy and the diversity and types of results obtained. We also compare the quantitative similarity of specific estimates that can be measured from each approach, such as taxonomic abundance. Furthermore, we provide general guidelines on which types of experimental questions are best addressed by read-based annotation and which by assembly.
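One way to quantify agreement between read-based and assembly-based taxonomic abundance estimates is a profile dissimilarity such as Bray-Curtis. The sketch below uses that metric as an illustrative choice; it is not necessarily the comparison used in this study. The function name and example profiles are hypothetical.

```python
from typing import Dict

def bray_curtis(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 when no taxa are shared."""
    taxa = set(p) | set(q)
    shared = sum(min(p.get(t, 0.0), q.get(t, 0.0)) for t in taxa)  # overlapping abundance
    total = sum(p.values()) + sum(q.values())
    return 1.0 - 2.0 * shared / total
```

Comparing `{"A": 0.5, "B": 0.5}` from reads with `{"A": 0.7, "B": 0.3}` from contigs gives 0.2. Taxa missing from one profile, such as organisms too rare to assemble, count fully toward the dissimilarity.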

B-610: Microbiome Analysis of Styrofoam Consuming Mealworms
  • Samantha Sevilla, George Mason University, United States
  • Masoomeh Sikaroodi, George Mason University, United States
  • Elizaveta Plis, Mantua Elementary School, United States
  • Ilya Manukhov, Moscow Institute of Physics and Technology, Russia
  • Patrick Gillevet, George Mason University, United States
  • Ancha Baranova, George Mason University, United States

Short Abstract: Advances in sequencing have improved our understanding of the environmental impact, and promise, of microbial communities. Polystyrene (PS), a biodegradation-resistant material commonly known as Styrofoam, can be used as a carbon source by microorganisms. However, its high molecular weight limits its use as a substrate. Recently, mealworms have demonstrated the ability to consume PS, and we have shown that consumption increases when they are conditioned on a high-sugar diet. The exact mechanism of degradation is unclear. We therefore performed 16S rRNA sequencing to compare the fecal and gut microbiomes of two diet-conditioned mealworm groups exposed to a PS-only diet, at four time points (Day 0, Day 5, Day 8, Day 12). No significant differences (Shannon index, p=0.88) were found between sample types, demonstrating that fecal material is representative of the gut microbiome. Significant differences (Shannon index, p=7.88E-5) were found between mealworms on a sugar-rich (dried apple slices) and a sugar-poor (rice bran) diet. Among collection time points, the strongest differences were observed between Day 0 and Day 12 and between Day 8 and Day 12 (Shannon index, p=3E-5 and p=6.7E-3, respectively). These findings may indicate rapid adaptation to the changing food source. Future studies will include culturing overrepresented species to characterize in detail the community capable of degrading Styrofoam and, possibly, other plastics.
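The Shannon index used for the diversity comparisons above is H = -Σ p_i ln p_i, summed over the relative abundances p_i of the taxa in a sample. The minimal sketch below computes it from raw 16S read counts. It assumes the natural logarithm; some tools use log base 2, which rescales H but does not change comparisons. The function name and counts are illustrative.

```python
import math
from typing import Iterable

def shannon(counts: Iterable[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    counts = [c for c in counts if c > 0]   # absent taxa contribute nothing
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)
```

Four equally abundant taxa give H = ln 4 ≈ 1.386, while a community dominated by one taxon approaches 0. This is why a diet-driven bloom of a few species shows up as a drop in the index.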

B-611: Profiling short-read microbiome data using population-based k-mer identification
  • Golestan Sally Radwan, Royal Holloway University of London, United Kingdom
  • Dr Hugh Shanahan, Royal Holloway University of London, United Kingdom

Short Abstract: Most studies of microbiome function rely on identifying individual species (or higher taxonomic ranks), mainly through marker genes, followed by functional annotation and/or phylogenetic mapping. This approach can lose information when reference genomes are missing, and it is computationally very expensive. In this study, we treat a microbiome as a population of genes/proteins and identify overrepresented protein families in the whole sample rather than in individual species. Because it assesses the behaviour of the entire microbiome in response to different conditions, this approach sidesteps complications such as horizontal gene transfer or unculturable species. We apply a k-mer-based pipeline to short-read data from lean, overweight, and obese twins to find frequently occurring motif fragments. From these we construct functional groupings that show the most influential functions in all three groups. This requires no global alignment, assembly, or reference genomes, and a typical run takes approximately 3 hours on a standard laptop. Preliminary results show a much greater diversity of influential functional groupings in obese twins than in lean and overweight twins. We also found around 185 potential candidates for novel protein families that warrant further experimental investigation.
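The reference-free idea above starts by counting short subsequences across the whole sample instead of assigning reads to species. The minimal sketch below shows that first step: count k-mers across all reads and keep those above a frequency cutoff. The value of k, the cutoff, the function name, and the toy reads are illustrative assumptions. The later steps that turn frequent fragments into functional groupings are not shown.

```python
from collections import Counter
from typing import Dict, Iterable

def frequent_kmers(reads: Iterable[str], k: int, min_count: int) -> Dict[str, int]:
    """Count every length-k substring across all reads; keep those seen >= min_count times."""
    counts = Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}
```

Because counts are pooled over the whole read set, a fragment shared by genes from many different organisms scores highly even if no single organism is abundant. This matches the population-level view of the microbiome taken here.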

B-612: Gaussian process models for microbial dynamics in the expanded Human Microbiome Project
  • Jason Lloyd-Price, Harvard University, United States
  • Anup Mahurkar, Institute for Genome Sciences, United States
  • Gholamali Rahnavard, Broad Institute of MIT and Harvard, United States
  • Jonathan Crabtree, Institute for Genome Sciences, United States
  • Joshua Orvis, Institute for Genome Sciences, United States
  • A. Brantley Hall, Harvard University, United States
  • Arthur Brady, Institute for Genome Sciences, United States
  • Heather H. Creasy, Institute for Genome Sciences, United States
  • Carrie McCracken, Institute for Genome Sciences, United States
  • Michelle Giglio, University of Maryland School of Medicine, United States
  • Daniel McDonald, UCSD, United States
  • Eric A. Franzosa, Harvard University, United States
  • Rob Knight, UCSD, United States
  • Owen White, Institute for Genome Sciences, United States
  • Curtis Huttenhower, Harvard University, United States

Short Abstract: Multiple molecular data types are increasingly used to study microbial community dynamics over time, for example in the NIH Human Microbiome Project (HMP). We have developed a set of complementary multi'omic longitudinal models for such data. These include Gaussian processes (GPs) with a Beta-Binomial likelihood that accounts for microbial communities' technical zeros, sequencing depth, overdispersion, and compositionality. Using GPs, we present new findings from a dramatic expansion of HMP shotgun metagenomes (“HMP1-II”, now ~2,400 samples). We partitioned the variance of microbial taxa and metabolic processes into host-specific, temporally variable, and rapidly variable components. We found that species abundances in the gut were highly individualized, particularly within the Bacteroidetes phylum. Firmicutes, in contrast, tended to be shared among individuals, with abundances that varied over time. Microbes at other body sites did not show such a phylum-level distinction and were less personalized than those in the gut. Meanwhile, metabolic pathways were not personalized despite being encoded by personalized microbial communities. This indicates that community assembly may be mediated by the need for keystone functions rather than particular taxa. The results and framework presented here will enable further in-depth characterization of microbiome dynamics, particularly as longitudinal datasets become more widely available in the field.
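In a GP model, partitioning variance into host-specific, temporally variable, and rapidly variable parts can be expressed as a sum of covariance kernels. The sketch below builds such a composite covariance for longitudinal samples from three terms. The first is a constant shared by all samples from the same host. The second is a squared-exponential term over sampling time within a host. The third is an independent per-sample term. The kernel forms, parameter names, and values are illustrative assumptions. The Beta-Binomial likelihood used in the abstract's models is not shown.

```python
import math
from typing import Dict, List, Tuple

Sample = Tuple[str, float]  # (host id, sampling time in days)

def composite_kernel(a: Sample, b: Sample, same_sample: bool,
                     var_host: float, var_time: float, var_rapid: float,
                     length_scale: float) -> float:
    """Covariance between two samples as host + temporal + rapid components."""
    k = 0.0
    if a[0] == b[0]:
        k += var_host  # persistent, host-specific offset
        # smooth within-host drift that decays with the time gap
        k += var_time * math.exp(-((a[1] - b[1]) ** 2) / (2 * length_scale ** 2))
    if same_sample:
        k += var_rapid  # variation not shared even by nearby time points
    return k

def covariance_matrix(samples: List[Sample], **params) -> List[List[float]]:
    return [[composite_kernel(a, b, i == j, **params) for j, b in enumerate(samples)]
            for i, a in enumerate(samples)]

def variance_fractions(var_host: float, var_time: float, var_rapid: float) -> Dict[str, float]:
    """Share of total per-sample variance attributed to each component."""
    total = var_host + var_time + var_rapid
    return {"host": var_host / total, "temporal": var_time / total, "rapid": var_rapid / total}
```

Under this model, a taxon whose fitted `var_host` dominates is individualized. A taxon dominated by `var_time` is shared across hosts but drifts over time.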

B-613: Identifying novel lateral gene transfer events from assembled metagenomes
  • Tiffany Hsu, Harvard University, United States
  • Eric Franzosa, Harvard University, United States
  • Dennis Wong, Dalhousie University, Canada
  • Chengwei Luo, Harvard University, United States
  • Robert Beiko, Dalhousie University, Canada
  • Morgan Langille, Dalhousie University, Canada
  • Curtis Huttenhower, Harvard University, United States

Short Abstract: Lateral gene transfer (LGT) is an important mechanism of genome diversification in microbial communities, including the human microbiome. Methods exist to identify LGTs from sequenced isolate genomes, but identifying LGTs from community metagenomes remains an open problem. To address this, we developed WAAFLE: the Workflow to Annotate Assemblies and Find LGT Events. WAAFLE integrates gene sequence homology and taxonomic provenance to identify metagenomic contigs that are explained by pairs of microbial clades but not by single clades (i.e., putative LGTs). It also rules out alternative explanations such as gene deletion and misassembly. We validated our approach on synthetic contigs containing spiked-in LGTs. WAAFLE identified challenging intra-genus LGTs with 51% sensitivity and other LGTs with >91% sensitivity, and it was >99.9% specific. We then applied WAAFLE to 138 million contigs from 2,289 assembled human metagenomes (the HMP1-II dataset). This revealed 393 thousand novel LGTs (182±173 per metagenome, mean±SD). These were enriched at oral and gut body sites (compared to skin and vagina) and among phylogenetically related taxa. Transferred functions were enriched for known mobile elements as well as outer membrane proteins, such as TonB receptors. Hence, WAAFLE is a powerful and useful approach for profiling LGTs in microbial communities.
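The core test above first asks whether a single clade can explain all of a contig's genes. If not, it asks whether some pair of clades explains them all. The simplified sketch below assumes each gene has already been annotated with the set of clades it hits by homology. The function name and input format are hypothetical. WAAFLE's actual scoring, its handling of the taxonomic hierarchy, and its checks for gene deletion and misassembly are omitted.

```python
from itertools import combinations
from typing import List, Optional, Set, Tuple

def explain_contig(gene_hits: List[Set[str]]) -> Tuple[str, Optional[Tuple[str, ...]]]:
    """Classify a contig as explained by one clade, a clade pair (putative LGT), or neither."""
    clades = sorted(set().union(*gene_hits))
    # 1) a single clade that hits every gene needs no LGT explanation
    for clade in clades:
        if all(clade in hits for hits in gene_hits):
            return "single", (clade,)
    # 2) two clades that jointly cover every gene suggest a putative LGT
    for a, b in combinations(clades, 2):
        if all(a in hits or b in hits for hits in gene_hits):
            return "lgt", (a, b)
    return "unexplained", None
```

Checking single clades before pairs encodes the parsimony at the heart of the method. A contig is called as a putative LGT only when no single clade suffices.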