Session A: (July 22 and July 23)
Session B: (July 24 and July 25)
Presentation Schedule for July 22, 6:00 pm – 8:00 pm
Presentation Schedule for July 23, 6:00 pm – 8:00 pm
Presentation Schedule for July 24, 6:00 pm – 8:00 pm
Session A Poster Set-up and Dismantle
Session B Poster Set-up and Dismantle
Short Abstract: With the expansion of microbiome sequencing globally, a key challenge is to relate new microbiome samples to the existing space of microbiome samples. Here, we present Microbiome Search Engine (MSE), which enables the rapid search of query microbiome samples against a large, well-curated reference microbiome database organized by taxonomic similarity at the whole-microbiome level. Tracking the microbiome novelty score (MNS) over 8 years of microbiome depositions based on searching in more than 100,000 global 16S rRNA gene amplicon samples, we detected that the structural novelty of human microbiomes is approaching saturation and likely bounded, whereas that in environmental habitats remains 5 times higher. Via the microbiome focus index (MFI), which is derived from the MNS and microbiome attention score (MAS), we objectively track and compare the structural novelty and attracted-attention scores of individual microbiome samples and projects, and we predict future trends in the field. For example, marine and indoor environments and mother-baby interactions are likely to receive disproportionate additional attention based on recent trends. Therefore, MNS, MAS, and MFI are proposed “alt-metrics” for evaluating a microbiome project or prospective developments in the microbiome field, both of which are done in the context of existing microbiome big data.
Short Abstract: The chicken cecum microbiome plays a significant role in host performance, digestion, absorption of nutrients and defense against pathogens. Exploring topological differences in microbial composition is important for understanding the role of each microbial member in host health and overall production. Here, we analyze the cecum microbiome of Ethiopian indigenous chickens from two distinct geographical zones: Afar (AF) district (Dulecha, 730 m above sea level) and Amhara (AM) district (Menz Gera Midir, 3300 m above sea level), using a whole-metagenome sequencing approach. We found that microbial populations were dominated by Bacteroidetes, Firmicutes, Proteobacteria, and other unclassified organisms. We identified 2210 genes common to the two groups and compared differences in microbial abundance. Linear discriminant analysis effect size (LEfSe) shows that Coprobacter, Geobacter, Cronobacter, Alloprevotella, and Dysgonomonas are significantly more abundant in AF. Pathway analysis showed that metabolism, genetic information processing, environmental information processing, and cellular processes are significant. Functional abundance between the two sample groups was found to be associated with nutrient absorption and microbial localization. Finally, the cecum microbial study reported here is a valuable resource for understanding the genetic profiles and the role played by the cecum microbiome in chicken adaptation to different environmental and climatic conditions.
Short Abstract: The microbiota composition of mosquitoes is influenced by their aquatic breeding environment. The exact factors that define the structure of the microbiota of mosquitoes, including malaria vectors, are currently unknown. An inventory of the bacterial populations was conducted to better understand their frequency distribution in larvae and in the water of their respective breeding sites. We collected 149 samples in which we identified 11 species, 49 genera and 82 strains. The bacterial populations were composed of Bacillus sp., Lysinibacillus sp., Paenibacillus sp. and Sobacillus sp. in Nanguilabougou. The species Bacillus anthracis and Bacillus thuringiensis were present only in An. coluzzii in Nanguilabougou and Kouroubabougou. Bacillus cereus, Bacillus amyloliquefaciens and Bacillus subtilis were observed only in Nanguilabougou and in An. coluzzii. Enterobacter cloacae and Enterobacter ludwigii were present in An. gambiae with the S1/S1 TEP1 genotype in Nanguilabougou. Bacillus cereus, Bacillus amyloliquefaciens and Bacillus subtilis were more associated with the S1/S1 TEP1 genotype in An. gambiae in the village of Kouroubabougou. These results have significant potential for developing a biological control method, pending further investigation of the ability of these bacteria to selectively infect and kill An. gambiae s.l. species at the larval stages.
Short Abstract: Motivation: Microbial ecological patterns exhibit high inter-subject variation, with few operational taxonomic units (OTUs) for each species. To overcome these issues, non-parametric approaches, such as the Wilcoxon rank-sum test, have often been used. However, these approaches only utilize the ranks of observed relative abundances, leading to information loss, and are associated with high false-negative rates. In this article, we propose a phylogenetic tree-based microbiome association test (TMAT) to analyze the associations between microbiome OTU abundances and disease phenotypes. Phylogenetic trees illustrate patterns of similarity among different OTUs, and TMAT provides an efficient method for utilizing such information for association analyses. The proposed TMAT provides test statistics for each node, which are combined to identify mutations associated with host diseases. Results: Statistical power estimates of TMAT were compared with existing methods using extensive simulations based on real absolute abundances. Simulation studies showed that TMAT preserves the nominal type-1 error rate, and estimates of its statistical power generally outperformed existing methods with regard to the considered scenarios. Furthermore, TMAT can be used to detect phylogenetic mutations associated with host diseases, providing more in-depth insight into bacterial pathology. Availability: TMAT was implemented in the R package. Detailed information is available at http://healthstat.snu.ac.kr/software/tmat.
Short Abstract: Fecal microbiota transplant (FMT) of human fecal samples to germ-free (GF) mice is useful for establishing causal relationships between altered gut microbiota and human phenotypes. However, due to intrinsic differences between human and mouse intestines, and the distinct diets of the two organisms, replicating human phenotypes in mice through FMT is not guaranteed. By comparing gut microbiota profiles in 1,713 human-mouse pairs, we found that, strikingly, on average <50% of human gut microbes can be re-established in mice at the species level. Among these, more than 1/3 underwent significant changes (referred to as “variable microbes”), most of which were consistent across multiple human-mouse pairs and experimental settings. Consistently, one-third of human samples changed their enterotypes after FMT, i.e. showed significant changes in their leading species. Mice fed a controlled diet showed a significant decrease in the enterotype change rate (~25%) compared with those on a non-controlled diet (~50%), suggesting a possible solution for rescue. Strikingly, most of the variable microbes have been implicated in human diseases, with some being recognized as causal species. Our results highlight the challenges of using mouse models to replicate human gut microbiota-associated phenotypes and call for additional validations after FMT.
Short Abstract: The host microbiome is composed of a wide variety of microorganisms, essential for maintaining human health and preventing diseases. Accurate estimation and inference of microbiome composition and function is essential for understanding their patho-physiological implications. A plethora of computational methods exists to determine microbial taxonomy profiles and functions accurately. However, their integration, parameterization and optimization is challenging, with no single best solution. To facilitate this, sbvIMPROVER organized a community challenge on Microbiomics: Microbiota composition prediction. As part of this challenge we designed a highly accurate and effective computational pipeline to recover the relative abundance and taxonomy assignment of bacterial communities. Our approach consists of three steps. First, a thorough quality assessment and read filtering is performed, followed by k-mer-based classification using Kraken, which assigns a taxonomic label. Bracken (Bayesian Re-estimation of Abundance with Kraken) is then used for abundance estimation. Finally, relative abundance at phylum, genus and species level is calculated using in-house Python modules. Results were tested against a blinded gold standard. Evaluation was done using a qualitative/binary classification metric (F1 score) and quantitative/abundance metrics (L1-norm and weighted UniFrac). Our computational approach performed significantly better in all evaluation metrics than the eight other submissions.
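The final step and two of the evaluation metrics named in the abstract above (relative abundance, F1 score, L1-norm) can be illustrated with a minimal sketch. This is not the authors' in-house code: the function names, the taxa and the read counts are hypothetical, and weighted UniFrac is omitted because it requires a phylogenetic tree.

```python
def relative_abundance(counts):
    """Normalize raw read counts per taxon so that the values sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def f1_score(predicted, truth):
    """F1 on presence/absence: harmonic mean of precision and recall."""
    pred, true = set(predicted), set(truth)
    tp = len(pred & true)  # taxa correctly reported as present
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)


def l1_distance(predicted, truth):
    """L1-norm: summed absolute difference in relative abundance per taxon."""
    taxa = set(predicted) | set(truth)
    return sum(abs(predicted.get(t, 0.0) - truth.get(t, 0.0)) for t in taxa)


# Hypothetical toy profile (read counts) and gold standard (fractions).
predicted = relative_abundance({"Bacteroides": 60, "Escherichia": 30, "Listeria": 10})
truth = {"Bacteroides": 0.5, "Escherichia": 0.5}

print(f1_score(predicted, truth), l1_distance(predicted, truth))
```

A higher F1 means better recovery of which taxa are present, while a lower L1 distance means the estimated proportions are closer to the gold standard.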
Short Abstract: Though Microbial Dark Matter (MDM) has been uncovered in a wide range of habitats, few studies have explored beyond abundance and distribution patterns, leaving the ecological role of MDM a mystery. To understand the potential ecological contributions of MDM, it is essential to first understand how these unknown species impact neighboring microbes and their respective environment. Here, we establish a method to predict the ecological significance of MDM using microbial correlation networks of four extreme aquatic environmental categories (Hot Springs, Hypersaline, Deep Sea, and Polar) compiled from 45 publicly available 16S rRNA studies of 1086 environmental samples. Networks were constructed including and excluding MDM at multiple taxonomic levels for each of the four environments. Network centrality measures were used to quantitatively compare between networks. Due to the significant changes to closeness and betweenness centralities of other microbes in the absence of MDM in the Deep Sea, Polar, and Hypersaline communities, MDM appear to play necessary ecological roles. Interestingly, microbial taxa were shown to predominantly occur as hubs across all environments. We show that MDM, by their interactions with other microbes, are integral, highly adapted to extreme environments, and can be used to detect novel genes and pathways of adaptation.
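The comparison described in the abstract above (centrality of other microbes with and without MDM nodes) can be sketched on a toy graph. This is an illustration only: the graph, the node names and the unnormalized closeness definition used here are assumptions, not the authors' networks or software.

```python
from collections import deque


def closeness(graph, source):
    """Closeness of `source`: number of reachable nodes divided by the
    sum of shortest-path distances to them (0.0 if nothing is reachable)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search over an unweighted graph
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def without(graph, removed):
    """Return a copy of the graph with one node and its edges removed."""
    return {
        node: {n for n in neighbors if n != removed}
        for node, neighbors in graph.items()
        if node != removed
    }


# Hypothetical star-shaped correlation network with an MDM taxon as the hub.
network = {
    "taxon_A": {"MDM_1"},
    "taxon_B": {"MDM_1"},
    "taxon_C": {"MDM_1"},
    "MDM_1": {"taxon_A", "taxon_B", "taxon_C"},
}

with_mdm = closeness(network, "taxon_A")
no_mdm = closeness(without(network, "MDM_1"), "taxon_A")
print(with_mdm, no_mdm)
```

In this toy case, removing the MDM hub disconnects the remaining taxa, so the closeness of `taxon_A` drops to zero; a large shift of this kind is the type of change the abstract interprets as an ecological role.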
Short Abstract: Metagenomic contig binning is an important computational problem in metagenomic research. Unlike classical clustering problems, contig binning can utilize known relationships among some of the contigs or the taxonomic identity of some contigs. However, current binning methods do not make full use of additional biological information beyond the coverage and sequence composition of the contigs. We developed a novel contig binning method, SolidBin, based on semi-supervised spectral clustering. Using sequence feature similarity and/or additional biological information, such as reliable taxonomy assignments of some contigs, SolidBin constructs two types of prior information: must-link and cannot-link constraints. A must-link constraint means that a pair of contigs should be clustered into the same group, while a cannot-link constraint means that a pair of contigs should be clustered into different groups. These constraints are then integrated into a classical spectral clustering approach, normalized cut (NCut). The performance of SolidBin is compared with five state-of-the-art genome binners on five next-generation sequencing (NGS) benchmark datasets, including simulated multi- and single-sample datasets and real multi-sample datasets. The experimental results show that SolidBin achieved the best performance in terms of F-score, ARI and NMI, especially on the real datasets and the single-sample dataset.
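To make the must-link and cannot-link constraints in the abstract above concrete, here is a minimal sketch of one common way to encode pairwise constraints in an affinity matrix before spectral clustering: forcing the affinity to 1 for must-link pairs and to 0 for cannot-link pairs. This is an assumed, simplified encoding and not necessarily how SolidBin integrates the constraints into NCut; the matrix values and contig indices are hypothetical.

```python
def apply_constraints(affinity, must_link, cannot_link):
    """Return a copy of a symmetric affinity matrix with pairwise constraints
    applied: must-link pairs get affinity 1.0, cannot-link pairs get 0.0."""
    constrained = [row[:] for row in affinity]  # leave the input untouched
    for i, j in must_link:
        constrained[i][j] = constrained[j][i] = 1.0
    for i, j in cannot_link:
        constrained[i][j] = constrained[j][i] = 0.0
    return constrained


# Hypothetical similarities between three contigs (composition/coverage based).
affinity = [
    [1.0, 0.4, 0.7],
    [0.4, 1.0, 0.5],
    [0.7, 0.5, 1.0],
]

# Assumed prior knowledge: contigs 0 and 1 share a reliable taxonomy
# assignment, while contigs 0 and 2 are assigned to different taxa.
constrained = apply_constraints(affinity, must_link=[(0, 1)], cannot_link=[(0, 2)])
print(constrained)
```

The constrained matrix would then be passed to a spectral clustering step in place of the original similarities.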
Short Abstract: Many human diseases, especially inflammatory ones, are associated with adverse changes in microbiome composition (dysbiosis). These alterations are apparent from the taxonomic structure of the microbiome, which can in turn be monitored using a combination of next-generation sequencing (NGS) data and the appropriate downstream computational analysis. In 2018, the sbv IMPROVER Microbiomics Challenge was conducted to identify which computational pipelines are most accurate to re-construct the taxonomic profile of a set of real and simulated NGS datasets characterized by low and high complexity and AT/GC-rich biases. Among all the participants, the most accurate pipeline employed a combination of the tools Kraken and Bracken. To strengthen our conclusions, we also (1) examined the impact of key components of the analysis pipeline on the Kraken-Bracken pipeline performance, (2) extended the list of benchmarked taxonomy profiler tools and versions tested on shotgun metagenomics data representative of samples from various origins (e.g., human, environmental) with different levels of complexity, and (3) investigated the performance of consensus taxonomy profiling using ensemble methods (Wisdom of Crowds) compared with that of individual methods. This analysis allows us to understand specific aspects of metagenomic data analysis and the applicability of the different methods depending on the context.
Short Abstract: The human gut microbiome is estimated to harbor over 2 million unique protein-coding genes. Only a fraction of them are experimentally annotated; the rest therefore require computational predictions. Community-wide experiments such as CAFA show that homology-based function annotation approaches are lacking and that more sophisticated approaches are required. We use protein families to predict residue-residue contacts and use them as constraints for de novo structure predictions. The predictions are carried out using the World Community Grid Microbiome Immunity Project (https://www.worldcommunitygrid.org/research/mip1/overview.do). Anyone can donate their spare computational time to the project. So far, during the 1.5 years of the project, we have generated over 160,000 unique structural models, each representing a different gene family, thus effectively doubling the number of available protein structures. 3D structural models, instead of sequences, then serve as inputs for a deep learning-based function prediction method we developed. This approach enables us to achieve state-of-the-art accuracies in predicting gene ontology terms. We are now in a position to functionally annotate microbial genomes and metagenomes with higher coverage and accuracy. We may also start addressing microbe-microbe and host-microbiome protein-protein interactions to determine the mechanisms of microbiota-induced immune response.
Short Abstract: A major challenge for microbiome research is the functional interpretation of whole genome shotgun sequencing data derived from bacteria. Functional interpretation may be achieved by identifying biosynthetic gene clusters (BGCs), which are co-localized groups of genes that encode a biosynthetic pathway capable of producing functional metabolites. DeepBGC is a bidirectional long short-term memory (BiLSTM) recurrent neural network that was developed to identify BGCs using a training data set of previously published BGCs and artificially constructed non-BGCs, followed by parameter tuning using nine bacterial genomes with embedded BGCs. The genes comprising BGCs were annotated for protein family (pfam) domains and then a novel algorithm, pfam2vec, converted these domains to numeric vectors for DeepBGC input. DeepBGC outperformed Hidden Markov Models (i.e. ClusterFinder) when evaluated on a hold-out validation data set of six bacterial genomes annotated for BGCs (AUC 0.923 vs. 0.847). DeepBGC also demonstrated superior performance when predicting BGCs of classes removed from the training data (e.g. RiPPs, AUC 0.910 vs. 0.738). DeepBGC is distributed under an MIT open source license and facilitates functional interpretation of metagenomic experiments, as well as the identification of new natural products that represent drug candidates with antimicrobial or anticancer properties.
Short Abstract: Inflammatory bowel diseases (IBD) constitute a spectrum of chronic inflammatory disorders that recurrently affect the gastrointestinal tract. Endoscopy, the gold standard for IBD diagnosis and monitoring, remains limited by its low sensitivity and high variability. A growing number of reports showing the alteration of gut microbiota in subjects with IBD indicate the potential benefit of exploiting metagenomics for non-invasive IBD diagnostics. Previously identified metagenomic features allowing distinction between healthy and IBD subjects proved highly study-dependent. Thus, they are questionable for diagnostics. Using shotgun metagenomic data from fecal samples of independent clinical cohorts, our research aimed to identify metagenomic-based signatures and models predictive of IBD status. We evaluated a number of profiling techniques combined with various machine learning algorithms to extract stable discriminative features and train predictive models on one sample cohort and then applied obtained classifiers on a new independent sample cohort for validation. Our results show that different models based on various features (taxonomy, microbial pathways abundance, and others) enable high-accuracy cross-cohort classification of IBD vs. non-IBD subjects. In conclusion, our investigations demonstrate the feasibility of utilizing metagenomics data for IBD status prediction with potential applications for IBD diagnostics.
Short Abstract: Horses are uniquely sensitive to dietary change and prone to dysbiosis related conditions such as colic and laminitis, thus the gut microbiome plays a key role in supporting and regulating homeostasis in the horse. While specific parameters of the “healthy microbiome” remain to be defined, microbial community profiling techniques have enormously contributed to identifying alterations between healthy and unhealthy microbiomes. The Equine Microbiome Project seeks to uncover the roles that gut microbes play in health, nutrition, immunity, and disease in diverse contexts for the horse. To understand these effects, over 200 horse owners have donated fecal samples and have reported metadata about medical health history and diet habits. Fecal samples have been analyzed using a 16S rRNA technique and a custom bioinformatics pipeline. Concurrently, an entity-relationship model was designed to integrate and store the information of meta-analyses in a publicly accessible online database. From the generated microbiome data, potential prognostic indicators of nutrition-related and stress-related disorders might be characterized by using predictive biomarkers that were discovered through the analysis of shifts in the composition of the gut microbiome by utilizing machine learning algorithms.
Short Abstract: Microbes play a fundamental role in many processes and are an essential component in maintaining the earth’s ecosystem. Advances in sequencing technologies have made it possible to study these complex roles, which led to the initiation of several important microbiome projects. For the analysis, a large variety of computational methods have been developed. However, the plethora of these tools and the complexity of these analyses complicate the process of conducting such studies tremendously. Although certain guidelines exist for metagenomics and 16S rRNA analyses (e.g. Critical Assessment of Metagenome Interpretation (CAMI) and The Microbiome Quality Control project (MBQC)), there is no single tool that unifies all necessary steps in a common pipeline while following these guidelines. Here we present MiCroM, the first comprehensive and easy-to-use microbiome analysis pipeline that unifies a variety of state-of-the-art microbiome tools for amplicon, whole genome shotgun metagenomics and long read sequencing in a common pipeline, following CAMI and MBQC standards. Further, MiCroM provides an integrated visualization module to automatically generate interactive visualizations and statistical analyses to help scientists with the interpretation of the results. Eventually, MiCroM will facilitate the analysis of microbial data and will help to increase the reproducibility of scientific results.
Short Abstract: Microbial ecosystems play a major role in fields as diverse as human health, biowaste treatment and food production, and understanding them is increasingly relevant to developing sustainable practices. High-throughput sequencing allows for precise quantification of the taxa and functions present in the microbiome. Many dedicated tools have been developed in the last few years for the statistical analysis of microbiome data, but the field remains very active due to the challenging nature of the data. Microbiome data are high-dimensional, sparse, multivariate, highly structured, integer-valued, vary over several orders of magnitude and are subject to differences in sequencing depths. Many methods, including compositional methods, rely on log-transformation of the counts followed by standard multivariate methods designed for Gaussian settings. They deal with sparsity by adding pseudo-counts to the data. In this work, we introduce a generic multivariate framework based on the Poisson log-Normal distribution, in which the counts are modeled directly: they are Poisson distributed conditional on latent (hidden) Gaussian variables. This probabilistic modeling can accommodate the confounding effect of known covariates, varying sample sizes (through offset terms) and mixed marker genes (e.g. 16S and ITS). We show how it can be used to perform dimension reduction, classification and network inference.
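The generative idea in the abstract above (counts that are Poisson distributed conditional on a latent Gaussian variable, with an offset for sequencing depth) can be sketched for a single taxon as follows. This is a univariate illustration under assumed parameter values, not the authors' multivariate framework or its inference procedure; the function names are hypothetical, and the simple Poisson sampler is only suitable for small to moderate rates.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def sample_pln_count(mu, sigma, offset, rng):
    """Poisson log-normal draw: Z ~ Normal(mu, sigma), then
    Y | Z ~ Poisson(exp(offset + Z)); the offset is a log sequencing depth."""
    z = rng.gauss(mu, sigma)  # latent (hidden) Gaussian variable
    return sample_poisson(math.exp(offset + z), rng)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Assumed parameters: latent mean 0, latent standard deviation 1, offset log(3).
counts = [sample_pln_count(0.0, 1.0, math.log(3.0), rng) for _ in range(1000)]
print(min(counts), max(counts), sum(counts) / len(counts))
```

Because the latent variable adds variance on the log scale, the simulated counts are overdispersed relative to a plain Poisson and contain zeros without any pseudo-count being added.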
Short Abstract: In this study, we aimed to clarify the composition and function of the bacterial flora in the digestive tract at the gene resolution level, and performed metagenomic and metatranscriptomic analysis of the intestinal bacterial flora in the digestive tract of common marmosets. The combination of metagenomics and metatranscriptomics analysis can capture the exact gene transcriptional activity of the intestinal flora. Furthermore, in order to understand the functions of the intestinal flora more accurately, we have established computational methods not only for known bacteria but also for unknown bacteria. Cecum contents, transverse colon contents and feces were obtained by dissection of two common marmosets. After DNA extraction and total RNA extraction, gDNA-seq and mRNA-seq were performed. The total scaffold length covered 60 to 90 bacterial genomes, and the mapping rate to the reference genome constructed by metagenome assembly is approximately 90%. As a result of gene function prediction by COG and quantification of the expression level, differences among individual samples appeared in the classification of 26 functions. Clustering by each expression level of each COG gave a site-specific function common to the two individuals, and the transcriptional activity of the bacterial flora differed depending on the site.
Short Abstract: Microorganisms are important occupants of many different environments because of their functional roles. Therefore, it is necessary to identify the composition of microorganisms in environmental samples using metagenomes, which are collections of the genomes of microorganisms. To this end, many taxonomy analysis tools have been developed based on different algorithms. However, the variability of the outputs that existing tools produce from the same input metagenome datasets is a major drawback for many researchers in this field. We present a meta-analysis pipeline for metagenome taxonomy analysis, called TAMA, which integrates outputs from three different taxonomy analysis tools: CLARK, Kraken and Centrifuge. Using an integrated reference database, TAMA performs taxonomy assignment for all metagenome reads based on a meta-score that integrates the taxonomy assignment scores from the three tools. In an evaluation using simulated metagenome datasets, TAMA obtained more accurate read classification than existing taxonomy analysis tools. TAMA will help to uncover the composition of microorganisms in metagenome samples more accurately, especially when the use of a single taxonomy analysis tool is not reliable.
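The abstract above does not give the meta-score formula, so the following is only a sketch of the general idea of combining per-read scores from several classifiers. It assumes each tool reports a score between 0 and 1 per candidate taxon and uses a plain average as the meta-score; the averaging rule, the function name and the example scores are all hypothetical and should not be read as TAMA's actual method.

```python
def meta_assign(tool_scores):
    """Assign a read to the taxon with the highest average score across tools.

    `tool_scores` maps a tool name to a {taxon: score} dict for one read;
    a taxon that a tool does not report counts as 0.0 for that tool.
    """
    taxa = {taxon for scores in tool_scores.values() for taxon in scores}
    if not taxa:
        return None, 0.0
    n_tools = len(tool_scores)
    meta = {
        taxon: sum(scores.get(taxon, 0.0) for scores in tool_scores.values()) / n_tools
        for taxon in taxa
    }
    # Sort taxon names first so that ties are broken deterministically.
    best = max(sorted(meta), key=lambda taxon: meta[taxon])
    return best, meta[best]


# Hypothetical scores for a single read from the three tools.
read_scores = {
    "CLARK": {"Escherichia coli": 0.9},
    "Kraken": {"Escherichia coli": 0.6, "Salmonella enterica": 0.4},
    "Centrifuge": {"Salmonella enterica": 0.8},
}

taxon, score = meta_assign(read_scores)
print(taxon, score)
```

Here two of the three tools support the same species, so it wins even though the third tool is confident in a different one; this is the kind of disagreement a meta-analysis across tools is meant to resolve.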
Short Abstract: Metagenomic sequencing has greatly improved our ability to profile the composition of environmental and host-associated microbial communities. However, the dependency of most methods on reference genomes, which are currently unavailable for a substantial fraction of microbial species, introduces estimation biases. We present an updated and functionally extended tool based on universal (i.e., reference-independent), phylogenetic marker gene (MG)-based operational taxonomic units (mOTUs) enabling the profiling of >7,700 microbial species. As more than 30% of them could not previously be quantified at this taxonomic resolution, relative abundance estimates based on mOTUs are more accurate compared to other methods. As a new feature, we show that mOTUs, which are based on essential housekeeping genes, are demonstrably well-suited for quantification of basal transcriptional activity of community members. Furthermore, single nucleotide variation profiles estimated using mOTUs reflect those from whole genomes, which allows for comparing microbial strain populations (e.g., across different human body sites).
Short Abstract: Human gut microbiota exerts functions essential for the maintenance of host physiology. However, characterization of host-microbiota interactions remains challenging in reference-based quantitative metagenomics analyses. Taxonomic and functional analyses being realized independently, there is no link between genes and species. Although a first set of species-level bins (metagenomic species, MGS) was built by clustering co-abundant genes, no reference MGS set is established based on the most comprehensive available gut microbiota gene catalog - the Integrated Gene Catalog (IGC). Published benchmarking results focusing on the reconstruction of individual genomes have highlighted best-performing solutions but do not include methods binning co-abundant genes. In order to identify the best suitable and most accurate approach to group IGC genes, we built a simulated gene catalog based on the IGC construction workflow and benchmarked 12 taxonomy-independent binners. We used and adjusted two complementary assessment tools to evaluate binners on a non-redundant gene set. Quality assessment results show that no hybrid or abundance-based binner performs best on all metrics with our simulated catalog. Overall, the best combination of average purity and completeness per bin was achieved by integrating the results of multiple binning methods. Ultimately, this approach seems promising but can still be improved at several levels.
Short Abstract: Determining the functional profile of a sample is one of the key computational questions in metagenomics analysis. Its answer is based on a functional classification system of choice. We compare four of the most commonly used ones, namely, InterPro families, EggNOG, KEGG and SEED. By mapping sequences assigned to one classification system onto another, we find that InterPro families and EggNOG cover both KEGG and SEED very well, but not vice versa. In the poster we present a detailed comparison of the four functional classification systems by identifying the most significant similarities and differences among them.
Short Abstract: DNA methylation plays important roles in prokaryotes, and their genomic landscapes (prokaryotic epigenomes) have recently begun to be disclosed. However, our knowledge of prokaryotic methylation systems is focused on those of culturable microbes, which are rare in nature. Here, we used single-molecule real-time and circular consensus sequencing techniques to reveal the 'metaepigenomes' of a microbial community in the largest lake in Japan, Lake Biwa. We reconstructed 19 draft genomes from diverse bacterial and archaeal groups, most of which are yet to be cultured. The analysis of DNA chemical modifications in those genomes revealed 22 methylated motifs, nine of which were novel. We identified methyltransferase genes likely responsible for methylation of the novel motifs, and confirmed the catalytic specificities of four of them via transformation experiments using synthetic genes. Our study highlights metaepigenomics as a powerful approach for identification of the vast unexplored variety of prokaryotic DNA methylation systems in nature.
Short Abstract: Weaning piglets in the meat industry are susceptible to deadly Enterotoxigenic Escherichia coli (ETEC) infections. Currently when an animal develops an infection, all healthy animals are administered an antibiotic. Antibiotic use leads to antimicrobial resistance (AMR) gene build up. Growing evidence shows AMR gene transfer to humans and the environment. Preliminary results and literature suggest that probiotics help colonise the gut with beneficial bacteria, thereby hindering the establishment of pathogenic bacteria and possibly offering an effective alternative to antibiotics. We aimed to study the effects of the antibiotic neomycin and two probiotics for post-weaning diarrhea, on pig gut microbiome changes using metagenomic analysis. Stool samples (1-11, median=6) were collected from each of the 126 piglets over six weeks. An average of 5.2Gbp of metagenomic data was generated per sample for 869 samples. Co-assemblies were performed on data combined from multiple time points from the same animal. Metagenome-assembled genomes (MAGs) were constructed per-animal using time series binning with MetaBAT2 and quality checked with CheckM, producing nearly 8000 MAGs with >=90% completeness and <=5% contamination. This data will support further investigation into the effects of probiotics and antibiotics on microbiome structure and carriage of AMR genes in these pig populations.
Short Abstract: Shanty towns in Lima present an interesting environment for microbiome studies: i) Low-income populations have not been deeply studied. ii) Water flows are in direct contact with the population because the lack of urban planning leaves pipes exposed on the streets. iii) Owing to their periurban location, some shanty towns are located near water treatment plants. Environmental microbiomes interact continuously with human microbial populations, making environmental sampling of interest for human microbiome surveys. In this study, we obtained shotgun metagenomic data from fecal and water samples for the characterization of human and environmental microbiomes in a specific period of the year (summer season). Species common to all samples were found, such as bacteria from the genera Bifidobacterium and Faecalibacterium. On the other hand, water samples also contained unique species not found in fecal samples, and vice versa. Multivariate analysis was also performed, in which water and human samples were found to cluster together, confirming their similarity. These results could improve human microbiome surveying in low-income locations without depending on fecal sampling.
Short Abstract: Cystic fibrosis (CF) is a life-threatening genetic disorder that is accompanied by chronic lung infections and the arising respiratory complications. In the clinic, decisions on how to manage the disorder are carried out based on cultures of microbes from the coughed-up lung sputum. However, these cultures provide limited taxonomic resolution and focus only on common pathogens. In our study, we wanted to gain a more complete picture of the lung microbiome in CF and explore its changes over time. We therefore followed four CF patients over the course of two years. During exacerbations and routine clinical visits, we collected lung sputum from these patients, isolated the DNA and subjected it to shotgun sequencing. Sequencing showed the presence of several microbes, particularly anaerobes, that were missed by culture. We classified the most abundant species on the strain level and corrected mistakes in taxonomic assignment from culture. Finally, by analyzing genomic variation and its changes over time, we detected the presence of multiple Pseudomonas aeruginosa strains in the same patient and described large-scale deletions and amplifications that followed strain-specific patterns. The study gives an insight into the advantages of longitudinal assessment of the complete lung microbiome.
Short Abstract: Metagenomics is revolutionizing the study of microbes by allowing us to investigate the 99% of uncultivatable organisms through direct sequencing. Unicellular eukaryotes play essential roles in microbial communities as chief predators, decomposers, phototrophs, bacterial hosts, symbionts and parasites. Investigating their roles is therefore of great interest to ecology, biotechnology, medicine, and evolution. However, their generally lower sequencing coverage, their complex gene architectures, and a lack of eukaryote-specific experimental and computational procedures have kept them on the sidelines of metagenomics. MetaEuk is a toolkit for high-throughput, reference-based discovery and annotation of protein-coding genes in eukaryotic metagenomic contigs. It performs fast searches with 6-frame-translated fragments covering all possible exons and optimally combines matches into multi-exon proteins. We used a benchmark of seven annotated genomes to show that MetaEuk is highly sensitive even under conditions of low sequence similarity to the reference database. To demonstrate MetaEuk’s power to discover novel eukaryotic proteins in large-scale metagenomics data, we assembled contigs from 912 samples of the Tara Oceans project. MetaEuk predicted millions of protein-coding genes in < 60 hours on twenty 16-core servers. Most of these genes are diverged from known proteins and originate from sparsely sampled eukaryotic supergroups. MetaEuk is open-source (GPLv3) software: https://metaeuk.soedinglab.org.
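As a rough illustration of the 6-frame translation step mentioned in the abstract above, the following Python sketch translates a DNA fragment in all three forward and three reverse-complement reading frames using the standard genetic code. It is not MetaEuk's implementation; the function names `translate` and `six_frame_translate` are hypothetical and chosen only for this example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons only; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translate(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translate("ATGGCCTAA").items():
        print(frame, protein)
```

In a gene-discovery setting, the translated fragments between stop codons ('*') would then be searched against a protein reference database; that search and the combination of matches into multi-exon proteins are beyond this sketch.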
Short Abstract: Short read high-throughput sequencing has opened up the possibility for microbial community profiling at low costs for every lab. This is typically done by amplifying and sequencing a marker gene that is omnipresent in the community of interest. By these means, microbial profiles have been generated from most of the habitats on earth, including the human body and the deep sea. The current bottleneck in microbial profiling is the computational analysis of the (potentially massive) sequence data. While software for the analysis of raw sequence data to an OTU/SV (operational taxonomic unit/sequence variant) table including taxonomic annotation exists, pipelines are either inherently inflexible, or the software requires at least basic command line skills from the user. Moreover, analyzing multiple sequencing libraries quickly becomes tedious for users not proficient in implementing e.g. bash scripts to chain steps. To serve the needs of high flexibility, ease of use and scalability in the number of samples analyzed, we developed a snakemake pipeline to analyze amplicon sequence data. The pipeline uses parallel computing and can be run on personal computers and computing servers. Moreover, all input and output files, tools and parameters used are documented, rendering the analyses fully reproducible.
Short Abstract: Viral diseases such as AIDS, flu, hepatitis, rabies and Ebola still represent a formidable threat to public health (WHO report). A better understanding of the genetics and evolution of viral pathogens is the basis for combating those diseases. High-throughput sequencing techniques and the corresponding bioinformatic tools have facilitated viral genomic research. In viral genomics studies, genome assembly, haplotype (strain) deconvolution and variant calling are essential approaches for characterizing the evolution and diversity of viral populations. The rapid and massive mutation of viruses, especially RNA viruses, renders a highly diverse and heterogeneous viral community within the host, a mixture of multiple strains called a quasispecies. The existence of micro-diversity in the community poses a big challenge for viral genomic studies. In this study, we used two combinations of two different HCMV strains in different mixing ratios and evaluated the performance of genome assembly, viral haplotype assembly and variant calling software on this dataset.
Short Abstract: We present an open-source, dockerized metagenome analysis application that makes use of BioContainers where available. The application can be run in a web environment as well as locally in a Windows, Apple or Linux environment. It handles long- and short-read data and has been tested on PacBio and Illumina whole-metagenome sequence reads. In its basic form it comprises quality control, taxonomic classification (Kraken2, MetaPhlAn), statistical analysis (alpha/beta diversity, differential abundances) based on user-defined parameters, as well as functional classification (HUMAnN2) and visualization. An extension with a de novo assembly module is available. As a case study, we investigated microbial content transferred from the environment onto human palms by comparing the palm microbiome after washing with the palm microbiome after 2 hours of movement through the city environment in 50 individuals. Whole genome sequencing was performed on Illumina, and raw data were analysed with our MetagenApp. Results indicate a shift towards higher microbiome diversity after 2 hours, but also show an enrichment of certain non-skin-related species after washing, which decreases in abundance over time.
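To make the alpha/beta diversity statistics named in the abstract above concrete, here is a minimal Python sketch of two widely used measures: the Shannon index (alpha diversity within one sample) and Bray-Curtis dissimilarity (beta diversity between two samples). The abstract does not say which measures the application computes, so this choice is an assumption, and the function names are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts.

    Assumes at least one count is positive.
    """
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa.

    0 means identical composition, 1 means no shared taxa.
    Assumes the two samples are not both empty.
    """
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


if __name__ == "__main__":
    # Hypothetical taxon counts for one palm sample after washing and after 2 hours.
    washed = [120, 30, 0, 0]
    after_2h = [80, 40, 25, 15]
    print("Shannon (washed):  ", round(shannon_index(washed), 3))
    print("Shannon (after 2h):", round(shannon_index(after_2h), 3))
    print("Bray-Curtis:       ", round(bray_curtis(washed, after_2h), 3))
```

The counts in the usage example are made up for illustration and are not data from the study.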
Short Abstract: The portable ONT (Oxford Nanopore Technologies) MinION has great potential for studying microbial communities directly in the field, but this requires software that copes with limited internet access and with computational power restricted to laptops. Here, we present an approach meeting those requirements. Our method is designed to run on a laptop (without internet access) and aims at identifying the bacterial species present in a highly accurate way. Our analysis uses LAST to perform a two-tiered translated alignment against the bacterial part of the NCBI RefSeq database. Species identification is then performed using a "gene-sequence graph" that represents genes and their adjacency along reads. To illustrate our approach, we use published long-read sequencing data derived from a commercially available mock community containing eight equally distributed bacteria and successfully analyze these data on a laptop with 8 cores and 64GB of memory. The proposed method performs an accurate real-time taxonomic classification using the comprehensive RefSeq database on a laptop. Thus, in combination with a portable sequencing device (such as an ONT MinION and MinIT), our approach opens up new possibilities for studying microbial communities directly in the field.
Short Abstract: The preservation of milk in cheesemaking results from the fermentation of lactose into lactate by specialized microbial communities. These cheese starter communities are dominated by two species, Streptococcus salivarius subsp. thermophilus and Lactobacillus delbrueckii subsp. lactis. However, little is known about the genomic diversity of these two species within cheese starter communities and how their genomic composition changes during fermentation. Here, we used shotgun metagenomics and metatranscriptomics to follow the genomic composition and gene expression over the first 24 hours of fermentation and to disentangle the functional roles of different bacterial strains and phages in the system. Preliminary analysis of the metagenomic data reveals that, besides the two dominant species, several low-abundance species are present that may take on important roles during later cheese ripening. For both of the dominant species we find evidence for strain-level diversity and a surprisingly large number of phages. Our metagenomic analysis suggests a dynamic interplay between species, strains and phages relevant for the cheesemaking process. We believe that the integration of 'omics' datasets will advance our basic understanding of how these communities have evolved, and how they are maintained and function in these common food processing practices.
Short Abstract: Modern sequencing technologies allow the processing of very small amounts of input DNA. In the context of metagenomic sequencing of low-biomass samples, contamination cannot be ignored. Contamination may come from exogenous DNA introduced during library preparation as well as from sample cross-contamination. When the input target DNA is of low content, contaminants might account for a significant proportion of sequence data, challenging the accuracy of sequence data interpretation. Despite precautions in laboratory practice, proper in-silico data processing is still necessary to allow unbiased metagenomic analysis. Here we introduce an alignment-based method to diagnose and eliminate contaminants from metagenomic data. The contaminant filter employs taxonomic information from negative control samples with no intended DNA input as references for background contamination. We address the problem of false positives caused by sample cross-contamination, which confounds the identification of contaminants. The presence of candidate contaminants in negative controls and target samples is compared, and then the final list of contaminants is determined and filtered from the community profiles. We applied the contaminant filter to a mock community dataset to demonstrate its sensitivity and accuracy.
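The comparison between negative controls and target samples described in the abstract above could look roughly like the following Python sketch. The abstract does not give the decision rule, so the one used here is an assumption made purely for illustration: a taxon seen in the negative control is called a contaminant only if its relative abundance in the control is at least as high as its mean relative abundance in the target samples, so that a dominant sample taxon leaking into the control at low levels (cross-contamination) is not flagged. All function names and the `ratio_threshold` parameter are hypothetical.

```python
def relative_abundance(counts):
    """Convert a {taxon: read count} profile to relative abundances."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()} if total else {}


def find_contaminants(negative_control, samples, ratio_threshold=1.0):
    """Flag taxa that are relatively enriched in the negative control.

    Assumed rule (not taken from the abstract): a candidate is a contaminant
    when control abundance >= ratio_threshold * mean abundance in samples.
    """
    control = relative_abundance(negative_control)
    profiles = [relative_abundance(s) for s in samples]
    contaminants = set()
    for taxon, control_abundance in control.items():
        mean_in_samples = sum(p.get(taxon, 0.0) for p in profiles) / len(profiles)
        if control_abundance >= ratio_threshold * mean_in_samples:
            contaminants.add(taxon)
    return contaminants


def filter_profile(counts, contaminants):
    """Remove flagged taxa from a community profile."""
    return {t: c for t, c in counts.items() if t not in contaminants}


if __name__ == "__main__":
    # Made-up counts: Ralstonia dominates the blank, Escherichia leaks in from samples.
    blank = {"Ralstonia": 90, "Escherichia": 10}
    samples = [{"Escherichia": 800, "Bacteroides": 150, "Ralstonia": 50}]
    flagged = find_contaminants(blank, samples)
    print(flagged)
    print(filter_profile(samples[0], flagged))
```

With these made-up counts, the taxon that dominates the blank is removed while the taxon that dominates the sample is retained even though it also appears in the blank.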
Short Abstract: Flint is a scalable and efficient metagenomics profiling tool that is built on top of the Apache Spark framework. It is primarily designed for fast and efficient large-scale bacterial profiling of metagenomics samples in the cloud. However, it can be extended to perform a variety of metagenomic analysis tasks. In particular, metagenomic whole-genome sequencing (mWGS) data can also be used to calculate bacterial replication rates [Korem et al. 2015; Brown et al. 2016]. Here we extend the Flint system to measure bacterial growth rates in a fast and efficient way. We integrate existing tools for bacterial growth rate measurement from a single metagenomic sample into Flint’s MapReduce framework and take advantage of Flint's read alignment efficiencies to map reads and calculate bacterial growth rates, thus enabling the creation of bacterial abundance profiles that are enhanced with growth-rate information. To show the viability of our method, we performed a bacterial growth analysis of stool samples from a study of longitudinally sampled preterm infants [Gibson et al. 2016], and show which types of antibiotics have the strongest effect on bacterial replication rates, and which bacteria are more susceptible to each antibiotic type.
Short Abstract: Microbes tend to organize into communities consisting of hundreds of species involved in complex interactions with each other. 16S ribosomal RNA (16S rRNA) gene profiling provides snapshots that reveal the phylogenies and abundance distributions of these microbial communities. These snapshots, when collected from multiple samples, have the potential to reveal which microbes co-occur, providing a glimpse into the network of inter-dependencies underlying these communities. The inference of networks from 16S data is prone to statistical artifacts, but the extent to which the different steps in the workflow affect the resultant network is still unclear. In this study, we perform a meticulous analysis of each step of a pipeline that processes 16S sequencing data into a network of microbial associations. Through this process, we determine the tools and parameters that generate the most accurate and robust co-occurrence networks. Ultimately, we develop a standardized pipeline that follows these default tools and parameters, but that can also help explore the outcome of any other combination of choices. We envisage that this standard pipeline for processing 16S sequencing data into networks of microbial co-occurrences could be used for integrating multiple data-sets, and for generating comparative analyses and consensus networks useful for detecting disease-related patterns.
Short Abstract: Microbial taxonomy down to the species level presents unique challenges because the main datasets available (e.g., SILVA, RDP and Greengenes) are not curated and contain many errors and missing information. Here we present AnnotatIEM, a new algorithm for species-level annotation of operational taxonomic units (OTUs) derived from 16S rDNA sequencing. The hit selection algorithm combines the annotation output from multiple databases to ensure accurate identification of a species and to reduce the potential for errors arising from any one of them alone. Furthermore, AnnotatIEM combines and compares hit selection using the top-hit and majority-hit approaches. When tested with true positives using mock communities (both experimental and virtual), the annotation accuracy is 90-95% at the species level and greater than 95% at the genus level. For a number of real case studies, annotation was possible for a higher number of OTUs than with any online or offline sequence processing tool (e.g., IMNGS, SILVA, RDP Classifier). In conclusion, systematic benchmarking shows that the combination of databases and a novel hit selection algorithm contributes to better performance compared to other microbiome analysis pipelines. Generalization of AnnotatIEM to other microbial groups (e.g., fungi, viruses) is possible and under development.
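The top-hit and majority-hit ideas in the abstract above can be sketched in a few lines of Python. This is not AnnotatIEM's algorithm, which the abstract does not specify; the voting scheme across databases, the function names, and the example hits are all assumptions made for illustration.

```python
from collections import Counter


def top_hit(hits):
    """Species of the hit with the highest percent identity.

    `hits` is a list of (species, percent_identity) tuples.
    """
    return max(hits, key=lambda hit: hit[1])[0]


def majority_hit(hits):
    """Species named by more than half of the hits, or None if there is none."""
    species, n = Counter(s for s, _ in hits).most_common(1)[0]
    return species if n > len(hits) / 2 else None


def consensus_annotation(hits_by_database):
    """Assumed scheme: each database votes for its top-hit and majority-hit
    species (one vote per distinct species); the species with the most votes
    wins, and ties or empty input yield None."""
    votes = Counter()
    for hits in hits_by_database.values():
        if not hits:
            continue
        votes.update({top_hit(hits), majority_hit(hits)} - {None})
    ranked = votes.most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous: leave the OTU unannotated at species level
    return ranked[0][0]


if __name__ == "__main__":
    # Made-up hits for a single OTU against three reference databases.
    otu_hits = {
        "SILVA": [("Escherichia coli", 99.5), ("Shigella flexneri", 99.0),
                  ("Escherichia coli", 98.7)],
        "RDP": [("Escherichia coli", 99.1)],
        "Greengenes": [("Shigella flexneri", 98.0)],
    }
    print(consensus_annotation(otu_hits))
```

Returning None on ties is a conservative design choice for this sketch: an ambiguous OTU stays unannotated rather than receiving an arbitrary species label.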
Short Abstract: Metagenomic and metatranscriptomic sequencing analyses have become increasingly popular tools for producing massive amounts of short-read data, often used for the reconstruction of draft genomes or the detection of (active) genes in microbial communities. Unfortunately, sequence assemblies of such datasets generally remain a computationally challenging task. Frequently, researchers are only interested in a specific group of organisms or genes; yet, the assembly of multiple datasets only to identify candidate sequences for a specific question is sometimes prohibitively slow, forcing researchers to select a subset of available datasets. Here we present PhyloMagnet, a workflow to screen meta-omics datasets for taxa and genes of interest using gene-centric assembly and phylogenetic placement of sequences. PhyloMagnet could identify up to 87% of the genera in an in vitro mock community with variable abundances, while the false positive predictions per single gene tree ranged from 0% to 23%. When applied to a group of metagenomes for which a set of MAGs have been published, we could detect the majority of the taxonomic labels that the MAGs had been annotated with. In a metatranscriptomic setting the phylogenetic placement of assembled contigs corresponds to that of transcripts obtained from transcriptome assembly.
Short Abstract: Dietary patterns can influence health-related outcomes directly, but effects may also be modulated indirectly by the gut microbiota. The extent to which the gut microbiome mediates effects of certain diets is still unclear. Moreover, the high dimensionality, sparsity, and non-normality of microbiome data require new statistical approaches for testing potential mediation effects. We aim to assess to what extent the effect of diet on an outcome of interest is mediated through the gut microbiome or whether there is a diet-microbiome interaction that identifies subgroups of individuals who are more susceptible to diet effects than others. As an example, we are investigating the effect of a 4-week vegan diet on the diversity of microbiota and branched-chain amino acid metabolism in healthy omnivorous volunteers in a randomized study. To assess potential mediation effects of the microbiome, we apply the approach by Zhang et al (2018) based on pairwise dissimilarity in the microbiome profiles. Simultaneously, potential moderation effects are investigated by the approach of Tian et al (2014) to improve power. Together with quantification of the association of baseline microbiome with subsequent microbiome measurements, this allows us to investigate the extent to which a moderating effect is erroneously taken to be a mediating effect.
Short Abstract: The global threat of antimicrobial resistance has driven the use of high-throughput sequencing techniques to monitor the profile of resistance genes in microbial populations. The human oral cavity contains a poorly explored reservoir of these genes, and little is known about their abundance and diversity, or how their profile compares with antimicrobial resistance genes in the gut. Here we analyse the resistome profiles of 790 oral cavities worldwide and compare these profiles with paired stool samples from shotgun metagenomic data. We find country-specific differences in the prevalence of antimicrobial resistance gene classes and mechanisms in oral and stool samples. Countries with a higher prevalence of resistance to antibiotic classes relative to their use contain genes conferring resistance to those classes that co-localise with bacteriophages, suggesting horizontal transfer of these genes. Between individuals, the oral cavity contains a significantly higher abundance but lower diversity of antimicrobial resistance genes compared to the gut, which is likely influenced by differences in microbial hosts and mobile genetic element associations. This is the first study to date that characterises the oral cavity resistome worldwide, identifying its distinctive signatures compared to the gut, and its role in the maintenance of antimicrobial resistance.
Short Abstract: phylosmith is an R-package designed to enable reproducible and efficient analyses of microbiome data with phyloseq-class objects by providing robust and efficient functions. phylosmith utilizes the standardized data format of phyloseq and R-object accession methods to provide functions with simple and intuitive input arguments. These arguments implement various data-wrangling operations, parsing, creation of graphs, and calculation of Spearman rank co-occurrence and their networks. Presented here is an example analysis implementing the functions of the phylosmith package.
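For readers unfamiliar with Spearman rank co-occurrence as named in the abstract above, the following Python sketch shows the underlying calculation: abundances are converted to ranks (ties receive the average rank), the Pearson correlation of the ranks is computed for each pair of taxa, and pairs above a cutoff become network edges. This is not the phylosmith API (phylosmith is an R package); the function names and the `threshold` parameter are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def co_occurrence_edges(abundance, threshold=0.8):
    """Return (taxon_a, taxon_b, rho) for pairs with |rho| >= threshold.

    `abundance` maps each taxon to its abundances across the same samples.
    """
    taxa = sorted(abundance)
    edges = []
    for i, a in enumerate(taxa):
        for b in taxa[i + 1:]:
            rho = spearman(abundance[a], abundance[b])
            if abs(rho) >= threshold:
                edges.append((a, b, rho))
    return edges


if __name__ == "__main__":
    # Made-up abundances of three taxa across four samples.
    table = {"A": [1, 2, 3, 4], "B": [10, 25, 30, 90], "C": [4, 3, 2, 1]}
    for edge in co_occurrence_edges(table):
        print(edge)
```

A real analysis would also attach p-values and correct for multiple testing before drawing a network; that is omitted here to keep the sketch short.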
Short Abstract: We hypothesized that fish oil counteracts antipsychotic-associated metabolic dysfunctions such as obesity, inflammation and gut dysbiosis. We examined changes in the composition of the gut microbiota in B6 mice fed high-fat diets without supplementation (HF) or supplemented with olanzapine (HFZ), a second-generation antipsychotic, or fish oil (HFO), individually or in combination (HFOZ). We conducted metabolic phenotyping in these mice and determined changes in inflammatory markers and the gut microbiome. HFO and HFOZ mice had lower body weights and greater glucose clearance compared to the HF group. The HFO and HFOZ groups also exhibited a lower inflammatory profile and reduced LPS Binding Protein levels compared to the HF group. Gut microbiota profiles differed among the 4 groups; the Bacteroidetes-to-Firmicutes (B/F) ratio was lowest in the HF group (0.51), compared to 0.6 in HFZ, 0.9 in HFO, and 1.1 in HFOZ. Fish oil reduced obesity and its associated inflammation and increased the B/F ratio, and these effects persisted in the presence of olanzapine, demonstrating its potential protective effects in subjects using antipsychotic drugs.
Short Abstract: Antimicrobial resistance (AMR) is on the rise globally and is expected to cause more than 10 million annual casualties by 2050. Appropriate patient treatment and AMR stewardship, enabled through accurate, fast and relevant AMR diagnostics, are essential to fight the rise of AMR. Currently, routine clinical diagnostics can take several days from sample to answer, or miss information required for appropriate treatment. Next-generation sequencing (NGS) based diagnostics can expedite diagnosis and potentially identify all AMR markers present in a patient sample. The set of AMR markers detected by NGS depends on the reference repository used. Public resources like CARD, Resfams and NDARO already contain sets of AMR markers. However, there is no single comprehensive standardised resource. Here we present a combined repository, ARESfam, including a classification pipeline that uses sequence alignment and k-mers to classify known and novel AMR markers. ARESfam was built from CARD and Resfams sequences that were curated manually to guarantee high annotation quality. ARESfam has 4540 nodes supported by 5859 genetic markers and 135 variant sequences. When assessed against NDARO, the classification pipeline correctly classified 92% of NDARO markers (0.3 seconds/marker). ARESfam lends itself to integration into NGS-based diagnostics and marker discovery pipelines.
Short Abstract: Defense systems of prokaryotes protect them against invasion by foreign DNA. The most widely known defense systems are Restriction-Modification (R-M) and CRISPR-Cas systems, and new systems have recently been found (Doron et al, 2018). Defense systems are often studied using WGS data, and their distribution in real communities has not yet been studied intensively. We searched for genes of known defense systems and phage capsid proteins in the World Ocean using metagenomes of the Tara Oceans project (Sunagawa et al, 2015) and revealed a relationship between ecological conditions and the distribution of the defense genes. Homologs of R-M systems, CRISPR-Cas, and 13 other defense systems were found in 135 samples of the Tara Oceans project. Geographically adjacent samples did not demonstrate noticeable similarity in R-M system composition. Samples from the deep oceanic layer (> 200 m) possess a more numerous and diverse set of defense systems of each type compared to the surface samples. At the same time, phage capsid proteins do not share this trend. This implies that the benefits of defense system usage could depend not only on phage concentration itself but also on some environmental factors. The work was supported by the Russian Foundation for Basic Research grant 18-34-00860.
Short Abstract: Identifying distinctive taxa for microbiome-related diseases is considered key to the establishment of diagnosis and therapy options in precision medicine and imposes high demands on the accuracy of microbiome analysis techniques. We propose an alignment- and reference-free, subsequence-based 16S rRNA data analysis as a new paradigm for microbiome phenotype and biomarker detection. Our method, called DiTaxa, substitutes standard OTU clustering by segmenting 16S rRNA reads into the most frequent variable-length subsequences. We compared the performance of DiTaxa to state-of-the-art methods in phenotype and biomarker detection, using human-associated 16S rRNA samples for periodontal disease, rheumatoid arthritis, and inflammatory bowel diseases, as well as a synthetic benchmark dataset. DiTaxa performed competitively with the k-mer-based state-of-the-art approach in phenotype prediction while outperforming the OTU-based state-of-the-art approach in finding biomarkers, in both resolution and coverage, evaluated over known links from the literature and synthetic benchmark datasets.
Short Abstract: The intestinal microbiota is well known to play a variety of important roles in our health. For this reason, it is expected that changes in the intestinal microbiota can be used as an indicator of health and as a biomarker for various diseases. However, it is difficult to understand the effect of the intestinal microbial community on the host, because individual variation in the intestinal microbiota due to differences in genetic and environmental background is large. A large cohort study that collects not only intestinal microbiota but also various phenotype metadata can provide a means to overcome this problem. We obtained intestinal microbiome data and phenotype metadata from several hundred Japanese individuals living in different areas of Japan. All processes, from sampling to bacterial composition analysis, were performed using a single pipeline within our institute. We created scripts that automate the whole process in QIIME1 for large cohorts. Moreover, we developed an integrative database and analysis platform for microbiome and phenotypic data. Here, we introduce the details of the processing pipeline and the integrated analysis platform, and show the results of correlation analyses between intestinal microbiota and phenotype metadata.