Microbiome COSI

Presentations

Schedule subject to change
Wednesday, July 15th
10:40 AM-11:20 AM
Microbiome Keynote: Assembling and modelling complex microbiomes mediating host-pathogen interactions
Format: Live-stream

  • Niranjan Nagarajan, Genome Institute of Singapore, Singapore

Presentation Overview:

Human and environmental microbial communities mediating host-pathogen interactions often have complex genetic architectures and dynamics. Unravelling these requires new approaches for metagenome assembly at the strain level and for microbiome modelling from limited relative abundance profiles. We propose a hybrid assembly framework that leverages long-read sequencing to generate high-quality, near-complete strain genomes from complex metagenomes (OPERA-MS [1]). Applying this approach to human and environmental communities enabled recovery of hundreds of novel genomes, plasmid and phage sequences, direct analysis of transmission patterns and investigation of antibiotic resistance gene combinations [1, 2]. Furthermore, we show how microbial community dynamics can be modelled accurately from sparse relative abundance data (BEEM [3]), providing insights into pathogen-commensal interactions in skin dermotypes. Data from several studies tracking the transmission of multi-drug resistant pathogens across environmental and human microbiomes will be used to illustrate the utility of these methods.

[1] Bertrand et al. “Hybrid metagenomic assembly enables high-resolution analysis of resistance determinants and mobile elements in human microbiomes.” 2019 Nature Biotechnology 37 (8), 937-944
[2] Chng et al. “Cartography of opportunistic pathogens and antibiotic resistance genes in a tertiary hospital environment.” 2019 bioRxiv, 644740
[3] Li et al. “An expectation-maximization algorithm enables accurate ecological modeling using longitudinal microbiome sequencing data.” 2019 Microbiome 7 (1), 1-14
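
As a rough illustration of what modelling community dynamics from relative abundances involves, the sketch below simulates a generalized Lotka-Volterra (gLV) system and converts the absolute abundances to relative ones. The gLV form is an assumption here (it is a common choice for such ecological models), and the growth rates and interaction coefficients are hypothetical values, not taken from the talk or from BEEM.

```python
def glv_step(x, growth, interactions, dt):
    """One Euler step of dx_i/dt = x_i * (growth_i + sum_j interactions[i][j] * x_j)."""
    n = len(x)
    new_x = []
    for i in range(n):
        per_capita = growth[i] + sum(interactions[i][j] * x[j] for j in range(n))
        # Abundances cannot become negative.
        new_x.append(max(x[i] + dt * x[i] * per_capita, 0.0))
    return new_x


def simulate(x0, growth, interactions, dt=0.01, steps=5000):
    """Integrate the gLV system from x0 and return the final absolute abundances."""
    x = list(x0)
    for _ in range(steps):
        x = glv_step(x, growth, interactions, dt)
    return x


def relative_abundance(x):
    """Sequencing typically reports only these proportions, not absolute counts."""
    total = sum(x)
    return [v / total for v in x]


if __name__ == "__main__":
    # Hypothetical two-taxon community with weak mutual competition.
    growth = [1.0, 0.8]
    interactions = [[-1.0, -0.5], [-0.3, -1.0]]
    final = simulate([0.1, 0.1], growth, interactions)
    print(relative_abundance(final))
```

Because only the proportions are observed, the total biomass has to be inferred alongside the model parameters, which is the difficulty the abstract refers to.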

11:20 AM-11:30 AM
Charting the secondary metabolic diversity of 209,211 microbial genomes and metagenome-assembled genomes
Format: Pre-recorded with live Q&A

  • Justin J.J. van der Hooft, Wageningen University, Netherlands
  • Satria Ardhe Kautsar, Wageningen University, Netherlands
  • Dick de Ridder, Wageningen University, Netherlands
  • Marnix Medema, Wageningen University, Netherlands

Presentation Overview:

Microbial secondary metabolism plays a central role in the community dynamics of the microbiome. The wide arsenal of unique chemical compounds produced by these pathways is used by microbes to gain survival advantages and to interact with their environment. To investigate this metabolism, genome mining of Biosynthetic Gene Clusters (BGCs) acts as a bridge, linking gene sequences to the chemistry of the compounds they produce. With the large, ever-increasing number of genomes and metagenomes being sequenced, a map of biosynthetic diversity across taxa will help us chart our course in natural product discovery and microbial ecology. Here, we introduce BiG-SLiCE, a highly scalable tool for large-scale clustering analyses of BGC data. Using this new tool, we performed a global homology analysis of 1,225,071 BGCs identified from 188,623 microbial isolate genomes and 20,588 previously published metagenome-assembled genomes in roughly 100 hours of wall time on a 36-core CPU. The analysis reveals the true extent of microbial product diversity, showing a high degree of potential novelty, especially from environmental microbes. Furthermore, the collection of Gene Cluster Family (GCF) models it produced may be used in combination with long-read sequencing technology to perform BGC-based functional metagenomics.

11:30 AM-11:40 AM
PIRATE: Phage Identification fRom Assembly-graph varianT Elements
Format: Pre-recorded with live Q&A

  • Mihai Pop, University of Maryland, College Park, United States
  • Jacquelyn S Meisel, University of Maryland, College Park, United States
  • Harihara Subrahmaniam Muralidharan, University of Maryland, College Park, United States
  • Nidhi Shah, University of Maryland, College Park, United States

Presentation Overview:

Bacteriophages are viruses that infect and destroy bacteria. As bacteria rapidly evolve to counter the effect of antibiotic drugs, bacteriophages are being explored as complements and alternatives to antibiotics. Identification and characterization of novel phage from sequencing data is critical to achieving this goal, but presents many computational challenges. We developed MetaCarvel (https://github.com/marbl/MetaCarvel), a scaffolding tool that detects assembly graph motifs representative of biologically relevant variants. Some bubble and repeat motifs detected by MetaCarvel represent phage integration events, providing the opportunity to detect novel phage within microbial communities. Bubbles, indicating genomic insertion/deletion events or strain variants, may contain specialist phage, while repeat elements may capture generalist phage common to multiple closely related bacterial hosts. Our assembly graph-based methods were able to detect crAssphage (the first computationally identified phage) within variants in 208 human gut microbiome samples. To identify novel phage in metagenomes, we extracted repeat and bubble contigs (unitigs) that did not share sufficient similarity with known organisms. We clustered contigs with similar genomic content and blasted predicted genes from each cluster against the UniProt phage database. Multiple clusters contained sequences rich in integrase genes, tail proteins and tape measure proteins, suggesting these sequences represent genomic fragments from previously uncharacterized phage.

12:00 PM-12:20 PM
Proceedings Presentation: MetaBCC-LR: Metagenomics Binning by Coverage and Composition for Long Reads
Format: Pre-recorded with live Q&A

  • Anuradha Wickramarachchi, Research School of Computer Science, Australian National University, Australia
  • Vijini Mallawaarachchi, Research School of Computer Science, Australian National University, Australia
  • Vaibhav Rajan, School of Computing, National University of Singapore, Singapore
  • Yu Lin, Research School of Computer Science, Australian National University, Australia

Presentation Overview:

Motivation: Metagenomics studies have provided key insights into the composition and structure of microbial communities found in different environments. Among the techniques used to analyze metagenomic data, binning is considered a crucial step to characterise the different species of microorganisms present. The use of short-read data in most binning tools poses several limitations, such as insufficient species-specific signal, and the emergence of long-read sequencing technologies offers us opportunities to surmount them. However, most current metagenomic binning tools have been developed for short reads. The few tools that can process long reads either do not scale with increasing input size or require a database with reference genomes that are often unknown. In this paper, we present MetaBCC-LR, a scalable reference-free binning method which clusters long reads directly based on their k-mer coverage histograms and oligonucleotide composition.

Results: We evaluate MetaBCC-LR on multiple simulated and real metagenomic long-read datasets with varying coverages and error rates. Our experiments demonstrate that MetaBCC-LR substantially outperforms state-of-the-art reference-free binning tools, achieving ∼13% improvement in F1-score and ∼30% improvement in ARI compared to the best previous tools. Moreover, we show that using MetaBCC-LR before long-read assembly helps to enhance the assembly quality while significantly reducing the assembly cost in terms of time and memory usage. The efficiency and accuracy of MetaBCC-LR pave the way for more effective long-read based metagenomics analyses to support a wide range of applications.

Availability: The source code is freely available at: https://github.com/anuradhawick/MetaBCC-LR.
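
To make the idea of a per-read k-mer coverage histogram concrete, here is a minimal sketch. It counts k-mers across all reads, then summarises each read by how often its own k-mers occur in the dataset. The function names, the tiny k, and the binning scheme are illustrative assumptions, not MetaBCC-LR's actual implementation or parameters.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_kmers(reads, k):
    """Count k-mer occurrences over the whole read set."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def coverage_histogram(read, counts, k, bin_width=1, n_bins=8):
    """Normalised histogram of dataset-wide counts for the k-mers of one read.

    Reads from abundant genomes tend to have k-mers with high counts, so their
    histograms are shifted towards the higher bins. Counts beyond the last bin
    are clamped into it.
    """
    hist = [0] * n_bins
    total = 0
    for kmer in kmers(read, k):
        bin_index = min(counts[kmer] // bin_width, n_bins - 1)
        hist[bin_index] += 1
        total += 1
    return [h / total for h in hist] if total else [0.0] * n_bins


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "TTTTTT"]  # toy reads
    counts = count_kmers(reads, k=3)
    for read in reads:
        print(read, coverage_histogram(read, counts, k=3))
```

A real pipeline would use a much larger k and sample reads before clustering; the sketch only shows how the feature vector is formed.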

12:20 PM-12:30 PM
Assembly graph-based variant discovery reveals novel dynamics in the human microbiome
Format: Pre-recorded with live Q&A

  • Mihai Pop, University of Maryland, College Park, United States
  • Jacquelyn S Meisel, University of Maryland, College Park, United States
  • Harihara Subrahmaniam Muralidharan, University of Maryland, College Park, United States
  • Jay Ghurye, Dovetail Genomics, United States
  • Todd Treangen, Rice University, United States
  • Sergey Koren, NIH, United States
  • Marcus Fedarko, University of California, San Diego, United States

Presentation Overview:

Sequence variation within metagenomes reveals important information about the structure, function, and evolution of microbial communities. However, most existing methods for variant detection are reference-dependent and are limited to identifying single nucleotide polymorphisms (SNPs), missing more complex structural changes. We developed MetaCarvel (https://github.com/marbl/MetaCarvel), a reference-independent tool that incorporates paired-end read information to link together contigs into confident scaffolds and detects a rich set of graph signatures indicative of biologically relevant variants. We applied MetaCarvel to almost 1,000 metagenomes from the Human Microbiome Project and identified over nine million variants representing insertion/deletion events, complex strain differences, plasmids, and repeats. The majority of identified variants were repeats, some corresponding to mobile genetic elements. Our analysis revealed striking differences in the rate of variation across body sites, highlighting niche-specific mechanisms of bacterial adaptation. We identified more indels and strain variants in the oral cavity than in the comparatively nutrient-rich gut. In particular, we highlight a Streptococcus variant from neighboring sites in the oral cavity suggesting that, despite their close proximity, bacteria within each microenvironment utilize unique approaches for effective colonization. This work highlights the utility of using graph-based variant detection to capture biologically significant signals in microbial populations.

12:30 PM-12:40 PM
Meta-NanoSim: metagenome simulator for nanopore reads
Format: Pre-recorded with live Q&A

  • Theodora Lo, Canada's Michael Smith Genome Sciences Centre, Canada
  • Chen Yang, Canada's Michael Smith Genome Sciences Centre, Canada
  • Ka Ming Nip, Canada's Michael Smith Genome Sciences Centre, Canada
  • Saber Hafezqorani, Canada's Michael Smith Genome Sciences Centre, Canada
  • Rene L. Warren, BC Cancer Genome Sciences Centre, Canada
  • Inanc Birol, Canada's Michael Smith Genome Sciences Centre, Canada

Presentation Overview:

As a long-read sequencing technique, Oxford Nanopore Technologies (ONT) sequencing has shown unprecedented potential in metagenomic studies. However, the challenges associated with ONT reads, such as high error rates and non-uniform error distributions, necessitate analytical tools designed specifically for long reads. To facilitate the development and benchmarking of such tools, simulated datasets with known ground truth are desirable. Here, we present Meta-NanoSim, a fast and lightweight ONT read simulator that characterizes and simulates the unique properties of ONT metagenomes, including abundance levels, chimeric reads, and reads that span both ends of a circular genome. Given the empirical profiles and the abundance profile learnt from experimental datasets, multi-sample, multi-replicate metagenome datasets are generated to simulate microbial communities with both circular and linear genomes. To demonstrate its performance, we train Meta-NanoSim with two mock microbial community standards and compare the simulation results against state-of-the-art tools. Further, we showcase the application of Meta-NanoSim by benchmarking ONT metagenome assemblers on our simulated datasets. Gold standards provided by Meta-NanoSim will facilitate the development of algorithms and pipelines in metagenomics, including functional gene prediction, species detection, comparative metagenomics, and clinical diagnosis. As such, we expect Meta-NanoSim to have an enabling role in the field.
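
Two of the properties mentioned above, abundance-weighted sampling and reads that span both ends of a circular genome, can be sketched in a few lines. This is an illustrative toy, not Meta-NanoSim's code: the function names and data layout are hypothetical, and no sequencing errors or chimeric reads are modelled.

```python
import random


def circular_read(genome, start, length):
    """Extract a read from a circular genome, wrapping past the end if needed."""
    n = len(genome)
    if not 0 < length <= n:
        raise ValueError("read length must be between 1 and the genome length")
    start %= n
    end = start + length
    if end <= n:
        return genome[start:end]
    # The read spans the junction between the last and first base.
    return genome[start:] + genome[:end - n]


def sample_read(genomes, abundances, length, rng):
    """Pick a genome in proportion to its abundance, then draw one read from it.

    genomes maps a name to (sequence, is_circular); abundances maps a name to
    a non-negative weight.
    """
    names = list(genomes)
    weights = [abundances[name] for name in names]
    name = rng.choices(names, weights=weights, k=1)[0]
    sequence, is_circular = genomes[name]
    if is_circular:
        start = rng.randrange(len(sequence))
        return name, circular_read(sequence, start, length)
    # Linear genomes cannot wrap, so the start is restricted.
    start = rng.randrange(len(sequence) - length + 1)
    return name, sequence[start:start + length]


if __name__ == "__main__":
    rng = random.Random(42)
    genomes = {
        "toy_plasmid": ("ACGTACGGTT", True),
        "toy_chromosome": ("TTGACCAGTTTGGGCA", False),
    }
    abundances = {"toy_plasmid": 1.0, "toy_chromosome": 3.0}
    for _ in range(3):
        print(sample_read(genomes, abundances, 6, rng))
```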

2:00 PM-2:40 PM
Microbiome Keynote: The analysis of microbiome data from biased high-throughput sequencing
Format: Live-stream

  • Amy Willis

Presentation Overview:

The composition of a microbiome is an important parameter to estimate given the critical role that microbiomes play in human and environmental health. However, profiling the composition of a microbial community using high-throughput sequencing distorts the true composition of the community. Sequencing mock communities (artificially constructed microbiomes of known composition) clearly illustrates that observed composition is a biased estimate of true composition, with certain taxa consistently overobserved or underobserved compared to their true relative abundance. We propose a statistical model for bias in compositional data, demonstrate its performance on data from the Vaginal Microbiome Consortium, and illustrate the effect of compositional bias on the replicability of human microbiome studies using data from the Microbiome Quality Control Project. We conclude with recommendations for the design and analysis of microbiome studies.
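
One simple way to picture such bias is a multiplicative model, in which each taxon has its own detection efficiency and the observed proportions are the true proportions scaled by those efficiencies and renormalised. The sketch below assumes this multiplicative form purely for illustration; it is not necessarily the model presented in the talk, and the efficiencies are hypothetical numbers.

```python
def normalise(values):
    """Rescale non-negative values so they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def apply_bias(true_props, efficiencies):
    """Observed composition: true proportion times efficiency, renormalised."""
    return normalise([p * e for p, e in zip(true_props, efficiencies)])


def correct_bias(observed_props, efficiencies):
    """Invert the multiplicative bias when efficiencies are known,
    for example after estimating them from a mock community."""
    return normalise([p / e for p, e in zip(observed_props, efficiencies)])


if __name__ == "__main__":
    true_props = [0.5, 0.3, 0.2]    # known mock community composition
    efficiencies = [1.0, 4.0, 0.5]  # hypothetical per-taxon efficiencies
    observed = apply_bias(true_props, efficiencies)
    print("observed:", observed)
    print("recovered:", correct_bias(observed, efficiencies))
```

In this toy case the second taxon appears to dominate the sample even though it makes up less than a third of the true community, which is the kind of distortion mock communities expose.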

2:40 PM-2:50 PM
PLoT-ME: Pre-classification of Long-reads for Memory Efficient Taxonomic assignment
Format: Pre-recorded with live Q&A

  • Sylvain Riondet, National University of Singapore / Genome Institute of Singapore, Singapore
  • Niranjan Nagarajan, Genome Institute of Singapore, Singapore

Presentation Overview:

With increasing feasibility, long-read metagenomics can enable high-resolution taxonomic analysis in a range of applications from diagnostics to forensics. The ease of access via portable long-read platforms (e.g. MinION) contrasts with the significant memory resources needed when classifiers try to provide precise read assignments (to strain or sub-strain level) or identify a wider set of organisms (e.g. large eukaryotes). To address this, memory-efficient taxonomic classifiers are an active area of research, with methods based on compact indexes providing various tradeoffs between memory usage and speed.

Here we present a general-purpose strategy (PLoT-ME) that leverages the information in the k-mer frequency spectra (k = 3-5 bp) of long reads to pre-classify them, allowing existing classifiers to further assign them against subsets of the reference database.

Evaluation on mock communities (real reads) shows that PLoT-ME’s fast K-means classifier provides a scalable, compact approach to rapidly pre-classify long error-prone reads (PacBio, Oxford Nanopore) without loss in classification performance. PLoT-ME was found to be robust to a range of read lengths (500bp-10kbp) and provides up to an order-of-magnitude reduction in memory requirements. We envisage that with further improvements in long-read metagenomic classifiers, this approach will enable a general-purpose strategy for high-resolution, low-memory microbiome analysis.
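
The pre-classification step can be pictured as follows: each read is turned into a k-mer frequency vector and routed to the nearest cluster centroid, so that only the matching subset of the reference database has to be loaded. The sketch below is a toy under stated assumptions: the centroids are built from two made-up sequences rather than learned by K-means on reference genomes, and the function names are hypothetical, not PLoT-ME's API.

```python
import math
from itertools import product


def kmer_frequencies(seq, k=4):
    """Normalised frequency of every possible k-mer over the alphabet ACGT."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        position = index.get(seq[i:i + k])  # k-mers with other letters are skipped
        if position is not None:
            counts[position] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def nearest_centroid(vector, centroids):
    """Index of the centroid closest to vector in Euclidean distance."""
    distances = [math.dist(vector, centroid) for centroid in centroids]
    return distances.index(min(distances))


if __name__ == "__main__":
    k = 2  # kept tiny for the toy example
    # Hypothetical centroids standing in for clusters of reference genomes.
    centroids = [kmer_frequencies("AT" * 50, k), kmer_frequencies("GC" * 50, k)]
    for read in ["ATATATTATA", "GCGCGGCGC"]:
        cluster = nearest_centroid(kmer_frequencies(read, k), centroids)
        print(read, "-> reference subset", cluster)
```

With small k the vectors stay short (4^k entries), which is what keeps this step cheap compared with the downstream classifier.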

2:50 PM-3:00 PM
pepFunk: an R shiny app and workflow for peptide-centric functional analysis of metaproteomic microbiome data
Format: Pre-recorded with live Q&A

  • Mathieu Lavallée-Adam, University of Ottawa, Canada
  • Caitlin Simopoulos, University of Ottawa, Canada
  • Zhibin Ning, University of Ottawa, Canada
  • Xu Zhang, University of Ottawa, Canada
  • Leyuan Li, University of Ottawa, Canada
  • Krystal Walker, University of Ottawa, Canada
  • Daniel Figeys, University of Ottawa, Canada

Presentation Overview:

Researchers can use metaproteomics to study the composition and functional contributions of the gut microbiome to human health. These metaproteomic data are acquired by a multistep process, starting with enzymatic digestion of microbial proteins into smaller and more easily detectable peptides. These peptides are then processed by a mass spectrometer, and the obtained mass spectra are matched back to a peptide database. Typically, the metaproteomic data analysis pipeline involves assigning each peptide to a potential parent protein. Challenges to unambiguous protein identification arise from the nature of enzymatic digestion, where peptides can match back to multiple parent proteins. To circumvent this challenge, we developed pepFunk, a methodology and tool for peptide-centric functional analysis of metaproteomic data. We created a gut microbiome peptide-to-KEGG database and developed a functional enrichment strategy for peptide-level data. Our peptide-centric approach gives users an enhanced ability to observe the biological processes taking place in the microbiome. Our tool is open source and is available as a Shiny web application at https://shiny.imetalab.ca/pepFunk.

3:20 PM-3:40 PM
Proceedings Presentation: ganon: precise metagenomics classification against large and up-to-date sets of reference sequences
Format: Pre-recorded with live Q&A

  • Vitor C. Piro, Hasso Plattner Institute, Germany
  • Temesgen Hailemariam Dadi, Freie Universität Berlin, Germany
  • Enrico Seiler, Freie Universität Berlin, Germany
  • Knut Reinert, Freie Universität Berlin, Germany
  • Bernhard Renard, Hasso Plattner Institute, Germany

Presentation Overview:

The exponential growth of assembled genome sequences greatly benefits metagenomics studies. However, currently available methods struggle to manage the increasing number of sequences and their frequent updates. Indexing the current RefSeq can take days and hundreds of GB of memory on large servers. Few methods address these issues thus far, and even though many can theoretically handle large amounts of references, time/memory requirements are prohibitive in practice. As a result, many studies that require sequence classification use indices that are often outdated and almost never truly up to date. Motivated by those limitations, we created ganon, a k-mer based read classification tool that uses Interleaved Bloom Filters in conjunction with taxonomic clustering and a k-mer counting/filtering scheme. Ganon provides an efficient method for indexing references and keeping them updated. It requires less than 55 minutes to index the complete RefSeq of bacteria, archaea, fungi and viruses. The tool can further keep these indices up to date in a fraction of the time necessary to create them. Ganon makes it possible to query against very large reference sets and therefore it classifies significantly more reads and identifies more species than similar methods. When classifying a high-complexity CAMI challenge dataset against complete genomes from RefSeq, ganon shows strongly increased precision with equal or better sensitivity compared with state-of-the-art tools. With the same dataset against the complete RefSeq, ganon improved the F1-score by 65% at the genus level. It supports taxonomy- and assembly-level classification, multiple indices and hierarchical classification. The software is open source and available at: https://gitlab.com/rki_bioinformatics/ganon
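
The sketch below illustrates the general idea of Bloom-filter-based k-mer classification with a k-mer counting threshold. It deliberately uses one plain Bloom filter per reference group rather than the interleaved layout named in the abstract, and every name, size, and threshold in it is a hypothetical choice for illustration, not ganon's implementation.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size=4096, n_hashes=3):
        self.size = size
        self.n_hashes = n_hashes
        self.bits = bytearray(size)  # one byte per position, for readability

    def _positions(self, item):
        # Derive several hash positions by salting the item with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references, k):
    """One Bloom filter of k-mers per reference group (e.g. per taxon)."""
    index = {}
    for name, seq in references.items():
        bloom = BloomFilter()
        for kmer in kmers(seq, k):
            bloom.add(kmer)
        index[name] = bloom
    return index


def classify(read, index, k, min_fraction=0.8):
    """Assign the read to the group matching the most k-mers, if above threshold."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None
    best_name, best_fraction = None, 0.0
    for name, bloom in index.items():
        fraction = sum(kmer in bloom for kmer in read_kmers) / len(read_kmers)
        if fraction > best_fraction:
            best_name, best_fraction = name, fraction
    return best_name if best_fraction >= min_fraction else None


if __name__ == "__main__":
    references = {  # toy sequences, not real genomes
        "taxon_A": "ACGTACGTTAGCCGATAGCTAGGCTA",
        "taxon_B": "TTGACCAGTTTGGGCACATTTCCGAAG",
    }
    index = build_index(references, k=8)
    print(classify("TAGCCGATAGCTAGG", index, k=8))
```

Adding a new reference only requires inserting its k-mers into the relevant filter, which hints at why such indices are cheap to update. An interleaved layout additionally lets a single hash lookup return the membership bits for all groups at once.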

3:40 PM-3:50 PM
Studying the dynamics of the gut microbiota using metabolically stable isotopic labeling and metaproteomics
Format: Pre-recorded with live Q&A

  • Mathieu Lavallée-Adam, University of Ottawa, Canada
  • Krystal Walker, University of Ottawa, Canada
  • Daniel Figeys, University of Ottawa, Canada
  • Patrick Smyth, University of Ottawa, Canada
  • Xu Zhang, University of Ottawa, Canada
  • Zhibin Ning, University of Ottawa, Canada
  • Janice Mayne, University of Ottawa, Canada
  • Jasmine Moore, University of Ottawa, Canada

Presentation Overview:

The gut microbiome and its metabolic processes are dynamic systems. Surprisingly, our understanding of gut microbiome dynamics is limited. Here we report a metaproteomic workflow that involves protein stable isotope probing (protein-SIP) and identification/quantification of partially labeled peptides. We also developed a package, which we call MetaProfiler, that corrects for false identifications and performs phylogenetic and time series analysis for the study of microbiome dynamics. From stool samples of five mice that were fed with 15N hydrolysate from Ralstonia eutropha, we identified 15,297 non-redundant unlabeled peptides, for 10,839 of which the heavy counterparts were quantified. These results revealed that i) isotope incorporation in proteins differed between taxa, ii) the rate of protein synthesis was lower in the microbiota than in mice, and iii) differences in protein synthesis appeared across protein functions. Interestingly, the phylum Verrucomicrobia and the genera Akkermansia, Lactobacillus, and Ruminococcus had not reached a plateau of isotopic incorporation 43 days after the continuous introduction of the isotope. Altogether, our study provides an efficient workflow for studying the dynamics of the gut microbiota, and our findings help to better understand the complex host-microbiome interactions.

3:50 PM-4:00 PM
Phenotypic characterization of complex microbial communities
Format: Pre-recorded with live Q&A

  • Dmitry Rodionov, Sanford-Burnham-Prebys Medical Discovery Institute, United States
  • Stanislav Iablokov, Institute for Information Transmission Problems, Russia

Presentation Overview:

Metabolic capabilities (phenotypes) of each microbial species are defined by the presence or absence of pathways encoded in their respective genomes. We reconstructed >70 metabolic pathways in >2,600 reference genomes of bacteria representing the human gut microbiome (HGM) and assigned metabolic phenotypes for (i) utilization of primary sources of energy/carbon (sugars, amino acids); (ii) synthesis of essential nutrients (vitamins/cofactors, amino acids); (iii) excretion of fermentation end-products (short-chain fatty acids). Capturing these phenotypes in a simple binary (1/0) phenotype matrix (BPM) facilitates comparative analysis of the cumulative metabolic potential of microbial communities. To enable metabolic phenotype profiling of microbiomes, we established a computational pipeline converting 16S metagenomic profiles into Community Phenotype Profiles composed of Community Phenotype Indices (CPIs), each representing the fractional representation of all “1”-phenotypes (vitamin prototrophs, sugar utilizers, etc.). We applied this approach to assess the distribution of metabolic capabilities in several large HGM datasets from healthy and sick subjects. We also introduce a concept of phenotypic diversity as the diversity of the subcommunity of organisms possessing a particular metabolic phenotype. The obtained functional diversity metrics (Alpha and Beta diversity of phenotypes) reflect phenotype distribution in microbiome samples and allow machine learning models to be trained for sample classification.
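
Read literally, a Community Phenotype Index is the fraction of the community, by abundance, made up of taxa carrying a “1” for a given phenotype in the binary phenotype matrix. A minimal sketch of that calculation follows. The taxon names, phenotype labels, and counts are made up, and treating taxa absent from the matrix as “0” is an assumption of this sketch rather than a documented rule of the pipeline.

```python
def community_phenotype_index(abundances, phenotype_matrix, phenotype):
    """Abundance-weighted fraction of the community with phenotype value 1.

    abundances maps taxon -> count (or relative abundance);
    phenotype_matrix maps taxon -> {phenotype name: 0 or 1}.
    """
    total = sum(abundances.values())
    if total == 0:
        raise ValueError("sample has no counts")
    carriers = sum(
        abundance
        for taxon, abundance in abundances.items()
        # Assumption: taxa missing from the matrix count as non-carriers.
        if phenotype_matrix.get(taxon, {}).get(phenotype, 0) == 1
    )
    return carriers / total


def community_phenotype_profile(abundances, phenotype_matrix, phenotypes):
    """One CPI per phenotype for a single sample."""
    return {
        phenotype: community_phenotype_index(abundances, phenotype_matrix, phenotype)
        for phenotype in phenotypes
    }


if __name__ == "__main__":
    # Hypothetical 16S-derived counts and binary phenotype matrix.
    abundances = {"taxon_A": 50, "taxon_B": 30, "taxon_C": 20}
    phenotype_matrix = {
        "taxon_A": {"vitamin_synthesis": 1, "sugar_utilization": 0},
        "taxon_B": {"vitamin_synthesis": 0, "sugar_utilization": 1},
        "taxon_C": {"vitamin_synthesis": 1, "sugar_utilization": 1},
    }
    print(community_phenotype_profile(
        abundances, phenotype_matrix, ["vitamin_synthesis", "sugar_utilization"]
    ))
```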

4:00 PM-4:10 PM
Introduction to CAMI
Format: Live-stream

  • Alice McHardy, Helmholtz Centre for Infection Research, Germany

4:10 PM-4:20 PM
Assembly results for second round of CAMI challenges
Format: Live-stream

  • Alex Sczyrba, Bielefeld University, Germany

4:20 PM-4:30 PM
(Taxonomic) binning results for the second round of CAMI challenges
Format: Live-stream

  • Fernando Meyer, Helmholtz Centre for Infection Research, Germany

4:30 PM-4:40 PM
Profiling results for the second round of CAMI challenges
Format: Live-stream

  • David Koslicki, Pennsylvania State University, United States

4:40 PM-4:50 PM
Expanding CAMI towards metaproteomics
Format: Pre-recorded with live Q&A

  • Alice McHardy, Helmholtz Centre for Infection Research, Germany
  • Patrick May, Luxembourg Centre for Systems Biomedicine, Luxembourg
  • Benoit Kunath, Luxembourg Centre for Systems Biomedicine, Luxembourg
  • Paul Wilmes, Luxembourg Centre for Systems Biomedicine, Luxembourg

Presentation Overview:

In 2018, the metaproteomics community initiated a challenge covering steps from protein extraction and nanoLC mass spectra acquisition to the comparison of search engines for metaproteomic analysis. While the assessment focused on protein identification using commonly employed search engines, one important challenge in metaproteomics was missing: the appropriate database used for protein identification. Metaproteome analyses require specific protein databases, and multiple ways are available to generate these from metagenomic and metatranscriptomic data. The CAMI challenge assesses and discusses those ways but so far does not include any assessment of their results using other functional omics (especially metaproteomics). We propose that state-of-the-art metaproteomic analysis may be a useful tool to complement and validate CAMI results from metagenomic and/or metatranscriptomic assemblies and gene prediction methods. Furthermore, we are able to provide several high-quality multi-omics datasets comprising DNA, RNA, and proteins extracted from a single sample, thereby reducing the heterogeneity amongst the different omics. Such an integrated and comprehensive assessment would provide multiple high-quality databases for improved metaproteomic analyses and supplementary means to assess CAMI’s results.

5:00 PM-5:20 PM
Proceedings Presentation: Topological and kernel-based microbial phenotype prediction from MALDI-TOF mass spectra
Format: Pre-recorded with live Q&A

  • Karsten Borgwardt, ETH Zurich, Switzerland
  • Bastian Rieck, ETH Zurich, Switzerland
  • Caroline Weis, ETH Zurich, Switzerland
  • Max Horn, ETH Zurich, Switzerland
  • Aline Cuenod, University of Basel, Switzerland
  • Adrian Egli, University Hospital Basel, Switzerland

Presentation Overview:

Motivation: Microbial species identification based on Matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry has become a standard tool in biomedicine and microbiology. MALDI-TOF mass spectra harbour the potential to deliver prediction results for other phenotypes, such as antibiotic resistance. Machine learning algorithm development specifically for MALDI-TOF MS based phenotype prediction is still in its infancy. Current spectral pre-processing typically involves a parameter-heavy chain of operations without analysis of their influence on the prediction results. In addition, classification algorithms lack quantification of uncertainty, which is indispensable for predictions potentially influencing patient treatment.

Results: We present a novel prediction method for antimicrobial resistance based on MALDI-TOF mass spectra. First, we compare the complex conventional pre-processing to a new approach that exploits topological information and requires only a single parameter, namely the number of peaks of a spectrum to keep. Second, we introduce PIKE, the Peak Information Kernel, a similarity measure specifically tailored to MALDI-TOF mass spectra which, combined with a Gaussian Process classifier, provides well-calibrated uncertainty estimates about predictions. We demonstrate the utility of our approach by predicting antibiotic resistance of three highly relevant bacterial species. Our method consistently outperforms competitor approaches, while demonstrating improved performance and security by rejecting out-of-distribution samples, such as bacterial species not represented in the training data. Ultimately, our method could contribute to earlier and more precise antimicrobial treatment in clinical patient care.

Availability: We make our code publicly available as an easy-to-use Python package at https://github.com/BorgwardtLab/maldi_PIKE.

Contact: caroline.weis@bsse.ethz.ch, karsten.borgwardt@bsse.ethz.ch

5:20 PM-5:30 PM
Deep learning for binning and high resolution taxonomic profiling of microbial genomes
Format: Pre-recorded with live Q&A

  • Jakob Nissen, Technical University of Denmark, Denmark
  • Joachim Johansen, University of Copenhagen, Denmark
  • Rosa Allesøe, University of Copenhagen, Denmark
  • Casper Sønderby, University of Copenhagen, Denmark
  • Jose Armenteros, Technical University of Denmark, Denmark
  • Christopher Grønbech, Technical University of Denmark, Denmark
  • Lars Jensen, University of Copenhagen, Denmark
  • Henrik Nielsen, Clinical Microbiomics A/S, Denmark
  • Thomas Petersen, Technical University of Denmark, Denmark
  • Ole Winther, Technical University of Denmark, Denmark
  • Simon Rasmussen, University of Copenhagen, Denmark

Presentation Overview:

Despite recent advances in metagenomic binning, reconstruction of microbial species from metagenomics data remains a challenging task. We have used recent advances in deep learning to develop Variational Autoencoders for Metagenomic Binning (VAMB), a program that uses deep variational autoencoders to encode sequence co-abundance and k-mer distribution information prior to clustering. We show that a variational autoencoder is able to integrate these two distinct data types without any prior knowledge of the datasets. VAMB outperforms existing state-of-the-art binners on contig datasets, reconstructing 29–98% more near-complete draft genomes. We employed VAMB in a novel multi-split workflow that enables assembly of 28–105% more strains compared to using VAMB with the commonly used single-sample binning strategy. To demonstrate the scalability of our method, we bin a human gut microbiome dataset with 1,000 samples and reconstruct 45% more near-complete bins compared to state-of-the-art methods. Furthermore, we show that VAMB enables direct high-resolution taxonomic analysis of the generated genome clusters. Finally, we use this to show that different organisms have different geographical distribution patterns, which are potentially important for the design of probiotics. VAMB can be run on standard hardware and is freely available at https://github.com/RasmussenLab/vamb.

5:30 PM-5:40 PM
MiMeNet: Exploring the Microbiome-Metabolome Relationships using Neural Networks
Format: Pre-recorded with live Q&A

  • Derek Reiman, University of Illinois at Chicago, United States
  • Yang Dai, University of Illinois at Chicago, United States

Presentation Overview:

The microbial community has been shown to be involved in host development as well as the pathogenesis of various diseases. The microbial community is believed to functionally interact with its host at a metabolic level through symbiotic interactions and co-metabolism. Recent studies are beginning to highlight various metabolic dysregulations leading to the development of metabolic diseases. However, there is a lack of metabolomic data as it is costly and difficult to obtain. Therefore, the ability to predict unknown metabolomic profiles using microbial features would be extremely useful. Here, we describe MiMeNet, a neural network model to predict the metabolomic profile from microbial features. Using three paired microbiome-metabolomic datasets, we show that MiMeNet has superior predictive performance compared to state-of-the-art linear models. In particular, MiMeNet uses data from one cohort of patients with inflammatory bowel disease to accurately predict the metabolomic profile of a second external cohort. Additionally, MiMeNet can be used to interpret the underlying structure of the microbe-metabolite interaction network, providing insights into the causes of metabolic dysregulation in disease, which could allow for future hypothesis generation.

5:40 PM-5:50 PM
Machine-learning based prospection of antimicrobial peptides (AMPs) from metagenomes using Macrel
Format: Pre-recorded with live Q&A

  • Célio Dias Santos-Júnior, Fudan University, China
  • Shaojun Pan, Fudan University, China
  • Xing-Ming Zhao, Tongji University, China
  • Luis Pedro Coelho, Fudan University, China

Presentation Overview:

Antimicrobial peptides (AMPs) are peptides (≤ 100 residues) with antimicrobial properties, which are used in both clinical and non-clinical environments. Metagenomes and metatranscriptomes present an opportunity to prospect for novel AMPs. However, standard methods do not apply to shorter peptides, as we show empirically. Here, we present MACREL (for Meta(genomic) AMPs Classification and REtrievaL), an end-to-end pipeline that works from metagenomes/metatranscriptomes (in the form of short reads) or genomes (in the form of pre-assembled contigs) and predicts the AMPs therein. Macrel uses random forest classifiers trained with a novel set of 22 descriptors that represent the main AMP features. The effectiveness of Macrel in AMP prediction was benchmarked using realistic simulations and real metagenomic data. Macrel is available as open-source software at https://github.com/BigDataBiology/macrel and as a web server: http://big-data-biology.org/software/macrel. We show that Macrel has comparable overall performance (Acc. 94.6%, MCC 0.90) to other state-of-the-art methods, while achieving the highest specificity (99.8%) among them. AMPs are likely to be relatively rare, thus reducing the number of false positives is more important than reducing false negatives. High-quality AMP candidates were recovered, and most were expressed in metatranscriptomes from the same biological samples.
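
For readers less familiar with the reported metrics, the sketch below shows how accuracy, specificity, and the Matthews correlation coefficient (MCC) are computed from a confusion matrix. The counts in the example are hypothetical and are not Macrel's benchmark results.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # fraction of true AMPs that are recovered
    specificity = tn / (tn + fp)  # fraction of non-AMPs correctly rejected
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical benchmark: 50 AMPs and 50 non-AMPs.
    for name, value in classification_metrics(tp=45, fp=1, tn=49, fn=5).items():
        print(f"{name}: {value:.3f}")
```

High specificity matters most when positives are rare: even a small false positive rate applied to a very large pool of non-AMP peptides can outnumber the true hits.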

5:50 PM-6:00 PM
Identifying short open reading frames (smORFs) with deep learning
Format: Pre-recorded with live Q&A

  • Shaojun Pan, Fudan University, China
  • Luis Pedro Coelho, Fudan University, China

Presentation Overview:

Standard computational gene prediction methods do not predict very short genes. This is due to methodological limitations and to the historical belief that these sequences rarely have a biological function. Recently, however, several groups have demonstrated that there is a wealth of function in these short proteins. The computational difficulties remain, as standard approaches are too prone to false positives when trying to predict smORFs within genomic sequences. Here, we take advantage of a previously published dataset of smORFs, which had used conservation signatures to eliminate likely false positives. We now frame the question as a classification problem and apply a multiple-input Convolutional Neural Network to it. Our classifier achieves 68.9% recall at 48.1% precision (the testing sets do not contain any sequences that are > 80% identical, with at least 90% coverage, to sequences in the training sets). This demonstrates the potential of this approach for identifying smORFs in silico using only their sequences.