Accepted Posters

Attention conference presenters: please review the Speaker Information Page.

If you need assistance, please contact submissions@iscb.org and provide your poster title or submission ID.

Category I - 'Open Science and Citizen Science'

I01 - Metagenomics analysis of probiotic supplement contents using k-mer matches to sequencing reads

Mark Mammel, United States Food and Drug Administration, United States

Short Abstract: For dietary supplements containing live beneficial bacteria, identifying the ingredient strains and any potential contaminants is a concern for food safety regulators, who must assess product safety and labeling and conduct post-market surveillance. Whole genome sequencing was used to identify and quantify genome content directly from various marketed products containing mixtures of bacterial strains. Data analysis was based on k-mer identification of each read, which allows rapid identification of the genomes present in the sequence data. First, a database of 25-mers was generated from reference whole genome sequences. Then, for each bacterial species of interest, 25-mers were selected that are present in representatives of that species but not found in other species. All 25-mers are stored in a trie to allow fast searching. After a sequencing run on a mixed bacterial population is complete, the program attempts to match each read to the 25-mers in the trie and tallies each matched read under its species. After normalization using the coverage of each species' reference strain by its set of k-mers, the program outputs the distribution of counts for each species present. Using k-mer matching on MiSeq sequencing reads of metagenomic samples, we were able to quickly identify the components of several probiotic samples and compare our results to the listed ingredients.
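The read-classification workflow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the FDA program: genomes are plain A/C/G/T strings, the trie is a nested dictionary, reads that hit k-mers from more than one species are skipped, and normalization divides each species' count by the fraction of its reference genome covered by its k-mer set. All function names are hypothetical.

```python
from collections import Counter

K = 25  # k-mer length stated in the abstract


def kmers(seq, k=K):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def species_specific_kmers(references, k=K):
    """references maps species -> list of reference genomes.
    Keep k-mers shared by all representatives of a species and absent from every other species."""
    core, everything = {}, {}
    for sp, genomes in references.items():
        sets = [set(kmers(g, k)) for g in genomes]
        core[sp] = set.intersection(*sets)
        everything[sp] = set.union(*sets)
    return {
        sp: core[sp] - set().union(*(everything[o] for o in references if o != sp))
        for sp in references
    }


def build_trie(unique):
    """Nested-dict trie; the '$' key at the end of a k-mer path stores the species label."""
    root = {}
    for sp, ks in unique.items():
        for km in ks:
            node = root
            for base in km:
                node = node.setdefault(base, {})
            node["$"] = sp
    return root


def lookup(trie, kmer):
    """Walk the trie base by base; return the species label or None if the k-mer is absent."""
    node = trie
    for base in kmer:
        node = node.get(base)
        if node is None:
            return None
    return node.get("$")


def tally_reads(reads, trie, k=K):
    """Assign each read to the single species whose unique k-mers it hits."""
    counts = Counter()
    for read in reads:
        hits = {lookup(trie, km) for km in kmers(read, k)} - {None}
        if len(hits) == 1:  # assumption: ambiguous or unmatched reads are skipped
            counts[hits.pop()] += 1
    return counts


def normalize(counts, coverage):
    """Assumed normalization: divide by each species' k-mer coverage fraction, then rescale to proportions."""
    scaled = {sp: n / coverage[sp] for sp, n in counts.items()}
    total = sum(scaled.values())
    return {sp: v / total for sp, v in scaled.items()}
```

Passing `tally_reads(reads, build_trie(species_specific_kmers(refs)))` to `normalize` yields per-species proportions. A plain Python `set` of k-mers would support the same exact-length lookups; the trie is used here only to mirror the abstract.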

I02 - A Bioinformatics Pipeline for Designing Virtual DNA Arrays for Improved Species-Level Interpretation of Amplicon-Based Metagenomic Data

Christine Lowe, Agriculture and Agri-Food Canada, Canada

Short Abstract: High-throughput Next Generation Sequencing (NGS) technologies have enhanced the ability to explore microbial diversity in environmental samples using amplicon-based metagenomics (ABM), also referred to as metabarcoding. ABM uses large publicly accessible reference databases, such as GenBank, and off-the-shelf tools to profile microbial community structure by assigning barcode sequences to taxonomic ranks. The algorithms implemented in most NGS annotation tools are based on sequence similarity, including the widely used Lowest Common Ancestor (LCA) algorithm and the Ribosomal Database Project (RDP) classifier. Such algorithms may fail to resolve highly similar sequences at or below the species level, resulting in identifications at higher taxonomic ranks. Here, we present an Automated Oligonucleotide Design Pipeline (AODP) that designs signature oligonucleotides (SO) with specificity and fidelity for members of a given taxonomic group. These SO were then used for in silico probing of NGS data to improve identification of specific taxa at or below the species level. We will demonstrate the usefulness and effectiveness of this approach for accurately assigning NGS sequences to lower taxonomic ranks. This approach could serve as a tool for monitoring the distribution and introduction of specific economically significant pathogens in the agri-ecosystem, increasing confidence by reducing ambiguity when identifying high-risk species in NGS data.
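A simplified picture of signature-oligonucleotide design and in silico probing follows. It assumes that a signature is any fixed-length window shared by every barcode sequence of the target taxon and differing at no fewer than `min_mismatch` positions from every window of the non-target sequences. AODP's actual design criteria are not described in the abstract, so the oligo length, the mismatch threshold, and all function names below are hypothetical.

```python
def windows(seq, length):
    """All distinct fixed-length windows of seq."""
    return {seq[i:i + length] for i in range(len(seq) - length + 1)}


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def signature_oligos(groups, length=20, min_mismatch=2):
    """groups maps a taxon name to its barcode sequences.
    Assumed rule: a signature is a window present in every member of the taxon and at least
    min_mismatch substitutions away from every window of every other taxon."""
    signatures = {}
    for taxon, members in groups.items():
        shared = set.intersection(*(windows(s, length) for s in members))
        background = set().union(
            *(windows(s, length) for other, seqs in groups.items() if other != taxon for s in seqs)
        )
        signatures[taxon] = {
            oligo for oligo in shared
            if all(hamming(oligo, w) >= min_mismatch for w in background)
        }
    return signatures


def probe(read, signatures):
    """In silico probing: return the taxa whose signature oligos occur verbatim in the read."""
    return {taxon for taxon, oligos in signatures.items() if any(o in read for o in oligos)}
```

Requiring a mismatch margin against non-targets, rather than simple absence, is what lets a read from one taxon be separated from a near-identical barcode in a sister taxon; reads matching no signature are left unassigned instead of being pushed to a higher rank.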

I03 - The exploration of metagenomic sequence and fold space diversity

Daniel Roche, French Alternative Energies and Atomic Energy Commission, France

Short Abstract: The structure/function relationship is the basis for a molecular description and understanding of the properties and cellular roles of proteins. Protein 3D structure determination remains essentially a low- to medium-throughput process, while sequencing has entered a high-throughput regime. As a result, structural properties must mostly be inferred from sequence by methods that rely on existing structural knowledge, and remote homology detection techniques are the "de facto" means of doing so. Previous studies using such techniques show that the structural coverage of sequence space seems to be plateauing. However, sequence sampling of the biosphere has been, and to a large extent remains, heavily biased toward culturable phyla relevant to biomedicine or biotechnology. The advent of metagenomics and initiatives such as GEBA aim to correct this bias, but the extent to which these efforts affect the apparent saturation of structural diversity remains unclear. To investigate this issue, we performed a sequence-based structural survey of 41 bacterial genomes from the GEBA initiative and of numerous genomes reconstructed from uncultured candidate phyla using either metagenomics (47 genomes recovered from an acetate-amended aquifer, Wrighton 2013) or single-cell techniques (555 genomes recovered from the “microbial dark matter” project, Rinke 2013). We present initial results estimating the known sequence and structural coverage of the selected (meta)genomes. These results indicate that undersampled organisms can be a significant source of sequence novelty. We then used an empirically calibrated relationship between sequence-level and structure-level novelty to estimate how this sequence novelty is likely to translate into structural novelty.

I04 - Detection of a genetically modified plant pathogen Serratia marcescens using E-probe Diagnostic Nucleic acid Analysis (EDNA)

Ulrich Melcher, Oklahoma State University, United States

Short Abstract: Next generation sequencing (NGS) and metagenomics have great potential for addressing various questions in plant biosecurity projects. A typical NGS metagenome analysis is computationally intensive, limited by the inordinate time required to complete assembly and BLAST analyses. We designed a bioinformatic tool called E-probe Diagnostic Nucleic acid Analysis (EDNA) to reduce analysis time. EDNA creates query sets of short representative sequences unique to a particular organism (pathogen), termed electronic probes, or e-probes. EDNA then reverses the typical metagenome analysis: instead of a BLAST analysis of an entire NGS database against a reference database such as GenBank, the diagnostic e-probes are used as BLAST queries against the NGS database. Three sets of e-probes for the bacterial pathogen Serratia marcescens (SM) were designed in this study: 1) SM e-probes, 2) GFP plasmid vector e-probes, and 3) e-probes for commonly used antibiotic resistance genes. All three sets of e-probes were tested as queries against three types of simulated NGS databases: 1) plant host only, 2) plant host with SM, and 3) plant host with SM and GFP plasmid sequences. EDNA successfully detected both SM and the “genetically modified” SM in these simulations. This e-probe diagnostic approach provides a good framework for a new sequence-based detection system that eliminates the need to assemble NGS data.
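The reversed search that EDNA performs can be illustrated as follows. This sketch swaps BLAST for exact k-mer seed matching; the seed length `k`, the detection rule (a target is called when at least `min_fraction` of its e-probes hit), and all function names are assumptions made for illustration, not EDNA's parameters.

```python
def read_kmer_index(reads, k):
    """All k-mers present in the NGS reads: the 'database' the e-probes are searched against."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def probe_hit(probe, index, k):
    """Exact k-mer seed match, standing in (as an assumption) for a BLAST alignment."""
    return any(probe[i:i + k] in index for i in range(len(probe) - k + 1))


def edna_detect(probe_sets, reads, k=21, min_fraction=0.5):
    """probe_sets maps a target name to its e-probes.
    Returns {target: (fraction of probes hit, detected?)} under the assumed threshold rule."""
    index = read_kmer_index(reads, k)
    calls = {}
    for target, probes in probe_sets.items():
        fraction = sum(probe_hit(p, index, k) for p in probes) / len(probes)
        calls[target] = (fraction, fraction >= min_fraction)
    return calls
```

For example, `edna_detect({"SM": sm_probes, "GFP": gfp_probes}, reads)` returns one `(fraction, detected)` pair per probe set. The design point mirrored from the abstract is that only the small probe set is iterated as queries, while the large read set is indexed once and never assembled.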

I05 - CRISPR–Cas Systems in the Human Microbiome

Quan Zhang, Indiana University Bloomington, United States

Short Abstract: The CRISPR–Cas adaptive immune system is an important defense system in bacteria and archaea, providing targeted defense against invasion by foreign nucleic acids (including viruses). The CRISPR (clustered regularly interspaced short palindromic repeats) loci and cas (CRISPR-associated) genes are the two components of CRISPR–Cas immune systems: segments of invading DNA are incorporated into the host genome at the CRISPR loci (forming spacers between the repeats of CRISPR arrays), while cas genes encode Cas proteins that mediate the defense process. We have developed several computational approaches for discovering and characterizing CRISPR–Cas systems in metagenomic sequences. The first approach is targeted assembly of CRISPR arrays, which first pools the metagenomic reads that contain CRISPR repeats and then assembles the pooled reads. This targeted approach significantly improves the assembly of CRISPR arrays, identifying more and longer arrays. The second approach uses the spacers found in CRISPR arrays to identify putative invasive DNA containing segments identical or similar to the spacers. The third approach combines similarity searches and genomic context analysis to predict cas genes. Applying these approaches to the Human Microbiome Project (HMP) datasets has yielded a large collection of CRISPR–Cas systems (many of them new), Cas proteins, and putative invasive genetic elements, shedding new light on human-associated microbial communities and on the interaction between bacteria and invasive genetic elements in the human microbiome.
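Two of the steps above, pooling reads that contain a CRISPR repeat before assembly and using spacers to find putative invasive DNA, can be sketched as below. The sketch assumes the repeat sequence is already known and uses exact matching only, whereas the abstract also allows similar (not just identical) spacer matches; the assembly step itself is omitted, and the spacer length bounds and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def pool_reads(reads, repeat):
    """Targeted step: keep reads carrying the repeat on either strand, oriented to match it."""
    pooled = []
    for read in reads:
        if repeat in read:
            pooled.append(read)
        elif repeat in revcomp(read):
            pooled.append(revcomp(read))
    return pooled


def spacers(read, repeat, min_len=20, max_len=60):
    """Segments flanked by repeat copies on both sides (the first and last pieces are incomplete)."""
    inner = read.split(repeat)[1:-1]
    return [s for s in inner if min_len <= len(s) <= max_len]


def find_protospacers(spacer_list, candidates):
    """Map candidate IDs (e.g. phage or plasmid contigs) to the spacers found in them on either strand."""
    hits = {}
    for cid, seq in candidates.items():
        matched = [s for s in spacer_list if s in seq or revcomp(s) in seq]
        if matched:
            hits[cid] = matched
    return hits
```

Orienting every pooled read to the repeat's strand before assembly keeps the array in a single orientation, which is why the reverse-complement check happens at pooling time rather than later.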

I06 - Integration of microbiota and metabolomics data in longitudinal cohort of infants en route to type 1 diabetes

Tommi Vatanen, Aalto University, Finland

Short Abstract: Type 1 diabetes mellitus (T1D) is the most common metabolic-endocrine disorder in children. The interplay of genetic and environmental triggers in T1D remains controversial, and more empirical evidence is needed to understand the development of the disease. Here, we integrate human gut microbiome and metabolomics data from a longitudinal cohort of 19 subjects and 104 samples. Infants at risk for T1D were tracked by monthly stool sampling and multiple serum samples from birth until 3 years of age, during which time a subset of the subjects seroconverted and were diagnosed with T1D. We observed a marked drop in alpha-diversity (a measure of community complexity) and microbial gene richness in progressors in the time window between seroconversion and T1D diagnosis, accompanied by a spike in Rikenellaceae, Ruminococcus gnavus, and Streptococcus spp. To integrate the microbiome data with metabolomics data measured in both stool and serum samples, we applied several approaches, including pairwise Spearman correlations with bootstrapping, penalized canonical correlation analysis, and Bayesian group factor analysis. We found that shifts in microbiota occurring before seroconversion are correlated with altered serum levels of triglycerides and branched-chain amino acids, both of which are associated with poor metabolic control in T1D. In stool, increased Ruminococcus and decreased Veillonella abundances are associated with increased sphingomyelin and decreased lithocholine levels. Our results suggest that a reduced-complexity, pathobiont-containing, inflammation-favoring microbiota becomes established in the gut of seroconverted children prior to the development of T1D.
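Pairwise Spearman correlation with bootstrapping, one of the integration methods listed above, can be sketched in plain Python. The sketch assumes paired measurements (for example, one taxon's abundance and one metabolite's level across the same samples) and a percentile bootstrap interval; the abstract does not specify the bootstrap scheme, so `n_boot`, `alpha`, and the function names are assumptions.

```python
import random


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            result[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation, or None when either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        return None
    return cov / (va * vb) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def bootstrap_spearman(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus an assumed percentile bootstrap interval from resampled (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    boot = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman([x[i] for i in idx], [y[i] for i in idx])
        if rho is not None:  # skip resamples where one variable is constant
            boot.append(rho)
    boot.sort()
    lower = boot[int(alpha / 2 * len(boot))]
    upper = boot[int((1 - alpha / 2) * len(boot)) - 1]
    return spearman(x, y), (lower, upper)
```

Resampling whole (taxon, metabolite) pairs, rather than each variable separately, preserves the dependence being measured; an interval that excludes zero is one simple way to flag a robust association.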

I07 - Automated analysis pipeline for the metagenomic sequence data highlighting changes in microbial community composition in social stress model of mice

Raina Kumar, Frederick National Laboratory for Cancer Research, United States

Short Abstract: Here we introduce an automated analysis pipeline for the evaluation of metagenomic 16S rRNA sequence data from the Ion Torrent PGM™ Sequencer. The pipeline is built with the lightweight Python module Ruffus. It performs basic metagenomic analysis steps, including quality filtering, chimera detection, read clustering for operational taxonomic unit (OTU) calling, and taxonomic assignment, together with diversity analysis using QIIME. The pipeline also performs high-dimensional class comparisons using LEfSe, along with functional profiling of the metagenome using PICRUSt. The visualization component generates heat maps and plots showing differences in community composition between groups of samples, as well as a phylogenetic tree using the iTOL API. To demonstrate the usability of this pipeline, we report analysis results showing changes in the microbial community composition of socially stressed mice. C57BL/6J mice were exposed to aggressor mice for 6 hours daily for 5 or 10 days. DNA from the duodenal tissue of these mice was then sequenced for the V4 variable region of the 16S rRNA gene on the Ion Torrent PGM™ Sequencer. Our results indicate changes in the relative abundance of Lactobacillales (class Bacilli) and Clostridiales (class Clostridia) in mice stressed for 10 days compared with control, unstressed mice.

I08 - Functional annotation of metagenome-derived ORFans using structure prediction

Andrew Doxey, University of Waterloo, Canada

Short Abstract: A large fraction of protein sequences derived from metagenomes, often over 50%, cannot be functionally annotated based on homology. We designed and applied a computational pipeline to identify, cluster, structurally model, and functionally annotate metagenome-derived ORFans. The method identifies ORFans that lack similarity to existing proteins but cluster into metagenome-specific sequence families, models the structures of ORFan family representatives by threading, and identifies conserved catalytic sites similar to those of existing enzymes. By repeating the pipeline with shuffled sequences, false discovery rates and the statistical significance of models can be estimated. We applied the pipeline to two large metagenomes from agricultural and grassland soils containing 2.4 Gbp and 1.96 Gbp of DNA, respectively, and discovered over 2,000 novel ORFan families whose structures were inferred at a false discovery rate below 10%, providing a means for functional annotation. Enriched functions of interest include novel families of glycosyl hydrolases, antibiotic resistance proteins, defense proteins, and bacterial toxins. These predictions provide insights into the nature of ORFan proteins, suggesting an association with structurally diverse or rapidly evolving protein functions. They also provide a diverse resource of novel protein families, which are currently being validated experimentally through enzymatic assays and functional complementation. Ultimately, our pipeline shows promise for large-scale annotation of metagenomic ORFans that escape homology-based annotation.
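The shuffled-sequence control described above resembles a target-decoy estimate of the false discovery rate. The sketch below assumes that each threaded model yields a single score where higher is better, and estimates the FDR at a cutoff as the number of decoys passing it (scaled by set sizes) divided by the number of real models passing it. The scoring, the estimator, and all names are assumptions; the abstract does not say exactly how the authors computed their FDR.

```python
import random


def shuffled(seq, rng):
    """Composition-preserving shuffle used to make a decoy sequence."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def fdr_at(cutoff, real_scores, decoy_scores):
    """Estimated FDR among real models scoring >= cutoff, scaled for unequal set sizes."""
    n_real = sum(s >= cutoff for s in real_scores)
    if n_real == 0:
        return 0.0
    n_decoy = sum(s >= cutoff for s in decoy_scores)
    expected_false = n_decoy * len(real_scores) / len(decoy_scores)
    return min(1.0, expected_false / n_real)


def cutoff_for_fdr(real_scores, decoy_scores, max_fdr=0.10):
    """Lowest observed real score whose estimated FDR is below max_fdr, or None."""
    for cutoff in sorted(set(real_scores)):
        if fdr_at(cutoff, real_scores, decoy_scores) < max_fdr:
            return cutoff
    return None
```

In use, the decoy scores would come from running the same threading and catalytic-site steps on `shuffled(...)` versions of the ORFan family representatives; shuffling keeps amino acid composition fixed, so any decoy hit reflects chance agreement rather than real structure.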

I09 - Gene finding in metatranscriptomic sequences

Wazim Mohammed Ismail, Indiana University, United States

Short Abstract: Background:
Metatranscriptomic sequencing is a highly sensitive bioassay of functional activity in a microbial community, providing information complementary to metagenomic sequencing of the same community. Metatranscriptomic sequences will enable us to refine metagenome annotations and to study gene activity, its regulation, and its dynamics in complex microbial communities.
Results:
In this paper, we present TransGeneScan, a software tool for finding genes in assembled transcripts from metatranscriptomic sequences. By incorporating several features of metatranscriptomic sequencing, including strand specificity, short intergenic regions, and putative antisense transcripts, into a Hidden Markov Model, TransGeneScan can predict a sense transcript containing one or multiple genes (in an operon) or an antisense transcript.
Conclusion:
We tested TransGeneScan on a mock metatranscriptomic data set containing three known bacterial genomes. The results showed that TransGeneScan performs better than metagenomic gene finders (MetaGeneMark and FragGeneScan) at predicting protein-coding genes in assembled transcripts, and achieves comparable or even higher accuracy than gene finders for microbial genomes (Glimmer and GeneMark). These results imply that, with the assistance of metatranscriptomic sequencing, we can obtain a broad and precise picture of the genes (and their functions) in a microbial community.
Availability:
TransGeneScan is available as open-source software on SourceForge at https://sourceforge.net/projects/transgenescan/.
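TransGeneScan's HMM is not specified in the abstract beyond the features listed above. To illustrate how an HMM gene finder turns a sequence into a gene/non-gene labeling, the sketch below runs Viterbi decoding on a hypothetical two-state model (`N` for non-coding, `C` for coding) with made-up GC-biased emission probabilities. Real gene finders use far richer state topologies (codon positions, both strands, start and stop states), so this is not TransGeneScan's model.

```python
import math

# Hypothetical two-state toy model: N = non-coding (AT-rich), C = coding (GC-rich).
STATES = ("N", "C")
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.45, "T": 0.45, "G": 0.05, "C": 0.05},
    "C": {"A": 0.05, "T": 0.05, "G": 0.45, "C": 0.45},
}


def viterbi(seq, states, start, trans, emit):
    """Return the most probable state path for seq, computed in log space."""
    # score[s] = best log-probability of any path ending in state s at the current position
    score = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    back = []  # back[t - 1][s] = best predecessor of state s at position t
    for base in seq[1:]:
        pointers, new_score = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + math.log(trans[p][s]))
            pointers[s] = prev
            new_score[s] = score[prev] + math.log(trans[prev][s]) + math.log(emit[s][base])
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

With this toy model, `"".join(viterbi("AAAAAGGGGGGAAAAA", STATES, START, TRANS, EMIT))` gives `"NNNNNCCCCCCNNNNN"`: the six favorable G emissions outweigh the cost of two state switches. Working in log space avoids numeric underflow on transcript-length sequences.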
