||Gary Bader, PhD
University of Toronto, Canada
Title: Pathway Analysis of Genomics Data – from Correlation to Causation
Abstract: Genomic data provides a snapshot of one or a few dimensions of information about an organism, such as gene or protein expression, promoter methylation and genome sequence in a set of cells. This information is typically interpreted using correlation-based methods. For instance, a genome-wide association study correlates a genetic marker with a phenotype, and a set of gene expression profiles across multiple experiments can be correlated and clustered to identify groups of genes that act similarly and thus may be part of the same pathway. This approach is extremely valuable, but a major challenge is to better understand the causative mechanisms underlying the genomic snapshot. For instance, we would like to know whether the activity of a particular transcription factor or microRNA can explain the pattern of gene or pathway activity observed in an mRNA transcript profile. This could then be experimentally tested by perturbing the controlling factor. We are developing computational approaches to collect and use biochemical pathway information to help interpret genomics data and gain a more mechanistic understanding of cellular function.
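As a rough illustration of the correlation-based interpretation this abstract describes, the Python sketch below computes Pearson correlations between toy expression profiles and groups genes whose profiles correlate above a threshold. The gene names, expression values, 0.9 threshold, and the single-linkage grouping rule are all assumptions made for the example and are not taken from the talk.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlated_groups(profiles, threshold=0.9):
    """Group genes linked by any pairwise correlation >= threshold (single linkage)."""
    parent = {gene: gene for gene in profiles}

    def find(gene):
        # Walk up to the representative of this gene's group.
        while parent[gene] != gene:
            gene = parent[gene]
        return gene

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for gene in profiles:
        groups.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical expression values across four experiments.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.1, 6.0, 3.9, 2.0],
}

groups = correlated_groups(profiles)
print(groups)
```

Grouping of this kind only says which genes behave alike; it does not say which factor drives the shared pattern, which is the gap the talk addresses.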
||Greg Carter, Ph.D.
The Jackson Laboratory
Bar Harbor, ME USA
Title: Modeling Genetic Complexity with Integrated Interaction Analysis across Multiple Phenotypes
Abstract: Contemporary studies are revealing the genetic complexity of many traits in humans and model organisms. Two hallmarks of this complexity are epistasis, or gene-gene interaction, and pleiotropy, in which one gene affects multiple phenotypes. Understanding the genetic architecture of complex traits requires addressing these phenomena, but interpreting the biological significance of epistasis and pleiotropy is often difficult. While epistasis reveals dependencies between genetic variants, it is often unclear how the activity of one variant is specifically modifying the other. Epistasis found in one phenotypic context may disappear in another context, rendering the genetic interaction ambiguous. Pleiotropy can suggest either redundant phenotype measures or gene variants that affect multiple biological processes. Here we address these interpretation ambiguities with a method called combined analysis of pleiotropy and epistasis (CAPE). This approach integrates information from multiple related phenotypes to constrain models of epistasis, thereby enhancing the detection of interactions that simultaneously describe all phenotypes. The networks inferred are readily interpretable in terms of directed influences that indicate suppressive and enhancing effects of individual genetic variants on other variants, which in turn account for the variance in quantitative traits. We demonstrate the utility of this approach by analyzing mouse intercross data, discovering a novel interaction network influencing kidney gene expression and disease. We have implemented this approach in an R package that can be applied to data from both genetic screens and a variety of segregating populations including backcrosses, intercrosses, and natural populations.
- Anna Tyler - The Jackson Laboratory, US
- Vivek Philip - The Jackson Laboratory, US
||Chijioke O. Elekwachi, Ph.D.
University of Nottingham, UK
Title: eMBRLitMine and eMBRHelper: Bioinformatics Approaches for Improved Microbial Bioremediation Outcomes
Abstract: Contamination of ecosystems by xenobiotic substances has negatively impacted affected ecologies and the health and economic livelihoods of human populations in such environments. Bioremediation has proven to be a safe, low-cost and environmentally friendly method for remediation of such areas. However, a lack of complete understanding of the metabolic, enzymatic and cellular processes involved has made it difficult to model or predict outcomes of field processes. Researchers’ ability to make critical decisions capable of influencing the direction and outcome of these processes is hampered, thereby hindering its development. Following a survey that highlighted priorities, practices and needs of the sector, the environmental Microbial BioRemediation (eMBR) web portal was developed. This article describes the structure, algorithms and output of two bioinformatics resources for improved microbial bioremediation outcomes, deployed via the portal.
eMBRLitMine helps identify which microorganisms would be suitable for remediating sites contaminated by named compounds. It combines named-entity-recognition algorithms, a MySQL database, and graph-rendering technologies to create, from information available in the literature, a statistical co-occurrence matrix from which it infers associations among microorganisms and contaminants. This provides insights into possible bacteria/contaminant relationships and highlights microorganisms that may be useful for remediating particular contaminants. Following the construction of a comprehensive metabolic biodegradation network, eMBRHelper enables the delineation of possible biodegradation pathways for named contaminants. By integrating chemical and enzymatic information, it attempts to model the interplay between contaminants, enzymes, and microorganisms in degradation pathways, and enables researchers to make informed decisions capable of improving outcomes of remediation exercises involving bio-augmentation.
- Charlie Hodgman - University of Nottingham, Multidisciplinary Centre for Integrative Biology (MyCIB), UK
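The literature co-occurrence idea behind eMBRLitMine can be sketched in a few lines of Python. This is only a toy under stated assumptions: simple substring matching against two hypothetical term lists stands in for the named-entity-recognition step, the three example sentences are invented, and raw counts stand in for whatever statistics the actual tool computes.

```python
from collections import Counter
from itertools import product

# Hypothetical term lists standing in for a named-entity-recognition step.
ORGANISMS = {"pseudomonas putida", "rhodococcus"}
CONTAMINANTS = {"toluene", "naphthalene"}

# Invented example sentences standing in for literature abstracts.
abstracts = [
    "Pseudomonas putida degrades toluene under aerobic conditions.",
    "Toluene and naphthalene removal by Pseudomonas putida.",
    "Rhodococcus strains isolated from naphthalene-contaminated soil.",
]


def cooccurrence_counts(texts, organisms, contaminants):
    """Count the texts in which each (organism, contaminant) pair appears together."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        found_organisms = [o for o in organisms if o in lowered]
        found_contaminants = [c for c in contaminants if c in lowered]
        for pair in product(found_organisms, found_contaminants):
            counts[pair] += 1
    return counts


counts = cooccurrence_counts(abstracts, ORGANISMS, CONTAMINANTS)
for (organism, contaminant), n in sorted(counts.items()):
    print(f"{organism} / {contaminant}: {n}")
```

Pairs with high counts are candidates for an organism/contaminant association; a real pipeline would also need to normalize for how often each term appears overall.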
||Benjamin Good, Ph.D.
Senior Staff Scientist
The Scripps Research Institute
La Jolla, CA, USA
Title: The Cure: A Game with the Purpose of Gene Selection for Breast Cancer Survival Prediction
Abstract: Breast cancer has been studied extensively with genomic technologies, with many attempts to devise molecular predictors of clinical outcomes. A key aspect of these studies is the selection of small, informative sets of genes with which to compose predictors. Many groups now apply prior knowledge in forms such as protein-protein interaction networks and pathway databases in the process of gene selection. However, these approaches cannot make use of unstructured knowledge.
Since the year 2000, more than 160,000 publications related to breast cancer have been added to PubMed. We created a ‘scientific discovery game’, called The Cure, to tap into the knowledge represented in those articles and in the minds of those who can read them. The objective of The Cure is to identify genes that can be used to build improved survival predictors for breast cancer. It is formulated as a card game with genes as cards. Hands are scored based on the value of the genes in creating a decision tree predictor of survival using their expression values. To win a game, players must select more predictive genes than their opponent. In one year, 1,077 players registered and collectively played 9,904 gene-selection games. Aggregating the results of game play, we generated a ranked list of preferentially selected genes. Preliminary analysis of this list indicates that it is non-random, with the higher-ranked genes showing overlaps with prior breast cancer gene lists as well as surfacing some novel predictive genes.
- Salvatore Loguercio - The Scripps Research Institute, Molecular and Experimental Medicine, US
- Max Nanis - The Scripps Research Institute, Molecular and Experimental Medicine, US
- Andrew Su - The Scripps Research Institute, Molecular and Experimental Medicine, US
||Melissa Haendel, PhD
Ontology Development Group, OHSU Library
Department of Medical Informatics and Epidemiology
Oregon Health & Science University, Oregon, USA
Title: Tales from the Crypt: Do You Know Where Your Data Has Been?
Abstract: Researchers produce data that bioinformaticists analyze to generate hypotheses and novel discoveries, which feed back into basic or clinical research. It is a beautiful cycle, but as everyone knows, not all data is created equal and this can cause incorrect conclusions at best and broad propagation of errors at worst. We expect science to be reproducible and replicable. What does this even mean and how can we work towards having greater confidence in our collective scientific knowledge? Here, we explore these fundamental issues that plague bioinformatics: stories about scientific reproducibility and mechanisms for improvement, efforts to understand how we know when a dataset is annotated sufficiently, and the importance of tracking provenance in data transformations through analytical pipelines. The goal is to instill in all of us increased scrutiny on the data we work with and to discuss development of informatics methods that can aid data quality analysis and metrics.
||Kirk E. Jordan, PhD
IBM Distinguished Engineer
Emerging Solutions Executive & Assoc. Program Director
Computational Science Center
IBM T.J. Watson Research
Cambridge, MA, USA
Title: Solving Life Sciences Problems Requires Systems that Optimize the Workflow
Abstract: The world and the life sciences are awash in data. The problem for computing is no longer the ability to compute but the inability to move the data to the compute. As a consequence, the focus is shifting to whether we can move the compute to the data. This raises the question of optimizing entire workflows to solve problems instead of optimizing a compute-intensive kernel. In this talk, I will briefly expound on the concept of moving compute to the data. I will describe some of our ongoing investigations into tackling entire workflows, present some of these efforts, and discuss the impact they are having on how we look at the entire workflow.
||Peter Karp, Ph.D.
Director, Bioinformatics Research Group
Artificial Intelligence Center, SRI International
Menlo Park, CA, USA
Title: A Flux Balance Analysis Model of E. coli K-12 MG1655 Derived From the EcoCyc Database
Abstract: We present EcoCyc-17.5-FBA, a genome-scale model of the Escherichia coli K-12 MG1655 metabolic network. The model is automatically generated from the current state of the EcoCyc database using the MetaFlux component of Pathway Tools, ensuring reflection of the current state of knowledge of the E. coli metabolic network. EcoCyc-17.5-FBA represents several advances in E. coli genome-scale models, breaking new ground in the number of genes modeled and the accuracy and breadth of its predictions. EcoCyc-17.5-FBA encompasses 1923 genes, 1996 unique metabolic reactions, and 2026 unique metabolites. We demonstrate a three-part validation of EcoCyc-17.5-FBA: (1) A comparison of simulated EcoCyc-17.5-FBA growth in aerobic and anaerobic glucose culture with experimental results from chemostat culture and simulation results from the E. coli modeling literature. (2) Essentiality predictions for all 1923 genes involved in the model, with an accuracy of 94.0%. (3) Nutrient utilization testing for viability on 478 different growth media, with an accuracy of 77.4%. The development of EcoCyc-17.5-FBA as an aspect of EcoCyc has improved the quality and depth of EcoCyc's representation of E. coli. EcoCyc-17.5-FBA is a literate model in the sense that it is highly accessible to human comprehension: the component reactions, metabolites, and genes of the model are directly inspectable through the EcoCyc website, which includes a broad array of information that enriches the model, such as metabolite chemical structures, pathway diagrams, genome maps, and regulatory interactions.
- Daniel Weaver - SRI International, Bioinformatics Research Group, US
||Catherine Lozupone, Ph.D.
Anschutz Medical Campus
Aurora, CO USA
Title: Application of Network Analysis to the Human Gut Microbiota and Use of Comparative Genomics to Understand the Driving Factors of Microbial Associations
Abstract: Non-random co-occurrence patterns of human gut bacteria across people can indicate important biological relationships between them. A positive association can indicate symbioses, such as a syntrophic interaction in which microbes support each other’s growth by cooperative metabolism. However, a positive association can also be driven by indirect factors such as a shared preference for the same type of environment. For instance, in a dataset containing samples with diverse oxygen concentrations, strict anaerobes may positively co-occur simply because of a shared dislike of oxygen. Similarly, negative associations can be driven by direct competition between microbes for the same resource, or by opposing environmental preferences. I will discuss methods for detecting significant co-occurrence interactions between microbes using data from culture-independent surveys of the human gut, and for predicting potential direct or indirect drivers of co-occurrence patterns by comparing the content of associated genomes.
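One simple way to ask whether two microbes co-occur more often than chance would predict is a one-sided hypergeometric (Fisher-style) test on presence/absence data. The Python sketch below is offered only as an illustration of that general idea, not as the method presented in the talk; the taxon names and the presence/absence vectors are invented, and a real analysis would also need to correct for multiple testing across many taxon pairs.

```python
from math import comb


def cooccurrence_pvalue(present_a, present_b):
    """Probability of at least the observed number of shared samples under independence.

    Each argument is a list of 0/1 flags, one per sample, marking whether
    the taxon was detected in that sample.
    """
    n = len(present_a)
    count_a = sum(present_a)
    count_b = sum(present_b)
    both = sum(1 for x, y in zip(present_a, present_b) if x and y)
    total = comb(n, count_b)
    # Sum the hypergeometric tail from the observed overlap upward.
    tail = sum(
        comb(count_a, k) * comb(n - count_a, count_b - k)
        for k in range(both, min(count_a, count_b) + 1)
    )
    return tail / total


# Hypothetical presence/absence of three taxa across ten gut samples.
taxon_x = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
taxon_y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
taxon_z = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

p_xy = cooccurrence_pvalue(taxon_x, taxon_y)  # always found together
p_xz = cooccurrence_pvalue(taxon_x, taxon_z)  # overlap close to chance
print(p_xy, p_xz)
```

A small p-value flags a positive association but cannot by itself distinguish a direct interaction from a shared environmental preference, which is where the genome comparison described in the abstract comes in.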