12th Annual Rocky Mountain Bioinformatics Conference


Updated Nov 20, 2014

Judith A Blake, Ph.D.
Associate Professor
The Jackson Laboratory

Title: Glycolysis Pathways in the Gene Ontology: All Roads Lead to Pyruvate

Abstract: The Gene Ontology (GO) is a freely available resource that provides connections between gene products and a structured, controlled vocabulary of biological terms used to describe how gene products function. The representation of biochemical pathways presents a challenge because different species may utilize different enzymatic activities or substrates to carry out similar processes. Glycolysis is an example of an overall process that is well conserved among species but varies with respect to input molecules and enzymatic activities. We describe a strategy to represent a taxonomy of glycolytic processes that accounts for the variation seen in different biological contexts. We factor out core, conserved processes and represent variations as subtypes. Glycolytic processes are defined by axioms that include the parent superclass process, the molecular functions that are necessary parts of the process, and the input and output chemical entities. We group the conserved processes by shared intermediate substances, such as a shared glucose-6-phosphate intermediate. Creating formal definitions for pathways allows the use of OWL-based reasoning for further inferred classification. Inclusion of necessary molecular functions not only aids ontology maintenance but also guides annotators in determining which genes are directly involved in a process. We use glycolysis in mouse sperm to show how curators can use the necessary functions associated with a glycolytic process to infer the appropriate subtype for annotation, and how users can explore annotations to determine which isoforms of an enzyme are used in a given biological context. This work was supported by NIH:NHGRI grants U41HG002273 to the GOC and U41HG003751 to the Reactome Knowledgebase.
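The axiom pattern described in the abstract (parent superclass, necessary molecular functions as parts, and input/output chemical entities) can be sketched in OWL Manchester syntax. This is an illustrative fragment only: the class and property names below are invented stand-ins, not actual GO terms or identifiers.

```
# Illustrative sketch only: class and relation names are hypothetical,
# not actual GO terms, GO IDs, or RO relations.
Class: 'canonical glycolysis'
    EquivalentTo:
        'glycolytic process'
        and (has_primary_input some 'D-glucose')
        and (has_primary_output some 'pyruvate')
        and (has_part some 'glucose-6-phosphate isomerase activity')
```

Given such a definition, an OWL reasoner can automatically classify any process satisfying the expression as a subtype of 'glycolytic process', which is the inferred-classification step the abstract refers to.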

CV: web
Robert Kirk DeLisle, PhD
Principal Scientist
SomaLogic, Inc.

Title: SOMAscan™-Based Prediction of Early Kidney Function Decline

Robert Kirk DeLisle1; Laila Bruun2; Robert Mehler1; Britta Singer1; Stephen A. Williams1; Anders Christensson2

1SomaLogic, Boulder, CO; 2Dept. Nephrology, Skåne Univ. Hosp., Lund Univ., Malmö, Sweden

Abstract: Glomerular filtration rate (GFR) is defined as the volume of fluid filtered by the kidney per unit time and is the best measure of renal function. Clearance of certain exogenous substances from the blood allows direct measurement of GFR but is not suitable for routine clinical practice. The endogenous clearance markers creatinine and cystatin C are attractive because they offer simpler means of estimating GFR. However, these estimates are relatively imprecise, especially above 60 ml/min/1.73 m², and measuring both markers does not resolve this limitation. The result is that early, significant loss of functioning renal mass may go unrecognized, and the opportunity for timely, focused intervention is lost.

Using SOMAscan™, a high-throughput, multiplexed proteomic assay that simultaneously measures >1100 proteins from small volumes of biological samples, we assayed 183 plasma samples, collected at Skåne University Hospital, Sweden, from male subjects with GFR >60 as measured by iohexol clearance, and developed models to assess the ability of protein levels to predict GFR above 60 ml/min/1.73 m². Model performance was compared to that of creatinine and cystatin C, and the protein-based models better predicted measured GFR >60 (predictive correlation of 0.70 vs. 0.55). We conclude that it will be possible to develop a more clinically useful blood test that measures GFR >60 ml/min/1.73 m², enabling detection and intervention in cases of early, clinically relevant renal function loss.

CV:  pdf
Kirk E. Jordan, PhD
IBM Distinguished Engineer
Emerging Solutions Executive & Assoc. Prog. Director
Computational Science Center
IBM T.J. Watson Research

Title: Experience in Improving de novo Transcriptome Assembly to Address Large Data Sets

Co-Authors: Chang-Sik Kim, Vipin Sachdeva, Martyn Winn

Abstract: The world and the life sciences are awash in data. The problem for computing is no longer the ability to compute but the inability to move data and handle large data sets. In this talk, we will describe the work we have done to handle large data sets for de novo transcriptome assembly through developing parallel, distributed memory versions of the Broad Institute’s Trinity workflow. We will relate our efforts in using experimental hardware that gives insight into new systems architecture to better handle such data movement and large data sets.

CV: web
Frank Lee, Ph.D.
Lead Architect, IBM Genomics Solution
Technical Advisor for Life Sciences
Senior-certified Solution Architect
IBM Technical & Platform Computing Worldwide

Title: Tackling Big Data Challenges of Genomics by PowerGene

Abstract: I will discuss the challenges presented by the explosive growth of data and computation in genomics and share an architecture and best practices to 1) acquire, store, and access data at scale; 2) build a high-speed (turnaround and throughput) computing infrastructure to process genomic and bioinformatics workloads and workflows; and 3) make the infrastructure smart by converging data rivers into a data lake. Using real-life projects as case studies, I will share our approaches to tackling these challenges, the ecosystem we are enabling, some success stories and lessons learned, and highlight the potential for collaboration among genomic research communities.

Biography: pdf
Kevin M. Livingston, PhD
Research Associate
University of Colorado

Title: Complex Querying Across Multiple Data Sources Using a Common Biological Model

Abstract: A wealth of biomedical information currently exists spread out over numerous independent data sources. Querying across those data sources has historically been difficult. The Knowledge Base of Biomedicine (KaBOB) integrates information from 20 large, prominent data sources and formally represents it in OWL using a common biomedical model grounded in the OBOs (Open Biomedical Ontologies). This enables complex queries across those data sources to be written in biomedical terms, as opposed to working with multiple source-specific data models and identifiers. We demonstrate how such a knowledge base can be interrogated in multiple ways to answer complex biomedical questions. Examples include looking for explanations of synthetic lethal drug-gene interactions in cells, or asking which drugs target gene products that are in the mitochondria and involved in oxidative phosphorylation. The answers to these questions do not exist in a single database. Additionally, KaBOB can answer these questions at multiple levels of granularity and abstraction, for example, querying about species-specific genes versus including information about homologous genes in other model organisms. Queries against KaBOB can be performed by direct querying of the underlying RDF using SPARQL, as well as by deriving results via back-chaining through the data using Prolog. The modeling used by KaBOB enables formal reasoning methods to be employed to derive deductively entailed relationships from the explicitly asserted knowledge. In the near future we hope to also explore abductive inference methods that hypothesize about missing information that, if present, would help produce more coherent explanations.
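As a sketch of the kind of cross-source question mentioned in the abstract, the SPARQL fragment below asks which drugs target gene products located in the mitochondrion and participating in oxidative phosphorylation. All prefixes, predicates, and resource names are hypothetical stand-ins for illustration, not KaBOB's actual OBO-based vocabulary.

```sparql
# Illustrative only: the ex: prefix and all predicate/resource names
# are invented stand-ins, not KaBOB identifiers.
PREFIX ex: <http://example.org/biomed/>

SELECT ?drug ?geneProduct
WHERE {
  ?drug        ex:targets        ?geneProduct .
  ?geneProduct ex:locatedIn      ex:mitochondrion ;
               ex:participatesIn ex:oxidative_phosphorylation .
}
```

The point of the common model is that a single query like this spans drug-target, localization, and pathway information that originates in different source databases.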

CV: web
Jill P. Mesirov
Associate Director and Chief Informatics Officer
Director, Computational Biology and Bioinformatics
The Broad Institute of MIT and Harvard
Cambridge, MA, USA

Title: Integrative Computational Approaches for Genomic Medicine

Abstract: The acceleration of data acquisition, and the corresponding availability of increasing amounts of both genetic and functional data, is changing the face of biomedical research. Computational approaches and new methods can take advantage of these data and bring the promise of improved understanding and treatment of disease.

I will describe some approaches that leverage multiple data types and emphasize the use of more biologically interpretable models. In particular, I'll describe our work integrating high-level clinical and genomic features to stratify pediatric brain tumor patients into groups with high and low risk of relapse after treatment. The approach is more accurate than previous models; highlights possible future drug targets; and represents one of the few such predictors that generalized to a completely independent patient cohort. In addition, I will present a recently developed method to shed light on the functional correlates of genetic variants. Finally, I will review the software through which we make our methods available to the research community.

CV: web
Elmar A Pruesse, PhD
Max Planck Institute for Marine Microbiology
and University of Colorado Denver

Title: Tackling the rRNA Big-data Problem with ARB and SILVA

Abstract: Since Fox and Woese pioneered the use of ribosomal RNA some four decades ago, related methods have become essential for characterizing microbial communities. Accordingly, a vast body of over 4 million rRNA sequences has been accumulated. As evidenced by dedicated databases and tools such as RDP, greengenes, mothur and qiime, specialized methods are required to successfully conduct analyses given the current volume of reference and study data. Here, we present recent improvements to the combination of resources, services and tools offered by ARB (from arbor, lat. tree) and SILVA (lat. forest) to facilitate rRNA-based studies. The SILVA database includes large and small subunit rRNA data from all three domains, data integration from various sources and an up-to-date curated taxonomy. The SILVA website (~400 users per day) offers services such as sequence classification with SINA or the TestPrime tool for primer coverage validation. ARB integrates a broad range of external tools, unique algorithms and data management and visualization features into a single desktop application with a graphical interface. Version 6 brings enhancements to performance, stability, memory footprint, automation and usability as well as additional or enhanced internal and external analysis methods.

The new SILVAngs service streamlines the analysis of rRNA amplicon data from next generation sequencing into an intuitive web interface. Built on top of the SILVA database pipeline, it always uses the most current reference data and requires no installation. Results include commonly computed microbial ecology statistics as well as novel plots and easy export of processed reads to ARB.

Joanna Roder, PhD
Senior Director of Data Analytics
Biodesix, Inc.

Title: Deep Learning, "Deep Data", and Molecular Diagnostics

Abstract: Recent developments in machine learning techniques based on deep learning neural networks have led to huge increases in performance in many "Big Data" problems, such as image recognition and natural language processing. While "Big Data" problems typically consist of millions of instances with hundreds of attributes, data sets arising within the field of personalized medicine usually have thousands of attributes for only a few hundred samples. Thus, learning from data generated by molecular multiplexed probes, such as NextGen sequencing or Deep MALDI, in genomic or protein profiling, respectively, can be thought of as a "Deep Data" problem. This presentation will describe how ideas from deep learning can be applied to the "Deep Data" sets arising from molecular profiling of patient samples. We will present the main principles and give examples from the genomic and proteomic arenas to show how these methods can be tailored to the development of clinically useful, multivariate, molecular diagnostic tests.

Donna K. Slonim
Associate Professor of Computer Science
Associate Professor of the Tufts School of Medicine
Genetics Faculty, Sackler School of Graduate Biomedical Sciences
Tufts University
Medford, MA, USA

Title: Human Developmental Bioinformatics: A Case Study in Overcoming Data Limitations

Abstract: A growing awareness of developmental impacts on lifelong human health has increased interest in improving our understanding of human development at the molecular level. However, existing data collections characterizing gene function and disease relevance are typically imperfect, incomplete, and lacking in sufficient context. This state of affairs has limited the translational application of genomic data to developmental disorders. In this talk, we will discuss such barriers and our experiences addressing them. Adding context-specific functional annotation has improved our ability to interpret developmental data sets. Pooling imperfect gene-disease data across related disease processes has helped us link developmental processes to health outcomes. A new anomaly-detection paradigm for the analysis of expression data sets facilitates interpretation of individual samples. Our recent results in developmental bioinformatics illustrate how domain expertise, contextual awareness, and scaling up can help us overcome the limits of the data.

CV: pdf
TECHNICAL TALK

Raymond Tecotzky
Market Manager, Informatics Ecosystem
Illumina, Inc.

Title: Progress Report: From Sample to Answer - Sequencing in the Cloud Era with the Illumina BaseSpace® Bioinformatics Platform

Abstract:  There have been many improvements to the Illumina BaseSpace Informatics Platform and significant growth in its usage over the past year.  In my short talk, I will describe and demonstrate many of the new and exciting Sample to Answer Applications available from Illumina, including the Apps on BaseSpace and BaseSpace Onsite that enable them.  There are many choices for bioinformatics analysis available today in the genomics community and we will show you why we believe that BaseSpace and BaseSpace Onsite offer a better way to analyze your genomic data and represent the new trend for how all genomic data will be stored, analyzed, and shared in the future.

Biography:  pdf