
12th Annual Rocky Mountain Bioinformatics Conference

KEYNOTE SPEAKERS

Updated Nov 20, 2014

Speakers:

Judith A. Blake, PhD
Robert Kirk DeLisle, PhD
Kirk E. Jordan, PhD
Frank Lee, PhD
Kevin M. Livingston, PhD
Jill P. Mesirov, PhD
Elmar A. Pruesse, PhD
Joanna Roder, PhD
Donna K. Slonim, PhD
Raymond Tecotzky (Technical Talk)

Judith A. Blake, PhD
Associate Professor, The Jackson Laboratory
Title: Glycolysis Pathways in the Gene Ontology: All Roads Lead to Pyruvate
Abstract: The Gene Ontology (GO) is a freely available resource that connects gene products to a structured, controlled vocabulary of biological terms describing how gene products function. Representing biochemical pathways is challenging because different species may use different enzymatic activities or substrates to carry out similar processes. Glycolysis is an example of an overall process that is well conserved among species but varies with respect to input molecules and enzymatic activities. We describe a strategy to represent a taxonomy of glycolytic processes that accounts for the variation seen in different biological contexts. We factor out core, conserved processes and represent variations as subtypes. Glycolytic processes are defined by axioms that include the parent superclass process, the molecular functions that are necessary parts of the process, and the input and output chemical entities. We group the conserved processes by shared intermediate substances, such as a shared glucose-6-phosphate intermediate. Creating formal definitions for pathways allows OWL-based reasoning to be used for further inferred classification. Including the necessary molecular functions not only aids ontology maintenance but also guides annotators in determining which genes are directly involved in a process. We use glycolysis in mouse sperm to show how curators can use the necessary functions associated with a glycolytic process to infer its subtype for annotation, and how users can explore annotations to determine which isoforms of an enzyme are used in a given biological context. This work was supported by NIH/NHGRI grants U41HG002273 to the GOC and U41HG003751 to the Reactome Knowledgebase.
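To make the idea of defining process subtypes by their necessary molecular functions more concrete, the sketch below replaces an OWL reasoner with a plain subset check: a class counts as satisfied when every necessary function declared on it and on its ancestors appears among the observed functions, and only the most specific satisfied classes are reported. This is an assumption-laden toy, far weaker than the description-logic reasoning the abstract refers to, and the class and function names (`glycolytic_process`, `subtype_A`, `function_a`, and so on) are hypothetical placeholders rather than GO terms.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set


@dataclass(frozen=True)
class ProcessClass:
    name: str
    parent: Optional[str]                # is_a parent; None for the root
    necessary_functions: FrozenSet[str]  # functions asserted as necessary parts


def all_necessary(name: str, classes: Dict[str, ProcessClass]) -> Set[str]:
    """Union of necessary functions along the is_a chain (inherited axioms)."""
    required: Set[str] = set()
    current = classes.get(name)
    while current is not None:
        required |= current.necessary_functions
        current = classes.get(current.parent) if current.parent else None
    return required


def ancestors(name: str, classes: Dict[str, ProcessClass]) -> Set[str]:
    """All superclasses of `name`."""
    found: Set[str] = set()
    parent = classes[name].parent
    while parent is not None:
        found.add(parent)
        parent = classes[parent].parent
    return found


def infer_most_specific(observed: Set[str], classes: Dict[str, ProcessClass]) -> Set[str]:
    """Satisfied classes, minus any class that is an ancestor of another satisfied class."""
    satisfied = {n for n in classes if all_necessary(n, classes) <= observed}
    redundant: Set[str] = set()
    for n in satisfied:
        redundant |= ancestors(n, classes)
    return satisfied - redundant


# Hypothetical toy taxonomy: one core process with two variant subtypes.
CLASSES = {
    "glycolytic_process": ProcessClass("glycolytic_process", None, frozenset({"function_core"})),
    "subtype_A": ProcessClass("subtype_A", "glycolytic_process", frozenset({"function_a"})),
    "subtype_B": ProcessClass("subtype_B", "glycolytic_process", frozenset({"function_b"})),
}
```

For example, `infer_most_specific({"function_core", "function_a"}, CLASSES)` returns only `subtype_A`, because the parent class is implied by its satisfied subtype.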

Robert Kirk DeLisle, PhD
Principal Scientist, SomaLogic, Inc., Colorado
Title: SOMAscan™-Based Prediction of Early Kidney Function Decline
Authors: Robert Kirk DeLisle¹; Laila Bruun²; Robert Mehler¹; Britta Singer¹; Stephen A. Williams¹; Anders Christensson²
¹SomaLogic, Boulder, CO; ²Dept. of Nephrology, Skåne University Hospital, Lund University, Malmö, Sweden
Abstract: Glomerular filtration rate (GFR) is the volume of fluid filtered by the kidney per unit time and is the best measure of renal function. Clearance of certain exogenous substances from the blood allows GFR to be measured directly, but this approach is not suitable for routine clinical practice. The endogenous clearance markers creatinine and cystatin C are attractive because they offer a simpler means of estimating GFR. However, these estimates are relatively imprecise, especially above 60 ml/min/1.73 m², and measuring both markers does not resolve this limitation. As a result, early, significant loss of functioning renal mass may go unrecognized, and the opportunity for timely, focused intervention is lost. Using SOMAscan™, a high-throughput, multiplexed proteomic assay that simultaneously measures more than 1,100 proteins in small volumes of biological samples, we assayed 183 plasma samples collected at Skåne University Hospital, Sweden, from male subjects with GFR > 60 ml/min/1.73 m² as measured by iohexol clearance. We then developed models to assess how well protein levels predict GFR above 60 ml/min/1.73 m². Compared with creatinine and cystatin C, the models better predicted measured GFR > 60 ml/min/1.73 m² (predictive correlation of 0.70 vs. 0.55). We conclude that it will be possible to develop a more clinically useful blood test for GFR > 60 ml/min/1.73 m², enabling detection of and intervention in cases of early, clinically relevant loss of renal function.

Kirk E. Jordan, PhD
IBM Distinguished Engineer; Emerging Solutions Executive & Associate Program Director, Computational Science Center, IBM T.J. Watson Research
Title: Experience in Improving de novo Transcriptome Assembly to Address Large Data Sets
Co-authors: Chang-Sik Kim, Vipin Sachdeva, Martyn Winn
Abstract: The world and the life sciences are awash in data. The problem for computing is no longer the ability to compute but the inability to move data and handle large data sets. In this talk, we will describe our work on handling large data sets for de novo transcriptome assembly by developing parallel, distributed-memory versions of the Broad Institute's Trinity workflow. We will also relate our experience using experimental hardware that offers insight into new system architectures better able to handle such data movement and large data sets.
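Trinity is a de Bruijn-graph assembler, and such assemblers start from a table of k-mers drawn from the reads. One generic way to spread that table across distributed-memory nodes, assumed here for illustration and not taken from the talk, is to hash each k-mer to an owning worker so that each node holds a disjoint slice. The sketch simulates the workers in a single process; the function names are hypothetical.

```python
import zlib
from collections import Counter
from typing import Iterator, List


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def owner(kmer: str, n_workers: int) -> int:
    """Map a k-mer to the worker that owns it.

    zlib.crc32 is deterministic across processes, unlike Python's built-in
    hash() on strings, so every worker computes the same owner."""
    return zlib.crc32(kmer.encode("ascii")) % n_workers


def partitioned_kmer_counts(reads: List[str], k: int, n_workers: int) -> List[Counter]:
    """Simulate distributed k-mer counting in one process.

    Each k-mer is routed to exactly one worker's table; in a real
    distributed run this routing would be done with message passing."""
    tables = [Counter() for _ in range(n_workers)]
    for read in reads:
        for km in kmers(read, k):
            tables[owner(km, n_workers)][km] += 1
    return tables


reads = ["ACGTACGTGA", "CGTACGTTTA"]  # toy reads, not real data
tables = partitioned_kmer_counts(reads, k=4, n_workers=3)
```

The choice of a stable hash matters: with per-process string hashing, two nodes could disagree about which of them owns a given k-mer, and counts for that k-mer would be split.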

Frank Lee, PhD
Lead Architect, IBM Genomics Solution; Technical Advisor for Life Sciences; Senior-certified Solution Architect, IBM Technical & Platform Computing Worldwide
Title: Tackling Big Data Challenges of Genomics by PowerGene
Abstract: I will discuss the challenges presented by the explosive growth of data and computation in genomics and share an architecture and best practices to (1) acquire, store, and access data at scale; (2) build a high-speed (in both turnaround and throughput) computing infrastructure to process genomic and bioinformatics workloads and workflows; and (3) make the infrastructure smart by converging data rivers into a data lake. Using real-life projects as case studies, I will share our approaches to tackling these challenges, the ecosystem we are enabling, some success stories, and lessons learned, and I will highlight the potential for collaboration among genomic research communities.

Kevin M. Livingston, PhD
Research Associate, University of Colorado
Title: Complex Querying Across Multiple Data Sources Using a Common Biological Model
Abstract: A wealth of biomedical information currently exists spread across numerous independent data sources, and querying across those sources has historically been difficult. The Knowledge Base of Biomedicine (KaBOB) integrates information from 20 large, prominent data sources and formally represents it in OWL using a common biomedical model grounded in the OBOs (Open Biomedical Ontologies). This enables complex queries across those data sources to be written in biomedical terms, rather than against multiple source-specific data models and identifiers. We demonstrate how such a knowledge base can be interrogated in multiple ways to answer complex biomedical questions. Examples include looking for explanations of synthetic lethal drug-gene interactions in cells, or asking which drugs target which gene products that are located in the mitochondria and involved in oxidative phosphorylation. The answers to these questions do not exist in any single database. Additionally, KaBOB can answer these questions at multiple levels of granularity and abstraction, for example, querying about species-specific genes versus including information about homologous genes in other model organisms. Questions can be posed to KaBOB by querying the underlying RDF directly with SPARQL, or by deriving results via back-chaining through the data using Prolog. The modeling used by KaBOB enables formal reasoning methods to derive deductively entailed relationships from the explicitly asserted knowledge. In the near future, we hope also to explore abductive inference methods that hypothesize missing information which, if present, would help produce more coherent explanations.
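KaBOB itself is queried with SPARQL over RDF or through Prolog. The Python sketch below only illustrates the core idea behind a query such as the mitochondrial oxidative-phosphorylation example: a conjunction of triple patterns (what SPARQL calls a basic graph pattern) evaluated by joining on shared variables. The triples, predicate names, and identifiers are hypothetical placeholders, not KaBOB's OBO-grounded IRIs.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # An unbound variable takes t; a bound one must already equal t.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def query(triples: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns, joining on shared variables."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical toy triples, for illustration only.
TRIPLES = [
    ("drug_A", "targets", "protein_X"),
    ("drug_B", "targets", "protein_Y"),
    ("protein_X", "located_in", "mitochondrion"),
    ("protein_X", "participates_in", "oxidative_phosphorylation"),
    ("protein_Y", "located_in", "nucleus"),
]

results = query(TRIPLES, [
    ("?drug", "targets", "?gp"),
    ("?gp", "located_in", "mitochondrion"),
    ("?gp", "participates_in", "oxidative_phosphorylation"),
])
```

Because all sources share one model, a single pattern list spans what would otherwise be separate drug, localization, and pathway databases; this nested-loop join is the simplest possible evaluation strategy, not how a production triple store would execute it.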

Jill P. Mesirov, PhD
Associate Director and Chief Informatics Officer; Director, Computational Biology and Bioinformatics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
Title: Integrative Computational Approaches for Genomic Medicine
Abstract: The acceleration of data acquisition, and the corresponding availability of increasing amounts of both genetic and functional data, is changing the face of biomedical research. Computational approaches and new methods can take advantage of these data and bring the promise of improved understanding and treatment of disease. I will describe some approaches that leverage multiple data types and emphasize the use of more biologically interpretable models. In particular, I will describe our work integrating high-level clinical and genomic features to stratify pediatric brain tumor patients into groups at high and low risk of relapse after treatment. The approach is more accurate than previous models, highlights possible future drug targets, and is one of the few such predictors to have generalized to a completely independent patient cohort. In addition, I will present a recently developed method to shed light on the functional correlates of genetic variants. Finally, I will review the software through which we make our methods available to the research community.

Elmar A. Pruesse, PhD
Max Planck Institute for Marine Microbiology and University of Colorado Denver
Title: Tackling the rRNA Big-Data Problem with ARB and SILVA
Abstract: Since Fox and Woese pioneered the use of ribosomal RNA some four decades ago, rRNA-based methods have become essential for characterizing microbial communities. Accordingly, a vast body of more than 4 million rRNA sequences has accumulated. As evidenced by dedicated databases and tools such as RDP, Greengenes, mothur, and QIIME, specialized methods are required to analyze the current volume of reference and study data successfully. Here, we present recent improvements to the combination of resources, services, and tools offered by ARB (from Latin arbor, "tree") and SILVA (Latin silva, "forest") to facilitate rRNA-based studies. The SILVA database includes large- and small-subunit rRNA data from all three domains of life, data integrated from various sources, and an up-to-date curated taxonomy. The SILVA website (~400 users per day) offers services such as sequence classification with SINA and the TestPrime tool for validating primer coverage. ARB integrates a broad range of external tools, unique algorithms, and data management and visualization features into a single desktop application with a graphical interface. Version 6 brings enhancements to performance, stability, memory footprint, automation, and usability, as well as new and improved internal and external analysis methods. The new SILVAngs service brings the analysis of rRNA amplicon data from next-generation sequencing into a streamlined, intuitive web interface. Built on top of the SILVA database pipeline, it always uses the most current reference data and requires no installation. Results include commonly computed microbial ecology statistics, novel plots, and easy export of processed reads to ARB.
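TestPrime's own algorithm is not described in the abstract. As a rough illustration of what "primer coverage" measures, the hypothetical sketch below reports the fraction of reference sequences containing an exact match to a degenerate primer written with IUPAC ambiguity codes. Real coverage evaluation typically also considers mismatches, both strands, and primer pairs, all of which this sketch ignores.

```python
import re
from typing import List

# IUPAC nucleotide ambiguity codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_regex(primer: str):
    """Expand a degenerate primer into a regex of character classes."""
    return re.compile("".join(f"[{IUPAC[base]}]" for base in primer.upper()))


def primer_coverage(primer: str, sequences: List[str]) -> float:
    """Fraction of sequences with an exact (degenerate) match to the primer.

    Only the given strand is searched, and no mismatches are tolerated."""
    if not sequences:
        return 0.0
    pattern = primer_regex(primer)
    hits = sum(
        1 for seq in sequences
        if pattern.search(seq.upper().replace("U", "T"))  # treat rRNA U as T
    )
    return hits / len(sequences)
```

Expanding ambiguity codes into regex character classes keeps the matching logic in the standard `re` module; a tool that tolerates mismatches would need an approximate-matching step instead.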

Joanna Roder, PhD
Senior Director of Data Analytics R&D, Biodesix, Inc.
Title: Deep Learning, "Deep Data", and Molecular Diagnostics
Abstract: Recent developments in machine learning based on deep neural networks have led to large gains in performance on many “Big Data” problems, such as image recognition and natural language processing. Whereas “Big Data” problems typically consist of millions of instances with hundreds of attributes, data sets arising in personalized medicine usually have thousands of attributes for only a few hundred samples. Learning from data generated by multiplexed molecular probes, such as next-generation sequencing in genomic profiling or Deep MALDI in protein profiling, can therefore be thought of as a “Deep Data” problem. This presentation will describe how ideas from deep learning can be applied to the “Deep Data” sets arising from molecular profiling of patient samples. We will present the main principles and give examples from genomics and proteomics to show how these methods can be tailored to the development of clinically useful, multivariate molecular diagnostic tests.

Donna K. Slonim, PhD
Associate Professor of Computer Science; Associate Professor, Tufts University School of Medicine; Genetics Faculty, Sackler School of Graduate Biomedical Sciences; Tufts University, Medford, MA, USA
Title: Human Developmental Bioinformatics: A Case Study in Overcoming Data Limitations
Abstract: A growing awareness of developmental impacts on lifelong human health has increased interest in understanding human development at the molecular level. However, existing data collections characterizing gene function and disease relevance are typically imperfect, incomplete, and lacking in sufficient context, which has limited the translational application of genomic data to developmental disorders. In this talk, we will discuss these barriers and our experiences addressing them. Adding context-specific functional annotation has improved our ability to interpret developmental data sets. Pooling imperfect gene-disease data across related disease processes has helped us link developmental processes to health outcomes. A new anomaly-detection paradigm for analyzing expression data sets facilitates the interpretation of individual samples. Our recent results in developmental bioinformatics illustrate how domain expertise, contextual awareness, and scaling up can help us overcome the limits of the data.
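The abstract does not describe how its anomaly-detection paradigm works. As a generic illustration of interpreting a single sample against a reference collection, and explicitly not the speaker's method, the sketch below flags genes whose expression lies more than a chosen number of reference standard deviations from the reference mean. The function name, the data layout, and the default threshold of 3.0 are all assumptions.

```python
from statistics import fmean, stdev
from typing import Dict, List


def anomalous_genes(
    sample: Dict[str, float],
    reference: Dict[str, List[float]],
    threshold: float = 3.0,
) -> Dict[str, float]:
    """Return {gene: z-score} for genes whose value in `sample` lies more than
    `threshold` reference standard deviations from the reference mean."""
    flagged: Dict[str, float] = {}
    for gene, value in sample.items():
        ref_values = reference[gene]  # needs at least two reference values
        mu, sd = fmean(ref_values), stdev(ref_values)
        if sd == 0:
            continue  # no variation in the reference: z-score is undefined
        z = (value - mu) / sd
        if abs(z) > threshold:
            flagged[gene] = z
    return flagged
```

Scoring each sample against a reference panel, rather than comparing group means, is what lets a single individual's profile be interpreted on its own; a real analysis would also need normalization and multiple-testing control.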

TECHNICAL TALK
Raymond Tecotzky
Market Manager, Informatics Ecosystem, Illumina, Inc.
Title: Progress Report: From Sample to Answer - Sequencing in the Cloud Era with the Illumina BaseSpace® Bioinformatics Platform
Abstract: The Illumina BaseSpace Informatics Platform has seen many improvements and significant growth in usage over the past year. In this short talk, I will describe and demonstrate many of the new Sample to Answer applications available from Illumina, including the Apps on BaseSpace and BaseSpace Onsite that enable them. Many choices for bioinformatics analysis are available to the genomics community today, and we will show why we believe that BaseSpace and BaseSpace Onsite offer a better way to analyze genomic data and represent the new trend in how all genomic data will be stored, analyzed, and shared in the future.