Links within this page:
- Robyn L. Ball, PhD
- Tiffany J. Callahan, PhD
- Zhiyong Lu, PhD FACMI
- David Nicholson, PhD
- Blake Williams
- Zhongming Zhao, PhD
|ROBYN L. BALL, PhD
The Jackson Laboratory
Traversing the Mouse-human Interface: An Integrative Bioinformatics Approach to Identify Preclinical Models and Identify Genes and Variants with Shared Roles in Complex Disease
Complex diseases are characterized by heterogeneous genetic mechanisms and diverse biological responses to the environment, lifestyle, drugs, and therapeutics. Historically, therapeutics have been tested in standard strains of mice, or simplistic single-gene perturbation disease models, often leading to translational failures. An integrated bioinformatic approach to identify the backgrounds that are most likely to be susceptible to the disease and to respond to the treatment can lead to improved preclinical models for complex disease. Our approach takes advantage of extensive data on complex traits that have been generated through decades of investigation in mouse populations, combined with newly developed data resources and pipelines. GenomeMUSter, a comprehensive mouse genotype resource (~83M SNPs), the Mouse Phenome Database GWAS meta-analysis server, and the VariantGraph (~4B associations) enable the use of evidence obtained in model organism studies to prioritize and characterize human genetic variants and find mouse traits that converge with characteristics of human diseases. Collectively, these data and analytic services can be used to find genes and variants that are associated with human disease genes and their orthologs. The new Strain Recommender workflow uses genetic, molecular, and phenotypic signals of human disease to recommend mouse backgrounds likely to be susceptible (and resistant) to the disease and treatment, representing preclinical models that have construct-relevant similarity to various manifestations of disease in affected individuals. This work is supported by NIH U54OD030187, NIH DA028420 and by The Jackson Laboratory, The Cube Initiative Program Fund.
|TIFFANY J. CALLAHAN, PhD
Postdoctoral Research Fellow
Department of Biomedical Informatics
The Future of Translational Informatics: Leveraging Knowledge to Enable Causal Explanations at Scale
Rather than being limited by the amount of data available, researchers are now faced with the question of how to meaningfully extract and represent the knowledge these data contain. Knowledge graphs integrate disparate data, decipher complex processes, and have frequently been used to systematically interrogate complicated systems. Dr. Callahan will describe some of her recent research on leveraging knowledge graphs to enable semantic interoperability and improve the translational utility of deep computational phenotyping. In celebration of the 20th anniversary of the Rocky Bioinformatics Conference and the University of Colorado Anschutz Medical Campus’ Computational Bioscience Program, Dr. Callahan will also share some of her predictions for the next 20 years of computational biology and biomedical research.
|ZHIYONG LU, PhD FACMI
National Center for Biotechnology Information (NCBI)
PubTator: 10 Years of Growth and Innovation
The explosion of biomedical big data and information over the past decade has created new opportunities for discoveries to improve the treatment and prevention of human diseases. But this large body of knowledge, which mostly exists as free text in journal articles for humans to read, presents a grand new challenge: individual scientists around the world are increasingly overwhelmed by the sheer volume of research literature and are struggling to keep up to date and to make sense of this wealth of textual information. Our research aims to break down this barrier and to empower scientists towards accelerated knowledge discovery. In this talk, I will present our research on developing large-scale, machine-learning-based tools for better understanding scientific text in the biomedical literature. In particular, I will discuss our widely used PubTator tool for the semantic annotation of the biomedical literature, including its current status and future plans.
|DAVID NICHOLSON, PhD
Digital Science & Research Solutions Ltd.
Changing Word Meanings in Biomedical Literature Reveal Pandemics and New Technologies
While we often think of words as having a fixed meaning that we use to describe a changing world, words are also dynamic and changing. Scientific research can also be remarkably fast-moving, with new concepts or approaches rapidly gaining mind share. We examined scientific writing, both preprint and pre-publication peer-reviewed text, to identify terms that have changed and examine their use. One particular challenge that we faced was that the shift from closed to open access publishing meant that the size of available corpora changed by over an order of magnitude in the last two decades. We developed an approach to evaluate semantic shift by accounting for both intra- and inter-year variability using multiple integrated models. Using this strategy and examining year-by-year changes revealed thousands of change points in both corpora. We found change points for tokens including ‘cas9’, ‘pandemic’, and ‘sars’, among many others. The change points that were consistent between pre-publication peer-reviewed and preprinted text were largely related to the COVID-19 pandemic. We developed a web app for exploration (https://greenelab.github.io/word-lapse/) that enables users to investigate individual terms. To our knowledge, this analysis is the first to examine semantic shift in biomedical preprints and pre-publication peer-reviewed text, and it lays the foundation for future work to examine how terms acquire new meaning and the extent to which that process is encouraged or discouraged by peer review.
|BLAKE WILLIAMS
Bioinformatics Analyst III
Topic Modeling: Unsupervised Learning and Drivers of the Human Serum Proteome
The human serum proteome is a complex system with dynamic protein levels, protein-protein interactions, and converging biological pathways. With the SomaScan® Assay we are able to measure 7,000 proteins simultaneously, enabling analytic methods suitable for rich data sets. Here we present topic modeling, an unsupervised machine learning technique, as a way to describe SomaScan Assay data in a space of reduced and interpretable dimensions called topics. These topics pool information across all measured proteins, enabling pathway detection, assisting biomarker discovery, and clustering samples on interpretable drivers of the proteome. Topic models can support biomarker discovery for diseases such as non-alcoholic steatohepatitis, while suggesting related pathways and expanding the pool of potential biomarkers beyond those of univariate analyses. These results suggest that topic models of SomaScan Assay measurements may be used to assist biomarker discovery, investigate relationships between diseases and pathways, and identify proteomic subtypes of diseases.
|ZHONGMING ZHAO, PhD
Professor and Chair
School of Biomedical Informatics and School of Public Health
University of Texas Health Science Center at Houston
Deep Generative Neural Network for Accurate Drug Response Prediction
Drug response differs substantially among cancer patients due to inter- and intra-tumor heterogeneity. Transcriptome context, especially in the tumor microenvironment, has been shown to play a significant role in shaping the actual treatment outcome. In this work, we developed a deep variational autoencoder (VAE) model to compress thousands of genes into latent vectors in a low-dimensional space. We demonstrate that these encoded vectors could accurately impute cancer drug response, outperform standard signature-gene-based approaches, and appropriately control the overfitting problem. We applied rigorous quality assessment and validation, including assessing the impact of cell line lineage, cross-validation, cross-panel evaluation, and application in independent clinical data sets, to ensure the accuracy of the imputed drug response in both cell lines and cancer samples. Our novel measure, the expression-regulated component (EReX) of the observed drug response, achieved high correlation across panels. Using the well-trained models, we imputed drug response for The Cancer Genome Atlas (TCGA) data and investigated the features and signatures associated with the imputed drug response, including cell line origins, somatic mutations and tumor mutation burdens, tumor microenvironment, and confounding factors. Furthermore, we benchmarked our VAEN model and four other embedding-based methods for accurate and transferable prediction of drug response, and implemented the extensive evaluation results, using cross-panels, cross-datasets, and target genes, in a user-friendly online server, DrVAEN, which has broad use in cancer research, model evaluation, and drug development.