JOSEPH ALLISON, PhD
A Story of Proteomic Statistical Process Control at SomaLogic
The processes that support and run the SomaScan® Assay at SomaLogic are highly robust. Before the introduction of Umbrella, however, post-hoc tracking of these processes and their longitudinal stability was spread across the company's collective consciousness. Umbrella is a bespoke full-stack data science platform designed to coalesce that collective knowledge and report back to the company at large. These reports range from simple control charting to unsupervised ML pipelines that detect nuanced assay artifacts, identify potential causes, and document the impact on our SOMAmer® reagents’ signals. This system enables us to robustly distinguish between biological signal and assay artifact.
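The simplest kind of report mentioned here, a control chart, can be illustrated in a few lines of Python. This is a minimal sketch of a Shewhart-style individuals chart and is not Umbrella's implementation; the function names, the three-sigma rule, and the sample values are assumptions chosen only for illustration.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits from in-control baseline runs.

    Assumption: limits are center +/- k sample standard deviations.
    """
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center, center + k * spread


def out_of_control(values, limits):
    """Return the indices of values that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


if __name__ == "__main__":
    # Hypothetical control-sample signals from earlier runs.
    baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0]
    limits = control_limits(baseline)
    # New runs to check against the baseline limits.
    print(out_of_control([10.1, 9.7, 11.0, 10.3], limits))  # prints [2]
```

Estimating the limits from a separate baseline, rather than from the runs being checked, keeps a drifting run from widening its own limits.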
KEVIN BRETONNEL COHEN, PhD
Director, Biomedical Text Mining Group
Computational Bioscience Program
University of Colorado School of Medicine
Two Existential Threats to Biomedical Text Mining...and How to Address Them with Natural Language Processing
The reproducibility crisis calls into question some of the most fundamental use cases of biomedical natural language processing: if 65% of the scientific literature is questionable, what is the point of mining it? Meanwhile, computational research has mostly been immune to the crisis, but there is no a priori reason to expect that state of affairs to continue. This talk proposes that natural language processing itself can address this issue on both fronts--but how?
JUSTIN GUINNEY, PhD
JOYCE C. HO, PhD
Developing an Evidence Matching Framework Using Web-based Medical Literature
KIRK E. JORDAN, PhD
OLUWATOSIN OLUWADARE, PhD
Department of Computer Science and Bachelor of Innovation
College of Engineering and Applied Science
University of Colorado
3D Chromosome and Genome Structure Modeling
To improve the understanding of chromosome organization within a cell, chromosome conformation capture techniques such as 3C, 4C, 5C, and Hi-C have been developed. These technologies help to determine the spatial positioning and interaction of genes and chromosome regions within a genome. Using next-generation sequencing strategies such as high-throughput and parallel sequencing, Hi-C can profile read pair interactions on an "all-versus-all" basis; that is, it can profile interactions for all read pairs in an entire genome. The development of chromosome conformation capture techniques, particularly Hi-C, has substantially benefited the study of spatial proximity, interaction, and genome organization in a variety of cells. In recent years, numerous genome structure reconstruction algorithms have been developed to clarify the role of three-dimensional (3D) structure in the cell and to explain abnormalities observed when comparing diseased and normal cells. Three-dimensional inference involves reconstructing a genome’s 3D structure, or in some cases an ensemble of structures, from contact interaction frequencies represented in a two-dimensional matrix. To solve this 3D inference problem, we developed 3DMax, an optimization-based algorithm for chromosome and genome 3D structure prediction that performed better than any existing tool. 3DMax has been packaged as a software tool and is publicly available to the research community. 3DMax performs well with noisy data, and its performance is unaffected by changes in the normalization method, which is not the case for many other existing methods.
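The 3D inference problem described in this abstract can be made concrete with a small Python sketch. It is not the 3DMax algorithm: it assumes a simple inverse-power conversion from contact frequency to target distance (d = 1 / f^alpha) and minimizes a least-squares mismatch with plain gradient descent. The function names, the conversion rule, the step size, and the toy helix data are all illustrative assumptions.

```python
import math
import random


def contacts_to_distances(contacts, alpha=1.0):
    """Convert contact frequencies to target distances (assumed d = 1 / f**alpha).

    Pairs with no observed contact, and the diagonal, get None and are skipped.
    """
    n = len(contacts)
    return [[1.0 / contacts[i][j] ** alpha if i != j and contacts[i][j] > 0 else None
             for j in range(n)] for i in range(n)]


def stress(coords, targets):
    """Sum of squared differences between model distances and target distances."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if targets[i][j] is not None:
                total += (math.dist(coords[i], coords[j]) - targets[i][j]) ** 2
    return total


def infer_structure(contacts, alpha=1.0, steps=500, lr=0.01, seed=0):
    """Gradient descent on the stress, starting from random 3D coordinates.

    Returns the final coordinates and the stress recorded after every step.
    """
    rng = random.Random(seed)
    targets = contacts_to_distances(contacts, alpha)
    n = len(contacts)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    trace = [stress(coords, targets)]
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                t = targets[i][j]
                if t is None:
                    continue
                d = math.dist(coords[i], coords[j]) or 1e-9  # avoid dividing by zero
                scale = 2.0 * (d - t) / d  # derivative of (d - t)**2 along x_i - x_j
                for k in range(3):
                    g = scale * (coords[i][k] - coords[j][k])
                    grads[i][k] += g
                    grads[j][k] -= g
        for i in range(n):
            for k in range(3):
                coords[i][k] -= lr * grads[i][k]
        trace.append(stress(coords, targets))
    return coords, trace


if __name__ == "__main__":
    # Toy data: contact frequencies simulated from a short helix (hypothetical).
    helix = [[math.cos(0.8 * i), math.sin(0.8 * i), 0.3 * i] for i in range(8)]
    contacts = [[0.0 if i == j else 1.0 / math.dist(helix[i], helix[j])
                 for j in range(8)] for i in range(8)]
    _, trace = infer_structure(contacts)
    print(f"stress before: {trace[0]:.3f}, after: {trace[-1]:.3f}")
```

A reconstruction of this kind is determined only up to rotation, translation, and reflection, so results are usually compared through distances rather than raw coordinates.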
HEINRICH RÖDER, PhD
Therapeutics for the treatment of cancer patients have been transformed with the introduction of immunotherapies, starting with checkpoint inhibition. Instead of targeting the tumor itself, the mechanism of action for immunotherapies relies on reactivation of the immune system such that the host can re-engage the cancer using the complex system evolutionarily developed to heal human disease. In oncology, clinical trials have proven that immunotherapies are effective at reducing tumor burden and extending survival in cancer patients across many indications. However, not all patients benefit from all immunotherapies. Specifically, there is a subgroup of patients whose lack of response may be attributed to a compromised immune system, a condition referred to as primary immunotherapy resistance (PIR). A test identifying patients with PIR, prior to treatment with specific immunotherapies, would be useful for guiding therapeutic decision making.
Biodesix uses a hypothesis-free approach to build clinically relevant tests. The approach allows the creation of multivariate classifiers, related to deep learning, that reflect the complexity of biological interactions without any bias from expectations about their mechanisms. Mass spectral data collected from human serum samples are analyzed by the Diagnostic Cortex® platform, a robust machine-learning-based data analytics platform, to design classifiers with clinical relevance. Using this approach, we have developed multiple independently validated tests to identify patients with melanoma and lung cancer who have particularly poor outcomes on anti-PD1 immunotherapy and therefore may be unsuitable candidates for treatment with checkpoint inhibition.
These tests stratify patients into different immunological phenotypes with different outcomes on immunotherapies. We applied ideas similar to GSEA (Gene Set Enrichment Analysis) to mass spectral data (PSEA: Protein Set Enrichment Analysis) to gain biological insight into the processes detectable in the circulating proteome related to these phenotypes. We found that host immunological functions, such as acute phase response, wound healing, and the complement system, are related to test classification labels, indicating that treatment success does not depend solely on a single molecule, protein, or signaling pathway. Our systems biology approach combining proteomics and machine learning methods is hypothesis-generating and requires further external validation; however, our findings have been supported by independent research groups using orthogonal approaches.
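For readers unfamiliar with set enrichment, the core idea behind GSEA-style methods can be sketched as a running-sum statistic over a ranked feature list. The Python below shows the unweighted (Kolmogorov-Smirnov-like) form only; it is an illustrative assumption and not the PSEA procedure used in this work. The function name and the feature labels are hypothetical.

```python
def enrichment_score(ranked_features, feature_set):
    """Unweighted running-sum enrichment score for a set within a ranked list.

    Walk down the ranking, stepping up for set members and down for
    non-members, and return the running-sum value with the largest magnitude.
    """
    members = set(feature_set)
    hits = sum(1 for f in ranked_features if f in members)
    misses = len(ranked_features) - hits
    if hits == 0 or misses == 0:
        raise ValueError("the set must contain some, but not all, ranked features")
    running = 0.0
    best = 0.0
    for f in ranked_features:
        running += 1.0 / hits if f in members else -1.0 / misses
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    # Features ranked by association with a classification label (made-up names).
    ranking = ["A", "B", "C", "D", "E", "F"]
    print(enrichment_score(ranking, {"A", "B"}))  # set concentrated at the top
    print(enrichment_score(ranking, {"E", "F"}))  # set concentrated at the bottom
```

In practice the score is compared against a null distribution, typically obtained by permuting labels, before any set is called enriched.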
NIMISHA SCHNEIDER, PhD
QuartzBio, part of Precision for Medicine
Coupling Data-Driven and Mechanistic Modeling Approaches Through the Application of a Scalable, Knowledge-Driven Framework and High-Throughput Public Omics Data Sources
Advances in high-throughput measurement technologies (-omics data) have made it possible, and increasingly affordable, to generate high-complexity, high-volume data for medical research, and these data are increasingly available to researchers through public sources. Rapidly mining these data for useful mechanistic insights can be a challenge, given their complexity; for example, The Cancer Genome Atlas (TCGA) contains multi-omic and clinical profiles of 11K+ patients across 33 cancer subtypes, over 500K files and 1 billion measurements. This talk will outline (1) how we are building the infrastructure and methods that couple data-driven approaches with a knowledgebase to enable mechanistic modeling of public data sources, (2) the challenges we face when modeling these kinds of data, e.g., overfitting of models due to large feature sets on small sample sizes, and (3) a case study on one approach where we mined TCGA for mechanistic insights.
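The overfitting risk named in point (2) can be illustrated with a small simulation, given here as a hedged Python sketch rather than anything drawn from the talk. With few samples and many features, some purely random feature will correlate strongly with the outcome by chance. The function names, sample sizes, and Gaussian noise model are assumptions made for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_chance_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between a random outcome and unrelated random features."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


if __name__ == "__main__":
    # Same 10 samples; only the number of noise features screened changes.
    for p in (10, 100, 1000):
        print(p, round(best_chance_correlation(10, p), 3))
```

Because the seed is fixed, each larger run screens a superset of the features in the smaller runs, so the best chance correlation can only stay the same or grow as features are added.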