Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


ROCKY 2021 | Dec 2 – 4, 2021 | Aspen/Snowmass, CO | Keynote Speakers


Links within this page:
Melissa Haendel
Michael A. Hinterberg
Diane M. Korngiebel
Wouter Meuleman
Thomas Schaffter
David Van Valen

Melissa Haendel, PhD
University of Colorado
United States

Biography (.pdf)

Who Has Long-COVID? A Small and Big Data Approach

Post-acute sequelae of SARS-CoV-2 infection, or long-COVID, have severely impacted recovery from the pandemic for patients and society alike. This new disease is characterized by evolving, heterogeneous symptoms, which not only makes it a challenge to derive an unambiguous long-COVID definition, but also hampers clinicians’ ability to offer effective and timely treatment. Clinicians and patients report distinct, albeit overlapping, spectra of symptoms, making long-COVID classification difficult for diagnosis and care management. The clinical view is therefore incomplete. We have used the Human Phenotype Ontology to classify symptoms from patients and clinicians, which can provide subclasses of long-COVID and the foundation for improved patient diagnosis and care management. Electronic health records (EHRs) could also be a good source of data for rapidly identifying patients with long-COVID. However, the aforementioned overlapping and incomplete spectra of symptoms make harvesting the correct data from heterogeneous EHR databases a significant challenge. Using the National COVID Cohort Collaborative’s (N3C) EHR repository, we developed XGBoost machine learning models to identify potential long-COVID patients. We examined demographics, healthcare utilization, diagnoses, and medications for 97,995 adult COVID-19 patients. Our models identified potential long-COVID patients with high accuracy; important features included rate of healthcare utilization, patient age, dyspnea, and other diagnoses and medications. Combinatorial approaches such as those presented here are especially useful in the face of a new disease with different patient trajectories and few treatment options, and can provide the basis for research studies and treatment strategies.
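As a rough illustration of the kind of EHR feature engineering the abstract describes (utilization rate, age, dyspnea, medications), the sketch below builds a feature table from synthetic patient records. The field names and records are hypothetical, not the N3C schema, and this is not the actual XGBoost pipeline from the talk.

```python
# Hypothetical sketch of the feature-engineering step for an EHR-based
# long-COVID classifier. Record fields are invented for illustration.

def featurize(patient):
    """Turn one synthetic patient record into a numeric feature dict."""
    visits = patient["visits_post_covid"]
    months = max(patient["months_followup"], 1)
    return {
        "utilization_rate": visits / months,  # visits per month of follow-up
        "age": patient["age"],
        "dyspnea": 1 if "dyspnea" in patient["diagnoses"] else 0,
        "n_medications": len(patient["medications"]),
    }

patients = [
    {"visits_post_covid": 12, "months_followup": 6, "age": 54,
     "diagnoses": {"dyspnea", "fatigue"}, "medications": ["albuterol"]},
    {"visits_post_covid": 1, "months_followup": 6, "age": 30,
     "diagnoses": set(), "medications": []},
]

X = [featurize(p) for p in patients]
print(X[0]["utilization_rate"])  # 2.0
```

A table like `X`, paired with long-COVID labels, is the form of input a gradient-boosted model such as XGBoost would then be trained on.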
Michael A. Hinterberg, PhD
Senior Bioinformatics Scientist
SomaLogic, Inc.
United States

LinkedIn Profile

Dramatically Elevated Proteomic Risk Profiles Predict COVID-19 Severity

The effects of COVID-19 are strongly linked to cardiovascular disease and its underlying mechanisms, and have additional multi-organ effects. Despite evidence that people with existing cardiovascular disease and risk factors are generally at higher risk for severe COVID-19, traditional clinical risk factors and measurements are often insufficient in the context of acute COVID-19. Using SomaScan® to measure 7,000 proteins simultaneously, along with developed and validated tests for cardiovascular risk, we show that substantial changes in circulating plasma proteins, observed within hours or days, are associated with COVID-19 severity and mortality. Elevated predicted cardiovascular risk is predictive of COVID-19 severity and superior to established cardiovascular clinical biomarkers. Additionally, predictive proteomic models for kidney and liver health, and models for cardiometabolic fitness, are significantly associated with COVID-19 outcomes. These results provide unique and important insight into the cardiovascular disorders and multi-organ dysfunction associated with COVID-19, and broaden the applicability of proteomics to novel disease research.
Diane M. Korngiebel, DPhil
Bioethics Team, Google
Affiliate Associate Professor, Department of Biomedical Informatics & Medical Education and Department of Bioethics & Humanities, University of Washington School of Medicine
United States

Biography (.pdf)

Humans “in-the-loop”: Practical Recommendations for Enhancing the Trustworthiness of AI Development for Healthcare

Rather than relying mainly on human-in-the-loop oversight of AI, Dr. Korngiebel argues that trustworthy AI starts with trustworthy AI development. She describes some of her recent NIH-funded research on creating an ethics framework for AI development and reports preliminary findings from the project’s first aim: mapping decision points in the AI development process. Then, drawing on the work of that project and her experience as an applied ethicist embedded in bioinformatics, she offers practical recommendations for ways to plan for “looping in” humans during AI development.
Wouter Meuleman, PhD
Principal Investigator
Altius Institute for Biomedical Sciences
United States

Biography (web)

Mapping and Navigating the Human Regulatory Genome

This year marks the 20th anniversary of the sequencing of the human genome in 2001. Since then, many large-scale data generation and analysis efforts have built upon this work by producing genome-wide maps and annotations. Most recently, we have systematically delineated and annotated accessible DNA elements in the human genome by integrating more than 700 genome-wide maps of chromatin accessibility, resulting in a single high-definition annotation. Additionally, we have developed simple information-theoretic metrics (epilogos) to integrate chromatin state data across nearly 1,000 cell types and states. Despite these and many other efforts, systems to efficiently navigate genomic maps at scale have remained lacking. At the same time, consumer-facing web businesses such as Zillow, Spotify, and Amazon have long understood the value of learning from patterns collected across large corpora of data to better serve customers, maximize investment returns, and prioritize future directions. This gap between current practice in genomics and its ultimate potential forms the overarching motivation for our work. We are coupling massive amounts of genomics data to powerful recommendation engines and related machine learning approaches to generate insights not otherwise obtainable. These ideas represent an essential and inevitable transition towards “augmented genomics”, a new field in which the work of genome scientists is supplemented (not replaced!) by data-driven machine intelligence.
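To make the idea of an information-theoretic chromatin-state metric concrete, here is a toy surprisal score in the spirit of epilogos: for one genomic position, it compares the distribution of state labels observed across cell types against a genome-wide background distribution. The state labels and frequencies are invented for illustration; this is not the published epilogos implementation.

```python
# Toy epilogos-style score: a KL-divergence-like comparison of the
# chromatin-state distribution at one position against a background.
import math
from collections import Counter

def state_surprisal(states_at_position, background):
    """Sum over observed states of p * log2(p / q)."""
    counts = Counter(states_at_position)
    n = len(states_at_position)
    score = 0.0
    for state, c in counts.items():
        p = c / n
        q = background[state]
        score += p * math.log2(p / q)
    return score

# Hypothetical genome-wide background frequencies for three states.
background = {"enhancer": 0.05, "promoter": 0.02, "quiescent": 0.93}

# A position where most cell types agree on "enhancer": high score.
print(state_surprisal(["enhancer"] * 9 + ["quiescent"], background))
# A position that matches the background exactly: score of zero.
print(state_surprisal(["quiescent"] * 93 + ["enhancer"] * 5 + ["promoter"] * 2, background))
```

Positions whose state usage diverges from the background score highly, which is the intuition behind surfacing interesting regulatory regions from maps spanning hundreds of cell types.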
Thomas Schaffter, PhD

Lead of Research & Benchmarking Technology Workstream
Senior Bioinformatics & Full Stack Engineer
Sage Bionetworks
United States

LinkedIn Profile

NLP Sandbox: Overcoming Data Access Barriers to Reliably Assess the Performance of NLP Tools

Critical patient information derived from academic research, health care, and clinical trials is off limits for traditional data-to-model benchmarking of NLP tools (whereby data is transferred or downloaded into a new environment to be colocated with the executable model). Existing barriers include restricted access to prohibitively large or sensitive data. In addition to data access constraints, we also lack effective frameworks for assessing the performance and generalizability of NLP tools.

The NLP Sandbox adopts a model-to-data approach to enable NLP developers to assess the performance of their tools on public and private datasets. When a developer submits a tool, partner organizations (e.g., hospitals, universities) automatically provision the tool, execute it, and evaluate its performance against their private data in a secure environment. Upon successful completion, the partner organization reports the tool’s performance, and this report is automatically published in the NLP Sandbox leaderboards.

The first series of NLP tasks that the NLP Sandbox supports is the annotation of Protected Health Information (PHI) in clinical notes. These tasks have been identified through our collaboration with the National Center for Data to Health (CD2H). Submitted tools are currently evaluated on the dataset of the 2014 i2b2 NLP De-identification Challenge and private data from MCW. Additional data sites are currently being onboarded (Mayo Clinic, UW).
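One common way a PHI-annotation tool can be scored in such a benchmark is span-level precision/recall against gold annotations. The sketch below uses synthetic character-offset spans; it is a generic metric for illustration, not the NLP Sandbox’s actual scoring code.

```python
# Score a hypothetical PHI-annotation tool against gold-standard spans.
# Spans are (start, end) character offsets in a synthetic clinical note.

def prf(gold_spans, pred_spans):
    """Span-level precision, recall, F1 on exact (start, end) matches."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [(0, 10), (25, 35), (50, 58)]   # e.g. a name, a date, an MRN
pred = [(0, 10), (25, 35), (60, 66)]   # tool found 2 of 3, plus 1 false hit

p, r, f = prf(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 0.67 0.67
```

In a model-to-data setting, a partner site runs this kind of evaluation behind its firewall and only the aggregate scores leave the secure environment.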
David Van Valen, PhD

Assistant Professor
Division of Biology and Biological Engineering
California Institute of Technology
United States

Biosketch (pdf)

Single-cell Biology in a Software 2.0 World

Multiplexed imaging methods can measure the expression of dozens of proteins while preserving spatial information. While these methods open an exciting new window into the biology of human tissues, interpreting the images they generate with single-cell resolution remains a significant challenge. Current approaches to this problem in tissues rely on identifying cell nuclei, which results in inaccurate estimates of cellular phenotype and morphology. In this work, we overcome this limitation by combining multiplexed imaging’s ability to image nuclear and membrane markers with large-scale data annotation and deep learning. We describe the construction of TissueNet, an image dataset containing more than a million paired whole-cell and nuclear annotations across eight tissue types and five imaging platforms. We also present Mesmer, a single model trained on this dataset that can perform nuclear and whole-cell segmentation with human-level accuracy across tissue types and imaging platforms. We show that Mesmer accurately measures cell morphology in tissues, opening up a new observable for quantifying cellular phenotypes and harmonizing disparate datasets. We make this model available to users of all backgrounds with both cloud-native and on-premise software. Finally, we describe ongoing work to develop a similar resource and models for dynamic live-cell imaging data.
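Claims like "human-level accuracy" for segmentation are typically quantified by comparing predicted masks to human annotations. The sketch below shows one standard metric, intersection-over-union (IoU), on tiny made-up binary masks; it is a generic measure, not code from the TissueNet/Mesmer pipeline.

```python
# Intersection-over-union between a predicted cell mask and a human
# annotation, both given as binary grids (lists of rows of 0/1).

def iou(mask_a, mask_b):
    """IoU = |A ∩ B| / |A ∪ B| over the foreground pixels."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += a & b
            union += a | b
    return inter / union if union else 0.0

truth = [[0, 1, 1],
         [0, 1, 1],
         [0, 0, 0]]
pred  = [[0, 1, 1],
         [0, 1, 0],
         [0, 0, 0]]   # prediction misses one pixel of the cell

print(iou(truth, pred))  # 0.75
```

Averaging such per-cell scores across annotators gives a baseline for what "human-level" agreement looks like on the same images.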
