Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

RSG with DREAM 2020 | November 16-19, 2020 | Conference Programme

RECOMB/ISCB Conference on Regulatory & Systems Genomics with DREAM Challenges 2020 Poster Information

Poster Presentations

DREAM

DREAM - 1: A Registry of Open Community Challenges (ROCC) to Increase Ease of Discovery and Challenge Participation

Keywords: DREAM Challenges Benchmarking Open Science
  • Verena Chung, Sage Bionetworks, United States
  • Michael Mason, Sage Bionetworks, United States
  • Thomas Yu, Sage Bionetworks, United States
  • Thomas Schaffter, Sage Bionetworks, United States
  • Milen Nikolov, Sage Bionetworks, United States
  • Gustavo Stolovitzky, IBM, United States
  • James Eddy, Sage Bionetworks, United States
  • Justin Guinney, Sage Bionetworks, United States

Short Abstract: Biomedical and benchmarking challenges have grown in number and popularity within the open community. However, there is currently no straightforward way to search for and query information about active and upcoming challenges in one place. Instead, one must sleuth through many avenues to find a challenge that is of interest and/or fits one's expertise, which may unfortunately result in missing key dates for participation. The goal of the Registry of Open Community Challenges (ROCC) is to increase the ease of discovery for these challenges by creating a portal that will standardize and highlight key features of a challenge. The captured metadata are based on a schema known as the “minimal information about a challenge” (MIAC); examples include challenge questions, available data, timelines and rounds, funders and organizers, domains, scoring metrics, and type of submission accepted (traditional or model-to-data). Participants will be able to use ROCC to search for challenges in one of two ways: navigate through a web-based platform or call a set of RESTful APIs. ROCC can also be used by challenge organizers to upload information about upcoming challenges or to update details on an active challenge. The development of a prototype of ROCC is currently underway and will initially focus on 38 DREAM challenges (from 2013 to mid-2020) and 152 non-DREAM challenges, including CAGI, CAMDA, BioCreAtIvE, CASP, and more. A long-term goal of ROCC is to expand to more non-DREAM challenges and to create a higher standard for how open-community challenges are annotated, which could then lead to higher discoverability and increased participation.
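
To make the idea of a MIAC-style record concrete, here is a minimal Python sketch of what one registry entry and a simple search over entries might look like. The abstract does not specify the MIAC schema or the ROCC API, so every field name, the `search` helper, and the sample records are hypothetical and only mirror the metadata categories listed above.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ChallengeRecord:
    """Hypothetical record loosely modeled on the metadata categories in the abstract."""
    name: str
    question: str
    domains: list
    start: date
    end: date
    organizers: list = field(default_factory=list)
    scoring_metrics: list = field(default_factory=list)
    submission_type: str = "traditional"  # or "model-to-data"


def search(records, domain=None, active_on=None):
    """Return records matching an optional domain and an optional 'active on this day' filter."""
    hits = []
    for record in records:
        if domain is not None and domain not in record.domains:
            continue
        if active_on is not None and not (record.start <= active_on <= record.end):
            continue
        hits.append(record)
    return hits


# Made-up placeholder entries, not real challenges.
records = [
    ChallengeRecord("Example Challenge A", "Predict outcome X", ["oncology"],
                    date(2020, 1, 1), date(2020, 6, 30)),
    ChallengeRecord("Example Challenge B", "Annotate dataset Y", ["metadata"],
                    date(2020, 5, 1), date(2020, 12, 31),
                    submission_type="model-to-data"),
]

active_in_may = search(records, active_on=date(2020, 5, 15))
```

A real registry would expose the same kind of filters through its web platform and REST endpoints; the in-memory list here only stands in for that backend.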

DREAM - 10: An Iterative Strategy Optimizing CDE Recommendations from Real-World Data

Keywords: metadata annotation semantics biomedical biomedical metadata common data element natural language processing machine learning python levenshtein distance
  • Attila L. Egyedi, Stanford Center for Biomedical Informatics Research / Stanford University, United States
  • Marcos Martínez-Romero, Stanford Center for Biomedical Informatics Research / Stanford University, United States
  • Martin O'Connor, Stanford Center for Biomedical Informatics Research / Stanford University, United States
  • John Graybeal, Stanford Center for Biomedical Informatics Research / Stanford University, United States
  • Mark Musen, Stanford Center for Biomedical Informatics Research / Stanford University, United States

Short Abstract: Annotating medical metadata, and metadata in general, is a tedious and error-prone task for humans. Many machine-assisted methods can achieve the same goals, from simple rule-based systems to algorithms applying the latest and most sophisticated findings in the ML world. During the Metadata Annotation DREAM Challenge, teams attempted to mimic the ability of individual curators to choose common data elements (standardized and curated definitions of fields that can be used on clinical forms) that are appropriate for a given data set containing given header labels and data values. The CEDAR Team developed an algorithm that aims for good results under the provided scoring algorithm while remaining relatively simple and quick to run. Our team chose this path so that our algorithm can be easily deployed in real-life systems, operated in real time, used to support human selection, and understood and maintained by its adopters. In this talk, we will describe our approach, its strengths and weaknesses, and why we felt it was a good solution for likely real-world applications involving these types of selection problems.

DREAM - 11: Crowdsourcing assessment of maternal blood multi-omics for predicting gestational age and preterm birth

Keywords: transcriptomics proteomics spontaneous preterm birth gestational age machine learning
  • Adi L. Tarca, Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, Michigan, United States
  • Bálint Ármin Pataki, Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Budapest, Hungary
  • Roberto Romero, Perinatology Research Branch, NICHD/NIH, United States
  • Marina Sirota, Department of Pediatrics, University of California, San Francisco, United States
  • Yuanfang Guan, Department of Computational Medicine and Bioinformatics, University of Michigan, United States
  • Rintu Kutum, Informatics and Big Data Unit, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
  • Nardhy Gomez-Lopez, Department of Biochemistry, Microbiology and Immunology, Wayne State University School of Medicine, Detroit, Michigan, United States
  • Bogdan Done, Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, Michigan, United States
  • Gaurav Bhatti, Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, Michigan, United States
  • Thomas Yu, Sage Bionetworks, Seattle, WA, United States
  • Gaia Andreoletti, Department of Pediatrics, University of California, San Francisco, California, United States
  • Tinnakorn Chaiworapongsa, Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, Michigan, United States
  • Sonia S. Hassan, Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, Michigan, United States
  • Chaur-Dong Hsu, Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, Michigan, United States
  • Nima Aghaeepour, Department of Biomedical Data Sciences, Stanford University School of Medicine, Stanford, California, United States
  • Gustavo Stolovitzky, IBM T.J. Watson Research Center, Yorktown Heights, New York, United States
  • Istvan Csabai, Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Budapest, Hungary
  • James C. Costello, Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States

Short Abstract: Identification of pregnancies at risk of preterm birth (PTB), the leading cause of newborn deaths, remains challenging given the syndromic nature of the disease. We report a longitudinal multi-omics study coupled with a DREAM challenge to develop predictive models of PTB. We found that whole blood gene expression predicts ultrasound-based gestational ages in normal and complicated pregnancies (r=0.83), as well as the delivery date in normal pregnancies (r=0.86), with an accuracy comparable to ultrasound. However, unlike ultrasound, transcriptomic data collected at <37 weeks of gestation predicted the delivery date of one third of spontaneous preterm birth (sPTB) cases within 2 weeks of the actual date. Based on samples collected before 33 weeks in asymptomatic women, we found expression changes preceding preterm prelabor rupture of the membranes that were consistent across time points and cohorts, involving, among others, leukocyte-mediated immunity. Plasma proteomic random forests predicted sPTB with higher accuracy and earlier in pregnancy than whole blood transcriptomic models (e.g., AUROC=0.76 vs. AUROC=0.6 at 27-33 weeks of gestation).

DREAM - 12: Metadata Automation: A TF-IDF and Nearest Neighbors Approach

Keywords: Metadata Automation DREAM Challenge Term Frequency – Inverse Document Frequency (TF-IDF) Nearest neighbors Scikit-Learn FuzzyWuzzy Levenshtein distance Fuzzy matching
  • Emily Hartley, Critical Path Institute, United States
  • Roopal Bhatnagar, Critical Path Institute, United States
  • Kurt Michels, Critical Path Institute, United States

Short Abstract: The goal of the Metadata Automation DREAM Challenge was to develop a tool to automate the annotation of metadata fields and values in structured biomedical data files with the best candidate Common Data Element (CDE) matches from the Cancer Data Standards Registry and Repository (caDSR). We chose to implement our model in Python 3.6 and approached this challenge from the perspective that it was essentially a fuzzy matching problem. Our approach utilizes Scikit-Learn’s TfidfVectorizer class along with a custom n-gram function to vectorize the data. These term frequency - inverse document frequency (TF-IDF) vectors are passed to Scikit-Learn’s Nearest Neighbor class which returns the k nearest CDE neighbors and their associated distance scores for each column header in the biomedical data file. For the returned CDEs with enumerated values, the Levenshtein distances from the observed values in the data to the CDE’s permissible values are computed using Python’s FuzzyWuzzy library. We then use a decision tree approach based on the TF-IDF distance scores and the observed values’ average Levenshtein distance scores to select and rank the top three CDE matches from the set of nearest neighbors for each column header. In this final ranking step, we apply cutoff values to the distance scores to determine when to include ‘NOMATCH’ as one of the three results. Throughout the challenge we experimented with many aspects of the algorithm including modifying the n-gram function, the selection of caDSR fields to include in the TF-IDF vectorization and applying different cutoff values. The final version of our model was arrived at by selecting the features and parameters that maximized the overall score across all the provided test datasets.
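
The matching steps described above can be sketched in a few lines of Python. This is not the team's code: it replaces Scikit-Learn's TfidfVectorizer and nearest-neighbor search, and FuzzyWuzzy's Levenshtein scoring, with small standard-library stand-ins so the example is self-contained. The character trigram choice, the cosine distance, and the sample CDE names are assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split a lowercased, space-padded string into overlapping character n-grams."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def normalize(vec):
    """Scale a sparse vector (dict) to unit length."""
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


def fit_tfidf(docs, n=3):
    """Return (idf weights, unit-length TF-IDF vectors) for a list of strings."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    doc_freq = Counter(g for c in counts for g in c)
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1
    idf = {g: math.log((1 + len(docs)) / (1 + k)) + 1 for g, k in doc_freq.items()}
    vectors = [normalize({g: tf * idf[g] for g, tf in c.items()}) for c in counts]
    return idf, vectors


def nearest(query, docs, idf, vectors, k=3, n=3):
    """Return the k (cosine distance, document) pairs closest to the query string."""
    counts = Counter(char_ngrams(query, n))
    q = normalize({g: tf * idf[g] for g, tf in counts.items() if g in idf})
    scored = []
    for doc, vec in zip(docs, vectors):
        cosine = sum(w * vec.get(g, 0.0) for g, w in q.items())
        scored.append((1.0 - cosine, doc))
    return sorted(scored)[:k]


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Hypothetical CDE names; a real run would vectorize fields drawn from caDSR.
cde_names = ["Patient Age", "Patient Gender", "Tumor Stage"]
idf, vectors = fit_tfidf(cde_names)
candidates = nearest("patient_age", cde_names, idf, vectors, k=2)
```

In the approach described above, the distances from the nearest-neighbor step and the average edit distance between observed values and a CDE's permissible values feed a decision tree with cutoffs that decides the final top-three ranking and when to emit 'NOMATCH'; that last step is omitted here because the abstract does not give the cutoffs.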

DREAM - 13: Rheumatoid arthritis X-ray evaluation with deep learning

Keywords: biomedical imaging deep learning for healthcare computer-aided diagnostics
  • Alex Olar, Eötvös Loránd University, Hungary

Short Abstract: In this work, we created a method for automated joint scoring in which we confidently detect joints in the hands and feet and score them with an ensemble model, while taking into account joint damage across all limbs with a random forest model. We arrived at this design after experimenting with many unsuccessful attempts to improve the score. As far as we are aware, there is no similar work in the literature, and we did not use any additional datasets to achieve these results.

DREAM - 14: CTD-squared Pancancer Drug Activity DREAM Challenge

Keywords: polypharmacology machine learning drug discovery deep learning kinases cancer oncology drug sensitivity gene expression pharmacogenomics
  • Robert Allaway, Sage Bionetworks, United States
  • Eugene Douglass, University of Georgia, United States
  • Bence Szalai, Semmelweis University, Hungary
  • Verena Chung, Sage Bionetworks, United States
  • Pamela Birriel, NIH/NCI, United States
  • Mahalaxmi Aburi, Columbia University, United States
  • Iuliana Caescu, Columbia University, United States
  • Julie Bletz, Sage Bionetworks, United States
  • Gustavo Stolovitzky, IBM, United States
  • Julio Saez-Rodriguez, Heidelberg University, Germany
  • Daniela Gerhard, NIH/NCI, United States
  • Justin Guinney, Sage Bionetworks, United States
  • Andrea Califano, Columbia University, United States

Short Abstract: The Columbia Cancer Target Discovery and Development (CTD2) Center has developed Pancancer Analysis of Chemical Entity Activity (PanACEA), a database of dose-response curves and drug-perturbed RNAseq profiles for 400 clinical oncology drugs. We used this resource to host the CTD2 Pancancer Drug Activity DREAM Challenge, a crowdsourced competition to develop and benchmark computational models for the prediction of drug polypharmacology using drug sensitivity and gene expression information. We provided dose-response and drug-perturbed RNAseq data on 32 kinase inhibitors and asked the community to use this data to predict target binding across 255 kinases. Top performing teams employed two distinct strategies: simple similarity analysis using many highly curated training datasets, or more advanced deep-learning trained on a single large data set. Detailed analyses of the best performing methods provide (1) a framework for using pharmacogenomic data to predict drug-target interactions, (2) reconciliation of different “drug-target” gold-standard definitions, and (3) insights into therapeutically actionable associations between kinase signalling and transcriptional networks.

DREAM - 15: Predicting drug targets by integration of drug sensitivity and gene signature data - the NETPHAR strategy

Keywords: drug sensitivity drug target perturbation signature pharmacological database
  • Wenyu Wang, University of Helsinki, Finland
  • Shuyu Zheng, University of Helsinki, Finland
  • Alberto Pessia, University, Finland
  • Mohieddin Jafari, University of Helsinki, Finland
  • Ziaurrehman Rehman, University of Helsinki, Finland
  • Jing Tang, University of Helsinki, Finland

Short Abstract: Misidentifying a drug’s mechanism of action is a common problem in drug discovery. Despite recent efforts on profiling of transcriptomics changes after drug treatment, it remains unknown whether they can facilitate the prediction of drug targets. The CTD2 Pancancer Drug Activity DREAM Challenge provided dose-response and drug-gene signatures on 32 kinase inhibitors and asked the participants to predict binding targets of these anonymous drugs. We have collected: 1) drug sensitivity data; 2) gene signature data and 3) drug-target interaction data. We utilized the DrugComb (http://drugcomb.fimm.fi), which is a crowd-sourcing database of comprehensive drug sensitivity data for combinatorial and monotherapy screenings. Furthermore, we determined the robust drug sensitivity metrics including IC20 and RI (relative inhibition) score, which is based on the area under the log10-scaled dose-response curves. Drug target interactions are derived from DrugTargetCommons (http://drugtargetcommons.fimm.fi/), which is a crowd-sourcing database to manually curate the drug-target bioactivity values from the literature. The final training dataset includes 116 drugs that have the cell line sensitivity features (d = 2*11), consensus gene expression signatures (d = 973, provided by organizers) as well as drug target profiles (d = 1259). To determine the best machine learning models to predict the drug targets, we considered two classes of methods, including weighted averaging and regression. For weighted averaging, the prediction was made based on the multiplication of the Pearson correlation matrix and the drug-target interaction matrix; while for regression, we considered standard machine learning algorithms including ElasticNet, RandomForest and GBM, for which the model was trained on the n = 116 compounds that were found in the training set, and then tested on the n = 32 Challenge compounds. 
We found that regression methods produced less accurate results, probably due to overfitting. Instead, our weighted averaging method, which directly uses Pearson correlation to transform the original predictor space into a drug similarity space, seemed to produce superior performance. In conclusion, we believe the hypothesis holds true that drug targets can be inferred from drug responses and perturbational profiles, given a proper choice of data and model. Specifically, we found that RI and IC20 are robust estimates of drug response. Deeply curated quantitative pharmacological databases (i.e., DrugComb, DrugTargetCommons and L1000) pave the way for advanced pharmacological modelling, which may help identify the mechanisms of drugs with improved accuracy.
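
The weighted averaging idea (a Pearson correlation matrix multiplied by the drug-target interaction matrix) can be illustrated with a small Python sketch. The function names and toy numbers below are hypothetical, and the abstract does not say whether the weights were thresholded or normalized, so the plain matrix product is used here as an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length feature vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def predict_targets(query_features, train_features, train_targets):
    """Score targets for each query drug as (correlation matrix) x (drug-target matrix)."""
    n_targets = len(train_targets[0])
    scores = []
    for query in query_features:
        # One similarity weight per training drug.
        weights = [pearson(query, train) for train in train_features]
        scores.append([
            sum(w * row[j] for w, row in zip(weights, train_targets))
            for j in range(n_targets)
        ])
    return scores


# Toy data: two training drugs with opposite feature profiles and distinct targets.
train_features = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
train_targets = [[1, 0], [0, 1]]  # rows: drugs, columns: targets
scores = predict_targets([[2.0, 4.0, 6.0, 8.0]], train_features, train_targets)
```

Because the method has no fitted parameters beyond the similarity itself, it has little capacity to overfit a training set of this size, which is consistent with the comparison to regression reported above.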

DREAM - 16: Personalized prediction of on-off medication state from wearable-derived time-series features

Keywords: Wearables At-home monitoring Patient-generated data Parkinson's disease Personalized medicine
  • Yidi Huang, Harvard Medical School Department of Biomedical Informatics, United States
  • Mark Keller, Harvard Medical School Department of Biomedical Informatics, United States
  • Mohammed Saqib, Harvard Medical School Department of Biomedical Informatics, United States
  • Brett Beaulieu-Jones, Harvard Medical School Department of Biomedical Informatics, United States

Short Abstract: Wearables hold potential for rich monitoring of patient state, particularly in chronic conditions such as Parkinson's disease. However, clinically useful information is difficult to extract due to the high dimensionality and large amounts of noise inherent to real-world sensor data. In such data regimes, deep learning techniques can be susceptible to overfitting, and simpler techniques may actually be preferable. We developed a data pipeline to predict on-off states for Parkinson’s disease from wearable accelerometer data while minimizing overfitting. The input to our pipeline was raw sensor data consisting of triaxial acceleration time series signals measured from smartwatches. We combined individual sensor axes and removed gravitational acceleration from the combined signal. We then extracted time-series features from the processed signal and fit a random forest to predict on-off state for each patient. To expand the training set, we divided each full-length observation into 10-second segments. Our pipeline generated predictions for each segment and used the ensembled median value as the prediction for the observation. This pipeline significantly outperformed the null model, as well as deep learning approaches, in both an internal validation and a held-out test set. Our approach emphasized parsimony and interpretability without sacrificing model performance.
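
The preprocessing and ensembling steps above can be sketched as follows. Several details are assumptions, since the abstract does not state them: the axes are combined as a Euclidean magnitude, gravity is removed by subtracting the signal mean, and `segment_model` is a hypothetical callable standing in for the feature extraction plus random forest.

```python
import math
from statistics import mean, median


def combine_axes(samples):
    """Collapse (x, y, z) acceleration samples into one magnitude signal."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def remove_gravity(signal):
    """Crude gravity removal by subtracting the mean (a high-pass filter is another option)."""
    baseline = mean(signal)
    return [value - baseline for value in signal]


def segment(signal, sample_rate_hz, seconds=10):
    """Cut the signal into consecutive, non-overlapping windows of fixed duration."""
    size = int(sample_rate_hz * seconds)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def predict_observation(samples, sample_rate_hz, segment_model):
    """Predict per segment, then take the median as the observation-level prediction."""
    signal = remove_gravity(combine_axes(samples))
    predictions = [segment_model(window) for window in segment(signal, sample_rate_hz)]
    if not predictions:
        raise ValueError("observation is shorter than one segment")
    return median(predictions)


# Toy usage: a device at rest, with signal range standing in for a trained model.
resting = [(0.0, 0.0, 1.0)] * 100
prediction = predict_observation(resting, sample_rate_hz=5,
                                 segment_model=lambda w: max(w) - min(w))
```

Taking the median over segments makes the observation-level prediction less sensitive to a few noisy windows than a mean would be.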

DREAM - 17: CTD2 Beat AML DREAM Challenge: Strategies for Prediction of Drug Efficacy and Patient Outcomes

Keywords: acute myeloid leukemia functional precision medicine drug sensitivity prediction
  • Jacob Roberts, Icahn School of Medicine at Mount Sinai, United States
  • Verena Chung, Sage Bionetworks, United States
  • Thomas Yu, Sage Bionetworks, United States
  • Brian White, Sage Bionetworks, United States
  • Anna Cichonska, Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), Finland
  • Gonzalo Lopez Garcia, Icahn School of Medicine at Mount Sinai, United States
  • Mike Mason, Sage Bionetworks, United States
  • Adam Margolin, Icahn School of Medicine at Mount Sinai, United States
  • Daniela Gerhard, National Cancer Institute, United States
  • Justin Guinney, Sage Bionetworks, United States
  • Tero Aittokallio, Oslo University Hospital, Norway
  • Jeffrey Tyner, Oregon Health & Science University, United States

Short Abstract: In the era of precision medicine, acute myeloid leukemia (AML) patients have few therapeutic options: “7 + 3” induction chemotherapy has remained the standard for decades. While several agents targeting the myeloid marker CD33, alterations in FLT3 or IDH1/2, or the anti-apoptotic protein BCL2 have demonstrated efficacy in patients, responses are muted in some populations and relapse remains prevalent. There is an urgent need for targeted treatment options that are tailored to more refined patient subpopulations in order to achieve durable responses. To address this need, we hosted an NCI-sponsored Beat AML DREAM Challenge under the auspices of the Cancer Target Discovery and Development (CTD2) program. In this community-wide assessment, participants predicted ex vivo sensitivity of AML patient primary cells to 122 targeted and chemotherapeutic agents using genomic, transcriptomic, and clinical data (sub-Challenge 1; SC1) and predicted clinical response using these data as well as the ex vivo drug sensitivity data (SC2). Data were furnished by the Beat AML initiative, which comprehensively profiled AML patient samples using whole-exome sequencing (WES), transcriptome sequencing (RNA-seq), and ex vivo functional drug sensitivity screens. Participants developed and tuned their methods using published training data (n=213 specimens) and subsequently received scored submissions on published “leaderboard” data (n=80). Final submissions were ranked on validation data (n=65) we generated for this Challenge using a primary scoring metric, with statistical ties resolved using a secondary metric. Twenty eight participants entered submissions for SC1. 
We applied two baseline comparator models: a ridge regression model using only expression data (primary metric Spearman’s rho = 0.32; secondary metric Pearson’s r = 0.32) and a Bayesian multitask multiple kernel learning method using expression and mutation data (rho = 0.31; r = 0.32), which was the top-performing method in a related assessment of drug sensitivity prediction across breast cancer cell lines in vitro. The top-performing participant improved upon both models (rho = 0.37; r = 0.38). Six of the top seven participants, including the first-ranked, used multitask approaches or otherwise shared information across the drugs. Fourteen participants entered submissions for SC2. A baseline Cox proportional hazards model with LASSO regularization using all available data modalities achieved a concordance index (CI; primary metric) of 0.68 and an AUC (secondary metric) of 0.65. Four participants were tied based on the primary metric, with the top participant determined by the secondary metric (CI = 0.77; AUC = 0.75).
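
For readers unfamiliar with the SC1 metrics, the sketch below computes Pearson's r and Spearman's rho (Pearson's r on average ranks) in plain Python. It is a generic illustration rather than the Challenge's scoring code, and the toy values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def average_ranks(values):
    """Rank values from 1, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy example: a monotone but non-linear relationship.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 4.0, 9.0, 16.0]
rho = spearman(observed, predicted)
r = pearson(observed, predicted)
```

Spearman's rho rewards getting the ordering of sensitivities right regardless of scale, while Pearson's r also penalizes departures from a linear relationship.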

DREAM - 18: RA2-DREAM Challenge: Automated Scoring of Radiographic Damage in Rheumatoid Arthritis

Keywords: rheumatoid arthritis radiographs image analysis
  • S. Louis Bridges Jr., Hospital for Special Surgery, United States
  • Dongmei Sun, University of Alabama at Birmingham, United States
  • Jake Chen, University of Alabama at Birmingham, United States
  • Mason Frazier, University of Alabama at Birmingham, United States
  • Percio Gulko, Icahn Mount Sinai School of Medicine, United States
  • Robert Allaway, Sage Bionetworks, United States
  • James Costello, University of Colorado, United States

Short Abstract: Rheumatoid arthritis (RA) is a common chronic autoimmune disease characterized by inflammation of the synovium leading to joint space narrowing and bony erosions around the joints. The current state-of-the-art method for quantifying the degree of joint damage is human visual inspection of radiographic images by highly trained readers. This tedious, expensive, and non-scalable method is an impediment to research on factors associated with RA joint damage and its progression, and may delay appropriate treatment decisions by clinicians. We sought to develop automatic, rapid, accurate methods to quantify the degree of joint damage in patients with RA using machine learning or deep learning through the community crowdsourced RA2-DREAM Challenge. The motivation for the Challenge, background related to the scoring of joint damage in RA, and the scored radiographic images from clinical studies that supported the Challenge will be described. In addition, each of the three sub-challenges will be discussed: 1: Predict overall RA damage from radiographic images of hands and feet; 2: Predict joint space narrowing scores from radiographic images of hands and feet. 3: Predict joint erosion scores from radiographic images of hands and feet.

DREAM - 19: A multitask neural network approach for predicting drug targets from chemogenomic data

Keywords: Drug-target interaction Neural network Machine learning
  • Tingzhong Tian, Tsinghua University, China
  • Fangping Wan, Tsinghua University, China
  • Shuya Li, Tsinghua University, China
  • Yuanpeng Xiong, Tsinghua University, China
  • Jianyang Zeng, Tsinghua University, China

Short Abstract: Accurately identifying drug-target interactions (DTIs) in silico can greatly facilitate the process of drug discovery and development, as it can provide valuable insights into drug mechanisms of action and off-target adverse events. With the emergence of chemogenomic data (e.g., drug perturbational gene expression profiles), researchers can now utilize information beyond drug structures to build data-driven DTI prediction tools. In this talk, we present a winning method in the CTD-squared Pancancer Drug Activity DREAM Challenge for this problem. We develop a multitask neural network approach to simultaneously model DTI relationships for a set of targets based on drug-perturbed gene expression data. By incorporating a positive-unlabeled learning objective and a multitask learning constraint (i.e., graph Laplacian regularization), our method exhibits strong predictive power in both computational experiments and competitions. [ NO POSTER ]
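
As a rough illustration of graph Laplacian regularization, the Python sketch below computes the standard penalty that pulls the parameter vectors of connected tasks toward each other. How the authors built the graph and which parameters they regularized is not stated in the abstract; treating each row of `weights` as one target's parameter vector, and `adjacency` as a similarity graph over targets, are assumptions.

```python
def laplacian_penalty(weights, adjacency):
    """Sum over edges of A[i][j] * ||w_i - w_j||^2 for task parameter vectors w_i."""
    total = 0.0
    n = len(weights)
    for i in range(n):
        for j in range(i + 1, n):
            if adjacency[i][j]:
                dist2 = sum((a - b) ** 2 for a, b in zip(weights[i], weights[j]))
                total += adjacency[i][j] * dist2
    return total


def laplacian_penalty_trace(weights, adjacency):
    """Same quantity written as trace(W^T L W), with L = D - A the graph Laplacian."""
    n = len(weights)
    laplacian = [[(sum(adjacency[i]) if i == j else 0) - adjacency[i][j]
                  for j in range(n)] for i in range(n)]
    dims = len(weights[0])
    return float(sum(weights[i][k] * laplacian[i][j] * weights[j][k]
                     for k in range(dims) for i in range(n) for j in range(n)))


# Toy example: three tasks; tasks 0 and 1 share parameters, task 2 differs.
weights = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adjacency = [[0, 1, 0], [1, 0, 2], [0, 2, 0]]
penalty = laplacian_penalty(weights, adjacency)
```

Added to a training loss with a tunable coefficient, this term encourages related targets to learn similar models, which is the usual motivation for using it as a multitask constraint.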

DREAM - 2: Omics-based prediction of preterm birth by Gaussian Process Regression models

Keywords: Preterm prediction SVM GPR
  • Yuanfang Guan, University of Michigan–Ann Arbor, United States

Short Abstract: We used a combination of support vector machine (SVM) and Gaussian process regression (GPR) models. The main task in the challenge was to tune the parameters of these two algorithms and combine them. We included all samples in training, whether microarray or RNA-seq, and quantile normalized each sample. Tuning the parameters of SVM and GPR amounts to estimating how much noise there is in the expression data, which we did through a systematic grid search. Models were weighted equally when predictions were combined.
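
Two of the steps mentioned above, quantile normalization and equal-weight combination of model predictions, can be sketched in plain Python. This is a textbook version, not the author's code: it assumes the reference distribution is the mean of the sorted samples and it breaks ties by position rather than averaging them.

```python
def quantile_normalize(samples):
    """Give every sample the same distribution: the mean of the sorted samples."""
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(s[r] for s in sorted_samples) / len(samples)
                 for r in range(n_genes)]
    normalized = []
    for sample in samples:
        # Gene indices ordered from lowest to highest expression in this sample.
        order = sorted(range(n_genes), key=lambda g: sample[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


def equal_weight_ensemble(*prediction_lists):
    """Average the predictions of several models, weighting each model equally."""
    return [sum(values) / len(values) for values in zip(*prediction_lists)]


# Toy usage: two samples with three genes, then two models' predictions.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
combined = equal_weight_ensemble([30.0, 36.0], [32.0, 38.0])
```

After normalization each sample keeps its own gene ordering but shares one set of values, which is what allows samples from different platforms to be pooled for training.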

DREAM - 20: A framework for studying machine learning methods in healthcare: The First EHR DREAM Challenge

Keywords: Electronic Health Records Clinical Informatics Machine Learning
  • Timothy Bergquist, Sage Bionetworks, United States
  • Thomas Schaffter, Sage Bionetworks, United States
  • Yao Yan, Sage Bionetworks, United States
  • Thomas Yu, Sage Bionetworks, United States
  • Justin Prosser, University of Washington, United States
  • Sean Mooney, University of Washington, United States
  • Justin Guinney, Sage Bionetworks, United States

Short Abstract: Implementation of machine learning-based methods in healthcare is of high interest and has the potential to positively impact patient care. However, the real-world accuracy and outcomes of these methods remain largely unknown, and their performance on different subpopulations of patients also remains unclear. To address these important questions, we hosted a community challenge to evaluate disparate methods that predict healthcare outcomes. We focused on the prediction of all-cause mortality, as it is quantitative and clinically unambiguous. To overcome patient privacy concerns, we employed a Model-to-Data approach, allowing citizen scientists and researchers to train and evaluate machine learning models on electronic health records from the University of Washington medical system. We held the EHR DREAM Challenge: Patient Mortality from May 2019 to April 2020. We asked participants to predict 180-day mortality status from the last visit that each patient had in UW Medicine. In total, we had 354 registered participants, coalescing into 25 independent teams. The top-performing team achieved an area under the receiver operating characteristic curve of 0.947 (95% CI 0.942, 0.951) and an area under the precision-recall curve of 0.487 on all patients over a one-year observation of a large health system. In a follow-up phase of the challenge, we extracted the trained features from the best-performing methods and evaluated the generalizability of models across different patient populations, revealing that models differ in accuracy on subpopulations, such as those defined by race or gender, even when they are trained on the same data and have similar accuracy on the overall population. This is the broadest community challenge focused on the evaluation of state-of-the-art machine learning methods in healthcare performed to date, and it shows the importance of prospective evaluation and collaborative development of individual models.
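
The two reported metrics can be computed as in the Python sketch below, which is a generic illustration and not the Challenge's scoring code. The area under the precision-recall curve is approximated here by average precision, ties in scores are not treated specially in that function, and the labels and scores are made up.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, 1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Toy example with two deaths (1) and two survivors (0).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
roc_value = auroc(labels, scores)
ap_value = average_precision(labels, scores)
```

When the positive class is rare, as mortality within 180 days typically is, precision-based metrics tend to be much lower than the area under the ROC curve, which is why reporting both is informative.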

DREAM - 21: Biomedical Analysis of Composite Ontological Networks

Keywords: Metadata DREAM Heuristics
  • Tim Kaniss, ENGR Dynamics, United States
  • Dave Kaniss, ENGR Dynamics, United States
  • Lauren Cirillo, ENGR Dynamics, United States

Short Abstract: The ENGR Dynamics approach focused on scalable, computationally efficient heuristics evaluated to score multiple comparisons quickly without complex querying and filtering. While a number of standardized approaches were considered, the final solution focused on capturing the nature of the exceptions and implemented a standardized approach for the columns containing an expected and predictable naming schema.

DREAM - 22: Parkinson's disease symptom assessment in free-living conditions; the BEAT-PD Challenge

Keywords: Parkinson's disease wearable sensors digital biomarkers
  • Solveig Sieberts, Sage Bionetworks, United States
  • Alex Mariakakis, University of Toronto, Canada
  • Nicholas Shawen, Northwestern University, United States
  • Phil Snyder, Sage Bionetworks, United States
  • Luc Evers, Donders Institute for Brain, Cognition and Behaviour, Netherlands
  • Izhar Bar-Gad, Bar Ilan University, Israel
  • Brett Beaulieu-Jones, Harvard University, United States
  • Henryk Borzymowski, PwC Munich, Germany
  • Yuval El-Hanany, Bar Ilan University, Israel
  • Jann Goschenhofer, Fraunhofer IIS Ada Lovelace Center, Germany
  • Yuanfang Guan, University of Michigan, United States
  • Yidi Huang, Harvard School of Medicine, United States
  • Monica Javidnia, University of Rochester, United States
  • Mark Keller, Harvard School of Medicine, United States
  • Ayala Matzner, Bar Ilan University, Israel
  • Alex Page, University of Rochester, United States
  • Mohammed Saqib, Harvard School of Medicine, United States
  • Greta Smith, University of Rochester, United States
  • Charles Venuto, University of Rochester, United States
  • Robbie Zielinski, University of Rochester, United States
  • Gaurav Pandey, Mount Sinai School of Medicine, United States
  • Luca Foschini, Evidation, United States
  • Larsson Omberg, Sage Bionetworks, United States

Short Abstract: Recent advances in mobile health have demonstrated great potential to leverage sensor-based technologies for quantitative, remote monitoring of health and disease - particularly for diseases affecting motor function such as Parkinson’s disease. While infrequent doctor’s visits along with patient recall can be subject to bias, remote monitoring offers the promise of a more objective, holistic picture of the symptoms and complications experienced by patients on a daily basis, which is critical for making decisions about treatment. Previous work, including the 2017 Parkinson’s Disease Digital Biomarker DREAM Challenge, showed that Parkinson’s diagnosis and symptom severity can be predicted using wearable and consumer sensors worn during the completion of specific short tasks. The BEAT-PD Challenge sought to understand whether symptom severity could be predicted from passive monitoring of patients, as they went about their daily lives, which is a critical component to developing algorithms for remote monitoring. To this end, we leveraged two previously unavailable data sets which collected passive accelerometer data from wrist-worn devices coupled with patient self-reports of symptom severity. Participants were asked to build patient-specific models to predict on/off medication status (subchallenge 1), dyskinesia, an often-violent involuntary movement which arises as a side-effect of medication (subchallenge 2), and tremor (subchallenge 3) for 28 patients. The participant models were compared to a patient-specific null model. Through this challenge, as well as the post-challenge community phase, we determined that sensor measurements from passive monitoring of Parkinson’s patients can be used to predict symptom severity for a subset of patients. Moreover, these models were also predictive for in-clinic physician-assessments of severity. 
Patient predictability was generally not related to factors like sample size or reporting lag but was somewhat related to overall disease severity.
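As a rough illustration of what a patient-specific null model can look like, the sketch below predicts each patient's mean training label and scores it with mean squared error. The abstract does not say how the null model or the scoring metric were defined, so both choices, the function names, and the toy data are assumptions made only for illustration.

```python
from collections import defaultdict
from statistics import mean


def fit_null_model(train_rows):
    """Map each patient ID to the mean of that patient's training labels.

    Assumption: the null model is the per-patient mean label; the abstract
    does not specify this.
    """
    labels_by_patient = defaultdict(list)
    for patient_id, label in train_rows:
        labels_by_patient[patient_id].append(label)
    return {pid: mean(vals) for pid, vals in labels_by_patient.items()}


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


# Toy (patient_id, self-reported severity) pairs, made up for illustration.
train = [("p1", 0), ("p1", 2), ("p1", 1), ("p2", 3), ("p2", 3)]
test = [("p1", 1), ("p1", 2), ("p2", 4)]

null_model = fit_null_model(train)
y_true = [label for _, label in test]
null_pred = [null_model[pid] for pid, _ in test]
null_mse = mse(y_true, null_pred)
print(null_model, null_mse)
```

A submitted model would then count as informative for a patient only if its error on that patient's held-out recordings is lower than the null model's.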

DREAM - 3: Flash talk BEAT-PD: use deep learning to predict tremor severity

Keywords: digital biomarker parkinson's deep learning
  • Yuanfang Guan, University of Michigan–Ann Arbor, United States

Short Abstract: The hallmark of digital medicine is the ability to monitor patients remotely without a physician. While accelerometer/gyroscope-based digital biomarkers have been developed to classify many diseases such as Parkinson’s, it remains an open question whether they can be used to monitor severity, particularly in a free-living environment. We report modalities and algorithms that combat the confounding factors in free-living environments and enable remote tremor severity monitoring for individual Parkinson’s patients. We found the fundamental reasons why previous attempts failed: direct regression against severity scores indeed produced no signal, as in existing studies. We also point to the critical aspects of constructing personalized parameters that allowed the model to place top in the BEAT-PD End Point Challenge. We envision that the methodology will have direct applications in clinical trials and patient care that require objective, fine-grained scoring, and that it can be adopted in the digital biomarker field for many other neurological or movement conditions.

DREAM - 4: A multistage deep learning method for scoring radiographic hand and foot joint damage in rheumatoid arthritis

Keywords: rheumatoid arthritis deep learning automated radiographic scoring Sharp/van der Heijde method RA2 DREAM Challenge
  • Isaac Dimitrovsky, WRQ Research, United States
  • Lars Ericson, Catskills Research Company, United States

Short Abstract: We'll talk about our entry to the RA2 DREAM Challenge, which won the overall damage prediction category (SC1); for details, see our writeup at https://www.synapse.org/#!Synapse:syn21478998/wiki/604432. The main difficulty in this competition was the lack of training data. We'll review the strategies we used to deal with this, including:
  • Using a DL model to convert all images to the same dihedral orientation.
  • Using a DL model to locate joints and cut out joint images. This enabled us to merge groups of joints into one model, multiplying the training data available per prediction.
  • Thoughtful use of data augmentation, including perspective warps.
  • Using a carefully chosen pretrained architecture and cross-validation for final damage prediction.
Used together, these strategies enabled us to use potentially higher-performance deep learning models without overfitting. Finally, we'll discuss what we think is an interesting open question: whether to use a postprocessing stage in which we adjust a patient's individual joint predictions based on the predictions from their other joints. Unlike the other winning entries, we didn't do this, because we felt unsure about whether this is a good thing to do in practice. We'll present some preliminary analysis of this question based on the competition training set.
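To make the "dihedral orientation" step concrete, here is a minimal sketch that enumerates the eight orientations of an image (four 90-degree rotations, each with and without a mirror). It assumes that "dihedral" refers to this eight-element symmetry group and represents an image as a plain list of rows. The authors' DL model that decides which orientation an input is in is not shown, and the helper names are hypothetical.

```python
def rot90(img):
    """Rotate a 2D grid (list of rows) 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def flip_lr(img):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in img]


def dihedral_orientations(img):
    """Return the 8 orientations: 4 rotations, each unmirrored then mirrored."""
    out = []
    current = [list(row) for row in img]
    for _ in range(4):
        out.append(current)
        out.append(flip_lr(current))
        current = rot90(current)
    return out


# Tiny stand-in for a radiograph; values are arbitrary pixel labels.
image = [[1, 2, 3],
         [4, 5, 6]]
orientations = dihedral_orientations(image)
for grid in orientations:
    print(grid)
```

Under this assumption, a classifier that predicts one of the eight indices could be followed by the inverse transform so that every image reaches the later stages in a single canonical orientation.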

DREAM - 5: Assessment of Parkinson's disease dyskinesia in a free-living environment

Keywords: Parkinson's Disease activity accelerometer machine learning
  • Alex Page, University of Rochester Medical Center, United States
  • Monica Javidnia, University of Rochester Medical Center, United States
  • Greta Smith, University of Rochester Medical Center, United States
  • Robbie Zielinski, University of Rochester Medical Center, United States
  • Charles Venuto, University of Rochester Medical Center, United States

Short Abstract: While there is inherent value in a clinician examination, the “gold standard” clinical assessment for Parkinson's disease (PD), the MDS-UPDRS, is subjective, administered sporadically, and may not be reflective of the full burden of disease severity. Thus, more frequently-administered, objective measurements via digital technologies can allow for a more accurate detailing of one's disease severity and treatment response. The BEAT-PD DREAM Challenge offered a dataset of smartwatch/smartphone accelerometer recordings from 16 patients with PD, who self-reported their level of dyskinesia (e.g. on a scale of 0-4) during each recording period. We trained a random forest regression model to predict the level of dyskinesia based on measurements extracted from the accelerometer signals. 16 features were used from the accelerometers, such as the mean acceleration and dominant frequency of motion. In addition to the accelerometer features, the patient characteristics (e.g. age, gender, and baseline UPDRS scores) were also used to train the model. This allows the model to develop branches personalized for certain (types of) patients. Personalization is important not only due to differing patient lifestyles and disease progression, but also because the labels for these data are patient-reported, i.e. they are subjective. The total set of features can be reduced by principal component analysis (PCA) or recursive feature elimination (RFE) without significant impact to accuracy. The model makes a prediction for every 30 seconds of activity. For more stable predictions, these estimates can be averaged over longer periods of time (such as the 20-minute recordings of the DREAM challenge). Our model predicted dyskinesia severity with a mean per-patient error of 0.4053. In validation, we found that the model performed well on less severe dyskinesias, but under-estimated in relatively rare cases of high severity (e.g. 4 out of 4 dyskinesia). 
Future improvements could be made by addressing this class imbalance. We would also like to incorporate time and date into the model to capture circadian patterns. The current version outperformed all 37 other teams in the BEAT-PD DREAM Challenge. We found that the UPDRS scores were very important features for dyskinesia prediction. In many cases, sensor-derived features were secondary to the UPDRS values. Some of the most important sensor-derived features were the mean acceleration, power spectral entropy, and correlation coefficients between acceleration axes. Our code is publicly available: https://bitbucket.org/atpage/beat-pd/.
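A few of the sensor-derived features named above (mean acceleration, dominant frequency, and correlation between axes) can be sketched in plain Python as follows. This is not the authors' code, which is at the Bitbucket link above; the window length, sampling rate, use of the acceleration magnitude, and all function names are assumptions made for illustration.

```python
import cmath
import math


def mean(xs):
    return sum(xs) / len(xs)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either signal is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0


def dominant_frequency(signal, sample_rate_hz):
    """Frequency (Hz) of the strongest non-DC component, via a naive DFT."""
    n = len(signal)
    offset = mean(signal)
    centered = [s - offset for s in signal]
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k * sample_rate_hz / n


def window_features(ax, ay, az, sample_rate_hz):
    """Features for one window of tri-axial accelerometer samples."""
    magnitude = [math.sqrt(x * x + y * y + z * z)
                 for x, y, z in zip(ax, ay, az)]
    return {
        "mean_acceleration": mean(magnitude),
        "dominant_frequency_hz": dominant_frequency(magnitude, sample_rate_hz),
        "corr_xy": pearson(ax, ay),
        "corr_xz": pearson(ax, az),
        "corr_yz": pearson(ay, az),
    }


# Synthetic 2-second window at a hypothetical 50 Hz sampling rate.
rate = 50
n = 100
ax = [math.sin(2 * math.pi * 5 * t / rate) for t in range(n)]
ay = [2 * v for v in ax]
az = [math.cos(2 * math.pi * 5 * t / rate) for t in range(n)]
features = window_features(ax, ay, az, rate)
print(features)
```

Feature dictionaries like this one, computed per window and joined with patient characteristics, would form the rows passed to a regression model; per-window predictions can then be averaged over a full recording.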

DREAM - 6: A LightGBM Model to Predict 180 Days Mortality Using EHR Data

Keywords: mortality prediction EHR data machine learning
  • Jifan Gao, University of Wisconsin-Madison, United States
  • Guanhua Chen, University of Wisconsin-Madison, United States

Short Abstract: The aim of the EHR DREAM Challenge is to predict the mortality status of patients within 180 days of their last visit using their EHR data. We trained and tuned a LightGBM model to predict the mortality risk of each patient with a tailored feature engineering step. In particular, we used ontology rollup to reduce the dimensionality of the features, and time binning and sample reweighting to capture longitudinal feature information. The AUROC and AUPR of our model on the independent validation data are 0.9470 and 0.4779, respectively, both of which are the highest among all models submitted for this challenge.
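One way to picture ontology rollup and time binning is the sketch below, which maps each recorded code to a coarser parent concept and counts occurrences per time window before the last visit. The bin edges, code names, and rollup table are hypothetical and are not taken from the abstract; sample reweighting and the LightGBM model itself are not shown.

```python
from collections import Counter

# Hypothetical bin edges, in days before the last visit.
BIN_EDGES_DAYS = [0, 30, 180, 365]


def bin_label(days_before_last_visit):
    """Name of the time bin that an event falls into."""
    for lo, hi in zip(BIN_EDGES_DAYS, BIN_EDGES_DAYS[1:]):
        if lo <= days_before_last_visit < hi:
            return f"{lo}-{hi}d"
    return f"{BIN_EDGES_DAYS[-1]}d+"


def time_binned_counts(events, rollup=None):
    """Count events per (rolled-up concept, time bin).

    events: iterable of (concept_code, days_before_last_visit).
    rollup: optional dict mapping a code to a coarser parent concept.
    """
    rollup = rollup or {}
    counts = Counter()
    for code, days in events:
        concept = rollup.get(code, code)
        counts[f"{concept}@{bin_label(days)}"] += 1
    return dict(counts)


# Made-up events for one patient.
events = [("dx_A1", 3), ("dx_A2", 10), ("dx_A1", 200), ("rx_B", 400)]
rollup = {"dx_A1": "dx_A", "dx_A2": "dx_A"}
patient_features = time_binned_counts(events, rollup)
print(patient_features)
```

Rolling codes up reduces the number of distinct columns, while the bin suffix keeps coarse information about how recently each concept was recorded.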

DREAM - 7: Dealing with high dimensional data to predict preterm birth

Keywords: complex systems preterm birth gene expression machine learning
  • Istvan Csabai, Eotvos Lorand University, Department of Physics of Complex Systems, Hungary
  • Balint Armin Pataki, Eotvos Lorand University, Department of Physics of Complex Systems, Hungary

Short Abstract: The gene expression of human cells is a complex system with thousands of interacting components. In several studies, researchers have successfully used machine learning methods to infer high-level biological phenomena such as preterm birth, as in the recent DREAM PTB challenge. Can we really get true, biologically meaningful insights with this approach?

DREAM - 8: Deep Learning-based Prediction of Radiographic Joint Damage in Rheumatoid Arthritis

Keywords: Rheumatoid Arthritis Deep Learning Data Challenge
  • Hongyang Li, University of Michigan, United States
  • Yuanfang Guan, University of Michigan, United States

Short Abstract: Rheumatoid arthritis (RA) is an autoimmune disease affecting the joints of the hands, feet, wrists, ankles, elbows, and knees. It is estimated that about 0.6 percent of adults in the United States are affected by joint damage associated with RA, including pain and swelling around the joints. A standard way to evaluate joint damage is to manually examine radiographic images of the joints and estimate the severity of joint space narrowing and erosion, which is labor-intensive and time-consuming even for experienced radiologists. Here we present a deep learning-based approach for automatically predicting joint damage and segmenting the regions of interest. This approach ranked top in the 2020 RA2 DREAM Challenge - Automated Scoring of Radiographic Joint Damage.

DREAM - 9: SVM-based approach to predict preterm birth using omics data

Keywords: Preterm Birth sPTD PPROM Transcriptomics Machine Learning Support Vector Machine
  • Rintu Kutum, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India

Short Abstract: In the DREAM Preterm Birth Prediction Challenge, Transcriptomics (Sub-challenge 2), the goal was to predict the preterm birth phenotypes (sPTD and PPROM) with a minimal set (at most 100) of transcriptomic features. We (team IGIB) 1) performed differential expression analysis of sPTD vs. control and PPROM vs. control using a t-test or Wilcoxon test, 2) prioritized the top 100 features by p-value, 3) built SVM-based classification models (kernel types: linear, sigmoid, and radial) with 5-fold cross-validation, and 4) selected the best SVM approach, radial SVM, based on overall sensitivity and specificity across the 5 folds, for prediction of the preterm birth phenotypes. Overall, the radial-SVM models achieved 98.33% sensitivity and 93.33% specificity for sPTD, and 100% sensitivity and 100% specificity for PPROM.
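The feature prioritization and evaluation steps can be sketched roughly as follows. To stay within the standard library, this sketch ranks features by the absolute Welch t statistic rather than by p-value, which is a simplification of the step described above. The SVM training itself is not shown, and the gene names and values are invented.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def top_features(cases, controls, k):
    """Rank features by |t| and keep the k largest.

    cases / controls: dicts mapping feature name -> list of expression values.
    """
    scores = {name: abs(welch_t(cases[name], controls[name])) for name in cases}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity for binary labels (1 = case, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Invented expression values for three hypothetical genes.
cases = {"g1": [5.0, 5.5, 6.0], "g2": [1.0, 1.2, 0.9], "g3": [2.0, 3.0, 2.5]}
controls = {"g1": [1.0, 1.5, 2.0], "g2": [1.1, 0.9, 1.0], "g3": [2.2, 2.8, 2.4]}
selected = top_features(cases, controls, k=2)

# Invented held-out labels and predictions pooled across folds.
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
print(selected, sens, spec)
```

In a full pipeline, the selected features would be passed to the classifier within each cross-validation fold, and the pooled sensitivity and specificity would be compared across kernel types.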



International Society for Computational Biology
525-K East Market Street, RM 330
Leesburg, VA, USA 20176
