DREAM Challenges

Attention Presenters - please review the Speaker Information Page available here
Schedule subject to change
All times listed are in PST
Tuesday, November 8th
12:30-12:45
Welcome and Introductory Remarks
Format: Live from venue

Moderator(s): Pablo Meyer

  • Pablo Meyer
12:45-13:45
Keynote Presentation: Towards regulatory and systems genomics from single-cell gene expression measurements
  • Lior Pachter


Presentation Overview:

Single-cell genomics technologies have been lauded for their potential to probe biological systems with cell type specificity, and to elucidate cellular differentiation trajectories. However, the analysis of single-cell genomics data is fraught with numerous computational challenges. I will show that biophysical models of transcriptional dynamics are helpful in resolving some of these challenges and outline a unifying mathematical framework for single-cell genomics.

13:45-14:07
Preterm Birth Prediction Microbiome DREAM Challenge: Leveraging Microbiome Data in the Era of Precision Medicine
Format: Live from venue

Moderator(s): James Costello

  • Tomiko Oskotsky


Presentation Overview:

Our DREAM challenge was to predict (a) preterm or (b) early preterm birth from 9 publicly available studies of the vaginal microbiome representing 3578 samples from 1268 pregnant individuals, aggregated from raw sequences via an open-source tool, MaLiAmPi. We validated the crowdsourced models on 2 novel datasets. From 318 participants we received 148 and 121 submissions for our prediction tasks with top-ranking submissions achieving bootstrapped AUROC scores of 0.69 and 0.87 respectively.
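As a rough illustration of what a bootstrapped AUROC score can look like, here is a minimal sketch. It is not the challenge's scoring code: the function names, the number of resamples, and the choice to average over resamples are all assumptions made for this example.

```python
import random

def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A tie between a positive and a negative score counts as half a win
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrapped_auroc(labels, scores, n_boot=200, seed=0):
    """Mean AUROC over resamples drawn with replacement (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            values.append(auroc(ys, [scores[i] for i in idx]))
    return sum(values) / len(values)

# Toy data, not challenge data
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
point_estimate = auroc(labels, scores)
resampled_estimate = bootstrapped_auroc(labels, scores)
```

Resampling with replacement gives a sense of how stable a ranking metric is on a small validation set, which is one reason a leaderboard might report a bootstrapped rather than a single-pass score.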

14:07-14:19
Team UWiskMadison - An Ensemble Model to Predict Preterm Birth Using Vaginal Microbiome Data
Format: Live from venue

Moderator(s): James Costello

  • Zhoujingpeng (Wallance) Wei


Presentation Overview:

TBD

14:19-14:31
Team AI4knowledge - Ensemble machine learning to predict preterm birth from vaginal microbiome
Format: Live from venue

Moderator(s): James Costello

  • Pierfrancesco Novielli


Presentation Overview:

Environmental factors, and in particular the vaginal microbiome, play a crucial role in preterm birth. Given the complexity of the microbiota, which is tightly associated with its role, its analysis requires advanced methodologies.
We used an ensemble of random forest models with oversampling of the minority class to predict preterm birth from demographic data and longitudinal, imbalanced vaginal microbiome data.
Feature selection was performed using two different strategies: an embedded random forest (RF) approach or the Boruta method.
As is common in this type of data, the samples were highly imbalanced. To address this problem we used SMOTE (Synthetic Minority Over-sampling Technique), which improved our performance.
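The core idea of SMOTE can be sketched in a few lines: each synthetic minority sample is placed at a random point on the segment between a real minority sample and one of its nearest minority neighbours. The code below is a simplified illustration under that assumption, not the team's pipeline; the function name, the value of k, and the toy data are hypothetical.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # The new point lies at a random position between base and neighbour
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class with two features per sample
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6)
```

The synthetic points would then be added to the training set only, so that the classifier sees a more balanced class distribution while evaluation still uses real samples.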

14:31-14:43
Team TechtmannLab - Random forest model accurately predicts early preterm labor
Format: Live from venue

Moderator(s): James Costello

  • Abigail Kuntzleman, Michigan Technological University, United States
  • Isaac Bigcraft, Michigan Technological University, United States
  • Stephen Techtmann, Michigan Technological University, United States


Presentation Overview:

About 10% of births worldwide are preterm (delivery before 37 weeks), and 2% of births worldwide are early preterm (delivery before 32 weeks). For the Preterm Birth Microbiome DREAM Challenge we used 3,578 vaginal microbiome samples from 1,268 individuals to predict both preterm and early preterm birth. We explore the use of a generative adversarial network (GAN) for preterm birth prediction in order to train models on more data, but find that a basic random forest model trained on real relative abundances, diversity metrics, community state types, race, and collect week outperforms both a random forest and a support vector machine trained on generated relative abundance data. For early preterm birth prediction, we employ a basic random forest model and find that the most important features include diversity statistics, collect week, race, community state types, and many phylotypes. However, few of these features show a significant difference between early preterm and post-32-week samples, indicating that individual features on their own are not good predictors of early preterm birth. When tested on the validation dataset for the challenge, our early preterm birth prediction model had an AUC ROC of 0.87, an AUC PRC of 0.44, and an accuracy of 0.91.

14:43-14:55
Team KBJ - Prediction model construction of early-preterm birth via vaginal microbiomes based on ensemble learning approach
Format: Live from venue

Moderator(s): James Costello

  • Eunyong Kim


Presentation Overview:

Preterm births, including early preterm births, are estimated at 15 million per year worldwide. Preterm birth (PTB) is a great concern as it is one of the leading causes of neonatal mortality, and inflammation of the vaginal microbiome is known as the major cause of PTB. Because of the complexity of the vaginal microbial environment in pregnancy, it is necessary to accurately predict (early) preterm birth using computational approaches based on microbiome characteristics and meta-information. In this Preterm Birth Prediction Microbiome DREAM Challenge, we constructed prediction models with selected features, handling highly sparse and similar data points in the given raw data. We applied the minimum redundancy maximum relevance method to select relevant features. Then, various machine learning models were tested and combined into ensemble models to avoid overfitting and optimize performance. The resulting prediction models achieved AUCs of 0.635 and 0.841 for tasks 1 and 2, respectively.

15:25-16:20
Panel: Creating the DREAM Panel Discussion
Format: Live from venue

Moderator(s): Jake Albrecht

  • DREAM Panel


16:20-16:40
BraTS Continuous Evaluation Challenge Overview
Format: Live from venue

Moderator(s): Jake Albrecht

  • Jake Albrecht


Presentation Overview:

Overview of the BraTS Challenge

16:40-16:45
Spyridon Bakas and Ujjwal Baid will announce the BraTS Challenge Top Performers
Format: Live from venue

Moderator(s): Jake Albrecht

  • Jake Albrecht



Wednesday, November 9th
9:30-9:45
Welcome to Day 2
Format: Live from venue

Moderator(s): Pablo Meyer

  • Pablo Meyer
9:35-9:42
Predicting gene expression using millions of random promoter sequences
Format: Live from venue

Moderator(s): Pablo Meyer

  • Carl deBoer


9:42-9:57
Predicting gene expression using random promoter sequences - Challenge Overview
Format: Live from venue

Moderator(s): Pablo Meyer

  • Abdul Rafi


Presentation Overview:

Understanding the cis-regulatory logic of the human genome is an important goal and would provide insight into the origins of many diseases. However, learning models from human data is challenging due to limitations in the diversity of sequences present within the human genome (e.g., extensive repetitive DNA), the vast number of cell types that differ in how they interpret regulatory DNA, limited reporter assay data, and substantial technical biases present in many omic methods. To overcome these issues, we have recently created high-throughput measurements of the cis-regulatory activity of millions of randomly generated promoters in the single-celled organism yeast. Here, the expression level induced by each promoter sequence is measured via a fluorescent reporter gene regulated by that promoter. The set of randomly generated promoter sequences is so large that it rivals the complexity of the entire human genome, which gives us unprecedented power to learn the many parameters required to understand gene regulation. Because human and yeast cis-regulatory logic uses similar principles, we hope that the model architectures learned on yeast data can inform how to create models for the human genome. We organized

9:57-10:13
Team BHI - Learning cis-regulatory logic in promoter sequence with sandwich-like deep neural network for prediction of gene expression
Format: Live from venue

Moderator(s): Pablo Meyer

  • Danyeong Lee


Presentation Overview:

We developed a deep neural network model named DeepGXP (Deep learning model for the prediction of Gene eXpression using Promoter sequence) for the sequence-to-expression prediction task. In short, it adopts a “sandwich” architecture consisting of a one-dimensional convolutional layer, a bidirectional long short-term memory (Bi-LSTM) layer, and another convolutional layer. Each convolutional layer uses two different kernel sizes (9 and 15), as this gave a significant performance improvement. The first convolutional layer captures sequence motifs of fixed length, and the following Bi-LSTM learns more flexible and long-range dependencies between motifs. The last convolutional layer seems to consolidate the ‘soft’ dependencies between motifs into ‘hard’, or basepair-resolution, dependencies. Besides the model architecture, we found that training details specialized for DNA sequence-based deep learning models were highly important for overall performance. Among them, the most crucial was the ‘post-hoc conjoined’ setting (Zhou et al., 2020), which imposes reverse-complement equivariance on the model. Test-time augmentation was also effective. Predictions were made for an original sequence, its four shifted variants (generated by -2bp, -1bp, +1bp and +2bp shifting) and their reverse-complement sequences, and those 10 predictions were then averaged to make a final prediction. During training, sequences longer than 110bp were trimmed on the right, while sequences shorter than 110bp were padded with the original vector sequence on both sides randomly. We noticed that this informative padding gave a non-negligible performance boost. Finally, to be as unbiased as possible with respect to the distribution of the test set, predictions were quantile-transformed as post-processing using the distribution of expression levels in the training data.
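The test-time augmentation step can be illustrated with a small sketch: build the original sequence, four shifted variants, and the reverse complement of each, then average the 10 predictions. This is an assumption-laden illustration rather than DeepGXP code. In particular, the shift here pads with "N", whereas the abstract describes padding with vector sequence, and `toy_model` is a hypothetical stand-in for a trained network.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}

def reverse_complement(seq):
    """Return the reverse-complement strand of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def shift(seq, offset, pad="N"):
    """Shift a sequence by `offset` bp, keeping its length by padding with `pad`."""
    if offset > 0:  # shift right: pad on the left, drop bases on the right
        return pad * offset + seq[:-offset]
    if offset < 0:  # shift left: drop bases on the left, pad on the right
        return seq[-offset:] + pad * (-offset)
    return seq

def tta_variants(seq, offsets=(-2, -1, 0, 1, 2)):
    """Original plus shifted sequences, followed by their reverse complements."""
    shifted = [shift(seq, o) for o in offsets]
    return shifted + [reverse_complement(s) for s in shifted]

def predict_with_tta(model, seq):
    """Average the model's predictions over all augmented variants."""
    variants = tta_variants(seq)
    return sum(model(v) for v in variants) / len(variants)

def toy_model(seq):
    """Hypothetical stand-in model: the GC fraction of the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)

averaged = predict_with_tta(toy_model, "ACGTACGTAC")
```

Averaging over shifted and reverse-complemented inputs reduces the model's sensitivity to exact motif position and strand, which is the motivation usually given for this kind of augmentation.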

10:13-10:29
Proformer: A hybrid Macaron transformer model predicts expression values from promoter sequences
Format: Live from venue

Moderator(s): Pablo Meyer

  • Wuming Gong


Presentation Overview:

The breakthrough high-throughput measurement of the cis-regulatory activity of millions of randomly generated promoters provides an unprecedented opportunity for us to systematically decode the cis-regulatory logic that determines the expression values. In this DREAM challenge, we developed an end-to-end Transformer encoder architecture (Proformer) to predict the expression values from DNA sequences. Proformer used a Macaron-like Transformer encoder architecture, where two half-step feed forward (FFN) layers were placed at the beginning and the end of each encoder block, and a separable 1D convolution layer was inserted after the first FFN layer and in front of the multi-head attention layer. The sliding k-mers from one-hot encoded sequences were mapped onto a continuous embedding, combined with the learnt positional embedding and strand embedding (forward strand vs reverse complement strand) as the sequence input. Along with the sequence input, we added several positions (32 in our final model) of

10:29-10:45
Team Presentation - Autosome Team - NogiNet: repurposing EfficientNetV2 for accurate promoter sequence-to-expression modeling
Format: Live-stream

Moderator(s): Pablo Meyer

  • Dmitry Penzar


Presentation Overview:

We present NoGiNet, a deep learning solution for the DREAM 2022 promoter expression challenge. Our network is based on EfficientNetV2 with residual concatenation instead of residual summation blocks. The One Cycle Policy is used for training with AdamW resulting in the so-called superconvergence of the model, with reduced training time and improved model performance.
Further improvement of the model convergence and validation performance was achieved by re-formulating the initial regression task as a soft-classification problem. Additionally, during training, we use an additional binary channel to explicitly mark objects with integer, and thus likely imprecise, expression measurements. The information from the second strand of each promoter is provided in a similar way, by augmenting the dataset with the reverse complementary sequences and then explicitly marking the orientation in a separate binary channel.
Our approach does not include any attention-based mechanisms but reached high internal validation metrics and competitive performance on the public leaderboard. This agrees with recently published studies showing that properly designed convolutional neural networks can outperform attention-based architectures in image analysis.
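One way to picture the extra binary channels is as two constant rows appended to the one-hot encoding of each sequence. The sketch below is only an illustration of that idea: the channel order, the function names, and the example values are assumptions, not the team's implementation.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def encode(seq, expression, is_reverse):
    """Return a channels-by-length matrix: 4 one-hot rows plus 2 flag rows."""
    if is_reverse:
        seq = "".join(COMPLEMENT[b] for b in reversed(seq))
    one_hot = [[1.0 if base == b else 0.0 for base in seq] for b in BASES]
    # Flag 1: the measured expression is a whole number (likely imprecise)
    integer_flag = 1.0 if float(expression).is_integer() else 0.0
    # Flag 2: the sequence is presented in reverse-complement orientation
    strand_flag = 1.0 if is_reverse else 0.0
    flags = [[integer_flag] * len(seq), [strand_flag] * len(seq)]
    return one_hot + flags

def augment(seq, expression):
    """Encode one training example in both orientations."""
    return [encode(seq, expression, False), encode(seq, expression, True)]

forward, reverse = augment("ACG", 11.0)
```

Marking orientation and measurement precision explicitly lets a convolutional network condition on them directly instead of having to infer them from the sequence.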

11:15-11:35
Anti-PD1 Response Prediction DREAM Challenge Overview
Format: Live from venue

Moderator(s): Jineta Banerjee

  • Michael Mason


Presentation Overview:

TBD

11:35-11:47
Team NetPhar-antipd1 - Decision Tree-based models for predicting the efficacy of anti-PD1 treatment
Format: Live from venue

Moderator(s): Jineta Banerjee

  • Jing Tang


Presentation Overview:

TBD

11:47-11:59
I-MIRACLE: Tumor-immune interaction prediction models
Format: Live from venue

Moderator(s): Jineta Banerjee

  • Raghvendra Mall


Presentation Overview:

The goal of the Anti-PD1 Response Prediction DREAM Challenge was to identify strong predictive biomarkers for PD1 blockade therapy in NSCLC. Tumor mutation burden (TMB) and PDL1 levels, assessed by IHC, have been associated with responsiveness to PD1 blockade therapy. Immune signatures capturing T helper 1/cytotoxic immune response have been coherently associated with responsiveness to immunotherapeutic approaches and, from a predictive perspective, are known to interact synergistically with TMB. We utilized a 20-gene signature called the Immunologic Constant of Rejection (ICR), reflecting a combination of Th-1 signaling activation, CXCR3/CCR5 chemokine ligands, cytotoxic effector molecules and compensatory immune regulators, with strong prognostic connotation in primary tumors and predictive connotation in the context of checkpoint inhibition. Moreover, a 5-gene proliferation signature showed significant correlation with TMB in TCGA lung cancer cohorts (LUAD and LUSC).

Using publicly available immunotherapy-relevant transcriptomic datasets and the synthetic data format from the Challenge, we engineered scale-invariant features relevant to pathway enrichments, ICR, and proliferation scores, and ensembled state-of-the-art machine learning (ML) models in a consensus framework for differentiating responders vs non-responders (SC3). We observed that our ML models did not generalize to the Anti-PD1 evaluation dataset (SC3) owing to the lack of a training dataset, and hence focused on relevant prior knowledge (ICR and proliferation scores). We built linear additive models combining ICR and PDL1 levels with proliferation and TMB scores to assign ordinal scores of 1, 2 and 3 to patients when predicting progression-free survival (SC1) and overall survival (SC2). Our I-MIRACLE models outperformed the baseline model for SC1 (PFS, ranked 2nd in the sub-challenge) and SC2 (OS, joint winners in the sub-challenge).

11:59-12:11
Team cSysImmunoOnco: Systems-based descriptors of the tumor microenvironment for prediction of disease progression in anti-PD1 treatment
Format: Live from venue

Moderator(s): Jineta Banerjee

  • Óscar Lapuente-Santana


Presentation Overview:

The response of patients to immunotherapy with immune checkpoint blockers is still poorly understood. Tumors are multicellular systems with several intra- and inter-cellular regulatory interactions, calling for new holistic approaches to quantitatively characterize the tumor microenvironment (TME) for stratification of patients for immunotherapy.

Here, we show how bulk RNA-seq data complemented by prior knowledge provides a high-level mechanistic representation of the multifaceted TME: cell type abundances, pathways, transcription factors and cytokines activity scores, quantification of ligand-receptor and cell-cell interactions. Using multi-task machine learning to learn associations between these features and different hallmarks of immune response, we identified biomarkers that are shown to be predictive of immunotherapy response in different cancer types. Since immune response and tumor foreignness are complementary hallmarks of successful immunotherapy, we integrated our biomarker-based prediction of immune response with measurements of tumor mutational burden (TMB) to obtain a final score that improved our predictions in different cancer types. We used this approach to participate in the Anti-PD1 Response Prediction DREAM challenge which served to strongly support in an unbiased manner the validity of the approach in an additional cancer type such as non-small cell lung cancer (NSCLC).

12:11-12:23
L0-pseudonorm Regularized Regression in DREAM Anti-PD1 Challenge (team FICAN-OSCAR)
Format: Live from venue

Moderator(s): Jineta Banerjee

  • Teemu Laajala


Presentation Overview:

L0-pseudonorm, also known as best subset selection, is an alternative to the more conventionally used L1- or L2-norms in regularized regression, and has been under-represented due to the complexity of optimizing its discrete, non-continuous form. In this talk, L0 is presented in the context of the DREAM Anti-PD1 Challenge, where our methodology named

12:25-12:45
The Future of DREAM and Q&A
Format: Live from venue

Moderator(s): Pablo Meyer

  • Paul Boutros
14:00-15:00
Keynote Presentation: Connecting the dots: Collaborative analytics at scale
Format: Live from venue

  • Melissa Haendel


Presentation Overview:

Addressing complex scientific challenges requires weaving together data from diverse sources, organisms, contexts, formats, and granularities, and building a coherent holistic view of this data landscape to address any given problem is non-trivial. Often in the aggregation process, many of the original connections within the data are lost. Moreover, it is difficult to make new (inferred) connections without a common conceptual model. A common model is made possible by novel data integration strategies that leverage semantic technologies. However, it takes the people too; interdisciplinary collaboration can expedite progress and innovation - but is hard to scale. In communities as diverse as clinical genetics and Covid research, massive sharing of data and computational artifacts with good governance and attribution can pivot science from competition to collaboration.

15:00-15:02
IBM/NCI Community of Accelerated Discovery Welcome
Format: Live from venue

Moderator(s): James Kozloski

  • James Kozloski


Presentation Overview:

IBM and NCI have been working together to build a community of accelerated discovery in combination therapies. This multidisciplinary team is working to solve tough questions by bringing together experts in the life sciences, AI, and modeling in an open science / open source model. The group has developed a regular study group where tools are shared, further refined, and enhanced to solve problems. We invite everyone to join our efforts and bring their expertise to this multidisciplinary team working in an open science, open source framework. This special session will have presentations on the goals of the community, the initial seedling projects, the study group, a panel discussion, and, most importantly, how you can get involved!

This effort is supported by the Open-Source Science Initiative at NumFOCUS, aiming to provide an integrated and sustainable home for the communities of research OSS across science.

15:02-15:07
A Vision of Open Science
Format: Live from venue

Moderator(s): James Kozloski

  • Alexy Khrabrov
15:07-15:17
Community of Accelerated Discovery for Combination Therapies for Cancer Treatment
Format: Live from venue

Moderator(s): James Kozloski

  • James Kozloski
15:17-15:45
Panel: NCI vision and Pilot Projects
Format: Live from venue

Moderator(s): Steven Becker

  • Steven Becker
  • Po Ear
  • Jeremy Kratz
  • Anand Patel
  • Stephen McGough
  • Irina Kareva
  • R. Huang
  • Adam Palmer
  • Leonard Harris


Presentation Overview:

Lightning talks by project researchers funded in this effort, followed by a Q&A panel after the break
1) “Towards a TIME-informed digital biospecimen to predict cancer combination treatment.” Huang, Palmer, Harris, and team.
2) “Assessing and exploiting metabolic plasticity to predict treatment susceptibility and improve and preserve treatment efficacy” Kareva and team.
3) “STEVE: Spatially-informed bladder cancer Tumor Explant evaluation and Virtual Exploration study” Patel, McGough, and team.
4) “A Framework for Identifying Biologically-based Combination Therapy Dosing and Therapeutic Repurposing” Kratz and team.
5) “Patient-Derived Spatial Modeling of Tumor Progression under Drug Combinations” Ear and team.

16:15-16:25
Panel: NCI vision and Pilot Projects Q&A
Format: Live from venue

Moderator(s): Steven Becker

  • Steven Becker
  • Po Ear
  • Jeremy Kratz
  • Anand Patel
  • Stephen McGough
  • Irina Kareva
  • R. Huang
  • Adam Palmer
  • Leonard Harris


Presentation Overview:

Q&A Panel

16:25-16:55
Panel: Study Group Vision and Implementation
Format: Live from venue

Moderator(s): James Kozloski

  • Marianna Rapsomaniki
  • James Kozloski
  • Steven Becker
  • Po Ear
  • Andriy Marusyk
  • Ashok Prasad
  • Marianna Kruithof-de-Julio


Presentation Overview:

Discussion of the open-source tools study group that the team is using to share and develop resources. Both live and virtual participants.

16:55-17:30
Panel: Open Science Panel Discussion
Format: Live from venue

Moderator(s): James Kozloski

  • Pablo Meyer
  • James Costello
  • Jake Albrecht
  • Laura Heiser


Presentation Overview:

Discussion about open science and different models for encouraging participation.