Attention presenters: please review the Speaker Information Page.
Schedule subject to change
All times listed are in CDT
Tuesday, October 1st
8:30-9:00
Welcome to DREAM
Format: In person


Authors List:

  • Pablo Meyer Rojas
9:00-10:00
Invited Presentation: Digitizing Olfaction: Predicting Odor Perception from Molecular Structure
Confirmed Presenter: Joel Mainland, Monell Chemical Senses Center, United States

Format: In Person


Authors List:

  • Joel Mainland, Monell Chemical Senses Center, United States

Presentation Overview:

If you have a modern phone, you can capture a visual scene as a photograph, alter it, send it instantly to a relative in another country, and store it to look at for years to come. None of this is currently possible in olfaction. In vision and audition, we know how to map physical properties to perception: wavelength translates into color and frequency translates into pitch. By contrast, the mapping from chemical structure to olfactory percept is poorly understood, limiting our ability to describe and control odors. This, in turn, limits our ability to understand how the olfactory system encodes perception. Olfaction has a higher dimensionality than the other senses, but recent models have shown that, with enough data, machine learning techniques can predict human perception from molecular structure. We hypothesized that the rate-limiting step in building a model that predicts human perception from molecular structure is the collection of high-quality psychophysical data. Here I will discuss our work toward predicting the intensity and character of both single molecules and complex mixtures. This will allow us to predict the odor of novel molecules and mixtures and will pave the way toward digitizing odors.

10:25-10:45
Session: Olfactory Mixtures Prediction Challenge
Overview Talk
Format: In person


Authors List:

  • Pablo Meyer Rojas
10:45-11:05
Session: Olfactory Mixtures Prediction Challenge
Invited Presentation: Advancing Odor Mixture Discriminability with Pretrained Embeddings and Boosting
Confirmed Presenter: Yikun Han, University of Michigan, United States

Format: In Person


Authors List:

  • Yikun Han, University of Michigan, United States
  • Zehua Wang, University of Michigan, United States
  • Stephen Yang, University of Michigan, United States
  • Ambuj Tewari, University of Michigan, United States

Presentation Overview:

We present a novel framework for predicting odor mixture discriminability that integrates pretrained embeddings with boosting models. Our approach uses message passing neural networks to generate single-molecule embeddings, followed by an exponential transformation that combines them into mixture embeddings. These mixture embeddings are then passed to CatBoost to predict perceptual distances. We incorporated dataset-specific weighting and data augmentation to improve model performance. This methodology highlights the value of combining chemical and perceptual information in machine olfaction and may improve molecular representation and odor prediction models.
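The abstract does not define the "exponential transformation" that turns single-molecule embeddings into a mixture embedding. The sketch below assumes one plausible form, per-dimension log-mean-exp pooling, and builds symmetric pair features that a boosting model such as CatBoost could consume. The function names and the `temperature` parameter are hypothetical, and the MPNN embeddings are replaced by plain lists of floats.

```python
import math
from typing import List, Sequence

Vector = List[float]


def exp_pool(embeddings: Sequence[Sequence[float]], temperature: float = 1.0) -> Vector:
    """Pool single-molecule embeddings into one mixture embedding.

    Assumed form (hypothetical): per dimension, log(mean(exp(t * e))) / t.
    Small t behaves like the mean; large t approaches the maximum.
    """
    if not embeddings:
        raise ValueError("a mixture needs at least one molecule")
    pooled = []
    for d in range(len(embeddings[0])):
        values = [temperature * e[d] for e in embeddings]
        m = max(values)  # subtract the max so exp() cannot overflow
        log_sum = m + math.log(sum(math.exp(v - m) for v in values))
        pooled.append((log_sum - math.log(len(values))) / temperature)
    return pooled


def pair_features(mix_a: Sequence[Sequence[float]], mix_b: Sequence[Sequence[float]]) -> Vector:
    """Order-invariant features for one mixture pair: |a - b| per dimension plus cosine distance."""
    a, b = exp_pool(mix_a), exp_pool(mix_b)
    diff = [abs(x - y) for x, y in zip(a, b)]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cosine_distance = 1.0 - dot / norm if norm else 0.0
    return diff + [cosine_distance]
```

Under this assumed form, `temperature` controls how strongly the component with the largest value in each dimension dominates the mixture representation.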

11:05-11:25
Session: Olfactory Mixtures Prediction Challenge
Invited Presentation: Predicting Olfactory Mixture Similarity: The D2Smell team
Confirmed Presenter: Vahid Satarifard, Yale Institute for Network Science, Yale University, United States

Format: In Person


Authors List:

  • Vahid Satarifard, Yale Institute for Network Science, Yale University, United States
  • Wenjie Yin, KTH Royal Institute of Technology, Sweden
  • Mårten Björkman, KTH Royal Institute of Technology, Sweden
  • Kobi Snitz, Department of Neurobiology, Weizmann Institute of Science, Israel
  • Danica Kragic, KTH Royal Institute of Technology, Sweden
  • Nicholas Christakis, Yale Institute for Network Science, Yale University, United States
  • Noam Sobel, Department of Neurobiology, Weizmann Institute of Science, Israel
  • Aharon Ravia, Cornell Tech, Cornell University, United States

Presentation Overview:

A key goal of sensory sciences is to establish the rules linking shifts in physical stimulus structure to predictable shifts in stimulus perception. These rules are currently better defined in vision and audition than in olfaction, and their absence in olfaction is one of the key factors preventing the digitization of this sensory domain. The quest to establish such rules has shifted from efforts to predict odorant verbal labels to efforts to predict pairwise perceptual similarity between stimuli. It is widely agreed that a strong framework for perceptual similarity will inherently allow stimulus labeling and, ultimately, digitization. With this in mind, the DREAM Olfactory Mixtures Prediction Challenge aimed to highlight models that predict the perceptual similarity of pairs of molecular mixtures from a dataset curated from multiple studies. We developed a machine learning model to predict olfactory mixture similarity from chemical descriptors and model-derived olfactory semantic labels. Given the limited availability of perceptual data on molecular mixtures, we applied data augmentation, adding or removing molecules in mixtures, to enhance the model's learning capacity. Our dataset consists of 850 unique mixtures comprising 235 distinct molecules. From these, 780 pairs of mixtures were used to retrieve similarity ratings, which data augmentation expanded to over 10,000 entries. We used 21 physicochemical descriptors (as done by Snitz et al.) to predict 19 semantic descriptors (chosen by Keller et al.) for each molecule. These molecular descriptors were then linearly combined across the molecules within each mixture. Additionally, we employed an aroma-chemical pair model to predict 25 semantic descriptors for molecular pairs, which were also linearly combined to represent the mixture.
We then trained an ensemble of ten XGBoost models using hyperparameter optimization and 10-fold cross-validation, and averaged their outputs to generate our predictions. Evaluated by the DREAM challenge organizers using 10,000 bootstrap iterations on a held-out test set of 46 mixture-pair comparisons, our model achieved an MSE of 0.0062 and a Pearson correlation of 0.60752. Our model demonstrates strong predictive power for olfactory mixture similarity, and we anticipate further improvements with larger and more detailed datasets. Such rules may ultimately allow for the digitization of smell.
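The organizers' exact bootstrap procedure is not described here. A minimal sketch, assuming they resampled the held-out mixture-pair comparisons with replacement and averaged the per-replicate scores, could look like this; `bootstrap_scores` and its parameters are hypothetical names, not the challenge's scoring code.

```python
import random
from statistics import mean
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when a resample is constant
    return sxy / (sxx * syy) ** 0.5


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    return mean((a - b) ** 2 for a, b in zip(x, y))


def bootstrap_scores(pred: Sequence[float], truth: Sequence[float],
                     n_boot: int = 10_000, seed: int = 0):
    """Mean Pearson r and mean MSE over bootstrap resamples of the test comparisons."""
    rng = random.Random(seed)
    n = len(pred)
    rs: List[float] = []
    errs: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample comparisons with replacement
        p = [pred[i] for i in idx]
        t = [truth[i] for i in idx]
        r = pearson(p, t)
        if r == r:  # NaN != NaN, so this skips undefined correlations
            rs.append(r)
        errs.append(mse(p, t))
    return mean(rs), mean(errs)
```

With only 46 comparisons, individual resamples vary considerably, which is one reason to report averaged bootstrap scores rather than a single split.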

11:25-11:45
Session: Olfactory Mixtures Prediction Challenge
Invited Presentation: Team belfaction: Leveraging Perceptual Embeddings, Mixture Representations, and Decision Forests
Confirmed Presenter: Pedro Ilídio, KU Leuven and itec (imec), Belgium

Format: In Person


Authors List:

  • Pedro Ilídio, KU Leuven and itec (imec), Belgium
  • Robbe D’hondt, KU Leuven and itec (imec), Belgium
  • Achilleas Ghinis, KU Leuven and itec (imec), Belgium
  • Jasper de Boer, KU Leuven and itec (imec), Belgium
  • Felipe Kenji Nakano, KU Leuven and itec (imec), Belgium
  • Alireza Gharahighehi, KU Leuven and itec (imec), Belgium
  • Celine Vens, KU Leuven and itec (imec), Belgium

Presentation Overview:

Despite significant advances in machine learning for visual and auditory data, developing effective models of olfactory phenomena remains a major challenge due to the complexity of biological olfactory systems. Previous studies have explored how olfactory attributes can be determined from molecular structure, but computational methods still struggle to model how different odors interact and blend in mixtures. In this work, we investigate novel strategies for merging individual molecular attributes into rich vector representations of chemical mixtures, with the final goal of predicting human-perceived olfactory similarity between mixture pairs. Several representation techniques were explored at both the molecule and mixture levels, including deep learning embeddings, tree embeddings, and hypergraph-based features. Our best model combined perception embeddings from a directed message-passing neural network with a diverse set of statistical descriptors of mixture pairs, using an ensemble of Extremely Randomized Trees as the final estimator. This approach was recognized as a top performer in the DREAM Olfactory Mixtures Prediction Challenge 2024, demonstrating the importance of detailed mixture representations for capturing the nuanced nature of olfactory perception.
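The abstract does not list which statistical descriptors were used. As an illustration only, the sketch below assumes per-dimension mean, standard deviation, minimum, and maximum over each mixture's component embeddings, then makes the pair features order-invariant so that swapping the two mixtures gives the same input. The function names are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def mixture_stats(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension mean, population std, min and max over a mixture's component embeddings."""
    feats: List[float] = []
    for column in zip(*embeddings):  # one column per embedding dimension
        feats.extend([fmean(column), pstdev(column), min(column), max(column)])
    return feats


def pair_descriptor(mix_a: Sequence[Sequence[float]], mix_b: Sequence[Sequence[float]]) -> List[float]:
    """Swapping the two mixtures yields identical features: |a - b| and a + b are symmetric."""
    a, b = mixture_stats(mix_a), mixture_stats(mix_b)
    return [abs(x - y) for x, y in zip(a, b)] + [x + y for x, y in zip(a, b)]
```

Rows built this way could be fed to an Extremely Randomized Trees ensemble such as scikit-learn's `ExtraTreesRegressor`, omitted here to keep the sketch dependency-free.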

11:45-12:00
Session: Olfactory Mixtures Prediction Challenge
Invited Presentation: A Deep Metric Learning Approach for Olfactory Mixture Discriminability
Confirmed Presenter: Matej Hladiš, Université Côte d'Azur, France

Format: Live Stream


Authors List:

  • Maxence Lalis, Université Côte d'Azur, France
  • Sébastien Fiorucci, Université Côte d'Azur, France
  • Jérémie Topin, Université Côte d'Azur, France
  • Matej Hladiš, Université Côte d'Azur, France

Presentation Overview:

Mammals perceive and interpret a myriad of olfactory stimuli through a sophisticated coding mechanism involving interactions between odorant molecules and hundreds of olfactory receptors (ORs). These interactions generate unique patterns of activated receptors, known as the combinatorial code, which the brain interprets as distinct smells. Olfactory input is thus conveyed by odorant molecules and encoded through this combinatorial code. To solve the odorant mixture discriminability task, we build on recent advances in predicting odor quality and OR activity. Using state-of-the-art methods by Hladis et al. and Lee et al., we combine predicted combinatorial codes and Principal Odor Map (POM) embeddings, respectively, to represent the mixture components. Additionally, to account for synergistic and antagonistic effects within mixtures, we introduce a learned aggregation function that combines the information from the individual components. Finally, we incorporate a metric learning approach based on Siamese networks to capture the nuanced relationships between different odorant combinations. Overall, our methodology allows more accurate and reliable discrimination of olfactory mixtures, overcoming the limitations of classical distance metrics.
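The core Siamese idea is that both mixtures pass through the same encoder with shared weights, so the learned distance is symmetric by construction. The sketch below assumes softmax attention pooling as the learned aggregation function and a squared-error loss against the rated perceptual distance; `w_score` and `w_proj` are hypothetical stand-ins for trained parameters, each component vector could (for example) concatenate a predicted combinatorial code with a POM embedding, and no training loop is shown.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def encode_mixture(components: Matrix, w_score: Sequence[float], w_proj: Matrix) -> List[float]:
    """Learned aggregation (assumed: attention pooling) followed by a linear projection."""
    scores = [sum(w * x for w, x in zip(w_score, c)) for c in components]
    alphas = softmax(scores)  # one weight per component, summing to 1
    pooled = [sum(a * c[d] for a, c in zip(alphas, components)) for d in range(len(components[0]))]
    return [sum(w * p for w, p in zip(row, pooled)) for row in w_proj]


def siamese_distance(mix_a: Matrix, mix_b: Matrix, w_score: Sequence[float], w_proj: Matrix) -> float:
    """Both mixtures share the same encoder weights, which is what makes the network Siamese."""
    za = encode_mixture(mix_a, w_score, w_proj)
    zb = encode_mixture(mix_b, w_score, w_proj)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))


def distance_loss(predicted: float, perceptual: float) -> float:
    """Squared error between the embedding distance and the rated perceptual distance."""
    return (predicted - perceptual) ** 2
```

Because the attention weights depend on the components themselves, the pooled representation can up- or down-weight individual molecules, which is one way a learned aggregation can express non-additive mixture effects.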

13:00-14:00
Invited Presentation: Computational Pharmacogenomics: An Open Science Approach
Confirmed Presenter: Benjamin Haibe-Kains, Princess Margaret Cancer Centre; University of Toronto, Canada

Format: In Person


Authors List:

  • Benjamin Haibe-Kains, Princess Margaret Cancer Centre; University of Toronto, Canada

Presentation Overview:

Machine learning is playing an increasingly important role in the development of predictors of therapeutic response in cancer, yet translation from research to clinical settings remains hindered by the absence of standards for model development and evaluation. In this talk, we introduce a framework of seven hallmarks to consider when developing drug response models for clinical applications: Data Relevance and Actionability, Expressive Architecture, Standardized Benchmarking, Demonstrated Generalizability, Mechanistic Interpretability, Accessibility and Reproducibility, and Fairness, each underpinned by specific ethical considerations. We assess current progress against these benchmarks and propose a unified approach to help the community identify gaps in the development of clinically relevant predictive models. By engaging the broader scientific community, spanning cancer researchers, regulators, clinicians, and policymakers, we aim to foster consensus-driven standards that will expedite the clinical adoption of predictive models in oncology.

14:00-14:30
Session: Placental Clock Challenge
Overview Talk
Format: In person


Authors List:

  • Adi Tarca
14:30-14:50
Session: Placental Clock Challenge
Invited Presentation: Top Team - Refining placental clock measured by the Infinium HumanMethylation-450/850 BeadChip arrays by correcting collider-restriction bias
Confirmed Presenter: Herdiantri Sufriyana, National Yang-Ming Chiao-Tung University, Taiwan

Format: In Person


Authors List:

  • Herdiantri Sufriyana, National Yang-Ming Chiao-Tung University, Taiwan
  • Emily Chia-Yu Su, National Yang-Ming Chiao-Tung University, Taiwan

Presentation Overview:

Background: The latest placental epigenetic clocks have been claimed to be robust when applied to cases with either maternal or fetal adverse conditions. However, they did not sufficiently correct for selection bias, which might reduce their accuracy in populations with different distributions of these conditions. We aimed to develop a multistage, multivariable prediction model of gestational age (GA) using beta values at individual CpG sites, correcting for collider-restriction bias. Methods: A less biased estimate of the placental clock Y, i.e., E[Y] = E[Y | S = 1], was achieved by blocking any paths between Y and S using conditions X, including pregnancy complications and fetal malformation, that led to pregnancy termination or chorionic villus sampling. First, we developed a normal-GA model stacked with: (1) no residual model (normal-GA); (2) a residual-GA model based on condition risks estimated by imputation (Res-GA) or by prediction models (Resfull-GA); and (3) a Res-GA model fully (Res-CR-GA) or partially (Res-Seq-GA) driven by clinical knowledge to deal with competing risks. Two machine learning algorithms were applied: (1) elastic net regression; and (2) random forest (RF). Second, we developed a normal-GA model stacked with: (1) a residual-GA model directly using beta values among samples without and with each GA subgroup (Res-CPG-GA); (2) a residual-GA model with beta value-predicted GA for each condition (Res-Conds-GA); and (3) a stacked model of Res-Conds-GA followed by predicting the subsequent residual using the Res-CPG-GA pipeline (Res-Comb-GA), or the opposite order (Res-CPG-Comb-GA). A model was selected as best if it: (1) beat the validation set's top performer in terms of root mean squared error (RMSE), mean absolute error (MAE), and Pearson's correlation coefficient (r) in most of 15 phenotypic/dataset subgroups of the training set (n=1742); and (2) had the lowest RMSE in the validation set (n=100).
Results: The Res-Comb-GA model won in all training-set subgroups except one dataset of origin. Its RMSE/MAE/r values in the validation set were 1.077/0.888/0.966 (ranked second). Eventually, the Res-Comb-GA model ranked first in the test set (RMSE=1.245). Conclusions: Correcting for collider-restriction bias substantially improved placental clock accuracy and robustness. The Res-Comb-GA model can be further refined by adding more conditions, improving predictive performance for the conditions, and improving the condition-wise GA estimations.
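The stacking pattern that recurs in these models, a base GA model whose residuals are predicted by a second model and added back, can be sketched as follows. This is a simplified illustration, assuming a single predictor per stage and ordinary least squares in place of the elastic net and random forest learners named above; `x_resid` stands for a hypothetical per-sample input such as an estimated condition risk. It does not reproduce the bias-correction logic itself.

```python
import math
from statistics import fmean
from typing import Callable, Sequence


def fit_line(x: Sequence[float], y: Sequence[float]) -> Callable[[float], float]:
    """One-predictor ordinary least squares y = a + b * x (stand-in for elastic net / RF)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)  # must be non-zero
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return lambda v: a + b * v


def fit_residual_stack(x_base: Sequence[float], x_resid: Sequence[float], ga: Sequence[float]):
    """Stage 1 predicts GA; stage 2 predicts what stage 1 got wrong; the two are summed."""
    base = fit_line(x_base, ga)
    residuals = [g - base(v) for v, g in zip(x_base, ga)]
    correction = fit_line(x_resid, residuals)
    return lambda vb, vr: base(vb) + correction(vr)


def rmse(pred: Sequence[float], truth: Sequence[float]) -> float:
    return math.sqrt(fmean((p - t) ** 2 for p, t in zip(pred, truth)))
```

Fitting the second stage on residuals lets it capture variation in GA that the first-stage inputs cannot explain, without refitting the base model.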

14:50-15:10
Session: Placental Clock Challenge
Invited Presentation: #2 Team - Gestational Age prediction using CpG clusters in regularized regression
Confirmed Presenter: Tushar Patel, Leibniz Institute of Aging - Fritz Lipmann Institute, Germany

Format: In Person


Authors List:

  • Tushar Patel, Leibniz Institute of Aging - Fritz Lipmann Institute, Germany

Presentation Overview:

Recent studies have shown that DNA methylation clocks can accurately estimate an individual's age. These clocks, which are based on an essential part of the epigenome, can also improve our understanding of age-related processes in the body. They may also help diagnose, predict, and manage diseases and complications. One important application is accurately estimating gestational age (GA) to improve obstetric care. In this study, a placental clock model was developed to predict GA using DNA methylation data.

To ensure robustness, the approach modeled CpG clusters using their centroid values instead of individual CpG probes' methylation values. This reduces the impact of technical noise and other factors unrelated to gestational aging.
After filtering for CpG probes strongly correlated with GA, k-means clustering was applied to group them, and each cluster was represented by its centroid. An elastic net linear regression model was then built using the centroid methylation values as input variables. Model parameters were determined by a comprehensive cross-validation approach in which one dataset from each pregnancy trimester was held out for testing and the remaining samples were used for training. The final model comprised 186 clusters representing 473 CpG probes and achieved a root mean square error (RMSE) of 2.65 weeks on the training data and 1.43 weeks on independent test data.
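The clustering step can be sketched as follows: each probe is treated as a vector of beta values across samples, probes are grouped with k-means, and each cluster centroid becomes one feature per sample. The sketch assumes plain Lloyd's algorithm with a deterministic farthest-point initialization and omits the GA-correlation filter and the elastic net fit; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int, n_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    # Farthest-point initialization: start from the first probe, then repeatedly
    # add the probe farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each probe joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sqdist(p, centroids[j])) for p in points]
        # Update step: each centroid becomes the mean of its member probes.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def cluster_features(beta_by_probe: Sequence[Sequence[float]], k: int) -> Tuple[List[int], List[Vector]]:
    """Return probe labels and one row per sample holding the k cluster-centroid methylation values."""
    labels, centroids = kmeans(beta_by_probe, k)
    return labels, [list(row) for row in zip(*centroids)]
```

The resulting per-sample rows would then serve as inputs to an elastic net regression, for example scikit-learn's `ElasticNet`; averaging probes within a cluster is what dampens probe-level technical noise.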

15:10-15:30
Session: Placental Clock Challenge
Invited Presentation: #3 Team
Confirmed Presenter: Ibrahim Alsaggaf, School of Computing and Mathematical Sciences, Birkbeck, University of London

Format: Live Stream


Authors List:

  • Ibrahim Alsaggaf, School of Computing and Mathematical Sciences, Birkbeck, University of London
  • Cen Wan, School of Computing and Mathematical Sciences, Birkbeck, University of London

Presentation Overview:

Contrastive learning has recently drawn a great deal of attention given its success in a variety of prediction tasks across different domains. In this Placental Clock DREAM Challenge, we used an enhanced contrastive learning approach based on Gaussian noise augmentation to predict placental gestational age. Our approach does not require prior knowledge, heavy pre-processing, or feature selection, yet it achieved competitive performance on both the leaderboard and the test data.
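Gaussian-noise augmentation with a contrastive objective can be illustrated by creating two noisy views of each sample and scoring them with the NT-Xent loss popularized by SimCLR. The authors' "enhanced" augmentation and their encoder are not described in the abstract, so this sketch assumes plain isotropic noise, an identity encoder, and cosine similarity; all names and the temperature `tau` are illustrative.

```python
import math
import random
from typing import List, Sequence


def gaussian_view(x: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """One augmented view: add independent N(0, sigma^2) noise to every feature."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nt_xent(z1: Sequence[Sequence[float]], z2: Sequence[Sequence[float]], tau: float = 0.5) -> float:
    """NT-Xent loss: view i of z1 should match view i of z2 better than the other 2N - 2 views."""
    z = list(z1) + list(z2)
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same sample
        logits = [cosine(z[i], z[j]) / tau for j in range(2 * n) if j != i]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_denominator - cosine(z[i], z[pos]) / tau
    return total / (2 * n)


# Example: two noisy views of a small batch of (hypothetical) methylation profiles.
rng = random.Random(0)
batch = [[0.2, 0.8, 0.5], [0.9, 0.1, 0.4]]
view1 = [gaussian_view(x, 0.05, rng) for x in batch]
view2 = [gaussian_view(x, 0.05, rng) for x in batch]
loss = nt_xent(view1, view2)
```

In a full pipeline an encoder would map each view to `z` before the loss, and the encoder's representations would then feed a GA regressor.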

15:55-16:25
Session: Personal Environment and Genes Study (PEGS) Challenge
Overview Talk
Format: In person


Authors List:

  • Alison Motsinger-Reif
  • Farida Akhtari
16:25-16:55
Session: Personal Environment and Genes Study (PEGS) Challenge
Invited Presentation: Modeling interactions between genetics and environment to guide disease risk prediction, biological discovery, and prevention strategies
Confirmed Presenters: Alessandro Lussana, EMBL-EBI, United Kingdom; Federico Marotta, EMBL, Germany

Format: In Person


Authors List:

  • Alessandro Lussana, EMBL-EBI, United Kingdom
  • Federico Marotta, EMBL, Germany
  • Evangelia Petsalaki, EMBL-EBI, United Kingdom
  • Peer Bork, EMBL, Germany

Presentation Overview:

In the context of the PEGS DREAM Challenge, our team focused on modeling the interactions between genetic and environmental factors to predict the risk of hypercholesterolemia. By integrating health and exposure survey data with genetic information, we developed a machine learning model that combines a random forest classifier with polygenic risk scores (PRS) to assess individual risk. The results demonstrate the importance of combining genetic and environmental data to improve disease risk prediction, and our approach shows promise for improving the accuracy of hypercholesterolemia classification.
Our work emphasizes that identifying gene-environment (GxE) interactions matters beyond improving predictions. Understanding these interactions will be crucial for the future of human genetics, as they can guide the formulation of hypotheses about disease etiology in complex traits. We outline a strategy that can be applied to the PEGS cohort to systematically discover GxE interactions, which will be essential for gaining mechanistic insight into traits like hypercholesterolemia and could lead to new strategies for disease prevention and personalized medicine.
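A polygenic risk score is conventionally a weighted sum of effect-allele dosages. The sketch below computes one and places it alongside environmental or survey features, one assumed way of letting a random forest use both; the variant IDs, effect sizes, and function names are hypothetical, and the team's actual feature set and PRS weights are not given in the abstract.

```python
from typing import Dict, List, Sequence


def polygenic_risk_score(dosages: Dict[str, int], effect_sizes: Dict[str, float]) -> float:
    """Sum of effect size x effect-allele dosage (0, 1 or 2) over variants present in both inputs."""
    shared = sorted(dosages.keys() & effect_sizes.keys())
    return sum(effect_sizes[v] * dosages[v] for v in shared)


def feature_row(dosages: Dict[str, int], effect_sizes: Dict[str, float],
                exposures: Sequence[float]) -> List[float]:
    """PRS placed alongside environmental/survey features (assumed input layout for a random forest)."""
    return [polygenic_risk_score(dosages, effect_sizes)] + list(exposures)
```

Keeping the PRS as a single column next to exposure variables lets tree splits combine genetic and environmental information, although detecting specific GxE interactions would still require dedicated tests.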

16:55-17:25
Session: Personal Environment and Genes Study (PEGS) Challenge
Invited Presentation: Presentation of the results of the Team SpiderBobs for the PEGS Dream Challenge
Confirmed Presenter: Johannes Falk, Constructor University Bremen, Germany

Format: Live Stream


Authors List:

  • Johannes Falk, Constructor University Bremen, Germany
  • Jyoti Jyoti, Constructor University Bremen, Germany
  • Venetia Voutsa, Constructor University Bremen, Germany
  • Ali Salehzadeh-Yazdi, Constructor University Bremen, Germany
  • Eda Cakir, Constructor University Bremen, Germany
  • Marc-Thorsten Hütt, Constructor University Bremen, Germany

Presentation Overview:

The PEGS DREAM Challenge addressed the complex disease hypercholesterolemia. Participating teams had to develop models that identify the factors associated with hypercholesterolemia, in order to flag individuals at high risk and improve the general understanding of the disease. The primary data source was the multidimensional genotype, phenotype, and environmental data from the Personalized Environment and Genes Study (PEGS). The challenge consisted of two separate competitions. In Task 1, the goal was to predict the risk of hypercholesterolemia from the provided PEGS data. The goal of Task 2, an ideation challenge, was to develop new hypotheses to improve understanding of the etiology of hypercholesterolemia.
Here, we present the results of team SpiderBobs, which placed first in both tasks. In particular, we show how our team from Constructor University Bremen approached the challenge, how we developed our model, and which hypothesis we arrived at.

17:25-18:00
Panel: DREAM and LLMs and BMFMs
Format: In person


Authors List: