Schedule subject to change
All times listed are in BST
Monday, July 21st
14:00-14:15
Invited Presentation: Benchmarking foundation models in biology: where we are, and where we want to go with the community
Confirmed Presenter: Julio Saez-Rodriguez

Room: 12
Format: In person

Moderator(s): Gustavo Stolovitzky


Authors List:

  • Julio Saez-Rodriguez

Presentation Overview:

The promise of AI to deliver powerful solutions to biomedical and healthcare-related problems needs to be accompanied by transparent evaluation and proof of reproducibility of the corresponding algorithms. Machine learning (ML) algorithms are typically evaluated by assessing their performance on prediction tasks. Applying this benchmarking paradigm to foundation models (FMs) is not straightforward. FMs are typically trained using self-supervised methods that don’t need labeled ground-truth data, and therefore the models are embodiments of the phenomena that gave rise to the data. Usually, the training of FMs is followed by fine-tuning the models for specific tasks using more traditional ML methodologies, and the fine-tuned models can be benchmarked like any other ML model. However, such an assessment would fail to elucidate whether possible failures reside in the FM or in the fine-tuned model. In this era of FMs, there is a need to rethink rigorous evaluation, both in what it means to validate and in how we validate. One strategy could be a concerted effort by the communities developing and using these models to crowdsource microtasks that continuously test the limits of the models from every possible perspective within their domain of competence. The aim of this DREAM session at ISMB is to explore this strategy together and to define, as a community, a roadmap for moving forward with such a critically needed benchmark of foundation models.

14:15-14:45
Invited Presentation: Building Foundation Models for Single-cell Omics and Imaging
Room: 12
Format: In person

Moderator(s): Gustavo Stolovitzky


Authors List:

  • Bo Wang

Presentation Overview:

Keynote: This talk delves into the innovative utilization of generative AI in propelling biomedical research forward. By harnessing single-cell sequencing data, we developed scGPT, a foundational model that extracts biological insights from an extensive dataset of over 33 million cells. Analogous to how words form text, genes define cells, effectively bridging the technological and biological realms. The strategic application of scGPT via transfer learning significantly boosts its efficacy in diverse applications such as cell-type annotation, multi-batch integration, and gene network inference. Additionally, the talk will spotlight MedSAM, a state-of-the-art segmentation foundational model. Designed for universal application, MedSAM excels across various medical imaging tasks and modalities. It showcased unprecedented advancements in 30 segmentation tasks, outperforming existing models considerably. Notably, MedSAM possesses the unique ability for zero-shot and few-shot segmentation, enabling it to identify previously unseen tumor types and swiftly adapt to novel imaging modalities. Collectively, these breakthroughs emphasize the importance of developing versatile and efficient foundational models. These models are poised to address the expanding needs of imaging and omics data, thus driving continuous innovation in biomedical analysis.

14:45-15:00
Invited Presentation: Predicting Perturbation Effects: Are We Really There?
Room: 12
Format: In person

Moderator(s): Gustavo Stolovitzky


Authors List:

  • Maria Brbic

Presentation Overview:

TBA

15:00-15:15
Invited Presentation: AI Alliance: Benchmarking foundation models for drug discovery
Confirmed Presenter: Pablo Meyer-Rojas

Room: 12
Format: In person

Moderator(s): Gustavo Stolovitzky


Authors List:

  • Pablo Meyer-Rojas

Presentation Overview:

The AI Alliance is focused on fostering an open community and enabling developers and researchers to accelerate responsible innovation in AI while ensuring scientific rigor, trust, safety, security, diversity, and economic competitiveness. We bring together a critical mass of compute, data, tools, and talent to accelerate and advocate for open innovation in AI. Together with DREAM challenges, we aim to create a world-class research community that harnesses the potential of AI foundation models, transforms the field of drug discovery, and accelerates scientific progress by driving interdisciplinary collaboration on AI-powered drug discovery projects in the open. IBM Research biomedical foundation model (BMFM) technologies leverage multi-modal data of different types, including drug-like small molecules and proteins (covering a total of more than a billion molecules), as well as DNA and single-cell RNA sequence data.

15:15-15:30
Invited Presentation: Benchmarking in Service of Virtual Cell Models: Challenges, Opportunities, and a Path Forward
Room: 12
Format: In person

Moderator(s): Gustavo Stolovitzky


Authors List:

  • Katrina Kalantar

Presentation Overview:

TBA

15:30-16:00
Invited Presentation: Deep learning models of regulatory DNA: A critical analysis of model design choices
Confirmed Presenter: Anshul Kundaje

Room: 12
Format: In person

Moderator(s): Gustavo Stolovitzky


Authors List:

  • Anshul Kundaje

Presentation Overview:

Keynote: Gene expression is tightly regulated by complexes of proteins that interpret complex sequence syntax encoded in regulatory DNA. Genetic variants influencing traits and diseases often disrupt this syntax. Several deep learning models have been developed to decipher regulatory DNA and identify functional variants. Most models use supervised learning to map sequences to cell-specific regulatory activity measured by genome-wide molecular profiling experiments. The general trend in model design is towards larger, multi-task, supervised models with expansive receptive fields. Further, emerging self-supervised DNA language models (DNALMs) promise foundational representations for probing and fine-tuning on limited datasets. However, rigorous evaluation of these models against lightweight alternatives on biologically relevant tasks has been lacking. In this talk, I will demonstrate that lightweight, single-task CNNs are competitive with or significantly outperform massive supervised transformer models and fine-tuned DNALMs on critical prediction tasks. Additionally, I will show that the multi-task, supervised models learn causally inconsistent features, impairing counterfactual prediction, interpretation, and design. In contrast, our lightweight, single-task models are causally consistent and provide robust, interpretable insights into regulatory syntax and genetic variation, enabling scalable novel discoveries.

16:40-16:55
Invited Presentation: Benchmarking Multi-Modal Large Language Models for Metastatic Breast Cancer Prognosis
Confirmed Presenter: Justin Guinney

Room: 12
Format: In person

Moderator(s): Gustavo Stolovitzky


Authors List:

  • Justin Guinney

Presentation Overview:

Inputs into cancer prognostic models are primarily structured data, such as demographic and clinicopathological features, and lack the richer temporal context often found in unstructured clinical notes. We hypothesize that creating a temporal clinical patient note from structured data that preserves longitudinal and clinical contextual information, and coupling it with a large language model (LLM) that is trained to prognosticate overall survival (OS), may improve model accuracy with an interpretable embedding space. In this study, we benchmark different LLMs and fine-tuning strategies to develop optimal models for predicting overall survival from time of metastasis in a large cohort of de-identified patients with metastatic breast cancer.

16:55-17:30
Panel: Crowdsourcing Experiment
Room: 12
Format: In person

Moderator(s): Gustavo Stolovitzky


Authors List:

  • Gustavo Stolovitzky

Presentation Overview:

We will collectively conduct a small-scale community experiment to simulate a large-scale crowdsourcing initiative to benchmark foundation models.

17:30-18:00
Panel: Evaluating and Benchmarking Foundation Models
Room: 12
Format: In person

Moderator(s): Gustavo Stolovitzky


Authors List:

  • Bo Wang, Anshul Kundaje, Justin Guinney, Katrina Kalantar, Maria Brbic, Luca Foschini

Presentation Overview:

Speakers will give their opinions about best practices for evaluating foundation models in biomedicine and engage in conversation with attendees.