This full-day track will explore the transformative potential of cloud and emerging technologies in biomedical research, focusing on quantum computing, digital twins, artificial intelligence (AI), and NIH cloud cyberinfrastructure initiatives. Hosted by the National Institutes of Health (NIH) Office of Data Science Strategy (ODSS) and Center for Information Technology (CIT), the session aims to bridge cutting-edge research and practical applications in areas like drug discovery, precision medicine, and personalized healthcare. It will bring together researchers, practitioners, and technology leaders to discuss the latest advancements and how they can revolutionize biomedical research and clinical applications.

Schedule subject to change
All times listed are in BST
Monday, July 21st
11:20-11:25
Opening Remarks for NIH Track
Room: 11A
Format: In person


Authors List:

  • Susan Gregurick, PhD
11:25-11:44
Invited Presentation: Digital Twins: Functional and Mechanistic Reconstruction of Multiple Myeloma Tumors
Room: 11A
Format: In person


Authors List:

  • Ariosto S. Silva, PhD
11:44-12:03
Invited Presentation: Network Science for Cyber-physical Twinning of Human Heart
Room: 11A
Format: In person


Authors List:

  • Timothy Kuo
12:03-12:22
Invited Presentation: A Digital Twins Prototype for Monitoring and Predicting Dynamic Diet-related Health Conditions
Room: 11A
Format: In person


Authors List:

  • Hua Fang, PhD
  • Honggang Wang, PhD
12:22-12:41
Invited Presentation: Multiscale Digital-Twin Modeling and Estimation with Indirect, Neurological Data
Room: 11A
Format: In person


Authors List:

  • Matthew F. Singh, PhD
12:41-13:00
Invited Presentation: Towards a Digital Twin Initiative for Neurodegenerative Diseases
Room: 11A
Format: In person


Authors List:

  • Karuna P Joshi, PhD
14:00-14:20
Advancing Discovery through GenAI and Scalable Infrastructure
Room: 11A
Format: In person

Moderator(s): Nick Weber


Authors List:

  • Sean D. Mooney, PhD

Presentation Overview:

Opening remarks from the NIH Center for Information Technology (CIT) will highlight how the STRIDES Initiative and Cloud Lab program are accelerating biomedical research through expanded access to scalable, secure cloud infrastructure. The Director will frame NIH’s vision for enabling FAIR data practices, fostering innovation through generative AI, and lowering technical barriers to empower the research community with next-generation tools and resources.

14:20-14:40
Invited Presentation: A Global Model for FAIR and Open Research: Scalable, Collaborative Infrastructure in Action
Room: 11A
Format: In person

Moderator(s): Nick Weber


Authors List:

  • Kristi Holmes, PhD

Presentation Overview:

Learn how Zenodo and InvenioRDM are advancing global biomedical research through scalable, AI-integrated, and FAIR-aligned infrastructure. This session highlights a powerful open-source model enabling reproducible science, seamless data sharing, and next-gen curation workflows—built by and for the research community.

14:40-15:00
Invited Presentation: Beyond Data Sharing: AI-Powered Solutions for Effective Biomedical Data Reuse
Room: 11A
Format: In person

Moderator(s): Nick Weber


Authors List:

  • Luca Foschini, PhD
  • Susheel Varma, Sage Bionetworks, United Kingdom

Presentation Overview:

This session will showcase AI-powered innovations that move biomedical research beyond data sharing toward meaningful data reuse. Using real-world platforms like Synapse.org and the Neurofibromatosis Data Portal, this session will highlight new tools for automating metadata harmonization, accelerating data discovery through conversational AI, and preparing datasets for machine learning. The talk will also explore emerging standards and open science challenges that are shaping the future of AI-ready biomedical data.

15:00-15:20
Invited Presentation: Reusable Cyberinfrastructure and Use Cases for the Cancer Research Data Commons (CRDC)
Room: 11A
Format: In person

Moderator(s): Nick Weber


Authors List:

  • Tanja Davidsen, PhD

Presentation Overview:

This session highlights how NIH’s Cancer Research Data Commons (CRDC) and supporting cyberinfrastructure are transforming cancer research through scalable, secure, and interoperable platforms. Learn how modular technologies, cloud-native services, and AI-driven tools are accelerating multi-modal data integration, enabling predictive analytics, and advancing personalized cancer care across the biomedical research ecosystem.

15:20-15:40
Invited Presentation: Power Your Kids First or INCLUDE Data Analysis on The Interoperable CAVATICA Cloud Analytics Workspace
Confirmed Presenter: Jared Rozowsky, Velsera, USA

Room: 11A
Format: In person

Moderator(s): Nick Weber


Authors List:

  • Jared Rozowsky, Velsera, USA
  • Surya Saha, Velsera, USA
  • Jack DiGiovanna, Velsera, USA
  • Marcia Fournier, NIH, USA
  • Concepcion Nierras, NIH, USA
  • Huiqing Li, NIH, USA

Presentation Overview:

The NIH-funded Gabriella Miller Kids First Data Resource Center (KF-DRC) and the INCLUDE Data Coordinating Center (INCLUDE DCC) provide harmonized datasets for researchers to investigate pediatric cancer, structural birth defects, and co-occurring conditions of Down Syndrome. Broadly, the goals of the two programs are to accelerate discovery, enhance healthcare, and change lives. CAVATICA is a data analysis and sharing platform designed to accelerate discovery in a scalable, cloud-based compute environment that is shared by both programs.

CAVATICA supports a unique integration with STRIDES, allowing all academic users on the platform to leverage the STRIDES discount without having to set up individual accounts. This setup means research dollars can go farther and drive us closer to a cure. Additionally, STRIDES has funded the KF and INCLUDE Cloud Credit Program. While researchers can use primary files from the data portals without incurring storage fees, data analysis and storage of secondary files do incur charges. To aid researchers, the Cloud Credit Program supports data generators and secondary data users who want to analyze data in the cloud, whether by leveraging existing tools or by developing their own.

To date, Kids First has approved 31 research projects and allocated $49,000 of funding. INCLUDE has approved 12 projects and allocated $22,000 of funding. Both programs have supported researchers, leading to multiple abstracts, presentations, and manuscripts. Some of the tools generated with the support of the Cloud Credit Program are also available on CAVATICA for others to use in the public apps gallery and are referenced in publications.

Applications to the Cloud Credit Program are open, and the program continues to support researchers in their endeavor to accelerate discovery, enhance healthcare, and change lives. We hold open office hours twice a week to help users get started (https://www.cavatica.org/contact-us) and run a 24/7 helpdesk.

As part of the KF and INCLUDE data ecosystems, CAVATICA allows researchers to use the cloud-based platform to access and analyze data from their respective data portals. Researchers can also integrate their own data or use the platform’s interoperability with the Cancer Research Data Commons, BioData Catalyst, or NCBI’s Sequence Read Archive, giving access to all data controlled by dbGaP. CAVATICA uses the Researcher Auth Service (RAS) to ensure proper authorization of files. All analyses can be shared with other users with appropriate permission controls. CAVATICA supports workflow languages (CWL and Nextflow) for ‘tasks’ or ‘interactive analysis’ using JupyterLab or RStudio, either through the graphical user interface or the API. Put together, CAVATICA allows researchers to securely access and analyze controlled data, accelerating discovery and driving cures.

15:40-16:00
Invited Presentation: The Gene Set Browser: An interoperable and AI/ML-ready tool for gene set analysis in the Common Fund Data Ecosystem (CFDE)
Confirmed Presenter: Julie Jurgens, Broad Institute of MIT and Harvard, USA

Room: 11A
Format: In person

Moderator(s): Nick Weber


Authors List:

  • Julie Jurgens, Broad Institute of MIT and Harvard, USA
  • Vlado Dancik, Broad Institute of MIT and Harvard, USA
  • Ryan Koesterer, Broad Institute of MIT and Harvard, USA
  • Patrick Smadbeck, Broad Institute of MIT and Harvard, USA
  • Dongkeun Jang, Broad Institute of MIT and Harvard, USA
  • Alex Shillin, Broad Institute of MIT and Harvard, USA
  • Trang Nguyen, Broad Institute of MIT and Harvard, USA
  • MacKenzie Brandes, Broad Institute of MIT and Harvard, USA
  • Jason Flannick, Broad Institute of MIT and Harvard, USA
  • Noël Burtt, Broad Institute of MIT and Harvard, USA

Presentation Overview:

Summary:
This session introduces the NIH Common Fund Data Ecosystem (CFDE) Gene Set Browser, an AI/ML-ready tool that connects diverse biomedical datasets to uncover novel gene-disease associations. Learn how this interoperable resource leverages Bayesian modeling and LLM-driven insights to power cross-program analysis, enable hypothesis generation, and drive discovery through FAIR, integrated data.

Abstract:
In an AI/ML-ready world, data interoperability and integration are becoming increasingly critical. The US National Institutes of Health (NIH) has risen to address these needs through major initiatives including the Common Fund Data Ecosystem (CFDE), which promotes accessibility, (re)use, and integration of NIH Common Fund programs’ data and resources through a cohesive ecosystem. By establishing common standards, data, tools, and infrastructure, CFDE serves as a model for data accessibility and interoperability.

As a compelling use case of how increased interoperability can drive data utility and scientific discovery, we present the CFDE Gene Set Browser, available through https://cfdeknowledge.org. This open-access web resource performs cross-program analyses of gene sets (lists of genes) and their relationship to additional genes, human phenotypes, and mechanisms. Importantly, this tool connects multiple disparate CFDE and non-CFDE programs, phenotypes, and data types. Through the Gene Set Browser, users can learn a) which gene sets capture important biological mechanisms, and b) which mechanisms are relevant to human health.

Gene sets are derived from six CFDE programs (GlyGen, GTEx, IDG, IMPC/KOMP2, LINCS, and MoTrPAC); intersections between CFDE programs; and differential expression analyses of CFDE transcriptomic data. Phenotypes include rare diseases from Orphanet (n=2,927) and common phenotypes/traits from the NHGRI Association to Function Knowledge Portal (n=1,237) and the EBI GWAS Catalog (n=2,213).

Relationships between phenotypes and gene sets were computed using PIGEAN (Priors Inferred from GEne ANnotations), a novel Bayesian method. PIGEAN jointly models the probability that each gene is involved in each phenotype, given the gene sets that contain the gene and the genome-wide association study (GWAS) statistics for variants near the gene. We applied PIGEAN to the above common and rare disease phenotypes/traits, in each case fitting a model using all CFDE gene sets, intersections of CFDE gene sets, and gene sets from the Mouse Genome Informatics database (MGI; >11,000 mouse model phenotypes) and MSigDB (pathway analyses). Users can obtain the estimated probability that the genes within each gene set are involved in disease. Additionally, the estimated probability that each gene is involved in disease is provided. For each result, an LLM enables users to explore hypotheses underlying each gene set-to-disease connection.
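
The idea of combining gene set membership with GWAS evidence can be illustrated with a toy calculation. The sketch below is not PIGEAN itself and makes no claim about its actual model. It assumes a simple log-odds formulation in which each gene set containing a gene shifts that gene's prior log-odds by a fixed weight, and the GWAS evidence enters as a log Bayes factor. The gene names, set names, weights, base prior, and Bayes factor values are all hypothetical.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def gene_prior(gene, gene_sets, set_weights, base_prior=0.05):
    """Prior probability that a gene is involved in a phenotype.

    Starts from a base prior and adds the (hypothetical) weight of every
    gene set that contains the gene, working on the log-odds scale.
    """
    log_odds = logit(base_prior)
    for name, members in gene_sets.items():
        if gene in members:
            log_odds += set_weights.get(name, 0.0)
    return sigmoid(log_odds)


def gene_posterior(prior, log_bayes_factor):
    """Update the prior with GWAS evidence given as a log Bayes factor."""
    return sigmoid(logit(prior) + log_bayes_factor)


# Hypothetical inputs, for illustration only.
gene_sets = {
    "exercise_upregulated": {"GENE_A", "GENE_B"},
    "unrelated_pathway": {"GENE_C"},
}
set_weights = {"exercise_upregulated": 1.5, "unrelated_pathway": 0.0}
gwas_log_bf = {"GENE_A": 2.0, "GENE_B": -0.5, "GENE_C": 2.0}

for gene in sorted(gwas_log_bf):
    prior = gene_prior(gene, gene_sets, set_weights)
    posterior = gene_posterior(prior, gwas_log_bf[gene])
    print(f"{gene}: prior={prior:.3f} posterior={posterior:.3f}")
```

In a joint model such as the one described above, the set weights would be inferred from data across all genes rather than fixed by hand as they are here.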

The Gene Set Browser has unearthed a wide range of known and novel candidate genes and mechanisms for human biological processes and diseases. For example, a gene set from MoTrPAC, a CFDE program that studies the molecular effects of exercise, reveals a list of genes that are upregulated in the blood of male rats after 2 weeks of exercise and their connection to reticulocyte count.

Through the Gene Set Browser, users can discover gene sets relevant to a wide range of research questions, explore connections between gene sets and other biological information (e.g., pathways and disease associations from external databases), and generate new hypotheses that might not be apparent from individual resources. Connecting CFDE gene sets to external resources is a powerful demonstration of how leveraging interoperability can foster scientific discovery.

16:40-16:59
Invited Presentation: Quantum Approximate Optimization for K-Area Clustering of Biological Data
Room: 11A
Format: In person


Authors List:

  • Fei Li, PhD
  • Yong Chen, PhD
16:59-17:18
Invited Presentation: Efficient quantum algorithm to simulate open systems through a single environmental qubit
Room: 11A
Format: In person


Authors List:

  • Vischi Michele, University of Trieste, Italy
  • Giovanni Di Bartolomeo, University of Trieste, Italy
  • Tommaso Feri, University of Trieste, Italy
  • Angelo Bassi, University of Trieste, Italy
  • Sandro Donadi, Queen's University Belfast, United Kingdom
17:18-17:37
Invited Presentation: Advancing quantum algorithms for elementary mode and metabolic flux analysis
Room: 11A
Format: In person


Authors List:

  • Chi Zhang, PhD
17:37-17:56
Invited Presentation: Quantum Computing for Modeling Epigenetic Plasticity in Cancer Evolution
Room: 11A
Format: In person


Authors List:

  • Ariosto S. Silva, PhD
17:56-18:00
Closing Remarks
Room: 11A
Format: In person


Authors List:

  • Sean D. Mooney, PhD