This full-day track will explore the transformative potential of cloud and emerging technologies in biomedical research, focusing on quantum computing, digital twins, artificial intelligence (AI), and NIH cloud cyberinfrastructure initiatives. Hosted by the National Institutes of Health (NIH) Office of Data Science Strategy (ODSS) and Center for Information Technology (CIT), the session aims to bridge cutting-edge research and practical applications in areas like drug discovery, precision medicine, and personalized healthcare. It will bring together researchers, practitioners, and technology leaders to discuss the latest advancements and how they can revolutionize biomedical research and clinical applications.

Schedule subject to change
All times listed are in BST

Monday, July 21st

11:20-11:40

Opening Remarks for NIH Track

Room: 02F

Format: In person

Moderator(s): Fenglou Mao

11:40-12:00

Invited Presentation: Graph Kolmogorov-Arnold Networks for Interpretable Alzheimer's Disease Diagnosis from Structural MRI

Confirmed Presenter: Liang Dong, Baylor University, United States

Room: 02F

Format: In person

Moderator(s): Fenglou Mao

12:00-12:20

Invited Presentation: Network Science for Cyber-physical Twinning of Human Heart

Room: 02F

Format: In person

Moderator(s): Fenglou Mao

12:20-12:40

Invited Presentation: A Digital Twins Prototype for Monitoring and Predicting Dynamic Diet-related Health Conditions

Confirmed Presenter: Honggang Wang, Yeshiva University, United States

Room: 02F

Format: In person

Moderator(s): Fenglou Mao

Presentation Overview:

A digital twin system aims to create a virtual representation of a physical subject by modeling both its intrinsic attributes and the external factors that influence them. In this work, we present a prototype digital twin system composed of three integrated components: (1) a non-parametric machine learning algorithm for modeling temporal and spatial data; (2) a translation module that converts predicted health outcomes and risks into natural language descriptions of physical and mental states; and (3) a generative 3D visualization engine that dynamically illustrates health changes over time. At the core of the system is a novel random forest learning model enhanced with Choquet LASSO feature selection, designed to capture complex, nonlinear interactions among high-dimensional features. This approach improves predictive accuracy while maintaining computational efficiency. We compare the performance of this model with both traditional and emerging methods using standard evaluation metrics.

Predicted health outcomes are translated into interpretable natural language narratives, such as estimated body shape or biomarker trajectories, which are then used to drive the generation of 3D digital twins that visually reflect the subject’s evolving physical state. The prototype is designed to monitor and forecast the progression of health and chronic conditions based on individual-level data inputs, including food intake, electronic health records, and user-reported variables such as age, gender, weight, and waist circumference. Dietary data are processed into personalized diet quality scores using the Alternative Healthy Eating Index (AHEI), and further decomposed into macro- and micronutrient components to support granular nutritional tracking.

In our case studies, we demonstrate the system using real-world longitudinal datasets spanning up to 35 years, integrating historical data on biomarkers, dietary patterns, and disease progression. Although the digital twins in our current study are generated retrospectively, the system architecture supports real-time monitoring and simulation. This enables users to intuitively explore how dietary and lifestyle factors impact their health, and allows healthcare professionals to deliver personalized, real-time recommendations based on individuals’ behavioral, lifestyle, and environmental contexts.
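
The abstract above mentions LASSO-based feature selection ahead of a random forest. As a rough illustration of the selection idea only, the sketch below runs plain LASSO by coordinate descent on a tiny synthetic dataset. It is not the Choquet LASSO variant described in the talk, and the data, the function names, and the penalty value `alpha` are all assumptions made for the example.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=50):
    """Minimize (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1 by cycling over features."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, alpha) / (z / n)
    return beta


# Synthetic data (assumption): the outcome depends only on the first feature.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]

beta = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

In a pipeline like the one described, the indices in `selected` would decide which features are passed on to the downstream model; the L1 penalty is what drives the coefficient of the uninformative feature exactly to zero.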

12:40-13:00

Invited Presentation: Multiscale Digital-Twin Modeling and Estimation with Indirect, Neurological Data

Confirmed Presenter: Matthew F. Singh

Room: 02F

Format: In person

Moderator(s): Fenglou Mao

Presentation Overview:

Linking data and predictions across spatial scales has been a key hurdle for digital twin applications in medicine. Noninvasive measurements, such as electrophysiology, provide key temporal insight but are indirect and spatially coarse. By contrast, many disease mechanisms involve feedback between local cellular signaling (spatially fine) and the larger organ system (spatially coarse). These issues are inherent to neurology, as brain function relies upon a hierarchy of networks that spans from local circuits defined by cell type to the long-range “wiring” between brain regions. However, this complexity is mismatched with the much more limited resolution of human brain data. Thus, there is a critical need to estimate digital twins that span scales.
Methodology:
We present an algorithm for estimating physiologically detailed digital twins using indirect (noninvasive) measurements. In this scenario, both the model states and parameters are unknown, leading to a dual-estimation problem that challenges current paradigms. We propose a solution in which detailed biological digital twins are “trained” by converting the estimation problem into a form of recurrent neural network, which we term the generalized-Backpropagation Kalman Filter (gBPKF). This reformulation retains the original physiological model (there are no “black boxes”) but enables AI approaches to solve the challenging optimization problem. We benchmark our approach and demonstrate its power for digital twins in neurology using two electrophysiology datasets: the Human Connectome Project (HCP) and a within-patient study comparing Transcranial Magnetic Stimulation (TMS) protocols (waveforms). We establish the accuracy of person-specific predictions of key neurological markers (frequency-domain statistics), which we trace back to biological mechanisms through bifurcation analysis (HCP data). Using TMS neurostimulation data, we further tested the ability to track the time course of delayed biological changes (plasticity). We hypothesized that these changes would mirror the delayed time course of neuroplasticity and that TMS treatment protocols would differentially alter specific microcircuits within digital-twin models.
Key Results:
Our algorithm achieves state-of-the-art accuracy and efficiency (lower complexity) in benchmarking against a simulated ground truth. Applied to real data, it yields highly reliable digital-twin estimates even with detailed brain models (>1,000 unknown parameters). Using genetic twins (monozygotic vs. dizygotic), we demonstrate high heritability of model parameters. Models correctly forecast short-term changes in brain activity (sub-second oscillations) and long-term temporal patterns that differentiate individuals. We identify a specific bifurcation mechanism, arising from microcircuits but not macroscopic connections, which drives human variability in brain dynamics. This finding illustrates the power of linking multiple spatial scales. Fit to windows of post-TMS data, digital twins identified a sequence of slow microcircuit changes whose delayed time course mirrored that of brain plasticity. The direction of effect depended upon the TMS protocol, in agreement with theorized mechanisms.
Significance:
Our research advances model estimation and prediction for digital twins operating at multiple spatial scales. The gBPKF architecture provides an efficient, accurate solution for estimating biological models from noisy, indirect measurements. It also extends digital twin technology to complex systems, such as the brain, for which current algorithms prove intractable due to the high number of dimensions. Our applications to neurology demonstrate the power of digital twins to predict individual outcomes and track treatment responses in terms of local circuits, which are not directly accessible with noninvasive technology.
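
To make the dual-estimation idea in this abstract concrete, here is a deliberately small sketch: a scalar Kalman filter tracks the hidden state, while an outer search picks the model parameter that minimizes the filter's one-step prediction error. It is not the gBPKF. A grid search stands in for backpropagation, the model is a one-dimensional linear system, the synthetic data are noise-free so the result is easy to check, and all names and noise settings (`q`, `r`) are assumptions.

```python
def innovation_loss(a, ys, q=0.01, r=0.1):
    """Run a scalar Kalman filter with dynamics x[t+1] = a * x[t] and
    return the sum of squared one-step prediction errors (innovations)."""
    x_hat, p = ys[0], 1.0  # initial state estimate and its variance
    loss = 0.0
    for y in ys[1:]:
        # Predict the next state and its uncertainty.
        x_pred = a * x_hat
        p_pred = a * a * p + q
        # Compare the prediction with the measurement.
        innov = y - x_pred
        loss += innov * innov
        # Correct the state estimate using the Kalman gain.
        k = p_pred / (p_pred + r)
        x_hat = x_pred + k * innov
        p = (1.0 - k) * p_pred
    return loss


def simulate(a_true, x0, n):
    """Generate n noise-free observations of x[t+1] = a_true * x[t]."""
    ys, x = [], x0
    for _ in range(n):
        ys.append(x)
        x = a_true * x
    return ys


def estimate_parameter(ys, candidates):
    """Pick the candidate parameter whose filter predicts the data best."""
    return min(candidates, key=lambda a: innovation_loss(a, ys))


ys = simulate(a_true=0.8, x0=1.0, n=20)
candidates = [i / 10 for i in range(5, 11)]  # 0.5, 0.6, ..., 1.0
best_a = estimate_parameter(ys, candidates)
print(best_a)  # 0.8
```

Because every step of the filter is differentiable, the same prediction-error loss could in principle be minimized by gradient descent through the recursion instead of by grid search, which is the general direction the abstract describes.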

14:00-14:20

Advancing Discovery through GenAI and Scalable Infrastructure

Room: 02F

Format: In person

Moderator(s): Nick Weber

14:20-14:40

Invited Presentation: A Global Model for FAIR and Open Research: Scalable, Collaborative Infrastructure in Action

Confirmed Presenter: Kristi Holmes

Room: 02F

Format: In person

Moderator(s): Nick Weber

14:40-15:00

Invited Presentation: Beyond Data Sharing: AI-Powered Solutions for Effective Biomedical Data Reuse

Confirmed Presenter: Luca Foschini, Sage Bionetworks, USA

Room: 02F

Format: In person

Moderator(s): Nick Weber

15:00-15:20

Invited Presentation: Reusable Cyberinfrastructure and Use Cases for the Cancer Research Data Commons (CRDC)

Confirmed Presenter: Tanja Davidsen, NIH/NCI, United States

Room: 02F

Format: In person

Moderator(s): Nick Weber

15:20-15:40

Invited Presentation: Power Your Kids First or INCLUDE Data Analysis on The Interoperable CAVATICA Cloud Analytics Workspace

Confirmed Presenter: Jared Rozowsky, Velsera, USA

Room: 02F

Format: In person

Moderator(s): Nick Weber

Presentation Overview:

The NIH-funded Gabriella Miller Kids First Data Resource Center (KF-DRC) and the INCLUDE Data Coordinating Center (INCLUDE DCC) provide harmonized datasets for researchers to investigate pediatric cancer, structural birth defects, and co-occurring conditions of Down syndrome. Broadly, the goals of the two programs are to accelerate discovery, enhance healthcare, and change lives. CAVATICA is a data analysis and sharing platform designed to accelerate discovery in a scalable, cloud-based compute environment that is shared by both programs.

CAVATICA supports a unique integration with STRIDES, allowing all academic users on the platform to leverage the STRIDES discount without having to set up individual accounts. This setup means research dollars can go farther and drive us closer to a cure. Additionally, STRIDES has funded the KF and INCLUDE Cloud Credit program. While researchers can use primary files from the data portals without incurring storage fees, data analysis and storage of secondary files do incur charges. To aid researchers, the Cloud Credit Program supports data generators and secondary data users who want to analyze data in the cloud, leveraging existing tools, or developing their own tools to analyze data.

To date, Kids First has approved 31 research projects and allocated $49,000 of funding. INCLUDE has approved 12 projects and allocated $22,000 of funding. Both programs have supported researchers leading to multiple abstracts, presentations, and manuscripts. Some of the tools generated with the support of the Cloud Credit program are also available on CAVATICA for others to use in the public apps gallery and referenced in publications.

Applications to the Cloud Credit Program are open, and the program continues to support researchers in their endeavor to accelerate discovery, enhance healthcare, and change lives. We hold open office hours twice a week to help users get started (https://www.cavatica.org/contact-us) and run a 24/7 helpdesk.

As part of the KF and INCLUDE data ecosystems, CAVATICA not only allows researchers to use the cloud-based platform to access and analyze data from their respective data portals, but also lets them integrate their own data or use the platform’s interoperability with the Cancer Research Data Commons, BioData Catalyst, or NCBI’s Sequence Read Archive, giving access to all data controlled by dbGaP. CAVATICA uses the Researcher Auth Service (RAS) to ensure proper authorization of files. All analyses can be shared with other users with appropriate permission controls. CAVATICA supports ‘tasks’ written in workflow languages (CWL and Nextflow) as well as ‘interactive analysis’ using JupyterLab or RStudio, through either the graphical user interface or the API. Put together, CAVATICA allows researchers to securely access and analyze controlled data, accelerating discovery and driving cures.

15:40-16:00

Invited Presentation: The Gene Set Browser: An interoperable and AI/ML-ready tool for gene set analysis in the Common Fund Data Ecosystem (CFDE)

Confirmed Presenter: Julie Jurgens, Broad Institute of MIT and Harvard, USA

Room: 02F

Format: In person

Moderator(s): Nick Weber

Presentation Overview:

Summary:
This session introduces the NIH Common Fund Data Ecosystem (CFDE) Gene Set Browser, an AI/ML-ready tool that connects diverse biomedical datasets to uncover novel gene-disease associations. Learn how this interoperable resource leverages Bayesian modeling and LLM-driven insights to power cross-program analysis, enable hypothesis generation, and drive discovery through FAIR, integrated data.

Abstract:
In an AI/ML-ready world, data interoperability and integration are becoming increasingly critical. The US National Institutes of Health (NIH) has risen to address these needs through major initiatives including the Common Fund Data Ecosystem (CFDE), which promotes accessibility, (re)use, and integration of NIH Common Fund programs’ data and resources through a cohesive ecosystem. By establishing common standards, data, tools, and infrastructure, CFDE serves as a model for data accessibility and interoperability.

As a compelling use case of how increased interoperability can drive data utility and scientific discovery, we present the CFDE Gene Set Browser, available through https://cfdeknowledge.org. This open-access web resource performs cross-program analyses of gene sets (lists of genes) and their relationship to additional genes, human phenotypes, and mechanisms. Importantly, this tool connects multiple disparate CFDE and non-CFDE programs, phenotypes, and data types. Through the Gene Set Browser, users can learn a) which gene sets capture important biological mechanisms, and b) which mechanisms are relevant to human health.

Gene sets are derived from six CFDE programs (GlyGen, GTEx, IDG, IMPC/KOMP2, LINCS, and MoTrPAC); intersections between CFDE programs; and differential expression analyses of CFDE transcriptomic data. Phenotypes include rare diseases from Orphanet (n=2,927) and common phenotypes/traits from the NHGRI Association to Function Knowledge Portal (n=1,237) and the EBI GWAS Catalog (n=2,213).

Relationships between phenotypes and gene sets were computed using PIGEAN (Priors Inferred from GEne ANnotations), a novel Bayesian method. PIGEAN jointly models the probability that each gene is involved in each phenotype, given the gene sets that contain the gene and the genome-wide association study (GWAS) statistics for variants near the gene. We applied PIGEAN to the above common and rare disease phenotypes/traits, in each case fitting a model using all CFDE gene sets, intersections of CFDE gene sets, and gene sets from the Mouse Genome Informatics database (MGI; >11,000 mouse model phenotypes) and MSigDB (pathway analyses). Users can obtain the estimated probability that the genes within each gene set are involved in disease, as well as the estimated probability for each individual gene. For each result, an LLM enables users to explore hypotheses underlying each gene set-to-disease connection.

The Gene Set Browser has unearthed a wide range of known and novel candidate genes and mechanisms for human biological processes and diseases. For example, a gene set from MoTrPAC, a CFDE program that studies the molecular effects of exercise, reveals a list of genes that are upregulated in the blood of male rats after 2 weeks of exercise and their connection to reticulocyte count.

Through the Gene Set Browser, users can discover gene sets relevant to a wide range of research questions, explore connections between gene sets and other biological information (e.g., pathways and disease associations from external databases), and generate new hypotheses that might not be apparent from individual resources. Connecting CFDE gene sets to external resources is a powerful demonstration of how leveraging interoperability can foster scientific discovery.

16:40-17:00

Invited Presentation: NIH Quantum Computing Initiatives

Confirmed Presenter: Fenglou Mao

Room: 02F

Format: In person

Moderator(s): Fenglou Mao

17:00-17:20

Invited Presentation: Advancing quantum algorithms for elementary mode and metabolic flux analysis

Confirmed Presenter: Chi Zhang, Oregon Health & Science University, US

Room: 02F

Format: Live stream

Moderator(s): Fenglou Mao

Presentation Overview:

Metabolic networks play a central role in cellular function, supporting energy production, biosynthesis, and adaptation to environmental conditions. Elementary flux modes (EFMs) represent minimal sets of reactions that support steady-state flux distributions, and they form the basis for understanding metabolic capabilities and constraints. However, identifying biologically feasible EFMs in genome-scale metabolic networks remains a fundamental challenge, as the number of possible EFMs grows exponentially with network complexity, rendering full enumeration computationally infeasible.

To address these limitations, we propose a quantum-based framework for efficiently exploring biologically plausible EFM distributions and predicting sample-specific metabolic fluxes. Our method formulates both tasks as Quadratic Unconstrained Binary Optimization (QUBO) problems, which we solve using quantum annealing. By leveraging the parallel sampling capability of quantum computing, this approach enables scalable and efficient search over high-dimensional solution spaces under biological constraints. To accommodate large genome-scale models, we incorporate tensor decomposition techniques that reduce model dimensionality and enable tractable QUBO formulations.

Preliminary experiments on simulated metabolic networks with up to 25 reactions demonstrate that our method recovers diverse and structurally feasible EFMs. We observe that EFMs satisfying key properties (including stoichiometric balance, support minimality, and irreducibility) consistently appear with higher occurrence percentages across repeated sampling runs, whereas modes that violate these constraints are sampled with very low frequency. Furthermore, when applying different sample-specific constraints, we find that the high-frequency EFM sets vary across samples, indicating that the framework can distinguish condition-specific flux distributions without requiring full enumeration.

Our framework offers a new direction for integrating omics data with constraint-based modeling using quantum-enhanced computation. This approach can be applied to a range of applications, including identifying altered pathways in disease, prioritizing therapeutic metabolic interventions, and uncovering condition-specific metabolic strategies. By bridging quantum optimization and systems biology, this method contributes a practical and interpretable tool for personalized metabolic analysis and hypothesis generation across diverse biological contexts.
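
As an illustration of how a flux-balance constraint can be written as a QUBO, the sketch below encodes only stoichiometric balance for a four-reaction toy network and finds the zero-energy states by brute-force enumeration. This is a stand-in for quantum annealing, not the presenters' formulation: the network, the unit binary fluxes, and the function names are assumptions, and the other properties named in the abstract (support minimality, irreducibility) and the tensor decomposition step are not encoded.

```python
from itertools import product

# Hypothetical toy network (not from the talk).
# Rows are metabolites A and B; columns are reactions R1..R4:
#   R1: -> A,   R2: A -> B,   R3: B ->,   R4: A ->
S = [
    [1, -1, 0, -1],  # metabolite A
    [0, 1, -1, 0],   # metabolite B
]


def build_qubo(S):
    """Return Q = S^T S, so that x^T Q x = ||S x||^2 penalizes imbalance."""
    m, n = len(S), len(S[0])
    return [[sum(S[k][i] * S[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]


def energy(Q, x):
    """Evaluate the QUBO objective x^T Q x for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


Q = build_qubo(S)

# Enumerate every on/off pattern of reactions; keep the nonempty ones with
# zero penalty, i.e. the steady-state (balanced) modes of this toy network.
balanced = [x for x in product([0, 1], repeat=len(Q))
            if any(x) and energy(Q, x) == 0]
print(balanced)  # [(1, 0, 0, 1), (1, 1, 1, 0)]
```

The two zero-energy patterns correspond to the routes R1 then R4, and R1, R2, then R3. An annealer would be expected to return such low-energy states most often; exhaustive enumeration is only feasible here because the example has 16 candidate states.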

17:20-17:40

Invited Presentation: Efficient quantum algorithm to simulate open systems through a single environmental qubit

Confirmed Presenter: Vischi Michele, University of Trieste, Italy

Room: 02F

Format: In person

Moderator(s): Fenglou Mao

Presentation Overview:

Simulating the dynamics of open quantum systems makes it possible to understand real-world quantum phenomena, a crucial task in a variety of fields. Recently, researchers have explored the idea that open system dynamics can be understood not only as a model of physical systems, but also as a general-purpose algorithmic framework for preparing target quantum states. For example, just as Hamiltonian dynamics are often used in simulation without specifying a physical system, open system dynamics can be designed without referencing an actual system-environment interaction. The environment can be viewed as a fictitious, engineered resource akin to artificial thermostats in classical Monte Carlo or molecular dynamics simulations. Such an approach is relevant for many biomedical problems, such as large-scale molecular simulations and optimizations in systems biology (that can be interpreted as a specific class of state preparation problems). The approach can be implemented on quantum computers, provided that accurate and efficient quantum algorithms are developed. Such algorithms have to efficiently encode the environment and approximate the open system dynamics with a suitable system-environment interaction that drives the system evolution. Many proposals to achieve these goals have recently appeared in the literature.

In this talk I will present an efficient quantum algorithm for simulating open quantum system dynamics described by the Markovian Lindblad master equation. In contrast to existing approaches, the proposed method achieves two significant advancements. First, it employs a repetition of unitary gates on a set of n system qubits and, remarkably, only a single ancillary bath qubit to represent the environment. It follows that, for the typical case of m-locality of the Lindblad operators, we reach an exponential improvement in the number of ancillas in terms of m and up to a polynomial improvement in ancilla overhead for large n with respect to other approaches. Although stochasticity is introduced, requiring multiple circuit realizations, the sampling overhead is independent of the system size. Second, we show that, under fixed accuracy conditions, our algorithm enables a reduction in the number of Trotter steps compared to other approaches, substantially decreasing circuit depth. These advancements hold particular significance for running the algorithm on near-term quantum computers, where minimizing both width and depth is critical due to inherent noise in their dynamics. I will further discuss how this approach can be extended to simulate non-Markovian evolution, thus including memory effects of the environment.
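
For readers unfamiliar with the Markovian Lindblad master equation named in this abstract, its standard textbook form is shown below. The notation is an assumption and not taken from the talk: ρ is the system density matrix, H the system Hamiltonian, L_k the Lindblad (jump) operators with rates γ_k, and units are chosen so that ħ = 1.

```latex
\frac{d\rho}{dt}
  = -i\,[H, \rho]
  + \sum_{k} \gamma_k \left( L_k \rho L_k^{\dagger}
  - \frac{1}{2} \left\{ L_k^{\dagger} L_k,\ \rho \right\} \right)
```

The commutator term generates the unitary part of the evolution, and the sum describes dissipation induced by the environment, which is the part the talk proposes to reproduce with a single ancillary bath qubit.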

17:40-18:00

Invited Presentation: Quantum Approximate Optimization for K-Area Clustering of Biological Data

Confirmed Presenter: Yong Chen

Room: 02F

Format: In person

Moderator(s): Fenglou Mao

18:00-18:00

Closing Remarks

Room: 02F

Format: In person

Moderator(s): Fenglou Mao
