Schedule subject to change
All times listed are in BST
Monday, July 21st
11:20-11:40
UniProt: Evolving Tools and Data for Protein Science
Format: In person


Authors List:

  • Daniel Rice, EMBL-EBI, United Kingdom

Presentation Overview:

The Universal Protein Resource (UniProt) is a cornerstone of molecular biology and bioinformatics, delivering high-quality, freely accessible protein sequence and functional information for over 20 years. This session presents a guided tour of UniProt’s latest features, datasets, and tools, reflecting its continued evolution to meet the needs of the scientific community.

We will highlight data integrations—including AlphaFold structural predictions, AlphaMissense variant effect predictions, RNA editing, post-translational modifications (PTMs), and Human Proteome Project (HPP) datasets—and demonstrate embedded visualizations developed by UniProt and third-party contributors. Attendees will learn about improved tools for browsing, analyzing, and exporting data, along with recent enhancements to UniProt’s API and new Swagger documentation that streamline programmatic access and data integration.
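As a companion to the programmatic-access improvements mentioned above, the snippet below is a minimal sketch of querying the public UniProt REST API (https://rest.uniprot.org) from Python; the gene, organism, and field selection are illustrative choices, and the exact shape of the returned JSON may vary between releases.

```python
# Minimal sketch: search UniProtKB via the REST API for reviewed human entries of a gene.
# The query string and requested fields are example choices, not a prescribed usage.
import requests

URL = "https://rest.uniprot.org/uniprotkb/search"
params = {
    "query": "gene:TP53 AND organism_id:9606 AND reviewed:true",  # example query
    "fields": "accession,protein_name,length",                    # example field selection
    "format": "json",
    "size": 5,
}

response = requests.get(URL, params=params, timeout=30)
response.raise_for_status()

for entry in response.json().get("results", []):
    accession = entry.get("primaryAccession")
    length = entry.get("sequence", {}).get("length")
    print(accession, length)
```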

Whether you're a student or a seasoned researcher, this session will help you better leverage UniProt in your work. We will emphasize practical applications and encourage engagement with UniProt’s expanding capabilities. Attendees will leave with a deeper understanding of how to integrate UniProt resources into their workflows—and how to contribute feedback to guide its future development.

11:40-12:20
Genomics 2 Proteins portal: A resource and discovery platform for linking genetic screening outputs to protein sequences and structures
Format: In person


Authors List:

  • Sumaiya Iqbal, Broad Institute of MIT and Harvard, United States
  • Jordan Safer, Broad Institute of MIT and Harvard, United States

Presentation Overview:

Recent advances in AI-based methods have revolutionized the field of structural biology. Concomitantly, high-throughput sequencing and functional genomics have generated genetic variants at an unprecedented scale. However, efficient tools and resources are needed to link these disparate data types: to “map” variants onto protein structures, to better understand how variation causes disease, and thereby to design therapeutics. Here we present the Genomics 2 Proteins Portal (G2P; g2p.broadinstitute.org): a human proteome-wide resource that maps 49,500,857 genetic variants onto 42,481 protein sequences and 84,318 structures (as of the December 2024 release), together with a comprehensive set of structural and functional features. Additionally, the G2P portal allows users to interactively upload protein residue-wise annotations (variants, scores, etc.), as well as protein structures beyond those available in databases, to establish the connection between genomics and proteins. The portal serves as an easy-to-use discovery tool for researchers to hypothesize about structure-function relationships between natural or synthetic variations and their molecular phenotypes.
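For attendees planning to bring their own annotations to such a resource, the snippet below is a hypothetical sketch of assembling a residue-wise annotation table with pandas; the column names and CSV layout are illustrative assumptions, not the portal's required schema.

```python
# Hypothetical sketch: build a per-residue annotation table for a single protein.
# Column names are illustrative, not a schema prescribed by the G2P portal.
import pandas as pd

annotations = pd.DataFrame(
    {
        "uniprot_accession": ["P04637"] * 3,      # example protein (human TP53)
        "residue_position": [175, 248, 273],       # 1-based positions in the sequence
        "variant": ["R175H", "R248Q", "R273H"],    # example missense variants
        "score": [0.98, 0.95, 0.97],               # e.g., a pathogenicity or assay score
    }
)

# Save as CSV for interactive upload and downstream mapping onto structures.
annotations.to_csv("tp53_residue_annotations.csv", index=False)
print(annotations)
```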

12:20-13:00
ModCRE: modelling protein–DNA interactions and transcription-factor co-operativity in cis-regulatory elements
Format: In person


Authors List:

  • Patrick Gohl, Universitat Pompeu Fabra, Spain
  • Patricia Bota, Universitat Pompeu Fabra, Spain

Presentation Overview:

ModCRE is a web server that uses a structural approach to predict transcription factor binding preferences and to automate the modelling of higher-order regulatory complexes with DNA.

14:00-14:20
Orchestrating Microbiome Analysis with Bioconductor
Format: In person


Authors List:

  • Tuomas Borman, University of Turku, Finland
  • Leo Lahti, University of Turku, Finland

Presentation Overview:

Microbes play a crucial role in human health, disease, and the environment. Despite their significant impact, the mechanisms underlying microbiome interactions remain largely unknown due to their complexity. While microbiome research has heavily relied on sequencing data, understanding these interactions requires multi-omics approaches and computational methods.

R/Bioconductor is a well-established platform for biological data analysis, providing high-quality, open-source software. It is driven by a global community of researchers who collaborate through software development, shared standards, and active forums. The software is built on standardized data containers, with SummarizedExperiment being the most widely used, adopted across a wide range of biological fields. Shared data containers enable interoperability and facilitate advanced data integration.

This session will showcase Bioconductor tools for microbiome data science, with a particular focus on the mia (Microbiome Analysis) framework, demonstrated through a practical case study. By the end of the session, participants will gain insights into the latest advances in microbiome research within Bioconductor, including the TreeSummarizedExperiment data container and its essential methods. They will also be prepared to explore the wider data science ecosystem further, supported by the online book Orchestrating Microbiome Analysis with Bioconductor.
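Because the program itself contains no code, the outline below uses plain Python structures purely to illustrate what a SummarizedExperiment-style container keeps aligned (assays, row metadata, column metadata) and what TreeSummarizedExperiment adds (a tree over the rows); the taxa, samples, and values are invented, and the session itself will work with R/Bioconductor objects.

```python
# Conceptual sketch only: the shape of a (Tree)SummarizedExperiment-style container,
# expressed with plain Python/NumPy objects. Names and values are invented.
import numpy as np

n_taxa, n_samples = 4, 3
rng = np.random.default_rng(0)

container = {
    # assays: one or more feature-by-sample matrices (e.g., read counts)
    "assays": {"counts": rng.integers(0, 100, size=(n_taxa, n_samples))},
    # rowData: per-feature (taxon) metadata
    "rowData": {"genus": ["Bacteroides", "Prevotella", "Faecalibacterium", "Roseburia"]},
    # colData: per-sample metadata
    "colData": {"sample_id": ["S1", "S2", "S3"], "group": ["control", "case", "case"]},
    # rowTree: the extra piece TreeSummarizedExperiment adds, a tree over the rows
    # (shown here as a Newick string)
    "rowTree": "((Bacteroides,Prevotella),(Faecalibacterium,Roseburia));",
}

# Shared containers keep these parts aligned, which is what enables interoperable tooling.
print(container["assays"]["counts"].shape)  # (4, 3): taxa x samples
```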

14:20-14:40
Smart Turkana Beads: A Culturally Embedded IoT Innovation for Health Monitoring and Drug Adherence
Format: In person


Authors List:

  • Meya Brian, Freelance, Kenya

Presentation Overview:

Access to quality healthcare remains a persistent challenge in marginalized regions such as Turkana County in northern Kenya, where traditional beliefs, nomadic lifestyles, and weak infrastructure significantly hinder health service delivery and uptake, particularly in chronic disease management and drug adherence. This proposal introduces an innovative solution: culturally embedded Internet of Things (IoT) technology in the form of Smart Turkana Beads—a wearable health-monitoring device seamlessly integrated into the community’s traditional attire to enhance health surveillance and promote drug adherence. Leveraging Indigenous Knowledge Systems (IKS), the project seeks to reduce the cultural dissonance typically encountered by conventional biomedical technologies, while addressing key barriers to healthcare access in a way that aligns with local values and practices (Mwakalinga et al., 2021).

14:40-15:00
AI and Quantum in Healthcare and Life Sciences
Format: In person


Authors List:

  • Filippo Utro, IBM, United States

Presentation Overview:

The advent of foundation models (FM) and quantum computing (QC) has ushered in a new paradigm for tackling complex problems, igniting significant interest across diverse sectors, particularly within healthcare and life sciences. This talk will explore the latest efforts at IBM Research dedicated to leveraging FM and QC to accelerate discovery in healthcare and life sciences. The discussion will span a range of applications in omics data, clinical trials, and drug discovery. Finally, I will discuss technical challenges and envision the new era of FM and QC in healthcare and life sciences.

15:00-15:20
Integrating Long-Read Sequencing and Multiomics for Precision Cell Line Engineering
Format: In person


Authors List:

  • Daniel Fabian, Lonza, United Kingdom

Presentation Overview:

Optimizing the biomanufacturing of therapeutic molecules, such as monoclonal antibodies, is critical for delivering efficient and scalable patient treatment. A key early step in this production pipeline is the development of Chinese Hamster Ovary (CHO) cell lines that produce these biologic molecules. To address the rising demand for therapeutic biologics, it is essential to enhance cell line expression to achieve consistently high and stable product titers. Lonza has recently integrated nanopore long-read sequencing into its multiomics pipelines, providing unprecedented molecular insights into the genetic and epigenetic landscapes of CHO cell lines, advancing both genetic engineering and biomarker discovery. This presentation will highlight recent progress in improving omics data accuracy through de novo genome assembly, and the integration of nanopore whole genome sequencing and DNA methylation analysis with other next-generation sequencing technologies.

15:20-15:40
Title yet to be given by the sponsor
Format: In person


Authors List:

15:40-16:00
Title yet to be given by the sponsor
Format: In person


Authors List:

Tuesday, July 22nd
11:20-11:40
Scale with Seqera: Accelerate, Expand, and Collaborate
Format: In person


Authors List:

  • Adam Talbot, Seqera, United Kingdom
  • Geraldine Van der Auwera, Seqera, United States

Presentation Overview:

Turning a promising research project into a robust, real-world solution requires tools that support both early experimentation and enterprise-scale deployment. When reproducibility and reliability are non-negotiable, you need a platform that's flexible during ideation and powerful enough to meet the demands of mega-scale computation and collaborative research.

Too often, scaling up means switching tools, rewriting pipelines, or reprovisioning infrastructure — an expensive, frustrating process that can introduce errors and undermine scientific reproducibility.

In this talk, we will explore how Seqera's integrated suite of products empowers you to scale and accelerate your scientific research.

11:40-12:00
SimpleVM - Effortless Cloud Computing for Research
Format: In person


Authors List:

  • Viktor Rudko, Forschungszentrum Jülich GmbH, Germany
  • Peter Belmann, Forschungszentrum Jülich GmbH, Germany
  • Nils Hoffmann, Forschungszentrum Jülich GmbH, Germany
  • David Weinholz, Forschungszentrum Jülich GmbH, Germany
  • Alexander Sczyrba, Forschungszentrum Jülich, IBG-5, Germany

Presentation Overview:

SimpleVM empowers life science researchers to harness cloud resources, regardless of their expertise in cloud computing.
As a multi-cloud application, SimpleVM is optimized for seamless integration with multiple OpenStack® installations. From an OpenStack administrator's perspective, all that is required is an OpenStack project, without the need for additional admin privileges. SimpleVM also features enhanced security components that scan connection attempts to virtual machines and automatically block suspicious access.

By combining Keycloak with a Django-based service layer, SimpleVM provides comprehensive user management and customizable role-based access control. This facilitates the integration of Authentication and Authorisation Infrastructure (AAI) for seamless use of various Identity Providers (IDPs), including LifeScience AAI or a local university IDP.

Beyond the straightforward launch of virtual machines, the emphasis is on advanced features intended to streamline and enhance the user experience when working with cloud resources. For example, Virtual Research Environments (VREs) can be deployed with just a few clicks, providing access to powerful applications such as Visual Studio Code® or RStudio directly from the browser.

The SimpleVM workshop mode supports high-attendance teaching sessions: workshop instructors can quickly pre-configure, launch, and assign machines to attendees. Finally, SimpleVM improves the utilization of cloud resources with features such as auto-scaling SLURM clusters.

12:00-12:20
Overture Prelude: A toolkit for small teams with big data problems
Confirmed Presenter: Mitchell Shiell, Ontario Institute of Cancer Research (OICR), Canada

Format: In person


Authors List:

  • Mitchell Shiell, Ontario Institute of Cancer Research (OICR), Canada
  • Melanie Courtot, Ontario Institute for Cancer Research, Canada
  • Brandon Chan, Ontario Institute for Cancer Research (OICR), Canada
  • Jon Eubank, Ontario Institute for Cancer Research (OICR), Canada
  • Robin Haw, OICR, Canada
  • Justin Richardsson, Ontario Institute for Cancer Research (OICR), Canada
  • Leonardo Rivera, Ontario Institute for Cancer Research (OICR), Canada
  • Lincoln Stein, Ontario Institute for Cancer Research, Canada
  • Overture Team, Ontario Institute of Cancer Research (OICR), Canada

Presentation Overview:

Overture is used to build platforms that enable researchers to organize and share their data quickly, flexibly and at multiple scales. While Overture successfully powers major international platforms like ICGC-ARGO (100,000+ participants) and VirusSeq (500,000+ genomes), smaller teams generating massive data face prohibitive technical requirements during implementation. How then can we enable teams to build their data platform efficiently and with fewer resources? Prelude addresses this challenge by breaking down platform development into incremental phases, reducing the technical overhead during development and allowing teams to systematically verify requirements through hands-on testing, gaining insights into workflows, data needs, and platform fit.

Prelude guides teams through three progressive phases of data platform development, each building on the previous one's foundation:

- Phase one focuses on data exploration and theming, enabling teams to visualize
and search their data through a customizable UI;

- Phase two expands capabilities to enable tabular data management and validation
with persistent storage;

- Phase three adds file management and object storage.

These phases are supported by comprehensive documentation, deployment automations, and utilities that generate key configuration files, reducing unnecessary time spent on tedious manual configurations.

Prelude represents a practical step toward making data platform development accessible to teams of all sizes. By providing a widely accessible platform, we hope to encourage community requests and feedback so that we can iterate on and improve Prelude, making it as useful as possible for advancing data sharing and reuse across the scientific community.

14:00-14:20
GPCRVS - AI-driven decision support system for GPCR virtual screening
Format: In person


Authors List:

  • Dorota Latek, University of Warsaw, Faculty of Chemistry, Poland

Presentation Overview:

GPCRVS is an efficient, simple, easily accessible, and open-source web service that, as a decision support system, aims to facilitate the preclinical testing of drug candidates targeting peptide- and small protein-binding G protein-coupled receptors. GPCRVS can support three major areas of drug discovery: prediction of drug selectivity; prediction of drug efficacy, approximated by AutoDock Vina docking scores, by the activity class assigned by a TensorFlow multiclass classifier, or by pChEMBL values predicted with a LightGBM regressor; and prediction of the drug binding mode, highlighting the amino acids most crucial to drug-receptor interactions. A comparison with precomputed results for known active compounds enables the prioritization of drug candidates, thereby significantly reducing the cost and length of experimental screening.
In addition, we propose a novel approach that uses peptide ligand data sets, encoded as SMILES-based fingerprints, in conjunction with small-molecule ligand data sets when training the DNN and GBM models. This makes it possible to benefit from all GPCR-like ligand data sets deposited in ChEMBL and to design new drugs that could combine peptide and non-peptide scaffolds with increased, unified activity and selectivity. Currently, two groups of peptide/small protein-binding GPCRs are included in GPCRVS, allowing comparative predictions for class A and class B receptors at the same time. An evaluation of GPCRVS on a patent compound data set from Google Patents showed that LightGBM provides the most accurate results among the three classifiers implemented in GPCRVS.
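To illustrate the general modelling idea mentioned above (SMILES-derived fingerprints feeding a gradient-boosting regressor), here is a generic sketch using RDKit and LightGBM; it is not the GPCRVS pipeline, and the molecules and activity values are toy examples.

```python
# Generic sketch (not the GPCRVS pipeline): SMILES -> Morgan fingerprints -> LightGBM regressor.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from lightgbm import LGBMRegressor

# Toy data: a few SMILES with invented pChEMBL-like activity values.
smiles = ["CCO", "CCN", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O", "CCOC(=O)C", "CCCCN"]
activity = np.array([5.1, 5.4, 6.0, 6.8, 5.6, 5.2])

def fingerprint(smi: str, n_bits: int = 2048) -> np.ndarray:
    """Binary Morgan fingerprint (radius 2) as a NumPy array."""
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    return np.array(list(fp), dtype=np.int8)

X = np.vstack([fingerprint(s) for s in smiles])

# Fit a gradient-boosted regressor on the fingerprints and predict activities.
model = LGBMRegressor(n_estimators=50, min_child_samples=1)
model.fit(X, activity)
print(model.predict(X[:2]))
```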

14:20-14:40
Self-supervised generative AI enables conversion of two non-overlapping cohorts
Format: In person


Authors List:

  • Supratim Das, University of Hamburg, Germany
  • Mahdie Rafiei, University of Hamburg, Germany
  • Andreas Maier, University of Hamburg, Germany
  • Linda Baumbach, University of Hamburg, Germany
  • Jan Baumbach, University of Hamburg, Germany

Presentation Overview:

Prognostic models in healthcare often rely on big data, which is typically distributed across multiple medical cohorts. Even if collected for similar purposes (e.g., capturing symptoms, treatments, and outcomes in osteoarthritis), they frequently differ in acquisition methods, structures, and variable definitions used. These discrepancies impede their integration into a unified, multi-cohort database for joint prognostic model training and pose significant challenges to model transferability, meaning a model trained on one cohort needs to be applied to data of a similar cohort with an incompatible data structure. Current cohort conversion approaches rely on AIs trained on linked, overlapping samples, which many healthcare cohorts lack. Here, we present DB-converter, a self-supervised deep learning architecture leveraging category theory and designed to convert data from different cohorts with different data structures into each other. We demonstrate the power and robustness of the DB-converter using synthetic and real health survey data. Our approach opens new avenues for multi-cohort analyses operating under the assumption that all cohorts to be integrated have been acquired for at least a similar real-world purpose.

14:40-15:00
The Bioverse - Biomolecule data processing for AI made easy
Format: In person


Authors List:

  • Tim Kucera, Max Planck Institute of Biochemistry, Germany
  • Karsten Borgwardt, Max Planck Institute of Biochemistry, Germany

Presentation Overview:

We introduce the bioverse, a free and open-source Python package that streamlines biological data preparation for machine learning. Focused on structural biology, it standardizes diverse biomolecular formats for flexible, high-performance workflows. Our demonstration will showcase key features, code examples, and how to launch your own ML projects in minutes.

15:00-15:20
"Data, We Need to Chat": A Case for Enhanced Data Discovery and Dataset Exploration in the Biomedical Field
Format: In person


Authors List:

  • Susheel Varma, Sage Bionetworks, United Kingdom
  • Jineta Banerjee, Sage Bionetworks, United States
  • Robert Allaway, Sage Bionetworks, United States
  • John Hill, Sage Bionetworks, United States
  • Jay Hodgson, Sage Bionetworks, United States
  • Alberto Pepe, Sage Bionetworks, United States
  • Christine Suver, Sage Bionetworks, United States
  • Luca Foschini, Sage Bionetworks, United States

Presentation Overview:

The exponential growth of biomedical datasets presents unprecedented opportunities for scientific discovery, yet researchers struggle to find and explore relevant data. Traditional search methods fall short when navigating complex, highly regulated biomedical data repositories. This paper examines these limitations and proposes AI-powered conversational interfaces as a solution.

Key obstacles to effective data discovery include repository fragmentation, inconsistent metadata, vocabulary mismatches, complex search requirements, and inadequate interface design. These challenges are intensified in biomedical research by regulatory restrictions on accessing sensitive data.

Conversational AI systems offer a promising alternative by enabling natural language dialogue with data repositories. Unlike keyword searches, these interfaces understand user intent, ask clarifying questions, and guide researchers to relevant datasets. Synapse.org's experimental chatbot implementation demonstrates how AI-assisted discovery processes complex queries (e.g., "datasets related to people over 60 with Alzheimer's disease and Type 2 diabetes") without requiring database expertise. This approach leverages Retrieval-Augmented Generation (RAG) while respecting authorization levels and regulatory compliance.

Such systems facilitate "metadata spelunking," allowing researchers to explore dataset composition, methodology, and potential utility without needing to access sensitive raw data. The paper addresses ethical considerations related to privacy, bias, and trust, while outlining future possibilities for interdisciplinary data discovery.

By bridging the gap between vast biomedical data repositories and researchers, conversational AI interfaces promise to democratize data access, accelerate discovery, and ultimately improve human health.
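To make the retrieval-augmented approach described above concrete, here is a minimal, generic RAG sketch, not Synapse.org's implementation: dataset metadata records are scored against the user's question, the closest matches are retrieved, and a prompt is assembled for a language model. The token-overlap scoring and the metadata catalogue are stand-ins for a real embedding model and repository.

```python
# Minimal generic sketch of retrieval-augmented discovery (not Synapse.org's chatbot):
# score dataset descriptions against a question, keep the best matches, build a prompt.

def tokens(text: str) -> set:
    """Stand-in for an embedding model: a bag of lowercase word tokens."""
    return set(text.lower().replace(",", " ").replace(".", " ").split())

def similarity(a: set, b: set) -> float:
    """Jaccard overlap as a cheap proxy for embedding cosine similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy metadata catalogue; descriptions are invented.
datasets = {
    "DS-001": "Longitudinal cohort of adults over 60 with Alzheimer's disease",
    "DS-002": "Type 2 diabetes registry with medication and lab records",
    "DS-003": "Pediatric asthma imaging study",
}

question = "datasets related to people over 60 with Alzheimer's disease and Type 2 diabetes"
q = tokens(question)

# Retrieve the most relevant metadata records.
ranked = sorted(datasets.items(), key=lambda kv: similarity(q, tokens(kv[1])), reverse=True)
top = ranked[:2]

# Assemble the prompt an LLM would answer from (access controls enforced upstream).
prompt = "Answer using only this dataset metadata:\n"
prompt += "\n".join(f"{ds_id}: {desc}" for ds_id, desc in top)
prompt += f"\nQuestion: {question}"
print(prompt)
```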

15:20-15:40
BioInfore: A No-Code Genome Data Management System Based On AI Agents
Format: In person


Authors List:

  • Zheng Chen, SANKEN, Osaka University, Japan
  • Ziwei Yang, Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan
  • Xihao Piao, SANKEN, Osaka University, Japan
  • Peng Gao, Institute for Quantitative Biosciences, The University of Tokyo, Japan
  • Yasushi Sakurai, SANKEN, Osaka University, Japan
  • Yasuko Matsubara, SANKEN, Osaka University, Japan

Presentation Overview:

In many genomic projects, selecting and preparing assemblies requires complex database queries, manual metadata curation, and bespoke code scripting. We introduce a no-code AI agent workflow that replaces all of these steps with a single plain natural language request. Behind the scenes, five specialized AI agents handle retrieval, quality filtering, ranking, and format conversion. Users receive analysis-ready genome datasets in minutes, freeing them from programming barriers and manual errors so they can focus on biological discovery.
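As a schematic of the kind of staged agent workflow described above (not the authors' implementation), the sketch below chains placeholder agents for retrieval, quality filtering, ranking, and format conversion behind a single natural-language request; all data and function bodies are stubs.

```python
# Schematic multi-agent pipeline (placeholder logic only, not BioInfore's implementation).
from dataclasses import dataclass

@dataclass
class Assembly:
    accession: str
    completeness: float  # e.g., a quality metric such as assembly completeness

def retrieval_agent(request: str) -> list[Assembly]:
    """Interpret the request and fetch candidate assemblies (stubbed)."""
    return [Assembly("ASM_A", 97.5), Assembly("ASM_B", 82.0), Assembly("ASM_C", 99.1)]

def quality_filter_agent(assemblies: list[Assembly], min_completeness: float = 90.0) -> list[Assembly]:
    """Drop assemblies below a quality threshold."""
    return [a for a in assemblies if a.completeness >= min_completeness]

def ranking_agent(assemblies: list[Assembly]) -> list[Assembly]:
    """Order the remaining assemblies, best first."""
    return sorted(assemblies, key=lambda a: a.completeness, reverse=True)

def conversion_agent(assemblies: list[Assembly], fmt: str = "fasta") -> list[str]:
    """Convert selections into analysis-ready outputs (stubbed as file names)."""
    return [f"{a.accession}.{fmt}" for a in assemblies]

def run(request: str) -> list[str]:
    """Orchestrate the agents behind one natural-language request."""
    return conversion_agent(ranking_agent(quality_filter_agent(retrieval_agent(request))))

print(run("high-quality reference genomes for E. coli, FASTA format"))
```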

15:40-16:00
Omi: Bridging the Informatics to Bio Gap with a Natural Language Co-pilot
Format: In person


Authors List:

  • Prashant Bharadwaj Kalvapalle, Rice University, United States
  • Eddie Kim, Rice University, United States
  • Marko Tanevski, Rice University, United States
  • Sahil Joshi, Rice University, United States
  • Benjamin Mao, Rice University, United States
  • Anshumali Shrivastava, Rice University, United States
  • Todd Treangen, Rice University, United States

Presentation Overview:

Omi facilitates bioinformatics analysis by replacing complex command-line processes with a natural-language bioinformatics co-pilot. We codify bioinformatics best practices into the LLM so that it selects appropriate pipelines, explains them before they are run, and returns results after pipeline execution. Further democratization, via generated code for bespoke statistical analyses and data visualizations, is underway.