Attention Presenters - please review the Speaker Information Page available here
Schedule subject to change
All times listed are in BST
Monday, July 21st
11:20-11:40
Welcome to BOSC; Open Bioinformatics Foundation update; CoFest announcement; sponsor video
Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List:

  • Nomi Harris
11:40-12:40
Invited Presentation: Working together to develop, promote and protect our data resources: Lessons learnt developing CATH and TED
Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List:

  • Christine Orengo

Presentation Overview:

Dr. Christine Orengo is a Professor of Bioinformatics at University College London (UCL). Her research focuses on the development of algorithms to capture relationships between protein structures, sequences and functions. She has built one of the most comprehensive protein classifications, CATH. CATH structural and functional data for hundreds of millions of proteins has enabled studies that revealed essential universal proteins and their biological roles, and extended characterisation of biological systems implicated in disease e.g. in cell division, cancer and aging. The Encyclopedia of Domains (TED) is a joint effort by CATH (Orengo group) and the Jones group at University College London to identify and classify protein domains in AlphaFold2 models from AlphaFold Database version 4, covering over 188 million unique sequences and 365 million domain assignments.

12:40-13:00
Connecting Data, People, and Purpose: How Open Science is Advancing Bioinformatics in a Low-Resource Region (Nigeria)
Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List:

  • Seun Olufemi, OLS (Open Life Science), Nigeria

Presentation Overview:

In many low-resource regions, access to scientific training, collaboration opportunities, and computational tools remains limited, hindering both local research capacity and global scientific equity. In Nigeria, we sought to address these gaps by building a sustainable Community of Practice (CoP) for bioinformatics, grounded in open data, mentorship, and community-led learning.

Through the Open Seeds program by Open Life Science (OLS), we launched Bioinformatics Outreach Nigeria (https://bioinformatics-outreach-nigeria.github.io/), a grassroots initiative that uses open science principles to foster data literacy, equitable access to bioinformatics tools, and shared community resources. Our journey began with a nationwide survey, which revealed that over 60% of aspiring bioinformaticians lacked access to adequate training and infrastructure.

In response, we designed and delivered an "Open Science for Bioinformaticians" workshop, training 48 of 232 applicants in open data practices, principles of open science, reproducible research, and collaborative science. Pre- and post-training data showed significant knowledge gains and emphasized the value of continuous peer support. Building on these insights, we developed shared community infrastructure, such as a publicly accessible Open Canvas, a Code of Conduct, and open documentation practices, that not only reinforces transparency but also promotes inclusive and collaborative scientific work.

Our experience demonstrates how data-driven community building, open science mentorship, and collaborative infrastructure can enable lasting change, serving as a scalable model for other underserved regions aiming to bridge the scientific access gap.

Analytical code sharing practices in biomedical research
Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List:

  • Serghei Mangul, University of California, Los Angeles, United States
  • Dhrithi Deshpande, University of Southern California, United States
  • Viorel Munteanu, Technical University of Moldova, Moldova
  • Viorel Bostan, UTM, Moldova
  • Dumitru Ciorbă, Technical University of Moldova, Moldova
  • Nicole Nogoy, GigaScience, United States

Presentation Overview:

Data-driven computational analysis is becoming increasingly important in biomedical research as the amount of data being generated continues to grow. However, the frequent failure to share research outputs such as data, source code, and methods undermines the transparency and reproducibility of studies, which are critical to the advancement of science. Many published studies are not reproducible because insufficient documentation, code, and data are shared. We conducted a comprehensive analysis of 453 manuscripts published between 2016 and 2021 and found that 50.1% of them fail to share the analytical code. Even among those that did disclose their code, the vast majority failed to offer additional research outputs, such as data. Furthermore, only one in ten articles organized their code in a structured and reproducible manner. We discovered a significant association between the presence of code availability statements and increased code availability. Additionally, a greater proportion of studies conducting secondary analyses were inclined to share their code compared to those conducting primary analyses. In light of our findings, we propose raising awareness of code sharing practices and taking immediate steps to enhance code availability to improve reproducibility in biomedical research. By increasing transparency and reproducibility, we can promote scientific rigor, encourage collaboration, and accelerate scientific discoveries. We must prioritize open science practices, including sharing code, data, and other research products, to ensure that biomedical research can be replicated and built upon by others in the scientific community.

Introducing the Actionable Guidelines for FAIR Research Software Task Force
Confirmed Presenter: Bhavesh Patel, FAIR Data Innovations Hub, California Medical Innovations Institute, United States

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List:

  • Bhavesh Patel, FAIR Data Innovations Hub, California Medical Innovations Institute, United States
  • Daniel Garijo, Universidad Politécnica de Madrid, Spain
  • Marie-Christine Jacquemot-Perbal, INIST-CNRS, France
  • Kelvin Lee, McMaster University, Canada
  • Carlos Martinez-Ortiz, Netherlands eScience Center, Netherlands
  • Alexander Struck, Humboldt-Universitaet zu Berlin, Germany

Presentation Overview:

The Research Software Alliance (ReSA) has established a Task Force dedicated to translating the FAIR Principles for Research Software (FAIR4RS Principles) into practical, actionable guidelines. Existing field-specific actionable guidelines, such as the FAIR Biomedical Research Software (FAIR-BioRS) guidelines (first presented at BOSC 2022), lack cross-discipline community input. The Actionable Guidelines for FAIR Research Software Task Force, formed in December 2024, brings together a diverse team of research software developers to address this gap. The Task Force is using the FAIR-BioRS guidelines as a foundation while aiming to create generalized guidelines. It began by analyzing the FAIR4RS Principles, identifying six key requirement categories: Identifiers, Metadata for software publication and discovery, Standards for inputs/outputs, Qualified references, Metadata for software reuse, and License. To address these requirements, six sub-groups are conducting literature reviews and community outreach to define actionable practices for each category. Challenges include identifying suitable identifiers, archival repositories, and metadata standards across research domains. This presentation provides an overview of the Task Force, presents its current progress, and outlines opportunities for community involvement. We will also explain how the FAIR-BioRS guidelines led to this global effort. The Task Force is dedicated to making all its outcomes openly available (CC-BY-4.0 license). This initiative will benefit the biomedical open-source community by providing generalized guidelines for making software FAIR that apply beyond biomedical software, which is critical to prevent siloed practices and drive cross-discipline collaboration.

14:00-14:20
AMRColab: An Open-Access, Modular Bioinformatics Suite for Accessible Antimicrobial Resistance Genome Analysis
Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List:

  • Su Datt Lam, Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
  • Sabrina Di Gregorio, IBaViM, Facultad de Farmacia y Bioquímica, Universidad de Buenos Aires, Buenos Aires, Argentina
  • Mia Yang Ang, Management & Science University, Shah Alam, Selangor, Malaysia
  • Emma Griffiths, Simon Fraser University, Vancouver, British Columbia, Canada
  • Tengku Zetty Maztura Tengku Jamaluddin, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, Serdang, Malaysia
  • Sheila Nathan, Faculty of Science & Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
  • Hui-Min Neoh, UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia

Presentation Overview:

Antimicrobial resistance (AMR) is a global health crisis, projected to cause 39 million deaths by 2050. Surveillance of AMR pathogens is essential for tracking their resistance profiles and guiding interventions. However, many public health professionals face barriers in bioinformatics expertise and computational resources, limiting their ability to analyse pathogen genomes effectively.

We developed AMRColab, an open-source, modular bioinformatics suite hosted on Google Colaboratory. Released under the CC BY 4.0 license, AMRColab enables users with minimal technical background to detect and visualise AMR determinants from pathogen genomes in a ‘plug-and-play’ format—without requiring local installation or HPC infrastructure.

The platform integrates tools such as AMRFinderPlus, ResFinder, and hAMRonization, supporting comparative and transmission analysis. A proof-of-concept study using MRSA strains validated AMRColab’s effectiveness across labs. Two workshops with 60 participants demonstrated high adoption potential, with participants reporting increased confidence in using genomics for AMR surveillance.

We recently introduced two genome assembly modules: (1) SPAdes and QUAST for Illumina/IonTorrent data; (2) a Nanopore pipeline with FastQC, FastP, NanoPlot, Flye, medaka, and BactInspector. These standalone modules, currently in beta testing and slated for future workshops, extend AMRColab into a full workflow from raw reads to AMR profiling.

AMRColab’s accessible design makes it valuable for medical laboratory technologists, clinicians, and public health researchers to perform genome analysis, regardless of their computational expertise. By lowering technical barriers, AMRColab contributes towards democratizing AMR surveillance and equipping healthcare professionals with essential genomic analysis tools.

Project repository: https://github.com/amrcolab/AMRColab/
License: CC BY 4.0

NApy: Efficient Statistics in Python for Large-Scale Heterogeneous Data with Enhanced Support for Missing Data
Confirmed Presenter: Fabian Woller, Biomedical Network Science Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List:

  • Fabian Woller, Biomedical Network Science Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
  • Lis Arend, Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Germany
  • Christian Fuchsberger, Institute for Biomedicine, Eurac Research, Italy
  • Markus List, Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Germany
  • David B. Blumenthal, Biomedical Network Science Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany

Presentation Overview:

Existing Python libraries and tools lack the ability to efficiently run statistical tests (such as Pearson correlation, ANOVA, or the Mann-Whitney U test) on large datasets in the presence of missing values. This becomes an issue as soon as constraints on runtime and memory availability are essential considerations for a particular use case. Relevant research areas where such limitations arise include interactive tools and databases for exploratory analysis of large mixed-type data. At the same time, biomedical analyses of such large datasets (e.g., population cohorts or electronic health record data) have to date mostly investigated statistical associations between specific variables (e.g., correlations between measurements such as body mass index and blood pressure). However, the rapidly growing popularity of systems approaches in biomedicine makes it increasingly relevant to efficiently compute pairwise statistical associations for all available pairs of variables in a dataset.

To address this problem, we present the Python tool NApy, which relies on a Numba and C++ backend with OpenMP parallelization to enable scalable statistical testing on mixed-type datasets in the presence of missing values. We assess NApy's efficiency, with respect to both runtime and memory consumption, on simulated as well as real-world input data originating from a population cohort study. We show that NApy outperforms Python competitor tools and baseline implementations with naïve Python-based parallelization by orders of magnitude, enabling on-the-fly analyses in interactive applications. NApy is publicly available at https://github.com/DyHealthNet/NApy.
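NApy's core use case, computing a statistic over only the positions where both variables are observed (pairwise deletion), can be illustrated with a small pure-Python baseline of the kind the abstract benchmarks against. This sketch is not NApy's API, just the underlying Pearson-correlation computation:

```python
import math

def pairwise_pearson(x, y):
    """Pearson correlation with pairwise deletion: only positions where
    both values are present (not None) contribute to the statistic."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

# Perfectly correlated once the missing entries are dropped:
x = [1.0, 2.0, None, 4.0, 5.0]
y = [2.0, 4.0, 9.0, None, 10.0]
print(round(pairwise_pearson(x, y), 6))  # 1.0
```

NApy performs this kind of computation for all pairs of variables at once in compiled Numba/C++ code with OpenMP threads, which is where the reported orders-of-magnitude speed-up over naïve Python comes from.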

pyANI-plus -- whole-genome classification of microbes using Average Nucleotide Identity and similar methods
Confirmed Presenter: Peter Cock, Strathclyde Institute of Pharmacy & Biomedical Sciences, University of Strathclyde, Glasgow, UK, United Kingdom

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List:

  • Peter Cock, Strathclyde Institute of Pharmacy & Biomedical Sciences, University of Strathclyde, Glasgow, UK, United Kingdom
  • Angelika Kiepas, Strathclyde Institute of Pharmacy & Biomedical Sciences, University of Strathclyde, Glasgow, UK; Fera, UK, United Kingdom
  • Leighton Pritchard, Strathclyde Institute of Pharmacy & Biomedical Sciences, University of Strathclyde, Glasgow, UK, United Kingdom

Presentation Overview:

pyANI-plus is an open-source, MIT-licensed Python tool for whole-genome classification of microbes using Average Nucleotide Identity (ANI) and similar methods. It reimplements our earlier tool pyani with additional schedulers and methods. This presentation focuses on the technical changes rather than biological applications or methodological insights.

The workflow system Snakemake is used internally as a scheduler-agnostic wrapper for high-performance compute clusters. Compute jobs call the underlying tools and cache results as JSON files, which the main process imports into an SQLite3 database. The slowest methods can take around a day to compute for a thousand bacteria, i.e., one million pairwise comparisons.

The database schema references each FASTA format input genome by MD5 checksum, and each pairwise comparison references the query and subject checksums and the method configuration (including underlying tool versions). This enables efficient reuse of previously computed results for common use-cases like resuming an interrupted run, expanding an analysis by adding genomes, or reporting on a subset (for example, after removing outliers).
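The checksum-keyed caching described above can be sketched as follows; the field names and method configuration here are illustrative assumptions, not pyANI-plus's actual schema:

```python
import hashlib
import json

def fasta_md5(text: str) -> str:
    """MD5 checksum of a FASTA record's text (illustrative; pyANI-plus's
    exact hashing convention may differ)."""
    return hashlib.md5(text.encode()).hexdigest()

query = ">genome_A\nACGTACGT\n"
subject = ">genome_B\nACGTACGA\n"

# A cached comparison record keyed by query/subject checksums plus the
# method configuration, so identical comparisons can be skipped on rerun.
record = {
    "query_md5": fasta_md5(query),
    "subject_md5": fasta_md5(subject),
    "method": {"name": "ANIm", "tool": "nucmer", "version": "4.0.0"},
    "identity": 0.875,  # made-up result value for the sketch
}
print(json.dumps(record, indent=2))
```

Because the key depends only on input checksums and method configuration, a resumed or extended run can look up each pending comparison and reuse any result already in the database.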

The plotting commands provided use matplotlib and seaborn, but also export the associated data as tab-separated plain-text allowing the user to produce their own custom figures.

Our command-line interface is defined using the typer library and Python type annotations. All Python type annotations are validated with mypy, and code linting and style formatting are handled with ruff; both run automatically via pre-commit hooks and continuous integration testing. We have full test coverage in terms of lines of code, explicitly excluding corner cases like race conditions.

14:20-14:40
InterProScan 6: a modern large-scale protein function annotation pipeline
Confirmed Presenter: Matthias Blum, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List:

  • Matthias Blum, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Emma Hobbs, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Laise Florentino, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Alex Bateman, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom

Presentation Overview:

InterProScan 6 represents a major step forward in protein function annotation, addressing the scalability, modularity, and usability limitations of its predecessor. Re-engineered as a Nextflow-based workflow, InterProScan 6 is optimised for flexible deployment across a wide range of computational environments, from local workstations and high-performance computing clusters to cloud infrastructures, enabling efficient analysis of large protein datasets.

A key architectural innovation is the decoupling of application code from signature databases, allowing users to download only the required datasets on demand. This modular design significantly reduces storage overhead and supports concurrent use of multiple data releases, enhancing both flexibility and reproducibility.

InterProScan 6 also integrates state-of-the-art deep learning predictors, including DeepTMHMM for transmembrane helix prediction and SignalP 6.0 for signal peptide detection, resulting in improved annotation accuracy. Native support for containerisation via Docker, Singularity, and Apptainer ensures consistent execution across platforms and simplifies environment management.

The legacy match-lookup service from InterProScan 5 has been replaced by a redesigned Matches API: an intuitive, RESTful interface providing programmatic access to precomputed InterPro matches for all UniParc sequences. This facilitates seamless integration with a wide range of external tools and workflows.

InterProScan 6 delivers substantial improvements in flexibility, annotation quality, and accessibility. The alpha release is scheduled for late April 2025, with a full release planned for summer 2025 under the Apache 2 open source license.

14:40-15:00
Real-time base modification analysis for nanopore sequencing
Confirmed Presenter: Suneth Samarasinghe, Computer Science and Engineering, School of Computing, University of New South Wales, Sydney, Australia

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List:

  • Suneth Samarasinghe, Computer Science and Engineering, School of Computing, University of New South Wales, Sydney, Australia
  • Ira Deveson, Garvan Institute of Medical Research, Sydney, Australia
  • Hasindu Gamaarachchi, Computer Science and Engineering, School of Computing, University of New South Wales, Sydney, Australia

Presentation Overview:

Real-time analysis of DNA base modifications, particularly methylation, is crucial for making rapid decisions in contexts such as forensics and clinical settings, especially when combined with selective or adaptive sequencing. Traditional methods like bisulfite sequencing, while accurate, are limited by their need for large DNA samples and their inability to capture long-range methylation patterns. Nanopore sequencing, with its ability to generate long reads and detect base modifications through electrical signal analysis, offers a promising alternative.

We introduce RealFreq, a lightweight framework for retrieving real-time base modification frequencies while sequencing on nanopore devices. RealFreq is composed of two primary components: realfreq-pipeline, a modular script that manages data flow from the sequencing device to base modification detection (monitoring the raw-signal directory, basecalling the raw signals, and aligning reads to a reference genome), and realfreq-program, a C-based application that calls base modifications from the alignment files and maintains an in-memory map of base modifications. Additionally, we developed realfreq-server within realfreq-program using simple socket connections, providing an interface to query base modification information in real time.
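As a rough illustration of a socket-based query interface of this kind, the toy exchange below runs a stand-in server in a thread and queries it; the request and reply formats are invented for this sketch and are not RealFreq's actual protocol:

```python
import socket
import threading

# Toy stand-in for a realfreq-server-style responder: the real protocol is
# not documented here, so both request and reply formats are invented.
def serve(listener):
    conn, _ = listener.accept()
    with conn:
        query = conn.recv(1024).decode().strip()   # e.g. "chr1:10468"
        # Reply with a made-up "position n_mod n_total freq" line.
        conn.sendall(f"{query} 42 60 0.70\n".encode())

server = socket.socket()
server.bind(("127.0.0.1", 0))  # ephemeral port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=serve, args=(server,), daemon=True).start()

with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"chr1:10468\n")
    reply = client.recv(1024).decode().strip()

print(reply)  # chr1:10468 42 60 0.70
```

The point of such an interface is that the in-memory modification map never has to be flushed to disk to be queried: any client that can open a socket can poll frequencies while sequencing is still running.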

We demonstrate RealFreq's ability to keep up with the output data stream of an Oxford Nanopore Technologies (ONT) PromethION. The base modification calling algorithm we developed for realfreq-program is separately bundled with Minimod, a simple base modification analysis tool we developed. Minimod's output matches that of current state-of-the-art tools while improving execution time by 2x for DNA and 40x for RNA datasets when run on a laptop computer.

Open-Source GPU Acceleration for State-of-the-Art Nanopore Basecalling with Slorado
Confirmed Presenter: Bonson Wong, School of Computer Science and Engineering, UNSW Sydney, Australia

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List:

  • Bonson Wong, School of Computer Science and Engineering, UNSW Sydney, Australia
  • Hasindu Gamaarachchi, School of Computer Science and Engineering, UNSW Sydney, Australia

Presentation Overview:

Nanopore sequencing has become a popular technology for genomic research because of its cost-effectiveness and ability to sequence long reads. Nanopore technology offers solutions from portable sequencing devices, such as the MinION designed for in-field applications, to large-scale sequencing devices like the PromethION. A nanopore sequencer generates a time-series 'raw signal', which is then converted into a nucleobase sequence (A, C, G, T) through basecalling. Basecalling, however, can only be performed on a narrow range of hardware. Much of the implementation in the current state-of-the-art Dorado basecaller relies on a closed-source binary package for platform-specific optimisations, and Dorado is developed primarily for high-compute NVIDIA Graphics Processing Units (GPUs). Basecalling without these optimisations is impractical, so researchers working in resource-constrained environments are limited by Dorado's restricted hardware compatibility. We aim to open-source these large sections of the codebase to make basecalling technology accessible to researchers and developers. We provide two open-source software packages to the genomics community: 'Openfish', a library that accelerates CRF decoding tailored towards nanopore signal processing, implementing Dorado's decoding step on both NVIDIA and AMD GPUs; and 'Slorado', a lean, open-source basecaller that can be easily compiled for NVIDIA and AMD machines and serves as a framework for testing and benchmarking the entire basecalling pipeline.

Voyager-SDK: integrating and automating pipeline runs using Voyager-SDK and Voyager platform
Confirmed Presenter: Sinisa Ivkovic, Memorial Sloan Kettering Cancer Center, United States

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List:

  • Sinisa Ivkovic, Memorial Sloan Kettering Cancer Center, United States
  • Christopher Allan Bolipata, Memorial Sloan Kettering Cancer Center, United States
  • Nikhil Kumar, Memorial Sloan Kettering Cancer Center, United States
  • Eric Buehler, Memorial Sloan Kettering Cancer Center, United States
  • Danielle Pankey, Memorial Sloan Kettering Cancer Center, United States
  • Adrian Fraiha, Memorial Sloan Kettering Cancer Center, United States
  • Mark Donoghue, Memorial Sloan Kettering Cancer Center, United States
  • Nicholas Socci, Memorial Sloan Kettering Cancer Center, United States
  • Ronak Shah, Memorial Sloan Kettering Cancer Center, United States
  • David B. Solit, Memorial Sloan Kettering Cancer Center, United States

Presentation Overview:

At Memorial Sloan Kettering Cancer Center (MSKCC), we developed Voyager, a platform to automate the execution of computational pipelines built using the community standards Common Workflow Language (CWL) and Nextflow. Voyager streamlines the orchestration and monitoring of pipelines across various compute environments. By leveraging the nf-core input schema for Nextflow pipelines and the CWL schema, Voyager abstracts input handling across both technologies, enabling seamless integration and execution regardless of the underlying workflow engine.

To enable broader adoption and community contribution, we are introducing the Voyager SDK—a toolkit that empowers developers to integrate their pipelines into the platform via modular components called Operators. As the number of pipelines in our organization grew, it became increasingly important to decouple the logic of these Operators from the core Voyager codebase. Operators encapsulate pipeline-specific logic and metadata, providing a structured interface to the Voyager engine. By externalizing this logic through the SDK, we enable independent development, promote extensibility and portability, and empower developers to onboard new pipelines without modifying the platform itself.
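A minimal sketch of what such an Operator interface could look like follows; the class and method names are hypothetical illustrations, not the actual Voyager SDK API:

```python
from abc import ABC, abstractmethod

class Operator(ABC):
    """Hypothetical Operator base class: encapsulates pipeline-specific
    logic behind a structured interface the engine can call.
    (Illustrative only; not the real Voyager SDK names.)"""

    pipeline_name: str = ""

    @abstractmethod
    def build_inputs(self, request: dict) -> dict:
        """Map a platform run request onto the pipeline's input schema."""

class FastqQcOperator(Operator):
    """Example Operator for a made-up 'fastq-qc' pipeline."""

    pipeline_name = "fastq-qc"

    def build_inputs(self, request: dict) -> dict:
        # Pipeline-specific mapping lives here, not in the platform core.
        return {
            "samples": request["sample_ids"],
            "genome": request.get("genome", "GRCh38"),
        }

op = FastqQcOperator()
print(op.build_inputs({"sample_ids": ["S1", "S2"]}))
```

The engine only needs the structured interface; the pipeline-specific mapping lives entirely in the Operator, which is what lets Operators be developed and shipped independently of the platform itself.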

This talk will present the architecture of the Voyager platform, demonstrate how the SDK supports the creation and testing of Operators, and discuss how open standards and open-source tooling have been central to our development strategy. We will also share lessons learned from building infrastructure that balances institutional requirements with community best practices.

15:00-15:20
Empowering Bioinformatics Communities with nf-core: The success of an open-source bioinformatics ecosystem
Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List:

  • Björn Langer, Centre for Genomic Regulation (CRG), Spain
  • Andreia Amaral, MED. University of Évora, Portugal
  • Marie-Odile Baudement, Faculty of Biosciences, Norwegian University of Life Sciences, Norway
  • Franziska Bonath, Kungliga Tekniska Högskolan, School of Engineering Sciences in Chemistry, Biotechnology and Health, Sweden
  • Mathieu Charles, INRAE, AgroParisTech, GABI, Université Paris-Saclay, France
  • Praveen Krishna Chitneedi, Research Institute for Farm Animal Biology (FBN), Germany
  • Emily L. Clark, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, United Kingdom
  • Paolo Di Tommaso, Seqera, Spain
  • Sarah Djebali, IRSD, Université de Toulouse, INSERM, INRAE, ENVT, Univ Toulouse III - Paul Sabatier (UPS), France
  • Philip A. Ewels, Seqera, Spain
  • Sonia Eynard, GenPhySE, Université de Toulouse, INRAE, ENVT, France
  • James A. Fellows Yates, Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Germany
  • Daniel Fischer, Natural Resources Institute Finland (Luke), Applied Statistical Methods, Finland
  • Evan W. Floden, Seqera, Spain
  • Sylvain Foissac, GenPhySE, Université de Toulouse, INRAE, ENVT, France
  • Gisela Gabernet, Department of Pathology, Yale School of Medicine, United States
  • Maxime U. Garcia, Seqera, Spain
  • Gareth Gillard, Centre for Integrative Genetics (CIGENE), Faculty of Biosciences, Norwegian University of Life Sciences, Norway
  • Manu Kumar Gundappa, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, United Kingdom
  • Cervin Guyomar, GenPhySE, Université de Toulouse, INRAE, ENVT, France
  • Christopher Hakkaart, Seqera, Spain
  • Friederike Hanssen, Quantitative Biology Center (QBiC), University of Tübingen, Germany
  • Peter W. Harrison, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, United Kingdom
  • Matthias Hörtenhuber, Department of Immunology, Genetics and Pathology, Uppsala University, Sweden
  • Cyril Kurylo, GenPhySE, Université de Toulouse, INRAE, ENVT, France
  • Christa Kühn, Research Institute for Farm Animal Biology (FBN), Germany
  • Sandrine Lagarrigue, INRAE, AgroParisTech, GABI, Université Paris-Saclay, France
  • Delphine Lallias, INRAE, AgroParisTech, GABI, Université Paris-Saclay, France
  • Daniel J. Macqueen, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, United Kingdom
  • Edmund Miller, Department of Biological Sciences and Center for Systems Biology, The University of Texas at Dallas, United States
  • Júlia Mir-Pedrol, Quantitative Biology Center (QBiC), University of Tübingen, Germany
  • Gabriel Costa Monteiro Moreira, Unit of Animal Genomics, GIGA & Faculty of Veterinary Medicine, University of Liège, Belgium
  • Sven Nahnsen, Quantitative Biology Center (QBiC), University of Tübingen, Germany
  • Harshil Patel, Seqera, Spain
  • Alexander Peltzer, Boehringer Ingelheim Pharma GmbH & Co KG, Germany
  • Frederique Pitel, GenPhySE, Université de Toulouse, INRAE, ENVT, France
  • Yuliaxis Ramayo-Caldas, Animal Breeding and Genetics Program, Institute of Agrifood Research and Technology (IRTA), Spain
  • Marcel da Câmara Ribeiro-Dantas, Seqera, Spain
  • Dominique Rocha, INRAE, AgroParisTech, GABI, Université Paris-Saclay, France
  • Mazdak Salavati, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, United Kingdom
  • Alexey Sokolov, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, United Kingdom
  • Jose Espinosa-Carrasco, Centre for Genomic Regulation (CRG), Spain
  • Cedric Notredame, Centre for Genomic Regulation (CRG), Spain

Presentation Overview:

The nf-core community exemplifies how open-source software development fosters collaboration, innovation, and sustainability in bioinformatics. nf-core currently features a curated collection of 124 pipelines built using the Nextflow workflow management system according to community-agreed standards. These standards ensure that nf-core pipelines are high-quality, portable, and reproducible.
Since its creation in 2018, nf-core has constantly grown. Notably, the project has expanded beyond genomics and now supports pipelines across domains such as imaging, mass spectrometry, protein structure prediction, and disciplines outside life sciences like economics or earth biosciences. One of the reasons for this community's success is nf-core’s strong commitment to outreach and inclusiveness, exemplified by free training videos, hackathons, webinars, and a mentorship program that supports newcomers and underrepresented groups.
The recent introduction of Domain-Specific Language 2 (DSL2) in Nextflow enabled the development of reusable software components (modules and subworkflows), accelerating pipeline development. Currently, nf-core provides over 1000 modules (single-tool components) and 50 subworkflows (combinations of modules that wrap higher-level functionality). This modular architecture has strengthened nf-core's collaborative nature, making it easier and more appealing to contribute.
This open, community-driven framework was key for six European research consortia under the EuroFAANG umbrella, dedicated to farmed animal genomics. These consortia adopted nf-core as their standard for pipeline development, ensuring their contributions' long-term sustainability. Notably, nf-core standards have also been adopted by flagship projects such as the Darwin Tree of Life and Genomics England, highlighting the broad value and impact of the nf-core model.

15:20-15:40
JASPAR-Suite: An open toolkit for accessing TF binding motifs
Confirmed Presenter: Aziz Khan, Computational Biology Department, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, United Arab Emirates

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Aziz Khan, Computational Biology Department, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, United Arab Emirates
  • Anthony Mathelier, Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, Oslo 0318, Norway

Presentation Overview: Show

The JASPAR database (https://jaspar.elixir.no) is a widely used open-access database of manually curated, non-redundant transcription factor (TF) binding profiles across multiple species, supporting the global community of gene regulation researchers. As the field of regulatory genomics grows increasingly data-driven, JASPAR plays a vital role in providing high-quality position frequency matrices (PFMs) for TFs, enabling insights into gene expression regulation, enhancer activity, and transcriptional networks. The JASPAR database can be accessed through several user-friendly and programmatic interfaces, including a web interface for intuitive exploration, a RESTful API for cross-platform integration, the Bioconductor package for R users, and pyJASPAR—a flexible and Pythonic toolkit for both interactive and command-line access to TF motifs.

In this talk, we will demonstrate how JASPAR can be accessed using its RESTful API (https://jaspar.elixir.no/api/) from any programming environment, allowing seamless integration into bioinformatics workflows. I will also introduce pyJASPAR (https://github.com/asntech/pyjaspar), a lightweight Python package we developed to make JASPAR motif queries easy, scriptable, and reproducible—whether from a Jupyter notebook or a shell terminal. Together, these tools form the JASPAR Suite, designed to empower the scientific community with open, reproducible, and interoperable access to TF binding motifs. All the code, data, and workflows are openly available under open licenses, supporting transparency and reproducibility in computational biology research.
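As a concrete illustration of the programmatic access described above, the sketch below builds a query URL for the JASPAR RESTful API and parses a response in the paginated JSON shape such APIs return. The exact endpoint path, query parameters, and response field names are illustrative assumptions, not verified against the current API.

```python
import json
from urllib.parse import urlencode

# Base endpoint of the JASPAR RESTful API (https://jaspar.elixir.no/api/).
# The exact path and parameter names below are illustrative assumptions.
BASE = "https://jaspar.elixir.no/api/v1/matrix/"

def matrix_query_url(**params):
    """Build a query URL for the JASPAR matrix endpoint."""
    return BASE + "?" + urlencode(sorted(params.items()))

url = matrix_query_url(tax_group="vertebrates", search="CTCF", page_size=5)

# A trimmed, made-up payload in the paginated JSON shape such APIs return;
# the field names are assumptions, not a captured server response.
payload = json.loads(
    '{"count": 2, "results": ['
    '{"matrix_id": "MA0139.1", "name": "CTCF"},'
    '{"matrix_id": "MA1930.1", "name": "CTCF"}]}'
)

matrix_ids = [r["matrix_id"] for r in payload["results"]]
```

In practice the same query could be issued with any HTTP client (or via pyJASPAR without touching the REST layer directly); the point is that a plain URL plus JSON parsing is all a workflow needs to integrate JASPAR motifs.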

VueGen: automating the generation of scientific reports
Confirmed Presenter: Sebastian Ayala-Ruano, Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Denmark

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Sebastian Ayala-Ruano, Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Denmark
  • Henry Webel, Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Denmark
  • Alberto Santos Delgado, Novo Nordisk Foundation Center for Biosustainability, Denmark

Presentation Overview: Show

The analysis of omics data typically involves multiple bioinformatics tools and methods, each producing distinct output files. However, compiling these results into comprehensive reports often requires additional effort and technical skills. This creates a barrier for non-bioinformaticians, limiting their ability to produce reports from their findings. Moreover, the lack of streamlined reporting workflows impacts reproducibility and transparency, making it difficult to communicate results and track analytical processes.

Here, we present VueGen, an open-source software that addresses the limitations of current reporting tools by automating report generation from bioinformatics outputs, allowing researchers with minimal coding experience to communicate their results effectively. With VueGen, users can produce reports by simply specifying a directory containing output files, such as plots, tables, networks, Markdown text, and HTML components, along with the report format. Supported formats include documents (PDF, HTML, DOCX, ODT), presentations (PPTX, Reveal.js), Jupyter notebooks, and Streamlit web applications. To showcase VueGen’s functionality, we present two case studies and provide detailed documentation to help users generate customized reports.

VueGen was designed with accessibility and community contribution in mind, offering multiple implementation options for users with varying technical expertise. It is available as a Python package, a portable Docker image, and an nf-core module, leveraging established open-source ecosystems to facilitate integration and reproducibility. Furthermore, a cross-platform desktop application for macOS and Windows provides a user-friendly interface for users less familiar with command-line tools. The source code is freely available on https://github.com/Multiomics-Analytics-Group/vuegen. Documentation is provided at https://vuegen.readthedocs.io/.

The world’s biomedical knowledge in less than a gram: introducing the PGP incubator
Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Peter Amstutz, Curii Corporation, United States
  • Sarah Zaranek, Curii Corporation, United States
  • Alexander Sasha Wait Zaranek, Curii Corporation, United States
  • Zoe Ma, Curii Corporation, United States

Presentation Overview: Show

In this talk, we describe a new project, the Personal Genome Project incubator (PGPincubator): an effort to create a distribution of open data, tools, workflows, AI models and learning materials that support validation, benchmarking, and education in bioinformatics and biomedicine for precision health and (pre-clinical) biomedical AI. In addition, the incubator is a distributed network of physical computing infrastructure used to test components included in the distribution, such as validating genomics workflows or benchmarking AI models.

To help hatch this network, PGPincubator is running a private network of “h-grams.” An h-gram is 1-4 microSD cards (3-4 weigh about a gram!) each flashed with an operating system image that can be booted on compatible commodity PC hardware. The operating system (Ubuntu) is pre-configured to act as a server suitable for home, office or lab and is accessed by other devices through a browser. Each h-gram is pre-loaded with hundreds of gigabytes of openly licensed infrastructure software, bioinformatics tools, genomic datasets, AI models, and learning resources.

The PGPincubator data and software distribution pre-loaded on the h-gram will be updated on a six-month release schedule, inspired by Linux distribution releases. With both software and data sets distributed in versioned releases, it becomes far easier for researchers to precisely identify the software and data used in their work, for others to reproduce that work, and for students to study it, while ensuring that validation and benchmarking are done fairly against a common baseline.

15:40-16:00
Slivka: a new ecosystem for wrapping local code as web services
Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Jim Procter, University of Dundee, United Kingdom
  • Mateusz Warowny, University of Dundee, Poland
  • Stuart MacGowan, University of Dundee, United Kingdom
  • Javier Utges, University of Dundee, United Kingdom
  • Geoff Barton, University of Dundee, United Kingdom

Presentation Overview: Show

Slivka is a Python/Flask/MongoDB framework that makes command-line tools available as web services through the creation of a YAML document supporting flexible execution configuration and semantic service discovery. Deployable via conda and Docker, the framework has been used to provide services for the Jalview desktop and web-based platform for interactive sequence alignment and analysis, as well as new services for the analysis of protein-ligand binding sites. Slivka is released under the Apache 2.0 license and is developed under an open consortium model to foster community support across both industry and academia.
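To make the wrapping idea concrete, here is a minimal sketch of what a YAML service definition for a command-line tool might look like. The field names and structure are illustrative assumptions, not Slivka's actual schema; consult the Slivka documentation for the real format.

```yaml
# Illustrative sketch only -- field names are assumptions, not Slivka's exact schema.
name: clustalo
description: Multiple sequence alignment with Clustal Omega
command: clustalo -i ${input} -o ${output}
parameters:
  input:
    type: file
    label: Input sequences (FASTA)
  output:
    type: string
    default: aligned.fasta
```

The appeal of this approach is that the tool author writes no web code at all: the framework reads the declaration, generates the HTTP endpoints, and schedules executions.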

16:40-17:00
FAIRDOM-SEEK: Platform for FAIR data and research asset management
Confirmed Presenter: Munazah Andrabi, The University of Manchester, United Kingdom

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Munazah Andrabi, The University of Manchester, United Kingdom
  • Stuart Owen, The University of Manchester, United Kingdom
  • Finn Bacall, The University of Manchester, United Kingdom
  • Phil Reed, The University of Manchester, United Kingdom
  • Xiaoming Hu, Heidelberg Institute for Theoretical Studies, Germany
  • Ulrike Wittig, Heidelberg Institute for Theoretical Studies, Germany
  • Maja Rey, Heidelberg Institute for Theoretical Studies, Germany
  • Martin Golebiewski, Heidelberg Institute for Theoretical Studies, Germany
  • Flora D'Anna, Vlaams Instituut voor Biotechnologie (VIB), Belgium
  • Kevin De Pelseneer, Vlaams Instituut voor Biotechnologie (VIB), Belgium
  • Jacky Snoep, Stellenbosch University, South Africa
  • Wolfgang Müller, Heidelberg Institute for Theoretical Studies, Germany
  • Carole Goble, The University of Manchester, United Kingdom

Presentation Overview: Show

As research becomes more data-driven, collaborative, and interdisciplinary, the need for structured, accessible, and well-curated data outputs with rich, standardized metadata is critical to ensure data is discoverable and reusable beyond its original context. The FAIRDOM-SEEK platform addresses these challenges by providing a customisable, open-source, web-based catalogue designed to support FAIR (Findable, Accessible, Interoperable, Reusable) data and asset management.

FAIRDOM-SEEK enables scientists to organize, document, share, and publish research data using the Investigation, Study, Assay (ISA) framework, which structures experiments and related assets such as samples and protocols. Key features include robust metadata and sample management, version control, linking to external repositories, and integration with modeling tools. Controlled sharing and DOI creation further enhance collaboration and long-term accessibility.

The platform supports the creation of dedicated Project Hubs, customised local instances deployed for specific projects. These allow tailored use of the platform’s core capabilities, including modified appearance, structure, and content. Notable examples include IBISBAHub, WorkflowHub, the NFDI4Health Local DataHub and DataHub. The MIT BioMicroCenter has integrated the platform to streamline data and sample management for its ongoing research initiatives. In addition, FAIRDOMHub, the flagship public instance, serves over 400 national and international projects as both a repository and a knowledge-sharing platform, promoting interdisciplinary collaboration and community engagement. As a core resource for many European organisations (e.g. de.NBI, ELIXIR) and international consortia, FAIRDOMHub plays a vital role in research data management. In the talk we will present the salient features of FAIRDOM-SEEK and highlight how it facilitates FAIR data management.

17:00-17:20
Walkthrough of GA4GH standards and the interoperability they provide for genomic data implementations
Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Jimmy Payyappilly, European Molecular Biology Laboratory, European Bioinformatics Institute, United Kingdom
  • Dashrath Chauhan, European Molecular Biology Laboratory, European Bioinformatics Institute, United Kingdom
  • Sasha Siegel, European Molecular Biology Laboratory, European Bioinformatics Institute, United Kingdom
  • Andrew D Yates, European Molecular Biology Laboratory, European Bioinformatics Institute, United Kingdom
  • Chen Chen, Ontario Institute for Cancer Research, Canada
  • Deeptha Srirangam, Broad Institute of MIT and Harvard, United States

Presentation Overview: Show

The sharing of genomic and health-related data for biomedical research is of key importance in ensuring continued progress in our understanding of human health and wellbeing. In this journey, bioinformatics and genomics continue to be closely coupled. To further expand the benefits of research, the Global Alliance for Genomics and Health (GA4GH) builds the foundation for broad and responsible use of genomic data by setting standards and framing policies to expand genomic data use, guided by the Universal Declaration of Human Rights. As is true with any data, interoperability between open-source systems processing genomic data is vital. When systems are based on standards, interactions between technical ecosystems become easier, as there is a common framework for requesting and exchanging resources. In this talk, we present two GA4GH open-source standards whose reference implementations demonstrate interoperability between standards. Through this session, we will showcase use cases showing how these standards support data science for genomics research and ensure easy discoverability of data across the globe.

LiMeTrack: A lightweight biosample management platform for the multicenter SATURN3 consortium
Confirmed Presenter: Laura Godfrey, Institute for Computational Cancer Biology (ICCB), University Hospital and University of Cologne, Germany

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Florian Heyl, German Cancer Research Center (DKFZ), Division of Computational Genomics and Systems Genetics (B260), Germany
  • Jonas Gassenschmidt, Institute of Medical Bioinformatics and Systems Medicine (IBSM), Medical Center - University of Freiburg, Germany
  • Lukas Heine, Institute for AI in Medicine, Cancer Research Center Cologne Essen (CCCE), University Medicine Essen, Essen, Germany
  • Frederik Voigt, Institute of Medical Bioinformatics and Systems Medicine (IBSM), Medical Center - University of Freiburg, Germany
  • Jens Kleesiek, Institute for AI in Medicine, Cancer Research Center Cologne Essen (CCCE), University Medicine Essen, Essen, Germany
  • Oliver Stegle, German Cancer Research Center (DKFZ), Division of Computational Genomics and Systems Genetics (B260), Germany
  • Jens Siveke, Bridge Institute of Experimental Tumor Therapy (BIT), University Hospital Essen & University of Duisburg-Essen, Germany
  • Melanie Boerries, Institute of Medical Bioinformatics and Systems Medicine (IBSM), Medical Center - University of Freiburg, Germany
  • Roland Schwarz, Institute for Computational Cancer Biology (ICCB), University Hospital and University of Cologne, Germany
  • Laura Godfrey, Institute for Computational Cancer Biology (ICCB), University Hospital and University of Cologne, Germany

Presentation Overview: Show

Biomedical research projects involving large patient cohorts are increasingly complex, both in terms of data modalities and number of samples. Hence, they require robust data management solutions to foster data integrity, reproducibility and secondary use compliant with the FAIR principles. SATURN3, a German consortium with 17 partner sites, investigates intratumoral heterogeneity using patient biosamples. As part of a complex, multicenter workflow, high-level multimodal analyses include bulk, single-cell, and spatial omics and corresponding data analysis. To manage this complexity and to avoid miscommunication, data loss and de-synchronization at different project sites, harmonization in a central infrastructure is essential. Additionally, real-time monitoring of the sample processing status must be accessible to all project partners throughout the project. This use case goes far beyond the capabilities of spreadsheets, which are susceptible to security vulnerabilities, versioning mistakes, data loss and type conversion errors. Existing data management tools are often complex to set up or lack the flexibility to be adapted to specific project needs. To address these challenges, we introduce LightMetaTrack (LiMeTrack), a biosample management platform built on the Django framework. Key features include customizable and user-friendly forms for data entry and a real-time dashboard for project and sample status overview. LiMeTrack simplifies the creation and export of sample sheets, streamlining subsequent bioinformatics analyses and research workflows. By integrating real-time monitoring with robust sample tracking and data management, LiMeTrack improves research transparency and reproducibility, ensures data integrity and optimizes workflows, making it a powerful solution for biosample management in multicenter biomedical research endeavours.

BFVD – A release of 351k viral protein structure predictions
Confirmed Presenter: Rachel Seongeun Kim, Seoul National University, Republic of Korea

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Rachel Seongeun Kim, Seoul National University, Republic of Korea
  • Eli Levy Karin, ELKMO, Denmark
  • Milot Mirdita, Seoul National University, Republic of Korea
  • Rayan Chikhi, Institut Pasteur, France
  • Martin Steinegger, Seoul National University, Republic of Korea

Presentation Overview: Show

While the AlphaFold Protein Structure Database (AFDB) is the largest resource of accurately predicted structures – covering 214 million UniProt entries with taxonomic labels – it excludes viral sequences, limiting its utility for virology. To fill this gap, we present the Big Fantastic Virus Database (BFVD), a repository of 351,242 protein structures predicted using ColabFold on viral sequence representatives of the UniRef30 clusters. By augmenting Logan’s petabase-scale SRA assemblies in homology searches and applying 12-recycle refinement, we further enhanced the confidence scores of 41% of BFVD entries.
BFVD serves as an essential, viral-focused expansion to existing protein structure repositories. Over 62% of its entries show no or low structural similarity to the PDB and AFDB, underscoring the novelty of its content. Notably, BFVD enables identification of a substantial fraction of bacteriophage proteins, which remain uncharacterized at the sequence level, by matching them to similar structures. In this respect, BFVD is on par with the AFDB, despite holding nearly three orders of magnitude fewer structures. Freely downloadable at bfvd.steineggerlab.workers.dev and explorable via Foldseek with UniProt labels at bfvd.foldseek.com, BFVD offers new opportunities for advanced viral research.

17:20-17:40
AI-readiness for biomedical data: Bridge2AI recommendations
Confirmed Presenter: Monica Munoz-Torres, University of Colorado Anschutz Medical Campus, United States

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Monica Munoz-Torres, University of Colorado Anschutz Medical Campus, United States

Presentation Overview: Show

The convergence of biomedical research and artificial intelligence (AI) promises unprecedented insights into complex biological systems. However, realizing this potential demands datasets meticulously designed for AI/ML analysis, addressing challenges from data quality to the critical imperatives of explainable AI (XAI) and ethical, legal, and social implications (ELSI).

The NIH Bridge2AI program is at the forefront of this effort, creating flagship biomedical datasets and establishing best practices for AI/ML data science. This paper, authored by the Bridge2AI Standards Working Group, introduces foundational criteria for assessing the AI-readiness of biomedical data. We present actionable methods and data standards perspectives developed within the program, emphasizing their crucial role in fostering scientific rigor and responsible innovation.

These AI-readiness criteria encompass essential considerations for XAI – ensuring the interpretability of AI-driven discoveries – and proactive integration of ELSI principles. While the landscape of biomedical AI rapidly evolves, these guidelines provide a vital framework for scientific rigor, enabling the creation and utilization of high-quality, ethically sound data resources that will drive impactful advancements in bioinformatics and beyond. During this presentation, we will examine these foundational standards that are shaping the future of AI in molecular biology and medicine.

17:40-17:50
Bridging the gap: advancing aging & dementia research through the open-access AD Knowledge Portal
Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Jo Scanlan, Sage Bionetworks, United States
  • Amelia Kallaher, Sage Bionetworks, United States
  • Zoe Leanza, Sage Bionetworks, United States
  • Jessica Britton, Sage Bionetworks, United States
  • Jaclyn Beck, Sage Bionetworks, United States
  • Beatriz Saldana, Sage Bionetworks, United States
  • Anthony Pena, Sage Bionetworks, United States
  • William Poehlman, Sage Bionetworks, United States
  • Victor Baham, Sage Bionetworks, United States
  • Trisha Zintel, Sage Bionetworks, United States
  • Jesse Wiley, Sage Bionetworks, United States
  • Karina Leal, Sage Bionetworks, United States
  • Jessica Malenfant, Sage Bionetworks, United States
  • Laura Heath, Sage Bionetworks, United States
  • Susheel Varma, Sage Bionetworks, United Kingdom

Presentation Overview: Show

The AD Knowledge Portal (adknowledgeportal.org) is an NIA-funded resource developed by Sage Bionetworks to facilitate Alzheimer's Disease research through open data sharing. The secure Synapse platform enables researchers to share data with proper attribution while ensuring compliance with FAIR principles.

The Portal aggregates resources from 14 NIH-funded research programs and 97 aging-related grants, housing approximately 800TB of data from over 11,000 individuals. This multimodal data encompasses genomics, transcriptomics, epigenetics, imaging, proteomics, metabolomics, and behavioural assays from various sources, including brain banks, longitudinal cohorts, cell lines, and animal models. Recent additions include 290 TB of single-cell and nucleus expression data, alongside experimental tools and computational resources.

All content is available under Creative Commons BY 4.0 licenses, with software under open-source licenses such as Apache 2.0. The Portal's code is publicly available on GitHub with comprehensive documentation. The Community Data Contribution Program extends the Portal's scope beyond NIA-funded projects.

Since January 2022, over 6,000 unique users have downloaded 12.57 PB of data, with monthly downloads doubling between 2023-2024. Portal data has been cited in over 1,000 publications since 2019, with more than half of these representing the reuse of secondary data. Integration with platforms like CAVATICA and Terra enhances accessibility. Future developments include interoperability with AD Workbench, NACC, NIAGADS, and LONI, as well as new data types such as spatial transcriptomics and longitudinal data from Alzheimer's disease models.

17:50-18:00
The ELITE Portal: A FAIR Data Resource For Healthy Aging Over The Life Span
Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Milan Vu, Sage Bionetworks, United States
  • Tanveer Talukdar, Sage Bionetworks, United States
  • Amelia Kallaher, Sage Bionetworks, United States
  • Melissa Klein, Sage Bionetworks, United States
  • Natosha Edmonds, Sage Bionetworks, United Kingdom
  • Jessica Malenfant, Sage Bionetworks, United States
  • Christine Suver, Sage Bionetworks, United States
  • Laura Heath, Sage Bionetworks, United States
  • Alberto Pepe, Sage Bionetworks, United States
  • Luca Foschini, Sage Bionetworks, United States
  • Solly Sieberts, Sage Bionetworks, United States
  • Susheel Varma, Sage Bionetworks, United Kingdom

Presentation Overview: Show

Exceptional longevity (EL) is a rare phenotype characterized by an extended health span and sustained physiological function. Various domain-specific factors contribute to EL, influencing the maintenance of key physiological systems (e.g., respiratory, cardiovascular, immune) and functional domains (e.g., mobility, cognition). Studying the impacts of protective genetic variants and cellular mechanisms associated with EL facilitates the identification of novel therapeutic targets that replicate their beneficial effects. The Exceptional Longevity Translational Resources (ELITE) Portal (eliteportal.synapse.org) is a new, open-access repository for disseminating data and other research resources from translational longevity research projects. The portal supports diverse data types, including genetic, transcriptomic, epigenetic, proteomic, metabolomic, and phenotypic data from longitudinal human cohort studies and cross-species comparative biology studies of tens to hundreds of nonhuman species; data from longevity-enhancing intervention studies in mouse and cell models; access to web applications and software tools to support exploration of EL-related research outcomes; and a catalog of publications associated with the National Institute on Aging (NIA)-funded translational longevity research projects. The portal also integrates with the external Trusted Research Environment (TRE) CAVATICA and is poised to support future integrations with additional data resources like Terra. All resources hosted in the ELITE Portal are distributed under FAIR (Findable, Accessible, Interoperable, and Reusable) principles. The ELITE Portal is funded by NIA grants 5U24AG078753 and 2U24AG061340.

Tuesday, July 22nd
11:20-12:20
Invited Presentation: Open Knowledge Bases in the Age of Generative AI
Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Chris Mungall

Presentation Overview: Show

Dr. Chris Mungall is a Senior Scientist at Berkeley Lab, where he heads the Biosystems Data Science department in the Environmental Genomics and Systems Biology Division. Chris’s research interests center around the capture, computational integration, and dissemination of biological research data, and the development of methods for using this data to elucidate biological mechanisms underpinning the health of humans and of the planet. He and his team have led the creation of key biological ontologies for the integration of resources covering gene function, anatomy, phenotypes and the environment, including the Uberon anatomy ontology, the Cell Ontology (CL), and the Mondo disease ontology. He is also one of the cofounders of the OBO Foundry. For decades, he has been a strong advocate for open-source bioinformatics software, open standards, and open science.

12:20-12:40
textToKnowledgeGraph: Generation of Molecular Interaction Knowledge Graphs Using Large Language Models for Exploration in Cytoscape
Confirmed Presenter: Favour James, Obafemi Awolowo University, Nigeria

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Favour James, Obafemi Awolowo University, Nigeria
  • Christopher Churas, Department of Medicine, University of California San Diego, La Jolla, CA, United States
  • Trey Ideker, Department of Medicine, University of California San Diego, La Jolla, CA, United States
  • Dexter Pratt, Department of Medicine, University of California San Diego, La Jolla, CA, United States
  • Augustin Luna, National Library of Medicine and National Cancer Institute, Bethesda, MD, United States

Presentation Overview: Show

Knowledge graphs (KGs) are powerful tools for structuring and analyzing biological information due to their ability to represent data and improve queries across heterogeneous datasets. However, constructing KGs from unstructured literature remains challenging due to the cost and expertise required for manual curation. Prior works have explored text-mining techniques to automate this process, but have limitations that impact their ability to capture complex relationships fully. Traditional text-mining methods struggle with understanding context across sentences. Additionally, these methods lack expert-level background knowledge, making it difficult to infer relationships that require awareness of concepts indirectly described in the text.
Large Language Models (LLMs) present an opportunity to overcome these challenges. LLMs are trained on diverse literature, equipping them with contextual knowledge that enables more accurate extraction. Additionally, LLMs can process the entirety of an article, capturing relationships across sections rather than analyzing single sentences; this allows for more precise extraction. We present textToKnowledgeGraph, an artificial intelligence tool using LLMs to extract interactions from individual publications directly in Biological Expression Language (BEL). BEL was chosen for its compact and detailed representation of biological relationships, allowing for structured and computationally accessible encoding.
This work makes several contributions:
1. Development of the open-source Python textToKnowledgeGraph package (pypi.org/project/texttoknowledgegraph) for BEL extraction from scientific articles, usable from the command line and within other projects.
2. An interactive application within Cytoscape Web to simplify extraction and exploration.
3. A dataset of extractions that have been both computationally and manually reviewed to support future fine-tuning efforts.
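For readers unfamiliar with BEL, the hypothetical statements below illustrate the kind of compact, structured relationships such extraction produces; the specific entities and relations are invented for illustration, not drawn from the paper's dataset.

```
p(HGNC:TP53) -> r(HGNC:CDKN1A)
a(CHEBI:doxorubicin) -| p(HGNC:TOP2A)
```

Here `p()` denotes a protein abundance, `r()` an RNA abundance, `a()` a chemical abundance, `->` an "increases" relationship, and `-|` a "decreases" relationship, which is what makes BEL both human-readable and computationally accessible.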

12:40-13:00
BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models built on Biomed-Multi-Omic
Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Bharath Dandala, IBM, United States
  • Michael M Danziger, IBM, Israel
  • Ching-Huei Tsou, IBM, United States
  • Akira Koseki, IBM, Japan
  • Viatcheslav Gurev, IBM, United States
  • Tal Kozlovski, IBM, Israel
  • Ella Barkan, IBM, Israel
  • Matthew Madgwick, IBM, United Kingdom
  • Akihiro Kosugi, IBM, Japan
  • Tanwi Biswas, IBM, Japan
  • Liran Szalk, IBM, Israel
  • Matan Ninio, IBM, Israel

Presentation Overview: Show

High-throughput sequencing has revolutionized transcriptomic studies, and the synthesis of these diverse datasets holds significant potential for a deeper understanding of cell biology. Recent advancements have introduced several promising techniques for building transcriptomic foundation models (TFMs), each emphasizing unique modeling decisions and demonstrating potential in handling the inherent challenges of high-dimensional, sparse data. However, despite their individual strengths, current TFMs still struggle to fully capture biologically meaningful representations, highlighting the need for further improvements. Recognizing that existing TFM approaches possess complementary strengths and weaknesses, a promising direction lies in the systematic exploration of various combinations of design, training, and evaluation methodologies. Thus, to accelerate progress in this field, we present bmfm-rna (shown in Figure 1), a comprehensive framework that not only facilitates this combinatorial exploration but is also inherently flexible and easily extensible to incorporate novel methods as the field continues to advance. This framework enables scalable data processing and features extensible transformer architectures. It supports a variety of input representations, pretraining objectives, masking strategies, domain-specific metrics, and model interpretation methods. Furthermore, it facilitates downstream tasks such as cell type annotation, perturbation prediction, and batch effect correction on benchmark datasets. Models trained with the framework achieve performance comparable to scGPT, Geneformer and other TFMs on these downstream tasks. By open-sourcing this framework with strong performance, we aim to lower barriers for developing TFMs and invite the community to build more effective TFMs. bmfm-rna is available via Apache license at https://github.com/BiomedSciAI/biomed-multi-omic
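As a toy illustration of one design axis mentioned above (masking strategies), the sketch below corrupts a fraction of a gene-expression profile and records which positions a model would be asked to reconstruct during pretraining. This is a generic masked-value pretraining sketch, not code from the bmfm-rna framework; the function name and parameters are invented for illustration.

```python
import random

def mask_expression(profile, mask_frac=0.15, mask_value=-1.0, seed=0):
    """Randomly mask a fraction of expression values, returning the
    corrupted profile and the indices the model must reconstruct."""
    rng = random.Random(seed)
    n_mask = max(1, int(mask_frac * len(profile)))
    idx = rng.sample(range(len(profile)), n_mask)
    corrupted = list(profile)
    for i in idx:
        corrupted[i] = mask_value  # sentinel the model learns to fill in
    return corrupted, sorted(idx)

# One cell's (toy) expression vector over 10 genes.
profile = [0.0, 2.3, 0.0, 5.1, 1.2, 0.0, 3.3, 0.7, 0.0, 4.4]
corrupted, masked = mask_expression(profile)
```

Frameworks like the one described vary exactly these knobs (which positions to mask, what sentinel to use, what fraction) alongside input representations and objectives, which is why treating them as swappable components makes combinatorial exploration tractable.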

DOME Registry - Supporting ML transparency and reproducibility in the life sciences
Confirmed Presenter: Gavin Farrell, Uni Padova, Italy

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Gavin Farrell, Uni Padova, Italy
  • Omar Attafi, University of Padova, Italy
  • Silvio Tosatto, University of Padova, Italy

Presentation Overview: Show

The adoption of machine learning (ML) methods in the life sciences has been transformative, solving landmark challenges such as accurate protein structure prediction, improving bioimaging diagnostics and accelerating drug discovery. However, researchers face a reuse and reproducibility crisis in ML publications. Authors are publishing ML methods that lack the core information needed to transfer value back to the reader. Commonly absent are links to code, data and models, eroding trust in the methods.

In response, ELIXIR Europe developed a practical checklist of recommendations covering the key aspects of ML methods that should be disclosed: data, optimisation, model and evaluation. These are now known collectively as the DOME Recommendations, published in Nature Methods by Walsh et al. in 2021. Building on this successful first step towards addressing the ML publishing crisis, ELIXIR has developed a technological solution to support the implementation of the DOME Recommendations. This solution is known as the DOME Registry and was published in GigaScience by Attafi et al. in late 2024.

This talk will cover the DOME Registry technology, which serves as a curated database of ML methods for life science publications by allowing researchers to annotate and share their methods. The service can also be adopted by publishers during their ML publishing workflow to increase a publication’s transparency and reproducibility. An overview of the next steps for the DOME Registry will also be provided, considering new ML ontologies, metadata formats and integrations building towards a stronger ML ecosystem for the life sciences.
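A DOME annotation can be thought of as structured metadata across the four checklist sections. The field names and values below paraphrase the Data/Optimisation/Model/Evaluation axes for illustration; they are not the Registry's actual schema:

```python
# Illustrative DOME-style annotation record; keys paraphrase the four
# DOME sections rather than reproducing the Registry's exact schema.
dome_annotation = {
    "data": {
        "source": "UniProtKB release 2024_01",          # example value
        "splits": {"train": 0.8, "validation": 0.1, "test": 0.1},
        "availability": "https://example.org/dataset",  # placeholder URL
    },
    "optimization": {
        "algorithm": "gradient-boosted trees",
        "hyperparameter_search": "5-fold cross-validation",
    },
    "model": {
        "architecture": "LightGBM classifier",
        "weights_available": True,
    },
    "evaluation": {
        "metrics": ["MCC", "AUROC"],
        "comparison_baselines": ["BLAST nearest neighbour"],
    },
}

def missing_sections(annotation):
    """Return the DOME sections absent from an annotation record,
    the kind of completeness check a registry can automate."""
    required = {"data", "optimization", "model", "evaluation"}
    return sorted(required - set(annotation))

complete = not missing_sections(dome_annotation)
```

Making such records first-class, queryable objects (rather than prose buried in a methods section) is what turns the checklist into enforceable publishing infrastructure.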

AutoPeptideML 2: An open source library for democratizing machine learning for peptide bioactivity prediction
Confirmed Presenter: Raúl Fernández-Díaz, IBM Research | UCD Conway Institute, Ireland

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Raúl Fernández-Díaz, IBM Research | UCD Conway Institute, Ireland
  • Thanh Lam Hoang, IBM Research Dublin, Ireland
  • Vanessa Lopez, IBM Research Dublin, Ireland
  • Denis Shields, University College Dublin, Ireland

Presentation Overview: Show

Peptides are a rapidly growing drug modality with diverse bioactivities and accessible synthesis, particularly for canonical peptides composed of the 20 standard amino acids. However, enhancing their pharmacological properties often requires chemical modifications, increasing synthesis cost and complexity. Consequently, most existing data and predictive models focus on canonical peptides. To accelerate the development of peptide drugs, there is a need for models that generalize from canonical to non-canonical peptides.

We present AutoPeptideML, an open-source, user-friendly machine learning platform designed to bridge this gap. It empowers experimental scientists to build custom predictive models without specialized computational knowledge, enabling active learning workflows that optimize experimental design and reduce sample requirements. AutoPeptideML introduces key innovations: (1) preprocessing pipelines for harmonizing diverse peptide formats (e.g., sequences, SMILES); (2) automated sampling of negative peptides with matched physicochemical properties; (3) robust test set selection with multiple similarity functions (via the Hestia-GOOD framework); (4) flexible model building with multiple representation and algorithm choices; (5) thorough model evaluation for unseen data at multiple similarity levels; and (6) FAIR-compliant, interpretable outputs to support reuse and sharing. A webserver with GUI enhances accessibility and interoperability.
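The matched-negative-sampling idea in point (2) can be sketched as follows: for each positive peptide, pick the candidate negative whose physicochemical properties are closest, so the model cannot separate classes on trivial composition differences. This is a simplified illustration using only length and a crude hydrophobic fraction, not AutoPeptideML's actual implementation:

```python
# Toy matched negative sampling: for each positive peptide, choose the
# candidate negative most similar in length and hydrophobic fraction.
# Simplified illustration only; the real pipeline uses richer properties.
HYDROPHOBIC = set("AILMFWVY")

def features(seq):
    hydro = sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
    return (len(seq), hydro)

def matched_negatives(positives, candidates):
    chosen = []
    pool = list(candidates)
    for pos in positives:
        plen, phydro = features(pos)
        # distance combines a normalised length gap and the hydrophobicity gap
        best = min(pool, key=lambda c: abs(features(c)[0] - plen) / max(plen, 1)
                                       + abs(features(c)[1] - phydro))
        chosen.append(best)
        pool.remove(best)  # sample without replacement
    return chosen

positives = ["KLWKKWAKKWLK", "GIGAVLKVLTTG"]
candidates = ["KLAKLAKKLAKL", "AAAA", "GIGKFLHSAKKF", "PPPPPPPPPPPP"]
negs = matched_negatives(positives, candidates)
```

Because the very short, composition-skewed candidates are never selected, downstream metrics reflect bioactivity signal rather than easily guessed property differences.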

We validated AutoPeptideML on 18 peptide bioactivity datasets and found that automated negative sampling and rigorous evaluation reduce overestimation of model performance, promoting user trust. A follow-up investigation also highlighted the current limitations in extrapolating from canonical to non-canonical peptides using existing representation methods.

AutoPeptideML is a powerful platform for democratizing machine learning in peptide research, facilitating integration with experimental workflows across academia and industry.

14:00-14:20
BioPortal: a rejuvenated resource for biomedical ontologies
Confirmed Presenter: J. Harry Caufield, Lawrence Berkeley National Laboratory, United States

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • J. Harry Caufield, Lawrence Berkeley National Laboratory, United States
  • Jennifer Vendetti, Stanford University, United States
  • Nomi Harris, Lawrence Berkeley National Laboratory, United States
  • Michael Dorf, Stanford University, United States
  • Alex Skrenchuk, Stanford University, United States
  • Rafael Gonçalves, Stanford University, United States
  • John Graybeal, Stanford University, United States
  • Harshad Hegde, Lawrence Berkeley National Laboratory, United States
  • Timothy Redmond, Stanford University, United States
  • Chris Mungall, Lawrence Berkeley National Laboratory, United States
  • Mark Musen, Stanford University, United States

Presentation Overview: Show

BioPortal is an open repository of biomedical ontologies that supports data organization, curation, and integration across various domains. Serving as a fundamental infrastructure for modern information systems, BioPortal has been an open-source project for 20 years and currently hosts over 1,500 ontologies, with 1,192 publicly accessible.

Recent enhancements include tools for creating cross-ontology knowledge graphs and a semi-automated process for ontology change requests. Traditionally, ontology updates required expertise and were time-consuming, as users had to submit requests through developers. BioPortal's new service expedites this process using the Knowledge Graph Change Language (KGCL). A user-friendly interface accepts change requests via forms, which are then converted to GitHub issues with KGCL commands.
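The form-to-issue flow described above can be sketched in miniature. The command strings follow published KGCL examples but are illustrative; consult the KGCL specification for the exact grammar, and note the GitHub issue body shown is a hypothetical layout:

```python
# Sketch of turning a change-request form submission into a KGCL-style
# command and a GitHub issue body. Command syntax is illustrative.
def form_to_kgcl(request):
    kind = request["kind"]
    if kind == "rename":
        return (f"rename {request['id']} "
                f"from '{request['old_label']}' to '{request['new_label']}'")
    if kind == "obsolete":
        return f"obsolete {request['id']}"
    raise ValueError(f"unsupported change type: {kind}")

def issue_body(request):
    """Render a hypothetical GitHub issue carrying the KGCL command."""
    command = form_to_kgcl(request)
    return (f"### Ontology change request: {request['kind']}\n\n"
            f"KGCL command: `{command}`")

req = {"kind": "rename", "id": "HP:0001250",
       "old_label": "Seizures", "new_label": "Seizure"}
```

Encoding the request as a machine-readable command is what lets downstream tooling apply the change automatically once a curator approves the issue.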

The new BioPortal Knowledge Graph (KG-Bioportal) tool merges user-selected ontology subsets using a common graph format and the Biolink Model. An open-source pipeline translates ontologies into the KGX graph format, facilitating interoperability with other biomedical knowledge sources. KG-Bioportal enables more integrated and flexible querying of ontologies, allowing researchers to connect information across domains.
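The KGX graph format mentioned above is essentially a pair of tab-separated tables, one for nodes and one for edges, typed against the Biolink Model. A minimal sketch (column choice follows KGX conventions in simplified form; real exports carry more columns):

```python
import csv, io

# Minimal KGX-style export: node and edge tables in TSV with Biolink
# Model categories and predicates. Simplified illustration.
nodes = [
    {"id": "MONDO:0005015", "category": "biolink:Disease",
     "name": "diabetes mellitus"},
    {"id": "HP:0003074", "category": "biolink:PhenotypicFeature",
     "name": "Hyperglycemia"},
]
edges = [
    {"subject": "MONDO:0005015", "predicate": "biolink:has_phenotype",
     "object": "HP:0003074"},
]

def to_tsv(rows, columns):
    """Serialise dict rows to a TSV table with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

nodes_tsv = to_tsv(nodes, ["id", "category", "name"])
edges_tsv = to_tsv(edges, ["subject", "predicate", "object"])
```

Because every ontology lands in the same two-table shape, merging user-selected subsets reduces to concatenating tables and deduplicating identifiers.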

Future improvements include enhanced ontology pages, automated metadata updates, and KG features with graph-based search and large language model integration. These enhancements aim to position BioPortal as an interoperable resource that meets the community's evolving needs.

14:20-14:40
Formal Validation of Variant Classification Rules Using Domain-Specific Language and Meta-Predicates
Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Michael Bouzinier, Forome Association, Harvard University, IDEXX Laboratories, United States
  • Michael Chumack, Forome Association, United States
  • Giorgi Shavtvalishvili, Forome Association, Impel, Georgia
  • Eugenia Lvova, Forome Association, Deggendorf Institute of Technology, Germany
  • Dmitry Etin, Forome Association, Deggendorf Institute of Technology, Austria

Presentation Overview: Show

This talk aims to initiate a community discussion on strategies for validating the selection and curation of genetic variants for clinical and research purposes. We present our approach using a Domain-Specific Language (DSL), first introduced with the AnFiSA platform at BOSC 2019.

Since our 2022 publication, we have continued developing this methodology. At BOSC 2023, we presented two extensions: the strong typing of genetic variables in the DSL, and the application of our framework beyond genetics, into population and environmental health.

This year, we focus on validating the provenance and evidentiary support of annotation elements based on purpose, knowledge domain, method of derivation, and scale — an ontology we introduced in 2023. We aim to support two key use cases: (1) logical validation during rule development, and (2) ensuring rule portability when existing rules are adapted for new clinical or laboratory settings.

We present a proof of concept using meta-predicates — embedded assertions in DSL scripts that validate specific properties of genetic annotations used in variant curation. This technique draws inspiration from Invariant-based Programming.
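As a toy illustration of the meta-predicate idea (the DSL itself is not Python, and all names here are hypothetical), an embedded assertion might check that every annotation a rule relies on has the expected provenance along the 2023 ontology's axes:

```python
# Toy meta-predicate check: assert properties of the annotations a
# curation rule relies on (knowledge domain, derivation method, scale),
# in the spirit of invariant-based programming. Names are hypothetical.
ANNOTATIONS = {
    "gnomAD_AF": {"domain": "population genetics",
                  "derivation": "observed", "scale": "continuous"},
    "CADD_score": {"domain": "pathogenicity",
                   "derivation": "predicted", "scale": "continuous"},
}

def meta_predicate(annotation, **expected):
    """Return True iff the annotation's recorded provenance matches
    every expected property; rule development fails fast otherwise."""
    props = ANNOTATIONS[annotation]
    return all(props.get(k) == v for k, v in expected.items())

# Invariant embedded in a rule: an allele frequency must be an observed,
# population-level quantity, never a model prediction.
assert meta_predicate("gnomAD_AF", derivation="observed")
assert not meta_predicate("CADD_score", derivation="observed")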

Finally, we frame our work in the context of AI-assisted code synthesis. Recent studies highlight the advantages of deep learning-guided program induction over test-time training and fine tuning (TTT/TTFT) for structured reasoning tasks. This reinforces the promise of DSL-based approaches as transparent, verifiable complements to generative AI in modern computational genomics.

14:40-15:00
Accelerating precision medicine with open-source frameworks for ontology-driven knowledge graphs and large language models
Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Sebastian Lobentanzer, Helmholtz Munich, Germany

Presentation Overview: Show

Advances in biomedical data integration and knowledge management leveraging large language models (LLMs) and knowledge graphs (KGs) hold significant promise for translational medicine. Despite the vast availability of clinical, molecular, and other datasets, their effective harmonization, semantic grounding, and transformation into actionable insights remain challenging.

We present our open-source framework to facilitate biomedical application of LLMs, BioChatter (https://biochatter.org). Extending our BioCypher ecosystem for building modular KGs based in ontology, BioChatter leverages these richly structured KGs to allow construction of robust, customizable, and user-centric LLM-driven solutions tailored for translational biomedical tasks. We improve trustworthiness of LLM applications in the medical domain by our benchmark-first philosophy, implementing state-of-the-art benchmarks under supervision of physicians and other domain experts.

We showcase the utility and flexibility of this integrated ecosystem through several translational applications. This includes a molecular tumour board solution, integrating clinical, molecular, and publicly available biomedical knowledge to deliver personalized therapeutic recommendations. Within the Open Targets Consortium, we employ both BioCypher and BioChatter to build semantically enriched knowledge representations and sophisticated agent-based solutions, automating complex knowledge management pipelines crucial for drug discovery and translational research.

The modular and open-source nature of BioCypher and BioChatter facilitates widespread adoption, adaptation, and integration into existing translational medical research frameworks. Our ongoing projects, such as the development of a data sharing platform funded by the German Network University Medicine and the German Centre for Diabetes Research, further illustrate their broad applicability. We encourage and support extensions and specific use cases in our active community.

15:00-15:20
Applications of Bioschemas in FAIR, AI and knowledge representation
Confirmed Presenter: Nick Juty, The University of Manchester, United Kingdom

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Nick Juty, The University of Manchester, United Kingdom
  • Phil Reed, The University of Manchester, United Kingdom
  • Helena Schnitzer, Forschungszentrum Jülich GmbH, Germany
  • Leyla Jael Castro, ZB MED Information Centre for Life Sciences, Germany
  • Alban Gaignard, University of Nantes, France
  • Carole Goble, The University of Manchester, United Kingdom

Presentation Overview: Show

Bioschemas.org defines domain-specific metadata schemas based on schema.org extensions, which expose key metadata properties from resource records. This provides a lightweight and easily adoptable means to incorporate key metadata on web records, and a mechanism to link to domain-specific ontology/vocabulary terms. As an established community effort focused on improving the FAIRness of resources in the Life Sciences, we now aim to extend the impact of Bioschemas beyond improvements to ‘findability’.
Bioschemas has been used to aggregate data in a distributed environment through federation, using Bioschemas metadata markup. More recently, we are leveraging Bioschemas deployments on resource websites, harvesting them directly to populate SPARQL endpoints and subsequently creating queryable knowledge graphs.
An improved Bioschemas validation process will assess the ‘FAIR’ level of the user’s web records and suggest the most appropriate Bioschemas profile based on similarity to those in the Bioschemas registry.
Our learnings in operating this community will be extended into non-‘bio’ domains wishing to more easily incorporate ontologies and metadata in their web-based records. To that end, we have a sister site dedicated to hosting the many domain-agnostic types/profiles that have already emerged from our work (so far 7 profiles aligned to digital objects in research, e.g., workflows, datasets, training materials): https://schemas.science/. Through this infrastructure we will develop a sustainable, cross-institutional collaborative space for long-term and wide-ranging impact, supporting our existing community engagement with global AI, ML, and Training communities, and others in the future.
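Concretely, a Bioschemas deployment is JSON-LD embedded in a web record, with a conformsTo link to the relevant profile. A sketch (the profile release in the conformsTo URL is illustrative; check bioschemas.org for the current version of each profile):

```python
import json

# Indicative Bioschemas-style JSON-LD for a dataset web record.
# The conformsTo profile URL is illustrative; profile releases change.
record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "http://purl.org/dc/terms/conformsTo": {
        "@id": "https://bioschemas.org/profiles/Dataset/1.0-RELEASE"
    },
    "name": "Example protein interaction dataset",
    "description": "Curated interactions for pathway analysis.",
    "url": "https://example.org/datasets/1",   # placeholder URL
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["protein interactions", "pathways"],
}

# A site embeds this in a page as:
#   <script type="application/ld+json"> ... </script>
markup = json.dumps(record, indent=2)
```

Harvesters can then scrape these script blocks from resource websites and load the triples straight into a SPARQL endpoint, which is exactly the knowledge-graph pipeline described above.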

15:20-15:40
RO-Crate: Capturing FAIR research outputs in bioinformatics and beyond
Confirmed Presenter: Phil Reed, The University of Manchester, United Kingdom

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Eli Chadwick, The University of Manchester, United Kingdom
  • Stian Soiland-Reyes, The University of Manchester, United Kingdom
  • Phil Reed, The University of Manchester, United Kingdom
  • Claus Weiland, Leibniz Institute for Biodiversity and Earth System Research, Germany
  • Dag Endresen, University of Oslo, Norway
  • Felix Shaw, Earlham Institute, United Kingdom
  • Timo Mühlhaus, RPTU Kaiserslautern-Landau, Germany
  • Carole Goble, The University of Manchester, United Kingdom

Presentation Overview: Show

RO-Crate is a mechanism for packaging research outputs with structured metadata, providing machine-readability and reproducibility following the FAIR principles. It enables interlinking methods, data, and outputs with the outcomes of a project or a piece of work, even where distributed across repositories.

Researchers can distribute their work as an RO-Crate to ensure their data travels with its metadata, so that key components are correctly tracked, archived, and attributed. Data stewards and infrastructure providers can integrate RO-Crate into the projects and platforms they support, to make it easier for researchers to create and consume RO-Crates without requiring technical expertise.

Community-developed extensions called “profiles” allow the creation of tailored RO-Crates that serve the needs of a particular domain or data format.

Current uses of RO-Crate in bioinformatics include:
∙ Describing and sharing computational workflows registered with WorkflowHub
∙ Creating FAIR exports of workflow executions from workflow engines and biodiversity digital twin simulations
∙ Enabling an appropriate level of credit and attribution, particularly in currently under-recognised roles (e.g. sample gathering, processing, sample distribution)
∙ Capturing plant science experiments as Annotated Research Contexts (ARC), complex objects which include workflows, workflow executions, inputs, and results
∙ Defining metadata conventions for biodiversity genomics

This presentation will outline the RO-Crate project and highlight its most prominent applications within bioinformatics, with the aim of increasing awareness and sparking new conversations and collaborations within the BOSC community.
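A crate is an ordinary directory plus one ro-crate-metadata.json file. The sketch below follows the RO-Crate 1.1 layout (a metadata descriptor entity pointing at the root dataset, which lists its parts); the file names and descriptive fields are example values:

```python
import json

# Minimal ro-crate-metadata.json following the RO-Crate 1.1 layout.
# Example values only; real crates add authors, licences, workflows, etc.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {   # the metadata file describes itself and points at the root
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # the root data entity: the directory being packaged
            "@id": "./",
            "@type": "Dataset",
            "name": "Example analysis outputs",
            "datePublished": "2025-07-21",
            "hasPart": [{"@id": "results/variants.vcf"}],
        },
        {   # one contained file, with its format recorded
            "@id": "results/variants.vcf",
            "@type": "File",
            "encodingFormat": "text/vcf",
        },
    ],
}

metadata_json = json.dumps(crate, indent=2)
```

Profiles such as Workflow RO-Crate or ARC constrain which entities and properties must appear in this graph for a given domain.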

PheBee: A Graph-Based System for Scalable, Traceable, and Semantically Aware Phenotyping
Confirmed Presenter: David Gordon, Office of Data Sciences at Nationwide Children's Hospital, United States

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • David Gordon, Office of Data Sciences at Nationwide Children's Hospital, United States
  • Max Homilius, Office of Data Sciences at Nationwide Children's Hospital, United States
  • Austin Antoniou, Office of Data Sciences at Nationwide Children's Hospital, United States
  • Connor Grannis, Office of Data Sciences at Nationwide Children's Hospital, United States
  • Grant Lammi, Office of Data Sciences at Nationwide Children's Hospital, United States
  • Adam Herman, Office of Data Sciences at Nationwide Children's Hospital, United States
  • Ashley Kubatko, Office of Data Sciences at Nationwide Children's Hospital, United States
  • Peter White, Office of Data Sciences at Nationwide Children's Hospital, United States

Presentation Overview: Show

The association of phenotypes and disease diagnoses is a cornerstone of clinical care and biomedical research. Significant work has gone into standardizing these concepts in ontologies like the Human Phenotype Ontology and Mondo, and in developing interoperability standards such as Phenopackets. Managing subject-term associations in a traceable and scalable way that enables semantic queries and bridges clinical and research efforts remains a significant challenge.

PheBee is an open-source tool designed to address this challenge by using a graph-based approach to organize and explore data. It allows users to perform powerful, meaning-based searches and supports standardized data exchange through Phenopackets. The system is easy to deploy and share thanks to reproducible setup templates.

The graph model underlying PheBee captures subject-term associations along with their provenance and modifiers. Queries leverage ontology structure to traverse semantic term relationships. Terms can be linked at the patient, encounter, or note level, supporting temporal and contextual pattern analysis. PheBee accommodates both manually assigned and computationally derived phenotypes, enabling use across diverse pipelines. When integrated downstream of natural language processing pipelines, PheBee maintains traceability from extracted terms to the original clinical text, enabling high-throughput, auditable term capture.

PheBee is currently being piloted in internal translational research projects supporting phenotype-driven pediatric care. Its graph foundation also empowers future feature development, such as natural language querying using retrieval augmented generation or genomic data integration to identify subjects with variants in phenotypically relevant genes.

PheBee advances open science in biomedical research and clinical support by promoting structured, traceable phenotype data.

The role of the Ontology Development Kit in supporting ontology compliance in adverse legal landscapes
Confirmed Presenter: Damien Goutte-Gattat, University of Cambridge, United Kingdom

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Damien Goutte-Gattat, University of Cambridge, United Kingdom

Presentation Overview: Show

Ontologies, like code, are a form of speech. As such, they can be subject to laws and other regulations that attempt to control how freedom of speech is exercised, and ontology editors may find themselves legally compelled to introduce changes in their ontologies for the sole purpose of complying with the laws that apply to them.

Therefore, developers of tools used for ontology editing and maintenance need to consider whether their tools should provide features to facilitate the introduction of such legally mandated changes, and how.

As developers of the Ontology Development Kit (ODK), one of the main tools used to maintain ontologies of the OBO Foundry, we will consider both the moral and technical aspects of allowing ODK users to comply with arbitrary legal restrictions. The overall approach we are envisioning, in order to contain the impact of such restrictions to the jurisdictions that mandate them, is a “split world” system, where the ODK would facilitate the production of slightly different editions of the same ontology.

15:40-16:00
10 years of the AberOWL ontology repository: moving towards federated reasoning and natural language access
Confirmed Presenter: Robert Hoehndorf, King Abdullah University of Science and Technology, Saudi Arabia

Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Fernando Zhapa-Camacho, King Abdullah University of Science and Technology, Saudi Arabia
  • Olga Mashkova, King Abdullah University of Science and Technology, Saudi Arabia
  • Maxat Kulmanov, King Abdullah University of Science and Technology, Saudi Arabia
  • Robert Hoehndorf, King Abdullah University of Science and Technology, Saudi Arabia

Presentation Overview: Show

AberOWL is a framework for ontology-based data access in biology that has provided reasoning services for bio-ontologies since 2015. Unlike other ontology repositories in the life sciences such as BioPortal, OLS, and OntoBee, AberOWL uniquely focuses on providing access to Description Logic querying through a Description Logic reasoner. The system comprises a reasoning service using OWLAPI and the Elk reasoner, an ElasticSearch service for natural language queries, and a SPARQL endpoint capable of embedding Description Logic queries within SPARQL queries. AberOWL contains all ontologies from BioPortal and the OBO library, enabling lightweight reasoning over the OWL 2 EL profile and implementing the Ontology-Based Data Access paradigm. This allows query enhancement through reasoning to infer implicit knowledge not explicitly stated in data. After a decade of operation, AberOWL is evolving in three key directions: (1) introducing a lightweight, containerized version enabling local deployment for single ontologies with the ability to register with the central repository for federated reasoning access; (2) integrating improved natural language processing through Large Language Models to facilitate Description Logic querying without requiring strict syntax adherence; and (3) implementing a FAIR API that standardizes access to ontology querying and repositories, improving interoperability. These advancements will transform AberOWL into a more federated system with FAIR API access and LLM integration for enhanced ontology interaction.
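The query-enhancement idea, inferring implicit subsumptions before retrieval, can be illustrated with a toy example. A real deployment runs an EL reasoner such as Elk over the full ontology; here a naive transitive closure stands in for it, and the class names and data are invented:

```python
# Toy ontology-based data access: saturate told subclass axioms into
# their transitive closure (what an EL reasoner does at scale), then
# answer a query over data annotated with classes.
TOLD = [
    ("insulin secretion", "peptide secretion"),
    ("peptide secretion", "protein secretion"),
    ("protein secretion", "secretion"),
]

def subclass_closure(axioms):
    """Naive transitive closure of subclass axioms."""
    closure = set(axioms)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

DATA = {"gene-A": "insulin secretion", "gene-B": "transport"}

def query(superclass):
    """Retrieve entities annotated with the class or any subclass."""
    closure = subclass_closure(TOLD)
    subs = {a for a, b in closure if b == superclass} | {superclass}
    return sorted(g for g, cls in DATA.items() if cls in subs)
```

A plain lookup for "secretion" would return nothing, since gene-A is annotated three levels down; reasoning-backed retrieval finds it, which is the point of the OBDA paradigm.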

16:40-16:50
The global biodata infrastructure: how, where, who, and what?
Room: 03A
Format: In person

Moderator(s): Karsten Hokamp


Authors List: Show

  • Guy Cochrane, Global Biodata Coalition, United Kingdom
  • Chuck Cook, Global Biodata Coalition, United Kingdom

Presentation Overview: Show

Life science and biomedical research around the world is critically dependent on a global infrastructure of biodata resources that store and provide access to research data, and to tools and services that allow users to interrogate, combine and re-use these data to generate new insights. These resources, most of which are open and freely available, form a critical, globally distributed, and highly-connected infrastructure that has grown organically over time.

Funders and managers of biodata resources are keenly aware that the long-term sustainability of this infrastructure, and of the individual resources it comprises, is under threat. The infrastructure has not been well described and there is a need to understand how many resources there are, where they are located, who funds them, and which are of the greatest importance for the scientific community.

The Global Biodata Coalition has worked to describe the infrastructure by undertaking an inventory of global biodata resources and by running a selection process to identify a set of—currently 52—Global Core Biodata Resources (GCBRs) that are of fundamental importance to global life sciences research.

We will present an overview of the location and funders of the GCBRs, and will summarise the findings of the two rounds of the global inventory of biodata resources, which identified over 3,700 resources.

The results of these analyses provide an overview of the infrastructure and will allow the GBC to identify major funders of biodata resources that are not currently engaged in the discussion of issues of sustainability.

16:50-17:50
Panel: Data Sustainability
Room: 03A
Format: In person

Moderator(s): Monica Munoz-Torres


Authors List: Show

  • Susanna Sansone
  • Chris Mungall
  • Varsha Khodiyar

Presentation Overview: Show

This BOSC 2025 panel will tackle the essential challenge of Data Sustainability, defined as the proactive and principled approach to ensuring bioinformatics research data remains FAIR, ethically managed, and valuable for future generations through sufficient infrastructure, funding, expertise, and governance. In light of current funding pressures and the risk of data loss that impedes scientific progress and wastes resources, establishing sustainable practices has become more urgent than ever. This discussion will incorporate diverse perspectives to examine practical strategies and solutions across key areas, including FAIR/CARE principles, funding models, open science, data lifecycle management, technical scalability, and ethical considerations.

17:50-18:00
Closing Remarks
Room: 03A
Format: In person

Moderator(s): Monica Munoz-Torres


Authors List: Show

  • Nomi Harris