The SciFinder tool lets you search Titles, Authors, and Abstracts of talks and panels. Enter your search term below and your results will be shown at the bottom of the page. You can also click on a track to see all the talks given in that track on that day.

Results

July 22, 2025
11:20-12:20
Invited Presentation: Open Knowledge Bases in the Age of Generative AI
Confirmed Presenter: Chris Mungall
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Karsten Hokamp


Authors List:

  • Chris Mungall

Presentation Overview:

The scientific and clinical community relies on the active development of a wide range of inter-linked knowledge bases, in order to plan experiments, interpret omics data, and to help with the diagnosis and treatment of disease. These knowledge bases make use of expert curation, and the use of community ontologies in order to provide accurate and structured information that can be used algorithmically. The advent of generative AI and agentic methods presents fantastic opportunities for accelerating curation, increasing the breadth and depth of coverage. Open knowledge bases also present opportunities to generative AI, in the form of a trusted backbone of knowledge that can mitigate the hallucinations that plague large language models. However, the pace of development of AI, combined with misunderstandings about both strengths and weaknesses, poses significant dangers. In this talk, I will present our recent work on the use of agentic AI to assist with manual knowledge base tasks, particularly those involving complex ontology development and maintenance tasks. I will present a realistic picture of challenges we face, but also strategies to mitigate them, and a path towards a future where agents, curators, and others can work together to leverage and integrate open source tools and data along with the combined knowledge of the scientific community.

July 22, 2025
12:20-12:40
textToKnowledgeGraph: Generation of Molecular Interaction Knowledge Graphs Using Large Language Models for Exploration in Cytoscape
Confirmed Presenter: Favour James, Obafemi Awolowo University, Nigeria
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Karsten Hokamp


Authors List:

  • Favour James, Obafemi Awolowo University
  • Christopher Churas, Department of Medicine
  • Trey Ideker, Department of Medicine
  • Dexter Pratt, Department of Medicine
  • Augustin Luna, National Library of Medicine and National Cancer Institute

Presentation Overview:

Knowledge graphs (KGs) are powerful tools for structuring and analyzing biological information due to their ability to represent data and improve queries across heterogeneous datasets. However, constructing KGs from unstructured literature remains challenging due to the cost and expertise required for manual curation. Prior works have explored text-mining techniques to automate this process, but have limitations that impact their ability to capture complex relationships fully. Traditional text-mining methods struggle with understanding context across sentences. Additionally, these methods lack expert-level background knowledge, making it difficult to infer relationships that require awareness of concepts indirectly described in the text.
Large Language Models (LLMs) present an opportunity to overcome these challenges. LLMs are trained on diverse literature, equipping them with contextual knowledge that enables more accurate extraction. Additionally, LLMs can process the entirety of an article, capturing relationships across sections rather than analyzing single sentences; this allows for more precise extraction. We present textToKnowledgeGraph, an artificial intelligence tool using LLMs to extract interactions from individual publications directly in Biological Expression Language (BEL). BEL was chosen for its compact and detailed representation of biological relationships, allowing for structured and computationally accessible encoding.
This work makes several contributions:
1. Development of the open-source Python textToKnowledgeGraph package (pypi.org/project/texttoknowledgegraph) for BEL extraction from scientific articles, usable from the command line and within other projects.
2. An interactive application within Cytoscape Web to simplify extraction and exploration.
3. A dataset of extractions that have been both computationally and manually reviewed to support future fine-tuning efforts.
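To illustrate why a compact BEL statement is convenient for graph construction, here is a minimal sketch that splits simple BEL-style statements into edges. It does not use the textToKnowledgeGraph package or its API; the function names, the small relation subset, and the `EX:` placeholder identifiers are all assumptions made for illustration, and nested or more complex BEL statements are out of scope.

```python
import re
from typing import NamedTuple

# A small subset of BEL relation keywords (assumption: real BEL defines more).
RELATIONS = (
    "directlyIncreases",
    "directlyDecreases",
    "increases",
    "decreases",
    "association",
)

# subject <whitespace> relation <whitespace> object
_STATEMENT = re.compile(
    r"^(?P<source>.+?)\s+(?P<relation>" + "|".join(RELATIONS) + r")\s+(?P<target>.+)$"
)


class Edge(NamedTuple):
    source: str
    relation: str
    target: str


def parse_bel(statement: str) -> Edge:
    """Split one simple BEL-style statement into a (source, relation, target) edge."""
    match = _STATEMENT.match(statement.strip())
    if match is None:
        raise ValueError(f"not a simple BEL-style statement: {statement!r}")
    return Edge(match["source"], match["relation"], match["target"])


def to_adjacency(statements):
    """Group parsed edges by source node, ready to load into a graph tool."""
    graph = {}
    for statement in statements:
        edge = parse_bel(statement)
        graph.setdefault(edge.source, []).append((edge.relation, edge.target))
    return graph


if __name__ == "__main__":
    # Placeholder statements, not extracted from any publication.
    print(to_adjacency(["p(EX:A) increases act(p(EX:B))", "p(EX:A) decreases p(EX:C)"]))
```

Each parsed edge keeps the BEL terms as opaque strings, so the sketch stays agnostic about namespaces and functions.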

July 22, 2025
12:40-13:00
BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models built on Biomed-Multi-Omic
Confirmed Presenter: Bharath Dandala, IBM, United States
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Karsten Hokamp


Authors List:

  • Bharath Dandala, IBM
  • Michael M Danziger, IBM
  • Ching-Huei Tsou, IBM
  • Akira Koseki, IBM
  • Viatcheslav Gurev, IBM
  • Tal Kozlovski, IBM
  • Ella Barkan, IBM
  • Matthew Madgwick, IBM
  • Akihiro Kosugi, IBM
  • Tanwi Biswas, IBM
  • Liran Szalk, IBM
  • Matan Ninio, IBM

Presentation Overview:

High-throughput sequencing has revolutionized transcriptomic studies, and the synthesis of these diverse datasets holds significant potential for a deeper understanding of cell biology. Recent advancements have introduced several promising techniques for building transcriptomic foundation models (TFMs), each emphasizing unique modeling decisions and demonstrating potential in handling the inherent challenges of high-dimensional, sparse data. However, despite their individual strengths, current TFMs still struggle to fully capture biologically meaningful representations, highlighting the need for further improvements. Recognizing that existing TFM approaches possess complementary strengths and weaknesses, a promising direction lies in the systematic exploration of various combinations of design, training, and evaluation methodologies. Thus, to accelerate progress in this field, we present bmfm-rna (shown in Figure 1), a comprehensive framework that not only facilitates this combinatorial exploration but is also inherently flexible and easily extensible to incorporate novel methods as the field continues to advance. This framework enables scalable data processing and features extensible transformer architectures. It supports a variety of input representations, pretraining objectives, masking strategies, domain-specific metrics, and model interpretation methods. Furthermore, it facilitates downstream tasks such as cell type annotation, perturbation prediction, and batch effect correction on benchmark datasets. Models trained with the framework achieve performance comparable to scGPT, Geneformer and other TFMs on these downstream tasks. By open-sourcing this framework with strong performance, we aim to lower barriers for developing TFMs and invite the community to build more effective TFMs. bmfm-rna is available via Apache license at https://github.com/BiomedSciAI/biomed-multi-omic

July 22, 2025
12:40-13:00
DOME Registry - Supporting ML transparency and reproducibility in the life sciences
Confirmed Presenter: Gavin Farrell, Uni Padova, Italy
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Karsten Hokamp


Authors List:

  • Gavin Farrell, Uni Padova
  • Omar Attafi, University of Padova
  • Silvio Tosatto, University of Padova

Presentation Overview:

The adoption of machine learning (ML) methods in the life sciences has been transformative, solving landmark challenges such as accurate protein structure prediction, improving bioimaging diagnostics and accelerating drug discovery. However, researchers face a reuse and reproducibility crisis of ML publications. Authors are publishing ML methods that lack the core information needed to transfer value back to the reader. Commonly absent are links to code, data and models, which erodes trust in the methods.

In response, ELIXIR Europe developed a practical checklist of recommendations for disclosing key aspects of ML methods: data, optimisation, model and evaluation. These are now known collectively as the DOME Recommendations, published in Nature Methods by Walsh et al. in 2021. Building on this successful first step towards addressing the ML publishing crisis, ELIXIR has developed a technological solution to support the implementation of the DOME Recommendations. This solution is known as the DOME Registry and was published in GigaScience by Attafi et al. in late 2024.

This talk will cover the DOME Registry technology which serves as a curated database of ML methods for life science publications by allowing researchers to annotate and share their methods. The service can also be adopted by publishers during their ML publishing workflow to increase a publication’s transparency and reproducibility. An overview of the next steps for the DOME Registry will also be provided - considering new ML ontologies, metadata formats and integrations building towards a stronger ML ecosystem for the life sciences.

July 22, 2025
12:40-13:00
AutoPeptideML 2: An open source library for democratizing machine learning for peptide bioactivity prediction
Confirmed Presenter: Raúl Fernández-Díaz, IBM Research | UCD Conway Institute, Ireland
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Karsten Hokamp


Authors List:

  • Raúl Fernández-Díaz, IBM Research | UCD Conway Institute
  • Thanh Lam Hoang, IBM Research Dublin
  • Vanessa Lopez, IBM Research Dublin
  • Denis Shields, University College Dublin

Presentation Overview:

Peptides are a rapidly growing drug modality with diverse bioactivities and accessible synthesis, particularly for canonical peptides composed of the 20 standard amino acids. However, enhancing their pharmacological properties often requires chemical modifications, increasing synthesis cost and complexity. Consequently, most existing data and predictive models focus on canonical peptides. To accelerate the development of peptide drugs, there is a need for models that generalize from canonical to non-canonical peptides.

We present AutoPeptideML, an open-source, user-friendly machine learning platform designed to bridge this gap. It empowers experimental scientists to build custom predictive models without specialized computational knowledge, enabling active learning workflows that optimize experimental design and reduce sample requirements. AutoPeptideML introduces key innovations: (1) preprocessing pipelines for harmonizing diverse peptide formats (e.g., sequences, SMILES); (2) automated sampling of negative peptides with matched physicochemical properties; (3) robust test set selection with multiple similarity functions (via the Hestia-GOOD framework); (4) flexible model building with multiple representation and algorithm choices; (5) thorough model evaluation for unseen data at multiple similarity levels; and (6) FAIR-compliant, interpretable outputs to support reuse and sharing. A webserver with GUI enhances accessibility and interoperability.

We validated AutoPeptideML on 18 peptide bioactivity datasets and found that automated negative sampling and rigorous evaluation reduce overestimation of model performance, promoting user trust. A follow-up investigation also highlighted the current limitations in extrapolating from canonical to non-canonical peptides using existing representation methods.

AutoPeptideML is a powerful platform for democratizing machine learning in peptide research, facilitating integration with experimental workflows across academia and industry.

July 22, 2025
14:00-14:20
BioPortal: a rejuvenated resource for biomedical ontologies
Confirmed Presenter: J. Harry Caufield, Lawrence Berkeley National Laboratory, United States
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Karsten Hokamp


Authors List:

  • J. Harry Caufield, Lawrence Berkeley National Laboratory
  • Jennifer Vendetti, Stanford University
  • Nomi Harris, Lawrence Berkeley National Laboratory
  • Michael Dorf, Stanford University
  • Alex Skrenchuk, Stanford University
  • Rafael Gonçalves, Stanford University
  • John Graybeal, Stanford University
  • Harshad Hegde, Lawrence Berkeley National Laboratory
  • Timothy Redmond, Stanford University
  • Chris Mungall, Lawrence Berkeley National Laboratory
  • Mark Musen, Stanford University

Presentation Overview:

BioPortal is an open repository of biomedical ontologies that supports data organization, curation, and integration across various domains. Serving as a fundamental infrastructure for modern information systems, BioPortal has been an open-source project for 20 years and currently hosts over 1,500 ontologies, with 1,192 publicly accessible.

Recent enhancements include tools for creating cross-ontology knowledge graphs and a semi-automated process for ontology change requests. Traditionally, ontology updates required expertise and were time-consuming, as users had to submit requests through developers. BioPortal's new service expedites this process using the Knowledge Graph Change Language (KGCL). A user-friendly interface accepts change requests via forms, which are then converted to GitHub issues with KGCL commands.
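The form-to-issue step described above can be sketched roughly as follows. This is an illustrative sketch only, not BioPortal's implementation: the `RenameRequest` class, the helper names, and the issue layout are hypothetical, and the command string merely follows the general shape of KGCL's rename command, so the exact grammar should be checked against the KGCL documentation.

```python
from dataclasses import dataclass


@dataclass
class RenameRequest:
    """Hypothetical form payload for a label change request."""
    term_id: str
    old_label: str
    new_label: str
    requester: str


def to_kgcl(request: RenameRequest) -> str:
    """Render the request as a KGCL-style rename command (assumed syntax)."""
    return f"rename {request.term_id} from '{request.old_label}' to '{request.new_label}'"


def to_issue(request: RenameRequest) -> dict:
    """Build a title/body pair that could be submitted as a GitHub issue."""
    body = "\n".join(
        [
            f"Requested by: {request.requester}",
            "",
            "Proposed change:",
            "",
            # Indent the command so it renders as a code block in the issue.
            "    " + to_kgcl(request),
        ]
    )
    return {"title": f"Change request for {request.term_id}", "body": body}


if __name__ == "__main__":
    # Placeholder identifiers and labels, not taken from a real ontology.
    example = RenameRequest("EX:0000001", "old label", "new label", "example-user")
    print(to_issue(example)["body"])
```

Keeping the command as a single machine-readable line is what lets a maintainer or a bot apply the change without retyping it.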

The new BioPortal Knowledge Graph (KG-Bioportal) tool merges user-selected ontology subsets using a common graph format and the Biolink Model. An open-source pipeline translates ontologies into the KGX graph format, facilitating interoperability with other biomedical knowledge sources. KG-Bioportal enables more integrated and flexible querying of ontologies, allowing researchers to connect information across domains.

Future improvements include enhanced ontology pages, automated metadata updates, and KG features with graph-based search and large language model integration. These enhancements aim to position BioPortal as an interoperable resource that meets the community's evolving needs.

July 22, 2025
14:20-14:40
Formal Validation of Variant Classification Rules Using Domain-Specific Language and Meta-Predicates
Confirmed Presenter: Michael Bouzinier, Forome Association, Harvard University
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Karsten Hokamp


Authors List:

  • Michael Bouzinier, Forome Association
  • Michael Chumack, Forome Association
  • Giorgi Shavtvalishvili, Forome Association
  • Eugenia Lvova, Forome Association
  • Dmitry Etin, Forome Association

Presentation Overview:

This talk aims to initiate a community discussion on strategies for validating the selection and curation of genetic variants for clinical and research purposes. We present our approach using a Domain-Specific Language (DSL), first introduced with the AnFiSA platform at BOSC 2019.

Since our 2022 publication, we have continued developing this methodology. At BOSC 2023, we presented two extensions: the strong typing of genetic variables in the DSL, and the application of our framework beyond genetics, into population and environmental health.

This year, we focus on validating the provenance and evidentiary support of annotation elements based on purpose, knowledge domain, method of derivation, and scale — an ontology we introduced in 2023. We aim to support two key use cases: (1) logical validation during rule development, and (2) ensuring rule portability when existing rules are adapted for new clinical or laboratory settings.

We present a proof of concept using meta-predicates — embedded assertions in DSL scripts that validate specific properties of genetic annotations used in variant curation. This technique draws inspiration from Invariant-based Programming.
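The general idea of a meta-predicate can be sketched in Python as an assertion about an annotation's metadata that runs before a rule uses the annotation's value. This is not the AnFiSA DSL; the class, the `require` helper, the annotation names, and the thresholds are all hypothetical and chosen only to show the pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationMeta:
    """Metadata describing an annotation, using the four axes named in the talk."""
    purpose: str
    domain: str
    method: str
    scale: str


class MetaPredicateError(ValueError):
    """Raised when an annotation lacks a property that a rule assumes."""


def require(name, catalog, **allowed):
    """Meta-predicate: check that annotation `name` has metadata within `allowed`."""
    meta = catalog[name]
    for field_name, allowed_values in allowed.items():
        actual = getattr(meta, field_name)
        if actual not in allowed_values:
            raise MetaPredicateError(
                f"{name}.{field_name} is {actual!r}, expected one of {sorted(allowed_values)}"
            )


def is_rare_and_damaging(variant, catalog):
    """Example rule; the embedded assertions guard the assumptions it relies on."""
    require("allele_frequency", catalog, domain={"population"}, scale={"ratio"})
    require("damage_score", catalog, method={"in-silico"})
    # Thresholds are arbitrary placeholders for illustration.
    return variant["allele_frequency"] < 0.01 and variant["damage_score"] > 0.8
```

If the same rule is moved to a setting where `damage_score` comes from a different derivation method, the meta-predicate fails loudly instead of letting the rule silently change meaning, which is the portability use case described above.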

Finally, we frame our work in the context of AI-assisted code synthesis. Recent studies highlight the advantages of deep learning-guided program induction over test-time training and fine tuning (TTT/TTFT) for structured reasoning tasks. This reinforces the promise of DSL-based approaches as transparent, verifiable complements to generative AI in modern computational genomics.

July 22, 2025
14:40-15:00
BioChatter: An open-source framework integrating knowledge graphs and large language models for Accessible Biomedical AI
Confirmed Presenter: Sebastian Lobentanzer, Helmholtz Munich, Germany
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Karsten Hokamp


Authors List:

  • Sebastian Lobentanzer, Helmholtz Munich

Presentation Overview:

The integration of large language models (LLMs) with structured biomedical knowledge remains a key challenge for building robust, trustworthy, and reproducible AI applications in biomedicine. We present BioChatter (https://biochatter.org), an open-source Python framework that bridges ontology-driven knowledge graphs (KGs) and LLMs through a modular, extensible architecture. Built as a companion to the BioCypher ecosystem for constructing biomedical KGs (https://biocypher.org), BioChatter allows researchers to easily build LLM-powered applications grounded in domain knowledge and interoperable data standards.

BioChatter emphasises transparent, community-driven development, supported by extensive documentation, real-world usage examples, and active support channels. Its design supports multiple modes of use from lightweight prototyping to server-based deployment and integrates naturally with open LLM ecosystems (e.g., Ollama, LangChain), knowledge graphs, and the Model Context Protocol (MCP) for LLM tool usage. We highlight ongoing applications across biomedical domains, including automated knowledge integration pipelines for drug discovery (Open Targets), clinical decision support prototypes, and data sharing platforms within the German research infrastructure.

The open-source nature of BioChatter, together with its benchmark-first approach for validating biomedical LLM applications, facilitates broad adoption and collaboration. By lowering the entry barrier for building trustworthy biomedical AI systems, BioChatter contributes to the growing open-source ecosystem supporting reproducible, transparent, and community-driven AI development in the life sciences.

July 22, 2025
15:00-15:20
Applications of Bioschemas in FAIR, AI and knowledge representation
Confirmed Presenter: Nick Juty, The University of Manchester, United Kingdom
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Karsten Hokamp


Authors List:

  • Nick Juty, The University of Manchester
  • Phil Reed, The University of Manchester
  • Helena Schnitzer, Forschungszentrum Jülich GmbH
  • Leyla Jael Castro, ZB MED Information Centre for Life Sciences
  • Alban Gaignard, University of Nantes
  • Carole Goble, The University of Manchester

Presentation Overview:

Bioschemas.org defines domain-specific metadata schemas based on schema.org extensions, which expose key metadata properties from resource records. This provides a lightweight and easily adoptable means to incorporate key metadata on web records, and a mechanism to link to domain-specific ontology/vocabulary terms. As an established community effort focused on improving the FAIRness of resources in the Life Sciences, we now aim to extend the impact of Bioschemas beyond improvements to ‘findability’.
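As a rough illustration of what such markup looks like on a web record, the sketch below builds a schema.org `Dataset` description as JSON-LD and wraps it in the script tag used to embed it in a page. The helper names, the example values, and the specific profile IRI and version are assumptions for illustration; the Bioschemas site lists the current profiles and their required properties.

```python
import json

# Assumed profile IRI and version; check the Bioschemas registry before use.
DATASET_PROFILE = "https://bioschemas.org/profiles/Dataset/1.0-RELEASE"


def dataset_jsonld(name, description, url, keywords, term_iris):
    """Describe a dataset with schema.org properties plus a profile declaration."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        # Dublin Core conformsTo, written as a full IRI so no extra context is needed.
        "http://purl.org/dc/terms/conformsTo": {
            "@id": DATASET_PROFILE,
            "@type": "CreativeWork",
        },
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
        # Links to domain-specific ontology or vocabulary terms.
        "about": [{"@id": iri} for iri in term_iris],
    }


def as_script_tag(record):
    """Wrap the JSON-LD so it can be placed in the HTML of the web record."""
    payload = json.dumps(record, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    record = dataset_jsonld(
        name="Example expression dataset",
        description="Placeholder record used only to show the markup shape.",
        url="https://example.org/datasets/1",
        keywords=["example", "transcriptomics"],
        term_iris=["https://example.org/ontology/EX_0000001"],
    )
    print(as_script_tag(record))
```

Because the markup is plain JSON-LD embedded in the page, a harvester can collect it from many sites and load it into a triple store without site-specific parsing.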
Bioschemas has been used to aggregate data in a distributed environment through federation, using Bioschemas metadata markup. More recently, we have been leveraging Bioschemas deployments on resource websites, harvesting their markup directly to populate SPARQL endpoints and subsequently create queryable knowledge graphs.
An improved Bioschemas validation process will assess the ‘FAIR’ level of the user’s web records and suggest the most appropriate Bioschemas profile based on similarity to those in the Bioschemas registry.
Our learnings in operating this community will be extended into non-’bio’ domains wishing to more easily incorporate ontologies and metadata in their web-based records. To that end, we have a sister site dedicated to hosting the many domain-agnostic types/profiles that have already emerged from our work (so far 7 profiles aligned to digital objects in research, e.g., workflows, datasets, training materials): https://schemas.science/. Through this infrastructure we will develop a sustainable, cross-institutional collaborative space for long-term and wide ranging impact, supporting our existing community engagement with global AI, ML, and Training communities, and others in the future.

July 22, 2025
15:20-15:40
RO-Crate: Capturing FAIR research outputs in bioinformatics and beyond
Confirmed Presenter: Phil Reed, The University of Manchester, United Kingdom
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Karsten Hokamp


Authors List:

  • Eli Chadwick, The University of Manchester
  • Stian Soiland-Reyes, The University of Manchester
  • Phil Reed, The University of Manchester
  • Claus Weiland, Leibniz Institute for Biodiversity and Earth System Research
  • Dag Endresen, University of Oslo
  • Felix Shaw, Earlam Institute
  • Timo Mühlhaus, RPTU Kaiserslautern-Landau
  • Carole Goble, The University of Manchester

Presentation Overview:

RO-Crate is a mechanism for packaging research outputs with structured metadata, providing machine-readability and reproducibility following the FAIR principles. It enables interlinking methods, data, and outputs with the outcomes of a project or a piece of work, even where distributed across repositories.
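For readers unfamiliar with the packaging format, the sketch below builds a minimal `ro-crate-metadata.json` using only the Python standard library. It follows the general layout of the RO-Crate 1.1 specification (a metadata descriptor, a root dataset, and file entities in a flat `@graph`); the helper name and the example values are hypothetical, and real crates typically carry richer metadata or are written with dedicated tooling.

```python
import json

RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
RO_CRATE_SPEC = "https://w3id.org/ro/crate/1.1"


def build_crate_metadata(name, description, date_published, license_url, files):
    """Return the JSON-LD document for a crate; `files` maps relative path to label."""
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": RO_CRATE_SPEC},
        "about": {"@id": "./"},
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "license": {"@id": license_url},
        # The root dataset points at each packaged file by its identifier.
        "hasPart": [{"@id": path} for path in files],
    }
    file_entities = [
        {"@id": path, "@type": "File", "name": label} for path, label in files.items()
    ]
    return {"@context": RO_CRATE_CONTEXT, "@graph": [descriptor, root, *file_entities]}


if __name__ == "__main__":
    metadata = build_crate_metadata(
        name="Example analysis outputs",
        description="Placeholder crate used only to show the file layout.",
        date_published="2025-07-22",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        files={"results.csv": "Result table", "workflow.cwl": "Workflow definition"},
    )
    print(json.dumps(metadata, indent=2))
```

Writing this document next to the listed files is what makes the directory a crate: the data and its description travel together.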

Researchers can distribute their work as an RO-Crate to ensure their data travels with its metadata, so that key components are correctly tracked, archived, and attributed. Data stewards and infrastructure providers can integrate RO-Crate into the projects and platforms they support, to make it easier for researchers to create and consume RO-Crates without requiring technical expertise.

Community-developed extensions called “profiles” allow the creation of tailored RO-Crates that serve the needs of a particular domain or data format.

Current uses of RO-Crate in bioinformatics include:
∙ Describing and sharing computational workflows registered with WorkflowHub
∙ Creating FAIR exports of workflow executions from workflow engines and biodiversity digital twin simulations
∙ Enabling an appropriate level of credit and attribution, particularly in currently under-recognised roles (e.g., sample gathering, processing, sample distribution)
∙ Capturing plant science experiments as Annotated Research Contexts (ARC), complex objects which include workflows, workflow executions, inputs, and results
∙ Defining metadata conventions for biodiversity genomics

This presentation will outline the RO-Crate project and highlight its most prominent applications within bioinformatics, with the aim of increasing awareness and sparking new conversations and collaborations within the BOSC community.

July 22, 2025
15:20-15:40
PheBee: A Graph-Based System for Scalable, Traceable, and Semantically Aware Phenotyping
Confirmed Presenter: David Gordon, Office of Data Sciences at Nationwide Children's Hospital, United States
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Karsten Hokamp


Authors List:

  • David Gordon, Office of Data Sciences at Nationwide Children's Hospital
  • Max Homilius, Office of Data Sciences at Nationwide Children's Hospital
  • Austin Antoniou, Office of Data Sciences at Nationwide Children's Hospital
  • Connor Grannis, Office of Data Sciences at Nationwide Children's Hospital
  • Grant Lammi, Office of Data Sciences at Nationwide Children's Hospital
  • Adam Herman, Office of Data Sciences at Nationwide Children's Hospital
  • Ashley Kubatko, Office of Data Sciences at Nationwide Children's Hospital
  • Peter White, Office of Data Sciences at Nationwide Children's Hospital

Presentation Overview:

The association of phenotypes and disease diagnoses is a cornerstone of clinical care and biomedical research. Significant work has gone into standardizing these concepts in ontologies like the Human Phenotype Ontology and Mondo, and in developing interoperability standards such as Phenopackets. Managing subject-term associations in a traceable and scalable way that enables semantic queries and bridges clinical and research efforts remains a significant challenge.

PheBee is an open-source tool designed to address this challenge by using a graph-based approach to organize and explore data. It allows users to perform powerful, meaning-based searches and supports standardized data exchange through Phenopackets. The system is easy to deploy and share thanks to reproducible setup templates.

The graph model underlying PheBee captures subject-term associations along with their provenance and modifiers. Queries leverage ontology structure to traverse semantic term relationships. Terms can be linked at the patient, encounter, or note level, supporting temporal and contextual pattern analysis. PheBee accommodates both manually assigned and computationally derived phenotypes, enabling use across diverse pipelines. When integrated downstream of natural language processing pipelines, PheBee maintains traceability from extracted terms to the original clinical text, enabling high-throughput, auditable term capture.
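A toy version of such an ontology-aware query is sketched below: a search for a term also returns subjects annotated with any of its descendants, and each hit keeps its provenance. This is not PheBee's data model or API; the in-memory structures, function names, and `EX:` term identifiers are hypothetical placeholders.

```python
from collections import defaultdict

# Toy term hierarchy: child term -> list of parent terms.
PARENTS = {
    "EX:FocalSeizure": ["EX:Seizure"],
    "EX:Seizure": ["EX:NervousSystemAbnormality"],
    "EX:NervousSystemAbnormality": [],
}

# Subject-term associations with provenance (source record and assignment method).
ASSOCIATIONS = [
    {"subject": "subject-1", "term": "EX:FocalSeizure", "source": "note-17", "method": "nlp"},
    {"subject": "subject-2", "term": "EX:Seizure", "source": "encounter-4", "method": "manual"},
    {"subject": "subject-3", "term": "EX:NervousSystemAbnormality", "source": "note-2", "method": "nlp"},
]


def descendants(term, parents):
    """Collect every term reachable from `term` by following child links."""
    children = defaultdict(set)
    for child, parent_terms in parents.items():
        for parent in parent_terms:
            children[parent].add(child)
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def find_associations(term, parents, associations):
    """Return associations for `term` or any more specific term, provenance included."""
    wanted = {term} | descendants(term, parents)
    return [record for record in associations if record["term"] in wanted]


if __name__ == "__main__":
    for record in find_associations("EX:Seizure", PARENTS, ASSOCIATIONS):
        print(record["subject"], record["term"], record["source"], record["method"])
```

Because every hit carries its source and method, a result derived from a natural language processing pipeline can still be traced back to the note it came from.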

PheBee is currently being piloted in internal translational research projects supporting phenotype-driven pediatric care. Its graph foundation also empowers future feature development, such as natural language querying using retrieval augmented generation or genomic data integration to identify subjects with variants in phenotypically relevant genes.

PheBee advances open science in biomedical research and clinical support by promoting structured, traceable phenotype data.

July 22, 2025
15:20-15:40
The role of the Ontology Development Kit in supporting ontology compliance in adverse legal landscapes
Confirmed Presenter: Damien Goutte-Gattat, University of Cambridge, United Kingdom
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Karsten Hokamp


Authors List:

  • Damien Goutte-Gattat, University of Cambridge

Presentation Overview:

Ontologies, like code, are a form of speech. As such, they can be subject to laws and other regulations that attempt to control how freedom of speech is exercised, and ontology editors may find themselves in the position of being legally compelled to introduce some changes in their ontologies for the sole purpose of complying with the laws that apply to them.

Therefore, developers of tools used for ontology editing and maintenance need to ponder whether their tools should provide features to facilitate the introduction of such legally mandated changes, and how.

As developers of the Ontology Development Kit (ODK), one of the main tools used to maintain ontologies of the OBO Foundry, we will consider both the moral and technical aspects of allowing ODK users to comply with arbitrary legal restrictions. The overall approach we are envisioning, in order to contain the impacts of such restrictions to the jurisdictions that mandate them, is a “split world” system, where the ODK would facilitate the production of slightly different editions of the same ontology.

July 22, 2025
15:40-16:00
10 years of the AberOWL ontology repository: moving towards federated reasoning and natural language access
Confirmed Presenter: Robert Hoehndorf, King Abdullah University of Science and Technology, Saudi Arabia
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Karsten Hokamp


Authors List:

  • Fernando Zhapa-Camacho, King Abdullah University of Science and Technology
  • Olga Mashkova, King Abdullah University of Science and Technology
  • Maxat Kulmanov, King Abdullah University of Science and Technology
  • Robert Hoehndorf, King Abdullah University of Science and Technology

Presentation Overview:

AberOWL is a framework for ontology-based data access in biology that has provided reasoning services for bio-ontologies since 2015. Unlike other ontology repositories in the life sciences such as BioPortal, OLS, and OntoBee, AberOWL uniquely focuses on providing access to Description Logic querying through a Description Logic reasoner. The system comprises a reasoning service using OWLAPI and the Elk reasoner, an ElasticSearch service for natural language queries, and a SPARQL endpoint capable of embedding Description Logic queries within SPARQL queries. AberOWL contains all ontologies from BioPortal and the OBO library, enabling lightweight reasoning over the OWL 2 EL profile and implementing the Ontology-Based Data Access paradigm. This allows query enhancement through reasoning to infer implicit knowledge not explicitly stated in data. After a decade of operation, AberOWL is evolving in three key directions: (1) introducing a lightweight, containerized version enabling local deployment for single ontologies with the ability to register with the central repository for federated reasoning access; (2) integrating improved natural language processing through Large Language Models to facilitate Description Logic querying without requiring strict syntax adherence; and (3) implementing a FAIR API that standardizes access to ontology querying and repositories, improving interoperability. These advancements will transform AberOWL into a more federated system with FAIR API access and LLM integration for enhanced ontology interaction.

July 22, 2025
16:40-16:50
The global biodata infrastructure: how, where, who, and what?
Confirmed Presenter: Guy Cochrane, Global Biodata Coalition, United Kingdom
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Karsten Hokamp


Authors List:

  • Guy Cochrane, Global Biodata Coalition
  • Chuck Cook, Global Biodata Coalition

Presentation Overview:

Life science and biomedical research around the world is critically dependent on a global infrastructure of biodata resources that store and provide access to research data, and to tools and services that allow users to interrogate, combine and re-use these data to generate new insights. These resources, most of which are open and freely available, form a critical, globally distributed, and highly-connected infrastructure that has grown organically over time.

Funders and managers of biodata resources are keenly aware that the long-term sustainability of this infrastructure, and of the individual resources it comprises, is under threat. The infrastructure has not been well described and there is a need to understand how many resources there are, where they are located, who funds them, and which are of the greatest importance for the scientific community.

The Global Biodata Coalition has worked to describe the infrastructure by undertaking an inventory of global biodata resources and by running a selection process to identify a set of—currently 52—Global Core Biodata Resources (GCBRs) that are of fundamental importance to global life sciences research.

We will present an overview of the location and funders of the GCBRs, and will summarise the findings of the two rounds of the global inventory of biodata resources, which identified over 3,700 resources.

The results of these analyses provide an overview of the infrastructure and will allow the GBC to identify major funders of biodata resources that are not currently engaged in the discussion of issues of sustainability.

July 22, 2025
16:50-17:50
Panel: Data Sustainability
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Monica Munoz-Torres


Authors List:

  • Chris Mungall
  • Varsha Khodiyar
  • Tony Burdett
  • Nicky Mulder

Presentation Overview:

This BOSC 2025 panel will tackle the essential challenge of Data Sustainability, defined as the proactive and principled approach to ensuring bioinformatics research data remains FAIR, ethically managed, and valuable for future generations through sufficient infrastructure, funding, expertise, and governance. In light of current funding pressures and the risk of data loss that impedes scientific progress and wastes resources, establishing sustainable practices has become more urgent than ever. This discussion will incorporate diverse perspectives to examine practical strategies and solutions across key areas, including FAIR/CARE principles, funding models, open science, data lifecycle management, technical scalability, and ethical considerations.

July 22, 2025
17:50-18:00
Closing Remarks
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Monica Munoz-Torres


Authors List:

  • Nomi Harris