July 23, 2025
11:20-11:40
Knowledge-Graph-driven and LLM-enhanced Microbial Growth Predictions
Confirmed Presenter: Marcin Joachimiak, Lawrence Berkeley National Laboratory, United States
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Tiffany Callahan


Authors List:

  • Marcin Joachimiak, Lawrence Berkeley National Laboratory

Presentation Overview:

Predicting microbial growth preferences has far-reaching impacts in biotechnology, healthcare, and environmental management. Cultivating microbes allows researchers to streamline strain selection, develop targeted antimicrobials, and uncover metabolic pathways for biodegradation or biomanufacturing. However, with most microbial taxa remaining uncultivated and knowledge of their metabolic capabilities and organismal traits fragmented in unstructured text, cultivation remains a major challenge. To address this, we developed KG-Microbe, a knowledge graph (KG) of over 800,000 bacterial and archaeal taxa, 3,000 types of traits, and 30,000 types of functional annotations.
Using KG-Microbe, we constructed machine learning pipelines to predict microbial growth preferences. We compared symbolic rule mining, which produces human-readable explanations, with "black box" methods like gradient boosted decision trees and deep graph-based models. While boosted tree models achieved a mean precision of over 70% across 46 diverse media, we demonstrate that symbolic rule mining can match their performance, offering crucial interpretability. To further validate predictions, we used large language models (LLMs) to interpret and explain model outputs.
By comparing these different models and their outputs, we identified key data features and knowledge gaps relevant to predicting microbial cultivation media preferences. We also used vector embedding analogy reasoning as well as complex graph queries on KG-Microbe to generate novel hypotheses and identify organisms with specific properties. Our work highlights the power of a KG-driven approach and the trade-offs between model interpretability and predictive performance. These findings motivate the development of hybrid AI models that combine transparency with predictive accuracy to advance microbial cultivation.
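
The vector embedding analogy reasoning mentioned above can be illustrated with a minimal sketch. The node names and 3-dimensional vectors below are invented for illustration and are not KG-Microbe embeddings; the sketch only shows the general technique of answering "a is to b as c is to ?" by ranking the remaining nodes by cosine similarity to b - a + c.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def analogy(emb, a, b, c):
    """Return the node whose vector is closest to emb[b] - emb[a] + emb[c]."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [name for name in emb if name not in (a, b, c)]
    return max(candidates, key=lambda name: cosine(emb[name], target))

# Hypothetical toy embeddings (assumption: real embeddings would be learned from the KG).
toy = {
    "taxon_A": [1.0, 0.0, 0.0],
    "medium_A": [1.0, 1.0, 0.0],
    "taxon_B": [0.0, 0.0, 1.0],
    "medium_B": [0.0, 1.0, 1.0],
    "medium_C": [1.0, 0.0, 1.0],
}

# "taxon_A grows on medium_A; what does taxon_B grow on?"
best = analogy(toy, "taxon_A", "medium_A", "taxon_B")
print(best)
```

In practice, candidates proposed this way would be treated as hypotheses to check, not as confirmed growth preferences.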

July 23, 2025
11:40-12:00
ProDiGenIDB – a unified resource of disease-associated genes, their protein products, and intrinsic disorder annotations
Confirmed Presenter: Jovana Kovacevic, Faculty of Mathematics, Belgrade University
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Tiffany Callahan


Authors List:

  • Jovana Kovacevic, Faculty of Mathematics
  • Anđelka Zečević, Mathematical Institute
  • Lazar Vasović, Faculty of Mathematics

Presentation Overview:

Understanding gene-disease associations is essential in biomedical research, yet relevant information is often distributed across multiple heterogeneous databases. To overcome this inconsistency, we developed ProDiGenIDB, an integrated database that consolidates gene-disease relationships from several recognized and publicly available sources, while also enriching them with complementary data on gene and protein identifiers, disease ontology, and protein structural disorder.

ProDiGenIDB brings together over 400,000 curated associations sourced from DisGeNet, COSMIC, HumsaVar, Orphanet, ClinVar, HPO, and DISEASES. Each entry includes gene-related metadata (Gene Symbol, Entrez ID, UniProt ID, Ensembl ID), disease descriptors (Disease Name, DOID), and a reference to the original source database.

Importantly, the database also incorporates predicted intrinsic disorder information for proteins encoded by the associated genes. These predictions were generated using commonly used protein disorder prediction tools such as IUPred and VSL2, providing additional insight into the potential lack of structure in disease-related proteins.

Another important aspect of the database construction involved mapping disease names to standardized Disease Ontology IDs (DOIDs). To improve this process, we applied Natural Language Processing (NLP) techniques using advanced text representation models to enhance the accuracy and consistency of term association.

ProDiGenIDB represents a valuable resource for integrative biomedical studies, particularly in contexts where protein disorder is hypothesized to play a functional or pathological role.

July 23, 2025
12:00-12:20
Causal knowledge graph analysis identifies adverse drug effects
Confirmed Presenter: Sumyyah Toonsi, King Abdullah University of Science and Technology, Saudi Arabia
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Tiffany Callahan


Authors List:

  • Sumyyah Toonsi, King Abdullah University of Science and Technology
  • Paul Schofield, Cambridge University
  • Robert Hoehndorf, King Abdullah University of Science and Technology

Presentation Overview:

Motivation: Knowledge graphs and structural causal models have each proven valuable for organizing biomedical knowledge and estimating causal effects, but remain largely disconnected: knowledge graphs encode qualitative relationships focusing on facts and deductive reasoning without formal probabilistic semantics, while causal models lack integration with background knowledge in knowledge graphs and have no access to the deductive reasoning capabilities that knowledge graphs provide.

Results: To bridge this gap, we introduce a novel formulation of Causal Knowledge Graphs (CKGs) which extend knowledge graphs with formal causal semantics, preserving their deductive capabilities while enabling principled causal inference. CKGs support deconfounding via explicitly marked causal edges and facilitate hypothesis formulation aligned with both encoded and entailed background knowledge. We constructed a Drug–Disease CKG (DD-CKG) integrating disease progression pathways, drug indications, side-effects, and hierarchical disease classification to enable automated large-scale mediation analysis. Applied to UK Biobank and MIMIC-IV cohorts, we tested whether drugs mediate effects between indications and downstream disease progression, adjusting for confounders inferred from the DD-CKG. Our approach successfully reproduced known adverse drug reactions with high precision while identifying previously undocumented significant candidate adverse effects. Further validation through side effect similarity analysis demonstrated that combining our predicted drug effects with established databases significantly improves the prediction of shared drug indications, supporting the clinical relevance of our novel findings. These results demonstrate that our methodology provides a generalizable, knowledge-driven framework for scalable causal inference.

July 23, 2025
12:20-12:40
CROssBARv2: A Unified Biomedical Knowledge Graph for Heterogeneous Data Representation and LLM-Driven Exploration
Confirmed Presenter: Erva Ulusoy, Hacettepe University, Turkey
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Tiffany Callahan


Authors List:

  • Bünyamin Şen, Hacettepe University
  • Erva Ulusoy, Hacettepe University
  • Melih Darcan, Hacettepe University
  • Mert Ergün, Hacettepe University
  • Tunca Dogan, Hacettepe University

Presentation Overview:

Developing effective therapeutics against prevalent diseases requires a deep understanding of the molecular, genetic, and cellular factors involved in disease development and progression. However, such knowledge is dispersed across different databases, publications, and ontologies, making collecting, integrating and analysing biological data a major challenge. Here, we present CROssBARv2, an extended and improved version of our previous work (https://crossbar.kansil.org/), a heterogeneous biological knowledge graph (KG) based system to facilitate systems biology and drug discovery/repurposing. CROssBARv2 collects large-scale biological data from 32 data sources and stores them in a Neo4j graph database. It consists of 2,709,502 nodes and 12,688,124 relationships between 14 node types. On top of that, we developed a GraphQL API and a large language model interface that translates between users' natural-language queries and Neo4j's Cypher query language, so that users can access information within the KG and answer specific scientific questions without LLM hallucinations; the main aim is to facilitate use of the resource. To evaluate the capability of CROssBAR-LLMs (LLMs augmented with structured knowledge in CROssBAR) in answering biomedical questions, we constructed multiple benchmark datasets and employed an independent benchmark to systematically compare various open- and closed-source LLMs. Our results revealed that CROssBAR-LLMs display significantly improved accuracy in answering these scientific questions compared to standalone LLMs and even LLMs augmented with web search. CROssBARv2 (https://crossbarv2.hubiodatalab.com/) is expected to contribute to life sciences research in (i) the discovery of disease mechanisms at the molecular level and (ii) the development of effective personalised therapeutic strategies.

July 23, 2025
12:40-12:45
Benchmarking Data Leakage on Link Prediction in Biomedical Knowledge Graph Embeddings
Confirmed Presenter: Galadriel Brière, Aix Marseille Univ, INSERM
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Tiffany Callahan


Authors List:

  • Galadriel Brière, Aix Marseille Univ
  • Thomas Stosskopf, Aix Marseille Univ
  • Benjamin Loire, Aix Marseille Univ
  • Anaïs Baudot, Aix Marseille Univ

Presentation Overview:

In recent years, Knowledge Graphs (KGs) have gained significant attention for their ability to organize massive biomedical knowledge into entities and relationships. Knowledge Graph Embedding (KGE) models facilitate efficient exploration of KGs by learning compact data representations. These models are increasingly applied to biomedical KGs for various tasks, notably link prediction, which enables applications such as drug repurposing.

The research community has implemented benchmarks to evaluate and compare the large diversity of KGE models. However, existing benchmarks often overlook the issue of Data Leakage (DL), which can lead to inflated performance and compromise the validity of benchmark results. DL may occur due to inadequate separation between training and test sets (DL1), use of illegitimate features (DL2), or evaluation settings that fail to reflect real-world inference conditions (DL3).

In this study, we implement systematic procedures to detect and mitigate these sources of DL. We evaluate popular KGE models on a biomedical KG and show that inappropriate data separation (DL1) artificially inflates model performance, whereas models do not rely on node degree as a shortcut feature (DL2). For DL3, we implement realistic inference conditions with (i) a zero-shot training procedure in which drugs in the test and validation sets have no known indications during training and (ii) a drug repurposing ground truth for rare diseases. Performance collapses in both scenarios.
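
As an illustration of what a DL1 check can look like, the following minimal sketch flags test triples whose entity pair already appears in the training set in either direction, for example through an inverse relation. This is one common form of train/test leakage in KG link prediction and is not necessarily the detection procedure used in this study; the triples and the function name `find_leaky_test_triples` are hypothetical.

```python
def find_leaky_test_triples(train, test):
    """Return test triples whose (head, tail) pair occurs in train in either direction."""
    # frozenset ignores direction, so (a, b) and (b, a) map to the same key.
    seen_pairs = {frozenset((head, tail)) for head, _, tail in train}
    return [
        (head, rel, tail)
        for head, rel, tail in test
        if frozenset((head, tail)) in seen_pairs
    ]

# Hypothetical toy triples in (head, relation, tail) form.
train = [
    ("drugA", "treats", "disease1"),
    ("disease2", "treated_by", "drugB"),
    ("drugC", "targets", "gene1"),
]
test = [
    ("disease1", "treated_by", "drugA"),  # inverse of a training triple
    ("drugB", "treats", "disease2"),      # inverse of a training triple
    ("drugC", "treats", "disease3"),      # entity pair unseen in training
]

leaky = find_leaky_test_triples(train, test)
clean_test = [triple for triple in test if triple not in leaky]
print(len(leaky), clean_test)
```

Evaluating a model on `clean_test` rather than `test` removes the triples that could be answered by memorising a training edge.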

Our findings highlight the need for more rigorous evaluation protocols and raise concerns about the reliability of current KGE models for real-world biomedical applications such as drug repurposing.

July 23, 2025
12:45-12:50
A machine learning framework for extracting and structuring biological pathway knowledge from scientific literature
Confirmed Presenter: Mun Su Kwon, Korea Advanced Institute of Science and Technology (KAIST), South Korea
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Tiffany Callahan


Authors List:

  • Mun Su Kwon, Korea Advanced Institute of Science and Technology (KAIST)
  • Junkyu Lee, Korea Advanced Institute of Science and Technology (KAIST)
  • Haechan Sung, Korea Advanced Institute of Science and Technology (KAIST)
  • Hyun Uk Kim, Korea Advanced Institute of Science and Technology (KAIST)

Presentation Overview:

Advances in text mining have significantly improved the accessibility of scientific knowledge from literature. However, a major challenge remains in biology and biotechnology: extracting information embedded within biological pathway images, which are not easily accessible through conventional text-based methods. To overcome this limitation, we present a machine learning–based framework called “Extraction of Biological Pathway Information” (EBPI). The framework systematically retrieves relevant publications based on user-defined queries, identifies biological pathway figures, and extracts structured information such as genes, enzymes, and metabolites. EBPI combines image processing and natural language models to identify text in diagrams, classify terms into biological categories, and infer biochemical reaction directionality using graphical cues such as arrows. The extracted information is output in an editable, tabular format suitable for integration with pathway databases and knowledge graphs. Validated against manually curated pathway maps, EBPI enables scalable knowledge extraction from complex visual data of biological pathways and opens new directions for automated literature curation across many biological disciplines.

July 23, 2025
12:50-13:00
Invited Presentation: Poster Madness
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Tiffany Callahan


Authors List:

Presentation Overview:

Each accepted poster presenter is given up to 1 minute to advertise their poster.

July 23, 2025
14:00-14:20
Proceedings Presentation: scGOclust: leveraging gene ontology to find functionally analogous cell types between distant species
Confirmed Presenter: Yuyao Song, European Bioinformatics Institute, United Kingdom
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Robert Hoehndorf


Authors List:

  • Yuyao Song, European Bioinformatics Institute
  • Yanhui Hu, Department of Genetics
  • Julian Dow, School of Molecular Biosciences
  • Norbert Perrimon, Department of Genetics
  • Irene Papatheodorou, European Bioinformatics Institute; Earlham Institute and University of East Anglia

Presentation Overview:

Basic biological processes are shared across animal species, yet their cellular mechanisms are profoundly diverse. Comparing cell-type gene expression between species reveals conserved and divergent cellular functions. However, as phylogenetic distance increases, gene-based comparisons become less informative. The Gene Ontology (GO) knowledgebase offers a solution by serving as the most comprehensive resource of gene functions across a vast diversity of species, providing a bridge for distant species comparisons. Here, we present scGOclust, a computational tool that constructs de novo cellular functional profiles using GO terms, facilitating systematic and robust comparisons within and across species. We applied scGOclust to analyse and compare the heart, gut and kidney between mouse and fly, and whole-body data from C. elegans and H. vulgaris. We show that scGOclust effectively recapitulates the functional spectrum of different cell types, characterises functional similarities between homologous cell types, and reveals functional convergence between unrelated cell types. Additionally, we identified subpopulations within the fly crop that show circadian rhythm-regulated secretory properties and hypothesize an analogy between fly principal cells from different segments and distinct mouse kidney tubules. We envision scGOclust as an effective tool for uncovering functionally analogous cell types or organs across distant species, offering fresh perspectives on evolutionary and functional biology.
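
The general idea of a GO-based functional profile can be sketched as follows: expression is summed over all genes annotated to each GO term, and cell types from different species are then compared in the shared GO-term space instead of the gene space. This is a simplified illustration under stated assumptions, not the scGOclust implementation; the gene names, term labels, and expression values are invented placeholders.

```python
import math
from collections import defaultdict

def go_profile(expression, gene_to_go):
    """Sum expression over the GO terms of each gene, then normalise to sum to 1."""
    profile = defaultdict(float)
    for gene, value in expression.items():
        for term in gene_to_go.get(gene, ()):
            profile[term] += value
    total = sum(profile.values())
    return {term: v / total for term, v in profile.items()} if total else {}

def cosine(p, q):
    """Cosine similarity between two sparse profiles keyed by GO term."""
    dot = sum(p.get(term, 0.0) * q.get(term, 0.0) for term in set(p) | set(q))
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

# Hypothetical annotations: genes differ between species, GO terms are shared.
mouse_go = {"m_gene1": ["TERM_ion_transport"], "m_gene2": ["TERM_muscle_contraction"]}
fly_go = {"f_gene1": ["TERM_ion_transport"], "f_gene2": ["TERM_muscle_contraction"]}

mouse_cell = go_profile({"m_gene1": 8.0, "m_gene2": 2.0}, mouse_go)
fly_cell_x = go_profile({"f_gene1": 4.0, "f_gene2": 1.0}, fly_go)
fly_cell_y = go_profile({"f_gene2": 5.0}, fly_go)

print(cosine(mouse_cell, fly_cell_x), cosine(mouse_cell, fly_cell_y))
```

Here the mouse cell type is functionally closer to the fly cell type `fly_cell_x` even though the two species share no gene identifiers, which is the property that makes GO terms useful as a bridge.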

July 23, 2025
14:20-14:40
Integrating autoantibody-related knowledge in an ontology populated using a curated dataset from literature
Confirmed Presenter: Fabien Maury, Inserm, France
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Robert Hoehndorf


Authors List:

  • Fabien Maury, Inserm
  • Solène Grosdidier, BlueMed Writing
  • Killian Halberda, Inserm
  • Isabelle Desguerre, AP-HP
  • Adrien Coulet, Inria
  • Maud de Dieuleveult, Inserm

Presentation Overview:

Autoimmune diseases (AIDs) are often characterized by the presence of autoantibodies (AAbs). However, many of these diseases are rare and can be hard to diagnose, partly because knowledge such as which type of AAb to test for when diagnosing a particular AID is not easily accessible. Indeed, to our knowledge, no centralized resource including all available knowledge related to human autoantibodies exists as of April 2025.
To fill this gap, we first introduce a lightweight ontology for representing relationships between AAbs, their molecular targets, and the related AIDs and their clinical signs. The ontology also allows the provenance of these relationships to be specified by reusing the PROV-O ontology.
Second, we introduce the MAKAAO Core dataset, compiled manually from the literature by several curators. MAKAAO Core includes the name and synonyms (both in English and French) of over 350 autoantibodies, along with their targets and associated AIDs. Targets and diseases are referred to using identifiers from reference resources.
We used this dataset to populate our ontology and named the result the MAKAAO knowledge graph (MAKAAO KG), which constitutes the central part of a future reference resource.

July 23, 2025
14:40-15:00
Ontology pre-training improves machine learning predictions of aqueous solubility and other metabolite properties
Confirmed Presenter: Charlotte Tumescheit, University of Zurich, Swiss Institute of Bioinformatics
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Robert Hoehndorf


Authors List:

  • Charlotte Tumescheit, University of Zurich
  • Martin Glauer, Otto-von-Guericke University Magdeburg
  • Simon Flügel, Osnabrück University
  • Fabian Neuhaus, Otto-von-Guericke University Magdeburg
  • Till Mossakowski, Osnabrück University
  • Janna Hastings, University of Zurich

Presentation Overview:

Predicting properties of small molecule metabolites from structures is a challenging task. Molecular language models have emerged as a highly performant AI approach for prediction of diverse properties directly from ‘language-like’ representations of the structures of molecules. However, for many prediction problems, there is a shortage of available training data and model performance is still limited.

Integrating expert knowledge into language models has the potential to improve performance on prediction tasks and model generalisability. Bio-ontologies offer curated knowledge ideal for this purpose. Here, we demonstrate a novel approach to knowledge injection, ‘ontology pre-training’, which we have previously shown to work for a pilot case study in the classification task of toxicity prediction. Now, we extend this to regression tasks such as solubility prediction and a wider range of classification tasks.

First, we pre-train a Transformer-based language model on molecules from PubChem. Then, using our novel method, we embed the knowledge contained in a classification hierarchy derived from the ChEBI ontology into the model as an intermediate training step between general-purpose pre-training and task-specific fine-tuning. Finally, we fine-tune the models on a range of regression tasks. We find a clear improvement in performance and training times across the diverse prediction tasks.

Our results show that adding a knowledge-based training step to a machine learning model can improve performance. Our method is intuitive and generalisable, and we plan to extend it to further biological modalities and prediction datasets, including proteins and RNA, as well as to explore the impact of different ontologies.

July 23, 2025
15:00-15:20
Building the Aging Biomarkers Ontology and Its Applications in Aging Research
Confirmed Presenter: Hande McGinty, Kansas State University, Manhattan KS
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Robert Hoehndorf


Authors List:

  • Hande McGinty, Kansas State University
  • Srikar Reddy Gadusu, Kansas State University
  • Yigit Kucuk, KONCORDANT Lab
  • Aaron King, Aeon Biomarkers

Presentation Overview:

Aging is a complex biological process shaped by numerous biomarkers—such as cholesterol and blood sugar levels—that serve as measurable indicators of health and disease. Despite the abundance of biomarker data, identifying meaningful patterns and relationships remains a significant challenge. To address this, we began developing the Aging Biomarkers Ontology (ABO), a structured framework that formally defines aging-related biomarkers, organizes them hierarchically, and maps their interconnections to facilitate deeper analysis. Furthermore, we employed two complementary approaches to enrich the graph and uncover hidden associations among aging biomarkers: Depth-Limited Search (DLS) and machine learning-based embedding search. DLS identifies associations by traversing connected nodes within a predefined depth, while the embedding-based method encodes biomarker relationships as numerical vectors and uses cosine similarity to predict potential links. We evaluated the performance of both methods in detecting known and novel relationships. Our results demonstrate the value of systematically integrating statistical analysis with graph-based reasoning and machine learning to explore aging-related biomarkers. The resulting framework enhances the interpretability of biomarker data, supports hypothesis generation, and contributes to advancing biomedical research in aging and longevity.
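
The two search strategies described above can be sketched on a toy graph. The node names, edges, and 2-dimensional embeddings below are hypothetical and are not taken from the Aging Biomarkers Ontology; the sketch only shows how a depth-limited traversal and a cosine-similarity comparison differ in what they surface.

```python
import math

def depth_limited_search(graph, start, max_depth):
    """Return all nodes reachable from start by following at most max_depth edges."""
    found = set()

    def visit(node, depth, path):
        if depth == max_depth:
            return
        for neighbour in graph.get(node, ()):
            if neighbour in path:  # avoid cycling back along the current path
                continue
            found.add(neighbour)
            visit(neighbour, depth + 1, path | {neighbour})

    visit(start, 0, {start})
    return found

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical adjacency list and embeddings.
graph = {
    "cholesterol": ["LDL"],
    "LDL": ["atherosclerosis"],
    "atherosclerosis": ["hypertension"],
    "glucose": ["insulin"],
}
embeddings = {
    "cholesterol": [0.9, 0.1],
    "LDL": [0.8, 0.2],
    "glucose": [0.1, 0.9],
}

within_two = depth_limited_search(graph, "cholesterol", 2)
sim_ldl = cosine(embeddings["cholesterol"], embeddings["LDL"])
sim_glucose = cosine(embeddings["cholesterol"], embeddings["glucose"])
print(sorted(within_two), sim_ldl > sim_glucose)
```

The traversal only finds associations that are already connected by a short path, while the embedding comparison can rank any pair of nodes, including pairs with no path between them.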

July 23, 2025
15:20-15:40
Discovering cellular contributions to disease pathogenesis in the NLM Cell Knowledge Network
Confirmed Presenter: Richard Scheuermann, Division of Intramural Research, National Library of Medicine
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Robert Hoehndorf


Authors List:

  • Richard Scheuermann, Division of Intramural Research
  • Anne Deslattes Mays, Division of Intramural Research
  • Matthew Diller, Division of Intramural Research
  • Caroline Eastwood, Wellcome Sanger Institute
  • Rezarta Islamaj, Division of Intramural Research
  • James Leaman, Division of Intramural Research
  • Raymond LeClair, Division of Intramural Research
  • Zhiyong Lu, Division of Intramural Research
  • Chris Mungall, Lawrence Berkeley National Laboratory
  • Vinh Nguyen, Division of Intramural Research
  • David Osumi-Sutherland

Presentation Overview:

Knowledge about the role of genes in disease pathogenesis has been obtained from genetic and genome-wide association studies. The proteins encoded by these genes are frequently found to be effective therapeutic targets. However, little is known about which cells are the functional home of these disease-associated genes and proteins. Single cell genomic technologies are now revealing the cellular complexity of human tissues at high resolution. The transcriptomes defined by these technologies reflect the functional cellular phenotypes. Database resources that capture and disseminate data derived from these single cell technologies have been developed. But the knowledge derived from their analysis and interpretation is largely buried as free text in the scientific literature.
Here we describe the development of a Cell Knowledge Network (CKN) prototype at the National Library of Medicine (NLM) that captures and exposes knowledge about cell phenotypes (cell types and states) derived from single cell technologies and related experiments. NLM-CKN is populated using validated computational analysis pipelines and natural language processing of the scientific literature and integrated with other sources of relevant knowledge about genes, anatomical structures, diseases, and drugs.
Using this integration of experimental sc/snRNAseq data with prior knowledge about disease predispositions and drug targets, a novel linkage between lung pericytes and pulmonary hypertension was discovered through the KCNK3 gene intermediary with implications for novel therapeutic interventions.
Through the integration of knowledge from single cell technologies with other sources of knowledge about genetic predispositions and therapeutic targets, the NLM-CKN is revealing the cellular contributions to disease pathogenesis.

July 23, 2025
15:40-16:00
Cat-VRS for Genomic Knowledge Curation: A Hyperintensional Representation Framework for FAIR Categorical Variation
Confirmed Presenter: Daniel Puthawala, Nationwide Children's Hospital, United States
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Robert Hoehndorf


Authors List:

  • Daniel Puthawala, Nationwide Children's Hospital
  • Brendan Reardon, Dana-Farber Cancer Institute

Presentation Overview:

Cat-VRS: A FAIR catvar Standard
Categorical variants (catvars)—such as “MET exon 14 skipping” and “TP53 loss”—are foundational to genomic knowledge, linking sets of genomic variants to clinically relevant assertions like oncogenicity scores or predicted therapeutic response. Yet despite their importance, catvars remain unstandardized, ambiguous, and largely non-computable, creating persistent barriers to search, curation, interoperability, and reuse. Existing standards either offer flexible models for sequence-resolved variants (e.g., GA4GH VRS) or rigid top-down nomenclatures (e.g., HGVS) that fail to capture the diversity and nuance of categorical assertions.

We present the Categorical Variation Representation Specification (Cat-VRS), a new GA4GH standard for representing catvars using a hyperintensional, constraint-based model. Cat-VRS encodes categorical meaning compositionally and bottom-up: structured constraints—such as sequence location or protein functional consequence—support precise, flexible representations at varying levels of granularity. Cat-VRS is fully interoperable with other GA4GH standards, supports ontology mappings, and was developed through global community collaboration in alignment with the FAIR data principles.

Cat-VRS 1.0 was recently released by GA4GH and is already in use by ClinVar and MaveDB, with integration underway in CIViC and the VICC MetaKB. These early implementations demonstrate Cat-VRS’s practical utility in enabling reusable, computable representations of categorical knowledge.

As precision medicine scales, so too does the need for infrastructure that supports consistent curation, standardized data sharing, and automated variant knowledge matching. We invite the bio-ontologies and knowledge representation community to engage with Cat-VRS as both a practical tool and an extensible framework for advancing interoperable genomic knowledge.

July 23, 2025
16:40-17:40
Invited Presentation: Knowledge Graphs: Theory, Applications and Challenges
Confirmed Presenter: Ian Horrocks
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Augustin Luna


Authors List:

  • Ian Horrocks

Presentation Overview:

Knowledge Graphs have rapidly become a mainstream technology that combines features of databases and AI. In this talk I will introduce Knowledge Graphs, explaining their features and the theory behind them. I will then consider some of the challenges inherent in both the theory and implementation of Knowledge Graphs and present some solutions that have made possible the development of popular language standards and robust and high-performance Knowledge Graph systems. Finally, I will illustrate the wide applicability of knowledge graph technology with some example use cases.

July 23, 2025
17:40-17:45
Bridging Language Barriers in Bio-Curation: An LLM-Enhanced Workflow for Ontology Translation into Japanese
Confirmed Presenter: Mark Streer, SciBite (Elsevier Ltd.), United Kingdom
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Augustin Luna


Authors List:

  • Mark Streer, SciBite (Elsevier Ltd.)
  • Olivia Watson, SciBite (Elsevier Ltd.)
  • Mark McDowall, SciBite (Elsevier Ltd.)
  • Jane Lomax, SciBite (Elsevier Ltd.)

Presentation Overview:

SciBite’s ontology management and named entity recognition (NER) software relies on curated public ontologies to support data harmonization under FAIR principles (findable, accessible, interoperable, and reusable). Public ontologies are foundational for data FAIR-ification, providing structured vocabularies that enable consistent annotation and semantic integration; however, they are predominantly developed in English, creating barriers for non-English users and applications. To address this challenge for our Japanese customers, we developed a large language model (LLM)-enhanced bio-curation workflow for English-to-Japanese translation, focusing on synonym enrichment of the Uberon anatomy ontology as a case study. Our approach implements a three-step process: (1) importing mapped Japanese synonyms from existing bilingual datasets (e.g., DBCLS resources), (2) generating Japanese candidate synonyms based on English synonyms and definitions using an LLM, and (3) validating candidates against the source ontology to ensure appropriate placement, as well as against online dictionaries and other references to confirm their real-world applicability. Initially developed for synonym enrichment, this workflow can be extended to semantic refinement into broadMatch and narrowMatch relationships in addition to exactMatch, which is critical for terminology lacking perfect English equivalents. Furthermore, the workflow is well-suited to agentic frameworks such as LangGraph to orchestrate generation and Internet research processes, as well as LLM-ensemble evaluation to automatically confirm clear matches, allowing ambiguous cases to be prioritized for “human-in-the-loop” curation. This approach represents a promising solution for scalable ontology translation, contributing to the FAIR development and application of bio-ontologies across language barriers and enhancing international biomedical research collaboration.

July 23, 2025
17:45-17:50
Enabling FAIR Single-Cell RNAseq Data Management with COPO
Confirmed Presenter: Felix Shaw, Earlham Institute, United Kingdom
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Augustin Luna


Authors List:

  • Felix Shaw, Felix Shaw, Earlham Institute
  • Debby Ku, Debby Ku, Earlham Institute
  • Aaliyah Providence, Aaliyah Providence, Earlham Institute
  • Irene Papatheodorou, Irene Papatheodorou, Earlham Institute

Presentation Overview:

We present our work on establishing standards and tools for validating and submitting single-cell RNA sequencing (scRNA-seq) data and metadata using the COPO brokering platform. Effective research data management is essential for enabling data reuse, integration, and the discovery of new biological insights. As new technologies like single-cell sequencing and transcriptomics emerge, they often outpace existing data infrastructure.

Single-cell technologies allow detailed insights into biological processes, for example, tracking gene expression dynamics in crops, dissecting pathogen-host interactions at the cellular level, or identifying stress-resilient cell types. Yet without comprehensive metadata and appropriate data management tools, the full potential of these datasets remains unrealised.

Implementing the FAIR principles, particularly around metadata quality, is crucial. At present, there are few widely adopted standards or tools for describing scRNA-seq experiments. In response, we have developed a structured metadata template tailored to these experiments, informed by extensive consultation with researchers across the single-cell community and aligned with existing standards.

This metadata standard is integrated into COPO, which provides a streamlined interface for validating and brokering data and metadata to public repositories. Standardised metadata improves discoverability, supports data integration across platforms, and enables consistent reuse. It also ensures proper attribution, facilitates collaboration across diverse disciplines, and enhances reproducibility.
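
To make the idea of validating metadata against a structured template concrete, here is a minimal Python sketch. It is not COPO's code or the actual template: the `TEMPLATE` dictionary, its field names, the allowed values, and the `validate_record` function are hypothetical and chosen only for illustration.

```python
# Hypothetical template: field name -> (required?, allowed values or None).
# These fields are illustrative and are not taken from the real template.
TEMPLATE = {
    "sample_id": (True, None),
    "organism": (True, None),
    "tissue": (True, None),
    "library_preparation": (True, {"10x 3' v3", "Smart-seq2"}),
    "cell_count": (False, None),
}


def validate_record(record: dict, template: dict = TEMPLATE) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field, (required, allowed) in template.items():
        value = record.get(field)
        # Treat absent, None, and blank values as missing.
        if value is None or str(value).strip() == "":
            if required:
                errors.append(f"missing required field: {field}")
            continue
        # Enforce the controlled vocabulary where one is defined.
        if allowed is not None and value not in allowed:
            errors.append(f"invalid value for {field}: {value!r}")
    # Flag fields the template does not know about.
    for field in record:
        if field not in template:
            errors.append(f"unknown field: {field}")
    return errors


if __name__ == "__main__":
    record = {
        "sample_id": "S1",
        "organism": "Triticum aestivum",
        "library_preparation": "unknown kit",
    }
    for problem in validate_record(record):
        print(problem)
```

Collecting every problem in one pass, rather than stopping at the first, lets a submitter correct a whole record before resubmitting.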

By submitting with FAIR metadata via COPO, we transform scRNA-seq outputs from isolated experimental results into well-labelled, interoperable datasets suitable for downstream applications such as machine learning. Our work addresses a key infrastructure gap, enabling more effective, collaborative, and impactful research in the single-cell field.

July 23, 2025
17:50-17:55
Cancer Complexity Knowledge Portal: A centralized web portal for finding cancer related data, software tools, and other resources
Confirmed Presenter: Susheel Varma, Sage Bionetworks, United States
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Augustin Luna


Authors List:

  • Orion Banks, Orion Banks, Sage Bionetworks
  • Ashley Clayton, Ashley Clayton, Sage Bionetworks
  • Aditi Gopalan, Aditi Gopalan, Sage Bionetworks
  • Amber Nelson, Amber Nelson, Sage Bionetworks
  • Stockard Simon, Stockard Simon, Sage Bionetworks
  • Verena Chung, Verena Chung, Sage Bionetworks
  • Amy Heiser, Amy Heiser, Sage Bionetworks
  • Jay Hodgson, Jay Hodgson, Sage Bionetworks
  • Aditya Nath, Aditya Nath, Sage Bionetworks
  • Adam Hindman, Adam Hindman, Sage Bionetworks
  • Milen Nikolov, Milen Nikolov, Sage Bionetworks
  • Adam Taylor, Adam Taylor, Sage Bionetworks
  • James Eddy, James Eddy, Sage Bionetworks
  • Susheel Varma, Susheel Varma, Sage Bionetworks
  • Jineta Banerjee, Jineta Banerjee, Sage Bionetworks

Presentation Overview:

Applying artificial intelligence and machine learning to biomedical problems requires clean, high-quality data and reusable software tools. The Cancer Complexity Knowledge Portal (CCKP), an NIH-listed domain-specific repository maintained by the Multi-Consortia Coordinating (MC2) Center at Sage Bionetworks, makes oncology data findable and accessible. The MC2 Center coordinates resources among six cancer-focused research consortia funded by the National Cancer Institute.

To establish metadata standards, the CCKP hosts data models for various modalities, including genomics and imaging. New models are also being developed for emerging types, such as spatial transcriptomics. These models undergo iterative development with versioned releases maintained in a public GitHub repository. They power data management tools developed by Sage Bionetworks, including the Schematic Python package and the Data Curator App, which support FAIR data annotation.

The data models help researchers link research outputs and assist the CCKP in highlighting activities from NCI-funded cancer research programs. The portal offers search and filtering capabilities to accelerate discovery and collaboration. As of November 2024, it hosts information on 3,786 publications, 904 datasets, and 292 computational tools from over 140 research grants. The models incorporate elements from the Cancer Research Data Commons Data Hub to support integration within the CRDC ecosystem.

We are engaging with scientists, clinicians, and patient advocates to leverage user-centred design and structured data models, making cancer data more findable, accessible, and reusable. These improvements aim to bridge the gap between experimental and computational labs, fueling scientific discovery.

July 23, 2025
17:55-18:00
COSI Closing Remarks
Track: BOKR: Bio-Ontologies and Knowledge Representation

Room: 03A
Format: In person
Moderator(s): Augustin Luna


Authors List:

  • Augustin Luna
  • Tiffany Callahan