ISMB/ECCB 2025 Agenda: Bio-Ontologies and Knowledge Representation Track
Date | Start Time | End Time | Room | Track | Title | Confirmed Presenter | Authors | Abstract |
---|---|---|---|---|---|---|---|---|
2025-07-23 | 11:20:00 | 11:40:00 | 03A | Bio-Ontologies and Knowledge Representation | Knowledge-Graph-driven and LLM-enhanced Microbial Growth Predictions | Marcin Joachimiak | Marcin Joachimiak | Predicting microbial growth preferences has far-reaching impacts in biotechnology, healthcare, and environmental management. Cultivating microbes allows researchers to streamline strain selection, develop targeted antimicrobials, and uncover metabolic pathways for biodegradation or biomanufacturing. However, with most microbial taxa remaining uncultivated and knowledge of their metabolic capabilities and organismal traits fragmented in unstructured text, cultivation remains a major challenge. To address this, we developed KG-Microbe, a knowledge graph (KG) of over 800,000 bacterial and archaeal taxa, 3,000 types of traits, and 30,000 types of functional annotations. Using KG-Microbe, we constructed machine learning pipelines to predict microbial growth preferences. We compared symbolic rule mining, which produces human-readable explanations, with "black box" methods like gradient boosted decision trees and deep graph-based models. While boosted tree models achieved a mean precision of over 70% across 46 diverse media, we demonstrate that symbolic rule mining can match their performance, offering crucial interpretability. To further validate predictions, we used large language models (LLMs) to interpret and explain model outputs. By comparing these different models and their outputs, we identified key data features and knowledge gaps relevant to predicting microbial cultivation media preferences. We also used vector embedding analogy reasoning as well as complex graph queries on KG-Microbe to generate novel hypotheses and identify organisms with specific properties. Our work highlights the power of a KG-driven approach and the trade-offs between model interpretability and predictive performance. These findings motivate the development of hybrid AI models that combine transparency with predictive accuracy to advance microbial cultivation. | |
2025-07-23 | 11:40:00 | 12:00:00 | 03A | Bio-Ontologies and Knowledge Representation | ProDiGenIDB – a unified resource of disease-associated genes, their protein products, and intrinsic disorder annotations | Jovana Kovacevic | Jovana Kovacevic, Anđelka Zečević, Lazar Vasović | Understanding gene-disease associations is essential in biomedical research, yet relevant information is often distributed across multiple heterogeneous databases. To overcome this fragmentation, we developed ProDiGenIDB, an integrated database that consolidates gene-disease relationships from several recognized and publicly available sources, while also enriching them with complementary data on gene and protein identifiers, disease ontology, and protein structural disorder. ProDiGenIDB brings together over 400,000 curated associations sourced from DisGeNet, COSMIC, HumsaVar, Orphanet, ClinVar, HPO, and DISEASES. Each entry includes gene-related metadata (Gene Symbol, Entrez ID, UniProt ID, Ensembl ID), disease descriptors (Disease Name, DOID), and a reference to the original source database. Importantly, the database also incorporates predicted intrinsic disorder information for proteins encoded by the associated genes. These predictions were generated using commonly used protein disorder prediction tools such as IUPred and VSL2, providing additional insight into the potential lack of structure of disease-related proteins. Another important aspect of the database construction involved mapping disease names to standardized Disease Ontology IDs (DOIDs). To improve this process, we applied Natural Language Processing (NLP) techniques using advanced text representation models to enhance the accuracy and consistency of term association. ProDiGenIDB represents a valuable resource for integrative biomedical studies, particularly in contexts where protein disorder is hypothesized to play a functional or pathological role. | |
2025-07-23 | 12:00:00 | 12:20:00 | 03A | Bio-Ontologies and Knowledge Representation | Causal knowledge graph analysis identifies adverse drug effects | Sumyyah Toonsi | Sumyyah Toonsi, Paul Schofield, Robert Hoehndorf | Motivation: Knowledge graphs and structural causal models have each proven valuable for organizing biomedical knowledge and estimating causal effects, but remain largely disconnected: knowledge graphs encode qualitative relationships focusing on facts and deductive reasoning without formal probabilistic semantics, while causal models lack integration with background knowledge in knowledge graphs and have no access to the deductive reasoning capabilities that knowledge graphs provide. Results: To bridge this gap, we introduce a novel formulation of Causal Knowledge Graphs (CKGs) which extend knowledge graphs with formal causal semantics, preserving their deductive capabilities while enabling principled causal inference. CKGs support deconfounding via explicitly marked causal edges and facilitate hypothesis formulation aligned with both encoded and entailed background knowledge. We constructed a Drug–Disease CKG (DD-CKG) integrating disease progression pathways, drug indications, side-effects, and hierarchical disease classification to enable automated large-scale mediation analysis. Applied to UK Biobank and MIMIC-IV cohorts, we tested whether drugs mediate effects between indications and downstream disease progression, adjusting for confounders inferred from the DD-CKG. Our approach successfully reproduced known adverse drug reactions with high precision while identifying previously undocumented significant candidate adverse effects. Further validation through side effect similarity analysis demonstrated that combining our predicted drug effects with established databases significantly improves the prediction of shared drug indications, supporting the clinical relevance of our novel findings. These results demonstrate that our methodology provides a generalizable, knowledge-driven framework for scalable causal inference. | |
2025-07-23 | 12:20:00 | 12:40:00 | 03A | Bio-Ontologies and Knowledge Representation | CROssBARv2: A Unified Biomedical Knowledge Graph for Heterogeneous Data Representation and LLM-Driven Exploration | Bünyamin Şen | Bünyamin Şen, Erva Ulusoy, Melih Darcan, Mert Ergün, Tunca Dogan | Developing effective therapeutics against prevalent diseases requires a deep understanding of molecular, genetic, and cellular factors involved in disease development/progression. However, such knowledge is dispersed across different databases, publications, and ontologies, making collecting, integrating and analysing biological data a major challenge. Here, we present CROssBARv2, an extended and improved version of our previous work (https://crossbar.kansil.org/), a heterogeneous biological knowledge graph (KG)-based system to facilitate systems biology and drug discovery/repurposing. CROssBARv2 collects large-scale biological data from 32 data sources and stores them in a Neo4j graph database. CROssBARv2 consists of 2,709,502 nodes and 12,688,124 relationships between 14 node types. To facilitate the use of the resource, we also developed a GraphQL API and a large language model (LLM) interface that translates users’ natural-language queries into Neo4j's Cypher query language and converts the retrieved results back into natural language, enabling users to access information within the KG and answer specific scientific questions without LLM hallucinations. To evaluate the capability of CROssBAR-LLMs (LLMs augmented with structured knowledge in CROssBAR) in answering biomedical questions, we constructed multiple benchmark datasets and employed an independent benchmark to systematically compare various open- and closed-source LLMs. Our results revealed that CROssBAR-LLMs achieve significantly improved accuracy in answering these scientific questions compared to standalone LLMs and even LLMs augmented with web search. CROssBARv2 (https://crossbarv2.hubiodatalab.com/) is expected to contribute to life sciences research through (i) the discovery of disease mechanisms at the molecular level and (ii) the development of effective personalised therapeutic strategies. | |
2025-07-23 | 12:40:00 | 12:45:00 | 03A | Bio-Ontologies and Knowledge Representation | Benchmarking Data Leakage on Link Prediction in Biomedical Knowledge Graph Embeddings | Galadriel Brière | Galadriel Brière, Thomas Stosskopf, Benjamin Loire, Anaïs Baudot | In recent years, Knowledge Graphs (KGs) have gained significant attention for their ability to organize massive biomedical knowledge into entities and relationships. Knowledge Graph Embedding (KGE) models facilitate efficient exploration of KGs by learning compact data representations. These models are increasingly applied to biomedical KGs for various tasks, notably link prediction, which enables applications such as drug repurposing. The research community has implemented benchmarks to evaluate and compare the large diversity of KGE models. However, existing benchmarks often overlook the issue of Data Leakage (DL), which can lead to inflated performance and compromise the validity of benchmark results. DL may occur due to inadequate separation between training and test sets (DL1), use of illegitimate features (DL2), or evaluation settings that fail to reflect real-world inference conditions (DL3). In this study, we implement systematic procedures to detect and mitigate these sources of DL. We evaluate popular KGE models on a biomedical KG and show that inappropriate data separation (DL1) artificially inflates model performance and that models do not rely on node degree as a shortcut feature (DL2). For DL3, we implement realistic inference conditions with i) a zero-shot training procedure in which drugs in test and validation sets have no known indications during training and ii) a drug repurposing ground-truth for rare diseases. Performance collapses in both scenarios. Our findings highlight the need for more rigorous evaluation protocols and raise concerns about the reliability of current KGE models for real-world biomedical applications such as drug repurposing. | |
2025-07-23 | 12:45:00 | 12:50:00 | 03A | Bio-Ontologies and Knowledge Representation | A machine learning framework for extracting and structuring biological pathway knowledge from scientific literature | Mun Su Kwon | Mun Su Kwon, Junkyu Lee, Haechan Sung, Hyun Uk Kim | Advances in text mining have significantly improved the accessibility of scientific knowledge from literature. However, a major challenge in biology and biotechnology remains in extracting information embedded within biological pathway images, which are not easily accessible through conventional text-based methods. To overcome this limitation, we present a machine learning–based framework called “Extraction of Biological Pathway Information” (EBPI). The framework systematically retrieves relevant publications based on user-defined queries, identifies biological pathway figures, and extracts structured information such as genes, enzymes, and metabolites. EBPI combines image processing and natural language models to identify texts from diagrams, classify terms into biological categories, and infer biochemical reaction directionality using graphical cues such as arrows. The extracted information is output in an editable, tabular format suitable for integration with pathway databases and knowledge graphs. Validated against manually curated pathway maps, EBPI enables scalable knowledge extraction from complex visual data of biological pathways and opens new directions for automated literature curation across many biological disciplines. | |
2025-07-23 | 12:50:00 | 13:00:00 | 03A | Bio-Ontologies and Knowledge Representation | Poster Madness | | | Each accepted poster presenter is given up to 1 minute to advertise their poster. | |
2025-07-23 | 14:00:00 | 14:20:00 | 03A | Bio-Ontologies and Knowledge Representation | ScGOclust: leveraging gene ontology to find functionally analogous cell types between distant species | Yuyao Song | Yuyao Song, Yanhui Hu, Julian Dow, Norbert Perrimon, Irene Papatheodorou | Basic biological processes are shared across animal species, yet their cellular mechanisms are profoundly diverse. Comparing cell-type gene expression between species reveals conserved and divergent cellular functions. However, as phylogenetic distance increases, gene-based comparisons become less informative. The Gene Ontology (GO) knowledgebase offers a solution by serving as the most comprehensive resource of gene functions across a vast diversity of species, providing a bridge for distant species comparisons. Here, we present scGOclust, a computational tool that constructs de novo cellular functional profiles using GO terms, facilitating systematic and robust comparisons within and across species. We applied scGOclust to analyse and compare the heart, gut and kidney between mouse and fly, and whole-body data from C. elegans and H. vulgaris. We show that scGOclust effectively recapitulates the function spectrum of different cell types, characterises functional similarities between homologous cell types, and reveals functional convergence between unrelated cell types. Additionally, we identified subpopulations within the fly crop that show circadian rhythm-regulated secretory properties and hypothesize an analogy between fly principal cells from different segments and distinct mouse kidney tubules. We envision scGOclust as an effective tool for uncovering functionally analogous cell types or organs across distant species, offering fresh perspectives on evolutionary and functional biology. | |
2025-07-23 | 14:20:00 | 14:40:00 | 03A | Bio-Ontologies and Knowledge Representation | Integrating autoantibody-related knowledge in an ontology populated using a curated dataset from literature | Fabien Maury | Fabien Maury, Solène Grosdidier, Killian Halberda, Isabelle Desguerre, Adrien Coulet, Maud de Dieuleveult | Autoimmune diseases (AIDs) are often characterized by the presence of autoantibodies (AAbs). However, many of these diseases are rare and can be hard to diagnose, partly because knowledge such as which type of AAb to test for in order to diagnose a particular AID is not easily accessible. Indeed, to our knowledge, no centralized resource covering all available knowledge related to human autoantibodies exists as of April 2025. To fill this gap, we first introduce a lightweight ontology for representing relationships among AAbs, their molecular targets, and the related AIDs and their clinical signs. This ontology also allows the provenance of these relationships to be specified, by reusing the PROV-O ontology. Second, we introduce the MAKAAO Core dataset, compiled manually from the literature by several curators. MAKAAO Core includes the names and synonyms (in both English and French) of over 350 autoantibodies, along with their targets and associated AIDs. Targets and diseases are referred to using identifiers from reference resources. We used this dataset to populate our ontology and named the result the MAKAAO knowledge graph (MAKAAO KG), which constitutes the central part of a future reference resource. | |
2025-07-23 | 14:40:00 | 15:00:00 | 03A | Bio-Ontologies and Knowledge Representation | Ontology pre-training improves machine learning predictions of aqueous solubility and other metabolite properties | Charlotte Tumescheit | Charlotte Tumescheit, Martin Glauer, Simon Flügel, Fabian Neuhaus, Till Mossakowski, Janna Hastings | Predicting properties of small molecule metabolites from structures is a challenging task. Molecular language models have emerged as a highly performant AI approach for prediction of diverse properties directly from ‘language-like’ representations of the structures of molecules. However, for many prediction problems, there is a shortage of available training data and model performance is still limited. Integrating expert knowledge into language models has the potential to improve performance on prediction tasks and model generalisability. Bio-ontologies offer curated knowledge ideal for this purpose. Here, we demonstrate a novel approach to knowledge injection, ‘ontology pre-training’, which we have previously shown to work for a pilot case study in the classification task of toxicity prediction. Now, we extend this to regression tasks such as solubility prediction and a wider range of classification tasks. First, we pre-train a Transformer-based language model on molecules from PubChem. Then, using our novel method, we embed the knowledge contained in a classification hierarchy derived from the ChEBI ontology into the model as an intermediate training step between general-purpose pre-training and task-specific fine-tuning. Finally, we fine-tune the models on a range of regression tasks. We find a clear improvement in performance and training times across the diverse prediction tasks. Our results show that adding an additional knowledge-based training step to a machine learning model can improve performance. Our method is intuitive and generalisable and we plan to extend it to further biological modalities and prediction datasets, including proteins and RNA, as well as exploring the impact of different ontologies. | |
2025-07-23 | 15:00:00 | 15:20:00 | 03A | Bio-Ontologies and Knowledge Representation | Building the Aging Biomarkers Ontology and Its Applications in Aging Research | Hande McGinty | Hande McGinty, Srikar Reddy Gadusu, Yigit Kucuk, Aaron King | Aging is a complex biological process shaped by numerous biomarkers—such as cholesterol and blood sugar levels—that serve as measurable indicators of health and disease. Despite the abundance of biomarker data, identifying meaningful patterns and relationships remains a significant challenge. To address this, we began developing the Aging Biomarkers Ontology (ABO), a structured framework that formally defines aging-related biomarkers, organizes them hierarchically, and maps their interconnections to facilitate deeper analysis. Furthermore, we employed two complementary approaches to enrich the graph and uncover hidden associations among aging biomarkers: Depth-Limited Search (DLS) and machine learning-based embedding search. DLS identifies associations by traversing connected nodes within a predefined depth, while the embedding-based method encodes biomarker relationships as numerical vectors and uses cosine similarity to predict potential links. We evaluated the performance of both methods in detecting known and novel relationships. Our results demonstrate the value of systematically integrating statistical analysis with graph-based reasoning and machine learning to explore aging-related biomarkers. The resulting framework enhances the interpretability of biomarker data, supports hypothesis generation, and contributes to advancing biomedical research in aging and longevity. | |
2025-07-23 | 15:20:00 | 15:40:00 | 03A | Bio-Ontologies and Knowledge Representation | Discovering cellular contributions to disease pathogenesis in the NLM Cell Knowledge Network | Richard Scheuermann | Richard Scheuermann, Anne Deslattes Mays, Matthew Diller, Caroline Eastwood, Rezarta Islamaj, James Leaman, Raymond LeClair, Zhiyong Lu, Chris Mungall, Vinh Nguyen, David Osumi-Sutherland, Beverly Peng, Noam Rotenberg, William Spear, Bingfang Xu, Yun Zhang | Knowledge about the role of genes in disease pathogenesis has been obtained from genetic and genome-wide association studies. The proteins encoded by these genes are frequently found to be effective therapeutic targets. However, little is known about which cells are the functional home of these disease-associated genes and proteins. Single cell genomic technologies are now revealing the cellular complexity of human tissues at high resolution. The transcriptomes defined by these technologies reflect the functional cellular phenotypes. Database resources that capture and disseminate data derived from these single cell technologies have been developed. But the knowledge derived from their analysis and interpretation is largely buried as free text in the scientific literature. Here we describe the development of a Cell Knowledge Network (CKN) prototype at the National Library of Medicine (NLM) that captures and exposes knowledge about cell phenotypes (cell types and states) derived from single cell technologies and related experiments. NLM-CKN is populated using validated computational analysis pipelines and natural language processing of the scientific literature and integrated with other sources of relevant knowledge about genes, anatomical structures, diseases, and drugs. Using this integration of experimental sc/snRNAseq data with prior knowledge about disease predispositions and drug targets, a novel linkage between lung pericytes and pulmonary hypertension was discovered through the KCNK3 gene intermediary with implications for novel therapeutic interventions. Through the integration of knowledge from single cell technologies with other sources of knowledge about genetic predispositions and therapeutic targets, the NLM-CKN is revealing the cellular contributions to disease pathogenesis. | |
2025-07-23 | 15:40:00 | 16:00:00 | 03A | Bio-Ontologies and Knowledge Representation | Cat-VRS for Genomic Knowledge Curation: A Hyperintensional Representation Framework for FAIR Categorical Variation | Daniel Puthawala | Daniel Puthawala, Brendan Reardon | Cat-VRS: a FAIR catvar standard. Categorical variants (catvars)—such as “MET exon 14 skipping” and “TP53 loss”—are foundational to genomic knowledge, linking sets of genomic variants to clinically relevant assertions like oncogenicity scores or predicted therapeutic response. Yet despite their importance, catvars remain unstandardized, ambiguous, and largely non-computable, creating persistent barriers to search, curation, interoperability, and reuse. Existing standards either offer flexible models for sequence-resolved variants (e.g., GA4GH VRS) or rigid top-down nomenclatures (e.g., HGVS) that fail to capture the diversity and nuance of categorical assertions. We present the Categorical Variation Representation Specification (Cat-VRS), a new GA4GH standard for representing catvars using a hyperintensional, constraint-based model. Cat-VRS encodes categorical meaning compositionally and bottom-up: structured constraints—such as sequence location or protein functional consequence—support precise, flexible representations at varying levels of granularity. Cat-VRS is fully interoperable with other GA4GH standards, supports ontology mappings, and was developed through global community collaboration in alignment with the FAIR data principles. Cat-VRS 1.0 was recently released by GA4GH and is already in use by ClinVar and MaveDB, with integration underway in CIViC and the VICC MetaKB. These early implementations demonstrate Cat-VRS’s practical utility in enabling reusable, computable representations of categorical knowledge. As precision medicine scales, so too does the need for infrastructure that supports consistent curation, standardized data sharing, and automated variant knowledge matching. We invite the bio-ontologies and knowledge representation community to engage with Cat-VRS as both a practical tool and an extensible framework for advancing interoperable genomic knowledge. | |
2025-07-23 | 16:40:00 | 17:40:00 | 03A | Bio-Ontologies and Knowledge Representation | Knowledge Graphs: Theory, Applications and Challenges | Ian Horrocks | Ian Horrocks | Knowledge Graphs have rapidly become a mainstream technology that combines features of databases and AI. In this talk I will introduce Knowledge Graphs, explaining their features and the theory behind them. I will then consider some of the challenges inherent in both the theory and implementation of Knowledge Graphs and present some solutions that have made possible the development of popular language standards and robust and high-performance Knowledge Graph systems. Finally, I will illustrate the wide applicability of knowledge graph technology with some example use cases. | |
2025-07-23 | 17:40:00 | 17:45:00 | 03A | Bio-Ontologies and Knowledge Representation | Bridging Language Barriers in Bio-Curation: An LLM-Enhanced Workflow for Ontology Translation into Japanese | Mark Streer | Mark Streer, Olivia Watson, Mark McDowall, Jane Lomax | SciBite’s ontology management and named entity recognition (NER) software relies on curated public ontologies to support data harmonization under FAIR principles (findable, accessible, interoperable, and reusable). Public ontologies are foundational for data FAIR-ification, providing structured vocabularies that enable consistent annotation and semantic integration; however, they are predominantly developed in English, creating barriers for non-English users and applications. To address this challenge for our Japanese customers, we developed a large language model (LLM)-enhanced bio-curation workflow for English-to-Japanese translation, focusing on synonym enrichment of the Uberon anatomy ontology as a case study. Our approach implements a three-step process: (1) importing mapped Japanese synonyms from existing bilingual datasets (e.g., DBCLS resources), (2) generating Japanese candidate synonyms based on English synonyms and definitions using an LLM, and (3) validating candidates against the source ontology to ensure appropriate placement, as well as against online dictionaries and other references to confirm their real-world applicability. Initially developed for synonym enrichment, this workflow can be extended to semantic refinement into broadMatch and narrowMatch relationships in addition to exactMatch—critical for terminology lacking perfect English equivalents. Furthermore, the workflow is well-suited to agentic frameworks such as LangGraph to orchestrate generation and Internet research processes, as well as LLM-ensemble evaluation to automatically confirm clear matches, allowing ambiguous cases to be prioritized for “human-in-the-loop” curation. This approach represents a promising solution for scalable ontology translation, contributing to the FAIR development and application of bio-ontologies across language barriers and enhancing international biomedical research collaboration. | |
2025-07-23 | 17:45:00 | 17:50:00 | 03A | Bio-Ontologies and Knowledge Representation | Enabling FAIR Single-Cell RNAseq Data Management with COPO | Felix Shaw | Felix Shaw, Debby Ku, Aaliyah Providence, Irene Papatheodorou | We present our work on establishing standards and tools for validating and submitting single-cell RNA sequencing (scRNA-seq) data and metadata using the COPO brokering platform. Effective research data management is essential for enabling data reuse, integration, and the discovery of new biological insights. As new technologies like single-cell sequencing and transcriptomics emerge, they often outpace existing data infrastructure. Single-cell technologies allow detailed insights into biological processes, for example, tracking gene expression dynamics in crops, dissecting pathogen-host interactions at the cellular level, or identifying stress-resilient cell types. Yet without comprehensive metadata and appropriate data management tools, the full potential of these datasets remains unrealised. Implementing the FAIR principles—particularly around metadata quality—is crucial. At present, there are few widely adopted standards or tools for describing scRNA-seq experiments. In response, we have developed a structured metadata template tailored to these experiments, informed by extensive consultation with researchers across the single-cell community and aligned with existing standards. This metadata standard is integrated into COPO, which provides a streamlined interface for validating and brokering data and metadata to public repositories. Standardised metadata improves discoverability, supports data integration across platforms, and enables consistent reuse. It also ensures proper attribution, facilitates collaboration across diverse disciplines, and enhances reproducibility. By submitting with FAIR metadata via COPO, we transform scRNA-seq outputs from isolated experimental results into well-labelled, interoperable datasets suitable for downstream applications such as machine learning. Our work addresses a key infrastructure gap, enabling more effective, collaborative, and impactful research in the single-cell field. | |
2025-07-23 | 17:50:00 | 17:55:00 | 03A | Bio-Ontologies and Knowledge Representation | Cancer Complexity Knowledge Portal: A centralized web portal for finding cancer-related data, software tools, and other resources | Susheel Varma | Orion Banks, Ashley Clayton, Aditi Gopalan, Amber Nelson, Stockard Simon, Verena Chung, Amy Heiser, Jay Hodgson, Aditya Nath, Adam Hindman, Milen Nikolov, Adam Taylor, James Eddy, Susheel Varma, Jineta Banerjee | Applying artificial intelligence and machine learning to biomedical problems requires clean, high-quality data and reusable software tools. The Cancer Complexity Knowledge Portal (CCKP), an NIH-listed domain-specific repository maintained by the Multi-Consortia Coordinating (MC2) Center at Sage Bionetworks, makes oncology data findable and accessible. The MC2 Center coordinates resources among six cancer-focused research consortia funded by the National Cancer Institute. To establish metadata standards, the CCKP hosts data models for various modalities, including genomics and imaging. New models are also being developed for emerging types, such as spatial transcriptomics. These models undergo iterative development with versioned releases maintained in a public GitHub repository. They power data management tools developed by Sage Bionetworks, including the Schematic Python package and the Data Curator App, which support FAIR data annotation. The data models help researchers link research outputs and assist the CCKP in highlighting activities from NCI-funded cancer research programs. The portal offers search and filtering capabilities to accelerate discovery and collaboration. As of November 2024, it hosts information on 3,786 publications, 904 datasets, and 292 computational tools from over 140 research grants. The models incorporate elements from the Cancer Research Data Commons Data Hub to support integration within the CRDC ecosystem. We are engaging with scientists, clinicians, and patient advocates to leverage user-centred design and structured data models, making cancer data more findable, accessible, and reusable. These improvements aim to bridge the gap between experimental and computational labs, fueling scientific discovery. | |
2025-07-23 | 17:55:00 | 18:00:00 | 03A | Bio-Ontologies and Knowledge Representation | COSI Closing Remarks | Augustin Luna, Tiffany Callahan | Augustin Luna, Tiffany Callahan | | |