Date | Start Time | End Time | Room | Track | Title | Confirmed Presenter | Authors | Abstract
---|---|---|---|---|---|---|---|---
2025-07-23 | 11:20:00 | 11:40:00 | 12 | NIH/Elixir | Disease Ontology Knowledgebase: A Global BioData hub for FAIR disease data discovery | Lynn Schriml | Lynn Schriml | Development of long-term biodata resources, by design, depends on a stable data model with persistent identifiers, regular data releases, and reliable responsiveness to ongoing community needs. Addressing evolving needs while continually advancing our data representation has facilitated the sustained 20-year growth and utility of the Human Disease Ontology (DO, https://www.disease-ontology.org/). Biodata resources must maintain their relevance, adapting to address persistent, evolving needs. Strategically, the DO actively identifies and connects with our expanding user community, thus driving the DO’s integration of diverse disease factors (e.g., molecular, environmental, and mechanistic) into a single framework. Serving a vast user community since 2003 (>415 biomedical resources across 45 countries), the DO’s continual content and classification expansion is driven by the ever-evolving disease knowledge ecosystem. The DO, a designated Global Core Biodata Resource (https://globalbiodata.org/), empowers disease data integration, standardization, and analysis across the interconnected web of biomedical information. A focus on modernizing infrastructure is imperative to provide new mechanisms for data interoperability and accessibility. Our strategic approach - following community best practices (e.g., OBO Foundry, FAIR principles), adapting established technical approaches (e.g., Neo4j; Swagger for the API), and openly sharing project-developed tooling - reduces technical debt while maximizing data delivery opportunities. The DO Knowledgebase (DO-KB) tools (the DO-KB SPARQL service and endpoint, Faceted Search Interface, advanced API service, and DO.utils) have been developed to enhance data discovery, delivering an integrated data system that exposes the DO’s semantic knowledge and connects disease-related data across Open Linked Data resources. |
2025-07-23 | 11:40:00 | 12:00:00 | 12 | NIH/Elixir | Integrating Data Treasures: Knowledge graphs of the DSMZ Digital Diversity | Julia Koblitz | Julia Koblitz | The DSMZ (German Collection of Microorganisms and Cell Cultures) hosts a wealth of biological data, covering microbial traits (BacDive), taxonomy (LPSN), enzymes and ligands (BRENDA), rRNA genes (SILVA), cell lines (CellDive), cultivation media (MediaDive), strain identity (StrainInfo), and more. To make these diverse datasets accessible and interoperable, the DSMZ Digital Diversity initiative provides a central hub for integrated data and establishes a framework for linking and accessing these resources (https://hub.dsmz.de). At its core lies the DSMZ Digital Diversity Ontology (D3O), an upper ontology designed to unify key concepts across all databases, enabling seamless integration and advanced exploration. This ontology is complemented by well-established ontologies such as ChEBI, ENVO, and NCIT, among others. By standardizing all resources within a defined vocabulary, we enhance their interoperability, both internally and with the Linked Open Data community. Where necessary, we also develop and curate our own ontologies, such as the well-known BRENDA Tissue Ontology (BTO), a comprehensive ontology for LPSN taxonomy and nomenclature, and the Microbial Isolation Source Ontology (MISO), which has already been applied to annotate more than 80,000 microbial strains. D3O also provides a stable foundation for transforming our databases into RDF (Resource Description Framework) and providing the knowledge graphs via open SPARQL endpoints. The first knowledge graphs of BacDive and MediaDive are already available at https://sparql.dsmz.de, enabling researchers to query and analyze microbial trait data and cultivation media (a minimal endpoint-query sketch appears below the table). These initial steps lay the groundwork for integrating additional databases, such as BRENDA and StrainInfo, into unified, queryable knowledge graphs. |
2025-07-23 | 12:00:00 | 12:20:00 | 12 | NIH/Elixir | Metabolomics Workbench: Data Sharing, Analysis and Integration at the National Metabolomics Data Repository | Mano Maurya | Mano Maurya | The National Metabolomics Data Repository (NMDR) was developed as part of the National Institutes of Health (NIH) Common Fund Metabolomics Program to facilitate the deposition and sharing of metabolomics data and metadata from researchers worldwide. The NMDR, housed at the San Diego Supercomputer Center (SDSC), University of California, San Diego, has developed the Metabolomics Workbench (MW). The MW also provides analysis tools and access to metabolite standards, including RefMet, protocols, tutorials, training, and more. RefMet facilitates metabolite name harmonization, an essential step in data integration across different studies and collaboration across different research centers. Thus, the MW-NMDR serves as a one-stop infrastructure for metabolomics research and is widely regarded as one of the most FAIR (findable, accessible, interoperable, reusable) data resources. In this work, we will present some of the key aspects of the MW-NMDR, such as continuous curation to maintain quality, use of controlled vocabularies and ontologies to promote interoperability, development of tools that help drive scientific innovation, and integration of community-developed tools into the MW. We will also discuss our involvement in other data sharing, reuse, and integration efforts, namely the NIH Common Fund Data Ecosystem (CFDE) and a collaboration with the European Bioinformatics Institute’s (EBI) MetabolomeXchange as part of the Chan Zuckerberg Initiative. |
2025-07-23 | 12:20:00 | 12:40:00 | 12 | NIH/Elixir | Building sustainable solutions for federally-funded open-source biomedical tools and technologies | Karamarie Fecho | Karamarie Fecho | Federally-funded, open-source biomedical tools and technologies often fail due to the lack of a business model for sustainability; the result is rapid technical obsolescence, often preceded by insufficient scientific impact and the failure to create a thriving Community of Practice. The open-source ROBOKOP (Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways) knowledge graph (KG) system is jointly funded by the National Institute of Environmental Health Sciences and the Office of Data Science Strategy within the National Institutes of Health as a modular biomedical KG system designed to explore relationships between biomedical entities. The ROBOKOP system includes the aggregated ROBOKOP KG, composed of integrated and harmonized “knowledge” derived from dozens of “knowledge sources”; a user interface to the ROBOKOP KG; and a collection of supporting tools and resources. ROBOKOP has demonstrated its utility in a variety of use cases, including suggesting “adverse outcome pathways” to explain the biological relationships between chemical exposures and disease outcomes, and the related “clinical outcome pathways” to explain the biological mechanisms underlying the therapeutic effects of drug exposures. We have been evaluating approaches to ensure the long-term sustainability of ROBOKOP, independent of federal funding. One approach is to adopt and adapt the best practices of, and lessons learned by, successful open-source biomedical Communities of Practice with engaged scientific end users and technical contributors. This presentation will provide an overview of our evaluation results and detail our proposed solution for transitioning ROBOKOP from federal funding to independent long-term sustainability. |
2025-07-23 | 12:40:00 | 13:00:00 | 12 | NIH/Elixir | SEA CDM: An Ontology-Based Common Data Model for Standardizing and Integrating Biomedical Experimental Data in Vaccine Research | Yongqun He | Yongqun He | With the increasing volume of experimental data across biomedical fields, standardizing, sharing, and integrating heterogeneous experimental data has become a major challenge. Our VIOLIN vaccine database has systematically collected and annotated over 4,700 vaccines against 217 infectious and non-infectious diseases (such as cancer), along with vaccine components including over 100 vaccine adjuvants and over 1,700 vaccine-induced host immune factors. To support standardization, we developed the community-based Vaccine Ontology (VO) to represent vaccine knowledge and associated metadata. To support interoperable standardization, annotation, and integration of various biomedical experimental datasets, we have developed an ontology-supported Study-Experiment-Assay (SEA) common data model (CDM), consisting of 12 core classes (called tables in a relational database setting), such as Organism, Sample, Intervention, and Assay. The SEA CDM was evaluated systematically using the vaccine-induced host gene immune response data from our VIOLIN VIGET (Vaccine Induced Gene Expression Analysis Tool) system. We also developed a MySQL database and a Neo4j knowledge graph based on the SEA CDM to systematically represent the VIGET data and influenza-related host gene expression data from two large-scale data resources: ImmPort and CELLxGENE (a hypothetical graph-query sketch appears below the table). Our results show that ontologies such as VO can greatly support interoperable data annotation and provide additional semantic knowledge (e.g., vaccine hierarchy). This proof-of-concept study demonstrates the feasibility and validity of the SEA CDM for standardizing and integrating heterogeneous datasets and highlights its potential for application to other big bioresources. The novel SEA CDM lays a foundation for building a FAIR and AI-ready Biodata Ecosystem, supporting advanced AI research. |
2025-07-23 | 14:00:00 | 14:20:00 | 12 | NIH/Elixir | The Evolution of Ensembl: Scaling for Accessibility, Performance, and Interoperability | Mallory Freeberg | Mallory Freeberg | Ensembl is an open platform that integrates publicly available genomics data across the tree of life, enabling activities spanning research to clinical and agricultural applications. Ensembl provides a comprehensive collection of data including genomes, genomic annotations, and genetic variants, as well as computational outputs such as gene predictions, functional annotations, regulatory region predictions, and comparative genomic analyses. In its 25-year history, Ensembl has grown to support all domains of life - from vertebrates to plants to bacteria - releasing new data roughly quarterly. Initially developed for the human genome, Ensembl expanded to include additional key vertebrates totalling a few hundred genomes. With the advent of global biodiversity and pangenome projects, Ensembl now contains thousands of genomes and is anticipated to grow to tens of thousands of genomes in the coming years. This explosion in data size necessitates a more scalable and rapidly deployable mechanism to ensure timely release of new high-quality genomes for immediate use by the community. Ensembl is evolving to meet increasing scalability demands to ensure continued accessibility, performance, and interoperability. We have developed a new service-oriented infrastructure, deployed as a set of orchestrated microservices. Our new refget implementation enables rapid, unambiguous sequence retrieval using checksum-based identifiers. Our GraphQL service has been expanded to support genome metadata queries, facilitating programmatic access to assembly composition and linked datasets (minimal refget and GraphQL sketches appear below the table). With streamlined components and more modern technologies, Ensembl will be easier to maintain, delivering high-quality data quickly and benefiting the global scientific communities that rely on this key resource. |
2025-07-23 | 14:20:00 | 14:40:00 | 12 | NIH/Elixir | Insights from GlyGen in Developing Sustainable Knowledgebases with Well-Defined Infrastructure Stacks | Kate Warner | Kate Warner | GlyGen is a data integration and dissemination project for glycan- and glycoconjugate-related data, which retrieves information from multiple international data sources to form a central knowledgebase for glycoscience data exploration. To maintain our high-quality service while meeting the needs of our users, we have structured GlyGen into related but distinct spaces - the Web portal, Data portal, API, Wiki, and SPARQL - which provides a clear delineation of maintenance and innovation tasks while offering different mechanisms for data access. General users can use the interactive GlyGen web portal to search and explore GlyGen data using our various tools and search functionalities. For programmatic access, users can use the API (https://api.glygen.org) to access GlyGen data objects for glycans and proteins, while the SPARQL endpoint (https://sparql.glygen.org) provides alternative programmatic access to the GlyGen data using semantic web technologies (a minimal API-call sketch appears below the table). For users interested in using the datasets in research, data mining, or machine learning projects, versioned dataset flat files can be downloaded from our Data portal (https://data.glygen.org), along with each dataset’s BioCompute Object (BCO) (https://biocomputeobject.org), which documents the metadata of the dataset for proper attribution, reproducibility, and data sharing. All components of the GlyGen ecosystem are built using well-established web technology stacks, enabling rapid development and deployment on both on-premise infrastructure and commercial cloud platforms, while also ensuring straightforward maintenance. Finally, we will discuss how being freely accessible under the Creative Commons Attribution 4.0 International (CC BY 4.0) license helps encourage FAIR data, open science, and collaboration. |
2025-07-23 | 14:40:00 | 15:00:00 | 12 | NIH/Elixir | | Philip Blood | | |
2025-07-23 | 15:00:00 | 15:20:00 | 12 | NIH/Elixir | Production workflows and orchestration at MGnify, ELIXIR’s Core Data Resource for metagenomics | Martin Beracochea | Martin Beracochea | MGnify is a key resource for the assembly, analysis and archiving of microbiome-derived sequencing datasets. Designed to be interoperable with the European Nucleotide Archive (ENA) for data archiving, MGnify’s analyses can be initiated from various ENA sequence data products, including private datasets. Accessioned data outputs are produced in commonly used formats and available via web visualisation and APIs. The rapid evolution of the field of microbiome research over the past decade has brought significant challenges: exponential dataset growth, increased sample diversity, additional data analyses, and new sequencing technologies. To address these challenges, MGnify’s latest pipelines have transitioned from the Common Workflow Language to Nextflow, nf-core, and a new automation system. This enhances resource management, supports heterogeneous computing (including cloud environments), handles large-scale data production, and reduces manual intervention. Key MGnify outputs include taxonomic and functional analyses of metagenomes, covering >600,000 datasets. The service produces and organises metagenome assemblies and metagenome-assembled genomes, totaling >480,000, as well as nearly 2.5 billion protein sequences. The available annotations have broadened to include the mobilome and virome, as well as increased taxonomic specificity via additional amplicon sequence variant analyses. While these developments have positioned MGnify to efficiently take advantage of elastic compute resources, the volume of demand still outstrips the available resources. As such, we have started to evaluate how analyses can be federated through the use of our Nextflow pipelines (and community-produced Galaxy versions), in combination with Research Objects, to provide future scalability while retaining a centralised point of discovery. |
2025-07-23 | 15:20:00 | 15:40:00 | 12 | NIH/Elixir | A SCALE-Able Approach to Building “Hybrid” Repositories to Drive Sustainable Data Ecosystems | Robert Schuler | Robert Schuler | Scientific discovery increasingly relies on the ability to acquire, curate, integrate, analyze, and share vast and varied datasets. For instance, advancements like AlphaFold, an AI-based protein structure prediction tool, and ChatGPT, a large language model-based chatbot, have generated immense excitement in science and industry for harnessing data and computation to solve significant challenges. However, it is easy to overlook that these remarkable achievements were only made possible after the accumulation of a critical mass of AI-ready data: both examples relied on open data sources meticulously generated by user communities over several decades. We argue that scalable, sustainable data repositories - ones that bridge the divide between domain-specific and generalist repositories and that actively engage communities of investigators in organizing and curating data - will be required to produce the future critical mass of data needed to unlock new discoveries. Such resources must move beyond the label of “repository” and instead employ a socio-technical approach that inculcates a culture and skill set for data management, sharing, reuse, and reproducibility. In this talk, we will discuss our efforts toward developing FaceBase as a “SCALE-able” data resource built on the principles of Self-service Curation, domain-Agnostic data-centric platforms, Lightweight information models, and Evolvable systems. Working with the dental, oral, craniofacial, and related biological research community, we have seen several hundred studies - encompassing data from many thousands of subjects and specimens across multiple imaging modalities and sequencing assay types - contributed and curated by the community. |
2025-07-23 | 15:40:00 | 15:50:00 | 12 | NIH/Elixir | From Platforms to Practice: How the ELIXIR Model Enables Impactful, Sustainable Biodata Resources | Fabio Liberante | Fabio Liberante | Biodata resources are only as impactful as the ecosystems in which they operate. ELIXIR provides a coordinated European infrastructure that supports the sustainability, discoverability, and effective reuse of life science data — enabling biodata resources to thrive in an increasingly complex global research environment. This talk will provide an overview of how ELIXIR delivers this support through its Core Data Resources, five Platforms — including Data and Interoperability — and an active network of Communities. Together, these elements underpin the long-term value and resilience of biodata infrastructures by helping resources implement FAIR practices, link across scientific domains, and plan for the full biodata resource lifecycle. We will highlight the role of registries and standards, the monitoring and periodic review of Core Data Resources, and the importance of both qualitative and quantitative indicators in tracking impact. Recent challenges — including the effects of large-scale data scraping — will also be discussed, alongside the need to balance openness with sustainability. Finally, we will share some insights from ELIXIR’s international collaborations, including with the NIH, to illustrate how global coordination enhances the visibility, value, and future-proofing of open data infrastructures. |
2025-07-23 | 15:50:00 | 16:00:00 | 12 | NIH/Elixir | NIH-ODSS | Ishwar Chandramouliswaran | |||
2025-07-23 | 16:40:00 | 17:00:00 | 12 | NIH/Elixir | | Melissa Harrison | | |
2025-07-23 | 17:00:00 | 17:20:00 | 12 | NIH/Elixir | | Mark Hahnel | | |
2025-07-23 | 17:20:00 | 17:40:00 | 12 | NIH/Elixir | Evaluating the Impact of Biodata Resources: Insights from EMBL-EBI’s Impact Assessments | Eleni Tzampatzopoulou | Eleni Tzampatzopoulou | The provision of open access data through biodata resources is a critical driver of breakthroughs in life sciences research, advances in clinical practice and industry innovations that benefit humankind. However, understanding their long-term economic and societal impacts remains a challenge. As part of ongoing efforts to establish a framework and evidence base for demonstrating the value of open data resources, EMBL-EBI employs a combination of qualitative and quantitative approaches, such as service monitoring metrics, cost-benefit analyses, large-scale user surveys, data resource usage analysis and in-depth case studies. Service monitoring metrics, including unique visitors, data submission volumes and citation of datasets, indicate the breadth and diversity of user engagement with FAIR data resources. The 2024 user survey showcased the depth of utility users derive from resources, such as research years saved and reduced duplication of effort. Surveys and other user engagement also highlight EMBL-EBI’s contribution to downstream products and AI model development. Economic impact analyses, focused on the impact of direct increases in research efficiency, do not quantify these secondary or indirect impacts through data reuse, even though qualitative data suggests they are likely to be significant. Here we explore how mixed methods can characterise the impact of data reuse, considering methodologies such as in-depth case studies, data mining, administrative data and other novel approaches. We consider different methodologies EMBL-EBI has explored and propose how future impact monitoring could capture a fuller extent of the direct and indirect impacts of biodata resources, informing priority setting for life sciences funders. |
2025-07-23 | 17:40:00 | 18:00:00 | 12 | NIH/Elixir | | Alex Bateman | | |
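
Several of the resources above (the DO-KB, DSMZ Digital Diversity, and GlyGen) expose public SPARQL endpoints. The sketch below shows the generic SPARQL 1.1 Protocol pattern for querying such an endpoint over HTTP - a GET request carrying the query, with JSON results requested via the Accept header. The DSMZ URL is taken from the abstract above; the query itself is a generic class-count example, not a documented query from any of these resources, and the exact endpoint path may differ.

```python
# Minimal sketch: query a public SPARQL endpoint with the standard
# SPARQL 1.1 Protocol (GET + Accept: application/sparql-results+json).
import requests

ENDPOINT = "https://sparql.dsmz.de"  # per the DSMZ abstract; exact path may differ

# Generic exploratory query (not a documented DSMZ query): count instances per class.
QUERY = """
SELECT ?class (COUNT(?s) AS ?n)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?n)
LIMIT 10
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()

# SPARQL JSON results: one binding per row, keyed by variable name.
for row in resp.json()["results"]["bindings"]:
    print(row["class"]["value"], row["n"]["value"])
```

The same GET-with-query pattern applies to the DO-KB and GlyGen (https://sparql.glygen.org) endpoints; only the URL and the vocabulary in the query change.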
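
The SEA CDM talk describes loading vaccine response data into a Neo4j knowledge graph. As a purely hypothetical illustration of querying such a graph from Python with the official neo4j driver, the snippet below invents its node labels, relationship type, and connection details; the published SEA CDM classes (Organism, Sample, Intervention, Assay) would define the real schema.

```python
# Hypothetical sketch: the labels (Vaccine, Gene), relationship (INDUCES),
# and connection details are invented stand-ins, not the SEA CDM schema.
from neo4j import GraphDatabase

URI = "bolt://localhost:7687"   # placeholder server
AUTH = ("neo4j", "password")    # placeholder credentials

driver = GraphDatabase.driver(URI, auth=AUTH)
with driver.session() as session:
    result = session.run(
        """
        // hypothetical pattern: vaccines linked to induced host genes
        MATCH (v:Vaccine)-[:INDUCES]->(g:Gene)
        RETURN v.name AS vaccine, g.symbol AS gene
        LIMIT 5
        """
    )
    for record in result:
        print(record["vaccine"], record["gene"])
driver.close()
```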
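
The Ensembl talk mentions a new refget implementation for checksum-based sequence retrieval. The GA4GH refget v1 protocol addresses sequences by digest and supports sub-sequence slicing with 0-based, end-exclusive coordinates; the sketch below assumes a hypothetical service root and digest, since the abstract does not give Ensembl's actual refget URL.

```python
# Sketch of GA4GH refget v1 retrieval; BASE and DIGEST are placeholders,
# not documented Ensembl values.
import requests

BASE = "https://example.org/refget"          # hypothetical service root
DIGEST = "6681ac2f62509cfc220d78751b8dc524"  # hypothetical sequence checksum

resp = requests.get(
    f"{BASE}/sequence/{DIGEST}",
    params={"start": 0, "end": 80},          # 0-based, end-exclusive slice
    headers={"Accept": "text/vnd.ga4gh.refget.v1.0.0+plain"},
    timeout=30,
)
resp.raise_for_status()
print(resp.text)  # plain-text subsequence, no FASTA header
```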
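
The same talk notes that Ensembl's GraphQL service now supports genome metadata queries. The snippet below shows the standard GraphQL-over-HTTP pattern (a POST with a JSON body holding the query and variables); the endpoint URL and every field name in the query are illustrative placeholders, not the published Ensembl schema.

```python
# Standard GraphQL-over-HTTP call; the URL and schema fields are placeholders.
import requests

GRAPHQL_URL = "https://example.org/graphql"  # hypothetical endpoint

QUERY = """
query GenomeSummary($assemblyId: String!) {
  genome(assemblyId: $assemblyId) {  # hypothetical fields throughout
    scientificName
    assembly { name level }
    datasets { name type }
  }
}
"""

resp = requests.post(
    GRAPHQL_URL,
    json={"query": QUERY, "variables": {"assemblyId": "GRCh38"}},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["data"])
```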
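
GlyGen's abstract points to its API at https://api.glygen.org for programmatic access to glycan and protein data objects. The call below shows the plain REST pattern only; the resource path and accession are hypothetical placeholders, so consult the API's own documentation for the real routes and request methods.

```python
# Plain REST access pattern; the path and accession are hypothetical
# placeholders, not documented GlyGen routes.
import requests

API_BASE = "https://api.glygen.org"  # from the GlyGen abstract
ACCESSION = "G17689DH"               # hypothetical glycan accession

resp = requests.get(f"{API_BASE}/glycan/detail/{ACCESSION}", timeout=30)
resp.raise_for_status()
record = resp.json()
print(sorted(record))  # list the top-level fields of the returned data object
```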