Date | Start Time | End Time | Room | Track | Title | Confirmed Presenter | Authors | Abstract
---|---|---|---|---|---|---|---|---
2025-07-23 | 11:20:00 | 11:40:00 | 12 | NIH/Elixir | Disease Ontology Knowledgebase: A Global BioData hub for FAIR disease data discovery | Lynn Schriml | Lynn Schriml | Development of long-term biodata resources, by design, depends on a stable data model with persistent identifiers, regular data releases, and reliable responsiveness to ongoing community needs. Addressing evolving needs while continually advancing our data representation has facilitated the sustained 20-year growth and utility of the Human Disease Ontology (DO, https://www.disease-ontology.org/). Biodata resources must maintain their relevance, adapting to address persistent, evolving needs. Strategically, the DO actively identifies and connects with our expanding user community, thus driving the DO’s integration of diverse disease factors (e.g., molecular, environmental, and mechanistic) into a single framework. Serving a vast user community since 2003 (>415 biomedical resources across 45 countries), the DO’s continual content and classification expansion is driven by the ever-evolving disease knowledge ecosystem. The DO, a designated Global Core Biodata Resource (https://globalbiodata.org/), empowers disease data integration, standardization, and analysis across the interconnected web of biomedical information. A focus on modernizing infrastructure is imperative to provide new mechanisms for data interoperability and accessibility. Our strategic approach - following community best practices (e.g., OBO Foundry, FAIR principles), adapting established technical approaches (e.g., Neo4j; Swagger for the API), and openly sharing project-developed tooling - reduces technical debt while maximizing data delivery opportunities. The DO Knowledgebase (DO-KB) tools (the DO-KB SPARQL service and endpoint, Faceted Search Interface, advanced API service, and DO.utils) have been developed to enhance data discovery, delivering an integrated data system that exposes the DO’s semantic knowledge and connects disease-related data across Open Linked Data resources. |
2025-07-23 | 11:40:00 | 12:00:00 | 12 | NIH/Elixir | Integrating Data Treasures: Knowledge graphs of the DSMZ Digital Diversity | Julia Koblitz | Julia Koblitz | The DSMZ (German Collection of Microorganisms and Cell Cultures) hosts a wealth of biological data, covering microbial traits (BacDive), taxonomy (LPSN), enzymes and ligands (BRENDA), rRNA genes (SILVA), cell lines (CellDive), cultivation media (MediaDive), strain identity (StrainInfo), and more. To make these diverse datasets accessible and interoperable, the DSMZ Digital Diversity initiative provides a central hub for integrated data and establishes a framework for linking and accessing these resources (https://hub.dsmz.de). At its core lies the DSMZ Digital Diversity Ontology (D3O), an upper ontology designed to unify key concepts across all databases, enabling seamless integration and advanced exploration. This ontology is complemented by well-established ontologies such as ChEBI, ENVO, and NCIT, among others. By standardizing all resources within a defined vocabulary, we enhance their interoperability, both internally and with the Linked Open Data community. Where necessary, we also develop and curate our own ontologies, such as the well-known BRENDA Tissue Ontology (BTO), a comprehensive ontology for LPSN taxonomy and nomenclature, and the Microbial Isolation Source Ontology (MISO), which has already been applied to annotate more than 80,000 microbial strains. D3O also provides a stable foundation for transforming our databases into RDF (Resource Description Framework) and providing the knowledge graphs via open SPARQL endpoints. The first knowledge graphs of BacDive and MediaDive are already available at https://sparql.dsmz.de, enabling researchers to query and analyze microbial trait data and cultivation media (a minimal endpoint-query sketch appears below the table). These initial steps lay the groundwork for integrating additional databases, such as BRENDA and StrainInfo, into unified, queryable knowledge graphs. |
2025-07-23 | 12:00:00 | 12:20:00 | 12 | NIH/Elixir | Metabolomics Workbench: Data Sharing, Analysis and Integration at the National Metabolomics Data Repository | Mano Maurya | Mano Maurya | The National Metabolomics Data Repository (NMDR) was developed as part of the National Institutes of Health (NIH) Common Fund Metabolomics Program to facilitate the deposition and sharing of metabolomics data and metadata from researchers worldwide. The NMDR, housed at the San Diego Supercomputer Center (SDSC), University of California, San Diego, has developed the Metabolomics Workbench (MW). The MW also provides analysis tools and access to metabolite standards, including RefMet, protocols, tutorials, training, and more. RefMet facilitates metabolite name harmonization, an essential step in data integration across different studies and collaboration across different research centers. Thus, the MW-NMDR serves as a one-stop infrastructure for metabolomics research and is widely regarded as one of the most FAIR (findable, accessible, interoperable, reusable) data resources. In this work, we will present some of the key aspects of the MW-NMDR, such as continuous curation to maintain quality, use of controlled vocabularies and ontologies to promote interoperability, development of tools that help drive scientific innovation, and integration of community-developed tools into the MW. We will also discuss our involvement in other data sharing, reuse, and integration efforts, namely the NIH Common Fund Data Ecosystem (CFDE) and a collaboration with the European Bioinformatics Institute’s (EBI) MetabolomeXchange as part of the Chan Zuckerberg Initiative. |
2025-07-23 | 12:20:00 | 12:40:00 | 12 | NIH/Elixir | Building sustainable solutions for federally-funded open-source biomedical tools and technologies | Karamarie Fecho | Karamarie Fecho | Federally-funded, open-source biomedical tools and technologies often fail due to the lack of a business model for sustainability; the result is rapid technical obsolescence, often preceded by insufficient scientific impact and the failure to create a thriving Community of Practice. The open-source ROBOKOP (Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways) knowledge graph (KG) system is jointly funded by the National Institute of Environmental Health Sciences and the Office of Data Science Strategy within the National Institutes of Health as a modular biomedical KG system designed to explore relationships between biomedical entities. The ROBOKOP system includes the aggregated ROBOKOP KG, composed of integrated and harmonized “knowledge” derived from dozens of “knowledge sources”; a user interface to the ROBOKOP KG; and a collection of supporting tools and resources. ROBOKOP has demonstrated its utility in a variety of use cases, including suggesting “adverse outcome pathways” to explain the biological relationships between chemical exposures and disease outcomes, and the related “clinical outcome pathways” to explain the biological mechanisms underlying the therapeutic effects of drug exposures. We have been evaluating approaches to ensure the long-term sustainability of ROBOKOP, independent of federal funding. One approach is to adopt and adapt the best practices of, and lessons learned by, successful open-source biomedical Communities of Practice with engaged scientific end users and technical contributors. This presentation will provide an overview of our evaluation results and detail our proposed solution for transitioning ROBOKOP from federal funding to independent long-term sustainability. |
2025-07-23 | 12:40:00 | 13:00:00 | 12 | NIH/Elixir | SEA CDM: An Ontology-Based Common Data Model for Standardizing and Integrating Biomedical Experimental Data in Vaccine Research | Yongqun He | Yongqun He | With the increasing volume of experimental data across biomedical fields, standardizing, sharing, and integrating heterogeneous experimental data has become a major challenge. Our VIOLIN vaccine database has systematically collected and annotated over 4,700 vaccines against 217 infectious and non-infectious diseases (such as cancer), along with vaccine components including over 100 vaccine adjuvants and over 1,700 vaccine-induced host immune factors. To support standardization, we developed the community-based Vaccine Ontology (VO) to represent vaccine knowledge and associated metadata. To support interoperable standardization, annotation, and integration of various biomedical experimental datasets, we have developed an ontology-supported Study-Experiment-Assay (SEA) common data model (CDM), consisting of 12 core classes (called tables in a relational database setting), such as Organism, Sample, Intervention, and Assay. The SEA CDM was evaluated systematically using the vaccine-induced host gene immune response data from our VIOLIN VIGET (Vaccine Induced Gene Expression Analysis Tool) system. We also developed a MySQL database and a Neo4j knowledge graph based on the SEA CDM to systematically represent the VIGET data and influenza-related host gene expression data from two large-scale data resources: ImmPort and CELLxGENE (a hypothetical graph-query sketch appears below the table). Our results show that ontologies such as VO can greatly support interoperable data annotation and provide additional semantic knowledge (e.g., vaccine hierarchy). This proof-of-concept study demonstrates the feasibility and validity of the SEA CDM for standardizing and integrating heterogeneous datasets and highlights its potential for application to other big bioresources. The novel SEA CDM lays a foundation for building a FAIR and AI-ready Biodata Ecosystem, supporting advanced AI research. |
2025-07-23 | 14:00:00 | 14:20:00 | 12 | NIH/Elixir | The Evolution of Ensembl: Scaling for Accessibility, Performance, and Interoperability | Mallory Freeberg | Mallory Freeberg | Ensembl is an open platform that integrates publicly available genomics data across the tree of life, enabling activities spanning research to clinical and agricultural applications. Ensembl provides a comprehensive collection of data including genomes, genomic annotations, and genetic variants, as well as computational outputs such as gene predictions, functional annotations, regulatory region predictions, and comparative genomic analyses. In its 25-year history, Ensembl has grown to support all domains of life - from vertebrates to plants to bacteria - releasing new data roughly quarterly. Initially developed for the human genome, Ensembl expanded to include additional key vertebrates totalling a few hundred genomes. With the advent of global biodiversity and pangenome projects, Ensembl now contains thousands of genomes and is anticipated to grow to tens of thousands of genomes in the coming years. This explosion in data size necessitates a more scalable and rapidly deployable mechanism to ensure timely release of new high-quality genomes for immediate use by the community. Ensembl is evolving to meet increasing scalability demands to ensure continued accessibility, performance, and interoperability. We have developed a new service-oriented infrastructure, deployed as a set of orchestrated microservices. Our new refget implementation enables rapid, unambiguous sequence retrieval using checksum-based identifiers. Our GraphQL service has been expanded to support genome metadata queries, facilitating programmatic access to assembly composition and linked datasets (minimal refget and GraphQL sketches appear below the table). With streamlined components and more modern technologies, Ensembl will be easier to maintain, delivering high-quality data quickly and benefiting the global scientific communities that rely on this key resource. |
2025-07-23 | 14:20:00 | 14:40:00 | 12 | NIH/Elixir | Insights from GlyGen in Developing Sustainable Knowledgebases with Well-Defined Infrastructure Stacks | Kate Warner | Kate Warner | GlyGen is a data integration and dissemination project for glycan- and glycoconjugate-related data, which retrieves information from multiple international data sources to form a central knowledgebase for glycoscience data exploration. To maintain our high-quality service while meeting the needs of our users, we have structured GlyGen into related but distinct spaces - the Web portal, Data portal, API, Wiki, and SPARQL - which provides a clear delineation of maintenance and innovation tasks while offering different mechanisms for data access. General users can use the interactive GlyGen web portal to search and explore GlyGen data using our various tools and search functionalities. For programmatic access, users can use the API (https://api.glygen.org) to access GlyGen data objects for glycans and proteins, while the SPARQL endpoint (https://sparql.glygen.org) provides alternative programmatic access to the GlyGen data using semantic web technologies (a minimal API-call sketch appears below the table). For users interested in using the datasets in research, data mining, or machine learning projects, versioned dataset flat files can be downloaded from our Data portal (https://data.glygen.org), along with each dataset’s BioCompute Object (BCO) (https://biocomputeobject.org), which documents the metadata of the dataset for proper attribution, reproducibility, and data sharing. All components of the GlyGen ecosystem are built using well-established web technology stacks, enabling rapid development and deployment on both on-premise infrastructure and commercial cloud platforms, while also ensuring straightforward maintenance. Finally, we will discuss how being freely accessible under the Creative Commons Attribution 4.0 International (CC BY 4.0) license helps encourage FAIR data, open science, and collaboration. |
2025-07-23 | 14:40:00 | 15:00:00 | 12 | NIH/Elixir | | Philip Blood | | |
2025-07-23 | 15:00:00 | 15:20:00 | 12 | NIH/Elixir | Production workflows and orchestration at MGnify, ELIXIR’s Core Data Resource for metagenomics | Martin Beracochea | Martin Beracochea | MGnify is a key resource for the assembly, analysis and archiving of microbiome-derived sequencing datasets. Designed to be interoperable with the European Nucleotide Archive (ENA) for data archiving, MGnify’s analyses can be initiated from various ENA sequence data products, including private datasets. Accessioned data outputs are produced in commonly used formats and available via web visualisation and APIs. The rapid evolution of the field of microbiome research over the past decade has brought significant challenges: exponential dataset growth, increased sample diversity, additional data analyses, and new sequencing technologies. To address these challenges, MGnify’s latest pipelines have transitioned from the Common Workflow Language to Nextflow, nf-core, and a new automation system. This enhances resource management, supports heterogeneous computing (including cloud environments), handles large-scale data production, and reduces manual intervention. Key MGnify outputs include taxonomic and functional analyses of metagenomes, covering >600,000 datasets. The service produces and organises metagenome assemblies and metagenome-assembled genomes, totaling >480,000, as well as nearly 2.5 billion protein sequences. The available annotations have broadened to include the mobilome and virome, as well as increased taxonomic specificity via additional amplicon sequence variant analyses. While these developments have positioned MGnify to efficiently take advantage of elastic compute resources, the volume of demand still outstrips the available resources. As such, we have started to evaluate how analyses can be federated through the use of our Nextflow pipelines (and community-produced Galaxy versions), in combination with Research Objects, to provide future scalability while retaining a centralised point of discovery. |
2025-07-23 | 15:20:00 | 15:40:00 | 12 | NIH/Elixir | A SCALE-Able Approach to Building “Hybrid” Repositories to Drive Sustainable Data Ecosystems | Robert Schuler | Robert Schuler | Scientific discovery increasingly relies on the ability to acquire, curate, integrate, analyze, and share vast and varied datasets. For instance, advancements like AlphaFold, an AI-based protein structure prediction tool, and ChatGPT, a large language model-based chatbot, have generated immense excitement in science and industry for harnessing data and computation to solve significant challenges. However, it is easy to overlook that these remarkable achievements were only made possible after the accumulation of a critical mass of AI-ready data: both examples relied on open data sources meticulously generated by user communities over several decades. We argue that scalable, sustainable data repositories - ones that bridge the divide between domain-specific and generalist repositories and that actively engage communities of investigators in organizing and curating data - will be required to produce the future critical mass of data needed to unlock new discoveries. Such resources must move beyond the label of “repository” and instead employ a socio-technical approach that inculcates a culture and skill set for data management, sharing, reuse, and reproducibility. In this talk, we will discuss our efforts toward developing FaceBase as a “SCALE-able” data resource built on the principles of Self-service Curation, domain-Agnostic data-centric platforms, Lightweight information models, and Evolvable systems. Working with the dental, oral, craniofacial, and related biological research community, we have seen several hundred studies - encompassing data from many thousands of subjects and specimens across multiple imaging modalities and sequencing assay types - contributed and curated by the community. |
2025-07-23 | 15:40:00 | 15:50:00 | 12 | NIH/Elixir | From Platforms to Practice: How the ELIXIR Model Enables Impactful, Sustainable Biodata Resources | Fabio Liberante | Fabio Liberante | Biodata resources are only as impactful as the ecosystems in which they operate. ELIXIR provides a coordinated European infrastructure that supports the sustainability, discoverability, and effective reuse of life science data — enabling biodata resources to thrive in an increasingly complex global research environment. This talk will provide an overview of how ELIXIR delivers this support through its Core Data Resources, five Platforms — including Data and Interoperability — and an active network of Communities. Together, these elements underpin the long-term value and resilience of biodata infrastructures by helping resources implement FAIR practices, link across scientific domains, and plan for the full biodata resource lifecycle. We will highlight the role of registries and standards, the monitoring and periodic review of Core Data Resources, and the importance of both qualitative and quantitative indicators in tracking impact. Recent challenges — including the effects of large-scale data scraping — will also be discussed, alongside the need to balance openness with sustainability. Finally, we will share some insights from ELIXIR’s international collaborations, including with the NIH, to illustrate how global coordination enhances the visibility, value, and future-proofing of open data infrastructures. |
2025-07-23 | 15:50:00 | 16:00:00 | 12 | NIH/Elixir | NIH-ODSS | Ishwar Chandramouliswaran | |||
2025-07-23 | 16:40:00 | 17:00:00 | 12 | NIH/Elixir | | Melissa Harrison | | |
2025-07-23 | 17:00:00 | 17:20:00 | 12 | NIH/Elixir | | Mark Hahnel | | |
2025-07-23 | 17:20:00 | 17:40:00 | 12 | NIH/Elixir | Evaluating the Impact of Biodata Resources: Insights from EMBL-EBI’s Impact Assessments | Eleni Tzampatzopoulou | Eleni Tzampatzopoulou | The provision of open access data through biodata resources is a critical driver of breakthroughs in life sciences research, advances in clinical practice and industry innovations that benefit humankind. However, understanding their long-term economic and societal impacts remains a challenge. As part of ongoing efforts to establish a framework and evidence base for demonstrating the value of open data resources, EMBL-EBI employs a combination of qualitative and quantitative approaches, such as service monitoring metrics, cost-benefit analyses, large-scale user surveys, data resource usage analysis and in-depth case studies. Service monitoring metrics, including unique visitors, data submission volumes and citation of datasets, indicate the breadth and diversity of user engagement with FAIR data resources. The 2024 user survey showcased the depth of utility users derive from resources, such as research years saved and reduced duplication of effort. Surveys and other user engagement also highlight EMBL-EBI’s contribution to downstream products and AI model development. Economic impact analyses, focused on the impact of direct increases in research efficiency, do not quantify these secondary or indirect impacts through data reuse, even though qualitative data suggests they are likely to be significant. Here we explore how mixed methods can characterise the impact of data reuse, considering methodologies such as in-depth case studies, data mining, administrative data and other novel approaches. We consider different methodologies EMBL-EBI has explored and propose how future impact monitoring could capture a fuller extent of the direct and indirect impacts of biodata resources, informing priority setting for life sciences funders. |
2025-07-23 | 17:40:00 | 18:00:00 | 12 | NIH/Elixir | | Alex Bateman | | |
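
Several of the resources above (the DO-KB, DSMZ Digital Diversity, and GlyGen) expose public SPARQL endpoints. The sketch below shows the generic SPARQL 1.1 Protocol pattern for querying such an endpoint over HTTP - a GET request carrying the query, with JSON results requested via the Accept header. The DSMZ URL is taken from the abstract above; the query itself is a generic class-count example, not a documented query from any of these resources, and the exact endpoint path may differ.

```python
# Minimal sketch: query a public SPARQL endpoint with the standard
# SPARQL 1.1 Protocol (GET + Accept: application/sparql-results+json).
import requests

ENDPOINT = "https://sparql.dsmz.de"  # per the DSMZ abstract; exact path may differ

# Generic exploratory query (not a documented DSMZ query): count instances per class.
QUERY = """
SELECT ?class (COUNT(?s) AS ?n)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?n)
LIMIT 10
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()

# SPARQL JSON results: one binding per row, keyed by variable name.
for row in resp.json()["results"]["bindings"]:
    print(row["class"]["value"], row["n"]["value"])
```

The same GET-with-query pattern applies to the DO-KB and GlyGen (https://sparql.glygen.org) endpoints; only the URL and the vocabulary in the query change.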
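
The SEA CDM talk describes loading vaccine response data into a Neo4j knowledge graph. As a purely hypothetical illustration of querying such a graph from Python with the official neo4j driver, the snippet below invents its node labels, relationship type, and connection details; the published SEA CDM classes (Organism, Sample, Intervention, Assay) would define the real schema.

```python
# Hypothetical sketch: the labels (Vaccine, Gene), relationship (INDUCES),
# and connection details are invented stand-ins, not the SEA CDM schema.
from neo4j import GraphDatabase

URI = "bolt://localhost:7687"   # placeholder server
AUTH = ("neo4j", "password")    # placeholder credentials

driver = GraphDatabase.driver(URI, auth=AUTH)
with driver.session() as session:
    result = session.run(
        """
        // hypothetical pattern: vaccines linked to induced host genes
        MATCH (v:Vaccine)-[:INDUCES]->(g:Gene)
        RETURN v.name AS vaccine, g.symbol AS gene
        LIMIT 5
        """
    )
    for record in result:
        print(record["vaccine"], record["gene"])
driver.close()
```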
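
The Ensembl talk mentions a new refget implementation for checksum-based sequence retrieval. The GA4GH refget v1 protocol addresses sequences by digest and supports sub-sequence slicing with 0-based, end-exclusive coordinates; the sketch below assumes a hypothetical service root and digest, since the abstract does not give Ensembl's actual refget URL.

```python
# Sketch of GA4GH refget v1 retrieval; BASE and DIGEST are placeholders,
# not documented Ensembl values.
import requests

BASE = "https://example.org/refget"          # hypothetical service root
DIGEST = "6681ac2f62509cfc220d78751b8dc524"  # hypothetical sequence checksum

resp = requests.get(
    f"{BASE}/sequence/{DIGEST}",
    params={"start": 0, "end": 80},          # 0-based, end-exclusive slice
    headers={"Accept": "text/vnd.ga4gh.refget.v1.0.0+plain"},
    timeout=30,
)
resp.raise_for_status()
print(resp.text)  # plain-text subsequence, no FASTA header
```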
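
The same talk notes that Ensembl's GraphQL service now supports genome metadata queries. The snippet below shows the standard GraphQL-over-HTTP pattern (a POST with a JSON body holding the query and variables); the endpoint URL and every field name in the query are illustrative placeholders, not the published Ensembl schema.

```python
# Standard GraphQL-over-HTTP call; the URL and schema fields are placeholders.
import requests

GRAPHQL_URL = "https://example.org/graphql"  # hypothetical endpoint

QUERY = """
query GenomeSummary($assemblyId: String!) {
  genome(assemblyId: $assemblyId) {  # hypothetical fields throughout
    scientificName
    assembly { name level }
    datasets { name type }
  }
}
"""

resp = requests.post(
    GRAPHQL_URL,
    json={"query": QUERY, "variables": {"assemblyId": "GRCh38"}},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["data"])
```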
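
GlyGen's abstract points to its API at https://api.glygen.org for programmatic access to glycan and protein data objects. The call below shows the plain REST pattern only; the resource path and accession are hypothetical placeholders, so consult the API's own documentation for the real routes and request methods.

```python
# Plain REST access pattern; the path and accession are hypothetical
# placeholders, not documented GlyGen routes.
import requests

API_BASE = "https://api.glygen.org"  # from the GlyGen abstract
ACCESSION = "G17689DH"               # hypothetical glycan accession

resp = requests.get(f"{API_BASE}/glycan/detail/{ACCESSION}", timeout=30)
resp.raise_for_status()
record = resp.json()
print(sorted(record))  # list the top-level fields of the returned data object
```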