The DSMZ (German Collection of Microorganisms and Cell Cultures) hosts a wealth of biological data, covering microbial traits (BacDive), taxonomy (LPSN), enzymes and ligands (BRENDA), rRNA genes (SILVA), cell lines (CellDive), cultivation media (MediaDive), strain identity (StrainInfo), and more. To make these diverse datasets accessible and interoperable, the DSMZ Digital Diversity initiative provides a central hub for integrated data and establishes a framework for linking and accessing these resources (https://hub.dsmz.de).
At its core lies the DSMZ Digital Diversity Ontology (D3O), an upper ontology designed to unify key concepts across all databases, enabling seamless integration and advanced exploration. D3O is complemented by well-established ontologies such as ChEBI, ENVO, and NCIT. By describing all resources with a defined vocabulary, we enhance their interoperability, both internally and with the Linked Open Data community. Where necessary, we also develop and curate our own ontologies, such as the well-known BRENDA Tissue Ontology (BTO), a comprehensive ontology for LPSN taxonomy and nomenclature, and the Microbial Isolation Source Ontology (MISO), which has already been applied to annotate more than 80,000 microbial strains.
D3O also provides a stable foundation for transforming our databases into RDF (Resource Description Framework) and serving the resulting knowledge graphs via open SPARQL endpoints. The first knowledge graphs, for BacDive and MediaDive, are already available at https://sparql.dsmz.de, enabling researchers to query and analyze microbial trait data and cultivation media. These initial steps lay the groundwork for integrating additional databases, such as BRENDA and StrainInfo, into unified, queryable knowledge graphs.
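As a minimal sketch of how an open SPARQL endpoint of this kind might be queried programmatically, the Python example below sends a query using the standard SPARQL 1.1 Protocol and flattens the JSON result format into plain dictionaries. Several things here are assumptions rather than details taken from the DSMZ services: the function names (`build_request`, `rows_from_results`, `run_query`) are hypothetical, the exact endpoint URL must be looked up at https://sparql.dsmz.de, and the query deliberately avoids any D3O terms (it only counts instances per class) because the vocabulary is not spelled out here.

```python
import json
import urllib.parse
import urllib.request

# Vocabulary-agnostic query: counts instances per class, so no specific
# D3O class or property names are assumed.
QUERY = """
SELECT ?class (COUNT(?s) AS ?n)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?n)
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol POST request with a URL-encoded body."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            # Ask for the standard SPARQL JSON results format.
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def rows_from_results(doc):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=30):
    """Send the query to the given endpoint and return the flattened rows."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return rows_from_results(json.load(response))
```

Calling `run_query(endpoint_url, QUERY)` with the endpoint URL of one of the published knowledge graphs would return up to ten rows, each holding a class IRI and its instance count as strings. This gives a quick first overview of a graph's contents before writing queries against specific D3O terms.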