ProDiGenIDB – a unified resource of disease-associated genes, their protein products, and intrinsic disorder annotations
Confirmed Presenter: Jovana Kovacevic, Faculty of Mathematics, Belgrade University
Track: BOKR: Bio-Ontologies and Knowledge Representation
Room: 03A
Format: In person
Moderator(s): Tiffany Callahan
Authors List:
- Jovana Kovacevic, Faculty of Mathematics
- Anđelka Zečević, Mathematical Institute
- Lazar Vasović, Faculty of Mathematics
Presentation Overview:
Understanding gene-disease associations is essential in biomedical research, yet the relevant information is often scattered across multiple heterogeneous databases. To address this fragmentation, we developed ProDiGenIDB, an integrated database that consolidates gene-disease relationships from several recognized, publicly available sources and enriches them with complementary data on gene and protein identifiers, disease ontology, and protein structural disorder.
ProDiGenIDB brings together over 400,000 curated associations sourced from DisGeNET, COSMIC, HumsaVar, Orphanet, ClinVar, HPO, and DISEASES. Each entry includes gene-related metadata (Gene Symbol, Entrez ID, UniProt ID, Ensembl ID), disease descriptors (Disease Name, DOID), and a reference to the original source database.
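To make the entry layout concrete, the sketch below models one association as a Python record and groups records by gene and disease so that the contributing source databases can be seen side by side. The class name, field names, and the `group_sources` helper are hypothetical and chosen only for illustration; the actual ProDiGenIDB schema may differ.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Association:
    """One gene-disease association (hypothetical field names)."""
    gene_symbol: str
    entrez_id: Optional[int]      # identifiers may be missing in a given source
    uniprot_id: Optional[str]
    ensembl_id: Optional[str]
    disease_name: str
    doid: Optional[str]           # Disease Ontology ID, if one could be mapped
    source: str                   # name of the originating database


def group_sources(
    records: Iterable[Association],
) -> Dict[Tuple[str, Optional[str]], Set[str]]:
    """Collect the set of source databases reporting each (gene, DOID) pair."""
    merged: Dict[Tuple[str, Optional[str]], Set[str]] = {}
    for record in records:
        key = (record.gene_symbol, record.doid)
        merged.setdefault(key, set()).add(record.source)
    return merged
```

Keying on the standardized DOID rather than the free-text disease name is what lets associations from different sources line up, which is why the name-to-DOID mapping matters for integration.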
Importantly, the database also incorporates predicted intrinsic disorder information for the proteins encoded by the associated genes. These predictions were generated with widely used protein disorder predictors such as IUPred and VSL2, providing additional insight into the potential lack of structure in disease-related proteins.
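Per-residue disorder scores of this kind are often condensed into protein-level summaries. The following is a minimal sketch, assuming the predictor output is already available as a list of scores between 0 and 1 and that residues scoring above 0.5 are treated as disordered (a common convention for such predictors). The function name and the choice of summary statistics are illustrative assumptions, not a description of how ProDiGenIDB stores its annotations.

```python
from typing import Sequence, Tuple


def disorder_summary(
    scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, int]:
    """Return (fraction of disordered residues, longest disordered stretch).

    `scores` holds one predicted disorder score per residue; a residue is
    counted as disordered when its score is strictly above `threshold`.
    """
    if not scores:
        return 0.0, 0

    flags = [score > threshold for score in scores]

    longest = current = 0
    for is_disordered in flags:
        # Extend the current run, or reset it at an ordered residue.
        current = current + 1 if is_disordered else 0
        longest = max(longest, current)

    return sum(flags) / len(flags), longest
```

The longest stretch is reported alongside the overall fraction because a single long disordered region and many short scattered ones can yield the same fraction while being structurally very different.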
Another important aspect of the database construction involved mapping disease names to standardized Disease Ontology IDs (DOIDs). To improve this process, we applied Natural Language Processing (NLP) techniques using advanced text representation models to enhance the accuracy and consistency of term association.
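The general idea of matching free-text disease names to ontology labels by vector similarity can be sketched as follows. This example deliberately swaps in a much simpler representation, character-trigram counts compared by cosine similarity, as a stand-in for the advanced text representation models mentioned above, which are not specified here. The function names, the `min_score` cutoff, and the `ontology` dictionary (DOID to label) are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, Optional, Tuple


def trigram_vector(text: str) -> Counter:
    """Represent a name as counts of its character trigrams (case-insensitive)."""
    padded = f"  {text.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def map_to_doid(
    name: str, ontology: Dict[str, str], min_score: float = 0.5
) -> Tuple[Optional[str], float]:
    """Return the best-matching DOID and its score, or (None, score) if too weak."""
    query = trigram_vector(name)
    best_id: Optional[str] = None
    best_score = 0.0
    for doid, label in ontology.items():
        score = cosine(query, trigram_vector(label))
        if score > best_score:
            best_id, best_score = doid, score
    if best_score < min_score:
        return None, best_score   # leave the name unmapped rather than guess
    return best_id, best_score
```

Returning `None` below the cutoff keeps weak matches out of the mapping, at the cost of leaving some names unmapped. A learned text embedding would replace `trigram_vector` while the nearest-label search stays the same.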
ProDiGenIDB represents a valuable resource for integrative biomedical studies, particularly in contexts where protein disorder is hypothesized to play a functional or pathological role.