Integrated Data Modeling of Protein Structures Using a Fact Constellation Model Based on an XML Mediated Warehouse System

RongHua Li¹, Sung-Hee Park², Kwang Su Jeong, Keun Ho Ryu
¹ lrh@dblab.chungbuk.ac.kr, Chungbuk University; ² shpark@dblab.chungbuk.ac.kr, Chungbuk University

Recent trends in structural genomics demand the analysis and handling of large datasets that combine heterogeneous biological data from diverse sources, as the production of biological data such as genome and protein sequences, 3D structures, gene expression data and functional annotations has increased. Accordingly, many public databases include cross-references that link their internal data to related data in external databases for the purpose of integration and connection. Analyzing and predicting protein functions requires managing protein structures together with their related data, so heterogeneous protein-related data must be integrated and managed to support the analysis and prediction of structures. In this paper, we propose an integrated data model of protein structures and related information based on a relational multidimensional model, namely a fact constellation model, and we represent this model in the XML data model in order to present and query highly complex, hierarchical protein data. The proposed integrated model was implemented in an XML mediated warehouse system, which combines a mediated warehouse system with an XML mediator. This system extends the mediated warehouse system with an XML mediator so that the complex queries used during the analysis and mining of protein structure data can be performed with an XML query language and XML query processing.
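As a minimal illustration, and not the authors' actual schema, a fact constellation model keeps several fact tables that share common dimension tables. The Python sketch below assumes hypothetical tables (`ProteinDim`, `StructureFact`, `FunctionFact`) and made-up sample rows, nests both fact tables under their shared protein dimension as XML using `xml.etree.ElementTree`, and then queries the result. ElementTree's limited XPath subset stands in here for a full XML query language; it is not the query processor described in this paper.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Shared dimension table: one row per protein (hypothetical attributes).
@dataclass(frozen=True)
class ProteinDim:
    protein_id: str
    name: str

# Fact table 1 (hypothetical): structure entries keyed by the shared protein dimension.
@dataclass(frozen=True)
class StructureFact:
    protein_id: str
    structure_id: str
    resolution_angstrom: float

# Fact table 2 (hypothetical): function annotations keyed by the same dimension.
@dataclass(frozen=True)
class FunctionFact:
    protein_id: str
    annotation: str

def to_xml(proteins, structures, functions):
    """Nest both fact tables under their shared protein dimension as hierarchical XML."""
    root = ET.Element("proteins")
    for p in proteins:
        node = ET.SubElement(root, "protein", id=p.protein_id, name=p.name)
        for s in structures:
            if s.protein_id == p.protein_id:
                ET.SubElement(node, "structure", id=s.structure_id,
                              resolution=str(s.resolution_angstrom))
        for f in functions:
            if f.protein_id == p.protein_id:
                ET.SubElement(node, "function").text = f.annotation
    return root

# Hypothetical sample rows, for illustration only.
proteins = [ProteinDim("p1", "protein A"), ProteinDim("p2", "protein B")]
structures = [StructureFact("p1", "s1", 2.1), StructureFact("p1", "s2", 1.8)]
functions = [FunctionFact("p1", "binding"), FunctionFact("p2", "catalysis")]

root = to_xml(proteins, structures, functions)
# ElementTree XPath subset: select proteins that have at least one <structure> child.
with_structure = [p.get("id") for p in root.findall("protein[structure]")]
print(with_structure)  # ['p1']
```

Because both fact tables join through the same dimension key, each `<protein>` element gathers its structure and function facts in one subtree, which is what makes the hierarchical XML view convenient to query.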