GOODIES: Gene Ontology-based Data-mining Tool for Biological Interpretation and Functional Classification on a Group of Biological Entities

Sung Geun Lee1, Wan Seon Lee2
1sglee@istech21.com, Bioinformatics Unit, ISTECH Inc.; 2konan@istech21.com, Bioinformatics Unit, ISTECH Inc.

GOODIES is a Gene Ontology (GO)-based data-mining tool with intuitive visualization on a GO tree. Its algorithm uses the ontological structure of GO to interpret and classify groups of biological entities. Given a gene group, GOODIES takes into account that a gene can have multiple functions and selects the optimal candidate GO terms, out of combinatorially many choices, for the biological interpretation of the group. GOODIES has three main uses: biologically oriented cluster analysis of DNA microarray data, automated functional annotation via clustering, and functional categorization of biological objects. First, in cluster analysis of DNA microarrays, biologists primarily want to know how well clusters of expression profiles are associated with known functional categories and cellular processes. GOODIES performs this task in terms of GO, so it is complementary to statistical clustering methods. Second, the function of unknown genes can be putatively predicted through the cluster interpretation produced by GOODIES. The biological relationship within each cluster is quantitatively measured on GO terms by a newly defined metric, AverPd; clusters whose AverPd score is sufficiently low can then be used for functional assignment of the unknown genes they contain. Third, GOODIES can perform large-scale functional categorization of biological entities (e.g. ESTs, genes, and proteins) according to the GO annotations of each entity, which are extracted from reliable, curated databases. As more information about genomes accumulates, GOODIES will become an increasingly helpful data-mining tool in functional genomics.
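
The third use, functional categorization by GO annotation, can be illustrated with a minimal sketch. This is not the GOODIES algorithm, which the abstract does not specify. It only shows the general idea that an annotation to a GO term also implies annotation to that term's ancestors, so entities can be grouped under higher-level category terms. The term names, the gene names, and the functions `ancestors` and `categorize` are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy fragment of a GO-like DAG: child term -> parent terms.
PARENTS = {
    "biological_process": [],
    "metabolism": ["biological_process"],
    "signaling": ["biological_process"],
    "lipid_metabolism": ["metabolism"],
    "kinase_signaling": ["signaling"],
}

# Hypothetical annotations; one gene may carry several terms
# (the "multiple functionality" mentioned in the abstract).
ANNOTATIONS = {
    "geneA": ["lipid_metabolism"],
    "geneB": ["kinase_signaling", "lipid_metabolism"],
    "geneC": ["metabolism"],
}


def ancestors(term, parents):
    """Return the term itself plus every ancestor reachable in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def categorize(annotations, parents, categories):
    """Group entities under each category term they reach via ancestors."""
    groups = defaultdict(set)
    for entity, terms in annotations.items():
        reachable = set()
        for term in terms:
            reachable |= ancestors(term, parents)
        for category in categories:
            if category in reachable:
                groups[category].add(entity)
    return {category: sorted(members) for category, members in groups.items()}


result = categorize(ANNOTATIONS, PARENTS, ["metabolism", "signaling"])
print(result)
```

In this toy data, geneB falls under both categories because it carries two annotations, which is the situation a GO-based interpretation of a gene group has to handle.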