Hierarchical classification of cDNA libraries for gene expression analysis

Bumjin Kim1, Sanghyuk Lee2, Hyunjung Lee, Young-Ah Shin, Euiju Jung, Pora Kim
1unikbj@ewha.ac.kr, Division of Molecular Life Sciences, Ewha Womans University, Seoul 120-750, KOREA; 2sanghyuk@ewha.ac.kr, Division of Molecular Life Sciences, Ewha Womans University, Seoul 120-750, KOREA

Expressed sequence tags (ESTs) are a potential source of information on gene expression. We developed a hierarchical classification system for cDNA libraries to extract gene expression information in four categories: tissue type, pathology, developmental stage, and sex. Each category is organized as a hierarchical tree and represents the expression ontology for the corresponding information. Unlike other similar systems, our tissue category includes organ, tissue, and cell type information in a consistent fashion. Our classification integrates all available information from NCI’s CGAP, SANBI’s database of library classification, the MEROPS protease database, and cell line descriptions. We then carefully examined the library descriptions of approximately 8,200 human cDNA libraries contained in dbEST and assigned each library to the most detailed applicable node in each of the four categories. Information on the library dissection method, vector, restriction enzymes, and cell line was also collected. The resulting database, which we call EODB (expression ontology database), should provide a comprehensive dataset for investigating gene expression using EST data. UniGene clusters and our own EST clusters are currently being analyzed. We also provide a web-based tool for displaying the expression profiles of query genes and a system for searching for genes with desired expression properties. EODB is available at http://genome.ewha.ac.kr/EODB/.
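
As an illustration of how assigning each library to its most detailed node can support expression profiling, the following minimal sketch rolls EST counts up a small tissue tree. It is not the EODB implementation: the node names, library identifiers, counts, and the helper functions `ancestors` and `rollup_counts` are all hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical child -> parent map for a tiny tissue hierarchy
# (organ -> tissue -> cell type); the real EODB trees are not shown here.
TISSUE_PARENT = {
    "tissue": None,          # root of the tissue category
    "brain": "tissue",
    "cerebellum": "brain",
    "hippocampus": "brain",
    "liver": "tissue",
    "hepatocyte": "liver",
}


def ancestors(node, parent_map):
    """Return the node followed by all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent_map[node]
    return path


def rollup_counts(library_node, est_counts, parent_map):
    """Sum per-library EST counts for one gene into every node on the
    path from the library's assigned node to the root."""
    totals = defaultdict(int)
    for library, count in est_counts.items():
        for node in ancestors(library_node[library], parent_map):
            totals[node] += count
    return dict(totals)


# Hypothetical assignment of libraries to their most detailed tissue node.
library_node = {
    "libA": "cerebellum",
    "libB": "hippocampus",
    "libC": "hepatocyte",
    "libD": "brain",  # description gave no finer detail than the organ
}

# Hypothetical EST counts for a single query gene, per library.
est_counts = {"libA": 3, "libB": 1, "libC": 5, "libD": 2}

profile = rollup_counts(library_node, est_counts, TISSUE_PARENT)
for node in TISSUE_PARENT:
    print(node, profile.get(node, 0))
```

Because every library is attached to the deepest node its description supports, a query at a coarse level (for example "brain") automatically includes libraries annotated at finer levels, while libraries with less detailed descriptions still contribute at the level where they were assigned.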