Construction of the plant gene index system based on tissue-categorized EST sets

Seung-Jae Noh1, Cheol-Goo Hur2, Sung-Ho Goh, Ho-Jin Chung, Kyoung-Oak Choi
1sjnoh@kribb.re.kr, Korea Research Institute of Bioscience and Biotechnology; 2hurlee@kribb.re.kr, Korea Research Institute of Bioscience and Biotechnology

For a decade, computational analyses of the enormous public and private expressed sequence tag (EST) collections have been carried out on several model organisms to extract more valuable information for research fields such as gene discovery and functional genomics. One such effort is the construction of gene index systems based on EST clustering. In this research, we constructed web-based plant gene index systems for 9 principal plant model organisms, including Arabidopsis, rice, and soybean. After grouping the EST sets by species, by monocots vs. dicots, or by 11 distinct tissues according to the EST library information, we performed clustering using the stackPACK™ tools (version 2.1, SANBI, http://www.sanbi.ac.za/Dbases.html). Clustering 1,370,998 ESTs yielded 149,454 consensus sequences and 143,682 singletons. Of the 149,454 consensus sequences, 19,922 (13.3%) were assumed to be tissue-unique. In Arabidopsis, 1,283 (6.2%) of the 20,630 consensus sequences were tissue-unique. Each consensus sequence was subjected to sequence-homology searches against several public databases, such as UniGene and the non-redundant protein database, to deduce its molecular function, to assess whether it is a CDS candidate or a splice variant, and finally to categorize its gene function according to the Gene Ontology system. Our web-based system can be browsed through well-organized hyperlinks and searched by keyword or by a user's own sequence through a user-friendly web interface. Our data can be freely accessed and downloaded at http://plant.pdrc.re.kr/new_korea/genepool/Plant/index.html
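The tissue-unique counts reported above can be illustrated with a minimal sketch. It assumes that a consensus sequence is "tissue-unique" when every EST in its cluster comes from a library of the same tissue. The abstract does not spell out this criterion, so the rule, the function names, the cluster identifiers, and the toy data are all hypothetical and are not part of stackPACK.

```python
def tissue_unique_consensus(cluster_tissues):
    """Return {consensus_id: tissue} for clusters whose member ESTs
    all come from a single tissue (assumed definition of tissue-unique)."""
    unique = {}
    for consensus_id, tissues in cluster_tissues.items():
        distinct = set(tissues)
        if len(distinct) == 1:
            unique[consensus_id] = next(iter(distinct))
    return unique


def percent(part, total):
    """Percentage of part in total, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Hypothetical toy input: consensus ID -> tissue label of each member EST.
clusters = {
    "CN0001": ["root", "root", "root"],
    "CN0002": ["leaf", "flower"],
    "CN0003": ["seed", "seed"],
}

unique = tissue_unique_consensus(clusters)
print(unique)
print(percent(len(unique), len(clusters)))

# The same arithmetic applied to the counts reported in the abstract.
print(percent(19922, 149454), percent(1283, 20630))
```

With the reported counts, the second helper reproduces the 13.3% and 6.2% figures given in the text.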