A New Method of Block/Spot Indexing with Maximal epsilon-Regularity Point Set for Microarray Image Analysis

Hee-Jeing Jin1, Ho-Youl Jung2, Hyun-Kyung Lee, Choon-Hwan Lee, Hwan-Gue Cho
1hjjin@pearl.cs.pusan.ac.kr, Pusan National University; 2hyjung@ngri.re.kr, National Genome Research Institute

The analysis of microarray data has gained increasing attention in biological high-throughput study. It is very difficult to automatically analyze DNA chip images due to several problems such as spot position variation, spot shape and size irregularity, sample contamination, and global problems that affect multiple spots. We propose a novel block and spot indexing algorithm with the use of maximal epsilon-regularity(e-regularity) points sets. A sequence is regular when it is evenly spaced and collinear. Based on the definition of the regularity, the e-regularity is defined as follows: A sequence satisfies e-regularity if each point in the sequence can be displaced by at most e along each axis to yield a regular sequence. Regularity detection is an important problem in a number of domains as computer vision and scene analysis. It takes a long time to find all the maximal e-regular sequences. However, the block/spot indexing problem, fortunately, can be solved with a small subset of all the maximal e-regular sequences. Therefore, we proposed a modified maximal e-regularity algorithm for the microarray image analysis. Our method goes as follows: The system first constructs a points set by segmenting the given microarray image (geometric center of the spots). We use the points set to find maximal e-regular sequences for block/spot indexing. For the block indexing, the system computes the address of each block by using the rotational angle of the image and unit distance calculated as follows: 1) The microarray image is divided into cells whose sizes are 2e by 2e for the given e (epsilon) value. 2) We use the term valid cell to refer to a cell that contains at least one spot, and the center points of the valid cells constitute a points set for the computation of the maximal e-regularity. 3) Among the obtained maximal e-regular sequences, a few sequences are selected for the computation of the rotational angle and unit distance. The criterion of the selection is the shortness of the distance between the adjacent points in the sequences. 4) We construct a graph by using the points set of each row and column according to the rotational angle. If the distance between two points is shorter than several times (X times) of unit distance, the points are included in the graph. The graphs which have common vertices are connected and merged into a single graph. If the number of the final graphs is not equal to the number of blocks, the variable X is automatically adjusted by the system. The rotational angle and unit distance for the spot indexing can be computed with exactly the same processes used in the block indexing. Our analytic model is validated by a large body of experimental results. The time complexity of our algorithm is O(n2) where n is the number of cells.