Biclustering Microarray Data by Gibbs Sampling

Qizheng Sheng (1), Yves Moreau (2), Bart De Moor
(1) qizheng.sheng@esat.kuleuven.ac.be, Katholieke Universiteit Leuven, Department of Electrical Engineering
(2) yves.moreau@esat.kuleuven.ac.be, Katholieke Universiteit Leuven, Department of Electrical Engineering

Gibbs sampling has become a method of choice for the discovery of noisy patterns, known as motifs, in DNA and protein sequences. Because handling noise in microarray data presents similar challenges, we have adapted this strategy to the biclustering of discretized microarray data. In contrast with standard clustering, which reveals genes that behave similarly over all conditions, biclustering groups genes over only a subset of conditions for which those genes have a sharp probability distribution. Gibbs sampling has the key advantage of providing a transparent probabilistic interpretation of the biclusters. Furthermore, Gibbs sampling does not suffer from the problem of local minima that often characterizes Expectation-Maximization. We demonstrate the effectiveness of our approach on a synthetic data set as well as on a data set from leukemia patients.