Scalable multi-processor application for gene expression profile clustering

Andrey Ptitsyn, Pennington Biomedical Research Center

Cluster analysis is widely used in functional genomics to analyse the expression profiles of thousands of genes across multiple conditions. Available clustering algorithms are oriented primarily towards recognizing genes that show coordinated expression, i.e. genes that are up- or down-regulated in the same samples. We take the next step, from recognizing clusters of similar profiles to recognizing patterns of interacting gene groups in expression space. The algorithms developed at PBRC can reproduce the results of hierarchical tree clustering, but with a change of parameters the same application can be set up to recognize looser associations of objects, based on geometrical shape, density and other properties. With the rapid growth in the volume of microarray data, clustering of expression profiles is becoming increasingly computationally demanding. We report a high-performance application able to cluster a number of gene expression profiles limited only by the number of genes in the organism. The potential number of conditions (for example, microarray experiments) is likewise limited by the availability of data rather than by computational power. The algorithms are implemented as a scalable multi-processor application for high-performance computers and Beowulf clusters. The program is written using the MPI standard and was tested in the IBM AIX Parallel Environment. Testing was done on the LSU High Performance Computing cluster of IBM RS6000 systems. The same clustering application can be used in a broad variety of research fields involving cluster analysis. The source code is freely available to the academic community.
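The abstract does not describe the PBRC algorithms in detail, so the following is only an illustrative sketch of the general idea of splitting profile clustering across processes. It is not the authors' implementation. It assumes a Pearson correlation distance, a single distance threshold, and single-linkage merging; all function names are hypothetical. The loop over ranks stands in for MPI processes: in an MPI program each rank would compute its own share of the profile pairs and the results would be gathered on one rank before merging.

```python
from math import sqrt


def pearson_distance(a, b):
    """Return 1 - Pearson correlation of two equal-length profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # assumption: a flat profile is treated as uncorrelated
    return 1.0 - cov / sqrt(var_a * var_b)


def rows_for_rank(rank, size, n):
    """Rows handled by one process. Cyclic assignment balances the
    triangular workload (early rows have more partners than late rows)."""
    return range(rank, n, size)


def partial_edges(profiles, rank, size, threshold):
    """Work of a single process: find pairs (i, j), i < j, within threshold."""
    n = len(profiles)
    edges = []
    for i in rows_for_rank(rank, size, n):
        for j in range(i + 1, n):
            if pearson_distance(profiles[i], profiles[j]) <= threshold:
                edges.append((i, j))
    return edges


def merge_clusters(n, edges):
    """Union-find over the gathered edges (single-linkage cut at the threshold).
    Each profile is labelled with the smallest index in its cluster."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(n)]


def cluster_profiles(profiles, size, threshold):
    """Simulate `size` processes serially, then merge their results."""
    edges = []
    for rank in range(size):  # stands in for parallel ranks plus a gather step
        edges.extend(partial_edges(profiles, rank, size, threshold))
    return merge_clusters(len(profiles), edges)


if __name__ == "__main__":
    # Made-up toy profiles: five "genes" measured under four conditions.
    toy = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [8, 6, 4, 2], [1, 3, 2, 4]]
    print(cluster_profiles(toy, size=3, threshold=0.1))
    print(cluster_profiles(toy, size=3, threshold=0.3))
```

In this sketch, raising the threshold plays the role of the "change of parameters" mentioned above: a tight threshold keeps only closely correlated profiles together, while a looser one lets more distant profiles join a cluster. The result does not depend on the number of simulated processes, since every pair is examined exactly once regardless of how rows are assigned.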