C. elegans microarray data seen through a novel nonmetric multidimensional scaling method

Y-h. Taguchi1, Y. Oono2
1tag@granular.com, Department of Physics, Chuo University; 2y-oono@uiuc.edu, Department of Physics, UIUC

We have developed a novel nonmetric multidimensional scaling method that does not require the intermediate distances usually introduced for a monotone fit. Consequently, the algorithm is maximally nonmetric and is efficient enough to allow large-scale data analysis on a small computer.
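To make the idea of a purely rank-based comparison concrete, the following is a minimal sketch, not the authors' algorithm: it scores an embedding by comparing the rank order of the input dissimilarities directly with the rank order of the embedded distances, without fitting any intermediate distances. The function names `ranks` and `rank_mismatch` and the squared-rank-difference score are assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def ranks(values):
    """Return the rank (0 = smallest) of each value; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def rank_mismatch(dissimilarity, points):
    """Sum of squared differences between dissimilarity ranks and distance ranks.

    dissimilarity: symmetric square matrix (list of lists), one row per object.
    points: embedded coordinates, one tuple per object.
    Illustrative score only (an assumption), not the authors' objective function.
    """
    pairs = list(combinations(range(len(points)), 2))
    target = ranks([dissimilarity[i][j] for i, j in pairs])
    embedded = ranks([dist(points[i], points[j]) for i, j in pairs])
    return sum((t - e) ** 2 for t, e in zip(target, embedded))
```

Because only ranks enter the score, any strictly increasing transformation of the dissimilarities leaves it unchanged, which is the sense in which such a comparison is nonmetric.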
The C. elegans microarray data available at http://www.sciencemag.org/feature/data/kim1061603/kimbig.zip have been analyzed by this method. We have found that the information captured in the correlation coefficients of the gene expression levels can be visualized in a 3D space as a thick spherical shell. The dimensionality of this space is uniquely specified by the data. The locations of the genes are consistent with their known annotations and are stable against the choice of experiments and genes used in the analysis, provided that more than 100 randomly chosen experiments and more than 1000 genes are used. The 3D coordinates of the embedded genes turn out to be expressible as linear combinations of a small number of the original microarray experiments. The source code, written in Fortran, is available at http://www.granular.com/MDS/.
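As an illustration of how correlation coefficients of expression levels can serve as input to a scaling method, the sketch below builds a gene-by-gene dissimilarity matrix from expression profiles. The choice of 1 - r as the dissimilarity, the function names, and the data layout (one row per gene, one column per experiment) are assumptions made for this example and are not taken from the analysis described above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_dissimilarity(expression):
    """Symmetric matrix of 1 - r for every pair of genes.

    expression: list of rows, one per gene, each a list of expression
    levels across experiments (hypothetical layout for illustration).
    """
    n = len(expression)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = 1.0 - pearson(expression[i], expression[j])
    return d
```

With this convention, perfectly correlated genes have dissimilarity 0 and perfectly anticorrelated genes have dissimilarity 2; a nonmetric method would use only the ordering of these values.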