A multivariate method for comparison of microarray data from different platforms

Aedin C. Culhane¹, Guy Perrière², Desmond G. Higgins
¹ University College Cork (A.Culhane@ucc.ie); ² Université Claude Bernard (perriere@biomserv.univ-lyon.fr)

The rapid development of microarray technology has led laboratories to adopt many different protocols and technological platforms, which severely limits the comparability of microarray results. Current meta-analyses of microarray gene expression data usually cross-reference the annotation of each spot on each array, extract the genes common to all arrays and compare the expression data of those genes only. Filtering to the subset of genes represented on every array often excludes many thousands of genes. This is because different microarrays represent different subsets of the genome: it is not yet technically possible to represent the entire human genome, together with all possible splice variants, on a single microarray chip.

We describe the application of a simple yet powerful method for cross-platform comparison of gene expression data. Co-inertia analysis (CIA; Dolédec and Chessel, 1994) is a multivariate method that identifies common trends or co-relationships between datasets. CIA simultaneously finds ordinations (dimension-reduction diagrams) of the datasets that are maximally similar. It does this by finding successive pairs of axes, one for each dataset, along which the projections of the two datasets have maximum covariance. CIA can be applied to datasets in which the number of variables (genes) far exceeds the number of samples (arrays), as is the case in microarray analyses. In addition, CIA accepts datasets with different numbers of variables, for example datasets containing different numbers of genes, so genes do not have to be filtered in advance. Because the datasets are linked through their common samples rather than through gene annotation, CIA of microarray datasets is not sensitive to array annotation errors.
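To make the description above concrete, here is a minimal Python/NumPy sketch of CIA under simplifying assumptions: each dataset is an (arrays × genes) matrix with the same arrays in the same row order, all arrays are weighted equally, and each table is treated as a centred PCA. ADE-4's implementation can couple other ordinations (for example correspondence analysis), so this is an illustration rather than the authors' code; the function name `coinertia` and the simulated data are hypothetical. The co-inertia axes are the singular vectors of the cross-covariance matrix between the two gene sets, which is why the tables may have different, and very large, numbers of genes.

```python
import numpy as np


def coinertia(X, Y, n_axes=2):
    """Sketch of co-inertia analysis (CIA) for two tables with the same samples.

    X: (n_samples, p_genes) and Y: (n_samples, q_genes); p and q may differ,
    and both may be much larger than n_samples. Assumes uniform sample weights
    and unit gene weights, i.e. CIA coupling two centred PCAs (assumption).
    """
    n = X.shape[0]
    if Y.shape[0] != n:
        raise ValueError("X and Y must have the same samples (rows) in the same order")
    Xc = X - X.mean(axis=0)  # centre every gene (column) of each dataset
    Yc = Y - Y.mean(axis=0)
    # Cross-covariance between the genes of X and the genes of Y: a p x q matrix.
    C = Xc.T @ Yc / n
    # Its singular vectors are the co-inertia axes: u_k, v_k maximise
    # cov(Xc @ u, Yc @ v) = u^T C v subject to ||u|| = ||v|| = 1,
    # each pair orthogonal to the previous pairs.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    u = U[:, :n_axes]  # gene loadings on the axes, dataset X
    v = Vt[:n_axes].T  # gene loadings on the axes, dataset Y
    return Xc @ u, Yc @ v, u, v, s[:n_axes]


if __name__ == "__main__":
    # Hypothetical data: 60 samples on two platforms with 500 and 800 genes,
    # both driven by the same two latent sample patterns plus noise.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(60, 2))
    X = latent @ rng.normal(size=(2, 500)) + rng.normal(scale=0.5, size=(60, 500))
    Y = latent @ rng.normal(size=(2, 800)) + rng.normal(scale=0.5, size=(60, 800))
    x_scores, y_scores, u, v, cov = coinertia(X, Y)
    print(x_scores.shape, y_scores.shape)  # (60, 2) (60, 2)
    # Matching sample scores from the two platforms should be strongly correlated here.
    print(np.corrcoef(x_scores[:, 0], y_scores[:, 0])[0, 1])
```

Forming the p × q cross-covariance matrix keeps the sketch readable but is memory-hungry for whole-genome arrays; because both tables have only n rows, the same axes can also be obtained from the SVDs of the two centred tables without building that matrix.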

We illustrate the power of CIA by using it to identify the main common relationships in expression data from the National Cancer Institute (NCI) panel of 60 cell lines, which has been profiled with both Affymetrix and spotted cDNA arrays. The CIA coordinates of the 60 cell lines from each dataset are plotted together in a bi-plot, and the two points representing each cell line are joined by a line whose length indicates the divergence between the two datasets for that cell line. CIA therefore provides a graphical representation of consensus and divergence between the gene expression profiles from different microarray platforms. Second, the genes that define the main trends in the analysis can easily be identified. For example, the colon, leukaemia and melanoma cell lines defined the first two principal axes of the CIA of the NCI60 cell lines. Of the 60 most important genes on these axes, only 13 were represented on both the Affymetrix and the spotted cDNA arrays. We believe CIA is a robust and flexible method for the comparison and visualisation of multiple gene expression datasets.
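The bi-plot lines and the gene ranking described above can be sketched in the same way. This hypothetical sketch repeats a compact CIA (two centred PCAs, equal array weights), measures each sample's line as the Euclidean distance between its two positions after scaling each set of scores to unit variance per axis, and ranks genes by their distance from the origin on the retained axes. The scaling, the ranking rule, the function names and the gene identifiers are all assumptions for illustration; the abstract does not state how ADE-4 scales the plot or how the 60 most important genes were selected.

```python
import numpy as np


def coinertia(X, Y, n_axes=2):
    """Minimal CIA (two centred PCAs, uniform sample weights); rows = shared samples."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc.T @ Yc / X.shape[0], full_matrices=False)
    u, v = U[:, :n_axes], Vt[:n_axes].T
    return Xc @ u, Yc @ v, u, v


def divergence(x_scores, y_scores):
    """Length of the line joining each sample's two positions in the bi-plot.

    Each set of scores is first scaled to unit variance per axis so the two
    projections share a common scale (assumption; ADE-4 may scale differently).
    """
    xn = x_scores / x_scores.std(axis=0)
    yn = y_scores / y_scores.std(axis=0)
    return np.linalg.norm(xn - yn, axis=1)


def top_genes(loadings, gene_ids, k):
    """Return the k genes lying furthest from the origin on the retained axes
    (a simple, hypothetical ranking rule)."""
    order = np.argsort(-np.linalg.norm(loadings, axis=1))[:k]
    return [gene_ids[i] for i in order]


if __name__ == "__main__":
    # Hypothetical data: 60 samples, two platforms that share only some genes.
    rng = np.random.default_rng(1)
    latent = rng.normal(size=(60, 2))
    ids_x = [f"gene{i}" for i in range(0, 500)]     # platform X: gene0..gene499
    ids_y = [f"gene{i}" for i in range(300, 1100)]  # platform Y: gene300..gene1099
    X = latent @ rng.normal(size=(2, 500)) + rng.normal(scale=0.5, size=(60, 500))
    Y = latent @ rng.normal(size=(2, 800)) + rng.normal(scale=0.5, size=(60, 800))
    xs, ys, u, v = coinertia(X, Y)
    d = divergence(xs, ys)
    print("most divergent samples:", np.argsort(-d)[:5])
    # Check which top-ranked genes of platform X are also represented on platform Y.
    on_both = set(ids_x) & set(ids_y)
    top_x = top_genes(u, ids_x, 30)
    print(sum(g in on_both for g in top_x), "of", len(top_x), "top X genes are on both platforms")
```

Because CIA never needs the gene lists to match, checking whether a top-ranked gene exists on the other platform is a separate step done only after the analysis, which is how a small overlap between platforms can be detected.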

CIA is implemented in the ADE-4 software, which is freely available for MacOS 7 and Windows. ADE-4 is also distributed as an R package. Further details about CIA are available on our web site.

Dolédec, S. & Chessel, D. (1994) Co-inertia analysis: an alternative method for studying species-environment relationships. Freshwater Biology, 31, 277-294.