Discovering biological knowledge from gene expression using association rules

P. Carmona-Saez, M. Chagoyen, A. Rodriguez, O. Trelles, J.M. Carazo and A. Pascual-Montano. National Center of Biotechnology, Madrid

Motivation: DNA microarray technology allows the simultaneous measurement of thousands of genes in tens or hundreds of different conditions. This approach has been a revolution in biology, providing a methodology for studying biological processes on a genomic scale. Many data mining techniques are being explored to extract biological information from microarray data. One method recently reported in this context is association rule discovery, which can be used to find subsets of frequent items or events and establish directed relations between them, for example, “if process A occurs then process B also occurs”. Application of this technique to microarray data analysis has so far focused only on finding relations among genes that are frequently co-expressed. In this work we have explored the potential of the method to find other types of relationships between genes and conditions that frequently appear together in the dataset. Examples are relationships among genes at different levels: expression at each experimental condition or sample, biological processes, metabolic pathways, promoter elements, chromosomal localization, clusters, relevant keywords, or any other type of annotation or description of the genes. This approach could be used to extract relevant biological information from the dataset and to answer biological questions such as:
- What biological processes are acting, or which molecular pathways are activated or inhibited, when a cell or organism is subjected to different conditions?
- Which genes with similar promoter elements are co-expressed in a subset of experimental conditions?
- Which experimental conditions elicit the same cell response?

Results: We report that association rule discovery can be applied to microarray data analysis to establish relations between any type of gene attribute and experimental conditions. To this end we have used the data set generated by Spellman et al. (Mol. Biol. Cell. 1998), which contains the transcriptional profile of a cell cycle time course analysis in yeast. We have found that the resulting rules provide relevant biological information and that the method can automatically explore the dataset to find the subsets of experiments in which a percentage of genes with a certain function are up- or down-regulated. In addition, new important relationships between function and experimental conditions are also reported.

Availability: An implementation of the method is included in the Engene software package, which is freely accessible upon request at
Contact:
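
The kind of rule described under Motivation, linking a gene annotation to an expression state in a given condition, can be illustrated with a minimal sketch. It is not the Engene implementation. The gene names, annotations, condition labels (`t1`, `t2`), thresholds and function names are all hypothetical, and it uses the standard definitions of support (the fraction of genes containing all items of a rule) and confidence (support of the whole rule divided by support of its antecedent). Only rules with one item on each side are enumerated.

```python
# Hypothetical toy data: one "transaction" per gene. Each item is either an
# annotation or a discretized expression state in a named condition.
transactions = {
    "geneA": {"process=DNA replication", "t1=up", "t2=up"},
    "geneB": {"process=DNA replication", "t1=up", "t2=down"},
    "geneC": {"process=DNA replication", "t1=up"},
    "geneD": {"process=ribosome biogenesis", "t1=down", "t2=down"},
    "geneE": {"process=ribosome biogenesis", "t2=down"},
}


def support(itemset, transactions):
    """Fraction of genes whose item set contains every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent)."""
    sup_antecedent = support(antecedent, transactions)
    if sup_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / sup_antecedent


def mine_rules(transactions, min_support=0.4, min_confidence=0.8):
    """Enumerate single-item -> single-item rules that pass both thresholds."""
    items = sorted(set().union(*transactions.values()))
    rules = []
    for a in items:
        for b in items:
            if a == b:
                continue
            sup = support({a, b}, transactions)
            conf = confidence({a}, {b}, transactions)
            if sup >= min_support and conf >= min_confidence:
                rules.append((a, b, sup, conf))
    return rules


if __name__ == "__main__":
    for a, b, sup, conf in mine_rules(transactions):
        print(f"if {a} then {b} (support={sup:.2f}, confidence={conf:.2f})")
```

In this toy data the rule "if process=DNA replication then t1=up" passes both thresholds, while "if t2=down then process=ribosome biogenesis" is rejected on confidence. This shows why the rules are directed rather than symmetric. A practical miner would use a frequent-itemset algorithm such as Apriori instead of exhaustive pair enumeration.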