Statistical Promoter Regulatory Element Analysis of cDNA Microarray Data For the Prediction of cAMP Responsive Genes

Lyle D Burgoon1,2, Ken Y Kwan2,3, Tim R Zacharewski2,3
1 Dept of Pharmacology & Toxicology; 2 Institute for Environmental Toxicology, National Food Safety & Toxicology Center; 3 Dept of Biochemistry & Molecular Biology

cDNA microarrays facilitate the study of the mechanisms of chemical activity within cells and tissues. Changes in gene expression patterns result from alterations in transcription factor binding and recruitment of corepressors and coactivators following chemical exposure. Temporal microarray analysis of 3,000 genes from the human glial cell line SVG, in response to forskolin (F) and 3-isobutyl-1-methylxanthine (IBMX) treatment, has identified genes that are active in the treated cells over time compared to vehicle controls. F/IBMX treatment of cells increases intracellular cAMP levels, leading to changes in cAMP response element (CRE) binding by the CRE binding protein (CREB) and to modulation of gene expression. The Statistical Promoter Regulatory Element (SPREE) application has been developed to predict transcription factor binding sites within the promoter and enhancer regions of genes by calculating a total adjusted log of the odds (LOD) score, based on a position weight matrix analysis of the 5,000 bases upstream of the transcription start site. SPREE, coupled with Support Vector Machine (SVM) analysis of predicted response elements and microarray gene expression, has identified putative cAMP-responsive genes. The SPREE total adjusted LOD score, the number of predicted elements, and the normalized microarray data are passed to an SVM with a radial basis kernel for analysis. The SVM outperformed the Fuzzy C-Means, Unsupervised Fuzzy Competitive Learning, and k-Means methods on the test set, with positive model validation by leave-one-out cross-validation for all methods except k-Means.
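The position weight matrix scoring described above can be illustrated with a minimal Python sketch. It is not the SPREE implementation: the count matrix, the uniform background, the pseudocount, the hit threshold, and the function names `build_log_odds` and `scan` are all assumptions made for illustration, and the "adjustment" applied to the SPREE total LOD score is not specified in this abstract, so the sketch simply sums the log-odds scores of windows above a threshold.

```python
import math

BASES = "ACGT"

def build_log_odds(counts, background=None, pseudocount=1.0):
    """Convert per-position base counts into log10-odds scores vs. a background."""
    background = background or {b: 0.25 for b in BASES}  # assumed uniform background
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        matrix.append({
            b: math.log10(((column[b] + pseudocount) / total) / background[b])
            for b in BASES
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Score every window of the sequence; keep (position, score) above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score > threshold:
            hits.append((start, score))
    return hits

# Hypothetical counts for an 8-bp motif built around the CRE consensus TGACGTCA.
consensus = "TGACGTCA"
toy_counts = [{b: (18 if b == c else 1) for b in BASES} for c in consensus]
toy_matrix = build_log_odds(toy_counts)

# Toy stand-in for the upstream region (the abstract scans 5,000 bases).
upstream = "GGGCTGACGTCAAAAA"
hits = scan(upstream, toy_matrix, threshold=2.0)  # threshold is an assumption

n_elements = len(hits)                       # number of predicted elements
total_lod = sum(score for _, score in hits)  # unadjusted total; SPREE's adjustment is unknown
```

In this sketch, `n_elements` and `total_lod` play the role of the two promoter-derived features that the abstract passes to the classifier alongside the normalized expression data.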
The sensitivity of the SVM, Fuzzy C-Means, Unsupervised Fuzzy Competitive Learning, and k-Means models for identifying cAMP-responsive genes was 1.0, 0.8, 0.8, and 0.2, respectively. The SVM-generated model classified 66 of 84 genes with early active gene expression patterns as cAMP-responsive, with an average of 3.64 ± 1.44 (mean ± standard deviation; p < 0.001) predicted CREs and a mean total adjusted LOD score of 1.46 ± 1.08 (p < 0.001), compared with a mean of 1.12 ± 0.86 predicted CREs and a mean total adjusted LOD score of 0.37 ± 0.32 for the non-cAMP-responsive genes.
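The two evaluation steps mentioned here, leave-one-out cross-validation and sensitivity, can be sketched in plain Python. This is only an illustration under stated assumptions: the feature rows are made up, the helper names are hypothetical, and a nearest-centroid classifier stands in for the radial basis kernel SVM so that the example needs no external library.

```python
def sensitivity(true_labels, predicted_labels, positive=1):
    """Fraction of truly positive items predicted positive: TP / (TP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive examples to evaluate")
    return tp / (tp + fn)

def leave_one_out(features, labels, fit, predict):
    """Hold out each sample once, train on the rest, predict the held-out sample."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        predictions.append(predict(model, features[i]))
    return predictions

# Stand-in classifier (nearest class centroid), NOT the SVM used in the abstract.
def fit_centroids(xs, ys):
    centroids = {}
    for label in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_centroid(centroids, x):
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Made-up rows: [total LOD score, number of predicted elements]; 1 = cAMP-responsive.
toy_x = [[1.5, 4], [1.2, 3], [1.8, 5], [0.3, 1], [0.4, 1], [0.2, 0]]
toy_y = [1, 1, 1, 0, 0, 0]

loo_predictions = leave_one_out(toy_x, toy_y, fit_centroids, predict_centroid)
loo_sensitivity = sensitivity(toy_y, loo_predictions)
```

Because `leave_one_out` takes the `fit` and `predict` functions as arguments, any classifier, including an SVM from a machine learning library, could be substituted for the stand-in without changing the evaluation loop.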