Efficiently finding regulatory elements using correlation with gene expression

Hideo Bannai [1], Shunsuke Inenaga [2], Ayumi Shinohara, Masayuki Takeda, Satoru Miyano
[1] bannai@ims.u-tokyo.ac.jp, Human Genome Center, Institute of Medical Science, University of Tokyo; [2] s-ine@i.kyushu-u.ac.jp, Department of Informatics, Kyushu University

We present an efficient algorithm for detecting putative regulatory elements in the upstream DNA sequences of genes, using gene expression information obtained from microarray experiments. Based on a generalized suffix tree, our algorithm searches for motif patterns whose occurrence in the upstream region is most strongly correlated with the expression levels of the genes. The optimal pattern is found in time linear in the total length of the upstream sequences. We implement the algorithm and apply it to publicly available microarray gene expression data, and show that the method discovers biologically significant motifs, including several motifs previously reported for the same dataset. We further discuss applications for which the efficiency of the method is essential.
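
To make the search problem concrete, the following is a minimal brute-force sketch in Python, not the suffix-tree algorithm of this paper. It enumerates substrings of the upstream sequences and scores each one by how well its presence separates the expression values. The function name `best_substring_pattern`, the length bounds, the toy data, and the score itself (absolute difference in mean expression between genes that contain the pattern and genes that do not) are all assumptions made for illustration; the abstract does not specify the scoring function, and this enumeration is not linear time.

```python
from statistics import mean


def best_substring_pattern(upstreams, expression, min_len=2, max_len=6):
    """Brute-force illustration (hypothetical helper, not the paper's method).

    upstreams:  list of upstream DNA strings, one per gene
    expression: list of expression levels, one per gene
    Returns (pattern, score) for the best-scoring substring, where the
    score is an ASSUMED measure: |mean(expr with) - mean(expr without)|.
    """
    # Map each substring to the set of genes whose upstream region contains it.
    occurrences = {}
    for gene, seq in enumerate(upstreams):
        for k in range(min_len, max_len + 1):
            for start in range(len(seq) - k + 1):
                occurrences.setdefault(seq[start:start + k], set()).add(gene)

    n = len(upstreams)
    best_pattern, best_key = None, (0.0, 0)
    for pattern, genes in occurrences.items():
        if len(genes) == n:
            continue  # pattern occurs everywhere, so it cannot split the genes
        with_pat = mean(expression[g] for g in genes)
        without_pat = mean(expression[g] for g in range(n) if g not in genes)
        # Ties in score are broken in favour of the longer pattern.
        key = (abs(with_pat - without_pat), len(pattern))
        if key > best_key:
            best_pattern, best_key = pattern, key
    return best_pattern, best_key[0]


# Toy data (made up): the two highly expressed genes share the substring ACGT.
toy_upstreams = ["ACGTGA", "TTACGT", "GGGCCC", "CATATA"]
toy_expression = [2.0, 2.2, 0.1, 0.0]
pattern, score = best_substring_pattern(toy_upstreams, toy_expression)
print(pattern, score)
```

A generalized suffix tree avoids this explicit enumeration by representing all substrings of the input compactly, which is what makes a search in time linear in the total sequence length possible.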