Identification of putative transcription factor binding sites conserved across orthologous human, mouse and rat sequences

Alex Gout1, Tim Beissbarth2, Joelle Michaud, Catherine Carmichael, Matthew Ritchie, Gordon K. Smyth, Terry Speed, Hamish S. Scott
1gout@wehi.edu.au, The Walter and Eliza Hall Institute of Medical Research; 2beissbarth@wehi.edu.au, The Walter and Eliza Hall Institute of Medical Research

Transcription factors bind transcription factor binding sites (TFBSs) in regulatory elements, leading to interaction with the basal transcription apparatus (TATA-binding protein, TFIIA, TFIIB, TFIIF, TFIIE, TFIIH and RNA polymerase II) and transcriptional initiation of the target gene. Sequence data from genome projects and data generated by high-throughput genomic techniques, such as microarrays, both require additional annotation to allow biological interpretation and to add value to the dataset. For example, knowledge of the patterns of TFBSs within the upstream regions of differentially expressed genes from microarray experiments may provide insight into transcriptional regulation and transcription factor interactions, and may help identify regulatory genetic networks.

However, to date, no readily queryable resource exists for observing potential TFBSs present within such large datasets. We have constructed a database of predicted TFBSs that are conserved between pairs or triples of orthologous human, mouse and rat genes. After the upstream sequences were obtained, the Match search tool was used in conjunction with the TRANSFAC database of TFBS matrices to identify potential TFBSs. Following an alignment of the orthologous upstream regions using MAVID, the positions of potential TFBSs were adjusted to the alignment coordinates and then used to determine whether each TFBS is positionally conserved across the two or three species.
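
The position adjustment and conservation check described above can be illustrated with a minimal sketch. It is not the pipeline used to build the database: it assumes that hits are given as (matrix name, 0-based start) pairs on the ungapped sequences, that gaps are written as "-", and that a site counts as positionally conserved when the same matrix starts at the same alignment column in every species considered. The function names, the matrix names M_A and M_B, the toy sequences and the tolerance parameter are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Hit = Tuple[str, int]  # (matrix name, 0-based start on the ungapped sequence)


def ungapped_to_alignment(aligned_seq: str, gap: str = "-") -> List[int]:
    """Return, for each ungapped position, its column in the alignment."""
    return [col for col, ch in enumerate(aligned_seq) if ch != gap]


def adjust_hits(hits: List[Hit], aligned_seq: str) -> Set[Hit]:
    """Translate hit starts from ungapped coordinates to alignment columns."""
    col_of = ungapped_to_alignment(aligned_seq)
    return {(name, col_of[start]) for name, start in hits}


def conserved_sites(
    hits_by_species: Dict[str, List[Hit]],
    alignment: Dict[str, str],
    tolerance: int = 0,
) -> List[Hit]:
    """Keep hits whose matrix occurs at the same column in all species.

    Only the species present in hits_by_species are compared, so the same
    function handles pairs and triples. The first species is the reference.
    """
    adjusted = {
        sp: adjust_hits(hits, alignment[sp])
        for sp, hits in hits_by_species.items()
    }
    species = list(adjusted)
    reference, others = species[0], species[1:]
    conserved = []
    for name, col in sorted(adjusted[reference]):
        if all(
            any(n == name and abs(c - col) <= tolerance for n, c in adjusted[sp])
            for sp in others
        ):
            conserved.append((name, col))
    return conserved


# Toy alignment of three upstream regions (hypothetical sequences).
alignment = {
    "human": "ACGT--TTGACGCA",
    "mouse": "ACGTAATTGACGCA",
    "rat":   "A-GTAATTGACGCA",
}

# Hypothetical hits in ungapped coordinates. M_A marks the same aligned
# site in all three species; M_B occurs at different columns.
hits = {
    "human": [("M_A", 5), ("M_B", 0)],
    "mouse": [("M_A", 7), ("M_B", 4)],
    "rat":   [("M_A", 6)],
}

triple = conserved_sites(hits, alignment)
pair = conserved_sites({sp: hits[sp] for sp in ("human", "mouse")}, alignment)
print(triple)  # [('M_A', 7)]
print(pair)    # [('M_A', 7)]
```

In this toy example the three M_A hits start at different ungapped positions (5, 7 and 6) but map to the same alignment column (7), so the site is reported as conserved, while M_B is not. Setting the tolerance above zero would accept small positional shifts; whether such a tolerance is appropriate is a design choice not specified here.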