QSAR Analysis of Transcription Factors

Akinori Sarai1, Samuel Selvaraj2, Michael M. Gromiha, Hidetoshi Kono
1sarai@bse.kyutech.ac.jp, KIT; 2sel_emi@yahoo.co.uk, Bharathidasan University

Transcription factors play an essential role in gene regulation in higher organisms. The complete genome sequences of many organisms have opened up the possibility of systematic analysis of gene regulation at the genome level. Transcription factors usually bind to multiple target sequences and regulate multiple genes in a complex manner. To understand the molecular mechanism of target recognition, and to predict the target genes of transcription factors at the genome level, it is important to analyze the relationship between the structure and the function (activity) of transcription factors. Such quantitative structure-activity relationship (QSAR) analysis would also provide useful information for developing drugs that target transcription factors. We use two kinds of approaches. The first is a knowledge-based approach that exploits the rapidly increasing structural data on protein-DNA complexes. We have derived empirical potential functions for the specific interactions between bases and amino acids from a statistical analysis of these structural data. These statistical potentials are then used to evaluate the fitness of sequences to the complex structures of particular transcription factors by a combinatorial threading procedure similar to that used in fold recognition of protein structures. By threading a set of random DNA sequences onto the template structure, we calculate the Z-score of a specific sequence against the random sequences, which represents the specificity of the complex. This quantification of specificity has enabled us to establish the QSAR analysis of transcription factors. We have applied the QSAR analysis to questions such as the roles of structural deformation of DNA and of cooperativity in target recognition. The threading procedure is also applied to real genome sequences to find potential target sites, and we have attempted to predict the targets of particular transcription factors in the yeast genome. The results suggest that the method is capable of predicting experimentally known target genes and binding sites correctly. The second approach uses computer simulations of base-amino acid interactions to quantify the specificity of protein-DNA recognition. By combining the two approaches, we aim to improve the QSAR analysis of transcription factors.
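
The threading and Z-score calculation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The contact list, the potential values, and the function names (`threading_energy`, `z_score`, `scan_sequence`) are hypothetical, and the potential here depends only on the base and amino-acid identity, whereas potentials derived from structural data would normally also depend on geometry. The sketch further assumes that lower energy means a better fit, so a more negative Z-score indicates higher specificity.

```python
import random
import statistics

BASES = "ACGT"

# Hypothetical template: (DNA position, contacting amino acid) pairs.
# In a real application these would come from a protein-DNA complex structure.
TOY_CONTACTS = [(0, "ARG"), (1, "ASN"), (2, "LYS")]

# Hypothetical base-amino acid potential (arbitrary units, lower = more favorable).
# Every pair defaults to 0.0; three pairs are made favorable for illustration.
TOY_POTENTIAL = {(b, aa): 0.0 for b in BASES for aa in ("ARG", "ASN", "LYS")}
TOY_POTENTIAL.update({("G", "ARG"): -2.0, ("A", "ASN"): -1.5, ("G", "LYS"): -1.0})


def threading_energy(seq, contacts, potential):
    """Sum the potential over all template contacts for a threaded sequence."""
    return sum(potential[(seq[pos], aa)] for pos, aa in contacts)


def z_score(target, contacts, potential, n_random=2000, seed=0):
    """Z-score of the target's energy against randomly generated sequences."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    e_target = threading_energy(target, contacts, potential)
    random_energies = [
        threading_energy(
            "".join(rng.choice(BASES) for _ in target), contacts, potential
        )
        for _ in range(n_random)
    ]
    mean = statistics.fmean(random_energies)
    sd = statistics.pstdev(random_energies)
    return (e_target - mean) / sd


def scan_sequence(genome, site_len, contacts, potential):
    """Thread every window of a longer sequence; return (energy, start) sorted best first."""
    hits = [
        (threading_energy(genome[i:i + site_len], contacts, potential), i)
        for i in range(len(genome) - site_len + 1)
    ]
    return sorted(hits)


if __name__ == "__main__":
    print("Z-score of GAG:", z_score("GAG", TOY_CONTACTS, TOY_POTENTIAL))
    print("Best window:", scan_sequence("TTTGAGTTT", 3, TOY_CONTACTS, TOY_POTENTIAL)[0])
```

In this toy setting the sequence GAG matches all three favorable contacts, so it scores well below the random background, and a sliding-window scan of a longer sequence ranks the window containing GAG first. Scanning a real genome would follow the same pattern, with the template contacts and potentials taken from structural data.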