HMM Frameworks for Nuclear Receptor Binding Sites

Albin Sandelin1, Wyeth Wasserman2
1albin.sandelin@cgb.ki.se, Karolinska Institutet; 2wyeth@cmmt.ubc.ca, University of British Columbia

Nuclear receptors (NRs) are transcription factors (TFs) responsible for the regulated expression of genes active in diverse biological processes, such as metabolism, detoxification and cellular differentiation. NRs are unique in that they serve both as receptors for hormones or steroids and as TFs, via a DNA-binding domain resembling those present in C2H2 zinc finger proteins.

Unlike C2H2 proteins, DNA binding by NRs requires dimerization of subunits. In the nuclear hormone receptor subclass, each DNA-binding domain binds to the consensus sequence AGGTCA. Specificity in binding site selection (and in the resulting responses) is achieved through different dimerization modes: the subunit pairing defines the spacing and orientation of the factors and thereby of the binding sites.
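The idea that spacing and orientation distinguish binding sites can be illustrated with a minimal sketch. It assumes exact matches to the AGGTCA consensus (real sites are degenerate) and uses the common DR/IR/ER labels for direct, inverted and everted repeats, which are conventional nomenclature rather than terms taken from this abstract. The function names and the `max_spacer` parameter are hypothetical.

```python
HALF_SITE = "AGGTCA"  # consensus sequence bound by one DNA-binding domain


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def find_half_sites(seq):
    """Return (start, strand) for every exact consensus match on either strand."""
    rc = reverse_complement(HALF_SITE)
    width = len(HALF_SITE)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window == HALF_SITE:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


def classify_pairs(seq, max_spacer=5):
    """Label neighbouring half-site pairs by orientation and spacer length.

    DRn: same orientation; IRn: '+' then '-'; ERn: '-' then '+'.
    max_spacer is an illustrative cutoff, not a value from the text.
    """
    hits = find_half_sites(seq)
    width = len(HALF_SITE)
    results = []
    for (i, s1), (j, s2) in zip(hits, hits[1:]):
        spacer = j - (i + width)
        if spacer < 0 or spacer > max_spacer:
            continue  # overlapping or too distant to count as one site
        if s1 == s2:
            kind = "DR"
        elif s1 == "+":
            kind = "IR"
        else:
            kind = "ER"
        results.append((f"{kind}{spacer}", i))
    return results


print(classify_pairs("AGGTCAAAGGTCA"))    # direct repeat, 1-bp spacer
print(classify_pairs("AGGTCACAGTGACCT"))  # inverted repeat, 3-bp spacer
```

The two example sequences share the same half-site but differ in arrangement, which is the property a single fixed-width profile cannot capture.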

Standard bioinformatics methods for the detection of TF binding sites are based on profile models, termed Position Weight Matrices (PWMs). Because TFs generally do not tolerate insertions or deletions in the core binding site, PWMs are usually adequate descriptions of binding preferences. For NRs, however, PWMs are clearly insufficient: a fixed-width profile cannot represent binding sites whose spacing and orientation vary with the dimerization mode.
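For readers unfamiliar with PWMs, the following is a minimal sketch of log-odds scoring. The count matrix is hypothetical toy data built around the AGGTCA consensus, and the pseudocount and uniform background are illustrative assumptions. Because every column is scored independently over a fixed-width window, the model has no way to express a variable gap between two half-sites.

```python
import math

# Hypothetical counts from 10 aligned sites (toy data, not from the abstract).
COUNTS = {
    "A": [8, 0, 0, 0, 0, 9],
    "C": [0, 0, 0, 0, 9, 0],
    "G": [1, 10, 10, 0, 1, 0],
    "T": [1, 0, 0, 10, 0, 1],
}


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position counts to log2 odds against a uniform background."""
    width = len(counts["A"])
    pwm = []
    for pos in range(width):
        total = sum(counts[b][pos] for b in "ACGT") + 4 * pseudocount
        column = {
            b: math.log2(((counts[b][pos] + pseudocount) / total) / background)
            for b in "ACGT"
        }
        pwm.append(column)
    return pwm


def score(pwm, site):
    """Sum the per-position log-odds; positions are treated as independent."""
    return sum(column[base] for column, base in zip(pwm, site))


def scan(pwm, seq):
    """Score every window of the PWM's width along the sequence."""
    width = len(pwm)
    return [(i, score(pwm, seq[i:i + width])) for i in range(len(seq) - width + 1)]


pwm = build_pwm(COUNTS)
best_start, best_score = max(scan(pwm, "TTAGGTCATT"), key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

In practice a score threshold chosen against background sequence would decide which windows are reported as candidate sites.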

We have constructed a framework for describing NR binding preferences as 1-state Hidden Markov Models, enabling simultaneous prediction and classification of NR binding sites. The method is generalized to apply to the entire structural class of NR proteins, with specific examples for well-studied NRs.