Studies of the transcriptional regulation of the genes coding for the novel IL28A,B and IL29 protein family: Illustration of an in silico approach applicable on a genomic scale

William Krivan (1), Brian Fox (2), Emily Cooper, Teresa Gilbert, Frank Grant, Betty Haldeman, Katherine Henderson, Wayne Kindsvogel, Kevin Klucher, Gary McKnight, Patrick O'Hara, Scott Presnell, Monica Tackett, David Taft, and Paul Sheppard
(1) krivan@zgi.com, ZymoGenetics, Inc.; (2) bfox@zgi.com, ZymoGenetics, Inc.

The novel IL28A,B and IL29 protein family consists of three non-allelic human proteins and homologous mouse proteins, which are distantly related to interferons and IL-10. We use this protein family to illustrate an approach to the computational identification and characterization of putative transcriptional regulatory regions. The approach combines available and novel techniques and can be applied on a genomic scale. Insights into the regulatory mechanisms of the IL28A,B and IL29 family may be gained by comparing its potential regulatory regions with those of characterized cytokines such as IFN-alpha, IFN-beta, and IFN-gamma. In metazoans, however, it is generally not feasible to study co-regulation of paralogous genes simply by aligning their upstream sequences, an approach that has been successful for yeast and bacteria. Comparisons of potential regulatory regions must instead reveal more subtle similarities, such as shared individual transcription factor binding sites. The low binding specificity of transcription factors, however, results in a high rate of false-positive predictions in the analysis of genes from metazoan species. Phylogenetic footprinting can reduce the number of predicted sites by about one order of magnitude, to a set more likely to have sequence-specific functions. This conservation-based filter rests on the biological observation that regulatory regions are often more highly conserved between species than other non-coding regions. A second technique for selecting presumably functional motifs is motivated by the observation that regulatory regions require groups of transcription factors, rather than single factors, to function. It is based on the hypothesis that the statistical significance of a cluster of sites is correlated with biological function.
We illustrate the combined application of these techniques for the characterization of putative regulatory regions of IL28A,B and IL29.
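
The two filtering ideas described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in this study: the toy weight matrix `TOY_PWM`, the score threshold, the gap-free treatment of the pairwise alignment, and the Poisson background model for site counts are all hypothetical choices, and the function names are introduced here for illustration only.

```python
from math import exp, factorial

BASES = "ACGT"

# Hypothetical toy weight matrix: +1 for the consensus base, -1 otherwise.
# The consensus "GAAA" is an arbitrary example, not a curated motif.
TOY_PWM = [{b: (1.0 if b == c else -1.0) for b in BASES} for c in "GAAA"]


def scan(seq, pwm, threshold):
    """Return start positions whose window score reaches the threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing gaps or ambiguous bases
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append(i)
    return hits


def conserved_hits(aln_a, aln_b, pwm, threshold):
    """Conservation filter (assumed form of phylogenetic footprinting):
    keep a predicted site in species A only if the aligned window in
    species B is also predicted as a site. Inputs are the two rows of a
    pairwise alignment, so positions are alignment columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    hits_b = set(scan(aln_b, pwm, threshold))
    return [i for i in scan(aln_a, pwm, threshold) if i in hits_b]


def poisson_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    return 1.0 - sum(exp(-mean) * mean ** i / factorial(i) for i in range(k))


def best_cluster(positions, window, rate):
    """Score each window that starts at a site by the Poisson probability of
    seeing at least the observed number of sites, assuming sites occur at
    `rate` per base pair by chance. Returns (p_value, start, count)."""
    positions = sorted(positions)
    best = None
    for lo, start in enumerate(positions):
        count = sum(1 for p in positions[lo:] if p < start + window)
        p = poisson_tail(count, rate * window)
        if best is None or p < best[0]:
            best = (p, start, count)
    return best


if __name__ == "__main__":
    human = "TTGAAACCGAAATT"   # made-up aligned sequences
    mouse = "TTGAAACCGTAATT"
    print(conserved_hits(human, mouse, TOY_PWM, threshold=4.0))
    print(best_cluster([10, 15, 20, 500], window=50, rate=0.01))
```

The cluster score here is not corrected for the number of windows tested, so it is a ranking heuristic rather than a calibrated p-value. A real analysis would also need to handle alignment gaps and both DNA strands.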