In silico detection of CpG islands in plants
Stephane Rombauts1, Kobe Florquin2, Pierre Rouze and Yves Van de Peer
1strom@gengenp.rug.ac.be, University of Gent, dep. Plant Systems Biology; 2koflo@gengenp.rug.ac.be, University of Gent, dep. Plant Systems Biology
Knowledge of CpG islands comes primarily from studies on animal sequences, where CpG islands have been linked to the regulation of gene expression in processes such as silencing, imprinting and cancer (Inamdar et al., 1991; Finnegan et al., 2000).
CpG islands are characterized by a locally increased GC percentage relative to the surrounding sequence and by the presence of CpGs (and CpNpGs in plants). The CpG dinucleotide, usually methylated at the fifth position of the cytosine ring, is counter-selected and is found much less frequently than expected from mononucleotide frequencies. This depletion is believed to result from accidental mutation by deamination of 5-methylcytosine to thymine (Duret and Galtier, 2000). CpG islands are therefore considered evolutionary remnants in which the deamination process has been hampered because some promoters have remained free of methylation in the course of evolution. An alternative explanation is that these regions are under selection pressure because they function as part of an expression pattern and, hence, CpG islands stand out from the surrounding regions.
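The two quantities described above, local GC percentage and CpG depletion relative to mononucleotide frequencies, can be illustrated with a minimal Python sketch. The function names are hypothetical, and the expected CpG count is taken as (number of C x number of G) / length, a commonly used form of the observed/expected ratio; it is an assumption here, not necessarily the measure used in this study.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C nucleotides in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio based on mononucleotide frequencies.

    Assumption: expected CpG count = (#C * #G) / sequence length.
    Returns 0.0 when the sequence contains no C or no G.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    observed = seq.count("CG")          # "CG" cannot overlap itself
    expected = n_c * n_g / len(seq)
    return observed / expected
```

A ratio well below 1 indicates CpG depletion, whereas values near or above 1 combined with a raised GC fraction are what would make a region stand out as a candidate island.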
In plants, CpG islands are less well studied, and few experimental data exist that confirm the existence of functional CpG islands. Compared to animal systems, plants have a higher number of genes encoding DNA methyltransferases, and it has been shown that the specificity of these methyltransferases is not restricted to CpG: methylation also occurs on CpNpG and non-symmetric CpNpH motifs. This implies the possible existence of CpNpG islands alongside CpG islands, although their role might be different (Sorensen et al., 1996).
The original description of CpG islands was based on observations in animal systems, where it was postulated that about 40% of the genes are characterized by an over-representation of CG motifs in their promoters. However, this value is subject to debate (Venter et al., 2001). So far, no estimates have been made for plants of how many genes have their expression influenced by either CpG or CpNpG islands. We looked for discriminative parameters that would enable the identification of distinct CpG or CpNpG islands in promoter sequences. To this end, we have built a data set of 5000 genes with an experimentally defined transcription start site as well as exon-intron boundaries mapped on the genomic sequence. Using this data set, we explored the compositional landscape surrounding promoters using different ranges of parameters.
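As an illustration of what exploring the compositional landscape with varying parameters could look like, the following Python sketch slides a window along a sequence and reports GC fraction, CpG count and CpNpG count per window. The function names and the default window and step sizes are hypothetical; the abstract does not state the actual parameter ranges used.

```python
from typing import Iterator, Tuple


def count_cpnpg(window: str) -> int:
    """Count C-N-G trinucleotides (N = any nucleotide), allowing overlaps."""
    return sum(
        1
        for i in range(len(window) - 2)
        if window[i] == "C" and window[i + 2] == "G"
    )


def scan_composition(
    seq: str, window: int = 200, step: int = 50
) -> Iterator[Tuple[int, float, int, int]]:
    """Yield (start, GC fraction, CpG count, CpNpG count) for each window.

    The window and step sizes are illustrative defaults, not values
    taken from the study.
    """
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        gc = (w.count("G") + w.count("C")) / window
        yield start, gc, w.count("CG"), count_cpnpg(w)
```

Calling `scan_composition` repeatedly with different `window` and `step` values on sequences aligned at the transcription start gives one profile per parameter setting, which can then be compared between promoters and flanking regions.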
1.	Duret L, Galtier N (2000) The covariation between TpA deficiency, CpG deficiency, and G+C content of human isochores is due to a mathematical artifact. Mol Biol Evol 17: 1620-1625
2.	Finnegan EJ, Peacock WJ, Dennis ES (2000b) DNA methylation, a key regulator of plant development and other processes. Curr Opin Genet Dev 10: 217-223
3.	Inamdar NM, Ehrlich KC, Ehrlich M (1991) CpG methylation inhibits binding of several sequence-specific DNA-binding proteins from pea, wheat, soybean and cauliflower. Plant Mol Biol 17: 111-123
4.	Sorensen MB, Muller M, Skerritt J, Simpson D (1996) Hordein promoter methylation and transcriptional activity in wild-type and mutant barley endosperm. Mol Gen Genet 250: 750-760
5.	Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA et al. (2001) The sequence of the human genome. Science 291: 1304-1351