A sequence-independent strategy for the prediction of prokaryotic promoters

Pierre-Etienne Jacques1, Sebastien Rodrigue2, Jocelyn Beaucher, Jean-François Jacques, Luc Gaudreau, Jean Goulet and Ryszard Brzezinski
1pierre-etienne.jacques@hermes.usherb.ca, Universite de Sherbrooke; 2, Universite de Sherbrooke

The sequences recognized as promoters by the transcriptional machinery in prokaryotes are highly variable, depending on the sigma factor involved and on the prokaryotic species considered. We developed an in silico approach based on the biological observation that the vast majority of promoters are located in the upstream regulatory regions of genes. In other words, the likelihood that a particular sequence is a bona fide promoter can be evaluated from its mismatch distribution among the various areas of the genome. For the purpose of this analysis, we subdivided the genome into four functional areas representing the coding regions, the divergent transcription initiation regions, the single orientation transcription initiation regions, and the convergent transcription termination regions. The genome segment submitted to analysis for promoter detection is subdivided into hexanucleotide pairs spaced by 16 to 20 bp, and a score is then calculated for each pair. The score takes into account the relative mismatch distributions among the four areas of the genome, and is balanced by a weight matrix, predetermined by neural network training, that represents the distribution pattern of real promoters. Potential promoters should correspond to the hexanucleotide pairs with the highest scores in the DNA segment analyzed. Consequently, our algorithm is distribution-dependent rather than sequence-dependent, and it can be applied to any prokaryotic organism whose genome has been completely sequenced. Our method was tested on many genomic regions of Bacillus subtilis, and we were able to detect the same hexanucleotide pairs identified in the literature as the principal transcriptional promoters of these regions.
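The scanning and scoring steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the mismatch threshold, the use of hit density per base as a stand-in for the "mismatch distribution", and the example weights are all assumptions made for illustration. In particular, the weights here are hand-picked placeholders, whereas the abstract states that the real weight matrix comes from neural network training.

```python
from typing import Dict, Iterator, List, Tuple

HEX = 6                   # hexanucleotide length
SPACERS = range(16, 21)   # 16 to 20 bp between the two hexamers (from the abstract)
AREAS = ("coding", "divergent", "single", "convergent")  # the four functional areas

Pair = Tuple[int, str, str]  # (spacer, upstream hexamer, downstream hexamer)


def hexamer_pairs(seq: str) -> Iterator[Tuple[int, int, str, str]]:
    """Yield (start, spacer, upstream, downstream) for every pair that fits in seq."""
    n = len(seq)
    for start in range(n):
        for spacer in SPACERS:
            second = start + HEX + spacer
            if second + HEX > n:
                break  # larger spacers cannot fit either
            yield start, spacer, seq[start:start + HEX], seq[second:second + HEX]


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def area_hits(pair: Pair, area_seq: str, max_mismatch: int = 1) -> int:
    """Count sites in area_seq with the same spacer and at most max_mismatch
    mismatches summed over both hexamers (assumed threshold)."""
    spacer, up, down = pair
    return sum(
        1
        for _, s, u, d in hexamer_pairs(area_seq)
        if s == spacer and mismatches(up, u) + mismatches(down, d) <= max_mismatch
    )


def score(pair: Pair, areas: Dict[str, str], weights: Dict[str, float]) -> float:
    """Weighted sum of per-base hit densities over the four areas.
    The weights are hypothetical placeholders, not trained values."""
    total = 0.0
    for name in AREAS:
        seq = areas[name]
        total += weights[name] * area_hits(pair, seq) / max(len(seq), 1)
    return total


def rank_pairs(segment: str, areas: Dict[str, str],
               weights: Dict[str, float]) -> List[Tuple[float, int, Pair]]:
    """Score every hexamer pair in the segment; highest score first."""
    ranked = [
        (score((spacer, up, down), areas, weights), start, (spacer, up, down))
        for start, spacer, up, down in hexamer_pairs(segment)
    ]
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


# Toy data: a segment carrying TTGACA / TATAAT with a 17 bp spacer.
segment = "GGGG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGGG"
areas = {
    "coding": "G" * 60,
    "divergent": "A" * 60,
    "single": "AAAA" + "TTGACA" + "C" * 17 + "TATAAT" + "AAAA",
    "convergent": "T" * 60,
}
# Placeholder weights: reward initiation regions, penalize the other areas.
weights = {"coding": -1.0, "divergent": 1.0, "single": 1.0, "convergent": -1.0}
ranked = rank_pairs(segment, areas, weights)
```

In this toy setup, pairs that are enriched in the transcription initiation areas receive positive scores, while pairs found mainly in coding or termination areas are penalized. With a real genome, the four area sequences would be derived from the annotation, and the brute-force rescan in `area_hits` would be replaced by a precomputed index.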