Phylogenetic footprinting of co-expressed genes by Tree-Gibbs sampling

Stefan Van Yper 1, Olivier Thas 2, Jean-Pierre Ottoy and Wim Van Criekinge
1Stefan@biomath.rug.ac.be, Department of Applied Mathematics, Biometrics and Process Control, Ghent University; 2olivier.thas@rug.ac.be, Department of Applied Mathematics, Biometrics and Process Control, Ghent University

Motivation: Site/motif Gibbs sampling is a valuable technique for the detection and alignment of locally conserved regions (motifs) in both amino acid and nucleic acid sequences. Transcription factor binding sites can be found by analysing either the promoter sequences of co-expressed genes or the promoter sequences of orthologous genes. Combining both data sources in one algorithm makes additional information available, which improves both the speed and the accuracy of motif finding.

Results: The proposed Tree-Gibbs sampling algorithm makes this approach possible. It is an extension of the existing site/motif Gibbs sampler and is programmed in C. On simulated data, the Tree-Gibbs algorithm succeeds in situations where the classic site/motif sampler fails, but scores worse in others. In simulation studies, a combination of the site Gibbs sampler with the Tree-Gibbs sampler gives the best results. Biological data will probably resemble a mixture of the generated datasets, which makes the Tree-Gibbs sampler a valuable technique for phylogenetic footprinting of co-expressed genes.

Availability: The executable of the Tree-Gibbs sampler is available at http://biomath.rug.ac.be/~stefan/treegibbs and http://www.vancriekinge.com/treegibbs. The source code will be available at a later date. Both the 20/5 75 homology datasets and the Perl data-generating script are available.

Contact: stefan.vanyper@biomath.rug.ac.be

Keywords: Gibbs sampling; Motif; Co-expressed genes; Phylogenetic footprinting; Tree-Gibbs sampling.