Computing accurate phylogenies from gene-order data

Jijun Tang (1), Bernard M.E. Moret (2)
(1) jtang@cs.unm.edu, University of New Mexico; (2) moret@cs.unm.edu, University of New Mexico

DCM-GRAPPA is a highly accurate method for phylogeny reconstruction from gene-order data that scales gracefully to one thousand genomes, greatly extending the range and accuracy of existing methods.
DCM-GRAPPA combines the disk-covering method (DCM) of Warnow et al. with the GRAPPA suite of software. GRAPPA, based on an approach pioneered by Sankoff, is the most accurate method to date for phylogenetic reconstruction from gene-order data, but is computationally limited to 16 genomes. DCM-GRAPPA removes that limit without losing accuracy through a two-step approach: it first decomposes the dataset into smaller overlapping pieces and runs GRAPPA on each piece; it then uses the strict consensus method of DCM to merge the overlapping trees produced by GRAPPA into a single tree. (Details of our work with DCM-GRAPPA are presented at this conference.)
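The core idea behind the merging step can be illustrated with a small sketch. The Python below is not the DCM-GRAPPA code: it only shows, under simplifying assumptions, how a strict consensus on the overlap of two trees might be computed. Both trees are restricted to the genomes they share, and only the bipartitions on which they agree are kept. The nested-tuple tree representation and the function names (leaves, splits, strict_consensus_backbone) are hypothetical choices made for this illustration; a full merger would also reattach the genomes that appear in only one tree, which is omitted here.

```python
from typing import FrozenSet, Set, Tuple, Union

# Assumed representation: a leaf is a genome name, an internal node is a
# tuple of subtrees. This is an illustrative choice, not GRAPPA's format.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of genome names at the leaves of `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree, taxa: FrozenSet[str]) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions of `tree` restricted to `taxa`.

    Each bipartition is stored as the side that excludes the
    alphabetically smallest taxon, so identical splits compare equal.
    """
    found: Set[FrozenSet[str]] = set()
    if len(taxa) < 4:
        return found  # fewer than four taxa admit no non-trivial split
    anchor = min(taxa)

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node]) & taxa
        below = frozenset().union(*(visit(child) for child in node))
        side = taxa - below if anchor in below else below
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(side)
        return below

    visit(tree)
    return found


def strict_consensus_backbone(t1: Tree, t2: Tree):
    """Shared taxa and the bipartitions both trees agree on for them."""
    shared = leaves(t1) & leaves(t2)
    return shared, splits(t1, shared) & splits(t2, shared)


if __name__ == "__main__":
    # Two overlapping subproblem trees sharing genomes A, B, C, D.
    tree1 = ((("A", "B"), "C"), ("D", ("E", "F")))
    tree2 = (("A", "B"), ("C", "D"), ("G", "H"))
    shared, agreed = strict_consensus_backbone(tree1, tree2)
    print(sorted(shared), [sorted(s) for s in agreed])
```

When the two trees disagree on the shared genomes, the set of agreed bipartitions shrinks, and in the extreme case it is empty, which corresponds to an unresolved (star) backbone.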
We have also extended GRAPPA itself to handle limited amounts of duplication and deletion among the genomes -- a feature necessary for working with real datasets. The resulting software again produces reconstructions accurate to within a few percent. (Details of our work with unequal gene content will appear in the proceedings of the 8th Workshop on Algorithms and Data Structures, WADS'03.)
GRAPPA and DCM-GRAPPA are available in source form at http://www.cs.unm.edu/~moret/GRAPPA/