Genome phylogenies based on the mean normalized BLASTP score

Robert G. Beiko¹, Robert L. Charlebois², Mark A. Ragan
¹rbeiko@science.uottawa.ca, Dept. of Biology, University of Ottawa, 30 Marie Curie, Ottawa, ON, K1N 6N5, Canada; Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia; ²rlcharl@neurogadgets.com, GenomeAtlantic, 1721 Lower Water St., Suite 401, Halifax, NS, B3J 1S5, Canada; Dept. of Biology, University of Ottawa, 30 Marie Curie, Ottawa, ON, K1N 6N5, Canada; NeuroGadgets Inc., www.neurogadgets.com; Evolutionary Biology Program, Canadian Institute for Advanced Research

When different genes from the same set of genomes are used to reconstruct trees, the resulting topologies are often inconsistent. Although these differences can sometimes be attributed to artifacts of the reconstruction method or to a lack of signal in the data, the conflicting trees are often strongly supported by multiple phylogenetic methods. If genomes lack a consistent phylogenetic signal, then the idea of genome evolution through vertical descent (or of an ‘organismal’ phylogeny) is thrown into doubt. Here we present a method for reconstructing the phylogeny of entire genomes. The method assigns each pair of genomes a distance score derived from the mean of the normalized BLASTP scores of all orthologs shared by the pair. The resulting matrix of pairwise distances is converted into a phylogenetic tree using the Fitch algorithm. We also pre-screen the genomic data to eliminate phylogenetically discordant sequences (PDS), which yields more highly supported trees. Applied to the approximately 130 publicly available genomes, the method produces a tree that is largely consistent with established taxonomic groupings. Where our results disagree with accepted groupings, we offer explanations for the discrepancy. Notwithstanding these differences, the extensive agreement between our tree and accepted lineages supports the view that genome evolution is mainly a vertical process.
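The pairwise distance step can be illustrated with a minimal sketch. The abstract does not state how BLASTP scores are normalized or how a mean similarity becomes a distance, so this example makes two labeled assumptions: each ortholog's raw BLASTP score is divided by the query protein's self-score, and the genome-pair distance is 1 - (mean normalized score). The function names (`genome_distance`, `distance_matrix`, `to_phylip`) and the input layout are hypothetical, ortholog identification and PDS screening are not shown, and the output is written as a square PHYLIP-style distance matrix of the kind read by Fitch-Margoliash tree programs.

```python
from statistics import mean


def normalized_score(raw: float, self_score: float) -> float:
    # ASSUMPTION: normalize by the query protein's BLASTP self-score.
    return raw / self_score


def genome_distance(ortholog_hits, self_scores):
    """ortholog_hits: list of (query_id, subject_id, raw_blastp_score)
    for putative orthologs between two genomes (hypothetical layout)."""
    if not ortholog_hits:
        raise ValueError("genome pair shares no orthologs")
    norm = [normalized_score(raw, self_scores[q]) for q, _, raw in ortholog_hits]
    # ASSUMPTION: convert mean similarity to a distance as 1 - mean.
    return 1.0 - mean(norm)


def distance_matrix(genomes, hits, self_scores):
    """hits: dict keyed by (genome_a, genome_b) in either order."""
    n = len(genomes)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = genomes[i], genomes[j]
            pair_hits = hits.get((a, b), hits.get((b, a), []))
            D[i][j] = D[j][i] = genome_distance(pair_hits, self_scores)
    return D


def to_phylip(genomes, D):
    # Square PHYLIP-style matrix: taxon count, then a 10-character
    # padded name followed by that row's distances.
    lines = [str(len(genomes))]
    for name, row in zip(genomes, D):
        lines.append(f"{name[:10]:<10}" + " ".join(f"{d:.4f}" for d in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy, made-up data for three hypothetical genomes A, B, C.
    genomes = ["A", "B", "C"]
    self_scores = {"a1": 100, "a2": 200, "b1": 100, "b2": 200,
                   "c1": 100, "c2": 200}
    hits = {
        ("A", "B"): [("a1", "b1", 80), ("a2", "b2", 160)],
        ("A", "C"): [("a1", "c1", 50), ("a2", "c2", 100)],
        ("B", "C"): [("b1", "c1", 60), ("b2", "c2", 100)],
    }
    print(to_phylip(genomes, distance_matrix(genomes, hits, self_scores)))
```

Averaging over all shared orthologs, rather than picking a single marker gene, is what lets one distance summarize the whole genome pair. A real pipeline would also need to decide how to reconcile asymmetric A-vs-B and B-vs-A BLAST results; this sketch simply uses whichever direction is supplied.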