Novel Heterogeneous Maximum Likelihood Methods for the Detection of Adaptive Evolution

Jennifer Commins1, Dr. James O. McInerney2; 1NUI, Maynooth; 2NUI, Maynooth

Maximum Likelihood (ML) methods are widely used in the analysis of molecular evolution, and have become particularly popular for analysing sequences for signatures of adaptive evolution. The choice of evolutionary model used to describe the data is of particular importance. With this in mind, we have designed new methods for robustly inferring the evolutionary history of extant sequences and for precisely identifying signatures of adaptive evolution. Our approach requires a minimum of user intervention to find adaptive evolution events. A phylogenetic tree is constructed from a sequence alignment and assumed to be correct. For each internal node of the tree, we evaluate silent (Ds) and replacement (Dn) substitutions between it and its adjoining nodes, both ancestral and descendant (ML assumes that every character state can potentially be found at every ancestral node). We also evaluate whether these replacements remained invariable or changed elsewhere in the phylogeny, and carry out the same analysis on the silent substitutions. In this way, a path through the tree is found that maximises the difference in behaviour between the silent and replacement substitutions. This path can then be tested for deviation in the Dn:Ds ratio, or in any other parameter that might indicate selection. This approach directly addresses claims that current ML methods are sensitive to violations of their assumptions about which model of evolution to use and how closely it approximates reality, and that in some cases they produce false-positive results under differing conditions. On completion, the software will be tested on both real and simulated data sets, where the likelihood can be verified using the existing Adaptive Evolution database.
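The per-branch comparison of silent and replacement substitutions described above can be illustrated with a minimal sketch. It is not the software described in this abstract: it assumes the ancestral sequences at internal nodes have already been inferred, it reports raw counts of changed codons rather than per-site Ds and Dn estimates, and it classifies each changed codon as a single silent or replacement event regardless of how many nucleotides differ. The function name `count_substitutions`, the node labels, and the example sequences are all hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((a, b, c) for a in BASES for b in BASES for c in BASES), AMINO_ACIDS
    )
}


def count_substitutions(ancestor, descendant):
    """Count changed codons along one branch as (silent, replacement).

    Both inputs are aligned, in-frame coding sequences of equal length.
    Codons containing gaps or ambiguous bases are skipped.
    """
    if len(ancestor) != len(descendant) or len(ancestor) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    silent = replacement = 0
    for i in range(0, len(ancestor), 3):
        a = ancestor[i:i + 3].upper()
        d = descendant[i:i + 3].upper()
        if a == d or a not in CODON_TABLE or d not in CODON_TABLE:
            continue
        if CODON_TABLE[a] == CODON_TABLE[d]:
            silent += 1       # same amino acid: silent change
        else:
            replacement += 1  # different amino acid: replacement change
    return silent, replacement


# Hypothetical node sequences and branches (parent, child) of a small tree.
nodes = {
    "root": "ATGCTTGAA",
    "n1": "ATGCTCGAA",
    "leafA": "ATGCTCGAT",
}
branches = [("root", "n1"), ("n1", "leafA")]

per_branch = {
    (parent, child): count_substitutions(nodes[parent], nodes[child])
    for parent, child in branches
}
for branch, (s, r) in per_branch.items():
    print(branch, "silent:", s, "replacement:", r)
```

Per-branch counts of this kind are the sort of quantity that could be compared along a path through the tree; a real analysis would normalise by the number of silent and replacement sites and weight the ancestral states by their likelihood.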
The end product of this project will be software capable of analysing multiple sequence alignments in ways that are closer to biological reality than existing methods. Once completed, the software will be made freely available at