Evolutionary analysis of single nucleotide polymorphism distribution in duplicated gene pairs of Arabidopsis thaliana

Brad Chapman1, Andrew Paterson2
1chapmanb@uga.edu, University of Georgia; 2paterson@uga.edu, University of Georgia

The complete sequencing of the model organism Arabidopsis thaliana revealed an interesting picture of the role of genome duplication in the history of plant evolution. Sequence comparison methods identified numerous co-linear duplicated blocks in the finished genome, indicating the role of large-scale duplication in shaping the current Arabidopsis genome. This result was surprising, given the relatively compact genome size of this model organism. Subsequent analysis of these duplicated blocks has revealed evidence for as many as three whole genome duplications during the history of Arabidopsis. These duplications, spaced at different intervals during plant evolution, have left a genome containing numerous duplicated genes shaped by the evolutionary processes that follow a duplication event. The creation of gene duplicates has long been proposed to be an important mechanism for the development of novel gene functionality. The general thinking is that duplication of a gene provides a copy under relaxed selection pressure, since any essential functionality of the gene can be buffered by its duplicate. We were interested in examining the evolution of duplicate pairs by looking at the accumulation of single nucleotide polymorphisms (SNPs) in duplicate genes created by past whole genome duplications. Cereon Genomics has identified more than 37,000 SNPs between the publicly sequenced Columbia ecotype of Arabidopsis and their private Landsberg erecta sequence. These SNPs provide a picture of how single nucleotide changes are accumulating in the genome in relatively recent evolutionary time (in comparison with the described whole genome duplications). Characterized SNPs were located within predicted coding sequences of Arabidopsis genes. The coding sequences were then classified into singletons and duplicates created by whole genome duplication events.
The distribution of SNPs among codon positions was examined, along with the amino acid changes caused by each polymorphism. Interestingly, SNP accumulation in duplicate genes was found to be more conservative than in either singletons or the whole set of Arabidopsis genes. Duplicates had a larger percentage of substitutions in the third (wobble) position of codons, and also contained fewer total SNP changes when compared with single copy genes. Additionally, these SNPs caused more conservative amino acid changes in the protein composition of duplicates. We discuss the relevance of these results to evolution in duplicate genes. We also examine how multi-domain proteins may affect the accumulation of SNP changes in both singletons and duplicates. The goal is to examine the unique mechanisms influencing the evolution of duplicate pairs after whole genome duplication. This provides a framework for understanding how duplication can lead to the development of unique gene functions.
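
As an illustration of the kind of tally described above, the following minimal Python sketch assigns each SNP to a codon position and checks whether it changes the encoded amino acid. It is not the pipeline used in this work: the function names (classify_snp, summarize), the input layout (coding sequence, 0-based offset, alternate base), and the use of the standard genetic code are assumptions made for the example, and it makes no attempt to score how conservative an amino acid replacement is.

```python
from collections import Counter
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds, offset, alt_base):
    """Classify one SNP inside a coding sequence (hypothetical helper).

    cds      -- in-frame coding sequence, uppercase DNA
    offset   -- 0-based position of the SNP within cds
    alt_base -- the alternate nucleotide at that position

    Returns (codon_position, ref_amino_acid, alt_amino_acid), where
    codon_position is 1, 2, or 3 (3 being the wobble position).
    """
    index_in_codon = offset % 3
    codon_start = offset - index_in_codon
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = (
        ref_codon[:index_in_codon] + alt_base + ref_codon[index_in_codon + 1:]
    )
    return index_in_codon + 1, CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


def summarize(snps):
    """Count SNPs per codon position and count the synonymous ones.

    snps -- iterable of (cds, offset, alt_base) tuples
    """
    by_position = Counter()
    synonymous = 0
    for cds, offset, alt_base in snps:
        position, ref_aa, alt_aa = classify_snp(cds, offset, alt_base)
        by_position[position] += 1
        if ref_aa == alt_aa:
            synonymous += 1
    return by_position, synonymous


if __name__ == "__main__":
    # Toy coding sequence ATG GCT AAA (Met-Ala-Lys), invented for the example.
    toy_cds = "ATGGCTAAA"
    toy_snps = [(toy_cds, 5, "C"), (toy_cds, 3, "A"), (toy_cds, 7, "G")]
    print(summarize(toy_snps))
```

Running summarize separately on the SNPs that fall in duplicates and on those that fall in singletons would give the per-position percentages that the comparison above relies on.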