PyPop: A framework for large-scale population genomics analysis

Alex Lancaster1, Mark P. Nelson2, Richard M. Single, Diogo Meyer, Glenys Thomson
1alexl@socrates.berkeley.edu, UC Berkeley; 2, UC Berkeley

PyPop (Python for Population Genetics) is a suite of programs for the large-scale analysis of multi-locus population genetic data. It includes tests for conformity to Hardy-Weinberg proportions (HWP), tests for balancing or directional selection, estimates of haplotype frequencies, measures of linkage disequilibrium (LD), and tests of the significance of LD. It can also interoperate with other population genetic packages such as Arlequin. PyPop is an object-oriented framework implemented in Python. It was originally developed to analyze the highly polymorphic HLA region of the human genome, but can be used for any multi-locus data. The output of each analysis is stored in XML, which can then be transformed into other formats: input for other programs (such as PHYLIP), for spreadsheet programs, or for statistical packages (such as R), as well as plain text or HTML for viewing. Because the results are stored in XML, the final viewable output can be redesigned at will without the time-consuming re-running of the statistical tests. The XML output also facilitates the processing of results from analyses of large numbers of populations. PyPop will be made freely available under the GNU GPL at: http://allele5.biol.berkeley.edu/pypop/
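
As a rough illustration of the XML-to-other-formats step described above, the following minimal sketch converts stored results into tab-separated text suitable for a spreadsheet or R, without re-running any tests. The element names (dataanalysis, population, locus, hwp_pvalue), the function name xml_to_tsv, and the sample values are all hypothetical and invented for this example; PyPop's actual output schema and transformation tools may differ.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout with made-up values, for illustration only.
# PyPop's real output schema may differ.
SAMPLE_XML = """\
<dataanalysis>
  <population name="PopA">
    <locus name="A"><hwp_pvalue>0.412</hwp_pvalue></locus>
    <locus name="B"><hwp_pvalue>0.031</hwp_pvalue></locus>
  </population>
  <population name="PopB">
    <locus name="A"><hwp_pvalue>0.877</hwp_pvalue></locus>
  </population>
</dataanalysis>
"""


def xml_to_tsv(xml_text):
    """Flatten per-locus results into one tab-separated row per locus."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["population", "locus", "hwp_pvalue"])
    for population in root.iter("population"):
        for locus in population.iter("locus"):
            writer.writerow([
                population.get("name"),
                locus.get("name"),
                locus.findtext("hwp_pvalue"),
            ])
    return buffer.getvalue()


tsv = xml_to_tsv(SAMPLE_XML)
print(tsv)
```

Because the statistics are read back from the stored XML, changing the columns or the target format only requires editing the conversion function, not repeating the analysis.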