A Visualization Framework to Assist in the Selection of SNP Markers for Association Studies of Complex Diseases

Francisco M. De La Vega1, Hadar Avi-Itzhak2
1delavefm@appliedbiosystems.com, Applied Biosystems; 2AviitzHI@appliedbiosystems.com, Applied Biosystems

Applied Biosystems developed a first-generation, whole-genome, gene-centric reference SNP map for use in candidate-gene, candidate-region, and eventually whole-genome disease association studies. The selected candidate SNPs were validated by individually genotyping 180 DNA samples from African-American, Caucasian, Chinese, and Japanese individuals. The current set of validated markers comprises about 150,000 SNPs, each with high heterozygosity in at least one population, available as ready-to-use 5’ nuclease assays. The individual genotypes generated thus far in the validation have enabled us to study the profile of linkage disequilibrium (LD) and haplotype diversity across all gene regions of the human genome in the four studied populations. These data offer a foundation for an empirically driven, rational design of association studies tailored to the specific population and genomic region of interest. We developed a framework to visualize SNPs, haplotype blocks, and genes across chromosomal physical maps, together with their relationship to LD maps. To determine the marker set that best balances cost and statistical power in an association study, it is useful to examine the impact of different choices for variables such as sample size, the assumed disease allele frequency, and the mode of inheritance. Thus, a summary of empirically based power calculations under the assumptions of the common disease/common variant hypothesis is also represented in the visualization. Finally, we discuss strategies for representing SNP type and population-specific minimum subsets of SNPs (“tagging” SNPs), and for identifying regions that would benefit from supplementary SNPs. The visualization framework presented here is intended to guide the cost-effective selection of SNP markers and their validated assays for candidate-gene or candidate-region disease association studies, based on the LD profile observed in reference population samples.
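
The abstract does not state which LD statistics underlie the LD maps. As a minimal illustration only, the sketch below computes the standard pairwise measures D, D' and r² for two biallelic SNPs from haplotype and allele frequencies. The function name `pairwise_ld` and its parameters are hypothetical and are not taken from the authors' framework.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    (Illustrative helper; assumes phased haplotype frequencies are available.)
    """
    # Raw disequilibrium coefficient: observed minus expected haplotype frequency.
    d = p_ab - p_a * p_b

    # Largest |D| attainable given the allele frequencies (depends on the sign of D).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # Squared correlation between the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Two SNPs whose alleles always travel together on the same haplotype.
    print(pairwise_ld(0.5, 0.5, 0.5))
```

Pairs with high r² carry largely redundant information, which is the usual rationale for choosing a smaller population-specific subset of "tagging" SNPs; the selection procedure actually used by the authors is not described in the abstract.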