QTL analysis for outcrossing family data using genetic algorithm and simulated EM algorithm

Reiichiro Nakamichi (1), Satoru Miyano (2)
(1) rei-naka@ims.u-tokyo.ac.jp, Human Genome Center, Institute of Medical Science, the University of Tokyo; (2) miyano@ims.u-tokyo.ac.jp, Human Genome Center, Institute of Medical Science, the University of Tokyo

We propose a new method for detecting quantitative trait loci (QTL). QTL analysis is a quick and economical way to identify narrow regions of a genome that control quantitative traits, using phenotypic values and molecular marker information. Discovering disease-susceptibility genes is valuable for developing new medicines and for assessing their side effects. In this kind of gene mapping, QTL analysis can test not only whether an individual is “affected” or “not affected” but also the degree to which it is affected. The application of QTL mapping to outcrossing species, especially humans, has been limited to simple case-control studies because traditional QTL analysis depends on a highly organized mating experiment. Our new method, which combines a genetic algorithm (GA) with a simulated EM algorithm, allows QTL to be detected without an experimental cross. Consider randomly mating families. Given a set of QTL locations, we can evaluate the plausibility of the set using a likelihood based on the observed phenotypes and marker information. QTL mapping is therefore a combinatorial optimization problem: a search for the combination of locations on a genome that maximizes the likelihood. We adopted a GA for this optimization. In our GA, the “genotype” is the number and locations of QTL, and the “fitness” is the likelihood function. Although the marker information is not directly observed under random mating, its posterior probability can be obtained from all of the observed markers. Missing marker information is simulated by random sampling from this posterior probability, and the sampling is repeated and refreshed for each GA individual in each GA generation. Simulation studies clearly showed that our GA with the simulated EM algorithm performs well in cases that traditional gene mapping does not support.
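The search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: marker information is coded 0/1, markers are treated as independent (no linkage), the posterior probabilities are simply given as a table `post`, the phenotype follows an additive linear model with Gaussian noise, and a fixed per-QTL `penalty` is subtracted from the log-likelihood so that larger sets are not favored automatically. All function names are hypothetical. The sketch shows only the two ideas stated in the abstract: a GA individual is a set of candidate QTL locations, and its fitness is a likelihood computed on marker information freshly sampled from the posterior at every evaluation.

```python
import math
import random

def rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X + tiny ridge) b = X'y as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) + (1e-8 if r == c else 0.0)
          for c in range(k)] + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(k)]
    for p in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv] = A[piv], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
    beta = [A[r][k] / A[r][r] for r in range(k)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))

def fitness(loci, post, y, rng, penalty=3.0):
    """Penalized Gaussian log-likelihood on freshly sampled marker data."""
    n = len(y)
    # Missing marker information is re-drawn from the posterior on every call,
    # i.e. for each GA individual in each GA generation.
    cols = [[1 if rng.random() < post[i][m] else 0 for i in range(n)]
            for m in sorted(loci)]
    sigma2 = max(rss(cols, y) / n, 1e-12)
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return loglik - penalty * len(loci)  # penalty is an assumption, not from the text

def mutate(loci, n_markers, rng):
    """Add, drop, or shift one candidate QTL location."""
    loci = set(loci)
    move = rng.choice(["add", "drop", "shift"])
    if move == "add" or not loci:
        loci.add(rng.randrange(n_markers))
    elif move == "drop" and len(loci) > 1:
        loci.remove(rng.choice(sorted(loci)))
    else:
        old = rng.choice(sorted(loci))
        loci.remove(old)
        loci.add(min(n_markers - 1, max(0, old + rng.choice([-1, 1]))))
    return frozenset(loci)

def crossover(a, b, rng):
    """Keep each location found in either parent with probability 1/2."""
    child = {m for m in a | b if rng.random() < 0.5}
    return frozenset(child) if child else a

def run_ga(post, y, n_markers, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [frozenset([rng.randrange(n_markers)]) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda g: fitness(g, post, y, rng), reverse=True)
        parents = ranked[: pop_size // 2]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                           n_markers, rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=lambda g: fitness(g, post, y, rng))

def simulate(n=200, n_markers=20, qtl=(4, 13), seed=1):
    """Toy data: two true QTL, posterior 0.9 / 0.1 around the true genotype."""
    rng = random.Random(seed)
    true = [[rng.randint(0, 1) for _ in range(n_markers)] for _ in range(n)]
    y = [sum(2.0 * g[m] for m in qtl) + rng.gauss(0, 1) for g in true]
    post = [[0.9 if g else 0.1 for g in row] for row in true]
    return post, y

post, y = simulate()
best = run_ga(post, y, n_markers=20)
print(sorted(best))  # candidate QTL locations selected by the GA
```

Because the fitness is re-sampled at every evaluation, the ranking of individuals is itself noisy; a set of locations survives only if it scores well across many independent draws of the missing marker information. In this toy setting the true locations are markers 4 and 13, so the printed set can be compared against them.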