Discovering useful patterns from DNA microarray experiment with large-scale multifactor design by genetic algorithm and permutation test

Ju Han Kim1, Tae Su Chung2, Jihun Kim, Ji Yeon Park, Hye Won Lee, Jihoon Kim, Mingoo Kim
1juhan@snu.ac.kr, Seoul National University Human Genome Research Institute; 2epiai@korea.com, Seoul National University Human Genome Research Institute

Abstract

Motivation: Replication is a cornerstone of scientific research, including DNA microarray experiments. In practice, however, we often have to deal with experimental data that have no replication because of limited resources, methodological constraints, or a lack of awareness. In a multifactor design with many factors and/or levels, replication can be prohibitively costly. In addition, the biological interpretation of a standard statistical analysis of a multifactor design can be very complicated.

Results: Because we are mainly interested in finding factor-specific effects and their interactions, we assumed that each observed gene expression pattern was created by a certain combination of underlying factor-specific generative functions. A genetic algorithm was created to search, for each observation, for the best-fit distinct generative pattern among all possible combinations. A permutation test on the distance between each observation and its corresponding best-fit pattern identified statistically significant multifactor gene expression patterns with very simple biological interpretations. The identification of genes with cancer- and/or drug-specific expression patterns was demonstrated in a microarray experiment with a multifactor design and no replication.
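The fit-then-permute idea described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the 2 x 3 design (cancer status by drug dose), the two factor patterns, the signed-sum rule for combining them, the z-scored Euclidean distance, and all function names. Because the toy candidate set is tiny, an exhaustive search stands in for the genetic algorithm that the abstract describes.

```python
import itertools
import math
import random

# Hypothetical 2 x 3 design. Conditions are ordered as (normal, none),
# (normal, low), (normal, high), (cancer, none), (cancer, low), (cancer, high).
# Both factor patterns are illustrative assumptions, not the paper's functions.
FACTOR_PATTERNS = {
    "cancer": (0, 0, 0, 1, 1, 1),  # step effect of cancer status
    "drug": (0, 1, 2, 0, 1, 2),    # assumed linear dose response
}


def candidate_patterns():
    """Enumerate signed combinations of the factor-specific patterns."""
    names = list(FACTOR_PATTERNS)
    n_cond = len(FACTOR_PATTERNS[names[0]])
    candidates = {}
    for signs in itertools.product((-1, 0, 1), repeat=len(names)):
        if not any(signs):
            continue  # skip the flat pattern, which carries no factor effect
        label = " ".join(
            ("+" if s > 0 else "-") + n for s, n in zip(signs, names) if s
        )
        candidates[label] = [
            sum(s * FACTOR_PATTERNS[n][i] for s, n in zip(signs, names))
            for i in range(n_cond)
        ]
    return candidates


def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def distance(a, b):
    """Euclidean distance between the z-scored profiles."""
    return math.dist(zscore(a), zscore(b))


def best_fit(observation, candidates):
    """Exhaustive search for the closest pattern (stand-in for the GA)."""
    label = min(candidates, key=lambda k: distance(observation, candidates[k]))
    return label, distance(observation, candidates[label])


def permutation_p_value(observation, candidates, n_perm=2000, seed=0):
    """Share of shuffled profiles that fit some pattern at least as well."""
    rng = random.Random(seed)
    observed = best_fit(observation, candidates)[1]
    hits = 0
    for _ in range(n_perm):
        shuffled = list(observation)
        rng.shuffle(shuffled)
        # The small tolerance guards against floating-point ties.
        if best_fit(shuffled, candidates)[1] <= observed + 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

As a usage example, a made-up profile such as `[0.1, -0.05, 0.0, 1.0, 1.1, 0.95]` can be passed to `best_fit(profile, candidate_patterns())` to obtain a pattern label and distance, and to `permutation_p_value` to obtain a p-value. The search is repeated inside every permutation so that the null distribution reflects the same pattern-selection step as the observed fit. With only six conditions the attainable p-values are coarse, so a design this small serves only to show the mechanics.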