The automatic discovery of structural principles describing protein fold space

Adrian P Cootes1, Michael J E Sternberg2, Stephen H Muggleton
1 a.cootes@ic.ac.uk, Imperial College; 2 m.sternberg@ic.ac.uk, Imperial College

The study of protein structure has largely been driven by the careful inspection of experimental data by human experts. However, the rapid determination of protein structures by structural-genomics projects will make it increasingly difficult to analyse the distribution of proteins in fold space, and to determine the principles responsible for that distribution, by inspection alone. Here, we demonstrate a machine-learning strategy that automatically determines the structural principles describing 45 folds. The rules learnt were shown to be both statistically significant and meaningful to protein experts. With the growing emphasis on high-throughput experimental initiatives, machine learning and other automated methods of analysis will become increasingly important for many biological problems.