Correlated errors of neural network predictions improve fold recognition.

Dariusz Przybylski1, Burkhard Rost2
1dudek@cubic.bioc.columbia.edu, Columbia University; 2rost@columbia.edu, Columbia University

The use of predicted 1D information (secondary structure and solvent accessibility) in fold recognition algorithms has been shown to significantly improve sensitivity. Often, however, this gain came at the cost of reduced accuracy. Here we present a way of improving both sensitivity and accuracy. The approach relies on the observation that errors in 1D predictions from neural networks are correlated. We build generalized scoring matrices that include predicted 1D information and show that they produce well-defined random score distributions. This leads to accurate estimates of the statistical significance of alignment scores, allowing us to develop a method that is both highly accurate and highly sensitive for searching a database of generalized protein family profiles. Finally, we present a way of combining forward and reverse database searches to improve accuracy and sensitivity even further.
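To illustrate what a generalized scoring matrix could look like, the following minimal sketch derives log-odds scores over a combined alphabet in which each symbol pairs a residue with a predicted 1D state. It is an illustration under stated assumptions, not the construction used in this work: the function name `log_odds_matrix`, the pseudocount, the symmetric counting, and the toy input are all hypothetical. Because both symbols of every aligned pair carry predicted (not observed) states, any systematic co-occurrence of prediction errors in related proteins would be absorbed into the pair frequencies.

```python
import math
from collections import Counter
from itertools import product


def log_odds_matrix(aligned_pairs, pseudocount=1.0):
    """Log-odds scores (in bits) over generalized symbols.

    aligned_pairs: iterable of (symbol_a, symbol_b), where each symbol is a
    hypothetical (residue, predicted_state) tuple taken from aligned
    positions of related proteins. All names here are illustrative.
    """
    pairs = list(aligned_pairs)
    symbols = sorted({s for pair in pairs for s in pair})

    # Count each aligned pair in both orientations so the matrix is symmetric.
    counts = Counter()
    for a, b in pairs:
        counts[(a, b)] += 1
        counts[(b, a)] += 1

    # Joint frequencies q_ab, smoothed with a pseudocount (an assumption).
    total = 2 * len(pairs) + pseudocount * len(symbols) ** 2
    q = {
        (a, b): (counts[(a, b)] + pseudocount) / total
        for a, b in product(symbols, repeat=2)
    }

    # Background frequencies p_a are the marginals of q_ab.
    p = {a: sum(q[(a, b)] for b in symbols) for a in symbols}

    # Score s_ab = log2(q_ab / (p_a * p_b)).
    return {(a, b): math.log2(q[(a, b)] / (p[a] * p[b])) for (a, b) in q}


# Toy input: alanine predicted as helix ("H"), glycine predicted as strand ("E").
helix_ala = ("A", "H")
strand_gly = ("G", "E")
toy_pairs = [(helix_ala, helix_ala)] * 8 + [(helix_ala, strand_gly)] * 2
toy_matrix = log_odds_matrix(toy_pairs)
```

In this toy case the matching pair scores higher than the mismatched one. A real matrix would need counts from a large set of trusted alignments, and the score distribution it produces would still have to be checked empirically before deriving significance estimates from it.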