Prediction of disulfide connectivity patterns in proteins

Shih-Chieh Chen, Chi-Hung Tsai, Huai-Kuang Tsai, Cheng-Yan Kao
Bioinfo Lab., Department of CSIE, National Taiwan University

We propose an approach that predicts the disulfide connectivity pattern of a protein directly from its sequence. In protein structure prediction, the conformation space is extremely large, so constraints such as secondary structure are applied to reduce the search space and thereby improve prediction accuracy. Disulfide bonds, the covalent bonds formed between two cysteines, are common in extracellular proteins. Correct prediction of disulfide connectivity can strongly reduce the conformation space and may also help in predicting protein structure. The approach consists of two steps: 1) train a model on the training set to predict the bond potential of every pair of cysteines; 2) for a target protein sequence, use the predicted bond potentials to find the most likely disulfide connectivity pattern. In step 1, every pair of cysteines in the training set, whether or not it forms a disulfide bond, is fed into Support Vector Regression to train the bond potential predictor. In step 2, a weighted complete graph is constructed for the target sequence, in which vertices represent cysteines and each edge weight is the corresponding predicted bond potential. Gabow’s algorithm is then applied to find the maximum-weight perfect matching, which yields the disulfide connectivity pattern. We performed four-fold cross-validation on a data set of 1005 proteins. The proposed approach achieves an overall accuracy of 62%, which is much better than that of previous work (Fariselli and Casadio, 2001); for proteins with five disulfide bonds, the accuracy is 294 times higher than that of a random predictor. In summary, the proposed method is promising for locating disulfide bridges in proteins.

This work is partially supported by NSC 91-3112-B-002-001, Taiwan.

1. Fariselli, P. and Casadio, R. (2001) Prediction of disulfide connectivity in proteins. Bioinformatics, 17, 957-964.
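
As an illustration of step 2, the following minimal Python sketch selects the connectivity pattern with the highest total bond potential. It is not the authors' implementation: it replaces Gabow's algorithm with a plain exhaustive search over perfect matchings, which is only practical for a handful of cysteines. The function names, the cysteine positions, and the bond potential values are hypothetical and chosen purely for illustration. The helper count_patterns gives the number of possible patterns for a given number of bonds, (2n - 1)!!, which is the search space a random predictor has to guess from.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def best_connectivity(cysteines: List[int],
                      potential: Dict[Pair, float]) -> Tuple[float, List[Pair]]:
    """Return (total potential, pairs) of the maximum-weight perfect matching.

    Exhaustive search, used here as a simple stand-in for Gabow's algorithm.
    `cysteines` holds sequence positions in ascending order; `potential` maps
    each pair (i, j) with i < j to its predicted bond potential.
    """
    if len(cysteines) % 2 != 0:
        raise ValueError("a perfect matching needs an even number of cysteines")
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best_score, best_pairs = float("-inf"), []
    # Pair the first cysteine with every possible partner, then recurse.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        score, pairs = best_connectivity(remaining, potential)
        score += potential[(first, partner)]
        if score > best_score:
            best_score, best_pairs = score, [(first, partner)] + pairs
    return best_score, best_pairs


def count_patterns(n_bonds: int) -> int:
    """Number of connectivity patterns for n_bonds disulfide bonds: (2n - 1)!!."""
    total = 1
    for k in range(1, 2 * n_bonds, 2):
        total *= k
    return total


# Hypothetical example: four cysteines and made-up bond potentials.
positions = [10, 25, 40, 58]
bond_potential = {
    (10, 25): 0.2, (10, 40): 0.9, (10, 58): 0.1,
    (25, 40): 0.3, (25, 58): 0.8, (40, 58): 0.4,
}
score, pattern = best_connectivity(positions, bond_potential)
print(pattern)            # [(10, 40), (25, 58)]
print(count_patterns(5))  # 945
```

With five disulfide bonds there are 945 candidate patterns, so the exhaustive search is still feasible at that size, but the number of patterns grows rapidly with the number of bonds, and a polynomial-time matching algorithm such as Gabow's becomes the better choice.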