PoPS: Prediction of Protease Specificity

S.E. Boyd ¹,²,³, M.M. Cameron ¹,²,³, S. Gunawan ¹, G.B. Rudy ⁴, R.N. Pike ²,³, J.C. Whisstock ²,³, and M. Garcia de la Banda ¹,²
¹ School of Computer Science and Software Engineering, Monash University; ² Victorian Bioinformatics Consortium, Monash University; ³ Department of Biochemistry and Molecular Biology, Monash University; ⁴ Genetics and Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research

Proteases are an important class of enzymes that bind to the amino acids of other proteins (referred to as substrates) and, if the binding is sufficiently strong, cleave them. This makes proteases of particular interest in medicine, pharmaceutical development and biotechnology. Despite their importance, the roles and target substrates of many proteases remain uncharacterized. One reason is the difficulty of determining protease specificity, i.e. the preferences of a given protease for the amino acids in the substrates it cleaves. This difficulty arises, among other reasons, from the high cost of laboratory techniques and because much expert knowledge of specificity is not readily accessible. To alleviate this problem, we have recently developed the PoPS (Prediction of Protease Specificity) program, publicly accessible as a Java applet at http://pops.csse.monash.edu.au. PoPS allows users to build computational models of specificity for any protease using both expert knowledge and raw experimental data. The models are easy to understand, yet sensitive enough to express detailed protease specificity. Users can experiment with models, store them in PoPS' central models database, and retrieve them later. Once a model of specificity has been built, the user can apply it to any protein substrate, and PoPS will identify and rank possible cleavages within the substrate. Furthermore, improbable cleavages can be detected using the substrate's structural information. In this way, PoPS can be used to supplement and accelerate laboratory work. In addition, PoPS can apply a model to entire protein databases, allowing researchers to discover possible target substrates of the protease.
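To make the idea of identifying and ranking possible cleavages concrete, the following is a minimal sketch, not the PoPS implementation. This abstract does not specify the internal form of a PoPS model, so the sketch assumes a simple additive model with one score table per subsite, slid along the substrate. The name `rank_cleavages`, the `MODEL` table and all score values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical model: one score table per subsite (P4, P3, P2, P1).
# Residues not listed score 0. Values are illustrative, not taken from PoPS.
MODEL: List[Dict[str, float]] = [
    {"D": 5.0, "E": 2.0},  # P4
    {"E": 3.0, "Q": 1.0},  # P3
    {"V": 3.0, "I": 2.0},  # P2
    {"D": 5.0},            # P1
]


def rank_cleavages(
    substrate: str, model: List[Dict[str, float]] = MODEL
) -> List[Tuple[int, str, float]]:
    """Score every window of the substrate and return the windows ranked
    from highest to lowest score.

    Each result is (position, window, score), where position is the number
    of residues before the assumed cleavage point (cleavage is assumed to
    occur immediately after the last residue of the window).
    """
    width = len(model)
    hits = []
    for start in range(len(substrate) - width + 1):
        window = substrate[start:start + width]
        # Additive score: sum the per-subsite contribution of each residue.
        score = sum(table.get(res, 0.0) for table, res in zip(model, window))
        hits.append((start + width, window, score))
    # Highest score first; Python's sort is stable, so ties keep sequence order.
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits


if __name__ == "__main__":
    for position, window, score in rank_cleavages("AAGDEVDGAADQVDA")[:3]:
        print(position, window, score)
```

In this toy substrate, an exact-match search for the pattern DEVD would report only the first site, whereas the scoring approach also ranks the near-match DQVD highly. Under the stated assumptions, that is the kind of flexibility a scored model offers over fixed patterns.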
To provide a portable and robust tool that satisfies all user requirements, PoPS is necessarily a complex system composed of many modules, implemented in different languages according to the requirements of each module. All of these modules, however, are based on a single concept: PoPS' computational model of protease specificity. We are aware of only two other tools for predicting protease cleavage: Cutter and PeptideCutter. Both deal with a fixed set of proteases and provide predefined models that cannot be altered. Furthermore, their computational models use pattern matching, and therefore lack the flexibility to discover non-obvious cleavage sites. For example, when predicting substrate cleavage for 40 known cleavage sites of the protease caspase 3, PoPS identified 27 of them as the most likely cleavage site within the substrate. In contrast, Cutter does not provide a caspase 3 model and does not allow users to add one, and the PeptideCutter model identified only 7 of the 40 known cleavage sites.
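The comparison above counts how often a known cleavage site is the top-ranked prediction for its substrate. A minimal sketch of that count is given below. It assumes each test case is a pair of substrate sequence and known cleavage position, and that a predictor returns candidate positions ordered from most to least likely. The names `top_rank_hits` and `after_each_asp` are hypothetical, and the toy predictor and toy cases are not the data behind the reported figures.

```python
from typing import Callable, Iterable, List, Tuple


def top_rank_hits(
    cases: Iterable[Tuple[str, int]],
    predict: Callable[[str], List[int]],
) -> int:
    """Count the cases whose known cleavage position is the first
    (most likely) position returned by the predictor."""
    hits = 0
    for substrate, known_site in cases:
        ranked = predict(substrate)
        if ranked and ranked[0] == known_site:
            hits += 1
    return hits


def after_each_asp(substrate: str) -> List[int]:
    """Toy predictor (an assumption for illustration): propose a cleavage
    after every aspartate, in sequence order."""
    return [i + 1 for i, res in enumerate(substrate) if res == "D"]


if __name__ == "__main__":
    toy_cases = [("AADGA", 3), ("DAAD", 4), ("GGGG", 2)]
    print(top_rank_hits(toy_cases, after_each_asp), "of", len(toy_cases))

    # Proportions implied by the counts reported in the text.
    print(f"PoPS: {27 / 40:.1%}, PeptideCutter: {7 / 40:.1%}")
```

Expressed as proportions, the reported counts correspond to 67.5% (27 of 40) for PoPS and 17.5% (7 of 40) for PeptideCutter.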