PoPS: Prediction of Protease Specificity
S.E. Boyd 1,2,3, M.M. Cameron 1,2,3, S. Gunawan 1, G.B. Rudy 4, R.N. Pike 2,3, J.C. Whisstock 2,3,
and M. Garcia de la Banda 1,2
1 School of Computer Science and Software Engineering, Monash University; 2 Victorian Bioinformatics
Consortium, Monash University; 3 Department of Biochemistry and Molecular Biology, Monash University;
4 Genetics and Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research
Proteases are an important kind of enzyme which can bind to the amino acids of other
proteins (referred to as substrates) and, if the binding is sufficiently strong, cleave them. This
makes proteases particularly interesting to areas of research such as medicine, pharmaceutical
development and biotechnology. However, despite their importance, the roles and target
substrates of many proteases remain uncharacterized. One reason for this is the difficulty of
determining protease specificity, i.e. the particular preferences of a given protease for the
amino acids in the substrates it cleaves. This difficulty arises, amongst other reasons,
from the high cost of laboratory techniques and from the fact that much expert knowledge
of specificity is not readily accessible.
To alleviate this problem, we have recently developed the PoPS (Prediction of Protease
Specificity) program, publicly accessible as a Java Applet at http://pops.csse.monash.edu.au.
PoPS allows users to easily build computational models of specificity for any protease using
both expert knowledge and raw experimental data. Models are easy to understand yet
sensitive enough to express detailed protease specificity. Users can experiment with models,
which can be stored in and retrieved from PoPS' central models database. Once a model of
specificity has been built, the user can apply it to any protein substrate, and
PoPS will identify and rank possible cleavages within the substrate. Furthermore, improbable
cleavages can be detected by using the substrate's structural information. In this way, PoPS
can be used to supplement and accelerate laboratory work. In addition, PoPS can apply a
model to entire protein databases, allowing researchers to discover possible target substrates
of the protease.
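
The apply-and-rank step described above can be illustrated with a minimal sketch. This abstract does not specify PoPS' actual computational model, so the sketch assumes a simple additive scheme: each subsite around the scissile bond (P4 to P1') carries per-residue weights, and every window of the substrate is scored and ranked. The names rank_cleavages and TOY_MODEL, and all weight values, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Subsites in substrate order; the scissile bond lies between P1 and P1'.
SUBSITES = ["P4", "P3", "P2", "P1", "P1'"]

# Hypothetical per-subsite residue weights (invented for this sketch,
# not taken from PoPS or from experimental data).
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "P4": {"D": 5.0},
    "P3": {"E": 2.0},
    "P2": {"V": 2.0},
    "P1": {"D": 5.0},
    "P1'": {"G": 1.0, "S": 1.0},
}


def rank_cleavages(
    substrate: str,
    model: Dict[str, Dict[str, float]],
    subsites: List[str] = SUBSITES,
    n_before: int = 4,
) -> List[Tuple[int, str, float]]:
    """Score every window of the substrate and rank candidate cleavages.

    Returns (position, window, score) tuples, best score first, where
    `position` is the 1-based number of the residue at P1, i.e. cleavage
    would occur immediately after that residue. `n_before` is the number
    of subsites on the N-terminal side of the scissile bond.
    """
    width = len(subsites)
    results = []
    for start in range(len(substrate) - width + 1):
        window = substrate[start:start + width]
        # Additive score: residues without a weight contribute 0.
        score = sum(
            model.get(site, {}).get(residue, 0.0)
            for site, residue in zip(subsites, window)
        )
        results.append((start + n_before, window, score))
    # Highest-scoring candidate first; ties keep substrate order.
    results.sort(key=lambda r: r[2], reverse=True)
    return results


if __name__ == "__main__":
    for position, window, score in rank_cleavages("AADEVDGAADAVDSA", TOY_MODEL)[:3]:
        print(position, window, score)
```

Under these assumed weights, a window that matches the preferred residues only partially still receives a graded score and a place in the ranking, rather than being rejected outright as it would be by an exact pattern match.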
To be a portable and robust tool that satisfies all user requirements, PoPS is of necessity
a complex system composed of many modules, implemented in different languages
according to the individual requirements of each module. However, all these
modules are based on a single concept: PoPS' computational model for protease specificity.
We are aware of only two other tools for predicting protease cleavage: Cutter and
PeptideCutter. However, both tools deal with a fixed set of proteases and provide predefined
models which cannot be altered. Furthermore, their computational models use pattern
matching, and therefore lack the flexibility to discover non-obvious cleavage sites. For
example, when tested on 40 known cleavage sites of the protease caspase 3, PoPS ranked
27 of them (67.5%) as the most likely cleavage site within the substrate. In contrast,
Cutter neither provides a caspase 3 model nor allows users to add one, and the
PeptideCutter model identified only 7 of the 40 known cleavage sites (17.5%).