Prediction of Protein Function from Primary Structure

Paul J. Tan1, Vladimir Brusic2, Asif M. Khan, Judice L.Y. Koh, Seng-Hong Seah
1tjtan@i2r.a-star.edu.sg, I2R; 2vladimir@i2r.a-star.edu.sg, I2R

The common assumption that protein structure implies function is useful in some cases, but has very limited application to active peptides. A single amino acid substitution may completely abrogate a functional effect, as is commonly seen when active sites in proteins are disrupted. At the other extreme, a single protein may exert multiple functional effects. For example, a single snake venom phospholipase A2 toxin is known to act as a myotoxic, cytotoxic, edema-inducing and platelet-affecting agent.


We have developed an approach for predicting the presence of a specific functional effect in active peptides. The approach consists of four steps: a) collection of protein sequences from multiple sources, b) data cleaning and functional annotation, c) definition of basic structure-function unit groups, and d) prediction of protein function by an intelligent agent. The first two steps are facilitated by our data warehousing platform (BioWare), which enables rapid building of specialised searchable databases and supports the annotation of sequence entries and their enrichment with functional information. A basic structure-function unit group consists of sequences that share both high primary sequence identity and functional properties. The intelligent agent compares a query sequence with the basic structure-function unit groups and uses a set of rules to determine the putative function of the protein.
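As an illustration only, steps c) and d) can be sketched as a nearest-neighbour lookup against annotated groups. This is a minimal sketch under stated assumptions, not the system described here: the group names, toy sequences, the 0.7 threshold, and the helper names `similarity` and `predict` are all hypothetical, and Python's `difflib.SequenceMatcher` ratio stands in for a proper alignment-based sequence identity score. The actual rule set used by the intelligent agent is not reproduced.

```python
from difflib import SequenceMatcher

# Hypothetical structure-function unit groups. Each group holds sequences
# that share high identity and one functional annotation. The sequences
# are invented toy strings, not real toxins.
GROUPS = {
    "group_A": {
        "function": "Na+ channel toxin",
        "members": ["ACDKCGGKCEEGCKLN", "ACDKCGGKCEDGCKLN"],
    },
    "group_B": {
        "function": "K+ channel toxin",
        "members": ["VFINAKCRGSPECLPK"],
    },
}


def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def predict(query, groups, threshold=0.7):
    """Return (predicted function, best score) for the nearest group member.

    The threshold is an assumed cut-off. Below it, the query is reported
    as having no similar record instead of being forced into a group.
    """
    best_name, best_score = None, 0.0
    for name, group in groups.items():
        for member in group["members"]:
            score = similarity(query, member)
            if score > best_score:
                best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return "no similar record found", best_score
    return groups[best_name]["function"], best_score


if __name__ == "__main__":
    # A close variant of a group_A member, then an unrelated sequence.
    print(predict("ACDKCGGKCEEGCKLQ", GROUPS))
    print(predict("WWWWWWWWWWWWWWWW", GROUPS))
```

In this sketch a query inherits the annotation of its single nearest neighbour. A fuller version would apply additional rules, for example returning several candidate functions when more than one group scores above the threshold.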


The analysis of 220 scorpion toxins resulted in 32 basic structure-function unit groups. The predictive system, which uses sequence comparison and nearest neighbour analysis, was tested on a set of 52 new sequences. It correctly classified 31 sequences into existing groups, proposed 9 new groups, and misclassified 3 sequences (94% correct classification). Ion channel specificity was correctly predicted for 91.5% of the sequences. Toxin action, which describes the potency of the toxin, was correctly predicted for 83.3% of the sequences, while for the remaining 16.7% the correct action was one of several predicted options. We also tested the prediction module with sequences other than scorpion toxins; in all cases, the result was ‘no similar record found’.