Improved prediction of protein subcellular localization: exploring inherent biases in neural network learning

Mikael Boden1
1mikael@itee.uq.edu.au, School of Information Technology and Electrical Engineering, University of Queensland

The subcellular translocation of proteins usually relies on an N-terminal targeting sequence. A series of feed forward neural network based predictors have shown potential in predicting the localization and cleavage sites of various proteins. One of them, TargetP, distinguishes between proteins destined for the mitochondrion, for the chloroplast, for the secretory pathway, and all others.

A neural network is trained from example data to find a solution, which is later evaluated by presenting novel data. In contrast to conventional statistical tools, the network architecture imposes a bias (or constraint) on the search for the solution. The vast number of possible combinations of amino acids that may signal transportation, together with the relatively few examples available, requires care in selecting an appropriate architecture.

We explore recurrent neural networks and their ability to help in predicting localization of proteins. We use the same data, learning task and evaluation methods as TargetP to objectively assess the usefulness of a range of recurrent neural networks. The recurrent neural networks are used to spatially scan and detect target sequences. By recursively creating an upstream and a downstream sequence state from the residues next to each position in the sequence, the middle residue is classified as being part of the target sequence or not. The detection output is then fed through a feed forward neural network which identifies the destination of the protein.
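The scanning scheme described above can be illustrated with a minimal sketch. It is not the implementation evaluated here: the weights are random and untrained, and the hidden state size, the one-hot residue encoding, the 10-residue input window of the feed forward stage, and all function and parameter names (`scan`, `classify`, `make_params`) are assumptions made only for illustration. "Upstream" is taken to mean residues before a position, "downstream" residues after it.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CLASSES = ["mitochondrion", "chloroplast", "secretory pathway", "other"]


def one_hot(residue):
    """Encode one residue as a 20-dimensional indicator vector."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def rand_matrix(rng, rows, cols, scale=0.1):
    """Untrained placeholder weights; a real predictor would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def step(w_in, w_state, x, state):
    """One recurrent update: new_state = tanh(W_in x + W_state state)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_state, state))]


def make_params(seed=0, hidden=4, window=10):
    """Build a hypothetical parameter set with fixed random weights."""
    rng = random.Random(seed)
    return {
        "up_in": rand_matrix(rng, hidden, 20),
        "up_state": rand_matrix(rng, hidden, hidden),
        "down_in": rand_matrix(rng, hidden, 20),
        "down_state": rand_matrix(rng, hidden, hidden),
        "out": [rng.uniform(-0.1, 0.1) for _ in range(20 + 2 * hidden)],
        "ff": rand_matrix(rng, len(CLASSES), window),
    }


def scan(sequence, params):
    """Return one detection score in (0, 1) per residue."""
    hidden = len(params["up_state"])
    xs = [one_hot(r) for r in sequence]
    n = len(xs)
    # up[i] summarises the residues before position i.
    up = [[0.0] * hidden]
    for i in range(1, n):
        up.append(step(params["up_in"], params["up_state"], xs[i - 1], up[-1]))
    # down[i] summarises the residues after position i.
    down = [[0.0] * hidden for _ in range(n)]
    for i in range(n - 2, -1, -1):
        down[i] = step(params["down_in"], params["down_state"], xs[i + 1], down[i + 1])
    scores = []
    for i in range(n):
        # The middle residue is classified from itself plus both states.
        features = xs[i] + up[i] + down[i]
        z = sum(w * f for w, f in zip(params["out"], features))
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


def classify(scores, params):
    """Feed forward layer over the first detection scores, then softmax."""
    window = len(params["ff"][0])
    padded = (scores + [0.0] * window)[:window]  # zero-pad short sequences
    logits = matvec(params["ff"], padded)
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(CLASSES, exps)}


params = make_params()
sequence = "MASTRLLQWKAGHLSPRTVA"  # arbitrary illustrative sequence
scores = scan(sequence, params)
probabilities = classify(scores, params)
```

With untrained weights the resulting probabilities carry no biological meaning; the sketch only shows how per-residue detection scores, each informed by both sequence directions, can feed a separate destination classifier.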

Generally, the prediction accuracy increases with the introduction of a state influenced by residues upstream and downstream. However, in a few cases feed forward neural networks perform better. The optimal predictor is a feed forward/recurrent hybrid ensemble of networks. For biological sequence prediction tasks where even marginal improvements in accuracy are crucial, recurrent neural networks are well worth exploring.