A neural network is trained on example data to find a solution, which is later evaluated on novel data. In contrast to conventional statistical tools, the network architecture imposes a bias (or constraint) on the search for the solution. Because the number of possible amino acid combinations that may signal transportation is vast and relatively few examples are available, the architecture must be selected with care.
We explore recurrent neural networks and their ability to predict the localization of proteins. To assess the usefulness of a range of recurrent neural networks objectively, we use the same data, learning task and evaluation methods as TargetP. The recurrent neural networks spatially scan each sequence to detect target sequences. At each position, an upstream and a downstream sequence state are built recursively from the residues on either side of that position, and the middle residue is classified as being part of the target sequence or not. The detection output is then fed through a feed forward neural network which identifies the destination of the protein.
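The scanning scheme described above can be illustrated with a minimal sketch. It is not the architecture evaluated here: the hidden state size, the fixed input window of the destination network, the use of a single detector, the single-layer form of the feed forward stage, the random untrained weights, and all function and parameter names are assumptions made for illustration only. The four destination labels are likewise illustrative.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DESTINATIONS = ("cTP", "mTP", "SP", "other")  # illustrative labels (assumption)


def one_hot(residue):
    # Unknown residues map to an all-zero vector.
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def init_params(rng, hidden=4, window=10):
    # Hypothetical, untrained parameters; sizes are arbitrary choices.
    n_in = len(AMINO_ACIDS)
    return {
        "w_in_up": rand_matrix(hidden, n_in, rng),
        "w_rec_up": rand_matrix(hidden, hidden, rng),
        "w_in_down": rand_matrix(hidden, n_in, rng),
        "w_rec_down": rand_matrix(hidden, hidden, rng),
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden + n_in)],
        "b_out": 0.0,
        "w_cls": rand_matrix(len(DESTINATIONS), window, rng),
        "b_cls": [0.0] * len(DESTINATIONS),
    }


def step(w_in, w_rec, x, h):
    # One recurrent update: new state from the current residue and previous state.
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]


def scan(sequence, params):
    """Return, per position, a score that the residue belongs to a target sequence."""
    xs = [one_hot(r) for r in sequence]
    n = len(xs)
    hidden = len(params["w_rec_up"])
    # upstream[i] summarises residues 0..i-1 (built left to right).
    upstream = [[0.0] * hidden]
    for i in range(n - 1):
        upstream.append(step(params["w_in_up"], params["w_rec_up"], xs[i], upstream[-1]))
    # downstream[i] summarises residues i+1..n-1 (built right to left).
    downstream = [[0.0] * hidden for _ in range(n)]
    for i in range(n - 2, -1, -1):
        downstream[i] = step(params["w_in_down"], params["w_rec_down"], xs[i + 1], downstream[i + 1])
    scores = []
    for i in range(n):
        features = upstream[i] + xs[i] + downstream[i]
        z = sum(w * f for w, f in zip(params["w_out"], features)) + params["b_out"]
        scores.append(1.0 / (1.0 + math.exp(-z)))  # logistic output for the middle residue
    return scores


def classify(scores, params):
    """Single-layer feed forward stage: detection scores -> destination probabilities."""
    window = len(params["w_cls"][0])
    padded = (scores + [0.0] * window)[:window]  # truncate or zero-pad to a fixed window
    logits = [sum(w * s for w, s in zip(row, padded)) + b
              for row, b in zip(params["w_cls"], params["b_cls"])]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
params = init_params(rng)
scores = scan("MALWMRLLPLLALLALWGPD", params)  # arbitrary example sequence
probabilities = classify(scores, params)
```

With untrained weights the outputs carry no biological meaning; the sketch only shows how the two directional states and the residue itself are combined at each position before the detection output is passed to the destination classifier.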
Generally, the prediction accuracy increases with the introduction of a state influenced by residues upstream and downstream. However, in a few cases feed forward neural networks perform better. The optimal predictor is a feed forward/recurrent hybrid ensemble of networks. For biological sequence prediction tasks where even marginal improvements in accuracy are crucial, recurrent neural networks are well worth exploring.