Evaluating the Predictability of RNA PseudoknotsJ. Reeder1, R. Giegerich2
firstname.lastname@example.org, Bielefeld University; email@example.com, Bielefeld University
Pseudoknots have been shown to be functionally relevant elements of RNA
secondary structure. Determining such structures including pseudoknots in full generality has been shown to be an NP-hard problem. For more restricited classes of pseudoknots, polynomial algorithms have been devised. The program pknotsRE by Rivas and Eddy can handle chained simple recursive pseudoknots in O(n^6) time and O(n^4) space, which is practical only for molecules of less then 140 nucleotides in length.
Canonized simple recursive pseudoknots
We consider non-chained simple recursive pseudoknots and impose a condition of canonization. Such csr-PK pseudoknots can be predicted in O(n^4) time and O(n^2) space. Two orders of magnitude improvement allow to handle sequences up to n = 500.
The approach results in three variants of our algorithm: For given RNA sequence s,
Using the three programs in combination opens new ways to check for the potential of an important pseudoknotted structure in a given RNA molecule.
- pknotsRG-mfe(s) computes the structure of s (knotted or not) of minimal
- pnotsRG-enf(s) enforces a pseudoknot: it delivers the energetically best
complete structure of s that includes at least one pseudoknot somewhere.
- pknotsRG-loc(s) delivers the best local pseudoknot that can be element of
any structure of s, where ``best'' is defined by free energy per nucleotide. The rest of s remains unfolded.
Evaluation of Pseudobase
Using the above algorithms, we performed a systematic evaluation of Batenburg's pseudoknot database ``Pseudobase''. Out of 212 pseudoknots tested, 149 fall in class csr-PK, of which 138 are recognized correctly. Others are partially or not at all recognized for a variety of reasons.
We report on the number of confirmed structures, spot cases where we hit algorithmic limitations, analyse cases where the pseudoknot may be in competition with an alternative structure, and comment on the performance of the energy model.
The programs are available at the Bielefeld Bioinformatics Server