Systematic Dynamic Programming in Bioinformatics |
Topic: | Dynamic Programming for experts |
General Area: | Understanding and developing sequence analysis algorithms |
Duration: | 2 x 90 minutes |
Level: | Intermediate to advanced |
Dynamic programming (DP) is a most fundamental programming technique in bioinformatics. Sequence comparison, gene recognition, RNA structure prediction and hundreds of other problems are solved by ever new variants of DP. Despite of all available experience, the development of the typical DP recurrences is nontrivial, and their implementation presents quite a few pitfalls. To quote a recent comment by an expert:
Presently, developing efficient DP algorithms is a matter of experience, talent and luck. A more systematic or mechanical way to develop efficient DP algorithms would be very valuable in many fields in addition to bioinformatics.In my own experience, developing successful DP occurences is hard, and I would like to learn a mechanistic way to do it on problems that haven't already been worked out.
Such a systematic (though not mechanistic) way to develop DP algorithms has emerged over the last two years. The discipline is a program development method, applicable to a wide and well-defined class of sequence analysis problems. It is based on the concepts of algebras and tree grammars, but neither involved mathematics, nor formal language theory is needed to understand the method. Dynamic Programming in the new framework looks and feels radically different from what we find in the textbooks. To give an impression - this is the Needleman-Wunsch algorithm:
> nw_algorithm = axiom alignment where > alignment = R <<< xbase ~~~ alignment ~~~ ybase ||| > D <<< region ~~~ alignment ||| > I <<< alignment ~~~ region ||| > E <<< empty
Here R,D,I denote replacement, deletion, and insertion, while E denotes an empty alignment. Even without further explanation, you may recognize how the notation reflects the recursive nature in which alignments are defined. The absence of array subscripts in this notation is a major relief. Algorithm development proceeds in several stages on a very intuitive level of abstraction, roughly indicated by the example above. Exploring algorithmic ideas, analyzing and tuning for efficiency can be done conveniently on this level. An executable rapid prototype of the algorithm is obtained without further effort. Only after all the reasoning and some testing of the algorithm have been done, the typical DP recurrences are produced in a quasi-mechanistic way.
Due to its high level of abstraction, the method is as well-suited to the explanation and comparison of classical DP algorithms, as it is to the development of de-novo DP algorithms. This way, the tutorial provides new insights for the expert, and for the novice a gentle path into successfully programming sequence analysis tasks via DP.
Dirk Evers is a final stage PhD student at Bielefeld, author of the visualisation tool RNAMovies. His research is in advanced DP methods for RNA structure prediction.
Robert Giegerich is a professor of computer science at Bielefeld University, active in bioinformatics since 1993. Bioinformatics tools developed by his group are used throughout the community at http://bibiserv.techfak.uni-bielefeld.de/. As a university teacher, R.G. has designed and implemented a full bioinformatics curriculum.
Author: Dirk Evers