Systematic Dynamic Programming in Bioinformatics

Systematic Dynamic Programming in Bioinformatics

Dirk Evers Robert Giegerich

Synopsis

Topic: Dynamic Programming for experts

General Area: Understanding and developing sequence analysis algorithms

Duration: 2 x 90 minutes

Level: Intermediate to advanced

Details

Dynamic Programming in Bioinformatics

Dynamic programming (DP) is a most fundamental programming technique in bioinformatics. Sequence comparison, gene recognition, RNA structure prediction and hundreds of other problems are solved by ever new variants of DP. Despite of all available experience, the development of the typical DP recurrences is nontrivial, and their implementation presents quite a few pitfalls. To quote a recent comment by an expert:

Presently, developing efficient DP algorithms is a matter of experience, talent and luck. A more systematic or mechanical way to develop efficient DP algorithms would be very valuable in many fields in addition to bioinformatics.In my own experience, developing successful DP occurences is hard, and I would like to learn a mechanistic way to do it on problems that haven't already been worked out.

Towards a new discipline of DP

Such a systematic (though not mechanistic) way to develop DP algorithms has emerged over the last two years. The discipline is a program development method, applicable to a wide and well-defined class of sequence analysis problems. It is based on the concepts of algebras and tree grammars, but neither involved mathematics, nor formal language theory is needed to understand the method. Dynamic Programming in the new framework looks and feels radically different from what we find in the textbooks. To give an impression - this is the Needleman-Wunsch algorithm:

>  nw_algorithm =  axiom alignment where

>     alignment =  R <<< xbase  ~~~ alignment ~~~ ybase  |||

>                  D <<< region ~~~ alignment            |||

>                  I <<<            alignment ~~~ region |||

>                  E <<< empty

Here R,D,I denote replacement, deletion, and insertion, while E denotes an empty alignment. Even without further explanation, you may recognize how the notation reflects the recursive nature in which alignments are defined. The absence of array subscripts in this notation is a major relief. Algorithm development proceeds in several stages on a very intuitive level of abstraction, roughly indicated by the example above. Exploring algorithmic ideas, analyzing and tuning for efficiency can be done conveniently on this level. An executable rapid prototype of the algorithm is obtained without further effort. Only after all the reasoning and some testing of the algorithm have been done, the typical DP recurrences are produced in a quasi-mechanistic way.

Due to its high level of abstraction, the method is as well-suited to the explanation and comparison of classical DP algorithms, as it is to the development of de-novo DP algorithms. This way, the tutorial provides new insights for the expert, and for the novice a gentle path into successfully programming sequence analysis tasks via DP.

Planned tutorial contents

Introduction
1. Dynamic Programming: All over Bioinformatics
2. The Historic Roots of DP: An Algorithmic Fairy Tale
3. The General Setting: Evaluation of Recursive Search Spaces and the Optimality Principle
A Discipline of Dynamic Programming
1. Separating Concerns: Recognition versus Evaluation Phase
2. The Subjects under Evaluation: Formulas
3. Implementing Evaluation: Scoring, Counting, Estimation, Enumeration
4. Precisely what to Score: Grammars
5. Implementing Recognition: Parsers
6. Dynamic Programming = Recursion + Tabulation
7. Our first DP Algorithm: No Subscripts, No Errors
8. Algorithmic Efficiency: Analysis and Tuning
9. The Final Step: DP Recurrences for Free
A Selection of Famous and New DP Examples
1. Needleman-Wunsch: Pairwise Alignment
2. Smith-Waterman: Local Similarity
3. Gotoh: Pairwise Alignment
4. Giegerich: Aligning Recombinant DNA Sequences
5. Nussinov: RNA Folding via Base Pair Maximisation
6. Zuker: RNA Folding based on Energy Minimisation
7. Evers: Complete Suboptimal Folding of Canonical RNA Structures
Software Engineering Aspects of DP
1. Sources of Error: Redundancy, Ambiguity and (Non-) Canonical Models
2. Rapid Prototyping: The Dynamic Programmer's Blessing
3. Correctness: Multiple Derivation of DP Recurrences
4. Validation: Program Properties and Cross-Checking
Problem Session

Instructors

Dirk Evers is a final stage PhD student at Bielefeld, author of the visualisation tool RNAMovies. His research is in advanced DP methods for RNA structure prediction.

Robert Giegerich is a professor of computer science at Bielefeld University, active in bioinformatics since 1993. Bioinformatics tools developed by his group are used throughout the community at http://bibiserv.techfak.uni-bielefeld.de/. As a university teacher, R.G. has designed and implemented a full bioinformatics curriculum.

Author: Dirk Evers

Topic:	Dynamic Programming for experts
General Area:	Understanding and developing sequence analysis algorithms
Duration:	2 x 90 minutes
Level:	Intermediate to advanced