Multiple alignments of sequences and structures using T-Coffee

Orla O'Sullivan1, Desmond Higgins2, Cedric Notredame
1 ojos@student.ucc.ie, University College Cork; 2 University College Cork

T-Coffee is a package for multiple sequence alignment. It provides a significant improvement in accuracy over the other methods we have tested, and it is a clean, fast and versatile system for producing multiple alignments not only from sequences but also from other data sources, such as existing multiple alignments, structures, or mixtures of these. The aim of this project was to investigate the effects of mixing sequence and structural information on alignment accuracy. Aligning similar sequences is straightforward for most alignment programs. However, as the percent identity between the sequences decreases, so too does the performance of these programs. Even the most distantly related sequences may have similar tertiary structures; including this structural information in an alignment program should therefore increase the accuracy of the alignments. To generate the structural alignments we used SAP (Orengo and Taylor, 1989), which aligns pairs of tertiary structures using "double dynamic programming", and FUGUE (Shi et al., 2001), which carries out sequence-structure alignments based on environment-specific substitution tables. We combined these methods with T-Coffee and tested them on the HOMSTRAD (Mizuguchi et al., 1998) dataset of reference alignments. The results displayed in this poster show that using all available structures with FUGUE gave an improvement over using no structures, and that using all structures with SAP gave a considerable improvement over using no structures. Furthermore, mixing FUGUE and SAP alignments together generates a small further improvement in alignment quality.
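
The percent identity referred to above can be illustrated with a minimal Python sketch. It is not part of T-Coffee, SAP or FUGUE. The function name percent_identity and the choice of denominator (columns in which neither sequence has a gap) are assumptions made for illustration only; other definitions divide by the alignment length or by the length of the shorter sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of an alignment.

    Hypothetical helper for illustration. Assumed convention: only
    columns where neither sequence has a gap are counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0   # columns with a residue in both sequences
    identical = 0  # compared columns where the residues match

    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x.upper() == y.upper():
            identical += 1

    # With no comparable columns, report 0.0 rather than divide by zero.
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Five ungapped columns, four of them identical.
    print(percent_identity("ACDE-G", "ACDQFG"))
```

Under this convention, a pair of sequences that share few identical residues in their aligned columns receives a low score, which is the regime in which structural information is expected to help most.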