Incorporating Sequence and Biochemical Information in TOPS models - For Biologically Significant Pattern Matching and Pattern Discovery in Protein

Mallika Veeramalai1, David Gilbert2, David R Westhead
1mallika@dcs.gla.ac.uk, Bioinformatics Research Centre, Dept. of Computing Science, University of Glasgow; 2drg@dcs.gla.ac.uk,

The TOPS (Topology of Protein Structure) database contains abstract 2D spatial representations of the secondary structure elements (SSEs) of protein structures. TOPS diagrams are derived from TOPS cartoons. Instead of representing spatial positions by elements in a plane, a TOPS diagram records the grouping of beta-strands into beta-sheets (two adjacent elements in a beta-sheet are connected by an H-bond relation, which can be either parallel or anti-parallel) and the orientation of elements (any two SSEs can be connected with either left or right chirality). Very fast pattern matching and pattern discovery algorithms for protein topologies have been developed on the basis of these TOPS diagrams. However, because the representation is so abstract, significant biological information can be lost. Incorporating sequence information (in the form of PSSM/HMM profiles) and biochemical features such as ligand-binding sites and active sites into the TOPS graph-based representation of protein structure will increase its biological significance. The results should be valuable in efforts to predict protein structure and function from sequence, problems that remain key challenges of direct relevance to projects in structural and functional genomics. The TOPS database is accessible at http://www.tops.leeds.ac.uk
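The graph view described above can be illustrated with a minimal Python sketch. It is not the TOPS implementation: the class and function names (`SSE`, `TopsDiagram`, `find_match`) and the `features` annotation set are hypothetical, SSE direction is left out for brevity, and the matcher is a brute-force search over order-preserving placements rather than the fast algorithms referred to in the abstract. It assumes only what the text states: SSEs form an ordered sequence, H-bond relations are parallel or anti-parallel, chirality relations are left or right, and extra biochemical annotations could be attached to individual SSEs.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class SSE:
    """One secondary structure element (hypothetical illustrative type)."""
    kind: str                          # "E" for beta-strand, "H" for helix
    features: frozenset = frozenset()  # assumed annotations, e.g. {"ligand_binding"}


@dataclass
class TopsDiagram:
    """Ordered SSEs plus labelled relations between pairs of SSE indices."""
    sses: list
    hbonds: dict = field(default_factory=dict)     # (i, j) -> "P" or "A", with i < j
    chirality: dict = field(default_factory=dict)  # (i, j) -> "L" or "R", with i < j


def find_match(pattern, target):
    """Return target indices matching the pattern SSEs in order, or None.

    Brute force: try every order-preserving placement of the pattern SSEs
    in the target and check SSE kinds, required features, and relations.
    """
    n = len(pattern.sses)
    for idx in combinations(range(len(target.sses)), n):
        # Each pattern SSE must have the same kind, and every feature the
        # pattern requires must be present on the chosen target SSE.
        sse_ok = all(
            p.kind == target.sses[t].kind and p.features <= target.sses[t].features
            for p, t in zip(pattern.sses, idx)
        )
        if not sse_ok:
            continue
        # Every relation in the pattern must appear in the target with the
        # same label between the corresponding SSEs.
        hbond_ok = all(
            target.hbonds.get((idx[i], idx[j])) == label
            for (i, j), label in pattern.hbonds.items()
        )
        chir_ok = all(
            target.chirality.get((idx[i], idx[j])) == label
            for (i, j), label in pattern.chirality.items()
        )
        if hbond_ok and chir_ok:
            return idx
    return None
```

In this sketch a pattern of two strands joined by an anti-parallel H-bond matches any adjacent anti-parallel pair in a target sheet. Requiring a feature such as `"ligand_binding"` on one of the pattern strands restricts the match to pairs where the corresponding target strand carries that annotation, which is the kind of added selectivity the abstract argues for.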