Semi-Automated Homology Modeling Using A Modified TRADES Algorithm

Michel Dumontier1, Howard J. Feldman2, Christopher W.V. Hogue
1micheld@mshri.on.ca, Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, 600 University Ave., Toronto, Ontario, Canada M5G 1X5; 2feldman@mshri.on.ca, Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, 600 University Ave., Toronto, Ontario, Canada M5G 1X5

Homology modeling is a powerful method for predicting the three-dimensional structure of biological macromolecules from their primary sequence, even when they share only weak sequence similarity with a biomolecule of experimentally determined structure. High-quality models can provide important information about the function and mechanism of a biomolecule and can be used to rationalize experimental data or guide the design of new experiments. Here, we present a modified version of the TRADES algorithm [1] for homology modeling. Template protein structures were identified using BLAST against the Protein Data Bank (PDB) and the Conserved Domain Database (CDD). Templates with significant sequence similarity across the longest segment, the fewest indels, and the closest functional annotation were favored. For multi-domain proteins, the best hit for each domain was used as the template. Where possible, alignments were modified so that indels fell in loop regions rather than within elements of secondary structure. Next, a new target trajectory distribution was built from the template backbone Cα trajectory using a modification of the TRADES algorithm. Each structurally conserved (gapless) region of the alignment was replaced by a single, slightly flexible fragment taken from the recorded trace. Gap-spanning fragments for variable regions were created from 'takeoff angles', starting one residue before the gap and ending one residue after it. These fragments had six degrees of freedom: the distance between the start and end of the gap, two virtual Cα angles, and three virtual Cα dihedrals. Three atoms from each side of the gap were placed in space according to the takeoff angles.
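As a rough illustration of the six gap-spanning degrees of freedom, the sketch below computes them from three anchor Cα positions on each side of a gap. Which atoms define each virtual angle and dihedral, as well as the function names, are assumptions made for this example; the abstract does not specify them, and this is not the TRADES implementation.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def angle(p, q, r):
    """Angle at q (degrees) formed by the points p-q-r."""
    u, v = sub(p, q), sub(r, q)
    c = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def dihedral(p, q, r, s):
    """Signed dihedral (degrees) about the q-r axis; points must be distinct."""
    b1, b2, b3 = sub(q, p), sub(r, q), sub(s, r)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = dot(cross(n1, n2), b2) / norm(b2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def gap_geometry(left, right):
    """Six internal coordinates relating two three-atom anchors.

    left  = (a1, a2, a3): Ca atoms before the gap, a3 nearest the gap.
    right = (b1, b2, b3): Ca atoms after the gap, b1 nearest the gap.
    Assumed convention: one distance, two angles, three dihedrals.
    """
    a1, a2, a3 = left
    b1, b2, b3 = right
    return {
        "distance": norm(sub(b1, a3)),
        "angles": (angle(a2, a3, b1), angle(a3, b1, b2)),
        "dihedrals": (dihedral(a1, a2, a3, b1),
                      dihedral(a2, a3, b1, b2),
                      dihedral(a3, b1, b2, b3)),
    }
```

Under this convention the six values fix the relative position and orientation of the two anchors, which is why one distance, two angles and three dihedrals suffice for a rigid-body placement.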
Alpha carbons required to fill the gap were given arbitrary starting coordinates within the gap region, and a steepest-descent minimization was carried out on an energy function consisting of virtual Cα bond length restraints, virtual Cα angle restraints, and a van der Waals term. The three anchoring atoms on either side of the gap were held fixed during the minimization. Finally, the resulting loop was incorporated as a fragment using its own Cα trace. Roughly 1000 structures were generated using the fragments obtained from the previous steps and our Foldtraj software, with bump checking disabled. The best five structures were chosen using a modified version of a statistical residue-based potential [2], which we have termed 'crease energy'. These were then refined by steepest-descent minimization with the CHARMM EEF1 force field to resolve steric clashes without significantly changing the structure (typically 1 Å RMSD between the refined and unrefined structures). The modified TRADES algorithm generates realistic, all-atom protein structure homology models of non-idealized geometry: it incorporates side chains from a backbone-dependent rotamer library and produces reasonable bond lengths, bond angles, and torsion angles, as well as minimized electrostatics and van der Waals forces. Moreover, this method models loops for insertions and deletions and compensates for missing template atoms. Strategies for generating better alignments and minimizing alignment-dependent errors are discussed.

1. Feldman H.J. and Hogue C.W.V. (2000). A fast method to sample real protein conformational space. Proteins 39(2), 112-131.
2. Bryant S.H. and Lawrence C.E. (1993). An empirical energy function for threading protein sequence through the folding motif. Proteins 16(1), 92-112.
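The gap-filling minimization described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the target bond length and angle, the force constants, the soft repulsive term standing in for the van der Waals term, the finite-difference gradient, and all function names are hypothetical choices made for this example.

```python
import math

CA_BOND = 3.8     # typical virtual Ca-Ca distance, angstroms
CA_ANGLE = 110.0  # assumed target virtual Ca angle, degrees
CLASH = 4.0       # assumed minimum separation of non-bonded Ca atoms

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def angle(p, q, r):
    u = [x - y for x, y in zip(p, q)]
    v = [x - y for x, y in zip(r, q)]
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    c = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def energy(chain, k_bond=10.0, k_angle=0.01, k_rep=10.0):
    """Harmonic bond and angle restraints plus a soft repulsion."""
    n = len(chain)
    e = 0.0
    for i in range(n - 1):
        e += k_bond * (dist(chain[i], chain[i + 1]) - CA_BOND) ** 2
    for i in range(n - 2):
        e += k_angle * (angle(chain[i], chain[i + 1], chain[i + 2]) - CA_ANGLE) ** 2
    for i in range(n):
        for j in range(i + 2, n):
            d = dist(chain[i], chain[j])
            if d < CLASH:
                e += k_rep * (CLASH - d) ** 2
    return e

def minimize_gap(left, right, n_gap, steps=200, h=1e-4):
    """Steepest descent on the gap atoms only; the anchors stay fixed."""
    a, b = left[-1], right[0]
    gap = []
    for k in range(1, n_gap + 1):
        t = k / (n_gap + 1)
        p = [a[i] + t * (b[i] - a[i]) for i in range(3)]
        p[1] += 0.5 if k % 2 else -0.5  # arbitrary offset off the straight line
        gap.append(p)
    fixed_l = [list(p) for p in left]
    fixed_r = [list(p) for p in right]

    def total(g):
        return energy(fixed_l + g + fixed_r)

    e_start = e = total(gap)
    step = 0.1
    for _ in range(steps):
        grad = []
        for atom in gap:  # central finite-difference gradient
            g_atom = []
            for i in range(3):
                old = atom[i]
                atom[i] = old + h
                e_plus = total(gap)
                atom[i] = old - h
                e_minus = total(gap)
                atom[i] = old
                g_atom.append((e_plus - e_minus) / (2.0 * h))
            grad.append(g_atom)
        gnorm = math.sqrt(sum(g * g for ga in grad for g in ga))
        if gnorm < 1e-6:
            break
        while step > 1e-8:  # backtrack until the energy decreases
            trial = [[x - step * g / gnorm for x, g in zip(atom, ga)]
                     for atom, ga in zip(gap, grad)]
            e_trial = total(trial)
            if e_trial < e:
                gap, e = trial, e_trial
                step *= 1.2
                break
            step *= 0.5
        else:
            break
    return gap, e_start, e
```

Calling `minimize_gap(left, right, n_gap=2)` with three anchor coordinates on each side returns the relaxed gap coordinates together with the starting and final energies. A production implementation would more likely use analytic gradients and a proper Lennard-Jones term.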