Protein-Protein Docking Simulations with Local Backbone Flexibility

Maxim Totrov¹
¹max@molsoft.com, MolSoft LLC

Introduction

While a number of docking algorithms[1] can now easily reconstitute the structure of a protein-protein complex from subunits taken in their bound conformations, the success rate and accuracy of predictions drop dramatically in the more realistic setting where unbound subunit structures are used[2-4]. Several approaches to incorporating side-chain flexibility into docking simulations have been reported[5-10]. However, the conformational change upon binding is often not restricted to side-chains but involves significant backbone movement, which presents an even bigger challenge. In a frequently observed mode of protein-protein association, one partner binds a relatively flexible loop on the surface of the other. Large backbone movements within the binding loop make such systems difficult for protein-protein docking algorithms. For example, experimental structures of bound and free eglin-C and chymotrypsin-alpha are available, and their superposition shows that while the bound/unbound heavy-atom (backbone) RMSD for chymotrypsin-alpha is only 1.1 (0.6) Å, for eglin-C it is 2.2 (1.4) Å, and for its binding loop (residues 42-48) it reaches 4.4 (2.9) Å. Both rigid-body docking and docking with flexible side-chains fail for this system. Here, a locally flexible backbone docking simulation of eglin-C/chymotrypsin-alpha based on Internal Coordinate Mechanics (ICM)[11,12] is reported.

Method

In general, the ICM methodology allows any part of the system to be kept rigid or flexible by selecting the subset of free internal variables, and it efficiently includes in the energy calculation only the interactions that depend on these free variables. However, the situation where a fragment of the polypeptide chain needs to be flexible while the adjacent segments are kept rigid poses a problem.
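To make the problem concrete, here is a minimal pure-Python sketch (not ICM code) of a toy chain of pseudo-atoms held in a single internal-coordinate tree. Changing one torsion rigidly rotates every atom downstream of that bond about the bond axis, so the displacement grows with distance from the axis. The function names `rotate_about_axis` and `change_torsion`, the zigzag geometry and the 10° torsion change are illustrative assumptions.

```python
import math

def distance(p, q):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((p[i] - q[i]) ** 2 for i in range(3)))

def rotate_about_axis(p, a, b, theta):
    """Rotate point p by theta radians about the axis through a and b
    (Rodrigues' rotation formula)."""
    k = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in k))
    k = [c / norm for c in k]                      # unit axis vector
    v = [p[i] - a[i] for i in range(3)]            # p relative to the axis origin
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    return [a[i] + v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1 - cos_t)
            for i in range(3)]

def change_torsion(chain, j, delta):
    """Change the torsion about bond chain[j]-chain[j+1] by delta radians.
    As in a single internal-coordinate tree, every atom after the bond moves
    rigidly; atoms up to and including chain[j+1] stay where they are."""
    a, b = chain[j], chain[j + 1]
    return chain[:j + 2] + [rotate_about_axis(p, a, b, delta) for p in chain[j + 2:]]

# Toy zigzag "backbone": 40 pseudo-atoms, roughly 1.6 Å apart.
chain = [[1.5 * i, 0.5 * (i % 2), 0.0] for i in range(40)]
moved = change_torsion(chain, 5, math.radians(10.0))
print(f"atom 7 moved {distance(chain[7], moved[7]):.2f} Å")    # first atom past the bond
print(f"atom 39 moved {distance(chain[39], moved[39]):.2f} Å")  # far end of the chain
```

For this geometry a 10° change at bond 5-6 shifts atom 7 by about 0.17 Å but the last atom by about 2.8 Å: a torsion change inside a loop drags the whole downstream part of the chain with it.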
Freeing backbone torsions only within the loop of interest would not by itself restrict flexibility to that region, because any torsion change within the loop would also move the C-terminal part of the protein beyond the loop relative to the N-terminal part. To circumvent this problem while maintaining the efficiency of the ICM methodology, the loop of interest was excised from the rest of the protein and represented by a separate internal coordinate tree. The covalent geometry at the junctions was maintained using virtual ‘shadow’ atoms that duplicated the corresponding real atoms in the adjoining parts of the molecule: a Calpha atom in the N-terminal part and a C atom in the C-terminal part. Strong (100 kcal/(mol·Å²)) harmonic distance restraints between the ‘shadow’ virtual atoms and the corresponding real atoms kept the junctions in a near-ideal configuration. Placing the junction at the C-Calpha bond minimizes the effect of the chain disruption on the force-field energy, since the corresponding psi torsion has the smallest rotation barrier. The rest of the molecule moved as a rigid body, but the side-chain torsions of residues within 6 Å of the loop, as well as the C-terminal glycine residue, were also left flexible. The ICM biased probability Monte Carlo (BPMC)[13] global minimization procedure was applied. Free side-chain torsions and the loop backbone phi and psi angles were subject to BPMC random moves, each followed by up to 1500 steps of local gradient minimization. During local minimization, the variables controlling the overall position and orientation of the ligand were also allowed to change. The simulation was terminated after 10 million energy evaluations, or approximately 17,000 random moves. The initial position of the ligand (eglin-C) was taken from a previously reported rigid-body grid docking simulation[4].
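The global optimization loop described above (random move, local minimization, acceptance test, stop after a fixed number of energy evaluations) can be sketched in pure Python as follows. In the reported runs the budget of 10,000,000 evaluations corresponded to about 17,000 moves, i.e. roughly 10,000,000 / 17,000 ≈ 588 evaluations per move on average. This is a simplified Monte Carlo-minimization sketch, not ICM's BPMC: BPMC draws its random moves from biased, non-uniform distributions, whereas here moves are uniform; the toy energy `toy_torsion_energy`, the finite-difference gradient descent standing in for ICM's minimizer, the Metropolis acceptance test and the `temperature` value are all assumptions for illustration.

```python
import math
import random

class CountingEnergy:
    """Wraps an energy function and counts calls, so the run can be stopped
    after a fixed budget of energy evaluations."""
    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.func(x)

def toy_torsion_energy(angles):
    """Hypothetical rugged energy over torsion angles (radians): threefold
    torsional terms plus a weak coupling between neighbouring angles."""
    e = sum(1.0 + math.cos(3.0 * a) for a in angles)
    e += sum(0.3 * math.cos(a - b) for a, b in zip(angles, angles[1:]))
    return e

def local_minimize(energy, x, max_steps, step=0.05, h=1e-4):
    """Fixed-step gradient descent with forward-difference gradients,
    a stand-in for ICM's local gradient minimizer."""
    x = list(x)
    e = energy(x)
    for _ in range(max_steps):
        grad = []
        for i in range(len(x)):
            xp = list(x)
            xp[i] += h
            grad.append((energy(xp) - e) / h)
        trial = [xi - step * g for xi, g in zip(x, grad)]
        e_trial = energy(trial)
        if e_trial >= e:  # no further descent at this step size
            break
        x, e = trial, e_trial
    return x, e

def mc_minimize(energy, x0, max_evals, max_min_steps=1500, temperature=0.6, seed=0):
    """Random move -> local minimization -> Metropolis test, repeated until the
    evaluation budget is used up (checked between moves, so it may overshoot)."""
    rng = random.Random(seed)
    x, e = local_minimize(energy, x0, max_min_steps)
    best_x, best_e = list(x), e
    moves = 0
    while energy.calls < max_evals:
        trial = list(x)
        i = rng.randrange(len(trial))
        trial[i] = rng.uniform(-math.pi, math.pi)  # uniform, NOT BPMC's biased distributions
        trial, e_trial = local_minimize(energy, trial, max_min_steps)
        moves += 1
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temperature):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = list(x), e
    return best_x, best_e, moves

energy = CountingEnergy(toy_torsion_energy)
best_x, best_e, moves = mc_minimize(energy, [0.0] * 6, max_evals=20000)
print(f"{moves} moves, {energy.calls} evaluations, best energy {best_e:.3f}")
```

Minimizing after every random move means the acceptance test compares local minima rather than raw random conformations, which is what lets a modest number of moves explore a rugged landscape.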
The conformation with the binding loop of the ligand oriented towards the binding site of the receptor was selected as the starting point for the flexible docking simulation. This conformation had an RMSD of 5.3 Å for the heavy atoms of the loop residues. While the mutual orientation of the ligand and receptor resembled that of the native complex, the interface was very loose and the residue-residue contacts were mostly wrong, owing to the completely different conformation of the loop in the NMR structure of unbound eglin-C (PDB code 1egl) used as the starting point. The receptor (chymotrypsin-alpha) was taken from an unbound X-ray structure (PDB code 5cha) and was represented by grid potentials[4]. While flexibility of the receptor was not treated explicitly, truncated van der Waals potentials (maximal repulsion 1.5 kcal/mol) implicitly allowed for some ‘softness’.

Results

Ten simulations were run in parallel to assess convergence. Each simulation took about 24 hours on a single 2.6 GHz Pentium 4 CPU. Three runs reached very similar lowest-energy conformations. The best-energy conformation had all side-chains of the binding loop correctly placed into the corresponding receptor pockets, and the RMSD for the 7 loop contact residues with respect to the experimental complex structure (PDB code 1acb) was only 1.9 Å (1.3 Å for the backbone). The salt bridge between Arg51 and Asp46, absent in the unbound eglin-C structure, was formed. Thus, significant conformational changes can be successfully predicted by locally flexible backbone docking simulations in internal coordinates.

In the present study, the flexible binding loop was assumed to be known. While this requirement is clearly a limitation, such information is often available from non-structural experimental data (e.g., mutagenesis), from structural data for the free subunits (B-factors, disordered regions), and from computational evaluation of local protein flexibility.
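The implicit ‘softness’ that truncated van der Waals potentials give the rigid receptor can be illustrated with a short sketch. The exact truncation formula used by ICM is not given here, so the sketch assumes a simple smooth cap, e_max * e / (e_max + e), applied only to repulsive energies; it leaves attraction unchanged and has slope 1 at e = 0. The Lennard-Jones parameters `epsilon` and `r_min` are hypothetical illustration values; only the 1.5 cap (taken here as kcal/mol) comes from the text.

```python
def lennard_jones(r, epsilon=0.2, r_min=3.8):
    """12-6 Lennard-Jones pair energy in the r_min form
    (hypothetical parameters, kcal/mol and Å)."""
    x = (r_min / r) ** 6
    return epsilon * (x * x - 2.0 * x)

def soften(e, e_max=1.5):
    """Smoothly cap repulsive energies below e_max; attraction is untouched.
    The form e_max * e / (e_max + e) is an assumption, not ICM's formula."""
    if e <= 0.0:
        return e
    return e_max * e / (e_max + e)

for r in (2.0, 2.5, 3.0, 3.5, 3.8, 4.5):
    raw = lennard_jones(r)
    print(f"r = {r:.1f} Å   raw = {raw:12.3f}   truncated = {soften(raw):7.3f}")
```

With the cap in place, even a severe steric clash costs less than 1.5 kcal/mol per atom pair, so a ligand loop can pass through or settle against receptor side-chains whose unbound positions would otherwise block it.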
The results demonstrate the feasibility of including backbone flexibility in protein-protein binding simulations.

References
1. Halperin I, Ma B, Wolfson H, Nussinov R. Proteins. 2002 Jun 1;47(4):409-43.
2. Janin J, Henrick K, Moult J, Eyck LT, Sternberg MJ, Vajda S, Vakser I, Wodak SJ. Proteins. 2003 Jul 1;52(1):2-9.
3. Mendez R, Leplae R, De Maria L, Wodak SJ. Proteins. 2003 Jul 1;52(1):51-67.
4. Fernandez-Recio J, Totrov M, Abagyan R. Protein Sci. 2002 Feb;11(2):280-91.
5. Totrov M, Abagyan R. Nat Struct Biol. 1994 Apr;1(4):259-63.
6. Jackson RM, Gabb HA, Sternberg MJ. J Mol Biol. 1998 Feb 13;276(1):265-85.
7. Gray JJ, Moughon S, Wang C, Schueler-Furman O, Kuhlman B, Rohl CA, Baker D. J Mol Biol. 2003 Aug 1;331(1):281-99.
8. Zacharias M. Protein Sci. 2003 Jun;12(6):1271-82.
9. Lorber DM, Udo MK, Shoichet BK. Protein Sci. 2002 Jun;11(6):1393-408.
10. Fernandez-Recio J, Totrov M, Abagyan R. Proteins. 2003 Jul 1;52(1):113-7.
11. Abagyan RA, Totrov MM, Kuznetsov DA. J Comput Chem. 1994;15:488-506.
12. MolSoft ICM 3.0 Program Manual. MolSoft LLC, San Diego; 2003. Online: http://www.molsoft.com/services/help/frames.htm
13. Abagyan R, Totrov M. J Mol Biol. 1994 Jan 21;235(3):983-1002.