Towards modulating protein-protein interactions: Clustering protein surfaces to identify biologically-relevant structural space to focus molecular design

Stephen Long¹, Mark Smythe², Peter Adams, Darryn Bryant and Tran Trung Tran
¹sml@maths.uq.edu.au, School of Physical Sciences, Institute for Molecular Biosciences, The University of Queensland; ²M.Smythe@protagonist.com.au, Institute for Molecular Bioscience, The University of Queensland and Protagonist Pty. Ltd.

Identifying small molecules that modulate protein-protein interactions remains a major challenge for drug discovery. This is presumably a consequence of a different binding paradigm (the large, flat surfaces of protein-protein interfaces compared with the cavities of existing therapeutic targets), the immense size of the chemical universe (an estimated 10⁶⁰ drug-like molecules), and a lack of known small molecules that modulate protein-protein interactions to serve as starting points for drug discovery.
To focus molecular selection on the part of this vast chemical universe that has the potential to modulate protein-protein interactions, we have clustered protein contact surfaces and identified common side-chain positions of proteins involved in molecular recognition events. Our thesis rests on the well-known structure-function relationship of medicinal chemistry: identifying molecules that match common protein side-chain shapes should significantly aid the discovery of molecules that modulate protein function. To this end, a database of homologous protein-protein complexes was created. From this database, two datasets were extracted, each representing the interaction region between pairs of proteins in these complexes. The first was produced by extracting residues that form contacts (satisfying a maximum distance criterion) across the protein interface. The second contains isolated regions of the Connolly surface of each protein that likewise satisfy a maximum distance criterion and are therefore in "contact" with the opposing interacting protein. The second dataset also records the electrostatic charge of these interacting surfaces.
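The first dataset's extraction step, selecting residues that satisfy a maximum distance criterion across the interface, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function name `interface_residues`, the dictionary-of-coordinates input format, and the 4.5 Å default cutoff (the abstract does not state the distance used).

```python
import math

def interface_residues(chain_a, chain_b, cutoff=4.5):
    """Return the residues of each chain that lie in contact with the other chain.

    chain_a, chain_b: dict mapping a residue id to a list of (x, y, z) atom
    coordinates in angstroms (hypothetical input format).
    cutoff: maximum atom-atom distance counted as a contact (assumed value).
    """
    contacts_a, contacts_b = set(), set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            # Two residues are in contact if any pair of their atoms is
            # within the cutoff distance.
            if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                contacts_a.add(res_a)
                contacts_b.add(res_b)
    return contacts_a, contacts_b

# Toy two-chain "complex": only A:10 and B:5 are close enough to be in contact.
chain_a = {"A:10": [(0.0, 0.0, 0.0)], "A:11": [(20.0, 0.0, 0.0)]}
chain_b = {"B:5": [(3.0, 0.0, 0.0)], "B:6": [(40.0, 0.0, 0.0)]}
contacts_a, contacts_b = interface_residues(chain_a, chain_b)
```

This brute-force version compares every atom pair, which is adequate for a single complex; a spatial index such as a cell grid would be the usual choice when processing a whole database.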
The aim of this research is to cluster features (or motifs) of each of these datasets, and hence to extract general information about the structure of these interacting surfaces. Clustering these datasets, however, is a significant challenge: their large size rules out existing clustering packages, which are too computationally expensive. To overcome this limitation, a quick and simple method for structurally comparing our motifs was implemented, and an algorithm that efficiently scans the search space was developed and parallelised over a cluster of processors.
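The abstract does not specify the comparison method or the search algorithm, so the following is only one plausible sketch of a cheap structural comparison combined with a single-pass clustering. It swaps in two standard techniques chosen for illustration: a sorted pairwise-distance signature (which needs no superposition) and greedy leader clustering. The names `signature`, `dissimilarity` and `leader_cluster`, and the threshold value, are hypothetical.

```python
import math
from itertools import combinations

def signature(points):
    """Sorted pairwise distances of a motif; unchanged by rotation or translation."""
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))

def dissimilarity(sig_a, sig_b):
    """Root-mean-square difference between two signatures of equal, non-zero length."""
    if len(sig_a) != len(sig_b) or not sig_a:
        raise ValueError("motifs must have the same number of points (at least two)")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)) / len(sig_a))

def leader_cluster(motifs, threshold):
    """Assign each motif to the first cluster leader within threshold, else start a new cluster."""
    sigs = [signature(m) for m in motifs]
    leaders = []  # index of the representative motif of each cluster
    labels = []   # cluster index assigned to each motif, in input order
    for i, sig in enumerate(sigs):
        for cluster, leader in enumerate(leaders):
            if dissimilarity(sig, sigs[leader]) <= threshold:
                labels.append(cluster)
                break
        else:
            # No existing leader is close enough: this motif founds a new cluster.
            leaders.append(i)
            labels.append(len(leaders) - 1)
    return labels

# Toy motifs: a right triangle, a rotated and translated copy of it, and a larger triangle.
motifs = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(5.0, 5.0, 5.0), (5.0, 6.0, 5.0), (5.0, 5.0, 6.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)],
]
labels = leader_cluster(motifs, threshold=0.1)
```

Each comparison of a motif against a leader is independent of the others, so in principle that step can be distributed across processors. Note that a pairwise-distance signature cannot distinguish a motif from its mirror image, which may or may not matter for a given application.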