Domain-Domain correlations in Yeast protein complexes

Doron Betel and Christopher W.V. Hogue, Samuel Lunenfeld Research Institute, Mt. Sinai Hospital, and Department of Biochemistry, University of Toronto

A principal component of protein function is the ability of a protein to associate with other macromolecules and facilitate biochemical events. Generally, these events are mediated by functional elements, which have been defined as a combination of conserved sequence regions, 3D motifs, or membership in protein families. Hence, it is often illuminating to study protein relations from the perspective of the domains present in those proteins.

We introduce a novel method for identifying domain-domain co-occurrences in Saccharomyces cerevisiae protein complexes. In this study, we identify pairs of domains that co-occur in the same protein complex at statistically significant frequencies. We analyzed four protein complex datasets, of which two contain human-curated complexes and two are from high-throughput molecular complex detection experiments. Protein domain annotation was derived from our in-house adaptation of NCBI's CDD database. The statistical model is based on computing a P-value for the occurrence of two domains in the same complex in two different proteins. These values are generated by comparing the observed number of co-occurrences of two domains in the same complex over the entire dataset against a random model of co-occurrences. For each dataset, two random models were generated: the first randomly assigns domain annotation to proteins in complexes, while the second randomly shuffles proteins among the complexes but retains the proper domain information for each protein.
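The comparison against the two random models can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the choice to count each domain pair once per complex, the preservation of complex sizes during shuffling, and the empirical P-value estimate (trials with a count at least as large as observed, plus one, over trials plus one) are all assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import combinations

def cooccurrence_counts(complexes, domains):
    """For each unordered domain pair, count the complexes in which the two
    domains occur on two different proteins (assumed counting scheme)."""
    counts = Counter()
    for members in complexes:
        pairs = set()
        # set() guards against a protein listed twice in one complex
        for p, q in combinations(set(members), 2):
            for a in domains.get(p, ()):
                for b in domains.get(q, ()):
                    pairs.add(tuple(sorted((a, b))))
        counts.update(pairs)  # each pair counted once per complex
    return counts

def shuffle_proteins(complexes, rng):
    """Random model: shuffle proteins among complexes, keeping complex
    sizes and each protein's own domain annotation."""
    pool = [p for members in complexes for p in members]
    rng.shuffle(pool)
    shuffled, i = [], 0
    for members in complexes:
        shuffled.append(pool[i:i + len(members)])
        i += len(members)
    return shuffled

def shuffle_domains(domains, rng):
    """Random model: reassign whole domain annotations to random proteins."""
    proteins = list(domains)
    annotations = [domains[p] for p in proteins]
    rng.shuffle(annotations)
    return dict(zip(proteins, annotations))

def empirical_pvalues(complexes, domains, model="proteins", trials=1000, seed=0):
    """Empirical P-value per observed domain pair under the chosen model."""
    rng = random.Random(seed)
    observed = cooccurrence_counts(complexes, domains)
    exceed = Counter()
    for _ in range(trials):
        if model == "proteins":
            null = cooccurrence_counts(shuffle_proteins(complexes, rng), domains)
        else:
            null = cooccurrence_counts(complexes, shuffle_domains(domains, rng))
        for pair, obs in observed.items():
            if null[pair] >= obs:
                exceed[pair] += 1
    return {pair: (exceed[pair] + 1) / (trials + 1) for pair in observed}
```

Here `complexes` is a list of protein-identifier lists and `domains` maps each protein to a set of domain names. Pairs with small P-values under both models would be the candidates for significant domain-domain correlations.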

Graph representations of the domain correlations show that in the high-quality datasets, domains of related or similar functions form independent components within the networks. Using GO functional classification terms, we identified domain clusters that map to well-known molecular complexes, including the ribosome, RNA polymerase, cyclin-dependent kinase, and components of the nucleosome. In contrast, the high-throughput datasets contain both biologically relevant and false correlations that form one large network of domain associations.
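The distinction between independent components and one large network can be made concrete with a short sketch that treats each significant domain pair as an undirected edge and extracts connected components. This is an illustrative assumption about how such a graph might be analyzed, not the procedure used in the study, and the domain names in the usage example are hypothetical.

```python
from collections import defaultdict

def connected_components(edges):
    """Return the connected components of an undirected graph given as
    an iterable of (domain_a, domain_b) edges."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components

# Hypothetical significant pairs, for illustration only
edges = [("kinase", "cyclin"), ("cyclin", "CDK_reg"),
         ("ribosomal_L1", "ribosomal_S4")]
components = connected_components(edges)
```

Under this view, a curated dataset would yield many small components of functionally related domains, while a noisy dataset would tend to merge them into a single large component.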