Protein Function Prediction Using Probabilistic Protein Interaction Networks

Debra S. Goldberg¹, Sharyl Wong², Frederick P. Roth
¹debg@hms.harvard.edu, Harvard Medical School; ²sharyl_wong@student.hms.harvard.edu, Harvard Medical School

Determining the function of genes and proteins is one of the greatest challenges in biology today. Many of the high-throughput experimental methods recently developed to address this need are both expensive and error-prone. Although the uncertainty in experimentally derived networks necessarily affects any inference drawn from them, the function of many uncharacterized proteins can still be inferred. Even when individual observations are unreliable, the structure of the underlying biological system produces patterns that are reflected in the overall topology of the observed network. Not all data are equally reliable, yet predictions have typically relied on an “all-or-nothing” view, in which every observation that passes a confidence threshold is treated equally. This approach does not use all available information, and we expect predictions to improve when all of the information is used effectively. In previous work, we showed that two network properties, neighborhood cohesiveness and degree distribution, can each be used to improve confidence assessment of error-prone networks such as yeast two-hybrid protein interaction data. Here we assign edge weights based on confidence factors derived from the experimental data. These edge weights are then refined within a Bayesian framework that incorporates how well each observation fits the expected local patterns (graph topology) of the network, producing a posterior probability that two proteins truly interact. From this probabilistic network of protein interactions we make improved inferences about protein function. These predictions are assessed computationally using cross-validation and compare favorably with previously published methods.
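
As a rough illustration of the two steps described above, the following Python sketch refines an edge weight with Bayes' rule and then scores candidate functions by probability-weighted neighbor voting. It is a minimal sketch under stated assumptions, not the method used in this work: the function names, the use of a single topology likelihood ratio per edge, and the neighbor-voting rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def posterior_interaction(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

    prior: experiment-derived confidence that the edge is real (0 < prior < 1).
    likelihood_ratio: assumed summary of how much more likely the observed
        local topology is for a true interaction than for a false one.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def predict_function(protein, edges, annotations):
    """Score functions for `protein` by probability-weighted neighbor votes.

    edges: dict mapping (protein_a, protein_b) -> posterior probability.
    annotations: dict mapping protein -> set of known functions.
    Returns a dict of function -> normalized score (hypothetical scoring rule).
    """
    scores = defaultdict(float)
    total = 0.0
    for (a, b), p in edges.items():
        if protein not in (a, b):
            continue
        neighbor = b if a == protein else a
        if neighbor not in annotations:
            continue  # unannotated neighbors cast no vote
        total += p
        for function in annotations[neighbor]:
            scores[function] += p
    if total == 0.0:
        return {}
    return {function: s / total for function, s in scores.items()}


# Toy example with made-up proteins, priors, and likelihood ratios.
edges = {
    ("A", "B"): posterior_interaction(0.5, 3.0),
    ("A", "C"): posterior_interaction(0.5, 1.0 / 3.0),
}
annotations = {"B": {"kinase"}, "C": {"transport"}}
print(predict_function("A", edges, annotations))
```

Unlike a thresholded network, every edge contributes to the prediction here, with low-confidence interactions simply counting for less.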