Understanding how protein families evolve and function remains a central question in molecular biophysics. By grouping evolutionarily related proteins into Functional Families (FunFams), CATH captures structural and functional conservation beyond sequence identity, and with the integration of AlphaFold2 models it offers a representative view of the protein universe. Our group has developed a methodology to quantify local frustration conservation patterns in protein families, providing a biophysical interpretation of evolutionary constraints related to foldability, stability and function.
In this study, we scaled frustration conservation analysis to a representative portion of the protein universe. We analyzed over 8,900 FunFams (2.2M sequences) from CATH and TED, and explored the distributions of frustration states and amino acid identities across the 20 Foldseek 3Di tertiary neighborhoods. We investigated how these geometries influence conservation patterns and found that some amino acids (e.g. C, V, L, F, I, M) are conserved in a minimally frustrated state, indicating their evolutionary importance as structural anchors. Other residues (e.g. T, S, H, G) tend to be conserved in a neutral state, which has historically been overlooked; this suggests that neutral frustration is not just an energetic buffering state but an evolutionarily constrained one. Additionally, some residues (e.g. D, K, E, N, Q) exhibit high proportions of conserved high frustration, potentially relevant for function.
We present the first large-scale frustration survey of the protein universe, which allows us to distinguish whether sequence conservation reflects stability, neutrality or function. This framework offers a new way of interpreting conservation and lays the foundation for a biophysically informed understanding of protein evolution.