SPfast: Highly efficient protein structure alignment with segment-level representations and block-sparse optimization
Confirmed Presenter: Thomas Litfin, Griffith University, Australia
Room: 520a
Format: In Person
Moderator(s): Chris Kieslich
Authors List:
- Thomas Litfin, Griffith University, Australia
Presentation Overview:
Recent advances in protein structure modelling have made high-quality protein structures available at an unprecedented scale. Newly available structure libraries represent an exciting opportunity for discovery-based research. However, the explosion of protein structure data has exposed scaling deficiencies in the bioinformatics toolset, which limit the utility of these libraries for downstream analyses. These scaling problems will only be exacerbated as modelling projects expand to noncanonical isoforms, dynamic trajectories, de novo designs, etc. Foldseek introduced a structure state alphabet to mitigate this computational burden. However, the increased speed comes with a trade-off in search sensitivity, because information about global topology is sacrificed. In this work we describe a fully geometric protein structure search engine, SPfast, which leverages a coarse-grained, hierarchical representation and an efficient block-sparse optimization heuristic to greatly accelerate pairwise protein structure alignment and enable practical analysis of large-scale structure libraries. Combining SPfast with a newly parameterized SPscore maintains state-of-the-art performance for database search, more accurately reproduces pairwise evolutionary alignments, and increases throughput by 100x compared with traditional methods.
STRPsearch: fast detection of structured tandem repeat proteins
Confirmed Presenter: Alexander Monzon, Department of Information Engineering, University of Padova, Italy
Room: 520a
Format: In Person
Moderator(s): Chris Kieslich
Authors List:
- Soroush Mozaffari, Department of Biomedical Sciences, University of Padova, Italy
- Paula Nazarena Arrias, Department of Biomedical Sciences, University of Padova, Italy
- Damiano Clementel, Department of Biomedical Sciences, University of Padova, Italy
- Damiano Piovesan, Department of Biomedical Sciences, University of Padova, Italy
- Carlo Ferrari, Department of Information Engineering, University of Padova, Italy
- Silvio Tosatto, University of Padova, Italy
- Alexander Monzon, Department of Information Engineering, University of Padova, Italy
Presentation Overview:
State-of-the-art prediction methods are generating millions of publicly available protein structures. Structured Tandem Repeat Proteins (STRPs) constitute a subclass of tandem repeats characterized by repetitive structural motifs. STRPs exhibit distinct propensities for secondary structure and form regular tertiary structures, often comprising large molecular assemblies. They can perform important and diverse biological functions due to their highly degenerate sequences, which maintain a similar structure while displaying a variable number of repeat units. This suggests a disconnection between structural size and protein function. However, automatic detection of STRPs remains challenging because current state-of-the-art tools lack accuracy and have long execution times, which hinders their application to large datasets. In most cases, manual curation is the most accurate method for detecting and classifying STRPs, which makes inspecting millions of structures infeasible.
We present STRPsearch, a novel computational tool for rapid identification, classification, and mapping of STRPs. Leveraging the manually curated entries in RepeatsDB as the known conformational space of STRPs, STRPsearch uses the latest advances in structural alignment techniques for fast and accurate detection of repeated structural motifs in protein structures, followed by an innovative approach that maps units and insertions by generating TM-score graphs. STRPsearch can serve researchers in structural bioinformatics and protein science as an efficient and practical tool for the analysis and detection of STRPs.