Posters - Schedules


Monday, July 24, between 18:00 CEST and 19:00 CEST
Tuesday, July 25, between 18:00 CEST and 19:00 CEST
Session A Poster Set-up and Dismantle
Session A Posters set up:
Monday, July 24, between 08:00 CEST and 08:45 CEST
Session A Posters dismantle:
Monday, July 24, at 19:00 CEST
Session B Poster Set-up and Dismantle
Session B Posters set up:
Tuesday, July 25, between 08:00 CEST and 08:45 CEST
Session B Posters dismantle:
Tuesday, July 25, at 19:00 CEST
Wednesday, July 26, between 18:00 CEST and 19:00 CEST
Session C Poster Set-up and Dismantle
Session C Posters set up:
Wednesday, July 26, between 08:00 CEST and 08:45 CEST
Session C Posters dismantle:
Wednesday, July 26, at 19:00 CEST
Virtual
B-001: Computational resources for understanding and predicting the binding affinity of protein-nucleic acid complexes.
Track: 3D-SIG
  • Harini Kannan, Indian Institute of Technology, Madras, India
  • M. Michael Gromiha, Indian Institute of Technology, Madras, India


Presentation Overview:

Protein-nucleic acid interactions play an important role in various biological processes such as gene expression, replication and transcription. Specifically, understanding the binding affinity of protein-nucleic acid complexes is important for elucidating their recognition mechanisms. Despite the increase in experimental data, no well-curated database is currently available for protein-nucleic acid binding affinity. Hence, we developed a database, ProNAB, which contains more than 20,000 experimental data points for the binding affinities of protein-DNA and protein-RNA complexes. Each entry has comprehensive information on sequence and structural features of a protein, nucleic acid and its complex, experimental conditions, thermodynamic parameters such as dissociation constant (Kd) and binding free energy (ΔG), and cross-links to other databases. ProNAB is freely available at https://web.iitm.ac.in/bioinfo2/pronab/. Further, we filtered the binding free energy data for 391 non-redundant protein-DNA complexes. We derived several structure-based features to develop multiple regression equations for predicting the binding affinity of protein-DNA complexes. Our method showed an average correlation and mean absolute error of 0.78 and 0.98 kcal/mol, respectively, on a jackknife test. We have developed a webserver, PDA-PRED, which is freely available at https://web.iitm.ac.in/bioinfo2/pdapred/
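
The two thermodynamic parameters named in this abstract are linked by the standard relation ΔG = RT ln(Kd), with Kd expressed relative to a 1 M standard state. The sketch below illustrates that conversion only. It is not taken from ProNAB or PDA-PRED; the function names and the default temperature of 298.15 K are assumptions made for the example.

```python
import math

# Gas constant in kcal/(mol*K); the abstract reports errors in kcal/mol.
R_KCAL = 1.987204e-3


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Return the binding free energy (kcal/mol) for a Kd given in mol/L.

    Uses dG = R * T * ln(Kd); tighter binding (smaller Kd) gives a more
    negative value. The function name is hypothetical.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def dissociation_constant(delta_g: float, temperature_k: float = 298.15) -> float:
    """Inverse conversion: return Kd (mol/L) for a free energy in kcal/mol."""
    return math.exp(delta_g / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # A nanomolar binder comes out at roughly -12.3 kcal/mol at 298.15 K.
    print(round(binding_free_energy(1e-9), 2))
```

On this scale, the reported mean absolute error of 0.98 kcal/mol corresponds to roughly a five-fold uncertainty in Kd at room temperature, since exp(0.98 / RT) is about 5.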

B-002: Computational resources for understanding the binding affinity of membrane protein-protein complexes and their mutants
Track: 3D-SIG
  • Fathima Ridha Karuvanthodikayil, Department of Biotechnology, Indian Institute of Technology Madras, Chennai, India
  • M. Michael Gromiha, Department of Biotechnology, Indian Institute of Technology Madras, Chennai, India


Presentation Overview:

Membrane protein (MP) complexes perform vital cellular functions, which are mainly dictated by their binding affinity. Due to their intricate structure, however, the binding affinity of membrane proteins is less explored than that of globular proteins. Mutations in these complexes affect their binding affinity, impair critical functions, and may lead to diseases. Although experimental affinity data in the literature are increasing, they are dispersed, necessitating their compilation into a comprehensive database for further analysis. Hence, we developed the first dedicated database, MPAD (https://web.iitm.ac.in/bioinfo2/mpad), which contains experimental binding affinities of membrane protein-protein complexes and their mutants along with sequence, structure, functional information, membrane-specific features, experimental conditions, and literature information, with an easy-to-use interface and options to build search queries and to display, sort, download, and upload the data. Using this database, we have developed the first ML-based method for predicting the affinity of novel MP complexes based on classification methodology. Our method showed a correlation and MAE of 0.83 and 0.91 kcal/mol, respectively, in a jackknife test on a set of 114 complexes. Thus, these resources contribute to an in-depth understanding of membrane protein complexes, with potential applications in drug design and other analyses.

B-003: Improvement of protein tertiary and quaternary structure predictions using the ReFOLD refinement method and the AlphaFold2 recycling process
Track: 3D-SIG
  • Recep Adiyaman, University of Reading, United Kingdom
  • Liam McGuffin, University of Reading, United Kingdom
  • Ahmet Genc, University of Reading, United Kingdom
  • Nicholas Edmunds, University of Reading, United Kingdom
  • Shuaa Alharbi, University of Reading, United Kingdom


Presentation Overview:

Motivation: The accuracy gap between predicted and experimental structures has been significantly reduced following the development of AlphaFold2 (AF2). However, for many targets, AF2 models still have room for improvement. In previous CASP experiments, highly computationally intensive MD simulation-based methods have been widely used to improve the accuracy of single 3D models.
Here, our ReFOLD pipeline was adapted to refine AF2 predictions while maintaining high model accuracy at a modest computational cost. Furthermore, the AF2 recycling process was utilised to improve 3D models by using them as custom template inputs for tertiary and quaternary structure predictions.
Results: According to the MolProbity score, 94% of the 3D models generated by ReFOLD were improved. AF2 recycling showed an improvement rate of 87.5% (using MSAs) and 81.25% (using single sequences) for monomeric AF2 models and 100% (MSA) and 97.8% (single sequence) for monomeric non-AF2 models, as measured by the average change in lDDT. By the same measure, the recycling of multimeric models showed an improvement rate of as much as 80% for AF2-Multimer (AF2M) models and 94% for non-AF2M models.
Availability: Refinement using AlphaFold2-Multimer recycling is available as part of the MultiFOLD docker package (https://hub.docker.com/r/mcguffin/multifold). The ReFOLD server is available at https://www.reading.ac.uk/bioinf/ReFOLD/

B-004: Graph-Theoretical Prediction of Biological Modules in Large Protein Complexes
Track: 3D-SIG
  • Florian J. Gisdon, Goethe University Frankfurt am Main, Germany
  • Mariella Zunker, Goethe University Frankfurt am Main, Germany
  • Jan Niclas Wolf, Goethe University Frankfurt am Main, Germany
  • Jörg Ackermann, Goethe University Frankfurt am Main, Germany
  • Ina Koch, Goethe University Frankfurt am Main, Germany


Presentation Overview:

The structural and functional complexity of biological processes is highly related to the assembly of individual proteins into diverse protein complexes. The discovery and characterization of protein complexes has progressed substantially in recent years through advances in cryo-electron microscopy, proteomics, and computational structure prediction. It is therefore essential to have tools to process this volume of data and to extract relevant information for structural and functional characterization.
We have developed the Protein Topology Graph Library, which computes the quaternary structure of protein complexes as undirected graphs with protein chains as vertices and chain-chain contacts as edges. Our hypothesis was that clusters obtained by graph-theoretical division of the protein-complex graph correspond to biological modules. For evaluation, we used the human respiratory complex I, an extensively investigated complex with an experimentally assigned module structure. By applying the Leiden clustering algorithm, we obtained five clusters, which are in good agreement with the biological modules.
We conclude that the prediction of biological modules for large protein complexes can be performed with graph-theoretical approaches. Our aim is to provide a suitable tool to deal with the high complexity of the data and the increasing number of large protein complexes.
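
The graph representation described above can be sketched in a few lines. The example below is an illustration only and does not use the Protein Topology Graph Library. It builds an undirected chain-contact graph and scores a candidate division with Newman modularity, which is one of the quality functions the Leiden algorithm can optimize. The chain identifiers, the toy contact list, and the function names are all hypothetical.

```python
from collections import defaultdict


def build_chain_graph(contacts):
    """Undirected graph with protein chains as vertices and contacts as edges."""
    graph = defaultdict(set)
    for a, b in contacts:
        if a != b:  # ignore self-contacts
            graph[a].add(b)
            graph[b].add(a)
    return graph


def modularity(graph, clusters):
    """Newman modularity Q of a partition given as a list of vertex sets.

    Assumes the graph has at least one edge.
    """
    m = sum(len(nbrs) for nbrs in graph.values()) / 2  # number of edges
    q = 0.0
    for cluster in clusters:
        # Each internal edge is seen from both endpoints, hence the / 2.
        internal = sum(1 for u in cluster for v in graph[u] if v in cluster) / 2
        degree_sum = sum(len(graph[u]) for u in cluster)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


# Toy complex: two tightly connected groups of chains joined by one contact.
contacts = [("A", "B"), ("B", "C"), ("A", "C"),
            ("D", "E"), ("E", "F"), ("D", "F"),
            ("C", "D")]
graph = build_chain_graph(contacts)
module_like = modularity(graph, [{"A", "B", "C"}, {"D", "E", "F"}])
scrambled = modularity(graph, [{"A", "D"}, {"B", "E"}, {"C", "F"}])
```

A division that follows the densely connected groups scores higher than a scrambled one, which is the sense in which clusters of the chain-contact graph can be compared against experimentally assigned modules.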

B-005: Role of coevolution in drug resistance mechanism of EGFR
Track: 3D-SIG
  • Gyan Prakash Rai, Central University of South Bihar, India
  • Asheesh Shanker, Central University of South Bihar, India


Presentation Overview:

Background
Lung cancer remains a major public health concern and the leading cause of cancer-related death. Global cancer statistics show that more than 2 million people receive a new lung cancer diagnosis each year. Although EGFR-TKIs have a high initial response rate in EGFR-mutant NSCLC patients, resistance emerges after a period of continuous treatment. As a result, more research into the mechanisms of resistance, as well as new therapeutic strategies and inhibitors, is required for the treatment of NSCLC.
Methods
To understand the evolutionary mechanism of drug resistance, we detected compensatory mutations in EGFR and performed structural analysis to check their effect on stability. Compensatory mutations improve fitness in genotypes that contain deleterious mutations.
Results
Our analysis suggests that resistance to the tyrosine kinase inhibitors used to treat non-small cell lung cancer arises from a group of coevolving mutations. Our study also reveals the involvement of coevolution in secondary structures, which helps explain the mechanism of drug resistance.
Conclusion
Our study provides insight into the structural, functional, and dynamical aspects of drug resistance changes in EGFR, which will help in designing drugs with higher efficacy and may also be used for other proteins.

B-006: Using writhe to produce realistic protein structure predictions from BioSAXS data
Track: 3D-SIG
  • Arron Bale, Durham University, United Kingdom
  • Christopher Prior, Durham University, United Kingdom


Presentation Overview:

We present fast and simple-to-implement measures of the entanglement of protein tertiary structures, appropriate for highly flexible structure comparison. These quantities are based on the writhe, which is heavily utilised in DNA topology studies. We show how they can be applied in a novel manner across various scales of the protein’s backbone to identify similar topologies which can be missed by more common RMSD-based comparison methods. We derive empirical bounds on the entanglement implied by these measures and show how they can be used to constrain the search space of a protein for solution scattering, a method highly suited to determining the likely structure of proteins in solution where crystal structure or machine learning predictions often fail. In addition, we identify large-scale helical geometries present in a large array of proteins which are consistent across a number of different protein structure types and sequences. This is used in one specific case to demonstrate significant structural similarity between Rossmann fold and TIM barrel proteins, a link which is potentially significant as attempts to engineer the latter have in the past produced the former.

B-007: ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins.
Track: 3D-SIG
  • Brennan Abanades, University of Oxford, United Kingdom
  • Wing Ki Wong, Large Molecule Research, Roche Pharma Research and Early Development, Roche Innovation Center Munich, Germany
  • Fergus Boyles, Department of Statistics, University of Oxford, United Kingdom
  • Guy Georges, Large Molecule Research, Roche Pharma Research and Early Development, Roche Innovation Center Munich, Germany
  • Alexander Bujotzek, Large Molecule Research, Roche Pharma Research and Early Development, Roche Innovation Center Munich, Germany
  • Charlotte M Deane, Department of Statistics, University of Oxford, United Kingdom


Presentation Overview:

Immune receptor proteins play a key role in the immune system and have shown great promise as biotherapeutics. The structure of these proteins is critical for understanding their antigen binding properties. Here, we present ImmuneBuilder, a set of deep learning models trained to accurately predict the structure of antibodies (ABodyBuilder2), nanobodies (NanoBodyBuilder2) and T-Cell receptors (TCRBuilder2). We show that ImmuneBuilder generates structures with state-of-the-art accuracy while being far faster than AlphaFold2. For example, on a benchmark of 34 recently solved antibodies, ABodyBuilder2 predicts CDR-H3 loops with an RMSD of 2.81Å, a 0.09Å improvement over AlphaFold-Multimer, while being over a hundred times faster. Similar results are also achieved for nanobodies (NanoBodyBuilder2 predicts CDR-H3 loops with an average RMSD of 2.89Å, a 0.55Å improvement over AlphaFold2) and TCRs. By predicting an ensemble of structures, ImmuneBuilder also gives an error estimate for every residue in its final prediction. ImmuneBuilder is made freely available, both to download (https://github.com/oxpig/ImmuneBuilder) and to use via our webserver (http://opig.stats.ox.ac.uk/webapps/newsabdab/sabpred). We also make available structural models for ≈150 thousand non-redundant paired antibody sequences (https://zenodo.org/record/7258553).

B-008: To probe activation mechanism of HER2 directed Chimeric T-Cell Receptor (CAR) by antigen using homology modeling and all-atom molecular dynamics simulation by Anton2
Track: 3D-SIG
  • Mariya Hryb, Rowan University, United States
  • Leah Davis, Rowan University, United States
  • Stefi Lao, Rowan University, United States
  • Nichole Daringer, Rowan University, United States
  • Xiaoyang Mou, Rowan University, United States
  • Chun Wu, Rowan University, United States


Presentation Overview:

Recently, anti-HER2-CAR T-cells have been developed and have shown promising in vivo cytotoxicity against human HER2-positive breast cancer in a mouse model. The apo-form of this anti-HER2-CAR consists of five structural domains: the HER2 antibody (AB) domain, the CD8α hinge (HI) and transmembrane (TM) domains, the CD137 costimulatory (CS) domain, and the CD3ζ signaling (SI) domain, which contains three immunoreceptor tyrosine-based activation motifs (ITAMs) that are phosphorylated by the kinase Lck. Although a crystal structure of the mouse AB in complex with the human HER2 extracellular antigen domain (AG) is available, high-resolution structures of the other domains are not, hindering our understanding of the receptor activation involved in signal transduction across the membrane. In this study, molecular dynamics simulation was used to investigate the structure and dynamics of a newly developed anti-HER2-CAR T-cell receptor homology model, and a novel mechanism was found: the Binding Induced Domain Flexibility Switch (BIDFS), in which the binding of the HER2 antigen to the stimulating AB domain causes flexibility changes between the extracellular AB and intracellular SI domains, facilitating signal transduction and activation of the receptor. This mechanism is potentially applicable to other receptor tyrosine kinases.

B-009: Emerging variants of SARS-CoV-2 NSP10 highlight strong functional conservation of its binding to two non-structural proteins, NSP14 and NSP16
Track: 3D-SIG
  • Huan Wang, UNIVERSITY COLLEGE LONDON, United Kingdom
  • Shozeb Haider, UNIVERSITY COLLEGE LONDON, United Kingdom


Presentation Overview:

The coronavirus SARS-CoV-2 protects its RNA from being recognized by host immune responses by methylation of its 5’ end, known as capping. This process is carried out by two enzymes, NSP16 through its 2’-O-methyltransferase activity and NSP14 through its N7 methyltransferase activity, which are essential for replication of the viral genome and for evading the host’s innate immunity. NSP10 acts as a crucial cofactor and stimulator of NSP14 and NSP16. To further understand the role of NSP10, we carried out a comprehensive analysis of >13 million globally collected whole-genome sequences of SARS-CoV-2 and compared them with the reference genome Wuhan/WIV04/2019 to identify all currently known variants in NSP10. T12I, T102I, and A104V in NSP10 were identified as the three most frequent variants and were characterized using X-ray crystallography, biophysical assays and enhanced sampling simulations. The functional effects of the variants were examined for their impact on the binding affinity and stability of both NSP14-NSP10 and NSP16-NSP10 complexes. These results highlight the limited changes induced by variant evolution in NSP10 and reflect the critical roles NSP10 plays during the SARS-CoV-2 life cycle. They also indicate that the virus has limited capacity to overcome inhibitors targeting NSP10 by generating variants in inhibitor binding pockets.

B-010: Structure-based deep learning for binding site detection in nucleic acid macromolecules
Track: 3D-SIG
  • Igor Kozlovskii, Skolkovo Institute of Science and Technology, Russia
  • Petr Popov, Skolkovo Institute of Science and Technology, Russia


Presentation Overview:

Most of the human genome is considered ‘undruggable’, as the largest part of it does not encode proteins and only a minor part of expressed proteins has been drugged. One possible way to overcome this issue is to target RNA molecules directly. Knowledge of the binding site position is crucial for structure-based drug design, and computational methods can significantly speed up this process. Deep learning-based approaches are state of the art in many tasks; however, the existing approaches for this problem were trained or evaluated on a relatively low number of nucleic acid structures. In this work, we have constructed a large dataset of ~2000 nucleic acid-small molecule complexes and developed BiteNetN, the first structure-based deep learning method for the prediction of nucleic acid-small molecule binding sites. We demonstrated that BiteNetN outperforms other methods and is applicable to different types of nucleic acids. Finally, we applied BiteNetN to two case studies, HIV-1 TAR RNA and ATP-aptamer, and showed that the model is feasible for large-scale analysis of different conformations and mutant variants.

B-011: Understanding the significance of bifurcated inter-protein interactions in protein-protein complexes
Track: 3D-SIG
  • Sneha Bheemireddy, Indian Institute of Science, Bangalore, India, India
  • Revathy Menon, National Centre for Biological Sciences (NCBS), India
  • Narayanaswamy Srinivasan, Indian Institute of Science, Bangalore, India, India


Presentation Overview:

Multi-protein assemblies play a crucial role in several cellular processes. Studying the functional basis of such complexes begins with analyzing protein-protein interactions. Several studies have highlighted the significance of interfacial residues in protein-protein complexes and their role in conferring stability and specificity to the complex. However, inter-chain bifurcated interactions in protein assemblies are still an unexplored corner. In this work, the features of inter-protein bifurcated interactions in multi-protein complexes have been investigated. We found that bifurcated inter-protein interactions are present in over 635 of the 682 multi-protein assemblies in our datasets. Arg and Tyr are among the residues with the highest propensity to participate in bifurcated inter-protein interactions. Further, we found that most of these residues are hotspots and are moderately to highly conserved, with a few exceptions. We explain the biological significance of bifurcated interactions through a few case studies. Overall, this study expands the knowledge of protein-protein interactions, paving the way for learning about multi-protein assemblies.

B-012: Computational analysis of HK:RR interactions in two component systems
Track: 3D-SIG
  • Sneha Bheemireddy, Indian Institute of Science, Bangalore, India, India
  • Narayanaswamy Srinivasan, Indian Institute of Science, Bangalore, India


Presentation Overview:

Two-component signaling systems (TCSs) form the primary apparatus in bacteria for sensing and responding to extracellular cues. Bacteria can have a few tens to a few hundred distinct TCSs. These TCSs are often positively autoregulated: the histidine kinase (HK) and response regulator (RR) proteins that make up a TCS are expressed downstream of the signal they transduce. This autoregulation improves the sensitivity of the TCS to stimuli and amplifies adaptive responses. Experimental studies on TCSs of M. tuberculosis reveal that for one out of five HKs studied, there was at least one non-cognate RR with higher affinity for the HK than the cognate RR. Phosphorylated HKs would thus preferentially bind the non-cognate RRs, suppressing signal transduction through the cognate pathways. In this work, we augment these findings using computational modeling methods to study the interaction between non-cognate HK-RR pairs in comparison with their cognate partners. Such studies could provide a structural rationale for how non-cognate interactions might be preferred in inter-species interactions as a form of regulation.

B-013: A high-throughput computational pipeline for selection of effective antibody therapeutics against viruses
Track: 3D-SIG
  • Rahul Kaushik, Biotechnology Research Center, Technology Innovation Institute, Abu Dhabi, UAE, United Arab Emirates
  • Naveen Kumar, Zoonotic Diseases Group, ICAR-National Institute of High Security Animal Diseases, Bhopal, India, India
  • Thomas Launey, Biotechnology Research Center, Technology Innovation Institute, Abu Dhabi, UAE, United Arab Emirates


Presentation Overview:

Continuous evolution of viruses such as SARS-CoV-2 generates multiple variants that can evade approved neutralizing antibodies (NAbs) or other antibody therapeutics. Frequently re-testing whether a panel of NAbs has retained its neutralization potential against such emerging variants is costly and time-consuming. In such a case, computational tools that reproduce the experimental binding affinity between a viral target protein and NAbs could play a pivotal role in giving researchers quick guidance on which NAbs remain effective against new virus variants. Hence, we developed and validated a computational pathway that accurately predicts binding affinity in hetero-dimeric and –trimeric protein complexes. We demonstrated the application of this computational pathway in rapid screening of a panel of NAbs for efficacy against the new SARS-CoV-2 Omicron variant, benchmarking against experimental data. We believe that the developed computational pathway could help antiviral researchers in the initial screening of multiple NAbs against viruses of interest.

B-014: GTalign: High-performance protein structure alignment, superposition and search
Track: 3D-SIG
  • Mindaugas Margelevicius, Institute of Biotechnology, Life Sciences Center, Vilnius University, Lithuania


Presentation Overview:

Knowledge of protein structure and function is crucial for understanding life. Protein structures have recently been made available for almost every known protein, most of which are predicted with high accuracy. This resource has opened opportunities to accelerate biomedical research and innovation. Yet, a comprehensive and reliable solution to mining large protein structure databases is currently missing. To address this problem, we developed a novel computational tool for protein structure search and alignment, GTalign. One of its most important features is that it delivers a speedup of more than three orders of magnitude in finding alignment and superposition compared to TM-align. Furthermore, fast prescreening for similarities in structure and sequence space yields an additional speedup of more than 10-fold for database searches. GTalign is accurate, too. The results of a large-scale single-chain structure alignment study show its superiority over TM-align and DALI, the two most commonly used tools. GTalign has many other useful features, is highly configurable, and uses standard structure file and database (TAR) formats. GTalign is available at https://github.com/minmarg/gtalign_alpha

B-015: Automating Structure-Based Design: Integrating Fragment Hotspot Maps in Drug Discovery Pipelines at Scale
Track: 3D-SIG
  • Francesca Vianello, Exscientia, United Kingdom
  • Catarina A. Carvalheda, Exscientia, United Kingdom
  • Gail Bartlett, Exscientia, United Kingdom
  • Jake McGreig, Exscientia, United Kingdom
  • Thorsten Nowak, Exscientia, United Kingdom
  • Chris Radoux, Exscientia, United Kingdom


Presentation Overview:

Structure-based drug design is powerful in discovering and optimising novel therapeutics and small molecule probes. There are still significant challenges in scaling the analysis of all available structures for a given protein target and using the resulting data efficiently and sensibly. Exscientia has developed several scalable automated bioinformatics workflows which incorporate successful methodologies (e.g. Fragment Hotspot Maps) and leverage the scale of our cloud-based infrastructure to support the development of novel, precision-engineered drugs. Here, we describe two applications of such tools.

First, our target tractability assessment process captures a global target profile drawn from all available structural data. Remarkably, our target tractability platform can scale to proteome-wide analysis in a few hours and for less than the cost of a mobile phone. Second, we introduce our work on structure-guided automatic generation of designs, an end-to-end pipeline to produce ready-for-synthesis hit-like molecules. We present our analysis across all kinome structures within the AlphaFold Database, showcasing early experimental validation on two kinase targets, DYRK1B and PKD1. Half of the synthesised hits were active, with our workflows finding at least one low-nanomolar hit per target. Lastly, we will highlight future areas of improvement such as incorporating conformational ensembles to account for protein flexibility.

B-016: Insight on ageing processes effects in collagen I using molecular dynamics
Track: 3D-SIG
  • Zara Msoili, Université de Reims Champagne-Ardenne, CNRS UMR 7369, MEDyC, Reims 51687, France, France
  • Zeynep Pinar Haslak, Université de Reims Champagne-Ardenne, CNRS UMR 7369, MEDyC, Reims 51687, France, France
  • Manuel Dauchez, Université de Reims Champagne-Ardenne, CNRS UMR 7369, MEDyC, Reims 51687, France, France
  • Hua Wong, Université de Reims Champagne-Ardenne, CNRS UMR 7369, MEDyC, Reims 51687, France, France
  • Stéphanie Baud, Université de Reims Champagne-Ardenne, CNRS UMR 7369, MEDyC, Reims 51687, France, France


Presentation Overview:

The extracellular matrix (ECM) plays several roles in cells: attachment, signaling, function, and repair. In the ECM, aging processes can involve post-translational modifications (PTMs). One of these, the carbamylation of lysine (LYS) to form homocitrulline (HCT), is well-studied experimentally. The most useful force field to study it in silico would be AMBER, which supports glycosylation (a major PTM). However, no set of energy parameters associated with HCT within the AMBER99SB*-ILDNP force field is publicly available. In the present study, two complementary approaches were used to (i) derive energy parameters describing HCT in this force field using quantum mechanics and (ii) assess them on experimental and modelled structures through molecular dynamics simulations. Using restrained electrostatic potential, no major deviations were observed for the partial charges of atoms shared with other LYS derivatives. Depending on the system, the presence of one to three HCT residues in the same region affects the overall structure differently (the polyproline-II structure is conserved) and locally influences the dynamics of triple helices. The determined energy parameters could be applied in any in-silico study of ECM components using the AMBER99SB*-ILDNP force field in GROMACS and will be useful for deciphering the role of carbamylation in aging processes.

B-017: FireProt 2.0: Web-based Platform for the Fully-Automated Design of Thermostable Proteins
Track: 3D-SIG
  • Miloš Musil, Brno University of Technology/Loschmidt Laboratories, Czechia
  • Andrej Ježík, Brno University of Technology, Slovakia
  • Jana Horáčková, Loschmidt Laboratories, Czechia
  • Simeon Borko, Brno University of Technology/Loschmidt Laboratories, Slovenia
  • Petr Kabourek, Loschmidt Laboratories, Czechia
  • David Bednář, Loschmidt Laboratories, Czechia
  • Jiří Damborský, Loschmidt Laboratories, Czechia


Presentation Overview:

Thermostable proteins find their use in numerous biomedical and biotechnological applications. However, the computational design of stable proteins is a difficult task that usually results in a set of single-point mutations with a limited effect on protein stability.
FireProt 2.0 builds on top of the previously published FireProt-web, retaining the original functionality and expanding it with several new strategies and quality-of-life improvements. Compared to its predecessor, the new FireProt server provides users with additional multiple-point designs constructed using a novel approach based on the Bron-Kerbosch algorithm minimizing the antagonistic effect between the individual mutations.
Furthermore, it is possible to limit the FireProt calculation to a set of user-defined mutations, run a saturation mutagenesis of the whole protein, or select mutations based on B-factor analysis. Potentially stabilizing mutations predicted from the sequences constructed by ancestral sequence reconstruction are now available as a second evolution-based strategy. FireProt also integrates the AlphaFold database and homology modeling using ProMod3.
Finally, the second version is significantly faster than its predecessor, reducing the calculation time from over a week to 1-2 days, and the reworked user interface makes the tool accessible even to users with older hardware.
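
The abstract does not say how FireProt 2.0 builds its graph or scores designs, so the following is only a sketch of the Bron-Kerbosch algorithm itself under an assumed formulation: candidate mutations are vertices, an edge joins two mutations that are not antagonistic, and each maximal clique is a set of mutually compatible mutations that could form a multiple-point design. The mutation labels, the antagonistic pair, and the function names are hypothetical.

```python
def bron_kerbosch(r, p, x, adj, out):
    """Collect all maximal cliques (basic Bron-Kerbosch, no pivoting).

    r: current clique, p: candidate vertices, x: already processed vertices.
    """
    if not p and not x:
        out.append(frozenset(r))  # r cannot be extended, so it is maximal
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def compatible_sets(mutations, antagonistic_pairs):
    """Return maximal sets of mutations with no antagonistic pair inside."""
    antagonistic = {frozenset(pair) for pair in antagonistic_pairs}
    # Compatibility graph: connect every pair that is NOT antagonistic.
    adj = {
        m: {n for n in mutations
            if n != m and frozenset((m, n)) not in antagonistic}
        for m in mutations
    }
    out = []
    bron_kerbosch(set(), set(mutations), set(), adj, out)
    return out


# Hypothetical candidate mutations; A12V and L45F are assumed to clash.
mutations = ["A12V", "L45F", "K78R", "G101P"]
designs = compatible_sets(mutations, [("A12V", "L45F")])
```

With a single antagonistic pair, the four candidates split into two maximal designs of three mutations each, one containing A12V and the other containing L45F.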

B-018: Predicting large protein complexes by combining AlphaFold with integrative modeling protocols
Track: 3D-SIG
  • Dima Molodenskiy, European Molecular Biology Laboratory, Hamburg, Germany
  • Dingquan Yu, European Molecular Biology Laboratory, Hamburg, Germany
  • Jan Kosinski, European Molecular Biology Laboratory, Hamburg, Germany


Presentation Overview:

The recent advancements in artificial intelligence (AI) have revolutionized the field of structural biology by enabling the prediction of protein structures from amino-acid sequence with near-experimental accuracy. AlphaFold2 and RoseTTAFold2 are two such AI-based structure prediction methods that have been shown to accurately predict the atomic structures of single and multichain proteins. However, modeling large protein complexes remains a challenge.

It is possible to overcome this challenge by utilizing integrative modeling approaches, which involve combining data from multiple experimental techniques such as cryoEM or crosslinks and computational methods to generate accurate structural models of large protein complexes. One such integrative modeling program, Assembline, was developed in our laboratory at EMBL Hamburg and has been successfully applied for modeling 10-100 MDa molecular complexes, including the human nuclear pore complex.

In this poster, we will present our recent developments in modeling protein complexes with AlphaFold Multimer, leveraging available experimental data. We will also discuss the challenges of modeling large complexes and our approaches to tackling them, and demonstrate successful applications of our method to specific biological systems.

B-019: Is Protein BLAST a thing of the past?
Track: 3D-SIG
  • Ali Al-Fatlawi, TU Dresden, Germany
  • Michael Schroeder, TU Dresden, Germany


Presentation Overview:

BLAST, the algorithm that compares and aligns the ever-growing number of sequences, is arguably the most widely used computational tool in molecular biology for searching nucleotide and protein sequences. Three decades after BLAST was introduced, there were major breakthroughs in structure prediction, and tools such as RoseTTAFold and AlphaFold emerged. Consequently, every sequence in the major sequence databases now comes with a model of how it folds in 3D. While this does not affect (non-coding) nucleotide sequences, it raises the question of whether search over 3D protein structures will replace search over protein sequences. Is Protein BLAST a thing of the past?

B-020: Case studies of protein-protein interaction analysis using a profile method.
Track: 3D-SIG
  • Nobuyuki Uchikoga, Meiji University, Japan
  • Yuri Matsuzaki, Tokyo Institute of Technology, Japan


Presentation Overview: Show

Proteins interact with each other to form a network that sustains biological activities in the cell. In this work, we introduce some studies toward understanding protein-protein interaction network mechanisms involving multiple proteins from the viewpoint of bioinformatics.
The rigid-body docking method uses the conformational information of a pair of proteins to generate a large number of possible candidate complex structures called decoy structures. Therefore, this method is useful for obtaining information on the interaction surfaces between proteins. However, since it is difficult to compare protein-protein interaction surface information with only 3-D structure information, we have developed a profile method that represents interaction surface information based on amino acid sequences. This profile information has high affinity to the amino acid sequence information and facilitates the analysis of protein-protein interactions considering the conformational changes of proteins. In addition, it is possible to treat not only a pair of protein-protein interactions but also multiple protein-protein interactions or interaction networks comprehensively.
We have taken advantage of this profiling method and applied it to the protein-protein network of bacterial chemotaxis and discussed the results. We also present our analysis of the interactions considering conformational changes.

B-021: Structural characterization of uncharacterized proteins of arsenic tolerant bacteria Deinococcus indicus DR1 strain
Track: 3D-SIG
  • Giri Vasan, SASTRA Deemed University, India
  • Richa Priyadarshini, Department of Life Sciences, School of Natural Sciences, Shiv Nadar University, India
  • Ragothaman Yennamalli, Department of Bioinformatics, School of Chemical and Biotechnology, SASTRA Deemed to be University, Thanjavur, India


Presentation Overview: Show

Deinococcus indicus DR1 is a novel gram-negative bacterium isolated from the Dadri wetlands of Uttar Pradesh, India. This species shows resistance to high radiation and also to arsenic. Previously, the proteins of the ars gene clusters had been characterized and classified computationally using structure prediction and molecular dynamics simulation. Our aim is to understand the resistance of this species to other heavy metals and to identify the gene clusters responsible for that resistance. The current focus is on the hypothetical and uncharacterized proteins of D. indicus DR1, to which we attempt to assign functional and structural annotations using structural information. This species has a total of 4128 proteins, of which 1017 are annotated as hypothetical proteins. In some cases, the function of a hypothetical protein in the metabolism of the organism can be inferred through structural homology, taking conserved domain analysis into account. The structures of the hypothetical proteins were clustered using MaxCluster, and their domains were identified using SWORD2. The functional annotation pipeline is set up using OrthoFinder to compare sequences from other available Deinococcus genomes. Functional annotation might lead to mapping the metabolic pathways and the structural systems biology of arsenic tolerance in this organism.

B-022: Generative Models for MHC Class I Protein Deimmunization
Track: 3D-SIG
  • Hans-Christof Gasser, University of Edinburgh, United Kingdom
  • Diego Oyarzun, University of Edinburgh, United Kingdom
  • Javier Antonio Alfaro, School of Informatics, University of Edinburgh; International Centre for Cancer Vaccine Science, University of Gdańsk, United Kingdom
  • Ajitha Rajan, University of Edinburgh, United Kingdom


Presentation Overview: Show

Protein therapeutics can be designed for a wide range of applications, such as actively interfering with viral replication or replacing the function of genetically inactive proteins. However, they have drawbacks. One is that they might trigger an unwanted immune response. This response can be directed towards the therapeutic protein or, potentially, one of the viral vector proteins. Generative machine learning has demonstrated immense potential in many areas. The profuse availability of protein sequence and structure data further fuels its application to protein generation. The model presented in this paper represents a novel first step towards utilizing language models in conjunction with reinforcement learning to reduce a protein's MHC-I immune visibility. We show that our model is capable of producing proteins with lower immune visibility, and we validate it using bioinformatics methods. We also compare our model to a state-of-the-art physics-based method and find that our technique is more efficient, requiring fewer peptide classifications, and yields sequences that are more similar to the original protein.

B-023: A comparison of the binding sites of antibodies and single-domain antibodies
Track: 3D-SIG
  • Gemma Gordon, University of Oxford, United Kingdom
  • Henriette Capel, University of Oxford, United Kingdom
  • Bora Guloglu, University of Oxford, United Kingdom
  • Eve Richardson, University of Oxford, United Kingdom
  • Ryan Stafford, Twist Bioscience, United States
  • Charlotte Deane, University of Oxford, United Kingdom


Presentation Overview: Show

Antibodies are the largest class of biotherapeutics. However, in recent years, single-domain antibodies have gained traction due to their smaller size and comparable binding affinity. Antibodies (Abs) and single-domain antibodies (sdAbs) differ in the structures of their binding sites: most significantly, single-domain antibodies lack a light chain and so have just three CDR loops. Given this inherent structural difference, it is important to understand whether Abs and sdAbs are distinguishable in how they engage a binding partner and thus, whether they are suited to different types of epitopes. In this study, we use non-redundant sequence and structural datasets to compare the paratopes, epitopes and antigen interactions of Abs and sdAbs. We demonstrate that even though sdAbs have smaller paratopes, they target epitopes of equal size to those targeted by Abs. To achieve this, the paratopes of sdAbs contribute more interactions per residue than the paratopes of Abs. Additionally, we find that conserved framework residues are of increased importance in the paratopes of sdAbs, suggesting that they include non-specific interactions to achieve comparable affinity. Furthermore, the epitopes of sdAbs and Abs cannot be distinguished by their shape. For our datasets, sdAbs do not target more concave epitopes than Abs: we posit that this may be explained by differences in the orientation and compaction of sdAb and Ab CDR-H3 loops. Overall, our results have important implications for the engineering and humanization of sdAbs, as well as the selection of the best modality for targeting a particular epitope.

B-024: Automated Cryo-EM nuclease structure determination supports fast and efficient nuclease engineering for optimized gene editing
Track: 3D-SIG
  • Yoav Atsmon-Raz, Emendo Biotherapeutics Ltd, Israel
  • Nurit Meron, Emendo Biotherapeutics Ltd, Israel
  • Lior Izhar, Emendo Biotherapeutics Ltd, Israel
  • Idit Buch, Emendo Biotherapeutics Ltd, Israel


Presentation Overview: Show

CRISPR-based gene editing holds great promise for treating a wide range of medical conditions. However, the editing activity and specificity of CRISPR-associated nucleases in mammalian cells are often insufficient to fully harness their therapeutic potential. To optimize nuclease function, protein engineering methodologies are used, supported and guided by 3D structural information. To obtain the 3D structure of the CRISPR ribonucleoprotein (RNP) complex bound to a target DNA, homology modeling or deep-learning methods such as AlphaFold2 are commonly applied. While these are time- and cost-efficient approaches, they are limited for distant homologs and accurate modeling of heterogeneous macromolecules, making single-particle cryo-EM a preferable method. Current cryo-EM structure determination methods are based on manually resolving the model, a typically laborious and error-prone process. We have developed an automated pipeline that resolves all-atom structures of CRISPR RNP-DNA complexes from cryo-EM density maps in a parallelized high-throughput manner. Our approach accounts for the constructed model’s spatial identity to a given template, and for its quality via a scoring function. We have integrated this model into our protein engineering pipeline, where it plays a pivotal role in design and development of novel nuclease variants with improved activity and specificity in mammalian cells.

B-025: Improved computational epitope profiling using structural models identifies a broader diversity of antibodies that bind the same epitope
Track: 3D-SIG
  • Fabian C Spoendlin, University of Oxford, United Kingdom
  • Brennan Abanades, University of Oxford, United Kingdom
  • Matthew I J Raybould, University of Oxford, United Kingdom
  • Wing Ki Wong, Roche Innovation Center Munich, Germany
  • Guy Georges, Roche Innovation Center Munich, Germany
  • Charlotte M Deane, University of Oxford, United Kingdom


Presentation Overview: Show

The function of an antibody is intrinsically linked to which epitope it engages. Clonal clustering methods, based on sequence identity, are commonly used to group antibodies that will bind the same epitope. However, such methods neglect the fact that antibodies with highly diverse sequences can exhibit similar binding site geometries and engage common epitopes. In a previous study we described a method (SPACE1) that structurally clustered antibodies in order to predict their epitopes. This methodology was limited by the inaccuracies and incomplete coverage of template-based modelling. It was also only benchmarked at the level of domain-consistency on one virus class. Here, we present SPACE2, which uses the latest machine learning based structure prediction technology combined with a novel clustering protocol and benchmark it on binding data that has epitope level resolution. On six diverse sets of antigen specific antibodies we demonstrate that SPACE2 accurately clusters antibodies that engage common epitopes and achieves far higher data set coverage than clonal clustering and SPACE1. Furthermore, we show that the functionally consistent structural clusters identified by SPACE2 are even more diverse in sequence, genetic lineage, and species origin than those found by SPACE1. These results reiterate that structural data improves our ability to identify antibodies that bind the same epitope, adding information to sequence-based methods, especially in data sets of antibodies from diverse sources. SPACE2 is openly available on Github (https://github.com/oxpig/SPACE2).

B-026: Frustraevo: A Web Server To Quantify Local Energetic Frustration Conservation In Protein Families
Track: 3D-SIG
  • Maria Freiberger, Fiuner, Argentina
  • Victoria Ruiz-Serra, Barcelona Supercomputing Center, Spain
  • Camila Pontes, Barcelona Supercomputing Center, Spain
  • Miguel Romero, Instituto de Ciencias de la Vid y del Vino, Spain
  • Miguel Fernández Martín, Barcelona Supercomputing Center - Life Sciences, Spain
  • Diego Ferreiro, FCEyN-Univ Buenos Aires, Argentina
  • Alfonso Valencia, Barcelona Supercomputing Centre BSC, Spain
  • R. Gonzalo Parra, Barcelona Supercomputing Center, Spain


Presentation Overview: Show

According to the Principle of Minimum Frustration, proteins fold by minimizing their internal conflicts when folding into a set of conformational states that are in equilibrium. However, not all residue-residue interactions in natural proteins are energetically optimized but some remain in energetic conflict, i.e. they are highly frustrated. This remaining local energetic frustration in the native state of proteins is known to encode different functional aspects such as protein-protein interactions, allosterism, catalytic sites, or adaptive polymorphisms. We have recently developed a strategy to calculate the conservation degree of local frustration patterns within protein families and have shown how these are useful to elicit stability and functional constraints. Here, we present FrustraEvo, a web server tool to quantify local energetic frustration conservation patterns in protein families. FrustraEvo can help to gain insights into the evolutionary history and functional properties of proteins and contribute to the rational design of novel functions.

The webserver is freely available at URL: http://frustraevo.qb.fcen.uba.ar
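
As a rough illustration of what "conservation of local frustration" at one aligned position could mean, the sketch below scores a column of per-protein frustration states with a Shannon-entropy-based information content. This is not FrustraEvo's implementation: the function name, the three state labels, and the uniform background over states are all assumptions made for the example.

```python
import math
from collections import Counter

# Assumed labels for the three local frustration classes.
STATES = ("minimally", "neutral", "highly")


def frustration_information_content(column):
    """Information content (bits) of one aligned position.

    `column` holds one frustration state per family member.
    The score is log2(3) for a fully conserved state and 0 when
    the three states are equally frequent (uniform background assumed).
    """
    if not column:
        raise ValueError("column must contain at least one state")
    counts = Counter(column)
    unknown = set(counts) - set(STATES)
    if unknown:
        raise ValueError(f"unknown frustration states: {sorted(unknown)}")
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(len(STATES)) - entropy


# Hypothetical columns from a five-member family.
conserved = ["minimally"] * 5
mixed = ["minimally", "neutral", "highly", "neutral", "minimally"]
print(round(frustration_information_content(conserved), 3))
print(round(frustration_information_content(mixed), 3))
```

Higher values flag positions where the same frustration state recurs across the family; a production scheme would likely also correct for background state frequencies.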

B-027: AlphaPulldown, a python package for protein–protein interaction screens using AlphaFold-Multimer - and its latest updates
Track: 3D-SIG
  • Dingquan Yu, EMBL, Germany
  • Jan Kosinski, EMBL, Germany


Presentation Overview: Show

The artificial intelligence-based structure prediction program AlphaFold-Multimer has enabled structural modelling of protein complexes with unprecedented accuracy. However, high-throughput protein–protein interaction (PPI) screens and modelling of protein complexes using AlphaFold-Multimer remain challenging, not only because of its high demand for computing resources but also because of its relatively poor performance on cross-species protein complexes, such as viral-host interaction complexes. I will first present AlphaPulldown, a Python package that streamlines PPI screens and high-throughput modelling of higher-order oligomers using AlphaFold-Multimer, and demonstrate some successful applications of AlphaPulldown in the search for possible PPIs. Then I will outline the latest updates of AlphaPulldown that aim to improve AlphaFold-Multimer’s performance in modelling viral-host protein complexes. Lastly, I will demonstrate other new features we have added to the package, which improve usability, speed, and result interpretation. These additional auxiliary features will accommodate a wider range of users’ needs.

Overall, our work improves the computing efficiency of running AlphaFold-Multimer and provides a convenient command-line interface, a variety of confidence scores and a graphical analysis interface. The recent development and extension of AlphaPulldown will help the community build models for more challenging protein complexes with adequate auxiliary tools.

B-029: Large-scale Analysis of Tunnels in Cognate Enzyme-Ligand Complexes with CaverDock
Track: 3D-SIG
  • Ondrej Vavra, Masaryk University, FNUSA-ICRC, Czechia
  • Jon Tyzack, EMBL-EBI, United Kingdom
  • Faraneh Haddadi, Masaryk University, FNUSA-ICRC, Czechia
  • Jan Stourac, Masaryk University, FNUSA-ICRC, Czechia
  • Jiri Filipovic, Masaryk University, Czechia
  • Jiri Damborsky, Masaryk University, FNUSA-ICRC, Czechia
  • Stanislav Mazurenko, Masaryk University, FNUSA-ICRC, Czechia
  • Janet Thornton, EMBL-EBI, United Kingdom
  • David Bednar, Masaryk University, FNUSA-ICRC, Czechia


Presentation Overview: Show

Tunnels in enzymes that have buried active sites facilitate substrate entry and product release. Targeting the bottlenecks of enzyme tunnels is a powerful strategy in protein engineering, but identifying them across thousands of enzymes requires computational methods, many of which need manual setup and adjustments. Previously, we developed the widely used CaverDock for the analysis of ligand passage through tunnels and channels. Its short computation times created unprecedented opportunities for large-scale screening campaigns. In the current study, we developed a pipeline combining automated structural analysis with an in-house machine-learning predictor to annotate protein pockets, as well as energy analysis of ligand binding and unbinding. The pipeline was used to analyse the transport of cognate ligands in over 29,000 tunnels from 13,000 enzyme structures. Our analysis of ligand passage revealed that in 75 % of cases, the top-priority tunnel identified by CAVER had the most favourable energies. Additionally, energy profiles of cognate ligands revealed that a simple geometry analysis can correctly identify tunnel bottlenecks in only 50 % of cases. The study provided valuable information for interpreting results from tunnel calculation and energy profiling, useful for protein engineering. CaverDock is available at https://loschmidt.chemi.muni.cz/caverdock/.
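
To make the geometry-versus-energy comparison concrete, here is a minimal sketch that checks whether the narrowest point of a tunnel coincides with the highest-energy point of a ligand's passage. It does not reproduce the authors' pipeline or any CaverDock output format: the function names, the tolerance parameter, and the profile values are hypothetical, and both profiles are assumed to be sampled at the same positions along the tunnel.

```python
from typing import Sequence


def geometric_bottleneck(radii: Sequence[float]) -> int:
    """Index of the narrowest point along the tunnel."""
    return min(range(len(radii)), key=lambda i: radii[i])


def energetic_bottleneck(energies: Sequence[float]) -> int:
    """Index of the highest-energy point along the ligand passage."""
    return max(range(len(energies)), key=lambda i: energies[i])


def bottlenecks_agree(radii: Sequence[float],
                      energies: Sequence[float],
                      tolerance: int = 1) -> bool:
    """True if both bottlenecks lie within `tolerance` sampling steps."""
    if not radii or len(radii) != len(energies):
        raise ValueError("profiles must be non-empty and equally long")
    gap = abs(geometric_bottleneck(radii) - energetic_bottleneck(energies))
    return gap <= tolerance


# Hypothetical profiles: radii in angstroms, energies in kcal/mol.
radii = [2.4, 1.9, 1.2, 1.6, 2.1, 2.8]
energies_matching = [-5.1, -4.0, -1.2, -3.3, -4.8, -5.0]
energies_shifted = [-5.1, -4.0, -3.9, -3.3, -0.5, -5.0]

print(bottlenecks_agree(radii, energies_matching))  # barrier at the narrowest point
print(bottlenecks_agree(radii, energies_shifted))   # barrier two steps away
```

Running such a check over many tunnels would give the fraction of cases in which geometry alone locates the energetic barrier, which is the kind of statistic the abstract reports.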

B-030: B-cell epitope prediction on HLA antigens using molecular dynamics simulation data
Track: 3D-SIG
  • Diego Amaya-Ramirez, Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
  • Cedric Usureau, APHP Saint-Louis Hospital, F-75010 Paris, France
  • Romain Lhotte, APHP, Saint-Louis Hospital, F-75010 Paris, France
  • Magali Devriese, APHP, Saint-Louis Hospital, F-75010 Paris, France
  • Jean-Luc Taupin, APHP Saint-Louis Hospital F-75010 Paris, France
  • Malika Smail-Tabbone, Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
  • Marie-Dominique Devignes, Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France


Presentation Overview: Show

In the context of organ transplantation, unexpected production by the recipient of antibodies against donor-specific HLA antigens is the main reason for transplant loss. Thus, it is necessary to better predict B-cell epitopes on HLA antigens in order to improve the donor-recipient matching step from a structural point of view. Although there are currently multiple tools that predict B-cell epitopes, none have shown satisfactory results on HLA antigens. This may be explained by the limited availability of HLA-antibody complex structures in the PDB [1]. Here we present a method that relies on an unprecedented dataset of short molecular dynamics (MD) simulations of 207 HLA antigens. We define hydrophobic properties, electrostatic charges, flexibility and solvent accessibility as descriptors calculated on 3D patches and aggregated over MD trajectories. To overcome the lack of HLA-antibody complexes in the PDB, we leverage confirmed eplets from the HLA Eplet Registry database [2] as "ground truth" to train an Extremely Randomized Trees [3] machine learning model. This model outperforms state-of-the-art tools for B-cell epitope prediction on HLA antigens, such as DiscoTope 3.0 [4]. These results suggest that MD simulation data are valuable for the challenging task of epitope prediction.

B-031: DDMut: predicting effects of mutations on protein stability using deep learning
Track: 3D-SIG
  • Yunzhuo Zhou, University of Queensland, Australia
  • Qisheng Pan, University of Queensland, Australia
  • Douglas Pires, University of Melbourne, Australia
  • Carlos Rodrigues, University of Queensland, Australia
  • David Ascher, University of Queensland, Australia


Presentation Overview: Show

Understanding the effects of mutations on protein stability is crucial for variant interpretation and prioritisation, protein engineering, and biotechnology. Despite significant efforts, community assessments of predictive tools have highlighted ongoing limitations, including computational time, low predictive power, and biased predictions towards destabilising mutations. To fill this gap, we developed DDMut, a fast and accurate siamese network to predict changes in Gibbs Free Energy upon single and multiple point mutations, leveraging both forward and hypothetical reverse mutations to account for model anti-symmetry. Deep learning models were built by integrating graph-based representations of the localised 3D environment, with convolutional layers and transformer encoders. This combination better captured the distance patterns between atoms by extracting both short-range and long-range interactions. DDMut achieved Pearson's correlations of up to 0.70 (RMSE: 1.37 kcal/mol) on single point mutations, and 0.70 (RMSE: 1.84 kcal/mol) on double/triple mutations, outperforming most available methods across non-redundant blind test sets. Importantly, DDMut was highly scalable and demonstrated anti-symmetric performance on both destabilising and stabilising mutations. We believe DDMut will be a useful platform to better understand the functional consequences of mutations, and guide rational protein engineering. DDMut is freely available as a web server and API at https://biosig.lab.uq.edu.au/ddmut.
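
The evaluation terms used above (Pearson's correlation, RMSE, and anti-symmetry between forward and reverse mutations) can be sketched in a few lines. This is an illustrative sketch only, not DDMut code or its API: the function names and the ΔΔG values are hypothetical, and anti-symmetry is summarised here by the mean of ΔΔG(forward) + ΔΔG(reverse), which is zero for a perfectly anti-symmetric predictor.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between predictions and measurements."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def antisymmetry_bias(forward: Sequence[float], reverse: Sequence[float]) -> float:
    """Mean of ddG(forward) + ddG(reverse); 0 means perfect anti-symmetry."""
    return sum(f + r for f, r in zip(forward, reverse)) / len(forward)


# Hypothetical predicted ddG values (kcal/mol) for four mutations
# and for the corresponding reverse mutations.
forward = [-1.2, 0.4, -2.0, 0.9]
reverse = [1.0, -0.5, 2.1, -0.8]

print(round(antisymmetry_bias(forward, reverse), 3))  # close to 0
print(round(pearson(forward, reverse), 3))            # close to -1
```

A predictor biased towards destabilising mutations would show a bias clearly different from zero and a forward-reverse correlation far from -1.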

B-032: Insights into the Mechanism of Clostridioides Difficile Toxin B Action on Human Intestinal Epithelial Cells at Gene Expression and Protein Structure Levels
Track: 3D-SIG
  • Damla Nur Camlı, Istanbul Medeniyet University, Turkey
  • Saliha Ece Acuner, Istanbul Medeniyet University, Turkey


Presentation Overview: Show

The bacterium Clostridioides difficile (C. difficile) causes severe diarrhea and pseudomembranous colitis when it over-colonizes the host intestinal tract, and it is one of the important players in antibiotic resistance. Toxin A (TcdA) and Toxin B (TcdB or ToxB) of C. difficile are the exotoxins mainly responsible for the symptoms. Here, we focused on adding structural and transcriptomic insights into the molecular mechanism of TcdB action on host intestinal epithelial cells. To this end, we predicted and functionally analyzed the differentially expressed genes (DEGs) of human intestinal epithelial cells in the presence of TcdB. The resulting genes were in line with the literature; moreover, the novel genes KIF20B, CDCA3, MCM5, and KNL1 were found to be down-regulated and ABCC5 to be up-regulated. Importantly, the structures of known host protein and inhibitor/antibody interactions with pathogen TcdB were analyzed with docking and MD simulation, and the unknown complex structures were modeled to further clarify the mechanism of Toxin B action on host proteins, especially GTPases. TcdB is known to selectively interact with GDP-bound inactive Rho/Ras GTPases (such as Cdc42 and Rac1), and the mechanisms of selective binding to GDP-bound states, and to Cdc42 over Rac1, were elucidated at the molecular level.

B-033: Food components and their activity by interacting with protein targets
Track: 3D-SIG
  • Deborah Giordano, Istituto di Scienze dell’Alimentazione, CNR, via Roma 64, Avellino, Italy
  • Angelo Facchiano, Istituto di Scienze dell’Alimentazione, CNR, via Roma 64, Avellino, Italy


Presentation Overview: Show

The large amount of information available in public databases on food compounds, particularly their physicochemical and biological properties, is of extreme interest for studying the effects that foods may have on human health. To organize this information in a manner appropriate to our research needs, we have built a collection of data from public resources and supplemented it with the results of our computational and bioinformatics studies. This new resource is managed and enriched through a procedure that adds compounds and collects new data to supplement the information already present. The data for each compound are enriched by an additional layer of information produced by our research activity. The basic data are processed by an original analysis procedure, based on bioinformatics tools, which generates new information about the potential biological activities of the compounds by integrating archival data, structural information, molecular properties, and molecular simulation results obtained by protein-target selection and ligand-protein docking.

B-034: Structural analysis of the sodium transporter HKT1;5 in Oryza species for new determinants of salinity tolerance
Track: 3D-SIG
  • Mohamed Shafi, National Centre for Biological Sciences, India
  • Gayatri Venkataraman, M.S. Swaminathan Research Foundation, India
  • Anne-Aliénor Véry, Biochimie et Physiologie Moléculaire des Plantes, France
  • Ramanathan Sowdhamini, National Centre for Biological Sciences, India


Presentation Overview: Show

The level of Na+ accumulation in many plant species is inversely correlated with salinity tolerance. Members of the high-affinity K+ transporter (HKT) gene family are required for the regulation of Na+ accumulation in plants, which leads to salinity tolerance. HKT1;5 is a plasma membrane transporter that transports Na+ from root xylem vessels to shoots. OsHKT1;5 from Asian cultivars of rice has displayed allelic variation associated with salinity tolerance. Among wild rice species, this transporter has been studied mostly in the halophytic Oryza coarctata. We generated homology models for the HKT1;5 transporter from Oryza species, noted variations in residues, and applied electrostatics to infer charge differences. We noticed that some variations in the amino acid residue at the pore entrance could have an impact on ion entry. Using molecular dynamics simulations, we found that variations in Na+ transport affinities were caused by a swapping of positively and negatively charged amino acid residues at the ion pore entrance of the transporter. The affinity and conductance of the ions were further verified by transport kinetics assays and site-directed mutagenesis studies. Overall, the knowledge gained from our study could be used to develop rice crop varieties that are more resilient to salinity stress.

B-035: Predictions of DNA mechanical properties at a genomic scale reveal potentially new functional roles of DNA flexibility.
Track: 3D-SIG
  • Georg Back, Max Planck Institute of Molecular Plant Physiology, Germany
  • Dirk Walther, Max Planck Institute of Molecular Plant Physiology, Germany


Presentation Overview: Show

Mechanical properties of DNA have been implicated in many of its biological functions. Recently, a new high-throughput method called loop-seq, which allows measuring the intrinsic bendability of DNA fragments, has been developed. Using loop-seq data, we created a deep learning model to explore the biological significance of local DNA flexibility in a range of species from different kingdoms. Consistently, we observed a characteristic and largely dinucleotide-composition-driven change of local flexibility near transcription start sites. We found no evidence of a generally present region of lowered flexibility upstream of transcription start sites that would facilitate transcription factor binding. Yet, depending on the transcription factor investigated, flanking-sequence-dependent DNA flexibility was identified as a potential factor influencing binding. Compared to randomized genomic sequences, actual genomic sequences showed either increased or lowered flexibility, depending on species and taxa. Furthermore, in Arabidopsis thaliana, crossing-over and mutation rates, both de novo and fixed, were found to be linked to rigid sequence regions. Our study presents a range of significant correlations between characteristic DNA mechanical properties and genomic features, whose detailed molecular relevance awaits further experimental and theoretical exploration.

B-036: Exploring AlphaFold2's Capability in Predicting Intrinsically Disordered Protein Interactions
Track: 3D-SIG
  • Seongeun Kim, Seoul National University, Korea, Republic of
  • Martin Steinegger, Seoul National University, Korea, Republic of


Presentation Overview: Show

Intrinsically disordered proteins (IDPs) can form stable structures upon interacting with partner proteins. Studying these interactions is challenging yet important for, e.g., developing targeted therapies for IDP-related diseases. Recent advances in protein structure prediction, such as AlphaFold2, RoseTTAFold, and AlphaFold-multimer, provide new opportunities to study IDP interactions. Although AlphaFold2 can predict IDPs with low certainty, AlphaFold-multimer's disordered protein interaction (DPI) prediction performance is unknown.

We extracted DPIs from a dataset of 9,056 known interacting dimers obtained from the PDB, resulting in 40 DPIs. Using AlphaFold-multimer-v2 for prediction, only 3 IDP dimers achieved high DockQ and interface-pTM scores, indicating accurate predictions. Nevertheless, correct and incorrect predictions could be distinguished by using an interface-pTM cutoff of 0.65, with values above it indicating correct predictions. This method proved effective for pinpointing relevant targets without requiring an experimental structure. However, the limited availability of experimentally determined DPI datasets presents a significant challenge in improving and evaluating these methods.


Our study highlights the potential of computational methods for identifying disordered protein interactions and has implications for understanding their functions and developing targeted therapies for IDP-related diseases. This approach can be expanded to screen larger datasets, uncovering novel protein-protein interactions.
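
The interface-pTM cutoff described in this abstract lends itself to a simple screening filter. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the `DimerPrediction` record, the dimer names, and the scores are hypothetical, and only the 0.65 threshold comes from the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Interface-pTM cutoff quoted in the abstract.
IPTM_CUTOFF = 0.65


@dataclass(frozen=True)
class DimerPrediction:
    """Hypothetical record holding one predicted dimer and its score."""
    name: str
    iptm: float


def likely_correct(predictions: Iterable[DimerPrediction],
                   cutoff: float = IPTM_CUTOFF) -> List[DimerPrediction]:
    """Keep predictions whose interface-pTM exceeds the cutoff."""
    return [p for p in predictions if p.iptm > cutoff]


# Made-up scores for three candidate disordered-protein dimers.
predictions = [
    DimerPrediction("dimer_A", 0.82),
    DimerPrediction("dimer_B", 0.41),
    DimerPrediction("dimer_C", 0.66),
]

for hit in likely_correct(predictions):
    print(hit.name, hit.iptm)
```

Because the filter needs only the model's own confidence score, it can be applied to candidates that have no experimental structure, which is what makes screening larger datasets feasible.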

B-037: LoopGrafter: A Web Tool for Transplanting Dynamical Loops for Protein Engineering
Track: 3D-SIG
  • Joan Planas-Iglesias, Loschmidt Labs., UEB, Masaryk University. International Clinical Research Center, St Anne’s University Hospital Brno., Czechia
  • Filip Opaleny, Department of Visual Computing, Faculty of Informatics, Masaryk University, Czechia
  • Pavol Ulbrich, Department of Visual Computing, Faculty of Informatics, Masaryk University, Czechia
  • Jan Stourac, Loschmidt Labs., UEB, Masaryk University. International Clinical Research Center, St Anne’s University Hospital Brno., Czechia
  • Zainab Sanusi, Loschmidt Laboratories. Department of Experimental Biology (UEB), Masaryk University., Czechia
  • Gaspar P. Pinto, Loschmidt Labs., UEB, Masaryk University. International Clinical Research Center, St Anne’s University Hospital Brno., Czechia
  • Andrea Schenkmayerova, Loschmidt Labs., UEB, Masaryk University. International Clinical Research Center, St Anne’s University Hospital Brno., Czechia
  • Jan Byska, Department of Visual Computing, Faculty of Informatics, Masaryk University, Czechia
  • Jiri Damborsky, Loschmidt Labs., UEB, Masaryk University. International Clinical Research Center, St Anne’s University Hospital Brno., Czechia
  • Barbora Kozlikova, Department of Visual Computing, Faculty of Informatics, Masaryk University, Czechia
  • David Bednar, Loschmidt Labs., UEB, Masaryk University. International Clinical Research Center, St Anne’s University Hospital Brno., Czechia


Presentation Overview: Show

Proteins, among them enzymes, have naturally evolved for billions of years. Yet, industrial, biotechnological, and medical applications often require enzymatic improvement. Protein design is aimed at achieving such enhancement in different aspects of enzyme performance. Engineering flexibility is one of the last frontiers in protein design: the effects of mutations on flexibility are not yet well understood. Cryptic relationships among distant residues (allostery) and the lack of a definition of a functional unit of flexibility are reasons for this difficulty. We recently engaged in this exciting field by showcasing and generalising the process of exchanging loops (dynamic structural elements) between homologous proteins, transferring the dynamic behaviour from one protein to another as a result. To this end, we designed LoopGrafter (https://loschmidt.chemi.muni.cz/loopgrafter/, doi: 10.1093/nar/gkac249), a web server that provides a step-by-step interactive procedure where the user can successively identify loops in the input proteins, calculate their geometries, similarities and dynamics, and select loops to be transplanted. All chimeras derived from any possible recombination point are calculated, and 3D models are constructed and energetically evaluated for each of them. The results can be interactively visualised in a user-friendly graphical interface and downloaded for detailed structural analyses. The server has 3500 users and 1200 jobs registered.

B-038: Foldcomp: scalable solution for compressing huge protein structure database
Track: 3D-SIG
  • Hyunbin Kim, Seoul National University, South Korea
  • Milot Mirdita, Seoul National University, South Korea
  • Martin Steinegger, Seoul National University, South Korea


Presentation Overview: Show

The AlphaFold databases of 214M UniProt proteins and the ESMatlas catalog of nearly 700M metagenomic protein structures provide valuable resources to the community. However, their extensive sizes of 23TB and 15TB, respectively, exceed the capacity of standard workstations and pose a challenge even to well-equipped cluster environments.
To address this issue, we introduce Foldcomp, a novel compression algorithm that encodes the torsion and bond angles in a compact binary format, named FCZ. Foldcomp achieves up to 90% compression compared to float-encoded 3D coordinates, requiring only 13 bytes per residue. The original coordinates are reconstructed using the NeRF algorithm with internal anchor points. By averaging the coordinates reconstructed in both directions, we were able to reduce the reconstruction loss to approximately 0.08 Å. Our method is as fast as gzip, with 3 ms and 6 ms for compression and decompression, respectively.
Foldcomp is available as a command line interface and a Python API at https://foldcomp.foldseek.com. Additionally, Foldcomp has been augmented by community contributions, such as a PyMol plugin and a dataset wrapper in Graphein. We provide compressed versions of the AlphaFold database (1.1TB), ESMatlas (1.8TB), SwissProt (2.9GB), and the recently released AlphaFold2 cluster representatives (2.2GB) at https://foldcomp.steineggerlab.workers.dev. Foldcomp is published at https://doi.org/10.1093/bioinformatics/btad153.
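
The reconstruction step mentioned above can be illustrated with a minimal sketch of NeRF (Natural Extension Reference Frame) atom placement: given three already-placed atoms plus a bond length, bond angle and torsion angle, it computes the position of the next atom. This is not Foldcomp's code or API; the function names and the numeric values are hypothetical and chosen only for illustration.

```python
import math

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(u):
    length = math.sqrt(dot(u, u))
    return tuple(a / length for a in u)

def place_atom(a, b, c, bond_length, bond_angle, torsion):
    """Place atom d after atoms a, b, c so that |cd| = bond_length,
    angle(b, c, d) = bond_angle and dihedral(a, b, c, d) = torsion.
    Angles are in radians."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))  # normal of the a-b-c plane
    m = cross(n, bc)                # completes the right-handed local frame
    # Coordinates of d in the local frame (bc, m, n) centred on c.
    local = (-bond_length * math.cos(bond_angle),
             bond_length * math.sin(bond_angle) * math.cos(torsion),
             bond_length * math.sin(bond_angle) * math.sin(torsion))
    return tuple(c[i] + local[0] * bc[i] + local[1] * m[i] + local[2] * n[i]
                 for i in range(3))

def dihedral(a, b, c, d):
    """Signed dihedral angle a-b-c-d in radians (used here only as a check)."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.atan2(math.sqrt(dot(b2, b2)) * dot(b1, n2), dot(n1, n2))

# Hypothetical Cartesian and internal coordinates, for illustration only.
a, b, c = (1.2, -0.3, 0.7), (2.1, 0.9, 0.2), (3.3, 0.4, -0.5)
d = place_atom(a, b, c, 1.33, math.radians(116.2), math.radians(-57.0))
```

Because each atom is placed relative to the previous three, rounding errors in the stored angles accumulate along the chain, which is consistent with the abstract's use of anchor points and of averaging reconstructions made from both directions.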

B-039: Petascale Search for Protein Structure Prediction
Track: 3D-SIG
  • Gyuri Kim, Seoul National University, South Korea
  • Sewon Lee, Seoul National University, South Korea
  • Rayan Chikhi, Institut Pasteur, France
  • Artem Babaian, University of Toronto, Canada
  • Milot Mirdita, Seoul National University, South Korea
  • Martin Steinegger, Seoul National University, South Korea
  • Sukhwan Park, Seoul National University, South Korea
  • Eli Levi Karin, ELKMO, Denmark
  • Andriy Kryshtafovych


Presentation Overview: Show

In the recent CASP15 competition, the crucial role of multiple sequence alignments (MSAs) in protein structure prediction was underscored by the success of AlphaFold2-based models, such as ColabFold. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided as input to ColabFold-predict. This approach significantly improved structure prediction accuracy, achieving a high GDT_TS (>70) for 66% of the non-easy targets, a substantial leap from the 52% achieved with baseline ColabFold MSAs. Enabling ColabFold’s advanced features (more recycles, templates, and multimer models) contributed to a further performance boost. This significant increase in accuracy improved ColabFold’s CASP15 ranking from 11th to 3rd place among 47 server groups, indicating the vast potential of large-scale sequence exploration for better structure prediction.
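
For readers unfamiliar with the metric used above: GDT_TS is conventionally the mean, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose Cα atom lies within that cutoff of the reference structure. The sketch below assumes a single fixed superposition and hypothetical per-residue distances; the full GDT procedure searches over many superpositions for each cutoff, so this is a simplified illustration, not the CASP evaluation code.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS from per-residue C-alpha distances (in angstroms)
    between a model and its reference under one fixed superposition."""
    if not distances:
        raise ValueError("at least one residue distance is required")
    n = len(distances)
    # Fraction of residues within each cutoff, averaged and scaled to 0-100.
    fractions = [sum(d <= cutoff for d in distances) / n for cutoff in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Hypothetical distances for a five-residue model.
example_score = gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0])
```

With these made-up distances, one, two, three and four of the five residues fall within the 1, 2, 4 and 8 Å cutoffs, giving a score of 50.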

B-040: Clustering predicted structures at the scale of the known protein universe
Track: 3D-SIG
  • Inigo Barrio-Hernandez, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Jingi Yeo, School of Biological Sciences, Seoul National University, South Korea
  • Jürgen Jänes, Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
  • Milot Mirdita, School of Biological Sciences, Seoul National University, South Korea
  • Cameron Gilchrist, School of Biological Sciences, Seoul National University, South Korea
  • Tanita Wein, Department of Molecular Genetics, Weizmann Institute of Science, Israel
  • Mihaly Varadi, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Sameer Velankar, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Pedro Beltrao, Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
  • Martin Steinegger, School of Biological Sciences, Seoul National University, South Korea


Presentation Overview: Show

Proteins are key to all cellular processes and their structures are important in understanding their function and evolution. Sequence-based predictions of protein structures have increased in accuracy, with over 214 million predicted structures available in the AlphaFold database (AFDB). However, studying protein structures at this scale requires highly efficient methods. Here, we developed a structural-alignment-based clustering algorithm, Foldseek cluster, that can cluster hundreds of millions of structures. Using this method we have clustered all structures in AFDB, identifying 2.27M non-singleton structural clusters, of which 31% lack annotations, representing likely novel structures. Clusters without annotation tend to have few representatives, covering only 4% of all proteins in the AFDB. Evolutionary analysis suggests that most clusters are ancient in origin but 4% seem species-specific, representing lower-quality predictions or examples of de novo gene birth. Additionally, we show how structural comparisons can be used to predict domain families and their relationships, identifying examples of remote homology. Based on these analyses we identify several examples of human immune-related proteins with remote homology in prokaryotic species, which illustrates the value of this resource for studying protein function and evolution across the tree of life.
Easy exploration of clusters at: cluster.foldseek.com

B-041: Dynamic Residue Network Analysis Reveals Mutant Allosteric Pathways and Functionally Significant Residues in Human NAT2 Enzyme
Track: 3D-SIG
  • Wayde Veldman, Rhodes University, South Africa
  • Ozlem Tastan Bishop, Rhodes University, South Africa


Presentation Overview: Show

Patient response to tuberculosis treatment is highly variable, largely due to varying arylamine N-acetyltransferase 2 (NAT2) enzyme alleles in different human ethnicities. This often leads to either drug underexposure or hepatotoxicity. 500 ns molecular dynamics simulations were run using 17 missense-mutant enzyme systems plus wild-type, representing all human NAT2 alleles with known phenotype. Subsequent betweenness centrality (BC) calculations were performed, where BC is computed for every MD frame and the residue centrality values are aggregated as medians or time-averages. BC can detect functionally and/or structurally significant residues, especially in the control of information flow. Per-residue BC differences from wild-type (delta-BC) were then calculated for each mutant system (wild-type minus mutant). The global BC top 5% residues are all located in and around the active-site pocket. The top (positive) and bottom (negative) 5% delta-BC residues, when shown as spheres in PyMOL, revealed allosteric pathways connecting the mutated residue to the active site in 10 of the 17 mutant systems. The pathway residue-residue interactions were confirmed using crystal structure 2PFR. Overall, BC calculations identified 1) residues important for information flow in human NAT2, and 2) mutant-system residues with the biggest BC changes from wild-type that likely affect enzyme function.
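
The aggregation described above can be sketched as follows: compute betweenness centrality for each frame's residue contact graph, take the per-residue median over frames, and subtract the mutant values from the wild-type values. The sketch uses Brandes' algorithm on unweighted graphs; the tiny three-residue "frames" are hypothetical, and how contacts are defined from an MD trajectory is not specified in the abstract and is left out here.

```python
from collections import deque
from statistics import median

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph
    given as {node: set_of_neighbours}. Values are unnormalised."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies back to s
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: value / 2.0 for v, value in bc.items()}

def median_bc(frames):
    """Per-residue median BC over a list of per-frame graphs."""
    per_frame = [betweenness(adj) for adj in frames]
    return {v: median(f[v] for f in per_frame) for v in per_frame[0]}

def delta_bc(wild_type_frames, mutant_frames):
    """Wild-type minus mutant, per residue, as in the abstract."""
    wt, mut = median_bc(wild_type_frames), median_bc(mutant_frames)
    return {v: wt[v] - mut[v] for v in wt}

# Hypothetical frames: a path 1-2-3 and a triangle over residues 1, 2, 3.
path = {1: {2}, 2: {1, 3}, 3: {2}}
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
example_delta = delta_bc([path, path, triangle], [triangle, triangle, path])
```

In this toy case residue 2 bridges residues 1 and 3 in most wild-type frames but not in most mutant frames, so its delta-BC is positive while the other two residues stay at zero.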

B-042: Uncovering patterns of natural selection for protein mutational robustness at the codon and genetic code levels through large-scale in silico mutagenesis experiments.
Track: 3D-SIG
  • Martin Schwersensky, Université Libre de Bruxelles, Belgium
  • Fabrizio Pucci, Université Libre de Bruxelles, Belgium
  • Marianne Rooman, Université Libre de Bruxelles, Belgium


Presentation Overview: Show

How, and to what extent, mutational robustness is selected for in the natural evolution of coding sequences is a long-standing question in molecular evolution.
We tackled this problem by estimating the folding free energy change upon all possible single-site mutations introduced in more than 20,000 proteins, and through experimental stability and fitness data.
Our results highlight patterns of natural selection for protein mutational robustness at multiple scales.
At the residue level, we found protein surface to be more robust against random mutations than the core, especially for small proteins. Destabilizing and neutral mutations are more numerous in the core and on the surface, respectively, and stabilizing mutations are about 4% in both.
At the genetic code level, we observed smaller destabilization for substitutions of codon base III, followed by base I, bases I&III, base II, and other multiple base substitutions. This ranking anticorrelates with the codon-anticodon mispairing frequencies of the translation process, suggesting that the standard genetic code is optimized to limit the impact of random mutations, and even more so of translation errors.
At the codon level, codon usage and codon usage bias appear optimized for mutational robustness and translation accuracy, especially for surface residues.
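
The substitution classes used in the genetic-code-level analysis can be made concrete with a small sketch that labels a pair of codons by which base positions differ. The function name and labels are hypothetical, and the sketch only classifies substitutions; it does not estimate any stability change.

```python
def substitution_class(codon_a, codon_b):
    """Label a codon substitution by the positions (1-3) at which the bases differ."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must have exactly three bases")
    differing = tuple(i + 1 for i in range(3) if codon_a[i] != codon_b[i])
    labels = {
        (): "identical",
        (3,): "base III",
        (1,): "base I",
        (1, 3): "bases I&III",
        (2,): "base II",
    }
    # Any remaining combination involves base II plus at least one other base.
    return labels.get(differing, "other multiple-base")

# Example substitutions starting from the alanine codon GCU.
example_classes = [substitution_class("GCU", other)
                   for other in ("GCC", "ACU", "ACC", "GAU", "AAU")]
```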

B-043: Contrastive learning in protein language space predicts interactions between drugs and protein targets
Track: 3D-SIG
  • Rohit Singh, Massachusetts Institute of Technology, United States
  • Samuel Sledzieski, Massachusetts Institute of Technology, United States
  • Bryan Bryson, Massachusetts Institute of Technology, United States
  • Lenore Cowen, Tufts University, United States
  • Bonnie Berger, Massachusetts Institute of Technology, United States


Presentation Overview: Show

Sequence-based prediction of drug-target interactions has the potential to accelerate drug discovery by complementing experimental screens. Such computational prediction needs to be generalizable and scalable while remaining sensitive to subtle variations in the inputs. However, current computational techniques fail to simultaneously meet these goals, often sacrificing performance on one to achieve the others. We develop a deep learning model, ConPLex, successfully leveraging the advances in pre-trained protein language models ("PLex") and employing a novel protein-anchored contrastive co-embedding ("Con") to outperform state-of-the-art approaches. ConPLex achieves high accuracy, broad adaptivity to unseen data, and specificity against decoy compounds. It makes predictions of binding based on the distance between learned representations, enabling predictions at the scale of massive compound libraries and the human proteome. Experimental testing of 19 kinase-drug interaction predictions validated 12 interactions, including four with sub-nanomolar affinity, plus a novel strongly-binding EPHB1 inhibitor (K_D = 1.3nM). Furthermore, ConPLex embeddings are interpretable, which enables us to visualize the drug-target embedding space and use embeddings to characterize the function of human cell-surface proteins. We anticipate ConPLex will facilitate novel drug discovery by making highly sensitive in-silico drug screening feasible at genome scale. ConPLex is available open-source at https://github.com/samsledje/ConPLex.
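
To illustrate why distance-based prediction scales to large libraries, the sketch below ranks compounds by the similarity of their embeddings to a target's embedding in a shared space. The abstract states only that predictions are based on the distance between learned representations; the use of cosine similarity, the vectors and the names here are assumptions for illustration and are not the ConPLex API.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_compounds(target_embedding, compound_embeddings):
    """Return compound names sorted from most to least similar to the target."""
    return sorted(
        compound_embeddings,
        key=lambda name: cosine_similarity(target_embedding, compound_embeddings[name]),
        reverse=True,
    )

# Hypothetical two-dimensional embeddings in a shared space.
target = (1.0, 0.0)
compounds = {"cmpd_a": (1.0, 0.1), "cmpd_b": (0.0, 1.0), "cmpd_c": (-1.0, 0.0)}
example_ranking = rank_compounds(target, compounds)
```

Once embeddings are precomputed, scoring a target against a library reduces to one similarity computation per compound, which is what makes screening at the scale described in the abstract plausible.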

B-044: TT3D: Leveraging Pre-Computed Protein 3D Sequence Models to Predict Protein-Protein Interactions
Track: 3D-SIG
  • Samuel Sledzieski, Computer Science and Artificial Intelligence Laboratory, MIT, United States
  • Kapil Devkota, Tufts University, United States
  • Rohit Singh, Massachusetts Institute of Technology, United States
  • Lenore Cowen, Tufts University, United States
  • Bonnie Berger, Massachusetts Institute of Technology, United States


Presentation Overview: Show

High-quality computational structural models are now pre-computed and available for nearly every protein in UniProt. However, the best way to leverage these models to predict, in a high-throughput manner, which pairs of proteins interact is not immediately clear. The recent Foldseek method of van Kempen et al. encodes the structural information from the distances and angles along the protein backbone into a linear string of the same length as the protein sequence, using tokens from a 21-letter discretized structural alphabet. We show that when structural data is available, so that the Foldseek string can be efficiently produced, offering that string as an additional input to our recent Topsy-Turvy deep-learning method, which predicts PPIs cross-species solely from a pair of protein amino acid sequences, substantially improves performance. Thus our new method, TT3D, presents a way to reuse the computational effort that goes into producing high-quality structural models from sequence, and it is sufficiently lightweight that high-quality binary PPI predictions can be made genome-wide across all protein pairs.

B-045: SpikePro: towards the accurate prediction of SARS-CoV-2 variant severity
Track: 3D-SIG
  • Gabriel Cia, Université Libre de Bruxelles, Belgium
  • Jean Marc Kwasigroch, Université Libre de Bruxelles, Belgium
  • Fabrizio Pucci, Université Libre de Bruxelles, Belgium
  • Marianne Rooman, Université Libre de Bruxelles, Belgium


Presentation Overview: Show

Motivation: The SARS-CoV-2 virus has shown a remarkable ability to evolve and spread across the globe through successive waves of variants since the original Wuhan lineage. Despite the numerous studies published in the last three years, accurately predicting the severity of newly emerging variants is still a challenging issue which, if overcome, could help in the genomic surveillance of COVID-19.
Results: We recently developed SpikePro, a structure-based computational model capable of quickly and accurately predicting the fitness of a SARS-CoV-2 variant based on its spike protein sequence. It proceeds by an in-silico estimation of the impact that mutations have on the spike protein’s stability as well as on its binding affinity with the human angiotensin-converting enzyme 2 (ACE2) and with a set of neutralizing antibodies. SpikePro yields a precise indication of the viral transmissibility, infectivity, immune escape and basic reproduction rate (R0). We present an updated version of SpikePro, which is now also available through an easy-to-use webserver, and illustrate its power through a retrospective study of the evolution of the fitness and reproduction rate for the main SARS-CoV-2 lineages.

B-047: A kinematic view of protein loop flexibility, with connexions to thermodynamics
Track: 3D-SIG
  • Frederic Cazals, INRIA, France
  • Timothee O'Donnell, Inria, France


Presentation Overview: Show

Accurate prediction of protein thermodynamics and kinetics remains a major challenge.
This talk will present a novel paradigm to sample loop conformations with an emphasis on torsion angles (soft coordinates), via solutions of certain tripeptide loop closure (TLC) problems. The paradigm uses techniques reminiscent of high-dimensional volume calculations, and is therefore directly connected to thermodynamics via the calculation of densities of states (DoS). The talk will review our constructions, present results and discuss connexions with thermodynamics.
The corresponding code is currently being integrated into the Structural Bioinformatics Library (https://sbl.inria.fr).

B-048: CoCoNat: prediction of coiled-coil regions using protein language models
Track: 3D-SIG
  • Giovanni Madeo, University of Bologna, Italy
  • Castrense Savojardo, University of Bologna, Italy
  • Matteo Manfredi, Biocomputing Group, University of Bologna, Italy
  • Pier Luigi Martelli, University of Bologna, Italy
  • Rita Casadio, University of Bologna, Italy


Presentation Overview: Show

Coiled-coil domains (CCDs) are found in proteins in all kingdoms of life. They perform a wide range of important cellular functions. Canonical CCDs consist of intertwined alpha helices containing heptad repeats (labeled abcdefg, the so-called registers) with constrained pairing. CCDs are classified according to the number and orientation of the α-helices involved, i.e., by their oligomerization state. The importance of CCDs demands computational methods for predicting the presence and localization of CCDs, including registers, and their oligomerization state. Here we present CoCoNat (https://coconat.biocomp.unibo.it), a novel deep-learning-based method for predicting CCD regions, registers and oligomerization state. Our method, for the first time, adopts a sequence encoding based on two state-of-the-art protein language models (pLMs): ProtT5 and ESM1-b. The pLM embeddings are processed by a three-step architecture including a deep network, a conditional random field and a single-layer feed-forward network. We trained CoCoNat on a dataset comprising 2191 proteins containing CCDs and 9040 proteins without CCDs. When tested on a blind test set comprising 429 CCD and 278 non-CCD proteins, CoCoNat outperforms the current state of the art in residue-level and segment-level CCD detection, register annotation, and oligomerization state prediction.

B-049: MSA Pairing Transformer: Learning to pair interacting pairs of proteins across MSAs
Track: 3D-SIG
  • Alex Hawkins-Hooker, University College London, United Kingdom
  • Daniel Burkhardt Cerigo, datavaluepeople, United Kingdom
  • David T. Jones, University College London, United Kingdom
  • Brooks Paige, University College London, United Kingdom


Presentation Overview: Show

We study the problem of pairing interacting pairs of protein sequences within sets of evolutionarily related sequences that are known to interact. This problem arises most prominently in the context of structure prediction, where state-of-the-art models for the prediction of protein complexes rely on paired multiple sequence alignments as inputs. We propose to train an interaction partner predictor by applying contrastive learning to pairs of interacting domains in scrambled single-chain multiple sequence alignments. By leveraging an MSA-based model which jointly encodes sets of paired and unpaired sequences, we develop an iterative pairing algorithm which improves the accuracy of pairing predictions by allowing the model to condition its predictions on high-confidence pairs predicted in previous rounds. We demonstrate the effectiveness of our model across a set of bacterial interactions for which ground-truth pairings are known, finding that it is possible to achieve high pairing accuracy even within small sub-sampled sets of sequences. Across a large dataset of prokaryotic interactions with experimentally determined complexes, the paired MSAs generated by our model contain stronger co-evolutionary signal for unsupervised prediction of interface contacts than those generated by heuristic methods.