Posters - Schedules

Monday, July 24, between 18:00 CEST and 19:00 CEST
Session A Poster Set-up and Dismantle
Session A Posters set up:
Monday, July 24, between 08:00 CEST and 08:45 CEST
Session A Posters dismantle:
Monday, July 24, at 19:00 CEST
Tuesday, July 25, between 18:00 CEST and 19:00 CEST
Session B Poster Set-up and Dismantle
Session B Posters set up:
Tuesday, July 25, between 08:00 CEST and 08:45 CEST
Session B Posters dismantle:
Tuesday, July 25, at 19:00 CEST
Wednesday, July 26, between 18:00 CEST and 19:00 CEST
Session C Poster Set-up and Dismantle
Session C Posters set up:
Wednesday, July 26, between 08:00 CEST and 08:45 CEST
Session C Posters dismantle:
Wednesday, July 26, at 19:00 CEST
Virtual
A-001: PIABA: Peptides with immune and antimicrobial activity-based associations database
Track: 3D-SIG
  • Lucas Sousa Palmeira, State University of Southwest of Bahia, Brazil
  • William Jefferson Silva Sena, State University of Southwest of Bahia, Brazil
  • Rodrigo Barreto Rodrigues, State University of Southwest of Bahia, Brazil
  • Bruno Silva Andrade, State University of Southwest of Bahia, Brazil


Presentation Overview:

Introduction: Bioactive peptides generally present immunomodulatory, antioxidative, anti-inflammatory, anti-cancer, anti-microbial, and anti-hypertensive activities. In nature, these molecules are produced by plants, animals, and fungi, but mainly by bacteria. Given their important immunological and anti-microbial functions, this work aimed to develop a relational database of peptide sequences, structures, and physicochemical characteristics, focusing on immunomodulatory and anti-microbial activities. Methods: Sequences were downloaded directly from public databases or obtained by web scraping with Python and Selenium scripts (https://github.com/WilliamJSS/LBQC-PDB). A data frame was created with Pandas, and the common features (sequence, biological activity, and unique sequence ID) were merged. The R Peptides library was used to compute the physicochemical characteristics, and the Polars library was used for data visualization and filtering. Peptide structures were predicted with MODELLER and AlphaFold2, and a search engine was implemented with the BLASTp API. Results: After removal of duplicate entries, our curated peptide database contains 286,755 peptide sequences, of which 173,951 are classified as immunomodulatory and 5,117 as anti-microbial, with 60,000 PDB structures available for download. We are implementing integrated docking against multiple immune and microbial proteins, as well as a tool for AI-based prediction and annotation of peptide functions.
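The merge-and-deduplicate step, together with one simple physicochemical descriptor, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' Pandas/R pipeline: the `(source, sequence, activity)` record layout, the hypothetical helpers `merge_records` and `gravy`, the `PEP` ID format, and the choice of the Kyte-Doolittle GRAVY index as the example descriptor are all assumptions.

```python
from collections import defaultdict

# Kyte-Doolittle hydropathy scale (one value per standard amino acid)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def merge_records(records):
    """Collapse (source, sequence, activity) records into one entry per unique sequence."""
    merged = defaultdict(lambda: {"activities": set(), "sources": set()})
    for source, seq, activity in records:
        key = seq.strip().upper()  # normalise case/whitespace before deduplication
        merged[key]["activities"].add(activity)
        merged[key]["sources"].add(source)
    # Hypothetical ID scheme: IDs assigned in sorted sequence order so reruns are stable
    return {
        f"PEP{i:06d}": {"sequence": s, **merged[s]}
        for i, s in enumerate(sorted(merged), 1)
    }

def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(KD[aa] for aa in seq) / len(seq)

records = [
    ("dbA", "gigkflhsak", "antimicrobial"),
    ("dbB", "GIGKFLHSAK", "immunomodulatory"),
    ("dbA", "KWKLFKKI", "antimicrobial"),
]
merged = merge_records(records)
```

Keying on the normalised sequence means that the same peptide reported by two sources ends up as a single entry carrying every activity label attached to it.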

A-002: Machine learning for the efficient prediction of safe and biologically active organophosphorus molecules
Track: 3D-SIG
  • Hang Hu, National Research Council of Canada (NRC), Canada
  • Hsu Kiang Ooi, National Research Council Canada, Canada
  • Mohammad Sajjad Ghaemi, NRC, Canada
  • Anguang Hu, Defence Research and Development Canada, Canada


Presentation Overview:

We developed a fragment-based molecule discovery approach that combines quantum chemical simulations with Recurrent Neural Networks (RNN) with an attention layer to sample the chemical space of molecules. The process starts with a semiempirical quantum simulation using machine-learning-trained parameters to describe the ligand-protein complex accurately. The quantum simulation can identify the key ligand chemical motifs necessary for protein binding. By using these chemical motifs as initial seeds for generating molecules, fragment-based drug design can effectively sample the chemical space of interest. The predicted molecules are expected to exhibit modes of biological action similar to those of known compounds. Additionally, growing molecules from de-risked compound fragments can potentially lower development costs and shorten development timelines for new drugs or pesticides. We applied our method to predict molecules with biological activity similar to that of organophosphorus (OP) pesticides, yet less toxic to humans. A recent study points to OP adduction of tubulin as the cause of OP neurotoxicity, even at low dosages. The generated molecules contain a PO3S starting fragment. They can interact with the intended serine binding site of acetylcholinesterase, but their bulky hydrocarbon side chains limit their binding to the tyrosine adduction site on tubulin.

A-003: AlphaFold DB: Recent advances and future developments
Track: 3D-SIG
  • Maxim Tsenkov, PDBe/AlphaFold DB (EMBL-EBI), United Kingdom
  • Paulyna Magana, PDBe/AlphaFold DB (EMBL-EBI), United Kingdom


Presentation Overview:

AlphaFold DB (AFDB) is a groundbreaking resource, launched in 2021, that provides over 214 million highly accurate predicted protein structures. It has had a significant impact on many fields of the life sciences, including structural biology, drug discovery and bioinformatics. Since its inception, AFDB has attracted more than 1 million unique users from 225 countries, with 3.4 million page views and 17k downloads. The database offers a range of features, including data in PDB and mmCIF formats, confidence metrics such as predicted aligned error (PAE) in JSON format, bulk download, programmatic access through API endpoints, and a user-friendly web UI with a search system and 3D visualisations.

Continuous improvements include enhanced search functionality with advanced filtering options, search results that show sequence review status and genome status, and improved interface accessibility and usability for a more intuitive user experience. Based on extensive user research, AFDB is also developing the integration of sequence- and structure-based search services, such as ElasticBLAST and Foldseek.

Based on user feedback, AFDB is focusing on expanding its functionality and data offerings, and is considering the addition of viral proteins, isoforms, and multimers. In addition, we aim to create world-class training materials to ensure that AFDB remains relevant and usable for the life sciences community.

A-004: Analysis of protein-nucleic acid interactions in the PPI3D web server
Track: 3D-SIG
  • Justas Dapkunas, Institute of Biotechnology, Life Sciences Center, Vilnius University, Lithuania
  • Albertas Timinskas, Institute of Biotechnology, Life Sciences Center, Vilnius University, Lithuania
  • Kliment Olechnovic, Institute of Biotechnology, Life Sciences Center, Vilnius University, Lithuania
  • Migle Tomkuviene, Institute of Biotechnology, Life Sciences Center, Vilnius University, Lithuania
  • Ceslovas Venclovas, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius, LT 10257, Lithuania


Presentation Overview:

To perform their functions, proteins frequently interact with other proteins and with nucleic acids. Detailed information on these interactions can be obtained from the three-dimensional structures of the corresponding protein-protein or protein-nucleic acid complexes. Since experimental structure determination is often tedious and expensive, computational structure prediction methods are widely applied. Currently, the structures of protein-protein complexes can be modeled accurately by AlphaFold, but the structures of protein-nucleic acid complexes can be reliably inferred only by homology. To facilitate the search and analysis of structural data on protein interactions based on sequence homology, we have developed the PPI3D web server. Here, we present its new features for analyzing protein-nucleic acid interactions. Given protein sequences, PPI3D can now identify not only protein-protein but also protein-nucleic acid interactions of homologous proteins available in the Protein Data Bank. These structures are clustered in the PPI3D database according to the similarity of both protein sequences and interaction interfaces. The identified protein-nucleic acid interfaces can be analyzed in detail at the sequence and structure levels. In addition, homology models of the interactions with nucleic acids can be generated for the query proteins. The PPI3D web server is available at https://bioinformatics.lt/ppi3d.

A-005: Characterising the human protein-ligand interactome
Track: 3D-SIG
  • Javier Sánchez Utgés, University of Dundee, United Kingdom
  • Stuart MacGowan, University of Dundee, United Kingdom
  • Geoff Barton, University of Dundee, United Kingdom


Presentation Overview:

Ligands are key to protein function: they can act as substrates, co-factors, or inhibitors in complex with a myriad of proteins spread across all molecular processes. Understanding how they interact with proteins is therefore crucial and can provide insight for drug development and for understanding protein function more broadly. In this work, we analyse the protein-ligand interactions found in the 5,000 human proteins with experimentally determined 3D structures. Protein-ligand interaction fingerprints are used to define 13,000 ligand binding sites formed by 90,000 unique ligand binding residues. Ligand sites are grouped into four clusters based on their solvent accessibility profiles, and these clusters are biologically distinct. Cluster 0 comprises the most buried, conserved and missense-depleted sites, and it is enriched in functional sites. These results suggest that the cluster labels can be used to infer ligand binding site functionality. K-nearest neighbours and artificial neural network models predict cluster membership with an accuracy >80%. These models can be applied to any ligand binding site to estimate the likelihood that the site is functional. This work is relevant to classifying the functional relevance of binding sites identified in fragment screening studies and by prediction.
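Predicting cluster membership from a site's solvent-accessibility profile can be illustrated with a k-nearest-neighbour classifier written from scratch. It is only a sketch: the three-number feature vector (fractions of buried, intermediate and exposed residues), the hypothetical `knn_predict` helper, k = 3 and the toy training data are assumptions; the authors' models were trained on their own feature set.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict a cluster label for `query` by majority vote of its k nearest training sites."""
    # Sort training sites by Euclidean distance to the query
    dists = sorted((math.dist(x, query), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    # most_common keeps first-encountered order on ties, i.e. favours the closer neighbour
    return votes.most_common(1)[0][0]

# Toy sites: (fraction buried, fraction intermediate, fraction exposed) -> cluster label
train_X = [(0.9, 0.1, 0.0), (0.8, 0.2, 0.0), (0.1, 0.3, 0.6), (0.0, 0.2, 0.8)]
train_y = [0, 0, 3, 3]
```

A mostly buried query site such as `(0.85, 0.15, 0.0)` is assigned to cluster 0, the cluster the abstract associates with buried, conserved, functional sites.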

A-006: Acceleration of molecular dynamics simulation using deep learning
Track: 3D-SIG
  • Shuto Hayashi, Department of Computational and Systems Biology, Medical Research Institute, Tokyo Medical and Dental University, Japan
  • Jun Koseki, Division of Systems Biology, Nagoya University Graduate School of Medicine, Japan
  • Yasuhiro Kojima, Laboratory of Computational Life Science, National Cancer Center Research Institute, Japan
  • Teppei Shimamura, Department of Computational and Systems Biology, Medical Research Institute, Tokyo Medical and Dental University, Japan


Presentation Overview:

Molecular dynamics (MD) simulation is one of the most prevalent methods for investigating the conformational and dynamic properties of proteins. However, long MD simulations are computationally demanding, and reaching the microsecond-to-millisecond scale requires a supercomputer.
To address this computational limitation, we developed LAMDA, a novel deep neural network that emulates MD simulation with high speed and accuracy. LAMDA takes the initial coordinates, velocities, and forces of protein atoms as input and predicts their values after a specified duration. LAMDA incorporates the symmetries of the dynamics, namely rotation, reflection, translation, and permutation equivariance, which enhances its prediction accuracy. Furthermore, LAMDA does not take explicit solvent information as input, which makes the simulation several to dozens of times faster.
We tested LAMDA on a benchmark dataset and compared its predictions with trajectories obtained from conventional MD simulations. The results show that LAMDA achieves errors on the order of 10^-2 Å and a 20-fold speedup in predicting the coordinates of protein atoms after 64 femtoseconds of simulation.
LAMDA has the significant advantage of being applicable to any protein. The high performance and broad applicability of LAMDA suggest its potential as a valuable tool for analyzing protein dynamics.
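What "rotation-, translation- and permutation-equivariant" means in practice can be shown with a toy coordinate update and a numerical check. This is not LAMDA's architecture: the EGNN-style update rule built from relative vectors and distances, the weight function phi(r²) = exp(-r²), and the hypothetical helpers `toy_update`, `rotate_z` and `translate` are assumptions chosen only to illustrate the property.

```python
import math

def toy_update(coords, dt=0.01):
    """One toy update x_i' = x_i + dt * sum_j (x_i - x_j) * exp(-|x_i - x_j|^2).

    It uses only relative vectors and their norms, so it is equivariant to
    rotations, reflections, translations and permutations of the atoms.
    """
    out = []
    for i, xi in enumerate(coords):
        shift = [0.0, 0.0, 0.0]
        for j, xj in enumerate(coords):
            if i == j:
                continue
            d = [a - b for a, b in zip(xi, xj)]
            w = math.exp(-sum(c * c for c in d))  # distance-dependent weight
            shift = [s + w * c for s, c in zip(shift, d)]
        out.append([a + dt * s for a, s in zip(xi, shift)])
    return out

def rotate_z(coords, theta):
    """Rotate every point by angle theta about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * x - s * y, s * x + c * y, z] for x, y, z in coords]

def translate(coords, t):
    """Shift every point by the vector t."""
    return [[a + b for a, b in zip(p, t)] for p in coords]

def max_abs_diff(a, b):
    """Largest coordinate-wise difference between two point sets."""
    return max(abs(x - y) for p, q in zip(a, b) for x, y in zip(p, q))

X = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.5, 0.5]]

def g(c):
    """An arbitrary rigid motion: rotate, then translate."""
    return translate(rotate_z(c, 0.7), [1.0, 2.0, 3.0])
```

Applying the rigid motion before or after the update gives the same result up to rounding, so a model built this way never has to learn separately how a rotated or shifted protein behaves.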

A-007: HIPPO: HIstogram-based Pseudo-POtential for scoring protein-ssRNA fragment-based docking poses
Track: 3D-SIG
  • Anna Kravchenko, Université de Lorraine, CNRS, Inria, LORIA, France
  • Sjoerd de Vries, Université de Lorraine, CNRS, Inria, LORIA, France
  • Malika Smail-Tabbone, Université de Lorraine, CNRS, Inria, LORIA, France
  • Isaure Chauvot de Beauchene, Université de Lorraine, CNRS, Inria, LORIA, France


Presentation Overview:

Single-stranded RNA (ssRNA) binding to proteins is challenging to model because of ssRNA flexibility. This is currently best addressed by fragment-based docking, but inaccurate scoring of near-native poses for fragments with few protein-ssRNA contacts produces too many wrong models.
We present a novel analytic approach that derives a scoring potential from the relative frequencies of pseudoatom-pseudoatom distances in near-native versus non-native poses. A specific feature of our approach is to create a small set of potentials that covers the variety of binding modes, apply them, and pool their results. We derived HIstogram-based Pseudo-POtentials (HIPPO), a collection of 4 potentials for scoring ssRNA on RNA-Recognition Motifs (RRM) in the ATTRACT coarse-grained representation, and tested it on a benchmark of 57 RRM-ssRNA structures.
HIPPO reaches a 3-fold enrichment for 135/217 fragments, versus only 74/217 with the ATTRACT scoring function. Selecting the correct potential among the 4 even leads to a 10-fold enrichment in half of the cases. HIPPO especially improves the scoring of the best-docked fragment in each complex, opening the possibility of using it as an anchor for incremental modelling.
Our approach is promising for studying protein-ssRNA interactions and is, a priori, extendable to other complexes.
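The idea of turning relative distance frequencies into a pseudo-potential can be sketched as a per-bin log ratio. This is an assumption-laden illustration rather than HIPPO itself: it uses a single pseudoatom-pair type instead of one histogram per type pair, fixed bin edges, a pseudocount of 1, the common statistical-potential form E_k = -ln(p_native,k / p_nonnative,k), and hypothetical helper names.

```python
import math

def histogram(distances, edges):
    """Count distances falling into [edges[k], edges[k+1]) bins."""
    counts = [0] * (len(edges) - 1)
    for d in distances:
        for k in range(len(edges) - 1):
            if edges[k] <= d < edges[k + 1]:
                counts[k] += 1
                break
    return counts

def derive_potential(native_d, nonnative_d, edges, pseudocount=1.0):
    """Per-bin pseudo-energy E_k = -ln(p_native,k / p_nonnative,k).

    Bins over-represented in near-native poses get negative (favourable) energies.
    The pseudocount keeps empty bins from producing log(0).
    """
    n = [c + pseudocount for c in histogram(native_d, edges)]
    m = [c + pseudocount for c in histogram(nonnative_d, edges)]
    n_tot, m_tot = sum(n), sum(m)
    return [-math.log((a / n_tot) / (b / m_tot)) for a, b in zip(n, m)]

def score_pose(pose_d, potential, edges):
    """Sum the bin energies of all pseudoatom-pseudoatom distances in one pose (lower is better)."""
    counts = histogram(pose_d, edges)
    return sum(c * e for c, e in zip(counts, potential))

edges = [0.0, 4.0, 8.0, 12.0]  # hypothetical distance bins (angstroms)
pot = derive_potential([3.0, 3.5, 5.0, 2.0], [9.0, 10.0, 11.0, 6.0], edges)
```

With a set of such potentials, each trained on a different binding mode, a pose can be scored by every potential and the results pooled, which is the strategy the abstract describes.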

A-008: The new mega dataset combined with a deep neural network makes progress in predicting the impact of single mutations on protein stability
Track: 3D-SIG
  • Marina Pak, Center for Molecular and Cellular Biology, Skolkovo Institute of Science and Technology, Russia
  • Nikita Dovidchenko, Institute of Protein Research, Russian Academy of Sciences, Russia
  • Satyarth Mishra Sharma, Center for Artificial Intelligence Technology, Skolkovo Institute of Science and Technology, Russia
  • Dmitry Ivankov, Center for Molecular and Cellular Biology, Skolkovo Institute of Science and Technology, Russia


Presentation Overview:

Prediction of the change in protein stability (∆∆G) caused by a single mutation is essential for biotechnology, medicine, and our understanding of the physics underlying protein folding. Despite the recent tremendous success in 3D protein structure prediction, the apparently simpler problem of predicting the effect of mutations on protein stability has been hampered by the scarcity of experimental data. With the recent high-throughput measurements of mutational effects for ~850,000 mutations in the ‘mega’ experiment, it has become possible to apply state-of-the-art deep learning methods. Here we explore the ability of a neural network trained on ESM2-generated embeddings to predict the change in protein stability due to single mutations. The developed method, ABYSSAL, predicts the data from the ‘mega’ experiment well (Pearson correlation 0.84), while its predictions of ∆∆G values from previous experiments are more modest. ABYSSAL also fully satisfies the antisymmetry property. Overall, our study shows great promise for developing deep learning ∆∆G predictors.
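The antisymmetry property means that the reverse mutation should get the opposite value, ∆∆G(B→A) = -∆∆G(A→B). A minimal way to check a predictor against it is sketched below; the hypothetical `antisymmetry_report` helper and its two metrics (Pearson r between forward and reverse predictions, ideally -1, and the mean bias (∆∆G_fwd + ∆∆G_rev)/2, ideally 0) are common conventions assumed here, not necessarily the authors' exact protocol.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def antisymmetry_report(ddg_forward, ddg_reverse):
    """Compare predictions for mutations A->B with those for the reverse mutations B->A.

    A perfectly antisymmetric predictor gives ddg_reverse == -ddg_forward,
    so the correlation is -1 and the mean bias is 0.
    """
    r = pearson(ddg_forward, ddg_reverse)
    bias = sum(f + b for f, b in zip(ddg_forward, ddg_reverse)) / (2 * len(ddg_forward))
    return r, bias

# Toy predictions (kcal/mol) from a perfectly antisymmetric model
fwd = [1.2, -0.5, 2.0, 0.3]
rev = [-1.2, 0.5, -2.0, -0.3]
r, bias = antisymmetry_report(fwd, rev)
```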

A-009: Inhibition of Mfd as an innovative strategy in the battle against antimicrobial resistance.
Track: 3D-SIG
  • Samantha Samson, Université Paris-Saclay, INRAE, MaIAGE, 78350, Jouy-en-Josas, France
  • Delphine Cormontagne, Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
  • Seav-Ly Tran, Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
  • Lucie Lebreuilly, Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
  • Jean-Christophe Cintrat, Université Paris-Saclay, CEA, Laboratoire de Chimie Bioorganique, 91191, Gif-sur-Yvette, France
  • Didier Rognan, Université de Strasbourg, CNRS, Laboratoire d'Innovation Thérapeutique, 67400, Illkirch, France
  • Nalini Rama Rao, Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
  • Gwenaëlle André, Université Paris-Saclay, INRAE, MaIAGE, 78350, Jouy-en-Josas, France


Presentation Overview:

Drug-resistant bacterial infections cause at least 700,000 deaths every year; if nothing is done, this number has been estimated to rise to 10 million by 2050, making it the leading cause of death worldwide. The declining discovery of new antibiotics has left few effective treatment options for these infections. It is therefore crucial to develop new antibacterials that act on innovative targets through mechanisms of action with a low potential for inducing resistance. To this end, this study focuses on the Mutation Frequency Decline protein (Mfd) as a promising bacterial target and on its inhibition, explored in silico, in vitro and in vivo.
Mfd is a non-essential, multifunctional protein found ubiquitously and exclusively in bacteria. It has been shown to be a virulence factor that is overexpressed in bacterial cells to overcome DNA damage caused by macrophage-generated nitric oxide (NO). It therefore plays a crucial role in counteracting the innate immune response during bacterial infection.
The purpose of this investigation is to identify molecules capable of inhibiting Mfd ATPase activity. We will focus mostly on the computational characterization of the target and of the hits tested in vitro and in vivo, and on how this drives the optimization of hits into leads.

A-010: Structure, function and evolution of de novo proteins in rice
Track: 3D-SIG
  • Francisco Javier Guzmán-Vega, King Abdullah University of Science and Technology, Saudi Arabia
  • Afaque Ahmad Momin, King Abdullah University of Science and Technology, Saudi Arabia
  • Yuanmin Zheng, King Abdullah University of Science and Technology, Saudi Arabia
  • Stefan T. Arold, King Abdullah University of Science and Technology, Saudi Arabia


Presentation Overview:

In the last decade, it has emerged that new genes can originate de novo from non-coding parts of the genome. De novo genes have been detected in several plant and animal species and are suggested to play a role in important processes such as environmental adaptation, stress response, development and disease. Here we use computational and experimental methods to analyze the structural landscape of 175 de novo proteins from rice and compare them with canonical rice proteins and two sets of randomly generated proteins. The predicted properties suggest selection for high intrinsic disorder in de novo proteins, which aligns with the “pre-adaptation” theory of protein evolution. Unexpectedly, some sequences in the de novo and random groups are predicted to adopt three-dimensional folds, whereas others are predicted to form more complex folds through self-association. Hence, our analysis suggests that, although de novo proteins are highly enriched in disordered sequences, complex protein folds and novel functionality may arise through protein-protein interactions (PPIs). These results further our understanding of the origin of protein structure and function and are guiding the experimental testing of selected sequences. This will, in turn, provide feedback on the performance of state-of-the-art structure prediction tools.

A-011: Dissecting the sequence and structure determinants of GPCR-G-protein selectivity via structural bioinformatics and machine learning
Track: 3D-SIG
  • Marin Matic, Scuola Normale Superiore, Italy
  • Francesco Raimondi, Scuola Normale Superiore, Italy
  • Pasquale Miglionico, Scuola Normale Superiore, Italy
  • Francesco Carli, Scuola Normale Superiore, Italy
  • Natalia De Oliviera Rosa, Scuola Normale Superiore, Italy
  • Asuka Inoue, Graduate School of Pharmaceutical Sciences, Tohoku University, Japan
  • Manae Tatsumi, Graduate School of Pharmaceutical Sciences, Tohoku University, Japan
  • Gurdeep Singh, Heidelberg University Biochemistry Centre, Germany
  • J Silvio Gutkind, Department of Pharmacology and Moores Cancer Center, University of California, United States


Presentation Overview:

GPCRs transduce extracellular signals into intracellular pathways by coupling with heterotrimeric G proteins, which are categorized as Gs, Gi/o, Gq/11, and G12/13 based on their α-subunits. To understand sequence-based coupling selectivity, we created a new machine learning predictor, PRECOGx. It is based on protein language models that encode structural and functional information of protein sequences: ESM1b embeddings of GPCRs are used as features, and it predicts GPCR interactions with G proteins and β-arrestins. It outperformed its predecessor (PRECOG) in predicting GPCR-transducer couplings and can also consider all GPCR classes. To explore the structural determinants of G-protein-coupling selectivity, we analyzed 362 available 3D structures of GPCR-G-protein complexes. Analysis of residue contacts at the interfaces revealed a network of secondary structure elements that highlights new and known structural features determining coupling specificity. Through RMSD calculations focusing on the docking mode of the G-protein α-subunit relative to the receptor, we show that Gs-GPCR complexes are more structurally constrained and have a narrower range of docking poses than Gi/o-GPCR complexes. Binding interface energy calculations showed that the structural properties of the complexes contribute to the higher stability of Gs complexes compared with Gi/o complexes.

A-012: ISPRED-SEQ - Deep Neural Networks and Embeddings for Predicting Interaction Sites in Protein Sequences
Track: 3D-SIG
  • Matteo Manfredi, Biocomputing Group - University of Bologna, Italy
  • Castrense Savojardo, Biocomputing Group - University of Bologna, Italy
  • Gabriele Vazzana, Biocomputing Group - University of Bologna, Italy
  • Pier Luigi Martelli, University of Bologna, Italy
  • Rita Casadio, Biocomputing Group - University of Bologna, Italy


Presentation Overview:

Protein-Protein Interactions (PPI) are crucial in many biological processes. Identifying PPI sites is key to understanding a protein's function in the context of cell complexity.
Computational tools can identify PPI sites on protein structures or sequences when experimental evidence is missing. Much effort has focused on methods exploiting structural information. The sequence-based prediction task is more challenging: few methods are available, and they achieve limited prediction performance.
Here we present ISPRED-SEQ, a web server for identifying PPI sites in protein sequences freely available at https://ispredws.biocomp.unibo.it.
The method is based on a deep architecture combining convolutional blocks and three cascading fully connected layers. ISPRED-SEQ is trained on a dataset of 6,066 protein structures, comprising 285,751 binding and 1,471,545 non-binding residues. The input is generated with two state-of-the-art language models, ESM-1b and ProtT5, which avoids the need to compute hand-crafted features such as sequence profiles or physicochemical properties.
We benchmarked ISPRED-SEQ on a dataset of 448 proteins, adopting a stringent homology-reduction procedure to guarantee that all proteins in the training dataset share less than 25% sequence similarity with the sequences used for benchmarking.
Results show that ISPRED-SEQ significantly outperforms other state-of-the-art methods, with an MCC of 0.39.
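The Matthews correlation coefficient (MCC) is a natural metric here because binding residues are a minority: 285,751 of the 1,757,296 training residues, about 16%. A minimal sketch of the standard formula follows; the helper name `mcc` and the convention of returning 0 when the denominator is zero are assumptions, not taken from the ISPRED-SEQ code.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when any marginal is zero (a common convention for the undefined case).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# A predictor that calls every residue non-binding on an 84:16 split
# reaches 84% accuracy, yet its MCC is 0.0.
always_negative = mcc(tp=0, fp=0, tn=84, fn=16)
```

Unlike accuracy, MCC only rewards a predictor that actually recovers the minority class, which is why a value such as 0.39 is informative on such imbalanced data.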

A-013: New computational insights on enzyme stability-activity trade-off
Track: 3D-SIG
  • Fabrizio Pucci, ULB, Belgium
  • Qingzhen Hou, Shandong University, China
  • Marianne Rooman, ULB, Belgium


Presentation Overview:

Enzymes play a fundamental role in almost all biotechnological and biopharmaceutical processes. Activity and stability are two key characteristics of enzymes, yet despite extensive efforts to decipher their interplay, it remains difficult to understand their relationship and how evolution and environmental conditions shape it.
To investigate this question further, we collected six hundred enzymes with known structures and catalytic sites. Using the formalism of statistical potentials, we computed the contribution of each residue to the enzyme's folding free energy and studied how it depends on the residue's distance from the closest catalytic site. We discovered an interesting stability pattern in the catalytic region: an energetic compensation between the catalytic residues, which are usually stability weaknesses, and their neighboring residues, which are rather stability strengths. We also compared stability patterns in psychrophilic and thermophilic enzymes and found more pronounced stability weaknesses in the catalytic sites of psychrophilic enzymes than in those of thermophilic ones. This work provides interesting information on the stability and activity properties of enzymes that could be exploited to improve enzyme design methods.
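The analysis of per-residue contributions as a function of distance to the nearest catalytic site can be sketched as a simple binned average. The per-residue energies themselves would come from the statistical potentials, which are not reproduced here; representing each residue by one coordinate (for example its Cα), the 2 Å bin width, the sign convention (negative = stabilizing) and the hypothetical helper `profile_by_catalytic_distance` are all assumptions.

```python
import math
from collections import defaultdict

def profile_by_catalytic_distance(residues, catalytic_ids, bin_width=2.0):
    """Average per-residue folding free-energy contribution versus distance to the
    closest catalytic residue.

    residues: dict residue_id -> (xyz tuple, energy contribution), negative = stabilizing
    catalytic_ids: ids of the catalytic residues
    Returns {bin_start: mean_energy} for bins of width bin_width (same units as xyz).
    """
    cat_xyz = [residues[c][0] for c in catalytic_ids]
    sums, counts = defaultdict(float), defaultdict(int)
    for rid, (xyz, energy) in residues.items():
        d = min(math.dist(xyz, c) for c in cat_xyz)  # 0 for catalytic residues themselves
        b = math.floor(d / bin_width) * bin_width
        sums[b] += energy
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

# Toy enzyme: one destabilizing catalytic residue with two stabilizing neighbours
residues = {
    "C1": ((0.0, 0.0, 0.0), 1.0),
    "N1": ((1.0, 0.0, 0.0), -2.0),
    "N2": ((0.0, 1.0, 0.0), -1.0),
    "F1": ((10.0, 0.0, 0.0), -0.5),
}
prof = profile_by_catalytic_distance(residues, ["C1"])
```

In this toy case the unfavourable catalytic residue is more than offset by its stabilizing neighbours in the first distance bin, a miniature version of the compensation pattern described above.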

A-014: Disordered flanks slow down the growth of amyloid fibrils in neurodegenerative disease, while hydrophobic surfaces accelerate growth
Track: 3D-SIG
  • Juami van Gils, Vrije Universiteit Amsterdam, Netherlands
  • Jacob Larsen, Danish Technical University, Denmark
  • Soumik Ray, Danish Technical University, Denmark
  • Alexander Buell, Danish Technical University, Denmark
  • Sanne Abeln, Vrije Universiteit Amsterdam, Netherlands


Presentation Overview:

Amyloid fibrils are aggregates consisting of many copies of the same protein. They are involved in many neurodegenerative diseases, such as Alzheimer's and Parkinson's disease. In this work, we investigate the role of the hydrophobic core and the disordered flanks in amyloid fibril growth. We measure the growth rate of alpha-synuclein with and without its disordered flanks and compare our results with mathematical models to gain mechanistic insight. The experimental measurements show that alpha-synuclein grows significantly faster without its disordered flanks, and the mathematical models suggest that this is due to secondary nucleation (the initiation of growth of additional fibrils on the surface of an existing fibril). Thus, the disordered flanks may play an important role in slowing down protein aggregation.

Additionally, we measure the interaction between alpha-synuclein and nanoPET, a type of nanoplastic. Nanoplastics are commonly found in the environment as a result of pollution, and in cosmetic products. Due to their high hydrophobicity, nanoplastics may facilitate amyloid fibril formation in a way similar to secondary nucleation. Here, we show that nanoPET interacts with the hydrophobic core of alpha-synuclein, and that the disordered flanks of alpha-synuclein prevent the formation of nanoPET clusters. Thus, these results indicate that nanoPET can enhance the formation of alpha-synuclein fibrils.
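The kind of mathematical model used to separate primary from secondary nucleation can be sketched with the widely used nucleation-elongation rate laws, integrated here with a simple forward-Euler scheme. Whether the authors used exactly this form is not stated; the equations, reaction orders n_c = n_2 = 2, rate constants, step size and the hypothetical `simulate_aggregation` helper are assumptions for illustration.

```python
def simulate_aggregation(m_tot, k_n, k_plus, k_2, n_c=2, n_2=2, t_end=10.0, dt=1e-3):
    """Forward-Euler integration of a standard nucleation-elongation model.

    P: fibril number concentration, M: fibril mass concentration,
    m = m_tot - M: free monomer concentration.
      dP/dt = k_n * m**n_c + k_2 * m**n_2 * M   (primary + secondary nucleation)
      dM/dt = 2 * k_plus * m * P                (elongation at both fibril ends)
    Returns a list of (t, M) samples.
    """
    P, M, t = 0.0, 0.0, 0.0
    trace = [(t, M)]
    while t < t_end:
        m = max(m_tot - M, 0.0)
        dP = k_n * m ** n_c + k_2 * m ** n_2 * M
        dM = 2.0 * k_plus * m * P
        P += dt * dP
        M = min(M + dt * dM, m_tot)  # fibril mass cannot exceed the total monomer
        t += dt
        trace.append((t, M))
    return trace

# Same primary nucleation and elongation, with and without secondary nucleation
with_secondary = simulate_aggregation(1.0, 1e-3, 1.0, 1.0)[-1][1]
primary_only = simulate_aggregation(1.0, 1e-3, 1.0, 0.0)[-1][1]
```

Secondary nucleation makes fibril mass feed back into nucleation, which produces much faster, autocatalytic growth; in this framework, a variant that grows faster because of secondary nucleation corresponds to a larger k_2.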

A-015: Ligand Binding Site Comparison Using a Graph Variational Autoencoder
Track: 3D-SIG
  • Arthur Goetzee, Schwede group, Biozentrum, University of Basel, Switzerland
  • Janani Durairaj, Schwede group, Biozentrum, University of Basel, Switzerland
  • Torsten Schwede, Schwede group, Biozentrum, University of Basel, Switzerland


Presentation Overview:

Comparison of ligand binding sites (LBS) on proteins is a powerful tool for drug design, lead compound identification, and protein function inference. However, LBS comparison is a complex task that requires domain knowledge and intuition, because LBS can be defined from multiple perspectives, including sequence homology, structural, physicochemical, and geometric ones. While numerous comparison methods have been developed, they are often limited to a narrow range of these definitions. Recent advances in deep learning have shifted the focus towards learnable representations, which incorporate structural information and a wide range of handcrafted features. Incorporating these into graph representations through geometric deep learning allows complex features to be captured at both local and global scales. We propose a novel approach using a graph variational autoencoder that learns the underlying distributions of LBS and calculates their similarity based on their Euclidean distance in a latent space. Protein-level ESM embeddings provide sequence and homology information, and fpocket provides geometric information. Physicochemical and structural information at the amino-acid level is integrated using embeddings obtained from ScanNet trained through transfer learning. Our method aims to overcome previous limitations and to provide a more accurate and generalized way of comparing LBS on proteins.

A-016: DiffRNAFold: Generating RNA Structures with Latent Space Diffusion
Track: 3D-SIG
  • Mihir Bafna, Georgia Institute of Technology, United States
  • Vikranth Keerthipati, Georgia Institute of Technology, United States
  • Subhash Kanaparthi, Georgia Institute of Technology, United States


Presentation Overview:

With the development of the mRNA-based SARS-CoV-2 (COVID-19) vaccines, there has been a recent surge in RNA-based therapeutics. The core mission of top pharmaceutical companies has expanded to include the development of new RNA vaccines and therapies to aid clinical development. Even for protein-based therapeutics, the RNA must first be designed with a feasible structure that maximizes its efficiency in producing the desired protein. Thus, studying and designing feasible RNA tertiary structures is essential for medicinal applications.

Recently, diffusion models have been widely applied to molecular and structural biology tasks, including protein structure generation, molecular docking, and small-molecule conformation prediction, but RNA remains relatively unexplored. Here, we propose DiffRNAFold, a novel latent space denoising diffusion model for generating RNA tertiary structures. The model uses a point cloud and graph autoencoder to embed RNA structures into latent representations, to which noise is added iteratively; the model is then trained on the reverse, denoising process. Our preliminary results on RNA, as well as the conditional generation aspect, highlight the potential of this approach.
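The "iterative scheme of adding noise" in a diffusion model is usually the DDPM forward process, which can be sampled in closed form at any step t: z_t = sqrt(ᾱ_t) z_0 + sqrt(1 - ᾱ_t) ε. The sketch below applies it to a latent vector; the linear β schedule, T = 1000, the latent dimension and the hypothetical helper names are standard defaults assumed for illustration, since DiffRNAFold's actual settings are not given.

```python
import math
import random

def linear_betas(T, beta_1=1e-4, beta_T=0.02):
    """Linear noise schedule beta_1..beta_T (a common DDPM default, assumed here; T >= 2)."""
    return [beta_1 + (beta_T - beta_1) * t / (T - 1) for t in range(T)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(z0, t, abar, rng):
    """Draw z_t ~ q(z_t | z_0) = N(sqrt(abar_t) z_0, (1 - abar_t) I) in closed form.

    Returns (z_t, eps); the denoising network is trained to predict eps from (z_t, t).
    """
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    a = abar[t]
    zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]
    return zt, eps

abar = alpha_bars(linear_betas(1000))
rng = random.Random(0)
z0 = [1.0, -2.0, 0.5]  # toy latent vector standing in for an encoded RNA structure
zt, eps = q_sample(z0, 500, abar, rng)
```

Because ᾱ_t shrinks towards zero, late steps are almost pure noise, and generation runs the learned denoiser backwards from random latents before decoding them into structures.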

A-017: A coarse-grained efficacy prediction model of μ opioid receptor ligands based on structural and dynamical analysis of ligand/protein complexes.
Track: 3D-SIG
  • Gabriel Tiago Galdino, University of Montreal, Canada
  • Olivier Mailhot, University of Montreal, Canada
  • Rafael Najmanovich, University of Montreal, Canada


Presentation Overview:

GPCRs are crucial for various biological processes and comprise about one third of marketed drug targets. In this project, we used normal mode analysis to calculate Dynamical Signatures of ligand/GPCR complexes, which identify changes in residue flexibility upon ligand binding. We combined this information with structural analysis, which determines the residues in contact with ligands, to predict the efficacy of new drug candidates and classify them as agonists, antagonists, or partial agonists.

We docked a set of ligands with known Emax values from GTP-gammaS binding assays into the crystal structures of the active Mu (MOR) and Kappa (KOR) Opioid Receptors. We then used LASSO multiple linear regression to train a predictor on contacts and Dynamical Signatures, achieving a ROC AUC > 0.85 in binary classification.

This analysis identified positions crucial for receptor activation, including L85(1.47) in MOR, where mutations affect the morphine response, and K305(6.58) in MOR, a position without known mutations. Our study offers insights into the dynamics and structural features of GPCR ligand binding, and presents a novel coarse-grained tool for including functional selection in high-throughput screening.
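The ROC AUC reported for the binary classification can be computed directly as the probability that a randomly chosen positive ligand is scored above a randomly chosen negative one. The sketch below assumes the regression output is used as a score and that ligands have already been binarized (for example agonist versus non-agonist by an Emax threshold); the threshold and the hypothetical `roc_auc` helper are not from the source.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Equivalent to the normalised Mann-Whitney U statistic; ties count as 1/2.
    labels: 1 for the positive class (e.g. agonist), 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

# Toy predicted efficacies for four ligands: two agonists (1), two non-agonists (0)
auc = roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

The pairwise loop is quadratic in the number of ligands, which is harmless for docking sets of this size and keeps the definition transparent; in the toy example, three of the four positive-negative pairs are ordered correctly.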

A-018: NRGDock: An open-source high-throughput virtual screening software that can analyze 1 molecule per second
Track: 3D-SIG
  • Thomas Descoteaux, Université de Montréal, Canada
  • Olivier Mailhot, University of California San Francisco, United States
  • Rafael Najmanovich, Université de Montréal, Canada


Presentation Overview:

Here we present NRGDock, an easy-to-use, Python-based docking software that requires less than 1 CPU second per molecule. At this speed, a modern laptop can dock 700 000 molecules in 24 hours. NRGDock combines a scoring function based on that of FlexAID with an exhaustive search procedure. Benchmarked on the widely used DUD-E dataset, NRGDock obtained median enrichment factors similar to those of FlexAID and AutoDock Vina. Furthermore, NRGDock performs well on protein structures generated by AlphaFold, in which residue positions may not be modelled precisely.
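
Enrichment factors are commonly defined as EF(x) = (actives in the top fraction x of the ranked list / size of that slice) / (total actives / library size), so EF = 1 means no better than random selection. A minimal sketch of that definition, assuming energy-like scores where lower is better and a 1% cutoff; the toy library is hypothetical, and the exact EF variant used in the NRGDock benchmark is not stated in the abstract.

```python
import math


def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor at the top `fraction` of a ranked screening list.

    Assumes lower scores are better (energy-like docking scores).
    """
    n = len(scores)
    n_top = max(1, math.ceil(fraction * n))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    hits_top = sum(1 for _, active in ranked[:n_top] if active)
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("no actives in the library")
    return (hits_top / n_top) / (total_actives / n)


# Hypothetical library: 200 molecules, 10 actives; molecule i has score i.
active_ids = {0, 1, 100, 101, 102, 103, 104, 105, 106, 107}
scores = [float(i) for i in range(200)]
actives = [i in active_ids for i in range(200)]
ef_1 = enrichment_factor(scores, actives, fraction=0.01)   # top 2 are both active
ef_10 = enrichment_factor(scores, actives, fraction=0.10)  # 2 actives in the top 20
```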

A-019: Surfaces: A software for fast quantification and visualisation of biomolecular interactions
Track: 3D-SIG
  • Natalia Fagundes Borges Teruel, Université de Montréal, Canada
  • Vinícius Magalhães Borges, Marshall University, United States
  • Rafael Najmanovich, Université de Montréal, Canada


Presentation Overview:

The quantification of biomolecular interactions is crucial for understanding biological processes and for guiding drug discovery and protein engineering. However, current methods for evaluating protein interfaces are complex and computationally expensive. This study introduces Surfaces, a simplified approach that uses a per-residue decomposition method prioritizing performance, with the SARS-CoV-2 Spike protein as a case study. Among computational approaches, methods based on molecular dynamics (MD) simulations, such as free-energy perturbation (FEP) calculations, agree well with experimental measurements but are computationally demanding. In contrast, Surfaces, which uses a complementarity function (CF) based on the areas of atoms in contact, offers comparable performance at a reduced computational cost, making large-scale applications feasible. Surfaces was applied to a dataset of 738 structures of the Spike protein in complex with antibodies and of Spike mutants in complex with the receptor ACE2. The results provide insights into the contributions of individual residue-residue interactions to receptor binding and immune escape. In conclusion, Surfaces offers a simplified and effective approach for evaluating protein interfaces and per-residue interaction contributions, making it a valuable tool for large-scale applications such as studying viral glycoprotein evolution, which is particularly relevant in the ongoing SARS-CoV-2 pandemic.
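
To make the idea of a per-residue decomposition concrete, the sketch below multiplies each atom-atom contact area by a pairwise interaction weight and sums the result per residue pair. The two atom types, the weight table, the contact areas and the residue labels are hypothetical placeholders; Surfaces' own atom typing, parameter matrix and surface-area calculation are not reproduced here.

```python
from collections import defaultdict

# Hypothetical interaction weights per (atom type, atom type); negative = favourable.
WEIGHTS = {
    ("polar", "polar"): -1.0,
    ("polar", "apolar"): 0.5,
    ("apolar", "apolar"): -0.8,
}


def weight(type_a, type_b):
    """Symmetric lookup in the hypothetical weight table (0.0 if absent)."""
    return WEIGHTS.get((type_a, type_b), WEIGHTS.get((type_b, type_a), 0.0))


def per_residue_cf(contacts):
    """Sum area * weight for each residue-residue pair across an interface.

    contacts: iterable of (res_a, type_a, res_b, type_b, area) tuples, one per
    atom-atom contact, with area in square Angstrom.
    """
    totals = defaultdict(float)
    for res_a, type_a, res_b, type_b, area in contacts:
        totals[(res_a, res_b)] += area * weight(type_a, type_b)
    return dict(totals)


# Illustrative contacts between labelled residues of two chains.
contacts = [
    ("K417", "polar", "D30", "polar", 12.0),
    ("K417", "apolar", "D30", "polar", 4.0),
    ("F486", "apolar", "M82", "apolar", 20.0),
]
per_pair = per_residue_cf(contacts)
```

Summing `per_pair.values()` gives a whole-interface score, while the individual entries show which residue pairs dominate it, which is the kind of readout used to reason about binding and escape mutations.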

A-020: Computational and in vivo studies of new biopesticides against Drosophila suzukii using neuronal target candidates
Track: 3D-SIG
  • Tarcisio Silva Melo, State University of Feira de Santana, Brazil
  • Thiago Svacina, Federal University of Viçosa, Brazil
  • Sabrina Helena da Cruz Araujo, Federal University of Tocantins, Brazil
  • Rosane Moura Aguiar, State University of Southwest of Bahia, Brazil
  • Francisco Paiva Machado, Federal University of Rio de Janeiro, Brazil
  • Leandro Machado Rocha, Federal University of Fluminense, Brazil
  • Eugenio Eduardo de Oliveira, Federal University of Viçosa, Brazil
  • Bruno Silva Andrade, State University of Southwest of Bahia, Brazil


Presentation Overview:

Drosophila suzukii is considered a global agricultural pest. The economic damage caused by D. suzukii was estimated at over 500 million dollars in North American and European countries between 2010 and 2012. This species continues to cause significant damage to agriculture worldwide, and Brazil is one of the main targets owing to its large production of soft-skinned fruits such as strawberry, blackberry, and grape, which are the main host crops. This study aimed to evaluate the biopesticide potential of known molecules from Varronia curassavica Jacq. against the AChE and octopamine targets of D. suzukii using computational and in vivo methods. 3D models of the protein structures were built with SWISS-MODEL Workspace, and molecular docking of the ligand candidates was then performed with AutoDock 4. The complexes with the best binding energies were submitted to 100-nanosecond molecular dynamics simulations to evaluate the stability of the biopesticide candidates, using RMSD and RMSF plots as references together with hydrogen-bond, radius-of-gyration, and binding-energy plots. Shyobunol and α-cadinol were selected as possible active molecules from V. curassavica, and its essential oil was able to kill D. suzukii in in vivo tests without harming non-target species.
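
The stability analysis above relies on RMSD and RMSF. A minimal stdlib sketch of both, assuming the trajectory frames have already been superposed onto a reference (no fitting is done here) and using a toy two-atom trajectory in place of real MD output:

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z).

    Assumes the structures are already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and the same length")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


def rmsf(frames):
    """Per-atom root-mean-square fluctuation about each atom's mean position."""
    n_frames, n_atoms = len(frames), len(frames[0])
    means = [tuple(sum(f[i][k] for f in frames) / n_frames for k in range(3))
             for i in range(n_atoms)]
    return [math.sqrt(sum(sum((f[i][k] - means[i][k]) ** 2 for k in range(3))
                          for f in frames) / n_frames)
            for i in range(n_atoms)]


# Toy 3-frame, 2-atom trajectory (Angstrom), already aligned to frame 0.
traj = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, -1.0, 0.0)],
]
rmsd_series = [rmsd(traj[0], frame) for frame in traj]
fluct = rmsf(traj)
```

A plateauing RMSD series is usually read as an equilibrated complex, and per-residue RMSF peaks point to flexible regions around the ligand.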

A-021: ProteinShake: A Unified Framework for Deep Learning on Large Datasets of Protein Structures
Track: 3D-SIG
  • Tim Kucera, Max Planck Institute of Biochemistry, Germany
  • Carlos Oliver, ETH Zürich, Switzerland
  • Dexiong Chen, ETH Zürich, Switzerland
  • Karsten Borgwardt, Max Planck Institute of Biochemistry, Germany


Presentation Overview:

We present and demonstrate the usage of ProteinShake, a new Python software package that supports deep learning model development on protein 3D structure data by harmonizing the fundamental steps of data processing and model evaluation. The package abstracts away large amounts of boilerplate code for downloading, annotating, parsing, filtering, and splitting protein 3D structure files. This allows for rapid creation of new datasets and benchmark tasks for biological applications. Alongside ProteinShake, we host a database of pre-processed datasets and evaluation tasks for supervised and self-supervised learning. ProteinShake drastically simplifies access to protein structure data, enabling rapid prototyping and reproducible model evaluation. We intend to serve the growing community of machine learning researchers aiming to extend their models to challenging biological domains. ProteinShake integrates seamlessly with all common deep learning frameworks and converts protein structures to point clouds, graphs, and voxel grids. The package is available on PyPI and at borgwardtlab.github.io/proteinshake
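
ProteinShake's own conversion API is not reproduced here. As a generic illustration of one of the representations it mentions, this sketch voxelizes a point cloud of atom coordinates into a sparse occupancy grid; the 1 Å resolution, anchoring the grid at the minimum coordinate, and the function name are assumptions.

```python
import math


def voxelize(points, resolution=1.0):
    """Map (x, y, z) points to a sparse occupancy grid.

    The grid origin is the per-axis minimum coordinate. Returns a dict mapping
    each occupied cell's integer index to its point count, plus the grid shape.
    """
    origin = [min(p[k] for p in points) for k in range(3)]
    grid = {}
    for p in points:
        idx = tuple(int(math.floor((p[k] - origin[k]) / resolution)) for k in range(3))
        grid[idx] = grid.get(idx, 0) + 1
    shape = tuple(max(i[k] for i in grid) + 1 for k in range(3))
    return grid, shape


# Hypothetical atom coordinates in Angstrom.
atoms = [(0.2, 0.1, 0.0), (0.7, 0.4, 0.3), (2.5, 0.0, 1.1), (1.0, 3.9, 0.0)]
grid, shape = voxelize(atoms)
```

A sparse dict keeps the sketch dependency-free; a deep learning pipeline would typically scatter these counts (or per-atom-type channels) into a dense tensor of the returned shape.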

A-022: Identification of reproducible motifs in antibody-antigen binding interactions of viral pathogens using structural and genetic information
Track: 3D-SIG
  • Andrew Schaub, Vaccine Research Center, NIAID, NIH, Bethesda, MD, USA, United States
  • Chen-Hsiang Shen, Vaccine Research Center, NIAID, NIH, Bethesda, MD, USA, United States
  • Shuishu Wang, Vaccine Research Center, NIAID, NIH, Bethesda, MD, USA, United States
  • Reda Rawi, Vaccine Research Center, NIAID, NIH, Bethesda, MD, USA, United States
  • Peter Kwong, Vaccine Research Center, NIAID, NIH, Bethesda, MD, USA, United States


Presentation Overview:

Developing effective antibodies as therapeutics, or immunogens for vaccine design, depends on specific target interactions, which are complicated by conformational flexibility, pathogen sequence diversity, immunological differences across species, and B-cell diversity between hosts. Despite these differences, we set out to identify and characterize reproducible motifs in antibody recognition of HIV-1, influenza A and SARS-CoV-2 viruses, from immunizations in mice and non-human primates as well as natural infection in humans. We developed a structural bioinformatics pipeline for in-depth characterization of these motifs, which share similar genetic and structural recognition elements and are delineated here as sub-epitope classes; over 100 were identified with this approach. Sub-epitope classes were relatively more abundant for flexible targets: 59 sub-epitope classes were identified for the fusion peptide of the HIV-1 envelope, half of which occur in more than one species. By contrast, for more rigid targets, including the ACE2-binding site of the SARS-CoV-2 spike receptor-binding domain and influenza A hemagglutinin, 29 and 28 sub-epitope classes were observed, respectively. Here we demonstrate how flexibility in the antibody-recognized conformation can lead to especially robust recognition, improving the frequency of vaccine responses and leading to prevalent reproducibility of specific antibody classes.

A-023: Using AlphaFold to delve into signal transduction mechanisms
Track: 3D-SIG
  • Pasquale Miglionico, Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Pisa, Italy
  • Marin Matic, Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Pisa, Italy
  • Christian Johannes Gloeckner, German Center for Neurodegenerative Diseases, Tübingen, Germany
  • Asuka Inoue, Department of Molecular and Cellular Biochemistry, Tohoku University, Sendai, Japan
  • Francesco Raimondi, Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Pisa, Italy


Presentation Overview:

Cells respond to external stimuli through a variety of proteins that interact with each other to transmit the signal across the cell membrane and ultimately elicit a cellular response. Structural characterization of these protein-protein interactions is crucial for understanding the pathways in which they are involved. Here, we show that using AlphaFold-Multimer to predict the structures of key complexes can provide novel insights into the mechanisms of signal transduction. We focus on two systems: G-protein-coupled receptors (GPCRs) and leucine-rich repeat kinase 2 (LRRK2). We predicted the structures of all known interacting GPCR-G-protein pairs and identified differences in the binding modes of different G-protein classes. We used the same method to study the structures of complexes between GPCRs and β-arrestins, gaining structural insights into different β-arrestin binding modes that were used to interpret experimental data. Finally, we investigated the role of LRRK2 in RAB-mediated signaling pathways by predicting the complexes of LRRK2 with all human RABs, gaining insights into the interaction sites of different RABs and the different conformations that LRRK2 adopts when interacting with its partners. The information provided by these models can guide future experimental studies and facilitate the development of new therapeutic strategies for treating diseases.
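
Comparing binding modes across predicted complexes usually starts from the set of interface residues. A minimal sketch, assuming each chain is given as residue labels mapped to atom coordinates, a 5 Å atom-atom cutoff (a common but arbitrary choice not taken from the abstract), and Jaccard overlap as one possible way to compare two interfaces:

```python
def interface_residues(chain_a, chain_b, cutoff=5.0):
    """Residues of each chain with any atom within `cutoff` (Angstrom) of the other chain.

    chain_a, chain_b: dicts mapping residue label -> list of (x, y, z) atom coordinates.
    Brute-force search; adequate for a sketch, slow for large complexes.
    """
    cut2 = cutoff * cutoff
    hits_a, hits_b = set(), set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            close = any(
                sum((p[k] - q[k]) ** 2 for k in range(3)) <= cut2
                for p in atoms_a
                for q in atoms_b
            )
            if close:
                hits_a.add(res_a)
                hits_b.add(res_b)
    return hits_a, hits_b


def jaccard(set_a, set_b):
    """Overlap of two interface residue sets (1.0 if both are empty)."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Toy single-atom "residues" for a receptor and a G protein.
receptor = {"R1": [(0.0, 0.0, 0.0)], "R2": [(20.0, 0.0, 0.0)]}
gprotein = {"G1": [(3.0, 0.0, 0.0)], "G2": [(0.0, 10.0, 0.0)]}
rec_iface, gp_iface = interface_residues(receptor, gprotein)
```

Applied to the receptor side of two predicted complexes (for example with G proteins of different classes), a low Jaccard overlap would flag distinct binding modes worth inspecting.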

A-024: Unlocking the Conformational Landscape of Protein Kinases: A Custom MSA Approach with ColabFold for Active State Modeling
Track: 3D-SIG
  • Carmen Al Masri, Harmonic Discovery Inc., United States
  • Francesco Trozzi, Harmonic Discovery Inc., United States
  • Navriti Sahni, Harmonic Discovery Inc., United Kingdom
  • Marcel Patek, Harmonic Discovery Inc., United States
  • Anna Cichonska, Harmonic Discovery Inc., Finland
  • Balaguru Ravikumar, Harmonic Discovery Inc., Finland
  • Rayees Rahman, Harmonic Discovery Inc., United States


Presentation Overview:

Protein kinases are an essential family of signaling proteins and a prime target for drug discovery. These proteins are structurally dynamic and can adopt different conformational states, including active and inactive states. These states, which maintain cellular homeostasis, are defined by the positions of the αC-helix and the DFG motif. Kinase inhibitors (KIs) exhibit different biophysical, biochemical, and pharmacophore properties depending on the kinase conformation they target. The recent success of AlphaFold2 (AF2) in accurately predicting protein structures from sequence inspired our investigation into the conformational landscape of protein kinases modeled by AF2. Our research demonstrated that AF2 can accurately model several kinase conformations across the kinome in a kinase-specific manner. However, it is challenging to direct AF2 to generate structures of kinases in specific conformational states. This limited conformational coverage hinders the discovery of novel conformation-specific KIs, especially for kinases lacking experimental structures. We present a new methodology that uses ColabFold, an open-source protein modeling software based on AF2, to model any kinase in the active conformation. Our findings create the opportunity to use AF2 to model any protein kinase in several pharmacologically relevant conformational states.

A-025: Combining rigorous structural bioinformatics and deep-learning-based protein structure prediction: AlphaFold2 models of all 438 catalytically competent human kinases in the active form
Track: 3D-SIG
  • Bulat Faezov, Fox Chase Cancer Center, Kazan Federal University, United States
  • Roland Dunbrack, Fox Chase Cancer Center, United States


Presentation Overview:

Humans have 438 catalytically competent protein kinase domains with the typical kinase fold, similar to the structure of PKA. Only 280 of these kinases are currently represented in the PDB. The active form of a kinase must satisfy the requirements for binding ATP, magnesium, and substrate. From a bioinformatics analysis of structures of 40 unique substrate-bound kinases, as well as many structures with bound ATP and phosphorylated activation loops, we derived criteria for the active form of protein kinases. These criteria include the conformation of the DFG motif (in terms of its dihedral angles) and the N-terminal domain salt bridge, which are required for binding ATP and magnesium. There are also novel requirements on the positions of the N- and C-terminal portions of the activation loop, which lead to the formation of a substrate-binding cleft. By these criteria, only 148 of the 438 kinase domains (32%) are present in the PDB in the active form. We ran AlphaFold2 with extensive sampling, using these active-state structures as templates together with shallow multiple sequence alignments, to build active-conformation models of all 438 human kinases. In addition, we used active models produced by AlphaFold2 as templates for modeling recalcitrant kinases ("diffusion templates"). Models of all 438 catalytically competent kinases in the active form are available at http://dunbrack.fccc.edu/kincore/active. They are suitable for interpreting mutations that lead to constitutive catalytic activity in cancer, and as templates for modeling substrate-kinase complexes and inhibitors that bind to the active state.
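
The DFG criterion above is expressed through dihedral angles. For reference, a minimal sketch of the standard dihedral-angle computation from four atom positions (for example C(i−1), N, Cα, C for backbone φ); which dihedrals and angular regions define the active state are specified in the authors' work and are not encoded here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond p1 -> p2.
    n1 = math.sqrt(_dot(b1, b1))
    b1u = tuple(c / n1 for c in b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1u) * c for c in b1u))
    w = _sub(b2, tuple(_dot(b2, b1u) * c for c in b1u))
    x = _dot(v, w)
    y = _dot(_cross(b1u, v), w)
    return math.degrees(math.atan2(y, x))
```

Using `atan2` rather than `acos` keeps the sign of the angle, which matters here because left- and right-handed backbone regions differ only in sign.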

A-026: Identifying communication hotspots in the ligand-binding domain of nuclear receptors.
Track: 3D-SIG
  • Israel Barrios, CABD-CSIC, Spain
  • Ildefonso Cases, CABD-CSIC, Spain
  • Alba Jiménez-Panizo, University of Barcelona/NCI-NIH, Spain
  • Andrea Alegre-Martí, Institute of Biomedicine of the University of Barcelona (IBUB), Spain
  • Pablo Fuentes-Prior, Institute of Biomedicine of the University of Barcelona (IBUB), Spain
  • Eva Estébanez-Perpiñá, Institute of Biomedicine of the University of Barcelona (IBUB), Spain
  • Ana M Rojas, CABD-CSIC, Spain


Presentation Overview:

Nuclear receptors are transcription factors essential to cell viability. Despite their importance and decades of study, how they acquire a plethora of functions has remained challenging to understand. In contrast to other families, in which sequence analysis enables a more detailed characterization of their members, nuclear receptors do not provide enough sequence diversity to address their functional diversity. Therefore, their classification has relied on the ligand-binding specificities exhibited by their members. Using a multidisciplinary approach including structural analyses, biochemistry and computational biology, we have addressed the multi-functionality of the glucocorticoid receptor, centered on its ligand-binding domain (LBD), in which a path for allosteric communication along the LBD was identified. Here we present the identification of these communication hotspots in the context of the experimental data.

A-027: Evolutionary biophysics of human adenovirus penton base proteins elucidates a variable pH-dependent lysis mechanism in different adenovirus serotypes
Track: 3D-SIG
  • Isabella Crisostomo, California State University Northridge, United States
  • Ravinder Abrol, California State University Northridge, United States


Presentation Overview:

Human adenovirus penton base (PB) proteins play vital roles in virion particle attachment to host cells, internalization, and endosomal escape. The mechanism of this escape is not understood; however, it has been proposed that the low-pH endosomal environment causes protonation of the PB pentamer's titratable residues, leading to electrostatic repulsion, pentamer dissociation, and virion escape. To test this hypothesis, the human adenovirus C serotype-5 PB protein was used to identify one highly homologous PB protein (serotype-2) and two distantly homologous PB proteins (serotype-7 and serotype-9). The pH-dependent structural models of these four PB proteins showed biochemical properties consistent with their relative homologies and predicted that serotype-5 and serotype-2 PB pentamers should dissociate at the late endosomal pH of 5.0, whereas serotype-7 and serotype-9 PB pentamers should resist dissociation at pH 5.0. Serotype-5 and serotype-7 PB proteins were used in implicit-solvent molecular dynamics simulations at pH 7.0, 5.0, and 3.0, which showed the serotype-5 PB protein beginning to dissociate at pH 5.0 and the serotype-7 PB protein resisting breakage at pH 5.0. These studies support the electrostatic charge repulsion hypothesis for PB dissociation and predict variable pH-dependent lysis mechanisms for different adenovirus PB proteins.
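
The protonation argument rests on the Henderson-Hasselbalch relation: the fraction of a titratable site that is protonated is 1 / (1 + 10^(pH − pKa)). The sketch below evaluates it at the three simulated pH values for an isolated histidine, assuming a model pKa of about 6.0; in a folded pentamer, pKa values shift with the local environment, which is what structure-based pKa predictions estimate.

```python
def fraction_protonated(ph, pka):
    """Fraction of a titratable site in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


HIS_MODEL_PKA = 6.0  # approximate free-histidine value; in-protein pKa values shift

for ph in (7.0, 5.0, 3.0):
    print(ph, round(fraction_protonated(ph, HIS_MODEL_PKA), 3))
```

Moving from pH 7.0 to 5.0 takes such a site from roughly 9% to roughly 91% protonated, which illustrates why late-endosomal pH could add substantial positive charge to a pentamer interface.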

A-028: Jalview 2.11.3: Visualisation and analysis of sequences and alignments in the context of computationally determined and observed structures
Track: 3D-SIG
  • Jim Procter, Barton Group, Division of Computational Biology, University of Dundee, Scotland, UK, United Kingdom
  • Ben Soares, Barton Group, Division of Computational Biology, University of Dundee, Scotland, UK, United Kingdom
  • Mungo Carstairs, Barton Group, Division of Computational Biology, University of Dundee, Scotland, UK, United Kingdom
  • Suzanne Duce, Barton Group, Division of Computational Biology, University of Dundee, Scotland, UK, United Kingdom
  • Geoff Barton, Division of Computational Biology, University of Dundee, Scotland, UK, United Kingdom


Presentation Overview:

Jalview 2.11.3 is the latest release of the widely used open-source application and browser-based platform for interactive integration, visualisation and analysis of biomolecular and genomic sequences, structures, annotations and trees. Jalview interoperates with 3D structure viewers including Jmol, UCSF Chimera/ChimeraX and PyMOL. It also integrates services provided by EMBL-EBI, including 3D-Beacons, which enables discovery of structural data from a wide range of sources.

Computationally determined structures from EMBL-EBI's AlphaFold DB can add great value for researchers investigating how the structure and function of proteins have evolved and diversified. Jalview 2.11.3 allows these structures to be linked to aligned proteins and visualised in the context of genomic variation at their coding loci. This release adds new capabilities for colouring and filtering structures according to pLDDT and predicted aligned error (PAE) annotations, enabling researchers to assess and interpret their data more confidently in the light of structural models.
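
AlphaFold DB model files store per-residue pLDDT in the B-factor column. The sketch below, independent of Jalview's own implementation, reads Cα pLDDT values from fixed-column PDB ATOM records and assigns the AlphaFold DB confidence bands (above 90 very high, 70 to 90 confident, 50 to 70 low, below 50 very low); the two synthetic records and the threshold of 70 in the filter are illustrative.

```python
def ca_plddt(pdb_text):
    """Map residue number -> pLDDT, read from the B-factor field of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            resnum = int(line[22:26])          # columns 23-26: residue sequence number
            scores[resnum] = float(line[60:66])  # columns 61-66: B-factor (pLDDT here)
    return scores


def confidence_band(plddt):
    """AlphaFold DB confidence band for a pLDDT value."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


# Two synthetic CA records in PDB fixed-column format.
example = "\n".join(
    f"ATOM  {i:5d}  CA  ALA A{i:4d}    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b:6.2f}"
    for i, b in [(1, 95.3), (2, 42.0)]
)
plddt = ca_plddt(example)
confident = {res for res, score in plddt.items() if score >= 70}
```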

A-029: Is the success of AlphaFold due to a better understanding of physics?
Track: 3D-SIG
  • Dmitry Ivankov, Skolkovo Institute of Science and Technology, Russia
  • Marina Pak, Skolkovo Institute of Science and Technology, Russia


Presentation Overview:

AlphaFold's remarkable success in predicting protein structures with near-experimental accuracy has revolutionized the field of structural bioinformatics.
Despite this success, there is ongoing debate over whether AlphaFold's achievement reflects a better understanding of the physics of protein folding. Some researchers have raised skepticism, citing AlphaFold's prediction of side chains with a dangling orientation that suggests a bound zinc ion, even though no such information is given to the program as input. Others, however, suggest that good protein design facilitated by AlphaFold, or better decoy ranking, could imply a deeper understanding of the physics of protein folding.
In this study, we investigated AlphaFold's understanding of physics in two ways. First, we compared AlphaFold's ability to predict the structures of distant homologs with that of the threading algorithm MUSTER. We found that the correlation between AlphaFold's predicted and actual structures was significantly worse than that of MUSTER, indicating a weaker understanding of protein folding physics. Second, we attempted to use AlphaFold metrics to predict the impact of single mutations on protein stability but found no significant correlation, further supporting the idea that AlphaFold does not capture the physics of protein folding.
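
The second test asks whether an AlphaFold-derived metric correlates with experimentally measured stability changes. A minimal sketch of Spearman's rank correlation (Pearson correlation of ranks, ties given their average rank); the abstract does not name the statistic used, so Spearman is an assumption, and the paired values are placeholders.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder pairs: an AlphaFold-derived score change vs. measured ddG per mutation.
af_metric = [0.2, 0.5, 0.1, 0.9]
ddg = [1.0, 0.3, 2.0, -0.5]
rho = spearman(af_metric, ddg)
```

With real data, a rho close to zero (and a non-significant p-value from a permutation test or a library routine) is what "no significant correlation" would look like.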

A-030: Explainable Deep Generative Models, Ancestral Fragments, and Murky Regions of the Protein Structure Universe
Track: 3D-SIG
  • Eli Draizen, University of Virginia, United States
  • Cameron Mura, University of Virginia, United States
  • Philip Bourne, University of Virginia, United States


Presentation Overview:

Modern proteins did not arise abruptly, as singular events, but rather over the course of at least 3.5 billion years of evolution. The molecular evolutionary processes that yielded their intricate 3D structures involve duplication, recombination and mutation of genetic elements corresponding to short peptide fragments. Identifying these ancestral fragments is crucial to deciphering the interrelationships among proteins, as well as how evolution acts upon protein sequences, structures and functions. Traditionally, common fragments have been found using alignment approaches, which become challenging when proteins have undergone extensive permutations, allowing for architectural similarity despite topological variability (a relationship we term the Urfold). We designed a framework to identify compact, potentially discontinuous peptide fragments by combining deep generative models of protein superfamilies with explainable AI to identify relevant atoms. Our approach recapitulates known relationships, previously established via manual analysis, among the evolutionarily ancient small β-barrels and among P-loop–containing proteins. We are now applying our approach to every CATH superfamily, including CATH-AlphaFold2 predicted domain structures. Because our approach is general, we anticipate that it can enable the discovery of new ancestral peptides. We leverage decades' worth of structural biology knowledge to decipher the molecular bases of protein structural relationships, including exceedingly remote ones.

A-031: Studying protein 3D structures using network science
Track: 3D-SIG
  • Khalique Newaz, University of Hamburg, Germany
  • Tijana Milenkovic, University of Notre Dame, United States


Presentation Overview:

The functions of a protein depend on its three-dimensional (3D) structure; hence, understanding proteins' 3D structures is important. Hypothesizing that network models of proteins can better capture their 3D structural characteristics, in a series of studies we proposed several (static or dynamic) network-based approaches that model protein 3D structures as protein structure networks (PSNs). Static PSNs model the whole 3D structure of a protein as a single-layer PSN. Because protein folding is a dynamic process, in which some parts (sub-structures) of a protein fold before others, we additionally proposed modeling a protein as a dynamic (multi-layer) PSN that captures these sub-structures. We evaluated our (static or dynamic) PSN models on protein structural classification (PSC), the supervised problem of assigning proteins to pre-defined structural (CATH or SCOP) classes based on their features. Using 72 datasets spanning 44,000 protein domains, we showed that our PSN models significantly outperformed state-of-the-art PSC approaches, with dynamic PSNs working better than static PSNs. Our PSN models could prove to be an effective approach for analyzing 3D protein structural data, which have recently grown explosively thanks to the success of computational methods for protein 3D structure prediction.
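
A minimal sketch of a static PSN, in which residues are nodes linked when their Cα atoms lie within a distance cutoff, plus a multi-layer variant whose successive layers contain growing N-terminal portions of the chain. The 6 Å Cα cutoff and this sequential layering scheme are assumptions for illustration; the authors' exact contact definition and layer construction may differ.

```python
import math


def contact_edges(ca_coords, cutoff=6.0, first=0, last=None):
    """Edges (i, j), i < j, between residues whose CA atoms are within `cutoff`.

    Only residues with index in [first, last) are considered.
    """
    last = len(ca_coords) if last is None else last
    edges = set()
    for i in range(first, last):
        for j in range(i + 1, last):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                edges.add((i, j))
    return edges


def sequential_layers(ca_coords, n_layers, cutoff=6.0):
    """Multi-layer network: layer k uses only the first k/n_layers of the chain."""
    n = len(ca_coords)
    return [contact_edges(ca_coords, cutoff, 0, math.ceil(n * k / n_layers))
            for k in range(1, n_layers + 1)]


# Toy extended chain: CA atoms 3.8 Angstrom apart along x.
chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
static_psn = contact_edges(chain)
layers = sequential_layers(chain, n_layers=2)
```

The last layer always equals the static PSN; features computed on each layer (degrees, graphlet counts and so on) can then be stacked to describe how the network grows along the chain.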

A-032: Structure-based discovery of novel protein domains from AlphaFold predictions
Track: 3D-SIG
  • Greg Slodkowicz, Exscientia, United Kingdom
  • John Overington, Exscientia, United States
  • Douglas Pires, Exscientia, United Kingdom


Presentation Overview:

Domains are fundamental units of protein architecture, function and evolution. Understanding protein function is therefore contingent on comprehensively identifying and classifying domain regions. Traditionally, due to the paucity of protein structures, domains have been annotated using sequence-based approaches such as Hidden Markov Models. Here, we leverage AlphaFold2 protein structure predictions and develop an alternative, structure-based approach to identifying domains.

We show that the predicted regions correspond to globular regions of protein structure and largely overlap with annotated domains. We identify ~150 cases where existing annotation approaches have missed apparent domain regions and perform additional structure-based similarity searches to classify those regions and identify novel folds. We explore the structural and functional properties of these regions and discuss why they may have been missed by sequence-based annotation methods.

The novel domains we have identified have the potential to open new avenues in drug discovery and protein design. More generally, with AlphaFold now providing structures for the majority of proteins, we suggest that direct structure-based methods should be employed more broadly alongside sequence-based approaches.

A-033: Ten Tips for Protein Property Prediction – Common Challenges and Best Practices
Track: 3D-SIG
  • Qingzhen Hou, VU University Amsterdam, Netherlands
  • Katharina Waury, Vrije Universiteit Amsterdam, Netherlands
  • Dea Gogishvili, Vrije Universiteit Amsterdam, Netherlands
  • K. Anton Feenstra, Vrije Universiteit Amsterdam, Netherlands


Presentation Overview:

Machine learning models, trained on the many available types of annotation, are readily applied to predict properties of novel proteins. However, we noticed recurring issues that make findings hard to understand and replicate. We suspect this may be due to biologists being unfamiliar with machine learning methodology or, conversely, to machine learning experts lacking some knowledge about proteins. Here, we aim to bridge this gap for method developers.

The most striking issues are linked to a lack of clarity about annotations, benchmark metrics, and the definition of positives and negatives. Others relate to a lack of rigour regarding structure- versus sequence-based methods, comparison to the 'state of the art', or the significance of differences. During writing, these points may have seemed obvious to you, the author(s); however, they may not be so clear-cut to everyone else: your readers.

We will give an overview of our main tips and expect these to hold for other machine learning based applications in biology. New methods should be tested rigorously but fairly on their merits. Best practices for publication should continue to keep pace with accelerating methodological developments and continuing growth of databases, as should the researchers. We expect that our tips will provide you with a good place to start.

A-034: Investigation of the synergistic antimicrobial effect of Winter Flounder peptides using molecular dynamics simulations
Track: 3D-SIG
  • Miruna Serian, King's College London, United Kingdom
  • Chris Lorenz, King's College London, United Kingdom
  • James Mason, King's College London, United Kingdom


Presentation Overview:

With the fast-growing resistance of bacteria to antibiotics, antimicrobial peptides (AMPs) have gained attention as potential drug candidates because of their potency against a broad spectrum of bacteria. As with combinations of drugs, combinations of AMPs could yield more potent antibacterial agents with lower host toxicity, delay the evolution of drug resistance, and reduce the dosage needed, hence causing fewer side effects. Molecular dynamics (MD) simulations have proven to be a powerful tool for investigating the molecular mechanisms that drive the antimicrobial effects of AMPs. Here we report the results of all-atom MD simulations of the interaction of several two-way combinations of Winter Flounder (WF) peptides, a family of cationic AMPs found in the epithelial mucous cells of the winter flounder, with membrane models representative of gram-positive bacteria (100% POPG) and gram-negative bacteria (POPE:POPG = 3:1). The MD results for the most promising WF peptide combinations in different bacterial membrane models could help shed light on the synergistic activity of AMPs and guide the creation of effective AMP cocktails for therapeutic use.

A-035: Deep clustering based high-throughput in-situ macromolecular classification in cryo-electron tomograms
Track: 3D-SIG
  • Min Xu, Carnegie Mellon University, United States


Presentation Overview:

Cryo-electron tomography directly visualizes heterogeneous macromolecular structures in their native and complex cellular environments. However, existing computer-assisted structure sorting approaches are low-throughput or inherently limited by their dependence on available templates and manual labels. Here, we introduce a high-throughput, template-free and label-free deep learning approach, the Deep Iterative Subtomogram Clustering Approach (DISCA), that automatically detects subsets of homogeneous structures by learning and modeling 3D structural features and their distributions. Evaluation on five experimental cryo-ET datasets shows that an unsupervised deep-learning-based method can detect diverse structures with a wide range of molecular sizes. This unsupervised detection paves the way for systematic, unbiased recognition of macromolecular complexes in situ.

References:
* Zeng X, Kahng A, Xue L, Mahamid J, Chang Y, Xu M. High-throughput cryo-ET structural pattern mining by deep unsupervised clustering. PNAS. doi:10.1073/pnas.2213149120
* Azzuni H, Ridzuan M, Xu M, Yaqub M. Color Space-based HoVer-Net for Nuclei Instance Segmentation and Classification. 2022 IEEE International Symposium on Biomedical Imaging Challenges (ISBIC), 2022, pp. 1-4. doi:10.1109/ISBIC56247.2022.9854725
* Sharif M, Demidov D, Hanif A, Yaqub M, Xu M. TransResNet: Integrating the Strengths of ViTs and CNNs for High Resolution Medical Image Segmentation via Feature Grafting. British Machine Vision Conference (BMVC), 2022.

A-036: Utilizing protein language models combined with graph neural networks for protein-ligand binding sites detection
Track: 3D-SIG
  • Hamza Gamouh, Faculty of Mathematics and Physics, Charles University, Czechia
  • David Hoksza, Faculty of Mathematics and Physics, Charles University, Czechia


Presentation Overview:

The identification of protein-ligand binding sites through structural analysis is a crucial component of drug discovery and design. Recent advancements have introduced various binding site prediction approaches that leverage graph neural networks and hand-crafted features. Our previous work demonstrated the effectiveness of protein language models in detecting binding site residues solely from the protein sequence. In this study, we integrate these two methodologies, examining best practices for combining amino acid representations from protein language models with the propagation capabilities of graph neural networks. Our results indicate that incorporating graph-network information significantly enhances performance compared with methods that exclude structure. However, this advantage tends to diminish as the complexity of the protein language model increases. Ultimately, we demonstrate that, by carefully selecting the type of graph neural network and its parameters, a combination of protein language models and graph neural networks can outperform other state-of-the-art techniques in protein-ligand binding site detection.
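
One simple way to combine per-residue language-model embeddings with structure is a message-passing step over a residue contact graph. The sketch below performs a single untrained GraphSAGE-style mean aggregation (each residue's vector concatenated with the mean of its neighbours' vectors); the rule, the toy 2-dimensional embeddings and the graph are assumptions, and a real layer would add learned weights and a nonlinearity.

```python
def mean_aggregate(embeddings, edges):
    """One untrained message-passing step (GraphSAGE-style mean, no weights).

    embeddings: list of per-residue vectors (e.g. from a protein language model).
    edges: iterable of undirected (i, j) residue pairs.
    Returns, for each residue, its own vector concatenated with the mean of its
    neighbours' vectors (zeros if it has no neighbours).
    """
    dim = len(embeddings[0])
    neighbours = [[] for _ in embeddings]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    out = []
    for i, h in enumerate(embeddings):
        if neighbours[i]:
            mean = [sum(embeddings[j][d] for j in neighbours[i]) / len(neighbours[i])
                    for d in range(dim)]
        else:
            mean = [0.0] * dim
        out.append(list(h) + mean)
    return out


# Toy 3-residue graph with 2-dimensional embeddings.
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
updated = mean_aggregate(emb, edges)
```

Stacking several such steps lets information from spatial neighbours reach each residue before a per-residue binding-site classifier is applied.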

A-037: How protein structure dynamics improve catalytic sites prediction
Track: 3D-SIG
  • Simon Aubailly, Biolizard, Belgium
  • Ding Rong Ruo Yu, Biolizard, Belgium
  • Christian Rausch, Biolizard, Belgium



Over the last decades, great efforts have been made to find catalytic sites in enzymes. Because experimental methods are complex, and with the rise of machine learning, prediction algorithms based on 3D structure and/or sequence have been developed. However, while many models take into account features of static 3D structures, to our knowledge none has used dynamic features extracted from an elastic network model of the protein structure.

In this study, we combine structural parameters, normal modes of vibration, chemical features and sequence information to predict catalytic sites in enzymes. While different combinations of features derived from sequence and/or structure yield prediction accuracies comparable to those of previously published methods, adding dynamic features derived from the normal modes allowed us to reach a recall of 78% at a precision of 17% on the Catalytic Site Atlas dataset.

This new approach shows the importance of including dynamic features to predict catalytic sites in enzymes. We show that functional sites do not depend on a single class of parameters, which highlights the need to consider the global complexity of protein function.
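As one illustration of how dynamic features can be derived from an elastic network model, the sketch below uses the Gaussian Network Model (GNM), one common elastic network variant; the abstract does not say which variant or cutoff the authors used, so the 7 Å cutoff and the function name are assumptions. It returns a per-residue relative fluctuation that could be appended to structural and sequence features before training a classifier.

```python
import numpy as np


def gnm_fluctuations(ca_coords, cutoff=7.0, n_modes=None):
    """Relative mean-square fluctuation of each residue under a Gaussian Network Model."""
    xyz = np.asarray(ca_coords, dtype=float)
    n = len(xyz)
    dists = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    contacts = (dists <= cutoff) & ~np.eye(n, dtype=bool)
    # Kirchhoff matrix: -1 for each contacting pair, diagonal = number of contacts
    kirchhoff = -contacts.astype(float)
    np.fill_diagonal(kirchhoff, contacts.sum(axis=1))
    eigvals, eigvecs = np.linalg.eigh(kirchhoff)  # eigenvalues in ascending order
    # Drop the zero mode (rigid-body translation of a connected network)
    keep = eigvals > 1e-8
    vals, vecs = eigvals[keep], eigvecs[:, keep]
    if n_modes is not None:  # optionally keep only the slowest (lowest-frequency) modes
        vals, vecs = vals[:n_modes], vecs[:, :n_modes]
    # <dR_i^2> is proportional to sum_k u_k[i]^2 / lambda_k
    return (vecs ** 2 / vals).sum(axis=1)
```

Restricting `n_modes` to the slowest modes emphasises the large-scale collective motions, which are usually the ones of functional interest.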

A-038: Novel theoretical drug design strategy to discover highly potent urease inhibitors for H. pylori based on coumarin derivatives
Track: 3D-SIG
  • Gabriel J. Olguín-Orellana, Laboratory of Bioinformatics and Computational Chemistry, Faculty of Medicine, Catholic University of Maule, Chile
  • Daniel Bustos, Laboratory of Bioinformatics and Computational Chemistry, Faculty of Medicine, Catholic University of Maule, Chile



Helicobacter pylori (H.p.) infection affects over 50% of the global population, causing various gastric diseases. Urease, a key pathogenic metalloenzyme of this microorganism, hydrolyzes urea to ammonia in the stomach, favoring the replication of H.p. Antibiotic therapies have become unviable, mainly due to antibiotic resistance; therefore, new strategies based on urease inhibitors are being proposed.

Here, we propose a protocol to discover H.p. urease inhibitors among 240 new coumarin derivatives. To this end, the entire set of coumarin inhibitors already reported (CIAR; 142 compounds) is being subjected to Induced-Fit Docking (IFD), MM/GBSA and QM/MM to characterize their binding energy with H.p. urease and correlate it with experimental IC50 values and 49 other physicochemical descriptors. This protocol will subsequently be applied to the new set.

So far, we have estimated the CIAR-urease binding free energy through IFD and MM/GBSA, but it shows a low correlation with IC50 (R² = 0.59). This may be due to the strong polarization effects induced by the Ni2+ ions of urease, which decrease the precision. Consequently, reparametrizing the ligand charges with QM/MM should increase the correlation coefficient and improve the hit rate of the protocol.
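The correlation check mentioned above can be sketched as follows. This assumes IC50 values are converted to pIC50 (a common but here assumed choice) and that R² is the squared Pearson coefficient; the energies and IC50s below are made-up placeholders, not the authors' data. Because R² is squared, the expected negative trend (more negative binding energy, higher pIC50) does not reduce the score.

```python
import math


def pic50(ic50_molar):
    """pIC50 = -log10(IC50), with IC50 in mol/L."""
    return -math.log10(ic50_molar)


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical values: MM/GBSA binding energies (kcal/mol) and measured IC50s (mol/L)
dg_bind = [-45.2, -38.7, -51.0, -40.1]
ic50 = [2.0e-6, 8.5e-6, 0.9e-6, 5.0e-6]
print(r_squared(dg_bind, [pic50(v) for v in ic50]))
```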

A-039: Deriving and Analysing a Non-Redundant Dataset of Nanobody-Protein Antigen Interfaces
Track: 3D-SIG
  • Tiziana Ricciardelli, King Abdullah University of Science and Technology, Saudi Arabia
  • Mohit Chawla, King Abdullah University of Science and Technology, Saudi Arabia
  • Luigi Cavallo, King Abdullah University of Science and Technology, Saudi Arabia
  • Romina Oliva, University Parthenope of Naples, Italy



Nanobodies (Nbs) are a class of antibodies, found naturally mainly in camelid species, that are composed of a single heavy-chain variable domain (VHH). The scientific community's interest in these molecules as alternatives to classical antibodies for biotechnological applications has increased over time, because they are small (~15 kDa), highly soluble and stable, and consequently readily bioengineered. However, the design of novel Nb-protein antigen complexes is hindered by the fact that predicting their structures remains challenging.
Here we derived a non-redundant dataset of 319 Nb-protein antigen interfaces from 297 experimental 3D structures at high resolution (2.5 Å or better). We labeled the interfaces as natural or engineered, depending on whether they correspond to wild-type or lab-modified sequences. This dataset was built with the aim of thoroughly characterizing molecular recognition in this special class of complexes. Features under analysis include the interface composition and extent, H-bonds and salt bridges, as well as specific atomic contacts such as pi-pi, lone pair/cation/anion/polar/sulfur-pi and water/metal-mediated interactions, detected by our in-house scripts. Peculiarities highlighted by such analyses may then be exploited to improve the prediction of unknown Nb-protein antigen complexes and the scoring of predicted 3D models.
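A basic step in characterizing such interfaces is deciding which residues are in contact across the two chains. The sketch below uses a simple heavy-atom distance criterion; the 5 Å cutoff and the dictionary layout are assumptions for illustration, not the authors' in-house scripts, and a spatial grid would replace the brute-force loop for large complexes.

```python
import math


def interface_residues(chain_a, chain_b, cutoff=5.0):
    """Residues of each chain having any atom within `cutoff` angstroms of the other chain.

    Each chain is a dict mapping a residue ID to a list of (x, y, z) atom coordinates.
    """
    iface_a, iface_b = set(), set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                iface_a.add(res_a)
                iface_b.add(res_b)
    return iface_a, iface_b
```

Interface size and composition then follow directly from the returned sets; typed contacts (H-bonds, salt bridges, pi interactions) would need additional atom-type and geometry rules on top of this.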

A-040: AlphaConformers: Predicting Multiple Conformations with AlphaFold2
Track: 3D-SIG
  • Diego Javier Zea, Université Paris-Saclay, France



Understanding the diverse conformations that a protein can adopt is crucial for exploring its biological functions. While AlphaFold2 has revolutionized protein structure prediction, its primary limitation is its tendency to predict one predominant conformation, usually the apo structure, which leads to an incomplete depiction of the protein's functional spectrum. Here, we introduce AlphaConformers, an approach inspired by the ConTemplate hypothesis that proteins sharing one conformation are likely to share others. Our method uses Foldseek to identify similar structures in the Protein Data Bank and then extends the search to the other conformations of those proteins. The identified structures are clustered by structural similarity and used as templates in AlphaFold2, with each Multiple Sequence Alignment (MSA) using sequence information specific to its cluster. This increases the chance of capturing the desired conformation. Preliminary tests show that AlphaConformers can predict a more diverse range of structures than the standard AlphaFold2 pipeline. This integration of Foldseek and AlphaFold2 holds significant potential for a more comprehensive understanding of protein function by capturing and modeling different protein conformations. Further optimization and extensive validation of AlphaConformers, using pairs of proteins with known apo and holo forms, are ongoing.
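The clustering step can be illustrated with a simple leader-based scheme over a pairwise similarity matrix (for example, TM-scores between the retrieved structures). The abstract does not specify the clustering algorithm or threshold, so both the method and the 0.8 cutoff below are assumptions; each resulting cluster would then supply the templates for one AlphaFold2 run.

```python
def leader_cluster(names, similarity, threshold=0.8):
    """Assign each structure to the first cluster whose representative is at least `threshold` similar."""
    clusters = []  # list of (representative index, member indices)
    for i in range(len(names)):
        for rep, members in clusters:
            if similarity[rep][i] >= threshold:
                members.append(i)
                break
        else:  # no existing representative is similar enough: start a new cluster
            clusters.append((i, [i]))
    return [[names[m] for m in members] for _, members in clusters]
```

Leader clustering is order-dependent; sorting structures by quality or coverage before clustering is one way to make the chosen representatives more meaningful.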

A-041: A comprehensive toolset for accuracy evaluation of computational structure predictions of macromolecular complexes
Track: 3D-SIG
  • Gabriel Studer, SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Switzerland
  • Andrew M. Waterhouse, SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Switzerland
  • Xavier Robin, SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Switzerland
  • Gerardo Tauriello, SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Switzerland
  • Torsten Schwede, SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Switzerland



We present a new toolset for evaluating the accuracy of computational structure predictions, with a focus on macromolecular complexes and critical regions within them, such as interfaces between molecules. Our OpenStructure software framework includes new capabilities that handle large complexes containing proteins, polynucleotides and small-molecule ligands, and compute scores such as the local Distance Difference Test (lDDT), which measures and aggregates the local accuracy of each atom's neighborhood. We also integrated additional scores: QS-score and DockQ, which focus on macromolecular interactions; Oligo-GDTTS for the global similarity of superposed complexes; lDDT-PLI for protein-ligand interactions; and BiSyRMSD to evaluate ligand poses. We have integrated the scores into SWISS-MODEL's Structure Assessment page, allowing easy exploration of the scores, and added command-line interfaces in OpenStructure to simplify programmatic use. Our work has been applied in the CASP15 assessment, is being included in CAMEO, and can assist method developers in computational structure prediction by providing a comprehensive and accessible resource to compare predictions with experimentally determined reference structures.
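For readers unfamiliar with lDDT, the sketch below implements a simplified, C-alpha-only global version from the published definition: reference pairs closer than a 15 Å inclusion radius are checked, and each pair scores the fraction of the tolerance thresholds 0.5, 1, 2 and 4 Å within which its distance is preserved in the model. It omits all-atom handling and stereochemistry checks and is not the OpenStructure implementation, whose API is not shown here.

```python
import math

THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # distance-difference tolerances in angstroms


def lddt_ca(ref, model, inclusion_radius=15.0):
    """Simplified global C-alpha lDDT (superposition-free, no stereochemistry checks)."""
    n = len(ref)
    preserved = 0.0
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= inclusion_radius:
                continue  # only local neighbourhoods in the reference are scored
            diff = abs(math.dist(model[i], model[j]) - d_ref)
            preserved += sum(diff < t for t in THRESHOLDS) / len(THRESHOLDS)
            total += 1
    return preserved / total if total else 0.0
```

Because only internal distances are compared, the score needs no superposition, which is what makes lDDT convenient for multi-domain proteins and complexes. A model identical to its reference scores 1.0.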

A-042: A Structure-Focused Analysis of Antigens Presented by the Class 1 Major Histocompatibility Complex
Track: 3D-SIG
  • Nele Quast, University of Oxford, United Kingdom
  • Matthew Raybould, University of Oxford, United Kingdom
  • Charlotte Deane, University of Oxford, United Kingdom



T cells perform their cytotoxic role in the immune system through T-cell receptors (TCRs) that interact with antigens presented on cell surfaces by the Class 1 Major Histocompatibility Complex (MHC). The biophysical and structural properties of presented antigens are important determinants of recognition by a given TCR. A better understanding of the conformations and properties of presented antigens is therefore of high interest for improving TCR specificity modelling and prediction.

We present the first holistic structural analysis of crystallographically resolved peptide-MHC class 1 structures. We retrieve peptide-MHC structures from www.histo.fyi, a database compiled from the Protein Data Bank, and interrogate them using a novel Python package (github.com/npqst/Structural-Protein-Clustering).

Key recognition features we analyse include peptide backbone coordinates and angles, coarse-grained side-chain coordinates, and the surface accessibility of the peptide amino acids. Furthermore, we benchmark the feature distributions of AlphaFold2 predictions against a subset of solved crystal structures. Through this holistic analysis, we identify structural trends in the presentation of peptides across different MHC alleles.

Our analysis provides insight into the conformational diversity of peptides presented by the MHC and is of immediate interest to the TCR specificity prediction community. Our Python codebase can be extended for further structural analyses.
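Backbone angles such as phi and psi are dihedral angles over four consecutive backbone atoms. The sketch below computes a signed dihedral with the widely used atan2 formulation; it is a generic illustration and not code from the authors' package. Under the standard definitions, phi_i uses (C_{i-1}, N_i, CA_i, C_i) and psi_i uses (N_i, CA_i, C_i, N_{i+1}).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, in [-180, 180]) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Using atan2 rather than an arccosine keeps the sign of the angle and stays numerically stable near 0 and 180 degrees.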

A-043: Efficient compression of protein structures
Track: 3D-SIG
  • Adam Gudyś, Silesian University of Technology, Poland
  • Sebastian Deorowicz, Silesian University of Technology, Poland



The introduction of DeepMind's AlphaFold2 enabled prediction of protein structures at unprecedented scale. The AlphaFold Protein Structure Database and the ESM Metagenomic Atlas contain hundreds of millions of structures stored in CIF and/or PDB formats. When compressed with a general-purpose compressor such as gzip, this translates to tens of terabytes of data, which hinders the effective use of predicted structures in large-scale analyses. Here, we present a compressor dedicated to CIF/PDB files. Its main contribution is a novel, clustering-based approach that predicts atom coordinates from previously processed atoms. This allows efficient encoding of the coordinates, which are the largest component of protein structure files. By default, compression is lossless, although a lossy mode with a controlled maximum coordinate reconstruction error is also available. Compared to competing packages, namely BinaryCIF and Foldcomp, our approach offers a superior compression ratio at a given reconstruction accuracy. By using threads efficiently in both the compression and decompression stages, the algorithm takes advantage of the multicore architecture of current CPUs. C++ and Python APIs further increase the usability of the method.
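The general principle of predict-then-encode coordinate compression, with an optional bound on the reconstruction error, can be sketched as below. This deliberately swaps the authors' clustering-based predictor for the simplest possible one (the previous atom's position) and omits the entropy-coding stage; the function names and parameters are hypothetical. Quantizing with bin width 2 × `max_error` guarantees each coordinate is reconstructed within `max_error`; the default 0.0005 Å corresponds to the three-decimal precision of PDB files, so it is effectively lossless for such input.

```python
def encode(coords, max_error=0.0005):
    """Quantize to bins of width 2*max_error, then store integer residuals from a previous-atom predictor."""
    step = 2 * max_error
    quantized = [tuple(round(c / step) for c in atom) for atom in coords]
    residuals = []
    prev = (0, 0, 0)
    for atom in quantized:
        residuals.append(tuple(a - p for a, p in zip(atom, prev)))  # small ints for an entropy coder
        prev = atom
    return step, residuals


def decode(step, residuals):
    """Invert `encode`: accumulate residuals and rescale back to angstroms."""
    coords = []
    prev = (0, 0, 0)
    for r in residuals:
        prev = tuple(p + d for p, d in zip(prev, r))
        coords.append(tuple(v * step for v in prev))
    return coords
```

A better predictor yields smaller residuals, which is where a dedicated method gains over this naive delta scheme.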

A-044: Fast and accurate protein structure search with Foldseek
Track: 3D-SIG
  • Stephanie Kim, Seoul National University, South Korea
  • Michel van Kempen, Max Planck Institute, Germany
  • Charlotte Tumescheit, Seoul National University, South Korea
  • Milot Mirdita, Seoul National University, South Korea
  • Cameron Gilchrist, Seoul National University, South Korea
  • Johannes Soding, Max Planck Institute, Germany
  • Martin Steinegger, Seoul National University, South Korea



As structure prediction efforts such as AlphaFold2 and the ESM Atlas generate millions of publicly available protein structures, searching these databases is becoming a bottleneck. To address this issue, we developed Foldseek, a fast and sensitive protein structure alignment method designed for comparing large sets of structures. Foldseek aligns the structure of a query protein against a database by describing tertiary amino acid interactions within proteins as sequences over a structural alphabet. Foldseek decreases computation times by four to five orders of magnitude while achieving 86%, 88% and 133% of the sensitivities of Dali, TM-align and CE, respectively. The open-source Foldseek software is available at foldseek.com, with a web server at search.foldseek.com. Our work on Foldseek was recently published in Nature Biotechnology (https://doi.org/10.1038/s41587-023-01773-0).
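Once structures are encoded as strings over a structural alphabet, they can be compared with fast sequence-alignment machinery. The sketch below shows a plain Smith-Waterman local alignment score on two such strings; it is only an illustration of that idea, using toy match/mismatch scores and a linear gap penalty. Foldseek's actual scoring (a dedicated 3Di substitution matrix combined with amino-acid scores, affine gaps and prefiltering) is not reproduced here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two strings, linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Keeping only two rows of the dynamic-programming matrix makes memory use linear in the target length, which matters when scanning very large databases.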

A-045: Combining protein language models and alignment approaches to detect structural similarities in the twilight-zone
Track: 3D-SIG
  • Lorenzo Pantolini, Biozentrum, University of Basel; SIB: Swiss Institute of Bioinformatics, Switzerland
  • Gabriel Studer, Biozentrum, University of Basel; SIB: Swiss Institute of Bioinformatics, Switzerland
  • Joana Pereira, Biozentrum, University of Basel; SIB: Swiss Institute of Bioinformatics, Switzerland
  • Janani Durairaj, Biozentrum, University of Basel; SIB: Swiss Institute of Bioinformatics, Switzerland
  • Torsten Schwede, Biozentrum, University of Basel; SIB: Swiss Institute of Bioinformatics, Switzerland
  • Gerardo Tauriello, Biozentrum, University of Basel; SIB: Swiss Institute of Bioinformatics, Switzerland



Unveiling evolutionary relationships between proteins is crucial for protein sequence annotation and structure prediction. While recognizing relationships among proteins with similar sequences is relatively straightforward, it becomes challenging for protein pairs in the twilight-zone, where sequence similarity is very low. Recently, language models commonly employed for text classification and generative tasks have been harnessed in bioinformatics, opening the door to powerful new tools. Protein language models (pLMs) create high-dimensional embeddings on a per-residue basis, capturing the "semantic meaning" of each amino acid in the context of the entire protein sequence. We developed a new protein alignment method that generates embedding-based sequence alignments (EBA), successfully capturing structural similarities even within the twilight-zone. By leveraging distances in the embedding space, we measure the similarity between two residues while also taking their context within the protein sequence into account. Consequently, we are able to build score matrices with richer information than traditional BLOSUM matrices. Our method outperforms both classical methods and other pLM-based approaches for the same purpose. Notably, our approach achieves excellent accuracy even though it requires no training or parameter optimization. We expect that combining pLMs with alignment methods will soon rise in popularity, helping to detect relationships between proteins in the twilight-zone.
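The key difference from BLOSUM-based alignment is that every pair of positions gets its own score computed from contextual embeddings. A minimal sketch, assuming cosine similarity as the residue-residue score (EBA's exact distance transform is not described here) and a Needleman-Wunsch global alignment with a hypothetical linear gap penalty:

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def score_matrix(emb_a, emb_b):
    """Position-specific score matrix from per-residue embeddings (replaces a BLOSUM lookup)."""
    return [[cosine(u, v) for v in emb_b] for u in emb_a]


def global_alignment_score(scores, gap=-0.5):
    """Needleman-Wunsch score driven by a position-specific score matrix."""
    n, m = len(scores), len(scores[0])
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + scores[i - 1][j - 1],
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    return dp[n][m]
```

Because the same amino acid gets different embeddings in different sequence contexts, two identical residue types can score very differently, which is what lets such alignments pick up signal that a fixed substitution matrix misses.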

A-046: FiTMuSiC: Leveraging structural and (co)evolutionary data for protein fitness prediction
Track: 3D-SIG
  • Matsvei Tsishyn, Université libre de Bruxelles, Belgium
  • Gabriel Cia, Université libre de Bruxelles, Belgium
  • Pauline Hermans, Université libre de Bruxelles, Belgium
  • Jean Marc Kwasigroch, Université libre de Bruxelles, Belgium
  • Marianne Rooman, Université libre de Bruxelles, Belgium
  • Fabrizio Pucci, Université libre de Bruxelles, Belgium



Accurately predicting how mutations impact the fitness of a protein is of major interest for the interpretation of genetic variants and thus for the understanding of genetic diseases. Here we introduce FiTMuSiC, a simple model that combines structural, evolutionary and coevolutionary features to predict the fitness of protein variants. Although our model has only a few parameters, it outperforms deep learning models when benchmarked on a deep mutational scanning dataset. Moreover, in contrast to deep learning models, our simple approach allows easy biological interpretation of the phenomena underlying the predicted fitness value and can thus deepen our understanding of pathogenic variants. We showcase the application of FiTMuSiC on hydroxymethylbilane synthase, one of the targets in the latest round of the Critical Assessment of Genome Interpretation (CAGI), in which our method was among the best performers. FiTMuSiC is freely available for academic use at babylone.3bio.ulb.ac.be/FiTMuSiC.
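To illustrate what "a few parameters" and benchmarking against deep mutational scanning data can look like, the sketch below scores variants with a weighted sum of features and evaluates predictions with Spearman rank correlation, a metric commonly used for such benchmarks. The abstract does not give FiTMuSiC's functional form or metric, so the linear form, feature names and weights here are hypothetical.

```python
import math


def fitness_score(features, weights, bias=0.0):
    """Hypothetical linear model: weighted sum of named features."""
    return bias + sum(weights[name] * value for name, value in features.items())


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

With so few weights, each coefficient can be read directly as the contribution of one biophysical or evolutionary factor, which is the interpretability advantage the abstract emphasises.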

A-047: RING-PyMOL: residue interaction networks of structural ensembles and molecular dynamics
Track: 3D-SIG
  • Alessio Del Conte, University of Padua, Italy
  • Alexander Miguel Monzon, Department of Information Engineering, University of Padova, Italy
  • Damiano Clementel, University of Padua, Italy
  • Giorgia F. Camagni, University of Padua, Italy
  • Giovanni Minervini, Department of Biomedical Sciences, University of Padua, Italy
  • Silvio Tosatto, Department of Biomedical Sciences, University of Padova, Italy
  • Damiano Piovesan, Department of Biomedical Sciences, University of Padova, Italy



RING-PyMOL is a plugin for PyMOL providing a set of analysis tools for structural ensembles and molecular dynamics simulations. RING-PyMOL combines residue interaction networks, as provided by the RING software, with structural clustering to enhance the analysis and visualization of conformational complexity. It combines precise calculation of non-covalent interactions with the power of PyMOL to manipulate and visualize protein structures. The plugin identifies and highlights correlating contacts and interaction patterns that can explain structural allostery, active sites, and structural heterogeneity connected with molecular function. It is easy to use and extremely fast, processing and rendering hundreds of models and long trajectories in seconds. RING-PyMOL generates a number of interactive plots and output files for use with external tools. The underlying RING software has also been improved extensively: it is 10 times faster, can process mmCIF files, and identifies typed interactions for nucleic acids as well.
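A basic ensemble-level quantity behind this kind of analysis is how often each residue pair is in contact across models or trajectory frames. The sketch below computes contact frequencies using a plain C-alpha distance cutoff as a stand-in for RING's typed, atom-level interaction detection, and flags pairs that are neither stable nor rare. The cutoff, frequency bounds and function names are assumptions, and RING-PyMOL's correlation analysis is not reproduced.

```python
import math
from itertools import combinations


def contact_frequencies(ensemble, cutoff=8.0):
    """Fraction of models in which each residue pair (i, j) has C-alpha distance <= cutoff."""
    counts = {}
    for model in ensemble:
        for i, j in combinations(range(len(model)), 2):
            if math.dist(model[i], model[j]) <= cutoff:
                counts[(i, j)] = counts.get((i, j), 0) + 1
    return {pair: c / len(ensemble) for pair, c in counts.items()}


def variable_contacts(freqs, low=0.2, high=0.8):
    """Pairs present in some but not most models: candidates for conformational switching."""
    return sorted(pair for pair, f in freqs.items() if low <= f <= high)
```

Correlating the on/off patterns of such variable contacts across models is one way to find groups of interactions that switch together, as the abstract describes for allostery.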

A-048: Systematic Sequence-Based Antibody Optimization Analysis (SEQuOiA) of Antibody Libraries Reveals Global Trends for Stability and Humanisation
Track: 3D-SIG
  • Emmanuel Lorenzo de Los Santos, Research Analytics. Data and Translational Sciences, Early Solutions, UCB BioPharma, United Kingdom
  • Kelly Phalen, Research Analytics. Data and Translational Sciences, Early Solutions, UCB BioPharma, United States
  • Matthew Pharris, Research Analytics. Data and Translational Sciences, Early Solutions, UCB BioPharma, United States
  • James Snowden, Research Analytics. Data and Translational Sciences, Early Solutions, UCB BioPharma, United Kingdom



An ensemble of large protein language models developed by Meta (ESM) and trained on representative protein sequences was used to suggest single and double mutations to improve the affinity of antibodies towards a diverse set of targets. Sequences containing these mutations were selected and expressed by Antibody Discovery scientists at our UK Research Site, resulting in the identification of mutants with improved binding affinities. We termed this workflow SEQuOiA (SEQuence-based Optimization Analysis). Given the success of SEQuOiA in improving the binding of antibodies towards different targets, we performed a systematic analysis of the single mutations suggested by SEQuOiA on our historical library of antibodies to discover general trends that could aid antibody lead optimization. This analysis revealed that mutations suggested by SEQuOiA were preferentially located at structurally significant positions. Global trends were also observed for mutations between amino acid classes. Finally, several suggested mutations were consistent with successful humanization campaigns. The insights provided by the global analysis of these trends inform our further use of protein language models to optimize antibody binding.
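One simple way an ensemble of language models can propose single mutations is to flag substitutions that a model assigns a higher probability than the wild-type residue, and keep those supported by enough models. The sketch below assumes that rule and a hypothetical `min_votes` threshold; how the per-position probabilities are obtained from ESM is not shown, and SEQuOiA's exact selection criteria are not described in the abstract.

```python
def suggest_mutations(wild_type, model_probs, min_votes=1):
    """Single substitutions preferred over wild type by at least `min_votes` models.

    model_probs: one entry per model; each entry is a list (one item per position)
    of dicts mapping amino acids to predicted probabilities.
    """
    votes = {}
    for probs in model_probs:
        for pos, wt in enumerate(wild_type):
            p_wt = probs[pos].get(wt, 0.0)
            for aa, p in probs[pos].items():
                if aa != wt and p > p_wt:
                    key = f"{wt}{pos + 1}{aa}"  # e.g. "A1S": Ala at position 1 to Ser
                    votes[key] = votes.get(key, 0) + 1
    return sorted(m for m, v in votes.items() if v >= min_votes)
```

Raising `min_votes` trades the number of candidates for agreement across models, which is one way to keep the experimental shortlist small.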

A-049: A PDB-wide assignment of apo & holo relationships based on individual protein-ligand interactions
Track: 3D-SIG
  • Christos Feidakis, Department of Cell Biology, Faculty of Science, Charles University, Czechia
  • Radoslav Krivák, Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Czechia
  • David Hoksza, Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Czechia
  • Marian Novotný, Department of Cell Biology, Faculty of Science, Charles University, Czechia



From studying protein dynamics to unveiling cryptic binding sites or assessing the effectiveness of ligand binding site prediction software, access to multiple snapshots of a protein is needed. The availability of both bound (holo) and unbound (apo) forms of a protein is paramount for making meaningful comparisons and drawing robust conclusions. The few existing resources that provide access to such data are limited either in protein coverage or in the number of structure pairs provided, which does not always reflect the conformational variance represented by the structures deposited in the Protein Data Bank (PDB).
Here we use a previously designed application (AHoJ, Apo-Holo Juxtaposition) to perform an extensive search for apo-holo pairs for each individual protein-ligand interaction across the PDB (~500,000 small-molecule interactions, excluding interactions with peptides and nucleic acids). We assemble the results of this search into a database that can be used to train and evaluate predictors, discover potentially druggable proteins, and reveal associations that can confirm existing hypotheses or expose protein- and ligand-specific relationships, such as order-to-disorder transitions, that were previously obscured by intermittent or partial data.
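The core filtering logic of an apo-holo search can be sketched as follows: for a ligand site defined by residue numbers in a shared (for example UniProt-based) numbering, candidate apo chains are chains of the same protein that resolve every site residue but have no ligand touching any of them. The data layout, names and criteria below are simplifying assumptions; AHoJ's actual procedure (including structural superposition and ligand handling) is not reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class Chain:
    chain_id: str
    protein_id: str
    observed: set  # residue numbers (shared numbering) resolved in this chain
    ligand_contacts: set = field(default_factory=set)  # residues touching any ligand


def apo_candidates(site, protein_id, chains):
    """Chains of the same protein that resolve every site residue with no ligand bound there."""
    return [c.chain_id for c in chains
            if c.protein_id == protein_id
            and site <= c.observed              # whole site is modelled
            and not (site & c.ligand_contacts)]  # no ligand at the site
```

Requiring the whole site to be resolved avoids pairing a holo site with an apo chain in which the same region is disordered; recording such cases separately is what would expose order-to-disorder transitions.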

A-050: New insight into how evolution shaped protein thermodynamic stability
Track: 3D-SIG
  • Pauline Hermans, Université Libre de Bruxelles, Belgium
  • Matsvei Tsishyn, Université Libre de Bruxelles, Belgium
  • Martin Schwersensky, Université Libre de Bruxelles, Belgium
  • Fabrizio Pucci, Université Libre de Bruxelles, Belgium
  • Marianne Rooman, Université Libre de Bruxelles, Belgium



Predicting the change in protein thermodynamic stability upon mutation (∆∆G) is crucial for understanding the mechanisms of biomolecular evolution and the deleteriousness of genetic variants, and for protein design and optimization. Protein thermodynamic stability, being the main component of protein fitness, is under strong evolutionary selective pressure, and evolutionary information derived from families of homologous proteins can potentially improve ∆∆G prediction. However, it is not yet clear how to fully exploit evolutionary data to do this. In this work, we leverage a massive amount of thermodynamic stability data from deep mutagenesis experiments to investigate a series of evolution-based features derived from multiple sequence alignments (MSAs), such as the residue evolutionary rate and the co-variation between residues. Specifically, we analyze how their correlation with ∆∆G depends on the number of effective sequences in the MSA and on some of the parameters used to construct the alignment. We also study how these relations change between protein core and surface regions, as well as across different secondary structures. In summary, this investigation sheds light on the relation between evolution and protein stability and gives insights into how to optimally extract evolutionary information from MSAs to improve ∆∆G predictions.
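The "number of effective sequences" mentioned above is usually computed by down-weighting near-duplicate sequences. A minimal sketch, assuming the common convention of an 80% identity threshold and identity computed over all aligned columns (gaps included, a simplification); the authors' exact definition may differ:

```python
def identity(a, b):
    """Fraction of identical columns between two aligned sequences of equal length."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def effective_sequences(msa, threshold=0.8):
    """Neff: each sequence weighted by 1 / (number of sequences at >= threshold identity to it)."""
    neff = 0.0
    for s in msa:
        neighbours = sum(identity(s, t) >= threshold for t in msa)  # includes s itself
        neff += 1.0 / neighbours
    return neff
```

Two identical sequences therefore count as one effective sequence, so Neff reflects the diversity of the alignment rather than its raw depth. The pairwise loop is quadratic in the number of sequences, which is fine for a sketch but slow for very deep MSAs.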

A-051: Structural and energetic impact of modified nucleobases on base pairing in nucleic acid structures
Track: 3D-SIG
  • Mohit Chawla, KAUST Catalysis Center (KCC), King Abdullah University of Science and Technology, Saudi Arabia
  • Utkarsh Kalra, Department of Research and Innovation, STEMskills Research and Education Lab Private Limited, Haryana, India
  • Romina Oliva, Dept of Sciences and Technologies, Univ. Parthenope of Naples, Naples, Italy
  • Luigi Cavallo, KAUST Catalysis Center (KCC), King Abdullah University of Science and Technology, Saudi Arabia



Post-transcriptional modifications greatly enhance the chemical information of RNA molecules, helping to explain the diversity of their structures and functions. In fact, a significant fraction of experimental RNA structures contain modified nucleobases, roughly half of which are involved in H-bonding interactions with other bases, i.e. in “modified base pairs” (MBPs). After characterizing some MBPs in tRNA, in 2015 we compiled an atlas of all experimentally observed RNA MBPs, classifying them into 27 distinct types, for which we calculated optimal geometries and interaction energies by quantum mechanics (QM) calculations. We are now updating our atlas by adding 72 novel MBPs involving 24 modifications not included before. Our structural analyses show that most MBPs are non-Watson–Crick-like and are involved in tertiary structure motifs.
Furthermore, as non-natural (synthetic) modifications are increasingly introduced into RNA/DNA structures for targeted biotechnological applications, we have over time performed similar analyses on base pairs involving non-natural modifications of particular interest. The results of our combined bioinformatics and QM approach help provide a rationale for the impact of the different modifications on the geometry and stability of the base pairs they participate in, and help predict the effect of newly designed modifications.
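Base-pair interaction energies from QM calculations are commonly obtained with the supermolecular formula ΔE_int = E(pair) − E(base A) − E(base B), converted from hartree to kcal/mol. The sketch below shows only this arithmetic; whether the authors apply further corrections (for example counterpoise correction for basis set superposition error) is not stated in the abstract, and the energies in the test are made-up numbers.

```python
HARTREE_TO_KCAL = 627.5095  # kcal/mol per hartree


def interaction_energy(e_pair, e_base_a, e_base_b):
    """Supermolecular interaction energy in kcal/mol from total energies in hartree."""
    return (e_pair - e_base_a - e_base_b) * HARTREE_TO_KCAL
```

A negative value indicates that the paired bases are more stable together than apart; because total energies are large and the difference is small, all three energies must come from the same level of theory and basis set.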

A-052: Deep learning-based study of DNA-protein binding using drying droplet patterns
Track: 3D-SIG
  • Safoura Vaez, Karlsruhe Institute of Technology, Germany



Identifying DNA-protein binding is an active research topic because such binding plays a vital role in biological processes. In biochemistry, finding and developing simple, inexpensive and accurate methods to investigate DNA-protein interactions is still a major challenge.
This study develops a method based on the reproducible and characteristic patterns left behind by evaporated droplets. To achieve this goal, a large image database of mixtures of two different DNAs with histone is prepared using polarized light microscopy (PLM). Depending on the DNA-to-histone mixing ratio, different specific and non-specific reactions take place. A convolutional neural network is applied to classify the PLM images of the different DNA-histone mixtures. The results show that the stain patterns enable highly accurate classification of the mixtures, not only distinguishing the two DNAs but also capturing secondary structure changes caused by DNA-histone interactions for each DNA. Our findings suggest that information from dried droplets can be successfully used as input data for deep learning clustering algorithms to identify DNA-protein interactions. This scalable approach can be used as a visual indicator to recognize new candidates that are able to interact with DNA.