Schedule subject to change
All times listed are in EDT
Monday, July 15th
10:40-11:20
Invited Presentation: 50 Years of Protein Structures & Structural Bioinformatics
Confirmed Presenter: Janet M Thornton, EMBL-EBI, UK

Room: 520a
Format: In Person

Moderator(s): Rafael Najmanovich


Authors List:

  • Janet M Thornton, EMBL-EBI, UK
  • Roman A Laskowski, EMBL-EBI, UK
  • Sameer Velankar, EMBL-EBI, UK

Presentation Overview:

The last 50 years have seen a revolution in our understanding of proteins and how they work in 3D. This has been enabled by the development of many new technologies: protein production, crystallisation with robots, synchrotrons to collect very high resolution data, structure determination by NMR, and the more recent developments in cryo-electron microscopy and tomography. These experimental developments have been matched by the development of sophisticated computational tools and databases using powerful computers, to help in determining structures and also in curating, analysing, comparing and predicting them.

In this talk I will focus on our collective progress in understanding more about these molecules of life, from the handful of structures determined in 1974 to our current knowledge of the complex world of proteins. I will conclude by describing some of our own recent work on exploring enzyme catalysis.

I will highlight:
· Our current knowledge of the universe of protein structures
· The development of tools for annotating structures
· wwPDB & PDBe; EMDB & EMPIAR, AFDB
· Protein structure prediction & AI
· Computational Enzymology
· The impact & the future?

11:20-11:40
Democratizing Protein Language Models with Parameter-Efficient Fine-Tuning
Confirmed Presenter: Samuel Sledzieski, Massachusetts Institute of Technology, United States

Room: 520a
Format: In Person

Moderator(s): Rafael Najmanovich


Authors List:

  • Samuel Sledzieski, Massachusetts Institute of Technology, United States
  • Meghana Kshirsagar, AI for Good Research Lab, Microsoft Corporation, United States
  • Minkyung Baek, Seoul National University, South Korea
  • Rahul Dodhia, AI for Good Research Lab, Microsoft Corporation, United States
  • Juan Lavista Ferres, AI for Good Research Lab, Microsoft Corporation, United States
  • Bonnie Berger, Massachusetts Institute of Technology, United States

Presentation Overview:

Proteomics has been revolutionized by large protein language models (PLMs), which learn unsupervised representations from large corpora of sequences. These models are typically fine-tuned in a supervised setting to adapt the model to specific downstream tasks. However, the computational and memory footprint of fine-tuning large PLMs presents a barrier for many research groups with limited computational resources. Natural language processing has seen a similar explosion in the size of models, where these challenges have been addressed by methods for parameter-efficient fine-tuning (PEFT). In this work, we introduce this paradigm to proteomics by leveraging the parameter-efficient method LoRA and training new models for two important tasks: predicting protein-protein interactions (PPIs) and predicting the symmetry of homooligomer quaternary structures. We show that these approaches are competitive with traditional fine-tuning while requiring reduced memory and substantially fewer parameters. We additionally show that for the PPI prediction task, training only the classification head also remains competitive with full fine-tuning, using five orders of magnitude fewer parameters, and that each of these methods outperforms state-of-the-art PPI prediction methods with substantially reduced compute. We further perform a comprehensive evaluation of the hyperparameter space, demonstrate that PEFT of PLMs is robust to variations in these hyperparameters, and elucidate where best practices for PEFT in proteomics differ from those in natural language processing. All our model adaptation and evaluation code is available open-source at https://github.com/microsoft/peft_proteomics. Thus, we provide a blueprint to democratize the power of protein language model adaptation to groups with limited computational resources.
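To make the "substantially fewer parameters" claim concrete, here is a minimal sketch of how a LoRA update shrinks the trainable parameter count for a single weight matrix. The layer dimensions and rank are hypothetical illustration values, not figures from this work, and the function names are invented for this example.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA update W + B @ A, where W stays frozen,
    B has shape d_out x rank and A has shape rank x d_in."""
    return rank * (d_out + d_in)


# Hypothetical sizes chosen for illustration only (not taken from the abstract).
D_OUT, D_IN, RANK = 1280, 1280, 8

full = full_finetune_params(D_OUT, D_IN)
lora = lora_params(D_OUT, D_IN, RANK)
reduction = full / lora
print(f"full: {full:,}  LoRA: {lora:,}  reduction: {reduction:.0f}x")
```

Because the LoRA count grows with rank times (d_out + d_in) rather than d_out times d_in, the saving grows with layer width for a fixed rank.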

11:40-12:00
EquiPNAS: improved protein–nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks
Confirmed Presenter: Debswapna Bhattacharya, Virginia Tech, United States

Room: 520a
Format: In Person

Moderator(s): Rafael Najmanovich


Authors List:

  • Rahmatullah Roche, Virginia Tech, United States
  • Bernard Moussad, Virginia Tech, United States
  • Md Hossain Shuvo, Virginia Tech, United States
  • Sumit Tarafder, Virginia Tech, United States
  • Debswapna Bhattacharya, Virginia Tech, United States

Presentation Overview:

Protein language models (pLMs) trained on a large corpus of protein sequences have shown unprecedented scalability and broad generalizability in a wide range of predictive modeling tasks, but their power has not yet been harnessed for predicting protein–nucleic acid binding sites, critical for characterizing the interactions between proteins and nucleic acids. Here, we present EquiPNAS, a new pLM-informed E(3) equivariant deep graph neural network framework for improved protein–nucleic acid binding site prediction. By combining the strengths of pLM and symmetry-aware deep graph learning, EquiPNAS consistently outperforms the state-of-the-art methods for both protein–DNA and protein–RNA binding site prediction on multiple datasets across a diverse set of predictive modeling scenarios ranging from using experimental input to AlphaFold2 predictions. Our ablation study reveals that the pLM embeddings used in EquiPNAS are sufficiently powerful to dramatically reduce the dependence on the availability of evolutionary information without compromising on accuracy, and that the symmetry-aware nature of the E(3) equivariant graph-based neural architecture offers remarkable robustness and performance resilience. EquiPNAS is freely available at https://github.com/Bhattacharya-Lab/EquiPNAS.

12:00-12:20
Accurate High-throughput Cryptic Binding Site Prediction Using Protein Language Model
Confirmed Presenter: Shuo Zhang, The City University of New York, United States

Room: 520a
Format: Live Stream

Moderator(s): Rafael Najmanovich


Authors List:

  • Shuo Zhang, The City University of New York, United States
  • Lei Xie, The City University of New York, United States

Presentation Overview:

Identification of cryptic binding sites of proteins is an important but challenging task for understanding the function of proteins and screening potential drugs for proteins currently considered undruggable. Existing methods usually require 3D protein structures from resource-intensive molecular dynamics (MD) simulations or are too slow to be adopted in high-throughput compound screening. To tackle these limitations, we propose LaMPSite, which only takes protein sequences and ligand molecular graphs as input for cryptic binding site predictions. Without any 3D coordinate information of proteins, our proposed model is not only 100 to 1000 times faster than baseline methods that require 3D protein structures from time-consuming MD simulations or generative binding complex structures, but also more accurate. Given its efficiency and accuracy, LaMPSite is a promising tool for drug discovery.

14:20-14:40
Contrastive learning in protein language space predicts interactions between drugs and protein targets
Confirmed Presenter: Rohit Singh, Duke University, United States

Room: 520a
Format: In Person

Moderator(s): Douglas Pires


Authors List:

  • Rohit Singh, Duke University, United States
  • Samuel Sledzieski, Massachusetts Institute of Technology, United States
  • Bryan Bryson, Massachusetts Institute of Technology, United States
  • Lenore Cowen, Tufts University, United States
  • Bonnie Berger, Massachusetts Institute of Technology, United States

Presentation Overview:

Experimental screening of potential drug molecules against protein targets is a key bottleneck in the drug discovery pipeline. Fast and accurate computational prediction of drug-target interactions (DTIs) could significantly accelerate this process. However, current sequence-based DTI prediction methods struggle to achieve broad generalization and high specificity while remaining computationally efficient. We develop ConPLex, a deep learning model that successfully leverages the advances in pretrained protein language models ("PLex") and employs a protein-anchored contrastive coembedding ("Con") to outperform state-of-the-art approaches. ConPLex makes predictions of binding based on the distance between learned representations, achieving high accuracy, broad adaptivity to unseen data, and specificity against decoy compounds. Experimental validation yielded a 63% hit rate, including four hits with subnanomolar affinity and a novel strongly-binding EPHB1 inhibitor (KD = 1.3 nM). ConPLex is extremely fast, capable of making 100 million predictions per day on a single GPU, enabling predictions at the scale of massive compound libraries and the human proteome. The contrastive approach and the shared embedding space also provide interpretability, allowing visualization of drug-target relationships and functional characterization of cell-surface proteins. ConPLex has the potential to efficiently guide and prioritize candidates for experimental screening, unlocking significant value in the drug discovery process.

Availability: https://conplex.csail.mit.edu/

Source code: https://github.com/samsledje/ConPLex

Paper: Singh, Sledzieski, Bryson, Cowen, & Berger. PNAS, 120(24) (2023).
https://www.pnas.org/doi/full/10.1073/pnas.2220778120

14:40-15:00
NRGDock: An open-source software for ultra-massive high-throughput virtual screening
Confirmed Presenter: Thomas Descoteaux, Université de Montréal, Canada

Room: 520a
Format: In Person

Moderator(s): Douglas Pires


Authors List:

  • Thomas Descoteaux, Université de Montréal, Canada
  • Olivier Mailhot, Université de Montréal, Canada
  • Rafael Najmanovich, University of Montreal, Canada

Presentation Overview:

Here we present NRGDock, an easy-to-use docking software based on Python requiring less than 0.5 CPU second per molecule. With this speed, a modern laptop can dock 1 000 000 molecules in 24 hours. Its scoring function is based on that of FlexAID and an exhaustive search procedure. NRGDock has been benchmarked against the widely used DUD-E benchmarking dataset and obtained median enrichment factors similar to AutoDock Vina and Glide. Furthermore, NRGDock performs well on protein structures generated by AlphaFold, where residue positioning may not be modelled precisely. To validate the performance of NRGDock in high-throughput virtual screening, testing was conducted on 102 DUD-E targets against 48.3 million compounds from the Enamine Real Diversity Subset (ERDS) for a total of 4.9 billion docking simulations. A clear separation in scores was observed, with true binders getting significantly better scores than the ERDS molecules. Lastly, we screened the protein kinase PIM-1, which is associated with triple-negative breast cancer, and the related kinases PIM-2 and PIM-3 against the ERDS library. We show that dissimilar top-scoring compounds, unique to each related target, can be identified.
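The throughput figures above can be checked with a few lines of arithmetic. This is a rough sketch that treats 0.5 CPU second per molecule as an upper bound and assumes perfect scaling across cores, which the abstract does not state; the variable names are made up for this example.

```python
CPU_SECONDS_PER_MOLECULE = 0.5   # upper bound quoted in the abstract
MOLECULES = 1_000_000            # laptop screening scenario
TARGETS = 102                    # DUD-E targets
LIBRARY_SIZE = 48_300_000        # 48.3 million ERDS compounds

# Total CPU time for the laptop scenario, in hours.
cpu_hours = CPU_SECONDS_PER_MOLECULE * MOLECULES / 3600

# Cores needed to finish within 24 hours (assumes ideal parallel scaling).
cores_for_24h = cpu_hours / 24

# Size of the large-scale validation: every compound docked against every target.
total_dockings = TARGETS * LIBRARY_SIZE

print(f"CPU hours: {cpu_hours:.1f}")
print(f"Cores for 24 h: {cores_for_24h:.1f}")
print(f"Total dockings: {total_dockings:,}")
```

Under these assumptions the 24-hour figure corresponds to roughly six cores running in parallel, and the validation run comes to about 4.9 billion dockings, matching the total quoted above.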

15:00-15:20
Proceedings Presentation: Enhancing Generalizability and Performance in Drug-Target Interaction Identification by Integrating Pharmacophore and Pre-trained Models
Confirmed Presenter: Zuolong Zhang, Henan University, China

Room: 520a
Format: In Person

Moderator(s): Douglas Pires


Authors List:

  • Zuolong Zhang, Henan University, China
  • Gang Luo, Nanchang University, China
  • Shengbo Chen, Henan University, China
  • Xin He, Henan University, China
  • Dazhi Long, Ji'an Third People's Hospital, China

Presentation Overview:

In drug discovery, it is crucial to assess the drug-target binding affinity. Although molecular docking is widely used, computational efficiency limits its application in large-scale virtual screening. Deep learning-based methods learn virtual scoring functions from labeled datasets and can quickly predict affinity. However, there are three limitations. First, existing methods only consider the atom-bond graph or one-dimensional sequence representations of compounds, ignoring the information about functional groups (pharmacophores) with specific biological activities. Second, relying on limited labeled datasets fails to learn comprehensive embedding representations of compounds and proteins, resulting in poor generalization performance in complex scenarios. Third, existing feature fusion methods cannot adequately capture contextual interaction information. Therefore, we propose a novel drug-target binding affinity prediction method named HeteroDTA. Specifically, a multi-view compound feature extraction module is constructed to model the atom-bond graph and pharmacophore graph. The residue concat graph and protein sequence are also utilized to model protein structure and function. Moreover, to enhance the generalization capability and reduce the dependence on task-specific labeled data, pre-trained models are utilized to initialize the atomic features of the compounds and the embedding representations of the protein sequence. A context-aware nonlinear feature fusion method is also proposed to learn interaction patterns between compounds and proteins. Experimental results on public benchmark datasets show that HeteroDTA significantly outperforms existing methods. In addition, HeteroDTA shows excellent generalization performance in cold-start experiments and superiority in the representation learning ability of drug-target pairs. Finally, the effectiveness of HeteroDTA is demonstrated in a real-world drug discovery study.

15:20-15:40
DOCKGROUND: a new release of the long-standing resource for studying protein recognition
Confirmed Presenter: Petras Kundrotas, The University of Kansas, United States

Room: 520a
Format: In Person

Moderator(s): Douglas Pires


Authors List:

  • Petras Kundrotas, The University of Kansas, United States
  • Keeley Collins, The University of Kansas, United States
  • Matthew Copeland, The University of Kansas, United States
  • Ian Kothoff, The University of Kansas, United States
  • Amar Singh, The University of Kansas, United States
  • Marc Lensink, University of Lille, France
  • Ilay Vakser, The University of Kansas, United States

Presentation Overview:

Artificial intelligence (AI) has transformed the field of computational structural biology. Modeled structures of globular proteins are now accurate enough for computer-aided drug design. Structural prediction of protein-protein (PP) complexes (protein docking) has also been significantly advanced. Still, there is a constant need for re-training of the complex network models on newer data. Technical progress rapidly accelerates accumulation of data by various experimental techniques. Thus, static datasets quickly become obsolete. So far, significant efforts in generating reliable, up-to-date datasets have focused on individual macromolecules, while their assemblies have attracted less attention, mainly due to the complexity of the task. Here, we present a full revamp of our well-established DOCKGROUND resource for studying protein recognition (http://dockground.compbio.ku.edu). The resource contains comprehensive sets of data needed for the development and testing of protein docking techniques, including AI-based methods: bound and unbound (experimentally determined and simulated) structures of PP complexes, model-model complexes, docking decoys of experimentally determined and modeled proteins, and templates for comparative protein docking. The core dataset of bound PP structures, from which other sets are derived, is automatically updated on a weekly basis. We also implemented a new DOCKGROUND interactive interface that allows generating custom non-redundant datasets using various parameters and provides structure visualization. The DOCKGROUND resource also incorporates the docking model quality assessment tool CAPRI-Q, which utilizes CAPRI criteria and other quality metrics such as DockQ, TM-score and l-DDT.

15:40-16:00
On finding the right match – a structural perspective
Confirmed Presenter: Marian Novotny, Charles University, Faculty of Science, Czechia

Room: 520a
Format: In Person

Moderator(s): Douglas Pires


Authors List:

  • Christos Feidakis, Charles University, Faculty of Science, Czechia
  • Radoslav Krivak, Charles University, Faculty of Mathematics and Physics, Czechia
  • Vit Skrhak, Charles University, Faculty of Mathematics and Physics, Czechia
  • David Hoksza, Charles University, Faculty of Mathematics and Physics, Czechia
  • Marian Novotny, Charles University, Faculty of Science, Czechia

Presentation Overview:

Proteins can assume a number of 3D structural conformations during their lifetime and many of them can undergo a substantial conformational change that might be crucial for their function, e.g., during ligand binding.
Many machine learning methods that utilise 3D structural information are trained on just a single structure of the protein. The single structure, however, does not have to represent the protein fully and it can even be misleading. For example, a ligand-binding site prediction method may be trained on a conformation that is already binding a ligand (holo structure), while the prediction makes more sense for a conformation without a bound ligand (apo structure).
To help avoid potential biases in building datasets, we have developed a tool called AHoJ (www.apoholo.cz) to identify apo-holo structure pairs for user-defined binding sites and post-translational modifications. We have also developed AHoJ-DB (www.apoholo.cz/db), a database of apo-holo structure pairs for biologically relevant ligands as defined in the BioLiP2 database. Both services have easy-to-use interfaces and provide metrics of the similarity of binding sites between apo and holo structures, which can be used for further downstream analysis or development of derived datasets. An analysis of AHoJ-DB shows that apo structures are not available for more than 50% of the experimentally described binding sites. We used AHoJ-DB to build CryptoBench, a dataset of cryptic binding sites, which consists of 1437 apo structures and is the most extensive collection of its kind to date.

16:40-17:00
Explaining Conformational Diversity in Protein Families through Molecular Motion
Confirmed Presenter: Valentin Lombard, Sorbonne University, France

Room: 520a
Format: In Person

Moderator(s): Alexander Monzon


Authors List:

  • Valentin Lombard, Sorbonne University, France
  • Sergei Grudinin, Université Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
  • Elodie Laine, Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France

Presentation Overview:

Proteins play a central role in biological processes, and understanding their conformational variability is crucial for unraveling their functional mechanisms. Recent advancements in high-throughput technologies have enhanced our knowledge of protein structures, yet predicting their multiple conformational states and motions remains challenging. This study introduces Dimensionality Analysis for protein Conformational Exploration (DANCE) for a systematic and comprehensive description of the conformational variability of protein families. DANCE accommodates both experimental and predicted structures. It is suitable for analyzing anything from single proteins to superfamilies. Employing it, we clustered all experimentally resolved protein structures available in the Protein Data Bank into conformational collections and characterized them as sets of linear motions. The resource facilitates access and exploitation of the multiple states adopted by a protein and its homologs. Beyond descriptive analysis, we assessed classical dimensionality reduction techniques for sampling unseen states on a representative benchmark. This work improves our understanding of how proteins deform to perform their functions and opens ways to a standardized evaluation of methods designed to sample and generate protein conformations.
In brief, the main contributions of our work are the following:
1. A pipeline was constructed for systematic analysis of protein conformational variability.
2. Datasets of protein ensembles and extracted linear motions have been made publicly accessible.
3. The ability of classical manifold learning methods, including PCA and kPCA, to capture the diversity of protein conformational states was evaluated.

Pathway of transition for HIV-1 envelope trimer from prefusion-closed to CD4-bound open through an occluded-intermediate state
Confirmed Presenter: Myungjin Lee, National Institutes of Health, United States

Room: 520a
Format: In Person

Moderator(s): Alexander Monzon


Authors List:

  • Myungjin Lee, National Institutes of Health, United States
  • Maolin Lu, University of Texas at Tyler Health Science Center, United States
  • Baoshan Zhang, National Institutes of Health, United States
  • Tongqing Zhou, National Institutes of Health, United States
  • Revansiddha Katte, University of Texas at Tyler Health Science Center, United States
  • Yang Han, University of Texas at Tyler Health Science Center, United States
  • Reda Rawi, National Institutes of Health, United States
  • Peter D. Kwong, Columbia University Vagelos College of Physicians and Surgeons, United States

Presentation Overview:

HIV entry into host cells is initiated by the engagement of the gp120 subunit of the HIV-1 envelope (Env) trimer with the cellular receptor CD4. This interaction induces substantial structural changes in the HIV-1 Env trimer. Although static structural information exists for both the prefusion-closed and the CD4-bound prefusion-open trimer, the complete transition pathway between these static states (such as transition structures) remains uncharacterized. In this study, we investigated the transition of a fully and site-specifically glycosylated HIV-1 Env trimer between prefusion-closed and CD4-bound open conformations using a special molecular dynamics simulation technique – collective MD simulation (coMD). Here, we identified a transition intermediate – the occluded intermediate state. Previously reported antibodies Ab1303, Ab1573, b12, and DH851.3 recognized this intermediate. Additionally, we validated the result experimentally by single-molecule Förster resonance energy transfer analysis, confirming that each of these four antibodies induces and stabilizes this distinct intermediate state of Env on the virus, replacing the CD4-bound open state. Overall, our findings using coMD simulation delineate a transition pathway between prefusion-closed and CD4-bound open conformations, unveiling the occluded intermediate as a prevalent intermediate state.

17:00-17:20
Analysis and prediction of RuBisCO kinetics using deep learning
Confirmed Presenter: Aleksey Porollo, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA, United States

Room: 520a
Format: In Person

Moderator(s): Alexander Monzon


Authors List:

  • Om Jadhav, College of Engineering and Applied Sciences, University of Cincinnati, Cincinnati, OH, USA, United States
  • Tatyana Belenkaya, College of Medicine, University of Cincinnati, Cincinnati, OH, USA, United States
  • Marat Khodoun, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA, United States
  • Aleksey Porollo, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA, United States

Presentation Overview:

This study focuses on enhancing the efficiency of the Calvin cycle by targeting the kinetic parameters of its key enzyme, Ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO). RuBisCO's slow catalytic rate (Kcat) and its specificity for CO₂ over O₂ (Sc/o) substantially limit photosynthetic efficiency, particularly under high CO₂ levels and light intensities. To address this, we analyzed 175 RuBisCO complexes with experimentally measured kinetic parameters using the protein language model ProtT5 for sequence embeddings. These embeddings were then processed through various machine learning models - Ridge regression, LASSO regression, SVM, and Random Forest regression - to predict Kcat and Sc/o. The Ridge regression models performed best under leave-one-out cross-validation, achieving a Pearson correlation coefficient of 0.611 and R² of 0.359 for Kcat, and a Pearson correlation coefficient of 0.814 and R² of 0.663 for Sc/o. Further, we applied these models to predict kinetic parameters for 56,379 non-annotated RuBisCO sequences. Top-performing sequences from both experimentally annotated and predicted datasets underwent in silico mutagenesis using a genetic algorithm. This mutagenesis targeted either any sequence position or specifically those lining the active site cavity, excluding the catalytic sites. Conducted over 10 iterations in 5 independent runs with 5000 mutants each, this approach yielded a maximum predicted Kcat of 12 s⁻¹ and 10 s⁻¹ from full sequence and cavity-targeted mutagenesis, respectively, a 2-fold improvement over natural enzymes. Our results highlight the potential of using computational tools and genetic algorithms for the rational design of RuBisCO, aiming to improve photosynthetic efficiency and agricultural productivity while contributing to climate change mitigation and renewable energy development.
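The evaluation protocol described above (ridge regression scored by leave-one-out cross-validation with both Pearson correlation and R²) can be sketched as follows. This is an illustration only: it uses a single made-up feature in place of ProtT5 embeddings, toy data, an arbitrary penalty, and the coefficient-of-determination definition of R²; none of these come from the study.

```python
from statistics import mean


def ridge_fit_1d(xs, ys, lam):
    """Closed-form ridge fit for one feature; the intercept is not penalized."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return slope, my - slope * mx


def loo_predictions(xs, ys, lam):
    """Predict each point from a model trained on all the other points."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = ridge_fit_1d(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        preds.append(slope * xs[i] + intercept)
    return preds


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy data (hypothetical): one feature and one target value per "enzyme".
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.3, 2.8, 4.2, 4.9]

preds = loo_predictions(xs, ys, lam=1.0)
r = pearson_r(ys, preds)
r2 = r_squared(ys, preds)
print(f"Pearson r = {r:.3f}, R^2 = {r2:.3f}")
```

With this definition, R² can never exceed the squared Pearson correlation, because R² also penalizes bias and scale errors in the predictions. The reported pairs are consistent with that within rounding (0.611² ≈ 0.373 against 0.359, and 0.814² ≈ 0.663 against 0.663).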

17:20-17:40
Understanding and predicting ligand efficacy in the mu-opioid receptor through quantitative dynamical analysis of complex structures
Confirmed Presenter: Gabriel Galdino, University of Montreal, Canada

Room: 520a
Format: In Person

Moderator(s): Alexander Monzon


Authors List:

  • Gabriel Galdino, University of Montreal, Canada
  • Olivier Mailhot, University of California, United States
  • Rafael Najmanovich, University of Montreal, Canada

Presentation Overview:

GPCRs are a family of membrane proteins that regulate many biological processes and are attractive targets for drug development, representing approximately one third of globally marketed drugs. We docked a set of ligands with known Emax for GTP-gammaS binding to crystal structures of the active Mu (MOR) and Kappa (KOR) Opioid Receptors. Using a coarse-grained approach, we applied normal mode analysis to calculate Dynamical Signatures of different ligand/GPCR complexes, identifying local and global changes in flexibility of different residues upon ligand binding. We used LASSO multiple linear regression to determine crucial residues in contact with the set of ligands and to obtain predictors of the efficacy of new drug candidates as agonists, antagonists, or partial agonists.
We obtained a ROC AUC > 0.85 when analysing the performance of the model as a binary classifier. By analyzing the coefficients of these predictors, we identified positions of high importance to receptor activation, such as L85 (Ballesteros-Weinstein position 1.47), which has mutations reported to affect morphine response in MOR, and positions with no known mutations reported, such as K305 (6.58) for MOR. Our study provides insights into the dynamics and structural features of ligand binding to GPCRs and represents a new tool for predicting the efficacy of new drug candidates that can be coupled to high-throughput screening.
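For readers unfamiliar with the metric, the following minimal sketch shows how a ROC AUC can be computed when a regression output is assessed as a binary classifier. The labels and scores are made-up toy values, and the step of turning efficacy values into binary labels is an assumption for illustration, not the authors' procedure.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive is scored
    above a randomly chosen negative; ties count as half a win."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical): 1 = agonist-like, 0 = not; scores = predicted efficacy.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

auc = roc_auc(labels, scores)
print(f"ROC AUC = {auc:.3f}")
```

In this toy case eight of the nine positive/negative pairs are ranked correctly, giving an AUC of 8/9; a value of 0.5 would correspond to random ranking.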

17:40-18:00
Dynamic network analysis of protein structural change
Confirmed Presenter: Aydin Wells, University of Notre Dame, United States

Room: 520a
Format: In Person

Moderator(s): Alexander Monzon


Authors List:

  • Aydin Wells, University of Notre Dame, United States
  • Siyu Yang, University of Notre Dame, United States
  • Khalique Newaz, University of Hamburg, Germany
  • Tijana Milenkovic, University of Notre Dame, United States

Presentation Overview:

A protein’s sequence folds into a 3D structure, which directs what other proteins it may interact with to carry out cellular function. Hence, analyses of protein structures are critical for understanding protein functions. Because functions of many proteins remain unknown, computational approaches for linking proteins’ structures to functions are necessary.

Our lab previously used network-based methods to model protein structures as protein structure networks (PSNs). Graph-based analyses of these PSNs proved to be superior to state-of-the-art sequence and non-network-based 3D structural approaches in the task of protein structure classification (PSC). However, traditional PSN approaches (including ours) modeled whole, native protein 3D structures as static PSNs that overlook the protein folding dynamics. To overcome this, we recently proposed a dynamic PSN idea. Unfortunately, there is a lack of data on 3D sub-structural configurations (or intermediates) of a protein as it undergoes folding to attain its native structure. So, we had to resort to modeling native structures of proteins as dynamic PSNs. Nonetheless, even this yielded significant improvements in the PSC task over modeling the native structures as static PSNs.

Most recently, as an even better proxy for studying protein folding dynamics than our recent PSC study, we have identified a sufficiently large experimental dataset that captures how the structure of a protein dynamically changes before vs. after the protein is bound to a ligand. We aim to examine how well dynamic PSN analyses of these data can explain seven different types of protein structural changes observed in the data.

Tuesday, July 16th
8:40-9:20
Invited Presentation: Metallic origins of life
Confirmed Presenter: Yana Bromberg, Emory University, USA

Room: 520a
Format: In Person

Moderator(s): Gonzalo Parra


Authors List:

  • Yana Bromberg, Emory University, USA

Presentation Overview:

How did life appear on our planet? Alexander Oparin’s 1924 theory of abiotic evolution of carbon-based molecules in a primordial soup suggests a means to the end. However, the evolutionary path beyond formation of individual molecules remains one of the most profoundly unanswered questions in biology. Biologically catalyzed redox reactions, i.e. proton-coupled electron transfer, drive the energy requirements of all life on Earth, implying that they must have been among the first functionalities acquired by early life.
We aimed to explore the patterns of evolution of redox-driving proteins, i.e. oxidoreductases. The billions of years’ worth of divergence among existing oxidoreductases renders sequence similarity metrics inapplicable. Thus, we incorporated structure into our explorations. We found that the peptide structures that bind transition metals, ubiquitous in redox, have similar topologies across the full diversity of existing metal-binding proteins. The similarity between these peptides strongly suggests that metal binding had a small number of common origins. Moreover, folds central to our network of similarities came primarily from oxidoreductases, further confirming the idea that ancestral peptides facilitated electron transfer reactions. We further note that most (>85%) of the experimentally determined protein structures incorporate similar folds, suggesting that metal-binding may have given rise to much more functionality. Finally, our results suggest that the earliest, biologically-functional peptides were likely available prior to the assembly of the first fully functional protein domains over 3.8 billion years ago.

9:20-9:40
De Novo Atomic Protein Structure Modeling for Cryo-EM Density Maps Using 3D Transformer and Hidden Markov Model
Confirmed Presenter: Jianlin Cheng, University of Missouri - Columbia, United States

Room: 520a
Format: In Person

Moderator(s): Gonzalo Parra


Authors List: Show

  • Jianlin Cheng, University of Missouri - Columbia, United States
  • Nabin Giri, University of Missouri - Columbia, United States

Presentation Overview: Show

Accurately building three-dimensional (3D) atomic structures from 3D cryo-electron microscopy (cryo-EM) density maps is a crucial step in the cryo-EM-based determination of the structures of protein complexes. Despite improvements in the resolution of 3D cryo-EM density maps, the de novo conversion of density maps into 3D atomic structures for protein complexes that do not have accurate homologous or predicted structures to be used as templates remains a significant challenge. Here, we introduce Cryo2Struct, a fully automated ab initio cryo-EM structure modeling method that utilizes a 3D transformer to identify atoms and amino acid types in cryo-EM density maps first, and then employs a novel Hidden Markov Model (HMM) to connect predicted atoms to build backbone structures of proteins. Tested on a standard test dataset of 128 cryo-EM density maps with varying resolutions (2.08 - 5.6 Å) and different numbers of residues (448 - 8,416), Cryo2Struct built substantially more accurate and complete protein structural models than the widely used ab initio method Phenix in terms of multiple evaluation metrics. Moreover, on a new test dataset of 500 recently released density maps with varying resolutions (1.9 - 4.0 Å) and different numbers of residues (234 - 8,828), its performance in building atomic structural models is rather robust against changes in the resolution of density maps and the size of protein structures.

9:40-10:00
Proceedings Presentation: RiboDiffusion: Tertiary Structure-based RNA Inverse Folding with Generative Diffusion Models
Confirmed Presenter: Han Huang, The Chinese University of Hong Kong, Hong Kong

Room: 520a
Format: Live Stream

Moderator(s): Gonzalo Parra


Authors List: Show

  • Han Huang, The Chinese University of Hong Kong, Hong Kong
  • Ziqian Lin, Nanjing University, China
  • Dongchen He, The Chinese University of Hong Kong, Hong Kong
  • Liang Hong, The Chinese University of Hong Kong, Hong Kong
  • Yu Li, The Chinese University of Hong Kong, Hong Kong

Presentation Overview: Show

RNA design shows growing applications in synthetic biology and therapeutics, driven by the crucial role of RNA in various biological processes. A fundamental challenge is to find functional RNA sequences that satisfy given structural constraints, known as the inverse folding problem. Computational approaches have emerged to address this problem based on secondary structures. However, designing RNA sequences directly from 3D structures is still challenging, due to the scarcity of data, the non-unique structure-sequence mapping, and the flexibility of RNA conformation. In this study, we propose RiboDiffusion, a generative diffusion model for RNA inverse folding that can learn the conditional distribution of RNA sequences given 3D backbone structures. Our model consists of a graph neural network-based structure module and a Transformer-based sequence module, which iteratively transforms random sequences into desired sequences. By tuning the sampling weight, our model allows for a trade-off between sequence recovery and diversity to explore more candidates. We split test sets based on RNA clustering with different cut-offs for sequence or structure similarity. Our model outperforms baselines in sequence recovery, with an average relative improvement of 11% for sequence similarity splits and 16% for structure similarity splits. Moreover, RiboDiffusion performs consistently well across various RNA length categories and RNA types. We also apply in-silico folding to validate whether the generated sequences can fold into the given 3D RNA backbones. Our method could be a powerful tool for RNA design that explores the vast sequence space and finds novel solutions to 3D structural constraints.
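As an illustration of the sequence recovery metric mentioned above, the following is a minimal sketch, assuming recovery is the per-position identity between a designed sequence and the native one of equal length. The function names and the example sequences are hypothetical and are not taken from RiboDiffusion.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed nucleotide equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def relative_improvement(model: float, baseline: float) -> float:
    """Relative gain of `model` over `baseline` (0.11 corresponds to 11%)."""
    return (model - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical toy sequences: 4 of 5 positions agree.
    print(sequence_recovery("GGCAU", "GGCUU"))
```

A relative improvement of 11% in this sense means the model's mean recovery is 1.11 times the baseline's, not 11 percentage points higher.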

10:40-10:50
Positional Protein Bioinformatics: A universal residue numbering scheme for the Immunoglobulin (Ig) fold enables its systemic detection in the protein universe.
Confirmed Presenter: Philippe Youkharibache, National Cancer Institute, NIH, United States

Room: 520a
Format: In Person

Moderator(s): Chris Kieslich


Authors List: Show

  • Caesar Tawfeeq, California State University Northridge, United States
  • Jiyao Wang, NIH/NCBI, United States
  • Thomas Madej, NCBI/NLM/NIH, United States
  • Umesh Khaniya, National Cancer Institute, NIH, United States
  • James Song, NCBI/NLM/NIH, United States
  • Ravi Abrol, California State University Northridge, United States
  • Philippe Youkharibache, National Cancer Institute, NIH, United States

Presentation Overview: Show

The Immunoglobulin fold (Ig-fold) is the most populous fold in the human proteome, found in proteins from all domains of life, with current (under)estimates ranging from 2 to 3% of protein coding regions. The ability of Ig-domains to reliably fold and self-assemble through highly specific interfaces represents a remarkable property of these domains that makes them key elements of molecular interaction systems: the immune system, the nervous system, the muscular system and the vascular system. We define a universal sequence numbering scheme, called “IgStRAnD” (Immunoglobulin Strand Residue Anchor Dependent), to represent all domains sharing the Ig-fold. IgStrand numbering enables comparative structural, functional, and evolutionary analyses through positional comparisons between any Ig-domain variant across the universe of Ig-domains. It enables the systematic study of the Ig-proteome and associated Ig-Ig interactomes and sheds light on the robust Ig protein folding algorithm used by nature to form beta sandwich supersecondary structures, responsible for what may be convergent evolution for many of the more than 300 superfamilies sharing the fold. The numbering scheme is at the heart of an algorithm implemented in the interactive structural analysis software iCn3D to systematically recognize Ig-domains, to annotate them, and to perform detailed comparisons in sequence, topology, and structure, regardless of their tertiary plasticity and quaternary organizations. We performed a (preliminary) survey of the human proteome of over 80,000 protein structures leading to a surprisingly higher number of proteins having co-opted Ig-, Ig-like and Ig-extended domains than was estimated in the original human genome survey.

10:50-11:10
ImmunoMatch: Illuminating the design of antibody heavy and light chain pairs using deep learning approaches and structure analysis
Confirmed Presenter: Dongjun Guo, King's College London, United Kingdom

Room: 520a
Format: In Person

Moderator(s): Chris Kieslich


Authors List: Show

  • Dongjun Guo, King's College London, United Kingdom
  • Joseph Ng, University College London, United Kingdom
  • Deborah Dunn-Walters, University of Surrey, United Kingdom
  • Franca Fraternali, University College London, United Kingdom

Presentation Overview: Show

Antibodies are composed of heavy (H) and light (L) chains. Sequence variations of H and L chains therefore combinatorially contribute to a diverse antibody “repertoire” for eliciting responses against a variety of antigens. How an H chain chooses its L chain partner is still under debate. Little attention has been paid to the exact amino acid preferences and their relative importance in the H-L protein interface. Our results illustrate molecular rules governing antibody H-L chain pairing preferences.
Here we present ImmunoMatch, a heavy-light chain pairing prediction tool that takes advantage of recently published antibody language models. We capitalise on the increase in single-cell, paired H-L antibody repertoire data and build the model to distinguish cognate H-L pairs from random synthetic pairs, achieving an AUC of 0.75. We assembled an antibody structure database (VCAb: https://fraternalilab.cs.ucl.ac.uk/VCAb/) for external validation and further structural interpretation of the pairing prediction. We show that our model, trained on the human antibody repertoire, performs well on human and humanized antibodies, while its performance drops when detecting cognate pairs from mouse and chimeric antibodies. We took one therapeutic antibody (trastuzumab) for further analysis, searching the potential mutation space with ImmunoMatch and extracting the attention matrix, and found that positions which can increase or decrease the H-L pairing likelihood cluster around the CDR loops and the H-L interface. These results highlight the necessity of considering the entire antibody sequence in antibody design by pre-excluding unlikely H-L combinations in the pipeline for better developability.
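For readers unfamiliar with the AUC figure quoted above, here is a minimal sketch of how an AUC can be computed for a classifier that scores cognate versus random H-L pairs. It assumes the pairwise (Mann-Whitney) definition of AUC; the scores are made-up numbers, not ImmunoMatch outputs.

```python
def roc_auc(scores_cognate, scores_random):
    """Probability that a cognate pair outscores a random pair (ties count as 0.5)."""
    if not scores_cognate or not scores_random:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in scores_cognate:
        for n in scores_random:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_cognate) * len(scores_random))


if __name__ == "__main__":
    # Hypothetical pairing scores for three cognate and two random pairs.
    print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3]))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which puts the reported 0.75 roughly midway.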

11:10-11:30
Exploring the biophysical boundaries of protein families with deep learning methods
Confirmed Presenter: Miriam Poley-Gil, Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center (BSC-CNS), Spain

Room: 520a
Format: In Person

Moderator(s): Chris Kieslich


Authors List: Show

  • Miriam Poley-Gil, Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center (BSC-CNS), Spain
  • Maria I. Freiberger, Department of Biological Chemistry, Universidad de Buenos Aires (UBA), Argentina
  • Alin Banka, Department of Informatics, Bioinformatics & Computational Biology, Technical University of Munich (TUM), Germany
  • Michael Heinzinger, Department of Informatics, Bioinformatics & Computational Biology, Technical University of Munich (TUM), Germany
  • Noelia Ferruz, Artificial Intelligence for Protein Design Group, Institute of Molecular Biology of Barcelona (IBMB-CSIC), Spain
  • Alfonso Valencia, Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center (BSC-CNS), Spain
  • R. Gonzalo Parra, Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center (BSC-CNS), Spain

Presentation Overview: Show

Recently, Deep Learning models have revolutionised the Molecular Biology field allowing us to explore the intricate interplay between protein sequence, structure and function faster. To understand what they are capturing and generating we have combined state-of-the-art protein models for inverse folding (such as ProstT5[1] and ProteinMPNN[2]) and for sequence generation (such as ProtGPT2[3] and ZymCTRL[4]) with biophysical analyses (Figure 1).
We have studied conservation patterns of local energetic frustration in artificial datasets to shed light on the evolutionary processes leading to the diversification of some protein families, under the assumption that proteins are optimised for folding and stability, but also evolutionarily selected to function. We have developed a tool called FrustraEvo[5] that measures such conservation within and between protein families (available in full on the server https://frustraevo.qb.fcen.uba.ar/).
We found that most of the highly frustrated native residues are related to functional aspects. These functional residues are mostly recovered by sequence generation models, suggesting that there are alternative ways to design proteins beyond the one explored by evolution. In the case of catalytic sites, they are also recovered by inverse folding models. We therefore point out a selective memory concerning functionality (a primary, local level of memory). However, ProteinMPNN also recovers the main network of frustrated contacts of the functional domains, suggesting even a tertiary level of memory (contacts). Thus, our approach promises to effectively unravel the intricacies of protein family boundaries and explore design options for understanding protein evolution.

11:30-11:40
Can proteins be represented through secondary structures?
Confirmed Presenter: Michael Schroeder, TU Dresden, Germany

Room: 520a
Format: In Person

Moderator(s): Chris Kieslich


Authors List: Show

  • Ali Al-Fatlawi, TU Dresden, Germany
  • Michael Schroeder, TU Dresden, Germany

Presentation Overview: Show

Recent advancements in protein classification, driven by Foldseek for tertiary structure-based searches, raise the question of whether a simplified secondary structure format is enough for classification and functional inference, eliminating the need for confident tertiary structure determination. This paper explores this debate using a sequence format where 'H' denotes helices, 'S' represents strands, and 'L' signifies loops/turns for each amino acid's secondary structure. Through an all-versus-all comparison using CATH and SCOPe datasets, the approach, though slightly less accurate than tertiary structure-based classification, advocates for a simple, informative representation of proteins, maintaining 90%-93% of tertiary structure performance. This invites the development of a search engine for all secondary structure sequences, facilitating a simple, efficient, and rapid protein search with minimized information requirements.
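The 'H'/'S'/'L' sequence format described above can be sketched as follows. This is an illustration only: the reduction from DSSP's eight-state codes is an assumed mapping, and the similarity measure (Python's `difflib.SequenceMatcher`) is a stand-in, since the abstract does not say how sequences are compared.

```python
from difflib import SequenceMatcher

# Assumed reduction of DSSP 8-state codes to the abstract's 3-letter alphabet.
# Note: DSSP itself uses 'S' for bends; here 'S' means strand, as in the abstract.
DSSP_TO_HSL = {
    "H": "H", "G": "H", "I": "H",  # alpha, 3-10 and pi helices
    "E": "S", "B": "S",            # extended strands and isolated bridges
}


def to_hsl(dssp: str) -> str:
    """Map each DSSP code to H, S or L; anything unlisted becomes a loop."""
    return "".join(DSSP_TO_HSL.get(code, "L") for code in dssp)


def hsl_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two H/S/L strings (stand-in measure)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


if __name__ == "__main__":
    query = to_hsl("-HHHHTT-EEEE-")
    print(query, hsl_similarity(query, "LHHHHLLLSSSSL"))
```

Because the representation is an ordinary string, any sequence search tool could in principle index it, which is the point the abstract makes about a secondary-structure search engine.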

11:40-12:00
SPfast: Highly efficient protein structure alignment with segment-level representations and block-sparse optimization
Confirmed Presenter: Thomas Litfin, Griffith University, Australia

Room: 520a
Format: In Person

Moderator(s): Chris Kieslich


Authors List: Show

  • Thomas Litfin, Griffith University, Australia

Presentation Overview: Show

Recent advances in protein structure modelling have increased the availability of high-quality protein structures at an unprecedented scale. Newly available structure libraries represent an exciting opportunity for discovery-based research. However, the explosion of protein structure data has exposed scaling deficiencies in the bioinformatics toolset which limit their utility for downstream analyses. These scaling problems will only be further exacerbated as modelling projects expand to noncanonical isoforms, dynamic trajectories, de novo designs etc. Foldseek has introduced a structure state alphabet to mitigate this computational burden. However, the increased speed is accompanied by trade-offs in search sensitivity due to sacrificing information about global topology. In this work we describe a fully geometric protein structure search engine, SPfast, which leverages a coarse-grained, hierarchical representation and an efficient block-sparse optimization heuristic to greatly accelerate pairwise protein structure alignment and enable practical analysis of large-scale structure libraries. Combining SPfast with a newly parameterized SPscore maintains state-of-the-art performance for database search, more accurately reproduces pairwise evolutionary alignments and increases throughput by 100x compared with traditional methods.

STRPsearch: fast detection of structured tandem repeat proteins
Confirmed Presenter: Alexander Monzon, Department of Information Engineering, University of Padova, Italy

Room: 520a
Format: In Person

Moderator(s): Chris Kieslich


Authors List: Show

  • Soroush Mozaffari, Department of Biomedical Sciences, University of Padova, Italy
  • Paula Nazarena Arrias, Department of Biomedical Sciences, University of Padova, Italy
  • Damiano Clementel, Department of Biomedical Sciences, University of Padova, Italy
  • Damiano Piovesan, Department of Biomedical Sciences, University of Padova, Italy
  • Carlo Ferrari, Department of Information Engineering, University of Padua, Italy
  • Silvio Tosatto, University of Padova, Italy
  • Alexander Monzon, Department of Information Engineering, University of Padova, Italy

Presentation Overview: Show

State-of-the-art prediction methods are generating millions of publicly available protein structures. Structured Tandem Repeats Proteins (STRPs) constitute a subclass of tandem repeats characterized by repetitive structural motifs. STRPs exhibit distinct propensities for secondary structure and form regular tertiary structures, often comprising large molecular assemblies. They can perform important and diverse biological functions due to their highly degenerated sequences, which maintain a similar structure while displaying a variable number of repeat units. This suggests a disconnection between structural size and protein function. However, automatic detection of STRPs remains challenging with current state-of-the-art tools due to their lack of accuracy and long execution times, hindering their application on large datasets. In most cases, manual curation is the most accurate method for detecting and classifying them, making it impossible to inspect millions of structures.
We present STRPsearch, a novel computational tool for rapid identification, classification, and mapping of STRPs. Leveraging the manually curated entries in RepeatsDB as the known conformational space of the STRPs, STRPsearch utilizes the latest advancements in structural alignment techniques for a fast and accurate detection of repeated structural motifs in protein structures, followed by an innovative approach to map units and insertions through the generation of TM-score graphs. STRPsearch can serve researchers in structural bioinformatics and protein science as an efficient and practical tool for analysis and detection of STRPs.

12:00-12:20
The Encyclopedia of Domains
Confirmed Presenter: Nicola Bordin, University College London, United Kingdom

Room: 520a
Format: In Person

Moderator(s): Chris Kieslich


Authors List: Show

  • Andy Lau, University College London, United Kingdom
  • Nicola Bordin, University College London, United Kingdom
  • Shaun Kandathil, University College London, United Kingdom
  • Ian Sillitoe, University College London, United Kingdom
  • Vaishali Waman, University College London, United Kingdom
  • Jude Wells, University College London, United Kingdom
  • Christine Orengo, University College London, United Kingdom
  • David Jones, University College London, United Kingdom

Presentation Overview: Show

The Encyclopaedia of Domains (TED) is a comprehensive classification of all globular protein structure domains in AlphaFold Database v4. Harnessing state-of-the-art deep learning methods for domain detection, structure comparison and fold detection, TED segments and classifies domains across AFDB, identifying over 370 million distinct domains, surpassing sequence-based resources by over 100 million domains. Nearly 90% of these domains exhibit similarities with known superfamilies in CATH, expanding the resource by over 600-fold. The remaining domains that do not have relatives in any PDB-based resources unveiled over 7 thousand new folds, some of which have interesting and beautiful symmetries. We also find some fascinating new architectures.

TED uncovers over 10,000 previously undetected structural interactions between superfamilies and extends domain coverage to over 1 million taxa, enhancing research for organisms which previously had low to non-existent structural coverage. TED data will be made available in 3D-Beacons as well as a dedicated resource, significantly enriching CATH superfamilies.

14:20-14:40
Unraveling SARS-CoV-2 Spike Protein Evolution: A Comprehensive Structural Analysis
Confirmed Presenter: Natalia Fagundes Borges Teruel, Université de Montréal, Canada

Room: 520a
Format: In Person

Moderator(s): Rafael Najmanovich


Authors List: Show

  • Natalia Fagundes Borges Teruel, Université de Montréal, Canada
  • Rafael Najmanovich, Université de Montréal, Canada

Presentation Overview: Show

The evolution of the SARS-CoV-2 virus, the cause of the COVID-19 pandemic, has prompted a detailed investigation into the structural dynamics of its Spike protein. This study presents an in-depth analysis of 1560 published structures of the SARS-CoV-2 Spike protein, covering various variants that have emerged during the pandemic. Employing Surfaces for interaction evaluation, we investigate receptor binding characteristics and antibody recognition patterns associated with these diverse Spike protein structures. We characterized 14 epitopes according to a novel data-driven approach, used to cluster 2044 vectors of antibody interactions and to sort 210 vectors of ACE2 interactions in order to examine their common binding sites. We also exploit the shift in conformational dynamics and its effects on epitope exposure, using NRGTEN for dynamical assessments and occupancy calculations. Through a systematic examination of mutations in each variant, we aim to provide a comprehensive overview of their functional effects on the Spike protein. Our methodologies allow the analysis of structural variations among different SARS-CoV-2 variants and reveal the intricate interplay between genetic alterations and protein functionality, shedding light on the evolutionary forces driving structural changes in the SARS-CoV-2 Spike protein throughout the COVID-19 pandemic.

Dynamic and Energetic Consequences of Disulfide Bonds in Proteins
Confirmed Presenter: Miguel Fernandez-Martin, Barcelona Supercomputing Center (BSC-CNS), Spain

Room: 520a
Format: In Person

Moderator(s): Rafael Najmanovich


Authors List: Show

  • Miguel Fernandez-Martin, Barcelona Supercomputing Center (BSC-CNS), Spain
  • Alfonso Valencia, Barcelona Supercomputing Center (BSC-CNS), Spain
  • R. Gonzalo Parra, Barcelona Supercomputing Center (BSC-CNS), Spain

Presentation Overview: Show

Introduction
Disulfide bonds, crucial for protein structure stability, are found to have versatile roles beyond structural support. This study explores three scenarios showcasing their functions based on energetic context. 1) Disulfide bridges aid in forming cyclic cystine knot (CCK) motifs, like in the case of the cyclotide trypsin inhibitor (MCoT-II). Molecular dynamics simulations reveal interplays of frustration and correlation among specific disulfide bridges, like Cys4-Cys21 exhibiting functional flexibility. 2) Disulfide bonds introduce frustration, affecting conformational dynamics and regulating functional signaling in bacterial species, as seen in oxidoreductase DsbD (nDSBd). 3) Finally, it might not be related to either structure or function. In the absence of a disulfide bond, Azurin sees its thermal and chemical stability dramatically reduced, but adopts a folded structure identical to that with an intact disulfide, with the Cys3-Cys26 bond modulating stability.

Methods
This work aims to study the different types of disulfide bonds present in the Protein Data Bank (PDB) by analyzing their local energetic frustration patterns. The analysis includes all structures in the PDB with at least one disulfide bond (full dataset n=36571) and a subset of proteins with experimental structures in both oxidized and reduced states for at least one disulfide bond (paired dataset n=1151).

Results
By using the Frustratometer algorithm and IUPRED to assess frustration patterns and predict intrinsic disorder, we aim to elucidate how disulfide bonds locally stabilize or destabilize protein structures, offering insights into their varied roles in biological systems. This understanding is vital for manipulating cellular responses and designing biomedical devices.

14:40-15:00
DDMut-PPI: predicting effects of mutations on protein-protein interactions using graph-based deep learning
Confirmed Presenter: Yunzhuo Zhou, University of Queensland, Australia

Room: 520a
Format: In Person

Moderator(s): Rafael Najmanovich


Authors List: Show

  • Yunzhuo Zhou, University of Queensland, Australia
  • Yoochan Myung, University of Queensland, Australia
  • Carlos Rodrigues, University of Queensland, Australia
  • David Ascher, University of Queensland; Baker Institute, Australia

Presentation Overview: Show

Protein-protein interactions (PPIs) play a vital role in cellular functions and are essential for therapeutic development and understanding diseases. Traditional methods for exploring the effects of mutations on PPIs face challenges related to experimental complexity, cost, and scalability. While computational methods provide a quicker alternative, they often struggle to balance efficiency and precision in their predictions. In response, we present DDMut-PPI, a deep learning model that efficiently and accurately predicts changes in PPI binding free energy upon single and multiple point mutations. Building on the robust siamese network architecture with graph-based signatures from our prior work, DDMut, the DDMut-PPI model was enhanced with a graph convolutional network to better capture the importance of residues at the interface based on a 2D interaction graph. We used residue-specific embeddings from ProtT5 protein language model as node features, and a variety of molecular interactions as edge features. By integrating evolutionary context with spatial information, this framework enables DDMut-PPI to achieve a robust Pearson correlation of up to 0.67 (RMSE: 1.51 kcal/mol) in our non-redundant evaluations, outperforming most existing methods. Importantly, by utilising both forward and hypothetical reverse mutations to account for model anti-symmetry, the model demonstrated consistent performance across mutations that increase or decrease binding affinity. We believe DDMut-PPI would be a valuable resource for researchers and clinicians looking to explore the complex dynamics of protein interactions and their implications for health and disease. DDMut-PPI is freely available as a user-friendly web server and an API at https://biosig.lab.uq.edu.au/ddmut_ppi.
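The anti-symmetry idea mentioned above (a reverse mutation should have the opposite change in binding free energy to the forward one) and the two reported metrics can be sketched as follows. This is a minimal illustration, not the DDMut-PPI code; the record layout and function names are assumptions.

```python
import math


def add_reverse_mutations(records):
    """Append the hypothetical reverse of each mutation with a negated ddG.

    records: list of (wild_type, position, mutant, ddg) tuples (assumed layout).
    """
    augmented = list(records)
    for wild_type, position, mutant, ddg in records:
        augmented.append((mutant, position, wild_type, -ddg))
    return augmented


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def rmse(predicted, measured):
    """Root-mean-square error, in the same units as the inputs (e.g. kcal/mol)."""
    n = len(predicted)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)


if __name__ == "__main__":
    print(add_reverse_mutations([("A", 10, "G", 1.2)]))
```

Evaluating on both forward and reverse entries checks that a model does not simply favour one sign of the predicted change.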

15:00-15:20
Proceedings Presentation: DDAffinity: Predicting the changes in binding affinity of multiple point mutations using protein three-dimensional structure
Confirmed Presenter: Qichang Zhao, Central South University, China

Room: 520a
Format: In Person

Moderator(s): Rafael Najmanovich


Authors List: Show

  • Guanglei Yu, Central South University, China
  • Qichang Zhao, Central South University, China
  • Xuehua Bi, Central South University, China
  • Jianxin Wang, Central South University, China

Presentation Overview: Show

Motivation: Mutations are the crucial driving force for biological evolution as they can disrupt protein stability and protein-protein interactions which have notable impacts on protein structure, function, and expression. Moreover, the progressive accumulation of multiple point mutations can lead to cancer. However, existing computational methods for protein mutation effects prediction are generally limited to single point mutations with global dependencies, and do not systematically take into account the local and global synergistic epistasis inherent in multiple point mutations.
Results: To this end, we propose a novel spatial and sequential message passing neural network, named DDAffinity, to predict the changes in binding affinity caused by multiple point mutations based on protein three-dimensional (3D) structures. Specifically, instead of being on the whole protein, we perform message passing on the k-nearest neighbour residue graphs to extract pocket features of the protein 3D structures. Furthermore, to learn global topological features, a two-step additive Gaussian noising strategy during training is applied to blur out local details of protein geometry. We evaluate DDAffinity on benchmark datasets and external validation datasets. Overall, the predictive performance of DDAffinity is significantly improved compared with state-of-the-art baselines on multiple point mutations, including end-to-end and pre-training based methods. The ablation studies indicate the reasonable design of all components of DDAffinity. In addition, applications in non-redundant blind testing, predicting mutation effects of SARS-CoV-2 RBD variants, and optimizing human antibody against SARS-CoV-2 illustrate the effectiveness of DDAffinity.
Availability and implementation: DDAffinity is available at https://github.com/ak422/DDAffinity.
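The two geometric ingredients named in this abstract, a k-nearest neighbour residue graph and additive Gaussian noise on coordinates, can be sketched as follows. This is a minimal illustration assuming one 3D point per residue (for example the C-alpha atom); it shows a single noising step only, since the abstract does not describe the two-step schedule, and none of the names come from the DDAffinity repository.

```python
import math
import random


def knn_residue_graph(coords, k):
    """Return {residue index: indices of its k nearest residues}.

    coords: list of (x, y, z) tuples, one per residue (assumed representation).
    """
    graph = {}
    for i, ci in enumerate(coords):
        by_distance = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i
        )
        graph[i] = [j for _, j in by_distance[:k]]
    return graph


def add_gaussian_noise(coords, sigma, rng):
    """Perturb every coordinate with zero-mean Gaussian noise of std `sigma`."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in xyz) for xyz in coords]


if __name__ == "__main__":
    toy = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    noisy = add_gaussian_noise(toy, sigma=0.5, rng=random.Random(0))
    print(knn_residue_graph(toy, k=2))
    print(knn_residue_graph(noisy, k=2))
```

Restricting message passing to each residue's k neighbours keeps the cost linear in protein length, while the noise blurs fine local geometry so that a model has to rely on coarser topology.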

15:20-15:40
A multiscale functional map of somatic mutations in cancer integrating protein structure and network topology
Confirmed Presenter: Yingying Zhang, Cornell University, United States

Room: 520a
Format: In Person

Moderator(s): Rafael Najmanovich


Authors List: Show

  • Yingying Zhang, Cornell University, United States
  • Alden Leung, Cornell University, United States
  • Jin Joo Kang, Cornell University, United States
  • Yu Sun, Cornell University, United States
  • Guanxi Wu, Cornell University, United States
  • Le Li, Cornell University, United States
  • Jiayang Sun, Cornell University, United States
  • Lily Cheng, Cornell University, United States
  • Tian Qiu, Cornell University, United States
  • Junke Zhang, Cornell University, United States
  • Shayne Wierbowski, Cornell University, United States
  • James Booth, Cornell University, United States
  • Haiyuan Yu, Cornell University, United States

Presentation Overview: Show

A major goal of cancer biology is to understand the mechanisms underlying tumorigenesis driven by somatically acquired mutations. Two distinct types of computational methodologies have emerged: one focuses on analyzing clustering of mutations within protein sequences and 3D structures, while the other characterizes mutations by leveraging the topology of protein-protein interaction network. Their insights are largely non-overlapping, offering complementary strengths. Here, we established a unified, end-to-end 3D structurally-informed protein interaction network propagation framework, NetFlow3D, that systematically maps the multiscale mechanistic effects of somatic mutations in cancer. The establishment of NetFlow3D hinges upon the Human Protein Structurome, a comprehensive repository we compiled that incorporates the 3D structures of every single protein as well as the binding interfaces of all known protein interactions in humans. NetFlow3D leverages the Structurome to integrate information across atomic, residue, protein and network levels: It conducts 3D clustering of mutations across atomic and residue levels on protein structures to identify potential driver mutations. It then anisotropically propagates their impacts across the protein interaction network, with propagation guided by the specific 3D structural interfaces involved, to identify significantly interconnected network “modules”, thereby uncovering key biological processes underlying disease etiology. Applied to 1,038,899 somatic protein-altering mutations in 9,946 TCGA tumors across 33 cancer types, NetFlow3D identified 12,378 significant 3D clusters throughout the Human Protein Structurome, of which ~54% would not have been found if using only experimentally-determined structures. It then identified 28 significantly interconnected modules that encompass ~8-fold more proteins than applying standard network analyses.
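To give a feel for network propagation as used above, here is a minimal sketch of a random walk with restart on a weighted interaction graph. It is a generic stand-in, not the NetFlow3D algorithm: the edge weights are hypothetical placeholders for interface-guided (anisotropic) propagation, and every neighbour is assumed to appear as a key of the adjacency dictionary with at least one outgoing edge.

```python
def propagate(adjacency, seeds, restart=0.5, iterations=50):
    """Random walk with restart.

    adjacency: {node: {neighbour: weight}} (weights are hypothetical).
    seeds: {node: initial score}, e.g. proteins carrying mutation clusters.
    Returns {node: propagated score}; scores sum to 1.
    """
    nodes = list(adjacency)
    total = sum(seeds.values())
    start = {n: seeds.get(n, 0.0) / total for n in nodes}
    scores = dict(start)
    for _ in range(iterations):
        # Each step: restart mass returns to the seeds, the rest flows along edges.
        updated = {n: restart * start[n] for n in nodes}
        for u in nodes:
            out_weight = sum(adjacency[u].values())
            if out_weight == 0:
                continue
            for v, w in adjacency[u].items():
                updated[v] += (1.0 - restart) * scores[u] * w / out_weight
        scores = updated
    return scores


if __name__ == "__main__":
    # Toy graph: A interacts strongly with B and weakly with C.
    graph = {"A": {"B": 3.0, "C": 1.0}, "B": {"A": 3.0}, "C": {"A": 1.0}}
    print(propagate(graph, {"A": 1.0}))
```

In the toy graph, B receives more of A's signal than C because of the heavier edge, which is the kind of direction-dependent spread that weighting by interface involvement is meant to achieve.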