ISMB/ECCB 2025


Schedule for 3DSIG

Date Start Time End Time Room Track Title Confirmed Presenter Format Authors Abstract
2025-07-21 11:20:00 12:00:00 03B 3DSIG Decoding Immunity: Structural and Dynamical Insights Driving Antibody Innovation Franca Fraternali Franca Fraternali Effective adaptive immune responses rely on antibodies of different isotypes performing distinct effector functions. Understanding their structural diversity is crucial for engineering antibodies with optimal stability, binding, and therapeutic potential. In this keynote, I will present our integrative computational approaches to guide antibody design, which include isotype classification, chain compatibility prediction, 3D structural modeling, and analysis of allosteric communication. In designing novel antibodies, effective pairing of antibody heavy and light chains is essential for effective function, yet the rules governing this remain unclear. I will introduce ImmunoMatch, a suite of AI models fine-tuned on full-length variable regions to predict cognate H–L chain pairs. Built on the AntiBERTa2 language model, ImmunoMatch outperforms CDR- and gene usage–based models, with further improvements from chain type–specific tuning. Applied to B cell repertoires and therapeutic antibodies, ImmunoMatch identifies chain pairing refinement as a hallmark of B cell maturation and uncovers key sequence features driving specificity. Moving beyond the traditional focus on CDRs, we show that framework (FW) mutations can modulate antibody stability and effector function through long-range structural effects. Our analyses revealed that antibody language models (AbLMs) alone lack predictive power for FW mutagenesis. To improve on this, we adopted a structure-based approach, suggesting future directions such as fine-tuning AbLMs with in vitro FW-specific mutational data to improve their utility in antibody design. This shift can broaden the scope of rational engineering toward non-CDR regions and developability attributes, highlighting the need for a holistic view of antibody design.
2025-07-21 12:00:00 12:20:00 03B 3DSIG Rapid and accurate prediction of protein homo-oligomer symmetry using Seq2Symm Meghana Kshirsagar Meghana Kshirsagar, Artur Meller, Ian R. Humphreys, Samuel Sledzieski, Yixi Xu, Rahul Dodhia, Eric Horvitz, Bonnie Berger, Gregory R Bowman, Juan Lavista Ferres, David Baker, Minkyung Baek The majority of proteins must form higher-order assemblies to perform their biological functions, yet few machine learning models can accurately and rapidly predict the symmetry of assemblies involving multiple copies of the same protein chain. Here, we address this gap by finetuning several classes of protein foundation models to predict homo-oligomer symmetry. Our best model, Seq2Symm, which uses ESM2, outperforms existing template-based and deep learning methods, achieving an average AUC-PR of 0.47, 0.44 and 0.49 across homo-oligomer symmetries on three held-out test sets, compared to 0.24, 0.24 and 0.25 with template-based search. Seq2Symm uses a single sequence as input and can predict at a rate of ~80,000 proteins/hour. We apply this method to 5 proteomes and ~3.5 million unlabeled protein sequences, showing its promise for use in conjunction with downstream computationally intensive all-atom structure generation methods such as RoseTTAFold2 and AlphaFold2-multimer. Code, datasets, and model are available at: https://github.com/microsoft/seq2symm.
2025-07-21 12:20:00 12:40:00 03B 3DSIG Probing Homo-Oligomeric Interaction Signals in Protein Language Models Zhidian Zhang Zhidian Zhang, Yo Akiyama, Yehlin Cho, Sergey Ovchinnikov Homo-oligomeric protein complexes, assemblies of identical subunits, are central to many biological processes and disease mechanisms. Accurate prediction of inter-subunit contacts within these assemblies remains a key challenge, especially as existing structure predictors like AlphaFold scale poorly with complex size. Protein language models (pLMs), trained on vast sequence databases, offer a scalable alternative by learning coevolutionary statistics from single-chain inputs. In this work, we systematically investigate whether pLMs implicitly learn inter-subunit interaction signals, even when trained solely on monomeric sequences. We find that pLM-predicted contact maps often contain partial inter-subunit signal, but prediction performance for inter-subunit contacts is consistently weaker than for intra-subunit contacts. Notably, we observe that larger pLMs recover more accurate inter-subunit contacts, suggesting that model scaling enhances structural resolution of homo-oligomers. Interestingly, missing inter-subunit contact signals often correspond to interfaces without strong biophysical support outside of crystallography, raising the possibility that some predicted absences reflect a genuine lack of physiological relevance. Our findings suggest that inter-subunit contact prediction from pLMs could serve as a computational filter for distinguishing biologically relevant homo-oligomers from crystallographic artifacts. These findings open a new avenue for leveraging pLMs not only as structure predictors, but as tools for dissecting the evolutionary logic and physiological relevance of protein assemblies.
2025-07-21 12:40:00 13:00:00 03B 3DSIG Integration between large phenomics and genomics data using scRNA-seq techniques revealed a genetically driven differential progression pattern in neurodegenerative eye disease Christian Anderson, Liam Scott, Simone Muller, Melanie Bahlo, Lea Scheppke, Roberto Bonelli In this study, we used clinical imaging phenotypic data from a large imaging biobank of eyes affected by Macular Telangiectasia Type 2 (MacTel). We extracted 96 clinically graded phenotypic variables on 7,328 eyes (3,675 patients); 15% of eyes had longitudinal observations, spanning 5 years on average. By borrowing techniques from single-cell RNA-seq analysis, we were able to divide the eyes of MacTel patients into 11 distinct clusters. Just as in cell differentiation, these clusters were differentiated by the presence and severity of retinal phenotypes. Pseudo-time calculations revealed a clear severity difference between these clusters, additionally validated via a clinically defined linear severity score. Minimum spanning tree calculation (oblivious to the longitudinal nature of some of our data) revealed two differential progression routes across clusters, one leading to a more vascular-altered retina and the other to a neurodegenerative process resulting in vision loss. Our longitudinal data additionally validated this bifurcated route. Analysis of the progression route in fellow eyes revealed a strong intra-patient agreement. Integration with demographic data revealed that patients affected by type 2 diabetes – a known MacTel comorbidity – were significantly more likely to progress following the neurodegenerative route. By integrating genetic data on 2,182 patients, we tested for association between all MacTel significant GWAS loci and progression routes and found a strong genetic association with a locus known to impact retinal vasculature and thickness.
Lastly, modelling the genetic loci effect on clinical phenotypes revealed a strong genetic background on retinal insult presence, severity, and progression rate.
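The minimum spanning tree step in the MacTel abstract can be illustrated with a small sketch. It assumes each cluster is summarised by a centroid in some low-dimensional space and that distances are Euclidean; the coordinates are hypothetical toy values, not data from the study, and the abstract does not say which MST implementation the authors used. A node of degree three or more marks a point where progression routes split.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns an edge list."""
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        _, i, j = min(
            (math.dist(points[i], points[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


# Hypothetical 2D centroids of five clusters arranged as a "Y" (toy values).
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 1.0), (3.0, -1.0)]
tree = minimum_spanning_tree(centroids)

# Count how many tree edges touch each cluster.
degree = {k: 0 for k in range(len(centroids))}
for i, j in tree:
    degree[i] += 1
    degree[j] += 1
branch_points = [k for k, d in degree.items() if d >= 3]
print(tree, branch_points)
```

In this toy layout the tree branches at the third centroid, which is the kind of bifurcation the abstract describes across its 11 clusters.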
2025-07-21 14:00:00 14:40:00 03B 3DSIG Simulations in the age of AlphaFold: dynamics, drug resistance and enzyme design Adrian Mulholland Adrian Mulholland Molecular simulations contribute to practical protein design and engineering workflows. Equilibrium molecular dynamics (MD) simulations not only test and filter designs, but can predict binding, redox and other properties for engineering and optimization. This includes activation heat capacities determining enzyme temperature activity optima, and analysing causes of epistasis. A particular challenge is understanding and predicting mutations far from the active site that affect activity, often introduced by evolution. Dynamical-nonequilibrium (D-NEMD) simulations can predict distal sites relevant to modulating activity, cryptic binding sites, and allosteric effects. Simulations of chemical reactions in proteins with combined quantum mechanics/molecular mechanics (QM/MM) methods characterize crucial species in catalysis, including transition states and reaction intermediates, and how they are formed and stabilized. QM/MM models can be used as ‘theozyme’ templates for enzyme design. QM/MM calculations also allow prediction of spectroscopic and other electronic properties, assisting in design and optimization of photovoltaic proteins, e.g. in designed spectral tuning. Simulations can also analyse trajectories and effects of directed and natural evolution, providing insights for enzyme design and engineering. Examples include identifying the dynamical origins of heat capacity changes introduced by directed evolution of designer enzymes, and revealing e.g. how local electric fields are optimized for specific catalytic activities in beta-lactamase enzymes that cause resistance to ‘last resort’ antibiotics. Electric fields are vital features of many natural enzymes, including heme peroxidases in which they drive proton delivery. 
Electric field calculations and MD simulations can be combined effectively with AI tools for protein engineering in evolutionary enzyme design.
2025-07-21 14:40:00 15:00:00 03B 3DSIG FlowProt: Classifier-Guided Flow Matching for Targeted Protein Backbone Generation in the de novo DNA Methyltransferase Family Ali Baran Taşdemir Ali Baran Taşdemir, Ayşe Berçin Barlas, Abdurrahman Olğaç, Ezgi Karaca, Tunca Doğan Designing novel proteins with both structural stability and targeted molecular function remains a central challenge in computational biology. While recent generative models such as diffusion and flow-matching offer promising capabilities for protein backbone generation, functional controllability is still limited. In this work, we introduce FlowProt, a classifier-guided flow-matching generative model designed to create protein backbones with domain-specific functional properties. As a case study, we focus on the catalytic domain of human DNA methyltransferase DNMT3A, a 286-residue protein essential in early epigenetic regulation. FlowProt builds on the FrameFlow architecture, predicting per-residue translation and rotation matrices to reconstruct 3D backbones from noise. A domain classifier, trained to distinguish DNMT proteins from others, guides the model during inference using gradient-based feedback. This enables FlowProt to steer generation toward DNMT-like structures. We evaluate backbone quality using self-consistency metrics (scRMSD, scTM, pLDDT) and domain relevance using ProGReS, sequence similarity, and SAM-binding potential. FlowProt consistently generates high-confidence structures up to 286 residues (the exact length of DNMT3A) with low scRMSD, high scTM, and strong functional similarity. We further validate our designs through structure-based alignment and cofactor-binding analysis with Chai-1, demonstrating high-confidence SAM-binding regions in the generated models. To our knowledge, FlowProt is the first method to integrate flow-matching with classifier guidance for domain-specific backbone design.
As future work, we aim to assess DNA-binding potential and further refine functional capabilities via molecular dynamics simulations and benchmarking against state-of-the-art protein design models.
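The gradient-based classifier guidance described in the FlowProt abstract can be sketched in a toy setting. This is not the FlowProt or FrameFlow implementation: it works on a 2D point instead of per-residue frames, and it assumes a hand-written velocity field, a logistic classifier, and a constant guidance scale (real guidance schemes often make that scale time-dependent). All names are hypothetical. It shows only the general idea of adding the gradient of the classifier log-probability to the sampling velocity.

```python
import math


def base_velocity(x):
    """Stand-in for a learned vector field: pulls every point to the origin."""
    return [-xi for xi in x]


def classifier_log_prob_grad(x, weights):
    """Gradient of log sigmoid(w . x) with respect to x."""
    logit = sum(w * xi for w, xi in zip(weights, x))
    prob = 1.0 / (1.0 + math.exp(-logit))
    return [(1.0 - prob) * w for w in weights]


def sample(x0, weights, guidance_scale, steps=50):
    """Euler integration of the velocity field plus scaled classifier gradient."""
    x = list(x0)
    dt = 1.0 / steps
    for _ in range(steps):
        v = base_velocity(x)
        g = classifier_log_prob_grad(x, weights)
        x = [xi + dt * (vi + guidance_scale * gi) for xi, vi, gi in zip(x, v, g)]
    return x


start = [0.0, 1.0]
toy_weights = [1.0, 0.0]  # toy classifier favours a large first coordinate
unguided = sample(start, toy_weights, guidance_scale=0.0)
guided = sample(start, toy_weights, guidance_scale=2.0)
print(unguided, guided)
```

With guidance switched on, the sample drifts toward the region the toy classifier prefers, while the coordinate the classifier ignores evolves exactly as in the unguided run.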
2025-07-21 15:00:00 15:20:00 03B 3DSIG Molecular design and structure-based modeling with generative deep learning Remo Rohs Jesse Weller, Remo Rohs The rapid expansion of crystal structure data and libraries of readily synthesizable molecules has recently opened up new areas of chemical space for drug discovery. Combined with advancements in virtual ligand screening, these expanded libraries are making an impact in early-stage drug discovery. However, traditional virtual screening methods are still only able to explore a small fraction of the near-infinite drug-like chemical space. Generative deep learning techniques address these limitations by leveraging existing data to learn the key intra- and inter-molecular relationships in drug-target interactions. We present DrugHIVE, a deep hierarchical variational autoencoder that surpasses leading autoregressive and diffusion-based models in both speed and performance on standard generative tasks. Our model generates molecules in a rapid single-shot fashion, making it highly scalable and orders of magnitude faster than other top approaches requiring slow, multi-step inference. DrugHIVE’s hierarchical architecture provides enhanced control over molecular generation, enabling substantial improvements in virtual screening efficiency and automating various drug design processes such as de novo generation, molecular optimization, scaffold hopping, linker design, and high-throughput pattern replacement. We demonstrate an improved ability to optimize drug-like properties, synthesizability, binding affinity, and selectivity of molecules through evolutionary latent space search using both experimentally resolved and AlphaFold predicted receptor structures. Recently, we used DrugHIVE to design novel compounds as prospective therapeutics for the important P53 cancer target. These promising new compounds have been synthesized and are currently undergoing experimental testing.
2025-07-21 15:20:00 15:40:00 03B 3DSIG BC-Design: A Biochemistry-Aware Framework for Highly Accurate Inverse Protein Folding Xiangru Tang Xiangru Tang, Xinwu Ye, Fang Wu, Daniel Shao, Dong Xu, Mark Gerstein Inverse protein folding, which aims to design amino acid sequences for desired protein structures, is fundamental to protein engineering and therapeutic development. While recent deep-learning approaches have made remarkable progress, they typically represent biochemical properties as discrete features associated with individual residues. Here, we present BC-Design, a framework that represents biochemical properties as continuous distributions across protein surfaces and interiors. Through contrastive learning, our model learns to encode essential biochemical information within structure embeddings, enabling sequence prediction using only structural input during inference—maintaining compatibility with real-world applications while leveraging biochemical awareness. BC-Design achieves 88% sequence recovery versus state-of-the-art methods’ 67% (a 21% absolute improvement) and reduces perplexity from 2.4 to 1.5 (39.5% relative improvement) on the CATH 4.2 benchmark. Notably, our model exhibits robust generalization across diverse protein characteristics, performing consistently well on proteins of varying sizes (50-500 residues), structural complexity (measured by contact order), and all major CATH fold classes. Through ablation studies, we demonstrate the complementary contributions of structural and biochemical information to this performance. Overall, BC-Design establishes a new paradigm for integrating multimodal protein information, opening new avenues for computational protein engineering and drug discovery.
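The BC-Design abstract quotes one absolute and one relative improvement. The short sketch below reproduces both calculations from the rounded figures given in the abstract; the function names are illustrative only.

```python
def absolute_gain(new: float, old: float) -> float:
    """Difference between two percentages, in percentage points."""
    return new - old


def relative_reduction(old: float, new: float) -> float:
    """Reduction from old to new, expressed as a percentage of old."""
    return (old - new) / old * 100.0


recovery_gain = absolute_gain(88.0, 67.0)       # sequence recovery, points
perplexity_drop = relative_reduction(2.4, 1.5)  # perplexity, percent of baseline

print(f"sequence recovery gain: {recovery_gain:.1f} points")
print(f"perplexity reduction: {perplexity_drop:.1f}%")
```

From the rounded inputs the relative perplexity reduction comes out at about 37.5%, so the 39.5% stated in the abstract presumably rests on unrounded perplexity values. That is an assumption; the abstract does not say.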
2025-07-21 15:40:00 16:00:00 03B 3DSIG DivPro: Diverse Protein Sequence Design with Direct Structure Recovery Guidance Xinyi Zhou Xinyi Zhou, Guibao Shen, Yingcong Chen, Guangyong Chen, Pheng Ann Heng Motivation: Structure-based protein design, which aims to generate sequences that fold into desired structures, is crucial for designing proteins with novel structures and functions. Current deep learning-based methods primarily focus on training and evaluating models using sequence recovery-based metrics. However, this approach overlooks the inherent ambiguity in the relationship between protein sequences and structures. Relying solely on sequence recovery as a training objective limits the models’ ability to produce diverse sequences that maintain similar structures. These limitations become more pronounced when dealing with remote homologous proteins, which share functional and structural similarities despite low sequence identity. Results: Here, we present DivPro, a model that learns to design diverse sequences that can fold into similar structures. To improve sequence diversity, instead of learning a single fixed sequence representation for an input structure as in existing methods, DivPro learns a probabilistic sequence space from which diverse sequences can be sampled. We leverage recent advancements in in-silico protein structure prediction. By incorporating structure prediction results as training guidance, DivPro ensures that sequences sampled from this learned space reliably fold into the target structure. We conduct extensive experiments on three sequence design benchmarks and evaluate the structures of designed sequences using structure prediction models including AlphaFold2. Results show that DivPro can maintain high structure recovery while significantly improving sequence diversity.
2025-07-21 16:40:00 16:50:00 03B 3DSIG AlphaPulldown2—a general pipeline for high-throughput structural modeling Dmitry Molodenskiy Dmitry Molodenskiy, Valentin Maurer, Dingquan Yu, Grzegorz Chojnowski, Stefan Bienert, Gerardo Tauriello, Konstantin Gilep, Torsten Schwede, Jan Kosinski AlphaPulldown2 streamlines protein structural modeling by automating workflows, improving code adaptability, and optimizing data management for large-scale applications. It introduces an automated Snakemake pipeline, compressed data storage, support for additional modeling backends like AlphaFold3 and AlphaLink2, and a range of other improvements. These upgrades make AlphaPulldown2 a versatile platform for predicting both binary interactions and complex multi-unit assemblies.
2025-07-21 16:50:00 17:00:00 03B 3DSIG Extending 3Di: Increasing Protein Structure Search Sensitivity with a Complementary Alphabet Michel van Kempen Michel van Kempen, Johannes Soeding Fast protein structure search methods, such as Foldseek, are essential to make use of the vast amount of structural information generated by structure prediction methods. In Foldseek, the key idea is to represent structures as sequences of discrete tokens from a structural alphabet, enabling fast searches through structure databases using efficient sequence comparison methods. Foldseek uses the 3Di alphabet for structure representation. However, its structure representation, comprising 20 states, describes only a limited aspect of the overall structure, resulting in lower search sensitivity compared to methods like TMalign or Dali, which use the full structure. To further improve structure search sensitivity, we present a new structural alphabet as an extension to the established 3Di alphabet. Instead of replacing 3Di, our new alphabet was trained to encode structural information complementary to the 3Di states. Combining the two alphabets makes it possible to balance search sensitivity and speed: the 3Di alphabet alone is used for the most time-critical tasks, while the final alignments benefit from additional structural information from both alphabets, improving overall search performance. On the SCOPe dataset, extending the 3Di alphabet with 12 states of the new alphabet increases search sensitivity at the superfamily level by 22%, compared to 13% when adding the amino acid alphabet instead. Moreover, adding the new alphabet as a third alphabet to Foldseek improves its search sensitivity by 4.5%.
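The two-stage use of the alphabets described above (one alphabet for the time-critical step, both for the final score) can be sketched with toy data. This is not Foldseek code: it assumes ungapped, equal-length token strings and simple match/mismatch scores instead of trained substitution matrices, and the token strings and function names are hypothetical.

```python
def match_score(a: str, b: str, match: int = 2, mismatch: int = -1) -> int:
    """Ungapped position-wise score between two equal-length token strings."""
    if len(a) != len(b):
        raise ValueError("token strings must have equal length")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def combined_score(query, target, weight_second=1.0):
    """Sum the scores of both alphabets; each structure is (first, second)."""
    first = match_score(query[0], target[0])
    second = match_score(query[1], target[1])
    return first + weight_second * second


# Toy encodings: (3Di-like string, complementary-alphabet string).
query = ("DDVVLL", "abcabc")
hit_a = ("DDVVLP", "abcabc")  # one first-alphabet mismatch, second alphabet agrees
hit_b = ("DDVVLP", "cbacba")  # same first-alphabet string, second mostly disagrees

# Stage 1: fast prefilter on the first alphabet only (the two hits tie).
prefilter = [match_score(query[0], h[0]) for h in (hit_a, hit_b)]
# Stage 2: rescoring with both alphabets separates them.
rescored = [combined_score(query, h) for h in (hit_a, hit_b)]
print(prefilter, rescored)
```

The two hits are indistinguishable under the first alphabet alone, and the complementary alphabet breaks the tie, which is the intuition behind training the second alphabet to carry information the first one lacks.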
2025-07-21 17:00:00 17:10:00 03B 3DSIG PPI3D clusters: non-redundant datasets of protein-protein, protein-peptide and protein-nucleic acid complexes, interaction interfaces and binding sites Justas Dapkunas Justas Dapkunas, Kliment Olechnovic, Ceslovas Venclovas To accomplish their functions in living organisms, proteins usually interact with various biological macromolecules, including other proteins and nucleic acids. Despite recent progress in structure prediction, only part of these interactions can be predicted accurately, and modeling those involving nucleic acids is especially hard. Therefore, improved computational methods for analysis and prediction of biomolecular interactions are in high demand. The development of such methods largely depends on the availability of reliable data. However, the experimental data in the Protein Data Bank (PDB) are noisy and hard to interpret. To facilitate the analysis of the biomolecular interactions, we developed the PPI3D web resource that is based on a database of clustered non-redundant sets of biomolecular complexes, interaction interfaces and binding sites. The structures are clustered based on both sequence and structure similarity, thus retaining the alternative interaction modes. All protein-protein, protein-peptide and protein-nucleic acid interaction interfaces and binding sites are pre-analyzed by means of Voronoi tessellation. The data are updated every week to keep in sync with the PDB. The users can query the data by different criteria, select the interactions of interest, download the desired data subsets in tabular format and as coordinate files, and use them for detailed investigation of protein interactions or for training the machine learning models. We expect that the PPI3D clusters will become a useful resource for researchers working on diverse problems related to biomolecular interactions. PPI3D is available at http://bioinformatics.ibt.lt/ppi3d/.
2025-07-21 17:10:00 17:30:00 03B 3DSIG From GWAS to Protein Structures: Illuminating Stress Resistance in Plants Su Datt Lam Fatima Shahid, Neeladri Sen, Christine Orengo, Su Datt Lam Plants face significant environmental stress such as pathogens, salinity, drought, and extreme temperatures. To survive, they evolve diverse adaptive mechanisms. Genome-wide association studies (GWAS) are widely used to identify genes linked to stress resistance, but often generate too many variants to interpret easily. This study maps GWAS-derived missense mutations to rice protein structures to prioritise those with functional impact. Despite limited experimentally determined plant protein structures, resources like the AlphaFold Protein Structure Database and The Encyclopedia of Domains (TED) offer high-quality models and domain annotations. We focused on TED domains with reliable structure, excluding those with low pLDDT scores, disorder, poor packing, or non-globular features. Stress-resistance mutations from the GWAS Atlas were then mapped to these domains. Functional sites were predicted using P2Rank and AlphaFill, and proximity of mutations to these sites was analysed. Among 149 mutations mapped to 113 TED domains, 14 were predicted as non-deleterious by MutPred2 (potential gain-of-function variants). Seventy mutations were near predicted functional sites. To explore potential impacts on protein interactions, AlphaFold 3 was used to model 24 protein complexes, and mCSM-PPI2 was used to estimate mutation-induced changes in binding affinity. Some mutations were predicted to enhance protein-protein interactions, and several interesting cases of increased binding to interacting partners will be discussed in the talk.
This is the first study using AlphaFold models to investigate stress-resistance mutations in plants, providing insights into their functional impact and supporting future breeding strategies vital for food security amid climate change.
2025-07-21 17:30:00 17:40:00 03B 3DSIG Chromatin as a Coevolutionary Graph: Modeling the Interplay of Replication with Chromatin Dynamics Sevastianos Korsak Sevastianos Korsak, Krzysztof H Banecki, Karolina Buka, Piotr Górski, Dariusz Plewczynski Modeling DNA replication poses significant challenges due to the intricate interplay of biophysical processes and the need for precise parameter optimization. In this study, we explore the interactions among three key biophysical factors that influence chromatin folding: replication, loop extrusion, and compartmentalization. Replication forks, which act as moving barriers to loop extrusion factors, contribute to the dynamic reorganization of chromatin during S phase. Notably, replication timing is known to correlate with the phase separation of chromatin into A and B compartments. Our approach integrates three components: (1) a numerical model that uses single-cell replication timing data to simulate fork propagation; (2) a stochastic Monte Carlo simulation capturing loop extrusion dynamics, CTCF and fork barriers, and epigenetic state spreading via a Potts Hamiltonian; and (3) a 3D OpenMM simulation that reconstructs chromatin structure based on the resulting state trajectories. In this work, we model the dynamic evolution of chromatin states using co-evolutionary graphs, in which both node and link states evolve stochastically and interactively. These graphs are translated into 3D chromatin structures: links correspond to harmonic bonds representing physical loops, while node states determine compartmental interactions modeled via block-copolymer attractive forces. We reconstruct 3D chromatin trajectories across the cell cycle by incorporating biologically grounded force-field parameters that vary between cell cycle phases to reflect experimentally observed changes in chromatin organization. 
Our framework, to our knowledge the first to dynamically integrate these three biophysical factors, provides new insights into chromatin behavior during replication and reveals how replication stress impacts chromatin organization.
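The node-state part of the chromatin model (state spreading via a Potts Hamiltonian, sampled by Monte Carlo) can be illustrated with a generic Metropolis update on a fixed graph. This is a textbook sketch, not the authors' simulation: it assumes the energy E = -J * (number of links joining equal states), keeps the links fixed instead of co-evolving them, and omits loop extrusion, CTCF and fork barriers, and the OpenMM step. The graph, coupling, and temperature are hypothetical.

```python
import math
import random


def potts_energy(states, edges, coupling=1.0):
    """E = -J * (number of edges whose endpoints share a state)."""
    return -coupling * sum(1 for i, j in edges if states[i] == states[j])


def metropolis_sweep(states, edges, n_states, temperature, rng, coupling=1.0):
    """One sweep: propose a new state per node, accept by the Metropolis rule."""
    neighbours = {i: [] for i in range(len(states))}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    for node in range(len(states)):
        proposal = rng.randrange(n_states)
        old_same = sum(1 for k in neighbours[node] if states[k] == states[node])
        new_same = sum(1 for k in neighbours[node] if states[k] == proposal)
        delta = -coupling * (new_same - old_same)  # energy change of the move
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            states[node] = proposal
    return states


rng = random.Random(0)
n_nodes = 20
# Chain backbone plus two hypothetical loop links.
edges = [(i, i + 1) for i in range(n_nodes - 1)] + [(0, 5), (10, 15)]
states = [i % 2 for i in range(n_nodes)]  # alternating start: no equal neighbours
energy_before = potts_energy(states, edges)
for _ in range(100):
    metropolis_sweep(states, edges, n_states=2, temperature=0.5, rng=rng)
energy_after = potts_energy(states, edges)
print(energy_before, energy_after)
```

Starting from the alternating configuration, which has the highest possible energy under this Hamiltonian, the sweeps tend to grow domains of equal state along the chain and across the loop links.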
2025-07-21 17:40:00 18:00:00 03B 3DSIG RNA-TorsionBERT: leveraging language models for RNA 3D torsion angles prediction Clément Bernard Clément Bernard, Guillaume Postic, Sahar Ghannay, Fariz Tahi Predicting the 3D structure of RNA is an ongoing challenge that has yet to be completely addressed despite continuous advancements. RNA 3D structures rely on distances between residues and base interactions but also backbone torsional angles. Knowing the torsional angles for each residue could help reconstruct its global folding, which is what we tackle in this work. We present a novel approach for directly predicting RNA torsional angles from raw sequence data. Our method draws inspiration from the successful application of language models in various domains and adapts them to RNA. We have developed a language-based model, RNA-TorsionBERT, incorporating better sequential interactions for predicting RNA torsional and pseudo-torsional angles from the sequence only. Through extensive benchmarking, we demonstrate that our method improves the prediction of torsional angles compared to state-of-the-art methods. In addition, by using our predictive model, we have inferred a torsion angle-dependent scoring function, called TB-MCQ, that replaces the true reference angles by our model prediction. We show that it accurately evaluates the quality of near-native predicted structures, in terms of RNA backbone torsion angle values. Our work demonstrates promising results, suggesting the potential utility of language models in advancing RNA 3D structure prediction. The source code is freely available on the EvryRNA platform: https://evryrna.ibisc.univ-evry.fr/evryrna/RNA-TorsionBERT.
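The scoring idea behind TB-MCQ can be sketched as follows, assuming the usual definition of MCQ (mean of circular quantities) as the circular mean of per-angle differences between two structures. The abstract does not spell out the formula, so that definition and the function names are assumptions. In TB-MCQ as described, the reference angles would be the model's predictions instead of experimentally determined values.

```python
import math


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in radians."""
    diff = abs(a - b) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)


def mcq(angles, reference):
    """Circular mean of per-angle differences (radians); lower is closer."""
    if len(angles) != len(reference) or not angles:
        raise ValueError("need two non-empty angle lists of equal length")
    deltas = [angular_difference(a, r) for a, r in zip(angles, reference)]
    sin_mean = sum(math.sin(d) for d in deltas) / len(deltas)
    cos_mean = sum(math.cos(d) for d in deltas) / len(deltas)
    return math.atan2(sin_mean, cos_mean)


# Toy torsion angles in radians (hypothetical values).
structure_angles = [0.0, 1.0, math.radians(350)]
reference_angles = [0.5, 1.5, math.radians(10)]
print(math.degrees(mcq(structure_angles, reference_angles)))
```

The wrap-around handling matters for torsion angles: 350 and 10 degrees differ by 20 degrees, not 340.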
2025-07-22 11:20:00 12:00:00 03B 3DSIG A (bio)computational perspective on protein folding, function and evolution Diego Ulises Ferreiro Diego Ulises Ferreiro Natural protein molecules are amazing objects that somehow compute their structure, dynamics, and activities given a sequence of amino acids and an environment. In turn, protein evolution solves the problem of finding sequences that satisfy the constraints given by biological functions, closing an informational loop that relates an equilibrium thermodynamic system (protein folding) with a non-equilibrium information-gathering and -using system (protein evolution). I will present and discuss results from an information-theory perspective of protein folding, function, and evolution. I will also present extensions of the theory to other terrestrial biopolymers and potential extraterrestrial ones.
2025-07-22 12:00:00 12:20:00 03B 3DSIG Structural Phylogenetics: toward an evolutionary model capturing both sequence and structure David Moi David Moi, Christophe Dessimoz Inferring deep phylogenetic relationships between proteins requires methods that can capture the iterative optimisation of the final folded protein object through evolution. This entails considering both sequence and structure information. While sequence-based phylogenetics has long been the standard, recent progress in structure prediction and modeling has opened new opportunities to harness 3D structural information in tree reconstruction. We recently introduced FoldTree, a practical framework for structure-based phylogenetics. Central to FoldTree is a robust benchmarking strategy that enables fair comparison between sequence and structure-based methods — a critical step given their fundamentally different inputs. Using local structural alphabets derived from protein geometry, FoldTree not only outperforms conventional sequence-based approaches for remote homologs but, surprisingly, also improves phylogenetic resolution among relatively close relatives. Its success has enabled novel evolutionary insights, such as clarifying the diversification of RRNPPA quorum-sensing receptors across bacteria, plasmids, and phages. Building on this foundation, we now introduce a new generation of structural alphabets developed using graph neural networks (GNNs). In this approach, protein structures are represented as graphs where residues are nodes labeled with physicochemical and geometric features, and edges encode diverse relationships such as spatial proximity, hydrogen bonding, or allosteric coupling. These alphabets capture both local residue environments and the broader network of structural constraints, bridging the gap between sequence and structure information and enabling integrative phylogenetic inference from sequence and structure. 
Together, these developments chart a path towards integrative sequence and structural phylogenetics, expanding the reach of evolutionary inference beyond the twilight zone of sequence similarity.
2025-07-22 12:20:00 12:40:00 03B 3DSIG The structural and functional plasticity of the GNAT fold: A case of convergent evolution Joel Roca Martinez Joel Roca Martinez, Hazel Leiva, Jialin Lin, Misty L Kuhn, Christine Orengo Spermine/spermidine acetyltransferases (SSATs) are members of the highly diverse Gcn5-related N-acetyltransferase (GNAT) superfamily, which ranks among the top 1% most structurally and sequence-diverse families in the CATH database. Prior studies have shown that while bacterial and eukaryotic SSAT enzymes catalyze the same reaction, they differ in residue conservation patterns, oligomeric states, and presence of allosteric sites. This raises the question of whether their functional similarity reflects convergent or divergent evolution. To investigate this, we utilized complementary in silico and in vitro experimental approaches. In silico experiments included analyzing ~37,000 GNAT sequences using AlphaFold2 modelling and additional sequence- and structure-based tools including FunFamer and FunTuner (in-house tools), Zebra3D, and IQ-TREE. A total of 71 SSAT enzymes were selected for in vitro experimental validation whereby substrate screening and enzyme kinetic assays showed distinct substrate preferences linked to specificity-determining residues and structural features. Our results support a model of convergent evolution between bacterial SpeG and human SSAT1 enzymes, with additional subfamilies showing divergent evolutionary paths. This work highlights the evolutionary plasticity of the GNAT fold and demonstrates how integrating computational and experimental strategies can uncover functional insights in large, diverse enzyme families.
2025-07-22 12:40:00 13:00:00 03B 3DSIG Virus targeting as a dominant driver of interfacial evolution in the structurally resolved human-virus protein-protein interaction network Wan-Chun Su Wan-Chun Su, Yu Xia The competitive nature of host-virus protein-protein interactions drives an ongoing evolutionary arms race between hosts and viruses. The surface regions on a host protein that interact with virus proteins (exogenous interfaces) frequently overlap with those that interact with other host proteins (endogenous interfaces), forming interfaces that are shared between virus and host protein partners (mimic-targeted interfaces). This phenomenon, referred to as interface mimicry, is a common strategy used by viruses to invade and exploit the cellular pathways of host organisms. Yet, the quantitative evolutionary consequences of interface mimicry on the host are not well-understood. Here, we integrate experimentally determined 3D structures and homology-based molecular templates of protein complexes with protein-protein interaction networks to construct a high-resolution human-virus structural interaction network. We perform rigorous site-specific evolutionary analyses on this structural interaction network and find that exogenous-specific interfaces evolve significantly faster than endogenous-specific interfaces. Surprisingly, mimic-targeted interfaces are as fast evolving as exogenous-specific interfaces, despite being targeted by both human and virus proteins. Moreover, we find that rapidly evolving mimic-targeted interfaces bound by human viruses are only visible in the mammalian lineage. Our findings suggest that virus targeting exerts an overwhelming influence on host interfacial evolution, within the context of domain-domain interactions, and that mimic-targeted interfaces on human proteins are the key battleground for a mammalian-specific host-virus evolutionary arms race. 
Overall, our study provides insights into the selective pressures that viruses impose on their hosts at the protein residue level, enabling a quantitative and systematic understanding of host-pathogen interaction and evolution.
2025-07-22 14:00:00 14:20:00 03B 3DSIG Novel structural arrangements from a billion-scale protein universe Nicola Bordin Jingi Yeo, Yewon Han, Nicola Bordin, Andy M. Lau, Shaun Kandathil, Hyunbin Kim, Milot Mirdita, David Jones, Christine Orengo, Martin Steinegger Recent advances in protein structure prediction by AlphaFold2 and ESMFold have massively expanded the known protein structural landscape. The AlphaFold Protein Structure Database (AFDB) now contains over 200 million models, while ESMAtlas hosts more than 600 million predicted structures from metagenomic data in MGnify. These resources span diverse taxonomic groups, including many unculturable species, and reveal previously unknown evolutionary relationships and structural arrangements. To harness this data, new computational strategies for classification and comparison are essential. We clustered ESMAtlas using Foldseek Cluster, identifying 72 million structure clusters and mapping their distribution across taxa. This uncovered novel evolutionary patterns, such as structural analogs in extreme environments and new domain combinations absent from PDB-based databases like CATH. In parallel, The Encyclopedia of Domains (TED) systematically classifies protein domains across the AFDB and reveals over 365 million domains, far surpassing traditional sequence-based methods. More than 100 million of these domains were previously undetected, underscoring the power of structure-based approaches in expanding known domain space. Together, ESMAtlas and TED help chart uncharted structural territory. ESMAtlas proteins enrich the known protein universe with unique domain architectures, while TED reveals thousands of putative new folds. These breakthroughs demonstrate how multidomain proteins evolve through novel fold combinations and packing geometries.
Both efforts also reveal patterns of domain exclusivity, lineage-specific architectures, and structural convergence across the Tree of Life, suggesting environmental adaptations and ancient, conserved folds crucial for cellular function. By uncovering new domain arrangements and interactions, we approach a comprehensive map of the protein universe.
2025-07-22 14:20:00 14:40:00 03B 3DSIG Towards a comprehensive view of the pocketome universe – biological implications and algorithmic challenges Hanne Zillmer Hanne Zillmer, Dirk Walther With the availability of reliably predicted 3D structures for essentially all known proteins, characterizing the entirety of protein–small-molecule interaction sites (binding pockets) has become a possibility. The aim of this study was to identify and analyze all compound-binding sites, i.e. the pocketomes, of eleven species from different kingdoms of life to discern evolutionary trends as well as to arrive at a global cross-species view of the pocketome universe. All protein structures available in the AlphaFold database for each species were subjected to computational binding site predictions. The resulting set of potential binding sites was inspected for overlaps with known pockets and annotated with regard to the protein domains. Projecting all pockets, embedded in a 128-dimensional feature space, into 2D and characterizing them with regard to selected physicochemical properties yielded informative, global pocketome maps that reveal differentiating features between pockets. By clustering all pockets within species, our study revealed a sub-linear scaling law of the number of unique binding sites relative to the number of unique protein structures per species. Thus, the number of distinct binding sites grows less than proportionally with proteome size. We discuss the significance of this finding and identify critical, unmet algorithmic challenges.
2025-07-22 14:40:00 14:50:00 03B 3DSIG Towards a Biophysical Description of the Protein Universe Miguel Fernandez-Martin Miguel Fernandez-Martin, Nicola Bordin, Christine Orengo, Alfonso Valencia, Gonzalo Parra Understanding how protein families evolve and function remains a central question in molecular biophysics. By grouping evolutionarily related proteins into Functional Families (FunFams), CATH captures structural and functional conservation beyond sequence identity. By integrating AlphaFold2 models, CATH offers a representative view of the protein universe. Our group has developed a methodology to quantify local frustration conservation patterns in protein families, providing a biophysical interpretation of evolutionary constraints related to foldability, stability and function. In this study, we scaled frustration conservation analysis to a representative portion of the protein universe. We analyzed over 8,900 FunFams (2.2M sequences) from CATH and TED, and explored the distributions of frustration and amino acid identity across the 20 Foldseek 3Di tertiary neighborhoods. We investigated how these geometries influence conservation patterns and found that some amino acids (e.g. C, V, L, F, I, M) are conserved in a minimally frustrated state, indicating their evolutionary importance as structural anchors. Other residues (e.g. T, S, H, G) tend to be conserved in a neutral state, which has historically been overlooked, suggesting that neutral frustration is not just an energetic buffering state but an evolutionarily constrained one. Additionally, some residues (e.g. D, K, E, N, Q) exhibit high proportions of conserved high frustration, potentially relevant for function. We present the first large-scale frustration survey of the protein universe, which allows us to distinguish whether sequence conservation reflects stability, neutrality or function.
This framework offers a new way of interpreting conservation and lays the foundation for a biophysically informed understanding of protein evolution.
2025-07-22 14:50:00 15:00:00 03B 3DSIG Computational methods for the characterisation and evaluation of protein-ligand binding sites Javier Sánchez Utgés Javier Sánchez Utgés, Stuart MacGowan, Geoff Barton Fragment screening is used for hit identification in drug discovery, but it is often unclear which binding sites are functionally relevant. Here, data from 37 experiments are analysed. A method to group ligands by protein interactions is introduced, and sites are clustered by their solvent accessibility. This identified 293 ligand sites, grouped into four clusters. C1 includes buried, conserved, missense-depleted sites and is enriched in known functional sites. C4 comprises accessible, divergent, missense-enriched sites and is depleted in function. This approach is extended to the entire PDB, resulting in the LIGYSIS dataset, accessible through a new web server. LIGYSIS-web hosts a database of 65,000 protein-ligand binding sites across 25,000 proteins. LIGYSIS sites are defined by aggregating unique relevant protein-ligand interfaces across multiple structures. Additionally, users can upload structures for analysis, results visualisation and download. Results are displayed in LIGYSIS-web, a Python Flask web application. Finally, the human component of LIGYSIS, comprising 6800 binding sites across 2775 proteins, is employed to perform the largest benchmark of ligand site prediction to date. Thirteen canonical methods and fifteen novel variants are evaluated using fourteen metrics. Additionally, LIGYSIS is compared to datasets like PDBbind or MOAD and shown to be superior, since it considers non-redundant interfaces across biological assemblies. Re-scored fpocket predictions present the highest recall (60%). The detrimental effect of redundant predictions on performance and the beneficial impact of stronger pocket scoring schemes are demonstrated. To conclude, top-N+2 recall is proposed as a robust benchmark metric, and authors are encouraged to share their benchmark code.
2025-07-22 15:00:00 15:20:00 03B 3DSIG ScGOclust: leveraging gene ontology to find functionally analogous cell types between distant species Yuyao Song Yuyao Song, Yanhui Hu, Julian Dow, Norbert Perrimon, Irene Papatheodorou Basic biological processes are shared across animal species, yet their cellular mechanisms are profoundly diverse. Comparing cell-type gene expression between species reveals conserved and divergent cellular functions. However, as phylogenetic distance increases, gene-based comparisons become less informative. The Gene Ontology (GO) knowledgebase offers a solution by serving as the most comprehensive resource of gene functions across a vast diversity of species, providing a bridge for distant species comparisons. Here, we present scGOclust, a computational tool that constructs de novo cellular functional profiles using GO terms, facilitating systematic and robust comparisons within and across species. We applied scGOclust to analyse and compare the heart, gut and kidney between mouse and fly, and whole-body data from C. elegans and H. vulgaris. We show that scGOclust effectively recapitulates the functional spectrum of different cell types, characterises functional similarities between homologous cell types, and reveals functional convergence between unrelated cell types. Additionally, we identified subpopulations within the fly crop that show circadian rhythm-regulated secretory properties and hypothesize an analogy between fly principal cells from different segments and distinct mouse kidney tubules. We envision scGOclust as an effective tool for uncovering functionally analogous cell types or organs across distant species, offering fresh perspectives on evolutionary and functional biology.
2025-07-22 15:20:00 15:40:00 03B 3DSIG Mapping and characterization of the human missense variation universe using AlphaFold 3D models Alessia David Gordon Hanna, Elbert Timothy, Suhail A Islam, Michael Sternberg, Alessia David The deep learning algorithm AlphaFold has revolutionized the field of structural biology by producing highly accurate three-dimensional models of the proteome, thus providing a unique opportunity for atom-based analysis of human missense variants. Current variant prediction tools, such as REVEL, EVE and AlphaMissense, have significantly improved the prediction of damaging amino acid substitutions, but do not explain the mechanism by which these variants impact the phenotype, which, in most cases, remains elusive. We developed a pipeline to automatically identify accurately modelled amino acid regions that can be used for variant characterization. The recommended AlphaFold pLDDT threshold for an accurately modelled residue is >=70. When using this threshold for the query residue, the accuracy of the atom-based predictions calculated using our in-house variant prediction algorithm Missense3D is 0.66, MCC 0.36, TPR/FPR 5.1. We show that, when the model accuracy of the environment surrounding the query residue (E-plDDT5A) is considered, an E-plDDT5A >=60 provides similar accuracy, MCC and TPR/FPR to that obtained using the pLDDT threshold >=70 for the query residue alone, but increases the number of residues for which an atom-based analysis can be performed. When using this new E-plDDT5A >=60 threshold, >68% of the human proteome and >4 million missense variants can be modelled with sufficient quality to allow an atom-based analysis. In conclusion, AlphaFold 3D models offer a unique opportunity to understand the consequences of amino acid substitutions on protein structure, thus complementing existing evolutionary-based methods.
2025-07-22 15:40:00 16:00:00 03B 3DSIG CATH-ddG: towards robust mutation effect prediction on protein–protein interactions out of CATH homologous superfamily Guanglei Yu Guanglei Yu, Xuehua Bi, Teng Ma, Yaohang Li, Jianxin Wang Motivation: Protein-protein interactions (PPIs) are fundamental to understanding biological processes. Accurately predicting the effects of mutations on PPIs remains a critical requirement for drug design and disease mechanistic studies. Recently, deep learning models using protein 3D structures have become predominant for predicting mutation effects. However, significant challenges remain in practical applications, in part due to the considerable disparity in generalization capabilities between easy and hard mutations. Specifically, a hard mutation is defined as one with its maximum TM-score < 0.6 when compared to the training set. Additionally, compared to physics-based approaches, deep learning models may overestimate performance due to potential data leakage. Results: We propose new training/test splits that mitigate data leakage according to the CATH homologous superfamily. Under the constraints of physical energy, protein 3D structures and CATH domain objectives, we employ a hybrid noise strategy as data augmentation and present a geometric encoder scenario, named CATH-ddG, to represent the mutational microenvironment differences between wild-type and mutated protein complexes. Additionally, we fine-tune ESM2 representations by incorporating a lightweight nonlinear module to achieve the transferability of sequence co-evolutionary information. Finally, our study demonstrates that the CATH-ddG framework provides enhanced generalization by outperforming other baselines on non-superfamily leakage splits, which plays a crucial role in exploring robust mutation effect regression prediction.
Independent case studies demonstrate successful enhancement of binding affinity on 419 antibody variants to human epidermal growth factor receptor 2 (HER2) and 285 variants in the receptor-binding domain (RBD) of SARS-CoV-2 to angiotensin-converting enzyme 2 (ACE2) receptor.
2025-07-22 16:40:00 17:00:00 03B 3DSIG Investigating Enzyme Function by Geometric Matching of Catalytic Motifs Raymund Hackett Raymund Hackett, Martin Larralde, Ioannis Riziotis, Janet Thornton, Georg Zeller Detecting catalytic features in protein structures can provide important hints about enzyme function and mechanism. Keeping pace with the rapidly growing universe of predicted protein structures requires computationally fast and scalable but interpretable tools. A library of 3D coordinates describing enzyme catalytic sites, referred to as templates, has been collected from manually curated and literature annotated examples of enzyme catalytic mechanisms described in the Mechanism and Catalytic Site Atlas. We provide this library of templates and a fast and modular python tool implementing the geometric matching algorithm Jess to identify matching catalytic sites in both experimental and predicted protein structures. We implement stringent match filtering to reduce the number of false matches occurring by chance. We validated this method against a non-redundant set of high quality experimental and predicted enzyme structures with well annotated catalytic sites. Geometric, knowledge based criteria are used to differentiate catalytically informative matches from spurious ones. We show that structurally matching catalytic templates is more sensitive than sequence based and even some structure based approaches in identifying homology between extremely distant enzymes. Since geometric matching does not depend on conserved sequence motifs or even common evolutionary history, we are able to identify examples of structural active site similarity in divergent and possibly convergent enzymes. Such examples make interesting case studies into the ancestral evolution of enzyme function. While insufficient for detecting and characterising substrate specific binding sites, this methodology could be suitable for expanding the annotation of enzyme active sites across proteomes.
2025-07-22 17:00:00 17:20:00 03B 3DSIG Cellular location shapes quaternary structure of enzymes. Gyorgy Abrusan Gyorgy Abrusan, Aleksej Zelezniak The main forces driving protein complex evolution are currently not well understood, especially in homomers, where quaternary structure might frequently evolve neutrally. Here we examine the factors determining oligomerisation by analysing the evolution of enzymes in circumstances where homomers rarely evolve. We show that 1) in extracellular environments, most enzymes with known structure are monomers, while in the cytoplasm most are homomers, indicating that the evolution of oligomers is cellular environment dependent; 2) the evolution of quaternary structure within protein orthogroups is more consistent with the predictions of constructive neutral evolution than with an adaptive process: quaternary structure is gained more easily than it is lost, and most extracellular monomers evolved from proteins that were also monomers in their ancestral state, without the loss of interfaces. Our results indicate that oligomerisation is context-dependent, and even when adaptive, in many cases it is probably not driven by the intrinsic properties of enzymes, like their biochemical function, but rather by the properties of the environment where the enzyme is active. These factors might be macromolecular crowding and excluded volume effects facilitating the evolution of interfaces, and the maintenance of cellular homeostasis through shaping cytoplasm fluidity, protein degradation, or diffusion rates.
2025-07-22 17:20:00 17:40:00 03B 3DSIG In silico design of stable single-domain antibodies with high affinity Gabriel Cia Gabriel Cia, Frederic Rousseau, Luis Serrano Pubul, Joost Schymkowitz, Maarten Dewilde, Savvas N. Savvides, Alexander N. Volkov, Carlo Carolis, Nick Geukens, Zhongyao Zhang, Gabriele Orlando, Damiano Cianferoni, Javier Delgado Blanco, Katerina Maragkou, Teresa Garcia, David Vizarraga, Iva Marković, Rob Van der Kant Monoclonal antibodies are rapidly becoming a standard drug format in the pharmaceutical industry, but current immunization-based methods for antibody discovery often present limitations in terms of developability, binding affinity, cross-reactivity and, importantly, selectively targeting a prespecified epitope. Given these limitations, rational optimization and de novo design of antibodies with computational methods is becoming an attractive alternative to traditional antibody development methods. While recent deep learning methods have shown tremendous progress for protein design, their application to therapeutic antibody formats remains one of the major open challenges in the field. Here, we present EvolveX, a structure-based computational pipeline for antibody optimization and de novo design. EvolveX is a multi-objective optimization algorithm incorporating CDR modeling, biophysical parameters from the FoldX empirical force field and developability features into a single unified antibody design pipeline. We experimentally validated the ability of EvolveX to optimize the affinity and stability of a nanobody targeting mouse Vsig4 and, more challengingly, its ability to redesign the nanobody to bind the human Vsig4 ortholog with very high affinity compared to the wildtype nanobody, resulting in a 1000-fold improved Kd. Structural analyses by X-ray crystallography and NMR confirmed the accuracy of the predicted designs, which display optimized interactions with the antigen. 
Collectively, our study highlights EvolveX’s potential to overcome current limitations in antibody design, offering a powerful tool for the development of next-generation therapeutics with enhanced specificity, stability, and efficacy.
2025-07-22 17:40:00 18:00:00 03B 3DSIG An improved deep learning model for immunogenic B epitope prediction Rakshanda Sajeed Rakshanda Sajeed, Swatantra Pradhan, Rajgopal Srinivasan, Sadhna Rana The recognition of B epitopes by B cells of the immune system initiates an immune response that leads to the production of antibodies to combat bacterial and viral infections. The development of computational methods for predicting the epitopes on antigens has shown promising results in the development of subunit vaccines and therapeutics. Recently, the use of protein language models (pLMs) for epitope prediction has led to a substantial increase in prediction accuracy. However, precision needs to improve greatly for these methods to be useful in practical applications. Here, we develop and evaluate a series of models using different combinations of features and feature fusion techniques on a curated independent test set. Our results show that models that use both protein embeddings and structural features perform better at predicting B epitopes than the baseline model that uses only protein embeddings as features. We also show, from the attention analysis of B and T epitopes, that the evolutionary scale model ESM-2 implicitly captures T-B reciprocity, as a large fraction of high-scoring B epitopes are highly attended by the T epitopes.
