The SciFinder tool lets you search Titles, Authors, and Abstracts of talks and panels. Enter your search term below and your results will be shown at the bottom of the page. You can also click on a track to see all the talks given in that track on that day.

Results

July 16, 2024
8:40-9:20
Invited Presentation: Metallic origins of life
Confirmed Presenter: Yana Bromberg, Emory University, USA
Track: 3DSIG

Room: 520a
Format: In Person
Moderator(s): Gonzalo Parra


Authors List:

  • Yana Bromberg, Emory University

Presentation Overview:

How did life appear on our planet? Alexander Oparin’s 1924 theory of abiotic evolution of carbon-based molecules in a primordial soup suggests a means to the end. However, the evolutionary path beyond formation of individual molecules remains one of the most profoundly unanswered questions in biology. Biologically catalyzed redox reactions, i.e. proton-coupled electron transfer, drive the energy requirements of all life on Earth, implying that they must have been among the first functionalities acquired by early life.
We aimed to explore the patterns of evolution of redox-driving proteins, i.e. oxidoreductases. The billions of years’ worth of divergence among existing oxidoreductases renders sequence similarity metrics inapplicable. Thus, we incorporated structure into our explorations. We found that the peptide structures that bind transition metals, ubiquitous in redox, have similar topologies across the full diversity of existing metal-binding proteins. The similarity between these peptides strongly suggests that metal binding had a small number of common origins. Moreover, folds central to our network of similarities came primarily from oxidoreductases, further confirming the idea that ancestral peptides facilitated electron transfer reactions. We further note that most (>85%) of the experimentally determined protein structures incorporate similar folds, suggesting that metal-binding may have given rise to much more functionality. Finally, our results suggest that the earliest, biologically-functional peptides were likely available prior to the assembly of the first fully functional protein domains over 3.8 billion years ago.

July 16, 2024
9:20-9:40
De Novo Atomic Protein Structure Modeling for Cryo-EM Density Maps Using 3D Transformer and Hidden Markov Model
Confirmed Presenter: Jianlin Cheng, University of Missouri - Columbia, United States
Track: 3DSIG

Room: 520a
Format: In Person
Moderator(s): Gonzalo Parra


Authors List:

  • Jianlin Cheng, University of Missouri - Columbia
  • Nabin Giri, University of Missouri - Columbia

Presentation Overview:

Accurately building three-dimensional (3D) atomic structures from 3D cryo-electron microscopy (cryo-EM) density maps is a crucial step in the cryo-EM-based determination of the structures of protein complexes. Despite improvements in the resolution of 3D cryo-EM density maps, the de novo conversion of density maps into 3D atomic structures for protein complexes that do not have accurate homologous or predicted structures to be used as templates remains a significant challenge. Here, we introduce Cryo2Struct, a fully automated ab initio cryo-EM structure modeling method that first uses a 3D transformer to identify atoms and amino acid types in cryo-EM density maps, and then employs a novel Hidden Markov Model (HMM) to connect predicted atoms to build backbone structures of proteins. Tested on a standard test dataset of 128 cryo-EM density maps with varying resolutions (2.08 - 5.6 Å) and different numbers of residues (448 - 8,416), Cryo2Struct built substantially more accurate and complete protein structural models than the widely used ab initio method Phenix according to multiple evaluation metrics. Moreover, on a new test dataset of 500 recently released density maps with varying resolutions (1.9 - 4.0 Å) and different numbers of residues (234 - 8,828), its performance in building atomic structural models is rather robust against changes in the resolution of density maps and the size of protein structures.

July 16, 2024
9:40-10:00
Proceedings Presentation: RiboDiffusion: Tertiary Structure-based RNA Inverse Folding with Generative Diffusion Models
Confirmed Presenter: Han Huang, The Chinese University of Hong Kong, Hong Kong
Track: 3DSIG

Room: 520a
Format: Live Stream
Moderator(s): Gonzalo Parra


Authors List:

  • Han Huang, The Chinese University of Hong Kong
  • Ziqian Lin, Nanjing University
  • Dongchen He, The Chinese University of Hong Kong
  • Liang Hong, The Chinese University of Hong Kong
  • Yu Li, The Chinese University of Hong Kong

Presentation Overview:

RNA design shows growing applications in synthetic biology and therapeutics, driven by the crucial role of RNA in various biological processes. A fundamental challenge is to find functional RNA sequences that satisfy given structural constraints, known as the inverse folding problem. Computational approaches have emerged to address this problem based on secondary structures. However, designing RNA sequences directly from 3D structures is still challenging, due to the scarcity of data, the non-unique structure-sequence mapping, and the flexibility of RNA conformation. In this study, we propose RiboDiffusion, a generative diffusion model for RNA inverse folding that can learn the conditional distribution of RNA sequences given 3D backbone structures. Our model consists of a graph neural network-based structure module and a Transformer-based sequence module, which iteratively transforms random sequences into desired sequences. By tuning the sampling weight, our model allows for a trade-off between sequence recovery and diversity to explore more candidates. We split test sets based on RNA clustering with different cut-offs for sequence or structure similarity. Our model outperforms baselines in sequence recovery, with an average relative improvement of 11% for sequence similarity splits and 16% for structure similarity splits. Moreover, RiboDiffusion performs consistently well across various RNA length categories and RNA types. We also apply in-silico folding to validate whether the generated sequences can fold into the given 3D RNA backbones. Our method could be a powerful tool for RNA design that explores the vast sequence space and finds novel solutions to 3D structural constraints.
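The sequence recovery metric mentioned above can be illustrated with a minimal sketch. It assumes recovery means the fraction of positions where a generated sequence matches the native one, and that "relative improvement" is measured against a baseline's recovery; the function names and example sequences are hypothetical and not taken from RiboDiffusion.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of aligned positions where the designed base equals the native base."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def relative_improvement(model: float, baseline: float) -> float:
    """Relative gain of a model's score over a baseline's score (0.11 means 11%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (model - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical 9-nucleotide example with a single mismatch.
    native = "GGCUAAGCC"
    designed = "GGCUUAGCC"
    print(f"recovery = {sequence_recovery(designed, native):.3f}")
```

Averaging this value over many sampled sequences per backbone would give one number per test split, which is the kind of figure a recovery comparison rests on.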

July 16, 2024
10:40-10:50
Positional Protein Bioinformatics: A universal residue numbering scheme for the Immunoglobulin (Ig) fold enables its systemic detection in the protein universe.
Confirmed Presenter: Philippe Youkharibache, National Cancer Institute, NIH
Track: 3DSIG

Room: 520a
Format: In Person
Moderator(s): Chris Kieslich


Authors List:

  • Caesar Tawfeeq, California State University Northridge
  • Jiyao Wang, NIH/NCBI
  • Thomas Madej, NCBI/NLM/NIH
  • Umesh Khaniya, National Cancer Institute
  • James Song, NCBI/NLM/NIH
  • Ravi Abrol, California State University Northridge
  • Philippe Youkharibache, National Cancer Institute

Presentation Overview:

The Immunoglobulin fold (Ig-fold) is the most populous fold in the human proteome, found in proteins from all domains of life, with current (under)estimates ranging from 2 to 3% of protein coding regions. The ability of Ig-domains to reliably fold and self-assemble through highly specific interfaces represents a remarkable property of these domains that makes them key elements of molecular interaction systems: the immune system, the nervous system, the muscular system and the vascular system. We define a universal sequence numbering scheme, called “IgStRAnD” (Immunoglobulin Strand Residue Anchor Dependent), to represent all domains sharing the Ig-fold. IgStrand numbering enables comparative structural, functional, and evolutionary analyses through positional comparisons between any Ig-domain variant across the universe of Ig-domains. It enables the systematic study of the Ig-proteome and associated Ig-Ig interactomes and sheds light on the robust Ig protein folding algorithm used by nature to form beta sandwich supersecondary structures, responsible for what may be convergent evolution for many of the more than 300 superfamilies sharing the fold. The numbering scheme is at the heart of an algorithm implemented in the interactive structural analysis software iCn3D to systematically recognize Ig-domains, to annotate them, and to perform detailed comparisons in sequence, topology, and structure, regardless of their tertiary plasticity and quaternary organizations. We performed a (preliminary) survey of the human proteome of over 80,000 protein structures leading to a surprisingly higher number of proteins having co-opted Ig-, Ig-like and Ig-extended domains than was estimated in the original human genome survey.

July 16, 2024
10:50-11:10
ImmunoMatch: Illuminating the design of antibody heavy and light chain pairs using deep learning approaches and structure analysis
Confirmed Presenter: Dongjun Guo, King's College London, United Kingdom
Track: 3DSIG

Room: 520a
Format: In Person
Moderator(s): Chris Kieslich


Authors List:

  • Dongjun Guo, King's College London
  • Joseph Ng, University College London
  • Deborah Dunn-Walters, University of Surrey
  • Franca Fraternali, University College London

Presentation Overview:

Antibodies are composed of heavy (H) and light (L) chains. Sequence variations of H and L chains therefore combinatorially contribute to a diverse antibody “repertoire” for eliciting responses against a variety of antigens. How an H chain chooses its L chain partner is still under debate. Little attention has been paid to the exact amino acid preferences and their relative importance in the H-L protein interface. Our results illustrate molecular rules governing antibody H-L chain pairing preferences.
Here we present ImmunoMatch, a heavy-light chain pairing prediction tool that takes advantage of recently published antibody language models. We capitalise on the increase in single-cell, paired H-L antibody repertoire data and build the model to distinguish cognate H-L pairs from random synthetic pairs, achieving an AUC of 0.75. We assembled an antibody structure database (VCAb: https://fraternalilab.cs.ucl.ac.uk/VCAb/) for external validation and further structural interpretation of the pairing prediction. We show that our model, trained on the human antibody repertoire, performs well on human and humanized antibodies, while its performance drops when detecting cognate pairs from mouse and chimeric antibodies. We took one therapeutic antibody (trastuzumab) for further analysis, searching through the potential mutation space using ImmunoMatch and extracting the attention matrix, and found that positions which can increase or decrease the H-L pairing likelihood cluster around the CDR loops and the H-L interface. These results highlight the necessity of considering the entire antibody sequence in antibody design by pre-excluding unlikely H-L combinations in the pipeline for better developability.
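To make the AUC figure concrete, the sketch below computes the area under the ROC curve as the probability that a randomly chosen cognate pair receives a higher score than a randomly chosen synthetic pair, with ties counted as one half. This is the standard rank-based definition; the scores are invented for illustration and do not come from ImmunoMatch.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as P(score of a positive > score of a negative), counting ties as 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical pairing scores: cognate H-L pairs vs. randomly shuffled pairs.
    cognate = [0.9, 0.8, 0.4]
    shuffled = [0.7, 0.3, 0.2, 0.1]
    print(f"AUC = {roc_auc(cognate, shuffled):.3f}")
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so a value of 0.75 means a cognate pair outranks a synthetic pair in three out of four comparisons.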

July 16, 2024
11:10-11:30
Exploring the biophysical boundaries of protein families with deep learning methods
Confirmed Presenter: Miriam Poley-Gil, Computational Biology Group, Life Sciences Department
Track: 3DSIG

Room: 520a
Format: In Person
Moderator(s): Chris Kieslich


Authors List:

  • Miriam Poley-Gil, Computational Biology Group
  • Maria I. Freiberger, Department of Biological Chemistry
  • Alin Banka, Department of Informatics
  • Michael Heinzinger, Department of Informatics
  • Noelia Ferruz, Artificial Intelligence for Protein Design Group
  • Alfonso Valencia, Computational Biology Group
  • R. Gonzalo Parra, Computational Biology Group

Presentation Overview:

Recently, Deep Learning models have revolutionised the Molecular Biology field allowing us to explore the intricate interplay between protein sequence, structure and function faster. To understand what they are capturing and generating we have combined state-of-the-art protein models for inverse folding (such as ProstT5[1] and ProteinMPNN[2]) and for sequence generation (such as ProtGPT2[3] and ZymCTRL[4]) with biophysical analyses (Figure 1).
We have studied conservation patterns of local energetic frustration in artificial datasets to shed light on the evolutionary processes leading to the diversification of some protein families, under the assumption that proteins are optimised for folding and stability, but also evolutionarily selected to function. We have developed a tool called FrustraEvo[5] that measures such conservation within and between protein families (available in full on the server https://frustraevo.qb.fcen.uba.ar/).
We found that most of the highly frustrated native residues are related to functional aspects. These functional residues are mostly recovered by sequence generation models, suggesting that there are alternative ways to design proteins beyond those explored by evolution. Catalytic sites are also recovered by inverse folding models. We therefore point to a selective memory concerning functionality, that is, a primary (local) level of memory. ProteinMPNN, however, also recovers the main network of frustrated contacts of the functional domains, suggesting a tertiary (contact-level) memory as well. Thus, our approach promises to effectively unravel the intricacies of protein family boundaries and explore design options for understanding protein evolution.

July 16, 2024
11:30-11:40
Can proteins be represented through secondary structures?
Confirmed Presenter: Michael Schroeder, TU Dresden, Germany
Track: 3DSIG

Room: 520a
Format: In Person
Moderator(s): Chris Kieslich


Authors List:

  • Ali Al-Fatlawi, TU Dresden
  • Michael Schroeder, TU Dresden

Presentation Overview:

Recent advancements in protein classification, driven by Foldseek for tertiary structure-based searches, raise the question of whether a simplified secondary structure format is enough for classification and functional inference, eliminating the need for confident tertiary structure determination. This paper explores this debate using a sequence format where 'H' denotes helices, 'S' represents strands, and 'L' signifies loops/turns for each amino acid's secondary structure. Through an all-versus-all comparison using CATH and SCOPe datasets, the approach, though slightly less accurate than tertiary structure-based classification, advocates for a simple, informative representation of proteins, maintaining 90%-93% of tertiary structure performance. This invites the development of a search engine for all secondary structure sequences, facilitating a simple, efficient, and rapid protein search with minimized information requirements.
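The H/S/L sequence format described above can be sketched as a simple per-residue mapping. The example assumes the input is a DSSP-style eight-state string and uses one common reduction (helix-like codes to 'H', strand and bridge codes to 'S', everything else to 'L'); the abstract does not say which tool or reduction the authors used, so both are assumptions.

```python
# Assumed reduction of DSSP eight-state codes to the three-letter H/S/L alphabet.
# Note that DSSP's own 'S' code means "bend", which falls under loops ('L') here,
# whereas 'S' in the output alphabet means strand.
DSSP_TO_HSL = {
    "H": "H",  # alpha helix
    "G": "H",  # 3-10 helix
    "I": "H",  # pi helix
    "E": "S",  # extended strand
    "B": "S",  # isolated beta bridge
}


def to_hsl(dssp: str) -> str:
    """Map a per-residue DSSP string to H (helix), S (strand), L (loop/turn)."""
    return "".join(DSSP_TO_HSL.get(code, "L") for code in dssp)


if __name__ == "__main__":
    # Hypothetical DSSP assignment for a 14-residue fragment.
    print(to_hsl("-HHHHTT-EEEE-S"))
```

Because the result is an ordinary string over a three-letter alphabet, it can be compared with standard sequence alignment tools, which is what makes an all-versus-all search over such strings cheap.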

July 16, 2024
11:40-12:00
SPfast: Highly efficient protein structure alignment with segment-level representations and block-sparse optimization
Confirmed Presenter: Thomas Litfin, Griffith University, Australia
Track: 3DSIG

Room: 520a
Format: In Person
Moderator(s): Chris Kieslich


Authors List:

  • Thomas Litfin, Griffith University

Presentation Overview:

Recent advances in protein structure modelling have increased the availability of high-quality protein structures at an unprecedented scale. Newly available structure libraries represent an exciting opportunity for discovery-based research. However, the explosion of protein structure data has exposed scaling deficiencies in the bioinformatics toolset which limit their utility for downstream analyses. These scaling problems will only be further exacerbated as modelling projects expand to noncanonical isoforms, dynamic trajectories, de novo designs and so on. Foldseek has introduced a structure state alphabet to mitigate this computational burden. However, the increased speed is accompanied by trade-offs in search sensitivity due to sacrificing information about global topology. In this work we describe a fully geometric protein structure search engine, SPfast, which leverages a coarse-grained, hierarchical representation and an efficient block-sparse optimization heuristic to greatly accelerate pairwise protein structure alignment and enable practical analysis of large-scale structure libraries. Combining SPfast with a newly parameterized SPscore maintains state-of-the-art performance for database search, more accurately reproduces pairwise evolutionary alignments and increases throughput by 100x compared with traditional methods.

July 16, 2024
11:40-12:00
STRPsearch: fast detection of structured tandem repeat proteins
Confirmed Presenter: Alexander Monzon, Department of Information Engineering, University of Padova
Track: 3DSIG

Room: 520a
Format: In Person
Moderator(s): Chris Kieslich


Authors List:

  • Soroush Mozaffari, Department of Biomedical Sciences
  • Paula Nazarena Arrias, Department of Biomedical Sciences
  • Damiano Clementel, Department of Biomedical Sciences
  • Damiano Piovesan, Department of Biomedical Sciences
  • Carlo Ferrari, Department of Information Engineering
  • Silvio Tosatto, University of Padova
  • Alexander Monzon, Department of Information Engineering

Presentation Overview:

State-of-the-art prediction methods are generating millions of publicly available protein structures. Structured Tandem Repeats Proteins (STRPs) constitute a subclass of tandem repeats characterized by repetitive structural motifs. STRPs exhibit distinct propensities for secondary structure and form regular tertiary structures, often comprising large molecular assemblies. They can perform important and diverse biological functions due to their highly degenerated sequences, which maintain a similar structure while displaying a variable number of repeat units. This suggests a disconnection between structural size and protein function. However, automatic detection of STRPs remains challenging with current state-of-the-art tools due to their lack of accuracy and long execution times, hindering their application on large datasets. In most cases, manual curation is the most accurate method for detecting and classifying them, making it impossible to inspect millions of structures.
We present STRPsearch, a novel computational tool for rapid identification, classification, and mapping of STRPs. Leveraging the manually curated entries in RepeatsDB as the known conformational space of the STRPs, STRPsearch utilizes the latest advancements in structural alignment techniques for a fast and accurate detection of repeated structural motifs in protein structures, followed by an innovative approach to map units and insertions through the generation of TM-score graphs. STRPsearch can serve researchers in structural bioinformatics and protein science as an efficient and practical tool for analysis and detection of STRPs.

July 16, 2024
12:00-12:20
The Encyclopedia of Domains
Confirmed Presenter: Nicola Bordin, University College London, United Kingdom
Track: 3DSIG

Room: 520a
Format: In Person
Moderator(s): Chris Kieslich


Authors List:

  • Andy Lau, University College London
  • Nicola Bordin, University College London
  • Shaun Kandathil, University College London
  • Ian Sillitoe, University College London
  • Vaishali Waman, University College London
  • Jude Wells, University College London
  • Christine Orengo, University College London
  • David Jones, University College London

Presentation Overview:

The Encyclopaedia of Domains (TED) is a comprehensive classification of all globular protein structure domains in AlphaFold Database v4. Harnessing state-of-the-art deep learning methods for domain detection, structure comparison and fold detection, TED segments and classifies domains across AFDB, identifying over 370 million distinct domains, surpassing sequence-based resources by over 100 million domains. Nearly 90% of these domains exhibit similarities with known superfamilies in CATH, expanding the resource by over 600-fold. The remaining domains that do not have relatives in any PDB-based resources unveiled over 7 thousand new folds, some of which have interesting and beautiful symmetries. We also find some fascinating new architectures.

TED uncovers over 10,000 previously undetected structural interactions between superfamilies and extends domain coverage to over 1 million taxa, enhancing research for organisms which previously had low to non-existent structural coverage. TED data will be made available in 3D-Beacons as well as a dedicated resource, significantly enriching CATH superfamilies.

July 16, 2024
14:20-14:40
Unraveling SARS-CoV-2 Spike Protein Evolution: A Comprehensive Structural Analysis
Confirmed Presenter: Natalia Fagundes Borges Teruel, Université de Montréal, Canada
Track: 3DSIG

Room: 520a
Format: In Person
Moderator(s): Rafael Najmanovich


Authors List:

  • Natalia Fagundes Borges Teruel, Université de Montréal
  • Rafael Najmanovich, Université de Montréal

Presentation Overview:

The evolution of the SARS-CoV-2 virus, the cause of the COVID-19 pandemic, has prompted a detailed investigation into the structural dynamics of its Spike protein. This study presents an in-depth analysis of 1560 published structures of the SARS-CoV-2 Spike protein, covering various variants that have emerged during the pandemic. Employing Surfaces for interaction evaluation, we investigate receptor binding characteristics and antibody recognition patterns associated with these diverse Spike protein structures. We characterized 14 epitopes according to a novel data-driven approach, used to cluster 2044 vectors of antibody interactions and to sort 210 vectors of ACE2 interactions in order to examine their common binding sites. We also exploit the shift in conformational dynamics and its effects on epitope exposure, using NRGTEN for dynamical assessments and occupancy calculations. Through a systematic examination of mutations in each variant, we aim to provide a comprehensive overview of their functional effects on the Spike protein. Our methodologies allow the analysis of structural variations among different SARS-CoV-2 variants and reveal the intricate interplay between genetic alterations and protein functionality, shedding light on the evolutionary forces driving structural changes in the SARS-CoV-2 Spike protein throughout the COVID-19 pandemic.

July 16, 2024
14:20-14:40
Dynamic and Energetic Consequences of Disulfide Bonds in Proteins
Confirmed Presenter: Miguel Fernandez-Martin, Barcelona Supercomputing Center (BSC-CNS), Spain
Track: 3DSIG

Room: 520a
Format: In Person
Moderator(s): Rafael Najmanovich


Authors List:

  • Miguel Fernandez-Martin, Barcelona Supercomputing Center (BSC-CNS)
  • Alfonso Valencia, Barcelona Supercomputing Center (BSC-CNS)
  • R. Gonzalo Parra, Barcelona Supercomputing Center (BSC-CNS)

Presentation Overview:

Introduction
Disulfide bonds, crucial for protein structure stability, are found to have versatile roles beyond structural support. This study explores three scenarios showcasing their functions based on energetic context. 1) Disulfide bridges aid in forming cyclic cystine knot (CCK) motifs, as in the cyclotide trypsin inhibitor (MCoT-II). Molecular dynamics simulations reveal interplays of frustration and correlation among specific disulfide bridges, such as Cys4-Cys21, which exhibits functional flexibility. 2) Disulfide bonds introduce frustration, affecting conformational dynamics and regulating functional signaling in bacterial species, as seen in oxidoreductase DsbD (nDSBd). 3) Finally, a disulfide bond may be related to neither structure nor function. In the absence of its disulfide bond, Azurin sees its thermal and chemical stability dramatically reduced but adopts a folded structure identical to that with an intact disulfide, so the Cys3-Cys26 bond modulates stability only.

Methods
This work aims to study the different types of disulfide bonds present in the Protein Data Bank (PDB) by analyzing their local energetic frustration patterns. The analysis includes all structures in the PDB with at least one disulfide bond (full dataset n=36571) and a subset of proteins with experimental structures in both oxidized and reduced states for at least one disulfide bond (paired dataset n=1151).

Results
By using the Frustratometer algorithm and IUPRED to assess frustration patterns and predict intrinsic disorder, we aim to elucidate how disulfide bonds locally stabilize or destabilize protein structures, offering insights into their varied roles in biological systems. This understanding is vital for manipulating cellular responses and designing biomedical devices.

July 16, 2024
14:40-15:00
DDMut-PPI: predicting effects of mutations on protein-protein interactions using graph-based deep learning
Confirmed Presenter: Yunzhuo Zhou, University of Queensland, Australia
Track: 3DSIG

Room: 520a
Format: In Person
Moderator(s): Rafael Najmanovich


Authors List:

  • Yunzhuo Zhou, University of Queensland
  • Yoochan Myung, University of Queensland
  • Carlos Rodrigues, University of Queensland
  • David Ascher, University of Queensland; Baker Institute

Presentation Overview:

Protein-protein interactions (PPIs) play a vital role in cellular functions and are essential for therapeutic development and understanding diseases. Traditional methods for exploring the effects of mutations on PPIs face challenges related to experimental complexity, cost, and scalability. While computational methods provide a quicker alternative, they often struggle to balance efficiency and precision in their predictions. In response, we present DDMut-PPI, a deep learning model that efficiently and accurately predicts changes in PPI binding free energy upon single and multiple point mutations. Building on the robust siamese network architecture with graph-based signatures from our prior work, DDMut, the DDMut-PPI model was enhanced with a graph convolutional network to better capture the importance of residues at the interface based on a 2D interaction graph. We used residue-specific embeddings from ProtT5 protein language model as node features, and a variety of molecular interactions as edge features. By integrating evolutionary context with spatial information, this framework enables DDMut-PPI to achieve a robust Pearson correlation of up to 0.67 (RMSE: 1.51 kcal/mol) in our non-redundant evaluations, outperforming most existing methods. Importantly, by utilising both forward and hypothetical reverse mutations to account for model anti-symmetry, the model demonstrated consistent performance across mutations that increase or decrease binding affinity. We believe DDMut-PPI would be a valuable resource for researchers and clinicians looking to explore the complex dynamics of protein interactions and their implications for health and disease. DDMut-PPI is freely available as a user-friendly web server and an API at https://biosig.lab.uq.edu.au/ddmut_ppi.
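The evaluation metrics and the anti-symmetry idea described above can be illustrated with a short sketch. It assumes the usual definitions of Pearson correlation and RMSE, and it assumes that a hypothetical reverse mutation swaps wild-type and mutant residues and flips the sign of the binding free energy change; the record layout and all numbers are invented and do not reflect DDMut-PPI's data format.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root-mean-square error, in the same units as the inputs (e.g. kcal/mol)."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )


def add_reverse_mutations(records):
    """Append a hypothetical reverse record for each (wild_type, position, mutant, ddg).

    Assumption: reverting the mutation yields the opposite free energy change.
    """
    reverse = [(mut, pos, wt, -ddg) for wt, pos, mut, ddg in records]
    return records + reverse


if __name__ == "__main__":
    forward = [("A", 10, "G", 1.5), ("L", 42, "P", -0.3)]  # hypothetical values
    for record in add_reverse_mutations(forward):
        print(record)
```

Evaluating on both forward and reverse records checks that a predictor does not systematically favour one sign of the energy change over the other.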

July 16, 2024
15:00-15:20
Proceedings Presentation: DDAffinity: Predicting the changes in binding affinity of multiple point mutations using protein three-dimensional structure
Confirmed Presenter: Qichang Zhao, Central South University, China
Track: 3DSIG

Room: 520a
Format: In Person
Moderator(s): Rafael Najmanovich


Authors List:

  • Guanglei Yu, Central South University
  • Qichang Zhao, Central South University
  • Xuehua Bi, Central South University
  • Jianxin Wang, Central South University

Presentation Overview:

Motivation: Mutations are a crucial driving force for biological evolution, as they can disrupt protein stability and protein-protein interactions, which have notable impacts on protein structure, function, and expression. Moreover, the progressive accumulation of multiple point mutations can lead to cancer. However, existing computational methods for protein mutation effects prediction are generally limited to single point mutations with global dependencies, and do not systematically take into account the local and global synergistic epistasis inherent in multiple point mutations.
Results: To this end, we propose a novel spatial and sequential message passing neural network, named DDAffinity, to predict the changes in binding affinity caused by multiple point mutations based on protein three-dimensional (3D) structures. Specifically, instead of being on the whole protein, we perform message passing on the k-nearest neighbour residue graphs to extract pocket features of the protein 3D structures. Furthermore, to learn global topological features, a two-step additive Gaussian noising strategy during training is applied to blur out local details of protein geometry. We evaluate DDAffinity on benchmark datasets and external validation datasets. Overall, the predictive performance of DDAffinity is significantly improved compared with state-of-the-art baselines on multiple point mutations, including end-to-end and pre-training based methods. The ablation studies indicate the reasonable design of all components of DDAffinity. In addition, applications in non-redundant blind testing, predicting mutation effects of SARS-CoV-2 RBD variants, and optimizing human antibody against SARS-CoV-2 illustrate the effectiveness of DDAffinity.
Availability and implementation: DDAffinity is available at https://github.com/ak422/DDAffinity.
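Two ingredients named in the abstract, a k-nearest neighbour residue graph and additive Gaussian noise on coordinates, can be sketched in a few lines. This is a generic illustration under simple assumptions (one 3D point per residue, Euclidean distance, a single noising step) and not the DDAffinity implementation; the function names and coordinates are hypothetical.

```python
import math
import random


def knn_graph(coords, k):
    """Return, for each residue index, the indices of its k nearest residues."""
    edges = {}
    for i, ci in enumerate(coords):
        # Sort all other residues by Euclidean distance to residue i.
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i
        )
        edges[i] = [j for _, j in dists[:k]]
    return edges


def add_gaussian_noise(coords, sigma, rng):
    """Perturb every coordinate with zero-mean Gaussian noise of scale sigma."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in xyz) for xyz in coords]


if __name__ == "__main__":
    # Hypothetical per-residue coordinates (for example C-alpha atoms), in angstroms.
    residues = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    print(knn_graph(residues, k=2))
    noisy = add_gaussian_noise(residues, sigma=0.5, rng=random.Random(0))
    print(knn_graph(noisy, k=2))
```

Restricting message passing to each residue's k nearest neighbours keeps the computation local, while noising the coordinates during training discourages a model from relying on fine geometric detail.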

July 16, 2024
15:20-15:40
A multiscale functional map of somatic mutations in cancer integrating protein structure and network topology
Confirmed Presenter: Yingying Zhang, Cornell University, United States
Track: 3DSIG

Room: 520a
Format: In Person
Moderator(s): Rafael Najmanovich


Authors List:

  • Yingying Zhang, Cornell University
  • Alden Leung, Cornell University
  • Jin Joo Kang, Cornell University
  • Yu Sun, Cornell University
  • Guanxi Wu, Cornell University
  • Le Li, Cornell University
  • Jiayang Sun, Cornell University
  • Lily Cheng, Cornell University
  • Tian Qiu, Cornell University
  • Junke Zhang, Cornell University
  • Shayne Wierbowski, Cornell University
  • James Booth, Cornell University
  • Haiyuan Yu, Cornell University

Presentation Overview:

A major goal of cancer biology is to understand the mechanisms underlying tumorigenesis driven by somatically acquired mutations. Two distinct types of computational methodologies have emerged: one focuses on analyzing clustering of mutations within protein sequences and 3D structures, while the other characterizes mutations by leveraging the topology of protein-protein interaction network. Their insights are largely non-overlapping, offering complementary strengths. Here, we established a unified, end-to-end 3D structurally-informed protein interaction network propagation framework, NetFlow3D, that systematically maps the multiscale mechanistic effects of somatic mutations in cancer. The establishment of NetFlow3D hinges upon the Human Protein Structurome, a comprehensive repository we compiled that incorporates the 3D structures of every single protein as well as the binding interfaces of all known protein interactions in humans. NetFlow3D leverages the Structurome to integrate information across atomic, residue, protein and network levels: It conducts 3D clustering of mutations across atomic and residue levels on protein structures to identify potential driver mutations. It then anisotropically propagates their impacts across the protein interaction network, with propagation guided by the specific 3D structural interfaces involved, to identify significantly interconnected network “modules”, thereby uncovering key biological processes underlying disease etiology. Applied to 1,038,899 somatic protein-altering mutations in 9,946 TCGA tumors across 33 cancer types, NetFlow3D identified 12,378 significant 3D clusters throughout the Human Protein Structurome, of which ~54% would not have been found if using only experimentally-determined structures. It then identified 28 significantly interconnected modules that encompass ~8-fold more proteins than applying standard network analyses.
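Network propagation of the kind mentioned above is often explained with a random walk with restart. The sketch below is that generic technique with direction-specific edge weights standing in for interface-guided propagation; it is not NetFlow3D's algorithm, and the graph, weights, and parameter values are hypothetical.

```python
def propagate(adj, seeds, restart=0.5, iters=50):
    """Random walk with restart on a weighted directed graph.

    adj:   dict mapping node -> dict of neighbour -> non-negative weight
           (every neighbour must also be a key of adj)
    seeds: dict mapping node -> initial score (e.g. mutation cluster signal)
    """
    nodes = sorted(adj)
    total = sum(seeds.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one seed must have a positive score")
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            out = sum(adj[u].values())
            if out == 0:
                # A node with no outgoing weight returns its mass to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * p[u] * p0[n]
                continue
            for v, w in adj[u].items():
                # Mass flows along each edge in proportion to its weight.
                nxt[v] += (1 - restart) * p[u] * w / out
        p = nxt
    return p


if __name__ == "__main__":
    # Hypothetical three-protein network; weights could encode interface evidence.
    network = {
        "A": {"B": 1.0},
        "B": {"A": 1.0, "C": 3.0},
        "C": {"B": 3.0},
    }
    scores = propagate(network, seeds={"A": 1.0})
    print({name: round(score, 3) for name, score in scores.items()})
```

Nodes that end up with high scores despite carrying no seed signal themselves are the ones a module-detection step would then examine.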