Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

3DSIG COSI

Attention Presenters - please review the Speaker Information Page available here
Schedule subject to change
All times listed are in UTC
Sunday, July 25th
11:00-12:00
3DSig KEYNOTE: Structure Predictions Transform Protein Family Classification
Format: Live-stream

Moderator(s): Rafael Josef Najmanovich

  • Alex Bateman, EMBL-EBI, United Kingdom

Presentation Overview:

Structural prediction models have come of age and are beginning to revolutionise molecular biology. In the recent CASP competition AlphaFold 2 showed accuracies close in many cases to crystal structures. Other methods such as trRosetta and RaptorX still give excellent models that are adequate for many applications. In this talk I will discuss how in collaboration with David Baker’s group we released a large collection (>6,300) of structural models for Pfam families. We have begun to dig into this treasure trove to refine, define and classify protein domain families.

Pfam is a collection of over 19,000 protein families with multiple sequence alignments and profile-HMMs. Pfam is widely used to annotate genomes and metagenomes. Ideally, Pfam families would correspond to structural domains or repeats found in protein structures. However, a Pfam family was often built before a structure was known, so families may cover truncated single domains or contain multiple domains. We find that protein structural models can be used to split large multidomain families. Structural models based on Pfam alignments have been less useful for correcting truncated domains, which will require the creation of structural models of full-length proteins. The structural models have also been instrumental in identifying which superfamilies many Pfam families should belong to. Thus the difference between protein sequence and protein structure classification is becoming smaller, and the two may be unified within the coming few years.

12:00-12:20
Exploring the microbiome protein structure space using simulations and deep learning
Format: Pre-recorded with live Q&A

Moderator(s): Rafael Josef Najmanovich

  • Pawel Szczerbiak, Malopolska Centre of Biotechnology, Jagiellonian University, Poland
  • Julia Koehler Leman, Flatiron Institute, United States
  • Douglas Renfrew, Flatiron Institute, United States
  • Vladimir Gligorijevic, Flatiron Institute, United States
  • Daniel Berenberg, New York University, United States
  • Chris Chandler, Flatiron Institute, United States
  • Richard Bonneau, Flatiron Institute, United States
  • Tomasz Kosciolek, Malopolska Centre of Biotechnology, Jagiellonian University, Poland

Presentation Overview:

The human gut microbiome comprises about 3 million unique bacterial genes, which is ~10 more than the number of human body genes. Exploring them could make it possible to treat diseases that originate in or are influenced by the microbiome. The main goal of the Microbiome Immunity Project is to understand the role played by the various bacteria in the human microbiome. In the first stage of the project we focused on GEBA genomes that cover the microbial part of the tree of life. For this purpose we prepared a dataset consisting of ~250,000 unique newly predicted microbial protein domain structures (in the second stage we will concentrate solely on human gut microbiome genes coming from the UHGP project, reaching ~1,000,000 structures in total). To make our analysis more robust, we used two methods, Rosetta and DMPFold, which take different approaches to the protein structure prediction problem. In the poster we show the differences between the two methods, with special emphasis on structural and functional annotations, identification of new folds and structure space visualization. We will also shed some light on sequence-structure-function relationships and compare our dataset with experimental ones (CATH, PDB).

12:40-13:00
How good are protein structure prediction methods at predicting folding pathways?
Format: Pre-recorded with live Q&A

Moderator(s): Rafael Josef Najmanovich

  • Carlos Outeiral Rubiera, University of Oxford, United Kingdom
  • Charlotte Deane, University of Oxford, United Kingdom

Presentation Overview:

Deep learning has achieved unprecedented success in predicting a protein's crystal structure, but whether this achievement relates to a better modelling of the folding process is an open question. In this work, we compare the dynamic pathways from six state-of-the-art protein structure prediction methods to experimental folding data. We find evidence of a weak correlation between simulated dynamics and formal kinetics; however, many of the structures of the predicted intermediates are incompatible with available hydrogen-deuterium exchange experiments. These results suggest that recent advances in protein structure prediction do not provide an enhanced understanding of the principles underpinning protein folding.

13:00-13:20
How sticky are your proteins?
Format: Pre-recorded with live Q&A

Moderator(s): Rafael Josef Najmanovich

  • Dea Gogishvili, Vrije Universiteit Amsterdam, Netherlands
  • Juami van Gils, Vrije Universiteit Amsterdam, Netherlands
  • Jan van Eck, Vrije Universiteit Amsterdam, Netherlands
  • Robbin Bouwmeester, Vrije Universiteit Amsterdam, Netherlands
  • Erik van Dijk, Vrije Universiteit Amsterdam, Netherlands
  • Sanne Abeln, Vrije Universiteit Amsterdam, Netherlands

Presentation Overview:

Proteins tend to bury hydrophobic residues inside their core during the folding process to provide stability to the protein structure and to prevent aggregation. Nevertheless, many proteins do expose such 'sticky' hydrophobic residues to the solvent. Hydrophobic residues may play an important functional role, for example in protein-protein interactions and ligand binding. Here, we investigated how hydrophobic/sticky proteins should be defined in terms of surface hydrophobicity and trained a machine learning model that predicts these hydrophobicity measures from the primary sequence. Firstly, we define structure-based measures: the total and relative hydrophobic surface area (T/RHSA) and, using our MolPatch method, the largest hydrophobic patch (LHP). Secondly, by adapting solvent accessibility predictions from NetSurfP-2.0 we obtain well-performing sequence-based prediction methods for the THSA (R² = 0.75) and RHSA (R² = 0.49), while the LHP is more difficult to predict (R² = 0.12). Finally, sticky proteins were mapped to the human proteome by considering tissues, pathways, and diseases in which such proteins occur. We show that very hydrophobic proteins are typically not highly expressed, suggesting there is evolutionary pressure against overabundant sticky proteins. Despite this, we show that sticky proteins are surprisingly abundant in the human brain and that such proteins are associated with neurodegenerative pathways.
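
As a rough illustration of the two surface-area measures named in this abstract, the sketch below computes a total and a relative hydrophobic surface area from per-residue solvent-accessible areas. The hydrophobic residue set, the reading of "relative" as a fraction of the total accessible surface, and the function name are all assumptions made for illustration; the abstract does not give the authors' exact definitions.

```python
# Assumption: which residues count as hydrophobic is not stated in the abstract.
HYDROPHOBIC = frozenset("AVILMFWC")


def hydrophobic_surface_areas(residues):
    """Return (THSA, RHSA) for a list of (one-letter residue, accessible area) pairs.

    THSA: summed accessible area of hydrophobic residues.
    RHSA: THSA divided by the summed accessible area of all residues (assumed definition).
    """
    total_area = sum(area for _, area in residues)
    thsa = sum(area for aa, area in residues if aa in HYDROPHOBIC)
    rhsa = thsa / total_area if total_area > 0 else 0.0
    return thsa, rhsa


# Toy input: accessible areas in square angstroms (made-up values).
toy_residues = [("L", 50.0), ("K", 100.0), ("F", 50.0)]
thsa, rhsa = hydrophobic_surface_areas(toy_residues)
print(thsa, rhsa)
```

In practice the per-residue areas would come from a structure-based or predicted solvent accessibility calculation rather than hand-written values.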

13:20-13:40
DisCovER: distance- and orientation-based covariational threading for weakly homologous proteins
Format: Pre-recorded with live Q&A

Moderator(s): Rafael Josef Najmanovich

  • Sutanu Bhattacharya, Auburn University, United States
  • Rahmatullah Roche, Auburn University, United States
  • Debswapna Bhattacharya, Auburn University, United States

Presentation Overview:

Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging, even when sequence-based predicted contact or distance information is used. Contact- or distance-assisted threading methods utilize only the spatial proximity of the interacting residue pairs for template selection and alignment, ignoring their orientation. Moreover, existing threading methods fail to consider the neighborhood effect induced by the query-template alignment.

We present DisCovER, a new distance- and orientation-based covariational threading method that integrates information from inter-residue distance and orientation with the topological network neighborhood of a query-template alignment. Our method first selects a subset of templates using standard profile-based threading coupled with topological network similarity terms to account for the neighborhood effect, and subsequently performs distance- and orientation-based query-template alignment using an iterative double dynamic programming framework. Multiple large-scale benchmarking results on query proteins classified as weakly homologous from the Continuous Automated Model Evaluation (CAMEO) experiment and from the current literature show that our method outperforms several existing state-of-the-art threading approaches, and that the integration of the neighborhood effect with the inter-residue distance and orientation information synergistically contributes to the improved performance of DisCovER.

The open-source DisCovER software package is freely available at https://github.com/Bhattacharya-Lab/DisCovER.

13:40-14:00
Cross-Modality and Self-Supervised Protein Embedding for Compound-Protein Affinity and Contact Prediction
Format: Pre-recorded with live Q&A

Moderator(s): Rafael Josef Najmanovich

  • Yang Shen, Texas A&M University, United States
  • Yuning You, Texas A&M University, United States

Presentation Overview:

In silico prediction of compound-protein interaction is important to accelerate drug discovery. Current sequence-based methods for the prediction of compound-protein affinity and contact (CPAC) aim at contact prediction as a mechanistic interpretation of affinity prediction, yet they learn only from structure-unaware 1D protein sequences. We adopt, for the first time, cross-modality learning in CPAC to introduce structure-awareness into protein embeddings. We treat proteins as multi-modal data available as both 1D amino-acid sequences and 2D residue-residue contact maps, where the 2D modality provides complementary structural knowledge. To integrate the information from both modalities, we propose two cross-modality schemes, concatenation and cross interaction, for combining hierarchical recurrent neural networks (HRNN) as the sequence encoder and graph attention networks (GAT) as the structure encoder. Moreover, we apply self-supervised pre-training techniques for embedding 1D sequences and 2D graphs on top of the cross-modality models, to address supervision starvation by exploiting rich unpaired, unlabelled protein domain data. Numerical results demonstrate that our cross-modality and self-supervised protein embedding can improve the generalizability of affinity and contact prediction for unseen proteins.

14:20-14:40
MODAMDH: identification of diverse Amine Dehydrogenases by screening biodiversity using sequence and structure-based approaches
Format: Pre-recorded with live Q&A

Moderator(s): Maximilian Meixner

  • Eddy Elisée, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, France
  • Raphaël Meheust, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, France
  • Laurine Ducrot, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, France
  • Mark Stam, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, France
  • Eric Pelletier, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, France
  • Karine Bastard, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, France
  • Jean-Louis Petit, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, France
  • Megan Bennett, York Structural Biology Laboratory, Department of Chemistry, University of York, United Kingdom
  • Gideon Grogan, York Structural Biology Laboratory, Department of Chemistry, University of York, United Kingdom
  • Véronique de Berardinis, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, France
  • Anne Zaparucha, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, France
  • David Vallenet, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, France
  • Carine Vergne-Vaxelaire, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, France

Presentation Overview:

The current boom in environmental genomics data provides a huge resource of new sequences of potential biocatalysts. Through the MODAMDH project, we focus on one key class of biocatalysts, amine dehydrogenases (AmDHs), which enable access to amines that are important entities in the chemical industry.
We started from a previously described NAD(P)H-dependent AmDH family from which several members were experimentally characterized. This family was first expanded, up to 27k sequences, by mining very large metagenomic databanks in search of the conserved catalytic domain. We then applied structural modelling and active site classification to define subfamilies. We also generated a pool of ~100k candidate families containing more than 20M NAD(P)-binding protein sequences from which we found, using HMM-HMM profile comparison, >30 families sharing distant homology with the reference AmDH family. Furthermore, catalophores (i.e. minimal active site topologies) will be designed from native AmDH structures and used to find active site analogs in the candidate NAD(P)-dependent families. The most interesting enzymes will be experimentally characterized through enzymatic and crystallographic assays.
This work is ongoing and the presented workflow could be applied to other enzyme families in the quest for new structures and activities.

14:40-15:00
Exploring human population variation and three-dimensional structures in the Armadillo repeat family
Format: Pre-recorded with live Q&A

Moderator(s): Maximilian Meixner

  • Maxim Tsenkov, University of Dundee, United Kingdom
  • Javier Sánchez Utgés, University of Dundee, United Kingdom
  • Stuart MacGowan, University of Dundee, United Kingdom
  • Geoff Barton, University of Dundee, United Kingdom

Presentation Overview:

Armadillo repeats (ARs) are 41-amino-acid motifs that fold into a conserved structure of three alpha helices. ARs are organised in tandem arrays that form a superhelical Armadillo-Domain (AD) structure. ADs mediate protein-protein interactions on the concave surface and assemble multi-protein complexes.

We conducted a sequence analysis of 2270 ARs across 57 organisms and explored the human population variant data. We studied the different structural and functional pressures at each position in the ARs, as determined by amino acid conservation, and constraints in the human population data. We integrated this with a quantitative structural analysis of 265 ARs across 162 PDBe structures to identify positions important for the structural fold of the AD and the binding sites involved with protein-substrate interactions.

Positions inferred from sequence and variant analysis reflect the same positions in our structural analysis that have high contact numbers and sites enriched for substrate interactions. Conserved positions constrained in the human population highlight residues in the hydrophobic core of ADs. Unconserved sites depleted for missense variants represent binding-specificity sites or may be involved in the structural maintenance of ADs. We demonstrate how conservation and variation features are constrained by structure and thus, can be used to predict structural features.

15:00-15:20
Graphical Models For Identifying Pore-forming Family Proteins In The Twilight Zone
Format: Pre-recorded with live Q&A

Moderator(s): Maximilian Meixner

  • Nan Xu, USC, United States
  • Ted Kahn, BASF, United States
  • Theju Jacob, BASF, United States
  • Yan Liu, USC, United States

Presentation Overview:

Pore-forming toxins (PFTs) are proteins that form lesions in biological membranes. Understanding the structures of PFTs helps in developing biotechnological applications such as antimicrobial drug development, cancer gene therapy and DNA sequencing. However, some existing approaches, such as HMMs, consider only sequence information, so they fail to recognize new toxins with similar functions but low-homology (twilight zone) sequences. Meanwhile, the available structural information on toxins with interesting functions is too limited to train well-performing data-hungry models such as deep learning approaches. To address the challenging problem of identifying pore-forming family proteins, we propose a sample-efficient graphical model, in which a protein structure graph is first constructed from consensus secondary structures and a semi-Markov CRF model is then developed to perform protein sequence segmentation. We collect toxins with similar functions from 3 twilight zones and observe high rankings of positive proteins for each zone compared with negative proteins from the Culled PDB database. To extract toxins with interesting functions from a genome-wide protein database for further study, we develop an efficient framework on 43 million proteins from UniRef50, where a deep learning module is introduced to perform PFT classification and the proposed graphical model is incorporated for structural similarity ranking among the most likely PFTs.

Monday, July 26th
11:00-11:20
Deep Relaxed Complex Scheme
Format: Live-stream

Moderator(s): Douglas Pires

  • Sara Omar, Genomica AI, Canada
  • Sahar Arbabimoghadam, Genomica AI, Canada
  • Eldad Haber, Genomica AI, Canada

Presentation Overview:

Drug discovery is both expensive and time-consuming. It involves different techniques including in silico screening of large databases to find potential therapeutics. Docking is the most commonly used tool for such screening. However, docking millions of compounds can become very expensive and infeasible even with GPU-accelerated codes. Additionally, accounting for protein flexibility to increase docking accuracy adds to the computational cost of the problem. Machine learning techniques are currently being explored to replace docking calculations. However, previous efforts have focused on using a single protein conformation. In this work, we propose combining docking to multiple protein conformations with graph neural networks. These networks work directly with the 3D structure of the molecules and utilize information about chemical bonds, charges and other physical characteristics of the protein and ligand. As a case study, we applied this technique to find inhibitors of the main protease of SARS-CoV-2. Our initial prediction errors are mostly less than 1 kcal/mol, although they can reach almost 2 kcal/mol. This new approach provides a 1500x speed-up over docking calculations, which can significantly accelerate the drug discovery process.

11:20-11:40
Strategies to Improve the Description of Ligand Binding Sites in Metalloproteins for Biomolecular Simulations.
Format: Pre-recorded with live Q&A

Moderator(s): Douglas Pires

  • Okke Melse, TUM Center for Protein Assemblies and TUM School of Life Sciences, Technische Universität München, Germany
  • Iris Antes, TUM Center for Protein Assemblies and TUM School of Life Sciences, Technische Universität München, Germany

Presentation Overview:

Metalloproteins play a pivotal role in many biological processes, and are therefore often found as drug targets. Metalloproteins are also appreciated as biocatalysts in numerous biotechnological applications because of the diverse characteristics of transition metals (TM) [1]. However, biomolecular simulations of TM-containing binding sites are challenging, mainly because the classical 12-6 Lennard-Jones potentials cannot adequately describe the strong polarization of the binding site and the specific coordination geometries of TMs [2]. We assessed the performance of a large variety of Zn2+ models in long-term classical molecular dynamics simulations. We performed this study in both a monometallic and a bimetallic ligand binding site, as the latter are rarely included in evaluation studies. We found serious differences in performance between the Zn2+ models, especially in the description of interactions between Zn2+ and non-charged ligating atoms, which strongly affect the binding site geometry and thereby ligand binding. We thereby illustrate the importance of parameterization of metal-containing binding sites and were able to identify suitable simulation conditions depending on the aim of the simulation. The results from this study can provide guidance for biomolecular simulations, such as ligand docking and molecular dynamics simulations of metalloproteins.
[1] Riccardi et al. Nat. Rev. Chem., 2018. 2(7):100-112.
[2] Li et al. Chem. Rev., 2017. 117(3):1564-1686.

11:40-12:00
Data-Driven Analysis of Single Point Mutations through Rapid Scan of 3D Micro-Environments
Format: Pre-recorded with live Q&A

Moderator(s): Douglas Pires

  • Jochen Sieg, Universität Hamburg, Germany
  • Matthias Rarey, Universität Hamburg, Germany

Presentation Overview:

Understanding how mutations change a protein structure is fundamental to understanding and modulating protein functionality. In particular, the structural changes that a single mutation introduces into the local micro-environment can provide valuable insights. While modeling such changes is essential for rational design and for predicting mutation effects, experimental protein structures for both wild type and mutant are rarely directly available. However, if we limit our focus to the direct proximity of a mutation, the PDB holds a wealth of experimental examples. We developed an efficient algorithm based on the SIENA technology for the fast retrieval of mutations from the PDB based on the similarity of amino acid 3D micro-environments. The algorithm recovers 90% of known mutant structures in the ProTherm data set. A search for all amino acids in one input structure against the entire PDB is performed in <10 seconds in most cases, and ensembles of the mutated structures are generated automatically. On average we find 208 mutants per query protein, of which only 11% are available in ProTherm, illustrating the large number of experimental mutant structures that can be mined for protein design and method development. The new method will be made available as part of our online platform https://proteins.plus.

12:00-12:20
Large-scale in silico mutagenesis experiments reveal optimization of genetic code and codon usage for protein mutational robustness
Format: Pre-recorded with live Q&A

Moderator(s): Douglas Pires

  • Martin Schwersensky, Université Libre de Bruxelles, Belgium
  • Marianne Rooman, Université Libre de Bruxelles, Belgium
  • Fabrizio Pucci, Université Libre de Bruxelles, Belgium

Presentation Overview:

How, and to what extent, evolution acts on DNA and protein sequences to ensure mutational robustness and evolvability is a long-standing open question in molecular evolution. We addressed this issue by estimating the change in folding free energy upon all possible single-site mutations introduced in more than 20,000 protein structures, and through available experimental stability and fitness data. Our results highlight the multiscale dependence of mutational robustness.
At the residue level, we found the protein surface to be more robust against random mutations than the core, especially for small proteins. Destabilizing and neutral mutations are more numerous in the core and on the surface, respectively, and stabilizing mutations are about 4% in both.
At the genetic code level, we observed smaller destabilization for substitutions of codon base III, followed by base I, bases I&III, base II, and other multiple base substitutions. This ranking anticorrelates with the codon-anticodon mispairing frequencies of the translation process. This suggests that the standard genetic code is optimized to limit the impact of random mutations, and even more so of translation errors.
At the codon level, codon usage and codon usage bias appear to optimize mutational robustness and translation accuracy, especially for surface residues.
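
The genetic-code-level ranking above groups codon substitutions by which base positions change. A minimal sketch of that grouping is shown below; the function name and label strings are hypothetical, and the sketch only assigns categories, it does not estimate any folding free energy.

```python
def substitution_class(codon_from, codon_to):
    """Label a codon substitution by the base positions (I, II, III) that differ."""
    if len(codon_from) != 3 or len(codon_to) != 3:
        raise ValueError("codons must have exactly three bases")
    # 1-based positions at which the two codons differ.
    changed = tuple(i + 1 for i in range(3) if codon_from[i] != codon_to[i])
    labels = {
        (): "identical",
        (3,): "base III",
        (1,): "base I",
        (1, 3): "bases I&III",
        (2,): "base II",
    }
    # Any other combination changes base II together with at least one other base.
    return labels.get(changed, "other multiple-base substitution")


for pair in [("GCU", "GCC"), ("GCU", "ACU"), ("GCU", "ACC"), ("GCU", "GAU"), ("GCU", "AAU")]:
    print(pair, substitution_class(*pair))
```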

12:40-13:00
ASSESSING THE CONSERVATION OF LARGE-SCALE CONFORMATIONAL MOVEMENTS IN HOMOLOGOUS PROTEINS USING A NOVEL METRIC BASED ON DIFFERENCE DISTANCE MAPS
Format: Pre-recorded with live Q&A

Moderator(s): Rafael Josef Najmanovich

  • Mallika Iyer, Sanford Burnham Prebys Medical Discovery Institute, United States
  • Lukasz Jaroszewski, Biosciences Division, University of California Riverside School of Medicine, United States
  • Zhanwen Li, Biosciences Division, University of California Riverside School of Medicine, United States
  • Mayya Sedova, Biosciences Division, University of California Riverside School of Medicine, United States
  • Adam Godzik, Biosciences Division, University of California Riverside School of Medicine, United States

Presentation Overview:

We know that homologous proteins share similar sequences, structures and (often) functions. However, proteins exist in not one, but a multitude of conformations. Transitions between these conformations occur via large- and small-scale movements and are intrinsic to protein function. However, the relationship between evolutionary distance and the similarity of large-scale conformational movements in homologous proteins has not been systematically assessed. Here, we begin to do so using X-ray crystal structures deposited in the Protein Data Bank (PDB). For many individual proteins, the PDB contains multiple coordinate sets representing their distinct conformations that can be used to study their large-scale movements. Therefore, for each protein with two conformations deposited in the PDB, we created a difference distance map (DDM) representing the conformational difference/movement between them. We then compared the DDMs of homologous protein pairs and calculated the correlation between them to quantify their similarity. We found that as sequence identity increases the DDM correlation, and thus the similarity in the conformational movements, also increases. These results can be used to inform structure modeling methods, where instead of modeling just a single conformation of the target protein, we can model functionally relevant conformational movements based on those of its homologs.
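
The difference distance map comparison described here can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes one representative point per residue (for example the C-alpha atom), that the two homologs have already been reduced to the same set of aligned residues, and that similarity is measured as a Pearson correlation over the upper triangle of the two maps. All function names are hypothetical.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue positions."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def difference_distance_map(conf_a, conf_b):
    """Element-wise difference between the distance matrices of two conformations."""
    da, db = distance_matrix(conf_a), distance_matrix(conf_b)
    n = len(da)
    return [[da[i][j] - db[i][j] for j in range(n)] for i in range(n)]


def upper_triangle(matrix):
    """Flatten the entries above the diagonal (the map is symmetric)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ddm_correlation(ddm_1, ddm_2):
    """Similarity of two conformational movements as the correlation of their DDMs."""
    return pearson(upper_triangle(ddm_1), upper_triangle(ddm_2))


# Toy data: four residues; the last one swings out between conformations.
protein_1_a = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
protein_1_b = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0)]
# A made-up "homolog" showing the same movement at a slightly larger scale.
protein_2_a = [(1.1 * x, 1.1 * y, 1.1 * z) for x, y, z in protein_1_a]
protein_2_b = [(1.1 * x, 1.1 * y, 1.1 * z) for x, y, z in protein_1_b]

ddm_1 = difference_distance_map(protein_1_a, protein_1_b)
ddm_2 = difference_distance_map(protein_2_a, protein_2_b)
print(ddm_correlation(ddm_1, ddm_2))
```

Working on internal distances rather than coordinates means no structural superposition is needed before the two conformations are compared.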

13:00-13:20
SenseNet: A Cytoscape3-plugin for analysis of MD-based interaction networks
Format: Pre-recorded with live Q&A

Moderator(s): Rafael Josef Najmanovich

  • Markus Schneider, Technical University of Munich, Germany
  • Iris Antes, Technical University of Munich, Germany

Presentation Overview:

Computational methods play a key role in investigating allosteric mechanisms in proteins, with the potential to generate valuable insights for innovative drug design. Here we present the SenseNet framework for analysis of protein structure networks, which differs from established network models by focusing on interaction timelines obtained by molecular dynamics simulations. This approach is evaluated by predicting allosteric residues reported by NMR experiments in the PDZ2 domain of hPTP1e, a reference system for which previous predictions have shown considerable variance. We applied models based on the mutual information between interaction timelines to estimate the conformational influence of each residue on its local environment. In terms of accuracy, our prediction model is comparable to the top-performing model published for this system but, unlike the others, does not rely on NMR structures. Our results are complementary to experiments and the consensus of previous predictions, demonstrating the potential of our analysis tool. Biochemical interpretation of our model suggests that allosteric residues in the PDZ2 domain form two distinct clusters of contiguous sidechain surfaces. SenseNet is provided as a plugin for the network analysis software Cytoscape, contributing to a system of compatible tools bridging the fields of systems and structural biology.
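
To make the notion of mutual information between interaction timelines concrete, here is a minimal sketch. It assumes each timeline is a per-frame sequence of discrete states (for example 1 if an interaction is present in that simulation frame, 0 otherwise) and uses a plain plug-in estimate; the function name is hypothetical and this is not SenseNet's own code or API.

```python
import math
from collections import Counter


def mutual_information(timeline_x, timeline_y):
    """Plug-in mutual information (in bits) between two equally long discrete timelines."""
    if len(timeline_x) != len(timeline_y):
        raise ValueError("timelines must cover the same frames")
    n = len(timeline_x)
    count_x = Counter(timeline_x)
    count_y = Counter(timeline_y)
    count_xy = Counter(zip(timeline_x, timeline_y))
    mi = 0.0
    for (x, y), joint_count in count_xy.items():
        p_joint = joint_count / n
        p_independent = (count_x[x] / n) * (count_y[y] / n)
        mi += p_joint * math.log2(p_joint / p_independent)
    return mi


# Two interactions that always switch together share one full bit of information.
coupled = mutual_information([1, 0, 1, 0], [1, 0, 1, 0])
# Two interactions whose states are statistically independent share none.
uncoupled = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(coupled, uncoupled)
```

With short trajectories a plug-in estimate like this is biased upwards, so real analyses typically need many frames or a bias correction.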

13:20-13:40
Dynamic networks improve protein structural classification
Format: Pre-recorded with live Q&A

Moderator(s): Rafael Josef Najmanovich

  • Khalique Newaz, University of Notre Dame, United States
  • Jacob Piland, University of Notre Dame, United States
  • Patricia Clark, University of Notre Dame, United States
  • Scott Emrich, University of Tennessee, United States
  • Jun Li, University of Notre Dame, United States
  • Tijana Milenkovic, University of Notre Dame, United States

Presentation Overview:

Protein structural classification (PSC) is a supervised problem of assigning proteins into pre-defined structural (e.g., CATH or SCOPe) classes based on the proteins' sequence or 3D structural features. We recently proposed PSC approaches that model protein 3D structures as protein structure networks (PSNs) and analyze PSN-based protein features, which performed better than or comparable to state-of-the-art sequence or other 3D structure-based approaches in the task of PSC. However, existing PSN-based PSC approaches model the whole 3D structure of a protein as a static PSN. Because folding of a protein is a dynamic process, where some parts of a protein fold before others, modeling the 3D structure of a protein as a dynamic PSN can further help improve the existing PSC performance. Here, we propose a novel way to model 3D structures of proteins as dynamic PSNs, with the hypothesis that this will improve upon the current state-of-the-art PSC approaches that are based on static PSNs (and thus upon the existing state-of-the-art sequence and other 3D structural approaches). Indeed, we confirm this on 71 datasets spanning a large set of ~44,000 protein domains from CATH and SCOPe.

13:40-14:00
Graph embeddings for protein structural comparison
Format: Pre-recorded with live Q&A

Moderator(s): Rafael Josef Najmanovich

  • Vladimir Gligorijevic, Flatiron Institute, United States
  • Daniel Berenberg, New York University, United States
  • Richard Bonneau, New York University, United States

Presentation Overview:

In the age of big data, protein fold recognition methods are required to behave reliably and efficiently at scale. Algorithmic developments in Deep Learning, namely graph- and 2D- convolutional networks, are celebrated in a variety of different contexts including automatic feature extraction of biological data. While empirically these methods have shown state-of-the-art performance amongst one another, there remains a distinct lack of comparison to earlier methods relying on manually engineered features. We present a comparison study of existing deep learning-based methods to simpler graphlet-based approaches in order to fairly assess the general utility of deemed state-of-the-art machine learning algorithms. Doing so will provide a definitive categorization of structural comparison methods and alert researchers to current limitations in the most recent approaches.

14:20-14:40
Three dimensional computational visualization of a distinct chromatin loop in human lymphoblastoid cells by super resolution imaging
Format: Pre-recorded with live Q&A

Moderator(s): Markus Schneider

  • Zofia Parteka, Centre of New Technologies, University of Warsaw, S. Banacha 2c, 02-097 Warsaw, Poland
  • Jacqueline Jufen Zhu, The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06030, United States
  • Byoungkoo Lee, The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06030, United States
  • Karolina Jodkowska, Centre of New Technologies, University of Warsaw, S. Banacha 2c, 02-097 Warsaw, Poland
  • Ping Wang, The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06030, United States
  • Jesse Aaron, Advanced Imaging Center, Janelia Research Campus, Howard Hughes Medical Institute, United States
  • Teng-Leong Chew, Advanced Imaging Center, Janelia Research Campus, Howard Hughes Medical Institute, United States
  • Yijun Ruan, The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06030, United States
  • Dariusz Plewczynski, Centre of New Technologies, University of Warsaw, S. Banacha 2c, 02-097 Warsaw, Poland

Presentation Overview:

The three-dimensional (3D) genome structure plays a fundamental role in gene regulation and cellular functions. Recent studies in 3D genomics have inferred the most basic functional chromatin folding structures, known as chromatin loops: long-range chromatin interactions that are mediated by protein factors. To visualize the looping structure of chromatin, we applied interferometric photoactivated localization microscopy (iPALM) to image a specific chromatin loop in human lymphoblastoid cells. In total, we generated thirteen good-quality super-resolution images of the target chromatin region. To reconstruct the chromatin structures from the captured images, we modeled them as continuous looping conformations using a traveling salesman problem (TSP) solver. We then compared the physical distances in the image models with contact frequencies generated by ChIA-PET and Hi-C to examine their concordance. While showing a good correlation with genomic sequencing data, the image models reveal heterogeneity between individuals and fine structures within the loop in single cells.
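
The idea of threading a continuous path through localized points can be sketched with a simple TSP heuristic. The abstract does not name the solver that was used; the example below swaps in a nearest-neighbor construction followed by 2-opt improvement on an open path, and the point coordinates and function names are hypothetical.

```python
from math import dist

def path_length(points, order):
    """Total length of the open path visiting `points` in `order`."""
    return sum(dist(points[order[i]], points[order[i + 1]])
               for i in range(len(order) - 1))

def nearest_neighbor_order(points, start=0):
    """Greedy initial path: always step to the closest unvisited point."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: dist(last, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

def two_opt(points, order):
    """Reverse segments for as long as doing so shortens the open path."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if path_length(points, candidate) < path_length(points, order) - 1e-12:
                    order, improved = candidate, True
    return order

# Hypothetical 3D localizations (arbitrary units)
points = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (1.0, 0.5, 0.0),
          (4.0, 0.5, 0.0), (2.0, 0.0, 0.3), (3.0, 0.0, 0.3)]
initial = nearest_neighbor_order(points)
order = two_opt(points, initial)
print(order, round(path_length(points, order), 2))
```

Distances between points along or across the resulting path could then be compared with contact frequencies, for example through a rank correlation.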

14:40-15:00
All-Atom Molecular Simulations of a Type II DNA Topoisomerase Molecular Motor
Format: Pre-recorded with live Q&A

Moderator(s): Markus Schneider

  • Andrej Perdih, National Institute of Chemistry, Slovenia
  • Matic Pavlin, University of Ljubljana, Faculty of Electrical Engineering, Slovenia
  • Katja Valjavec, National Institute of Chemistry, Slovenia
  • Barbara Herlah, National Institute of Chemistry, Slovenia

Presentation Overview:

Type II DNA topoisomerases are complex molecular motors that manage the topological states of DNA in the cell and are crucial players in fundamental cellular processes such as cell division. Since the human type II topoisomerase isoform α has a higher expression level in rapidly proliferating cells, including cancer cells, targeting these molecular motors is also considered an attractive approach in cancer chemotherapy.
By applying all-atom molecular dynamics (MD) simulations, we investigated the full-length type II topoisomerase molecular motor to improve understanding of its structure-function relationship. Starting from the available crystal structure of topoisomerase IIα from Saccharomyces cerevisiae (PDB ID 4GFH), we constructed two additional configurations of this molecular motor, performed μs-long simulations of each system, and comprehensively analyzed the observed behavior of all three states. The results provide a new basic understanding of the variety of conformational changes these huge systems can undergo. Finally, based on the obtained results, we mapped the simulated configurations of all three structures to the steps of the hypothetical catalytic cycle through which type II topoisomerases perform their function.

Tuesday, July 27th
11:00-11:20
PREDICTING CONFORMATIONAL B-CELL EPITOPES USING GRAPH-BASED SIGNATURES
Format: Pre-recorded with live Q&A

Moderator(s): Iris Antes

  • Bruna Moreira da Silva, The University of Melbourne, Australia
  • David B. Ascher, The University of Melbourne, Australia
  • Douglas E. V. Pires, The University of Melbourne, Australia

Presentation Overview:

Accurate identification of B-cell epitopes is crucial for disease control, diagnostics and vaccine development, but experimental approaches are generally expensive, time-consuming and low-throughput.

Here we present epitope3D, a new machine learning method trained and validated on the largest conformational epitope data set collected to date, outperforming available alternative approaches, achieving MCC and F1-scores of 0.55 and 0.57 on cross-validation and 0.45 and 0.36 during independent blind tests, respectively.

epitope3D uses the concept of graph-based structural signatures to better model and distinguish epitope from non-epitope regions.
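
For readers less familiar with the two metrics quoted above, the sketch below computes the Matthews correlation coefficient (MCC) and the F1-score from a per-residue confusion matrix. The counts are hypothetical and are not taken from the epitope3D evaluation; the function name is illustrative.

```python
from math import sqrt

def mcc_and_f1(tp, fp, tn, fn):
    """Return (MCC, F1) for a binary confusion matrix; 0.0 when undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return mcc, f1

# Hypothetical counts: epitope residues are rare compared with non-epitope residues
mcc, f1 = mcc_and_f1(tp=30, fp=20, tn=930, fn=20)
print(round(mcc, 3), round(f1, 3))
```

MCC uses all four cells of the confusion matrix, which makes it a useful companion to F1 when the classes are as imbalanced as epitope versus non-epitope residues usually are.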

11:20-11:40
Sequence-based Interface Prediction for Conformational Epitopes
Format: Pre-recorded with live Q&A

Moderator(s): Iris Antes

  • Sanne Abeln, Vrije Universiteit Amsterdam, Netherlands
  • Katharina Waury, Vrije Universiteit Amsterdam, Netherlands
  • K. Anton Feenstra, Vrije Universiteit, Amsterdam, Netherlands
  • Qingzhen Hou, Shandong University, China
  • Bas Stringer, Vrije Universiteit Amsterdam, Netherlands
  • Henriette Capel, VU University Amsterdam, Netherlands
  • Reza Haydarlou, Vrije Universiteit Amsterdam, Netherlands
  • Jaap Heringa, Vrije Universiteit Amsterdam, Netherlands

Presentation Overview:

Antibodies play an important role in clinical research and biotechnology. Their specificity is determined by the interaction with the antigen's epitope region, a special type of protein-protein interaction (PPI) interface. The ubiquitous availability of sequence data allows us to predict epitopes from sequence in order to focus time-consuming wet-lab experiments on the most promising epitope regions. Here, we extend our previously developed sequence-based homo- and heteromeric PPI interface predictors to predict epitope residues that have the potential to bind an antibody.

We collected and curated a high-quality epitope dataset from the SAbDab database. We trained a random forest model on this epitope dataset, reaching an AUC-ROC of 0.7. This is better than BepiPred-2.0, the best state-of-the-art sequence-based epitope predictor. On one solved antibody-antigen structure of the SARS-CoV-2 spike receptor binding domain, our predictor reaches an AUC of 0.778. We furthermore show that our SeRenDIP-CE predictions are stable with respect to protein length, and also for transmembrane and disordered proteins.

We added the SeRenDIP-CE Conformational Epitope predictors to our webserver (http://www.ibi.vu.nl/programs/serendipwww/), which is simple to use and only requires a single antigen sequence as input. This will help make the method immediately applicable in a wide range of biomedical and biomolecular research.
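
As a reminder of what the AUC-ROC values above measure, the following minimal sketch computes AUC as the probability that a randomly chosen epitope residue receives a higher score than a randomly chosen non-epitope residue, with ties counted as one half. The labels and scores are hypothetical and unrelated to the SeRenDIP-CE results.

```python
def auc_roc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical per-residue labels (1 = epitope) and predicted scores
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.3, 0.1, 0.5]
auc = auc_roc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts a value such as 0.7 in context.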

11:40-12:00
DLAB - Deep learning methods for structure-based virtual screening of antibodies
Format: Pre-recorded with live Q&A

Moderator(s): Iris Antes

  • Constantin Schneider, University of Oxford, United Kingdom
  • Andrew Buchanan, AstraZeneca, United Kingdom
  • Bruck Taddese, AstraZeneca, United Kingdom
  • Charlotte Deane, University of Oxford, United Kingdom

Presentation Overview:

Antibodies are one of the most important classes of pharmaceuticals, with over 80 approved molecules currently in use against a wide variety of diseases.
Despite this importance, the discovery and development process for antibody therapeutics remains reliant on high-throughput experimental screens, which are both cost- and time-intensive.

Recently, several machine learning studies on antibody sequence data have demonstrated the potential of machine learning approaches for the development of computational tools to support the antibody therapeutics discovery pipeline. Here, we describe the development of a framework for structure-based deep learning for antibodies (DLAB), which can be used to learn from structural data on antibody-antigen complexes. We demonstrate that DLAB can be used both to improve antibody-antigen docking and for structure-based virtual screening of antibody drug candidates.

12:00-12:20
TCRen: a statistical potential for residue interaction that accurately predicts TCR:peptide recognition
Format: Pre-recorded with live Q&A

Moderator(s): Iris Antes

  • Vadim Karnaukhov, Skoltech, Russia
  • Dmitrii Shcherbinin, Shemyakin–Ovchinnikov Institute of bioorganic chemistry RAS, Russia
  • Anton Chugunov, Shemyakin–Ovchinnikov Institute of bioorganic chemistry RAS, Russia
  • Ivan Zvyagin, Shemyakin–Ovchinnikov Institute of bioorganic chemistry RAS, Russia
  • Dmitrii Chudakov, Shemyakin–Ovchinnikov Institute of bioorganic chemistry RAS, Russia
  • Roman Efremov, Shemyakin–Ovchinnikov Institute of bioorganic chemistry RAS, Russia
  • Mikhail Shugay, Shemyakin–Ovchinnikov Institute of bioorganic chemistry RAS, Russia

Presentation Overview:

T-cell receptor (TCR) recognition of foreign peptides presented by major histocompatibility complex (MHC) proteins is a crucial step in triggering the adaptive immune response. Prediction of TCR:peptide recognition is important for many clinically relevant problems: prediction of cross-reactivity of TCRs used in adoptive T-cell-based therapies, identification of targets for antigen-specific therapies of autoimmune disorders, and vaccine design. In this work, we propose a knowledge-based potential, TCRen, that can be used to assess the binding probability between TCRs and cognate antigens. TCRen is derived from statistics of amino acid residue contacts between peptides and TCRs in crystal structures of TCR-peptide-MHC complexes from the PDB. We demonstrate excellent performance of TCRen on two tasks related to TCR-peptide recognition: 1) discrimination between real and mock TCR-peptide-MHC complexes; 2) discrimination between the cognate epitope and unrelated peptides in TCR-peptide-MHC crystal structures. Comparison of TCRen with potentials describing general protein-protein interaction and protein folding rules reveals the distinctive features of TCR-peptide interactions, such as intrinsic asymmetry of the interface, a complex interplay between different physicochemical properties of contacting residues, and a lower impact of hydrophobic interactions. We expect that TCRen may be further used in the development of novel tools for prediction of TCR specificity and cross-reactivity and for modeling of TCR-peptide-MHC structures.
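
A knowledge-based contact potential of this general kind can be sketched as the negative log ratio of observed to expected contact counts. The abstract does not give TCRen's reference state or normalization, so the expected counts (product of marginal frequencies), the pseudocount, and the function names below are assumptions. Keys are ordered (TCR residue, peptide residue), so the table is not symmetric.

```python
from collections import Counter
from math import log

def contact_potential(contacts, pseudocount=1.0):
    """contacts: list of (tcr_residue, peptide_residue) pairs seen in structures."""
    pair_counts = Counter(contacts)
    tcr_counts = Counter(t for t, _ in contacts)
    pep_counts = Counter(p for _, p in contacts)
    total = len(contacts)
    potential = {}
    for t in tcr_counts:
        for p in pep_counts:
            expected = tcr_counts[t] * pep_counts[p] / total
            observed = pair_counts[(t, p)]
            # Negative values mark contacts seen more often than expected
            potential[(t, p)] = -log((observed + pseudocount) / (expected + pseudocount))
    return potential

def score_complex(contacts, potential):
    """Sum of pair energies; unseen pairs contribute 0 (an assumption)."""
    return sum(potential.get(pair, 0.0) for pair in contacts)

# Hypothetical training contacts
training = ([("Y", "E")] * 6 + [("Y", "L")] * 2 +
            [("G", "L")] * 6 + [("G", "E")] * 2)
potential = contact_potential(training)
favourable = score_complex([("Y", "E"), ("G", "L")], potential)
unfavourable = score_complex([("Y", "L"), ("G", "E")], potential)
print(round(favourable, 3), round(unfavourable, 3))
```

Under this convention a lower total score indicates a more plausible TCR-peptide interface, which is how such a potential could rank a cognate peptide against unrelated ones.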

12:40-13:00
Proceedings Presentation: Predicting MHC-peptide binding affinity by differential boundary tree
Format: Pre-recorded with live Q&A

Moderator(s): Okke Melse

  • Jianzhu Ma, Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA, United States
  • Peiyuan Feng, Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China, China
  • Jianyang Zeng, Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China, China

Presentation Overview:

The prediction of binding between peptides and major histocompatibility complex (MHC) molecules plays an important role in neoantigen identification. Although a large number of computational methods have been developed to address this problem, they produce high false positive rates in practical applications, since in most cases a single residue mutation may largely alter the binding affinity of a peptide to MHC, which cannot be identified by conventional deep learning methods.
We developed a differential boundary tree model, named DBTpred, to address this problem. We demonstrated that DBTpred can predict MHC class I binding affinity more accurately than state-of-the-art deep learning methods. We developed a parallel training algorithm to accelerate the training and inference process, which enables DBTpred to be applied to large datasets. By investigating the statistical properties of differential boundary trees and the prediction paths to test samples, we revealed that DBTpred can provide an intuitive interpretation and possible hints for detecting important residue mutations that can largely influence binding affinity.

13:00-13:20
Contribution of bioinformatics to blood transfusion: database and 3D intraprotein interaction studies
Format: Pre-recorded with live Q&A

Moderator(s): Okke Melse

  • Aline Floch, Univ Paris Est Creteil, INSERM, IMRB, EFS, France
  • Stéphane Téletchéa, Nantes Université, CNRS UMR 6286, UFIP, France
  • Alexandre G. De Brevern, INSERM UMR-S 1134, DSIMB, University of Paris, France

Presentation Overview:

In transfusion medicine, blood group antigens from the Rhesus system play a crucial role. We have provided a more complete and integrative view of this system.
Thanks to intensive curation (>500 articles), a modern database was developed for RhD variants. RHeference (http://www.rheference.org/) contains 2-3 times more entries than related databases and supports multiple types of queries. It contains more than 10,000 individual pieces of data and is useful for medical doctors and researchers (Floch et al, Transfus Med Rev, 2021).
We have proposed a model of RhD, highlighting the value of 3D structural models for understanding mutations involved in transfusion problems (de Brevern et al, Transfusion, 2018).
Modeling of trimers provided arguments for the benignity of some known RhD variants in transfusion medicine (Floch et al, Transfusion, 2021). With molecular dynamics, we were able to show that the stoichiometry (still unknown) does not influence the dynamics of the monomers. Different parts of the complexes were categorized with classical approaches and with a structural alphabet (Melarkode Vattekatte et al, J Struct Biol, 2020); some loops are disordered regions (manuscript in preparation).
These studies provide a baseline of the structural behavior of these proteins for studying the hundreds of genetic variants relevant in transfusion medicine.

13:20-13:40
SARS-CoV-2 genome variants epidemiology surveillance in Ethiopia and dynamic mutational change of S Spike protein through computational analysis
Format: Pre-recorded with live Q&A

Moderator(s): Okke Melse

  • Ayele Abaysew, Technology and Innovation Institute, Ethiopia
  • Tesfaye Adisu, Technology and Innovation Institute, Ethiopia
  • Professor Rita Majumdar, Sharda University, India

Presentation Overview:

SARS-CoV-2 has caused a global pandemic since 2019, and 4 thousand deaths had been registered in Ethiopia as of June 2021. This study aims to identify the SARS-CoV-2 variants circulating in Ethiopia and to detect dynamic mutational changes affecting spike antigenicity, based on computational analysis of genome data. The genomes from Ethiopia were confirmed to be evolutionarily related to RaTG13 and SL-bat coronavirus, and the spike receptor sites were conserved. The genomes were distributed across clades GH, GR and O (other), the last of which may include new variants. Three samples from female patients were identified as the variant of concern VUI202012/01 (GRY, B.1.1.7). Among 21 notable mutations, D614G (71%), D614X (28%), N501Y (35%) and NSP5 S284G (21%) occurred predominantly in our genome samples and could affect antigenicity and infectivity. The N440K mutation was detected in one sample and may confer resistance to SER-52 antibody neutralization and contribute to vaccine escape.

13:40-14:00
Revealing SARS-CoV-2 protein architectures through in-situ MS proteomics and integrative modeling
Format: Pre-recorded with live Q&A

Moderator(s): Okke Melse

  • Michal Linial, The Hebrew University of Jerusalem, Israel
  • Nir Kalisman, The Hebrew University of Jerusalem, Israel
  • Dina Schneidman, The Hebrew University of Jerusalem, Israel

Presentation Overview:

The genome of SARS-CoV-2, the causal virus of the COVID-19 pandemic, encodes 29 proteins. However, only a handful of them are associated with structure and function. In this study, we utilize a novel application called in situ cross-linking mass spectrometry (in situ CLMS), which provides rich spatial information on the structures of proteins as they occur in intact cells. We demonstrate the utility of this approach by targeting three SARS-CoV-2 proteins for which full atomic structures are missing. We show that integrating cross-links with external structural data is sufficient to model the full-length protein. Cells that expressed tagged Nsp1 were subjected to the in situ CLMS approach. We identified the interactions of Nsp1 with the 40S ribosomal subunit, which confirms its fundamental role in blocking translation in infected cells. Similarly, based on AlphaFold2 structure predictions of individual Nsp2 domains, we successfully assembled Nsp2 into a single consistent model. The Nucleocapsid (N) protein plays a key role in genome packing and virion assembly. In situ CLMS was fundamental to assembling a model of the full dimer from available 3D structures of individual domains. These results highlight the importance of cellular context for achieving detailed atomic resolution of SARS-CoV-2 proteins.
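
One common way to use cross-links when assembling or checking a model is to ask what fraction of them is spanned by a plausible distance. The sketch below does this for Cα coordinates; the abstract does not describe the scoring that was used, and the 30 Å threshold, the coordinates, and the function name are assumptions (the appropriate distance depends on the cross-linker).

```python
from math import dist

def crosslink_satisfaction(ca_coords, crosslinks, max_distance=30.0):
    """Fraction of cross-links whose C-alpha distance is within max_distance."""
    if not crosslinks:
        return 0.0
    satisfied = sum(dist(ca_coords[a], ca_coords[b]) <= max_distance
                    for a, b in crosslinks)
    return satisfied / len(crosslinks)

# Hypothetical model: residue index -> C-alpha coordinate (angstroms)
model = {10: (0.0, 0.0, 0.0), 45: (12.0, 5.0, 3.0), 120: (48.0, 0.0, 0.0)}
links = [(10, 45), (10, 120), (45, 120)]
fraction = crosslink_satisfaction(model, links)
print(round(fraction, 2))
```

Candidate domain arrangements can then be ranked by this fraction, with the arrangement that satisfies the most cross-links preferred.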

14:20-14:40
Energetic Local Frustration across NMR Structures in the Protein Data Bank
Format: Pre-recorded with live Q&A

Moderator(s): Iris Antes

  • Atilio O Rausch, Facultad de Ingenieria, Universidad Nacional de Entre Rios, Oro Verde, Argentina, Argentina
  • Alexander M. Monzon, Department of Biomedical Sciences, University of Padua, Padua, Italy, Italy
  • Leandro G. Radusky, Center for Genomic Regulation, Barcelona Institute for Science and Technology, Barcelona, Spain, Spain
  • R. Gonzalo Parra, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany, Germany

Presentation Overview:

Natural proteins spontaneously fold by globally minimizing their internal conflicts. However, 10-15% of their residue-residue interactions are in strong energetic conflict, or “highly frustrated”. Such frustration sculpts protein dynamics and allows proteins to explore the basin of their energy landscapes. NMR spectroscopy provides information on proteins in solution and hence enables the study of the ensemble of structures that constitute the native state. Thanks to our brand new FrustratometeR (Rausch et al, Bioinformatics 2021), we have been able to analyse all NMR entries in the PDB to study the links between local frustration and structural flexibility.

Results:
We have analysed 3191 non-redundant proteins with a median of 20 models per PDB entry. We have computed local frustration for all ensembles and analysed the typical frustration variations to be expected in NMR structures. We correlated local frustration values with amino acid identity as well as with many structural properties, such as RMSF and disorder propensity.

Significance:
We present the first work that studies local frustration in the context of the conformational dynamics of the native state. Our work contributes to a better understanding of the importance of local frustration for protein motion and function.
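
The kind of per-residue comparison described in the Results can be sketched as follows: compute the root-mean-square fluctuation (RMSF) of each residue across the models of an NMR ensemble, then correlate it with a per-residue frustration value. The sketch assumes the models are already superimposed; the coordinates and frustration values are hypothetical placeholders and are not produced by FrustratometeR.

```python
from math import sqrt

def rmsf_per_residue(models):
    """models: list of conformers, each a list of (x, y, z) per residue."""
    n_models, n_res = len(models), len(models[0])
    rmsf = []
    for r in range(n_res):
        mean = [sum(m[r][k] for m in models) / n_models for k in range(3)]
        msd = sum(sum((m[r][k] - mean[k]) ** 2 for k in range(3))
                  for m in models) / n_models
        rmsf.append(sqrt(msd))
    return rmsf

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two hypothetical conformers with three residues each
ensemble = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 1.0, 0.0)],
]
rmsf = rmsf_per_residue(ensemble)
frustration = [0.1, 0.4, 0.9]  # hypothetical per-residue values
print(rmsf, round(pearson(rmsf, frustration), 3))
```

With real data, one RMSF value and one frustration value per residue would be collected for every ensemble before computing correlations across the whole dataset.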

14:40-15:00
Final Remarks
Format: Live-stream

Moderator(s): Iris Antes



International Society for Computational Biology
525-K East Market Street, RM 330
Leesburg, VA, USA 20176