



Schedule subject to change
Wednesday, July 15th
10:40 AM-11:00 AM
Deep Learning of Protein Structural Classes: Any Evidence for an 'Urfold'?
Format: Pre-recorded with live Q&A

  • Eli Draizen, University of Virginia, United States
  • Menuka Jaiswal, University of Virginia, United States
  • Saad Saleem, University of Virginia, United States
  • Yonghyeon Kweon, University of Virginia, United States
  • Stella Veretnik, University of Virginia, United States
  • Cameron Mura, University of Virginia, United States
  • Philip Bourne, University of Virginia, United States

Presentation Overview:

Recent advances in protein structure determination and prediction offer new opportunities to decipher relationships amongst proteins, a task that entails 3D structure comparison and classification. Historically, protein domain classification has been somewhat manual and heuristic. While CATH and related resources represent significant steps towards a more systematic (and automatable) approach, more scalable and objective classification methods, e.g., grounded in machine learning, could be informative. Indeed, comparative analyses of protein structures via Deep Learning (DL), though they may entail large-scale restructuring of classification schemes, could uncover distant relationships. We have developed new DL models for domain structures (including physicochemical properties), focused initially at CATH's homologous superfamily (SF) level. Adapting DL approaches from image classification and segmentation, we have devised and applied a hybrid convolutional autoencoder architecture that allows SF-specific models to learn features that, in a sense, 'define' the various homologous SFs. We quantitatively evaluate pairwise 'distances' between SFs by building one model per SF and comparing the models' loss functions. Clustering on these distance matrices provides a new view of protein interrelationships: a view that extends beyond simple structural/geometric similarity towards the realm of structure/function properties, and that is consistent with a recently proposed 'Urfold' concept.
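A minimal sketch of the cross-model distance idea, under stated assumptions: it assumes a hypothetical matrix `loss[i][j]` holding the mean reconstruction loss of the model trained on superfamily i when evaluated on domains from superfamily j, defines a symmetric distance as the average excess loss each model incurs on the other superfamily, and groups superfamilies with plain single-linkage clustering. The distance formula, the function names, and the choice of single linkage are illustrative only, not the authors' published procedure.

```python
def sf_distances(loss):
    """Symmetric SF-SF distance from a cross-evaluation loss matrix.

    loss[i][j]: mean loss of the model trained on SF i, evaluated on SF j (hypothetical input).
    d(i, j) averages how much worse each model does on the other SF than on its own.
    """
    n = len(loss)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                excess_ij = loss[i][j] - loss[i][i]  # model i on SF j vs. model i on SF i
                excess_ji = loss[j][i] - loss[j][j]  # model j on SF i vs. model j on SF j
                dist[i][j] = max(0.0, 0.5 * (excess_ij + excess_ji))
    return dist


def single_linkage(dist, n_clusters):
    """Merge the two closest clusters (minimum pairwise distance) until n_clusters remain."""
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] |= clusters.pop(b)  # b > a, so index a is unaffected by the pop
    return sorted(sorted(c) for c in clusters)


# Toy example: SFs 0/1 reconstruct each other well, as do SFs 2/3.
loss = [
    [1.0, 1.2, 3.0, 3.1],
    [1.3, 1.0, 2.9, 3.0],
    [3.2, 3.0, 1.0, 1.1],
    [3.0, 3.1, 1.2, 1.0],
]
print(single_linkage(sf_distances(loss), 2))  # [[0, 1], [2, 3]]
```

Averaging the two directions makes the distance symmetric even though a model trained on one SF need not transfer equally well in both directions.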

11:00 AM-11:10 AM
EM Map Segmentation and De Novo Protein Structure Modeling for Multiple Chain Complexes with MAINMAST
Format: Pre-recorded with live Q&A

  • Genki Terashi, Department of Biological Sciences, Purdue University, United States
  • Yuki Kagaya, Tohoku University, Japan
  • Daisuke Kihara, Purdue University, United States

Presentation Overview:

The rapid progress of cryo-electron microscopy (cryo-EM) has created a pressing need for software for the structural interpretation of EM maps. Map segmentation methods are particularly needed for modeling, because most modeling methods are designed to build the structure of a single protein. Here, we developed new software, MAINMASTseg, for segmenting maps with symmetry. Unlike existing segmentation methods, which merely consider the densities in an input EM map, MAINMASTseg captures the underlying molecular structure by constructing a skeleton that connects local dense points in the map. MAINMASTseg performed significantly better than other popular existing methods.
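The skeleton idea can be illustrated with a small sketch: detect local density maxima on a 3D grid, then connect them with a minimum spanning tree (Prim's algorithm on Euclidean distances). This is a simplified, hypothetical stand-in; MAINMASTseg's actual dense-point detection, tree construction, and symmetry-aware segmentation are more involved, and the nested-list grid format, the threshold, and the function names are assumptions.

```python
import itertools
import math


def local_maxima(grid, threshold):
    """Voxels with density >= threshold that no 26-neighbour exceeds; grid is nested lists [x][y][z]."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    peaks = []
    for x, y, z in itertools.product(range(nx), range(ny), range(nz)):
        v = grid[x][y][z]
        if v < threshold:
            continue
        neighbours = (
            (x + dx, y + dy, z + dz)
            for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3)
            if (dx, dy, dz) != (0, 0, 0)
        )
        if all(
            grid[i][j][k] <= v
            for i, j, k in neighbours
            if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz
        ):
            peaks.append((x, y, z))
    return peaks


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete graph of points; returns (i, j, length) edges."""
    if len(points) < 2:
        return []
    # best[k] = (distance from point k to the growing tree, tree vertex achieving it)
    best = {k: (math.dist(points[0], points[k]), 0) for k in range(1, len(points))}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        d, i = best.pop(j)
        edges.append((i, j, d))
        for k in best:  # updating values of existing keys during iteration is safe
            dk = math.dist(points[j], points[k])
            if dk < best[k][0]:
                best[k] = (dk, j)
    return edges


# Toy 5x5x5 map with three dense points.
grid = [[[0.0] * 5 for _ in range(5)] for _ in range(5)]
grid[1][1][1], grid[1][3][1], grid[3][3][3] = 5.0, 3.0, 4.0
peaks = local_maxima(grid, threshold=1.0)
print(peaks)  # [(1, 1, 1), (1, 3, 1), (3, 3, 3)]
print(minimum_spanning_tree(peaks))
```

A spanning tree is a natural skeleton because it links every dense point with the shortest total path length and no cycles, which loosely resembles a chain tracing through the density.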

11:10 AM-11:20 AM
Protein Contact Map De-noising Using Generative Adversarial Networks
Format: Pre-recorded with live Q&A

  • Yuki Kagaya, Tohoku University, Japan
  • Daisuke Kihara, Purdue University, United States
  • Sai Raghavendra Maddhuri Venkata Subramaniya, Purdue University, United States
  • Aashish Jain, Purdue University, United States

Presentation Overview:

Protein residue-residue contact prediction from sequence information has improved substantially in recent years and has been a driving force in the protein structure prediction field. In this work, we propose ContactGAN, a novel contact map denoising method that uses generative adversarial networks (GANs). ContactGAN takes a predicted protein contact map as input and outputs a refined, more accurate contact map. On a test set of 43 protein domains from CASP13, ContactGAN improved the precision of the top L/1 long-range contacts by 24% on average.
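The reported metric, precision of the top L/1 long-range contacts, can be made concrete with a short sketch. It assumes the common CASP convention (long-range means a sequence separation of at least 24 residues; a native contact is usually defined as C-beta atoms within 8 Å); the function and argument names are hypothetical.

```python
def top_long_range_precision(pred, native, length, k=1, min_sep=24):
    """Precision of the top L/k predicted long-range contacts.

    pred: {(i, j): predicted contact probability} with i < j.
    native: set of (i, j) pairs that are in contact in the native structure.
    length: sequence length L.
    """
    long_range = sorted(
        ((p, pair) for pair, p in pred.items() if pair[1] - pair[0] >= min_sep),
        reverse=True,  # highest-probability pairs first
    )
    top = [pair for _, pair in long_range[: max(1, length // k)]]
    if not top:
        return 0.0
    return sum(pair in native for pair in top) / len(top)


pred = {(0, 30): 0.9, (1, 40): 0.8, (2, 50): 0.7, (3, 60): 0.6, (4, 70): 0.5, (0, 5): 0.99}
native = {(0, 30), (2, 50), (4, 70)}
print(top_long_range_precision(pred, native, length=80))  # 0.6: all 5 long-range pairs fit in the top L
```

The short-range pair (0, 5) is ignored despite its high probability, which is why a denoiser that sharpens long-range signal can raise this metric substantially.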

11:20 AM-11:30 AM
Deep Learning Protein Contacts and Real-valued Distances Using PDNET
Format: Pre-recorded with live Q&A

  • Badri Adhikari, University of Missouri-St. Louis, United States

Presentation Overview:

As deep learning algorithms drive progress in protein structure prediction, much remains to be studied at this emerging crossroads of deep learning and protein structure prediction. Recent findings show that inter-residue distance prediction, a more granular version of the well-known contact prediction problem, is key to predicting accurate models. We believe that deep learning methods that predict these distances are still in their infancy. To advance these methods and develop other novel methods, we need a small, representative dataset packaged for fast development and testing. In this work, we introduce Protein Distance Net (PDNET), a dataset derived from the widely used DeepCov dataset, consisting of 3456 representative protein chains. It is packaged with all the scripts that were used to curate the dataset and generate the input features and distance maps, as well as scripts with deep learning models to train, validate, and test. Deep learning models can also be trained and tested in a web browser using free platforms such as Google Colab. We discuss how our framework can be used to predict contacts, distance intervals, and real-valued distances. PDNET is available at https://github.com/ba-lab/pdnet/.
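Because contacts, distance intervals, and real-valued distances are related prediction targets, a short sketch shows how all three can be derived from one true distance map. The 8 Å contact cutoff is the common convention; the bin edges and function names are hypothetical and not necessarily those used in PDNET.

```python
import bisect


def contact_map(dist, cutoff=8.0):
    """Binary contact target: 1 where the inter-residue distance is below cutoff (angstroms)."""
    return [[1 if d < cutoff else 0 for d in row] for row in dist]


def distance_bins(dist, edges=(4.0, 6.0, 8.0, 10.0, 12.0, 14.0)):
    """Distance-interval target: bin b holds edges[b-1] <= d < edges[b].

    Bin 0 is d < edges[0]; bin len(edges) is d >= edges[-1].
    """
    return [[bisect.bisect_right(edges, d) for d in row] for row in dist]


# Real-valued target: the distance map itself (it may be clipped or transformed in practice).
dist = [
    [0.0, 5.0, 20.0],
    [5.0, 0.0, 9.0],
    [20.0, 9.0, 0.0],
]
print(contact_map(dist))    # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
print(distance_bins(dist))  # [[0, 1, 6], [1, 0, 3], [6, 3, 0]]
```

The three targets differ only in granularity: a binary label, a class index, and a continuous value, so one network backbone can be trained with a different output head for each.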

11:30 AM-11:40 AM
Redundancy-Weighting the PDB for Detailed Secondary Structure Prediction
Format: Pre-recorded with live Q&A

  • Tomer Sidi, Ben-Gurion University of the Negev, Israel
  • Chen Keasar, Ben Gurion University of the Negev, Israel

Presentation Overview:

The Protein Data Bank (PDB), the ultimate source of data in structural biology, is inherently imbalanced. To alleviate biases, virtually all structural biology studies use non-redundant subsets of the PDB, which include only a fraction of the available data. An alternative approach, dubbed redundancy-weighting, down-weights redundant entries rather than discarding them. This approach may be particularly helpful for Machine Learning (ML) methods that use the PDB as their data source.
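A minimal sketch of redundancy-weighting, assuming the simplest scheme: entries are first grouped into sequence-similarity clusters (the cluster labels here are hypothetical and would come from a clustering step run beforehand), and each entry is weighted by the inverse of its cluster size, so every cluster contributes a total weight of one instead of being reduced to a single representative. The weighting actually used in this study may differ.

```python
from collections import Counter


def redundancy_weights(cluster_ids):
    """Weight each entry by 1 / (size of its cluster); each cluster sums to weight 1."""
    sizes = Counter(cluster_ids)
    return [1.0 / sizes[c] for c in cluster_ids]


def weighted_accuracy(correct, weights):
    """Accuracy where each example counts in proportion to its weight."""
    return sum(w for ok, w in zip(correct, weights) if ok) / sum(weights)


# Three near-identical chains (cluster "a") and one unique chain (cluster "b").
clusters = ["a", "a", "a", "b"]
weights = redundancy_weights(clusters)
correct = [True, True, True, False]
print(sum(correct) / len(correct))          # 0.75: plain accuracy, dominated by cluster "a"
print(weighted_accuracy(correct, weights))  # ~0.5: each cluster counts equally
```

During training, the same weights can multiply each example's loss term, so all redundant entries still contribute data while no single over-represented family dominates.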

Current state-of-the-art methods for protein Secondary Structure Prediction (SSP) use non-redundant datasets to predict either 3-letter or 8-letter secondary structure annotations. The current study challenges both choices: the dataset and the alphabet size. Non-redundant datasets are presumably unbiased, but they are also inherently small, which limits machine learning performance. On the other hand, the utility of both the 3- and 8-letter alphabets is limited by the aggregation of parallel, anti-parallel, and mixed beta-sheets into a single class. Each of these subclasses imposes different structural constraints, which makes distinguishing between them desirable. In this study, we show that training on a redundancy-weighted dataset improves prediction accuracy. Further, we show that extending the alphabet to consider beta subclasses increases the information content while hardly affecting SSP accuracy.
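The alphabet issue is visible in the common reduction of the 8-state DSSP alphabet to 3 states, sketched below: both alphabets label every strand residue E (or B), whether its sheet is parallel, anti-parallel, or mixed, which is exactly the information an extended alphabet would add. This mapping is one widely used convention among several, and the function names are hypothetical.

```python
# One common 8-to-3 reduction of DSSP states (other conventions exist).
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",  # alpha, 3-10, and pi helices -> helix
    "E": "E", "B": "E",            # strand and isolated beta-bridge -> strand (sheet type is lost)
    "T": "C", "S": "C", "-": "C",  # turn, bend, and loop -> coil
}


def reduce_to_3(ss8):
    """Map an 8-state string to 3 states; unknown symbols are treated as coil."""
    return "".join(DSSP8_TO_3.get(s, "C") for s in ss8)


def q_accuracy(pred, true):
    """Fraction of residues whose predicted state matches (Q3 or Q8, depending on alphabet)."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)


print(reduce_to_3("HHGGIEEBTS-"))  # HHHHHEEECCC
```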

12:00 PM-12:40 PM
3DSIG Keynote: Thinking Deeply About Protein Structure Prediction
Format: Live-stream

  • David Jones, University College London, United Kingdom

Presentation Overview:

In this talk I will review the astonishing recent progress in protein structure prediction that has arisen from better modelling of amino acid covariation effects and, most recently, from the application of deep neural networks to the problem, e.g., DeepMind's AlphaFold. Bringing things up to the present day, I will describe some of the recent method developments we have been experimenting with in my own lab, and discuss how close we are to being able to model the structures of the entire complement of proteins encoded by a bacterial genome, and the interactions between them. I will finish by discussing the future prospects and current limitations of these techniques, along with some intriguing new methods coming down the line that might be able to take us further.

2:00 PM-2:10 PM
De Novo Protein Design for Novel Folds with Guided & Conditional Wasserstein GAN
Format: Pre-recorded with live Q&A

  • Mostafa Karimi, Texas A&M University, United States
  • Shaowen Zhu, Texas A&M University, United States
  • Yue Cao, Texas A&M University, United States
  • Yang Shen, Texas A&M University, United States

Presentation Overview:

With protein sequence and structure data accumulating rapidly, this study addresses the following question: to what extent could current data alone reveal deep insights into the sequence-structure relationship, such that new sequences can be designed accordingly for novel structure folds? We have developed novel deep generative models, constructed a low-dimensional representation of fold space, exploited sequence data with and without paired structures, and developed an ultra-fast fold predictor (oracle) as feedback. The resulting semi-supervised gcWGAN (guided & conditional Wasserstein Generative Adversarial Networks), assessed by the oracle over 100 novel folds not in the training set, achieves higher yields and covers 3.5 times more target folds than a competing data-driven method (cVAE). gcWGAN designs are predicted to be physically and biologically sound. For representative novel folds, including one not even among the basis folds, Rosetta predicts gcWGAN designs to have comparable or better fold accuracy, yet much greater sequence diversity and sometimes novelty. The ultra-fast data-driven model is shown to boost the success of principle-driven Rosetta for de novo design by generating design seeds and tailoring the design space. In conclusion, gcWGAN explores uncharted sequence space to design proteins by learning from current sequence-structure data.
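For reference, the standard conditional Wasserstein GAN objective that a model of this kind builds on can be written as below, with x a sequence, c a target-fold representation (for example, a point in the low-dimensional fold space), z a noise vector, G the generator, and D the critic. This is the generic textbook form in our own notation; the oracle-based guidance and the semi-supervised components described above are not shown.

```latex
\min_{G}\; \max_{D \in \mathcal{D}_{1}}\;
  \mathbb{E}_{(x, c) \sim p_{\mathrm{data}}}\!\left[ D(x, c) \right]
  - \mathbb{E}_{z \sim p_{z},\; c \sim p_{c}}\!\left[ D\big(G(z, c), c\big) \right]
```

Here \mathcal{D}_1 is the set of 1-Lipschitz critics; in practice the Lipschitz constraint is enforced only approximately, for example by weight clipping or a gradient penalty.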

2:10 PM-2:20 PM
The vestibule role of membrane-water interface as the intermediate stage in a new three-stage model for helical membrane protein folding
Format: Pre-recorded with live Q&A

  • Bridget Kawamala, CSUN, United States
  • Ravinder Abrol, California State University, Northridge, United States

Presentation Overview:

Transmembrane alpha-helical (TMH) proteins play critical roles in cellular signaling. They display a diversity of structural folds featuring almost-parallel TM helices packed into helical bundles. The membrane environment enormously reduces the accessible conformational landscape for folding, but it also makes folding experiments challenging. The contribution of helix insertion energies to the folding energy landscape was computed using structural-bioinformatics-based hydropathy analysis for most of the polytopic helical membrane proteome with known structures (from 1-TMH to 24-TMH proteins). The magnitudes of TM helix insertion energies from water to the membrane-water interface (WAT→INT energies) are on average half those of insertion energies from water to the transmembrane helix orientation (WAT→TMH energies), suggesting a potential vestibule role of the membrane-water interface for TM helices after they exit the translocon. This is confirmed by multiple microsecond-long molecular dynamics simulations showing that very hydrophobic TM helices are stable in the membrane-water interface: a stop-transfer helix, a re-integration helix, and a pre-folded helical hairpin from the ribosomal exit vestibule. We therefore propose a three-stage folding model that extends Popot-Engelman's original two-stage model, in which the membrane-water interface acts as an intermediate holding vestibule for translated TM helices, reconciling the interface's critical role seen in many previous studies.
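Hydropathy analysis of this general kind can be illustrated with a sliding-window profile. The sketch below uses the Kyte-Doolittle scale with the commonly cited 19-residue window and 1.6 threshold for flagging candidate TM helices, chosen only because it is familiar; the WAT→INT and WAT→TMH insertion energies in this work rest on scales and procedures not specified here, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (a stand-in, not the scale used in this study).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; entry i covers seq[i:i + window]."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_windows(seq, window=19, cutoff=1.6):
    """Start indices of windows hydrophobic enough to suggest a TM helix."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > cutoff]


seq = "K" * 10 + "L" * 19 + "K" * 10  # a hydrophobic stretch flanked by lysines
print(candidate_tm_windows(seq))  # [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```

Comparing two such profiles computed with an interface scale and a bilayer-core scale is one way to contrast interface and transmembrane insertion propensities, but that comparison is only suggested here, not implemented.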

2:20 PM-2:40 PM
Detecting Symmetry in Membrane Proteins
Format: Pre-recorded with live Q&A

  • Lucy Forrest, NINDS - NIH, United States
  • Antoniya Aleksandrova, NINDS - NIH, United States
  • Edoardo Sarti, CNRS - Sorbonne Université, France

Presentation Overview:

Available membrane protein structures have revealed an abundance of symmetry and pseudo-symmetry, arising not only from the formation of multi-subunit assemblies but also from the repetition of internal structural elements. In many cases, these symmetry relationships play a crucial role in defining the functional properties of the proteins. Therefore, a systematic study of symmetry should provide a framework for a broader understanding of the mechanistic principles and evolutionary development of membrane proteins. However, available symmetry detection methods have not been tested systematically on this class of proteins because an appropriate benchmark set was lacking. Hence, we collected membrane protein structures with unique architectures and manually curated their symmetries to create the MemSTATS dataset. Using MemSTATS, we compared the performance of four widely used symmetry detection algorithms and pinpointed areas for improvement. To address the identified shortcomings, we developed a robust symmetry detection methodology called MSSD, which takes into account the restrictions that the lipid bilayer places on protein structures. MSSD detected symmetries with higher accuracy and a lower false positive rate than any other tested method. Consequently, we used MSSD to analyze all available membrane protein structures and present the resulting symmetries in a database called EncoMPASS (encompass.ninds.nih.gov).
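Structure-based symmetry detection commonly involves a basic step: superpose one putative repeat onto its symmetry-related partner and measure how well they agree. A minimal NumPy sketch of that step, optimal superposition by the Kabsch algorithm followed by RMSD, is shown below. It assumes the residue correspondence between the two repeats is already known; MSSD and the other tested methods must also find that correspondence and, in MSSD's case, account for membrane constraints, none of which is shown.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between matched N x 3 coordinate sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)  # remove translation by centring on the centroids
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c  # 3 x 3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 would mean a reflection; correct it
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # proper rotation mapping p_c onto q_c
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# A "repeat" and its copy related by a 2-fold rotation about z, plus a translation.
rng = np.random.default_rng(0)
repeat_a = rng.normal(size=(12, 3))
two_fold = np.diag([-1.0, -1.0, 1.0])
repeat_b = repeat_a @ two_fold.T + np.array([5.0, 0.0, 0.0])
print(kabsch_rmsd(repeat_a, repeat_b))  # ~0 for an exact symmetry
```

The determinant check matters: without it, the SVD can return a reflection, which would make mirror-image repeats look falsely symmetric.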

2:40 PM-3:00 PM
Evolutionary pathways of repeat protein topology in bacterial outer membrane proteins
Format: Pre-recorded with live Q&A

  • Joanna Slusky, University of Kansas, United States
  • Meghan Franklin, University of Kansas, United States
  • Sergey Nepomnyachiy, University of Haifa, Israel
  • Ryan Feehan, University of Kansas, United States
  • Nir Ben Tal, Tel-Aviv University, Israel
  • Rachel Kolodny, University of Haifa, Israel

Presentation Overview:

Outer membrane proteins (OMPs) are the proteins on the surface of Gram-negative bacteria. These proteins have diverse functions but a single topology: the β-barrel. Sequence analysis has suggested that this common fold is a β-hairpin repeat protein, and that amplification of the β-hairpin has resulted in 8–26-stranded barrels. Using an integrated approach that combines sequence and structural analyses, we find events in which diversification without amplification also increases barrel strand number. Our network-based analysis reveals strand-number-based evolutionary pathways, including one that progresses from a primordial 8-stranded barrel to 16 strands and, further, to 18 strands. Among these pathways are mechanisms of strand-number accretion without domain duplication, such as a loop-to-hairpin transition. These mechanisms illustrate the perpetuation of repeat protein topology without genetic duplication, likely induced by the hydrophobic membrane. Finally, we find that the evolutionary trace is particularly prominent in the C-terminal half of OMPs, implicating this region in the nucleation of OMP folding.

3:20 PM-3:30 PM
BIO-GATS: A tool for automated GPCR template selection through a biophysical approach for homology modeling.
Format: Pre-recorded with live Q&A

  • Shoba Ranganathan, Macquarie University, Australia
  • Amara Jabeen, Macquarie University, Australia
  • Ramya Vijayram, Indian Institute of Technology Madras, Chennai, India

Presentation Overview:

G protein-coupled receptors (GPCRs) are the largest family of membrane proteins, with seven transmembrane (TM) domains and more than 800 members. GPCRs are involved in numerous physiological functions within the human body and are the targets of more than 30% of US Food and Drug Administration (FDA)-approved drugs. At present, 64 unique receptors have known experimental structures. Because most GPCRs lack an experimental structure, structure-based drug discovery workflows require homology models, and homology modeling requires appropriate templates. Common template-selection methods consider sequence identity; however, sequence identity among the TM domains of GPCRs is low. Sequences with similar patterns of hydrophobic residues are often structural homologues, even when they share low sequence identity. We have proposed a novel biophysical approach for template selection based on the hydrophobicity correspondence between the target and the template. The approach also takes other parameters into consideration, including sequence identity, resolution, and query coverage. The approach has been implemented as a graphical user interface. We have applied it to an olfactory receptor (OR) and presented a comprehensive comparison of the candidate templates for ORs based on our template selection criteria.
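To make "hydrophobicity correspondence" concrete, the sketch below scores an aligned target-template pair by the fraction of gap-free aligned positions where both residues fall in the same hydrophobic/polar class, and blends that with sequence identity. The hydrophobic residue set, the weights, and the linear combination are hypothetical; BIO-GATS's actual scoring also uses resolution and query coverage, which are omitted here.

```python
HYDROPHOBIC = set("AVILMFWC")  # hypothetical residue classification


def aligned_pairs(target_aln, template_aln):
    """Columns of a pairwise alignment where neither sequence has a gap ('-')."""
    return [(a, b) for a, b in zip(target_aln, template_aln) if a != "-" and b != "-"]


def hydrophobic_correspondence(target_aln, template_aln):
    """Fraction of aligned columns where both residues share a hydrophobic/polar class."""
    pairs = aligned_pairs(target_aln, template_aln)
    if not pairs:
        return 0.0
    same_class = sum((a in HYDROPHOBIC) == (b in HYDROPHOBIC) for a, b in pairs)
    return same_class / len(pairs)


def sequence_identity(target_aln, template_aln):
    pairs = aligned_pairs(target_aln, template_aln)
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def template_score(target_aln, template_aln, w_hyd=0.6, w_id=0.4):
    """Hypothetical weighted blend of hydrophobic-pattern agreement and identity."""
    return (w_hyd * hydrophobic_correspondence(target_aln, template_aln)
            + w_id * sequence_identity(target_aln, template_aln))


target = "LIVKDE"
candidates = {"A": "VLIRNQ", "B": "LIKDEV"}  # A: 0% identity but the same h/p pattern
ranked = sorted(candidates, key=lambda t: template_score(target, candidates[t]), reverse=True)
print(ranked)  # ['A', 'B']
```

Template A wins despite sharing no identical residues, which is the situation the abstract describes: matching hydrophobic patterns can point to a structural homologue that identity alone would miss.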

3:30 PM-3:40 PM
Modeling of G protein-coupled receptor structures: Improving the prediction of loop conformations and the usability of models for structure-based drug design
Format: Pre-recorded with live Q&A

  • Bhumika Arora, Indian Institute of Technology Bombay, Monash University, and IITB-Monash Research Academy, India

Presentation Overview:

G protein-coupled receptors (GPCRs) form the largest group of potential drug targets; therefore, knowledge of their three-dimensional structure is important for rational drug design. Homology modeling is a common approach for modeling the transmembrane helical cores of GPCRs; however, these models have varying degrees of inaccuracy that result from the quality of the template used. We have explored the extent to which inaccuracies inherent in homology models of GPCR transmembrane helical cores can affect loop prediction. We found that loop prediction in GPCR models is much more difficult than loop reconstruction in crystal structures, owing to the imprecise positioning of loop anchors. Therefore, minimizing the errors in loop anchors is likely to be critical for optimal GPCR structure prediction. To address this, we have developed a Ligand Directed Modeling (LDM) method comprising geometric protein sampling and ligand docking. The method was evaluated for its capacity to refine GPCR models built from a range of templates with varying degrees of sequence similarity to the target. LDM reduced the errors in loop anchor positions and improved the prediction of ligand binding poses, resulting in much better performance of these models in virtual ligand screening.

3:40 PM-3:50 PM
Nanocapsule Designs for Antimicrobial Resistance
Format: Pre-recorded with live Q&A

  • Irene Marzuoli, King's College London, United Kingdom
  • F Fraternali, Randall Division of Cell and Molecular Biophysics, King’s College London, United Kingdom
  • Carlos Cruz, Instituto de Tecnologia Química e Biológica António Xavier (ITQB), Portugal

Presentation Overview:

Antimicrobial resistance and drug delivery have been major focuses of recent medical research. Recently engineered virus-like nanocapsules derived from synthetic multi-branched peptides have been shown both to promote bacterial membrane poration and to be suitable for gene delivery [1].
The atomistic details of the nanocapsule assembly, necessary for its antimicrobial and gene delivery activities, are not accessible to experimental techniques. Therefore, the stability of the nanocapsule in water and its interaction with a model membrane were studied through molecular dynamics simulations, and the results were compared with the available experimental data [2].
Integrated results from simulations at different resolutions highlighted the amphiphilic structure of capzip as the driving force behind assembly stability. Moreover, the simulations revealed a strong affinity for a bacterial model membrane and a weaker one for a mammalian model membrane. This results in bacterial membrane poration in the presence of an electric field, a process triggered by the insertion of arginine residues, which are abundant in the structure. This investigation shows the essential role of computational techniques in rationalizing experimental results and suggests how capzip composition could be manipulated to trigger particular functions.

1. Chem. Sci., 7(3):1707–1711, 2016.
2. ACS Nano, 14(2):1609–1622, 2020.

3:50 PM-4:00 PM
GeoMine: A Web-Based Tool for Chemical Three-Dimensional Searching of the PDB
Format: Pre-recorded with live Q&A

  • Joel Graef, Universität Hamburg - Center for Bioinformatics (ZBH), Germany
  • Konrad Diedrich, Universität Hamburg - Center for Bioinformatics (ZBH), Germany
  • Katrin Schöning-Stierand, Universität Hamburg - Center for Bioinformatics (ZBH), Germany
  • Matthias Rarey, Universität Hamburg - Center for Bioinformatics (ZBH), Germany

Presentation Overview:

The relative arrangement of functional groups and the shape of protein binding sites are key elements in understanding a protein's function. Interactive searching for these three-dimensional patterns is an important tool in life science research, but it is computationally highly challenging. Only a few tools address this problem, and they are limited in query variability, adjustable search sets, retrieval speed, and user-friendliness. Here, we present GeoMine, a computational approach enabling spatial geometric queries with full chemical awareness on a regularly updated database containing the protein-ligand interfaces of the entire PDB. Thanks to modern algorithms and database technologies, reasonable queries can be searched within a few minutes. With a GeoMine query, almost any relative atom arrangement can be searched. GeoMine is implemented as a publicly available web service within ProteinsPlus (https://proteins.plus). The user interface provides an interactive 3D panel that allows queries to be designed easily, either from scratch or based on a 3D representation of an existing protein-ligand complex. GeoMine opens a plethora of data analytics opportunities on protein structures, a few of which are showcased in this presentation.
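A query of the kind described ("find atoms of given elements in a given relative arrangement") can be expressed as element-typed pairwise distance constraints. The brute-force sketch below illustrates only these query semantics; GeoMine relies on dedicated algorithms and database technologies to search the whole PDB quickly, and this code makes no claim about how it does so. All names and the per-distance tolerance model are hypothetical.

```python
import itertools
import math


def match_query(atoms, query_elements, query_dists, tol=0.5):
    """Brute-force search for atom tuples matching element types and pairwise distances.

    atoms: list of (element, (x, y, z)) in angstroms.
    query_elements: required element at each query position, e.g. ("N", "O", "O").
    query_dists: {(a, b): target distance} between query positions a and b.
    tol: allowed deviation from each target distance (angstroms).
    """
    # Candidate atoms for each query position, filtered by element type.
    candidates = [[i for i, (el, _) in enumerate(atoms) if el == want] for want in query_elements]
    hits = []
    for combo in itertools.product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # one atom cannot fill two query positions
        if all(
            abs(math.dist(atoms[combo[a]][1], atoms[combo[b]][1]) - d) <= tol
            for (a, b), d in query_dists.items()
        ):
            hits.append(combo)
    return hits


atoms = [("N", (0.0, 0.0, 0.0)), ("O", (3.0, 0.0, 0.0)), ("O", (0.0, 4.0, 0.0)), ("C", (1.0, 1.0, 1.0))]
query = ("N", "O", "O")
dists = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}
print(match_query(atoms, query, dists))  # [(0, 1, 2)]
```

The cost grows with the product of candidate-list sizes, which is why a database-scale tool needs indexing and pruning rather than this exhaustive enumeration.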

4:00 PM-4:20 PM
Generating Property-Matched Decoy Molecules Using Deep Learning
Format: Pre-recorded with live Q&A

  • Charlotte Deane, University of Oxford, United Kingdom
  • Fergus Imrie, University of Oxford, United Kingdom
  • Anthony Bradley, Exscientia Ltd, United Kingdom
  • Mihaela van der Schaar, University of Cambridge, United Kingdom

Presentation Overview:

The datasets or benchmarks used for training and testing are an essential component in the development of structure-based virtual screening methods. These typically consist of experimentally verified active molecules together with assumed inactive molecules, known as decoys.
However, the decoy molecules used in such sets have been shown to exhibit substantial bias in basic chemical properties. In some cases, there is evidence to suggest that some structure-based methods are simply exploiting this bias rather than learning how to perform molecular recognition. The use of biased decoy molecules therefore prevents generalisation and hinders the development of structure-based virtual screening methods.
We have developed a deep learning method, called DeepCoy, to generate property-matched decoy molecules. This eliminates the need to search a database for molecules and allows decoys to be generated to meet the requirements of a particular active molecule. Using DeepCoy-generated molecules reduced the bias in basic physicochemical properties of decoy molecules by 78% and 65% in the DUD-E and DEKOIS 2.0 databases, respectively.
We believe that this substantial reduction in bias will benefit the development and improve generalisation of structure-based virtual screening methods.
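The notion of "property-matched" decoys can be sketched with a simple selection rule: rank candidate molecules by their scaled distance to an active in a few basic properties. Note that this is the database-search style of decoy selection that DeepCoy is designed to replace with generation; it is shown only to make the matching criterion concrete. The property set, the scales, and the precomputed values are hypothetical (in practice, properties such as molecular weight, logP, and hydrogen-bond donor and acceptor counts would be computed with a cheminformatics toolkit).

```python
def property_distance(active, decoy, scales):
    """Mean absolute property difference, each divided by a per-property scale."""
    return sum(abs(active[p] - decoy[p]) / s for p, s in scales.items()) / len(scales)


def pick_matched_decoys(active, candidates, scales, n):
    """Return the n candidates closest to the active in scaled property space."""
    return sorted(candidates, key=lambda m: property_distance(active, m, scales))[:n]


# Hypothetical precomputed properties: molecular weight, logP, H-bond donors/acceptors.
scales = {"mw": 50.0, "logp": 1.0, "hbd": 1.0, "hba": 1.0}
active = {"id": "active", "mw": 350.0, "logp": 2.5, "hbd": 2, "hba": 5}
candidates = [
    {"id": "A", "mw": 352.0, "logp": 2.4, "hbd": 2, "hba": 5},
    {"id": "B", "mw": 500.0, "logp": 5.0, "hbd": 0, "hba": 8},
    {"id": "C", "mw": 340.0, "logp": 2.0, "hbd": 1, "hba": 5},
]
print([m["id"] for m in pick_matched_decoys(active, candidates, scales, n=2)])  # ['A', 'C']
```

If decoys differ systematically from actives in such properties (as candidate B would), a scoring method can separate the two sets without modeling binding at all, which is the bias the abstract describes.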

4:20 PM-4:40 PM
In silico selection of RNA aptamers for a target protein based on discriminative classifiers and the Monte-Carlo tree search
Format: Pre-recorded with live Q&A

  • Giltae Song, Pusan National University, South Korea
  • Gwangho Lee, Pusan National University, South Korea

Presentation Overview:

Aptamers are polynucleotide or peptide chains that fold into stable structures and are useful for therapeutic applications. SELEX (systematic evolution of ligands by exponential enrichment) is one of the experimental methods for generating aptamers, but it is expensive and time-consuming. Recently, several in silico approaches have attempted to reduce this cost, such as an approach based on discriminative classifiers. Some of these methods do generate sequences that bind a target protein, but they produce only low-quality candidates of a specific size.
In this study, we develop an approach based on the Monte-Carlo Tree Search (MCTS) algorithm that generates sequences using a score function computed by a discriminative classifier. We evaluate our approach with three metrics: the minimum free energy (MFE) of aptamer structures, scores in docking simulations, and TM-score_RNA structure similarity scores. Our generated sequences have MFEs quite similar to those of real aptamers, and docking scores in ZDOCK simulations similar to or better than those of other existing methods. Most of our samples obtain TM-score_RNA scores higher than 0.17 (scores below 0.17 indicate unrelated random sequences). We believe that our study can substantially reduce the cost and time of generating aptamers.
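A minimal UCT-style Monte-Carlo Tree Search for building sequences one nucleotide at a time is sketched below, with the four standard phases (selection, expansion, random rollout, backpropagation). The score function is a hypothetical stand-in for a trained discriminative classifier (here, similarity to a hidden motif), and the fixed length, exploration constant, and iteration count are assumptions; this is not the authors' implementation.

```python
import math
import random

NUCLEOTIDES = "ACGU"


class Node:
    def __init__(self, seq, parent=None):
        self.seq = seq          # partial sequence represented by this node
        self.parent = parent
        self.children = {}      # nucleotide -> Node
        self.visits = 0
        self.total_reward = 0.0


def select_child(node, c=1.4):
    """UCT rule: mean reward plus an exploration bonus for rarely visited children."""
    return max(
        node.children.values(),
        key=lambda ch: ch.total_reward / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def mcts_design(score, length, iterations=2000, seed=0):
    """Build sequences nucleotide by nucleotide, guided by score(sequence) -> float."""
    rng = random.Random(seed)
    root = Node("")
    best_seq, best_reward = None, float("-inf")
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCT while the node is fully expanded and not complete.
        while len(node.seq) < length and len(node.children) == len(NUCLEOTIDES):
            node = select_child(node)
        # 2. Expansion: add one untried nucleotide as a new child.
        if len(node.seq) < length:
            nt = rng.choice([n for n in NUCLEOTIDES if n not in node.children])
            node.children[nt] = Node(node.seq + nt, parent=node)
            node = node.children[nt]
        # 3. Simulation: complete the sequence at random and score it.
        tail = "".join(rng.choice(NUCLEOTIDES) for _ in range(length - len(node.seq)))
        seq = node.seq + tail
        reward = score(seq)
        if reward > best_reward:
            best_seq, best_reward = seq, reward
        # 4. Backpropagation: update statistics from the new node up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best_seq, best_reward


# Hypothetical stand-in for a trained binding classifier: similarity to a hidden motif.
MOTIF = "GGACUCCA"


def toy_score(seq):
    return sum(a == b for a, b in zip(seq, MOTIF)) / len(MOTIF)


best, reward = mcts_design(toy_score, length=len(MOTIF))
print(best, reward)
```

Because the tree only grows where rollouts score well, the search concentrates on promising prefixes instead of sampling the full sequence space uniformly; swapping in a real classifier score would leave the search loop unchanged.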

5:00 PM-5:20 PM
Proceedings Presentation: Geometric Potentials from Deep Learning Improve Prediction of CDR H3 Loop Structures
Format: Pre-recorded with live Q&A

  • Jeffrey A. Ruffolo, Johns Hopkins University, United States
  • Carlos Guerra, George Mason University, United States
  • Sai Pooja Mahajan, Johns Hopkins University, United States
  • Jeremias Sulam, Johns Hopkins University, United States
  • Jeffrey J. Gray, Johns Hopkins University, United States

Presentation Overview:

Antibody structure is largely conserved, except for the complementarity-determining regions (CDRs), a set of six variable loops. Five of these loops adopt canonical folds that can typically be predicted with existing methods, while the remaining loop (CDR H3) remains a challenge due to its highly diverse set of observed conformations. In recent years, deep neural networks have proven effective at capturing the complex patterns of protein structure. This work proposes DeepH3, a deep residual neural network that learns to predict inter-residue distances and orientations from antibody heavy- and light-chain sequences. The output of DeepH3 is a set of probability distributions over distances and orientation angles between pairs of residues. These distributions are converted to geometric potentials and used to discriminate between decoy structures produced by RosettaAntibody and to predict new CDR H3 loop structures de novo. When evaluated on the Rosetta antibody benchmark dataset of 49 targets, DeepH3-predicted potentials identified better, same, and worse structures (measured by root-mean-square deviation [RMSD] from the experimental CDR H3 loop structure) than the standard Rosetta energy function for 33, 6, and 10 targets, respectively, and improved the average RMSD of predictions by 32.1% (1.4 Å). Analysis of individual geometric potentials revealed that inter-residue orientations were more effective than inter-residue distances for discriminating near-native CDR H3 loops. When applied to de novo prediction of CDR H3 loop structures, DeepH3 achieves an average RMSD of 2.2 ± 1.1 Å on the Rosetta antibody benchmark.
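One common way to turn predicted bin probabilities into a geometric potential is to take the negative log-probability of each bin and then score a decoy by summing, over residue pairs, the energy of the bin its observed distance falls into. The sketch below assumes that simple form (no reference-state correction) for distances only; DeepH3's exact conversion, its orientation terms, and how the potentials enter Rosetta are not specified here, and all names are hypothetical.

```python
import bisect
import math


def pair_energies(bin_probs, eps=1e-6):
    """Turn predicted bin probabilities into energies E_b = -ln(p_b); eps avoids ln(0)."""
    return [-math.log(max(p, eps)) for p in bin_probs]


def score_decoy(decoy_dists, predicted, bin_edges):
    """Sum per-pair energies at the distances observed in a decoy (lower is better).

    decoy_dists: {(i, j): distance in angstroms} measured in the decoy.
    predicted: {(i, j): list of bin probabilities} from the network.
    bin_edges: inner bin boundaries, so there are len(bin_edges) + 1 bins.
    """
    total = 0.0
    for pair, d in decoy_dists.items():
        energies = pair_energies(predicted[pair])
        total += energies[bisect.bisect_right(bin_edges, d)]
    return total


bin_edges = [4.0, 8.0, 12.0]                    # 4 bins: <4, 4-8, 8-12, >=12 angstroms
predicted = {(0, 5): [0.05, 0.80, 0.10, 0.05]}  # the network is confident the pair is 4-8 apart
near_native = {(0, 5): 6.0}
far_decoy = {(0, 5): 15.0}
print(score_decoy(near_native, predicted, bin_edges) < score_decoy(far_decoy, predicted, bin_edges))  # True
```

Orientation potentials could be built the same way from angle-bin distributions and added to the total, which is one way to compare the discriminative value of distances versus orientations.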

5:20 PM-5:40 PM
Proceedings Presentation: QDeep: distance-based protein model quality estimation by residue-level ensemble error classifications using stacked deep residual neural networks
Format: Pre-recorded with live Q&A

  • Md Hossain Shuvo, Auburn University, United States
  • Sutanu Bhattacharya, Auburn University, United States
  • Debswapna Bhattacharya, Auburn University, United States

Presentation Overview:

Motivation: Protein model quality estimation, in many ways, informs protein structure prediction. Despite their tight coupling, existing model quality estimation methods do not leverage inter-residue distance information or the latest technological breakthrough in deep learning that has recently revolutionized protein structure prediction.

Results: We present a new distance-based single-model quality estimation method called QDeep that harnesses the power of stacked deep residual neural networks (ResNets). Our method first employs stacked deep ResNets to perform residue-level ensemble error classifications at multiple predefined error thresholds, and then combines the predictions from the individual error classifiers to estimate the quality of a protein structural model. Experimental results show that our method consistently outperforms existing state-of-the-art methods, including ProQ2, ProQ3, ProQ3D, ProQ4, 3DCNN, MESHI, and VoroMQA, on multiple independent test datasets across a wide range of accuracy measures, and that predicted distance information contributes significantly to the improved performance of QDeep.

Availability: https://github.com/Bhattacharya-Lab/QDeep
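The abstract leaves the combination rule open, so the sketch below assumes one plausible way to merge residue-level error classifiers into a global score: for each threshold, average the predicted probabilities to get the expected fraction of residues within that error, then average over thresholds. The 1, 2, 4, and 8 Å thresholds mirror GDT-TS and are an assumption, as are the names; QDeep's actual thresholds and combination may differ.

```python
def combined_quality(prob_within, thresholds=(1.0, 2.0, 4.0, 8.0)):
    """Global quality score from per-residue error classifiers.

    prob_within[t][r]: predicted probability that residue r's error is below t angstroms.
    For each threshold, the mean probability is the expected fraction of residues
    within t; the thresholds are then averaged, in the spirit of GDT-TS.
    """
    fractions = [sum(prob_within[t]) / len(prob_within[t]) for t in thresholds]
    return sum(fractions) / len(fractions)


prob_within = {
    1.0: [0.2, 0.4],
    2.0: [0.5, 0.5],
    4.0: [0.8, 1.0],
    8.0: [1.0, 1.0],
}
print(round(combined_quality(prob_within), 3))  # 0.675
```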

5:40 PM-6:00 PM
Proceedings Presentation: Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization
Format: Pre-recorded with live Q&A

  • John Kececioglu, University of Arizona, United States
  • Spencer Krieger, University of Arizona, United States

Presentation Overview:

Motivation: Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. When computing their predictions, nearly all state-of-the-art tools do not explicitly leverage the vast number of proteins whose structure is known. Leveraging this additional information in a so-called template-based method has the potential to significantly boost prediction accuracy.

Method: We present a new hybrid approach to secondary structure prediction that gains the advantages of both template- and non-template-based methods. Our core template-based method is an algorithmic approach that uses metric-space nearest neighbor search over a template database of fixed-length amino-acid words to determine estimated class-membership probabilities for each residue in the protein. These probabilities are then input to a dynamic programming algorithm that finds a physically valid maximum-likelihood prediction for the entire protein. Our hybrid approach exploits a novel accuracy estimator for our core method, which estimates the unknown true accuracy of its prediction, to decide when to switch between template- and non-template-based methods.

Results: On challenging CASP benchmarks the resulting hybrid approach boosts the state-of-the-art Q8 accuracy by more than 2-10%, and Q3 accuracy by more than 1-3%, yielding the most accurate method currently available for both 3- and 8-state secondary structure prediction.

Availability: A preliminary implementation in a new tool we call Nnessy is available free for non-commercial use at http://nnessy.cs.arizona.edu.
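The dynamic-programming stage of the Nnessy method (per-residue class probabilities in, physically valid whole-protein prediction out) can be sketched as a Viterbi-style search over (state, run length). The sketch assumes, hypothetically, that "physically valid" means a minimum segment length per state (3 for helix H, 2 for strand E); Nnessy's actual constraints and scoring are not given in the abstract.

```python
import math

MIN_RUN = {"H": 3, "E": 2, "C": 1}  # hypothetical minimum segment lengths


def decode(probs, min_run=MIN_RUN):
    """Most likely labeling in which every segment meets its state's minimum length.

    probs: list over residues of {state: probability}.
    DP states are (label, run), where run counts the current segment's length,
    capped at min_run[label] because longer runs behave identically.
    """
    states = list(min_run)

    def logp(i, s):
        return math.log(max(probs[i][s], 1e-12))  # guard against log(0)

    layers = [{(s, 1): (logp(0, s), None) for s in states}]
    for i in range(1, len(probs)):
        layer = {}
        for (s, run), (score, _) in layers[-1].items():
            for t in states:
                if t == s:
                    nxt = (s, min(run + 1, min_run[s]))
                elif run >= min_run[s]:
                    nxt = (t, 1)  # current segment is long enough to end here
                else:
                    continue
                cand = score + logp(i, t)
                if nxt not in layer or cand > layer[nxt][0]:
                    layer[nxt] = (cand, (s, run))
        layers.append(layer)
    # The final segment must also be complete.
    valid_ends = [st for st in layers[-1] if st[1] >= min_run[st[0]]]
    state = max(valid_ends, key=lambda st: layers[-1][st][0])
    labels = []
    for i in range(len(probs) - 1, -1, -1):
        labels.append(state[0])
        state = layers[i][state][1]  # back-pointer to the previous residue's state
    return "".join(reversed(labels))


probs = [
    {"H": 0.1, "E": 0.1, "C": 0.8},
    {"H": 0.6, "E": 0.1, "C": 0.3},  # isolated helix call: too short to be valid
    {"H": 0.1, "E": 0.1, "C": 0.8},
]
print(decode(probs))  # CCC
```

Per-residue argmax would give "CHC", a one-residue helix; the constrained decoder trades a little per-residue likelihood for a globally valid labeling.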

Thursday, July 16th
10:40 AM-11:00 AM
Quality Assessment of Protein Docking Models Based on Graph Neural Network
Format: Pre-recorded with live Q&A

  • Ye Han, Jilin Agricultural University, China
  • Fei He, University of Missouri-Columbia, United States
  • Dong Xu, Univ. of Missouri-Columbia, United States

Presentation Overview:

Protein docking provides a structural basis for the design of drugs and vaccines. Within the protein docking process, quality assessment (QA) is used to pick near-native models from numerous candidate docking conformations, and it directly determines the final docking results. Although extensive efforts have been made to improve QA accuracy, it remains the bottleneck of current protein docking systems. In this paper, we present a Deep Graph Attention Neural Network (DGANN) to evaluate and rank candidate protein docking models. DGANN learns inter-residue physicochemical properties and structural fitness across the two protein monomers in a docking model and outputs the probability that the model is near-native. On the ZDOCK decoy benchmark, DGANN outperformed the ranking provided by ZDOCK in placing good models among the top selections.
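A graph attention layer, the building block named in DGANN, can be sketched in NumPy as below. It follows the generic single-head GAT formulation; the toy node features, the edge definition, the dimensions, and the missing readout that would turn node features into a near-native probability are all simplifications, and nothing here reflects DGANN's actual architecture.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_layer(h, adj, w, a):
    """Single-head graph attention layer (the generic GAT formulation).

    h: (n, f_in) node features, e.g. per-residue physicochemical descriptors.
    adj: (n, n) boolean adjacency; include self-loops so every row has a neighbour.
    w: (f_in, f_out) shared linear map; a: (2 * f_out,) attention vector.
    Returns the updated features (n, f_out) and the attention weights (n, n).
    """
    z = h @ w
    f_out = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]) splits into a term for i plus a term for j.
    e = leaky_relu((z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :])
    e = np.where(adj, e, -np.inf)         # attend only to graph neighbours
    e = e - e.max(axis=1, keepdims=True)  # stabilise the softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha


# Toy docking graph: residues 0-1 from one monomer, 2-3 from the other, with
# hypothetical interface edges 1-2 and 0-3 plus self-loops.
rng = np.random.default_rng(0)
adj = np.eye(4, dtype=bool)
for i, j in [(0, 1), (2, 3), (1, 2), (0, 3)]:
    adj[i, j] = adj[j, i] = True
h = rng.normal(size=(4, 3))
out, alpha = gat_layer(h, adj, rng.normal(size=(3, 2)), rng.normal(size=4))
print(out.shape, alpha.sum(axis=1))
```

Attention lets each residue weight its cross-interface neighbours differently, which is one reason attention-based graphs are attractive for judging interface fitness.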

11:00 AM-11:20 AM
ProtCID: A data resource for structural information on protein interactions
Format: Pre-recorded with live Q&A

  • Roland Dunbrack, Fox Chase Cancer Center, United States
  • Qifang Xu, Fox Chase Cancer Center, United States

Presentation Overview:

Structural information on the interactions of proteins with other molecules is plentiful, and for some proteins and protein families, there may be 100s or even 1000s of available structures. It can be very difficult for a scientist who is not trained in structural bioinformatics to access this information comprehensively. Previously, we developed the Protein Common Interface Database (ProtCID), which provided clusters of the interfaces of full-length protein chains as a means of verifying or suggesting biological assemblies, which differ from crystallographic asymmetric units about 40% of the time. Because proteins consist of domains that act as modular functional units which are often recombined in different genes, we have extended the analysis in ProtCID to the individual domain level. This has greatly increased the number of large protein-protein clusters in ProtCID, enabling the generation of hypotheses on the structures of biological assemblies of many systems. The analysis of domain families allows us to extend ProtCID to the interactions of domains with peptides, nucleic acids, and ligands. ProtCID provides complete annotations and coordinate sets for every cluster.
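
ProtCID's own interface-similarity score is not described in this abstract. As a rough sketch, assuming each interface is represented as a set of contacting residue pairs in a shared (for example, domain-family) numbering, interfaces from different entries can be grouped by single-linkage clustering on Jaccard overlap. The interface identifiers and the 0.5 threshold below are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two contact sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_interfaces(interfaces, threshold=0.5):
    """Single-linkage clusters of interfaces whose pairwise Jaccard overlap >= threshold.

    interfaces: {interface_id: set of (residue_i, residue_j) contacts in a shared numbering}.
    """
    parent = {key: key for key in interfaces}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in combinations(interfaces, 2):
        if jaccard(interfaces[u], interfaces[v]) >= threshold:
            parent[find(u)] = find(v)
    clusters = {}
    for key in interfaces:
        clusters.setdefault(find(key), []).append(key)
    return sorted(clusters.values(), key=len, reverse=True)
```

A large cluster drawn from many independent entries is what makes an interface a plausible biological assembly rather than a crystal-packing contact.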

11:20 AM-11:40 AM
SARS-CoV-2 spike protein predicted to bind strongly to host receptor protein orthologues from mammals, but not fish, birds or reptiles
Format: Pre-recorded with live Q&A

  • F Fraternali, Randall Division of Cell and Molecular Biophysics, King’s College London, United Kingdom
  • I Sillitoe, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • Christine Orengo, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • HM Scholes, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • SD Lam, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Malaysia
  • N Bordin, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • VP Waman, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • P Ashford, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • N Sen, Indian Institute of Science Education and Research, Pune, 411008, India
  • L van Dorp, University College London, United Kingdom
  • C Rauer, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • NL Dawson, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • CSM Pang, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • M Abbasian, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • SJL Edwards, University College London, United Kingdom
  • JG Lees, Oxford Brookes University, United Kingdom
  • JM Santini, Institute of Structural and Molecular Biology, University College London, United Kingdom

Presentation Overview:

The coronavirus disease 2019 (COVID-19) global pandemic is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). SARS-CoV-2 has a zoonotic origin and was transmitted to humans via an undetermined intermediate host, leading to widespread infections in humans and reported infections in other mammals. To enter host cells, the viral spike protein binds to angiotensin-converting enzyme 2 (ACE2) and is processed by transmembrane protease serine 2 (TMPRSS2). Whilst receptor binding contributes to the viral host range, changes in the energy of the spike protein:ACE2 complex formed with ACE2 orthologues from other animals have not been widely explored. Here, we analyse interactions between the spike protein and orthologues of ACE2 and TMPRSS2 from 215 vertebrate species. We predicted structures for these orthologues, used structures of the spike protein:ACE2 complex to calculate changes in the energy of the complex, and correlated these changes with COVID-19 severity in mammals. Across vertebrate orthologues, mutations are predicted to be more disruptive to the structure of ACE2 than to that of TMPRSS2. Finally, we provide phylogenetic evidence that SARS-CoV-2 has recently been transmitted from humans to animals. Our results suggest that SARS-CoV-2 can infect a broad range of mammals, but not fish, birds or reptiles, and that these mammals could serve as reservoirs of the virus, necessitating careful ongoing animal management and surveillance.
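
The abstract does not state which correlation statistic was used. One rank-based option is Spearman's coefficient between each species' change in complex energy and an ordinal severity score; the sketch below computes it in pure Python, giving tied values the average of their ranks. The energy sign convention and the severity encoding are assumptions left to the caller.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Called as `spearman(energy_changes, severity_scores)` with one entry per species, it is insensitive to the absolute scale of the energy values, which suits a coarse, ordinal severity label.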

12:00 PM-12:20 PM
DELPHI: accurate deep ensemble model for protein interaction sites prediction
Format: Pre-recorded with live Q&A

  • Yiwei Li, University of Western Ontario, Canada
  • Lucian Ilie, University of Western Ontario, Canada

Presentation Overview:

Motivation: Proteins usually perform their functions by interacting with other proteins, which is why accurately predicting protein-protein interaction (PPI) binding sites is a fundamental problem. Experimental methods are slow and expensive. Therefore, great efforts are being made towards increasing the performance of computational methods.
Methods and Results: We propose DELPHI (DEep Learning Prediction of Highly probable protein Interaction sites), a new sequence-based deep learning suite for PPI binding site prediction. DELPHI has an ensemble structure with data augmentation. The model combines a convolutional neural network and a recurrent neural network with fine tuning. Three novel features, ProtVec1D, position information, and high-scoring segment pair (HSP), are used in addition to nine existing ones. We comprehensively compare DELPHI with nine state-of-the-art programs on five datasets, and DELPHI outperforms its competitors in all metrics. The trained model and the source code for training, prediction, feature computation, and data processing are freely available online.
Conclusion: DELPHI has a novel network architecture in which three features are used for the first time for this problem. DELPHI is shown to be more accurate than the current state-of-the-art programs. All components of DELPHI are freely available online.
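
The ensemble idea can be illustrated by averaging per-residue probabilities from several member models, each applied to a fixed-size window of feature vectors centred on the residue. This is a generic sketch, not DELPHI's code: the window half-width of 15, zero padding, the 0.5 threshold and the two toy member models are hypothetical stand-ins for the trained CNN and RNN members.

```python
from statistics import mean


def residue_windows(features, half=15):
    """Centre a (2*half + 1)-residue window of feature vectors on every residue.

    features: one list of floats per residue; termini are zero-padded (an assumption).
    """
    pad = [0.0] * len(features[0])
    padded = [pad] * half + list(features) + [pad] * half
    return [padded[i:i + 2 * half + 1] for i in range(len(features))]


def ensemble_predict(models, features, half=15, threshold=0.5):
    """Average each member's binding-site probability per residue, then threshold."""
    probs = [mean(model(w) for model in models) for w in residue_windows(features, half)]
    return probs, [p >= threshold for p in probs]


# Toy stand-ins for trained members: each maps a window to a probability.
def centre_feature_model(window):
    return window[len(window) // 2][0]


def inverted_second_feature_model(window):
    return 1.0 - window[len(window) // 2][1]


probs, labels = ensemble_predict(
    [centre_feature_model, inverted_second_feature_model],
    [[0.9, 0.2], [0.1, 0.8]], half=1)
```

Averaging members that make different errors tends to reduce variance, which is the usual motivation for an ensemble design.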

12:20 PM-12:40 PM
Evidence of Antibody Repertoire Functional Convergence through Public Baseline and Shared Response Structures
Format: Pre-recorded with live Q&A

  • Jiye Shi, UCB Pharma, United Kingdom
  • Charlotte Deane, University of Oxford, United Kingdom
  • Matthew Raybould, University of Oxford, United Kingdom
  • Claire Marks, University of Oxford, United Kingdom
  • Aleksandr Kovaltsuk, University of Oxford, United Kingdom
  • Alan Lewis, GlaxoSmithKline, United Kingdom

Presentation Overview:

The antibody repertoires of different individuals ought to exhibit significant functional commonality, given that most pathogens trigger a successful immune response in most people. Sequence-based antibody repertoire analysis based on identifying common genetic origins and high sequence identities has so far offered little evidence for this phenomenon. However, to engage the same epitope, antibodies only require a similar binding site structure and the presence of key paratope interactions, which can occur even when their sequences are dissimilar. Here, we investigate functional convergence in human antibody repertoires by comparing the antibody structures they contain. We first structurally profile baseline antibody diversity, predicting all modellable distinct structures within each repertoire. This analysis uncovers a high degree of structural commonality. For instance, around 3% of distinct structures are common to snapshots from ten unrelated individuals (‘Public Baseline’ structures). We then apply the same structural profiling method to the repertoire snapshots of three individuals before and after flu vaccination, detecting a convergent structural drift indicative of recognising similar epitopes (‘Public Response’ structures). Antibody Model Libraries (AMLs) derived from Public Baseline and Public Response structures represent a powerful geometric basis set of low-immunogenicity candidates exploitable for general or target-focused therapeutic antibody screening.
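
Once every predicted structure has been assigned to a structural cluster, "public" structures can be counted with set arithmetic. The sketch assumes hypothetical cluster identifiers per individual, and it takes the denominator of the public fraction to be all distinct structures across individuals; the abstract does not say which denominator yields the ~3% figure.

```python
from collections import Counter


def sharing_counts(repertoires):
    """Number of individuals in which each distinct structure occurs."""
    return Counter(s for structures in repertoires.values() for s in set(structures))


def public_structures(repertoires):
    """Structures found in every individual, and their share of all distinct structures."""
    counts = sharing_counts(repertoires)
    n = len(repertoires)
    public = {s for s, c in counts.items() if c == n}
    return public, len(public) / len(counts)
```

The same counts also give the full sharing profile (structures found in exactly 1, 2, ..., n individuals), which is useful for comparing repertoire snapshots before and after vaccination.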

3:20 PM-3:30 PM
Computational epitope binning of protein binders
Format: Pre-recorded with live Q&A

  • Chris Bailey-Kellogg, Dartmouth College, United States
  • Jarjapu Mahita, Dartmouth College, United States
  • Dong-Gun Kim, Korea Advanced Institute of Science and Technology (KAIST), South Korea
  • Yoonjoo Choi, Korea Advanced Institute of Science and Technology (KAIST), South Korea
  • Hak-Sung Kim, Korea Advanced Institute of Science and Technology (KAIST), South Korea

Presentation Overview:

Recent advances in next-generation sequencing technologies have enabled high-throughput characterization of repertoires of protein binders. Linking the sequence of a protein binder to its function is possible through structural elucidation, but this technique is unsuitable for determining structures at large scale. Epitope binning is emerging as a versatile tool for this purpose: it identifies binders likely to target similar epitopes on the antigen and groups them into bins. Because the vast sequence space limits experimental epitope binning, computational methods are needed. We describe a computational epitope binning method that uses a scoring scheme we developed. To test the reliability of this method, we applied it to bin a phage-displayed library of IL6-binding repebodies, which are binding scaffolds containing leucine-rich repeat (LRR) modules. Results of our method were validated using experimental epitope binning. We further show how the output of our binning method was used to drive mutagenesis experiments aimed at narrowing down the residues contributing to the specificity of each bin. Overall, the results demonstrate the utility of our method and indicate that it is a promising strategy for reliably binning protein binders.
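
The authors' scoring scheme is not described here. Assuming it yields, for each binder, a vector of predicted contact scores over antigen surface residues (a hypothetical representation), binders can be binned by greedy leader clustering on cosine similarity, as sketched below with an assumed 0.8 threshold.

```python
import math


def cosine(u, v):
    """Cosine similarity; defined as 0 when either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def bin_binders(profiles, threshold=0.8):
    """Greedy leader clustering: a binder joins the first bin whose leader is similar enough."""
    bins = []  # (leader name, member names)
    for name, profile in profiles.items():
        for leader, members in bins:
            if cosine(profile, profiles[leader]) >= threshold:
                members.append(name)
                break
        else:
            bins.append((name, [name]))
    return [members for _, members in bins]
```

Leader clustering depends on input order; it is used here only because it keeps the sketch short, and a hierarchical method would be a more order-independent choice.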

3:30 PM-3:40 PM
How proteins evolved to recognize an ancient nucleotide?
Format: Pre-recorded with live Q&A

  • Nir Ben Tal, Tel-Aviv University, Israel
  • Aya Narunsky, Yale University, United States
  • Amit Kessel, Tel Aviv University, Israel
  • Ron Solan, Tel Aviv University, Israel
  • Vikram Alva, Max Planck Institute for Developmental Biology, Germany
  • Rachel Kolodny, University of Haifa, Israel

Presentation Overview:

Proteins’ interactions with ancient ligands may reveal how molecular recognition emerged and evolved. We explore how proteins recognize adenine: a planar rigid fragment found in the most common and ancient ligands. We have developed a computational pipeline that extracts protein–adenine complexes from the Protein Data Bank, structurally superimposes their adenine fragments, and detects the hydrogen bonds mediating the interaction. Our analysis extends the known motifs of protein–adenine interactions in the Watson–Crick edge of adenine and shows that all of adenine’s edges may contribute to molecular recognition. We further show that, on the proteins' side, binding is often mediated by specific amino acid segments (“themes”) that recur across different proteins, such that different proteins use the same themes when binding the same adenine-containing ligands. We identify numerous proteins that feature these themes and are thus likely to bind adenine-containing ligands. Our analysis suggests that adenine binding has emerged multiple times in evolution.

(Abstract taken from: Narunsky, A., Kessel, A., Solan, R., Alva, V., Kolodny, R., & Ben-Tal, N. (2020). On the evolution of protein-adenine binding. Proceedings of the National Academy of Sciences of the United States of America, 117(9), 4701–4709. https://doi.org/10.1073/pnas.1911349117)
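
The hydrogen-bond detection step can be approximated with a purely geometric rule: flag any protein nitrogen or oxygen within 3.5 Å of an adenine N1, N3, N6 or N7 atom, and report the adenine edge that atom lies on (Watson-Crick: N1, N6; Hoogsteen: N6, N7; sugar: N3). This ignores hydrogen-bond angles and donor/acceptor roles, and the residue labels are hypothetical; it is an illustration, not the authors' pipeline.

```python
import math

# Adenine polar atoms and the edge(s) each lies on; N6 is shared by two edges.
ADENINE_EDGES = {
    "N1": ("Watson-Crick",),
    "N6": ("Watson-Crick", "Hoogsteen"),
    "N7": ("Hoogsteen",),
    "N3": ("Sugar",),
}


def adenine_contacts(adenine_atoms, protein_atoms, cutoff=3.5):
    """Return (adenine_atom, protein_atom, distance, edges) for polar pairs within cutoff Å.

    adenine_atoms: {atom_name: (x, y, z)}; protein_atoms: [(label, element, (x, y, z))].
    """
    hits = []
    for name, xyz in adenine_atoms.items():
        if name not in ADENINE_EDGES:
            continue
        for label, element, pxyz in protein_atoms:
            if element not in ("N", "O"):  # only potential H-bond partners
                continue
            d = math.dist(xyz, pxyz)
            if d <= cutoff:
                hits.append((name, label, round(d, 2), ADENINE_EDGES[name]))
    return hits


def edges_used(hits):
    """Sorted list of adenine edges touched by at least one contact."""
    return sorted({edge for hit in hits for edge in hit[3]})
```

Aggregating `edges_used` over many superimposed complexes gives the kind of edge-usage statistics the abstract describes.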

3:40 PM-4:00 PM
Nature of long-range evolutionary constraint in enzymes: Insights from comparison to non-catalytic ligand binding sites
Format: Pre-recorded with live Q&A

  • Yu Xia, McGill University, Canada
  • Avital Sharir-Ivry, McGill University, Canada

Presentation Overview:

Quantitative evolutionary design principles of enzymes remain elusive on the proteomic scale. Recent studies have uncovered a remarkably long-range evolutionary constraint in enzyme structures: the site-specific evolutionary rate increases with distance from the catalytic site, so the constraint reaches distant sites. Counterpart pseudoenzymes, which share the same protein fold but are catalytically inactive, exhibit a significantly reduced conservation gradient, showing that the three-dimensional structure of the enzyme does not dictate its unique long-range constraint. To find the origin of this evolutionary constraint, we systematically studied the magnitude of the conservation gradients induced by different types of functional sites in enzymes and other proteins: catalytic sites, non-catalytic ligand binding sites, allosteric binding sites, and protein-protein interaction sites. We show that catalytic sites induce significantly stronger conservation gradients than any type of non-catalytic binding site. Notably, the weak conservation gradient induced by non-catalytic binding sites in enzymes is nearly identical in magnitude to that induced by ligand binding sites in non-enzymes. Our results show that the unique constraint from catalytic sites in enzymes is likely driven by the optimization of catalysis rather than by ligand binding and allosteric functions. These results shed light on the structural and functional determinants of enzyme evolution.
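
One simple way to quantify a conservation gradient is sketched below under assumptions: compute each residue's distance to the nearest functional-site residue, then take the least-squares slope of per-site evolutionary rate against that distance, excluding the site itself. Coordinates (for example, Cα positions) and rates come from elsewhere, and the slope is a stand-in for whatever gradient statistic the authors used.

```python
import math


def distance_to_site(coords, site):
    """Distance from each residue to the nearest residue of the functional site."""
    return [min(math.dist(c, coords[j]) for j in site) for c in coords]


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def conservation_gradient(coords, rates, site):
    """Slope of evolutionary rate vs distance from the site, excluding site residues."""
    d = distance_to_site(coords, site)
    site_set = set(site)
    keep = [i for i in range(len(coords)) if i not in site_set]
    return ols_slope([d[i] for i in keep], [rates[i] for i in keep])
```

Running the same function with catalytic, ligand-binding, allosteric or interface residues as `site` gives directly comparable gradient magnitudes for one protein.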

4:00 PM-4:20 PM
Studying de novo mutations via structural alterations in protein-protein interaction: STXBP1 associated neuronal pathology
Format: Pre-recorded with live Q&A

  • Ehud Banne, Kaplan Medical Center, Rehovot, Israel
  • Esther Brielle, The Hebrew University of Jerusalem, Israel
  • Danielle Klinger, The Hebrew University of Jerusalem, Israel
  • Dina Schneidman-Duhovny, The Hebrew University of Jerusalem, Israel
  • Michal Linial, The Hebrew University of Jerusalem, Israel

Presentation Overview:

A large fraction of childhood epilepsy, developmental delays and neurodevelopmental diseases (NDD) is attributed to de novo mutations, including missense mutations and in-frame indels. Often, even after detailed genetic analysis, there is no explanation for how the disease manifests. In this study, we use 3D structural data of protein complexes to assess the impact of specific mutations on protein-protein interactions (PPIs). We focused on STXBP1 (also known as Munc-18), a master regulator of synaptic function and neurotransmitter release. Many de novo STXBP1 mutations lead to epilepsy and diverse forms of NDD. We applied structural modeling and molecular dynamics (MD) simulations to quantify the stability and properties of the STXBP1 interaction with syntaxin 1A. We show that, whereas state-of-the-art variant prediction tools gave discordant interpretations, our assessment of mutations by pathological severity matches the calculated properties of the STXBP1-syntaxin 1A interface. Mutations that reduce the interaction surface area of STXBP1-syntaxin 1A led to destabilization of the protein complex and, eventually, disruption of synaptic transmission. This study provides a direct approach that connects novel variants with 3D structure and dynamics. The method is extended to protein complexes associated with other rare diseases.
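
Interface size is commonly summarised as buried solvent-accessible surface area, BSA = SASA(A) + SASA(B) - SASA(AB). The sketch below averages BSA over MD snapshots and compares a mutant with the wild type. The SASA values themselves would come from an external tool that is not shown, and some authors report half of BSA as the "interface area", so the exact convention is an assumption.

```python
from statistics import mean


def buried_sasa(sasa_a, sasa_b, sasa_complex):
    """Buried solvent-accessible surface area (Å^2) = SASA(A) + SASA(B) - SASA(AB)."""
    return sasa_a + sasa_b - sasa_complex


def mean_buried_sasa(frames):
    """frames: iterable of (SASA_A, SASA_B, SASA_AB) tuples, e.g. one per MD snapshot."""
    return mean(buried_sasa(*f) for f in frames)


def relative_change(mutant, wild_type):
    """Fractional change of a mutant value relative to wild type (negative = smaller)."""
    return (mutant - wild_type) / wild_type
```

A negative `relative_change` in mean BSA for a variant corresponds to the "reduced interaction surface area" that the abstract associates with destabilization.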

4:40 PM-5:00 PM
Predicting changes in protein thermostability upon point mutation with deep 3D convolutional neural networks
Format: Pre-recorded with live Q&A

  • Bian Li, Yale University, United States
  • Yucheng Yang, Yale University, United States
  • John Capra, Vanderbilt University, United States
  • Mark Gerstein, Yale University, United States

Presentation Overview:

Predicting mutation-induced changes in protein thermostability (ΔΔG) is of great interest in protein engineering, variant interpretation, and drug discovery. We introduce ThermoNet, a deep 3D-convolutional neural network designed for structure-based prediction of ΔΔG upon point mutation. To naturally leverage the image-processing power inherent in convolutional neural networks, we treat protein structures as if they were multi-channel 3D images. In particular, the inputs to ThermoNet are multi-channel voxel grids based on biophysical properties derived from raw atom coordinates. ThermoNet is trained on a data set balanced between direct and reverse mutations generated by symmetry-based data augmentation. It demonstrates improved performance compared to fifteen previously developed computational methods on a widely used blind test set. Unlike the other methods, which exhibit a strong bias towards predicting destabilization, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations. Finally, we demonstrate the practical utility of ThermoNet in predicting the ΔΔG landscape for two clinically relevant proteins, p53 and myoglobin, and for ClinVar missense variants. Overall, our results suggest that 3D convolutional neural networks can model the complex, non-linear interactions perturbed by mutations directly from biophysical properties of atoms.
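
Two ingredients of this description can be sketched in pure Python, as illustrations rather than ThermoNet's code: binning atoms into a multi-channel grid centred on the mutation site, and symmetry-based augmentation that pairs each mutation with its reverse carrying the opposite sign of ΔΔG. The 16-voxel box, 1 Å spacing, simple occupancy counts and channel names are assumptions; a real reverse example also needs a structure of the mutant protein.

```python
import math


def voxelize(atoms, center, channels, size=16, spacing=1.0):
    """Bin atoms into a multi-channel occupancy grid centred on `center`.

    atoms: list of ((x, y, z), set_of_channel_names).
    Returns {channel: size x size x size nested lists of counts}.
    """
    grid = {c: [[[0.0] * size for _ in range(size)] for _ in range(size)]
            for c in channels}
    half_width = size * spacing / 2
    for xyz, props in atoms:
        idx = [math.floor((x - c0 + half_width) / spacing) for x, c0 in zip(xyz, center)]
        if all(0 <= t < size for t in idx):  # atoms outside the box are dropped
            i, j, k = idx
            for channel in props & set(channels):
                grid[channel][i][j][k] += 1.0
    return grid


def add_reverse_mutations(records):
    """For each (wild_type, position, mutant, ddg), also emit (mutant, position, wild_type, -ddg)."""
    augmented = []
    for wt, pos, mut, ddg in records:
        augmented.append((wt, pos, mut, ddg))
        augmented.append((mut, pos, wt, -ddg))
    return augmented
```

Because every direct example gains a reverse partner with the opposite sign, the augmented training set is balanced between stabilizing and destabilizing labels, which bears on the destabilization bias mentioned above.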

5:00 PM-5:10 PM
Network analysis of synonymous codon usage
Format: Pre-recorded with live Q&A

  • Khalique Newaz, University of Notre Dame, United States
  • Gabriel Wright, University of Notre Dame, United States
  • Jacob Piland, University of Notre Dame, United States
  • Jun Li, University of Notre Dame, United States
  • Patricia Clark, University of Notre Dame, United States
  • Scott Emrich, University of Tennessee, United States
  • Tijana Milenkovic, University of Notre Dame, United States

Presentation Overview:

Most amino acids are encoded by multiple codons, some of which are used more rarely than others. Analyses of the positions of such rare codons in protein sequences revealed that rare codons can impact protein folding and that the positions of some rare codons are evolutionarily conserved. Analyses of their positions in protein 3-dimensional structures, which are biochemically richer than sequences alone, might further explain the role of rare codons in protein folding. We model protein structures as networks and use network centrality to measure the structural position of an amino acid. We first validate that amino acids buried within the protein's core are network-central, and those on the surface are not. Only then do we study potential differences between the network (and thus structural) positions of amino acids encoded by evolutionarily conserved rare codons, evolutionarily non-conserved rare codons, and commonly used codons. In 84% of our proteins, the three codon categories occupy significantly different structural positions. We examine protein groups showing different relationships between the structural positions of the three codon categories. Several of the groups show interesting structural or functional characteristics. Our work provides evidence that codon usage is linked to the final protein 3D structure and thus potentially to co-translational protein folding.
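
Residue-network centrality can be sketched as follows, assuming a network in which residues are nodes and edges join residues whose Cα atoms lie within a cutoff (6 Å here, a hypothetical choice; the authors' network definition and centrality measure may differ), with closeness centrality computed by breadth-first search.

```python
import math
from collections import deque


def contact_network(ca_coords, cutoff=6.0):
    """Adjacency sets: residues are linked when their Cα atoms are within `cutoff` Å."""
    n = len(ca_coords)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def closeness(adj, source):
    """Closeness centrality of `source` within its connected component (BFS hop counts)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0
```

Buried core residues have many short paths to the rest of the network, so they score higher than surface residues, which is the validation step the abstract describes before comparing codon categories.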

5:10 PM-5:20 PM
In silico ensemble modeling suggests binding-induced expansion as a possible functional mechanism for two endocytic proteins.
Format: Pre-recorded with live Q&A

  • N. Suhas Jagannathan, National University of Singapore, Singapore
  • Christopher W. V. Hogue, Mechanobiology Institute, NUS, Singapore (current address: Global AI Accelerator, Santa Clara, CA, United States)
  • Lisa Tucker-Kellogg, Duke-NUS Medical School, Singapore

Presentation Overview:

Intrinsically disordered regions (IDRs) are known to function as linkers or through folding-upon-binding. In this work, we explore the possibility of binding-induced expansion, a mechanism in which binding of a partner to an IDR results in either a local or a global expansion of the steric volume occupied by the IDR. We focus on the IDRs of Epsin and Eps15 from clathrin-mediated endocytosis (CME), both of which contain multiple binding motifs for another CME protein, AP2. We generated large conformational ensembles for the Epsin and Eps15 IDRs and studied how the dimensions and energetics of the ensembles varied when bound to increasing numbers of AP2 molecules. Our results showed that Epsin-IDR and Eps15-IDR behave differently upon AP2 binding. Epsin-IDR undergoes binding-induced global expansion, a mechanism in which AP2 binding causes a concurrent increase in the steric volume occupied by the energetically stable members of the ensemble. This results in molecular crowding of Epsin-IDR at the endocytic hotspot, which could help remodel the plasma membrane during endocytosis. In contrast, Eps15-IDR undergoes binding-induced local expansion, a mechanism in which the binding of AP2 at one motif in the IDR makes other motifs more accessible for binding further AP2 molecules, allowing Eps15-IDR to function as an AP2 recruiter during endocytosis.
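
A standard measure of the steric size of a conformer is its radius of gyration. The sketch computes an unweighted Rg (equal masses, an assumption) and averages it over the lowest-energy fraction of an ensemble, as a stand-in for "energetically stable members"; the 10% default is hypothetical. Comparing this mean for ensembles with 0, 1, 2, ... bound AP2 molecules would show a global expansion as a rising trend.

```python
import math


def radius_of_gyration(coords):
    """Unweighted Rg: root-mean-square distance of atoms from their centroid."""
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return math.sqrt(sum(math.dist(c, centroid) ** 2 for c in coords) / n)


def low_energy_mean_rg(ensemble, energies, fraction=0.1):
    """Mean Rg over the lowest-energy `fraction` of conformers (at least one conformer)."""
    ranked = sorted(range(len(ensemble)), key=lambda i: energies[i])
    keep = ranked[:max(1, int(len(ranked) * fraction))]
    return sum(radius_of_gyration(ensemble[i]) for i in keep) / len(keep)
```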

5:20 PM-5:40 PM
Frustration leads to fuzzy interactions in disordered proteins
Format: Pre-recorded with live Q&A

  • Maria I. Freiberger, Laboratorio de Fisiología de Proteínas, Departamento de Química Biológica IQUIBICEN N-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Argentina
    Laboratory of Protein Dynamics, University of Debrecen, Hungary
    Center for Theoretical Biological Physics, Rice University, Houston, United States
  • Diego Ferreiro, Laboratorio de Fisiología de Proteínas, IQUIBICEN N-CONICET, FCEyN, Universidad de Buenos Aires, Argentina
  • Viktor Ambrus, Laboratory of Protein Dynamics, University of Debrecen, Hungary
  • Peter Wolynes, Center for Theoretical Biological Physics, Rice University, United States
  • Monika Fuxreiter, Laboratory of Protein Dynamics, University of Debrecen, Hungary

Presentation Overview:

As proteins fold towards their native states, strong energetic conflicts are minimized, according to the “Principle of Minimal Frustration”. Local violations of this principle allow proteins to encode the complex energy landscapes required for active biological functions.
Disordered proteins often exhibit templated folding and adopt a well-defined structure upon binding. These complexes, however, are fuzzy, as they adopt different binding modes with different partners.

We have performed a systematic analysis of frustration in complexes of 138 disordered proteins. These proteins contain disordered regions in the free form but exhibit different binding modes in the bound form. Disorder-to-order regions (DORs) fold upon binding; disorder-to-disorder regions (DDRs) remain disordered with the partner; and many regions are context-dependent (CDRs), observed in both ordered and disordered forms in their complexes.

In particular, we have found that folding of disordered regions upon binding reduces frustration, but the interactions at the binding interface are not fully optimised. Disordered regions that alternate between folded and disordered forms in different binding modes exhibit a higher degree of frustration in their bound states. These results rationalize how specificity can be achieved without an optimal structure and provide a physical framework for the interaction versatility of disordered regions.
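
Localized frustration is commonly quantified as a Z-score comparing the native energy of a contact with the energies of decoys in which the contacting residues are altered, signed so that larger values mean the native contact is more favourable than its decoys. A minimal sketch, assuming the energies are computed elsewhere by a coarse-grained energy function; the cut-offs 0.78 and -1 are the values commonly used to call contacts minimally or highly frustrated.

```python
from statistics import mean, pstdev


def frustration_index(native_energy, decoy_energies):
    """Z-score of the native energy against decoys; larger means less frustrated."""
    spread = pstdev(decoy_energies)
    if spread == 0:
        raise ValueError("decoy energies must vary")
    return (mean(decoy_energies) - native_energy) / spread


def classify(f, minimal=0.78, high=-1.0):
    """Label a contact by its frustration index using the conventional cut-offs."""
    if f > minimal:
        return "minimally frustrated"
    if f < high:
        return "highly frustrated"
    return "neutral"
```

Applying `classify` to every interface contact of a bound disordered region and comparing the proportions across DORs, DDRs and CDRs mirrors the comparison summarised in the abstract.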

5:40 PM-6:00 PM
Protein local conformations analyses in ordered and intrinsically disordered proteins in the light of a structural alphabet
Format: Pre-recorded with live Q&A

  • Alexandre G. De Brevern, Université de Paris - INSERM UMR-S 1134 - INTS - DSIMB Team, France

Presentation Overview:

Proteins are highly dynamic macromolecules. Molecular dynamics (MD) simulations were performed on 169 representative protein domains, and classical secondary structures were explored. For helical structures, only 76.4% of the residues associated with α-helices retain that conformation; this proportion drops to 40% for 3₁₀- and π-helices (Narwani et al, Arch Biol Sci, 2018). The rigidity of β-sheets was confirmed, but they also showed a capacity to transform into turns. Finally, turns converted easily to helical structures, while bends preferred extended conformations. Analysis with the Protein Blocks structural alphabet (PBs, de Brevern et al, Proteins, 2000) showed that most PBs remain in their original conformation with high frequency, while a few PBs tend to be more flexible. Intriguingly, the change from one PB to another did not correspond to a simple geometric evolution: transitions to an unexpected PB were more frequent than transitions to an expected one (Narwani et al, J Biomol Struct Dyn, 2019). Disordered protein ensembles were analysed with PBs, allowing quantification of the continuum from rigidity to flexibility and finally to disorder (Melarkode Vattekatte et al, J Struct Biol, 2020, Data in Brief, 2020). These results have been compared to different types of prediction.
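
Given one PB string per MD frame (the 16 Protein Blocks are labelled a to p; assigning them from backbone dihedrals needs a separate tool, not shown), retention and transition statistics of the kind summarised above can be tabulated as follows. Using frame 0 as the reference conformation is an assumption of this sketch.

```python
from collections import Counter


def pb_retention(trajectory):
    """Per-PB fraction of later frames in which a residue keeps its frame-0 PB.

    trajectory: equal-length PB strings (letters a to p), one per MD frame.
    """
    reference = trajectory[0]
    kept, total = Counter(), Counter()
    for frame in trajectory[1:]:
        for ref_pb, pb in zip(reference, frame):
            total[ref_pb] += 1
            kept[ref_pb] += ref_pb == pb
    return {pb: kept[pb] / total[pb] for pb in total}


def pb_transitions(trajectory):
    """Count PB changes (from, to) between consecutive frames at each residue."""
    counts = Counter()
    for before, after in zip(trajectory, trajectory[1:]):
        for a, b in zip(before, after):
            if a != b:
                counts[(a, b)] += 1
    return counts
```

Normalising each row of the transition counts gives a PB-to-PB transition matrix, in which "unexpected" transitions show up as large off-diagonal entries between geometrically distant blocks.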