- Torsten Schwede, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
Presentation Overview:
3DSIG Keynote
- Hugo Schweke, université Paris Sud, France
- Marie-Hélène Mucchielli, université Paris Sud, France
- Sophie Sacquin-Mora, IBPC, France
- Anne Lopes, Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, UPSay, France
Presentation Overview:
In the crowded cell, the competition between functional and non-functional interactions is severe. Understanding how a protein binds the right piece in the right way in this complex jigsaw puzzle is crucial and very difficult to address experimentally. To interrogate how this competition constrains the behavior of proteins with respect to their partners or random encounters, we (i) performed thousands of cross-docking simulations to systematically characterize the interaction energy landscapes of functional and non-functional protein pairs and (ii) developed an original theoretical framework based on two-dimensional energy maps that reflect the propensity of a protein surface to interact. Strikingly, we show that the interaction propensity of not only binding sites but also the rest of the protein surface is conserved for homologous partners, be they functional or not. We show that exploring non-functional interactions (i.e. non-functional assemblies and interactions with non-functional partners) is a viable route to investigate the mechanisms underlying protein-protein interactions, and that our strategy based on 2D energy maps makes this exploration efficient and automated. Moreover, our theoretical framework opens the way for the development of a variety of applications covering functional characterization, binding site prediction, and characterization of protein behavior in a specific environment.
- Emine Sıla Özdemir, Koç University (Current Affiliation: OHSU, CEDAR), Turkey
- Hyunbum Jang, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., United States
- Zhigang Li, National Institutes of Health, United States
- David B. Sacks, National Institutes of Health, United States
- Ruth Nussinov, Leidos, United States
- Ozlem Keskin, Koc University, Turkey
- Atilla Gursoy, Koc University, Turkey
Presentation Overview:
The Rho GTPases Cdc42 and Rac1, in their active forms, interact with IQGAPs. The IQGAP–Cdc42 interaction promotes metastasis by enhancing actin polymerization. However, despite their high sequence identity, Cdc42 and Rac1 differ in their interactions with IQGAP. Two Cdc42 molecules can bind to the Ex-domain and the RasGAP site of GRD of IQGAP and promote IQGAP dimerization. Only one Rac1 molecule might bind to the RasGAP site of GRD and may not facilitate the dimerization. Using molecular dynamics simulations, site-directed mutagenesis, and Western blotting, we unraveled the detailed mechanisms of Cdc42 and Rac1 interactions with IQGAP2. We observed that Cdc42 binding to the Ex-domain of GRD2 releases the Ex-domain at the C-terminal of GRD2, facilitating IQGAP2 dimerization. Cdc42 binding to the Ex-domain promoted allosteric changes in the RasGAP site, providing a binding site for Cdc42 in the RasGAP site. The Cdc42 “insert loop” was important for the interaction of the first Cdc42 with the Ex-domain. By contrast, differences in Rac1 insert-loop sequence precluded its interaction with the Ex-domain. Rac1 could bind only to the RasGAP site of apo-GRD2 and could not facilitate IQGAP2 dimerization. Our detailed mechanistic insights help decipher how Cdc42 can stimulate actin polymerization in metastasis.
- Jan Kosinski, European Molecular Biology Laboratory, Germany
Presentation Overview:
Determining the structure of macromolecular complexes is crucial for understanding their function and mechanism of action. Recent advances in electron microscopy (EM) have yielded structures of numerous complexes at near-atomic resolution. However, for many complexes high resolution is still difficult to achieve. Integrative (or hybrid) structural modelling allows the structure of such complexes to be determined by combining low-resolution data from complementary techniques, such as X-ray crystallography, EM, NMR, SAXS or cross-linking mass spectrometry.
We have developed an automated modelling pipeline for integrative modelling of protein complexes based on EM and other data. It relies on a Monte Carlo sampling algorithm with a reinforcement step that enriches for best scoring solutions. Using this pipeline, we modelled the 1 MDa Elongator complex, the 110 MDa human nuclear pore complex (NPC), and other complexes. For the Elongator complex, I will present how our published model agrees with the high-resolution cryo-EM structure we have recently obtained in collaboration. For the NPC, in addition to our published results, I will show our current work on applying this pipeline to extending the model of the human NPC and to building models of NPCs from other species, which reveal surprisingly different architectures.
- Yue Cao, Texas A&M University, United States
- Yang Shen, Texas A&M University, United States
Presentation Overview:
We introduce a novel algorithm, Bayesian Active Learning (BAL), for optimization and uncertainty quantification (UQ) in flexible protein docking. BAL directly models the posterior distribution of the global optimum (or native structures for protein docking) with active sampling and posterior estimation iteratively feeding each other. Furthermore, we use complex normal modes to represent a homogeneous Euclidean conformation space suitable for high-dimensional optimization and construct funnel-like energy models for encounter complexes. Over a protein docking benchmark set and a CAPRI set involving homology docking, we establish that BAL significantly improves on both the starting points from rigid docking and the refinements from particle swarm optimization, providing a top-3 near-native prediction for one third of the targets. BAL also generates tight confidence intervals with a half range around 25% of iRMSD at a confidence level of 85%. Its estimated probability of a prediction being native or not achieves a binary classification AUROC of 0.93 and an AUPRC over 0.60 (compared to 0.14 by chance). To the best of our knowledge, this study represents the first uncertainty quantification solution for protein docking, with theoretical rigor and comprehensive assessment.
- Simon Kelow, University of Pennsylvania, United States
- Roland Dunbrack, Fox Chase Cancer Center, United States
Presentation Overview:
Antibodies are the largest family of solved structures in the Protein Data Bank (PDB), with ~3,500 structures currently available. Characterizing the structural features of the antibody complementarity-determining regions (CDRs) remains an important step in understanding antibody structure, and research tasks such as antibody design and classification rely on robust characterizations. We previously clustered the conformations of CDRs (North et al., J Mol Biol 406, 228-256) with a backbone dihedral angle metric and an affinity propagation algorithm. Many of the clusters established at that time remain small, with very few sequences. We have re-clustered the antibody CDR conformations using a new maximum distance dihedral clustering metric, alongside an implementation of the DBSCAN clustering algorithm. Given a dataset four times the size of the original, we have an opportunity to determine a new set of “canonical” clusters with improved sequence features compared to the previous clustering. Here we detail the development of a new antibody database relating CDR structural information to antibody sequence information based on this new clustering.
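As a rough illustration of what a maximum-distance dihedral metric could look like, the sketch below compares two CDR loops of equal length and returns the largest per-angle distance between them. The per-angle form 2(1 − cos Δθ), the (phi, psi, omega) input layout and the function names are assumptions made for illustration; the abstract does not give the exact formula used.

```python
import math


def angle_distance(theta1_deg, theta2_deg):
    """Distance between two angles in degrees: 2 * (1 - cos(delta)), in [0, 4].

    The cosine makes the measure periodic, so -180 and 180 are identical.
    """
    delta = math.radians(theta1_deg - theta2_deg)
    return 2.0 * (1.0 - math.cos(delta))


def max_dihedral_distance(loop_a, loop_b):
    """Largest per-angle distance between two loops of equal length.

    Each loop is a list of (phi, psi, omega) tuples in degrees, one per residue.
    (Hypothetical input layout, chosen for this sketch.)
    """
    if len(loop_a) != len(loop_b):
        raise ValueError("loops must have the same number of residues")
    return max(
        angle_distance(a, b)
        for res_a, res_b in zip(loop_a, loop_b)
        for a, b in zip(res_a, res_b)
    )
```

Taking the maximum rather than the mean means that a single strongly deviating dihedral is enough to separate two conformations, which is one plausible reason to prefer it for density-based clustering such as DBSCAN.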
- Fergus Imrie, University of Oxford, United Kingdom
- Anthony Bradley, Exscientia Ltd, United Kingdom
- Mihaela van der Schaar, University of Cambridge, United Kingdom
- Charlotte M. Deane, University of Oxford, United Kingdom
Presentation Overview:
Fragment-based drug discovery (FBDD) has become an increasingly important tool for finding hit compounds, in particular for challenging targets and novel protein families. A key challenge is deciding which fragment hits to follow up, and in what way. We seek to automate the elaboration of initial fragment hits in a data-driven and principled manner using machine learning techniques.
We have developed graph-based deep generative methods for fragment elaboration combining state-of-the-art machine learning techniques with structural knowledge. For fragment linking, our method takes two fragment hits and designs a molecule incorporating both fragments. The generation process is protein context dependent, utilising the relative distance and orientation between the fragments. This 3D information is vital to successful compound design, and we demonstrate the limitations of omitting such information.
As far as we are aware, this is both the first application of deep learning to FBDD and the first molecular generative model to incorporate 3D structural information directly. Our method designs sensible linkers and allows fragment elaboration without the limitations of database-based methods. We believe that our research will prompt a shift in how FBDD is conducted and we are currently working on extensions of our methods to more challenging scenarios within FBDD.
- Mohammad Elgamacy, Friedrich Miescher Laboratory of The Max Planck Society, Germany
- Birte Hernandez-Alvarez, Max Planck Institute for Developmental Biology, Germany
- Murray Coles, Max Planck Institute for Developmental Biology, Germany
- Patrick Mueller, Friedrich Miescher Laboratory of The Max Planck Society, Germany
Presentation Overview:
Recent advances in protein design have demonstrated the capacity of physics-based calculations to navigate towards previously unobserved sequence-structure relations. The principal motivation behind these efforts is to precisely generate novel proteins with tailored functions and biophysical features. For the de novo design of pharmacologically active proteins, two prerequisites must be fulfilled: the biochemical target activity must be understood in atomic structure-function detail, and computational design methods that are accurate at the atomic level must be available. Achieving the latter can enable single-step progression from in silico design to in vivo testing. Eliminating the intervening empirical optimisation steps thus promises highly efficient and expedited therapeutic lead discovery.
In this work, we designed eight novel hematopoietic agents using two different computational design strategies. All of the experimentally tested designs were folded, monomeric and stable. We then solved NMR structures of three representative molecules, which agreed with the design models at atomic accuracy. Finally, we evaluated the designed hematopoietic activity of these molecules through in-cell, ex vivo and in vivo assays, in which three molecules were shown to possess specific, nanomolar activity.
- Louis-Philippe Morency, University of Montreal, Canada
- Rafael Najmanovich, University of Montreal, Canada
Presentation Overview:
We introduce the latest version of Flexible Artificial Intelligence Docking (FlexAID), whose scoring function can now consider the conformational entropy of ligands in complex with their biological targets. We present the impact of FlexAID’s newest feature on its accuracy in binding mode prediction using three increasingly complex scenarios: the Astex Diverse Set, the Astex Non Native Set and HAP2. We show that FlexAID outperforms other open-source molecular docking methods when molecular flexibility is crucial. The improved accuracy of FlexAID on complex cases, the addition of novel features, i.e., normal mode analysis, its accessibility and its easy-to-use graphical user interface suggest that FlexAID is well positioned to tackle biologically challenging and pharmacologically relevant situations currently ignored by other methods. Furthermore, FlexAID now outputs statistical thermodynamic parameters, i.e., ∆G, ∆H and -T∆S, as well as multiple fluid conformations computed for each predicted binding mode. These two unique features are useful for the dynamic visualization of results, for a more thorough energy comparison between different ligands (relative ranking of molecules by affinity and virtual screening) and for the analysis of conformational entropic contributions to the energy of formation of a complex of interest.
- Simoun Mikhael, CSUN, United States
- Ravi Abrol, California State University Northridge, United States
Presentation Overview:
Rational structure-based drug design relies on a detailed atomic-level understanding of protein-ligand interactions. The chiral nature of drug binding sites in proteins has led to the discovery of predominantly chiral drugs. Mechanistic understanding of stereoselectivity (which governs how one stereoisomer of a drug might bind more strongly than the others to a protein) depends on the topology of stereocenters in the chiral molecule. Chiral graphs and reduced chiral graphs are new topological representations of chiral ligands that are introduced here, utilizing graph theory, to facilitate a detailed understanding of chiral recognition of ligands/drugs by proteins. These representations are demonstrated by application to all of the more than 14,000 chiral ligands in the Protein Data Bank (PDB) [1], which will facilitate an understanding of protein-ligand stereoselectivity mechanisms. Ligand modifications during drug development can be easily incorporated into these chiral graphs. In addition, these chiral graphs present an efficient tool for a deep dive into the enormous chemical space to sample unexplored structural scaffolds.
[1] S. Mikhael and R. Abrol (2019) ChemMedChem (In press)
- Susan Leung, University of Oxford, United Kingdom
- Mike Bodkin, Evotec, United Kingdom
- Frank von Delft, Diamond Light Source, United Kingdom
- Paul Brennan, University of Oxford, United Kingdom
Presentation Overview:
One of the fundamental assumptions of hit-to-lead fragment-based drug discovery is that the fragment’s binding mode will be structurally conserved upon synthetic elaboration. The most common way of quantifying binding mode similarity is Root Mean Square Deviation (RMSD), but Protein Ligand Interaction Fingerprint (PLIF) similarity and shape-based metrics are sometimes used. We present SuCOS, an open-source RDKit-based implementation of Malhotra and Karanicolas’ combined overlap score (COS). SuCOS has a Pearson correlation coefficient with COS of 0.93. We explore the strengths and weaknesses of RMSD, PLIF similarity, and SuCOS on a dataset of X-ray crystal structures of paired elaborated larger and smaller molecules bound to the same protein. We show that combined shape and 3D-pharmacophoric-based metrics like SuCOS are superior to RMSD when comparing an elaborated fragment (larger molecule) with its original fragment hit counterpart (smaller molecule). When the molecules are identical, such as in redocking, the threshold of 2 Å RMSD is widely used. However, this often disregards the size of the molecules being compared. The SuCOS score ranges from 0 to 1, regardless of molecular size, and is therefore suitable for defining a more universal threshold. SuCOS also has potential applications in ligand-based and implicit structure-based virtual screening.
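For reference, the RMSD baseline discussed above can be sketched in a few lines. This minimal version assumes both poses are already in the same protein frame (as in redocking), that atoms are listed in matching order, and that no superposition or symmetry correction is needed; the function name is illustrative and the code is not part of SuCOS.

```python
import math


def rmsd(coords_a, coords_b):
    """In-place RMSD (same units as the input, e.g. Angstrom) between matched atoms.

    coords_a and coords_b are equal-length lists of (x, y, z) tuples.
    No alignment is performed: the poses are compared as given.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

Because the function requires a one-to-one atom mapping, it cannot directly compare a fragment with its larger elaborated follow-up, which is the situation where overlap-based scores bounded between 0 and 1 become attractive.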
- Melissa F. Adasme, Biotechnology Center TU Dresden, Germany
Presentation Overview:
Drug repositioning aims to identify new indications for known drugs. With the growth in the number of 3D structures of drug-target complexes, it is now possible to study drug promiscuity at the structural level and to screen vast numbers of drug-target interactions to predict side effects, polypharmacological potential, and repositioning opportunities. Here, we developed a structure-based drug repositioning approach, which extends the scope of the search to novel chemical scaffolds by exploiting the binding mode similarities between drugs. We applied this approach to identify drugs inactivating B-cells, whose dysregulation can function as a driver of autoimmune diseases. As an initial step, an RNAi screen of over 500 kinases identified 22 proteins whose knockout impeded the activation of B-cells. Our drug repositioning approach was applied to the structures of those targets, revealing a well-known cancer drug as a micromolar inhibitor. The repositioning is explained through a specific pattern of noncovalent interactions shared between the original and predicted target. The novel inhibitor was finally validated, showing a very high therapeutic and selectivity index in B-cell inactivation. Overall, the repositioning approach was able to predict these findings at a fraction of the time and cost of a conventional screen.
- Philip Kim, University of Toronto, Canada
Presentation Overview:
Biologics are a rapidly growing class of therapeutics with many advantages over traditional small molecule drugs. A major obstacle to their development is that proteins and peptides are easily destroyed by proteases and, thus, typically have prohibitively short half-lives in the human gut, plasma, and cells. One of the most effective ways to prevent degradation is to engineer analogs from dextrorotary (D)-amino acids, with up to 10^5-fold improvements in potency reported. We here propose a general peptide-engineering platform that overcomes limitations of previous methods. By creating a mirror image of every structure in the Protein Data Bank (PDB), we generate a database of ∼2.8 million D-peptides. To obtain a D-analog of a given peptide, we search the (D)-PDB for similar configurations of its critical "hotspot" residues. As a proof of concept, we apply our method to two peptides that are Food and Drug Administration approved as therapeutics for diabetes and osteoporosis, respectively. We obtain D-analogs that activate the GLP1 and PTH1 receptors with the same efficacy as their natural counterparts and show greatly increased half-life.
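The mirror-image step can be illustrated with a small sketch: reflecting every atom through a plane inverts the handedness of each stereocenter, turning L-residues into D-residues. The choice of the yz-plane, the helper names and the toy coordinates are assumptions for illustration, not the authors' implementation.

```python
def mirror(coords):
    """Reflect (x, y, z) coordinates through the yz-plane by negating x."""
    return [(-x, y, z) for x, y, z in coords]


def signed_volume(center, a, b, c):
    """Scalar triple product (a - center) . ((b - center) x (c - center)).

    The sign encodes the handedness of the three substituents a, b, c
    around the central atom; a reflection flips it.
    """
    ax, ay, az = (a[i] - center[i] for i in range(3))
    bx, by, bz = (b[i] - center[i] for i in range(3))
    cx, cy, cz = (c[i] - center[i] for i in range(3))
    cross = (by * cz - bz * cy, bz * cx - bx * cz, bx * cy - by * cx)
    return ax * cross[0] + ay * cross[1] + az * cross[2]


# Toy stereocenter: a central atom with three distinct substituents.
atoms = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
original = signed_volume(*atoms)
reflected = signed_volume(*mirror(atoms))
```

Any reflection plane gives an equivalent result, since the mirrored structures differ only by a rigid rotation and translation.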
- Fergus Boyles, University of Oxford, United Kingdom
- Garrett Morris, University of Oxford, United Kingdom
- Charlotte M. Deane, University of Oxford, United Kingdom
Presentation Overview:
Scoring functions for protein-ligand binding affinity typically use features describing the protein-ligand complex, with limited information about the ligand itself. We have investigated the effect of adding a diverse set of ligand-based features to structure-based machine learning scoring functions. The inclusion of ligand-based features consistently improves the performance of structure-based machine learning scoring functions, even when ligands highly similar to those in the test set are excluded from the training set. The presence of similar proteins in the training and test sets has a significant impact on scoring function performance. However, the inclusion of ligand-based features improves performance regardless of training set composition. We investigated this behaviour and show that features of the ligand appear to be predictive of its mean binding affinity for its protein targets. We also find that the same ligand-based features are consistently important regardless of which structure-based features they are combined with. On the Comparative Assessment of Scoring Functions 2009, 2013, and 2016 scoring power benchmarks, a purely ligand-based model outperforms the AutoDock Vina scoring function.
- Peter Man-Un Ung, Icahn School of Medicine at Mount Sinai, United States
- Rayees Rahman, Icahn School of Medicine, United States
- Avner Schlessinger, Icahn School of Medicine, United States
Presentation Overview:
Protein kinases are key signaling proteins that constitute one of the most important drug target families. Kinases are dynamic proteins, adopting different conformations that are associated with their catalytic activity and preference for particular kinase inhibitor (KI) structures. Despite considerable effort, the currently known chemical space of KIs is limited, partly due to the high costs of chemical synthesis and activity profiling of KIs, as well as of structure determination of these proteins. We have developed Kinformation, a random forest classifier that annotates the conformation of over 3,500 protein kinase structures in the PDB. Our classification scheme captures known active and inactive kinase conformations and defines an additional conformational state, thereby refining the current understanding of the kinase conformational space. Next, we developed KinaMetrix, a comprehensive, publicly accessible web resource for studying kinase pharmacology and drug discovery. KinaMetrix serves as a repository for small molecule substructures that are significantly associated with each conformation type. Finally, we use these methods in collaboration with experimental labs to tackle challenges in kinase pharmacology. We combine our predictions with Drosophila genetics to rationally design clinically relevant multi-kinase inhibitors targeting reprogrammed networks in thyroid cancer, as well as to develop inhibitors against an emerging diabetes drug target.
- Jiansheng Wu, Nanjing University of Posts and Telecommunications, China
- Yang Zhang, University of Michigan, United States
Presentation Overview:
Motivation: Accurate prediction and interpretation of ligand bioactivities are essential for virtual screening and drug discovery. Unfortunately, many important drug targets lack experimental data about the ligand bioactivities; this is particularly true for G protein-coupled receptors (GPCRs), which account for the targets of about a third of drugs currently on the market. Computational approaches with the potential of precise assessment of ligand bioactivities and determination of key substructural features which determine ligand bioactivities are needed to address this issue.
Results: A new method, SED, was proposed to predict ligand bioactivities and to recognize key substructures associated with GPCRs through the coupling of screening for Lasso of long extended-connectivity fingerprints (ECFPs) with deep neural network training. The SED pipeline contains three successive steps: 1) representation of long ECFPs for ligand molecules, 2) feature selection by screening for Lasso of ECFPs, and 3) bioactivity prediction through a deep neural network regression model. The method was examined on a set of sixteen representative GPCRs that cover most subfamilies of human GPCRs, where each has 300–5000 ligand associations. The results show that SED achieves excellent performance in modelling ligand bioactivities, especially for those in the GPCR datasets without sufficient ligand associations, where SED improved on the baseline predictors by 12% in correlation coefficient (r²) and 19% in root mean square error. Detailed data analyses suggest that the major advantage of SED lies in its ability to detect substructures from long ECFPs, which significantly improves the predictive performance.
Availability: The source code and datasets of SED are freely available at https://zhanglab.ccmb.med.umich.edu/SED/.
- Jinbo Xu, Toyota Technological Institute at Chicago, United States
Presentation Overview:
Computational structure prediction of a protein without detectable homology in the PDB is very challenging and usually needs a large amount of computing power. This talk will show that by using a very deep convolutional residual neural network (ResNet), we may predict protein structures much more accurately and efficiently than ever before. Deep ResNet can predict inter-residue distance distributions very well, which enables us to construct protein 3D models from the geometric constraints given by the predicted distance matrix, without using time-consuming conformation sampling. Running on 20 CPUs, our deep ResNet method successfully folded 21 of the 37 CASP12 hard targets within 4 hours. In CASP13 our folding server successfully folded 17 of 32 hard targets and obtained the best contact prediction accuracy and almost the best folding accuracy among all servers. In the latest blind CAMEO test our folding server predicted correct folds of two membrane proteins, one of which has a new fold, while all the others failed. Our method also works well for complex contact prediction even when trained on single-chain proteins. This talk will also compare the top human and server groups in CASP13, all of which have adopted deep ResNet for protein folding.
- Joe Greener, University College London, United Kingdom
- Shaun Kandathil, University College London, United Kingdom
- David Jones, University College London, United Kingdom
Presentation Overview:
The impact of deep learning on protein residue-residue contact prediction has not extended to template-free (de novo) model generation. Here we introduce DMPfold, a development of the DeepMetaPSICOV contact predictor. It uses deep learning to predict inter-atomic distance bounds, the main chain hydrogen bond network, and main chain torsion angles and uses these to build models in an iterative procedure. DMPfold produces more accurate models than two popular methods for a test set of CASP12 free modelling domains, and is able to generate high-quality models without any modifications for a set of transmembrane proteins. We apply it to all Pfam domains without a known structure and provide high-confidence models for 25% of these so-called "dark" families, a calculation that takes less than a week on a cluster with 200 available cores. DMPfold provides models for 16% of human proteome UniProt entries without structures, can generate accurate models with alignments of fewer than 100 sequences in some cases, and is freely available.
- Badri Adhikari, University of Missouri - St. Louis, United States
Presentation Overview:
Exciting new opportunities have arisen to solve the protein contact prediction problem from the progress in neural networks and the availability of a large number of homologous sequences through high-throughput sequencing. In this work, we study how deep convolutional neural network methods (ConvNets) may be best designed and developed to solve this long-standing problem. With publicly available datasets, we designed and trained various ConvNet architectures. We tested several recent deep learning techniques including wide residual networks, dropout, and dilated convolutions. We studied the improvements in the precision of medium-range and long-range contacts, and compared the performance of our best architectures with the ones used in existing state-of-the-art methods. The ConvNet architectures we propose predict contacts with significantly higher precision than the architectures used in several state-of-the-art methods. For example, when trained using the DeepCov dataset consisting of 3,456 proteins and tested on the PSICOV dataset of 150 proteins, our architectures achieve up to 15% higher precision when the top L/2 long-range contacts are evaluated.
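The "top L/2 long-range" precision quoted above can be sketched as follows. The sketch assumes the common convention that long-range contacts are residue pairs separated by at least 24 positions in sequence, and that precision is computed over the pairs actually selected; the function name and input layout are illustrative, and the authors' evaluation script may differ in such details.

```python
def top_contact_precision(predicted, true_contacts, seq_len,
                          min_separation=24, fraction=0.5):
    """Precision of the top (fraction * L) predicted long-range contacts.

    predicted:     dict mapping (i, j) with i < j to a predicted probability
    true_contacts: set of (i, j) pairs that are contacts in the native structure
    seq_len:       protein length L
    """
    # Keep only pairs that are far enough apart in sequence.
    long_range = [
        (prob, pair) for pair, prob in predicted.items()
        if pair[1] - pair[0] >= min_separation
    ]
    # Rank by predicted probability, most confident first.
    long_range.sort(reverse=True)
    k = max(1, int(seq_len * fraction))
    top_pairs = [pair for _, pair in long_range[:k]]
    if not top_pairs:
        return 0.0
    hits = sum(1 for pair in top_pairs if pair in true_contacts)
    return hits / len(top_pairs)
```

Setting `min_separation=12` together with an upper bound on the separation would give the corresponding medium-range measure.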
- Anna Smolinski, SIB Swiss Institute of Bioinformatics, Basel & Biozentrum University of Basel, Switzerland
- Flavio Ackermann, SIB Swiss Institute of Bioinformatics, Basel & Biozentrum, University of Basel, Switzerland
- Xavier Robin, SIB Swiss Institute of Bioinformatics, Basel & University of Basel, Switzerland
- Rafal Gumienny, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
- Juergen Haas, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
- Torsten Schwede, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
Presentation Overview:
Protein structure prediction has become widely used in the life sciences as methods have matured significantly over the past 10 years. Today, most structure prediction workflows are fully automated. Consequently, establishing an automated assessment and benchmarking process is key to the sustained, fast-paced development of emerging methods. Continuously assessing structure prediction servers allows scientists, for example, to leverage the accumulated data to retrospectively select the best tool for a given scientific question.
The Continuous Automated Model EvaluatiOn (CAMEO) platform has assessed predictions for over 6,700 targets in the 3D protein structure prediction category over 377 weeks, with currently about 20 new targets being assessed each week. CAMEO features baseline structure predictors in each of its categories. Additionally, we have recently implemented a “Best Single Template” baseline comparison resembling an upper “optimal alignment” limit by employing structural superposition to scan the Protein Data Bank (PDB) for the best available template at the time of target submission. This method helps identify potential room for improvement in the template selection and alignment steps of automated protein structure modeling pipelines. Another integral part of CAMEO is the target selection, for which we present the latest efforts on fully automated target validation.
- Clare E. West, University of Oxford, United Kingdom
- Charlotte M. Deane, University of Oxford, United Kingdom
- Saulo Oliveira, Stanford Linear Accelerator Center, Stanford University, United States
Presentation Overview:
Template-free protein structure prediction protocols typically generate many models for each target. Reliably choosing the best model and determining whether this model is likely to be correct is a fundamental problem. We have developed the Random Forest Quality Assessment (RFQAmodel), which combines existing quality assessment scores and two predicted contact map alignment scores, and outputs for each model an estimated probability that it is in the correct fold (TM-score ≥ 0.5). Using RFQAmodel, we classify target proteins into distinct confidence categories, with those in the high-confidence category being most enriched with modelling successes. The classifier was trained and validated on two diverse sets of 244 protein domains. On the validation set, the highest-scoring model was in the high-confidence category for 67 modelling targets, of which 52 had the correct fold. Furthermore, RFQAmodel predicted that for 59 targets all models had a TM-score of less than 0.5, which was correct in 54 cases. Similar performance was achieved for CASP12/13 free-modelling targets. Finally, by iteratively generating models and running RFQAmodel until a model is produced that is predicted to be correct with high confidence, we demonstrate how our protocol can be used to focus computational efforts on difficult modelling targets.
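The validation counts quoted above translate into simple success rates, worked through in the short sketch below. The helper names are illustrative; only the counts (52 of 67 and 54 of 59) and the TM-score threshold of 0.5 come from the abstract.

```python
def is_correct_fold(tm_score, threshold=0.5):
    """A model counts as having the correct fold when its TM-score is >= threshold."""
    return tm_score >= threshold


def fraction_correct(n_correct, n_total):
    """Share of predictions in a category that turned out to be right."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_correct / n_total


# High-confidence category: 52 of 67 top-ranked models had the correct fold.
high_confidence_rate = fraction_correct(52, 67)
# "All models incorrect" category: the call was right for 54 of 59 targets.
all_incorrect_rate = fraction_correct(54, 59)

print(f"high-confidence precision: {high_confidence_rate:.1%}")  # 77.6%
print(f"all-incorrect precision:   {all_incorrect_rate:.1%}")    # 91.5%
```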
- Joshua Toth, Geisinger Commonwealth School of Medicine, United States
- Paul DePietro, Geisinger Commonwealth School of Medicine, United States
- William McLaughlin, Geisinger Commonwealth School of Medicine, United States
- Juergen Haas, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
Presentation Overview:
The Continuous Automated Model Evaluation (CAMEO) platform provides quality assessments for individual protein structural models and overall performance estimates for structure prediction techniques. Here we describe a method to further estimate the average accuracies of structure prediction techniques according to their capacities to produce structural models which exhibit functional site predictions like those found in the corresponding experimentally determined reference structures. We utilized the FEATURE program to provide the probabilities of functional sites that are centered on specific residues within structural models and the reference structures. We measure the correlation coefficients and the subtracted differences between the cumulative probabilities of the functional predictions of the reference structures and the structural models. Average scores are used in head-to-head, round-robin pairwise comparisons between structural prediction techniques. The results provide a relatively robust manner to rank the structure prediction techniques according to their capacities to enable accurate functional site predictions. Further evidence that structure prediction techniques can accurately reconstitute the structural features found at local functional sites is thereby provided. A study of amino acid types at predicted functional sites revealed that across the various structure prediction techniques they more accurately reconstitute functional sites centered on some amino acid residue types over others.
- Rafael Najmanovich, University of Montreal, Canada
Discussion on the development of 3DSIG-related activities sponsored by ISCB throughout the year at your own institutions.
- Jens Kleinjung, Soseiheptares, United Kingdom
- James Macpherson, Francis Crick Institute London UK, United States
- Franca Fraternali, King's College London; Francis Crick Institute London, United Kingdom
The transition of proteins through multiple conformational states is a result of their structural plasticity, which is inherently linked to their transition between different functional states. Delineating those states in large systems and long molecular simulations can be a challenging task, not least deciding which structural features to base the delineation on. Using a Structural Alphabet and eigensystems of Mutual Information matrices, we have developed the method AlloHubMat to automate the detection and abstract representation of conformational states. We measure allosteric signal transmission between two distant protein sites by the Mutual Information between the observed state transitions at those sites over a chosen simulated time and number of conformational snapshots. Using AlloHubMat, we identified allosteric hub residues in the enzyme PKM2 and expressed them in mutated form experimentally. The chosen mutations were intended to abrogate allosteric signaling between the ligand FBP and amino acid binding sites. Five of the seven mutations abrogated the allosteric activation by FBP. The two seemingly inert mutations showed instead reduced allosteric inhibition in the presence of Phe. We show that computational detection of allosteric hub residues with AlloHubMat via analysis of Mutual Information matrices provides a viable route to detect allosteric effects in proteins.
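The Mutual Information measure mentioned above can be sketched for two sites whose conformational states are encoded as discrete letters, one per simulation snapshot. This is a minimal plug-in estimate written for illustration only: the function name, the toy letter sequences and the choice of bits as the unit are assumptions, and it is not the AlloHubMat implementation.

```python
import math
from collections import Counter


def mutual_information(states_a, states_b):
    """Plug-in estimate of the mutual information (in bits) between two
    equally long sequences of discrete states, one entry per snapshot."""
    if len(states_a) != len(states_b) or len(states_a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(states_a)
    count_a = Counter(states_a)
    count_b = Counter(states_b)
    count_ab = Counter(zip(states_a, states_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) / (p(a) * p(b)) simplifies to c * n / (count_a * count_b)
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical structural-alphabet letters for two sites over four snapshots.
coupled = mutual_information("AABB", "AABB")      # sites change state together
uncoupled = mutual_information("AABB", "ABAB")    # states are independent
print(coupled, uncoupled)
```

In this toy case the fully coupled pair carries one bit of shared information and the independent pair carries none; real trajectories would need far more snapshots for the estimate to be reliable.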
- Jörn Schmiedel, Centre for Genomic Regulation (CRG), Spain
Determining the three-dimensional structures of macromolecules is a major goal of biological research because of the close relationship between structure and function, but thousands of protein domains still have unknown structures. Structure determination usually relies on physical techniques including X-ray crystallography, NMR spectroscopy and cryo-electron microscopy. Here we present a method that allows the high-resolution three-dimensional structure of a biological macromolecule to be determined only from measurements of the activity of mutant variants of the molecule. This genetic approach to structure determination relies on the quantification of genetic interactions (epistasis) between mutations and the discrimination of direct from indirect interactions. This provides a new experimental strategy for structure determination, with the potential to reveal functional and in vivo structural conformations at low cost and high throughput.
- Pablo Gainza, Ecole Polytechnique Federale de Lausanne and Swiss Institute of Bioinformatics, Switzerland
- Freyr Sverrisson, Ecole Polytechnique Federale de Lausanne and Swiss Institute of Bioinformatics, Switzerland
- Federico Monti, Institute of Computational Science, Faculty of Informatics, USI Lugano, Switzerland
- Emanuele Rodola, Department of Computer Science, Sapienza University of Rome, Italy
- Michael Bronstein, Institute of Computational Science, Faculty of Informatics, USI Lugano, Switzerland
- Bruno E Correia, Ecole Polytechnique Fédérale de Lausanne, Switzerland
Predicting interactions between proteins and other biomolecules purely based on structure is an unsolved problem in biology. A high-level description of protein structure, the molecular surface, displays patterns of chemical and geometric features that fingerprint a protein’s modes of interactions with other biomolecules. We hypothesize that proteins performing similar interactions may share common fingerprints, independent of their evolutionary history. Fingerprints may be difficult to grasp by visual analysis but could be learned from large-scale datasets. We present MaSIF, a conceptual framework based on a new geometric deep learning method to capture fingerprints that are important for specific biomolecular interactions. We showcase MaSIF with three prediction challenges: (a) protein pocket-ligand prediction; (b) protein-protein interaction site prediction, where we achieve a median ROC AUC of 0.80, compared with 0.65 for an established tool; and (c) ultrafast scanning of protein surfaces for prediction of protein-protein complexes, where we achieve runtimes up to 1000 times faster than some of the fastest docking tools. We anticipate that our conceptual framework will lead to improvements in our understanding of protein function and design.
- Fabian Sesterhenn, Ecole Polytechnique Fédérale de Lausanne, Switzerland
- Che Yang, Ecole Polytechnique Fédérale de Lausanne, Switzerland
- Zander Harteveld, Ecole Polytechnique Fédérale de Lausanne, Switzerland
- Bruno E Correia, Ecole Polytechnique Fédérale de Lausanne, Switzerland
- Jaume Bonet, Ecole Polytechnique Fédérale de Lausanne, Switzerland
Computational de novo protein design holds the promise to expand the topological space and boost our understanding of the rules guiding protein folding. The ability to create novel structures has been used to tackle a broad range of biomedical and biotechnological challenges for which no suitable structural conformation was available (such as the design of novel therapeutics and immunogens). However, most current approaches require a high level of understanding of the target topology and primarily focus on the stabilisation of the final design.
To tackle these limitations, we developed TopoBuilder, a protocol mixing parametric and heuristic protein design to generate de novo scaffolds. TopoBuilder transforms 2D projections of a protein’s secondary structures into full 3D designs, using statistically observed local correlations to ensure a natural-like disposition of the structural elements.
Thanks to its simplified starting requirements, the protocol provides the tools to systematically explore the structural space in a way no currently available tool offers. Furthermore, with its fine control of secondary structure placement, TopoBuilder is capable of tailoring scaffolds around structurally complex functional motifs, setting the most favourable context for their stabilisation and presentation. This makes it the first tool to design de novo scaffolds around previously known structural motifs.
- Arnold Amusengeri, Rhodes University, South Africa
- Ozlem Tastan Bishop, Rhodes University, South Africa
The need for next-generation anti-cancer drugs cannot be overstated. Due to resistance to multi-drug therapy, new strategies have been proposed to identify top-quality leads against the key pro-survival targets of human cancers. The present study aims at combining structure-based drug design fundamentals with dynamic residue network (DRN) concepts to identify and assess allosteric regulation propensities of South African natural compounds. Using high-throughput molecular docking, the previously identified allosteric sites of the heat shock proteins Hsp72 and Hsc70 were screened against the South African Natural Compounds Database (SANCDB; https://sancdb.rubi.ru.ac.za/). Selected protein-hit complexes were further analyzed by molecular dynamics calculations. Discorhabdin N, a marine alkaloid commonly isolated from Latrunculid sponges, bound the allosteric back pocket of the substrate-binding domain (SBD) and modulated the dynamic behavior of both protein targets. Further, using DRN analysis via the MD-TASK toolkit, key allosteric communication centers within the proteins were identified. The implications of ligand binding on these signal-sensitive hotspots were determined. Our findings allowed us to discuss possible allosteric regulatory mechanisms of Discorhabdin N, and hence provide a novel approach developed by the group for allosteric drug discovery.
- Maria Freiberger, Protein Physiology Laboratory - FCEyN - Universidad de Buenos Aires, Argentina
- A. Brenda Guzovsky, Protein Physiology Laboratory - FCEyN - Universidad de Buenos Aires, Argentina
- Diego M. Luna, Facultad de Ingenieria, Universidad Nacional de Entre Rios, Argentina
- Peter Wolynes, Center for Theoretical Biological Physics, Rice University, United States
- Diego Ferreiro, Protein Physiology Laboratory - FCEyN - Universidad de Buenos Aires, Argentina
- R. Gonzalo Parra, Genome Biology Unit, European Molecular Biology Laboratory, Germany
Introduction
While proteins fold, strong energetic conflicts are minimized towards their native states according to the principle of minimal frustration. Local violations of this principle allow proteins to encode the complex energy landscapes required for active biological functions. Enzymatic reaction rates strongly depend on precise and conserved arrangements bringing together in space residues that would otherwise adopt different interactions. Hence, catalytic sites are expected to be locally frustrated.
Results
We quantified local energetic conflicts of all protein enzymes with known structures and experimentally annotated catalytic residues. Catalytic sites are effectively highly frustrated in extant enzymes, regardless of protein oligomeric state, topology, and enzymatic class. We show that, in the context of protein families, frustration at catalytic sites is more evolutionarily conserved than the primary structure itself (Freiberger et al., PNAS 2019). Additionally, we will discuss the appearance of specific frustration patterns along the evolutionary history of protein superfamilies.
Concluding Remarks
Highly frustrated active sites constitute a general characteristic of protein enzymes. Comparisons, in the context of related protein families, help to study the emergence of family-specific energetic conflicts imprinted by functional requirements. Understanding the functional implications of frustrated interactions of protein enzymes will help to improve enzyme engineering strategies.
- Georgi Kanev, Amsterdam UMC - VUmc, Netherlands
- Bart Westerman, Amsterdam UMC - VUmc, Netherlands
Knowledge of the promiscuity of small molecule inhibitors towards members of the protein kinase family is very limited. This unexplored territory could have important consequences for anti-cancer therapies and, when uncovered, could result in enhanced multi-targeted personalized treatments. We have performed comprehensive sequence, structural, mutational and inhibitor binding analysis of kinase inhibitors and developed a structure-based virtual screening pipeline that uses 1D, 2D and 3D (structural) kinase-ligand information as input for deep convolutional neural networks (CNN) to predict the activity of small molecules against the kinome. This provides opportunities for therapeutic repurposing, for the development of drugs with a synergistic polypharmacological profile and improved efficacy, and for the exploration of untargeted subpockets adjacent to the ATP pocket. Critical steps in the structure-based virtual screening pipeline include integrating bioactivity data from web resources and literature, analyzing in-silico generated interaction profiles from molecular docking with the in-house developed Kinase-Ligand Interaction Fingerprints and Structures database (KLIFS), and preparing the data as input for the CNN while retaining 3D and lower dimensional structures in the data. The pipeline achieved a root-mean-square error (RMSE) below 1.0 and a Pearson correlation above 0.62, indicating that our method can lead to the identification of multi-target kinase inhibitors with clinical relevance.
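For readers unfamiliar with the two metrics reported above, the sketch below shows how RMSE and the Pearson correlation are typically computed from predicted and measured activities. The function names and the activity values are made up for this example, and the abstract does not say which activity scale the authors used.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between predicted and measured values."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical activities for four compounds (illustrative numbers only).
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]
print(f"RMSE = {rmse(predicted, measured):.2f}")
print(f"Pearson r = {pearson(predicted, measured):.2f}")
```

A lower RMSE means predictions are closer to the measurements in absolute terms, while the Pearson correlation only reflects how well the ranking and linear trend are reproduced, so the two are usually reported together.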
- Renmin Han, KAUST, Saudi Arabia
- Zhipeng Bao, Tsinghua University, China
- Tongxin Niu, Institute of Biophysics, CAS, China
- Fa Zhang, Institute of Computing Technology, CAS, China
- Xin Gao, King Abdullah University of Science and Technology, Saudi Arabia
- Xiangrui Zeng, Carnegie Mellon University, United States
- Min Xu, Carnegie Mellon University, United States
Motivation: Electron tomography (ET) is a widely used technology for 3D macro-molecular structure reconstruction. To obtain a satisfactory tomogram reconstruction, several key processes are involved, one of which is the calibration of projection parameters of the tilt series. Although fiducial marker-based alignment for tilt series has been well studied, marker-free alignment remains a challenge, which requires identifying and tracking the identical objects (landmarks) through different projections. However, the tracking of these landmarks is usually affected by the pixel density (intensity) change caused by the geometry difference in different views. The tracked landmarks will be used to determine the projection parameters. Meanwhile, different projection parameters will also affect the localization of landmarks. Currently, there is no alignment method that takes into account the interrelationship between the projection parameters and the landmarks.
Results: Here, we propose a novel, joint method for marker-free alignment of tilt series in ET, by utilizing the information underlying the interrelationship between the projection model and the landmarks. The proposed method is the first joint solution that combines the extrinsic (track-based) alignment and the intrinsic (intensity-based) alignment, in which the localization of landmarks and projection parameters keep refining each other until convergence. This iterative approach makes our solution robust to different initial parameters and extreme geometric changes, which ensures a better reconstruction for marker-free electron tomography. Comprehensive experimental results on three real datasets show that our new method achieved a significant improvement in alignment accuracy and reconstruction quality, compared to the state-of-the-art methods.
Availability: The main program is available at https://github.com/icthrm/joint-marker-free-alignment.