Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

3DSIG: Structural Bioinformatics and Computational Biophysics

COSI Track Presentations

Attention Presenters - please review the Speaker Information Page available here
Schedule subject to change
Monday, July 9th
10:15 AM-10:20 AM
3DSIG: Introduction
Room: Columbus AB
10:20 AM-11:00 AM
Structural insights into the impacts of genetic variations in splicing and disease
Room: Columbus AB
  • Christine Orengo, University College London, United Kingdom
11:00 AM-11:20 AM
scoreD: Deep discriminative ensemble classifiers for protein scoring
Room: Columbus AB
  • Debswapna Bhattacharya, Auburn University, United States

A scoring function that can assign a correct accuracy score to a predicted protein model and select the optimal model from several alternative models (a.k.a. decoys) is critical for protein structure prediction. Here, we develop deep discriminative learning based ensemble classifiers to predict a decoy’s GDT-TS score. We train an ensemble of four deep discriminative classifiers by leveraging combinations of sequence and structure-derived features as well as several centroid energy terms from Rosetta. A weighted combination of these four classifiers is subsequently applied to predict a decoy’s score. The proposed method, scoreD, has been found to: (i) recognize the native state better than several widely used knowledge-based statistical potentials in a benchmark of two decoy datasets comprising 78 proteins, particularly for diversity-enhanced and evenness-enforced variants of these decoy sets; (ii) perform comparably in optimal model selection to the structural-consensus-based meta-approach used in Zhang-server and QUARK, two top-performing protein structure predictors in the Critical Assessment of Protein Structure Prediction (CASP) experiments, in a dataset of 43 protein targets from CASP10 as well as CASP-ROLL and 54 targets from CASP11; and (iii) achieve state-of-the-art performance in a dataset of 40 targets from CASP12. The scoreD webserver is freely available at http://watson.cse.eng.auburn.edu/scoreD/.
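
For readers unfamiliar with the GDT-TS target that scoreD predicts, the sketch below shows how the score is commonly defined: the average, over cutoffs of 1, 2, 4 and 8 Å, of the fraction of residues whose Cα atom lies within the cutoff of its native position. This is a simplified illustration and not the authors' code. It assumes the decoy is already superimposed on the native structure, whereas the full GDT procedure searches over many superpositions; the function name `gdt_ts` is hypothetical.

```python
def gdt_ts(distances):
    """Simplified GDT-TS (0 to 100) from per-residue Calpha distances in angstroms.

    Assumption: `distances` come from ONE fixed superposition of decoy and
    native. The full GDT procedure maximizes each fraction over many
    superpositions, which is not done here.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    n = len(distances)
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Four residues at 0.5, 1.5, 3.0 and 9.0 angstroms from their native positions.
score = gdt_ts([0.5, 1.5, 3.0, 9.0])
```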

11:20 AM-11:40 AM
Network approach integrates 3D structural and sequence data to improve protein structural comparison
Room: Columbus AB
  • Khalique Newaz, University of Notre Dame, United States
  • Fazle Faisal, University of Notre Dame, United States
  • Julie Chaney, University of Notre Dame, United States
  • Jun Li, University of Notre Dame, United States
  • Scott Emrich, University of Notre Dame, United States
  • Patricia Clark, University of Notre Dame, United States
  • Tijana Milenkovic, University of Notre Dame, United States

Proteins are key macromolecules of life, and thus understanding their function is important. However, doing so experimentally is resource-consuming. Hence, computational prediction of protein function can help. Since proteins with similar structures often have similar functions, computational approaches have been proposed for capturing proteins’ structural and thus functional similarity. Traditionally, such approaches were sequence-based. Since amino acids that are distant in the sequence can be close in the 3-dimensional (3D) structure, 3D structural approaches can complement sequence approaches. Traditional 3D structural approaches compare “raw” protein structural information. In contrast, 3D structures can first be modeled as protein structure networks (PSNs). Then, “processed” PSN-based information can be used to compare proteins. We developed a novel approach, GRAFENE, to use integrative sequence and PSN-based structural information to compare proteins. In extensive evaluation on PSNs corresponding to protein domains from CATH and SCOP databases against existing state-of-the-art approaches (e.g., DaliLite, TM-align, and GR-Align), GRAFENE was both more accurate (in identifying as similar those PSNs that belong to the same CATH/SCOP class) and faster. Hence, GRAFENE is expected to impact future research on protein structural comparison and thus protein function prediction.

11:40 AM-12:00 PM
Backbone Brackets and Arginine Tweezers delineate Class I and Class II aminoacyl tRNA synthetases
Room: Columbus AB
  • Florian Kaiser, University of Applied Sciences Mittweida, Germany
  • Sebastian Bittrich, University of Applied Sciences Mittweida, Germany
  • Sebastian Salentin, Biotechnology Center (BIOTEC), TU Dresden, Germany
  • Christoph Leberecht, University of Applied Sciences Mittweida, Germany
  • V. Joachim Haupt, Biotechnology Center (BIOTEC), TU Dresden, Germany
  • Sarah Krautwurst, University of Applied Sciences Mittweida, Germany
  • Michael Schroeder, Biotechnology Center (BIOTEC), TU Dresden, Germany
  • Dirk Labudde, University of Applied Sciences Mittweida, Germany

Aminoacyl tRNA synthetases (aaRS) ligate amino acids to their corresponding tRNA molecule and understanding the origin of aaRS can explain how the genetic code was established.

Sequence analyses revealed that aaRS enzymes can be divided into two complementary classes which differ significantly on a sequence and structural level. We identified Backbone Brackets and Arginine Tweezers as the most compact ATP-binding motifs characteristic of each class. This oppositional implementation of enzyme substrate binding shows how nature realized the binding of the same ligand species with completely different mechanisms. A structural rearrangement of the Backbone Brackets observed upon ATP binding indicates a general mechanism of all Class I structures. We demonstrate that sequence or even structure analysis for conserved residues may miss important functional aspects. The study shows how structural bioinformatics can be applied to link evolution and genetic coding.

Backbone Brackets and Arginine Tweezers were traced back to the ancient Protozymes of aaRS, which were presumably encoded bidirectionally on opposite strands of the same gene. Both structural motifs can be observed in contemporary structures and it seems that the time of their addition, indicated by their placement in the ancient aaRS, coincides with the evolutionary trace of aaRS.

12:00 PM-12:20 PM
Structural Classification of Proteins in the post-Structural Genomics era
Room: Columbus AB
  • John-Marc Chandonia, Berkeley National Lab, United States
  • Steven E. Brenner, University of California, Berkeley, United States

SCOPe (Structural Classification of Proteins – extended, http://scop.berkeley.edu) is a database of relationships between protein structures that extends the Structural Classification of Proteins (SCOP) database. SCOPe 2.07, a major stable update, was released in March 2018. SCOPe continues high quality manual classification of new superfamilies, a key feature of SCOP.
As public investment in Structural Genomics has waned, the novelty of newly solved protein structures has fallen to a 20-year low, with only 17 structures each month (~2% of the 800 characterized) representing the first structure from a Pfam family. About half of these newly structurally characterized Pfam families classified to date in SCOPe represent a new fold or superfamily. Thus, ongoing expert manual curation of protein structure classifications such as SCOPe is feasible when abetted by automated methods, and continues to yield new discoveries.
An unfortunate consequence of the rate of sequencing outpacing the rate of structural characterization of protein families is that the fraction of large families with a known structure peaked 10 years ago, and is more than 10% lower today than it was at its peak. This makes interpretation of sequence variation much more challenging than would be the case had investment in Structural Genomics continued.

12:20 PM-12:40 PM
Improved protein contact prediction using two-level deep convolutional neural networks
Room: Columbus AB
  • Badri Adhikari, University of Missouri, St. Louis, United States
  • Jie Hou, University of Missouri - Columbia, United States
  • Jianlin Cheng, University of Missouri - Columbia, United States

Significant improvements in the prediction of protein residue-residue contacts have been observed in recent years. These contacts, predicted using a variety of coevolution-based and machine learning methods, are the key contributors to the recent progress in ab initio protein structure prediction. Here we discuss DNCON2, an improved protein contact map predictor based on two-level deep convolutional neural networks. It consists of six convolutional neural networks: the first five predict contacts at 6, 7.5, 8, 8.5, and 10 Å distance thresholds, and the last one uses these five predictions as additional features to predict final contact maps. On the free-modeling datasets in the CASP10, 11, and 12 experiments, DNCON2 achieves mean precisions of 35%, 50%, and 53.4%, respectively, higher than 30.6% by MetaPSICOV on the CASP10 dataset, 34% by MetaPSICOV on the CASP11 dataset, and 46.3% by Raptor-X on the CASP12 dataset, when the top L/5 long-range contacts are evaluated. We attribute the improved performance of DNCON2 to the inclusion of short- and medium-range contacts in training, a two-level approach to prediction, use of state-of-the-art optimization and activation functions, and a novel deep learning architecture that allows each filter in a convolutional layer to access all the input features of a protein of arbitrary length.
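
The "top L/5 long-range" precision quoted above can be sketched as follows. This is an illustrative implementation, not the DNCON2 evaluation code. It assumes the widely used CASP convention that a long-range pair has a sequence separation of at least 24 residues, which the abstract does not state, and it divides by the number of pairs actually selected. The function and parameter names are hypothetical.

```python
def top_l5_long_range_precision(predicted, true_contacts, length, min_sep=24):
    """Precision of the top L/5 highest-scoring long-range predictions.

    predicted     : iterable of (i, j, score) residue-index pairs
    true_contacts : set of (i, j) pairs with i < j that are real contacts
    length        : sequence length L
    min_sep       : minimum |i - j| for "long-range" (24 is an assumed convention)
    """
    long_range = [
        (score, min(i, j), max(i, j))
        for i, j, score in predicted
        if abs(i - j) >= min_sep
    ]
    long_range.sort(reverse=True)          # highest score first
    top = long_range[: max(1, length // 5)]
    if not top:
        return 0.0
    hits = sum((i, j) in true_contacts for _, i, j in top)
    return hits / len(top)


# Toy example with a short chain, so the separation cutoff is lowered to 3.
toy_predictions = [(0, 5, 0.9), (1, 8, 0.8), (2, 3, 0.99), (0, 9, 0.1)]
toy_truth = {(0, 5), (0, 9)}
precision = top_l5_long_range_precision(toy_predictions, toy_truth, length=10, min_sep=3)
```

In the toy call, L/5 is 2, the pair (2, 3) is excluded as too close in sequence, and one of the two selected pairs is a true contact.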

12:40 PM-2:00 PM
Lunch Break
2:00 PM-2:20 PM
Proceedings Presentation: Protein threading using residue co-variation and deep learning
Room: Columbus AB
  • Jianwei Zhu, Chinese Academy of Sciences, China
  • Sheng Wang, Toyota Technological Institute at Chicago, United States
  • Dongbo Bu, Chinese Academy of Sciences, China
  • Jinbo Xu, Toyota Technological Institute at Chicago, United States

Template-based modeling (TBM), including homology modeling and protein threading, is a popular method for protein 3D structure prediction. However, alignment generation and template selection for protein sequences without close templates remain very challenging. We present a new method called DeepThreader to improve protein threading, including both alignment generation and template selection, by making use of deep learning and residue co-variation information. Our method first employs deep learning to predict the inter-residue distance distribution from residue co-variation and sequential information (e.g., sequence profile and predicted secondary structure), and then builds a sequence-template alignment by integrating the predicted distance information with sequential information through an ADMM algorithm. Experimental results suggest that predicted inter-residue distance is helpful to both protein alignment and template selection, especially for protein sequences without very close templates, and that our method outperforms the currently popular homology modeling method HHpred and the threading method CNFpred by a large margin and greatly outperforms the latest contact-assisted protein threading method, EigenTHREADER.

2:20 PM-2:40 PM
Where the context-free grammar meets the contact map: a probabilistic model of protein sequences aware of contacts between amino acids
Room: Columbus AB
  • Witold Dyrka, Wroclaw University of Science and Technology, Poland
  • Francois Coste, Univ Rennes, Inria, CNRS, IRISA, France
  • Juliette Talibart, Univ Rennes, Inria, CNRS, IRISA, France

Learning the language of protein sequences, which captures non-local interactions between amino acids close in the spatial structure, is a long-standing bioinformatics challenge that requires at least context-free grammars. However, the complex character of protein interactions impedes unsupervised learning of context-free grammars. Using structural information to constrain the syntactic trees has proved effective in learning probabilistic natural and RNA languages. In this work, we establish a framework for learning probabilistic context-free grammars for protein sequences from syntactic trees partially constrained using amino acid contacts obtained from wet experiments or computational predictions, whose reliability has substantially increased recently. Within the framework, we implement the maximum-likelihood and contrastive estimators of parameters for simple yet practical grammars. Tested on samples of protein motifs, grammars developed within the framework showed improved precision in recognition and higher fidelity to protein structures. The framework is applicable to other biomolecular languages and beyond, wherever knowledge of non-local dependencies is available.

2:40 PM-3:00 PM
Predicting Loop Conformational Ensembles
Room: Columbus AB
  • Claire Marks, University of Oxford, United Kingdom
  • Jiye Shi, UCB Pharma, United Kingdom
  • Charlotte Deane, University of Oxford, United Kingdom

Protein function often relies on the ability of the protein to exist in a number of different stable conformations. Accurate prediction of these states would therefore be useful, providing an ensemble of structures that represent the diversity of the target instead of just a single, static model. We have investigated whether current algorithms are capable of this, in the context of loop structure prediction. We obtained two sets of targets, one containing loops with several experimentally-observed conformations and a set containing loops with only one conformation, and assessed the ability of four algorithms to generate and select decoys that are close to any, or all, of the known structures. We found that conformationally diverse loops are modelled significantly less accurately compared to loops with one known conformation. In fact, for most of these diverse loops, the decoys made were not similar to any of the native conformations. Our results imply that the idea of multiple native conformations being present in the decoy ensemble is incorrect, indicating that the prediction of conformation ensembles is impossible using current techniques, and that novel algorithms need to be designed with this specific goal in mind.

3:00 PM-3:20 PM
Systematic Analysis of Symmetry and Pseudo-Symmetry in Membrane Protein Structures
Room: Columbus AB
  • Antoniya Aleksandrova, NINDS - National Institutes of Health, United States
  • Lucy Forrest, NINDS (NIH), United States

Available membrane protein structures have revealed an abundance of symmetry and pseudo-symmetry, which are observed not only in the formation of multi-subunit assemblies, but also in the repetition of internal structural elements. There are many known examples of the functional significance of these symmetries. In this context, a systematic study of symmetry should provide a framework for a broader understanding of the mechanistic principles and evolutionary development of membrane proteins. However, existing analyses lack the detail and breadth required for such a systematic study. Therefore, we aim to quantify both the extent and diversity of symmetry relationships in known structures of membrane proteins. To achieve this task, we combine the output of two programs for symmetry detection, namely SymD and CE-Symm, each of which has certain limitations. By leveraging the complementarity of these programs and taking into consideration the restrictions that the lipid bilayer places on protein structures, we improve both the sensitivity of symmetry detection and the coverage of the symmetric units. This analysis provides a valuable foundation for addressing a wide range of questions relating to the function and evolution of these important proteins. Therefore, we have incorporated this data into an online database called EncoMPASS (encompass.ninds.nih.gov).

3:20 PM-3:40 PM
Pseudo-Symmetry in 7 Transmembrane Helix (7TMH) Proteins: Intragenic Duplication of Protodomains with Evolutionary Balance of Structural Constraints and Functional Divergence
Room: Columbus AB
  • Philippe Youkharibache, NCI/NIH, United States
  • Alexander Tran, California State University Northridge, United States
  • Ravinder Abrol, California State University Northridge, United States

7-Transmembrane-helix (7TMH) proteins cannot be grouped under a monolithic fold. A parallel structure-based analysis of sequence and functional evolution on folds sharing that magic number of 7 transmembrane (7TM) helices has revealed an evolutionary principle showing evidence of a duplication pattern of a 3/4-transmembrane helix (3/4-TMH) protodomain. This results in 7TMH proteins being made up of either two 4-TMH protodomains related by a two-fold symmetry, where one TM helix is lost, or two 3-TMH protodomains related by a two-fold symmetry, where an extra transmembrane helix can be present. The independent evolution of the two 3/4-TMH protodomains within a specific superfamily’s 7TMH protein appears to be guided by functional and structural constraints, which leads to either pseudo-symmetric folds of functionally-obligatory oligomeric 7TMH super-families like nicotinamide riboside transporter protein PnuC or pseudo-symmetric folds in other 7TMH super-families like G protein coupled receptors (GPCRs). This study also provides a surprising evolutionary link between GPCRs and ligand-gated ion channels. The sequence and structural protodomain analysis of different 7TMH super-families provides a unifying theme of their evolutionary process, where the intragenic duplication of protodomains is guided by varying degrees of functional divergence and structural constraints.

3:40 PM-4:00 PM
Predicting the assembly order of multimeric heteroprotein complexes
Room: Columbus AB
  • Lenna Peterson, Purdue University, United States
  • Yoichiro Togawa, Purdue University, United States
  • Juan Esquivel-Rodriguez, Purdue University, United States
  • Genki Terashi, Purdue University, United States
  • Charles Christoffer, Purdue University, United States
  • Amitava Roy, Purdue University, United States
  • Woong-Hee Shin, Purdue University, United States
  • Daisuke Kihara, Purdue University, United States

Protein-protein interactions, particularly those involving multiple proteins, are the cornerstone of numerous biological processes. Although an increasing number of multi-chain protein complex structures have been determined, fewer studies have been performed to determine the assembly order of complexes.
Knowing the assembly order of a complex provides insights into the process of complex formation. Assembly order is also practically useful for constructing subcomplexes as a step toward solving the entire complex experimentally, designing artificial protein complexes, and developing drugs that interrupt a critical step in the complex assembly.
We developed a computational method, Path-LZerD, which predicts the assembly order of a protein complex by simulating its assembly process; it is the first method of this kind. A strong advantage of Path-LZerD is that the assembly order can be predicted even when the overall complex structure is not known. Path-LZerD opens a new area of computational protein structure modeling and will be an indispensable approach for studying protein complexes. This work was published in PLoS Computational Biology, 2018 Jan 12; 14(1):e1005937.

4:00 PM-4:40 PM
Coffee Break
4:40 PM-5:00 PM
MAINMAST: De novo Main-chain Modeling for EM maps Using Tree-graph optimization.
Room: Columbus AB
  • Genki Terashi, Purdue University, United States
  • Daisuke Kihara, Purdue University, United States

An increasing number of protein structures are determined by cryo-electron microscopy (cryo-EM) at near-atomic resolution. However, tracing the main-chains and building full-atom models from EM maps of ~4-5 Å is still a demanding, non-trivial task. Here, we introduce a novel de novo structure modeling method, MAINMAST (MAINchain Model trAcing from Spanning Tree), that builds an entire three-dimensional model of a protein from a near-atomic resolution EM map. The method directly traces the main-chain and identifies Cα atom positions as tree-graph structures in the EM map. The method has substantial advantages over the existing methods: i) MAINMAST directly traces main-chain models from an EM density map without using known protein structures; ii) the procedure is fully automated and no manual setting is required; iii) MAINMAST can estimate a confidence score that indicates the accuracy of structure regions. We tested MAINMAST on 40 simulated density maps at 5 Å resolution and 30 experimentally determined maps at ~4-5 Å resolution and showed that MAINMAST performed significantly better than existing software. This work is in press in Nature Communications (2018).
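
The method's name refers to a spanning tree. As a generic illustration of that building block only, the sketch below connects a set of 3D points with a minimum spanning tree using Prim's algorithm. It assumes candidate points have already been extracted from the density map and uses plain Euclidean distance as the edge weight. How MAINMAST selects points, weights edges, refines the tree and scores models is not described in the abstract and is not reproduced here. The function name is hypothetical.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (parent_index, child_index) edges. Edge weights are
    plain Euclidean distances, which is an assumption made for illustration.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n     # cheapest known connection to the tree
    best_parent = [-1] * n
    best_dist[0] = 0.0             # start growing the tree from point 0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=best_dist.__getitem__)
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u))
        # Update the attachment cost of the remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


# Hypothetical candidate points (coordinates in angstroms).
tree = minimum_spanning_tree([(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 3, 0)])
```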

5:00 PM-5:20 PM
Proceedings Presentation: An integration of fast alignment and maximum-likelihood methods for electron subtomogram averaging and classification
Room: Columbus AB
  • Yixiu Zhao, Carnegie Mellon University, United States
  • Xiangrui Zeng, Carnegie Mellon University, United States
  • Qiang Guo, Max Planck Institute for Biochemistry, Germany
  • Min Xu, Carnegie Mellon University, United States

Motivation: Cellular Electron CryoTomography (CECT) is an emerging 3D imaging technique that visualizes subcellular organization of single cells at sub-molecular resolution and in near-native state. CECT captures large numbers of macromolecular complexes of highly diverse structures and abundances. However, the structural complexity and imaging limits complicate the systematic de novo structural recovery and recognition of these macromolecular complexes. Efficient and accurate reference-free subtomogram averaging and classification represent the most critical tasks for such analysis. Existing subtomogram alignment based methods are prone to the missing wedge effects and low signal-to-noise ratio (SNR). Moreover, existing maximum-likelihood based methods rely on integration operations, which are in principle computationally infeasible for accurate calculation.

Results: Building on existing work, we propose an integrated method, the Fast Alignment Maximum Likelihood method (FAML), which uses fast subtomogram alignment to sample sub-optimal rigid transformations. The transformations are then used to approximate integrals for the maximum-likelihood update of subtomogram averages through an expectation-maximization algorithm. Our tests on simulated and experimental subtomograms showed that, compared to our previously developed fast alignment method (FA), FAML is significantly more robust to noise and missing wedge effects with moderate increases in computation cost. Besides, FAML performs well with significantly fewer input subtomograms when the FA method fails. Therefore, FAML can serve as a key component for improved construction of initial structural models from macromolecules captured by CECT.

5:20 PM-5:40 PM
Metalloproteome landscape from the amino acid covariance perspective
Room: Columbus AB
  • Frazier Baker, University of Cincinnati, United States
  • Nicholas Maltbie, University of Cincinnati, United States
  • Joseph Hirschfeld, University of Cincinnati, United States
  • Alexey Porollo, Cincinnati Children's Hospital Medical Center, United States

Metal binding proteins are estimated to constitute at least one third of the proteome in any living organism. There is a great need for developing a reliable sequence-based annotation method for metal binding sites. We approached this problem using amino acid covariance analysis. 6090 non-redundant metal binding proteins were retrieved from the BioLiP database. A wide set of cumulative features derived from the top co-varying residues for a given site were evaluated. The best performing feature to discriminate metal binding from non-binding sites was found to be the individual conservation score (Shannon entropy). For metal specificity, the correlation-based metric appears the most informative to discriminate one metal versus others, as well as to achieve their pairwise distinctions. When discerning one type of metal from the other five types, metals can be discriminated in the following descending order of signal strength: Zn > Cu > Ca > Mg > Mn > Fe. In pairwise comparisons, Ca vs Mg appears to be the hardest metal pair to discern. Our study strongly suggests the possibility of developing an accurate sequence-based method for the annotation of metal binding sites and their specificity.
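
The conservation score named above, Shannon entropy, can be computed per alignment column as in the sketch below. It is a generic illustration rather than the authors' pipeline: gaps are simply ignored and no sequence weighting or pseudocounts are applied, both of which are assumptions. Lower entropy means a more conserved position.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (in bits) of one alignment column.

    Gap characters are ignored (an assumption; other treatments exist).
    A fully conserved column scores 0.0.
    """
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    # H = sum over residue types of p * log2(1 / p)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical columns: one fully conserved, one split between two residues.
conserved = column_entropy("CCCC")
variable = column_entropy("HHDD")
```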

5:40 PM-6:00 PM
Clustering and classification of active and inactive protein kinase structures
Room: Columbus AB
  • Vivek Modi, Fox Chase Cancer Center, United States
  • Roland Dunbrack, Fox Chase Cancer Center, United States

The active site of a protein kinase consists of several conserved residues, including the catalytic Asp residue in the HRD motif and the DFG motif at the activation loop N-terminus. Unlike the HRD motif, the DFG motif exhibits a unique conformation in the active state but displays flexibility across different inactive forms. To classify kinase structures, we have clustered the DFG motif conformations based on the backbone dihedral angles of the sequence XDF, where X is the residue before the DFG motif, and the position and conformation of the DFG Phe side-chain, utilizing a density-based clustering algorithm (DBSCAN). We have identified 8 distinct conformations that comprise 92% of kinase structures, and label them based on their Ramachandran regions (A (alpha), B (beta), L (left), E (epsilon)) and the Phe rotamer (gminus, gplus, trans). Active kinases with bound ATP exist exclusively in a BLAgminus conformation (55.4% of structures), known as DFGin, while Type II inhibitors bind solely to BBAgminus (5.2%), known as DFGout. The most common inactive conformations are BLBgplus (9.5%) and ABAgminus (9.4%), which place the Phe side-chain under the C-helix in a DFGin conformation. We believe the new classification and nomenclature will benefit understanding of conformational dynamics and inhibitor binding in the protein kinase family.
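
Clustering dihedral angles with DBSCAN requires a distance that respects periodicity, so that -179° and 179° count as neighbours. The sketch below pairs one common choice, the mean of 2(1 - cos Δ) over the angles, with a small pure-Python DBSCAN. The metric, the `eps` and `min_pts` values and the toy angles are all assumptions made for illustration; the abstract does not give the authors' settings, and this is not their code.

```python
import math


def dihedral_distance(a, b):
    """Mean of 2 * (1 - cos(delta)) over paired angles given in degrees."""
    return sum(2.0 * (1.0 - math.cos(math.radians(x - y))) for x, y in zip(a, b)) / len(a)


def dbscan(points, eps, min_pts, dist=dihedral_distance):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    unseen, noise = None, -1
    labels = [unseen] * len(points)

    def neighbors(i):
        # Includes the point itself, as in the usual DBSCAN definition.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unseen:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise            # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster      # border point: joins, is not expanded
            if labels[j] is not unseen:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels


# Toy (phi, psi) pairs in degrees: two tight groups and one outlier.
angles = [(-179, 60), (179, 62), (178, 58),
          (-60, -45), (-62, -43), (-58, -47),
          (60, 180)]
labels = dbscan(angles, eps=0.05, min_pts=3)
```

With these toy inputs, the first three points fall into one cluster even though their first angle straddles the ±180° boundary, which a plain Euclidean distance on raw angles would not capture.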

Tuesday, July 10th
8:35 AM-8:40 AM
3DSIG: Introduction
Room: Columbus AB
8:40 AM-9:00 AM
Improving the prediction of loops and drug binding in GPCR structure models
Room: Columbus AB
  • Bhumika Arora, Indian Institute of Technology, Monash University, and IITB-Monash Research Academy, India
  • K.V. Venkatesh, Indian Institute of Technology Bombay, India
  • Denise Wootten, Monash University, Australia
  • Patrick Sexton, Monash University, Australia

G protein-coupled receptors (GPCRs) form the largest group of potential drug targets, and therefore knowledge of their three-dimensional structure is important for rational drug design. Homology modeling serves as a common approach for modeling the transmembrane helical cores of GPCRs; however, these models have varying degrees of inaccuracy that result from the quality of the template used. We have explored the extent to which inaccuracies inherent in homology models of the transmembrane helical cores of GPCRs can impact loop prediction. We found that loop prediction in GPCR models is much more difficult than loop reconstruction in crystal structures, owing to the imprecise positioning of loop anchors. Therefore, minimizing the errors in loop anchors is likely to be critical for optimal GPCR structure prediction. To address this, we have developed a ligand directed modeling (LDM) method comprising geometric protein sampling and ligand docking, and evaluated its capacity to refine GPCR models built across a range of templates with varying degrees of sequence similarity to the target. The LDM reduced the errors in loop anchor positions and improved the prediction of ligand binding poses, resulting in much better performance of these models in virtual library screenings.

9:00 AM-9:20 AM
OSPREY 3.0: Open-Source Protein Redesign for You, with Powerful New Features
Room: Columbus AB
  • Jeffrey W. Martin, Department of Computer Science, Duke University, United States
  • Anna U. Lowegard, Program in Computational Biology and Bioinformatics, Duke University, United States
  • Marcel S. Frenkel, Duke University, United States
  • Mark A. Hallen, Toyota Technological Institute at Chicago, United States
  • Adegoke Ojewole, Program in Computational Biology and Bioinformatics, Duke University, United States
  • Jonathan D. Jou, Duke University, United States
  • Siyu Wang, Program in Computational Biology and Bioinformatics, Duke University, United States
  • Graham T. Holt, Program in Computational Biology and Bioinformatics, Duke University, United States
  • Bruce R. Donald, Duke University, United States

Computational protein design (CPD) holds great promise as a novel and ever more important tool in drug development and enzyme design. The Donald lab has shown that CPD can be applied to develop new drugs, change the specificity of enzymes, enhance the potency and breadth of antibodies, and predict resistance mutations to new drugs. We present OSPREY 3.0, a new and greatly improved release of the OSPREY protein design software. OSPREY 3.0 features a convenient new Python interface, which greatly improves its ease of use. It is over two orders of magnitude faster than previous versions of OSPREY when running the same algorithms on the same hardware. Moreover, OSPREY 3.0 includes several new algorithms, which introduce substantial speedups as well as improved biophysical modeling. It also includes GPU support, which provides an additional speedup of over an order of magnitude. Like previous versions of OSPREY, OSPREY 3.0 offers a unique package of advantages over other design software, including provable design algorithms that account for continuous flexibility during design and model conformational entropy. Finally, we show here empirically that OSPREY 3.0 accurately predicts the effect of mutations on protein-protein binding. OSPREY 3.0 is available at https://www2.cs.duke.edu/donaldlab/osprey.php as open-source software.

9:20 AM-9:40 AM
iCFN: an efficient exact algorithm for multistate protein design
Room: Columbus AB
  • Mostafa Karimi, Texas A&M University, United States
  • Yang Shen, Texas A&M University, United States

Motivation: Multistate computational protein design (CPD) simultaneously considers positive and negative objectives corresponding to various protein states (e.g. oligomerization) and substates (e.g. conformation). Exact algorithms can guarantee the optimal solutions and thus enable a direct test of mechanistic hypotheses behind models. However, efficient exact algorithms are lacking for multistate CPD.

Methods and results: We have developed an efficient exact algorithm called interconnected cost function networks (iCFN) for a generic formulation of multistate CPD. iCFN treats each substate design as a weighted constraint satisfaction problem (WCSP) modeled through a cost function network; and it solves the coupled WCSPs using novel bounds and a depth-first branch-and-bound search over a hierarchical tree of solutions. When iCFN is applied to specificity design of a T-cell receptor, a problem of unprecedented size to exact methods, it drastically reduces search space and running time to make the problem tractable. Moreover, iCFN generates experimentally-agreeing receptor designs with improved accuracy compared to state-of-the-art methods, highlights the importance of modeling backbone flexibility in protein design, and reveals molecular mechanisms underlying binding specificity.

Significance: To the best of our knowledge, this is the first exact algorithm that makes large-scale multistate CPD problems computationally tractable.

9:40 AM-10:15 AM
Coffee Break
10:15 AM-10:20 AM
3DSIG: Introduction
Room: Columbus AB
10:20 AM-11:20 AM
Towards Better Reproducibility (Discussion led by Philip E. Bourne)
Room: Columbus AB
  • Philip E. Bourne, School of Medicine, University of Virginia, United States
11:20 AM-11:40 AM
High throughput analysis of allostery through propagation of rigidity
Room: Columbus AB
  • Adnan Sljoka, Kwansei Gakuin University, Japan

Presentation Overview: Show

Allostery can be viewed as the effect of binding at one site of a protein on a second, often distant, functional site, enabling regulation of protein function. Despite its importance, the molecular mechanisms that give rise to allostery are still poorly understood. We have recently developed the rigidity-transmission allostery (RTA) algorithm, an extremely fast computational method based on mathematical algorithms from rigidity theory. The RTA algorithm provides a mechanical interpretation of allosteric signaling: it predicts whether a mechanical perturbation of rigidity at one site (mimicking ligand binding) can propagate across the protein structure and change the conformational degrees of freedom at a second, distant site, resulting in allosteric transmission. In this talk, we will illustrate our method, the identification of novel allosteric sites, and a detailed mapping of allosteric pathways, which are experimentally validated with NMR studies on three different classes of proteins: GPCRs [Nature Communications 2018], fluoroacetate dehalogenase [Science 2017], eukaryotic translation initiation factor eIF4E, and others. The RTA method is computationally very efficient (it takes minutes of computational time on a standard PC) and can scan many unknown sites for allosteric communication, identifying potential new allosteric sites.

11:40 AM-12:00 PM
ProBiSdock: flexible docking using existing knowledge from the Protein Data Bank
Room: Columbus AB
  • Janez Konc, National Institute of Chemistry, Slovenia
  • Dusanka Janezic, UP FAMNIT, Slovenia

Presentation Overview: Show

The co-crystallized ligands in the Protein Data Bank (PDB) represent a great quantity of information about protein binding sites. However, this information is not used explicitly in existing docking algorithms. We developed ProBiSdock, a docking algorithm that scores docked poses using a new scoring function in which a pose's score depends on its overlap with the atom field of existing ligands. This field is generated specifically for each query protein from co-crystallized ligands in other protein structures in the PDB, transposed to the query protein using the ProBiS binding site alignment algorithm. To account for conformational changes in the protein upon ligand binding, both compounds and proteins are treated as flexible. ProBiSdock enables fast docking of large databases containing millions of compounds and has been successfully validated on the DUD-E benchmark. It has already been used to perform proteome-scale docking as well as to discover new, experimentally confirmed inhibitors of the IDO-1 enzyme, an attractive target in cancer therapy. ProBiSdock enables researchers to quickly search for new active compounds or, inversely, for new target proteins of existing drugs, taking into account knowledge in the PDB, and has been successfully validated in silico and in vitro.

12:00 PM-12:20 PM
Energetic conflicts in catalytic sites of protein enzymes
Room: Columbus AB
  • Maria I. Freiberger, Protein Physiology Laboratory, Buenos Aires University, Argentina
  • A. Brenda Guzovsky, Protein Physiology Laboratory, Buenos Aires University, Argentina
  • Diego U. Ferreiro, Protein Physiology Laboratory, Buenos Aires University, Argentina
  • R. Gonzalo Parra, Quantitative and Computational Biology Group, Max Planck Institute for Biophysical Chemistry, Germany

Presentation Overview: Show

Introduction
Natural proteins spontaneously fold by globally minimizing their internal conflicts. However, 10-15% of their residue-residue interactions are in strong energetic conflict, or “highly frustrated”. Such frustration sculpts protein dynamics for specific functions such as protein binding, protein-protein interactions and allosterism. Enzymatic reaction rates strongly depend on precise and conserved arrangements that bring together in space residues that would otherwise adopt different interactions. Hence, it is expected that natural enzymes have locally frustrated catalytic sites.

Results
We analyzed all enzymes with experimentally assigned catalytic residues from the Catalytic Site Atlas (926 non-redundant structures). We studied frustration patterns around catalytic residues, accounting for their oligomeric state, catalytic mechanisms and structural architectures. Catalytic residues are shown to be generally in conflict with their surroundings. Moreover, when analyzing protein families, most residues related to enzymatic activity form conserved networks of highly frustrated interactions.

Concluding Remarks
Highly frustrated active sites constitute a general characteristic of protein enzymes. Comparison of the differences in the evolutionary frustration patterns of related protein families will help to trace back and detect the emergence of family-specific energetic conflicts imprinted by functional requirements. Additionally, understanding the functional implications of frustrated interactions of protein enzymes will help to improve enzyme engineering strategies.

12:20 PM-12:40 PM
The Impact of Conformational Entropy on the Accuracy of the Molecular Docking Software FlexAID in Binding Mode Prediction
Room: Columbus AB
  • Louis-Philippe Morency, University of Montreal, Canada
  • Rafael Najmanovich, University of Montreal, Canada

Presentation Overview: Show

Here we introduce the latest version of Flexible Artificial Intelligence Docking (FlexAID) that allows its scoring function to consider the conformational entropy of ligands in complex with their biological targets. We present the impact of FlexAID’s newest feature on its accuracy in binding mode prediction using three increasingly complex scenarios: the Astex Diverse Set, the Astex Non Native Set and the HAP2 dataset. We show that FlexAID outperforms other open-source molecular docking methods when molecular flexibility is crucial. The improved accuracy of FlexAID on complex cases, the addition of novel features, i.e. the conformational entropy, its accessibility and its easy-to-use graphical user interface suggest that FlexAID is in an interesting position to tackle biologically challenging and pharmacologically relevant situations currently ignored by other methods.

FlexAID is available as source code, as a command-line pre-compiled executable (available at http://biophys.umontreal.ca/nrg for Windows, macOS & Linux) or through the NRGsuite, a PyMOL integrated user interface allowing the user to use FlexAID in an intuitive manner with real time visualization. Both the NRGsuite and FlexAID are distributed as open-source software.

12:40 PM-2:00 PM
Lunch Break
2:00 PM-3:00 PM
Collaborative structural biology using machine learning and Jupyter notebook
Room: Columbus AB
  • Fergus Boyles, University of Oxford, United Kingdom
  • Fergus Imrie, University of Oxford, United Kingdom
3:00 PM-3:20 PM
Proceedings Presentation: A novel methodology on distributed representations of proteins using their interacting ligands
Room: Columbus AB
  • Hakime Öztürk, Boğaziçi University, Turkey
  • Elif Ozkirimli, Boğaziçi University, Turkey
  • Arzucan Ozgur, Boğaziçi University, Turkey

Presentation Overview: Show

Motivation: The effective representation of proteins is a crucial task that directly affects performance on many bioinformatics problems. Related proteins usually bind to similar ligands. Chemical characteristics of ligands are known to capture the functional and mechanistic properties of proteins, suggesting that a ligand-based approach can be utilized in protein representation.

Methods: In this study, we propose SMILESVec, a SMILES-based method to represent ligands, and a novel method to compute the similarity of proteins by describing them based on their ligands. The proteins are defined utilizing the word embeddings of the SMILES strings of their ligands. The performance of the proposed protein description method is evaluated in a protein clustering task using the TransClust and MCL algorithms. Two other protein representation methods that utilize protein sequence, BLAST and ProtVec, and two compound-fingerprint-based protein representation methods are compared.

Results: We showed that ligand-based protein representation, which uses only SMILES strings of the ligands that proteins bind to, performs as well as protein-sequence-based representation methods in protein clustering. The results suggest that ligand-based protein description can be an alternative to the traditional sequence- or structure-based representation of proteins, and that this novel approach can be applied to different bioinformatics problems such as prediction of new protein-ligand interactions and protein function annotation.
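
The general idea of describing a protein through the embeddings of its ligands' SMILES strings can be sketched as follows. This is an illustrative pipeline under stated assumptions, not the authors' implementation: the splitting of SMILES into overlapping 8-character "words" and the averaging steps are assumptions, and `toy_embedding` is a hash-based placeholder standing in for embeddings that would be learned from a large SMILES corpus. All function names are hypothetical.

```python
import hashlib
import math

def smiles_words(smiles, k=8):
    """Split a SMILES string into overlapping k-character 'words' (assumed scheme)."""
    if len(smiles) <= k:
        return [smiles]
    return [smiles[i:i + k] for i in range(len(smiles) - k + 1)]

def toy_embedding(word, dim=16):
    """Placeholder for a trained word embedding: a deterministic vector from a hash."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [(byte / 255.0) * 2.0 - 1.0 for byte in digest[:dim]]

def mean_vector(vectors):
    """Component-wise average of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def ligand_vector(smiles, k=8):
    """Ligand representation: average of the embeddings of its SMILES words."""
    return mean_vector([toy_embedding(w) for w in smiles_words(smiles, k)])

def protein_vector(ligand_smiles, k=8):
    """Protein representation: average of the vectors of the ligands it binds."""
    return mean_vector([ligand_vector(s, k) for s in ligand_smiles])

def cosine_similarity(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Two hypothetical proteins described only by the SMILES of their ligands.
protein_a = protein_vector(["CC(=O)OC1=CC=CC=C1C(=O)O", "CC(=O)NC1=CC=C(O)C=C1"])
protein_b = protein_vector(["CC(=O)OC1=CC=CC=C1C(=O)O", "CN1C=NC2=C1C(=O)N(C)C(=O)N2C"])
print(cosine_similarity(protein_a, protein_b))
```

A pairwise similarity matrix built this way could then be passed to a clustering algorithm; with the placeholder embeddings the printed value carries no chemical meaning and only demonstrates the data flow.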

3:20 PM-3:40 PM
FoldX accurate biomolecular binding prediction using PADA1 (Protein Assisted DNA Assembly v1)
Room: Columbus AB
  • Leandro Radusky, CRG, Spain
  • Javier Delgado, CRG, Spain
  • Hector Climente-González, CRG, Spain
  • Luis Serrano, CRG, Spain

Presentation Overview: Show

In this work we present PADA1, a generic algorithm that accurately models structural complexes and predicts the interaction regions of resolved protein structures. PADA1 relies on a library of protein and interacting biomolecular fragment pairs obtained from training sets of deposited complexes. It includes a fast statistical force field, computed from atom-atom distances, to evaluate and filter the 3D docking models. Using published validation sets, we predicted the binding regions with an RMSD of <1.8 Å per residue in >95% of the cases. We show that the quality of the docked templates is compatible with the FoldX protein design tool suite, which identifies the crystallized DNA/RNA/protein molecule sequence as the most energetically favorable in 80% of the cases. We highlight the biological potential of PADA1 by reconstituting conformational changes upon protein mutagenesis for a variety of protein-DNA/RNA/protein complexes, and by predicting binding regions and protein/nucleotide sequences in proteins crystallized without a partner. These results open up new perspectives for the engineering of biomolecular interfaces.
So far, the algorithm has been published only in its DNA version; here we present the updated version, valid for any kind of biomolecular complex.

3:40 PM-4:00 PM
Investigating the molecular determinants of ebolavirus pathogenicity
Room: Columbus AB
  • Henry Martell, The University of Kent, United Kingdom
  • Morena Pappalardo, The University of Kent, United Kingdom
  • Stuart Masterson, The University of Kent, United Kingdom
  • Franca Fraternali, King's College London, United Kingdom
  • Martin Michaelis, The University of Kent, United Kingdom
  • Mark Wass, The University of Kent, United Kingdom

Presentation Overview: Show

The West Africa Ebola virus outbreak killed thousands of people. Using sequencing data combined with detailed structural analysis and experimental data, we compared the genomes of Reston virus, the only Ebolavirus species not pathogenic in humans, with those of the other four Ebolavirus species. Here we present a significant update of this analysis, using nearly 1500 Ebolavirus genome sequences, compared to 196 in our original analysis. The number of specificity determining positions (SDPs) that are differentially conserved between the two groups and that may act as molecular determinants of pathogenicity decreases from 180 to 165. The large overlap of SDPs between the two datasets (73%) demonstrates the robustness of our approach and its ability to obtain reliable results with a limited number of genome sequences. The updated analysis gives greater confidence that the SDPs present in the protein VP24 are likely to impair binding to human karyopherin alpha proteins and prevent inhibition of interferon signaling in response to infection.

4:20 PM-4:40 PM
The breadth of HIV broadly neutralizing antibodies depends on how they engage key epitope sites
Room: Columbus AB
  • Hongjun Bai, WRAIR, Henry M. Jackson Foundation for the Advancement of Military Medicine, United States
  • Merlin Robb, WRAIR, Henry M. Jackson Foundation for the Advancement of Military Medicine, United States
  • Nelson Michael, U.S. Military HIV Research Program, WRAIR, United States
  • Morgane Rolland, WRAIR, Henry M. Jackson Foundation for the Advancement of Military Medicine, United States

Presentation Overview: Show

Better characterizing the relationship between HIV-1 Env diversity and the breadth of broadly neutralizing antibodies (bnAbs) could reveal key knowledge for the development of effective HIV-1 vaccines. We proposed and tested several methods to quantify the diversity of HIV-1 epitopes. Our results highlighted that epitopes of bnAbs with broader neutralization spectra were not necessarily more conserved based on standard sequence diversity measurements. We found that the diversity of the top nine epitope sites explained half of the difference in neutralization breadth across bnAbs (Spearman’s rho = -0.74, p = 6e-7). These results illustrate how the broadest antibodies target their epitopes: they focus on the most conserved sites, thereby achieving cross-reactivity with heterologous Env proteins. These findings support vaccine strategies focusing on conserved elements of the virus.

The views expressed are those of the authors and should not be construed to represent the positions of the U.S. Army or the Department of Defense.
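
One way to read the statement that epitope-site diversity "explained half of the difference" in breadth is through the squared rank correlation. This reading is an assumption, since the abstract does not say how the figure was derived:

```latex
\rho^2 = (-0.74)^2 \approx 0.55
```

Under that reading, roughly 55% of the variance in the ranks of neutralization breadth is associated with the ranks of epitope-site diversity, and the negative sign of rho indicates that lower diversity goes with broader neutralization.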

4:40 PM-5:00 PM
Coffee Break (on the go) to Closing Keynote