Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide




Session A: (July 22 and July 23)
Session B: (July 24 and July 25)

Presentation Schedule for July 22, 6:00 pm – 8:00 pm

Presentation Schedule for July 23, 6:00 pm – 8:00 pm

Presentation Schedule for July 24, 6:00 pm – 8:00 pm

Session A Poster Set-up and Dismantle
Session A Posters set up: Monday, July 22 between 7:30 am - 10:00 am
Session A Posters should be removed at 8:00 pm, Tuesday, July 23.

Session B Poster Set-up and Dismantle
Session B Posters set up: Wednesday, July 24 between 7:30 am - 10:00 am
Session B Posters should be removed at 2:00 pm, Thursday, July 25.

A-001: Chiral Graphs as Reduced Representations of Ligand Scaffolds for Stereoselective Drug Discovery and Enhanced Exploration of Chemical Scaffolds Space
  • Simoun Mikhael, CSUN, United States
  • Ravi Abrol, California State University Northridge, United States

Short Abstract: Rational structure-based drug design relies on a detailed atomic level understanding of the protein-ligand interactions. The chiral nature of drug binding sites in proteins has led to the discovery of predominantly chiral drugs. Mechanistic understanding of stereoselectivity (which governs how one stereoisomer of a drug might bind stronger than the others to a protein) depends on the topology of stereocenters in the chiral molecule. Chiral graphs and reduced chiral graphs are new topological representations of chiral ligands that are introduced here, utilizing graph theory, to facilitate a detailed understanding of chiral recognition of ligands/drugs by proteins. These representations are demonstrated by application to all ~14,000+ chiral ligands in the protein data bank (PDB) [1], which will facilitate an understanding of protein-ligand stereoselectivity mechanisms. Ligand modifications during drug development can be easily incorporated into these chiral graphs. In addition, these chiral graphs present an efficient tool for a deep dive into the enormous chemical space to sample unexplored structural scaffolds. [1] S. Mikhael and R. Abrol (2019) ChemMedChem (In press)

A-002: SuCOS: a pharmacophoric-shape overlap metric for comparing binding modes
  • Susan Leung, University of Oxford, United Kingdom
  • Mike Bodkin, Evotec, United Kingdom
  • Frank von Delft, Diamond Light Source, United Kingdom
  • Paul Brennan, University of Oxford, United Kingdom

Short Abstract: One of the fundamental assumptions of hit-to-lead fragment-based drug discovery is that the fragment’s binding mode will be structurally conserved upon synthetic elaboration. The most common way of quantifying binding mode similarity is Root Mean Square Deviation (RMSD), but Protein Ligand Interaction Fingerprint (PLIF) similarity and shape-based metrics are sometimes used. We present SuCOS, an open-source RDKit-based implementation of Malhotra and Karanicolas’ combined overlap score (COS). SuCOS has a Pearson correlation coefficient with COS of 0.93. We explore the strengths and weaknesses of RMSD, PLIF similarity, and SuCOS on a dataset of X-ray crystal structures of paired elaborated larger and smaller molecules bound to the same protein. We show that combined shape and 3D-pharmacophoric-based metrics like SuCOS are superior to RMSD when comparing an elaborated fragment (larger molecule) with its original fragment hit counterpart (smaller molecule). When the molecules are identical, such as in redocking, the threshold of 2 Å RMSD is widely used. However, this often disregards the size of the molecules being compared. The SuCOS score ranges from 0 to 1, regardless of molecular size, and is therefore suitable for defining a more universal threshold. SuCOS also has potential applications in ligand-based and implicit structure-based virtual screening.
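For readers unfamiliar with the RMSD baseline that this abstract compares against, the following is a minimal sketch, not the SuCOS implementation. It assumes both poses are already in the same coordinate frame and that atoms are listed in matching order; the coordinates are invented for illustration. RMSD needs this one-to-one atom pairing, which is straightforward only when the two molecules are identical, as in redocking.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical three-atom poses (in Angstrom): the second is the first shifted by 1 A in z.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
redocked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]

value = rmsd(reference, redocked)
within_threshold = value <= 2.0  # the widely used 2 A redocking cut-off
```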

A-003: Energetic conflicts in catalytic sites of protein enzymes
  • Maria Freiberger, Protein Physiology Laboratory - FCEyN - Universidad de Buenos Aires, Argentina
  • A. Brenda Guzovsky, Protein Physiology Laboratory - FCEyN - Universidad de Buenos Aires, Argentina
  • Diego M. Luna, Facultad de Ingenieria, Universidad Nacional de Entre Rios, Argentina
  • Peter Wolynes, Center for Theoretical Biological Physics, Rice University, United States
  • Diego Ferreiro, Protein Physiology Laboratory - FCEyN - Universidad de Buenos Aires, Argentina
  • R. Gonzalo Parra, Genome Biology Unit, European Molecular Biology Laboratory, Germany

Short Abstract: Introduction: While proteins fold, strong energetic conflicts are minimized towards their native states according to the principle of minimal frustration. Local violations of this principle allow proteins to encode the complex energy landscapes required for active biological functions. Enzymatic reaction rates strongly depend on precise and conserved arrangements that bring together in space residues that would otherwise adopt different interactions. Hence, catalytic sites are expected to be locally frustrated. Results: We quantified local energetic conflicts of all protein enzymes with known structures and experimentally annotated catalytic residues. Catalytic sites are indeed highly frustrated in extant enzymes, regardless of protein oligomeric state, topology, and enzymatic class. We show that, in the context of protein families, frustration at catalytic sites is more evolutionarily conserved than the primary structure itself (Freiberger et al., PNAS 2019). We will also discuss the appearance of specific frustration patterns along the evolutionary history of protein superfamilies. Concluding Remarks: Highly frustrated active sites constitute a general characteristic of protein enzymes. Comparisons in the context of related protein families help to study the emergence of family-specific energetic conflicts imprinted by functional requirements. Understanding the functional implications of frustrated interactions in protein enzymes will help to improve enzyme engineering strategies.

A-004: Ligand binding site structure shapes allosteric signal transduction and the evolution of allostery in protein complexes.
  • Gyorgy Abrusan, MRC HGU, University of Edinburgh, United Kingdom
  • Joseph Marsh, The University of Edinburgh, United Kingdom

Short Abstract: The structure of ligand binding sites has been shown to profoundly influence the evolution of function in homomeric protein complexes. Complexes with multi-chain binding sites (MBSs) have more conserved quaternary structure, more similar binding sites and ligands between homologues, and evolve new functions slower than homomers with single-chain binding sites (SBSs). Here, using in silico analyses of protein dynamics, we investigate whether ligand binding-site structure shapes allosteric signal transduction pathways (STPs), and whether the structural similarity of binding sites influences the evolution of allostery. Our analyses show that: 1) allostery is more frequent among MBS complexes than in SBS complexes, particularly in homomers; 2) in MBS homomers, semi-rigid communities and critical residues frequently connect interfaces, and thus they are characterized by STPs that cross protein-protein interfaces, while SBS homomers usually are not; 3) ligand binding alters community structure differently in MBS and SBS homomers; 4) except for MBS homomers, allosteric proteins are more likely to have homologs with similar binding sites than non-allosteric proteins, suggesting that binding site similarity is an important factor driving the evolution of allostery.

A-005: Computational Analysis of Atomic Trajectories in Domains of Proteins Causing Lung Cancer
  • Rizwan Qureshi, City University of Hong Kong, Hong Kong

Short Abstract: Non-small cell lung cancer (NSCLC) is a major cause of death worldwide. About 80-85% of lung cancer cases are NSCLC. It is well known that mutation of the epidermal growth factor receptor (EGFR, a cancer-related protein) may lead to NSCLC. In this work, we perform MD simulations for wild-type EGFR, EGFR with the L858R mutation, and EGFR with the L858R and T790M mutations. We then treat the atom trajectories of EGFR and its mutants as non-stationary time signals and apply an autoregressive integrated (ARI) model to these signals. The power spectral density for each type is calculated. For further analysis, we divide the complete structure of EGFR and its mutants into 8 domains based on prior experimental knowledge. Dynamic time warping is used to analyze the similarity between each domain of the structures. The simulation results give useful insight into the conformational dynamics of EGFR: the mutation distorts the motion of atoms and destabilizes the protein. The domains are less correlated in the L858R type, and the correlation is even weaker when the second mutation occurs. Overall, these findings will be helpful in computer-aided drug discovery for NSCLC treatment.
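The dynamic time warping step mentioned in this abstract can be illustrated with the textbook algorithm. This is only a sketch under stated assumptions: the abstract does not say which local cost or which per-domain signal the author used, so the absolute-difference cost and the toy series below are hypothetical.

```python
def dtw_distance(series_a, series_b):
    """Classic dynamic time warping distance with an absolute-difference local cost."""
    n, m = len(series_a), len(series_b)
    inf = float("inf")
    # cost[i][j] is the minimal cumulative cost of aligning the first i points
    # of series_a with the first j points of series_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(series_a[i - 1] - series_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat a point of series_b
                                     cost[i][j - 1],      # repeat a point of series_a
                                     cost[i - 1][j - 1])  # advance both series
    return cost[n][m]

# Hypothetical one-dimensional signals standing in for per-domain trajectories.
wild_type = [0.0, 1.0, 2.0, 1.0, 0.0]
delayed = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]   # same shape, one repeated starting sample
mutant = [0.0, 2.0, 4.0, 2.0, 0.0]         # same timing, doubled amplitude

same_shape = dtw_distance(wild_type, delayed)
different_amplitude = dtw_distance(wild_type, mutant)
```

Because DTW allows points to be repeated, a signal that differs only by a repeated sample aligns at zero cost, whereas a change in amplitude cannot be warped away.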


Short Abstract: Elapid envenomations are considered as dangerous as viperid envenomations despite their low reported rate. Elapid venom contains ligands with high affinity for cholinergic receptors on cell membranes, called three-finger toxins (3FTxs), whose neurotoxic activity may cause death in less than 30 minutes. The increasing number of cases, due to the absence of an effective antivenom, leads us to search for new technologies that allow serum production schemes to be optimized. This can be achieved by applying bioinformatics tools in the field of venomics and toxinology. In the present work, we model a neurotoxic peptide (SWISSMODEL) using conserved orthologous sequences, COS (MEGA7/CHIMERA1.13r), of different 3FTxs from elapids worldwide (NCBI/PDB), and then analyze and predict its docking in silico (CLUSPRO2.0) with six different cholinergic receptors. Taking into consideration cluster coefficients, the number of hydrogen bonds formed, and synthesis viability (presence or absence of disulfide bridges), the optimal ligand was chosen. This could be a breakthrough for the health sector, enabling the development of antibodies capable of blocking elapid venom components, with shorter production and development times than the classical route.

A-007: Structure-based drug repositioning uncovers a well-known cancer drug as B-cells inactivator
  • Melissa F. Adasme, Biotechnology Center TU Dresden, Germany

Short Abstract: Drug repositioning aims to identify new indications for known drugs. With the growth of 3D structures of drug-target complexes, it is today possible to study drug promiscuity at the structural level and to screen vast amounts of drug-target interactions to predict side effects, polypharmacological potential, and repositioning opportunities. Here, we developed a structure-based drug repositioning approach, which extends the scope of the search to novel chemical scaffolds by exploiting the binding mode similarities between drugs. We applied this approach to identify drugs inactivating B-cells, whose dysregulation can function as a driver of autoimmune diseases. As an initial step, an RNAi screening over 500 kinases identified 22 proteins whose knockout impeded the activation of B-cells. Our drug repositioning approach was applied to those targets’ structures, revealing a well-known cancer drug as a micromolar inhibitor. The repositioning is explained through a specific pattern of noncovalent interactions shared between the original and predicted target. The novel inhibitor was finally validated, showing a very high therapeutic and selectivity index in B-cell inactivation. Overall, the repositioning approach was able to predict these findings at a fraction of the time and cost of a conventional screen.

A-008: Rational Design of Protein Dynamics
  • David Bednář, Loschmidt Laboratories, Czechia
  • Jiri Damborsky, Loschmidt Laboratories, Masaryk University; International Clinical Research Center, Czechia

Short Abstract: Deciphering structural interrelations and constraints within proteins and their dynamics is key to understanding their function and evolution. This is particularly valid for enzymes, which are singularly complex in terms of function, structure and dynamics. Appraising structural fluctuations in proteins is still challenging due to intrinsic limitations of experimental methods, and yet computational techniques can help surmount such hindrances. Rational protein design aims to exploit the structure-function relationships for tailoring different aspects of enzymatic activity. Due to their lesser evolutionary constraints and larger distance to the catalytic center, loops (particularly dynamic aperiodic regions flanked by regular secondary structures) have been specifically targeted by recent design efforts. However, loop design approaches still rely more on empirical sampling than on rational design, hinting at the need for wider quantitative knowledge about loop flexibility. A remarkably challenging task in loop design is transferring some desired property between two proteins by means of loop grafting. A successful loop transplant requires precise geometric overlay of the target structure and meeting dynamical requirements for the engineered property. To address this problem, we are developing a computational framework to compare loop geometry and dynamics in different proteins, which will be applicable to a wide range of protein families.

  • Tarcisio Melo, Universidade Estadual do Sudoeste da Bahia, Brazil
  • Bruno Silva Andrade, Universidade Estadual do Sudoeste da Bahia, Brazil

Short Abstract: Cruzaine is an enzyme of the papain family, which has cysteine protease activity. This group of enzymes performs catalytic hydrolysis of peptides. In diseases caused by parasites of the genus Leishmania, it acts in the replication and virulence of the parasite through the process of rupturing the membrane of the infected cell. Since there was no crystallographic structure of this protein, we constructed the Leishmania braziliensis Cruzaine by a homology modelling approach, using Modeller9.19 software. Furthermore, the built models were subjected to energy minimization and molecular dynamics using the AMBER14 package. Subsequently, we carried out virtual screening of possible new inhibitors using a pharmacophore method, and docked known inhibitors with described leishmanicidal activity. In virtual screening, PharmaGist (http://bioinfo3d.cs.tau.ac.il/PharmaGist/php.php) generated the phagemid sdf file, and ZincPharmer (http://zincpharmer.csb.pitt.edu/pharmer.html) returned 1775 pharmacophore-like molecules. Additionally, we used 30 known inhibitors. Molecular docking was performed with all molecules using the AutoDock Vina program, ranked by its scoring function. The best 30 ligands were selected, with affinity energies below -8.5 kcal/mol, and the interaction maps with the catalytic site were generated in PyMOL 2.1.1. The next step of this work is to perform molecular dynamics of the 10 best complexes.

A-010: Acceptable (ϕ,ψ) outliers and position-dependent steric maps derived using bond parameter specific Ramachandran analysis
  • Ashraya Ravikumar, Indian Institute of Science, India
  • Chandrasekharan Ramakrishnan, Indian Institute of Science, India
  • Narayanaswamy Srinivasan, Indian Institute of Science, India

Short Abstract: Backbone bond lengths and angles in proteins and peptides are major determinants of atomic positions. Though they are expected to be very close to ideal values, analysis of several small molecule crystal structures shows that minor deviations in bond geometry have a significant effect on the acceptable (ϕ,ψ) values of two-linked peptide units. These effects are best represented as bond-geometry specific Ramachandran steric maps, where acceptable (ϕ,ψ) values are the ones with no steric clash within the two-linked peptide unit. Nearly 2000 and 200,000 residue-wise bond-geometry dependent Ramachandran maps have been generated for peptides and proteins, respectively. Apart from showing the acceptable (ϕ,ψ) combinations at a residue position, they also show the accessible conformational space for a given residue. This method has already been used in studying steric constraints due to backbone amide substitutions in β turns (Lahiri et al., 2018). The bond-geometry dependent maps are primarily used to analyze the acceptability of MolProbity outlier (ϕ,ψ) points for high-resolution protein structures where bond geometry is deemed reliable. For validation of lower resolution structures, an ensemble map has been generated by putting together the peptide bond-geometry dependent Ramachandran maps, where for every (ϕ,ψ) value, the probability of it being steric clash-free is provided.
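As background for the (ϕ,ψ) values discussed in this abstract, the sketch below computes a torsion angle from four atom positions using the standard projection formula. By the usual definition, ϕ is the torsion over C(i-1), N(i), CA(i), C(i) and ψ the torsion over N(i), CA(i), C(i), N(i+1). The function name and toy coordinates are illustrative assumptions; the steric-clash test and the bond-geometry variation described by the authors are not reproduced here.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond for four atom positions."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Toy geometries: outer atoms on the same side (cis) and on opposite sides (trans).
cis = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
```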

A-011: VoroMQA web server for assessing three-dimensional structures of proteins and protein complexes
  • Kliment Olechnovic, Vilnius University, Lithuania
  • Ceslovas Venclovas, Vilnius University, Lithuania

Short Abstract: We present the VoroMQA (Voronoi tessellation-based Model Quality Assessment) web server that is dedicated to the estimation of protein structure quality, a common step in selecting realistic and most accurate computational models and in validating experimental structures. As an input, the VoroMQA web server accepts one or more protein structures in PDB format. Input structures may be either monomeric proteins or multimeric protein complexes. For every input structure, the server provides both global and local (per-residue) scores. Visualization of the latter scores along the protein chain is enhanced by providing secondary structure assignment and information on solvent accessibility. A unique feature of the VoroMQA server is the ability to directly assess protein-protein interaction interfaces. If this type of assessment is requested, the web server provides interface quality scores, interface energy estimates, and local scores for residues involved in inter-chain interfaces. VoroMQA, the underlying method of the web server, was extensively tested in recent community-wide CASP and CAPRI experiments. During these experiments VoroMQA showed outstanding performance both in model selection and in estimation of accuracy of local structural regions. The VoroMQA web server is available at http://bioinformatics.ibt.lt/wtsam/voromqa.

A-012: An Integrated Molecular Modeling and Dynamic Residue Network Analysis Strategy to Identify Allosteric Modulators of Human Heat Shock Proteins
  • Arnold Amusengeri, Rhodes University, South Africa
  • Ozlem Tastan Bishop, Rhodes University, South Africa

Short Abstract: The need for next-generation anti-cancer drugs cannot be overstated. Due to resistance to multi-drug therapy, new strategies have been proposed to identify top quality leads against key human cancers’ pro-survival targets. The present study aims at combining structure-based drug design fundamentals with dynamic residue network (DRN) concepts to identify and assess allosteric regulation propensities of South African natural compounds. Using a high-throughput molecular docking technique, the previously identified allosteric sites of heat shock proteins Hsp72 and Hsc70 were screened against the South African Natural Compounds Database (SANCDB; https://sancdb.rubi.ru.ac.za/). Selected protein-hit complexes were further analyzed by molecular dynamics calculations. Discorhabdin N, a marine alkaloid commonly isolated from Latrunculid sponges, which bound the allosteric substrate binding domain (SBD) back pocket, modulated the dynamic behavior of both protein targets. Further, using DRN analysis via the MD-TASK tool kit, key allosteric communication centers within the proteins were identified. The implications of ligand binding on these signal-sensitive hotspots were determined. Our findings allowed us to discuss possible allosteric regulatory mechanisms of Discorhabdin N, and hence provide a novel approach developed by the group for allosteric drug discovery.

A-013: Identification of Novel Allosteric Modulators using in silico Approaches: A Case Study for Plasmodium falciparum Prolyl tRNA Synthetase
  • Dorothy Nyamai, Rhodes University, South Africa
  • Ozlem Tastan Bishop, Rhodes University, South Africa

Short Abstract: Even though the concept of allostery in biomolecular systems has been known for over 50 years, computational drug discovery research for allosteric sites has only recently started to attract exponentially growing interest. Here, as a case study, the allosteric site of malarial prolyl tRNA synthetase (ProRS), a member of the ubiquitous aminoacyl tRNA synthetase family involved in adding amino acids to their cognate tRNA during protein translation, was investigated. Previous studies have designed inhibitors that target the Plasmodium falciparum ProRS (PfProRS) active site, but none has gone through clinical trials as they showed high levels of toxicity to human cells. Thus, there is a need to circumvent the toxicity that results from targeting the active site; targeting the allosteric site is attractive because allosteric modulators are highly specific. A potential allosteric drug targeting site of PfProRS was identified using FTMap and SiteMap. A total of five South African natural compounds were identified as potential hits that selectively bind the PfProRS allosteric site but not the human homolog. Further studies of protein-allosteric modulator complexes were performed through molecular dynamics, principal component analysis, free energy landscapes and dynamic residue networks (DRN). The results and hit scaffolds can be used as a starting point for antimalarial inhibitor development.

A-014: Reliable charge parameterization of metal cofactors for the Plasmodium falciparum cytochrome b and iron-sulfur protein complex for inhibitory drug design
  • Lorna Chebon-Bore, Research Unit in Bioinformatics (RuBi)- Rhodes University, South Africa
  • Colleen Manyumwa, Research Unit in Bioinformatics (RuBi)- Rhodes University, South Africa
  • Ozlem Tastan Bishop, Rhodes University, South Africa

Short Abstract: The cytochrome bc1 (Cytbc1) complex is a homodimer consisting of 11 subunits essential in promoting electron transfer, which maintains membrane potential in eukaryotes. In Plasmodium, the cytochrome b (PfCytb) subunit has two heme groups and is a validated drug target for the inhibition of the cellular respiration metabolic pathway. The protein interacts with the iron-sulfur protein (ISP) containing a [2FE-2S] cluster. Iron metals in these co-factors have redox centers where oxidation and reduction occur to aid in catalysis. Prior to conducting inhibitory drug design, accurate force field parameters for each of these co-factors are important for molecular dynamics (MD) simulations. This work derives and validates optimal force-field parameters appropriate for the PfCytb-ISP complex. Geometry optimization and quantum mechanics/molecular mechanics calculations for force-field generation for the cofactors were done using the Amber ff14SB force field. For validation, MD simulations were performed in GROMACS at 310 K and 1 bar pressure. The root-mean-square deviations were analyzed. Force-field parameters (distances, charges and energies) were successfully generated. The MD simulations showed system stability and strong coupling between the cofactors and the coordinating histidine and cysteine residues. These parameters hold the metal ions in the coordination sphere of the hemes and the [2FE-2S] cluster, and thus they mirror the functionality of the co-factors as in their natural state.

A-015: Effective all-atom reconstruction algorithm for NARES-2P coarse-grained model of nucleic acids
  • Łukasz Golon, Faculty of Chemistry, University of Gdańsk, Poland
  • Adam Sieradzan, Faculty of Chemistry, University of Gdańsk, Poland
  • Adam Liwo, Faculty of Chemistry, University of Gdańsk, Poland

Short Abstract: While coarse-grained (CG) models have opened extensive possibilities of biomolecular simulations at length- and timescales inaccessible with all-atom (AA) representation, some information is lost in this process. To regain this information, AA reconstruction (backmapping) algorithms have been developed for some CG models. Most known backmapping algorithms for CG models of nucleic acids (NAs) are designed for relatively detailed models with 5 or more sites per nucleotide. Unlike those models, NARES-2P has only two sites of interaction per nucleotide. Although most algorithms for AA reconstruction of NAs are not well suited to the resolution of NARES-2P, the problem is analogous to reconstructing AA structures of proteins. In this work we present a new backmapping algorithm for the NARES-2P model of NAs. This algorithm was benchmarked on a set of experimentally determined NA structures by converting the NA PDB structure into the NARES-2P representation, recovering the all-atom geometry, and comparing it with the experimental one. We applied the algorithm to determine all-atom details of intermediates obtained during NARES-2P simulations of telomere stretching and other biological processes. Our method is an important step in backmapping of low resolution CG NA models. Acknowledgements: This work was supported by National Science Centre, Poland, grant no. UMO-2017/27/N/ST4/01907.

A-016: A mutation Y823D in KIT shifts dynamic equilibrium between the kinase active and inactive states thus conferring resistance to anti-cancer drugs
  • Sanjay Kumar Srikakulam, Helmholtz Institute for Pharmaceutical Research Saarland, Germany
  • Tomas Bastys, Max Planck Institute for Informatics, Germany
  • Olga V. Kalinina, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Germany

Short Abstract: Tyrosine phosphorylation, a highly regulated post-translational modification, is carried out by tyrosine kinases (TKs), enzymes that are important mediators in cell signaling cascades. TKs may acquire transforming mutations leading to malignancy and can be targeted by anti-cancer drugs. In turn, TKs may acquire secondary mutations that lead to resistance against these drugs. KIT is a TK that plays a role in cell differentiation, and its deregulation leads to various types of cancer, including gastrointestinal stromal tumors, leukemia and melanoma. KIT can be targeted by a range of inhibitors. A mutation Y823D in the activation loop of KIT is known to be responsible for the loss of sensitivity to some drugs in metastatic tumors. In order to understand the impact of Y823D on the KIT conformation and dynamics, we performed a total of 32 molecular dynamics simulations, each 100 ns long, for wild-type and mutant KIT in the active and inactive state conformations. We found that Y823D affects the protein dynamics differently in the two states: in the active state, the mutation increases the protein stability, whereas in the inactive state it induces local destabilization, thus shifting the dynamic equilibrium towards the active state, altering the communication between distant regulatory regions and possibly affecting the binding of drugs.

A-017: DEEPCON: Protein Contact Prediction using Dilated Convolutional Neural Networks with Dropout
  • Badri Adhikari, University of Missouri - St. Louis, United States

Short Abstract: Exciting new opportunities have arisen to solve the protein contact prediction problem from the progress in neural networks and the availability of a large number of homologous sequences through high-throughput sequencing. In this work, we study how deep convolutional neural network methods (ConvNets) may be best designed and developed to solve this long-standing problem. With publicly available datasets, we designed and trained various ConvNet architectures. We tested several recent deep learning techniques including wide residual networks, dropouts, and dilated convolutions. We studied the improvements in the precision of medium-range and long-range contacts, and compared the performance of our best architectures with the ones used in existing state-of-the-art methods. The ConvNet architectures we propose predict contacts with significantly more precision than the architectures used in several state-of-the-art methods. For example, when trained using the DeepCov dataset consisting of 3,456 proteins and tested on PSICOV dataset of 150 proteins, our architectures achieve up to 15% higher precision when L/2 long-range contacts are evaluated.
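The "L/2 long-range" evaluation mentioned in this abstract can be sketched as follows. The function, its parameters, and the toy data are hypothetical and not taken from DEEPCON. The default minimum sequence separation of 24 residues for "long-range" follows the common CASP convention, which the abstract does not state explicitly; the toy example lowers it so that a short chain can be used.

```python
def top_k_long_range_precision(predicted, true_contacts, seq_len, min_sep=24, divisor=2):
    """Precision of the top L/divisor predicted long-range contacts.

    predicted: dict mapping residue pairs (i, j) with i < j to a predicted probability.
    true_contacts: set of (i, j) pairs with i < j that are contacts in the native structure.
    """
    # Keep only pairs whose sequence separation qualifies as long-range.
    long_range = [(prob, pair) for pair, prob in predicted.items()
                  if pair[1] - pair[0] >= min_sep]
    long_range.sort(reverse=True)          # most confident predictions first
    k = max(1, seq_len // divisor)
    top = [pair for _, pair in long_range[:k]]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in true_contacts)
    return hits / len(top)

# Toy chain of 8 residues; min_sep is lowered to 3 purely for illustration.
toy_predicted = {(0, 5): 0.9, (1, 7): 0.8, (2, 6): 0.7,
                 (0, 3): 0.6, (0, 7): 0.2, (3, 4): 0.99}
toy_true = {(0, 5), (2, 6), (3, 4), (0, 7)}

toy_precision = top_k_long_range_precision(toy_predicted, toy_true, seq_len=8, min_sep=3)
```

In the toy case the pair (3, 4) is ignored despite its high score because it is too close in sequence, and two of the four top-ranked long-range pairs are true contacts.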

A-018: How effectively can one assess the structural impact of a missense variant in both experimental and predicted protein structures?
  • Michael Sternberg, Imperial College London, United Kingdom
  • Sirawit Ittisoponpisan, Imperial College London, United Kingdom
  • Tarun Khanna, Imperial College London, United Kingdom
  • Lawrence Kelley, Imperial College London, United Kingdom
  • Suhail A Islam, Imperial College London, United Kingdom
  • Devlina Chakravarty, University of Kansas, United States
  • Petras Kundrotas, University of Kansas, United States
  • Ilya Vakser, The University of Kansas, United States
  • Alessia David, Imperial College London, United Kingdom

Short Abstract: A missense variant can disrupt the stability of a protein with consequential disease association. However, only ~17% of residues in the human proteome are covered by an experimental PDB structure, whilst structural models can be generated for an additional ~35%. It is important therefore that any structure-based evaluation of missense variants can be applied to, and has been benchmarked on, both experimental and template-modelled structures. This motivated our development of Missense3D. The SCWRL4 algorithm remodels the local environment of the variant side-chain, keeping the main-chain fixed. We mapped Humsavar, ClinVar and ExAC variants with their neutral or disease classifications onto 606 protein structures. 40% of the 1,965 variants associated with disease are identified by Missense3D as having a destabilising impact, whilst only 11% of the neutral variants had an impact. We show that these results are almost the same in predicted structures, even those based on a template in the 30-40% identity range. A Missense3D web site is available. Recent work on extending Missense3D to experimental and predicted complexes and the development of a version for membrane-spanning proteins will be reported.

A-019: FireProtASR: an Automated Workflow for Ancestral Sequence Reconstruction
  • Milos Musil, Brno University of Technology, Faculty of Information Technology, Czechia
  • Hannes Konegger, International Centre for Clinical Research, St. Anne's University Hospital Brno, Austria
  • Rayyan Khan, Loschmidt Laboratories, Pakistan
  • Jaroslav Zendulka, Brno University of Technology, Faculty of Information Technology, Czechia
  • Jan Stourac, The International Clinical Research Center, St. Anne's University Hospital, Brno, Czech Republic, Czechia
  • Jiri Damborsky, Loschmidt Laboratories, Faculty of Science, Masaryk University, Brno, Czech Republic, Czechia
  • David Bednar, Loschmidt Laboratories, Masaryk University; International Clinical Research Center, Czechia

Short Abstract: Proteins are widely used in numerous biomedical and biotechnological applications. However, naturally occurring proteins cannot usually withstand the harsh industrial environment, since they have evolved to function in mild conditions. Therefore, there is a great interest in increasing protein stability to enhance their applicability. Ancestral sequence reconstruction (ASR) is a well-established method for deducing the evolutionary history of genes or whole genomes. Besides its ability to discover the potential evolutionary ancestors of modern proteins, ASR has proven to be a useful approach to infer their highly thermostable variants, as the environmental conditions of the Precambrian Earth were significantly more demanding. However, its wide applicability is currently restricted by its complicated nature, which demands a profound knowledge and understanding of numerous tools and principles of phylogenetic analysis. FireProtASR aims to overcome these obstacles by providing a fully automated computational pipeline, which would allow users to acquire ancestral sequences of their proteins of interest without the need to construct a multiple-sequence alignment or phylogenetic tree, as commonly required by the existing tools for ASR. FireProtASR will be completed with a robust framework for annotation-based homology search and an algorithm for the selection of the most suitable outgroup.

A-020: “Old Drugs, New Tricks”: Learning from failure and success stories of drug discovery
  • Sohini Chakraborti, Indian Institute of Science, India
  • Narayanaswamy Srinivasan, Indian Institute of Science, India

Short Abstract: The high rates of attrition in conventional drug discovery have popularized cost-effective alternatives that repurpose existing approved drugs or reposition failed molecules to respond to unmet medical needs. Historically, drug repurposing and repositioning have largely been serendipitous. Here, we present our systematic efforts in identifying potential agents to be considered for (a) repurposing, and (b) repositioning. In the former case, we have developed a protocol to identify promising repurposable drugs for the treatment of infectious diseases by exploring evolutionary relationships between established targets of FDA-approved drugs and proteins belonging to the pathogen of interest. This strategy has recently been applied to propose 26 known drugs that can be repurposed against 31 protein targets of Candida albicans. Some of the drugs identified through our unbiased approach have been experimentally shown by other investigators to possess anti-Candida properties (Chakraborti et al, 2019a, 2019b). Case (b) deals with re-investigating a failed peptide drug candidate, cilengitide, an anti-glioblastoma molecule targeted against αvβ3 integrin, through chemical modification. Our docking analysis shows that site-specific thioamidation of cilengitide results in a better molecular interaction profile of the modified peptide with αvβ3, improving its efficacy and stability, which have been demonstrated in experimental studies (Verma et al, 2018).

A-021: Caver Web: Identification of Tunnels and Channels in Proteins and Analysis of Ligand Transport
  • Jan Stourac, The International Clinical Research Center, St. Anne's University Hospital, Brno, Czech Republic, Czechia
  • Ondrej Vavra, Loschmidt Laboratories, Faculty of Science, Masaryk University, Brno, Czech Republic, Czechia
  • Piia Kokkonen, Loschmidt Laboratories, Faculty of Science, Masaryk University, Brno, Czech Republic, Czechia
  • Jiri Filipovic, Institute of Computer Science, Masaryk University, Brno, Czech Republic, Czechia
  • Jan Brezovsky, Loschmidt Laboratories, Faculty of Science, Masaryk University, Brno, Czech Republic, Czechia
  • Jiri Damborsky, Loschmidt Laboratories, Faculty of Science, Masaryk University, Brno, Czech Republic, Czechia
  • Gaspar Pinto, Loschmidt Laboratories, Masaryk University; International Clinical Research Center, Czechia
  • David Bednar, Loschmidt Laboratories, Masaryk University; International Clinical Research Center, Czechia

Short Abstract: Protein tunnels and channels serve as transport pathways for ligands during their binding or unbinding processes. It has been proven that their properties can define many crucial protein characteristics like substrate specificity, enantioselectivity, thermal stability, and activity. Therefore, a good understanding of such pathways is important for the deciphering of the protein function, protein engineering or drug design. Since transport pathways cannot be easily studied using experimental techniques, computational methods are often employed. Caver Web 1.0 is a novel web server for comprehensive in silico analysis of protein tunnels and channels as well as the ligands’ transport through the identified pathways in a single and easy-to-use graphical user interface. The only required inputs are a protein structure for tunnel detection and optionally a list of ligands for transport analysis. Several automated guidance procedures aid the users during the setup phase of the calculation leading to more accurate and biologically relevant results. The identified tunnels, their properties, and energy profiles of passing ligands can be directly analyzed and visualized in the interface. The server is freely available at https://loschmidt.chemi.muni.cz/caverweb.

A-022: The characterization of Plasmodial GTP Cyclohydrolase I enzyme as a potential anti-malarial drug target using computational approaches
  • Afrah Khairallah, Rhodes University, Research Unit in Bioinformatics (RUBi), South Africa
  • Vuyani Moses, Rhodes University, Research Unit in Bioinformatics (RUBi), South Africa
  • Ozlem Tastan Bishop, Rhodes University, South Africa

Short Abstract: Drug resistance in the malaria parasite poses a great challenge for anti-malarial drug discovery and underscores the need for new treatments. In this study, computational approaches were employed to discover new drug candidates by targeting the malaria parasite GTP cyclohydrolase I (GCH1). GCH1 is the first enzyme of the parasite de novo folate pathway. The enzyme is a toroid-shaped homo-decamer holding ten equivalent zinc-containing active sites, each buried in a deep pocket of 10 Å (1 nm). Sequence and phylogenetic analyses were performed, followed by homology modelling of accurate 3D structures. The resources available at the Centre for High-Performance Computing were utilized for the Virtual High Throughput Screening of candidate inhibitory compounds from the South African Natural Compounds Database. All-atom molecular dynamics (MD) simulations were performed to study the dynamical properties of the protein. Potential energy surface scans were performed using the Gaussian software package to derive the GCH1 zinc force field (FF) parameters. The developed FF parameters were validated via the Chemistry at HARvard Molecular Mechanics (CHARMM) MD package. The malaria parasite GCH1 enzyme showed selective binding to inhibitor compounds, and promising inhibitors were identified. New FF parameters describing the zinc coordination environment were derived and validated successfully.

A-023: Single domain Immunoglobulins - Complexity deconstruction
  • Philippe Youkharibache, National Institutes of Health, United States

Short Abstract: The Ig-fold represents a unique case among superfolds. Its success in vertebrate evolution was noticed in the initial human genome sequencing, with ~3% of its protein-coding genes. In cell surface receptors (surfaceome) or the immune system (immunome), the Ig-fold is at the heart of a staggering 30% of cell surface receptors, making it a major orchestrator of cell-cell interactions, in its Ig-like and also its FN3 form (b.1 and b.2 in SCOP). A reason for that success may well lie in its 3D structure, which is self-associative. In fact, immunoglobulins, TCRs, and multiple CD molecules such as CD8 are well known to form symmetric or pseudo-symmetric quaternary structures. This also extends to ligand-receptor pairing, as in PD1-PDL1, a highly important checkpoint in immunotherapy. What is largely ignored is that the Ig-fold itself is pseudo-symmetric. This property gives us a decoding frame of reference to understand the fold, relate all domain forms, and provide new protein engineering avenues. Harnessing both tertiary and quaternary pseudo-symmetric self-association properties should be of significance in designing single domain antibodies to target Ig-based receptors, as next generation checkpoint inhibitors, chimeric antigen receptors for CAR T-cell therapies, and the burgeoning field of immunoengineering.

A-024: Progress on protein structure prediction by deep learning
  • Jinbo Xu, Toyota Technological Institute at Chicago, United States

Short Abstract: Computational structure prediction of a protein without detectable homology in PDB is very challenging and usually needs a large amount of computing power. This talk will show that by using a very deep convolutional residual neural network (ResNet), we may predict protein structures much more accurately and efficiently than ever before. Deep ResNet can predict inter-residue distance distributions very well, which enables us to construct protein 3D models from the geometric constraints given by the predicted distance matrix, without using time-consuming conformation sampling. Running on 20 CPUs, our deep ResNet method successfully folded 21 of the 37 CASP12 hard targets within 4 hours. In CASP13 our folding server successfully folded 17 of 32 hard targets and obtained the best contact prediction accuracy and almost the best folding accuracy among all servers. In the latest blind CAMEO test our folding server predicted correct folds of two membrane proteins, one of which has a new fold, while all the others failed. Our method also works well for complex contact prediction even when trained on single-chain proteins. This talk will also compare the top human and server groups in CASP13, all of which have adopted deep ResNet for protein folding.

A-025: Study of protein unfolding for engineering stability
  • Sergio M. Marques, Loschmidt Laboratories, Masaryk University; International Clinical Research Center, Czechia
  • Antonin Kunka, Loschmidt Laboratories, Masaryk University; International Clinical Research Center, Czechia
  • Piia Kokkonen, Loschmidt Laboratories, Masaryk University; International Clinical Research Center, Czechia
  • Joan Planas-Iglesias, Loschmidt Laboratories, Masaryk University, Czechia
  • Gaspar Pinto, Loschmidt Laboratories, Masaryk University; International Clinical Research Center, Czechia
  • Zbynek Prokop, Loschmidt Laboratories, Masaryk University; International Clinical Research Center, Czechia
  • David Bednar, Loschmidt Laboratories, Masaryk University; International Clinical Research Center, Czechia
  • Jiri Damborsky, Loschmidt Laboratories, Masaryk University; International Clinical Research Center, Czechia

Short Abstract: High stability is a desirable property for many proteins of biotechnological interest. The reasons are related to the need to use those biocatalysts at unnaturally high temperatures, in the presence of co-solvents, co-solutes, high salt concentrations, etc. A number of methods and approaches to develop protein mutants with improved stability have been applied with great success. The haloalkane dehalogenase DhaA is one such case, where the 11-point mutant DhaA115 displayed a melting temperature increase of 25 °C. In this work we performed a comparative study of the unfolding pathways of the wild-type DhaA and the DhaA115 mutant. Adaptive molecular dynamics simulations were carried out at high temperature for a combined time of 20 microseconds. Markov state models were constructed to describe the unfolding pathways. This allowed us to characterize the possible intermediates and predict the unfolding rates. The results were in agreement with experimental data and corresponded well with the fact that DhaA115 is the most stable variant. The main unfolding sites were identified and will be used to find new hotspots for further stabilization.

A-026: Amino acid sequences of protein-protein interaction surfaces for unknown protein complex pairs
  • Nobuyuki Uchikoga, Meiji University, Japan
  • Yuri Matsuzaki, Tokyo Institute of Technology, Japan

Short Abstract: Protein-protein interaction surfaces are identified from tertiary structure data of protein complexes. However, in some cases, structure data are available for a protein whose interaction partners are unknown. For such cases, we have developed a rigid-body protein docking software, MEGADOCK, and an interaction fingerprint (IFP) method based on amino acid sequences. The IFP method can be combined with a rigid-body docking process to classify many candidate protein complex poses (decoys). We applied these methods to investigate protein-protein interactions involved in bacterial chemotaxis systems. We performed all-to-all docking and analyzed the decoy sets of every protein pair using IFPs to specify their interaction surfaces. We then used IFPs to classify the protein interaction surfaces into groups, each of which includes decoys with similar interaction surfaces. Some groups were observed to include protein pairs not known to interact. The actual amino acid sequences of the protein-protein interaction surfaces are then specified, and we will discuss annotations of these sequences.

A-027: Molecular Modelling and Active-site Channels of a Protein Associated with Homocystinuria in Qatar: Structural Basis for Developing New Therapies
  • Navaneethakrishnan Krishnamoorthy, Sidra Medicine & Imperial College London, Qatar

Short Abstract: Homocystinuria is a complex metabolic disorder that leads to multiple disorders involving, among others, the central nervous system and the cardiovascular system. The protein cystathionine β-synthase (CBS) with the missense mutation R336C is reported to be associated with homocystinuria. In Qatar specifically, it causes severe disease with high prevalence and is thus considered a founder mutation. Our recent report using experimental models (in yeast and cell culture) examined the activity and stability of the protein and showed a deleterious effect of the mutation. However, the molecular mechanism of the disease is poorly understood. Here, molecular modelling was used to understand the structural consequences of the R336C mutation in human CBS and the potential active-site channels available for entry/exit. The results of molecular dynamics simulations indicate that the mutation induces several structural and conformational changes that affect the surface and propagate to the channels. We further mapped the potential routes of the channels, their size, key residues and their interactions. The impact of the mutation on surface accessibility shows that it could influence the binding of the target substrates and the related activity of CBS. Altogether, the modelling suggests structure-function relationships and can provide a basis for developing new therapies for the treatment of homocystinuria.

A-028: Structural variations in the 1998-2018 influenza B hemagglutinin receptor binding site coincide with viral evolution and infection: A computational study with antiviral applications
  • Marni Cueno, Nihon University School of Dentistry, Japan
  • Kenichi Imai, Nihon University School of Dentistry, Japan

Short Abstract: Influenza B hemagglutinin (HA) is a major surface glycoprotein that mediates viral host entry. HA structural changes have been linked to viral evolution, whereas differences in the HA receptor binding site (RBS) have been associated with viral infection. However, the structural relationship between influenza B viral evolution and infection has not been fully elucidated. Here, we analyzed and compared the 1998-2018 influenza B HA surface glycoproteins from the Yamagata strain. In this study, we generated HA homology models, verified the quality of each model, superimposed the HA homology models to determine structural differences, and performed network analysis of Yamagata strain HA evolution. Based on RMSD scores, we found that the 1998-2005 HA models show high structural distinction (RMSD > 0.50) compared to the 2006-2018 HA models (RMSD < 0.50), which is consistent with the network analysis of HA evolution. Similarly, structural analyses of the 2005-2006 HA RBS showed that the distances between known RBS stabilization residues (Y200, S140, P236) varied. In particular, the Y200-S140 distance differed between the 2005 (17.0 Å) and 2006 (19.88 Å) HA models. Taken together, we postulate that the 1998-2005 and 2006-2018 HA models are structurally distinct, possibly owing to the 2005-2006 change in HA RBS distances.
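
The two structural measures mentioned in this abstract, RMSD between superimposed models and the distance between two residues, can be illustrated with a minimal Python sketch. It assumes the models are already superimposed and reduced to matched lists of (x, y, z) coordinates in Å; the function names and the toy coordinates are hypothetical and are not taken from the poster.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two matched coordinate lists (assumed already superimposed)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

def residue_distance(atom_a, atom_b):
    """Euclidean distance between two representative atoms, e.g. C-alpha atoms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(atom_a, atom_b)))

# Hypothetical toy coordinates, for illustration only.
model_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model_2 = [(0.0, 0.3, 0.0), (1.5, 0.3, 0.0), (3.0, 0.3, 0.0)]
print(rmsd(model_1, model_2))
print(residue_distance(model_1[0], model_1[2]))
```

In practice the superposition step itself (for example a least-squares fit) would precede the RMSD calculation; it is left out here to keep the sketch short.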

A-029: Structural analysis of binding promiscuity for MHC class II alleles
  • Rodrigo Ochoa, Max Planck Tandem Group, Biophysics of Tropical Diseases, University of Antioquia and EMBL-EBI, Colombia
  • Alessandro Laio, International School for Advanced Studies, Italy
  • Pilar Cossio, Max Planck Tandem Group, Biophysics of Tropical Diseases, University of Antioquia, Colombia
  • Roman Laskowski, EMBL-EBI, United Kingdom
  • Janet Thornton, EMBL-EBI, United Kingdom

Short Abstract: Predicting the binding of peptides to MHC class II receptors is a task of interest, for example, in the study of autoimmune diseases and in vaccine development. Most approaches to predicting peptide affinity rely on sequence-based models trained with experimental binding data and binding motifs derived from multiple alignments of known peptide substrates. However, many challenges remain due to the variability of peptide sizes and their flexibility. Here we developed a method to model and sample thousands of peptides in complex with four different MHC class II alleles to understand the interactions driving their binding promiscuity. The peptides were modelled using iterative single-point mutations that were defined based on alignments of their core regions. Each complex was subjected to Monte Carlo simulations using the Backrub method from Rosetta. Different observables were obtained from the ensemble, such as the number of hydrogen bonds, relative surface accessibility and heavy-atom contacts. We averaged and compared the observables for each amino acid at each position of the 9-mer core region. With this information we generated a set of amino acid indexes for the included alleles. The results were assessed using an available binding dataset and sequence-based matrices to rank core regions.

A-030: Structure Based Prediction of MHC II Binding Peptides
  • Josef Laimer, University of Salzburg, Austria
  • Markus Wiederstein, University of Salzburg, Austria
  • Peter Lackner, University of Salzburg, Austria

Short Abstract: The binding of an antigen peptide to MHC class II molecules is essential for initiating an immune response. Thus, fast and accurate identification of potential binding peptides is critical for basic research and clinical translation. Nowadays, various computational approaches exist for this task, which can roughly be divided into two classes: sequence-based methods employing machine learning (ML), and structure-based methods using physical concepts realized by docking, molecular dynamics, or threading. We present a novel structure-based approach which utilizes statistical scoring functions (SSFs). SSFs have a wide range of applications in protein science, e.g. for the assessment of protein structures or protein stability prediction. Here, SSFs are used to evaluate interactions between peptides and MHC II molecules. Thereby, predictions are performed on sets of MHC II allele-specific 3D models, where potential binding peptide sequences are applied on each of these models and subsequently scored. Finally, a consensus score is computed. Our method is not limited to specific alleles or the availability of an experimentally determined structure. The prediction is fast, and its accuracy is close to the performance shown by ML approaches, while the risk of overfitting on certain training data is reduced.

A-031: PDBe-KB: Placing structural data in its biological context
  • PDBe-KB Consortium, EMBL-EBI, United Kingdom
  • Mihaly Varadi, EMBL-EBI, United Kingdom

Short Abstract: New technologies drive the expansion of structures deposited in the Protein Data Bank, with over 150,000 structures referencing over 45,000 unique UniProtKB entries. Atomic coordinates of macromolecular structures help in gaining insights regarding the molecular function, but the inherent value of these structures can only be fully realized in their biological context. The Protein Data Bank in Europe - Knowledge Base, managed by the PDBe team at EMBL-EBI, is an international, community-driven resource with the primary goal of collating and making available structural and functional annotations contributed by partner resources, and providing FAIR access to this context-related data alongside the core structural data of PDB in a novel, aggregated manner. Partner groups provide literature-based manual curations and computational predictions such as ligand binding sites, catalytic sites or protein-protein interfaces as well as effects of residue mutations. These annotations are stored in a highly interconnected graph database, which enables comparisons between prediction methods, facilitates data exchange and allows insights into the biological function by providing a comprehensive view of the functional context of the protein structure. Data is available programmatically and via novel PDBe-KB web pages focusing on biological entities, such as full-length proteins (https://pdbe-kb.org/proteins), complementing the traditional entry-focused PDB pages.

A-032: Improving Scoring Functions for Protein-Ligand Binding Affinity Using Small Molecule Descriptors
  • Fergus Boyles, University of Oxford, United Kingdom
  • Garrett Morris, University of Oxford, United Kingdom
  • Charlotte M. Deane, University of Oxford, United Kingdom

Short Abstract: Scoring functions for protein-ligand binding affinity typically use features describing the protein-ligand complex, with limited information about the ligand itself. We have investigated the effect of adding a diverse set of ligand-based features to structure-based machine learning scoring functions. The inclusion of ligand-based features consistently improves the performance of structure-based machine learning scoring functions, even when ligands highly similar to those in the test set are excluded from the training set. The presence of similar proteins in the training and test sets has a significant impact on scoring function performance. However, the inclusion of ligand-based features improves performance regardless of training set composition. We investigated this behaviour and show that features of the ligand appear to be predictive of its mean binding affinity for its protein targets. We also find that the same ligand-based features are consistently important regardless of which structure-based features they are combined with. On the Comparative Assessment of Scoring Functions 2009, 2013, and 2016 scoring power benchmarks, a purely ligand-based model outperforms the AutoDock Vina scoring function.

A-033: Pyrimethamine compromised Plasmodium falciparum mutant dihydrofolate reductase (DHFR) proteins reveal residue network differences underlying drug resistance in the parasite
  • Arnold Amusengeri, Rhodes University, South Africa
  • Rolland Bantar Tata, Rhodes University, South Africa
  • Ozlem Tastan Bishop, Rhodes University, South Africa

Short Abstract: Malaria remains a public health challenge with high global prevalence and death rates. Among Plasmodium species causing human malaria, Plasmodium falciparum is the most devastating. Crucial to Plasmodium parasite replication is the enzyme dihydrofolate reductase (DHFR). DHFR is involved in folate metabolism, responsible for generating the parasite DNA base, dTMP. DHFR has long been validated as an antimalarial drug target. However, widespread resistance of DHFR to known therapeutic agents has been reported, due to induced active site mutations. In this work, we aim to understand the effects of four point mutations at the active site that are responsible for resistance to the approved antimalarial drug pyrimethamine. Pyrimethamine was first docked into wildtype and mutated DHFR homology models. Subsequently, all-atom molecular dynamics (MD) simulations and binding free energy computations were performed. We found that the mutations only marginally influence protein-ligand stability, while significantly weakening pyrimethamine’s binding affinity. Next, dynamic residue network (DRN) analysis was used to determine the impact of the mutations on the communication dispositions of DHFR residues. Relative to wildtype, specific mutated models demonstrated non-native connectivity patterns. Furthermore, the presence of pyrimethamine resulted in unique network changes in wildtype compared to the mutants, suggesting compromised information flow in pyrimethamine-bound wildtype.

A-034: Homology modelling and molecular docking of Moniliophthora perniciosa TOR1 and prediction of its interaction with Theobroma cacao
  • Andria Freitas, Universidade Estadual de Santa Cruz, Brazil
  • Fabienne Micheli, CIRAD, Brazil
  • Bruno Silva Andrade, Universidade Estadual do Sudoeste da Bahia, Brazil

Short Abstract: The basidiomycete Moniliophthora perniciosa is the causal agent of witches’ broom disease of cacao (Theobroma cacao) and is responsible for a severe drop in cacao production in different countries, especially in Brazil. The TOR protein is a serine/threonine kinase related to the phosphatidylinositol kinase family (PKI), and it is grouped into two evolutionarily conserved and distinct multiprotein complexes, TORC1 and TORC2. TOR1 is involved in processes related to nutrient availability and quality and regulates cell growth by antagonizing the activity of transcription factors in the cytoplasm. The aim of this work was to carry out 3D homology modelling of the TOR1 receptor from M. perniciosa, to search for its inhibitors and to describe their mechanism of interaction using a molecular docking approach. Ligand searching was performed using the ZINC (http://zinck.docking.org) and DRUGBANK (https://www.drugbank.ca/) databases, and pharmacophore modelling using PharmaGist (http://bioinfo3d.cs.tau.ac.il/PharmaGist/). For 3D construction, we used Modeller 9.21, followed by an AMBER 14 energy minimization with 5000 cycles of steepest descent and 5000 cycles of conjugate gradient to adjust the protein structure. The structure was validated using the QMEAN, ANOLEA and Procheck programs. Docking results were obtained with Autodock Vina, and 2D ligand interaction maps were constructed using Accelrys Discovery Studio 2.5.

A-035: Fast protein assembly search and alignment with complete 3D Zernike moment invariants
  • Dmytro Guzenko, RCSB Protein Data Bank, UC San Diego, United States
  • Jose M. Duarte, RCSB Protein Data Bank, UC San Diego, United States
  • Stephen K. Burley, RCSB Protein Data Bank, Rutgers University, United States

Short Abstract: Detection of protein structure similarity is a central challenge in structural bioinformatics. The functional form of a protein within the cell is often an oligomer, rather than the isolated monomer. This fact, together with the explosive growth of protein assembly structures in the Protein Data Bank (PDB), demands more efficient approaches to oligomeric assembly alignment and retrieval. Traditional structure comparison methods use atom-level information, which can be complicated by the presence of topological permutations within a polypeptide chain and/or subunit rearrangements within an assembly. These challenges can be circumvented by comparing electron density maps directly. However, brute-force alignment of electron density distributions is a compute-intensive three-dimensional search problem. Instead, we exploit a novel 3D Zernike moment normalization procedure to implicitly orient electron density maps and assess similarity in moment space with unprecedented speed. Using this computational strategy, we introduced the concept of structure profiles, analogous to sequence profiles, thereby increasing the sensitivity of structural similarity detection. Based on these principles, we have developed a search system that enables real-time retrieval of protein assemblies similar to a target assembly, predefined in PDB or uploaded by a user, together with their alignment (http://shape.rcsb.org).

A-036: Analysis of amino acid structures from amyloid-like end segments in PDB entries
  • Kristóf Takács, Eötvös Loránd University, Hungary
  • Vince Grolmusz, Eotvos University, Hungary

Short Abstract: Entries of the Protein Data Bank (PDB) with relatively long amyloid-like segments (i.e. containing significantly long parallel beta-sheets between their chains) can be considered important for drug development against amyloid-related diseases (e.g. amyloidosis, Alzheimer’s and Huntington’s diseases). Heuristically, these entries can be suspected to be structures that are disposed to transform into amyloids, and finding the reason why this expected change does not occur can be an essential step in developing new effective drugs for the diseases mentioned above. We present the latest results of our research on the amino acid structure of amyloid-like segments in PDB entries, with particular attention to the last few amino acids at the ends of the previously found parallel beta-sheets, which can be strongly conjectured to be related to the start and end of the process of amyloid development.

A-037: Mutational patterns of the HIV-1 integrase in isolates of Raltegravir-treated and drug-naïve patients: structure, variability and mutation co-occurrence
  • Lucas Machado, Fiocruz, Brazil
  • Marcelo Gomes, Fiocruz, Brazil
  • Ana Carolina Guimarães, Fiocruz, Brazil

Short Abstract: One of the primary drug targets in therapy against human immunodeficiency virus type 1 (HIV-1) is the integrase, the enzyme responsible for integrating the viral DNA into the host genome. The integrase inhibitor Raltegravir has been widely used in antiretroviral therapy; however, Raltegravir-resistant HIV-1 strains have become a worldwide problem. Here, we compared the variability of each position of the HIV-1 integrase sequence in clinical isolates from Raltegravir-treated and drug-naïve patients by calculating their Shannon entropies. We also built three-dimensional models of the HIV-1 integrase and a mutation co-occurrence network. The relationship between variability, architecture, and co-occurrence was investigated. It was observed that positions bearing major resistance-related mutations are highly conserved among non-treated patients and variable among the treated ones. The integrase structure showed that the highest-entropy residues are in the vicinity of the host DNA, and their variations may impact the protein-DNA interface. The co-occurrence network and structural analysis support the hypothesis that the resistance-related E138K mutation compensates for mutated DNA-anchoring lysine residues. The study results reveal patterns by which the integrase adapts during Raltegravir therapy; this information can be useful to rethink the drugs currently used or to guide the development of new ones.
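
The per-position variability measure named in this abstract, Shannon entropy, can be sketched in a few lines of Python. The sketch assumes pre-aligned sequences of equal length and uses base-2 logarithms; the function names and the two toy alignments are hypothetical and do not reproduce the poster's data or pipeline.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed at one alignment position."""
    counts = Counter(column)
    total = len(column)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def positional_entropies(sequences):
    """Entropy for every position of a set of aligned, equal-length sequences."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy(column) for column in zip(*sequences)]

# Hypothetical toy alignments standing in for the two patient groups.
drug_naive = ["AEK", "AEK", "AEK", "AEK"]
treated = ["AEK", "AKK", "AEK", "AKK"]
print(positional_entropies(drug_naive))  # fully conserved positions give 0.0
print(positional_entropies(treated))     # the variable middle position gives 1.0
```

A position that is conserved in the untreated group but variable in the treated group shows up as a low entropy in the first list and a high entropy in the second.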

A-038: Novel Potential Antimalarials through Drug Repurposing and Multitargeting: A Computational Approach
  • Bakary Ntji Diallo, Rhodes University, South Africa
  • Kevin Lobb, Rhodes University, South Africa
  • Ozlem Tastan Bishop, Rhodes University, South Africa

Short Abstract: Drug repurposing offers a time- and cost-effective strategy to find new antimalarials with reduced failure risk. This study identified potential antimalarials among Food and Drug Administration (FDA) approved drugs. 796 drugs from DrugBank were docked against 36 Plasmodium targets from the screening Protein Data Bank (sc-PDB) using QuickVina-W. Lipophilic efficiency (LipE), surface efficiency index (SEI), binding efficiency index (BEI) and GRaph Interaction Matching (GRIM) were combined for hit selection, and the hits were later assessed in molecular dynamics simulations. In docking validation, 77% of the top poses had a Root Mean Square Deviation (RMSD) ≤ 2 Å when compared to co-crystallized ligands. Out of 28656 (36x796) docking experiments, 26 protein-ligand complexes were selected. Their minimum LipE, SEI and BEI were 4, 23 and 7 respectively, while their binding affinities (-6 to -11 kcal/mol) and GRIM scores (0.58 to 0.78) indicated strong binding and good interaction similarity to co-crystallized ligands. During simulations, the complexes were stable: the maximum protein and ligand RMSDs were 5 Å and 3.7 Å respectively, with an average of 1.4 hydrogen bonds. Six potential multitarget ligands with a high GRIM score (>0.7) on proteins with similar binding sites were also found. Using a combined computational workflow, 32 drugs are proposed for repurposing as antimalarials.
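
The efficiency indices named in this abstract can be illustrated with a short Python sketch. It assumes the commonly used definitions (LipE = pKd − logP, BEI = pKd per kDa of molecular weight, SEI = pKd per 100 Å² of polar surface area) and assumes that a docking energy is converted to an estimated pKd via ΔG = RT ln Kd at 298.15 K. The abstract does not state how the authors computed these values, so the conversion, the function names and the example ligand are all hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal/(mol*K)

def estimated_pkd(binding_energy_kcal, temperature_k=298.15):
    """Estimate pKd from a binding free energy, assuming dG = RT ln(Kd)."""
    return -binding_energy_kcal / (GAS_CONSTANT_KCAL * temperature_k * math.log(10))

def lipe(pkd, logp):
    """Lipophilic efficiency: potency corrected for lipophilicity."""
    return pkd - logp

def bei(pkd, molecular_weight_da):
    """Binding efficiency index: potency per kDa of molecular weight."""
    return pkd / (molecular_weight_da / 1000.0)

def sei(pkd, polar_surface_area_a2):
    """Surface efficiency index: potency per 100 square angstroms of polar surface area."""
    return pkd / (polar_surface_area_a2 / 100.0)

# Hypothetical ligand, for illustration only.
pkd = estimated_pkd(-9.0)  # docking score in kcal/mol
print(round(pkd, 2), round(lipe(pkd, 2.5), 2),
      round(bei(pkd, 350.0), 2), round(sei(pkd, 80.0), 2))
```

Normalising potency by size, polarity and lipophilicity in this way lets hits of very different molecular weight be compared on a common scale.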

A-039: Resources for repeat protein structure annotation: RepeatsDB and RepeatsDB-Lite
  • Lisanna Paladin, University of Padua, Italy
  • Damiano Piovesan, University of Padova, Italy
  • Silvio Tosatto, University of Padova, Italy

Short Abstract: Tandem repeats (TRs) in proteins are ubiquitous in genomes and have been demonstrated to be of fundamental importance in several biological processes. Structural TR modules, called units, determine the structure, stability and function of the repeated region. The largest collection of TR proteins detected from structural features is provided by the RepeatsDB database, developed to fill the gap in TR protein annotation. It relies on computational approaches and expert manual curation to detect TRs in Protein Data Bank (PDB) structures. The RepeatsDB annotation pipeline is integrated in RepeatsDB-Lite, a web server for the prediction and refinement of TR annotations. RepeatsDB-Lite outperforms existing methods in the prediction of TR units. Its interface allows the user to evaluate the prediction by visualizing similarity relationships between units at both the sequence and structure levels. The revision process provides us with feedback on the reliability of RepeatsDB-Lite predictions. In addition, high-quality annotations can be submitted to RepeatsDB, where currently about 60% of all entries are manually reviewed. In the database, it is now possible to compare unit positions, together with secondary structure, fold information and Pfam domains. Future work will concentrate on exploiting the repeat unit definitions to create Pfam profiles for TR detection from sequence.

A-040: MobiDB 3.0: more annotations for intrinsic disorder, conformational diversity and interactions in proteins.
  • Marco Necci, University of Padua, Italy
  • Silvio Tosatto, University of Padova, Italy

Short Abstract: The MobiDB (URL: mobidb.bio.unipd.it) database of protein disorder and mobility annotations has been significantly updated and upgraded since its last major renewal in 2014. Several curated datasets for intrinsic disorder and folding upon binding have been integrated from specialized databases. The indirect evidence has also been expanded to better capture information available in the PDB, such as high temperature residues in X-ray structures and overall conformational diversity. Novel nuclear magnetic resonance chemical shift data provides an additional experimental information layer on conformational dynamics. Predictions have been expanded to provide new types of annotation on backbone rigidity, secondary structure preference and disordered binding regions. MobiDB 3.0 contains information for the complete UniProt protein set and synchronization has been improved by covering all UniParc sequences. An advanced search function allows the creation of a wide array of custom-made datasets for download and further analysis. A large amount of information and cross-links to more specialized databases are intended to make MobiDB the central resource for the scientific community working on protein intrinsic disorder and mobility.

A-041: Implementing InteractoMIX platform in Galaxy
  • Patricia Mirela Bota, UVic: Universitat de Vic - Universitat Central de Catalunya, Spain
  • Narcis Fernandez-Fuentes, UVic: Universitat de Vic - Universitat Central de Catalunya, Spain
  • Joaquim Aguirre-Plans, GRIB (IMIM-UPF), Spain
  • Baldo Oliva, GRIB (IMIM-UPF), Spain

Short Abstract: Protein-protein interactions (PPIs) play a crucial role in the different functions of a cell and are central to our understanding of cellular processes in both health and disease. In 2016, we presented the InteractoMIX platform (http://interactomix.com), a suite of computational tools devoted to the study of PPIs and interactomics data. The InteractoMIX platform represents a one-stop, centralized resource for the analysis of PPI information, ranging from genome-wide interactomes to the atomic details (i.e. 3D structures) of protein complexes. Here we present the implementation of InteractoMIX in Galaxy, an open-source, web-based platform for data-intensive biomedical research. By integrating all the InteractoMIX tools in Galaxy, we facilitate the in silico study of some of the most active topics in current biomedical research on PPIs in a user-friendly manner. Moreover, by combining the different tools we have created ready-to-use workflows that scientists can exploit to expand their knowledge of specific pathways, identify key residues related to the function of a particular interaction, model the structure of protein complexes and, eventually, predict new therapeutic targets and their corresponding agents, including the de novo modeling of orthosteric peptides to target particular PPIs.

A-042: Bacterial origin of the 19-stranded outer membrane β-barrels specific to mitochondria
  • Joana Pereira, Max Planck Institute for Developmental Biology, Germany
  • Andrei Lupas, Max Planck Institute for Developmental Biology, Germany

Short Abstract: Amplification of subdomain-sized fragments is a dominant phenomenon in the evolution of protein folds. One example is the outer membrane β-barrel (OMBB) fold, a toroidal array of anti-parallel β-strands spanning the outer membrane of Gram-negative bacteria and eukaryotic organelles. Although forming a homologous group, most families of OMBBs evolved independently through amplification from a pool of ancestral ββ-hairpins. Mitochondria carry an atypical family of 19-stranded OMBBs (MOMBBs), for which no bacterial ortholog is known and whose origins are substantially unclear. To elucidate its roots, we performed a large-scale comparison of mitochondrial and bacterial OMBBs. Our results indicate that the common ancestor of all MOMBBs emerged by the fivefold amplification of a double ββ-hairpin of bacterial origin, probably at the root of the eukaryotic lineage. The resulting 20-stranded barrel yielded the 19-stranded form observed today through a strand-to-helix transition at its N-terminus, possibly driven by the need to plug the newly evolved pore. With this, our study provides an explanation for the origins of MOMBBs while highlighting the role of repetition and fold change in the emergence of new forms of established protein folds.

A-043: Prediction of Critical Residues in Protein Structures using Amino Acid Networks
  • Tomáš Martínek, Brno University of Technology, Czechia
  • David Bednář, Loschmidt Laboratories, Czechia
  • Lenka Sumbalová, Brno University of Technology, Czechia
  • Jiri Damborsky, Loschmidt Laboratories, Czechia

Short Abstract: For the engineering of improved proteins, it is important to know which residues are critical and whether their replacement will have a large influence on protein function, stability or other properties. Knowledge about critical residues can be used either for predicting the impact of substitutions or for selecting the best spots for mutagenesis. Limitations of traditional protein design tools like FoldX or Rosetta are their speed and accuracy, particularly at solvent-exposed surfaces. Amino acid network analysis is an alternative to modelling proteins using force-field calculations and then selecting the most important residues based on the energies. An amino acid network is a graph derived from protein structure based on contacts or interactions between the residues. Our approach is to create the network from a protein, compute network parameters, then add physico-chemical and biological properties of amino acids and use these as features for machine learning. Currently we are focusing on the residues critical for protein stability. We are using a dataset of 2893 mutations from 150 proteins with experimentally measured stabilities to test our method. Based on preliminary results, our method is significantly faster than force-field calculations and gives comparable results.
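
The idea of deriving a graph from residue contacts and computing network parameters can be sketched as follows. This is only an illustration: the 8 Å cutoff, the use of one representative point per residue, the choice of degree centrality as the network parameter, and the toy coordinates are all assumptions, not details given in the abstract.

```python
import math
from itertools import combinations


def contact_network(coords, cutoff=8.0):
    """Adjacency sets: residues i and j are linked if their points lie within cutoff (in angstroms)."""
    adjacency = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other residues that each residue is in contact with."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbours) / (n - 1) for node, neighbours in adjacency.items()}


# Toy chain of four residues spaced 3.8 angstroms apart (hypothetical coordinates).
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
network = contact_network(toy_coords)
print(degree_centrality(network))
```

Per-residue values like these could then be joined with physico-chemical descriptors to form the feature vectors for a machine learning model, as the abstract outlines.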

A-044: Matching biomolecular structures by registration of point clouds
  • Nima Vakili, Goettingen University/ Max Planck Institute, Germany
  • Michael Habeck, Goettingen University/ Max Planck Institute, Germany

Short Abstract: Assessing the match between two biomolecular structures is at the core of many structural analyses, including superposition, alignment and docking. Typically, these tasks are solved with specialized structure-matching techniques for protein structural alignment, rigid-body docking, rigid fitting into cryo-EM maps and other structure-based algorithms. We present a unifying framework for comparing biomolecular structures. First, biomolecular structures (represented as atomic coordinates or cryo-EM maps) are converted into three-dimensional weighted point clouds. Second, the match between two structures is measured as the overlap between the densities induced by the point clouds resulting in the kernel correlation as a measure of agreement between the structures. To compare two structures, six rigid degrees of freedom need to be optimized. Global optimization of the kernel correlation is challenging due to its non-convexity and the existence of multiple suboptima. We compute the optimal rigid transformation by branch and bound based on a tessellation of the space of all rigid transformations and an upper bound predicted with Gaussian process regression. We illustrate the efficiency of our approach in protein structure alignment and cryo-EM docking.
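
The kernel correlation between two weighted point clouds can be sketched in a few lines. The Gaussian kernel form, the omitted normalisation constants, the bandwidth value and the toy clouds are assumptions for illustration. The coarse translation scan at the end is a simple stand-in and is not the branch-and-bound search with Gaussian process bounds described in the abstract.

```python
import math


def kernel_correlation(cloud_a, cloud_b, bandwidth=1.0):
    """Weighted sum of Gaussian kernel values over all point pairs of two clouds.

    Each cloud is a list of (weight, (x, y, z)) tuples.
    """
    total = 0.0
    for w_a, p_a in cloud_a:
        for w_b, p_b in cloud_b:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p_a, p_b))
            total += w_a * w_b * math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total


def translate(cloud, shift):
    """Return a copy of the cloud with every point moved by shift."""
    return [(w, tuple(c + s for c, s in zip(p, shift))) for w, p in cloud]


# Hypothetical reference cloud and a copy displaced by 3 units along x.
reference = [(1.0, (0.0, 0.0, 0.0)), (2.0, (1.5, 0.0, 0.0)), (1.0, (0.0, 1.5, 0.0))]
moved = translate(reference, (3.0, 0.0, 0.0))

# Coarse scan over candidate translations along x; rotations are ignored here.
candidates = [(-dx, 0.0, 0.0) for dx in (0.0, 1.0, 2.0, 3.0, 4.0)]
best_shift = max(
    candidates,
    key=lambda s: kernel_correlation(reference, translate(moved, s)),
)
print(best_shift)
```

A full comparison would optimise over all six rigid degrees of freedom; the score is non-convex in those parameters, which is why the abstract resorts to a global search rather than local refinement.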

A-045: Flavors of linear interacting peptides derived from structural data in MobiDB
  • Alexander Monzon, Department of Biomedical Sciences, University of Padova, Italy
  • Damiano Piovesan, University of Padova, Italy
  • Silvio Tosatto, University of Padova, Italy

Short Abstract: Linear Interacting Peptides (LIPs) are binding regions presumed or demonstrated to be intrinsically disordered that fold upon binding. LIPs are commonly referred to in the literature as SLIMs (Short Linear Interacting Motifs) or MoREs (Molecular Recognition Elements). Different databases, such as ELM, IDEAL, DIBS and MFIB, collect manually curated examples of LIPs, focusing on apparently different aspects of this phenomenon. MobiDB (url: http://mobidb.bio.unipd.it) provides derived LIPs using an automatic approach based on the new Mobi 2.0 software to extract LIPs from PDB files. The number of LIPs derived in MobiDB is ten times larger than in current curated sources, allowing us to perform a statistical analysis to improve their detection and annotation. In this work we analyzed different sequence-structure features, such as length, composition, structural environment (DNA/RNA, Homo-oligomers and Hetero-oligomers) and percentage of disorder, to assess different flavors of LIPs. We found that most of the LIPs annotated in MobiDB are disordered and that their amino acid frequencies are correlated with the ones found in DisProt. Moreover, our analysis shows that each subset of LIPs is unique in terms of residues and proteins in common. This suggests different types of LIPs, possibly related to their biological function.

A-046: A supervised but interpretable coevolutionary predictor of protein-protein interactions
  • Maureen Muscat, Sorbonne Université, France
  • Giancarlo Croce, Sorbonne Université, France
  • Edoardo Sarti, Sorbonne Université, France
  • Martin Weigt, Sorbonne Université, France

Short Abstract: Computational predictions of protein-protein interactions (PPIs) are limited by factors such as the narrowness of protein-protein interfaces, the mixed signals due to multiple interaction partners, and the scarcity of co-crystallized structures. Recently, deep learning methods attained unprecedented accuracy levels in 3D structure prediction, but the learning procedure needs large amounts of structural data, typically not available for PPI. Direct Coupling Analysis (DCA) is an unsupervised coevolution-based method that has been employed with success on domain and protein structure prediction starting exclusively from sequence data. Nonetheless, in most PPI cases the resulting contact map is too noisy to extract reliable predictions, and supervision would be needed to extract signal from noise. We defined a small set of local filters, based on available PDB structures for domain-domain interfaces. These filters, applied to the outcome of DCA and combined via a simple step of supervised learning, are found to significantly improve the predictions of DCA. We find that the performance is comparable to deep-learning based methods like PConsC4 and RaptorX, even in the case when very limited data sets were used for supervision.

A-047: Network analysis of synonymous codon usage
  • Khalique Newaz, University of Notre Dame, United States
  • Gabriel Wright, University of Notre Dame, United States
  • Jacob Piland, University of Notre Dame, United States
  • Jun Li, University of Notre Dame, United States
  • Patricia Clark, University of Notre Dame, United States
  • Scott Emrich, University of Tennessee, United States
  • Tijana Milenkovic, University of Notre Dame, United States

Short Abstract: Most amino acids are encoded by multiple synonymous codons. For a given amino acid, some of its synonymous codons are used significantly more rarely than others. Analyses of the positions of such rare codons in protein sequences revealed that rare codons can impact co-translational protein folding and that the positions of many rare codons are evolutionarily conserved. Analyses of the positions of rare codons in proteins’ 3-dimensional structures, which are “richer” in biochemical information than sequences, might further explain the role of rare codons in protein folding. We therefore analyze a large protein set recently annotated with codon usage information, considering sequence non-redundant proteins with sufficient structural information. We model the proteins’ structures as networks and study potential differences between network positions (centralities) of amino acids encoded by evolutionarily conserved rare, evolutionarily non-conserved rare, and commonly used codons. In 84% of the proteins, at least one of the three codon types occupies significantly more or less network-central positions than the other codon type(s). Different protein groups showing different codon centrality trends (i.e., different types of relationships between network positions of the three codon types) are enriched in different biological functions, implying the existence of a link between codon usage, protein folding, and protein function.

A-048: CONAN: Interactive contact network analysis and visualization of ensembles of molecular structures
  • Markus Schneider, Technical University of Munich, Germany
  • Iris Antes, Technical University of Munich, Germany

Short Abstract: Structure-based network analysis methods enjoy growing popularity for the investigation of structural and dynamical features of proteins. They are often used to study allostery and intramolecular signaling, an elusive yet important topic due to its potential impact on drug and protein design. However, often allosteric effects are not observable in individual crystal structures, but emerge from dynamic fluctuations within the system. We present CONAN (COntact Network ANalyzer), a Cytoscape 3 plugin that focuses on the analysis of dynamic networks in biomolecules generated from time-dependent interaction data which can be obtained by molecular simulations. We provide several analysis functions that make use of this timeline data, e.g. timeline correlation, autocorrelation, lifetime, time frame clustering and network comparison. In addition, the plugin allows for parallel network and structure visualization by connecting CONAN to either the PyMOL, VMD, or UCSF Chimera GUIs. The overall aim of the plugin is to provide a tool for analysis of ensembles of molecular structures that is (1) easy to learn and use, (2) capable of fully interactive side-by-side network and structure visualization and (3) compatible with a large system of network analysis tools.

A-049: In silico structural characterization and molecular docking for human TAS2R16 receptor
  • Catiane Souza, Laboratório de Pesquisa em Microbiologia das Universidade Estadual de Feira de Santana, Brazil
  • Geovane Araujo, Laboratório de Bioinformática e Química Computacional da Universidade Estadual do Sudoeste da Bahia - Jequié, Brazil
  • Samille Gonçalves, Laboratório de Pesquisa em Microbiologia das Universidade Estadual de Feira de Santana, Brazil
  • Aristóteles Goes-Neto, Laboratório de Biologia Molecular e Computacional de Fungos da Universidade Federal de Minas Gerais, Brazil
  • Raquel Benevides, Laboratório de Pesquisa em Microbiologia das Universidade Estadual de Feira de Santana, Brazil
  • Bruno Silva Andrade, Universidade Estadual do Sudoeste da Bahia, Brazil

Short Abstract: Autism is a rare psychiatric disorder characterized by imbalanced intellectual development, which impairs the ability to socialize and, in some cases, motor coordination. This condition occurs due to genetic alterations that affect the normal development of the central nervous system. A range of genetic factors is involved in autism, one of which is related to G-protein–coupled taste receptors. According to some authors, genetic polymorphism in these molecules is responsible for different levels of autism. The TAS2R16 receptor is associated with detecting the bitter taste of molecules such as sesquiterpene lactones, clerodane and labdane diterpenoids, strychnine and denatonium. In this work, we aimed to construct 3D structures of the human TAS2R16 receptor, based on its normal gene sequences, and to perform molecular docking with different bitter-taste molecules in order to describe active site interactions. For 3D construction, we used Modeller 9.21, followed by an AMBER 14 energy minimization of 5000 cycles of steepest descent and 5000 cycles of conjugate gradient to adjust the protein structures. The structure was validated using the QMEAN, ANOLEA and Procheck programs. Docking results were obtained with Autodock Vina, and 2D ligand interaction maps were constructed using Accelrys Discovery Studio 2.5.

A-050: Probing the effect of mutations on the Interactome
  • Jean Marc Kwasigroch, Université Libre de Bruxelles, Belgium
  • Marianne Rooman, Université Libre de Bruxelles, Belgium
  • Fabrizio Pucci, ULB, Belgium

Short Abstract: To better understand how missense mutations result in disease phenotypes, it is now clear that their impact has to be investigated in the context of biological networks. Here we present a multi-scale method to analyze the effect of mutations on the protein-protein interaction network (PPIN). After introducing the statistical potentials that are the key ingredients of our analysis at the molecular scale, we will briefly show how this structural and energetic information can be combined to predict the change upon mutation of the folding free energy of a protein and of its binding affinity for an interacting partner. The combination of these methods (called PoPMuSIC and BeAtMuSiC) will then be applied at the Interactome scale to predict the “edgotype” of a mutation, namely whether the mutation induces a "node" removal in the PPIN, is likely to lead to an edgetic perturbation, or has essentially no effect on the network. The systematic characterization of a mutation's "edgotype" is a fundamental step towards understanding genotype-phenotype relations and leads to a deeper understanding of the perturbed Interactome, which will be an invaluable asset in drug design for the choice of therapeutic strategies.

A-051: Novel analysis and data visualization of molecular dynamics trajectories.
  • Dibyajyoti Maity, Indian Institute of Science, Bengaluru, India
  • Debnath Pal, Indian Institute of Science, India

Short Abstract: With the advent of high-performance computing, microsecond-long MD simulations of proteins can be easily performed. Since visual inspection of such a huge amount of data is not feasible, statistical analysis and efficient representation of the data are required. To understand the electric environment around a molecule, electrostatic potentials are calculated by solving the Poisson-Boltzmann equation and visualized by coloring the molecular surface according to the electrostatic potential in the vicinity. This is generally done for a single conformation, and the visual comparison of such surfaces from different molecules is difficult. Instead, we calculated the electrostatic potential for an ensemble of conformations from the MD trajectory and plotted the mean electrostatic potential on the molecular surface per residue. This enabled us to locate residues responsible for significantly perturbing the electric environment around the molecules. We found that the electrostatic potential on the surface is significantly different for the crystal structure compared to the corresponding ensemble obtained from the MD trajectory. Combined with secondary structure calculation, root mean square fluctuation and circular standard deviation of dihedral angles, inferences about the structural mechanism of proteins can be made. This tool can be easily used to analyze MD simulations of any protein.
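
Of the quantities listed in this abstract, the circular standard deviation of dihedral angles is the one most easily miscomputed, because angles wrap around at ±180°. A minimal sketch using the standard definition sqrt(-2 ln R), where R is the mean resultant length, is given below; the function name and the sample angles are hypothetical and this is not the authors' tool.

```python
import math


def circular_std(angles_deg):
    """Circular standard deviation (in degrees) of a list of angles given in degrees."""
    n = len(angles_deg)
    mean_sin = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    mean_cos = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    # Mean resultant length; clamp to 1.0 to absorb floating-point rounding.
    resultant = min(math.hypot(mean_sin, mean_cos), 1.0)
    if resultant == 0.0:
        return math.inf  # angles are spread uniformly; the spread is unbounded
    return math.degrees(math.sqrt(-2.0 * math.log(resultant)))


# Hypothetical dihedral samples that straddle the +/-180 degree boundary.
samples = [179.0, -179.0]
print(round(circular_std(samples), 3))
```

For these two samples an ordinary standard deviation would report a spread of roughly 180°, although the two conformations differ by only 2°; the circular statistic stays close to 1°.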

A-052: Template-based modeling of protein complexes using the PPI3D web server
  • Kliment Olechnovic, Vilnius University, Lithuania
  • Ceslovas Venclovas, Vilnius University, Lithuania
  • Justas Dapkūnas, Vilnius University, Lithuania

Short Abstract: Comprehensive understanding of protein function requires the knowledge of how proteins interact with each other. Although the number of solved structures of protein complexes grows steadily, there is a huge gap between known protein interactions and the corresponding structures. Currently, template-based modeling is the most accurate computational method to predict these structures. It requires identification of homologous protein complexes that can serve as templates. To simplify this task, we developed PPI3D, the database of protein-protein interactions and the associated web server. Provided only the sequences of proteins, PPI3D searches for homologous protein complexes that have experimental structures in the Protein Data Bank. PPI3D also enables detailed analysis of the identified protein interaction interfaces as well as using these interfaces as templates for structure modeling. The experimental structures in PPI3D are pre-clustered according to the similarity of both protein sequences and interaction interfaces, thus reducing the data redundancy and at the same time keeping the alternative protein interaction modes. This facilitates both the analysis of experimentally solved structures and the selection of suitable templates for structure modeling. The PPI3D web server was used during recent CASP and CAPRI experiments and proved to be a highly useful tool for modeling multimeric proteins.

A-053: Long range interactions of Human Papillomavirus with human genes can cause alteration in gene expression levels of oncogenes in cervical cancer.
  • Mahua Bhattacharya, Bar Ilan University, Israel
  • Dorith Raviv Shay, The Azrieli Faculty of Medicine, Bar-Ilan University, Israel
  • Milana Frenkel-Morgenstern, The Azrieli Faculty of Medicine, Bar-Ilan University, Israel

Short Abstract: Cervical cancer accounts for 90% of gynaecological cancers worldwide. The major cause of the cancer is known to be Human papillomavirus (HPV), which integrates into the human genome, causing gene alteration. The progression of oncogenesis is largely driven by the expression of the integrated oncogenes E6 and E7, which code for the E6 and E7 oncoproteins, respectively. These viral oncoproteins are known to perturb the expression of genes. Studies show that in diseased conditions, the alteration of gene regulation is the consequence of overall chromatin topology alteration and altered TAD positions. We hypothesised that such integration can alter gene regulation by influencing the overall chromosome architecture and interacting with distal regulatory genes. To study this, we used circular chromosome conformation capture and identified the interaction profile of the E6 oncogene with various genes in the human genome. We found that integrated E6 interacts with distal genes as well as genes in its vicinity and cis-activates them by influencing the regulation of the genes it interacts with. We also see altered TAD positions, suggesting that integration perturbs gene expression and regulation and also drives oncogenic progression. Such TAD positions can be associated with the risk of cervical cancer.

A-054: Deciphering interaction fingerprints from protein molecular surfaces
  • Pablo Gainza, Ecole Polytechnique Federale de Lausanne and Swiss Institute of Bioinformatics, Switzerland
  • Freyr Sverrisson, Ecole Polytechnique Federale de Lausanne and Swiss Institute of Bioinformatics, Switzerland
  • Federico Monti, Institute of Computational Science, Faculty of Informatics, USI Lugano, Switzerland
  • Emanuele Rodola, Department of Computer Science, Sapienza University of Rome, Italy
  • Michael Bronstein, Institute of Computational Science, Faculty of Informatics, USI Lugano, Switzerland
  • Bruno E Correia, Ecole Polytechnique Fédérale de Lausanne, Switzerland

Short Abstract: Predicting interactions between proteins and other biomolecules purely based on structure is an unsolved problem in biology. A high-level description of protein structure, the molecular surface, displays patterns of chemical and geometric features that fingerprint a protein’s modes of interaction with other biomolecules. We hypothesize that proteins performing similar interactions may share common fingerprints, independent of their evolutionary history. Fingerprints may be difficult to grasp by visual analysis but could be learned from large-scale datasets. We present MaSIF, a conceptual framework based on a new geometric deep learning method to capture fingerprints that are important for specific biomolecular interactions. We showcase MaSIF with three prediction challenges: (a) protein pocket-ligand prediction; (b) protein-protein interaction site prediction, where we achieve a median ROC AUC of 0.80, compared with 0.65 for an established tool; and (c) ultrafast scanning of protein surfaces for prediction of protein-protein complexes, where we achieve runtimes up to 1000 times faster than some of the fastest docking tools. We anticipate that our conceptual framework will lead to improvements in our understanding of protein function and design.

A-055: A Global Structural Protein Interaction Network with Genomic and Phenomic Annotations
  • Janez Konc, National Institute of Chemistry, Slovenia
  • Blaž Škrlj, Department of Animal Science, Biotechnical Faculty, University of Ljubljana, SI-1000 Ljubljana, Slovenia
  • Nika Eržen, University of Ljubljana, Slovenia
  • Tanja Kunej, University of Ljubljana, Slovenia
  • Tjaša Stare, National Institute of Biology, Slovenia
  • Kristina Gruden, National Institute of Biology, Slovenia

Short Abstract: Understanding how proteins interact at the structural as well as the network level should provide novel insights into the development of complex diseases. We constructed a global structural protein interaction network, consisting of interactions between proteins and other proteins, small molecules, ions, cofactors, glycans, nucleic acids and conserved water molecules. The network includes multiple layers of biological annotations (genomic, proteomic, phenomic and structural) and consists of 76,182 protein nodes from more than 5,000 organisms and more than 3 million non-protein nodes. Human proteins are annotated with more than 30,000 diseases. Edges in the network represent physical interactions predicted from protein binding site similarity, calculated using the ProBiS local structural alignment algorithm. To enable intuitive exploration of these networks, we developed a Structural Protein Interaction Network Atlas, a web resource in which networks can be viewed globally, providing, for the first time, a view of entire interactomes. Some predicted protein-protein interactions that could not be determined by other methods were prospectively verified using yeast two-hybrid screening in the plants Arabidopsis thaliana, Solanum lycopersicum and Solanum tuberosum. The achieved accuracy of 50% correctly predicted interactions indicates a high potential of the developed network approach for guiding experimental research.

A-056: SOLart: a structure-based method to predict protein solubility and aggregation
  • Jean Marc Kwasigroch, Université Libre de Bruxelles, Belgium
  • Marianne Rooman, Université Libre de Bruxelles, Belgium
  • Qingzhen Hou, Université Libre de Bruxelles, Belgium
  • Fabrizio Pucci, ULB, Belgium

Short Abstract: The solubility of a protein is often decisive for its proper functioning. Since solubility measurements are time-consuming and expensive, there is a strong need for solubility prediction tools. We have recently introduced solubility-dependent distance potentials that are able to unravel the role of residue-residue interactions in promoting or decreasing protein solubility. Here, we extended their construction by defining solubility-dependent potentials based on backbone torsion angles and solvent accessibility, and integrated them, together with other structure- and sequence-based features, into a random forest model trained on a set of E. coli proteins with experimental structures and solubility values. We thus obtained the SOLart protein solubility predictor, whose most informative features turned out to be folding free energy differences computed from our solubility-dependent statistical potentials. SOLart’s performance is very good, with a Pearson correlation coefficient between experimental and predicted solubility values of 0.7 both in the training dataset and on an independent set of S. cerevisiae proteins. On test sets of modeled structures, only a limited drop in performance is observed. SOLart can thus be used with both high-resolution and low-resolution structures, and clearly outperforms state-of-the-art solubility predictors. It is available through a user-friendly webserver, which is easy to use for non-expert scientists.

A-057: Structural analysis of AMPK and its mutants and phosphorylation sites in breast cancer
  • Sachendra Kumar, Indian Institute of Science, India
  • Debnath Pal, Indian Institute of Science, India
  • Annapoorni Rangarajan, Indian Institute of Science, India

Short Abstract: AMP-activated protein kinase (AMPK) is a conserved master regulator of energy homeostasis and a potential therapeutic target for human diseases such as diabetes, obesity and cancer. It is a heterotrimeric protein (α1-2, β1-2 and γ1-3 isoforms), yielding 12 possible complexes of AMPKαβγ isoforms. Results from our lab’s experiments suggest that the AMPKα2 gene is more highly expressed than AMPKα1 in matrix-deprived breast cancer cells. AMPKα1 and AMPKα2 show different substrate specificities based on the substrate data available in PhosphoSitePlus. Therefore, it is important to study the conformational changes that occur during nucleotide-dependent activation of AMPKα2. Our molecular dynamics simulation of the AMPKα2β1γ1-AMP complex suggests that there is significant fluctuation of the AMPK β subunit when compared to its crystal structure. This is supported by rigid structural alignment, comparison with B-factors and structural morphing. These structural dynamics create the possibility of cis-autophosphorylation of β Ser-108, which is important for stable interaction between the α and β subunits and for allosteric activation of AMPK by drugs. Mapping mutation and phosphorylation site data provides mechanistic insight into the correlation between structure and function of AMPKα2 in breast cancer. Our findings further help in designing potent AMPKα2-targeting therapeutics using peptide-based drugs.

A-058: Exploring DOT assisted interactions in protein-RNA complexes
  • Ambuj Srivastava, Department of Biotechnology, Indian Institute of Technology Madras, India
  • Shandar Ahmad, School of Computational and Integrative Sciences, Jawaharlal Nehru University, India
  • M. Michael Gromiha, Department of Biotechnology, Indian Institute of Technology Madras, India

Short Abstract: Intrinsic disorder is a well-known characteristic of proteins, which helps in several functions such as regulation and immune response mediated by different ligand, protein, and nucleic acid interactions. Residues in a disordered region often become ordered upon binding their substrate; such regions are known as disorder-to-order transition (DOT) regions. Because of the dynamic nature of the complementary RNA component, protein-RNA complexes are more often mediated through intrinsic disorder than protein-DNA complexes. In this study, we have developed a dataset of 52 protein-RNA complexes and performed statistical analysis on various sequence and structure-based features. Results show that most proteins have a single DOT region with a length of up to 10 amino acids. The frequency and propensity of binding suggest the importance of positively charged residues. The DOT residues are also observed to have a higher binding frequency than their counterparts in the ordered regions. Further, we shortlisted 13 protein-RNA complexes having non-terminal DOT regions of at least 10 amino acids, and performed molecular dynamics simulations. Our preliminary analysis shows that DOT regions are functionally important and help in RNA binding. DOT regions are either directly involved in binding or present in the linker regions to facilitate the binding residues.

A-059: Conformational space derived from normal mode analysis: a dynamical metric for scoring 3D predictions of RNA, proteins and their complexes
  • Olivier Mailhot, University of Montreal, Canada
  • Vincent Frappier, University of Montreal, Canada
  • François Major, University of Montreal, Canada
  • Rafael Najmanovich, University of Montreal, Canada

Short Abstract: All the metrics currently used in the scoring of predictions in the RNA-Puzzles are static: they consider only one solved structure. In reality, these RNA molecules occupy vast conformational landscapes in vivo due to thermal fluctuations and the interaction with partner macromolecules. Normal mode analysis (NMA) is a fast and robust way to explore the most energetically favored conformational transitions of macromolecules, in contrast to molecular dynamics simulations, which are more precise but computationally costly and very sensitive to initial conditions, and thus hard to reproduce independently. Here, we present dynaRMSD, a metric using the most global motions obtained from NMA to rescore predicted 3D structures with respect to the experimentally resolved structure. We explore the relative improvement of the predictions according to the number of normal modes used, a critical parameter. Interestingly, this metric can change the ranking of the predictions but never changes the winning group of a particular RNA-Puzzle contest. The motions which favor the new winners are realistic, for example the slight elongation of a double-helix or the bending of a hinge-like junction. This metric is implemented in an open-source, user-friendly software package and will be submitted soon for publication.

A-060: The interaction propensity of protein surfaces is shaped by functional but also non-functional partners
  • Hugo Schweke, université Paris Sud, France
  • Marie-Hélène Mucchielli, université Paris Sud, France
  • Sophie Sacquin-Mora, IBPC, France
  • Anne Lopes, Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, UPSay, France

Short Abstract: In the crowded cell, the competition between functional and non-functional interactions is severe. Understanding how a protein binds the right piece in the right way in this complex jigsaw puzzle is crucial and very difficult to address experimentally. To interrogate how this competition constrains the behavior of proteins with respect to their partners or random encounters, we (i) performed thousands of cross-docking simulations to systematically characterize the interaction energy landscapes of functional and non-functional protein pairs and (ii) developed an original theoretical framework based on two-dimensional energy maps that reflect the propensity of a protein surface to interact. Strikingly, we show that the interaction propensity of not only binding sites but also of the rest of protein surfaces is conserved for homologous partners, be they functional or not. We show that exploring non-functional interactions (i.e. non-functional assemblies and interactions with non-functional partners) is a viable route to investigate the mechanisms underlying protein-protein interactions. Our strategy based on 2D energy maps makes this exploration efficient and automated. Moreover, our theoretical framework opens the way for the development of a variety of applications covering functional characterization, binding site prediction, or characterization of protein behaviors in a specific environment.

A-061: DMPfold: fast de novo protein modelling using iterative deep learning-based prediction of structural constraints
  • Joe Greener, University College London, United Kingdom
  • Shaun Kandathil, University College London, United Kingdom
  • David Jones, University College London, United Kingdom

Short Abstract: The impact of deep learning on protein residue-residue contact prediction has not extended to template-free (de novo) model generation. Here we introduce DMPfold, a development of the DeepMetaPSICOV contact predictor. It uses deep learning to predict inter-atomic distance bounds, the main chain hydrogen bond network, and main chain torsion angles and uses these to build models in an iterative procedure. DMPfold produces more accurate models than two popular methods for a test set of CASP12 free modelling domains, and is able to generate high-quality models without any modifications for a set of transmembrane proteins. We apply it to all Pfam domains without a known structure and provide high-confidence models for 25% of these so-called "dark" families, a calculation that takes less than a week on a cluster with 200 available cores. DMPfold provides models for 16% of human proteome UniProt entries without structures, can generate accurate models with alignments of fewer than 100 sequences in some cases, and is freely available.

A-062: Supporting Structure Prediction Method Development with Continuous Automated Model EvaluatiOn (CAMEO)
  • Anna Smolinski, SIB Swiss Institute of Bioinformatics, Basel & Biozentrum, University of Basel, Switzerland
  • Flavio Ackermann, SIB Swiss Institute of Bioinformatics, Basel & Biozentrum, University of Basel, Switzerland
  • Xavier Robin, SIB Swiss Institute of Bioinformatics, Basel & University of Basel, Switzerland
  • Rafal Gumienny, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
  • Juergen Haas, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
  • Torsten Schwede, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland

Short Abstract: Protein structure prediction has become widely used in the life sciences as methods have matured significantly over the past 10 years. Today, most structure prediction workflows are fully automated. Consequently, establishing an automated assessment and benchmarking process is key to sustained high-paced development of emerging methods. Continuously assessing structure prediction servers allows scientists, for example, to leverage the accumulated data to retrospectively select the best tool for a given scientific question. The Continuous Automated Model EvaluatiOn (CAMEO) platform has been assessing predictions for over 6’700 targets in the 3D protein structure prediction category over 377 weeks, with currently about 20 new targets being assessed each week. CAMEO features baseline structure predictors in each of its categories. Additionally, we have recently implemented a “Best Single Template” baseline comparison resembling an upper “optimal alignment” limit by employing structural superposition to scan the Protein Data Bank (PDB) for the best available template at the time of target submission. This method helps identify potential room for improvement of the template selection and alignment steps in automated protein structure modeling pipelines. Another integral part of CAMEO is the target selection, where we present the latest efforts on fully automated target validation.

A-063: e-Infrastructure for the Multi-Scale Complex Genomics Virtual Research Environment
  • Genís Bayarri, Institute for Research in Biomedicine (IRB) Barcelona, Barcelona, Spain
  • Francisco Javier Conejero, Barcelona Supercomputing Center (BSC-CNS), Barcelona, Spain
  • Adam Hospital, Institute for Research in Biomedicine (IRB) Barcelona, Barcelona, Spain
  • Marco Pasi, School of Pharmacy and Centre for Biomolecular Sciences, Nottingham, United Kingdom
  • Mark McDowall, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
  • Andrew Yates, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
  • Marc Marti-Renom, Structural Genomics Group, CNAG-CRG, The Barcelona Institute of Science and Technology (BIST), Spain
  • Rosa Badia, Barcelona Supercomputing Center (BSC-CNS), Barcelona, Spain
  • Modesto Orozco, Institute for Research in Biomedicine (IRB) Barcelona, Barcelona, Spain
  • Josep Ll Gelpí, Dept. Bioquimica i Biologia Molecular. Univ. Barcelona, Spain
  • Laia Codó, Barcelona Supercomputing Center (BSC), Spain

Short Abstract: The Multiscale Genomics (MuG) Virtual Research Environment (MuGVRE) is a cloud-based computational infrastructure created to support the deployment of software tools addressing the various levels of analysis in 3D/4D genomics. Integrated tools tackle needs ranging from computationally demanding applications to high-throughput data analysis applications. The current offering includes NGS pipelines, structure and Hi-C data analysis and visualizers, tools to handle nucleosome positioning, and applications for performing and analyzing simulation data from atomistic to coarse-grained levels. The research platform includes a seamless free-to-access web interface accessible at https://vre.multiscalegenomics.eu. Researchers are granted a private workbench space with access to the complete catalogue of ready-to-use tools and visualizers, together with data resources and other support services that together form a complete research environment. The underlying computational infrastructure is supported by the MuG cloud infrastructure, composed of two cloud systems implemented at the Institute for Research in Biomedicine and at the Barcelona Supercomputing Center. Execution scheduling is based on a multi-hosted queueing system to handle tools with fixed needs, and an elastic and multi-scale programming model (pyCOMPSs, controlled by the PMES scheduler) for complex workflows requiring distributed or multi-scale execution schemes.

A-064: Deep generative models for 3D compound design from fragment screens
  • Fergus Imrie, University of Oxford, United Kingdom
  • Anthony Bradley, Exscientia Ltd, United Kingdom
  • Mihaela van der Schaar, University of Cambridge, United Kingdom
  • Charlotte M. Deane, University of Oxford, United Kingdom

Short Abstract: Fragment-based drug discovery (FBDD) has become an increasingly important tool for finding hit compounds, in particular for challenging targets and novel protein families. A key challenge is deciding which fragment hits to follow up, and in what way. We seek to automate the elaboration of initial fragment hits in a data-driven and principled manner using machine learning techniques. We have developed graph-based deep generative methods for fragment elaboration combining state-of-the-art machine learning techniques with structural knowledge. For fragment linking, our method takes two fragment hits and designs a molecule incorporating both fragments. The generation process is protein-context dependent, utilising the relative distance and orientation between the fragments. This 3D information is vital to successful compound design, and we demonstrate the limitations of omitting such information. As far as we are aware, this is both the first application of deep learning to FBDD and the first molecular generative model to incorporate 3D structural information directly. Our method designs sensible linkers and allows fragment elaboration without the limitations of database-based methods. We believe that our research will prompt a shift in how FBDD is conducted and we are currently working on extensions of our methods to more challenging scenarios within FBDD.

A-065: RFQAmodel: Random Forest Quality Assessment to identify a predicted protein structure in the correct fold
  • Clare E. West, University of Oxford, United Kingdom
  • Charlotte M. Deane, University of Oxford, United Kingdom
  • Saulo Oliveira, Stanford Linear Accelerator Center, Stanford University, United Kingdom

Short Abstract: Template-free protein structure prediction protocols typically generate many models for each target. Reliably choosing the best model and determining whether this model is likely to be correct is a fundamental problem. We have developed the Random Forest Quality Assessment (RFQAmodel), which combines existing quality assessment scores and two predicted contact map alignment scores, and outputs for each model an estimated probability that it is in the correct fold (TM-score>=0.5). Using RFQAmodel, we classify target proteins into distinct confidence categories, with those in the high-confidence category being most enriched with modelling successes. The classifier was trained and validated on two diverse sets of 244 protein domains. On the validation set, the highest-scoring model was in the high-confidence category for 67 modelling targets, of which 52 had the correct fold. Furthermore, RFQAmodel predicted that for 59 targets all models had a TM-score of less than 0.5, which was correct in 54 cases. Similar performance was achieved for CASP12/13 free-modelling targets. Finally, by iteratively generating models and running RFQAmodel until a model is produced that is predicted to be correct with high confidence, we demonstrate how our protocol can be used to focus computational efforts on difficult modelling targets.
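
The validation counts quoted in this abstract can be turned into fractions with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract; the helper name is a hypothetical illustration and is not part of RFQAmodel.

```python
def fraction_correct(correct, total):
    """Fraction of predictions in a category that turned out to be correct."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# High-confidence targets whose top-scoring model had the correct fold (52 of 67).
high_confidence = fraction_correct(52, 67)
# Targets predicted to have no model with TM-score >= 0.5, correctly so (54 of 59).
predicted_failures = fraction_correct(54, 59)

print(f"high-confidence category: {high_confidence:.1%}")
print(f"predicted failures:       {predicted_failures:.1%}")
```

Under these counts, roughly 78% of the high-confidence calls and roughly 92% of the predicted failures were correct on the validation set.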

A-066: The landscape of atypical and eukaryotic protein kinases
  • Georgi Kanev, Amsterdam UMC - VUmc, Netherlands
  • Bart Westerman, Amsterdam UMC - VUmc, Netherlands

Short Abstract: Knowledge of the promiscuity of small molecule inhibitors towards members of the protein kinase family is very limited. This unexplored territory could have important consequences for anti-cancer therapies and, when uncovered, could result in enhanced multi-targeted personalized treatments. We have performed comprehensive sequence, structural, mutational and inhibitor binding analysis of kinase inhibitors and developed a structure-based virtual screening pipeline that uses 1D, 2D and 3D (structural) kinase-ligand information as input for deep convolutional neural networks (CNN) to predict the activity of small molecules against the kinome. This provides opportunities for therapeutic repurposing, for the development of drugs with a synergistic polypharmacological profile and improved efficacy, and for the exploration of untargeted subpockets adjacent to the ATP pocket. Critical steps in the structure-based virtual screening pipeline include integrating bioactivity data from web resources and literature, analyzing in silico-generated interaction profiles from molecular docking with the in-house developed Kinase-Ligand Interaction Fingerprints and Structures database (KLIFS), and preparing the data as input for the CNN while retaining 3D and lower-dimensional structures in the data. The pipeline achieved a root-mean-square error (RMSE) below 1.0 and a Pearson correlation above 0.62, indicating that our method can lead to the identification of multi-target kinase inhibitors with clinical relevance.

A-067: 3D spatial organization and network-guided comparison of mutation profiles in Glioblastoma reveals similarities across patients
  • Nurcan Tuncbag, Middle East Technical University, Turkey
  • Cansu Dincer, Middle East Technical University, Turkey
  • Ozlem Keskin, Koc University, Turkey
  • Atilla Gursoy, Koc University, Turkey

Short Abstract: Glioblastoma has a very heterogeneous mutation profile, which is the main challenge in interpreting the effects of mutations. The pathway-level representation of the mutations is limited. In this work, we approach these challenges through a systems-level perspective in which we analyze how the mutations in 290 GBM tumors from TCGA are distributed in protein structures/interfaces and how they are organized at the network level. As a result, while spatial arrangements of the mutations reduce the heterogeneity, network-guided analysis completes the interaction components, reveals the predominant pathways, and clusters the patients into 4 groups, each of which carries a set of signature 3D patches. Additionally, we found that driver mutations in tumor suppressors tend to be in 3D patch regions, whereas mutations in oncogenes tend to be singletons. We compared the survival curves of each group and found that the group with PI3K and TP53 patches is significantly different. We further integrate the sensitivity data of the drugs in GBM cell lines with our results to find a set of potentially effective drugs for each subgroup. We believe that this study provides a novel perspective on the analysis of mutations and a promising step towards network-guided precision medicine.

A-068: DeePpiS – towards Deep Learning for Predicting Protein-Protein Interface Positions from Sequence
  • Jaap Heringa, Vrije Universiteit Amsterdam, Netherlands
  • K. Anton Feenstra, Vrije Universiteit Amsterdam, Netherlands
  • Hans de Ferrante, Vrije Universiteit Amsterdam, Netherlands
  • Reza Haydarlou, Vrije Universiteit Amsterdam, Netherlands

Short Abstract: Protein function may best be understood through observing molecular interactions. However, experimental mapping of such interactions is costly, and continues to be outpaced by elucidation of new protein sequences. Thus, one center of interest is to predict protein interactions from sequence alone. We here address the lack of standardization in thresholds and datasets by retrieval of an orthogonal dataset from BioLiP. Our first goal is to present a benchmark of seven sequence-based, residue-level tools, including our recent method SeRenDIP. Our second goal is to assess in which way further, incremental, improvements in PPI interface prediction may be achieved. A final goal is to construct a large dataset to enable deep-learning approaches to be effectively employed. Our results show that elastic net and profile methods may be used to improve prediction performance compared to the published SeRenDIP method, and reach accuracies close to the two best methods: SSWRF and CRF-PPI. ASA and RSA, both predicted from input sequence, seem to drive classification performance of decision-tree based learners, whereas the PSSM profile data seems more important for the elastic net approach.

  • Tülay Karakulak, Izmir Biomedicine and Genome Center, Turkey
  • João Rodrigues, Stanford University, United States
  • Ezgi Karaca, Izmir Biomedicine and Genome Center, Turkey

Short Abstract: TAM (Tyro3, Axl, Mer) proteins belong to the receptor tyrosine kinase family. These receptors can be activated via their interactions with certain ligands. The most commonly studied TAM ligands are two paralogous Vitamin K-dependent proteins, i.e. Gas6 and Pros1. Although Gas6 and Pros1 sequences are significantly similar, Gas6 can bind to all TAM receptors while Pros1 can interact only with Tyro3 and Mer. Expanding on this knowledge, in this work, we aim to characterize the molecular basis of Axl’s ligand selectivity through a structure-based approach. For this, we modeled the complex structures of all possible TAM-ligand interactions, using the available Axl-Gas6 complex structure as a template. This process was followed by the refinement of each complex with HADDOCK. Strikingly, the HADDOCK score directly reflected the experimentally characterized TAM-ligand affinities. Following this, we ran 4 x 200 ns molecular dynamics simulations, yielding 800 ns ensembles for each of the Axl-Gas6 and Axl-Pros1 complexes. Detailed interaction analyses of these pairs highlighted that electrostatic interactions dominate Axl-Gas6 interactions. By interpreting these results within the context of the TAM family’s evolutionary relationships, we identified three specificity-determining positions that potentially control Axl’s selectivity for Gas6 over Pros1.

A-070: Structure-based Prediction of Terpene Synthase Product Specificity
  • Janani Durairaj, Wageningen University and Research, Netherlands
  • Aalt-Jan Van Dijk, Wageningen University and Research, Netherlands
  • Dick De Ridder, Wageningen University and Research, Netherlands

Short Abstract: Terpenes are a large class of fragrant, volatile secondary metabolites in plants. Enzymes called terpene synthases (TPSs) form terpenes from isoprenoid units, based on a small number of possible reactions. The reaction catalysed by a TPS is difficult to predict from sequence due to high, phylogenetically biased sequence diversity. However, all known TPS structures share a common structural fold. We make use of this fact by combining homology modelling and structure-based machine learning to predict TPS product specificity. As many residue-level descriptors can be extracted from a single enzyme, our prediction framework is an ensemble of classifiers designed to deal with a large number of features from varied information sources. This also allows for easy selection of predictive residues and features from the trained model. One-clade-out validation on over 250 TPSs shows that structure-based prediction outperforms sequence-based approaches. We selected 70 previously uncharacterized putative TPSs and experimentally determined their products. Results on this independent test set show our predictor’s applicability to the thousands of uncharacterized TPSs across all plant species. Predictive residues, likely to be crucial to the reaction mechanism, cluster in the active site cavity and have distinguishing feature distributions. This knowledge can be utilized for engineering reaction-specific TPSs.
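
The one-clade-out validation mentioned in this abstract can be read as a leave-one-group-out scheme: every enzyme of one clade is held out for testing while the model is trained on all other clades. The sketch below illustrates that reading under stated assumptions; the function name and clade labels are hypothetical and the authors' actual protocol may differ.

```python
from collections import defaultdict


def one_clade_out_splits(clade_labels):
    """Yield (held_out_clade, train_indices, test_indices), one split per clade."""
    by_clade = defaultdict(list)
    for index, clade in enumerate(clade_labels):
        by_clade[clade].append(index)
    for clade in sorted(by_clade):
        test_indices = by_clade[clade]
        held_out = set(test_indices)
        # Train on every enzyme that does not belong to the held-out clade.
        train_indices = [i for i in range(len(clade_labels)) if i not in held_out]
        yield clade, train_indices, test_indices


# Hypothetical clade assignments for six enzymes, for illustration only.
clades = ["a", "a", "b", "c", "b", "c"]
splits = list(one_clade_out_splits(clades))
for clade, train, test in splits:
    print(f"hold out clade {clade}: train={train} test={test}")
```

Holding out whole clades, rather than random enzymes, keeps close relatives of the test enzymes out of the training set, which matters when sequence diversity is phylogenetically biased.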

A-071: TopoBuilder: Expanding and Functionalizing the Protein Fold Space
  • Fabian Sesterhenn, Ecole Polytechnique Fédérale de Lausanne, Switzerland
  • Che Yang, Ecole Polytechnique Fédérale de Lausanne, Switzerland
  • Zander Harteveld, Ecole Polytechnique Fédérale de Lausanne, Switzerland
  • Bruno E Correia, Ecole Polytechnique Fédérale de Lausanne, Switzerland
  • Jaume Bonet, Ecole Polytechnique Fédérale de Lausanne, Switzerland

Short Abstract: Computational de novo protein design holds the promise to expand the topological space and boost our understanding of the rules guiding protein folding. The ability to create novel structures has been used to tackle a broad range of biomedical and biotechnological challenges for which no suitable structural conformation was available (such as the design of novel therapeutics and immunogens). However, most current approaches require a high level of understanding of the target topology and primarily focus on the stabilisation of the final design. To tackle these limitations, we developed TopoBuilder, a protocol mixing parametric and heuristic protein design to generate de novo scaffolds. TopoBuilder transforms 2D projections of a protein’s secondary structures into full 3D designs, using statistically observed local correlations to ensure a natural-like disposition of the structural elements. Thanks to its simplified starting requirements, the protocol provides the tools to systematically explore the structural space in a way no currently available tool offers. Furthermore, with its fine control of the secondary structure placement, TopoBuilder is capable of tailoring scaffolds around structurally complex functional motifs, setting the most favourable context for their stabilisation and presentation. This makes it the first tool to design de novo scaffolds around previously known structural motifs.

A-072: Modeling TCR-p-MHC complexes relevant for T-cell therapy development
  • Iris Antes, Technical University of Munich, Germany
  • Lukas Wietbrock, Technical University of Munich, Germany
  • Manuel Glaser, Technical University of Munich, Germany

Short Abstract: T-cell receptors (TCR) are important for the adaptive immune response as they distinguish between self- and non-self peptides (p) presented by Major Histocompatibility complexes (MHC) on the cell surface. Adoptive T-cell therapies aim at the identification and design of high avidity TCR-variants, which efficiently eliminate virus-infected and aberrant cells. Thus, computational studies elucidating the molecular basis of TCR binding and activation as well as the prediction of TCR-p-MHC complex structures are of growing importance for therapy development. We contribute to this field by developing new tools for TCR-p-MHC structure prediction as well as performing modeling studies of (TCR-)p-MHC complexes, which are experimentally investigated by our collaborators. For this, we combine different approaches such as homology modeling, molecular docking, structural prediction of TCR domain angles, and molecular dynamics. These studies led to new insights into the structural basis of TCR binding and allowed us to identify molecular properties important for the immunogenicity of peptide-MHC complexes (Hoffmann, T., et al., PLoS Comput Biol, 2015, 11(7), BMC Struct Biol, 2017; Karimzadeh, H., et al., Journal of Virology, 2018, 92(13)). Therefore, our results contribute to a better understanding of T-cell activation and to efficient and safe in silico-driven T-cell therapy development.

A-073: ResiRole: Residue-Level Functional Site Predictions to Gauge the Average Accuracies of Protein Structure Prediction Techniques
  • Joshua Toth, Geisinger Commonwealth School of Medicine, United States
  • Paul DePietro, Geisinger Commonwealth School of Medicine, United States
  • William McLaughlin, Geisinger Commonwealth School of Medicine, United States
  • Juergen Haas, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland

Short Abstract: The Continuous Automated Model Evaluation (CAMEO) platform provides quality assessments for individual protein structural models and overall performance estimates for structure prediction techniques. Here we describe a method to further estimate the average accuracies of structure prediction techniques according to their capacities to produce structural models which exhibit functional site predictions like those found in the corresponding experimentally determined reference structures. We utilized the FEATURE program to provide the probabilities of functional sites that are centered on specific residues within structural models and the reference structures. We measure the correlation coefficients and the subtracted differences between the cumulative probabilities of the functional predictions of the reference structures and the structural models. Average scores are used in head-to-head, round-robin pairwise comparisons between structural prediction techniques. The results provide a relatively robust manner to rank the structure prediction techniques according to their capacities to enable accurate functional site predictions. Further evidence that structure prediction techniques can accurately reconstitute the structural features found at local functional sites is thereby provided. A study of amino acid types at predicted functional sites revealed that across the various structure prediction techniques they more accurately reconstitute functional sites centered on some amino acid residue types over others.

A-074: Model Archive – a deposition system for persistent archiving of theoretical protein structure models
  • Dario Behringer, SIB Swiss Institute of Bioinformatics, Basel & University of Basel, Switzerland
  • Gerardo Tauriello, SIB Swiss Institute of Bioinformatics, Basel & University of Basel, Switzerland
  • Andrew Waterhouse, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
  • Juergen Haas, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
  • Torsten Schwede, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland

Short Abstract: Since 2006, the Protein Data Bank (PDB) only archives protein structures that have been determined experimentally. Recently, PDB-Dev has been established as a prototype archiving system for structural models obtained through integrative/hybrid (I/H) methods. However, purely theoretical models of macromolecular structures cannot be deposited in either system. At the same time, the application of computational macromolecular models in life science research has rapidly increased in recent years thanks to improved accuracy of the prediction algorithms and more reliable methods for model quality estimation. Archiving of those models is crucial for interpretation and reproducibility of results published in the scientific literature. To address this need, we have created Model Archive following a community workshop recommendation with the aim of persistent archiving of these models. Model Archive (https://modelarchive.org/) provides stable accession codes (DOI) for depositions associated with a publication. Besides the model coordinates, details about modelling methods, parameters and constraints are archived. Reviewers can access these details to check a model prior to publication. All models are searchable and publicly accessible under an open license following FAIR data principles. To date, the Model Archive contains 1'564 theoretical model depositions.

A-075: De novo protein design of potent hematopoietic agents
  • Mohammad Elgamacy, Friedrich Miescher Laboratory of The Max Planck Society, Germany
  • Birte Hernandez-Alvarez, Max Planck Institute for Developmental Biology, Germany
  • Murray Coles, Max Planck Institute for Developmental Biology, Germany
  • Patrick Mueller, Friedrich Miescher Laboratory of The Max Planck Society, Germany

Short Abstract: Recent advances in protein design have demonstrated the capacity of physics-based calculations to navigate towards previously unobserved sequence-structure relations. The principal motivation behind these efforts is to precisely generate novel proteins with tailored functions and biophysical features. For the de novo design of pharmacologically active proteins, two prerequisites must be fulfilled: the biochemical target activity must be understood at an atomic structure-function level of detail, and computational design methods that are accurate to the atomic level must be available. Achieving the latter can enable single-step in silico design-to-in vivo testing. The elimination of intervening empirical optimisation steps can thus promise highly efficient and expedited therapeutic lead discovery. In this work, we designed eight novel hematopoietic agents using two different computational design strategies. All of the experimentally tested designs were folded, monomeric and stable. We further proceeded to solve NMR structures of three representative molecules, which agreed with the design models at atomic accuracy. Finally, we evaluated the designed hematopoietic activity for these molecules through in-cell, ex vivo and in vivo assays, where three molecules were shown to possess specific, nanomolar activity.

A-076: Symmetry in Protein Complexes
  • Guillaume Pages, INRIA, France
  • Sergei Grudinin, Inria / CNRS, France

Short Abstract: Many protein complexes in the Protein Data Bank (PDB) are symmetric homo-oligomers. Indeed, it appears that large symmetrical protein structures have evolved in many organisms because they carry specific advantages compared to individual proteins. There is therefore considerable interest in studying these structures. We developed a computational tool called Ananas. For a given symmetry group, it analytically computes the best symmetry axes by minimizing the symmetry-aware root-mean-square deviation (RMSD) over the transformation operators of that group. The method also computes the corresponding RMSD-based symmetry measure. We also proposed another symmetry detection method, called DeepSymmetry, that does not require an atomistic representation of the input structures. Instead, it operates on electron density maps and uses convolutional neural networks that find the order and the axis of symmetry. We performed an exhaustive analysis of all symmetric structures in the PDB. We have found that cubic groups are better organized than the others. We have also found that the quality of assembly packing does not depend on the size of the assembly. However, for the dihedral symmetries, assemblies with an even order of symmetry are more stable than odd-order assemblies. Also, the C6 assemblies are the least organized ones, which can be related to their function.

A-077: SWISS-MODEL – automated structure prediction of protein complexes in a web-based system
  • Gerardo Tauriello, SIB Swiss Institute of Bioinformatics, Basel & University of Basel, Switzerland
  • Martino Bertoni, SIB Swiss Institute of Bioinformatics, Basel & University of Basel, Switzerland
  • Stefan Bienert, SIB Swiss Institute of Bioinformatics, Basel & University of Basel, Switzerland
  • Xavier Robin, SIB Swiss Institute of Bioinformatics, Basel & University of Basel, Switzerland
  • Florian Heer, SIB Swiss Institute of Bioinformatics, Basel & University of Basel, Switzerland
  • Tjaart de Beer, SIB Swiss Institute of Bioinformatics, Basel & University of Basel, Switzerland
  • Lorenza Bordoli, SIB Swiss Institute of Bioinformatics, Basel & University of Basel, Switzerland
  • Rosalba Lepore, SIB Swiss Institute of Bioinformatics, Basel & University of Basel, Switzerland
  • Gabriel Studer, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
  • Christine Rempfer, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
  • Rafal Gumienny, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
  • Andrew Waterhouse, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
  • Torsten Schwede, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland

Short Abstract: For 25 years, SWISS-MODEL has been providing life science researchers worldwide with high-quality protein structure models. In 2018, the server generated approximately 1.27 million protein structure models, i.e. about 2.4 models per minute were requested by users through the interactive modelling web interface, and SWISS-MODEL was referenced in 2,421 scientific publications. The SWISS-MODEL Repository provides up-to-date models for the complete proteomes of 12 selected model organisms. Here we present recent advances in SWISS-MODEL: enhanced capabilities for automated modelling of homo- and hetero-oligomeric protein complexes, the open source modelling engine ProMod3 which lies at the core of SWISS-MODEL, and a new web service for assessing the quality of protein structures which offers an intuitive way to identify structural irregularities. The automated SWISS-MODEL pipeline is continuously benchmarked against other state-of-the-art methods within the CAMEO project, where we observe that SWISS-MODEL generates accurate models in their biologically relevant oligomeric state with accurate estimates of model quality, while nevertheless having by far the lowest response time in this benchmark.

A-078: QMEANDisCo – Distance Constraints Applied on the Local Model Quality Estimation Problem
  • Gabriel Studer, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
  • Christine Rempfer, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
  • Rafal Gumienny, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
  • Andrew Waterhouse, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
  • Juergen Haas, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland
  • Torsten Schwede, SIB Swiss Institute of Bioinformatics & University of Basel, Switzerland

Short Abstract: Modelling methods, in particular homology / comparative modelling, have established themselves as a valuable complement to structural analysis when experimental data are missing. While such methods have matured into pipelines that can generate models for almost any protein automatically, the quality of the generated models can be highly variable and hard to estimate in the absence of experimental observables. This is a major concern as the range of applications for which a model can be used directly depends on its quality, hence the importance of quality estimation methods. Here we present the recent advances in QMEANDisCo. QMEANDisCo is the default quality estimation method employed by the SWISS-MODEL homology modelling server. It is a composite score relying on a combination of knowledge-based terms and a new distance constraint (DisCo) score. DisCo assesses the agreement of observed pairwise distances in a model with an ensemble of constraints extracted from experimentally determined structures that are homologous to the model being assessed. All scores are combined with feed-forward neural networks to predict local per-residue scores. QMEANDisCo is continuously benchmarked within the CAMEO project and participated in CASP13. In both, it ranks among the top performers and excels with low runtimes.
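
The idea of scoring a model against distance constraints from homologous structures can be illustrated with a minimal sketch. This is not the QMEANDisCo functional form, which the abstract does not specify; the function name `disco_like_scores`, the tolerance `tol`, and the "fraction of homolog distances within tolerance" agreement measure are all assumptions made for illustration.

```python
from collections import defaultdict


def disco_like_scores(model_dist, homolog_dists, tol=1.0):
    """Toy per-residue agreement score (hypothetical, not the published method).

    model_dist:    {(i, j): distance between residues i and j in the model}
    homolog_dists: {(i, j): [distances for the same pair seen in homologs]}
    tol:           assumed tolerance (same unit as the distances)
    """
    per_residue = defaultdict(list)
    for pair, d_model in model_dist.items():
        observed = homolog_dists.get(pair, [])
        if not observed:
            continue  # no constraint available for this residue pair
        # Fraction of homolog distances that the model distance agrees with.
        agreement = sum(abs(d_model - d) <= tol for d in observed) / len(observed)
        for residue in pair:
            per_residue[residue].append(agreement)
    # Average the pairwise agreements into one local score per residue.
    return {res: sum(vals) / len(vals) for res, vals in per_residue.items()}


# Illustrative, made-up distances.
example_scores = disco_like_scores(
    {(1, 2): 5.0, (1, 3): 8.0},
    {(1, 2): [5.2, 4.9, 7.5], (1, 3): [8.1, 8.3]},
)
```

In this toy version, a residue whose pairwise distances match most homolog-derived distances receives a score near 1. The abstract states that the real method combines such scores with knowledge-based terms through feed-forward neural networks, which is not attempted here.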

A-079: Structural signatures: a robust descriptor of cellular gene expression change.
  • Rayees Rahman, Icahn School of Medicine, United States
  • Avner Schlessinger, Icahn School of Medicine, United States

Short Abstract: Gene expression profiles have been used in a variety of contexts, from categorizing cancer subtypes to identifying perturbed cellular pathways in response to therapeutic intervention, and have enabled the repositioning of drugs for new therapeutic indications. Gene expression signatures have the capability of reducing the complexity of context-specific gene expression data. However, gene expression signatures can show large amounts of variability across distinct experimental samples, which hampers the discovery of robust signatures. As a result, transcriptional signatures may be noisy and unstable across analyses, and have had limited clinical utility. Because each gene encodes a three-dimensional protein structure, often consisting of multiple domains, that determines its function, we investigated whether using the structural profile of differentially expressed genes can result in a stable expression signature across distinct experiments. We show that structural profiles from overexpressed genes indeed define a more robust signature that is consistent across samples. Distinct structural signatures also define each organ and tissue type from the Genotype-Tissue Expression Project (GTEx).

A-080: JET2DNA: a new tool for the accurate prediction of DNA-binding sites and identification of their properties.
  • Flavia Corsi, Sorbonne Université, CNRS, Laboratory of Computational and Quantitative Biology — UMR 7238, Paris, France, France
  • Richard Lavery, Lyon University, CNRS, IBCP, UMR 5086, Molecular Microbiology and Structural Biochemistry, 69367 Lyon, France, France
  • Elodie Laine, Sorbonne Université - Laboratory of Computational and Quantitative Biology (LCQB, CNRS-SU), France
  • Alessandra Carbone, Sorbonne Université, France

Short Abstract: Interactions between proteins and DNA play a fundamental role in many essential biological processes. The substantial gap between the number of known and not yet characterized protein-DNA complexes calls for computational methods that can predict the key amino acids involved in DNA binding and help decipher the characteristics of these DNA-binding sites, in order to understand their specific functions. We developed JET2DNA, a new tool for the accurate prediction of DNA-binding sites on protein structures. It employs three sequence- and structure-based descriptors: sequence evolutionary conservation, interface residue propensities and geometrical properties. By combining these descriptors in different ways, we equipped the tool with three scoring strategies that identify different classes of DNA-binding sites and help decipher the properties of predicted regions. We assessed JET2DNA performance on 187 bound and 82 unbound conformations of DNA-binding proteins. It proved to be robust to conformational changes between bound and unbound states and to outperform two other prediction tools on both datasets. Moreover, it gives more interpretable results than those tools in terms of the properties of predicted regions. Finally, JET2DNA proved able to detect alternative functional DNA-binding sites on the same proteins. This feature is crucial in helping drug design and repurposing.

A-081: The evolution of contact prediction: evidence for non-random selection of contacts in statistical protein contact prediction
  • Charlotte M. Deane, University of Oxford, United Kingdom
  • Mark Chonofsky, University of Oxford, United Kingdom
  • Saulo Oliveira, Stanford Linear Accelerator Center, Stanford University, United Kingdom
  • Konrad Krawczyk, NaturalAntibody, Hamburg, Germany, Germany

Short Abstract: Contact prediction software, which detects inter-residue correlations in multiple sequence alignments, has transformed protein structure prediction. However, there is little evidence about the origin of these correlations, nor why these methods favour certain contacts over others. We predicted contacts for 1030 protein domains using CCMpred, MetaPSICOV, and DNCON2, as examples of direct coupling analysis, metaprediction, and deep learning methods, respectively. We compared the physico-chemical bonding interactions and conservation of the correctly-predicted contacts from these methods, as well as comparing those contacts against contacts that were not predicted. We found that predicted contacts have more bonds and are more conserved than other contacts, which highlights the importance of these contacts for protein function. Further, we found substantial variation in the bonds formed by predicted contacts and the positions of those contacts. CCMpred favours stronger bonds than MetaPSICOV and DNCON2, as well as ‘riskier’ isolated contacts. This suggests that MetaPSICOV and DNCON2 favour accuracy over physico-chemically important contacts. These results underscore the connection between protein chemistry and the couplings that can be derived from multiple sequence alignments. This relationship is likely to be relevant to protein structure prediction and may be key to understanding their utility for different problems in structural biology.

A-082: Scalable, Interactive, and Reproducible Data Mining of 3D Macromolecular Structures
  • Shih-Cheng Huang, Stanford University, United States
  • Yue Yu, University of California San Diego, United States
  • Peter Rose, University of California San Diego, United States

Short Abstract: The Protein Data Bank (PDB) represents the core data resource for Structural Bioinformatics. The rapid growth of the PDB (> 150,000 structures) enables large-scale data mining, such as development of knowledge-based potentials, docking and scoring functions, and machine learning for protein structure and function prediction. We have developed efficient data representations (MacroMolecular Transmission Format) and a scalable framework to mine the PDB using state-of-the-art Big Data Technologies (mmtf-pyspark). We have deployed applications of this framework in Jupyter Notebooks that are hosted on free public servers, including mybinder.org and CyVerse.org, enabling researchers to publish documented workflows that are reproducible and that can be re-run, modified, or used as starting-points for new structural analyses. We present our approach of using Apache Spark and columnar data formats to scale structural analysis to enable the interactive exploration of the PDB archive, as well as scalable data integration. We demonstrate these capabilities by creating representative subsets of the PDB for machine learning applications and the mapping and visualization of post-translational modifications from proteomics experiments and genomic variations to 3D structures in the context of protein-protein/nucleic acid/ligand/drug interactions. We also cover best practices of deploying these workflows on public servers to enable reproducibility and reuse.

A-083: The Impact of Conformational Entropy on the Accuracy of the Molecular Docking Software FlexAID in Binding Mode Prediction (III).
  • Louis-Philippe Morency, University of Montreal, Canada
  • Rafael Najmanovich, University of Montreal, Canada

Short Abstract: We introduce the latest version of Flexible Artificial Intelligence Docking (FlexAID) that allows its scoring function to consider the conformational entropy of ligands in complex with their biological targets. We present the impact of FlexAID’s newest feature on its accuracy in binding mode prediction using three increasingly complex scenarios: the Astex Diverse Set, the Astex Non Native Set and HAP2. We show that FlexAID outperforms other open-source molecular docking methods when molecular flexibility is crucial. The improved accuracy of FlexAID on complex cases, the addition of novel features, i.e., the normal mode analysis, its accessibility and its easy-to-use graphical user interface suggest that FlexAID is in an interesting position to tackle biologically challenging and pharmacologically relevant situations currently ignored by other methods. Furthermore, FlexAID now outputs statistical thermodynamic parameters, i.e., ∆G, ∆H and -T∆S, as well as multiple fluid conformations that are computed for each predicted binding mode. These two unique features are useful in the dynamic visualization of results, in a more thorough energy comparison between different ligands (relative ranking of molecules by affinity and virtual screening) and in the analysis of conformational entropic contributions to the energy of formation of a complex of interest.
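
The three reported quantities are linked by the standard relation ∆G = ∆H - T∆S, which is what makes a relative ranking of ligands by affinity possible. The sketch below only illustrates that arithmetic; the ligand names and energy values are made up, and nothing here reflects FlexAID's actual output format.

```python
def binding_free_energy(delta_h, minus_t_delta_s):
    """Combine the enthalpic and entropic terms: dG = dH + (-T*dS)."""
    return delta_h + minus_t_delta_s


# Hypothetical (dH, -T*dS) pairs in arbitrary energy units, for illustration only.
ligands = {
    "ligand_a": (-12.0, 3.5),
    "ligand_b": (-9.0, 1.0),
    "ligand_c": (-10.5, 1.5),
}

# A more negative dG means a more favourable predicted binding, so sort ascending.
ranked = sorted(ligands, key=lambda name: binding_free_energy(*ligands[name]))
```

With these made-up numbers, `ligand_a` has the most favourable enthalpy but pays the largest entropic penalty, so it no longer ranks first once -T∆S is included.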

A-084: Evolution of domain structure within repetitive proteins
  • Aleix Lafita, EMBL-EBI, United Kingdom
  • Jennifer Potts, University of York, United Kingdom
  • Alex Bateman, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom

Short Abstract: Most proteins anchored at the bacterial cell wall are highly repetitive and evolve rapidly. Recent experimental structures have revealed that some of these repeating units fold into single globular domains, forming rod-like stalks of tandem domains. A number of these domains have proven difficult to classify due to their rare sequences and structures and have required the creation of new protein families. Here we present one example, the Rib domain, for which three evolutionarily related variants have been found. Rib domains are part of the immunoglobulin fold, but experimental structures show loss of beta strands and conversion of a beta strand to an alpha helix. These structural variations in the Rib domains are not isolated cases and similar examples have been found in other repetitive domains. We believe that these repetitive domains can be ideal candidates to study structural domain evolution in proteins.

A-085: Structure-Informed Variant Prioritization in Personalized Oncology
  • Siao-Han Wong, German Cancer Research Center (DKFZ), Germany
  • Benedikt Brors, German Cancer Research Center (DKFZ), Germany

Short Abstract: In personalized oncology, treatment strategies take individualized genomic aberrations into consideration. The interpretation of their pathogenicity relies on our understanding of regional significance within genes. There have been initiatives identifying mutation hotspots based on protein sequences or protein structures, where residues distant in sequence can be brought together through protein folding. Nonetheless, this information is not well utilized in the clinical setting. In this study, we compiled prior knowledge as a hotspot list to molecularly stratify patients, and further considered the structural neighborhood to gain insights into underlying mechanisms. Specifically, 1,572 residue hotspots were predicted from three sequence approaches and an additional 174 structural hotspots from three structural algorithms. By setting up a mechanism to map genomic variants to protein structures, we were able to annotate the structural neighborhood of variants and investigate the interplay between them. Focusing on protein-protein interaction interfaces, we have identified interfaces preferentially disrupted and also the exact contributing mutations. In summary, this approach enables us to quickly prioritize variants in patient genomes by their prevalence, and further annotate structural significance to inform potential mechanisms behind driving events. This would help facilitate clinical evaluations to meet the growing number of patients recruited to personalized oncology trials.

A-086: A New Antibody CDR Structural Database
  • Simon Kelow, University of Pennsylvania, United States
  • Roland Dunbrack, Fox Chase Cancer Center, United States

Short Abstract: Antibodies are the largest family of solved structures in the Protein Data Bank (PDB), with ~3,500 structures currently available. Characterizing the structural features of the antibody complementarity-determining regions (CDRs) remains an important step in understanding antibody structure, and research tasks such as antibody design and classification rely on robust characterizations. We previously clustered the conformations of CDRs (North et al., J Mol Biol 406, 228-256) with a backbone dihedral angle metric and an affinity propagation algorithm. Many of the clusters established at that time remain small with very few sequences. We have re-clustered the antibody CDR conformations using a new maximum distance dihedral clustering metric, alongside an implementation of the DBSCAN clustering algorithm. Given a dataset that has quadrupled in size, we have an opportunity to determine a new set of “canonical” clusters with improved sequence features compared to the previous clustering. Here we detail the development of a new antibody database relating CDR structural information to antibody sequence information based on this new clustering.
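
A maximum-distance dihedral metric can be sketched as follows. The abstract does not give the formula, so this assumes the per-angle distance 2(1 - cos(θ1 - θ2)) and takes the maximum over all backbone dihedrals of two equal-length loops; the function names and the (phi, psi) input layout are hypothetical.

```python
import math


def angle_distance(a_deg, b_deg):
    """Periodic distance between two angles in degrees (assumed form)."""
    return 2.0 * (1.0 - math.cos(math.radians(a_deg - b_deg)))


def max_dihedral_distance(conf_a, conf_b):
    """Largest per-angle distance between two loops of equal length.

    conf_a, conf_b: lists of (phi, psi) tuples in degrees, one per residue.
    """
    if len(conf_a) != len(conf_b):
        raise ValueError("conformations must have the same number of residues")
    return max(
        angle_distance(a, b)
        for res_a, res_b in zip(conf_a, conf_b)
        for a, b in zip(res_a, res_b)
    )


def pairwise_matrix(conformations):
    """Symmetric distance matrix suitable for a precomputed-metric clusterer."""
    n = len(conformations)
    return [
        [max_dihedral_distance(conformations[i], conformations[j]) for j in range(n)]
        for i in range(n)
    ]
```

Taking the maximum rather than the average means a single badly mismatched residue is enough to separate two loops. A matrix like the one returned by `pairwise_matrix` could then be handed to a DBSCAN implementation that accepts precomputed distances, for example scikit-learn's `DBSCAN(metric="precomputed")`.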

A-087: Improving the prediction of loops and drug binding in GPCR structure models
  • Bhumika Arora, Indian Institute of Technology Bombay, Monash University, IITB-Monash Research Academy, India
  • Venkatesh Kareenhalli, Indian Institute of Technology Bombay, India
  • Denise Wootten, Monash University, Australia
  • Patrick Sexton, Monash University, Australia

Short Abstract: G protein-coupled receptors (GPCRs) form the largest group of potential drug targets, and knowledge of their three-dimensional structure is therefore important for rational drug design. Homology modeling serves as a common approach for modeling transmembrane helical cores of GPCRs; however, these models have varying degrees of inaccuracy that result from the quality of the template used. We have explored the extent to which inaccuracies inherent in homology models of the transmembrane helical cores of GPCRs can impact loop prediction. We found that loop prediction in GPCR models is much more difficult than loop reconstruction in crystal structures owing to the imprecise positioning of loop anchors. Therefore, minimizing the errors in loop anchors is likely to be critical for optimal GPCR structure prediction. To address this, we have developed a ligand directed modeling (LDM) method comprising geometric protein sampling and ligand docking, and evaluated its capacity to refine GPCR models built across a range of templates with varying degrees of sequence similarity to the target. The LDM method reduced the errors in loop anchor positions and improved the prediction of ligand binding poses, resulting in much better performance of these models in virtual library screenings.

A-088: interfacea: open-source library for protein interface analysis
  • João Rodrigues, Stanford University, United States
  • Michael Levitt, Stanford University, United States

Short Abstract: Protein interactions are central to most processes in biology. Understanding why and how these interactions translate to biological function requires a detailed analysis of their 3D structures. Often, these analyses involve the identification of chemical features such as hydrogen bonds and ionic interactions, as well as quantitative measurement of their contribution to the overall stability of the system. Here, we present interfacea, a software library written in Python to facilitate the analysis of chemical and energetic features of macromolecular interfaces. Besides implementing standard algorithms to identify a variety of chemical interactions, interfacea is built on the OpenMM simulation engine to allow high-performance energy calculations that can identify both hotspots and sources of strain in the 3D structure. Our code complies with the PEP 8 standard and is available under an open-source license on GitHub. As such, we hope that interfacea not only facilitates routine analyses of the 3D structure of protein interactions, but also serves as a building block for more advanced tools in molecular docking and design.

A-089: Bayesian active learning for optimization and uncertainty quantification in protein docking
  • Yue Cao, Texas A&M University, United States
  • Yang Shen, Texas A&M University, United States

Short Abstract: We introduce a novel algorithm, Bayesian Active Learning (BAL), for optimization and uncertainty quantification (UQ) in flexible protein docking. BAL directly models the posterior distribution of the global optimum (or native structures for protein docking) with active sampling and posterior estimation iteratively feeding each other. Furthermore, we use complex normal modes to represent a homogeneous Euclidean conformation space suitable for high-dimension optimization and construct funnel-like energy models for encounter complexes. Over a protein docking benchmark set and a CAPRI set involving homology docking, we establish that BAL significantly improves on both the starting points from rigid docking and the refinements by particle swarm optimization, providing a top-3 near-native prediction for one third of the targets. BAL also generates tight confidence intervals with half range around 25% of iRMSD and confidence level at 85%. Its estimated probability of a prediction being native or not achieves binary classification AUROC at 0.93 and AUPRC over 0.60 (compared to 0.14 by chance). To the best of our knowledge, this study represents the first uncertainty quantification solution for protein docking, with theoretical rigor and comprehensive assessment.

A-090: The antibody-antigen interaction: a new approach for epitope prediction
  • Charlotte M. Deane, University of Oxford, United Kingdom
  • Anna Vangone, Roche, Germany
  • Alexander Bujotzek, Roche, Germany
  • James Dunbar, Benevolent, United Kingdom
  • Stefan Dengl, Roche, Germany
  • Guy Georges, Roche, Germany

Short Abstract: Interaction between an antibody (Ab) and its antigen (Ag) plays a vital role in the immune response and is regulated by precise structural determinants. Knowledge of the epitope, i.e. the antigen region recognized by the Ab, is fundamental in order to study the mode of action of the Ab and to engineer therapeutic antibodies with desired binding and affinity properties. In this work, we present an innovative approach for in silico prediction of epitopes. Using the Ag-Ab cases of the protein-protein benchmark v5 (Vreven et al. JMB 2015), we docked the unbound structures with ZDOCK (Chen et al. Proteins 2003). We then evaluated the frequency with which every Ag residue is in contact with any Ab residue within the pool of the docked models, in order to predict residues forming the epitope. With our approach, we are able to identify the epitope (or part of it) in 80% of the cases and we consistently rank native-like models within the top 10 positions (CONSRANK approach, Oliva et al. Proteins 2013). Importantly, our approach proves very robust when real-case scenarios are considered, i.e. when only a model of the Ab can be used, with the unbound structure of the Ag.
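
The contact-frequency step described above can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline: it assumes each docked model has already been reduced to the set of antigen residue numbers in contact with any antibody residue, and the names `contact_frequency`, `predict_epitope` and the threshold `min_frequency` are hypothetical.

```python
from collections import Counter


def contact_frequency(models):
    """Fraction of docked models in which each antigen residue is in contact.

    models: list of sets of antigen residue identifiers, one set per model.
    """
    if not models:
        return {}
    counts = Counter()
    for contacts in models:
        counts.update(set(contacts))  # count each residue once per model
    return {res: n / len(models) for res, n in counts.items()}


def predict_epitope(models, min_frequency=0.5):
    """Residues contacted in at least `min_frequency` of the models (assumed cutoff)."""
    freq = contact_frequency(models)
    return sorted(res for res, f in freq.items() if f >= min_frequency)


# Four made-up docked models, for illustration only.
docked = [{10, 11, 12}, {11, 12, 40}, {12, 13}, {11, 12}]
epitope = predict_epitope(docked)
```

Residues that recur across many docking poses are taken as likely epitope members, while residues contacted in only one or two poses fall below the cutoff.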

A-091: Validation of NMR protein structures using structural rigidity theory and random coil index
  • Nicholas Fowler, The University of Sheffield, United Kingdom
  • Adnan Sljoka, Riken, Japan
  • Michael Williamson, The University of Sheffield, United Kingdom

Short Abstract: All protein structures submitted to the Protein Data Bank (PDB) are validated to identify potential errors and quantify their overall quality. Validation can be broadly categorised into two types: “geometric” and “model vs data”. Geometric validation asks whether a protein structure looks like what we know proteins generally look like. A structure may have very good geometrical quality but this does not necessarily mean it is accurate (an accurate structure is one that is close to the true structure). In order to determine accuracy, model vs data validation is required. For x-ray crystal structures, model vs data measures include R factor and Rfree. However, for NMR structures there are no equivalent measures and therefore no method for determining their accuracy. We are currently developing a method for validating the accuracy of NMR protein structures which involves computing local flexibility (i.e. per residue) using Floppy Inclusion and Rigid Substructure Topography (FIRST) to compare to local flexibility predicted from backbone chemical shifts using Random Coil Index (RCI). We present our progress so far which includes how our method can identify structures that have incorrect fold/secondary structure and structures that are too floppy (e.g. lack sufficient hydrogen bonds).

A-092: iCn3D, a Web-based 3D Viewer for Sharing 1D/2D/3D Representations of Biomolecular Structures
  • Jiyao Wang, National Institutes of Health, United States
  • Philippe Youkharibache, National Institutes of Health, United States
  • Aron Marchler-Bauer, National Institutes of Health, United States

Short Abstract: Motivation: Build a web-based 3D molecular structure viewer focusing on the interactive structural analysis. Results: iCn3D (I-see-in-3D) can simultaneously show 3D structure, 2D molecular contacts, and 1D protein and nucleotide sequences through an integrated sequence/annotation browser. Pre-defined and arbitrary molecular features can be selected in any of the 1D/2D/3D windows as sets of residues and these selections are synchronized dynamically in all displays. Biological annotations such as protein domains, single nucleotide variations, etc. can be shown as tracks in the 1D sequence/annotation browser. These customized displays can be shared with colleagues or publishers via a simple URL. iCn3D can display structure-structure alignment obtained from NCBI’s VAST+ service. It can also display the alignment of a sequence with a structure as identified by BLAST, and thus relate 3D structure to a large fraction of all known proteins. iCn3D can also display electron density maps or electron microscopy (EM) density maps, and export files for 3D printing. The following example URL exemplifies some of the 1D/2D/3D representations: https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?mmdbid=1TUP&showanno=1&show2d=1&showsets=1.
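
As a small illustration of the URL-based sharing described above, the example link can be assembled from its query parameters. The base address and the four parameter names are taken from the example URL in the abstract; the helper name `icn3d_url` is hypothetical, and no other iCn3D parameters are assumed.

```python
from urllib.parse import urlencode

# Base address copied from the example URL in the abstract.
ICN3D_BASE = "https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html"


def icn3d_url(mmdbid, showanno=1, show2d=1, showsets=1):
    """Build a shareable viewer link from the parameters seen in the example."""
    query = urlencode(
        {
            "mmdbid": mmdbid,      # structure identifier, e.g. "1TUP"
            "showanno": showanno,  # 1D sequence/annotation browser
            "show2d": show2d,      # 2D molecular contacts
            "showsets": showsets,  # defined sets of residues
        }
    )
    return f"{ICN3D_BASE}?{query}"


link = icn3d_url("1TUP")
```

Calling `icn3d_url("1TUP")` reproduces the example link given in the abstract.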

A-093: GGIP: Structure and sequence-based GPCR–GPCR interaction pair predictor
  • Sakie Shimamura, Tokyo Denki University, Japan
  • Vachiranee Limviphuvadh, A*STAR, Singapore
  • Hiroyuki Toh, Kwansei Gakuin University, Japan
  • Wataru Nemoto, Tokyo Denki University, Japan
  • Yoshihiro Yamanishi, Kyushu Institute of Technology, Japan

Short Abstract: Many studies have reported that G-Protein Coupled Receptors (GPCRs) function not only as monomers but also as homo- or heterodimers or higher-order molecular complexes. Many GPCRs exert a wide variety of functions through specific combinations of GPCR subtypes. In addition, some GPCRs are reported to be associated with diseases. Thus, GPCR oligomerization is now recognized as an important event in various biological phenomena. We developed a support vector machine-based method to predict interacting pairs for GPCR oligomerization, GPCR-GPCR Interaction Pair predictor (GGIP), by integrating structure and sequence information [Nemoto et al. Proteins. 2016;84:1224-33]. The performance of our method was evaluated by the Receiver Operating Characteristic (ROC) curve. The corresponding area under the curve was 0.938. As far as we know, this is the only prediction method for interacting pairs among GPCRs. Our method could accelerate the analyses of these interactions and contribute to the elucidation of the global structures of the GPCR networks in membranes. We have recently launched a prediction server, which is available at http://protein.b.dendai.ac.jp/GGIP/.

A-094: PISITE update: protein sociability and interactions in the genomic era
  • Hafumi Nishi, Graduate School of Info. Sci., Tohoku University, Japan
  • Yuki Kagaya, Graduate School of Info. Sci., Tohoku University, Japan
  • Matsuyuki Shirota, Tohoku University, Japan
  • Kengo Kinoshita, Tohoku University, Japan

Short Abstract: Protein-protein interactions play a key role in numerous biological processes and scrutinizing interacting residues is crucial for a better understanding of protein interactions. Meanwhile, the rapid increase in genetic variation data allows us to examine how single amino acid changes affect protein molecules and complexes on a larger scale. More than ten years ago, we introduced a new concept in protein interactions: sociability. The sociability of proteins was defined by the unique numbers of binding partners and binding states. Proteins interacting with larger numbers of proteins in various binding states were recognized as sociable. Sociability was also defined for interacting residues. We then developed the protein interaction database PISITE, which provides information on sociable proteins and residues based on all entries in the Protein Data Bank. In order to comprehend genetic variations from the sociability perspective, we have updated the PISITE database with genetic variant information from ClinVar and UniProt. We further employed the exome data from gnomAD to explore a broader variant space. It was found that sociable proteins had some unique features compared to non-sociable ones. We believe that our database will give distinctive insights into the structural understanding of genetic variations. PISITE is available at https://pisite.sb.ecei.tohoku.ac.jp.

A-095: Mapping Structure and Interaction in Beta Turns
  • Nicholas Newell, Newell, United States

Short Abstract: β-turns, which make up about a quarter of residues in proteins and play crucial roles in structure and function, are commonly classified by dihedral angles into a small set of types that provides only a low-precision picture of turn backbone geometries, and more than a quarter of turns remain structurally unclassified. Furthermore, the systematic treatment of side-chains in β-turns has been limited to tabulations of single-position amino acid propensities, supplemented by structural examples, and the important interactions between β-turns and their immediate N- and C-terminal neighborhoods have not been systematically characterized. In this work, a simple geometric parameterization is derived for β-turns which enables meaningful sub-partitioning of turns within types, and a 3-stage clustering algorithm generates 3D conformational heat-maps that reveal the fine-scale distributions of backbone and side-chain/rotamer structure and interaction across all β-turns and all their side-chain and backbone motifs, both within turns and in their N- and C-terminal neighborhoods. This analysis yields a broader, deeper picture of β-turns by providing a comprehensive, unified, high precision treatment of the backbone and side-chain structure of turns and their environs, and it should prove useful in protein design, structure validation and prediction, and in understanding the structural consequences of disease-associated mutations.

A-096: GalaxyDomDock: Protein Domain Structure Assembly by Docking
  • Taeyong Park, Seoul National University, South Korea
  • Seung Yul Lee, Seoul National University, South Korea
  • Hyeonuk Woo, Seoul National University, South Korea
  • Sangwoo Park, Seoul National University, South Korea
  • Chaok Seok, Seoul National University, South Korea

Short Abstract: Protein domains are evolutionarily and functionally independent protein structure units, and domain-domain interactions may be related to protein functions. Predicting assembled domain structures is thus an important part of protein structure prediction. Ab initio structure assembly of domains is needed when each domain structure can be predicted reliably but there are no proper structure templates covering the whole protein. In this research, we have developed a domain assembly method called GalaxyDomDock that predicts the relative orientation of two domains. Based on the analysis that inter-domain interactions and inter-chain interactions are similar, we decided to apply an ab initio protein-protein docking method, GalaxyTongDock, to the domain assembly problem. First, in GalaxyDomDock, up to 10,000 assembled domain structures are predicted by GalaxyTongDock with additional geometric consideration of the domain linker, which effectively reduces the conformational space for docking. The predicted structures are then screened for domain connectability by the given linker using Dijkstra’s algorithm. Lastly, structural redundancy among the remaining structures is removed by clustering that depends on the protein size. GalaxyDomDock showed better or comparable performance compared to the AIDA and Rosetta domain assembly protocols on various categories of benchmark tests.
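
The abstract does not say how the connectability graph is built, so the following is only a sketch of how a Dijkstra-based linker check could look. It assumes a weighted graph of free-space waypoints between the C-terminus of one domain and the N-terminus of the other, and it assumes a linker can span at most about 3.8 Å per residue (the typical Cα-Cα distance). The graph, node names, and the helper `is_connectable` are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Shortest path lengths from source in a non-negatively weighted graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def is_connectable(graph, start, end, n_linker_residues, max_span=3.8):
    """Assumed criterion: the shortest path must fit the extended linker."""
    shortest = dijkstra(graph, start).get(end, float("inf"))
    return shortest <= n_linker_residues * max_span


# Hypothetical waypoint graph (distances in angstroms), listed in both directions.
toy_graph = {
    "C_term": [("a", 5.0), ("b", 4.0)],
    "a": [("C_term", 5.0), ("N_term", 6.0)],
    "b": [("C_term", 4.0), ("N_term", 10.0)],
    "N_term": [("a", 6.0), ("b", 10.0)],
}
print(dijkstra(toy_graph, "C_term")["N_term"])          # 11.0
print(is_connectable(toy_graph, "C_term", "N_term", 3))  # True
print(is_connectable(toy_graph, "C_term", "N_term", 2))  # False
```

A docked pose whose termini cannot be joined within the linker's reach would be discarded under this assumed criterion.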

A-097: GalaxyDock3: A protein-ligand docking program that considers full structural flexibility of ligand
  • Chaok Seok, Seoul National University, South Korea
  • Jinsol Yang, Seoul National University, South Korea
  • Minkyung Baek, University of Washington, South Korea
  • Sohee Kwon, Seoul National University, South Korea
  • Beomchang Kang, Seoul National University, South Korea

Short Abstract: The complex structure of a protein with a small-molecule ligand, as predicted by protein-ligand docking, can provide critical information for understanding the biological functions of the protein. One of the essential components of protein-ligand docking is conformational sampling, by which possible protein-ligand complex conformations are generated. Conventional docking programs consider ligand flexibility by sampling conformational changes in torsion angles only, fixing bond lengths, bond angles, and ring conformations at those of the initial ligand structure. Such an approach has the advantage of reducing the number of degrees of freedom but the disadvantage of ignoring possible internal structural changes of the ligand induced by protein binding. In real applications, the internal geometry of the bound ligand is unknown, so fixing the ligand internal structure can degrade docking performance. Herein, we introduce a new protein-ligand docking program called GalaxyDock3 that considers full ligand conformational flexibility by explicitly sampling the ligand ring conformation and allowing relaxation of the full ligand degrees of freedom, including bond angles and lengths. GalaxyDock3 performed better than GalaxyDock2 when a predicted ligand conformation was used as the input for docking. GalaxyDock3 also compared favorably with other available docking programs on two benchmark tests that contained diverse ligand rings.

A-098: Surface-centric Computational Protein Design
  • Andreas Scheck, Ecole Polytechnique Fédérale de Lausanne, Switzerland
  • Pablo Gainza, Ecole Polytechnique Fédérale de Lausanne, Switzerland
  • Jaume Bonet, Ecole Polytechnique Fédérale de Lausanne, Switzerland
  • Bruno E Correia, Ecole Polytechnique Fédérale de Lausanne, Switzerland

Short Abstract: Molecular surfaces are a high-level description of proteins without explicitly modeling the underlying atoms. This representation was utilized to compare protein structures on global and local scales as their shapes are related to chemical reactivity and ultimately protein function. Here we propose a surface-centric computational protein design approach that incorporates the molecular shape into the protein design process to guide models towards a desired shape. We implemented a shape similarity metric which is based on the computation of angles and distances of normal vectors on dot surfaces. The metric is implemented as part of the Rosetta modeling software and we combined it with a Monte Carlo approach to gradually adapt the sequence of a given protein backbone to match a reference surface. The proposed metric recognizes geometrical features of individual amino acids and recovers natural surfaces more frequently than the Rosetta scoring function in a sequence recovery benchmark of natural protein interfaces. The results indicate that the incorporation of shape into the protein design process allows the recovery of natural surfaces, often inaccessible to the standard Rosetta scoring function. It seems therefore to be a promising component for functional protein design, especially for the design of highly specific protein interactions.

A-099: Analysis on nonsynonymous variations with possible structural and functional impact on loss-of-function intolerant proteins
  • Matsuyuki Shirota, Tohoku University, Japan

Short Abstract: Recent large-scale exome analyses have shown a spectrum of intolerance to loss-of-function (LoF) variations among human proteins, calculated from the accumulation of protein-truncating variations in each protein across hundreds of thousands of individuals. Nonsynonymous variations, which result in amino acid changes, can have a severe impact on protein structure and function, but how deleterious nonsynonymous variations accumulate in LoF-intolerant proteins is not well understood. Here, I evaluated the functional effects of genome variations reported in the Genome Aggregation Database (gnomAD) using the available protein 3D structure data in the Protein Data Bank. Amino acid changes in the interior of LoF-intolerant proteins, which are expected to have large effects, are depleted compared with those exposed on the protein surface. Variations that cause amino acid changes in the interior of LoF-intolerant proteins are enriched in very rare variations, such as singletons. Amino acid substitutions that cause large physicochemical and volumetric changes occur frequently among these buried variants of LoF-intolerant proteins, which reflects the high fraction of singletons. Together, these results suggest that a number of variations in the human population can be deleterious to protein function and can be associated with phenotypes.

A-100: Structuromic analysis of protein mutational robustness selection at the codon level
  • Marianne Rooman, Université Libre de Bruxelles, Belgium
  • Martin Schwersensky, Université Libre de Bruxelles, Belgium
  • Fabrizio Pucci, ULB, Belgium

Short Abstract: In the attempt to understand how the naturally selected biophysical features of proteins drive the evolution of their coding sequence, protein stability has been recognized to be under strong selective pressure. However, there remain major questions, such as whether protein evolution promotes evolvability or mutational robustness, and whether the mutational profile of proteins is universal. Here, we questioned the role of codon usage as well as the structure of the standard genetic code in promoting protein mutational robustness. We performed in silico mutagenesis on more than 20,000 protein structures, estimating their free-energy change upon mutation, and compared the distributions of those values for various mutational ensembles, such as the mutations resulting from single or multiple nucleotide substitutions, or the mutations resulting from substitutions in the used codon versus those in its synonymous codons. Our results indicate that the standard genetic code is well evolved for protein mutational robustness and that codon usage can be optimized for mutational robustness. This suggests a new post-translational explanation for codon usage. This tendency appears weaker for codons with usage bias, suggesting a competition with selective pressures at the co-transcriptional level. Together, these results extend our understanding of the processes driving the evolution of coding sequences.
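
To make the "used codon versus synonymous codons" comparison concrete, the sketch below enumerates the amino acid changes reachable by a single nucleotide substitution from a given codon under the standard genetic code. It covers only the enumeration step; the free-energy estimation described in the abstract is not reproduced, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_nucleotide_neighbors(codon):
    """Return the 9 codons that differ from `codon` at exactly one position."""
    neighbors = []
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                neighbors.append(codon[:pos] + base + codon[pos + 1:])
    return neighbors


def nonsynonymous_products(codon):
    """Amino acids (or '*') reachable by one substitution, excluding synonymous ones."""
    original = CODON_TABLE[codon]
    return {CODON_TABLE[n] for n in single_nucleotide_neighbors(codon)} - {original}


# Two synonymous leucine codons reach different sets of substitutions.
print(sorted(nonsynonymous_products("CTG")))
print(sorted(nonsynonymous_products("TTA")))
```

In a fuller analysis, each reachable substitution would be scored with a stability predictor and the resulting distributions compared between the codon actually used and its synonyms.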

A-101: Evaluation of differences in protein dynamics caused by genomic variant using time-series data
  • Mayumi Kamada, Kyoto University, Japan
  • Mikito Toda, Nara Women's University, Japan

Short Abstract: Genomic medicine has begun to be applied in clinical practice. Particularly in cancer, it has become possible to select optimal drugs depending on the presence or absence of genomic variants. On the other hand, it is also becoming clear that drug efficacy differs depending on the mutated position and alteration pattern, even for variants in the same protein. The dynamics of protein structure are known to play an important role in protein function and in affinity for chemical compounds. Therefore, detecting changes in protein structural dynamics caused by genomic variants is key to drug selection and drug discovery. Genomic variations are often detected in epidermal growth factor receptor (EGFR) in Japanese patients with non–small-cell lung cancer, and some variants are known to have different drug sensitivities. In this study, utilizing motion features extracted from time-series data obtained from MD simulations, we evaluate differences in the structural dynamics of EGFR variants that are known to have different binding affinities for gefitinib. The feature extraction protocol consists of a continuous wavelet transform and singular value decomposition. The results suggest that our protocol is useful for evaluating differences in the dynamics of proteins that have different drug sensitivities.

A-102: Describing the dynamics behind extension/retraction of Enterohemorrhagic Escherichia coli Type 4 Pilus
  • Yasaman Karami, Institut Pasteur, France
  • Benjamin Bardiaux, Institut Pasteur, France
  • Nadia Izadi-Pruneyre, Institut Pasteur, France
  • Therese Malliavin, Institut Pasteur, France
  • Michael Nilges, Institut Pasteur, France

Short Abstract: Type IV pili (T4P) are distinctive dynamic filaments at the surface of many bacteria. They can rapidly extend and retract at a rate of ~1000 subunits per second. Such behavior enables T4P to play different and crucial roles: adhesion to host cells, twitching motility, DNA uptake, and microcolony formation. Enterohemorrhagic Escherichia coli (EHEC) is an important human pathogen in which T4P have been shown to be among the virulence factors [1]. The structure of the T4P filament of EHEC has recently been determined by Nuclear Magnetic Resonance (NMR) spectroscopy and cryo-electron microscopy [1]. However, obtaining atomistic details is crucial to better understand the mechanism behind their extension and retraction. We have performed all-atom molecular dynamics simulations to study the wild-type and mutants of T4P, in order to characterize their dynamic properties and understand the interaction networks of the subunits. Our results revealed key interactions involved in extension and retraction. In addition, the analysis of mutants enabled us to better describe the role of key residues in the function of T4P. Finally, this study paves the way toward the development of vaccines and therapeutics through the identification of surface-exposed epitopes. [1] Bardiaux B, et al. Structure, 2019.

A-103: Co-evolution Analysis for Allosteric Network Prediction
  • Charlotte M. Deane, University of Oxford, United Kingdom
  • Dominik Schwarz, University of Oxford, United Kingdom
  • Jiye Shi, UCB Pharma, United Kingdom

Short Abstract: Allostery is thought to be an inherent feature of almost all biological macromolecules and, in the protein context, is most commonly defined as a conformational change of the active site that follows a binding event at a different, distant binding site. Residues involved in the transmission of such an allosteric signal could be targeted by non-competitive drugs, opening up new therapeutic strategies. Few experimental methodologies exist to investigate allostery, and all are very resource intensive. Unfortunately, computational methods for allostery prediction lack a general validation strategy, which is especially true for the verification of allosteric networks (pathways). In this work, we evaluate existing and newly developed computational methods with multiple datasets that contain a small number of validated allosterically important residues, in particular the Allosteric Database (http://mdl.shsmu.edu.cn/ASD/), which contains 46 allosteric networks with clearly defined residues. Validating against a larger number of proteins lowers the risk of biasing any methodology towards any specific protein or experimental technique. We examined the allosteric networks predicted by methods that use co-evolutionary information, in particular Statistical Coupling Analysis (SCA) and Direct Coupling Analysis (DCA).
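
SCA and DCA are too involved for a short example, so the sketch below uses plain mutual information between alignment columns instead. This is a much simpler co-evolution signal, swapped in here purely to illustrate the idea of scoring residue pairs from a multiple sequence alignment; it is not the method evaluated in the abstract. The toy alignment and the function name are hypothetical.

```python
import math
from collections import Counter


def mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment."""
    n = len(msa)
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        p_a = col_i[a] / n
        p_b = col_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 1 co-vary, column 2 is conserved.
toy_msa = ["ACK", "ACK", "GTK", "GTK"]
print(mutual_information(toy_msa, 0, 1))  # 1.0
print(mutual_information(toy_msa, 0, 2))  # 0.0
```

Unlike DCA, this score does not separate direct from indirect couplings, which is one reason the more elaborate methods exist.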

A-104: Structure-Informed Variant Prioritization in Personalized Oncology
  • Siao-Han Wong, German Cancer Research Center (DKFZ), Germany
  • Benedikt Brors, German Cancer Research Center (DKFZ), Germany

Short Abstract: In personalized oncology, treatment strategies take individual genomic aberrations into consideration. The interpretation of their pathogenicity relies on our understanding of regional significance within genes. There have been initiatives to identify mutation hotspots based on protein sequences or on protein structures, where residues distant in sequence can be brought together through protein folding. Nonetheless, this information is not well utilized in the clinical setting. In this study, we compiled prior knowledge as a hotspot list to molecularly stratify patients, and further considered the structural neighborhood to gain insights into underlying mechanisms. Specifically, 1,572 residue hotspots were predicted by three sequence approaches and an additional 174 structural hotspots by three structural algorithms. By setting up a mechanism to map genomic variants to protein structures, we were able to annotate the structural neighborhood of variants and investigate the interplay between them. Focusing on protein-protein interaction interfaces, we identified preferentially disrupted interfaces and the exact contributing mutations. In summary, this approach enables us to quickly prioritize variants in patient genomes by their prevalence, and to further annotate structural significance to inform potential mechanisms behind driving events. This would help facilitate clinical evaluations to meet the growing number of patients recruited to personalized oncology trials.

A-105: Transcriptome Reconstruction in Norway spruce
  • Karl Johan Westrin, KTH CBH, Sweden
  • Warren Kretzschmar, KTH CBH, Sweden
  • Olof Emanuelsson, KTH CBH, Sweden

Short Abstract: While the Norway spruce (Picea abies) is important for the Swedish economy, it lacks a complete reference genome (the published draft is highly fragmented), and its long juvenile period makes breeding difficult. Early cone setting has been shown to be associated with expression of a MADS-box gene, DAL19, but no existing transcriptome assembler has managed to reproduce all transcript isoforms of DAL19. A novel de novo assembly pipeline, Abeona, is proposed to address this issue; it outperforms traditional assemblers in reproducing isoforms of DAL19. However, Abeona performs less well when reconstructing the entire transcriptome. Further studies on optimizing Abeona for full transcriptome assembly are planned.

A-106: Pharmacophore-based virtual screening and molecular docking study for searching new inhibitors for Prostaglandin E2 receptor 4 (EP4)
  • Evelyn Costa, Universidade Estadual de Feira de Santana, Brazil
  • Fabricio Silva, Universidade Federal do Vale do São Francisco, Brazil
  • Bruno Silva Andrade, Universidade Estadual do Sudoeste da Bahia, Brazil

Short Abstract: EP4 is a prostaglandin E2 receptor expressed in the lungs, in lung fibroblasts, airway smooth muscle cells, and the smooth muscle of the pulmonary vein; upon activation, it causes relaxation and suppresses inflammation in these tissues. The aim of this work was to perform a pharmacophore-based ligand search in order to find new inhibitors against EP4 that can be tested in vitro and in vivo as new bronchodilators. The crystallographic EP4 structure was obtained from the Protein Data Bank with code 5YWY. The search for known drug ligands was performed in the DrugBank database (https://www.drugbank.ca/), and pharmacophore modelling was carried out with PharmaGist (http://bioinfo3d.cs.tau.ac.il/PharmaGist/). We used ZINCPharmer (http://zincpharmer.csb.pitt.edu/) to search for at least 1000 pharmacophore-like ligands in the ZINC database (http://zinc15.docking.org/). All selected ligands, as well as known drugs, were saved in mol2 and pdbqt formats using the MarvinSketch and AutoDock Tools programs, respectively. Before the docking calculations, we carried out an energy minimization of 5000 cycles of steepest descent and 5000 cycles of conjugate gradient to adjust the crystallographic structure, using AMBER 14, followed by protein preparation in AutoDock Tools. Docking results were obtained with AutoDock Vina, and 2D ligand interaction maps were constructed using Accelrys Discovery Studio 2.5.