ISMB/ECCB 2015

Posters

Poster numbers will be assigned on May 30th.
If you cannot find your poster below, you have probably not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the poster acceptance email, which contains a confirmation link. Click on it and follow the instructions.

If you need further assistance please contact submissions@iscb.org and provide your poster title or submission ID.

Category L - 'Protein Structure and Function Prediction and Analysis'

L01 - Towards the increase of the thermostability of a biocatalyst

Stefani Dritsa, Aberystwyth University,

Narcis Fernandez-Fuentes, Aberystwyth University,

David Bryant, Aberystwyth University,

Short Abstract: EnzLip is a highly enantioselective lipase with applications in industrial fields from detergents and oleochemistry to biodiesel production and waste treatment. Combining academic research with industrial interests in a greener route to integrating biocatalysts in the industrial catalytic processes, a rational approach has been followed towards the increase of EnzLip's thermostability. The correlation of structural to functional characteristics, the design of models for molecular dynamics simulations (GROMACS), the optimization of those models (Rosetta Software Suite) and the analysis of the computational experiments, are part of the rational design process. Here, results of the pipeline optimization of the in silico experimentations are demonstrated.

L02 - JPred4: A Protein Secondary Structure Prediction Server

Alexey Drozdetskiy, University of Dundee,

Chris Cole, University of Dundee,

James Procter, University of Dundee,

Geoff Barton, University of Dundee,

Short Abstract: JPred4 is part of a suite of analysis tools available from www.compbio.dundee.ac.uk. JPred4 is the latest version of the popular JPred protein secondary structure prediction server, which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction. In addition to protein secondary structure, JPred also makes predictions of solvent accessibility and coiled-coil regions. The JPred service runs up to 94,000 jobs per month and has carried out over 1.5 million predictions in total for users in 179 countries. The JPred4 web server has been re-implemented in the Bootstrap framework and JavaScript to improve its design, usability and accessibility from mobile devices, with an added REST API for programmatic command-line access. JPred4 features higher accuracy, with a blind three-state (α-helix, β-strand and coil) secondary structure prediction accuracy of 82.0%, while solvent accessibility prediction accuracy has been raised to 90% for residues <5% accessible. Reporting of results has been enhanced on the website and through the optional email summaries, batch submission results and REST API. Predictions are now presented in SVG format with options to view full multiple sequence alignments with and without gaps and insertions. Finally, the help pages have been updated and tooltips added, as well as step-by-step tutorials.
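
As an illustration of how a three-state accuracy such as the one quoted above is conventionally scored, the following minimal sketch computes the fraction of residues whose predicted state matches the observed state (often called Q3). The function name `q3_accuracy` and the H/E/- string encoding are assumptions made for this example; they are not taken from JPred4 or its REST API.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label matches.

    Both strings use one character per residue; here 'H' is helix,
    'E' is strand and '-' is coil (an encoding assumed for this sketch).
    """
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not predicted:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Toy example with made-up strings: 8 of the 10 positions agree.
pred = "HHHH--EEEE"
obs = "HHH---EEEH"
print(q3_accuracy(pred, obs))
```

A benchmark figure would be obtained by applying the same measure to predictions for proteins that were not used for training.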

L03 - Distant protein homology detection using a structural alphabet

Yassine Ghouzam, Inserm U1134, Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Institut National de la Transfusion Sanguine, Laboratory of Excellence GR-Ex, France

Guillaume Postic, Inserm U1134, Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Institut National de la Transfusion Sanguine, Laboratory of Excellence GR-Ex, France

Alexandre G. De Brevern, Inserm U1134, Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Institut National de la Transfusion Sanguine, Laboratory of Excellence GR-Ex, France

Jean-Christophe Gelly, Inserm U1134, Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Institut National de la Transfusion Sanguine, Laboratory of Excellence GR-Ex, France

Short Abstract: Comparative modeling is the most widely applied method for protein structure prediction. Identifying a homologous protein with a resolved structure from a given sequence is a crucial part of this strategy. Improvements on current state-of-the-art methods are still possible, since distant protein relationships remain undetected even by the best of them. It has been shown that adding structural information improves the detection of distantly related proteins. Indeed, structure is more conserved than sequence during evolution, and proteins may share structural similarities even when no evolutionary sequence relationship can be detected (Illergård et al, Proteins 2009).

We present ORION, a new approach that relies on the precise local structural description provided by Protein Blocks (PBs) (De Brevern et al, Proteins 2000). PBs form a 16-letter structural alphabet that can describe the local conformation of a protein accurately and continuously as a one-dimensional sequence. ORION uses predicted PBs and a sequence profile derived from PSI-BLAST to search a fold library of template profiles built from the HOMSTRAD database. Compared with the reference methods PSI-BLAST (Altschul et al, Nucleic Acids Res. 1997) and HHsearch (Söding, Bioinformatics 2005), ORION detects around 15% and 8% more templates, respectively. Our method also detects more homologs than HHsearch and PSI-BLAST when used on target sequences from the CASP 8, 9 and 10 editions, and is particularly efficient for distantly related proteins owing to the addition of local structural information.

L04 - Association Rule Mining for Metabolic Pathway Prediction

Imene Boudellioua, King Abdullah University of Science and Technology, Saudi Arabia

Rabie Saidi, UniProt - European Bioinformatics Institute,

Maria Martin, UniProt - European Bioinformatics Institute,

Victor Soloviev, King Abdullah University of Science and Technology, Saudi Arabia

Short Abstract: Prediction of chemical reactions and pathways is among the most challenging problems of systems biology. In this work, we tackle the problem of metabolic pathway prediction. We developed ARBA (Association-Rule-Based Annotator), a system that utilizes machine learning methods, specifically rule mining techniques, to predict pathways associated with protein entries available in UniProtKB. Our system can be used to enhance the quality of automatically generated annotations as well as to annotate unknown proteins. Moreover, this system provides insight into the conservation of pathways across prokaryotes that differ in their taxonomic classification. ARBA was successfully applied to gain knowledge about pathway annotation type in all UniProtKB-SwissProt entries with manual assertion evidence corresponding to a specified prokaryotic taxon. ARBA presents this knowledge in the form of association rules that take into account the organism taxonomy and the InterPro signature matches of protein sequences. These rules are then filtered efficiently using the Skyline operator in order to select the best representative rules in terms of several interestingness metrics, to minimize false positives and to eliminate rules generated by pure chance. The resulting rules could be used as models to infer pathways for poorly annotated TrEMBL entries. We carried out an experimental study of the performance of ARBA on real datasets representing various prokaryotic taxa to demonstrate the robustness of our system. We found that ARBA achieved an average overall accuracy as high as 99.98%, F-measure of 0.948, precision of 0.977278673, and recall of 0.920848785.
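
To make the idea of a skyline filter concrete, here is a minimal sketch that keeps only the rules not dominated by any other rule on a set of metrics. It is not ARBA's implementation: the rule names, the two metrics (`support`, `confidence`) and all values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def dominates(a: Dict, b: Dict, metrics: Sequence[str]) -> bool:
    """True if rule a is at least as good as b on every metric and
    strictly better on at least one (higher values are better)."""
    at_least_as_good = all(a[m] >= b[m] for m in metrics)
    strictly_better = any(a[m] > b[m] for m in metrics)
    return at_least_as_good and strictly_better


def skyline(rules: List[Dict], metrics: Sequence[str]) -> List[Dict]:
    """Return the rules that no other rule dominates (the Pareto front)."""
    return [
        r for r in rules
        if not any(dominates(other, r, metrics) for other in rules)
    ]


# Hypothetical rules with made-up metric values.
rules = [
    {"name": "r1", "support": 0.30, "confidence": 0.90},
    {"name": "r2", "support": 0.20, "confidence": 0.95},
    {"name": "r3", "support": 0.10, "confidence": 0.80},
    {"name": "r4", "support": 0.30, "confidence": 0.85},
]
kept = skyline(rules, ["support", "confidence"])
print([r["name"] for r in kept])
```

In this toy set, r3 and r4 are dropped because r1 is at least as good on both metrics, while r1 and r2 survive because each is better than the other on one metric. The pairwise comparison is quadratic in the number of rules, which is adequate for a sketch but not for large rule sets.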

L05 - Genome-wide molecular reconstruction of structure-based protein-protein interaction networks.

Surabhi Maheshwari, Louisiana State University, United States

Short Abstract: Protein-protein interactions (PPI) mediate several biological processes at the molecular level. Thus, building three-dimensional PPI networks is important to interpret the information encoded in genomes. Predicting PPI sites and the structure of a protein complex are two important related components of this problem. Several computational protein-protein prediction methods have been developed in the past. However, the majority of the existing methodologies are designed for experimentally determined protein structures. Because a large number of proteins in a genome will only have structure models available, computational tools must be tolerant to structural inaccuracies in order to be used for genome-wide modeling of PPIs. We contribute to this topic by proposing eFindSitePPI, a software tool for PPI prediction that capitalizes on the tendency of the location of binding sites to be highly conserved across evolutionarily related protein dimers. We show that eFindSitePPI is highly tolerant to structural inaccuracies in the query proteins and performs better for protein models when compared to nine other state-of-the-art prediction methods. Furthermore, we developed eRankPPI, an algorithm to identify correct docking conformations of protein models as well as experimental structures produced by docking software. The scoring function of eRankPPI uses several features, including predicted interfaces with probability estimates calculated by eFindSitePPI and contact-based symmetry scores. A comparative study between eRankPPI and other state-of-the-art scoring methods shows that eRankPPI improves the success rate by ~10% on the benchmark dataset of homo and hetero complexes. The encouraging results obtained, especially for protein models, open up the possibility of large-scale reconstruction of structure-based PPI networks.

L06 - bbcontacts: prediction of β-strand pairing from direct coupling patterns

Jessica Andreani, Max Planck Institute for Biophysical Chemistry, Germany

Johannes Soeding, Max Planck Institute for Biophysical Chemistry, Germany

Short Abstract: The problem of protein structure prediction from sequence alone is one of the most important and difficult in computational biology. Recently, a major breakthrough was achieved in template-free structure prediction. When enough homologous sequences are available, methods of global statistical network analysis can explain the observed correlations between columns in the multiple sequence alignment by a small set of directly coupled pairs of columns. Strong couplings are indicative of residue-residue contacts, and reliable de novo structural models can be computed from the predicted contacts.

We have recently developed bbcontacts, an algorithm which predicts the pairings of β-strands by exploiting the structural regularity of paired β-strands that leads to characteristic patterns in the noisy matrices of residue-residue couplings [1]. bbcontacts detects these characteristic patterns in the 2D matrix of couplings using two hidden Markov models, one for parallel and one for antiparallel contacts. β-bulges are modelled as indel states.

In contrast to previously published methods, bbcontacts uses predicted instead of true secondary structure. The capacity of bbcontacts to predict contacting pairs of β-residues was tested on a standard set of 916 proteins. bbcontacts achieves 50% precision at 50% recall using predicted secondary structure and 64% precision at 64% recall using true secondary structure, while other β-β contact prediction tools achieve around 45% precision at 45% recall using true secondary structure.

bbcontacts is open source software available at https://bitbucket.org/soedinglab/bbcontacts

[1] Andreani & Söding, Bioinformatics (2015), doi: 10.1093/bioinformatics/btv041.

L07 - Explore the usage of CATH-Gene3D functional family in comparative modelling

Su Datt Lam, UCL,

Sayoni Das, UCL,

Aurelio Garcia, UCL,

Ian Sillitoe, UCL,

Christine Orengo, UCL,

Short Abstract: Comparative modelling of structurally uncharacterised proteins generally produces a good 3D model if a template with global sequence identity ≥30% is used. However, once the sequence identity falls below 30% (the “Twilight zone”), the model quality deteriorates.

The FunFHMMer method clusters protein sequences based on similarities in the sequence patterns (highly conserved positions and specificity-determining positions) associated with conserved structure and functional properties, rather than global sequence identity. All the sequences in CATH-Gene3D have been clustered into functional families in this way. CATH-Gene3D functional families have been shown to contain relatives with highly similar structures. The structural coherence of these groups indicates that they have the potential to be used for the modelling of “Twilight zone” proteins.

This research explores the usage of functional families to guide the template selection process of comparative modelling. Using a benchmark query dataset of 194 non-redundant domains, putative models were built using the MODELLER algorithm and the quality of the models was assessed using the normalised DOPE score. When compared against “Twilight zone” protein models built using BLAST, this pipeline is capable of producing 9% more good quality models (with TM-score > 0.5).

In addition, this pipeline has been applied to model sequences in CATH-Gene3D that do not have any known structural information. The pipeline generates 15% more domain models for human sequences (83,518 domain models) when compared to the classical BLAST approach, which uses templates with sequence identity ≥30%.
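
For readers unfamiliar with the TM-score threshold mentioned above, the sketch below evaluates the standard TM-score sum for one fixed superposition, given the distances between aligned residue pairs. It is a simplified illustration, not the pipeline's code: the real TM-score program also searches for the superposition that maximises the score, and both the floor of 0.5 Å applied to d0 and the restriction to targets longer than 15 residues are assumptions of this sketch.

```python
from typing import Sequence


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score contribution of one superposition.

    distances: distance in angstroms for each aligned residue pair.
    target_length: number of residues in the target (reference) protein.
    """
    if target_length <= 15:
        raise ValueError("this sketch only handles targets longer than 15 residues")
    # Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    d0 = max(d0, 0.5)  # assumed lower bound for very short targets
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect model of a 100-residue target scores 1.0.
print(tm_score([0.0] * 100, 100))
# Covering only half of the target perfectly scores 0.5.
print(tm_score([0.0] * 50, 100))
```

Because the sum is divided by the target length rather than the number of aligned pairs, unaligned residues lower the score.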

L08 - PDIviz: Analysis and visualization of protein-DNA binding interfaces

Judemir Ribeiro, Pontificia Universidad Católica de Chile, Chile

Francisco Melo, Pontificia Universidad Católica de Chile, Chile

Andreas Schüller, Pontificia Universidad Católica de Chile, Chile

Short Abstract: PDIviz: Analysis and visualization of protein-DNA binding interfaces
Judemir Ribeiro, Francisco Melo and Andreas Schüller*
Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile.

Specific recognition of DNA by proteins is a crucial step of many biological processes. PDIviz is a plugin for the PyMOL molecular visualization system that analyzes protein-DNA binding interfaces by comparing the solvent accessible surface area of the complex against the free protein and free DNA. The plugin provides three distinct three-dimensional visualization modes to highlight interactions with DNA bases and backbone, major and minor groove, and with atoms of different pharmacophoric type (hydrogen bond donors/acceptors, hydrophobic, and thymine methyl). Each mode comes in three styles to focus the visual analysis on the protein or DNA side of the interface, or on the nucleotide sequence. PDIviz allows for the generation of publication quality images, all calculated data can be written to disk, and a command line interface is provided for automating tasks. The plugin may be helpful for the detailed identification of regions involved in DNA base and shape readout, and can be particularly useful in rapidly pinpointing the overall mode of interaction.
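
The comparison of solvent accessible surface area (SASA) between the complex and the free molecules can be sketched as follows. This is not PDIviz code and uses no PyMOL calls: it assumes per-residue SASA values have already been computed by some external tool, and the residue labels, the values and the 1.0 Å² cutoff are hypothetical.

```python
from typing import Dict


def buried_area(
    free_sasa: Dict[str, float],
    complex_sasa: Dict[str, float],
    threshold: float = 1.0,
) -> Dict[str, float]:
    """Return residues whose SASA drops by more than `threshold`
    square angstroms on complex formation, with the buried area."""
    buried = {}
    for residue, free in free_sasa.items():
        delta = free - complex_sasa[residue]
        if delta > threshold:
            buried[residue] = delta
    return buried


# Made-up per-residue SASA values (square angstroms).
free = {"ARG45": 120.0, "LYS48": 95.5, "GLY50": 30.0}
bound = {"ARG45": 40.0, "LYS48": 95.0, "GLY50": 30.0}
print(buried_area(free, bound))
```

Residues that lose accessible surface when the partner is present are the ones placed at the interface; the same difference can be taken on the DNA side.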

L09 - Profiling the Structural elements and Energetics involved in the binding of Small non-peptide compounds onto Plasmodium and human Cysteine proteases using computational approaches

Thommas Musyoka, Rhodes University, South Africa

Aquillah Kanzi, University of Pretoria, South Africa

Kevin Lobb, Rhodes University, South Africa

Özlem Tastan Bishop, Rhodes University, South Africa

Short Abstract: Malaria remains a deadly disease in the tropical areas of Africa, South America and parts of Asia. Owing to co-evolution, Plasmodium parasites have developed protein structures similar to those found within their hosts, making their selective eradication a daunting task. Efforts to eradicate malaria have been thwarted by the constant development of resistance by the parasites against the majority of drugs developed so far and by the lack of an effective preventive vaccine. The identification of novel compounds with novel modes of action therefore remains a top priority. Plasmodium parasites lack a de novo amino acid synthetic pathway and consequently employ a plethora of proteases to degrade host proteins to acquire essential amino acids. Among these unique molecular scissors are Falcipain-2 and Falcipain-3 (FPs), which act as principal hemoglobinases and are therefore considered potential targets for the development of efficacious anti-malarial drugs. In this study, a computational workflow consisting of homology modelling, molecular docking, molecular dynamics (MD) simulations and binding free energy (BFE) calculations was developed and validated using known FP inhibitors, in order to screen for potential inhibitors against plasmodial cysteine proteases and identify key contributors to ligand binding. Through LigPlot+ interaction fingerprints, important subsite amino acid residues and interaction types involved in ligand binding were determined. MD simulations indicated that protein-ligand complexes were stable. BFE results showed that van der Waals and electrostatic interactions favoured binding while polar solvation impaired binding. These findings will be useful in the rational design of compounds with better antimalarial properties.

L10 - Protein function prediction via gaussian graphlet kernel

Elnaz Saberi Ansari, IPM, Iran, Islamic Republic of

Changiz Eslahchi, Shahid Beheshti University, Iran, Islamic Republic of

Morteza Milani, Shahid Beheshti University, Iran, Islamic Republic of

Mehdi Mirzaie, Tarbiat Modares University, Iran, Islamic Republic of

Short Abstract: Protein functions are key to understanding various biological processes. Experimental determination of protein functions is expensive and time-consuming, so computational techniques are essential. These techniques are based on the idea of assigning functions to unknown proteins according to the known functions of similar proteins. Many different measures of similarity have been introduced, and many different frameworks compare proteins according to these measures. Since the structure of a protein is more conserved than its sequence during evolution, recent methods usually use protein structures to find similarities between them. An efficient approach to comparing the tertiary structures of proteins is to model them as graphs. Usually, a machine-learning framework, specifically a kernel method like SVM, is used to compare these graphs and classify them into functional families. Representing graphs as vectors of features is a necessary part of using SVM. In our work we model proteins as contact graphs and introduce the gaussian graphlet kernel to determine the functional class membership of enzymes and also to classify proteins as enzymes or non-enzymes. Graphlets are small induced subgraphs with three to five nodes, whose counts form the feature vectors in our method. We show that an SVM trained on these vectors using an RBF kernel can predict protein functions with high accuracy. To evaluate our method, we test the quality of protein function prediction on two data sets, ENZYMES and D&D. It is shown that this method outperforms traditional methods like BLAST and even other kernel-based methods on both accuracy and runtime.
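
The combination of graphlet counts and a Gaussian (RBF) kernel can be sketched in a few lines. This is an illustrative simplification, not the authors' implementation: it counts only the two connected 3-node graphlets (open path and triangle) by brute force, whereas the abstract uses graphlets of three to five nodes, and the function names and the `sigma` default are assumptions.

```python
import math
from itertools import combinations
from typing import Hashable, Iterable, Sequence, Tuple


def graphlet3_counts(
    nodes: Sequence[Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
) -> Tuple[int, int]:
    """Count induced 3-node graphlets: (open paths, triangles)."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    paths = triangles = 0
    for a, b, c in combinations(nodes, 3):
        # Number of edges present among the three nodes.
        k = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if k == 2:
            paths += 1
        elif k == 3:
            triangles += 1
    return paths, triangles


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """RBF kernel exp(-||x - y||^2 / (2 * sigma^2)) on two count vectors."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Two toy contact graphs: a 4-node chain and a triangle.
chain = graphlet3_counts([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])
tri = graphlet3_counts([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
print(chain, tri, gaussian_kernel(chain, tri))
```

In practice the count vectors would usually be normalised before the kernel is applied, and the resulting kernel values would be passed to an SVM; both steps are left out here.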

L11 - Representing ensembles of protein structures as molecular networks to identify structurally and functionally important residue interactions

Nadezhda Doncheva, Max Planck Institute for Informatics, Germany

John H. Morris, University of California, San Francisco, United States

Eric F. Pettersen, University of California, San Francisco, United States

Dina Schneidman, University of California, San Francisco, United States

Andrej Sali, University of California, San Francisco, United States

Thomas E. Ferrin, University of California, San Francisco, United States

Olga V. Kalinina, Max Planck Institute for Informatics, Germany

Mario Albrecht, Institute for Knowledge Discovery, Graz University of Technology, Austria

Short Abstract: Recent studies have shown that representing a protein structure as a network of interacting residues facilitates the analysis of structure-function relationships and advances our understanding of complex molecular mechanisms such as protein-protein and protein-ligand interactions. To capture the dynamic nature of protein structures and interactions, we developed a new method for visualizing and analyzing ensembles of protein structures by representing them as dynamic, weighted residue interaction networks. Ensembles could result from an experimental technique such as nuclear magnetic resonance spectroscopy or a computational method like molecular dynamics (MD) simulation or protein docking. Using residue interaction networks, we can analyze the variation reflected by the individual protein structures and, at the same time, identify non-covalent residue interactions shared by the different structures.
We applied our approach in two different scenarios to address current challenges in structural bioinformatics. First, we analyzed MD simulation data to characterize the effect of residue mutations on protein structure and function. Thereby, we compared wild-type and mutant simulations of a protein and pinpointed significant changes of the residue interaction pattern upon mutation. Furthermore, assuming that top-ranked docking solutions are more enriched in correct binding interfaces than in correct complexes, we analyzed sets of docking structures to determine the actual interface residues and their interactions based on their frequency of occurrence. We used three different benchmark sets and obtained promising performance results for docking methods that achieve reliable predictions.
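
One simple way to turn an ensemble into a weighted residue interaction network is to weight each residue pair by the fraction of structures in which it is in contact. The sketch below assumes this weighting and assumes that contacts have already been extracted per structure; the residue labels are hypothetical and the code is not taken from the authors' software.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def ensemble_network(contact_sets: List[Iterable[Edge]]) -> Dict[Edge, float]:
    """Map each residue pair to the fraction of structures containing it.

    contact_sets holds one collection of (residue, residue) pairs per
    structure; pairs are treated as undirected.
    """
    if not contact_sets:
        raise ValueError("at least one structure is required")
    counts: Counter = Counter()
    for contacts in contact_sets:
        # Sort each pair so (a, b) and (b, a) count as the same edge,
        # and use a set so a pair counts once per structure.
        unique: Set[Edge] = {tuple(sorted(pair)) for pair in contacts}
        counts.update(unique)
    n = len(contact_sets)
    return {edge: c / n for edge, c in counts.items()}


# Four made-up structures (for example MD frames or docking solutions).
frames = [
    {("A10", "B5"), ("A11", "B7")},
    {("B5", "A10")},
    {("A10", "B5"), ("A12", "B9")},
    {("A10", "B5")},
]
print(ensemble_network(frames))
```

Edges with weights close to 1 are shared by most structures, while low weights flag interactions that vary across the ensemble.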

L12 - Docking of protein models

Petras Kundrotas, University of Kansas, United States

Ivan Anishchenko, University of Kansas, United States

Ilya Vakser, University of Kansas, United States

Short Abstract: Structural characterization of proteins is essential for understanding life processes at the molecular level. However, only a fraction of known proteins have experimentally determined structures. That fraction is even smaller for protein-protein complexes. Thus, structural modeling of protein-protein interactions (docking) primarily has to rely on modeled structures of the individual proteins. However, such “double” modeling has so far remained largely untested in a systematic way. We present a comprehensive benchmarking study based on a set of 165×6 model protein structures with accuracy levels ranging from 1 to 6 Å Cα RMSD, using template-based (TB) docking by structure alignment and free docking techniques. Many TB models fall into the acceptable quality category, according to CAPRI criteria, even for highly distorted models (5 – 6 Å RMSD), although the number of such models (and, consequently, the docking success rate) drops significantly for models with RMSD > 4 Å. In contrast, for free docking, a significant drop in the success rate is already observed for the 1 Å models. The results show that the TB methodology is significantly less sensitive to the inaccuracies of protein models than free docking, and can be applied to docking on a genome-wide scale.

L13 - Text mining for protein docking

Varsha Badal, The University of Kansas, United States

Petras Kundrotas, The University of Kansas, United States

Ilya Vakser, The University of Kansas, United States

Short Abstract: Protein-protein docking can be significantly improved when constraints on the docking mode are available. Protein-protein interactions are extensively studied by various approaches, yielding a vast cache of information hidden in the published literature. These publications are available online and the relevant information can be retrieved and analyzed by automated procedures utilizing text mining (TM) techniques. We developed a TM procedure that retrieves published abstracts on a specific protein-protein interaction and extracts residues mentioned in the text. The procedure was assessed on 579 X-ray structures of binary protein complexes from the DOCKGROUND resource (http://dockground.compbio.ku.edu) using two query types. The results show that correct information on binding residues can be extracted for up to 47% of complexes in the dataset. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine (SVM) models with various features were trained and validated on the subset. The remaining abstracts were filtered by the best-performing SVM models, which decreased the irrelevant information for ~25% of complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the DOCKGROUND unbound benchmark set, significantly increasing the docking success rate.
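
The step of extracting residue mentions from abstract text can be illustrated with a regular expression. The sketch below is an assumption about how such mentions might be matched, not the authors' procedure: it recognises only capitalised three-letter amino acid codes followed by a number (for example "Arg123", "Glu-45" or "Tyr 7"), and the sample sentence is invented.

```python
import re
from typing import List, Tuple

THREE_LETTER = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)
# Three-letter code, optional hyphen or space, then the residue number.
RESIDUE_RE = re.compile(rf"\b({THREE_LETTER})[- ]?(\d+)\b")


def extract_residues(text: str) -> List[Tuple[str, int]]:
    """Return unique (residue name, number) mentions in order of appearance."""
    found: List[Tuple[str, int]] = []
    for name, number in RESIDUE_RE.findall(text):
        item = (name, int(number))
        if item not in found:
            found.append(item)
    return found


sample = (
    "Mutation of Arg123 and Glu-45 abolished binding, whereas Tyr 7 "
    "was dispensable; Arg123 contacts the partner."
)
print(extract_residues(sample))
```

A pattern this simple misses one-letter notation and full residue names and cannot tell which protein a residue belongs to, which is one reason a filtering step such as the SVM models described above is useful.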

L14 - Protein interaction interfaces and genetic variation

Fábio Madeira, ,

Geoffrey Barton, Division of Computational Biology, College of Life Sciences, University of Dundee,

Short Abstract: There are currently more than 62 million Single Nucleotide Polymorphisms (SNPs) known and this number is doubling every two years stimulated by the falling cost of sequencing. Although many methods have been developed to predict the effect of non-synonymous SNPs on biological function and disease, few have focused on SNPs at protein-protein and protein-ligand interaction interfaces. Interfaces are essential sites for protein function and adaptation, and key in a majority of biological processes. The effects of non-disease intra- and inter-species variation occurring in such interaction surfaces remain mostly unexplored. The availability of over 105,000 protein three-dimensional structures allows the structural context of many SNPs at interfaces to be examined in atomic detail. Here, we present ProIntVar, a computational framework for mapping SNPs onto structure in order to study the features of variation at protein-protein and protein-ligand interfaces. ProIntVar allows the systematic analysis of genetic variation in protein structure interaction surfaces by integrating structural and sequencing data from several biological databases and resources. Genetic variants are analyzed in the context of functional families (FunFams), which are derived from structurally and functionally related protein domains classified in CATH (Class, Architecture, Topology, Homology). Examination of variation in protein interaction interfaces helps to infer which key residues are important for the function of the interface in a broader evolutionary sense. This approach has the potential to identify correlated adaptation, susceptibility to disease and unspecific protein-drug interactions in the human population that are due to sequence variation.

L15 - Organism specific protein-RNA recognition: A computational analysis of protein-RNA complex structures from different organisms

Nagarajan Raju, Indian Institute of Technology Madras, India

Sonia Pankaj Chothani, Philips Research North America, 345 Scarborough Road, Briarcliff Manor, NY10510, USA, United States

Ramakrishnan C, Indian Institute of Technology Madras, India

Sekijima Masakazu, Global Scientific Information and Computing Center (GSIC), Tokyo Institute of Technology 2-12-1 Ookayama, Meguro-ku, Japan

Michael Gromiha M, Indian Institute of Technology Madras, India

Short Abstract: Motivation

Understanding the recognition mechanism of protein-RNA complexes has been a challenging task in molecular and computational biology. In this work, we have constructed 18 sets of the same protein-RNA complexes from different organisms. The similarities and differences in each set of complexes have been revealed in terms of various sequence and structure based features such as root mean square deviation, sequence homology, propensity of binding site residues, conservation at binding sites, binding segments and motifs, preferred amino acid-nucleotide pairs and influence of neighboring residues for binding.

Results

We found that the proteins of mesophilic organisms have more binding sites than those of thermophiles, and that the binding propensities of amino acid residues are distinct in E. coli, H. sapiens, S. cerevisiae, thermophiles and archaea. Proteins prefer to bind RNA using a single-residue segment in all the organisms, whereas RNA prefers to use a stretch of up to six nucleotides for binding with proteins. We developed amino acid residue-nucleotide pair potentials for different organisms, which could be used for predicting the binding specificity. Further, molecular dynamics simulation studies on aspartyl tRNA synthetase complexed with aspartyl tRNA showed specific modes of recognition in E. coli, T. thermophilus and S. cerevisiae.

Conclusion

Based on the structural analysis and molecular dynamics simulations we suggest that the mode of recognition depends on the type of the organism in a protein-RNA complex.

L16 - Structural signatures of Nucleoside Tri-Phosphate (NTP) ligands in diverse sequence and fold families

Raghu Bhagavat, Indian Institute of Science, India

Short Abstract: Nucleoside Tri-Phosphate (NTP) ligands are of high biological importance and essential for all life forms. The most fascinating feature of this class of ligands is their capability of binding a wide range of sequence and fold classes. It is therefore of great interest to understand, at a molecular level, the recognition capabilities and commonalities existing at the recognition sites. A structural bioinformatics approach using a non-redundant set of NTP binding structures was carried out, which included exhaustive binding site comparisons and alignments using in-house algorithms. A clustering analysis based on binding site similarities, followed by tree computation, led to the derivation of structural signatures for recognizing NTP ligands. Although the proteins in the dataset belonged to 288 classes, their binding sites could largely be grouped into only 27 types, and further exploration of the sub-structural similarities within these 27 types yielded a subset of only 9 site types. A scan across the PDB database using the identified motifs not only showed the signatures to be highly specific for NTP recognition, but also identified a few proteins that were not previously annotated for NTP binding, suggesting possible NTP binding in such proteins. A comprehensive classification of the NTP binding sites into site types in all the different protein families using binding site similarities is the highlight of the work. Knowledge of determinants obtained from this study will be useful for detecting function in unknown proteins.

L17 - The Sequence and Structural Analysis of LPMO types of the Auxiliary Activity Family 9 Enzymes

Vuyani Moses, Rhodes University, South Africa

Ozlem Tastan Bishop, Rhodes University, South Africa

Short Abstract: Cellulose is a highly abundant substance that has been investigated widely for its potential application in the production of biofuels. Cellulose consists of glucose units that form tightly packed chains which crystallize. This crystalline arrangement of cellulose makes it recalcitrant. Fungi possess a wide range of cellulose degrading enzymes. A group of enzymes called Auxiliary Activity family 9 (AA9) enzymes has been demonstrated in studies to increase the rate of cellulose degradation. AA9 enzymes are believed to function by disrupting the cellulose structure. AA9 proteins are classified into 3 types based on the carbon of the glucose ring at which cleavage occurs. Type 1 proteins cleave at the C1 carbon, Type 2 proteins cleave at the C4 carbon and Type 3 proteins cleave at both C1 and C4. Bioinformatics techniques were used to assess the sequence and structural features of different AA9 types and to understand how these features affect enzymatic function. At the sequence level, AA9 proteins showed a high degree of variability in the C-terminus. Type-specific regions of AA9 proteins were observed in the C-terminus. The structural analysis of AA9 structures suggested that the observed variation in AA9 protein sequences affects the active site configuration, which in turn affects type specificity. Physicochemical properties were investigated through the analysis of different phylogenetic AA9 groups. Variations in physicochemical properties were observed, and these were largely attributed to the diverse nature of these enzymes. Our analysis of AA9 proteins provided insights into the possible contributors to substrate specificity, including the evolutionary dynamics of these enzymes.

L18 - Length-independent canonical forms of antibody Complementarity Determining Regions

Jaroslaw Nowak, University of Oxford,

Terry Baker, UCB Pharma Ltd,

Guy Georges, Roche Diagnostics GmbH, Germany

Stefan Klostermann, Roche Diagnostics GmbH, Germany

Jiye Shi, UCB Pharma Ltd,

Sudharsan Sridharan, MedImmune,

Charlotte Deane, University of Oxford,

Short Abstract: Antibodies are Y-shaped proteins used by the immune system to bind and potentially neutralize foreign objects (antigens) that have entered the body. The antigen combining site of an antibody consists primarily of six hypervariable loops (L1-L3, H1-H3), known as the Complementarity Determining Regions or CDRs. Together, these determine an antibody's binding properties. Five out of the six CDRs (L1, L2, L3, H1, H2) form only a small number of discrete conformations called canonical classes. Previous work in this area assumes that CDRs of different lengths should, by default, belong to different classes. We exploited dynamic time warping, an algorithm originally designed for comparing temporal sequences varying in speed, to measure similarity between loops of different lengths and used density-based clustering to classify CDRs into length-independent canonical classes. The concept of length-independence allows us to cluster a larger number of CDRs into a smaller number of classes than the length dependent approach. In comparison to the length-dependent approach, it also improves the accuracy of canonical class prediction from sequence. We have also found that CDRs of different lengths that are co-clustered tend to show similar sequence patterns, even when they are coded by genes from different subgroups, pointing to a greater functional redundancy in the immune loci than previously known.
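As a rough illustration of the dynamic time warping idea described above, the following minimal Python sketch scores two loops of different lengths. It is not the authors' implementation: the representation of each loop as a list of 3D coordinates (assumed to be already superposed in a common frame), the Euclidean point cost, and the function name `dtw_distance` are all assumptions made for this example.

```python
import math

def dtw_distance(loop_a, loop_b):
    """Classic dynamic time warping cost between two coordinate sequences."""
    n, m = len(loop_a), len(loop_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning loop_a[:i] with loop_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(loop_a[i - 1], loop_b[j - 1])  # Euclidean point cost
            # A point may be matched to several points of the other loop,
            # which is what lets loops of unequal length be compared.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical toy loops of different lengths (not real CDR coordinates)
short_loop = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
long_loop = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(dtw_distance(short_loop, long_loop))
```

A pairwise matrix of such scores could then be passed to a density-based clustering method; the choice of clustering parameters is not specified in the abstract and is left open here.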

L19 - Biologically Inspired de novo Protein Structure Prediction

Saulo de Oliveira, Department of Statistics - University of Oxford,

Jiye Shi, Informatics - UCB Pharma,

Charlotte Deane, Department of Statistics - University of Oxford,

Short Abstract: We have implemented a biologically inspired fragment-assembly de novo structure prediction program called SAINT2. SAINT2 differs from conventional fragment-assembly approaches such as ROSETTA in that it performs sequential predictions. It starts with a small peptide that is extended as the simulation progresses. We have used SAINT2 to investigate the influence of several biological features on protein folding. Our aim was to assess which of these features can aid protein structure prediction. In particular, we investigated the cotranslational protein folding hypothesis. Cotranslational protein folding is the notion that some proteins fold as they are synthesized. This process is thought to promote the formation of energetically favourable intermediates. By doing so, it restricts the conformational space and optimizes the folding process. We compared structure prediction carried out sequentially from N-terminus to C-terminus (cotranslational), from C-terminus to N-terminus (reverse), and globally (where the sequence is fully elongated). We found that cotranslational structure prediction produced the best answers. We have also tested SAINT2 in the Critical Assessment of protein Structure Prediction (CASP) and were able to produce some good models.

L20 - Discrimination and prediction of protein-protein binding affinity

Yugandhar Kumar, Indian Institute of Technology, India

Michael Gromiha M, Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, India

Short Abstract: Motivation:
Protein-protein interactions play crucial roles in several biological processes and are responsible for the smooth functioning of the machinery in living organisms. Binding affinity determines the specificity in many protein-protein complexes, and its alteration might have severe functional implications leading to disease. Predicting the binding affinity of protein-protein complexes provides deep insights into the recognition mechanism and helps identify the strong binding partners in protein-protein interaction networks.
Results:
In this work, we related energetic, physico-chemical and conformational properties of amino acid residues to the experimental binding affinity of protein-protein complexes. Based on the results, we developed machine learning methods for discriminating protein-protein complexes into low and high binding affinities and for predicting the real-value binding affinity. In a training set of 155 complexes, our method showed an accuracy of 76.1% using 10-fold cross-validation and 83.3% on a test set of 30 complexes. We classified the complexes based on their functions and developed multiple regression models for real-value prediction. We observed that the correlation lies in the range of 0.73-0.93 with a Mean Absolute Error (MAE) of 1.02 kcal/mol on leave-one-out cross-validation, and the MAE is 1.33 kcal/mol on a test set of 45 complexes. Our analysis showed the importance of binding site residues in determining the binding affinity and specificity in protein-protein complexes.
Conclusion:
We developed the first sequence-based algorithms to address the task of discrimination and prediction of protein-protein binding affinity. We suggest that our models would serve as an effective tool for identifying interaction partners in protein-protein interaction networks and host-pathogen interactions.

L21 - A minor groove indirect readout mechanism is used by proteins from different DNA-repair pathways to detect damage in DNA.

Juan Cifuentes, Pontificia Universidad Católica, Chile

Francisco Melo, Pontificia Universidad Católica, Chile

Short Abstract: DNA damage occurs ubiquitously in the genome: in a single day, thousands of chemically different lesions accumulate in different sequence contexts, separated even by millions of bases. A limited number of damaged DNA detection proteins, belonging to three DNA repair pathways (NER, BER, MMR), recognize hundreds of chemically different adducts in a manner independent of the surrounding sequence, suggesting a structural indirect readout recognition mechanism. However, the nature of this process remains unknown.

Here, using experimentally solved protein-damaged DNA complexes and isolated damaged-DNA structures, we show that isolated damaged DNA clusters as B-DNA characterized by a wide minor groove at the lesion point.
In the lesion recognition process, protein-damaged DNA complexes show specifically enriched phenylalanines with an extensive interaction surface. In the protein-damaged DNA interface, these phenylalanines intercalate in stacking geometry with damaged or surrounding bases through the minor groove of DNA. Tyrosine and arginine show a similar behavior, whereas leucine was found to intercalate in alkylated lesions. The intercalation of these residues co-occurs with the widening of the minor groove. These structural features are common to the NER, BER and MMR pathways. Finally, using the high interaction surface area of phenylalanines and widened minor grooves as markers, we found proteins whose primary function is not damage recognition but which show damaged-DNA interaction in vitro or interference with DNA repair pathways in vivo, such as TBP and HMGB. This study suggests a specific indirect readout mechanism of damaged DNA detection based on the recognition of broader minor grooves at the lesion point.

L22 - Mapping the Protein-protein Interaction Free Energy Landscape: Energy-based approach to scoring docking decoys

K. Anton Feenstra, IBIVU/VU University Amsterdam, Netherlands

Qinzhen Hou, IBIVU/VU University Amsterdam, Netherlands

Kamil K. Belau, Intercollegiate Faculty of Biotechnology UG&MUG, Poland

Jaap Heringa, IBIVU/VU University Amsterdam, Netherlands

Marc F. Lensink, CNRS UMR8576 / Université Lille Nord de France, France

Short Abstract: Measuring binding free energy is essential to understand the relevance of particular protein-protein interactions in their biological context. Moreover, at the atomic scale, molecular simulations give us insight into the physically realistic details of these interactions. In our recent study, we successfully applied coarse-grained molecular dynamics simulations to estimate binding free energy with accuracy similar to, and in 500-fold less time than, full atomistic simulation. The approach relied on the availability of crystal structures of the protein complex of interest. Here, we investigate the effectiveness of this approach as a scoring method to identify stable binding conformations among the decoys produced by protein docking.
We apply our method to rank more than 19000 protein conformations for 15 benchmark targets from the Critical Assessment of PRedicted Interactions (CAPRI). For each complex, we calculate the free energy barrier relative to the bound state based on coarse-grained molecular dynamics simulations. For the 'easier' targets that have many near-native conformations, we obtain a strong enrichment of acceptable or better quality structures in our top 100 selected structures. Moreover, for the 'hard' targets with no or very few near-native complexes in the decoys, our method is still able to select structures which are close to near-native structures. To the best of our knowledge, this is the first time interaction free energy from a coarse-grained force field has been used as a scoring method to rank docking solutions at a large scale.

L23 - Protein structure bias exhibited by translating ribosomes

Alistair Martin, University of Oxford,

Charlotte Deane, University of Oxford,

Short Abstract: Ribosome profiling measures the relative number of ribosomes attached to an mRNA strand and, crucially, their positions to near codon-level resolution. A simple abstraction lets us use these relative numbers of ribosomes in a gene as a measure of the relative time spent on each codon. For our research, we have combined these relative durations with the experimentally determined protein structures for genes in E. coli, B. subtilis, and S. cerevisiae. We find that there is a relationship between the time spent on a specific codon and the structural element it encodes. These results indicate an additional layer of information pertaining to protein structure encoded within the codon choice exhibited in mRNA. This is surprising, as the central dogma suggests that a protein retains no knowledge of the exact nucleotide sequence which led to its creation. We hope that in the future we can use these results to improve upon the current biophysical understanding of translation and structure formation.
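One simple way to turn per-codon ribosome counts into relative durations is to divide each count by the gene's mean count. The sketch below illustrates that normalisation only; the abstract does not state which normalisation was actually used, so the function name `relative_dwell_times` and the mean-based scaling are assumptions.

```python
def relative_dwell_times(codon_counts):
    """Scale per-codon ribosome footprint counts by the gene's mean count.

    A value above 1.0 means ribosomes were observed on that codon more often
    than on an average codon of the same gene (a proxy for slower decoding).
    """
    total = sum(codon_counts)
    if total == 0:
        raise ValueError("gene has no ribosome footprints")
    mean = total / len(codon_counts)
    return [count / mean for count in codon_counts]

# Hypothetical footprint counts for a three-codon stretch
print(relative_dwell_times([2, 4, 6]))
```

Dividing by the per-gene mean removes differences in expression level between genes, so values from different genes can be compared against structural annotations on a common scale.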

L24 - Subtype-specific Structural Characteristics and Molecular Dynamics of Glycosylated HIV-1 gp120 Proteins

Natasha Wood, University of Cape Town, South Africa

Short Abstract: While efforts to develop an effective vaccine to prevent HIV infection have achieved limited success, recent studies have shown that the human immune system can produce broadly cross-neutralising (BCN) antibodies capable of neutralising a large spectrum of HIV strains. For many of these BCN antibodies, carbohydrates on the surface of the HIV-1 gp120 glycoprotein play a key role in this process, since their epitopes consist either entirely or partially of carbohydrates. Using molecular dynamics (MD), we have previously shown that the presence of N-linked glycans has a significant effect on the dynamics of the gp120 glycoprotein. However, these studies were limited due to challenges associated with building 3D structures for densely glycosylated glycoproteins.

Here, we present an approach that explores the most populated rotamers of the Asn-GlcNAc linkage and then adapts the carbohydrate structure to its environment by iteratively rotating the interglycosidic linkages within normal bounds. This automated process increases the number of carbohydrates that can be attached to glycoproteins without the need for manual adjustment or energy minimisation. We have applied this method to investigate the structural differences in HIV-1 gp120 subtype-specific glycosylation profiles. Our results suggest that the high density of glycans on the surface of HIV-1 gp120 causes mutual exclusion to occur, which may result in associated, and distinctive, glycan-protein or glycan-glycan interactions. Using MD, we further present results that illustrate how different carbohydrate distributions influence the spatial dynamics of the gp120 glycoprotein model, which may play a role both in coreceptor usage and in forming carbohydrate-dependent epitopes.

L25 - LoopIng: a template-based tool for predicting the structure of protein loops

Mario A. Messih, Department of Physics, Sapienza University, Italy

Anna Tramontano, Department of Physics, Sapienza University and Istituto Pasteur-Fondazione Cenci Bolognetti, Italy

Short Abstract: Motivation: Predicting the structure of protein loops is very challenging, mainly because they are not necessarily subject to strong evolutionary pressure. This implies that, unlike the rest of the protein, standard homology modeling techniques are not very effective in predicting their structure. However, loops are often involved in protein function by interacting with other molecules, hence inferring their structure is important for predicting protein structure as well as function.

Results: We have developed a method, LoopIng, which uses both sequence and geometry related features (i.e. loop sequence, sequence similarity, stem distance, stem secondary structure and stem geometry). These features are used in a Random Forest (RF) machine learning regression model that is trained to select, from a list of putative loop templates, the one with the lowest predicted 3D distance from the target loop. LoopIng achieves significant enhancements (~1Å RMSD) over the most recently available template-based and ab-initio methods (i.e. LoopWeaver and DisGro, respectively). In addition, compared to LEAP, which is an ab-initio method, LoopIng achieves similar accuracy for short loops (4-10 residues) and significantly better accuracy for long loops (11-20 residues) while being orders of magnitude faster (1 minute/loop vs. 10 hours/loop). The quality of the predictions is robust to the errors that unavoidably affect stem regions when these are modeled. The method returns a confidence score for the predicted template loops.

Availability and implementation: www.biocomputing.it/looping

L26 - Addressing the protein disorder content in Cryptococcus spp. genomes

Grace Tavares, Oswaldo Cruz Foundation, Brazil

Elvira Horácio, Oswaldo Cruz Foundation, Brazil

Leilane Gonçalves, Oswaldo Cruz Foundation, Brazil

Daniela Resende, Oswaldo Cruz Foundation, Brazil

Jeronimo Ruiz, Oswaldo Cruz Foundation, Brazil

Short Abstract: An increasing number of proteins are being identified that are biologically active though intrinsically disordered. This finding contrasts with the classic notion that proteins require a well-defined globular structure in order to be functional. In addition, intrinsically disordered proteins (IDPs) have the ability to interact with multiple protein targets and are classified as hub proteins in protein-protein interaction networks. These characteristics may favor host-parasite relationships, adhesion, invasion and survival of parasites in the host. The etiological agents of cryptococcosis are species of the fungus Cryptococcus spp. The disease typically affects immunocompromised patients and represents a neglected public health problem. The main goal of this work is to study the role of IDPs in the predicted proteomes of Cryptococcus spp. available in public domain databases. In order to achieve this goal, 12 Cryptococcus genomes have been structurally and functionally re-annotated and the predicted proteomes used for high-throughput screening of IDPs. Computational predictions were performed using DisEMBL, GlobPlot, ANCHOR, IUPred and Dis-PRO, among others. Consensus prediction was evaluated using in-house developed scripts and disordered proteins were identified. IDPs associated with virulence factors, functional enrichment analysis, Gene Ontology assignments and the comparative analysis of re-annotated genomes will be presented. The developed analytical approach and relational database will also be presented.
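The in-house consensus scripts are not described in the abstract. As one plausible reading, a per-residue vote across predictors could look like the sketch below; the vote threshold `min_votes`, the boolean per-residue input format and the function name `consensus_disorder` are assumptions for illustration only.

```python
def consensus_disorder(predictions, min_votes):
    """Per-residue consensus over several disorder predictors.

    predictions: list of equal-length lists of booleans, one list per
                 predictor, True where that predictor calls the residue
                 disordered.
    min_votes:   number of predictors that must agree for a residue to be
                 called disordered in the consensus.
    """
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("all predictors must cover the same sequence length")
    # zip(*predictions) walks the sequence residue by residue
    return [sum(votes) >= min_votes for votes in zip(*predictions)]

# Hypothetical calls from three predictors on a five-residue sequence
calls = [
    [True, True, False, False, True],
    [True, False, False, True, True],
    [False, True, False, False, True],
]
print(consensus_disorder(calls, min_votes=2))
```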

Financial support: CAPES, FAPEMIG, FIOCRUZ, CNPq.

L27 - A structural view into signaling and regulation of Toll-like receptor pathway and its implications in inflammation and cancer crosstalk

Emine Guven Maiorov, Koc University, Turkiye

Ozlem Keskin, Koc University, Turkiye

Attila Gursoy, Koc University, Turkiye

Ruth Nussinov, NCI, United States

Short Abstract: Although inflammation is crucial for defense against pathogens, it can also contribute to all phases of tumorigenesis if not finely tuned. The TLR pathway plays a central role in inflammation and cancer crosstalk, and construction of its structural pathway provides insights into its mechanism of action in the tumor microenvironment. We constructed the structural TLR pathway, and the architectures that we obtained (i) provide the structural basis for TLR clustering upon stimulation and assembly of key signaling complexes; (ii) demonstrate that almost all downstream parallel pathways are competitive; (iii) show that TIR domain-containing negative regulators (BCAP, SIGIRR, and ST2) interfere with TIR domain signalosome formation; (iv) show that major deubiquitinases (A20, CYLD, and DUBA) prevent association of TRAF6 and TRAF3 with their partners, in addition to removing K63-linked ubiquitin chains that serve as docking platforms for downstream effectors; and (v) illuminate mechanisms of oncogenic mutations. Missense mutations that fall on interfaces and nonsense/frameshift mutations that result in truncated negative regulators disrupt the interactions with their targets, thereby enabling constitutive activation of NF-κB and contributing to chronic inflammation, autoimmune diseases and oncogenesis.

L28 - Aquaria: simplifying discovery and insight from protein structures

Sean O'Donoghue, CSIRO & Garvan Institute, Australia

Kenneth Sabir, Garvan Institute, Australia

Maria Kalemanov, TU Munich, Germany

Christian Stolte, CSIRO, Australia

Benjamin Wellmann, TU Munich, Germany

Vivian Ho, Garvan Institute, Australia

Manfred Roos, TU Munich, Germany

Nelson Perdigão, Universidade de Lisboa & Instituto de Sistemas e Robótica, Portugal

Fabian Buske, Garvan Institute, Germany

Julian Heinrich, CSIRO, Australia

Burkhard Rost, TU Munich, Germany

Andrea Schafferhans, TU Munich, Germany

Short Abstract: Aquaria (http://aquaria.ws) helps biologists use protein 3D structures to gain insight into molecular function. It provides intuitive, graphical access to 48 million model structures that can be mapped with 65 million sequence features.

L29 - SOuLMuSiC: Predicting Protein Solubility Changes upon point Mutations

Raphael Bourgeas, Université Libre de Bruxelles, Belgium

Fabrizio Pucci, Université Libre de Bruxelles, Belgium

Marianne Rooman, Université Libre de Bruxelles, Belgium

Short Abstract: The improvement of protein solubility is an important objective in several fields of science and industry, such as the development of vaccines or the efficient production of proteins through endogenous or heterologous expression. The existing methods for predicting protein solubility are mainly based on the amino acid sequence. Here, we present a new method for predicting protein solubility changes upon point mutations (Δs), which is based on both the protein sequence and structure, and exploits a series of newly developed statistical potentials that are solubility-dependent. These potentials were extracted from datasets that contain only proteins with certain solubility values and thus reflect these specific properties.
We start with a statistical analysis of these solubility-dependent mean force potentials in order to understand the contribution of the different amino acid interactions to the solvation free energy. We also discuss the influence of the secondary structure, as well as the solvent accessibility of specific residues, on the solubility properties of the proteins.
In the next step, we built a computational tool that uses as input the sequence and three-dimensional structure of the wild type protein to predict Δs. Here, the newly developed solubility-dependent potentials are combined with the help of an artificial neural network. The different parameters of the neural network were identified by minimizing the root mean square deviation between the experimental and predicted Δs values for a dataset of mutations that have been collected from the literature.

L30 - Automatic Association of Enzyme EC Numbers with Pfam Protein Domains to Enrich Protein Structure Annotation

Seyed Ziaeddin ALBORZI, INRIA Nancy-Grand Est, France

David RITCHIE, INRIA Nancy-Grand Est, France

Marie-Dominique DEVIGNES, CNRS, France

Short Abstract: With the growing number of three-dimensional (3D) protein structures in the protein data bank (PDB), there is a need to annotate these structures at the domain level in order to relate protein structure to protein function. Currently, thanks to the new SIFTS database, many PDB structures are cross-referenced with Pfam domains and annotated by EC numbers. However, these annotations do not include any explicit relation between structural (Pfam) domains and EC numbers. Therefore, creating a mapping between EC numbers and Pfam domains will provide a new and more detailed level of protein structure annotation.

To achieve this aim, we first collected a list of associations between EC numbers and Pfam domains from SIFTS, Swiss-Prot, and TrEMBL. Then, by using the InterPro database as a “gold-standard”, we devised a pruning algorithm to pick frequent associations from the list of relations. Finally, we clustered the Pfam domains into EC classes where each EC class shares the first 3 digits (e.g. 1.1.1.*) of a 4-digit EC number.
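The final clustering step, grouping Pfam domains under the first three digits of the EC number, can be pictured with the short sketch below. It covers only that grouping, not the pruning algorithm, which the abstract does not specify. The function names and the example association pairs are illustrative assumptions and are not taken from the poster's data.

```python
from collections import defaultdict

def ec_class(ec_number):
    """Reduce a 4-digit EC number to its 3-digit class, e.g. '1.1.1.27' -> '1.1.1.*'."""
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected a 4-digit EC number, got {ec_number!r}")
    return ".".join(parts[:3]) + ".*"

def group_domains_by_ec_class(associations):
    """Collect the Pfam domains seen under each 3-digit EC class.

    associations: iterable of (ec_number, pfam_id) pairs.
    """
    groups = defaultdict(set)
    for ec_number, pfam_id in associations:
        groups[ec_class(ec_number)].add(pfam_id)
    return dict(groups)

# Illustrative pairs only (placeholders for the pruned association list)
pairs = [
    ("1.1.1.27", "PF00056"),
    ("1.1.1.37", "PF00056"),
    ("1.1.1.37", "PF02866"),
    ("2.7.11.1", "PF00069"),
]
print(group_domains_by_ec_class(pairs))
```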

In short, our method inferred 8350 high confidence associations between EC numbers and Pfam domains with an F-measure of 82%, while the gold standard InterPro database contains only 1500 associations. This suggests that many new PDB structures may now be annotated automatically at the 3D domain level.

L31 - On the Interest of Semi-Supervised Approaches with Spatial Dependence in Structural Alphabet Encoding

Ikram ALLAM, MTi laboratory, Paris 7, LPMA laboratory, Paris 6, Paris 5, France

Delphine FLATTERS, MTi laboratory, INSERM UMR-S 973 University Paris Diderot Paris 7, France

Leslie REGAD, MTi laboratory, INSERM UMR-S 973 University Paris Diderot Paris 7, France

Anne-Claude CAMPROUX, MTi laboratory, INSERM UMR-S 973 University Paris Diderot Paris 7, France

Gregory NUEL, PSB, LPMA laboratory, CNRS INSMI UMR 7599, France

Short Abstract: The structural alphabet (SA) has proved to be a powerful tool for compressing the three-dimensional (3D) conformations of proteins into a one-dimensional (1D) representation by encoding protein fragments into structural sequences, or Structural Letter sequences (SLs), which can be analysed using standard sequence analysis tools. SA has demonstrated its usefulness for protein analysis through many applications such as protein classification, structure alignment, fast structure comparison, extraction of functional motifs, etc.
The development of an SA able to integrate the flexibility of the 3D structure of proteins and of their cavities (pockets) capable of binding drugs is a crucial challenge in the drug design and drug discovery domains. This is now possible thanks to the growing number of protein 3D structures identified in the 3D protein databank PDB (≈ 10,700 structures).
In this work, we present a brief review of the various SAs available to encode 3D PDB structures into SLs. These SAs are based on mixture models, hidden Markov models (HMM) or classification tools. We then evaluate the interest of two fundamental concepts: a) unsupervised or semi-supervised training of the SA; b) accounting or not for the spatial dependence between protein fragments.
The results show that the integration of the semi-supervised approach with fragment spatial dependence, typically through HMM, can dramatically improve the performance of the SA. It could be a very promising approach for drug design applications.

L32 - A machine learning approach for protein-protein interaction prediction using rigid docking models

Yuri Matsuzaki, Tokyo Institute of Technology, Japan

Jaak Simm, (1) ESAT-STADIUS, KU Leuven, (2) iMinds Medical IT, (3) Dept of Gene Technology, Tallinn University of Technology (Estonia), Belgium

Short Abstract: Protein-protein interactions (PPIs) play key roles in living systems. Predicting relevant interacting partners from their tertiary structure is a challenging topic to which computer science methods have the potential to contribute. Protein-protein rigid docking has been applied for this purpose by several projects. However, the predictive power is limited, mainly because of the poor correlation between docking score and actual protein-protein binding affinity. To advance the state of the art, we propose a machine learning approach that improves docking-based PPI predictions by using site-specific residue profiles obtained by rigid docking. A possible application of our method is a problem in which we have one ‘receptor’ protein and want to pick out, from a pool of candidate ‘ligand’ proteins, those that have the potential to interact with it. We define an interface fingerprint of a pair of proteins as a collection of residues of the ‘receptor’ protein, with information on how often each residue is included in the binding sites of the 3600 highest scoring docking models. To examine whether interactors and non-interactors can be discriminated using this profile, we first used binding residues of known protein complexes to construct the profiles. We applied multiple machine learning methods to discriminate binders and non-binders using those profiles as input. In this poster we present an evaluation of our method by applying it to proteins of Protein Docking Benchmark ver. 4.0.
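The interface fingerprint defined above is essentially a per-residue frequency over the top-scoring docking models. A minimal sketch of that counting step follows; the input format (one set of receptor residue identifiers per docking model), the residue labels and the function name `interface_fingerprint` are assumptions, and the toy example uses four models rather than 3600.

```python
from collections import Counter

def interface_fingerprint(docking_models, receptor_residues):
    """Fraction of docking models whose binding site contains each receptor residue.

    docking_models:    list of iterables, each holding the receptor residue
                       identifiers found in the binding site of one model.
    receptor_residues: ordered list of all receptor residue identifiers.
    """
    counts = Counter()
    for binding_site in docking_models:
        counts.update(set(binding_site))  # count a residue once per model
    n_models = len(docking_models)
    return {
        residue: (counts[residue] / n_models if n_models else 0.0)
        for residue in receptor_residues
    }

# Hypothetical binding sites from four docking models
models = [{"A10", "A11"}, {"A11"}, {"A11", "A12"}, {"A10"}]
residues = ["A10", "A11", "A12", "A13"]
print(interface_fingerprint(models, residues))
```

The resulting fixed-length vector, one value per receptor residue, is the kind of input a standard classifier could consume to separate binders from non-binders.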

L33 - MOLE 2.5: Improved Tool for Analysis of Ligand-Accessible Channels

Lukáš Pravda, Central European Institute of Technology, Czechia

Karel Berka, Faculty of Science, Palacky University, Czechia

Radka Svobodová Vařeková, Central European Institute of Technology, Czechia

David Sehnal, Central European Institute of Technology, Czechia

Michal Otyepka, Faculty of Science, Palacky University, Czechia

Jaroslav Koča, Central European Institute of Technology, Czechia

Short Abstract: Ligand-accessible channels are indispensable for a huge variety of cell-life processes. They enable passage of substrate/product compounds to/from the active site when this site is deeply buried within the protein structure. Physicochemical properties of these channels, such as polarity, hydropathy, charge, bending and radius, greatly influence the specificity and selectivity of the enzymatic reaction. Therefore, precise detection of these channels and, especially, characterization of their properties are of major interest to many researchers involved in rationalizing their roles in enzymatic reactions. Such knowledge can be directly utilized in drug design, rational design of enzymes and other biotechnological applications.

Here we demonstrate the capabilities of the MOLE software in a PDB-wide analysis of ligand-accessible channels leading to buried ligands bound to the protein structure. Properties and specificities of individual channel classes, with respect to the ligand type, are discussed and statistically evaluated. MOLE is available free of charge as a standalone application and as plugins for popular molecular browsers (http://mole.chemi.muni.cz), or as a web service (http://mole.upol.cz).

L34 - MotiveQuery: Web application for fast detection of biomacromolecular fragments in the entire Protein Data Bank

Radka Svobodová Vařeková, Central European Institute of Technology, Czechia

Lukáš Pravda, Central European Institute of Technology, Czechia

David Sehnal, Central European Institute of Technology, Czechia

Crina-Maria Ionescu, Central European Institute of Technology, Czechia

Jaroslav Koča, Central European Institute of Technology, Czechia

Short Abstract: Presently, a large volume of biomacromolecular structural data can be mined in order to answer key biological questions. Various computational analyses are generally performed to complement this modern research approach, from simple comparisons to advanced molecular simulations. The first step of these analyses is usually data set preparation, which often includes important biomacromolecular fragments such as catalytic and binding sites, amino acid sequences, supersecondary structures etc. Unfortunately, the detection and extraction of these fragments from large structural databases like the Protein Data Bank is often a challenging task, especially for more complex fragments.

Here we present MotiveQuery, a web-based application designed for rigorous detection and fast extraction of such fragments. The application uses a unique chemical language with Python-like syntax to define the fragments that will be extracted from datasets provided by the user, or from the entire Protein Data Bank. Moreover, the database-wide search can be restricted using a variety of criteria, such as PDB ID, resolution, and organism of origin, to provide only relevant data. The extraction usually takes a few seconds for several hundreds of entries, up to an hour for the whole PDB. The detected fragments are made available for download to enable further processing, as well as presented in a clear tabular and graphical form directly in the browser. MotiveQuery is available free of charge at http://ncbr.muni.cz/MotiveQuery.

L35 - Determining the winning SH3 coalition: how cooperative game theory reveals the importance of domain residues in peptide binding

Ashley Conard, , United States

Elisa Cilia, Université Libre de Bruxelles, Belgium

Tom Lenaerts, Université Libre de Bruxelles, Belgium

Short Abstract: Cell signaling relies on protein-protein and protein-peptide interactions involving signaling domains, which typically recognize specific peptide motifs. For instance, SH3 domains bind preferentially to proline-rich amino-acid motifs. Phage-display experiments allow one to determine those motifs and whether surface or core domain mutants gain or lose preference for peptide motifs. Here, we present an approach utilizing the Shapley Value (SV) from Cooperative Game Theory to determine the importance of seven residues in the Fyn SH3’s hydrophobic core. The core positions and the residues in those positions represent the players of a cooperative game in which the worth of each coalition is measured through its capacity to discriminate the binding and non-binding mutants for certain classes of peptides. The players (positions or residues) can be seen as the features of SH3 mutants in a binary classification task. Essentially, we use a feature selection method based on the SV to assign a pay-off to each core position and residue. We quantify their importance in promoting peptide binding as well as their joint effects and their interactions, represented through networks. Our results provide novel insights suggesting that the Fyn SH3 domain must contain different signatures of amino acids to promote binding to various peptide classes. This analysis highlights residue importance for proper domain function, which helps scale conservation profiles (e.g. WebLogo) by adding functionally relevant properties. These detailed pieces of information contribute an effective and novel approach to understanding the role core residues play, next to normally investigated binding-site residues, in binding specific peptides.
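For readers unfamiliar with the Shapley Value, the sketch below computes it exactly by enumerating coalitions, which is feasible for a handful of players such as seven core positions. It is a generic textbook computation, not the authors' feature selection method. The toy worth function, in which a coalition discriminates binders only when two particular positions are both present, and the position names are invented for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, worth):
    """Exact Shapley value of each player.

    players: list of hashable player identifiers.
    worth:   function mapping a frozenset of players to a number
             (the value that coalition achieves on its own).
    """
    n = len(players)
    values = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(n):
            # Probability weight of a coalition of this size preceding `player`
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of `player` when joining this coalition
                total += weight * (worth(coalition | {player}) - worth(coalition))
        values[player] = total
    return values

# Toy game: the coalition is worth 1 only if pos1 and pos2 are both included
def toy_worth(coalition):
    return 1.0 if {"pos1", "pos2"} <= coalition else 0.0

print(shapley_values(["pos1", "pos2", "pos3"], toy_worth))
```

In this toy game the two cooperating positions share the pay-off equally and the third receives nothing, which is the behaviour that makes the Shapley Value attractive for ranking features whose effects depend on one another.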

L36 - Cloud-Based Visualization of Value-Added Model Annotations Using Jmol

Robert Hanson, St. Olaf College, United States

Short Abstract: Recent advances in cloud-based services now allow the real-time merging of 3D structural models of proteins and nucleic acids from wwPDB with calculational "annotations" -- model validation information for deposited structures, secondary structure analysis for RNA and DNA, sequence alignment information for proteins. This poster presents recent developments in Jmol carried out in collaboration with the European Bioinformatics Institute (EBI) and others that change the meaning of "visualization" with respect to biomolecules, extending such visualization in ways that have not been possible previously.

L37 - Accurate ab initio and Template-Based Predictions of Short, Intrinsically Disordered Regions via Bidirectional Recurrent Neural Networks Trained on Large-Scale Data Sets

BADR ALSHOMRANI, University College Dublin, Ireland

Short Abstract: Intrinsically disordered regions lack a well-defined 3D structure but play key roles in determining the function of numerous proteins. Although predictors of disorder have been shown to achieve relatively high rates of correct classification of these segments, improvements over the years have been slow, and accurate methods are needed that can accommodate the ever-increasing number of structurally determined protein sequences to boost predictive performance.
In this paper we propose a predictor for short disordered regions based on Bidirectional Recurrent Neural Networks and tested by rigorous 5-fold cross-validation on a large, non-redundant dataset collected from MobiDB, a new comprehensive source of protein disorder annotations. The system exploits sequence and structural information in the form of frequency profiles, predicted secondary structure and solvent accessibility, and direct disorder annotations from homologous protein structures (templates) deposited in the Protein Data Bank. The contributions of sequence, structure and homology information result in large improvements in predictive accuracy. Additionally, the large scale of the training set leads to low false positive rates, making our system a robust and efficient way to address high-throughput disorder prediction.

L38 - Coevolution of residues in the Htr2a and Htr2c serotonin receptors suggests that they act as a functional heterodimer that may be relevant in forming substance addictions.

Bernard Fongang, University of Texas Medical Branch at Galveston, United States

Min Zhu, University of Texas Medical Branch at Galveston, United States

Noelle Anastasio, University of Texas Medical Branch at Galveston, United States

Carrie McAllister, University of Texas Medical Branch at Galveston, United States

Andrzej Kudlicki, University of Texas Medical Branch at Galveston, United States

Short Abstract: Serotonin (5-HT) is a monoamine neurotransmitter which regulates activities such as sleep, appetite, substance addiction, and mood. The actions of 5-HT in neurons are transduced by at least 14 subtypes of 5-HT transmembrane receptors grouped into seven distinct classes largely on the basis of their structural and operational characteristics. Evidence and models suggest that some of the receptors may act as heterodimers. Specifically, it has been shown that selective blockade of 5-HT2A or activation of 5-HT2C consistently reduces impulsivity and suppresses both cue- and cocaine-evoked reinstatement. Moreover, recent studies in rats have shown that these receptors always act synergistically to achieve these functions, suggesting a possible contact between their residues in 3D structure. Direct experimental verification of this hypothesis by means of co-crystallization is difficult as they are both transmembrane proteins and their 3D structures have not yet been determined. Here, we used the Direct Coupling Analysis (DCA) method to reveal the native contacts between the Htr2a and Htr2c proteins in the heterodimer. By studying the co-evolution of residues in the two proteins over more than a hundred species, we identified putative residue contacts and showed that direct contacts between the two proteins are likely limited to extracellular domains. Our results improve the knowledge about serotonin receptors in general and may help understand how these receptors act synergistically to reduce cocaine dependence. Experimentally, the contact sites identified by DCA will be the targets of mutation studies that will confirm the significance of the Htr2a/Htr2c interactions.

L39 - RNA secondary structure prediction with pseudoknots using Newtonian dynamics simulations

Nils Petersen, University Hamburg, Germany

Andrew Torda, University Hamburg, Germany

Short Abstract: Background:
One of the major problems of secondary structure prediction from RNA sequences is that most algorithms are limited to nested structures without pseudoknots. This is due to both computational complexity and limitations of the existing energy models.

Description:
We have developed a new method to predict RNA pseudoknots using Newtonian dynamics in an artificial base pair space. This required converting parts of the popular, discrete nearest neighbor energy model into continuous force field terms and simulating non-physical particles (base-pair probabilities) in a one-dimensional space. This model imposes no restrictions on the topology of the structure and thus allows all kinds of pseudoknots. Furthermore, we demonstrate how it can be coupled to three-dimensional structure models and how secondary and tertiary structure representations can be simulated at the same time.

Conclusions:
On its own, the model is very simple and brings no improvement compared to other methods. However, it has the major benefit that it can be coupled to a 3D structure model. We will exploit this in the future. Adding a very simple coarse grained model will allow us to predict secondary structures which are physically plausible given the steric constraints. Conversely, we want to use the secondary structure model to bias more detailed simulations in order to enhance the sampling of 3D structures.

L40 - The Knob-Socket Model: An Amino Acid Code for Protein Packing

Keith Fraga, , United States

Hyun Joo, University of the Pacific, United States

Jerry Tsai, University of the Pacific, United States

Short Abstract: The complexity and non-specificity of protein packing represent a major impediment to further advances in understanding protein structure. By providing a simple one-to-one correspondence, the knob-socket model addresses this gap with motifs that directly relate amino acid composition to 3D residue conformations. Using the precision of Voronoi Polyhedra/Delaunay Tessellations to identify tertiary protein contacts, the knob-socket motif involves three or four residues in a tetrahedral configuration: a socket formed from three residues local in sequence presents a surface that may pack a knob residue distant in sequence. A filled socket packs with a knob and a free socket does not. While the organization of sockets is sensitive to the secondary structure, the nature of a knob does not depend on secondary structure. Because the compositions of filled and free sockets are often mutually exclusive, the knob-socket model establishes an amino acid code for protein packing. As a proof of concept, an α-helix forming sequence was successfully designed de novo to form oligomers using the knob-socket principles. Another capability of the knob-socket model is to project protein packing onto simple and descriptive 2D packing surface topology maps. Analysis of contiguous sets of sockets reveals the canonical packing patterns between secondary structures. Furthermore, the packing surface topologies produce insightful maps of entire protein structures and highlight the significant role of coil residues in protein structure. The use of packing surface topology maps demystifies the complexity of protein structure by allowing for a meaningful interrogation of structure.

L41 - Functional consequences of single and multiple phosphorylation on signaling pathways

HAFUMI NISHI, Tohoku University, Japan

Emek Demir, Memorial Sloan-Kettering Research Center, United States

Anna Panchenko, National Institutes of Health, United States

Short Abstract: Cellular fate depends on the spatio-temporal separation and integration of signals provided by phosphorylation events. To elucidate this signaling process, we sought to correlate molecular characteristics of single and multiple phosphorylation with their regulatory effects on signaling pathways. Using multiple existing databases (PhosphoSitePlus, KEGG, Reactome and Pathway Interaction Database), we integrated the data of individual human phosphosites with evidence on their corresponding kinases and the functional consequences of their phosphorylation on activity of the target protein and corresponding pathways. Statistical analyses of the integrated data revealed significant clustering, both in sequence and space, of multiple phosphorylation sites in a single target protein that have similar regulatory functions, share the same kinases and participate in regulation of similar pathways. Moreover, phosphorylation of sites with similar downstream functional consequences as well as sites regulated by the same kinase have comparable effects on protein stability. Taken together, our results indicate that there are certain patterns in phosphosites’ locations and structural/sequence properties, which can potentially play a role in mediating the communication between different functional states and pathways.

L42 - The Synergistic Effect of UPF1 Binding to 3'UTR on microRNA Targeting

Jwawon Seo, Hanyang University, Korea, Rep

Jin-Wu Nam, Hanyang University, Korea, Rep

Short Abstract: UPF1 is a well-known RNA helicase with essential roles in nonsense-mediated mRNA decay (NMD), a surveillance pathway. However, UPF1 binds predominantly not to the coding region but to the 3’UTR of mature mRNA and regulates mRNA abundance depending on the length of the 3’UTR. UPF1 also interacts with argonaute (AGO) proteins, which are major components of microRNA-mediated mRNA decay. The global relationship between UPF1 binding and miRNA-mediated mRNA decay in the 3’UTR is barely known. In our study, we reanalyzed available RNA-seq data obtained after UPF1 down-regulation to confirm an NMD-independent repression pathway. In selected non-NMD target genes, we found that UPF1 repressed mRNA abundance as a function of 3’UTR length. We identified binding sites of UPF1 and AGO2 using available CLIP-seq data. A majority of UPF1 binding sites co-localized with AGO2 binding sites in which endogenous miRNA (endo-miRNA) target sites in mouse embryonic stem (mES) cells are embedded. In UPF1-knockdown mES cells, endo-miRNA targets with co-localized UPF1 and AGO2 were significantly more de-repressed than targets with AGO2 alone and non-targets, suggesting a synergistic effect of UPF1 on miRNA targeting. 3’UTR length-controlled analysis showed a greater synergistic effect in longer 3’UTRs. In mRNAs with no endo-miRNA target sites, the 3’UTR dependency of UPF1-mediated mRNA down-regulation was greatly reduced. We also observed consistent results in UPF1-knockdown HeLa cells. Taken together, these results suggest that the synergistic role of UPF1 interacting with AGO in miRNA-mediated mRNA decay is evolutionarily conserved.

L43 - A database of naturally evolved mutations from E. coli strains mapped on the bacterial 3D PPIN: a tool to identify determinants of antibiotics resistance and design combined mutations to recover susceptibility

Alessandro Pandini, Brunel University London,

Arshad Khan, Brunel University London,

David Gilbert, Brunel University London,

Nigel Saunders, Brunel University London,

Short Abstract: Several strains of pathogenic bacteria have accumulated mutations conferring resistance to a wide range of antibiotics currently in use. A foreseeable solution is arising from the advent of personalised medicine and the affordability of genome sequencing. In the near future, treatments might be tailored directly to the mutation profile of the pathogen. Unfortunately, no tool is currently available to identify the link between multiple mutations and the range of resistance.

Additionally, antibiotic resistance greatly limits the use of bacteria for biosynthetic production of commodity chemicals. Bacterial safety and the ability to control cultures are critical for industrial exploitation. The ability to separate mutations conferring resistance from those related to antibiotic susceptibility could open new avenues to successfully engineer strains with selected susceptibility.

We aim to develop a computational tool for the selection of mutations for adaptive improvement of E. coli to confer susceptibility to selected antibiotics. The tool will comprise a database of naturally evolved mutations from more than 200 E. coli strains mapped on the bacterial 3D protein-protein interaction network (PPIN) and predictive software to mine the database for candidate mutations with the potential to confer adaptive and desired phenotypes when combined.

Here we present the first stage of the project: a MySQL database of mutation data, associated strain, experimental conditions, detailed phenotypic data (for selected strains) and associated PPI (protein and protein interactors). The resource is open to the scientific community working on synthetic biology in E. coli to contribute with deposition of data.

L44 - pdbBAM: a comprehensive mapping of Protein Data Bank proteins on the human genome

Matsuyuki Shirota, Tohoku University, Japan

Short Abstract: Protein structure plays an important role in interpreting the impact of non-synonymous mutations in the human genome. Although the number of protein structures in the Protein Data Bank (PDB) is increasing, it requires several steps to test whether structural information is available for the protein encoded by a genomic locus of interest or for the homologues of the protein. To make structural information easier to use for genomics researchers, pdbBAM was created as a data resource in which amino acid sequences of the proteins in PDB are mapped on the human genome by way of virtual mRNA sequences of PDB proteins. The data is constructed in binary sequence alignment/map (BAM) format so that it can be visualized with genome browsers together with the results derived from high-throughput sequencing and various annotations of the genome. In pdbBAM, each protein chain in PDB is represented by a sequence read mapped on a genomic position. The mapping is comprehensive, in that all of the proteins homologous to the human protein derived from a genomic locus are shown. The similarity of the PDB protein and the human protein at the locus can be grasped from the extent of base mismatches, which originate from the amino acid sequence differences. The residue number of each codon can also be obtained through a genome browser by employing a residue-level BAM file. This resource will facilitate the interpretation of genomic non-synonymous variations discovered through genome analyses.

L45 - Structural Basis of the Effect of Systemic Lupus Erythematosus-Associated Mutations in NCF2 on the Activity of NADPH Oxidase

Miriam Eisenstein, Weizmann Institute of Science, Israel

Don Armstrong, University of Southern California, United States

Raphael Zidovetzki, University of Southern California, United States

Chaim Jacob, University of Southern California, United States

Short Abstract: Systemic Lupus Erythematosus (SLE) is a multisystem autoimmune disorder associated with mutations in multiple genes, including NCF2. We found that mutation H389Q in NCF2 is an SLE risk factor in North Americans of European or Hispanic descent and mutation R395W is independently associated with SLE in Hispanics. Both mutations are within the PB1 domain of NCF2, which binds the PB1 domain of NCF4. NCF1, NCF2 and NCF4 are the cytosolic subunits of NADPH oxidase, a multi-subunit complex important for the generation of reactive oxygen species. Upon activation, the cytosolic subunits translocate to the membrane where, together with RAC-GTP, they form the active NOX2 complex. Dissociation of RAC-GDP and formation of RAC-GTP is regulated by guanine nucleotide exchange factors (GEFs) such as VAV1. NCF2 interacts directly with VAV1, enhances its GEF activity and amplifies RAC nucleotide exchange and NADPH oxidase activation.
The structural effect of the SLE-related NCF2 mutations was revealed by modeling the interaction of NCF2/NCF4 with VAV1. NCF2 H389 interacts with the zinc finger domain of VAV1 while R395 interacts with the C-terminus of NCF4 and together they bind to VAV1 DH domain. The mutations weaken or disrupt the binding. The two-point interaction of NCF2/NCF4 with VAV1 stabilizes VAV1 in a conformation adequate for nucleotide-free RAC1 binding. R395 also stabilizes the conformation of NCF2 loop 395-402 and our model of the complex NCF2/NCF4/VAV1/RAC1 suggests that this loop also interacts with RAC1 insertion domain and switch-II, contributing to the stabilization of nucleotide-free RAC1 and enhancing nucleotide exchange.

L46 - GGIP: GPCR-GPCR Interaction Pair Predictor

Wataru Nemoto, Graduate School of Tokyo Denki University, Japan

Yoshihiro Yamanishi, Kyushu University, Japan

Vachiranee Limviphuvadh, A*STAR, Singapore

Shunnsuke Fujishiro, Graduate School of Tokyo Denki University, Japan

Hiroyuki Toh, Kwansei Gakuin University, Japan

Short Abstract: G-Protein Coupled Receptors (GPCRs) are important pharmaceutical targets. More than 30% of currently marketed pharmaceutical medicines target GPCRs. A number of studies have reported that GPCRs function not only as their monomers but also as homo- or hetero-dimers or higher-order molecular complexes. Many GPCRs exert a wide variety of functions by a specific combination of GPCR subtypes. In addition, some GPCRs are reported to be associated with diseases. Thus, GPCR oligomerization is now recognized as an important event in various biological phenomena, and many researchers are investigating the subject. As of today, more than 400 GPCR subtype pairs have been reported to form homo- or hetero-oligomers, and the number is still increasing. However, there are more GPCR pairs whose oligomer formations have not been examined yet. Hence, we have developed a method to predict interacting pairs for GPCR oligomerization by integrating structure and sequence information. The performance of our method was evaluated by Receiver Operating Characteristic curve. The corresponding Area Under the Curve (AUC) was 0.889. As far as we know, there is no method to predict interacting pairs among GPCRs. We will show examples of predicted GPCR pairs, which may be associated with diseases through disease tissue specific upregulation of GPCR genes or nsSNPs on GPCR genes corresponding to the interfaces for GPCR oligomerization.
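
For readers unfamiliar with the evaluation measure mentioned above, the following minimal sketch shows one common way to compute the area under a ROC curve: as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The labels and scores are hypothetical toy data, not results from GGIP.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a known interacting pair, 0 for a non-interacting pair.
    scores: predicted interaction scores (higher means more likely).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four candidate pairs.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.6, 0.3]
auc = roc_auc(labels, scores)
print(auc)
```

An AUC of 1.0 means every positive is ranked above every negative, and 0.5 corresponds to random ranking. The pairwise form is quadratic in the number of examples; a rank-based formulation is preferable for large datasets.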

L47 - Addressing binding site specificities of bromodomains using an in silico drug discovery pipeline

Mehrosh Pervaiz, Albert Ludwigs University of Freiburg, Germany

Short Abstract: Bromodomains (BRDs) are emerging epigenetic targets in various types of cancer [1]. They specifically recognize ε-N-acetylated lysine residues (Kac) on the unstructured histone tails. The human bromodomain family comprises 61 BRDs, distributed across a wide range of functionally diverse proteins. These bromodomains cluster into eight structural classes, all of which share the conserved bromodomain fold with a largely hydrophobic binding pocket [2], making it difficult to achieve selectivity when looking for potential binders, particularly within a structural class. In this work we compared proteins from the human bromodomain family in order to identify specificities of the binding sites within the different classes. Addressing the binding site specificities would allow us to develop inhibitors with good selectivity for the target proteins. The results of comparative analysis were incorporated into our in silico discovery pipeline. Putative targets were screened against an in-house collected library of small molecules using a fragment-based virtual screening approach. Here we present the virtual screening results for the bromodomain target BRWD(1). Potential ligands are being verified experimentally. High affinity binders will be obtained by performing iterative steps of modeling including fragment growing and linking, lead optimization, and experimental validation.

References

[1] Filippakopoulos P, Knapp S. Targeting bromodomains: epigenetic readers of lysine acetylation. Nat Rev Drug Discov. 2014;13:337–56.
[2] Filippakopoulos P, Picaud S, Mangos M, Keates T, Lambert JP, Barsyte-Lovejoy D, Felletar I, Volkmer R, Müller S, Pawson T, Gingras AC, Arrowsmith CH, Knapp S. Histone recognition and large-scale structural analysis of the human bromodomain family. Cell. 2012;149(1):214–31.

L48 - Modeling and Validation for Enhancing Design of Collagen and Fibronectin Stimulating Peptides for Tissue Engineering

Navaneethakrishnan Krishnamoorthy, QCRC & Imperial College London, Qatar

Yuan-Tsan Tseng, QCRC, Doha & Imperial College London, Qatar

Poornima Gajendra rao, Imperial College London, Qatar

Adrian Chester, Imperial College London,

Magdi Yacoub, Imperial College London,

Short Abstract: The extracellular matrix (ECM) consists of several molecules that regulate cellular functions. Of these, collagen and fibronectin are essential components. These key ECM proteins are important for the success of tissue engineering (TE), and thus stimulating their production by cells has great biomedical implications. Bio-mimetic peptides can be used to stimulate the cells. However, customizing the peptides for TE requires an understanding of their structural properties and the selection of an efficient molecular design. Here, we used computational bioengineering to enhance the molecular design of peptides for promoting stimulation of collagen and fibronectin in human adipose-derived stem cells (hADSCs). Molecular dynamics simulations were utilized to study their structural properties. This modeling study helped prioritize peptides that form an intact structural core and ordered assembly, which could attract cell receptors and promote their function. As a proof of concept, the prioritized peptides were synthesised and examined under experimental conditions to assess their functionality with hADSCs. The in vitro experiments showed that the peptides were non-toxic to the cells and increased the production of collagen and fibronectin in hADSCs. The molecular design from this study could be extremely useful for guiding hADSCs to produce biomaterials for tissue regeneration.

L49 - The Structural Dynamic Effects of Inhibitor Binding to Protein kinase C βII

Shashank Jariwala, University of Michigan, Ann Arbor, United States

Sivaraj Sivaramakrishnan, Department of Cell and Developmental Biology, University of Michigan, Ann Arbor, Michigan, USA., United States

Barry Grant, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA., United States

Short Abstract: Protein kinase C βII (PKCβII) regulates diverse cellular signaling pathways involved in B-cell activation, insulin signaling, and oxidative stress-induced apoptosis. Aberrant PKCβII activity has been implicated in a variety of human diseases including diabetic retinopathy and B-cell immunodeficiency. Accordingly, PKCβII has become a popular target for small molecule based inhibition. However, the effects of available inhibitors on the conformational dynamics and allosteric couplings essential for PKCβII function remain largely unknown. This lack of knowledge hampers the development of improved inhibitors and limits our understanding of how disease-associated mutations in distal sites can interfere with inhibitor efficacy. Here we combine molecular dynamics (MD) simulations with bioinformatics analysis to characterize distinct flexibilities and internal dynamical couplings upon inhibitor binding. MD simulations revealed increased flexibility of the nucleotide-binding P-loop and turn motif of the regulatory C-terminal tail when bound to the ATP-competitive inhibitor Bisindolylmaleimide-I (BIM1). Dynamical cross-correlation analysis revealed variations between ATP and inhibitor states localized to the turn motif, P-loop, and helices αE, αG and αH. The ATP state displays stronger couplings between the turn motif and P-loop, as well as distinct couplings between active-site residues that are lost upon inhibitor binding. Furthermore, correlation network path analysis indicates that the inhibitor decouples the P-loop and catalytic loop from the turn motif. Lastly, residues mediating distinct ligand-dependent couplings were identified in the C-terminal tail and N-lobe of the catalytic domain. Somatic mutations of a subset of these residues have been observed in cancer samples and the functional and structural dynamic effects of these mutations are currently being investigated.
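
The dynamical cross-correlation analysis mentioned above is commonly computed as the normalized covariance of atomic displacements over a trajectory. The following is a minimal sketch of that standard formula in plain Python, assuming the frames are already superposed on a common reference; the toy trajectory is hypothetical and the code is not the authors' pipeline.

```python
from math import sqrt


def dccm(frames):
    """Dynamical cross-correlation matrix from a superposed trajectory.

    frames: list of frames, each a list of (x, y, z) atom positions.
    Returns C[i][j] = <dr_i . dr_j> / sqrt(<dr_i^2> <dr_j^2>), where dr_i is
    the displacement of atom i from its mean position.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    # Mean position of every atom over the trajectory.
    mean = [[sum(f[i][d] for f in frames) / n_frames for d in range(3)]
            for i in range(n_atoms)]
    # Displacement of every atom from its mean, per frame.
    disp = [[[f[i][d] - mean[i][d] for d in range(3)] for i in range(n_atoms)]
            for f in frames]
    # Time-averaged dot products of displacements.
    cov = [[sum(sum(fr[i][d] * fr[j][d] for d in range(3)) for fr in disp) / n_frames
            for j in range(n_atoms)] for i in range(n_atoms)]
    # Normalize; assumes every atom moves at least once (non-zero variance).
    return [[cov[i][j] / sqrt(cov[i][i] * cov[j][j]) for j in range(n_atoms)]
            for i in range(n_atoms)]


# Hypothetical 4-frame trajectory: atom 1 moves with atom 0, atom 2 moves against it.
xs = [0.0, 1.0, 0.0, -1.0]
frames = [[(x, 0.0, 0.0), (x + 5.0, 0.0, 0.0), (10.0 - x, 0.0, 0.0)] for x in xs]
corr = dccm(frames)
print(corr[0][1], corr[0][2])
```

Values near +1 indicate atoms that move together, and values near -1 indicate anti-correlated motion. Comparing such matrices between two states, for example ATP-bound and inhibitor-bound, is one way to locate couplings that differ between them.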

L50 - GTP-Dependent K-Ras Dimerization

Serena Muratcioglu, Koc University, Turkiye

Tanmay S. Chavan, University of Illinois, United States

Benjamin C. Freed, Frederick National Laboratory for Cancer Research, United States

Hyunbum Jang, National Cancer Institute, United States

Lyuba Khavrutskii, Cancer and Inflammation Program, United States

Marzena A. Dyba, National Cancer Institute, United States

Sergey G. Tarasov, National Cancer Institute, United States

Attila Gursoy, Koc University, Turkiye

Ozlem Keskin, Koc University, Turkiye

Nadya I. Tarasova, National Cancer Institute, United States

Vadim Gaponenko, University of Illinois, United States

Ruth Nussinov, National Cancer Institute, Tel Aviv University, United States

Short Abstract: Ras-family proteins are small membrane-associated GTPases. They act as molecular switches coupling extracellular signals to various cellular responses. Ras-GTPases recruit and activate many downstream effectors, including Raf kinase and phosphatidylinositol 3-kinase (PI3K). Ras is believed to function as a monomer; however, since Raf dimerizes, it has been suspected that Ras may also dimerize. Here we show that GTP-bound K-Ras4B protein actively forms stable homodimers through its catalytic domain; we further show Ras-GTP/Ras-GDP heterodimers and higher-order assemblies. We observe two major dimer interfaces. Remarkably, the highly populated β-sheet dimer interface is at the Switch I and effector binding regions overlapping Raf’s, PI3K’s, and additional effectors’ binding sites. The second, helical interface also overlaps some effectors’ binding sites, such as PKCα. Both interfaces are also observed in Ras-GTP/Ras-GDP heterodimers. Raf’s activation requires side-to-side dimerization. Our data suggest that Raf competes with Ras’ β-sheet dimer interface; thus the helical interface may promote Raf’s activation. Our results reveal how active Ras monomers dimerize and interact with Raf, thereby unveiling Raf’s regulation, and suggest that drugs that disrupt the helical dimer interface may weaken Raf signaling in cancer.

L51 - Structural and evolutionary analysis of DNA recognition by Auxin Response Factors in plants.

Alex Slater, Pontificia Universidad Catolica de Chile, Chile

Rodrigo Gutierrez, Pontificia Universidad Catolica de Chile, Chile

Short Abstract: Plant growth and development processes are known to be regulated by genes that respond to Auxin. The expression of these genes is controlled by transcription factors of the ARF (Auxin Response Factor) family. They recognize AuxRE (Auxin Response Element) sequences in plant genomes through a B3-type DNA binding domain. ARF transcription factors have been well characterized in Arabidopsis thaliana, as have the AuxRE elements recognized by these factors, which are highly conserved. However, the conservation of the DNA binding domain and AuxRE elements has not been explored systematically in other plant groups. In this work we have conducted a structural and sequence analysis of B3 domain conservation in 4 taxonomic groups of plants. We combined the use of three-dimensional models of protein-DNA complexes, sequence pattern comparisons and phylogenetic analysis to explore the degree of conservation in the residues involved in DNA recognition. Our results show that there are three distinct patterns of conservation at the protein level in Bryophyta, Gymnosperm and Monocots+Dicots. This finding also suggests that AuxRE elements should be distinct in these three groups. The identification of these distinct patterns should be of interest for future investigations regarding both the identification of new target genes of ARF factors and the construction of regulatory networks controlled by Auxin.

L52 - A novel protein kinase domain present in bacterial effectors and putative polymorphic toxins of various pathogens

Krzysztof Pawlowski, Warsaw University of Life Sciences, Poland

Marcin Gradowski, Warsaw University of Life Sciences, Poland

Short Abstract: Novel protein kinase-like families continue to be discovered and characterised. We have recently discovered in silico a number of putative kinases (e.g. FAM69 and SELO kinase families).
Many infectious bacteria employ effectors that subvert host signalling. Using bioinformatics tools for remote homology detection, we present a robust prediction of protein kinase-like structure for a group of uncharacterised bacterial proteins present in several plant pathogens (e.g. Pseudomonas syringae), in a few animal pathogens (e.g. Burkholderia multivorans) and also in free-living organisms.
Proteins in the novel family have a fairly well-conserved catalytic region. Thus, they most likely are active kinases. They are poorly studied, but several of them are annotated as type III effectors, e.g. HopBF1 protein from Pseudomonas syringae. Although most HopBF1-like proteins are composed of the kinase-like domain only, several proteins possessing the HopBF1-like kinase domain are large molecules, several thousand residues long. This long-HopBF1 group is very diverse, coming from different bacterial genera (e.g. Pseudomonas, Ralstonia, Variovorax, Cupriavidus, Kribbella, Burkholderia). Their domain compositions and genomic neighbourhoods strongly suggest these proteins are diverse polymorphic toxins, delivered by different secretion systems (mostly type III secretion system, but also type V, VI and VII). Some of the putative kinase toxin domains are accompanied by novel metalloprotease or cysteine protease domains that could be additional toxins or releasing proteases. A few putative HopBF1 polymorphic toxins have repeated kinase-like domains.
The novel protein kinase-like family expands the known kinome and deserves experimental determination of the detailed molecular mechanisms and biological roles within the signalling networks.

L53 - HotMuSiC: Predicting Protein Thermal Stability Changes upon point Mutations

Fabrizio Pucci, Universite Libre de Bruxelles, Belgium

Raphaël Bourgeas, Universite Libre de Bruxelles, Belgium

Marianne Rooman, Universite Libre de Bruxelles, Belgium

Short Abstract: The prediction of thermal stability changes upon point mutations is an important and highly non-trivial goal in protein science with a wide range of applications. We tackle this problem by building a tool that uses as input the three-dimensional structure and the melting temperature of the wild type protein and is able to predict in a fast and precise way the melting temperature change upon point mutations (ΔTm). It is based on a series of statistical mean force potentials which are temperature-dependent, and are linearly combined with the help of a double-layer artificial neural network (ANN). Volume terms that describe the creation of cavities or the accommodation of stress in the protein structure were added. The activation functions of the ANN were chosen to be linear functions of the solvent accessibility of the mutated residues for the first layer, and of the melting temperature of the wild type protein for the second layer. The coefficients of the linear combination were identified by requiring the minimization of the root mean square deviation between the predicted and experimental ΔTm’s for a dataset of 1601 mutations. The performance of our method is evaluated in cross-validation, and yields a root mean square deviation between predicted and experimental ΔTm’s of about 7°C, which reduces to 4°C when ten percent outliers are removed. Our method thus achieves quite good scores and is moreover the first large scale predictor of thermal stability changes that makes no detour through the thermodynamic stability.
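
The evaluation measure described above, a root mean square deviation between predicted and experimental ΔTm values with and without outliers, can be sketched as follows. The abstract does not state how outliers are selected; this sketch assumes they are the predictions with the largest absolute errors, and the ΔTm values shown are hypothetical.

```python
from math import sqrt


def rmsd(predicted, experimental):
    """Root mean square deviation between two equal-length value lists."""
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return sqrt(sum(squared) / len(squared))


def trimmed_rmsd(predicted, experimental, outlier_fraction=0.10):
    """RMSD after dropping the given fraction of largest absolute errors.

    Assumption: "outliers" are the worst-predicted mutations by |error|.
    """
    errors = sorted(abs(p - e) for p, e in zip(predicted, experimental))
    n_drop = int(round(outlier_fraction * len(errors)))
    kept = errors[:len(errors) - n_drop]
    return sqrt(sum(e * e for e in kept) / len(kept))


# Hypothetical ΔTm values in °C for ten mutations.
experimental = [0.0] * 10
predicted = [1.0] * 9 + [10.0]   # nine small errors and one large outlier

full = rmsd(predicted, experimental)
trimmed = trimmed_rmsd(predicted, experimental)
print(round(full, 3), round(trimmed, 3))
```

Because errors are squared, a small number of badly predicted mutations can dominate the RMSD, which is why reporting both the full and the trimmed value gives a clearer picture of typical performance.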

L54 - Prediction of protein subcellular localization using LocTree3

Tatyana Goldberg, , Germany

Short Abstract: The prediction of protein sub-cellular localization is an important step toward elucidating protein function. For each query protein sequence, LocTree2 applies machine learning (profile kernel SVM) to predict the native sub-cellular localization in 18 classes for eukaryotes, in six for bacteria and in three for archaea. The method outputs a score that reflects the reliability of each prediction. LocTree2 has performed on par with or better than any other state-of-the-art method. Here, we report the availability of LocTree3 as a public web server. The server includes the machine learning-based LocTree2 and improves over it through the addition of homology-based inference. Assessed on sequence-unique data, LocTree3 reached an 18-state accuracy Q18 = 80 ± 3% for eukaryotes and a six-state accuracy Q6 = 89 ± 4% for bacteria. The server accepts submissions ranging from single protein sequences to entire proteomes. Response time of the unloaded server is about 90 s for a 300-residue eukaryotic protein and a few hours for an entire eukaryotic proteome not considering the generation of the alignments. For over 1000 entirely sequenced organisms, the predictions are directly available as downloads. The web server is available at http://www.rostlab.org/services/loctree3.

L55 - Structure-Based Prediction of Transcription Factor Binding Specificity using an Integrative Energy Function

Alvin Farrel, University of North Carolina at Charlotte, United States

Jun-tao Guo, University of North Carolina at Charlotte, United States

Short Abstract: Transcription factors (TFs) are essential to regulation of gene expression through binding to specific target DNA sites. Structure-based methods for studying TF-DNA interactions can help us annotate TF-binding sites at genome-scale, better understand the effects of mutations in transcription factors and target sites, and facilitate structure-based drug design. We have previously developed knowledge-based residue-level statistical potentials for structure-based TF-binding site prediction and TF-DNA docking. Here we describe novel algorithms for improving structure-based TF-binding site prediction, which include a hybrid energy function for binding affinity calculations, and a unique method for selecting true binding sequences. The new energy function combines atomic-level energies with statistical knowledge-based residue-level potentials. The atomic terms include hydrogen bond energy between protein residues and DNA bases, and the electrostatic energy between aromatic residues and DNA bases involved in π stacking interactions. The new method of selecting binding sequences uses the rate of change of the calculated energy scores. Our results show that adding the new atomic terms to the knowledge-based potential and using the novel strategy for sequence selection increase TF binding site prediction accuracy when tested on the two largest TF families, zinc-finger and homeodomain proteins. This method is currently being tested on a large dataset with representatives from other TF families.

L56 - Global view of the protein universe

Rachel Kolodny, University of Haifa, Israel

Nir Ben-Tal, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel

Sergey Nepomnyachiy, Polytechnic Institute of New York University, United States

Short Abstract: To globally explore protein space, we represent all similarities among a representative set of domains as networks. In the “domain network” edges connect domains that share “motifs,” i.e., significantly sized segments of similar sequence and structure, and in the “motif network” edges connect recurring motifs that appear in the same domain. These networks offer a way to organize protein space, and examine how the definition of “evolutionary relatedness” among domains influences their structure. At excessively strict thresholds the networks fall apart; at very lax thresholds, there are network paths between virtually all domains. Interestingly, at intermediate thresholds the network comprises two regions: “discrete” and “continuous.” The discrete region consists of isolated islands, each generally corresponding to a fold; the continuous region is dominated by domains with alternating alpha and beta elements. The networks can also suggest evolutionary paths between domains, and be used for protein search and design.

L57 - Detecting Molecular Similarities Between Allergenic And Metazoan Parasitic Proteins: Allergy In The Light of Immunity

Nicholas Furnham, London School of Hygiene and Tropical Medicine,

Nidhi Tyagi, European Molecular Biology Laboratory,

Edward Farnell, University of Cambridge,

Colin Fitzsimmons, University of Cambridge,

Stephanie Ryan, University of Edinburgh,

Rick Maizels, University of Edinburgh,

David Dunne, University of Cambridge,

Janet Thornton, European Molecular Biology Laboratory,

Short Abstract: Allergic reactions are observed to be very similar to those implicated in the acquisition of an important degree of immunity against metazoan parasites, eliciting a similar immunoglobulin E (IgE) immune response. Based on the hypothesis that IgE-mediated immune responses evolved to provide extra protection against metazoan parasites rather than to cause allergy, we predict that environmental allergens will share key molecular properties with metazoan parasite antigens that are specifically targeted by IgE. Using large-scale computational studies, we have established molecular similarity between parasite proteins and allergens and are able to predict the regions of parasite proteins that potentially share similarity with the IgE-binding region(s) of allergens. Nearly half of the 2445 parasite proteins that show significant similarity with allergenic proteins fall within the 10 most abundant allergenic protein domain families. Our experimental studies support the predictions: we present the first confirmed example, in a worm, of a plant pollen-like protein that is the commonest allergen in pollen, and confirm that it is targeted by IgE in those exposed to infection in a schistosomiasis-endemic area of Uganda. The identification of such similarities explains the ‘off-target’ effects of the IgE-mediated immune system in allergy.

L58 - Antibody CDR loops. What defines them and how are they different?

Cristian Regep, University of Oxford,

Guy Georges, Roche Diagnostics GmbH, Germany

Jiye Shi, UCB Pharma,

Sudharsan Sridharan, MedImmune,

Charlotte Deane, University of Oxford,

Short Abstract: Antibodies are an essential part of our immune system, due to their high degree of specificity and affinity towards their targets. Therapeutic antibodies make use of this high specificity and affinity and now account for the majority of revenue in the sales of new bio-therapeutics. The affinity and specificity of antibodies are modulated by a set of six key binding loops called the Complementarity Determining Regions (CDRs). Multiple definitions for the start and end points of these six CDR loops have been proposed, based either on structure (e.g. Chothia) or sequence (e.g. Kabat and IMGT). The structural definition of CDRs given by Chothia was developed when fewer than 20 antibody structures were known. In this work we have re-examined the structural definition of CDRs using the almost 2000 structures now available. Our results show that in general the Chothia definition still holds true. However, we also found a few key changes.
In light of our revised definitions we analyzed the similarities and differences between CDR loops and general protein loops. We considered amino acid propensities, loop lengths, structure and flexibility. We find that some of the CDR loops show a high degree of similarity to general protein loops while others do not.

L59 - Protein disorder promotes protein conformational diversity

Alexander Monzon, UNQ, Argentina

Diego Zea, UNQ-LELOIR, Argentina

María Silvina Fornasari, National University of Quilmes, Argentina

Silvio C E Tosatto, University of Padova, Italy

Gustavo Parisi, National University of Quilmes, Argentina

Short Abstract: Large-scale analysis of protein conformational diversity, using RMSD as a measure of conformer similarity, has shown that the RMSD distribution peaks at low values (0.4-0.8Å) but is skewed towards higher values. In this work we studied the relationship between conformational diversity and the occurrence of protein disorder. Using 2383 proteins with their corresponding conformers taken from the CoDNAs database (36200), we measured the RMSD between all conformers of each protein (almost 800,000 comparisons), along with the disorder percentage and number of disordered regions in each conformer. Disordered regions were taken as those with at least five consecutive missing residues (terminal ends were excluded). A protein was classified as “ordered” if none of its conformers showed a disordered region, and as “disordered” if at least one conformer showed a disordered region. We found that the maximum RMSD distributions of ordered and disordered proteins differ (average RMSD of 0.81 and 1.24Å, respectively) and that above an RMSD of 0.9Å there is a 2.5-fold enrichment in disordered proteins. Interestingly, the ordered and disordered sets differ at the molecular function and cellular compartment levels when an enrichment test with GO terms is performed. Also, in the disordered set, most proteins (587) have their maximum RMSD in a disordered pair of conformers, while the rest (251) have it in an ordered pair. These subsets differ in the percentage of maximum disorder per protein and in the number of disordered regions. We think that these distributions could reflect functional adaptations depending on protein flexibility and disorder content.
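
The classification rule stated in the abstract above (a disordered region is a run of at least five consecutive missing residues, terminal ends excluded) can be sketched in Python as follows. The input representation, a per-residue list of booleans marking whether coordinates were observed, and the function names are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
def disordered_regions(observed, min_length=5):
    """Return (start, end) index pairs of internal runs of missing residues.

    `observed` holds one boolean per residue of the full sequence:
    True if the residue has coordinates in this conformer, False if missing.
    Runs that touch either terminus are ignored.
    """
    regions = []
    n = len(observed)
    i = 0
    while i < n:
        if observed[i]:
            i += 1
            continue
        j = i
        while j < n and not observed[j]:
            j += 1
        internal = i > 0 and j < n  # run does not touch a terminus
        if internal and (j - i) >= min_length:
            regions.append((i, j - 1))
        i = j
    return regions


def classify_protein(conformers, min_length=5):
    """Return 'disordered' if any conformer has a disordered region, else 'ordered'."""
    for observed in conformers:
        if disordered_regions(observed, min_length):
            return "disordered"
    return "ordered"
```

A protein whose only missing residues are a six-residue N-terminal tail is still classified as ordered under this rule, because terminal runs are excluded.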

L60 - Examining the conservation of kinks in alpha helices

Eleanor Law, University of Oxford,

Henry Wilman, University of Oxford,

Sebastian Kelm, UCB,

Jiye Shi, UCB, Belgium

Charlotte Deane, University of Oxford,

Short Abstract: Kinks are a feature of alpha helices in protein structures. They are known to have functional roles in some membrane proteins such as ion channels and G-protein coupled receptors (GPCRs). We have investigated whether kinks are conserved in homologous proteins, as such conservation might be expected for functional features. However, kinks may also be points of helix flexibility, which could be indicated by a lack of structural conservation. The angle of a kink in a helix can only be estimated. To investigate conservation, we must judge whether the difference in angle between two helices is significant. Therefore, we have developed a novel method to estimate error in measured kink angles.

To assess the conservation of kinks between homologs, we extracted sets of homologous helix pairs and homologous helix families from both soluble and membrane proteins. We find that many kinks are not conserved, and this is more common in soluble proteins. We show that kink conservation is related to both sequence similarity between homologs and conservation of specific residues. In GPCR transmembrane helices, we find that the kink angle can depend on the type of ligand. There is also an example of correlation between kink angles in adjacent helices, suggesting concerted motion. These observations support the importance of helix kinks in functional conformational changes.

L61 - Splice junctions are constrained by protein disorder

Ben Smithers, University of Bristol,

Matt Oates, University of Bristol,

Julian Gough, University of Bristol,

Short Abstract: Domains are units of protein structure that fold independently. However, many proteins, or regions of proteins, do not form a stable three-dimensional structure; these are the so-called intrinsically disordered proteins or disordered regions. The need for efficient transcription, translation and splicing all place constraints on the amino acid sequence of proteins. This work spans the central dogma by examining the relationship between the splicing of introns and the structure and disorder of the translated protein product.

It is known that efficient splicing requires nucleotide bias at the splice junction; we show that the preferred usage produces a distribution of amino acids that is disorder-promoting. We observe that efficiency of splicing, as seen in the amino-acid distribution, is not compromised to accommodate globular structure. Thus we infer that it is the positions of splice junctions in the gene that must be under constraint by the local protein environment. Examining exonic splice enhancers found near the splice junction in the gene reveals that these short DNA motifs are more prevalent in exons that encode disordered protein regions than in exons encoding structured regions.

Finally, we compare our results across different taxa, with 91 eukaryotic genomes including animals, plants, protists and fungi.

L62 - Local Frustration and the Energy Landscapes of Ankyrin Repeat Proteins

Rodrigo Parra, Universidad de Buenos Aires, Argentina

Rocio Espada, Universidad de Buenos Aires, Argentina

Nina Verstraete, Universidad de Buenos Aires, Argentina

Diego Ferreiro, Universidad de Buenos Aires, Argentina

Short Abstract: Introduction
Protein folding is today understood within the framework of the “Energy Landscape Theory”. In order to fold robustly, proteins must minimize their internal conflicts, following the Principle of Minimal Frustration. However, residual frustration can confer evolutionary advantages on native-state ensembles, sculpting protein dynamics and modulating protein function.
Ankyrin Repeat Proteins (ANKs) are composed of multiple copies of a ~33-residue motif and typically fold into elongated structures. Since residue-residue interactions remain local within and between neighboring repeats, these systems are useful for quantifying how local energetics affect folding and function.

Description:
We have applied a tessellation approach in order to detect and compare the structural variations on all the known structures of ANKs. We calculated the local frustration patterns over these structures and describe which parts of the repeat arrays tend to be energetically (un)favorable for protein folding. We have performed folding simulations of several ANKs, in order to describe their folding mechanisms and compare the results with the previously described parameters.

Results
Energetic conflicts are not randomly distributed over the canonical framework of ANK repeats. Enrichment of highly frustrated interactions in the residues that surround insertions, binding sites and deletions is observed. Highly conserved residues at the sequence level are connected by a network of minimally frustrated interactions. Despite their high structural similarity, ANKs with the same number of repeats can display different folding dynamics with variable levels of complexity. The effects of local frustration and structural variations in the folding ensembles will be discussed.

L63 - ECPred: Enzyme Prediction Using a Combination of Classifiers

Ahmet Süreyya Rifaioğlu, Middle East Technical University, Turkiye

Tunca Doğan, European Bioinformatics Institute,

Ömer Sinan Saraç, İstanbul Technical University, Turkiye

Mehmet Volkan Atalay, Middle East Technical University, Turkiye

Maria J. Martin, European Bioinformatics Institute,

Rengül Çetin-Atalay, Middle East Technical University, Turkiye

Short Abstract: Efficient and accurate protein function prediction methods are required to annotate proteins with unknown functions. Recent studies show that combining different methods enhances prediction accuracy. In addition, data preparation and post-processing of predictions are other important factors in the functional annotation of proteins. Here we propose “ECPred”, a novel hierarchical approach to predict Enzyme Commission (EC) numbers using classification methods similar to those in our previous approach for GO terms (GOPred). The main characteristics that differentiate our approach from existing studies are the use of a combination of independent classifiers, and novel data preparation and evaluation methods. ECPred consists of three independent classifiers: SPMap, Blast-kNN and Pepstats-SVM, which are subsequence-, similarity- and feature-based methods, respectively. ECPred combines these methods and gives a weighted mean score for each trained EC number. In ECPred we use hierarchical data preparation and evaluation steps to increase the accuracy of the predictions. Each EC number is trained as a separate classifier with its own training data. We therefore calculate an optimal decision threshold for each EC number, and only the predictions with weighted mean scores over the determined thresholds are presented as the set of predicted terms. The training set is composed of enzyme sequences and their EC number annotations in the UniProtKB/Swiss-Prot database. ECPred is trained for 851 EC classes. Cross-validation results have shown that ECPred can predict enzyme functions with high performance (average F-Score is 0.96).
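
The combination step described in the abstract above, a weighted mean of the three classifier scores compared against a per-EC-number threshold, might look roughly like the Python sketch below. The weights, thresholds, scores and function names are hypothetical; the abstract does not state how ECPred derives its weights, so a single weight per classifier is assumed here.

```python
def weighted_mean(scores, weights):
    """Weighted mean of classifier scores; both dicts are keyed by classifier name."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def predict_ec(scores_by_ec, weights, thresholds):
    """Return {EC number: combined score} for EC numbers scoring over their threshold.

    scores_by_ec: {ec_number: {classifier_name: score}}
    weights:      {classifier_name: weight}   (assumed shared across EC numbers)
    thresholds:   {ec_number: decision threshold}
    """
    predictions = {}
    for ec_number, scores in scores_by_ec.items():
        combined = weighted_mean(scores, weights)
        if combined > thresholds[ec_number]:
            predictions[ec_number] = combined
    return predictions
```

Keeping one threshold per EC number mirrors the abstract's point that each class is trained as a separate classifier with its own training data.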

L64 - Comparative modeling of full duplex DNA structures by satisfaction of spatial restraints

Ignacio Ibarra, Pontificia Universidad Católica de Chile, Chile

Francisco Melo, Pontificia Universidad Católica de Chile, Chile

Short Abstract: Only a few tools for three-dimensional (3D) structural modeling of DNA are freely available. Some of them produce models with structural defects: absent hydrogen bonds, bases without proper stereochemistry and artifacts in the sugar-phosphate backbone conformation. Others avoid these problems but are not simple to use, requiring a significant time investment to generate a model.

In this work, a fully automated computational tool for building 3D DNA models based on comparative modeling has been implemented using the MODELLER software suite. The protocol generates models that stay as close as possible to the template structure while base replacements are optimized, producing models with good stereochemistry. The protocol can be executed at large scale.

Experimental double-stranded DNA structures were used for the development of this tool. From a non-redundant set of 34 structures obtained from the PDB and NDB databases, a total of 18 geometrical restraints were derived. A retrospective analysis, with jackknifing of the restraints, was carried out on selected pairs of target sequences and template structures, comparing our modeling tool against an equivalent protocol implemented with 3DNA. Statistically significant differences in RMSD to the reference structures were found between models generated with this protocol and those generated with the equivalent 3DNA protocol, in two out of the three force fields used to assess the models.

Our results demonstrate the practical utility and robustness of this new tool for double-stranded DNA modeling. The next step is to extend this tool to the modeling of protein-DNA complexes.

L65 - Detection of related proteins in distant viral families

Olga Kalinina, Max Planck Institute for Info, Germany

Saskia Metzler, Max Planck Institute for Info, Germany

Silvia Caprari, Max Planck Institute for Info, Germany

Short Abstract: Analysis of evolutionary relationships between distant viral families presents particular difficulties, since the sequence similarity of viral proteins is rarely detectable outside the immediate viral family. We have performed an all-to-all sequence and structural comparison of viral proteins, and focused on cases where similarity is detected between proteins from viruses that use different types of nucleic acid to encode their genomes. We can split the corresponding protein families into families with balanced and unbalanced distributions of viral genome types. For the former category, we recapitulate viral hallmark genes (i.e. genes characteristic of viruses only and present in diverse species [1]) and other known widespread viral proteins [2], providing the first comprehensive analysis of these cases. The protein families of the latter category can often be characterized by horizontal gene transfer events, including transfer from the virus host.

[1] E. Koonin, T. Senkevich, and V. Dolja. The ancient virus world and evolution of cells. Biology Direct, 1(1):29, 2006.
[2] M. Krupovic and D. H. Bamford. Protein Conservation in Virus Evolution. In eLS, John Wiley & Sons, Ltd, 2001.

L66 - REST services for programmatic access to macromolecular structure data

Swanand Gore, EMBL-EBI,

Jose M. Dana, EMBL-EBI,

Eduardo Sanz García, EMBL-EBI,

Manuel A. Fernandez Montecelo, EMBL-EBI,

Glen van Ginkel, EMBL-EBI,

Saqib Mir, EMBL-EBI,

John Berrisford, EMBL-EBI,

Younes Alhroub, EMBL-EBI,

Ingvar Lagerstedt, EMBL-EBI,

Aleksandras Gutmanas, EMBL-EBI,

Michael Wainwright, EMBL-EBI,

Ardan Patwardhan, EMBL-EBI,

Sameer Velankar, EMBL-EBI,

Gerard Kleywegt, EMBL-EBI,

Short Abstract: PDBe (http://pdbe.org), a founding member of wwPDB and EMDataBank, manages the PDB and EMDB archives together with its partners. PDB and EMDB are the single worldwide archives of macromolecular structure and 3D cryo-EM data, respectively. These archives continue to enjoy vigorous growth in the volume and complexity of their basic data and annotations.

In order to maximize the utilization of this information and help researchers carry out data-intensive integrative bioinformatic analyses, PDBe has recently released REST APIs for accessing and searching the PDB and EMDB data, with the following key features:
* Lightweight, web-friendly, well-organized RESTful URLs
* Uniform access to a variety of information, e.g. from PDB and EMDB entries, SIFTS cross-references, structure quality, PISA assemblies, literature, etc.
* Powerful search capabilities based on Apache-Solr search engine (http://wwwdev.ebi.ac.uk/pdbe/entry/search/index)
* Information returned as structured JSON or JSONp, with appropriate HTTP status
* Detailed interactive documentation (http://www.ebi.ac.uk/pdbe/api/doc/)
* Illustrative use cases (https://github.com/PDBeurope/PDBe_Programming)
* E-mail list for announcements and discussions (pdbe-api-users@ebi.ac.uk)
* Weekly production cycle for up-to-date results
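
As an illustration of the kind of programmatic access listed above, the following Python sketch builds a request URL, fetches JSON and reads one field from the response. The endpoint path (`/pdb/entry/summary/<id>`) and the response layout (a dictionary keyed by lower-case PDB ID holding a list of records with a `title` field) are assumptions based on our reading of the public documentation and may differ from the current service; the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed base URL, consistent with the documentation link in the list above.
BASE_URL = "https://www.ebi.ac.uk/pdbe/api"


def entry_summary_url(pdb_id):
    """Build the (assumed) summary URL for one PDB entry."""
    return f"{BASE_URL}/pdb/entry/summary/{pdb_id.lower()}"


def fetch_json(url, timeout=30):
    """GET a URL and decode the JSON body; raises on HTTP errors."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_title(payload, pdb_id):
    """Read the title from an (assumed) {id: [record, ...]} payload, or None."""
    records = payload.get(pdb_id.lower(), [])
    return records[0].get("title") if records else None
```

A typical call would be `extract_title(fetch_json(entry_summary_url("1cbs")), "1cbs")`; the parsing step is kept separate from the network call so it can be checked against a saved response.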

The next-generation PDBe web-pages are populated exclusively with data from these services. User responses so far suggest that these services indeed fulfil a previously unmet need. The services will continue to be actively developed in a non-disruptive and backwards compatible manner.

L67 - Accurate GPCR-ligand modelling: Exploring insights from GPCRDock 2013

Reyhaneh Esmaielbeiki, University of Oxford,

Sebastian Kelm, UCB Pharma,

Jiye Shi, UCB Pharma,

Will Pitt, UCB Pharma,

Zara Sands, UCB Pharma,

Charlotte Deane, University of Oxford,

Short Abstract: G protein–coupled receptors (GPCRs) are the largest and most diverse group of membrane proteins. They serve as the main component of cellular signalling and are the target of almost 30% of available drugs. Therefore, knowledge of their 3D structure in complex with small-molecule ligands is crucial for drug design. Available structural data represent only a small number of human GPCRs, so computational modelling and docking are often used to fill this gap. However, these methods are not always successful in reproducing near-native structures. Here we study the most suitable strategies to generate accurate models of GPCR-ligand complexes.
We obtained the 528 GPCR models submitted to the community-wide GPCRDock 2013 competition and re-docked them to models of their ligands using Glide (multiple protocols), Gold, MOE and Autodock Vina. We show that for easy targets Glide outperforms the other docking methods both in building and in ranking near-native complexes. The results confirm that the quality of the modelled complex shows no correlation with the global quality of the GPCR model. In contrast, GPCR models with high binding site similarity to the native structure result in more accurate docked conformations. We exploit this knowledge to devise a homology-based method to filter out models which consistently result in low-accuracy poses. The remaining models are re-ranked using a variety of strategies in order to select the most accurate conformations.
Based on this study we propose an automated, repeatable protocol which allows building complexes with accuracy comparable to the top results of GPCRDock.

L68 - Structural Dynamics of Porphobilinogen Deaminase during the complex four step tetrapyrrole synthesis

Gopalakrishnan Bulusu, Tata Consultancy Services, India

Meenakshi Pradhan, Tata Consultancy Services, India

Navneet Bung, Tata Consultancy Services, India

Dibyajyoti Das, Tata Consultancy Services, India

Arijit Roy, Tata Consultancy Services, India

Short Abstract: Porphobilinogen deaminase (PBGD) is a unique enzyme: it catalyses the step-wise polymerization of four porphobilinogen monomers within a single catalytic site, in which a dipyrromethane cofactor is covalently attached to a conserved cysteine. To understand the protein dynamics and catalytic mechanism, we studied the sequence-structure-function relationship in PBGD homologs from E. coli, human, A. thaliana and P. falciparum. Molecular simulations suggest domain movements that accommodate the elongating pyrrole chain within the active site. The sequence variation of a conserved lysine and the structural dynamics of the PBGD homologs with docked substrates through the stages of chain elongation give insights into the differences in activity between species. Specific roles of active site residues in modulating catalysis are inferred. Steered molecular dynamics suggests that the linear product (1-hydroxymethylbilane) exits through the space between the domains flanking the active site loop.

L69 - PatchSearch: a fast method for flexible recognition of protein binding sites

Ines Rasolohery, Molécules Thérapeutiques in silico, France

Gautier Moroy, Molécules Thérapeutiques in silico, France

Frédéric Guyon, Molécules Thérapeutiques in silico, France

Short Abstract: Specific recognition of a drug by its target protein has raised many issues, due to non-specific interactions. In order to determine whether a ligand could interact specifically with its target protein, we have developed a new method called PatchSearch. The aim of our program is to identify similar patches of atoms among protein surfaces.
PatchSearch is based on quasi-clique detection in a correspondence graph. A clique is a group of geometrical and physicochemical links providing a possible structural alignment between patch and protein surface atoms. PatchSearch first computes all maximum cliques of a reduced-size graph. The best clique is then enriched with all compatible links from the clique neighborhood. The resulting quasi-clique includes both rigid and flexible parts of the query patch.
We assessed the ability of PatchSearch to recognize patches specific to the same ligand. For this purpose, we used a dataset proposed by Kahraman consisting of protein structures complexed with one of nine ligand types. Our results suggest that PatchSearch is able to identify patches interacting with the same ligand with good specificity and sensitivity.
Using a benchmark set up by Gunasekaran, we evaluated the capacity of PatchSearch to find a patch when a protein adopts different conformations. When patches have undergone large conformational changes, the quasi-clique approach allows relevant recognition of both rigid and flexible parts of patches.
We have also applied PatchSearch to patches interacting with a drug, indometacin, to recognize similar patches in a set of human protein structures.
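
To make the clique step in the abstract above concrete, here is a minimal Python sketch of maximum-clique search in a correspondence graph. It is not the PatchSearch implementation: it covers only the exact (rigid) clique search, uses a plain Bron-Kerbosch search with pivoting, and omits the quasi-clique enrichment. The atom representation (a label plus 3D coordinates), the distance tolerance `tol` and all function names are assumptions.

```python
import math
from itertools import combinations


def correspondence_graph(patch, surface, tol=0.5):
    """Nodes pair a patch atom with a surface atom of the same label.

    Two nodes are linked when the patch-side and surface-side distances
    agree within `tol`, i.e. both pairings fit one rigid alignment.
    Atoms are (label, (x, y, z)) tuples.
    """
    nodes = [(i, j)
             for i, (label_a, _) in enumerate(patch)
             for j, (label_b, _) in enumerate(surface)
             if label_a == label_b]
    adj = {node: set() for node in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # an atom may be matched only once
        d_patch = math.dist(patch[i][1], patch[k][1])
        d_surface = math.dist(surface[j][1], surface[l][1])
        if abs(d_patch - d_surface) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def maximum_clique(adj):
    """Return one largest clique (sorted) via Bron-Kerbosch with pivoting."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return sorted(best)
```

Each node of the returned clique is a (patch atom index, surface atom index) pair, so the clique as a whole is a candidate alignment of the patch onto the surface.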

L70 - Distinct Profiling of Antimicrobial Peptide Families

Abdullah Khamis, King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Magbubah Essack, King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Xin Gao, King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Vladimir Bajic, King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Short Abstract: Motivation: The increased prevalence of multi-drug resistant (MDR) pathogens heightens the need to design new antimicrobial agents. Antimicrobial peptides (AMPs) exhibit broad-spectrum, potent activity against MDR pathogens and kill rapidly, which has led to their recognition as a potential substitute for conventional antibiotics. Designing new AMPs using current in-silico approaches is, however, challenging due to the absence of suitable models, the large number of design parameters, testing cycles, production time and cost. To date, AMPs have merely been categorized into families according to their primary sequences, structures and functions. The ability to computationally determine the properties that discriminate AMP families from each other could help in exploring the key characteristics of these families and facilitate the in-silico design of synthetic AMPs.
Results: Here we studied 14 AMP families and sub-families. We selected a specific description of AMP amino acid sequence and identified compositional and physicochemical properties of amino acids that accurately distinguish each AMP family from all other AMPs, with an average sensitivity, specificity and precision of 92.88%, 99.86% and 95.96%, respectively. Many of the discriminative properties we identified have been reported in the literature as compositional or functional characteristics of the corresponding AMP family. We suggest that these properties could serve as guides for in-silico methods in the design of novel synthetic AMPs. The methodology we developed is generic and has the potential to be applied to the characterization of any protein family.

L71 - The Newest Developments of ProQ2

Karolis Uziela, Stockholm University, Sweden

Nanjiang Shu, Bioinformatics Services to Swedish Life Science, Sweden

Björn Wallner, Linköping University, Sweden

Arne Elofsson, Stockholm University, Sweden

Short Abstract: ProQ2 is a model quality assessment program (MQAP) that has been very successful in CASP experiments. The program extracts a number of different input features from the structures of protein models and uses them to train a Support Vector Machine (SVM) that predicts the quality score of the residues in these models. The quality score for the whole protein model (global score) can be easily derived by summing the scores for each residue (local scores) and dividing the sum by target length.

We have improved the accuracy of the original ProQ2 by 16% at the global level and by 9% at the local level by employing new training features derived from Rosetta energy functions. The Rosetta energy features were transformed by a sigmoidal function and averaged over varying window sizes. Moreover, we have noticed that ProQ2 scores correlate better with contact-based measures, such as LDDT or CAD score, than with superposition-based measures, such as GDT_TS or Sscore. Compared to Sscore, the global correlations are 29% better for LDDT and 14% better for CAD score. These and other ProQ2 developments are presented in this poster.
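
Two of the calculations described in this abstract are simple enough to sketch in Python: the global score as the sum of local scores divided by the target length, and the sigmoid transform followed by window averaging of per-residue energy features. The exact sigmoid, the window sizes and the handling of chain ends are not given in the abstract, so the logistic function and the truncated windows below are assumptions, as are the function names.

```python
import math


def global_score(local_scores, target_length):
    """Sum of per-residue scores divided by the target length.

    The target length may exceed the number of modelled residues,
    so missing residues lower the global score.
    """
    return sum(local_scores) / target_length


def sigmoid(value):
    """Standard logistic function (assumed form of the sigmoidal transform)."""
    return 1.0 / (1.0 + math.exp(-value))


def windowed_average(values, window):
    """Average each position over a centred window, truncated at the ends."""
    half = window // 2
    averaged = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        averaged.append(sum(values[lo:hi]) / (hi - lo))
    return averaged
```

For a per-residue energy list `energies`, a smoothed feature track would be `windowed_average([sigmoid(e) for e in energies], window)`, computed once per window size.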

L72 - Developing and validating a prototype pipeline for in silico functional classification of Nitrilases based on sequence data

Eleni-Fani Gkotsi, University of Cyprus, Cyprus

Vasilis Promponas, University of Cyprus, Cyprus

Short Abstract: Nitrilases constitute a 13-branch superfamily with a conserved catalytic triad [1]. Their biocatalytic potential has attracted interest in their classification based on functional properties. Herein, we report preliminary results on a prototype computational pipeline for nitrilase classification, taking as a starting point the multiple sequence alignment (MSA) of 22 nitrilases from the 1st branch of the superfamily [1].
Briefly, we construct libraries of profile Hidden Markov Models (pHMMs) of various lengths for each position of the MSA, aiming to uncover the regions most informative for the branch of interest. Individual pHMMs are assessed by different approaches, e.g. leave-one-out cross-validation, a self-consistency test, and testing against a small (thus not really representative) independent data set. Using several metrics (e.g., F1-score) we apply a series of heuristics to select groups of non-overlapping pHMMs that yield good classification performance, as assessed by combining the p-values of individual pHMMs using the Qfast algorithm [2].
Our results indicate that specific heuristics lead to robust elimination of false positives without sacrificing sensitivity, while predicting the queries of interest as well as the pHMM based on the full-length MSA and BLASTp do, using only ~40% of the available information. Additionally, more detailed insight into specific regions is gained as these are projected onto the PaNit structure from Pyrococcus abyssi [3]. We intend to further investigate whether these specific regions contain residues critical for nitrilase specificity.

References
[1] Brenner C. Curr Opin Struct Biol 2002;12:775-782.
[2] Bailey TL, Gribskov M. Bioinformatics 1998;14:48-54.
[3] Raczynska JE, et al. J Struct Biol 2011;173:294-302.
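
The p-value combination step mentioned in the abstract can be illustrated with the closed-form expression for the significance of a product of independent p-values, which is the quantity addressed in [2]. The sketch below evaluates that expression directly and does not reproduce the published algorithm or the pipeline itself; the function name `combined_p_value` and the example values are hypothetical, and the independence of the individual pHMM p-values is an assumption.

```python
import math


def combined_p_value(p_values):
    """P-value of the product of n independent p-values.

    For x = p_1 * ... * p_n, the probability of a product at least as small
    under the null is  x * sum_{i=0}^{n-1} (-ln x)^i / i!.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    log_x = sum(math.log(p) for p in p_values)  # ln of the product, avoids underflow
    t = -log_x
    term = 1.0   # (-ln x)^i / i! for i = 0
    total = 1.0
    for i in range(1, len(p_values)):
        term *= t / i
        total += term
    return min(1.0, math.exp(log_x) * total)


# Hypothetical p-values from three non-overlapping pHMMs scoring one query.
print(combined_p_value([0.01, 0.2, 0.05]))
```

Working with the logarithm of the product keeps the calculation stable when many small p-values are combined.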

L73 - The Bio3D Package: New Interactive Tools for Structural Bioinformatics

Barry Grant, University of Michigan, United States

Xin-Qiu Yao, University of Michigan, United States

Lars Skjærven, University of Bergen, Norway

Short Abstract: We present extensive updates to Bio3D, a package for both interactive and batch analysis of biomolecular structure, sequence and molecular simulation data. Features include the ability to read and write biomolecular structure, sequence and dynamic trajectory data, query and search major sequence and structure databases, perform alignment, superposition, dynamic domain identification, sequence and conformational clustering, correlation analysis, conservation analysis, normal mode analysis, principal component analysis, and many other common structural bioinformatics tasks. Bio3D also leverages the extensive graphical and statistical capabilities of the R environment and thus provides a useful framework for the exploratory interactive analysis of biomolecular sequence and structure data. Recent notable additions to the package include unique high-throughput ensemble normal mode analysis for examining and contrasting the dynamics of related proteins with non-identical sequences and structures, new consensus methods for quantifying dynamical couplings and their residue-wise dissection from correlation network analysis, interactive structure ensemble visualization as well as multicore support for many time intensive tasks. Here we describe these new capabilities with example applications. The previous version of Bio3D has been downloaded by over 13,700 researchers and cited over 180 times. Merging these new methods represents an important advance further facilitating the integration of structural dynamics and evolutionary analysis. The Bio3D package is distributed with full source code and extensive documentation as a platform independent R package under a GPL2 license from http://thegrantlab.org/bio3d/.

L74 - Bumps and traffic lights along the translation of secretory proteins

Michal Linial, The Hebrew University of Jerusalem, Israel

Shelly Mahlab, The Hebrew University of Jerusalem, Israel

Short Abstract: Protein translation is the most expensive operation in the cell. Therefore, the speed of translation and the allocation of resources are tightly controlled. In this study we show that the entire proteomes of yeast, fly, human, worm, plant and cow do not show unique properties at the N-terminal segment, whereas a signal is associated with signal peptide (SP)-containing proteins. We found a pattern in the N-terminal segment that slows down the translation rate for the SP proteome. We critically analyze these observations from statistical and evolutionary perspectives. We generalize our observation to other groups of proteins that are governed by such "speed controls". Specifically, the pattern of codons and their prevalence was tested for GPI-anchored and mitochondrial transit peptide-containing proteins. In all cases, a "speed control" pattern is recorded for all tested organisms. We conclude that tuning the translation of a nascent protein is essential for coping with the constraints imposed by proteins' cellular fate.

L75 - Origin of the tetratricopeptide repeat

Hongbo Zhu, Max Planck Institute for Developmental Biology, Germany

Joerg Martin, Max Planck Institute for Developmental Biology, Germany

Marcus Hartmann, Max Planck Institute for Developmental Biology, Germany

Edgardo Sepulveda, Max Planck Institute for Developmental Biology, Germany

Andrei Lupas, Max Planck Institute for Developmental Biology, Germany

Short Abstract: Tetratricopeptide repeat (TPR) is a solenoid repeat protein facilitating various protein interactions in cells. Each TPR repeat unit consists of 34 residues forming an alpha-helix hairpin. Repeat proteins are thought to have evolved by the amplification of peptides active as cofactors of RNA-based replication and catalysis (the RNA world hypothesis). In search of the origins of the TPR, we analyzed proteins of known structure for ancestral homologs of the helix hairpin that represents the TPR repeat unit, with a particular focus on proteins considered to be ancient. We identified several plausible homologs showing significant similarity in sequence and structure, including one from a ribosomal protein. In order to evaluate the potential of these helical hairpins to form a TPR fold by amplification, we designed a TPR-like motif from the ribosomal protein, using as few mutations as possible. As determined experimentally, constructs containing between two and four point mutations per repeat folded into TPR structures, all of them dimeric. As seen from the crystal structure of one of the constructs, its architecture is identical to that of natural TPR proteins, but the dimer interaction is novel. Introduction of the mutations into the organism harboring the ribosomal protein from which the hairpin was amplified shows that they are neutral with respect to survival and growth, suggesting that they could have been sampled by neutral drift in the course of evolution. Our results demonstrate that the TPR repeat could have arisen by the amplification of a previously existing, primordial helical hairpin.

L76 - Active site profile-based protein clustering is an efficient, accurate method to define protein functional groups

Janelle Leuthaeuser, Wake Forest University, United States

Brian Westwood, Wake Forest University, United States

Patsy Babbitt, University of California - San Francisco, United States

Jacquelyn Fetrow, Wake Forest University, United States

Short Abstract: The elucidation of protein molecular function lags far behind the rate of high-throughput sequencing technology; thus, the development of accurate and efficient computational methods to define functional relationships is essential. Protein clustering based on sequence similarity has emerged as a simple, high-throughput method for defining protein relationships, but sequence-based techniques are often inaccurate at defining molecular function details. Active site profiling (ASP) was previously developed to identify and compare molecular details of protein functional sites. Protein similarity networks were created using both active site similarity and sequence similarity for four manually curated SFLD superfamilies; results demonstrate that ASP-based clustering identifies detailed functional relationships more accurately than sequence-based clustering. Building on this, two iterative pipelines were developed using active site profiling and profile-based searches to cluster protein superfamilies into functional groups. First, the Two Level Iterative clustering Process (TuLIP) utilizes active site profiling and iterative PDB searches to divisively cluster protein structures into groups sharing functional site features. Across eight superfamilies, TuLIP clusters exhibit high correlation with SFLD functional annotations. Subsequently, the Multi-level Iterative Sequence Searching Technique (MISST) was developed to identify protein sequences that belong in each TuLIP group using iterative profile-based GenBank searches. The results indicate that these ASP-based methods introduce an efficient, accurate way to define functionally relevant groups that can be applied systematically and on a large-scale. Moreover, the approach can be applied more quickly than detailed manual curation, suggesting its value in guiding annotation efforts.

L77 - Combining knowledge-based and ab initio approaches for protein loop modelling

Claire Marks, University of Oxford,

Stefan Klostermann, Roche Diagnostics GmbH, Germany

Jiye Shi, UCB, Belgium

Charlotte Deane, University of Oxford,

Short Abstract: Loop modelling is an important problem in protein structure prediction - loop regions are usually the least accurate parts of a model, but are often functionally important. Loop modelling algorithms are commonly grouped into two categories: knowledge-based and ab initio. Knowledge-based methods use databases of previously observed structures to find suitable conformations for a given target loop, enabling high accuracy predictions to be made when the target has a structure similar to one previously observed. Ab initio methods generate possible conformations computationally, and as such are able to access previously unseen regions of the conformational space – however as this space is very large, predictions can be less accurate. Here we present a new loop modelling algorithm that combines aspects of the two approaches, taking relevant structural information from a loop fragment database and completing the structure using ab initio techniques. This method therefore generates the high-accuracy predictions that can be achieved by knowledge-based methods, whilst maintaining the high coverage of ab initio methods. We use our algorithm to predict the structures of antibody CDR-H3 loops – these loops are the main determinant of antibody binding and display huge diversity in sequence and structure. CDR-H3 is known to be particularly challenging for prediction. We achieve results as good as current knowledge-based methods, with 100% coverage.

L78 - The SUPERFAMILY database (and new beta website) in 2015

Natalie Thurlby, University of Bristol,

Matt Oates, University of Bristol,

Jonathan Stahlhacke, University of Bristol,

Dimitris Vavoulis, University of Bristol,

Ben Smithers, University of Bristol,

Owen Rackham, Medical Research Council Clinical Sciences Centre, Faculty of Medicine, Imperial College London,

Adam Sardar, e-Therapeutics plc, 17 Blenheim Office Park, Long Hanborough, Oxfordshire, OX29 8LN, UK,

Jan Zaucha, University of Bristol,

Hai Fang, University of Bristol,

Julian Gough, University of Bristol,

Short Abstract: SUPERFAMILY is a database and a website resource (including a new beta website), which is freely available to the public. A library of expertly curated hidden Markov models provides sequence homology to SCOP structural domains. These are used to provide domain annotation for sequences provided by UniProt, Ensembl and other resources. In addition to this it also contains a collection of 3,258 proteomes, many of which cannot be found elsewhere. This is double the number that were present in SUPERFAMILY four years ago. These proteomes now feature a proteome quality index (PQI), where they are rated according to 11 different metrics.

SUPERFAMILY now provides users with an expanded and daily updated phylogenetic tree of life (sTOL). This tree is built with genomic-scale domain annotation data as before, but constantly updated when new species are introduced to the sequence library. Our Gene Ontology and other functional and phenotypic annotations previously reported have stood up to critical assessment by the function prediction community. We have now introduced these data in an integrated manner online at the level of an individual sequence, and—in the case of whole genomes—with enrichment analysis against a taxonomically defined background.

L79 - Transmembrane helix prediction in 2015: evaluation and a novel method

Jonas Reeb, Technische Universität München, Germany

Michael Bernhofer, Technische Universität München, Germany

Edda Kloppmann, Technische Universität München, Germany

Burkhard Rost, Technische Universität München, Germany

Short Abstract: Computational methods for the prediction of transmembrane helices remain popular due to the challenges membrane proteins pose for experimental structure determination. We examined a set of 12 prediction methods, both well known and widely used, on a new non-redundant dataset of 190 transmembrane proteins with high resolution structures. Furthermore, we used datasets containing soluble proteins and proteins with signal peptides to estimate discrimination of transmembrane proteins and helices from soluble proteins and signal peptides.

Applying a more stringent performance measure than is typical largely confirmed a high level of prediction performance. On the other hand, all methods showed an overestimation of their performance, with significantly worse predictions on a subset of previously unseen proteins. Several additional trends stood out: First, unlike multiple previous evaluations, we found prediction performance to be higher for eukaryotic proteins. Next, performance was worse for proteins with more transmembrane helices. Finally, while most methods discriminated very well between soluble and transmembrane proteins, the presence of signal peptides led to an increase of false positives in older methods.

Importantly, we have used the results from this evaluation to develop a new transmembrane helix prediction method, TMSEG, which combines machine learning with empirical filters and further incorporates homology information. TMSEG outperforms the best methods from our evaluation, in particular on multi-pass transmembrane proteins, and is freely available as part of the PredictProtein webserver.

L80 - Sequence co-evolution gives 3D contacts and structures of protein complexes

Charlotta Schaerfe, University of Tübingen/Harvard Medical School, United States

Debora Marks, Harvard Medical School, Germany

Thomas Hopf, Technische Universität München/Harvard Medical School, Germany

João Rodrigues, Utrecht University, Netherlands

Anna Green, Harvard Medical School, United States

Oliver Kohlbacher, University of Tübingen, Germany

Chris Sander, Memorial Sloan Kettering Cancer Center, United States

Alexandre Bonvin, Utrecht University, Netherlands

Debora Marks, Harvard Medical School, United States

Short Abstract: The interactions of proteins with other biomolecules are essential for all biological activity, and thus the accurate prediction of protein-protein interaction partners and interface residues has been of great interest to the scientific community. Here we present a method, EVcomplex, that predicts such information from the evolutionary sequence record alone by making use of residue coevolution between proteins.
This method can have major implications for topics ranging from the determination of the actual binding partners and binding sites in large protein complexes to whole-genome interactome predictions. In the presentation I will show that the evolutionary record allows us to predict novel protein-protein interactions as well as alternate binding conformations without additional external knowledge of the protein's 3D structure.
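
To make the idea of inter-protein residue coevolution concrete, the sketch below scores every pair of alignment columns, one from each protein, by their mutual information. This is a deliberately simple stand-in and not the EVcomplex method, whose statistical model is not described in this abstract. The function names, the toy alignments and the assumption that row k of both alignments comes from the same organism are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two paired alignment columns."""
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log((c * n) / (freq_a[a] * freq_b[b]))
        for (a, b), c in freq_ab.items()
    )


def inter_protein_scores(aln_a, aln_b):
    """Rank all (column in A, column in B) pairs by mutual information.

    aln_a and aln_b are lists of equal-length aligned sequences; row k of
    each alignment is assumed to come from the same organism.
    """
    cols_a = list(zip(*aln_a))
    cols_b = list(zip(*aln_b))
    scores = [
        (mutual_information(ca, cb), i, j)
        for i, ca in enumerate(cols_a)
        for j, cb in enumerate(cols_b)
    ]
    return sorted(scores, reverse=True)


# Toy paired alignments: column 0 of B varies together with both columns of A,
# while column 1 of B is fully conserved and carries no signal.
aln_a = ["AK", "AK", "GE", "GE"]
aln_b = ["DL", "DL", "RL", "RL"]
ranked = inter_protein_scores(aln_a, aln_b)
```

High-scoring column pairs are candidate inter-protein contacts. Pairwise measures like this one are confounded by phylogeny and by indirect correlations, which global coevolution models aim to separate out.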
