Accepted Posters

Attention Conference Presenters - please review the Speaker Information Page available here.

If you need assistance please contact submissions@iscb.org and provide your poster title or submission ID.

Category L - 'Protein Structure and Function Prediction and Analysis'

L01 - Structural Characterization of Protein Families is Slowing

John-Marc Chandonia, Berkeley National Lab, United States

Short Abstract: The number of new protein structures deposited every month in the PDB has steadily increased, and is now at over 750 structures per month. On average, fewer than 15 of these structures (i.e., 2%) represent the first solved structure from a protein family. Fifteen families per month is the lowest rate at which families have been structurally characterized in nearly 20 years, despite vastly more efficient technology for structure determination. Today less than half as many families are newly structurally characterized every month as during the heyday of Structural Genomics, between 2003 and 2007. Because the rate of sequencing has outpaced the rate of structural characterization of families, the fraction of large protein families with a known structure peaked 6-12 years ago, and is 10-20% lower today than it was at its peak. This makes interpretation of sequence variation more challenging than would otherwise be the case.

L02 - VIScreen: A comprehensive database for in silico screening of virus-inhibitor interactions

Pei Hao, Institute Pasteur of Shanghai, Chinese Academy of Sciences, China

Short Abstract: VIScreen is designed for high-throughput screening of virus-inhibitor interactions. VIScreen consists of a comprehensive database and a set of computing tools. The database currently contains 108,928 viral proteins indexed to 3015 viral species, 4619 viral protein structures (PDB), and a chemical library including 17,130 small molecule compounds. Four computational applications are currently integrated into the platform:

pfamscan for detecting protein domains;
3D molecular similarity calculation;
protein cavity detection to predict ligand binding sites for druggability evaluation;
PSI-DOCK for flexible protein-ligand docking.
A pre-determined set of protein-ligand interactions was built and saved in the database from the results computed by molecular similarity analysis, binding site detection, and protein-compound docking. A web interface is provided as a front end through which users can upload their query data and initiate an in silico screening for candidate virus inhibitors.
In summary, VIScreen is a unique web service developed specifically for high-throughput in silico screening of virus inhibitory compounds, and is the first such service to be freely available online.

L03 - Accurate detection of protein motifs and amino acids involved in protein subcellular localization.

Viola Volpato, University College Dublin, Ireland

Short Abstract: This poster is based on Proceedings Submission 32.
Functional annotation of unknown proteins from their amino acid sequence has become of major interest in proteomics. Prediction of protein subcellular localization and discrimination of protein sorting signals are key steps to elucidate protein function. However, conserved homology blocks in sorting peptides have not been discovered yet due to their high sequence variability. Nevertheless, it may be possible to shed light on the protein sorting process and, subsequently, to improve protein functional annotation by means of powerful Machine Learning techniques capable of analysing extensive data from protein sequence databases. Here we present novel methods able to extract correlations between protein sequences and their subcellular localizations found by our Neural Network architecture, SCLpredT, during subcellular localization prediction. In particular, our aim is to capture biologically relevant information in terms of sequence motifs and single amino acids involved in protein localization to extracellular, mitochondrial and chloroplast compartments. Our first method is capable of directly isolating the signal carried by each portion of a sequence and of providing a complete view of the signal trend along the entire sequence. It achieves high performance in recognizing real targeting peptides as the main features for correct protein localization. Moreover, at the residue level of analysis, our second approach identifies amino acid patterns and residue propensities that are consistent with data derived from experimental procedures. We believe that this information, if freely available and integrated into specialized systems, would be valuable for improving the annotation of uncharacterized proteins.

L04 - How antibodies chase antigens, how Antigens try to escape and how we can use this to predict antibody specificity

Yanay Ofran, Bar-Ilan University, Israel

Short Abstract: Abs must bind indistinct patches on proteins that attempt to escape recognition. They must be able to recognize virtually any surface while strictly maintaining their own fold. Little is known about the mechanisms that allow Abs to do this. Thus, while most drugs that are in clinical development are Abs, there is currently no simple way to determine experimentally or computationally what exactly they bind.
We will review a series of studies that revealed key mechanisms that enable Abs to perform these tasks. We will present a novel prediction approach that utilizes these findings, combined with simple competition assays, to predict where on an Ag a given Ab will bind. The accuracy of these predictions is verified experimentally using crystallography and other methods. To conclude, we will present further examples and discuss the power of combining sophisticated predictions with simple experiments.

L05 - A unified multitask deep learning architecture for predicting local protein properties

Yanjun Qi, University of Virginia, United States

Short Abstract: The detection of a variety of functionally important protein properties can be encoded as a machine learning task of labeling amino acids. Motivated by recent, successful work in natural language processing, we propose to use multitask deep learning to train a single, joint model that exploits the dependencies among these various labeling tasks. We describe a deep neural network architecture that, given a protein sequence, outputs a host of predicted local properties, including secondary structure, solvent accessibility, transmembrane topology, signal peptides and DNA-binding residues. The network is trained jointly on all these tasks in a supervised fashion, augmented with a novel form of semi-supervised learning in which the model is trained to distinguish between local patterns from natural and synthetic protein sequences. The task-independent architecture of the network obviates the need for task-specific feature engineering. We demonstrate that, for all ten tasks that we considered, the resulting model achieves state-of-the-art performance.

L06 - A site for direct integrin αvβ6•uPAR interaction from structural modelling and docking

Sowmya Gopichandran, Macquarie University, Australia

Short Abstract: Integrin αvβ6 is an epithelially-restricted heterodimeric transmembrane glycoprotein, known to interact with the urokinase plasminogen activating receptor (uPAR), playing a critical role in cancer progression. While the X-ray crystallographic structures of segments of other integrin heterodimers are known, there is no structural information for the complete αvβ6 integrin to assess its direct interaction with uPAR. We have performed structural analysis of αvβ6•uPAR interactions using model data with docking simulations to pinpoint their interface, in accord with earlier reports of the β-propeller region of integrin α-chain interacting with uPAR. Interaction of αvβ6•uPAR was demonstrated by our previous study using immunoprecipitation coupled with proteomic analysis by mass spectrometry. Recently this interaction was validated with proximity ligation assays and peptide arrays. The data suggested that two potential peptide regions from domain II and one peptide region from domain III of uPAR, interact with αvβ6 integrin. Only the peptide region from domain III is consistent with the three-dimensional interaction site proposed in this study. The molecular basis of integrin αvβ6•uPAR binding using structural data is discussed for its implications as a potential therapeutic target in cancer management.

L07 - Using predicted inter-residue contact information to fold globular proteins de novo

Tomasz Kosciolek, University College London,

Short Abstract: Recent years have seen growth in the number and quality of methods that accurately predict intra-protein contacts. Here, we present a systematic study showing the utility of predicted contacts in improving structure predictions for globular proteins. We investigate the potential benefits of combining a fragment-based folding algorithm, FRAGFOLD, with PSICOV, a contact prediction method which uses sparse inverse covariance estimation to identify co-varying sites in multiple sequence alignments. Using a comprehensive set of 150 globular proteins, each representing a different Pfam domain family, we examine how to use the intra-protein contacts most effectively and which features characterize correct models.

Overall we find that using fragment assembly with both statistical potentials and predicted contact information is significantly better than either approach alone. Results show nearly 80% correct predictions (TM-score ≥ 0.5) within the analysed dataset and a mean TM-score of 0.54. Unsuccessful modelling cases arose either from conformational sampling problems or from insufficient contact prediction accuracy. A strong dependency of the quality of final models on the fraction of satisfied predicted long-range contacts was observed. This observation helped us to develop a quality assessment scoring function achieving 0.93 precision and 0.77 recall for the discrimination of correct folds on our dataset of decoys.

These findings suggest the approach is well-suited for blind predictions on globular proteins of unknown 3D structure, provided that enough homologous sequences are available to construct a large and accurate multiple sequence alignment for the initial contact prediction step.
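
As an illustration of the quantity this abstract relates to model quality, the fraction of satisfied predicted long-range contacts, here is a minimal Python sketch. It is not the authors' scoring function. The 8 Å distance cutoff, the minimum sequence separation of 24 residues, the use of one representative coordinate per residue, and the function name `satisfied_contact_fraction` are all assumptions made for the example.

```python
import math


def satisfied_contact_fraction(coords, predicted_contacts,
                               cutoff=8.0, min_separation=24):
    """Fraction of predicted long-range contacts that a model satisfies.

    coords: one (x, y, z) tuple per residue (assumed representative atom).
    predicted_contacts: iterable of (i, j) residue index pairs, 0-based.
    cutoff and min_separation are illustrative assumptions, not values
    taken from the abstract.
    """
    # Keep only pairs that are far apart in sequence (long-range).
    long_range = [(i, j) for i, j in predicted_contacts
                  if abs(i - j) >= min_separation]
    if not long_range:
        return 0.0
    # A contact counts as satisfied when the two residues are close in space.
    satisfied = sum(1 for i, j in long_range
                    if math.dist(coords[i], coords[j]) < cutoff)
    return satisfied / len(long_range)


# Toy model: an extended chain whose last residue folds back onto the first.
toy_coords = [(3.8 * k, 0.0, 0.0) for k in range(30)]
toy_coords[29] = (5.0, 0.0, 0.0)
toy_contacts = [(0, 29), (1, 28), (0, 5)]  # the last pair is short-range
toy_fraction = satisfied_contact_fraction(toy_coords, toy_contacts)
```

In the toy example only one of the two long-range pairs lies within the cutoff, and the short-range pair is ignored.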

L08 - Exploring alternative domain architectures as a way to improve annotation consistency within the Conserved Domain Database (CDD)

Myra Derbyshire, National Institutes of Health, United States

Short Abstract: NCBI’s CDD is a collection of protein domain models, collected as multiple sequence alignments and converted into position-specific score matrices. It uses RPS-BLAST to match protein sequences with these models. CDD includes imported models (Pfam, TIGRFAMs and others) as well as finer-grained hierarchical classifications, based on phylogenetic analysis, for selected domain families curated by NCBI staff. CDD supports a live search service for protein and nucleotide queries, as well as pre-computed domain and site annotation for the majority of protein sequences tracked by NCBI’s Entrez system. For both, a default RPS-BLAST E-value (reporting) threshold is applied. Here we examine whether collecting additional search-database hits obtained at a raised E-value threshold can uncover domain architectures that are common enough to provide a viable alternative to architectures assigned with the default E-value reporting threshold. We also examine whether suppressing annotation with E-values close to the reporting threshold can be effective in removing rare and unlikely domain architectures. This work was supported by the Intramural Research Program of the National Library of Medicine at the National Institutes of Health/DHHS.

L09 - Active site profile-based clustering of Enolase structures and sequences

Janelle Leuthaeuser, Wake Forest University, United States

Short Abstract: The elucidation of protein molecular function lags far behind the rate at which protein sequences are identified; accurate and efficient computational methods that cluster protein sequences in functionally relevant ways are essential. Active site profiling was previously developed to identify and compare details of protein molecular functional sites. The program DASP utilizes a profile-based approach to search sequence databases for proteins containing similar functional sites to the input proteins. The Structure Function Linkage Database (SFLD) contains enzyme superfamilies whose functional subgroups and families have been identified by expert curation. Therefore, validation that DASP can identify discrete functionally relevant groups corresponding to SFLD-curated groups would provide the foundation for development of accurate and efficient computational methods to functionally cluster the protein sequence universe. To validate the approach, active site profiles were created for each subgroup and family in the well-studied enolase superfamily and used to search the PDB sequences. Results demonstrate high correlation between DASP grouping and SFLD annotation, suggesting that profiling could be used in a process to identify functionally relevant clusters, with no a priori knowledge of those clusters. Such a process, TuLIP, was developed and applied to the enolases, producing 23 groups that correspond well with SFLD subgroups and families. Profiles of these groups were used to search Genbank, and over 15,500 enolase superfamily members were identified and assigned to one of the 23 functionally relevant, discrete clusters. Continued validation and automation of this method could provide a necessary tool to automatically cluster proteins into functionally relevant groups.

L10 - An overrepresentation analysis of human transcription factors

Shahram Bahrami, Norwegian University of Science and Technology, Norway

Short Abstract: Transcription factors regulate gene expression by controlling the amount and the timing of transcription. Most Transcription Factors (TFs) are DNA-binding proteins that bind to specific DNA sub-sequences known as Transcription Factor Binding Sites. However, the definition of a TF is not always very strict, and may include DNA-binding proteins that do not recognize any specific DNA motif, as well as proteins that do not bind DNA, but influence transcription through protein-protein interactions (PPIs).
Here we describe the collection of a comprehensive list of properties for human TFs, including DNA binding domains (DBDs), protein-protein interactions and post-translational modifications, based on a list of 1985 human transcription factors. The mapping of DNA-binding domains included the use of prediction methods, giving a final set of 175 DBDs in 1387 DNA-binding proteins.
We have used this list to analyze associations between various properties, such as DNA-binding and PPI propensity, since stabilization through PPI is a possible mechanism for stable binding despite the lack of strong DBDs in some TFs. This analysis included specific cases in which one, both or neither of the TFs in a PPI pair have a DBD.
The most significant result is for interactions in which neither of the TFs has a DBD. We also extended this approach to other properties, such as post-translational modifications, and to the analysis of sets of TFs from various experiments.

L11 - In Silico Screening of Ras Mutations for Structural Oncology through Static Mode experiments

Marie Brut, LAAS-CNRS, France

Short Abstract: The lack of understanding of the biomolecular processes that govern Ras activity explains why Ras remains undruggable in spite of decades of constant effort. We propose to apply the Static Mode method, developed at LAAS-CNRS, to screen the biomechanical properties of Ras proteins, with the final goal of proposing new pharmacological strategies to restore their normal function. The method allows the user to design specific actions applied to single- or multi-atom sites, and to anticipate the induced structural and mechanical changes. This low-cost computational approach offers unprecedented possibilities to address unsolved questions, such as the characterization of specific mutations and the mechanism by which they generate dysfunction, or the virtual screening of pertinent new mutant proteins to restore normal activity and shortcut expensive in vitro experiments.

L12 - Structure-based function prediction of uncharacterized protein using binding sites comparison

Janez Konc, National Institute of Chemistry, Slovenia

Short Abstract: A challenge in structural genomics is prediction of the function of uncharacterized proteins. When proteins cannot be related to other proteins of known activity, identification of function based on sequence or structural homology is impossible, and in such cases it would be useful to assess structurally conserved binding sites in connection with the protein's function. We propose the function of a protein of unknown activity, the Tm1631 protein from Thermotoga maritima, by comparing its predicted binding site to a library containing thousands of candidate structures. The comparison revealed numerous similarities with nucleotide binding sites, including, specifically, a DNA-binding site of endonuclease IV. We constructed a model of the Tm1631 protein with a DNA ligand from the newly found similar binding site using ProBiS, and validated this model by molecular dynamics. The interactions predicted by the Tm1631-DNA model corresponded to those known to be important in the endonuclease IV-DNA complex model, and the corresponding binding free energies calculated from these models were in close agreement. We thus propose that Tm1631 is a DNA-binding enzyme with endonuclease activity that recognizes DNA lesions in which at least two consecutive nucleotides are unpaired. Our approach is general and can be applied to any protein of unknown function. It might also be useful to guide experimental determination of the function of uncharacterized proteins.

L13 - Consistency of sugar structures and their annotation in the PDB

Deepti Jaiswal, Masaryk University, Czech Republic

Short Abstract: Theoretical studies typically involve molecular modeling of sugars and sugar-specific protein receptors. These studies rely on structural information deposited in the Protein Data Bank (PDB). Since the main purpose of the PDB is to store the structures of proteins and nucleic acids, it is expected that PDB structure files are complete and correctly annotated.
Sugars are characterized by specific structural features such as multiple chiral centers on each ring. Because of these peculiarities, the validation and annotation of sugar structures is not straightforward.
Our first goal was to develop a methodology that can identify whether a sugar structure is complete and correctly annotated. Our second goal was to check all PDB entries containing sugars. For this purpose we collected all sugar structures from PDB entries and compared them to model structures available in Ligand Expo [1], a repository of ligand chemical and structural information. For the comparison we used several currently available tools for structural comparison (SiteBinder [2], Open Babel [3]), as well as two in-house programs. We report here on our findings regarding the complete and correctly annotated sugar structures in the PDB, together with the problematic cases.
References:
[1] Feng Z, Chen L, Maddula H, Berman HM, Westbrook J: Ligand Depot: a data warehouse for ligands bound to macromolecules. Bioinformatics 2004.
[2] Sehnal D, Vařeková RS, Huber HJ, Geidl S, Ionescu CM, Wimmerová M, Koča J: SiteBinder: an improved approach for comparing multiple protein structural motifs. J Chem Inf Model 2012.
[3] O'Boyle NM, Banck M, Vandermeersch T, Hutchison GR: Open Babel: An open chemical toolbox. Journal of Cheminformatics 2011.

L14 - A recognition model of ACP-HCS interaction for programmed beta-branching in type I polyketide synthases

Rohit Farmer, University of Birmingham,

Short Abstract: Polyketide synthases (PKSs) are enzyme complexes that synthesise a wide range of natural products of medicinal interest, notably a large number of antibiotics. Type I polyketide synthases can introduce beta-carbon branches into a growing polyketide chain via enzymes encoded by the “HMG-CoA synthase (HCS) cassette”. One of the first polyketide biosynthesis clusters in which the HCS cassette was discovered is responsible for the synthesis of the antibiotic mupirocin by Pseudomonas fluorescens. MupH is the HMG-CoA synthase homologue responsible for β-branching in the mupirocin synthesis pathway. To understand better what allows the HCS cassette to recognise β-branch-associated acyl carrier proteins (ACPs) of the mupirocin synthesis pathway, we have computationally docked the modelled MupH with the NMR structures of ACPs. The docking results were also supported by the evolutionary trace data and the physical properties of the interface residues. Hidden Markov models (HMMs) were used to classify ACPs as branching and non-branching. HMM analysis highlighted essential features for an ACP to behave like a branching ACP. Through modelling and mutagenesis we identified helix III of the ACP as a probable anchor point of the ACP–HCS complex. The position of this helix is determined by the core of the ACP, and substituting the interface residues modulates the interaction specificity. Our method for predicting β-carbon branching lays a basis for determining the rules for ACP-HCS specificity and expands the potential for engineering new polyketides.

L15 - 3D structural models of Baeyer-Villiger Monooxygenases: molecular dynamics simulation at the light of a structural alphabet

Alexandre de Brevern, INSERM UMR_S 1134, France

Short Abstract: Baeyer-Villiger monooxygenases (BVMOs) are flavoenzymes and belong to the class of oxidoreductases (Alphand & Wohlgemuth, 2010). They catalyze the oxidation of various ketones to esters and lactones. During enzymatic oxidation, one atom of oxygen is inserted into a carbon-carbon bond, whereas the other oxygen atom ends up in a water molecule with the hydrogen atoms originating from the cofactor NAD(P)H. The flavin cofactor of BVMOs is crucial for catalysis and is tightly, but not covalently, bound in the active site. These enzymes require the reduction of the flavin cofactor to activate it for molecular oxygen binding.
We have carefully searched protein databases for new BVMO-related sequences. We applied strict selection criteria, as the flavoprotein monooxygenase superfamily suffers from many annotation problems in the databases. Hence, the 116 proteins we have analyzed are highly characteristic of the type I BVMO family (Rebhemed et al., 2013). We propose efficient structural models of these BVMOs using up-to-date approaches.
Molecular dynamics simulations were performed for each structural model. The simulations are analyzed using a structural alphabet, namely Protein Blocks (de Brevern et al., 2000). This alphabet consists of a library of 16 fragments, each 5 residues long, able to approximate efficiently every part of protein structures (Joseph et al., 2010). This database of structural models may be useful from a fundamental viewpoint, but also for modelling purposes, and will therefore be freely accessible to the scientific community via a website.

L16 - ATPKnn: a K-nearest neighbors method for predicting ATP-binding sites using protein evolutionary profiles

Jing Hu, Franklin & Marshall College, United States

Short Abstract: Adenosine 5’-triphosphate (ATP) plays important roles in many biological processes such as cell signaling and enzymatic cofactor functions. It is also widely involved in metabolic processes to transport the energy needed for conformational changes in many biological interactions. The binding of ATP to proteins is a fundamental step in such processes. Therefore, it is very important to develop computational methods which can predict ATP-binding sites on proteins to understand the mechanism of protein-ATP binding. However, ATP-binding sites are difficult to predict because the number of ATP-binding residues is very small compared with that of non-binding residues (i.e., the ratio of ATP-binding residues to non-ATP-binding residues is roughly 1:20 to 1:25). In this project, we try to predict ATP-binding residues on proteins from protein sequence information. For each protein, we performed a PSI-BLAST search against the non-redundant protein database to get the position-specific scoring matrix (PSSM) profile. Each query residue is predicted by the K-nearest neighbors method based on a weighted Euclidean distance calculated by comparing the feature vectors between samples, which are built from the evolutionary profiles of residues in a sliding window centered on the target residue. The current method has already achieved performance comparable to that of other published methods. The next step of our work focuses on improving the prediction ability of the method by applying over-sampling and/or under-sampling techniques to address the imbalance of the dataset.
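
The prediction scheme described in this abstract (sliding-window PSSM features, weighted Euclidean distance, K-nearest neighbors vote) can be sketched roughly as follows. This is a toy illustration, not the ATPKnn implementation. The function names, the zero-padding at the sequence termini, the majority vote, and the choice of weights are all assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter


def window_features(pssm, center, half_window):
    """Concatenate the PSSM rows in a window centered on `center`.

    Positions beyond the sequence termini are zero-padded (an assumption).
    """
    width = len(pssm[0])
    features = []
    for pos in range(center - half_window, center + half_window + 1):
        if 0 <= pos < len(pssm):
            features.extend(pssm[pos])
        else:
            features.extend([0.0] * width)
    return features


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2
                         for x, y, w in zip(a, b, weights)))


def knn_predict(query, training, weights, k=3):
    """Majority vote among the k training samples nearest to `query`.

    training: list of (feature_vector, label) pairs, e.g. label 1 for an
    ATP-binding residue and 0 for a non-binding residue.
    """
    nearest = sorted(
        training,
        key=lambda item: weighted_distance(query, item[0], weights),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy usage with a made-up 3-residue, 2-column "PSSM".
toy_pssm = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
toy_window = window_features(toy_pssm, center=0, half_window=1)

toy_training = [([0, 0], 0), ([0, 1], 0), ([5, 5], 1), ([5, 6], 1), ([6, 5], 1)]
toy_label = knn_predict([5, 5.5], toy_training, weights=[1, 1], k=3)
```

With a class ratio near 1:20, a plain majority vote would tend to favor the non-binding class, which is the imbalance the abstract proposes to address by over- or under-sampling.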

L17 - Bilitranslocase transport channel and functional mechanism

Amrita Roy Choudhury, National Institute of Chemistry, Slovenia

Short Abstract: The goal of our work is to elucidate the transport channel structure and functional mechanism of the transmembrane protein bilitranslocase. The primary function of bilitranslocase is transport of organic anions, and the protein is potentially druggable. To analyze the protein structure, we have used a combination of computational and experimental methods.

Bilitranslocase has four transmembrane alpha-helical regions. The probable assembly of these four transmembrane regions forming the transport channel is analyzed with a Monte Carlo approach. Predicted interhelical interactions between transmembrane regions TM2:TM3 and TM1:TM4 serve as the primary constraint during the simulation. Analysis of the best-scoring conformations indicates three probable assemblies of transmembrane regions. In the most populated assembly, the two key transmembrane regions, TM2 and TM3, are arranged diagonally. In addition, the structures of these two regions are analyzed, both individually and in mixture, with NMR experiments performed in an SDS micelle environment. FRET experiments validate the interaction between TM2 and TM3.

The transmembrane regions TM2 and TM3 contain amino acids participating in H-bond formation, and are flanked by ligand binding motifs. These two regions therefore play prime roles in ligand mediation through the transport channel. Further, the structures of both transmembrane regions show Pro-induced kinks, which lend flexibility to the transport channel. These structural features are in line with the metastable nature of the protein.

L18 - Examining variable domain orientations in antigen receptors gives insight into TCR-like antibody design

James Dunbar, Oxford University,

Short Abstract: A key task of the immune system is to specifically recognise immunogenic molecules. Two components that allow this are antibodies and T-cell receptors (TCRs). Antibodies are able to bind diverse antigenic shapes whilst TCRs recognise antigens of the form of a peptide-MHC complex (pMHC). The geometry of both molecules' binding sites is affected by how their variable domains orientate with respect to one another (Antibodies:VH+VL; TCRs:Vbeta+Valpha). We have previously developed ABangle, a method and computational tool to calculate the antibody VH-VL orientation in an absolute sense. Here, we apply the same method to the analogous domains in TCRs and compare Vbeta-Valpha orientations to antibody VH-VL orientations. Despite individual domain structural similarity, Vbeta-Valpha and VH-VL orientations are found to be distinct. To investigate functional implications of this difference we examined the effect that changing the variable domain orientation has on a TCR-pMHC complex. If TCRs are made to take on standard VH-VL orientations they have significant steric clashes with the pMHC. However, engineered therapeutic antibodies that bind the pMHC do exist. Only one of the available TCR-like antibody structures binds in a mode similar to TCRs. This has a VH-VL orientation similar to typical Vbeta-Valpha orientations. Finally we examined what determines the difference in orientation between receptor types. The packing of the L/alpha CDR3 is found to be influential. In addition, several domain-domain interface positions were identified whose modification can promote a TCR-like variable domain orientation in antibodies. These factors may provide useful considerations for the design of therapeutic TCR-like antibodies.

L19 - DeepSAR: Drug Target Prediction using Deep Learning

Andreas Mayr, Johannes Kepler University Linz, Austria

Short Abstract: Drug development depends on knowledge about both the desired and the adverse biological effects of compounds. Information on the compounds' biological effects is used to improve the efficacy of a compound and to avoid adverse side-effects. Therefore, a large number of bioassay experiments have to be performed during the development of a drug. In our work we exploit bioassay measurements available in compound databases to predict the biological effects of drug candidates. An accurate algorithm is highly desirable since it would replace time- and cost-intensive bioassay experiments and, thereby, help to bring more and better drugs to the market.

The task is quite challenging: A computational method has to represent molecules in a meaningful way, handle highly unbalanced data sets, process huge amounts of data, and be highly accurate. We propose DeepSAR, a deep neural network with rectified linear units for predicting the biological effects of drug-like compounds. DeepSAR utilizes a sparse representation of compounds to decrease the computational costs and is, therefore, able to handle the high data dimensionality.

DeepSAR outperforms competitive target prediction methods on Big Data in drug design.

L20 - Memoir 2.0: A Membrane Protein Structure Prediction Server

Reyhaneh Esmaielbeiki, University of Oxford,

Short Abstract: Membrane proteins are the targets of the majority of pharmaceuticals currently available. Knowledge of their structure is a useful asset in drug design. However, out of all the known protein structures only 2% are membrane proteins. Previously, we introduced the Memoir 1.0 server which uses homology modeling to predict the structure of a target membrane protein using a template structure. Homology modeling often entails a trade-off between the level of accuracy and the level of coverage that can be achieved in predicted models. In Memoir 2.0, we increase coverage by modelling the missing structural information only if such prediction is sensible. Therefore, after fragment-based loop modeling by the Fread algorithm, missing loops shorter than 17 residues are predicted using a novel ab initio method, Mechano. We demonstrate that Mechano performs better and faster than other ab initio methods on membrane proteins. In addition, Mechano is able to predict the N-terminal and C-terminal of models unlike the other available methods. Memoir 2.0 generates multiple models offering different coverage of the protein target, all of which can be visualised and downloaded by the user. In addition, a model’s residues can be displayed with a spectrum of colors based on the level of confidence in the prediction. Memoir 2.0 is freely available at http://opig.stats.ox.ac.uk/webapps/memoir.

L21 - Full Protein Stability Curve prediction by temperature-dependent statistical potentials

Fabrizio Pucci, Université libre de Bruxelles, Belgium

Short Abstract: The prediction and control of protein stability at different temperatures is a key goal in protein science. Unfortunately, it is still far out of reach, since not much is known about the temperature dependence of amino acid interactions. In this work we go further into the investigation of protein stability by building a method for predicting the full Gibbs-Helmholtz stability curve of a given protein, and thus how its folding free energy depends on the temperature. This mathematical function encodes all the thermodynamic parameters that characterize the folding transition, and its knowledge is thus fundamental to protein stability analysis. In summary, we used the formalism of temperature-dependent statistical potentials to estimate the value of the folding free energy of a given protein of known structure at different temperatures. The stability curve was extracted from these energy values using a simple extrapolation procedure and the subsequent optimization of some parameters. The method shows good performance when applied to a reference set of about seventy proteins with known stability curves. The standard deviations between the predicted and the experimental values, computed in cross validation, are equal to about 12 °C, 1 kcal/(mol °C) and 4 kcal/mol for the melting temperature, the change in heat capacity and the folding free energy at room temperature, respectively. As far as we know, this is the first method that is able to predict both the thermodynamic and thermal stability of proteins in a fast and accurate way on a proteome-wide scale.
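
For readers unfamiliar with the stability curve mentioned in this abstract, the standard Gibbs-Helmholtz expression can be evaluated with a few lines of Python. This sketch only evaluates the textbook formula. It says nothing about how the authors estimate its parameters from statistical potentials. It assumes a temperature-independent heat capacity change, temperatures in kelvin, and the convention that a positive free energy of unfolding means the folded state is stable. The function name and the numerical values are illustrative only.

```python
import math


def stability_curve(temperature, t_m, delta_h_m, delta_cp):
    """Free energy of unfolding at `temperature` (kelvin).

    Gibbs-Helmholtz form with constant heat capacity change:
        dG(T) = dH_m * (1 - T / T_m) - dCp * ((T_m - T) + T * ln(T / T_m))

    t_m: melting temperature in kelvin.
    delta_h_m: unfolding enthalpy at t_m, e.g. in kcal/mol.
    delta_cp: heat capacity change on unfolding, e.g. in kcal/(mol K).
    """
    enthalpy_term = delta_h_m * (1.0 - temperature / t_m)
    heat_capacity_term = delta_cp * (
        (t_m - temperature) + temperature * math.log(temperature / t_m)
    )
    return enthalpy_term - heat_capacity_term


# Illustrative (made-up) parameters: T_m = 60 °C, dH_m = 100 kcal/mol,
# dCp = 2 kcal/(mol K); evaluate the curve at 25 °C and at T_m.
dg_room = stability_curve(298.15, t_m=333.15, delta_h_m=100.0, delta_cp=2.0)
dg_melt = stability_curve(333.15, t_m=333.15, delta_h_m=100.0, delta_cp=2.0)
```

By construction the curve crosses zero at the melting temperature, so the three quantities quoted in the abstract (melting temperature, heat capacity change, free energy at room temperature) together determine its shape under these assumptions.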

L22 - Identification of Protein-Protein Interfaces via Detecting Correlated Mutations in Interactions

Fei Guo, Tianjin University, China

Short Abstract: Protein-protein interactions play a key role in a multitude of biological processes, such as de novo drug design, immune responses and enzymatic activities. Conformational changes frequently occur when the proteins bind to their partners. It is of great interest to understand how proteins in a complex interact with each other. Existing methods remain time-consuming, and they solve only a small part of protein complexes accurately. Here, we present a novel method for protein docking with conformational changes. The key idea of our method is that the correlated mutations of interacting motifs can cause conformational changes on protein-protein interfaces. Using a PAM matrix, sequence alignments for proteins are constructed on the segments with a sliding window. The correlation coefficient between two segments, one from each protein, is calculated from the evolutionary distance of each segment. The interacting motifs are selected by extracting the segment pairs with high correlations. Multidimensional scaling is performed to generate the possible conformational changes of the loop regions in selected interacting motifs. We utilize the energy function to identify the near-native docking configurations with suitable conformational changes. Experiments illustrate that our method achieves better results on Benchmark v4.0. In the medium-difficulty group, our method obtains an average iRMSD of 4.15Å, which compares favorably with the average iRMSD of 4.46Å for ZRANK. In the difficult group, our method obtains an average iRMSD of 5.61Å, which compares with the average iRMSD of 6.18Å for ZRANK.
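The segment-correlation step described above can be illustrated with a small sketch. It assumes that each segment has already been reduced to a vector of evolutionary distances (one value per homologous sequence pair), which is an assumption about the input and not a detail given in the abstract. The helper names are hypothetical, and a plain Pearson coefficient is used as a stand-in for whatever correlation measure the authors applied.

```python
import math


def sliding_windows(seq_len, width):
    """Return (start, end) index pairs for every window of the given width."""
    return [(start, start + width) for start in range(seq_len - width + 1)]


def pearson(x, y):
    """Pearson correlation between two equal-length distance vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


# Hypothetical distance vectors for one segment from each protein.
segment_a = [0.10, 0.25, 0.40, 0.55]
segment_b = [0.12, 0.30, 0.45, 0.70]
score = pearson(segment_a, segment_b)
```

Segment pairs whose score exceeds a chosen cutoff would then be the candidates for interacting motifs.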

L23 - Contribution of Protein Flexibility to Arc-DNA Binding Specificity

Jun-tao Guo, University of North Carolina at Charlotte, United States

Short Abstract: Specific protein-DNA interaction is essential to many biological processes, such as maintenance of DNA integrity and transcriptional regulation. The combined direct/indirect readout mechanisms for explaining transcription factor binding specificity are mainly “static”. Protein-DNA recognition, however, is intrinsically a dynamic process involving fine structural fitting between proteins and DNA. In this paper, we investigated the contribution of protein flexibility to protein-DNA binding specificity by comparative molecular dynamics simulations. The simulations were performed on the wild-type and mutant (F10V, phenylalanine-10 mutated to valine) P22 Arc repressors in both free and complex conformations. It has been previously demonstrated that the F10V mutant has lower DNA binding specificity though both the bound and unbound main-chain structures of the wild-type and the F10V mutant Arc are highly similar. We found that in the unbound form, the DNA-binding motif of the wild-type Arc repressor is structurally more flexible than the F10V mutant, especially for three DNA base-contacting residues Gln9, Asn11 and Arg13. Higher flexibility of the DNA-binding motif in the wild-type Arc may lead to higher DNA binding specificity through forming more hydrogen bonds with DNA bases upon binding. Our results indicate that protein flexibility and dynamic properties play important roles in protein-DNA binding specificity.

L24 - Developing Computational Techniques for Structure Modeling of Helical Membrane Proteins

Zhijun Li, University of the Sciences in Philadelphia, United States

Short Abstract: Transmembrane (TM) proteins are estimated to account for ~20-30% of the human genome and serve as important drug targets. Despite the significant progress in experimental techniques, TM protein structure determination remains a challenge in general. Computational approaches, including de novo and comparative structure predictions, have played a significant role in structural and functional studies of membrane proteins, as well as in their structure-based drug design efforts. A major challenge of membrane protein structure prediction is to assemble individual helices into high-quality tertiary structures. In the present work, we developed a conceptually simple, easily implemented and effective computational approach to address this challenge. First, a set of conserved inter-residue interactions between the template and the target protein are identified and converted into distance restraints; second, extensive restrained simulated annealing simulations are performed to sample the local conformational space around the homology model; finally, the generated conformers are scored to identify the best conformation. This method allows easy implementation of additional distance restraints extracted from multiple templates and/or experimental data, and has the potential to be applicable to modeling of all helical transmembrane proteins. This work was supported by the NIH-R15 grant from NIGMS.

L25 - Extending the applications of PSICOV-generated residue contacts in protein structure prediction

Stuart Tetchner, University College London, United Kingdom

Short Abstract: Predicting protein structure from sequence is an ongoing area of research within bioinformatics. The problem of folding a protein chain can be reduced to identifying many pairs of residues which are in contact. With sufficient numbers of residue ‘contacts’, it has long been known that accurate models of protein structures can be constructed.

In recent years, one area of contact prediction which has made considerable progress is directly calculating contacts from protein sequences. By aligning homologous proteins, covariation between positions can be detected, suggesting that these residues may be under selective pressure to maintain their interaction, and are therefore in close spatial proximity. The development of methods in this area has also benefitted greatly from the large influx of protein sequences which are now available due to sequencing initiatives.

The Protein Sparse Inverse COVariance (PSICOV) method for calculating residue-residue contacts uses an L1-regularised graphical LASSO method to remove spurious instances of coevolution signal. The contact predictions produced by PSICOV have been shown to be sufficiently accurate to help in a wide variety of applications, including the determination of folds in both globular and transmembrane proteins, improving fold detection and domain boundary identification. Here we will present some of our latest results on exploiting PSICOV-generated residue contacts for the problems of protein tertiary and quaternary structure prediction.
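To make the idea of detecting covariation between alignment positions concrete, here is a minimal sketch that computes plain mutual information between two columns of a multiple sequence alignment. This is deliberately not PSICOV: it omits the sparse inverse covariance estimation that PSICOV uses to suppress indirect couplings, and the toy alignment and function name are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    msa is a list of equal-length aligned sequences.
    """
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


# Toy alignment: columns 0 and 1 vary together, column 2 is invariant.
toy_msa = ["AKL", "AKL", "DEL", "DEL"]
covarying = mutual_information(toy_msa, 0, 1)
invariant = mutual_information(toy_msa, 0, 2)
```

A high score suggests that two positions change together across homologues, but raw mutual information also picks up indirect and phylogenetic effects, which is the kind of spurious signal the abstract says PSICOV is designed to remove.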

L26 - B-cell epitope hot spot prediction by accessible surface area

Byungkook Lee, National Institutes of Health, United States

Short Abstract: Use of protein therapeutic agents is on the rise and reducing immunogenicity is an important issue for many of these agents. The first step towards reducing the B-cell immunogenicity by protein engineering is to identify key residues on an antigen for antibody binding. There are many existing epitope prediction methods, which identify residues in potential antibody binding sites. These methods generally produce a long list of residues since each antibody binding site of an antigen typically contains many residues. However, from an experimentalist’s point of view, a shorter list focusing on key residues is more practical for experimental validation. We recently summarized the approach we used to identify potential key residues on Pseudomonas Exotoxin domain III (PEdIII) that will spoil antibody binding and compared it to several existing epitope identification programs. This approach uses a simple exposed surface area criterion and attempts to find one or only a few key residues per distinct epitope. Surprisingly, the simple idea worked well in the case of PEdIII. The area under the ROC curve (AUC) is 0.85, which is significantly better than the best of the known methods (AUC: 0.73) that we examined. When tested on influenza hemagglutinin using known virus escape mutants against single monoclonal antibodies as the truth standard, our program gives higher sensitivity but more apparent false positives compared to a well known epitope prediction program. The usefulness of our program will be tested further with more antigens.

L27 - Exploiting reproducibility to improve protein function prediction

Jesse Gillis, Cold Spring Harbor Laboratory, United States

Short Abstract: Improving our knowledge of protein function in otherwise uncharacterized proteins is one major task to which computational methods are put. While this is often called protein function prediction (PFP) when being treated as a machine learning problem, essentially the same methods underlie a variety of important biomedical applications, such as candidate disease gene prioritization. Like many methods in machine learning, a major concern for PFP methods is the degree to which their performance is robust and generalizes from benchmark tasks to novel data. In this work, we develop a novel approach for this problem, relying on meta-analysis across algorithms and datasets to quantify unknown factors affecting protein function prediction performance.

The essence of our approach is the observation that if two methods give perfect performance in cross-validation, they should be expected to agree in their novel predictions as well if the cross-validation performance appropriately generalizes. We used several types of biological data and a set of high-performance machine learning algorithms whose implementation and performance can be verified. We focused on data based upon networks derived from protein-protein interaction, sequence similarity, aggregated co-expression, and semantic similarity to study underlying patterns of performance. One important finding is that internal estimates of algorithm performance based on cross-validation are not reflected in reproducibility between algorithms; data variation is profoundly more influential than methodological variation. Our infrastructure allows us to characterize in detail why aggregation improves performance, where results are robust and reproducible, and what artifacts are potentially problematic in data interpretation.

L28 - Partial Domains in Proteins

Deborah Triant, University of Virginia, United States

Short Abstract: Although Pfam protein domains can be thought of as the "atomic" units from which larger functional proteins are assembled, almost 4% of Pfam27 PfamA domains (> 900,000 entries) are shorter than 50% of their family model length, suggesting that more than half of the domain is missing at those locations. To understand better the structural nature of partial domains in proteins, we examined 26,350 <50% partial domain regions from the 136 PfamA domain families (290,359 domains) in the RPD2 reference protein database. We found that Pfam domain families are largely “atomic”; most apparent partial domains result from sequence construction and domain annotation errors. 60% (15,268) of candidate partial RPD2 domains result from long domains being broken into smaller pieces. Other partials are bounded by either non-homologous domains (1,038) or the ends of the protein (4,248), but bounded partial domains are over-represented in eukaryotes and lower quality protein predictions, suggesting that they may result from inaccurate protein models. 5,796 partial domains can be extended to produce a more complete domain. Searches with these extended sequences suggest that 80% of these partials result from incomplete domain alignments, while 20% can be found as a partial in other sequence contexts. Comparison of bounded and un-bounded partials to proteins of known structure showed 12 Pfam families may contain "structural partials", i.e., the Pfam domain contained multiple CATH structural domains. We conclude that partial domains are largely bioinformatics chimeras: the results of alignment and annotation artifacts.
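The "shorter than 50% of the family model length" criterion above can be expressed as a short check. This is a sketch under the assumption that each domain hit is described by the first and last model positions it aligns to (1-based, inclusive) and the total model length; the function names and example coordinates are hypothetical.

```python
def model_coverage(model_start, model_end, model_length):
    """Fraction of the family model covered by a domain hit."""
    return (model_end - model_start + 1) / model_length


def is_partial_domain(model_start, model_end, model_length, threshold=0.5):
    """True when the hit covers less than `threshold` of the model."""
    return model_coverage(model_start, model_end, model_length) < threshold


# Hypothetical hits against a model of length 100.
short_hit = is_partial_domain(1, 40, 100)   # covers 40% of the model
long_hit = is_partial_domain(1, 80, 100)    # covers 80% of the model
```

Applying such a filter to every annotated domain yields the candidate set of partial domains, which can then be grouped by what bounds them (another domain, a protein end, or nothing).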

L29 - Antibody Modelling: CDR-H3 Structure Prediction

Claire Marks, University of Oxford, United Kingdom

Short Abstract: Antibodies are proteins produced by the cells of the immune system to bind to foreign substances that find their way into the body. They are becoming increasingly important in biomedical applications: they are ideal candidates for therapeutic agents, since they bind with high specificity and affinity to their targets. It would be extremely advantageous, therefore, to be able to use computer aided modelling techniques to predict antibody structures with high accuracy, enabling accurate predictions to be made about their biological activity. The binding properties of antibodies are mainly determined by six loops found in their structure, known as complementarity determining regions (CDRs). Of these, the H3 loop is thought to contribute the most to binding. However, while we can predict the structures of the five other CDR loops with a reasonably high level of accuracy, prediction of the H3 loop structure remains challenging. The difficulty arises in the diversity of H3 structures, with regards to both length and sequence. Here, we describe the development of a novel ab initio method, MECHANO, specifically designed to model H3 loops, and compare its performance to other ab initio predictors. We also show how a combination of an antibody specific database-based method (FREAD) and MECHANO is able to offer high-confidence, accurate predictions for H3 loops.

L30 - Functional Characterization of Structural Genomics Proteins in the Crotonase Superfamily

Mary Jo Ondrechen, Northeastern University, United States

Short Abstract: There are now over 12,500 Structural Genomics (SG) proteins that have structures deposited in the PDB. However, most of these SG proteins are of unknown or uncertain biochemical function. Although many SG proteins have putative functional assignments, these assignments are often incorrect. The Crotonase Superfamily consists of five diverse functional subgroups that are well characterized both structurally and functionally. These subgroups represent different types of reactivity, including hydratase, isomerase, and dehalogenase activities. The Crotonase Superfamily also contains at least 60 SG proteins, so it is ideal to test predictions of protein function. Our approach is based on computational prediction of the functionally active residues in SG protein structures and comparison of these local chemical signatures with those of proteins of known function. First, we utilize Partial Order Optimum Likelihood (POOL) to predict the functionally important residues of each SG protein. Next, Structurally Aligned Local Site of Activity (SALSA) is used to compare the catalytically active residues of the well characterized members in the superfamily to those of the SG proteins. We demonstrate based on these computational methods that the majority of the putative annotations in this superfamily are likely incorrect. For some proteins, a more likely function is predicted. Currently, biochemical assays are being developed and used to test these predictions. The main outcome of this project will be to successfully classify these SG proteins based on their local structure at the predicted active sites.

L31 - Unearthing the structural information contained within mRNA

Alistair Martin, University of Oxford, United Kingdom

Short Abstract: Due to the degeneracy of the genetic code, there exist multiple synonymous DNA sequences that result in the same amino acid sequence being produced. Both the central dogma and Anfinsen’s dogma state that the flow of information within a cell is unidirectional and hence these synonymous sequences should retain no knowledge once translated, thus resulting in equivalent proteins. An increasing volume of experimental work shows that in actuality these synonymous sequences produce proteins with different physical properties; a wide range of changes have been reported to occur in association with synonymous mutations in coding DNA sequences. For our research, we use a bioinformatics approach to probe the manner in which codon choice can influence the tertiary protein structure. We utilise the concepts of codon optimality and cotranslational folding to score the translation rate of RNA sequences which have corresponding Protein Data Bank entries, crucially using data sourced from a wide range of species. By subsequently grouping by structure and aligning, we find features in their translation profiles that are evolutionarily conserved in association with structural features. Our research indicates an additional layer of information pertaining to the protein structure contained within mRNA that is lost once translation occurs. We hope that in the future we can use these results to improve upon the current biophysical understanding of translation and structure formation.

L32 - An Antibody-specific, Knowledge-Based Framework for Prediction of Affinities.

Jinwoo Leem, University of Oxford, United Kingdom

Short Abstract: Protein–protein interactions are ubiquitous in biology, and are responsible for a wide range of cellular functions. Antibody–antigen interactions are of particular interest as antibodies are capable of recognising, in principle, any antigen in a specific and potent manner; hence, antibodies are often used as novel biopharmaceuticals. In order to design antibodies with ideal binding properties, it is imperative to have a function that can accurately determine a prospective antibody’s affinity. Currently, most scoring functions use a combination of pseudo-biophysical terms to estimate the binding affinity of a protein–protein complex, but their predictive capacities are poor, especially for antibody–antigen complexes. Here, we present a novel statistical potential designed specifically for antibody–antigen complexes. We show that the use of antibody–specific data in the development of the potential is critical to its success. Furthermore, we find that combining our statistical potential with other antibody–specific descriptors provides greater insight into the affinity of antibody–antigen interactions. Effectively, this suggests that antibody–antigen interfaces are distinct from general protein–protein interfaces. As our method is designed to evaluate the affinity of any antibody–antigen complex, it can therefore be used for in silico antibody design campaigns.

L33 - Comparison and classification of DNA polymerase chains and domains based on multiple sources of structural and sequence information

Alex Slater, Pontificia Universidad Católica de Chile, Chile

Short Abstract: DNA polymerases are a very ancient protein family responsible for the preservation of genetic information. They are very diverse in terms of sequence similarity and also very complex in terms of structure. For these reasons, DNA polymerases represent a challenging study case in bioinformatics. These are currently classified into 6 different families according to their sequence similarity. Little effort has been made in the comparison of these proteins in terms of their structures. Structural comparison can be particularly useful for detecting relationships in highly divergent protein families, where traditional sequence comparison methods cannot detect a clear similarity signal. We present a detailed comparison at the structural level of all representative family members of DNA polymerases currently available at the PDB. We have used a method based on multiple structural alignments generated by different software to perform a hierarchical classification of DNA polymerases at the chain, domain and sub-domain levels. Our results show that the current classification of DNA polymerases is only consistent at the chain-level. However, an important degree of structural diversity within a family and between families is still observed. We present some examples where the structural comparison suggests the splitting of two classes within a family and where two distinct sub-domains from different families are structurally related. We conclude that structural information is needed to reveal relationships that cannot be detected by sequence comparison and thus to improve the comparison and classification of highly divergent protein families.

L34 - Spatial organization and distribution of linear motifs in the Ankyrin repeat protein family.

Nina Verstraete, University of Buenos Aires, Argentina

Short Abstract: Interactions between proteins regulate cellular physiology. Many of these interactions involve the recognition of short peptidic regions (i.e. short linear motifs, SLiMs) which can be characterized by simple sequence patterns, usually found in intrinsically disordered regions or in loops connecting globular or transmembrane domains. These peptide-domain interactions are typically transient and often involve folding upon binding, challenging the lock-and-key paradigm of protein recognition.

Ankyrin-repeat domains (ARDs) are one of the most frequently observed protein-protein interactors in nature. These domains are composed of tandem arrays of recurrent amino acids that cooperatively fold into elongated structures that mediate molecular recognition with high specificity. Many ankyrin-binding sites are either predicted or demonstrated to correspond to extended peptides mimicking SLiMs.

We present here an exhaustive analysis of linear motif identification in ARD proteins and their binding partners. We searched for particular SLiMs under- or over-represented with respect to a random exploration of the sequence-space in the ARD recognition family. We also analyzed the spatial distribution of SLiMs along ARD protein sequences and describe how particular SLiMs are specifically distributed in the ARD-containing proteins. We discuss how the presence of functional constraints can conflict with the ARD folding dynamics, which in turn modulate the evolution of biological interactions.

L35 - A structural view of the Ankyrin-repeat Proteoverse

R. Parra, Protein Physiology Laboratory, Biological Chemistry Department, University of Buenos Aires, Argentina

Short Abstract: Background
Natural protein sequences resemble random strings of amino acids. Patterns of a relatively small set of folding architectures can be characterized by long distance interactions among amino acids.

Repeat proteins are composed of tandem copies of similar motifs and can become spontaneously organized in symmetrical ways in space. Ankyrin repeat proteins comprise a large number of proteins containing tandem copies of a ~33-residue motif. They are present in all kingdoms of life, and are apparently enriched in eukaryotes and some specific pathogens.
We have shown that some repeat proteins can appear nearly periodic, while in others clear separations between repetitions exist.

Results
Here we show that the geometry itself is not enough to define a preferred phase for all natural Ankyrin-repeat arrays. When folding energetic parameters are included, a particular phase to define the ankyrin domains emerges. With this definition we annotated the location of the structural repeat occurrences on the molecules using only 3D structural information. The strategy allows for the identification of structural modifications such as insertions and deletions in the repeat array, and quantifies how these perturb the overall symmetry of the repeat domain.

We exhaustively analyzed the currently known ankyrin repeat structural space at the levels of individual repeats, the repetitive arrays, the whole molecules as well as the interactions with their structural partners in order to deconvolute how structural perturbations propagate in these systems and how these are reflected in their primary sequence.

L36 - BioJS: an open source standard for biological visualisation – its status in 2014

Guy Yachdav, Technical University Munich, Germany

Short Abstract: BioJS is a community-based standard and repository of functional components to represent biological information on the web. The development of BioJS has been prompted by the growing need for bioinformatics visualisation tools to be easily shared, reused and discovered. Its modular architecture makes it easy for users to find a specific functionality without needing to know how it has been built, while components can be extended or created for implementing new functionality. The BioJS community of developers currently provides a range of functionality that is open access and freely available. A registry has been set up that categorises and provides installation instructions and testing facilities at http://www.ebi.ac.uk/tools/biojs/. The source code for all components is available for ready use at https://github.com/biojs/biojs.

L37 - Computational Prediction of Spatially Extended Active Sites in Dynamic Enzymes

Mary Ondrechen, Northeastern University, United States

Short Abstract: Classic studies of enzymes identified catalytic residues that are typically in direct contact with the substrate. Dynamic conformational changes during catalysis, in addition to electrostatic interactions, allow for coupling between remote residues and the canonical active site residues of an enzyme. This suggests that at least some enzyme active sites are spatially extended to include remote residues. Partial Order Optimum Likelihood (POOL), developed at Northeastern University, is a machine learning technique to predict catalytically important residues based on the 3D structure of a protein and on computed electrostatic properties. POOL has been shown to predict accurately the catalytic residues of an enzyme and is now shown to be able to discern between a compact and a spatially extended active site. Thus this work represents the use of computational methods to acquire new knowledge about how enzymes work in ways that have not been explored previously. Such understanding of how nature designs enzymes to catalyze reactions holds promise for many beneficial applications to the design of new medicines, renewable energy sources and green chemistry processes.

L38 - D2P2: The Database of Disordered Protein Prediction

Matt Oates, University of Bristol, United Kingdom

Short Abstract: We present the Database of Disordered Protein Prediction (D2P2), available at http://d2p2.pro. An assortment of disorder predictors and their variants, VL-XT, VSL2b, PrDOS, PV2, Espritz and IUPred, have been run on all protein sequences from 1,765 complete proteomes including 352 eukaryotes, 108 archaea, and 1,305 bacteria. Integrated with these results are all of the predicted SCOP domains from the SUPERFAMILY predictor. These disorder/structure annotations together enable comparison of the disorder predictors with each other and examination of the overlap between disordered predictions and SCOP domains on a large scale. Parsed data from all predictions are made available in a unified format for download as flat files or SQL tables either by genome, by predictor, or for the complete set. A JSON based web-service is made available to algorithmically retrieve all data for a set of proteins of interest by sequence or ID. An interactive website provides a graphical view of each protein annotated with SCOP and Pfam domains and disordered regions from all predictors overlaid, with binding regions highlighted by ANCHOR. Additional annotation such as experimentally validated disordered regions from both DisProt and IDEAL resources are provided when available, alongside posttranslational modification data from both PhosphoSitePlus® and dbPTM databases. Naïve eukaryotic linear motif assignment is given where appropriate, with links to ELM for further analysis and validation. D2P2 aims to provide a resource to facilitate the development of disordered protein prediction and analysis, welcoming any contribution and feedback from the community.

L39 - Why domains in pieces? An analysis of protein domains encoded by multiple exons

Ben Smithers, University of Bristol, United Kingdom

Short Abstract: With domains being considered units of structure and function, we explore why the DNA that encodes them is often organised as multiple exons.

We collated a dataset of 15 million exons coding over 1.9 million protein sequences from 91 eukaryotic genomes, including animals, plants, protists and fungi. We mapped the locations of splice junctions to protein sequences to explore their relationship with compact domains.

Using SUPERFAMILY alignments, we mapped locations of exon boundaries to positions in Hidden Markov Models representing SCOP domains at the superfamily level. We found these positions are often conserved across multiple species. Additionally, we examined amino acid usage at and around splice junctions and found a bias that may seem undesirable for globular domains. By integrating sequence annotation from D2P2, including disorder and binding site predictions, we identify domains that include different structural features on their constituent exons.

L40 - Investigating the relationship between characteristics of protein-protein interfaces and binding affinity

Ayse Derya Cavga, Koc University, Turkey

Short Abstract: Relating structure to function has been a fundamental issue in structural biology. Knowledge of structural details of protein-protein interactions is crucial in understanding protein function. However, to determine whether a protein complex actually exists under a given pH, temperature and concentration, and whether it is permanent or transient, knowledge of binding affinity is essential. Here, using a structure-based benchmark, we investigate whether the binding affinity correlates with the structural features of protein-protein interfaces. Proteins forming larger interfaces are observed to show a stronger binding, i.e. higher binding affinity. Additionally, a higher number of critical residues (hot spots) implies a protein-protein interface with higher affinity. We also extend the contact order concept to analyse protein complexes and find that the contact order of protein complexes correlates with binding affinity independent of the contact order of their unbound components. Finally, we investigate the organization of hot spot residues at protein-protein interfaces of benchmark complexes which show a large conformational change upon binding. Although protein interfaces undergo a large conformational change, there are some rigid residues which correspond to the computational hot spots at protein interfaces. Our findings would be essential for predicting binding affinity based on features of protein interfaces as well as for docking studies.
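As background for the contact order concept mentioned above, the sketch below computes the commonly used relative contact order of a single chain: the mean sequence separation of contacting residue pairs divided by the chain length. It does not reproduce the authors' extension to protein complexes, whose details are not given in the abstract; the contact list and function name are illustrative assumptions.

```python
def relative_contact_order(contacts, chain_length):
    """Relative contact order of a single chain.

    contacts     : list of (i, j) residue index pairs that are in contact
    chain_length : number of residues in the chain
    """
    if not contacts:
        return 0.0
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (len(contacts) * chain_length)


# Hypothetical contacts in a 10-residue chain.
example = relative_contact_order([(1, 5), (2, 10)], chain_length=10)
```

Low values indicate mostly local contacts, while high values indicate contacts between residues far apart in sequence.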

L41 - Structure-based prediction of homeodomain binding specificity using an integrative energy function

Jun-Tao Guo, University of North Carolina at Charlotte, United States

Short Abstract: Transcription factors (TFs) are essential to regulation of gene expression through binding to specific target DNA sites. Mutations in transcription factors and binding sites can have serious consequences and may lead to various diseases. Structure-based methods for TF-DNA interaction can help us annotate TF-binding sites at genome-scale, better understand the effects of mutations in transcription factors and target sites, and facilitate structure-based drug design. We have previously developed knowledge-based residue-level statistical potentials for structure-based TF-binding site prediction and TF-DNA docking. Here we describe a novel hybrid energy function for improving structure-based TF-binding site prediction and for studying the effects of mutations on TF-DNA binding specificity. The new energy function combines atomic-level energies with statistical knowledge-based residue-level potentials. The atomic terms include hydrogen bond energy between protein residues and DNA bases, and the electrostatic energy between aromatic residues and DNA bases involved in a T-shaped pi stacking interaction. Our results show that adding the new atomic terms to the knowledge-based potential increases TF binding site prediction accuracy when tested on homeodomain proteins.

L42 - Analysis of Conformational Changes in RNA-binding Proteins

Kannan Sankar, Iowa State University, United States

Short Abstract: RNA-binding proteins (RBPs) play vital roles in transcription, translation, and post-transcriptional gene regulation. In the case of ribonucleoprotein (RNP) complexes, conformational changes upon binding have been observed in the protein, the RNA, or both. Although previous studies have looked at global RMSD changes in the complexes, the examination of conformational variability of specific residues has been limited to a single study of a dataset of 12 unbound-bound pairs of RNA-binding proteins. Against this background, we analysed a dataset of 90 pairs of unbound and bound RNA-protein pairs assembled from recently published RNA-protein docking benchmarks, with the aim of characterizing and quantifying both global and local conformational changes in RBPs.

Our analyses show that most of the conformationally flexible residues in RBPs are non-interface surface residues. 10% of the residues that occur in both the unbound and bound structures are interface residues, and among these, 27% are conformationally flexible. Out of a total of 8,390 residues in flexible regions of RBPs, 1,137 (14%) are interface residues and 7,253 (86%) are non-interface residues. We also compared the performance of sequence-based versus structure-based machine learning methods for predicting RNA-binding residues in RBPs from unbound and bound complexes. Our results show that when performance is evaluated on unbound RBPs, the sequence-based methods outperform structure-based methods. These results are especially interesting given that conformational changes are problematic for rigid docking and other structure-based methods for computational prediction of RNA-binding residues: is this because many of the structures change significantly upon binding?

L43 - crowdAFP: a new initiative for functional annotation of genomes

Mark Wass, University of Kent,

Short Abstract: Advances in genome sequencing have resulted in the identification of millions of genes in numerous species but many of them remain functionally uncharacterised. There is therefore a need for the development of accurate computational methods to automatically infer gene/protein function. There are now a number of such methods, many of which have participated in the recent Critical Assessment of Function Annotation. Here we detail a new international initiative, crowdAFP, which will combine leading function prediction tools to generate high confidence annotations, with the results made freely available online. crowdAFP will initially focus on a small number of model species genomes.

L44 - GOtrack: tracking and viewing changes in functional annotations of genes over time

Paul Pavlidis, University of British Columbia, Canada

Short Abstract: The Gene Ontology (GO) is a widely popular set of terms used to annotate gene product characteristics (“functions”) based on certain evidence. Despite ongoing changes in its design, it is considered a “gold standard” tool for analysis and data interpretation in a variety of settings. Previous studies have highlighted species-specific biases in annotation properties that could potentially impact the interpretation and reproducibility of analyses where GO was used. Assessment of this impact, however, remains challenging as these changes differ between species and gene products. We extend the results of Gillis J and Pavlidis P. [Bioinformatics 29,4 (2013)] to analyse the historical stability of GO annotation data in 14 different organisms. Our analysis covers semantic similarity and multifunctionality metrics, GO term membership and trends, and we provide a visualization tool (GOtrack) to view those changes on a per-gene and per-GO-term basis. This information will help the research community to visualize how the annotations for particular genes of interest have changed and to assess the impact on their research.

Keywords: stability, GO annotations, gene function

L45 - Predicting RNA-Protein Interactions Using Only Sequence Information

Usha Muppirala, Iowa State University, United States

Short Abstract: RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from post-transcriptional regulation of gene expression to host defense against pathogens. Because high-throughput experimental methods for identifying and characterizing protein-RNA complexes have emerged only recently, predicting protein-RNA interactions is an important challenge in computational biology. Many methods have been developed for predicting partners and interfaces in protein-protein interactions. In contrast, relatively little work has been done on predicting partners in protein-RNA interactions.

We recently proposed RPISeq, the first sequence-based approach for predicting RNA-protein interaction partners. Given the sequences of an RNA and a protein as input, a Random Forest classifier predicts whether or not the RNA and protein interact. The RNA sequence is encoded as a normalized vector of its 4-mer ribonucleotide composition, and the protein sequence is encoded as a normalized vector of its 3-mer amino acid composition, based on a 7-letter reduced alphabet representation. On a balanced nonredundant benchmark dataset, RPISeq achieved an accuracy of 89.6% with ROC AUC of 0.96. RPISeq classifiers, trained using the benchmark dataset, correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from five model organisms.
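The encoding described above can be illustrated with a minimal sketch. It is not the RPISeq implementation: the seven amino acid classes below are an assumed conjoint-triad style grouping (the abstract does not list them), normalization is assumed to be by total k-mer count, and the function names `kmer_composition` and `encode_pair` are hypothetical. Under these assumptions the RNA contributes 4^4 = 256 features and the protein 7^3 = 343 features.

```python
from itertools import product

# Assumed 7-class reduced amino acid alphabet (conjoint-triad style grouping);
# the abstract does not spell out the classes, so treat this as illustrative.
AA_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: str(i) for i, group in enumerate(AA_CLASSES) for aa in group}


def kmer_composition(seq, alphabet, k):
    """Return the k-mer frequency vector of seq over alphabet, normalized to sum to 1."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers that contain unknown symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in kmers]


def encode_pair(rna, protein):
    """Concatenate the RNA 4-mer vector (256 values) and the protein 3-mer vector (343 values)."""
    rna_vec = kmer_composition(rna.upper().replace("T", "U"), "ACGU", 4)
    # Map each residue to its class digit; unknown residues become "X" and are skipped.
    reduced = "".join(AA_TO_CLASS.get(aa, "X") for aa in protein.upper())
    prot_vec = kmer_composition(reduced, "0123456", 3)
    return rna_vec + prot_vec


features = encode_pair("ACGUACGU", "MKVLAAGR")
print(len(features))  # 599 features per RNA-protein pair
```

A vector of this kind could then be passed to any standard Random Forest implementation; the classifier settings used by the authors are not given in the abstract.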

RPISeq offers an inexpensive method for computational construction of RNA-protein interaction networks, and should be valuable for identifying potential ncRNA and mRNA partners for RNA-binding proteins of interest. Also, it should provide useful insights into the roles of specific ncRNAs of unknown function by identifying their potential protein interaction partners. RPISeq is freely available as a web-based server at http://pridb.gdcb.iastate.edu/RPISeq/.

L46 - An Artificial Neural Network to Predict RMSDs of Docked Protein Complexes

Bahar Delibas, University of Massachusetts Boston, United States

Short Abstract: Protein-protein docking methods aim to compute the correct bound form of two or more proteins. One of the major challenges for docking methods is to accurately discriminate native-like structures. The protein docking community agrees on the existence of a relationship between various favorable intermolecular interactions (e.g. van der Waals, electrostatic, desolvation forces, etc.) and the similarity of a conformation to its native structure. Different docking algorithms often formulate this relationship as a weighted sum of selected terms and calibrate their weights against a specific training dataset to evaluate and rank candidate structures. However, the exact form of this relationship is unknown and the accuracy of such methods is impaired by the pervasiveness of false positives.

Unlike the conventional scoring functions, we propose a novel machine learning approach that not only ranks the candidate structures relative to each other but also indicates how similar each candidate is to the native conformation. We trained a neural network with an extensive dataset using the back-propagation learning algorithm and achieved RMSD prediction accuracy with less than 1 Angstrom error margin on 19,600 test samples. Moreover, we tested the proposed tool with a large set of docked solutions to assess its ranking capability. The results indicate that the suggested approach may significantly enhance the accuracy of ranking for protein docking and refinement methods.

L47 - Mapping side chain interactions at the N- and C-termini of protein helices

Nicholas Newell, Independent Researcher, United States

Short Abstract: Interactions involving one or more amino acid side chains near the ends of protein helices stabilize helix termini and shape the geometry of the adjacent loops, contributing to supersecondary and tertiary structure. Side chain structures like the Asx/ST N-caps and the capping box, as well as hydrophobic and electrostatic interactions, have been studied at helix termini, but key questions remain, including: 1) To what extent are the terminal motifs that include two or more amino acids which have been detected in structural surveys likely to represent genuine cooperativities? 2) Which particular extrahelical loop geometries are favored by each side chain interaction near helix termini? This question must be answered before side chain interactions can be fully utilized in rational design and optimization. 3) Can an exhaustive statistical scan of a large, recent dataset identify new side chain interactions at helix termini?

In this work, three tools are applied to answer the above questions for both N- and C-termini. First, a new perturbative least-squares 3D clustering algorithm is applied to partition the terminal structures in a recent, large dataset by loop backbone geometry. Second, Cascade Detection (Newell, Bioinformatics, 2011), an algorithm that detects cooperativities by identifying sequence features that are outliers from their background models, is applied to each cluster separately to determine which features are most significant in each geometry. Finally, the results for each feature are displayed in a CapMap, a manipulatable Jmol structure that displays the loop geometries favored by the feature as a 3D conformational heatmap.

L48 - An analysis of oligomerization interfaces in transmembrane proteins

Guido Capitani, Paul Scherrer Institut, Switzerland

Short Abstract: We present an analysis of validated transmembrane protein (TMP) interfaces, whose features we studied in detail and compared with those of their soluble counterparts. The procedure to obtain a validated dataset of TMP interfaces (TMPBio) will be outlined, as will the interface features and the role of "interstitial" lipids. We will also present a benchmarking of our Evolutionary Protein Protein Interface Classifier (EPPIC, www.eppic-web.org) on TMPBio. This benchmarking demonstrated that the method, developed on soluble proteins, works well on TMP interfaces and enabled us to classify with EPPIC all proposed GPCR dimerization interfaces in the literature, to predict which ones are biological dimers. All but one of the proposed interfaces were classified as crystal contacts. The exception is the largest interface of the human Smoothened receptor (PDB entry 4jkv): it exhibits clear biological dimer features and we propose it as a reliable template for modeling studies of GPCR dimerization.

L49 - Modelling of three mutations in the GALNS protein that cause MPS-IVa

Érico Torrieri, University of São Paulo, Brazil

Short Abstract: Mucopolysaccharidoses (MPS) are a group of lysosomal storage diseases caused by deficiencies in enzymes that degrade glycosaminoglycans (GAGs). MPS classification is based on the specific enzyme deficiency. MPS-IVa is caused by mutations in the gene encoding the GALNS (N-acetylgalactosamine-6-sulfatase) enzyme. Our hypothesis was that missense mutations would severely affect the activity of GALNS by changing its hydrophobic core or modifying its folding. There is no effective treatment for MPS yet. Given the known structure of wild-type GALNS, it would be interesting to know the structure of the mutant proteins in order to consider approaches to therapeutic intervention. The goal is to model 3 mutations in GALNS: one in the active site, one in the hydrophobic core and one on the protein surface. Modeller was used for protein modeling, and Procheck, Whatif, dFire2, errat, ProQ, prose and VERIFY 3D were used for validation. Because the substituted residues have similar structures, the A to V change on the surface and the A to T change in the core produced no major changes in the target. On the other hand, the active site mutation replaces C with Y, both polar amino acids but with different structures. C is responsible for the breakdown of keratan sulfate and chondroitin sulfate, so this substitution causes a large change in the action of the enzyme. We conclude that missense mutations cause considerable changes when they are located in the active site, probably affecting protein function. This is valuable information showing that the search for new drugs must be carried out.

L50 - An Information Cascade Based Approach to Quantifying the Impact of Protein Data Bank

Yi-Hung Huang, Academia Sinica, Taiwan

Short Abstract: The Protein Data Bank (PDB) is the worldwide repository of 3D structures of proteins, nucleic acids and complex assemblies, most of which play essential biological roles and are the prime drug targets in various diseases. Most journals require prior submission of structures to the PDB as part of the publication process, after which each structure can be retrieved by a unique identifier. By exploring these rich structure data and related citations, we can investigate the relationships between protein structures from the viewpoint of the citation network. Moreover, the analysis of the literature and data citation networks may demonstrate potential pathways of scientific discovery, that is, how knowledge and data were used to advance a particular field in structural biology. Here we propose an information cascade based approach to study the whole PDB citation network. We provide a quantitative measure to show how the relationships between protein structures can be characterized by their corresponding citation cascades. We map protein structures in overlapping citation cascades to drug targets and drugs. The result shows that related protein structures can be clustered into groups that correlate to the same drug targets. By quantifying the growth of the cascade of each protein structure, the study reveals how the PDB greatly impacts drug development.

L51 - Structural Analysis and Remodeling of T-Cell Receptors

Thomas Hoffmann, Technische Universität München, Germany

Short Abstract: T-cells play a major role in the adaptive immune response. T-cell receptor molecules (TCRs) distinguish between self-peptides and pathogenic peptides presented by major histocompatibility complex molecules (MHC) on cell surfaces and thus initiate an immune response. Structural prediction of TCR:peptide:MHC (TCR-p-MHC) complexes would allow a better understanding of this interaction and thus of the molecular basis of T-cell signaling, which is important for the development of immunotherapies and rational vaccine design. TCRs share a common structural scaffold, although the sequence variations in their variable domains can be high and the human TCR repertoire was estimated to be at least 10^8. Thus, due to the genetic and structural diversity of the receptor, modeling of TCR structures and their TCR-p-MHC complexes is a challenging task. We investigated the structural characteristics of the TCR variable domains, consisting of two chains (alpha and beta). The analysis showed that the orientation of the TCR beta chain relative to the TCR alpha chain is dependent on the TCR type and that the differences can be described by a common center of rotation. Based on this analysis we developed a force field based prediction tool, which allows predicting the correct TCR geometry for at least 83% of the structures of our test set. In the presentation we will discuss the new methodology and its performance.

L52 - Predicting and Probing Protein-Protein Interactions in Haptoglobin with Myoglobin and Hemoglobin through Computational Modeling and Confirmation of Binding through Mass Spectrometry

Ololade Fatunmbi, University of Massachusetts, United States

Short Abstract: Haptoglobin 1-1 (Hp), an abundant dimeric protein in blood, binds free hemoglobin (Hb) dimers (Hbα1β1) in one of the strongest non-covalent binding events known in biology [1] and shields residues in Hb that are prone to oxidative modification. Myoglobin (Mb), a monomeric protein required for oxygen transportation in muscle, shares high structural homology and sequence similarity with both Hb α- and β-chains, and many residues critical for Hbα1β1 binding to Hp are conserved in Mb. Despite these similarities, there is a debate regarding the ability of Hp to bind Mb [2]. To predict whether Mb will bind to either the Hbα or Hbβ binding site on Hp, homology modeling was conducted to generate structural models of two Mb molecules in complex with one monomer of Hp (Mb-Hp) using the Hbα1β1-Hp crystal structure as a template. Computational modeling suggested that the Mb-Hp model and the Hb-Hp structure share several conserved features in their binding interfaces, and pointed to possible unfavorable electrostatic interactions of Mb at the Hbβ binding site on Hp. Native MS experiments were performed to probe the binding of Hp to Mb and Hb experimentally. Mb-Hp complexes were detected by MS, which confirmed the stoichiometry of the complex: one monomer of Mb can bind one monomer of Hp. Based on our predictions and experimental results, Hp binds Mb with lower affinity than Hb, and residues critical for Hb or Mb interaction with Hp are proposed. Homology modeling is demonstrated as a useful preliminary method for predicting Mb-Hp interactions and correlates well with MS experimental data.

L53 - Construction of human gene and protein ontology which connects related concepts from different data sources

Katsuhiko Murakami, National Institute of Advanced Industrial Science and Technology, Japan

Short Abstract: Database integration is increasingly important in the big data era. When we integrate information from different biological data sources, a key issue is to connect relevant terms which were independently created and annotated. The annotations are typically shown as if they were independent, even though some annotations are actually correlated with each other. To understand the relationships among the annotations, we comprehensively examined how strongly each annotation is correlated with the others through proteins. We selected ten protein annotations (Protein Family, Protein Ontology, InterPro, KEGG pathway, protein-protein interaction, SCOP, SOSUI membrane protein prediction, OMIM, Tissue specificity of gene expression, and Wolf-PSORT) in the integrated human gene database, H-InvDB. For each pair of individual annotation terms (e.g. GO:0004252 and IPR001627), the correlation was evaluated using Fisher's exact (two-sided) test with Bonferroni correction. As a result, we found 21,047 pairs with positive correlation and 793 pairs with negative correlation. From these related terms, we constructed an ontology to express relationships between terms from different origins (e.g. terms from KEGG pathway and GO function), using properties such as "hinvo:correlatedWith". In addition, some properties carry a score expressing the similarity between terms. The constructed ontology should contribute to more sophisticated information processing or data mining, such as function prediction.
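As a rough illustration of the statistical step, the sketch below computes a two-sided Fisher's exact test for one pair of annotation terms and applies a Bonferroni correction across several pairs. It assumes each pair is summarized as a 2x2 table of protein counts (a: both terms, b: first term only, c: second term only, d: neither); the table layout, the function names and the example counts are illustrative assumptions, not taken from H-InvDB.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def direction(a, b, c, d):
    """Sign of the association: positive if the terms co-occur more than expected."""
    return "positive" if a * d > b * c else "negative" if a * d < b * c else "none"


# Hypothetical counts for three term pairs.
tables = [(8, 2, 1, 5), (3, 7, 6, 4), (10, 0, 0, 10)]
raw = [fisher_exact_two_sided(*t) for t in tables]
adjusted = bonferroni(raw)
for t, p in zip(tables, adjusted):
    print(t, direction(*t), round(p, 4))
```

The direction helper mirrors the positive/negative split reported in the abstract, using the sign of the log odds ratio as an assumed criterion.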

L54 - Structural Dynamics of Porphobilinogen Deaminase from human and Plasmodium falciparum during catalytic extension of Dipyrromethane Cofactor

Gopalakrishnan Bulusu, TCS Innovation Labs, India

Short Abstract: Porphobilinogen deaminase (PBGD), an enzyme in the tetrapyrrole biosynthesis pathway found across various organisms, catalyses the formation of 1-hydroxymethylbilane from four molecules of porphobilinogen (PBG) using the dipyrromethane (DPM) cofactor. Earlier, we reported the structural dynamics of E. coli PBGD. Modelling and molecular dynamics simulations of human PBGD (hPBGD) and P. falciparum PBGD (PfPBGD) were performed to study the structural dynamics during the addition of each of the four PBG molecules. Simulations of 250 ns were performed for each of the homologs across five catalytic stages. Different extents of domain movement were seen in each homolog for accommodating the polypyrrole within its active site. Compared to the large domain movements seen for E. coli PBGD, minimal movements were observed in hPBGD, while PfPBGD showed intermediate-level movements. The movement of the active site loop (A55 to T78) and a helix (K79 to E86) in domain 1 provides the extra volume required to accommodate the polypyrrole in hPBGD. Essential dynamics of PfPBGD showed that domains 1 and 2 move apart in the first stage of PBG addition, to create space in the active site cleft for subsequent PBG molecules. The pyrrole chain is accommodated in a helicoidal conformation within the active site of PfPBGD, wherein the second and the sixth rings are proximal. In both PfPBGD and hPBGD, the functionally important and conserved residues R67 (R26 in hPBGD), Q75 (34) and D145 (99) interact with the elongating pyrrole chain at each stage and may participate in the catalysis, as indicated by biochemical studies.

L55 - Structure-based design of combinatorial mutagenesis libraries

Deeptak Verma, Dartmouth College, United States

Short Abstract: The development of protein variants with improved properties (thermostability, binding affinity, catalytic activity, etc.) has greatly benefited from the application of high-throughput screens evaluating large, diverse combinatorial libraries. At the same time, since only a very limited portion of sequence space can be experimentally constructed and tested, an attractive possibility is to use computational protein design to focus libraries on a productive portion of the space. While structure-based design of individual proteins has proven highly beneficial, only sequence-based approaches have been scaled up to the design of massive libraries. We present the first general-purpose method for optimizing arbitrarily large combinatorial mutagenesis libraries directly based on structural energies of their constituents. Case study applications to targets including green fluorescent protein, β-lactamase and lipase-A proteins show that, for example, in less than an hour our method can optimize a library of 10 to 20 mutated sites, with 2 amino acids per site (using either point mutations or degenerate oligos), and up to 10^6 variants. It can also readily optimize libraries of even a billion members. Analysis of resulting library designs provides insights into how our method selects different mutations that “play well together” in producing variants that are diverse yet energetically compatible with the desired structure. Our approach enables investigators to bring powerful structure-based design techniques to bear in library-based contexts, thus promising to improve the hit rate of discovering beneficial variants.
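The library sizes quoted above follow from simple counting: if each mutated site independently takes one of its allowed amino acids, the number of variants is the product of the per-site choices. The short sketch below only checks that arithmetic; the function name `library_size` is hypothetical and says nothing about how the authors' optimization works.

```python
from math import prod


def library_size(choices_per_site):
    """Number of distinct variants when each site independently takes one of its allowed amino acids."""
    return prod(choices_per_site)


print(library_size([2] * 10))  # 1024
print(library_size([2] * 20))  # 1048576
print(library_size([2] * 30))  # 1073741824
```

Twenty sites with 2 amino acids each give about 10^6 variants, consistent with the upper figure in the abstract. Thirty such sites would be one way to reach roughly a billion members, although the abstract does not say how the billion-member libraries are composed.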

L56 - FireDB and firestar: curated binding information for function prediction

Paolo Maietta, Spanish National Cancer Research Centre, Spain

Short Abstract: Determining protein function is still a challenging and time-consuming task and, crucially, experimental function determination is unlikely to be standardized. Many protein function prediction methods have been developed in recent years in order to cope with the progressive growth in the protein sequence databases.

The PDB is unique as a source of structural information, but also as a source for small ligand binding data. FireDB annotates small ligands and their associated binding residues from the PDB and catalytic residues from the Catalytic Site Atlas. The database also classifies them according to their biological relevance, pointing out naturally and pharmacologically occurring interactions.

firestar is an expert system for predicting functional residues in protein sequences based around FireDB information. It makes predictions from those validated functional residues that are supported by local sequence conservation. firestar has been assessed as a state-of-the-art method for binding site prediction in the last three Critical Assessment of Structure Prediction (CASP) experiments.

Both tools have been used in different projects.
firestar is part of the construction pipeline of APPRIS, a database for the automatic annotation of alternative splicing products.
firestar and FireDB have been used for the evaluation of the functional integrity of families and clans of the Pfam database.
Finally we have recently integrated firestar into an in-house automatic Gene Ontology (GO) term prediction method, SIAM. The combined approach was tested in the last CAFA experiment.

L57 - Unexpected Structural Similarity of Analogous Loops

Yoonjoo Choi, Dartmouth College, United States

Short Abstract: Protein loops are important secondary structure elements connecting regular secondary structures. They are located on protein surfaces and often related to function. Despite the importance of loop structures, they are difficult to accurately predict due to their high variability, as they tend to be subject to mutation during evolution. Thus loop prediction is performed at the last stage in protein modelling and the protein loop structure prediction problem is regarded as a constrained mini protein folding problem.

Similarly to sequence-structure relationships more generally in proteins, it is well known that loops with similar local sequences share similar structures (homologous loops), and those with dissimilar sequences tend to have different structures. However, there are also loop pairs that have dissimilar sequences yet very similar structures (analogous loops). Though protein loops with similar sequence and structure have been intensively studied in some protein families, such as the canonical structures of complementarity determining regions of antibodies, very little is currently known about cases of dissimilar loops sharing highly similar structures.

Here we investigate analogous protein loop pairs. It is known that the distance between the end points of a loop (loop span and loop stretch) restricts its structural degrees of freedom. However, loop span and stretch do not show differences between homologous and analogous loops, and thus analogous loops are not due to physical constraints. We found that amino acid similarity, e.g. similar size, structure, or charge, still plays an important role in the unexpected structural similarity between analogous loops.

L58 - Comparative protein structure model of a dimeric respiratory syncytial virus matrix protein complex

Andreas Schueller, Pontificia Universidad Católica de Chile, Chile

Short Abstract: Human respiratory syncytial virus (hRSV) is an enveloped RNA virus of the family Paramyxoviridae, order Mononegavirales, and is the principal cause of bronchiolitis and pneumonia in infants worldwide. The viral M protein associates with membranes and plays a central role in coordinating viral assembly and budding. Despite evidence for dimeric and higher order forms in solution, the M protein was crystallized as a monomer. Here we present a comparative protein structure model of a potential dimeric quaternary structure of hRSV-M. We performed a systematic analysis of Mononegavirales matrix proteins and identified 22 related crystal structures. The matrix proteins of several mononegaviruses (metapneumovirus, Newcastle disease virus, Borna disease virus) were crystallized in a particular planar square-shaped, dimeric or tetrameric quaternary structure, and served as templates for comparative modeling. Dimeric hRSV-M models were generated by a multi-template approach with the help of the software MODELLER and were validated by a statistical potential derived from known protein complexes. Surface features of dimeric hRSV-M were mapped by similarity with template structures and were in good agreement with experimental results for hRSV-M RNA binding. In the absence of a dimeric crystal structure of hRSV-M, our results might help to improve understanding of multimerization and interaction with viral and host factors on an atomic level.
Acknowledgements: FONDECYT No. 1131065

L59 - Redesigning the UniProt Web Interface

Sangya Pundir, EMBL-EBI,

Short Abstract: The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and functional data. UniProt aims to provide the scientific community with a comprehensive, high-quality and freely accessible protein resource. The UniProt website is our interface to the world, with approximately 400,000 unique visitors per month. We hence undertook extensive user research to understand the requirements of the scientific community. This resulted in an effort to redesign the UniProt web interface. The aim was to make it more intuitive and to ensure that users can get the most out of the data and functionality that we provide. The redesign was carried out in a user centred manner, involving user feedback at every stage. We have found it immensely valuable to include the user community in designing the resource and have seen measurable improvements in key areas. We will showcase some key enhancements and changes to the current interface. These include our new homepage, results page and protein entry page organisation, new results interfaces for our tools and new pages for our proteome data. We will also present the user centred design approach that we followed throughout the process and the rationale behind our key changes. The new web site is available as a beta version for the scientific community to try out and we welcome all feedback.

L60 - Automated function prediction using SIFTER: an application and validation on Desulfovibrio vulgaris Hildenborough genome

Sayed Sahraeian, University of California, Berkeley, United States

Short Abstract: We performed a new validation of the Statistical Inference of Function Through Evolutionary Relationships (SIFTER) algorithm on the whole-genome automated functional annotation of the Desulfovibrio vulgaris Hildenborough (DvH) genome. SIFTER uses a statistical graphical model of function evolution to incorporate annotations throughout an evolutionary tree, making predictions supported by posterior probabilities for every protein. We validated SIFTER’s predictions on a set of DvH proteins involved in sulfate reduction that includes many of the central players in energy metabolism with experimentally supported annotations. Validation on these experimentally annotated genes revealed that for most of the genes SIFTER’s predictions are consistent with the reference annotation, while in some cases the predictions are relatively generic. However, the generic predictions are usually due to the insufficiency of experimentally determined specific molecular functions for any homologous sequences.

L61 - Modeling Residue Interaction Graph with matrix Bingham-von Mises-Fisher Distribution

Qiang Lv, University of Soochow, China

Short Abstract: Protein complexes play important roles in signal transduction and material transportation in cells. Knowledge of the structure of a complex can help us to better understand its function. Computational protein-protein (p-p) docking methods are commonly used to predict the structure of a complex. However, the quality of the structures generated by docking methods depends largely on the accuracy of the energy function, and selecting the best prediction from the decoys remains a major challenge. Incorporating flexibility inevitably increases the conformational space dramatically.
Improvements to the energy function that help to pick better near-native candidates would be a significant contribution to complex structure prediction. The residue interaction graph can not only represent the biological impact but can also quantify that impact in terms of residue contacts. We use a graph probability model, the matrix Bingham-von Mises-Fisher distribution, to model the residue interaction graph of flexible docking decoys. Our model is a useful supplement to the energy function in ranking flexible docking decoys, especially in cases where the energy function fails to pick out the most promising candidate structures.

L62 - SCOPe: Organizing New PDB Structures into the Classic SCOP Hierarchy

Naomi Fox, Lawrence Berkeley National Laboratory, United States

Short Abstract: SCOPe (SCOP—extended, http://scop.berkeley.edu) is a database of protein structural relationships that extends the Structural Classification of Proteins (SCOP) database. SCOP is a manually curated ordering of domains from proteins of known structure in a hierarchy according to structural and evolutionary relationships. Development of the SCOP 1 series concluded with SCOP 1.75. SCOPe automatically classifies many structures released since SCOP 1.75 and corrects some errors, aiming to match the accuracy of the hand-curated SCOP releases. The ASTRAL compendium provides several databases and tools to aid in the analysis of the protein structures classified in SCOP, particularly through the use of their sequences. Because SCOP and ASTRAL are widely used for structural and evolutionary studies, and are often used for benchmarking new algorithms, new SCOPe releases are fully backward compatible with SCOP 1.75, and incorporate and update ASTRAL. The current release of SCOPe classifies 63,103 protein structures, 24,882 more than SCOP 1.75.

L63 - Protein functional sites predicted by Surface dimension reduction

Ahmet Sacan, Drexel University, United States

Short Abstract: Protein structure alignment has been widely used and studied in computational biology, but interpreting protein 3D structure remains a major scientific challenge. We present a novel method that reduces the dimensionality of the protein surface and uses image template matching to compare and analyze protein functions. Our framework can represent various structures together with physicochemical information; in particular, we incorporate curvature, electrostatic potential, hydrophobicity, and evolutionary conservation into the structure. Our method is able to detect local similarities even in proteins that lack global similarity.

L64 - Protein Structure Alignment meets Evolution

Ahmet Sacan, Drexel University, United States

Short Abstract: Most available protein structure alignment methods, such as DaliLite and TMalign, align protein structures solely on the basis of geometric information and are limited in their ability to find functionally relevant correspondences between two proteins. In this study, we introduce a new method that incorporates additional biochemical and evolutionary features of the proteins being aligned. We propose Uniscore, a new structure similarity score that integrates geometric similarity, sequence similarity, and evolutionary profiles of the proteins. While sequence similarity has been investigated previously, to the best of our knowledge this is the first study to use evolutionary profiles in structure alignment. Two specific evolutionary scores are calculated: a residue conservation score, represented by the entropy or the sum-of-pairs score, and sequence profiles, represented by a PSSM. We further developed Unialign, a protein structure alignment algorithm that finds an alignment with near-optimal Uniscore. The important difference from existing heuristics is that, during iterative optimization, we incorporate our scoring function both into residue pair collection using dynamic programming and into the geometric superposition step using a weighted RMSD. We evaluated our program by the consistency between the alignments it produces and human-curated alignments, measured as the fraction of correctly aligned residues. Two benchmarks are included in our study: CDD and HOMSTRAD. A large-scale comparison of Unialign with several popular structure alignment methods (DaliLite, TMalign, FATCAT, SPalign, and DeepAlign) is ongoing. A case study of 1lfb and 1ftz shows that adding evolutionary profiles is essential to correctly align the two protein structures.
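The abstract names three quantities without giving formulas: an entropy-based conservation score, a weighted RMSD, and the fraction of correctly aligned residues. The following is a minimal sketch of what such quantities could look like, not the authors' implementation. All function names are hypothetical, gaps are assumed to be written as "-", alignments are assumed to be lists of (residue index in protein A, residue index in protein B) pairs, and the weighted RMSD is computed on coordinates that are assumed to be already superposed. How Uniscore actually combines these terms is not stated in the abstract and is not attempted here.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps ('-').

    Lower entropy means a more conserved column.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def weighted_rmsd(coords_a, coords_b, weights):
    """Weighted RMSD between paired 3D points that are already superposed.

    Assumption: each weight reflects how much a residue pair should count,
    for example a larger weight for a more conserved position.
    """
    if not (len(coords_a) == len(coords_b) == len(weights)):
        raise ValueError("coordinate and weight lists must have equal length")
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    squared = 0.0
    for a, b, w in zip(coords_a, coords_b, weights):
        squared += w * sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(squared / weight_sum)


def fraction_correct(predicted_pairs, reference_pairs):
    """Fraction of reference residue pairs reproduced by the predicted alignment."""
    reference = set(reference_pairs)
    if not reference:
        return 0.0
    return len(reference & set(predicted_pairs)) / len(reference)


if __name__ == "__main__":
    print(column_entropy("AACC"))
    print(weighted_rmsd([(0, 0, 0), (0, 0, 0)], [(1, 0, 0), (0, 3, 0)], [1, 0]))
    print(fraction_correct([(1, 1), (2, 2), (3, 4)], [(1, 1), (2, 2), (3, 3), (4, 4)]))
```

In this sketch a weight of zero removes a residue pair from the RMSD entirely, which illustrates how a conservation-derived weight could shift the superposition toward conserved positions.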
