Posters

Poster numbers will be assigned May 30th.
If you cannot find your poster below, you have probably not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the poster acceptance email, which contains a confirmation link. Click on it and follow the instructions.

If you need further assistance please contact submissions@iscb.org and provide your poster title or submission ID.

Category L - 'Protein Structure and Function Prediction and Analysis'
L001 - Alternate Conformation Prediction of Vibrio cholerae Concentrative Nucleoside Transporter
Short Abstract: Secondary transporters couple the uptake of substrate with ion transport. Crystallization of these proteins in the past decade allowed for investigation into the molecular mechanism of transport. These structural analyses revealed internal symmetries, hypothesized as integral components of their transport domains. Alternate conformations of secondary transporters suggest that they function through the alternating access mechanism, allowing substrate access from only one side of the membrane, depending on the transporter's conformation. Experiments conducted alongside these analyses confirm the existence of these alternate conformations in human homologues in vivo. With the recent crystallization of Vibrio cholerae concentrative nucleoside transporter, structural and computational analysis of the molecular mechanism of transport can now be conducted, using coarse-grained modeling to predict the protein dynamics of transport.
L002 - KeyPathwayMiner - Extracting relevant pathways by combining OMICS data and biological networks
Short Abstract: We introduce the latest version of the KeyPathwayMiner software framework. Given a biological network and a set of case-control studies, KeyPathwayMiner efficiently extracts and visualizes all maximal connected subnetworks that contain mainly genes that are dysregulated, e.g., differentially expressed, in most cases studied. The exact quantities for “mainly” and “most” are modeled with two easy-to-interpret parameters that allow the user to control the number of outliers (not dysregulated genes/cases) in the solutions. We developed two slightly varying models that fall into the class of NP-hard optimization problems. To tackle the combinatorial explosion of the search space, we designed a set of exact and heuristic algorithms. In our presentation, we will demonstrate the extraction of active sub-networks in Huntington disease utilizing gene expression data. In addition, we demonstrate KeyPathwayMiner’s flexibility by combining genome-scale DNA methylation profiles taken from colorectal cancer patients with the human interactome to identify differentially methylated key pathways.
L003 - Structure-based redesign of proteins for minimal T-cell epitope content
Short Abstract: The ongoing development of protein-based therapeutics has provided novel and efficacious treatments for a broad spectrum of diseases. However, large molecules of non-human origin are subject to immune surveillance, potentially resulting in detrimental anti-biotherapeutic immune responses (aBIRs) in human patients. One key source of aBIR is cellular presentation and molecular recognition of constituent immunogenic peptides, called epitopes. Building upon the Rosetta flexible backbone method, we present a computational protein design tool that yields minimal T-cell epitope content in target sequences. We examine our method on three therapeutically relevant proteins: erythropoietin (hormone for red cell production), staphylokinase (thrombolytic agent) and HB36 (flu-binding protein). Our deimmunized protein designs exhibit native-like properties as measured by Rosetta energy, hydrophobicity, charge conservation, structural distortion, packing and sequence identity. We further demonstrate the applicability of the method to a broader panel of protein targets.
L004 - A large‐scale evaluation of computational protein function prediction
Short Abstract: The presentation will first provide the motivation for, and the challenges of, predicting protein function. This will include both the biological significance and a precise computational formulation of the problem. We will then present details (at an appropriate level for a highlight presentation) of the CAFA experiment as described in the paper, discuss the current state of the art in protein function prediction, and lay out possible avenues for improvements and accuracy assessment of computational function prediction. Finally, we intend to briefly discuss the next CAFA challenge whose start will coincide with the ISMB 2013 conference.
L005 - In silico identification of ubiquitin-binding domains
Short Abstract: The ubiquitylation signal promotes trafficking of endogenous and retroviral transmembrane proteins. The signal is decoded by a large set of ubiquitin (Ub) receptors that tether Ub-binding domains (UBDs) to the trafficking machinery. We developed a structure-function based procedure to scan the PDB for hidden UBDs. The procedure is based on structural alignment of the physico-chemical properties of known ubiquitin binding sites to determine "template baits" for the scanning of the structural database of all the eukaryotic proteins. A subsequent scan of the entire database by the highest scoring "bait" configuration retrieved many of the known UBDs. Intriguingly, new potential UBDs, which scored as high as the known UBDs, were identified, including the ALIX-V domain. Extensive in vitro and in vivo experiments, including mutations in the postulated ALIX-V:Ub interface, have corroborated the in-silico findings demonstrating that ALIX-V directly interacts with ubiquitin in vivo and that this interaction can influence retroviral budding.
L006 - SwissSidechain: a molecular and structural database of non-natural amino acids
Short Abstract: Amino acids form the building blocks of all proteins. Naturally occurring amino acids are restricted to a few tens of sidechains. Yet, the potential chemical diversity of amino acid sidechains is nearly infinite. Exploiting this diversity using non-natural sidechains has recently found widespread applications. With the SwissSidechain database (http://www.swisssidechain.ch), we offer a central and curated platform for non-natural sidechains. SwissSidechain provides biophysical, structural (e.g., rotamers) and molecular data for hundreds of commercially available non-natural amino acid sidechains. We further provide plugins to seamlessly insert non-natural sidechains into peptides and proteins using molecular visualization software, as well as topologies and parameters compatible with molecular mechanics software. Recent experimental results on a phage display optimized ligand of urokinase plasminogen activator show that expanding the building blocks of peptides and proteins with non-natural amino acids is a powerful way to develop potent inhibitors of protein interactions.
L007 - Pannzer: A high-throughput tool for functional annotation of unknown protein sequences
Short Abstract: PANNZER is a weighted k-nearest-neighbor classifier that uses BLAST to search a protein sequence database and generate a list of neighbors of the query sequence. The central task in functional annotation is the selection of the relevant annotation features from all annotation features that occur in the sequence search list. Annotation features include free-text functional descriptions, gene names, Gene Ontology (GO) classes and Enzyme Commission (EC) classifications. PANNZER works well in error-prone environments and was ranked as the third best method in the Critical Assessment of Function Annotations 2011 challenge [1].

[1] Radivojac et al. 2013. A large-scale evaluation of computational protein function prediction. Nature Methods 10, 221–227
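
The weighted nearest-neighbor idea can be illustrated with a minimal sketch. This is not PANNZER's actual scoring scheme: the function name, the use of a raw similarity score as the weight, and the example hits are assumptions made for illustration only.

```python
from collections import defaultdict

def weighted_knn_annotations(hits, k=3):
    """Score candidate annotations from the k best-scoring search hits.

    hits: list of (similarity_score, set_of_annotation_terms) tuples,
    e.g. a bit score together with the GO classes of each database hit.
    Returns (term, weight) pairs sorted by descending weight, where the
    weight is the share of total similarity carried by hits with that term.
    """
    neighbours = sorted(hits, key=lambda h: h[0], reverse=True)[:k]
    total = sum(score for score, _ in neighbours)
    if total <= 0:
        return []
    votes = defaultdict(float)
    for score, terms in neighbours:
        for term in terms:
            votes[term] += score / total
    # Break ties alphabetically so the ranking is deterministic.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical hits for one query sequence (scores are made up).
example_hits = [
    (200.0, {"GO:0005524", "GO:0004672"}),
    (150.0, {"GO:0005524"}),
    (50.0, {"GO:0003677"}),
    (10.0, {"GO:0008270"}),
]
ranked = weighted_knn_annotations(example_hits, k=3)
```

With k=3 the weakest hit is ignored, and terms shared by several strong neighbors rise to the top of the ranking.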
L008 - Predicting the biochemical consequences of missense mutations using genome-wide homology modeling
Short Abstract: The discovery of which mutations contribute to a particular disease is an important biomedical problem with potential applications in drug discovery, disease diagnosis and prognosis, and the development of improved personalized therapies. To this end, we have developed a computational method that integrates complementary approaches for predicting the biochemical effects of missense mutations using genome-wide generation of homology models for human protein complexes. Mutations affecting diverse types of binding sites are identified by homology to available X-ray structures of complexes and machine learning classifiers while spatial clustering of mutations is used to detect other compact regions of the protein structure important for its function. A Random Forest classifier trained on results from these structure-based methods, as well as annotations from online databases, evolutionary conservation, and predicted stability changes was found to outperform current popular prediction methods. Finally, the predicted biochemical effects of mutations showed good agreement with experimental assays.
L009 - Predicting the codon usage in yeast
Short Abstract: The translation of DNA into protein is described by translation tables. However, not all organisms use the same translation scheme. Rather, several different schemes evolved, which cannot always be unambiguously assigned to species by their phylogenetic grouping. Such a case is present in yeasts, some of which use an alternative translation of the leucine codon CUG. While members of the Schizosaccharomyces and Saccharomyces clades translate CUG as leucine, some, but not all, members of the Candida species translate it as serine. We developed a method for predicting which translation table is used in a given yeast species. The prediction method is based on comparing a yeast query sequence to a reference dataset comprising about 1700 manually annotated cytoskeletal and motor protein sequences belonging to 26 conserved protein families in 71 yeast species. Query sequences can be single DNA sequences, transcriptome data or full genome assemblies. In the query sequence, cytoskeletal and motor proteins are predicted and subsequently aligned with the reference. CUG codons at alignment positions with highly conserved hydrophobic (e.g. valine, isoleucine and leucine) or polar (e.g. serine and threonine) residues are used to discriminate between standard and alternative codon usage. By applying this method to all publicly available yeast sequences, we can reanalyse and possibly correct their annotated translation schemes. The prediction method was implemented in a web interface and will be made publicly available for ISMB/ECCB at http://www.motorprotein.de/bagheera.
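
The discrimination step can be sketched as a simple vote over alignment columns. This is an illustration under stated assumptions, not the authors' implementation: the function name, the input format (one string of reference residues per query CUG position) and the 0.8 conservation threshold are hypothetical.

```python
HYDROPHOBIC = set("VIL")   # valine, isoleucine, leucine
POLAR = set("ST")          # serine, threonine

def predict_cug_decoding(columns, min_conservation=0.8):
    """Vote on CUG decoding from reference alignment columns.

    columns: list of strings, one per query CUG codon, each holding the
    reference residues aligned to that codon position.
    """
    leu_votes = ser_votes = 0
    for residues in columns:
        residues = residues.replace("-", "")  # ignore alignment gaps
        if not residues:
            continue
        hydrophobic = sum(r in HYDROPHOBIC for r in residues) / len(residues)
        polar = sum(r in POLAR for r in residues) / len(residues)
        # Only highly conserved columns contribute a vote.
        if hydrophobic >= min_conservation:
            leu_votes += 1
        elif polar >= min_conservation:
            ser_votes += 1
    if leu_votes > ser_votes:
        return "standard (CUG = Leu)"
    if ser_votes > leu_votes:
        return "alternative (CUG = Ser)"
    return "undetermined"

# Made-up columns: three serine/threonine-conserved, one leucine-like.
example_columns = ["SSTSS", "SSSS-", "LLIVL", "STSTS"]
prediction = predict_cug_decoding(example_columns)
```

Columns that are conserved in neither class are skipped, so a poorly conserved alignment yields "undetermined" rather than a forced call.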
L010 - From sequence to enzyme mechanism
Short Abstract: Background:
Predicting enzyme function at the level of chemical mechanism provides a finer granularity of annotation than traditional Enzyme Commission classes. Predicting not only whether a putative enzyme in a newly sequenced organism has the potential to perform a certain reaction, but how the reaction is performed, using which cofactors and with susceptibility to which inhibitors has important consequences for drug and enzyme design.
Work such as SABER (Nosrati et al. 2012) predicts enzyme catalytic activity based on 3D protein features. However, using 3D structural attributes limits the prediction of enzyme mechanism to proteins already having either a solved structure or a close relative suitable for homology modelling.

Results:
In this study we evaluate whether InterPro and Catalytic Site Atlas sequence signatures provide enough information for bulk prediction of enzyme mechanism. Our ml2mech method (Machine Learning to Mechanism) can predict at 92% accuracy the MACiE mechanism definitions of 288 proteins available in the EzCatDb, MACiE and SFLD databases, using an off-the-shelf multi-label K-Nearest Neighbours algorithm from the Mulan machine learning library.

Conclusion:
We find that InterPro signatures are critical for accurate prediction of enzyme mechanism, showing robust accuracy even when a reduced subset of signatures containing enzyme names is used. We also find that incorporating Catalytic Site Atlas site matches results in additional accuracy. Available online at http://sourceforge.net/projects/ml2db/ (code) and http://sourceforge.net/projects/ml2mech/ (data and results).
L011 - CD4+ T-cell Epitope Prediction Using Antigen Structure
Short Abstract: The major histocompatibility complex (MHC) molecules play a critical role in initiating immune response because they present antigen peptides on the cell surface for recognition by T-cells. Given an antigen protein sequence along with an MHC allele, a classic computational problem is to predict which, if any, peptides in the antigen will be immunodominant (for CD8+ or CD4+ T-cells). Accurate computational prediction of the peptides that confer an immune response would be a significant achievement because, for example, we would then be able to screen pathogen proteomes for antigens that are suitable for use in vaccines. Traditionally, immunodominance has been equated with MHC binding affinity. While this approach has been fruitful for CD8+ epitope dominance prediction, it is much less effective in predicting CD4+ epitope dominance. We believe that CD4+ T-cell epitope dominance has been difficult to predict because the MHC antigen-presenting protein is less selective, and because during processing, proteases act on mostly natively folded antigens whose 3D structure directs proteolysis to disordered regions. Thus, potential MHC-binding sequences in flexible segments of the antigen are destroyed, and sequences in the stable segments are preferentially loaded and presented to CD4+ T-cells. Given the structure of an antigen, we have developed a simple methodology that takes conformational stability criteria into account when predicting peptide immunodominance. Our preliminary validation on ground truth epitope maps in mice shows that this approach has the potential to reduce the false positive rate of CD4+ T-cell epitope dominance prediction.
L012 - Towards structural elucidation of transmembrane protein Bilitranslocase
Short Abstract: In our work, we focus on the transmembrane organic anion transporter protein bilitranslocase. The most studied function of bilitranslocase is transport of bilirubin from blood to liver cells. However, the protein is also found in other tissues and is involved in the uptake of several other ligands. Consequently, bilitranslocase is a potential drug target. The primary structure of bilitranslocase has been available for some time, but it shows no sequence similarity with any other protein. Hence, we cannot utilize standard homology modeling and threading approaches. As the first step towards structural elucidation of the protein, we have predicted the four alpha helical transmembrane regions using a novel chemometric approach. The prediction is corroborated by experimental evidence. We have analyzed the stability of the predicted transmembrane helices in a standard DPPC membrane using molecular dynamics simulation for 20 ns. The possible assembly of the four transmembrane regions is suggested by a Monte Carlo approach. Analysis of the best conformations indicates two possible types of assembly. Further, the structures of the second and the third transmembrane domains (TM2 and TM3) are confirmed by NMR experiments performed in an SDS micelle environment. Both transmembrane regions contain proline kinks at equivalent positions in the membrane. Their structures indicate that the domains TM2 and TM3 are involved in forming the transporting channel and hint at an allosteric nature of the protein. Our future work is focused on NMR studies of the transmembrane domains and the interactions between them, with the aim of deducing the possible functional mechanism.
L013 - A recognition model of ACP-HCS interaction for programmed beta-branching in type I polyketide synthases
Short Abstract: Polyketide synthases (PKSs) are enzyme complexes that synthesise a wide range of natural products of medicinal interest, notably a large number of antibiotics. Type I polyketide synthases can introduce beta-carbon branches into a growing polyketide chain via enzymes encoded by the “HMG-CoA synthase (HCS) cassette”. One of the first polyketide biosynthesis clusters in which the HCS cassette was discovered is responsible for the synthesis of mupirocin by Pseudomonas fluorescens. Mupirocin is a clinically important antibiotic that is effective against certain Gram-positive bacteria, including methicillin-resistant Staphylococcus aureus (MRSA), and is used to treat bacterial skin infections. MupH is the HMG-CoA synthase homologue responsible for β-branching in the mupirocin synthesis pathway. To better understand what allows the HCS cassette to recognise β-branch-associated acyl carrier proteins (ACPs) of the mupirocin synthesis pathway, computer modelling was used to explore the interaction of the ACPs with MupH and homologues. Hidden Markov models (HMMs) were used to classify ACPs as branching or non-branching. HMM analysis highlighted essential features for an ACP to behave like a branching ACP. We computationally docked a homology model of MupH with the NMR structure of each of the ACPs mupA3a and mupA3b. The docking results were supported by the evolutionary trace data, which represent the conservation of amino acids within phylogenetic clades, and by the physical properties of the interface residues. The results identified key residues and structural features critical for the recognition specificity of the branching ACPs, which were further validated in laboratory experiments.
L014 - T-PioDock: A tool for scoring docking conformations using predicted protein interfaces
Short Abstract: Since proteins function by interacting with other molecules, analysis of protein-protein interactions is essential for comprehending biological processes. Whereas understanding of atomic interactions within a complex is especially useful for drug design, limitations of experimental techniques have restricted their practical use. Despite progress in docking predictions, there is still room for improvement. We propose T-PioDock, a tool for prediction of a protein complex 3D structure from the structures of its components. T-PioDock supports the identification of near-native conformations among the 3D models produced by docking software by scoring those models using binding interfaces predicted by T-PIP.
First, exhaustive evaluation of interface predictors demonstrates that T-PIP, whose predictions are customised to target complexity, performs best. Second, comparative study between T-PioDock and other state-of-the-art scoring methods establishes T-PioDock as the best performing tool. Moreover, there is good correlation between T-PioDock performance and quality of docking models, which suggests that progress in docking will lead to even better results at recognising near-native conformations. T-PioDock is freely available for download at: http://manorey.net/bioinformatics/wepip/.
L015 - The Mechanism of Endocrine Resistance of Estrogen Receptor (ER): Structural Modeling of Mutation in Metastatic Breast Cancer Patients
Short Abstract: Resistance to endocrine therapy occurs in most patients with ER-positive metastatic breast cancer (MBC) and is attributed to various mechanisms such as loss of ER expression or altered activity of co-regulators. To our knowledge, acquired mutations of the ER have not been described as mediating resistance to endocrine treatment.
ER is a ligand-modulated transcription factor that is a target for breast cancer therapy. Several available crystal structures of ER with agonists (such as estrogen) and antagonists (such as tamoxifen) demonstrate the effect of ligand binding on the protein conformation and, hence, on its interactions with the co-activator SRC-1 in order to bind the DNA.
In this study, a novel mutation (D538G) was identified in MBC patients. Experiments in cell lines indicated constitutive ligand-independent transcriptional activity of the mutated receptor. It may enhance ligand-independent interaction with SRC-1. D538 is positioned within helix 12 of the ligand binding domain. Analysis of the wild type ER crystal structure indicates the importance of helix 12 in mediating the interaction of the receptor with co-activator SRC-1, and the effects of estrogen and tamoxifen on helix 12 conformations. In order to study the effects of the D538G substitution, a structural model was generated. In silico molecular docking of estrogen and tamoxifen was performed on both the WT and the D538G mutant.
The model suggests that D538G may cause a conformational change that mimics the conformation of activated receptor and interferes with binding of either estrogen or tamoxifen. This study may result in development of a new treatment against endocrine resistance in MBC.
L016 - Drug Promiscuity in PDB: Protein Binding Site Similarity is Key
Short Abstract: There is a long-standing debate on the reasons for drug promiscuity (polypharmacology), a drug's ability to bind to several targets. Based on large compound screens, hydrophobicity and molecular weight have been suggested as key reasons. However, the results are sometimes contradictory and leave room for further analysis.

Protein structures offer a structural dimension to explain promiscuity: Can a drug bind multiple targets because the drug is flexible or because the targets are structurally similar or even share similar binding sites?

We present a systematic study of drug promiscuity based on structural data of PDB target proteins with a set of 164 promiscuous drugs. We show that there is no correlation between the degree of promiscuity and ligand properties such as hydrophobicity or molecular weight but a weak correlation to conformational flexibility. However, we do find a correlation between promiscuity and structural similarity as well as binding site similarity of protein targets. In particular, 71% of the drugs have at least two targets with similar binding sites. In order to overcome issues in detection of remotely similar binding sites, we employed a score for binding site similarity: LigandRMSD measures the similarity of the aligned ligands and can be applied to arbitrary structural binding site alignments.

Our findings suggest that global structural and binding site similarity play a more important role to explain the observed drug promiscuity in the PDB than physicochemical drug properties like hydrophobicity or molecular weight. Additionally, we find ligand flexibility to have a minor influence.
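
The idea behind a ligand-based similarity score can be sketched as follows. This is a minimal illustration and not the published LigandRMSD implementation: it assumes that both binding sites hold the same ligand with atoms listed in the same order, and that the binding site alignment is supplied as a rotation matrix and a translation vector. All function and parameter names are hypothetical.

```python
import math

def transform(coords, rotation, translation):
    """Apply a 3x3 rotation matrix and a translation vector to each atom."""
    return [
        tuple(sum(rotation[i][j] * atom[j] for j in range(3)) + translation[i]
              for i in range(3))
        for atom in coords
    ]

def ligand_rmsd(ligand_a, ligand_b, rotation, translation):
    """RMSD between two ligand poses after superposing site B onto site A.

    rotation and translation come from any structural binding site
    alignment; both ligands must list the same atoms in the same order.
    """
    if len(ligand_a) != len(ligand_b) or not ligand_a:
        raise ValueError("ligands need the same, non-zero number of atoms")
    moved = transform(ligand_b, rotation, translation)
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(ligand_a, moved))
    return math.sqrt(squared / len(ligand_a))

# Toy two-atom ligand; a quarter turn about z maps pose B onto pose A.
pose_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose_b = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
quarter_turn = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
no_shift = (0.0, 0.0, 0.0)

good_alignment = ligand_rmsd(pose_a, pose_b, quarter_turn, no_shift)
poor_alignment = ligand_rmsd(pose_a, pose_b, identity, no_shift)
```

A low value means that the site alignment also brings the bound ligands into agreement, which is what makes the score usable with any alignment method.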
L017 - Detecting repetitions and periodicities in proteins by tiling the structural space
Short Abstract: Background:
Energy Landscape Theory indicates that it is much easier to find sequences that satisfy the “Principle of Minimal Frustration” when the folded structure is symmetric. Repeats and structural mosaics may be fundamentally related to landscapes with multiple embedded funnels. Existence of repetitions does not guarantee that the system will be symmetric as these should arrange in particular ways and coalesce into higher order patterns. Detecting repeated units and patterns is a first step towards an understanding of their assembly in complete structures and the emergence of symmetry.

Description:
We present analytical tools to detect and compare structural repetitions in protein molecules. By an exhaustive analysis of the distribution of structural repeats using a robust metric we define those portions of a protein molecule that best describe the overall structure as tessellation of basic units. Patterns produced by such tessellations provide intuitive representations of the repeating regions and their association towards higher order arrangements.

Conclusion:
We find that some protein architectures can be described as nearly periodic, while in others clear separations between repetitions exist. Since the method is independent of amino acid sequence information we can identify structural units that can be encoded with different primary elements. Our methods can be applied to various topological families and resolve fine geometrical differences. Moreover, we define a metric that allows for a crude comparison of the symmetrical dispositions of repetitions between proteins of different size, topology and quaternary arrangement on the same grounds.
L018 - High precision alignment of cryo-electron subtomograms through gradient-based parallel optimization
Short Abstract: Cryo-electron tomography is emerging as an important component of structural systems biology. It not only allows the structural characterization of macromolecular complexes, but also the detection of their cellular localizations in near living conditions. However, the method is hampered by low resolution, missing data and low signal-to-noise ratio (SNR). To overcome some of these difficulties and enhance the nominal resolution one can align and average a large set of subtomograms. Existing methods for obtaining the optimal alignments are mostly based on an exhaustive scanning of all but discrete relative rigid transformations (i.e. rotations and translations) of one subtomogram with respect to the other.

We propose gradient-guided alignment methods based on two popular subtomogram similarity measures, a real-space score and a Fourier-space constrained score. We also propose a stochastic parallel refinement method that significantly increases the efficiency of the simultaneous refinement of a set of alignment candidates. We estimate that our stochastic parallel refinement is on average about 20 to 40 times faster than the standard independent refinement approach. Results on simulated data of model complexes and experimental structures of protein complexes show that even for highly distorted subtomograms and with only a small number of very sparsely distributed initial alignment seeds, our combined methods can accurately recover true transformations with a substantially higher precision than the scanning-based alignment methods.

Our methods significantly increase the efficiency and accuracy of subtomogram alignments, which is a key factor for the systematic classification of macromolecular complexes in cryo-electron tomograms of whole cells.
L019 - Parallel Maximum Clique Algorithm for Structural Comparisons in Bioinformatics
Short Abstract: Computing the structural similarity between proteins or chemical compounds is a crucial task in a variety of bioinformatics approaches, from function prediction to drug development, and many such structural comparisons can be modeled as a problem of finding a maximum clique in alignment graphs. We propose a new exact parallel maximum clique algorithm based on a previous leading serial algorithm for finding a maximum clique. We compare our parallel algorithm with state-of-the-art maximum clique finders on standard benchmarks as well as on constructed alignment graphs typically used in structural comparisons in bioinformatics. We show that our parallel algorithm outperforms other algorithms on most graphs. Exploiting the multiple cores available in a majority of current computers, as enabled by our maximum clique algorithm, may be a useful strategy to speed up structural comparisons of proteins or chemical compounds.
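
For readers unfamiliar with the underlying problem, the following is a minimal serial branch-and-bound sketch of exact maximum clique search. It is neither the parallel algorithm proposed in the poster nor the serial algorithm it builds on; the function name and the toy graph are assumptions for illustration. In an alignment graph, each vertex would stand for a pair of matched elements and each edge for a geometrically compatible pair of matches.

```python
def maximum_clique(adjacency):
    """Return one maximum clique of an undirected graph without self-loops.

    adjacency: dict mapping each vertex to the set of its neighbours.
    """
    best = []

    def expand(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        # Try high-degree vertices first so large cliques are found early.
        order = sorted(candidates, key=lambda u: len(adjacency[u]), reverse=True)
        for v in order:
            # Bound: even taking every remaining candidate cannot beat best.
            if len(clique) + len(candidates) <= len(best):
                return
            clique.append(v)
            expand(clique, candidates & adjacency[v])
            clique.pop()
            candidates = candidates - {v}

    expand([], set(adjacency))
    return best

# Toy graph: a triangle a-b-c with a tail c-d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
clique = maximum_clique(graph)
```

The branches explored from different starting vertices are largely independent, which is the property that parallel variants exploit.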
L020 - Unraveling the structural effects of phosphorylation on the human IκBα/NF-κB complex
Short Abstract: Phosphorylation, a ubiquitous post-translational modification mechanism, is the addition of a phosphate group by a protein kinase to Ser, Thr, and Tyr sidechains. The preponderant -2 charge carried by the phosphate leads to significant electrostatic perturbations which in turn modulate the energy landscape that controls protein recognition and binding. Phosphorylation may induce structural and dynamical modifications either in the vicinity of the phosphorylation site or far from it.
To investigate the effects of this particular post-translational modification, we have used molecular dynamics (MD) simulations to study the phosphorylation of IκBα and to highlight its consequences. IκBα is the inhibitor of the multifunctional transcription factor NF-κB, which is clinically significant because of its association with immune and inflammatory responses. IκBα exerts its effect by masking the nuclear localization signal of NF-κB. The trigger for the dissociation of the IκBα/NF-κB complex and subsequent activation of NF-κB lies firstly in the phosphorylation and secondly in the proteolytic destruction of IκBα.
IκBα phosphorylation, at residues Ser32 and Ser36, leads to recognition of the DSGφXS destruction motif (φ representing a hydrophobic and X any amino acid) in IκBα by βTrCP, the substrate binding subunit of an SCF-type E3 ligase. Phosphorylation is deemed to be a prerequisite for βTrCP binding, thereby linking phosphorylation-mediated signaling to protein ubiquitylation and destruction of IκBα. We have studied in detail the unphosphorylated and phosphorylated (mono- and diphosphorylated) forms of IκBα in complex with NF-κB and observed minor but significant structural changes in the phosphorylated IκBα.
L022 - On the track of TRPA1 gating mechanisms
Short Abstract: The transient receptor potential ankyrin 1 (TRPA1) channel is a non-selective cation channel which is gated by various stimuli, e.g. cooling, depolarizing voltages and diverse chemical compounds, but details of its gating mechanisms are still unclear. The overall structure of the TRPA1 channel is clearly similar to that of the voltage-gated potassium (Kv) channels, but direct structural information is limited to an electron density map with a resolution of 16 Å (Cvetkov et al., JBC, 2011).

To shed some light on the gating mechanisms of the TRPA1 channel, we created a homology model of its transmembrane domain and subjected it to the Molecular Dynamics Flexible Fitting procedure, which fits an initial model into the known electron density map. The transmembrane domain of TRPA1 was then embedded into a phospholipid bilayer. Subsequent MD simulations pointed to residues which could be important for the TRPA1 gating mechanism. To confirm these predictions, the effects of specific mutations on channel activity were functionally assayed using whole-cell electrophysiological recordings.
L023 - Ant Colony Optimization Algorithm for Biological Hierarchical Multi-Label Classification problems
Short Abstract: The classification of proteins expressed by an organism is an important step in understanding the molecular biology of that organism. Because a protein can perform more than one function and many protein functional-definition schemes are organized in a hierarchical structure, protein classification is, in the context of machine learning, an instance of a Hierarchical Multi-Label Classification (HMC) problem.

In HMC, each instance can be classified into two or more classes simultaneously, differently from conventional classification. Additionally, the classes are structured in a hierarchy, in the form of either a tree or a directed acyclic graph. Hence, an instance can be assigned to two or more paths from the hierarchical structure.
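
The hierarchical constraint can be made concrete with a small sketch: an instance labelled with a class is implicitly labelled with all ancestors of that class. The helper name and the toy hierarchy below are hypothetical and serve only as an illustration.

```python
def with_ancestors(labels, parents):
    """Close a label set under the hierarchy by adding every ancestor class.

    parents: dict mapping each class to the set of its direct parents
    (one parent per class gives a tree, several parents give a DAG).
    """
    closed = set()
    stack = list(labels)
    while stack:
        label = stack.pop()
        if label not in closed:
            closed.add(label)
            stack.extend(parents.get(label, ()))
    return closed

# Toy DAG: "ATP-dependent kinase" has two parents, so it lies on two paths.
toy_parents = {
    "binding": {"root"},
    "catalysis": {"root"},
    "ATP binding": {"binding"},
    "kinase": {"catalysis"},
    "ATP-dependent kinase": {"kinase", "ATP binding"},
}
full_labels = with_ancestors({"ATP-dependent kinase"}, toy_parents)
```

A single leaf label here expands to classes on two different paths to the root, which is the situation that distinguishes HMC from flat multi-label classification.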

Bio-inspired algorithms have undergone significant development in different fields of application. Ant Colony Optimization (ACO) is a recently proposed metaheuristic approach for solving hard combinatorial optimization problems, inspired by the pheromone trail laying and following behavior of real ants.

We can distinguish six Ant-based algorithms: Ant System, Ant-Miner (Ant Colony-based Data Miner), MuLAM (Multi-label Ant-Miner), cAnt-Miner (Ant-Miner coping with continuous attributes), hAnt-Miner (Hierarchical Classification Ant-Miner), and hmAnt-Miner (Hierarchical Multi-label Classification Ant-Miner).

With the aim of systematizing the aspects involved in the design of bio-inspired algorithms capable of handling Multi-Label Classification problems, we analyzed the main algorithmic aspects of their adaptation to classification tasks of growing complexity, from simple classification to HMC. We also analyzed the performance metrics used in the different algorithms. Finally, we proposed a development of the ACO algorithm to improve classification performance in a biological HMC problem.
TOP
L024 - Generation and selection of VH-VL orientation templates for antibody structure prediction
Short Abstract: Antibodies are a class of proteins which play an important role in facilitating immune responses in vertebrates. Although the general structure of antibodies is well characterised, producing high-resolution models of the antigen binding site remains a challenging problem. The site is formed between the two variable domains, VH and VL, of the antibody's antigen binding fragment.
Its topology is therefore affected by how the VH and VL domains are oriented with respect to one another. Consequently, understanding and predicting the VH-VL orientation is important for antibody modelling, docking and engineering as well as for studying the mechanisms of antigen specificity and affinity.

Different VH-VL orientations have previously been described using relative measures such as RMSD. We have developed a method to determine the VH-VL orientation in an absolute sense and implemented it in the computational tool ABangle. This allows us to calculate differences in orientation between structures in a consistent fashion and determine how they compare to other structures globally. We have used this information to investigate how better knowledge about the VH-VL orientation can improve the generation and selection of templates for prediction of antibody structure using comparative modelling. This work focusses on modelling antibody structure but may also be informative for more general cases of multi-domain protein modelling.
TOP
L025 - Fold Space Preferences of New-born and Ancient Protein Superfamilies
Short Abstract: The evolution of proteins, the basic unit of biological functions, is the single process that has delivered the diversity and complexity of life that we see around ourselves today. While we tend to define protein evolution in terms of sequence level mutations, insertions and deletions, it is hard to translate these processes to a more complete picture incorporating a polypeptide's structure and function. In contrast to their rapidly changing sequences, protein structures are largely conserved and provide a deeper phylogenetic signal, as well as a more direct relationship to the function of the macromolecule. As such, they are arguably an appropriate unit for any consideration of the evolution of proteins. Unfortunately, while the theory of sequence changes is relatively well developed, structural evolution remains much less comprehensively determined. By considering how protein structures change over time we can gain an entirely new appreciation of their long-term evolutionary dynamics.

Here we seek to identify how proteins at different stages of evolution explore their possible structure space. We use an annotation of superfamily age on this space and explore the relationships between these ages and a superfamily's position within the landscape of different folds. In doing so we aim to assist in an understanding of the dynamics driving evolutionary transitions within structure space.
TOP
L026 - DynaMine: Sequence-based Protein Backbone Dynamics and Disorder Prediction
Short Abstract: Protein dynamics are closely related to protein function, especially in disordered proteins. Here we introduce DynaMine, a fast linear predictor of protein backbone dynamics from protein sequence only. DynaMine is based on statistical information derived from experimental data recorded on proteins in solution (NMR chemical shifts): this enabled us to obtain dynamics data on a continuous scale for over 200,000 amino acids. The approach bypasses the use of three-dimensional protein structure coordinates and avoids a binary order/disorder interpretation of amino acid residue behaviour. The underlying physical meaning is clear from the per-amino acid dynamics propensities, which directly correspond to known qualitative information. With regard to identifying protein disorder, DynaMine compares to the best and most sophisticated protein disorder predictors; crucially, however, it predicts dynamics, not disorder, and shows great potential in distinguishing globular domains from regions that fold upon binding, as well as being able to identify pre-structured motifs.
TOP
L027 - Molecular docking between the RNA polymerase of the Moniliophthora perniciosa mitochondrial plasmid and Rifampicin produces a highly stable complex
Short Abstract: Moniliophthora perniciosa (Stahel) Aime & Phillips-Mora is the causal agent of witches' broom disease (WBD) in cacao (Theobroma cacao). When the mitochondrial genome of this fungus was completely sequenced, an integrated linear-type plasmid that encodes viral-like RNA polymerases was found. The structure of this polymerase was previously constructed using a homology modeling approach. Using a virtual screening process that accessed the KEGG, PubChem and ZINC databases, we selected the eight most probable macrocyclic polymerase inhibitors to test against M. perniciosa RNA polymerase (RPO). AutoDock Vina was used to perform docking calculations for each molecule. This software returned affinity energy values for several ligand conformations. Subsequently, we used PyMOL 1.4 and LigandScout 3.1 to check the stereochemistry of chiral carbons, substructure, superstructure, number of rotatable bonds, number of rings, number of donor groups, and hydrogen bond acceptors. On the basis of this evidence we selected Rifampicin, a bacterial RNA polymerase inhibitor, and then used AMBER 12 to simulate the behavior of the RPO-Rifampicin complex over 5000 ps at temperatures up to 300 K in water. This calculation returned a graph of potential energy against simulation time and showed that the ligand remained inside the active site after the simulation was complete, with an average energy of -15 x 10^2 kcal/mol. The results indicate that Rifampicin could be a good inhibitor for testing in vitro and in vivo against M. perniciosa.
TOP
L028 - Sequence analysis and discrimination of subcellular localization of type II membrane proteins
Short Abstract: To carry out their functions, bio-synthesized proteins must be transported to particular organelles. Transport signals, including signal peptides, contain information about the subcellular localization of proteins and are located in amino acid sequences. Membrane proteins are thought to be localized by processes different from those governed by the signal sequences of soluble proteins. In addition, the lipid bilayers of each organelle consist of different kinds and ratios of lipid molecules, and have individual characteristics. Therefore, to find the characteristics of the transport signals, sequences around the hydrophobic regions of membrane proteins were analyzed, and a computational discrimination method was developed in this study.
Data for type II membrane proteins, single-pass membrane-localized proteins which have one hydrophobic region, were extracted from UniProt Knowledge Base/Swiss-Prot Release 2011_11. The hydropathy profile of each protein was estimated by average hydropathy calculation. As a result, the most hydrophobic position within the first 100 amino acid residues from the N-terminus of each protein fell within the region annotated as a transmembrane helix. The sequences were aligned at the most hydrophobic positions. On the N-terminal side of these hydrophobic regions, hydropathy profiles and position-specific amino acid propensities differed among the sequences of each organelle. Therefore, it is thought to be possible to predict the subcellular localization of membrane proteins through the characterization and optimization of hydropathy profiles and amino acid compositions of the regions around the N-terminal side of the hydrophobic regions in each organelle dataset.
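The hydropathy step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Kyte-Doolittle scale, the 15-residue window and the function names are assumptions, since the abstract names neither the scale nor the window size.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=15):
    """Return the mean hydropathy of every full window, indexed by window start."""
    values = [KD[aa] for aa in seq]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


def most_hydrophobic_position(seq, window=15, n_term_limit=100):
    """Return the 0-based centre of the most hydrophobic window
    within the first n_term_limit residues, or None if the sequence is too short."""
    profile = hydropathy_profile(seq[:n_term_limit], window)
    if not profile:
        return None
    start = max(range(len(profile)), key=profile.__getitem__)
    return start + window // 2
```

Sequences could then be aligned by shifting each one so that the returned positions coincide, which is one way to read "aligned at the most hydrophobic positions".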
TOP
L029 - Sequence, structure and functional diversity of PD-(D/E)XK phosphodiesterase superfamily
Short Abstract: Proteins belonging to the PD-(D/E)XK phosphodiesterases constitute a functionally diverse superfamily with representatives involved in replication, restriction, DNA repair and tRNA-intron splicing. To date there have been several attempts to identify and classify new PD-(D/E)XK phosphodiesterases using remote homology detection methods. Such efforts are complicated, because the superfamily exhibits extreme sequence and structural divergence. Using our highly sensitive, transitive homology detection approach [1], supported with superfamily-wide domain architecture and horizontal gene transfer analyses, we performed a comprehensive reclassification of proteins containing a PD-(D/E)XK domain [2]. The PD-(D/E)XK phosphodiesterases span over 21 900 proteins, which can be classified into 121 different groups of various families. Eleven of them, including DUF4420, DUF3883, DUF4263, COG5482, COG1395, Tsp45I, HaeII, Eco47II, ScaI, HpaII and Replic_Relax, are newly assigned to the PD-(D/E)XK superfamily. Some groups of PD-(D/E)XK proteins are present in all domains of life, whereas others occur within small numbers of organisms. We observed multiple horizontal gene transfers even between human pathogenic bacteria or from Prokaryota to Eukaryota. Uncommon domain arrangements greatly elaborate the PD-(D/E)XK world. These include domain architectures suggesting regulatory roles in Eukaryotes, like stress sensing and cell cycle regulation. Our results may inspire further experimental studies aimed at identification of exact biological functions and specific substrates of these highly diverse proteins.

[1] Ginalski K, von Grotthuss M, Grishin NV, Rychlewski L (2004) Detecting distant homology with Meta-BASIC. NAR 32, W576-81.
[2] Steczkiewicz K, Muszewska A, Knizewski L, Rychlewski L, Ginalski K (2012) Sequence, structure and functional diversity of PD-(D/E)XK phosphodiesterase superfamily. NAR 40(15):7016-45.
TOP
L030 - Correlations between protein secondary structure and glycosylation
Short Abstract: In recent years, carbohydrate chains have been considered as “the third chains” in life science. Glycobiology research has expanded worldwide through the development of comprehensive analysis and synthesis techniques for chemical structures, the discovery of genes for glycosylation-related proteins, and the functional analysis of carbohydrate chains. Bio-synthesized proteins are transported to specific organelles, where they receive post-translational modifications (PTMs), and the modified proteins express various functions. In glycosylation, one of the PTMs, glycosyltransferases are known to recognize specific motif sequences (N-linked glycan: Asn-X-Thr/Ser, O-linked glycan: Thr/Ser). Many glycoproteins and glycolipids are related to intra-cellular networks, because they control signal transduction systems and enzyme activities through cytoskeleton formation and ligand structural change, respectively. Differences in sugar types and modified positions, which bring about variations in protein structure and function, are caused by the selectivity of glycosyltransferases.
In this study, to find correlations between glycosylation and secondary structure, the three-dimensional coordinate data of atoms in amino acids around the glycosylation sites were extracted from the Protein Data Bank. The data were classified by sugar type. Propensities of relative side chain accessibility and secondary structure were calculated from the three-dimensional coordinates. As a result, the low degree of exposure suggested that the modification sites tend to be buried. Differences in secondary structural properties between N- and O-glycans were found in amino acid sequences modified by N-acetylglucosamine (GlcNAc). Correlations between glycosylation and secondary structure formation of proteins are discussed based on those properties.
TOP
L031 - MetaDock: enzyme substrate identification using protein-ligand docking
Short Abstract: The rapid growth in the number of known proteins without experimentally determined or assigned biological function makes molecular docking a valuable in silico tool for enzyme function identification. However, the benchmarking studies published so far have shown that searching for native substrates in different enzyme datasets is still a challenging task. In our work, we performed an extensive cross-docking experiment on a dataset comprising non-redundant Escherichia coli enzymes that metabolize different types of molecules. We analyzed the ability of individual programs to predict each enzyme's cognate ligand and also applied the VoteDock consensus approach to improve the quality of the results. Our data clearly show that all tested programs achieved reasonable accuracy in recognizing an enzyme's native substrate, product or cofactor. In fact, for more than 60% of the proteins in our data set, the programs can identify the native molecule in the top 10% of the set of docked molecules. Yet applying our consensus approach to this problem is still a challenge, as we did not obtain the expected and sufficient boost in the quality of docking results that was reported previously.
TOP
L032 - Molecular Modeling and Binding Site Identification of Heat Shock Protein HSP27 to understand Drug Mode of Action
Short Abstract: The expression of the small heat shock protein HSP27 is induced under adverse cell conditions. It plays a vital role in necrosis and apoptosis, being associated with increased tumorigenicity in cancer cells and thus forming an obstacle to chemotherapy. In an earlier study, we reported that Brivudine (BVDU) improves chemotherapy in pancreatic cancer. In the present study, we performed affinity chromatography analyses, which revealed that BVDU binds to HSP27. To get a deeper understanding of this interaction at the structural level, we developed a homology model of HSP27. To confirm the model’s quality and structural stability, we performed extensive molecular dynamics simulations. The three deepest pockets with the best conservation of residues were predicted using LIGSITEcsc. These sites were studied manually and compared to the amino acid composition of the known BVDU binding site in a viral thymidine kinase. The BVDU binding site of the kinase has two phenylalanine residues binding BVDU via a pi-pi interaction. We observed a similar binding site in the HSP27 homology model (Phe29 and Phe33) and verified this binding site experimentally by introducing point mutations at positions 29 and 33. The mutant did not bind BVDU, in contrast to the wild type HSP27. Moreover, BVDU sensitized cells to heat shock and reduced tumor invasion in vitro.
TOP
L033 - Statistical potential for assessment of membrane protein structures
Short Abstract: Membrane proteins represent 25% of all human proteins and nearly 50% of them are drug targets. Knowing the structure of a membrane protein is helpful to characterize its function and mechanism at the molecular level. Despite major advances in solving structures experimentally, most membrane protein structures remain unknown. This lack of available structures, along with the physical constraints imposed by the anisotropic environment of the lipid bilayer, makes membrane protein modelling difficult. Assessing the quality of membrane protein models is therefore critical.
In this study, we have developed a knowledge-based scoring function to distinguish between native, or near-native, structures and non-native ones, using a non-redundant set of 130 membrane proteins sharing no more than 30% sequence identity. This distance-dependent statistical potential is specific to the location in the lipid bilayer of each interacting residue. Deriving an accurate statistical potential from such a small data set is challenging. To overcome that difficulty, we have based the construction of our potential on a kernel density estimation of distance distributions. By a leave-one-out cross-validation procedure, we show that our potential outperforms a potential optimized on globular proteins (DOPE) in discriminating native membrane protein structures from random sequence decoys. Our scoring function is also more efficient than DOPE in separating accurate membrane protein models from inaccurate ones. These results suggest that our potential will be useful for the modeling of membrane protein structure.
TOP
L034 - The nAnnoLyze program for target identification. Application to the TB-TCAMS dataset from GSK
Short Abstract: Tuberculosis is a disease spread worldwide that results in tens of millions of newly infected individuals every year. Moreover, the appearance of resistance to currently available antibiotic treatments, together with co-infection with the HIV virus, makes this disease very difficult to treat and a source of future epidemic catastrophes (predominantly in sub-Saharan areas of Africa). Nevertheless, tuberculosis has remained neglected for many decades and big pharmaceutical companies have dedicated limited resources to the discovery of new treatments against it. The recent wave of openness is changing this trend. Recently, GlaxoSmithKline embarked on a large-scale screening of the approximately two million chemical compounds in their library for anti-mycobacterial phenotypes. This screening has resulted in the release of 776 compounds (called the TB-TCAMS), including 177 validated non-cytotoxic H37Rv potent hits, with clear activity against tuberculosis [1]. We will introduce the nAnnoLyze program, a network-based improvement of the original AnnoLyze program [2], and its application to identify the likely targets in Mycobacterium tuberculosis for the 776 compounds from TB-TCAMS.

1. Ballell L, Bates RH, Young RJ, Alvarez-Gomez D, Alvarez-Ruiz E, et al. (2013) Fueling Open-Source Drug Discovery: 177 Small-Molecule Leads against Tuberculosis. ChemMedChem.

2. Marti-Renom MA, Rossi A, Al-Shahrour F, Davis FP, Pieper U, et al. (2007) The AnnoLite and AnnoLyze programs for comparative annotation of protein structures. BMC Bioinformatics 8 Suppl 4: S4.
TOP
L035 - Genetic Network and Structural Characterization of Anti-apoptotic Proteins in Cell Death Pathways
Short Abstract: This poster is based on Proceedings Submission xxx
The rapid development of second-generation sequencing technology has enabled the description and characterization of the whole genomes of a wide range of species. However, despite the rapid growth in the number of experimentally determined protein structures in the PDB repository, many membrane proteins have not yet been described at the level of their three-dimensional structures. Although several basic predictive tools are available, these have limited power and application. In contrast, the use of more advanced de novo computational methods such as Robetta, together with homology modelling, can provide immediate insights into uncharacterised gene products. Cellular genetic network interactions and protein-protein interactions can provide much information on a protein’s functional niche in the dynamic cellular system. This is an essential component in the drug discovery process and has special application in the field of anti-cancer therapeutic agents. In this poster, we describe the characterization of novel protein folds, three-dimensional structures, and network interactions of anti-apoptotic proteins and their roles in the regulation of cell death pathways.
TOP
L036 - Derivation of rules for comparative RNA modeling from a database of RNA structure alignments
Short Abstract: New RNA structure prediction tools are needed to rapidly obtain detailed structural information for new RNA sequences. Here we propose to use knowledge-based statistical potentials and a fragment-based modeling approach as input to predict RNA structure from sequence. As an initial step, we downloaded a dataset composed of all X-ray determined RNA 3D structures from the Protein Data Bank [1]. From the initial 1,792 files, 1,538 non-redundant RNA structures were selected after filtering out short sequences (<20 nucleotides). The CD-HIT program [2] was used on those sequences to derive 296 sequence families (95% sequence identity) after removing sequences with a low number of canonical base-pairs (<10%) and low-resolution PDBs (>3 Å). The SARA method was then used to generate an all-against-all alignment of a representative of each RNA family from our dataset. Finally, the resulting alignments were used as input to the ModDom program to derive a set of conserved RNA fragments. The alignments were also used to derive a set of structural properties describing RNA structure. These properties will be further analyzed for conservation against the fragment dataset, which will result in a series of knowledge-based statistical potentials from this dataset.

References:
1. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, et al. (2000) The Protein Data Bank. Nucleic Acids Res 28: 235-242.
2. Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22: 1658-1659.
TOP
L037 - Exploiting the protein databank to propose additional conformations of a query protein of known structure
Short Abstract: Proteins often alternate between several conformations, e.g., active and inactive states of receptors, open and closed states of channels, etc. However, in many cases only one conformation is known. The detection of more (biologically relevant) conformations of a protein would provide more insight into its function in health and disease. In this work, we model putative conformation(s) of a query protein with one known conformation by assuming that pairs of structurally similar proteins may also share similar conformational changes. We propose a three-step automated method. First, the protein databank is searched for proteins structurally similar to the query. Second, pairwise structural alignments are built between the query protein and each of the structurally similar proteins. Third, other known conformations of the latter proteins are identified. By using the alignments found in the second step, and modeling on the structural templates found in the third step, the method suggests new conformations for the query protein. To demonstrate the approach we studied the Epidermal Growth Factor Receptor (EGFR) kinase domain, the structure of which is known in both the active and inactive conformations. Using the EGFR active conformation as our query, we reproduced the inactive conformation with root mean square deviation (RMSD) under 1.35Å, based on the structural similarity to the active conformation of C-SRC tyrosine-kinase and the known inactive conformation of this protein. The sequence identity between the two kinase domains is only 37% and the fact that they share similar active and inactive conformations might not be obvious.
TOP
L038 - A new web server for structural analysis and comparative modeling of protein interactions
Short Abstract: Easy-to-use computational tools for structural analysis of protein-protein interactions would be valuable for many researchers. As the number of known experimental structures involving protein interactions increases, comparative modeling of the three-dimensional (3D) structures of protein complexes is expected to be applicable in many cases. Therefore, we decided to develop a new method for the analysis of protein complexes and prediction of their 3D structures. First, we developed a database of protein complexes based on the Protein Data Bank (PDB) Biological Assembly data. These data were preprocessed to facilitate further analysis. Where possible, protein structures were parsed into structural domains. All interacting domain pairs were then detected and their interaction interfaces were analyzed by means of Voronoi tessellation of protein structures. This procedure allows enumeration and detailed analysis of contacts between amino acids of different domains. In order to reduce the redundancy of PDB data, clustering of the interaction interfaces was performed utilizing CAD-score (Contact Area Difference score) and the fraction of common contacts. Both scores are intuitive and biochemically interpretable interface similarity measures. Having the preprocessed structural data, we developed a web server for the analysis and comparative modeling of protein complexes. The server takes sequences of two interacting proteins as a query and searches for homologous interactions in the database. If found, a 3D model of a complex can be built using a template-based approach. This straightforward and user-friendly web-based software should ease the analysis of protein complexes for biochemists and molecular biologists who are not familiar with sophisticated computational techniques.
TOP
L039 - The use of interatomic contact areas for the assessment of RNA 3D structural models
Short Abstract: The growing interest in RNA has also led to increased activity in the development of computational methods for the prediction of RNA three-dimensional structure. A critically important component in the development and benchmarking of such methods is the ability to objectively evaluate RNA models against the experimentally determined reference structure. We propose a new, superposition-free, interatomic contact area-based method for the evaluation of RNA structural models. This method is an adaptation to RNA of CAD-score (Contact Area Difference score), our recently developed score for proteins. The main idea of the method is to use contact areas between nucleotides or subsets of nucleotide atoms (bases, main chain) to quantify both the local and the global accuracy of RNA models. Contacts and contact areas are derived from the Voronoi diagram of spheres that correspond to heavy atoms with van der Waals radii. The dominant contacts in RNA are those between nucleobases. Therefore, we further distinguish stacking and non-stacking contacts between nucleobases in order to provide more detailed information about the type of errors in the model. We tested the new contact area-based evaluation method on a large number of RNA models and found that it is able both to effectively point out local errors in a model and to rank models by their overall quality. We believe that the new method should be useful not only for the developers of RNA structure prediction methods but also for the RNA community at large.
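The idea of scoring a model by contact area differences can be illustrated with a short sketch. It assumes the commonly cited form of CAD-score, in which each reference contact contributes a difference bounded by its own area and the score is one minus the normalized sum; the function name and the dictionary inputs are hypothetical, and computing the contact areas from a Voronoi diagram is not shown.

```python
def cad_score(target_areas, model_areas):
    """Global contact area difference score in [0, 1]; 1 means identical contacts.

    target_areas, model_areas: dicts mapping a residue (or nucleotide) pair
    to its contact area in the reference structure and in the model.
    Only contacts present in the reference are scored (assumed convention).
    """
    total = sum(target_areas.values())
    if total <= 0:
        raise ValueError("reference structure has no contacts")
    # Each difference is capped at the reference area so that a single
    # badly modelled contact cannot contribute more than its own weight.
    bounded = sum(
        min(abs(area - model_areas.get(pair, 0.0)), area)
        for pair, area in target_areas.items()
    )
    return 1.0 - bounded / total
```

Restricting the dictionaries to base-base pairs, or to stacking contacts only, would give the more detailed per-type scores mentioned in the abstract.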
TOP
L040 - dcGOnet: combining domain-centric ontologies with network biology for inferring disease-drug-phenotype matrix
Short Abstract: The dcGO database contains protein domain annotations with Gene Ontology (GO) and a dozen other biomedical ontologies on diseases, phenotypes and drugs. To maximize its utility, we here introduce a functional domain network called ‘dcGOnet’. This network has nodes consisting of protein domains (at the superfamily/evolutionary level), with edges weighted by the semantic similarity according to dcGO annotations. A global random walk on the dcGOnet is able to recover labels for disease, drug or phenotype-related ontologies, supporting the hypothesis that functionally related domains tend to impact the same drugs/diseases/phenotypes as each other. Based on the dcGOnet and its global properties, we further develop an approach to build a disease-drug-phenotype matrix. Most of the high-ranking predictions recover connections that are well known, but others uncover connections that have only suggestive or obscure support in the literature. Together, combining network biology with ontology annotations, both at the domain level, allows for the extraction of cross-knowledge and cross-species relationships that are often obscured from a simple protein level view. The network, the implementation and the inferred matrix are available at http://supfam.org/SUPERFAMILY/dcGO/dcGOnet.html.
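A global random walk over a weighted domain network can be sketched as below. The abstract does not state which variant of random walk is used, so this example assumes a random walk with restart; the restart probability, the function name and the toy network are all hypothetical.

```python
def random_walk_with_restart(weights, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Score nodes by their steady-state visiting probability.

    weights: dict mapping node -> {neighbour: edge weight}; every node must
             appear as a key and edges must be listed in both directions.
    seeds:   nodes that already carry the label of interest.
    """
    nodes = sorted(weights)
    seeds = set(seeds)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    out_weight = {n: sum(weights[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed node.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                nxt[u] += (1.0 - restart) * p[u]  # isolated node: walker stays
                continue
            for v, w in weights[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / out_weight[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy network: edge weights stand in for semantic similarity between domains.
toy = {
    "A": {"B": 0.9},
    "B": {"A": 0.9, "C": 0.8},
    "C": {"B": 0.8, "D": 0.1},
    "D": {"C": 0.1, "E": 0.9},
    "E": {"D": 0.9},
}
scores = random_walk_with_restart(toy, seeds=["A"])
```

Nodes strongly connected to the seed, such as B here, receive higher scores than weakly connected ones such as D, which is the behaviour that lets labels be recovered for unannotated domains.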
TOP
L041 - Using Molecular Dynamics to Explore the Effect of N-linked Glycosylation on the Conformation, Mobility and Flexibility of the HIV gp120 Surface Protein
Short Abstract: N-linked glycans (carbohydrates) bound to amino acids on the HIV-1 gp120 trimer form a “glycan shield” that protects the virion from recognition and neutralisation by the host immune system. The conformation of these glycans influences binding to target cell chemokine receptors (coreceptors), and also comprises critical elements of recently described viral epitopes for broadly cross-neutralising (BCN) antibodies. The importance of the glycan distribution together with the interaction between gp120 and the glycans has been suggested, but limited work has been carried out to understand the structural features and conformational changes imposed by a given glycan profile.

Here we apply a computational molecular dynamics approach, using GROMACS, to 1) describe the conformational differences between a non-glycosylated and glycosylated HIV-1 gp120 protein structure, 2) identify structural attributes introduced by specific glycans that may play a role in HIV-1 coreceptor tropism, and 3) describe the change in HIV-1 gp120 dynamics when shifting a glycan between adjacent N-linked glycosylation sites known to affect BCN antibody susceptibility. The analyses of the simulations provide evidence that the glycan profile on the surface of HIV-1 gp120 has a significant impact on the conformation, mobility and flexibility of the underlying protein.

Understanding the role that the glycan conformation has in viral tropism and antibody neutralisation is important for the design of potential carbohydrate-binding HIV therapies. We present a clear illustration of the features of the “glycan shield” that may affect HIV-1 gp120 morphology and thereby allow the virus to escape the host immune response and selected antiretroviral drugs.
TOP
L042 - In silico studies of the interaction between the nuclear localization sequence from Ku70 and the Importin-α
Short Abstract: The nuclear import system is responsible for the exchange of protein molecules between cytoplasm and nucleus, allowing proteins with nuclear function to migrate through the nuclear envelope of the cell. One example of a nuclear protein is Ku70, which belongs to the group of proteins involved in the DNA double-strand break repair pathway by non-homologous end joining. The classical nuclear import pathway mediates the translocation of Ku70 to the nucleus. This mechanism involves Importin-α (Impα) and Importin-β (Impβ), with Ku70 binding directly to Impα through recognition of its nuclear localization sequence (Ku70NLS). So far, crystallographic studies show that only a few residues close to the C-terminal region of the Ku70NLS interact with Impα. However, Ku70NLS mutagenesis experiments suggest a possible role for other residues in the recognition of Ku70 by Impα. Therefore, we tested the involvement of previously unreported residues of the Ku70NLS in Impα recognition. To accomplish that goal, we used docking, molecular dynamics (MD) simulation and normal mode analysis (NMA). In our preliminary results, we found interfacial hydrogen bonds, salt bridges and hydrophobic interactions involving residues near the N-terminal region of the Ku70NLS and Impα. These residues, together with those determined in previous studies, appear to play a key role in the early stage of the nuclear import of Ku70.
TOP
L043 - Structural modelling of the HIV-1 coreceptor usage tropism
Short Abstract: For successful entry, HIV-1 binds two proteins on the surface of the host cell: the CD4 receptor and either CXCR4 or CCR5 (both belonging to the family of seven-helix G-protein coupled receptors), which in the context of viral binding are termed coreceptors. The viruses that are able to bind exclusively CXCR4 or CCR5 are called X4- or R5-tropic viruses, respectively; viruses capable of binding both coreceptors are said to have dual tropism. The correct identification of viral tropism is pivotal for the choice of anti-viral treatment, since the interaction with CCR5 can be inhibited without major consequences to the host organism, whereas there is no approved drug to block CXCR4.

Previously, we developed a model that predicts R5-tropism versus X4- or dual tropism. In this study, we used the tools of structural bioinformatics to model the interactions between viruses of different predicted tropism and their coreceptors. The most important interaction takes place at the pocket on the extracellular side of the coreceptor, but prior to that the initial contact between the virus and the host cell is governed by the charge of the HIV envelope. The analysis of the endogenous ligands of CXCR4 and CCR5 supports this proposition. We hypothesize that the difference in envelope charge between the R5- and X4/dual-tropic viruses can be achieved through a more extensive glycosylation characteristic of R5-tropic viruses.

These findings have further implications for anti-HIV treatment and vaccine development.
L045 - Assessing Quality of Competing Structural Alignments: An Objective Measure Based on Information Content.
Short Abstract: The field of protein structural comparison lacks consensus on how to assess the quality of structural alignments. The difficulty stems from the opposing objectives of maximising alignment coverage whilst minimising loss of fidelity under structural superposition. This has resulted in a number of ad hoc quality measures that aim to obtain a reasonable trade-off between these objectives.

We propose a new method to evaluate alignment quality using the natural and objective measure of information content. We treat structural alignments as hypotheses that explain (or, losslessly compress) the coordinates of two structures. The quality of any alignment can therefore be quantified using its explanation message length. The utility of this measure is demonstrated by the following properties: the negative logarithm of the posterior probability of a given alignment varies according to its explanation message length; the difference in explanation message lengths of two competing alignments is their log-odds ratio. There is a natural null-hypothesis test, and any alignment hypothesis that fails it can be rejected. The measure achieves an objective trade-off, free of artificial parameters and cut-offs. Further, our measure easily handles shifts and hinge-rotations commonly observed in protein structures.

We tested the discriminative power of our measure on a number of examples and compared it with other popular measures of alignment quality. Our analysis shows that our measure is capable of making an objective trade-off when selecting the best structural alignment. A web server implementing the proposed measure is available at: http://lcb.infotech.monash.edu.au/I-rate
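The relationship between message lengths and log-odds described in this abstract can be illustrated with a short sketch. It assumes message lengths measured in bits and treats the null model as encoding the two structures without any alignment; the function names and the example lengths are hypothetical, and this is not the I-rate server's implementation.

```python
def log_odds_bits(msg_len_a, msg_len_b):
    """Log-odds (in bits) in favour of alignment A over alignment B.

    A shorter explanation message corresponds to a more probable
    hypothesis, so the log-odds is the difference in message lengths.
    """
    return msg_len_b - msg_len_a


def posterior_odds(msg_len_a, msg_len_b):
    """Posterior odds of A versus B, assuming lengths are in bits."""
    return 2.0 ** log_odds_bits(msg_len_a, msg_len_b)


def passes_null_test(msg_len_alignment, msg_len_null):
    """Keep an alignment only if it compresses the two structures
    better than the null model (assumed: no alignment at all)."""
    return msg_len_alignment < msg_len_null


# Hypothetical message lengths (bits) for two competing alignments.
len_a, len_b, len_null = 10450.0, 10460.0, 10500.0
print(log_odds_bits(len_a, len_b))        # 10.0
print(posterior_odds(len_a, len_b))       # 1024.0
print(passes_null_test(len_a, len_null))  # True
```

Under these assumptions, an alignment whose message is 10 bits shorter is about a thousand times more probable than its competitor.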
L046 - Structural Capacitance in Protein Evolution and Human Diseases
Short Abstract: Disordered Regions (DRs) are defined as protein regions lacking a stable, well-defined 3D structure. DRs are ubiquitous in proteins and strongly associated with human disease. Here, we report a new mechanism termed ‘structural capacitance’, where gain-of-structure and functional changes at the individual protein level may be achieved through the introduction of point mutations into the germline that increase the hydrophobicity of key nucleating amino acids located in predicted regions of structural disorder. As a consequence, new microstructures are generated in previously disordered regions of the protein, giving rise to novel functions.
Using two widely used predictors of disordered regions (VSL2B and IUPred), we curate a large-scale, up-to-date human disease mutation dataset that contains 26,125 disease-associated mutations. We classify the disorder state changes of mutated regions into four categories: order-to-disorder (O→D), order-to-order (O→O), disorder-to-order (D→O) and disorder-to-disorder (D→D). We find that D→O transitions tend to increase hydrophobicity after mutation, while O→D transitions have the opposite effect; for O→O and D→D transitions, the changes are not significant. For several D→O disease mutations found in long DRs, we predict that the mutation does not increase the aggregation propensity of the protein, suggesting that the mutation does not exert its effect through aggregation.
While much of the emphasis in protein disorder research is on loss-of-structure accompanied by loss-of-function, we propose a new paradigm for understanding human diseases through mutations that cause gain-of-structure/function via D→O transitions. These ‘structural capacitance residues’ (mutations in predicted disordered regions) could represent new epitopes worthy of further experimental investigation.
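The four transition categories defined in this abstract can be illustrated with a minimal sketch. It assumes a disorder score between 0 and 1 for the mutated region before and after mutation, and a cutoff of 0.5 for calling the region disordered; the function name, the example scores and the cutoff are illustrative assumptions rather than the authors' pipeline.

```python
def disorder_transition(score_wild_type, score_mutant, cutoff=0.5):
    """Label the change in predicted disorder state caused by a mutation.

    Scores at or above the (assumed) cutoff are treated as disordered (D),
    scores below it as ordered (O).
    """
    before = "D" if score_wild_type >= cutoff else "O"
    after = "D" if score_mutant >= cutoff else "O"
    return f"{before}→{after}"


# Hypothetical predictor scores for one mutated region.
print(disorder_transition(0.8, 0.3))  # D→O
print(disorder_transition(0.2, 0.1))  # O→O
```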
L047 - COMPARTMENTS: Using text-mining approach to unravel protein localizations
Short Abstract: The function of proteins depends heavily on their subcellular localization, and it is known that many diseases are caused by inefficient or wrong localization of proteins. Various biological experiments depend on where in the cell a protein is expressed, but there is no single resource that collects evidence on subcellular localization and provides an overview. We have developed an all-in-one web resource for subcellular localization information by combining curated knowledge, literature-mining and software predictions.

We collected all annotated knowledge from SwissProt, SGD, FlyBase, WormBase and MGI, and we used prediction software such as YLoc and PSORT to process 1,684,376 unique protein sequences. As a new source of information, we used an in-house text-mining tool to find pairs of co-mentioned proteins and subcellular localizations in 23 million PubMed abstracts.

Upon a query, we show an overview figure in which proteins are assigned to 12 organelles of the cell and different shades of green indicate the strength of the underlying evidence. Further down the page, detailed tables for the different channels are provided, with link-outs to the above-mentioned databases and the relevant PubMed abstracts supporting the literature-mining results.

COMPARTMENTS is available at: http://compartments.jensenlab.org
L048 - Insights into Heat Shock Proteins and Hop Domain Structuring and Functioning as Revealed by Phylogenetic and Motif Analyses in Eukaryota
Short Abstract: The heat shock organizing protein (Hop) is important in modulating the activity and co-interaction of two chaperones, heat shock proteins 90 and 70 (Hsp90 and Hsp70). Recent research suggests that Plasmodium falciparum Hop (PfHop), PfHsp90 and PfHsp70 may interact and form a complex in the trophozoite infective stage. However, there has been almost no computational research on malarial Hop protein in complex with other malarial Hsps. This work is based on the in silico characterisation of PfHop, which showed that Hop is very well conserved across 60 species. Homology modelling was used to predict protein structures for these interactions in PfHop, and human Hop (HsHop) in complex with its own Hsp90 and Hsp70 C-terminal peptide partners. The analyses indicated excellent conservation of the concave TPR sites bound to the C-terminal motifs of partner proteins. Motif analysis was combined with phylogenetic trees and structure mapping in novel ways to attain more information on the evolutionary conservation of important structural and functional sites on Hop. Alternative sites of interaction between Hop TPR2 and Hsp90’s M and C domains were found to be distinctly less conserved, but still important to complex formation, making this a likely interaction site for selective drug targeting. Binding energies for all complexes were calculated, indicating that all HsHop TPR domains have higher affinities for their respective C-terminal partners than do their P. falciparum counterparts. This work is being expanded to understand the structure and functioning of other chaperone proteins in several apicomplexan species, particularly those involved in mammalian disease.
L049 - COARSE-GRAINED SIMULATION: FAST AND ACCURATE CALCULATION OF PROTEIN BINDING AFFINITY
Short Abstract: Protein-protein complexes are involved in many biological processes. A thorough knowledge of protein-protein interactions is critical in revealing how two proteins interact with each other and form a complex. Although much effort has been devoted to the development of methodology for this purpose, both prediction from sequence and protein docking methods have particular limitations. Furthermore, current scoring functions in protein-protein docking are very much limited in their ability to predict binding affinity.
In our previous study, we successfully applied coarse-grained simulation to calculate binding free energy. In the current work, we have used this method to estimate the binding affinity of 81 protein complexes from the Protein-Protein Docking Benchmark 3.0. For these complexes, dissociation constants (Kd’s) span 10 orders of magnitude (10^-14 < Kd < 10^-5), corresponding to Gibbs free-energy differences of binding (ΔGbind) of between 7 and 20 kcal/mol.
We assess the reliability of the binding affinity estimates obtained with our coarse-grained simulation approach, which calculates the potential of mean force for these 81 complexes.
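The correspondence between the quoted Kd range and the ΔGbind range can be checked with the standard relation ΔG = RT ln(Kd). The sketch below assumes Kd is expressed in mol/L relative to a 1 M standard state and a temperature of 298.15 K; neither is stated in the abstract, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation
    constant in mol/L, using dG = R * T * ln(Kd) with a 1 M reference."""
    return R_KCAL * temperature_k * math.log(kd_molar)


# The two ends of the Kd range quoted in the abstract (assumed molar).
for kd in (1e-5, 1e-14):
    print(f"Kd = {kd:.0e} M  ->  dG = {binding_free_energy(kd):.1f} kcal/mol")
```

Under these assumptions the two ends of the range come out at roughly -7 and -19 kcal/mol, in line with the magnitudes of 7 to 20 kcal/mol quoted above.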
L050 - Protein Inter-Conformational Movement Modeling Based on Mass Transportation Principle
Short Abstract: One of the most essential and widespread problems of structural biology is to predict how a protein moves from one given conformation to another. A number of methods based on different protein representations have been proposed. An approach based on a coarse-grained protein model is presented and studied. The aim of the study is the construction of long-term conformational movements for further dynamic docking and protein functional research.
A movement of a protein is represented as a series of protein conformations. The first and the last conformations in the series correspond to the given ones. The intermediate conformations are constructed on the basis of the first and the last ones. Protein atoms are assumed to move along kinked (piecewise-linear) curves that connect the positions of the atoms in successive conformations. The kinked curves approximate the true trajectories of the atoms well as the number of intermediate conformations grows. The cost of a transformation represented in this way is a function of the sum of the distances travelled by each atom between adjacent conformations, multiplied by the corresponding atomic masses. In other words, the movement cost is calculated in accordance with the mass transportation principle.
Given the model described above, the protein movement is derived by minimizing its cost as a function of the torsion angles of the intermediate conformations. Specifying conformations by torsion angles ensures that bond lengths and planar angles remain unchanged. In addition, a number of constraint functions that rule out inadmissible movements, such as backbone self-intersections, are introduced and included in the presented model.
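The movement cost described above can be sketched as follows. This is a simplified illustration in Cartesian coordinates: it computes only the mass-weighted sum of distances (the abstract says the cost is a function of this sum), and it does not perform the torsion-angle minimization or apply the constraint functions. The function name and the toy data are assumptions.

```python
import math


def movement_cost(conformations, masses):
    """Mass-weighted path length of a movement.

    conformations: list of conformations, each a list of (x, y, z) tuples
    masses: one mass per atom, in the same atom order
    Returns the sum, over adjacent conformations and over atoms, of
    mass * distance travelled by that atom between the two conformations.
    """
    cost = 0.0
    for prev, curr in zip(conformations, conformations[1:]):
        for mass, p, q in zip(masses, prev, curr):
            cost += mass * math.dist(p, q)
    return cost


# Toy example: two atoms, first / intermediate / last conformation.
path = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (0.0, 0.0, 3.0)],
    [(1.0, 1.0, 0.0), (0.0, 4.0, 3.0)],
]
print(movement_cost(path, masses=[12.0, 1.0]))  # 31.0
```

In the toy example the heavier atom travels 2 units and the lighter one 7 units, giving 12 * 2 + 1 * 7 = 31.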
L051 - Peptide design using freely available tools and efficient sampling algorithms
Short Abstract: Artificial peptides have different important applications, most notably as potential drugs and biomarkers. While in vitro peptide design has a long tradition, illustrated by the plethora of published phage display experiments, their design in silico has become an area of special interest only in recent years. In particular, it is yet largely unclear how well existing docking tools perform for (short) peptide ligands as opposed to other small molecules. More research is thus required to make rational peptide design a means to guide and partly replace more resource-intensive experimental approaches.

Here, we assess the suitability of Autodock Vina for high-throughput protein-peptide docking and, based on this, propose a generic framework for the design of short peptides optimised to bind specific target proteins. Our pipeline comprises a set of (exclusively) freely available tools to prepare, dock and score peptide ligands. To efficiently sample from the millions of possible peptide sequences in a typical experiment, we have implemented a simple and intuitive evolutionary algorithm. This is extended by exhaustive searches, and both can be combined with the use of consensus sequences and reduced amino acid alphabets.

As a proof-of-concept, we present a set of designed peptides that are predicted to bind their respective protein targets with high affinity, making them promising candidates for experimental validation. We further assess how well published peptide sequences and their reported binding affinities comply with the results obtained. Finally, we compare the observed usage of binding residues and strategies with those reported in recent theoretical and empirical works.
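The idea of sampling peptide sequences with an evolutionary algorithm can be sketched in a few lines. This is a generic mutation-plus-elitism loop, not the authors' implementation: `toy_score` is a hypothetical stand-in for a docking score (a real pipeline would dock each peptide and return its predicted binding energy), and all parameter values are arbitrary.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(peptide):
    """Hypothetical stand-in for a docking score (lower is better):
    the number of positions that differ from an arbitrary target."""
    target = "WKYRF"
    return sum(a != b for a, b in zip(peptide, target))


def mutate(peptide, rng):
    """Replace one randomly chosen residue with a random amino acid."""
    pos = rng.randrange(len(peptide))
    return peptide[:pos] + rng.choice(AMINO_ACIDS) + peptide[pos + 1:]


def evolve(score, length=5, pop_size=20, generations=30, n_keep=5, seed=0):
    """Return the best peptide found and the best score per generation."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=score)
        history.append(score(population[0]))
        parents = population[:n_keep]  # elitism: best peptides survive unchanged
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - n_keep)]
        population = parents + children
    return min(population, key=score), history


best, history = evolve(toy_score)
print(best, toy_score(best))
```

Because the best peptides are carried over unchanged, the best score can never get worse from one generation to the next; restricting `AMINO_ACIDS` to a smaller set would correspond to the reduced alphabets mentioned in the abstract.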
L052 - lDDT: A superposition-free similarity measure for protein structures and prediction models
Short Abstract: The local Distance Difference Test (lDDT) is a superposition-free protein structure similarity measure, which was first introduced in CASP9. It evaluates how well interatomic distances in a reference structure are reproduced in prediction models. Here we present an improved version of lDDT, which allows the use of ensembles of equivalent structures as reference and includes a validation of the stereo-chemical plausibility of models. Furthermore, we introduce the lDDT web server.
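The core idea of a superposition-free distance comparison can be sketched as follows. This is a simplified illustration that uses one point per residue and a single reference structure; the 15 Å inclusion radius and the 0.5, 1, 2 and 4 Å tolerance thresholds follow the commonly cited lDDT defaults but are not stated in this abstract, and the ensemble references and stereo-chemical checks described above are not modelled.

```python
import math


def lddt(reference, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Fraction of reference distances that are preserved in the model.

    reference, model: equally long lists of (x, y, z) tuples, one point
    per residue (a simplification; the full score works on all atoms).
    Only pairs closer than `radius` in the reference are considered, and
    the preserved fraction is averaged over the tolerance thresholds.
    """
    pairs = [(i, j)
             for i in range(len(reference))
             for j in range(i + 1, len(reference))
             if math.dist(reference[i], reference[j]) < radius]
    if not pairs:
        return 0.0
    preserved = 0
    for i, j in pairs:
        diff = abs(math.dist(reference[i], reference[j])
                   - math.dist(model[i], model[j]))
        preserved += sum(diff < t for t in thresholds)
    return preserved / (len(pairs) * len(thresholds))


reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
rotated = [(1.0, 1.0, 1.0), (1.0, 4.8, 1.0), (1.0, 8.6, 1.0)]
stretched = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.6, 0.0, 0.0)]
print(lddt(reference, rotated))    # 1.0
print(lddt(reference, stretched))  # 0.5
```

The rotated and translated copy scores 1.0 without any superposition step, because only internal distances are compared.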
L053 - firestar ligand and binding site prediction: beyond functional annotation
Short Abstract: firestar is a sequence profile-based method for predicting small ligand binding residues and catalytic sites. It is based around FireDB, a database that collects and categorizes small ligand binding information from the PDB. FireDB and firestar have been shown to be state-of-the-art methods in the last three CASP experiments.
Beyond the functional annotation of novel proteins, firestar has also been integrated in APPRIS, an automatic system for annotating principal splice isoforms that is part of the GENCODE human genome annotation. We have carried out a statistical analysis of all firestar human genome predictions and have detected small ligand binding sites for more than nine thousand human genes, predominantly metal binding sites, nucleotides and cofactors.
We also performed an analysis of the families in the Pfam functional database. Using FireDB and firestar we annotated 4,190 hand curated Pfam-A domains with biological ligands. The results from firestar will add value to the annotations of Pfam families and aid in the organization of Pfam clans.
L054 - PSSH2 - Database of sequence to structure alignments
Short Abstract: We present PSSH2, a completely reworked version of the PSSH database (Schafferhans et al., 2003), listing all alignments of UniProt sequences to related structures.
The data generation process now uses HHblits to build sequence profiles and efficiently search the available structures. We have validated our process using known structures to optimally balance alignment accuracy, retrieval of even distantly related structures and detection accuracy.
The poster details the database structure and design principles. It gives statistics of the coverage and depth of our sequence-to-structure mapping, which brings the coverage of structure space from about 0.3% of structurally solved protein sequences to about 50% of sequences with structural information of homologs.
The PSSH2 database forms the foundation of the new Aquaria service (the follow-up project of SRS 3D), which aims to make structures discoverable. Structures can then be viewed together with sequences, alignments and sequence features.
L055 - Structural insights into E.coli porphobilinogen deaminase during synthesis and exit of 1-hydroxymethylbilane
Short Abstract: The formation of 1-hydroxymethylbilane (HMB) through a step-wise polymerisation of four molecules of porphobilinogen (PBG) is catalysed by porphobilinogen deaminase (PBGD), using a dipyrromethane cofactor. Earlier studies have suggested residues with catalytic importance, but their specific roles in the mechanism and in the dynamics of the protein vis-a-vis the growing pyrrole chain remain unknown. Molecular dynamics simulations of the protein through the four stages of chain elongation were performed to understand the concomitant structural changes. The compactness of the overall protein decreases progressively with the addition of each PBG. Essential dynamics analysis shows that domain 1 (1-99, 200-217) and domain 2 (105-193) move apart while the cofactor turn region (240-243) moves into the active site towards domain 2, thus creating space for the additional PBG moieties at each stage. Residues D50, K55 and R149 play a significant role in active site loop (40-63) modulation, while R11, D84 and R176 are involved in the catalytic mechanism of the protein, as supported by biochemical studies. Further, on removal of HMB, the structure of PBGD gradually regains its compactness to resemble its initial structure, ready to resume its catalytic cycle. Steered molecular dynamics simulations of the exit of HMB from PBGD suggest a probable exit path through the interface between domain 1, domain 2 and the active site loop. Residues R11, Q19 and R176, reported as catalytically important, are also seen to play a role in the exit of HMB.
L056 - Announcing Enhancements and Improvements to the Protein Homology Modelling Web Server SWISS-MODEL
Short Abstract: SWISS-MODEL is a widely-used web service for protein homology modeling. Around 2000 models are built every day. Since these models are primarily motivated by a particular biological research question, the biological context of these models plays an important role. Current efforts in the development of SWISS-MODEL focus on increased biological relevance of the models, and an improved user experience: For new users, an automated modeling pipeline performs template identification, template selection, and model building without user intervention. Experienced users are given full control of the modeling steps: in a new template selection step, biological knowledge can be incorporated into the selection process. Models are built in their correct oligomeric state, and include relevant ligands and co-factors. The models are presented in a modern web-interface, providing 3D visualisation of the built models.
L057 - Identification of Inhibitors Blocking Interactions between HIV-1 Integrase and Human LEDGF/p75: Mutational Studies, Virtual Screening and Molecular Dynamics Simulations
Short Abstract: The HIV-1 integrase (IN) mediates integration of viral cDNA into the host cell genome, an essential step in the retroviral life cycle. Human lens epithelium-derived growth factor (LEDGF/p75) is a co-factor of HIV-1 IN that plays a crucial role in HIV-1 integration. Because of its crucial role in the early steps of HIV replication, the IN-LEDGF/p75 interaction represents an attractive target for anti-HIV drug discovery. In this study, the LEDGF/p75 binding pocket of IN was studied by in silico mutational studies using molecular dynamics simulations. The results showed that the IN mutations (Q168A, E170A, H171A and T174A) in the α4/5 connector impaired the interaction with LEDGF/p75. All the crucial residues identified in the mutational studies were also identified as binding site residues. We screened the ChemBridge database through three docking protocols of varying precision and computational intensity. We selected six compounds by analyzing their interactions with the important amino acid residues of IN, their binding affinity and their pharmacokinetic parameters. Finally, we performed MD simulations of 10 ns each to examine the molecular interactions in the protein-ligand complexes. The results show stable binding of the compounds at the α4/5 connector of HIV-1 IN. These findings could be helpful for blocking the IN-LEDGF/p75 interaction and may provide a way of avoiding viral resistance and cross-resistance.
L058 - Linking structural information and evolutionary sequence variation to reveal the functional relevance of GPCR variants
Short Abstract: Individual differences in GPCR sequences raise the question of their functional relevance. Evolutionary processes changed each GPCR over time and left signatures in the sequence variation of extant species (orthologs). We postulate that a comprehensive set of orthologous sequences of a GPCR is adequate to predict the functional relevance of mutations.

Here, we present a web-based system for evaluating molecular effects of genetic variations based on their evolutionary relationship (P2Y12 mutant library; www.ssfa-7tmr.de/p2y12). We utilized a model protein, the ADP receptor P2Y12 to elucidate the congruence between sequence and structural variability or conservation in different species. The feasibility of this approach was verified by comparing evolutionary conservation of available ortholog sequences with functional data of a comprehensive in vitro mutant library (19 possible variants at 66 contiguous positions). Our study revealed that the amino acid variability to assure proper receptor function highly correlates (~90%) between in vitro experimental and in vivo sequence data. This confirms our initial hypothesis that natural sequence variation (orthologs) helps to guide functional studies in GPCRs. Furthermore, we analyzed P2Y12 variants known to cause platelet defects. Homology models for all orthologs were generated using our GPCR-Sequence Structure Feature Extractor (SSFE; www.ssfa-7tmr.de/ssfe) and their impact was evaluated by comparing their sequence variation with the structural space surrounding a specific natural variant.

Our approach as a systematic proof of principle has shown that comparative sequence data are sufficient to predict the functional relevance of individual P2Y12 variants and it is likely to be applicable to other GPCRs.
L059 - The relevance of Systems Biology in Relational Learning to annotate human protein with Reactome pathways
Short Abstract: Motivation:
Analyzing the relevance of Systems Biology knowledge in Relational Learning for the prediction of protein functional annotation is a task of interest, owing to the great number of associations and the multiple ways in which a protein can influence the function of others.
Our analysis was carried out on a real relational problem, namely predicting whether human proteins belong to Reactome pathways.

We propose to study and compare different representations of proteomic knowledge and relational learning techniques, assessing their influence on functional annotation. We focus on testing a relational representation and relational learning against propositional (i.e. classical) Machine Learning, because the former allow us to take advantage of interactions and other important biological associations. In addition, we address multi-class and multi-label classification, since this is a very frequent setting in functional annotation problems.

Results:
We explore and analyze several strategies (knowledge representation, learning technique and configuration) and draw conclusions on which to follow in future applications of Machine Learning to functional annotation tasks in Molecular Biology. According to the objectives and the characteristics of the problem, we recommend specific approaches over others. We indicate when to use a relational representation, a propositional representation or a combination of both; when it is advisable to include functional annotations of interaction partners and of proteins related by homology; under what conditions to use a single multi-class classifier rather than several individual classifiers; and how to manage missing values, among others.
L060 - Where are we in protein structure prediction today? Evaluation of CASP10 template free models
Short Abstract: We present the assessment of predictions for 20 Template Free Modeling (FM) target domains in CASP10. Models were first clustered so that duplicated or very similar ones were grouped together and represented by the model with the highest GDT_TS score in the cluster. They were then compared with targets using five different scores: GDT_TS, QCS, Handedness, Correlation of Distance Matrices and Deformation. The latter three scores, newly developed for CASP10, are superposition-independent and complement GDT and QCS. For each target, the top 15 representatives from each score were pooled to form the Top15Union set. All models in this set were visually inspected by four of us independently using the new plugin, EvalScore, which we developed with the UCSF Chimera group. The best models were selected after extensive debate among the four examiners. The prediction groups were ranked by the number of times their models were selected as one of the best models. The Keasar group submitted the largest number of best models (for 4 out of 20 targets). Among the prediction servers, QUARK from Zhang’s group performed the best (3 out of 20). As observed in CASP9, many successful modelers were “meta-predictors”, who select models from automated servers and then modify them, to only a modest extent in most cases. No one group dominated the FM category. New and better ab initio methods are very much needed for template free and multi-domain protein structure prediction.
L061 - mpMoRFsDB: A database of molecular recognition features (MoRFs) in membrane proteins
Short Abstract: Molecular Recognition Features (MoRFs) are short, intrinsically-disordered regions in proteins that undergo a disorder-to-order transition upon binding to their partners. MoRFs are implicated in protein-protein interactions, which serve as the initial step in molecular recognition. The aim of this work was to collect, organize and store all membrane proteins that contain MoRFs. We focused on membrane proteins, as they constitute one third of fully sequenced proteomes and are responsible for a wide variety of cellular functions. Data were initially collected from the Protein Data Bank (PDB) and UniProt and were managed with Perl scripts. MoRFs were classified according to their secondary structure after interacting with their partners. We identified MoRFs both in transmembrane and peripheral proteins. The position of transmembrane protein MoRFs was determined relative to a protein’s topology. All information was stored in a publicly available MySQL database with a user-friendly web interface (http://bioinformatics.biol.uoa.gr/mpMoRFsDB/). A Jmol applet is integrated for visualization of the structures. The utility of the database lies in the provision of information related to disorder-based protein-protein interactions in membrane proteins. Such proteins play key roles in crucial biological functions and ca. 50% of them are putative hubs in protein interaction networks. Consequently, these proteins may be correlated with various human diseases. The database will be updated on a regular basis by an automated procedure. The present work was funded by SYNERGASIA 2009, co-funded by the European Regional Development Fund and National resources (Project Code 09SYN-13-999, G.S.R.T. of the Greek Ministry of Education and Religious Affairs, Culture and Sports).
L062 - 3-Dimensional Modeling of Macromolecular Assemblies by Efficient Combination of Pairwise Dockings
Short Abstract: Macromolecular complexes play a key role in many biological processes. In metabolic pathways, for example, assemblies of proteins bear several advantages: reactions are performed more efficiently, oversupply of intermediate products is reduced or avoided by regulating the activity of the involved enzymes via feedback loops, and toxic or highly reactive compounds are kept from being released into the cytoplasm. However, atom-level structural determination of such complexes, for example with X-ray crystallography, often fails due to the size of the complex, different binding affinities of the involved proteins, or the complex falling apart during crystallization.
We present a novel combinatorial greedy algorithm that iteratively assembles such complexes solely based on the knowledge of the approximate interface locations of any two interacting proteins in the complex, and the stoichiometry of each monomer. Prior assumptions about symmetries in the complex are not required; rather, the symmetry is detected during complex assembly. Complexes are assembled stepwise from pairwise docking poses obtained with RosettaDock and scored using a geometric compatibility constraint deduced from these docking poses. Clash detection and clustering guarantee a reasonable and diverse solution space in each iteration.
In a diverse and representative benchmark set of 304 complexes from the Protein Data Bank with more than five subunits, 199 (65%) could be reconstructed such that the average RMSD, computed over 14 reference points for any two contacting subunits in the reference complex, was not greater than 3.0 Å. In 91% of these cases, the best prediction lies within the top ten.
L064 - StrAnno - An integrative Approach for automated structure-based functional Annotation of Proteins
Short Abstract: Functional annotation of proteins without significant sequence similarity to proteins of known function represents a challenging task in current bioinformatics. Hundreds of new proteins and large datasets of protein-related functional information are constantly added to public databases due to rapid advances in high-throughput technologies. As many proteins still remain structurally and functionally uncharacterized, there is a strong need for innovative approaches that can make use of available large-scale data to help in the structural and functional annotation of uncharacterized proteins.
We present StrAnno, an integrative structure-based functional annotation platform, which supports annotation of proteins in an automated high-throughput manner. StrAnno combines fold recognition with sequence-based pre-filtering methods, it integrates structural and functional information from public resources, and it comprises several new post-processing methods. We show that this combination improves the performance of classic fold recognition methods in terms of applicability, batch processing capabilities and prediction reliability, which is desirable in particular for large-scale applications. Moreover, StrAnno has been conceived as a flexible and extendable platform, and its integrative design significantly increases prediction confidence levels while decreasing post-processing time and difficulty. Promising applications for our method include screening for uncharacterized proteins that might contain family-specific structural features undetectable at sequence level (e.g. disulfide bonds), and annotation of uncharacterized proteins for which phenotypic or expression-based functional information obtained in high-throughput experiments might be available.
Our integrative approach harvests the strengths of currently available sequence- and structure-based protein annotation methods to characterize novel proteins and, thus, may be helpful in current functional genomics efforts.
L065 - RNADeform: Structural Alignment by Flexible Matching and Basepair Constraints
Short Abstract: Motivation. Structure and function are highly correlated. There is an increasing number of RNA structures in the PDB that need to be compared and studied for their biological activity. A number of programs have been developed for alignment, but most of them assume that the structures are rigid; in other words, they penalize alignments according to their RMSD.

Method. We have implemented a program that uses a sequence of local transformations to evaluate an alignment, instead of a single rigid transformation for the entire matching. It starts by considering several alignments between base fragments, imposing strong constraints for base pairing. Then, a dynamic programming strategy aligns mainly single bases to single bases and basepairs to basepairs, evaluating the rigid transformation of the local neighbors of each base.

Results. A benchmark against the ARTS, SARA and SETTER programs has been carried out and shows overall improvement and some particularly good matchings. The benchmark is based on annotated functions, which suggests that the program is suitable for predicting function from structure.
L066 - Protein fold recognition by conditional probability based threading using structural alphabets.
Short Abstract: Fold recognition is an important step in structural annotation of proteins with no homologue of known 3-D structure identified using sequence-search methods. We propose a unique fold recognition algorithm which is based on calculation of the conditional probability for the amino-acid sequence of a protein to fit a particular fold. We use a structural alphabet known as “Protein Blocks” (PBs), which is a library of 16 local structural prototypes named a to p based on a sliding window of pentapeptides, to encode existing folds into PB sequences. The method relies on the usage of 16 amino-acid occurrence matrices, one for each PB, to calculate the conditional probability of a window of 15 residues to have a local structure corresponding to a particular PB. These probabilities were used to score dynamic programming based global and local alignments of query amino acid sequences to PB sequences derived from a library of known folds from SCOP 1.75A. Overall performance in identifying the correct fold in the top 5 hits was assessed on a test dataset of 1837 domains from SCOP 1.75A with at most 70% sequence identity to our fold library. Results showed sensitivity of 75.3% and 61.9% for global and local alignments respectively, with fixed Z-score cutoffs that achieved 95% specificity. Surprisingly, this algorithm is able to pick up a correct fold without using any information from residue contacts or sequence search methods, and without explicit incorporation of the hydrophobic effect. This method scales up very well and offers promising perspectives for structural annotations at the genomic level.
L067 - Modelling structural differences between the cold and the heat denatured state
Short Abstract: Protein folding is a key process in all organisms, and misfolding proteins are known to be at the basis of aggregation diseases like Alzheimer’s and Parkinson’s. The folding transition takes place in water, causing hydrophobic amino acids to pack together in the core of the protein. The hydrophobic effect is the biggest factor in the stability of proteins. The water molecules and hydrophobic interactions also give rise to curious temperature dependent behaviour: some proteins do not only unfold at high temperatures, but also at low temperatures (cold denaturation). In addition, experimental heat capacity curves for protein folding with respect to temperature are characteristic for the hydrophobic effect.


We present a simple model with a statistical knowledge based potential for interaction between amino acids obtained from the Protein Data Bank (PDB). This is combined with a temperature dependent term for the hydrophobic amino acids, which is approximated by a quadratic function fitted to oil-to-water transfer models. We show that this simple model reproduces both heat and cold denaturation.


We observe a cold denatured state that is much more compact than the heat denatured state, as has been observed experimentally. Moreover, we can reproduce the very characteristic heat capacity curves for protein folding. With our simple model, we can relate the slope of the curves on either side of the folding transition to the amount of exposed hydrophobic amino acids. This latter result gives a handle on interpreting Differential Scanning Calorimetry (DSC) data from high throughput analysis.
L068 - Hydrophobic core to stabilize tertiary structure of proteins
Short Abstract: This poster is based on Proceedings Submission ISMB/ECCB 2013.
The hydrophobic core is traditionally interpreted as responsible for tertiary structure stabilization in proteins. Quantitative measurements of the presence of a regular core are presented.
The structure of an idealized hydrophobic core is assumed to be represented by a 3-D Gauss function. The characteristics of this function (a maximum value in the central part of the ellipsoid, a gradual decrease with distance from the center, and zero-level values at a distance of 3*sigma, for each direction independently) accord with the expected hydrophobicity distribution in the protein body. On the other hand, the hydrophobicity density distribution as it appears in the protein body can be calculated as the result of pair-wise hydrophobic inter-residue interactions. These two distributions (after normalization) can be compared. The degree of accordance/discordance between the expected and observed hydrophobicity density distributions can be measured using the divergence entropy. High accordance between the idealized and observed hydrophobicity distributions suggests the presence of a well-defined hydrophobic core in the protein molecule. An area of local hydrophobicity deficiency may suggest the potential location of ligand binding, while a local hydrophobicity excess (when observed on the surface of the protein) may suggest an area of protein complexation. Examples of proteins whose hydrophobic core structure is highly accordant with the idealized one, proteins with correctly identified ligand binding areas, and proteins with appropriately recognized protein-protein interfaces will be presented to verify the model, called the "fuzzy oil drop model". The model also allows identification of the structural/functional effects of mutations influencing the structure of the hydrophobic core.
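The comparison described in this abstract can be illustrated with a minimal Python sketch. It builds an idealized 3-D Gaussian profile over residue positions, normalizes it, and measures its distance from an observed profile. Two assumptions are made here and are not taken from the abstract: the "divergence entropy" is read as the Kullback-Leibler divergence, and the observed hydrophobicity values are supplied as given data, because the pair-wise interaction function is not specified. All function names and toy values are hypothetical.

```python
import math

def idealized_profile(coords, sigmas):
    """Idealized hydrophobicity at each residue position: a 3-D Gaussian
    centred at the origin, normalized so that the values sum to 1."""
    sx, sy, sz = sigmas
    raw = [
        math.exp(-(x * x / (2 * sx * sx) + y * y / (2 * sy * sy) + z * z / (2 * sz * sz)))
        for x, y, z in coords
    ]
    total = sum(raw)
    return [v / total for v in raw]

def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

def kl_divergence(observed, target):
    """Kullback-Leibler divergence D(observed || target) in bits.
    Terms with observed probability 0 contribute nothing."""
    return sum(o * math.log2(o / t) for o, t in zip(observed, target) if o > 0)

# Toy example: four residue positions (hypothetical coordinates, molecule
# already translated so that its centre is at the origin).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (3.0, 3.0, 3.0)]
ideal = idealized_profile(coords, sigmas=(2.0, 2.0, 2.0))
observed = normalize([0.9, 0.7, 0.5, 0.1])  # hypothetical observed values
print(kl_divergence(observed, ideal))
```

A value close to zero would indicate an observed distribution close to the idealized core; larger values indicate discordance.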
L069 - Structural Modeling of the Streptomyces venezuelae Anthranilate Synthase TrpE site
Short Abstract: Anthranilate synthase (AS) catalyzes the conversion of chorismate to anthranilate in the tryptophan biosynthetic pathway of microorganisms. The enzymes of this pathway hold significant potential as targets for the metabolic engineering of bacteria for the production of antibiotics. Exploitation of tryptophan synthesis for antibiotic production by Streptomyces has been hampered by lack of target enzyme 3-D structures from these organisms. The AS from the chloramphenicol-producing bacterium Streptomyces venezuelae (SvAS) is a fused enzyme that shows 50% sequence identity to the related enzyme, 2-Amino-2-desoxyisochorismate (ADIC) synthase, or PhzE, from Burkholderia lata. This level of sequence homology suggested that PhzE could be an effective structural template in the homology modeling of SvAS. Our studies using PhzE as template generated a 3-D model of SvAS having both a TrpE binding site with bound chorismate, and a TrpG site, which possesses amidotransferase activity, with bound glutamine. The chorismate site of the model accurately predicted the 7 chorismate binding residues identified by X-ray structural analysis of other anthranilate synthases. An additional 5 chorismate binding residues that we had identified using site-directed mutagenesis were also predicted by the model. This work represents an important first step in understanding the structural context of the SvAS-substrate interaction, which is essential for the exploitation of this enzyme as a target for antibiotic development.

This work was supported by grant # 8G12 MD007597 from NIMHD, NIH, MBRS-SCORE program grant (SC3GM083752/SC3GM083752-02S1) to W. Malcolm Byrnes and Howard University College of Medicine Bridge Funds and Pilot Study Awards program to Yayin Fang.
L070 - MobiDB 2: high quality intrinsic protein disorder annotations with full UniProt coverage
Short Abstract: During the last few years, intrinsic protein disorder has become an increasingly important topic in protein science. Due to the difficulty of experimentally characterizing the phenomenon, in silico predictions and indirect methods have been the main source of information used by the community. This situation is currently changing, and experimental determination of disorder is becoming increasingly feasible. So far, there have been several attempts at building databases to store intrinsic disorder annotations, and each of these has tackled the problem with a different approach. Here we present the latest version of the MobiDB database. The goal of this new version is to provide the best possible disorder annotations for all of UniProt's over thirty million proteins by leveraging different data sources. The results can be thought of as a data pyramid. The base of the pyramid is composed of all proteins, which will feature the comparatively less reliable predicted annotations. The middle part of the pyramid is populated by the subset of proteins which also feature some indirect evidence of disorder, e.g. missing residues in PDB structures. Finally, the top of the pyramid consists of an ever smaller but very high quality set of proteins which, apart from featuring the aforementioned predictions and indirect evidence, have manually curated and reviewed intrinsic disorder annotations. The MobiDB database can be accessed via a web interface or web services, and it is freely available at http://mobidb.bio.unipd.it.
L071 - Co-translational Protein Folding Prediction
Short Abstract: Current methods that perform ab initio protein structure prediction face two major limitations: the infeasibility of probing the conformational space and the lack of accuracy of the energy potentials. Proteins are thought to fold in a co-translational fashion, in which partially extruded peptides assume conformations that resemble the protein’s native structure. Co-translational aspects of protein folding can constrain the conformational space and improve current search heuristics. Correlated evolution can be used to improve the energy potentials: protein contact predictions based on multiple sequence alignments (MSA) can drastically impact the accuracy of protein structure prediction. In this work, we have assessed the quality of contact predictions by comparing two state-of-the-art contact prediction methods. Further, we have incorporated these predictions into an existing ab initio protein structure prediction software, SAINT2, that performs predictions in a co-translational fashion: it combines a heuristic approach with a simulation in which the peptide is gradually extruded from the ribosome.
L072 - Mining and automatic classification of repeat protein structures with RAPHAEL.
Short Abstract: Repeat proteins form a distinct class of structures where folding is greatly simplified. The structures are elongated or circular with tandem arrays of structural motifs attached periodically. Mounting evidence suggests vital functional roles in cell regulation, transcriptional control and maintenance of a healthy nervous system, to name a few. However, little attention has been paid to the large-scale organization of these highly influential structures in a comprehensible manner. From a structural point of view, finding repeats may be complicated by the presence of insertions or multiple domains. To the best of our knowledge, no automated methods are available to characterize solenoid repeats from structure. Our recently published method RAPHAEL uses a geometric approach mimicking manual classification, producing several numeric parameters that are optimized for maximum performance. It can mine large databases such as the Protein Data Bank (PDB) and CATH with 89.5% repeat and 97.2% non-repeat detection rates. Moreover, for each repeat RAPHAEL attempts to (1) determine its periodicity and (2) assign non-periodic regions (insertions). RAPHAEL finds 1931 highly confident repeat structures not previously annotated as repeats in the PDB records (e.g. using keywords). Likewise, CATH was mined for domains with periodic properties.
Finally all CATH domains and PDB chains detected to be periodic are classified automatically using a self organizing clustering algorithm. This resulted in a clear separation of repeat families and perhaps the first attempt at organizing currently available periodic proteins.
L073 - A Considerable Improvement in Protein Fold Recognition Using a Novel Kernel Fusion Framework
Short Abstract: Knowledge of a protein's tertiary structure can provide useful information about its function; hence determination of this structure is among the most essential objectives in biological science. In turn, information about the protein fold can provide a much better understanding of its three-dimensional structure. Various protein sequence feature-based techniques have been used in the classification of protein folds. Nevertheless, it is worthwhile to find an efficient and cost-effective method for integrating these different existing views on the data. Heterogeneous biological data sources can be integrated through partial integration, such as kernel-based data fusion. Most proposed techniques for integrating multiple kernels focus on learning a convex linear combination of base kernels. In addition to the limitations of linear combination, such approaches could lose potentially fruitful information in the data.

We present a new method which combines kernel matrices by taking their geometric mean instead of a convex linear combination. We employ twenty-six different representative models of protein samples based on physicochemical properties and primary structural information of protein sequences, sequence evolutionary information and local sequence alignments. Furthermore, our computational model incorporates the functional domain composition of proteins through a hybridization model. We evaluate our method for classification on the SCOP PDB-40D benchmark dataset for protein fold recognition. Using our proposed hybridization model, the protein fold recognition accuracy improves to 87.7%, which is superior to the results of the best existing approaches.
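To make the contrast between the two fusion strategies concrete, here is a small Python sketch. It assumes an element-wise geometric mean over kernel matrices with non-negative entries; the abstract does not say whether the mean is element-wise or a matrix geometric mean, so this reading is an assumption, and the function names and toy matrices are hypothetical.

```python
def geometric_mean_kernel(kernels):
    """Element-wise geometric mean of equally sized square kernel matrices.
    Entries must be non-negative for the root to be defined."""
    n = len(kernels[0])
    m = len(kernels)
    fused = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            product = 1.0
            for kernel in kernels:
                if kernel[i][j] < 0:
                    raise ValueError("geometric mean needs non-negative entries")
                product *= kernel[i][j]
            fused[i][j] = product ** (1.0 / m)
    return fused

def linear_combination_kernel(kernels, weights):
    """Convex linear combination of kernel matrices, shown for comparison."""
    n = len(kernels[0])
    return [
        [sum(w * kernel[i][j] for w, kernel in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]

# Two toy 2x2 kernels computed from different (hypothetical) feature views.
k1 = [[1.0, 0.25], [0.25, 1.0]]
k2 = [[1.0, 1.0], [1.0, 1.0]]
fused_geo = geometric_mean_kernel([k1, k2])
fused_lin = linear_combination_kernel([k1, k2], [0.5, 0.5])
print(fused_geo[0][1], fused_lin[0][1])
```

In this toy case the geometric mean gives a lower off-diagonal similarity than the equal-weight average, since a single low entry pulls the product down more than it pulls the sum.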
L074 - PON-P 2: A reliable predictor for pathogenicity of missense variations
Short Abstract: More reliable and faster prediction methods are needed to handle data from sequencing and genome projects. We have developed a completely new version of our computational tool, Pathogenic-or-Not-Pipeline (PON-P 2), dedicated to the identification of disease-related variations. The machine learning based classifier groups missense variations into pathogenic, neutral and unknown on the basis of a random forest probability score. PON-P 2 is trained using altogether 22,449 experimentally verified pathogenic and neutral missense variations from VariBench, a benchmark database (1). While PON-P is a meta-predictor (2), PON-P 2 is an independent classifier and relies on information about evolutionary conservation of sequences, biochemical properties of amino acids and annotations of variation sites. PON-P 2 showed consistently improved performance in cross-validation test sets. The accuracy and MCC at the 0.95 reliability level are 0.89 and 0.79, respectively. Comparison of PON-P 2 to related methods showed that it performs better than existing state-of-the-art tools. PON-P 2 is powerful and can be used to screen harmful missense variations for prioritization and ranking and for further experimental analysis.
References
1. Nair PS, Vihinen M. 2013. VariBench: a benchmark database for variations. Hum Mutat 34:42-9.
2. Olatubosun A, Väliaho J, Härkönen J, Thusberg J, Vihinen M. 2012. PON-P: Integrated predictor for pathogenicity of missense variants. Hum Mutat 33:1166-74.
L075 - The relationship between stability and amino acid conservation in enzyme structures
Short Abstract: Enzymes are present at the core of various biochemical reactions. They form highly ordered atomic structures called protein domains. Due to their thermodynamically stable arrangements of atoms, their evolutionary trajectories are constrained to a narrow range of stability. While some residues are critical for maintaining the stability of the protein fold, others are important for the catalytic activity itself. Most amino acid changes in native proteins are destabilising and, consequently, mutations that lead to a more favourable enzyme activity are likely to decrease the stability of the protein, and vice versa. Compensatory mutations are then needed to restore global stability. This process is referred to as the "stability-activity trade-off", suggesting a close relationship between the function and the structure.

In this study, we extracted and analysed the 4,920 protein domains identified in FunTree, a database providing evolutionary information on enzymatic proteins. We explored different structural properties for each residue, such as its solvent accessibility, its distance to the closest ligand (clustered as in the periodic table of elements), its distance to the geometric centre of the protein, and the stability effect of the 20 amino acid mutations. At the sequence level, we extracted evolutionary conservation scores, based on entropy and Scorecons methods applied on alignments from CATH/Gene3D. In conclusion, we identified different relationships between all these properties, among them a link between the intrinsically destabilising effect of binding sites and the type of the metal associated.
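As an illustration of an entropy-based conservation score of the kind mentioned above, the following Python sketch computes the Shannon entropy of each alignment column and rescales it so that 1 means fully conserved. The exact scoring used in the study is not given in the abstract; the normalization by log2(20), the gap handling, and the toy alignment are assumptions, and Scorecons itself is not reproduced here.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column.
    Gap characters ('-') are ignored; an all-gap column gets entropy 0."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def conservation_scores(alignment):
    """Per-column conservation as 1 - H / log2(20), so that a fully
    conserved column scores 1 and a maximally variable one scores 0."""
    max_entropy = math.log2(20)
    return [1.0 - column_entropy(col) / max_entropy for col in zip(*alignment)]

# Hypothetical toy alignment: four sequences, three columns.
alignment = ["ACD", "ACE", "A-D", "AGE"]
scores = conservation_scores(alignment)
print(scores)
```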
L076 - Protein target prediction with Logistic regression in a large scale setting
Short Abstract: In drug discovery, research should focus on compounds for which the desired effects are maximized and unwanted off-target effects are avoided. Therefore, one of the initial steps is compound prioritization. High-throughput screens support this process by measuring the bioactivities of chemical structures and thereby help to identify a lead compound.

Here we present a method that uses the data obtained from these bioassay measurements to predict the activities of new, unmeasured potential drug candidates for different targets based on their chemical structures. Such a method may serve as an additional tool in the drug discovery process. The main challenge is the amount of data to be processed. ChEMBL, for example, contains more than one million compounds and about ten million bioactivity measurements overall.

Since complex molecule kernels might require much computational effort for a large number of targets and compounds, we suggest fingerprint methods to describe the structure of molecules, together with an efficient implementation of logistic regression that makes explicit use of the sparseness of that molecule representation. Advantages of logistic regression are the convexity of the method and its probabilistic output. Many fingerprint features may be extremely weak in discriminating active from inactive compounds. Therefore, in order to reduce the dimensionality of the fingerprint features, we use an additional test for filtering.

In order to assess the performance of this approach, we apply our method to targets in ChEMBL in a leave-one-cluster-out setting and compare with some previously suggested methods.
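The idea of exploiting fingerprint sparseness in logistic regression can be sketched as follows. Each compound is represented only by the indices of its set fingerprint bits, so every prediction and gradient step touches just those bits. This is a plain stochastic gradient descent sketch under assumed settings (learning rate, epoch count, toy data), not the authors' implementation, and the feature filtering test mentioned in the abstract is not included.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_sparse_logreg(samples, labels, n_features, epochs=200, lr=0.5, l2=0.0):
    """Fit logistic regression by stochastic gradient descent.
    samples: one collection of active bit indices per compound.
    labels: 1 for active, 0 for inactive.
    Only weights of bits that are set in a sample are read or updated."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for bits, y in zip(samples, labels):
            z = bias + sum(weights[i] for i in bits)
            gradient = sigmoid(z) - y
            for i in bits:
                weights[i] -= lr * (gradient + l2 * weights[i])
            bias -= lr * gradient
    return weights, bias

def predict_proba(bits, weights, bias):
    """Predicted probability that a compound with these set bits is active."""
    return sigmoid(bias + sum(weights[i] for i in bits))

# Hypothetical toy data: bit 0 marks actives, bit 1 marks inactives,
# bits 2 and 3 carry no signal.
samples = [{0, 2}, {0, 3}, {1, 2}, {1, 3}]
labels = [1, 1, 0, 0]
weights, bias = train_sparse_logreg(samples, labels, n_features=4)
print(predict_proba({0, 2}, weights, bias), predict_proba({1, 3}, weights, bias))
```

Because the loss is convex, the fitted model does not depend on a lucky initialization, and the output is directly usable as a probability for ranking compounds.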
L077 - Critical Assessment of the Methods and the Features Used for Hot Spot Residue Prediction at Protein-Protein Interfaces
Short Abstract: Hot spots are residues which make dominant contributions to the free energy of binding at protein interfaces. Experimentally, a hot spot can be identified by mutating it to alanine and measuring the change in the free energy of binding (∆∆G). Experimental information is available only for a limited number of complexes. Hence, a need for computational methods arises. Several methods based on machine learning algorithms have been applied to predict hot spots. Furthermore, sequence and/or structure based features are used for determining whether a residue at a protein interface is a hot spot or not. Additionally, many data sets are used, but some of them are redundant or incommensurate in the context of hot spots. In this study, we offer a critical assessment of methods, features and data sets used for hot spot residue prediction at protein-protein interfaces in recent years and also propose a newly generated non-redundant protein data set.
L078 - 3D structures of transmembrane beta barrel proteins from maximum entropy analysis of genomic information
Short Abstract: Recent advancements in the field of structure prediction have shown that the 3D structure of a protein can be predicted from sequences alone by using a maximum entropy formalism to find causal evolutionary correlations in homologous sequences. The main advantage of using such methods is that correlations arising from transitivity can be avoided. Transmembrane beta barrels are membrane proteins that exist in the outer membrane of bacteria, chloroplasts and mitochondria. They are key constituents of the translocation machinery of outer membrane proteins. Further, they act as transporters and constitute the type-V secretion system. Many transmembrane beta barrels act as pathogens and thus make for good drug targets against infectious diseases. We recently showed that machine-learning methods combined with analytical methods based on geometric construction of beta-barrels can be employed to generate idealized beta-barrel models. However, high-resolution 3D modeling of beta-barrels and identification of functionally relevant residues is still an open problem. We show that a maximum entropy approach that identifies causal evolutionary correlations between residues can be used to compute the 3D structure and identify crucial residues of these important proteins. This method can potentially enhance our understanding of the assembly of translocation machineries in the outer membrane of bacteria, chloroplasts and mitochondria.
L079 - Structure-based prediction of transcription factor binding specificities using a Metropolis-Montecarlo simulation approach
Short Abstract: Protein-DNA binding is of utmost importance since it is involved in cellular processes such as gene expression and cell division. Since the first DNA-protein complex structure was solved at atomic resolution, our knowledge about how the recognition is carried out has increased notably. Several studies have been reported in which sequence as well as structural information has been used to predict protein-DNA binding specificities. Although sequence-based methods are widely used, they are still not very accurate, exhibiting poor sensitivity/specificity trade-offs.
We have developed a structure-based method for predicting binding sites in the DNA. This method relies on software for the 3D modelling of protein-DNA complexes and on statistical potentials, which estimate the protein-DNA stability from the 3D structure. We have included this approach in a protocol for predicting specificities of transcription factors at the DNA level. The core of the protocol is a Metropolis-Montecarlo simulation, a random sampling method for modelling systems with many coupled degrees of freedom, that allows us to reduce the sampling space as well as to obtain DNA sequences that bind with a medium-high or high affinity.
Here we report this protocol and show that we can obtain ensembles of DNA sequences with high fitness for the protein-DNA complex under study and, at the same time, recover to a large degree the known experimental specificities for several transcription factors.
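The core Metropolis step of such a protocol can be sketched in Python as below. The sampler proposes single-base changes and accepts them with the Metropolis criterion, so that low-energy (high-affinity) sequences are visited preferentially. The energy function here is a hypothetical stand-in that counts mismatches to an arbitrary target; in the protocol described above it would be replaced by a statistical potential evaluated on a 3D model of the protein-DNA complex. The function names, temperature, and step count are assumptions.

```python
import math
import random

BASES = "ACGT"

def metropolis_dna(energy, length, steps=5000, temperature=1.0, seed=0):
    """Metropolis sampling over DNA sequences of a fixed length.
    energy: callable mapping a list of bases to a number (lower is better).
    Returns the lowest-energy sequence visited and its energy."""
    rng = random.Random(seed)
    seq = [rng.choice(BASES) for _ in range(length)]
    current = energy(seq)
    best_seq, best_energy = list(seq), current
    for _ in range(steps):
        pos = rng.randrange(length)
        old_base = seq[pos]
        # Propose a point mutation to a different base.
        seq[pos] = rng.choice([b for b in BASES if b != old_base])
        proposed = energy(seq)
        delta = proposed - current
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = proposed
            if current < best_energy:
                best_seq, best_energy = list(seq), current
        else:
            seq[pos] = old_base  # reject: restore the previous base
    return "".join(best_seq), best_energy

def toy_energy(seq, target="TGACGTCA"):
    """Hypothetical stand-in energy: number of mismatches to an arbitrary target."""
    return sum(1 for a, b in zip(seq, target) if a != b)

best, best_e = metropolis_dna(toy_energy, length=8, temperature=0.5, seed=1)
print(best, best_e)
```

Collecting all accepted sequences below an energy threshold, instead of only the best one, would give an ensemble from which a position-specific binding profile could be derived.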
L080 - Large scale analysis of disorder changes between protein conformations
Short Abstract: Proteins have diverse structures and functions. This diversity occurs not only between different proteins, but also at the level of the individual protein. It is well established that proteins show a wide range of conformations in the native state. Proteins can undergo small to large conformational rearrangements involving ordered as well as intrinsically disordered (ID) states. According to the conformational selection theory, the conformational equilibrium between different conformers can be shifted by factors such as the presence of ligands, pH, temperature, post-translational modifications and changes in oligomeric state. Using the CoDNaS and MobiDB databases, we studied how these conditions change ordered and disordered regions in 98742 pairs of conformers corresponding to 6089 proteins.
We found that in general the presence of ligand, post translational modifications and change in oligomeric state produce an increase in ordered regions between conformers in about 60% of the proteins studied. Although in the rest of the cases the different factors produce an increase in the disorder of the protein, the absolute amount of the change is less compared with the set where the proteins became ordered. Our results could help to understand the mechanisms underlying protein dynamism and disorder and their close relationship with protein function.
L081 - Fragment binding prediction using unsupervised learning of ligand substructure binding sites
Short Abstract: Fragment-based drug design, an emerging tool in drug discovery, focuses on optimizing low affinity low-molecular-weight fragments into higher affinity lead molecules. Key in this process is the initial identification of fragments that bind to the protein target of interest. Existing computational methods of docking and virtual screening are optimized for complex drug-like small molecules and do not perform well with fragments. However, the availability of structural data for proteins whose bound ligands share substructures can be a proxy to enhance our understanding of fragment binding to facilitate the development of fragment binding predictors.

We propose an unsupervised machine learning approach to automate the discovery of fragment binding preferences. For all protein residues involved in ligand binding, we collect their local structural microenvironment and annotate them with the ligand fragments they bind. This serves as the knowledge base of protein-fragment interactions. Comparison to the knowledge base enables retrieval of fragments statistically preferred by the microenvironments of a target protein structure, giving insight to drug design. Our approach enables discovery of similar microenvironments across diverse proteins and maximizes structural data usage by merging information across diverse ligands with shared substructures. Results on a dataset of proteins binding a variety of endogenous ligands show strong ability to rediscover fragments corresponding to the ligand bound, validating the methodology.
L082 - Thermal adaptation of conformational dynamics in ribonuclease H
Short Abstract: The relationship of protein conformational dynamics to enzymatic activity and thermostability is often difficult to elucidate. Because conformational changes are often necessary for enzymes to carry out their catalytic function, and rigidification is a common mechanism of thermostabilization, adaptation to temperature extremes can require tradeoffs between stability and function. The study of homologous proteins with differing thermostabilities thus facilitates understanding of the functional aspects of conformational dynamics. Ribonuclease H1 (RNase H), a well-conserved endonuclease that hydrolyzes the RNA component of RNA:DNA duplexes, is a well-studied protein family in which structural information is available for close homologs from organisms with a range of temperature tolerances. Nuclear magnetic resonance (NMR) spectroscopy has previously identified differences in local flexibility between the homologs from the mesophilic bacterium Escherichia coli and the thermophilic bacterium Thermus thermophilus. Here we present a comparative analysis of molecular dynamics simulations of five homologous RNase H proteins of differing thermostability and enzymatic activity at ambient temperature. Unexpectedly, we find that a single amino acid residue differentiates between two different and otherwise conserved dynamic processes in a region of the protein known to form part of the substrate-binding interface. Within these two categories, we identify additional residues that influence the temperature dependence of these processes, allowing us to rationalize known differences in both flexibility and enzymatic activity. Collectively, these results suggest that, despite high sequence homology among the RNase H proteins studied here, multiple evolutionary solutions exist to the problem of thermal adaptation of conformational dynamics in the substrate-binding interface.
L083 - Conformational diversity and lung cancer associated mutations in human EGFR kinase domain
Short Abstract: Single amino acid substitutions (SASs) can cause disease by different mechanisms. The most common is destabilization of the protein fold, which alters protein function. Considering that the native state of the protein is not unique and is better represented by a conformer ensemble, it is conceivable that SASs distinctively affect the stability of the different conformers. Consequently, to analyze the impact of a given SAS, the conformational diversity of a protein should be taken into account.
Epidermal Growth Factor Receptor (EGFR) is an important marker employed in detecting Non-Small Cell Lung Cancer (NSCLC). Here we explored 49 different SASs in EGFR kinase domain associated with NSCLC as a function of the conformational diversity of the protein.

∆∆G values for all possible amino acid substitutions at all positions in the active and inactive conformers of EGFR were scanned. Comparing the pattern of ∆∆G per position between the conformers, we found that cancer-associated SASs are located in positions showing the maximum variation in ∆∆G. Also, mutations associated with sensitivity to drug treatment (tyrosine kinase inhibitors) were found to be mostly neutral, whereas those associated with resistance to treatment show high destabilization values, particularly in the active conformer of the protein. Our results could indicate that the analysis of conformer-specific SAS tolerance, in terms of structural stability, could improve our understanding of disease origin and treatment response.
L084 - Increasing coverage of multi-domain chains in SCOP
Short Abstract: An ongoing challenge in maintaining the SCOP database is continuing to provide the same level of reliability while increasing coverage of PDB entries. The latest version of SCOP (1.75B, hosted at http://scop.berkeley.edu) contains 56% of the approximately 89,000 entries in the PDB. We recently extended the SCOP build pipeline in order to support automatic classification. Our first public deployments with automated classification (1.75A,B) supported single-domain chains. Approximately 40% of PDB entries in SCOP contain a multi-domain chain. In this poster, we describe an extension toward classification for multi-domain chains. Domains in a newly deposited chain are annotated by running BLAST against a database of domain sequences from SCOP. Of the PDB entries not yet included in SCOP, 77% contained at least one chain with a significant BLAST hit (E-value < 10^-4). With focused curation efforts and a planned monthly release schedule, we aim to reach 100% coverage.
L085 - Simultaneous fitting of macromolecular subunits into low-resolution cryoEM maps using network alignment and genetic algorithm
Short Abstract: Cryo-electron microscopy (cryoEM) techniques are widely used to study the structure and function of macromolecular assemblies. Although EM image processing methods produce 3D density maps of low resolution, they are becoming hugely important as they allow the visualisation of large assemblies in multiple conformational states. EM maps of assemblies are often interpreted by fitting atomic models into them. However, model fitting can become a very challenging task depending on the size and shape of the complex, the number of subunits, the complexity of their interactions, and the map resolution.

We present a method for simultaneously fitting subunits into cryoEM maps of assemblies. Based on the clustering of electron density information, the assembly map and its multiple subunits are simplified into a set of vectors (feature points) [1] that can be used to quickly and simultaneously place the subunits using an integer quadratic programming procedure [1]. This is combined with a genetic algorithm used to optimise the positions of the feature points. The optimisation is guided by a combination of scores to capture the shape of the subunits and the goodness-of-fit between the map and the subunits. Our current, simulated test cases suggest that the method may be applicable to larger assemblies than the ones previously published [1].

[1] Zhang S et al. (2010) A fast mathematical programming procedure for simultaneous fitting of assembly components into cryoEM density maps. Bioinformatics (ISMB) 26:i261–i268.
TOP
L086 - “Stability patches” on protein surfaces - structural organization and functional role
Short Abstract: Understanding the structural basis of protein-protein interactions (PPI) may shed light on the organization and functioning of biological networks, and assist in structure-based design of ligands (drugs) targeting PPI. We hypothesized that backbone compliance plays an important role in PPI and in the mechanism of binding of small-molecule compounds to protein surfaces. We used a steered molecular dynamics simulation (SMD) to explore the compliance properties of the backbone of surface-exposed residues in several model proteins: interleukin-2, MDM2 and proliferating cell nuclear antigen (PCNA). We demonstrated that protein surfaces exhibit distinct patterns in which highly immobile residues form defined clusters, “stability patches”, alternating with areas of moderate to high mobility. These “stability patches” tend to localize in functionally important regions involved in PPI.
This new method of SMD surface analysis was further applied to study the regulation of PPI by allosteric modulators. Allosteric modulators regulate PPI by binding at site(s) orthogonal to the complex interface and altering the protein’s propensity for complex formation. We characterized the dynamic properties of functionally distinct conformations of a model protein, calmodulin (CaM), whose ability to interact with target proteins is regulated by the presence of the allosteric modulator Ca2+. We demonstrated that SMD analysis is capable of pinpointing CaM surfaces implicated in the recognition of both the allosteric modulator Ca2+ and target proteins. Our analysis of changes in the dynamic properties of the CaM backbone elicited by Ca2+ binding yielded new insights into the molecular mechanism of allosteric regulation of CaM-target interactions.
TOP
L087 - Does high throughput mean low output? How much Information is provided by high throughput experiments?
Short Abstract: The ongoing functional annotation of proteins relies upon the work of curators to capture experimental findings from scientific literature and apply them to protein sequence and structure data. However, with the increasing use of high-throughput experimental assays, a small number of experimental studies dominate the functional protein annotations collected in databases. Here we investigate how prevalent the “few articles, many proteins” phenomenon is. We examine the annotation of proteins provided by several groups in the GO Consortium, and show that the distribution of proteins per published study is exponential, with 0.14% of articles providing the source of annotations for 25% of the proteins in the UniProt-GOA compilation. Since each of the dominant articles describes the use of an assay that can find only one function or a small group of functions, this leads to substantial biases in what we know about the function of many proteins.
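The kind of figure quoted above (a small share of articles accounting for a quarter of the annotated proteins) can be illustrated with a minimal Python sketch. It assumes a hypothetical input format, a dictionary mapping each article ID to the set of proteins it annotates, and it simply ranks articles by size instead of solving an exact set-cover problem. Neither the function name nor the data layout comes from the poster.

```python
def article_share_for_coverage(annotations, target=0.25):
    """Fraction of articles needed to cover `target` of all annotated proteins.

    `annotations` is a hypothetical mapping: article ID -> set of protein IDs.
    Articles are visited from most to fewest annotated proteins (a greedy
    approximation, not an exact minimum).
    """
    all_proteins = set().union(*annotations.values())
    needed = target * len(all_proteins)
    covered = set()
    ranked = sorted(annotations, key=lambda a: len(annotations[a]), reverse=True)
    for used, article in enumerate(ranked, start=1):
        covered |= annotations[article]
        if len(covered) >= needed:
            return used / len(annotations)
    return 1.0


# Toy data: one high-throughput study and three small-scale studies.
toy = {
    "big_screen": {1, 2, 3, 4, 5, 6},
    "small_a": {7},
    "small_b": {8},
    "small_c": {1},
}
print(article_share_for_coverage(toy, target=0.25))
```

On real annotation data the same function would be fed article-to-protein sets extracted from an annotation file; the parsing step is deliberately left out here.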
TOP
L088 - The next generation of SCOP and ASTRAL
Short Abstract: The Structural Classification of Proteins (SCOP) database is a manually curated, near-comprehensive ordering of domains from proteins of known structure in a hierarchy according to their structural and evolutionary relationships. The ASTRAL compendium is a collection of software and databases, closely related to SCOP, that is used to aid research into protein structure and evolution. We released new versions of both SCOP and ASTRAL (1.75B) in January 2013. The new releases are the second in a series of stable SCOP and ASTRAL releases based on SCOP 1.75. New versions of both databases are presented to the public through a single, unified interface (http://scop.berkeley.edu/). New features include a SQL-based infrastructure and build procedure, a fully automated classification scheme for new PDB entries that are similar to previously classified entries, and periodic incremental releases to supplement the stable releases. More than 11,300 new PDB entries have been added since SCOP 1.75, without sacrificing the reliability that SCOP has accumulated through years of careful manual curation. We plan to introduce additional features in a series of stable releases, while a major reclassification (SCOP 2.0) is in progress.
TOP
L089 - Toward Redesigning Stable and Efficient Paraoxonase Bioscavengers
Short Abstract: Organophosphate (OP) compounds have been extensively used as pesticides and insecticides in agriculture, as flame retardants in plastics and rubbers, and even as gasoline additives. Long-term exposure to the OP compounds in commercial products has been linked to several adverse health effects and to permanent damage to our ecosystem. Paraoxonase (PON1) is an A-esterase capable of hydrolyzing the active metabolites (oxons) of a number of organophosphorus (OP) insecticides such as parathion, diazinon and chlorpyrifos. In this study, we aim to redesign human PON1 for use as a scavenger that sequesters toxic OP compounds. Human PON1 displays two polymorphisms in the coding region (Q192R and L55M) and several polymorphisms in the promoter and the 3'-UTR regions. We found that the L55M variant is less stable due to a loss of coordination with the stabilizing Ca2+ ion. In addition, a network of interacting residues that move in concert is disconnected. A new computational protein design protocol that incorporates the evolutionary conservation of PON1 sequences will be presented, along with a novel variant that is more stable than wild-type PON1.
TOP
L090 - On optimal metric training and testing of protein residue pair potentials
Short Abstract: Recently a method named metric training was proposed for designing a smooth knowledge-based pair potential for protein structure prediction not based on statistics but entirely on optimizing the correlation between the native-model energy differences and a native-model distance. The chosen metric in the optimization determines the geometrical properties captured by the model. Here, we find the optimal metric training pair-potentials when exchanging RMSD with a number of popular protein structure distance measures. We test the ability of the resulting pair potential to obtain energy-distance correlation and to select the best decoy in an ensemble of decoys and report substantial improvements in performance when using the global distance test, GDT-TS, the fraction of native contacts and an anharmonic normal mode potential besides RMSD. Our results are based on an analysis of a standard single model pair potential and a novel ensemble consensus pair potential. The consensus pair potential generally outperforms the single model pair potential but this difference may be covered by adding a local fragment based potential given sufficient sequence homology to known structures.
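The two evaluation criteria mentioned above (energy-distance correlation and best-decoy selection) can be sketched in a few lines of Python. This is only an illustration of the evaluation step, not of the metric training optimization itself. The function names are hypothetical, Pearson correlation is assumed as the correlation measure, and the distance values stand in for whichever measure (RMSD, GDT-TS, and so on) is being tested.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def energy_distance_correlation(native_energy, model_energies, model_distances):
    """Correlate native-model energy gaps with native-model distances."""
    gaps = [e - native_energy for e in model_energies]
    return pearson(gaps, model_distances)


def picks_best_decoy(model_energies, model_distances):
    """True if the lowest-energy decoy is also the one closest to the native."""
    lowest_energy = min(range(len(model_energies)), key=model_energies.__getitem__)
    closest = min(range(len(model_distances)), key=model_distances.__getitem__)
    return lowest_energy == closest


# Toy decoy set: energies rise linearly with distance from the native structure.
print(energy_distance_correlation(-10.0, [-8.0, -6.0, -4.0], [1.0, 2.0, 3.0]))
print(picks_best_decoy([-8.0, -6.0, -4.0], [1.0, 2.0, 3.0]))
```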
TOP
L091 - Mapping Side Chain Interactions at Protein Helix Termini
Short Abstract: Capping interactions at the ends of protein helices stabilize helix termini and shape the geometry of the adjacent loops, making a substantial contribution to overall protein structure. At the N-terminus, it has been found that the dominant capping motifs involve side chains, and include hydrogen-bonded motifs such as the Asx/ST N-caps and the capping box, as well as hydrophobic and electrostatic interactions. But key questions remain concerning N-terminal capping interactions: 1) To what extent are capping motifs with two or more amino acids that have been detected in structural surveys likely to represent genuine cooperativities? 2) What geometries for the loop backbone are favored by each N-terminal side chain interaction? 3) Can an exhaustive statistical scan of a large, recent dataset identify new side chain interactions near the N-terminus?
In this work, three analytical tools are applied to answer the above questions. First, a new least-squares 3D clustering algorithm is applied to group the caps in a large (N = 16,600), high-quality, PDB-derived dataset by backbone geometry. Second, Cascade Detection (Newell, Bioinformatics, 2011), an algorithm that detects cooperativities by identifying sequence features that are outliers from their background models, is applied to each cluster separately to determine which features are most significant in each geometry. Finally, the results for each feature are displayed in a CapMap, a Jmol structure that depicts the distribution of feature overrepresentation across loop geometries as a 3D heatmap, and enables the isolation and analysis of the side chain interaction corresponding to the feature.
TOP
L092 - Tracing the similarities between enzyme folds through binding-site similarity networks
Short Abstract: Exponential growth in protein sequence and structural databases significantly increases the need for computational approaches for comparing proteins and for obtaining accurate functional annotations. Comparison of molecules is carried out at different levels: whole sequences, domains, sequence motifs, structural folds and sub-structures. The problem is complex even for enzymes, despite their being the best characterized type of proteins, because of “one fold, many functions” and “many folds, one function” associations. Understanding sequence-fold-site relationships and their evolutionary implications is thus a challenging task.
In this study, we address this problem using a network approach that links enzyme functions through similarities in their binding sites, folds and ligands. We first systematically annotate the domain(s) in a protein structure involved in reaction catalysis using information from bound cognate ligands, cofactors, or known catalytic site residues. A bipartite network of functions and folds as nodes shows that 1683 enzyme functions are associated with 395 structural folds of which 191 functions are associated with more than two folds. A second network of functional associations using similarity in the binding-sites and domain superfamilies identifies 285 functions and 1209 shared interactions. This network reveals about 20 clusters of functions that could be rationalized by similarities either in their ligands or in their catalytic mechanisms. A consolidated network is then constructed and used for elucidating probable paths in evolution from one function to another. The network is directly useful for obtaining accurate functional annotations of proteins and also in poly-pharmacology and protein engineering.
TOP
L093 - Predicting protein-protein binding sites and binding partners using CRACLe
Short Abstract: We report on the development of a novel computational approach to identify binding sites on protein surfaces and evaluate the likelihood of interaction between two proteins. CRACLe (Critical Residue Annotation and Complementarity Likelihood) utilizes the SNAPP (Simplicial Neighborhood Analysis of Protein Packing) scoring function and an ensemble of cheminformatics-inspired descriptors of Protein-Protein (PP) interfaces. CRACLe calculates specialized SNAPP scores for solvent-exposed residues and residue neighborhoods to identify potential hot spots characterized by high SNAPP scores. CRACLe correctly predicted binding sites for over 80% of proteins from the Dockground automatically selected representative complexes and over 70% of proteins from the PepX database of protein-peptide complexes. We shall discuss these results and highlight case studies where CRACLe accurately identified PP binding sites and was also able to correctly predict the interacting residue of the binding partner. CRACLe's prediction performances are discussed and compared with other existing approaches. We posit that CRACLe is a fast and efficient method for binding site identification and binding partner prediction, especially as a high-throughput precursor and aid to protein-protein docking approaches.
TOP
L094 - The virtual screening of compounds from curare (Chondrodendron platiphyllum) presents Siringaresinol as a new inhibitor of platelet aggregating factor (PAF)
Short Abstract: Plants of the genus Chondrodendron (Menispermaceae) are traditionally used by South American Indians to produce curare (poison) as a tool for hunting and fishing. The main activity of curare is on the neuromotor and central nervous systems. Platelet Aggregating Factor (PAF) is associated with the inflammation process and innate immunity in the gastrointestinal tract of mammals. Siringaresinol has been isolated and described for the first time from Chondrodendron platiphyllum from Brazil, and its pharmacological activities have not yet been tested in vivo. Using the SEA database, we found significant similarity between Siringaresinol and other molecules with activity against PAF. We then used AutoDock Vina to perform docking calculations between Siringaresinol and the PAF receptor. This software returned reasonable affinity energy values for several ligand conformations. Subsequently, we used PyMOL 1.4 and Ligand Scout 3.1 to check the stereochemistry of chiral carbons, substructure, superstructure, number of rotatable bonds, number of rings, number of donor groups, and hydrogen bond acceptors. AMBER 12 was used to simulate the behavior of the PAF-Siringaresinol complex after a set of 5000 ps and up to 300 K in water. This calculation returned a graph of stabilization of potential energy against simulation time and showed that the ligand remained inside the active site after the simulation was complete. The results indicate that Siringaresinol could be a good inhibitor for PAF in vitro and in vivo.
TOP
L095 - ProRank+: Detecting overlapping protein complexes using a protein ranking algorithm
Short Abstract: The detection of protein complexes is evidently a cornerstone of understanding various biological processes and identifying key genes causing different diseases. Accordingly, many methods aiming at detecting protein complexes were developed. Recently, a novel method called ProRank was introduced. This method uses a ranking algorithm to detect protein complexes by ordering proteins based on their importance in the interaction network and by accounting for the evolutionary relationships among them. The experimental studies showed that it outperformed several well-known methods in terms of the number of detected complexes with high accuracy, precision and recall levels. In spite of that, the ProRank algorithm does not reflect the fact that proteins can exhibit many functions by being part of different complexes. Therefore, including this biological fact would certainly lead to further improved detection results. In this work, we present ProRank+, a refined version of ProRank, which allows detected protein complexes to overlap; a supposition that was not considered in the original version of the method.
TOP
L096 - On the quick evolution of transcriptional regulation in Prokaryotes
Short Abstract: Compared to other functionally-related gene products, such as genes in operons or genes coding for proteins that physically interact, genes coding for transcription factors (TFs: proteins that activate and/or repress genetic transcription, excluding sigma factors) show a loose evolutionary relationship with the genes whose expression they regulate. Previous works showing rapid evolution of the relationship between TFs and their regulated transcription units (TUs: one or more adjacent, same-strand genes transcribed together into a messenger RNA) have relied on knowledge databases derived from direct experimental information. To expand the analyses into Prokaryotes with less experimental information, we first found putative TFs by finding protein motifs commonly used by TFs to bind to DNA. Next we calculated presence/absence profiles for all protein-coding genes annotated within each genome and compared such profiles (phyletic patterns or phylogenetic profiles) to measure gene co-occurrence across available prokaryotic genomes. Given the absence of experimental evidence associating TFs with their target TUs, we used the most co-occurring gene as a proxy for the best association between any genes within each genome. We found that, in most genomes, TFs had fewer co-occurring gene partners than any other genes in the genome. The few instances of genomes with TFs showing stronger co-occurrence than other genes happened to contain over-annotated genes (a high proportion of false genes, which, almost by definition, should show low-scoring co-occurrence patterns). Overall, our results confirm that transcriptional regulation might evolve quickly in most prokaryotes.
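The profile comparison described above can be sketched as follows. The abstract does not say which co-occurrence score was used, so this example assumes a Jaccard index over presence/absence vectors purely for illustration; the gene names, profiles and function names are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Jaccard index of two presence/absence profiles (sequences of 0/1)."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def best_partner_scores(profiles):
    """For each gene, the score of its most co-occurring partner gene."""
    scores = {}
    for gene, profile in profiles.items():
        scores[gene] = max(
            (jaccard(profile, other) for name, other in profiles.items() if name != gene),
            default=0.0,
        )
    return scores


# Each tuple marks presence (1) or absence (0) across four genomes.
toy_profiles = {
    "tf": (1, 0, 0, 1),
    "gene1": (1, 1, 1, 0),
    "gene2": (1, 1, 1, 0),
    "gene3": (1, 1, 0, 0),
}
scores = best_partner_scores(toy_profiles)
print(scores)
```

In this toy case the putative TF has the weakest best partner, which mirrors the trend reported in the abstract without reproducing its actual analysis.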
TOP
L097 - GIANT: a web-server for analyzing protein–small ligand interactions based on statistically preferred patterns of atomic contacts
Short Abstract: Analyzing molecular interactions between proteins and small ligands from their 3D structures is an important task in structural biology. While visual inspection by scientists is usually the first step of the analysis, it inevitably introduces biases caused by the scientists’ subjective viewpoints. Tools providing more objective viewpoints on molecular interactions may help in interpreting binding modes. Here, we present a web-server, named “GIANT”, to guide the analysis of molecular interactions with annotations of statistically preferred positions of atomic contacts. Firstly, we collected statistics on the spatial positions of ligand atoms around fragments of amino acids over protein–small ligand complexes in the PDB. Secondly, positions of atomic contacts were classified by an unsupervised pattern recognition technique, and we call the classes of atomic contacts “interaction patterns”. Each interaction pattern represents a statistically overrepresented area around a fragment for interaction with ligand atoms, and is defined as a Gaussian function. The information about the interaction patterns is annotated onto each atomic contact in the 3D structure of complexes. For each atomic contact, the user can see what pattern is assigned to the contact, how common (or statistically preferred) it is, and what complexes have similar interactions. GIANT provides a browser-based GUI (powered by Jmol) for interactive visual inspection of protein–small ligand interactions.
TOP
L098 - Similarity-based docking using atomic correspondence of maximum common substructure: dependence of prediction accuracy on target-template ligand similarity
Short Abstract: Knowing the binding conformation of a ligand to a receptor protein is important for understanding the mechanism of the interaction. Recently, similarity-based docking based on the 3D structure of a template complex has become feasible, because an increasing amount of 3D data has accumulated in the PDB. The docking is performed by superimposing and transforming the target molecule onto the bound 3D conformation of the template compound. The volume-overlapping method using a Gaussian distribution function is the most popular for this superposition. We propose a method based on atomic correspondences between the two molecules obtained by MCS (maximum common substructure) of 2D chemical structures. The MCS is calculated by our program kcombu (Kawabata, J. Chem. Inf. Model., 2011, 51, 1755-1787), providing classical connected and disconnected MCSs, as well as topologically constrained disconnected MCS (TD-MCS) allowing a few gaps in connected substructures. After the calculation of the MCS, a target molecule is transformed in three steps: a rigid-body superimposition of corresponding atom pairs, “stamping” of the corresponding dihedral angles, and a gradient-based optimization of rotatable dihedral angles. The prediction performance was evaluated using many superimposed ligand 3D structures on the same protein in the PDB. We found that the prediction accuracy depends on chemical similarity between the target and template; if the template and target compounds have more than 60% Tanimoto similarity, the average RMSD of 3D conformations is less than 2.0 Å. Our method is more accurate and faster than volume-overlapping methods if sufficiently similar templates are found.
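The 60% similarity threshold can be made concrete with a short Python sketch. It assumes that the Tanimoto similarity is computed from the size of the maximum common substructure, i.e. the number of matched atoms divided by the number of atoms in the union of the two molecules. That definition, the function names, and the toy atom counts are assumptions made for illustration and are not taken from kcombu.

```python
def mcs_tanimoto(n_target, n_template, n_common):
    """Tanimoto coefficient from atom counts and the MCS size (assumed definition)."""
    if not 0 <= n_common <= min(n_target, n_template):
        raise ValueError("MCS size must lie between 0 and the smaller molecule size")
    union = n_target + n_template - n_common
    return n_common / union if union else 0.0


def usable_templates(n_target, templates, threshold=0.6):
    """Names of templates whose similarity to the target exceeds the threshold.

    `templates` is a hypothetical mapping: name -> (heavy-atom count, MCS size).
    """
    return [
        name
        for name, (n_template, n_common) in templates.items()
        if mcs_tanimoto(n_target, n_template, n_common) > threshold
    ]


# A 20-atom target compared against two hypothetical template ligands.
candidates = {"close_analog": (25, 18), "distant_ligand": (30, 10)}
print(mcs_tanimoto(20, 25, 18))
print(usable_templates(20, candidates))
```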
TOP
L099 - Protein-protein interactions in a crowded environment: an analysis via cross-docking simulations and evolutionary information.
Short Abstract: Large scale analyses of protein-protein interactions with molecular docking simulations are now realizable on hundreds of proteins. We demonstrate that the combination of cross-docking and evolutionary sequence information applied to the 168 proteins of the Mintseris Benchmark 2.0, covering a large spectrum of interfaces, is a viable route to identify interacting partners and to propose a conformation for the corresponding complexes. We evaluate the quality of the interaction signal and the contribution of docking compared to evolutionary information in partner identification. Since protein interactions usually occur in crowded environments with several competing partners, and protein affinities might vary from partner to partner, we perform a thorough analysis of the interactions involving the proteins of the Mintseris dataset with their true partner but also with the rest of the database to evaluate whether the proteins in competition with the true partner can affect the identification of the native complex. We identify three populations of proteins: strongly competing, never competing and interacting with different levels of strength. Populations and levels of strength are numerically characterized and provide a signature for the behaviour of a monomer in a microenvironment. We also show that evolutionary sequence analysis can be used to reduce the high computational cost of docking simulations, with no consequence for the quality of the results, opening the possibility to apply docking simulations to datasets made of thousands of proteins. We release the complete decoy set coming from docking simulations of both true and false interacting partners, and their evolutionary sequence analysis.
TOP
L100 - MOLECULAR MODELING AND STRUCTURAL ANALYSIS OF TASK-1 POTASSIUM CHANNEL INTERACTING WITH THE BLOCKER A1899
Short Abstract: Two-pore domain potassium (K2P) channels are expressed as functional dimers in the central nervous system, cardiovascular system, genitourinary system and gastrointestinal system. They are associated with several pathologies in humans. Thus, members of this family have emerged as molecular candidates for the action of pharmacological agents. The K2P channel TASK-1 is an important modulator of multiple sclerosis, and in 2011 a highly selective blocker of TASK-1, named A1899, was discovered. It was suggested that A1899 acts as an open-channel blocker and binds to residues forming the wall of the central cavity. In 2012 the first crystal structures of K2P channels were published. Electron density maps revealed two open lipid cavities or fenestrations, one on each side of the dimer, that expose the central cavity to the membrane. We constructed homology models of TASK-1, based on the crystal structures of the recently crystallized K2P channels, and studied the specific binding site of A1899. Our results suggest that:
- A1899 could enter TASK-1 through the pore, driven by the residues N240, V243, M247.
- A1899 binds to residues of the selectivity filter through hydrogen bonds, blocking the ion flux.
- A1899 is stabilized mainly by hydrophobic interactions of the fenestration residues (which is in agreement with its LogP value=4.738).
TOP
L101 - Accurate Prediction of Protein-Protein Contacts: from Patches to Sites
Short Abstract: Protein-protein interactions regulate most biological processes and their interfaces constitute an increasingly important target for drug design. Protein binding sites are expected to display a certain degree of conservation along with some specific physical-chemical properties.

We have previously developed a method, Joint Evolutionary Tree (JET), for in silico prediction of protein interfaces using evolutionary information and amino-acid physical-chemical properties mapped to protein three-dimensional structure [Engelen 2009]. This method proved successful in detecting very different types of protein interfaces and in providing predictions even with weak signals. JET predictions were also found very useful to guide molecular cross-docking and to help discriminate true partners from decoys [Lopes 2013]. However, these results highlighted the crucial importance of an accurate definition of protein binding patches to probe biologically relevant interactions.

In this work, we present improvements of JET by treating explicitly the different structural components of protein-protein interfaces, namely the support, core and rim [Levy 2010], and by combining evolutionary information with circular variance, which measures the exposure of residues in a protein [Ceres 2012]. This strategy enabled us to (i) more accurately define protein binding sites and (ii) specifically discriminate protein-protein interfaces from protein-ligand binding sites on a set of proteins from the Mintseris database.

Ceres N, Pasi M and Lavery R (2012) J. Chem. Theory Comput. 8:2141–2144
Engelen S, Trojan LA, Sacquin-Mora S, Lavery R, Carbone A. (2009) PLoS Comput Biol. 5:e1000267
Levy ED (2010) J Mol Biol. 403:660-70
Lopes A, Sacquin-Mora S, Dimitrova V, Ponty Y, Carbone A (2013) (submitted)
TOP
L102 - Deriving High Resolution Local Sequence Dependent Ramachandran Maps
Short Abstract: The most accurate maps to date describing the Ramachandran dihedral angle pair propensities for all 20 single and 8000 triplet amino acid sequences are derived from high-resolution X-ray crystal structures using a robust information-theoretic approach. I first establish a general information maximization procedure and then apply it specifically to derive highly resolved Ramachandran maps at a resolution of 10 degree radius, the optimum resolution dictated directly by the current volume of structural data. Deriving a large number of high-resolution sequence-dependent PDFs was made possible by the strategy to utilize the latent information contained in all viable high-resolution structures found in the Protein Data Bank (PDB) totaling more than 77,000 chains, including redundant chains from highly represented protein families. The procedure allows for the ability to further refine plots and adjust key parameters as new structures and folds are added to the PDB. As components of a backbone potential, these high-resolution Ramachandran maps are shown here to perform better in extensive threading tests compared to other triplet maps published in literature. This work is of general interest because it advances a fully automatic computational device that can be applied to other data-derived probability distribution functions. This device exploits natural information-theoretic notions linking naturally to improved performance of knowledge-based potentials, providing an easy computational recipe for empirically deriving all kinds of sequence-dependent structural probability distribution functions with greater detail and precision. The high-resolution Ramachandran maps derived in this work are available for download.
TOP
L103 - GrAlign: fast, flexible alignment of protein 3D structures
Short Abstract: Motivation: Protein structure alignment is key for transferring information from well-studied proteins to less studied ones. Structural alignment identifies the most precise mapping of equivalent residues, as structures are more conserved during evolution than sequences. Among the methods for aligning protein structures, maximum Contact Map Overlap (CMO) has received sustained attention during the past decade. Yet, known algorithms exhibit modest performance and are not applicable for large-scale comparison.

Results: Graphlets are small induced subgraphs that are used to design sensitive topological similarity measures between nodes and networks. By extending the notion of graphlets to the case of ordered graphs, we introduce GrAlign, a CMO heuristic that is suited for database searches. On popular benchmarks from the literature (the Skolnick set, containing 40 protein domains, and the Proteus_300 set, containing 300 protein domains), GrAlign outperforms the state-of-the-art CMO heuristic in terms of running times, and its similarity score is in better agreement with the SCOP classification. In a large-scale experiment over the whole Astral-40 database (11,160 protein domains), we show that GrAlign's flexible alignments are preferable to the traditional alignments based on rigid body superimposition: GrAlign's top scoring alignments are in better agreement with the SCOP classification, have better coverage in terms of matched residues, and can be used for identifying the rigid and flexible regions of the proteins.
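For readers unfamiliar with the Contact Map Overlap objective, the following Python sketch shows how the overlap of a given alignment can be scored. It does not implement GrAlign or its graphlet-based heuristic. The 7.5 Å cutoff, the use of one coordinate per residue, and all function names are assumptions made for illustration, and the alignment is assumed to preserve residue order.

```python
import math


def contact_map(coords, cutoff=7.5):
    """Residue pairs (i, j), i < j, whose coordinates lie within `cutoff`."""
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


def contact_overlap(contacts_a, contacts_b, alignment):
    """Number of contacts in A that map onto contacts in B.

    `alignment` maps residue indices of A to residue indices of B and must be
    order-preserving, so that mapped pairs keep the (smaller, larger) layout.
    """
    return sum(
        1
        for i, j in contacts_a
        if i in alignment and j in alignment
        and (alignment[i], alignment[j]) in contacts_b
    )


# Two toy four-residue chains laid out along the x axis.
chain_a = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (20, 0, 0)]
chain_b = [(0, 0, 0), (3, 0, 0), (30, 0, 0), (33, 0, 0)]
map_a, map_b = contact_map(chain_a), contact_map(chain_b)
print(contact_overlap(map_a, map_b, {0: 0, 1: 1, 2: 2}))
```

A CMO method searches over alignments to maximize this count; the sketch only evaluates one alignment supplied by the caller.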
TOP
L104 - Molecular modeling of Sonic Hedgehog calcium binding suggests novel regulatory mechanism
Short Abstract: Hedgehog (Hh) signaling is essential for the development of nearly every organ system in vertebrates. One representative protein of the Hh family in mammals is Sonic Hedgehog (Shh). Shh is also grouped into the LAS family (Lysostaphin type enzymes, D-Ala-D-Ala metalloproteases, Sonic Hedgehog), and shares with the members of this family a special zinc center. Remarkably, Shh is the only LAS member without a proven peptidase activity, and it exhibits an extra metal center with two calcium ions not far from the zinc center.
We have studied by molecular dynamics simulations, structural alignments and electrostatic calculations the effect of these calcium ions on Shh structure. Our calculations show that the presence of calcium ions has an impact on Shh properties and stabilizes the overall structure. The binding of the second calcium ion makes Shh significantly less similar to LAS enzymes, suggesting the inactivation of putative peptidase function of Shh. Moreover, electrostatic potential differences among calcium states suggest the possible binding of non polar substrates. This regulatory mechanism could have many implications in the biological function of Hh proteins.
TOP
L105 - Quantification of the Impact of PSI:Biology According to Functional Annotations of the Determined Structures
Short Abstract: We present new approaches to quantifying the functional attributes assigned to groups of protein structures. In one metric, protein structures were analyzed regarding their mean number of functional category assignments. A second metric of Shannon diversity, as adapted from ecological studies, was used to quantify the number of unique structure-function relationships addressed across groups of structures or structural projects. Functional assignments were utilized at the entire protein level, primarily retrieved from UniProt, and at the level of specific residues, as facilitated by the SIFTS mapping project. We considered proteins assigned to a relatively large number of functional assignments to have high biological relevance, as they have been previously well characterized by the biomedical community. Using the two metrics, structures determined via Protein Structure Initiative (PSI):Biology projects were compared to those determined during the first two PSI phases and to those determined in the US with structural genomics excluded. One result is that PSI:Biology has been successful in realizing its goal of achieving a higher focus on determining structures of biologically relevant proteins than PSI:1&2. Also, there is clear evidence of a trend that team-based structural projects, as done through PSI:Biology Partnerships, have determined the structures with the highest biomedical relevance. We infer that the collaborative efforts of PSI:Biology Partnerships are able to tie together functional and structural relationships at a more effective rate than individual investigators.

This work was supported in part by a grant from the National Institute of General Medical Sciences (U01 GM093324) to develop the Structural Biology Knowledgebase.
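The two metrics named in the abstract, the mean number of functional category assignments and a Shannon diversity, can be sketched in Python as follows. The standard ecological form H = -Σ p_i ln p_i is assumed here; the abstract does not give the exact adaptation used, and the input format (a mapping from structure ID to a list of functional categories) and function names are hypothetical.

```python
import math
from collections import Counter


def mean_assignments(structure_functions):
    """Mean number of functional category assignments per structure."""
    if not structure_functions:
        return 0.0
    total = sum(len(funcs) for funcs in structure_functions.values())
    return total / len(structure_functions)


def shannon_diversity(structure_functions):
    """Shannon diversity H = -sum(p * ln p) over functional categories."""
    counts = Counter(f for funcs in structure_functions.values() for f in funcs)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


# Toy project: three structures annotated with two functional categories.
project = {
    "struct1": ["kinase", "transport"],
    "struct2": ["kinase"],
    "struct3": ["transport"],
}
print(mean_assignments(project))
print(shannon_diversity(project))
```

A project whose structures all fall into one category scores zero diversity, while an even spread over many categories scores higher.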
TOP
L106 - Annotation network from different data sources of genes and proteins by correlation analysis
Short Abstract: In the construction of databases, genes are analyzed and annotated through independent resources, original databases, and bioinformatics tools. This information is then merged and integrated into a larger database. In such analyses, annotations are used as if they were independent, even though some annotations are correlated with each other. To interpret complex multiple annotations, we comprehensively examined the correlations among annotations for human genes. We selected ten annotations (gene family, Gene Ontology, InterPro, KEGG pathway, protein-protein interaction, SCOP, SOSUI membrane protein prediction, OMIM, tissue specificity of gene expression, and subcellular localization) from the integrated human gene database, H-InvDB. For all pairs of terms, the correlations were evaluated using Fisher's exact (two-sided) test with Bonferroni correction. As a result, we found 21,047 pairs with positive correlation. Among them, we found many annotations that shared almost the same genes. By examining the shared genes, we reconstructed the designed hierarchical relationships of the annotations, such as Gene Ontology (GO) and InterPro. Although the hierarchical information of GO terms and InterPro terms can be obtained from the data source provider, this analysis also allowed us to construct a new (partially hierarchical) cross-database annotation network among them, including KEGG pathway, OMIM disease, gene family, etc. The results will help us to interpret and analyze information on genes and proteins.
TOP
