19th Annual International Conference on
Intelligent Systems for Molecular Biology and
10th European Conference on Computational Biology


Accepted Posters

Category 'W' - 'Structure and Function Prediction'
Poster W01
A Time-Dependent Framework for Solving the Degeneracy Problem in RNA Secondary Structure Deleterious Mutation Prediction

Danny Barash Ben-Gurion University
Alexander Churkin (Ben-Gurion University, Computer Science);
 
Short Abstract: The problem of predicting which single point mutation in a given RNA sequence will induce a conformational rearrangement in its secondary structure may often encounter multiple solutions. The simultaneous integration of the chemical master equation by the Gillespie algorithm for RNA mutant sequences in realistic times may solve this problem.
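As a rough illustration of the Gillespie algorithm mentioned above, the following minimal sketch simulates one stochastic trajectory through a small set of conformations. The state names, the rate constants, and the `gillespie` function itself are hypothetical and chosen only for illustration; the abstract does not specify how the authors define states or transition rates.

```python
import random


def gillespie(rates, start, t_end, seed=0):
    """Simulate one trajectory of a continuous-time jump process.

    rates: dict mapping each state to a dict {next_state: rate constant}.
           (Hypothetical input format, not taken from the abstract.)
    Returns a list of (time, state) pairs, starting with (0.0, start).
    """
    rng = random.Random(seed)
    t, state = 0.0, start
    path = [(t, state)]
    while True:
        moves = rates.get(state, {})
        total = sum(moves.values())
        if total <= 0.0:
            break  # absorbing state: no outgoing transitions
        # The waiting time until the next jump is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose the next state with probability proportional to its rate.
        r = rng.random() * total
        acc = 0.0
        next_state = state
        for candidate, k in moves.items():
            acc += k
            next_state = candidate
            if r < acc:
                break
        state = next_state
        path.append((t, state))
    return path


if __name__ == "__main__":
    # Toy example: a wild-type fold that can switch to an alternative fold and back.
    toy_rates = {"wt": {"alt": 0.5}, "alt": {"wt": 0.1}}
    for time, conformation in gillespie(toy_rates, "wt", t_end=20.0, seed=42):
        print(f"{time:8.3f}  {conformation}")
```

Running many such trajectories for each mutant sequence would give time-dependent occupancies of each conformation, which is one way a time-limited comparison between mutants could be set up.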

Poster W02
Correcting protein alignment errors using a novel multiple-template threading method

Jinbo Xu Toyota Technological Institute at Chicago
Jian Peng (Toyota Technological Institute at Chicago, NA);
 
Short Abstract: Template-based protein modeling (TBM) suffers greatly from alignment errors, especially when the sequence and template are distantly related. We present a novel multiple-template threading method to address this problem, which employs a novel probabilistic-consistency algorithm to accurately thread a single sequence simultaneously to multiple templates. Multiple-template methods have been studied by several groups before, but none has shown significant improvement in alignment accuracy in a large-scale blind test (e.g., CASP). Our CASP9 results indicate that 1) using multiple templates, we can significantly improve alignment accuracy; 2) our multi-template models consistently outperform models built from the best single templates; and 3) without an accurate alignment, modeling improvement is not guaranteed simply by using multiple templates to increase alignment coverage. According to the CASP9 official evaluation, our method generated the best alignments for the 50 hardest TBM targets. Our method can also be applied to the alignment of multiple protein/RNA sequences and structures.

Poster W03
All-atom knowledge-based potential for RNA structure prediction and assessment

Emidio Capriotti Stanford University
Tomas Norambuena (Pontificia Universidad Católica de Chile, Departamento de Genética Molecular y Microbiología); Marc A. Marti-Renom (Centro de Investigacion Principe Felipe, Department of Bioinformatics and Genetics); Francisco Melo (Pontificia Universidad Católica de Chile, Departamento de Genética Molecular y Microbiología);
 
Short Abstract: The view that RNA serves simply as an information transfer molecule has changed dramatically. The study of sequence/structure/function relationships in RNA is becoming more important. Therefore, new methods for assessing the accuracy of RNA structure models are needed. We introduce an all-atom knowledge-based potential for the assessment of RNA three-dimensional structures. We have benchmarked our new potential, called RASP, with two different decoy datasets composed of near-native RNA structures. In one of the benchmark sets, RASP was able to rank the native structure as the best and within the top 10 models for ~93% and ~95% of decoys, respectively. The average correlation coefficient between model accuracy, calculated as the RMSD and GDT-TS measures of C3’ atoms, and the RASP score was 0.85 and 0.89, respectively. On a recently released benchmark dataset, the RASP scoring function compared favorably to previously developed methods in the selection of accurate models.

Poster W04
A structural and dynamical model of human telomerase

Samuel Flores Uppsala University
Christina Waldsich (University of Vienna, MFPL);
 
Short Abstract: As described below, the human telomerase complex has tremendous consequences for human health and longevity. In this presentation I will show how we predicted the structure of much of the complex from fragments crystallized from other organisms, combined with biochemical evidence, using RNABuilder, our recently completed modeling code. I will also show how we combined further evidence on the dynamics of telomere elongation to create a movie of the process. I discuss the implications of the model and propose focused experiments for further validation.

Poster W05
Optimization of residual electrostatic environment to enhance protein thermal stability

Chi-Wen Lee National Chiao Tung University
 
Short Abstract: Electrostatic interactions are known to be among the most important factors affecting protein thermal stability, yet this knowledge has not been applied systematically to the engineering of thermostable proteins. Here we present a new computational method that optimizes electrostatic interactions by predicting mutations that increase thermal stability; the predicted mutations are then introduced by experimental mutagenesis. The basic idea is to create novel salt bridges in the protein to enhance its thermal stability. We applied our method to ?-glucosidase A, a glycoside hydrolase that catalyzes the hydrolytic cleavage of glycosidic bonds, and were able to produce more stable variants, with improvements of up to 15.8 °C. The final results indicate that inter-subunit electrostatic interactions are more important than intra-subunit ones for enhancing thermal stability. The method is efficient and easy to implement; it can serve as a tool for designing more thermostable mutants and can be extended to find hot-spot residues responsible for enhancing the thermostability of protein-protein interactions via oligomerization.
 
Poster W06
Complex networks govern coiled coil oligomerization - predicting and profiling by means of a machine learning approach

Ulrich Bodenhofer Johannes Kepler University
Carsten Mahrenholz (Charite Medical School, Institute of Medical Immunology); Ingrid Abfalter (Johannes Kepler University, Institute of Bioinformatics); Rudolf Volkmer (Charite Medical School, Institute of Medical Immunology); Sepp Hochreiter (Johannes Kepler University, Institute of Bioinformatics);
 
Short Abstract: Understanding the relationship between protein sequence and structure is one of the great challenges in biology. In the case of the ubiquitous coiled coil motif, structure and occurrence have been described in extensive detail, but there is a lack of insight into the rules that govern oligomerization, i.e., how many alpha-helices form a given coiled coil. To shed new light on the formation of two- and three-stranded coiled coils, we developed a machine learning approach to identify rules in the form of weighted amino acid patterns. These rules form the basis of our classification tool PrOCoil, which also visualizes the contribution of each individual amino acid to the overall oligomeric tendency of a given coiled coil sequence. We discovered that sequence positions previously thought irrelevant to direct coiled coil interaction have an undeniable impact on stoichiometry. Our rules also demystify the oligomerization behavior of the yeast transcription factor GCN4, which can now be described as a hybrid - part dimer and part trimer - with both theoretical and experimental justification.

Poster W07
Combining inference and integration for multiple ranked data from genomic studies

Michael Schimek Medical University of Graz
Alena Mysickova (Max Planck Institute for Molecular Genetics, Bioinformatics); Eva Budinska (Swiss Institute of Bioinformatics, Lausanne, Bioinformatics Core Facility);
 
Short Abstract: Current experimental molecular research typically produces a large amount of data from rather small studies, many of them addressing similar questions. Public data repositories make such results easily accessible. A key problem is the combination of data across studies that differ in characteristics such as laboratory technology, quantification of measurements, and size. Only highly conforming subsets of these data are relevant for integration. Here, rank-based methods are most useful. We introduce an approach to deal with several rankings of the same set of genes or proteins that combines the estimation of the lengths of informative sublists with their stochastic aggregation. The goal is to obtain a smaller set of consolidated genes or proteins in a new rank order. Our approach accommodates not only very large data sets but also irregular and incomplete rankings due to random and missing assignments. Its advantages are exemplified on a meta-analysis of microarray experiments.

Poster W08
The imprint of codons on protein structure

Charlotte Deane Oxford University
 
Short Abstract: The central dogma of molecular biology describes the unidirectional flow of interpretable data from genetic sequence to protein sequence. This has led to the idea that a protein’s structure is dependent only on its amino acid sequence. Analysing the input (mRNA) and output (protein) of translation, we find that local protein structure information is encoded in the mRNA nucleotide sequence. Using a detailed mapping between over 4000 solved protein structures and their mRNA we have carried out a comprehensive analysis of codon usage across many organisms. We found no evidence that domain boundaries are enriched with slow codons. In fact, genes seemingly avoid slow codons around structurally defined domain boundaries. Translation speed, however, does decrease at the transition into secondary structure. These results support the premise that codons encode more information than merely amino acids and give insight into the role of translation in protein folding.

Poster W09
PyRy3D: a software tool for modelling of large macromolecular complexes

Joanna Kasprzak Adam Mickiewicz University in Poznań
Wojciech Potrzebowski (International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, 02-109 Warsaw, POLAND, Laboratory of Bioinformatics and Protein Engineering); Janusz M. Bujnicki (International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, 02-109 Warsaw, POLAND, Laboratory of Bioinformatics and Protein Engineering);
 
Short Abstract: One of the major challenges in structural biology is to determine the structures of macromolecular complexes and to understand their function and mechanism of action. However, compared to structure determination of the individual components, structural characterization of macromolecular assemblies is very difficult. To maximize the completeness, accuracy and efficiency of structure determination for large macromolecular complexes, a hybrid computational approach is required that can incorporate spatial information from a variety of experimental methods (such as X-ray crystallography, NMR, cryo-EM, cross-linking and mass spectrometry) into the modeling procedure. For many biological complexes such an approach might become the only way to retrieve structural details essential for planning further experiments, e.g. in order to explain the mechanism of action.
We developed PyRy3D, a method for building and visualizing low-resolution models of large macromolecular complexes. The components can be represented as rigid bodies (e.g. macromolecular structures determined by X-ray crystallography or NMR, theoretical models, or abstract shapes) or as flexible shapes (e.g. disordered regions or parts of a protein or nucleic acid sequence with unknown structure). Spatial restraints are used to identify components interacting with each other, and to pack them tightly into contours of the entire complex (e.g. cryo-EM density maps or ab initio reconstructions from SAXS or SANS methods). Such an approach enables the creation of low-resolution models even for very large macromolecular complexes with components of unknown 3D structure. Our model building procedure applies a Monte Carlo approach to sample the space of solutions fulfilling the experimental restraints.
 
Poster W10
Planar shape representation based Protein function prediction

Heng Yang Drexel University
 
Short Abstract: We present a protein function prediction method that flattens 3D protein structures into 2D planes. The 2D planar grid encodes the 3D protein surface from different projection angles. A dimension reduction method (DRM) is used to reduce the dimensionality of a given protein. 28 different DRMs are applied to each 3D protein structure and evaluated by a voting system to pick the best reduction result.
The method is divided into 3 parts: 3D protein surface generation, dimension reduction, and surface alignment. The 3D protein surface is generated for a given PDB file and the vertex coordinates are scanned into a matrix. Electrostatic properties are assigned to the vertices as color values with the help of the APBS tool. The surface matrix is fed to the DRM system and the results are ranked. Our method requires 20 planes. Grid points on the projected planes represent vertices of the original protein surface, and the color value of each point indicates the vertex’s electrostatic property. Corresponding planes of the test protein and the known protein are compared to detect the surface structure similarity of the two proteins.
 
Poster W11
The conserved structures of RNA-dependent RNA polymerases

Heli Mönttinen University of Helsinki
Minna Poranen (University of Helsinki)
 
Short Abstract: RNA-dependent RNA polymerases (RdRps) are responsible for the replication and transcription of RNA viruses. Like DNA-dependent DNA polymerase I and RNA-dependent DNA polymerases, RdRps are structurally right-handed, including palm, finger and thumb motifs. The catalytic site is situated in the palm motif. In bacteriophage phi6, there is a Mn2+ binding site at a position ~6 Å from the catalytic ions. This noncatalytic ion is needed for the correct coordination of nucleotides for catalysis and for the dynamic functions of the RdRp. The noncatalytic ion binding site is conserved in a number of RdRps, including those of poliovirus and Dengue virus. Many other RdRps are highly stimulated in the presence of Mn2+ ions. However, the noncatalytic ion has not been detected in all RdRps. This might reflect the different crystallization conditions applied in solving the RdRp structures. In this project we have identified the noncatalytic ion binding sites by means of structural alignments and distance calculations. Our preliminary results suggest that the noncatalytic ion binding site is conserved in the RdRps of positive-stranded and double-stranded RNA viruses and putatively also in the reverse transcriptases of retroviruses. These results give insights into the structural and functional co-evolution of RdRps. The studies of structural co-evolution will be extended by using computational methods (for instance, identifying the sequence fingerprint of RdRps on the basis of structural alignment and predicting protein-ligand interactions).
 
Poster W12
MODORAMA - a web application for comparative protein structure modeling

Jan Kosinski Sapienza University of Rome
Alessandro Barbato (Sapienza University of Rome, Department of Biochemical Sciences); Anna Tramontano (Sapienza University of Rome, Department of Physics); Pascal Benkert (University of Basel, Biozentrum);
 
Short Abstract: Building protein models maximally useful for practical biological analyses is still a difficult task. Thus, we developed a new modeling platform, Modorama, which both greatly facilitates critical steps in comparative modeling and allows for easy integration of biological knowledge.

In Modorama, users can either go through the full modeling procedure starting from a sequence, or discriminate between a user-defined set of alignments and templates, or just refine an existing target-template alignment.

The best modeling templates can be selected according to automatic scores as well as various structural and functional annotations using an intuitive tabular interface. This interface makes it easy to search for, e.g., the “best scoring, best quality template solved in the presence of ATP and DNA”.

Target-template alignments can be manually refined prior to modeling. Utilizing an interactive alignment editor, users can analyze real-time changes in alignment quality without actually building a model. This alignment quality can be evaluated in a similar way to typical expert analysis, which includes evaluation of sequence conservation, secondary structure and solvent accessibility agreement, or a residue-level QMEAN energy profile.

Importantly, the alignment evaluations are automatically performed also for representative target homologs, giving an estimate of how the whole target family “fits” to a given template. Also, potential errors are automatically detected and highlighted.

The interactive interfaces of Modorama such as template selection tables or the alignment editor have been implemented using JavaScript and they can be used in any browser without the need of installing any plug-ins.

Modorama is available at http://biocomputing.it/Modorama
 
Poster W13
Mean Scaled RMSD: A Size Independent Measure For Superimposition of Protein Structures

Seyed Shahriar Arab Institute for Research in Fundamental Sciences (IPM)
Rosa Aghdam (Institute for Research in Fundamental Sciences (IPM), Bioinformatics); Masoud Rahimi Ghazi Kalayeh (Institute for Research in Fundamental Sciences (IPM), Computer Science); Changiz Eslahchi (Shahid Beheshti University, Mathematics); Mehdi Sadeghi (National Institute of Genetic Engineering and Biotechnology, Biochemistry); Hamid Pezeshk (University of Tehran, Center of Excellence in Biomathematics, College of Science);
 
Short Abstract: To compare the three-dimensional structures of two molecules, biologists mostly use the RMSD (root mean square deviation) of the atomic coordinates after optimal rigid body superimposition as the similarity measure. For segments of the same size, the smaller the RMSD is, the more similar the two structures are. What is less clear is how to compare RMSD values for segments of different lengths.
In this work, we introduce a new measure, mRMSD (mean-scaled RMSD) of structural similarity based on RMSD that is independent of the sizes of the segments.
We show how mRMSD fits can be used to identify protein domains and protein mobility. Lastly we investigate the use of mRMSD to evaluate predicted protein structures.
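For reference, the plain RMSD that mRMSD builds on can be sketched as follows. This is a minimal illustration that assumes the two coordinate sets are already optimally superimposed and listed in the same atom order; it does not implement mRMSD, whose scaling is not defined in this abstract. The function name `rmsd` is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z).

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between corresponding atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Because the sum is divided only by the number of atoms, values obtained for segments of different lengths are not directly comparable, which is the gap a size-independent measure aims to close.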
 
Poster W14
Probabilistic model for secondary structure prediction from protein chemical shifts

Martin Mechelke MPI for Developmental Biology
Michael Habeck (Department of Protein Evolution, MPI for Developmental Biology);
 
Short Abstract: (This poster is based on Proceedings Submission 142)
Motivation:
Protein chemical shifts have long been known to encode valuable structural information that is difficult and computationally costly to describe at a fundamental level. Statistical and machine learning approaches that infer the relationship between chemical shifts and secondary structure have a long tradition in biomolecular NMR. These methods range from the simple chemical shift index to complex neural networks. In the most successful prediction algorithms, the relation between secondary structure and chemical shift is obscured by the complexity and lack of transparency of the prediction device, which often involves many parameters.
Results: We present hidden Markov models with Gaussian and non-Gaussian emission probabilities to model the dependence between protein chemical shifts and secondary structure. The continuous emission probabilities are modeled as conditional probabilities for a given amino acid and secondary structure type. We estimated and tested several multivariate probability distributions, including uncorrelated and fully correlated Gaussian distributions. To capture asymmetries and outliers in the chemical shift distributions, we also estimated multivariate Normal inverse Gaussian distributions, which are more flexible than Gaussians and can account for skewness and heavy tails. Using these distributions as outputs of first- and second-order HMMs, we achieve a maximum prediction accuracy of 82.7%, which is competitive with existing methods for predicting secondary structure from protein chemical shifts.
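To make the modeling idea concrete, here is a minimal sketch of Viterbi decoding in a first-order HMM with Gaussian emissions. It is deliberately simplified: each state emits a single one-dimensional value with its own mean and standard deviation, whereas the abstract describes multivariate emissions conditioned on amino acid type. All parameter values used with it would be hypothetical, and the function names are illustrative.

```python
import math


def log_gauss(x, mean, sd):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def viterbi(obs, states, log_start, log_trans, emission):
    """Most probable state path for a non-empty observation sequence.

    log_start[s]    : log initial probability of state s
    log_trans[p][s] : log probability of moving from state p to state s
    emission[s]     : (mean, sd) of the Gaussian emitted by state s
    """
    score = {s: log_start[s] + log_gauss(obs[0], *emission[s]) for s in states}
    backpointers = []
    for x in obs[1:]:
        new_score, pointer = {}, {}
        for s in states:
            # Best predecessor for state s at this position.
            best = max(states, key=lambda p: score[p] + log_trans[p][s])
            pointer[s] = best
            new_score[s] = score[best] + log_trans[best][s] + log_gauss(x, *emission[s])
        backpointers.append(pointer)
        score = new_score
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]
```

In this setting the states would correspond to secondary structure classes and the observations to per-residue chemical shifts; the explicit emission parameters are what keeps the relation between shift and structure inspectable.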
 
Poster W15
Multi-LZerD: Multiple protein docking for asymmetric complexes

Juan Esquivel-Rodríguez Purdue University
Daisuke Kihara (Purdue University, Biological Sciences/Computer Science);
 
Short Abstract: Binary and multimeric protein complexes are involved in many biological processes, mediating diverse important cellular functions. The tertiary structures of protein complexes provide crucial insight into the molecular mechanisms that regulate their functions and assembly. However, solving protein complex structures by experimental methods is often more difficult than solving single protein structures. Computational prediction methods are expected to make significant contributions in this area. Until now, most protein docking prediction methods have focused only on pairwise docking. A handful of methods exist for the prediction of multimeric complexes, but almost all of them assume specific properties such as homomericity or symmetry. Considering that a substantial number of multimeric complexes of diverse kinds exist in a cell, there is an urgent need for a multiple protein docking prediction method that does not target a specific type of complex. We have developed a novel multiple protein docking algorithm, Multi-LZerD, that builds models of multimeric complexes by effectively reusing pairwise docking predictions of component proteins. A genetic algorithm is applied to explore the conformational space, followed by a structure refinement procedure. A benchmark on seven hetero-multimeric complexes resulted in near-native conformations for all of them (a root mean square deviation smaller than 2 Å). Our framework was able to predict near-native structures for multimeric complexes of various topologies.
 
Poster W16
Modelling Protein Domain Binding to Kinase Structures altered by Phosphorylation

Christoph Gernert Helmholtz Centre for Infection Research
 
Short Abstract: This poster is based on Proceedings Submission 76.

In my previous work I connected experimental mass spectrometry (MS) data with a protein interaction database to recognize the most likely active signaling pathways. I utilized the information gained from quantitative MS experiments, and especially changes in protein modifications, to highlight the "flow" of information from one protein to another. The resulting interaction network represents proteins and their interactions, reflecting known posttranslational modifications such as phosphorylation. The spanned network of highlighted edges readily shows temporal or spatial changes that reveal the likely state of any signal transduction network.
Mass spectrometry experiments detect phosphorylations at different protein kinases but give no clue about their interaction partners. To check potential partners for a posttranslationally modified protein kinase, an available 3D model of the protein from the Protein Data Bank (PDB) is extended by a phosphate molecule at the corresponding site. The modified structure is then subjected to an in silico docking process, consecutively with the ten possible phosphoprotein-binding domains: 14-3-3, BRCT, C2, FHA, MH2, PBD, PTB, SH2, WD-40 and WW. In this process, various combinations of the two structures are tested to determine whether they fit together and thus indicate a protein interaction. Proteins that include a domain that returned a positive result in the docking process are potential interaction partners.
 
Poster W17
Full atom comparative three-dimensional modeling of protein-DNA complexes

Francisco Melo P. Universidad Catolica de Chile
 
Short Abstract: No free computer tools are currently available to build full atom three-dimensional (3D) models of protein-DNA complexes. Existing tools are only able to model either full atom DNA duplexes (e.g. 3DNA), full atom proteins (e.g. MODELLER) or coarse grain protein-DNA complexes (e.g. Rosetta). Among these, when duplex DNA molecules need to be modeled, the Rosetta modeling software suite and the 3DNA software are the most widely used. The major problem with currently existing tools is their inability to properly model the DNA sugar-phosphate backbone. Unfortunately, about 70% of the atomic contacts that DNA makes with proteins involve DNA atoms that belong to the sugar-phosphate backbone. Therefore, accurate modeling of the complete DNA duplex molecule is required if the final objective is to model the 3D structure of protein-DNA complexes.

In this work, new software that has been developed in our laboratory to build full atom 3D models of protein-DNA complexes is presented. The software was written in Python and it can be described as a new module of MODELLER software. The generated 3D models are consistent with CHARMM molecular topology and thus have proper stereochemistry. A detailed description of the method is provided, as well as an assessment of its performance, along with a comparison with models produced with other software.
 
Poster W18
Identification of cavities on protein surface using multiple computational approaches for drug binding site prediction

Bingding Huang Technical University of Dresden
 
Short Abstract: Protein-ligand binding sites are the active sites on the protein surface that perform protein functions. Thus the identification of those binding sites is often the first step in studying protein functions and in structure-based drug design. Many computational algorithms and tools have been developed in recent decades, such as LIGSITEcs/c, PASS, Q-SiteFinder, SURFNET and so on. In our previous work, MetaPocket, we showed that it is possible to combine the results of many methods to improve the prediction result. Here we continue our previous work by adding four more methods, Fpocket, GHECOM, ConCavity and POCASA, to further improve the prediction success rate. The new method, MetaPocket 2.0, and the individual approaches are all tested on two datasets of 48 unbound/bound and 210 bound structures as used before. The results show that the average success rate has been raised by 5% at the top 1 prediction compared to previous work. Moreover, we construct a non-redundant dataset of drug-target complexes with known structure from the DrugBank, DrugPort and PDB databases and apply MetaPocket 2.0 to this dataset to predict drug-binding sites. As a result, more than 74% of drug-binding sites on protein targets are correctly identified at the top 3 prediction, which is 12% better than the best individual approach. MetaPocket 2.0 is freely available at two web sites: http://projects.biotec.tu-dresden.de/metapocket/ and http://sysbio.zju.edu.cn/metapocket.
 
Poster W19
Physical contacts-based score for evaluation of protein three-dimensional models

Kliment Olechnovic Vilnius University Institute of Biotechnology
Eleonora Kulberkyte (Vilnius University Institute of Biotechnology, Department of Bioinformatics); Ceslovas Venclovas (Vilnius University Institute of Biotechnology, Department of Bioinformatics);
 
Short Abstract: The ability to effectively score protein structure models against the reference structure is at the heart of the comparison and development of protein structure prediction methods. Currently, the Global Distance Test (GDT) and the template modeling score (TM-score) are among the most popular scores. They are able to rank models of different levels of completeness, as is often the case when different prediction methods are used. However, both GDT and TM-score suffer from at least two major weaknesses. First, both scores consider only the main chain and are not able to assess the side chain prediction accuracy. Second, both scores perform poorly on multidomain proteins, as they are based on rigid-body comparison.

We have developed a new scoring method for model evaluation against the reference protein structure. The method is based on physical contacts derived from the Voronoi diagram of atoms. Therefore, the method considers real interatomic interactions within the protein structure and does not use any arbitrary parameters or cutoffs. The new score assigns the model quality in the [0, 1] range depending on the agreement with the reference structure. For single domain structures our contact-based score shows a strong correlation (over 0.9) with both GDT and TM-score, at the same time providing a significantly better resolution for models having a relatively accurate main chain trace. Moreover, the new score does not require structure superposition, and it works equally well on single-domain, multi-domain and even multi-chain protein models. We implemented the new method both as a standalone program and as a web server.
 
Poster W20
A Novel Approach for Predicting Disordered Regions

Meijing Li Chungbuk National University
Jung Kwang Su (Korea National Institution of Health, Division of Bio-Medical Informatics); Keun Ho Ryu (Chungbuk National University, Database/Bioinformatics laboratory);
 
Short Abstract: The study of disordered regions in proteins using data mining techniques has recently become an important research topic in bioinformatics. To date, a number of published predictors for these regions have been based on various algorithms and on properties of disordered protein sequences. In this study, we propose a novel approach for predicting disordered protein regions using emerging subsequence mining, without using the characteristics of disordered proteins. An emerging subsequence is a part of a protein sequence that occurs more frequently in the target class than in the source class. First, we adapt the new approach to generate emerging protein subsequences from public protein sequence data. Second, the disordered and ordered regions in a protein sequence are predicted by searching for the generated emerging subsequences with a sliding window; the matched regions tend to overlap. Third, the scores of the overlapping regions are calculated from the support and growth_rate values in both classes. Finally, the score of the predicted regions in the target class is compared with the score in the source class, and the class with the higher score is selected.
In this experiment, disordered sequence data were extracted from DisProt 4.9 and ordered sequence data from PDB; both were used as training data. The test data came from CASP 7, with known disordered and ordered regions. Compared with several published predictors, our approach achieved a higher accuracy rate in this experiment.
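The support and growth_rate quantities mentioned above can be sketched as follows. This assumes the usual emerging-pattern definitions (support is the fraction of sequences in a class that contain the subsequence; growth rate is the ratio of target support to source support), which the abstract does not spell out. The function names, the fixed subsequence length `k`, and the threshold `min_growth` are hypothetical.

```python
import math


def support(subseq, sequences):
    """Fraction of sequences that contain subseq as a contiguous substring."""
    if not sequences:
        return 0.0
    return sum(subseq in s for s in sequences) / len(sequences)


def growth_rate(subseq, target, source):
    """Ratio of support in the target class to support in the source class."""
    s_target = support(subseq, target)
    s_source = support(subseq, source)
    if s_source == 0.0:
        # Present only in the target class (or in neither class).
        return math.inf if s_target > 0.0 else 0.0
    return s_target / s_source


def emerging_subsequences(target, source, k, min_growth):
    """All length-k substrings of target sequences with growth rate >= min_growth."""
    candidates = {s[i:i + k] for s in target for i in range(len(s) - k + 1)}
    return sorted(c for c in candidates if growth_rate(c, target, source) >= min_growth)
```

With disordered sequences as the target class and ordered sequences as the source class, the returned subsequences are those that are characteristically enriched in disordered regions; swapping the two arguments gives the ordered-class patterns.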
 
Poster W21
The influence of the local sequence environment on RNA loop structures

Dirk Walther Max Planck Institute for Molecular Plant Physiology
Christian Schudoma (Max Planck Institute for Molecular Plant Physiology, Bioinformatics); Abdelhalim Larhlimi (Max Planck Institute for Molecular Plant Physiology, Bioinformatics);
 
Short Abstract: RNA folding is assumed to be a hierarchical process. The secondary structure of an RNA molecule, signified by base pairing and stacking interactions between the paired bases, is formed first. Subsequently, the RNA molecule adopts an energetically favourable 3D conformation in the structural space determined mainly by the rotational degrees of freedom associated with the backbone of regions of unpaired nucleotides (loops). To what extent the backbone conformation of RNA loops also results from interactions within the local sequence context or rather follows global optimisation constraints alone has not been addressed yet. Because the majority of base stacking interactions are exerted locally, a critical influence of local sequence on local structure appears plausible. Thus, local loop structure ought to be predictable, at least in part, from the local sequence context alone. To test this hypothesis, we used Random Forests on a non-redundant dataset of unpaired nucleotides extracted from 97 X-ray structures from the Protein Data Bank (PDB) to predict discrete backbone angle conformations given by the discretised η/θ pseudo-torsional space. Predictions on balanced sets with 4 to 6 conformational classes using local sequence information yielded average accuracies of up to 55%, thus significantly better than expected by chance (17%-25%). Bases close to the central nucleotide appear to be most tightly linked to its conformation. Our results suggest that RNA loop structure does not only depend on long-range base-pairing interactions. Instead it appears that local sequence context exerts a significant influence on the formation of the local loop structure.
 
Poster W22
Structure Prediction of CRISPR repeats in Prokaryotic Immunity

Sita Lange University of Freiburg
Dominic Rose (University of Freiburg, Bioinformatics); Omer Alkhnbashi (University of Freiburg, Bioinformatics); Rolf Backofen (University of Freiburg, Bioinformatics);
 
Short Abstract: Prokaryotes are known to acquire immunity against phages and viruses through a widely conserved RNA-based gene silencing pathway. Fragments of the foreign DNA are initially integrated into clusters of regularly interspaced short palindromic repeats (CRISPRs). During a new invasion these fragments are extracted from the CRISPR array as mature CRISPR-RNAs (crRNAs), which target the viral DNA for degradation. A small hairpin structure has been proposed to guide an endoribonuclease to the cleavage site and there is recent evidence of direct interaction between the stem-loop and the Cas6, one of the many CRISPR-associated (Cas) proteins. Based on sequence similarity, Kunin et al. (2007) reported 12 major families of CRISPR repeats and claimed that only six exhibit the typical hairpin motif. We have revisited structural properties of the CRISPR system on a genome-wide scale. Our results show that the hairpin motif is present in almost all families. Additionally, some sequences are misplaced and a few clusters ought to be subdivided into further families. Since repeats are able to form multiple stem-loop structures, we have developed an approach to predict the functional hairpin of a single CRISPR array by folding the repeat structures within the context of all spacer sequences. We show that the most probable hairpin in such a context is not always the minimum free energy (MFE) structure.
 
Poster W23
Peptides modulating conformational changes: From in silico design to preclinical proof of concept

Itamar Borukhov Compugen Ltd.
Yossef Kliger (Compugen Ltd., Research & Development); Ofer Levy (Compugen Ltd., Research & Development); Anat Oren (Compugen Ltd., Research & Development); Haim Ashkenazy (Compugen Ltd. and The Mina and Everard Goodman Faculty of Life Sciences, Research & Development); Zohar Tiran (Compugen Ltd., Research & Development); Amit Novik (Compugen Ltd., Research & Development); Avi Rosenberg (Compugen Ltd., Research & Development); Anat Amir (Compugen Ltd., Research & Development); Assaf Wool (Compugen Ltd., Research & Development); Amir Toporik (Compugen Ltd., Research & Development); Ehud Schreiber (Compugen Ltd., Research & Development); Dani Eshel (Compugen Ltd., Research & Development); Zurit Levine (Compugen Ltd., Research & Development); Yossi Cohen (Compugen Ltd., Research & Development); Claudia Nold-Petry (University of Colorado); Charles A. Dinarello (University of Colorado); Itamar Borukhov (Compugen Ltd., Research & Development);
 
Short Abstract: Blocking conformational changes in biologically active proteins holds therapeutic promise. Inspired by the susceptibility of viral entry to inhibition by synthetic peptides that block the formation of helix–helix interactions in viral envelope proteins, we developed a computational approach for predicting interacting helices. Using this approach, which combines correlated mutations analysis and Fourier transform, we designed peptides that target gp96 and Clusterin, two secreted chaperones known to shift between inactive and active conformations. The gp96-derived peptide and the Clusterin-derived peptide exhibit anti-inflammatory and anti-tumor activity, respectively, both in vitro and in vivo. Furthermore, the predicted mode of action of the active peptides was experimentally verified: both peptides bound to their parent proteins, and their biological activity was abolished in the presence of the peptides corresponding to the counterpart helices. These data demonstrate a novel method for the rational design of protein antagonists.
 
Poster W24
Spritz: Intrinsic protein disorder prediction for genomes and insights into protein function.

Ian Walsh University of Padua
Alberto J. Martin Martin (University of Padua, Department of Biology); Tomas Di Domenico (University of Padua, Department of Biology); Silvio Tosatto (University of Padua, Department of Biology);
 
Short Abstract: An alternative view is emerging with respect to non-folding regions, which suggests a reassessment of the structure-to-function paradigm. These non-folding flexible regions within a protein are known as disordered regions. This work is concerned with two algorithms for the prediction of this important protein feature:

Genomes: Disordered protein regions are key to numerous processes within an organism. Experimental annotations of disorder remain scarce (e.g. DisProt and the PDB) while current genomic efforts cause sequence databases to grow very quickly. ESpritz is an accurate and efficient method which can process entire organisms in hours (the human genome in approximately 6 hours). ESpritz uses the Bi-directional Recurrent Neural Network algorithm. We show that ESpritz is state-of-the-art with respect to Disopred and IUPred on X-ray, DisProt and NMR type disorder.
Server availability: http://protein.bio.unipd.it/espritz/. Executables can be downloaded from http://protein.bio.unipd.it/download/

Disorder and function: Disorder, secondary structure and possible functional amino acid patterns (linear motifs) are particularly useful for determining the biological role of a protein. Our server CSpritz allows the analysis of these three pieces of information. Motif patterns are taken from ELM (http://elm.eu.org/) and secondary structure from Porter (http://distill.ucd.ie/porter/). The algorithm for disorder prediction is based on averaging three orthogonal systems developed in-house. CSpritz is less efficient but more accurate than ESpritz. It is better suited to the examination of a small number of proteins which may be of biological interest. We benchmark CSpritz on the recent CASP9 dataset and show that it is state-of-the-art with respect to the other participating groups at CASP9.
Server availability: http://protein.bio.unipd.it/cspritz/
 
Poster W25
TargetP 2.0: Improved subcellular localization prediction

Daniel Edsgärd Royal Institute of Technology
 
Short Abstract: Determining the subcellular localization of a protein is an important first step toward understanding its function. The translocation of a protein within the cell is governed by intrinsic signals of the protein, and bioinformatics has become an important tool for the characterization and prediction of these protein-encoded sorting signals. TargetP is a neural-network-based subcellular localization tool which predicts chloroplast, mitochondrial and secretory pathway translocation based on N-terminal sequence motifs. It can be applied to newly identified proteins and used to characterize the proteome-wide localization patterns of a species' protein content. Here we report a comprehensive update of TargetP with regard to two major aspects. First, TargetP is extended to supply specific predictions for three phyla (plants, fungi and metazoa) rather than only for plants and non-plants. Second, new protein compartments are added. TargetP is currently available as a web server at http://www.cbs.dtu.dk/services/TargetP, where the new version will also be released.
 
Poster W26
Slip symmetry – a new type of symmetry in protein structures

Dukka KC NIH-NCI
Todd Taylor (NIH, NCI); Byungkook Lee (NIH, NCI);
 
Short Abstract: We have recently developed a procedure called SymD to determine whether a protein is symmetric or not (Kim et al., BMC Bioinfo., 2010). Applying SymD to the ASTRAL 40 dataset, we found 10 to 15% of proteins to be symmetric according to our criteria.

On detailed investigation of the results, we realized that some proteins have a high similarity score (Z-score) but differ from regular symmetric proteins. These proteins have what we call a ‘slip symmetry’, i.e. the structure superimposes well with itself when translated by a small amount (slipped) along a certain direction. Structural alignment between two proteins with slip symmetry can produce two or more different structure-based sequence alignments that have similar scores.

We have developed a procedure to determine whether a protein has a ‘slip symmetry’ using the Z-score, the symmetry angle, and the sequence alignment that SymD produces. The method is based on the average serial number difference between the aligned residues of proteins with a high Z-score and a small symmetry angle.

We have applied this algorithm to the whole Astral SCOP 1.75 dataset, which contains around 110,000 domains, and have obtained nearly 700 domains that have a slip symmetry. These include a large number of helix bundles but also some TIM-barrel (c.1), Rossmann (c.2), and other folds.
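The slip-symmetry criterion described above can be sketched as follows. This is only an illustration under stated assumptions: the self-alignment is given as a list of (i, j) residue serial number pairs, a small average serial number difference is taken to indicate a slip, and all three thresholds (`min_z`, `max_angle`, `max_shift`) are hypothetical placeholder values, not those used by the authors or by SymD.

```python
def mean_serial_difference(aligned_pairs):
    """Average |i - j| over aligned residue pairs (i, j) of a self-alignment."""
    if not aligned_pairs:
        raise ValueError("alignment is empty")
    return sum(abs(i - j) for i, j in aligned_pairs) / len(aligned_pairs)


def looks_like_slip_symmetry(aligned_pairs, z_score, angle_deg,
                             min_z=10.0, max_angle=20.0, max_shift=8.0):
    """Flag a structure as a slip-symmetry candidate.

    Assumed rule: high Z-score, small symmetry angle, and a small average
    serial number difference. All thresholds are hypothetical.
    """
    return (z_score >= min_z
            and angle_deg <= max_angle
            and mean_serial_difference(aligned_pairs) <= max_shift)


# Toy alignments (hypothetical): a helix shifted by four residues along its axis,
# and a two-fold repeat in which residue i aligns with residue i + 50.
slipped = [(i, i + 4) for i in range(1, 41)]
repeat = [(i, i + 50) for i in range(1, 51)]

slip_flag = looks_like_slip_symmetry(slipped, z_score=12.0, angle_deg=5.0)
repeat_flag = looks_like_slip_symmetry(repeat, z_score=12.0, angle_deg=5.0)
```

With these toy inputs the shifted helix has a mean serial number difference of 4 and is flagged, while the repeat has a mean difference of 50 and is not.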
 
Poster W27
A multiple-template approach to protein threading

Jian Peng Toyota Technological Institute at Chicago
Jinbo Xu (Toyota Technological Institute at Chicago)
 
Short Abstract: Due to the increasing number of solved structures, a protein without a solved structure is very likely to have more than one similar template structure. A natural question, therefore, is whether modeling accuracy can be improved by using multiple templates. This work describes a new multiple-template threading method to answer this question. At the heart of the method is a novel probabilistic-consistency algorithm that can accurately align a single protein sequence simultaneously to multiple templates. Experimental results indicate that our multiple-template method can improve pairwise sequence-template alignment accuracy and generate models of better quality than single-template models, even those built from the best single templates, whereas many popular multiple sequence/structure alignment tools fail to do so. The underlying reason is that our probabilistic-consistency algorithm can generate accurate multiple sequence/template alignments. In other words, without an accurate multiple sequence/template alignment, modeling accuracy cannot be improved by simply using multiple templates to increase alignment coverage. According to the CASP9 official evaluation, our method outperforms almost all other CASP9 servers and generated the best alignments for the 50 hardest template-based modeling targets. Our method was also voted by the CASP9 community as one of the most innovative and interesting methods. It will have greater potential in the near future, when many more templates become available due to the increasing number of solved structures. We can further improve alignment accuracy by extending our algorithm to simultaneously thread multiple homologous sequences to multiple templates.
 
Poster W28
A novel type of methyltransferase encoded by a conserved domain of GRAS

Magdalena Dymecka M. Skłodowska-Curie Cancer Center and Institute of Oncology
Katarzyna Kokoszyńska (M. Skłodowska-Curie Cancer Center and Institute of Oncology, -); Leszek Rychlewski (Bioinfobank Institute, -); Cordelia Bolle (Ludwig-Maximilians-Universität, -); Lucjan Wyrwicz (M. Skłodowska-Curie Cancer Center and Institute of Oncology, -);
 
Short Abstract: The GRAS proteins, expressed exclusively in plants, play an important role in plant growth and development and take part in processes such as signal transduction, gibberellic acid (GA) signaling and meristem maintenance. As the exact molecular mechanism of GRAS remains undiscovered, a detailed bioinformatics study was performed.
Sequence analysis of the GRAS proteins showed the presence of two regions: the DELLA domain and a second domain (also known as the GRAS domain). Previous reports on the GRAS domain, based on its nuclear localization and the presence of five characteristic sequence motifs, suggested that these proteins may belong to the family of transcription factors.
Here we present the results of a detailed bioinformatics study which reveal the presence of a novel methyltransferase fold within the GRAS domain. State-of-the-art bioinformatics methods (identification of homology via meta-profile methods as well as mapping of distant protein homology for homology modeling) served to prepare a structural model of the GRAS domain.
The presented results shed new light on the putative function of the GRAS proteins. Mapping of the identified functional regions allowed us to build a hypothesis on their molecular action; the analyses still require further experimental verification.
 
Poster W29
Implementation of the Protein Fluorescence And Structural Toolkit (PFAST)

Cynthia Prudence University of Rhode Island
Yana Reshetnyak (University of Rhode Island, Physics Department);
 
Short Abstract: The major goal in the application of tryptophan fluorescence spectroscopy is to interpret fluorescence properties in terms of structural parameters and to predict structural changes in the protein. We have developed methods for the mathematical analysis of fluorescence spectra of multitryptophan proteins aimed at revealing the spectral components of individual tryptophan residues or clusters of tryptophan residues located close to each other (Burstein et al., 2001, Biophys J., 81, 1699-1709; Reshetnyak and Burstein, 2001, Biophys. J., 81, 1710-1734). We have also created an algorithm for the structural analysis of the tryptophan environment in 3D atomic structures of proteins from the PDB (Reshetnyak et al, 2001, Biophys. J., 81, 1735-1758). The successful design of the methods of spectral and structural analysis opened an opportunity for establishing a relationship between the spectral and structural properties of a protein. We have integrated the developed software modules, introduced new programs for the assignment of tryptophan residues to spectral-structural classes, and created a web-based toolkit, PFAST: Protein Fluorescence and Structural Toolkit (Shen et al., 2008, Proteins, 71, 1744-1754). PFAST contains 3 modules: 1) FCAT, the fluorescence-correlation analysis tool, which decomposes protein fluorescence spectra and assigns spectral components to one of five previously established spectral-structural classes. 2) SCAT, the structural-correlation analysis tool, which calculates the structural parameters of the environment of tryptophan residues from the atomic structures of proteins from the PDB and assigns tryptophan residues to one of five spectral-structural classes. 3) The last module is the PFAST database.
 
Poster W30
Maximal Use of Information Found in All Protein Structure Data

Armando Solis New York City College of Technology, The City University of New York
 
Short Abstract: The universal mechanism of transforming occurrence frequencies in structural data into “energies” or scores, for use as knowledge-based potentials, requires a well-defined way of assembling a comprehensive structural data set from the PDB. My ongoing strategy, to link information-theoretic notions to the nature and action of empirical potentials, continues to explore ways to maximize the information that can be extracted from still limited data. I will report on the latest results of one particular application of this strategy. Specifically, I am advancing an improved method to use all high-resolution structures in the PDB to build more informative potentials. This is in contrast to the commonly employed strategy of using a non-redundant subset of structures, which limits database bias, but also leaves out potentially valuable knowledge. This approach is supported by fundamental information-theoretic formulations, with the consistent goal of maximizing discrimination between native and decoy conformations. I present my most current results, which appear to demonstrate that such an approach can potentially produce potentials that show markedly improved performance in fold recognition tests.
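The transformation of occurrence frequencies into scores mentioned above is commonly written as an inverse Boltzmann relation, score(s) = -ln(P_obs(s) / P_ref(s)). The sketch below illustrates that generic relation only; it is not the author's information-theoretic formulation. The state labels, the choice of reference distribution, the pseudocount and the toy counts are all hypothetical.

```python
import math
from collections import Counter


def knowledge_based_scores(observed, reference, pseudocount=1.0):
    """Return {state: -ln(P_obs / P_ref)} in units of kT.

    `observed` and `reference` are lists of discrete state labels, for example
    contact types seen in native structures and in a reference ensemble.
    A pseudocount keeps the logarithm finite for states unseen in one set.
    """
    states = set(observed) | set(reference)
    obs_counts, ref_counts = Counter(observed), Counter(reference)
    n_obs = len(observed) + pseudocount * len(states)
    n_ref = len(reference) + pseudocount * len(states)
    scores = {}
    for state in states:
        p_obs = (obs_counts[state] + pseudocount) / n_obs
        p_ref = (ref_counts[state] + pseudocount) / n_ref
        scores[state] = -math.log(p_obs / p_ref)
    return scores


# Toy counts (hypothetical): a residue pair is in contact more often in native
# structures than in the reference ensemble, so "contact" gets a favourable score.
observed = ["contact"] * 30 + ["no_contact"] * 10
reference = ["contact"] * 20 + ["no_contact"] * 20
scores = knowledge_based_scores(observed, reference)
```

States that are over-represented in the observed data relative to the reference receive negative (favourable) scores. The size and redundancy of the structure set enter through the counts, which is where the choice between all high-resolution structures and a non-redundant subset would matter.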
 
Poster W31
Investigating the effects of pH variations on structure and stability of maltotriose-binding protein from Thermus thermophilus using a combined approach of Fluorescence correlation spectroscopy and Mo

Anna Marabotti Italian National Research Council
Antonio Varriale (Italian National Research Council, Institute of Protein Biochemistry); Maria Staiano (Italian National Research Council, Institute of Protein Biochemistry); Luisa Iozzino (Italian National Research Council, Institute of Protein Biochemistry); Luciano Milanesi (Italian National Research Council, Institute of Biomedical Technologies); Sabato D'Auria (Italian National Research Council, Institute of Protein Biochemistry);
 
Short Abstract: We used a combination of a computational technique (Molecular Dynamics, MD) and of an experimental technique (Fluorescence Correlation Spectroscopy, FCS) to study the conformational variability of Maltotriose binding protein from T. thermophilus (MalE2) in different conditions of pH. The structure of the unliganded form of the protein was modelled and simulations were conducted in three different pH conditions by changing the protonation state of the protic residues according to their predicted pKa.
Results show that at extreme pH values multiple conformations of the protein are visible, indicating that the tertiary structure of the protein is highly unstable. This effect is strongest at acidic pH, indicating that the protein probably undergoes a pH-induced unfolding transition. In contrast, at neutral pH the protein shows only one cluster of tertiary structures with a low RMSF. Principal component analysis (PCA) identified some regions of the protein with increased flexibility. The diffusion coefficient calculated from MD simulations is in excellent agreement with the experimental one obtained by FCS. It is interesting to note that these two investigation techniques are both focused on single molecules, but their timescales are completely different, since FCS explores the dynamics of a protein in the microsecond time range.
This work provides evidence that MD simulations are able to anticipate phenomena occurring at a longer timescale, supporting the hypothesis that at these two extreme pH conditions the protein shows a conformational instability in its tertiary structure that can evolve into unfolded or less stable states.
 
Poster W32
Pathogenic-Or-Not-Pipeline (PON-P): Integration of predictors for disease relevance

Mauno Vihinen University of Tampere
Ayodeji Olatubosun (University of Tampere, Institute of Biomedical Technology); Jouni Väliaho (University of Tampere, Institute of Biomedical Technology);
 
Short Abstract: This work deals with the prediction of whether a non-synonymous single nucleotide polymorphism leads to disease or not. This is an important scientific problem, with potential applications in such areas as molecular diagnosis, prioritization of experiments and screening of variations. This work utilizes a unique approach in an effort to improve on the current performance limitations of predictors in this problem domain. The importance of this work to this particular conference is further highlighted by the fact that the SNP-special interest group one-day meeting is organized specifically to address the same problem dealt with in this work, which will also be the focus of the second special session in this meeting. The results of this work, and their implications would be of high interest to researchers in many fields.
 
Poster W33
Protein function prediction using clustering method of Protein-Protein Interaction network

Ji Hye Choi University of Dankook
Sejong Oh (University of Dankook)
 
Short Abstract: Assigning functions to novel proteins is one of the most important problems in the post-genomic era. Several approaches have been applied to this problem, including the analysis of gene expression patterns, phylogenetic profiles, protein fusions and protein-protein interactions. In this study, we develop a novel approach to infer a protein's function by clustering a protein-protein interaction network. In such a network, nodes represent proteins and edges represent interaction relationships. A characteristic feature of protein-protein interaction is that two interacting proteins tend to be functionally similar. Protein complexes can be detected by clustering the network, and within a protein complex each protein is functionally associated with the others. The clustering result can therefore be used to infer the functions of unknown proteins. We obtained protein-protein interaction data from MIPS and clustered them with ClusterONE in the Cytoscape software. In our experiments, prediction accuracy improved; in particular, the ratio of true positives (TP) increased markedly, whereas other methods such as neighbor counting and chi-square suffer from low TP accuracy. The proposed method promises to improve the accuracy of protein function prediction.
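The idea of transferring function within a cluster can be sketched as below. The abstract does not state how annotations are combined inside a cluster, so the majority vote used here is an assumption. The clusters are taken as given (for example, exported from a clustering tool), and the protein identifiers and function labels are hypothetical.

```python
from collections import Counter


def predict_functions(clusters, annotations):
    """Assign each unannotated protein the most common function in its cluster.

    clusters: list of lists of protein identifiers.
    annotations: dict mapping annotated proteins to a function label.
    Clusters without any annotated member yield no prediction.
    """
    predictions = {}
    for cluster in clusters:
        votes = Counter(annotations[p] for p in cluster if p in annotations)
        if not votes:
            continue
        best_function, _ = votes.most_common(1)[0]
        for protein in cluster:
            if protein not in annotations:
                predictions[protein] = best_function
    return predictions


# Toy input (hypothetical): two clusters, four annotated proteins.
clusters = [["P1", "P2", "P3", "P4"], ["P5", "P6"]]
annotations = {"P1": "transport", "P2": "transport", "P3": "kinase", "P5": "repair"}
predictions = predict_functions(clusters, annotations)
```

Here the unannotated protein P4 inherits "transport" from the majority of its cluster, and P6 inherits "repair" from its only annotated partner.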
 
Poster W34
Comprehensive study of HOX transcription factors DNA binding specificity

Katarzyna Kokoszynska Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology
Maja Małkowska (Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Laboratory of Bioinformatics and Systems Biology); Leszek Rychlewski (BioInfoBank Institute, -); Lucjan Wyrwicz (Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Laboratory of Bioinformatics and Systems Biology);
 
Short Abstract: The homeodomain-containing HOX protein family forms an important evolutionary network of transcription factors regulating development and differentiation in Metazoa. Despite the high level of homeodomain conservation within the protein family, particular HOX proteins are characterized by specific regulation of target genes. Previous structural studies of HOX representatives indicated that DNA recognition by the third alpha-helix of the homeodomain occurs via interactions with the major groove. An additional region responsible for specific DNA recognition, the so-called N-terminal arm, is involved in DNA minor groove interaction, yet very little is known about how the N-terminal arm achieves its specificity.
To better understand the specific mechanism of transcription regulation by HOX proteins, we performed a detailed structural study of protein-DNA interaction specificity. Using homology modeling methods, we prepared 3D models of the complete set of human HOX proteins and placed them in an open-access repository of HOX structures. We combined a summary of the experimental data (source: TRANSFAC database) with an analysis of evolutionary protein sequence conservation and a structural conformation study of HOX monomers. Separate analyses were made for the HOX proteins in heterodimer conformations with their cofactors, which play a major role in the binding specificity of the complex (TALE homeodomain proteins).

The obtained results are discussed along with the detailed analyses of expression pattern of HOX proteins and their target genes, giving a new insight into the potential mechanism of the DNA recognition specificity and processes regulated by these factors.
 
Poster W35
Structure Based Analysis of the Interactions between PAS Domain Containing Circadian Clock Proteins

SERAP BELDAR KOÇ UNIVERSITY
Serap Beldar (KOÇ UNIVERSITY); Özlem Keskin (KOÇ UNIVERSITY, Chemical and Biological Engineering); Halil Kavakli (KOÇ UNIVERSITY, Chemical and Biological Engineering); Attila Gürsoy (KOÇ UNIVERSITY, Computer Engineering);
 
Short Abstract: The mammalian BMAL1 and CLOCK proteins are transcription factors that bind the E-box and regulate the expression of clock-controlled genes. The PER2 and CRYPTOCHROME (CRY) proteins, on the other hand, act as negative regulators of this transcription. PER2 and CRY form ternary complexes with casein kinase I? in the cytoplasm and translocate into the nucleus, where they form a complex with BMAL1/CLOCK and suppress the expression of clock-controlled genes. Many experimental studies have been performed to characterize the interactions between these clock proteins. However, no study has addressed how the clock proteins assemble into a complex that includes all of them. To understand the nature of such a complex, we performed a structure-based analysis of protein-protein interactions among the clock proteins that contain a PAS domain (CLOCK, BMAL1 and PER2). The complex was analyzed using structural data and efficient structural comparison algorithms to predict potential interactions. Since no atomic structures are available for the clock proteins, homology models were used as structural data. In our model, BMAL1 and CLOCK interact with each other through their PAS domains, and PER2 interacts with this dimer through the PAS domain of CLOCK. This is the first demonstration of how the clock complexes form at the molecular level. This study is important not only for understanding the clock mechanism at the molecular level but also for the development of drugs against clock-regulated disorders such as jet lag and some forms of depression. We are currently carrying out experimental work to test this model.
 
Poster W36
The MULTICOM Toolbox for Protein Structure Prediction

Jianlin Cheng University of Missouri Columbia
Jilong Li (University of Missouri, Columbia, Computer Science); Zheng Wang (University of Missouri, Columbia, Computer Science); Jesse Eickholt (University of Missouri, Columbia, Computer Science); Xin Deng (University of Missouri, Columbia, Computer Science); Jianfeng He (University of Missouri, Columbia, Computer Science);
 
Short Abstract: As genome sequencing is becoming routine in biomedical research, the total number of protein sequences is increasing exponentially, reaching over 108 million recently. However, to date, only a tiny portion of these proteins (e.g. ~72,000) have known tertiary structures determined by experimental techniques. The gap between protein sequence and structure is still widening rapidly, as the throughput of genome sequencing is much higher than that of protein structure determination. Computational tools for predicting protein structure and structural features are crucial to making use of this vast repository of protein resources. To meet this need, we developed a comprehensive MULTICOM toolbox consisting of a set of protein structure and structural feature prediction tools, including secondary structure prediction, solvent accessibility prediction, disorder region prediction, domain boundary prediction, contact map prediction, disulfide bond prediction, beta-sheet topology prediction, fold recognition, multiple template combination and alignment, template-based tertiary structure modeling, protein model quality assessment, and mutation stability prediction. These tools have been rigorously tested by many users over the last several years and/or during the last three rounds of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7-9) from 2006 to 2010, achieving state-of-the-art or near state-of-the-art performance. In order to facilitate bioinformatics research and technology development in the field, we make the MULTICOM toolbox freely available for academic use and scientific research at http://sysbio.rnet.missouri.edu/multicom_toolbox/.
 
Poster W37
Comprehensive structure-to-function analysis of transcription factor IID (TFIID) complex

Maja Małkowska Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology
Maja Małkowska (Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology); Katarzyna Kokoszyńska (Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Laboratory of Bioinformatics and Systems Biology); Leszek Rychlewski (BioInfoBank Institute, BioInfoBank Institute); Lucjan Wyrwicz (Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Laboratory of Bioinformatics and Systems Biology);
 
Short Abstract: The transcription of protein-coding genes into mRNA in eukaryotes is mediated by RNA polymerase II (Pol II). The initiation of this process demands the cooperation of many transcription factors (TFIIs). The main component of the preinitiation complex is TFIID. It recognizes and binds the gene promoter, forms a scaffold for other transcription factors (TFIIA, TFIIB, TFIIE, TFIIF, TFIIH) and ensures the proper positioning of Pol II. TFIID is a large macromolecular complex composed of TATA-box binding protein (TBP) and a group of 14 conserved TBP-associated factors (TAFs). TAFs are known to regulate transcription at various levels: mediating transcription via interaction with activators, histone modifications, recognition of and binding to promoters, and serving as a platform for other transcription factors and RNA polymerase II.

Despite numerous previous studies of the TFIID complex, the structure of its components, and thus the exact mechanism of its function, remains undetermined. To address this, we performed a state-of-the-art structural bioinformatic analysis of the TFIID complex. Homology modeling of particular TAFs allowed us to determine a potential structure of TAF2 as an M1 aminopeptidase-like protein, and facilitated mapping of previously not fully characterized structural domains in well-studied TAF proteins (including full histone-like domains of TAF4 and 12 or TAF3 and 8 bromodomain associated structure). In the study we provide detailed structural models for all elements of TFIID from human and major model species, analyzed in the context of TFIID activity, with indication of structural alterations within TFIID in various model species.
 
Poster W38
Structure-based classification of the plant non-specific lipid transfer protein superfamily towards its functional characterization

Cécile Fleury INRA/CIRAD
Marie-Françoise GAUTIER (INRA, UMR AGAP); Franck MOLINA (SysDiag CNRS/Bio-Rad, UMR 3145); Frédéric de LAMOTTE (INRA, UMR AGAP); Manuel RUIZ (CIRAD, UMR AGAP);
 
Short Abstract: The non-specific Lipid Transfer Proteins (nsLTPs) show large variations in their sequences, biological roles, quaternary associations and the nature of bound hydrophobic ligands. In addition, they are involved in a large number of biological processes related to plant development and defense. However, they share a conserved eight-cysteine-residue pattern which plays an important role in the structural scaffold. For these reasons, the nsLTP superfamily constitutes an interesting case study to validate a method designed for the investigation of protein structure-function relationships. According to new identification criteria, 800 mature amino acid sequences belonging to more than 100 plant species have been selected and submitted to comparative analysis. Given the overall fold conservation and the good quality of the available experimental structures, reliable models were obtained for all nsLTPs by homology modeling, although sequence identity among nsLTP sequences is as low as 25%. Sequence- and structure-based phylogenies have been constructed independently, and the two resulting classifications have been compared. Using the evolutionary trace method, we observed that only the eight cysteines are strictly conserved across the whole family. Other, less conserved residues appear to be class-specific and are localized inside the hydrophobic cavity which accommodates the ligand. High variability is observed in the external loop, which is thought to be structurally unimportant but probably plays a role in driving the ligand into the cavity. We assume that this region may be responsible for ligand binding diversity. Further investigation will focus on the molecular determinants of this variability.
 
Poster W39
A comparative study on filtering protein secondary structure prediction

Petros Kountouris University of Cyprus
Chris Christodoulou (University of Cyprus); Michalis Agathocleous (University of Cyprus, Computer Science); Vasilis Promponas (University of Cyprus, Biological Sciences); Georgia Christodoulou (University of Cyprus, Computer Science); Simos Hadjicostas (University of Cyprus, Computer Science); Vassilis Vassiliades (University of Cyprus, Computer Science);
 
Short Abstract: Over the past 20 years, machine learning techniques and evolutionary information have significantly boosted the quality of protein secondary structure prediction methods. Moreover, filtering of the final predictions has been shown to improve and smooth the predictions, providing more realistic results. Both machine learning techniques and empirical rules have been used to filter the sequence-to-structure secondary structure prediction. Despite filtering being employed widely, to the best of our knowledge, no study has been carried out to find the most suitable filtering technique. In this paper, we perform a comparative study on the challenging problem of filtering, utilising both widely used empirical smoothing rules and machine learning techniques. In particular, we predicted the secondary structure using an ensemble of Bidirectional Recurrent Neural Networks and employed the WEKA software package to evaluate filtering with the following algorithms: Naive Bayes, Classification And Regression Tree, AdaBoost, Logistic function, Radial Basis Function (RBF) Neural Network, k-Nearest Neighbour, Multilayer Perceptron, J48 decision trees, Random Forest and combinations of the above using several voting schemes. In addition, we filtered the predictions using Support Vector Machines (SVM) by utilising the LibSVM software package. For this task, we also implemented a Hidden Markov Model and an RBF Network, which is initialised by a Self-Organising Map. Different local window sizes were tested to select the optimal one for each approach. Notably, the Logistic function and the SVM were found to be superior to the other tested methods in terms of both predictive accuracy (Q3) and the Segment Overlap score.
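As a rough illustration of what an empirical smoothing rule and the Q3 measure involve, the following Python sketch may help. The function names and the specific rule (helix segments shorter than three residues and strand segments shorter than two are relabelled as coil) are assumptions chosen for illustration; they are not necessarily the rules evaluated in the study.

```python
def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label (H, E, C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def smooth(predicted: str, min_len=None) -> str:
    """Hypothetical empirical filter: too-short H/E segments become coil."""
    min_len = min_len or {"H": 3, "E": 2}  # assumed thresholds
    out = list(predicted)
    i = 0
    while i < len(out):
        # Find the end of the run of identical labels starting at i.
        j = i
        while j < len(out) and out[j] == out[i]:
            j += 1
        state = out[i]
        if state in min_len and (j - i) < min_len[state]:
            out[i:j] = "C" * (j - i)
        i = j
    return "".join(out)


observed = "CCHHHHHCCEEEECC"
raw = "CCHHHHHCHEEEECC"  # contains an isolated one-residue helix
filtered = smooth(raw)
print(q3(raw, observed), q3(filtered, observed))
```

In this toy case the filter removes the isolated helix residue, so the filtered string matches the observed one. A learned filter would replace `smooth` with a classifier applied to a local window of predictions.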
 
Poster W40
firestar in 2011

Paolo Maietta Centro Nacional de Investigaciones Oncologicas
Michael Tress (Centro Nacional de Investigaciones Oncologicas); Gonzalo Lopez (Columbia University, C2B2); Jose Manuel Rodriguez (Centro Nacional de Investigaciones Oncologicas, INB); Alfonso Valencia (Centro Nacional de Investigaciones Oncologicas, Structural and Computational Biology Programme);
 
Short Abstract: Here we present the new developments in firestar, an expert system for predicting functional residues in protein sequences based on information extracted from the PDB. firestar extrapolates from the large inventory of catalytic and biologically relevant small molecule ligand binding residues that are organized in the FireDB database and makes predictions for those validated functional residues from FireDB that are supported by local sequence conservation in the query sequence.

Several new features have been incorporated into firestar to improve the quality and coverage of the predictions. Functional residues in the FireDB repository are now classified in terms of their biological relevance using evolutionary information, structural data and lists of known cognate ligands and we have implemented HHsearch as an additional, more powerful method of detecting remotely related templates.

Previous versions of firestar required human interpretation of the results. Now, the whole process has been automated and a new web interface has been made available. Additionally, the server is able to produce high quality results in a high throughput mode by using sequences as the only input.

firestar has been benchmarked against the CASP7, CASP8 and CASP9 ligand binding prediction targets. The server was able to detect ligand binding residues for 94% of the sites over all three CASP editions. firestar outperformed all officially participating groups in CASP8, showing that it is a state-of-the-art method, and initial results for the CASP9 experiment suggested that firestar outperformed all the other 15 servers on targets with biological ligands.
 
Poster W41
MinkoFit3D: Modeling of macromolecular structures into density maps by 3D Minkowski sum analysis.

Wojciech Potrzebowski International Institute of Molecular and Cell Biology in Warsaw
Janusz M. Bujnicki (International Institute of Molecular and Cell Biology in Warsaw, Laboratory of Bioinformatics and Protein Engineering);
 
Short Abstract: Experimental advances in producing electron microscopy data on structures of macromolecular assemblies have paved the way for theoretical methods that dock atomic models into low-resolution electron density maps. Although considerable progress has recently been made in building pseudo-atomic models based on a combination of cryo-EM and X-ray crystallography, putting together macromolecular assemblies consisting of tens or even hundreds of components remains a challenging and still unsolved problem. The same is true for combining structures with partial information, e.g. with a known molecular shape.
We developed a new method for fitting atomic models or any user-defined shapes into cryo-EM maps by means of a reduced mesh representation. Our program achieves this goal by calculating the Minkowski sum of the polyhedra representing the structure (or any user-defined shape) and the map. The position of the fitted component is obtained by finding “tight passages” in the configuration space defined by the Minkowski sum. This method was inspired by an approach used in mobile robot navigation control in environments with obstacles. The fit inferred from the Minkowski sum is subject to local refinement in the electron density map. This procedure applies iterative superposition of the atomic structure on the calculated density centers of the reference electron density map. Selected or all subunits forming the assembly may undergo rotational and translational movement to optimize the correlation coefficient and minimize the number of inter-subunit steric clashes until convergence.
 
Poster W42
BOCTOPUS: Topology prediction of Transmembrane beta barrel proteins

Sikander Hayat Stockholm University, Sweden
Arne Elofsson (Stockholm University, Department of Biophysics and Biochemistry);
 
Short Abstract: Transmembrane beta barrel proteins (TMBs) constitute 2-3% of the genome of Gram-negative bacteria and are found in the outer membrane of Gram-negative bacteria, chloroplasts and mitochondria, where they play a major role in the translocation machinery. TMBs also perform functions such as pore formation, membrane anchoring and ion exchange, and are candidate molecular targets for the development of antimicrobial drugs and vaccines.

We present here BOCTOPUS, a computational method for the topology prediction of TMBs. BOCTOPUS is based on ideas from computational methods that have previously been used for topology prediction of helical membrane proteins and that work by combining local and global predictions [1, 2]. BOCTOPUS is benchmarked using a non-redundant dataset of 37 TMBs with known structures. BOCTOPUS employs Support Vector Machines (SVMs) and a Hidden Markov Model (HMM) and uses a position-specific scoring matrix (PSSM) as input. In the first stage, three separate SVMs are used to predict strand/non-strand, in/not-in and out/not-out regions, and the output probabilities are used as the input for the HMM.

Preliminary results show that the SOV and Q3 of BOCTOPUS based on a leave-one-out cross-validation test are 92.8 and 87.8, respectively. Moreover, BOCTOPUS can predict 526 out of 559 (94.1%) strands in the dataset within +/- 3 residues. The BOCTOPUS method outperforms existing methods from the literature.

1. Improving the accuracy of transmembrane protein topology prediction using evolutionary information, Jones DT, Bioinformatics, 2007
2. OCTOPUS: improving topology prediction by two-track ANN-based preference scores and an extended topological grammar, Viklund H, Elofsson A, Bioinformatics, 2008
 
Poster W43
Analysis of Functional Islands using Structural Clustering

Anne-Christin Hauschild Max-Planck-Institute for Informatics
Ingolf Sommer (Max-Planck-Institute for Informatics, Computational Biology & Applied Algorithmics);
 
Short Abstract: Motivation: Over the past years protein structure databases have been growing at a rapid pace. Using structural comparison of proteins one can embed the proteins into a - not necessarily metric - space. Using available function annotations, one can analyze local conservation of function in structure space. Considerable differences in the local conservation of different molecular functions have been reported. Here we present a clustering method, useful to analyze and visualize the landscape of proteins in this space.
Method: Structure space can be represented as a weighted, directed graph: vertices represent proteins, connected by edges if the proteins are structurally similar. The edges are weighted by structural similarities.
The clustering method works as follows. Two connected vertices are assigned to the same cluster if they share a similar neighborhood, which means that the two vertices share the same adjacent vertices. We discuss how this strategy relates to established clustering methods.
Result: We applied the clustering method to a representative set of protein domains annotated with GO molecular function terms. We visualize clusterings for various molecular functions. Essentially, we observe two classes of clusterings: I) all proteins are located in one cluster; in these cases structural similarity implies the same GO function. II) multiple subclusters occur, which means that even though the proteins in different clusters are annotated with the same molecular function, they have distinct structural origins. This approach provides new insights into the relation between protein structures and the annotated GO molecular function terms.
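The shared-neighborhood rule described in the Method paragraph can be sketched in Python roughly as follows. This is only an illustration under several assumptions not stated in the abstract: edges are treated as undirected and unweighted, "similar neighborhood" is measured by the Jaccard index of closed neighborhoods, the threshold of 0.5 is arbitrary, and clusters are formed by merging qualifying pairs transitively with a union-find structure.

```python
from collections import defaultdict


def shared_neighbor_clusters(edges, threshold=0.5):
    """Cluster vertices whose closed neighborhoods overlap strongly."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    parent = {v: v for v in adj}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        nu = adj[u] | {u}
        nv = adj[v] | {v}
        jaccard = len(nu & nv) / len(nu | nv)
        if jaccard >= threshold:  # assumed similarity criterion
            parent[find(u)] = find(v)

    clusters = defaultdict(set)
    for v in adj:
        clusters[find(v)].add(v)
    return sorted(sorted(c) for c in clusters.values())


# Two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
print(shared_neighbor_clusters(edges))
```

In this toy graph the bridge endpoints share few neighbors, so the two triangles end up in separate clusters even though they are connected.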
 
Poster W44
The i-Patch Server: Prediction of inter-protein contact sites

Konrad Krawczyk University of Oxford
Charlotte Deane (University of Oxford, Statistics); Rebecca Hamer (University of Oxford, Statistics);
 
Short Abstract: We present i-Patch, an interactive software tool for the prediction of inter-protein contact sites. i-Patch is currently one of the most successful methods for the prediction of residues involved in protein-protein interactions (REF). It achieves 59% precision with 20% recall on a blind testing set of 31 proteins. The method uses the sequence alignments of the homologues of two proteins which are known to interact, alongside their three-dimensional structures. It produces a contact-site likelihood score for each of the surface-exposed residues in the two proteins. This score is calculated by treating the complex of two proteins as a network with amino acids as nodes and distances between them as edges. Our tool allows users to enter their two proteins of interest and returns protein structures highlighted by the i-Patch score. This allows the user to identify the residues and patches of residues that potentially constitute the binding site between the two proteins.
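For readers unfamiliar with the reported measures, precision and recall over a set of predicted contact residues can be computed as in the minimal Python sketch below. The residue numbers are made up for illustration and are unrelated to the i-Patch test set.

```python
def precision_recall(predicted, actual):
    """Precision and recall of predicted residues against true contact residues."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical residue indices: 4 predicted, 6 true contacts, 2 in common.
predicted = [10, 11, 12, 40]
actual = [11, 12, 13, 14, 15, 16]
print(precision_recall(predicted, actual))
```

A high-precision, low-recall profile such as the one reported means that most predicted residues are true contacts, while many true contacts remain unpredicted.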
 
Poster W45
Predicting the 3d DNA structure of the human genome

Sven Bilke National Cancer Institute
 
Short Abstract: Evidence for a non-random spatial 3d organization of the cell's DNA content and its relevance for gene regulation has been accumulating in recent years. In a recent study [1], Dekker and co-workers introduced a novel method, HiC, allowing for an unbiased genome-wide study of 3d conformations, producing a "probability map" of DNA-DNA contacts in an ensemble of cells. Here we aim to identify genomic parameters correlating with the 3d-structure described in [1].

We successfully developed a model based exclusively on DNA sequence related observables and a set of mixing parameters. Using Monte Carlo optimization techniques, we identify two major sequence features contributing to the contact matrix. The resulting model faithfully reproduces the empirical consensus contact probability map described in [1] with Pearson's correlation r > 0.71. For comparison, intra-experiment correlation of the data in [1] ranges from 0.55 to 0.89. Only 12 out of 32 intra-experimental correlations are larger than 0.71.

Additional experimental validation of our findings has started at the time of writing this abstract.

[1] Lieberman-Aiden, E., Dekker, J. et al., Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950) 289-293 (2009).
 
Poster W46
Binding Ligand Prediction by Comparing Local Surface Patches of Potential Pocket Regions

Daisuke Kihara Dept of Biological Sci/Computer Sci
 
Short Abstract: The function of a protein, specifically the type of ligand that binds to it, can be predicted by finding similar local surface regions of known proteins. We developed an alignment-free local surface comparison method for predicting the ligand molecule that binds to a query protein. The algorithm, named Patch-Surfer, represents a binding pocket as a combination of segmented surface patches, each of which is characterized by its shape, electrostatic potential, hydrophobicity, and concaveness. Representing a pocket by a set of patches is effective in absorbing differences in global pocket shape while capturing local similarity of pockets. The shape and the physicochemical properties of surface patches are represented using the 3D Zernike descriptor, which is a series expansion of a mathematical 3D function. Two pockets are compared using a modified weighted bipartite matching algorithm, which matches similar patches from the two pockets.
The benchmark studies showed that Patch-Surfer’s prediction performance is superior to existing methods, including Pocket-Surfer, which we developed previously (Chikhi, Sael, Kihara, Proteins 2010) and which represents a pocket as a whole. On a dataset of 100 non-homologous proteins, 84.0% of the binding ligands were predicted correctly within the top three scores using the shape and pocket size information. The performance was further improved to 87.0% when physicochemical properties were added. Overall, we show that the proposed method is powerful in predicting the type of ligand a protein binds even in the absence of homologous proteins in the database (Sael & Kihara, Int J Mol Sci, 2010).
 
Poster W47
Prediction of binding preferences of RNA-binding proteins using a novel graph representation

Daniel Maticzka University of Freiburg
Rolf Backofen (University of Freiburg); Fabrizio Costa (University of Freiburg, Chair for Bioinformatics);
 
Short Abstract: RNA-binding proteins (RBPs) are involved in a number of cellular processes as diverse as regulation of post-transcriptional gene expression, splicing or nuclear export. RBPs recognize their various RNA binding partners by a combination of sequence and secondary structure context of the binding site. In order to accurately model binding preferences of RBPs, this structural context has to be considered in addition to the nucleotide sequence. Structural context has been used by the MEMERIS motif finder to guide the search towards single-stranded regions. The more recent RNAcontext method is able to discover both sequence and structural preferences. Here we propose a novel graph representation of RNA sequence and secondary structure information suitable for the elicitation of RBP binding preferences. In this graph each nucleotide can be annotated with information about its predicted structural context such as being paired, unpaired or being unpaired while being part of a certain secondary structure. Using NSPDK (neighbourhood subgraph pairwise distance kernel) these graphs can then be mapped to explicit features. The resulting sparse feature vectors are suitable for support vector machine learning methods. We successfully applied our graph representation in combination with support vector regression to the prediction of RNA-binding preferences of 9 RBPs as determined by RNAcompete, an in vitro assay for the estimation of binding affinities of RBPs.
 
Poster W48
ProBiS: A web server for detection of structurally similar protein binding sites

Janez Konc National Institute of Chemistry
 
Short Abstract: A web server, ProBiS, freely available at http://probis.cmm.ki.si, is presented. This provides access to the program ProBiS (Protein Binding Sites), which detects protein binding sites based on local structural alignments. Given a structure of a protein with unknown binding sites, ProBiS suggests the regions on its surface which may be involved in binding with small ligands, proteins, or DNA/RNA. Alternatively, given a protein with an identified binding site, ProBiS finds other proteins with structurally or physicochemically similar binding sites. If used as a pairwise structure alignment program, ProBiS detects and superimposes similar functional sites in a pair of submitted protein structures, even when these do not have similar folds.
 
Poster W49
Protein conformational movements modeling based on the average action principle

Alexander Krass ElectroMedOborudovanie
Eugene Stepanov (St. Petersburg State University of Information Technologies, Mechanics and Optics, Department of Mathematical Modelling); Sergey Nikolenko (Academic University, Laboratory of Algorithmic Biology); Yuri Porozov (Academic University, Department of Mathematics and Computer Science Technologies);
 
Short Abstract: At present, protein structural biology has several modeling methods for macromolecular conformational motions. In this work, we introduce and implement a new modeling method for predicting the conformational motion of protein molecules. It both significantly reduces computation time compared to molecular dynamics methods and provides better accuracy than the Mixed Elastic Network Model and linear interpolation of Cartesian coordinates.
We have developed a new mathematical model where a protein is represented as a graph whose nodes mark the atoms of the backbone chain. We compute the weights of side chains for each amino acid and transport them to the corresponding graph node. The basic functional is minimized with gradient descent. An additional problem arises here: we have to ‘factor’ all molecule's movements. This has been taken into account both in our modeling method and in the software that we have developed.
The proposed method has shown good results in predicting the correct geometric structure of a protein molecule. We plan to extend this model in further research. This mathematical model and results found with its help can be useful for researchers working in pharmacology and signal biology fields.
 
Poster W50
Evaluation of transmembrane helix (TMH) packing prediction methods on successive in sequence TMHs

Maria Xenophontos University of Cyprus
Vasilis Promponas (University of Cyprus)
 
Short Abstract: In this work, we evaluate the performance of two state-of-the-art, freely available, SVM-based TMH packing methods (MEMPACK and TMhit), in the special case of predicting interacting successive in sequence TMHs (SSTMHs). Along these lines, we derive information on interacting TMHs, from a non-redundant set of PDB entries of polytopic TM subunits. Locations of TMHs are inferred from the structural data and interaction is defined requiring a minimum of residue-pairs in contact, based on a distance criterion. Additionally, we compute the ground truth of interacting TMH pairs by implementing the same criteria used in the training of the aforementioned methods.
Both MEMPACK and TMhit require as input an amino acid sequence, along with the location of TMHs (optionally, TMhit also accepts topology data). Unfortunately, experimental information on TM protein topology is rare. Therefore, in a real-life scenario, users would invoke TMH packing methods with topologies predicted by one of the numerous algorithms that have been developed to accurately predict TM protein topology. Thus, ongoing work investigates how SSTMH interaction prediction quality is affected when using topologies predicted by different widely used methods. In particular, we have selected a set of diverse, freely available TM protein topology prediction tools.
Our results so far indicate that MEMPACK predicts SSTMH interactions slightly more accurately than TMhit. Moreover, MEMPACK exhibits higher accuracy when PSI-BLAST profiles are generated against UniRef90 instead of NCBI-NR. Importantly, both methods show a tendency for higher specificity than sensitivity.
 
Poster W51
Computing Page Number of RNA Pseudoknotted Structures

IVAN DOTU BOSTON COLLEGE
 
Short Abstract: Let S denote the set of (possibly noncanonical) base pairs {i, j} of an RNA tertiary structure; i.e. {i, j} belongs to S if there is a hydrogen bond between the ith and jth nucleotide. The page number of S, denoted P(S), is the minimum number k such that S can be decomposed into a disjoint union of k secondary structures. Finding the page number of an RNA tertiary structure was recently proven to be NP-complete. Here, we present a Constraint Programming (CP) approach to solving the page number problem that yields optimal solutions. The approach is divided into three phases: first, we collapse the set of base pairs into a set of helices; second, we construct a graph where helices are the nodes and edges correspond to pairs of helices that are involved in a pseudoknot; finally, we apply CP to solve a Minimum Coloring problem on this graph, whose solution corresponds, in fact, to the page number of the given RNA. We present results for the set of RNAs considered in [1], whose genus number was calculated there, and we find that the maximum page number among all these RNAs is 4. The motivation for our work is to provide additional computational data relevant to defining a reasonable pseudoknot energy penalty when computing minimum energy pseudoknotted structures.

[1] M. Bon, G. Vernizzi, H. Orland and A. Zee, J. Mol. Biol. 2008 Jun 13;379(4):900-11.
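The three phases described in the abstract can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation. A plain exhaustive backtracking search stands in for the Constraint Programming solver, and the helper names are hypothetical. Each nucleotide is assumed to take part in at most one base pair, pairs are given as (i, j) with i < j, and a helix is taken to be a run of stacked pairs (i, j), (i+1, j-1), and so on.

```python
from itertools import combinations


def crosses(p, q):
    """True if base pairs p and q interleave (i < k < j < l), i.e. form a pseudoknot."""
    (i, j), (k, l) = sorted([p, q])
    return i < k < j < l


def collapse_to_helices(pairs):
    """Phase 1: keep one representative (outermost) pair per stacked helix."""
    remaining = set(pairs)
    representatives = []
    for i, j in sorted(pairs):
        if (i, j) not in remaining:
            continue
        representatives.append((i, j))
        while (i, j) in remaining:  # walk inward along the stack
            remaining.discard((i, j))
            i, j = i + 1, j - 1
    return representatives


def page_number(pairs):
    helices = collapse_to_helices(pairs)
    n = len(helices)
    # Phase 2: conflict graph with an edge between crossing helices.
    conflicts = {a: set() for a in range(n)}
    for a, b in combinations(range(n), 2):
        if crosses(helices[a], helices[b]):
            conflicts[a].add(b)
            conflicts[b].add(a)

    # Phase 3: smallest k admitting a proper k-coloring (brute force, not CP).
    def colorable(k, colors):
        v = len(colors)
        if v == n:
            return True
        for c in range(k):
            if all(colors[u] != c for u in conflicts[v] if u < v):
                if colorable(k, colors + [c]):
                    return True
        return False

    k = 1
    while not colorable(k, []):
        k += 1
    return k if n else 0


# An H-type pseudoknot: two helices that cross each other.
print(page_number([(1, 10), (2, 9), (5, 15), (6, 14)]))
```

The exhaustive search is exponential in the worst case, consistent with the NP-completeness noted in the abstract, but the conflict graph over helices is typically much smaller than the set of base pairs.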
 
Poster W52
Are Protein Disordered Regions Equal to Loops?

Esmeralda Vicedo Technische Universität München
Avner Schlessinger (University of California, Bioengineering and Therapeutic Sciences); Markus Schmidberger (Technische Universität München , I12 - Informatik); Marco Punta (Technische Universität München , I12 - Informatik); Burkhard Rost (Technische Universität München , I12 - Informatik);
 
Short Abstract: One common definition of regions of “disorder” in proteins is that they do not adopt a regular three-dimensional (3D) structure in isolation (i.e., when not bound to other molecules). These disordered regions are in contrast to regions that are well structured or “ordered”. Notably, there is a great variety of “flavors” of disorder: some adopt a unique regular 3D structure only upon binding; others, for example loops, remain irregular; some proteins are almost entirely disordered and others have only short disordered regions.
Numerous computational methods exist that predict disorder based on a variety of concepts. One of these methods, NORSnet, has been developed in our group. NORSnet aims to predict disordered regions of the “loopy” type. Here, we predict secondary structure and disorder for all completely sequenced organisms (~ 4 million proteins, UNIPROT, December 2010). We report two observations: (1) Most of the predicted disordered regions are also predicted as loops, as expected from the design of the method. (2) Only half of the long loops are actually predicted as disordered.
 
Poster W53
Environment Specific Substitution Tables Improve Membrane Protein Alignment

Jamie Hill University of Oxford
Sebastian Kelm (University of Oxford, Statistics); Jiye Shi (UCB Celltech, Research & Development); Charlotte Deane (University of Oxford, Statistics); Jiye Shi (Shanghai University, Biochemistry);
 
Short Abstract: This poster is based on Proceedings Submission 80.

Motivation: Membrane proteins are both abundant and important in cells, but the small number of solved structures restricts our understanding of them. Here we consider whether membrane proteins undergo different substitutions from their soluble counterparts and whether these can be used to improve membrane protein alignments, and therefore improve prediction of their structure.

Results: We construct environment-specific substitution tables for membrane proteins. These are markedly different from tables for soluble proteins. For example, substitution preferences in lipid tail-contacting parts of membrane proteins are found to be distinct from all environments in soluble proteins, including buried residues. From a principal component analysis of the tables, the most important determinant of substitution preferences is hydrophobicity, followed by secondary structure. We demonstrate the use of our tables in sequence-to-structure alignments of membrane proteins using the FUGUE alignment program. On average, in the 10 - 25% sequence identity range, alignments are improved by 28 correctly aligned residues compared with the default FUGUE tables. Coordinate generation from our alignments yields improved structure models.
 
Poster W54
Quality or Quantity of Structural Dynamics Information: That is the Question! (when improving performance of structure-based function prediction methods)

Dariya Glazer Stanford University
Grace Tang (Stanford University, Bioengineering); Vijay Pande (Stanford University, Chemistry, Structural Biology, and Computer Science); Russ Altman (Stanford University, Bioengineering, Genetics, and Medicine);
 
Short Abstract: Previously, we showed that incorporating structural dynamics information from molecular simulations improved performance of structure-based function prediction methods. This raised new intriguing questions. Does the choice of force field, the length of the simulation trajectories, or the presence of a ligand in the simulation systems have any effect? Are there any meaningful descriptors of the true positive results that can help discriminate those from the false positive ones?

To this end, we scaled up our efforts in molecular dynamics simulations with Folding@Home distributed computing. Using 5 force fields (Amber '96, '99sb, '03, Gromos53a6, and OPLS-AA) we generated 14 multiples of 10 nanosecond trajectories (a total of 140 nanoseconds) with and without the presence of calcium ions for 11 pairs of structures, HOLO/APO for calcium binding. Resulting structural ensembles were evaluated for calcium binding sites using FEATURE.

Our results indicate that different force fields explore somewhat different conformational space of the molecules with respect to the sensitivities of FEATURE. Longer simulation trajectories and inclusion of calcium ions in the simulation systems did not yield a greater improvement in FEATURE’s performance. We propose 6 descriptors of sites identified by FEATURE that can be intelligently combined by machine learning algorithms in order to maximize true positive and minimize false positive results.

Future efforts in structure-based function prediction, especially identifying calcium binding sites, could benefit from applying function prediction methods to structural ensembles of the molecules of interest obtained from several copies of short-scale molecular dynamics simulations employing at least two general force fields.
 
Poster W55
Nearest-Neighbor Approaches to Predict Protein Function by Sequence Homology Alone

Tobias Hamp TU Muenchen
Rebecca Kassner (TU Muenchen, Institut fuer Informatik); Stefan Seemayer (TU Muenchen, Institut fuer Informatik); Esmeralda Vicedo (TU Muenchen, Institut fuer Informatik); Christian Schaefer (TU Muenchen, Institut fuer Informatik); Dominik Achten (TU Muenchen, Institut fuer Informatik); Florian Auer (TU Muenchen, Institut fuer Informatik); Ariane Boehm (TU Muenchen, Institut fuer Informatik); Tatjana Braun (TU Muenchen, Institut fuer Informatik); Maximilian Hecht (TU Muenchen, Institut fuer Informatik); Mark Heron (TU Muenchen, Institut fuer Informatik); Peter Hoenigschmid (TU Muenchen, Institut fuer Informatik); Thomas Hopf (TU Muenchen, Institut fuer Informatik); Stefanie Kaufmann (TU Muenchen, Institut fuer Informatik); Michael Kiening (TU Muenchen, Institut fuer Informatik); Denis Krompass (TU Muenchen, Institut fuer Informatik); Cedric Landerer (TU Muenchen, Institut fuer Informatik); Yannick Mahlich (TU Muenchen, Institut fuer Informatik); Manfred Roos (TU Muenchen, Institut fuer Informatik); Burkhard Rost (TU Muenchen, Institut fuer Informatik);
 
Short Abstract: This year's Automated Function Prediction SIG meeting at ISMB/ECCB 2011 features, for the first time, the so-called CAFA challenge.
Participants were supposed to computationally predict and submit the Gene Ontology (GO) terms for over 40,000 so far functionally unannotated targets.
In this context, we have developed three different Nearest-Neighbor-based methods to derive the functions of a protein by sequence homology alone. Despite being conceptually the same, they differ largely in their technical details, ranging from an emphasis on superior ad-hoc performance to sophisticated and well-defined scoring schemes that allow either recall or precision to be directed into the desired reliability range.
All three methods were employed to predict all the CAFA targets mentioned before and are, in a way, supposed to act as controls for an expected wealth of more sophisticated function prediction machineries at the CAFA meeting.
In addition, we have created a meta-classifier to combine the three approaches and established new ways to generally capture the performance of function prediction tools with GO terms. The latter now enable us to not only estimate precision and recall as described in the official CAFA rules, but also to use the shapes of the GO subgraphs of a target protein to create an assessment that considers its biologically distinct functions.
Our own pre-CAFA evaluation of our methods with 10,000 random targets from the SwissProt database already showed remarkable performance of up to 87% precision with 84% recall using the official standard protein-centric measure, greatly outperforming a class of random prediction models.
 
Poster W56
Efficiently Represent RNA Ensemble

Reazur Rahman Purdue University
Michael Gribskov (Purdue University, Biological Sciences);
 
Short Abstract: Minimum Free Energy (MFE) based approaches to RNA structure prediction provide structures that contain only a fraction of the biologically relevant stems, due to limitations of the algorithm and inaccuracies in energy parameters. One can analyze the ensemble of near MFE structures to find other biologically relevant stems, including pseudoknots, that are not found in the MFE predicted structure. Unfortunately, with increasing free energy difference from the predicted MFE structure, the number of suboptimal structures increases rapidly along with the number of stems. Conserved stems can be identified by comparing ensembles of structures represented as RNA graphs, where graph nodes are stems and the relationships between stems are different types of edges. Since graph matching is an NP-complete problem, large graphs make the task of comparison less tractable.

Our aim is to reduce the complexity of these graphs by removing similar stems while preserving biologically important stems. First, we theoretically enumerate all possible stems that can be formed with respect to a given stem. We identify cases with topologically similar stems that can be combined to reduce the graph size without significantly altering the overall ensemble of topological relationships. This provides the basis for developing a set of heuristics to contract RNA graphs. We apply our heuristic rules to a well-curated dataset containing RNase P, Group I Intron, tmRNA and tRNA sequences, and report the effectiveness of each heuristic.
 
Poster W57
Chemical shift prediction using graph kernels

Nino Shervashidze Max Planck Institutes Tübingen
Michael Habeck (Max Planck Institutes Tübingen, Computational Structural Biology); Karsten Borgwardt (Max Planck Institutes Tübingen, Machine Learning and Computational Biology);
 
Short Abstract: In this work we consider the problem of learning to predict chemical shifts of different atomic nuclei in a protein based on the tertiary structure of the protein.
We tackle the problem by regressing the chemical shift value of a nucleus based on its local neighborhood.
In more detail, our approach proceeds as follows. We first represent the local neighborhood of each such nucleus as a graph. This is done by identifying the nodes of each graph with the atoms within a given distance threshold from the considered nucleus and assigning them discrete labels according to the types of the corresponding atoms. The distances between the nodes are determined from the tertiary structure of the protein.
Second, we represent these graphs in a feature space using a variant of the Weisfeiler-Lehman kernel by Shervashidze and Borgwardt (2009). In the end, we predict the chemical shift values using a regression method on these representations of the graphs.
In this poster, we will describe our approach in detail, as well as present and discuss the empirical results of our study in the light of results given by other state-of-the-art chemical shift prediction methods.
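
As a rough illustration of the feature-space step, the following minimal sketch computes plain Weisfeiler-Lehman subtree features for small labeled graphs and compares two graphs by a dot product of label counts. It is not the authors' variant: the function names, the string labels, and the explicit adjacency lists are assumptions made for the example, and inter-atomic distances are ignored here.

```python
from collections import Counter


def wl_features(labels, adjacency, iterations=2):
    """Count Weisfeiler-Lehman subtree labels for one graph.

    labels:    dict node -> initial discrete label (a string, e.g. atom type)
    adjacency: dict node -> list of neighbouring nodes (assumed precomputed)
    """
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        updated = {}
        for node in current:
            neighbour_labels = sorted(current[n] for n in adjacency[node])
            # New label = own label plus the sorted multiset of neighbour labels.
            updated[node] = current[node] + "(" + ",".join(neighbour_labels) + ")"
        current = updated
        features.update(current.values())
    return features


def wl_kernel(features_a, features_b):
    """Dot product of two label-count vectors (Counter returns 0 if absent)."""
    return sum(count * features_b[label] for label, count in features_a.items())


# Two hypothetical three-atom neighbourhoods that differ in one atom type.
path = {0: [1], 1: [0, 2], 2: [1]}
graph_n = wl_features({0: "N", 1: "C", 2: "C"}, path, iterations=1)
graph_o = wl_features({0: "O", 1: "C", 2: "C"}, path, iterations=1)
similarity = wl_kernel(graph_n, graph_o)
```

The resulting count vectors could then be passed to any standard regression method; which regressor and which kernel variant the study uses is not stated in the abstract.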
 
Poster W58
Is the biological hydrophobicity scale correct?

Christoph Peters Stockholm University
Arne Elofsson (Stockholm University, Dep of Biochemistry and Biophysics);
 
Short Abstract: Alpha-helical membrane proteins are inserted into the membrane by the Sec translocon. When sufficiently hydrophobic segments pass through the translocon they are recognized and passed into the membrane. This process can be described by a simple equilibrium process. In a set of experiments Gunnar von Heijne and co-workers performed direct measurements of the contribution of individual amino acids to the free energy of insertion (Hessa, 2005, 2007 Nature). These measurements were then used to create a "biological hydrophobicity scale". In general this scale correlates quite well with other hydrophobicity scales but it differs in detail. This discrepancy has caused some discussion in the recent literature (Johansson & Lindahl PNAS 2009; Gumbart, Chipot & Schulten PNAS 2011).

We have recently shown that using the biological hydrophobicity scale it is possible to develop a method for membrane-protein topology prediction that can challenge the most advanced machine learning algorithms. SCAMPI is based on a very simple hidden-Markov-model-like algorithm and has only two parameters that can be optimized. We assumed that this excellent performance was due to the high quality of the biological hydrophobicity scale.

Here, we test whether the biological hydrophobicity scale really provides a more accurate picture of the insertion mechanism than other hydrophobicity scales, by optimizing the SCAMPI algorithm using different hydrophobicity scales. Preliminary results show that other hydrophobicity scales lead to a decrease in the topology prediction accuracy of SCAMPI, i.e. the biological hydrophobicity scale does indeed describe the recognition of transmembrane segments by the translocon better than other scales.
 
Poster W59
A computational analysis to identify putative residues responsible for specific activity variation in Sucrose Phosphate Phosphatase (SPP)

Divya P Syamala Devi Sugarcane Breeding Institute, India ; Currently at National Centre for Biological Sciences, Bangalore
Divya Syamala Devi (Sugarcane Breeding Institute, India ; Currently at National Centre for Biological Sciences, Bangalore) Neethi Jayaraman (Sugarcane Breeding Institute, Division of Crop Improvement); Subramonian N (Sugarcane Breeding Institute, Division of Crop Improvement);
 
Short Abstract: This poster is based on Proceedings Submission 218
Sucrose phosphate phosphatase (SPP) is one of the key enzymes in the sucrose biosynthesis pathway. Sequence and structural comparison between the orthologs of this enzyme identified crucial residue changes that might have contributed to the activity variation between them. We report minute structural variations in active site regions as deduced from GDT scores calculated for fragments containing reported active site residues. The analysis suggests a role for two residues, one phosphate-binding and one sucrose-binding, in causing variation in catalytic activity among orthologs of the enzyme. We also report a substitution observed in rice SPP that can disrupt a stabilizing interaction between sucrose and the enzyme, thereby contributing to a relatively higher specific activity of rice SPP. Phylogenetic studies based on sequence and structure reveal that the evolutionary relationship of SPP orthologs depends on the overall sequence as well as architectural differences.
 
Poster W60
A comprehensive review of protein fold space and the correlation of structure with function

Spencer Bliven University of California, San Diego
Andreas Prlic (University of California, San Diego, San Diego Supercomputer Center); Philip Bourne (University of California, San Diego, Skaggs School of Pharmacy and Pharmaceutical Sciences);
 
Short Abstract: Numerous studies have attempted to characterize protein fold space. Existing classification schemes such as SCOP and CATH tend to view protein structures as belonging to a finite number of discrete protein folds. Others prefer to view protein space as a continuum of possible structures, of which only a few have been sampled. However, previous debate about the topic has been primarily qualitative and philosophical, with examples being cited in support of both discrete and continuous viewpoints [1-4]. In this study, with the aid of specialized hardware, we comprehensively compare protein structures, with the goal of quantitatively assessing the continuity or discreteness of fold space.

Protein fold space was mapped for the complete non-redundant Protein Data Bank (PDB) using the FATCAT algorithm for structural comparison [5] on the OpenScienceGrid compute cluster, yielding 189,031,127 pairwise comparisons. Conceptually, the pairwise protein comparisons form a large graph, where the edges correspond to the similarity between proteins. By analyzing this graph it is possible to quantitatively measure the similarity between protein folds. We find that this graph is highly connected, providing support to the hypothesis that fold space is continuous. Results of mapping the graph to SCOP and to functional annotations will be reported.

[1] Sadreyev et al. Curr Opin Struct Biol (2009) vol. 19 (3) pp. 321-8
[2] Sippl. Curr Opin Struct Biol (2009) vol. 19 (3) pp. 312-20
[3] Govindarajan et al. Proteins (1999) vol. 35 (4) pp. 408-14
[4] Shindyalov and Bourne. Proteins (2000) vol. 38 (3) pp. 247-60
[5] Prlic et al. Bioinformatics (2010) vol. 26 (23) pp. 2983-2985
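
One simple way to ask how connected such a similarity graph is can be sketched with a union-find pass over thresholded pairwise scores. This is only an illustration under stated assumptions: the function name, the score scale, and the `min_score` cutoff are hypothetical, and the abstract does not say which FATCAT score or threshold the study uses.

```python
def connected_components(n_proteins, scored_pairs, min_score):
    """Group proteins linked by chains of pairwise scores >= min_score.

    scored_pairs: iterable of (index_a, index_b, score) tuples, where the
    score is a hypothetical similarity value (higher means more similar).
    """
    parent = list(range(n_proteins))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, score in scored_pairs:
        if score >= min_score:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = {}
    for i in range(n_proteins):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy example: five structures, three scored comparisons.
pairs = [(0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.2)]
strict = connected_components(5, pairs, min_score=0.5)
loose = connected_components(5, pairs, min_score=0.1)
```

A graph that collapses into a few large components even at strict cutoffs would point toward a continuous fold space, while many small isolated components would point toward discrete folds.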
 
Poster W61
Protein interaction partners revealed by their dynamical properties

Jau-Ji Lin National Chiao Tung University
Jenn-Kang Hwang (National Chiao Tung University, Institute of Bioinformatics and Systems Biology);
 
Short Abstract: The biological processes in cells are carried out by various kinds of protein molecules that perform their functions by interacting with other proteins. Identifying the interacting partners of proteins is crucial to constructing the biological pathways in a cell. Recently we developed a method called WCN (Lin et al., 2008) to calculate a protein's dynamical properties directly and efficiently from its folded structure. In this work, we analyze the dynamical properties of interacting proteins. We find that these proteins display a “complementary” relationship in their profiles. Therefore we analyze the dynamical profiles of all protein structures in PDB and try to elucidate the relationships between them. The dynamical profiles reveal that proteins do have their specific “disposition” to interact with certain kinds of protein molecules. By inspecting the dynamical profiles between each pair of protein structures in PDB, we can construct a network with nodes and edges representing the proteins and the interactions between them. The resulting network is composed of several subnetworks, and most of them agree with the available protein-protein interaction data. In addition, we use CELLO (Yu et al., 2006) to predict the subcellular localization of proteins proposed as interacting partners and find that most of them have the same subcellular localization. This finding confirms the spatial requirement for proteins to interact with each other. This study constitutes a novel methodology for constructing protein-protein interaction networks and provides a promising way to identify interacting partners of proteins of unknown function using structural information only.
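
For readers unfamiliar with WCN, the sketch below computes a per-residue weighted contact number profile, assuming the commonly cited inverse-square-distance form in which each residue sums 1/r² over all other residues. The abstract does not restate the formula, so this definition, the function name, and the use of one coordinate per residue are assumptions; how two profiles are judged "complementary" is not specified and is not attempted here.

```python
def weighted_contact_numbers(coords):
    """Return one weighted contact number per residue.

    coords: list of (x, y, z) tuples, one representative atom per residue
    (for example the C-alpha atom). Assumes no two residues share a position.
    """
    n = len(coords)
    wcn = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            squared_distance = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            # Each pair contributes 1 / r^2 to both residues.
            wcn[i] += 1.0 / squared_distance
            wcn[j] += 1.0 / squared_distance
    return wcn


# Three residues on a line, one distance unit apart.
profile = weighted_contact_numbers([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```

Residues buried in densely packed regions get high values and are expected to be less mobile, which is why such a profile can stand in for dynamical properties.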
 
Poster W62
Predicting peptidase substrate profiles using computational methods

Haslina Hashim University of Manchester
 
Short Abstract: Peptidases are ubiquitous enzymes which hydrolyze peptide bonds and constitute around 2% of the genomes of all organisms. With an estimated 14% of the 500 human peptidases under investigation as drug targets from the ~550 active and putative peptidases in the human genome, they are important for human health. Despite their abundance and pharmaceutical importance, there are still many peptidases whose function is not well characterised. According to the MEROPS peptidase database, out of the 95,150 known peptidase sequences only 415 have known structures, yet this covers the majority of MEROPS families. Hence, comparative modelling is a viable proposition to predict peptidase structure and specificity for most of these enzymes. We have used MODELLER and FoldX to build peptidase structures interacting with candidate substrates, using inhibitor structures as templates, in order to predict subsite specificity from FoldX interaction energies. We have evaluated the protocol using a well-characterised test system, trypsin, where structures of 11 variants at the P1 position and associated binding free energies are known. The results show promise in predicting the known specificity profile of trypsin at the P1 subsite, using the actual enzyme structure as a template. Similarly, in a second example, the method can also predict the P4 specificity of Pepsin A with success. However, we will also present results which show how performance is affected when using modelled enzyme structures at a range of similarities, highlighting the true utility of comparative modelling in predicting biological function.
 
Poster W63
SNPeffect 4.0: Molecular and structural phenotyping of human SNPs and disease mutations

Greet De Baets Flanders Institute for Biotechnology
Joost Van Durme (VIB, SWITCH); Frederic Rousseau (VIB, SWITCH); Joost Schymkowitz (VIB, SWITCH);
 
Short Abstract: Single nucleotide polymorphisms (SNPs) are, together with copy number variation, the primary source of variation in the human genome and are associated with altered response to drug treatment, susceptibility to disease, and other phenotypic variation. Linking structural effects of non-synonymous SNPs to functional outcomes is a major issue in structural bioinformatics, and many tools and studies have shown that specific structural properties such as stability and residue burial can be used to distinguish neutral variations and disease associated mutations.
The SNPeffect database uses sequence- and structure-based bioinformatics tools to predict the effect on the molecular phenotype of proteins. It integrates Tango (an aggregation predictor, http://tango.crg.es/); Waltz (a predictor of amyloid forming sequences, http://waltz.switchlab.org/); Limbo (a predictor for chaperone specificity, http://limbo.switchlab.org/); and FoldX (http://foldx.switchlab.org/) that reports the ΔΔG, the change in free energy upon mutation. In that way, FoldX predicts the effect of SNPs in two categories of functional properties: (1) structural and thermodynamic properties affecting protein dynamics and stability and (2) the integrity of functional and binding sites.
The database already contains the annotations for the UniProt set of human disease and polymorphism mutations, but users can also submit their own set of mutations for analysis.
 
Poster W64
canSAR 3D: Building a 3D Structural Map of Cancer

Krishna C Bulusu Institute of Cancer Research
Krishna Bulusu (Institute of Cancer Research) Bissan Al-Lazikani (Institute of Cancer Research, CRUK Cancer Therapeutics Unit); Mark Halling-Brown (Institute of Cancer Research, CRUK Cancer Therapeutics Unit);
 
Short Abstract: Protein 3D structural data is a very powerful tool in drug design and development. Understanding the rules governing protein conformational movement is vital in understanding the structure-function correlation. This research is aimed at studying the structural basis of function, pathology and drug binding in Cancer-associated protein families using integrative in-silico techniques. 3D structural data available in the public domain and also within the Institute is integrated with chemogenomic data available to understand structural variation within a protein family and how it influences the protein’s role in cancer pathways. This is achieved by performing structural comparisons using superpositions and family-based clustering to analyse the effect of conformational changes on function. Ligand-binding footprints will be derived by studying the binding sites and mapped to the chemical structures of small molecule ligands. The knowledge derived will be utilised to build a library of curated and validated 3D models of Cancer-specific proteins.
 
Poster W65
Is It Biologically Relevant? An Evolutionary Method for Distinguishing Biological Interfaces from Crystal Contacts

Guido Capitani Paul Scherrer Institut
Jose Duarte (Paul Scherrer Institut, Biomolecular research); Schärer Martin (Paul Scherrer Institut, Biomolecular research);
 
Short Abstract: Thanks to technical and conceptual advances, macromolecular crystallography is nowadays a technique that can achieve the structure determination of very complex molecular objects. Such complexity makes it increasingly difficult to distinguish by visual inspection the two different types of interfaces found in protein crystals: biologically relevant ones and non-specific ones, corresponding to crystal lattice contacts. A need thus exists for computational tools capable of assigning a given interface as either “biological” or as “crystal contact”. To this end we devised a novel indicator called CRK (1) and a software tool to compute it. The CRK indicator compares the selection pressure acting, on average, on interface residues that are fully buried upon protein-protein interface formation (the "core" residues) and on those that are only partially buried (the "rim" residues). For a biologically relevant interface, the average selection pressure should be stronger on the core residues than on the rim residues, while in a crystal contact interface it is assumed not to differ significantly between the rim and core set. The CRK software evaluates selection pressure at the single residue level by calculating Ka/Ks ratios with SELECTON (2), or also by sequence entropy.
CRK distinguishes the two types of interfaces very effectively. It can also be used for structure validation purposes. A new version of the CRK software is being written, with the goal of being user-friendly and easy to install.
(1) Schaerer et al (2010), Proteins 78:2707-2713
(2) Stern et al (2007), Nucleic Acids Res 35:W506-W511
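
The core-versus-rim comparison can be illustrated with the sequence-entropy option. The sketch below is a simplified stand-in, not the CRK software: the function names are hypothetical, the per-residue scores are assumed to be precomputed from a multiple sequence alignment, and the assignment of residues to core and rim is taken as given.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits of one alignment column, given as a string."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def core_rim_ratio(scores, core, rim):
    """Mean score over core residues divided by mean score over rim residues.

    scores: dict residue index -> variability score (entropy or Ka/Ks),
            where lower values mean stronger selection pressure.
    Assumes both sets are non-empty and the rim mean is not zero.
    """
    core_mean = sum(scores[i] for i in core) / len(core)
    rim_mean = sum(scores[i] for i in rim) / len(rim)
    return core_mean / rim_mean


# Toy interface: two conserved core positions, two variable rim positions.
scores = {
    10: column_entropy("LLLLLLLI"),
    11: column_entropy("FFFFFFFY"),
    20: column_entropy("KRKREEQD"),
    21: column_entropy("STSTNNGA"),
}
ratio = core_rim_ratio(scores, core=[10, 11], rim=[20, 21])
```

Under these assumptions a ratio clearly below 1 means the core is more conserved than the rim, as expected for a biological interface, while a ratio near 1 is what the abstract describes for a crystal contact. The decision threshold used by CRK is not given in the abstract.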
 
Poster W66
SAHG, a comprehensive database of annotated structure models for human proteins

Chie Motono National Institute of Advanced Industrial Science and Technology
Ryotaro Koike (Nagoya University, Graduate School of Information Science); Kana Shimizu (National Institute of Advanced Industrial Science and Technology, Computational Biology Research Center); Matsuyuki Shirota (Tohoku University, Graduate School of Information Science); Takayuki Amemiya (Nagoya University, Graduate School of Information Science); Kentaro Tomii (National Institute of Advanced Industrial Science and Technology, Computational Biology Research Center); Nozomi Nagano (National Institute of Advanced Industrial Science and Technology, Computational Biology Research Center); Hidekazu Hiroaki (Kobe University, Graduate School of Medicine); Tsuyoshi Shirai (Nagahama Institute of Bioscience and Technology, Department of Bioscience); Kengo Kinoshita (Tohoku University, Graduate School of Information Science); Tamotsu Noguchi (National Institute of Advanced Industrial Science and Technology, Computational Biology Research Center); Motonori Ota (Nagoya University, Graduate School of Information Science);
 
Short Abstract: We have constructed a novel database, SAHG, Structural Atlas of Human Genome (http://bird.cbrc.jp/sahg/), which exhibits protein structure models encoded in the human genome. All of the Open Reading Frames in the human genome are subjected to a fully automated, exhaustive protein-structure-prediction pipeline to generate protein structure models. SAHG contains 42,577 domain-structure models in ~24800 unique human protein sequences from the Refseq database. We believe that the SAHG database provides an up-to-date collection of annotated protein 3D models of reliable quality to users.
Compared to other existing databases of protein structure models, SAHG is distinct in that the pipeline is focused on proteins of higher organisms, and in that conformational changes of proteins are predicted and displayed as animated images.
To achieve these features, we developed structure prediction methods suitable for multi-domain proteins with considerable intrinsically disordered (ID) regions by combining local alignment methods (BLAST, PSI-BLAST, and the Smith-Waterman profile-profile alignment), global alignment methods (FORTE and the probabilistic profile-profile alignment), a prediction tool for ID regions, and MODELLER. Conformational changes of protein models upon ligand binding are predicted by simultaneous modeling using the templates of apo and holo forms. When there were no suitable templates for holo forms, we prepared holo models using prediction methods for ligand binding and conformational change (the elastic network model and the linear response theory). Models are displayed as animated images, which provide significant information on protein function when they are shown with functional residues and ligands.
 
Poster W67
Training bidirectional recurrent neural networks with Conjugate gradient-type algorithms for protein secondary structure prediction

Michalis Agathocleous University of Cyprus
Chris Christodoulou (University of Cyprus) Petros Kountouris (University of Cyprus, Computer Science); Vasilis Promponas (University of Cyprus, Biological Sciences); Georgia Christodoulou (University of Cyprus, Computer Science); Vassilis Vassiliades (University of Cyprus, Computer Science); Chris Christodoulou (University of Cyprus, Computer Science);
 
Short Abstract: Protein Secondary Structure Prediction (PSSP) is an important step in predicting the protein 3D structure from an amino acid sequence, which is essential for the exploration of protein functions. Recent PSSP research has managed to achieve an overall prediction accuracy of around 80%, utilizing various machine learning techniques. One approach uses a Bidirectional Recurrent Neural Network (BRNN) architecture and an extension of the backpropagation through structure algorithm with the error propagated through time in both directions of the BRNN. The BRNN has proved to be a very efficient architecture for PSSP, since it can capture both upstream and downstream information. In order to increase the prediction capabilities of the BRNN architecture and achieve better results for the PSSP problem, we have implemented a second-order learning algorithm to replace the conventional backpropagation through time algorithm. It has been widely proven that second-order learning algorithms are superior to the backpropagation algorithm in training multilayer networks. In particular, a variant of the traditional conjugate gradient (CG) algorithm, the scaled conjugate gradient (SCG), has been shown to be very effective when used on large datasets and faster than the standard backpropagation algorithm. Even though the standard CG has been developed for simple recurrent neural networks, to the best of our knowledge it has never been developed and implemented for BRNN architectures. In this paper, we theoretically develop and implement the SCG for the BRNN and apply it to PSSP. Our preliminary results on both learning speed and PSSP accuracy are very promising.
 
Poster W68
Protein Sequence Design Using Ensemble-Based Energetic Information

James Wrabl Johns Hopkins University
Shayer Chowdhury (Johns Hopkins University, Biology); Vincent Hilser (Johns Hopkins University, Biology and Biophysics);
 
Short Abstract: Despite advances in computational technology and theoretical understanding, rational selection of amino acid sequences that adopt a desired protein structure is still a formidable challenge. This work explores a preliminary approach to this design problem using information derived from a previously reported (and experimentally validated) statistical mechanical ensemble description of globular protein thermodynamics. Information about both the native and denatured state energetics of the design target is taken into account, as we hypothesize that the denatured state in particular cryptically encodes necessary negative design information. An algorithm encoding the strategy is developed that automatically generates amino acid sequences thought to be energetically compatible with an arbitrary target structure. These designed sequences indeed demonstrate primary, secondary, and tertiary structure properties similar to the target in silico. Interestingly, although designed sequences exhibit twilight-zone levels of sequence identity to the target, no designed amino acid sequence has yet been observed to identically match a known entry in publicly available databanks. Results of experimental overexpression, purification, and biophysical characterization of sequence designs for the specific example of SH3 domain are presented.
 
Poster W69
Predicting Functions of the Universal Stress Proteins in Metabolically Versatile Rhodopseudomonas palustris

Shaneka Simmons Jackson State University
Alexander Ropelewski (Carnegie Mellon University, Pittsburgh Supercomputing Center); Hari Cohly (Jackson State University, Biology); Hugh Nicholas (Carnegie Mellon University, Pittsburgh Supercomputing Center); Raphael Isokpehi (Jackson State University, Center for Bioinformatics & Computational Biology);
 
Short Abstract: Genes encoding the universal stress protein (USP) domain (Pfam PF00582) are induced during a plethora of environmental stress stimuli. The USPs of Methanococcus jannaschii MJ0577 and Haemophilus influenzae have been well characterized as asymmetric dimers with a tertiary alpha/beta fold and predicted to be associated with ATP binding or non-ATP binding motifs, respectively. However, the biochemical mechanisms and structural architecture defining the diverse functional roles of USPs in other bacterial species remain largely unknown. Public availability of finished genome sequences housed at the Integrated Microbial Genome (IMG) system, a comprehensive microbial database, allows for investigation of microbial genes encoding proteins containing the PF00582 domain. The metabolically versatile, bioenergy-producing microbe Rhodopseudomonas palustris has 6 finished genome sequences from strains BisA53, BisB5, BisB18, CGA009, HaA2 and TIE-1, representing the highest number of USP-containing genomes and genes in microbes identified as important to the Department of Energy’s mission for bioenergy production. USPs within finished R. palustris genome sequences were retrieved from the IMG system and analyzed to characterize the functional and biochemical roles of the USP domain. Fifty-one R. palustris USPs were grouped according to amino acid length. Twenty sequences ranging from 274 aa to 293 aa were aligned with Clustal X Gonnet series protein weight matrix and segregated into conserved familial sequence motifs using the MEME motif analysis algorithm. Phylogenetic analysis revealed four R. palustris USP-containing clades. Annotations based on INTERPRO domains identified evidence of Rossmann-like fold topology within each sequence. We are currently assessing the ATP-binding capabilities of the 20 prioritized proteins.
 
Poster W70
The Impact of Multifunctional Genes on "Guilt by Association" Analysis

Jesse Gillis University of British Columbia
Paul Pavlidis (University of British Columbia, Psychiatry and Centre for High-throughput Biology);
 
Short Abstract: Many previous studies have shown that by using variants of “guilt-by-association”, gene function predictions can be made with very high statistical confidence. In these studies, it is assumed that the “associations” in the data (e.g., protein interaction partners) of a gene are necessary in establishing “guilt”. In this work we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of the degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how multifunctionality is encoded in gene interaction data (such as protein interactions and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that possesses no information on which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that can be used to provide more meaningful control when estimating gene function prediction performance. We suggest that this source of bias due to multifunctionality is important to control for, with widespread implications for the interpretation of genomics studies.
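
The idea that multifunctionality alone can act as a predictor can be sketched as follows. Each gene is scored by how many functions it is annotated with, and that single ranking is evaluated against every function with an AUROC. This is a deliberately simplified illustration: the plain annotation count, the function names, and the toy annotations are assumptions, and the abstract does not give the exact multifunctionality measure or evaluation protocol.

```python
def multifunctionality_scores(annotations):
    """Score each gene by its number of annotated functions.

    annotations: dict gene -> set of function terms (hypothetical toy input).
    """
    return {gene: len(terms) for gene, terms in annotations.items()}


def auroc(scores, positives):
    """Probability that a random positive gene outranks a random negative gene.

    Ties count as half a win. Assumes at least one positive and one negative.
    """
    pos = [s for gene, s in scores.items() if gene in positives]
    neg = [s for gene, s in scores.items() if gene not in positives]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


annotations = {
    "g1": {"A", "B", "C"},
    "g2": {"A", "B"},
    "g3": {"C"},
    "g4": {"B"},
    "g5": set(),
}
scores = multifunctionality_scores(annotations)
# The same gene ranking is reused for every function, with no network data.
auroc_for_c = auroc(scores, positives={"g1", "g3"})
```

Note that the count here includes the function being evaluated, which inflates the result; a fair control would hold out the test annotations before scoring. Even so, a ranking of this kind contains no information about which gene interacts with which, which is the point the abstract makes.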
 
Poster W71
Understanding protein-protein interactions: Efficient classification of physiological and non-physiological protein complexes and unexpected conservation in interaction interfaces

Dmitry Korkin University of Missouri-Columbia
Nan Zhao (University of Missouri-Columbia, Dept. of Computer Science and Informatics Institute); Bin Pang (University of Missouri-Columbia, Dept. of Computer Science and Informatics Institute); Chi-Ren Shyu (University of Missouri-Columbia, Dept. of Computer Science and Informatics Institute);
 
Short Abstract: Detailed structural knowledge about protein-protein interactions can provide insights into the basic processes underlying cell function. In this work, we address two problems concerning protein-protein interactions. The first problem is to determine whether an interaction formed by a pair of proteins is physiological or an artifact of computational or experimental methods. Solving this problem is crucial for detection of crystal-packing interfaces and accurate protein-protein docking. We introduce two feature-based classifiers that are trained using (i) supervised and (ii) semi-supervised learning approaches. All classifiers employ a universal set of features extracted from the structures of protein-protein interfaces. Our classifiers reach 93% accuracy when classifying physiological and crystal-packing interactions in PDB and 85% when classifying native and decoy interactions from protein docking.

Our second problem is concerned with understanding the role of charged residues in protein-protein interactions. To address the problem, we propose a simple approach to study structural conservation of the charged residue pairs by analyzing the homologous binary complexes. As a result, we find a novel conservation pattern, which we call the correlated reappearance of charged residues. The analysis of conservation patterns across different superkingdoms as well as structural classes of proteins has revealed that the correlated reappearance is by far the most prevalent conservation pattern. By linking our findings with the electrostatic steering mechanism, we propose that often it is not a specific location, but rather the mere presence of charged residue pairs in the protein interface that needs to be conserved to form a protein-protein interaction.
 
Poster W72
Protein Structural Domain Assignment Using Delaunay Tessellation

Todd Taylor NIH
 
Short Abstract: We describe a fully automated method of protein structural domain assignment using a Potts model that we call DePot (an abbreviation for Delaunay-Potts). Each amino acid residue is represented as a site in an irregular lattice derived from the Delaunay tessellation of the protein structure. Domain membership is represented by a spin value and each site has a spin that can change under the influence of its neighbors. Neighboring spins on the lattice are allowed to interact subject to an Ising ferromagnetic-like energy function until clusters of like spins emerge and these clusters define domains. DePot is simple and easy to implement and the assignments agree very well with previously published methods.
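
The spin-clustering idea can be illustrated with a zero-temperature Potts relaxation on a fixed neighbor graph. This is a minimal sketch and not the DePot implementation: the edge list is assumed to have been derived beforehand from a Delaunay tessellation of residue positions, and the energy function, coupling constant, and greedy update schedule are simplifying assumptions, since the abstract does not specify them.

```python
from collections import Counter


def potts_energy(spins, edges, coupling=1.0):
    """Ferromagnetic Potts energy: -coupling for every edge joining equal spins."""
    return -coupling * sum(1 for i, j in edges if spins[i] == spins[j])


def relax(spins, edges, sweeps=10):
    """Greedy single-site updates: a site adopts the most common neighbour spin
    only if that strictly lowers the energy, so the energy never increases."""
    neighbours = {site: [] for site in range(len(spins))}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    spins = list(spins)
    for _ in range(sweeps):
        changed = False
        for site in range(len(spins)):
            counts = Counter(spins[n] for n in neighbours[site])
            if not counts:
                continue  # isolated site keeps its spin
            best_spin, best_count = counts.most_common(1)[0]
            if best_count > counts[spins[site]]:
                spins[site] = best_spin
                changed = True
        if not changed:
            break
    return spins


# Toy lattice: two four-residue cliques joined by one edge, with one
# mislabelled residue in each clique.
clique_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
clique_b = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
edges = clique_a + clique_b + [(3, 4)]
domains = relax([0, 0, 0, 1, 1, 1, 1, 0], edges)
```

After relaxation, sites sharing a spin value form the putative domains. A real implementation would also need a rule for initial spins and a finite-temperature or annealing schedule to avoid poor local minima.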
 

Attention Poster Authors: The ideal poster size should be max. 1.30 m (130 cm) high x 0.90 m (90 cm) wide. Fasteners (Velcro / double-sided tape) will be provided at the site; please DO NOT bring tape, tacks or pins.




Posters Display Schedule:

Odd Numbered posters:
  • Set-up timeframe: Sunday, July 17, 7:30 a.m. - 10:00 a.m.
  • Author poster presentations: Monday, July 18, 12:40 p.m. - 2:30 p.m.
  • Removal timeframe: Monday, July 18, 2:30 p.m. - 3:30 p.m.*
Even Numbered posters:
  • Set-up timeframe: Monday, July 18, 3:30 p.m. - 4:30 p.m.
  • Author poster presentations: Tuesday, July 19, 12:40 p.m. - 2:30 p.m.
  • Removal timeframe: Tuesday, July 19, 2:30 p.m. - 4:00 p.m.*
* Posters that are not removed by the designated time may be taken down by the organizers and discarded. Please be sure to remove your poster within the stated timeframe.

Delegate Posters Viewing Schedule

Odd Numbered posters:
On display Sunday, July 17, 10:00 a.m. through Monday, July 18, 2:30 p.m.
Author presentations will take place Monday, July 18: 12:40 p.m.-2:30 p.m.

Even Numbered posters:
On display Monday, July 18, 4:30 p.m. through Tuesday, July 19, 2:30 p.m.
Author presentations will take place Tuesday, July 19: 12:40 p.m.-2:30 p.m.





Want to print a poster in Vienna? Try these options:

Repacopy, next to the congress venue

Also at Karlsplatz, in the Ring Center, Kärntner Str. 42


If you need your poster on a thicker material, you may also use a plotter service next to Karlsplatz: http://schiessling.at/portfolio/


