20th Annual International Conference on
Intelligent Systems for Molecular Biology


Posters

Poster numbers will be assigned May 30th.
If you cannot find your poster below, that probably means you have not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the poster acceptance email; it contains a confirmation link. Click on it and follow the instructions.

If you need further assistance please contact submissions@iscb.org and provide your poster title or submission ID.

Category W - ''
W01 - Developing computational techniques for structure modeling of helical membrane proteins
Short Abstract: Transmembrane (TM) proteins are estimated to account for ~20-30% of the human genome and serve as important drug targets. Despite the significant progress in experimental techniques, TM protein structure determination remains a challenge in general. Computational approaches, including de novo and comparative structure predictions, have played a significant role in structural and functional studies of membrane proteins, as well as in their structure-based drug design efforts. A major challenge of membrane protein structure prediction is to assemble individual helices into high-quality tertiary structures. Analyzing inter-residue interactions within membrane proteins is of significance to developing novel tools to tackle this challenge. In this work, we first proposed a scoring function based on the analysis of favorable inter-residue interactions in high-resolution structures of helical membrane proteins. Next, this scoring function was validated using multiple datasets of helical membrane protein structure models, including those from GPCR Dock 2008 and GPCR Dock 2010 datasets. These results suggest a useful tool for computational modeling of membrane protein structures. This work was supported by the NIH-R15 grant from NIGMS.
W02 - Investigation of sequence, function and quaternary structure of DJ-1 superfamily
Short Abstract: The DJ-1 superfamily (DJ-1/ThiJ/PfpI superfamily) is distributed across all three kingdoms of life. These proteins are involved in a highly diverse range of cellular functions including chaperone and protease activity. The DJ-1 superfamily proteins usually form dimers or hexamers in vivo and show at least four different binding orientations via distinct interface patches. Abnormal oligomerization of human DJ-1 is related to neurodegenerative disorders including Parkinson’s disease, suggesting important functional roles of quaternary structures. Still, the quaternary structures of the DJ-1 superfamily have not been extensively studied.
We focus on the diverse oligomerization modes among the DJ-1 superfamily proteins and investigate the functional roles of quaternary structures both computationally and experimentally. The oligomerization modes were classified into 4 types (the DJ-1, YhbO, Hsp, and YDR types) depending on the distinct interface patches (I-IV) upon dimerization. A unique, rotated interface via patch I is reported, which may potentially be related to higher-order oligomerization. In general, the groups based on sequence similarity are consistent with the quaternary structural classes, but their biochemical functions could not be inferred directly from sequence information alone. The observed phyletic pattern suggests the dynamic nature of quaternary structures in the course of evolution.
W03 - Identification and classification of conserved RNA structural motifs using a graph theoretical approach
Short Abstract: Originally known as a genetic information carrier, RNA also plays a critical role in multiple cellular processes including transcriptional and translational regulation. Known functional RNA classes include transfer RNA, ribosomal RNA, ribonuclease P RNA, small nucleolar RNA, small nuclear RNA, transfer-messenger RNA, and regulatory elements in untranslated regions of messenger RNA. However, the majority of functional RNA motifs are yet to be identified.
Compared to DNA and protein, whose conserved functional motifs can be identified based on underlying sequence similarity, RNA functional motifs lack a reliable signal at the sequence level. However, RNA sequences with similar functions have conserved secondary and higher-order structures. RNA topology, the global organization of local structural elements (stems, loops, pseudoknots, etc.), offers an approach for identifying unknown but conserved functional elements.
In this study, we have developed a graph theoretical approach that is able to identify a set of topological features in an RNA graph; this set of features defines a unique structural fingerprint of the RNA molecule. By comparison of RNA structural fingerprints, we can identify conserved structural motifs across RNAs. Such conservation may be indicative of as-yet unknown function. Our preliminary results on four known functional RNA classes exhibited successful identification of specific conserved structural motifs in each class. Further classification using this class-specific motif information reached an accuracy of over 90%. The identification of RNA with similar structural features is a step towards structure-based prediction of RNA function.
W04 - Sources of Experimental Function Annotation in UniProt-GOA, Implications for Function Prediction
Short Abstract: Computational protein function prediction programs rely upon well-annotated databases for testing and training their algorithms. These databases, in turn, rely upon the work of curators to capture experimental findings from the scientific literature and apply them to protein sequence data. However, due to high-throughput experimental assays, it is possible that a small number of experimental papers could dominate the functional protein annotations collected in databases. Here we investigate just how prevalent the “few papers – many proteins” bias is. We examine the annotation of experimental protein function in the UniProt Gene Ontology Annotation project (GOA), and show that the distribution of proteins per paper is exponential, with a small number of papers contributing a large number of annotations. We additionally investigate the impact that this bias has on the available function annotations per species. We find that for several important model species, a significant fraction of the annotations available are provided by only a few dominant papers. Given that most high-throughput techniques can find only one function (or a small group of functions), it appears that some level of experimental protein function annotation bias is unavoidable. We discuss how this bias affects our view of the protein function universe, and consequently our ability to predict protein function. Knowing that this bias exists and understanding its extent is important for database curators, developers of function annotation programs, and anyone who uses protein function annotation data to plan experiments.
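The "few papers, many proteins" measurement described above can be illustrated with a simple tally. The following Python sketch assumes annotations are available as (protein ID, paper reference) pairs; the function names and the toy records are hypothetical and are not taken from UniProt-GOA or from the authors' analysis.

```python
from collections import defaultdict


def proteins_per_paper(annotations):
    """Map each paper reference to the number of distinct proteins it annotates."""
    proteins = defaultdict(set)
    for protein_id, paper in annotations:
        proteins[paper].add(protein_id)
    return {paper: len(ids) for paper, ids in proteins.items()}


def top_paper_share(annotations, n_top=1):
    """Fraction of distinct (protein, paper) pairs contributed by the n_top largest papers."""
    counts = proteins_per_paper(annotations)
    total = sum(counts.values())
    largest = sorted(counts.values(), reverse=True)[:n_top]
    return sum(largest) / total if total else 0.0


# Hypothetical toy data: one paper annotates three proteins, two papers annotate one each.
sample = [
    ("P1", "PMID:1"), ("P2", "PMID:1"), ("P3", "PMID:1"),
    ("P4", "PMID:2"), ("P1", "PMID:3"),
]
print(proteins_per_paper(sample))
print(top_paper_share(sample, n_top=1))
```

Sorting the per-paper counts in this way gives a quick view of how much of a species' annotation set depends on a handful of papers.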
W05 - Measuring protein structural changes to quantify residue level flexibility
Short Abstract: The importance of considering protein backbone flexibility is increasingly demonstrated by its incorporation into molecular modeling algorithms such as protein design or docking. Our measurements quantify residue-level backbone flexibility by comparing conformational differences between pairs of independent crystal structures of the same protein. We identify 2 different types of flexibility, and provide computationally inexpensive measures for each. We believe that predicting these measurements within modeling algorithms will provide improvements in modeling capability and efficiency.
Protein flexible regions can be identified by positional fluctuations between the atoms of the backbone (or side-chains) of different conformations. Our hypothesis is that the amplitude of these fluctuations reflects the degree of flexibility. Furthermore, flexibility can be measured in terms of residue movement or shape deformation, and each type of flexibility should be analyzed separately. We discuss the development of two continuous measures of flexibility calculated by comparing corresponding residues and secondary structure elements of pairs of protein conformations, and show how each measure can be used to classify residues that demonstrate flexibility. We compare our measurements to descriptions of conformational changes in the relevant literature and to flexibility measurements of appropriate NMR structures.
W06 - High-throughput subtomogram alignment and classification by Fourier space constrained fast volumetric matching
Short Abstract: Cryo-electron tomography allows the visualization of macromolecular complexes in their cellular environments in close-to-live conditions. The nominal resolution of subtomograms can be significantly increased when individual subtomograms of the same kind are aligned and averaged. Vital to such a procedure are algorithms that speed up subtomogram alignment and improve its accuracy to allow reference-free subtomogram classification. Such a method will facilitate automation of tomography analysis and overall high throughput in data processing. Building on previous work, we propose a fast rotational alignment method that uses the Fourier equivalent form of a popular constrained correlation measure that considers missing wedge corrections and density variances in the subtomograms. The fast rotational search is based on 3D volumetric matching, which improves the rotational alignment accuracy, in particular for highly distorted subtomograms with low SNR and tilt angle ranges, in comparison to a fast rotational alignment based on the matching of projected 2D spherical images. We further integrate our fast rotational alignment method into a reference-free iterative subtomogram classification scheme, and propose a local feature enhancement strategy in the classification process. As a proof of principle, we demonstrate that the automatic method can be used to successfully classify a large number of experimental subtomograms without the need for a reference structure.
W07 - Investigation of sequence, function and quaternary structure of DJ-1 superfamily
Short Abstract: The DJ-1 superfamily (DJ-1/ThiJ/PfpI superfamily) is distributed across all three kingdoms of life. These proteins are involved in a highly diverse range of cellular functions, including chaperone and protease activity. DJ-1 proteins usually form dimers or hexamers in vivo and show at least four different binding orientations via distinct interface patches. Abnormal oligomerization of human DJ-1 is related to neurodegenerative disorders including Parkinson’s disease, suggesting important functional roles of quaternary structures. However, the quaternary structures of the DJ-1 superfamily have not been extensively studied.
Here, we focus on the diverse oligomerization modes among the DJ-1 superfamily proteins and investigate the functional roles of quaternary structures both computationally and experimentally. The oligomerization modes are classified into 4 types (the DJ-1, YhbO, Hsp, and YDR types) depending on the distinct interface patches (I-IV) upon dimerization. A unique, rotated interface via patch I is reported, which may potentially be related to higher order oligomerization. In general, the groups based on sequence similarity are consistent with the quaternary structural classes, but their biochemical functions cannot be directly inferred using sequence information alone. The observed phyletic pattern suggests the dynamic nature of quaternary structures in the course of evolution. The amino acid residues at the interfaces tend to show lower mutation rates than those of non-interfacial surfaces.
W08 - The Critical Assessment of Function Annotation experiment: a community-wide effort towards a better functional annotation of genes and genomes
Short Abstract: A major challenge of the post-genomic era is understanding the function of genes. Accurate functional annotation is critical so that genes can be placed in proper biological context, be it biochemical, physiological or ecological. While the data obtained from sequencing projects are growing exponentially, the biological interpretation is lagging behind. The reason for this lag is that sequencing and rudimentary gene finding have been automated to an acceptable degree of reliability, but the ability to predict gene function has not. Most genome projects and derived databases rely fully on automated functional annotations, making the increase in annotation accuracy and coverage a prime goal for annotation algorithm
It is for this reason that understanding the accuracy of function prediction programs is of primary importance to the process of translating sequence data into biologically meaningful information. Here we present the results of the first Critical Assessment of Function Annotation (CAFA) experiment, held during 2010-2011. Thirty-four research groups worldwide participated in this experiment, with over 50 function annotation algorithms. We provided a list of 48,298 targets from Swiss-Prot, taken from several model organisms, for the participants to annotate. Of these targets, 594 accumulated experimentally verified annotations after the prediction process ended, and were thus fit to be used as a blind benchmark for evaluating the accuracy of the annotation algorithms. The prediction methods were assessed using ROC curves, precision/recall curves, and variations on semantic similarity as applied to the Gene Ontology.
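As an illustration of the precision/recall assessment mentioned above, the following Python sketch scores one target's predicted terms against its experimentally verified terms at a given confidence threshold. It is a minimal sketch under stated assumptions: the function names, the placeholder term identifiers and the scores are hypothetical, and it omits Gene Ontology ancestor propagation and any averaging across targets, so it should not be read as the CAFA evaluation protocol.

```python
def precision_recall(predicted, truth, threshold):
    """Return (precision, recall) for one target at a score threshold.

    predicted: dict mapping term ID -> confidence score in [0, 1]
    truth:     set of experimentally verified term IDs
    Precision is None when no term passes the threshold.
    """
    called = {term for term, score in predicted.items() if score >= threshold}
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else None
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def pr_curve(predicted, truth, thresholds):
    """Evaluate precision and recall at each threshold, in the order given."""
    return [(t, *precision_recall(predicted, truth, t)) for t in thresholds]


# Hypothetical predictions for a single target; term IDs are placeholders.
predicted = {"TERM:A": 0.9, "TERM:B": 0.6, "TERM:C": 0.2}
truth = {"TERM:A", "TERM:C"}
for row in pr_curve(predicted, truth, [0.1, 0.5, 0.95]):
    print(row)
```

Sweeping the threshold from high to low trades precision for recall, which is what a precision/recall curve plots.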
W09 - Factors affecting membrane protein threading
Short Abstract: Membrane proteins comprise up to 50% of drug targets, but only ~2% of structures in the PDB. Template-based modelling of membrane proteins is commonly performed with templates below the twilight zone of sequence identity (~30% identity). The primary determinant of model quality is the quality of the threading between the target and template. We determine the factors that affect threading quality in the membrane regions, and incorporate this knowledge into a threading method. Despite its simplicity, this new method is competitive with the best soluble-protein threading programs, including HHsearch, MSAProbs, and PROMALS. We also discuss the effects of homologue selection, phylogeny, and secondary structure propensities on threading accuracy.
W10 - A Novel Clustering Algorithm to Study Enzyme Reactions and Mechanisms
Short Abstract: The most widely used classification system describing enzyme-catalysed reactions is the Enzyme Commission (EC) number. Understanding enzyme function is important for both fundamental scientific and pharmaceutical reasons. The EC classification was designed when three-dimensional protein structures, which play an important role in understanding and annotating enzyme functions, were not available. Furthermore, the EC classification is essentially unrelated to the reaction mechanism.

We take two sets of descriptors about enzyme reactions and mechanisms, based on the MACiE database. We seek the inherent structure of these data, particularly the number and composition of clusters. We consider various existing supervised and unsupervised clustering methods and show that in general they do not produce satisfactory results for our dataset; many suffer from internal and external biases. Determining the number of clusters, when no prior information is available, is not a trivial task. This motivates us to propose a novel clustering algorithm which is unbiased both in determining the structure of a dataset and in deciding the number of clusters. The algorithm is based on statistical sampling of the data and uses an unbiased expectation value estimator to partition the dataset into meaningful and coherent groups.

Here we present our implementation of the novel clustering algorithm, with some results for validation datasets of known cluster structure, and its application to clustering enzyme reaction and mechanism data.
W11 - A probabilistic approach to predict substrate specificity of methyltransferases
Short Abstract: Background
Methylation is one of the most common chemical modifications, involved in many essential cellular processes. In Saccharomyces cerevisiae there are 61 methyltransferases known to methylate proteins, RNAs, lipids or small molecules. In addition, there are 25 putative methyltransferases with unknown substrate specificity.

Results
We applied a simple probabilistic classifier to predict the substrate specificity of yeast methyltransferases. Our model predicts whether a methyltransferase (MTase) methylates RNA, protein, or another molecule.

To build prediction models we used four general features of the analyzed proteins: structural fold, isoelectric point (pI), expression pattern and cellular localization. The maximum likelihood method coupled with the Akaike Information Criterion was used to estimate the parameters of the models. Our best model correctly predicts general substrate specificity in 84% of cases when tested on methyltransferases with known substrate type.

Conclusions
We predicted substrate specificity for all putative Saccharomyces cerevisiae methyltransferases. Several of our predictions were tested and confirmed experimentally.
Our approach is mostly based on non-yeast-specific properties of the proteins and as such should also be applicable to methyltransferases from other organisms.
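The model selection step mentioned in the Results can be illustrated with the standard Akaike Information Criterion, AIC = 2k - 2 ln L, where k is the number of fitted parameters and ln L is the maximized log-likelihood. The Python sketch below is only an illustration: the candidate model names, log-likelihoods and parameter counts are hypothetical and do not come from the poster.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def select_model(candidates):
    """Return the name of the candidate with the lowest AIC.

    candidates: dict mapping model name -> (maximized log-likelihood, n_params)
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Hypothetical fits: richer models raise the likelihood but pay a parameter penalty.
candidates = {
    "fold_only": (-40.0, 3),
    "fold_plus_pI": (-35.0, 5),
    "all_four_features": (-34.5, 9),
}
print({name: aic(*fit) for name, fit in candidates.items()})
print(select_model(candidates))
```

In this toy example the intermediate model wins because the small likelihood gain from the extra features does not offset their parameter cost.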
W12 - Structure Prediction and Modelling Studies of Brain-Specific Angiogenesis Inhibitor-1
Short Abstract: This poster presents a structure prediction of Brain-specific Angiogenesis Inhibitor 1 (BAI1). BAI1 is a protein that in humans is encoded by the BAI1 gene. It plays a major role in angiogenesis inhibition and suppression of glioblastoma in the brain. BAI1 is postulated to be a G-protein coupled receptor which is a member of the class B secretin receptor family, an inhibitor of angiogenesis and a growth suppressor of glioblastomas. Molecular modelling of human Brain-specific Angiogenesis Inhibitor 1 is based on homology and fold patterns of the known three-dimensional structure of 4DLO_A (template sequence). We compared the two structures generated via the aforementioned methodologies, so as to obtain the final structure. The modelling was followed by energy minimization via the CHARMM27 force field and energy optimization for 2000 iterations.
The modeling of the BAI1 protein can be a significant step towards artificially inducing angiogenesis in blood-clot-affected regions, found especially in the brains of stroke patients.
W13 - A position-specific distance-dependent statistical potential for protein structure and functional study
Short Abstract: Although studied extensively, designing a highly accurate protein energy potential is still challenging. Many knowledge-based statistical potentials have been derived, each consisting of two major components: the observed atomic interacting probability and the reference state. These potentials differ mainly in the reference state and use a similar simple counting method to estimate the observed interacting probability, which is usually assumed to correlate only with atom types. We take a rather different view of the observed interacting probability and parameterize it by protein sequence profile context (i.e., the PSI-BLAST sequence profile in the neighborhood of a residue) and the radius of gyration, in addition to atom types. Our potential has different energy profiles for two atoms of given types, depending on their sequence profile contexts, while others have the same energy profile for an atom pair of fixed types across all proteins. Unlike the simple counting method used by many energy potentials, we estimate the observed atomic interacting probability using a probabilistic neural network, since there are insufficient solved protein structures in PDB for reliable simple counting of sequence profile contexts.
Experimental results indicate that our potential significantly outperforms several popular higher-resolution full-atom potentials in decoy discrimination even though our potential uses only backbone atoms. Our potential also demonstrates superior performance in ab initio folding when compared to DOPE and DFIRE. These results imply that, in addition to the reference state, the observed atomic interacting probability also differentiates energy potentials, and that evolutionary information greatly boosts the performance of statistical potentials.
W14 - Proteome-wide Discovery of Remote Homologs of Human Chemokines by a 3D Profile-based Computational Approach
Short Abstract: Chemokines are small secreted signal proteins with important roles in immune responses. They share a conserved 3D structure, the so-called IL8-like chemokine fold, which is supported by characteristic disulfide bonds. Sequence- and profile-based computational methods have been proficient in discovering novel chemokines utilizing their sequence-conserved cysteine patterns. However, it has recently been shown that some chemokines escaped annotation by these methods due to low sequence similarity to known chemokines and to different arrangement of cysteines in sequence and 3D.
To overcome this limitation, we developed a novel computational approach for proteome-wide identification of remote homologs of the chemokine family that uses fold recognition in combination with an automatic scaffold-based mapping of disulfide bonds to define a 3D profile of the chemokine family. By applying our methodology to all currently uncharacterized human protein sequences, we have discovered two novel proteins that, without having significant sequence similarity to known chemokines or characteristic cysteine patterns, show strong structural resemblance to known anti-HIV chemokines. Detailed computational analysis and experimental structural investigations support our structural predictions and highlight several other chemokine-like features. Our results support their functional annotation as putative novel chemokines and encourage further experimental characterization.
The identification of remote human chemokine homologs may provide new insights into the molecular mechanisms causing pathologies such as cancer or AIDS, and may contribute to the development of novel treatments. In addition, the genome-wide applicability of our methodology based on 3D protein family profiles may open up new possibilities for improving and accelerating protein function annotation processes.
W15 - Computer-Based Annotation of Putative AraC/XylS-Family Transcription Factors of Known Structure but Unknown Function
Short Abstract: Currently, about 20 crystal structures per day are released and deposited in the Protein Data Bank. A significant fraction of these structures is produced by research groups associated with the structural genomics consortium. The biological function of many of these proteins is generally unknown or not validated by experiment. Therefore, a growing need for functional prediction of protein structures has emerged. Here we present an integrated bioinformatics method that combines sequence-based relationships and three-dimensional (3D) structural similarity of transcriptional regulators with computer prediction of their cognate DNA binding sequences. We applied this method to the AraC/XylS family of transcription factors, which is a large family of transcriptional regulators found in many bacteria controlling the expression of genes involved in diverse biological functions. Three putative new members of this family with known 3D structure but unknown function were identified, for which a probable functional classification is provided. Our bioinformatics analyses suggest that they could be involved in plant cell wall degradation (Lin2118 protein from Listeria innocua, PDB entry 3oou), symbiotic nitrogen fixation (protein from Chromobacterium violaceum, PDB entry 3oio), and either metabolism of plant-derived biomass or nitrogen fixation (protein from Rhodopseudomonas palustris, PDB entry 3mn2). Mapping of predicted DNA binding sites to the genomes of their source organisms and analysis of the genetic context were in good agreement with our proposed functional annotations.
W16 - HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment
Short Abstract: Sequence-based protein function and structure prediction depends critically on sequence-search sensitivity and accuracy of the resulting sequence alignments. We present an open-source, general-purpose tool that represents both query and database sequences by profile hidden Markov models (HMMs): ‘HMM-HMM-based lightning-fast iterative sequence search’ (HHblits; http://toolkit.genzentrum.lmu.de/hhblits/). Compared to the sequence-search tool PSI-BLAST, HHblits is faster owing to its discretized-profile prefilter, has 50–100% higher sensitivity and generates more accurate alignments.
W17 - A novel approach of spatial motif extraction to classify protein structures
Short Abstract: Motivation: Exploring spatial information in protein structures can give important functional and structural insights. Indeed, spatial motifs may correspond to relevant fragments, which are greatly useful in any computational task dealing with proteins. In this paper, we propose a novel algorithm to find spatial motifs from protein structures by extending the Karp-Miller-Rosenberg (KMR) repetition finder dedicated to sequences. The extracted motifs, termed ant-motifs, obey a well-defined shape which is proposed on a biological basis. These motifs are used to perform various supervised classification tasks on already published data.

Results: Experimental results show that ant-motifs offer considerable benefits, in protein classification, over sequential motifs and spatial motifs of recent related work. We also show that it is better to enhance the data preprocessing rather than to focus only on the optimization of classifiers.

Availability: Programs and data are freely available upon request from the authors or at http://www.isima.fr/~mephu/FILES/AntMotif/
W18 - Contact Geometry of Estrogen Receptor Dynamics
Short Abstract: The estrogen receptor is a biologically important protein with crucial roles both in normal physiology and in breast cancer. We are using molecular dynamics simulations to study the ligand-induced conformational changes in the ligand-binding domain (LBD) of the estrogen receptor-alpha. We use a measure we developed previously, mean-contact-deviation (MCD), to compare the dynamics of the 1QKU estradiol-bound LBD structure (with estradiol removed) to a ligand-free homology model based on the 1LBD RXR-alpha structure. MCD generalizes the widely used root-mean-squared-deviation (RMSD) measure from three dimensions to n-dimensions, where n in the current study equals the number of atoms either in the ligand-binding pocket or in the complete monomer. For both RMSD and MCD analyses we compare individual atom trajectories. Comparisons based on MCD indicate that the ligand-binding pocket atoms exhibit more freedom in the model than in the 1QKU structure. In contrast, RMSD did not indicate any significant differences for the ligand-binding pocket atoms. When analyzing all atoms in the monomer, MCD highlighted histidine 524, a residue experimentally observed to be important for ligand binding, as especially flexible in the model. In contrast, histidine 524 did not exhibit significant differences in RMSD. However, RMSD analysis suggested significant changes in the N-terminal end of helix 3, while MCD did not. We believe, therefore, that RMSD and MCD detect different functionally relevant structural changes, and it is likely that the two analysis techniques offer complementary indications of protein structural changes.
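For reference, the RMSD baseline that MCD is compared against can be written in a few lines. The Python sketch below computes RMSD between two equally sized, already superimposed coordinate sets; the function name and the toy coordinates are hypothetical. MCD itself is not sketched, because its definition is not given in this abstract.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-squared deviation between two matched lists of (x, y, z) points.

    Assumes both structures are already superimposed and atoms are in the same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: every atom is displaced by the vector (3, 4, 0), i.e. by 5 units.
frame_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame_b = [(x + 3.0, y + 4.0, z) for x, y, z in frame_a]
print(rmsd(frame_a, frame_b))
```

Applied to a single atom across the frames of two trajectories, the same function gives a per-atom deviation of the kind compared in the study.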
W19 - Selection of Targets for Function and Structure Determination in the Isoprenoid Synthase Superfamily
Short Abstract: As sequencing expands, the proportion of gene products that can be experimentally characterized becomes vanishingly small, yet current computational approaches for function prediction are not sufficiently reliable alone. Organization of homologous proteins of known and unknown function into superfamilies provides a context that can inform function prediction for unknowns. The Enzyme Function Initiative (EFI) is a computational/experimental partnership for developing a large-scale sequence/structure-based strategy for functional inference. Using as an example the isoprenoid synthase (IS) superfamily, we discuss how the EFI bioinformatics core guides selection of targets for experimental and computational characterization that can aid in functional inference for many other unknowns. Isoprene synthases generate more than 50,000 known natural products yet reaction and substrate specificities are known for <7% of superfamily members.

Based on sequence and structural information, we identified nearly 10,000 divergent superfamily members. Sequence similarity networks were used to visualize the relationships among them, identifying 5 distinctive functional subgroups. Hundreds of potential targets were suggested to cover the breadth of unknowns in the superfamily, then representatives for closely related sets were chosen based on the availability of DNA (to enable cloning), likelihood for successful protein production and structural characterization by crystallography or modeling, genetic tractability of species (for microbiology), and criteria to enable experimental screening by enzymology. The filtered set of 540 targets was submitted for analysis by computational and experimental cores. As their functions are determined, the results will be used to develop annotation transfer rules so that the experimental information can be extrapolated to unknowns.
W20 - Predicting protein binding sites on homology models by a guided integration of sequence and structure data
Short Abstract: Determining the locations of protein binding sites is important for understanding protein function mediated by protein-protein interactions, as well as for structural characterization of the interaction complexes. During the last decade, both sequence-based and structure-based approaches have been developed to predict protein binding sites. The sequence-based methods can be applied to virtually any protein sequence. However, they are less accurate than structure-based methods. On the other hand, structure-based approaches, while being more accurate, are limited by the lack of structural data for many proteins. Therefore, the goal of a new binding site prediction approach would be to bridge the coverage gap, while preserving and hopefully improving the prediction accuracy. Here, we propose a sequence-based protein binding site prediction approach that nevertheless utilizes the benefits of a structure-based method. We achieve this by first constructing a homology model of a target protein, and then predicting the protein binding site(s) through a rational integration of the sequence and structure properties using a machine learning approach. The basic idea of such integration is to rely more on (i) structure-based predictions in the regions that are accurately modeled and (ii) sequence-based predictions in those regions that are not modeled well. As a result, we show that our sequence-based binding site prediction approach is able to obtain accuracies comparable to those of state-of-the-art structure-based predictors.
W21 - A fast and accurate method for large-scale transmembrane beta barrel topology prediction
Short Abstract: Transmembrane beta barrel proteins (TMBs) play a major role in the normal functioning of the cell and are an important constituent of the translocation machinery of the outer membrane proteins in bacteria, mitochondria and chloroplasts. Currently there are only around 36 experimental 3D structures available for TMBs in PDB (at 30% sequence identity). Furthermore, with large amounts of sequence data available from high-throughput methods, it is imperative to develop accurate and fast computational methods for their identification and topology prediction.

Here, we present a fast and accurate method to identify TMBs and predict their topology. The method uses sparse encoded amino acid data as input and employs a Support Vector Machine (SVM) and a Hidden Markov Model (HMM) to generate accurate topologies. The topologies in the training phase are divided into pre-barrel state, outer-loop state, inner-loop state and the transmembrane beta-strand state. In the first stage, 4 separate SVMs are employed to predict the local state preference for each residue. A profile generated from the probabilities thus obtained is used as input to the HMM stage to determine the overall topology. If the number of predicted strands is between 8 and 24, then the given sequence is identified as a TMB. We see the application of our method in the proteome-wide topology prediction of TMBs, where current methods might have a limitation due to the time consuming homologous sequence search step.
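
The input encoding and the final identification rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the window size, the one-letter state labels ("M" for a transmembrane strand, "i" and "o" for loops) and all function names are assumptions made for illustration, and the SVM and HMM stages are omitted entirely.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sparse_encode(residue):
    """One-hot (sparse) encoding: 20 positions, a single 1 for the residue."""
    vec = [0] * len(AMINO_ACIDS)
    idx = AMINO_ACIDS.find(residue)
    if idx >= 0:
        vec[idx] = 1
    return vec


def encode_window(sequence, center, half_width=3):
    """Concatenate one-hot vectors over a window around `center`.

    Positions that fall off either end of the sequence are zero-padded.
    The window size is a hypothetical choice, not taken from the abstract.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(sequence):
            features.extend(sparse_encode(sequence[pos]))
        else:
            features.extend([0] * len(AMINO_ACIDS))
    return features


def count_strands(topology, strand_label="M"):
    """Count maximal runs of the strand label in a per-residue topology string."""
    count = 0
    previous = None
    for state in topology:
        if state == strand_label and previous != strand_label:
            count += 1
        previous = state
    return count


def is_tmb(topology, min_strands=8, max_strands=24):
    """Apply the rule from the abstract: 8 to 24 predicted strands means TMB."""
    return min_strands <= count_strands(topology) <= max_strands
```

In a full pipeline the feature vectors would feed the per-state classifiers, and the decoded topology string would be passed to `is_tmb`.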
TOP
W22 - Search for Conserved Topology in RNA Ensembles
Short Abstract: RNA molecules play an intricate role in many cellular processes, but unlike protein and DNA, our ability to predict and compare RNA structure is limited. Divergent RNA molecules with similar functions are likely to have similar structures but might have no detectable resemblance in sequence. A sequence homology based approach to prediction and comparison is therefore not very reliable. Moreover, only a fraction of biologically relevant stems are predicted by RNA structure prediction methods based on Minimum Free Energy (MFE) approaches, due to restrictions of the algorithms and inaccuracies in energy parameters. At the cost of some overprediction, the set of MFE predicted stems can be extended to include pseudoknots and suboptimal stems. To identify structurally similar RNAs, we need a tool that can find conserved stem topologies in a set of RNA structures, without relying on the primary sequence.


We propose a comparative graph theoretical framework to learn these biologically critical structures. We convert RNA structures to a graph representation (XIOS RNA graph) that includes pseudoknots and mutually exclusive structures, thereby representing ensembles of RNA structures in a single graph. We develop "XIOS Match", an RNA structure matching tool, by using a maximal subgraph isomorphism algorithm, and use it to identify the greatest topological match for a set of RNA structures. We apply our tool to different types of RNA, including ensembles of near MFE structures, and demonstrate that conserved motifs discovered for various RNA species are likely to have functional and structural significance.
TOP
W23 - MEGADOCK: A rapid screening system of protein-protein interactions with all-to-all physical docking
Short Abstract: Background: The elucidation of protein-protein interaction (PPI) networks is important for understanding cellular systems and structure-based drug designs. However, the development of an effective method to conduct exhaustive PPI screening represents a computational challenge.

Results: We have been investigating a protein-docking approach based on shape complementarity and physico-chemical properties. To realize the procedures required to sample a huge number of protein dockings, we have developed “MEGADOCK”, a high-speed protein-protein docking software package. MEGADOCK reduces the calculation time required for docking by using several techniques such as a novel scoring function called the real-Pairwise Shape Complementarity (rPSC) score. We demonstrate that MEGADOCK is capable of exhaustive PPI screening by completing docking calculations 7.5 times faster than the conventional docking software, ZDOCK, while maintaining an acceptable level of accuracy. When our PPI prediction system was applied to a subset of a general benchmark dataset to predict 120 relevant interacting pairs from 120x120=14,400 combinations of proteins, an F-measure value of 0.231 was obtained.
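
For readers unfamiliar with the F-measure reported above, the following sketch shows how it is computed from prediction counts. The counts in the test case are hypothetical and are not the values behind the reported 0.231; the function name is also an assumption.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# All-to-all screening of 120 proteins gives 120 x 120 candidate pairs,
# of which 120 are the relevant interacting pairs.
n_proteins = 120
n_candidate_pairs = n_proteins * n_proteins
```

Because only 120 of the 14,400 candidate pairs are true interactions, a predictor that picks pairs at random would have a precision of under 1%, which is the baseline the "better than random" statement refers to.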

Conclusions: MEGADOCK showed comparable docking accuracy to other FFT-based software programs, such as ZDOCK, while employing a much simpler and thus computationally less expensive score function. Additionally, our software was shown to be applicable to a large scale protein-protein interaction screening problem with accuracy better than random. With our approach combined with parallel high-performance computing systems, searching and analyzing protein-protein interactions with consideration to three-dimensional structures at the interactome scale is now a feasible problem.
TOP
W24 - Fitting protein assemblies into cryoEM density maps using vector quantisation, network alignment and a genetic algorithm
Short Abstract: Single-particle cryo electron microscopy (cryo EM) is amongst the most useful methods available when attempting to study large-scale dynamics. However, most cryo EM maps have a resolution worse than 5 Å and at these levels the unambiguous placement of atoms is not feasible. It is for this reason that fitting atomic structures of proteins and nucleic acid components into cryo EM maps is the primary method for extracting pseudo-atomic models from those maps. Of particular difficulty is the process of assembling the components of large protein complexes into a low-resolution map (worse than ~10 Å) simultaneously, depicting its general shape.

We have developed an automated fitting procedure to address such issues. It relies upon the clustering of a cryo EM density map into a set of vectors (‘feature points’) that embody a simplified description of the map and the subunits, and can be used to quickly place subunits using an integer quadratic programming-based network alignment algorithm [1]. This is combined with a genetic algorithm used to optimise the positions of the feature points. Our current, simulated test cases suggest that this procedure can be used to produce accurate fits.

1. Zhang S et al. (2010) A fast mathematical programming procedure for simultaneous fitting of assembly components into cryoEM density maps. Bioinformatics (ISMB) 26(12): i261–i268.
TOP
W25 - Computational analysis of glycosaminoglycan binding to chemokine CCL5
Short Abstract: Chemokines are small chemotactic proteins, many of which are pro-inflammatory and are released by host in response to pathogen invasion. They act as chemo-attractants, recruiting specific immune cells to the site of infection or physical damage. The migration of these immune cells requires immobilization of chemokines onto proteoglycans located at the surface of endothelial cells and in the extracellular matrix.
Chemokine CCL5 (or RANTES) primarily attracts T-cells, eosinophils and basophils, which produce its corresponding receptor, CCR5. Binding of CCL5 to the glycosaminoglycan (GAG) component of proteoglycans has been shown to induce oligomerization of the chemokine into dimeric, tetrameric and even higher order structures.
Unfortunately, polysaccharide-protein complexes are refractory to crystallization, and therefore require alternative approaches for the characterization of their 3D complexes. Here we are employing computational methods (docking and molecular dynamics simulations) to develop an atomic-level understanding of the mechanism of GAG recognition by CCL5. In particular, we are looking at the origin of the influence of GAG sulfation pattern on binding affinity, specificity and protein oligomerization. The results will be discussed in reference to sparse but critical data from NMR and MS experiments.
TOP
W26 - A Conditional Neural Fields model for protein threading
Short Abstract: Template-based modeling (TM) methods including homology modeling and protein threading have been extensively studied for protein 3D structure modeling. The TM model quality critically depends on the accuracy of sequence-template alignment, which usually contains alignment errors when only distantly-related templates are available. Current TM methods are limited in two respects: they use linear scoring functions to guide the sequence-template alignment, and they depend heavily on sequence profiles. To go beyond these limitations, we present a novel Conditional Neural Fields (CNF) method for protein threading, which can align a sequence to a distantly-related template much more accurately. Our method combines sequence information and structure information using a probabilistic nonlinear scoring function and integrates as much information as possible to estimate the alignment probability of two residues. In particular, our CNF method utilizes neighborhood (sequence and structure) information to estimate the probability of two residues being aligned much more accurately. We use a novel quality-sensitive method to train the CNF model, as opposed to the standard maximum-likelihood (ML) method. The ML method treats all the aligned positions equally, which is inconsistent with the fact that some positions are more conserved than others and more important for protein alignment. By directly maximizing the expected alignment quality, the quality-sensitive method puts more weight on the conserved positions to ensure accurate alignment. Experimental results confirm that the quality-sensitive method usually results in better alignments.
TOP
W27 - X-inactivation: multiple protein-RNA associations eXist
Short Abstract: X chromosome inactivation (XCI) is a process that targets the transcriptional silencing of one of the female X chromosomes, resulting in dosage equivalence with males. Little is known about the molecular machinery regulating the initial steps of X chromosome inactivation. What cellular processes cause X-chromosome inactivation? Molecular details have emerged on the long non-coding transcript Xist and its protein network. We recently introduced a theoretical framework, catRAPID, to study protein associations with RNA molecules. Our approach exploits physico-chemical properties of nucleotide and amino acid chains to estimate their binding ability with great accuracy. Using catRAPID, we are able to reproduce all available experimental data on Xist associations with proteins and propose new interactions. We predict interactions between the long noncoding RNA Xist and a number of epigenetic regulators such as the Polycomb group, the transcription repressor factor YY1, the splicing factor ASF and nuclear proteins SATB1 and SAF-A. Our predictions are in striking agreement with available experimental findings and provide directions for future investigations. In the uncharted field of long non-coding RNAs, where experimental evidence is still rather scarce, catRAPID provides a powerful approach to investigate in silico protein-RNA associations and design experiments that will help the functional characterization of unexplored long non-coding RNAs.
TOP
W28 - RaptorX: A web server for template-based protein structure modeling
Short Abstract: A key challenge of modern biology is to uncover the functional role of the protein entities comprising cellular proteomes. To this end the availability of reliable three-dimensional atomic models of proteins is often crucial. We developed a novel community-wide web-based protocol, RaptorX (http://raptorx.uchicago.edu), for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly-related template proteins (especially those with sparse sequence profile) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. In the recent CASP9 experiment, RaptorX was ranked No.2 among ~80 participating servers, when all test domains were considered. RaptorX also generated significantly better alignments than other servers for the 50 hardest template-based modeling targets in CASP9 and was voted as one of the most interesting and innovative methods by the CASP9 community. In addition to structure prediction, RaptorX also provides domain parsing of long protein sequences and disorder prediction to help users to interpret secondary and tertiary structure prediction results. Currently it takes RaptorX ~35 minutes to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6000 sequences submitted by ~1500 users from around the world.
TOP
W29 - Computer Guided Reactivation of p53 Cancer Mutants
Short Abstract: The tumor suppressor protein p53 is the single most frequently mutated protein in human cancer. About 50% of human tumors express mutant p53, about three-quarters of which is full-length p53 with a single amino acid mutation. Reactivating p53 cancer mutants is a long-held medical goal, and animal studies have shown that restoring p53 activity induces regression even in advanced tumors. Feasibility of this strategy has been demonstrated by the identification of a handful of prototype p53 rescue drugs. However, their rescue spectrum and mechanisms are unknown. Second site point mutations can also reactivate many clinically important p53 cancer mutants in vivo (p53 cancer rescue mutants).

We used a combined computational and biological approach to design an effective platform for the prediction, identification, and optimization of p53 cancer mutant reactivation. Our discovery strategy is guided by state-of-the-art machine learning, genetic mutation analysis, molecular dynamics simulations, solvent fragment mapping, and in-silico docking. We also demonstrated that a structural change from drug binding and a structural change from a genetic mutation can be processed equivalently by our computational predictor, so we are able to use p53 genetic mutation data to make predictions about prototype p53 rescue drugs. These results provide the basis and tools for the development and improvement of potential cancer therapeutics that function through reactivation of p53 cancer mutants.
TOP
W30 - The Protein Model Portal – Towards More Transparent Model Quality Estimation and Assessment
Short Abstract: The Protein Model Portal (PMP) has been developed to foster effective use of molecular models in biomedical research by providing convenient and comprehensive access to structural information for a specific protein. For the first time both experimental structures and theoretical models for a given protein can be searched simultaneously, and analyzed for structural variation. The current release allows searching 22.8 million model structures for 3.7 million distinct UniProt entries (UP rel. 2012_03).
Ultimately, the accuracy of a structural model determines its usefulness for specific applications. Model quality estimation tools evaluate the accuracy of generated models to indicate their usefulness for specific applications in biomedical research. Here, we present new developments of Protein Model Portal supporting model validation and quality estimation, which consist of (1) service interfaces to several established modeling and model quality estimation tools and (2) CAMEO (Continuous Automated Model EvaluatiOn) mechanisms for the continuous evaluation of the modeling programs participating in PMP. The Protein Model Portal not only offers a unique opportunity to apply consistent assessment and validation criteria to the complete set of structural models available for a specific protein, but also benefits from the analysis of the continuous assessment of the modeling services registered with CAMEO.
Visit us at www.proteinmodelportal.org!
TOP
W31 - A rational protein redesign method for improved secretion yields in Aspergillus niger
Short Abstract: High secretion yields are required for industrial production of enzymes. In previous work, using a unique set of proteins that were tested for high-yield secretion in the microbial cell-factory Aspergillus niger, we explored sequence characteristics to predict successful secretion. From the large set of explored features, a protein's amino acid composition was found to be most predictive. Here, we develop a method to rationally redesign the sequence of a 'low-yield' protein to make the amino acid composition more similar to that of 'high-yield' proteins, with the aim of optimizing secretion yields. Our method was validated in silico by showing that a redesigned low-yield protein has an increased sequence identity to a structurally similar high-yield protein. Experimental work will be performed to also validate the method in vivo.
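
As a rough illustration of the feature this abstract relies on, the sketch below computes an amino acid composition vector and a simple distance between two compositions. The L1 distance and the function names are assumptions chosen for illustration; the abstract does not state which similarity measure or redesign procedure the authors use.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    counts = Counter(c for c in sequence.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def composition_distance(seq_a, seq_b):
    """L1 distance between two composition vectors (0 = identical, 2 = disjoint)."""
    comp_a = composition(seq_a)
    comp_b = composition(seq_b)
    return sum(abs(comp_a[aa] - comp_b[aa]) for aa in AMINO_ACIDS)
```

A redesign procedure of the kind described would aim to lower such a distance between a low-yield protein and the high-yield set while preserving structure and function.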
TOP
W32 - Ultra rapid, accurate quality assessment of protein structure models
Short Abstract: Prediction of proteins' 3D structure is one of the major goals of contemporary bioinformatics. For each predicted model, an independent measure is needed to evaluate the correctness of the model. This is the role of Model Quality Assessment programs (MQAPs). Traditionally, MQAPs focused on evaluating structural features of predicted models, to assess the likelihood of a model being similar to a native protein structure. These approaches are capable of detecting non-physical model conformations, but not of discriminating which of two biophysically feasible structure models is in the correct conformation. The advent of consensus methods alleviated this problem. Consensus methods are based on the premise that among different models of the same protein, the one that is most similar to the others is most likely to be correct.

Most consensus-based MQAPs rely on structural superposition, which is a computationally expensive process and as such makes consensus approaches infeasible for larger model ensembles. Additionally, structural superposition does not account for conformational flexibility of proteins.

The approach presented in this work does not rely on structural superposition, but rather on comparison of inter-atom distance matrices. It is at least as efficient in selecting the most accurate models from the model ensemble as world-leading consensus methods. Due to use of the streaming computing platform (off-the-shelf CUDA-compatible GPU), it is able to obtain at least a 10-fold speed-up in comparison to the other approaches, with no upper bounds on the number of models in the ensemble, nor on the model size.
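
The idea of superposition-free consensus scoring can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the authors' GPU method: the mean absolute difference between distance matrices, the averaging over all other models, and every function name here are hypothetical choices.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between atoms given as (x, y, z) tuples."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def matrix_difference(a, b):
    """Mean absolute difference over the upper triangle of two distance matrices."""
    n = len(a)
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += abs(a[i][j] - b[i][j])
            pairs += 1
    return total / pairs if pairs else 0.0


def consensus_ranking(models):
    """Rank model indices from most to least similar to the rest of the ensemble.

    Each model is a list of atom coordinates with the same atom ordering.
    """
    matrices = [distance_matrix(m) for m in models]
    scores = []
    for i, mi in enumerate(matrices):
        others = [matrix_difference(mi, mj) for j, mj in enumerate(matrices) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    return sorted(range(len(models)), key=lambda k: scores[k])
```

Because internal distances are unchanged by rotation and translation, two copies of the same conformation placed differently in space compare as identical without any superposition step.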
TOP
W33 - ProBiS – 2012: Web Server and Web Services for Detection of Structurally Similar Binding Sites in Proteins
Short Abstract: The ProBiS web server detects structurally similar binding sites in the PDB and performs local pairwise alignment of protein structures. Here, we present a new version of the ProBiS web server that is 10 times faster than earlier versions, due to the efficient parallelization of the ProBiS algorithm, which reduces the calculation time for comparing a protein query against the entire PDB from hours to minutes. It also features new web services and an improved user interface. In addition, the new web server is united with the ProBiS-Database and thus provides instant access to precalculated protein similarity profiles for over 29,000 non-redundant protein structures. The ProBiS web server is particularly adept at detection of secondary binding sites in proteins. The earlier version remains freely available at http://probis.cmm.ki.si/old-version, and the new ProBiS web server is at http://probis.cmm.ki.si.
TOP
W34 - Evolutionary Protein Protein Interface Classification: Applications and Perspectives
Short Abstract: Distinguishing crystal contacts from biologically relevant interfaces is an important issue in protein structure analysis that naturally lends itself to evolutionary approaches. We present here a new method (and software) using three criteria, two of which are evolutionary, to classify protein interfaces. The method compares favourably to PISA in performance and is suitable for a variety of applications in structural bioinformatics and structural biology, for instance validation of structures and homology models and divide-and-conquer approaches to the structure determination of supramolecular complexes. We have implemented the method as a command line tool and an easy-to-use web server.
TOP
W35 - Prediction of chameleon peptides by using CSSP algorithm
Short Abstract: The sequence potential for non-native β-strand formation and the presence of chameleon sequences have been investigated extensively from the perspective that such structural features are implicated in protein stability and effectiveness. We demonstrated that calculation of contact-dependent secondary structure propensity (CSSP) is highly sensitive in detecting non-native β-strand propensities in helical regions of proteins. β-sheet formation is the main reason for protein aggregation. Based on our study, the CSSP method offers an alternative for designing peptide fragments with varied propensity for conformational change between helix and β-strand.
TOP
W36 - A community resource for analysis and understanding of intrinsic disorder in proteins of completely sequenced genomes.
Short Abstract: A battery of disorder predictors (VLXT, VSL2b, PrDOS and others, to be updated) was run on all protein sequences from 1,766 complete genomes. Included alongside these results are all of the predicted (mostly structured) SCOP domains using the SUPERFAMILY predictor. These annotations together provide a resource which the community can use for the comparison of predictors to each other, for examining the overlap between the disordered predictions and SCOP domains, and for understanding the genomic distribution and evolution of intrinsic protein disorder.
The parsed data are made available in a unified format for download as flat files or SQL tables either by genome, by predictor, or for the complete set. An interactive website provides a graphical view of each protein annotated with the SCOP domains and disordered regions from all predictors overlaid (or shown as a consensus). There are also statistics and tools for browsing and comparing genomes and their disorder within the context of their position in the tree of life.
TOP
W37 - Full-atom structure-based prediction of transcription factor binding sites
Short Abstract: Protein-DNA binding is of paramount importance since it is involved in cellular processes such as gene expression and cell division. Since the first DNA-protein structure complex was solved at atomic resolution, our knowledge about how the recognition is carried out has increased notably.

Two main approaches to predicting protein-binding sites in DNA have been reported: 1) sequence-based methods, which use patterns or profiles derived from sequence alignments (e.g. consensus sequences, WebLogos) and 2) structure-based methods, which use structural information from protein-DNA complexes solved by X-ray crystallography or NMR. The sequence-based methods are the most popular because they are simple to use and implement. However, these methods are not very accurate, exhibiting poor sensitivity/specificity trade-offs.

In this work, we present the development of novel statistical potentials that describe protein-DNA interactions and are used to estimate the stability of protein-DNA complexes from the atomic coordinates of their 3D structures. We show that the combined use of these potentials and software for the 3D modeling of protein-DNA complexes allowed us to recover to a large degree the known experimental binding sites for several transcription factors.

Acknowledgments: This research was funded by grants from FONDECYT, Chile (1110400) and ICM (P09-016-F).
TOP
W38 - Integrative Structure Determination Using Proteomics Data: Application to the Nuclear Pore Complex
Short Abstract: Proteomics techniques have been used to generate comprehensive lists of protein interactions in a number of species. However, relatively little is known about how these interactions result in functional multiprotein complexes. Here, we describe and assess a novel computational technique that bridges this gap by combining data from affinity purification experiments with data from established structure determination techniques. By using small protein fragments and domains as bait, we are able to resolve domain-domain interactions in large protein assemblies. One such assembly is the nuclear pore complex (NPC), which serves as the sole mediator of nucleocytoplasmic exchange in eukaryotic cells. We used our method to determine the structure of the ~600 kDa heptameric Nup84 complex, an essential component of the NPC. This work demonstrates that integrative approaches based on low resolution data can generate functionally informative structures at intermediate resolution.
TOP
W39 - An interactive online platform for structure modelling of G-protein coupled receptors.
Short Abstract: Background:

G-protein coupled receptors (GPCRs) are the targets of nearly half of the drugs on the current pharmaceutical market. Despite the great interest, only a few structures are known to date, because of experimental difficulties in finding the best environment, mutations and ligands to stabilize these transmembrane proteins in crystals. In contrast to time-consuming experimental methods, computational biology offers fast and accurate methods for homology modelling of GPCRs, whose structure consists of a common fold of seven transmembrane helices. Here, we propose an integrated online platform for modelling of GPCRs and their small-molecule complexes. The platform will significantly decrease the time of structural analysis and provide computational resources needed in large scale biological projects.

Description:

The platform employs sequence profile generation, anchored alignment of sequence profiles, assessment of the template-target alignment, extensive loop modelling and final model assessment based on statistical potentials. The final protein model is refined in an all-atom force field with hydrogen atoms included and thus can easily serve as a receptor in flexible docking. Small molecules used in the docking can be either agonists or inverse agonists, since the activation state of the GPCR is taken into account during the model building procedure.

Conclusions:

We provide the molecular biology community with an efficient computational tool accessible online. By combining various modelling techniques, we omit the time-consuming molecular dynamics step usually used in studies of transmembrane proteins. The platform can be used not only by computational biologists but also by experimentalists to verify their data.
TOP
W40 - Predicting Kinks in Trans-Membrane Helices
Short Abstract: Here we present a knowledge-based predictor of kink position and size, using input of amino acid sequence and membrane position information derived from iMembrane.

Alpha-helical bundles make up a majority of integral membrane proteins, and include G-protein coupled receptors and ion channels.
Both are medicinally relevant.
The difficulty of experimentally determining their structure makes modelling these membrane proteins an important tool in drug design, as well as a tool to probe their structural biology.
Transmembrane helices are often kinked, and these kinks have been shown to be functionally important. Sequence mutations have been associated with diseases, and molecular dynamics simulations have demonstrated relationships between amino acids and function.
Initially we studied a non-homologous set of helices from the PDB, using our own method to automatically identify kinks, identifying a residue on the stretched side of the helix as the kink point.
Analysis of the sequences around these kinks identified Proline as a major cause of kinks, as seen in previous studies. We also saw a number of other amino acid effects, including some amino acids preferring to be on the compressed or stretched side of the kink. In addition, by utilizing iMembrane, a homology based tool for positioning membrane protein structure in the membrane, we saw that kinks occurred more frequently in certain parts of the membrane.
Using these observed relationships, we constructed a score-based predictor of kink position and size, using input of amino acid sequence and membrane position information derived from iMembrane.
TOP