CASP14 | Nov 30 – Dec 4, 2020 | Virtual Symposium | Viewing Hall

Session 2: Wednesday, December 2, 2020 at 2:20 PM EST

Presentation 21: Ab Initio Protein Folding Guided by Deep Learning Predicted Distance and Orientations - Chengxin Zhang, University of Michigan

  • Chengxin Zhang, University of Michigan, United States of America
  • Yang Li, University of Michigan, United States of America
  • Xiaogen Zhou, University of Michigan, United States of America
  • Wei Zheng, University of Michigan, United States of America
  • Yang Zhang, University of Michigan, United States of America

Short Abstract: We present D-QUARK, an ab initio protein folding algorithm guided by inter-residue distances and orientations predicted by deep learning. Starting from a target sequence, a multiple sequence alignment (MSA) is selected from a pool of MSAs generated by variants of the DeepMSA algorithm. DeepPotential then uses the MSA to predict inter-residue distances and orientations: the MSA is converted into a covariance matrix and a matrix of pseudo-likelihood maximization parameters, which are fed into deep residual neural networks. The distance and orientation potentials are incorporated into a comprehensive replica-exchange Monte Carlo (REMC) folding simulation with a flat-well potential. The high-quality MSA, accurate deep learning prediction, and REMC simulation with carefully designed energy terms all contribute to the high performance of D-QUARK. In terms of first-model TM-score, D-QUARK outperforms our previous ab initio protein folding algorithm, QUARK, by 108.8%, and two state-of-the-art distance-based structure prediction programs, DMPfold and trRosetta, by 22.9% and 11.4%, respectively.
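
As an illustration only (this is not the D-QUARK code), the two ingredients named above can be sketched in a few lines of Python: a flat-well restraint that is zero inside a predicted distance range, and the standard replica-exchange swap test. The quadratic penalty outside the well, the weight, and all function names are assumptions made for this sketch.

```python
import math

def flat_well_energy(d, lower, upper, weight=1.0):
    """Restraint energy that is zero inside [lower, upper].

    Outside the well the penalty grows quadratically; that functional
    form is an assumption of this sketch, not taken from the abstract.
    """
    if d < lower:
        return weight * (lower - d) ** 2
    if d > upper:
        return weight * (d - upper) ** 2
    return 0.0

def swap_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance probability for exchanging the conformations
    of two replicas with energies e_i, e_j at temperatures t_i, t_j
    (Boltzmann constant set to 1)."""
    delta = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)
```

A swap that moves the lower-energy conformation to the colder replica is always accepted; the reverse swap is accepted with a Boltzmann-like probability.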

Presentation 22: Quality Assessment of Protein Models using Graph Convolutional Networks - Soumyadip Roy, Colorado State University

  • Soumyadip Roy, Colorado State University, United States of America
  • Asa Ben-Hur, Colorado State University, United States of America

Short Abstract: Protein model quality assessment is an important problem. Many algorithms have been proposed for this task, including deep learning methods that use 3D convolution, which showed the promise of deep learning architectures for this problem. We chose Graph Convolutional Networks (GCNs), which to the best of our knowledge have not been used for this task. Proteins can be considered as graphs with the atoms as nodes. GCNs are a powerful neural network architecture that can produce useful feature representations of nodes in networks. We therefore hypothesize that GCNs can learn the features that help discriminate decoys from near-native models. In our method we considered each protein as a graph with the atoms as nodes. We then applied multiple layers of graph convolution over atom-level features, followed by a couple of dense layers. Our aim is to predict a score for the entire protein that reflects the Global Distance Test Total Score (GDTTS) [3] of a model with respect to its native structure. In other words, we use GDTTS scores as our ground truth labels. In later work we extended this approach to predict residue-level GDTTS, training with additional residue-level features, which significantly improved performance. We trained our model on the CASP11 and CASP12 datasets, which together comprise over 200 targets, and tested on the CASP13 dataset of 143 targets. We obtained a Pearson rank correlation of 0.61. Our more advanced models, which were not ready in time for CASP14, yielded improved accuracy with a rank correlation of 0.83.
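
To make the idea of graph convolution over atoms concrete, here is a minimal pure-Python sketch of one layer and a whole-protein readout. It is a simplified mean-aggregation variant chosen for readability, not the network described in the abstract; the function names, the self-loops, and the mean pooling are assumptions.

```python
def graph_conv(features, edges, weights):
    """One simplified graph-convolution layer.

    Each node's features are averaged with those of its neighbours
    (self-loop included), passed through a linear map, then ReLU.
    features: list of per-node feature lists
    edges:    iterable of undirected (a, b) node-index pairs
    weights:  matrix of shape [in_dim][out_dim]
    """
    n = len(features)
    dim = len(features[0])
    neighbours = [{i} for i in range(n)]  # start with self-loops
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    out = []
    for i in range(n):
        group = neighbours[i]
        mean = [sum(features[j][k] for j in group) / len(group)
                for k in range(dim)]
        row = [max(0.0, sum(mean[k] * weights[k][c] for k in range(dim)))
               for c in range(len(weights[0]))]
        out.append(row)
    return out

def readout(node_features):
    """Mean-pool node features into one vector for the whole protein."""
    n = len(node_features)
    return [sum(column) / n for column in zip(*node_features)]
```

In a full model, several such layers would be stacked and the pooled vector fed to dense layers that regress the quality score.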

Presentation 23: Refinement with Improved Restrained Molecular Dynamics - Connor Morris, Brigham Young University

  • Connor Morris, Brigham Young University
  • Wendy Billings, Brigham Young University
  • Dennis Della Corte, Brigham Young University

Short Abstract: Our MD-based refinement protocol placed us among the top performers in the refinement category. Our method is based on the Feig Lab CASP13 refinement protocol, but contains many important improvements, including reduced simulation time. Here, we present key components of our refinement method, an interesting MD-based method for determining structure quality, and details about results of our participation in the CASP14 refinement challenge.

Presentation 24: MELDxMD: incorporating ML derived distograms and NMR data into simulations - A. Mondal, University of Florida

  • A. Mondal, University of Florida, United States of America
  • A. Perez, University of Florida, United States of America

Short Abstract: We have previously established the value of MELDxMD as a Bayesian inference approach for incorporating information into Molecular Dynamics simulations, determining high-accuracy structures for several targets. Previously, the largest successes have come from using NMR data and from using heuristics on small targets. Here we try to break the size barrier for MELDxMD in the absence of NMR data by using distograms derived from combining sequence co-evolution with machine learning. In the current CASP14 event, we submitted 99 predictions in the TS, TR and data-driven categories. All simulations were performed with our local resources at the HiPerGator supercomputer center at the University of Florida. In this poster I will show our method, theoretical background and results.

Presentation 25: Protein structure refinement: how frustration analysis assists principal component guided simulations - Shikai Jin, Rice University

  • Shikai Jin, Rice University

Short Abstract: There has been significant progress since the last CASP experiment in structure refinement by all-atom molecular dynamics simulation, which yields moderate accuracy. It has been shown that principal component guided refinement can successfully search for refined structures that require large conformational changes. Our group also finds that frustration analysis augments this approach by identifying those parts of the structure that already have high local accuracy. Combining these ideas can lead to refined protein structures with improved accuracy. In this poster, we show our performance on several targets and analyze their frustration patterns as well as their principal component properties.

Presentation 26: A Holistic Approach to Integrate Template-based and Template-free Modeling - Y. Yamamori, AIST

  • Y. Yamamori, AIST
  • M. Takemoto, Preferred Networks Inc.
  • R. Ishitani, Preferred Networks Inc.
  • K. Mizuno, Preferred Networks Inc.
  • Y. Tsuchiya, AIST
  • K. Oono, Preferred Networks Inc.
  • K. Tomi, AIST

Short Abstract: In CASP14, we made predictions by integrating both template-based and template-free approaches to cope with large proteins and protein complexes. For template-based modeling, we generated 3D-models with MODELLER, based on target-template alignments derived from our profile-profile aligner FORTE. Profiles of both targets and templates were prepared based on multiple sequence alignments (MSAs), which were obtained using three methods: i) a combination of SSEARCH with the MIQS matrix and PSI-BLASTexB, followed by MAFFT; ii) DELTA-BLAST; and iii) HHblits. In some cases, we used multiple templates that were selected manually to generate 3D-models. For free modeling, we generated 3D-models based on predicted contacts of residue pairs using CONFOLD2. Predicted contacts were obtained using three methods: i) DeepECA, an end-to-end learning framework of protein contact prediction that can effectively use information derived from either deep or shallow MSAs; ii) distance distribution prediction similar to AlphaFold; and iii) consensus prediction of contacts derived from selected CASP-hosted predictions. We converted the distance prediction results from method ii) into contacts when we used them. When no suitable template was found for multimeric targets, we performed rigid-body docking of subunits. To evaluate and rank the generated 3D-models, we mainly used three guidelines: i) the coverage of predicted contacts, derived from DeepECA, satisfied in a model; ii) similarity to a consensus model, selected based on the average TM-score with other models, among models provided by CASP-hosted servers; and iii) Z-scores calculated using FORTE. In addition to these guidelines, for easy targets, we utilized classical 3D-scores such as Verify3D and dDFIRE. For multimeric targets, we also considered the stoichiometry of templates. Occasionally, we performed cluster analysis to evaluate and rank our generated 3D-models.
Because these guidelines and scores are not always consistent, final ranking was sometimes done with human intervention. For the data-assisted target S1063, we used two metrics to select 3D-models by comparing the SAXS profile calculated from a model with the experimental SAXS profile: Chi2 and the volatility of the ratio. Most of our submissions, approximately 60%, are models obtained using template-based approaches. The remaining models were derived from template-free approaches or combinations of the two approaches (partially modeled by free modeling). According to our in-house assessments, our template-based modeling worked well for TBM-easy targets. The TM-scores of our submitted models against the PDB structures for T1024-D1 and D2 are 0.845 and 0.774, respectively. For TBM-hard targets, our template-based modeling still predicted structures near the PDB structures for T1026-D1, T1032-D1, T1046s2-D1, T1056-D1, and T1099-D1. The TM-scores are 0.694, 0.613, 0.685, 0.662, and 0.632, respectively. For FM targets, there is one case, T1029-D1, in which our template-free modeling predicted a structure relatively near the PDB one (TM-score 0.500), and another case, T1082-D1, in which our template-based modeling did better than the template-free approach (TM-score 0.545). For the data-assisted target S1063, we assume, judging by Chi2 and the volatility of the ratio, that we were able to improve our model based on the multimer shape derived from the provided experimental SAXS data.
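
The conversion of predicted distance distributions into contacts mentioned above can be sketched as follows. This is a generic illustration, not the authors' procedure: the 8 Å cutoff follows the usual CASP contact definition, while the bin edges, the probability threshold, and the function names are assumptions.

```python
def contact_probability(bin_probs, bin_upper_edges, cutoff=8.0):
    """Probability that a residue pair is in contact: the summed
    probability of all distance bins lying entirely below the cutoff."""
    return sum(p for p, upper in zip(bin_probs, bin_upper_edges)
               if upper <= cutoff)

def predicted_contacts(distograms, bin_upper_edges, min_prob=0.5):
    """Turn per-pair distance distributions into a ranked contact list.

    distograms maps a residue pair (i, j) to its list of bin
    probabilities; pairs below min_prob are dropped.
    """
    contacts = []
    for pair, bin_probs in distograms.items():
        prob = contact_probability(bin_probs, bin_upper_edges)
        if prob >= min_prob:
            contacts.append((pair, prob))
    return sorted(contacts, key=lambda item: item[1], reverse=True)
```

The resulting ranked list is the kind of input a contact-driven folding tool expects.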

Presentation 27: ProSPr Distance Predictions at CASP14 and Beyond - Wendy Billings, Brigham Young University

  • Wendy Billings, Brigham Young University, United States of America
  • Connor Morris, Brigham Young University, United States of America
  • Dennis Della Corte, Brigham Young University, United States of America

Short Abstract: Our DellaCorteLab team submitted over 700 models across several CASP categories, including RR contacts/distances. These predictions were made using updated versions of our deep ProSPr network and were trained to predict directly in the 10-bin CASP14 distance prediction format specified by the organizers. Here, we share preliminary analysis of our contact performance on CASP14 targets, and investigate correlations with multiple sequence alignment depth and target size. We also present results regarding the ensembling of different contact prediction methods on CASP13 data, demonstrating their potential to outperform all individual component groups. Finally, we outline plans to release our newest improved ProSPr models and encourage groups to do the same, in order to create superior community-derived contact predictions. Both slides and a recording are made available.
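
One simple way to ensemble contact predictions from several methods is to average their per-pair probabilities. The sketch below shows only that baseline; it is not necessarily the ensembling scheme analysed in the poster, and the function name and the choice to count a missing pair as probability 0 are assumptions.

```python
def ensemble_contacts(predictions):
    """Average contact probabilities over several prediction methods.

    predictions is a list of dicts, one per method, each mapping a
    residue pair (i, j) to a contact probability. A pair that a method
    does not report is counted as probability 0 for that method.
    """
    pairs = set().union(*predictions)
    n = len(predictions)
    return {pair: sum(p.get(pair, 0.0) for p in predictions) / n
            for pair in pairs}
```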

Presentation 28: TorchMD: Enlightening molecular simulations with machine learning - Stefan Doerr, Acellera

  • Stefan Doerr, Acellera, Spain
  • Maciej Majewski, Computational Science Laboratory, Spain
  • Adria Perez, Computational Science Laboratory, Spain
  • Cecilia Clementi, Freie Universitat, Germany
  • Frank Noe, Freie Universitat, Germany
  • Toni Giorgino, CNR-IBF, Italy
  • Gianni De Fabritiis, Acellera, Spain

Short Abstract: Molecular dynamics (MD) simulations provide a mechanistic description of molecules by relying on empirical potentials. They allow modelling of time-dependent molecular processes, including protein folding. The quality and transferability of the potentials used for simulation can be improved by leveraging data-driven models derived with machine learning (ML) approaches. Here, we present TorchMD, a framework for molecular simulations with mixed classical and ML potentials. All force computations, including bond, angle, dihedral, Lennard-Jones and Coulomb interactions, are expressed as PyTorch arrays and operations, which includes the ability to run full all-atom systems with the Amber force field. TorchMD enables rapid prototyping and integration of ML potentials by extending force terms commonly used in MD with neural network potentials (NNP) of arbitrary complexity. Furthermore, the package provides the capability to perform end-to-end differentiable simulations. Additionally, we demonstrate the use of TorchMD for learning and simulating an NNP for coarse-grained models of atomistic systems and protein folding. We show training, simulation and analysis of the coarse-grained NNP, and that such models are able to fold chignolin into the correct native structure, highlighting their potential in protein structure prediction. Code and data are freely available at github.com/torchmd

Presentation 29: Deep Learning for protein models quality assessment in CASP14 - Gabriele Pozatti, Stockholm University

  • Gabriele Pozatti, Stockholm University
  • Federico Baldassarre, Royal Institute of Technology

Short Abstract: It is now very important to follow up protein structure prediction methods with a quality assessment (QA) step that can verify the reliability of modelled structures. For the 14th CASP edition, we submitted quality estimates derived from two deep learning-based predictors, ProQ4 and GraphQA. Here, we present a brief description of these methods, as well as a preview of their performance, calculated on those CASP14 targets for which the crystal structures are already available. ProQ4 is a deep learning predictor that takes as input a multiple sequence alignment (MSA) together with a coarse representation of the protein models to be evaluated. This predictor is trained to estimate the Local Distance Difference Test (LDDT) score, a metric which allows both local and global model QA. GraphQA, on the other hand, estimates protein quality using a graph-based representation of protein structure and a Graph Convolutional Network. Overall, GraphQA employs input features similar to ProQ4 but achieves better performance on past CASP editions thanks to a better representation of the spatial structure, which is based on graphs rather than sequences. Specifically, the input to GraphQA is a graph whose nodes represent amino acids and whose edges represent contacts between residues. By construction, edges are placed between nodes that are neighbours in the sequence, i.e. the corresponding residues appear close in the primary structure, or that are neighbours in space, i.e. they are within a certain distance in the tertiary structure. A single GraphQA model is trained to output many quality assessment scores, at both the residue and protein level. Namely, for each residue, LDDT and CAD scores are predicted, and at the protein level GraphQA predicts GDT-TS, GDT-HA, TM-score, LDDT and CAD. Examining the quality assessments of these programs on the available targets has yielded some interesting conclusions.
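
The edge rule described above (neighbours in sequence or neighbours in space) can be sketched as below. This is an illustrative reconstruction, not GraphQA code: the one-point-per-residue coordinates, the sequence separation of 1, and the 8 Å distance cutoff are all assumed values.

```python
import math

def build_residue_graph(coords, max_seq_sep=1, max_dist=8.0):
    """Build the edge set of a residue graph.

    coords holds one (x, y, z) point per residue, in sequence order.
    Two residues are connected if they are close in the sequence
    (index difference <= max_seq_sep) or close in space
    (Euclidean distance <= max_dist).
    """
    edges = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            close_in_sequence = (j - i) <= max_seq_sep
            close_in_space = math.dist(coords[i], coords[j]) <= max_dist
            if close_in_sequence or close_in_space:
                edges.add((i, j))
    return edges
```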

Presentation 30: Redundancy-Weighting for Detailed Secondary Structure Prediction with applications in Estimation of protein Decoy Accuracy - Tomer Sidi, University of Haifa

  • Tomer Sidi, University of Haifa
  • Chen Keasar, Ben-Gurion University of the Negev

Short Abstract: The Protein Data Bank (PDB), the ultimate source for data in structural biology, is inherently imbalanced. To alleviate biases, virtually all structural biology studies use non-redundant subsets of the PDB, which include only a fraction of the available data. An alternative approach, dubbed redundancy-weighting, down-weights redundant entries rather than discarding them. This approach may be particularly helpful for Machine Learning (ML) methods that use the PDB as their source for data. Current state-of-the-art methods for Secondary Structure Prediction of proteins (SSP) use non-redundant datasets to predict either 3-letter or 8-letter secondary structure annotations. The current study challenges both choices: the dataset and the alphabet size. On the one hand, non-redundant datasets are presumably unbiased, but are also inherently small, which limits machine learning performance. On the other hand, the utility of both 3- and 8-letter alphabets is limited by the aggregation of parallel, anti-parallel, and mixed beta-sheets in a single class. Each of these subclasses imposes different structural constraints, which makes the distinction between them desirable. In this study we show improvement in prediction accuracy by training on a redundancy-weighted dataset. Further, we show that the information content is improved by extending the alphabet to consider beta subclasses while hardly affecting SSP accuracy. Finally, we show the utility of 13-class SSP for Estimation of protein Model Accuracy (EMA).
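
A toy version of redundancy-weighting is sketched below: every entry is kept, but each entry in a cluster of similar structures is weighted by the inverse of the cluster size, so that each cluster contributes the same total weight. The inverse-cluster-size rule and the function names are assumptions for illustration; the abstract does not specify the weighting formula.

```python
from collections import Counter

def redundancy_weights(cluster_ids):
    """Give each entry weight 1 / (size of its redundancy cluster)."""
    sizes = Counter(cluster_ids)
    return [1.0 / sizes[c] for c in cluster_ids]

def weighted_accuracy(correct, weights):
    """Accuracy in which every entry counts in proportion to its weight."""
    total = sum(weights)
    return sum(w for ok, w in zip(correct, weights) if ok) / total
```

The same weights can be used as per-sample loss weights during training, so redundant entries still contribute without dominating.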

Presentation 31: Protein structure prediction in CASP14 with the coarse-grained UNRES model - A. Antoniak, University of Gdansk

  • A. Antoniak, University of Gdansk
  • I. Biskupek, University of Gdansk
  • K.K. Bojarski, University of Gdansk
  • C. Czaplewski, University of Gdansk
  • A. Gieldon, University of Gdansk
  • M. Kogut, University of Gdansk
  • M.M. Kogut, University of Gdansk
  • P. Krupa, Polish Academy of Sciences
  • A.G. Lipska, University of Gdansk
  • A. Liwo, University of Gdansk
  • E.A. Lubecka, University of Gdansk
  • M. Marcisz, University of Gdansk
  • M. Maszota-Zieleniak, University of Gdansk
  • S.A. Samsonov, University of Gdansk
  • A.K. Sieradzan, University of Gdansk
  • M.J. Slusarz, University of Gdansk
  • R. Slusarz, University of Gdansk
  • P.A. Wesolowski, University of Gdansk
  • K. Zieba, University of Gdansk

Short Abstract: We tested, with the CASP14 targets, our methodology for protein-structure prediction, which is based on the heavily coarse-grained, physics-based UNRES model. In UNRES the only interaction sites are united backbone peptide groups and united side chains, with the alpha-carbon atoms serving to define backbone geometry. Both monomeric and oligomeric targets were treated, and the models of oligomeric targets were submitted both to CASP and to CAPRI. Three prediction modes were used, each executed by a different group of predictors: (i) database-unassisted mode (group 360, UNRES; 81 regular targets processed), (ii) server-model assisted mode (group 18, UNRES-template; 81 regular targets processed), and (iii) contact-assisted mode (group 96, UNRES-contact; 80 regular targets processed). Each group also processed the NMR-data-assisted targets N1077 and N1088 and the SAXS-data-assisted target S1063. Refinement targets were processed by the UNRES (48 targets) and UNRES-template (50 targets) groups. The prediction procedure consisted of the following stages: (i) running unrestrained or restrained multiplexed replica-exchange molecular dynamics (MREMD) simulations of the targets with UNRES to explore the conformational space; (ii) determining the analysis temperature (before the unfolding-transition temperature) and determining the probabilities of the conformations by using the weighted histogram analysis method (WHAM); (iii) dissecting the simulated conformations into 5 (CASP) or 10 (CAPRI) families by minimum-variance clustering, selecting the conformations closest to the cluster means for further processing, and ranking them by cluster free energy; and (iv) converting the coarse-grained structures to all-atom structures to obtain the candidate predictions, which were submitted to CASP/CAPRI. For smaller targets, MREMD simulations were started from random structures; for larger targets, the starting structures were the 'stage 2' server models.
In the UNRES-group prediction procedure, only weak secondary-structure restraints derived from PSIPRED predictions were imposed. The UNRES-template group used consensus-fragment restraints derived from the server models and from models generated by an in-house installation of I-TASSER, while the UNRES-contact group used contact restraints determined with DNCON2. For oligomeric targets, the HHpred server was used to obtain hints as to the possible structures of oligomers. The recent extension of UNRES enabled us to run simulations for very large targets, including the 260-mer virus capsid (target T1099). For refinement, we developed a method, based on normal-mode analysis, to identify flexible regions, on which the conformational search for alternative geometries was focused. To implement NMR-based restraints, we developed an approach to estimate the positions of backbone and side-chain hydrogen atoms from coarse-grained geometry. More details of the prediction procedure are available in the CASP14 abstracts of the UNRES, UNRES-template, and UNRES-contact groups and in references therein. As the official CASP14 results had not been published at the time the poster was created, only comparisons of the predictions of the UNRES, UNRES-template, and UNRES-contact groups with the corresponding experimental structures that had been released in the PDB at that time are presented.

Presentation 32: Determining the pathway from nascent chain to native state - Leonor Cruzeiro, CCMAR and FCT

  • Leonor Cruzeiro, CCMAR and FCT, Portugal

Short Abstract: Here the aim is to obtain the native state of proteins from the knowledge of the physical process of folding. The hypothesis is that protein folding in vivo is a non-equilibrium, dynamical, process which always starts from a helical chain. The first step in the pathway to the native state is the bending of the initial helix at specific amino acid sites. Thus, finding the location of these sites is key to deciphering the pathway. It is argued that some of those sites are found in regions of the initial helix bounded by positive amino acids at the N-end and by negative amino acids at the C-end.

Presentation 33: Template-based modeling of protein complexes using the PPI3D web server - Justas Dapkunas, Vilnius University

  • Justas Dapkunas, Vilnius University, Lithuania

Short Abstract: The PPI3D web server is user-friendly software, focused on searching, analyzing and modeling protein-protein, protein-peptide and protein-nucleic acid interactions in the context of three-dimensional structures. Reducing the data redundancy by clustering and analyzing the properties of interaction interfaces using Voronoi tessellation makes this software a highly effective tool for addressing different questions related to protein interactions. In recent CASP and CAPRI experiments, PPI3D also proved to be highly effective in detecting structural templates for modeling protein complexes, thus it may be useful for anyone interested in all types of protein interactions.

Presentation 34: Tertiary and quaternary structure prediction in CASP14 using a combination of physics-based approaches with machine learning - Agnieszka Karczynska, CNRS

  • Agnieszka Karczynska, CNRS, France
  • Sergei Grudinin, CNRS, France

Short Abstract: In the current CASP14 experiment, we participated as three human groups in tertiary and quaternary structure prediction (the TS category of targets): VoroCNN-select, Ornate-select, and SBROD-select. Each of these groups followed the same protocol but used a different quality assessment (QA) method: VoroCNN, Ornate, and SBROD, respectively. For the predictions of the assemblies, we extensively used the symmetry assembler SAM and the binary docking method Hex. Multimeric targets and one SAXS-assisted target also motivated us to develop novel methods specifically adapted to these targets. For example, we extended Pepsi-SAXS for rapid computation of scattering profiles of symmetric assemblies, we extended the SAM symmetry assembler for helical symmetries, we introduced new options into the symmetry analyzer AnAnaS, we developed a novel rigid-body replica-exchange Markov-chain Monte Carlo simulation technique, we introduced more options, specifically symmetry constraints, into the interactive docking engine, and more. More information about our methods can be found at https://team.inria.fr/nano-d/software.

Presentation 35: Folding with MELDxMD Guided by trRosetta Predicted Contacts - Roy Nassar, Stony Brook University

  • Roy Nassar, Stony Brook University
  • Emiliano Brini, Stony Brook University
  • Cong Liu, Stony Brook University
  • Sridip Parui, Stony Brook University
  • Gregory Dignon, Stony Brook University
  • Ken Dill, Stony Brook University

Short Abstract: Molecular dynamics (MD) simulations can sample structures with atomic-scale resolution. A major bottleneck, however, is the vast configurational space that a protein can sample when folding. To alleviate this burden, we use our MD accelerator MELD, which leverages external information to limit the search space of physics-based protein simulations. MELD-accelerated MD (MELDxMD) complements knowledge-based, machine learning, and experimental approaches by integrating data from all of them into a technique that delivers high-resolution structures with free-energy-based scoring. In CASP14, MELD combines the AMBER ff14SBonlysc force field and the gbNeck2 implicit solvent with predicted structural and contact information from trRosetta, secondary structure predictions from PSIPRED, and general protein properties (hydrophobic associations) derived from sequence to guide the generation of 3D conformations in the free modeling and refinement categories. MELD integrates information as distance and angle restraints and handles noisy data by activating different subsets of the total restraints at different times throughout the simulation. Short-, medium- and long-range trRosetta contacts were processed and incorporated as restraints between C-beta atom pairs, with MELD enforcing 80% of those restraints to allow for incorrectly predicted contacts. Server predictions from trRosetta or provided refinement templates were used as seeds to start the simulations. MELDxMD operates a replica exchange molecular dynamics (REMD) scheme in which restraints vanish at high T to allow for large-scale conformational sampling and activate with full strength at low T to refine the structures inside the discovered energy minima. All generated structures are clustered into conformational macrostates. The centroid of each of the five most populated clusters is submitted as a representative of its macrostate.
Preliminary results show success on proteins of sizes previously unattainable by MD without help from experimental data.
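
The idea of enforcing only a fraction of noisy restraints can be sketched as follows: compute the violation energy of every restraint for the current conformation, then count only the best-satisfied 80%. This is a toy illustration and not the MELD implementation; the flat-bottom quadratic penalty, the 8 Å contact distance, the force constant, and the function name are assumptions.

```python
def active_restraint_energy(distances, cutoff=8.0, fraction=0.8, k=1.0):
    """Restraint energy counting only the best-satisfied restraints.

    distances: current distance for each restrained atom pair
    cutoff:    distance below which a contact restraint is satisfied
    fraction:  share of restraints that must be enforced
    A restraint costs nothing below the cutoff and k * (d - cutoff)^2
    above it; the worst-violated (1 - fraction) share is ignored.
    """
    energies = sorted(k * max(0.0, d - cutoff) ** 2 for d in distances)
    n_active = int(round(fraction * len(energies)))
    return sum(energies[:n_active])
```

Because the active subset is re-selected for every conformation, a wrongly predicted contact can simply fall into the ignored share instead of distorting the structure.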

Presentation 36: ReFOLD3: Refinement of 3D Protein Models with the Utilization of Contact Prediction and Local Quality Estimation - R. Adiyaman, University of Reading

  • R. Adiyaman, University of Reading, United Kingdom
  • L.J. McGuffin, University of Reading, United Kingdom

Short Abstract: Since the 10th Critical Assessment of Structure Prediction (CASP10), Molecular Dynamics (MD)-based refinement protocols have been found to be more effective than other protocols [1]. However, the most successful MD-based protocols generally require supercomputer-scale resources in order to refine a single 3D protein model. The ReFOLD server [2] was developed by our group to rapidly refine 3D models with more modest computational resources. However, in CASP12 it was found that many of the 3D models from ReFOLD still contained structural flaws and some had drifted further away from the native structure during the refinement process. Many restraint strategies have been used to protect 3D models from the undesired deviations caused by force field inaccuracies. In CASP13, we utilised local model quality estimates to determine the poorly predicted regions in a 3D model, which could be targeted for refinement. In CASP14, we proposed novel gradual restraint strategies that consider each residue's need for refinement according to per-residue accuracy scores and our Contact Distance Agreement (CDA) scores [3]. For the refinement of 3D models of proteins, we used a modified version of our automated ReFOLD method [2]. Our new refinement pipeline, ReFOLD3, consisted of four protocols that were similar to the original version. The major improvement in ReFOLD version 3 was the accommodation of the two new MD-based strategies. The first protocol used a rapid iterative strategy (i3Drefine [4]), and the second and third protocols both employed a more CPU/GPU-intensive molecular dynamics simulation strategy (using NAMD [5]) to refine each starting model. The second protocol introduced molecular dynamics simulations guided by the per-residue accuracy scores obtained from ModFOLD8. The per-residue accuracy scores were used to identify the poorly modelled regions, which were then targeted for refinement.
A gradual restraint strategy based on the per-residue accuracy score was applied, which considered the degree of refinement for each residue during the MD simulations. For the third protocol, residue-residue contact predictions were used to guide the MD simulation. We used our CDA score, which is based on the agreement between the residue contacts predicted by DeepMetaPSICOV [6] and the contacts in the model. If the CDA score was high, a stronger restraint was applied to keep the residues in contact as in the predicted 3D model. A lower CDA score indicated that the residue may be further away from the native structure, so it was targeted for refinement to improve the quality of the predicted 3D model. The third protocol therefore used another gradual restraint strategy. Refined models generated from the first three protocols were then assessed and ranked using ModFOLD8_rank. The fourth protocol was a combination of the first two approaches, in which the top-ranked model from the second and third protocols was further refined using i3Drefine. Finally, all of the refined models generated by each of these protocols, together with the starting model, were pooled and re-ranked using ModFOLD8_rank, and the final top 5 models were selected and submitted during CASP14. The application of the gradual restraint strategy based on the local quality estimation and contact prediction had the effect of increasing the population of improved models. The population of improved models generated by the original MD-based protocol of ReFOLD was ~29.5%, and applying the gradual restraint strategy based on the local quality estimation increased it by a further 4.9 percentage points to 34.4%. According to the observed scores, the population of improved models was further increased to 35.7% for all CASP13 targets with the application of the contact-assisted MD-based protocol.
The gradual restraints based on the local quality estimation were also used to refine the best predicted 3D models identified by ModFOLD8 of the SARS-CoV-2 targets as part of the CASP Commons COVID-19 initiative. A significant proportion of the top 10 scoring models were submitted by our group, according to the official CASP quality estimation results. Our participation in the CASP Commons COVID-19 initiative highlights the importance of our prediction pipelines for elucidating the structures of key protein targets whose experimental structures are not yet solved. ReFOLD3 is our first attempt at utilising contact prediction to guide refinement approaches. Using the per-residue accuracy score and CDA score has prevented the refined models from containing undesired structural deviations, and this is a step towards more consistent refinement strategies. Our application of these gradual restraints has also been a unique pioneering strategy for MD simulations.
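
A gradual restraint scheme of the kind described above can be pictured as a mapping from a per-residue score to a restraint strength: well-modelled residues are held firmly, poorly modelled ones are left free to move. The linear mapping, the score range of 0 to 1, the force-constant bounds, and the function name below are hypothetical; the abstract does not give the actual formula.

```python
def restraint_strengths(scores, k_min=0.0, k_max=10.0):
    """Map per-residue scores to restraint force constants.

    scores: per-residue accuracy or CDA scores, assumed to lie in [0, 1]
            (values outside that range are clipped)
    Returns one force constant per residue, rising linearly from k_min
    for the lowest score to k_max for the highest.
    """
    strengths = []
    for s in scores:
        s = min(1.0, max(0.0, s))
        strengths.append(k_min + (k_max - k_min) * s)
    return strengths
```

For example, `restraint_strengths([0.0, 0.5, 1.0])` gives a free residue, a half-strength restraint, and a full-strength restraint.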

Presentation 39: Generative Pre-training: Improving Tertiary Protein Structure Prediction with Self-supervised Learning - Jacob Stern, Brigham Young University

  • Jacob Stern, Brigham Young University

Short Abstract: Current state-of-the-art protein structure prediction neural networks rely heavily on multiple sequence alignment features. These features have worked well, but they are not differentiable, preventing the neural network from improving those features during training. Additionally, reliance on multiple sequence alignment features is a limitation for structure prediction of proteins with few sequence homologs. Use of multiple sequence alignment features can be seen as a form of semi-supervised learning, using unlabeled sequences to improve protein structure prediction. I propose a different method of semi-supervised learning, based on generative pre-training tasks common in the natural language processing domain. This method results in fully differentiable features and points to fully differentiable sequence-to-structure prediction.

Presentation 40: Experimental validation of protein prediction models with SAXS - Susan Tsutakawa, Lawrence Berkeley National Laboratory

  • Susan Tsutakawa, Lawrence Berkeley National Laboratory, United States of America
  • Greg Hura, Lawrence Berkeley National Laboratory, United States of America
  • John Tainer, Lawrence Berkeley National Laboratory, United States of America

Short Abstract: As predictions increase in accuracy, experimental data can contribute to confidence in the predictions and to their use in the biology community. Small Angle X-ray Scattering (SAXS) is a high-throughput method that can provide information on protein oligomerization and conformational states in solution, under varying buffer conditions. SAXS measures the electron pair distances in a molecule, and the data can be rapidly compared to prediction models. Importantly, SAXS data can be collected for free at most synchrotrons and requires a protein concentration of 1-5 mg/ml in solution. Recent advances in coupling SAXS with size exclusion chromatography have enabled accurate SAXS data collection for even complex systems. Here, we present a recent example of SAXS data validating a prediction model, and request help from the CASP community on developing programs to alter atomic models to fit the SAXS data.
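
Comparing a model with SAXS data is commonly done with a chi-square between the intensity profile calculated from the model and the experimental one, after fitting a single scale factor. The sketch below uses one common definition (a mean of squared, error-normalised residuals); it assumes the calculated profile is already available on the experimental q-grid, and the function name is hypothetical.

```python
def saxs_chi2(i_exp, i_calc, sigma):
    """Chi-square between experimental and model SAXS intensities.

    i_exp, sigma: experimental intensities and their errors
    i_calc:       intensities computed from the model on the same q-grid
    The model profile is first multiplied by the least-squares optimal
    scale factor. Returns (chi2, scale).
    """
    num = sum(e * c / s ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s ** 2 for c, s in zip(i_calc, sigma))
    scale = num / den
    chi2 = sum(((e - scale * c) / s) ** 2
               for e, c, s in zip(i_exp, i_calc, sigma)) / len(i_exp)
    return chi2, scale
```

A value near 1 indicates agreement within experimental error; much larger values suggest the model's shape or oligomeric state differs from what is in solution.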
