Session A poster presentations: Monday, July 11 and Tuesday, July 12, between 12:30 PM CDT and 2:30 PM CDT
Session B poster presentations: Wednesday, July 13, between 12:30 PM CDT and 2:30 PM CDT
Session A Poster Set-up and Dismantle
Session A Posters set up: Monday, July 11, between 7:30 AM CDT and 10:00 AM CDT
Session A Posters dismantle: Tuesday, July 12, at 6:00 PM CDT
Session B Poster Set-up and Dismantle
Session B Posters set up: Wednesday, July 13, between 7:30 AM CDT and 10:00 AM CDT
Session B Posters dismantle: Thursday, July 14, at 2:00 PM CDT
Presentation Overview
The deep manifold sampler is a generative protein sequence model that enables iterative exploration of protein sequence space (Gligorijevic et al., 2021; Berenberg et al., 2022), with the advantage that it can be guided by oracles such as function prediction (Gligorijevic et al., 2019). Rosetta structure-based protein design (Rohl et al., 2004; Leaver-Fay et al., 2011; Alford et al., 2017) uses Monte Carlo sampling of sequence and structure space, driven by optimization of the Rosetta energy function, to explore relevant structural changes during design. Integrating 3D structural data into 'oracles' that guide and/or assess designs from generative models remains an open challenge. Here, we integrate the deep manifold sampler with Rosetta and DeepAb (Ruffolo et al., 2022) through FastRelax and/or CDR graft-based design protocols using predicted antibody structures, which allows generated designs to be conditioned on either total score or interface energy (dG) as predictors of antibody stability or antibody-antigen binding affinity, respectively. We demonstrate both approaches on the Herceptin-Her2 system. We show that the deep manifold sampler proposes novel amino acid changes that explore antibody sequence space widely and effectively. We also show that this novel proposal distribution can produce designs with better interface or total energy than state-of-the-art methods that rely on structure alone.
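The Monte Carlo search that Rosetta-style design relies on can be illustrated with a Metropolis acceptance rule. This is a toy sketch, not the authors' protocol or the Rosetta API: `toy_energy` is a hypothetical stand-in for a structure-based oracle such as total score or interface dG (lower is better), the proposal is a random point mutation where a generative model such as the deep manifold sampler would supply proposals instead, and `kT` is an arbitrary temperature.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_energy(seq: str) -> float:
    # Hypothetical oracle (lower is better): rewards Tyr, penalizes Pro.
    # Stands in for a structure-based score such as total energy or interface dG.
    return -1.0 * seq.count("Y") + 0.5 * seq.count("P")

def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    # Always accept downhill moves; accept uphill moves with probability exp(-dE/kT).
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

def mc_design(seq: str, n_steps: int = 1000, kT: float = 0.6, seed: int = 0) -> str:
    rng = random.Random(seed)
    current, e_cur = seq, toy_energy(seq)
    best, e_best = current, e_cur
    for _ in range(n_steps):
        # Proposal step: one random point mutation (a generative model could replace this).
        pos = rng.randrange(len(current))
        proposal = current[:pos] + rng.choice(AMINO_ACIDS) + current[pos + 1:]
        e_new = toy_energy(proposal)
        if metropolis_accept(e_new - e_cur, kT, rng):
            current, e_cur = proposal, e_new
            if e_cur < e_best:
                best, e_best = current, e_cur
    return best
```

Keeping the acceptance rule separate from the proposal step is what makes it possible to swap a learned proposal distribution in without touching the oracle.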
Presentation Overview
DNA-binding proteins play critical roles in gene regulation and development, so a reliable method for predicting DNA-binding sites is essential. Compared with small-molecule binding site prediction methods, DNA-binding site prediction methods still have room for improvement. We constructed a convolutional neural network that uses only the 3D coordinates and atom types of protein surface atoms as input to predict how likely each voxel on the protein surface is to be a DNA-binding site. The model's improved accuracy and its consistent results across three datasets demonstrate its robustness. The results also show that protein 3D structures combined with atom-type information on the protein surface can be used to predict binding sites on a protein, which inspired us to develop new algorithms for predicting the binding sites of other biological molecules on target proteins.
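A CNN of this kind needs the surface atoms converted into a multi-channel 3D grid. A minimal sketch, assuming a 1 Å voxel size, a 16-voxel cube, and a hypothetical four-channel atom-type scheme (C, N, O, S); the authors' actual grid parameters and atom typing are not specified in the abstract.

```python
import numpy as np

# Hypothetical channel assignment; the real model may use a richer atom typing.
ATOM_TYPES = {"C": 0, "N": 1, "O": 2, "S": 3}

def voxelize(coords, elements, center, box_size=16, voxel=1.0):
    """Return a (channels, box, box, box) occupancy grid centered on `center`."""
    grid = np.zeros((len(ATOM_TYPES), box_size, box_size, box_size), dtype=np.float32)
    origin = np.asarray(center, dtype=float) - box_size * voxel / 2.0
    idx = np.floor((np.asarray(coords, dtype=float) - origin) / voxel).astype(int)
    for (i, j, k), el in zip(idx, elements):
        ch = ATOM_TYPES.get(el)
        # Skip unknown atom types and atoms that fall outside the cube.
        if ch is None or not (0 <= i < box_size and 0 <= j < box_size and 0 <= k < box_size):
            continue
        grid[ch, i, j, k] += 1.0
    return grid
```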
Presentation Overview
Bacteria in biofilms are notoriously tolerant to conventional antibiotics, making the treatment of biofilm infections a clinical challenge. Antimicrobial peptides (AMPs), seen as a novel anti-biofilm approach, target specific biofilm features, as opposed to 'one size fits all' antibiotics. In previous work, we developed Biofilm-AMP (B-AMP v1.0), an open-source structural and functional repository of AMPs for biofilm studies, consisting of >5000 AMP structures with annotations to the biofilm literature. Its user-friendly functionality includes searchable AMP information, FASTA files, and PDB and PDBQT structures. Additionally, AMPs with known activity against Gram-positive and Gram-negative bacteria are listed. B-AMP v2.0 adds known biofilm targets: a manually curated, systematic list of potential biofilm targets across various bacterial pathogens. Using three databases (PDB, UniProt, and PubMed), we curated ~2500 targets spanning >50 functional categories. Each target entry has a unique target ID, a 3D structure, a PDBQT file, and supporting literature references. 3D structures for targets without structural data were modeled using ROSETTA. As a case study, we highlight MD simulations of previously identified candidate AMPs with the catalytic site residues of the Sortase C protein (a biofilm target) of Corynebacterium striatum.
Presentation Overview
The mitotic kinesin kif11 (also known as EG5) is a validated chemotherapeutic target, with several compounds at various stages of clinical trials. All current drug candidates bind uncompetitively with respect to ATP/ADP at allosteric site 1, formed by loop L5 and helices α2 and α3. Recent experiments found another allosteric site (site 2), formed by helices α4 and α6, where inhibitors bind either competitively or uncompetitively with respect to ATP/ADP.
However, it is still unclear how inhibitors that bind to the two different allosteric sites of kif11 alter the kinetics of the motor domain. Here, we studied the critical structural dynamics of important regions upon inhibitor binding at allosteric sites 1 and 2 using coarse-grained elastic network models, namely Gaussian Network Models (GNM) and Anisotropic Network Models (ANM). The GNM results showed differences in the structural dynamics of the various inhibitor-bound states of kif11 that could account for their different modes of inhibition. ANM identified specific functional regions of kif11. We conclude that the binding mechanisms at allosteric sites 1 and 2 are distinct. Simultaneous ligand binding at both allosteric sites combines structural interactions found independently at sites 1 and 2, leading to a different binding mechanism.
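For readers unfamiliar with GNM, the core computation is short: build a Kirchhoff (connectivity) matrix from Cα contacts within a cutoff, then take its pseudo-inverse to obtain relative residue fluctuations. A minimal NumPy sketch, assuming the commonly used 7.3 Å cutoff and uniform spring constants; it is not the authors' pipeline, and packages such as ProDy provide full GNM/ANM implementations.

```python
import numpy as np

def kirchhoff(ca_coords, cutoff=7.3):
    """GNM Kirchhoff matrix: -1 for each C-alpha pair within `cutoff`, degree on the diagonal."""
    xyz = np.asarray(ca_coords, dtype=float)
    d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    gamma = -(d <= cutoff).astype(float)
    np.fill_diagonal(gamma, 0.0)
    np.fill_diagonal(gamma, -gamma.sum(axis=1))  # diagonal = number of contacts
    return gamma

def gnm_fluctuations(ca_coords, cutoff=7.3, n_modes=None):
    """Relative mean-square fluctuations from the pseudo-inverse of the Kirchhoff matrix."""
    vals, vecs = np.linalg.eigh(kirchhoff(ca_coords, cutoff))
    nonzero = vals > 1e-8                  # drop the zero (rigid-body) mode
    vals, vecs = vals[nonzero], vecs[:, nonzero]
    if n_modes is not None:                # optionally keep only the slowest modes
        vals, vecs = vals[:n_modes], vecs[:, :n_modes]
    return (vecs ** 2 / vals).sum(axis=1)
```

Comparing such fluctuation profiles (or individual slow modes) between apo and inhibitor-bound structures is one way differences in dynamics between binding states can be quantified.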
Presentation Overview
Staphylococcus aureus is a Gram-positive bacterial pathogen that causes various disorders, e.g., skin infections and sepsis. Antibiotics are used to treat these infections; however, excessive use of these drugs results in antibiotic resistance. S. aureus develops antibiotic resistance via multiple mechanisms, e.g., mutations in the genes encoding efflux pumps, which prevent drug accumulation in the cell so that the bacteria cannot be eliminated. NorA is the most studied efflux pump in S. aureus and contributes to resistance against compounds such as hydrophilic fluoroquinolone antibiotics, antiseptics, and dyes such as ethidium bromide. A variety of NorA inhibitors have also been found, suggesting a promiscuous mechanism of substrate/inhibitor recognition. Here, we aimed to characterize the ligand specificity and promiscuity of NorA via molecular docking simulations, using a 3D structure of the protein generated with AlphaFold2 and a list of its known inhibitors and non-inhibitors. An ensemble of NorA conformations was generated by MD simulations, and ligands were docked to various binding pockets on these conformations to examine their binding characteristics. Our results show that NorA exhibits diverse binding characteristics and that ligand binding sites were not consistent, which confirms its promiscuous nature. Our findings may guide further research on the discovery of novel NorA inhibitors.
Presentation Overview
Sustainable butanol production from lignocellulosic biomass (LB) consists of three steps: pretreatment, detoxification, and fermentation. During pretreatment of LB, many undesirable compounds (e.g., aliphatic/aromatic acids, aldehydes, and furans) are formed that significantly reduce the butanol yield in fermentation, rendering the whole process economically unfeasible. Using quantum mechanics/molecular mechanics (QM/MM), we aim to investigate the inhibition of key alcohol/aldehyde dehydrogenase (AAD) enzymes in lignocellulosic hydrolysate (LH). The objectives of the present study are: (1) identification and homology modelling of key AAD enzymes; (2) validation, quality assessment, and biophysicochemical characterization of the modelled enzymes; (3) identification, construction, and optimization of the chemical structures of potent AAD inhibitors in LH; and (4) molecular docking and dynamics simulations to profile the molecular interactions between AAD enzymes and their inhibitors. The study describes the mechanism and mode of inhibition of the AAD enzyme, including its crucial binding sites. The analysis revealed that minimizing the presence of p-coumaric acid, vanillic acid, and cinnamaldehyde in LH can significantly increase the biobutanol yield from LH. This study will guide genetic/metabolic engineers in designing robust enzymes with improved substrate utilisation, resulting in substantial product yields and specificity.
Presentation Overview
With the recent advances in protein 3D structure prediction, protein interactions are becoming more central than ever. Here, we address the problem of determining how proteins interact with one another. More specifically, we investigate whether near-native protein complex conformations can be discriminated from incorrect ones by exploiting the local environments around interfacial residues. Deep Local Analysis (DLA)-Ranker is a deep learning framework that applies 3D convolutions to a set of locally oriented cubes representing the protein interface. It explicitly considers the local geometry of the interfacial residues, along with their neighboring atoms and the regions of the interface with different solvent accessibility. We assessed its performance on three docking benchmarks comprising half a million acceptable and incorrect conformations. We show that DLA-Ranker successfully identifies near-native conformations in ensembles generated by molecular docking and that it outperforms or matches other deep learning-based scoring functions. We also showcase its usefulness for discovering alternative interfaces.
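"Locally oriented" cubes depend on expressing each interfacial residue's neighborhood in a residue-centered frame, so the representation does not change when the whole complex is rotated or translated. A minimal sketch, assuming the frame is built from the backbone N, CA, and C atoms by Gram-Schmidt orthogonalization; DLA-Ranker's exact frame definition may differ.

```python
import numpy as np

def local_frame(n, ca, c):
    """Orthonormal frame (rows are axes) built from backbone atoms N, CA, C."""
    n, ca, c = (np.asarray(v, dtype=float) for v in (n, ca, c))
    e1 = c - ca
    e1 /= np.linalg.norm(e1)
    u = n - ca
    e2 = u - np.dot(u, e1) * e1        # Gram-Schmidt: remove the e1 component
    e2 /= np.linalg.norm(e2)
    e3 = np.cross(e1, e2)              # right-handed third axis
    return np.stack([e1, e2, e3])

def to_local(coords, n, ca, c):
    """Express neighbor-atom coordinates in the residue's local frame, centered on CA."""
    R = local_frame(n, ca, c)
    return (np.asarray(coords, dtype=float) - np.asarray(ca, dtype=float)) @ R.T
```

The local coordinates returned here could then be voxelized into a cube for a 3D CNN.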
Presentation Overview
Dengue fever, caused by the dengue virus, is epidemic in tropical countries such as India. Currently, there are no vaccines or drugs available for treatment. In the life cycle of the dengue virus, the envelope glycoprotein E, which is associated with proteins C and M, mediates attachment to the host cell. Fusion of the viral and host cell membranes is driven by a conformational change of protein E in the low-pH environment, from a dimeric pre-fusion conformation to a trimeric post-fusion state. In a previous study, two novel cavities were characterized and validated. Targeting one of these cavities, located at the dimer interface, we used 3805 FDA drugs as starting candidates to identify inhibitors that could lock protein E in its dimeric metastable conformational state. A two-step virtual screen using AutoDock Vina and AutoDock identified potential molecules after enrichment. Using iLibDiverse, we plan to create a focused library of molecules using criteria such as Lipinski's rule of five, fewer synthesis steps, and the longevity of interactions with both chains of the dimer. The top-scoring molecules will be further studied using MD simulations.
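Lipinski's rule of five, one of the library-filtering criteria mentioned above, is easy to state in code. A minimal sketch, assuming the descriptors have already been computed (e.g., with a cheminformatics toolkit) and using the common convention that at most one violation is tolerated; how iLibDiverse applies the rule is not described here, and the compounds in the usage example are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    name: str
    mol_weight: float   # Da
    logp: float         # octanol-water partition coefficient
    h_donors: int
    h_acceptors: int

def lipinski_violations(d: Descriptors) -> int:
    """Count rule-of-five violations: MW > 500, logP > 5, HBD > 5, HBA > 10."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])

def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    # Poor oral absorption is typically flagged when more than one rule is broken.
    return lipinski_violations(d) <= max_violations
```

For example, a hypothetical compound with MW 720 Da, logP 6.3, 4 donors, and 12 acceptors breaks three rules and would be filtered out.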
Presentation Overview
COVID-19 has affected the lives of millions of people around the world. To develop therapeutic interventions and control the pandemic, scientists have isolated several neutralizing antibodies against SARS-CoV-2 from vaccinated and convalescent individuals. These antibodies can be explored further to understand SARS-CoV-2-specific antigen-antibody interactions and the biophysical parameters related to binding affinity, which can be used to engineer more potent antibodies against current and emerging SARS-CoV-2 variants. In the present study, we analyzed the interface between the SARS-CoV-2 spike protein and neutralizing antibodies in terms of amino acid residue propensity, pair preference, and atomic interaction energy. We observed that contacts involving Tyr residues are highly preferred and energetically favorable at the interface of spike protein-antibody complexes. We also developed a regression model relating experimental antibody binding affinity to structural features, which showed a correlation of 0.93. Moreover, we identified several mutations at the spike protein-antibody interface that may lead to immune escape (epitope residues) or improved affinity (paratope residues) in current and emerging variants. Overall, the work provides insights into spike protein-antibody interactions, the structural parameters related to binding affinity, and the effects of mutations on binding affinity, which can help in developing better therapeutics against COVID-19.
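A regression model relating binding affinity to structural features can be sketched as ordinary least squares followed by a Pearson correlation between predicted and experimental values. This assumes a linear model; the authors' actual features, model form, and validation scheme are not given in the abstract, so the feature matrix here is a hypothetical placeholder (e.g., per-complex contact counts or interaction energies).

```python
import numpy as np

def _design(features):
    # Prepend an intercept column to the feature matrix.
    return np.column_stack([np.ones(len(features)), np.asarray(features, dtype=float)])

def fit_affinity_model(features, affinity):
    """Ordinary least-squares fit of affinity ~ features (with intercept)."""
    coef, *_ = np.linalg.lstsq(_design(features), np.asarray(affinity, dtype=float), rcond=None)
    return coef

def predict(coef, features):
    return _design(features) @ coef

def pearson_r(y_true, y_pred):
    return float(np.corrcoef(y_true, y_pred)[0, 1])
```

A correlation computed on the same data used for fitting is optimistic; jack-knife or cross-validated predictions give a fairer estimate for small antibody datasets.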
Presentation Overview
The biological functions of a protein are determined by its 3D structure. Recent work on SO(3)-invariant and equivariant neural networks (ENNs) for protein structures has made remarkable progress in protein engineering tasks such as protein function prediction and driver mutation identification. Nevertheless, these models struggle to provide interpretable explanations of their predictions for protein engineers and experimental biologists, which limits their direct usefulness for designing functionally enhanced proteins. To address this concern, we present a novel protein graph attention pooling (PGAP) layer for interpretable prediction of protein function that weights individual amino acids differently. Experiments on synthetic datasets validated the interpretability of PGAP. On the enzyme-catalyzed reaction classification benchmark dataset, combining PGAP with SO(3)-ENN protein structure representation modules achieves performance competitive with existing ENN models.
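The interpretability of attention pooling comes from the per-residue weights it produces alongside the pooled vector. A minimal sketch of generic attention pooling, not PGAP's exact formulation: the scoring vector `w` and bias `b` are assumptions (in practice they are learned), and PGAP may also use the protein graph when computing scores.

```python
import numpy as np

def attention_pool(h, w, b=0.0):
    """Pool per-residue embeddings h (n_residues, d) into a single vector.

    Returns the pooled vector and the softmax weights, so each residue's
    contribution to the prediction can be inspected."""
    h = np.asarray(h, dtype=float)
    scores = h @ np.asarray(w, dtype=float) + b
    scores = scores - scores.max()          # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ h, alpha
```

Mapping `alpha` back onto the structure highlights which residues drove a given function prediction.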
Presentation Overview
Antibiotic resistance is one of the leading challenges to public health today, and plasmids are a primary contributor to its rapid rise. Plasmids are therefore critical targets for preventing the rapid spread of antibiotic resistance. In particular, low-copy-number plasmids often contain toxin-antitoxin systems that act lethally when activated. Because of the role of toxin-antitoxin systems in triggering internal cell death, key interacting regions of the Kid-Kis toxin-antitoxin interaction were identified as binding sites for the de novo design of small-molecule inhibitors using the LEA3D web server. To predict the activity of the novel inhibitors, a QSAR classification model was constructed with OCHEM using published experimental data on a related system. The most promising inhibitors, four out of five of which were classified as active compounds, were molecules targeting the Glu66 to Arg72 region of the Kis antitoxin. Gibbs free energy (p = 0.000000252) and pKd (p = 0.000459) calculations showed statistically significant binding affinity compared to control molecules, indicating binding specificity toward the target interaction region. In the fight against antibiotic resistance, small-molecule inhibitors targeting toxin-antitoxin systems may be an important route to selectively targeting plasmid-mediated resistance by harnessing the cell's internal mechanisms for antibiotic development.
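Binding free energy and pKd are two views of the same quantity, linked by ΔG = RT ln Kd, so pKd = -ΔG / (RT ln 10). A minimal sketch, assuming ΔG in kcal/mol and a temperature of 298.15 K; the tools the authors used to compute these values are not specified.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def pkd_from_dg(dg_kcal: float, temp_k: float = 298.15) -> float:
    """Convert a binding free energy (kcal/mol, negative = favorable) to pKd."""
    return -dg_kcal / (R_KCAL * temp_k * math.log(10))

def kd_molar(dg_kcal: float, temp_k: float = 298.15) -> float:
    """Dissociation constant in mol/L implied by the binding free energy."""
    return 10 ** (-pkd_from_dg(dg_kcal, temp_k))
```

At 298.15 K, RT ln 10 is about 1.364 kcal/mol, so each additional 1.364 kcal/mol of binding free energy corresponds to a tenfold tighter Kd (e.g., -9.0 kcal/mol gives a pKd of about 6.6).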
Presentation Overview
Molecular dynamics (MD) simulations generate a huge amount of data containing atomic positions and velocities. Most MD analyses study only part of the system, which is like finding a needle in a haystack. Here, we attempt to identify time-dependent dynamic molecular features from MD simulation trajectories. The motivation is to identify dynamic features that can be used to train machine learning (ML) algorithms to identify potential drug molecules. Our hypothesis is that features extracted from MD trajectories provide dynamic information about the interactions between the ligand and the protein, so ML models trained on these data should have more predictive power than conventional models. We took human androgen receptor ligand-binding domain structures from the PDB, along with a set of 1431 drug molecules (450 agonists and 981 antagonists) from PubChem, plus testosterone (positive control) and cyproterone acetate (negative control). Using a cumulative ~28 microseconds of MD simulation data, we plan to use trajectory analyzers such as TRAVIS, MD-TASK, MDAnalysis, and ProDy to extract features such as mean square displacement, mechanical stiffness, and the extent of amino acid perturbation to discriminate agonists from antagonists.
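Mean square displacement, one of the candidate features listed above, averages squared displacements over atoms and time origins for each lag. A minimal NumPy sketch, assuming the trajectory is already unwrapped and aligned and stored as a (frames, atoms, 3) array; the trajectory analyzers named above provide their own, more complete implementations.

```python
import numpy as np

def mean_square_displacement(traj, max_lag=None):
    """MSD(lag) averaged over atoms and time origins.

    traj: array of shape (n_frames, n_atoms, 3), assumed unwrapped and aligned."""
    x = np.asarray(traj, dtype=float)
    n_frames = x.shape[0]
    max_lag = n_frames - 1 if max_lag is None else max_lag
    msd = np.zeros(max_lag + 1)
    for lag in range(1, max_lag + 1):
        disp = x[lag:] - x[:-lag]          # displacements over all time origins
        msd[lag] = np.mean(np.sum(disp ** 2, axis=-1))
    return msd
```

Restricting `traj` to ligand or binding-pocket atoms yields a per-complex curve that could be summarized (e.g., by its slope) as an ML feature.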
Presentation Overview
AlphaFold2 produces high-quality structure predictions at high speed. As a result, the AlphaFold database is expected to release over 100 million structures covering the UniRef90 database this year, which will soon lead to billions of structures. Additionally, prediction speed is constantly improving; e.g., ColabFold's pipeline is approx. 100 times faster than the base system. Despite these advances in speed, storage remains a problem: storing the structure of a protein with 250 residues in PDB format takes approx. 200 kilobytes (25 kilobytes for the 3D coordinates alone), so one billion structures would require hundreds of terabytes.
Here, we propose a format and method to compress protein structures that requires only 2 kilobytes for a protein structure of average size (8 bytes per residue), reducing the required storage space by an order of magnitude. We achieve this reduction by encoding the backbone torsion angles and the side-chain angles in a compact format instead of the 3D coordinates. Additionally, we show that our lossy compression has no impact on downstream structural analyses.
By storing angles in an optimized bit format, we reduce the required disk space by 90% compared with float-encoded 3D coordinates, while maintaining high compression and decompression speed.
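The core of angle-based compression is quantizing each angle to a fixed number of bits. A minimal sketch, assuming 12 bits per angle over [-180°, 180°), which bounds the error at half a quantization step (about 0.044°); this is not the authors' actual bit layout, and for simplicity each code is held in a 16-bit integer rather than being packed tightly.

```python
import numpy as np

def encode_angles(deg, bits=12):
    """Quantize angles in degrees to integer codes of `bits` bits (bits <= 16)."""
    levels = 1 << bits
    a = np.mod(np.asarray(deg, dtype=float) + 180.0, 360.0)        # map to [0, 360)
    q = np.round(a / 360.0 * levels).astype(np.int64) % levels      # wrap 360 back to 0
    return q.astype(np.uint16)

def decode_angles(q, bits=12):
    """Recover angles in [-180, 180) from their quantized codes."""
    levels = 1 << bits
    return np.asarray(q, dtype=float) * 360.0 / levels - 180.0
```

Because torsions are periodic, reconstruction error should be measured as a circular difference, which is why a code of 4096 wraps to 0 (i.e., -180°).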
Presentation Overview
Highly accurate structure prediction methods, such as AlphaFold2 and RoseTTAFold, are generating an avalanche of publicly available protein structures. Searching through these structures with current structural alignment tools is becoming the main bottleneck in their analysis. Here we propose Foldseek, a fast and sensitive protein structure alignment method for comparing large structure sets. Foldseek encodes structures as sequences over a 20-state 3Di alphabet. 3Di describes discretized tertiary residue-residue interactions, which are critical for reaching high sensitivity. Foldseek's novel local alignment stage combines structural and amino acid substitution scores to improve sensitivity without sacrificing speed. It reaches sensitivities similar to state-of-the-art structural aligners while being at least 20,000 times faster. The open-source Foldseek software is available at foldseek.com, and a webserver is available at search.foldseek.com.
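Combining structural and amino acid substitution scores can be illustrated with a Smith-Waterman local alignment in which each aligned column scores the sum of a 3Di term and an amino acid term. This is a simplified sketch, not Foldseek's implementation: it uses a linear gap penalty instead of affine gaps, omits prefiltering, and `toy_sub` is a hypothetical identity score standing in for the real 3Di and BLOSUM matrices.

```python
def sw_combined(seq3di_a, aa_a, seq3di_b, aa_b, s3di, saa, gap=-4.0):
    """Best local alignment score where each column scores s3di + saa.

    seq3di_* are 3Di strings, aa_* the matching amino-acid strings (same lengths);
    s3di and saa are substitution functions; gap is a linear gap penalty."""
    n, m = len(seq3di_a), len(seq3di_b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = (H[i - 1][j - 1]
                     + s3di(seq3di_a[i - 1], seq3di_b[j - 1])
                     + saa(aa_a[i - 1], aa_b[j - 1]))
            # Local alignment: scores are floored at zero.
            H[i][j] = max(0.0, match, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

def toy_sub(x, y):
    # Hypothetical identity-based score (not real 3Di or BLOSUM values).
    return 2.0 if x == y else -1.0
```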
Presentation Overview
Residue coevolution within and between proteins is used as a marker of physical interaction and/or functional cooperation between residues. Pairs or groups of coevolving residues are extracted from multiple sequence alignments using a variety of computational approaches. However, coevolution signals that emerge in subsets of sequences might be lost if the full alignment is considered. iBIS2Analyzer is a web server dedicated to phylogeny-driven coevolution analysis of protein families under different evolutionary pressures. It is based on iBIS2, the iterative version of the coevolution analysis method BIS (Blocks in Sequences). iBIS2 is designed to iteratively select and analyse subtrees in phylogenetic trees, which may be large and comprise thousands of sequences. With iBIS2Analyzer, openly accessible at http://ibis2analyzer.lcqb.upmc.fr/, the user can visualize, compare, and inspect clusters of coevolving residues by mapping them onto sequences, alignments, or structures of choice, greatly simplifying downstream analysis steps. A rich, interactive graphical interface facilitates the biological interpretation of the results.
Presentation Overview
iCn3D was initially developed as a web-based 3D molecular viewer. It became a collaborative research instrument through the sharing of permanent, shortened URLs that encapsulate not only annotated visual molecular scenes, but also all underlying data and analysis scripts in a FAIR manner. More recently, the growth of structural databases and the need to analyze large structural datasets systematically led us to support Python scripts and to convert the code for use in Node.js scripts. A few example Python scripts at https://github.com/ncbi/icn3d/tree/master/icn3dpython export secondary structures or PNG images from iCn3D; users just need to replace the URL in these scripts to export other annotations from iCn3D. Furthermore, any interactive iCn3D feature can be converted into a Node.js script that runs in batch mode, so that an interactive analysis performed on one or a handful of protein complexes can be scaled up to large ensembles of structures. Example Node.js analysis scripts are available at https://github.com/ncbi/icn3d/tree/master/icn3dnode. We also review new features such as DelPhi electrostatic potential, 3D view of mutations, alignment of multiple chains, assembly of multiple structures by realignment, and use of iCn3D in Jupyter Notebook, as described at https://github.com/ncbi/icn3d/tree/master/jupyternotebook.
We recently added Virtual Reality (VR) and Augmented Reality (AR) features to iCn3D. A video demo is available at https://youtu.be/qzhuomrJPnI.
Source code: https://github.com/ncbi/icn3d
Presentation Overview
Human interferon-gamma (hIFNγ) is a crucial immunomodulating cytokine that binds to its high-affinity cellular receptor, hIFNγR1. The cytokine also binds to the glycosaminoglycans (GAGs) heparin and heparan sulfate (HS), and this binding modulates its physicochemical properties.
We report molecular dynamics studies of the interaction of hIFNγ and HS-derived oligosaccharides in two different scenarios – in the circulation, and at the cell-surface, when the cytokine forms a complex with its receptor.
HS oligosaccharides bind to the C-termini of free hIFNγ with high affinity, forming very stable complexes due to strong electrostatic attraction, and also interact with the positively charged solvent-exposed domains of the cytokine globule. This impedes further interaction of the cytokine with hIFNγR1.
On the other hand, GAGs, and HS in particular, may be crucial participants in the formation of the hIFNγ–hIFNγR complex at the cell surface. Our in silico results demonstrate that placing HS oligosaccharides between the two receptor units facilitates the formation of the cytokine–receptor complex by pulling down the hIFNγ globule via electrostatic attraction of its C-termini. Cell culture experiments confirm that inhibiting the sulfation of HS proteoglycans by adding NaClO3 to the cell medium leads to decreased hIFNγ activity.
Presentation Overview
Multiple mutations in the gene encoding the bifunctional UDP-N-acetylglucosamine 2-epimerase/N-acetylmannosamine kinase (GNE) cause defects in the skeletal muscles, leading to GNE myopathy. The disease is characterized by progressive muscle weakness and atrophy, leading to extreme disability. We analyzed mutations in the Indian population to understand the correlation between genotype and phenotype. We used the dominant isoform 2 of GNE and the mutations Ile618Thr (most pathogenic variant) and Val727Met (least pathogenic variant). The mutated sequences were submitted to AlphaFold and RoseTTAFold. The high-quality models (confidence score of 0.82) were then simulated separately with GROMACS for 0.5 microseconds each to study the effect of the mutations on GNE structure. Mutational analysis (short-range and long-range) was performed by mapping the interactions around each mutation site and categorizing them as newly formed, lost, or retained. The contact order for aromatic interactions was significantly decreased in the mutants. Comparing the interactions around residues 618 and 727 in the isoform 2 and mutant structures, relatively more interactions are lost in the immediate vicinity of the mutation sites. Ile618Thr loses more interactions, indicating lower stability of this mutant. The results suggest a correlation between the structural changes and the phenotypes of the mutations.
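The categorization into newly formed, lost, and retained interactions reduces to set operations once interactions are listed per structure. A minimal sketch, assuming each interaction is recorded as (residue, residue, interaction type) and that residue pairs are unordered; the residue numbers and interaction labels in the usage example are hypothetical.

```python
def compare_interactions(wild_type, mutant):
    """Classify interactions as retained, lost, or newly formed in the mutant.

    Each argument is an iterable of (residue_i, residue_j, interaction_type) tuples;
    residue pairs are normalized so (a, b) and (b, a) are treated as the same pair."""
    def norm(contacts):
        return {(min(a, b), max(a, b), kind) for a, b, kind in contacts}
    wt, mt = norm(wild_type), norm(mutant)
    return {"retained": wt & mt, "lost": wt - mt, "formed": mt - wt}
```

Applied frame by frame (or to interactions persisting above an occupancy threshold), the sizes of the "lost" sets around sites 618 and 727 give the kind of comparison described above.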
Presentation Overview
Deinococcus indicus is a novel Gram-negative bacterium that is radiation-resistant and exceptionally resistant to arsenic. The six proteins of its ars operon involved in arsenic extrusion have not yet been characterized, and their mechanism is unknown. Here, we present computational models (built with RoseTTAFold) and a characterization of the proteins of this operon: two transcriptional regulators (ArsR1, ArsR2), two arsenate reductases (ArsC2, ArsC3), one metallophosphatase family protein, and a transmembrane arsenite efflux pump (ArsB). The ArsRs are repressors of the operon, and the reductases reduce arsenate (As5+) to arsenite (As3+) ions. Excluding ArsB, the others belong to the structural class. ArsB is a transmembrane pump that removes arsenite from the cell, and the function of the metallophosphatase family protein in this operon is unknown. After modelling the proteins with this deep-learning approach, we simulated them with molecular dynamics in GROMACS and NAMD. The resulting trajectories were analyzed with various measures to assess the proteins' stability in a simulated environment and to mine their properties and features. Known structural homologs of all proteins were used for comparison to assess the degree of structural and functional similarity.
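Stability of a simulated model is commonly tracked as the RMSD of each frame to the starting structure after optimal superposition. A minimal sketch of the Kabsch algorithm in NumPy, assuming matched (n, 3) coordinate sets such as Cα atoms; the abstract does not say which measures were used, and MD packages such as GROMACS ship their own RMSD tools.

```python
import numpy as np

def kabsch_rmsd(p, q):
    """RMSD between two matched (n, 3) coordinate sets after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)                     # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q                                # 3x3 covariance matrix
    u, s, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # guard against an improper rotation
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T    # rotation mapping p onto q
    diff = p @ r.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

The reflection guard matters: without it, a mirror image could be "superimposed" perfectly, hiding a real structural difference.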
Presentation Overview
Cellular functions are governed by proteins. While some proteins work independently, most function by interacting with each other, so it is crucially important to know the binding sites that facilitate these interactions. Experimental methods are costly and time-consuming; therefore, effective computational methods are essential. We present PITHIA, a deep learning model for protein interaction site prediction that exploits several of the most powerful tools in bioinformatics: alignment, attention, and embedding. The recently introduced MSA-transformer is a language model that uses attention to learn from millions of multiple sequence alignments and surpasses previous unsupervised methods by a wide margin. We use the contextual embeddings produced by the MSA-transformer as inputs to our program. The architecture of PITHIA is also attention-based, selected after a thorough comparison of multiple candidates. For meaningful comparison with existing programs, we update several widely used datasets with the most current protein binding site information and create a new one, which is the largest and most challenging to date. PITHIA greatly surpasses the competition on five datasets with respect to multiple measures, exceeding the closest competitor by up to 35% in area under the precision-recall curve.
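The area under the precision-recall curve is often estimated as average precision: rank residues by predicted score and average the precision at each true interaction site. A minimal sketch of that estimator; the abstract does not state which AUPRC estimator was used, and ties here are broken by input order.

```python
def average_precision(scores, labels):
    """Average precision over a ranking by decreasing score.

    labels are 1 for true interaction-site residues and 0 otherwise."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])  # stable: ties keep input order
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / rank           # precision at this true positive
    return total / n_pos
```

Unlike ROC AUC, this metric is sensitive to class imbalance, which suits interaction-site prediction where binding residues are a minority.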
Presentation Overview
To assign structural and functional annotations to the ever-increasing number of sequenced proteins, the main approach relies on sequence-based homology search methods, e.g., BLAST or profile hidden Markov model methods, which rely on significant alignments of query sequences to annotated proteins or protein families. While powerful, these approaches do not take coevolution between residues into account. Taking advantage of recent advances in contact prediction, we propose to represent proteins by Potts models, which model direct couplings between positions in addition to positional composition, and to compare proteins by aligning these models. Because of the non-local dependencies, this problem is computationally hard.
We introduce PPalign, a program based on Integer Linear Programming that computes the optimal pairwise alignment of Potts models representing proteins. The approach was assessed on reference pairwise sequence alignments with low sequence identity (3% to 20%). In this experiment, Potts models were aligned in reasonable time (1 min 37 s on average), and PPalign yielded a better mean F1 score than HHalign and independent-site PPalign and, in some cases, found significantly better alignments.
These results show that pairwise couplings from protein Potts models can be used to improve the alignment of remotely related protein sequences in tractable time.
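Why the problem is hard becomes clear when the alignment score is written out: besides a per-position term for aligned fields, every pair of aligned positions contributes a coupling term, so the score does not decompose column by column as in dynamic programming. The sketch below only evaluates that objective for a given alignment (it does not search for the optimum, which PPalign does with ILP), and it assumes scalar products as the field and coupling similarity measures.

```python
import numpy as np

def potts_alignment_score(v_a, w_a, v_b, w_b, alignment):
    """Score an alignment of two Potts models.

    v_*: fields, shape (L, q); w_*: couplings, shape (L, L, q, q);
    alignment: list of (i, k) pairs, position i of A aligned to position k of B,
    increasing in both i and k."""
    # Single-site term: similarity of aligned field vectors.
    score = sum(float(np.dot(v_a[i], v_b[k])) for i, k in alignment)
    # Pairwise term: every pair of aligned pairs contributes (non-local dependency).
    for x in range(len(alignment)):
        for y in range(x + 1, len(alignment)):
            (i, k), (j, l) = alignment[x], alignment[y]
            score += float(np.sum(w_a[i, j] * w_b[k, l]))
    return score
```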
Presentation Overview
Structure comparison is fundamental for understanding proteins, specifically for studying their sequence and structural evolution and for guiding our efforts to predict their structures from their amino acid sequences. Coordinate-based structural alignment methods optimize the distances traversed by aligned residue pairs during linear interpolation between two superimposed structures. Current alignment scores do not take into account whether there is room for this morph, whether it causes steric clashes, or whether it causes topological changes in the compared structures.
ProteinAlignmentObstruction finds steric clashes and self-intersections that occur during linear interpolation between two aligned and superimposed structures. Self-intersections that can be avoided by re-folding at most M (user-defined) residues are called removable; the remaining self-intersections, called essential, indicate different threading or topology.
We find examples of homologous protein pairs with distinct threading, and many pairs of distinctly classified folds that are easily morphed into each other, emphasizing the continuous nature of parts of protein fold space. I will present our new server, Steric and TOPological Model Hindrance, and examples of threading errors it finds in CASP14 models. There are many applications where the ability to detect whether structures are close in configuration space may prove important.
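The steric part of this check can be sketched by stepping through the linear interpolation and flagging non-neighboring residues that come too close. This is a simplified sketch under stated assumptions: one point per residue (e.g., Cα), a hypothetical 3 Å clash distance, and discrete time sampling; detecting self-intersections (chain segments passing through each other), which ProteinAlignmentObstruction also does, requires continuous segment-segment tests not shown here.

```python
import numpy as np

def interpolation_clashes(a, b, n_steps=20, clash_dist=3.0, min_seq_sep=2):
    """Find (t, i, j) where residues i and j clash during the linear morph a -> b.

    a, b: (n, 3) superimposed coordinates of aligned residues."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    i_idx, j_idx = np.triu_indices(len(a), k=min_seq_sep)   # skip chain neighbors
    clashes = []
    for t in np.linspace(0.0, 1.0, n_steps + 1):
        x = (1.0 - t) * a + t * b                          # interpolated coordinates
        d = np.linalg.norm(x[i_idx] - x[j_idx], axis=1)
        for i, j, dist in zip(i_idx, j_idx, d):
            if dist < clash_dist:
                clashes.append((float(t), int(i), int(j)))
    return clashes
```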
Presentation Overview
The Structural Bioinformatics Library (SBL, https://sbl.inria.fr/) is a large set of optimized computational tools for the analysis of protein structure, function, and mechanism. It has a highly modular architecture of C++ template classes and methods, organized by level of generality and abstraction, from core features implementing fundamental algorithms to applications aimed at solving specific bioinformatics problems. Careful mathematical formulation of each task ensures optimal performance in terms of running time and robustness.
SBL now comprises 23 application packages divided into four main topics, ranging from protein interface recognition and conformational sampling to binding affinity and function prediction. For each SBL application, we are developing an interactive environment that lets users explore the SBL algorithms and their parameters. Important and non-trivial options are explained in convenient information frames, and an example set of input data is always available.
The SBL is now also available as a Singularity container and a Conda package.
Presentation Overview
Moniliophthora roreri is a phytopathogenic, hemibiotrophic fungus. This pathogen causes moniliasis, also known as frosty pod rot, in species of the genera Herrania and Theobroma (cocoa tree). Several countries in South, Central, and North America have been affected by this disease, and its most recent occurrence was registered in northern Brazil in July 2022, causing drastic economic losses. This study aimed to use in silico methodologies to repurpose known drug inhibitors and natural compounds available in public databases and to rationally discover new molecules to control this fungus. To define a key M. roreri protein target, we ran a BLASTp search of the proteins encoded in the fungal genome against the PDB, and a list of possible targets was generated by filtering on sequence coverage and identity values. In addition, a protein-protein interaction (PPI) network was built to assess the biological importance of the modeled proteins, which revealed two targets as hub and bottleneck proteins. GSK3 was selected for modeling with Swiss-Model, and docking analyses revealed Manzamine A, an alkaloid isolated from marine sponges, as a possible inhibitor, with an affinity energy of -9.1 kcal/mol. Molecular dynamics calculations were performed to describe the ligand's stability inside the GSK3 active site.
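In PPI network analysis, hubs are usually nodes with high degree and bottlenecks are nodes with high betweenness centrality. A dependency-free sketch using Brandes' algorithm for unweighted, undirected graphs; the top-k selection rule is an assumption (studies often use top-percentile thresholds), and the star network in the usage example is hypothetical.

```python
from collections import deque

def degree(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes) for an undirected, unweighted graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1                            # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                            # BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                            # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each pair is counted twice

def hubs_and_bottlenecks(adj, top=1):
    deg, btw = degree(adj), betweenness(adj)
    hubs = sorted(adj, key=lambda v: -deg[v])[:top]
    bottlenecks = sorted(adj, key=lambda v: -btw[v])[:top]
    return hubs, bottlenecks
```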
Presentation Overview:
Multi-protein assemblies play a crucial role in many cellular processes. Studying the functional basis of such complexes begins with the analysis of protein-protein interactions. Several studies have highlighted the importance of interfacial residues in conferring stability and specificity to protein-protein complexes. In this work, we investigated the features of inter-protein bifurcated interactions in multi-protein assemblies. We began by compiling a dataset of heteromeric complexes of known 3D structure. Screening these complexes showed that inter-protein bifurcated interactions are present in over 600 multi-protein assemblies. Arg, Tyr, and Leu are the most frequently occurring amino acids in these interactions, and van der Waals interactions, hydrophobic interactions, and salt bridges are the most frequent interaction types. Furthermore, most of these residues are hotspots and are moderately to highly conserved, with a few exceptions. We illustrate the biological significance of bifurcated interactions through a few case studies. Overall, this study expands our knowledge of protein-protein interactions and paves the way for a better understanding of multi-protein assemblies.
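The abstract does not define how a bifurcated interaction is detected. The sketch below assumes, purely for illustration, that a residue takes part in an inter-protein bifurcated interaction when any of its atoms lies within a 4 Å cutoff of atoms from two or more different partner chains; both the definition and the cutoff are assumptions, not the authors' criteria.

```python
import math
from collections import defaultdict


def bifurcated_residues(atoms, cutoff=4.0):
    """Return residues contacting two or more different partner chains.

    atoms: list of (chain_id, residue_id, (x, y, z)).
    The contact definition (any atom pair within `cutoff` angstroms)
    is an illustrative assumption.
    """
    partners = defaultdict(set)
    for i, (chain_i, res_i, xyz_i) in enumerate(atoms):
        for chain_j, res_j, xyz_j in atoms[i + 1:]:
            if chain_i != chain_j and math.dist(xyz_i, xyz_j) <= cutoff:
                partners[(chain_i, res_i)].add(chain_j)
                partners[(chain_j, res_j)].add(chain_i)
    return sorted(res for res, chains in partners.items() if len(chains) >= 2)
```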
Presentation Overview:
Protein structures can help elucidate the mechanisms of diseases associated with missense mutations and support the development of therapeutics. With improved deep learning techniques such as RoseTTAFold and AlphaFold, protein structures can be predicted even in the absence of structural homologues. We modelled 553 disease-associated human proteins that have no known structure or sequence homologue in the Protein Data Bank, and extracted their domains. Domains that could be assigned to CATH superfamilies had higher model quality, and lower RMSD between AlphaFold and RoseTTAFold models, than those that could only be assigned to Pfam or to neither. Using these models, we predicted ligand-binding sites, protein-protein interfaces, conserved residues, destabilising effects, and the pathogenicity of missense mutations. We could explain 80% of these disease-associated mutations by proximity to functional sites, structural destabilisation, or pathogenicity. Compared with polymorphisms, these mutations were more buried, more pathogenic, and closer to predicted functional sites, and had higher predicted ddG. Using models from both state-of-the-art techniques, together with multiple predictors that agree a mutation has an effect, gives higher confidence in our predictions. RoseTTAFold models allowed us to explain 93 additional mutations that could not be explained by AlphaFold models alone.
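A minimal sketch of how several lines of evidence can be combined to explain a mutation, with the number of agreeing predictors serving as a rough confidence level. The distance, ddG, and pathogenicity cutoffs below are hypothetical placeholders, not the thresholds used in this work:

```python
import math


def explain(mut_xyz, site_xyzs, ddg, patho,
            dist_cutoff=5.0, ddg_cutoff=2.0, patho_cutoff=0.5):
    """List the (hypothetical) criteria under which a mutation is explained.

    mut_xyz: coordinates of the mutated residue (e.g. its C-alpha).
    site_xyzs: coordinates of predicted functional-site residues.
    ddg: predicted destabilisation (kcal/mol); patho: pathogenicity score.
    """
    reasons = []
    if any(math.dist(mut_xyz, s) <= dist_cutoff for s in site_xyzs):
        reasons.append("near_functional_site")
    if ddg >= ddg_cutoff:
        reasons.append("destabilising")
    if patho >= patho_cutoff:
        reasons.append("pathogenic")
    return reasons  # len(reasons) acts as a crude confidence level
```

Running the same function on AlphaFold and RoseTTAFold models and taking the union of the reasons mirrors, in simplified form, the idea that one model can explain mutations the other cannot.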
Presentation Overview:
Discovering genome-wide chemical-protein interactions is instrumental for chemical genomics, drug discovery, and precision medicine. However, more than 90% of gene families remain dark, i.e., their small-molecule ligands are undiscovered. Existing approaches typically fail when the dark protein of interest differs from those with known ligands or structures. To address this challenge, we developed PortalCG, a deep learning framework with three novel components: (i) end-to-end step-wise transfer learning that follows the sequence-structure-function paradigm, (ii) out-of-cluster meta-learning, informed by protein evolution, that generalizes machine learning models to unstudied gene families, and (iii) stress model selection to facilitate model deployment in real-world scenarios. In rigorous benchmark experiments, PortalCG considerably outperformed state-of-the-art sequence- and structure-based techniques on dark gene families. Experimental validation on 65 compounds supported the accuracy and robustness of PortalCG. PortalCG is therefore a viable solution to the out-of-distribution (OOD) problem in exploring the dark protein functional space, and can be applied to a wide variety of scientific domains.
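PortalCG's out-of-cluster meta-learning is not reproduced here, but the evaluation idea it relies on, holding out whole protein clusters so that test families are never seen during training, can be sketched in a few lines. The `cluster_of` mapping (e.g. from sequence clustering or Pfam families) is a hypothetical input:

```python
import random


def out_of_cluster_split(samples, cluster_of, test_fraction=0.2, seed=0):
    """Split samples so that no cluster appears in both train and test.

    samples: list of protein (or protein-ligand pair) identifiers.
    cluster_of: dict mapping each sample to a cluster/family label.
    """
    clusters = sorted({cluster_of[s] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(clusters)
    n_test = max(1, round(len(clusters) * test_fraction))
    test_clusters = set(clusters[:n_test])
    train = [s for s in samples if cluster_of[s] not in test_clusters]
    test = [s for s in samples if cluster_of[s] in test_clusters]
    return train, test
```

A random per-sample split would let close homologues of test proteins leak into training, which typically overestimates performance on truly dark families.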
Presentation Overview:
Researchers' ability to sequence genomic variation outpaces their ability to functionally interpret mutations. Previous work has shown that calculations on 3D protein structures add mechanistic information for interpreting genetic variation. KRAS is among the key genes whose alterations drive cancer and the germline conditions known as RASopathies. KRAS is a GTPase that controls cell growth and proliferation. The prevalence of KRAS mutations in cancer cells makes KRAS an attractive target for cancer treatment, potentially with small molecules. To better understand the conformational sampling of KRAS hotspot mutants, we performed unbiased and enhanced-sampling molecular dynamics simulations and compared them with each other and with X-ray crystallography data. Our results show that enhanced sampling explores a wider range of conformations and better captures the variability in the experimental data. These findings help us differentiate between the conformations of different KRAS genetic variants and understand which conformational changes are more prevalent in each. We also investigate ways to score conformational sampling spaces, which may streamline the differentiation of hotspot variants of other proteins in future projects. Finally, clustering of the KRAS conformational spaces suggests mutation-specific conformations that may be candidates for small-molecule targeting.
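As one simple way to group MD frames into conformational states, the sketch below applies leader clustering to per-frame feature vectors. The features (for instance, distances involving switch I and switch II) and the cutoff are hypothetical; the abstract does not state which clustering method or descriptors were used:

```python
import math


def leader_clustering(frames, cutoff):
    """Assign each frame to the first leader within `cutoff`, else start a new cluster.

    frames: list of per-frame feature vectors (tuples of floats).
    Returns (leaders, labels), where labels[i] is the cluster index of frames[i].
    """
    leaders, labels = [], []
    for frame in frames:
        for idx, leader in enumerate(leaders):
            if math.dist(frame, leader) <= cutoff:
                labels.append(idx)
                break
        else:  # no leader close enough: this frame founds a new cluster
            leaders.append(frame)
            labels.append(len(leaders) - 1)
    return leaders, labels
```

Comparing cluster occupancies (the fraction of frames per label) across variants is one way such clustering could highlight mutation-specific conformations.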
Presentation Overview:
Motivation: We propose a practical graph-theoretic algorithm to identify CTCF-mediated chromatin loops that are linked in 3D space. Our method finds particular graph structures, K6 minors, in graphs constructed from pairwise chromatin interaction data obtained from ChIA-PET experiments. We show that such graph structures, which represent a particular arrangement of loops, mathematically necessitate linking if they co-occur in an individual cell. The presence of these linked structures can advance our understanding of the principles of spatial genome organization. Results: We apply our method to graphs created from in situ ChIA-PET data for the GM12878, H1ESC, HFFC6, and WTC11 cell lines, and from long-read ChIA-PET data. We analyze these datasets divided into CCDs, closely interconnected regions defined on the basis of CTCF loops. We find numerous candidate regions with minors, indicating the presence of links. The graph-theoretic characteristics of these linked regions, including betweenness and closeness centrality, differ from those of regions in which no minors were found, which supports their non-random nature. We provide two versions of the algorithm: one efficient enough to be applied to large datasets, and the other with greater detection capability.
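For intuition, a K6 subgraph is the simplest way a graph can contain a K6 minor, and by the Conway-Gordon-Sachs theorem every embedding of K6 in 3D space contains a pair of linked cycles. The brute-force sketch below builds a graph from loop anchor pairs and reports 6-vertex cliques. It is a simplification that detects only K6 subgraphs, not the general minors the authors' algorithm searches for, and the anchor names are hypothetical:

```python
from itertools import combinations


def build_graph(loops):
    """Undirected adjacency sets from (anchor_a, anchor_b) loop pairs."""
    adj = {}
    for a, b in loops:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def find_k6_subgraphs(adj):
    """Return all 6-vertex cliques (K6 subgraphs), by brute force."""
    # A vertex in a K6 needs at least 5 neighbours
    nodes = sorted(n for n in adj if len(adj[n]) >= 5)
    found = []
    for combo in combinations(nodes, 6):
        if all(v in adj[u] for u, v in combinations(combo, 2)):
            found.append(combo)
    return found
```

The exhaustive search is only practical on small CCD-sized graphs; detecting general K6 minors requires a different, more involved algorithm.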
Presentation Overview:
While recent advances in structural bioinformatics and deep learning have made single-chain protein structure prediction highly accurate, many related challenges remain, including the prediction of multi-chain protein complex structures. Unlike its single-chain counterpart, coevolution-based quaternary structure prediction requires constructing a “paired” multiple sequence alignment (MSA) that embeds sequence-sequence pairwise coevolutionary information for hetero-multimers. Previous attempts to predict protein complex structures have paired MSAs by the species of origin of individual sequences, by allowing large gaps in unpaired regions of the MSA, and/or by naïve concatenation of monomeric MSAs. Here, we present DeepDimer, a pipeline for predicting protein heterodimer structures that uses a novel MSA pairing method. From a paired MSA, DeepDimer generates inter-chain coevolutionary features, from which a deep residual convolutional neural network predicts an inter-chain distance map. Using this inter-chain distance map and single-chain distance maps generated by DeepPotential, DeepDimer constructs dimer structures with a modified version of PotentialFold. Although DeepDimer is still under development, preliminary results suggest that its novel MSA pairing method significantly improves heterodimer structure prediction accuracy. DeepDimer will be released as a free web server and an open-source project.
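DeepDimer's own pairing method is not described in detail, so the sketch below shows only the species-based baseline mentioned above for comparison: for each species, take the first (most query-similar) hit from each monomeric MSA and concatenate them. The `(species, sequence)` input format is an assumption:

```python
def pair_by_species(msa_a, msa_b):
    """Species-based MSA pairing baseline.

    msa_a, msa_b: lists of (species, aligned_sequence) ordered by similarity
    to the query; the first entry of each list is the query itself.
    Returns concatenated rows, query pair first.
    """
    best_a, best_b = {}, {}
    for species, seq in msa_a[1:]:
        best_a.setdefault(species, seq)  # keep the top hit per species
    for species, seq in msa_b[1:]:
        best_b.setdefault(species, seq)
    paired = [msa_a[0][1] + msa_b[0][1]]
    for species, seq_a in best_a.items():  # dicts preserve insertion order
        if species in best_b:
            paired.append(seq_a + best_b[species])
    return paired
```

Sequences from species present in only one MSA are dropped here; the gap-padding alternative mentioned above would instead keep them with the partner chain filled by gaps.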
Presentation Overview:
Peptide-protein docking structure prediction has gained considerable scientific momentum over the past few years. Analyzing the structures of these complexes and discovering how peptides and proteins bind plays a crucial role in designing and developing peptide drugs and enzyme inhibitors. A wide range of computational algorithms has been developed for peptide-protein docking. Current peptide-protein docking tools often require crystallized protein structures, which are expensive and difficult to obtain. With AlphaFold2 and RoseTTAFold, most protein structures can be predicted accurately from sequence alone. By formulating peptide-protein binding/docking as a protein complex folding problem, we can use AlphaFold2 to generate a series of bound peptide-protein conformations. We have designed a pipeline that uses these AI-based tools to predict peptide-protein complexes and score the predicted models. In this work, we benchmarked the pipeline on a set of non-redundant peptide-protein complex structures derived from the peptiDB, Propedia, and PepBDB databases. We compared the results with several peptide-protein docking tools, such as InterPep2 and GalaxyWEB, and evaluated several scoring schemes, including our in-house graph neural network method, for ranking peptide-protein binding conformations.
Presentation Overview:
Linear B-cell epitopes are a class of antigenic determinants that can bind to B-cell receptors or to antibodies produced by the adaptive immune system. Of the two epitope classes, continuous (linear) and discontinuous, both exist only upon recognition and binding of the antigen by an antibody. Computational approaches offer a scalable, less expensive way to support the development of epitope-based vaccines and immunotherapies by identifying, from a protein sequence, which residues are most likely to be part of an epitope.
A variety of prediction methods have been developed over the years; however, their reliability for clinical applications remains questionable given their low to moderate performance (Matthews correlation coefficients ranging from 0.32 to 0.62). In addition, current machine learning models lack interpretability, limiting the biological insights that could otherwise be obtained. Here, we introduce CSM-epitopes, an interpretable machine learning method capable of accurately identifying linear B-cell epitopes that leverages a new graph-based signature representation of protein sequences, based on our well-established CSM (Cutoff Scanning Matrix) algorithm.
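The Matthews correlation coefficient quoted above is computed from the confusion matrix as (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal Python version, returning 0 when any marginal is empty (a common convention, assumed here):

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # undefined case: treat as no correlation
    return (tp * tn - fp * fn) / denom
```

Because it uses all four cells, MCC stays informative when epitope residues are a small minority, unlike plain accuracy.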
Presentation Overview:
The interaction between enhancers and promoters on genomic DNA remains poorly understood. During most of the cell cycle, chromosomes cannot be observed directly because the genome forms chromatin dispersed throughout the nucleus. However, high-throughput chromosome conformation capture (Hi-C) measures physical interactions within the genome. In previous studies, DNA extrusion loops were derived directly from Hi-C heat maps. Here, we use multidimensional scaling (MDS) to locate DNA loops more precisely. MDS is a multivariate analysis method that reconstructs coordinates from a matrix of pairwise distances between elements. We applied MDS to a genome distance matrix derived from Hi-C data of an immortalized human T lymphocyte cell line, and selected columns 2 and 3 of the orthogonal matrix U as the desired structure. The DNA loops in the reconstructed genome structure contained transcription factors involved in DNA looping, such as SATB1 and HMGIY. These results are consistent with known biology, and our method is suitable for identifying DNA loops in the genome.
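A minimal sketch of classical (Torgerson) MDS, which double-centres the squared distance matrix and takes the leading eigenvectors as coordinates. It uses plain power iteration with deflation to stay dependency-free, assumes the distances are close to Euclidean (so the leading eigenvalues are positive), and does not reproduce the authors' choice of columns 2 and 3 of U or their conversion of Hi-C contacts into distances:

```python
import math


def classical_mds(dist, k=3, iters=1000):
    """Embed n points in k dimensions from an n x n distance matrix."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J with J = I - (1/n) * 11^T
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        # Power iteration for the dominant eigenpair of the (deflated) B
        v = [1.0 + 0.1 * i + 0.01 * i * i for i in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining spectrum is numerically zero
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = norm
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate: remove the eigenpair just found before the next axis
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

The recovered coordinates are unique only up to rotation and reflection, so any downstream loop calling should rely on inter-point distances rather than absolute axes.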
Presentation Overview:
The development of new vaccines and antibody therapeutics typically takes several years and requires over $1bn in investment. Accurate knowledge of the paratope (antibody binding site) can speed up and reduce the cost of this process by improving our understanding of antibody-antigen binding.
We present Paragraph, an open-source structure-based paratope prediction tool that outperforms current state-of-the-art tools using simpler feature vectors and no antigen information. Representing the antibody variable region as a graph, Paragraph uses equivariant graph neural network layers to predict the probability of each residue belonging to the paratope.
Given the lack of readily available antibody crystal structures, it is essential that structure-based prediction tools work on modelled structures. For this reason, all our results are reported on models.
In addition to improving paratope prediction accuracy, we identify issues with currently used benchmark datasets and metrics. To address them, we developed a larger, cleaner dataset for future efforts and suggest metrics well suited to evaluating highly class-imbalanced problems.
Paragraph achieves a PR AUC of 0.725 on ABlooper model structures of our expanded dataset. Promisingly, Paragraph’s performance increases with model confidence, suggesting our accuracy may rise with future improvements to antibody structure prediction.
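PR AUC is often estimated as average precision: rank residues by predicted probability and average the precision at each true paratope residue. For a random predictor it is roughly the fraction of positives, which makes it sensitive to performance on the rare positive class. A minimal sketch (ties are broken by input order, an assumption):

```python
def average_precision(scores, labels):
    """Average precision, a common estimator of the area under the PR curve.

    scores: predicted paratope probabilities; labels: 1 for paratope, else 0.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / k  # precision at this positive's rank
    return total / n_pos
```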
Presentation Overview:
The genome of Mycoplasma pneumoniae is highly reduced, making it an organism well suited to system-wide proteomics studies or, in this case, a proof of principle for in-cell structural proteomics. Here, we show that novel protein complexes can be identified from whole cells imaged with electron cryotomography.
Over 500 cells were imaged. After 3D tomogram reconstruction, we selected a recurring particle of interest (POI) of unknown identity. We picked a number of these POIs and refined their subvolumes to construct an initial average 3D density of the POI, which we then used for template matching against the large dataset of tomograms.
After curation, we trained a 3D U-Net on subvolumes around POIs. The trained model was then used to predict more POIs. We iteratively trained, predicted, and curated to improve results.
To identify the proteins in this POI, we performed rigid-body fitting of AlphaFold-predicted structures into the density. Top hits were confirmed via crosslinking mass spectrometry data. We are working in parallel to improve the resolution of our average, such that secondary-structure elements become apparent and we can begin modelling our identified proteins into the map.
Ultimately, we show it is possible to identify uncharacterized proteins from whole-cell tomograms.
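Template matching against tomograms typically scores each position by normalized cross-correlation (NCC) between the template and the local subvolume. The brute-force sketch below works on small nested-list volumes and ignores rotations, masks, and missing-wedge compensation, all of which a real subtomogram pipeline would need; it is illustrative, not the authors' implementation:

```python
import math
from itertools import product


def ncc(a, b):
    """Normalized cross-correlation of two equal-length flat lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 0.0 if den == 0 else sum(x * y for x, y in zip(da, db)) / den


def match_template(volume, template):
    """Slide a 3D template over a 3D volume ([z][y][x] nested lists).

    Returns (best_score, (z, y, x)) for the highest-NCC offset.
    """
    vz, vy, vx = len(volume), len(volume[0]), len(volume[0][0])
    tz, ty, tx = len(template), len(template[0]), len(template[0][0])
    flat_t = [template[k][j][i] for k in range(tz) for j in range(ty) for i in range(tx)]
    best = (-2.0, None)
    for z, y, x in product(range(vz - tz + 1), range(vy - ty + 1), range(vx - tx + 1)):
        window = [volume[z + k][y + j][x + i]
                  for k in range(tz) for j in range(ty) for i in range(tx)]
        score = ncc(window, flat_t)
        if score > best[0]:
            best = (score, (z, y, x))
    return best
```

Real pipelines compute the same score with FFTs over all rotations, which is far faster than this direct scan.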
Presentation Overview:
AlphaFold2, an ML-based method developed by DeepMind, revolutionised structural biology by predicting the 3D structures of proteins with an accuracy often comparable to experimental characterization. In a joint effort with EMBL-EBI, predicted structures for 21 model organisms were made available. To exploit these, assigning modelled domains to their evolutionary families helps in understanding how genetic variations modify structure and, ultimately, function. The CATH database captures evolutionary relationships between protein domains and classifies them into superfamilies. We identify structural domains in AlphaFold2 models and classify them in CATH. While most domain assignments can be obtained with Hidden Markov Model-based methods, remote homologs are often elusive. We recently established CATHe, a supervised machine learning approach that exploits sequence embeddings from the ProtT5 protein language model (PLM) to detect remote homologs. Using CATHe and Foldseek, a new fast structural aligner, we established thresholds for confirming homology. Before structurally validating the assignments, we removed small, disordered, non-globular, or poorly packed domains. 93% of the domains passing these thresholds could be brought into CATH, with the remainder belonging to ~4200 putative novel families. Manual curation of human domains from these novel families led to the identification of one new architecture and ~100 new folds.
Presentation Overview:
Glycans play important roles in protein folding and cell-cell interactions, and glycosylation of protein antigens can dramatically impact immune responses. We previously developed GLYCO (GLYcan COverage), an in silico tool that quantifies the glycan shielding of protein surfaces. We applied it to determine the glycan-free surface of the SARS-CoV-2 NTD supersite and to correlate glycan coverage with antigen-antibody properties. Here, we developed a user-friendly web server, GLYCO-2.0, and improved computational speed by replacing the previous linear parametrization with a new analytical cylinder method and by using KD-trees to retrieve atom positions within the coordinate space. These new methods increased computational speed ~4-5-fold in both single and multiprocessing settings. GLYCO-2.0 can estimate glycan shielding from a single coordinate file or from multiple frames derived from, for instance, molecular dynamics simulations or NMR spectroscopy, to account for the inherent flexibility of oligosaccharides. The server offers email notifications and allows results to be retrieved within a week. We also showcase the applicability of GLYCO-2.0 by estimating how the glycan shield of influenza hemagglutinin proteins developed over time. Overall, quantification of glycans with GLYCO-2.0 provides a comprehensive understanding of the glycan shielding of glycosylated proteins and supports glycoprotein-related research such as vaccine design.
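A minimal sketch of an analytical point-in-cylinder test of the kind such a method could rely on: a glycan atom counts as shielding if its projection falls between the cylinder's two end points and its perpendicular distance to the axis is within the radius. How GLYCO-2.0 actually defines the cylinder (end points, radius) is not stated here, so all geometry is hypothetical, and the KD-tree pre-filtering step (for example with `scipy.spatial.cKDTree`) is omitted:

```python
def in_cylinder(point, base, tip, radius):
    """True if `point` lies inside the finite cylinder from `base` to `tip`."""
    axis = [t - b for t, b in zip(tip, base)]
    rel = [p - b for p, b in zip(point, base)]
    axis_len2 = sum(a * a for a in axis)
    if axis_len2 == 0:
        raise ValueError("cylinder axis has zero length")
    # Fractional position of the projection along the axis (0 = base, 1 = tip)
    t = sum(r * a for r, a in zip(rel, axis)) / axis_len2
    if t < 0 or t > 1:
        return False
    closest = [b + t * a for b, a in zip(base, axis)]
    dist2 = sum((p - c) ** 2 for p, c in zip(point, closest))
    return dist2 <= radius * radius


def count_shielding_atoms(glycan_atoms, base, tip, radius):
    """Number of glycan atoms falling inside the (hypothetical) cylinder."""
    return sum(in_cylinder(a, base, tip, radius) for a in glycan_atoms)
```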
Presentation Overview:
Several viral glycoproteins undergo conformational changes that are fundamental to infection. The SARS-CoV-2 Spike protein is of particular importance during the current pandemic. This protein interacts with the human angiotensin-converting enzyme 2 (ACE2) receptor as part of the viral entry mechanism, which requires the receptor-binding domain (RBD) of Spike to be in an open conformation. Here we use coarse-grained normal mode analysis to model the dynamics of SARS-CoV-2 Spike protein variants and the transition probabilities between open and closed conformations. We performed 17,081 possible in silico single mutations of Spike to determine the positions and mutations that may affect the occupancy of the conformational states. On that basis, we successfully predicted some of the main mutations that define the Alpha, Beta, and Gamma variants. We also built a simplified model for binding evaluation, validated with experimental binding data for RBD mutants and ACE2. This model is now being applied to interfaces between conformational ensembles of Spike and antibody structures, with preliminary results showing consensus among the various experimentally determined interfaces, with the aim of proposing a method to evaluate mutants that integrates dynamics, binding, and immune escape.
Presentation Overview:
Protein-protein interactions (PPIs) play a crucial role in many molecular processes. Despite many efforts, the mechanisms governing molecular recognition between proteins remain poorly understood, which makes it challenging for computational approaches to distinguish interacting from non-interacting proteins. Here we present a new method to tackle this challenge using intrinsically disordered regions (IDRs). IDRs are protein segments that are functional despite lacking a single invariant three-dimensional structure. The prevalence of IDRs in eukaryotic proteins suggests that their highly dynamic nature is critical for protein function. To test this hypothesis, we predicted human PPIs using the IDR sequences of candidate interacting proteins. We also developed training strategies suited to each type of prediction problem. Our findings underline the importance of separating problem types and show that sequences encoding IDRs can be used to predict specific features of human IDP networks. They further suggest that accounting for IDRs in future analyses should accelerate efforts to elucidate the eukaryotic PPI network.
Presentation Overview:
Prediction and structural modeling of protein-protein interactions (PPIs) are essential for understanding biological processes. Most large-scale experimental and computational approaches that predict PPIs do not provide structural information. We present a novel approach, XLEC, that combines cross-linking mass spectrometry (XL-MS) and evolutionary couplings (ECs) for efficient proteome-wide prediction and modeling of PPIs. While ECs derived from multiple sequence alignments primarily yield information on direct contacts across the interface, XL-MS data preferentially capture longer-range interactions, so the two methods contain complementary information. XLEC integrates information from both approaches in a machine learning-based model, followed by constraint-based modeling of the complex structure. We applied XLEC to data from murine mitochondrial proteomes and compared its performance with that of XL-MS and ECs separately. Our preliminary assessment suggests that XLEC outperforms XL-MS- or EC-based identification of PPIs (precision/recall: XLEC 76%/76%; XL-MS only: 71%/57%; ECs only: 68%/57%). Furthermore, XLEC-based modeling of PPIs achieved excellent L-RMSD (<10 Å) for 20% of the benchmark dataset (XL-MS only: 2%; ECs only: 11%). Using XLEC, we generated around 500 de novo PPI models that reveal novel insights into the mitochondrial interactome.
Presentation Overview:
In the crowded cellular environment, an important challenge is to elucidate how proteins distinguish their native partners from a wide variety of non-interactors. The increasing availability of experimentally determined protein-protein complexes provides an opportunity to investigate preferences in protein-protein interactions. We systematically explored the shape complementarity of interacting proteins using binary hetero complexes from the Protein Data Bank (PDB). The results showed that protein shape characteristics and the corresponding intermolecular energy landscape, sampled by a systematic docking protocol, can discriminate interacting from non-interacting proteins. The number of minima on the energy landscape of known protein interactors, as well as the clustering patterns of these minima, differ from those of non-native protein ligands. These findings provide insight into fundamental properties of protein recognition and can be used to generate more adequate sets of protein-protein complexes for knowledge-based modeling.
Presentation Overview:
Bombali ebolavirus (BOMV) was recently discovered in bats roosting inside houses in Sierra Leone. Although there is currently no evidence of BOMV infection in humans, BOMV glycoprotein (GP) is capable of mediating viral entry into human cells by specifically interacting with the essential filovirus receptor Niemann-Pick C1 (NPC1). Genetic variation at the GP-NPC1 interface serves as a critical determinant of cellular host susceptibility. Given the potential for BOMV transmission to humans, it is imperative to investigate the fundamental properties underlying viral susceptibility of human cells.
Here, we integrate complementary in silico binding affinity tools (FoldX, FlexDDG, and free-energy perturbation (FEP)), in vitro binding ELISAs, and cell-based infectivity assays to characterize the interaction between BOMV GP and human NPC1. We identified a residue in the BOMV GP interface that decreases the binding affinity of GP for human NPC1. Molecular dynamics analysis suggests that the amino acid variation in GP triggers a conformational change in the interacting NPC1 loop that disrupts energetically favorable contacts. This study uncovers a novel genetic determinant of the cellular host range of BOMV and sets the stage for a novel surveillance strategy to rapidly characterize the zoonotic risk of emergent filoviruses.
Presentation Overview:
Protein-protein interactions that drive function in eukaryotes can be described by short linear motifs (SLiMs). Conservation of SLiMs helps reveal functional SLiMs in eukaryotic protein families. However, because eukaryotic SLiMs are so simple, they can arise by chance through mutational processes, not only in eukaryotes but also in pathogenic bacteria and viruses. Moreover, functional eukaryotic SLiMs are often found in disordered regions. Although the proteomes of pathogenic bacteria and viruses contain less disorder than eukaryotic proteomes, their proteins can successfully mimic eukaryotic SLiMs and disrupt host cellular function. Identifying important SLiMs in pathogens is difficult but essential for understanding potential host-pathogen interactions. We performed a comparative analysis of the structural features of experimentally verified SLiMs from the Eukaryotic Linear Motif (ELM) database across viruses, bacteria, and eukaryotes. Our results revealed that many viral SLiMs, and specific motifs found across viruses and eukaryotes, such as some glycosylation motifs, are less disordered. Analysis of the disorder and coil properties of equivalent SLiMs from pathogens and eukaryotes revealed that some motifs are more structured in pathogens than in their eukaryotic counterparts, and vice versa. These results suggest that, for some of the same motifs, the mechanism of interaction differs between pathogens and their eukaryotic hosts.
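ELM motifs are defined as regular expressions, so a motif-level disorder analysis can be sketched as: find motif matches (including overlapping ones) and average a per-residue disorder prediction over each match. The pattern `N[^P][ST]` is the textbook N-glycosylation sequon, used here for illustration rather than the exact ELM entry, and the 0/1 disorder track is hypothetical:

```python
import re


def motif_disorder(sequence, disorder, pattern):
    """Return (start, matched_text, mean_disorder) for each motif match.

    disorder: per-residue disorder values (e.g. 0/1 calls or probabilities),
    same length as `sequence`. A lookahead captures overlapping matches.
    """
    hits = []
    for m in re.finditer(f"(?=({pattern}))", sequence):
        start = m.start()
        end = start + len(m.group(1))
        if end == start:
            continue  # ignore empty matches
        frac = sum(disorder[start:end]) / (end - start)
        hits.append((start, sequence[start:end], frac))
    return hits
```

Aggregating `mean_disorder` per motif class and per taxonomic group (viruses, bacteria, eukaryotes) would give the kind of comparison described above.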
Presentation Overview:
MicroRNAs (miRNAs) regulate gene expression and have recognized roles in numerous physiological processes and diseases, including cancer. A single nucleotide polymorphism (SNP) in the miR-125a gene leads to poor breast cancer prognosis by blocking the cleavage of the primary miRNA transcript (pri-miRNA). The SNP does not affect pri-miR-125a’s minimal energy structure, leading to the hypothesis that the lost signal must be dynamical. Leveraging high-throughput data on the maturation efficiency of 29 478 pri-miR-125a mutant sequences, we applied our sequence-sensitive elastic network ENCoM to study the full 3D conformational space of pri-miRNAs and its impact on their maturation efficiency. The model predicts maturation efficiency with high accuracy (predictive R-squared of 0.75) when both the ENCoM dynamical signature and the MC-Fold enthalpy of folding are combined, highlighting the synergy between these respectively entropy- and enthalpy-based methods. Looking at the patterns apparent from the model’s coefficients, we corroborate motifs previously identified as necessary for the cleavage of pri-miRNAs but also challenge established notions such as the necessity for a rigid hairpin structure. Our novel approach is fast enough to predict theoretical maturation efficiencies for millions of miRNA sequences, the extremes of which we are currently testing in the lab.
Presentation Overview:
Protein families evolve by the accumulation of sequence variations that translate into changes in the folding pathways and the structure and dynamics of the native state of their members. These changes are constrained by the features of the folding energy landscape as well as the cellular context where these proteins perform their molecular function.
Natural proteins fold by minimizing the energy of the interactions present in their native states. Although the free energy is globally minimized, not all interactions present in the native state can be energetically optimized. These conflicting, or frustrated, interactions have been linked to different functional aspects such as protein-protein interactions, allostery, and catalytic activity.
Here we present FrustraEvo, a tool that measures patterns of local frustration conservation within protein families as a proxy to identify residues that are important for either stability or function, and relates them to their sequence variability signatures. We additionally compare homologous protein families to understand how they have diversified their functional patterns from a common ancestral origin. We will showcase how FrustraEvo can shed light on the functional understanding of structurally characterized protein families, as well as of poorly characterized ones, thanks to recent advances in structure prediction.
Presentation Overview:
We employ coarse-grained normal mode analysis to calculate dynamical signatures of different ligand/G-protein-coupled receptor (GPCR) complexes. Dynamical signatures show changes in the flexibility of different parts of the structure upon ligand binding. As a first experiment, we docked a large set of ligands with known Emax for GTPγS binding to crystal structures of the active mu (MOR) and kappa (KOR) opioid receptors, calculated the dynamical signature for each ligand, and obtained predictors using multiple linear regression. We obtained Pearson's correlations of R=0.46 and R=0.57 in leave-one-out validation (a scenario in which a completely new ligand is presented to the system) and of R=0.8 and R=0.7 in 80:20 validation (a best-case scenario in which new molecules resemble training-set molecules), for MOR and KOR, respectively. These results show that, even with a limited training set, we can obtain good estimates of the Emax of new drug candidates, and therefore computationally predict whether they act as agonists, antagonists, or partial agonists, potentially as part of high-throughput screening. Moreover, analyzing the coefficients of these predictors reveals which regions of the receptor have the largest influence on its activation (highlighting helices 5 and 6 and the binding site).
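The leave-one-out protocol can be sketched as: refit a multiple linear regression without each ligand, predict that ligand, and correlate all held-out predictions with the measured Emax. A dependency-free sketch using the normal equations; the feature vectors stand in for dynamical signatures, which are not computed here:

```python
import math


def solve(a, rhs):
    """Solve a small dense system a x = rhs (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(X, y):
    """Ordinary least squares with intercept via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def loo_pearson(X, y):
    """Pearson R between leave-one-out predictions and observed values."""
    preds = []
    for i in range(len(X)):
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef, X[i]))
    return pearson(preds, y)
```

With only a few dozen ligands and many normal-mode features, a regularized regression or feature selection would likely be needed to keep the normal equations well conditioned.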
Presentation Overview:
The SARS-CoV-2 virus interacts with the host cell by binding its Spike protein to the host's ACE2 receptor through a viral receptor-binding domain (RBD). Active peptides have been used as new antiviral drug options that inhibit viral cell entry. This study aimed to evaluate in silico the ability of antimicrobial peptides to block the interaction between the SARS-CoV-2 Spike RBD and the human ACE2 receptor, and to propose new synthetic peptides using machine learning, molecular modeling, and docking. We modeled 302 peptide sequences and docked them against the Spike RBD to build a training database. We then trained machine learning models using Bayesian Ridge (BR), Support Vector Machine (SVM), and Multilayer Perceptron (MLP) algorithms to predict Spike-peptide interaction energies. Molecular docking energies ranged from -126,060 to -230,308 kJ/mol. The BR model achieved the best results, with an RMSE of 14.1, using the Amino Acid Linking (AAL) feature extractor. Using a genetic algorithm assisted by a machine learning model, we proposed 10 new peptides with antiviral potential against SARS-CoV-2 that have better energies than those in the training database. This tool and its ML algorithms can be easily applied to other emerging viruses and microbial targets.
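A minimal genetic-algorithm sketch for peptide proposal, with truncation selection, one-point crossover, and per-residue mutation. The `toy_energy` function is a hypothetical stand-in for the trained regression model, and all GA settings are illustrative assumptions:

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_energy(peptide):
    """Hypothetical placeholder for a trained energy regressor (lower is better)."""
    return -sum(1.0 for aa in peptide if aa in "KRW")


def evolve(pop_size=30, length=12, generations=50, mut_rate=0.1,
           seed=0, energy=toy_energy):
    """Return (best_peptide, best_initial_energy) after evolving a population."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
           for _ in range(pop_size)]
    initial_best = min(energy(p) for p in pop)
    for _ in range(generations):
        pop.sort(key=energy)
        parents = pop[: pop_size // 2]  # truncation selection; parents survive (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = list(a[:cut] + b[cut:])
            for i in range(length):
                if rng.random() < mut_rate:
                    child[i] = rng.choice(AMINO_ACIDS)
            children.append("".join(child))
        pop = parents + children
    best = min(pop, key=energy)
    return best, initial_best
```

Because the best parents are carried over unchanged, the best predicted energy never gets worse between generations; candidates found this way would still need re-docking, since the surrogate model can be wrong outside its training distribution.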
Presentation Overview:
SARS-CoV-2 infection manifests a range of clinical presentations from mild illness to life-threatening disease. As a mediator of viral entry, ACE2 is an a priori candidate genetic risk factor. The affinity of SARS-CoV-2 Spike for ACE2 is a key parameter influencing host-range and tropism and so we determined the affinities of several reported ACE2 population variants experimentally and predicted the effects of many more. We found ACE2 alleles that strongly inhibited binding to Spike and some with moderately increased affinity. Comparison to recent infectivity studies indicates that the affinity ranges of ACE2 variants can protect cells from infection and so some almost certainly confer resistance to carriers; this is now being tested with clinical data. We will also highlight the strengths and weaknesses of current generation predictors, and present new results on the interplay between ACE2 variants and different SARS-CoV-2 strains.
Presentation Overview:
High-precision protein structure prediction means accurately determining the relative positions of amino acids in three-dimensional space. Recent advances in single-chain protein structure prediction hold great promise for computational modeling of protein-protein interactions. An effective model ranking method is essential for structure prediction approaches to select the most accurate structure. This study applies a deep graph neural network to score protein complex models using residue-level structural information in 3D space and sequence-level co-evolutionary constraints. Several structural features are investigated in the deep graph learning framework for high-accuracy quality estimation, including inter-residue distances, per-residue energies, physical and chemical properties of amino acids, and co-evolutionary constraints from sequence alignments. The method represents a protein complex as a connected graph in which each node is a residue and each edge represents the spatial closeness of a pair of residues in the complex structure. The algorithm provides residue-level quality estimates in terms of the local distance difference test (lDDT) score. We trained the quality estimation algorithm on protein-protein docking benchmark version 4.0 (BM4) and improved the ranking of protein complex decoys compared with top-ranked protein scoring approaches.
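The residue graph and the lDDT target can be illustrated with a small sketch. It makes two simplifying assumptions not taken from the abstract: one representative atom (e.g. CA) per residue, whereas full lDDT uses all atoms, and a hypothetical 8 Å cutoff for graph edges. It is not the authors' network or feature pipeline.

```python
import math

Coord = tuple[float, float, float]

# Distance-difference tolerances (Å) used by lDDT.
THRESHOLDS = (0.5, 1.0, 2.0, 4.0)


def contact_edges(coords: list[Coord], cutoff: float = 8.0) -> list[tuple[int, int]]:
    """Edges between residues whose representative atoms are closer than `cutoff` Å.

    The 8 Å default is an illustrative choice.
    """
    n = len(coords)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(coords[i], coords[j]) < cutoff]


def per_residue_lddt(reference: list[Coord], model: list[Coord],
                     inclusion_radius: float = 15.0) -> list[float]:
    """Simplified, one-atom-per-residue lDDT in [0, 1].

    For each residue, consider partners within `inclusion_radius` Å in the
    reference, and return the fraction of (pair, threshold) combinations whose
    model distance stays within that tolerance of the reference distance.
    """
    n = len(reference)
    if len(model) != n:
        raise ValueError("reference and model must have the same length")
    scores = []
    for i in range(n):
        preserved = total = 0
        for j in range(n):
            if i == j:
                continue
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= inclusion_radius:
                continue
            diff = abs(math.dist(model[i], model[j]) - d_ref)
            preserved += sum(diff < t for t in THRESHOLDS)
            total += len(THRESHOLDS)
        scores.append(preserved / total if total else float("nan"))
    return scores
```

Per-residue scores like these are the kind of regression target a graph network can learn to predict from a model alone, without the reference.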
Presentation Overview:
The immunoglobulin (Ig) fold is a remarkable protein fold found across all kingdoms of life; it makes up the structure of antibodies as well as many signaling receptors and cell adhesion molecules. Ig folds/domains mediate a variety of cellular functions including immune response, tissue formation, cell migration, and synapse formation, to name a few. They are implicated in a range of disease mechanisms and also carry considerable therapeutic potential. Despite their significance, much remains unknown about their mechanisms of action in different physiological contexts. The diversity of Ig folds (C1-set, C2-set, I-set, FNIII, V-set) and the lack of a unified Ig residue numbering scheme hinder the mechanistic deconstruction of this fold, which could otherwise be leveraged for an evolutionary, structural, and biochemical understanding of their physiological functions as well as for the rational design of therapeutic nanobodies. In this work, we propose the IgStRAnD (Ig Strand Residue Anchors Dependent) numbering scheme, which unifies the different types of Ig folds by revealing conserved residue contacts critical for the minimal Ig fold, the presence of (pseudo)symmetry at the tertiary/quaternary levels in all Ig folds, and a seamless structural and functional connection of important residues across different Ig folds.
Presentation Overview:
The increasing resistance of vector mosquitoes such as Aedes aegypti L. to commercial repellents, together with the repellents' toxicity to humans and the environment, has driven the search for and development of more selective and sustainable compounds. Aromatic compounds of natural origin serve to attract insects and pollinators and are characterized by low toxicity and greater safety compared with synthetic substances. A chemical library of 3361 aromatic molecules of natural origin, obtained from aroma databases (Odor DB, EssOil DB, SuperScent and Odor data), was used. After molecular docking with the DockThor program against the odorant-binding protein (OBP) of Aedes aegypti L. (PDB code: 3k1e), 188 molecules with affinity energies more negative than -8.8 kcal/mol were selected, using the commercial synthetic repellent DEET (N,N-diethyl-meta-toluamide) as the reference. Then, 7 of these molecules, with energies more negative than -9.0 kcal/mol, were submitted to ADMETox analysis (DataWarrior, pkCSM and Toxtree), which characterized their physicochemical, pharmacokinetic, toxicological and environmental biodegradability properties. Based on the predicted human risk, two molecules (LBQC1281 and LBQC1154) were selected as candidates for in vivo assays in adult Aedes aegypti L. mosquitoes, with low expected impact on human and environmental health.
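The selection funnel can be expressed as successive filters. This is a sketch under stated assumptions: the two energy cutoffs come from the text, but the ADMETox stage is reduced to a hypothetical count of flagged alerts, and the candidate names and values are placeholders rather than results from the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    docking_energy: float  # kcal/mol; more negative = stronger predicted binding
    admet_alerts: int      # hypothetical count of toxicity/biodegradability flags


def screen(candidates: list[Candidate], first_cutoff: float = -8.8,
           second_cutoff: float = -9.0, max_alerts: int = 0):
    """Apply the docking cutoffs in turn, then a simplified ADMET filter.

    Returns the surviving candidates after each of the three stages.
    """
    stage1 = [c for c in candidates if c.docking_energy < first_cutoff]
    stage2 = [c for c in stage1 if c.docking_energy < second_cutoff]
    stage3 = [c for c in stage2 if c.admet_alerts <= max_alerts]
    return stage1, stage2, stage3
```

Keeping each stage's output makes it easy to report how many molecules survive each step, mirroring the 3361 → 188 → 7 → 2 narrowing described above.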
Presentation Overview:
The Protein Data Bank (PDB) core archive currently distributes more than 190,000 experimentally determined 3D structures of biological macromolecules. This archive is managed by the Worldwide PDB (wwPDB, wwpdb.org): RCSB PDB, PDBe, and PDBj provide public access to the contents of the archive, while EMDB and BMRB house experimental EM and NMR data, respectively. Because a full understanding of any macromolecular system requires knowledge of its biological and evolutionary context, each partner website independently supplements these structural data with entry-specific annotations from external resources such as UniProt, SCOPe, CATH, SIFTS, and others. While this mode of delivery offers contextual value to PDB users viewing individual structures online, it is not well suited to research involving groups of related structures or to offline work. To address this, wwPDB is developing a Next Generation PDB repository that collates structural data with contextual annotations into downloadable files. This new architecture involves the development and implementation of service-specific APIs using the FastAPI framework and containerized packaging (as carried out by RCSB PDB), which will be presented here.
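The idea of collating structural data with annotations into one downloadable file can be sketched with the standard library. The record layout, field names, and function names below are hypothetical and do not reflect the wwPDB file format or RCSB's service code; in a FastAPI service, a function like `bundle_to_json` could back an endpoint, but that wiring is an assumption here.

```python
import json
from typing import Any


def collate_entry(entry_id: str, structure: dict[str, Any],
                  annotations: dict[str, list[Any]]) -> dict[str, Any]:
    """Merge one entry's structural metadata with annotations grouped by source.

    The record layout is hypothetical, not the wwPDB file format.
    """
    return {
        "entry_id": entry_id,
        "structure": dict(structure),
        "annotations": {source: list(items) for source, items in sorted(annotations.items())},
    }


def bundle_to_json(records: list[dict[str, Any]]) -> str:
    """Serialize a group of related entries into one downloadable JSON document."""
    return json.dumps({"entries": records}, indent=2, sort_keys=True)
```

Bundling several related entries in one document is what makes group-level and offline analysis convenient, compared with fetching annotations entry by entry from a website.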
RCSB PDB and PDBe are jointly funded for this project (U.S. NSF and U.K. BBSRC). RCSB PDB core operations are funded by NSF, NIH, and DOE.
Presentation Overview:
Apolipoprotein E (ApoE) is the primary cholesterol- and lipid-transporting apolipoprotein implicated in Alzheimer’s disease (AD). Its three main isoforms differ by single amino acid changes: ε3 is “neutral”, ε4 is “risk” (Cys112Arg), and ε2 is “protective” (Arg158Cys). Rare variants (Christchurch, Jacksonville) have also been proposed to confer “resilience” to AD. A significant conformational transformation has been proposed to be required for lipidation; to date, only a single NMR structure of a mutated full-length ε3 in a closed conformation exists, leaving open questions about conformational differences among ApoE isoforms. Here, we used multiple replicates of long-timescale molecular dynamics simulations (six replicates of 15 µs per isoform, 540 µs in total) to generate 200 starting conformations per isoform on the AiMOS supercomputer. Each conformation was then simulated for an additional microsecond in three replicates (600 µs per isoform). In total, 4.14 milliseconds of simulation were generated across 6 isoforms. Using a graph-based implementation of VAMPNets, we explored the conformational landscape of ApoE, using graph attention networks to probe intramolecular interactions in the metastable states of each isoform as well as in the combined set of 6 isoforms. These insights will shed light on the structural differences between the risk, neutral, and protective ApoE alleles.
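The reported simulation budget follows directly from the stated replicate counts and lengths; the arithmetic below reproduces the 540 µs, 600 µs per isoform, and 4.14 ms figures.

```python
ISOFORMS = 6

# Stage 1: long-timescale replicates per isoform.
STAGE1_REPLICATES = 6
STAGE1_LENGTH_US = 15

# Stage 2: short replicates launched from sampled starting conformations.
STARTING_CONFORMATIONS = 200
STAGE2_REPLICATES = 3
STAGE2_LENGTH_US = 1

stage1_per_isoform_us = STAGE1_REPLICATES * STAGE1_LENGTH_US                          # 90
stage1_total_us = ISOFORMS * stage1_per_isoform_us                                      # 540
stage2_per_isoform_us = STARTING_CONFORMATIONS * STAGE2_REPLICATES * STAGE2_LENGTH_US  # 600
stage2_total_us = ISOFORMS * stage2_per_isoform_us                                      # 3600
total_ms = (stage1_total_us + stage2_total_us) / 1000                                   # 4.14
```

Most of the sampling (3.6 of the 4.14 ms) comes from the many short second-stage runs rather than the long seeding trajectories.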
Presentation Overview:
We have developed the Elastic Network Contact Model (ENCoM), a coarse-grained normal mode analysis method that is unique in its sensitivity to the full chemical sequence of the studied molecule. This enables studying the impact of mutations on properties such as vibrational entropy, which we have shown correlates with thermal stability, and on the full entropic signature (the entropy at each residue). When combined with machine learning models such as LASSO regression or simple neural networks, the entropic signature captures properties important for the function of diverse biomolecules and fits experimental data well. ENCoM is now part of the NRGTEN Python package, which can be installed with a single command and run on a PDB file with as few as three lines of Python. The new dynasigML Python package contains all the functions needed to learn relationships between dynamics and function for any macromolecule of interest, provided that mutational data exist. These packages use NumPy and are fast, taking less than 5 seconds of CPU time to predict the effect of a mutation on a 250-residue protein. This enables high-throughput prediction of engineered mutants to test in the lab.
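How a mutation-induced change in a normal-mode spectrum becomes a vibrational entropy difference can be sketched with the standard quantum harmonic-oscillator entropy per mode. This does not call NRGTEN and is not necessarily ENCoM's exact formulation: the input eigenvalues, the hypothetical `scale` factor converting sqrt(eigenvalue) to ħω/(k_B T), and the choice to skip six rigid-body modes are assumptions of the sketch.

```python
import math


def mode_entropy(x: float) -> float:
    """Entropy, in units of k_B, of one quantum harmonic oscillator.

    x = hbar * omega / (k_B * T) must be positive.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    return x / math.expm1(x) - math.log(-math.expm1(-x))


def vibrational_entropy(eigenvalues: list[float], scale: float = 1.0, skip: int = 6) -> float:
    """Sum of mode entropies for an elastic network Hessian spectrum.

    Frequencies are taken as proportional to sqrt(eigenvalue); `scale`
    (hypothetical) converts sqrt(eigenvalue) to x. The `skip` lowest modes
    (rigid-body motions) are excluded.
    """
    modes = sorted(eigenvalues)[skip:]
    return sum(mode_entropy(scale * math.sqrt(lam)) for lam in modes)


def entropy_change(wild_type: list[float], mutant: list[float], scale: float = 1.0) -> float:
    """Mutant minus wild-type vibrational entropy (k_B units)."""
    return vibrational_entropy(mutant, scale) - vibrational_entropy(wild_type, scale)
```

A mutation that stiffens the network raises mode frequencies and therefore lowers the vibrational entropy, which is the kind of signal the abstract relates to thermal stability.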
Presentation Overview:
Despite tremendous efforts by scientists during the COVID-19 pandemic, the exact structure of the SARS-CoV-2 virus remains elusive. The membrane (M) protein is one of the four structural proteins of SARS-CoV-2 and the most abundant protein in the virus, with an estimated ~1,100 M dimers in each viral particle. Yet the structure of the M protein has not been solved. Here, we aim to develop an integrative approach to build an accurate model of the M protein in its native, dimeric form and to perform a structure-driven comparative analysis to discover functional and evolutionary relationships with ORF3a, another SARS-CoV-2 protein, which functions as an ion channel. We integrated de novo models of M monomers, symmetric docking, experimental geometry constraints, and the structure of ORF3a for domain refinement to build our M-dimer model. For the comparative analysis, we built a hybrid alignment based on the structural alignment of the two proteins and sequence alignments of their homologs in Betacoronavirus. Although ORF3a and the M dimer share low sequence similarity, their structures are surprisingly similar. We found that a substantial number of functionally important residues are conserved between ORF3a and M and within their evolutionary families. Our findings suggest that M may be an attractive novel target for antivirals.
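One common way to flag conserved positions in an alignment, such as a hybrid structure/sequence alignment, is per-column Shannon entropy. The abstract does not say which conservation measure was used, so this is a generic sketch; the gap handling, the 0.5-bit threshold, and the toy sequences in the usage are assumptions.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [c for c in column if c != gap]
    if not residues:
        return float("nan")
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def conserved_columns(alignment: list[str], max_entropy: float = 0.5) -> list[int]:
    """Indices of columns whose entropy is at most `max_entropy` (hypothetical cutoff)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        h = column_entropy("".join(seq[i] for seq in alignment))
        if h <= max_entropy:  # NaN compares False, so all-gap columns are skipped
            conserved.append(i)
    return conserved
```

Columns with low entropy across both protein families are candidates for the shared, functionally important residues the comparison looks for.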
Presentation Overview:
Neoepitopes (neoantigens) are cancer-specific antigens and important candidates for therapeutic cancer vaccines. Epitopes bind the major histocompatibility complex (MHC), an immune receptor, and tumor neoepitopes induce an immune response that eliminates cancer cells. This immune activation depends on the binding affinity between the antigen peptide and the MHC molecule. Epitope-MHC binding assays are technically difficult, time-consuming, and expensive, so computational tools that predict the affinity between antigen peptides and MHC molecules have been developed. However, the available epitope-MHC binding data are limited, and the performance of these predictors remains insufficient. Here, we propose a novel deep learning model that can predict epitope-MHC binding from a small amount of training data.
MTL4MHC2 consists of two multi-task Bi-LSTM models: one that learns from antigen peptides and one that learns from MHC peptide sequences. Each multi-task model shares learned parameters between MHC class I and class II. MTL4MHC2 achieves an AUC-ROC of 82.2%, outperforming state-of-the-art models.
We demonstrated the effectiveness of multi-task learning for improving prediction performance from small amounts of data. MTL4MHC2 can be applied to the development of novel cancer therapeutics such as cancer vaccines.
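The reported AUC-ROC can be read as the probability that a randomly chosen binder scores higher than a randomly chosen non-binder. The sketch below computes it directly from that definition (ties count one half); it is a generic illustration, not MTL4MHC2's evaluation code, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    `labels` are 1 for positives (binders) and 0 for negatives.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 82.2% means binders outrank non-binders in about 82% of positive/negative pairs.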
Presentation Overview:
Copper nanoparticles (Cu NPs) have attracted attention due to their many biological applications in microbiology, genetic engineering, pharmacology, agriculture, and other fields. However, their high reactivity favors oxidation, corrosion, and aggregation, causing them to lose their properties of interest.
Graphene-capped copper (Cu@G) NPs have also attracted the attention of the medical and industrial sectors because graphene can shield the Cu NPs from these undesired phenomena. Additionally, graphene has biocidal activity and can be functionalized for different purposes, expanding the applications of Cu NPs.
In this work, we report new Morse potentials that reproduce the behavior of Cu@G NPs in molecular dynamics simulations. The interaction parameters of Cu with C and H were obtained from DFT (PWscf) simulations, and single-point energy calculations showed good agreement between the quantum and classical descriptions using the new potentials. The potentials were then used to evaluate the structure and thermal conductivity (κ) of Cu@G NPs from 1.5 to 6.1 nm at 100 to 800 K, also varying the size, number of layers, and orientation (planar or perpendicular) of the graphene sheets.
The results indicate that Cu@G NPs are stable in an argon environment. Their κ is higher than that of bare Cu NPs: by 200% to 400% at 300 K with one graphene layer, and by more than 500% with a bilayer or trilayer. Finally, the size, homogeneity, and orientation of the graphene sheets do not appear to affect the κ of Cu@G NPs.
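A Morse pair term can be written in the form used, for example, by the LAMMPS `pair_style morse` style, E(r) = D0[exp(-2α(r - r0)) - 2exp(-α(r - r0))]. The abstract does not state the MD engine or the fitted Cu-C and Cu-H parameters, so the parameter values in the usage lines are placeholders, not the reported potentials.

```python
import math


def morse_energy(r: float, d0: float, alpha: float, r0: float) -> float:
    """Morse pair energy E(r) = D0 * [exp(-2*alpha*(r - r0)) - 2*exp(-alpha*(r - r0))].

    The well depth is D0 at the equilibrium distance r0, and E -> 0 as r grows.
    """
    e = math.exp(-alpha * (r - r0))
    return d0 * (e * e - 2.0 * e)


def morse_force(r: float, d0: float, alpha: float, r0: float) -> float:
    """Radial force F = -dE/dr = 2*alpha*D0*(e^2 - e); positive means repulsive."""
    e = math.exp(-alpha * (r - r0))
    return 2.0 * alpha * d0 * (e * e - e)
```

Fitting D0, α, and r0 for each element pair to DFT single-point energies, as described above, fixes the well depth, stiffness, and equilibrium distance of the metal-graphene interaction.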
Presentation Overview:
Computational methods to predict protein-protein interactions (PPIs) typically fall into sequence-based "bottom-up" methods that infer properties from the characteristics of individual protein sequences, or global "top-down" methods that infer properties from the pattern of already known PPIs in the species of interest. However, a way to incorporate top-down insights into sequence-based bottom-up PPI prediction methods has been elusive. We therefore introduce Topsy-Turvy, a method that synthesizes both views in a sequence-based, multi-scale, deep-learning model for PPI prediction. While Topsy-Turvy makes predictions using only sequence data, during training it takes a transfer-learning approach, incorporating patterns from both global and molecular-level views of protein interaction. In a cross-species context, we show that it achieves state-of-the-art performance, enabling genome-scale, interpretable PPI prediction for non-model organisms with no existing experimental PPI data. For species with available experimental PPI data, we further present a Topsy-Turvy hybrid (TT-Hybrid) model that integrates Topsy-Turvy with a purely network-based link prediction model that provides information about species-specific network rewiring. TT-Hybrid makes accurate predictions for both well- and sparsely-characterized proteins, outperforming both of its constituent components as well as other state-of-the-art PPI prediction methods. Furthermore, running Topsy-Turvy and TT-Hybrid screens is feasible for whole genomes, so these methods scale to settings where other methods (e.g., AlphaFold-Multimer) might be infeasible. The generalizability, accuracy, and genome-level scalability of Topsy-Turvy and TT-Hybrid unlock a more comprehensive map of protein interaction and organization in both model and non-model organisms.
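The contrast between a network-based ("top-down") score and a sequence-based probability can be illustrated with a toy example. Jaccard neighbor overlap is used here as a simple, generic link-prediction score; it is not the network component used by Topsy-Turvy or TT-Hybrid, and the weighted blend in `hybrid_score` is a hypothetical combination rule, not the published integration.

```python
from collections import defaultdict


def build_adjacency(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets from a list of known interactions."""
    adj: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def jaccard_score(adj: dict[str, set[str]], u: str, v: str) -> float:
    """Fraction of shared interaction partners; 0.0 for proteins with no known partners."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def hybrid_score(sequence_prob: float, network_score: float, weight: float = 0.5) -> float:
    """Hypothetical convex combination of a sequence-based and a network-based score."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * sequence_prob + (1.0 - weight) * network_score
```

The toy score also shows why a purely network-based model fails for sparsely characterized proteins: a protein with no known partners always scores zero, so a sequence-based component is needed to cover it.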
Software availability: https://topsyturvy.csail.mit.edu
Presentation Overview:
Catalytic, binding, and metal-binding sites are important, conserved regions of proteins, and identifying them can provide important insights into protein function. Several computational methods have been developed to identify binding sites from sequence and structural information. However, these methods have shown limited performance: most rely on structural similarity, are restricted to small binding sites, and cannot handle conservative mutations or identify inter-domain sites.
Here we present the GASS platform, a family of methods for searching for similar sites in proteins based on parallel genetic algorithms. GASS has previously been used successfully to search for similar catalytic and binding sites based on templates from the Mechanism and Catalytic Site Atlas (M-CSA), correctly identifying more than 90% of the catalogued catalytic sites and ranking fourth among 18 methods in the CASP 10 competition. GASS was also compared with 8 other state-of-the-art methods for detecting metal-binding sites, outperforming similar methods with an MCC of up to 0.57 and correctly detecting up to 96% of metal-binding sites.
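The two evaluation figures quoted above, MCC and the fraction of sites detected, can be computed from a confusion matrix as follows. This is the standard definition, not GASS's evaluation code; returning 0.0 when any marginal count is zero is a common convention assumed here, and the counts in the usage lines are hypothetical.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient in [-1, 1]; 0.0 if any marginal is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def detection_rate(tp: int, fn: int) -> float:
    """Fraction of true sites that were found (recall)."""
    return tp / (tp + fn)
```

Because MCC also penalizes false positives, a method can detect a high fraction of sites (high recall) while its MCC stays moderate, which is consistent with the two numbers being reported separately.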
The GASS platform (https://gassmetal.unifei.edu.br, http://gass.unifei.edu.br/) provides accurate and easy-to-use methods that can be adapted to searching for binding sites in proteins.