Posters - Schedules

Monday, July 11 and Tuesday, July 12 between 12:30 PM CDT and 2:30 PM CDT
Wednesday, July 13 between 12:30 PM CDT and 2:30 PM CDT
Session A Poster Set-up and Dismantle
Session A Posters set up:
Monday, July 11 between 7:30 AM CDT - 10:00 AM CDT
Session A Posters dismantle:
Tuesday, July 12 at 6:00 PM CDT
Session B Poster Set-up and Dismantle
Session B Posters set up:
Wednesday, July 13 between 7:30 AM - 10:00 AM CDT
Session B Posters dismantle:
Thursday, July 14 at 2:00 PM CDT
Virtual: A hybrid approach to antibody design: combining generative protein sequence design with Rosetta and DeepAb
COSI: 3DSIG
  • Simon Kelow, Prescient Design | Genentech, United States
  • Dan Berenberg, Prescient Design | Genentech, United States
  • Jack Maguire, Prescient Design | Genentech, United States
  • Santrupti Nerli, Prescient Design | Genentech, United States
  • Michael Chungyoun, Johns Hopkins University, United States
  • Andrew Leaver-Fay, Prescient Design | Genentech, United States
  • Andy Watkins, Prescient Design | Genentech, United States
  • Jae Hyeon Lee, Prescient Design | Genentech, United States
  • Stephen Ra, Prescient Design | Genentech, United States
  • Maria Lee, Prescient Design | Genentech, United States
  • Henri Dwyer, Prescient Design | Genentech, United States
  • Kyunghyun Cho, Prescient Design | Genentech, United States
  • Rich Bonneau, Prescient Design | Genentech, United States
  • Vladimir Gligorijevic, Prescient Design | Genentech, United States


Presentation Overview:

The deep manifold sampler is a generative protein sequence model that enables iterative exploration of protein sequence space (Gligorijevic et al., 2021; Berenberg et al., 2022) with the advantage of guidance from oracles such as function prediction (Gligorijevic et al., 2019). Rosetta structure-based protein design (Rohl et al., 2004; Leaver-Fay et al., 2011; Alford et al., 2017) uses Monte Carlo sampling of sequence and structure space contingent on optimization of the Rosetta energy function to explore relevant structural changes during design. Integration of 3D structural data into ‘oracles’ to guide and/or assess designs from generative modeling remains an open challenge. Here, we demonstrate integration of the deep manifold sampler with Rosetta and DeepAb (Ruffolo et al., 2022) through FastRelax and/or CDR graft-based design protocols using predicted antibody structures, allowing generated designs to be conditioned on either total score or interface energy (dG) as predictions for antibody stability or antibody-antigen binding affinity. We demonstrate these two approaches on the Herceptin-Her2 system. We show that the deep manifold sampler proposes novel amino acid changes capable of exploring antibody sequence space widely and effectively. We also show that this novel proposal distribution is capable of producing designs with better interface or total energy than state-of-the-art methods reliant on structure alone.

Virtual: A Novel Deep Learning Algorithm to Predict DNA Binding Sites: DeepDISE
COSI: 3DSIG
  • Samuel Hendrix, University of Georgia, United States
  • Kuan Chang, National Taiwan Ocean University, Taiwan
  • Zeezoo Ryu, University of Georgia, United States
  • Zhong-Ru Xie, University of Georgia, United States


Presentation Overview:

DNA binding proteins play critical roles in gene regulation and development. Therefore, it is essential to develop a reliable DNA binding site prediction method. Compared to small molecule binding site prediction methods, those for DNA binding site prediction still have room for improvement. We constructed a convolutional neural network using only the 3D coordinates and atom types of protein surface atoms as input data to predict how likely a voxel on the protein surface is to be a DNA-binding site. The improved accuracy demonstrates the robustness of our model, which produces consistent results across 3 datasets. It also shows that protein 3D structures combined with atom-type information on the protein surface can be used to predict the binding sites on a protein. This has inspired us to develop new prediction algorithms for the binding sites of other biological molecules on target proteins.
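
As an illustration of the kind of input described above, the following minimal Python sketch bins surface atoms into voxels and counts atom types per voxel. The function name `voxelize`, the 1.0 Å voxel size, and the toy atom list are assumptions made for this example; the abstract does not state the grid resolution or the feature encoding used by DeepDISE.

```python
import math
from collections import defaultdict


def voxelize(atoms, voxel_size=1.0):
    """Group atoms into cubic voxels and count atom types in each voxel.

    atoms: iterable of (atom_type, (x, y, z)) tuples, coordinates in angstroms.
    voxel_size: edge length of a voxel (hypothetical default, not from the abstract).
    Returns a dict mapping voxel index (i, j, k) to {atom_type: count}.
    """
    grid = defaultdict(lambda: defaultdict(int))
    for atom_type, (x, y, z) in atoms:
        # math.floor keeps negative coordinates in the correct voxel.
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        grid[key][atom_type] += 1
    return {key: dict(counts) for key, counts in grid.items()}


# Toy surface atoms (made-up coordinates, for illustration only).
toy_atoms = [
    ("C", (0.2, 0.3, 0.9)),
    ("N", (0.8, 0.1, 0.5)),
    ("O", (-0.1, 2.5, 3.0)),
]
toy_grid = voxelize(toy_atoms)
```

Each occupied voxel then carries a small per-atom-type count vector, which is one plausible way to turn coordinates and atom types into a 3D array that a convolutional network can consume.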

Virtual: B-AMP 2.0: Enhanced functionalities and capabilities in tackling biofilm induced resistance
COSI: 3DSIG
  • Sai Supriya Avatapalli, Savitribai Phule Pune University, Dr. Karishma Kaushik's Lab, India
  • Yatindrapravanan Narasimhan, SASTRA University, School of Chemical and Biotechnology, India
  • Shashank Ravichandran, SASTRA University, School of Chemical and Biotechnology, India
  • Ragothaman Yennamalli, SASTRA University, School of Chemical and Biotechnology, India
  • Karishma S. Kaushik, Savitribai Phule Pune University, Department of Biotechnology, India


Presentation Overview:

Bacteria in biofilms are notoriously tolerant to conventional antibiotics, making the treatment of biofilm infections a clinical challenge. Antimicrobial peptides (AMPs), seen as a novel anti-biofilm approach, target specific biofilm features, as opposed to ‘one size fits all’ antibiotics. In our previous work, we developed Biofilm-AMP, an open-source structural and functional repository of AMPs for biofilm studies (B-AMP v1.0) consisting of >5000 AMP structures, with annotations to biofilm literature. Its user-friendly functionalities include search-enabled AMP information, FASTA files, and PDB and PDBQT structures. Additionally, AMPs with known anti-Gram-positive and anti-Gram-negative activity are listed. In B-AMP v2.0, we have upgraded the repository with existing biofilm targets. Specifically, it contains a manually curated, systematic list of potential biofilm targets across various bacterial pathogens. Using three databases (PDB, UniProt, and PubMed), we have curated a list of ~2500 targets that span >50 functional categories. Each target entry consists of a unique target ID, 3D structure, PDBQT file, and supporting literature references. The 3D structures for targets with no structural data were modeled using ROSETTA. As a case study, we highlight MD simulations of previously identified candidate AMPs with the catalytic site residues of the Sortase C protein (a biofilm target) of Corynebacterium striatum.

Virtual: Coarse grained modelling of Human Kinesin kif11 with its inhibitors at different allosteric sites and implications in its activity
COSI: 3DSIG
  • Soundarya Priya Alexandar, SASTRA Deemed University, India
  • Ragothaman Yennamalli, SASTRA Deemed University, India
  • Venkatasubramanian Ulaganathan, SASTRA Deemed University, India


Presentation Overview:

Mitotic kinesin kif11 (also known as EG5) is a validated chemotherapeutic target with several compounds at various stages of clinical trials. All the current drug candidates bind uncompetitively with ATP/ADP at allosteric site 1, formed by loop L5 and helices α2 and α3. Recent experiments found another allosteric site (site 2), formed by helices α4 and α6, where inhibitors bind either competitively or uncompetitively to ATP/ADP. However, it is still unclear how inhibitors that bind to two different allosteric sites of kif11 alter the kinetics of the motor domain. Here, we studied the critical structural dynamics that occur at important regions upon inhibitor binding at allosteric sites 1 and 2 using coarse-grained modelling approaches such as Elastic Network Models, Gaussian Network Models (GNM) and Anisotropic Network Models (ANM). The GNM results showed differences in the structural dynamics of the various inhibitor-bound states of kif11 that could be attributed to different modes of inhibition. ANM showed specific functional regions of kif11. We conclude that the mechanisms of binding at allosteric sites 1 and 2 are unique. The simultaneous binding of ligands at both allosteric sites involves structural interactions that are independently found at allosteric sites 1 and 2, leading to a different mechanism of binding.
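
For readers unfamiliar with the Gaussian Network Model, the sketch below shows how its connectivity (Kirchhoff) matrix is typically built: residues are reduced to one node each (usually the Cα atom), and any two nodes closer than a cutoff are joined by an identical spring. The function name, the 7.0 Å cutoff, and the toy coordinates are assumptions for illustration; the abstract does not report the parameters the authors used.

```python
import math


def kirchhoff_matrix(ca_coords, cutoff=7.0):
    """Build the GNM Kirchhoff matrix from C-alpha coordinates.

    Off-diagonal entry (i, j) is -1 when nodes i and j lie within `cutoff`
    angstroms of each other and 0 otherwise; diagonal entry (i, i) is the
    number of contacts of node i. The 7.0 A default is an assumed value.
    """
    n = len(ca_coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


# Toy chain of five nodes spaced 3.8 A apart along x (made-up geometry).
toy_chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
toy_gamma = kirchhoff_matrix(toy_chain)
```

In a full analysis, the eigenvectors of this matrix with non-zero eigenvalues (obtainable, for example, with `numpy.linalg.eigh`) give the GNM modes, and comparing the slowest modes between inhibitor-bound states is one way to look for differences in dynamics.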

Virtual: Computational Characterization of Ligand Specificity and Promiscuity of Staphylococcus aureus NorA Efflux Pump
COSI: 3DSIG
  • Esra Büşra Işık, Gebze Technical University, Turkey
  • Onur Serçinoğlu, Gebze Technical University, Turkey


Presentation Overview:

Staphylococcus aureus is a Gram-positive bacterial pathogen that causes various disorders, e.g., skin infections and sepsis. Antibiotics are used to treat these infections; however, excessive use of these drugs results in antibiotic resistance. S. aureus develops antibiotic resistance via multiple mechanisms, e.g., mutations in the genes encoding efflux pumps, which prevent drug accumulation so that the bacteria cannot be eliminated. NorA is the most studied efflux pump in S. aureus and contributes to resistance against compounds such as hydrophilic fluoroquinolone antibiotics, antiseptics, and dyes such as ethidium bromide. A variety of inhibitors of the protein have also been found, suggesting a promiscuous mechanism of substrate/inhibitor recognition. Here, we aimed to characterize the ligand specificity and promiscuity of NorA via molecular docking simulations using a 3D structure of the protein generated via AlphaFold2 and a list of its known inhibitors and non-inhibitors. An ensemble of NorA conformations was generated by MD simulations, and ligands were docked to various binding pockets on these conformations to examine binding characteristics. According to our results, NorA showed varied binding characteristics, and the binding sites for ligands were not consistent, which confirmed its promiscuous nature. Our findings may guide further research on the discovery of novel NorA inhibitors.

Virtual: Computational investigations in inhibition of alcohol/aldehyde dehydrogenase (AAD) in lignocellulosic hydrolysates
COSI: 3DSIG
  • Karan Kumar, Indian Institute of Technology Guwahati, Assam, India
  • Dr Vijayanand Suryakant Moholkar, Indian Institute of Technology Guwahati, Assam, India


Presentation Overview:

Sustainable butanol production from lignocellulosic biomass (LB) consists of three steps, viz., pre-treatment, detoxification, and fermentation. During pre-treatment of LB, many undesirable compounds (e.g., aliphatic/aromatic acids, aldehydes, furans, etc.) are formed that have a significant impact on the yield of butanol in fermentation, rendering the whole fermentation process economically unfeasible. With the application of quantum and molecular mechanics (QM/MM), our aim is to investigate the inhibition of key alcohol/aldehyde dehydrogenase (AAD) enzymes in lignocellulosic hydrolysates (LH). The objectives of the present study are: (1) identification and homology modelling of key AAD enzymes; (2) validation, quality assessment and biophysicochemical characterization of the modelled enzymes; (3) identification, construction and optimization of the chemical structures of potent AAD inhibitors in LH; and (4) molecular docking and dynamics simulations to profile the molecular interactions between AAD enzymes and their inhibitors. The present study has depicted the mechanism and mode of inhibition of the AAD enzyme along with the crucial binding sites. The analysis revealed that by minimizing the presence of p-coumaric acid, vanillic acid and cinnamaldehyde in LH, the biobutanol production yield from LH can be increased significantly. This study will guide genetic/metabolic engineers in designing robust enzymes that have improved substrate utilisation, resulting in substantial product yields and specificity.

Virtual: Deep Local Analysis evaluates protein docking conformations with locally oriented cubes
COSI: 3DSIG
  • Yasser Mohseni Behbahani, Sorbonne Université, France
  • Simon Crouzet, Sorbonne Université, France
  • Élodie Laine, Sorbonne Université, France
  • Alessandra Carbone, Sorbonne Université, France


Presentation Overview:

With the recent advances in protein 3D structure prediction, protein interactions are becoming more central than ever before. Here, we address the problem of determining how proteins interact with one another. More specifically, we investigate the possibility of discriminating near-native protein complex conformations from incorrect ones by exploiting local environments around interfacial residues. Deep Local Analysis (DLA)-Ranker is a deep learning framework applying 3D convolutions to a set of locally oriented cubes representing the protein interface. It explicitly considers the local geometry of the interfacial residues along with their neighboring atoms and the regions of the interface with different solvent accessibility. We assessed its performance on three docking benchmarks made of half a million acceptable and incorrect conformations. We show that DLA-Ranker successfully identifies near-native conformations from ensembles generated by molecular docking. It surpasses or competes with other deep learning-based scoring functions. We also showcase its usefulness to discover alternative interfaces.

Virtual: Drug repurposing-based identification of Dengue virus envelope protein inhibitors to lock in its dimeric meta-stable conformation
COSI: 3DSIG
  • Sivasankar A.S, Department of Biotechnology, School of Chemical and Biotechnology, SASTRA Deemed to be University, Thanjavur 613401, India
  • Ramya L, Department of Bioinformatics, School of Chemical and Biotechnology, SASTRA Deemed to be University, Thanjavur 613401, India
  • Ragothaman M. Yennamalli, Department of Bioinformatics, School of Chemical and Biotechnology, SASTRA Deemed to be University, Thanjavur 613401, India


Presentation Overview:

Dengue fever, caused by the Dengue virus, is an epidemic in tropical countries such as India. Currently, there are no vaccines or drugs available for treatment. In the life cycle of the dengue virus, the envelope glycoprotein E, which is associated with protein C and protein M, mediates the attachment to the host cell. The viral and host cell membrane fusion is driven by a conformational change of E protein in the low pH environment from a dimeric pre-fusion conformation to a trimeric post-fusion state. In a previous study, two novel cavities were characterized and validated. Using one of the cavities characterized in the dimer interface, we used 3805 FDA drugs as starting candidates to identify inhibitors that would potentially lock the E protein in its dimeric meta-stable conformational state. Employing a two-step virtual screening using the AutoDock Vina and AutoDock programs, we identify the potential molecules, post-enrichment. Using iLibDiverse, we plan to create a focused library of molecules using criteria such as Lipinski’s rule of five, fewer steps in synthesis, the longevity of interactions to both chains of the dimer, etc. The top-scoring molecules would be further studied using MD simulations.
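
One of the filtering criteria named above, Lipinski’s rule of five, is simple enough to sketch in a few lines of Python. The thresholds are the standard ones (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The `Descriptors` class, the tolerance of one violation, and the example compounds are assumptions for illustration and are not taken from the poster.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    name: str
    mol_weight: float   # daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d):
    """Count how many of the four rule-of-five thresholds are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_rule_of_five(d, max_violations=1):
    """Accept a compound with at most `max_violations` violations (assumed tolerance)."""
    return lipinski_violations(d) <= max_violations


# Made-up descriptor values, for illustration only.
candidates = [
    Descriptors("compound_a", 350.4, 2.1, 2, 5),
    Descriptors("compound_b", 720.9, 6.3, 4, 12),
]
focused_library = [d.name for d in candidates if passes_rule_of_five(d)]
```

In practice the descriptor values would come from a cheminformatics toolkit rather than being typed in by hand; the filter itself stays the same.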

Virtual: Elucidating important structural features for the binding affinity of spike - SARS-CoV-2 neutralizing antibody complexes
COSI: 3DSIG
  • Divya Sharma, IIT Madras, India


Presentation Overview:

COVID-19 has affected the lives of millions of people around the world. In an effort to develop therapeutic interventions and control the pandemic, scientists have isolated several neutralizing antibodies against SARS-CoV-2 from vaccinated and convalescent individuals. These antibodies can be explored further to understand SARS-CoV-2 specific antigen-antibody interactions and biophysical parameters related to binding affinity, which can be utilized to engineer more potent antibodies for current/emerging SARS-CoV-2 variants. In the present study, we analyzed the interface between SARS-CoV-2 spike protein and neutralizing antibodies in terms of amino acid residue propensity, pair preference, and atomic interaction energy. We observed that contacts containing Tyr residues are highly preferred and energetically favorable at the interface of spike protein-antibody complexes. We have also developed a regression model relating structural features to the experimental binding affinity of antibodies, which showed a correlation of 0.93. Moreover, several mutations at the spike protein-antibody interface were identified, which may lead to immune escape (epitope residues) and improved affinity (paratope residues) in current/emerging variants. Overall, the work provides insights into spike protein-antibody interactions, structural parameters related to binding affinity, and mutational effects on binding affinity change, which can be helpful to develop better therapeutics against COVID-19.
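
Residue propensity, mentioned above, is often computed as the frequency of a residue type at the interface divided by its frequency in a reference set such as the whole protein. The sketch below uses that common definition; it is an assumption, since the abstract does not give the exact formula or reference set the author used, and the sequences are made up.

```python
from collections import Counter


def interface_propensity(interface_residues, all_residues):
    """Return {residue: propensity} for every residue type in `all_residues`.

    Propensity = (fraction of interface positions with this residue)
               / (fraction of all positions with this residue).
    Values above 1 indicate enrichment at the interface.
    """
    if not interface_residues or not all_residues:
        raise ValueError("both residue lists must be non-empty")
    iface = Counter(interface_residues)
    total = Counter(all_residues)
    n_iface = sum(iface.values())
    n_total = sum(total.values())
    return {
        aa: (iface[aa] / n_iface) / (total[aa] / n_total)
        for aa in total
    }


# Toy one-letter residue lists (illustrative only).
toy_interface = list("YYSG")
toy_all = list("YYSGAAKLSG")
toy_propensity = interface_propensity(toy_interface, toy_all)
```

With these toy lists, Tyr makes up half of the interface but only a fifth of all residues, so its propensity is 2.5, while residues absent from the interface score 0.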

Virtual: Enhancing interpretability of equivariant neural networks for protein structures
COSI: 3DSIG
  • Hao Xu, Queen's University, Canada
  • Laurence Yang, Queen's University, Canada


Presentation Overview:

The biological functions of a protein are determined by its 3D structure. Recent works on developing SO(3)-invariant and equivariant neural networks (ENNs) for protein structures have made remarkable progress in protein engineering tasks, such as protein function prediction and driver mutation identification. Nevertheless, these models have difficulty in providing interpretable explanations of the predictions for protein engineers and experimental biologists, which might not directly help with designing functionally enhanced proteins. To address this concern, we present a novel protein graph attention pooling (PGAP) layer for interpretable prediction of protein functions by emphasizing amino acids differently. Experiments with synthetic datasets validated the interpretability of PGAP. The experimental results on the enzyme-catalyzed reaction classification benchmark dataset show that the synergy of PGAP and SO(3)-ENNs protein structure representation modules achieves competitive performance with existing ENN models.

Virtual: Exploiting Plasmid-Mediated Resistance: Discovery of Small-Molecule Inhibitors for the Artificial Activation of the Kid-Kis Toxin-Antitoxin System in Plasmid R1
COSI: 3DSIG
  • Pinyu Liao, Inglemoor High School, United States


Presentation Overview:

Antibiotic resistance is one of the leading challenges to public health today, and a primary contributor to the rapid rise of resistance is plasmids. Therefore, plasmids are critical targets to prevent the rapid spread of antibiotic resistance. In particular, low-copy number plasmids often contain toxin-antitoxin systems that act lethally when activated. Because of the role of toxin-antitoxin systems in facilitating internal cell death, key interacting regions of the Kid-Kis toxin-antitoxin interaction were identified as binding sites for the de novo design of small-molecule inhibitors using the webserver LEA3D. To predict the activity of novel inhibitors, a QSAR classification model was constructed with OCHEM using published experimental data on a related system. The most promising inhibitors, with four out of five classified as active compounds, were molecules targeting the Glu66 to Arg72 region of the Kis antitoxin. Calculations for Gibbs free energy (p=0.000000252) and pKd (p=0.000459) showed statistically significant binding affinity compared to control molecules, indicating significant binding specificity towards the target interaction region. In the fight against antibiotic resistance, the design of small-molecule inhibitors targeting toxin-antitoxin systems may be an important discovery for the selective targeting of plasmid-mediated resistance through the application of internal mechanisms toward antibiotic development.

Virtual: Extracting features from Molecular Dynamics trajectory for identification of agonists and antagonists against Human Androgen Receptor
COSI: 3DSIG
  • Yatindrapravanan Narasimhan, SASTRA Deemed to be University, Thanjavur, India
  • Shashank Ravichandran, SASTRA Deemed to be University, Thanjavur, India
  • Ragothaman Yennamalli, SASTRA Deemed to be University, Thanjavur, India


Presentation Overview:

Molecular dynamics (MD) simulations generate a huge amount of data containing atomic positions and velocities. Most MD analyses involve studying a part of the system, similar to finding a needle in a haystack. Here, we have attempted to identify time-dependent dynamic molecular features from MD simulation trajectories. The motivation is to identify dynamical features that can be used to train machine learning (ML) algorithms to identify potential drug molecules. The hypothesis for this study is that the features extracted from MD trajectories provide dynamic information about the interactions between the ligand and the protein, and hence ML models trained on this data should have more predictive power compared to conventional models. We have taken the Human Androgen receptor’s ligand-binding domain structures from PDB along with a set of 1431 drug molecules (450 agonists and 981 antagonists) from PubChem, along with Testosterone (positive control) and Cyproterone acetate (negative control). Using a cumulative ~28 microseconds of MD simulation data, we plan to use trajectory analyzers such as TRAVIS, MD-TASK, MDAnalysis and ProDy to extract features such as mean square displacement, mechanical stiffness, amount of amino acid perturbation, and others to discriminate an agonist from an antagonist.
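
To make one of the named features concrete, the following sketch computes a mean square displacement (MSD) curve from a list of coordinate frames. It is a simplified version that uses only the first frame as the time origin and assumes the frames are already aligned and free of periodic-boundary jumps; the function name and the toy frames are assumptions, and the sketch does not reproduce the API of any of the trajectory analyzers listed above.

```python
def mean_square_displacement(frames):
    """Return the MSD of each frame relative to the first frame.

    frames: list of frames, each a list of (x, y, z) tuples with the same
    atom ordering. MSD(t) is the average over atoms of |r_i(t) - r_i(0)|^2.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    reference = frames[0]
    msd = []
    for frame in frames:
        total = 0.0
        for (x, y, z), (x0, y0, z0) in zip(frame, reference):
            total += (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2
        msd.append(total / len(reference))
    return msd


# Two atoms over two frames (made-up coordinates).
toy_frames = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
]
toy_msd = mean_square_displacement(toy_frames)
```

A per-ligand MSD curve (or summary statistics derived from it) could then serve as one column of the feature table on which an ML classifier is trained.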

Virtual: Fast lossy protein structure compression algorithm
COSI: 3DSIG
  • Hyunbin Kim, Seoul National University, South Korea
  • Johannes Söding, Max Planck Institute, Germany
  • Martin Steinegger, Seoul National University, South Korea


Presentation Overview:

AlphaFold2 produces structure predictions at high quality and speed. As a result, the AlphaFold database is expected to release over 100 million structures covering the UniRef90 database this year, which will soon lead to billions of structures. Additionally, the prediction speed is constantly improved, e.g. ColabFold's pipeline is approx. 100 times faster compared to the base system. In spite of advances in speed, storing the structure of a protein with 250 residues in PDB format takes approx. 200 kilobytes (approx. 25 kilobytes for the 3D coordinates alone); thus, one billion structures would require hundreds of terabytes.

Here, we propose a format and method to compress protein structures requiring only 2 kilobytes for a protein structure of average size (8 bytes per residue), reducing the required storage space by an order of magnitude. We achieve this reduction by encoding the torsion angles of the backbone as well as the side-chain angles in a compact format instead of the 3D coordinates. Additionally, we show that using our lossy compression has no impact on structural downstream analysis.

By storing angles with an optimized bit-format, we can reduce the disk space required by 90% compared to float-encoded 3D coordinates, while maintaining a high compression and decompression speed.
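
The storage figures quoted in this abstract can be checked with a little arithmetic, and the idea of storing angles in a fixed number of bits can be sketched as uniform quantization. In the Python sketch below, the per-residue and per-file sizes come from the abstract, while the quantization functions and the choice of bit width are illustrative assumptions; the authors' actual bit layout is not described here.

```python
BYTES_PER_RESIDUE = 8          # figure quoted in the abstract
PDB_BYTES_250_RESIDUES = 200_000  # approx. 200 kilobytes, from the abstract


def compressed_size_bytes(n_residues):
    """Size of one compressed structure at 8 bytes per residue."""
    return n_residues * BYTES_PER_RESIDUE


def total_terabytes(bytes_per_structure, n_structures):
    """Total storage in decimal terabytes (1 TB = 1e12 bytes)."""
    return bytes_per_structure * n_structures / 1e12


def quantize_angle(angle_deg, bits):
    """Map an angle in degrees onto an integer code of `bits` bits (assumed scheme)."""
    levels = 1 << bits
    wrapped = (angle_deg + 180.0) % 360.0      # shift into [0, 360)
    return int(round(wrapped / 360.0 * levels)) % levels


def dequantize_angle(code, bits):
    """Recover the angle in [-180, 180) represented by `code`."""
    levels = 1 << bits
    return code / levels * 360.0 - 180.0


one_billion = 10 ** 9
pdb_total_tb = total_terabytes(PDB_BYTES_250_RESIDUES, one_billion)
compressed_total_tb = total_terabytes(compressed_size_bytes(250), one_billion)
```

Under these numbers, one billion 250-residue structures take about 200 TB as PDB files and about 2 TB at 8 bytes per residue. For the quantization sketch, the worst-case rounding error is half a step, that is 360 / 2^bits / 2 degrees, which is the sense in which such a format is lossy.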

Virtual: Foldseek: fast and accurate protein structure search
COSI: 3DSIG
  • Michel van Kempen, Max Planck Institute, Germany
  • Stephanie Kim, Seoul National University, South Korea
  • Charlotte Tumescheit, Seoul National University, South Korea
  • Milot Mirdita, Max Planck Institute, Germany
  • Johannes Söding, Max Planck Institute, Germany
  • Martin Steinegger, Seoul National University, South Korea
  • Cameron Gilchrist, Seoul National University, South Korea


Presentation Overview:

Highly accurate structure prediction methods, such as AlphaFold2 and RoseTTAFold, are generating an avalanche of publicly available protein structures. Searching through these structures with current structural alignment tools is becoming the main bottleneck in their analysis. Here we propose Foldseek, a fast and sensitive protein structure alignment method to compare large structure sets. Foldseek encodes structures as sequences over a 20-state 3Di alphabet. 3Di describes discretized tertiary residue-residue interactions, which is critical for reaching high sensitivities. Foldseek's novel local alignment stage combines structural and amino acid substitution scores to improve sensitivity without sacrificing speed. It reaches sensitivities similar to state-of-the-art structural aligners while being at least 20,000 times faster. The open-source Foldseek software is available at foldseek.com and a webserver at search.foldseek.com.

Virtual: iBIS2Analyzer: a web server for a phylogeny-driven coevolution analysis of protein families
COSI: 3DSIG
  • Francesco Oteri, Sorbonne Université, France
  • Francesca Nadalin, EMBL, Italy
  • Edoardo Sarti, Inria Université Côte d'Azur, France
  • Alessandra Carbone, Sorbonne Université, France


Presentation Overview:

Residue coevolution within and between proteins is used as a marker of physical interaction and/or residue functional cooperation. Pairs or groups of coevolving residues are extracted from multiple sequence alignments based on a variety of computational approaches. However, coevolution signals emerging in subsets of sequences might be lost if the full alignment is considered. iBIS2Analyzer is a web server dedicated to a phylogeny-driven coevolution analysis of protein families with different evolutionary pressure. It is based on the iterative version, iBIS2, of the coevolution analysis method BIS, Blocks in Sequences. iBIS2 is designed to iteratively select and analyse subtrees in phylogenetic trees, possibly large and comprising thousands of sequences. With iBIS2Analyzer, openly accessible at http://ibis2analyzer.lcqb.upmc.fr/, the user visualizes, compares and inspects clusters of coevolving residues by mapping them onto sequences, alignments or structures of choice, greatly simplifying downstream analysis steps. A rich and interactive graphic interface facilitates the biological interpretation of the results.

Virtual: iCn3D: From Web-based 3D Viewer to Structural Analysis Tool in Batch Mode; plus recent VR and AR features
COSI: 3DSIG
  • Kevin Yang, University of Maryland, United States
  • Yuchen Ge, Johns Hopkins University, United States
  • Guangfeng Song, NCBI/NLM, United States
  • Francesco Tabaro, EMBL Rome, United States
  • Nicholas Johnson, NINDS/NIH, United States
  • Ben Busby, DNAnexus, United States
  • David Enoma, Noma Technology, United States
  • Chin-Hsien Tai, NCI/NIH, United States
  • Sridhar Malkaram, West Virginia State University, United States
  • Rachel Dunn, University of British Columbia, United States
  • Zhiyu Cheng, University of California, Irvine, United States
  • Jack Lin, University of Washington, United States
  • Jiyao Wang, National Institutes of Health, United States
  • Sarah Zhao, Cary Academy, United States
  • Li Chong, Bezmialem Vakif University, Turkey
  • Tiejun Cheng, NCBI/NLM, United States
  • Gabriele Marchler, NCBI/NLM, United States
  • Thomas Madej, NCBI/NLM/NIH, United States
  • Shennan Lu, NCBI, United States
  • Dachuan Zhang, NCBI/NLM/NIH, United States
  • Christopher Lanczycki, NCBI, National Library of Medicine, United States
  • Aron Marchler-Bauer, NCBI, National Library of Medicine, United States
  • Philippe Youkharibache, NIH, United States


Presentation Overview:

iCn3D was initially developed as a web-based 3D molecular viewer. It became a collaborative research instrument through the sharing of permanent, shortened URLs that encapsulate not only annotated visual molecular scenes, but also all underlying data and analysis scripts in a FAIR manner. More recently, with the growth of structural databases, the need to analyze large structural datasets systematically led us to use Python scripts and convert the code to be used in Node.js scripts. We show a few examples of Python scripts at https://github.com/ncbi/icn3d/tree/master/icn3dpython to export secondary structures or PNG images from iCn3D. Users just need to replace the URL in the Python scripts to export other annotations from iCn3D. Furthermore, any interactive iCn3D feature can be converted into a Node.js script to be run in batch mode, enabling an interactive analysis performed on one or a handful of protein complexes to be scaled up to the analysis of large ensembles of structures. Example Node.js analysis scripts are currently available at https://github.com/ncbi/icn3d/tree/master/icn3dnode. We also review new features such as DelPhi electrostatic potential, 3D view of mutations, alignment of multiple chains, assembly of multiple structures by realignment, and use of iCn3D in Jupyter Notebook as described at https://github.com/ncbi/icn3d/tree/master/jupyternotebook.

Recently we added the Virtual Reality (VR) and Augmented Reality (AR) features in iCn3D. The video demo is at https://youtu.be/qzhuomrJPnI .

Source Code: https://github.com/ncbi/icn3d .

Virtual: Interaction between hIFNγ and HS oligosaccharides
COSI: 3DSIG
  • Elena Lilkova, Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Bulgaria
  • Elena Krachmarova, Institute of Molecular Biology “Roumen Tsanev”, Bulgarian Academy of Sciences, Bulgaria
  • Peicho Petkov, Faculty of Physics, Sofia University "St. Kliment Ohridski", Bulgaria
  • Nevena Ilieva, Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Bulgaria
  • Kristina Malinova, Institute of Molecular Biology “Roumen Tsanev”, Bulgarian Academy of Sciences, Bulgaria
  • Genoveva Nacheva, Institute of Molecular Biology “Roumen Tsanev”, Bulgarian Academy of Sciences, Bulgaria
  • Leandar Litov, Faculty of Physics, Sofia University "St. Kliment Ohridski", Bulgaria


Presentation Overview:

Human interferon-gamma (hIFNγ) is a crucial immunomodulating cytokine, which binds to a high-affinity cellular receptor hIFNγR1. The cytokine also binds to the glycosaminoglycans (GAGs) heparin and heparan sulfate (HS), which modulates its physico-chemical properties.
We report molecular dynamics studies of the interaction of hIFNγ and HS-derived oligosaccharides in two different scenarios – in the circulation, and at the cell-surface, when the cytokine forms a complex with its receptor.
HS oligosaccharides bind to the C-termini of free IFNγ with high affinity, forming very stable complexes due to the strong electrostatic attraction, and also interact with the positively charged solvent-exposed domains in the cytokine globule. This impedes further interaction of the cytokine with hIFNγR1.
On the other hand, GAGs, and HS in particular, may be crucial participants in the formation of the hIFNγ–hIFNγR complex at the cell surface. Our in silico results demonstrate that placing HS oligosaccharides between the two receptor units facilitates the formation of the cytokine–receptor complex by pulling down the hIFNγ globule via electrostatic attraction of its C-termini. Experiments performed on cell culture confirm that inhibition of the sulfation of HS proteoglycans by addition of NaClO3 to the cell medium leads to decreased hIFNγ activity.

Virtual: Long time-scaled structural dynamics-based mutation analysis of GNE Myopathy variants in the Indian subcontinent
COSI: 3DSIG
  • Ragothaman Yennamalli, SASTRA Deemed to be University, India
  • Shashank Ravichandran, SASTRA Deemed to be University, India
  • Yatindrapravanan Narasimhan, SASTRA Deemed to be University, India
  • Alok Bhattacharya, Ashoka University, India


Presentation Overview:

Multiple mutations in the bifunctional UDP-N-acetyl-2-epimerase/N-acetylmannosamine kinase (GNE) gene result in defects in the skeletal muscles, leading to GNE myopathy. The disease is characterized by progressive muscle weakness and atrophy leading to extreme disability. We analyzed the mutations in the Indian population to understand the correlation between the genotype and phenotype. We used the dominant isoform 2 of GNE and the mutations Ile618Thr (most pathogenic variant) and Val727Met (least pathogenic variant). The mutated sequences were submitted to AlphaFold and RoseTTAFold. The high-quality models (confidence score of 0.82) were then simulated using GROMACS to study the effect of the mutations on GNE structure by running 0.5 microseconds of MD simulations separately. Mutational analysis (short-range and long-range) was performed by mapping the interactions around the mutation site, categorized as newly formed interactions, interactions lost, and interactions retained. The contact order for the aromatic interactions was significantly decreased in the mutants. Comparing the interactions around residues 618 and 727 in the isoform 2 and mutant structures, there are relatively more interactions lost in the immediate vicinity of the mutation sites in the mutants. Ile618Thr has more interactions lost, indicating lower stability in this mutant. The results suggest that there is a correlation between structural changes and phenotype of the mutations.

Virtual: Molecular Modelling and Computational Characterization of ars operon in Deinococcus indicus
COSI: 3DSIG
  • Shrivaishnavi Ranganathan, SASTRA University, India
  • Deepa Sethi, Shiv Nadar University, India
  • Ramya L, SASTRA University, India
  • Richa Priyadarshini, Shiv Nadar University, India
  • Ragothaman M Yennamalli, SASTRA University, India


Deinococcus indicus is a novel Gram-negative bacterium that is radiation-resistant and exceptionally resistant to arsenic. The six proteins of the corresponding ars operon involved in arsenic extrusion have not yet been characterized, and the mechanism of extrusion is unknown. Here, we present a computational model (using RoseTTAFold) of the operonic structure and a characterization of these proteins: two transcriptional regulators (ArsR1, ArsR2), two arsenate reductases (ArsC2, ArsC3), one metallophosphatase family protein, and a transmembrane arsenite efflux pump (ArsB). ArsRs are repressors of the operon, and the reductases reduce arsenate (As5+) to arsenite (As3+) ions. Excluding ArsB, the others belong to the  structural class. ArsB is a transmembrane pump that mediates the removal of arsenite from the cell, and the function of the metallophosphatase family protein in this operon is unknown. After modelling the proteins using a deep-learning approach, we simulated them using molecular dynamics in GROMACS and NAMD. The obtained trajectories were analyzed using various measures to ascertain the stability of the proteins in a simulated environment and to mine for their properties and features. Known structural homologs of all proteins were used for comparison to assess the degree of similarity in structure and function.

Virtual: PITHIA: protein interaction site prediction using multiple sequence alignments and attention
COSI: 3DSIG
  • Seyedmohsen Hosseini, University of Western Ontario, Canada
  • Lucian Ilie, University of Western Ontario, Canada


Cellular functions are governed by proteins. While some proteins work independently, most function by interacting with each other. It is crucially important to know the binding sites that facilitate the interactions. Experimental methods are costly and time-consuming; therefore, it is essential to develop effective computational methods. We present PITHIA, a deep learning model for protein interaction site prediction that exploits several of the most powerful tools in bioinformatics: alignment, attention, and embedding. The recently introduced MSA-transformer, a language model that surpasses previous unsupervised methods by a wide margin, uses attention to learn from millions of multiple sequence alignments. We use the contextual embeddings produced by the MSA-transformer as inputs to our program. The architecture of PITHIA is attention-based as well, and was selected by a thorough comparison with multiple candidates. For meaningful comparison with existing programs, we update several widely used datasets with the most current protein binding site information and create a new one, which is the largest and most challenging to date. PITHIA greatly surpasses the competition on five datasets with respect to multiple measures, exceeding the closest competitor by up to 35% in terms of area under the precision-recall curve.

Virtual: PPalign: optimal alignment of Potts models representing proteins with direct coupling information
COSI: 3DSIG
  • Hugo Talibart, Institut de Systématique, Evolution, Biodiversité (ISYEB), MNHN, Sorbonne Université, EPHE, UA, CNRS, France
  • François Coste, Univ Rennes, Inria, CNRS, IRISA, Rennes, France, France
  • Mathilde Carpentier, Institut de Systématique, Evolution, Biodiversité (ISYEB), MNHN, Sorbonne Université, EPHE, UA, CNRS, France


To assign structural and functional annotations to the ever-increasing number of sequenced proteins, the main approach relies on sequence-based homology search methods, e.g. BLAST or profile Hidden Markov Model methods, which rely on significant alignments of query sequences to annotated proteins or protein families. While powerful, these approaches do not take coevolution between residues into account. Taking advantage of recent advances in the field of contact prediction, we propose to represent proteins by Potts models, which model direct couplings between positions in addition to positional composition, and to compare proteins by aligning these models. Due to non-local dependencies, this problem is computationally hard.
We introduced PPalign, a program based on Integer Linear Programming, to compute the optimal pairwise alignment of Potts models representing proteins. The approach was assessed on reference pairwise sequence alignments with low sequence identity (3% to 20%). In these experiments, Potts models were aligned in reasonable time (1’37” on average), and PPalign yielded a better mean F1 score and found significantly better alignments than HHalign and independent-site PPalign in some cases.
These results show that pairwise couplings from protein Potts models can be used to improve the alignment of remotely related protein sequences in tractable time.

Virtual: ProteinAlignmentObstruction – an algorithm for detecting and quantifying steric and topological obstructions to structural alignments of proteins
COSI: 3DSIG
  • Peter Røgen, Technical University of Denmark, Denmark


Structure comparison is fundamental for understanding proteins, specifically for studying their sequence and structural evolution and for guiding our efforts to predict their structures from their amino acid sequences. Coordinate-based structural alignment methods optimize the distances traversed by aligned residue pairs during the linear interpolation between two superimposed structures. Current alignment scores do not take into account whether there is room for this morph, whether it causes steric clashes, or whether it causes topological changes to the compared structures.

ProteinAlignmentObstruction finds steric clashes and self-intersections occurring during the linear interpolation between two aligned and superimposed structures. Self-intersections that can be avoided by re-folding at most M (user-defined) residues are called removable. The remaining self-intersections indicate different threading or topology and are called essential.

We find examples of homologous protein pairs with distinct threading, and many pairs of distinctly classified folds that are easily morphed into each other, emphasizing the continuous nature of parts of protein fold space. I will present our new server, Steric and TOPological Model Hindrance, and examples of threading errors it finds in CASP14 models. There are many applications where the ability to detect whether structures are close in configuration space may prove important.
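
The idea of checking a linear morph for steric clashes can be illustrated with a short sketch. This is not the ProteinAlignmentObstruction algorithm: it only samples the interpolation at discrete steps and reports residue pairs that come closer than a cutoff, and it does not detect self-intersections or distinguish removable from essential ones. The function name, the 3.0 cutoff, and the toy coordinates are assumptions made for illustration.

```python
import math


def morph_clashes(start, end, cutoff=3.0, steps=20, min_seq_sep=2):
    """Report residue pairs that come closer than `cutoff` during a linear morph.

    `start` and `end` are equal-length lists of (x, y, z) coordinates, one per
    aligned residue, already superimposed. Returns (t, i, j, distance) tuples.
    """
    clashes = []
    for step in range(steps + 1):
        t = step / steps
        # Linear interpolation of every residue position at morph parameter t.
        frame = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(start, end)
        ]
        for i in range(len(frame)):
            # Skip pairs that are close in sequence; they are always near each other.
            for j in range(i + min_seq_sep, len(frame)):
                d = math.dist(frame[i], frame[j])
                if d < cutoff:
                    clashes.append((t, i, j, d))
    return clashes


# Toy example: residues 0 and 2 swap places and pass through each other.
start = [(0.0, 0.0, 0.0), (0.0, 50.0, 0.0), (10.0, 0.0, 0.0), (0.0, 100.0, 0.0)]
end = [(10.0, 0.0, 0.0), (0.0, 50.0, 0.0), (0.0, 0.0, 0.0), (0.0, 100.0, 0.0)]
for t, i, j, d in morph_clashes(start, end):
    print(f"t={t:.2f}: residues {i} and {j} are {d:.2f} apart")
```

Both endpoint structures are clash-free in this example, so an alignment score computed from the endpoints alone would not reveal the obstruction that appears midway through the morph.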

Virtual: Structural Bioinformatics Library: interactive, portable, online
COSI: 3DSIG
  • Edoardo Sarti, Inria Université Côte d'Azur, France
  • Frédéric Cazals, Inria Université Côte d'Azur, France


The Structural Bioinformatics Library (SBL https://sbl.inria.fr/ ) is a large set of optimized computational tools for the analysis of protein structure, function and mechanism. It consists of an extremely modular architecture of C++ template classes and methods divided by level of generality and abstraction, from core features implementing fundamental algorithms to applications aimed at solving specific bioinformatics problems. The extreme care in the mathematical formulation of each task ensures optimal performance in terms of time and robustness.
SBL now counts 23 different application packages divided into four main topics, from protein interface recognition to conformational sampling to binding affinity and functional prediction. For each SBL Application, we are developing an interactive environment that lets the user explore the SBL algorithms and their parameters. Important and non-trivial options are explained in convenient information frames, and an example set of input data is always available.
The SBL is now also available as a Singularity container and a Conda package.

Virtual: The GSK3 from Moniliophthora roreri: Repurposing known inhibitors and natural ligands against a phytopathogenic fungus of The Cocoa Tree
COSI: 3DSIG
  • Décio Lisboa, Universidade Estadual de Santa Cruz (UESC), Brazil
  • Tarcisio Melo, Universidade Estadual do Sudoeste da Bahia (UESB), Brazil
  • Wagner Soares, Universidade Estadual do Sudoeste da Bahia (UESB), Brazil
  • Lucas Palmeira, Universidade Estadual do Sudoeste da Bahia (UESB), Brazil
  • Carlos Pirovani, Universidade Estadual de Santa Cruz (UESC), Brazil
  • Bruno Andrade, Universidade Estadual do Sudoeste da Bahia (UESB), Brazil
  • Raner Silva, Universidade Estadual de Santa Cruz, Brazil


Moniliophthora roreri is a phytopathogenic, hemibiotrophic fungus. This pathogen causes moniliasis, also known as frosty pod rot, in species of the genera Herrania and Theobroma (cocoa tree). Several countries in South, Central, and North America have been affected by this disease, and its most recent record occurred in the north of Brazil in July 2022, causing drastic economic losses. This study aimed to use in silico methodologies to repurpose known drug inhibitors and natural compounds available in public databases for the rational discovery of new molecules to control this fungus. To define a key M. roreri protein target, we ran Blastp with the fungus genome against the PDB database and generated a list of possible targets by filtering on sequence coverage and identity values. In addition, a protein-protein interaction (PPI) network was built to assess the biological importance of the modeled proteins, which showed two targets as hub and bottleneck proteins. GSK3 was selected for modeling with Swiss-Model, and docking analyses revealed Manzamine A, an alkaloid isolated from marine sponges, as a possible inhibitor, with an affinity energy of -9.1 kcal/mol. Molecular dynamics calculations were performed to describe the stability of the ligand inside the GSK3 active site.

Virtual: Understanding the significance of inter-protein bifurcated interactions in protein-protein complexes
COSI: 3DSIG
  • Sneha Bheemireddy, Indian Institute of Science, Bangalore, India, India
  • Revathy Menon, National Centre for Biological Sciences (NCBS), India
  • Narayanaswamy Srinivasan, Indian Institute of Science, Bangalore, India, India


Multi-protein assemblies play a crucial role in several cellular processes. Studying the functional basis of such complexes begins with the analysis of protein-protein interactions. Several studies have highlighted the significance of interfacial residues in protein-protein complexes and their role in conferring stability and specificity to the complex. In this work, the features of inter-protein bifurcated interactions in multi-protein assemblies have been investigated. We begin by generating a dataset of heteromeric complexes of known 3-D structure. Upon screening these complexes, we found that inter-protein bifurcated interactions are present in over 600 multi-protein assemblies. Arg, Tyr, and Leu are the most frequently occurring amino acids in bifurcated inter-protein interactions. Van der Waals interactions, hydrophobic interactions, and salt bridges are the most frequent interaction types. Further, we found that the majority of these residues are hotspots and that they are moderately to highly conserved, with a few exceptions. We could explain the biological significance of bifurcated interactions through a few case studies. Overall, this study expands the knowledge of protein-protein interactions, paving the way for the study of multi-protein assemblies.

P-001: Characterizing and explaining impact of disease-associated mutations in proteins without known structures or structural homologues
COSI: 3DSIG
  • Neeladri Sen, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • Ivan Anishchenko, Institute for Protein Design, University of Washington, United States
  • Nicola Bordin, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • Ian Sillitoe, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • Sameer Velankar, Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, United Kingdom
  • David Baker, Institute for Protein Design, University of Washington, United States
  • Christine Orengo, Institute of Structural and Molecular Biology, University College London, United Kingdom


The structure of proteins can help us understand the mechanism of diseases associated with missense mutations and help develop therapeutics. With improved deep learning techniques such as RoseTTAFold and AlphaFold, we can predict the structure of proteins even in the absence of structural homologues. We modelled and extracted the domains from 553 disease-associated human proteins without known protein structures or sequence homologues in the Protein Data Bank. Domains that could be assigned to CATH superfamilies had higher quality and lower RMSD between AlphaFold and RoseTTAFold models compared to those that could only be assigned to Pfam or to neither. Using these models, we predicted ligand-binding sites, protein-protein interfaces, conserved residues, destabilising effects, and pathogenicity caused by missense mutations. We could explain 80% of these disease-associated mutations based on proximity to functional sites, structural destabilization, or pathogenicity. These mutations were more buried, more pathogenic, and closer to predicted functional sites, and had a higher predicted ddG of mutation, compared to polymorphisms. Using models from the two state-of-the-art techniques, and having multiple predictors agree that the same mutation has an effect, provides higher confidence in our predictions. We explain 93 additional mutations based on RoseTTAFold models which could not be explained based solely on AlphaFold models.

P-002: Structure-enhanced Deep Meta-learning Predicts Genome-Wide Uncharted Chemical-Protein Interactions
COSI: 3DSIG
  • Tian Cai, Hunter College, The City University of New York, United States
  • Lei Xie, Hunter College, The City University of New York, United States


Discovering genome-wide chemical-protein interactions is instrumental for chemical genomics, drug discovery and precision medicine. However, more than 90% of gene families remain dark, i.e., their small-molecule ligands are undiscovered. Existing approaches typically fail when the dark protein of interest differs from those with known ligands or structures. To address this challenge, we developed a deep learning framework, PortalCG. PortalCG consists of three novel components: (i) end-to-end step-wise transfer learning in recognition of the sequence-structure-function paradigm, (ii) out-of-cluster meta-learning in light of protein evolution for generalizing machine learning models to unstudied gene families, and (iii) stress model selection to facilitate model deployment in a real-world scenario. In rigorous benchmark experiments, PortalCG considerably outperformed state-of-the-art sequence- and structure-based techniques when applied to dark gene families. Experimental validations on 65 compounds supported the accuracy and robustness of PortalCG. Thus, PortalCG is a viable solution to the out-of-distribution (OOD) problem in exploring the dark protein functional space, and can be applied to a wide variety of scientific domains.

P-003: Enhanced Sampling Simulations for Genomics: Application to KRAS Mutations
COSI: 3DSIG
  • Brian Ratnasinghe, Medical College of Wisconsin, United States
  • Neshatul Haque, Medical College of Wisconsin, United States
  • Michael Zimmermann, Medical College of Wisconsin, United States


Researchers’ ability to sequence genomic variation outpaces the ability to functionally interpret mutations. Previous work has shown that calculations on 3D protein structure enhance the mechanistic information for interpreting genetic variation. Among the key genetic alterations driving cancer, and germline conditions referred to as RASopathies, are those in KRAS. KRAS is a GTPase enzyme that controls cell growth and proliferation. The prevalence of KRAS mutations within cancer cells makes KRAS an attractive target for cancer treatment, potentially via small molecule targeting. To better understand the conformational sampling of KRAS hotspot mutations, we performed unbiased and enhanced-sampling molecular dynamics simulations. We then compared both to each other and to x-ray crystallography data. Our results demonstrate that enhanced sampling yields a wider range of conformations that better captures the variability in the experimental data. These findings allow us to better differentiate between conformations of different KRAS genetic variants, and to better understand which conformational changes are more prevalent for each. We investigate the potential to score conformational sampling spaces, which may allow us to streamline the process of differentiating between hotspot variants of different proteins for future projects. Finally, clustering of KRAS conformational spaces suggests mutation-specific conformations that may be candidates for small molecule targeting.

P-004: Intrinsic linking of chromatin in human cells
COSI: 3DSIG
  • Maciej Borodzik, Institute of Mathematics, University of Warsaw, ul. Banacha 2, 02-097 Warsaw, Poland, Poland
  • Michał Denkiewicz, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland, Poland
  • Krzysztof Spalinski, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland, Poland
  • Kamila Winnicka, Centre of New Technologies, University of Warsaw, ul. Banacha 2c, 02-097 Warsaw, Poland, Poland
  • Kaustav Sengupta, Centre of New Technologies, University of Warsaw, ul. Banacha 2c, 02-097 Warsaw, Poland, Poland
  • Marcin Pilipczuk, Institute of Informatics, University of Warsaw, ul. Banacha 2, 02-097 Warsaw, Poland, Poland
  • Michał Pilipczuk, Institute of Informatics, University of Warsaw, ul. Banacha 2, 02-097 Warsaw, Poland, Poland
  • Yijun Ruan, The Jackson Laboratory for Genomic Medicine, USA, United States
  • Dariusz Plewczynski, Centre of New Technologies, University of Warsaw & Warsaw University of Technology, Warsaw, Poland, Poland


Motivation: We propose a practical algorithm based on graph theory to identify CTCF-mediated chromatin loops that are linked in 3D space. Our method is based on finding certain graph structures, K6 minors, in graphs constructed from pairwise chromatin interaction data obtained from ChIA-PET experiments. We show that such graph structures, representing a particular arrangement of loops, mathematically necessitate linking if they co-occur in an individual cell. The presence of these linked structures can advance our understanding of the principles of spatial organization of the genome. Results: We apply our method to graphs created from in situ ChIA-PET data for the GM12878, H1ESC, HFFC6 and WTC11 cell lines, and from long-read ChIA-PET data. We treat these datasets as divided into CCDs, closely interconnected regions defined on the basis of CTCF loops. We find numerous candidate regions with minors, indicating the presence of links. The graph-theoretic characteristics of these linked regions, including betweenness and closeness centrality, differ from those of regions in which no minors were found, which supports their non-random nature. We provide two versions of the algorithm: one efficient enough to be applied to large datasets, and the other with greater detection capabilities.

P-005: Hetero-dimeric protein structure prediction from paired multiple sequence alignments using a deep residual neural network
COSI: 3DSIG
  • Jacob Schwartz, University of Michigan, United States
  • Eric Bell, University of Michigan, United States
  • Peter Freddolino, University of Michigan, United States


While recent advances in structural bioinformatics and deep learning have made the prediction of single-chain protein structures highly accurate, many related challenges remain, including that of multiple-chain protein complex structure prediction. Coevolution-based quaternary structure prediction, unlike its single-chain counterpart, presents the challenge of constructing a “paired” multiple sequence alignment (MSA) to embed sequence-sequence pairwise coevolutionary information for hetero-multimers. Attempts to predict protein complex structures have paired MSAs by the species of origin of individual sequences, by allowing for large gaps in unpaired regions of the MSA, and/or through naïve concatenation of monomeric MSAs. Here, we present DeepDimer, a pipeline for the prediction of protein heterodimer structures using a novel MSA pairing method. From a paired MSA, DeepDimer generates inter-chain coevolutionary features, and from these features predicts an inter-chain distance map with a deep residual convolutional neural network. Using this inter-chain distance map and single-chain distance maps generated by DeepPotential, DeepDimer uses a modified version of PotentialFold to construct dimer structures. While DeepDimer is still under development, preliminary results suggest its MSA pairing method significantly improves heterodimer structure prediction accuracy. DeepDimer will be released as a free web server and an open-source project.

P-006: Evaluation of peptide-protein docking using AlphaFold2 and RoseTTAFold
COSI: 3DSIG
  • Negin Manshour, University of Missouri-Columbia, United States
  • Yang Yu, Northeast Normal University, China
  • Wenyuan Qin, Northeast Normal University, China
  • Fei He, Northeast Normal University, China
  • Duolin Wang, University of Missouri-Columbia, United States
  • Dong Xu, University of Missouri-Columbia, United States


The prediction of peptide-protein docking structures has gained impressive scientific momentum over the past few years. Analyzing the structure of these complexes and discovering how they bind together plays a crucial role in designing and developing peptide drugs and enzyme inhibitors. A wide range of computational algorithms has been developed for peptide-protein docking predictions. Current peptide-protein docking tools often require crystallized protein structures, which are expensive and difficult to obtain. With the AlphaFold2 and RoseTTAFold tools, most protein structures can be precisely deciphered based on protein sequences only. By formulating peptide-protein binding/docking as a protein complex folding problem, we can use AlphaFold2 to generate a series of bound peptide-protein conformations. We have designed a pipeline for predicting peptide-protein complexes and scoring the predicted models by using these AI-based tools. In this work, we benchmarked the pipeline on a set of non-redundant peptide-protein complex structures derived from the peptiDB, Propedia, and PepBDB databases. We compared the results with several peptide-protein tools, such as InterPep2 and GalaxyWEB. We also evaluated several scoring schemes, including our in-house method based on a graph neural network, in ranking peptide-protein binding conformations.

P-007: CSM-epitope: linear B-cell epitope prediction using graph-based signatures and interpretable machine learning
COSI: 3DSIG
  • Bruna Moreira da Silva, The University of Melbourne, Australia
  • David Ascher, The University of Melbourne, Australia
  • Douglas Pires, The University of Melbourne, Australia


Linear B-cell epitopes are a class of antigenic determinants that can bind to B-cell receptors or antibodies released by the adaptive immune system. Epitopes fall into two classes, continuous (or linear) and discontinuous, and both exist only upon the detection and binding of the antigen by an antibody. Offering a scalable and less expensive process, computational approaches aim to contribute to the development of epitope-based vaccines and immunotherapies by identifying, from a protein sequence, which residues are more likely to be part of an epitope.
A variety of prediction methods have been developed over the years; however, their reliability for clinical applications is still questionable given their medium to low performance (Matthews correlation coefficients ranging from 0.32 to 0.62). Additionally, current machine learning models lack interpretability, limiting the biological insights that could otherwise be obtained. Here, we introduce CSM-epitope, an interpretable machine learning method capable of accurately identifying linear B-cell epitopes, leveraging a new graph-based signature representation of protein sequences based on our well-established CSM (Cutoff Scanning Matrix) algorithm.
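
For readers unfamiliar with the performance measure quoted above, the Matthews correlation coefficient can be computed from the four confusion-matrix counts. The sketch below uses the standard formula; the function name and the example counts are hypothetical and do not come from any benchmark in this abstract.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for residues labelled epitope / non-epitope.
print(round(mcc(tp=40, tn=900, fp=30, fn=30), 3))
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it accounts for all four counts, which matters when epitope residues are a small minority.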

P-008: Identification of DNA loops in the Genome by Multi-Dimensional Scaling
COSI: 3DSIG
  • Ryo Ishibashi, Chuo University, Japan
  • Y-H Taguchi, Chuo University, Japan


The interaction of enhancers and promoters on genomic DNA remains poorly understood. Chromosomes cannot be observed during the cell division cycle because the genome forms a chromatin structure and spreads within the nucleus. However, high-throughput chromosome conformation capture (Hi-C) measures the physical interactions of genomes. In previous studies, DNA extrusion loops were directly derived from Hi-C heat maps. In this study, we use multidimensional scaling (MDS) to locate DNA loops more precisely. MDS is a multivariate analysis method that reproduces the original coordinates from the distance matrix between elements. We applied MDS to Hi-C data from an immortalized line of human T lymphocyte cells, treating the data as the distance matrix of the genome. In addition, we selected columns 2 and 3 of the orthogonal matrix U as the desired structure. Overall, the DNA loops from the reconstructed genome structure contained the transcription factors involved in DNA loops, such as SATB1 and HMGIY. Therefore, our results are consistent with the biological findings. Our method is suitable for identifying DNA loops in the genome.
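
To illustrate how MDS recovers coordinates from a distance matrix, the following is a minimal, dependency-free sketch of classical (Torgerson) MDS. It is not the authors' pipeline: the abstract does not say how Hi-C data are converted to distances, and it selects specific columns of a matrix U, whereas this sketch simply takes the two leading eigenvectors, found here by power iteration. All names and the toy input are assumptions.

```python
import math


def classical_mds(dist, dims=2, iters=200):
    """Embed points in `dims` dimensions from a symmetric distance matrix."""
    n = len(dist)
    # Double-center the squared distances to obtain the Gram matrix B.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the leading eigenvector of the current B.
        v = [1.0 / (i + 1) for i in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i][k] = math.sqrt(max(lam, 0.0)) * v[i]
        # Deflate B so the next pass finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Toy input: pairwise distances between the corners of a 4 x 2 rectangle.
points = [(0, 0), (4, 0), (4, 2), (0, 2)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(dist)
```

The recovered coordinates match the original points only up to rotation, reflection, and translation, so a check of the result should compare pairwise distances rather than raw coordinates.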

P-009: Paragraph - Antibody paratope prediction using Graph Neural Networks with minimal feature vectors
COSI: 3DSIG
  • Lewis Chinery, University of Oxford, United Kingdom
  • Charlotte Deane, University of Oxford, United Kingdom
  • Newton Wahome, GSK, United States
  • Iain Moal, GSK, United Kingdom


The development of new vaccines and antibody therapeutics typically takes several years and requires over $1bn in investment. Accurate knowledge of the paratope (antibody binding site) can speed up and reduce the cost of this process by improving our understanding of antibody-antigen binding.

We present Paragraph, an open-source structure-based paratope prediction tool that outperforms current state-of-the-art tools using simpler feature vectors and no antigen information. Representing the antibody variable region as a graph, Paragraph uses equivariant graph neural network layers to predict the probability of each residue belonging to the paratope.

Given the lack of readily available antibody crystal data, it is essential that structure-based prediction tools work on model structures. As such, all our results are on models.

In addition to improving paratope prediction accuracy, we also identify issues with currently used benchmark datasets and metrics. To overcome this, we develop a larger, cleaner dataset to be used in future efforts and suggest metrics well-suited to evaluating highly class-imbalanced problems.

Paragraph achieves a PR AUC of 0.725 on ABlooper model structures of our expanded dataset. Promisingly, Paragraph’s performance increases with model confidence, suggesting our accuracy may rise with future improvements to antibody structure prediction.
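
As an illustration of the kind of metric reported above, the area under the precision-recall curve can be approximated by average precision. The sketch below is a generic implementation, not the evaluation code used for Paragraph; the function name and the toy labels and scores are hypothetical.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    `labels` are 1 for paratope residues and 0 otherwise; `scores` are the
    predicted probabilities. Tied scores keep their input order in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Toy example: two binding residues among four.
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Because precision ignores true negatives, this measure is not inflated by the large number of non-binding residues, which is why it suits highly class-imbalanced problems.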

P-010: Identification and structural determination of a novel protein complex from in situ electron cryotomography of Mycoplasma pneumoniae
COSI: 3DSIG
  • Joseph Christian Somody, EMBL Heidelberg, Germany
  • Rasmus Kjeldsen Jensen, EMBL Heidelberg, Germany
  • Liang Xue, EMBL Heidelberg, Germany
  • Lukas Adam, EMBL Heidelberg, Germany
  • Peer Bork, EMBL Heidelberg, Germany
  • Julia Mahamid, EMBL Heidelberg, Germany


The genome of Mycoplasma pneumoniae has undergone much reduction, making it an organism suitable for system-wide studies in proteomics or, in this case, a proof of principle for in-cell structural proteomics. Here, we show that novel protein complexes can be identified from whole cells imaged with electron cryotomography.

Over 500 cells were imaged. After 3D tomogram reconstruction, we selected a recurring particle of interest (POI) of unknown identity. We picked a number of these POIs, refined subvolumes in order to construct an initial average 3D density of the POI, and used this density for template matching against the large dataset of tomograms.

After curation, we trained a 3D U-Net on subvolumes around POIs. The trained model was then used to predict more POIs. We iteratively trained, predicted, and curated to improve results.

To identify the proteins in this POI, we performed rigid-body fitting of AlphaFold-predicted structures into the density. Top hits were confirmed via crosslinking mass spectrometry data. We are working in parallel to improve the resolution of our average, such that secondary-structure elements become apparent and we can begin modelling our identified proteins into the map.

Ultimately, we show it is possible to identify uncharacterized proteins from whole-cell tomograms.

P-011: AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms
COSI: 3DSIG
  • Nicola Bordin, University College London, United Kingdom
  • Ian Sillitoe, University College London, United Kingdom
  • Vamsi Nallapareddy, University College London, United Kingdom
  • Clemens Rauer, University College London, United Kingdom
  • Su Datt Lam, Universiti Kebangsaan Malaysia, Malaysia
  • Vaishali Waman, University College London, United Kingdom
  • Neeladri Sen, University College London, United Kingdom
  • Michael Heinzinger, Technische Universität München, Germany
  • Maria Littmann, Technische Universität München, Germany
  • Stephanie Kim, Seoul National University, South Korea
  • Sameer Velankar, EMBL-EBI, United Kingdom
  • Martin Steinegger, Seoul National University, South Korea
  • Burkhard Rost, Technische Universität München, Germany
  • Christine Orengo, University College London, United Kingdom


AlphaFold2, an ML-based method developed by DeepMind, revolutionised the field of structural biology by predicting the 3D structure of proteins with an accuracy often comparable to experimental characterization. In a joint effort with EMBL-EBI, protein structures for 21 model organisms were made available. To exploit these, assigning modelled domains to their evolutionary families helps in understanding how genetic variations modify structure and ultimately function. The CATH database includes evolutionary relationships between protein domains and classifies them into superfamilies. We identify structural domains in AlphaFold2 models and classify them in CATH. While most domain assignments are obtainable by Hidden Markov Model-based methods, remote homologs are often elusive. We recently established CATHe, a supervised machine learning approach that exploits sequence embeddings from the ProtT5 PLM to detect remote homologs. Using CATHe and a new fast structural aligner, Foldseek, we established thresholds for confirming homology. Before structurally validating the assignments, we removed small, disordered, non-globular, or poorly packed domains. 93% of domains passing these thresholds could be brought into CATH, with the remainder belonging to ~4200 putative novel families. Manual curation of human domains from these novel families led to the identification of one new architecture and ~100 new folds.

P-012: GLYCO-2.0: a web-based server to quantify glycan shielding of glycosylated proteins with improved data processing and computational speed
COSI: 3DSIG
  • Myungjin Lee, National Institutes of Health, United States
  • Mateo Reveiz, National Institutes of Health, United States
  • Reda Rawi, National Institutes of Health, United States
  • Peter Kwong, National Institutes of Health, United States


Presentation Overview: Show

Glycans play important roles in protein folding and cell-cell interactions, and glycosylation of protein antigens can dramatically impact immune responses. Previously, we developed an in silico tool, GLYCO (GLYcan COverage), to quantify the glycan shielding of protein surfaces. We applied it to determine the glycan-free surface of the SARS-CoV-2 NTD supersite and to correlate glycan coverage with antigen-antibody properties. Here we developed a user-friendly web server, GLYCO-2.0, and improved the computational speed by replacing the previous linear parametrization with a new analytical cylinder method with KD-trees when retrieving atom positions within the coordinate space. The use of these new methods increased computational speed by ~4-5-fold in single and multiprocessing settings. GLYCO-2.0 can estimate glycan shielding from a single coordinate file or from multiple frames derived from, for instance, molecular dynamics simulations or NMR spectroscopy to account for the inherent flexibility of oligosaccharides. The server offers email notifications, allowing the retrieval of results within a week. We also showcased the applicability of GLYCO-2.0 by estimating the glycan shield development of influenza’s hemagglutinin proteins over time. Overall, quantification of glycans by GLYCO-2.0 provides a comprehensive understanding of glycan shielding of glycosylated proteins and contributes to glycoprotein-involved research such as vaccine design.
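As an illustration of why a KD-tree speeds up retrieval of atom positions, the following is a minimal sketch of a radius query over 3D coordinates. It is not the GLYCO-2.0 implementation and does not reproduce the analytical cylinder method; the function names build_kdtree and query_radius, the random coordinates, and the 10 Å radius are assumptions made for this example only.

```python
import math
import random

def build_kdtree(points, depth=0):
    """Recursively build a KD-tree as nested tuples: (point, left, right)."""
    if not points:
        return None
    axis = depth % 3                      # cycle through x, y, z
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                # median point splits the space
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def query_radius(node, center, radius, depth=0, found=None):
    """Collect every stored point within `radius` of `center`."""
    if found is None:
        found = []
    if node is None:
        return found
    point, left, right = node
    axis = depth % 3
    if math.dist(point, center) <= radius:
        found.append(point)
    diff = center[axis] - point[axis]
    # Always descend into the side containing the query center.
    near, far = (left, right) if diff < 0 else (right, left)
    query_radius(near, center, radius, depth + 1, found)
    # Descend into the other side only if the splitting plane is in range.
    if abs(diff) <= radius:
        query_radius(far, center, radius, depth + 1, found)
    return found

# Hypothetical atom coordinates (in Å), generated at random for illustration.
random.seed(0)
atoms = [(random.uniform(0, 50), random.uniform(0, 50), random.uniform(0, 50))
         for _ in range(500)]
tree = build_kdtree(atoms)
center = (25.0, 25.0, 25.0)
neighbours = query_radius(tree, center, 10.0)
print(f"{len(neighbours)} of {len(atoms)} atoms lie within 10 Å of the center")
```

The tree prunes whole subtrees whose splitting plane lies farther from the query point than the radius, so most atoms are never compared directly, unlike a linear scan over all coordinates.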

P-013: DYNAMICAL STUDY OF VIRAL GLYCOPROTEINS AND EVOLUTIONARY FITNESS SIMULATION
COSI: 3DSIG
  • Natalia Fagundes Borges Teruel, UdeM: Université de Montreal, Canada
  • Olivier Mailhot, UdeM: Université de Montreal, Canada
  • Rafael Najmanovich, UdeM: Université de Montreal, Canada


Presentation Overview: Show

Several viral glycoproteins go through conformational changes that are fundamental to infection processes. The SARS-CoV-2 Spike protein is of particular importance during the current pandemic. This protein interacts with the human angiotensin-converting enzyme 2 (ACE2) receptor as part of the viral entry mechanism. To do so, the receptor-binding domain (RBD) of Spike needs to be in an open state conformation. Here we utilize coarse-grained Normal Mode Analyses to model the dynamics of SARS-CoV-2 Spike protein variants as well as the transition probabilities between open and closed conformations. We performed 17081 possible in silico single mutations of Spike to determine positions and mutations that may affect the occupancy of the conformational states. Based on that, we successfully predicted some of the main mutations that constitute the Alpha, Beta and Gamma variants. We also built a simplified model for binding evaluation, validated with experimental data on the binding between RBD mutants and ACE2. This model is now being applied to the evaluation of interfaces between conformational ensembles of Spike and antibody structures, with preliminary results offering a consensus among the various experimental interfaces determined. The aim is to propose a method to evaluate mutants that integrates dynamics, binding, and immune escape.

P-014: Prediction of protein-protein interactions using sequences of intrinsically disordered regions
COSI: 3DSIG
  • Gözde Kibar, Max Planck Institute for Molecular Genetics, Germany
  • Martin Vingron, Max Planck Institute for Molecular Genetics, Germany


Presentation Overview: Show

Protein-protein interactions (PPIs) play a crucial role in many molecular processes. Despite many efforts, the mechanisms governing molecular recognition between proteins remain mysterious. This presents a challenge for computational approaches that aim to differentiate between interacting and non-interacting proteins. Here we present a new method to tackle this challenge using intrinsically disordered regions (IDRs). IDRs are protein segments that are functional despite lacking a single invariant three-dimensional structure. The prevalence of IDRs in eukaryotic proteins suggests that the highly dynamic nature of IDRs is critical for protein function. To test this hypothesis, we predicted PPIs using IDR sequences in candidate interacting proteins in humans. Moreover, we identified appropriate training strategies based on the type of prediction problem between proteins. Our findings underline the importance of separating problem types from each other and show that sequences encoding IDRs can be used to predict specific features of the human IDP networks. Our findings further suggest that accounting for IDRs in future analyses should accelerate efforts to elucidate the eukaryotic PPI network.

P-015: XLEC - Large-scale prediction of protein-protein complex structures from sequence co-evolution and cross-linking data
COSI: 3DSIG
  • Hadeer Elhabashy, Max-Planck-Institut für Biologie Tübingen, Germany
  • Oliver Kohlbacher, University of Tübingen, Germany


Presentation Overview: Show

Prediction and structural modeling of protein-protein interactions (PPIs) are essential for understanding biological processes. Most large-scale experimental and computational approaches that predict PPIs do not provide structural information. We present a novel approach, XLEC, combining cross-linking mass spectrometry (XL-MS) and evolutionary couplings (ECs) data for efficient proteome-wide prediction and modeling of PPIs. While ECs derived from multiple sequence alignments primarily yield information on direct contacts between proteins across the interface, XL-MS data preferentially captures longer-range interactions, hence these methods contain complementary information. XLEC integrates information from both approaches in a machine learning-based model and subsequent constraint-based modeling of the complex structure. We applied XLEC to data from murine mitochondrial proteomes and compared its performance to those of XL-MS and ECs separately. Our preliminary assessment suggests that XLEC outperforms XL-MS or ECs-based identification of PPIs (precision/recall: XLEC 76%/76%; XL-MS only: 71%/57%; ECs: 68%/57%). Furthermore, XLEC-based modeling of PPIs achieved excellent L-RMSD (<10 Å) for 20% of the benchmark dataset (XL-MS only: 2%; ECs only: 11%). Using XLEC, we generated around 500 de novo PPI models revealing novel insights into the mitochondrial interactome.
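To compare the three precision/recall pairs quoted above on a single scale, one option is the F1 score (the harmonic mean of precision and recall). The abstract does not report F1; the sketch below only derives it from the stated percentages, and the name f1_score is chosen for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision/recall pairs as quoted in the abstract, expressed as fractions.
reported = {
    "XLEC": (0.76, 0.76),
    "XL-MS only": (0.71, 0.57),
    "ECs only": (0.68, 0.57),
}

f1 = {name: f1_score(p, r) for name, (p, r) in reported.items()}
for name, value in f1.items():
    print(f"{name}: F1 = {value:.2f}")
```

Because the harmonic mean penalizes an imbalance between precision and recall, the lower recall of the two single-source approaches pulls their F1 well below their precision.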

P-016: Standing out in the crowd: Native protein partners are distinct from the non-native ones in protein-protein interactions
COSI: 3DSIG
  • Amar Singh, Computational Biology Program, The University of Kansas, Lawrence, Kansas 66045, United States
  • Petras J. Kundrotas, Computational Biology Program, The University of Kansas, Lawrence, Kansas 66045, United States
  • Ilya A. Vakser, Computational Biology Program, The University of Kansas, Lawrence, Kansas 66045, United States


Presentation Overview: Show

In the crowded cellular environment, one of the important challenges is to elucidate how proteins distinguish their native partners from a wide variety of non-interactors. The increasing availability of experimentally determined protein-protein complexes provides an opportunity to investigate preferences in protein-protein interactions. We systematically explored the shape complementarity of interacting proteins using binary hetero complexes from the Protein Data Bank (PDB). The results showed that protein shape characteristics and the corresponding intermolecular energy landscape, sampled by a systematic docking protocol, can discriminate non-interacting proteins from native partners. The number of minima on the energy landscape of known protein interactors, as well as the clustering patterns of the energy minima, are different from those of the non-native protein ligands. The findings provide insight into fundamental properties of protein recognition. The results can be used to generate more adequate sets of protein-protein complexes for knowledge-based modeling.

P-017: Genetic determinants in Bombali ebolavirus glycoprotein dictate viral entry into human cells
COSI: 3DSIG
  • Gorka Lasso, Albert Einstein College of Medicine, United States
  • Michael Grodus, The Rockefeller University, United States
  • Dimitry Lupyan, Schrodinger, United States
  • Robert H. Bortz III, Albert Einstein College of Medicine, United States
  • Estefania Valencia, Albert Einstein College of Medicine, United States
  • Rohit Jangra, Louisiana State University Health Science Center-Shreveport, United States
  • Kartik Chandran, Albert Einstein College of Medicine, United States
  • Simon J. Anthony, University of California-Davis School of Veterinary Medicine, United States


Presentation Overview: Show

Bombali ebolavirus (BOMV) was recently discovered in bats roosting inside houses in Sierra Leone. Although there is currently no evidence of BOMV infection in humans, BOMV glycoprotein (GP) is capable of mediating viral entry into human cells by specifically interacting with the essential filovirus receptor Niemann-Pick C1 (NPC1). Genetic variation at the GP-NPC1 interface serves as a critical determinant of cellular host susceptibility. Given the potential for BOMV transmission to humans, it is imperative to investigate the fundamental properties underlying viral susceptibility of human cells.
Here, we integrate complementary in silico binding affinity tools (FoldX, FlexDDG and Free-Energy Perturbation, FEP), in vitro binding ELISAs, and cell-based infectivity assays to characterize the interaction between BOMV GP and human NPC1. We identified a residue in the BOMV GP interface that decreases the binding affinity of GP to human NPC1. Molecular dynamics analysis suggests that the amino acid variation in GP triggers a conformational change in the interacting NPC1 loop that interrupts energetically favorable contacts. This study uncovers a novel genetic determinant of the cellular host range of BOMV and sets the stage for a novel surveillance strategy to rapidly characterize the zoonotic risk of emergent filoviruses.

P-018: Comparative analysis of structural features in SLiMs from eukaryotes, bacteria, and viruses with importance for host-pathogen interactions
COSI: 3DSIG
  • Heidy Elkhaligy, Florida International University, United States
  • Christian A. Balbin, Florida International University, United States
  • Jessica Siltberg-Liberles, Florida International University, United States


Presentation Overview: Show

Protein-protein interactions drive functions in eukaryotes that can be described by short linear motifs (SLiMs). Conservation of SLiMs helps illuminate functional SLiMs in eukaryotic protein families. However, the simplicity of eukaryotic SLiMs makes them appear by chance due to mutational processes not only in eukaryotes but also in pathogenic bacteria and viruses. Further, functional eukaryotic SLiMs are often found in disordered regions. Although proteomes from pathogenic bacteria and viruses have less disorder than eukaryotic proteomes, their proteins can successfully mimic eukaryotic SLiMs and disrupt host cellular function. Identifying important SLiMs in pathogens is difficult but essential for understanding potential host-pathogen interactions. We performed a comparative analysis of structural features for experimentally verified SLiMs from the Eukaryotic Linear Motif (ELM) database across viruses, bacteria, and eukaryotes. Our results revealed that many viral SLiMs and specific motifs found across viruses and eukaryotes, such as some glycosylation motifs, have less disorder. Analyzing the disorder and coil properties of equivalent SLiMs from pathogens and eukaryotes revealed that some motifs are more structured in pathogens than their eukaryotic counterparts and vice versa. These results support a varying mechanism of interaction between pathogens and their eukaryotic hosts for some of the same motifs.

P-020: Sequence-sensitive elastic network captures dynamical elements necessary for human microRNA maturation
COSI: 3DSIG
  • Olivier Mailhot, Université de Montréal, Canada
  • François Major, Université de Montréal, Canada
  • Rafael Najmanovich, Université de Montréal, Canada


Presentation Overview: Show

MicroRNAs (miRNAs) regulate gene expression and have recognized roles in numerous physiological processes and diseases, including cancer. A single nucleotide polymorphism (SNP) in the miR-125a gene leads to poor breast cancer prognosis by blocking the cleavage of the primary miRNA transcript (pri-miRNA). The SNP does not affect pri-miR-125a’s minimal energy structure, leading to the hypothesis that the lost signal must be dynamical. Leveraging high-throughput data on the maturation efficiency of 29 478 pri-miR-125a mutant sequences, we applied our sequence-sensitive elastic network ENCoM to study the full 3D conformational space of pri-miRNAs and its impact on their maturation efficiency. The model predicts maturation efficiency with high accuracy (predictive R-squared of 0.75) when both the ENCoM dynamical signature and the MC-Fold enthalpy of folding are combined, highlighting the synergy between these respectively entropy- and enthalpy-based methods. Looking at the patterns apparent from the model’s coefficients, we corroborate motifs previously identified as necessary for the cleavage of pri-miRNAs but also challenge established notions such as the necessity for a rigid hairpin structure. Our novel approach is fast enough to predict theoretical maturation efficiencies for millions of miRNA sequences, the extremes of which we are currently testing in the lab.

P-021: FrustraEvo: Assessing Protein Families Divergence In The Light Of Sequence and Energetic Constraints
COSI: 3DSIG
  • Victoria Ruiz-Serra, Barcelona Supercomputing Center, Spain
  • Maria Freiberger, Protein Physiology Lab, Buenos Aires University, Argentina
  • Camila Pontes, Barcelona Supercomputing Center, Spain
  • Miguel Romero, Barcelona Supercomputing Center, Spain
  • Pablo Galaz-Davison, Institute for Biological and Medical Engineering, Pontificia Universidad Catolica de Chile, Chile
  • Cesar Ramirez-Sarmiento, Institute for Biological and Medical Engineering, Pontificia Universidad Catolica de Chile, Chile
  • Rodrigo Gonzalo Parra, Barcelona Supercomputing Center, Spain
  • Alfonso Valencia, Barcelona Supercomputing Center, Spain


Presentation Overview: Show

Protein families evolve by the accumulation of sequence variations that translate into changes in the folding pathways and the structure and dynamics of the native state of their members. These changes are constrained by the features of the folding energy landscape as well as the cellular context where these proteins perform their molecular function.

Natural proteins fold by minimizing the energetics of those interactions that are present in their native states. Although the free energy is globally minimized, not all interactions that are present in the native state can be energetically optimized. These conflicting, frustrated, signals have been linked with different functional aspects such as protein-protein interactions, allosterism and catalytic activity.

Here we present FrustraEvo, a tool that measures local frustration conservation patterns within protein families as a proxy to define residues that are important either for stability or function and to relate them to their sequence variability signatures. We additionally compare homologous protein families to understand how they have diversified their functional patterns from a common ancestral origin. We will showcase how FrustraEvo can shed light on the functional understanding of structurally characterized protein families as well as of poorly characterized ones, thanks to recent advances in structure prediction.

P-022: Normal Mode Analysis Applied to GPCRs
COSI: 3DSIG
  • Gabriel Tiago Galdino, Université de Montréal, Canada
  • Olivier Mailhot, Université de Montréal, Canada
  • Rafael Najmanovich, Université de Montréal, Canada


Presentation Overview: Show

We employ coarse-grained normal mode analysis to calculate dynamical signatures of different ligand/G-protein-coupled receptor (GPCR) complexes. Dynamical signatures show changes in flexibility of different parts of the structure upon ligand binding. As a first experiment, we docked a large set of ligands with known Emax for GTP-gammaS binding to crystal structures of the active mu (MOR) and kappa (KOR) opioid receptors, calculated the dynamical signature for each ligand and obtained predictors using multiple linear regression. We obtained Pearson’s correlations of R=0.46 and R=0.57 in a leave-one-out validation (a scenario where we present a totally new ligand to the system) and Pearson’s correlations of R=0.8 and R=0.7 in an 80:20 validation (a best-case scenario where new molecules are like training set molecules), for MOR and KOR respectively. These results show that, even with a limited training set, we can obtain a good estimate of the Emax of new drug candidates, thereby predicting their role as agonists, antagonists, or partial agonists computationally and potentially as part of high-throughput screening. Moreover, by analyzing the coefficients of these predictors, we see which regions of the receptor have the largest influence on its activation (highlighting helices 5 and 6 and the binding site).
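The leave-one-out validation mentioned above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it uses a single made-up feature and ordinary least squares instead of a full dynamical signature with multiple linear regression, and the toy values in xs and ys are invented for the example rather than taken from the study.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def leave_one_out_predictions(xs, ys):
    """Predict each sample with a model trained on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions

# Toy data (hypothetical): one feature per ligand and a made-up Emax value.
xs = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.7, 2.0]
ys = [12.0, 20.0, 18.0, 35.0, 41.0, 55.0, 52.0, 70.0]

preds = leave_one_out_predictions(xs, ys)
r = pearson(preds, ys)
print(f"leave-one-out Pearson R = {r:.2f}")
```

Each held-out ligand is never seen during fitting, which is why leave-one-out correlations are typically lower than those from a split in which test molecules resemble the training set.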

P-023: BioPepTool: A high throughput screening tool for discovering peptide bioactivity
COSI: 3DSIG
  • Lucas Palmeira, Universidade Estadual do Sudoeste da Bahia (UESB), Brazil
  • Elias da Silva, Universidade Federal de Uberlandia, Brazil
  • Murillo Carneiro, Universidade Federal de Uberlandia, Brazil
  • Robinson Sabino-Silva, Universidade Federal de Uberlandia, Brazil
  • Bruno Andrade, Universidade Estadual do Sudoeste da Bahia (UESB), Brazil


Presentation Overview: Show

The SARS-CoV-2 virus interacts with the host cell by binding its Spike protein to the host's ACE2 receptor through a viral receptor-binding domain (RBD). Active peptides have been used as new antiviral drug options for inhibiting viral cell entry. This study aimed to evaluate in silico the ability of antimicrobial peptides to block the interaction between the SARS-CoV-2 Spike (RBD) and the human ACE2 receptor, as well as to propose new synthetic peptide molecules using machine learning, molecular modeling and docking methods. 302 peptide sequences were modeled and docked against the Spike-RBD region to build a training database. Subsequently, we trained machine learning models using Bayesian Ridge (BR), Support Vector Machine (SVM), and Multilayer Perceptron (MLP) algorithms for predicting Spike-peptide interaction energies. Molecular docking energies ranged from -126,060 to -230,308 kJ/mol. The BR model achieved the best results, with an RMSE of -14.1 using the Amino Acid Linking (AAL) resource extractor. Using a genetic algorithm assisted by a machine learning model, we proposed 10 new peptides with antiviral potential against SARS-CoV-2 and with better energies than those in the training database. This tool and its ML algorithms can be easily applied to other emerging viruses and microorganism targets.

P-024: Population missense variants in human ACE2 strongly affect binding to SARS-CoV-2 Spike: A case study in affinity predictions of interface variants
COSI: 3DSIG
  • Stuart A. MacGowan, University of Dundee, United Kingdom
  • Michael I. Barton, Sir William Dunn School of Pathology, University of Oxford, United Kingdom
  • Mikhail Kutuzov, Sir William Dunn School of Pathology, University of Oxford, United Kingdom
  • Omer Dushek, University of Oxford, United Kingdom
  • P. Anton van der Merwe, Sir William Dunn School of Pathology, University of Oxford, United Kingdom
  • Geoffrey J. Barton, University of Dundee, United Kingdom


Presentation Overview: Show

SARS-CoV-2 infection manifests a range of clinical presentations from mild illness to life-threatening disease. As a mediator of viral entry, ACE2 is an a priori candidate genetic risk factor. The affinity of SARS-CoV-2 Spike for ACE2 is a key parameter influencing host-range and tropism and so we determined the affinities of several reported ACE2 population variants experimentally and predicted the effects of many more. We found ACE2 alleles that strongly inhibited binding to Spike and some with moderately increased affinity. Comparison to recent infectivity studies indicates that the affinity ranges of ACE2 variants can protect cells from infection and so some almost certainly confer resistance to carriers; this is now being tested with clinical data. We will also highlight the strengths and weaknesses of current generation predictors, and present new results on the interplay between ACE2 variants and different SARS-CoV-2 strains.

P-025: A deep graph learning approach to rank protein-protein interaction structures
COSI: 3DSIG
  • Mahdi Rahbar, Saint Louis University, United States
  • Sheng Wang, Anhui University, China
  • Renzhi Cao, Pacific Lutheran University, United States
  • Jie Hou, Saint Louis University, United States


Presentation Overview: Show

High-precision protein structure prediction refers to the accurate determination of the relative positions of amino acids in three-dimensional space. Recent advances in single-chain protein structure prediction hold great promise for computational algorithms that model protein-protein interactions. An effective protein model ranking method is essential for structure prediction approaches to select the most accurate protein structure. This study applies a deep graph neural network to score a protein complex model by utilizing residue-level structural information in 3D space and sequence-level co-evolutionary constraints. Several protein structural features are investigated in deep graph learning for highly accurate protein quality estimation, including inter-residue distance, per-residue energies in the structure, physical and chemical properties of amino acids, and co-evolutionary constraints from sequence alignment. The method models a protein structure as a connected graph, in which each node represents a residue and each edge represents the closeness between a pair of residues in the complex structure. The algorithm provides residue-level quality estimation in terms of the local distance difference test (lDDT) score. We trained the quality estimation algorithm on the protein-protein docking benchmark version 4.0 (BM4) and improved the performance in ranking protein complex decoys compared with top-ranked protein scoring approaches.

P-026: IgStRAnD: A unified residue numbering scheme for the ubiquitous Immunoglobulin fold based on a conserved network of anchor residues
COSI: 3DSIG
  • Caesar Tawfeeq, California State University Northridge, United States
  • Tom Madej, NCBI, NIH, United States
  • Jiyao Wang, NCBI, NIH, United States
  • James Song, NCBI, NIH, United States
  • Philippe Youkharibache, NCI, NIH, United States
  • Ravinder Abrol, California State University Northridge, United States


Presentation Overview: Show

The immunoglobulin (Ig) fold is a remarkable protein fold that is found in all kingdoms of life and makes up the structure of antibodies as well as many signaling receptors and cell adhesion molecules. Ig folds/domains mediate a variety of cellular functions, including immune response, tissue formation, cell migration, and synapse formation, to name a few. They are implicated in a range of disease mechanisms and also carry considerable therapeutic potential. Despite their significance, much remains unknown regarding their mechanisms of action in different physiological contexts. The diversity of Ig folds (C1-set, C2-set, I-set, FNIII, V-set) and the lack of a unified Ig residue numbering scheme hinder the mechanistic deconstruction of this fold, which could otherwise be leveraged for an evolutionary, structural, and biochemical understanding of their different physiological functions as well as for the rational design of therapeutic nanobodies. In this work, we propose the IgStRAnD (Ig Strand Residue Anchors Dependent) numbering scheme, which unifies the different types of Ig folds by revealing conserved residue contacts critical for the minimal Ig fold, the presence of (pseudo)symmetry at the tertiary/quaternary levels in all Ig folds, and a seamless structural as well as functional connection of important residues across different Ig folds.

P-027: A molecular docking and ADMETox study of promising aromatic compounds for the development of biosustainable repellents for Aedes aegypti L.
COSI: 3DSIG
  • Jailan Sousa, Universidade Estadual do Sudoeste da Bahia, Brazil
  • Bruno Andrade, Universidade Estadual do Sudoeste da Bahia, Brazil
  • Wagner Soares, Universidade Estadual do Sudoeste da Bahia, Brazil


Presentation Overview: Show

The increasing resistance of vector mosquitoes, such as Aedes aegypti L., to commercial repellents, together with the human and environmental toxicity of those repellents, has driven the search for and development of more selective and sustainable compounds. Aromatic compounds of natural origin serve to attract insects and pollinators and are characterized by low toxicity and greater safety when compared to substances of synthetic origin. A chemical library of 3361 aromatic molecules of natural origin, obtained from aroma databases (Odor DB, EssOil DB, SuperScent and Odor data), was used. After molecular docking using the DockThor program with the odorant-binding protein (OBP) from Aedes aegypti L. (PDB code: 3k1e), 188 molecules were selected with affinity energies more favorable than -8.8 kcal/mol when compared to DEET (N,N-diethyl-meta-toluamide), a commercial synthetic repellent compound. Then, 7 of these molecules, with energies more favorable than -9.0 kcal/mol, were submitted to ADMETox analysis (DataWarrior, pkCSM and Toxtree), in which physicochemical, pharmacokinetic, toxicological and environmental biodegradability characteristics were described. According to the prediction of human risk, two molecules (LBQC1281 and LBQC1154) were selected with potential for evaluation in in vivo assays in adult Aedes aegypti L. mosquitoes, with low impact on human and environmental health.

P-028: Next Generation Protein Data Bank: Enhancing Findability, Accessibility, Interoperability, and Reusability of PDB Data with Enriched Annotations
COSI: 3DSIG
  • Dennis Piehl, RCSB PDB, Rutgers University, United States
  • Zukang Feng, RCSB PDB, Rutgers University, United States
  • Stephen Burley, RCSB PDB, Rutgers University, United States


Presentation Overview: Show

The Protein Data Bank (PDB) core archive currently distributes >190,000 experimentally-determined 3D structures of biological macromolecules. This archive is managed by the Worldwide PDB (wwPDB, wwpdb.org): RCSB PDB, PDBe, and PDBj provide public access to the contents of the archive, while EMDB and BMRB house experimental EM and NMR data, respectively. As a full understanding of any macromolecular system requires knowledge of its biological and evolutionary context, each partner website independently furnishes this structural data with a set of entry-specific annotations from external resources such as UniProt, SCOPe, CATH, SIFTS, and others. While this mode of delivery offers contextual value to PDB users viewing individual structures directly online, it is not ideally suited for research involving a group of related structures or for offline work. Towards this, wwPDB is developing a Next Generation PDB repository that collates structural data with contextual annotations into downloadable files. This new architecture involves the development and implementation of service-specific APIs using a FastAPI framework and containerized packaging (as carried out by RCSB PDB), which will be presented here.

RCSB PDB and PDBe are jointly funded for this project (U.S. NSF and U.K. BBSRC). RCSB PDB core operations are funded by NSF, NIH, and DOE.

P-029: Ensemble Molecular Dynamics Simulations and Graph Neural Networks Elucidate Isoform-Specific Genetic Susceptibility and Resilience of Apolipoprotein E (ApoE) in Alzheimer's disease
COSI: 3DSIG
  • Feixiong Cheng, Cleveland Clinic, United States
  • William Martin, The Cleveland Clinic, United States
  • Ruth Nussinov, Center for Cancer Research, National Cancer Institute, United States


Presentation Overview: Show

Apolipoprotein E (ApoE) is the primary cholesterol and lipid transporting apolipoprotein in Alzheimer’s disease (AD). There are three main isoforms differing by single amino acid changes: ε3 is “neutral”, ε4 is “risk” (Cys112Arg), and ε2 is “protective” (Arg158Cys). Rare forms (Christchurch, Jacksonville) have also been proposed to confer “resilience” to AD. It has been proposed that a significant conformational transformation is required for lipidation; to date, only a single mutated NMR structure of full-length ε3 in a closed conformation exists, leaving unanswered questions regarding conformational differences among different ApoE isoforms. Here, we have utilized multiple replicates of long-timescale simulations (six replicates of 15 µs per isoform, 540 µs in total) to generate 200 starting conformations per isoform using the AiMOS supercomputer. These were then simulated for an additional microsecond with three replicates each (600 µs per isoform). In total, 4.14 milliseconds of simulation across 6 isoforms were generated. Using a graph-based implementation of VAMPNets, we have explored the conformational landscape of ApoE, using graph attention networks to probe intramolecular interactions for the different metastable states of each isoform, as well as of a combination of the 6 isoforms. These insights will shed light on the structural differences between risk, neutral, and protective alleles of ApoE.
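The simulation totals quoted above follow from simple arithmetic on the stated replicate counts and lengths. The short check below only uses numbers given in the abstract; the variable names are chosen for this example.

```python
ISOFORMS = 6

# Stage 1: six replicates of 15 µs per isoform.
long_per_isoform_us = 6 * 15                        # 90 µs per isoform
long_total_us = long_per_isoform_us * ISOFORMS      # 540 µs in total

# Stage 2: 200 starting conformations, three replicates of 1 µs each.
short_per_isoform_us = 200 * 3 * 1                  # 600 µs per isoform
short_total_us = short_per_isoform_us * ISOFORMS    # 3600 µs in total

total_us = long_total_us + short_total_us
total_ms = total_us / 1000

print(f"stage 1: {long_total_us} µs, stage 2: {short_total_us} µs")
print(f"overall: {total_us} µs = {total_ms} ms")
```

The second stage dominates the total: the many short trajectories contribute 3600 µs, against 540 µs from the long replicates.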

P-030: The NRGTEN and dynasigML Python packages for fast and user-friendly protein engineering
COSI: 3DSIG
  • Rafael Najmanovich, University of Montreal, Canada
  • Olivier Mailhot, University of Montreal, Canada


Presentation Overview: Show

We have developed the Elastic Network Contact Model (ENCoM), a coarse-grained normal mode analysis method unique in its sensitivity to the full chemical sequence of the studied molecule. This enables studying the impact of mutations on properties like vibrational entropy, which we demonstrated correlates with thermal stability, and the full entropic signature (entropy at each residue). When used in combination with machine learning models like LASSO regression or simple neural networks, the entropic signature captures important properties for the function of diverse biomolecules and exhibits a good fit to experimental data. ENCoM is now part of the NRGTEN Python package, which can be installed with a single command and run on a PDB file with as few as three lines of Python. The new dynasigML Python package contains all the functions needed to learn relationships between dynamics and function for any macromolecule of interest, provided that mutational data exist. These packages make use of NumPy and are very fast, taking less than 5 seconds of CPU time to predict the effect of a mutation on a 250 amino acid protein. This enables the high-throughput prediction of engineered mutants to test in the lab.

P-031: Structural characterization of M protein: Insights into a higher-order oligomerization in SARS-CoV-2 envelope and evolutionary relationship with ORF3a
COSI: 3DSIG
  • Dmitry Korkin, Worcester Polytechnic Institute, United States
  • Weria Pezeshkian, University of Groningen, Netherlands
  • Fabian Grünewald, University of Groningen, Netherlands
  • Oleksandr Narykov, Worcester Polytechnic Institute, United States
  • Senbao Lu, Worcester Polytechnic Institute, United States
  • Valeria Arkhipova, Science in Motion, Russia
  • Alexey Solodovnikov, Science in Motion, Russia
  • Tsjerk A Wassenaar, University of Groningen; Hanze University of Applied Sciences, Netherlands
  • Siewert J. Marrink, University of Groningen, Netherlands


Presentation Overview:

Despite tremendous efforts by scientists during the COVID-19 pandemic, the exact structure of the SARS-CoV-2 virus remains elusive. Membrane (M) protein is one of the four structural proteins in SARS-CoV-2 and is the most abundant protein in the virus, with an estimated ~1,100 M dimers in each viral particle. Yet, the structure of M protein has not been solved. Here, we aim to develop an integrative approach to build an accurate model of M protein in its native, dimeric form and perform a structure-driven comparative analysis to discover functional and evolutionary relationships with ORF3a, another SARS-CoV-2 protein, which functions as an ion channel. We integrated information on de novo models of M monomers, symmetric docking, experimental geometry constraints, and the structure of ORF3a for domain refinement to build our M-dimer model. For comparative analysis, we built a hybrid alignment, based on the structural alignment of the two proteins and sequence alignments of their homologs in Betacoronavirus. Although ORF3a and M-dimer share poor sequence similarity, they are surprisingly similar in their structures. We found that a substantial number of functionally important residues are conserved between ORF3a and M and within their evolutionary families. Our findings demonstrate that M may be an attractive novel target for antivirals.

P-032: MTL4MHC2: MHC class II binding prediction by using multi-task learning
COSI: 3DSIG
  • Kazuhiro Ikkyu, University of Tsukuba, Riken, Japan
  • Itoshi Nikaido, University of Tsukuba, Riken, Tokyo Medical and Dental University, Japan


Presentation Overview:

Neoepitopes (neoantigens) are cancer-specific antigens and are important candidates for therapeutic cancer vaccines. Epitopes bind the major histocompatibility complex (MHC), which is an immune receptor. Tumor neoepitopes induce an immune response to eliminate cancer cells. This immune activation depends on the affinity between antigen peptide and MHC ligand. Epitope-MHC binding assays are technically difficult, time-consuming, and expensive. Therefore, computational tools that predict the affinity between antigen peptide and MHC ligand have been developed. However, the volume of available data is insufficient for predicting epitope-MHC binding, and the performance of these predictions remains inadequate. Here, we propose a novel deep learning model that can predict epitope-MHC binding from a small amount of training data.
MTL4MHC2 has two multi-task Bi-LSTM models: an antigen peptide learning model and an MHC peptide learning model. Each multi-task model shares learning parameters between MHC class I and II. MTL4MHC2 achieves an AUC-ROC score of 82.2%, outperforming state-of-the-art models.
We demonstrated the effectiveness of multi-task learning for improving prediction performance from small amounts of data. MTL4MHC2 can be applied to developing novel cancer therapeutics such as cancer vaccines.
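
For readers unfamiliar with the metric, the AUC-ROC score quoted above can be read as the probability that a randomly chosen binder is scored higher than a randomly chosen non-binder. The sketch below computes it that way in plain Python; the function name, labels and scores are hypothetical and are not taken from MTL4MHC2.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical binder (1) / non-binder (0) labels with predicted scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = auc_roc(labels, scores)
print(f"AUC-ROC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based formulation would be preferable for large benchmark sets.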

P-033: NOVEL MORSE POTENTIAL TO STUDY THE Cu/C/H INTERACTIONS IN COPPER CAPPED BY GRAPHENE NANOPARTICLES FOR BIOLOGICAL APPLICATIONS
COSI: 3DSIG
  • Gabriel J Olguín-Orellana, Center for Bioinformatics, Simulations and Modeling, University of Talca, Chile
  • Germán Soldano, Faculty of Chemical Sciences, National University of Córdoba, Argentina
  • M Belén Camarada, Faculty of Chemistry and Pharmacy, Pontifical Catholic University of Chile, Chile
  • Jans Alzate-Morales, Center for Bioinformatics, Simulations and Modeling, University of Talca, Chile
  • Marcelo M Mariscal, Faculty of Chemical Sciences, National University of Córdoba, Argentina


Presentation Overview:

Copper nanoparticles (Cu NPs) have attracted attention due to their many biological applications for microbiology, genetic engineering, pharmacology, agriculture, and other fields. However, their high reactivity favors oxidation, corrosion and aggregation, leading them to lose their properties of interest.

Copper capped by graphene (Cu@G) NPs have also attracted interest from the medical and industrial sectors because graphene can serve as a shield to protect the Cu NPs from undesired phenomena. Additionally, it has biocidal activity and can be functionalized for different purposes, broadening the applications of Cu NPs.

In this work, new Morse potentials to reproduce the behavior of Cu@G NPs through molecular dynamics are reported. The interaction parameters for Cu with C and H were obtained from DFT/PWscf simulations; subsequent single-point energy calculations with the new potentials showed good agreement between the quantum and classical techniques. The potentials were then used to evaluate the structural properties and thermal conductivity (k) of Cu@G NPs from 1.5 to 6.1 nm at 100 to 800 K, also varying the size, number of layers and orientation (planar or perpendicular) of the graphene sheets.
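
As a reminder of what is being parameterized, a Morse pair potential in its common three-parameter form can be sketched as follows. The abstract does not state the exact functional form or the fitted values, so the form below (shifted so that the energy tends to zero at large separation) and all numbers are assumptions for illustration only.

```python
import math


def morse_energy(r, d_e, a, r_e):
    """Morse pair energy E(r) = D_e * (x**2 - 2*x), with x = exp(-a * (r - r_e)).

    d_e: well depth, a: width parameter, r_e: equilibrium distance.
    E(r_e) = -d_e and E -> 0 as r -> infinity.
    """
    x = math.exp(-a * (r - r_e))
    return d_e * (x * x - 2.0 * x)


def morse_force(r, d_e, a, r_e):
    """Radial force -dE/dr; positive values are repulsive."""
    x = math.exp(-a * (r - r_e))
    return 2.0 * a * d_e * (x * x - x)


# Placeholder parameters (eV, 1/angstrom, angstrom); NOT the fitted Cu-C or Cu-H values.
D_E, A, R_E = 0.1, 1.5, 2.2

for r in (1.8, 2.2, 3.0, 5.0):
    print(f"r = {r:.1f}  E = {morse_energy(r, D_E, A, R_E):+.4f}  "
          f"F = {morse_force(r, D_E, A, R_E):+.4f}")
```

In a molecular dynamics run, the energy and force would be evaluated for each Cu-C and Cu-H pair within a cutoff, with one parameter set per pair type.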

The results indicate that Cu@G NPs are stable in an argon environment. Their k is higher than that of the Cu NPs, by 200% to 400% at 300 K with one graphene layer and by up to 500% with a bilayer or trilayer. Finally, the size, homogeneity and orientation of the graphene sheets do not seem to affect the k of the Cu@G NPs.

P-034: Topsy-Turvy: integrating a global view into sequence-based PPI prediction
COSI: 3DSIG
  • Rohit Singh, Massachusetts Institute of Technology, United States
  • Kapil Devkota, Tufts University, United States
  • Samuel Sledzieski, Massachusetts Institute of Technology, United States
  • Bonnie Berger, Massachusetts Institute of Technology, United States
  • Lenore Cowen, Tufts University, United States


Presentation Overview:

Computational methods to predict protein-protein interaction (PPI) typically segregate into sequence-based "bottom-up" methods that infer properties from the characteristics of the individual protein sequences, or global "top-down" methods that infer properties from the pattern of already known PPIs in the species of interest. However, a way to incorporate top-down insights into sequence-based bottom-up PPI prediction methods has been elusive. We thus introduce Topsy-Turvy, a method that newly synthesizes both views in a sequence-based, multi-scale, deep-learning model for PPI prediction. While Topsy-Turvy makes predictions using only sequence data, during the training phase it takes a transfer-learning approach by incorporating patterns from both global and molecular-level views of protein interaction. In a cross-species context, we show it achieves state-of-the-art performance, offering the ability to perform genome-scale, interpretable PPI prediction for non-model organisms with no existing experimental PPI data. In species with available experimental PPI data, we further present a Topsy-Turvy hybrid (TT-Hybrid) model which integrates Topsy-Turvy with a purely network-based model for link prediction that provides information about species-specific network rewiring. TT-Hybrid makes accurate predictions for both well- and sparsely-characterized proteins, outperforming both its constituent components as well as other state-of-the-art PPI prediction methods. Furthermore, running Topsy-Turvy and TT-Hybrid screens is feasible for whole genomes, and thus these methods scale to settings where other methods (e.g., AlphaFold-Multimer) might be infeasible. The generalizability, accuracy and genome-level scalability of Topsy-Turvy and TT-Hybrid unlock a more comprehensive map of protein interaction and organization in both model and non-model organisms.

Software availability: https://topsyturvy.csail.mit.edu

P-035: GASS platform: identifying active sites and binding sites on protein structures using parallel genetic algorithms
COSI: 3DSIG
  • Vinícius Paiva, Universidade Federal de Viçosa - UFV, Brazil
  • Murillo Mendonça, Universidade Federal de Itajubá - UNIFEI, Brazil
  • Sabrina Silveira, Universidade Federal de Viçosa - UFV, Brazil
  • David Ascher, University of Melbourne, Australia
  • Douglas Pires, University of Melbourne, Australia
  • Sandro Izidoro, Universidade Federal de Itajubá - UNIFEI, Brazil


Presentation Overview:

Catalytic, binding and metal-binding sites are important and conserved regions of proteins. Their identification can provide important information and insights into protein function. Several computational methods have been developed to identify binding sites based on both sequence and structural information. These have, however, presented limited performance, mostly relying on structural similarity, restricting their application to small binding sites, and not being capable of handling conservative mutations or identifying inter-domain sites.

Here we present the GASS platform, a family of methods for searching similar sites in proteins based on parallel genetic algorithms. GASS was previously successfully used to search for similar catalytic and binding sites, based on templates from the Mechanism Catalytic Site Atlas (M-CSA), correctly identifying more than 90% of the catalogued catalytic sites, ranking fourth among the 18 methods in the CASP 10 competition. GASS was also compared with 8 other state-of-the-art methods for detecting metal-binding sites, outperforming similar methods and achieving an MCC of up to 0.57 and detecting up to 96% of the metal-binding sites correctly.
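
The Matthews correlation coefficient (MCC) quoted above summarizes a binary confusion matrix in a single value between -1 and 1. A minimal sketch of the standard formula is given below; the function name and the residue counts are hypothetical and do not come from the GASS benchmarks.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts of residues predicted as site / non-site.
example = mcc(tp=40, fp=10, tn=930, fn=20)
print(f"MCC = {example:.3f}")
```

Because true negatives (non-site residues) vastly outnumber positives in this kind of task, MCC is more informative than plain accuracy, which would be high even for a predictor that finds no sites.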

The GASS platform (https://gassmetal.unifei.edu.br, http://gass.unifei.edu.br/) provides accurate and easy-to-use methods that can be adapted to searching for binding sites in proteins.