- Thomas Lengauer, ISCB President, Germany
Presentation Overview:
Introduction of the COVID-19 Track and ISCB COVID-19 Repository
- Richard Neher
Presentation Overview:
The number of available SARS-CoV-2 genomes has risen rapidly from a handful in January to tens of thousands today. Since SARS-CoV-2 accumulates about 2 changes in its 30kb genome per month, these genomes allow us to retrace how the virus is spreading across the globe and how outbreaks in different parts of the world are connected. Compared to previous viral outbreaks, this genomic epidemiology is truly happening in real-time with delays between sample collection and analysis often less than two weeks. I will discuss the potential, pitfalls, and challenges in performing and interpreting such real-time analysis.
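The rate quoted above (about 2 changes per month in a 30 kb genome) can be turned into a per-site rate with simple arithmetic. The sketch below assumes a constant rate over time, which is a simplification introduced here for illustration; the function names are hypothetical.

```python
# Back-of-the-envelope arithmetic from the two figures quoted in the abstract.
CHANGES_PER_MONTH = 2      # from the abstract
GENOME_LENGTH = 30_000     # 30 kb, from the abstract


def substitutions_per_site_per_year(changes_per_month, genome_length):
    """Convert changes per genome per month into changes per site per year."""
    return changes_per_month * 12 / genome_length


def expected_differences(months_since_common_ancestor,
                         changes_per_month=CHANGES_PER_MONTH):
    """Expected differences between two genomes that share an ancestor.

    Assumption: both lineages accumulate changes independently at the same
    constant rate, hence the factor of 2.
    """
    return 2 * changes_per_month * months_since_common_ancestor


rate = substitutions_per_site_per_year(CHANGES_PER_MONTH, GENOME_LENGTH)
print(f"{rate:.1e} substitutions per site per year")  # 8.0e-04
```

Under these assumptions, two genomes whose lineages split three months earlier would be expected to differ at roughly 12 positions, which is why even closely timed samples can often be placed on a transmission tree.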
- Snehalika Lall, Indian Statistical Institute, India
- Sumanta Ray, Centrum Wiskunde & Informatica, The Netherlands
- Sanghamitra Bandyopadhyay, Indian Statistical Institute, India
Presentation Overview:
Motivation: The recent outbreak of the novel coronavirus (SARS-CoV-2) has infected millions of people throughout the world. Unfortunately, no effective drugs are available so far, owing to the lack of a proper set of SARS-CoV-2-interacting human proteins, which limits the set of possible drug targets.
Method: To this end, we propose a novel methodology based on a graph-neighborhood sampling strategy to identify high-confidence host proteins interacting with SARS-CoV-2 proteins. We compile a network consisting of SARS-CoV-2 proteins, their experimentally verified host interactors (CoV-host) and human proteins. The SARS-CoV-2–host interactions are taken from two recent experimental studies consisting of 332 and 261 high-confidence interactions, respectively. CoV-host proteins are mapped onto the human PPI interactome to obtain interaction information. The Node2Vec sampling strategy is used to learn low-dimensional embeddings of the nodes, which are then used to construct a neighbourhood graph. Louvain clustering is applied to this graph to cluster the nodes into groups.
Results: The clusters contain CoV-host proteins and the host proteins they interact with. In each cluster, we take the nodes most similar to the CoV-host proteins, based on the results obtained from Node2Vec. These may be treated as potential targets of SARS-CoV-2. We predict 148 high-confidence interactions involving 137 host proteins. Some predicted host proteins, such as AGT, act as indirect interactors with Spike through the ACE2 receptor.
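The neighbourhood-sampling idea can be illustrated with a deliberately simplified sketch. It is not the authors' pipeline: it uses uniform random walks (the special case of Node2Vec sampling without return or in-out biases), scores similarity by raw co-occurrence within a window instead of learning embeddings, and omits the Louvain clustering step. The toy edges only chain the three proteins named in the abstract plus hypothetical placeholders `P1`, `P2`, `P3`; they are not real interaction data.

```python
import random
from collections import Counter, defaultdict


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def random_walks(graph, walks_per_node=50, walk_length=6, seed=0):
    """Uniform random walks started from every node (simplified sampling)."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(sorted(graph[walk[-1]])))
            walks.append(walk)
    return walks


def cooccurrence(walks, window=2):
    """Count how often two nodes appear within `window` steps of each other."""
    counts = defaultdict(Counter)
    for walk in walks:
        for i, node in enumerate(walk):
            for other in walk[max(0, i - window): i + window + 1]:
                if other != node:
                    counts[node][other] += 1
    return counts


def rank_candidates(counts, known_hosts, exclude):
    """Rank nodes by co-occurrence with known CoV-host proteins."""
    scores = Counter()
    for host in known_hosts:
        for node, count in counts[host].items():
            if node not in exclude:
                scores[node] += count
    return scores.most_common()


# Toy network: illustrative only, not experimental data.
edges = [("Spike", "ACE2"), ("ACE2", "AGT"), ("AGT", "P1"), ("P2", "P3")]
graph = build_graph(edges)
counts = cooccurrence(random_walks(graph))
ranked = rank_candidates(counts, known_hosts={"ACE2"}, exclude={"Spike", "ACE2"})
print(ranked)
```

In this toy network, proteins in the disconnected component (`P2`, `P3`) can never co-occur with the known host protein, so they never appear among the candidates, while AGT scores at least as high as the more distant `P1`.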
- Matthew Raybould, University of Oxford, United Kingdom
- Aleksandr Kovaltsuk, University of Oxford, United Kingdom
- Claire Marks, University of Oxford, United Kingdom
- Charlotte Deane, University of Oxford, United Kingdom
Presentation Overview:
Research is ongoing around the world to create vaccines and therapies to minimise rates of COVID-19 disease spread and mortality. Crucial to these efforts are molecular characterisations of neutralising antibodies to its associated virus, SARS-CoV-2. Such antibodies would be valuable for measuring vaccine efficacy, diagnosing exposure, and developing effective biotherapeutics. Here, we describe our new database, CoV-AbDab, which already contains data on over 380 published/patented antibodies and nanobodies known to bind to at least one betacoronavirus. This database is the first consolidation of antibodies known to bind SARS-CoV-2 and other betacoronaviruses such as SARS-CoV-1 and MERS-CoV. We supply relevant metadata such as evidence of cross-neutralisation, antibody/nanobody origin, full variable domain sequence (where available) and germline assignments, epitope region, links to relevant PDB entries, homology models, and source literature. Our preliminary analysis exemplifies a spectrum of potential applications for the database, including identifying germline biases and assessing the diagnostic value of SARS-CoV binding CDRH3 sequences. Community submissions are invited to ensure CoV-AbDab is efficiently updated with the growing body of data analysing SARS-CoV-2. CoV-AbDab is freely available and downloadable on our website at http://opig.stats.ox.ac.uk/webapps/coronavirus
- Edouard de Castro, SIB Swiss Institute of Bioinformatics, Switzerland
- Christian Sigrist, SIB Swiss Institute of Bioinformatics, Switzerland
- Arnaud Kerhornou, SIB Swiss Institute of Bioinformatics, Switzerland
- Nicole Redaschi, SIB Swiss Institute of Bioinformatics, Switzerland
- Alan Bridge, SIB Swiss Institute of Bioinformatics, Switzerland
- Philippe Le Mercier, SIB Swiss Institute of Bioinformatics, Switzerland
Presentation Overview:
A worldwide pneumonia outbreak started in 2019 in Wuhan, China, caused by a new coronavirus (SARS-CoV-2) that emerged from an animal reservoir. The ViralZone coronavirus resource (https://viralzone.expasy.org/9056) provides annotated information on the virion, gene expression, interactome, virus replication cycle and its interaction with antiviral drugs, with links to the annotated proteome in UniProtKB. The virus replication cycle details the stage in which viral proteins are produced and their main functions in the virus life cycle. The curated interactome is built mostly by similarity with SARS-CoV (2003), and highlights how the virus binds target cells, escapes interferon signaling, silences host gene expression, and escapes the antiviral activity of bone marrow stromal antigen 2 (BST2, also known as tetherin). The viral spike protein is mostly unique to SARS-CoV-2; this uniqueness explains why diagnostic methods used for SARS-CoV (2003) did not detect SARS-CoV-2. Using in silico predictive tools such as PROSITE and SWISS-MODEL, we have identified potential features of the spike protein that may affect viral transmission and pathology, including a potential integrin binding site.
- Michal Linial, The Hebrew University, Israel
- Esther S Brielle, The Hebrew University of Jerusalem, Israel
- Dina Schneidman-Duhovny, The Hebrew University of Jerusalem, Israel
Presentation Overview:
COVID-19 has plagued 210 countries with over 2 million cases and has resulted in 150,000 deaths within 3 months. To gain insight into the high infection rate of the SARS-CoV-2 virus, we compare the interaction between the human ACE2 receptor and the SARS-CoV-2 spike protein with that of other pathogenic coronaviruses using molecular dynamics simulations. SARS-CoV, SARS-CoV-2, and HCoV-NL63 recognize ACE2 as the natural receptor but present a distinct binding interface to ACE2 and a different network of residue-residue contacts. SARS-CoV and SARS-CoV-2 have comparable binding affinities achieved by balancing energetics and dynamics. The SARS-CoV-2–ACE2 complex contains a higher number of contacts, a larger interface area, and decreased interface residue fluctuations relative to SARS-CoV. These findings expose an exceptional evolutionary exploration exerted by coronaviruses toward host recognition. We postulate that the versatility of cell receptor binding strategies has immediate implications for therapeutic strategies.
- Denisa Bojkova, Institute for Medical Virology, University Hospital, Goethe University Frankfurt am Main, Germany
- Jake McGreig, University of Kent, United Kingdom
- Katie-May McLaughlin, University of Kent, United Kingdom
- Stuart Masterson, University of Kent, United Kingdom
- Marek Widera, Institute for Medical Virology, University Hospital, Goethe University Frankfurt am Main, Germany
- Verena Krähling, Institute of Virology, Biomedical Research Center (BMFZ), Philipps University Marburg, Germany
- Sandra Ciesek, German Center for Infection Research, DZIF, Braunschweig, Germany
- Mark Wass, University of Kent, United Kingdom
- Martin Michaelis, University of Kent, United Kingdom
- Jindrich Cinatl Jr, Institute for Medical Virology, University Hospital, Goethe University Frankfurt am Main, Germany
Presentation Overview:
COVID-19 is a global pandemic with over 4.2M infections and 287K deaths as of May 2020, yet the roles each viral protein plays in infection have not been fully elucidated. SARS-CoV-2, the virus causing COVID-19, is closely related to SARS-CoV, the virus that caused the SARS outbreak in 2002-2003; however, the characteristics of the two viruses are very different. SARS-CoV-2 is much more easily transmitted and also has a lower death rate. Our study identified a large number of amino acid positions that are differentially conserved between SARS-CoV-2 and SARS-CoV, which can be used to explain these differences in clinical behaviour between the two viruses. We identified 243 differentially conserved positions (DCPs) in the spike protein, the protein responsible for viral entry, making up 19.4% of the total length of this viral protein. In addition, we found that p6 and nsp3, both interferon antagonists, were enriched in DCPs (28.6% and 21.3%, respectively), and we introduce our pipeline for generating these DCP profiles for proteins. Our in silico study is supported by a cell culture comparison of the two viruses that demonstrates differences between them, including in sensitivity to drugs, and we propose that the DCPs explain these differences.
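One way to make the notion of a differentially conserved position concrete is sketched below. The definition used here is an assumption made for illustration, not the authors' pipeline: a column counts as a DCP when each group of aligned sequences is dominated by a single residue (at least `min_fraction` of non-gap residues) and the two dominant residues differ. The sequences are made-up toy strings.

```python
from collections import Counter


def dominant_residue(column, min_fraction=0.9):
    """Return the residue dominating a column, or None if not conserved."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return None
    residue, count = Counter(residues).most_common(1)[0]
    return residue if count / len(residues) >= min_fraction else None


def differentially_conserved_positions(group_a, group_b, min_fraction=0.9):
    """1-based alignment columns conserved within each group but differing
    between the groups (assumed definition, see lead-in)."""
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all sequences must come from the same alignment")
    dcps = []
    for i in range(length):
        a = dominant_residue([seq[i] for seq in group_a], min_fraction)
        b = dominant_residue([seq[i] for seq in group_b], min_fraction)
        if a is not None and b is not None and a != b:
            dcps.append(i + 1)
    return dcps


# Toy alignment (hypothetical sequences, not real viral proteins).
group_sars_cov_2 = ["MKTLV", "MKTLV", "MKTLV"]
group_sars_cov = ["MRTLI", "MRTLI", "MRSLI"]

dcps = differentially_conserved_positions(group_sars_cov_2, group_sars_cov)
fraction = len(dcps) / len(group_sars_cov_2[0])
print(dcps, f"{fraction:.1%}")  # [2, 5] 40.0%
```

Column 3 is not reported because the second group is not conserved there (T, T, S), which is the distinction between a DCP and a position that merely varies.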
- Palash Sashittal, University of Illinois at Urbana-Champaign, United States
- Yunan Luo, University of Illinois at Urbana-Champaign, United States
- Jian Peng, University of Illinois at Urbana-Champaign, United States
- Mohammed El-Kebir, University of Illinois at Urbana-Champaign, United States
Presentation Overview:
In light of the current COVID-19 pandemic, there is an urgent need to accurately infer the evolutionary and transmission history of the virus to inform real-time outbreak management, public health policies and mitigation strategies. Current phylogenetic and phylodynamic approaches typically use consensus sequences, essentially assuming the presence of a single viral strain per host. Here, we analyze 621 bulk RNA sequencing samples and 7,540 consensus sequences from COVID-19 patients, and identify multiple strains of the virus, SARS-CoV-2, in four major clades that are prevalent within and across hosts. In particular, we find evidence for (i) within-host diversity across phylogenetic clades, (ii) putative cases of recombination, multi-strain and/or superinfections as well as (iii) distinct strain profiles across geographical locations and time. Our findings and algorithms will facilitate more detailed evolutionary analyses and contact tracing that specifically account for within-host viral diversity in the ongoing COVID-19 pandemic as well as future pandemics.
- Qingyu Chen, NIH, United States
- Alexis Allot, NIH, United States
- Zhiyong Lu, NCBI, United States
Presentation Overview:
LitCovid, a daily curated literature hub with over 12,000 relevant articles to date, was recently developed to help scientists and healthcare professionals stay informed about the latest published research on COVID-19. Each day, LitCovid is widely accessed by users from around the world for their information needs related to the current outbreak. Initially, all data collection and literature curation were done manually with little machine assistance. However, as the outbreak evolved, it became increasingly challenging to keep up with the number of new articles, which grew rapidly from a handful to several hundred each day. In response, we have developed and evaluated artificial intelligence (AI) and machine learning (ML) algorithms to provide support and improve efficiency. In this talk, we will describe how cutting-edge AI/ML research is used in support of its development, from accurately classifying relevant articles to extracting entity names from free text. We will also describe remaining challenges for future research and how such an open resource is used in computational biology and bioinformatics research, such as creating COVID-19 knowledge graphs and assisting drug repurposing.
- Anton Nekrutenko, The Pennsylvania State University, United States
- Sergei L Kosakovsky Pond, Temple University, United States
- Galaxy and HyPhy Development Teams, The Pennsylvania State University, United States
- Bjoern Gruening, Uni-Freiburg, Germany
Presentation Overview:
The current SARS-CoV-2 pandemic and its consequences for societies worldwide make it more evident than ever that science, the data it uses, the analyses conducted, and the results obtained need to be publicly accessible, transparent, and reproducible, both for the global research community and, more generally, for as large an audience as possible.
Early on during the unfolding of the pandemic, Galaxy community members and collaborating researchers initiated work on https://covid19.galaxyproject.org, a cross-discipline effort to develop best-practice workflows for the transparent analysis of COVID-19-related data from public sources.
Since mid-February 2020, members of teams operating public Galaxy instances across the world and the HyPhy development team have contributed to this project by developing and refining tools, by assembling, testing and reviewing workflows, by tracking public data sources, by documenting the project's efforts, and by ensuring proper support for the various workflows from Genomics to Cheminformatics.
We will talk about outcomes of the project, including genomic analysis and variant detection in SARS-CoV-2, natural selection analysis, a large-scale fragment screen and drug design effort, automated workflows that run on new data as soon as they become available, and lessons learned during this endeavor.
- Nina Fefferman, University of Tennessee, United States
Presentation Overview:
Our discussion will focus on the challenge of building mathematical models to support good policy recommendations for public health measures during the last 4 months, as the COVID-19 pandemic has emerged. We will discuss how to build useful models when initial information about the threat is limited, when it is difficult to select which information is needed and to measure or infer it accurately, and when interdependent factors complicate interpretation. We will root the conversation in concrete case studies of models built to support decision making in different societal domains, ranging from estimating the pending surge in hospital capacity demand to the role of mass incarceration in pandemic dynamics.
- Jeffrey Law, Virginia Tech, United States
- Kyle Akers
- Nure Tasnina
- Catherine M. Della Santina
- Meghana Kshirsagar, Microsoft Research, United States
- Judith Klein-Seetharaman, Colorado School of Mines, United States
- Mark Crovella, Boston University, United States
- Padmavathy Rajagopalan, Virginia Tech, United States
- Simon Kasif, Boston University, United States
- T. M. Murali, Virginia Tech, United States
Presentation Overview:
Motivated by the critical need to identify new treatments for COVID-19, we present a genome-scale, systems-level computational approach to prioritize drug targets based on their potential to regulate host-virus interactions or their downstream signaling targets. We adapt and specialize network label propagation methods to this end. We demonstrate that these techniques can predict human-SARS-CoV-2 protein interactors with high accuracy. The top-ranked proteins that we identify are enriched in host biological processes that are potentially coopted by the virus. We present cases where our methodology generates promising insights, such as the potential role of HSPA5 in viral entry. We highlight the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents. We identify tubulin proteins involved in ciliary assembly that are targeted by anti-mitotic drugs. Drugs that we discuss are already undergoing clinical trials to test their efficacy against COVID-19. Our prioritized list of human proteins and drug targets is available as a general resource for biological and clinical researchers who are repositioning existing and approved drugs or developing novel therapeutics as anti-COVID-19 agents.
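For readers unfamiliar with network label propagation, a generic minimal sketch follows. It is not the adapted method described in the abstract: it uses a plain iterative update in which each node's score is a mix of its prior (1 for known interactors, 0 otherwise) and the average score of its neighbours. The damping value `alpha`, the iteration count, and all protein names are hypothetical.

```python
def propagate_labels(edges, seeds, alpha=0.8, iterations=100):
    """Generic label propagation on an undirected, unweighted network.

    Update rule (an assumption of this sketch):
        score[n] = (1 - alpha) * prior[n] + alpha * mean(score of neighbours)
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Known virus interactors start with label 1, everything else with 0.
    prior = {n: (1.0 if n in seeds else 0.0) for n in neighbors}
    scores = dict(prior)
    for _ in range(iterations):
        scores = {
            n: (1 - alpha) * prior[n]
            + alpha * sum(scores[m] for m in neighbors[n]) / len(neighbors[n])
            for n in neighbors
        }
    return scores


# Toy chain: known_interactor - P1 - P2 - P3 (placeholder names).
edges = [("known_interactor", "P1"), ("P1", "P2"), ("P2", "P3")]
scores = propagate_labels(edges, seeds={"known_interactor"})

# Rank the unlabeled proteins by propagated score.
ranking = sorted((n for n in scores if n != "known_interactor"),
                 key=scores.get, reverse=True)
print(ranking)  # ['P1', 'P2', 'P3']
```

Scores decay with network distance from the seed, so proteins close to many known interactors rise to the top of the ranking; this is the intuition behind using propagated scores to prioritize candidate targets.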
- Varun Mittal, NLPCore, United States
- Naveen Garg, NLPCore, United States
- Yos Wagenmans, NLPCore, United States
- Mayuree Binjolkar, University of Washington, United States
- Rashad Hatchett, University of Washington Tacoma, United States
- Varik Hoang, University of Washington Tacoma, United States
- Emma Biggs Lanier, University of Washington Tacoma, United States
- Ling-Hong Hung, University of Washington Tacoma, United States
- Ka Yee Yeung, University of Washington Tacoma, United States
Presentation Overview:
We present publicly available software tools in the form of an interactive Jupyter notebook and an NLPCORE search engine with graphical visualization to automate the discovery of reliable and up-to-date scientific information about SARS-CoV-2 studies. In particular, we leverage the NLPCORE AI technology to improve the contextual reference of text-mining results by discovering biological terms and their cross-references within and across articles. These cross-references are specific to search keywords and dynamically computed using neural networks that factor in keyword frequencies, co-location offsets, part-of-speech tags, dictionaries, and expert feedback.
Using the scholarly articles in the COVID-19 Open Research Dataset (CORD-19) corpus, we submitted search keywords to NLPCORE along with suggested topics. We also compared our search results to LitCovid, a curated repository of COVID-19 literature. We achieved a precision of over 96% for each of the keywords “transmission”, “treatment” and “diagnosis” when compared to LitCovid.
Python interactive Jupyter notebook at the Kaggle Challenge website: https://www.kaggle.com/varunmittalnlpcore/cord19-round1-response-by-uw-and-nlpcore
One click access for the NLPCORE search results (requires registration/login): https://search.nlpcore.com/search-results?asp=&d=1&p=cord19-dataset&q=coronavirus+transmission&rViewType=graph
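As a reminder of what a precision figure against a curated reference set means, here is a minimal sketch. How the authors matched search results to LitCovid entries is not described in the abstract, so matching by a shared article identifier is an assumption, and the identifiers below are made up.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved articles that also appear in the reference set."""
    retrieved = set(retrieved)
    if not retrieved:
        raise ValueError("no retrieved articles to evaluate")
    return len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical identifiers: search hits for one keyword versus the
# curated reference entries filed under the matching topic.
search_hits = ["a1", "a2", "a3", "a4"]
reference_entries = {"a1", "a2", "a3", "a9"}

p = precision(search_hits, reference_entries)
print(f"{p:.0%}")  # 75%
```

Precision says nothing about articles the search missed (here `a9`); that would be measured by recall.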
- Brian Le, UCSF, United States
- Gaia Andreoletti, UCSF, United States
- Katharine Yu, UCSF, United States
- Idit Kosti, UCSF, United States
- Daniel Bunis, UCSF, United States
- Tomiko Oskotsky, UCSF, United States
- Marina Sirota, UCSF, United States
Presentation Overview:
The COVID-19 pandemic caused by the SARS-CoV-2 virus has had far-reaching detrimental effects worldwide. Given the current lack of effective remedies, drug repositioning is one method to substantially speed up the discovery of therapeutics to help mitigate the effects of COVID-19. Here, we leveraged a transcriptomics-based computational drug repositioning pipeline to identify existing drugs that could potentially be therapeutic for COVID-19. This pipeline utilizes a rank-based pattern-matching method, leveraging gene expression data for both diseases and drugs, in order to identify disease-drug pairs with opposite transcriptional effects. Previously, this approach has been successfully applied to a variety of complex diseases, including dermatomyositis, inflammatory bowel syndrome, cancer and preterm birth. Using this method, we predicted drug hits for three different SARS-CoV-2 transcriptomic signatures from both cell line data and patient samples. Enriched pathways from these signatures include interferon signaling and viral response. We identified 25 drugs from the Connectivity Map database which significantly reverse at least two of the signatures. A number of these drugs, such as haloperidol and sirolimus, have been identified by other drug repositioning methods or are in clinical trials, indicating the therapeutic potential of the drug repurposing candidates identified by our pipeline.
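The idea of finding disease-drug pairs with opposite transcriptional effects can be illustrated with a simplified rank-based score. This sketch substitutes a Spearman rank correlation over shared genes for the pipeline's actual pattern-matching statistic, which the abstract does not specify; gene names and fold-change values are made up.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def reversal_score(disease, drug):
    """Spearman correlation over shared genes; near -1 suggests reversal."""
    genes = sorted(set(disease) & set(drug))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    return pearson(average_ranks([disease[g] for g in genes]),
                   average_ranks([drug[g] for g in genes]))


# Hypothetical log fold-changes (disease vs. control, drug vs. vehicle).
disease = {"G1": 2.0, "G2": 1.0, "G3": -1.0, "G4": -2.0}
reversing_drug = {"G1": -1.5, "G2": -0.5, "G3": 0.7, "G4": 1.9}
mimicking_drug = {"G1": 1.2, "G2": 0.3, "G3": -0.4, "G4": -2.2}

print(round(reversal_score(disease, reversing_drug), 6))  # -1.0
print(round(reversal_score(disease, mimicking_drug), 6))  # 1.0
```

A drug whose profile ranks genes in the opposite order to the disease signature scores close to -1 and would be a repositioning candidate under this toy criterion; a real analysis would also need a significance estimate for each score.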
- Sean O'Donoghue, CSIRO & Garvan, Australia
- Andrea Schafferhans, Technical University of Munich, Germany
- Neblina Sikta, Garvan Institute of Medical Research, Australia
- Sandeep Kaur, Garvan Institute of Medical Research, Australia
- Christian Dallago, Technical University of Munich, Germany
- Nicola Bourdin, UCL, United Kingdom
- Burkhard Rost, Technical University of Munich, Germany
Presentation Overview:
The COVID-19 pandemic spawned by SARS-CoV-2 (also: Human SARS Coronavirus 2) requires quick characterisation of the protein structures comprising the viral proteome. As experimentally determined 3D structures become available, these data can be augmented by high-throughput generation of homology models, thereby helping researchers leverage structural data to gain detailed insights into the molecular mechanisms underlying COVID-19. These insights, in turn, help in generating hypotheses aimed at identifying druggable targets for the development of therapeutic interventions, including vaccines.
We present an online resource that provides ~1,000 3D structure models, derived from all current entries in the PDB that have detectable sequence similarity to any of the SARS-CoV-2 proteins. The matching sequence-to-sequence alignments were generated by aligning pairs of Hidden Markov Models (HMMs) via HHblits. The structures are presented in the Aquaria molecular graphics system, which was designed to facilitate overlay of sequence features, e.g., SNPs and posttranslational modifications from UniProt. Aquaria has recently been enhanced to include a much richer set of sequence features, including predictions from the PredictProtein and CATH resources. The COVID-19 models, together with 32,717 sequence features, are available at https://aquaria.ws/covid19.