Posters

Poster presentations at ISMB 2020 will be held virtually. Authors will pre-record their poster talk (5-7 minutes) and upload it to the virtual conference platform along with a PDF of their poster. All registered conference participants will have access to the posters and presentations throughout the conference, and the content will remain available until October 31, 2020. A chat function provides Q&A opportunities for interaction between presenters and participants.

Preliminary information on preparing your poster and poster talk is available at: https://www.iscb.org/ismb2020-general/presenterinfo#posters

Ideally, authors should be available for interactive chat during the times noted below:

Poster Session A: July 13 and July 14, 7:45 am - 9:15 am Eastern Daylight Time
Bioinfo-core, CAMDA COSI, COVID-19, Education COSI, EvoCompGen COSI, Function / CAFA 4, MLCSB COSI, SCANGEN (Special Session), SysMod COSI, Systems Immunology (Special Session), Text Mining

July 14, 10:40 am - 2:00 pm EDT
iRNA COSI

Session B: July 15 and July 16, 7:45 am - 9:15 am Eastern Daylight Time
3DSIG COSI, Bio-Ontologies COSI, BioVis COSI, CompMS COSI, COVID-19, EvoCompGen COSI, HitSeq COSI, MICROBIOME COSI, NetBio COSI, RegSys COSI, TransMed COSI, VarI COSI, General Comp Bio

A Chronological and Geographical Analysis of Personal Reports of COVID-19 on Twitter

COSI: COVID-19

Ari Z Klein, University of Pennsylvania, United States
Arjun Magge, University of Pennsylvania, United States
Karen Oconnor, University of Pennsylvania, United States
Davy Weissenbacher, University of Pennsylvania, United States
Graciela Gonzalez-Hernandez, Department of Epidemiology, Biostatistics, and Informatics, Perelman School of Medicine, University of Pennsylvania, United States

Short Abstract: The rapidly evolving outbreak of COVID-19 presents challenges for actively monitoring its spread. In this proof-of-concept study, we assessed a social media mining approach for automatically analyzing the chronological and geographical distribution of users in the United States reporting personal information related to COVID-19 on Twitter.

A supervised deep neural network classifier using a pre-trained BERT model was developed to identify tweets reporting potential exposure to COVID-19. We trained and tested the classifier with 10,000 manually annotated tweets classified into three categories – “probable” case, “possible” case or other. The classifier achieved F1-scores of 0.64 for the “probable” class, 0.53 for the “possible” class, and 0.68 when unifying the “probable” and “possible” classes.

We deployed the classifier on a set of 430,574 unlabeled tweets, extracting the date and location information for those labeled as possible or probable. Compared with official reports, personal reports on Twitter began to increase about 2 to 3 weeks earlier. This study demonstrates that (1) users do report personal information on Twitter indicating potential exposure to COVID-19, (2) these personal reports can be understood as signals of COVID-19 cases, and (3) our social media mining approach could help provide an early indication of the spread of COVID-19.
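The per-class and unified F1-scores reported in this abstract can be illustrated with a small Python sketch. It is not the authors' evaluation code: the function name `f1_for_class` and the toy labels are hypothetical, and the only assumption carried over from the abstract is the three-way labelling into "probable", "possible" and other.

```python
def f1_for_class(gold, pred, positive):
    """F1-score when every label in `positive` counts as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g in positive and p in positive)
    fp = sum(1 for g, p in pairs if g not in positive and p in positive)
    fn = sum(1 for g, p in pairs if g in positive and p not in positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy annotations (gold) and classifier outputs (pred).
gold = ["probable", "possible", "other", "probable", "other", "possible"]
pred = ["probable", "probable", "other", "possible", "other", "possible"]

f1_probable = f1_for_class(gold, pred, {"probable"})
f1_possible = f1_for_class(gold, pred, {"possible"})
# Unifying the two classes: confusing "probable" with "possible" is no longer an error.
f1_unified = f1_for_class(gold, pred, {"probable", "possible"})

print(f1_probable, f1_possible, f1_unified)
```

In this toy data every error is a confusion between "probable" and "possible", so the unified score is higher than either per-class score, which is the same direction of effect as in the reported figures.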

A computational approach to study the dynamics of variability in infectious pathogens

COSI: COVID-19

Sanket Desai, Advanced Centre for Treatment, Research and Education in Cancer, India
Aishwarya Rane, Advanced Centre for Treatment, Research and Education in Cancer, India
Sonal Rashmi, Advanced Centre for Treatment, Research and Education in Cancer, India
Bhasker Dharavath, Advanced Centre for Treatment, Research and Education in Cancer, India
Amit Dutt, Advanced Centre for Treatment, Research and Education in Cancer, India

Short Abstract: Introduction:
The sheer number of variables in NGS data sets presents formidable computational challenges, limiting their application in public health laboratories engaged in studying epidemic outbreaks.

Methods:
Infectious Pathogen Detector (IPD) works on the principle of computational subtraction, followed by quantification and analysis of pathogen traces in the NGS data. It aligns and quantifies heterogeneous data sets, such as short- and long-read data. The pipeline integrates read quality assessment, pathogen burden normalization and genome coverage calculation. The confident reads are further used for pathogen assembly, pathogen genome variant analysis and quantification of pathogen burden from the sequenced samples. The complete pipeline, including the graphical user interface (GUI), is developed in Python.

Results:
Using IPD, we present the analysis of 1275 SARS-CoV-2 sequenced samples and 717 human transcriptomes, along with validation of the detected pathogen using orthogonal techniques. We also extend the SARS-CoV-2 genomic analyses by integrating the mutational dynamics of variability in infectious pathogens.

Conclusion:
IPD predicts the occurrence and mutational dynamics of variability among infectious pathogens, with potential for direct utility in the COVID-19 pandemic: it can help automate NGS-based pathogen analysis when responding to public health threats.

A Pangenomic Approach to Structural Evolution of SARS-CoV-2 Genomes

COSI: COVID-19

Joann Mudge, National Center for Genome Resources, United States
Thiruvarangan Ramaraj, DePaul University, United States
Alan Cleary, National Center for Genome Resources, United States
Buwani Manuweera, Montana State University, United States
Brendan Mumey, Montana State University, United States
Indika Kahanda, Montana State University, United States

Short Abstract: With over 12,000 SARS-CoV-2 full genome assemblies available and the number growing rapidly, pangenomics approaches that avoid pairwise comparisons or even multiple sequence alignment methods will improve our ability to quickly and efficiently mine genomic sequences. Here, we present the application of our frequented region (FR) algorithm to SARS-CoV-2 pangenomic graphs. These FRs identify regions in a compressed de Bruijn graph that are co-visited by a set of supporting paths from individual sequences in the pangenome. This allows us to rapidly assess tens of thousands of SARS-CoV-2 genomes to identify conserved genomic regions (important in vaccine production) and genomic shifts over time and geographic space. We use machine learning approaches to identify nucleotide signatures that characterize geographic regions and rapid viral expansions potentially indicative of contagion patterns, and link genomic changes with health outcomes as the data becomes available. While phylogenetic trees and tracking of single nucleotide polymorphism and amino acid changes have been done with more traditional methods, pangenomics will better accommodate the rapid updating of analyses on a rapidly increasing number of genomes. Furthermore, the use of FRs brings analyses from the nucleotide level to that of genomic haplotype blocks shaped by their evolutionary history.

A toolset and resources for fast identification of SARS-CoV-2 and other microorganisms from sequencing data

COSI: COVID-19

Shifu Chen, HaploX Biotechnology, China
Changshou He, HaploX Biotechnology, China
Yingqiang Li, HaploX Biotechnology, China
Zhicheng Li, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China
Charles Melancon, HaploX Biotechnology, China

Short Abstract: We present a toolset and related resources for rapid identification of viruses and microorganisms from short-read or long-read sequencing data. We present fastv as an ultra-fast tool to detect microbial sequences present in sequencing data, identify target microorganisms, and visualize coverage of microbial genomes. This tool is based on the k-mer mapping and extension method. K-mer sets are generated by UniqueKMER, another tool provided in this toolset. UniqueKMER can generate complete sets of unique k-mers for each genome within a large set of viral or microbial genomes. For convenience, unique k-mers for microorganisms and common viruses that afflict humans have been generated and are provided with the tools. As a lightweight tool, fastv accepts FASTQ data as input, and directly outputs the results in both HTML and JSON formats. Prior to the k-mer analysis, fastv automatically performs adapter trimming, quality pruning, base correction, and other pre-processing to ensure the accuracy of k-mer analysis. Specifically, fastv provides built-in support for rapid SARS-CoV-2 identification and typing. Experimental results showed that fastv achieved 100% sensitivity and 100% specificity for detecting SARS-CoV-2 from sequencing data; and can distinguish SARS-CoV-2 from SARS, MERS, and other coronaviruses. This toolset is available at: github.com/OpenGene/fastv.
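The idea of genome-specific unique k-mers described in this abstract can be sketched in a few lines of Python. This is an illustration only, not the UniqueKMER or fastv implementation: the function names, the toy genomes and the read are hypothetical, reverse complements are ignored, and the extension step of the k-mer mapping and extension method is not modelled.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(genomes, k):
    """Map each genome name to the k-mers that occur in no other genome."""
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    unique = {name: set() for name in genomes}
    for kmer, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].add(kmer)
    return unique


def count_hits(read, kmer_set, k):
    """Count positions in a read whose k-mer belongs to a genome's unique set."""
    return sum(1 for i in range(len(read) - k + 1) if read[i:i + k] in kmer_set)


# Hypothetical toy genomes that share only the k-mer "ACGT".
genomes = {"virus_a": "ACGTACGGA", "virus_b": "ACGTTTGGA"}
unique = unique_kmers(genomes, k=4)

read = "GTACGG"  # hypothetical sequencing read
hits = {name: count_hits(read, kset, 4) for name, kset in unique.items()}
print(hits)
```

A read is attributed to the genome whose unique k-mers it hits; k-mers shared between genomes carry no discriminating signal, which is why they are excluded up front.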

A virtual sandbox for compartment models in epidemiology

COSI: COVID-19

Anna Hildebrandt, Mondata GmbH, Germany
Jennifer Leclaire, Institute of Computer Science, Johannes Gutenberg University Mainz, Germany
Andreas Hildebrandt, Institute of Computer Science, Johannes Gutenberg University Mainz, Germany

Short Abstract: The current SARS-CoV-2 pandemic has clearly demonstrated the importance of accurate models of the spread of infectious diseases. Similarly, and maybe even more importantly, the pandemic has taught us the dangers of inaccurate models and insufficient validation.

In this work, we present a Julia-based framework that facilitates the development and study of epidemiological compartment models. Harnessing recent advances in scientific machine learning, the framework can insulate the user from large parts of the intricacies of numerical programming, thus leading to more stable and hopefully more realistic models. The ability to train and validate models in user-defined time periods against available real-world data leads to a more reliable evaluation phase.

Using suitable compartment models and suitably defined time periods, such models make it possible, for example, to approximately quantify the effect of government-prescribed isolation measures or voluntary degrees of self-isolation. To help assess the value of such insights, our sandbox can attempt to identify alternative explanations for the same data set by searching for different parameter value sets that lead to a similar goodness-of-fit.
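For readers unfamiliar with compartment models, the classic SIR model gives a sense of what such a sandbox works with. The sketch below is written in Python rather than Julia and is not the authors' framework: the function name `simulate_sir`, the forward Euler integration and all parameter values are assumptions chosen for illustration, and no fitting to real-world data is attempted.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with forward Euler; returns daily (S, I, R) tuples."""
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    trajectory = [(s, i, r)]
    for _day in range(days):
        for _step in range(steps_per_day):
            new_infections = beta * s * i / n * dt  # S -> I
            new_recoveries = gamma * i * dt         # I -> R
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical parameters: halving the contact rate beta stands in for isolation measures.
baseline = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10, r0=0, days=120)
isolated = simulate_sir(beta=0.15, gamma=0.1, s0=9990, i0=10, r0=0, days=120)

peak_baseline = max(i for _, i, _ in baseline)
peak_isolated = max(i for _, i, _ in isolated)
print(round(peak_baseline), round(peak_isolated))
```

Comparing the two runs shows how a change in a single parameter shifts the infection peak, which is the kind of effect the abstract proposes to quantify against observed data.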

A workflow to investigate the effect of SARS-CoV-2 on intracellular signalling and regulatory pathways

COSI: COVID-19

Dezso Modos, Quadram Institute Bioscience, United Kingdom
Agatha Treveil, Earlham Institute, United Kingdom
Marton Olbei, Earlham Institute, United Kingdom
Lejla Potari-Gul, Earlham Institute, United Kingdom
Padhmanand Sudhakar, Earlham Institute, United Kingdom
Balazs Bohar, Department of Genetics, Eotvos Lorand University, Hungary
Luca Csabai, Department of Genetics, Eotvos Lorand University, Hungary
Tamas Korcsmaros, Earlham Institute, United Kingdom

Short Abstract: To help understand the effect of SARS-CoV-2 on human cells, we developed a workflow to create intracellular causal networks.
First, we collected transcriptomic datasets on SARS-CoV-2 infection. Then we determined the differentially expressed genes of the dataset. We calculated which transcription factors target these differentially expressed genes using the DoRothEA transcription-factor target gene database. Then we constructed cell-type-specific networks from transcriptomic datasets and the integrated signalling network resource, OmniPath. We used the list of SARS-CoV-2 interacting proteins as sources from IntAct and the transcription factors of the differentially expressed genes as targets for the TieDIE network diffusion algorithm.
Subsequent functional analysis identified that SARS-CoV-2 has an effect on key immune pathways, such as interleukin signalling or Toll-like receptor signalling. Our analysis also pointed out that SARS-CoV-2 has effects on cell proliferation and differentiation pathways such as PI3K/AKT signalling, VEGF signalling and MAPK signalling. We identified the key proteins of the network such as ERK2, protein kinase-C-alpha or the NFKB1 transcription factor.
Our workflow highlighted additional pathways in coronavirus pathogenesis. The proposed workflow can show how the coronavirus affects various cells and organs in the body and is applicable to describing other types of host-microbe interactions, not only SARS-CoV-2.

Annotation of SARS-CoV-2 in ViralZone: From proteins to virus life cycle.

COSI: COVID-19

Edouard de Castro, SIB Swiss Institute of Bioinformatics, Switzerland
Christian Sigrist, Swiss Institute of Bioinformatics, Switzerland
Arnaud Kerhornou, Swiss Institute of Bioinformatics, Switzerland
Nicole Redaschi, SIB Swiss Institute of Bioinformatics, Switzerland
Alan Bridge, SIB Swiss Institute of Bioinformatics, Switzerland
Philippe Le Mercier, SIB Swiss Institute of Bioinformatics, Switzerland

Short Abstract: A worldwide pneumonia outbreak started in 2019 in Wuhan, China, caused by a new coronavirus (SARS-CoV-2) that emerged from an animal reservoir. The ViralZone coronavirus resource (viralzone.expasy.org/9056) provides annotated information on the virion, gene expression, interactome, virus replication cycle and its interaction with antiviral drugs, with links to the annotated proteome in UniProtKB. The virus replication cycle details the stage in which viral proteins are produced and their main functions in the virus life cycle. The curated interactome is built mostly by similarity with SARS-CoV (2003), and highlights how the virus binds target cells, escapes interferon signaling, silences host gene expression, and escapes the antiviral activity of Bone Marrow Stromal Antigen 2 (BST2, aka tetherin). The viral spike protein is mostly unique to SARS-CoV-2; this uniqueness explains why diagnostic methods used for SARS-CoV (2003) did not detect SARS-CoV-2. Using in-silico predictive tools like PROSITE and SWISSMODEL, we have identified potential features of the spike protein that may affect viral transmission and pathology, including a potential integrin binding site.

Application of Adverse Outcome Pathway Framework to COVID-19

COSI: COVID-19

Young Jun Kim, Korea Institute of Science and Technology Europe, Germany
Penny Nymark, Karolinska Institutet and Misvik Biology, Sweden
Yong Oh Lee, KIST Europe, Germany
Clemens Wittwehr, European Commission Joint Research Centre, Italy
Brigitte Landesmann, European Commission Joint Research Centre, Italy

Short Abstract: The adverse outcome pathway (AOP) paradigm was originally developed as a knowledge management framework for capturing and disseminating mechanistic information in the context of chemical risk assessment. However, application to other fields of research can be expected to support increased understanding of disease onset and progress. Investigations of COVID-19 have demonstrated that the infection causes clusters of acute respiratory distress associated with mortality similar to SARS-CoV and MERS. Following the inhalation of viral particles, the RNA virus capsid (S) glycoprotein binds the cellular receptor angiotensin-converting enzyme 2 (ACE2) and mediates fusion of the viral and cellular membranes through a pre- to post-fusion conformation transition. The S protein binds the catalytic domain of ACE2 with high affinity. Responses to successive outbreaks and the re-emergence of similar virus infections will benefit from a detailed understanding of the underlying molecular initiating and key events involved in virus-associated lung injury. Using the AOP framework to underpin the investigation and depiction of the mode of action of coronaviruses from molecular to population levels has the potential to facilitate the development of novel vaccine immunogens and therapeutics for the prevention and treatment of RNA virus-associated lung injury.

Applying Natural Language Processing (NLP) techniques on the scientific literature to accelerate drug discovery for COVID-19

COSI: COVID-19

Carlos Soto, Brookhaven National Laboratory, United States
Gilchan Park, Brookhaven National Laboratory, United States
Yen-Chi Chen, Brookhaven National Laboratory, United States
Ada Sedova, Oak Ridge National Laboratory, United States
Line Pouchard, Brookhaven National Laboratory, United States
Shinjae Yoo, Brookhaven National Laboratory, United States

Short Abstract: Motivation: The ongoing COVID-19 pandemic has been accompanied by tens of thousands of scientific publications relating to all aspects of the disease, with many more added daily. This publication rate is leaving researchers unable to keep up with new findings, which they rely on to inform their work in understanding and combating SARS-CoV-2. Researchers investigating novel drug candidates with in-silico drug discovery efforts work with vast numbers of potential compounds, changing and incompletely characterized protein targets, varying cellular contexts, and limited established information. They therefore rely especially on scientific insights from the literature to guide their efforts toward the most fruitful directions. To keep pace with the literature, these researchers need more capable tools to get the information they seek.
Results: In ongoing work, we are developing and applying Natural Language Processing (NLP) techniques for locating and filtering relevant information from the literature, to help accelerate drug discovery research. These techniques go beyond traditional text search tools by incorporating context and semantics, supporting targeted filtering, and leveraging external resources. We apply three strategies (embedding-expanded search, sentence classification, and text similarity) in a unified platform for drug discovery researchers that supports querying the database and offline filtering of relevant topics.

Assisted classification of COVID-19 literature for Vall d’Hebron Barcelona Hospital COVID-19 Repository using supervised learning

COSI: COVID-19

Carlos-Francisco Méndez-Cruz, Center for Genome Sciences, Universidad Nacional Autónoma de México (UNAM), Mexico
Mónica Ballesteros, Hospital Universitari Vall d’Hebron, Spain
Johanna Mendivelso, Agència de Qualitat i Avaluació Sanitàries de Catalunya, Spain
Juliana Sanabria, Independent research organization, Spain
Johannes Graën, Universitat Pompeu Fabra & Universitat de Gotemburgo, Spain
Alejandra López-Fuentes, Universitat Pompeu Fabra, Spain
Fabio Rinaldi, University of Zurich, Switzerland
Julio Collado-Vides, Center for Genome Sciences, Universidad Nacional Autónoma de México (UNAM), Mexico

Short Abstract: An initiative to facilitate access to scientific literature of COVID-19 for health professionals started at the Vall d'Hebron University Hospital. This initiative aims at classifying scientific articles by branch of medicine, since emerging repositories lacked this classification. A collection of 354 articles has been manually classified into 27 branches of medicine. Classified articles have been uploaded to a free access COVID-19 repository.
As literature for COVID-19 has grown rapidly, we powered this initiative by using supervised learning techniques. We used the manually classified collection as training and test data. We obtained a classifier based on k-Nearest Neighbors classification that achieved an F-score of 0.66 in testing (precision 0.67, recall 0.67). Our classifier uses a truncated singular value decomposition strategy for dimensionality reduction from a tf-idf term document matrix.
Several branches were classified with high accuracy (e.g., pediatrics, dermatology, endocrinology, oncology, psychiatry, nephrology, immunology). Then, using our automatic classifier, we classified 3300 articles into the 27 branches of medicine. This classification is considered a pre-classification, which is validated by health professionals to assure the quality of classification before articles are uploaded to the repository. This manual validation will be leveraged to improve our classifier in the future.

Characterization of SARS-CoV-2 viral diversity within and across hosts

COSI: COVID-19

Palash Sashittal, University of Illinois at Urbana-Champaign, United States
Yunan Luo, University of Illinois at Urbana-Champaign, United States
Jian Peng, University of Illinois at Urbana-Champaign, United States
Mohammed El-Kebir, University of Illinois at Urbana-Champaign, United States

Short Abstract: In light of the current COVID-19 pandemic, there is an urgent need to accurately infer the evolutionary and transmission history of the virus to inform real-time outbreak management, public health policies and mitigation strategies. Current phylogenetic and phylodynamic approaches typically use consensus sequences, essentially assuming the presence of a single viral strain per host. Here, we analyze 621 bulk RNA sequencing samples and 7,540 consensus sequences from COVID-19 patients, and identify multiple strains of the virus, SARS-CoV-2, in four major clades that are prevalent within and across hosts. In particular, we find evidence for (i) within-host diversity across phylogenetic clades, (ii) putative cases of recombination, multi-strain and/or superinfections as well as (iii) distinct strain profiles across geographical locations and time. Our findings and algorithms will facilitate more detailed evolutionary analyses and contact tracing that specifically account for within-host viral diversity in the ongoing COVID-19 pandemic as well as future pandemics.

Comprehensive analysis of human SARS-CoV-2 infection and host-virus interaction

COSI: COVID-19

Mariana G. Ferrarini, University of Lyon, INSA-Lyon, INRAE, BF2I, Villeurbanne, France, France
Avantika Lal, NVIDIA Corporation, Santa Clara, CA, USA, United States
Rita Rebollo, University of Lyon, INSA-Lyon, INRAE, BF2I, Villeurbanne, France, France
Andreas Gruber, Oxford Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK, United Kingdom
Andrea Guarracino, Centre for Molecular Bioinformatics, University Of Rome Tor Vergata, Rome, Italy, Italy
Itziar Martinez Gonzalez, Amsterdam UMC, Amsterdam, The Netherlands, Netherlands
Taylor Floyd, Center for Neurogenetics, Weill Cornell Medicine, Cornell University, New York, NY, USA, United States
Daniel Siqueira de Oliveira, Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon CNRS, UMR 5558, Villeurbanne, France, France
Alex Kanitz, Swiss Institute of Bioinformatics, ELIXIR Switzerland, Switzerland
Brett E. Pickett, Brigham Young University, Provo, UT, USA, United States
Vanessa Aguiar-Pulido, Center for Neurogenetics, Weill Cornell Medicine, Cornell University, New York, NY, USA, United States

Short Abstract: In December of 2019 a novel betacoronavirus named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in Wuhan, China. This virus causes the COVID-19 disease and by May 14th, it had already infected more than four million people worldwide, accounting for 300 thousand deaths. We have performed a comprehensive gene, transcript and transposable element differential expression analysis based on the available dataset of lung cells infected with SARS-CoV-2; identified regulatory motifs that could partially explain these genome-scale expression changes upon virus infection; and predicted putative interaction sites between the viral RNA and human RNA binding proteins, which may play essential roles in regulating viral transcription, replication, and translation. We detected genes involved in general viral response, as well as specific SARS-CoV-2 deregulated genes. Many of the genes identified in this work are interesting and worthy of additional functional analysis. We suggest new avenues for research into the differential susceptibility of humans to COVID-19, and novel insights on the virulence of SARS-CoV-2, which will be helpful to the scientific community to fight this disease in the near future.

Computational Drug Discovery against Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) by Molecular Docking

COSI: COVID-19

Sarra Akermi, Annotation Analytics Pvt. Ltd., 36,Ward no-14, Biswa, Amrpura, Gurgaon-122001, Gugaon, India, Tunisia
Neha Lohar, Annotation Analytics Pvt. Ltd., 36,Ward no-14, Biswa, Amrpura, Gurgaon-122001, Gugaon, India, India
Anshul Nigam, Amity University Mumbai, Mumbai - Pune Expressway, Bhatan Post - Somathne, Panvel, Mumbai, Maharashtra 410206, India, India
Subrata Sinha, Centre for Biotechnology and Bioinformatics, Dibrugarh University, Assam, India, India
Surabhi Johari, Institute of Management Studies (IMSUC), Ghaziabad, Uttar Pradesh, India, India
Sunil Jayant, Annotation Analytics Pvt. Ltd., 36,Ward no-14, Biswa, Amrpura, Gurgaon-122001, Gugaon, India, India

Short Abstract: Chloroquine and hydroxychloroquine have been successfully used in the treatment of COVID-19, the disease spread by coronavirus (SARS-CoV-2). However, the use of these drugs is still questionable, especially because of their efficacy and side effects. We report the use of an in-silico approach for high-throughput screening of FDA-approved antiviral drugs and plant compounds having antiviral activities against coronavirus (SARS-CoV-2). The plant compound tocopheryl-curcumin shows higher affinity for the spike protein of SARS-CoV-2 compared to FDA-approved drugs. Tocopheryl-curcumin binds at the binding site of the RBD domain of the spike protein (6VSB, chain A) with a free energy (∆G) of binding of -11.2 kcal/mol and makes strong hydrogen bonds with amino acid residues S366, V367, L368, S373, and K529. Pibrentasvir obtains the top rank among FDA-approved drugs with a free energy (∆G) of binding of -9.6 kcal/mol. Chloroquine (-6.87 kcal/mol) and hydroxychloroquine (-7.24 kcal/mol) rank lower in our docking study. Toxicity prediction by the VEGA tool predicts that tocopheryl-curcumin shows no toxicity compared to FDA-approved drugs. Therefore, tocopheryl-curcumin could be used as a potential non-toxic antiviral drug against COVID-19 compared to chemical drugs.

CoV-AbDab: the Coronavirus Antibody Database

COSI: COVID-19

Matthew Raybould, University of Oxford, United Kingdom
Aleksandr Kovaltsuk, University of Oxford, United Kingdom
Claire Marks, University of Oxford, United Kingdom
Charlotte Deane, University of Oxford, United Kingdom

Short Abstract: Research is ongoing around the world to create vaccines and therapies to minimise rates of COVID-19 disease spread and mortality. Crucial to these efforts are molecular characterisations of neutralising antibodies to its associated virus, SARS-CoV-2. Such antibodies would be valuable for measuring vaccine efficacy, diagnosing exposure, and developing effective biotherapeutics. Here, we describe our new database, CoV-AbDab, which already contains data on over 380 published/patented antibodies and nanobodies known to bind to at least one betacoronavirus. This database is the first consolidation of antibodies known to bind SARS-CoV-2 and other betacoronaviruses such as SARS-CoV-1 and MERS-CoV. We supply relevant metadata such as evidence of cross-neutralisation, antibody/nanobody origin, full variable domain sequence (where available) and germline assignments, epitope region, links to relevant PDB entries, homology models, and source literature. Our preliminary analysis exemplifies a spectrum of potential applications for the database, including identifying germline biases and assessing the diagnostic value of SARS-CoV binding CDRH3 sequences. Community submissions are invited to ensure CoV-AbDab is efficiently updated with the growing body of data analysing SARS-CoV-2. CoV-AbDab is freely available and downloadable on our website at opig.stats.ox.ac.uk/webapps/coronavirus

COVID-19 Disease Map, a systems biology resource to study molecular mechanisms of virus-host interactions

COSI: COVID-19

Marek Ostaszewski, Luxembourg Centre for Systems Biomedicine, Luxembourg
Anna Niarakis, University of Paris-Saclay, France
Inna Kuperstein, Institut Curie, France
Alexander Mazein, Luxembourg Centre for Systems Biomedicine, Luxembourg
COVID-19 Disease Map Community

Short Abstract: Due to the ongoing COVID-19 pandemic, there is an urgent need to understand the nature of SARS-CoV-2 virus infection. The development of more efficient diagnosis and treatment depends heavily on a clear understanding of the multistep and multicellular processes implicated in the disease. However, to grasp the entire picture, the scattered pieces of information need to be systematically collected, harmonized and combined into an integrative picture.

The disease maps community (disease-maps.org) initiated the COVID-19 Disease Map project that aims to develop a comprehensive standardized knowledge repository of mechanisms driving the coronavirus SARS-CoV-2 interactions with the human cell. It will enable domain experts, such as clinicians, virologists, and immunologists, to collaborate with data scientists and computational biologists.

Under this initiative (fairdomhub.org/projects/190), we are developing a novel bioinformatic workflow for precise formulation of COVID-19 computational models and accurate data interpretation that has the potential to suggest drug repositioning. This workflow integrates expert knowledge of molecular mechanisms of SARS-CoV-2 infection and host cell response, databases and data, and computational modelling. This will serve as a basis for a computational model for testing and simulating drug response and for predicting response depending on a patient’s risk factors and predispositions.

doi:10.17881/covid19-disease-map

Further Acknowledgements:
fairdomhub.org/projects/190
covid.pages.uni.lu

CytokiNet: an interactive cytokine communication map to analyse immune responses in inflammatory and infectious diseases

COSI: COVID-19

Marton Olbei, Earlham Institute, Quadram Institute Bioscience, United Kingdom
Isabelle Hautefort, Earlham Institute, United Kingdom
Padhmanand Sudhakar, Earlham Institute, KU Leuven, United Kingdom
Agatha Treveil, Earlham Institute, Quadram Institute Bioscience, United Kingdom
Balazs Bohar, Eotvos Lorand University, Earlham Institute, Hungary
Lejla Gul, Earlham Institute, United Kingdom
Luca Csabai, Eotvos Lorand University, Earlham Institute, United Kingdom
Dezso Modos, Earlham Institute, Quadram Institute Bioscience, United Kingdom
Tamas Korcsmaros, Earlham Institute, Quadram Institute Bioscience, United Kingdom

Short Abstract: Hyper-induction of proinflammatory cytokines, also known as a cytokine storm, is one of the key aspects of the ongoing SARS-CoV-2 pandemic. This process occurs when many innate and adaptive immune cells activate and produce pro-inflammatory cytokines, establishing a positive feedback loop of inflammation. It contributes to the mortality observed with COVID-19 for a subgroup of patients. Understanding how this proinflammatory state forms after a SARS-CoV-2 infection will be key to developing intervention strategies.
In this project, we built a communication map between tissues and blood cells to show how the aforementioned cytokine feedback loops can build (and thus, possibly break) under pathophysiological conditions. We collated the most prevalent cytokines from literature, and assigned the proteins and their receptors to source tissue and blood cell types based on consensus RNA-Seq data from the Human Protein Atlas. Recent work from multiple groups has shown the gut to be productively infected by SARS-CoV-2. Our preliminary analyses with CytokiNet highlight SARS-CoV-2 specific cytokine responses, a subset of them originating from the intestine. Although we built the resource with primarily the COVID-19 research effort in mind, it can be applicable to other diseases, where cytokine communication patterns are relevant, e.g. inflammatory bowel disease.

Data integration promises robust and faster discovery of COVID-19 drug targets

COSI: COVID-19

Tyrone Chen, Monash University, Australia
Kim-Anh Lê Cao, University of Melbourne, Australia
Sonika Tyagi, Monash University, Australia

Short Abstract: The novel coronavirus SARS-CoV-2 continues to have adverse impacts on human health. Despite the volume of experiments performed and data available, its biology is not yet fully understood. Functional omics technologies such as high throughput sequencing and mass spectrometry allow users to capture large quantities of complex data. From these individual data modalities, it is possible to extract valuable information associated with a biological system under study, leading to new discoveries and a deeper knowledge of biology. However, combining these blocks of information can yield information that is not visible with a single data modality. To better understand this virus, we take a multi-omics integrative view of the data, combining proteome and translatome data. This is in contrast to existing studies, which mostly focus on a single aspect of functional omics data, primarily the genome. As a result of this fragmented view, valuable information may be masked. Using a latent variable approach, our integrative pipeline unifies proteome and translatome. We compared the features of interest contributing to each biological outcome across the individual data blocks and the integrated omics data. This revealed previously invisible and potentially medically relevant features for drug development.

Deep Learning for the Prediction of Novel Coronavirus

COSI: COVID-19

Nimisha Ghosh, Siksha 'O' Anusandhan, India
Indrajit Saha, National Institute of Technical Teachers' Training and Research, India
Debasree Maity, MCKV Institute of Engineering, India
Arijit Seal, Cognizant Technology Solutions Pvt.Ltd, India
Dariusz Plewczynski, University of Warsaw and Warsaw Technical University, Poland

Short Abstract: SARS-CoV-2, the virus that causes COVID-19, is causing havoc worldwide. SARS-CoV-2 has a high transmission rate and thus requires early prediction and proper identification for subsequent treatment. However, SARS-CoV-2 seems to adapt and persist in different kinds of environments, making it difficult to predict. Moreover, there are other viruses of the same family, Coronaviridae, such as SARS-CoV-1 and MERS, so a predictor is needed to differentiate them based on their genomic sequences. To address this problem, a Long Short-Term Memory based predictor is proposed within a deep learning framework to identify an unknown sequence of these viruses. Initially, the k-mer technique is applied to create motifs in order to generate a Bag-of-Words and a subsequent numerical sequence vector for a given virus sequence. The proposed predictor is validated not only on the dataset using cross-validation partitions but also on an unseen test dataset of SARS-CoV-2 sequences together with sequences from SARS-CoV-1 and MERS. To verify the efficacy of the predictor, it has been compared with other prediction techniques based on Linear Discriminant Analysis, Random Forests and Gradient Boosting Method. The proposed predictor achieves 100% prediction accuracy on the validation and test datasets. It also shows superior results over the other prediction techniques.
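
The k-mer and Bag-of-Words step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the value k=3, the ACGT alphabet, the function names and the toy sequence are all assumptions chosen for illustration, and the LSTM classifier itself is not shown.

```python
from collections import Counter
from itertools import product


def kmers(sequence, k):
    """Return the overlapping k-mers of a nucleotide sequence, in order."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def bag_of_words_vector(sequence, k=3, alphabet="ACGT"):
    """Count every possible k-mer and return the counts in a fixed vocabulary order.

    k=3 and the ACGT alphabet are illustrative assumptions; k-mers containing
    other symbols (for example N) are simply not counted.
    """
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(kmers(sequence, k))
    return vocabulary, [counts.get(word, 0) for word in vocabulary]


# Toy sequence (hypothetical, not taken from any viral genome).
vocab, vector = bag_of_words_vector("ATGATGCA", k=3)
```

A fixed vocabulary order means every sequence maps to a vector of the same length (4^k entries), which is what a downstream classifier needs as input.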

Differentially conserved amino acid positions reflect differences in SARS-CoV-2 and SARS-CoV behaviour

COSI: COVID-19

Denisa Bojkova, Institute for Medical Virology, University Hospital, Goethe University Frankfurt am Main, Germany, Germany
Jake McGreig, University of Kent, United Kingdom
Katie-May McLaughlin, University of Kent, United Kingdom
Stuart Masterson, University of Kent, United Kingdom
Marek Widera, Institute for Medical Virology, University Hospital, Goethe University Frankfurt am Main, Germany, Germany
Verena Krähling, Institute of Virology, Biomedical Research Center (BMFZ), Philipps University Marburg, Germany, Germany
Sandra Ciesek, German Center for Infection Research, DZIF, Braunschweig, Germany, Germany
Mark Wass, University of Kent, United Kingdom
Martin Michaelis, University of Kent, United Kingdom
Jindrich Cinatl Jr, Institute for Medical Virology, University Hospital, Goethe University Frankfurt am Main, Germany, Germany

Short Abstract: COVID-19 is a global pandemic with over 4.2M infections and 287K deaths as of May 2020, yet the roles each viral protein plays in infection have not been fully elucidated. SARS-CoV-2, the virus causing COVID-19, is closely related to SARS-CoV, the virus that caused the SARS outbreak in 2002-2003; however, the characteristics of the two viruses are very different. SARS-CoV-2 is much more easily transmitted and also has a lower death rate. Our study identified a large number of amino acid positions that are differentially conserved between SARS-CoV-2 and SARS-CoV, which can be used to explain these differences in clinical behaviour between the two viruses. We identified 243 differentially conserved positions (DCPs) in the spike protein, the protein responsible for viral entry, making up 19.4% of the total length of this viral protein. In addition, we found that p6 and nsp3, interferon antagonists, were enriched in DCPs (28.6% and 21.3% respectively), and we introduce our pipeline for generating these DCP profiles for proteins. Our in silico study is supported by a cell culture comparison of the two viruses that demonstrates differences between them, including in sensitivity to drugs, and we propose that the DCPs explain these differences.

Empowering Cloud Technology for SARS-CoV2 Identification

COSI: COVID-19

Hendrick Gao-Min Lim, Graduate Institute of Biomedical Informatics, Taipei Medical University, Indonesia
Yuan-Chii Gladys Lee, Graduate Institute of Biomedical Informatics, Taipei Medical University, Taiwan

Short Abstract: Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV2) is the virus that causes the novel pandemic coronavirus disease (COVID-19). The World Health Organization (WHO) declared the pandemic in March 2020, following a rapid increase in cases of infection and death beyond the original site in Wuhan, China, where the outbreak began in December 2019, across China and worldwide. Therefore, a strategy for early and accurate case identification is crucial to controlling the pandemic. Here, we used sequencing data to identify the virus and cloud technology to run the analysis. We took a total of 62 sequencing read sets available in the NCBI SRA database from 2 different BioProject datasets: 54 samples of SARS-CoV2 as cases and 8 samples of Middle East Respiratory Syndrome Coronavirus (MERS-CoV), which belongs to the same genus (Betacoronavirus), as controls. We utilized the Seven Bridges Cancer Genomics Cloud platform, which implements a metagenomics Centrifuge analysis workflow, for simple, scalable, reproducible, rapid, cheap yet accurate SARS-CoV2 identification. As a result, 99.99-100% of reads from the cases and less than 0.04% of reads from the controls were classified as SARS-CoV2. The analysis takes 13 minutes to run and costs $0.11 with an Amazon Web Services r4.4xlarge spot instance.

Exploring the SARS-CoV-2 virus-host-drug interactome for drug repurposing

COSI: COVID-19

Andreas Pichlmair, Institute of Virology, TUM School of Medicine, Technical University of Munich, Germany
Jan Baumbach, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Josch Pauling, LipiTUM, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Kevin Yuan, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Nina Wenke, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Alexey Stukalov, Institute of Virology, TUM School of Medicine, Technical University of Munich, Germany
Julian Späth, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Marisol Salgado-Albarrán, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Tim Daniel Rose, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Sepideh Sadegh, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Mhaned Oubounyt, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Reza Nasirigerdeh, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Markus List, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Tim Kacprowski, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Gihanna Galindez, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
David Blumenthal, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Julian Matschinske, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany

Short Abstract: Various studies aim to understand the molecular mechanisms of SARS-CoV-1/2 infection for predicting drug repurposing candidates. However, such information is spread across many publications and is thus time-consuming to access, integrate, explore, and exploit. CoVex is an interactive web platform for exploring the SARS-CoV-1/2 virus-host-drug interactome integrating 1) experimentally validated virus-human protein interactions, 2) human protein-protein interactions and 3) drug-target interactions. CoVex implements several algorithms for identifying drug repurposing candidates which were specifically tailored for the network medicine context and include a weighted version of TrustRank as well as a novel multi-Steiner tree method. Expert knowledge can be applied to compile a set of host or viral proteins (referred to as seeds) with which to start the analysis. These seeds could be related to a relevant molecular mechanism or targeted by drugs, or correspond to differentially expressed genes. Based on the selected seeds, CoVex offers three main actions: (1) searching the human interactome for drug targets, (2) identifying repurposable drug candidates, and (3) a combination of both. Hence, CoVex allows users to mine the interactome for suitable drug targets for which, in turn, suitable drugs are identified.
The platform is available at exbio.wzw.tum.de/covex/.

FBA reveals guanylate kinase as a potential target for antiviral therapies against SARS-CoV-2

COSI: COVID-19

Alina Renz, Computational Systems Biology of Infection and Antimicrobial-Resistant Pathogens, University of Tübingen, Germany
Lina Widerspick, Computational Systems Biology of Infection and Antimicrobial-Resistant Pathogens, University of Tübingen, Germany
Andreas Dräger, Computational Systems Biology of Infection and Antimicrobial-Resistant Pathogens, University of Tübingen, Germany

Short Abstract: The novel coronavirus (SARS-CoV-2) is currently spreading worldwide, causing the disease COVID-19. The number of infections increases daily, and no antiviral therapy has been approved. The recently released viral nucleotide sequence enables the identification of therapeutic targets, e.g., by analyzing integrated human-virus metabolic models. Investigations of changed metabolic processes after virus infections and the effect of knock-outs on the host and the virus can reveal new potential targets.

We generated an integrated host-virus genome-scale metabolic model of human alveolar macrophages and SARS-CoV-2. Analyses of stoichiometric and metabolic changes between uninfected and infected host cells using flux balance analysis (FBA) highlighted the different requirements of host and virus.
Consequently, alterations in the metabolism can have different effects on host and virus, leading to potential antiviral targets. One of these potential targets is guanylate kinase (GK1). In the FBA simulations, knocking out guanylate kinase decreased the growth of the virus to zero while not affecting the host. As GK1 inhibitors are described in the literature, their potential therapeutic effect on SARS-CoV-2 infections needs to be verified in in vitro experiments.

Functional characterization of SARS-CoV-2 using RNA-sequencing data reveals immune-related pathways

COSI: COVID-19

Surabhi Naik, University of Memphis, United States
Akram Mohammed, University of Tennessee Health Science Center, United States

Short Abstract: Recently, we have experienced the emergence of the life-threatening SARS-CoV-2 pandemic. As we still do not know enough to prevent its transmission or to understand other features of the virus, its regulatory mechanisms need to be understood. Here, we characterized the host transcriptional response of lung tissues to SARS-CoV-2 infection in in vitro and in vivo systems. We used RNA-Seq data from SARS-CoV-2 infected human bronchial epithelial cells, human alveolar basal epithelial cells and human lung cancer cells as in vitro models, and lung biopsies from COVID-19 patients as in vivo models, and compared their expression with uninfected counterparts to perform differential gene expression (DGE) analyses. Our DGE analysis identified 785 genes with adj. p-value < 0.01, of which 12 genes were downregulated and 773 genes were upregulated with a log fold change cutoff of 2. Functional analysis revealed that the PI3K-Akt signaling pathway, focal adhesion, and ECM-receptor interaction were the most enriched KEGG pathways. Gene Ontology analysis revealed that terms related to organelle fission and nuclear division were the top enriched biological processes. Extracellular matrix and peptidase inhibitor activity were some of the most enriched molecular functions. These genes could be potential markers contributing to SARS-CoV-2 infection, and further study is needed to investigate their role.
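
The thresholding step reported in this abstract (adjusted p-value < 0.01 and a log fold change cutoff of 2) can be sketched as a simple filter. This is only an illustration under assumptions: the gene names and values below are made up, the function name is hypothetical, and whether the authors treated the cutoff as inclusive or used log2 fold changes is not stated in the abstract.

```python
def split_by_direction(results, padj_cutoff=0.01, lfc_cutoff=2.0):
    """Split differential expression results into up- and downregulated genes.

    `results` is an iterable of (gene, log_fold_change, adjusted_p_value).
    A gene is kept when its adjusted p-value is below `padj_cutoff` and the
    absolute log fold change is at least `lfc_cutoff` (inclusive by assumption).
    """
    up, down = [], []
    for gene, log_fc, padj in results:
        if padj >= padj_cutoff or abs(log_fc) < lfc_cutoff:
            continue  # fails the significance or the effect-size threshold
        (up if log_fc > 0 else down).append(gene)
    return up, down


# Hypothetical toy results, not values from the study.
toy_results = [
    ("GENE_A", 3.1, 0.001),   # significant, strongly up
    ("GENE_B", -2.4, 0.004),  # significant, strongly down
    ("GENE_C", 2.8, 0.2),     # large change but not significant
    ("GENE_D", 0.7, 0.0001),  # significant but small change
    ("GENE_E", 2.0, 0.009),   # exactly at the fold-change cutoff
]
up_genes, down_genes = split_by_direction(toy_results)
```

Applying both thresholds together is what separates genes that are merely statistically significant from those that also change by a large amount.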

Genetic variation and structural prediction of the SARS-CoV-2 open reading frame 3a viroporin

COSI: COVID-19

Garrett McCue, Baldwin Wallace University, United States
Jeffrey Zahratka, Baldwin Wallace University, United States

Short Abstract: SARS-CoV-2 is the causative agent of the COVID-19 pandemic responsible for over 4,400,000 known cases and over 300,000 deaths worldwide as of May 14, 2020. SARS-CoV-2 is a 29.9 kilobase single-stranded RNA virus that encodes multiple open reading frames including ORF3a, a protein with homology to the SARS-CoV (SARS) ORF3a ion channel subunit. Previous work has demonstrated that SARS ORF3a oligomerizes to form nonspecific cation channels in infected cells, ultimately resulting in activity-dependent apoptosis and viral propagation. Here, our group performed genomic and structural bioinformatics analysis on the SARS-CoV-2 ORF3a (CoV2-3a) locus. Analysis of 9370 sequences from GISAID revealed 166 substitutions with four common variants worldwide. Membrane topology profiling of the CoV2-3a amino acid sequence suggests the protein has three transmembrane domains and a long cytoplasmic C-terminal tail, a homologous architecture with conserved features found in the SARS ORF3a protein. Docking simulation predicts that CoV2-3a assembles into tetramers, with transmembrane domains 2 and 3 forming the channel pore and individual subunits held together by a critical disulfide linkage. Taken together, these findings predict that CoV2-3a encodes an ion channel similar to that of SARS and likely plays a critical role in viral-mediated host cell apoptosis.

Genomic Analysis Reveals the Abundance of SARS-CoV-2 strains in CoVID-19 patients

COSI: COVID-19

Olaitan Awe, University of Ibadan, Nigeria

Short Abstract: The novel coronavirus (SARS-CoV-2) is a public health concern in our world today. Certain strains of this virus seem to be life-threatening, thereby resulting in severe disease. With more than 300,000 global fatalities, it is critical to understand how SARS-CoV-2 works and which strains are actively involved in CoVID-19.

While the novel coronavirus has been gradually mutating since its outbreak in late 2019, our understanding of its diversity is limited and research studies are yet to determine how the abundance of the virus in human hosts correlates to the resulting disease severity.

In this study, we used whole genome sequences of 447 SARS-CoV-2 strains and applied statistical methods to determine specific viral strains abundant in CoVID-19 RNAseq datasets mostly from symptomatic patient samples.

Viral strains present in at least 82% of the samples were identified, making up the highly abundant strains. Results from this research revealed highly abundant SARS-CoV-2 strains in patient samples thereby suggesting that these strains play significant roles in the evolution of CoVID-19.

When combined with clinical outcomes, insight from this research can help to determine the severity of a SARS-CoV-2 infection and to design potential CoVID-19 drugs.

Human Gene and Disease Associations for Clinical-Genomics and Precision Medicine Research to Investigate COVID-19

COSI: COVID-19

Zeeshan Ahmed, Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, United States
Saman Zeeshan, Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, United States
Xinqi Dong, Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, United States

Short Abstract: The need has never been greater for drug-discovery data and innovative solutions that help precision medicine win the battle against coronavirus disease 2019 (COVID-19), a respiratory illness caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Despite many significant scientific and medical discoveries, the genetics of rare infectious diseases like COVID-19 remains far from clear. Our goal is to facilitate the implementation of precision medicine to improve on traditional symptom-driven practice and to tailor better-personalized treatments, especially in pandemic situations. One challenge is to model clinical and genomics data in a timely manner to find statistical patterns across millions of features and to identify underlying biologic pathways, modifiable risk factors and actionable information that support early detection and prevention of COVID-19 and the development of new therapies for better patient care. It is important to investigate the correlation and overlap between reported diagnoses of COVID-19 patients in clinical data and the germline and somatic mutations and highly expressed genes identified from genomics data analysis. We present a high-volume clinical-genomics database accessed through an iOS application, PAS, integrating information about classified diseases, genes, and germline and somatic mutations, including ACE2 and TMPRSS2 and others related to COVID-19.

Identification and Classification of Differentially Expressed Genes Reveals Potential Molecular Signatures Associated with COVID-19 Infection in Lung Adenocarcinoma Cells

COSI: COVID-19

Opeyemi Soremekun, UKZN, South Africa

Short Abstract: Genomic techniques such as next generation sequencing and microarrays have facilitated the identification and classification of molecular signatures inherent in cells upon viral infection as possible therapeutic targets. We performed differential gene expression analysis, pathway enrichment analysis and gene ontology analysis on RNAseq data obtained from SARS-CoV-2 infected A549 cells. Differential expression analysis revealed that 753 genes were upregulated while 746 were downregulated. SNORA81, OAS2, SYCP2, LOC100506985, and SNORD35B are the top 5 upregulated genes upon SARS-CoV-2 infection. Expectedly, these genes have been implicated in immune response to viral assaults. In the ontology of protein classification, a high percentage of the genes are classified as gene-specific transcriptional regulators, metabolite interconversion enzymes and protein modifying enzymes. Twenty pathways with a P-value lower than 0.05 were enriched in the upregulated genes, while 18 pathways were enriched in the downregulated differentially expressed genes (DEGs). The Toll-like receptor signalling pathway is one of the major pathways enriched. This pathway plays an important role in the innate immune system by identifying pathogen-associated molecular signatures emanating from various microorganisms. Our results present a novel understanding of the genes and corresponding pathways affected upon SARS-CoV-2 infection, and could facilitate the identification of novel therapeutic targets and biomarkers in the treatment of COVID-19.

Identification of human biological processes and protein sequence motifs putatively targeted by SARS-CoV-2 proteins

COSI: COVID-19

Amit Scheer, University of Ottawa, Canada
Rachel Nadeau, University of Ottawa, Canada
Dallas Nygard, University of Ottawa, Canada
Emily Roth, University of Ottawa, Canada
Soroush Shahryari Fard, University of Ottawa, Canada
Iryna Abramchuk, University of Ottawa, Canada
Yun-En Chung, University of Ottawa, Canada
Mathieu Lavallée-Adam, University of Ottawa, Canada

Short Abstract: While the COVID-19 pandemic is causing important loss of life and global disruption, knowledge of the interactions of the causative virus SARS-CoV-2 with human host cells is currently limited. Investigating protein-protein interactions (PPIs) between viral and host proteins can enable the identification of potential drug targets. We therefore performed an in-depth computational analysis of the recently published interactome of SARS-CoV-2 proteins with human proteins in infected HEK 293 cells (Gordon et al., 2020) to reveal molecular pathways that are affected by the virus and putative protein binding sites. Specifically, we performed a set of network-based functional enrichment and sequence motif discovery analyses on SARS-CoV-2-interacting human proteins and supplemented the network with high-confidence PPIs from the STRING database. Using a novel implementation of our GoNet algorithm, we identified 93 Gene Ontology terms for which the SARS-CoV-2-interacting human proteins are significantly clustered in the network. Furthermore, we present a novel protein sequence motif discovery approach that identifies amino acid motifs for which the associated host proteins are locally enriched in the network. Together, these results provide insights into the biological processes and sequence motifs that are putatively implicated in SARS-CoV-2 infection and could lead to potential therapeutic targets.

Identifying Human Interactors of SARS-CoV-2 Proteins and Drug Targets for COVID-19 using Network-Based Label Propagation

COSI: COVID-19

Jeffrey Law, Virginia Tech, United States
Kyle Akers
Nure Tasnina
Catherine M. Della Santina
Meghana Kshirsagar, Microsoft Research, United States
Judith Klein-Seetharaman, Colorado School of Mines, United States
Mark Crovella, Boston University, United States
Padmavathy Rajagopalan, Virginia Tech, United States
Simon Kasif, Boston University, United States
T. M. Murali, Virginia Tech, United States

Short Abstract: Motivated by the critical need to identify new treatments for COVID-19, we present a genome-scale, systems-level computational approach to prioritize drug targets based on their potential to regulate host-virus interactions or their downstream signaling targets. We adapt and specialize network label propagation methods to this end. We demonstrate that these techniques can predict human-SARS-CoV-2 protein interactors with high accuracy. The top-ranked proteins that we identify are enriched in host biological processes that are potentially coopted by the virus. We present cases where our methodology generates promising insights such as the potential role of HSPA5 in viral entry. We highlight the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents. We identify tubulin proteins involved in ciliary assembly that are targeted by anti-mitotic drugs. Drugs that we discuss are already undergoing clinical trials to test their efficacy against COVID-19. Our prioritized list of human proteins and drug targets is available as a general resource for biological and clinical researchers who are repositioning existing and approved drugs or developing novel therapeutics as anti-COVID-19 agents.

Identifying novel SARS-CoV2–human protein interactions using graph embedding

COSI: COVID-19

Snehalika Lall, Indian Statistical Institute, India
Sumanta Ray, Centrum Wiskunde Informatica, The Netherlands
Sanghamitra Bandyopadhyay, Indian Statistical Institute, India

Short Abstract: Motivation: The recent outbreak of the novel coronavirus (SARS-CoV2) has infected millions of people throughout the world. Unfortunately, no effective drugs are available so far, owing to the lack of a proper set of SARS-CoV2-interacting human proteins, which limits the set of possible drug targets.
Method: To this end, we propose a novel methodology based on a graph-neighborhood sampling strategy to identify high-confidence host proteins interacting with SARS-CoV2 proteins. We compile a network consisting of SARS-CoV2 proteins, their experimentally verified host interactors (CoV-host) and human proteins. The SARS-CoV2–host interactions are taken from two recent experimental studies consisting of 332 and 261 high-confidence interactions, respectively. CoV-host proteins are mapped onto the human PPI interactome to obtain interaction information. The Node2Vec sampling strategy is used to learn low-dimensional embeddings of the nodes, which are used to construct a neighbourhood graph. Louvain clustering is then applied to cluster the nodes into groups.
Results: The clusters contain CoV-host proteins and the host proteins they interact with. In each cluster, we take the nodes most similar to the CoV-host proteins based on the results obtained from Node2Vec. These may be treated as potential targets of SARS-CoV2. We predict 148 high-confidence interactions involving 137 host proteins. Some predicted host proteins, such as AGT, act as indirect interactors with Spike through the ACE2 receptor.

Identifying potential drugs for COVID-19 using a drug repositioning tool on heterogeneous network of drugs and diseases

COSI: COVID-19

Duc-Hau Le, Vingroup Big Data Institute, Viet Nam
Trang Tran, Vingroup Big Data Institute, Viet Nam
Phuong Nguyen, Thuyloi University, Viet Nam

Short Abstract: Computational drug repositioning is nowadays a widely used approach for finding new uses of existing and experimental drugs. We implemented a state-of-the-art network-based method, the random walk with restart algorithm on a heterogeneous network of drugs and diseases, as an app, HDR, for the Cytoscape platform (apps.cytoscape.org/apps/hdr) for drug repositioning. To identify potential drugs that can be repurposed for COVID-19, we constructed a chemical structure-based drug similarity network connecting 7,838 drugs. We also constructed a disease similarity network connecting 3,229 diseases based on the similarity between the known associated gene sets of each disease. A total of 40 genes strongly associated with COVID-19 were recently identified experimentally; those for other diseases were collected from OMIM. Known drug-disease associations were used to connect the two networks to form the heterogeneous one. The top 5% of ranked drugs were further examined and compared with a list of 74 drugs currently undergoing clinical trials for COVID-19. Interestingly, we found ten of them in the list. We also tested the ability of HDR to recover the drugs currently in clinical trials and obtained an AUC value of 0.728. These results indicate the potential of HDR in identifying drugs that can be repurposed for COVID-19.
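
Random walk with restart on a small drug-disease network can be sketched as follows. This is a generic textbook formulation, not the HDR app's code: the restart probability of 0.5, the five-node toy network, the node names and the handling of isolated nodes are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    `adjacency` maps each node to a dict {neighbour: edge weight}; every
    neighbour must also appear as a key. At each step the walker returns to
    the seed set with probability `restart`, otherwise it moves to a
    neighbour with probability proportional to the edge weight.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            total = sum(adjacency[u].values())
            if total == 0:
                # Isolated node: send its probability mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * prob[u] * start[n]
                continue
            for v, weight in adjacency[u].items():
                nxt[v] += (1 - restart) * prob[u] * weight / total
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


# Hypothetical toy heterogeneous network: disease-disease similarity,
# drug-drug similarity, and one known drug-disease association.
network = {
    "disease_X": {"disease_Y": 1.0},
    "disease_Y": {"disease_X": 1.0, "drug_A": 1.0},
    "drug_A": {"disease_Y": 1.0, "drug_B": 1.0},
    "drug_B": {"drug_A": 1.0, "drug_C": 1.0},
    "drug_C": {"drug_B": 1.0},
}
scores = random_walk_with_restart(network, seeds={"disease_X"})
ranked_drugs = sorted((n for n in scores if n.startswith("drug_")),
                      key=scores.get, reverse=True)
```

In this toy case the disease of interest is the only seed, and drugs are ranked by how much probability reaches them through similarity and association edges, so drugs closer to the seed in the network rank higher.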

Immunoinformatics Analysis and In Silico Designing of Epitope-based Polyvalent Vaccines against Multiple Strains of Human Coronavirus (HCoV)

COSI: COVID-19

Bishajit Sarkar, Jahangirnagar University, Savar, Dhaka, Bangladesh
Md. Asad Ullah, Jahangirnagar University, Savar, Dhaka, Bangladesh
Yusha Araf, Shahjalal University of Science and Technology, Sylhet, Bangladesh
Nafisa Nawal Islam, Jahangirnagar University, Savar, Dhaka, Bangladesh
Umme Salma Zohora, Jahangirnagar University, Savar, Dhaka, Bangladesh

Short Abstract: Background: The group of human coronaviruses (HCoVs) consists of some highly pathogenic viruses that have caused several outbreaks in the past and the newly emerged strain, SARS-CoV-2 is responsible for the recent global pandemic which has already caused many deaths around the world due to lack of effective therapeutic options. In this study, the methods of immunoinformatics were used to design three epitope-based polyvalent vaccines which are expected to be effective against four different pathogenic strains of HCoV i.e., HCoV-OC43, HCoV-SARS, HCoV-MERS and SARS-CoV-2.
Methods and Results: The constructed vaccines contained highly antigenic, non-allergenic and non-toxic T-cell and B-cell epitopes from all four viral strains with appropriate linkers and adjuvants; therefore, they should be able to provide strong protection against all four strains. Protein-protein docking was performed to predict the best vaccine construct. Subsequently, MD simulation and immune simulation of the best vaccine construct predicted that the vaccine might be quite stable and is expected to generate good humoral and cell-mediated immune responses. Finally, in silico cloning was performed to develop a higher expression strategy for the vaccine.
Conclusion: The predicted polyvalent vaccine might be a good preventative measure against the four selected HCoV strains.

In silico approach of Jergon Sacha - Dracontium loretense metabolites against SARS-CoV-2 main protease

COSI: COVID-19

Haruna Barazorda Ccahuana, Universidad Católica de Santa María, Peru
Luis Goyzueta Mamani, Universidad Católica de Santa María, Peru

Short Abstract: The SARS-CoV-2 pandemic has already infected over 2 million people and taken over 150,000 lives worldwide. Currently, this global threat to the health of every nation is known for its high reproduction number and the lack of drugs or vaccines for treatment. This issue has motivated researchers all over the world to discover vaccines or study the use of drugs that were already approved for the treatment of other diseases. In an attempt to find inhibitors of the main protease (Mpro) of this new virus, this study aimed to contribute to the discovery of novel potential substances, proposing the plant Dracontium loretense (Jergon Sacha) as a promising candidate against SARS-CoV-2 because of the positive pharmacokinetic characteristics of its metabolites and their effectiveness based on the predicted binding energy to the main protease and molecular docking, which suggests further in vitro and in vivo studies. In addition, effectiveness and pharmacokinetics were compared among chloroquine derivatives, and the main proteases of SARS-CoV-2 and SARS-CoV-1 were compared to find affinity using bioinformatic techniques.

Inference and analysis of the global transmission network of SARS-CoV-2 prior to the pandemic declaration

COSI: COVID-19

Pelin Burcak Icer, Georgia State University, United States
Alexander Kirpich, University of Florida, United States
Fatemeh Mohebbi, Georgia State University, United States
Alex Zelikovsky, GSU, United States
Serghei Mangul, University of Southern California, United States
Gerardo Chowell-Puente, Georgia State University, United States
Pavel Skums, Georgia State University, United States

Short Abstract: The pandemic caused by the SARS-CoV-2 virus is straining health systems around the world. Although the Chinese government implemented severe restrictions on people's movement in an attempt to contain its initial spread, the virus has reached the majority of countries, partially due to its potent transmissibility and the frequency of asymptomatic cases. As the pandemic continues, understanding its global transmission network properties is essential. The goal of this study is to characterize the network associated with the establishment of the pandemic.

We employ molecular surveillance data for inference and analysis of the SARS-CoV-2 global transmission network, and exploit an algorithmic approach specifically tailored to emerging outbreak settings. It traces the accumulation of viral genomic heterogeneity via mutation trees, which are then transformed into transmission networks.

The analysis suggests multiple introductions of SARS-CoV-2 into the majority of regions via heterogeneous transmission pathways. The transmission network is scale-free, with a few genomic variants responsible for the majority of transmissions. The network structure is in line with the temporal data and suggests an expected sampling time difference of a few days between potential transmission pairs. The findings emphasize the extent of the global epidemiological linkage and demonstrate the importance of internationally coordinated containment measures.

Influence of COVID-19 Transmission Stages and Demographics on Length of In-Hospital Stay in Singapore for the First 1000 Patients

COSI: COVID-19

Jaya Sreevalsan-Nair, Graphics-Visualization-Computing Lab, IIIT Bangalore, India
Reddy Rani Vangimalla, International Institute of Information Technology - Bangalore, India
Pritesh Rajesh Ghogale, International Institute of Information Technology, Bangalore, India

Short Abstract: The COVID-19 outbreak, which started in December 2019 in Wuhan, China, had infected ~4.5 million people in 213 countries as of mid-May 2020, of whom ~1.7 million have recovered. Each country has taken into consideration its specific socio-economic priorities in announcing national efforts to contain the pandemic. The lessons learned from epicenters of the disease in Asia and Europe, and experiences from the SARS-CoV-1 outbreak in 2003 in China, Singapore, South Korea, etc., have helped in strategizing the handling of the current pandemic. We discuss a retrospective study of the first 1000 patients in Singapore to analyze government strategies. While mathematical epidemiological models have been proposed to understand the trends of transmission of the pandemic, we study the factors that influence the length of in-hospital stay (LoS) of the discharged patients. Analyzing recovery helps us to understand how effective the efforts taken by the government have been as well as how the hospital load is affected in different transmission stages. We find that the transmission stage and age influence LoS, and that LoS has been effectively reduced to an estimate of 9 days, based on the regression model.

Integrated analyses of single-cell atlases reveal age, gender, and smoking status associations with cell type-specific expression of mediators of SARS-CoV-2 viral entry and highlights inflammatory programs in putative target cells

COSI: COVID-19

Elena Torlai Triglia, Broad Institute of MIT and Harvard, United States
Human Cell Atlas Lung Biological Network, Human Cell Atlas, United States
Aviv Regev, Broad Institute of MIT and Harvard, United States
Martijn C Nawijn, University Medical Center Groningen, Netherlands
Fabian J Theis, Helmholtz Center Munich, Germany
Daniel T Montoro, Broad Institute of MIT and Harvard, United States
Hattie Chung, Broad Institute of MIT and Harvard, United States
Adam L Haber, Harvard T.H. Chan School of Public Health, United States
Jian Shu, Broad Institute of MIT and Harvard, United States
Sijia Chen, Harvard Medical School, United States
Justin Buchanan, University of California-San Diego School of Medicine, United States
Brian Lin, Massachusetts General Hospital, United States
Peiwen Cai, Icahn School of Medicine at Mount Sinai, United States
Meshal Ansari, Helmholtz Center Munich, Germany
Christoph Muus, Broad Institute of MIT and Harvard, United States
Evgenij Fiskin, Broad Institute of MIT and Harvard, United States
Elizabeth Thu Duong, University of California San Diego, United States
Karthik Jagadeesh, Broad Institute of MIT and Harvard, United States
Christopher Smilie, Broad Institute of MIT and Harvard, United States
Ayshwarya Subramanian, Broad Institute of MIT and Harvard, United States
Eeshit Dhaval Vaishnav, Broad Institute of MIT and Harvard, United States
Yoshihiko Kobayashi, Duke University Medical School, United States
Lisa Sikkema, Helmholtz Center Munich, Germany
Graham Heimberg, Broad Institute of MIT and Harvard, United States
Avinash Waghray, Massachusetts General Hospital, United States
Gokcen Eraslan, Broad Institute of MIT and Harvard, United States
Malte D Luecken, Helmholtz Center Munich, Germany

Short Abstract: The COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2, creates an urgent need for identifying molecular mechanisms that mediate viral entry, propagation, and tissue pathology. Here, we assess the cell type-specific RNA expression of mediators of SARS-CoV-2 cellular entry (ACE2, TMPRSS2, and CTSL) through an integrated analysis of 107 single-cell and single-nucleus RNA-Seq studies. Joint expression of ACE2 and the accessory proteases identifies specific subsets of epithelial cells as putative targets of viral infection in the respiratory system. Cells that co-express ACE2 and proteases are also identified in other organs, some of which have been associated with COVID-19 transmission or pathology, including gut enterocytes, corneal epithelial cells, cardiomyocytes, heart pericytes, olfactory sustentacular cells, and renal epithelial cells. Performing the first meta-analyses of scRNA-seq studies, we analyzed 1,176,683 cells from 164 donors spanning fetal, childhood, adult, and elderly age groups, and associate increased levels of ACE2, TMPRSS2, and CTSL in specific cell types with increasing age, male gender, and smoking, all of which are epidemiologically linked to COVID-19 susceptibility and outcomes. Taken together, we demonstrate the power of aggregating diverse, single-cell datasets of healthy tissues to direct the study of disease.

Integrated data analysis uncovers new Covid-19 related genes and potential drug re-purposing candidates

COSI: COVID-19

Noel Malod-Dognin, Barcelona Supercomputing Center (BSC), Spain
Alexandros Xenos, Barcelona Supercomputing Center (BSC), Spain
Carme Zambrana, Barcelona Supercomputing Center (BSC), Spain
Natasa Przulj, Barcelona Supercomputing Center (BSC), Spain

Short Abstract: The current pandemic of Covid-19 is an acute and rapidly developing global health crisis. To understand the molecular basis of this disease, we go beyond traditional biological network analysis and build upon the recently proposed concept of an integrated cell, iCell, which fuses three omics, tissue-specific molecular interaction network types of human into a unified model of a cell.
We apply the iCell methodology to construct the iCells of two Covid-19 infected human lung cell lines, A549 and NHBE, as well as of the corresponding controls. The comparison between infected and control iCells allows us to uncover the most rewired genes in Covid-19 infection. The top 20 most rewired genes between infected and control iCells of both cell lines, which could not have been identified using differential gene expression analysis, are likely to be relevant to Covid-19 infection, since they are related to human immune response and are all in the neighborhoods of the human proteins that bind to SARS-CoV-2 proteins.
Because only two of our newly identified genes have known drugs targeting them, we apply a second step of data fusion using known drug target interactions and drug chemical similarities to predict potential novel drug re-purposing candidates.

Investigation of the genetics of COVID-19 co-morbidities reveals genes and phenotypes coincident with the SARS-CoV-2 viral disease

COSI: COVID-19

Judith Blake, The Jackson Laboratory, United States
David Hill, The Jackson Laboratory, United States
Gaurab Mukherjee, The Jackson Laboratory, United Kingdom
Monica McAndrews, The Jackson Laboratory, United States
Mary Dolan, The Jackson Laboratory, United States
Elissa Chesler, The Jackson Laboratory, United States

Short Abstract: The emergence of the SARS-CoV-2 virus and subsequent COVID-19 pandemic kick-started intense inquiry into the mechanisms of action for this virus. COVID-19 presents more seriously in conjunction with other human disease conditions such as hypertension, diabetes, and lung disease.

Data analysis platforms such as GeneWeaver (GW) support comparative functional analysis through the statistical evaluation of gene sets against the extensive collection of high-quality gene sets within the GW knowledgebase. We conducted a bioinformatics analysis of COVID-19 co-morbidity gene sets, identifying genes and pathways shared among the comorbidities and evaluating the ability of these genes and pathways to inform further laboratory investigations.

We use five sets integrated into GW from MeSH that represent COVID-19 comorbidity: Cardiovascular Disease, Diabetes, Hepatitis, Lung Disease, and Kidney Disease. We identified 8 genes in common among all of them. There are 123 genes that are shared by at least 4 of these gene sets. Functional analysis of the shared genes revealed significant enrichment in immune-system phenotypes as well as for cardiovascular-relevant phenotypes. Enriched biological processes also identified immune-related significance. We also identified enriched human pathways shared by these co-morbidity datasets. We will report on comparing these data with emerging gene sets and data directly associated with COVID-19 disease.
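The counts of shared genes described above (genes common to all sets, genes shared by at least four) reduce to a simple membership tally. The following is a minimal sketch of that idea only; the gene identifiers and set contents are made-up placeholders, and the function name `genes_shared_by_at_least` is hypothetical rather than part of GeneWeaver.

```python
from collections import Counter


def genes_shared_by_at_least(gene_sets, k):
    """Return the genes that occur in at least k of the given gene sets."""
    # Count, for every gene, how many distinct sets contain it.
    counts = Counter(gene for genes in gene_sets.values() for gene in set(genes))
    return {gene for gene, n in counts.items() if n >= k}


# Toy placeholder data (NOT the real MeSH-derived gene sets).
toy_sets = {
    "Cardiovascular Disease": {"G1", "G2", "G3", "G4"},
    "Diabetes": {"G1", "G2", "G3"},
    "Hepatitis": {"G1", "G2", "G5"},
    "Lung Disease": {"G1", "G2", "G3", "G6"},
    "Kidney Disease": {"G1", "G3", "G4"},
}

in_all = genes_shared_by_at_least(toy_sets, len(toy_sets))  # shared by all five
in_four_plus = genes_shared_by_at_least(toy_sets, 4)        # shared by >= 4
```

With real gene sets, the same call with `k=5` and `k=4` would yield the two groups of shared genes reported in the abstract.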

Mapping Genetic Variability onto SARS-CoV-2 Protein Crystal Structures

COSI: COVID-19

Setayesh Yazdani, Structural Genomics Consortium, Canada
Matthieu Schapira, Structural Genomics Consortium, Canada

Short Abstract: Broad-spectrum small-molecule inhibitors of viral proteins can only work if there is limited structural diversity affecting drug-protein interactions. Thousands of SARS-CoV-2 sequences are available from the public GISAID database, and over 100 high-resolution crystal structures of SARS-CoV-2 proteins are available from the Protein Data Bank. Mapping the genetic variability of SARS-CoV-2 sequences onto their crystal structures at the ligand-binding sites identifies genetically stable binding pockets that are less likely to mutate and reveals the most promising chemical inhibition strategies against SARS-CoV-2 viral proteins. This approach can broadly apply to SARS-like and MERS-like coronaviruses circulating in bat reservoir species with the potential to transmit to humans, which would make them relevant for future outbreaks. For over 90,000 publicly available coronavirus sequences, mapping their genetic variability onto protein crystal structures at their ligand-binding pockets can help prioritize strategies for developing inhibitors. I will present the druggability and the genetic variability of the SARS-CoV-2 proteins and other coronaviruses at their binding sites. I will also highlight the critical non-conserved residues at the ligand-protein interaction sites.
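One common way to quantify per-residue variability at a binding site is the Shannon entropy of each alignment column. The abstract does not state which variability measure is used, so the sketch below is an assumption for illustration; the toy alignment, the pocket positions, and the function names are all hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def pocket_variability(aligned_seqs, pocket_positions):
    """Map each binding-pocket position (0-based column index) to its entropy."""
    return {
        pos: column_entropy([seq[pos] for seq in aligned_seqs])
        for pos in pocket_positions
    }


# Toy placeholder alignment: four equal-length sequences, pocket at columns 0-2.
toy_alignment = ["MKTA", "MKSA", "MRTA", "MKTA"]
toy_pocket = [0, 1, 2]

variability = pocket_variability(toy_alignment, toy_pocket)
# Positions with zero entropy are fully conserved in this toy alignment.
conserved = [pos for pos, h in variability.items() if h == 0]
```

Columns with low entropy would correspond to genetically stable pocket residues; high-entropy columns flag non-conserved positions.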

PACIFIC: Deep learning classifier of SARS-CoV-2 and co-infection sequences

COSI: COVID-19

Pablo Acera Mateos, John Curtin School of Medical Research, Australian National University, Australia
Hardip Patel, John Curtin School of Medical Research, Australian National University, Australia
Renzo Balboa, John Curtin School of Medical Research, Australian National University, Australia
Eduardo Eyras, John Curtin School of Medical Research, Australian National University, Australia

Short Abstract: Recent studies have reported that viral co-infections may alter disease severity in Covid-19 patients. The current standard of viral detection is based on PCR/RT-PCR directed at SARS-CoV-2, which will miss possible co-infecting viruses. High-throughput sequencing (HTS) assays provide additional opportunities to detect co-infections in SARS-CoV-2 affected individuals. We therefore developed PACIFIC (deeP learning clAssifier of SARS-CoV-2 and co-InFectIon sequenCes) to enable the detection of co-infecting viruses in HTS assays. PACIFIC uses a natural language processing deep learning design to classify sequencing reads into five virus groups (Coronaviridae, Influenza, Metapneumovirus, Rhinovirus, and SARS-CoV-2) plus Human, and can identify SARS-CoV-2 and 361 other viruses at concentrations as low as 0.03% with high specificity. Using samples simulated with a variety of conditions, PACIFIC shows 99% sensitivity and 98.9% specificity, and accurately recovers the relative concentrations of viruses in a sample (r=0.99). When tested in 4 in vitro human samples infected with SARS-CoV-2, PACIFIC correctly identified the presence of the virus in each case. PACIFIC is an end-to-end, fast and easy to use tool that will enable researchers to develop new tests for monitoring viral infections and co-infections to effectively manage the current global pandemic.

POCOVID-Net: Automatic Detection of COVID-19 From a New Lung Ultrasound Imaging Dataset (POCUS)

COSI: COVID-19

Jannis Born, ETH Zurich, Switzerland
Gabriel Brändle, Pediatric Emergencies Department, Geneva, Switzerland
Manuel Cossio, Biomedical Research Institute August Pi i Sunyer, Barcelona, Spain
Marion Disdier, N.A., Switzerland
Julie Goulet, Physik Department and Bernstein Center for Computational Neuroscience, Technische Universität München, Germany
Jeremie Roulin, N.A., Switzerland
Nina Wiedemann, ETH Zurich, Switzerland

Short Abstract: With the rapid development of COVID-19 into a global pandemic, there is an urgent need for cheap, fast and reliable tools that assist physicians in diagnosing COVID-19. Medical imaging can take a key role in complementing conventional diagnostic tools. Using CT or X-ray scans, several deep learning models have demonstrated promising performance.
Here, we present the first framework for COVID-19 detection from ultrasound. Ultrasound is cheap, portable, non-invasive and ubiquitous in medical facilities.
Our contribution is threefold.
First, we gather a lung ultrasound dataset consisting of 1103 images (654 COVID-19, 277 bacterial pneumonia and 172 healthy controls). This dataset is by no means exhaustive, but we processed it to feed deep learning models and make it publicly available, thus delivering a starting point for an open-access initiative of lung ultrasound data. Second, we train a deep convolutional neural network (POCOVID-Net) in a 5-fold cross validation on this data and achieve an accuracy of 89%, and, for COVID-19, a sensitivity of 0.96 (specificity 0.79).
Third, we provide an open-access web service at: pocovidscreen.org. The website not only deploys the predictive model but also offers a data-sharing interface, simplifying data contribution for researchers and physicians. Dataset and code are available from: github.com/jannisborn/covid19_pocus_ultrasound
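For readers less familiar with the reported metrics, the sketch below shows how overall accuracy and the one-vs-rest sensitivity and specificity for a single class (here, COVID-19) are typically computed from predicted labels. It is an illustration with made-up labels, not the authors' evaluation code; the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive):
    """Overall accuracy plus one-vs-rest sensitivity/specificity for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of true positives recovered
        "specificity": tn / (tn + fp),  # fraction of true negatives recovered
    }


# Made-up labels for eight images (NOT results from the POCUS dataset).
y_true = ["covid", "covid", "covid", "pneumonia", "pneumonia",
          "healthy", "healthy", "healthy"]
y_pred = ["covid", "covid", "pneumonia", "pneumonia", "covid",
          "healthy", "healthy", "covid"]

metrics = binary_metrics(y_true, y_pred, positive="covid")
```

In a 5-fold cross validation, such metrics would be computed on each held-out fold and then averaged.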

Predicting the Severity of COVID19 Infection Through the Concentration of Interleukin-6 Cytokine as a Primary Inflammatory Marker

COSI: COVID-19

Yvonne An, Magnus Center of Ethics, Science and Philosophy, Philippines
Jaewon Chang, Magnus Center of Ethics, Science and Philosophy, Philippines

Short Abstract: Published studies have reported that the clinical presentation and pathology of the novel coronavirus resemble those of SARS and MERS with regard to the increased level of Interleukin-6 through cellular transcription. This study aims to explore the clinical value of C-reactive protein (C-RP) and IL-6 as primary inflammatory markers to assess the severity of 2019-nCoV. We perform a meta-analysis to investigate the epidemiology of the SARS-CoV-2 strain and the clinical manifestation of C-RP. We review recently published literature from databases including PubMed, NCBI, and ResearchGate, until April 30, 2020, constructing a meta-analysis model consisting of five age subgroups (0-15, 16-30, 31-45, 46-60, 60+). Non-standardized age-subgroup analysis was performed to evaluate the strength of the correlation between the two variables, C-RP content and prognosis. As unabated cytokine storms have been shown to correlate with greater risks of chronic diseases, the C-RP blood test (a type of inflammatory marker) may prove beneficial as a means of detecting IL-6 content within the human body. With meta-analysis, a definite association between COVID-19 severity and inflammation was recognized. Hence, this study recommends the use of inflammatory markers as a guideline for the optimal allocation of COVID-19 antibody test kits in the United States, where a dearth of these resources exists.

Prioritizing important regions of SARS-CoV-2 genome using community detection

COSI: COVID-19

Ali Rahnavard, George Washington University, United States
Keith A Crandall, George Washington University, United States
Marcos Perez-Losada, George Washington University, United States
Himel Mallick, Merck & Co., Inc., United States
Suvo Chatterjee, National Institute of Health, United States

Short Abstract: The SARS-CoV-2 virus (CoV) is the etiological agent of COVID-19. The virus will evolve solutions to both host immune systems and intervention strategies. In order to diminish both the short-term and long-term impacts of CoV, it is essential to develop a transformative, robust, repeatable, and accessible informatics infrastructure to analyze the diversity of data becoming available in the face of the COVID-19 pandemic. We present a novel approach, m2clust, for community detection in omics data that incorporates similarities between measurements and the overall structure of the data. We validated m2clust in diverse multi-omics datasets, revealing new communities and groups between coronavirus strains and microbial strains with enrichment scores explained by metadata. We have applied m2clust to CoV genome distance to investigate viral strain diversity in relation to clinical and epidemic data. m2clust allows us to characterize the dynamic nature of mutations in the virus and test for associations with clinical variables and omic biomarkers, transforming traditional approaches of association studies and biomarker discovery. Our approach allows prioritization of genes associated with outcome predictors, including health, therapeutic, and vaccine outcomes, and can inform improved DNA tests for predicting disease status and severity. The software and documentation are available at github.com/omicsEye/m2clust.

Rapid analysis of SARS-CoV-2 genomic content

COSI: COVID-19

Kristen Beck, IBM Research, United States
Akshay Agarwal, IBM Research, United States
Simone Bianco, IBM Research, United States
Gowri Nayar, IBM Research, United States
Harsha Krishnareddy, IBM Research, United States
Hakan Bulu, IBM Research, United States
James Kaufman, IBM Research, United States
Vandana Mukherjee, IBM Research, United States
Edward Seabolt, IBM Research, United States

Short Abstract: During the current global pandemic, SARS-CoV-2 sequencing efforts ramped up to address the needs of this unprecedented public health emergency, with thousands of new genome sequences deposited daily. However, the collection, curation, and annotation of this data is a critical bottleneck of clinical and non-clinical research. To address this issue, we greatly expanded the capabilities of our Functional Genomics Platform. This research tool automatically detects genes, proteins, and functional domains from microbial genome sequences. In response to the COVID-19 pandemic, we collected and processed 12,504 SARS-CoV-2 genomes stored in public repositories to yield 1,708,627 sequences across the aforementioned biological entities. We automatically update our annotations as new genomes are sequenced. We identify a core set of 19 structural and non-structural proteins; some are highly conserved and others represent emerging variants. We observe a median of 96 variants per protein. Replicase polyprotein 1a and 1ab are observed to exhibit the highest rate of mutation. In contrast, non-structural protein 6 is far more conserved, with only 8 sequence variants in this collection. Additionally, we identify enriched motifs in protein and domain sequences to build a loss-less representation of sequence frequency vectors. Our platform may represent an important tool to accelerate research against emerging pathogens.

Rapid and accurate classification of SARS-CoV-2 using genomic signature analysis with Machine Learning

COSI: COVID-19

Gurjit S. Randhawa, University of Western Ontario, Canada
Maximillian P.M. Soltysiak, University of Western Ontario, Canada
Hadi El Roz, University of Western Ontario, Canada
Camila P.E. de Souza, University of Western Ontario, Canada
Kathleen A. Hill, University of Western Ontario, Canada
Lila Kari, University of Waterloo, Canada

Short Abstract: At times of major viral outbreaks, early elucidation of taxonomic classification and origin of the virus genomic sequence can assist in strategic planning, containment, and treatment. We identify an intrinsic COVID-19 virus (SARS-CoV-2) genomic signature and use it together with a machine learning-based alignment-free approach for an ultra-fast, scalable, and highly accurate classification of whole COVID-19 virus genomes. We analyze a large dataset of over 5000 unique viral genomic sequences, totalling 61.8 million bp, including the 29 early published COVID-19 virus sequences. Our method classifies the COVID-19 virus sequences with 100% classification accuracy, using raw genomic sequences alone, within a few minutes. Our results support a hypothesis of a bat origin and classify SARS-CoV-2 as Sarbecovirus, within the Betacoronavirus genus. The proposed approach, being alignment-free, bypasses altogether the complexity involved in the annotations and additional biological information that are necessary requirements for alignment-based methods or clinical analyses. Our results suggest that the proposed machine learning-based alignment-free approach can provide a reliable real-time option for taxonomic classification of novel viral and pathogen genomic sequences.
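To illustrate what an alignment-free comparison of raw genomic sequences can look like, the sketch below turns each sequence into a normalized k-mer frequency vector and compares vectors by Euclidean distance. This is a generic stand-in chosen for simplicity: the abstract does not specify how the genomic signature is computed, so the k-mer representation, the value `k=2`, the toy sequences, and the function names are all assumptions.

```python
from itertools import product


def kmer_frequencies(sequence, k=2):
    """Normalized frequency vector over all 4**k DNA k-mers, in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


# Toy placeholder sequences (NOT real viral genomes).
sig_a = kmer_frequencies("ACGTACGTAC")
sig_b = kmer_frequencies("ACGTACGTAC")
sig_c = kmer_frequencies("GGGGGCCCCC")

same = euclidean(sig_a, sig_b)       # identical sequences give distance zero
different = euclidean(sig_a, sig_c)  # dissimilar composition gives a positive distance
```

Vectors of this kind could then be fed to any standard classifier; no alignment or annotation step is needed to build them.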

Rapid Mining of Scientific Literature to Enhance Drug Repurposing Process for COVID-19

COSI: COVID-19

Luis Tari, --, United States
Xiaoxi Ouyang, --, United States
James Cai, --, United States

Short Abstract: When faced with the challenge to quickly identify existing drugs to be repurposed for COVID-19 treatment, a critical step is to compile a comprehensive list of known drugs that modulate human proteins with key roles in viral infection. Beyond the specialized knowledge of domain experts, the most direct approach is to consult structured knowledge bases such as DrugBank and PharmGKB for drug-target information. However, such knowledge bases rely on human curation and may not cover the most current knowledge, especially in a crisis situation. In this case, the scientific literature provides the most comprehensive and up-to-date information, albeit in an unstructured text format. Here we demonstrate that we can extract drugs from literature using a bootstrapping information extraction (IE) approach starting from a small number of drug-protein interaction examples. We compared our results with curated knowledge bases, as well as cheminformatics searches used in the SARS-CoV-2 protein interaction map by Gordon et al. We found that in each case our approach was able to identify additional drugs that were not found with the other methods. We believe that our IE approach can complement other methods in our collective effort to generate comprehensive knowledge needed to fight COVID-19 and future viral pandemics.

Semantically-enriched environment for COVID-19 literature exploration

COSI: COVID-19

Oscar William Lithgow Serrano, Dalle Molle Institute for Artificial Intelligence Research (IDSIA), Switzerland
Alejandra López-Fuentes, Center for Genomic Sciences (CCG), UNAM, Mexico
Yalbi Balderas-Martínez, Instituto Nacional de Enfermedades Respiratorias (INER), Mexico
Fabio Rinaldi, Dalle Molle Institute for Artificial Intelligence Research (IDSIA), Switzerland
Julio Collado-Vides, Center for Genomic Sciences (CCG), UNAM, Mexico

Short Abstract: We present an environment for analysis of scientific literature related to COVID-19, which offers various modalities for aggregation of information across multiple papers. The number of publications, clinical notes, experiments, and other observations generated in relation to the COVID-19 emergency is growing at a staggering rate. As a result, doctors, researchers and public authorities struggle to integrate such fragmented knowledge in order to find the information most urgently needed to perform their respective duties.
Our aim is to provide a web tool where COVID-19 literature is enriched by NER annotations, semantic network, extractive summarization, offering text and NER searches, and, especially, a reading strategy powered by semantic hyperlinks among sentences.

NER capabilities are provided by OGER, a state-of-the-art biomedical NER annotator. OGER in turn depends on the Bio Term Hub, an aggregator of biomedical terminologies which are dynamically sourced from the major life science databases.
We are using semantic textual similarity (STS) to facilitate the navigation of related knowledge statements across the literature.
We have already processed the LitCovid collection from the NIH, and plan to continue adding collections from other sources.
The system is partly derived from the L-Regulon project, which was initially targeted for the transcriptional regulation literature.

Sequence determinants of SARS-CoV-2 receptor specificity: insights from molecular evolution

COSI: COVID-19

Rosalba Lepore, BSC-CNS Barcelona Supercomputing Center, Spain
Camila Pontes, BSC-CNS Barcelona Supercomputing Center, Spain
Victoria Ruiz Serra, BSC-CNS Barcelona Supercomputing Center, Spain
Alfonso Valencia, BSC-CNS Barcelona Supercomputing Center, Spain

Short Abstract: The relationship between protein family segregation and their functional organisation has been extensively investigated for decades and a variety of computational methods have been developed to infer their evolutionary link at the residue level. It is therefore relatively straightforward to identify the amino acid positions that modulate the functional specificity of a given enzyme towards a substrate or cofactor or the binding specificity of a protein-ligand or protein-protein interaction by the analysis of evolutionary conservation patterns within the multiple sequence alignment (MSA) of a protein family. Here we apply this concept to the analysis of the evolutionary patterns in an MSA of the SARS-CoV-2 spike and related protein sequences belonging to the β-CoV genus. The analysis is based on a vectorial representation of protein sequences and amino acid positions in a multidimensional space to simultaneously identify the family segregation and the residue positions that better explain the sources of variation of the family. By unsupervised analysis of the β-CoV spike protein family, we identify patterns of sequence variations, i.e. specificity determining positions (SDPs), that correlate with the binding specificity to different host receptors, in a way that is consistent and reproducible within different β-CoV lineages.

Significantly Improved COVID-19 Outcomes in Countries with Higher BCG Vaccination Coverage: A Multivariable Analysis

COSI: COVID-19

Danielle Klinger, The Hebrew University of Jerusalem, Israel
Ido Blass, The Hebrew University of Jerusalem, Israel
Nadav Rappoport, Ben Gurion University of the Negev, Israel
Michal Linial, The Hebrew University of Jerusalem, Israel

Short Abstract: COVID-19 has spread to 210 countries within 3 months. We tested the hypothesis that vaccination with BCG correlates with a better outcome for COVID-19 patients. Our analysis covers 55 countries, complying with predetermined thresholds on population size and deaths per million (DPM). We found a strong negative correlation between the years of BCG administration and DPM along with the pandemic progression, substantiated in a multivariable analysis. Analyzing countries according to an age-group partition reveals that the strongest correlation is attributed to the coverage in BCG vaccination of the young population (<25 years). We propose that BCG immunization coverage, especially among the most recently vaccinated, contributes to attenuation of the spread and severity of the COVID-19 pandemic.

Swiss-wide Longitudinal COVID-19 Health Survey (SloCo)

COSI: COVID-19

Natalie Davidson, ETH Zürich, Switzerland
Andre Kahles, ETH Zurich, Switzerland
Gunnar Ratsch, ETH Zürich, Department for Computer Science, Switzerland
Ximena Bonilla, ETH Zürich, Switzerland
Olga Mineeva, ETH Zürich, Switzerland
Faisal Alquaddoomi, Nexus, ETH Zurich, Switzerland
Jan von Overbeck, Health Department, Canton Bern, Switzerland

Short Abstract: As COVID-19 has spread across the world, it is clear that testing is essential for minimizing community spread. However, large scale longitudinal testing is not a viable option in many countries. We present SloCo, a Swiss-wide Longitudinal COVID-19 Health Survey, an anonymous survey of COVID-19 symptoms, comorbidities, and demographic information.

Our study tries to:
1) Gain insight on COVID-19: Using elastic-net logistic regression we refined the current COVID-19 case definition. Our proposed model achieves ~50% higher recall on the test set than a predictor built on the WHO/ECDC case definition (for the same precision).
2) Inform health authorities: We propose an early warning system to alert if symptomatic persons significantly increase in a region. We need ~200 surveys per time point to detect a new outbreak within the doubling time plus incubation time with a sensitivity of 80%.
3) Inform the public: Public perception of health measures is essential for adherence to guidelines. To address this, we created a dashboard to interactively explore our data.

SloCo currently has >260,000 submissions with ~1,000/day (covidtracker.ch). Our results can immediately be used to inform health authorities, educate the general public, and learn about the spread and progression of COVID-19.

Systematic modeling of COVID-19 protein structures

COSI: COVID-19

Sean O'Donoghue, CSIRO & Garvan, Australia
Andrea Schafferhans, Technical University of Munich, Germany
Neblina Sikta, Garvan Institute of Medical Research, Australia
Sandeep Kaur, Garvan Institute of Medical Research, Australia
Christian Dallago, Technical University of Munich, Germany
Nicola Bourdin, UCL, United Kingdom
Burkhard Rost, Technical University of Munich, Germany

Short Abstract: The COVID-19 pandemic spawned by SARS-CoV-2 (also: Human SARS Coronavirus 2) requires quick characterisation of the protein structures comprising the viral proteome. As experimentally determined 3D structures become available, these data can be augmented by high-throughput generation of homology models, thereby helping researchers leverage structural data to gain detailed insights into the molecular mechanisms underlying COVID-19. These insights, in turn, help in generating hypotheses aimed at identifying druggable targets for the development of therapeutic interventions, including vaccines.

We present an online resource that provides ~1,000 3D structure models, derived from all current entries in the PDB that have detectable sequence similarity to any of the SARS-CoV-2 proteins. The matching sequence-to-sequence alignments were generated by aligning pairs of Hidden Markov Models (HMMs) via HHblits. The structures are presented in the Aquaria molecular graphics system, which was designed to facilitate overlay of sequence features, e.g., SNPs and posttranslational modifications from UniProt. Aquaria has recently been enhanced to include a much richer set of sequence features, including predictions from the PredictProtein and CATH resources. The COVID-19 models, together with 32,717 sequence features, are available at aquaria.ws/covid19.

Targeting RNA-dependent RNA polymerases towards COVID-19 drug discovery

COSI: COVID-19

Zheng Zhao, University of Virginia, United States
Philip E. Bourne, University of Virginia, United States

Short Abstract: The COVID-19 pandemic poses a severe threat to global public health and had infected over 4.2 million people as of May 11, 2020. Developing new medications to address the pandemic caused by the coronavirus SARS-CoV-2 remains critical. SARS-CoV-2 is an RNA virus containing a single-stranded positive-sense RNA genome. RNA-dependent RNA polymerases (RDRPs) catalyze the replication of viral RNA and play a pivotal role in encoding the genome of the virus. Consequently, RDRP is a primary target in the design of potential antiviral inhibitors. As of 26 Feb. 2020 we have collected 374 PDB structures of RDRP catalytic domains and their complexes from 45 RNA viruses, including coronaviruses, as our RDRP dataset. Using computational pharmacology methods, including protein-ligand interaction fingerprints, we characterize RDRP-ligand interactions to gain insights into inhibition at the binding sites of different viruses, thereby assisting in antiviral drug design and discovery. Then, combining a multi-scoring-function docking process with an antiviral compound library from Chemical Abstracts Service (CAS), we are identifying potential inhibitors as drug repurposing opportunities, as well as gaining new insights into the inhibitory mechanisms for containing deadly COVID-19.
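
Interaction fingerprints of this kind are often compared with a set-overlap measure. The abstract does not specify the fingerprint encoding or the comparison used, so the following is only a minimal sketch under stated assumptions: each fingerprint is modelled as a set of (residue, interaction type) pairs, the Tanimoto coefficient is swapped in as the similarity measure, and the function name, residue labels and interaction labels are all hypothetical.

```python
def tanimoto(fingerprint_a, fingerprint_b):
    """Tanimoto (Jaccard) similarity between two interaction fingerprints.

    Each fingerprint is assumed to be a set of (residue, interaction_type)
    pairs; this encoding is an illustration, not the authors' format.
    """
    a, b = set(fingerprint_a), set(fingerprint_b)
    union = a | b
    if not union:
        # Two empty fingerprints share no information; treat as dissimilar.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical fingerprints for two ligand complexes.
complex_1 = {("ASP1", "hbond"), ("SER2", "hbond"), ("ARG3", "ionic")}
complex_2 = {("ASP1", "hbond"), ("ARG3", "ionic"), ("LYS4", "ionic")}

similarity = tanimoto(complex_1, complex_2)  # 2 shared pairs out of 4 distinct
```

A score near 1 would suggest that two ligands contact the binding site in a similar way, while a score near 0 would suggest different binding modes.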

Targeting SARS-CoV-2 Protein Protein Interaction Surfaces

COSI: COVID-19

Annika Keshu, King's College London, United Kingdom
Joseph Ng, King's College London, United Kingdom
Emmanouela Petsolari, Imperial College London, United Kingdom
Irene Marzuoli, King's College London, United Kingdom
Mirko Giorgi, King's College London, United Kingdom
Tommaso Laurenzi, Università degli Studi di Milano, Italy
Uliano Guerrini, Università degli Studi di Milano, Italy
Luca Palazzolo, Università degli Studi di Milano, Italy
Ivano Eberini, Università degli Studi di Milano, Italy
Franca Fraternali, King's College London, United Kingdom

Short Abstract: As the COVID-19 pandemic continues to claim countless lives, it has become crucial to develop inhibitors targeting SARS-CoV-2 interactions. We have developed an in silico screening method to detect protein-protein interaction (PPI) interfaces targetable by inhibitors. Filtering these surfaces against crystal contacts in lattices of single protomers identifies regions on the protein surface that could be tested using high-throughput crystallography-based fragment screening. This approach successfully identified >400 binary complexes amenable to PPI inhibitor development, which were subsequently verified experimentally.
Here we applied this method to prioritise surfaces on SARS-CoV-2 proteins to be screened for PPI inhibitors using large chemical libraries. We set up a high-throughput virtual screening pipeline to screen compounds targeting viral proteins. We search for candidates among already approved drugs to pursue a repurposing strategy, but we also aim to identify putative novel hit compounds.
We focus on several SARS-CoV-2 interactions, including the spike glycoprotein complexed with ACE2 and antibody, and Nsp10-Nsp14 and Nsp10-Nsp16 viral-viral interactions.
We will use information on cavities and hot spots near PPI interfaces to extract allosteric pockets that can be targeted as an alternative to, or in combination with, the screened PPI inhibitors. Molecular Dynamics simulations will be performed and allosteric communication pathways between binding sites calculated.

Tools, Workflows and Infrastructure for Open and Reproducible Analysis of SARS-CoV-2 Data

COSI: COVID-19

Anton Nekrutenko, The Pennsylvania State University, United States
Sergei L Kosakovsky Pond, Temple University, United States
Galaxy And Hyphy Developments Teams, The Pennsylvania State University, United States
Bjoern Gruening, Uni-Freiburg, Germany

Short Abstract: The current SARS-CoV-2 pandemic and its consequences for societies worldwide make it more evident than ever that science, the data it uses, the analyses conducted, and the results obtained need to be publicly accessible, transparent, and reproducible for the global research community and, more generally, for as large an audience as possible.
Early on during the unfolding of the pandemic, Galaxy community members and collaborating researchers initiated work on covid19.galaxyproject.org, a cross-discipline effort to develop best-practice workflows for the transparent analysis of COVID-19-related data from public sources.
Since mid-February 2020, members of teams operating public Galaxy instances across the world and the HyPhy development team have contributed to this project by developing and refining tools, by assembling, testing and reviewing workflows, by tracking public data sources, by documenting the project's efforts, and by ensuring proper support for the various workflows from Genomics to Cheminformatics.

We will talk about outcomes of the project, including genomic analysis and variant detection in SARS-CoV-2, natural selection analysis, a large-scale fragment screen and drug design effort, automated workflows that run on new data as soon as they become available, and lessons learned during this endeavor.

Transcriptome phenotype-based drug repurposing framework for discovering COVID-19 therapeutic agents

COSI: COVID-19

Gwanghoon Jang, Korea University, South Korea
Sungjoon Park, Korea University, South Korea
Sunkyu Kim, Korea University, South Korea
Minjae Ju, Korea University, South Korea
Junseok Choe, Korea University, South Korea
Keonwoo Kim, Korea University, South Korea
Sanghoon Lee, Korea University, South Korea
Manseong Park, Korea University College of Medicine, South Korea
Jaewoo Kang, Korea University, South Korea

Short Abstract: SARS-CoV-2, the virus that causes COVID-19, has caused a global health crisis that requires urgent development of COVID-19 drugs. There have been many attempts to develop COVID-19 therapeutic agents, mainly by repurposing approved drugs that might bind and inhibit target proteins such as the SARS-CoV-2 receptor (e.g., ACE2). However, the target-based approach has limitations because the Mechanism of Action (MoA) of COVID-19 has not been fully identified. In this study, by developing a transcriptional phenotype-based drug repurposing framework, which can operate independently of knowledge of the disease’s MoA, we identified drugs that can reverse the COVID-19 infected state to the normal state at the transcriptome level. Using the framework, we prioritized 11 candidate drugs from 6,843 world-approved drugs as COVID-19 therapeutic agents. Among the candidates, 6 of the 11 drugs have been reported to have an association with COVID-19. In particular, Lenalidomide, ranked 3rd, recently entered a clinical trial. Moreover, given that the androgen receptor was recently reported to be a promising target for COVID-19, we suggest that Enzalutamide and Apalutamide (ranked 1st and 11th, respectively) are potential inhibitors of COVID-19.
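
One common way to score "reversal" at the transcriptome level is to check how strongly a drug-induced expression signature is anti-correlated with the disease signature. The abstract does not say which score the framework uses, so the sketch below is only an illustration under that assumption: it swaps in a plain Pearson correlation, and the function names, gene names, drug names and values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat signature carries no directional information
    return cov / math.sqrt(var_x * var_y)


def reversal_score(disease_sig, drug_sig):
    """Higher score = drug signature more opposed to the disease signature.

    Signatures are dicts mapping gene -> log fold change (assumed encoding).
    """
    genes = sorted(set(disease_sig) & set(drug_sig))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    disease = [disease_sig[g] for g in genes]
    drug = [drug_sig[g] for g in genes]
    return -pearson(disease, drug)


def rank_drugs(disease_sig, drug_sigs):
    """Return (drug, score) pairs sorted from strongest reversal to weakest."""
    scores = {name: reversal_score(disease_sig, sig)
              for name, sig in drug_sigs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: infected-vs-normal and drug-vs-control fold changes.
disease = {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 0.5}
drugs = {
    "drug_x": {"GENE_A": -2.0, "GENE_B": 1.0, "GENE_C": -0.5},  # opposes disease
    "drug_y": {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 0.5},   # mimics disease
}
ranking = rank_drugs(disease, drugs)
```

In this toy example the drug whose signature mirrors the disease signature with opposite sign ranks first, which is the behaviour a reversal-based prioritisation would look for.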

Using Viral RNA Test Data To Perform Epidemic Surveillance

COSI: COVID-19

Carlo Graziani, Argonne National Laboratory, United States
Kathleen Beavis, University of Chicago, United States
Weihao Ge, University of Illinois at Urbana-Champaign/NCSA, United States
Liudmila Mainzer, University of Illinois at Urbana-Champaign/NCSA, United States

Short Abstract: We report on an effort to use data from viral RNA tests for epidemic surveillance. Using Bayesian compartment-based modeling of both the epidemic and virus-host interaction, we estimate local epidemic parameters using MCMC sampling. The method is capable of generating short-term local forecasts, with quantified uncertainty. We will exhibit verifications with simulated data, and show initial results using testing data from COVID-19 hot spots.
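
To make the idea of compartment-based modeling with MCMC concrete, the sketch below fits a single transmission rate of a discrete-time SIR model to daily case counts using a Metropolis sampler. It is a deliberately simplified stand-in, not the authors' method: it ignores the virus-host interaction component and the specifics of viral RNA test data, assumes a Poisson likelihood and a flat prior on (0, 1), and all function names and parameter values are hypothetical.

```python
import math
import random


def sir_incidence(beta, gamma, population, infected0, days):
    """Daily new infections from a deterministic discrete-time SIR model."""
    susceptible = float(population - infected0)
    infected = float(infected0)
    incidence = []
    for _ in range(days):
        new_infections = min(susceptible, beta * susceptible * infected / population)
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        incidence.append(new_infections)
    return incidence


def log_likelihood(beta, observed, gamma, population, infected0):
    """Poisson log-likelihood of the observed daily counts given beta."""
    expected = sir_incidence(beta, gamma, population, infected0, len(observed))
    total = 0.0
    for count, rate in zip(observed, expected):
        rate = max(rate, 1e-9)  # guard against log(0)
        total += count * math.log(rate) - rate - math.lgamma(count + 1)
    return total


def metropolis_beta(observed, gamma, population, infected0,
                    n_iter=2000, step=0.02, start=0.3, seed=0):
    """Random-walk Metropolis sampler for beta with a flat prior on (0, 1)."""
    rng = random.Random(seed)
    beta = start
    current = log_likelihood(beta, observed, gamma, population, infected0)
    samples = []
    for _ in range(n_iter):
        proposal = beta + rng.gauss(0.0, step)
        if 0.0 < proposal < 1.0:  # proposals outside the prior are rejected
            candidate = log_likelihood(proposal, observed, gamma,
                                       population, infected0)
            if rng.random() < math.exp(min(0.0, candidate - current)):
                beta, current = proposal, candidate
        samples.append(beta)
    return samples


def summarize(samples, burn_in=500):
    """Posterior mean and a central 95% interval after discarding burn-in."""
    kept = sorted(samples[burn_in:])
    mean = sum(kept) / len(kept)
    return mean, kept[int(0.025 * len(kept))], kept[int(0.975 * len(kept))]
```

Verification with simulated data, as the abstract describes, could then look like this: generate counts with a known rate via `sir_incidence(0.4, 0.1, 10000, 10, 30)`, pass them to `metropolis_beta`, and check that the interval returned by `summarize` covers the known value.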

Virtual drug discovery on COVID-19 target proteins

COSI: COVID-19

Yifei Wu, University of Georgia, United States
Lei Lou, University of Georgia, United States
Lorette G. Edwards, University of Georgia, United States
Zhong-Ru Xie, University of Georgia, United States

Short Abstract: The COVID-19 pandemic has caused an unprecedented health and economic crisis around the world. Numerous therapeutic agents have been proposed or tested. However, so far no drugs have been approved to treat this disease, even though Gilead Sciences’ Remdesivir was recently granted emergency use authorization (EUA) by the US Food and Drug Administration (FDA) for COVID-19 treatment. In this study, we first docked 11 HIV-protease inhibitors and 3 nucleotide analogs, which have been proposed to treat COVID-19, to four possible target proteins: the main proteases (Mpro) and RNA-dependent RNA polymerases (RdRp) of both SARS-CoV and SARS-CoV-2. Based on the docking results, we concluded that some repurposed drugs or prodrugs should be good inhibitors and elucidated their binding mechanisms. We also virtually screened more than two thousand approved drugs and natural products to propose the top-scored compounds as potential drugs for COVID-19 treatment.

Viruses, Visualization, and Validation: Interactive mining of COVID-19 literature

COSI: COVID-19

Varun Mittal, NLPCore, United States
Naveen Garg, NLPCore, United States
Yos Wagenmans, NLPCore, United States
Mayuree Binjolkar, University of Washington, United States
Rashad Hatchett, University of Washington Tacoma, United States
Varik Hoang, University of Washington Tacoma, United States
Emma Biggs Lanier, University of Washington Tacoma, United States
Ling-Hong Hung, University of Washington Tacoma, United States
Ka Yee Yeung, University of Washington Tacoma, United States

Short Abstract: We present publicly available software tools in the form of an interactive Jupyter notebook and an NLPCORE search engine with graphical visualization to automate the discovery of reliable and up-to-date scientific information about SARS-CoV-2 studies. In particular, we leverage the NLPCORE AI technology to improve the contextual reference of text-mining results by discovering biological terms and their cross-references within and across articles. These cross-references are specific to search keywords and dynamically computed using neural networks that factor in keyword frequencies, co-location offsets, part-of-speech tags, dictionaries, and expert feedback.

Using the scholarly articles in the COVID-19 Open Research Dataset (CORD-19) corpus, we submitted search keywords to NLPCORE along with suggested topics. We also compared our search results to LitCovid, a curated repository of COVID-19 literature. We achieved a precision of over 96% for each of the keywords “transmission”, “treatment”, and “diagnosis” when compared to LitCovid.
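
For readers unfamiliar with the metric, precision against a curated reference can be computed as the fraction of retrieved articles that also appear in the reference set for the same topic. The abstract does not describe the exact evaluation protocol, so this is a minimal sketch under that assumption; the function name and article identifiers are hypothetical.

```python
def precision(retrieved_ids, curated_ids):
    """Fraction of retrieved articles that also appear in the curated set."""
    retrieved = set(retrieved_ids)
    if not retrieved:
        raise ValueError("no articles were retrieved")
    return len(retrieved & set(curated_ids)) / len(retrieved)


# Hypothetical article identifiers for a single keyword.
search_hits = ["id_1", "id_2", "id_3", "id_4"]
curated_topic = ["id_1", "id_2", "id_3", "id_9"]

keyword_precision = precision(search_hits, curated_topic)  # 3 of 4 hits match
```

Repeating the calculation per keyword gives one precision value per topic, which is how a per-keyword figure such as the one reported above would typically be obtained.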

Interactive Python Jupyter notebook on the Kaggle challenge website: www.kaggle.com/varunmittalnlpcore/cord19-round1-response-by-uw-and-nlpcore
One-click access to the NLPCORE search results (requires registration/login): search.nlpcore.com/search-results?asp=&d=1&p=cord19-dataset&q=coronavirus+transmission&rViewType=graph

Wikidata WikiProject COVID-19: a community effort to integrate biomedical knowledge

COSI: COVID-19

Tiago Lubiana, University of São Paulo, Brazil

Short Abstract: The Wikidata WikiProject COVID-19 is a collaborative, multinational effort created in the current pandemic to improve the representation of pandemic-related content on Wikidata. One of the core goals of the project is reconciling into Wikidata a range of COVID-19 related datasets. The project has integrated pieces of information from critical datasets on epidemiology, on SARS-CoV-2 proteins, macromolecular complexes, and protein-protein interactions, on cell lines used in SARS-CoV-2 research, and datasets describing COVID-19 related scholarly articles themselves. This integrated information is available openly and can be queried via a SPARQL endpoint. The data is accessible in a web-based format via the Wikidata API and wrapper packages in Python and R. The contributions of the WikiProject COVID-19 participants have led to a powerful resource for the life sciences community to parse our collective knowledge about the pandemic.
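
As an illustration of querying this information over SPARQL, the sketch below builds a query for scholarly articles whose main subject (Wikidata property P921) is COVID-19 and sends it to the public Wikidata Query Service. It is a minimal example rather than part of the project itself: the item identifier Q84263196 is assumed to be the Wikidata item for COVID-19 and should be verified before use, and the function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# Assumption: Q84263196 is the Wikidata item for COVID-19 (verify before use).
COVID19_QID = "Q84263196"


def build_query(qid, limit=10):
    """SPARQL for items whose main subject (P921) is the given item."""
    return (
        "SELECT ?article ?articleLabel WHERE {\n"
        f"  ?article wdt:P921 wd:{qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_url(query):
    """Full GET URL for the query, requesting JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return ENDPOINT + "?" + params


def run_query(query, user_agent="example-script/0.1 (hypothetical contact)"):
    """Send the query and return (item URI, label) pairs. Needs network access."""
    request = urllib.request.Request(
        build_url(query),
        headers={"User-Agent": user_agent,
                 "Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return [(row["article"]["value"], row["articleLabel"]["value"])
            for row in data["results"]["bindings"]]


example_query = build_query(COVID19_QID, limit=5)
example_url = build_url(example_query)
```

Calling `run_query(example_query)` would return up to five matching items with their English labels; swapping the property or item identifier adapts the same pattern to proteins, cell lines or other entities covered by the project.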
