Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


Posters

Posters at ISMB 2020 will be presented virtually. Authors will pre-record a 5-7 minute poster talk and upload it to the virtual conference platform along with a PDF of their poster. All registered conference participants will have access to the posters and presentations through the conference platform until October 31, 2020. A chat function provides Q&A opportunities for interaction between presenters and participants.

Preliminary information on preparing your poster and poster talk is available at: https://www.iscb.org/ismb2020-general/presenterinfo#posters

Ideally, authors should be available for interactive chat during the times noted below:


Poster Session A: July 13 and July 14, 7:45 am - 9:15 am Eastern Daylight Time (EDT)
Poster Session B: July 15 and July 16, 7:45 am - 9:15 am EDT
July 14, 10:40 am - 2:00 pm EDT
A novel network alignment algorithm for extracting densest connected subgraphs in dual networks.
COSI: NetBio COSI
  • Pietro Hiram Guzzi, Laboratory of Bioinformatics, University of Catanzaro, Italy
  • Emanuel Salerno, Unicz, Italy
  • Giuseppe Tradigo, Unicz, Italy
  • Pierangelo Veltri, Unicz, Italy

Short Abstract: Networks are used to model and analyse biological datasets in many fields, such as computational biology, medical informatics and social network analysis. More recently, it has been shown that simple network models fail to capture some aspects of the investigated scenarios. Therefore, more complex models, such as heterogeneous networks or dual networks, have been proposed.

A dual network model uses a pair of graphs sharing the same node set but with two different edge sets. One network, called the physical network, has unweighted edges and represents binary associations among nodes, while the other, called the conceptual network, is edge-weighted and represents the strength of the associations among nodes. A dual network thus captures in a single model aspects that cannot be described using a single network.

A relevant problem in dual networks is finding the Densest Connected Subgraph (DCS): the subgraph that has the largest density in the conceptual network and is also connected in the physical network. A DCS represents a set of nodes that are strongly associated in the conceptual network and also connected in the physical one.
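To make the DCS definition concrete, here is a minimal sketch of one simple heuristic. It is not the authors' network-alignment-based algorithm. It assumes that density means total conceptual edge weight divided by the number of nodes, and it applies greedy peeling in the spirit of Charikar's densest-subgraph heuristic. The sketch repeatedly removes the node with the smallest weighted conceptual degree and keeps the best node set that is still connected in the physical network. All function names are hypothetical.

```python
from collections import deque


def is_connected(nodes, physical):
    """True if `nodes` induces a connected subgraph of the physical (unweighted) graph."""
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in physical.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(nodes)


def weighted_density(nodes, conceptual):
    """Total conceptual edge weight inside `nodes`, divided by the number of nodes."""
    total = sum(w for (u, v), w in conceptual.items() if u in nodes and v in nodes)
    return total / len(nodes)


def greedy_dcs(physical, conceptual):
    """Greedy peeling heuristic for a densest connected subgraph of a dual network.

    physical:   dict node -> set of neighbours (unweighted, symmetric)
    conceptual: dict (u, v) -> weight on the same node set
    """
    current = set(physical) | {x for edge in conceptual for x in edge}
    best, best_density = None, float("-inf")
    while current:
        # Only node sets that are connected in the physical graph are candidates.
        if is_connected(current, physical):
            d = weighted_density(current, conceptual)
            if d > best_density:
                best, best_density = set(current), d

        def wdeg(x):
            # Weighted degree of x in the conceptual graph induced by `current`.
            return sum(w for (u, v), w in conceptual.items()
                       if x in (u, v) and u in current and v in current)

        # Peel the node with the smallest weighted degree (ties broken by sort order).
        current.remove(min(sorted(current), key=wdeg))
    return best, best_density


physical = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
conceptual = {("A", "B"): 1.0, ("B", "C"): 5.0, ("C", "D"): 1.0}
nodes, d = greedy_dcs(physical, conceptual)  # expected: nodes == {"B", "C"}, d == 2.5
```

Peeling has no optimality guarantee once the physical connectivity constraint is added. It only illustrates how the two edge sets of a dual network play different roles.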

A protein-protein interaction network-guided functional enrichment analysis for the identification of dysregulated pathways in different breast cancer subtypes.
COSI: NetBio COSI
  • Rachel Nadeau, University of Ottawa, Canada
  • Anastasiia Byvsheva, University of Ottawa, Canada
  • Mathieu Lavallée-Adam, University of Ottawa, Canada

Short Abstract: Quantitative proteomics studies are often used to identify significantly differentially expressed proteins across different experimental conditions. Functional enrichment analyses can then detect biological processes that are significantly overrepresented among these proteins to provide insights into the molecular impacts of the conditions. However, a biological process may be dysregulated in a condition even though its individual proteins are not significantly differentially expressed. We propose a novel method that uses protein-protein interaction networks to identify differentially expressed biological processes across conditions. Our graph theory-based approach detects Gene Ontology annotations that are both significantly clustered in a BioGRID protein-protein interaction network and differentially expressed across two experimental conditions, thereby highlighting annotations that may be related to dysregulated pathways. We applied our method to a quantitative proteomics analysis of the Her2+ and triple-negative breast cancer subtypes previously published by Tyanova et al. and identified 168 biological processes that are dysregulated between the Her2+ and triple-negative breast cancer subtypes and significantly clustered in the network (false discovery rate < 0.001), and that would not have been identified by standard methods. This algorithm will improve our ability to characterize quantitative proteomics datasets and provide a better understanding of dysregulated mechanisms under different conditions.
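One way to ask whether proteins sharing a GO annotation are "significantly clustered" in an interaction network is a permutation test. The minimal sketch below assumes that clustering is measured by the mean shortest-path distance among annotated proteins. It compares that value against random protein sets of the same size. This statistic, the function names and the unweighted, connected graph are all assumptions for illustration, not the authors' exact method.

```python
import random
from collections import deque
from itertools import combinations


def bfs_distances(graph, source):
    """Unweighted shortest-path distances from `source` (graph: node -> set of neighbours)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_pairwise_distance(nodes, dists):
    """Average shortest-path length over all pairs in `nodes` (assumes a connected graph)."""
    pairs = list(combinations(nodes, 2))
    return sum(dists[u][v] for u, v in pairs) / len(pairs)


def clustering_pvalue(graph, annotated, n_perm=1000, seed=0):
    """Empirical p-value that `annotated` proteins are closer than random sets of equal size."""
    dists = {u: bfs_distances(graph, u) for u in graph}
    observed = mean_pairwise_distance(annotated, dists)
    rng = random.Random(seed)
    nodes = sorted(graph)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(nodes, len(annotated))
        if mean_pairwise_distance(sample, dists) <= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

In practice, the p-values from many annotations would then be corrected for multiple testing, for example with a false discovery rate threshold like the one reported above.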

ACCORDION: Clustering and Selecting Relevant Data for Guided Network Extension and Query Answering
COSI: NetBio COSI
  • Yasmine Ahmed, University of Pittsburgh, United States
  • Cheryl Telmer, Carnegie Mellon University, United States
  • Natasa Miskov-Zivanov, University of Pittsburgh, United States

Short Abstract: Querying new information from knowledge sources in general, and from published literature in particular, aims to provide precise and quick answers to questions about a system under study. In this paper, we present ACCORDION (Automated Clustering Conditional On Relating Data of Interactions tO a Network), a novel tool and methodology that enables efficient answering of biological questions by automatically assembling new models, or expanding existing ones, using published literature. Our approach integrates information extraction and clustering with simulation and formal analysis to allow for an automated iterative process that includes assembling, testing and selecting the most relevant models, given a set of desired system properties. We applied our methodology to a model of the circuitry that controls T cell differentiation. To evaluate our approach, we compared the model obtained with our automated model extension approach to the previously published, manually extended T cell differentiation model. Besides demonstrating automated and rapid reconstruction of a model that was previously built manually, ACCORDION can assemble multiple models that satisfy desired properties. As such, it replaces a large number of tedious or even impractical manual experiments and guides alternative hypotheses and interventions in biological systems.

BiCoN: Network-constrained biclustering of patients and omics data
COSI: NetBio COSI
  • Olga Lazareva, Technical University of Munich, Germany
  • Van Hoan Do, Gene Center Munich, LMU Munich, Germany
  • Stefan Canzar, Gene Center, LMU, Germany
  • Kevin Yuan, Technical University of Munich, Germany
  • Jan Baumbach, Technical University of Munich, Germany
  • Paolo Tieri, Consiglio Nazionale delle Ricerche, Italy
  • David B. Blumenthal, Technical University of Munich, Germany
  • Tim Kacprowski, Technical University of Munich, Germany
  • Markus List, Technical University of Munich, Germany

Short Abstract: Unsupervised learning approaches are frequently employed to stratify patients into clinically relevant subgroups and to identify biomarkers such as disease-associated genes. However, conventional clustering and biclustering techniques are not suited to unraveling molecular mechanisms alongside patient subgroups.

We developed the network-constrained biclustering approach BiCoN (Biclustering Constrained by Networks), which (i) restricts biclusters to functionally related genes connected in molecular interaction networks and (ii) maximizes the difference in gene expression between two subgroups of patients. This allows BiCoN to simultaneously pinpoint the molecular mechanisms responsible for the patient grouping. Network-constrained clustering of genes makes BiCoN more robust to noise and batch effects than typical clustering and biclustering methods. As we demonstrate on breast and lung cancer datasets, BiCoN can faithfully reproduce known disease subtypes as well as identify novel, clinically relevant patient subgroups.

In summary, BiCoN is a novel systems medicine tool that combines several heuristic optimization strategies for robust extraction of disease mechanisms. BiCoN is well documented and freely available as a Python package and a web interface.
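Criteria (i) and (ii) can be illustrated by evaluating a single candidate bicluster. This sketch only scores a given gene set and patient split; it does not search for them, which BiCoN does with its own heuristic optimization. The score is assumed to be the mean absolute difference in average expression between the two patient groups. The data layout and function names are hypothetical and do not reflect the `bicon` package API.

```python
from collections import deque
from statistics import mean


def is_connected(genes, network):
    """True if `genes` induces a connected subgraph of the interaction network."""
    genes = set(genes)
    if not genes:
        return False
    start = next(iter(genes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in network.get(u, ()):
            if v in genes and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(genes)


def bicluster_score(expression, genes, group1, group2, network):
    """Mean absolute difference in average expression between two patient groups.

    expression: dict gene -> list of values, one per patient (column index)
    genes:      candidate gene set; must be connected in `network` (criterion i)
    group1/2:   lists of patient indices; the score is what gets maximized (criterion ii)
    Returns None when the gene set violates the network constraint.
    """
    if not is_connected(genes, network):
        return None
    diffs = [
        abs(mean(expression[g][i] for i in group1) - mean(expression[g][i] for i in group2))
        for g in genes
    ]
    return mean(diffs)
```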

Availability: PyPI package: pypi.org/project/bicon
Web interface: exbio.wzw.tum.de/bicon
Preprint: doi.org/10.1101/2020.01.31.926345

Capturing biological pathways with community detection methods applied on protein-protein interaction networks
COSI: NetBio COSI
  • Joseph Chi-Fung Ng, King's College London, United Kingdom
  • Irene Marzuoli, King's College London, United Kingdom
  • F Fraternali, Randall Division of Cell and Molecular Biophysics, King’s College London, United Kingdom

Short Abstract: Charting maps of protein-protein interactions (PPI) is a key step towards a better understanding of how proteins act in concert. Many algorithms developed in Network Science focus on detecting “communities” by partitioning the network topology, but whether and how these methods capture biological functions shared between proteins has not been extensively evaluated. In this work we tested different community detection methods and examined their concordance with biological pathway enrichment and consensus metrics. The entangled structure of biological networks makes graph partitioning challenging for most algorithms. The software OSLOM[1], based on local network structure, was applied for the first time to biological data and proved successful in identifying communities at different resolutions. We find that characteristic GO and KEGG pathway annotations could be identified for most communities, suggesting a connection between network topology and biological functions. We compare PPI data from different databases, experimental sources and ontological evidence. This analysis illustrates the importance of evaluating multiple data sources to compile comprehensive PPI networks[2], and demonstrates an approach potentially useful to predict functions for poorly annotated proteins based on their community partners, or to identify key interactors.
[1] PLoS ONE 6, e18961 (2011).
[2] Sci Rep 5, 8540 (2015).
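Concordance between a detected community and a pathway annotation is often assessed with a one-sided hypergeometric test, though the abstract does not say which test was used. The sketch below is an assumption that shows this common choice. It computes the probability of observing at least the measured overlap between a community and a GO or KEGG gene set by chance. The function name is hypothetical, and the pathway set is assumed to be a subset of the network's proteins.

```python
from math import comb


def hypergeom_enrichment_p(community, pathway, universe_size):
    """One-sided P(X >= k) for the overlap between a community and a pathway gene set.

    universe_size: number of proteins in the network (N)
    pathway:       annotated proteins in the network (K "successes")
    community:     proteins in the community (n draws)
    """
    k = len(community & pathway)
    K, n, N = len(pathway), len(community), universe_size
    # Upper-tail sum of the hypergeometric probability mass function.
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / comb(N, n)
```

With many communities and annotations, these p-values would normally be adjusted for multiple testing before a community is labeled as enriched.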

Characterizing host-virus interactions through host factor curation and network analysis
COSI: NetBio COSI
  • Sean Nesdoly, McGill University, Canada
  • David Sharon, McGill University, Canada
  • Amine Kamen, McGill University, Canada
  • Yu Xia, McGill University, Canada

Short Abstract: The biomolecular interactions that take place between a virus and its host are complex in nature and play a significant role in governing functional and evolutionary dynamics. Moreover, the mechanisms by which viruses infect, replicate in, and otherwise affect their hosts vary across individuals, species, and generations. Therefore, determining what interactions occur, how they evolve, and how important they are in the context of the cellular interactome is of significant value and has yet to be fully established.

To this end, a systems biology approach has been taken to elucidate a comprehensive set of human-influenza genetic and functional interactions. Host genes were curated from the literature and classified, with varying degrees of confidence based on the nature of the interaction, as either being essential for or acting against viral propagation. These were then used as a framework to investigate the human interactome, wherein various biological properties and functional relationships were discovered. From this, a set of antiviral genes has been identified as potential targets for knocking out as a way to increase viral titers in cell-based vaccine manufacturing platforms. This study also extends the knowledgebase for proviral genes, which may be used to develop more effective treatments against viral infection.

Comparing Performance of Module-Detection Methods for Identifying Aging-Related Protein Modules
COSI: NetBio COSI
  • Mary Rachel Stimson, Boston University, United States
  • Marlene Tejeda, Boston University, United States
  • Ethel Nankya, Boston University, United States
  • Devlin Moyer, Boston University, United States
  • Paola Sebastiani, Boston University, United States
  • Gary Benson, Boston University, United States

Short Abstract: While age is an important risk factor for many human diseases and disabilities, the underlying mechanisms are poorly understood. It is well established that different people age differently, and age alone is not a sufficient predictor of an individual's functional status. Recent advances in proteomics have given us additional insight into how people age and what biological factors underlie longevity. By identifying networks of coexpressed proteins in people who have reached extreme ages and remained healthy, we hope to better characterize the mechanisms by which humans can achieve extreme longevity. We analyzed 224 samples consisting of centenarians, their offspring, and unrelated controls using Weighted Gene Co-expression Network Analysis (WGCNA) and Independent Component Analysis (ICA) to identify co-expressed groups of proteins. Both approaches identified 25 modules; however, the methods showed very little concordance between detected modules. No modules were consistently associated with known aging-related genes or pathways. To validate these results, we compared the accuracy of both methods on simulated data across sample sizes using a variant of the Jaccard index. Module recovery of both methods was largely independent of sample size. While WGCNA performed slightly better than ICA, neither method achieved an average similarity greater than 0.65 with the simulated modules.
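The abstract does not specify which Jaccard variant was used. One plausible way to score module recovery on simulated data matches each simulated module to its most similar detected module and then averages those best-match Jaccard similarities. The minimal sketch below shows that variant. It is an assumption, not the authors' exact metric, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two protein sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def module_recovery(simulated, detected):
    """Average over simulated modules of the best Jaccard match among detected modules."""
    if not detected:
        return 0.0
    scores = [max(jaccard(s, d) for d in detected) for s in simulated]
    return sum(scores) / len(scores)


# One module recovered exactly, another partially (2 of 3 proteins).
recovery = module_recovery([{1, 2, 3}, {4, 5, 6}], [{1, 2, 3}, {4, 5}])  # 5/6
```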

Computational modeling of the regulatory network controlling monocyte to dendritic cell differentiation
COSI: NetBio COSI
  • Karen Nuñez-Reza, International Laboratory for Human Genome Research, Universidad Nacional Autónoma de México, Querétaro, México, Mexico
  • Aurélien Naldi, Institut de Biologie de l’Ecole normale supérieure – Université PSL, France
  • Arantza Sanchéz-Jimenez, International Laboratory for Human Genome Research, Universidad Nacional Autónoma de México, Querétaro, México, Mexico
  • Ana V. Leon-Apodaca, International Laboratory for Human Genome Research, Universidad Nacional Autónoma de México, Querétaro, México, Mexico
  • M Angelica Santana, Centro de Investigación en Dinámica Celular (IICBA), Universidad Autónoma del Estado de Morelos, Cuernavaca, México, Mexico
  • Morgane Thomas-Chollier, Computational Systems Biology team, Institut de Biologie de l’Ecole normale supérieure – Université PSL, France
  • Denis Thieffry, Computational Systems Biology team, Institut de Biologie de l’Ecole normale supérieure – Université PSL, France
  • Alejandra Medina-Rivera, International Laboratory for Human Genome Research, Universidad Nacional Autónoma de México, Querétaro, México, Mexico

Short Abstract: In 1994, Sallusto and Lanzavecchia described a protocol to obtain dendritic cells from monocytes (monocyte-derived DCs, moDCs) using a culture medium containing IL4 and CSF2, both of which are required for moDC differentiation and maintenance. IL4 and CSF2 activate the JAK/STAT, NFKB, PI3K, and MAPK signaling pathways. The main transcription factors (TFs) targeted by these two signaling cascades in monocytes are known, but how DC-specific genes are activated remains to be clarified.

We performed a comprehensive literature search to identify the TFs controlling the differentiation of moDCs from monocytes, and we integrated them together with IL4 and CSF2 signaling pathways in a logical model. To complete this model, we further considered experimentally validated and computationally predicted TF-DNA regulatory interactions.

The model was built with the software GINsim and further analysed using the tools integrated into the CoLoMoTo toolbox. The model gives rise to four stable states, each corresponding to one combination of the input cytokines (CSF2 and IL4). We further challenged our model by performing in silico simulations of nine single-gene mutants documented in the literature; the resulting stable states match the outcomes of the corresponding experiments. In conclusion, our logical model analysis supports the roles of 96 novel regulatory interactions.
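In a logical model, a stable state is a state that the update rules map onto itself. The sketch below finds such states by brute-force enumeration. It uses a tiny, entirely hypothetical rule set (IL4 and CSF2 as self-maintained inputs, plus placeholder nodes `STAT6`, `STAT5` and `MODC`), not the published moDC model. Exhaustive enumeration is only feasible for a handful of nodes; GINsim and the CoLoMoTo tools use more efficient methods. Even so, the toy example shows why fixed inputs can yield one stable state per input combination.

```python
from itertools import product

# Hypothetical toy rules; NOT the published moDC model.
RULES = {
    "IL4": lambda s: s["IL4"],        # input node: keeps its value
    "CSF2": lambda s: s["CSF2"],      # input node: keeps its value
    "STAT6": lambda s: s["IL4"],
    "STAT5": lambda s: s["CSF2"],
    "MODC": lambda s: s["STAT6"] and s["STAT5"],  # placeholder differentiation marker
}


def stable_states(rules):
    """Enumerate every Boolean state and keep the fixed points of the update rules."""
    names = list(rules)
    fixed = []
    for values in product([False, True], repeat=len(names)):
        state = dict(zip(names, values))
        if all(bool(rules[n](state)) == state[n] for n in names):
            fixed.append(state)
    return fixed


states = stable_states(RULES)  # four stable states, one per (IL4, CSF2) combination
```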

Computing, analyzing and visualizing genome-scale metabolic flux networks with Fluxer
COSI: NetBio COSI
  • Archana Hari, University of Maryland Baltimore County, United States
  • Daniel Lobo, University of Maryland Baltimore County, United States

Short Abstract: Genome-scale metabolic models not only represent the biochemical circuits within cells but can also be used to simulate and predict cellular phenotypes. However, there is a lack of user-friendly tools to visualize the metabolic contents of these genome-scale models and simulate their flux behaviors. To address this, we have developed Fluxer (www.fluxer.umbc.edu), a freely available, novel web application with an easy-to-use interface for the simulation and visualization of genome-scale metabolic flux networks. The application can take as input any metabolic model encoded in the Systems Biology Markup Language (SBML) format, automatically perform Flux Balance Analysis and compute different flux graphs. The flux networks can be visualized as spanning trees and complete graphs with different layouts. The interactive graphs can be used to study the major pathways contributing to any metabolic reaction or to the biosynthesis of any metabolite, as well as to simulate reaction knockouts. In addition, Fluxer can compute the k-shortest paths between two reactions or metabolites within the model. Nodes can display detailed metabolic and reaction information, including molecular weights, reaction fluxes, and molecular structures. Over 80 whole-genome metabolic reconstructions are readily available for visualization and analysis. Fluxer enables efficient analysis and visualization of genome-scale metabolic models towards the discovery of key metabolic pathways.
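The k-shortest-paths query can be sketched on a small weighted graph. The example below uses best-first search over partial simple paths. With non-negative edge weights, complete paths reach the target in non-decreasing cost order, so the first k arrivals are the k shortest loopless paths. This is not necessarily what Fluxer implements: the approach is exponential in the worst case, and Yen's algorithm scales better. The node names and edge weights are hypothetical.

```python
import heapq


def k_shortest_paths(graph, source, target, k):
    """Return up to k shortest simple paths as (cost, path) pairs.

    graph: dict node -> dict neighbour -> non-negative edge weight.
    """
    heap = [(0.0, [source])]
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append((cost, path))
            continue
        for nxt, w in graph.get(node, {}).items():
            if nxt not in path:  # keep paths simple (no repeated nodes)
                heapq.heappush(heap, (cost + w, path + [nxt]))
    return found


# Hypothetical metabolite graph with illustrative edge costs.
graph = {
    "glc": {"g6p": 1.0},
    "g6p": {"f6p": 1.0, "6pg": 2.0},
    "f6p": {"pyr": 2.0},
    "6pg": {"pyr": 3.0},
}
paths = k_shortest_paths(graph, "glc", "pyr", 2)
```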

Construction and Visualization of Diseasome of Lung Diseases Associated with COVID-19 from Co-association Networks of Multi-omics Data
COSI: NetBio COSI
  • Reddy Rani Vangimalla, International Institute of Information Technology Bangalore, India
  • Jaya Sreevalsan-Nair, Graphics-Visualization-Computing Lab, IIIT Bangalore, India

Short Abstract: Studying protein-protein interactions is important for identifying appropriate drugs to fight disease. COVID-19 is a global pandemic that, since December 2019, has affected ~4.5 million people in 213 countries, with ~0.3 million fatalities. In this work, we propose a multi-omics and multi-microarray analysis of comorbidities associated with COVID-19, such as pneumonia and acute respiratory distress syndrome (ARDS), and of related lung diseases, namely lung squamous cell carcinoma (LSCC) and tuberculosis. We propose to build correlation matrices between significant genes, which are filtered and collated from the multi-omics data of each disease, i.e., DNA methylation features and mRNA gene expression. Using a consensus voting method to aggregate the correlation networks per disease, we build a diseasome linking genes and diseases. Our goal is then to identify appropriate genes, as well as gene pairs, to target for drug therapy based on the findings from comorbidities associated with COVID-19. For example, ACE-2 has been identified as a prominent gene of interest in COVID-19; identifying ACE-2 and similar genes in the diseasome reveals the extent of their association with related diseases. The visualization of the diseasome provides insights into the co-association of genes and gene pairs with these diseases.
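Aggregating per-dataset correlation networks by consensus voting can be sketched as follows. The abstract does not give the correlation measure, the threshold or the voting rule. This sketch therefore assumes Pearson correlation, an absolute-correlation threshold of 0.8, and a simple majority vote across datasets (for example, the methylation and expression datasets of one disease). All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def correlation_edges(data, threshold):
    """Gene pairs whose |r| meets the threshold (data: gene -> values across samples)."""
    return {
        (g, h)
        for g, h in combinations(sorted(data), 2)
        if abs(pearson(data[g], data[h])) >= threshold
    }


def consensus_network(datasets, threshold=0.8, min_votes=None):
    """Keep an edge if it appears in at least `min_votes` datasets (default: majority)."""
    if min_votes is None:
        min_votes = len(datasets) // 2 + 1
    votes = {}
    for data in datasets:
        for edge in correlation_edges(data, threshold):
            votes[edge] = votes.get(edge, 0) + 1
    return {edge for edge, v in votes.items() if v >= min_votes}
```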

Data-driven biological network alignment that uses topological, sequence, and functional information
COSI: NetBio COSI
  • Shawn Gu, University of Notre Dame, United States
  • Tijana Milenkovic, University of Notre Dame, United States

Short Abstract: Network alignment (NA) can transfer functional knowledge between conserved biological network regions of different species. Traditional NA assumes that topological similarity (isomorphic-like matching) between network regions corresponds to the regions' functional relatedness. However, we recently found that functionally unrelated proteins are as topologically similar as functionally related proteins. Therefore, we redefined NA as a data-driven framework, TARA, which learns from network and protein functional data what kind of topological relatedness (rather than similarity) between proteins corresponds to their functional relatedness. TARA used topological information (within each network) but not sequence information (between proteins across networks). Yet TARA yielded higher protein functional prediction accuracy than existing NA methods, even those that used both topological and sequence information. Here, we propose TARA++, which is data-driven like TARA (and unlike other existing methods), but which, unlike TARA, uses across-network sequence information on top of within-network topological information. To handle this within-and-across-network analysis, we adapt social network embedding to the problem of biological NA. TARA++ outperforms existing methods in protein functional prediction accuracy.

deepHPI: Comparative deep learning framework for the efficient prediction of host-pathogen protein-protein interactions using sequence-based features
COSI: NetBio COSI
  • Cristian Loaiza, Center for Integrated BioSystems, Utah State University, United States
  • Nicholas Flann, Dept of Computer Science, Utah State University, United States
  • Rakesh Kaundal, Bioinformatics Facility, Center for Integrated BioSystems, Utah State University, United States

Short Abstract: Host-pathogen protein-protein interactions (HPIs) play vital roles in several biological processes and are directly involved in infectious diseases. It is crucial to understand their mechanisms and uncover potential targets for developing therapeutics. Beyond single-species protein-protein interaction (PPI) prediction, no comprehensive analysis has been attempted to model HPIs on a genome scale.
We compared different machine learning methods, such as support vector machines (SVM), artificial neural networks (ANN) and deep learning-based convolutional neural networks (CNN), for predicting HPIs with high accuracy. Several sequence-based features were tested, including autocorrelation, dipeptide composition, conjoint triad, quasi-order and one-hot encoding. Training and independent test datasets were built from HPIDB and Negatome, which provided positive (interacting) and negative (non-interacting) examples, respectively.
Most of the models performed well, with independent-test sensitivity ranging from 84.6% to 99.5%, specificity from 56.1% to 98.8%, and MCC from 0.61 to 0.91. We found that Negatome is not an ideal dataset for inter-species predictions, and we propose a novel approach to generate a more suitable non-interaction dataset. Among the methods tested, deep learning looks promising. The best prediction models have been implemented in a web server, deepHPI, for four host-pathogen data types: plant-pathogen, human-bacteria, human-virus and animal-pathogen. deepHPI is freely available at bioinfo.usu.edu/deepHPI/
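Among the sequence features listed, the conjoint triad encoding is easy to sketch. Each amino acid is mapped to one of seven classes, using the grouping commonly attributed to Shen et al. (2007), and the frequencies of all 7 × 7 × 7 = 343 consecutive class triads are counted. The sketch makes three assumptions. It normalizes by the total number of triads (the original formulation uses a different normalization). It drops non-standard residues. It gives no guidance on how deepHPI combines host and pathogen vectors, although concatenating the two 343-dimensional vectors is a common choice.

```python
# Seven amino-acid classes commonly used for the conjoint triad encoding.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}


def conjoint_triad(sequence):
    """343-dimensional vector of normalised frequencies of consecutive class triads."""
    # Non-standard residues are dropped (an assumption of this sketch).
    classes = [AA_TO_CLASS[aa] for aa in sequence.upper() if aa in AA_TO_CLASS]
    counts = [0] * 343
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        counts[a * 49 + b * 7 + c] += 1  # index of triad (a, b, c) in base 7
    total = sum(counts)
    return [x / total for x in counts] if total else counts


features = conjoint_triad("AGVC")  # triads (0,0,0) and (0,0,6), each with frequency 0.5
```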

Discrete model of macrophage simulates six clinically relevant activation states
COSI: NetBio COSI
  • Adam Butchy, University of Pittsburgh, United States
  • Natasa Miskov-Zivanov, University of Pittsburgh, United States
  • Cheryl Telmer, Carnegie Mellon University, United States

Short Abstract: Macrophages have context-dependent phenotypes, with a variety of activation states contingent on the cytokines and chemokines in their extracellular environment. This plasticity makes them a challenge for network analysis and modeling. A recent transcriptomic network analysis suggested a spectrum of macrophage activation states beyond the canonical M1/M2 polarization model. We previously developed a discrete model of intracellular signaling in macrophages, and here we used the transcriptomic data to validate this model. We identified six induction signals from the data, with distinct and clinically relevant activation states. For each induction signal, we found key response pathways and elements. Stochastic model simulations, conducted for each induction scenario, provided information about changes in the behavior of key model elements over time. We then assessed model performance by comparing element behavior against the transcriptome data. Of the 28 model elements found in the transcriptome data, 23 matched the data: the macrophage model reproduces the response to three induction signals, while for the other three signals it diverges slightly from the expected behavior. These results suggest that our model is comprehensive and can therefore be useful for studying macrophage responses to changes in their microenvironment.

Dynamic Microbial Association Networks in the Ocean
COSI: NetBio COSI
  • Ina Maria Deutschmann, Institute of Marine Sciences (ICM-CSIC), Spain
  • Ramiro Logares, ICM-CSIC, Spain

Short Abstract: Background
Ecological interactions among microbes are fundamental for ecosystem function, but they remain poorly known. High-throughput omics can help predict microbial interactions through association networks, which are often dense and static. Yet microbial interactomes are highly dynamic. Here, we investigate microbial association networks through time, aiming to improve our understanding of network dynamics, which can lead to a better comprehension of marine microbial ecosystems.

Method and Results
We developed a novel method to obtain dynamic networks and applied it to ten years of marine microbial community composition data (16S and 18S rRNA genes), which allowed us to reconstruct one network per month. The median Jaccard similarity between networks, computed pairwise from their edge sets, suggests low overall similarity (J_all = 0.15, range = [0.01, 0.94]). Yet similarity increases when networks are compared within seasons ([J_winter, J_spring, J_summer, J_fall] = [0.59, 0.20, 0.46, 0.27]) or within the same calendar month (J_Apr = 0.21 to J_Jan = 0.76), pointing to a seasonally dependent recurrence in network architecture ranging from modest to moderate.

Conclusions
We quantified network recurrence in a model marine microbial ecosystem and found a low number of interseasonal recurrent edges but a modest to moderate number of intraseasonal ones. This suggests that ocean ecosystems require a moderate amount of recurring microbial interactions to function.
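The network comparison described under Method and Results reduces to a Jaccard index on edge sets, followed by a median over all network pairs. A minimal sketch follows. It assumes undirected edges and ignores association signs and weights. Network inference is not shown, and the function names are hypothetical. Intraseasonal values would come from applying the same median to the subset of monthly networks from one season.

```python
from itertools import combinations
from statistics import median


def edge_jaccard(edges_a, edges_b):
    """Jaccard similarity between two undirected edge sets."""
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def median_pairwise_jaccard(networks):
    """Median Jaccard similarity over all pairs of networks (e.g. one per month)."""
    return median(edge_jaccard(x, y) for x, y in combinations(networks, 2))


nets = [[("a", "b"), ("b", "c")], [("b", "a")], [("c", "d")]]
overall = median_pairwise_jaccard(nets)  # pairwise values 0.5, 0.0, 0.0 -> median 0.0
```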

Enhanced Quality Control for Kinome Microarrays
COSI: NetBio COSI
  • Connor Denomy, University of Saskatchewan, Canada
  • Scott Napper, University of Saskatchewan, Canada
  • Anthony Kusalik, University of Saskatchewan, Canada

Short Abstract: High-throughput analysis of kinase activity is a useful tool for characterizing the kinases involved in disease and cellular processes. Profiling the network of kinase interactions using microarrays has been used in a variety of applications. Unfortunately, statistical noise and other sources of error often make results difficult to interpret, leading to indecipherable results or incorrect conclusions. Here we present an extension of existing kinome microarray analysis technology that adds quality control metrics to help researchers identify the most important results from their kinome microarray experiments. Two important quality control metrics have been developed. First, error caused by improper software image alignment can be measured using the difference between the background-corrected median and mean for each probe, prompting researchers to perform manual realignment to improve the experimental results. Second, outlier arrays are identified by inter-array comparisons using the mean absolute difference between the data from all corresponding probes between arrays. These metrics can demonstrably improve kinome microarray results by identifying low-quality data and reducing batch effects, allowing researchers to make more reliable decisions on how to proceed. Applying these corrections improves the clustering of treatment groups and the reproducibility of experiments.
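The two quality control metrics can be sketched as follows, under stated assumptions. For the first metric, the sketch takes the per-probe background-corrected median and mean as inputs, as exported by image analysis software, and flags probes whose values differ by more than a hypothetical threshold `max_diff`. For the second, it computes the mean absolute difference (MAD) between corresponding probes of every pair of arrays. It then flags arrays whose average MAD to the others exceeds a hypothetical multiple (`factor`) of the median. The cutoffs and function names are illustrative, not the authors' values.

```python
from statistics import mean, median


def misalignment_flags(corrected_median, corrected_mean, max_diff):
    """Flag probes whose background-corrected median and mean differ by more than max_diff."""
    return [abs(m - a) > max_diff for m, a in zip(corrected_median, corrected_mean)]


def mean_abs_diff(array_a, array_b):
    """Mean absolute difference between corresponding probes of two arrays."""
    return mean(abs(a - b) for a, b in zip(array_a, array_b))


def outlier_arrays(arrays, factor=2.0):
    """Flag arrays whose average MAD to all other arrays exceeds factor x the median."""
    names = list(arrays)
    avg = {
        n: mean(mean_abs_diff(arrays[n], arrays[m]) for m in names if m != n)
        for n in names
    }
    cutoff = factor * median(avg.values())
    return [n for n in names if avg[n] > cutoff]


arrays = {"a1": [1, 2, 3], "a2": [1, 2, 3.5], "a3": [1.2, 2, 3], "bad": [10, 20, 30]}
outliers = outlier_arrays(arrays)  # ["bad"]
```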

Estimating Dispensable Content in the Human Interactome
COSI: NetBio COSI
  • Mohamed Ghadie, McGill University, Canada
  • Yu Xia, McGill University, Canada

Short Abstract: Protein-protein interaction (PPI) networks (interactome networks) have successfully advanced our knowledge of molecular function, disease and evolution. While much progress has been made in quantifying errors and biases in experimental PPI datasets, it remains unknown what fraction of the error-free PPIs in the cell is completely dispensable, i.e., effectively neutral upon disruption. Here, we estimate dispensable content in the human interactome by calculating the fractions of PPIs disrupted by neutral and non-neutral mutations. First, starting from the human reference interactome determined by experiments, we construct a human structural interactome by building homology-based three-dimensional structural models for PPIs. Next, we map common mutations from healthy individuals as well as Mendelian disease-causing mutations onto the human structural interactome, and perform structure-based calculations of how these mutations perturb the interactome. Third, we integrate these results to calculate the probabilities for common mutations (assumed to be neutral) and disease-causing mutations (assumed to be non-neutral) to disrupt human PPIs. Finally, we apply Bayes’ theorem to calculate the probabilities for human PPIs to be neutral or non-neutral upon disruption. Using both our predicted and experimentally determined interactome perturbation patterns for common and disease mutations, we estimate that at most about 20% of the human interactome is completely dispensable.
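The Bayes' theorem step can be written out under assumed notation. This is a generic formulation, not necessarily the authors' exact model. Let D be the event that a mutation disrupts a PPI and N the event that the mutation is neutral. P(D | N) can then be estimated from common mutations, and P(D | \bar N) from disease-causing mutations. The prior P(N), the fraction of mutations that are neutral, is an assumed input that the abstract does not give.

```latex
P(N \mid D) = \frac{P(D \mid N)\,P(N)}{P(D \mid N)\,P(N) + P(D \mid \bar{N})\,\bigl(1 - P(N)\bigr)},
\qquad
P(\bar{N} \mid D) = 1 - P(N \mid D)
```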

Evaluating the Significance of Disease Transitions in Regulatory Networks
COSI: NetBio COSI
  • James Lim, University of Arizona, United States
  • Chen Chen, University of Arizona, United States
  • Megha Padi, The University of Arizona, United States

Short Abstract: The use of biological networks such as protein-protein interaction and transcriptional regulatory networks is becoming an integral part of biological research in the genomic era. These networks are not static; during phenotypic transitions like disease onset, they can acquire new “communities” of genes that carry out key cellular processes. Changes in community structure can be detected by optimizing a modularity-based score, but because biological systems are inherently noisy, it remains a challenge to evaluate whether these changes truly represent a coordinated cellular response, or whether they appear by random chance. Here, we introduce Constrained Random Alteration of Network Edges (CRANE), a computational method for sampling networks with fixed node strengths, and we use this null distribution to assess the robustness of observed changes in network structure. In contrast with common approaches like consensus clustering or generative models, CRANE produces more biologically realistic networks and performs better in simulations. When applied to breast and ovarian cancer networks, CRANE improves the recovery of cancer-relevant GO terms while reducing the signal from non-specific housekeeping processes. CRANE is a general tool that can be applied in tandem with a variety of stochastic community detection methods to evaluate the veracity of their results.
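Sampling networks with fixed node strengths can be illustrated for a bipartite weighted network, such as a TF-by-gene regulatory matrix, where node strengths are row and column sums. Each random "checkerboard" move adds a random amount to two diagonal cells of a 2 x 2 submatrix and subtracts the same amount from the other two. This leaves every row and column sum unchanged. The sketch below is a generic strength-preserving randomization, not CRANE's published procedure. The bipartite matrix layout, step size and function name are assumptions.

```python
import random


def perturb_preserving_strengths(weights, n_moves=1000, step=0.1, seed=0):
    """Randomly perturb a non-negative bipartite weight matrix, keeping row/column sums.

    weights: list of rows (e.g. TFs x genes), at least 2 x 2.
    """
    rng = random.Random(seed)
    w = [row[:] for row in weights]  # do not modify the input matrix
    n_rows, n_cols = len(w), len(w[0])
    for _ in range(n_moves):
        i, j = rng.sample(range(n_rows), 2)
        k, l = rng.sample(range(n_cols), 2)
        # Cap the shift so the two decreased entries stay non-negative.
        delta = rng.uniform(0, min(w[i][l], w[j][k], step))
        w[i][k] += delta
        w[j][l] += delta
        w[i][l] -= delta
        w[j][k] -= delta
    return w


W = [[0.5, 0.2, 0.3], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]]
P = perturb_preserving_strengths(W, n_moves=500)
```

Repeating the perturbation with different seeds gives a null distribution against which changes in community structure can be compared.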

Expansion of Reactome Functional Interaction Network to Allow Exposure of Knowledge Space of Understudied Proteins in the Context of Biological Pathways
COSI: NetBio COSI
  • Nasim Sanati, Oregon Health and Science University, United States
  • Timothy Brunson, Oregon Health and Science University, United States
  • Solomon Shorser, Ontario Institute for Cancer Research, Canada
  • Lisa Matthews, NYU School of Medicine, United States
  • Robin Haw, Ontario Institute for Cancer Research, Canada
  • Peter D'Eustachio, New York University School of Medicine, United States
  • Guanming Wu, Oregon Health and Science University, United States
  • Lincoln Stein, Ontario Institute for Cancer Research, Canada

Short Abstract: Currently, one of the major overarching questions in research is “how to match drug response with omics?”. Even in the era of supercomputing and massive data flows, a small fraction of the human genome is still understudied. Designing clinically actionable therapeutics targeting proteins encoded by these understudied genes requires an expansion of knowledge space. Reactome is the most comprehensive, open-source biological pathway knowledgebase, widely used for pathway analysis and visualization. As part of the Cutting Edge Informatics Tools program of the NIH Illuminating the Druggable Genome (IDG) Consortium, we are expanding the Reactome Functional Interaction Network to provide a pathway and network knowledge space for these understudied proteins. We have collected more than 100 highly reliable protein pairwise relationship data sources, including tissue/cancer-specific gene coexpressions from GTEx and TCGA, gene similarities from Harmonizome, and protein-protein interactions from StringDB, BioGrid, and BioPlex. We are developing a machine learning approach to integrate these data sources to predict functional relationships between understudied proteins and Reactome-annotated proteins. Placing these understudied proteins in the context of Reactome pathways will facilitate the generation of hypotheses for new potential targetable proteins. We have made the data accessible and navigable via our new Reactome IDG portal, idg.reactome.org.

FunCoup: Network of functional gene associations
COSI: NetBio COSI
  • Miguel Castresana Aguirre, Stockholm University, Sweden
  • Emma Persson, Stockholm University, Sweden
  • Erik L.L. Sonnhammer, Stockholm University, Sweden

Short Abstract: FunCoup is one of the most comprehensive databases of genome-wide functional association networks. It uses a redundancy-weighted Bayesian integration approach to combine 11 data types, augmented by orthology transfer across species. This approach provides both high coverage of the included genomes and high quality of the inferred interactions. Identifying these functional couplings is an important step towards understanding the higher-level mechanisms carried out by complex cellular processes. The latest release (4.1) of FunCoup contains networks for 17 species and is, to our knowledge, the only network analysis website that offers “comparative interactomics”, i.e. visual analysis of network conservation between species. This update features improved networks and network-based gene prioritization with the MaxLink algorithm.

Gene Embeddings of Complex Network (GECo) and hypertension disease gene classification
COSI: NetBio COSI
  • Cagatay Dursun, Medical College of Wisconsin and Marquette University, United States
  • Jennifer R. Smith, Rat Genome Database, Medical College of Wisconsin, United States
  • Serdar Bozdag, Marquette University, United States

Short Abstract: Diseases such as hypertension, cancer and diabetes cause nearly 70% of deaths in the U.S. Such complex diseases involve multiple genes and their interactions with environmental factors. Therefore, identifying the genetic factors that underlie these diseases, in order to understand them and reduce their morbidity, is an important and challenging task. With the generation of unprecedented amounts of multi-omics data, network-based methods have become popular for representing multilayered, complex molecular interactions. In particular, network embeddings, i.e. low-dimensional representations of the nodes in a network, have been used for gene function prediction. Most network embedding methods, however, cannot integrate multiple types of datasets on genes and phenotypes. This is an important limitation, as multi-omics data integration alleviates issues due to missing data or lack of context-specific data. To address this limitation, we developed a network embedding algorithm named GECo that can utilize multilayered heterogeneous networks of genes and phenotypes. We evaluated the performance of GECo in classifying hypertension-related genes using genotypic and phenotypic datasets from Rattus norvegicus. Our method significantly outperformed state-of-the-art network embedding methods, achieving an AUC of 95.17% compared with 85.87% for the second-best performer.

Global network alignment for the identification of conserved Alzheimer-related pathways in Homo sapiens and Caenorhabditis elegans
COSI: NetBio COSI
  • Avgi Apostolakou, National and Kapodistrian University of Athens, Greece
  • Xhuliana Sula, National and Kapodistrian University of Athens, Greece
  • Katerina Nastou, National and Kapodistrian University of Athens, Greece
  • Georgia Nasi, National and Kapodistrian University of Athens, Greece
  • Christos Panagopoulos, BioAssist S.A., Greece
  • Ilias Maglogiannis, University of Piraeus, Greece
  • Vassiliki Iconomidou, National and Kapodistrian University of Athens, Greece

Short Abstract: Alzheimer disease (AD) is a neurodegenerative disorder whose study often relies on model organisms such as Caenorhabditis elegans. In this work we performed in silico construction and comparison of Alzheimer-related protein-protein interaction (PPI) networks in H. sapiens and C. elegans. The aim was to discover biological pathways implicated in AD processes that are conserved in both organisms. PPI networks were created for amyloid precursor protein (APP) and Tau in H. sapiens (proteins crucial for the progression of AD) and for their C. elegans orthologs APL-1 and PTL-1. Global network alignment was used to compare the networks, which allowed the discovery of similar, possibly conserved, network regions. Two prominent, highly conserved pathways implicated in AD are present in both organisms: the APP-processing and the Tau-phosphorylation pathways. The protein interactions in these pathways have been associated with AD in humans, yet remain mostly unexplored in C. elegans. Many proteins implicated in these pathways, as well as their interactions, could serve as targets for experimental studies in C. elegans, leading to an enhanced understanding of the mechanisms governing AD.

Graph-structured Feature Selection with Normalized Trend Filtering with Applications to Cancer Studies
COSI: NetBio COSI
  • Govinda Kamath, Microsoft Research New England Labs, United States
  • Hugh Yeh, Microsoft Research, United States
  • Ifrah Tariq, Massachusetts Institute of Technology, United States
  • Ernest Fraenkel, Massachusetts Institute of Technology, United States
  • Lester Mackey, Microsoft Research, United States

Short Abstract: A canonical learning task for cancer patient data is identifying a set of features predictive of survival.
Unfortunately, modern day clinical datasets are massively imbalanced with tens of thousands of features but only dozens of patients.
However, prior knowledge that should be useful in identifying the most relevant features is often available. In many cases, this knowledge can be described by a graph whose nodes represent proteins or genes and whose weighted edges indicate interactions reported in previous experimental studies.
We propose to embed this prior knowledge into a family of sparsity-inducing regularizers based on the normalized Laplacian of the graph.
We call the resulting regression normalized trend filtering (NTF) in homage to the graph trend filtering work of Wang and Tibshirani in 2016.
We show that NTF can be efficiently solved using standard Lasso, SCAD, and MCP solvers and prove that for graphs with heterogeneous degree distributions, NTF enjoys significantly improved recovery guarantees over standard graph trend filtering.
Finally, we demonstrate that NTF yields both more accurate survival prediction and more interpretable feature selection than the Lasso, Laplacian ridge regression, and standard graph trend filtering on a suite of cancer datasets with heterogeneous-degree knowledge graphs.
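
For readers unfamiliar with the normalized Laplacian, the sketch below computes L = I − D^(−1/2) W D^(−1/2) from a weighted gene/protein edge list in plain Python. It is only the matrix on which the NTF regularizers are based; the exact NTF penalty and its reduction to Lasso, SCAD and MCP solvers are not reproduced here, and the function name is hypothetical.

```python
import math

def normalized_laplacian(n, weighted_edges):
    """Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2} of an
    undirected weighted graph on nodes 0..n-1, as a dense list of lists.
    Assumes no self-loops; weighted_edges holds (i, j, weight) triples."""
    W = [[0.0] * n for _ in range(n)]
    for i, j, w in weighted_edges:
        W[i][j] += w
        W[j][i] += w
    deg = [sum(row) for row in W]  # weighted degree of each node
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # isolated nodes get an all-zero row by convention
                L[i][j] = 1.0 if deg[i] > 0 else 0.0
            elif W[i][j] != 0.0:
                L[i][j] = -W[i][j] / math.sqrt(deg[i] * deg[j])
    return L
```

For the path 0–1–2 with unit weights, the off-diagonal entry between nodes 0 and 1 is −1/√2, reflecting the degree normalization that distinguishes this matrix from the unnormalized Laplacian used in standard graph trend filtering.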

Hypergraphs for predicting essential genes using multiprotein complex data
COSI: NetBio COSI
  • Florian Klimm, Imperial College London, United Kingdom
  • Charlotte Deane, University of Oxford, United Kingdom
  • Gesine Reinert, University of Oxford, United Kingdom

Short Abstract: Protein–protein interactions are crucial in many biological pathways and facilitate cellular function. Investigating these interactions as a graph of pairwise interactions can help to gain a systemic understanding of cellular processes. It is known, however, that proteins interact not exclusively in pairs but also in polyadic interactions, and they can form multiprotein complexes, which are stable interactions between multiple proteins. In this study, we use hypergraphs to investigate multiprotein complex data. We find that projecting a hypergraph of polyadic interactions onto a graph of pairwise interactions identifies different proteins as hubs than the hypergraph does. In our data set, the hypergraph degree is a more accurate predictor of gene essentiality than the degree in the pairwise graph. We also find that analysing a hypergraph as a pairwise graph drastically changes the distribution of the local clustering coefficient. Furthermore, representing multiprotein complex data with pairwise interactions may introduce a hierarchical structure that is not observed in the hypergraph. Hence, we illustrate that hypergraphs can be more suitable than pairwise graphs for the analysis of multiprotein complex data.
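
The contrast between hypergraph degree and pairwise-graph degree can be made concrete with a small sketch. Complexes are given as lists of protein names, and the pairwise graph is the usual clique expansion, in which every complex becomes a clique. The function names and the toy complexes in the usage note are invented for illustration and are not the authors' data or code.

```python
from collections import defaultdict

def hypergraph_degree(complexes):
    """Number of complexes (hyperedges) that each protein belongs to."""
    deg = defaultdict(int)
    for members in complexes:
        for protein in set(members):
            deg[protein] += 1
    return dict(deg)

def projected_degree(complexes):
    """Degree in the pairwise projection (clique expansion): every complex
    becomes a clique, and a protein's degree is its number of distinct partners."""
    partners = defaultdict(set)
    for members in complexes:
        member_set = set(members)
        for protein in member_set:
            partners[protein] |= member_set - {protein}
    return {protein: len(s) for protein, s in partners.items()}
```

With the toy complexes [A, B, C, D, E, F], [G, H], [G, I] and [G, J], protein G has the highest hypergraph degree (3), while A to F each have the highest pairwise degree (5); the two representations therefore nominate different hubs.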

Identifying dysregulation of protein complexes in tumour cells from brain tumour proteomes
COSI: NetBio COSI
  • Theodore B. Verhey, University of Calgary, Canada
  • A. Sorana Morrissy, University of Calgary, Canada

Short Abstract: Glioblastoma (GBM) is the most common and lethal brain tumour with few treatment options. Whole genome sequencing and RNA-Seq have enabled the discovery of many genomic variants, but interpretation of their impact has lagged. Recent advances in proteomics such as multiplexed mass spectrometry have allowed for sensitive relative quantification of proteins, offering insight into the phenotype of genomic variants. In cancer cell lines, it has been shown that this proteome data is uniquely able to identify protein complexes through the high correlation of member proteins. However, protein complexes have not been reliably detected from clinical tumour proteomes due to confounding sources of covariation such as tumour purity. Extending an existing method that infers tumour and normal covariance networks, we identify networks of peptides that are highly correlated in the tumour cell population. We use gene models and protein complex databases to interpret modules of covariant peptides across our cohort of glioblastoma patients, including isoform-specific correlation, which is a major source of regulation in the brain. Since protein complexes are regulated post-transcriptionally and execute many context-dependent cellular functions, detecting their dysregulation would allow for identification of a new class of aberrations that could provide novel targets for therapy.

idg.reactome.org: A web-based platform for visualizing dark proteins in the context of Reactome pathways
COSI: NetBio COSI
  • Tim Brunson, Oregon Health and Science University, United States
  • Nasim Sanati, Oregon Health and Science University, United States
  • Solomon Shorser, Ontario Institute for Cancer Research, Canada
  • Lisa Matthews, NYU School of Medicine, United States
  • Robin Haw, Ontario Institute for Cancer Research, Canada
  • Peter D'Eustachio, New York University School of Medicine, United States
  • Guanming Wu, Oregon Health and Science University, United States
  • Lincoln Stein, Ontario Institute for Cancer Research, Canada

Short Abstract: Placing understudied proteins in the context of biological pathways facilitates the generation of experimentally testable hypotheses to infer potential functions of these proteins. The Reactome Pathway Diagram View is a web-based tool, providing a biologist-friendly way to visualize proteins, complexes, and reactions in high-quality Reactome pathways. To place understudied proteins in the context of Reactome pathways, we have extended the Pathway Diagram View to overlay tissue-specific expression data, protein pairwise relationships, and drug/target interactions. We implemented new interfaces that allow users to select tissue-specific mRNA and protein expression data from 19 data sources collected in the Target Central Resource Database (TCRD). The pairwise relationship overlay allows users to display positive and negative relationships from multiple sources in the same view. We have also implemented a new visualizer using Cytoscape.js, which allows a pathway to be displayed as a set of functional interactions. Drugs can be overlaid in our pairwise view and in the new functional interaction visualizer. The new features we have introduced in the Reactome Pathway Diagram View pave the way for us to predict and visualize functions of understudied proteins based on Reactome pathways.

ImmunoGlobe: enabling systems immunology with a manually curated intercellular immune interaction network
COSI: NetBio COSI
  • Michelle Atallah, Stanford University, United States
  • Parag Mallick, Stanford University, United States

Short Abstract: While technological advances have made it possible to profile the immune system at high resolution, translating high-throughput data into knowledge of immune mechanisms has been challenged by the complexity of the interactions underlying immune processes. Tools to explore the immune network are critical in predicting the outcome of complex immune interactions, and can provide mechanistic insight that allows us to precisely modulate immune responses in health and disease. Development of these tools will require a standardized network map of immune interactions. To facilitate this we have developed ImmunoGlobe, a manually curated intercellular immune interaction network extracted from Janeway’s Immunobiology textbook.
ImmunoGlobe is the first graphical representation of the immune interactome and comprises 253 immune system components and 1112 unique immune interactions with detailed functional and characteristic annotations. Analysis of this network shows that it recapitulates known features of the human immune system and can be used to uncover novel multi-step immune pathways, examine species-specific differences in immune processes, and predict the response of immune cells to stimuli. ImmunoGlobe is publicly available through a user-friendly interface at www.immunoglobe.org and can be downloaded as a computable graph and network table.

Inferring a causal gene regulatory network for yeast
COSI: NetBio COSI
  • Adriaan-Alexander Ludl, University of Bergen, Norway
  • Tom Michoel, University of Bergen, Norway

Short Abstract: With data on both genetic variation and gene expression now available for individual organisms in population panels it becomes possible to improve the statistical methods used to infer gene regulatory networks (GRN). Causal inference methods using genetic variants as instrumental variables or in mediation-based statistical tests have been developed, but simulations suggest that large sample sizes are needed to reach their full potential. Here, we present a systematic analysis of causal GRNs inferred from public gene expression data for a panel of 1012 segregants from crosses of two strains of S. cerevisiae, the most complete dataset currently available for studying the effects of genetic variation on gene expression in any species. Comparison to networks of known transcriptional regulatory interactions in yeast confirms that instrumental variable methods outperform mediation-based causal inference, and that causal inference can resolve the structure of gene regulatory networks underlying genetic hotspot regions. Our findings can act as a blueprint for the application of causal network inference methods in other species, where fewer validation data exist.
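
The simplest instrumental-variable estimator, the Wald ratio, shows why genetic variants help with causal inference: a cis-eQTL genotype z of gene A is used as an instrument for A's effect on gene B, which removes the bias that an unobserved confounder u introduces into ordinary regression. This is a textbook estimator used for illustration, not necessarily the method evaluated in this work, and the simulation parameters (causal effect 0.5, unit-variance noise) are hypothetical.

```python
import random

def wald_ratio(z, x, y):
    """Instrumental-variable (Wald ratio) estimate of the causal effect of x on y,
    using genotype z as the instrument: cov(z, y) / cov(z, x)."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    cov_zx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    cov_zy = sum((a - mz) * (c - my) for a, c in zip(z, y))
    return cov_zy / cov_zx

def ols_slope(x, y):
    """Naive regression slope of y on x (biased under confounding)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def simulate(n=1012, effect=0.5, seed=0):
    """Toy data: haploid segregants carry one of two parental alleles (z in {0, 1});
    u is an unobserved confounder acting on both genes; A -> B with `effect`."""
    rng = random.Random(seed)
    z = [rng.choice([0, 1]) for _ in range(n)]
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    a = [zi + ui + rng.gauss(0.0, 1.0) for zi, ui in zip(z, u)]
    b = [effect * ai + ui + rng.gauss(0.0, 1.0) for ai, ui in zip(a, u)]
    return z, a, b
```

With these settings, the naive slope of B on A is inflated by the confounder (its expectation is about 0.94), while the Wald ratio should land near the true value of 0.5.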

Inflammatory pathway analysis in immune cells combining single-cell transcriptomics and network biology approaches
COSI: NetBio COSI
  • Lejla Potari-Gul, Earlham Institute, United Kingdom
  • Dezso Modos, Earlham Institute, United Kingdom
  • Matthew Madgwick, Earlham Institute, United Kingdom
  • Agatha Treveil, Earlham Institute, United Kingdom
  • Nicholas Powell, Imperial College London, United Kingdom
  • Tamas Korcsmaros, Earlham Institute, United Kingdom

Short Abstract: The NKG2D receptor is one of the master regulators of the cell-mediated immune response. It is usually expressed on NK and Th17 cells; however, it also appears on other cell types in inflammation or disease, such as inflammatory bowel disease (IBD). NKG2D and its ligands are attractive new targets for IBD therapies, although the downstream effects of NKG2D signalling are not completely understood.
We constructed an NKG2D signalling network, from the receptor to interferon-gamma (IFNG) expression, based on published pathway members, connecting them with protein-protein interactions from OmniPath. Analysing publicly available scRNA-seq data from 23 different immune cell types, we selected the relevant ones (NK cells, T cells, monocytes, macrophages and innate lymphoid cells) and compared the expression of pathway members among cell types in different conditions (healthy and ulcerative colitis (UC), one of the main types of IBD).
As a result, we found that NKG2D signalling differs among cells in healthy and non-inflamed UC conditions. We discovered paralogue-specific rewiring of the pathway that alters its effect depending on the healthy or disease state. In conclusion, using network biology approaches and scRNA-seq analysis, we have revealed the regulation of IFNG expression by the NKG2D receptor in a diverse set of immune cells.

Iterative Graph Perturbation to Identify High Impact Nodes with Application to Genetic Regulation of Onset of Puberty
COSI: NetBio COSI
  • Basak Selcuk, OHSU, United States
  • Alejandro Lomniczi, OHSU, United States
  • Peter Heeman, OHSU, United States
  • Kemal Sonmez, OHSU, United States

Short Abstract: Complex networks connecting thousands of nodes can produce complex behavior as nodes interact with, affect and regulate one another. The genetic networks underlying high-level hormonal changes are also believed to be complex and are not fully understood. Therefore, identifying the transcriptionally active genes in a gene set that are responsible for a change in the human body is key to uncovering the underlying network structure. Here, we present a new technique that searches for these nodes in a set of variables that connect to form a network. It is an iterative search that tries to find the nodes whose removal causes the greatest change in the network structure. We use a beam search over such networks to reduce the risk of getting stuck in a local optimum. Our results on simulated genetic data show that our method successfully distinguishes transcriptionally active nodes from background nodes (p < 0.05). As real biological data, we used rat RNA-seq data collected for research on the initiation of puberty. Our method found new genes and confirmed previously known ones, with significant enrichment for functions such as transcriptional binding, histone modifications, and reproductive development.
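
A minimal sketch of beam search over node removals, assuming that "change in network structure" is measured as the shrinkage of the largest connected component. The abstract does not specify the measure, so this scoring function and all names below are hypothetical stand-ins.

```python
def build_adj(edges):
    """Undirected adjacency lists from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

def beam_search_removal(adj, depth, beam_width):
    """Search for up to `depth` nodes whose joint removal most shrinks the
    largest component, keeping the `beam_width` best partial sets per step."""
    base = largest_component_size(adj, set())
    beam = [frozenset()]
    best_set, best_gain = frozenset(), 0
    for _ in range(depth):
        candidates = {}
        for removed in beam:
            for v in adj:
                if v not in removed:
                    new = removed | {v}
                    if new not in candidates:
                        candidates[new] = base - largest_component_size(adj, new)
        if not candidates:
            break
        # rank by gain, breaking ties deterministically by node names
        ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
        beam = [s for s, _ in ranked[:beam_width]]
        if ranked[0][1] > best_gain:
            best_set, best_gain = ranked[0]
    return set(best_set), best_gain
```

On a toy graph of two triangles (a, b, c and d, e, f) joined through a hub h, greedy search (beam_width=1) removes h first and reaches a gain of only 4 with two removals, whereas a beam of width 2 keeps the runner-up a in play and finds the pair {a, d} with gain 5. This illustrates how a wider beam reduces the risk of stalling in a local optimum.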

Knowledge Guided Multi-Omic Network Inference for Multi-Omic Biomarker Detection
COSI: NetBio COSI
  • Christoph Ogris, Helmholtz Center Munich, Germany
  • Yue Hu, Helmholtz Center Munich, Germany
  • Nikola Mueller, Helmholtz Center Munich, Germany

Short Abstract: Here we present our method KiMONo, a Knowledge Guided Multi-Omic Network Inference framework. So far, the most common analysis strategy has been to explore all omics layers independently and search for common features afterwards. However, this assumes that all omics levels contribute equally to the results and that there are no ‘cross-omics’ relationships within the data. Our approach accounts for these issues by using all data points, from all omics levels, as independent variables in the regression model. We further use prior knowledge from public databases to cope with the high-dimensional feature space of multi-omic data. In a benchmark, we evaluated the performance of KiMONo on simulated data sets with small sample sizes or high noise levels. Finally, we exemplify KiMONo by applying it to eleven data sets of different tumor types, containing protein expression, mRNA expression, DNA methylation, mutation, copy number variation and clinical features. We were able to integrate the data and identify sets of cancer-type-specific features across all omics levels. These results demonstrate that our method successfully utilizes prior knowledge to boost the power of data integration and therefore vastly improves the interpretability of multi-omics studies.

Membrane protein-regulated networks across human cancers
COSI: NetBio COSI
  • Chun-Yu Lin, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Taiwan
  • Chia-Hwa Lee, School of Medical Laboratory Science and Biotechnology, Taipei Medical University, Taiwan
  • Yi-Hsuan Chuang, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Taiwan
  • Jung-Yu Lee, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Taiwan
  • Yi-Yuan Chiu, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Taiwan
  • Yan-Hwa Wu Lee, Department of Biological Science and Technology, National Chiao Tung University, Taiwan
  • Yuh-Jyh Jong, Graduate Institute of Medicine, College of Medicine, Kaohsiung Medical University, Taiwan
  • Jenn-Kang Hwang, Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Hong Kong
  • Sing-Han Huang, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Taiwan
  • Li-Ching Chen, TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taiwan
  • Chih-Hsiung Wu, Department of Surgery, School of Medicine, College of Medicine, Taipei Medical University, Taiwan
  • Shih-Hsin Tu, TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taiwan
  • Yuan-Soon Ho, School of Medical Laboratory Science and Biotechnology, Taipei Medical University, Taiwan
  • Jinn-Moon Yang, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Taiwan

Short Abstract: Alterations in membrane proteins (MPs) and their regulated pathways have been established as cancer hallmarks and extensively targeted in clinical applications. However, the analysis of MP-interacting proteins and downstream pathways across human malignancies remains challenging. Here, we present a systematically integrated method to generate a resource of cancer membrane protein-regulated networks (CaMPNets), containing 63,746 high-confidence protein–protein interactions (PPIs) for 1962 MPs, using expression profiles from 5922 tumors with overall survival outcomes across 15 human cancers. Comprehensive analysis of CaMPNets links MP partner communities and regulated pathways to provide MP-based gene sets for identifying prognostic biomarkers and druggable targets. For example, we identify CHRNA9, which has 12 PPIs (e.g., with ERBB2), as a potential therapeutic target, and identify bupropion as an anti-metastasis agent acting on it for the treatment of nicotine-induced breast cancer. This resource systematically integrates MP interactions, genomics, and clinical outcomes to help illuminate a cancer-wide atlas and prognostic landscapes of tumor homo/heterogeneity.

miRNet 2.0: network-based visual analytics for miRNA functional analysis and systems biology
COSI: NetBio COSI
  • Le Chang, McGill University, Canada
  • Guangyan Zhou, McGill University, Canada
  • Othman Soufan, McGill University, Canada
  • Jianguo Xia, McGill University, Canada

Short Abstract: MicroRNAs (miRNAs) regulate most cellular processes and are promising therapeutic candidates for cancer and other diseases. Understanding miRNA function is challenging due to the “many-to-many” relationships between miRNAs and their target genes. In addition, there is complex interplay between miRNAs and other functional elements, such as transcription factors (TFs) and lncRNAs. However, integrating multiple data types and interpreting the results at a systems level remains challenging. We therefore present miRNet 2.0, a web application that supports miRNA-centric network analysis for systems-level interpretation of miRNA functions and gene regulation. Compared to its predecessor, this new release has (i) added support for TFs and single nucleotide polymorphisms (SNPs) that affect miRNAs, miRNA-binding sites or target genes, while also greatly expanding (>5-fold) the underlying knowledgebase; (ii) implemented new functions to allow the creation and visual exploration of multipartite networks, with enhanced support for in situ functional analysis; and (iii) revamped the web interface, optimized the workflow, and introduced microservices and a web application programming interface (API) to sustain high-performance, real-time data analysis. The underlying R package is also released in tandem with version 2.0 to allow more flexible data analysis by R programmers. The miRNet 2.0 website is freely available at www.mirnet.ca.

Multi-omics data analysis in the cloud: inference of differential breast cancer-related network hubs between TCGA patient cohorts
COSI: NetBio COSI
  • George Acquaah-Mensah, Massachusetts College of Pharmacy and Health Sciences, United States
  • Kawther Abdilleh, ISB-CGC, General Dynamics Information Technology, United States
  • Boris Aguilar, Institute for Systems Biology, United States
  • Ronald Taylor, National Cancer Institute, United States

Short Abstract: There are disparities in the biomolecular and clinical presentations, racial distribution, and incidence of aggressive types of breast cancer (BC). Prior work indicates that prognosis tends to be worse for Black/African-American stage II breast invasive carcinoma patients diagnosed at age 50 or younger (B/AA50) than for White patients of similar age (W50). To characterize these two cohorts molecularly, we examined BC patient sample data deposited in The Cancer Genome Atlas (TCGA).

Our analysis used a novel cloud-based approach, performing cohort analysis on multi-omic TCGA data sets in the Google Cloud environment provided by the NCI-funded Institute for Systems Biology Cancer Genomics Cloud (ISB-CGC). We combined multi-omic (gene expression, methylation, somatic mutation) data from multiple Google BigQuery database tables using SQL queries. The great bulk of the integration and analysis was performed in the cloud rather than locally.

We inferred notable network hubs expressed at higher levels in B/AA50 than in W50, using microRNA and gene expression correlations. Such hubs include the microRNA genes hsa-mir-93, hsa-mir-92a-2, hsa-mir-25, hsa-mir-200c, hsa-mir-519a-2 and hsa-mir-1304. We also identified somatic mutations occurring at higher relative frequencies in one cohort than in the other, as well as significant differences in gene expression.
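
BigQuery tables are queried with standard SQL, so the join-and-aggregate pattern behind such cohort comparisons can be sketched locally with Python's built-in sqlite3 module. The table and column names (clinical, mirna_expr, case_barcode, rpm) and the toy values are hypothetical; they do not reflect the ISB-CGC schema or any TCGA result.

```python
import sqlite3

def build_toy_db():
    """In-memory database with a hypothetical clinical table and a
    hypothetical miRNA expression table (toy values only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE clinical (case_barcode TEXT PRIMARY KEY, cohort TEXT);
        CREATE TABLE mirna_expr (case_barcode TEXT, mirna_id TEXT, rpm REAL);
    """)
    conn.executemany("INSERT INTO clinical VALUES (?, ?)",
                     [("case-1", "B/AA50"), ("case-2", "B/AA50"),
                      ("case-3", "W50"), ("case-4", "W50")])
    conn.executemany("INSERT INTO mirna_expr VALUES (?, ?, ?)",
                     [("case-1", "hsa-mir-93", 10.0), ("case-2", "hsa-mir-93", 20.0),
                      ("case-3", "hsa-mir-93", 5.0), ("case-4", "hsa-mir-93", 7.0)])
    return conn

def cohort_means(conn, mirna_id):
    """Mean expression of one miRNA per cohort: join expression rows to
    clinical annotations, filter to the miRNA, then group by cohort."""
    query = """
        SELECT c.cohort, AVG(m.rpm)
        FROM mirna_expr AS m
        JOIN clinical AS c ON c.case_barcode = m.case_barcode
        WHERE m.mirna_id = ?
        GROUP BY c.cohort
        ORDER BY c.cohort
    """
    return conn.execute(query, (mirna_id,)).fetchall()
```

In a cloud setting, the same query shape would run server-side against the full tables, so that only the small aggregated result is transferred, which is the practical benefit of performing the bulk of the integration in the cloud.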

Network Analysis of Protein-Protein Interactions relevant in Atherosclerosis and Rheumatoid Arthritis
COSI: NetBio COSI
  • Ravisen Beemadoo, University of Mauritius, Mauritius
  • Anisah Ghoorah, University Of Mauritius, Mauritius
  • Yasmina Jaufeerally Fakim, University of Mauritius, Mauritius

Short Abstract: Atherosclerosis and rheumatoid arthritis (RA) are chronic inflammatory diseases with high incidence worldwide. Complex molecular interactions can be addressed through protein-protein interaction networks (PPINs). Here, we use network analysis of PPINs to identify the genes shared by RA and atherosclerosis and their functional implications in the pathways that underlie the common pathophysiology of the two diseases.
By integrating protein-protein interaction (PPI) data and gene-disease associations, we constructed a common PPIN to identify the shared genes. We then performed network analysis to identify hub genes. Functional modules were identified and mapped to Gene Ontology (GO) annotations.
The resulting PPIN, representing the union of atherosclerosis- and RA-related genes, consists of 1379 nodes and 1739 interactions, with 26 hub genes. Of these 26 genes, PTGS2, VCAM-1, ICAM-1, TNF-α, ALOX5, TGFβ1 and FOS are identified as both hub genes and common inflammation genes. The five most significant functional modules were detected, and GO analysis reveals that their genes are involved in biological processes including the inflammatory response, regulation of T cell proliferation and response to lipopolysaccharide. Three significant pathways validated by KEGG are the TNF, NF-κB and interleukin-17 signaling pathways. These genes could represent potential biomarkers and drug targets for future research on RA, atherosclerosis and their associated comorbidities.

Network analysis reveals transcriptional regulation driving neuroendocrine tumors
COSI: NetBio COSI
  • Jiawen Yang, University of Arizona, United States
  • James Lim, University of Arizona, United States
  • Megha Padi, The University of Arizona, United States

Short Abstract: Neuroendocrine tumors (NETs) are highly aggressive malignancies. Although NETs can arise in different tissue types, they converge to exhibit similar morphology and express common neuroendocrine markers, suggesting a shared trans-differentiation process driving their development. Here we study Merkel cell carcinoma, a neuroendocrine skin cancer that can be caused by infection with Merkel cell polyomavirus (MCPyV). Using network-based approaches, we investigated how the MCPyV oncogenes ST and LT synergistically reprogram normal fibroblasts into a neuroendocrine-like state. We created doxycycline-inducible fibroblast cell lines expressing ST and LT, and tracked the influence of the oncogenes over time through RNA-seq profiling. Our analysis revealed that simultaneous expression of ST and LT leads to dynamic changes in metabolic state, pro-metastatic functions, and neuroendocrine markers such as ENO2, NEFM, and SYN1. We reconstructed the host transcriptional network before and after expression of the MCPyV oncogenes and applied ALPACA, a method for detecting significant changes in the community structure of weighted bipartite transcriptional networks. This analysis identified OLIG and atonal (ATOH1) as key drivers of the trans-differentiation of fibroblasts to neuroendocrine cells. By combining time-series data with network structure, we can systematically identify the regulatory circuits driving a challenging cancer type in need of effective therapeutic strategies.

Network-guided connectivity mapping
COSI: NetBio COSI
  • Ameya Bhope, McGill University, Canada
  • Lulan Shen, McGill University, Canada
  • Andrey Cybulsky, McGill University Health Centre, Canada
  • Amin Emad, McGill University, Canada

Short Abstract: Connectivity mapping (CMap) is a common approach to matching gene signatures of different conditions (e.g. to identify potential treatments for a disease or to annotate the cell types in a single-cell transcriptomic study). In its simplest formulation, correlation analysis (or one of its variations) or supervised classification is used for this purpose. However, these methods do not take into account the interactions (e.g. protein-protein interactions) and relevance (e.g. membership in the same pathway) of the genes in a cell, and thus fail to incorporate biologically important information. We developed a novel method that incorporates the relationships among genes into CMap analysis. The method first obtains a relevance score for each gene pair by embedding a gene-gene interaction network into a Euclidean subspace. It then uses a generalized ranking distance that incorporates these scores to match two gene expression profiles. Using gene expression signatures of different drugs and cell lines from the LINCS database, we showed that this method improves the accuracy of CMap analysis compared to the correlation- and classification-based methods currently in use. As a general framework for gene signature analysis and CMap, this method should be applicable to various biomedical studies.
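
One way to picture a generalized ranking distance that uses gene-pair relevance is a weighted Kendall tau distance, in which each discordant gene pair costs its relevance weight rather than 1. The sketch below assumes that form, and assumes a relevance score of 1 / (1 + Euclidean distance) between gene embeddings; both are illustrative choices and hypothetical names, not the authors' definitions.

```python
import math
from itertools import combinations

def relevance_from_embedding(embedding):
    """Hypothetical relevance score per gene pair: 1 / (1 + Euclidean distance)
    between the genes' embedding vectors (closer genes are more relevant)."""
    return {frozenset((g, h)): 1.0 / (1.0 + math.dist(embedding[g], embedding[h]))
            for g, h in combinations(sorted(embedding), 2)}

def weighted_rank_distance(rank_a, rank_b, relevance):
    """Kendall-tau-style distance between two rankings (gene -> rank) of the
    same genes; each discordant pair adds its relevance weight (default 1.0)."""
    total = 0.0
    for g, h in combinations(sorted(rank_a), 2):
        if (rank_a[g] - rank_a[h]) * (rank_b[g] - rank_b[h]) < 0:
            total += relevance.get(frozenset((g, h)), 1.0)
    return total
```

With an empty relevance map this reduces to the ordinary Kendall tau distance (the number of discordant pairs), so disagreements between closely related genes in the embedding are penalized more heavily than disagreements between unrelated ones.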

PC2P: Parameter-free network-based prediction of protein complexes
COSI: NetBio COSI
  • Sara Omranian, University of Potsdam, Max Planck Institute of Molecular Plant Physiology, Germany
  • Angela Angeleska, University of Tampa, United States
  • Zoran Nikoloski, University of Potsdam, Max Planck Institute of Molecular Plant Physiology, Germany

Short Abstract: Prediction of protein complexes from protein-protein interaction (PPI) networks relies on algorithms for network community detection that seek to identify dense subgraphs in protein interaction networks. However, recently assembled gold standards in yeast and human indicate that protein complexes also form sparse subgraphs. The contribution of our study is five-fold: (1) we formalize the concept of a protein complex as a biclique spanned subgraph, which addresses the density issue of existing approaches; (2) we propose a parameter-free approximation algorithm, termed Protein Complexes from Coherent Partition (PC2P), that partitions the network into biclique spanned subgraphs by removing the smallest number of edges from a given PPI network; (3) we demonstrate that the resulting clusterings have high modularity, thus reflecting the local structures in PPI networks; and (4) we show that PC2P outperforms seven seminal approaches with respect to a composite score combining five performance measures.
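
A graph contains a spanning complete bipartite subgraph (a spanning biclique) exactly when its complement graph is disconnected, which gives a simple test for whether a candidate cluster is biclique spanned. The sketch below implements only that test with a traversal of the complement graph; it is not the PC2P partitioning algorithm itself, and the function name is hypothetical.

```python
def is_biclique_spanned(nodes, edges):
    """True if the subgraph induced on `nodes` contains a spanning complete
    bipartite subgraph, i.e. iff its complement graph is disconnected.
    Clusters with fewer than two nodes cannot be split and return False."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return False
    node_set = set(nodes)
    present = {frozenset(e) for e in edges if set(e) <= node_set}
    # traverse the complement graph (node pairs NOT joined by an edge)
    start = nodes[0]
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in nodes:
            if w not in seen and frozenset((v, w)) not in present:
                seen.add(w)
                stack.append(w)
    # complement disconnected <=> some nodes were never reached
    return len(seen) < len(nodes)
```

For example, a star (one protein linked to three others) is biclique spanned even though it is sparse, while a four-node path is not, which is why this notion captures sparse complexes that density-based community detection can miss.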

PecanPy: Parallelized, efficient, and accelerated node2vec in Python
COSI: NetBio COSI
  • Reming Liu, Michigan State University, United States
  • Arjun Krishnan, Michigan State University, United States

Short Abstract: Learning low-dimensional numerical representations of nodes in biological networks is an important task for leveraging machine learning to analyze large networks. Node2vec has been widely used for this purpose and has shown good performance in a variety of tasks. However, the original Python and C++ implementations of node2vec scale poorly and become intractable for modern genome-scale, dense molecular networks with tens of thousands of nodes and hundreds of millions of edges. We have developed PecanPy, a new Python implementation of the node2vec algorithm that resolves the computational and memory inefficiencies of the original implementations. Extensive benchmarking of the embedding process, and of the resulting representations on node classification tasks, demonstrates the computational performance of our software and the high quality of its embeddings. The software is freely available at github.com/krishnanlab/pecanpy.
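PecanPy implements the node2vec random walk. The second-order transition rule it accelerates is defined in the original node2vec paper. Suppose the walk reached `curr` from `prev`. A neighbour `x` of `curr` is then chosen with probability proportional to the edge weight, scaled by 1/p if `x` is `prev`, left unchanged if `x` is also a neighbour of `prev`, and scaled by 1/q otherwise. The pure-Python sketch below shows only that rule. The dict-of-dicts graph format is an assumption, and PecanPy's actual optimizations are not reproduced here.

```python
import random


def node2vec_step(graph, prev, curr, p, q, rng):
    """Sample the next node of a node2vec walk (second-order biased step).

    graph: dict node -> dict neighbour -> edge weight (undirected, symmetric)
    prev: node visited before `curr`, or None on the first step
    p: return parameter; q: in-out parameter
    """
    neighbours = list(graph[curr])
    if not neighbours:
        return None  # dead end: the walk stops here
    weights = []
    for x in neighbours:
        w = graph[curr][x]
        if prev is not None:
            if x == prev:
                w /= p  # step back to the previous node
            elif x not in graph[prev]:
                w /= q  # move away from the previous node
            # otherwise x is at distance 1 from prev: weight unchanged
        weights.append(w)
    return rng.choices(neighbours, weights=weights, k=1)[0]


def node2vec_walk(graph, start, length, p=1.0, q=1.0, seed=0):
    """Generate one walk of at most `length` nodes starting at `start`."""
    rng = random.Random(seed)
    walk, prev = [start], None
    while len(walk) < length:
        nxt = node2vec_step(graph, prev, walk[-1], p, q, rng)
        if nxt is None:
            break
        prev = walk[-1]
        walk.append(nxt)
    return walk
```

In node2vec, many such walks per node are then fed to a skip-gram model to learn the embeddings. That step is not shown.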

PhenoGeneRanker: A Tool for Gene Prioritization Using Complete Multiplex Heterogeneous Networks
COSI: NetBio COSI
  • Cagatay Dursun, Medical College of Wisconsin and Marquette University, United States
  • Naoki Shimoyama, Marquette University, United States
  • Mary Shimoyama, Medical College of Wisconsin, United States
  • Michael Schläppi, Marquette University, United States
  • Serdar Bozdag, Marquette University, United States

Short Abstract: Identifying specific complex-trait genes is challenging because the etiology of such traits involves multiple genes, multiple layers of molecular interactions and environmental factors. Gene prioritization is an important step toward a manageable shortlist of highly likely complex-trait genes. Integrating biological datasets through networks is a promising approach to identifying complex-trait genes, as networks provide a natural way to combine different, complementary genotypic and phenotypic datasets. Such integration alleviates the effects of missing data, low signal and the noisy nature of biomedical datasets. In this study, we present PhenoGeneRanker, a gene prioritization tool that combines multi-layer gene and phenotype networks into a heterogeneous biological network. PhenoGeneRanker supports weighted or unweighted, undirected gene and phenotype networks for holistic and comprehensive prioritization of genes. It calculates empirical p-values for gene rankings using random stratified sampling of genes based on their degree centrality, to address potential bias toward high-degree nodes in the network. To assess PhenoGeneRanker, we applied it to a rice dataset to rank cold tolerance-related genes. Our results showed that PhenoGeneRanker successfully ranked genes such that the top-ranked genes were enriched in cold tolerance-related GO terms.
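The abstract does not name the ranking algorithm. Random walk with restart (RWR) is a common choice for network-based gene prioritization and is assumed here. The sketch runs RWR on a single undirected network, not a multiplex heterogeneous one. It then derives empirical p-values by replacing each seed gene with a random gene from the same degree bin, which is one possible way to realize degree-stratified sampling. The restart probability, bin count and permutation count are hypothetical defaults.

```python
import random


def rwr(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected graph.

    adj: dict node -> set of neighbours; seeds: iterable of seed nodes.
    Returns dict node -> steady-state visiting probability.
    """
    seed_set = set(seeds)
    p0 = {v: (1.0 / len(seed_set) if v in seed_set else 0.0) for v in adj}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in adj}
        for u, nbrs in adj.items():
            if nbrs:  # spread the non-restart mass evenly over neighbours
                share = (1 - restart) * p[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
        diff = sum(abs(nxt[v] - p[v]) for v in adj)
        p = nxt
        if diff < tol:
            break
    return p


def empirical_pvalues(adj, seeds, n_bins=3, n_perm=200, seed=0):
    """Degree-stratified empirical p-values for RWR scores (hypothetical scheme)."""
    seeds = list(seeds)
    rng = random.Random(seed)
    observed = rwr(adj, seeds)
    # Assign nodes to roughly equal-sized bins by degree
    ordered = sorted(adj, key=lambda v: len(adj[v]))
    size = -(-len(ordered) // n_bins)  # ceiling division
    bin_of = {v: i // size for i, v in enumerate(ordered)}
    members = {}
    for v, b in bin_of.items():
        members.setdefault(b, []).append(v)
    exceed = {v: 0 for v in adj}
    for _ in range(n_perm):
        # Replace each seed by a random node with similar degree
        fake = [rng.choice(members[bin_of[s]]) for s in seeds]
        scores = rwr(adj, fake)
        for v in adj:
            if scores[v] >= observed[v]:
                exceed[v] += 1
    # Add-one correction keeps p-values strictly positive
    return {v: (exceed[v] + 1) / (n_perm + 1) for v in adj}
```

Matching random seeds by degree means that hub genes, which RWR tends to score highly regardless of the seeds, are compared against equally connected controls.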

Post-translational modification clusters filter protein interaction networks to elucidate cross-talk between cell signaling pathways in lung cancer
COSI: NetBio COSI
  • Karen Ross, Georgetown University, United States
  • Guolin Zhang, Moffitt Cancer Center and Research Institute, Tampa, FL; mProbe Inc., United States
  • Cuneyt Akcora, Department of Computer Science and Statistics, University of Manitoba, Winnipeg, Manitoba, Canada
  • Mark Grimes, Division of Biological Sciences, University of Montana, United States

Short Abstract: Dynamic signaling complexes employ multiple post-translational modifications (PTMs) to convey intracellular messages that govern cell division. We hypothesize that large-scale PTM data combined with protein interaction networks will reveal cell signaling pathways that suggest strategies to overcome acquired resistance to tyrosine kinase inhibitors (TKIs) in cancer therapy.

Sequential enrichment of PTM (SEPTM) proteomics and MaxQuant were used to quantitatively characterize phosphorylation, ubiquitination and acetylation in lung cancer cell lines treated with four TKIs (crizotinib, erlotinib, dasatinib, afatinib) and the proteasome inhibitor PR171. We used t-SNE to identify PTM clusters from these data and then used the clusters to filter known protein-protein interactions, elucidating cell signaling pathways that respond to drugs in concert with the drug target. Intersecting these data-derived networks with biological pathways (from BioPlanet) identifies possible mechanisms for cross-talk among signaling pathways.
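The abstract does not state the exact rule for filtering interactions by PTM clusters. One simple interpretation, assumed in this sketch, keeps a known interaction only if both proteins carry modified sites that fell into the same cluster. Such clusters could come, for example, from a t-SNE embedding, but the clustering step itself is not shown. The input format is hypothetical.

```python
def filter_ppi_by_clusters(ppi_edges, protein_clusters):
    """Keep PPI edges whose two proteins share at least one PTM cluster.

    ppi_edges: iterable of (protein_a, protein_b) pairs
    protein_clusters: dict protein -> set of cluster ids containing at least
        one of its modified sites (hypothetical input format)
    Returns a list of (protein_a, protein_b, shared_cluster_ids).
    """
    kept = []
    for a, b in ppi_edges:
        shared = protein_clusters.get(a, set()) & protein_clusters.get(b, set())
        if shared:
            kept.append((a, b, sorted(shared)))
    return kept
```

Recording the shared cluster ids on each retained edge makes it easy to trace which co-regulated PTM pattern supports a given interaction.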

Our results show that kinase signaling pathways initiated by receptor tyrosine kinases are associated with distinct signaling modules regulated by acetyltransferases and ubiquitin ligases. Elucidating points of cross-talk among signaling pathways that employ different PTMs will reveal new potential drug targets and candidates for synergistic attack through combination drug therapy. These analyses provide rationales for developing new strategies to improve TKI therapeutic efficacy.

Rapid Estimation of Node Significance in Weighted Bipartite Networks
COSI: NetBio COSI
  • James Lim, University of Arizona, United States
  • Chen Chen, University of Arizona, United States
  • Megha Padi, The University of Arizona, United States

Short Abstract: In recent years, network-based approaches that help to understand biological systems and identify disease mechanisms have attracted attention. Such approaches build on the assumption that human diseases result from localized perturbations within a certain neighborhood of the cellular network. The identification of these neighborhoods, or disease modules, is therefore becoming an important tool for understanding diseases. While numerous community detection algorithms can find putative disease modules, it remains challenging to evaluate their robustness. In this work, we aim to fill this gap by constructing a statistical significance test for the contribution of every node to the disease module in gene regulatory networks. We used the configuration model, a constrained maximum-entropy ensemble of networks, as the null model. We derived its exact distribution for weighted bipartite graphs (e.g. transcriptional networks) and created a new efficient algorithm that fits this model within a few minutes. We also analytically estimated the asymptotic distribution of node scores so that their significance can be evaluated instantaneously. Our approach improved recovery of cancer-relevant GO terms in breast and ovarian cancer networks. Our fast implementation of the configuration model could potentially be integrated into the community detection process to discover novel disease modules.
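The abstract gives neither the exact null distribution nor the definition of the node score. The following is a minimal sketch under three assumptions. First, edge weights are continuous and non-negative; a maximum-entropy model with fixed expected node strengths then gives independent exponential edge weights. Second, the node score is the sum of the node's edge weights into the module. Third, the asymptotic distribution is a normal approximation via the central limit theorem. Fitting the model is not shown, so the expected weights under the null are taken as inputs.

```python
import math


def node_score_pvalue(weights, null_means):
    """Upper-tail p-value of a node score under an independent-exponential null.

    weights: observed edge weights from the node to the module's members
    null_means: expected weights of the same edges under the fitted null model
    The node score is assumed to be the sum of these edge weights.
    """
    score = sum(weights)
    mean = sum(null_means)
    # An exponential variable with mean m has variance m**2; edges are independent
    var = sum(m * m for m in null_means)
    if var == 0:
        return 1.0 if score <= mean else 0.0
    z = (score - mean) / math.sqrt(var)
    # Normal (CLT) approximation: P(Z >= z) for a standard normal Z
    return 0.5 * math.erfc(z / math.sqrt(2))
```

Because the p-value reduces to a closed-form tail probability, it can be evaluated instantly for every node once the null model has been fitted. This matches the motivation stated in the abstract, though not necessarily the authors' formula.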

Robust gene coexpression networks using signed distance correlation
COSI: NetBio COSI
  • Javier Pardo-Díaz, University of Oxford, United Kingdom
  • Lyuba V Bozhilova, University of Oxford, United Kingdom
  • Mariano Beguerisse-Díaz, University of Oxford, United Kingdom
  • Philip Poole, University of Oxford, United Kingdom
  • Charlotte Deane, University of Oxford, United Kingdom
  • Gesine Reinert, University of Oxford, United Kingdom

Short Abstract: Even within well-studied organisms, many genes lack any useful functional annotations. One way to generate such functional information is to infer biological relationships between genes/proteins, using a network of gene coexpression data that includes functional annotations. The lack of trustworthy functional annotation for an organism can impede the validation of such networks. Hence, there is a need for a principled method to construct gene coexpression networks that capture biological information, and are structurally stable even in the absence of functional information.

We introduce the concept of signed distance correlation as a signed measure of dependency between two variables and apply it to generate gene coexpression networks. Distance correlation offers a more intuitive approach to network construction than commonly used methods such as Pearson correlation. We propose a pipeline to generate self-consistent unweighted networks using signed distance correlation purely from gene expression data, with no additional information. We demonstrate that networks generated with our proposed method from three different datasets are more stable and capture more biological information than networks obtained from Pearson or Spearman correlation.
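Distance correlation (Székely and colleagues) is built from double-centred pairwise distance matrices. It is zero in the population only when the variables are independent, so it also detects non-monotonic relationships that Pearson correlation misses. The abstract does not say how the sign is attached. This sketch assumes the sign is taken from the Pearson correlation of the same pair. The network-thresholding pipeline is not shown.

```python
import math


def _centred_distances(x):
    """Double-centred matrix of pairwise absolute differences."""
    n = len(x)
    d = [[abs(a - b) for b in x] for a in x]
    row = [sum(r) / n for r in d]  # row means (equal to column means: d is symmetric)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length 1-D samples."""
    a, b = _centred_distances(x), _centred_distances(y)
    n2 = len(x) ** 2
    dcov = sum(u * v for ra, rb in zip(a, b) for u, v in zip(ra, rb)) / n2
    dvar_x = sum(u * u for r in a for u in r) / n2
    dvar_y = sum(v * v for r in b for v in r) / n2
    if dvar_x == 0 or dvar_y == 0:
        return 0.0
    return math.sqrt(max(dcov, 0.0) / math.sqrt(dvar_x * dvar_y))


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def signed_distance_correlation(x, y):
    # Assumed convention: magnitude from distance correlation, sign from Pearson
    return math.copysign(distance_correlation(x, y), pearson(x, y))
```

For a symmetric quadratic relationship, the Pearson correlation is exactly zero while the distance correlation is clearly positive. This illustrates the extra dependence structure the measure can capture.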

SignaLink3 - a multi-layered approach to uncovering tissue-specific signaling networks
COSI: NetBio COSI
  • Dávid Fazekas, Earlham Institute, United Kingdom
  • Tamás Kadlecsik, Eotvos Lorand University, Hungary
  • Balázs Bohár, Earlham Institute, United Kingdom
  • Máté Szalay-Bekő, Earlham Institute, United Kingdom
  • Dezső Módos, Earlham Institute, United Kingdom
  • Luca Csabai, Earlham Institute, Norwich, United Kingdom
  • Tamas Korcsmaros, Earlham Institute, United Kingdom

Short Abstract: Biological networks are used to describe elements of a biological system by representing interactors (such as proteins or cells) as nodes and their connections as edges. However, this representation portrays a simplified version of reality. Therefore, we developed SignaLink3, an update of SignaLink2 (which currently has 200 weekly users), as an integrated tool to improve the biological relevance of these networks in human and three main model organisms. The multi-layered structure of the resource combines different interaction types by creating connections between multiple networks (e.g. a protein-protein interaction network and its transcriptional regulators). SignaLink3 integrates both experimentally validated and predicted interactions from 18 widely used resources. With over 400,000 new human interactions, the update extends the core signaling pathways to include lncRNAs in addition to pathway-specific transcription factors, miRNAs and post-translational modifying enzymes. The tool will now allow user-specific filtering based on tissue or subcellular localization by merging the included data with expression information. We provide researchers with multiple download formats (e.g. CSV, PSI-MI TAB, Cytoscape). With SignaLink3, we hope to provide researchers with a user-friendly way to design experiments, model specific signaling processes or discover novel drug target candidates.
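Tissue-specific filtering by merging interaction data with expression information can be sketched as follows. This is not SignaLink3's implementation. The edge tuples with a layer label, the nested expression dictionary and the expression threshold are all assumptions made for illustration. An edge is kept only if both of its endpoints are called expressed in the chosen tissue, whatever layer the edge belongs to.

```python
def tissue_specific_network(edges, expression, tissue, threshold=1.0):
    """Keep edges whose two endpoints are both expressed in `tissue`.

    edges: iterable of (source, target, layer) tuples, e.g. layer "PPI" or "TF"
    expression: dict gene -> dict tissue -> expression value (e.g. TPM)
    threshold: minimum value for a gene to count as expressed (hypothetical)
    """
    def expressed(gene):
        # Genes or tissues missing from the expression table count as not expressed
        return expression.get(gene, {}).get(tissue, 0.0) >= threshold

    return [(s, t, layer) for s, t, layer in edges if expressed(s) and expressed(t)]
```

Filtering by subcellular localization would follow the same pattern, with a localization table in place of the expression table.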

Supervised learning methods can efficiently leverage whole-genome molecular-networks for accurate gene classification
COSI: NetBio COSI
  • Anna Yannakopoulos, Michigan State University, United States
  • Kayla Johnson, Michigan State University, United States
  • Christopher Mancuso, Michigan State University, United States
  • Reming Liu, Michigan State University, United States
  • Arjun Krishnan, Michigan State University, United States

Short Abstract: Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods that can leverage molecular interaction networks to predict gene attributes. In this study, we present a comprehensive benchmark of supervised learning for network-based gene classification, evaluating this approach and a classic label-propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene's full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, matching label propagation's appeal of naturally exploiting network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits, and should be considered a staple of network-based gene classification workflows. The datasets and code used to reproduce the results and to add new gene classification methods are freely available.
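Supervised learning on a gene's full network connectivity can be illustrated by using each gene's row of the adjacency matrix as its feature vector and training a classifier on genes with known labels. The abstract does not say which classifiers were benchmarked. The sketch below assumes an L2-regularized logistic regression trained by plain batch gradient descent, so that it needs no external libraries. The learning rate, epoch count and regularization strength are hypothetical settings.

```python
import math


def adjacency_features(adj, nodes):
    """Represent each gene by its full row of the adjacency matrix."""
    return {u: [1.0 if v in adj[u] else 0.0 for v in nodes] for u in nodes}


def train_logreg(X, y, lr=0.5, epochs=500, l2=0.01):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    n_feat, m = len(X[0]), len(X)
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted minus true label
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * (gj / m + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / m
    return w, b


def predict(w, b, x):
    """Probability that the gene with feature vector x belongs to the class."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))
```

A gene's adjacency row lets the model learn a separate weight for every potential neighbour. This is one way such a classifier can capture local network structure directly, without a propagation step.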

Survival analysis of pathway activity as a prognostic determinant in breast cancer
COSI: NetBio COSI
  • Gustavo S Jeuken, KTH Royal Institute of Technology, Sweden
  • Lukas Käll, KTH Stockholm, Sweden

Short Abstract: Several recent papers have linked the expression of each transcript in cancer tissue to patient survival. However, such analyses typically yield prognostic values for more than twenty thousand dimensions, leading to various issues related to high-dimensional data. Here we propose a factor analysis method that translates gene transcription into pathway-level measures.

We reanalysed the METABRIC breast cancer dataset by condensing each cancer's transcription profile into individualized measures of pathway expression, followed by survival analysis. The method not only captures the difference in pathway activity between samples but also estimates the significance of its influence on patients' survival.

We compared our pathway-level analysis with a transcript-level analysis and demonstrated prognostic power, as measured by the concordance index (C), on par with the best individual transcripts. A larger fraction of Reactome pathways than of individual transcripts (7% vs 1%) had large predictive value (C > 0.6). As we typically have an order of magnitude fewer hypotheses to test, the framework gives more accessible interpretation as well as more significant results.
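The concordance index measures how well a score orders patients by survival. Among comparable pairs, it is the fraction in which the patient with the higher risk score fails first. C = 0.5 therefore corresponds to random ordering and C = 1 to perfect ordering. The following is a minimal sketch of Harrell's C for right-censored data. It assumes that higher scores mean worse prognosis and counts tied scores as half-concordant. Pairs with tied times are skipped, which is one of several conventions.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up times; events: 1 if the event was observed, 0 if censored
    risk_scores: higher score means a predicted worse prognosis
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(n):
            if times[j] > times[i]:  # j outlived the observed failure of i
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

Applied to a pathway-activity score or a single transcript's expression, this yields the kind of C value that the abstract compares against the 0.6 threshold.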

The Reactome Pathway Knowledgebase: Variants, Dark Proteins and Functional Interactions
COSI: NetBio COSI
  • Robin Haw, Ontario Institute for Cancer Research, Canada
  • Reactome Consortium, Reactome, Canada
  • Henning Hermjakob, EMBL - European Bioinformatics Institute, United Kingdom
  • Peter D'Eustachio, New York University School of Medicine, United States
  • Guanming Wu, Oregon Health and Science University, United States
  • Lincoln Stein, Ontario Institute for Cancer Research, Canada

Short Abstract: Reactome is an open-access, open-source pathway knowledgebase. Its holdings now comprise 12,986 human reactions organized into 2,362 pathways involving 10,908 proteins, 1,865 small molecules, 237 drugs and 12,206 complexes. 31,237 literature references support these annotations. The roles of variant forms of some proteins, both germline and somatically arising, have been annotated as disease-variant reactions, together with additional reactions that capture the effects of small-molecule drugs on these disease processes. To support different visualization and analysis approaches, we implemented several new features in our website, tools and ReactomeFIViz Cytoscape app, such as gene set analysis (GSA), an R interface, a Python client and an intuitive genome-wide results overview based on Voronoi maps. Furthermore, to increase Reactome adoption within the research community, we developed portals and web services for specific user communities. As part of the Illuminating the Druggable Genome (IDG) program, we have taken on the role of projecting understudied (Tdark) proteins into the Reactome pathway context, providing contextual information that helps experimental biologists design experiments to understand these proteins' functions. Reactome thus provides a broad suite of pathway- and network-based tools for analyzing multiple data sets and data types.
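Gene set analysis against a pathway knowledgebase is often done as an over-representation test. The sketch below is a generic one-sided hypergeometric test. It is not Reactome's own analysis service or its API. It also omits the multiple-testing correction that a real analysis across thousands of pathways would need. The function names and input format are illustrative assumptions.

```python
from math import comb


def hypergeometric_pvalue(k, n_query, n_pathway, n_universe):
    """P(X >= k), where X counts pathway genes among n_query genes drawn
    without replacement from n_universe genes, n_pathway of them in the pathway."""
    upper = min(n_query, n_pathway)
    tail = sum(comb(n_pathway, i) * comb(n_universe - n_pathway, n_query - i)
               for i in range(k, upper + 1))
    return tail / comb(n_universe, n_query)


def over_representation(query, pathways, universe):
    """Return {pathway: (overlap, p-value)} for a query gene list."""
    universe = set(universe)
    query = set(query) & universe
    results = {}
    for name, members in pathways.items():
        members = set(members) & universe
        k = len(query & members)
        results[name] = (k, hypergeometric_pvalue(k, len(query), len(members), len(universe)))
    return results
```

Restricting both the query and the pathways to a common gene universe keeps the test's counts consistent. The choice of universe, for example all genes measured in an experiment, can noticeably change the p-values.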

Towards networking biomedical data using the CROssBAR API
COSI: NetBio COSI
  • Vishal Joshi, EMBL-EBI, United Kingdom
  • Laura Lopez Real, EMBL-EBI, United Kingdom
  • Rabie Saidi, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom
  • María Martin, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom

Short Abstract: 1. CROssBAR API
CROssBAR uses bespoke ETL (Extract-Transform-Load) pipelines designed on the highly distributed EMBL-EBI infrastructure. These pipelines implement a multi-omics data integration approach to periodically release a single resource. Data are fetched from various resources (see Table 1), namely UniProt, InterPro, ChEMBL, PubChem, DrugBank, EFO and HPO, and are then cleansed, validated and consolidated into highly connected biological and biomedical information. The CROssBAR knowledgebase is hosted and maintained by EMBL-EBI in a MongoDB database. It provides a broad spectrum of information such as functional annotations, pathways, protein interactions, diseases, drugs and chemical compounds. These data can be queried via a public API at wwwdev.ebi.ac.uk/crossbar/swagger-ui.html. The CROssBAR API provides a multi-faceted view of the stored data through 12 endpoints, which allow users to build biological networks from any facet, e.g. drug-centric, disease-centric or gene-centric.
2. Use case: Building a COVID-19 knowledge network
Figure 1 depicts a biological network focused on COVID-19. Protein accessions (yellow and pink nodes) were fetched from covid-19.uniprot.org. CROssBAR was then used to retrieve more information to build the network, including all possible relations between:
  • Proteins (interactions)
  • Proteins and their organisms (orange nodes: Human, SARS-CoV and SARS-CoV-2)
  • Proteins and drugs (green nodes)
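The abstract does not list the 12 endpoints or their response schema, so this sketch does not call the API. It starts from records assumed to have already been retrieved and parsed. The record keys (`accession`, `organism`, `interactors`, `drugs`) are hypothetical. The code shows how the three relation types of the use case can be assembled into one typed network with deduplicated interaction edges.

```python
def build_network(records):
    """Assemble typed nodes and edges from already-retrieved records.

    records: list of dicts with hypothetical keys "accession", "organism",
        "interactors" (list of accessions) and "drugs" (list of drug names)
    Returns (nodes, edges): nodes maps id -> type; edges is a set of
        (source, target, relation) tuples.
    """
    nodes, edges = {}, set()
    for rec in records:
        acc = rec["accession"]
        nodes[acc] = "protein"
        org = rec.get("organism")
        if org:
            nodes[org] = "organism"
            edges.add((acc, org, "from_organism"))
        for partner in rec.get("interactors", []):
            nodes.setdefault(partner, "protein")
            # Sort the pair so A-B and B-A collapse into one undirected edge
            edges.add(tuple(sorted((acc, partner))) + ("interacts_with",))
        for drug in rec.get("drugs", []):
            nodes[drug] = "drug"
            edges.add((acc, drug, "targeted_by"))
    return nodes, edges
```

The node types map directly onto the colour scheme of Figure 1 (proteins, organisms, drugs). The node and edge sets could then be exported to a tool such as Cytoscape for layout.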

WikiPathways: Pathway Models for Network Analysis
COSI: NetBio COSI
  • Martina Kutmon, Maastricht University, Netherlands
  • Anders Riutta, The Gladstone Institutes, UCSF, United States
  • Denise Slenter, Maastricht University, Netherlands
  • Egon Willighagen, Maastricht University, Netherlands
  • Kristina Hanspers, The Gladstone Institutes, UCSF, United States
  • Chris Evelo, Maastricht University, Netherlands
  • Alexander Pico, The Gladstone Institutes, UCSF, United States

Short Abstract: WikiPathways (wikipathways.org) is a community-curated pathway database that enables researchers to capture rich, intuitive models of biological pathways. Importantly, pathway models from WikiPathways are also a valuable source for network analysis. The content is provided in different formats, including RDF [1], via dedicated apps such as the WikiPathways app for Cytoscape [2], and on the network data exchange platform NDEx [3]. This enables straightforward integration of pathway and interaction data from WikiPathways into network analysis, as highlighted in recent publications [4,5,6].

In addition to ongoing curation efforts to grow and maintain the database, we have identified publication figures as a valuable resource. We estimate that ~1000 pathway figures are published and indexed by PubMed Central each month [7]. These figures contain novel pathway content that is neither present in the text nor captured in pathway databases. We identified 64,643 pathway figures published over the past 25 years and performed optical character recognition (OCR) to extract over a million gene symbols mapping to 13,464 unique human NCBI Genes. Pathway figure-based gene sets can be used to index and annotate the literature, to perform enrichment analysis and to prioritize curation of new pathway models for downstream network analysis.
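The step from OCR output to NCBI Gene identifiers can be sketched as a lexicon lookup. The abstract does not describe the actual tokenization or mapping rules. This sketch assumes a hypothetical lexicon built beforehand from official symbols and aliases, keyed in upper case, and it simply ignores tokens that match nothing. A real pipeline would also need to resolve aliases shared by several genes.

```python
import re


def map_ocr_tokens(ocr_text, lexicon):
    """Map OCR-extracted figure text to gene identifiers.

    lexicon: dict upper-case symbol or alias -> NCBI Gene ID (hypothetical
        input, e.g. built from an official symbol/alias table)
    Returns the set of matched gene IDs.
    """
    # Gene symbols are mostly alphanumeric, sometimes with hyphens
    tokens = re.findall(r"[A-Za-z0-9-]+", ocr_text)
    genes = set()
    for tok in tokens:
        gid = lexicon.get(tok.upper())  # case-insensitive match on symbols
        if gid is not None:
            genes.add(gid)
    return genes
```

Each figure's resulting gene set could then serve as a pathway-like gene set for enrichment analysis, as the abstract describes.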

References:
1. doi.org/10.1371/journal.pcbi.1004989
2. doi.org/10.12688/f1000research.4254.2
3. doi.org/10.1016/j.cels.2015.10.001
4. doi.org/10.3389/fgene.2019.00059
5. doi.org/10.1167/iovs.61.4.24
6. doi.org/10.1016/j.jsbmb.2019.01.003
7. doi.org/10.1101/379446