Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

Posters - Schedules

Posters at ISMB/ECCB 2021 will be presented virtually. Authors will pre-record their poster talk (5-7 minutes) and upload it to the virtual conference platform, along with a PDF of their poster, beginning July 19 and no later than July 23. All registered conference participants will have access to the posters and presentations throughout the conference, and to the content until October 31, 2021. Q&A is available through a chat function, and poster presenters can schedule small group discussions with up to 15 delegates during the conference.

Information on preparing your poster and poster talk is available at: https://www.iscb.org/ismbeccb2021-general/presenterinfo#posters

Ideally authors should be available for interactive chat during the times noted below:

Session A: Sunday, July 25 between 15:20 - 16:20 UTC
Session B: Monday, July 26 between 15:20 - 16:20 UTC
Session C: Tuesday, July 27 between 15:20 - 16:20 UTC
Session D: Wednesday, July 28 between 15:20 - 16:20 UTC
Session E: Thursday, July 29 between 15:20 - 16:20 UTC

A systematic evaluation of the impact of environmental exposures on the human interactome network
COSI: NetBio
  • Salvo Danilo Lombardo, CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences – Vienna, AT, Austria
  • Jörg Menche, Max Perutz Labs, University of Vienna, Vienna, Austria

Short Abstract: Historically, the health risk of environmental exposures was assessed mostly by epidemiological studies. Only recently has there been an increased effort to integrate the large amount of newly generated -omics data to uncover the relationships between chemical compounds and human health. Here, we propose a network-based approach for investigating the landscape of genetic perturbations induced by the known chemical compounds present in the environment. We constructed a bipartite network between genes and exposures, which enabled us to identify eight distinct communities and define two major types of environmental perturbations (hub and module exposures). Our systematic approach further allowed us to identify major biological mechanisms of action and targeted tissues for each exposure, retrieving known tissue-exposure associations and predicting new ones. For example, the category “Chemical Class and Uses” is particularly enriched for steroid hormone metabolism on a genetic, pathway, and tissue level, with implications for development and disease risk. Finally, we demonstrate that there is a strong correlation between chemical similarities and genetic perturbation similarities on the human interactome.

Accurate network-based gene classification with ultra fast context-specific node embedding
COSI: NetBio
  • Arjun Krishnan, Michigan State University, United States
  • Renming Liu, Michigan State University, United States
  • Matthew Hirn, Michigan State University, United States

Short Abstract: Genome-scale gene interaction networks are powerful models of functional relationships between tens of thousands of genes in complex organisms. Numerous studies have established how these networks can be leveraged to assign experimentally un(der)-characterized genes to specific biological processes and diseases. We and others have previously shown that low-dimensional embeddings of nodes, in particular using node2vec, can be beneficial for network-based gene classification. However, the original node2vec implementations have significant bottlenecks due to memory and computational inefficiencies. Here, we present PecanPy, an efficient Python implementation of node2vec that is parallelized, memory efficient, and accelerated using Numba with a cache-optimized data structure. We have extensively benchmarked our software using networks from the original node2vec study and multiple additional large biological networks. These analyses demonstrate that PecanPy efficiently generates high-quality node embeddings for networks at multiple scales, including large (>800k nodes) and dense (fully connected network of 26k nodes) networks that the original implementations failed to execute. With the ultra-fast and memory-efficient PecanPy, we are currently working on creating contextualized biological network embeddings using biological context derived from factors such as environmental/treatment conditions, tissues and cell types, which could be beneficial for context-specific disease gene classification.
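The second-order biased random walk at the heart of node2vec can be sketched in a few lines. This is a minimal illustration only, not PecanPy's API or its optimized implementation: the function name `node2vec_walk`, the toy graph, and the parameter values are assumptions, and the graph is taken to be unweighted and undirected. The return parameter `p` and in-out parameter `q` follow the original node2vec formulation.

```python
import random
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # node -> set of neighbours (undirected, unweighted)


def node2vec_walk(graph: Graph, start: str, length: int,
                  p: float, q: float, rng: random.Random) -> List[str]:
    """Generate one second-order biased random walk of up to `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])  # sorted for reproducibility
        if not neighbors:
            break  # dead end: stop early
        if len(walk) == 1:
            # First step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)      # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # move away from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy network; real gene networks have tens of thousands of nodes.
toy_graph: Graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
}
walk = node2vec_walk(toy_graph, "A", length=10, p=1.0, q=0.5, rng=random.Random(0))
print(walk)
```

In a full pipeline, many such walks per node would be fed to a skip-gram model to obtain the embeddings; that step is omitted here.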

Adjustment of spurious correlations in co-expression measurements from RNA-Sequencing data
COSI: NetBio
  • Ping-Han Hsieh, Centre of Molecular Medicine Norway, Norway
  • Camila Miranda Lopes-Ramos, Department of Biostatistics, Harvard T.H. Chan School of Public Health, United States
  • Geir Kjetil Sandve, Department of Informatics, University of Oslo, Norway
  • Kimberly Glass, Channing Division of Network Medicine, Brigham and Women’s Hospital, United States
  • Marieke Lydia Kuijjer, Centre of Molecular Medicine Norway, Norway

Short Abstract: Gene co-expression measurements are widely used in computational biology to identify coordinated expression patterns that may be driven by similar regulatory processes. Co-expression can be estimated from RNA-Seq data, which are generally normalized to remove technical variability. Here, we demonstrate that quantile-based normalization methods can introduce false-positive associations between genes, which hampers downstream co-expression network analysis. However, certain quantile-based normalization methods can be applied to large-scale heterogeneous data to remove technical variability while maintaining global differences in expression for samples with different biological attributes. We therefore developed CAIMAN, a method to correct for false-positive associations that may arise from the normalization of RNA-Seq data. CAIMAN utilizes a Gaussian mixture model to fit the distribution of gene expression and to adaptively select the threshold that defines lowly expressed genes, which are prone to form false-positive associations. Thereafter, CAIMAN corrects the normalized expression of these genes by removing the variability across samples that might lead to false-positive associations. Moreover, CAIMAN avoids arbitrary gene filtering and retains associations with genes that are expressed only in small subgroups of samples, highlighting its potential future impact on network modeling and other association-based approaches in large-scale heterogeneous data.

Analysis of dynamics and stability of hybrid system models of gene regulatory networks
COSI: NetBio
  • Juris Viksna, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Karlis Cerans, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Karlis Freivalds, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Lelde Lace, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Gatis Melkus, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Darta Rituma, Institute of Mathematics and Computer Science, University of Latvia, Latvia

Short Abstract: We present research on method development for the analysis of dynamics and behavioural stability of hybrid-system-based models of gene regulatory networks (GRNs). The hybrid system model (HSM) framework used provides differentiation between quantitative and qualitative behavioural aspects of the modelled system. The framework regards GRNs as fully defined, with concrete functions describing variable changes and thresholds triggering state transitions. At the same time, it is assumed that knowledge about the underlying model is limited to information on the comparative values of its parameters. The framework makes it possible to answer questions about the stable behaviour regions (attractors) of the system, the 'switching conditions' that irrevocably lead the system to a single region of stable behaviour, and the requirements for the system to exhibit specific cyclicity of trajectories. This is done by analysing the state spaces of the modelled systems in graph-topology terms.

The new contributions we are presenting here include: 1) analysis tools, in particular for the identification of switching conditions and behaviour cyclicity; 2) lambdoid phage models LPH2 and HK22 and their analysis results; 3) preliminary HSM models of the mammalian circadian cycle, which show that the framework can model the cyclic activity of controlling genes consistently with more complex ODE models.

Analysis of Host-Pathogen Protein-Protein interaction network with special focus on HIV-MTB Co-infection.
COSI: NetBio
  • H. A. Nagarajaram, University of Hyderabad, India
  • Aishwarya Gholse, University of Hyderabad, India

Short Abstract: HIV and MTB (Mycobacterium tuberculosis) are among the 10 leading causes of death worldwide, and together they contribute to a high incidence of mortality. It is known that HIV infection increases the risk of progression from latent TB to active TB 20-fold, and that MTB increases levels of HIV-1 replication, propagation and genetic diversity. Coinfection is more dangerous than infection with either pathogen alone.
In the present study, known MTB-human and human-HIV bipartite protein-protein interactions were integrated with known human protein-protein interactions to generate an HIV-Human-MTB protein-protein bridge network, representing a snapshot of the coinfection condition in terms of protein-protein interactions among the three stakeholders.
A preliminary analysis of this bridge network reveals that the human proteins targeted by both pathogens are topologically more important in the protein-protein interaction network (PPIN) than the proteins targeted in mono-infection. Functional enrichment analysis also indicates that the human proteins commonly targeted by MTB and HIV are involved in core cellular functions; they participate in a higher number of pathways (KEGG), have a higher number of splice variants, have significantly higher protein abundance, and are expressed in more tissues than those interacting with only MTB or only HIV. This study therefore identifies the human proteins important for co-infection.

Beyond protein-protein interaction networks: Exploring the impact of alternative splicing using DIGGER and NEASE
COSI: NetBio
  • Jan Baumbach, Computational Systems Biology, Hamburg University, Germany
  • Markus List, Chair of Experimental Bioinformatics, Technical University of Munich, Germany
  • Zakaria Louadi, Computational Systems Biology, Hamburg University, Germany
  • Kevin Yuan, Chair of Experimental Bioinformatics, Technical University of Munich, Germany
  • Alexander Gress, Helmholtz Centre for Infection Research, Germany
  • Olga Tsoy, Computational Systems Biology, Hamburg University, Germany
  • Olga V. Kalinina, Helmholtz Centre for Infection Research, Germany
  • Tim Kacprowski, Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Medical School Hannover, Germany

Short Abstract: Alternative splicing (AS) plays a major role in regulating the functional repertoire of the proteome. However, isoform-specific effects on protein-protein interactions (PPIs) are usually overlooked, making it impossible to judge the functional role of individual exons on a systems biology level. We overcome this barrier by integrating PPIs, domain-domain interactions (DDIs) and residue-level interactions to lift exon expression analysis to a network level. Our user-friendly tool and database DIGGER (Domain Interaction Graph Guided Explorer) is available at exbio.wzw.tum.de/digger and allows users to seamlessly switch between isoform-centric and exon-centric views of the interactome and to extract subnetworks of relevant isoforms.

Furthermore, alongside the database, we propose network enrichment of alternative splicing events (NEASE), a method for differential splicing analysis using the PPIs and DDIs joint network. NEASE considers interactions affected by AS and identifies enriched pathways based on affected edges rather than affected genes. Our analysis shows that NEASE largely outperforms classic gene set enrichment in the context of AS and generates meaningful biological insights on the impact of AS. The DIGGER database and NEASE tool together provide essential resources for studying mechanistic consequences of AS in systems medicine.

Bow-tie architecture of gene regulatory networks in species of varying complexity
COSI: NetBio
  • Gourab Ghosh Roy, University of Birmingham, United Kingdom
  • Shan He, University of Birmingham, United Kingdom
  • Nicholas Geard, University of Melbourne, Australia
  • Karin Verspoor, University of Melbourne, Australia

Short Abstract: Biological differences between species can be well explained by the architecture of their gene regulatory networks (GRNs). We aim to understand species differences in terms of some universally present dynamical properties of their gene regulatory systems. A network architectural feature associated with controlling dynamical system properties is the bow-tie. This architecture, observed in many networks, is identified by a strongly connected subnetwork, the CORE layer, between two sets of nodes, the IN and the OUT layers. However, its existence has not been extensively investigated in GRNs of species of widely varying biological complexity. Here we analyse publicly available GRNs of several well-studied species from prokaryotes to unicellular eukaryotes to multicellular organisms. A bow-tie architecture with a distinct largest strongly connected CORE layer is observed in their GRNs. We show that the bow-tie architecture is a GRN characteristic feature. We observe a generally increasing trend in the relative CORE size with species complexity. Using previously studied relationships of the CORE size with dynamical properties like robustness and fragility, flexibility, criticality, controllability and evolvability, we hypothesise how these dynamical gene regulatory system properties have emerged differently with biological complexity, based on the observed differences of the GRN bow-tie architectures.
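The bow-tie decomposition described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`reachable`, `reverse`, `bow_tie`) and the toy network are hypothetical, the CORE is taken to be the largest strongly connected component, IN is every other node that can reach the CORE, and OUT is every other node reachable from it. Strongly connected components are found by intersecting forward and backward reachability, which is simple but only suitable for small graphs.

```python
from collections import deque
from typing import Dict, Iterable, Set

DiGraph = Dict[str, Set[str]]  # regulator -> set of target genes


def reachable(graph: DiGraph, sources: Iterable[str]) -> Set[str]:
    """Return all nodes reachable from `sources` (sources included) by BFS."""
    seen = set(sources)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph: DiGraph) -> DiGraph:
    """Return the graph with every edge direction flipped."""
    rev: DiGraph = {node: set() for node in graph}
    for node, targets in graph.items():
        for target in targets:
            rev.setdefault(target, set()).add(node)
    return rev


def bow_tie(graph: DiGraph) -> Dict[str, Set[str]]:
    """Split nodes into IN, CORE, OUT and OTHER layers."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    rev = reverse(graph)
    core: Set[str] = set()
    unassigned = set(nodes)
    while unassigned:
        node = min(unassigned)  # deterministic pick
        # The SCC of a node is its forward reach intersected with its backward reach.
        scc = reachable(graph, [node]) & reachable(rev, [node])
        unassigned -= scc
        if len(scc) > len(core):
            core = scc
    in_layer = reachable(rev, core) - core
    out_layer = reachable(graph, core) - core
    other = nodes - core - in_layer - out_layer
    return {"IN": in_layer, "CORE": core, "OUT": out_layer, "OTHER": other}


# Hypothetical toy regulatory network.
toy_grn: DiGraph = {
    "tf1": {"a"},
    "a": {"b"},
    "b": {"c"},
    "c": {"a", "g1"},
    "g1": {"g2"},
    "x": {"g2"},
}
layers = bow_tie(toy_grn)
print(layers)
```

The relative CORE size mentioned in the abstract would then be `len(layers["CORE"])` divided by the total number of nodes.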

Comparative analysis of Pure Hubs and Pure Bottlenecks in Human Protein-protein Interaction Networks
COSI: NetBio
  • Nithya Chandramohan, University of Hyderabad, India
  • Manjari Kiran, University of Hyderabad, India
  • H. A. Nagarajaram, University of Hyderabad, India

Short Abstract: Physical interactions among proteins can be represented as graphs where the nodes are proteins and the edges are interactions among proteins. Analyses of protein-protein interaction networks (PPINs) have the potential to yield a wealth of information on the system-level functionality of a living cell as a function of interacting proteins. One way to analyse the networks is to measure the relative importance of nodes by computing their centrality values. Hubs (nodes with high degree) and bottlenecks (nodes with high betweenness) are the topologically important nodes in a network. It is well known that removal of these nodes is lethal to the cell. In the human PPIN, based on hub and bottleneck proteins, we find three distinct groups, referred to as: a) Pure hubs (nodes having high degree but low betweenness values), b) Mix proteins (nodes having both high degree and high betweenness values) and c) Pure bottlenecks (nodes having high betweenness values but low degree values). Furthermore, we find that these three categories of proteins are associated with distinct characteristic features in the interaction networks, both global and tissue-specific. Such information is useful for target prioritization when designing new drugs or repositioning known or withdrawn drugs.
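The three-way split into pure hubs, mix proteins and pure bottlenecks can be sketched with degree and betweenness centrality. This is a minimal illustration, not the authors' code: the function names, the toy network and both cutoffs are assumptions (the abstract does not state how "high" is defined). Betweenness is computed with Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build a symmetric adjacency map from an undirected edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def betweenness(graph: Graph) -> Dict[str, float]:
    """Unnormalized betweenness centrality (Brandes' algorithm)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        order = []                           # nodes in order of discovery
        preds = {v: [] for v in graph}       # shortest-path predecessors
        sigma = {v: 0 for v in graph}        # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while order:                         # back-propagate dependencies
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def classify(graph: Graph, degree_cutoff: int, betweenness_cutoff: float):
    """Group nodes by high/low degree and high/low betweenness (cutoffs are assumed)."""
    bc = betweenness(graph)
    groups = {"pure_hub": set(), "mixed": set(), "pure_bottleneck": set(), "other": set()}
    for node, nbrs in graph.items():
        high_deg = len(nbrs) >= degree_cutoff
        high_bc = bc[node] >= betweenness_cutoff
        if high_deg and high_bc:
            groups["mixed"].add(node)
        elif high_deg:
            groups["pure_hub"].add(node)
        elif high_bc:
            groups["pure_bottleneck"].add(node)
        else:
            groups["other"].add(node)
    return groups


# Hypothetical toy network: a 5-clique attached to a short chain.
edges = list(combinations(["h1", "h2", "h3", "h4", "m"], 2))
edges += [("m", "b"), ("b", "d"), ("d", "e1"), ("d", "e2")]
ppin = build_graph(edges)
bc = betweenness(ppin)
groups = classify(ppin, degree_cutoff=4, betweenness_cutoff=10.0)
print(groups)
```

In this toy network the clique members that sit away from the bridge have many partners but lie on no shortest paths between other nodes, while the chain nodes have few partners but carry all traffic between the two ends.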

ConAn: Differential Network Connectivity Analysis
COSI: NetBio
  • Samantha Clayton, Boston University, United States
  • Oluwatosin Olayinka, Boston University, United States
  • Lina Kroehling, Boston University, United States
  • Anthony Federico, Boston University, United States
  • Eric Reed, Boston University, United States
  • Gary Benson, Boston University, United States
  • Stefano Monti, Boston University, United States

Short Abstract: Gene expression and other ‘omics’ data are frequently used to identify biological differences between samples (for example, those from disease and normal groups). Typically, changes are analyzed at the individual gene level, thus missing aggregate changes. Gene co-expression modules are sets of genes that display correlated expression levels across samples. They can be represented in a network graph where nodes are genes, edge weights are correlation coefficients, modules are connected components (based on an edge weight cutoff), and module connectivity is the average of the edge weights. We present ConAn (Differential Network Connectivity Analysis), a tool, and associated R package, for analyzing changes in gene co-expression modules using statistically rigorous resampling-based methods for calculating significance. Extensive simulations show the high sensitivity and specificity of ConAn. We applied ConAn to a Head and Neck Squamous Cell Carcinoma dataset and found two differentially connected modules enriched for cytokine signaling, recapitulating published data showing that increased cytokine expression was associated with tumor progression. ConAn provides an important new methodology for elucidating biologically significant variation through detection of correlated changes.
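Module connectivity as defined above (the average of the pairwise correlation edge weights within a module) and a resampling test for its difference between two groups can be sketched as follows. This is a minimal illustration, not ConAn's R interface or its exact statistic: the function names, the toy data and the choice of a sample-label permutation test are assumptions.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Expression = Dict[str, Dict[str, float]]  # gene -> {sample -> expression value}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def module_connectivity(expr: Expression, samples: Sequence[str]) -> float:
    """Mean pairwise correlation among the module's genes over `samples`."""
    cors = [
        pearson([expr[g1][s] for s in samples], [expr[g2][s] for s in samples])
        for g1, g2 in combinations(sorted(expr), 2)
    ]
    return sum(cors) / len(cors)


def differential_connectivity(expr: Expression, group_a: List[str], group_b: List[str],
                              n_perm: int, rng: random.Random) -> Tuple[float, float]:
    """Observed connectivity difference and a two-sided permutation p-value."""
    observed = module_connectivity(expr, group_a) - module_connectivity(expr, group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign samples to groups at random
        perm_a, perm_b = pooled[:len(group_a)], pooled[len(group_a):]
        diff = module_connectivity(expr, perm_a) - module_connectivity(expr, perm_b)
        if abs(diff) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy module: tightly co-expressed in group A, not in group B.
group_a = ["a1", "a2", "a3", "a4", "a5"]
group_b = ["b1", "b2", "b3", "b4", "b5"]
values = {
    "g1": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10, 5, 1, 4, 2, 3],
    "g3": [1.5, 2.5, 3.5, 4.5, 5.5, 2, 5, 1, 3, 4],
}
expr: Expression = {g: dict(zip(group_a + group_b, v)) for g, v in values.items()}

connectivity_a = module_connectivity(expr, group_a)
observed, p_value = differential_connectivity(expr, group_a, group_b, 200, random.Random(1))
print(connectivity_a, observed, p_value)
```

The `(hits + 1) / (n_perm + 1)` form keeps the estimated p-value strictly above zero, a common convention for permutation tests.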

Differential Co-expression Network Analysis Using Topological Connectivity And Weights Of Edges
COSI: NetBio
  • Sheng-Lun Huang, National Kaohsiung University of Science and Technology, Taiwan
  • Wen-Yu Chung, National Kaohsiung University of Science and Technology, Taiwan

Short Abstract: Everything is connected, one way or the other. In gene co-expression networks built from tens of thousands of gene expression measurements, we study the changes in topology and association strengths. The network is represented as a graph, with each gene as a node and the co-expression relationship between genes transformed into edge weights. Can we identify the most relevant disease genes from biological networks? We hypothesize that such genes will have highly dynamic expression profiles and connectivities. Thus, we propose two differential co-expression analysis functions, which combine topological similarity and edge weights to score two-, three-, and four-member gene clusters. We used type 2 diabetes datasets from rats in NCBI GEO, which are available at five growth time points. Five gene co-expression networks were established, one per time point, and the differential co-expression analysis was calculated between neighboring time points. The final score reflects the dynamics between the genes over growth time. Since the objective is to find the most relevant genes, we pre-processed the expression profiles and retained only differentially expressed genes for further analyses. We report the top-ranked genes as the most likely disease-related genes. The results showed various features of network biology.

Differential predictions from integrated, multi-omics molecular networks
COSI: NetBio
  • Katharina Baum, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam; before: Luxembourg Institute of Health, Germany
  • Julian Hugo, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Germany
  • Justus Zeinert, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Germany
  • Nataniel Müller, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Germany
  • Spoorthi Kashyap, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Germany
  • Jagath C. Rajapakse, Nanyang Technological University Singapore, Singapore
  • Francisco Azuaje, Luxembourg Institute of Health; current affiliation: Genomics England, London, UK, Luxembourg
  • Bernhard Y. Renard, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Germany

Short Abstract: Networks provide a means to incorporate molecular interactions into reasoning. However, characterization beyond individual ‘omic’ layers, especially combining metabolomics with gene-product-derived measurements, remains a challenge.
Here we present a novel network analysis pipeline that enables integrative analysis of multi-omics data including metabolomics. It allows for comparative conclusions between two conditions, such as tumor subgroups, healthy vs. disease, or generally control vs. perturbed.
Our approach focuses on interactions and their strength instead of on node properties and includes molecules with low abundance and unknown function. We use correlation-induced networks that are reduced and combined into heterogeneous, multi-omics molecular networks. Prior information such as metabolite-protein interactions is incorporated. A semi-local, path-based integration step denoises the network and ensures integrative conclusions.
As a case study, we investigate differential drug response in breast cancer datasets providing proteomics, transcriptomics, phospho-proteomics and metabolomics measurements and contrasting patients with different estrogen receptor status. The impact of our approach is highlighted by critical evaluation on ground truth data from cancer cell lines.
Our proposed pipeline leverages multi-omics data for differential predictions, e.g. on drug response, and includes prior information on interactions. We make it available as an R package, molnet, that enables versatile integration and network-based analysis of multi-omics data.

Discovering functional clusters from GWAS genetic interaction profile similarity networks
COSI: NetBio
  • Chad L. Myers, University of Minnesota, United States
  • Mehrad Hajiaghabozorgi, University of Minnesota, United States
  • Wen Wang, University of Minnesota, United States

Short Abstract: Genetic interactions can be important for predicting individual phenotypes. Due to statistical power issues, the systematic study of genetic interaction from GWAS has been challenging. We recently published a new computational approach, called BridGE, for detecting genetic interactions in GWAS by leveraging pathway membership patterns.
One powerful application of genetic interactions, demonstrated by reverse genetic studies in model organisms, is to use the pattern of genetic interactions for a gene of interest to identify other functionally related genes. Gene pairs that exhibit high profile similarity often correspond to genes in common pathways or protein complexes. To date, this profile similarity approach has been applied to genetic interactions derived from reverse genetic screens, but not to genetic interactions from population genetics. Our goal is to develop a method for measuring functional gene networks based on genetic interaction profile similarity, using genetic interactions derived from GWAS. Several issues must first be addressed to obtain meaningful functional patterns from profile similarities, including population structure, sparsity of profiles, and linkage disequilibrium (LD). Different solutions (clustering, PCA) are used to address these issues. We summarize progress in benchmarking functional networks and highlight several important caveats to extending profile similarity analyses to GWAS.

Disentangling marine microbial association networks
COSI: NetBio
  • Ina Maria Deutschmann, Institute of Marine Sciences (ICM-CSIC), Spain
  • Ramiro Logares, Institute of Marine Sciences (ICM-CSIC), Spain

Short Abstract: Although microbial interactions are fundamental for Earth’s ecosystem functioning, they remain barely known. Omics-based censuses are helpful to infer association networks aggregated over time or spatial scales (static networks). However, associations could result from either ecological interactions between microorganisms, or environmental selection. Moreover, static networks are problematic because microbial interactions are highly dynamic. This needs to be considered for a better understanding of microbial ecosystems.
We present EnDED, an implementation to predict environmentally-driven microbial associations, and our approaches to determine monthly and sample-specific subnetworks to study microbial association networks across temporal and spatial scales, respectively. We found that the fraction of environmentally-driven edges among negative associations increased rapidly with the number of environmental factors. Moreover, our temporal network appeared to follow an annual cycle, collapsing and reassembling when transiting between colder and warmer waters. We observed higher repeatability in colder than warmer months. Lastly, quantifying spatial association recurrence, we determined universal (lowest in deep-ocean) and location-specific (increased with depth) associations. Most deep-ocean ASVs but only few associations already appeared in upper layers.
Our work is a step towards disentangling environmental effects, the temporal nature, and the spatial distribution of marine microbial associations, and can be adapted to diverse microbiomes.

Dissecting differential Cell Cell Communication with CrossTalkeR
COSI: NetBio
  • James Shiniti Nagai, RWTH Aachen University Medical School, Germany
  • Nils B. Leimkühler, University Hospital Essen, Erasmus Medical Center, Germany
  • Rebekka K. Schneider, RWTH Aachen University, Erasmus Medical Center, Germany
  • Michael T. Schaub, RWTH Aachen University, Germany
  • Ivan G. Costa, RWTH Aachen University Medical School, Germany

Short Abstract: By combining single-cell RNA sequencing (scRNA-seq) data with ligand-receptor (LR) interaction repositories, current methods allow the study of cellular crosstalk by inferring putative links between cell types in a given sample. However, identifying the most prominent cell-type pairs from such data and LR interactions is hard. Further challenges in the study of cell-cell interactions (CCI) include differential CCI analysis between phenotypes (disease vs. normal), understanding gene expression and its biological function (e.g. ligand or receptor), and identifying cell-type-specific crosstalk signatures. Motivated by this, we developed CrossTalkeR (Nagai et al. 2021), a novel network-based crosstalk analysis tool that can be used with both single and comparative phenotype data. Our method facilitates the extraction of salient patterns through network properties such as centrality and PageRank at two resolution levels: Cell-Cell Communication (CCC) and Gene-Cell Communication (GCC). We revisited the bone marrow PMF niche study (Leimkühler et al. 2021) using CrossTalkeR (Fig. 1). CrossTalkeR's log-odds PageRank revealed the main disease-related cell types, Megakaryocytes and Neural. Using Principal Component Analysis of the GCC, niche-related proteins were identified, for example cellular matrix (FN1 and COL1A1) and hematopoiesis (CXCL12 and SDC1) proteins.

Dynamic PINs: Using Fruit Fly data to study cancer initiation and progression
COSI: NetBio
  • Faisal F. Khan, Precision Medicine Lab, Pakistan
  • Maryam Shah, Precision Medicine Lab, Pakistan

Short Abstract: Cancer is a genetic disease driven by the sequential accumulation of mutations in cancer genes over time, leading to cancer initiation and, later, progression. The aim of this study was to develop a dynamic network-based model of oncogenesis using time-course gene expression data across developmental stages in Drosophila melanogaster. We identified 723 cancer genes from the Cancer Gene Census in the COSMIC database and mapped them to 638 fruit fly orthologs. Of these, 609 had temporal gene expression data across 21 time-courses in FlyBase and were taken forward. Next, we developed a dynamic protein interaction network (PIN) across the 21 time-courses. We found 25 proteins that were ‘persistent’ across all time-courses. We functionally annotated these persistent proteins using DAVID and found the following enriched GO annotations: biological process (regulation of cell shape, centriole duplication, translation initiation, phagocytosis, responses to stress), cellular component (cytoplasm, nucleus, nucleosome, and centriole) and molecular function (succinate dehydrogenase (ubiquinone) activity, protein heterodimerization activity, kinase binding, nucleosomal DNA binding and protein binding). We continue to analyze the dynamic network for other topological features, including hubs, gate-keepers and sub-clusters, in a temporal context, and to study the persistent proteins for their roles, essential and otherwise.

Enabling single-cell trajectory network enrichment
COSI: NetBio
  • Alexander G. B. Grønning, University of Copenhagen, Denmark
  • Mhaned Oubounyt, University of Hamburg, Germany
  • Jan Baumbach, University of Hamburg, Germany

Short Abstract: Single-cell sequencing (scRNA-seq) technologies allow the investigation of cellular differentiation processes with unprecedented resolution. Although powerful software packages for scRNA-seq data analysis exist, systems biology-based tools for trajectory analysis are rare and typically difficult to handle. This hampers biological exploration and prevents researchers from gaining deeper insights into the molecular control of developmental processes. To address this, we have developed Scellnetor, a network-constrained time-series clustering algorithm. It allows extraction of temporal differential gene expression network patterns (modules) that explain the difference in regulation of two developmental trajectories. Using well-characterized experimental model systems, we demonstrate the capacity of Scellnetor as a hypothesis generator to identify putative mechanisms driving haematopoiesis or mechanistically interpretable subnetworks driving dysfunctional CD8 T-cell development in chronic infections. Altogether, Scellnetor allows for single-cell trajectory network enrichment, which effectively lifts scRNA-seq data analysis to a systems biology level.

Fast and flexible analysis of linked microbiome data with mako
COSI: NetBio
  • Lisa Röttjers, KU Leuven, Belgium
  • Karoline Faust, Department of Microbiology and Immunology, REGA Institute, KU Leuven, Leuven, Belgium

Short Abstract: While meta-analytical approaches are increasingly adopted in a range of studies, no framework exists that facilitates flexible and intuitive methods of storage and analysis for microbial association networks. Static network databases are unsuitable for such networks, as microbial network construction tools evolve rapidly and do not usually lead to the generation of validated microbial interaction networks. Consequently, the analysis of microbial associations demands a more flexible setup that can integrate a wealth of data.

To achieve this, we developed mako, a software package for the rapid and simple construction and use of network databases from microbiome data. The mako software provides an interface between standard biological formats and Neo4j databases. We use a curated database containing 60 unique data sets to illustrate how Neo4j can aid in the study of microbial networks. Specifically, we find that 4-node cliques are overrepresented in animal-derived data sets compared to data sets labelled as plant, saline or non-saline.
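To make the 4-node clique statistic concrete, the sketch below counts such cliques in a small association network held in memory. This is an illustration only and does not use mako or Neo4j: the function name `count_four_cliques` and the toy network are hypothetical, and the brute-force enumeration is only practical for small networks.

```python
from itertools import combinations
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # taxon -> set of associated taxa (undirected)


def count_four_cliques(adj: Graph) -> int:
    """Count sets of four nodes in which every pair is connected."""
    count = 0
    for quad in combinations(sorted(adj), 4):
        if all(v in adj[u] for u, v in combinations(quad, 2)):
            count += 1
    return count


# Hypothetical toy network: five fully connected taxa plus one pendant taxon.
taxa = ["t1", "t2", "t3", "t4", "t5"]
network: Graph = {t: {u for u in taxa if u != t} for t in taxa}
network["t5"].add("t6")
network["t6"] = {"t5"}

n_cliques = count_four_cliques(network)
print(n_cliques)
```

Comparing such counts across networks of different origin, as in the abstract, would additionally require a null model to judge over-representation; that step is not shown.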

Genetic adaptation of bacterial catabolic pathways for pesticide biodegradation: role of historical contingency
COSI: NetBio
  • Kathleen Marchal, UGent, Belgium
  • Harry Lerner, KU Leuven, Belgium
  • Patricio Puchaicela, UGent, Belgium
  • Simon Isphording, UGent, Belgium
  • Dirk Springael, KU Leuven, Belgium

Short Abstract: The use of pesticides contaminates surface and groundwater. Biodegradation is crucial for the fate of pesticides in the environment both in natural attenuation processes in the field and in dedicated bioremediation technologies for mitigating contamination.

Genetic adaptation appears important in the acquisition of novel catabolic pathways for bacterial pesticide degradation. Such pathways apparently evolved by combining individual catabolic gene modules and/or by mutations/rearrangements in genes for degradation of structurally related natural or xenobiotic compounds. Here we assessed through an evolution experiment of communities subjected to different related pesticides whether communities adapted towards one pesticide can more easily gain the capacities to metabolize a novel but related pesticide not yet seen by the community. DNA extracted from parallel evolved communities was subjected to DNA SIP metagenomics analysis. Metagenome analysis pointed towards unexpected changes in community composition.

GLOWgenes, a novel method to predict new gene-disease associations by a disease-aware evaluation of heterogeneous molecular networks
COSI: NetBio
  • Lorena de la Fuente, Health Research Institute Fundación Jiménez Díaz (IIS-FJD), UAM, Spain
  • Irene Perea-Romero, Health Research Institute Fundación Jiménez Díaz (IIS-FJD), UAM, Spain
  • Marta del Pozo-Valero, Health Research Institute Fundación Jiménez Díaz (IIS-FJD), UAM, Spain
  • Fiona Blanco-Kelly, Health Research Institute Fundación Jiménez Díaz (IIS-FJD), UAM, Spain
  • Carmen Ayuso, Health Research Institute Fundación Jiménez Díaz (IIS-FJD), UAM, Spain
  • Pablo Minguez, Health Research Institute Fundación Jiménez Díaz (IIS-FJD), UAM, Spain

Short Abstract: The screening for pathogenic variants in the diagnosis of genetic diseases can now be performed in all genes due to the application of whole exome/genome sequencing. To accelerate the discovery of new gene-disease associations, several computer-based algorithms are available, mainly searching the functional neighborhood of known disease-causing genes. With this aim, we hypothesize that the capacity of every type of functional information to extract relevant insights depends on the disease under study.

We compiled a collection of 33 functional networks classified into 13 knowledge categories (KCs) and developed GLOWgenes (www.glowgenes.org), a network-based algorithm using a random-walk-with-restart propagation model that systematically evaluates the ability of each KC to recover genes on a given list associated with any phenotype/disease and modulates the prediction of new candidates accordingly.

Applied to 91 gene sets associated with diseases, we observed high variability in the ability of the KCs to recover disease genes, measured in terms of efficiency and uniqueness. A comparison with other state-of-the-art tools shows that GLOWgenes can boost the discovery of new gene-disease associations, especially the less obvious ones. Applied to 15 unsolved WES cases, GLOWgenes proposed five new genes for retinal dystrophies, two of them not previously associated with any disorder.
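
The random-walk-with-restart propagation mentioned in this abstract can be illustrated with a minimal sketch. This is a generic textbook formulation, not the GLOWgenes implementation: the function name, the restart probability of 0.3, and the four-gene toy network are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-12, max_iter=10000):
    """Return steady-state visiting probabilities for a walk restarting at `seeds`.

    adj   : dict mapping each node to the set of its neighbours (undirected)
    seeds : set of known disease genes where the walker restarts
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fraction of the mass always returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if not adj[u]:
                # Isolated node: send its remaining mass back to the seeds.
                for s in seeds:
                    nxt[s] += (1.0 - restart) * p[u] / len(seeds)
                continue
            # Spread the rest of the mass evenly over the neighbours of u.
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network: a path A - B - C - D with A as the only seed gene.
toy_network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(toy_network, seeds={"A"})

# Rank the non-seed genes by their proximity to the seed.
candidates = sorted((n for n in toy_network if n != "A"), key=scores.get, reverse=True)
```

In this toy case the candidates are ranked by their distance from the seed; in a real setting one such ranking would be produced per network and then weighted, which the sketch does not attempt.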

Graphlet Coefficient Vector: a comprehensive topological descriptor for biological networks
COSI: NetBio
  • Gaia Ceddia, Barcelona Supercomputing Center, Spain
  • Markus K. Youssef, Université de Lausanne, Switzerland
  • Sergio Doria-Belenguer, Barcelona Supercomputing Center, Spain
  • Noel Malod-Dognin, Barcelona Supercomputing Center, Spain
  • Natasa Przulj, Barcelona Supercomputing Center, Spain

Short Abstract: Over the last two decades, the use of network science in biology has unraveled valuable insights into the organizational principles of a cell. It has been shown that functional mechanisms in a cell are better captured by interaction patterns than by focusing on isolated constituents. Thus, numerous tools and measures have been developed to quantify the topological structures of biological networks. For example, the number of appearances of graphlets, i.e., small induced graphs, is a clear topological descriptor, displaying correlations between functional annotations of gene products and their role in multiple network representations (e.g., protein-protein interaction, coexpression, or genetic interaction networks). In this study, we develop a new method to quantify rewiring patterns of biological frameworks by leveraging the redundancy relations of orbit counts (i.e., node partitions of graphlets). We define graphlet coefficients as the normalized orbit counts for given redundancy equations; thus, they are independent of each other and do not need normalization. We further demonstrate that graphlet coefficients better capture biological annotations within several species-specific networks than previously used graphlet counts. Graphlet coefficients also perform better in distinguishing both model and real networks than orbit counts, revealing a comprehensive topological descriptor that can detect different local topologies.
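
To make the notion of orbit counts concrete, the following minimal sketch counts the four orbits of graphlets with two and three nodes for every node of a small graph. It illustrates plain orbit counts only; the graphlet coefficients defined in the abstract are not reproduced here, and the function name and toy graph are assumptions.

```python
from itertools import combinations


def orbit_counts(adj):
    """Return {node: [o0, o1, o2, o3]} for graphlets on 2 and 3 nodes.

    o0: edges touching the node (its degree)
    o1: induced 3-node paths where the node is an end
    o2: induced 3-node paths where the node is the middle
    o3: triangles containing the node
    """
    counts = {}
    for v, nbrs in adj.items():
        degree = len(nbrs)
        # A pair of neighbours that is itself connected closes a triangle.
        triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # Every other pair of neighbours forms an induced path centred on v.
        path_middle = degree * (degree - 1) // 2 - triangles
        # Two-step walks v-u-w, minus those closed into triangles (counted twice).
        path_end = sum(len(adj[u]) - 1 for u in nbrs) - 2 * triangles
        counts[v] = [degree, path_end, path_middle, triangles]
    return counts


# Hypothetical toy graph: triangle A-B-C with a pendant node D attached to C.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
gdv = orbit_counts(toy)
```

Note that the counts are not independent: for every node, the number of neighbour pairs equals `o2 + o3`. Dependencies of this kind are one simple example of a redundancy relation between orbit counts.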

Graphlet-based Coalescent embedding uncovers complementary biological information in yeast molecular networks
COSI: NetBio
  • Daniel Tello, Barcelona Supercomputing Center, Spain
  • Sam F.L. Windels, Barcelona Supercomputing Center, Spain
  • René Böttcher, Barcelona Supercomputing Center, Spain
  • Noël Malod-Dognin, Barcelona Supercomputing Center, Spain
  • Nataša Pržulj, Barcelona Supercomputing Center, Spain

Short Abstract: Understanding the functional organization of molecular networks is an ongoing challenge. For this purpose, Spatial Analysis of Functional Enrichment (SAFE) was recently proposed to uncover functionally enriched regions in a network. It does so by embedding the network into a 2-dimensional (2D) space and finding groups of nodes embedded together that are statistically significantly similarly annotated. While SAFE is based on the Spring embedding algorithm, other approaches such as Coalescent and graphlet-based embedding are known to embed biological networks in biologically more relevant ways. We extend SAFE to use both Coalescent and graphlet-based embedding. Moreover, we combine these two methods into Graphlet-Based-Coalescent (GraCoal) embedding and compare it to the default Spring embedding by embedding three different budding yeast molecular interaction networks: genetic interaction, co-expression, and protein-protein interaction networks. We show that the topology captured by GraCoal embeddings allows for uncovering functional information that is more functionally coherent than that obtained using SAFE with the Spring embedding. We also show that different graphlets within this embedding framework capture complementary biological functions. Finally, we examine the complementarity of GraCoal embeddings by investigating biological functions that are uniquely captured by each corresponding graphlet.

Heterogeneity in the gene regulatory landscape of leiomyosarcoma.
COSI: NetBio
  • Tatiana Belova, Centre for Molecular Medicine Norway (NCMM), University of Oslo, Norway
  • Ping-Han Hsieh, Centre for Molecular Medicine Norway (NCMM), University of Oslo, Norway
  • Yang Jing, German Cancer Research Center (DKFZ), Heidelberg, Germany
  • Priya Chudasama, German Cancer Research Center (DKFZ), Heidelberg, Germany
  • Marieke Kuijjer, Centre for Molecular Medicine Norway (NCMM), University of Oslo, Norway

Short Abstract: Characterizing inter-tumor heterogeneity is crucial for selecting suitable cancer therapy as the presence of diverse molecular subgroups of patients can be associated with disease outcome or response to treatment. However, no methods have been developed to characterize heterogeneity based on genome-wide patient-specific regulatory networks. Here, we propose a simple but efficient approach to characterize inter-tumor regulatory network heterogeneity, which we call PORCUPINE (PCA to Obtain Regulatory Contributions Using Pathway-based Interpretation of Network Estimates). PORCUPINE uses as input individual patient regulatory networks, represented by estimated regulatory interactions between transcription factors and their target genes, and a list of genes assigned to biological pathways in order to identify pathways that drive heterogeneity among individuals. We used PORCUPINE to model regulatory heterogeneity in leiomyosarcoma, a rare, aggressive, and heterogeneous cancer. We applied it to 80 genome-wide leiomyosarcoma regulatory networks modeled on data from TCGA and validated the results in an independent dataset of 37 leiomyosarcoma cases. PORCUPINE identified 37 pathways, including pathways that represent potential targets for treatment, such as FGFR and CTLA4 inhibitory signaling pathways. PORCUPINE provides a robust way of analysing and interpreting patient-specific regulatory networks and serves as the first step towards implementing network-informed personalized medicine in leiomyosarcoma.

Higher-resolution protein interaction networks for Precision Medicine: A case for Head & Neck Cancer
COSI: NetBio
  • Faisal F. Khan, Precision Medicine Lab, Pakistan
  • Maryam Shah, Precision Medicine Lab, Pakistan

Short Abstract: Protein-Protein Interaction (PPI) networks are great scaffolds for omics datasets and can aid significantly in drug discovery research. This study aimed to use domain-domain interaction (DDI) data to produce higher resolution protein interaction networks with enhanced topological insights. An extensive literature review identified 65 driver genes for HNSCC and 19 genes for TNBC which collectively had 557 and 316 protein domains. Eventually, 343/557 and 191/316 domains had well-annotated somatic mutations reported in COSMIC.
The top hubs in PPI networks were TP53 (for HNSCC) and EGFR (for TNBC), while the top hubs in DDI networks were PF12796 (NOTCH) and PF07714 (EGFR) for HNSCC and TNBC networks, respectively.
The top nodes in sub-clusters based on module analysis in PPI networks were ERBB4 and NOTCH1 for HNSCC, and AR and EGFR for TNBC. For DDI networks, the top nodes in the densest sub-cluster for HNSCC were PF06816 (NOTCH1-3), while for TNBC, PF07714 domains from FGFR1 and EGFR were top nodes.
Our preliminary results re-emphasize the significance of NOTCH, EGFR and TP53, and their domains in cancer. 20/31 FDA-approved cancer drugs were mapped to domains within the HNSCC network. With Cetuximab approved for HNSCC, 19 candidates can be further analyzed for repurposing in HNSCC.

Identification and Clustering Network of Virulent Aeromonas Hydrophila C16-13425 Hypothetical Proteins
COSI: NetBio
  • Harun Pirim, Industrial and Systems Engineering, Mississippi State University, United States
  • Hasan C. Tekedar, College of Veterinary Medicine, Mississippi State University, United States
  • Matt J. Griffin, College of Veterinary Medicine, Mississippi State University, United States
  • Geoffrey C. Waldbieser, Warmwater Aquaculture Research Unit, Agriculture Research Service, U.S. Department of Agriculture, United States
  • Larry A. Hanson, College of Veterinary Medicine, Mississippi State University, United States

Short Abstract: Aeromonas hydrophila is an opportunistic pathogen that causes infections in humans and fish species. Virulent A. hydrophila (vAh) has affected the U.S. channel catfish industry since 2009, causing extensive mortalities and economic losses. However, our knowledge of the pathogenic mechanisms of vAh is limited. We sequenced the complete genome of A. hydrophila strain C16-13425, which was isolated from an outbreak of Aeromonas septicemia in catfish from a commercial production pond in Mississippi. Many proteins (1082 out of 4879) from its genome are not assigned a role. These unknown proteins are called hypothetical proteins, and their functions and potential biological roles remain to be elucidated. The Pfam and CATH databases are used to retain hypothetical proteins by consensus, which leaves 83 sequences in common. These sequences are submitted to the Blast, DEG, and PSORTdb databases to obtain information about homologous sequences, essential genes, and subcellular localization. The information is employed to construct a weighted similarity network of relationships between hypothetical proteins. The Louvain community detection algorithm is applied, and three distinct clusters of proteins are formed.

Identification of the most influential nodes involving all topological dimensions of a network
COSI: NetBio
  • Abbas Salavaty, Monash University, Australia
  • Mirana Ramialison, Monash University, Australia
  • Peter Currie, Monash University, Australia

Short Abstract: Biological systems are composed of highly complex networks and several algorithms have been designed to identify the most influential regulatory points within them. However, current methods do not address all the topological dimensions of a network or correct for inherent positional biases. Here we present Integrated Value of Influence (IVI), which integrates the most important and commonly used network centrality measures in an unbiased way and captures all of the topological dimensions of a network to successfully identify the most influential nodes.

idg.reactome.org: A web-based platform for visualizing dark proteins in the context of Reactome pathways
COSI: NetBio
  • Solomon Shorser, Ontario Institute for Cancer Research, Canada
  • Robin Haw, OICR, Canada
  • Tim Brunson, Oregon Health Science University, United States
  • Nasim Sanati, Oregon Health Science University, United States
  • Lisa Matthews, Reactome, United States
  • Lincoln Stein, Ontario Institute for Cancer Research, Canada
  • Peter D’eustachio, NYU School of Medicine, United States
  • Guanming Wu, OHSU, United States

Short Abstract: Due to research bias towards already druggable genes, there is minimal knowledge about one-third of protein-coding genes. idg.reactome.org provides a collection of tools to explore understudied proteins in the context of Reactome. Reactome is the most comprehensive, open access pathway knowledgebase. The IDG-specific tools are designed to facilitate generation of experimentally testable hypotheses to illuminate dark genes. The homepage allows users to search any gene and view its location in Reactome’s annotated pathways and interacting pathways reachable via one-hop pairwise relationships. Users can view scored interacting pathways based on functional interactions predicted from a random forest model trained with 106 features. We have extended our Pathway Browser with new overlays and visualizations. In the overview, users can view interacting pathways of a searched protein. When a pathway is opened, users are presented with an extended diagram viewer, which allows them to view protein knowledge levels, overlay multiple tissue-specific expression values from 19 sources, and overlay protein/protein pairwise relationships or drug/target interactions. Reactome’s diagrams can also be converted into functional interaction networks and viewed with cytoscape.js. This portal offers an integrative platform for researchers to study dark proteins and learn about their potential functions.

Integrating expression and network analyses to unveil functional features of brassinosteroid signaling in Arabidopsis mutant lines
COSI: NetBio
  • Razgar Seyed Rahmani, Ghent University, Belgium
  • Tao Shi, Chinese Academy of Sciences, China
  • Dongzhi Zhang, Lanzhou University, China
  • Xiaoping Gou, Lanzhou University, China
  • Jing Yi, Lanzhou University, China
  • Giles Miclotte, Ghent University, Belgium
  • Kathleen Marchal, Ghent University, Belgium
  • Jia Li, Lanzhou University, China

Short Abstract: Brassinosteroid (BR) signaling regulates plant growth and development. Although many genes have been identified that play a role in BR signaling, the functional consequences of disrupting those key BR genes still require detailed investigation. Here we performed phenotypic and transcriptomic comparisons of A. thaliana lines carrying a loss-of-function mutation in the BRI1 gene, bri1-5, which exhibits a dwarf phenotype, and its three activation-tag suppressor lines that were able to partially revert the bri1-5 mutant phenotype to a WS2 phenotype, namely bri1-5/bri1-1D, bri1-5/brs1-1D, and bri1-5/bak1-1D. Of the three investigated bri1-5 suppressors, bri1-5/bak1-1D was the most effective. All three bri1-5 suppressors showed altered expression of the genes in the abscisic acid (ABA) signaling pathway, indicating that ABA likely contributes to the recovery of the wild-type phenotype in these bri1-5 suppressors. Network analysis revealed crosstalk between BR and other phytohormone signaling pathways, suggesting that interference with one hormone signaling pathway also affects other hormone signaling pathways. In addition, differential expression analysis suggested the existence of a strong negative feedback from BR signaling on BR biosynthesis and also predicted that BRS1, rather than being directly involved in signaling, might be responsible for providing an optimal environment for the interaction between BRI1 and its ligand.

Internetwork connectivity of molecular networks across species of life
COSI: NetBio
  • Tarun Mahajan, University of Illinois at Urbana-Champaign, United States
  • Roy D. Dar, University of Illinois at Urbana-Champaign, United States

Short Abstract: Molecular interactions are studied as independent networks in systems biology. However, molecular networks do not exist independently of each other. In a network of networks approach (called multiplex), we study the joint organization of transcriptional regulatory network (TRN) and protein–protein interaction (PPI) network. We find that TRN and PPI are non-randomly coupled across five different eukaryotic species. Gene degrees in TRN (number of downstream genes) are positively correlated with protein degrees in PPI (number of interacting protein partners). Gene–gene and protein–protein interactions in TRN and PPI, respectively, also non-randomly overlap. These design principles are conserved across the five eukaryotic species. Robustness of the TRN–PPI multiplex is dependent on this coupling. Functionally important genes and proteins, such as essential, disease-related and those interacting with pathogen proteins, are preferentially situated in important parts of the human multiplex with highly overlapping interactions. We unveil the multiplex architecture of TRN and PPI. Multiplex architecture may thus define a general framework for studying molecular networks. This approach may uncover the building blocks of the hierarchical organization of molecular interactions.
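
The two coupling measures named in this abstract, the correlation of gene degrees across layers and the overlap of interactions, can be sketched on a toy multiplex. The edge lists below are invented for illustration and the helper name `pearson` is an assumption; a real analysis would also compare the values against randomized networks, which this sketch omits.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical layers over the same gene set.
trn_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")]  # regulator -> target
ppi_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g4")]  # undirected

genes = sorted({g for edge in trn_edges + ppi_edges for g in edge})

# TRN degree = number of downstream genes; PPI degree = number of partners.
trn_out_degree = Counter(source for source, _ in trn_edges)
ppi_degree = Counter(g for edge in ppi_edges for g in edge)
degree_correlation = pearson(
    [trn_out_degree[g] for g in genes],
    [ppi_degree[g] for g in genes],
)

# Edge overlap: gene pairs linked in both layers, ignoring TRN direction.
trn_pairs = {frozenset(edge) for edge in trn_edges}
ppi_pairs = {frozenset(edge) for edge in ppi_edges}
shared_pairs = trn_pairs & ppi_pairs
```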

Jointly modeling networks from multiple species to improve network-based gene classification
COSI: NetBio
  • Arjun Krishnan, Michigan State University, United States
  • Christopher Mancuso, Michigan State University, United States
  • Kayla Johnson, Michigan State University, United States
  • Sneha Sundar, Michigan State University, United States

Short Abstract: Network-based machine learning is a powerful approach for leveraging the cellular context of genes to computationally predict novel/under-characterized genes that are functionally similar to a set of known genes of interest. One powerful network-based gene classification method that is gaining popularity is to use supervised learning algorithms where the features for each gene are determined by that gene’s connections in a molecular network. In this work, we explore how networks from multiple species can be jointly leveraged to improve this gene classification method. We first build multi-species networks by connecting nodes (genes/proteins) in different species if they belong to the same orthologous group. Then, we create feature representations by directly considering a gene's connection to all other genes in the entire multi-species network or considering a low-dimensional embedding for the entire network. We find that adding information across species improves performance for the tasks of predicting human and model species gene annotations across a set of non-redundant gene ontology biological processes. In addition to providing better predictions, this approach allows genes across species to be represented in the same “space” where they can be naturally incorporated into any joint model.

Linear functional organization of the omic embedding space
COSI: NetBio
  • Alexandros Xenos, Barcelona Supercomputing Centre, Spain
  • Noel Malod-Dognin, Barcelona Supercomputing Centre, Spain
  • Stevan Milinković, RAF School of Computing, Serbia
  • Natasa Przulj, ICREA; Barcelona Supercomputing Center; University College London, Spain

Short Abstract: Motivation: We are increasingly accumulating complex omics data that capture different aspects of cellular functioning. A key challenge is to untangle their complexity and effectively mine them for new biomedical information. To decipher this new information, we introduce algorithms based on network embeddings.

Methods: Since neural networks used to obtain vectorial representations (embeddings) are implicitly factorizing a Positive Pointwise Mutual Information (PPMI) matrix, we propose the use of the PPMI matrix to represent the human protein-protein interaction (PPI) network. We also introduce the Graphlet Degree Vector PPMI matrix of the PPI network to capture different topological (structural) similarities of the nodes in the molecular network. Then, we generate gene embeddings by decomposing these matrices with Non-Negative Matrix Tri-Factorization.

Results: We demonstrate that we can extract new biomedical knowledge directly by doing linear operations on the genes' vectorial representations. We exploit this property to identify new cancer-related genes and predict genes participating in protein complexes based on the cosine similarities between the vector representations of the genes. We validate 80% of our novel cancer-related gene predictions in the literature, and with patient survival curves, we demonstrate that 93.3% of them have a potential clinical relevance as biomarkers of cancer.
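
As a rough illustration of two ingredients named above, the sketch below builds a PPMI matrix and compares node vectors by cosine similarity. It makes two simplifying assumptions that are not taken from the abstract: the adjacency matrix itself is used as the co-occurrence count matrix, and rows of the PPMI matrix are compared directly, without the Non-Negative Matrix Tri-Factorization step.

```python
from math import log, sqrt


def ppmi(matrix):
    """Positive Pointwise Mutual Information of a non-negative count matrix."""
    total = sum(map(sum, matrix))
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    result = []
    for i, row in enumerate(matrix):
        out_row = []
        for j, value in enumerate(row):
            if value > 0:
                pmi = log(value * total / (row_sums[i] * col_sums[j]))
                out_row.append(max(0.0, pmi))  # clip negative PMI to zero
            else:
                out_row.append(0.0)  # zero counts get zero PPMI by convention
        result.append(out_row)
    return result


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy PPI network: proteins 0 and 1 both interact with 2 and 3.
adjacency = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
P = ppmi(adjacency)
same_partners = cosine(P[0], P[1])      # identical interaction profiles
different_partners = cosine(P[0], P[2])  # no shared interaction partners
```

Proteins 0 and 1 share all of their partners, so their vectors coincide, while proteins 0 and 2 share none and their vectors are orthogonal.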

Multiscale phase separation by percolation model with the single chromatin loop resolution
COSI: NetBio
  • Yijun Ruan, The Jackson Laboratory for Genomic Medicine, United States
  • Dariusz Plewczynski, Centre of New Technologies, University of Warsaw, S. Banacha 2c, 02-097 Warsaw, Poland, Poland
  • Kaustav Sengupta, Centre of New Technologies, University of Warsaw, S. Banacha 2c, 02-097 Warsaw, Poland, Poland
  • Michał Denkiewicz, Faculty of Mathematics and Information Science, Warsaw Technical University, Warsaw, Poland, Poland
  • Teresa Szczepińska, Centre for Advanced Materials and Technologies, Warsaw Technical University, Warsaw, Poland, Poland
  • Ayatullah Mollah, Department of Computer Science and Engineering, Aliah University, Kolkata, West Bengal, India, India
  • Raissa D'Souza, Department of Computer Science, University of California, Davis, USA, United States

Short Abstract: We propose models of dynamical human genome folding into hierarchical components in the GM12878 cell line. Our models are based on explosive percolation theory and polymer loop extrusion. The chromosomes are modeled as graphs where CTCF chromatin loops are represented as edges. The folding trajectory is simulated by gradually introducing loops to the graph following various edge addition methods which are based on topological properties, loop frequency, compartment data, or chromatin features. In particular, we propose an order parameter which is a scalar value calculated based on chromatin features to guide the genome folding. The value is fitted by Linear Discriminant Analysis to classify the compartments efficiently. The phase separation, where chromatin fiber is condensing in 3D space to topological domains or compartments, is observed when the critical number of contacts is reached. Overall, our in silico model integrates high-throughput, population-averaged genome interaction experimental data with novel theoretical concepts of phase separation. This allows us to model event-based time dynamics of chromatin loop formation and folding trajectories.
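
The idea of adding edges one at a time and watching for the emergence of a large connected component can be sketched with a classic explosive-percolation rule. The code uses the Achlioptas product rule on random candidate edges, which is a generic stand-in and not the authors' model: there, the edges are CTCF loops and the order of addition follows topological properties, loop frequency, compartment data, or chromatin features. All names and parameters below are assumptions.

```python
import random


class DisjointSet:
    """Union-find structure that tracks component sizes."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the components of a and b; return the resulting size."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return self.size[root_a]
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]
        return self.size[root_a]


def product_rule_percolation(n, steps, rng):
    """Add `steps` edges to n nodes; return the largest component size after each."""
    components = DisjointSet(n)

    def product(edge):
        a, b = edge
        return components.size[components.find(a)] * components.size[components.find(b)]

    largest, history = 1, []
    for _ in range(steps):
        # Draw two candidate edges and keep the one joining the smaller components.
        first = rng.sample(range(n), 2)
        second = rng.sample(range(n), 2)
        a, b = first if product(first) <= product(second) else second
        largest = max(largest, components.union(a, b))
        history.append(largest)
    return history


history = product_rule_percolation(n=200, steps=300, rng=random.Random(1))
```

Plotting `history` against the number of added edges shows how the largest component grows; the point of rapid growth plays the role of the critical number of contacts mentioned in the abstract.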

NetControl4BioMed: A web-based platform for controllability analysis of protein-protein interaction networks
COSI: NetBio
  • Victor Popescu, Åbo Akademi University, Finland
  • Jose Angel Sanchez Martin, Technical University of Madrid, Spain
  • Daniela Schacherer, Heidelberg University, Germany
  • Sadra Safadoust, Koç University, Turkey
  • Negin Majidi, University of California, Santa Cruz, United States
  • Andrei Andronescu, Polytechnic University of Bucharest, Romania
  • Alexandru Nedea, Polytechnic University of Bucharest, Romania
  • Diana Ion, Polytechnic University of Bucharest, Romania
  • Eduard Mititelu, Polytechnic University of Bucharest, Romania
  • Eugen Czeizler, Åbo Akademi University, Finland
  • Ion Petre, University of Turku, Finland

Short Abstract: Target network controllability aims to discover suitable external interventions that can guide a system to a specific state. In the biomedical domain, it can translate to finding drugs that can influence a cell in a desired way. This can lead to novel and personalized therapeutic suggestions based on drug combinations and drug repurposing. We introduce NetControl4BioMed, a free open-source web-based application that allows users to generate or upload personalized protein-protein interaction networks and to investigate and analyze them from a controllability point of view, providing customized drug therapeutic suggestions through a user-friendly interface, while offering close integration with external applications and databases. Additionally, it makes sharing between users possible, offering the possibility for collaboration on work. The application integrates protein data from HGNC, Ensembl, UniProt, NCBI, and InnateDB, protein-protein interaction data from InnateDB, Omnipath, and SIGNOR, cell-line data from COLT and DepMap, and drug-target data from DrugBank. The application and data are available online at netcontrol.combio.org/. The source code is available at github.com/Vilksar/NetControl4BioMed under an MIT license.

Network analysis of DEGs with putative roles in neuronal development under microgravity conditions
COSI: NetBio
  • Faisal F. Khan, CECOS-RMI Precision Medicine Lab, Pakistan
  • Maryam Shah, CECOS-RMI Precision Medicine Lab, Pakistan
  • Quratulain Danish, Institute of Integrative Biosciences, CECOS University, Pakistan

Short Abstract: Microgravity promotes changes at the cellular level as well as the genomic level. These changes contribute to the risks of abnormal physiological and neurological responses faced by astronauts during prolonged missions, for example, at the International Space Station (ISS). This study aims to investigate the potential effects of microgravity on the expression levels of genes that are key to the central nervous system (CNS), especially the differentiation of stem cells into neurons. We use transcriptomic data from the Gene Expression Omnibus (GEO) as well as NASA GeneLab. We have identified 60 key genes that are important across the four stages of neuronal development. Following the detailed analysis of differentially expressed genes (DEGs) across these datasets, we intend to conduct a systems-level study using protein-protein interaction data in order to study the functional interdependencies of the DEGs. We hope to understand how the CNS changes in response to, and adapts to, microgravity conditions.

Network mapping of pharmacogenomic data on a diverse panel of disease phenotypes enhances cell type-aware drug discovery
COSI: NetBio
  • Arda Halu, Brigham and Women's Hospital/Harvard Medical School, United States
  • Julius Decano, Brigham and Women's Hospital/Harvard Medical School, United States
  • Joan Matamalas, Brigham and Women's Hospital/Harvard Medical School, United States
  • Takaharu Asano, Brigham and Women's Hospital/Harvard Medical School, United States
  • Namitra Kalicharran, Brigham and Women's Hospital/Harvard Medical School, United States
  • Sasha Singh, Brigham and Women's Hospital/Harvard Medical School, United States
  • Masanori Aikawa, Brigham and Women's Hospital/Harvard Medical School, United States

Short Abstract: Large-scale pharmacogenomic databases such as the Connectivity Map (CMap) have greatly assisted computational drug discovery. Despite their utility, CMap studies have mostly been agnostic to gene-perturbation interactions in multiple disease contexts. We present a computational framework that uses the recent large-scale CMap to build over 50 cell type-specific gene-perturbation networks and integrates these networks with an intermediary panel of disease phenotypes and cheminformatic data for a nested prioritization of cell lines and perturbations. The prediction performance of our method surpasses that of solely cheminformatic measures, as well as state-of-the-art methods that use CMap data to generate gene-perturbation networks and rank perturbations in a cell type-specific manner. Top-ranked drug perturbations identified using our framework have high chemical structural diversity, suggesting its potential for building compound libraries. Finally, a proof-of-concept application of our framework demonstrates the effectiveness of the intermediary disease phenotypes in providing additional non-redundant information on drug mechanisms related to diseases that are not directly evident from the input disease signatures. Overall, our analytical framework outperforms currently available methods in terms of predictive power and offers the potential to be a feasible blueprint for a cell type-specific drug discovery and repositioning platform that accounts for multiple disease phenotypes.

Network modules identify unique breast invasive lobular carcinoma pathways
COSI: NetBio
  • George Acquaah-Mensah, Massachusetts College of Pharmacy and Health Sciences (MCPHS) - Worcester, MA, United States
  • Eunmi Kim, Massachusetts College of Pharmacy and Health Sciences (MCPHS) - Worcester, MA, United States
  • Eunmi Kim, mcphs, United States

Short Abstract: Breast invasive lobular carcinomas (ILC) are more aggressive than invasive ductal carcinomas (IDC) in terms of metastasis, worse prognoses, larger tumor sizes, and worse survival outcomes.
This study seeks to characterize age effects on differential expression patterns between ILC and IDC, and how protein-protein interactions and biological pathways associated with cancer progression are impacted by those.
Using SQL and Google BigQuery, breast tumor sample RNASeq and clinical data derived from the Cancer Genome Atlas (TCGA) were interrogated. Differentially expressed genes (DEGs) between ILC and IDC within different age groups were identified using the Bioconductor package siggenes. A DEG-based subset of the STRING protein-protein interaction (PPI) network was analyzed in Cytoscape. Network modules (jActive Modules) were examined.
Based on DEGs, 126,075 edges of PPI were examined. CDK1 and SRC represent DEGs with the highest degrees in the network. CDK1 is in physical interaction with proteins involved in the G2/M Transition, pathways in Alzheimer’s disease models, Neurodegenerative Diseases, and Diseases of programmed cell death. SRC interacts with proteins involved in Signaling by receptor tyrosine kinases, cell-cell communication, signaling by SCF-KIT, signaling by MET, and co-stimulation of the CD28 family. These elucidate differences in aggressiveness between ILC and IDC and characterize age-dependent changes.

Network-based approach to study drug mechanisms-of-action
COSI: NetBio
  • Vivian Robin, EMBL-EBI, United Kingdom
  • Girolamo Giudice, EMBL-EBI, United Kingdom
  • Evangelia Petsalaki, EMBL-EBI, United Kingdom

Short Abstract: Drugs can have adverse effects, which can range from relatively mild, such as headaches, nausea and vomiting, to very serious, such as severe lung swelling, and can drastically reduce quality of life. Understanding the mechanisms of drug side effects can allow us to target them or predict them. The prediction of side effects usually involves information of a chemical nature, such as the chemical formula and structure of the drug, but also information on the biological profile, such as age, sex, and family history. In our study we aim to use network analysis by random-walk-with-restart to combine information about specific side effects caused by multiple different drugs and identify potential mechanisms of adverse drug effects. We will benchmark our pipeline using information from the Adverse Outcome Pathway Wiki and apply the approach to all side effects annotated in the Open Targets database.

Network-based precision medicine in complex phenotypes: Individualized interactomes in hypertrophic cardiomyopathy
COSI: NetBio
  • Ruisheng Wang, HMS, United States

Short Abstract: See the attachment.

Reducing false GO term calls in network-based active module identification: methodology and a new algorithm
COSI: NetBio
  • Hagai Levi, Tel Aviv University, Israel
  • Ran Elkon, Tel Aviv University, Israel
  • Ron Shamir, Tel Aviv University, Israel

Short Abstract: Algorithms for active module identification (AMI) are central to analysis of omics data. Such algorithms receive a gene network and genes' activity scores as input and report sub-networks with high activity signal ('active modules'), thus representing biological processes that presumably play key roles in the analyzed conditions. Here, we systematically evaluated six popular AMI methods on gene expression and GWAS data. We observed that GO terms enriched in modules detected on the real data were often also enriched on modules found on randomly permuted data. This indicated that AMI methods frequently report modules that are not specific to the biological context measured by the analyzed omics dataset. To tackle this bias, we designed a permutation-based method that empirically evaluates GO terms reported by AMI methods. We used the method to fashion five novel AMI performance criteria. Last, we developed DOMINO, a novel AMI algorithm, that outperformed the other six algorithms in extensive testing on GE and GWAS data. Software is available at github.com/Shamir-Lab.
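
The idea of checking enriched GO terms against permuted data can be illustrated with a generic empirical p-value. This is a minimal sketch of that general idea only, not the authors' published procedure or the DOMINO code; the function names, the placeholder term identifiers, the scores, and the 0.05 threshold are assumptions made for illustration.

```python
def empirical_p_value(observed_score, permuted_scores):
    """Fraction of permutation runs whose enrichment score is at least as high
    as the score on the real data (with the usual +1 correction so the
    estimate is never exactly zero)."""
    at_least_as_high = sum(1 for s in permuted_scores if s >= observed_score)
    return (1 + at_least_as_high) / (1 + len(permuted_scores))


def filter_terms(observed, permuted, alpha=0.05):
    """Keep only terms whose real-data score is rarely matched on permuted data.

    observed: dict term -> enrichment score on the real dataset
    permuted: dict term -> list of scores from runs on permuted datasets
    """
    return {
        term
        for term, score in observed.items()
        if empirical_p_value(score, permuted[term]) <= alpha
    }


# Hypothetical scores: term_A stands out from its permutation background,
# term_B scores about as well on permuted data as on the real data.
observed = {"GO:term_A": 100, "GO:term_B": 50}
permuted = {"GO:term_A": list(range(99)), "GO:term_B": list(range(99))}

kept = filter_terms(observed, permuted)
```

In this toy input only GO:term_A survives, because GO:term_B reaches its real-data score in roughly half of the permutation runs.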

Revealing molecular mechanisms of a rare hereditary thrombophilia using a phenotype integrative approach
COSI: NetBio
  • Noel Malod-Dognin, Barcelona Supercomputing Center (BSC), Spain
  • Gaia Ceddia, Barcelona Supercomputing Center (BSC), Spain
  • Natasa Przulj, Barcelona Supercomputing Center (BSC), Spain

Short Abstract: Traditional genome-wide association studies (GWAS) of rare diseases have always been affected by the scarcity of related data. To overcome this issue, we leverage the property of the Non-negative Matrix Tri-Factorization method to integrate multiple datasets, supporting and boosting the biological signal. Particularly, we focus on a rare subtype of hereditary thrombophilia caused by mutations in the prothrombin gene, called antithrombin resistance. We employ patients' phenotypes from two Serbian families that reported a specific mutation called prothrombin Belgrade mutation, leading to thrombosis disorders. In this study, we develop a data-integration framework to combine the patients' phenotypes with genes' molecular interactions. This allows us to obtain candidate driver genes from integrated gene clusters, which take into account both phenotypes and molecular interactions. Results show that clusters contain annotated genes for thrombosis disorders and candidate driver genes. Indeed, by analyzing healthy-specific and disease-specific subnetworks, we find several genes whose mutations may affect the decreased platelet activation, i.e., a well-known process related to the prothrombin mutation. Overall, our framework can cope with the lack of genomic data providing interesting insights on the molecular mechanisms of the disease, and it can be easily generalized for different kinds of rare diseases.

Revealing the hidden language of DNA
COSI: NetBio
  • Mikhail Rotkevich, Barcelona Supercomputing Center, Spain
  • Sam Windels, Barcelona Supercomputing Center, Spain
  • Carlos Garcia-Hernandez, Barcelona Supercomputing Center, Spain
  • Noël Malod-Dognin, Barcelona Supercomputing Center, Spain
  • Nataša Pržulj, Barcelona Supercomputing Center, Spain

Short Abstract: Advances in DNA sequence data analysis have led to many discoveries, such as the elucidation of disease mechanisms and the uncovering of drug targets. However, current sequence-based methods are usually based on sequence similarity, a relatively simple heuristic. To go beyond sequence similarity, we apply NLP-based methods to the DNA, interpreting it as a long string of text and k-mers (nucleotide sequences of length k) as words. Specifically, we compute the positive pointwise mutual information (PPMI), quantifying the likelihood of co-occurrence of two k-mers on the DNA. Then, we apply non-negative matrix tri-factorisation (NMTF) to the PPMI matrix to create an embedding space that captures the functional organisation of the DNA. We validate this by embedding genes into the k-mer space and showing that clusters of embedded genes are enriched in Gene Ontology terms. By considering different k-mer lengths, we demonstrate that there exists a species-specific optimal k-mer length, beyond which the percentages of enriched clusters and enriched functional annotations plateau. As our framework captures the functional organisation of the entire DNA and utilises NMTF, it could be used to improve current sequence-based applications that usually take only the genome sequence into account, e.g., patient stratification and drug-target repurposing.
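
As a rough illustration of PPMI over k-mers, the sketch below counts how often one k-mer is followed by another within a short window and converts the counts to PPMI values. The window-based definition of co-occurrence, the function names, and the parameter values are assumptions; the abstract does not specify how co-occurrence is defined, and the NMTF step is not shown.

```python
import math
from collections import Counter


def kmers(sequence, k):
    """All overlapping k-mers of the sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def ppmi(sequence, k=3, window=2):
    """PPMI for ordered k-mer pairs (left, right), where `right` starts within
    `window` positions after `left` (assumed co-occurrence definition)."""
    words = kmers(sequence, k)
    pair_counts = Counter()
    for i, left in enumerate(words):
        for right in words[i + 1:i + 1 + window]:
            pair_counts[(left, right)] += 1

    total = sum(pair_counts.values())
    left_counts = Counter()
    right_counts = Counter()
    for (left, right), count in pair_counts.items():
        left_counts[left] += count
        right_counts[right] += count

    result = {}
    for (left, right), count in pair_counts.items():
        # PMI = log2( P(left, right) / (P(left) * P(right)) ), written with counts.
        pmi = math.log2(count * total / (left_counts[left] * right_counts[right]))
        result[(left, right)] = max(0.0, pmi)  # clip negative values to zero
    return result


# Tiny example: "ACGT" with k=2 yields the words AC, CG, GT.
tiny = ppmi("ACGT", k=2, window=1)
```

For the tiny example, the pairs (AC, CG) and (CG, GT) each occur once out of two observed pairs, so each has a PPMI of log2(2) = 1.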

scRNA-NetExtract: generating signalling networks from single-cell expression data
COSI: NetBio
  • Prajna Hebbar, EMBL-EBI, United Kingdom
  • Girolamo Giudice, EMBL-EBI, United Kingdom
  • Evangelia Petsalaki, EMBL-EBI, United Kingdom

Short Abstract: Single-cell RNAseq data and phosphoproteomics data are similar in that they are both noisy and sparse. Hence, it is reasonable that approaches developed for analysing phosphoproteomics data can be used for similar analyses in scRNAseq data.
We have recently developed NetExtract, a global and local network propagation approach that can provide network signatures from phosphoproteomics datasets (Giudice et al., in preparation). In this study, we adapt the method for single-cell RNAseq and evaluate whether it can recover signalling networks from transcriptomics signatures.
As there is no paired scRNAseq/phosphorylation dataset available as a benchmark, we considered the paired bulk transcriptomics - phosphoproteomics data available in the CPTAC studies and simulated single-cell RNAseq data from the bulk RNAseq datasets. To ensure that the pseudo-single-cell data resemble real single-cell data, we performed multiple analyses, comparing the distribution of total gene count, count distribution for individual genes, and mean-variance relationship of gene expression. Once assured of the resemblance, the pseudo-scRNAseq networks were generated using NetExtract. These networks are compared with the phospho-networks. The scores of the genes from the pseudo-scRNAseq networks will be compared to those obtained from phospho-data. If successful, we expect scRNA-NetExtract to be useful for extracting signalling networks from scRNAseq datasets.

Simulation, modeling, and network-guided detection of epistasis
COSI: NetBio
  • Jan Baumbach, Chair of Computational Systems Biology, University of Hamburg, Germany
  • Markus List, Chair of Experimental Bioinformatics, Technical University of Munich, Germany
  • David Blumenthal, Chair of Experimental Bioinformatics, Technical University of Munich, Germany
  • Tim Kacprowski, PLRI, TU Braunschweig, MHH, BRICS, Germany
  • Markus Hoffmann, Chair of Experimental Bioinformatics, Technical University of Munich, Germany

Short Abstract: Genome-wide association studies (GWAS) link genetic variants to phenotypic traits of interest (e.g., a disease), usually by looking for biallelic single nucleotide polymorphisms (SNPs) that are individually predictive of the phenotype. SNPs usually account for only a fraction of the investigated traits' heritability. The most common hypothesis is that the missing heritability can be explained by epistasis, i.e., by interactions between SNPs that are jointly predictive of the phenotype but individually have little or no effect. Although epistasis is assumed to play an important role in the genomics of complex phenotypic traits, no undisputed cases of epistasis in humans are known. Developing epistasis detection tools is problematic for at least three reasons: Firstly, there is no suitable human data with ground truth that could be used for evaluation. Secondly, it is unclear how epistasis should be formally modeled to render it algorithmically accessible. Thirdly, it is often unclear whether predicted cases of epistasis are biologically meaningful or mere statistical artifacts. In our work, we address these problems with (1) an epistasis simulation tool, (2) a comparison of existing statistical models, and (3) a detection tool guided by biological knowledge to lift state-of-the-art epistasis detection to a systems-oriented network biology level.
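
The notion of SNPs that are jointly predictive but individually uninformative can be made concrete with a deliberately extreme toy model. This sketch is not the authors' simulation tool: it assumes two SNPs coded as binary carrier status, a balanced cohort of 100 hypothetical samples, and a phenotype that is present exactly when one of the two variants is carried (an XOR rule).

```python
from itertools import product

# Balanced toy cohort: 25 samples for each combination of two binary SNPs.
# Phenotype follows an XOR rule (assumption chosen to give pure epistasis).
samples = [
    (snp_a, snp_b, snp_a ^ snp_b)
    for snp_a, snp_b in product((0, 1), repeat=2)
    for _ in range(25)
]


def phenotype_rate(rows):
    """Fraction of rows in which the phenotype is present."""
    return sum(phenotype for _, _, phenotype in rows) / len(rows)


# Single-SNP view: phenotype rate for non-carriers (0) and carriers (1).
marginal_a = {g: phenotype_rate([r for r in samples if r[0] == g]) for g in (0, 1)}
marginal_b = {g: phenotype_rate([r for r in samples if r[1] == g]) for g in (0, 1)}

# Two-SNP view: phenotype rate for each joint genotype.
joint = {
    (a, b): phenotype_rate([r for r in samples if r[0] == a and r[1] == b])
    for a, b in product((0, 1), repeat=2)
}
```

Each SNP on its own shows a phenotype rate of 0.5 for both carriers and non-carriers, so a one-SNP-at-a-time scan finds nothing, while the joint genotype determines the phenotype completely.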

Stoichiometric Modeling of String Chemistries
COSI: NetBio
  • Devlin Moyer, Boston University, United States
  • Alan Pacheco, Boston University, United States
  • David Bernstein, Boston University, United States
  • Daniel Segrè, Boston University, United States

Short Abstract: Uncovering the general principles that govern the structure of metabolic networks is key to understanding the emergence and evolution of living systems. Artificial chemistries can help illuminate this problem by enabling the exploration of chemical reaction universes that are constrained by general mathematical rules that can be as complex or as simple as desired. Using a novel Python package, ARtificial CHemistry NEtwork Toolbox (ARCHNET), we have explored the topological characteristics of different artificial chemistry networks. We have also developed a network-pruning algorithm that can generate minimal metabolic networks capable of producing a specified set of biomass precursors from a given assortment of environmental nutrients using flux-balance analysis. We found that the compositions of these minimal metabolic networks were influenced more strongly by the identities of the metabolites in the biomass reaction than the identities of the environmental nutrients. This has important implications for the reconstruction of organismal metabolic networks, and could help us better understand the rise and evolution of biochemical organization. More generally, our work provides a bridge between artificial chemistries and stoichiometric modeling, which can be applied to a variety of topics, from the origin of life to the structure of microbial communities.

Strengths and limitations of causal inference for gene regulatory networks in yeast
COSI: NetBio
  • Adriaan Ludl, Computational Biology Unit, Department of Informatics, University of Bergen, Norway
  • Tom Michoel, Computational Biology Unit, Department of Informatics, University of Bergen, Norway
  • Mariyam Khan, Computational Biology Unit, Department of Informatics, University of Bergen, Norway

Short Abstract: Causal inference from genomics and transcriptomics data is a powerful approach for reconstructing causal gene networks. Instrumental variable methods (IV) use a local eQTL as a randomized instrument for a gene's expression level, and assign target genes based on distal eQTL associations. Mediation-based methods (ME) additionally require that distal eQTL associations are mediated by the source gene. Using Findr, a software providing uniform implementations of IV, ME, and coexpression-based methods, a dataset of 1012 segregants from a yeast cross, and the YEASTRACT database, we compared causal gene network inference methods. We found that causal inference results in a significant overlap with the ground-truth, whereas coexpression did not perform better than random. ME had high specificity, but a subsampling analysis revealed that residual correlations reduce sensitivity, leading to saturation at large sample sizes. IV methods overcome the limited sensitivity of ME, at the expense of potential false positive predictions due to genomic linkage between eQTLs for nearby genes. In ongoing work, we are developing multi-variate IV methods to solve the genomic linkage problem and disentangle the relative trans-effects of multiple genes with linked cis-regulatory sites.

Supervised prediction of aging-related genes via weighted dynamic network analysis
COSI: NetBio
  • Tijana Milenković, University of Notre Dame, United States
  • Qi Li, University of Notre Dame, United States
  • Khalique Newaz, University of Notre Dame, United States

Short Abstract: We focus on supervised prediction of aging-related genes from -omics data. Unlike gene expression methods for this task that capture aging-specific information but ignore interactions between genes (their protein products), or protein-protein interaction (PPI) network methods for this task that consider PPIs but the PPIs are context-unspecific, we recently integrated the two data types into an aging-specific PPI subnetwork, which yielded more accurate aging-related gene predictions. However, a dynamic aging-specific subnetwork did not engender better performance than a static aging-specific subnetwork, despite the aging process being dynamic. This could be because the dynamic subnetwork was inferred using a naive induced subgraph-based approach ("Induced"). Instead, more recently, we inferred a dynamic aging-specific subnetwork using the notion of network propagation (NP). This NP-based subnetwork is unweighted, i.e., it gives all aging-specific PPIs the same relevance. Because considering aging-specific edge weights might be important, we now propose a weighted NP-based dynamic aging-specific subnetwork. We demonstrate that a predictive machine learning model trained and tested on this new weighted dynamic subnetwork yields higher accuracy when predicting aging-related genes than predictive models run on the existing unweighted dynamic or static subnetworks, regardless of whether the existing subnetworks were inferred using NP or the Induced approach.



International Society for Computational Biology
525-K East Market Street, RM 330
Leesburg, VA, USA 20176
