Session A: (July 22 and July 23)
Session B: (July 24 and July 25)
Presentation Schedule for July 22, 6:00 pm – 8:00 pm
Presentation Schedule for July 23, 6:00 pm – 8:00 pm
Presentation Schedule for July 24, 6:00 pm – 8:00 pm
Session A Poster Set-up and Dismantle
Session B Poster Set-up and Dismantle
Short Abstract: Hepatic cancer stem cells (HCSCs) are considered the main players in hepatocellular carcinoma (HCC) initiation, metastasis, drug resistance and recurrence. There is growing evidence that miRNAs down-regulated in HCSCs act as key suppressors of stemness traits, but how these miRNAs modulate HCC development remains largely unclear. To uncover some of these miRNA regulatory aspects in HCSCs, we compiled 15 down-regulated miRNAs and their validated and predicted up-regulated targets in HCSCs. The targets were enriched for several cancer cell stemness hallmarks and for the CSC pre-metastatic niche, which supports a role for these miRNAs in suppressing HCSC neoplastic transformation. Further, we constructed miRNA-transcription factor (TF) regulatory networks, which provided new insights into the role of the proposed miRNA-TF co-regulation in the cancer stemness axis and its crosstalk with the surrounding microenvironment. Our analysis revealed important HCSC hubs as candidate regulators for targeting hepatic cancer stemness, such as miR-148a, miR-214, the E2F family, MYC and SLC7A5. Finally, we proposed a possible model for miRNA and TF co-regulation of HCSC signaling pathways. Our study identified an HCSC signature and connected previously reported results to guide future validation of HCC therapeutic strategies that avoid drug resistance.
Short Abstract: Metabolic network analysis through flux balance is an established method for the computational design of production strains in metabolic engineering. A key principle often used in this method is the production of target metabolites as by-products of cell growth. Recently, the strong coupling-based method was used to demonstrate that the coupling of growth and production is possible for nearly all metabolites through reaction deletions in genome-scale metabolic models of Escherichia coli and Saccharomyces cerevisiae under aerobic conditions. However, it is unknown whether this coupling, using reaction deletions, is always possible under anaerobic conditions. In fact, when growing S. cerevisiae under anaerobic conditions, deletion strategies using the strong coupling-based method were possible for only 3.9% of all metabolites. Here, we found that the coupling of growth and production is theoretically possible for 91.3% of metabolites in genome-scale models of S. cerevisiae under anaerobic conditions if any reaction deletion is allowed. This analysis was conducted for the worst-case scenario using flux variability analysis. To demonstrate the feasibility of the coupling, we derived appropriate reaction deletions using a new algorithm for target production in which the search space was divided into small cubes. Our results are fundamentally important for computational metabolic engineering under anaerobic conditions.
Short Abstract: Unsupervised learning approaches are frequently employed to identify disease-associated genes. In particular, biclustering is a powerful technique to cluster genes along with patients. However, the genes forming biclusters are often not functionally related, which complicates biological interpretation of the results. Molecular interaction networks can provide a mechanistic explanation of clustered patients through functional connections among genes, which improves the interpretability of results and leads to a better understanding of disease patterns. We developed the novel, network-constrained biclustering approach BiGAnts, based on ant colony optimization, to obtain interpretable subsets of genes for stratifying patients. The algorithm enforces genes inside a bicluster to be functionally connected in a gene interaction network. Analyses of large-scale cancer gene expression datasets from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) demonstrated that BiGAnts clustered patients in agreement with known breast or lung cancer subtypes and discovered phenotype-specific genes that are connected and can thus be interpreted with respect to their biological function. BiGAnts is a noise- and batch-effect-robust approach able to provide valuable insights into cancer subtypes and their mechanisms.
Short Abstract: In many problem settings that arise within the field of computational biology, data is modeled via combinatorial graphs with or without node and edge attributes. For some tasks, (dis-)similarity queries between those graphs have to be answered. A very flexible and therefore widely used dissimilarity measure is the graph edit distance (GED). Given edit cost functions which are defined in terms of the input graphs' node and edge attributes, GED is defined as the minimum cost of a sequence of edit operations (node and edge substitutions, insertions, and deletions) that transforms one graph into another. In our poster, we present GEDLIB, an open source C++ library for exactly or approximately computing GED and hence answering graph dissimilarity queries. Many existing edit cost functions and GED algorithms are already implemented in GEDLIB. Moreover, GEDLIB is designed to be easily extensible: for implementing new edit cost functions and GED algorithms, it suffices to implement abstract classes contained in the library. For implementing these extensions, the user has access to a wide range of utilities, such as deep neural networks, support vector machines, mixed integer linear programming solvers, a blackbox optimizer, and solvers for the linear sum assignment problem with and without error-correction.
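To make the GED definition above concrete, here is a minimal brute-force sketch in Python. It does not use GEDLIB or its API; the function names, the unit edit costs, and the toy graphs are assumptions chosen for illustration. It enumerates every node mapping, so it is only usable on graphs with a handful of nodes.

```python
from itertools import permutations

def edit_cost(nodes1, edges1, nodes2, edges2, mapping):
    """Cost of the edit path induced by one node mapping (assumed unit costs)."""
    cost = 0
    for u, v in mapping:
        if u is None and v is None:
            continue                      # dummy matched to dummy: no operation
        if u is None or v is None:
            cost += 1                     # node insertion or deletion
        elif nodes1[u] != nodes2[v]:
            cost += 1                     # node substitution that changes the label
    image = {u: v for u, v in mapping if u is not None}
    matched = set()
    for edge in edges1:
        a, b = tuple(edge)
        ia, ib = image[a], image[b]
        if ia is not None and ib is not None and frozenset((ia, ib)) in edges2:
            matched.add(frozenset((ia, ib)))   # edge preserved by the mapping
        else:
            cost += 1                          # edge deletion
    cost += len(edges2) - len(matched)         # edge insertions
    return cost

def graph_edit_distance(nodes1, edges1, nodes2, edges2):
    """Exact GED for tiny simple undirected graphs by exhaustive search."""
    # Pad both node lists with dummies (None) so that any node may be
    # deleted or inserted instead of substituted.
    left = list(nodes1) + [None] * len(nodes2)
    right = list(nodes2) + [None] * len(nodes1)
    return min(
        edit_cost(nodes1, edges1, nodes2, edges2, list(zip(left, perm)))
        for perm in permutations(right)
    )

# Toy graphs (made up): a labeled path and a labeled triangle.
path_nodes = {1: "C", 2: "C", 3: "O"}
path_edges = {frozenset((1, 2)), frozenset((2, 3))}
tri_nodes = {"x": "C", "y": "C", "z": "N"}
tri_edges = {frozenset(("x", "y")), frozenset(("y", "z")), frozenset(("x", "z"))}

distance = graph_edit_distance(path_nodes, path_edges, tri_nodes, tri_edges)
self_distance = graph_edit_distance(path_nodes, path_edges, path_nodes, path_edges)
```

With these unit costs the path needs one label substitution and one edge insertion to become the triangle. The factorial search is the reason exact GED is impractical beyond small graphs and approximate algorithms are needed.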
Short Abstract: To assess drug safety, efficacy and toxicity, the primary organ of interest is the liver, as it is responsible for drug biotransformation and is the first organ exposed to oral drug formulations. Ideally, accurate drug toxicity assessment is done using co-cultures of hepatocytes and non-parenchymal liver cells. The Verfaillie laboratory developed protocols to derive hepatocyte-like cells (HLCs) and endothelial cells (ECs) from pluripotent stem cells (PSCs). However, PSC-derived progeny still differ from their primary counterparts. To improve maturation of HLCs, and to derive liver sinusoidal endothelial cells (LSECs) from PSCs, deeper insights into regulatory transcription factors (TFs) are needed. As any cell type is characterised by one or more sets of co-expressed genes, we hypothesised that TFs that centrally regulate these gene sets would be crucial to improve maturation of PSC progeny into their respective primary counterparts. We performed large-scale meta-analyses on HLCs and various ECs, including WGCNA gene co-expression analyses and TF binding motif enrichment analyses with iRegulon. We scored the identified clusters and regulons, resulting in a ranking of candidate TFs determining hepatocyte and LSEC identities. Currently, the functional effect of inducing expression of key regulatory TFs on maturation of HLCs and differentiation of PSC-ECs to LSECs is being tested.
Short Abstract: Plant-derived natural compounds, such as acetylsalicylic acid, morphine, and digitoxin, have shown a plethora of beneficial therapeutic effects. However, the elucidation of their mechanism(s) of action can be challenging, because these compounds can bind multiple protein targets with unrelated structures. A systems biology approach that integrates numerous cellular data layer components may help to link downstream effect(s) with the cascade(s) of signaling molecular changes in response to a natural compound. In this work, we treat different cell systems with various concentrations of compounds in the presence or absence of activators or inhibitors of key signaling pathway molecules. Supernatants and cell lysates are sampled within minutes to hours to generate a multi-omics dataset, including phosphoproteomic, proteomic, transcriptomic, and metabolomic data. The data analysis aims to build a consensus biological network model by combining an ensemble of knowledge-based and data-driven computational algorithms to determine the dynamic signaling routes promoted by natural compounds. Reverse engineering methods, such as the network perturbation amplitude algorithm leveraging causal network models, are used to translate gene expression into quantitative values that indicate the specific activation of a network. Integer linear programming on interaction graphs is used to cross-reference experimental data with our current knowledge of signaling network topologies.
Short Abstract: Recent advances in scRNA-seq technology have led to significant efforts to store and share the resulting data. For example, the EBI has extended Expression Atlas to include single-cell data, and the Chan-Zuckerberg Initiative has launched a major project to build a Human Cell Atlas, which will use single-cell transcriptomics along with images and other data to categorize and identify all of the cell types in the human body. Systems such as Seurat and scanpy offer pipelines for the analysis of scRNA-seq data, but few tools exist for its integration with other -omics data or for its downstream analysis within the context of protein networks or pathways. The construction of network models from single-cell data is a crucial step in exploratory analysis and interpretation, offering a host of graph-based analytical and visualization opportunities. Here we present scNetViz, a Cytoscape App for the network analysis and visualization of single-cell RNA-seq data.
Short Abstract: Network-based approaches have proven to be essential to understand the molecular mechanisms underlying human diseases. The use of these methods has been boosted by the abundance of information about the genetic determinants of human diseases and of high-quality interactomics data. The DisGeNET Cytoscape App is designed to enable the exploration of the genetic basis of human diseases from a network viewpoint. The app contains a set of functions to query, analyze, and visualize DisGeNET data (version 6.0) from different perspectives: gene-disease, variant-disease and disease-disease networks. The functionalities of the app include querying DisGeNET data for specific diseases, genes, variants, and their combinations, and filtering the information by source, score, association type, evidence index, and disease class. It also enables the annotation of networks of proteins and variants generated by other apps or uploaded by the user. Finally, the app features a new automation module with functions to programmatically execute different tasks using popular programming languages such as R and Python. In summary, with its newly implemented functionalities the DisGeNET Cytoscape App provides the biomedical community with a tool that enables a systems-level analysis of human diseases in an automatic, reproducible way.
Short Abstract: We are flooded with large scale molecular data capturing complementary aspects of the functioning of a cell. To enable new discoveries, we propose a novel, data-driven concept of an integrated cell, iCell. Also, we introduce a computational prototype of an iCell, which integrates three omics, tissue-specific molecular interaction network types: protein-protein interactions, gene co-expressions, and genetic interactions. We apply our framework and construct iCells of four cancers, breast, prostate, lung, and colorectal, as well as of the corresponding tissue controls. Comparison between cancer and control iCells allows us to uncover the most rewired genes in cancer that do not appear as different in any of the constituent data types. Many of these genes are of unknown function. We biologically validate that they have a role in cancer by knockdown experiments followed by cell viability assays. We find additional support through Kaplan-Meier survival curves of thousands of patients. Finally, we extend this analysis to twenty different cancer types to uncover new pan-cancer genes. Our methodology is universal and enables integrative omics comparisons of diverse data over cells and tissues.
Short Abstract: Cytoscape is open-source software used to analyze and visualize networks. In addition to being able to import networks from a variety of sources, Cytoscape allows users to import tabular node data and visualize it on networks. Unfortunately, such data tables can only contain one row of data per node, whereas omics data often have multiple rows for the same gene or protein, representing different post-translational modification sites, peptides, splice isoforms, or conditions. However, Cytoscape has an API that allows developers to make apps that extend its functionality. Here, we present a new app, Omics Visualizer, that allows users to import data tables with several rows referring to the same node, connect them to one or more networks, and visualize the connected data on networks. Omics Visualizer uses the Cytoscape enhancedGraphics app to draw charts in the nodes (pie charts) or around the nodes (donut charts). If the user does not provide a network, Omics Visualizer can retrieve one from the STRING database using the Cytoscape stringApp. The app is freely available at http://apps.cytoscape.org/apps/omicsvisualizer.
Short Abstract: Major depressive disorder (MDD) is a psychiatric illness with devastating consequences for personal and social functioning as well as physical health. The World Health Organization (WHO) has declared MDD the leading cause of disability worldwide, and increasing rates of MDD incidence are reported, while no sizeable improvement in combatting MDD has been achieved during the last decade. Therefore, the design of biomarkers for MDD is an important step towards personalized medicine, helping with precision prognosis, diagnosis and evaluation of therapy response. Here, we present combinatorial lipidomics and metabolomics MDD biomarkers designed by means of an automatic computational approach for omics data analysis that employs the PC-corr algorithm, a PCA-based machine learning technique for the discovery of discriminative networks. We show that molecular signatures obtained by this computational approach separate MDD patients from controls with high performance, measured by the Area Under the ROC Curve (AUC) and the Area Under the Precision-Recall curve (AUPR). The analysis was made on two datasets from a German cohort of 50 samples described by 496 lipids and 919 metabolites. Our Automatic Biomarker Design (ABD) algorithm outperformed SVM and RF on both datasets, with higher AUC and AUPR values in the validation analysis.
Short Abstract: Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of networks remains poorly understood. We launched the “Disease Module Identification DREAM Challenge”, an open competition to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology, and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies (GWAS). Our critical assessment of 75 contributed module identification methods reveals novel top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes benchmarks, tools and guidelines for molecular network analysis to study human disease biology (https://synapse.org/modulechallenge).
Short Abstract: Which of the results from a high throughput molecular experiment can be explained by existing knowledge about an organism's biological networks, and which results cannot be explained by existing knowledge? Given the large amount of network knowledge now in hand (for example, the EcoCyc database now contains 2800 substrate-level enzyme regulation events, 3600 transcriptional regulatory interactions, and 2000 metabolic reactions), these questions are not easy to answer. The MultiOmics Explainer is a new component of the Pathway Tools software that seeks to answer those questions. The MultiOmics Explainer queries a Pathway/Genome Database such as EcoCyc, searching through the network of metabolic reactions, transport events, cofactors, enzyme substrate-level regulation events, and transcriptional and translational regulation. The software searches for mechanistic connections among its input genes, proteins, and metabolites. Computed connections are displayed in a network diagram that combines all of the preceding types of influences into an explanatory graph. We have tested the software on Escherichia coli experimental data in which changes in metabolite levels resulting from gene knock-outs have been measured. In a number of cases the software is able to generate reasonable explanations of how the gene knock-out would cause the altered metabolite levels.
Short Abstract: Significant comorbidity exists among a wide variety of developmental and neuronal disorders, where more than one disease occurs in a single patient. These include schizophrenia (SCZ), epilepsy (EPI), intellectual disability (ID), Tourette’s syndrome (TS), as well as other congenital disorders such as congenital heart disease (CHD). While some are known to have overlapping genetic causes, in most cases the genomic basis of this observed comorbidity remains unexplained. Here we use network propagation to examine the network overlap among the 5 neuronal and cardiovascular developmental disorders mentioned above. Analysis of disease pairs uncovers significant overlap at the network level, which is further recapitulated by comorbidity rates, significant enrichment for de novo variants (DNVs), and significant association with biological pathways. This convergence of evidence at the network level points to shared genetic mechanisms, which give rise to multiple observed abnormal phenotypes. Our work demonstrates that a systems-level approach is essential for understanding common mechanisms underlying a wide variety of neuronal and cardiovascular developmental disorders, and it prioritizes novel disease-related genes which may lead to future therapeutic targets.
Short Abstract: Omics results are often interpreted in the context of molecular interaction networks, enriched pathways, gene sets, or ontologies. In this work, we propose a new analytical tool called GS-rank to implement the concept of Gene set, Network, and Pathway Analysis (GNPA), which aims to unify downstream analysis, by extending the previous work on a GNPA database called PAGER, and we show how to apply it to disease biology studies. In GS-rank, we integrate the following information into a comprehensive disease-gene ranking score: 1) candidate genes from a given omics study; 2) protein-protein interaction (PPI) networks, with available PPI quality information; 3) PAG (pathway, annotated list, and gene signature) annotation derived from gene set enrichment analysis results against PAG databases; 4) quality information about each PAG; and 5) gene sub-network information within each enriched PAG. We demonstrate with both simulation study results and an Alzheimer's disease case study that GS-rank helps biomedical researchers focus on candidate genes with significant biomedical functions better than ranking genes on network information alone. With GS-rank, we also hypothesize based on our data that Alzheimer's disease is a lifestyle disease associated with the Western diet rich in cholesterol.
Short Abstract: Misregulation of microRNAs has been shown to contribute to diseases. Because identifying the microRNAs involved in a given disease in the laboratory is time-consuming, numerous network-based methods have been developed to predict novel disease-associated microRNAs. Homogeneous networks based on the shared targets among microRNAs have been widely used to predict their roles in certain diseases. Although homogeneous networks can suggest potential microRNAs, they do not consider the roles of the target genes. Recently, we proposed a computational method based on a random walk framework on a microRNA-target gene network to predict disease-associated microRNAs (Le et al., 2017). This method was shown to be superior to existing state-of-the-art network- and machine learning-based methods because it exploits the mutual regulation between microRNAs and their target genes in microRNA-target gene networks. To facilitate the use of this method, we developed a Cytoscape app, named RWRMTN, to predict disease-associated microRNAs. RWRMTN can work on any microRNA-target gene network. Highly ranked microRNAs can be supported with evidence from the literature. They can also be visualized, based on their rankings, in relation to the query disease and their target genes. In addition, automation functions are integrated, which allow RWRMTN to be used from external environments.
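The random walk idea in the abstract above can be illustrated with a short, generic random-walk-with-restart sketch in Python. This is not the RWRMTN implementation or the exact formulation of Le et al.; the function name, the restart probability of 0.5, and the toy network with hypothetical node names are assumptions for illustration only.

```python
def random_walk_with_restart(edges, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    edges: iterable of (node, node) pairs, treated as undirected.
    seeds: set of nodes already known to be disease-associated (must be in the network).
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = sorted(neighbors)
    # Restart distribution: uniform over the seed nodes.
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            # Each node passes its remaining probability evenly to its neighbors.
            share = (1.0 - restart) * p[u] / len(neighbors[u])
            for v in neighbors[u]:
                nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Toy microRNA-target network (hypothetical names): mirA is the known disease microRNA.
toy_edges = [
    ("mirA", "GENE1"), ("mirB", "GENE1"),
    ("mirB", "GENE2"), ("mirC", "GENE2"),
    ("mirC", "GENE3"),
]
scores = random_walk_with_restart(toy_edges, seeds={"mirA"})
candidates = sorted(["mirB", "mirC"], key=lambda m: scores[m], reverse=True)
```

In this toy network mirB shares a target with the seed mirA, while mirC is only connected through a second target gene, so mirB receives the higher score and is ranked first among the candidates.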
Short Abstract: Motivation: Combination drugs have been drawing attention as a key strategy for treating complex diseases. Screening all pairwise drug combinations remains an important challenge because it is expensive and labor-intensive. Previous methods have limited ability to interpret the mechanisms of drug action. A comprehensive understanding of the phenotype profiles of drugs can help discover effective combinations in rational and systematic ways. Result: We present a computational framework that predicts therapeutic effects of combination drugs and interprets their mechanisms of action at multiple scales of biological phenomena. We designed this framework as an integrative model of a molecular network and a visible neural network. The model is applicable to both single drugs and combination drugs, and we trained it using single drugs. For breast cancer, the two networks were trained and validated on 281 drugs with known targets and on 9,256 samples of differentially expressed genes (DEGs) from drug treatments of breast cancer cell lines, respectively. The drugs or samples above were divided into a training set and a test set at a ratio of 3:1. The two networks showed AUROCs of 0.692 and 0.98 on the validation sets, respectively. The integrative model achieved AUROCs of 0.68 and 0.76 for 641 single drugs and 1,229 combination drugs, respectively.
Short Abstract: To understand the behavior of proteins one can use different graph-theoretical models, such as protein-protein interaction networks, metabolic pathways, and similarity networks. The required information is accessible from large databases (IntAct, STRING, KEGG). These structures enable mathematical analysis of biological data. The aim of this work was to find important proteins related to amyloid structures in protein interaction databases using the PageRank algorithm, which was originally created to rank pages for a World Wide Web search engine. In the course of the work, different modelling options, parameter settings and scoring heuristics were available, and these were compared to measure the robustness of the results of the algorithm. The main network of this study is from the IntAct database, and the DIP and MINT databases were also used for reference. The list of amyloid structures is available at https://pitgroup.org/amyloid/.
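As an illustration of how PageRank scores nodes in an interaction network, the following is a minimal power-iteration sketch in Python. It is not the authors' pipeline; the function name, the damping factor of 0.85, and the toy adjacency list with hypothetical protein names are assumptions.

```python
def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration.

    adjacency: dict mapping each node to a list of its neighbors; every
    neighbor must also appear as a key. An undirected interaction is
    listed in both directions.
    """
    nodes = sorted(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without neighbors is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not adjacency[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if adjacency[v]:
                share = damping * rank[v] / len(adjacency[v])
                for w in adjacency[v]:
                    new_rank[w] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank

# Toy network (hypothetical proteins): P1 interacts with P2, P3 and P4.
toy_network = {
    "P1": ["P2", "P3", "P4"],
    "P2": ["P1"],
    "P3": ["P1"],
    "P4": ["P1"],
}
ranks = pagerank(toy_network)
top_protein = max(ranks, key=ranks.get)
```

The hub P1 collects rank from all three partners and comes out on top. Varying `damping` is one simple way to probe how robust such a ranking is to parameter settings.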
Short Abstract: Metabolic networks describe the chemical reactions within an organism. We can reconstruct these networks from genome and enzyme reaction databases. About ten years ago, Takemoto et al. (2007) stated that the structure of undirected prokaryotic metabolic networks is correlated with an environmental variable: the optimal growth temperature. Revisiting this work with an updated database and new species, we compared network structural properties to the associated growth temperature and confirmed most of the results of Takemoto et al. (2007). Furthermore, we extended this work to directed metabolic networks. We found three robust directed graph measures significantly related to growth temperature. These results can be extended to more complex environmental phenotypes. We considered gut microbiota metabolic networks, with host clinical data as phenotypes. Using metagenomics data from bariatric surgery patients, we found several microbiota genes able to predict high or low response to surgery in terms of weight loss, and additional connections between networks and clinical phenotypes. Network structure analysis can be a powerful new tool to analyse the relationship between gut microbiota and host. In the context of obesity and metabolic diseases, it can be a new avenue to explore functional relationships between the gut and diseases, and a step towards new targeted therapies.
Short Abstract: Drug combination treatments improve efficacy and reduce toxicity compared to single drug treatments. However, experimental drug combination screening is costly and there is a vast number of possible combinations of existing drugs. Therefore, computational methods for predicting drug combination effects are required. We present a method to predict drug combination effects based on drug-gene relations. Our hypothesis is that the more similar the relations of two drugs to genes are, the more synergistic the two drugs are. To obtain the relations for each drug-gene pair, we used drug-target interactions (DTI), protein-protein interactions (PPI), and transcription factor (TF)-gene associations. We reconstruct a PPI network using protein interactions whose type is either activation or inhibition. The PPI network is used to connect drug targets to TFs. We quantify the relations of each drug-gene pair by considering DTI, drug target-TF connections, and TF-gene associations. A score for the combination effect of two drugs is then calculated. Our method outperforms 24 of the 31 methods proposed during the DREAM 7 Challenge. Although our method is not the best, it may be useful when drugs do not have expression data, since unlike the 31 methods it does not need expression data for drugs.
Short Abstract: A cell lineage consists of several consecutive developmental stages from the pluripotent or multipotent stem cell state to a particular state of terminally differentiated cells. This differentiation process is accompanied by dynamic rewiring of the gene regulatory networks (GRNs) of transcription factors (TFs) and target genes. There is considerable interest in identifying the global regulators that control such a differentiation process and the individual cell fate stages (Goode et al., Dev Cell 2016). Here, we consider the set of genes forming the largest strongly connected component (LSCC) in the embryonic stem cell state as candidates for the set of global regulators. This set of candidates is topologically controlled by a set of dominator and connector nodes (Nazarieh et al., BMC Syst Biol 2016). At different stages along cell differentiation, different subsets of the TFs in the LSCC activate and deactivate other genes and TFs to define a particular cellular identity. For each stage, a set of key players can be identified by solving the related optimization problems. As proof of concept, we apply this approach to data on six developmental stages from embryonic stem cells to terminally differentiated macrophages compiled by Goode et al. (2016).
Short Abstract: Diseases involve complex processes and modifications to the cellular machinery. The gene expression profile of the affected cells contains characteristic patterns linked to a disease. Hence, biological knowledge pertaining to a disease can be derived from a patient cell's profile, improving our diagnosis ability, as well as our grasp of disease risks. This knowledge can be used for drug re-purposing, or by physicians to evaluate a patient's condition and co-morbidity risk. Here, we look at differential gene expression obtained from microarray technology for patients diagnosed with various diseases. Based on this data and cellular multi-scale organization, we aim to uncover disease-disease links, as well as disease-gene and disease-pathways associations. We propose neural networks with structures inspired by the multi-scale organization of a cell. We show that these models are able to correctly predict the diagnosis for the majority of the patients. Through the analysis of the trained models, we predict and validate disease-disease, disease-pathway, and disease-gene associations with comparisons to known interactions and literature search, proposing putative explanations for the novel predictions that come from our study.
Short Abstract: In recent years, much attention has been given to single-cell RNA sequencing techniques as they allow researchers to examine the functions and relationships of single cells inside a tissue. In this study, we combine single-cell RNA sequencing data with protein–protein interaction networks (PPINs) to detect active modules in cells of different transcriptional states. We achieve this by clustering single-cell RNA sequencing data, constructing node-weighted PPINs, and identifying the maximum-weight connected subgraphs with an exact Steiner-Tree approach. As a case study, we investigate RNA sequencing data from human liver spheroids but the techniques described here are applicable to other organisms and tissues. The benefits of our novel method are two-fold: First, it allows us to identify important proteins (e.g., receptors) which are not detected from a differential gene-expression analysis as they only interact with proteins that are transcribed in higher levels. Second, we find that different transcriptional states have different subnetworks of the PPIN significantly overexpressed. These subnetworks often reflect known biological pathways (e.g., lipid metabolism and stress response) and we obtain a nuanced picture of cellular function as we can associate them with a subset of all analysed cells.
Short Abstract: Functional annotation of individual genes is a prerequisite for the interpretation of omics data. Various gene set definitions such as GO, Reactome, and KEGG are publicly available for the analysis and interpretation of selected genes from given omics data. One limitation of these existing definitions is that they all consist of category-based classifications of genes. A quantitative, numerical definition of the functional similarity between genes would provide improved resolution and flexibility in the analysis and interpretation of various omics data. Another problem with using existing gene sets is that genes are assigned to multiple categories, because individual genes are commonly multi-functional. This often makes it difficult to interpret optimal gene networks (or gene lists) extracted from complex omics data. We therefore attempted to quantitatively calculate the pairwise similarity between genes from all available gene set data. Tanimoto scores between genes were calculated from each of the gene set definition databases. Since the size of individual gene sets varies, an optimal weight factor for gene set size was also explored to obtain universal similarity scores between genes. This novel gene similarity score provides a new tool for network optimization and functional analysis of gene lists extracted from large-scale proteome data.
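The pairwise Tanimoto scoring described above can be sketched in a few lines of Python. This is an unweighted illustration only: the gene-set-size weight factor mentioned in the abstract is not specified there and is left out, and the function names and toy gene sets are made up.

```python
from itertools import combinations

def gene_memberships(gene_sets):
    """Invert {set_name: genes} into {gene: names of the sets containing it}."""
    memberships = {}
    for name, genes in gene_sets.items():
        for gene in genes:
            memberships.setdefault(gene, set()).add(name)
    return memberships

def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two sets: |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def pairwise_similarity(gene_sets):
    """Tanimoto score for every gene pair, based on shared gene-set membership."""
    memberships = gene_memberships(gene_sets)
    return {
        (g1, g2): tanimoto(memberships[g1], memberships[g2])
        for g1, g2 in combinations(sorted(memberships), 2)
    }

# Toy gene sets (made up for illustration).
toy_sets = {
    "set_A": {"G1", "G2", "G3"},
    "set_B": {"G1", "G2"},
    "set_C": {"G3", "G4"},
}
similarity = pairwise_similarity(toy_sets)
```

Here G1 and G2 belong to exactly the same sets and score 1.0, whereas G1 and G4 share no set and score 0.0. A size-dependent weight could be added by replacing the set cardinalities with sums of per-set weights.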
Short Abstract: Toxicogenomics is the study of the molecular effects of chemical, biological and physical agents in biological systems, with the aim of elucidating toxicological mechanisms. The majority of toxicogenomics data has been generated at the transcriptome level, and large quantities of drug-treatment data have been made publicly available through different repositories. Besides the identification of differentially expressed genes from case-control studies or drug treatment time series studies, bioinformatics methods have emerged that infer gene expression data at the molecular network and pathway level in order to reveal mechanistic information. In this work we describe different approaches that relate gene expression measurements with known pathway and network information. We highlight approaches that integrate gene expression data with molecular interactions in order to derive functional network modules related to drug toxicity. We demonstrate the construction of a suitable molecular interaction network, as well as the identification of subnetworks via network propagation. We apply methods and tools to publicly available rat in vivo data on anthracyclines, an important class of anti-cancer drugs that are known to induce severe cardiotoxicity in patients. We report the results and functional implications achieved for four anthracyclines and compare the information content inherent in the different computational approaches.
Short Abstract: One of the most challenging tasks in computational biology is the integration of complementary biological data produced from different sources. A particular case is the combination of expression data and biological interactions in order to identify “active modules”, i.e., sets of interacting genes/proteins associated with coordinated expression changes in different biological contexts, such as disease contexts. We use a modified version of NSGA-II that maximizes two objectives, the gene/protein differential expression and the density of interactions of the subnetworks. After a given number of generations, the final result is the set of subnetworks belonging to the first Pareto front, i.e., all the non-dominated solutions. We tested our algorithm on both simulated and real data. Our algorithm is capable of processing RNA-seq data and different types of biological networks simultaneously (e.g., PPI and miRNA-mRNA target) and of finding several active modules in a single run, each of which potentially represents a cellular function affected in patients as compared to controls. We proved that these modules are indeed biologically meaningful. In summary, we developed an algorithm able to identify subnetworks deregulated in diseases that can be associated with specific biological processes.
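The notion of a first Pareto front used in the abstract above can be sketched in a few lines. The candidate modules and their two objective values (differential expression, interaction density) are made up for illustration; this shows only the non-dominated filtering step, not NSGA-II itself or the authors' modification of it.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of all non-dominated candidates (both objectives maximized)."""
    front = []
    for name, score in candidates.items():
        dominated = any(
            dominates(other_score, score)
            for other, other_score in candidates.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical subnetworks scored as (differential expression, interaction density).
candidates = {
    "module1": (0.9, 0.2),
    "module2": (0.5, 0.5),
    "module3": (0.4, 0.4),  # worse than module2 on both objectives
    "module4": (0.1, 0.9),
}
front = pareto_front(candidates)
```

Because no single module is best on both objectives, the front keeps several trade-off solutions, which matches the abstract's point that a single run can return several active modules.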
Short Abstract: The genetic basis of complex diseases involves alterations in multiple genes. Unravelling the interplay between these genetic factors is key to the discovery of new biomarkers and treatments. In 2014, we introduced GUILDify, a web server that searches for genes associated with diseases, finds novel disease genes by applying various network-based prioritisation algorithms, and proposes candidate drugs. Here, we present GUILDify v2.0, a major update and improvement of the original method, in which we have included protein interaction data for seven species and 22 human tissues and incorporated the disease-gene associations from DisGeNET. To infer potential disease relationships associated with multi-morbidities, we introduced a novel feature for estimating the genetic and functional overlap of two diseases using the top-ranking genes and the associated enrichment of biological functions and pathways (as defined by GO and Reactome). The analysis of this overlap helps to identify the mechanistic role of genes and protein-protein interactions in comorbidities. Finally, we provide an R package, guildifyR, to facilitate programmatic access to GUILDify v2.0 (http://sbi.upf.edu/guildify2). The research was recently published in the Journal of Molecular Biology (doi: 10.1016/j.jmb.2019.02.027).
Short Abstract: IAMBEE is a web-server designed for network-based genotype-phenotype mapping of clonal populations that display the same focal phenotype. Such populations are typically obtained through either natural or experimental evolution. The goal is to derive from the genomic information of these populations the adaptive genes that carry alterations causal to an observed common adapted phenotype. Evolving clonal systems accumulate mutations which can be adaptive (drivers) but also neutral (passengers). Distinguishing drivers from passengers is not trivial, especially when the passengers largely outnumber the drivers. In addition, the same phenotype can originate through interfering with the same pathway, irrespective of which genes are affected in this pathway. As a pathway can consist of many genes, identifying adaptive mutations by searching for genes frequently mutated across different adapted populations is too restrictive. Hence to improve the identification of driver genes, IAMBEE searches for neighborhoods in a gene-gene interaction network that are recurrently mutated across the different parallel evolved populations. These neighborhoods are a proxy of adaptive pathways and drivers are identified as members of these pathways. Using this approach, IAMBEE allows exploiting parallel evolution to identify adaptive pathways and to study their evolutionary principles. The web-server is freely available at http://bioinformatics.intec.ugent.be/iambee/
Short Abstract: Modern high-throughput technologies, including proteomics and transcriptomics, produce long lists of genes or proteins, which need to be interpreted in the context of existing biological knowledge. To facilitate the analysis, visualization and exploration of such omics data, we developed the stringApp. In one workflow, it combines the popular protein associations resource STRING v11 with the powerful functionality of the Cytoscape network visualization platform. Here, we highlight the most recent features of the stringApp and showcase their usability for exploring omics data. The stringApp results panel has a new interface with quick access to frequently used node and edge actions as well as filtering and coloring edges by their evidence type or filtering nodes by their tissue and compartment information. When performing enrichment analysis, users can now choose a custom background, explore additional categories like Reactome pathways or UniProt keywords and browse related publications from PubMed. Finally, stringApp allows users to “stringify” any network generated from their own data or retrieved from another network resource. Thereby, they gain access to all stringApp features like functional enrichment, tissue and compartments associations, or additional interaction evidence from STRING. The app is freely available at http://apps.cytoscape.org/apps/stringApp.
Short Abstract: Many protein interaction databases provide confidence scores for each interaction based on the available experimental evidence. Protein interaction networks (PINs) are then built by thresholding on these scores. We demonstrate that even small variations in the score threshold can result in PINs with significantly different topologies. We argue that if a node metric is to be useful for extracting biological signal, it should induce similar node rankings across PINs obtained at different confidence score thresholds. We propose three measures—rank continuity, identifiability, and instability—to test for threshold robustness. We apply these to twenty-five metrics of which we identify four as the most robust: the number of edges in the step-1 ego network, and the leave-one-out differences in average redundancy, average number of edges in the step-1 ego network, and natural connectivity. Our measures show good agreement across PINs from different species and data sources. However, analysis of synthetically generated scored networks shows that robustness results are context-specific, and depend both on network topology and on how scores are placed across network edges.
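To make the thresholding argument above concrete, the sketch below builds two networks from the same scored edge list at different confidence thresholds and compares the resulting node rankings by degree using Spearman's rank correlation. The edge scores are invented, and a plain rank correlation is used here as a simple stand-in; it is not one of the three measures (rank continuity, identifiability, instability) proposed in the abstract, whose definitions are not given there.

```python
def threshold_network(scored_edges, threshold):
    """Keep only edges whose confidence score reaches the threshold."""
    return [(u, v) for u, v, s in scored_edges if s >= threshold]


def degrees(edges, nodes):
    """Degree of every node, including nodes left without edges."""
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [deg[n] for n in nodes]


def average_ranks(values):
    """1-based ranks, with ties sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


# Hypothetical scored interactions: (protein, protein, confidence).
scored_edges = [
    ("A", "B", 0.9), ("A", "C", 0.8), ("A", "D", 0.75),
    ("B", "C", 0.6), ("C", "D", 0.5), ("D", "E", 0.4),
]
nodes = ["A", "B", "C", "D", "E"]

deg_low = degrees(threshold_network(scored_edges, 0.4), nodes)
deg_high = degrees(threshold_network(scored_edges, 0.7), nodes)
rho = spearman(deg_low, deg_high)
```

A metric whose node ranking changes little between thresholds gives a correlation near 1; here even a modest change in the threshold visibly reshuffles the degree ranking.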
Short Abstract: Genomic analyses from large cancer cohorts have revealed the mutational heterogeneity problem, which hinders the identification of driver genes based only on mutation profiles. One way to tackle this problem is to incorporate the fact that genes act together in functional modules. The connectivity knowledge present in existing protein-protein interaction networks, together with mutation frequencies of genes and the mutual exclusivity of cancer mutations, can be utilized to increase the accuracy of identifying cancer driver modules. We present MEXCOwalk, a novel edge-weighted random walk-based approach that incorporates connectivity information in the form of protein-protein interactions, mutual exclusion, and coverage to identify cancer driver modules. MEXCOwalk outperforms several state-of-the-art computational methods on TCGA pan-cancer data in terms of recovering known cancer genes and providing modules that are capable of classifying normal and tumor samples and that are enriched for mutations in specific cancer types. Furthermore, the risk scores determined with output modules can stratify patients into low-risk and high-risk groups in multiple cancer types. MEXCOwalk identifies modules containing both well-known cancer genes and putative cancer genes that are rarely mutated in the pan-cancer data. The data, the source code, and useful scripts are available at: https://github.com/abu-compbio/MEXCOwalk.
Short Abstract: Changes in cellular systems result from different types of coordinated interactions between biomolecules. At the most fundamental level, biomolecules interact physically with each other, forming context-specific complexes that function as whole units. Understanding these functional units depends upon having access to a structured documentation of complexes to use as a searchable and downloadable reference. EMBL-EBI Complex Portal (www.ebi.ac.uk/complexportal) is the home for manually curated macromolecular complexes from 20 model organisms. In 2018 we completed the curation of all known Saccharomyces cerevisiae complexes, the “Yeast Complexome”, in collaboration with SGD and UniProt. The Complex Portal documents all complex participants, their topology and binary interactions, along with annotations of their stoichiometry, agonists, and antagonists. The molecular function, biological process and cellular location of the entire macromolecular complex, rather than the individual subunits, are annotated with Gene Ontology (GO) terms. Complex portal entities can validate topologically predicted networks or genetic interactions derived from synthetic lethality maps, among other applications. Disease-specific complexes are generated to support analysis of disease-specific datasets. Going forward, we will compare curated complexes to experimentally-derived protein-protein and genetic interaction datasets, integrate experimentally-derived features such as post-translational modifications or mutations affecting protein binding and expand curation for other species.
Short Abstract: Rhizobium leguminosarum is a bacterium that fixes atmospheric nitrogen when associated with plants of the legume family, improving legume growth. Gene co-expression network models can help to understand the metabolic and transcriptomic changes in R. leguminosarum from the "free-living" to the "plant-associated" states. We utilise rich but noisy gene expression data from free-living bacteria, bacteria from the soil where legumes grow, and bacteria from inside the plant. Using the correlation between the gene expression under these growth environments, we impose a threshold to select the strongest relationships in order to construct an unweighted network, in which nodes correspond to genes and undirected links connect nodes with high positive correlation. We also create weighted networks where the weights are the correlation values, to both include more information in our analysis and as a comparison to the unweighted networks. We performed Monte Carlo tests to evaluate the significance of the connections between functionally related genes, considering data from KEGG, STRING, and OperonDB, in both types of networks. Our results suggest that both such networks can be a useful guide in the identification of genes involved in the same biological processes, the prediction of gene function, and in the verification of genome annotations.
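The network construction described in the abstract above can be sketched as follows: compute pairwise correlations between gene expression profiles, keep pairs above a threshold as unweighted edges, and keep the correlation values as weights. The expression values, gene names, the choice of Pearson correlation, and the threshold of 0.8 are all assumptions for illustration; the abstract does not specify them.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def coexpression_networks(expr, threshold):
    """Return (unweighted edge set, weighted edge dict) from expression profiles."""
    unweighted = set()
    weighted = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        weighted[(g1, g2)] = r          # weighted network keeps every correlation
        if r >= threshold:              # unweighted network keeps strong positives
            unweighted.add((g1, g2))
    return unweighted, weighted


# Made-up expression profiles across four growth conditions.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
unweighted, weighted = coexpression_networks(expr, threshold=0.8)
```

Only the strongly positively correlated pair survives the threshold, while the weighted network retains the negative correlation between geneA and geneC as additional information.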
Short Abstract: Combination therapies that block multiple oncogenic processes offer a solution to drug resistance and more durable therapeutic responses in cancer. Discovery of effective combinations remains challenging due to the combinatorial complexity of the underlying processes. We developed a multi-omics data-driven network modeling method, TargetScore, that (i) reveals pathway modules involved in therapeutic resistance and (ii) nominates combination therapies to down-regulate the resistance pathways. We experimentally test the resulting predictions (combinations of the original single drug with drugs targeting members of the resistance pathways). The TargetScore is calculated for each measured entity by integrating the drug response of the entity along with the response in its pathway neighborhood. High scores correspond to adaptive responses (e.g. activation of compensatory oncogenic pathways in response to therapy). The method is amenable to calculations for hundreds of samples treated with individual (or combinations of) drugs in multiple doses and/or time points and assayed for thousands of molecular entities (mRNA/proteins). Longitudinal data analysis allows us to trace the evolution of drug resistance and, potentially, the optimum time points for intervention. We applied our method to BET-BRD inhibition in ovarian and breast cancers and discovered cell-type-specific, anti-resistance combinations involving BET inhibitors.
Short Abstract: Ecological interactions among microbes are fundamental for ecosystem functioning. Yet, most of them remain unknown. High-throughput omics can help unveil microbial interactions by inferring associations, which can be represented as networks. Associations in these networks can indicate ecological interactions between species or, alternatively, similar or different environmental preferences, in which case the association is environmentally-driven. We developed an approach to determine whether or not two species are associated in a network due to environmental preference. We use four methods (Sign Pattern, Overlap, Interaction Information, and Data Processing Inequality) that in combination can detect which associations are environmentally-driven. We implemented our approach in a publicly available software tool called EnDED. Our program was tested on simulated networks as well as on real marine microbial networks constructed with spatial or temporal community composition data. We found evidence of environmentally-driven associations in all tested datasets. For instance, in a network constructed with 10 years of monthly data, we found that 14% of the associations were environmentally-driven. We conclude that environmentally-driven associations are ubiquitous in microbial association networks and that it is crucial to determine and quantify them in order to generate more accurate hypotheses on ecological interactions in the microbial world.
Short Abstract: The regulation of gene expression is a key factor in the development and maintenance of life in all organisms. This regulation is carried out mainly through the action of transcription factors (TFs), although other actors such as ncRNAs are also involved. However, integrating different types of information related to gene regulation at a specific time can be experimentally and computationally costly. In this work, we developed a method to construct condition-specific Gene Regulatory Networks (GRNs), i.e., networks depicting regulatory events in a certain condition, using several types of experimental data collected from different databases. Our method creates GRNs starting from a Gold Standard Network (GSN) that contains all known gene regulations for an organism. Regulations from the GSN that are unlikely to be taking place are removed by applying a series of filters. The method considers different types of experimental evidence, including DNA methylation and accessibility, histone modifications and gene expression, or combinations of these data. In this way, if a given pattern of data generated in the same experimental condition is associated with, for example, inactive TF-binding sites, the respective filter will remove all TF-gene interactions associated with those sites. The implementation is available as a Cytoscape application at https://figshare.com/articles/WeoN_install_zip/7913912.
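The filtering idea in the abstract above, starting from a gold standard network and removing regulations that the evidence makes unlikely, can be sketched as a chain of predicates. The edges, the evidence sets, and the two filters below are hypothetical examples; they do not reproduce the filters of the Cytoscape application.

```python
# Hypothetical gold standard network: (transcription factor, target gene) pairs.
gsn = [
    ("TF1", "geneA"),
    ("TF1", "geneB"),
    ("TF2", "geneA"),
    ("TF3", "geneC"),
]

# Hypothetical evidence collected for one experimental condition.
expressed = {"TF1", "TF3", "geneA", "geneB", "geneC"}  # detected in expression data
accessible_targets = {"geneA", "geneC"}                # binding site in open chromatin


def tf_is_expressed(edge):
    """Keep a regulation only if its transcription factor is expressed."""
    tf, _target = edge
    return tf in expressed


def site_is_accessible(edge):
    """Keep a regulation only if the target's binding site is accessible."""
    _tf, target = edge
    return target in accessible_targets


def build_condition_grn(gsn_edges, filters):
    """Apply every filter in turn; an edge survives only if all filters keep it."""
    return [edge for edge in gsn_edges if all(keep(edge) for keep in filters)]


grn = build_condition_grn(gsn, [tf_is_expressed, site_is_accessible])
```

Expressing each type of evidence as its own predicate makes it easy to run the same gold standard network through different combinations of filters.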
Short Abstract: Alternative splicing is a post-transcriptional regulatory mechanism that is important for the diversity of the proteome and interactome. It enables the production of multiple proteins with different structures from a single gene. From a network point of view, these structural changes can introduce new interactions or remove existing ones. In this study, we reconstructed patient-specific signaling networks with tumor-specific protein isoforms. For this purpose, we collected 400 breast cancer and 112 normal RNA-seq samples from The Cancer Genome Atlas (TCGA) and identified the differentially expressed transcripts with isoforms at the protein level. Additionally, we compiled a structural interactome from multiple sources and aligned the isoforms to the known/predicted protein structures. We then constructed a tumor-specific interactome for each sample based on the lost interfaces, and used the proteins of differentially expressed transcripts with Omics Integrator for tumor-specific network modeling. Finally, we compared all tumor-specific networks simultaneously to reveal pathway, interaction and protein patterns that can cluster the tumors. In our analysis, we are searching for interfaces that are predominantly affected by splicing and, eventually, their alternatives in the rewiring of signaling networks. Our results will help in target selection and in developing therapeutic strategies.
Short Abstract: Dilated cardiomyopathy (DCM) is a common cause of heart failure, ultimately leading to premature death. Allelic imbalance, when the two alleles of a gene are expressed at a ratio different from 1:1, is indicative of gene splicing and nonsense-mediated decay, both of which are strongly associated with DCM. Within a DCM cohort, we are investigating to what extent allelic imbalance detected in left ventricular heart tissue can be employed for patient stratification. Due to the known involvement of numerous genes in DCM, we employed a network analysis based on shared imbalance across genes and patients. Additionally, by assigning the relative shared imbalance as edge weights to a patient-based network, affinity propagation clustering enables the identification of groups of patients with similar profiles. Our analyses reveal imbalanced genes previously associated with DCM as well as multiple novel candidates. Apart from suggesting a widespread phenomenon, the network approach enables pinpointing of joint events: if a particular gene is imbalanced, then likely paths of co-imbalanced genes can be derived. Additionally, we identified distinct patient clusters with specific allelic imbalance profiles. We are currently in the process of relating both the different genetic paths and the patient clusters to disease stage, prognosis and co-morbidities.
Short Abstract: Inflammation, a crucial part of the immune system, is a well-coordinated process which is triggered by infectious microbes like bacteria, viruses or fungi, host tissue damage such as external or internal injury or tissue malfunctioning. However, the slightest dysregulation of inflammatory pathways or an excessive, chronic inflammatory response can have disastrous effects on the affected tissue or even the whole body. By building a highly-curated inflammation network, we aim to provide a valuable resource for the investigation of the underlying mechanisms and the comparison of the role of inflammation in different diseases. We integrated the network with publicly available gene expression datasets from five different diseases including cancer (breast and lung cancer), obesity, rheumatoid arthritis, systemic lupus erythematosus and cardiomyopathy. Besides topological, network module and clustering analyses, we will use network alignment methods to identify unique (only in one disease) and global (in all diseases) affected regions in the network. Initial results are promising and show large disease differences regarding the changes in gene expression of the inflammation network.
Short Abstract: Network theory has been used for many years in the modelling and analysis of complex systems, such as those in biology and biomedicine. As data evolve and become more heterogeneous and complex, monoplex networks become an oversimplification of the corresponding systems. This imposes a need to go beyond traditional networks into a richer framework, called multilayer networks. These networks have contributed in many contexts and fields, although they have rarely been exploited in the investigation of biological networks, where they are highly applicable. Our idea is to integrate pathways and their related knowledge, such as drugs and diseases, into a multilayer model, where each layer represents one of these elements. We create patient-specific multilayer models to help improve clinical research and practice by integrating pathway data from multiple sources, high-throughput and mutation patient data, as well as drug-target information, while tracking the applied modifications to increase reproducibility in network biology and network medicine. In this poster, we present mully, an R package to create, modify and visualize multilayered graphs, as well as the steps required to reach our aims.
Short Abstract: Glioblastoma (GBM), the most aggressive type of glial tumor, is widely promoted by stem-like cells. Although certain cancer types have been radically treated with receptor tyrosine kinase (RTK) inhibitors, Liau et al. demonstrated that treatment of glioblastoma stem cells (GSCs) with the RTK inhibitor dasatinib led to dynamic interconversion from a proliferative to a slow-cycling, persister state. In this work, we used publicly available RNA-Seq and ChIP-Seq data from a patient-derived GBM cell line in the naive, 12-day treated and persister states and applied an integrative approach to model the effect of dasatinib treatment in a network context. We first identified significantly active transcription factors for each condition. Then we reconstructed an optimal network for each condition by integrating these transcription factors and a confidence-weighted protein interactome using the Omics Integrator software. As a result, we obtained three condition-specific networks and clustered them based on their topology. We found that Wnt, MAPK and JAK/STAT signalling pathways are differentially regulated across conditions. We also found that Smad and Sox transcription factors are differentially active across the networks. We propose that deciphering transcriptional regulation mechanisms in a network context helps to characterize epigenetic changes underlying the reversible transition and resistance of GSCs.
Short Abstract: As we exhaust Mendelian approaches for identifying disease-causing genes, future approaches must harness the ever-growing body of genome-wide molecular data. Existing methods like GWAS, although powerful, are hampered by statistical limitations, and experimental candidate gene testing in humans is unrealistic. We developed diseaseQUEST, an integrative computational-experimental framework to identify novel human disease genes. diseaseQUEST combines human genome-wide disease studies with in silico network models of model organism tissue- and cell-type specific function to prioritize candidates within functionally conserved areas of biology. As part of this framework, we also developed a novel semi-supervised regularized Bayesian integration method that enables network integration in the presence of sparse prior data. We applied diseaseQUEST to predict candidate genes for 25 different diseases and traits, including cancer, longevity, and neurodegenerative diseases. Focusing on Parkinson’s Disease, a diseaseQUEST-directed C. elegans behavioral screen identified many novel candidates with experimentally-verified age-dependent motility defects mirroring clinical symptoms. Furthermore, knockdown of the top candidate gene, branched chain amino acid (BCAA) transferase bcat-1, caused spasm-like “curling” and neurodegeneration, paralleling the reduced BCAT1 expression exhibited in PD patient brains. These findings demonstrate the power of diseaseQUEST to identify novel human disease genes.
Short Abstract: Alterations in cancer-driver genes influence cancer development and occur in oncogenes, tumor suppressors, and dual-role genes. Discovering dual-role cancer genes is difficult because of their elusive context-dependent behavior. We defined oncogenic mediators as genes that control biological processes, and we used them to classify cancer-driver genes and unveil their role in cancer mechanisms. To this end, we developed Moonlight, a tool that incorporates multiple -omics data to identify critical cancer-driver genes. We applied Moonlight to over 8,000 tumor samples from 18 cancer types and discovered 3,310 oncogenic mediators, 151 of which have dual roles. Moonlight discovered more than one thousand cancer-driver genes, confirming known molecular mechanisms explaining the activation of oncogenes at the promoter level (amplification, mutation, DNA methylation, chromatin accessibility). Additionally, Moonlight discovered inactivation of tumor suppressors in intron regions, and that tissue type and subtype indicate dual-role status. Moonlight confirmed critical cancer-driver genes through the analysis of cell line datasets. These findings help explain tumor heterogeneity and could guide therapeutic decisions.
Short Abstract: Reliable identification of molecular biomarkers is essential for accurate patient stratification. While state-of-the-art machine learning approaches for sample classification continue to push boundaries in terms of performance, most of these methods are not able to integrate different data types and lack generalization power, limiting their application in a clinical setting. Furthermore, many methods behave as black boxes, and we have very little understanding about the mechanisms that lead to the prediction. In health care however, providing explanations about the molecular factors and phenotypes that are driving the classification is crucial to build trust in the performance of the predictive system. We propose Pathway Induced Multiple Kernel Learning (PIMKL), a methodology to reliably classify samples that can also help gain insights into the molecular mechanisms that underlie the classification. PIMKL exploits prior knowledge in the form of a molecular interaction network and annotated gene sets, by optimizing a mixture of pathway-induced kernels using a Multiple Kernel Learning (MKL) algorithm. After optimizing the combination of kernels to predict a specific phenotype, the model provides a stable molecular signature that can be interpreted in the light of the ingested prior knowledge and that can be used in transfer learning tasks.
Short Abstract: Genomic sequence alignment has revolutionized understanding of cellular functioning. However, sequence alignment ignores interactions between proteins, which are ultimately what carry out biological processes. Biological network alignment (NA) can fill this gap, transferring functional knowledge between species' conserved molecular network (rather than just sequence) regions. Hence, NA can redefine homology. However, current NA methods do not end up aligning proteins that should be aligned, i.e., that are functionally related. One reason is that traditional NA assumes it is proteins with similar network topologies that are functionally related and should thus be aligned, but we find this not to hold. So, a paradigm shift is needed with how the NA problem is approached. Consequently, we redefine NA as a data-driven framework, which learns from data the relationship between topological relatedness and functional relatedness, without assuming that topological relatedness means topological similarity. Another possible reason is that traditional NA treats biological data as a homogeneous network. However, biological data are heterogeneous, with different -omics data types capturing different slices of cellular functioning. To handle such data, we generalize homogeneous NA to heterogeneous NA. We find that both data-driven and heterogeneous NA lead to alignments of better functional quality compared to traditional NA.
Short Abstract: Networks (graphs) are an intuitive way to represent and visualize relationships (edges) between biological entities (vertices), such as genes, gene products, metabolites, or even organisms. Biological data, represented as a network, can be analyzed using standard computational approaches to, for example, identify functional domains from sequence similarity or specify molecular pathways from co-expression data. However, analyses are often limited by network size, a characteristic that has been rapidly increasing throughout biological subfields due to the decreasing costs and improved efficiency of high-throughput experimentation. We created a functional similarity network of all proteins encoded by the currently available fully-sequenced bacterial genomes. Our network contains ~16 million nodes (proteins) related by ~50 billion edges to form ~183 thousand strongly connected components. Notably, the largest connected component contained 76% of the proteins in our set. To identify functional units represented by the subgraphs in this network we evaluated various existing high-throughput clustering techniques. None of the algorithms were able to deal with the size of our largest connected component, revealing methodological limitations and/or unwieldy resource requirements. Here we present our work to use t-Distributed Stochastic Neighbour Embedding (t-SNE) to solve this clustering problem in an embarrassingly parallelizable fashion.
Short Abstract: Network-based algorithms enable the integration of diverse forms of data in unique and flexible ways that can be customized for a particular experimental design or disease area. Specifically, these approaches can map gene expression activity from a disease of interest to protein-protein interactions that give rise to such expression patterns. These resulting networks enable interpretation and hypothesis development beyond what can be done with gene expression analysis alone. In this work, we integrate network-based tools that interpret gene expression data with published drug-target interaction data into a single scientific workflow. The approach starts from inferring proteins uniquely active within each tumor type using the MetaViper package and then identifies functionally relevant networks between those proteins and drugs using a modified version of the Prize-Collecting Steiner Forest (PCSF) algorithm and data from the Drug-Target Explorer package. We evaluated this workflow on data from patients with NF1, a tumor-predisposing syndrome for which there are no clinically approved drug treatments. We found drugs that were specific to each of the five NF1 tumor types evaluated and shown to be effective in related cell lines. The workflow is publicly available at: https://github.com/Sage-Bionetworks/drug-target-expression-network.
Short Abstract: Determining the genetic basis of diseases is critical to developing therapeutic strategies. We have previously developed approaches to integrate rich and diverse sets of omics data into interpretable, hierarchical models and have found that they can both recapitulate known cellular subsystems and guide discovery of new ones. To enable systematic discovery of biomedical knowledge, we have built DiseaseScope, a service that automatically organizes high-throughput gene-gene interaction data into interactive hierarchical models. This method takes a disease name and returns biological information at multiple scales, including a core set of disease-associated genes, interactions that form disease-relevant pathways, and the hierarchical organization of these pathways. Furthermore, to elucidate how each gene cluster is related to the disease, DiseaseScope includes two interpretation tools: HiView Lens, to explore the underlying structure of individual modules by overlaying additional networks, and NetAnt, to determine what biomedical concepts connect the gene module to disease by proposing mechanistic pathways to pathogenesis. Although the pipeline is automatic, each module is a self-contained service that can be invoked independently, allowing users to form custom applications. Together, DiseaseScope aggregates massive amounts of biological knowledge about diseases and organizes that knowledge to guide discovery.
Short Abstract: Gene co-expression networks are commonly applied to identify modules of highly co-expressed genes in tissue transcriptomes. It is broadly considered that these co-expression modules represent coordinated molecular functions and/or co-regulated groups of genes. However, most of the gene expression variance in tissue transcriptomes comes from variations in cell type composition. We hypothesized that consistently co-expressed modules in tissue transcriptomes represent marker genes of the constituent cell types. To test our hypothesis, we gathered 69 publicly available gene expression datasets from several tissues (breast, brain, lung, kidney, etc.). We combined several co-expression networks per tissue following a consensus co-membership weighted network approach to identify consistently co-expressed gene modules. We compared the consistently co-expressed gene modules of each tissue with the known marker genes for the cell types comprising that tissue. These reference marker genes were obtained from scRNA-seq experiments available in the CellMarker database. We found that the consistently co-expressed gene modules significantly intersect with marker genes of the main constituent cell types. Therefore, this consensus co-membership co-expression network approach constitutes an unsupervised approach to deconvolute the cell type composition of tissue transcriptomes.
Short Abstract: Phenotype robustness to environmental fluctuations is a common biological phenomenon. Although most phenotypes involve multiple proteins that interact with each other, the basic principles of how such interactome networks respond to environmental unpredictability and change during evolution are largely unknown. Here we study interactomes of 1,840 species across the tree of life involving a total of 8,762,166 protein-protein interactions. Our study focuses on the resilience of interactomes to network failures and finds that interactomes become more resilient during evolution, meaning that interactomes become more robust to network failures over time. In bacteria, we find that a more resilient interactome is in turn associated with the greater ability of the organism to survive in a more complex, variable, and competitive environment. We find that at the protein family level proteins exhibit a coordinated rewiring of interactions over time and that a resilient interactome arises through the gradual change of the network topology. Our findings have implications for understanding the molecular network structure in the context of both evolution and environment.
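As a rough illustration of what "resilience to network failures" can mean, the sketch below removes a set of failed proteins from a toy interaction network and reports the fraction of surviving proteins that remain in the largest connected component. Both the failure model (node removal) and the score (largest-component fraction) are assumptions made here for illustration; the abstract does not specify the resilience measure used in the study, and the function name is hypothetical.

```python
def largest_component_fraction(edges, removed):
    """Fraction of surviving nodes in the largest connected component.

    edges   : iterable of (u, v) undirected interactions
    removed : set of failed nodes (assumed failure model: node removal)
    """
    nodes = set()
    for u, v in edges:
        nodes.update((u, v))
    alive = nodes - set(removed)
    if not alive:
        return 0.0

    # Adjacency restricted to surviving nodes.
    adj = {n: set() for n in alive}
    for u, v in edges:
        if u in alive and v in alive:
            adj[u].add(v)
            adj[v].add(u)

    # Depth-first search to find the size of each component.
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best / len(alive)


# Toy networks: a star (one hub, five leaves) and a six-node ring.
star = [(0, leaf) for leaf in range(1, 6)]
ring = [(i, (i + 1) % 6) for i in range(6)]

star_after_failure = largest_component_fraction(star, {0})
ring_after_failure = largest_component_fraction(ring, {0})
```

In this toy case the star falls apart when its hub fails, while the ring stays connected after losing any single node, which is the kind of topological difference a resilience score is meant to capture.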
Short Abstract: Biological networks are a common and widely used method of representing the complex connections in biology, but the data formats used for encoding the information differ heavily by their intended use. Suitable solutions for transmission conflict with those for storage or for usage in applications and analyses. Seamless conversion between different formats becomes as important as the data itself. NDEx, the Network Data Exchange (www.ndexbio.org), is an online commons for biological networks. The networks are shared in the Cytoscape exchange (CX) format, a JSON-based and aspect-oriented data structure. Within the R statistical environment, the ndexr package grants access to the NDEx platform and provides the retrieved networks using a simple adaptation of the CX structure. This draft data model has undergone ongoing development to enhance usability and enrich functionality by aligning it more closely with fundamental R data structures and adding high-level functions for data manipulation. Furthermore, extensibility was increased by facilitating the creation of custom aspects. The new RCX data model closes the gap between the CX transmission format and the R way of thinking about data while being compatible with well-established R data formats and tools. The corresponding R package is available on GitHub (https://github.com/frankkramer-lab/RCX).
Short Abstract: With the development of transcriptomics technologies, gene expression analysis has become an important resource in molecular profiling. However, most known methods fail to detect the more subtle gene sub-systems contributing to phenotypes, since they mainly consider single genes independently. Moving to holistic genome expression studies allows both less biased discovery of new associations/biomarkers and detection of large-scale signatures. Using co-expression to define gene interactions, multiple studies have explored those systems. Knowledge can be gathered from modules based on topological information (data-driven), while additional biological information can be added (knowledge-driven) to help interpretation. Current tools to perform such analyses remain accessible only to researchers with advanced bioinformatics skills. Moreover, such analyses require a series of tools, each performing one or a few functions. Our new automated pipeline aims to fulfill those needs: it supports both microarray and RNA-seq data, filtration and normalization, co-expression network building, module detection, functional enrichment, graph comparison, topological analysis, transcription factor positioning and graph visualisation. Using skin aging samples from two distant age ranges, we explored the augmented network built by our tool to discover new genes that could complete the knowledge of the skin aging process.
Short Abstract: BACKGROUND: How age and contraction mode (i.e. eccentric (ECC), concentric (CON)) interact to influence resistance exercise-induced muscle adaptation remains poorly understood. Here, we utilise predictive network analysis to explore molecular signatures of contraction-specific muscle adaptation as a function of age. METHODS: RNA-sequencing data generated from young (18-30 y) and older (65-75 y) human skeletal muscle before and following acute ECC and CON contractions were subjected to Weighted Gene Co-expression Network Analysis. Identified gene modules with an expression profile influenced by contraction were: annotated using the Gene Ontology; associated with contraction-induced muscle strength declines; and subjected to candidate target identification. RESULTS: 21 modules displayed an age/contraction-dependent expression pattern, including several ‘mitochondrial’- (downregulated post-CON with ageing) and ‘extracellular’-related (upregulated post-contraction in an age/mode-dependent manner) modules. Candidate target identification produced a refined list of 273 hub genes and 43 putative transcriptional regulators across age/contraction-related modules. Two modules correlated with post-contraction changes in muscle strength, namely: a cell adhesion-related module upregulated by ECC contraction per se and positively related to ECC-induced strength declines; and a transcription-related module downregulated by contraction per se and negatively associated with contraction-induced strength declines. CONCLUSION: Predictive network analysis represents a powerful approach to identify candidate targets of age/contraction mode-(in)dependent muscle adaptation.
Short Abstract: Phenotypic heterogeneity in cancer is often caused by different genetic alterations. Understanding such phenotype-genotype relationships is fundamental for the advance of personalized medicine. Phenotype-genotype relationships in cancer can be better interpreted in a pathway-centric view, in which genetic alterations in the disease are considered in the context of dysregulated pathways. However, most pathway identification methods in cancer focus on finding subnetworks that include general cancer drivers or are associated with discrete features, and hence cannot be directly applied to the analysis of continuous features such as drug response. On the other hand, existing genome-wide association approaches do not fully utilize the complex properties of cancer mutations. To address these challenges, we propose NETPHLEX (NETwork-to-PHenotype mapping Leveraging EXclusivity), which aims to identify subnetworks of mutated genes that are collectively associated with continuous cancer phenotypes. We formulate the problem as an integer linear program and solve it optimally to obtain a connected set of mutated genes maximizing the association. Analyzing a cell line drug response dataset, we identified sensitivity-associated subnetworks for a large set of drugs. NETPHLEX can be used to identify subnetworks associated with any continuous phenotype beyond drug response data.
Short Abstract: Identification of functional pathways mediating molecular responses may lead to better understanding of disease processes and suggest new therapeutic approaches. We propose a method to detect such mediating functions using topological properties of protein-protein interaction networks. We start by introducing the concept of pathway centrality, a measure of communication between disease genes and differentially expressed genes. Using pathway centrality, we identify mediating pathways in three pulmonary diseases (asthma; bronchopulmonary dysplasia (BPD); and chronic obstructive pulmonary disease (COPD)). We systematically evaluate the significance of all identified central pathways using alleviating and positive genetic interactions. Mediating pathways shared by all three pulmonary disorders favor innate immune and inflammation-related processes, including toll-like receptor (TLR) signaling, PDGF- and angiotensin-regulated airway remodeling, the JAK-STAT signaling pathway, and interferon gamma. Disease-specific mediators, such as neurodevelopmental processes in BPD or adhesion molecules in COPD, are also highlighted. Some of our findings implicate pathways already in development as drug targets, while others may suggest new therapeutic approaches.
Short Abstract: The molecular complexity of neoplastic development occurs at a number of genomic, transcriptomic and epigenomic scales, which makes the integrative analysis of multiomics in cancer research vital. In addition, cancer-related genes act within an interaction network that must be explicitly incorporated into the integration procedure. Network-based methods have the ability to naturally incorporate closely correlated variables and give insights into global differences between samples of multi-dimensional datasets. The Wasserstein metric from the theory of optimal mass transport (generalized to graphs) has useful properties for smoothly measuring these differences. Here, we present a novel method of aggregating multiomics and Wasserstein distance clustering (aWCluster) to perform hierarchical clustering of breast carcinoma and lung adenocarcinoma from the TCGA project. Integrating mRNA expression, copy number alteration, and DNA methylation followed by clustering samples (via aWCluster) results in subgroups with significantly different survival rates in both breast and lung cancer. Gene ontology enrichment analysis of significant genes in the substantially low survival breast subgroup led to the well-known phenomenon of tumor hypoxia. Moreover, aWCluster successfully recovered PAM50 molecular subtypes of breast cancer. Consequently, we believe aWCluster has potential to discover novel subtypes and biomarkers challenging to discover without network inference or with single-omics analysis.
Short Abstract: In transcriptomics, a common assumption is that regulation is the underlying cause of the observed coexpression and that regulatory relationships could be inferred from coexpression links. Here we revisited this assumption by studying differential coexpression between five human tissues to identify the potential cases of regulatory rewiring. We identified many robust tissue-specific links, but found up to 75% of the tissue-specific links to be predictable by the average expression level of the genes. This is contrary to the common belief that much differential coexpression happens in the absence of differential expression. We also found that brain has a particularly high count of tissue-specific links (32% of its total links). Through simulation, we demonstrate that in a heterogeneous bulk tissue, cellular composition variation among the samples could induce variance and coexpression among the genes. To confirm this in the real data, we modelled the variation of genes’ expression levels among the bulk brain samples, using the variation of brain cell-type-marker genes. We show that much of the observed brain specific coexpression is likely to be induced by cellular composition variation among the samples. Our findings raise questions on the potential of the bulk tissue coexpression in predicting regulatory signal.
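The simulation argument above can be illustrated with a minimal sketch. It assumes a toy bulk tissue in which two genes are expressed by the same cell type and are not regulated by one another; the only thing they share is the fraction of that cell type in each sample. All expression levels, noise scales, and function names are hypothetical choices for illustration and are not taken from the study.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate_bulk(n_samples, vary_composition, rng):
    """Bulk expression of two marker genes of one cell type.

    Each gene's bulk level is (cell-type fraction x per-cell level) plus
    independent noise, so the genes share no regulatory link.
    """
    gene_a, gene_b = [], []
    for _ in range(n_samples):
        fraction = rng.uniform(0.2, 0.8) if vary_composition else 0.5
        gene_a.append(10.0 * fraction + rng.gauss(0, 0.5))
        gene_b.append(8.0 * fraction + rng.gauss(0, 0.5))
    return gene_a, gene_b


rng = random.Random(0)
a_var, b_var = simulate_bulk(500, True, rng)
a_fix, b_fix = simulate_bulk(500, False, rng)

r_varying = pearson(a_var, b_var)  # composition differs between samples
r_fixed = pearson(a_fix, b_fix)    # composition identical in every sample

print(f"varying composition: r = {r_varying:.2f}")
print(f"fixed composition:   r = {r_fixed:.2f}")
```

Under these assumptions the two genes are strongly correlated only when the cell-type fraction varies across samples, which is the composition-driven coexpression the abstract describes.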
Short Abstract: Understanding the mechanisms underlying infectious diseases is fundamental to developing prevention strategies. Host-pathogen interactions (HPIs), which range from the initial invasion of host cells by the pathogen through the proliferation of the pathogen in its host, are studied to find potential genomic targets for the development of novel drugs, vaccines, and other therapeutics. Determining which proteins are involved in the interaction system behind an infectious process is the first step in developing an efficient disease control strategy. In silico prediction methods have rarely been implemented as web services to infer novel HPIs, and no single framework combines several of those approaches to produce and visualize a comprehensive analysis of host-pathogen interactions. We introduce PredHPI, a powerful framework that integrates both the detection and visualization of interaction networks in a single web service, facilitating the understanding of model and non-model host-pathogen systems to aid biologists in building hypotheses and designing appropriate experiments. PredHPI is built on high-performance computing resources on the backend, capable of handling proteome-scale data from both host and pathogen. Data are displayed in an information-rich and interactive visualization, which can be further customized with user-defined layouts. PredHPI is freely available at http://bioinfo.usu.edu/PredHPI/.
Short Abstract: Heart failure is a major cause of morbidity and mortality because of the inability of the human heart to replenish lost tissue after myocardial infarction (MI). Unlike humans and mice, many vertebrates are capable of endogenous heart regeneration at adult stages. Reciprocal analyses between regenerative zebrafish and non-regenerative medaka have begun to reveal valuable knowledge about heart regeneration. However, the regulatory cascade throughout heart regeneration remains largely elusive. Here, we analyzed two time-series transcriptomes from injured hearts of zebrafish and medaka. We first constructed a regeneration-specific TO-GCN with orthologous gene pairs that were coexpressed in zebrafish but not in medaka. This method presumably filtered out coexpression relationships unrelated to heart regeneration in zebrafish. We then identified the expression order of regeneration-related transcription factors (TFs) and their co-expressed genes. Furthermore, we inferred the dynamics of biological processes associated with heart regeneration and found that genes involved in multicellular organism development and signal transduction were upregulated at six hours post cardiac injury, while genes involved in inflammatory response, innate immune response and cellular response to DNA damage stimulus were preferentially upregulated at one day after injury. Using the TO-GCN, we predicted several TFs that might preferentially regulate the regulatory cascade and trigger heart regeneration.
Short Abstract: While the advanced capacities of high-throughput single-cell technologies have greatly improved our understanding of cell behavior during growth and expansion, establishing causal relationships among transcription factors and genes at the cellular level remains challenging. Here we present PySCNet, a Python toolkit that enables reconstructing and analyzing gene regulatory networks (GRNs) from single-cell gene expression data. It consists of four core modules: 1) Preprocessing module; 2) BuildNet module for reconstructing GRNs by various methods; 3) NetEnrich module for network analysis including similarity comparison, enrichment test and network fusion; 4) Plotting module for network visualization. In addition to network analysis, gene trigger paths indicating genes with hidden relationships can be predicted by supervised random walk. We applied this tool to our single-cell RNA-Seq data with thousands of cells clustering into 6 different cell types. We were able to establish GRNs for each cluster. Furthermore, results on a simulated dataset showed that integrating multiple gene networks generated from different datasets of the same cell types greatly increases the complexity and completeness of the cell-type-specific network.
Short Abstract: The study of comorbidities is a major priority because their underlying common molecular mechanisms remain elusive even though multiple diseases frequently co-occur. Identifying comorbid diseases by considering the systematic interactions between molecular components in biological networks is key to understanding the underlying disease mechanisms. Here, we developed a network biology approach that uses underlying molecular mechanisms to identify comorbid diseases. Comorbid diseases frequently cause complications and share a number of molecular mechanisms through dysfunction of common functional modules, such as protein–protein interaction networks and biological pathways. Our method grades similarities between pairs of disease mechanisms, including comorbid diseases and their pathological conditions. We found that the identified common functional modules were strongly connected with comorbid diseases. As a proof of concept, we examined metabolic diseases and cancers, which showed highly comorbid patterns, and the allergic march, in which several allergic diseases often co-occur, to discover whether common functional modules shared between diseases were associated with comorbidities. This study demonstrated that identifying common functional modules across diseases that exhibit comorbidity could be used to discover disease comorbidities and the underlying molecular mechanisms, and to understand pathological conditions between comorbid diseases.
Short Abstract: Functional gene set enrichment has become a crucial method to interpret the results of genomics and quantitative proteomics experiments. Most widely used methods rely on enriching for only curated gene sets from e.g. Gene Ontology using a simple hypergeometric test. Here, we present a new functionality of the protein-protein interaction (PPI) database STRING for functional gene set enrichment that works directly on the experimental data and enriches for gene sets derived directly from the PPIs themselves. For the STRING-derived functional gene sets, we hierarchically clustered the PPI network of each of the 5090 organisms present in STRING. Each cluster was named, and near-redundant clusters were removed. We then implemented a very fast permutation-based functional enrichment method on the STRING website. This method works directly on log fold changes or log p-values and thus does not require setting a significance threshold. The experimental values can be enriched for both existing functional annotations as well as for our new hierarchical clusters. Our new STRING-derived protein clusters allow us to identify potential new pathways of any size in an automated fashion, especially in less studied taxa. The new enrichment functionality of STRING is powerful and yet simple to use and available at string-db.org.
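The contrast drawn above between a threshold-based hypergeometric test and a threshold-free permutation test can be sketched as follows. This is a generic illustration, not STRING's implementation: the test statistic (mean value of the set members), the toy data, and all function names are assumptions made here.

```python
import random
from math import comb


def hypergeom_pvalue(universe_size, set_size, n_selected, overlap):
    """P(X >= overlap) for genes selected without replacement.

    Requires a significance threshold upstream to decide which
    n_selected genes count as "hits".
    """
    total = comb(universe_size, n_selected)
    upper = min(set_size, n_selected)
    tail = sum(
        comb(set_size, i) * comb(universe_size - set_size, n_selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


def permutation_pvalue(values, gene_set, n_perm=999, seed=0):
    """One-sided permutation p-value on raw per-gene values.

    Assumed statistic: mean value of the genes in the set, compared
    with the mean of random gene sets of the same size.
    """
    rng = random.Random(seed)
    genes = list(values)
    members = [g for g in genes if g in gene_set]
    observed = sum(values[g] for g in members) / len(members)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(members))
        if sum(values[g] for g in sample) / len(sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data: 100 genes with increasing log fold change; the set holds the top 10.
log_fc = {f"g{i}": i / 10 for i in range(100)}
top_set = {f"g{i}" for i in range(90, 100)}

p_hyper = hypergeom_pvalue(universe_size=10, set_size=5, n_selected=5, overlap=5)
p_perm = permutation_pvalue(log_fc, top_set)
```

The permutation variant consumes the full vector of log fold changes, so no cutoff has to be chosen before testing; its resolution is limited by the number of permutations.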
Short Abstract: We revisit a fundamental question in signaling pathway analysis: are two molecules "connected" in a network? We examined this question for the Reactome pathway database using four different pathway representations: graphs, compound graphs, bipartite graphs, and hypergraphs (which capture many-to-many relationships in reaction networks). Reactome is well connected as a graph and poorly connected as a hypergraph, with the other representations falling between these two extremes. We present a novel relaxation of hypergraph connectivity, B-relaxation Distance, that iteratively increases connectivity from a node while preserving the hypergraph topology. B-relaxation Distance bridges bipartite graph connectivity and hypergraph connectivity, and we use it to systematically identify the influence of one Reactome pathway on another. We also assess the quality of evidence channels in the STRING database by using connectivity measures to determine the set of positive interactions. We show that while many STRING channels include interactions where both proteins participate in the same Reactome pathway, the channels are enriched for the more restrictive set of interactions where nodes are connected according to hypergraph definitions. Our method presents a systematic approach for working with more accurate pathway topologies, and lays the groundwork for other generalizations of graph-theoretic concepts to hypergraphs.
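Why the same reaction network can be well connected as a graph yet poorly connected as a hypergraph can be shown with a small sketch. It assumes reactions are directed hyperedges (reactant set, product set) and uses the standard notion of B-connectivity, under which a reaction can be traversed only once all of its reactants have been reached. This is not the B-relaxation Distance introduced in the abstract, and the function names and toy reactions are hypothetical.

```python
from collections import deque


def graph_reachable(hyperedges, source):
    """Reachability in the graph view: every reactant links to every product."""
    adjacency = {}
    for tails, heads in hyperedges:
        for tail in tails:
            adjacency.setdefault(tail, set()).update(heads)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def b_reachable(hyperedges, source):
    """Reachability in the hypergraph view (B-connectivity).

    A hyperedge fires only when ALL of its tail nodes are already reached.
    """
    seen = {source}
    changed = True
    while changed:
        changed = False
        for tails, heads in hyperedges:
            if set(tails) <= seen and not set(heads) <= seen:
                seen |= set(heads)
                changed = True
    return seen


# Toy reactions: A -> B, B + C -> D, D -> E
reactions = [
    (("A",), ("B",)),
    (("B", "C"), ("D",)),
    (("D",), ("E",)),
]

reached_as_graph = graph_reachable(reactions, "A")
reached_as_hypergraph = b_reachable(reactions, "A")
```

Starting from A, the graph view reaches D and E through B, whereas the hypergraph view stops at B because the second reaction also needs C, which A cannot produce.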
Short Abstract: Drug development is a complicated, time-consuming (approximately 13 years per drug), high-priced (average $12 billion per drug) and risky investment. For these reasons, drugs are designed to bind multiple targets so that they can eventually be used in the treatment of different diseases. In view of this, computational drug repositioning approaches using bioinformatics analysis tools have been proposed. Using drug repositioning, drug development can be completed in a shorter time (average 8 years), at lower cost (mean $1.6 billion) and with less risk. There are three main approaches for drug repositioning: computational, biological experiment and mixed approaches. One of the computational methods is the network-based approach, which uses physical relationships between proteins and functional similarities between genes to discover a new use for a known drug/compound. In the network-based approach, networks representing different hypotheses have been established, including protein-protein interaction networks, drug-target / drug-drug / drug-disease / drug-side effect relationships and disease-disease relationships. In this study, network-based drug repositioning studies in the cancer area are reviewed. Key problems and opportunities in this field are summarized to guide researchers in further studies.
Short Abstract: KEGG is a database resource for understanding high-level functions of the biological system, such as the cell and the organism, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies. Several software programs have been released that integrate KEGG pathways with user data for functional analysis. However, there is still much to improve, such as interactivity, software componentization, and flexibility in data integration. Here we present a dashboard component (called Dash-KEGGscape) for such KEGG data integration and analysis. The Dash-KEGGscape specification is based on the dashboard creation framework Dash, which allows users to create analytical web applications easily. When combined with other Dash components (e.g., charts, data tables), Dash-KEGGscape provides a more interactive, dynamic, and informative pathway network visualization. Dash-KEGGscape converts KEGG pathway object metadata into a tidy data structure, since the dashboard component combinations require complicated molecular ID references. The tidy data makes it easier to read and write the data aggregation programs for the component combinations and helps users customize the dashboard. As a practical example of Dash-KEGGscape, we demonstrate the visualization of the time-series profile of a whole-cell simulation of Escherichia coli.