Track: Network Biology
Short Abstract: We develop a proof-of-concept in visualization and manipulation of static and dynamically changing holographic molecular networks in 3D space using Microsoft HoloLens mixed reality. We apply our approach to study protein interaction networks centered around complex genetic disorders.
Short Abstract: Differential network analysis (DiNA) denotes a recent class of network-based bioinformatics algorithms that focus on the differences in network topology between two states of a cell, such as healthy and diseased, to identify key players in the discriminating biological processes. One major advantage of DiNA algorithms over conventional differential expression analysis is that they identify changes in the interplay between molecules rather than changes in single molecules. Here, we perform simulation studies to evaluate ten different DiNA algorithms regarding their ability to recover genetic key players. To this end, we construct random scale-free correlation networks of different sizes. In the disease networks, we perturb the correlations between varying numbers of genes and their neighbors to varying degrees. Given the covariance structure of the generated networks, we simulate the corresponding expression data using the Cholesky decomposition. For each combination of parameter settings, we perform 100 runs of each DiNA algorithm. The best-performing algorithms are the local algorithm LS and the hybrid algorithm DiffRank. They have a highly significant recovery rate and are less dependent on the size of the network and the number of genes with changed correlations. We also find that the changed genes mostly do not show a change in their expression, which makes it impossible to find them with conventional differential expression analysis. Our simulations underline the advantages of comprehensive cell models for the analysis of transcriptomics data.
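The simulation step described above (drawing expression data with a given covariance structure via the Cholesky decomposition) can be sketched as follows. This is a minimal illustration, not the authors' code: the three-gene covariance matrices, the function names and the choice of zero-mean Gaussian noise are all assumptions made for the example.

```python
import math
import random


def cholesky(cov):
    """Return the lower-triangular factor L with L * L^T == cov.

    cov must be symmetric and positive definite.
    """
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def simulate_expression(cov, n_samples, seed=0):
    """Draw zero-mean samples whose covariance is approximately cov.

    Each sample is x = L z, where z holds independent standard normals.
    """
    rng = random.Random(seed)
    low = cholesky(cov)
    n = len(cov)
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        samples.append([sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)])
    return samples


def correlation(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy networks: the "disease" state weakens the gene0-gene1 link.
healthy_cov = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.3], [0.0, 0.3, 1.0]]
disease_cov = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.3], [0.0, 0.3, 1.0]]

healthy = simulate_expression(healthy_cov, 5000, seed=1)
disease = simulate_expression(disease_cov, 5000, seed=2)
```

Because both toy states have zero mean and unit variance for every gene, the perturbed gene pair differs only in its correlation, which mirrors the observation that such genes would not be found by differential expression alone.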
Short Abstract: In the early drug discovery process, phenotypic screening is considered an alternative to target-based screening. Target-based screening requires rigorous validation of identified targets, which is challenging due to the complexity of the underlying biology. Phenotypic screening directly identifies hit compounds that produce the desired phenotypic response. The next step, target deconvolution, is to identify the targets underlying the observed pharmacological outcome. Here, we applied a network diffusion approach to predict unknown targets of drugs based on a drug-induced transcriptome dataset (CMAP) and several types of gene-gene networks. Our analyses confirm that, in the networks tested, gene expression changes more frequently among genes proximal to the known drug targets than among randomly selected genes. This suggests that a network diffusion approach may be useful for the target deconvolution of phenotypic screening hits.
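The abstract does not state which diffusion scheme was used. As one common possibility, the sketch below implements a random walk with restart on an undirected gene-gene network; the toy graph, the restart probability and the function name are assumptions made for illustration only.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Diffuse probability mass from seed nodes over an undirected graph.

    adj maps each node to a list of its neighbours; every node is assumed
    to have at least one neighbour. Returns a dict of steady-state scores.
    """
    nodes = sorted(adj)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(start)
    for _ in range(max_iter):
        # A fraction `restart` of the mass jumps back to the seeds ...
        nxt = {n: restart * start[n] for n in nodes}
        # ... and the rest is split evenly among each node's neighbours.
        for u in nodes:
            share = (1.0 - restart) * score[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - score[n]) for n in nodes)
        score = nxt
        if delta < tol:
            break
    return score


# Hypothetical path-shaped network A - B - C - D, diffusing from the seed A.
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(toy_network, seeds={"A"})
```

In a target deconvolution setting, the seeds would be genes with drug-induced expression changes, and high-scoring non-seed genes would be candidate targets; that mapping is an assumption about how such a sketch might be used, not a description of the study.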
Short Abstract: Literature-based networks derived from observed activating and inhibiting causal relationships between genes are "signed", i.e. each edge is associated with a sign +1 or -1, determining whether the activation states ("activated" or "inhibited") of connected genes are correlated or anti-correlated. We use the Ingenuity Knowledge Base (QIAGEN) to construct a large-scale signed gene network, and apply a statistical physics approach employing quenched Monte Carlo simulations to discover gene sets ("modules") whose activity patterns are most consistent with the underlying signed network structure. In a second step, these modules are connected to consistent upstream regulators and downstream functions, allowing the construction of high-level context-specific causal networks for exploration and visualization of the underlying biology.
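The notion of an activity pattern being "consistent" with a signed network can be illustrated with a simple score. This sketch only shows one plausible scoring rule (an edge is satisfied when the product of the two gene states equals the edge sign); it is not the Monte Carlo procedure of the abstract, and the gene names and function name are hypothetical.

```python
def consistency(edges, state):
    """Fraction of signed edges satisfied by an activity pattern.

    edges: iterable of (gene_u, gene_v, sign) with sign in {+1, -1}.
    state: dict mapping gene -> +1 (activated) or -1 (inhibited).
    Edges touching genes without a state are ignored.
    """
    scored = [(u, v, s) for u, v, s in edges if u in state and v in state]
    if not scored:
        return 0.0
    satisfied = sum(1 for u, v, s in scored if state[u] * state[v] == s)
    return satisfied / len(scored)


# Hypothetical three-gene module with one anti-correlated edge.
toy_edges = [("G1", "G2", +1), ("G2", "G3", -1), ("G1", "G3", +1)]
toy_state = {"G1": +1, "G2": +1, "G3": -1}
toy_score = consistency(toy_edges, toy_state)
```

The toy triangle cannot be fully satisfied by any assignment, because the product of its edge signs is negative; a module search would favour gene sets where such conflicts are rare.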
Short Abstract: A gene regulatory network (GRN) is a biological network expressing the regulatory relationship between a transcription factor (TF) and a target gene. Identifying GRNs is a crucial problem for understanding biological systems. Using high throughput techniques such as microarray and RNA-Seq, gene expression data can be easily obtained and used to reconstruct GRNs. However, to more precisely capture the characteristics of a dynamic biological system, analyzing GRNs using time-series data is necessary. Various models that infer GRNs using time-series gene expression data have been proposed. However, each model has been validated on only a limited number of benchmark datasets. We first integrated all the benchmark time-series gene expression datasets from previous studies and reassessed the baseline models. We observed that the bagging based tree ensemble model GENIE3-time achieved the best performance on the integrated dataset. GENIE3-time basically computes the regulatory score between a target gene and its TFs using feature importance score of the Random Forest (or Extra-Trees). To improve the performance of GENIE3-time, we applied Boosted Trees, powerful boosting based ensemble tree models, when calculating the feature importance scores. We evaluated our models on the integrated benchmark dataset and achieved the best results on both AUROC (area under the receiver operating characteristic curve) and AUPR (area under the precision-recall curve) scores. Furthermore, we ranked the scores for all datasets and showed the best average rank, demonstrating the robustness of our model.
Short Abstract: The booming popularity of analytics authoring and delivery systems such as Jupyter and RStudio has enabled bioinformatic programmers to create, distribute and improve novel workflows more quickly and economically than ever before. While languages such as Python and R have access to robust and performant libraries that implement general graph operations, such libraries lack support for network biologic operations such as enrichment, complex clustering, complex layouts and visual styling, publication support, and biologic database access. To date, we have positioned Cytoscape to provide basic network construction, styling and layout capabilities via the CyREST system, which consists of language-specific libraries that broker Cytoscape functions across a REST-based network connection. In our latest work, we have extended the CyREST repertoire to enable access to the large collection of biologically relevant Cytoscape apps thus far available only to interactive users. These include complex clustering, heat propagation, network alignment, pathway analysis, regulatory interaction attributes, enrichment and ontology analysis, among others. Finally, the Cytoscape Cyberinfrastructure enables bioinformaticians to author new network analyses functions in the language of their choice (e.g., Python, golang, C++), deploy them as services in a scalable cluster, and make them available to Cytoscape as apps callable via CyREST. This extends Cytoscape to leverage large memory and CPU farms previously out of reach. By exposing Cytoscape’s app ecosystem and flexible, scalable network-biologic web services, we enable network biologists to now author and distribute complex, auditable, and reproducible workflows without first redeveloping Cytoscape functionality, and yet still leverage highly capable web services.
Short Abstract: High throughput assays (e.g. sequencing and mass spectrometry) generate large, comprehensive, and rich datasets. Analyzing different ‘omics data streams together provides important insight into the behavior/regulation of a biological system. However, the integration and analysis of multiple ‘omics data remains a challenge and is an active area of research. Probabilistic graphical models (PGMs), e.g. Bayesian networks, are an intuitive way to represent the uncertainty and complexity of biological systems in an unbiased manner. PGMs provide a method to understand the causal relationships between diverse sets of data, such as molecular, phenotypic and clinical data. As such, drivers of outcomes can be analyzed, which represent potential biomarkers and therapeutic targets of a phenotype/disease. Project Survival is a collaborative longitudinal study between Berg, BIDMC, PCRT and CRAB to identify and validate clinical biomarkers and additional diagnostic and therapeutic molecules to improve outcomes for patients with pancreatic adenocarcinoma, which is an unmet medical need. This project’s goal is to discover and implement effective companion diagnostic panels to stratify pancreatic cancer patients based on expected therapy outcomes and, hence, define custom treatment strategies. Currently, we have interrogated the lipidome, metabolome and proteome of 164 patients diagnosed with various forms of pancreatic disease. Using our Bayesian AI platform, bAIcis™, we have assessed associations between molecular and clinical data streams. Molecular drivers of clinical endpoints have been identified and analyzed to rank potential biomarkers, which will be validated in future patients as part of Project Survival.
Short Abstract: Animal models are very important for the study of human diseases and for the development of new treatment therapies. However, it is challenging to identify the regulatory genes and pathways in an animal model that would be useful to generate reliable hypotheses about a phenotype of interest in human. In order to transfer knowledge from one species to another, it is crucial to take into account the intrinsic differences in regulation between human and animal models as well as the mechanisms underlying the specific phenotype. Here, we present a comprehensive framework for the comparison of pathways in mammalian organisms. Our analysis provides different levels of detail based on the integration of different types of data and network analysis techniques. To identify and visualize the overlap of already existing pathways, we use the orthology relationships between participating genes as indicated by the eggNOG database (http://eggnogdb.embl.de). We further integrate the pathways with tissue expression data from the TISSUES database (http://tissues.jensenlab.org/), which covers several mammalian organisms. In particular, we highlight the differences and similarities of genes involved in KEGG signaling pathways (http://kegg.jp/) between different tissue types for four organisms of interest (human, mouse, rat and pig).
Short Abstract: The etiological agents of cryptococcosis are fungi of the genus Cryptococcus. The disease typically affects immunocompromised patients and represents a neglected public health problem. It is estimated that approximately 500,000 people die annually of cryptococcosis, most of them with weakened immune systems. The efficiency of cryptococcosis treatment is low and the side effects can be severe. The search for new drug targets that could contribute to cryptococcosis control can be carried out through the analysis of protein–protein interaction (PPI) networks. However, PPI data for the genus Cryptococcus are scarce, and laboratory methods for high-throughput PPI identification are very expensive, time-consuming and can produce a high number of false positives and false negatives. In this study, we are developing an ab initio approach for predicting PPIs in Cryptococcus gattii and Cryptococcus neoformans using machine learning techniques. The Weighted Sparse Representation based Classifier (WSRC) methodology, which has been reported to reach accuracy estimates of 97%, was used together with the Global Encoding vector to build predictive models. Saccharomyces cerevisiae PPI data were used as the training dataset. The predicted networks were analyzed using Cytoscape and have the potential to expand the biological knowledge of Cryptococcus spp. and catalyze new discoveries.
Short Abstract: The high lethality of Pancreatic Ductal Adenocarcinoma (PDAC) is partially due to its intrinsic apoptosis resistance. Previous studies demonstrated that PDAC cells with p53R172H alleles show increased apoptosis resistance compared with those with p53 wild-type alleles. To investigate the underlying molecular mechanisms, we inferred the regulatory network through which mutant p53R172H alters the expression of apoptosis genes and prioritized the information flow. Apoptosis genes were determined among the genes differentially expressed between 32 p53 wild-type and 15 p53R172H PDAC samples from a Cre-loxP mouse model. Based on these genes, our published regulatory model of mutant p53R172H, which was reconstructed from the same dataset, was reduced to the apoptosis-relevant regulatory network. Topological and distance-based methods were then employed to prioritize the dysregulated transcription factors and miRNAs by their importance in mediating the expression changes. Our study identified 66 genes implicated in apoptotic processes, and the reconstructed apoptosis network contained a regulatory prediction for 53 of them. The network analysis indicated that the mutant p53R172H-induced dysregulation of p53, Ctnnb1, Sp1 and the miR-297-669 cluster has a strong effect on the apoptosis resistance of p53R172H cells. Analyzing the distribution of apoptosis genes whose expression changes have a positive or negative effect on apoptosis further indicated that the loss of p53 wild-type gene regulation, the modulation of Ctnnb1 and the down-regulation of the miR-297-669 cluster have an apoptosis-suppressing effect.
Short Abstract: Identifying the similarities, at the molecular level, between patients who exhibit similar susceptibilities is necessary to understand the differences in disease predisposition between individuals. Over the last decade, genome-wide association studies (GWAS) have become one of the prevalent tools for detecting genetic variants correlated with a phenotype. GWAS have provided novel insights into the mechanisms of many common human diseases. However, a number of frustrating results have also been reported. Indeed, the genetic variants they have uncovered often fall short of explaining all of the phenotypic variation that is known, from family studies, to be heritable. A key reason for this "missing heritability" is the statistical difficulty of analyzing data with orders of magnitude more features than samples. One way to address this problem is to reduce the dimensionality of the space of solutions by means of prior biological knowledge. Such knowledge can in particular be given by biological networks, which provide a means to take a more holistic view of the problem. In this context, SConES (Selecting CONnected Explanatory SNPs) was proposed a few years ago for quantitative phenotypes to look for SNPs that are both informative and connected on an underlying network. Here, we present an R package that facilitates the usage of SConES for bioinformaticians. It incorporates statistical tests for the case/control setting and a BIC-based approach for the selection of appropriate hyperparameters that leads to improved performance on simulated data.
Short Abstract: Pathway enrichment analysis has become a fundamental tool to gain insight into the underlying biological relation between, e.g., differentially expressed genes and biological pathways. Network-based methods have been demonstrated to outperform simpler methods based on gene overlap. However, gene sets derived from experiments are often complex and influenced by noise, which decreases the accuracy of the analysis. This complexity (i.e. multiple pathways being affected), the incompleteness of known pathways, and noise can lead to failure in detecting true pathways. Therefore, clustering was applied to the gene sets, and pathway analysis techniques were performed on the separate modules with the objective of increasing sensitivity. The impact of clustering was benchmarked with different pathway enrichment analysis tools.
Short Abstract: Genome-scale metabolic models have proven valuable for characterizing cancer and indicating its severity. However, identifying effective metabolic drug targets of active small-molecule compounds remains difficult, and there are still unmet challenges. In this study, we propose a network analysis of enzyme- and metabolite-centric networks to identify targets in breast cancer data. We applied topological network analysis to identify clusters that are crucial in controlling the system and that provide useful parameters for defining the relationships between topological features and drug targets. We show that both enzymes and metabolites can be effective targets, and that high-degree metabolites can be driver nodes in the network. We also compare the analysis of cancer networks with that of corresponding random networks to measure how the set of predicted drug targets changes in the system. Furthermore, principal component analysis defines the overall metabolic states of the systems, and correlation analysis identifies the link between drug targets and enzymes. Overall, we show that a better understanding of the metabolic networks of cancer through statistical modeling could be useful in drug-target identification for efficient therapeutic approaches and personalized medicine.
Short Abstract: Motivation: Seamless exchange of biological network data enables bioinformatic algorithms to integrate networks as prior knowledge input as well as to document resulting network output. However, the interoperability between pathway databases and various methods and platforms for analysis is currently lacking. NDEx, the Network Data Exchange, is an open-source data commons that facilitates the user-centered sharing and publication of networks of many types and formats. Results: Here, we present a software package that allows users to programmatically connect to and interface with NDEx servers from within R. The network repository can be searched and networks can be retrieved and converted into igraph-compatible objects. These networks can be modified and extended within R and uploaded back to the NDEx servers. Availability: ndexr is a free and open-source R package, available via GitHub (https://github.com/frankkramer-lab/ndexr) and has been submitted to Bioconductor.
Short Abstract: Expansions of the CAG repeat in nine polyglutamine (polyQ) genes (HTT, ATXN1, ATXN2, ATXN3, CACNA1A, ATXN7, ATN1, AR, and TBP) cause neurodegenerative diseases including Huntington's disease (HD) and spinocerebellar ataxias (SCAs). These polyQ diseases are characterized by different patterns of brain atrophy. The expanded CAG repeat length in the causal gene negatively affects the age-at-onset (AAO). Additionally, the CAG repeat length in other polyQ genes, other than the causal gene, also affects AAO, suggesting functional associations between the polyQ genes. However, there is no detailed assessment of the interactions among polyQ genes in pathologically relevant regions of the brain. We used gene co-expression analysis to study the functional relationships between polyQ genes in different brain regions using the Allen Human Brain Atlas (AHBA), a spatial map of gene expression in the healthy brain. We constructed co-expression networks for seven anatomical structures as well as a region associated to magnetic resonance imaging (MRI). In the HD-associated region, we found that ATN1 and ATXN2 are co-expressed and functionally related through other co-expressed genes. We observed the same association in the frontal lobe, parietal lobe, and striatum, which are structures involved in HD pathology. Across the brain, the two genes also share many co-expressed genes with HTT forming a triangular relationship. The observed interactions of HTT, ATN1, and ATXN2 may be dysregulated in all polyQ diseases. However, the stronger interaction between ATN1 and ATXN2 observed in the HD-associated region and even more strongly in striatum may be more specific to HD.
Short Abstract: Drugs are designed to target specific genes or pathways. However, researchers and clinicians often observe unexpected severe changes and side effects resulting from unknown drug-gene interactions. Moreover, the molecular events involved in a drug’s response are not always fully known. Our goal was to implement a pipeline for predicting the molecular effects of drugs on complex networks. We have built a simple, free and intuitive Cytoscape application [1] designed for non-programmer biologists. The pipeline enables importing and merging multiple pathways from the KEGG database [2], loading expression levels from experiments or databases such as GEO (NCBI), and simulating network behaviour using the BioNSi (Biological Network Simulation) tool [3]. Heregulin (HRG) is known to bind and inhibit ErbB receptors, and is therefore known as a target for cancer therapy. The ErbB signaling, MAPK and EGFR pathways were selected for simulation analysis of the molecular events following the effect of the HRG drug. Microarray gene expression data were obtained for control and heregulin-treated MCF7 cells (GSE6462 [4]). An in-silico simulation of HRG inhibition on the control expression data was performed. The pipeline’s multi-pathway HRG simulation enables prediction and visualization of network dynamics and presents central hub genes, many of them shared with the experimental gene expression results, that suggest a model of the molecular mechanism of the HRG drug effect. References: 1. Shannon, P. et al. (2003) Genome Research 13, 2498. 2. Minoru, K. et al. (2017) Nucleic Acids Res. 45, D353. 3. Rubinstein, A. et al. (2016) J. Proteome Res. 15, 2871. 4. Kim, Y. et al. (2011) Bioinformatics 27, 391.
Short Abstract: The amount of biomedical data produced by DNA sequencing, by curated knowledge on disease mechanisms and treatments, by pharmaceutical research and by many other data-generating studies is growing at a rapid pace. This wealth of biomedical data is a precious resource for integrative research studies, which draw conclusions through the analysis of multiple heterogeneous data sets for knowledge discovery. However, data integration and data interpretation have to overcome a number of hurdles that result from the characteristics of the biomedical data sources, including diversity in protein labelling as well as data consistency, analogy, availability and interoperability. The aim of this work is to harness big biomedical data by integrating its artefacts through the application of Semantic Web and Linked Data principles to the extraction of potential protein-protein interactions. We developed a semantic model for protein-protein interaction networks, which is used to identify explicit knowledge on protein interactions. The data are exposed through a visual analytics platform, LinkedPPI, which is optimised for intuitive data exploration. A selection of potential protein interactions was then experimentally validated to assess the predicted interactions. The positive outcomes of the experimental validation demonstrate that the semantically integrated data are an effective means of selecting the most relevant, as yet unknown protein interaction candidates.
Short Abstract: The reconstruction of gene-regulatory networks from incomplete molecular data remains one of the most challenging tasks in systems biology. Frequently, the only data available are medium- to large-scale gene perturbation assays, resulting in generally underdetermined systems. The problem is compounded further by the tendency of common classifiers to identify spurious edges due to transitivity. To address these issues, regularized regression methods such as LASSO have been widely employed, alongside tools aimed explicitly at the identification of spurious relationships. Here, we systematically evaluate variants of LASSO regression in conjunction with stability selection. To further improve prediction accuracy, we compare recent methods for the discrimination of direct from indirect relationships: regularized partial correlations, distance partial correlation, local transitive reduction, and network deconvolution. We take note of interesting properties of these methods, e.g. a bias of regression-based methods that causes negatively correlated nodes to be discovered at a lower frequency but with much higher precision. Combining these tools into a pipeline, we demonstrate good performance on in-silico gene perturbation data at different levels of sparsity, in the presence of non-linear effects. We then applied this pipeline to an RNAi assay of 116 selected genes in mouse embryonic and epiblast-derived stem cells. After recovering many known and putative players of early lineage determination and their dynamic associations in each condition, we validated them using independent time courses of stem cell differentiation.
Short Abstract: Understanding gene regulatory networks (GRNs) is key to deciphering gene deregulation in cancer development. We are building on previous efforts to find tissue-specific and disease-specific gene regulatory networks (FANTOM5). While large efforts have been devoted to creating context-specific GRNs for a range of tissues as well as diseases, most currently available cancer GRNs are inferred from unmatched datasets for which only the diseased tissue is available. Our goal is to find disease-specific changes in gene regulation using matched normal and tumor patient data in a cohort-specific fashion. We propose DeepGRN, a deep learning model that enables us to find cohort-specific, disease-induced changes in the GRNs of cancer patients by learning the interactions from RNASeq measurements and reported tissue-specific interactions. We apply DeepGRN to two prostate cancer cohorts: TCGA-PRAD and ProCOC (Zurich University Hospital). For each prostate-specific interaction reported in FANTOM5, we use as features the joint RNASeq measurements of the two interacting genes from the patients in a given cohort. We then train a deep learning network on the data from normal patient samples to learn transcription factor-target interactions in the disease-free state. Once the model has been trained on normal samples, we predict the GRN and its deregulation for the tumor samples, highlighting differences in the regulation process between normal and tumor samples. DeepGRN can be used to generate hypotheses for the detection of new biological processes relevant to cancer onset and development, and puts forward a novel approach to drive drug discovery and suggest targeted therapies.
Short Abstract: Pathway Commons (www.pathwaycommons.org/) serves researchers by integrating data from public pathway and interaction databases and disseminating it in a uniform fashion. The knowledge base is comprised of metabolic pathways, genetic interactions, gene regulatory networks and physical interactions involving proteins, nucleic acids, small molecules and drugs. Alongside attempts to increase the scope and types of data, a major focus has been the creation of user-focused tools and resources that facilitate access, discovery and application of existing pathway information to facilitate day-to-day activities of biological researchers. For those wishing to browse and discover pathways within the collection, we offer a web-based ‘Search’ application that enables users to query by keyword and visualize ranked search results. ‘PCViz’ is a web tool that accepts gene names and returns a customizable interaction network visualization based upon pathway data resources. These complement existing desktop software add-ons linking Pathway Commons to the Cytoscape (CyPath2) network analysis tool and the R (paxtoolsr) programming language. To facilitate analysis and interpretation of experimental data - for instance, enrichment studies that distill pathway alterations from underlying gene expression changes - pathway data file downloads can be directly used in software tools such as Gene Set Enrichment Analysis. For those wishing to learn more about pathway resources and analysis, an online ‘Guide’ includes case studies and guided workflows. Ongoing development of web apps will enhance the accessibility to pathways and integrate support for visualization and interpretation of experimental data.
Short Abstract: Although studied deeply, the identification of regulatory mechanisms from biological data still is challenging, especially across different types of ‘-omics’ data, which is growing in size and complexity. Specifically, we are interested in identifying the regulatory mechanisms underlying trans-acting meQTL hotspots from multi-omics (genotype, gene expression and CpG methylation) data. While the integration of different layers of information in ‘multi-omic’ studies has the potential to yield an almost complete view of the underlying biological system, the statistical methods to fully use this potential are still lagging behind. Here we present an approach based on Bayesian Gaussian Graphical Models (GGMs), which we extend to include non-uniform, data-driven priors to identify regulatory networks from multi-omics data. In general, GGMs impose a sparse graph structure on the underlying data by the use of partial correlations. This Bayesian approach to GGMs is based on a Markov-Chain-Monte-Carlo method, where graphs are scored depending on the data and the given prior information for that graph. Assuming independence between edges, the nature of the algorithm allows us to express graph specific priors by edge specific priors. We devise priors for the three different edge types derived from the data: SNP-gene, gene-gene and CpG-gene priors. For the SNP-gene priors, we estimate the probability of observing an expression quantitative trait locus (eQTL) based on the GTEx eQTL data. Similarly, we define priors for CpG-Gene links by using expression quantitative trait methylation information (eQTM) reported in an independent study. Gene-Gene priors are created by integrating protein-protein interaction (PPI) information from STRING with the GTEx gene expression data. 
By implementing non-uniform graph priors, we extended a Bayesian GGM framework which allows for the identification of direct associations between genotypes, gene expression and CpG methylation levels. This approach eases the integration of different kinds of ‘-omics’, thereby making it possible to extract regulatory networks from those data whilst taking into account already established biological knowledge as well as other, independent data. In the end, those networks aid the interpretation of complex multi-omics data and give insights into underlying regulatory mechanisms which might explain studied phenotypes or diseases.
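Two ingredients named above can be sketched in a few lines: the partial correlations that define edges in a Gaussian graphical model, and a graph prior that factorizes into edge-specific priors under the independence assumption. The sketch uses the standard relation between the precision matrix and partial correlations; the function names, the toy correlation matrix and the prior values are assumptions for illustration and do not reproduce the authors' MCMC implementation.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Partial correlation of i and j given all other variables.

    Uses rho_ij = -P_ij / sqrt(P_ii * P_jj), where P is the inverse covariance.
    """
    prec = invert(cov)
    n = len(cov)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]


def log_graph_prior(present_edges, edge_prior):
    """Log prior of a graph when edges are assumed independent.

    edge_prior maps each candidate edge to its prior probability (0 < p < 1);
    present_edges is the set of edges contained in the graph being scored.
    """
    return sum(math.log(p) if edge in present_edges else math.log(1.0 - p)
               for edge, p in edge_prior.items())


# Hypothetical chain SNP -> gene -> CpG: the SNP-CpG correlation (0.25) is
# fully explained by the gene, so their partial correlation vanishes.
toy_cov = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
toy_pcor = partial_correlations(toy_cov)

# Hypothetical edge priors, e.g. derived from eQTL or PPI evidence.
toy_priors = {("snp1", "geneA"): 0.9, ("geneA", "geneB"): 0.5}
toy_log_prior = log_graph_prior({("snp1", "geneA")}, toy_priors)
```

In the toy example, the marginal correlation between the first and third variable is non-zero while their partial correlation is zero, which is the sense in which a GGM separates direct from indirect associations.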
Short Abstract: Over 90% of the worldwide population is infected by at least one species of human herpesvirus, members of the Herpesviridae family. The eight currently known human species differ in their infective strategies, but they all cause lifelong infections with sporadic reactivations. Their symptoms range from mild to severe, including encephalitis and neurological disabilities. The high prevalence and possible severity of their infections, coupled with the lack of antiviral therapy capable of eradicating the virus from the host, make these viruses a predominant public health concern. Using network biology, we have been working to extend our understanding of the physical and functional relationships among herpesvirus proteins upon infection. We have computationally reconstructed a protein-protein interaction (PPI) network for the best-studied human herpesvirus species and archetype of the whole family, herpes simplex virus type 1 (HSV1). Our pipeline integrates two different data sets: one with PPIs collated from public resources, and another containing computationally predicted PPIs derived using a sequence homology-based method. Computationally predicted PPIs pinpoint potential novel interactions that can be used to assist the design of experimental PPI testing. All our interaction data are assessed under a common scoring scheme inspired by the standardised MIscore system. Using clustering approaches, we highlight functional modules and macromolecular complexes within the reconstructed network, and identify novel functionally meaningful relationships. A web server has been designed to make all our interactomics data publicly available: HVint (http://topf-group.ismb.lon.ac.uk/hvint/). References: Ashford et al. (2016); Yu et al. (2004); Villaveces et al. (2015).
Short Abstract: Graphlets and their statistics have been used to compare biological networks, to uncover their functional organisation principles, and to relate the wiring patterns of genes in these networks with their biological functions. However, modelling molecular data as a network in which edges only capture binary interactions between molecules is limited and disregards the higher level organisation of the interacting molecules. A more comprehensive model is obtained through hypergraph representations, in which hyperedges link all entities that interact in a specific way, e.g. an entire protein complex or a signaling pathway. Inspired by these, we build upon hypergraphlets -- an extension of graphlets to hypernetworks that has been recently defined by Jose Lugo-Martinez et al. (2016) -- to propose: a Hypergraphlet Degree Vector (HDV) defined for each node by the counts of hypergraphlets, an HDV-similarity using the cosine similarity for pairs of nodes, and a Hypergraphlet Correlation Distance (HCD), which provides a measure of distance between two hypergraphs. We apply these new statistics to mine hypergraphs modelling protein-protein interactions, protein complexes, biological pathways, drug-target data, and gene-disease data. On these hypergraphs, we observe that genes having similar functions (according to the semantic similarity between their gene ontology annotations) also have similar wiring patterns in hypergraphs, as measured by HDVs; however, the opposite is generally not true. We also note that the set of hypergraphlets in a network gives an indication of the type of data it represents.
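The HDV-similarity described above can be illustrated with a minimal Python sketch. It assumes each node's HDV is already available as a list of hypergraphlet counts; the function name hdv_similarity, the gene names and the counts are hypothetical and serve only as an illustration, not as the authors' implementation.

```python
import math


def hdv_similarity(hdv_a, hdv_b):
    """Cosine similarity between two hypergraphlet degree vectors (HDVs)."""
    if len(hdv_a) != len(hdv_b):
        raise ValueError("HDVs must have the same length")
    dot = sum(a * b for a, b in zip(hdv_a, hdv_b))
    norm_a = math.sqrt(sum(a * a for a in hdv_a))
    norm_b = math.sqrt(sum(b * b for b in hdv_b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: a node with no counts is
        # treated as dissimilar to every other node.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical hypergraphlet counts for three nodes.
hdv = {
    "geneA": [3, 1, 0, 2],
    "geneB": [6, 2, 0, 4],  # same wiring pattern as geneA, twice the counts
    "geneC": [0, 0, 5, 0],  # participates in a different hypergraphlet only
}

sim_ab = hdv_similarity(hdv["geneA"], hdv["geneB"])
sim_ac = hdv_similarity(hdv["geneA"], hdv["geneC"])
print(sim_ab, sim_ac)
```

Because cosine similarity ignores vector magnitude, geneA and geneB score as maximally similar even though geneB has twice the counts, while geneA and geneC share no hypergraphlet and score zero.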
Short Abstract: With the advent of high-throughput biological data and knowledge, the integration of gene expression profiles (GEPs) and large-scale biological networks (BNs) derived from pathway databases is being widely explored. Existing methods rely on significantly measured species, and only a small number of them include the directionality and underlying logic of BNs. In this study we approach the GEP-network integration problem by considering the network logic without requiring a prior selection of genes according to their expression level. Our method aims to model the causality logic in BNs using Logic Programming. This model points to reachable discrete network states that maximize a notion of harmony between the possible active or inactive states of the molecular species and the directionality of the pathway reactions according to their activator or inhibitor control role. From these states, independent components are found, each of them related to a fixed and optimal assignment of active or inactive states and independent of the others. We then compute the similarity between the states of these subgraphs and the GEPs, which allows us to identify subgraphs specific to a class of GEPs. We applied our method to study the set of possible states derived from a graph from the NCI-PID Pathway Interaction Database. This graph links multiple myeloma (MM) genes to known receptors for cancer. We identified 15 independent subgraphs and, when comparing them against 611 MM and 9 healthy GEPs, found one subgraph that more specifically represents the difference between cancer and healthy profiles.
Short Abstract: Drug-drug interactions (DDIs), i.e. changes in the effect of a drug when used in combination with another drug, have important implications for clinical applications (e.g. efficacy of a treatment, side effects) as well as for drug development (e.g. combination therapies). Although the concept of DDIs has been known for nearly a century, relatively little is known about general rules and patterns underlying such drug combination effects on a cellular (cell autonomous) level. In order to fill this gap, we have created and analysed a comprehensive DDI network using: (i) high-throughput high-content imaging screens of a representative library of FDA-approved drugs; (ii) a novel methodology based on high-dimensional cell morphology feature vectors, allowing us to identify the full extent of reciprocal and joint interactions between drugs; and (iii) an integrative network analysis of the resulting DDI network in the context of molecular and phenotypic networks. Overall, this project represents a first systematic attempt to reveal the fundamental arithmetic of drugs, i.e. a profound, molecularly rooted and predictive understanding of how the effects of individual drugs add up when used in combination, and thus helps us understand how and when DDIs occur, reveal their molecular mechanisms and identify promising combinations for specific diseases.
Short Abstract: The interactome, i.e., the integrated network of all physical interactions within the cell, can be interpreted as a map of biological mechanisms. Functional gene annotations (e.g., gene ontology (GO) terms, disease or phenotype associations etc.) allow for local enrichment analyses to explore the functional landscape of the interactome or other biological networks. Network-based approaches can aid in identifying the specific interactome neighborhoods associated with particular biological functions. Here, we present a novel tool called NICE (Network Informed funCtional Enrichment) that characterizes biological networks by identifying functional clusters using topological proximity measures. We show how NICE can be used to explore the functional landscape of the interactome and as a tool to enhance the interpretation of sample gene sets of interest, for example genes associated with a particular disease. More generally, we address the questions of how to properly define biologically meaningful clusters in dense networks and whether different kinds of networks (e.g., protein-protein, co-expression or genetic networks) are more suitable for different kinds of biological annotations.
Short Abstract: We present two database resources, ConsensusPathDB and ToxDB, display their functionalities and demonstrate how they can be used to carry out a system-wide approach for the HeCaToS project. ConsensusPathDB consists of a comprehensive collection of molecular interaction data integrated from 32 different public repositories and a web interface featuring a set of computational methods and visualization tools to explore these data. ToxDB was developed for the analysis of the functional consequences of drug treatment at the pathway level, and thus consists of 2,282 pathway concepts as well as numerical response scores for 437 drugs and 7,467 different experimental conditions. For the HeCaToS project we have used both resources and established an integrative approach for the identification and prediction of drug-induced toxicity in the human liver and heart.
Short Abstract: Despite over a decade of post-genomic molecular biology, the functional characterisation of human genes, both in their normal cellular context and in disease states, remains a challenge. In the past, a considerable contribution to our understanding of individual genes and their functions has been derived from the study of rare diseases, as they are often monogenic and thereby offer a clear genotype to phenotype relationship. Given the wealth of unbiased, high-throughput molecular and phenotypic data that is now available on many rare diseases, we can now attempt to go beyond studying one rare disease at a time and move towards a more systematic approach to identifying the pathobiological processes in which the respective causal genes are involved. As an initial case study for such an approach, we consider a group of diseases characterized by intellectual disability (ID). It has been estimated that well over 2000 genes could be involved in ID, yet fewer than 800 have been identified to date. We propose a network-based community expansion method to predict the likelihood of a gene being causative based on molecular, phenotypic and topological information of other genes in both human and model organisms. On the one hand, the results from our computational approach facilitate further experimental validation and characterization of the identified genes. On the other hand, they may also contribute to a more fundamental understanding of disease genes and their phenotypic impact in the context of their molecular interactions.
Short Abstract: Prediction of the treatment response of Acute Myeloid Leukemia (AML) patients may speed clinical decisions. The 2014 DREAM Challenge aimed to predict the Complete Remission (CR) and Primary Resistant (PR) response of 191 AML patients from proteomics data (231 measurements) and from 40 bio-clinical variables. The results highlighted only 2 discriminant proteins; the most discriminant data were the bio-clinical ones. In this study we test whether a mathematical model built over a graph associating the measured proteins could increase the number of proteomics measurements used to predict the CR-PR patient response. To do this, we first build a graph that connects the 231 measured proteins. In this graph we distinguish 3 types of nodes: stimuli, inhibitors and readouts. Our objective was to find underlying Boolean networks (BNs) that explain the logic of the observed nodes in the graph according to the proteomics measurements. We then conceived a logic program in Answer Set Programming to select k stimuli and inhibitor proteins that maximize the number of pairs of patients for which the discretized values of their experimental measures matched in both classes (CR, PR). This subset of the initial experimental data was used in a later step as input to caspo, a method that learns BNs from multiple perturbation data. We aimed to learn different BN families by using the identical stimuli-inhibitor cases and the maximal difference of readout measures for each CR and PR class, and finally to compare the structure of these BN families.
Short Abstract: During the recent epidemic of microcephaly in South America, the dissimilar geographic distribution of the associated Zika Virus Infection (ZVI) raised questions about the virus’ role in the cause of the birth defect. These questions were further amplified as other factors such as agro-toxic use, vaccinations, metal poisoning and social conditions have been proposed to co-explain the microcephaly epidemic in Brazil. Therefore, we quantify and provide insights into the different incidence patterns of microcephaly, with and without the presence of the Zika Virus. To that aim, we generate a network model based on the significant associations between 382 non-redundant variables related to ZVI-microcephaly surveillance and social determinants of health, as measured over the 5,665 municipalities in Brazil. Relationships between these variables are quantified by means of significant partial correlations and are represented as edges in our network. As this approach provides us with a dense network, we apply and evaluate different thresholding strategies on the edges of our network to allow for a meaningful topological analysis. Finally, we extract multiple sub-networks representing the incidence patterns around variables of interest, such as microcephaly with and without ZVI, which consist of the node representing our variable of interest, its direct neighbours and the edges connecting them. By comparing such sub-networks we are able to distinguish which variables influence the occurrence of microcephaly under different conditions.
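The thresholding and sub-network extraction steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge weights stand in for partial correlations, and all variable names, weights and the cutoff of 0.2 are invented for the example rather than taken from the study.

```python
def threshold_edges(edges, cutoff):
    """Keep only edges whose absolute weight reaches the cutoff."""
    return {pair: w for pair, w in edges.items() if abs(w) >= cutoff}


def ego_network(edges, center):
    """Return the center node, its direct neighbours and the edges among them."""
    neighbours = {v for pair in edges if center in pair for v in pair} - {center}
    nodes = neighbours | {center}
    sub_edges = {pair: w for pair, w in edges.items() if set(pair) <= nodes}
    return nodes, sub_edges


# Made-up partial correlations between hypothetical variables.
edges = {
    ("microcephaly_zika", "sanitation"): 0.42,
    ("microcephaly_zika", "income"): -0.31,
    ("sanitation", "income"): 0.25,
    ("microcephaly_no_zika", "income"): -0.35,
    ("microcephaly_no_zika", "vaccination"): 0.05,
    ("vaccination", "pesticide_use"): 0.50,
}

sparse = threshold_edges(edges, cutoff=0.2)  # cutoff is an arbitrary choice
nodes_zika, edges_zika = ego_network(sparse, "microcephaly_zika")
nodes_no_zika, edges_no_zika = ego_network(sparse, "microcephaly_no_zika")

# Variables adjacent to one outcome but not to the other.
only_zika = (nodes_zika - {"microcephaly_zika"}) - nodes_no_zika
print(sorted(only_zika))
```

Comparing the two neighbour sets is one simple way to contrast the sub-networks; the abstract does not specify which comparison the authors use.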
Short Abstract: Gene regulatory networks (GRNs) are complex, and still largely uncharacterized, sets of regulators and interactions that govern cellular processes, such as proliferation, invasion and metabolic adaptation. Virtually simulating biological pathways offers a powerful method to better understand the role of specific genes and pathways in health and disease, and predict cellular responses to perturbations or treatment. Pathway data collected from the literature is represented computationally in various formats including SBML, SBGN and SIF, to name a few. The BioPAX (Biological Pathway Exchange) language is becoming a widely-used format to work with this data. It aims to offer flexibility and compatibility across computational methods and software to solve the issue of format disagreement between different data providers. Major pathway databases such as Reactome, WikiPathways, and Panther increasingly support file export to BioPAX. Here, we aim to develop executable GRNs exploiting BioPAX. Having identified all major pathway repositories, we evaluated whether the file formats offered could be executed with available software tools. Whereas we could easily generate faithful visual representations of the GRNs considered, none of the tools available, including tools that support BioPAX, allowed us to efficiently simulate them. The results so far highlight the complexity of the task, requiring both format and language compatibility, and the need for further standardization.
Short Abstract: Biological network alignment is a challenging computational problem in bioinformatics that aims to identify similar nodes, edges or subnetworks among two or more networks. By computing the maximum common edge subgraph between a set of networks, one is able to detect conserved substructures and quantify their topological similarity. To aid such analyses we have developed a heuristic algorithm for the multiple maximum common edge subgraph problem on both directed and undirected networks. Our algorithm uses iterative local search to compute conserved subgraphs by optimizing a novel edge conservation score that detects not only fully conserved edges but also partially conserved edges, providing further insight into the common structure of the compared networks. Our method is available as a stand-alone application and as a Cytoscape app at http://cytomcs.compbio.sdu.dk.
Short Abstract: Identifying genes that contribute to disease phenotypes remains a challenge for bioinformatics. Static knowledge on biological networks is often combined with the dynamics observed in gene expression levels over disease development, to find markers for diagnostics and therapy, and also putative disease-modulatory drug targets and drugs. Current methods range from a focus on expression levels (Limma) to concentrating on network characteristics (PageRank, HITS/Authority Score), or both (DeMAND, Local Radiality). We present an integrative approach (the FocusHeuristics) that is thoroughly evaluated based on public expression data and molecular disease characteristics provided by DisGeNet. The FocusHeuristics combines three scores, i.e. the log fold change and another two, based on the sum and difference of log fold changes of genes/proteins linked in a network. A gene is kept when one of the scores to which it contributes is above a threshold. Our FocusHeuristics is both a predictor for gene-disease association and a bioinformatics method to reduce biological networks to their disease-relevant parts, by highlighting the dynamics observed in expression data. The FocusHeuristics performs slightly, but significantly, better than other methods at identifying disease-associated genes, as measured by AUC, and it delivers mechanistic explanations for its choice of genes.
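The three-score selection rule described above can be sketched as follows. This is a hedged reading of the abstract, not the published implementation: taking absolute values of the scores, using a separate threshold per score, and the names focus_heuristics, t_single, t_sum and t_diff are all assumptions, and the fold changes are invented.

```python
def focus_heuristics(lfc, edges, t_single, t_sum, t_diff):
    """Return genes for which at least one contributing score passes its threshold.

    lfc   : dict mapping gene -> log fold change
    edges : iterable of (gene_u, gene_v) network links
    """
    # Score 1: the gene's own log fold change.
    kept = {gene for gene, value in lfc.items() if abs(value) >= t_single}
    for u, v in edges:
        if u not in lfc or v not in lfc:
            continue  # no expression data for one endpoint
        # Scores 2 and 3: sum and difference of the linked genes' fold changes.
        link_sum = abs(lfc[u] + lfc[v])
        link_diff = abs(lfc[u] - lfc[v])
        if link_sum >= t_sum or link_diff >= t_diff:
            kept.update((u, v))  # both endpoints contribute to the score
    return kept


# Hypothetical log fold changes and network links.
lfc = {"A": 2.5, "B": 0.8, "C": 0.9, "D": -0.9, "E": 0.1}
edges = [("B", "C"), ("C", "D"), ("D", "E")]

kept = focus_heuristics(lfc, edges, t_single=2.0, t_sum=1.5, t_diff=1.5)
print(sorted(kept))
```

In this toy example, B and C are kept because they change in the same direction (large sum), C and D because they change in opposite directions (large difference), and A on its own fold change; E passes none of the three scores.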
Short Abstract: Network models aim to describe relationships between components and provide an abstract view on how a biological system works as a whole. However, learning the structure of the most representative network from sparse and often noisy data sets is a challenging optimisation problem. In this work we propose a visual analytics approach for inferring Bayesian networks from time-series gene expression data. In particular, we seek to provide visual support for exploring the search space of all candidate networks in conjunction with a representation of the original data. As part of the variable selection phase we apply hierarchical clustering with multiple-level cuts to reduce feature redundancy. The purpose is to effectively reduce the number of variables by detecting and aggregating genes that follow a similar expression pattern over time. As part of the network construction phase we use different search algorithms and parameter settings to sample the search space of all possible Bayesian networks. An appropriate scoring metric is used to assess their fitness to the original data. We extend the small multiples technique to provide visual support for exploring large collections of scored, directed networks, which constitute the solution space. Depending on its shape, either the top scoring network is selected, or a consensus network is constructed from a number of high-scoring candidate networks. Our approach aims at the design and implementation of a visualisation toolbox which would help analysts in inferring Bayesian network models which are not only reproducible, but also representative of the original time-series data.
Short Abstract: Graphical Gaussian Models (GGMs) have recently become a popular tool to study association networks. Application of GGMs to omics data is quite challenging, as the number of variables (p) is usually much larger than the number of samples (n), and classical GGM theory is not valid in a small sample setting. Several algorithms have been developed to handle GGMs with small samples. These algorithms boil the problem down to finding suitable estimates for the covariance matrix and its inverse when n < p. In this work, we have verified through simulations the ability of the methods to recover the original structure of direct interactions. We have used different algorithms implemented in the R package GGMselect and, in the case n > p, significance tests of the partial correlation coefficient, adjusting p-values for multiple comparisons. In practice, we have generated graphs of order p, obtained random samples of size n from the generated structure, and verified whether the original state (connected or not connected) is recovered between each pair of variables. Results show, on the one hand, a dependency on the ratio n/p: the greater the ratio, the better the fit. On the other hand, results depend only very slightly on the original network structure. Similar results have been obtained with highly structured networks and when the original structure is random. We have also considered several methods to compare networks by comparing the overlap of nodes and edges or the degree distribution of nodes.
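The link between the inverse covariance matrix and the recovered graph can be made concrete with a small Python sketch. It uses the standard identity that the partial correlation between variables i and j equals -w_ij / sqrt(w_ii * w_jj), where w is the inverse covariance (precision) matrix. The matrices below are invented, and the fixed cutoff is a simplifying assumption standing in for the significance tests and GGMselect procedures mentioned in the abstract.

```python
import math


def partial_correlations(precision):
    """Convert a precision (inverse covariance) matrix to partial correlations."""
    p = len(precision)
    pcor = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                pcor[i][j] = -precision[i][j] / scale
    return pcor


def edges_from_pcor(pcor, cutoff):
    """Declare an edge wherever the absolute partial correlation reaches the cutoff."""
    p = len(pcor)
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(pcor[i][j]) >= cutoff}


def recovery(true_edges, estimated_edges):
    """Fraction of true edges recovered and fraction of estimated edges that are true."""
    hits = len(true_edges & estimated_edges)
    sensitivity = hits / len(true_edges) if true_edges else 0.0
    precision_rate = hits / len(estimated_edges) if estimated_edges else 0.0
    return sensitivity, precision_rate


# True structure: a chain 0 - 1 - 2 (variables 0 and 2 are not directly linked).
true_precision = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
true_pcor = partial_correlations(true_precision)
true_edges = edges_from_pcor(true_pcor, cutoff=0.1)

# Hypothetical noisy estimate of the same precision matrix.
estimated_precision = [[2.1, -0.9, 0.05], [-0.9, 1.9, -1.1], [0.05, -1.1, 2.0]]
estimated_edges = edges_from_pcor(partial_correlations(estimated_precision), cutoff=0.1)

sensitivity, precision_rate = recovery(true_edges, estimated_edges)
print(true_edges, estimated_edges, sensitivity, precision_rate)
```

In a full simulation, the estimated precision matrix would come from samples of size n drawn from the true structure, and the recovery rates would be tracked as a function of n/p.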
Short Abstract: A Germany-wide consortium named “Molecular Mechanisms in Malignant Lymphomas - Demonstrators of Personalized Medicine”, composed of research groups of biologists, bioinformaticians and doctors, proposes to develop prognostic and diagnostic platforms that guide treatment decisions and that support the process of therapeutic target identification in diffuse large B-cell lymphomas (DLBCL). The focus lies on the prognostic relevance of the DLBCL microenvironment, which is the foundation of the diagnostic platforms the consortium will establish. The communication of the cell microenvironment with the tumour cells will be the target for the novel therapeutic strategies the consortium wants to investigate. In our subproject, we aim to investigate hybrid models, which will integrate signalling data with existing gene expression data to predict how lymphomas translate signalling stimuli into expression phenotypes. For this approach we will integrate pathway knowledge and experimental data and implement previously developed network reconstruction methodology. These existing approaches, such as Deterministic Effects Propagation Networks (Bender et al., 2011) and Nested Effects Models (Fröhlich et al., 2008; Markowetz et al., 2005), are based on Bayesian networks. They form the basis of my research and shall be adapted so that measurements from proteomic experiments and prior pathway knowledge can be combined. References: Bender, C., Heyde, S., Henjes, F., Wiemann, S., Korf, U., and Beissbarth, T. (2011). Inferring signalling networks from longitudinal data using sampling based approaches in the R-package 'ddepn'. BMC Bioinformatics 12, 291. Fröhlich, H., Fellmann, M., Sultmann, H., Poustka, A., and Beissbarth, T. (2008). Predicting pathway membership via domain signatures. Bioinformatics 24, 2137-2142. Markowetz, F., Bloch, J., and Spang, R. (2005). Non-transcriptional pathway features reconstructed from secondary effects of RNA interference. Bioinformatics 21, 4026-4032.
Short Abstract: In this study, we construct a classifier to find novel drugs for specific diseases by exploring a gene regulatory network. First, we build a gene regulatory network with 3 types of interactions (activation, inhibition, and neutral) using data from KEGG, BioCarta, Reactome and the Pathway Interaction Database. Then we collect drug-target genes, disease genes and chemical-disease associations from DrugBank, DisGeNET and CTD, respectively. The chemical IDs are mapped to DrugBank IDs. For each drug-disease pair, we create a vector in which each component is a value describing how much the corresponding gene is affected by the drug. To obtain this value, we find shortest paths from drug-target genes to disease genes on the gene regulatory network. We then calculate a probability value for each path using the degrees of the genes on the network, and use this value as the weight of the path. To calculate the direction of the path, we use the interactions between the genes in the path. The product of weight and direction indicates how much a drug-target gene affects a disease gene through the path. If there is more than one path, the values are added up. Additionally, if a drug has multiple target genes, we sum the values of all target genes. To construct the classifier, we build a positive set from vectors with a known drug-disease association. The negative set is built by random sampling from vectors excluding the positive set. Finally, we use randomForest, and this process is repeated 1000 times for each disease. As a result, we obtain AUC>0.6 for 265/298 diseases.
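The path-based drug effect described above can be sketched in Python under several explicit assumptions, since the abstract does not give the exact formulas. Here the path weight is taken to be the product of 1/out-degree of each gene along the path (a random-walk probability), the direction is the product of edge signs (+1 for activation, -1 for inhibition; neutral interactions are left out of the example), and all gene names and function names are hypothetical.

```python
from collections import deque


def all_shortest_paths(graph, source, target):
    """Enumerate every shortest directed path from source to target via BFS."""
    dist = {source: 0}
    parents = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, {}):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                parents[nxt].append(node)  # another equally short route
    if target not in dist:
        return []
    paths = []

    def walk(node, suffix):
        if node == source:
            paths.append([source] + suffix)
            return
        for parent in parents[node]:
            walk(parent, [node] + suffix)

    walk(target, [])
    return paths


def path_effect(graph, path):
    """Weight (assumed: product of 1/out-degree) times direction (product of signs)."""
    weight, sign = 1.0, 1
    for u, v in zip(path, path[1:]):
        weight *= 1.0 / len(graph[u])
        sign *= graph[u][v]
    return weight * sign


def drug_effect_on_gene(graph, target_genes, gene):
    """Sum path effects over all shortest paths and all drug-target genes."""
    return sum(path_effect(graph, path)
               for target in target_genes
               for path in all_shortest_paths(graph, target, gene))


# Hypothetical signed regulatory network: graph[u][v] is +1 or -1.
graph = {
    "T1": {"A": 1, "B": -1},
    "A": {"D": 1},
    "B": {"D": 1, "E": 1},
}

effect = drug_effect_on_gene(graph, ["T1"], "D")
print(effect)
```

In this toy network the activating route T1 -> A -> D contributes +0.5 and the inhibiting route T1 -> B -> D contributes -0.25, so the two shortest paths partly cancel. One such value per disease gene would form the feature vector passed to the classifier.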
Short Abstract: Infectious diseases remain one of the leading causes of mortality and illness in the modern world. However, disease susceptibility can differ wildly, both between individuals of the same species and across species boundaries (e.g. reservoirs of zoonotic pathogens). This interplay between intracellular pathogens and their hosts is mediated through intricate molecular interactions, which generally serve common goals such as the invasion of the host cells, evasion of the immune system and manipulation of host processes for replication. By studying these interactions via a systems biology approach, a better understanding of the underlying determinants of disease susceptibility can be gained. We propose a generic methodology based on frequent pattern mining to tackle this research question and we showcase it in this case study on the intra-species interactomes of Herpesviridae and their hosts. By utilising the wealth of knowledge that has been compiled into public protein-protein interaction databases and complementing this with different forms of annotation data, we attempt to uncover biologically relevant patterns in these networks. This method also conveniently lends itself to the derivation of association rules and the construction of classification models to distinguish between, for example, human and animal herpesviruses. An extension of this workflow is the inclusion of different omics levels to validate or filter the previous findings. Ultimately, our goal is to deliver fundamental insights into the evolutionary drivers of disease susceptibility, as a better understanding of the underlying molecular mechanisms is crucial for the treatment and prevention of pathogen-induced infectious diseases.
Short Abstract: “Omics” approaches are widely applied to examine physiological processes and pathological conditions in depth and to study disease patterns. Our interest lies in the integration of data from different experimental approaches and fields of investigation, to highlight hidden information and to mine new knowledge from available experimental data. Our study focuses on the integrative functional analysis of molecular pathways that involve AHR (Aryl hydrocarbon receptor), a cytosolic ligand-activated transcription factor, in cutaneous malignant melanoma (CMM1) and in some melanoma-independent skin cancers. The functional analysis is carried out with different open-source platforms and bioinformatics tools: GeneCards, DSYSMAP and the Oncomine platform for structural-functional and clinical characteristics; the Cytoscape platform for building and visualizing molecular networks at different levels, in order to improve the knowledge of molecular mechanisms; the BioGPS platform, BioXpress and MERAV software to analyze the gene expression profile of our biological target involved in melanoma [2,3]; and MelGene DB (melanoma database). In this study, we explain the molecular networks and discuss the potential roles of specific nodes highlighted by the analysis, comparing this information with Oncomine clinical data, also in consideration of the role of several AHR disease-related mutations in different biological conditions. A deeper analysis of AHR molecular mechanisms based on pro/antitumor functions could be useful for a better understanding of the link between AHR and the development and progression of melanoma, and for proposing novel therapies to cure or control the evolution of melanoma.
Short Abstract: The use of gene co-expression networks to model and analyse complex biological phenomena is becoming increasingly commonplace. Whilst several methods exist to create these networks, none offer a means of evaluating parameter selection choices made during network construction. Gene co-expression networks are normally created using default settings where parameter optimisation against objective criteria could result in networks that better represent the underlying biological systems data. Here we tackle this issue using an entropy measure to optimise parameter settings by quantifying their effect on the association of gene sub-networks with biological pathways. Our method compares the topology of gene modules that have been identified by the WGCNA co-expression network tool with the topology of gene interactions in biological pathways. We derive entropy values for each module against each pathway using the weights of common edges between the two graphs and measure the effects of varying correlation method, minimum cluster size and edge weight threshold on module-pathway entropy. To evaluate this approach, we use RNA-Seq data in which genes from specific pathways have been synthetically perturbed to increase association values and measure the sensitivity and specificity of changes in module-pathway entropy. As our method preserves topology we propose that it presents a more rigorous and sensitive way to assess the efficacy of gene co-expression network creation tools such as WGCNA and could be used for their future development.
Short Abstract: In modern Systems Medicine approaches the aim is to look at increasingly complex interactions of complete signaling pathways in order to get a more holistic view for individualized treatment decisions. Individualized treatment decisions and newly developed specialized drugs warrant the need to broaden the focus in individualized medicine from singular biomarkers to pathways. On the other hand, pathway databases offer vast amounts of knowledge on biological networks, freely available and encoded in semi-structured formats [BCS06, SAK+09]. The efficient re-use of pathway knowledge and its integration into bioinformatic analyses enables new insights for researchers in systems medicine. However, the vast amount of published data on molecular interactions makes it increasingly challenging for life science researchers to find and extract the most relevant information. Currently, the tools to use this information and integrate it in a clinical context are still lacking. Our idea is to compose an analysis pipeline in order to enable patient-specific systems medicine analyses in a university hospital setting. Our poster will present a workflow for visualizing pathway information and integrating omics data within an interactive online application, utilizing state of the art technology [FLH+16, RC14, KBK+13, FBBL15] and well-established standard data models [DCP+10, HFS+03, PCW+15]. References: [BCS06] Gary D. Bader, Michael P. Cary, and Chris Sander. Pathguide: a pathway resource list. Nucleic Acids Research, 34(Database issue):D504–506, January 2006. [DCP+10] Emek Demir, Michael P Cary, Suzanne Paley, et al. The BioPAX community standard for pathway data sharing. Nature Biotechnology, 28(9):935–942, September 2010. [FBBL15] Silvia Frias, Kenneth Bryan, Fiona S. L. Brinkman, and David J. Lynn. CerebralWeb: a Cytoscape.js plug-in to visualize networks stratified by subcellular localization. Database, 2015:bav041, January 2015. [FLH+16] Max Franz, Christian T. Lopes, Gerardo Huck, et al. Cytoscape.js: a graph theory library for visualisation and analysis. Bioinformatics, 32(2):309–311, 2016. [HFS+03] M. Hucka, A. Finney, H. M. Sauro, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics, 19(4):524–531, March 2003. [KBK+13] F. Kramer, M. Bayerlova, F. Klemm, A. Bleckmann, and T. Beissbarth. rBiopaxParser–an R package to parse, modify and visualize BioPAX data. Bioinformatics, 29(4):520–522, February 2013. [PCW+15] Dexter Pratt, Jing Chen, David Welker, et al. NDEx, the Network Data Exchange. Cell Systems, 1(4):302–305, October 2015. [RC14] R Core Team. R: A Language and Environment for Statistical Computing. 2014. [SAK+09] Carl F. Schaefer, Kira Anthony, Shiva Krupa, et al. PID: the Pathway Interaction Database. Nucleic Acids Research, 37(Database issue):D674–D679, January 2009.
Short Abstract: Paralleling the increasing availability of protein-protein interaction (PPI) network data, several network alignment methods have been proposed. Network alignments have been used to uncover functionally conserved network parts and to transfer annotations. However, due to the computational intractability of the network alignment problem, all aligners are heuristics providing divergent solutions and no consensus exists on a gold standard, or on which scoring scheme should be used to evaluate them. We comprehensively evaluate the alignment scoring schemes and global network aligners on large scale PPI data and observe that three methods, HUBALIGN, L-GRAAL and NATALIE, regularly produce the most topologically and biologically coherent alignments. We study the collective behaviour of network aligners and observe that PPI networks are almost entirely aligned with a handful of aligners that we unify into a new tool, Ulign. Ulign enables a complete alignment of two networks, which traditional global and local aligners fail to do. Also, multiple mappings of Ulign define biologically relevant soft clusterings of proteins in PPI networks, which may be used for refining the transfer of annotations across networks. Hence, PPI networks are already well investigated by current aligners, so to gain additional biological insights, a paradigm shift is needed. We propose such a shift to come from aligning all available data types collectively rather than any particular data type in isolation from others.
Short Abstract: RNA expression profiling is routinely employed to quantify the abundance of tens of thousands of transcripts across different conditions and tissues. Metabolomics, which systematically measures the abundance of small molecules, has emerged as an attractive addition as it directly measures the end products of biological processes and is thus key to understanding disease phenotypes. Although data gathering has become easier with the improvement of experimental protocols, the analysis of the results still poses significant challenges. Gene set analysis (GSA) has emerged as a promising technique to provide a biological interpretation of such data, and pathway topology is one of its most important sources of information. Our “graphite” R package provides networks derived from the pathways of six major databases (Biocarta, HumanCyc, KEGG, NCI/Nature Pathway Interaction Database, Panther and Reactome) covering 14 species. The software discriminates between different types of gene groups (complexes or alternative genes of the same family), allows the selection of edges by type of interaction, and uniformly converts heterogeneous node IDs using the facilities provided by BioConductor. Moreover, it gives easy access to topological analyses such as the clipper, DEGraph, SPIA and topologyGSA methods. Here we present a novel extension to the package which explicitly tracks small molecules. This makes it possible to capture metabolic pathways with a higher level of detail, further extends the collection of databases to include dedicated resources such as SMPDB and PharmGKB, and opens the way to statistical analyses over mixed datasets measuring both RNAs and metabolites.
Short Abstract: Understanding a drug's mechanism of action (MOA) is essential for treating diseases. However, analyzing the mechanism remains challenging because genes perform various functions while interacting with each other inside the body. We therefore try to elucidate the MOA of a drug using modular groups of genes rather than relationships between individual genes. In our study, a protein-protein interaction (PPI) network is obtained from the BioGRID, DIP, KEGG, and Reactome databases. We use link communities to group genes that are significantly related to each other in the PPI network into modules. Edge weights between gene modules are calculated from the interactions in the PPI network, and we then construct a weighted gene module-module interaction (MMI) network. In the MMI network, modules that are significantly enriched with drug target genes are selected as starting modules, and modules that are significantly enriched with disease-related genes are selected as ending modules. To build a drug-disease module pathway, we find the shortest path between the starting modules and the ending modules. Finally, we analyze the MOA of the drug using the drug-disease module pathway. Genes in the pathway that are not yet known to be related to the mechanism can be hypothesized to be involved in it. Further study of the modules in the pathway can give insight into which functions play an important role in the MOA of the drug. Ahn, Y. Y., Bagrow, J. P., & Lehmann, S. (2010). Link communities reveal multiscale complexity in networks. Nature, 466(7307), 761-764.
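The shortest-path step of the module pathway abstract above can be illustrated with a minimal sketch. It assumes that the module-module edge weights have already been converted into positive costs (smaller cost meaning a stronger link); the abstract does not say how this conversion is done. The module names, the costs, and the function `shortest_module_path` are hypothetical and are not taken from the authors' implementation.

```python
import heapq

def shortest_module_path(graph, starts, ends):
    """Multi-source Dijkstra: cheapest path from any start module to any end module.

    graph: dict mapping module -> {neighbour module: positive cost}.
    Returns (total_cost, path); (inf, []) if no end module is reachable.
    """
    ends = set(ends)
    dist = {s: 0.0 for s in starts}
    prev = {}
    heap = [(0.0, s) for s in starts]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        if u in ends:
            # Walk the predecessor links back to a start module.
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return float("inf"), []

# Hypothetical MMI network: M_drug is enriched with drug targets,
# M_disease with disease-related genes.
mmi = {
    "M_drug": {"M1": 1.0, "M2": 4.0},
    "M1": {"M_drug": 1.0, "M2": 1.0, "M_disease": 5.0},
    "M2": {"M_drug": 4.0, "M1": 1.0, "M_disease": 1.0},
    "M_disease": {"M1": 5.0, "M2": 1.0},
}
cost, path = shortest_module_path(mmi, ["M_drug"], ["M_disease"])
print(cost, path)  # 3.0 ['M_drug', 'M1', 'M2', 'M_disease']
```

In this toy network the intermediate modules M1 and M2 would be the candidates for further study, in the sense described in the abstract.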
Short Abstract: The study of genes and their inter-relationships using methods such as network and pathway analysis requires high quality molecular interaction information and tools to extract relevant relationships from them. We present HitPredict (http://hintdb.hgc.jp/htp/), a consolidated resource of experimentally identified, physical protein–protein interactions with confidence scores to indicate their reliability (1). The latest version of HitPredict provides a high quality dataset of 398,696 physical associations between 70,808 proteins from 105 species (2). To extract condition-specific information from databases like HitPredict, we present TimeXNet (http://timexnet.hgc.jp/), a tool that predicts activated pathways during a cellular response using time-course gene expression profiles (3). TimeXNet identifies activated pathways in a large molecular network between three sets of genes based on their time of expression (4). A web-based version of TimeXNet (http://txnet.hgc.jp/) directly utilizes the interaction networks in HitPredict to enable the analysis of gene expression profiles from multiple species. References 1. HitPredict: a database of quality assessed protein-protein interactions in nine species. Ashwini Patil, Kenta Nakai and Haruki Nakamura, Nucleic Acids Research 39:D744:D749, 2011. 2. HitPredict version 4 - comprehensive reliability scoring of physical protein-protein interactions from more than 100 species. Yosvany Lopez, Kenta Nakai and Ashwini Patil, DATABASE 2015:bav117, 2015. 3. TimeXNet: Identifying active gene sub-networks using time-course gene expression profiles. Ashwini Patil and Kenta Nakai, BMC Systems Biology, 8(Suppl4), S2, 2014. 4. Linking transcriptional changes over time in stimulated dendritic cells to identify gene networks activated during the innate immune response. Ashwini Patil, Yutaro Kumagai, Kuo-ching Liang, Yutaka Suzuki and Kenta Nakai, PLOS Computational Biology 9(11):e1003323, 2013.
Short Abstract: Reactome (http://reactome.org) is a free, open-source, curated and peer-reviewed knowledge base of biomolecular pathways that provides infrastructure and intuitive bioinformatics tools for search, visualisation, interpretation and analysis of pathways. From the data point of view, it offers detailed representations of cellular processes as an ordered network of molecular reactions, annotating them in a consistent pathway format to create an online resource for researchers and students as a core reusable pathway dataset for systems biology based approaches. This network amounts to thousands of interconnected terms forming a graph of biological knowledge. Storing, retrieving, and analysing such networks can become inefficient when relying on a relational database management system (RDBMS). Although relational databases are widely used among pathway knowledge bases for data management, they are not always the best fit for modern performance requirements and the increasing complexity of data. Complexity in this case refers not only to the quantity of data but also to its interconnectedness. The benefit of storing these data in their natural form is that they need not be transformed into a flat table format and can instead be persisted as originally designed. Adopting Neo4j as the graph database management system helps reduce the complexity of the database and thus allows more straightforward access to the Reactome knowledgebase via its query language, Cypher. Cypher allows for faster queries, reducing the average response time per query by 95%, and helps express queries in a more intuitive, human-readable and easy-to-learn syntax.
Short Abstract: Common Gene Regulatory Network (GRN) inference methods, such as LASSO, do not provide information about the confidence of inferred links. We address this by extending the bootstrap method, instead overlapping the analysis in iterated runs, and applying it to three inference methods. Details of the shortcomings of L1-regularization methods when operating over sufficiently informative data are known. Here, all of the referenced methods perform sub-optimally in terms of the Matthews correlation coefficient (MCC) for low signal-to-noise ratio (SNR) data matrices, even when the data are informative enough for network inference by other metrics. It is thus important not only to introduce methods which are optimized for analyzing datasets of certain quality, but also to define criteria for determining which method to use to optimize analysis. When considering which gene-gene interactions are true, we seek to differentiate spurious gene-gene interactions from those that truly exist in the system. To this end we use a linear ODE model and the GeneSPIDER package to infer the regulatory network of interactions by relating the effect of single gene perturbations to the expression of the remaining unperturbed set.
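As a rough illustration of the two quantities mentioned in the GRN abstract above, the sketch below computes a per-link support value from repeated inference runs and then scores a thresholded network with the Matthews correlation coefficient. This is a generic bootstrap-style summary, not the authors' extension of the bootstrap and not the GeneSPIDER API; the gene names, the run results, the 0.5 threshold, and the helper names `link_support` and `mcc` are all assumptions made for the example.

```python
import math

def link_support(runs):
    """Fraction of inference runs in which each directed link was recovered."""
    counts = {}
    for links in runs:
        for link in links:
            counts[link] = counts.get(link, 0) + 1
    return {link: n / len(runs) for link, n in counts.items()}

def mcc(true_links, inferred_links, all_pairs):
    """Matthews correlation coefficient over every candidate gene pair."""
    tp = fp = fn = tn = 0
    for pair in all_pairs:
        is_true = pair in true_links
        is_inferred = pair in inferred_links
        if is_true and is_inferred:
            tp += 1
        elif is_inferred:
            fp += 1
        elif is_true:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

genes = ["A", "B", "C"]
all_pairs = [(a, b) for a in genes for b in genes if a != b]

# Hypothetical link sets returned by four resampled inference runs.
runs = [
    {("A", "B"), ("A", "C")},
    {("A", "B")},
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("A", "C")},
]
support = link_support(runs)
consensus = {link for link, s in support.items() if s >= 0.5}

true_links = {("A", "B"), ("B", "C")}
score = mcc(true_links, consensus, all_pairs)
print(support[("A", "B")], support[("A", "C")], support[("B", "C")])  # 1.0 0.5 0.25
print(score)  # 0.25
```

The support values give a simple notion of confidence for each link, and the MCC summarizes how well the thresholded network matches a known reference when one is available, as in simulation studies.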
Short Abstract: Drug-target interaction (DTI) prediction is a fundamental step in drug discovery. Given an unknown pair, (di, tj), of drug compound di and target protein tj, the objective is to predict whether di interacts with tj given a known DTI network as training data. Machine learning (ML) is currently being used for DTI predictions. There are mainly two types of ML-based approaches: similarity-based methods and feature-based methods. We propose a new feature-based approach which uses short linear motifs (SLiMs) as protein features combined with chemical substructure fingerprints as drug features, and applies ML methods to predict DTIs. SLiMs are short protein sequence patterns of 3-10 amino acids involved in the recognition and targeting activities of drugs. Existing methods for DTI prediction consider the absence of interaction between a drug di and a target tj in the training data as a true negative interaction. However, the lack of interaction in (di, tj) means unknown, not negative. We introduce a strategy that finds negative pairs based on protein and drug features, and devise feature selection and classification algorithms to predict DTIs. We tested our DTI prediction approach on four gold-standard data sets (Yamanishi et al., 2008). Both random forest (RF) and support vector machine (SVM) classifiers give high AUC performance of 99.24% and 97.64%, respectively. Our method also outperforms existing DTI prediction methods discussed in the literature. Generally, RF performs better than SVM, with AUC results of 99.04%, 96.39%, 97.33%, and 87.64%, respectively in each data set.
Short Abstract: Motivation: Understanding functions of proteins in specific human tissues is essential for insights into disease diagnostics and therapeutics, yet prediction of tissue-specific cellular function remains a critical challenge for biomedicine. Results: Here we present OhmNet, a hierarchy-aware unsupervised node feature learning approach for multi-layer networks. We build a multi-layer network, where each layer represents molecular interactions in a different human tissue. OhmNet then automatically learns a mapping of proteins, represented as nodes, to a neural embedding based low-dimensional space of features. OhmNet encourages sharing of similar features among proteins with similar network neighborhoods and among proteins activated in similar tissues. The algorithm generalizes prior work, which generally ignores relationships between tissues, by modeling tissue organization with a rich multiscale tissue hierarchy. We use OhmNet to study multicellular function in a multi-layer protein interaction network of 107 human tissues. In 48 tissues with known tissue-specific cellular functions, OhmNet provides more accurate predictions of cellular function than alternative approaches, and also generates more accurate hypotheses about tissue-specific protein actions. We show that taking into account the tissue hierarchy leads to improved predictive power. Remarkably, we also demonstrate that it is possible to leverage the tissue hierarchy in order to effectively transfer cellular functions to a functionally uncharacterized tissue. Overall, OhmNet moves from flat networks to multiscale models able to predict a range of phenotypes spanning cellular subsystems.
Short Abstract: The study of viruses is aided by bioinformatics resources such as protein–protein interaction databases. Having a comprehensive picture of a virus protein's interaction partners is crucial to the understanding of the viral lifecycle and aids in the search for vaccines and antiviral drugs. Here, we extend the STRING database of protein-protein interactions to store and display cross-species virus-host and intra-virus interactions. Information is taken from different channels: experimental evidence, pathways and text mining. This enables the visualization of networks that show the virus interacting with the host proteins, which are primarily proteins of the innate immune system.
Short Abstract: Systems immunology leverages recent technological advancements that enable broad profiling of the immune system to better understand the response to infection and vaccination, as well as the dysregulation that occurs in disease. An increasingly common approach to gain insights from these large-scale profiling experiments involves the application of statistical learning methods to predict disease states or the immune response to perturbations. However, the goal of many systems studies is not to maximize accuracy, but rather to gain biological insights. The predictors identified using current approaches can be uninterpretable or present only one of many equally predictive models, leading to a narrow understanding of the underlying biology. Here we show that incorporating prior biological knowledge within a logistic modeling framework by using network-level constraints on transcriptional profiling data significantly improves interpretability. Moreover, incorporating different types of biological knowledge produces models that highlight distinct aspects of the underlying biology, while maintaining predictive accuracy. We propose a new framework, Logistic Multiple Network-constrained Regression (LogMiNeR), and apply it to understand the mechanisms underlying differential responses to influenza vaccination. While standard logistic regression approaches were predictive, they were minimally interpretable. Incorporating prior knowledge using LogMiNeR led to models that were equally predictive yet highly interpretable. In this context, B cell-specific genes and mTOR signaling were associated with an effective vaccination response in young adults. Overall, our results demonstrate a new paradigm for analyzing high-dimensional immune profiling data in which multiple networks encoding prior knowledge are incorporated to improve model interpretability.
Short Abstract: Networks can model real-world systems in a variety of domains. Network alignment (NA) aims to find a node mapping that conserves similar regions between compared networks. NA is applicable to many fields, including computational biology, where NA can guide the transfer of biological knowledge from well- to poorly-studied species across aligned network regions. Existing NA methods can only align static networks. However, most complex real-world systems evolve over time and should thus be modeled as dynamic networks. We hypothesize that aligning dynamic network representations of evolving systems will produce superior alignments compared to aligning the systems' static network representations, as is currently done. For this purpose, we introduce the first ever dynamic NA method, DynaMAGNA++. This proof-of-concept dynamic NA method is an extension of a state-of-the-art static NA method, MAGNA++. Even though both MAGNA++ and DynaMAGNA++ optimize edge as well as node conservation across the aligned networks, MAGNA++ conserves static edges and similarity between static node neighborhoods, while DynaMAGNA++ conserves dynamic edges (events) and similarity between evolving node neighborhoods. For this purpose, we introduce the first ever measure of dynamic edge conservation and rely on our recent measure of dynamic node conservation. Importantly, the two dynamic conservation measures can be optimized using any state-of-the-art NA method and not just MAGNA++. We confirm our hypothesis that dynamic NA is superior to static NA, under fair comparison conditions, on synthetic and real-world networks, in computational biology and social network domains. DynaMAGNA++ is parallelized and it includes a user-friendly graphical interface.
Short Abstract: Motivation: Proteins in a metabolic pathway catalyze reactions on similar metabolites. Examination of metabolic networks provides information on key targetable proteins. In this study, we propose a novel machine-learning based method to cluster proteins by representing them with the ligands they bind to. The proteins are represented using the word-embeddings of the SMILES representations of their ligands and then clustered using the K-means algorithm. We compare this method with two other methods, one of which is another machine-learning based approach that uses the word-embeddings model to represent proteins by their sequences. The other is a network-based model that we introduced in our previous study, in which proteins are connected to each other based on the ligands with which they interact. We showed that the investigation of proteins based on the ligands with which they interact reveals functionally meaningful protein families on a network model. To collect the required protein-ligand interaction data, we also developed a Python package to automatically extract protein-ligand interactions from available databases. As a test case, we examined proteins that participate in the sphingolipid (SL) metabolic pathway. Results: Our results show that describing proteins by the ligands they bind to incorporates ligand similarity information, thus leading to the construction of functionally meaningful protein clusters. Availability: https://github.com/hkmztrk/SMILES2VecBasedProteinClustering
Short Abstract: Motivation: Incorporating gene interaction data into the identification of ‘hit’ genes in genomic experiments is a well-established approach leveraging the ‘guilt by association’ assumption to obtain a network based hit list of functionally related genes. We aim to develop a method to allow for multivariate gene scores and multiple hit labels in order to extend the analysis of genomic screening data within such an approach. Results: We propose a Markov random field based method to achieve our aim and show that the particular advantages of our method compared to those currently used lead to new insights in previously analysed data as well as for our own motivating data. Our method additionally achieves the best performance in an independent simulation experiment. The real data applications we consider comprise a survival analysis and differential expression experiment and a cell-based RNA interference functional screen. Availability: We provide all of the data and code related to the results in the paper.
Short Abstract: Micro-organisms are ubiquitous and exist together in communities, where they interact with each other through several means, especially the exchange of chemical signals and metabolites. There has been an increasing interest in using microbial consortia for applications in metabolic engineering. A microbial consortium, besides enjoying division of labour, also provides a wider scope for joint exploration of diverse metabolisms. Although there are many naturally occurring microbial communities, their systematic exploration has been very rarely carried out. This is primarily due to the difficulties in understanding the complex interactions between the organisms in a community. In order to systematically design a microbial community, it is important to understand their metabolism and the metabolic exchanges happening therein. To this end, we have developed a novel graph-based algorithm, ComPass (Community Pathway analysis), to predict all possible metabolic interactions that occur between microorganisms in a consortium. We demonstrate that ComPass can easily scale to large metabolic networks and can reliably predict several sub-networks between any given source and target metabolite. We illustrate the utility of ComPass to understand existing microbial communities by analysing the predicted sub-networks from different types of microbial communities and demonstrate interesting metabolic exchanges that occur between the micro-organisms.
Short Abstract: The tumor microenvironment (TME) plays important and sometimes opposing roles in tumor evolution. We have created a collection of comprehensive cell type-specific maps of molecular interactions in the TME. The collection includes maps of macrophages (Mph), dendritic cells (DC), natural killer cells (NK), and non-immune cancer-associated fibroblasts (CAFs). The cell type-specific innate-immune maps were integrated with specific information on neutrophils, mast cells and MDSC in the TME, giving rise to a seamless, comprehensive meta-map of the innate immune response in cancer that depicts the signalling responsible for the anti- and pro-tumour activities of the innate immune system as a whole. It is a ‘geographical-like’, hierarchically organized meta-map with functional ‘zones’, namely signalling mechanisms contributing to anti-tumor or pro-tumor immune phenotypes. The map contains 1476 objects and is based on information manually retrieved from 820 papers. It will soon become part of ACSN, http://acsn.curie.fr.
Finally, we applied these network maps for identification of molecular mechanisms regulating cell reprogramming in several innate immune cell types. We applied unsupervised classification methods for decomposing single cell RNASeq data on fibroblasts, natural killers and macrophages from melanoma and selected several sub-groups in each population, which probably play different functional roles in TME. Analysis and interpretation of the expression data for each subset in the context of cell-type specific maps and the innate immunity meta-map revealed the functional differences resulting from transcriptomic heterogeneity in several cell types.
Short Abstract: Signaling pathways are series of reactions that are typically initiated by the binding of an extracellular ligand to a membrane-bound receptor and culminate in altered expression of a set of target genes. Pathways are commonly represented as graphs, which offer elegant algorithms for analyzing signaling pathways but fail to capture many-to-many relationships among molecules in signaling reactions. We recently presented a shortest path formulation posed on directed hypergraphs, a generalization of graphs that captures many aspects of signaling reactions. However, it offered a strict and restrictive definition of connectivity that limited applicability to real-world signaling pathways. Here, we extend a mixed Integer Linear Program (mILP) to achieve hyperpath relaxations in two ways. First, we allow simple cycles in shortest hyperpaths that capture feedback loops. Second, we allow plausible "source" nodes that are not specified in advance. We apply these relaxations to hypergraphs built automatically from pathway databases.
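To make the idea of strict connectivity in a directed hypergraph concrete, the following minimal sketch computes which molecules can be produced from a given set of sources when a reaction fires only if all of its inputs are already available. The abstract does not name its connectivity definition; this sketch assumes the standard notion of B-connectivity from the hypergraph literature. The toy reactions and the function name `b_connected` are hypothetical, and the sketch does not implement the authors' mixed integer linear program.

```python
def b_connected(hyperedges, sources):
    """Return every node reachable when a hyperedge needs its whole tail set.

    hyperedges: list of (tail_set, head_set) pairs, one per reaction.
    sources: nodes assumed to be available from the start.
    """
    reached = set(sources)
    changed = True
    while changed:
        changed = False
        for tail, head in hyperedges:
            # A reaction fires only if all of its inputs have been reached.
            if set(tail) <= reached and not set(head) <= reached:
                reached |= set(head)
                changed = True
    return reached

# Toy pathway: ligand L binds receptor R; the complex LR together with
# kinase K yields phosphorylated K_p, which activates transcription factor TF.
reactions = [
    ({"L", "R"}, {"LR"}),
    ({"LR", "K"}, {"K_p"}),
    ({"K_p"}, {"TF"}),
]

strict = b_connected(reactions, {"L", "R"})
relaxed = b_connected(reactions, {"L", "R", "K"})
print(sorted(strict))   # ['L', 'LR', 'R']
print(sorted(relaxed))  # ['K', 'K_p', 'L', 'LR', 'R', 'TF']
```

With only L and R as sources, TF is unreachable because K is never produced, even though a plain graph view would show a path from L to TF. Treating K as an additional source restores connectivity, which is the kind of situation the second relaxation in the abstract (plausible source nodes not specified in advance) appears to address.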
Short Abstract: Describing living systems through the reconstitution of their genomic-regulatory functions is among the biggest challenges of the current "big-data omics" era. Here we present TETRAMER, a Cytoscape app providing a user-friendly framework for the reconstruction of cell fate transition-specific gene regulatory networks (GRNs) by integrating user-provided temporal transcriptomes with generic GRNs derived from (i) the analysis of multiple publicly available human/mouse gene expression profiles (CellNet); (ii) the genome-wide mapping of promoters and enhancers in multiple cell types/tissues from CAGE data generated by the FANTOM5 consortium (regulatory circuits); (iii) the systematic analysis of most publicly available ChIP-sequencing data corresponding to TF binding in a variety of human or mouse cell types/tissues (ngs-qc: http://ngs-qc.org).
TETRAMER evaluates the capacity of each TF retrieved from the GRN to drive cell fate transformation. To this end, the temporal transcriptional regulation cascade derived from each TF is scrutinized to verify its influence on the reconstitution of the differential gene expression patterns associated with the cell fate transition. TETRAMER is available from its dedicated website: http://igbmc.fr/Gronemeyer/qcgenomics/TETRAMER
Short Abstract: A major challenge in precision medicine is the development of methods to predict drug response using multi-omic data with models that are both accurate and interpretable. There are a large number of statistical methods (e.g. LASSO, elastic net, random forest regression) that generate predictive models and allow for automatic feature selection. Still, these methodologies often fail to yield biologically interpretable models because their results are purely numerically-driven and issues, such as collinearity, can mask the most biologically relevant features. We present an analysis of large cell line databases using a network-constrained regression that incorporates pathway information from the Pathway Commons database to bridge the limiting gap in existing methods that ignore biological knowledge. In this work, we examine multivariate linear models based on combinations of molecular features and evaluate their predictive power for ~200 FDA-approved or investigational compounds.
Short Abstract: We developed a new solution to visualize the biological pathways involved in sparse metabolomics data. Using knowledge from two pathway resources and ontology-based approaches, we can show the directed networks between active metabolites from metabolomics data. The data from both resources are made interoperable by collapsing metabolites in the pathways into single nodes in the biological networks using ontological approaches. This explicit ontological linking allows for precise biological interpretation of the paths. By using Neo4j and Cytoscape, we provide a computational environment suited to larger networks as well as advanced visualization functionality to investigate the identified subnetworks. The generic nature of this approach opens up the option to combine it with other omics data sources, such as proteomics and transcriptomics.