SPONSORS:

Silver

Silver Sponsor: Sanofi



General

General Sponsor - IBM Research

General Sponsor - MAGNet

General Sponsor - National Cancer Institute

RECOMB/ISCB RegSysGen 2014 Sponsor - NRNB

Cytoscape Sponsors

RECOMB/ISCB RegSysGen 2014 Sponsor - Agilent Technologies

RECOMB/ISCB RegSysGen 2014 Sponsor - Cytoscape

SYSTEMS BIOLOGY POSTERS




Updated Nov 5, 2014


SB P01: FAST-SL: An efficient algorithm to identify synthetic lethal reaction/gene sets in metabolic networks

Aditya Pratapa1, Shankar Balachandran1, Karthik Raman1

1Indian Institute of Technology Madras

Synthetic lethal reaction/gene sets are sets of reactions/genes where only the simultaneous removal of all reactions/genes in the set abolishes growth of an organism. In silico, synthetic lethal sets can be identified by simulating the effect of removing reaction/gene sets from the reconstructed genome-scale metabolic network of an organism. Previous approaches to identifying synthetic lethal reactions in genome-scale metabolic networks have built on the framework of Flux Balance Analysis (FBA), extending it either to exhaustively analyze all possible combinations of reactions, or to formulate the problem as a bi-level Mixed Integer Linear Programming (MILP) problem.

FAST-SL circumvents the complexity of both exhaustive enumeration and the bi-level MILP by iteratively reducing the search space, and hence the computational time, involved in identifying synthetic lethal reaction sets. FAST-SL, while considering all possible phenotypes and all parts of metabolism, efficiently identifies the targeted phenotypes. Our algorithm shows more than a 4000-fold reduction in search space over exhaustive enumeration of triple lethal sets for the Escherichia coli iAF1260 model. Unlike previous methods for identifying lethal reaction sets, FAST-SL uses the sparsest solution of the flux balance constraints of a metabolic network, obtained by solving a linear programming problem, to eliminate reaction combinations that do not lead to a lethal phenotype. This reduces the search space for identifying lethal reaction sets.
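The pruning idea can be illustrated with a minimal sketch. This is not the authors' MATLAB implementation: instead of solving the FBA linear program, the toy below treats growth as reachability of BIOMASS from a substrate S, and a shortest route plays the role of the sparse flux solution. The network, reaction names, and function names are all hypothetical. The principle is the same: a deletion set that leaves the current solution untouched cannot be lethal, so only reactions that the solution actually uses need to be tested.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network: reaction -> (substrate, product).
REACTIONS = {
    "r1": ("S", "A"), "r2": ("A", "X"),
    "r3": ("S", "B"), "r4": ("B", "X"),
    "r5": ("X", "BIOMASS"),
    "r6": ("S", "C"), "r7": ("C", "D"),  # dead-end branch
}

def find_route(deleted):
    """Return the reactions on one shortest S -> BIOMASS route, or None."""
    parent = {"S": None}  # metabolite -> (previous metabolite, reaction)
    queue = deque(["S"])
    while queue:
        met = queue.popleft()
        if met == "BIOMASS":
            route = set()
            while parent[met] is not None:
                met, rxn = parent[met]
                route.add(rxn)
            return route
        for rxn, (src, dst) in REACTIONS.items():
            if rxn not in deleted and src == met and dst not in parent:
                parent[dst] = (met, rxn)
                queue.append(dst)
    return None

def exhaustive_doubles():
    """Test every pair of reactions that are not lethal on their own."""
    singles = {r for r in REACTIONS if find_route({r}) is None}
    viable = [r for r in REACTIONS if r not in singles]
    doubles, tests = set(), 0
    for a, b in combinations(viable, 2):
        tests += 1
        if find_route({a, b}) is None:
            doubles.add(frozenset((a, b)))
    return singles, doubles, tests

def pruned_doubles():
    """Only test reactions that the current solution actually uses."""
    route0 = find_route(set())
    singles = {r for r in route0 if find_route({r}) is None}
    doubles, tests = set(), 0
    for a in route0 - singles:
        route_a = find_route({a})  # alternative solution without a
        for b in route_a - singles:
            tests += 1
            if find_route({a, b}) is None:
                doubles.add(frozenset((a, b)))
    return singles, doubles, tests
```

On this toy network both functions return the same lethal sets, but the pruned search evaluates far fewer candidate pairs than the exhaustive one.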

As our algorithm finds application in the identification of combinatorial drug targets, in this study we performed synthetic reaction and gene lethality analysis for genome-scale reconstructions of Salmonella enterica typhimurium and Mycobacterium tuberculosis. For these organisms, we validated the lethal reaction sets obtained using FAST-SL against exhaustive enumeration of reaction deletions up to order two. The triple lethal reaction sets obtained for Escherichia coli using FAST-SL exactly match the results of exhaustive enumeration, which we ran on a high-performance computer cluster. Our results also completely agree with those of the SL Finder algorithm (Suthers, P.F. et al. (2009). Mol Syst Biol, 5:301); notably, our algorithm is substantially faster. Further, we also present a mathematical proof of the correctness of our algorithm.

Overall, FAST-SL is a powerful tool to identify the lethal reaction/gene sets, through a massive reduction in the search space over an exhaustive enumeration approach and the SL Finder algorithm. We believe that our algorithm presents an important advance and can enable the rapid enumeration of synthetic lethal reaction/gene sets in genome-scale metabolic networks.

Availability: The MATLAB implementation of our algorithm (compatible with the COBRA toolbox v2.0, a popular toolbox for constraint-based analysis of metabolic networks) is freely available from: https://home.iitm.ac.in/kraman/lab/research/fast-sl.

............................................................................................................................
SB P02: Identification of master regulatory genes as a minimum connected dominating set

Maryam Nazarieh1, Volkhard Helms1

1Saarland University

The identification of gene regulatory networks governing cellular identity is one of the main challenges for understanding the mechanisms underlying cellular differentiation and reprogramming or carcinogenesis. In this work, we reformulate this problem as an optimization problem, namely that of determining a Minimum Connected Dominating Set for directed graphs. Our approach is motivated by the observation that the pluripotency network in embryonic stem cells is maintained by a small set of key transcription factors which share hundreds of target genes. Identifying a particular subset of genes exactly among the 2^n possible subsets of n genes takes exponential time, but approximation algorithms have been developed that find a close-to-optimal solution in polynomial time with a constant approximation factor. Here, we show on the basis of time-series gene expression data during the cell cycle of S. cerevisiae that this method reliably identifies top regulators that are known to govern the cell cycle in this model organism.
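As a rough illustration of the problem, the sketch below builds a connected dominating set greedily on a toy directed graph. It is a simple heuristic written for this example, not the approximation algorithm used in the work and without its approximation guarantee. The graph and all names are hypothetical. A node counts as dominated if it is in the set or is regulated by a member of the set, and connectivity is enforced by only adding nodes adjacent to the current set.

```python
def greedy_connected_dominating_set(targets):
    """targets maps each regulator to the set of genes it regulates."""
    nodes = set(targets) | {g for ts in targets.values() for g in ts}
    out = {v: set(targets.get(v, ())) for v in nodes}
    # Undirected adjacency, used only to keep the chosen set connected.
    adjacent = {v: set(out[v]) for v in nodes}
    for v in nodes:
        for w in out[v]:
            adjacent[w].add(v)

    # Start from the node that regulates the most genes.
    start = max(sorted(nodes), key=lambda v: len(out[v]))
    chosen = {start}
    covered = {start} | out[start]
    while covered != nodes:
        frontier = sorted({w for v in chosen for w in adjacent[v]} - chosen)
        if not frontier:
            break  # remaining nodes are not reachable from the chosen set
        # Add the adjacent node that dominates the most uncovered genes.
        best = max(frontier, key=lambda v: len(out[v] - covered))
        chosen.add(best)
        covered |= {best} | out[best]
    return chosen

# Hypothetical toy regulatory network.
TOY_NETWORK = {
    "TF1": {"g1", "g2", "g3", "TF2"},
    "TF2": {"g4", "g5"},
    "g5": {"g6"},
}
regulators = greedy_connected_dominating_set(TOY_NETWORK)
```

In this toy network the heuristic selects the two transcription factors plus the one intermediate gene needed to reach g6.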

............................................................................................................................
SB P03: Biological signaling pathways and potential mathematical network representations: biological discovery through optimization

Juan Rosas1, Enery Lorenzo1, Lynn Perez1, Michael Ortiz1, Clara Isaza2, Mauricio Cabrera1

1University of Puerto Rico at Mayaguez, 2Ponce School of Medicine and Health Sciences

Establishing the role of different genes in the development of cancer can be a daunting task, starting with the detection of genes that are important in the illness from high-throughput biological experiments. These experiments belong to the ‘omics’ family, as in genomics, proteomics, metabolomics, and the like. Furthermore, it is safe to say that even with a list of potentially important genes it is highly unlikely that these show changes in expression in isolation. A biological signaling path is a more plausible underlying mechanism. This work analyzes a microarray experiment in order to formulate a mathematical network problem. A pre-selection of genes is carried out with a multiple criteria optimization framework previously published by our research group. First results are presented for lung cancer.

............................................................................................................................
SB P04: Discovering disease associated molecular interactions using Discordant correlation

Charlotte Siska1, Katerina Kechris1

1University of Colorado Anschutz Medical Campus

A common approach for identifying molecular features (such as transcripts or proteins) associated with disease is testing for differential expression or abundance in -omics data. However, this approach is limited for studying interactions between molecular features, which would give a deeper knowledge of the relevant molecular systems and pathways. We have developed a method for this purpose that we call the Discordant method. The Discordant method measures the posterior probability that a pair of features has discordant correlation between phenotypic groups using mixture models and the EM algorithm. We compare our method to existing approaches, one that uses Fisher’s transformation in a classical frequentist framework and another that uses an Empirical Bayes joint probability model. We show with simulations and miRNA-mRNA glioblastoma data from the Cancer Genome Atlas that the Discordant method performs better in predicting related feature pairs. In simulations we demonstrate that while all of the methods have similar specificity, the Discordant method has better sensitivity and is better at identifying pairs that have a correlation coefficient close to 0 in one group and a largely positive or negative correlation coefficient in the other group. Using the glioblastoma data, which has matched samples between miRNA and mRNA, we find that the Discordant method identifies relatively more glioblastoma-related miRNAs than the other methods. We conclude from the results in both simulations and glioblastoma data that the Discordant method is more appropriate for identifying molecular feature interactions unique to disease.
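For orientation, the classical frequentist baseline mentioned above can be sketched in a few lines. This is the standard Fisher z-test for a difference between two independent correlation coefficients, not the Discordant method itself. The function name and the example numbers are hypothetical.

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """Two-sided test that correlations r1 and r2 differ between two groups.

    r1, r2 are Pearson correlations of the same feature pair in each group;
    n1, n2 are the group sample sizes (each must exceed 3).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)       # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal p-value
    return z, p

# Hypothetical pair: uncorrelated in controls, strongly correlated in cases.
z_stat, p_value = fisher_z_difference(0.0, 50, 0.8, 50)
```

A pair like the one above, with a correlation near 0 in one group and a large correlation in the other, is the kind of case the abstract reports as hard for the competing methods to detect.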

............................................................................................................................
SB P05: The effect of context-specificity on predicting mechanism of action

Yishai Shimoni1, Mukesh Bansal1, Jung Hoon Woo1, Paola Nicoletti1, Charles Karan1, Andrea Califano1

1Columbia University

We recently developed an algorithm (DeMAND) to elucidate the mechanism of action of a compound using a regulatory network and gene expression data collected after compound perturbation. An important question is how the cell-type specificity of the network affects the quality of the predictions. Here we analyze the effect of using various regulatory networks on the same data and benchmark their influence on identifying genes that are known to be involved in the mechanism of action of the tested compounds. We use context-specific networks from various platforms, context-independent networks (from the STRING database), and networks that are specific to other cell types. Our results show that context specificity is essential to achieve high performance.

............................................................................................................................
SB P06: Autophagy Regulatory Network – a general resource and its application to analyze bacterial autophagy modulation

Dénes Türei1,2,*, László Földvári-Nagy1,*, Leila Gul1,*, Dávid Fazekas1, Dezső Módos1,2,3, János Kubisch1, Tamás Kadlecsik1, Amanda Demeter1, Katalin Lenti1,3, Péter Csermely2, Tibor Vellai1, Tamás Korcsmáros1,2,4,5,§

* These authors contributed equally to this work.
1 Department of Genetics, Eötvös Loránd University, Pázmány P. s. 1C, H-1117, Budapest, Hungary
2 Department of Medical Chemistry, Semmelweis University, PO Box 260, H-1444, Budapest, Hungary
3 Department of Morphology and Physiology, Faculty of Health Sciences, Semmelweis University, Vas u. 17, H-1088, Budapest, Hungary
4 TGAC, The Genome Analysis Centre, Norwich Research Park, Norwich, UK
5 Gut Health and Food Safety Programme, Institute of Food Research, Norwich Research Park, Norwich, UK
§ corresponding author

Autophagy is a complex cellular process with multiple roles that depend on tissue and on physiological or pathological conditions. The major post-translational regulators of autophagy are well known; however, they have not yet been collected comprehensively. The precise and context-dependent regulation of autophagy necessitates additional regulators, including transcriptional and post-transcriptional components that are listed in various datasets. Prompted by the lack of systems-level autophagy-related information, we manually collected the literature and integrated external resources to build a high-coverage autophagy database. We developed an online resource, Autophagy Regulatory Network (ARN; http://autophagy-regulation.org), to provide an integrated and systems-level database for autophagy research. ARN contains manually curated, imported and predicted interactions of autophagy components (1,485 proteins with 4,013 interactions) in humans. We listed 413 transcription factors and 386 miRNAs that could regulate autophagy components or their protein regulators. We also connected the above-mentioned autophagy components and regulators with signaling pathways from the SignaLink 2 resource. The user-friendly website of ARN allows researchers without a computational background to search, browse and download the database. The database can be downloaded in SQL, CSV, BioPAX, SBML, PSI-MI and Cytoscape CYS file formats. ARN has the potential to facilitate the experimental validation of novel autophagy components and regulators. In addition, ARN helps the investigation of transcription factors, miRNAs and signaling pathways implicated in the control of the autophagic pathway. The list of such known and predicted regulators could be important in pharmacological attempts against cancer and neurodegenerative diseases. (Türei et al., Autophagy, in press)

Autophagy is also known to be important for intestinal homeostasis, and its malfunction is related to inflammatory bowel disease (IBD) and cancer. Conversely, autophagy is often manipulated by intestinal pathogenic bacteria, such as Salmonella. A better understanding of the effect of certain bacterial species on the regulation of human intestinal autophagy could help us propose IBD and colon cancer prognosis markers. Accordingly, we recently examined the potential autophagy-regulating functions of 62 protein-protein interactions detected between Salmonella and human cells. We found that at least three of these interactions could have autophagy-regulating functions (AvrA-p53; AvrA-Beta catenin; sifA-RAB7A). We also predicted 144 domain-domain based interactions between Salmonella proteins and 106 human proteins involved in autophagy or its regulation (e.g., the autophagy-inducing ULK1 and the selective autophagy component ATG16L1). Our domain-motif based prediction found that the Salmonella E3 ubiquitin-protein ligase SlrP could bind to the upstream autophagy regulator PI3K. This interaction is in agreement with the function of SlrP. Thus, these preliminary interaction predictions already show the power of computational biology methods to generate a pool of potential candidates that are responsible for bacterial autophagy modulation.

Keywords: autophagy, regulation, network, protein-protein interactions, transcription factors, signaling pathway, Salmonella, inflammatory bowel disease

............................................................................................................................
SB P07: Cell fate inclination within 2-cell and 4-cell mouse embryos revealed by single-cell RNA sequencing

Fernando Biase1, Xiaoyi Cao1, Sheng Zhong1

1University of California, San Diego

It remains an open question when and how the first cell fate decision is made in mammals. Using deep single-cell RNA-seq of matched sister blastomeres, we report highly reproducible inter-blastomere differences among ten 2-cell and five 4-cell mouse embryos. Inter-blastomere gene expression differences dominated between-embryo differences and noise, and were sufficient to cluster sister blastomeres into distinct groups. Dozens of protein-coding genes exhibited reproducible bimodal expression in sister blastomeres, which cannot be explained by random fluctuations. Of four bimodal genes tested at the protein level, one, Gadd45a, exhibited clear inter-blastomere contrasts. We traced some of the bimodal mRNA expression patterns to embryonic genome activation, and others to blastomere-specific RNA depletion. Inter-blastomere differences created co-expression gene networks that were much stronger and larger than those that could be created by random noise. The highly correlated gene pairs at the 4-cell stage overlapped with those showing the same directions of differential expression between inner cell mass (ICM) and trophectoderm (TE). These data substantiate the hypothesis of inter-blastomere differences in 2- and 4-cell mouse embryos, and associate these differences with ICM/TE differences.

............................................................................................................................
SB P08: Inferring disease mechanisms from multiple gene expression datasets

Sahar Ansari1, Michele Donato1, Sorin Draghici1

1Wayne State University

The ultimate goal of any biological experiment is to understand the underlying phenomenon of the investigated condition. Understanding the mechanisms that cause changes in a phenotype requires the identification of the genes that are disrupted in that phenotype, and the relationships between them.

The networks that explain the interactions between genes can be used to 1) predict the disease or the responses of the system to a specific impact (e.g. drugs), and 2) find the subset of genes that interact with each other and have an important involvement in the condition of interest.

Current technologies allow us to measure gene expression with unprecedented accuracy. The interactions between genes can imply an indirect relation between them via their protein product(s) or their transcription factor(s). Another source of information that can help in the understanding of gene-gene interaction is the physical interactions between genes such as protein-protein interactions (PPI) and/or protein-DNA interaction (PDI) networks.

The currently available methods fail to discover the condition-specific relationships between genes with high accuracy. Many existing methods find gene regulatory networks without focusing on one specific phenotype. These networks are not precise, because genes interact with each other differently in different conditions. As an example there are more than 300 interactions in the KEGG pathway database that exist only in phenotype-specific pathways (e.g. Alzheimer’s disease, colorectal cancer, etc.) and do not exist in others.

In this work, we use multiple gene expression data sets to find the effect of the genes on the ones downstream. In the snapshot data the expression of the genes is measured at one time point; therefore, the effect of one gene on the others may not be captured in only one dataset.

We use the union of the differentially expressed (DE) genes from each dataset as a unique list of DE genes. We build a “neighbor” network for each of these genes, comprising the gene, all genes immediately downstream of it in the PPI network, and the edges connecting them. In the next step, we calculate the enrichment of each neighbor network based on the number of DE genes it contains. In the last step, we construct a unique network by joining all significant neighbor networks that are connected to each other. This overall network can span different existing pathways or can be a subgraph of one. This network represents the proposed putative mechanism: it is consistent with all measured expression changes and all known PPIs, and is unlikely to be impacted to the level observed just by chance. We applied this approach to three datasets that come from experiments studying type II diabetes. We assessed the result by comparing the constructed network with the pathways that are associated with diabetes in the KEGG database. We rank the pathways based on their enrichment compared to the resulting mechanism. Pathways such as the TGF-beta signaling pathway (enrichment p-value = 6.74e-40) and the pancreatic secretion pathway (enrichment p-value = 4.78e-06), which are associated with type II diabetes, are highly ranked among other pathways. We also performed Gene Ontology (GO) analysis to find the biological processes that are enriched in the resulting mechanism. The results show that the proposed mechanism includes genes known to have important functions in type II diabetes. Also, many of the interactions included in the putative mechanism are present in pathways that are known to be associated with type II diabetes.
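The enrichment step can be sketched as follows. The abstract does not state which test is used, so the hypergeometric upper tail below is an assumption chosen because it is a common way to score over-representation of DE genes in a gene set. The function name and the example counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(total_genes, de_genes, network_size, de_in_network):
    """P(X >= de_in_network) when network_size genes are drawn at random.

    total_genes:   number of genes in the background (e.g. the PPI network)
    de_genes:      number of DE genes in the background
    network_size:  number of genes in one neighbor network
    de_in_network: number of DE genes observed in that neighbor network
    """
    upper = min(de_genes, network_size)
    favorable = sum(
        comb(de_genes, k) * comb(total_genes - de_genes, network_size - k)
        for k in range(de_in_network, upper + 1)
    )
    return favorable / comb(total_genes, network_size)

# Hypothetical neighbor network: 4 genes, all DE, in a background of
# 10 genes of which 5 are DE.
p_enrichment = hypergeom_upper_tail(10, 5, 4, 4)
```

Neighbor networks whose p-value falls below a chosen threshold would then be the "significant" ones that are joined into the overall network.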

............................................................................................................................
SB P09: DSPathNet: a novel computational framework to decipher drug signaling pathway networks for understanding drug action

Jingchun Sun1, Min Zhao2, Peilin Jia3, Lily Wang3, Yonghui Wu1, Carissa Iverson3, Yubo Zhou4, Erica Bowton3, Dan Roden3, Joshua Denny3, Melinda Aldrich3, Hua Xu1, Zhongming Zhao3

1University of Texas Health Science Center at Houston, 2Vanderbilt University School of Medicine, 3Vanderbilt University, 4Chinese Academy of Sciences

A drug performs its function via a cascade to transfer chemical signals from drug binding proteins to signal recipient transcription factors (TFs). The cascade is complicated, involving multiple signaling pathways acting in the network mode. Reconstruction of signaling pathway networks is vital for the identification of drug targets and off-targets, which in turn facilitates our understanding of the mode of drug action and drug development. However, it is challenging to abstract multiple signaling pathways involved in the drug action into one system.

To address this challenge, we developed a novel computational framework, a Drug-specific Signaling Pathway Network method (DSPathNet), for constructing a signaling pathway network (SPNetwork) for an individual drug of interest. The SPNetwork is expected to include genes that harbor genetic variations contributing to the pathology of the drug indication or drug response. We illustrated the utility of DSPathNet using metformin, which is one of the most widely prescribed anti-diabetic drugs in the world and has recently been shown to be useful for cancer treatment and prevention in people at higher risk. Given the available data and the nature of signal transduction cascades, we compiled 65 metformin upstream genes and 66 metformin downstream genes. Then, by overlaying them onto the human SPNetwork, we compiled and applied random walk algorithms through longitudinal and lateral movements, generating one metformin-specific SPNetwork with 477 nodes and 1,366 edges. By examining the disease genes and genotyping data of multiple GWAS datasets in the network, we found that the metformin network was significantly enriched with disease genes for both T2D and cancer, and that the network also included genes that may be associated with metformin-associated cancer survival. Furthermore, from the metformin SPNetwork and the genes common to T2D and cancer, we generated a subnetwork to highlight molecular crosstalk between T2D and cancer. The follow-up network analyses and literature mining revealed that seven genes (CDKN1A, ESR1, MAX, MYC, PPARGC1A, SP1, and STK11) and one novel MYC-centered pathway with CDKN1A, SP1, and STK11 may play important roles in metformin’s antidiabetic and anticancer effects.
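The abstract does not spell out the random walk variant, so the following is only a generic random walk with restart on an undirected toy graph, included to show how seed genes can be used to score the rest of a network. The graph, node names, and parameter values are hypothetical and do not reproduce the longitudinal and lateral movements of DSPathNet.

```python
def random_walk_with_restart(neighbors, seeds, restart=0.5,
                             tol=1e-12, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    neighbors: dict mapping each node to a list of adjacent nodes
               (undirected, every node has at least one neighbor)
    seeds:     set of start nodes, e.g. known drug-related genes
    """
    nodes = list(neighbors)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for v in nodes:
            # Probability flowing into v from each of its neighbors.
            inflow = sum(p[u] / len(neighbors[u]) for u in neighbors[v])
            nxt[v] = (1.0 - restart) * inflow + restart * p0[v]
        change = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if change < tol:
            break
    return p

# Hypothetical chain A - B - C - D with a single seed at A.
CHAIN = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(CHAIN, {"A"})
```

Nodes with high scores are close to the seeds in the network; a threshold on the score is one simple way to cut out a seed-specific subnetwork.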

In this study, we showed that 1) DSPathNet is a novel approach for constructing drug-specific signal transduction networks; 2) the metformin-specific SPNetwork provides insights into the molecular mode of action of metformin; and 3) the study serves as a model for exploring signaling pathways to facilitate understanding of drug action, disease pathogenesis, and identification of drug targets.

............................................................................................................................
SB P10: Learning nucleotide groups from RNA structure-probing data

Xihao Hu1, Kevin Yip1

1The Chinese University of Hong Kong

Motivation: Genome-wide RNA structure probing has become popular in recent years, especially in the study of RNA structures in vivo. High-throughput methods that detect modifications on unpaired adenines and cytosines by dimethyl sulphate (DMS) treatment are able to distinguish exposed, unpaired nucleotides from others in either in vitro or in vivo systems. Previously, we developed a mixture of Poisson linear models to fit the raw read counts from high-throughput RNA structure probing data based on structure-specific enzymatic cuts. We have shown that the hidden states learned by the model can be combined with other sequence features to predict protein binding sites on RNAs.

Results: We have applied our previously developed methods to new DMS data. We compared the modeling results from data obtained under three different conditions, namely in vivo, in vitro, and denatured. We found that our method provided the largest improvement in modeling accuracy over simpler models when applied to in vivo data. The results suggest that DMS data from in vivo systems are closer to a mixture of states, which we hypothesize to be a state for exposed unpaired nucleotides and a state for all other nucleotides. By transforming raw read counts into probabilities of the two states, our method makes the sequencing data easier to interpret and provides useful features for downstream analyses. We have made our model and software publicly available, and hope it can help extract important features of various types of RNA structure probing data.
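To make the idea of turning raw read counts into state probabilities concrete, here is a minimal sketch of EM for a plain two-component Poisson mixture. It is a simplification chosen for illustration: the authors' model is a mixture of Poisson linear models, which this sketch does not implement. The counts and all names are hypothetical.

```python
import math

def poisson_logpmf(k, lam):
    """Log of the Poisson probability of count k with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def fit_two_poisson(counts, n_iter=100):
    """Fit a two-state Poisson mixture by EM.

    Returns (rates, weights, responsibilities), where responsibilities[i][j]
    is the probability that counts[i] came from state j, taken from the
    final E-step.
    """
    lam = [max(min(counts), 0.5), max(max(counts), 1.0)]  # low, high state
    weight = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each state for each count.
        resp = []
        for k in counts:
            logs = [math.log(weight[j]) + poisson_logpmf(k, lam[j])
                    for j in (0, 1)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([q / total for q in probs])
        # M-step: update mixing weights and Poisson rates.
        for j in (0, 1):
            mass = sum(r[j] for r in resp)
            weight[j] = max(mass / len(counts), 1e-12)
            lam[j] = max(sum(r[j] * k for r, k in zip(resp, counts))
                         / max(mass, 1e-12), 1e-9)
    return lam, weight, resp

# Hypothetical read counts: mostly background with a few strong signals.
COUNTS = [0, 1, 0, 2, 1, 0, 1, 1, 20, 22, 19, 25, 21, 18]
rates, weights, responsibilities = fit_two_poisson(COUNTS)
```

The responsibilities of the high-rate state are the per-nucleotide probabilities that would be passed on as features to downstream analyses.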

............................................................................................................................
SB P11: Expression amplitude based drug repositioning

Mario Failli1, Vincenzo Belcastro1

1Telethon Institute of Genetics and Medicine (TIGEM)

Background: Traditionally, most drugs have been discovered using phenotypic or target-based screens, but their indications are often expanded on the basis of clinical observations, providing additional benefit to patients. For that reason and in response to the high cost and risk in traditional de novo drug discovery, discovering potential uses for existing drugs, also known as drug repositioning, has attracted increasing interest from both the pharmaceutical industry and the research community (Hurle et al., 2013).

Recent research has shown that computational approaches have the potential to offer systematic insights into the complex relationships among drugs, targets, and diseases for successful repositioning. In particular, targeted mechanism-based drug-repositioning methods integrate treatment omics data to delineate the unknown mechanisms of action of drugs. Such research led to the development of computational approaches to predict drug mode of action (MoA) and drug repositioning from the analysis of the Connectivity MAP (CMAP) (Lamb J. et al., 2006), a compendium of gene expression profiles (GEPs) following drug treatment of 5 human cell lines with 1309 bioactive small molecules.

Iorio et al. (2009) first proposed to construct a ‘Prototype List’ of the drug by merging its experiments from cell lines, batches, concentrations, and microarray platforms. A following study by Iskar et al. (2010) overcame the batch effect (Lander E.S., 1999) by implementing a novel protocol with filtering and normalization steps, applicable to gene expression upon heterogeneous drug treatments. However, both methods suffer from two main limitations: lack of confidence levels over the ranked lists of genes, and use of fixed-length prototype ranked lists to compute drug-drug associations.

Objective: Development of a procedure to associate to each drug the full list of differentially expressed genes, following treatment across multiple cell lines or different dosages, while keeping information on the ratio between treatment and control (fold change) and its statistical significance (p-value). Both fold changes and p-values are then combined to compute drug-drug associations over the full list of genes.

Material/Methods: The approach requires two normalization steps: first, a Robust Multi-array Averaging (RMA) algorithm (Gautier et al., 2004) normalizes between treated and untreated samples of a single experiment. Second, the ‘quantile’ normalization (Yang and Thorne, 2003) forces the RMA normalized arrays, belonging to the same drug, to have identical empirical distribution in order to avoid batch effects. A linear model of treated conditions versus controls is then fitted, and the significance level of each differentially expressed gene computed. Probesets are then ranked based on a combination of both fold change and p-value (Martin et al., 2012), and pairwise distances between drugs computed via Spearman rank correlation. A similar approach was applied to derive pairwise drug correlations from Iorio and Iskar’s ranked lists for comparison purposes.
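The final two steps, ranking genes by a combination of fold change and p-value and comparing drugs by Spearman rank correlation, can be sketched as below. The combined score used here (log fold change multiplied by -log10 of the p-value) is a hypothetical stand-in, since the abstract only cites Martin et al. (2012) for the actual combination. The example values and all names are likewise made up.

```python
import math

def combined_score(log_fold_change, p_value):
    """Hypothetical score: large for strong, significant changes."""
    return log_fold_change * -math.log10(p_value)

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (log fold change, p-value) pairs for four genes, two drugs.
drug_a = [(2.0, 0.001), (-1.5, 0.01), (0.2, 0.5), (1.0, 0.05)]
drug_b = [(1.8, 0.002), (-1.0, 0.02), (0.1, 0.7), (0.5, 0.2)]
scores_a = [combined_score(fc, p) for fc, p in drug_a]
scores_b = [combined_score(fc, p) for fc, p in drug_b]
drug_distance = 1.0 - spearman(scores_a, scores_b)
```

Because the correlation is computed over the full list of scored genes, no fixed-length signature has to be chosen.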

Results and Discussion: We collected the Anatomical Therapeutic Chemical (ATC) codes (Pahor et al., 1994) for 469 of 511 drugs and determined their pairwise distances using string similarity criteria. We next compared the three methods by computing the Kolmogorov-Smirnov (KS) statistic to check whether drugs sharing high Spearman correlations tend to be close in terms of ATC distance. Although all three methods performed better than random, our method yielded the highest significance (with peaks of >10 orders of magnitude in KS significance levels) over a wide range of correlation intervals. Hence the method is a valid resource for drug repositioning; in addition, the presence of significance levels over lists of genes eases integration with other bioinformatics resources (e.g., coexpression gene networks) to improve predictions.

............................................................................................................................
SB P12: A systems biology approach highlights the role of GSK3-beta in the regulation of PDX1 by IL1-beta in pancreatic beta cell

Jisha Vijayan1, Yogeshwari Sivakumaran1, Rajagopal Rangarajan1, Mahesh Verma1, Anup Oommen1, Krishnamurthy Sheshadri1

1Connexios Life Sciences

A network model describing the response of proliferation (PDX1) to inflammation (IL1-beta) in beta cells is described. The model is extracted using an automated path-tracing algorithm from a large beta cell network based on integration of extensive literature. A mathematical model based on mass action kinetics is formulated for this network model. A steady-state simulation of the mathematical model shows that PDX1 decreases as IL1-beta is increased, saturating at high IL1-beta levels. Measurements on a mouse pancreatic beta cell line (NIT-1) confirmed this behavior. Further simulations showed that under conditions of GSK3-beta inhibition, the response of PDX1 to IL1-beta reverses to increasing behavior, again saturating at high IL1-beta levels; further, the PDX1 levels were lower in this case. The saturation levels in both of these cases are comparable. Again, this behavior was confirmed by measurements. This study highlights the role of GSK3-beta in the switching of the PDX1 response to IL1-beta.

............................................................................................................................
SB P13: Elucidating complex phenotypes based on high-throughput expression and biological annotation data

Nitesh Singh1, Mathias Ernst1, Volkmar Liebscher2, Georg Fuellen1, Leila Taher1

1University of Rostock, 2Ernst Moritz Arndt University of Greifswald

The interpretation of large gene expression datasets describing complex phenotypes is obscured by the fact that such phenotypes are likely to be controlled by the concerted action of multiple genes. Combining gene expression analysis with annotation regarding the functions, processes, and pathways in which the genes are involved has the potential to elucidate the biological interactions underlying a given phenotype. Here, we present an approach that integrates gene expression and biological annotation data to identify biological units and their interactions that influence a phenotype of interest. First, we divide genes with similar biological annotation into clusters. Second, we group the genes within each cluster into sub-clusters, based on their expression profiles. Finally, we construct a co-expression network of sub-clusters to analyze the interactions between the biological units represented by these sub-clusters. We applied our approach to two microarray expression datasets describing the differentiation of mouse embryonic stem cells into embryoid bodies, and mouse liver development and regeneration, respectively. For the first dataset, our findings confirm that developmental processes and apoptosis have a key role in cell differentiation. Furthermore, we suggest that processes related to pluripotency and lineage commitment, which are known to be critical for development, interact mainly indirectly, through genes implicated in more general biological processes. For the second dataset, we concluded that the transcriptional mechanisms beneath liver regeneration are fundamentally different from those regulating embryonic liver development. Moreover, we provide evidence that supports the relevance of cell organization in the developing liver for proper liver function. 
Understanding how genes involved in specific biological functions, processes, and/or pathways interact depending on particular experimental conditions is crucial to decipher the molecular basis of health, disease, and drug response. This study provides a new approach to examining gene expression data that can be easily extended to other high-throughput expression data.
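
The three steps described above (annotation clusters, expression-based sub-clusters, and a co-expression network of sub-clusters) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the one-term-per-gene annotation, the greedy correlation-based grouping, the use of mean profiles, the 0.8 threshold, and all function names are assumptions made for the example.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def annotation_clusters(annotations):
    """Step 1: group genes by annotation term (assumption: one term per gene)."""
    clusters = defaultdict(list)
    for gene, term in annotations.items():
        clusters[term].append(gene)
    return dict(clusters)


def expression_subclusters(genes, expression, threshold=0.8):
    """Step 2: greedy grouping; a gene joins the first sub-cluster whose
    seed gene it correlates with at or above the threshold."""
    subclusters = []
    for gene in genes:
        for sub in subclusters:
            if pearson(expression[gene], expression[sub[0]]) >= threshold:
                sub.append(gene)
                break
        else:
            subclusters.append([gene])
    return subclusters


def mean_profile(genes, expression):
    """Average expression profile of a group of genes."""
    return [sum(vals) / len(genes) for vals in zip(*(expression[g] for g in genes))]


def subcluster_network(annotations, expression, threshold=0.8):
    """Step 3: link sub-clusters whose mean profiles are strongly (anti)correlated."""
    nodes = {}
    for term, genes in annotation_clusters(annotations).items():
        for i, sub in enumerate(expression_subclusters(genes, expression, threshold)):
            nodes[f"{term}/{i}"] = mean_profile(sub, expression)
    names = sorted(nodes)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(nodes[a], nodes[b])
            if abs(r) >= threshold:
                edges.append((a, b, round(r, 3)))
    return nodes, edges
```

In practice, annotation similarity and expression clustering would rely on established measures and methods; the sketch only shows how the three stages feed into each other.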

............................................................................................................................
SB P14: Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction

Hilary Parker1, Jeffrey Leek1, Alexander Favorov1, Xiaoxin Xia2, Sameer Chavan1, Christine Chung1, Elana Fertig1

1Johns Hopkins University, 2Rutgers University

Sample source, procurement process, and other technical variations introduce batch effects into genomics data. Algorithms to remove these artifacts enhance differences between known biological covariates, but they also risk removing intragroup biological heterogeneity and thus any personalized genomic signatures. As a result, accurate identification of novel subtypes from batch-corrected genomics data is challenging using standard algorithms designed to remove batch effects for class comparison analyses. Nor can batch effects be corrected reliably in future applications of genomics-based clinical tests, in which the biological groups are by definition unknown a priori. Therefore, we assess the extent to which various batch correction algorithms remove true biological heterogeneity. We also introduce an algorithm, permuted-SVA (pSVA), which uses a new statistical model that is blind to biological covariates to correct for technical artifacts while retaining biological heterogeneity in genomic data. This algorithm facilitated accurate subtype identification in head and neck cancer from gene expression data in both formalin-fixed and frozen samples. When applied to predict human papillomavirus (HPV) status, pSVA improved cross-study validation even when the sample batches were highly confounded with HPV status in the training set.

............................................................................................................................
SB P15: Experimental design for regulatory network discrimination

Jukka Intosalmi1, Henrik Mannerstrom1, Harri Lähdesmäki1,2

1Aalto University, 2Turku Centre for Biotechnology, University of Turku and Åbo Akademi

Biochemical systems such as regulatory networks can in many cases be modeled using ordinary differential equations (ODEs). The construction of ODE models is typically based on some initial information on interactions between different components and known patterns of stationary or temporal behavior. Typically, neither the interactions nor the parameter values defining the interaction strengths are fully known and have to be inferred from available or forthcoming data. During the past decades, numerous statistical methods for parameter inference and model selection have been developed to carry out this challenging task.

While both parameter inference and model selection for regulatory networks have been studied extensively, the development of experimental design methods that can be used to predict the most useful and efficient experiment for model selection has received much less attention.

In this work, we present a novel procedure to design optimal experiments for model selection of regulatory network models. Our approach relies on a utility-based Bayesian framework that enables the efficient use of prior information not only on the parameter values but also over the dynamics of model responses. We exemplify the usefulness of our approach by constructing an optimal experimental design (of measurement time-point selection) for a network inference problem, in the presence of very scarce initial data. In addition, we discuss an efficient numerical implementation of our method and outline promising future applications.

............................................................................................................................
SB P16: Bioconductor's EnrichmentBrowser: Seamless navigation through combined results of set-based and network-based enrichment analysis

Ludwig Geistlinger1, Gergely Csaba1, Ralf Zimmer1

1Ludwig-Maximilians-Universität Munich

Background: Enrichment analysis of gene expression data is essential to find functional groups of genes whose interplay can explain experimental observations. Numerous methods have been published that either ignore (set-based) or incorporate (network-based) known interactions between genes. However, the often subtle benefits and disadvantages of the individual methods are confusing for most biological end users, and there is currently no convenient way to combine methods for an enhanced result interpretation.

Results: We present the EnrichmentBrowser package as an easily applicable software that enables (1) the application of the most frequently used set-based and network-based enrichment methods, (2) their straightforward combination, and (3) a detailed and interactive visualization and exploration of the results. The package is available from the Bioconductor repository and implements additional support for standardized expression data preprocessing, differential expression analysis, and definition of suitable input gene sets and networks.

Conclusion: The EnrichmentBrowser package implements essential functionality for the enrichment analysis of gene expression data. It combines the advantages of set-based and network-based enrichment analysis in order to derive high-confidence gene sets and biological pathways that are differentially regulated in the expression data under investigation. Besides, the package facilitates the visualization and exploration of such sets and pathways.

............................................................................................................................
SB P17: CoRegNet: reconstruction and interrogation of co-regulatory network

Rémy Nicolle1, François Radvanyi2, Mohamed Elati1

1Institute of Systems & Synthetic Biology, Université d'Evry, 2Institut Curie

Recent advances in large-scale transcriptomics have enabled the profiling of hundreds to thousands of tumor samples by large consortia such as The Cancer Genome Atlas (TCGA: http://cancergenome.nih.gov) or the International Cancer Genome Consortium (www.icgc.org). While this amount of data holds great promise for our understanding of tumor progression, new and efficient methodologies are needed to analyze it and to extract valuable knowledge from it.

We present CoRegNet, an R/Bioconductor package (under revision) that implements a set of tools to infer and analyze co-regulatory networks from gene expression data. The functions implemented in the package are based on previously validated and published studies. The first step of the proposed workflow is based on H-LICORN (1,2), a hybrid network inference method that uses both a discretized and a continuous version of the data to infer the set of cooperative regulators of genes. In order to improve the predicted network, the package implements methods derived from the modENCODE project (3) to refine the predictions by integrating additional regulatory evidence from data such as transcription factor binding sites, ChIP-seq, or ChIP-on-chip. The second step of the workflow aims at identifying, based on transcriptomic data, the combination of active transcription factors in a given sample. Based on our studies of feature extraction in gene expression (4), the proposed method uses the structure of the network to estimate the activity of a given transcription factor in a given sample through the expression of its target genes. This transformation of the data results in a new dataset representing the whole transcriptome of every sample by the activity of transcription factors. Additional implemented functions are derived from MARINa (5) (MAster Regulator Inference algorithm) to identify the most specific regulator of a given set of target genes of interest. Finally, an embedded Shiny application eases the analysis of the co-regulatory network, containing links between cooperative regulators, through an interactive display of the network using a Cytoscape applet (JavaScript, not Flash-based) integrating gene expression, transcription factor activity, and other features such as mutations or copy number alteration.
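
To make the idea of estimating transcription factor activity from target gene expression concrete, here is a minimal Python sketch. It is not the package's actual measure or API: the scoring rule (mean expression of activated targets minus mean expression of repressed targets), the function name `tf_activity`, and the input layout are all assumptions chosen for illustration.

```python
from statistics import mean


def tf_activity(network, expression, n_samples):
    """Per-sample activity score for each transcription factor (TF).

    network:    {tf: {"activated": [genes], "repressed": [genes]}}  (hypothetical layout)
    expression: {gene: [value for each sample]}
    Score (assumed for illustration): mean expression of activated targets
    minus mean expression of repressed targets in that sample.
    """
    activity = {}
    for tf, targets in network.items():
        # Keep only targets that were actually measured.
        activated = [g for g in targets.get("activated", []) if g in expression]
        repressed = [g for g in targets.get("repressed", []) if g in expression]
        scores = []
        for s in range(n_samples):
            up = mean(expression[g][s] for g in activated) if activated else 0.0
            down = mean(expression[g][s] for g in repressed) if repressed else 0.0
            scores.append(up - down)
        activity[tf] = scores
    return activity
```

The result is a TF-by-sample table, which corresponds to the transformed dataset mentioned above in which each sample is described by transcription factor activities rather than by individual gene expression values.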

References
1. I. Chebil, R. Nicolle, G. Santini, C. Rouveirol and M. Elati. Hybrid method inference for the construction of cooperative regulatory network in human, IEEE transactions on nanobioscience, 13: 97 - 103, 2014.
2. Elati M, Neuvial P, Bolotin-Fukuhara M, Barillot E, Radvanyi F. and Rouveirol C. LICORN: learning co-operative regulation networks from expression data. Bioinformatics, 23:2407-2414, 2007.
3. Marbach D, Roy S, Ay F, Meyer PE, Candeias R, Kahveci T, Bristow CA & Kellis M (2012) Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks. Genome Research 22: 1334–1349.
4. Nicolle R, Elati M & Radvanyi F (2012) Network Transformation of Gene Expression for Feature Extraction. Machine Learning and Applications (ICMLA’11), IEEE 1:108-113.
5. Lefebvre C, Rajbhandari P, Alvarez MJ, Bandaru P, Lim WK, Sato M, Wang K, Sumazin P, Kustagi M, Bisikirska BC, Basso K, Beltrao P, Krogan N, Gautier J, Dalla-Favera R & Califano A (2010) A human B-cell interactome identifies MYB and FOXM1 as master regulators of proliferation in germinal centers. Molecular Systems Biology 6: 1–10.

............................................................................................................................
SB P18: BioTapestry: new Version 7 features improve scalability and flexibility

Suzanne Paquette1, Kalle Leinonen1, William Longabaugh1

1Institute for Systems Biology

BioTapestry is a well-established tool for building, visualizing, and sharing models of gene regulatory networks (GRNs), with particular emphasis on the GRNs that drive development. It uses a hierarchy of models to present multiple views of the network at different levels of spatial and temporal resolution, and uses a visual representation that is tailored to the presentation of GRNs. Given their complexity, it is important to provide online interactive tools that can be used to explore a GRN model, and the existing Java-based BioTapestry Viewer has been used, for example, to provide an interactive online version of the sea urchin endomesoderm network since 2003.

However, current web-browser technologies such as HTML5 Canvas make it possible to provide an interactive graphical network model directly in a web browser without needing Java; we have now created a version of the BioTapestry Viewer using these technologies. At the same time, BioTapestry's new dual-mode software architecture continues to support the traditional Java-based BioTapestry Editor desktop application. This new feature is part of the new BioTapestry Version 7, which is scheduled for release in the autumn of 2014.

Version 7 also continues to build upon the helpful automatic network layout tools that were introduced in the current Version 6. In particular, we have improved the layout performance, and have also applied lessons learned from developing the new companion BioFabric network visualization tool to enhance BioTapestry’s presentation of large directed networks. These performance improvements are particularly noteworthy, since BioTapestry's "circuit trace" presentation style for directed links is highly scalable, and thus the user can automatically create rational, understandable, and highly organized presentations of large directed networks containing thousands of nodes.

............................................................................................................................
SB P19: Network Infusion to infer information sources in networks

Soheil Feizi1, Ken Duffy2, Muriel Medard1, Manolis Kellis1

1Massachusetts Institute of Technology, 2Hamilton Institute

Several models exist for diffusion of signals across biological, social, or engineered networks. However, the inverse problem of identifying the source of such propagated information seems on the surface intractable, even in the presence of multiple network snapshots, and especially for the single-snapshot case, given the many overlapping paths in real-world networks. Mathematically, this problem can be undertaken using a diffusion kernel that represents diffusion processes in a given network, but computing this kernel is generally intractable.

Here, we introduce a modified diffusion kernel that relaxes the path-coupling constraints by only considering k independent shortest paths among pairs of nodes, assuming an exponential time distribution for node-to-node spreading. We use the resulting Erlang network diffusion kernel to solve the inverse diffusion problem using both likelihood maximization and error minimization. We apply this framework for both single-source and multi-source diffusion, for both single-snapshot and multi-snapshot observations, and using both uninformative and informative prior probabilities for candidate source nodes.
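
As a rough illustration of the kernel idea: if each node-to-node transmission time is exponential with rate λ, the time to traverse a path of l edges is Erlang-distributed with shape l, so the probability that a node at shortest-path distance l is reached by time t is the Erlang CDF. The Python sketch below covers only the simplest setting and is not the authors' algorithm: it assumes a single source, a single snapshot, one shortest path per node pair (k = 1), independent nodes, a uniform prior, and a known rate and observation time. All function names are hypothetical.

```python
from collections import deque
from math import exp, factorial, log


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def erlang_cdf(shape, rate, t):
    """P(sum of `shape` i.i.d. Exp(rate) waiting times <= t)."""
    if shape == 0:
        return 1.0
    x = rate * t
    return 1.0 - sum(exp(-x) * x ** n / factorial(n) for n in range(shape))


def log_likelihood(adj, source, infected, rate, t, eps=1e-12):
    """Log-likelihood of the snapshot, treating nodes as independent (assumption)."""
    dist = bfs_distances(adj, source)
    total = 0.0
    for node in adj:
        p = erlang_cdf(dist[node], rate, t) if node in dist else 0.0
        p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
        total += log(p) if node in infected else log(1.0 - p)
    return total


def infer_source(adj, infected, rate=1.0, t=1.0):
    """Maximum-likelihood source among the infected nodes."""
    return max(sorted(infected), key=lambda s: log_likelihood(adj, s, infected, rate, t))
```

For example, on a path graph 0-1-2-3-4 with nodes 1, 2, and 3 infected, `infer_source` favors the middle node, since it explains both infected neighbors with short paths and both uninfected end nodes with longer ones.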

We apply Network Infusion (NI) to identify disease-causing genes of several human diseases including T1D, Parkinson’s, MS, SLE, CVD, CAD, psoriasis, and schizophrenia, and show that NI infers candidate disease-causing genes that are biologically relevant and often not distinguishable using the raw p-values. In a second application, we identify the news sources for 3553 stories in the Digg social news network, and validate our results based on annotated information that was not provided to our algorithm. We also apply NI to several synthetic networks and compare its performance to centrality-based and distance-based methods for Erdos-Renyi graphs, power-law networks, symmetric grids, and asymmetric grids.

We also provide proofs that under a standard susceptible-infected (SI) diffusion model, (1) the maximum-likelihood Network Infusion method is mean-field optimal for tree structures or sufficiently sparse Erdos-Renyi graphs, (2) the minimum-error algorithm is mean-field optimal for regular tree structures, and (3) for sufficiently-distant sources, our multi-source solution is mean-field optimal in the regular tree structure.

............................................................................................................................
SB P20: Discovering patterns in leukemia — from local protein networks to global pathway utilization

Chenyue Hu1, Steven Kornblau2, Amina Qutub1

1Rice University, 2MD Anderson Cancer Center

Acute myeloid leukemia (AML) is a notoriously heterogeneous disease. Molecular variations in AML patients, including both genetic mutations and post-translational events, make targeted therapies extremely difficult. Delineating clinically impactful patient subpopulations and characterizing their disease mechanisms will not only improve diagnosis, it will also open a door to discovering new drug targets. However, most AML studies have focused on a handful of genetic mutations, which fail to fully account for the diversity of AML phenotypes and offer limited therapeutic opportunities. In this study, we developed a computational methodology to analyze protein and phosphoprotein states from AML patient samples. We applied our approach to discover new AML patient subpopulations, characterize local protein networks, build global pathway utilization maps, and tie the molecular patterns back to the clinical information and outcome of patients.

Profiles of 231 protein expression levels in 560 AML patient bone marrow samples as well as 21 normal bone marrow samples were obtained using Reverse Phase Protein Array (RPPA). Based on prior knowledge of protein functionality, we first grouped all the proteins into 24 pathways (e.g., hypoxia, adhesion, apoptosis). This categorization allows us to explore patient subpopulations and network patterns in a better-defined biological context. For each pathway, we then clustered patients into groups with distinct protein expression patterns using a combination of prototype clustering (estimating optimal cluster number) and K-means. We observed patterns that are consistent with normal biological mechanisms, as well as patterns that cannot be explained by our current understanding of cell biology. To distinguish disease-specific patterns, we mapped normal samples onto patient samples in reduced dimensions of protein expression levels. Normal samples overlapped significantly with patient clusters in some pathways (e.g., differentiation, cell cycle), while they appeared distinct from almost all patient clusters in other pathways (e.g., heat shock, ribosome). We also identified patient clusters that are prognostic (e.g., adhesion) and clusters that are strongly associated with certain clinical correlates (e.g., cell differentiation). To investigate protein interactions, we built local protein networks combining both interactions from public databases (e.g., KEGG, STRING) and interactions inferred from the data set using glasso. These networks demonstrated a large contrast between what is known in the literature and what is suggested by the new clinical data. In addition, we organized patterns in each pathway into a pathway activation roadmap, which illustrates multiple potential routes cells can hijack to activate or deactivate a pathway.
At the global level, we translated all pathway patterns into a barcode system and were able to identify pathway groups that are co-utilized by different subpopulations.

In sum, results of this proteomic analysis uncovered potential pathways to target therapeutically and to use as biomarkers for groups of leukemia patients. At the same time, it provides a computational strategy that can be applied broadly to clinical omics data. Furthermore, the global analysis reconstructed a molecular utilization map for AML — one that could be a key to unlocking mechanisms of cancer.

............................................................................................................................
SB P21: A visual exploration of biological networks based on edge-centric view and edge centralities

Divya Mistry1, Julie Dickerson1

1Iowa State University

In a biological network, nodes typically represent the biological entities — genes, proteins, metabolites, or other biomolecules — resulting from experimental data, while edges represent a statistical or biophysical relationship between the nodes. A key problem is prioritizing the potential edges between the nodes for focused investigation of biological networks. To enable visual data mining, this work develops a parallel coordinate plot (PCP) based linear network visualization tool, called VisPNet. In a VisPNet diagram, each axis represents edge properties of interest, such as edge weights, edge centrality measures, or a user-defined metric. VisPNet provides the following advantages for visually mining a biological network.

• By organizing a viewport based on the edge distribution, topological features originating from edge properties become prominent. Because biological networks tend to have more edges than nodes, this organization results in an uncluttered and less noisy view of the network.

• VisPNet provides an option to group or highlight edges based on pre-calculated edge metadata such as weights, functional annotations, cellular localization annotations, and properties of adjacent nodes.

• VisPNet allows interactive brushing along the axes to explore subgraph topologies reliant on the metadata chosen for each of the axes.

• VisPNet can be used for dynamic graph exploration and analysis. Individual networks of various time points or experimental conditions may be laid out as the PCP axes. The ordering of nodes (if relevant), edges, or axes can help highlight consistently similar or divergent biological subnetworks. VisPNet also provides an option to choose up to three axes to generate a static Hive Plot [1], which may help highlight topological patterns.

The current implementation of VisPNet is developed using the D3 JavaScript library [2] and runs in all modern browsers with HTML5 capabilities. VisPNet will be available at http://git.io/vispnet in the near future.

References
1. Krzywinski M, Birol I, Jones SJ, Marra MA (2011) Hive plots--rational approach to visualizing networks. Briefings in Bioinformatics: bbr069–. doi:10.1093/bib/bbr069.
2. Bostock M, Ogievetsky V, Heer J (2011) D³: Data-Driven Documents. IEEE Transactions on Visualization and Computer Graphics 17: 2301–2309. doi:10.1109/TVCG.2011.185.

............................................................................................................................
SB P22: A novel model to combine clinical and pathway-based transcriptomic information for the prognosis prediction of breast cancer

Lana Garmire1

1University of Hawaii Cancer Center

With the increasing awareness of heterogeneity in breast cancers, better prediction of breast cancer prognosis is much needed for more personalized treatment and disease management. We have therefore developed a novel computational model for breast cancer prognosis by combining the pathway deregulation score (PDS) based Pathifier algorithm, Cox regression, and L1 (LASSO) penalization. The resulting genomic prognosis model successfully differentiated relapse in the training set (log rank p-value = 6.25e-12) and three testing data sets (log rank p-value < 0.0005), and consistently performed better than gene-based models. Moreover, combining genomic information with clinical information improved the p-values of prognosis prediction by at least three orders of magnitude in comparison to using either genomic or clinical information alone.

............................................................................................................................
SB P24: Verification of biological network models using a collaborative platform

Stephanie Boue1, Anselmo Di Fabio2, Brett Fields3, William Hayes3, Julia Hoeng1, Jennifer Park3, Manuel Peitsch1, Walter Schlage1, Marja Talikka1

1Philip Morris International, 2Applied Dynamic Solutions, 3Selventa

The sbv IMPROVER [Industrial Methodology for PROcess VErification in Research] network verification challenge (NVC) aims to verify and enhance comprehensive biological network models using a web application that facilitates collaboration among the scientific community. These network models are constructed using a structured syntax (Biological Expression Language, BEL) and are supported by scientific evidence at the edge level. We describe here an approach for biological network construction that synergizes manual curation and data-derived components with the process of crowdsourcing. By implementing a reputation-based web application available to the entire scientific community to add biology and vote on supporting scientific evidence, the NVC has created a collaborative crowdsourcing platform to complement peer review of publications describing the networks and to ensure complete and accurate biological networks that can be used as a standard in the field.

A collection of 49 biological network models capture a wide range of biological processes (from cell fate to cell stress, inflammation, and tissue repair) and represent a novel resource to investigate key downstream effectors linking experimental perturbations to specific biological pathways, enabling a predictive analytics pipeline to interpret experimental findings on a mechanistic level. The network models are freely available for the scientific community to download, utilize, and continue to refine for use in toxicological and drug discovery applications.

............................................................................................................................
SB P25: Deconfounding time-series by pseudotemporal estimation

John Reid1, Lorenz Wernisch1

1MRC Biostatistics Unit, Cambridge

Many biological systems of interest evolve over time. Classical examples are the cell cycle and expression patterns in the embryo. Studies of these systems are often cross-sectional in nature. That is, samples taken at distinct time points do not come from the same cells. In many systems, the cells are not synchronized. They progress at different rates and this can confound analyses of such cross-sectional data.

We present a method designed to control for this effect. Our method relies on a probabilistic model which estimates a pseudotime for each sample. On the one hand, the pseudotime is related to the observed sample time; on the other hand, the measured biological variables are softly constrained to vary smoothly, encouraging the model to explore pseudotimes consistent with the data.

Our model can be viewed as a Gaussian process latent variable model where we place a structural prior on the latent pseudotemporal space. In this sense, it can be seen as a generalization of the work of Buettner and Theis. Our model is also related to the work of Trapnell et al., who use a dimensionality reduction method combined with a maximum-likelihood type estimation procedure to estimate pseudotimes.

We give results for our model showing how it successfully reconstructs pseudotimes in synthetic data and an analysis of data from the cell cycle. The method is applicable to data from populations of cells but is particularly relevant to studies in single cells.

............................................................................................................................
SB P26: Differential regulatory mechanisms: An application to large-scale high throughput molecular profiling of schizophrenia brains

Thanneer Malai Perumal1

1Sage Bionetworks

Dysfunctional networks in disease: High-throughput (HTP) molecular profiling has become a ubiquitous tool for studying a wide range of human diseases. The majority of the existing approaches look for genes that are differentially (co-)expressed. It has now become increasingly important to know how the regulatory networks change along with the change in (co-)expression. Identifying changes in regulatory networks is expected to provide insights into dysregulated mechanisms and to identify key molecular players that may serve as candidate biomarkers or drug targets. To this end, based on HTP RNA-seq and genotype profiling, this work presents a novel ensemble methodology, named differential analysis of regulatory networks (DARN), to elucidate the underlying dysregulated mechanisms in cellular networks resulting in the observed phenotype of disease states.

Detecting differential regulatory networks: Grounded on the principles of perturbation and information theory, DARN uses an ensemble learning approach to infer regulatory interactions from a pool of interactions that are consistent with the expression data and identify mechanisms that differ between cases and controls. Starting from HTP RNA-seq and genotype data, DARN generates regulatory network models using both the data and text mining based approaches. Later, DARN curates the resulting interaction pool using a genetic optimization based ensemble algorithm that enriches the network for the most relevant positive feedback loops for the case-control difference. The major assumption behind this methodology is that the disease and control states are considered as stable expression patterns. Therefore, enriching positive feedback loops, which are a necessary condition for multi-stability, is expected to shed light on disease-induced dysfunction.
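
The notion of a positive feedback loop used above can be illustrated with a short sketch: in a signed directed network, a loop is positive when the product of its edge signs is +1, that is, when it contains an even number of inhibitory edges. The Python code below simply enumerates such loops by depth-first search. It is not part of DARN; the edge representation and the function name are assumptions made for the example, and exhaustive enumeration is only practical for small networks.

```python
def positive_feedback_loops(edges):
    """List simple cycles whose edge signs multiply to +1.

    edges: {(source, target): sign} with sign +1 (activation) or -1 (inhibition).
    Each cycle is reported once, starting from its smallest node label.
    """
    adjacency = {}
    for (src, dst), sign in edges.items():
        adjacency.setdefault(src, []).append((dst, sign))

    loops = []

    def extend(start, node, sign_so_far, path):
        for nxt, sign in adjacency.get(node, []):
            if nxt == start:
                # Closing the cycle: keep it only if the overall sign is positive.
                if sign_so_far * sign > 0:
                    loops.append(list(path))
            elif nxt > start and nxt not in path:
                # Visit only nodes larger than the start so each cycle is found once.
                path.append(nxt)
                extend(start, nxt, sign_so_far * sign, path)
                path.pop()

    for start in sorted(adjacency):
        extend(start, start, 1, [start])
    return loops
```

Note that mutual inhibition between two genes counts as a positive loop (two negative edges), which is the classic toggle-switch motif associated with bistability.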

Application to HTP datasets of schizophrenia brains: The efficacy of this methodology is showcased through an application to a large-scale HTP dataset (RNA-seq and genotyping) from the dorsolateral prefrontal cortex (BA9/46) of human post-mortem brains in 265 schizophrenia cases and 289 controls. This dataset was generated in the framework of the CommonMind Consortium (commonmind.org), which aims to generate and analyze large-scale data from human subjects with neuropsychiatric disorders. After normalizing via voom and correcting for the effects of known clinical (gender, age of death, medications) and technical (brain bank, post-mortem interval, RNA quality, sequencing batch) covariates, the initial interaction pool was generated using data-driven methods including CLR, VBSR, TIGRESS, and GENIE3, as well as the literature-based text mining toolkit of MetaCore™. Using this pool of interactions, DARN learned a population of regulatory networks representing the changes between control and schizophrenia post-mortem brains. Analysis of the resulting population of networks is expected to provide a system-level understanding of the genotype-phenotype relationships and key components in schizophrenia regulation, and will generate experimentally testable hypotheses for schizophrenia susceptibility and progression.

............................................................................................................................
SB P27: Variability in B-vitamin dependencies in the human microbiome genomes

Matvei Khoroshkin1, Andrei Osterman2, Dmitry Rodionov2

1Institute for Information Transmission Problems, Russian Academy of Sciences, 2Sanford-Burnham Medical Research Institute

B vitamins are biochemical cofactors essential for all living systems. The human microbiota is the complex and dynamic community of commensal, symbiotic, and pathogenic microorganisms that are present on and within the human body, and it has an enormous impact on humans. We investigate the ability of bacteria from the human microbiome to produce and salvage B vitamins. We selected a reference set of 1143 bacterial genomes from 7 phyla out of those sequenced in the course of the Human Microbiome Project (HMP). By using the metabolic subsystems approach (as implemented in the SEED database) and analyzing genomic context and regulons, we reconstructed biochemical pathways for the synthesis of eight B vitamins (thiamin, riboflavin, niacin, biotin, pyridoxine, cobalamin, pantothenate, folate) and predicted putative vitamin transporters in the reference HMP genomes. Using the reconstructed metabolic pathways, we classified the HMP organisms with respect to their B-vitamin prototrophy or auxotrophy and their vitamin transport capabilities. Preferred patterns of vitamin dependency were attributed to a number of taxonomic units. For instance, the Bacteroides are mostly prototrophs that are capable of synthesizing all B vitamins, excluding cobalamin. In contrast, the Lactobacillales are auxotrophs for all vitamins, excluding folate. The reference HMP genomes show a relatively high level of conservation of vitamin synthesis phenotypes at the genus level: only 25% of the studied genera demonstrate variability of phenotypes for individual vitamins. We also identified patterns of vitamin dependency for a number of body sites. The gastrointestinal tract generally shows a prevalence of vitamin-prototrophic bacteria, whereas the oral cavity, urogenital tract, and blood are largely populated by vitamin auxotrophs.
This work is important for understanding the role of B vitamins in maintaining homeostasis of human microbiome community structures and for the future development of specific vitamin diets.

............................................................................................................................
SB P28: Master regulators of luminal and basal subtypes of breast cancer

Archana Iyer1, Celine Lefebvre2, Yishai Shimoni1, Mukesh Bansal1, Mariano Alvarez1, Jose Silva3, Andrea Califano1

1Columbia University, 2Gustave Roussy, 3Mount Sinai Medical Center

Breast cancer is a heterogeneous group of diseases that can be stratified into several subgroups based on their molecular signature. Understanding the regulators of these molecular subtypes will allow us to make them more amenable to targeted therapies or personalized medicine. Here we present our discovery of master regulators that are important in the transcriptional regulation of the two major subtypes: basal and luminal. We reverse engineered a breast cancer-specific transcriptional network using large-scale gene expression datasets in breast cancer (TCGA, Metabric, UNC-300) to create a breast cancer interactome. Using MARINa (Master Regulator Inference Algorithm), we have identified specific transcription factors that regulate basal and luminal subtypes of breast cancer. We further validated these master regulators experimentally by performing a pooled shRNA screen on six independent cell lines (2 luminal, 3 basal, and one normal mammary epithelial cell line). The pooled shRNA screen was sampled at days 0, 10, 18, and 25, and genomic DNA was barcoded and sequenced using the Illumina MiSeq technology. Both computational predictions and our experimental results from the deconvolution of the shRNA screen validate the luminal transcription factors (FOXA1, ESR1, and GATA3). In addition, we discover novel luminal-specific transcription factors like TFAP2C. For the basal subgroup, while we discover novel regulators, we also find that this group is more heterogeneous than the luminal one. Importantly, we can effectively target this subgroup with a combination of master regulators.

............................................................................................................................
SB P29: Network modeling reveals key features of epithelial-to-mesenchymal transition dynamics in liver cancer invasion

Steven Steinway1, Jorge Gomez Tejeda Zañudo2, Thomas Loughran1, Reka Albert2

1University of Virginia, 2Penn State University

............................................................................................................................
SB P30: Trafficking and signaling interplay modeling after serotonin receptor activation

Aurélien Rizk1, Mauno Schelb1, Milica Bugarski1, Maysam Mansouri1, Gebhard Schertler1, Philipp Berger1

1Paul Scherrer Institute

Despite the physiological and pharmacological importance of G protein-coupled receptors (GPCRs), receptor activation and its translation into cytoplasmic trafficking and cellular response remain elusive. In this project, we study the interplay between signaling and trafficking of 5-HT2c serotonin receptors after stimulation. We use RAB GTPases as markers of intracellular compartments to monitor the dynamic distribution of receptors after stimulation, and ERK phosphorylation to monitor signaling output. To obtain statistically significant trafficking data at high temporal resolution, we developed the "Squassh" image analysis software for automatic vesicle segmentation, counting, and colocalization computation [Rizk et al., Nature Protocols 2014]. Based on the receptor localization data, signaling data, and previous work on the modeling of GPCR-activated signaling pathways [Heitzler et al., MSB 2012], we developed an ordinary differential equation model combining signaling with receptor internalization and transport to early, recycling, and late endosomes. This is, to our knowledge, the first attempt to develop a dynamic trafficking model for a GPCR. We evaluate the influence of trafficking on signaling by conducting global sensitivity analysis and use the model to test hypotheses on receptor constitutive internalization, trafficking regulation, and signaling from endosomes.
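To make the idea of an ordinary differential equation model that couples trafficking and signaling concrete, the following is a minimal sketch only. It is not the authors' model: the compartment layout (surface, early, recycling, and late endosomes), the single phospho-ERK readout, every rate constant, and the function name simulate_trafficking are assumptions chosen for illustration, and forward Euler is used purely to keep the example dependency-free.

```python
# Hypothetical four-compartment receptor trafficking model with a pERK readout.
# All rate constants are illustrative placeholders, not fitted values.

def simulate_trafficking(k_int=0.2, k_sort=0.1, k_late=0.05, k_rec=0.1,
                         k_act=1.0, k_deact=0.5, w_endo=0.5,
                         t_end=60.0, dt=0.01):
    """Integrate receptor fractions and a pERK fraction with forward Euler."""
    # Assumption: at t = 0 all receptors are stimulated and sit at the surface.
    surface, early, recycling, late, perk = 1.0, 0.0, 0.0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        # Trafficking: internalization, sorting, recycling, transport to late endosomes.
        d_surface = -k_int * surface + k_rec * recycling
        d_early = k_int * surface - (k_sort + k_late) * early
        d_recycling = k_sort * early - k_rec * recycling
        d_late = k_late * early
        # Signaling: ERK is activated by surface receptors and, with weight
        # w_endo, by receptors in early endosomes (signaling from endosomes).
        drive = surface + w_endo * early
        d_perk = k_act * drive * (1.0 - perk) - k_deact * perk
        surface += dt * d_surface
        early += dt * d_early
        recycling += dt * d_recycling
        late += dt * d_late
        perk += dt * d_perk
    return {"surface": surface, "early": early, "recycling": recycling,
            "late": late, "perk": perk}


if __name__ == "__main__":
    with_endosomal_signaling = simulate_trafficking(w_endo=0.5)
    without_endosomal_signaling = simulate_trafficking(w_endo=0.0)
    print(with_endosomal_signaling)
    print(without_endosomal_signaling)
```

Comparing runs with w_endo set to zero and to a positive value is one simple way to probe, in this toy setting, how a hypothesis such as signaling from endosomes would change the predicted ERK response.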

............................................................................................................................
SB P31: Understanding multicellular function and disease with human tissue-specific networks

Arjun Krishnan1, Casey Greene2, Aaron Wong1, Emanuela Ricciotti3, Rene Zelaya2, Daniel Himmelstein4, Daniel Chasman5, Garret Fitzgerald3, Kara Dolinski1, Tilo Grosser3, Olga Troyanskaya1

1Princeton University, 2Dartmouth College, 3University of Pennsylvania, 4University of California, San Francisco, 5Harvard Medical School

Tissue and cell-type identity lie at the core of human physiology and disease. Therefore, understanding the genetic underpinnings of complex tissues and individual cell lineages is crucial for developing improved diagnostics and therapeutics. Yet we still lack tools to systematically explore the landscape of genes and interactions that shape specialized cellular functions across hundreds of tissue types and cell lineages in the body. Here we present genome-wide functional interaction networks specific for each of 144 human tissues and cell types, developed using an integrative data-driven methodology. Our approach integrates thousands of diverse genome-scale datasets by simultaneously using both tissue-specific and functional contexts. This technique effectively leverages signals detected by distinct technologies from experiments spanning both tissues and disease states. The tissue networks predict lineage-specific response of genes to perturbation, reveal changing functional roles of genes depending on tissue context, and illuminate meaningful disease-disease associations. We show that genes with nominally significant p-values in genome-wide association studies (GWAS) can be used in conjunction with tissue-specific networks to identify biologically important disease-gene associations, a procedure we term NetWAS. NetWAS identifies disease-associated genes more accurately than GWAS alone or an approach using a non-tissue-specific functional network. Our webserver, GIANT (http://giant.princeton.edu), provides an interface to human tissue networks with multi-gene query capability, network visualization, analysis tools, and downloadable networks. GIANT also enables NetWAS reprioritization of users' GWAS results.

............................................................................................................................
SB P32: A modeling framework for generation of positional and temporal simulations of transcriptional regulation

David Knox1, Robin Dowell2

1University of Colorado Anschutz Medical Campus, 2University of Colorado

Abstract: We present a modeling framework aimed at capturing both the positional and temporal behavior of transcriptional regulatory proteins. There is growing evidence that transcriptional regulation is a complex behavior that emerges not solely from the individual components, but rather from their collective behavior, including competition and cooperation. Our framework describes individual regulatory components using generic, action-oriented descriptions of their biochemical interactions with a DNA sequence. All the possible actions are based on the current state of factors bound to the DNA. We developed a rule builder to automatically generate the complete set of biochemical interaction rules for any given DNA sequence. Off-the-shelf stochastic simulation engines can model the behavior of a system of rules, and the resulting changes in the configuration of bound factors can be visualized. We compared our model to experimental data at well-studied loci in yeast, confirming that our model captures both the positional and temporal behavior of transcriptional regulation.

Method: Our goal was to integrate inherently dynamic aspects of transcriptional regulation, such as transcriptional interference, with the intuitive position-based models. To this end, we constructed a modeling framework that leverages the power of Petri nets to describe the actions of various regulators and the extent or span of their influence. By treating the DNA as an ordered set of entities (nucleotides or groups of nucleotides) rather than a single molecular entity, we can generate models that grow linearly with the length of the DNA sequence being modeled. At the core of our framework is our stochastic rule builder, an application that can take in an arbitrary sequence and construct the complete set of coherent biochemical rules. Off-the-shelf stochastic simulation engines, such as Dizzy, can then simulate these rule sets. The simulations produce tremendous amounts of positional and temporal data, which can be converted into simple visualizations depicting the state of the DNA at each time step.

Results: We have developed a framework to create biologically realistic models of the mechanisms of transcriptional regulation. Based on this framework, we can model not only the steady-state behavior of transcription factor binding and nucleosome formation, but also the dynamics of components such as the transcriptional machinery. Our framework scales linearly, making it possible to simulate very large segments of DNA. The simulations record the state of the complex system of interactions at each time step. We have interpreted this state information to visualize the binding configuration of components along the DNA.
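The rule-building and simulation steps described above can be illustrated with a small sketch. It is not the authors' rule builder and does not use Dizzy or Petri nets: it is a plain-Python version of Gillespie's direct method, and the function names, the (width, k_on, k_off) factor description, and all rates are hypothetical. It shows one bind and one unbind rule per factor per position, so the rule set grows linearly with sequence length, and competition arises because a factor can bind only where its whole footprint is free.

```python
import random

def build_rules(n_sites, factors):
    """Generate one bind and one unbind rule per factor per start position."""
    rules = []
    for name, (width, k_on, k_off) in factors.items():
        for start in range(n_sites - width + 1):
            rules.append(("bind", name, start, width, k_on))
            rules.append(("unbind", name, start, width, k_off))
    return rules

def propensity(rule, occupancy):
    """Rate of a rule given the current configuration of bound factors."""
    kind, name, start, width, rate = rule
    window = occupancy[start:start + width]
    if kind == "bind":
        # Binding requires the whole footprint to be free (competition).
        return rate if all(site is None for site in window) else 0.0
    # Unbinding requires this factor to be bound at exactly this start position.
    return rate if all(site == (name, start) for site in window) else 0.0

def gillespie(n_sites, factors, t_end, seed=0):
    """Simulate binding configurations over time with the direct method."""
    rng = random.Random(seed)
    rules = build_rules(n_sites, factors)
    occupancy = [None] * n_sites
    t = 0.0
    trajectory = [(t, tuple(occupancy))]
    while True:
        props = [propensity(rule, occupancy) for rule in rules]
        total = sum(props)
        if total == 0.0:
            break
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        kind, name, start, width, _ = rng.choices(rules, weights=props)[0]
        mark = (name, start) if kind == "bind" else None
        occupancy[start:start + width] = [mark] * width
        trajectory.append((t, tuple(occupancy)))
    return rules, trajectory


if __name__ == "__main__":
    # Hypothetical factors: name -> (footprint width, on rate, off rate).
    factors = {"TF": (3, 1.0, 0.5), "nucleosome": (8, 0.2, 0.1)}
    rules, trajectory = gillespie(30, factors, t_end=20.0, seed=1)
    print(len(rules), "rules;", len(trajectory) - 1, "events")
    for time, state in trajectory[:5]:
        print(f"{time:6.3f}", "".join("." if s is None else s[0][0] for s in state))
```

Each trajectory entry records the full occupancy of the sequence at an event time, which is the kind of positional and temporal state that can be turned into a visualization of bound factors along the DNA.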

............................................................................................................................

SB P33: Mutation analysis pipeline for E. coli

Erin Boggess1, Liam Royce1, Yingxi Chen1, Laura Jarboe1, Julie Dickerson1

1Iowa State University

Next-generation sequencing is increasingly accessible and affordable, resulting in vast quantities of genomic data. Metabolic evolution benefits from next-gen sequencing because of the ability to sequence and compare entire genomes. In microbial engineering, metabolic evolution is an essential method for developing organisms with a desired phenotype. In an evolution experiment, variants with advantageous phenotypes emerge under strong selective pressure and displace the parent strain in a population. Improved fitness is attributed to acquired mutations that are identified by comparing the genomes of evolved and parent strains. The technique generates a strain with a desired phenotype, but requires further research to ascertain how mutations relate to the phenotype.

After genome sequencing, a bottleneck occurs in annotating mutations and interpreting their effects. Traditionally, mutations are manually annotated and those within coding regions are investigated for relation to fitness. While drastic mutations such as loss of function will be revealed, this ignores extragenic changes that can affect regulation. Furthermore, manual annotation and exploration are tedious and significantly increase the time to a second round of metabolic evolution and other experimentation. The massive amount of sequence variation data generated in adaptive evolution experiments necessitates computational tools that assess mutation significance.

Here we provide a pipeline to facilitate and expedite mutation analysis in E. coli with the underlying ambition of mapping genomic changes to phenotype. Our pipeline leverages public genome databases, gene regulatory networks, metabolic pathways, and computational tools such as structure prediction software.

The pipeline begins by annotating and classifying a mutation, then assigning a priority given its predicted effect on structure and function. We use the E. coli regulatory network to discover other genes regulated by the mutated element. This expanded gene list is examined to identify associated enzymatic reactions in the E. coli metabolic network. When studying a set of mutations, expanded gene and reaction lists are collected and examined for overrepresented pathways and GO terms. Knowledge of altered reactions, affected pathways, and gene function commonalities is key to mapping mutations to phenotype.
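The expansion and overrepresentation steps can be sketched as follows. This is an illustrative outline under stated assumptions, not the pipeline itself: the gene and pathway names are placeholders, the regulatory network is a plain dictionary from regulator to targets, and a one-sided hypergeometric test is assumed as the overrepresentation statistic, since the abstract does not name the test it uses.

```python
from math import comb

def expand_genes(mutated, regulatory_network):
    """Add the regulatory targets of each mutated element to the gene list."""
    expanded = set(mutated)
    for gene in mutated:
        expanded.update(regulatory_network.get(gene, ()))
    return expanded

def hypergeom_pvalue(k, n_draw, n_success, n_total):
    """P(X >= k) when drawing n_draw genes from n_total, n_success of them in the pathway."""
    upper = min(n_draw, n_success)
    tail = sum(comb(n_success, i) * comb(n_total - n_success, n_draw - i)
               for i in range(k, upper + 1))
    return tail / comb(n_total, n_draw)

def enrich(expanded, pathways, n_total):
    """Overrepresentation p-value of each pathway in the expanded gene list."""
    return {
        name: hypergeom_pvalue(len(expanded & members), len(expanded),
                               len(members), n_total)
        for name, members in pathways.items()
    }


if __name__ == "__main__":
    # Placeholder data: one mutated regulator with three targets, 100 genes overall.
    network = {"regA": ["g1", "g2", "g3"]}
    pathways = {"pathway_X": {"g1", "g2", "g3", "g4"},
                "pathway_Y": {"g50", "g51", "g52"}}
    expanded = expand_genes(["regA"], network)
    for name, p in sorted(enrich(expanded, pathways, n_total=100).items()):
        print(name, f"{p:.2e}")
```

In practice a multiple-testing correction would be applied across pathways and GO terms; it is omitted here to keep the sketch short.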

When applied to E. coli experimental data from a metabolic evolution study for enhanced octanoic acid tolerance, we identified mutations in extragenic regions that would traditionally be overlooked. In addition, regulatory links extended our understanding of mutation effects, and prioritization assisted in focusing the follow-up research efforts.

The proposed methods benefit the research community by broadening the study of mutations and mechanisms of adaptation. Additionally, automating portions of comparative genomic analysis reduces the lifecycle of metabolic evolution studies. From an initial list of mutations and genomic positions, we provide researchers with a prediction of how that mutation affects the element in which it occurs, cellular implications, and prioritization so that efforts may be focused on the most relevant mutations.

............................................................................................................................

SB P34: Network-based model of oncogenic collaboration for prediction of drug sensitivity

Ted Laderas1

1Oregon Health & Science University

Tumorigenesis is a multi-step process, involving the acquisition of multiple oncogenic mutations that transform cells, resulting in systemic dysregulation that enables tumor proliferation. High-throughput “omics” techniques allow rapid identification of these mutations with the goal of identifying treatments that target them. However, the multiplicity of oncogenes required for transformation (oncogenic collaboration) makes mapping treatments difficult. To make this problem tractable, we have defined oncogenic collaboration as mutations in genes that interact with an oncogene and that may contribute to its dysregulation, a new genomic feature we term “surrogate mutations.” By mapping the mutations to a protein-protein interaction network, we can determine the significance of the observed distribution using permutation-based methods. We identified significant surrogate mutations in oncogenes such as BRCA1 and ESR1 that are frequently observed across breast cancer cell lines. In addition, using Random Forest classifiers, we show that these significant surrogate mutations predict drug sensitivity in breast cancer cell lines with a mean error rate comparable to that of the current standard, PAM50 expression data (30.1% vs 29.1%). Our model has potential for integrating patient-unique mutations in predicting drug sensitivity, suggesting a potential new direction in precision medicine as well as a new model for drug development. Additionally, we show the prevalence of significant surrogate mutations in breast cancer patients within the Cancer Genome Atlas, suggesting that surrogate mutations may be a useful genomic feature in personalized medicine.
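A permutation-based significance test of this kind can be sketched as below. The abstract does not specify the permutation scheme, so this example assumes the simplest one: the number of mutated network neighbors of an oncogene is compared with the counts obtained when the same number of mutations is reassigned to randomly chosen genes. The gene names, the toy network, and the function names are hypothetical.

```python
import random

def surrogate_count(gene, network, mutated):
    """Number of interaction partners of `gene` that carry a mutation."""
    return sum(1 for neighbor in network.get(gene, ()) if neighbor in mutated)

def permutation_pvalue(gene, network, mutated, all_genes, n_perm=2000, seed=0):
    """Empirical p-value for the observed count of mutated neighbors.

    Assumed null model: the same number of mutations placed on genes drawn
    uniformly at random from `all_genes` (a list).
    """
    rng = random.Random(seed)
    observed = surrogate_count(gene, network, mutated)
    hits = 0
    for _ in range(n_perm):
        shuffled = set(rng.sample(all_genes, len(mutated)))
        if surrogate_count(gene, network, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(50)]
    network = {"ONC": genes[:5]}                  # hypothetical oncogene with 5 partners
    mutated = set(genes[:4]) | {"g20", "g30"}     # 4 of the 5 partners are mutated
    observed, p = permutation_pvalue("ONC", network, mutated, genes)
    print(f"mutated neighbors = {observed}, permutation p = {p:.4f}")
```

A degree-preserving or gene-length-aware null model may be more appropriate for real data; the uniform shuffle is used here only because it is the easiest to state.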

............................................................................................................................

SB P35: Emergence in signal transduction networks: identification of complex mechanisms of information transfer to understand and control cell phenotypes in health and disease

Mark Ciaccio1, Neda Bagheri1

1Northwestern University

Many drug candidates fail in clinical trials due to an incomplete understanding of how small-molecule perturbations affect signal transduction at the systems level. Small molecules can bind multiple proteins, exerting non-intuitive emergent effects on cell phenotype due to nonlinear signaling properties such as feedback and redundancy. We created a computationally efficient algorithm, DIONESUS (Dynamic Inference Of NEtwork Structure Using Singular Values), based on partial least squares regression (PLSR), to accurately reconstruct the signaling network architecture from the phosphoproteomic signatures of 60 phosphosites at four time points following 30 diverse perturbations. This dynamic dataset was collected using the microwestern array, a high-confidence and high-throughput method for assaying protein abundance and modification.

DIONESUS enabled us to explore several questions that are central to computational modeling of cell signaling: How much predictive power is gained by (i) accounting for temporal dependence and information flow in network structure, (ii) characterizing non-linear interactions among nodes, and (iii) inferring non-additive relationships between parent and daughter nodes, such as AND-, OR-, and XOR-gates? Understanding the essential properties of signaling networks beyond linear combinations of variables provides insight into how biological networks process information to create controlled and robust responses from noisy stimuli. We integrated our new methodology with phosphoproteomic data to hypothesize new mechanisms of information transfer and propose viable drug targets that take into account the complex and emergent properties of signal transduction networks.

............................................................................................................................

SB P36: A first truly systems level mechanistic model – unravelling the gene regulation of Th2 differentiation

Mattias Köpsén1, William Lövfors1, Sören Bruhn2, Gunnar Cedersund1, Mikael Benson1, Mika Gustafsson1

1Linköping University, 2Karolinska Institute

Recent and ongoing revolutions in measurement technologies imply completely new possibilities for genome research; today, time-resolved, quantitative, and systems-level data are available. Nevertheless, without a corresponding revolution in methods for data analysis, these new data tend to drown researchers and doctors, rather than provide clear and useful insights. Such new methods are developed within the field of systems biology. Systems biology has two main approaches: mechanistically detailed and well-determined simulation models for small subsystems, and more approximate statistical models for the entire genome. However, there are few, if any, methods that combine the strengths of these two approaches. Herein, we present LASSIM, a new simulation-based approach, which can be applied to systems of the size of the entire genome. The superior performance of LASSIM is demonstrated in three examples: i) an example with simulated data shows that, unlike traditional large-scale methods, LASSIM correctly identifies the true behavior between measured data points; ii) LASSIM outperforms the winner of a previous DREAM challenge, the most competitive benchmarking approach available; iii) based on new data from Th2 differentiation, LASSIM identifies a first mechanistic model for the entire genome. The key predictions of this model are typically enriched for DNA binding, which suggests that most predicted interactions are direct. Moreover, in silico knockdowns were experimentally validated. In summary, LASSIM opens the door to a new type of model-based data analysis: models that combine the strengths of reliable mechanistic models with truly systems-level data.

............................................................................................................................

SB P37: Utilization of Whole Genome Analysis Approaches for Personalized Therapy Decision Making in Patients with Advanced Malignancies

Martin Jones1, Yaoqing Shen1, Erin Pleasance1, Katayoon Kasaian1, Sreeja Leelakumari1, Yvonne Y Li1, Peter Eirew2, Richard Corbett1, Karen L Mungall1, Nina Thiessen1, Yussanne Ma1, Alexandra Fok1, Jacquie Schein1, Andrew J Mungall1, Yongjun Zhao1, Richard A Moore1, Stephen Yip3, Karen Gelmon4, Howard Lim4, Daniel Renouf4, Anna Tinker4, Sophie Sun4, Robyn Roscoe1, Steven JM Jones1, Janessa Laskin4, Marco A Marra1,5

1Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada; 2Department of Molecular Oncology, British Columbia Cancer Agency, Vancouver, BC, Canada; 3Centre for Translational and Applied Genomics, British Columbia Cancer Agency, Vancouver, BC, Canada; 4Department of Medical Oncology, British Columbia Cancer Agency, Vancouver, BC, Canada; 5Department of Medical Genetics, University of British Columbia, Vancouver BC, Canada

Genomic analysis is being widely investigated to support cancer treatment decision-making; here we detail our experience at the BC Cancer Agency. Our Personalized Oncogenomics (POG) project aims to determine the feasibility of a whole-genome, data-driven approach to support these treatment decisions at a tertiary cancer care centre.

The POG project enrolls patients with metastatic cancers for which standard chemotherapy regimens fail or do not exist. For each patient, we performed whole genome (80-100X) and transcriptome sequencing (100-200M paired reads) of a fresh tumour biopsy sample and whole genome sequencing (40-50X) of peripheral blood. When available, the genome of an archival sample was also sequenced. All samples also underwent targeted deep sequencing using a “panel” to analyze selected known cancer mutations. Bioinformatics approaches were used to identify genes with somatic single nucleotide variants, small insertions and deletions, copy number variants, regions of loss of heterozygosity, structural variants, and expression changes to build an individual somatic molecular profile. This was followed by intensive pathway analysis and literature searches to identify the candidate biological processes that have been affected by mutation. Infectious agent presence and expression were also determined, along with any integration sites.

Based on the integration of all these results, therapeutic options can be proposed. POG has consented 140 patients and sequenced and analyzed samples from 90 patients (including 6 pediatric) representing 33 different tumour types. The time between acquiring the biopsy sample and presenting a report for clinical oncologists was ~30-50 days. For ~70% of the patients, the genome and transcriptome data were informative in guiding treatment. In addition, the whole genome analysis approach was found to complement standard-of-care clinical tests, aiding diagnosis.

Significant spatial and temporal tumour heterogeneity has often been noted as well as molecular response to treatment.

The POG project has now moved to a second phase, with ethical approval for 5,000 patients. Accrual has reached a rate of 5 patients per week, with a tenfold increase expected over the coming 6-12 months.

