SPONSORS:

Silver

Silver Sponsor: Sanofi



General

General Sponsor - IBM Research

General Sponsor - MAGNet

General Sponsor - National Cancer Institute

RECOMB/ISCB RegSysGen 2014 Sponsor - NRNB

Cytoscape Sponsors

RECOMB/ISCB RegSysGen 2014 Sponsor - Agilent Technologies

RECOMB/ISCB RegSysGen 2014 Sponsor - Cytoscape

SYSTEMS BIOLOGY PRESENTATIONS & ABSTRACTS

Presented Tuesday, November 11 and Wednesday, November 12



TUESDAY, NOVEMBER 11



1:55 pm – 2:15 pm


SB T01
A cell lineage-specific regulatory network inferred using limited expression data of erythropoiesis


Fan Zhu1, Lihong Shi1, James Engel1, Yuanfang Guan1

1University of Michigan

Modeling regulatory networks using expression data observed in a differentiation process may help identify context-specific interactions. Despite intensive research efforts on this topic, the outcome of current algorithms depends heavily on the quality and quantity of a single time-course dataset, and performance may be compromised for data with a limited number of samples. In this work, we report a novel multi-layer graphical model that is capable of leveraging heterogeneous, generic, publicly available time-course datasets, as well as limited cell lineage-specific data, to model regulatory networks specific to a differentiation process. First, a collection of network inference methods is used to predict the regulatory relationships in individual datasets. Then, the inferred relationships are weighted and integrated by evaluating them against the cell lineage-specific data. To test the accuracy of this algorithm, we collected a time-course RNA-Seq dataset during human erythropoiesis to infer regulatory relationships specific to this differentiation process. The resulting erythroid-specific regulatory network reveals novel regulatory relationships activated in erythropoiesis, which were further validated by genome-wide TR4 binding studies using ChIP-seq. These erythropoiesis-specific regulatory relationships were not identifiable by single dataset-based methods or context-independent integrations. Analysis of the predicted targets reveals that they are all closely associated with hematopoietic lineage differentiation. In summary, this paper develops an integrative strategy that is capable of leveraging a limited, cell type-specific expression dataset and large-scale, generic time-course datasets to infer regulatory networks specific to a differentiation process, and that is applicable to other cell lineages.

...............................................................................................................................
Tuesday, November 11
2:15 pm – 2:35 pm

SB T02

FAST-SL: An efficient algorithm to identify synthetic lethal reaction/gene sets in metabolic networks

Aditya Pratapa1, Shankar Balachandran1, Karthik Raman1

1Indian Institute of Technology Madras

Synthetic lethal reaction/gene sets are sets of reactions/genes where only the simultaneous removal of all reactions/genes in the set abolishes growth of an organism. In silico, synthetic lethal sets can be identified by simulating the effect of removing reaction/gene sets from the reconstructed genome-scale metabolic network of an organism. Previous approaches to identifying synthetic lethal reactions in genome-scale metabolic networks have built on the framework of Flux Balance Analysis (FBA), extending it either to exhaustively analyze all possible combinations of reactions, or to formulate the problem as a bi-level Mixed Integer Linear Programming (MILP) problem.

FAST-SL circumvents the complexity of both exhaustive enumeration and the bi-level MILP by iteratively reducing the search space and the computational time involved in identifying synthetic lethal reaction sets. FAST-SL, while considering all possible phenotypes and all parts of metabolism, efficiently identifies the targeted phenotypes. Our algorithm shows more than a 4000-fold reduction in search space over exhaustive enumeration of triple lethal sets for the Escherichia coli iAF1260 model. Unlike previous methods for identifying lethal reaction sets, FAST-SL uses the sparsest solution obtained by solving the flux balance constraints of a metabolic network, which is a linear programming problem, to eliminate reaction combinations that do not lead to a lethal phenotype, thereby reducing the search space for identifying lethal reaction sets.
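The pruning idea described above can be illustrated with a minimal sketch. It is not the authors' MATLAB implementation: the toy network, the names `PATHWAYS` and `flux_support`, and the "growth is possible if at least one pathway is intact" oracle are all assumptions that stand in for a real FBA linear program. The sketch relies only on the observation that deleting reactions which carry no flux in a feasible solution leaves that solution feasible, so such deletions cannot be lethal.

```python
from itertools import combinations

# Hypothetical toy network: growth is possible if every reaction of at least
# one pathway is still present. This stands in for an FBA linear program.
PATHWAYS = [
    {"r1", "r2", "r3"},
    {"r1", "r4", "r5"},
    {"r1", "r2", "r6", "r7"},
]
REACTIONS = sorted(set().union(*PATHWAYS))


def flux_support(deleted):
    """Return the reactions used by a smallest intact pathway, or None if
    the deletion abolishes growth (toy substitute for a sparse FBA solution)."""
    intact = [p for p in PATHWAYS if not (p & set(deleted))]
    return min(intact, key=len) if intact else None


def exhaustive_search():
    """Test every single deletion and every pair of non-lethal reactions."""
    calls = 0
    singles = set()
    for r in REACTIONS:
        calls += 1
        if flux_support({r}) is None:
            singles.add(r)
    viable = [r for r in REACTIONS if r not in singles]
    doubles = set()
    for pair in combinations(viable, 2):
        calls += 1
        if flux_support(pair) is None:
            doubles.add(frozenset(pair))
    return singles, doubles, calls


def pruned_search():
    """Only test reactions that carry flux in the current solution."""
    calls = 1
    support = flux_support(set())          # wild-type solution
    singles = set()
    for r in sorted(support):              # a lethal reaction must carry flux
        calls += 1
        if flux_support({r}) is None:
            singles.add(r)
    doubles = set()
    for i in sorted(support - singles):
        support_i = flux_support({i})      # solution that survives deleting i
        calls += 1
        for j in sorted(support_i - singles):
            calls += 1
            if flux_support({i, j}) is None:
                doubles.add(frozenset((i, j)))
    return singles, doubles, calls


if __name__ == "__main__":
    print(exhaustive_search())
    print(pruned_search())
```

On this toy network both searches return the same lethal sets, while the pruned search queries the growth oracle fewer times. In a genome-scale model the oracle would be an FBA solve, and the saving grows with the order of the lethal sets.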

As our algorithm finds application in the identification of combinatorial drug targets, in this study we performed synthetic reaction and gene lethality analysis for genome-scale reconstructions of Salmonella enterica typhimurium and Mycobacterium tuberculosis. We validated the reaction lethals obtained using FAST-SL against exhaustive enumeration of reaction deletions up to order two for these organisms. The triple lethal reactions obtained for Escherichia coli using FAST-SL exactly match the results of exhaustive enumeration, which was performed on a high-performance computer cluster. Our results also completely agree with those of the SL Finder algorithm (Suthers, P.F. et al. (2009). Mol Syst Biol, 5:301); notably, our algorithm is substantially faster. Further, we also present a mathematical proof of the correctness of our algorithm.

Overall, FAST-SL is a powerful tool to identify the lethal reaction/gene sets, through a massive reduction in the search space over an exhaustive enumeration approach and the SL Finder algorithm. We believe that our algorithm presents an important advance and can enable the rapid enumeration of synthetic lethal reaction/gene sets in genome-scale metabolic networks.

Availability: The MATLAB implementation of our algorithm (compatible with the COBRA toolbox v2.0, a popular toolbox for constraint-based analysis of metabolic networks) is freely available from: https://home.iitm.ac.in/kraman/lab/research/fast-sl.

...............................................................................................................................
Tuesday, November 11
2:35 pm – 2:55 pm

SB T03
Trafficking and signaling interplay modeling after serotonin receptor activation

Aurélien Rizk1, Mauno Schelb1, Milica Bugarski1, Maysam Mansouri1, Gebhard Schertler1, Philipp Berger1

1Paul Scherrer Institute

Despite the physiological and pharmacological importance of G protein-coupled receptors (GPCRs), receptor activation and its translation into cytoplasmic trafficking and cellular response remain elusive. In this project, we study the interplay between signaling and trafficking of serotonin receptors 5-HT2c after stimulation. We use RAB GTPases as markers of intracellular compartments to monitor the dynamic distribution of receptors after stimulation, and ERK phosphorylation to monitor signaling output. To obtain statistically significant trafficking data and high temporal resolution, we developed the "Squassh" image analysis software for automatic vesicle segmentation, counting, and colocalization computation [Rizk et al., Nature Protocols 2014]. Based on the receptor localization data, signaling data, and previous work on the modeling of GPCR-activated signaling pathways [Heitzler et al., MSB 2012], we developed an ordinary differential equation model combining signaling with receptor internalization and transport to early, recycling, and late endosomes. This is, to our knowledge, the first attempt to develop a dynamic trafficking model for a GPCR. We evaluate the influence of trafficking on signaling by conducting global sensitivity analysis and use the model to test hypotheses on receptor constitutive internalization, trafficking regulation, and signaling from endosomes.

...............................................................................................................................
Tuesday, November 11
2:55 pm – 3:15 pm


SB T04
Joint learning over drugs improves prediction of cancer drug response


Ivan Paskov1, Han Yuan2, Hristo Paskov1, Alvaro Gonzalez2, Christina Leslie2

1Stanford University, 2Memorial Sloan Kettering Cancer Center

The ultimate goal of precision medicine is to predict the best personalized therapeutic option from patient-specific genomic data. In cancer, precision medicine seeks to leverage new targeted therapies that work only in a subset of tumors where the targeted pathway is suitably altered; in general we cannot predict drug response from the mutation or copy number status of the target alone. Here we use publicly available drug response data sets in cancer cell lines, including the Cancer Cell Line Encyclopedia (CCLE) and the National Cancer Institute NCI-60, and develop a multi-task strategy to predict drug sensitivity by jointly learning across many drugs at once. We use a nuclear norm regularization approach with a highly efficient ADMM (alternating direction method of multipliers) optimization algorithm that readily scales to large data sets. For the CCLE data set, we used cross-validation to train on 445 cell lines with 50,000 genomic features (gene expression, copy number, and mutation status) and jointly learn prediction models for 24 drugs. For all drugs, our multi-task learning approach outperformed elastic net single-task learning in a transductive cross-validation setting, where the features of all cell lines are seen across tasks, but the drug response values for each task’s test set are held out. The mean square error (MSE) of multi-task learning is on average 33% smaller than the MSE of single-task learning. For the NCI-60 data set, we trained on 60 cell lines, around 60,000 genomic features, and 309 FDA-approved drugs. Here, multi-task learning outperformed elastic net single-task learning for 226 out of 309 drugs in a transductive cross-validation setting, with a mean improvement in MSE of 14.1%. Moreover, our joint training approach led to more interpretable drug response models, where drugs with similar mechanisms of action had similar regression models, and where enrichment analysis of regression coefficients revealed the mechanism of action.

...............................................................................................................................
Tuesday, November 11
3:40 pm – 4:00 pm

SB T05
A first truly systems level mechanistic model – unravelling the gene regulation of Th2 differentiation

Mattias Köpsén1, William Lövfors1, Sören Bruhn2, Gunnar Cedersund1, Mikael Benson1, Mika Gustafsson1

1Linköping University, 2Karolinska Institute

Recent and ongoing revolutions in measurement technologies open completely new possibilities for genome research; today, time-resolved, quantitative, and systems-level data are available. Nevertheless, without a corresponding revolution in methods for data analysis, these new data tend to drown researchers and doctors, rather than provide clear and useful insights. Such new methods are developed within the field of systems biology. Systems biology has two main approaches: mechanistically detailed and well-determined simulation models for small subsystems, and more approximate statistical models for the entire genome. However, there are few, if any, methods that combine the strengths of these two approaches. Herein, we present LASSIM, a new simulation-based approach, which can be applied to systems of the size of the entire genome. The superior performance of LASSIM is demonstrated in three examples: i) an example with simulated data shows that, unlike traditional large-scale methods, LASSIM correctly identifies the true behavior between measured data points, ii) LASSIM outperforms the winner of a previous DREAM challenge, the most competitive benchmarking approach available, iii) based on new data from Th2 differentiation, LASSIM identifies a first mechanistic model for the entire genome. The key predictions of this model are typically enriched for DNA binding, which suggests that most predicted interactions are direct. Moreover, in silico knockdowns were experimentally validated. In summary, LASSIM opens the door to a new type of model-based data analysis: models that combine the strengths of reliable mechanistic models with truly systems-level data.

...............................................................................................................................
Tuesday, November 11
4:00 pm – 4:20 pm

SB T06
Simulation predicts IGFBP2-HIF1α interaction drives glioblastoma growth

Ka Wai Lin1, Angela Liao1, Amina Qutub1

1Rice University

Introduction: Recent clinical studies show that both obese and type II diabetic patients have faster tumor progression and decreased survival rates from brain cancers compared to cancer patients without obesity or type II diabetes [1]. Though studies suggest that interactions between insulin-like growth factor I (IGFI), insulin-like growth factor binding protein 2 (IGFBP2), and hypoxia-inducible factor 1 alpha (HIF1α) correlate with tumor growth and invasiveness, the detailed mechanisms of these interactions are unknown. Computational modeling can address the complexity of these interactions by identifying sets of key signaling regulators and characterizing the architecture of their signaling pathways. Here we present a computational model relating IGFBP2, IGFI, and HIF1α to the growth of glioblastoma cells. Many drugs that target the insulin-like growth factor receptor (IGFIR) have shown promising results in vitro but have failed in late-stage clinical trials. Results from our model identified a potential target in the insulin signaling pathway that will guide the design of new drugs for glioblastoma.

Materials and Methods: The interactions linking IGFBP2, IGFI, HIF1α, and oxygen levels to glioblastoma growth were compiled through an extensive literature search. A chemical-kinetic model containing 5 ordinary differential equations was created and simulated using Matlab. The parameters were found by fitting the rate constants to in vitro data from existing literature. IGFI and IGFBP2 were fitted using data that observed IGFBP2 concentration as a function of IGFI stimulation over time [2]. HIF1α was fitted using data that observed changes in HIF1α concentration as a function of oxygen levels [3]. HIF1α signaling is linked to the growth of the glioblastoma, which was fitted to in vitro assays in which U87 glioblastoma cells were cultured into spheroids by the hanging drop approach in our lab and the diameter was measured at different time points. An extensive series of sensitivity analyses was conducted on all parameters in the model.

Results and Discussion: The results from the model showed that the downstream signal from IGFI to HIF1α was less sensitive to change as compared to the feedback of IGFBP2 to HIF1α. Results from our glioblastoma growth reduction analysis showed that when the feedback loop was removed there was a greater decrease in glioblastoma diameter as compared to removing the downstream IGFI to HIF1α signal.

Conclusions: The simulations from the computational model are representative of the in vitro system and they are in agreement with known literature of in vitro growth of glioblastoma. Model sensitivity analysis highlighted that feedback from IGFBP2 to HIF1α is more integral to the sustained growth of the glioblastoma spheroid than the downstream signaling from IGFI to HIF1α. This implicates a more significant potential drug target as compared to the current IGFI targets. Ongoing studies in the lab are following the potential of these targets of glioblastoma in vitro.

References:

1. Chambless LB et al. Type 2 diabetes mellitus and obesity are independent risk factors for poor outcome in patients with high-grade glioma. Journal of neuro-oncology. 2012;106(2):383-9.

2. Slomiany MG et al. IGF-1-induced VEGF and IGFBP-3 secretion correlates with increased HIF-1 alpha expression and activity in retinal pigment epithelial cell line D407. Invest Ophthalmol Vis Sci. 2004;45(8):2838-47.

3. Jiang BH et al. Hypoxia-inducible factor 1 levels vary exponentially over a physiologically relevant range of O2 tension. Am J Physiol. 1996;271(4 Pt 1):C1172-80.

...............................................................................................................................
Tuesday, November 11
4:20 pm – 4:40 pm

SB T07
Ensemble-based design of experiments for gene regulatory networks

Erica Manesso1, Rudiyanto Gunawan1

1ETH Zurich

Model-based discovery in systems biology is an iterative process that integrates wet-lab experiments, parameter estimation, in silico analysis, and optimization. Many challenges remain in performing this iterative model-based discovery. The bottleneck is often the estimation of unknown kinetic parameters from experimental data. Estimating kinetic parameters by fitting model simulations to biological data is usually ill-posed: there is often no single (best-fit) solution to the data fitting problem, and instead one can find many parameter combinations, i.e., an ensemble of parameters, that fit the data statistically equally well. The parameter ensemble represents the uncertainty of the model parameters. However, this describes only one type of uncertainty in the mathematical modeling of biological systems. Other factors also contribute to model uncertainty, including structural and dynamical uncertainty. In the context of gene regulatory networks, the use of ensemble models has been very limited and focused mainly on network structure. In practical applications, it is often desirable and necessary to reduce the size of the ensemble by performing additional experiments and gathering new data. The goal of the present work is to design experiments that would lead to a significant reduction in the ensemble size, taking into account different aspects of model uncertainty.

Modern model-based experimental design aims at obtaining the most informative data from an experiment in order to validate the predictions of a model (e.g., gene expression profiles). For this purpose, the experimental conditions are usually optimized to obtain the maximum information from the data. Only recently has the uncertainty associated with the model parameters been taken into account, using Approximate Bayesian Computation Design (ABCD), where the parameter uncertainty is employed as a priori information. In this work we adapt the ABCD method such that both parameter and structure uncertainties can be considered. The resulting Ensemble-based Design Of Experiments (EDOE) gives the optimal experimental condition that simultaneously reduces the ensemble of structures and parameters. Briefly, the procedure consists of: (1) select the initial experimental design ξ(0); (2) draw a sample (s(0), θ(0)) from the ensemble of structures and parameters (i.e., a sample from the prior distribution p(s,θ), where s is the structure and θ is the vector of parameters); (3) evaluate model prediction y(0), thus producing a sample from the joint distribution p(y|s,θ;ξ(0)); (4) compute the design criterion h(0) according to the joint distribution p(y|s,θ;ξ(0)); (5) propose a new design ξ(c) from a Metropolis-Hastings probing density; (6) draw another sample (s(c), θ(c)) from the prior distribution p(s,θ) and evaluate y(c) from the joint distribution p(y|s,θ;ξ(c)); (7) compute the design criterion h(c) according to the joint distribution; (8) replace the existing design ξ(0) with the proposed design ξ(c) if the acceptance criterion is satisfied; (9) repeat steps 5-8 until convergence. We demonstrated the utility of EDOE on a case study of gene regulatory networks: the GATA-1, GATA-2, and PU.1 circuit governing the differentiation of myeloid progenitors.
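The nine-step procedure can be sketched as a short Metropolis-Hastings loop over designs. Everything specific below is an assumption made for illustration, not the authors' setup: the two toy model structures, the six-member ensemble, the scalar design range `BOUNDS`, the design criterion (squared distance of one sampled prediction from the ensemble mean), and the acceptance rule (ratio of criteria with a symmetric Gaussian proposal), none of which the abstract specifies.

```python
import random

# Hypothetical ensemble of (structure, parameter) pairs assumed to fit
# earlier data equally well.
ENSEMBLE = [(s, theta) for s in (0, 1) for theta in (0.8, 1.0, 1.2)]
BOUNDS = (0.0, 10.0)  # assumed admissible range of the scalar design xi


def predict(s, theta, xi):
    """Toy model output: structure 0 is linear, structure 1 saturates."""
    return theta * xi if s == 0 else theta * xi / (1.0 + xi)


def criterion(s, theta, xi):
    """Illustrative design criterion h: squared distance between one sampled
    prediction and the ensemble-mean prediction at design xi."""
    mean = sum(predict(a, b, xi) for a, b in ENSEMBLE) / len(ENSEMBLE)
    return (predict(s, theta, xi) - mean) ** 2 + 1e-12


def edoe_chain(n_steps=2000, step=1.0, seed=0):
    """Return the sequence of designs visited by the sampler."""
    rng = random.Random(seed)
    xi = 1.0                                            # step 1
    h = criterion(*rng.choice(ENSEMBLE), xi)            # steps 2-4
    visited = []
    for _ in range(n_steps):                            # step 9
        xi_c = xi + rng.gauss(0.0, step)                # step 5
        if BOUNDS[0] <= xi_c <= BOUNDS[1]:              # reject out-of-range designs
            h_c = criterion(*rng.choice(ENSEMBLE), xi_c)    # steps 6-7
            if rng.random() < min(1.0, h_c / h):        # step 8
                xi, h = xi_c, h_c
        visited.append(xi)
    return visited


if __name__ == "__main__":
    chain = edoe_chain()
    print(sum(chain) / len(chain))
```

Under these assumptions the chain spends more time at designs where ensemble members disagree, so a histogram of `visited` points to the experimental conditions most likely to shrink the ensemble.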

...............................................................................................................................
Tuesday, November 11
5:25 pm – 5:45 pm

SB T08
Reconstruction of gene regulatory networks based on repairing sparse low-rank matrices

Young Hwan Chang1, Roel Dobbe1, Palak Bhushan1, Claire Tomlin1

1University of California, Berkeley

With the growth of high-throughput proteomic data, in particular time series gene expression data from various perturbations, a general question that has arisen is how to organize inherently heterogeneous data into meaningful structures. Since biological systems such as breast cancer tumors respond differently to various treatments, little is known about exactly how these gene regulatory networks (GRNs) operate under different stimuli. For example, when we apply a drug-induced perturbation to a target protein, we often only know that the dynamic response of the specific protein may be affected. We do not know by how much, how long, and even whether this perturbation affects other proteins or not. Challenges due to the lack of such knowledge not only occur in modeling the dynamics of a GRN but also cause bias or uncertainties in identifying parameters or inferring the GRN structure.

This paper describes a new algorithm that enables us to estimate bias error due to the effect of perturbations and to correctly identify the common graph structure among biased inferred graph structures. To do this, we retrieve the common dynamics of a GRN subject to various perturbations. We refer to the task as “repairing”, inspired by “image repairing” in computer vision. The method can automatically and correctly repair the common graph structure across perturbed GRNs, even without precise information about the effect of the perturbations. We evaluate the method on synthetic data sets and demonstrate advantages over l1-regularized graph inference by advancing our understanding of how these networks respond across different targeted therapies.

............................................................................................................................
Tuesday, November 11
5:45 pm – 6:05 pm

SB T09
Understanding multicellular function and disease with human tissue-specific networks

Arjun Krishnan1, Casey Greene2, Aaron Wong1, Emanuela Ricciotti3, Rene Zelaya2, Daniel Himmelstein4, Daniel Chasman5, Garret Fitzgerald3, Kara Dolinski1, Tilo Grosser3, Olga Troyanskaya1

1Princeton University, 2Dartmouth College, 3University of Pennsylvania, 4University of California, San Francisco, 5Harvard Medical School

Tissue and cell-type identity lie at the core of human physiology and disease. Therefore, understanding the genetic underpinnings of complex tissues and individual cell lineages is crucial for developing improved diagnostics and therapeutics. Yet we still lack tools to systematically explore the landscape of genes and interactions that shape specialized cellular functions across hundreds of tissue types and cell lineages in the body. Here we present genome-wide functional interaction networks specific for each of 144 human tissues and cell types, developed using an integrative data-driven methodology. Our approach integrates thousands of diverse genome-scale datasets by simultaneously using both tissue-specific and functional contexts. This technique effectively leverages signals detected by distinct technologies from experiments spanning both tissues and disease states. The tissue networks predict lineage-specific response of genes to perturbation, reveal changing functional roles of genes depending on tissue context, and illuminate meaningful disease-disease associations. We show that genes with nominally significant p-values in genome-wide association studies (GWAS) can be used in conjunction with tissue-specific networks to identify biologically important disease-gene associations, a procedure we term NetWAS. NetWAS identifies disease-associated genes more accurately than GWAS alone or an approach using a non-tissue-specific functional network. Our webserver, GIANT (http://giant.princeton.edu), provides an interface to human tissue networks with multi-gene query capability, network visualization, analysis tools, and downloadable networks. GIANT also enables NetWAS reprioritization of users’ GWAS results.

............................................................................................................................
Tuesday, November 11
6:05 pm – 6:25 pm

SB T10
Pathway-based biomarkers specifically and robustly classify diverse multiple diseases

David Amar1, Tom Hait1, Ron Shamir1

1Tel Aviv University

Background: Gene expression signatures, serving as biomarkers, have been used successfully for prognosis, diagnosis, and patient stratification in cancer. However, such signatures are often not robust and sometimes perform poorly on new datasets. Moreover, standard case-control studies may yield a signature that is not specific to the tested disease. In addition, the set of genes constituting a signature often provides little insight about the disease etiology.

Methods: We compiled a compendium of annotated expression profiles from 174 gene expression studies from GEO, covering 13,314 samples from 17 different array technologies, and 1,699 RNA-Seq samples from TCGA. The RNA-Seq samples were used for validation only. Overall, our compendium covers 48 diseases, each covered by at least five different studies. Each sample was manually annotated with Disease Ontology terms. We used the compendium to learn a multi-label classifier. In order to avoid batch effects, leave-dataset-out cross validation was used to test the performance of the classifier on each disease. Previous studies sought classification of cases of one disease vs. all other samples. However, since many diseases originating from different tissues are included in the compendium, such classifiers may actually predict the tissue well but not the disease status. Thus, we take a more stringent approach: for each disease we test both for overall separation of cases vs. the rest, and for separation of cases vs. the 'disease controls' that were included in the same studies. In such an analysis, a good signature would produce biomarkers that are disease-specific and consistently distinguish the cases from the disease controls and the background samples.

Results: Our strategy produced high performance classifiers for 16 diseases, including cardiovascular disease, gastric, breast, and immune system cancers. For these diseases, accurate classification was obtained in cross validation and even on new datasets produced by RNA-Seq. In addition, our overall multi-label classifier outperformed previous studies while using a simpler classifier (e.g., Huang et al. (PNAS 2010) report 82% recall at 20% precision, while we get 90% recall at 20% precision, and 32% precision at 82% recall). We constructed a gene signature for each disease by including genes that (1) had a high importance score in the disease classifier, and (2) were markedly differential in the disease patients compared both to healthy controls from the same study and to samples of other diseases. Reassuringly, our cancer gene signature is enriched with pathways from the hallmarks of cancer: e.g., cell cycle, cell cycle checkpoints, and DNA replication. Similarly, for leukemia and cardiovascular disease, the gene signature is highly enriched for pathways that were previously associated with these diseases. Of note, 43% of our signature genes are not part of known pathways, suggesting that they can lead to new biological hypotheses. Moreover, our leukemia signature contains two known targets of leukemia drugs. In gastric cancers, a new up-regulated druggable gene, NR1I2, is suggested.

Conclusions: A judicious analysis of a large, heterogeneous compendium of expression profiles produces disease-specific diagnostic signatures and reveals previously unknown disease genes.

............................................................................................................................
Tuesday, November 11
6:25 pm – 6:45 pm

SB T11
Network modeling reveals key features of epithelial-to-mesenchymal transition dynamics in liver cancer invasion

Steven Steinway1, Jorge Gomez Tejeda Zañudo2, Thomas Loughran1, Reka Albert2

1University of Virginia, 2Penn State University

Epithelial-to-mesenchymal transition (EMT) is a developmental process hijacked by cancer cells to leave the primary tumor site, invade surrounding tissue, and establish distant metastases. A hallmark of EMT is the loss of E-cadherin expression, and one major signal for the induction of EMT is transforming growth factor beta (TGFβ), which is dysregulated in up to 40% of hepatocellular carcinoma (HCC). We previously constructed and experimentally validated an EMT network of 69 nodes and 134 edges by integrating the signaling pathways involved in developmental EMT and known dysregulations in invasive HCC (Steinway et al., Cancer Research, 2014). Currently, we are analyzing perturbations (through computation and experiments) to our network that suppress TGFβ-driven EMT, with the ultimate goal of identifying therapeutic interventions which suppress tumor invasion. We noticed that some perturbations produce steady states that differed substantially from previously identified epithelial and mesenchymal steady states in our model. Further analysis revealed that these perturbations lead to states that are intermediate to epithelial and mesenchymal phenotypes. Similar so-called “EMT hybrid” states have been described in the literature. Quantitative analysis of these attractors reveals that these hybrid states form a subset of steady states that are distinct from epithelial and mesenchymal steady states. Lastly, our results suggest that combinatorial inhibition can effectively suppress EMT. Out of 2346 possible combinations (of two nodes), our model predicts that 9 nodes in combinations (SMAD, ERK, SOS1, GRB2, RAS, DLL, NOTCH, CSL) will robustly suppress TGFβ-driven EMT. We have demonstrated experimentally that expression of these nodes is enriched in mesenchymal relative to epithelial phenotype HCC cell lines. Furthermore, we demonstrate that these knockout combinations act by disrupting feedback loops that drive the EMT process. 
We are currently working to validate these findings experimentally. These results support network modeling as an important tool to identify critical mediators in complex biological processes. We further propose network modeling as a tool to discover therapeutic targeting strategies within complex disease pathways, specifically in liver cancer invasion.
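The figure of 2346 two-node combinations quoted above is consistent with choosing unordered pairs from the 69 nodes of the network. A short check, with placeholder node labels that are purely illustrative:

```python
from itertools import combinations
from math import comb

N_NODES = 69  # size of the EMT network stated in the abstract


def knockout_pairs(nodes):
    """Enumerate every unordered pair of nodes as a candidate double knockout."""
    return list(combinations(nodes, 2))


if __name__ == "__main__":
    nodes = [f"node_{i}" for i in range(N_NODES)]  # placeholder labels
    print(comb(N_NODES, 2), len(knockout_pairs(nodes)))
```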

............................................................................................................................



WEDNESDAY, NOVEMBER 12

 


9:45 am – 10:05 am

SB T12
A canonical correlation analysis based dynamic Bayesian network prior to infer gene regulatory networks from multiple types of biological data

Brittany Baur1, Serdar Bozdag1

1Marquette University

One of the challenging and important computational problems in systems biology is to infer gene regulatory networks of biological systems. Several methods that exploit gene expression data have been developed to tackle this problem. In this study, we propose the use of copy number and DNA methylation data to infer gene regulatory networks. We developed an algorithm that scores regulatory interactions between genes based on canonical correlation analysis. In this algorithm, copy number or DNA methylation variables are treated as potential regulator variables and expression variables are treated as potential target variables. We first validated that the canonical correlation analysis method is able to infer true interactions with high accuracy. We showed that the use of DNA methylation or copy number datasets leads to improved inference over steady-state expression. Our results also showed that epigenetic and structural information could be used to infer the directionality of regulatory interactions. Additional improvements in gene regulatory network inference can be gleaned from incorporating the result as an informative prior in a dynamic Bayesian algorithm. This is the first study that incorporates copy number and DNA methylation into an informative prior in a dynamic Bayesian framework. By closely examining top-scoring interactions with different sources of epigenetic or structural information, we also identified potential novel regulatory interactions.

............................................................................................................................
Wednesday, November 12
10:05 am – 10:25 am

SB T13
Variability in B-vitamin dependencies in the human microbiome genomes

Matvei Khoroshkin1, Andrei Osterman2, Dmitry Rodionov2

1Institute for Information Transmission Problems, Russian Academy of Sciences, 2Sanford-Burnham Medical Research Institute

B vitamins are biochemical cofactors essential for all living systems. The human microbiota is the complex and dynamic community of commensal, symbiotic, and pathogenic microorganisms that are present on and within the human body, and it has an enormous impact on humans. We investigate the ability of bacteria from the human microbiome to produce and salvage B vitamins. We selected a reference set of 1143 bacterial genomes from 7 phyla out of those sequenced in the course of the Human Microbiome Project (HMP). By using the metabolic subsystems approach (as implemented in the SEED database) and analyzing genomic context and regulons, we reconstructed biochemical pathways for the synthesis of eight B vitamins (thiamin, riboflavin, niacin, biotin, pyridoxine, cobalamin, pantothenate, folate) and predicted putative vitamin transporters in the reference HMP genomes. Using the reconstructed metabolic pathways, we classified the HMP organisms with respect to their B-vitamin prototrophy or auxotrophy and their vitamin transport capabilities. Preferred patterns of vitamin dependency were attributed to a number of taxonomic units. For instance, the Bacteroides are mostly prototrophs capable of synthesizing all B vitamins except cobalamin. In contrast, the Lactobacillales are auxotrophs for all vitamins except folate. The reference HMP genomes show a relatively high level of conservation of vitamin synthesis phenotypes at the genus level: only 25% of the studied genera show variability of phenotypes for individual vitamins. We also identified patterns of vitamin dependency for a number of body sites. The gastrointestinal tract generally shows a prevalence of vitamin-prototrophic bacteria, whereas the oral cavity, urogenital tract, and blood are largely populated by vitamin auxotrophs.
This work is important for understanding the role of B vitamins in maintaining homeostasis of human microbiome community structures and for the future development of specific vitamin diets.

............................................................................................................................
Wednesday, November 12
10:25 am – 10:45 am

SB T14
Master regulators of luminal and basal subtypes of breast cancer

Archana Iyer1, Celine Lefebvre2, Yishai Shimoni1, Mukesh Bansal1, Mariano Alvarez1, Jose Silva3, Andrea Califano1

1Columbia University, 2Gustave Roussy, 3Mount Sinai Medical Center

Breast cancer is a heterogeneous group of diseases that can be stratified into several subgroups based on their molecular signature. Understanding the regulators of these molecular subtypes will allow us to make them more amenable to targeted therapies or personalized medicine. Here we present our discovery of master regulators that are important in the transcriptional regulation of the two major subtypes: basal and luminal. We reverse engineered a breast-cancer specific transcriptional network using large-scale gene expression datasets in breast cancer (TCGA, Metabric, UNC-300) to create a breast cancer interactome. Using MARINa (Master Regulator Inference Algorithm) we have identified specific transcription factors that regulate basal and luminal subtypes of breast cancer. We further validated these master regulators experimentally by performing a pooled shRNA screen on six independent cell lines (2 luminal, 3 basal, and one normal mammary epithelial cell line). The pooled shRNA screen was sampled at days 0, 10, 18, and 25, and genomic DNA was barcoded and sequenced using the Illumina MiSeq technology. Both computational predictions and our experimental results from the deconvolution of the shRNA screen validate the luminal transcription factors (FOXA1, ESR1, and GATA3). In addition, we discover novel luminal-specific transcription factors such as TFAP2C. For the basal subgroup, while we discover novel regulators, we also find that this group is more heterogeneous than the luminal group. Importantly, we can effectively target this subgroup with a combination of master regulators.

............................................................................................................................
Wednesday, November 12
11:10 am – 11:30 am

SB T15
Inferring the genome-wide functional modulatory network: a case study on the NF-κB/RelA transcription factor

Xueling Li1, Min Zhu2, Allan Brasier1, Andrzej Kudlicki1

1The University of Texas Medical Branch at Galveston, 2Hefei Institutes of Physical Science, Chinese Academy of Sciences

How different pathways lead to the activation of a specific transcription factor with specific effects is not fully understood. A modulatory network is composed of triplets of a specific transcription factor, target genes, and modulators. Modulators usually affect the activity of the specific transcription factor at the post-transcriptional level in a target gene-specific manner (action mode), which may be classified as enhancement, attenuation, or inversion of the activation or inhibition. Reconstructing such modulatory networks will help to interpret how transcription factors produce distinct gene responses to different stimuli. As a case study we inferred, from a large collection of expression profiles, all potential modulations of NF-κB/RelA. The predicted modulators include many proteins previously not reported as physically binding to RelA. The functions of the predicted modulators, which are consistent with the biological activities of NF-κB/RelA, include RNA processing, alternative splicing, cell cycle, mitochondrion, ubiquitin-dependent proteolysis, and ribosome biogenesis, and are consistent with the binding modulators in our previous study. The predicted genome-wide RelA modulators from different enriched pathways or processes exert specific prevalent action modes on distinct pathways through RelA. Also, the modulators from noncoding RNA (ncRNA), RNA binding proteins, transcription factors, cytoskeleton, and kinases modulate NF-κB/RelA activity with specific action modes consistent with their molecular functions and modulation level. Finally, we analyzed the modulatory network of NF-κB/RelA in the context of TGFB1-induced epithelial-mesenchymal transition (EMT). Here modulators of NF-κB/RelA included those involved in extracellular matrix (FBN1), cytoskeletal regulation (ACTN1), and tumor suppression (FOXP1).

............................................................................................................................
Wednesday, November 12
11:30 am – 11:50 am

SB T16
The gene expression cascade connecting p53 dynamics to cell fates

Antonina Hafner1, Jeremy Purvis2, Galit Lahav3

1Harvard University, 2University of North Carolina at Chapel Hill, 3Harvard Medical School

The dynamics of transcription factors have been shown to play important roles in a variety of biological systems. However, the mechanisms by which these dynamics are decoded to trigger specific responses are not well understood. Our study focuses on the tumor suppressor protein p53 and how its temporal dynamics control gene expression and cell fate decisions. We measured the genome-wide transcriptional response to different p53 dynamics with the goal of identifying the input-output relationship between p53 and its target genes.

In response to γ-irradiation, cells exhibit pulses in p53 protein levels. The number of p53 pulses is proportional to the irradiation dose, with higher doses leading to more pulses and pushing cells toward permanent cell cycle arrest (senescence). Changing the pulses into sustained p53 levels leads to senescence independently of the radiation dose. This suggests that a given temporal behavior profile of p53 can trigger a specific cellular outcome. p53 functions mainly as a transcription factor. To understand the relationship between p53 dynamics and the expression of its target genes, we exposed cells to pulsed or sustained p53 signaling and measured gene expression at high temporal resolution over several days.

Our analysis revealed clusters of genes with distinct temporal characteristics: induction versus repression, immediate versus delayed response, and transient versus sustained expression. I will present our data and analysis for dissecting the properties of each cluster and the role of various molecular mechanisms (e.g., p53 DNA binding, chromatin states, and post-transcriptional regulation of mRNA) in connecting p53 dynamics to the downstream gene expression response. Understanding the mechanisms that decode p53 dynamics into cellular outcomes will enable us to identify and test novel methods for pushing cells toward a specific fate.

............................................................................................................................
Wednesday, November 12
1:55 pm – 2:15 pm

SB T17
Pathways on demand: automated reconstruction of human signaling networks

Anna Ritz1, Christopher Poirel1, Allison Tegge1, Nicholas Sharp1, Allison Powell1, Kelsey Simmons1, Shiv Kale1, T. M. Murali1

1Virginia Polytechnic Institute and State University

Signaling pathways are a cornerstone of systems biology. Several databases store representations of these pathways that are amenable to automated analyses. Despite painstaking manual curation, significant variations exist between databases. To overcome these limitations, we present PathLinker, a new computational method that can reconstruct a signaling pathway from a background protein interaction network given only the identities of the receptors and transcription factors and regulators in that pathway. We demonstrate that PathLinker can reconstruct the Wnt pathway in the NetPath database with much higher precision and recall than several state-of-the-art algorithms, recovering non-canonical branches that appear only in this pathway's representation in other databases. PathLinker suggests a surprising role for CFTR, a chloride ion channel transporter of the ABC class, in Wnt/beta-catenin signaling, which we validate using siRNA experiments. We extend our computational results to accurately reconstruct a comprehensive set of signaling pathways in the NetPath database. We demonstrate that PathLinker can bridge differing representations of the same pathway between databases.
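
As a rough illustration of connecting receptors to transcription factors through a background interaction network, the sketch below ranks loop-free receptor-to-TF paths by total edge cost and keeps the k cheapest. Ranking by k shortest paths is an assumption of this sketch, since the abstract does not describe the algorithm itself; the toy network, edge costs, and function name are hypothetical.

```python
import heapq

def k_shortest_paths(graph, sources, targets, k):
    """Return up to k cheapest loop-free paths from any source to any target.

    graph maps node -> {neighbor: cost}; costs must be non-negative.
    Best-first search over partial paths: simple, but exponential in the
    worst case, so only suitable for small illustrative networks.
    """
    heap = [(0.0, [s]) for s in sorted(sources)]
    heapq.heapify(heap)
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node in targets:
            found.append((cost, path))
            continue  # do not extend a path beyond a transcription factor
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in path:  # keep paths loop-free
                heapq.heappush(heap, (cost + weight, path + [nxt]))
    return found

# Hypothetical background network: receptor "R", intermediates "A" and "B",
# transcription factor "TF". Lower cost means a more confident interaction.
network = {
    "R": {"A": 1.0, "B": 2.0},
    "A": {"TF": 1.0, "B": 0.5},
    "B": {"TF": 1.0},
}
paths = k_shortest_paths(network, {"R"}, {"TF"}, k=2)
```

The union of the edges on the returned paths would form the candidate pathway; increasing k trades precision for recall.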

............................................................................................................................
Wednesday, November 12
2:15 pm – 2:35 pm

SB T18
Cell-to-cell variability in overcoming a caspase-8 activity threshold explains fractional killing by TRAIL

Marc Hafner1, Jeremie Roux1, Samuel Bandara1, Joshua J. Sims1, Diana Chai2, Peter K. Sorger1

1Harvard Medical School, 2Merrimack Pharmaceuticals

Ligands and DR4/5-receptor agonist antibodies such as TRAIL or Apomab trigger apoptosis in tumor cells. Although promising, drugs targeting this pathway have stalled in Phase II/III of clinical trials because of variable efficacy. Many mechanisms of resistance have been proposed to explain patient-to-patient variability but no quantitative model has been built to evaluate and compare these different hypotheses. In this work, we developed a quantitative model of the death-inducing signaling complex (DISC) to identify dynamical features that predict cell fate. We used this model to understand how resistance genes prevent apoptosis and found drug combinations that overcome this resistance.
Using live cell microscopy and HeLa cells engineered with a FRET reporter of Bid cleavage, we monitored caspase-8 activity at the single cell level after exposure to TRAIL. For each cell (a few hundred per condition), we derived parameters characterizing DISC activity. We found that the maximal value of the FRET ratio (i.e., integrated caspase-8 activity) does not differ between surviving and dying cells, but its derivative (a surrogate of instantaneous caspase-8 activity) is significantly lower in surviving cells; only cells with a caspase-8 activity reaching a specific threshold θ will die. Based on a simple mechanistic model, the maximal caspase-8 activity is the product of k, the rate of caspase-8 activation, and τ, the duration of this activation. Therefore, the three parameters k, τ, and θ determine cell fate at the single cell level and explain the fate divergence across an isogenic population with more than 70% accuracy.

Higher doses of TRAIL increase the rate k and consequently induce more killing. This relation also holds for the DR4/5 antibodies Mapatumumab and Apomab, although the caspase-8 activation rate is lower, which results in less than 50% and 10% killing, respectively, at saturating doses. We found that clustering these antibodies significantly increases the rate k beyond the saturation value and therefore induces more cell death. Using Bortezomib, we were also able to move cells along the τ-axis. As predicted by our model, for treatments where the rate k is high enough, Bortezomib strongly synergizes with the DR agonists to induce apoptosis. In particular, Apomab, which has low efficacy when used as a single agent (<8% killing), can kill the majority of cells when combined with the clustering agent and Bortezomib.

Using this framework, we studied how the FLICE-inhibitory proteins (FLIP-L or -S for the long or short forms) induce resistance to TRAIL. We found that FLIP overexpression is correlated with a decreased caspase-8 activation rate and prevents death even at the highest doses of TRAIL. Based on the values of k and the half-life of each FLIP isoform, we predicted that only FLIP-L overexpressing cells will be sensitized by Bortezomib, whereas cells with high levels of FLIP-S will not because of the stronger inhibitory effect of FLIP-S on caspase-8 activity. Our results confirmed these predictions and validated our model. In conclusion, we developed a framework to quantitatively understand the mechanisms of resistance to TRAIL and DR4/5 antibodies, and showed that it can be used to find drugs working in synergy with DR agonists.
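
The threshold rule in this abstract can be written down in a few lines: maximal caspase-8 activity is the product k·τ, and a cell is predicted to die when that value reaches θ. The sketch below is only an illustration of that rule; the function names, cell labels, and numeric values are hypothetical and are not taken from the study.

```python
def max_caspase8_activity(k, tau):
    """Maximal activity as the product of activation rate k and duration tau."""
    return k * tau

def predicted_to_die(k, tau, theta):
    """A cell is predicted to die when its maximal activity reaches theta."""
    return max_caspase8_activity(k, tau) >= theta

# Hypothetical cells: (activation rate k, activation duration tau), arbitrary units.
cells = {
    "fast_short": (2.0, 1.0),
    "slow_long": (0.5, 6.0),
    "slow_short": (0.5, 2.0),
}
theta = 2.5  # hypothetical death threshold
fates = {name: predicted_to_die(k, tau, theta) for name, (k, tau) in cells.items()}

# Lengthening tau (the axis the abstract associates with Bortezomib) can push
# a surviving cell over the threshold without changing k.
rescued = predicted_to_die(2.0, 1.5, theta)
```

In this toy setting only the cell with the long activation window dies, and extending τ for the fast-activating cell flips its predicted fate, which mirrors the synergy argument made for treatments where k is already high.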

............................................................................................................................
Wednesday, November 12
2:35 pm – 2:55 pm

SB T19
Synthesizing signaling pathways from temporal phosphoproteomic data

Ali Sinan Köksal1, Anthony Gitter2, Kirsten Beck3, Aaron McKenna3, Saurabh Srivastava4, Nir Piterman5, Rastislav Bodík1, Alejandro Wolf-Yadlin3, Ernest Fraenkel6, Jasmin Fisher7

1University of California, Berkeley, 2University of Wisconsin-Madison, 3University of Washington, 420n, 5University of Leicester, 6Massachusetts Institute of Technology, 7Microsoft Research Cambridge

Advances in proteomic measurements reveal that even the best-curated pathways fail to capture a large fraction of signaling events. Here we propose a synthesis approach to produce precise models of signal transduction from temporal phosphoproteomic data. We first integrate the time series data with a protein-protein interaction network to produce an initial undirected graph. Using program synthesis techniques, we exhaustively explore all possible signaling pathways that are consistent with the proteomic data and the initial graph without explicitly enumerating all models. These pathways must satisfy several logical constraints. Most notably, a chain of events initiating at the source of the stimulation, the epidermal growth factor (EGF) in this case, must explain the activation or inhibition of each phosphorylated protein. In addition, the timing of all events must agree with the temporal data such that upstream proteins are not activated or inhibited after their downstream neighbors. This approach identifies parts of the network that are consistent with all possible pathway models. We are able to determine the direction of interactions, whether edges activate or inhibit, and the times at which proteins are activated.

Using new mass spectrometry data of the temporal EGF response in EGFR Flp-In HEK-293 cells, we show that nearly all proteins that change significantly in phosphorylation (89 to 98% depending on the database) are absent from canonical maps of the epidermal growth factor receptor (EGFR) signaling. Our computational approach reconstructs and summarizes all valid pathway models that explain how proteins are activated or inhibited by EGF. Collectively, these models account for 83% of the significant proteins and contain 413 protein-protein interactions, of which 200 can be confidently assigned a direction. In all cases where we predict a directed interaction between two EGFR pathway nodes, the prediction is correct. We use three natural language processing (NLP) tools to search for literature support for 54 predicted pathway edges that are peripheral to known EGFR pathway interactions. Manually verifying the results, we find that the direction is correct for 15 of the 16 predictions for which there is a definitive direction in the literature. Overall, of the 200 predicted directed pathway edges, 82 are supported by the canonical EGFR pathway, NLP, or kinase-substrate interactions (whose directions are included as prior knowledge). We are presently testing several predictions experimentally by assessing whether kinase inhibitors disrupt phosphorylation of the predicted substrate at the specific times proposed by our model. In summary, our computational approach identifies many previously unrecognized components of a well-studied signaling pathway. Our technique is broadly applicable to systems where dynamic proteomic data is available and has great potential for constructing pathway maps in conditions that alter classic signaling cascades, such as in diseased cells.
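
The timing constraint described above (an upstream protein may not be activated after its downstream neighbor) already orients some undirected edges on its own. The sketch below illustrates only that single constraint, with hypothetical protein names and activation times; it is not the program-synthesis procedure used in the work and it ignores the requirement that every protein be reachable from the stimulus.

```python
def orient_edges(edges, activation_time):
    """Orient each undirected edge from the earlier-activated protein to the later one.

    Edges whose endpoints share an activation time, or lack one, stay
    undirected because the timing constraint alone cannot decide them.
    """
    directed, undecided = [], []
    for a, b in edges:
        ta, tb = activation_time.get(a), activation_time.get(b)
        if ta is None or tb is None or ta == tb:
            undecided.append((a, b))
        elif ta < tb:
            directed.append((a, b))
        else:
            directed.append((b, a))
    return directed, undecided

# Hypothetical first-response times (minutes after stimulation).
times = {"EGFR": 0, "P1": 2, "P2": 4, "P3": 4}
# Hypothetical undirected protein-protein interactions; P4 has no measured time.
ppi = [("P1", "EGFR"), ("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
directed, undecided = orient_edges(ppi, times)
```

Edges left undecided here are the ones for which several pathway models remain consistent with the data, which is why the abstract reports a direction for only part of the interactions.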

............................................................................................................................
Wednesday, November 12
2:55 pm – 3:15 pm

SB T20
Dissecting germ cell metabolism through network modeling

Leanne Whitmore1, Ping Ye1

1Washington State University

Metabolic pathways are increasingly postulated to be vital in programming cell fate, including stemness, differentiation, proliferation, and apoptosis. The commitment to meiosis is a critical fate decision for mammalian germ cells, and involves a key metabolite, retinoic acid (RA). Recent evidence suggests that a pulse of RA is generated in the male mouse, thereby triggering meiotic commitment. However, enzymes and reactions that regulate this RA pulse have yet to be identified in germ cells. We developed a genome-scale mouse metabolic network with a refined RA pathway. Using this network, we implemented flux balance analysis throughout the initial synchronized wave of spermatogenesis to elucidate important reactions and enzymes for the generation and degradation of RA. Our results indicated that the primary RA source is the extracellular region and the major RA sink is nuclear transport. We further performed in silico knockouts of genes and reactions in the RA pathway and discovered that retinol binding to proteins is crucial for successful meiotic commitment. Finally, we examined the activity of other metabolic pathways in the genome-scale network and found that fatty acid synthesis and oxidation are the primary sources of energy in germ cells. This study predicts the enzymes, reactions, and pathways that are most important for germ cell commitment to meiosis. Findings from this study enhance our understanding of the metabolic control of germ cell fate and will be critical for guiding future experiments to improve reproductive health.

............................................................................................................................
Wednesday, November 12
3:40 pm – 4:00 pm

SB T21
A scalable method for molecular network reconstruction identifies properties of targets and mutations in acute myeloid leukemia

Edison Ong1, Anthony Szedlak2, Yunyi Kang, Peyton Smith1, Nicholas Smith1, Madison McBride3, Darren Finlay3, Kristiina Vuori3, James Mason4, Edward D. Ball5, Carlo Piermarocchi2, Giovanni Paternostro3

1Salgomed, 2Michigan State University, 3Sanford-Burnham Medical Research Institute, 4Scripps Health, San Diego, 5University of California, San Diego

A key aim of systems biology is the reconstruction of molecular networks. However, we do not yet have networks that integrate information from all datasets available for a particular clinical condition. This is in part due to the limited scalability, in terms of required computational time and power, of existing algorithms. Network reconstruction methods should also be scalable in the sense of allowing scientists from different backgrounds to efficiently integrate additional data.

We present a network model of acute myeloid leukemia (AML). In the current version (AML 2.1) we have used gene expression data (both microarray and RNA-seq) from five different studies comprising a total of 771 AML samples and a protein-protein interactions dataset. Our scalable network reconstruction method is in part based on the well-known property of gene expression correlation among interacting molecules. The difficulty of distinguishing between direct and indirect interactions is addressed by optimizing the coefficient of variation of gene expression, using a validated gold standard dataset of direct interactions. Computational time is much reduced compared to other network reconstruction methods. A key feature is the study of the reproducibility of interactions found in independent clinical datasets.

An analysis of the most significant clusters, and of the network properties (intraset efficiency, degree, betweenness centrality, and PageRank) of common AML mutations demonstrated the biological significance of the network. A statistical analysis of the response of blast cells from eleven AML patients to a library of kinase inhibitors provided an experimental validation of the network. A combination of network and experimental data identified CDK1, CDK2, CDK4, CDK6, and other kinases as potential therapeutic targets in AML.

............................................................................................................................
Wednesday, November 12
4:00 pm – 4:20 pm

SB T22
Enhancer poising and regulatory locus complexity shape transcriptional programs in hematopoietic differentiation

Alvaro Gonzalez1, Manu Setty2, Christina Leslie1

1Memorial Sloan Kettering Cancer Center, 2Columbia University

We carried out an integrative analysis of enhancer landscape and gene expression dynamics in hematopoietic differentiation using DNase-seq, histone mark ChIP-seq, and RNA-seq in order to model how enhancer poising and regulatory locus complexity together govern gene expression changes at cell state transitions. We found that high complexity genes – i.e., those with a large total number of DNase-mapped enhancers across the lineage – differ architecturally and functionally from low complexity genes, achieve larger expression changes, and are enriched for both cell-type specific and “transition” enhancers, which are established in HSPCs and maintained in one differentiated cell fate but lost in others. We then developed a quantitative model to predict gene expression changes from the DNA sequence content and lineage history of active enhancers. Our method accurately predicts expression changes for high complexity genes during differentiation, suggests a novel mechanistic role for PU.1 at transition peaks in B cell specification, and can be used to correct enhancer-gene assignments.

............................................................................................................................
Wednesday, November 12
5:05 pm – 5:25 pm

SB T23
Disease gene prioritization using network and feature

Bingqing Xie1, Gady Agam1, Sandhya Balasubramanian2, Jinbo Xu3, Natalia Maltsev2, Conrad Gilliam2, Daniela Boernigen2

1Illinois Institute of Technology, 2University of Chicago, 3Toyota Technological Institute of Chicago

Identification of the most promising candidate genes contributing to disease phenotypes among large lists of variations produced by high-throughput genomics using traditional experimental methods is time-consuming and costly. Therefore, using computational approaches that utilize existing biological knowledge for the prioritization of such candidate genes will enhance the efficiency and accuracy of the analysis of biomedical data. It will also reduce the cost of the studies by avoiding experimental validations of irrelevant candidates. To prioritize candidate genes contributing to a disease or phenotype of the user’s interest for further testing, we present in this study a novel algorithm that uses both types of information sources, gene annotations and gene interactions, simultaneously, while preserving their original representation, using a Conditional Random Field (CRF) model. We further improve the accuracy and efficiency of our proposed approach by assigning enrichment scores to the annotation feature factors within the model. To estimate the performance of our approach, we evaluated it on two independent benchmark studies, ranking the candidate genes by both network and feature knowledge. Overall, our results had high Area Under Curve (AUC) values and high partial AUC (pAUC) values on various disease benchmarks and revealed higher accuracy and precision at the top predictions (10%) compared with other prioritization tools. Additionally, we applied our method to a case study for the prediction of molecular mechanisms contributing to intellectual disability and autism. Our method was able to recover additional genes related to both disorders and provide suggestions for possible candidates based on their rankings and functional categories.

............................................................................................................................
Wednesday, November 12
5:25 pm – 5:45 pm

SB T24
Elucidating compound mechanism of action by network dysregulation analysis in perturbed cells

Mukesh Bansal1, Jung Hoon Woo1, Yishai Shimoni1, Archana Iyer1, Andrea Califano1, Charles Karan2, Gonzalo Lopez1, Paola Nicoletti1, Maria Rodriguez-Martinez3, Prem Subramaniam1, Wan Seok Yang1, Ronald Realubit1, Brent R. Stockwell1, Michela Mattioli4

1Columbia University, 2Columbia University Medical Center, 3IBM, 4Istituto Italiano di Tecnologia (IIT)

Genome-wide identification of small-molecule compound targets and effectors, within specific tissues, represents a highly relevant yet equally elusive question, with critical implications in the assessment of compound efficacy and potential toxicity in drug discovery. Experimental approaches are labor-intensive, mostly in vitro, and limited to specific protein classes (e.g., protein kinases and other enzymes), thus potentially missing proteins responsible for undesired toxicity. Computational approaches are virtually non-existent. We introduce a new regulatory-network based algorithm for elucidating compound mechanism of action (MoA). Experimental validation, using large collections of molecular profiles following compound perturbations, confirmed its ability to correctly identify established MoA proteins for >80% of tested compounds, including specific effectors of drug toxicity, such as SIK1 for doxorubicin. Several new predicted effectors were experimentally validated, including RPS3A, VHL, and CCNB1 as effectors of the mitotic spindle inhibitor Vincristine and JAK2 as a novel modulator of Mitomycin C sensitivity. Finally, the algorithm was effective in identifying specific proteins responsible for compound MoA similarity, such as GPX4, an established effector of sulfasalazine which was inferred and validated also as a direct target of altretamine, responsible for the compounds’ MoA similarity through increase in lipid ROS levels. This suggests that regulatory networks can provide novel mechanistic insight into drug activity, thus contributing to the characterization of potent, non-toxic small-molecule inhibitors.

............................................................................................................................
Wednesday, November 12
5:45 pm – 6:05 pm

SB T25
Network Infusion to infer information sources in networks

Soheil Feizi1, Ken Duffy2, Muriel Medard1, Manolis Kellis1

1Massachusetts Institute of Technology, 2Hamilton Institute

Several models exist for diffusion of signals across biological, social, or engineered networks. However, the inverse problem of identifying the source of such propagated information seems on the surface intractable, even in the presence of multiple network snapshots, and especially for the single-snapshot case, given the many overlapping paths in real-world networks. Mathematically, this problem can be undertaken using a diffusion kernel that represents diffusion processes in a given network, but computing this kernel is generally intractable.

Here, we introduce a modified diffusion kernel that relaxes the path-coupling constraints by only considering k independent shortest paths among pairs of nodes, assuming an exponential time distribution for node-to-node spreading. We use the resulting Erlang network diffusion kernel to solve the inverse diffusion problem using both likelihood maximization and error minimization. We apply this framework for both single-source and multi-source diffusion, for both single-snapshot and multi-snapshot observations, and using both uninformative and informative prior probabilities for candidate source nodes.

We apply Network Infusion (NI) to identify disease-causing genes of several human diseases including T1D, Parkinson’s, MS, SLE, CVD, CAD, psoriasis, and schizophrenia, and show that NI infers candidate disease-causing genes that are biologically relevant and often not distinguishable using the raw p-values. In a second application, we identify the news sources for 3553 stories in the Digg social news network, and validate our results based on annotated information that was not provided to our algorithm. We also apply NI to several synthetic networks and compare its performance to centrality-based and distance-based methods for Erdos-Renyi graphs, power-law networks, symmetric grids, and asymmetric grids.

We also provide proofs that under a standard susceptible-infected (SI) diffusion model, (1) the maximum-likelihood Network Infusion method is mean-field optimal for tree structures or sufficiently sparse Erdos-Renyi graphs, (2) the minimum-error algorithm is mean-field optimal for regular tree structures, and (3) for sufficiently-distant sources, our multi-source solution is mean-field optimal in the regular tree structure.
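
To make the Erlang kernel idea more concrete: if each hop takes an independent exponentially distributed time, the arrival time along a path with a given number of hops follows an Erlang distribution, and independent paths can be combined by multiplying their miss probabilities. The sketch below assumes a common spreading rate for all edges and treats the k paths as fully independent; the function names and example values are hypothetical, and this is not the authors' implementation.

```python
import math

def erlang_cdf(hops, rate, t):
    """P(sum of `hops` independent Exponential(rate) delays <= t)."""
    x = rate * t
    return 1.0 - math.exp(-x) * sum(x**n / math.factorial(n) for n in range(hops))

def reach_probability(path_lengths, rate, t):
    """Probability that at least one of several independent paths delivers by time t."""
    miss = 1.0
    for hops in path_lengths:
        miss *= 1.0 - erlang_cdf(hops, rate, t)
    return 1.0 - miss

# Hypothetical node pair joined by two independent shortest paths of 2 and 3 hops.
p = reach_probability([2, 3], rate=1.0, t=2.0)
```

Evaluating this quantity for every candidate source against an observed snapshot gives the likelihood terms that a maximum-likelihood or minimum-error source estimate would compare.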

