
Posters

Poster numbers will be assigned May 30th.
If you cannot find your poster below, that probably means you have not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the poster acceptance email; it contains a confirmation link. Click on it and follow the instructions.

If you need further assistance, please contact submissions@iscb.org and provide your poster title or submission ID.

Category O - 'Systems Biology and Networks'

O001 - Across the Bridges and into the Trees

Short Abstract: Circadian clocks are ubiquitous and are found in bacteria, fungi, plants and animals. They phase cellular processes and behavior to specific times of day and anticipate daily diurnal changes, providing a fitness advantage. The difficulty in elucidating the circadian clock is rooted in the notion that two independent factors contribute to controllability: the system's architecture (which components interact with each other) and the dynamic rules that capture the time-dependent interactions between components. Identifying the principal actors (several have been characterized) inside a molecular and phenotypic network or inside a community is important for understanding the topology of a complex network. Previously, we developed novel methods for identifying circadian genes from short time-course microarray data and for identifying the individual regulatory motifs that aggregate into coherent motif clusters capable of predicting the phase of a clock gene with high fidelity. Such motifs form the backbone of regulatory networks and play a central role in defining their global topological organization. Here, we integrate gene expression profiles and protein interaction maps to provide a systematic and global view of the combinatorial network modules underlying representative circadian programs. Furthermore, we integrate the newly discovered cis-regulatory modules into the circadian regulatory networks. This study forms the beginning of an analytical framework that should allow one to study the controllability of a complex system like the plant circadian clock through the combination of driver nodes with their time-dependent control, reflecting the system's dynamic logic. Such a network will provide a quantitative outlook on an agronomically important network.

O002 - The just-in-time expression of yeast ribosomal proteins

Short Abstract:
Gang Chen, Bernard Fongang, Xueling Li, Andrzej Kudlicki

The mature eukaryotic ribosome consists of ribosomal RNAs and several dozen ribosomal proteins (RPs). Under certain conditions, the output of mature ribosomes in a synchronized cell culture is strongly modulated in time, which allows model-based deconvolution to be applied to find peaks in the temporal profiles of the individual ribosomal genes. These methods have been shown to yield a timing resolution higher than that of the original time-course data. We analyzed time-course expression data from a metabolically regulated yeast culture using an empirical model of the temporal profile; this procedure allows precise estimation of the times of the expression peaks of ribosomal genes. The results reveal different time delays in the expression of the ribosomal subunits, spanning an interval of approximately 20 minutes.
The expression times reflect the position of the gene product in the ribosome (the early expressed RPs tend to be buried deeper in the ribosome), which supports the “just-in-time transcription” view of the regulation of subunits of macromolecular complexes. We also find significant differences between the regulatory elements of the early and late expressed genes, which may be responsible for regulating the timing. The most prominent example (p=0.009) is the correlation between the expression time and the distance from the Rap1 motif to the 5’ end of the coding sequence. This type of fine-tuned regulation of expression timing has not been described previously in eukaryotes.

O003 - A Generalized Framework for Interactomes with Time-varying Topology

Short Abstract: Topological analyses of interactomes have focused on static overall networks that represent interactions regardless of their temporal context. However, even if one assumes that no false interactions are included in the network, not all network interactions occur at the same time. Therefore, conclusions based on the topology of such overall networks may not be valid. Consequently, one must consider interactomes to have dynamic topology that affects and is affected by the functions they perform.

We introduce a generalized framework for the simulation and integration of dynamic interactomes and provide an example of its implementation. The framework describes a network as an evolving system partitioned into an edge evolution model and a node evolution model. This framework can be used to simulate and integrate interactomes of varying types, sizes and complexities, and it can serve as a basis for the study and analysis of dynamic interactomes. In addition, we demonstrate that time-varying topological node metrics (such as time-averaged betweenness centrality) correlate with the nodes' functional context more strongly than metrics measured on the overall static network.
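To make the notion of a time-averaged node metric concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a dynamic interactome is given as a list of edge-list snapshots and that "time-averaged" means the arithmetic mean of unnormalized betweenness (computed with Brandes' algorithm) across snapshots; the function names are hypothetical and this is not the framework described above. In the toy example, node b has nonzero averaged betweenness even though it has none in the static union of all edges.

```python
from collections import deque


def betweenness(nodes, edges):
    """Brandes' algorithm on an unweighted, undirected graph (unnormalized scores)."""
    adj = {v: set() for v in nodes}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected shortest path was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


def time_averaged_betweenness(nodes, snapshots):
    """Hypothetical helper: mean of per-snapshot betweenness over a list of edge lists."""
    totals = dict.fromkeys(nodes, 0.0)
    for edges in snapshots:
        for v, c in betweenness(nodes, edges).items():
            totals[v] += c
    return {v: t / len(snapshots) for v, t in totals.items()}


nodes = ["a", "b", "c"]
snapshots = [[("a", "b"), ("b", "c")], [("a", "c")]]
print(time_averaged_betweenness(nodes, snapshots))  # {'a': 0.0, 'b': 0.5, 'c': 0.0}
print(betweenness(nodes, [e for s in snapshots for e in s]))  # {'a': 0.0, 'b': 0.0, 'c': 0.0}
```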

O004 - Prediction of synthetic lethal interactions by network topological features using decision tree modelling

Short Abstract: Synthetic lethal interactions (SLIs) are genetic interactions in which mutations or deletions in separate genes result in lethality when combined in the same cell under a given condition. Detecting such interactions may provide clues about how cellular processes work when the protein products of two different genes have an effect together but not separately; however, testing all gene combinations under many different conditions to detect which genetic interactions are lethal is time-consuming and labor-intensive. To expand the set of known SLIs, many attempts have been made to construct models that predict SLIs. Most of these models, however, are primarily based on SLI-dependent features, that is, features extracted from the known SLI network itself. Thus, these models are far less effective in predicting gene pairs that are not well connected with the known SLIs. Here we developed a decision tree model based on SLI-independent features, specifically network centrality measures calculated from an integrated network of gene interactions (INGI), a network containing protein-protein, metabolic and transcriptional regulatory interactions simultaneously, to predict SLIs in Saccharomyces cerevisiae. This model recovers 72% of known SLIs with a precision of 72%. Furthermore, the probability that our decision tree model ranks a randomly selected SLI higher than a randomly selected non-SLI is 80%, as indicated by the area under the receiver operating characteristic curve (AUC). Interestingly, although our model is based on only six network centrality measures, it outperformed a previously developed model based on hundreds of SLI-independent features.
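The AUC interpretation quoted above (the probability that a randomly selected SLI is ranked higher than a randomly selected non-SLI) can be computed directly by comparing every positive/negative score pair. A minimal sketch, with a hypothetical function name and toy scores, counting ties as one half:

```python
def auc_by_ranking(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher.

    Ties count as half a win, so this equals the area under the ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores: 5 of the 6 positive/negative pairs are ordered correctly.
print(auc_by_ranking([0.9, 0.8, 0.4], [0.7, 0.3]))
```

The all-pairs loop is quadratic; for large score lists the same value can be obtained from ranks (the Mann-Whitney U statistic), but the pairwise form mirrors the probabilistic definition most directly.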

O005 - ARNI: Abductive inference of complex regulatory network structures.

Short Abstract: Introduction
The cellular response to perturbation is controlled by complex, interconnected molecular interactions comprising changes in both signalling and transcriptional regulatory components. A fundamental challenge in systems biology is to extract the integrated regulatory network activated as part of such a response. Current inference methods have limited expressiveness and applicability, relying on cause-effect pairs and systematically perturbed datasets.

Methods
We present a novel framework, Abductive Regulatory Network Inference (ARNI), for building integrated models of regulatory networks from measurements under the influence of a single environmental factor or perturbation. We model the reconstruction problem as an abductive inference procedure that integrates elements from model checking, state predictions and topology inference. Logical rules use prior knowledge from online databases, and a sign-consistency model to determine how affected genes are organized in a signed-directed network.
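A sign-consistency check of the kind mentioned above can be sketched as follows. This is a simplified illustration, not ARNI's abductive procedure: it assumes each edge carries a sign (+1 activation, -1 inhibition), observed genes are labeled up (+1) or down (-1), and a non-input gene counts as explained if at least one observed regulator's change multiplied by the edge sign matches the gene's own change. All names are hypothetical.

```python
def unexplained_nodes(edges, observed, inputs=()):
    """Return observed genes whose sign change no incoming edge can explain.

    edges: iterable of (regulator, target, sign), sign +1 (activation) or -1 (inhibition)
    observed: dict gene -> +1 (up) or -1 (down)
    inputs: genes treated as directly perturbed, so they need no explanation
    """
    incoming = {}
    for reg, tgt, sign in edges:
        incoming.setdefault(tgt, []).append((reg, sign))
    bad = set()
    for gene, change in observed.items():
        if gene in inputs:
            continue
        # Consistent if some observed regulator's change, propagated through the edge sign, matches.
        explained = any(
            reg in observed and observed[reg] * sign == change
            for reg, sign in incoming.get(gene, [])
        )
        if not explained:
            bad.add(gene)
    return bad


edges = [("S", "A", 1), ("A", "B", -1)]
print(unexplained_nodes(edges, {"S": 1, "A": 1, "B": -1}, inputs={"S"}))  # set()
print(unexplained_nodes(edges, {"S": 1, "A": 1, "B": 1}, inputs={"S"}))   # {'B'}
```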

Results
We evaluate our approach using in-silico datasets provided by the DREAM (Dialogue for Reverse Engineering Assessments and Methods) consortium. Comparing it with existing approaches, we show that ARNI achieves better coverage, enriching networks with feedback loops, competitive gene influences and coordinated regulation, while retaining scalability and robustness. We present a sensitivity analysis of ARNI's ability to infer the correct network under (i) an increasing number of unobserved genes (influenced by the noise in the data) and (ii) decreasing availability of prior knowledge.

Conclusions
The wider applicability and improved expressiveness of ARNI are expected to help elucidate more realistic networks that better capture the dynamics of the system.

O006 - Classifying Protein–Protein Interactions

Short Abstract: Modern high-throughput techniques have made it possible to identify thousands of protein-protein interactions (PPIs) and to reconstruct interaction networks for a growing number of model organisms. However, these methods are only capable of measuring whether or not two proteins engage in a physical interaction or are part of the same complex. This binary classification of PPIs (yes or no) does not fully reflect the diversity of PPI types in different cellular contexts. In this work we focused on two aspects of PPIs. The first aspect concerns the distinction between obligate and non-obligate interactions based on the stability of protein complexes. Proteins that take part in obligate PPIs are not found as stable structures on their own in vivo, while non-obligate interactors can exist independently. The second biological property of interest is whether a protein interacts with multiple partners at the same time (simultaneously possible (SP) interactions) or can interact with only one partner at a time (mutually exclusive (ME) interactions). Here, we describe the first accurate machine learning classifier for SP/ME (auROC of 0.835) and obligate/non-obligate (auROC of 0.881) interactions that uses only network and sequence features of proteins and does not rely on known three-dimensional structures. We describe informative sequence features and characteristic network topologies for each interaction type. We also present the results of a large-scale application of our classifier to the comprehensive interaction networks obtained from the iRefIndex database and evaluate the results using a collection of manually curated protein complexes from the CORUM database.

O007 - The regulatory network that controls T lymphopoiesis

Short Abstract: A large amount of molecular information was integrated to infer the regulatory network that controls the differentiation process of T lymphocytes. The regulatory network consists of 50 nodes and 97 interactions, representing the main signaling circuits established between molecules and molecular complexes regulating the differentiation of T cells. This network was converted into a continuous dynamical system using the standardized qualitative dynamical systems (SQUAD) methodology, and its dynamical behavior was studied with the aid of numerical methods. The dynamical system has nine fixed-point attractors, which correspond to the activation patterns observed experimentally for the following cell types: CD4-CD8-, CD4+CD8+, CD4+ naive, Th1, Th2, Th17, Treg, CD8+ naive, and CTL. Furthermore, the model is able to describe the differentiation process from the precursor CD4-CD8- to any of the effector types in response to a specific series of extracellular signals.
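SQUAD turns each node of a regulatory network into an ordinary differential equation in which a sigmoid of the node's total input ω (between 0 and 1) drives activation and a decay term γx pulls it back. The sketch below uses the sigmoid form commonly given for SQUAD, with gain h. It assumes ω is supplied as a constant rather than computed from the node's activators and inhibitors, and integrates a single node with forward Euler; the function names and parameter values are illustrative, not taken from the T-cell model.

```python
import math


def squad_sigmoid(omega, h=10.0):
    """SQUAD-style activation: maps total input omega in [0, 1] onto [0, 1] with gain h."""
    num = -math.exp(0.5 * h) + math.exp(-h * (omega - 0.5))
    den = (1 - math.exp(0.5 * h)) * (1 + math.exp(-h * (omega - 0.5)))
    return num / den


def simulate_node(omega, x0=0.0, gamma=1.0, h=10.0, dt=0.01, steps=2000):
    """Forward-Euler integration of dx/dt = sigmoid(omega) - gamma * x for a constant input.

    Hypothetical helper: a full SQUAD model would recompute omega from the
    current states of the node's regulators at every step.
    """
    x = x0
    for _ in range(steps):
        x += dt * (squad_sigmoid(omega, h) - gamma * x)
    return x


for omega in (0.0, 0.5, 1.0):
    # The steady state of each node is sigmoid(omega) / gamma.
    print(omega, round(squad_sigmoid(omega), 6), round(simulate_node(omega), 6))
```

With ω fixed, each node relaxes to sigmoid(ω)/γ; the fixed-point attractors of the full network arise when all nodes' inputs and states are mutually consistent.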

O008 - From Cox Processes to the Chemical Lévy-Langevin Equation

Short Abstract: We develop a new statistical model for biochemical reactions under non-stationary conditions. The new model extends the well-known Poisson model to Cox processes. A Cox process can be considered a scale (and mean) mixture of Poisson processes. We show that the convergence of the Poisson distribution to the Gaussian distribution for large rate parameters is paralleled in Cox processes by convergence to a scale (and mean) mixture of Gaussian distributions. We identify a special case, namely the alpha-stable distribution, that can model skewed as well as impulsive distributions and satisfies a generalized version of the central limit theorem. Based on this observation, we extend the classical Chemical Langevin Equation to the Chemical Lévy-Langevin Equation, a stochastic process that models Lévy walks as opposed to the Brownian motion modelled by the classical Chemical Langevin Equation.
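A Cox (doubly stochastic Poisson) process draws its rate at random and then draws counts from a Poisson distribution with that rate. The sketch below assumes, for illustration only, a gamma-distributed rate (which yields overdispersed, negative-binomial counts) rather than the alpha-stable case discussed in the abstract; it shows that the count variance exceeds the mean, whereas for a plain Poisson model the two coincide. Function names and parameters are hypothetical.

```python
import math
import random


def poisson_sample(lam, rng):
    """Knuth's multiplication method; adequate for moderate rates."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def cox_counts(n, shape, scale, rng):
    """Doubly stochastic counts: draw a gamma rate, then a Poisson count at that rate."""
    return [poisson_sample(rng.gammavariate(shape, scale), rng) for _ in range(n)]


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


rng = random.Random(0)
cox = cox_counts(20000, shape=5.0, scale=2.0, rng=rng)
poisson = [poisson_sample(10.0, rng) for _ in range(20000)]
print(mean_var(cox))      # theory: mean 10, variance 10 + 20 = 30
print(mean_var(poisson))  # theory: mean 10, variance 10
```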

O009 - Efficient Modeling and Active Learning of Biological Responses: Learning without Prior Knowledge

Short Abstract: High-throughput screening involves determining the effect of many chemical compounds on a given cellular target. As currently practiced, a full set of measurements for all compounds is typically made for each new target, with little use of information from previous screens. To study compound effects on many targets efficiently, a means is needed for determining and exploiting similarities in the effects of compounds and/or the behavior of targets, so that measurements of all combinations of compounds and targets are not needed to achieve high accuracy. Here, we describe probabilistic models that can be used to predict results for unmeasured combinations, and active learning algorithms for selecting informative future batches of experiments. Through extensive simulation experiments, we showed that our approaches can produce powerful predictive models and learn them significantly faster than random choice. We further characterized our method’s performance experimentally using a collection of 48 compounds and 48 NIH 3T3 cell clones expressing different GFP-tagged proteins; the learner’s task was to efficiently build a model of the effects of each compound on each clone. Since none of the effects were known prior to beginning the experiments, each clone and compound was silently duplicated to provide the ability to check how well duplicates were recognized. The learner could request acquisition of batches of image data for specific combinations of drugs and clones using liquid-handling robotics and an automated microscope. Our method achieved 92% accuracy having sampled only 28% of the experiment space.

O010 - Deep proteome fractionation reveals the conserved and dynamic modularity of animal complexes

Short Abstract: Despite the knowledge that the vast majority of life's processes at a cellular level are carried out by complexes of multiple proteins, knowledge of all the complexes formed in animal cells and their members seems a distant goal. By using a new approach we refer to as co-separation consisting of 1) subjecting biological samples to many levels of many types of native biochemical fractionations, 2) quantifying protein levels using LC-MS/MS for every fraction, and 3) processing the data through a machine learning pipeline, we discover complexes using a high-throughput all-by-all approach.

We recently published (*) a map of over 600 complexes derived from the analysis of more than 15 biochemical separations comprising over 1,000 LC-MS/MS samples from two human cell lines, and have since expanded our experimental datasets to include more than 50 biochemical separations comprising more than 5,000 LC-MS/MS samples, coming from many tissue types and cell lines, using many separation techniques, from 6 animal species. This expanse of data covering over 12,000 protein-coding genes enables us to globally seek out and explore interesting patterns of stable interactions, including 1) cohesive, conserved stable complexes, 2) machinery displaying clear hierarchical modularity, and 3) groups of proteins displaying compelling patterns of changing interactions between members over different experimental, developmental, and evolutionary contexts. This biochemical data, besides offering insights into the global organization and evolution of stable protein interactions, will be a valuable community resource systematically cataloging a large proportion of stable protein interactions conserved across animal cells.

*: Havugimana, Hart, Nepusz, Cell 2012
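One simple way to score co-separation is to correlate the fraction-by-fraction abundance profiles of two proteins, since members of a stable complex should co-elute. The sketch below uses Pearson correlation with an arbitrary threshold; it is an assumed, simplified stand-in for a single feature, not the machine learning pipeline referred to above. Protein names and profiles are toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: correlation undefined, treat as no evidence
    return cov / (sx * sy)


def coelution_pairs(profiles, threshold=0.9):
    """Hypothetical helper: protein pairs whose fractionation profiles correlate above threshold."""
    names = sorted(profiles)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if pearson(profiles[a], profiles[b]) >= threshold]


profiles = {
    "P1": [0, 1, 5, 9, 4, 0],
    "P2": [0, 2, 10, 18, 8, 0],  # same elution peak as P1, twice the abundance
    "P3": [7, 3, 0, 0, 1, 6],    # elutes in different fractions
}
print(coelution_pairs(profiles))  # [('P1', 'P2')]
```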

O011 - A two-scale 13C-based method for intracellular metabolic flux measurement and prediction in metabolically engineered biofuel producing E. coli

Short Abstract: The Fuel Synthesis division at the Joint BioEnergy Institute (JBEI) metabolically engineers E. coli and S. cerevisiae strains in order to ferment the sugar obtained from plant biomass into biofuels. Metabolic engineering involves assembling, from genetic components found in nature, a metabolic pathway capable of producing a desired biofuel molecule, and then expressing it in a host organism where the genetic components are not native. A fruitful approach for host engineering relies on the study of internal metabolic fluxes, i.e. the number of molecules traversing each biochemical reaction encoded in its genome per unit time. Two of the most popular methods for studying metabolic fluxes are Flux Balance Analysis (FBA) and 13C Metabolic Flux Analysis (13C MFA), each with its own advantages and disadvantages. FBA deals with genome-scale models of metabolism, takes into account system-wide balances of metabolites and displays a variety of predictive capabilities. 13C MFA does not need to invoke maximum growth assumptions and is backed by more extensive sets of data from 13C labeling experiments, but is merely a descriptive method with no capability for prediction. Here, we propose a new method, two-scale 13C Metabolic Flux Analysis (2S-13C MFA), which combines the advantages of both. 2S-13C MFA provides estimates of fluxes for genome-scale models, explicitly takes into account global metabolite balances and can be used as a basis for flux prediction methods. It does not rely on maximum growth assumptions but rather on the highly constraining data from 13C labeling experiments.

O012 - G-stack bias in publicly deposited Affymetrix HG-U133A Microarray data

Short Abstract: Affymetrix GeneChips have proved to be an enormously useful tool in transcriptomics. The large amounts of data of this type that have been deposited in publicly available repositories have been an invaluable resource for Systems Biology, both for inferring gene networks and for comparisons with experiments on other platforms. It is essential that these data sets be as free as possible from bias.

Here we show that there exist spurious correlations between probe sets which have two or more probes with G-stacks (runs of 4 or more G's) in them for the Affymetrix HG-U133A GeneChip. This effect is largest for high correlations and hence one could infer associations between probe sets that are not present.
We performed this analysis for 576 experiments deposited at ArrayExpress using RMA normalisation. For over 40% of the experiments we examined, the median shift in correlation (between including and discarding the G-stack probes before normalisation) is greater than 0.1. We have shown that this is a feature of the G-stacks, as we see no effect when we repeat the calculation using C-stacks (runs of C's).

We propose that any future analyses of these data sets should take this effect into account and there should be a re-evaluation of previous research based on this chip.
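Detecting G-stacks as defined above (runs of four or more G's) and flagging probe sets with at least two such probes is a simple string scan. A minimal sketch, assuming probe sequences are available as a mapping from probe set ID to a list of sequences; the function names, IDs and sequences below are made up for illustration.

```python
import re

# A G-stack is a run of four or more consecutive G's.
G_STACK = re.compile(r"G{4,}")


def has_g_stack(probe_seq):
    """True if the probe sequence contains a G-stack (case-insensitive)."""
    return G_STACK.search(probe_seq.upper()) is not None


def probe_sets_with_g_stacks(probe_sets, min_probes=2):
    """probe_sets: dict probe_set_id -> list of probe sequences.

    Returns the IDs of probe sets with at least min_probes G-stack probes.
    """
    return sorted(ps for ps, seqs in probe_sets.items()
                  if sum(has_g_stack(s) for s in seqs) >= min_probes)


probe_sets = {
    "ps1": ["ACGGGGTA", "TTGGGGGC", "ACGTACGT"],  # two G-stack probes
    "ps2": ["ACGGGTAC", "GGGGAAAA"],              # only one (GGG is too short)
}
print(probe_sets_with_g_stacks(probe_sets))  # ['ps1']
```

The same scan, with the pattern changed to `C{4,}`, gives the C-stack control set used for comparison.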

O013 - Analysis of De-novo Epistatic Interaction Networks from WTCCC Datasets

Short Abstract: Genome-wide association studies (GWAS) traditionally focus on the association of single SNPs with disease (main effects). However, the genetic inheritance of complex disease is most likely at least in part explained by interactions between two or more loci (epistatic interaction effects). Recent advances in high performance computing permit exhaustive testing of multiple loci association with case-control data.
Recently, Genome Wide Interaction Search (GWIS) was developed, which uses a novel statistical test to discover significant epistatic interactions. We term two loci which generate a high score with GWIS as a top SNP-SNP pair or top pair.

We study properties of networks generated from the top pairs. We identify and study the occurrence of particular types of subgraphs, including hub subgraphs and bipartite subgraphs. We study the biological significance of the subgraphs by extending the rank-based gene set enrichment test from the list of top SNP-SNP pairs, to lists of pairs of SNPs with path length 2 or higher in the subgraph. We also identify top-scoring subgraphs by finding those that provide a good graphical alignment to gene networks corresponding to molecular pathways.
We further introduce an interactive visualisation tool that allows researchers to quickly identify interesting networks. Motivated by previous work on circular ideograms, Manhattan plots and co-occurrence matrices, we provide an integrated browser-based tool to analyse epistatic interaction networks.

Using these novel statistical and computational approaches, we investigate top SNP-SNP pairs generated with GWIS from the seven well-studied Wellcome Trust Case Control Consortium (WTCCC) datasets.
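The pairs of SNPs at path length 2 mentioned above can be enumerated from the graph of top pairs by looking for SNPs that share a neighbour but are not themselves a top pair. A minimal sketch with a hypothetical function name and SNP labels; it does not implement GWIS scoring or the rank-based enrichment test.

```python
from itertools import combinations


def pairs_at_distance_two(top_pairs):
    """SNP pairs that are not a top pair themselves but share a neighbour in the top-pair graph."""
    adj = {}
    for a, b in top_pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    result = set()
    for nbrs in adj.values():
        # Any two neighbours of the same SNP are joined by a path of length 2.
        for a, b in combinations(sorted(nbrs), 2):
            if b not in adj[a]:
                result.add((a, b))
    return sorted(result)


# s1 is a hub; s2-s3 is also a direct top pair, so it is excluded.
top_pairs = [("s1", "s2"), ("s1", "s3"), ("s1", "s4"), ("s2", "s3")]
print(pairs_at_distance_two(top_pairs))  # [('s2', 's4'), ('s3', 's4')]
```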

O014 - An Ensemble-Based Approach to Infer Gene Regulatory Networks from Expression Profiles

Short Abstract: Purpose:
Gene regulatory network inference methods aim to decipher the complex interplay of genes that characterizes either the normal physiological state or disease-driving cellular dysfunction. Even though a plethora of methods based on diverse conceptual ideas has been developed, reliable network reconstruction remains challenging. A promising approach to compensate for the limitations of each method and to benefit from their individual strengths is to aggregate inference results from different approaches.
Methods and Results:
We present an ensemble-based strategy for network reconstruction which assumes that the structure of the gene regulatory network to be inferred is partially known. By reconstructing the known subnetwork using heterogeneous network inference methods, we first create an ensemble of network predictions. Based on this ensemble, a supervised classifier is built, with which unknown regulatory interactions can be predicted and the subnetwork thus complemented. We evaluate the performance of our approach on expression compendia and gold standards of the DREAM5 initiative and provide a comparison with state-of-the-art network inference methods, including unsupervised ensemble methods.
Conclusion:
Unravelling networks of gene regulatory interactions is crucial for a better understanding of disease mechanisms on the cellular level. With our ensemble-based approach, we aim at a more precise and robust reconstruction of gene regulatory networks.

O015 - Tell me your pathways

Short Abstract: A key annotation facet for a gene is the list of pathways it belongs to. Therefore, GeneCards V3.10 mines 11 pathway sources, typically showing 30±20 pathways for each of the highly annotated genes. However, the flat pathway list is of limited utility, due to a high degree of intra- and inter-source redundancy and inconsistency. Highly similar pathways often bear disparate names, and identically named pathways frequently show appreciable differences. Striving to convey an integrated, internally consistent view of biological pathways in every GeneCard, we have clustered a total of 3840 pathways from the 11 sources into 733 “super-pathways”, each containing 2 or more nodes. We used the previously employed gene-content metric and Jaccard pairwise distance (J). We further employed nearest neighbor clustering, excluding edges with J<0.3, and including all edges with J≥0.7. This resulted in a collection of manageable super-pathways, each with no more than 80 members, and with optimal inter-cluster orthogonality. In comparison, the use of hierarchical clustering, as implemented elsewhere, resulted in much larger clusters, with up to 980 pathways. The unified super-pathways were added to GeneCards, while also preserving the original source-specific entries. This now enables more meaningful characterization of a gene via its affiliated set of pathways. In an upcoming enhancement, each super-pathway will be displayed as an entry in PathCards, a new searchable GeneCards suite member, portraying detailed super-pathway properties, and showing their internal pathway and gene components. Super-pathway enrichment with ncRNAs (cf. Belinky et al Bioinformatics 2013) is also projected.
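The clustering rule described above can be sketched in Python. Because the thresholds keep high-J edges, this sketch assumes J is computed as the Jaccard index |A∩B|/|A∪B| of two pathways' gene sets; it further assumes that pairs between the two cut-offs are linked only when one pathway is the other's nearest neighbour, and that super-pathways are the connected components of the resulting graph. This is an interpretation with hypothetical names and toy pathways, not the GeneCards implementation, and the 80-member cap is not enforced.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene sets (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def super_pathways(pathways, low=0.3, high=0.7):
    """pathways: dict name -> set of genes. Returns clusters as sorted lists of names."""
    names = sorted(pathways)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    def union(a, b):
        parent[find(a)] = find(b)

    sims = {(a, b): jaccard(pathways[a], pathways[b]) for a, b in combinations(names, 2)}
    for (a, b), j in sims.items():
        if j >= high:
            union(a, b)  # strong overlap: always linked
    for n in names:
        # Nearest neighbour by Jaccard index; linked only if it clears the lower cut-off.
        best = max((m for m in names if m != n),
                   key=lambda m: sims[tuple(sorted((n, m)))], default=None)
        if best is not None and sims[tuple(sorted((n, best)))] >= low:
            union(n, best)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


pathways = {
    "P1": {"A", "B", "C", "D"},
    "P2": {"A", "B", "C", "D", "E"},  # J(P1, P2) = 0.8
    "P3": {"D", "E", "F", "G"},       # best J is 2/7 with P2, below 0.3
    "P4": {"X", "Y"},
    "P5": {"X", "Y", "Z"},            # J(P4, P5) = 2/3, linked as nearest neighbours
}
print(super_pathways(pathways))  # [['P1', 'P2'], ['P3'], ['P4', 'P5']]
```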

O016 - Stratification of human protein-protein interaction network: Comparative analysis of Tissue-specific Hubs and Housekeeping Hubs

Short Abstract: Protein-protein interaction networks (PPINs) generated from pairwise interactions of proteins, without information on when and where these interactions occur, give a static picture of interactions; such a network is often referred to as a ‘global network’. By mapping context-specific gene expression data onto the global PPIN, one can generate a context-specific network. For example, tissue-specific networks can be generated by mapping tissue-specific gene expression data onto the global interaction network. Context-specific networks could lead to the identification of context-specific roles of topologically important proteins. In this study, we constructed human tissue-specific networks and identified hubs in different tissues. Based on the expression breadth of these hubs, we identified locally expressed tissue-specific hubs (TSHs) as well as globally expressed housekeeping hubs (HKHs). In-depth analysis of TSHs and HKHs revealed significant differences between these two groups at the sequence, structural and functional levels. Compared with HKHs, TSHs are longer, evolutionarily more recent proteins and are enriched in disordered regions. TSHs and HKHs have a similar number of binding interfaces but differ in the number of interactions they make; TSHs show lower degree than HKHs, suggesting that TSHs are “unsaturated” with regard to their binding and are perhaps still evolving with regard to their interactions. TSHs are less highly expressed and are enriched in PEST motifs, indicating tight regulation and easy degradation. All these properties of TSHs and HKHs seem to be reflected in their distinct functional roles: TSHs are mostly secreted, transporter and signalling proteins, whereas HKHs are involved in transcription, translation and complex formation.
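The construction of tissue-specific networks and the split into TSHs and HKHs can be sketched as follows. The sketch assumes that a tissue network keeps only interactions whose two partners are both expressed in that tissue, that hubs are defined by a degree cut-off, and that expression breadth is the number of tissues in which a hub is expressed. All thresholds, names and data are hypothetical; the study's actual cut-offs are not given in the abstract.

```python
def tissue_network(global_edges, expressed):
    """Keep interactions whose two partners are both expressed in the tissue."""
    return [(a, b) for a, b in global_edges if a in expressed and b in expressed]


def hubs(edges, min_degree):
    """Proteins whose degree in the given network reaches min_degree."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return {p for p, d in degree.items() if d >= min_degree}


def classify_hubs(global_edges, tissue_expression, min_degree, max_tsh_breadth, min_hkh_breadth):
    """tissue_expression: dict tissue -> set of expressed proteins. Returns (TSHs, HKHs)."""
    all_hubs = set()
    for expressed in tissue_expression.values():
        all_hubs |= hubs(tissue_network(global_edges, expressed), min_degree)
    # Expression breadth: number of tissues in which each hub is expressed.
    breadth = {p: sum(p in e for e in tissue_expression.values()) for p in all_hubs}
    tsh = {p for p, b in breadth.items() if b <= max_tsh_breadth}
    hkh = {p for p, b in breadth.items() if b >= min_hkh_breadth}
    return tsh, hkh


edges = [("H1", "A"), ("H1", "B"), ("H1", "C"), ("H2", "D"), ("H2", "E"), ("H2", "F")]
tissues = {
    "T1": {"H1", "A", "B", "C", "H2", "D", "E", "F"},
    "T2": {"H2", "D", "E", "F"},
    "T3": {"H2", "D", "E", "F"},
}
print(classify_hubs(edges, tissues, min_degree=3, max_tsh_breadth=1, min_hkh_breadth=3))
# ({'H1'}, {'H2'})
```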

O017 - A Linear Program to Infer Signal Transduction Networks using Perturbation Data

Short Abstract: Recent biological and technological advances in perturbation experiments such as RNA interference (RNAi) enable the knockdown of individual genes in a high-content, high-throughput manner. Perturbation effects on specific phenotypes can thereby be quantified in detail using multiparametric imaging. This makes it comparatively easy to elucidate gene function, whereas placing the genes spatially and temporally within the underlying networks remains a challenging task.

We propose methodologies that use individual cell measurements for the data analysis and statistical scoring of RNAi data. This increases the sensitivity and specificity of the identified hits in comparison with existing methods. Furthermore, we present a network inference method for perturbation data that is based on a linear program and can thus be solved efficiently even for large-scale data. On simulated data, our approach outperforms state-of-the-art methods in terms of sensitivity and specificity, while also requiring significantly less computation time. Applying our approach to real data on the intracellular signaling of human primary naive CD4+ T-cells and on ErbB signaling in trastuzumab-resistant breast cancer cells, we recover known interactions as well as new ones. The interactions inferred with our approach for the ErbB data predict an important role for negative and positive feedback in controlling cell cycle progression, which needs to be validated further.

O018 - rBiopaxParser: A new package to parse, modify and merge BioPAX-Ontologies within R

Short Abstract: Methods for network reconstruction are often designed with the possibility to integrate prior knowledge about the topology of biological signaling networks. However, the format of prior knowledge required, usually in form of an adjacency matrix, is a strong abstraction of the biological reality. In the past years ontologies have been the tool of choice to represent and allow the sharing of knowledge of this biological reality. BioPAX is a commonly used ontology for the encoding of regulatory pathways.
The R Project for Statistical Computing is the standard environment for statistical analyses of high-dimensional data and network reconstruction methods. Although there are packages available that provide the pathway data of databases like KEGG, the Pathway Interaction Database (Nature/NCI) or Reactome as graphs, there was no software available to parse, merge and manipulate BioPAX ontologies inside R.
We present a new open-source package called rBiopaxParser that parses BioPAX-Ontologies and represents them in R. Class definitions, properties and restrictions are mapped on a 1:1 basis, within the limitations of object orientation in R. The user is able to parse arbitrary BioPAX OWL files, for example the exports of popular online pathway databases like PID, Reactome or KEGG. Instances of BioPAX-Classes can be programmatically added or removed. Multiple pathways can be merged or transformed into an adjacency matrix suitable as input for network reconstruction algorithms, i.e. reducing a pathway to a graph with edges representing only activations or inhibitions. The software is publicly available on Bioconductor.
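The reduction of a pathway to a signed graph described above can be illustrated independently of R. The Python sketch below is not rBiopaxParser's API; it assumes the interactions have already been extracted as (source, target, kind) triples and builds an adjacency matrix with +1 for activation and -1 for inhibition. The function name and node names are placeholders.

```python
def signed_adjacency(interactions):
    """interactions: list of (source, target, kind), kind 'activation' or 'inhibition'.

    Returns (node order, matrix) with +1 for activation, -1 for inhibition, 0 otherwise.
    """
    sign = {"activation": 1, "inhibition": -1}
    nodes = sorted({s for s, _, _ in interactions} | {t for _, t, _ in interactions})
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target, kind in interactions:
        # Rows are sources, columns are targets.
        matrix[index[source]][index[target]] = sign[kind]
    return nodes, matrix


nodes, matrix = signed_adjacency([("A", "B", "activation"), ("B", "C", "inhibition")])
print(nodes)   # ['A', 'B', 'C']
print(matrix)  # [[0, 1, 0], [0, 0, -1], [0, 0, 0]]
```

When several pathways are merged by concatenating their triples, a later triple for the same source-target pair overwrites an earlier one; a real merge would need an explicit rule for conflicting signs.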

O019 - Novel modeling of combinatorial miRNA targeting identifies SNP with potential role in bone density

Short Abstract: MicroRNA genes (miRNAs) are small non-coding RNAs that regulate the expression levels of mRNAs post-transcriptionally. miRNAs are critical in many important biological processes, such as development, and are important markers for many diseases. Identifying the targets of miRNAs is not an easy task. Recently developed high-throughput methods for identifying all miRNA targets in a cell are promising, but they still depend on computational algorithms to identify the exact miRNA:mRNA interactions. We present a novel algorithm, ComiR (Combinatorial miRNA targeting), which addresses a more general question: whether a given mRNA is targeted by a set of miRNAs. ComiR uses miRNA expression to improve the targeting models of four target prediction algorithms and then combines their predicted targets using a support vector machine. By applying ComiR to single nucleotide polymorphism (SNP) data, we identified a SNP that is likely to be causally associated with osteoporosis in women. The ComiR web tool is available at http://www.benoslab.pitt.edu/comir/. It generates custom predictions for H. sapiens, M. musculus, D. melanogaster and C. elegans miRNAs.
The work presented in this poster has been published in Coronnello et al (2012), PLoS Comput Biol 8(12): e1002830.

O020 - The integration of gene and miRNA expression using pathway topology: a case study on epithelial ovarian cancer.

Short Abstract: Pathways are formal descriptions of biological processes. Studying gene expression in terms of pathways is known as pathway analysis, which aims to identify groups of functionally related genes that show coordinated expression changes. Recently, pathway analysis has moved from algorithms that use mere gene lists to ones that exploit the topology defining gene connections. A crucial and, unfortunately, limiting step for these novel methods is the availability of pathways as gene networks, in which nodes are genes and edges are relations between pairs of elements. To this end, we developed a pathway data interpreter, called graphite, that uniformly stores, processes and converts pathway information into gene networks. graphite has been made publicly available as an R package within the Bioconductor platform. graphite fills the existing gap between technical and methodological aspects, allows more informative analyses of omics data and supports the development of new methods based on the increased accessibility of biological knowledge. However, the pathways of the four main public resources integrated into graphite (KEGG, Reactome, Biocarta and PID) still lack crucial interactors: the microRNAs. Thus, we worked on an extension of the graphite package that integrates microRNAs into pathway topology, linking the non-coding RNAs to their validated target genes and providing integrated networks suitable for topological pathway analyses. The feasibility of this approach has been validated in a specific biological context, the early stage of Epithelial Ovarian Cancer (EOC), where it successfully guided us towards important biological results with potential therapeutic implications.

O021 - MONGKIE: Modular Network Generation and Visualization Platform with Knowledge Integration Environment

Short Abstract: In recent years, high-throughput studies of biological systems have produced a greatly increased volume of complex and interconnected data. Given the volume and heterogeneity of these data, well-integrated network visualization combined with data analysis methods is key to both understanding and analyzing them. Useful network visualization tools exist, but big challenges remain. Here, we present MONGKIE (Modular Network Generation and Visualization Platform with Knowledge Integration Environment), an integrated network visualization and analysis platform that lets users explore and analyze biological networks interactively within a knowledge integration environment. Although it is optimized for exploring binary interactions and pathways and is tightly integrated with hiPathDB (http://hipathdb.kobic.re.kr), it can easily be applied to any biological data that can be modeled as a network, such as a miRNA-TF-gene regulatory network. MONGKIE provides a generalized visualization data model to represent domain-specific types of biological entities and the interactions between them. It is designed for both the visualization and the analysis of networks, with seamless integration between the two. MONGKIE incorporates various knowledge integration and network analysis methods into the visualization platform, such as Interaction Manager, Gene ID Conversion, Expression Overlay, Network Clustering, Gene Set Enrichment Analysis of GO and Pathway, Pathway Integration and Visualization, and Construction and Analysis of the Integrated Regulatory Network. MONGKIE is a Java-based application with a plug-in architecture, which makes it platform-independent and easily extendable with additional functionality. MONGKIE is available at http://mongkie.org.

O022 - Mathematical modeling of synthetic-dosage gene interactions leading to invasive phenotype in colon cancer mouse model

Short Abstract: During cancer progression, cells acquire a number of hallmarks that promote tumour growth and invasion, eventually leading to metastasis. An epithelial-to-mesenchymal transition (EMT)-like process is considered to represent an early step in the initiation of cancer invasion, although this remains an attractive but debated concept. Several signalling pathways govern EMT-like processes in cancer, including the major Notch, Wnt, p53 and AKT pathways. To identify the interplay between these signalling pathways and its impact on the development of EMT in intestinal cells, we manually created a biochemical reaction network based on the scientific literature. For mathematical modelling purposes, this network was then converted and reduced into an influence network focused on how EMT is influenced by the Notch, p53 and Wnt pathways. Logical modelling was used to formalise the hypothesis that activated Notch and loss of p53 function have a synergistic effect on EMT. In addition, we present a putative mechanistic model of the regulation of the transcription factors that induce EMT. Finally, confirming the proposed hypothesis, an in vivo mouse model with constitutively active Notch and loss of p53 function developed highly invasive intestinal cancer with EMT at early stages, followed by different stages of metastasis in distant organs; as predicted, none of the other single or double mouse mutants showed EMT.
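In Boolean terms, a synergy hypothesis means that neither perturbation alone switches the output on. The Python sketch below makes this concrete, assuming a single hypothetical rule, "EMT = Notch active AND p53 lost", that collapses the influence network to two inputs. Wnt and the EMT-inducing transcription factors are deliberately left out, so this is a toy, not the authors' logical model. Enumerating all genotypes reproduces the qualitative prediction that only the double mutant shows EMT.

```python
from itertools import product


def emt(notch_active: bool, p53_functional: bool) -> bool:
    """Hypothetical logical rule: EMT requires active Notch AND loss of p53.

    Neither input alone is sufficient, which is what synergy means here.
    """
    return notch_active and not p53_functional


def emt_genotypes():
    """Return the (notch_active, p53_lost) genotypes predicted to show EMT."""
    hits = []
    for notch_active, p53_lost in product([False, True], repeat=2):
        if emt(notch_active, p53_functional=not p53_lost):
            hits.append((notch_active, p53_lost))
    return hits


print(emt_genotypes())  # [(True, True)]
```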

O023 - The role of post-translational modifications in the context of protein interaction networks

Short Abstract: As covalent chemical modifications of proteins, post-translational modifications (PTMs) contribute to a broad range of protein functionalities. New technologies such as high-throughput mass spectrometry (MS)-based proteomics have led to the discovery of ever more proteins with PTMs. Among the different PTMs, rich information on acetylation, amidation, glycosylation, methylation and phosphorylation is already available in large-scale protein portals (e.g. Swiss-Prot). As protein functionality is mediated via physical encounters between molecules, protein interaction networks (PINs), i.e. the interactions between proteins, constitute a central aspect of cellular processes. In this study, several types of PTMs were mapped onto the PINs of Arabidopsis thaliana, Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae S288c, Homo sapiens and Rattus norvegicus. Proteins with annotated acetylation, amidation, methylation and phosphorylation events were found to have a significantly higher degree than the average protein, whereas glycosylated proteins had a significantly lower degree. We aimed to relate these systematic differences to the functions of the respective proteins. For example, glycosylation sites were found to be enriched in proteins with plasma membrane localization and transporter or receptor activity, which generally have fewer interaction partners. By integrating the global protein interaction network with specific PTMs, our study offers a novel perspective on the role of PTMs in cellular processes.
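The degree comparison can be sketched in a few lines of Python. The code computes node degrees from an undirected edge list and reports the mean degree of a PTM-annotated subset relative to the mean over all proteins. It assumes each interaction is listed once. The protein names and annotations are made up, and the significance test used in the study (for example a permutation test) is not shown.

```python
from collections import Counter
from statistics import mean


def degree_map(edges):
    """Count interaction partners per protein in an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        if a != b:  # ignore self-interactions
            deg[a] += 1
            deg[b] += 1
    return deg


def mean_degree_ratio(edges, annotated):
    """Mean degree of annotated proteins divided by the mean over all proteins.

    Annotated proteins that do not appear in the network are skipped.
    """
    deg = degree_map(edges)
    overall = mean(deg.values())
    subset = [deg[p] for p in annotated if p in deg]
    return mean(subset) / overall


edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
print(mean_degree_ratio(edges, {"A", "B"}))  # 1.25: above average
print(mean_degree_ratio(edges, {"E"}))       # 0.5: below average
```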

O024 - Evaluation of cyclic pathway activities by signal transduction score flow algorithm

Short Abstract: Current approaches for large-scale data analysis do not effectively exploit the topological information provided by signaling pathways. Determining cell signaling behavior is crucial to understanding the physiological response to a specific stimulus or to a drug treatment. In this study, we present a hybrid model- and data-driven approach, the signal transduction score flow algorithm, which quantitatively evaluates the activity of cyclic cell signaling pathways using transcriptome and ChIP-seq data. The algorithm converts a signaling pathway into a directed graph and maps experimental data onto gene nodes as scores, which are transferred over the graph by traversing its topology until a predefined biological target response is reached. First, enrichment scores for the genes in a pathway are computed from the experimental data; then a heuristic approach is applied that uses the partition of gene scores as a solution for protein node stoichiometry during dynamic scoring of the pathway of interest.
Evaluation of the algorithm on signaling pathways scored with both transcriptome and ChIP-seq data revealed good correlation with expected cellular behavior for both KEGG and manually generated pathways. Implementing the algorithm as a Cytoscape plug-in allows interactive visualization and analysis of KEGG or user-generated pathways. Moreover, the algorithm predicts the gene-level and global impacts of single or multiple gene knockouts.
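A heavily simplified Python sketch shows how scores can flow over a directed, possibly cyclic, graph and how a knockout changes the result. Each node keeps its own data score and receives a damped share of its predecessors' current scores, split equally over their out-edges. Iteration stops when no node changes by more than a tolerance.

This is not the authors' algorithm. The equal splitting, the damping factor (which keeps cycles from accumulating score without bound) and the convergence rule are all assumptions, and the stoichiometry heuristic is not modeled.

```python
def score_flow(edges, data_scores, damping=0.5, max_steps=200, tol=1e-9):
    """Propagate node scores over a directed graph until they stop changing.

    edges: list of (source, target) pairs; cycles are allowed.
    data_scores: dict node -> experimental score (missing nodes score 0).
    """
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    nodes = set(data_scores) | {n for edge in edges for n in edge}
    current = {n: data_scores.get(n, 0.0) for n in nodes}
    for _ in range(max_steps):
        incoming = dict.fromkeys(nodes, 0.0)
        for u, outs in succ.items():
            share = current[u] / len(outs)  # split a node's score over its out-edges
            for v in outs:
                incoming[v] += share
        updated = {n: data_scores.get(n, 0.0) + damping * incoming[n] for n in nodes}
        converged = max(abs(updated[n] - current[n]) for n in nodes) < tol
        current = updated
        if converged:
            break
    return current


def knockout(edges, data_scores, gene):
    """Remove a gene's edges and score to simulate a knockout."""
    kept = [(u, v) for u, v in edges if gene not in (u, v)]
    scores = {n: s for n, s in data_scores.items() if n != gene}
    return kept, scores


cyclic = [("A", "B"), ("B", "A"), ("B", "C")]
print(round(score_flow(cyclic, {"A": 1.0})["C"], 4))  # 0.1429, i.e. 1/7
```

With damping below 1 each update is a contraction, so the iteration converges even on cyclic pathways.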

O025 - Graphlet-based measures are suitable for biological network comparison

Short Abstract: Recent advances, such as yeast two-hybrid screening and mass spectrometry of purified complexes, have provided an abundance of protein-protein interaction (PPI) network data. Gaining biological insight from these large and complex networks requires the development of efficient and biologically meaningful algorithms for their analyses. Systematic measures based on graphlets (small induced subgraphs of large networks) are proving useful in this regard. Recently, the use of graphlet-based measures for biological network comparison has been questioned: it has been claimed that the measures are ‘unstable’ in regions of low edge density.

In this study we demonstrate that graphlet-based measures are suitable for biological network comparison. We start by generating empirical distributions of “graphlet degree distribution agreement” scores, in order to identify the edge density regions in which the topology of model networks is ‘unstable.’ We show how graphlet-based measures correctly detect this topological instability and then demonstrate how eighteen recent PPI networks of different species are dense enough that this ‘instability’ does not affect their analysis. Furthermore, we show that data networks have local densities much higher than a model network would have, since models are uniformly dense. Hence, graphlet measures are 'stable' in regions that are of interest in real networks. Finally, we validate the use of graphlet-based measures for finding well-fitting random models for PPI networks by using a recently devised non-parametric statistical test. We show for the first time that five viral species, possessing the latest and most complete PPI networks, are well-fit by several models.
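For readers unfamiliar with the measure, here is a minimal Python sketch of the per-orbit agreement behind "graphlet degree distribution agreement", restricted to orbit 0 (node degree). For each k ≥ 1, d(k) counts nodes touching the orbit k times. It is scaled to S(k) = d(k)/k and normalized to N(k) = S(k)/T, where T is the sum of S. The distance is D = (1/√2)·sqrt(Σ_k (N_G(k) − N_H(k))²), and the agreement is A = 1 − D.

The full measure averages A over all orbits of 2- to 5-node graphlets, which requires a graphlet orbit counter that is not shown here.

```python
from collections import Counter
from math import sqrt


def normalized_gdd(degrees):
    """Scaled and normalized distribution N(k) for orbit 0 (node degree)."""
    counts = Counter(d for d in degrees if d > 0)     # d(k) for k >= 1
    scaled = {k: c / k for k, c in counts.items()}    # S(k) = d(k) / k
    total = sum(scaled.values())                      # T
    return {k: s / total for k, s in scaled.items()}  # N(k) = S(k) / T


def orbit_agreement(degrees_g, degrees_h):
    """A = 1 - D, with D = (1/sqrt(2)) * sqrt(sum_k (N_G(k) - N_H(k))^2)."""
    ng, nh = normalized_gdd(degrees_g), normalized_gdd(degrees_h)
    keys = set(ng) | set(nh)
    dist = sqrt(sum((ng.get(k, 0.0) - nh.get(k, 0.0)) ** 2 for k in keys)) / sqrt(2)
    return 1.0 - dist


path4 = [1, 2, 2, 1]  # degrees of a 4-node path
star4 = [3, 1, 1, 1]  # degrees of a 3-leaf star
print(round(orbit_agreement(path4, star4), 3))  # 0.704
```

Agreement is 1 for identical distributions and 0 when the two distributions share no degree values.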

O026 - Protein interaction networks as metric spaces: a novel perspective on distribution of hubs and biological functional implications

Short Abstract: Complex networks have successfully modeled various phenomena in nature and society. Following the successful description of small-world and scale-free networks in terms of the statistical mechanics of network topology, it has been assumed that these networks share many of their organizing principles, including the existence of randomly placed hubs. In a novel approach, we shed more light on this view using the human functional protein network and other PPI networks across kingdoms. We show that, as graphs, PPI networks are not structured as hubs randomly placed in the network with low-degree proteins interacting with them. Instead, proteins are radially organized around a central point, with those of high degree coagulating in the center and those in the fringes constituting what we have called quills. This approach elucidates how protein-protein interaction networks are organized.

Further, we show that this mathematical organization has biological significance. In general, proteins in central positions (in a large human functional protein interaction network (HFPIN) consisting of 9448 nodes and 181706 interactions) tend to be involved in sensing functions, and as one moves from the center to the periphery, there is a tendency to diversify into routine metabolic functions.
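A radial organization can be probed by measuring shortest-path distance from a central node and summarizing degree shell by shell. The Python sketch below takes the "central point" to be a node of minimum eccentricity; that choice is an assumption, not necessarily the authors' definition. It also assumes a connected, undirected graph and runs one breadth-first search per node, which is fine for a sketch but slow on a 9448-node network.

```python
from collections import defaultdict, deque
from statistics import mean


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def radial_profile(edges):
    """Return (center, {shell distance: mean degree}) for a connected graph.

    The center is a node of minimum eccentricity (ties broken by name).
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    center = min(sorted(adj), key=lambda n: max(bfs_distances(adj, n).values()))
    shells = defaultdict(list)
    for node, d in bfs_distances(adj, center).items():
        shells[d].append(len(adj[node]))
    return center, {d: mean(degs) for d, degs in sorted(shells.items())}


# A hub with three two-step "quills".
toy = [("H", "A"), ("A", "A1"), ("H", "B"), ("B", "B1"), ("H", "C"), ("C", "C1")]
print(radial_profile(toy))  # ('H', {0: 3, 1: 2, 2: 1})
```

Mean degree falling with distance from the center is the pattern the abstract describes.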

O027 - Large-scale identification and exploration of biological pathway mentions in biomedical literature

Short Abstract: Biological pathways are frequently mentioned in the biomedical literature and play a key role in many biomedical studies. Names for specific pathways gradually emerge within the community to avoid verbose repetition of molecular details. Meanwhile, curated databases have been built to present detailed graphical descriptions of the molecular processes forming conventional pathways. However, many pathway names are mentioned in the literature yet missing from these databases, and conversely, many pathway names in databases never appear in the literature. We devised a hybrid method combining dictionary matching and rule-based detection to identify mentions of named pathways in the literature, which achieves an F-score of around 80%. Our work complements other text-mining efforts that aim to construct pathways from molecular interactions mentioned in the literature; we, in contrast, aim to recover mentions of conventional pathways. Systematic identification of biological pathway mentions is valuable in many ways: detected mentions can be employed to reveal hidden relations between pathways and diseases; systems biologists can use them to assist interaction model construction by linking pathway mentions to the corresponding molecular details hosted in curated databases; and pathway database curators can use them to help prioritize the most interesting pathways for curation in relation to particular diseases or cellular functions. We ran the method on all MEDLINE abstracts related to cancer (~2.5 million abstracts) and on the whole PMC Open Access subset (~500,000 full-text articles). We provide an initial analysis of pathway literature mentions and have made the results publicly available.
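The two components of a hybrid detector, plus the F-score used to evaluate it, can be sketched in Python. The dictionary contents and the regular-expression rule are hypothetical placeholders, not the authors' resources or rules. The rule accepts one to three capitalised tokens followed by "pathway" or "signaling pathway". A real rule set would also need to handle lower-case names and sentence-initial words.

```python
import re

# Hypothetical mini-dictionary of curated pathway names (lower-cased).
PATHWAY_DICT = {"wnt signaling pathway", "mapk signaling pathway", "glycolysis"}

# Hypothetical rule: 1-3 capitalised tokens, optional "signaling", then "pathway".
RULE = re.compile(r"\b((?:[A-Z][\w/-]*\s){1,3}(?:signaling\s)?pathway)\b")


def find_pathway_mentions(text):
    """Union of dictionary matches and rule-based matches (lower-cased)."""
    mentions = set()
    lowered = text.lower()
    for name in PATHWAY_DICT:  # dictionary matching
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            mentions.add(name)
    for match in RULE.finditer(text):  # rule-based detection
        mentions.add(match.group(1).lower())
    return mentions


def f_score(predicted, gold):
    """Harmonic mean of precision and recall over sets of mentions."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


sentence = "Activation of the Wnt signaling pathway and the Hedgehog pathway promotes glycolysis."
print(sorted(find_pathway_mentions(sentence)))
# ['glycolysis', 'hedgehog pathway', 'wnt signaling pathway']
```

The rule finds "Hedgehog pathway" even though it is absent from the dictionary, which is the point of combining the two.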

O028 - Computing Multi-Level Clustered Alignments of Gene-Expression Time Series

Short Abstract: Identifying similarities and differences in expression patterns across multiple time series can provide a better understanding of the relationships among various normal biological and experimentally induced conditions such as chemical treatments or the effects induced by a gene knockout/suppression. We consider the task of identifying sets of genes that have a high degree of similarity both in their (i) expression profiles within each condition, and (ii) changes in expression responses across conditions. Previously, we developed an approach for aligning time series that computes clustered alignments. In this approach, an alignment represents the correspondences between two gene expression time series. Portions of one of the time series may be compressed or stretched to maximize the similarities between the two series. A clustered alignment groups genes such that the genes within a cluster share a common alignment, but each cluster is aligned independently of the others. Unlike standard gene-expression clustering, which groups genes according to the similarity of their expression profiles, the clustered-alignment approach clusters together genes that have similar changes in expression responses across treatments. We have now extended the clustered alignment approach to produce multi-level clusterings that identify subsets of genes that have a high degree of similarity both in their (i) expression profiles within each treatment, and (ii) changes in expression responses across treatments. We evaluate this method by considering the stability of resulting clusters and their agreement with extrinsic data sources.
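Aligning two expression time series by "compressing or stretching" portions of one of them is commonly done with dynamic time warping (DTW). The Python sketch below shows DTW as a stand-in for the alignment step only. The authors' alignment and scoring may differ, and grouping genes that share an alignment (the clustering step) is not shown. The cost of matching two time points is assumed to be their absolute difference.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two expression time series.

    Either series may be locally stretched (a time point reused) so that
    similar shapes line up; the matching cost is the absolute difference.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # reuse b[j-1]: stretch b
                                 cost[i][j - 1],      # reuse a[i-1]: stretch a
                                 cost[i - 1][j - 1])  # advance both series
    return cost[n][m]


# The same response, but with an extra early time point in the second series.
print(dtw_distance([0, 1, 2, 3], [0, 0, 1, 2, 3]))  # 0.0
```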

O029 - Establishing the rules of intercellular communication in the human blood-forming system

Short Abstract: Hematopoietic tissue consists of multiple phenotypically and functionally distinct cell populations generated by hematopoietic stem cell (HSC) differentiation through a series of relatively well-defined parent-child relationships. The tissue’s cellular composition is responsive to physiological demand and the cellular microenvironment. How cellular composition is influenced by the microenvironment, and the consequent impact of feedback signals from differentiated cells on HSC fate, are important but poorly understood processes. We have employed three approaches to probe this question. First, using transcriptome data from 12 phenotypically enriched populations from human cord blood, we developed a gene expression-based statistical model that deconvolves the cellular phenotypic composition and transcriptome states of heterogeneous samples. The model enables us to examine cellular heterogeneity in relation to microenvironmental settings in an unbiased manner. Second, we constructed intercellular communication networks between the 12 populations. Quantitative network analyses revealed that individual cell types overlap in the signals that they receive but specialize in the signals that they present (secrete) into the microenvironment, suggesting the existence of cell population-dependent feedback interactions. Subsequently, 34 HSC-targeting ligands were selected from the network, and their regulatory roles on HSC fate (apoptosis, differentiation, neutral, proliferation, quiescence, self-renewal induction) were assigned using an in vitro screening assay. By incorporating this regulatory information into the network, we were able to rank populations based on their impact on HSC fate. For example, neutrophil- and megakaryocyte-secreted ligands predominantly induced HSC apoptosis and self-renewal, respectively. Overall, our platform provides insight into how complex cellular systems thrive during development, homeostasis and repair.
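Deconvolving a heterogeneous sample means estimating how much each reference population contributes to its expression profile. The Python sketch below is a simple stand-in: non-negative least squares solved by projected gradient descent, modelling the mixture as signatures · fractions. It is not the authors' statistical model.

Assumptions:
- The reference signature matrix is known.
- The step size 1/‖S‖²_F is small enough for stable descent.
- Fractions are rescaled to sum to 1 at the end.

The toy numbers are invented.

```python
def deconvolve(signatures, mixture, steps=5000):
    """Estimate non-negative population fractions f with mixture ~ signatures . f.

    signatures: list of rows, one per gene, giving that gene's expression in
    each reference population. mixture: observed expression per gene.
    """
    n_pop = len(signatures[0])
    # 1 / squared Frobenius norm bounds the largest eigenvalue of S^T S.
    step = 1.0 / sum(x * x for row in signatures for x in row)
    f = [1.0 / n_pop] * n_pop
    for _ in range(steps):
        residual = [sum(s * fj for s, fj in zip(row, f)) - y
                    for row, y in zip(signatures, mixture)]
        # Gradient of 0.5 * ||S f - m||^2 is S^T (S f - m).
        grad = [sum(row[j] * r for row, r in zip(signatures, residual))
                for j in range(n_pop)]
        f = [max(0.0, fj - step * g) for fj, g in zip(f, grad)]  # project onto f >= 0
    total = sum(f)
    return [fj / total for fj in f] if total > 0 else f


S = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 genes x 2 populations (toy)
print([round(x, 3) for x in deconvolve(S, [0.3, 0.7, 1.0])])  # [0.3, 0.7]
```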

O030 - Inferring Gene Regulatory Network from Variable Delay High Temporal Data

Short Abstract: Background: Recent advances in live-cell imaging techniques have enabled us to closely observe cell lineages with high temporal resolution. Gene expression patterns in these cell lineages suggest regulatory pathways, which help us decode various underlying mechanisms in developmental biology. Inferring gene networks from temporal data is fundamental but remains a long-standing challenge. Many gene network inference methods have been proposed; however, every method has its own biases. Network inference is complicated by the variable delay associated with gene regulation and by the number of time points. This calls for new gene network inference methods that can effectively handle the variable delay in high temporal resolution live-cell imaging data.
Method: Here we design a new gene network inference algorithm based on the dynamic local alignment of time-series gene expression data. The novelty of our method is the use of gapped alignment to handle the variable delay in gene regulation. The local alignment can even detect short-term gene regulation in the cell lineages that is undetectable by traditional correlation- and mutual information-based methods.
Results: We tested our method on both synthetic and C. elegans live-cell imaging data and compared its performance against other popular methods such as MIC, ARACNE and Banjo. The area under the curve (AUC) of our method was significantly higher than those of the other methods. Besides the well-established regulatory relationships, we also predicted a few new relationships that are worthy of subsequent experimental investigation.
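The key idea, that gaps in a local alignment can absorb a variable regulatory delay, can be illustrated in Python. The sketch discretizes each series into up/down/flat steps and scores the pair with a Smith-Waterman local alignment using linear gap penalties. The discretization threshold and the match, mismatch and gap scores are assumptions, not the authors' parameters. In practice a score would be compared against a null distribution (for example from shuffled series) before calling an edge.

```python
def discretize(series, eps=0.1):
    """Map successive changes to 'U' (up), 'D' (down) or 'F' (flat)."""
    out = []
    for prev, cur in zip(series, series[1:]):
        delta = cur - prev
        out.append("U" if delta > eps else "D" if delta < -eps else "F")
    return "".join(out)


def local_alignment_score(x, y, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman score; gaps let one profile lag the other by varying amounts."""
    best = 0
    prev_row = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        row = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            diag = prev_row[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            row[j] = max(0, diag, prev_row[j] + gap, row[j - 1] + gap)
            best = max(best, row[j])
        prev_row = row
    return best


regulator = discretize([0, 1, 2, 3, 2, 1, 0])           # 'UUUDDD'
target = discretize([0, 0, 0, 1, 2, 3, 3, 2, 1, 0])     # delayed, with a held peak
print(regulator, target, local_alignment_score(regulator, target))  # UUUDDD FFUUUFDDD 11
```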

O031 - High-throughput Computer Method for 3D Neuronal Structure Reconstruction from the Image Stack of the Drosophila Brain and Its Applications

Short Abstract: Drosophila melanogaster is a well-studied model organism, especially in the fields of neurophysiology and neural circuits. Analyzing the Drosophila brain is an ideal start to understanding neural structure. The most fundamental task in studying the neural network of Drosophila is to reconstruct neuronal structures from image stacks. Although the fruit fly brain is small, it contains approximately 100 000 neurons, so tracing all of them manually is infeasible. This study presents a high-throughput algorithm for reconstructing neuronal structures from 3D image stacks collected by a laser scanning confocal microscope. The proposed method reconstructs the neuronal structure by applying a shortest path graph algorithm. The vertices in the graph are selected points on the 2D skeletons of the neuron in each slice; these points lie close to the 3D centerlines of the neuron branches. The accuracy of the algorithm was verified using the DIADEM data set. This method has been adopted as part of the protocol of the FlyCircuit Database and was successfully applied to process more than 16 000 neurons. This study also shows that further analysis based on the reconstruction results can be performed to gather more information on the neural network.
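The shortest-path step can be sketched in Python with Dijkstra's algorithm over 3D skeleton points. It assumes points closer than a hypothetical `radius` are connected, with edges weighted by Euclidean distance. The actual vertex selection and edge construction of the published method are not reproduced. The neighbour search here is O(n²) per popped node; a spatial index would be needed for real image stacks.

```python
import heapq
from math import dist


def trace_path(points, start, goal, radius=1.5):
    """Shortest path through skeleton points (x, y, z) using Dijkstra.

    Points closer than `radius` are connected, weighted by Euclidean distance.
    Returns the list of point indices from start to goal, or [] if unreachable.
    """
    n = len(points)
    best = [float("inf")] * n
    prev = [None] * n
    best[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > best[u]:
            continue  # stale heap entry
        for v in range(n):
            w = dist(points[u], points[v])
            if v != u and w <= radius and d + w < best[v]:
                best[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (best[v], v))
    if best[goal] == float("inf"):
        return []
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


pts = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (1, 1, 1), (10, 0, 0)]
print(trace_path(pts, 0, 3))  # [0, 1, 2, 3]
```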

O032 - Systematic study on microRNA regulation network from next generation sequencing

Short Abstract: MicroRNAs (miRNAs) are small non-coding single-stranded RNAs that regulate gene expression at the post-transcriptional level through translational inhibition and mRNA degradation. Accumulating evidence demonstrates that miRNAs play significant roles in regulating genes that drive cancer progression and drug resistance.

Next-generation sequencing has advantages over microarrays for the discovery of non-coding and small RNAs: it is not limited by probes, and it can discover novel miRNAs when reads are aligned to the genome. Here, we present an approach for the integrated study of miRNA and RNA sequencing data. The pipeline includes data processing and analysis modules for miRNA-seq and RNA-seq, covering quality control, alignment and quantification, together with downstream integration, all implemented in the data analysis and integration framework Anduril [Ovaska et al. Genome Medicine 2010]. Our pipeline can analyze large amounts of sequencing data in parallel at a lower time cost.

Our objective in this study is to identify candidate miRNA-gene pairs that are highly correlated at the expression level and are also supported by miRNA target prediction. In addition, we integrate exome-seq and RNA-seq data to study the effects of variants in miRNA-mRNA binding regions on binding affinity. Candidate pairs are visualized as biological networks in a personalized mode.

We report here the candidate pairs associated with survival in breast cancer samples (in vivo), together with the corresponding miRNA regulation networks. Our results show that this approach is useful and efficient for finding miRNA regulatory behavior with survival effects in breast cancer and for discovering the impact of genetic variation on this regulation.
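The candidate-pair filter, strong expression correlation combined with target-prediction support, can be sketched in Python. The sketch assumes a negative Pearson correlation threshold, since miRNAs typically repress their targets. The threshold value, the function names and the toy data are hypothetical; the abstract does not state the direction or cutoff actually used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def candidate_pairs(mirna_expr, gene_expr, predicted_targets, threshold=-0.7):
    """Return (miRNA, gene, r) for predicted targets with r <= threshold.

    mirna_expr / gene_expr: dict name -> expression values over the same samples.
    predicted_targets: set of (miRNA, gene) pairs from a target-prediction tool.
    """
    hits = []
    for mirna, gene in sorted(predicted_targets):
        if mirna in mirna_expr and gene in gene_expr:
            r = pearson(mirna_expr[mirna], gene_expr[gene])
            if r <= threshold:
                hits.append((mirna, gene, r))
    return hits


mirnas = {"miR-A": [1, 2, 3, 4, 5]}
genes = {"GENE1": [5, 4, 3, 2, 1], "GENE2": [3, 3.1, 2.9, 3, 3.05]}
predicted = {("miR-A", "GENE1"), ("miR-A", "GENE2")}
print([(m, g, round(r, 2)) for m, g, r in candidate_pairs(mirnas, genes, predicted)])
# [('miR-A', 'GENE1', -1.0)]
```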

O033 - Evaluation of machine learning algorithms on protein-protein interactions

Short Abstract: Protein-protein interactions are important for the majority of biological processes, including signal transduction, transcription and translation, and are key to understanding metabolic pathways. A significant number of computational methods have been developed to predict protein-protein interactions using proteins’ sequence, structural and genomic data. With the advent of high-throughput experimental studies, information on protein-protein interactions is now available at the scale of whole proteomes. The exponential increase in the number of known protein-protein interactions makes machine learning and computational intelligence approaches the tools of choice to analyze these interactions, assign statistical significance and, finally, predict new interactions. We performed a comparative study of various machine learning methods, training them on a set of known protein-protein interactions using proteins’ global and local attributes.

O034 - Identification of genetic determinants of colony morphology switch in natural S. cerevisiae strain using a systems biology approach

Short Abstract: Colony morphology is a fascinating phenotype described in unicellular organisms as a possible step towards multicellularity. We studied the genetic determinants of a specific type of colony morphology, called filigreed morphology, observed at low frequency in natural S. cerevisiae strains. This phenotype is present in the heterozygous state in the natural S. cerevisiae strain M28. Filamentous structures are used by some pathogenic fungi, such as Candida albicans, to invade human tissues. Transcriptional analysis by means of microarrays on cells grown on fermentable and non-fermentable carbon sources, together with functional enrichment analysis, identified the genes involved in the regulation of the colony morphology switch and allowed us to dissect the filamentation process. Our results support the hypothesis that the filamentous phenotype has an ecological function in creating a community adaptable to changes in environmental conditions. Whole-genome comparative analysis of 12 M28 sporal derivatives from three different tetrads, using a next-generation sequencing (NGS) approach, allowed us to investigate the Mendelian inheritance of the filamentous morphotype. SNP calling was used to identify variants able to explain the morphotype differences between M28 meiotic segregants. We demonstrate that three tetrads are sufficient to map a genetic trait with Mendelian inheritance from NGS data. Moreover, by comparing M28 to the fully annotated S288c genome, we also found some putative new genes in the M28 natural strain. Finally, RNA-seq-based transcriptome analysis of all the sequenced M28 strains allowed us to identify a gene expression profile associated with the filamentous morphotype and to confirm the candidate morphotype regulator genes.

O035 - A Comparative Approach to Inspect the Validity of Kinetic Metabolic Models

Short Abstract: Kinetic systems biology models can be used to monitor the behavior of variables in a biological system, since they define the system’s complete dynamical behavior. However, this predictive power depends on the validity of the model structure and the associated parameters, which creates the need for careful validation of proposed model structures. Although tremendous effort has gone into constructing kinetic models for different biological systems, little effort has been put into their validation. Only a few approaches have been proposed, based either on statistical model selection algorithms or on robustness analysis of dynamical systems, and their application to real-size biological systems is still lacking, which shows the need for more straightforward and easily implementable methods. In this study, we developed a novel approach for exploring the validity of kinetic models that does not suffer from dimension-related problems. We based our approach on evaluating a kinetic model’s predictive power through cross-validation. As a reference point for this evaluation, we applied unsupervised data analysis methods to the same test sets. We applied this comparative approach to simulated data to determine under which conditions incorporating the model information into the analysis increases predictive power. At the current stage, we have concluded that experimental uncertainty alone, at realistic levels, is not enough to obscure the predictive power of kinetic modeling. However, errors in the kinetic model structure could decrease predictive power.

O036 - Identifying genes relevant to specific biological conditions in time course microarray experiments

Short Abstract: Microarrays have been extremely useful in understanding various biological processes by allowing the expression of thousands of genes to be studied simultaneously. However, analysis of microarray data is a challenging task. One of the key problems in microarray analysis is the classification of unknown expression profiles. Specifically, the often large number of non-informative genes on the microarray adversely affects the performance and efficiency of classification algorithms. Furthermore, the small number of samples relative to the number of variables poses a risk of overfitting. Feature selection methods help select relevant genes and hence improve classification accuracy.
We have developed the relative Signal-to-Noise ratio (rSNR), a novel feature selection method that ranks genes based on their specificity to an experimental condition. rSNR compares intrinsic variation, i.e. variation in gene expression within an experimental condition, with extrinsic variation, i.e. variation in gene expression across experimental conditions. Genes with low variation within the experimental condition of interest and high variation across experimental conditions are ranked higher and help improve classification accuracy. We tested feature selection methods on two time-series microarray datasets and one static microarray dataset, and found that rSNR performed better than the other feature selection methods. Genes with high rSNR yielded significantly better classification accuracies than genes with low rSNR. This indicates that rSNR discriminates genes specific to an experimental condition from noisy or irrelevant genes.
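The intrinsic-versus-extrinsic comparison can be illustrated with a Python ranking sketch. The exact rSNR formula is not given in the abstract. This sketch assumes extrinsic variation is the population standard deviation of the per-condition means, and intrinsic variation is the population standard deviation within the condition of interest. The ratio, the `eps` guard and the toy data are all hypothetical.

```python
from statistics import mean, pstdev


def rsnr(expr_by_condition, condition, eps=1e-9):
    """Hypothetical relative signal-to-noise ratio for one gene.

    expr_by_condition: dict condition -> list of expression values.
    Extrinsic variation = spread of the condition means;
    intrinsic variation = spread within the condition of interest.
    """
    extrinsic = pstdev([mean(v) for v in expr_by_condition.values()])
    intrinsic = pstdev(expr_by_condition[condition])
    return extrinsic / (intrinsic + eps)


def rank_genes(data, condition):
    """Sort gene names by decreasing rSNR for `condition`."""
    return sorted(data, key=lambda g: rsnr(data[g], condition), reverse=True)


data = {
    "g1": {"heat": [5, 5.1, 4.9], "ctrl": [1, 1.2, 0.8]},  # stable in heat, differs across conditions
    "g2": {"heat": [1, 5, 9], "ctrl": [4, 5, 6]},          # noisy, same mean in both conditions
}
print(rank_genes(data, "heat"))  # ['g1', 'g2']
```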

O037 - Comparison of different graph-based pathway analysis methods on breast cancer expression data

Short Abstract: Pathway analysis methods are a frequently used bioinformatics approach to test the enrichment or overrepresentation of differentially expressed genes in a given pathway. Probably the most popular among these methods is Gene Set Enrichment Analysis (GSEA). However, the main drawback of GSEA is that it handles pathways as gene lists, omitting any knowledge of molecular interactions or graph structure. Recently, several bioinformatics algorithms integrating pathway topology information into pathway analysis have been proposed: SPIA (Tarca et al. 2009), GGEA (Geistlinger et al. 2011), clipper (Martini et al. 2012) and PathNet (Dutta et al. 2012). We have compared and evaluated these graph-based pathway analysis methods using simulated expression data and graph information from different pathway databases (Pathway Interaction Database, KEGG, Reactome, Biocarta). To integrate graph data into our analysis, we developed the R package rBiopaxParser (Kramer et al. 2012), which imports BioPAX-encoded pathway data into the statistical computing environment R. Within R, the parser transforms the data into a directed graph or an adjacency matrix and allows further editing of pathways to obtain suitable input for the aforementioned pathway analysis methods. Further, we have applied these algorithms to breast cancer expression data to test the significance of WNT signalling pathways and sub-pathways parsed from several databases within the context of different molecular subtypes of breast cancer.

O038 - Image-derived generative models of Primary Neuron Morphology

Short Abstract: There exists a need in cell biology for concise descriptions of complex cell shapes and subcellular events, with the ability to make predictive inferences based on learned trends. One solution is to construct a virtual cell complete with subcellular structures and protein location patterns. To address this need, we have previously described CellOrganizer, an open-source system for learning three-dimensional, image-derived generative models of cell morphology and subcellular arrangements. Currently, CellOrganizer can model fibroblasts and other squamous-shaped cells. Here we describe a method for extending the previous model to accommodate neuronal cell types. In particular, we extended a recently described semiparametric tree-structured stick-breaking process to capture the distribution of neurites; this model is weakly parameterized by the total length, curvature, volume, branching and segment length of neurite populations. The model's parameter inference was tested with tracings from the NeuroMorpho database and also by acquiring and modeling confocal images of primary mouse trigeminal neurons expressing a surface GFP marker. In contrast to prior work, these parameters are biologically relevant, identifiable and learned from the empirical distribution of the data, and they do not require cell-type-specific assumptions. We found that the model is capable of accurately generating a variety of neuronal structures.

O039 - Connectivity map and the shared transcriptional network of nephrotic syndrome diseases

Short Abstract: Glomerular filtration barrier failure in nephrotic syndrome manifests with similar functional alterations irrespective of the cause of the disease. Connectivity Map (CMap) studies have proven the feasibility of gene expression-based drug discovery. We hypothesized that:
1. Gene expression signatures in focal segmental glomerulosclerosis (FSGS), membranous glomerulopathy (MGN) and minimal-change disease (MCD) reflect the mechanisms relevant for disease.
2. The shared transcriptome allows a detailed description of the common molecular mechanisms of the three diseases.
3. CMap studies in the shared transcriptome create a comprehensive nephrotic syndrome connectivity map that relates enriched drugs to the affected transcriptional networks.
Glomerular gene expression levels were compared with those of living donor controls. The functional context was defined individually for each disease and for the shared transcriptome. CMap was used as a hypothesis-generating tool, and the resulting drug-transcriptome networks were displayed using Cytoscape.
Known and novel pathways were identified among transcripts differentially expressed in disease vs. controls. Lymphocyte/interferon-related pathways were detected for FSGS, and inflammatory pathways (NFkappaB signaling) were detected for MCD. MGN signatures were enriched for antigen presentation pathways and PLA2R1 biology. The shared transcriptome was characterized by fibrosis pathways and slit diaphragm disturbances. CMap detected 38 perturbagens connected with the shared transcriptome. Drugs with known nephrotoxic properties (Gentamicin) mimicked the transcriptional changes seen in disease, while those with known beneficial effects on renal function (Lanatoside) reversed these transcriptional changes.
These findings suggest the use of gene expression data for disease characterization and for the screening of therapeutic and toxicological profiles of candidate drugs in nephrotic syndrome.
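A positive or negative CMap connection means that a drug "mimics" or "reverses" a signature. The sketch below illustrates this with a KS-style connectivity score in the spirit of the original Connectivity Map. It uses the score as we recall it, leaves out the normalization across instances, and uses toy gene names. It is not the authors' pipeline.

```python
def ks_score(tag_genes, ranked_genes):
    """KS-like enrichment of a tag list in a ranked gene list (positive = near the top)."""
    n = len(ranked_genes)
    position = {gene: i + 1 for i, gene in enumerate(ranked_genes)}  # 1-based ranks
    v = sorted(position[g] for g in tag_genes if g in position)
    t = len(v)
    if t == 0:
        return 0.0
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def connectivity_score(up_tags, down_tags, ranked_genes):
    """Unscaled score: positive if the instance mimics the signature, negative if it reverses it."""
    ks_up = ks_score(up_tags, ranked_genes)
    ks_down = ks_score(down_tags, ranked_genes)
    if ks_up * ks_down > 0:  # same sign: no consistent connection
        return 0.0
    return ks_up - ks_down


# Toy drug instance: genes ordered from most up-regulated to most down-regulated.
ranked = [f"g{i}" for i in range(1, 11)]
print(connectivity_score(["g1", "g2"], ["g9", "g10"], ranked))  # mimics the disease signature
print(connectivity_score(["g9", "g10"], ["g1", "g2"], ranked))  # reverses it
```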

O040 - Dynamic networks reveal key players in aging

Short Abstract: Susceptibility to disease increases with age, so studying aging is increasingly important. Analyses of gene expression or sequence data have been indispensable for investigating aging. However, they have studied genes and their protein products in isolation and ignored how they are connected. Proteins function by interacting with other proteins, and this is exactly what biological networks (BNs) model. Thus, analyzing the BN topologies of proteins could contribute to our understanding of aging. Current methods for analyzing systems-level BNs work on static representations, even though cells are dynamic. For this reason, and because different data types can give complementary biological insights, we integrate current static BNs with aging-related gene expression data to construct dynamic, age-specific BNs. We then apply sensitive measures of topology to the dynamic BNs to study cellular changes with age.

While global BN topologies do not change significantly with age, the local topologies of a number of genes do. We predict such genes as aging-related. We support these predictions in five ways. 1) Our predicted aging-related genes overlap significantly with "ground truth" aging-related genes. 2) Our aging-related predictions group by functions and diseases that differ from those of non-aging-related genes. 3) The functions and diseases enriched in our aging-related predictions overlap significantly with those enriched in "ground truth" aging-related data. 4) Diseases enriched in our aging-related predictions are linked to human aging. 5) The literature validates 76% of our predictions.
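One simple way to derive age-specific networks from a static network is to keep only the interactions whose partners are both expressed at a given age. The sketch below does this and then measures how much each gene's local topology changes across ages. It rests on several assumptions: the expression threshold and toy data are hypothetical, and plain node degree stands in for the more sensitive topology measures the abstract mentions.

```python
from collections import defaultdict


def age_specific_edges(static_edges, expression, threshold):
    """Keep a static interaction only if both partners are expressed at this age."""
    return [(u, v) for u, v in static_edges
            if expression.get(u, 0.0) >= threshold and expression.get(v, 0.0) >= threshold]


def degrees(edges):
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def topology_change(static_edges, expression_by_age, threshold=1.0):
    """Per gene: spread (max - min) of its degree across the age-specific networks."""
    per_age = [degrees(age_specific_edges(static_edges, expr, threshold))
               for expr in expression_by_age.values()]
    genes = sorted({g for edge in static_edges for g in edge})
    return {g: max(d[g] for d in per_age) - min(d[g] for d in per_age) for g in genes}


# Toy data: hypothetical genes and expression values at two ages.
ppi = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
expr = {"age 20": {"A": 2.0, "B": 2.0, "C": 2.0, "D": 2.0},
        "age 80": {"A": 2.0, "B": 2.0, "C": 2.0, "D": 0.1}}
print(topology_change(ppi, expr))
```

Genes whose local topology changes most (here C and D) would be the candidates for aging-related genes.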

O041 - A re-ranking algorithm for gene regulatory network predictions using graphlets and graph-invariant properties

Short Abstract: Many algorithms have been proposed in the last decade to deduce gene regulatory networks from high-throughput data such as gene expression measurements. These algorithms typically produce a prioritized list of links between transcription factors and target genes. They rank links purely by connections found in the data and do not consider the typical structural properties of gene regulatory networks. In this ongoing work, we present an algorithm that takes a prioritized list of gene regulatory links as input and re-ranks it based on various graph-invariant properties.
The algorithm works as follows. Within a simulated annealing framework, we create n smaller sub-networks of size k, 2k, …, nk by taking the top of the ranking at increasing thresholds. Next, we calculate penalty functions for each sub-network based on its graphlet distribution, the original prediction rank and other graph-invariant properties. We combine the penalty functions into a global fitness function of the ranked list, giving greater weight to the smaller networks.
Results show that this approach can effectively re-rank the original list so that true positive links move towards the top, yielding a significantly better prediction. However, challenges remain in automatically tuning the framework and in finding additional graph-invariant properties that can drive the re-ranking.
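The re-ranking loop described above can be sketched as follows. Sub-networks of size k, 2k, …, nk are penalized and weighted toward the smaller ones, and simulated annealing proposes swaps in the ranking. The penalty below is an assumption made for illustration. It combines rank displacement with a hypothetical "hub regulator" prior in place of the graphlet-based terms, and every parameter value is arbitrary.

```python
import math
import random
from collections import Counter


def penalty(sub_links, original_rank):
    """Hypothetical penalty for one top-of-list sub-network."""
    size = len(sub_links)
    # Term 1: how far links have moved from their original positions (normalised).
    displacement = sum(abs(pos - original_rank[link]) for pos, link in enumerate(sub_links)) / size ** 2
    # Term 2 (assumed structural prior): fraction of links whose regulator occurs
    # only once, since GRNs tend to be dominated by a few hub regulators.
    out_degree = Counter(tf for tf, _ in sub_links)
    lonely = sum(1 for tf, _ in sub_links if out_degree[tf] == 1) / size
    return displacement + lonely


def fitness(ranking, original_rank, k, n):
    """Nested top-k, top-2k, ..., top-nk sub-networks; smaller ones weigh more."""
    return sum(penalty(ranking[: i * k], original_rank) / i for i in range(1, n + 1))


def anneal(links, k, n, steps=2000, t0=1.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    original_rank = {link: pos for pos, link in enumerate(links)}
    current = list(links)
    cur_fit = fitness(current, original_rank, k, n)
    best, best_fit = list(current), cur_fit
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]  # propose a swap
        new_fit = fitness(current, original_rank, k, n)
        if new_fit <= cur_fit or rng.random() < math.exp((cur_fit - new_fit) / temp):
            cur_fit = new_fit  # accept: always if better, sometimes if worse
            if new_fit < best_fit:
                best, best_fit = list(current), new_fit
        else:
            current[i], current[j] = current[j], current[i]  # reject: undo the swap
        temp *= cooling
    return best


links = [("TF1", "g1"), ("TF2", "g2"), ("TF1", "g3"), ("TF3", "g4"), ("TF1", "g5"), ("TF2", "g6")]
print(anneal(links, k=2, n=3))
```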

O042 - Orphan enzymes survey, a decade later

Short Abstract: Millions of protein database entries have no reliably assigned function. This shortcoming limits the knowledge that can be extracted from genomes. In contrast, the "orphan enzyme activities" problem, first reported a decade ago, concerns experimentally characterized activities that lack an associated protein sequence. Here, we present an update of previously conducted surveys on orphan enzymes.
Over ten years, the percentage of orphan enzymes has decreased from 38% to 22%. Even so, more than 1,000 of the 5,000 entries in the Enzyme Commission (EC) classification are still orphans. The number of EC entries has also grown considerably in recent years: more than 800 have been created since 2010. We extended this study to local orphans: activities that have no representative sequence in a given clade but have one in other organisms. We observed a strong bias toward local orphans in Archaea. We also estimated the presence of candidate homologous proteins that may be shared between Eukaryotes and Prokaryotes. In addition, we analyzed the connectivity of orphan enzymes in metabolic networks. Many of them are not in any pathway, and only a few are neighbors of non-orphan activities. Finally, we studied the relations between protein domains and catalyzed activities. Newly discovered enzymes are mostly associated with already known enzyme domains. Thus, exploring the promiscuity of known enzyme families may resolve part of the orphan enzyme problem. In parallel, the discovery of new families may extend the landscape of enzymatic activities.
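The distinction between global and local orphans can be stated as a simple classification rule, shown here as a minimal sketch. It assumes a hypothetical mapping from each activity to the clades in which at least one sequence is known. The activity names and clade assignments are toy placeholders, not real EC data.

```python
def orphan_status(activities, clades_with_sequence, clade):
    """Label each enzyme activity relative to one clade.

    clades_with_sequence maps activity -> set of clades in which at least one
    protein sequence is known to carry that activity.
    """
    status = {}
    for activity in activities:
        clades = clades_with_sequence.get(activity, set())
        if not clades:
            status[activity] = "global orphan"  # no sequence in any organism
        elif clade not in clades:
            status[activity] = "local orphan"   # sequences exist, but not in this clade
        else:
            status[activity] = "has sequence"
    return status


# Toy, hypothetical activities (not real EC numbers).
known = {"act1": {"Bacteria", "Eukaryota"}, "act2": {"Archaea"}, "act3": set()}
status = orphan_status(["act1", "act2", "act3"], known, clade="Archaea")
print(status)
share = sum(s == "global orphan" for s in status.values()) / len(status)
print(f"global orphan share: {share:.0%}")
```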

O043 - Dissecting the transcriptional dynamics of zebrafish heart regeneration through network-based approaches

Short Abstract: The zebrafish is a clinically relevant model of heart regeneration. Unlike mammals, it has a remarkable capacity for heart repair after injury, and it promises new advances for translational research. A systems-level understanding of the transcriptional responses following injury is needed to identify drivers of heart tissue repair and potential targets for boosting this mechanism in humans. Using in silico approaches, we identified and characterized gene modules that may be crucial for inducing heart regeneration. This involved differential expression analyses, time-series clustering, inference of gene association networks from expression data, network clustering and functional analyses of candidate modules. We used independent published datasets to generate and validate these models. Top candidate modules include genes known to promote cell proliferation and heart repair in mammals, such as periostin. The transcriptional responses of key module-linked genes showed relatively concordant dynamics across independent datasets. We identified potential novel players in heart regeneration and delineate putative regulatory mechanisms responsible for heart repair. Our approach thus provides a basis for further experimental and computational investigations, including those carried out in our laboratories using alternative in vivo models.

O044 - Metabolic Network Inference using Sequence Kernels

Short Abstract: Metabolic networks represent the set of metabolic pathways in a cell and let us model molecular systems to understand the underlying biological mechanisms. However, many pathways remain unknown, and some known pathways still have missing enzymes. This work describes how sequence data can be integrated into machine-learning methods that infer new metabolic pathways and missing enzymes. These methods are based on metric learning for inferring metabolic networks within kernel-based frameworks. They are supervised: reliable a priori knowledge about part of the true network is used in the inference process itself. The network is represented as a graph and inferred with classification and regression methods (i.e., Support Vector Machine, Kernel Matrix Regression and Kernel Canonical Correlation Analysis). Our first solution compares different sequence kernels (i.e., Pfam, Motif and Spectrum) within these supervised methods. In several experiments, these kernels achieved higher accuracy than non-sequence kernels in all cases. As a second solution, we present more general kernels, based on rational kernels, that capture sequence similarity with respect to large sets of proteins. Rational kernels use weighted finite-state transducers and automata to represent and efficiently compute large sequence problems. We report results of extensive experiments showing that they compare favorably with other kernels in predicting metabolic networks.
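The Spectrum kernel mentioned above compares two sequences by the k-mers (substrings of length k) they share. The sketch below shows the standard k-spectrum kernel and its cosine-normalized Gram matrix. The choice k = 3 and the short peptide strings are illustrative assumptions.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=3):
    """k-spectrum kernel: inner product of the two k-mer count vectors."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(count * ct[kmer] for kmer, count in cs.items())


def normalised_gram(seqs, k=3):
    """Cosine-normalised kernel matrix (sequences must be at least k long)."""
    K = [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]
    m = len(seqs)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(m)] for i in range(m)]


print(spectrum_kernel("MKVLA", "MKVLG"))    # shared 3-mers: MKV, KVL
print(normalised_gram(["MKVLA", "MKVLG"]))
```

A Gram matrix like this is what the supervised kernel methods listed above consume in place of explicit feature vectors.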

O045 - CircuitsDB_2: integrating lincRNAs into the cancer genes regulatory network

Short Abstract: Eukaryotic cell development and viability are governed by multiple layers of gene expression control. Impairment of the main regulatory elements in this process can have severe effects, including cancer. Recently, numerous long noncoding RNA (lncRNA, > 200 nt) transcripts have been discovered and characterized in human, with a profound impact on our understanding of how a cell functions. lncRNAs have been reported to drive tumor-suppressive and oncogenic functions in several cancer types.

Several studies have focused on reconstructing transcriptional or microRNA-mediated regulatory networks. Nonetheless, computational models of the coordinated interplay of microRNAs and lncRNAs within human regulatory circuits are largely lacking, particularly for their effect on cancer-related genes.

We previously built CircuitsDB, a web server devoted to the study of human and mouse microRNA-mediated regulatory circuits. Here, we introduce CircuitsDB_2, an improvement and extension of our previous work. It systematically studies, by computational means, the coordinated role of microRNAs and lincRNAs within the human regulatory network. It also studies their impact on cancer-related genes, i.e. genes whose mutation or dysregulation is known to be functionally involved in tumor initiation and progression.

CircuitsDB_2 contains a large number of human regulatory circuits composed of interacting TFs, genes, microRNAs and lincRNAs. The resulting integrated network can be the starting point for a deeper understanding of its system-level properties. Furthermore, CircuitsDB_2 identifies novel microRNA-lincRNA-gene connections, thus providing candidates for experimental validation.
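Regulatory circuits of this kind are often enumerated as feed-forward triples. In such a triple, a master regulator controls both a mediator and a target, and the mediator also controls the target. The sketch below finds such triples in a typed directed graph. The node names, types and edges are hypothetical, and this is not necessarily how CircuitsDB_2 defines or stores its circuits.

```python
from collections import defaultdict


def feed_forward_circuits(edges, node_type, master="TF", mediator="miRNA", target="gene"):
    """Find (master, mediator, target) triples where master -> mediator,
    master -> target and mediator -> target are all present."""
    successors = defaultdict(set)
    for source, dest in edges:
        successors[source].add(dest)
    circuits = []
    for m in list(successors):
        if node_type.get(m) != master:
            continue
        for x in successors[m]:
            if node_type.get(x) != mediator:
                continue
            for t in successors[m] & successors.get(x, set()):
                if node_type.get(t) == target:
                    circuits.append((m, x, t))
    return sorted(circuits)


# Toy network with hypothetical node names.
edges = [("TF1", "miR-x"), ("TF1", "GENE1"), ("miR-x", "GENE1"),
         ("TF1", "GENE2"), ("TF1", "linc-y"), ("linc-y", "GENE2")]
types = {"TF1": "TF", "miR-x": "miRNA", "linc-y": "lincRNA", "GENE1": "gene", "GENE2": "gene"}
print(feed_forward_circuits(edges, types))                      # TF -> miRNA -> gene
print(feed_forward_circuits(edges, types, mediator="lincRNA"))  # TF -> lincRNA -> gene
```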

O046 - Key Cardiovascular Disease Genes

Short Abstract: The structure of protein-protein interaction (PPI) networks has already been used successfully as a source of new biological information. Cardiovascular diseases (CVDs) are a major global cause of death, yet many CVD genes still await discovery. We explore ways to use the structure of the human PPI network to find important CVD genes that should be targeted by drugs. We hope to use the properties of such important genes to predict new ones, which would in turn improve the choice of therapy. We propose a methodology that examines the PPI network wiring around genes involved in CVDs. We use it to identify a subset of CVD-related genes that is statistically significantly enriched in drug targets and 'driver genes'. We seek such genes because driver genes have been proposed to drive the onset and progression of a disease. Our identified subset of CVD genes overlaps largely with the Core Diseasome. The Core Diseasome has been postulated to be key to disease formation and hence the primary object of therapeutic intervention. This indicates that our methodology identifies 'key' genes responsible for CVDs. We therefore use it to predict new CVD genes and validate over 70% of our predictions in the literature. Finally, we show that our predicted genes are functionally similar to currently known CVD drug targets. This supports the potential utility of our methodology for improving therapy for CVDs.
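The abstract does not name the statistical test used for enrichment in drug targets. A one-sided hypergeometric test is a common choice and is assumed in the sketch below. All counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the network, K: known drug targets among them,
    n: size of the selected CVD-gene subset, k: drug targets inside that subset.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts, not from the study.
print(hypergeom_upper_tail(N=20, K=5, n=4, k=3))
```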

O047 - ANALYSIS OF MULTIGENE DISORDERS USING LYNX SYSTEM

Short Abstract: Gaining greater understanding of molecular mechanisms underlying common multigene disorders (e.g. autism, schizophrenia, diabetes) is a major challenge of biomedical research. Construction of predictive models of these mechanisms critically depends on the availability of efficient algorithmic approaches for mining of high-throughput genomic data integrated with clinical observations and prior knowledge about genotype-phenotype relationships.
We present an approach and a supporting computational platform, LYNX (http://lynx.ci.uchicago.edu/), for analyzing complex heritable disorders from a systems biology perspective. Our approach integrates, on a large scale, genomic and clinical data with various classes of biological information from over 40 public and private databases. These data are used to identify genes and molecular networks that contribute to phenotypes of interest. They are also used to predict additional high-confidence disease genes to be tested experimentally. Our analytical strategy includes: (a) enrichment analysis of high-throughput genomic data; (b) feature-based gene prioritization; and (c) the development of network-based disease models to identify molecular mechanisms involved in disease pathogenesis. Network-based gene prioritization builds on Dr. Börnigen-Nitsch's previous work on the PINTA system. The algorithms were modified to accommodate a variety of weighted data types for gene prioritization. We will illustrate the utility of our approach by predicting the molecular mechanisms underlying brain connectivity disorders (e.g. agenesis of the corpus callosum, autism) and other neurodevelopmental disorders. This knowledge will eventually lead to the development of efficient diagnostic and therapeutic strategies. LYNX tools and algorithms are also exposed as web services.
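Random walk with restart is one widely used form of network-based gene prioritization. The walk spreads score from known disease genes (seeds) to their network neighbors. The sketch below is a swapped-in illustration of that general technique, not the PINTA or LYNX algorithm. The restart probability, tolerance and toy graph are assumptions.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-12, max_iter=10_000):
    """Stationary visiting probabilities of a walker that restarts at `seeds`.

    adj maps node -> set of neighbours (undirected; nodes without neighbours
    would leak probability mass, so none are assumed here).
    """
    nodes = sorted(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            if adj[u]:
                share = (1.0 - restart) * p[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph A - B - C - D with one known disease gene, A.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(adj, seeds={"A"})
print(sorted(scores, key=scores.get, reverse=True))  # candidate genes by proximity to the seed
```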

O048 - Probic: Simultaneously detecting coexpression modules and their regulatory patterns

Short Abstract: Probic is a tool to simultaneously search for coexpression modules and their regulatory motifs.

The key to the ProBic-II framework is a data integration strategy based on Probabilistic Relational Models (PRMs). ProBic-II takes advantage of Bayesian integration of multiple data types to optimize the combined task of learning genes and their motifs iteratively (an EM-based strategy). First, ProBic-II discovers tightly co-expressed modules (biclusters). Then, the genes in these biclusters are used as a query to retrieve regulatory patterns. The updated gene lists are in turn used as seeds for coexpression modules. These alternating steps stop when the number of genes in both the biclusters and the regulatory patterns becomes stable.

We applied ProBic-II to a large-scale Escherichia coli (E. coli) compendium together with regulatory motif data, obtained by screening the whole genome with known motifs extracted from RegulonDB. By applying ProBic-II to the binding site information and the microarray compendium, we confirmed 54% of the known transcriptional interactions in E. coli.

O049 - ProteoMirExpress: inferring microRNA-centered regulatory networks from high-throughput proteomic and transcriptome data

Short Abstract: MicroRNAs (miRNAs) regulate gene expression through translational repression and RNA degradation. Recently developed high-throughput proteomic methods measure gene expression changes at the protein level. They can therefore reveal the direct effects of miRNA-mediated translational repression. Here, we present ProteoMirExpress, a web server that integrates proteomic and mRNA expression data to infer miRNA-centered regulatory networks. Given both types of high-throughput data from the user, ProteoMirExpress discovers not only miRNA targets with decreased mRNA. It also finds two subgroups that most current methods ignore: targets whose proteins are suppressed while their mRNAs are not significantly changed, and targets whose mRNAs are decreased while their proteins are not significantly changed. Furthermore, it can detect both direct and indirect targets of miRNAs. ProteoMirExpress therefore provides more comprehensive miRNA-centered regulatory networks. We use several published datasets to assess the quality of our inferred networks and to demonstrate the value of our server. ProteoMirExpress is available at http://jjwanglab.org/ProteoMirExpress, with free access for academic users.
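The target subgroups described above can be illustrated by splitting predicted targets according to whether their mRNA and/or protein levels go down. This is a minimal sketch only. It assumes log2 fold changes as input and a single hypothetical cutoff instead of ProteoMirExpress's actual significance criteria.

```python
def classify_targets(targets, mrna_lfc, protein_lfc, cutoff=-0.5):
    """Split predicted miRNA targets by whether mRNA and/or protein decrease.

    mrna_lfc / protein_lfc map gene -> log2 fold change (perturbed vs control);
    a gene counts as 'down' when its fold change is at or below `cutoff`.
    """
    groups = {"both down": [], "protein down only": [],
              "mRNA down only": [], "neither": []}
    for gene in targets:
        m_down = mrna_lfc.get(gene, 0.0) <= cutoff
        p_down = protein_lfc.get(gene, 0.0) <= cutoff
        if m_down and p_down:
            groups["both down"].append(gene)
        elif p_down:
            groups["protein down only"].append(gene)  # e.g. translational repression
        elif m_down:
            groups["mRNA down only"].append(gene)
        else:
            groups["neither"].append(gene)
    return groups


# Toy, hypothetical fold changes.
mrna = {"G1": -1.2, "G2": -0.1, "G3": -0.9, "G4": 0.2}
protein = {"G1": -1.0, "G2": -0.8, "G3": 0.0, "G4": 0.1}
print(classify_targets(["G1", "G2", "G3", "G4"], mrna, protein))
```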

O050 - Network Based Classification Model for Deriving Novel Drug-Disease Associations

Short Abstract: Drugs can act on multiple targets rather than a single one. This polypharmacology can enable drug repositioning, which finds new indications for existing drugs. It can also cause adverse side effects, owing to the molecular mechanisms that may underlie a chemical–disease connection. We developed a novel method to find new drug-disease associations. First, we generated an integrated genetic network combining protein-protein interactions, genetic interactions, gene regulation and inferred interactions. On this network, we identify drugs or diseases that act similarly in the biological system and compare them with known drug-disease associations. We measured drug-target similarity between the drug in a known drug-disease association and other drugs. We also measured disease-gene similarity between the disease in a known association and other diseases. Furthermore, we extracted the common functional modules from known drug-disease associations and quantified how those modules relate to particular drugs or diseases. These measurements are converted into features for each unknown drug-disease pair and used as input to a classifier. The classifier achieved an AUC of 0.97 when we used the Comparative Toxicogenomics Database (CTD) as test data, and we identified new drug-disease associations.
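The AUC reported above is the probability that a randomly chosen true association is scored higher than a randomly chosen non-association. The sketch below computes it directly from that pairwise definition, counting ties as one half (the Mann-Whitney form). The scores and labels are toy values.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy classifier outputs for four drug-disease pairs (1 = known association).
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

The double loop is quadratic. For large candidate sets, a rank-based formulation is the usual choice.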

O051 - Different stability of gene networks formed following induction of long-term potentiation identified using Boolean models

Short Abstract: Long-term potentiation (LTP) is a well-accepted model of long-term memory formation in the mammalian brain and has been shown to last for weeks. This persistence is thought to be underpinned by altered gene expression. Indeed, the late phase of LTP depends on the activation of distinct gene regulatory networks (GRNs) at different time intervals after induction. Shortly after LTP induction, these networks have been shown to contribute to expansion of the gene response and to the regulation of key cell signalling mechanisms. Later GRNs are likely to contribute to synaptic reorganisation and to a homeostatic gene response important in the consolidation of LTP.

We hypothesised that the temporally specific LTP-related GRNs would show different dynamic properties over time. These properties would reflect the compromise between stability against background noise and sensitivity to meaningful perturbations.

We simulated random Boolean networks (RBNs) for the GRNs identified at 20 min, 5 h and 24 h after LTP induction. As a benchmark, we also simulated RBNs derived from a published yeast transcriptional network. First, we found that the LTP-related GRNs have dynamic properties similar to those of the yeast network. We then studied the dynamic differences between the three LTP networks. The rapidly induced GRNs are more sensitive to changes in the initial conditions, while the late network has a more stable architecture.

These data suggest that the LTP-related gene networks are vulnerable to change at early times but become more stable over time, a property consistent with the stabilisation of LTP.
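Sensitivity to initial conditions in Boolean networks is commonly measured by damage spreading. You flip one node in a random initial state, run both copies forward, and measure how far they diverge (normalized Hamming distance). The sketch below does this for a random Boolean network with synchronous updates. The network size, in-degree and step counts are illustrative assumptions. To mimic the study more closely, the wiring in `inputs` could be replaced by an observed GRN topology, with `tables` sized 2**len(inputs[i]).

```python
import random


def random_boolean_network(n, k, rng):
    """Each node gets k distinct input nodes and a random truth table over them."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronous update: every node reads its inputs as a binary index into its table."""
    new = []
    for ins, table in zip(inputs, tables):
        idx = 0
        for i in ins:
            idx = (idx << 1) | state[i]
        new.append(table[idx])
    return new


def damage_spread(inputs, tables, n_trials, t_steps, rng):
    """Mean normalised Hamming distance after t_steps, starting from a one-bit flip."""
    n = len(inputs)
    total = 0.0
    for _ in range(n_trials):
        a = [rng.randint(0, 1) for _ in range(n)]
        b = list(a)
        b[rng.randrange(n)] ^= 1  # perturb a single node
        for _ in range(t_steps):
            a, b = step(a, inputs, tables), step(b, inputs, tables)
        total += sum(x != y for x, y in zip(a, b)) / n
    return total / n_trials


rng = random.Random(42)
inputs, tables = random_boolean_network(n=20, k=2, rng=rng)
print(damage_spread(inputs, tables, n_trials=200, t_steps=10, rng=rng))
```

Higher values indicate a network that is more sensitive to initial conditions, the property reported above for the early LTP networks.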

O052 - Revealing missing parts of the interactome via link prediction

Short Abstract: Protein interaction networks (PINs) are often used to "learn" new biological function from their topology. Current PINs are noisy, so de-noising them computationally via link prediction (LP) could improve learning accuracy. LP uses the existing PIN topology to predict missing and spurious links. Many existing LP methods rely on the shared immediate neighborhoods of the nodes to be linked, which limits them. We therefore introduce novel, sensitive LP measures that overcome the limitations of existing methods. With them, we study comprehensively which topological properties of nodes in PINs dictate whether the nodes should be linked.

We systematically evaluate the new and existing LP measures by introducing "synthetic" noise into PINs and measuring how well each measure reconstructs the original PINs. Our main findings are: 1) LP measures that favor nodes that are both "topologically similar" and have large shared extended neighborhoods perform best; 2) using more of the network topology often, though not always, improves LP accuracy; and 3) our new LP measures are superior or comparable to the existing ones. After evaluating the different methods, we use them to de-noise PINs. Importantly, de-noising improves the biological correctness of the PINs, as measured by the "enrichment" of the predicted interactions in Gene Ontology terms. Furthermore, we validate a statistically significant portion of the predicted interactions in independent, external PIN data sources.
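The evaluation protocol above can be illustrated as follows: remove a fraction of the edges, rank all non-edges by an LP score, and check how many removed edges reappear at the top. The sketch uses two classic immediate-neighborhood baselines (common neighbors and Jaccard), the kind of measure the abstract says is limited. It does not implement the authors' new measures, and it models only missing links, not spurious ones.

```python
import itertools
import random
from collections import defaultdict


def neighbor_sets(edges):
    nb = defaultdict(set)
    for u, v in edges:
        nb[u].add(v)
        nb[v].add(u)
    return nb


def common_neighbors(nb, u, v):
    return len(nb[u] & nb[v])


def jaccard(nb, u, v):
    union = nb[u] | nb[v]
    return len(nb[u] & nb[v]) / len(union) if union else 0.0


def recovery_rate(edges, frac_removed, score, seed=0):
    """Remove a fraction of edges (at least one), rank non-edges by `score`, and
    return the fraction of removed edges found among the top-ranked candidates."""
    rng = random.Random(seed)
    edges = sorted(tuple(sorted(e)) for e in edges)
    removed = set(rng.sample(edges, int(len(edges) * frac_removed)))
    kept_set = {e for e in edges if e not in removed}
    nb = neighbor_sets(kept_set)
    nodes = sorted({x for e in edges for x in e})
    candidates = [p for p in itertools.combinations(nodes, 2) if p not in kept_set]
    ranked = sorted(candidates, key=lambda p: score(nb, *p), reverse=True)
    top = ranked[: len(removed)]
    return sum(p in removed for p in top) / len(removed)


# Toy network: two dense groups joined by a single edge.
toy = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("A", "D"),
       ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G"), ("F", "H"), ("G", "H")]
for name, fn in (("common neighbors", common_neighbors), ("jaccard", jaccard)):
    print(name, recovery_rate(toy, 0.2, fn))
```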

O053 - Differential regulatory roles of SOX2 in embryonic and neural stem cells

Short Abstract: The transcription factor SOX2 is well known to be essential for early development and for the propagation of undifferentiated embryonic stem cells (ESC). SOX2 also has an essential role in the development of neural stem cells (NSC). To elucidate how the regulatory mechanisms of SOX2 differ between ESC and NSC, we performed a ChIP-chip experiment to identify SOX2 target genes in human NSC. We compared the result with equivalent publicly available data from human ESC. Target genes differed significantly between ESC and NSC. Gene set analysis showed that the target genes were enriched in different GO categories and pathways. We hypothesized that SOX2 must have cell-type-specific cofactors and confirmed this with a random permutation test. These cell-type-specific cofactors include several transcription factors that are well known to be essential in ESC or NSC. We also constructed cell-type-specific SOX2 target gene networks and expanded them with protein-protein interactions of the cofactors. Network clustering analysis of these networks revealed cell-type-specific clusters. In the ESC network cluster, TGF-beta signaling, closely related to SMAD3, SMAD4 and SP1, was significant. In the NSC network cluster, by contrast, several genes (CREB, TP53, HDAC1, YY1, SMAD2 and STAT3) stood out, and EP300 appears to act as a network hub. Finally, we analyzed the gene expression profiles of the cell-type-specific network clusters. These results should provide useful information for understanding the role of SOX2 in the differentiation of ESC to NSC.
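A random permutation test for a candidate cofactor can be framed as follows. Is the overlap between SOX2 targets and the cofactor's targets larger than the overlap obtained with random gene sets of the same size? The sketch below assumes that framing, a hypothetical gene universe and toy gene sets. The abstract does not specify the exact test statistic.

```python
import random


def permutation_p_value(targets, cofactor_targets, universe, n_perm=10_000, seed=0):
    """One-sided permutation p-value for the overlap between two gene sets."""
    rng = random.Random(seed)
    observed = len(targets & cofactor_targets)
    pool = sorted(universe)
    hits = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(pool, len(targets)))
        if len(random_set & cofactor_targets) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids reporting p = 0


# Toy, hypothetical gene sets.
universe = {f"g{i}" for i in range(200)}
sox2_targets = {f"g{i}" for i in range(20)}
cofactor_targets = {f"g{i}" for i in range(10, 40)}
print(permutation_p_value(sox2_targets, cofactor_targets, universe, n_perm=2000))
```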

O054 - Reconstruction of a gene regulatory network for T cells from publicly available microarray data

Short Abstract: Knowledge of the regulatory mechanisms during T cell activation is essential to understand the physiology and pathophysiology of this important hub of the adaptive human immune system. There is clear evidence that important regulatory processes during activation emerge from the underlying transcriptional program.
Our aim is to elucidate the structure of the regulatory mechanisms during T cell activation. To this end, we assembled a large expression dataset from publicly available microarray experiments as a basis for gene regulatory network inference. After intensive manual exploration of GEO and thorough quality checks, we identified six suitable microarray experiments. They come from two different types of Affymetrix microarrays and together form a set of 82 arrays. We processed each experiment individually using background correction and quantile normalization. We then created a meta-expression matrix of all experiments by mapping the probes of every platform involved to Ensembl gene identifiers. This meta-expression set contains all originally contributed time points. It is quantile normalized again to reduce batch- or lab-specific expression patterns between the experiments. To infer gene regulatory networks from this dataset, we apply co-expression analysis to the genes detected as differentially expressed. For each pair of genes, we calculate a correlation value based on their expression patterns and assume a gene-gene interaction for highly correlated pairs. The resulting networks are evaluated against public interaction databases and the rich literature on T cell differentiation.
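Two steps in this pipeline can be sketched compactly: quantile normalization and threshold-based co-expression edges. Quantile normalization gives every array the same value distribution, the mean of the sorted arrays. The co-expression step links gene pairs with a high Pearson correlation. The correlation cutoff, the use of the absolute correlation and the toy values are assumptions, and ties in the normalization are broken by gene order rather than averaged.

```python
import itertools
import math
import statistics


def quantile_normalize(arrays):
    """arrays: list of arrays, each a list of per-gene values in the same gene order."""
    n_genes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean across arrays at each rank.
    reference = [statistics.fmean(s[r] for s in sorted_arrays) for r in range(n_genes)]
    normalized = []
    for a in arrays:
        order = sorted(range(n_genes), key=lambda g: a[g])  # gene indices by rank
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(expr_by_gene, threshold=0.9):
    """Assume an interaction for gene pairs with |r| >= threshold (hypothetical cutoff)."""
    genes = sorted(expr_by_gene)
    return [(a, b) for a, b in itertools.combinations(genes, 2)
            if abs(pearson(expr_by_gene[a], expr_by_gene[b])) >= threshold]


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
toy = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1], "D": [1, 3, 2, 4]}
print(coexpression_edges(toy))
```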

O055 - A mechanistic and dynamic model for T helper 17 cell differentiation

Short Abstract: The differentiation of naive CD4+ helper T cells into effector T cells is largely determined by extracellular cytokine signals. These signals activate differentiation-specific transcription factors and control the dynamics of the underlying regulatory mechanisms. For T helper 17 (Th17) cell differentiation, the critical cytokines are TGFbeta and IL6, which activate the key transcription factors RORgammat and STAT3. In this study, we construct a mathematical model of the dynamics of the core components that drive Th17 cell differentiation. Our minimal model consists of the key transcription factors (RORgammat and STAT3) and two cytokine inputs (TGFbeta and IL6) that drive the differentiation process. The model is implemented as nonlinear ordinary differential equations describing the population average of the core components. The mRNA levels described by the model can be combined with time-course RNA sequencing data (B6 mice) through a Bayesian framework. This gives a data-driven, probabilistic description of model parameters and outputs. Our model reproduces realistic dynamics under four different cytokine conditions. For two of these conditions we have experimental time-course RNA-Seq data, and the model reproduces them. Furthermore, the model can generate predictions for formulating hypotheses and designing new wet-lab experiments. For example, we can determine cytokine conditions that lead to an increased risk of differentiation failure.
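An ODE model with cytokine inputs driving STAT3 and RORgammat can be sketched in a few lines. The two equations below are hypothetical: IL6 induces STAT3, and TGFbeta together with STAT3 induces RORgammat through a saturating term. They are not the authors' model, and all parameter values are arbitrary. Forward Euler is used only for brevity.

```python
def simulate_th17(tgfb, il6, k_stat3=1.0, d_stat3=0.5, k_ror=1.0, K=0.5, d_ror=0.2,
                  t_end=48.0, dt=0.01):
    """Forward-Euler integration of a toy two-variable model (hypothetical equations).

    dSTAT3/dt = k_stat3 * IL6 - d_stat3 * STAT3
    dRORgt/dt = k_ror * TGFb * STAT3 / (K + STAT3) - d_ror * RORgt
    Cytokine inputs are held constant; returns (STAT3, RORgt) at t_end.
    """
    stat3, ror = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        d_stat3_dt = k_stat3 * il6 - d_stat3 * stat3
        d_ror_dt = k_ror * tgfb * stat3 / (K + stat3) - d_ror * ror
        stat3 += dt * d_stat3_dt
        ror += dt * d_ror_dt
    return stat3, ror


# Four illustrative cytokine conditions (inputs on/off).
for tgfb, il6 in ((0, 0), (1, 0), (0, 1), (1, 1)):
    print(f"TGFb={tgfb} IL6={il6}:", simulate_th17(tgfb, il6))
```

In a Bayesian setting like the one described, parameters such as `k_ror` or `d_ror` would be inferred from the time-course data rather than fixed.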

O056 - Systems Pathology: Stratification of antibody subsets by making use of the Web Interface Protein Atlas (WIPA)

Short Abstract: The Web Interface Protein Atlas (WIPA) (www.toponostics.org/wipa) supports computational systems biology/pathology studies. It compares the protein expression of any gene set of interest (e.g. KEGG pathway, gene ontology, protein function) across millions of immunohistochemistry (IHC) images. Users can intuitively determine differential expression of chosen protein sets in any combination of tissues, in both a gene-centered and an expression-centered view. We have now integrated a "Validation score of antibody" query. It lets users select antibodies by their performance in immunohistochemistry (IHC), immunofluorescence (IF), Western blotting (WB) and protein arrays (PA), using the categories supportive, uncertain, non-supportive and unknown. In Proteinatlas version 10, 14070 genes/proteins are detected by 18053 antibodies. Of these, 11696 are called supportive by protein array (PA) analysis, 9935 by WB, 4360 by IHC and 3423 by IF. Combining WB and PA, 6631 antibodies are "supportive" in both. Adding IHC and then IF, 1627 and finally 717 antibodies remain supportive. For 289 of the 717 "supportive" antibodies (Proteinatlas version 7), we quantitated cytochemical stainings with Definiens XD software (Data A). We compared these with the original Proteinatlas classification (Data B) and with published RNA-seq data (Data C). Only 12 antibodies showed comparable performance in all three data sets, with correlation coefficients (CC) of 0.75 or higher. Comparing data A and B gave 75, data C and B gave 21, and data A and B gave 33 antibodies above a CC of 0.75. Apart from the four-step Proteinatlas classification, the reasons for this poor performance are manifold.
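The stepwise narrowing of "supportive" antibodies (PA and WB, then adding IHC, then IF) is a running set intersection. The sketch below shows it. The antibody IDs and their calls are hypothetical toy data, not Protein Atlas records.

```python
def supportive_in_all(calls_by_method, methods):
    """Antibodies classified 'supportive' by every method in `methods`.

    calls_by_method maps method name -> {antibody_id: category}.
    """
    per_method = [
        {ab for ab, category in calls_by_method[m].items() if category == "supportive"}
        for m in methods
    ]
    return set.intersection(*per_method)


calls = {  # toy, hypothetical antibody IDs and calls
    "PA": {"ab1": "supportive", "ab2": "supportive", "ab3": "uncertain"},
    "WB": {"ab1": "supportive", "ab2": "supportive", "ab3": "supportive"},
    "IHC": {"ab1": "supportive", "ab2": "non-supportive", "ab3": "supportive"},
    "IF": {"ab1": "unknown", "ab2": "supportive", "ab3": "supportive"},
}
for methods in (["PA", "WB"], ["PA", "WB", "IHC"], ["PA", "WB", "IHC", "IF"]):
    print(methods, sorted(supportive_in_all(calls, methods)))
```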

O057 - Approaches to Modelling the Mouse Response to Trypanosomiasis

Short Abstract: Mouse models of trypanosomiasis have been studied extensively to reveal how the immune response differs between Trypanosoma congolense-resistant and -susceptible mice. High-throughput gene expression data are available for susceptible and resistant strains. However, we still lack a network-level understanding of how genes interact during infection and how these interactions might relate to the difference in susceptibility between mouse strains.

For each strain, A/J and C57BL/6, theoretical Boolean interaction networks were built from known pathways for Natural Killer (NK) cells. The state of the networks in the uninfected strains was first simulated using Cytoscape with the SimBoolNet plugin. We also checked the models using a generalized framework for interactomes, which can be used to investigate time-varying topology. When the results were compared against microarray gene expression data we found a close similarity between the network simulation results and gene expression for both strains.

The principal difference between these initial murine models of the NK cell network was found in the CD2 superfamily of receptors, including 2B4 and Ly9, which contain switch motifs. Also, the model of the susceptible A/J strain suggests the receptor NK-P1B is not expressed.

This preliminary work highlights the importance of adopting a differential networks approach to understanding the difference in immune response between different strains. We will now extend the work to model the differences in response to infection.

O058 - AllegroMCODE: a GPU-accelerated bioinformatics application for large-scale protein interaction networks

Short Abstract: Proteins, nucleic acids, and small molecules form a dense network of molecular interactions in a cell. The architecture of molecular networks can reveal important principles of cellular organization and function. Protein complexes are groups of proteins that interact with each other at the same time and place, forming a single multi-molecular machine.
A protein-protein interaction (PPI) network can be represented as a graph in which proteins are nodes and their interactions are edges. Protein complexes can be identified as highly interconnected subgraphs, and computational methods are now indispensable for detecting them in protein interaction data. In addition, high-throughput screening techniques such as yeast two-hybrid screening allow detailed protein-protein interaction maps to be built in multiple species. As interaction datasets grow, the scale of interconnected protein networks increases exponentially, and this growing complexity poses computational challenges for network analysis.
To address this computational challenge, we present a fast parallel implementation of MCODE, a well-known sequential complex-finding algorithm, running on commodity graphics hardware. Our parallel algorithm is implemented on NVIDIA's CUDA parallel computing architecture and is evaluated on various kinds of large-scale PPI networks. On large-scale PPI networks, our GPU-accelerated implementation on the latest NVIDIA graphics hardware is two orders of magnitude faster than the original MCODE running on the latest CPU.
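MCODE's first stage weights every vertex, and this per-vertex work is what makes the algorithm amenable to parallelization. Each vertex's weight is the highest k-core level of its closed neighborhood times the density of that k-core. Below is a minimal sequential sketch of that vertex-weighting step only. It follows our reading of the original MCODE description (density without self-loops) and leaves out complex seeding, expansion and the CUDA parallelization.

```python
def k_core(adj, nodes, k):
    """Largest subset of `nodes` in which every vertex has >= k neighbours inside the subset."""
    core = set(nodes)
    changed = True
    while changed:
        changed = False
        for v in list(core):
            if len(adj[v] & core) < k:
                core.discard(v)
                changed = True
    return core


def mcode_vertex_weight(adj, v):
    """Highest k-core level of v's closed neighbourhood times that core's density."""
    hood = adj[v] | {v}
    k_max, best = 0, set()
    while True:
        core = k_core(adj, hood, k_max + 1)
        if not core:
            break
        k_max, best = k_max + 1, core
    size = len(best)
    if size < 2:
        return 0.0
    n_edges = sum(len(adj[u] & best) for u in best) / 2
    density = n_edges / (size * (size - 1) / 2)
    return k_max * density


# Toy graph: triangle A-B-C with a pendant vertex D attached to A.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print({v: mcode_vertex_weight(adj, v) for v in sorted(adj)})
```

Because each weight depends only on a vertex's local neighborhood, the vertices can be processed independently, which is what a GPU implementation can exploit.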

O059 - Simulating Lateral Inhibition in the Pancreatic Duct

Short Abstract: The adult pancreas is composed of exocrine and endocrine tissue. The endocrine cells are organized in the islets of Langerhans, where the insulin-secreting beta-cells reside.
In embryonic development, the islets arise from Neurogenin3 (Ngn3)-expressing cells in the growing pancreatic ducts. After detaching from the ducts, these cells migrate into the surrounding mesenchymal tissue.
However, only some cells of the duct become Ngn3-positive. They are thought to prevent their neighbors from also expressing Ngn3 through cell-to-cell Delta-Notch signaling.

Here we model the negative feedback loop consisting of Ngn3 and Hes1, driven by the Delta-Notch pathway, in a cell- and gene-based program simulating a growing pancreatic duct.
The concentration of gene products in each 'cell' of the duct is displayed as 'virtual in situ staining' using Java and the Java 3D API.

The negative feedback loop can show oscillatory behavior unless the Dll1 mRNA decay rate falls below a certain threshold.
This gives rise to a stable pattern of Ngn3 positive cells surrounded by Hes1 expressing cells.
In a hexagonal cell neighborhood, the ratio of Ngn3- to Hes1-expressing cells is 1:2, which is higher than observed experimentally.
To get more realistic values of the ratio we added Nkx6.1 and Ptf1a to the network.
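The 1:2 ratio follows from counting contacts in an ideal lateral-inhibition pattern: each Ngn3+ cell has six Hes1 neighbors and each Hes1 cell touches three Ngn3+ cells, so 6·N(Ngn3) = 3·N(Hes1). The sketch below checks this on a periodic hexagonal lattice in axial coordinates. It verifies only the idealized end-state pattern (a hypothetical rule, (q − r) mod 3 = 0 marks Ngn3+ cells); it is not the authors' gene-network simulation.

```python
from collections import Counter

# Axial-coordinate offsets of the six neighbors of a hexagonal cell.
HEX_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def neighbours(q, r, n):
    """Six neighbors of cell (q, r) on an n x n periodic (toroidal) hexagonal lattice."""
    return [((q + dq) % n, (r + dr) % n) for dq, dr in HEX_OFFSETS]


def lateral_inhibition_pattern(n):
    """Idealized pattern: a cell is Ngn3+ when (q - r) % 3 == 0, otherwise Hes1+."""
    if n % 3:
        raise ValueError("n must be a multiple of 3 for the pattern to tile the torus")
    return {(q, r): "Ngn3" if (q - r) % 3 == 0 else "Hes1"
            for q in range(n) for r in range(n)}


def check_pattern(pattern, n):
    """Assert the lateral-inhibition rules hold, then count cell states."""
    for (q, r), state in pattern.items():
        adjacent = [pattern[c] for c in neighbours(q, r, n)]
        if state == "Ngn3":
            assert "Ngn3" not in adjacent  # no two Ngn3+ cells touch
        else:
            assert "Ngn3" in adjacent      # every Hes1 cell is inhibited by a neighbor
    return Counter(pattern.values())


counts = check_pattern(lateral_inhibition_pattern(6), 6)
```

Any lattice size divisible by 3 gives the same 1:2 ratio, which is why additional factors such as Nkx6.1 and Ptf1a are needed to move the ratio toward experimental values.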

At this point we conclude that the negative feedback loop between Ngn3 and Hes1 in pancreatic ducts allows oscillatory behavior, which can switch to stable patterning when the decay rate(s) that influence the oscillation period are varied.

O060 - A fast and efficient algorithm to eliminate thermodynamically infeasible loops from flux distributions

Short Abstract: Constraint-based modeling methods (e.g., flux balance analysis, FBA) are routinely used to predict metabolic phenotypes, such as growth rates, ATP yield and the fitness of gene knockouts, and have recently been applied to predict drug targets [1, 3]. A frequent problem with FBA solutions is the presence of thermodynamically infeasible cycles, which distort the predicted fluxes. One technique for removing such loops minimizes the sum of absolute fluxes through the metabolic network [2]. However, this approach also changes many other fluxes and ignores the existence of alternative feasible flux distributions. Comparisons with ‘omics’ data indicate that the minimum-flux solution frequently does not reflect the biologically realized flux distribution. Schellenberger et al. proposed a version of FBA that avoids loops by incorporating thermodynamic laws into the calculation; however, this solution uses mixed-integer linear programming (MILP), making it computationally far more expensive than standard FBA [4].
We propose a simple post-processing step for FBA solutions (cyclefreeFBA) that removes cycles from a given flux distribution without disturbing fluxes not involved in the loops. Our method requires only one additional linear optimization, making it computationally efficient. The algorithm minimizes the sum of absolute fluxes while conserving the exchange fluxes as well as the direction of all internal fluxes.
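The description above can be written as a single linear program. In the sketch below the notation is ours, not the authors': S is the stoichiometric matrix, v* the given FBA solution, E the set of exchange reactions and I the set of internal reactions. Because the sign of every internal flux is fixed to that of v*, the term sign(v*_i) v_i equals |v_i| on the feasible set, so the absolute-value objective becomes linear and no integer variables are needed.

```latex
\begin{aligned}
\min_{v}\quad & \sum_{i \in I} \operatorname{sign}(v^{*}_{i})\, v_{i} \\
\text{s.t.}\quad & S v = 0, \\
& v_{j} = v^{*}_{j}, && j \in E, \\
& v_{i} \ge 0, && i \in I,\ v^{*}_{i} > 0, \\
& v_{i} \le 0, && i \in I,\ v^{*}_{i} < 0, \\
& v_{i} = 0, && i \in I,\ v^{*}_{i} = 0.
\end{aligned}
```

This is what makes the post-processing a single LP rather than the MILP required when loop-freeness is encoded with binary variables.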

References
[1] Colijn C, et al. PLoS Computational Biology. 2009;5:e1000489.
[2] Holzhütter HG. European Journal of Biochemistry. 2004;271:2905–2922.
[3] Plata G, et al. Molecular Systems Biology. 2010;6:408.
[4] Schellenberger J, et al. Biophysical Journal. 2011;100(3):544–553.

O061 - A Human regulatory network database: mining condition-specific transcription networks from composite genomic information

Short Abstract: We propose an approach for identifying context-specific transcription networks from a large collection of gene expression change profiles. Using a composite gene-set analysis method (Nam et al. 2006), we combined information from transcription factor binding sites (TFBS), Gene Ontology or KEGG pathway gene sets, and gene expression fold-change profiles. In this way, we address the fundamental but largely open question of which transcription factors (TFs) regulate which functions or pathways under different cell conditions. As a result, three-dimensional relationships among TFs, their functional targets and cell conditions are readily obtained. Because the transcription networks inferred by our approach are supported by multiple lines of evidence (more than three filtering criteria), a substantial portion of the predictions are very likely to be true. We validated four novel targets of E2F1 (Gadd45b, Dusp6, Mll5 and Bmp2) in HeLa cells using ChIP, and validated EMT transcription networks against the literature.
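To make the idea of a composite gene set concrete, the sketch below intersects a TF's predicted binding-site targets with a functional gene set and scores the result against one fold-change profile. The scoring statistic is an assumption: a parametric PAGE-style z-score is swapped in for illustration and is not necessarily the statistic used by Nam et al. (2006). Gene names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def composite_gene_set(tf_targets, functional_set):
    """Genes carrying the TF's binding site AND annotated to the GO term / KEGG pathway."""
    return set(tf_targets) & set(functional_set)


def page_z_score(fold_changes, gene_set):
    """PAGE-style z-score of a gene set's mean log fold change (illustrative statistic).

    z = (mean of set - mean of all genes) * sqrt(m) / SD of all genes,
    where m is the number of set genes measured in the profile.
    """
    values = [fold_changes[g] for g in gene_set if g in fold_changes]
    if not values:
        raise ValueError("no genes of the set are measured in this profile")
    all_values = list(fold_changes.values())
    return (mean(values) - mean(all_values)) * sqrt(len(values)) / pstdev(all_values)


# Hypothetical log2 fold-change profile for one cell condition.
profile = {"G1": 2.0, "G2": 1.5, "G3": 0.1, "G4": -0.2, "G5": 0.0, "G6": -1.0}
tf_targets = {"G1", "G2", "G3"}   # genes with the TF's binding site
pathway = {"G1", "G2", "G6"}      # genes annotated to a pathway
composite = composite_gene_set(tf_targets, pathway)
z = page_z_score(profile, composite)
```

Repeating this over all TF × pathway × condition triples yields the three-dimensional TF/function/condition table the abstract describes, which would then be filtered by significance.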

O062 - MeDUSA: a sage-based tool for computing the capacitance of a metabolic network

Short Abstract: Optimization-based analysis of metabolic networks has become a central topic in systems biology. The primary objective is to predict the metabolic capabilities of genome-scale metabolic networks using little information. This can be achieved by stating an objective function (e.g., biomass production) and seeking its maximal value within the feasible domain defined by the physico-chemical constraints governing the living organism. Several optimization-based approaches have been proposed, most of which are based on the seminal flux balance analysis (FBA) approach (Varma, 1993), which investigates reaction fluxes in a metabolic network at steady state. Recently, a study (Larhlimi, 2012) proposed a novel method that not only estimates the maximum theoretically possible product yield, but also helps to design metabolic engineering strategies that increase the network's capabilities by using a suitable chemically feasible transformation called ‘the stoichiometric capacitance’. Building on this, we present our tool MeDUSA (MEtabolic Design Using SAge), which, given a metabolic network and an objective function, (i) computes the stoichiometric capacitance, (ii) investigates the effect of adding the stoichiometric capacitance and (iii) visualizes the changes in the importance of metabolic reactions for performing the network objective. MeDUSA is implemented in SAGE, free, open-source, Python-based software that readily interfaces with advanced optimization packages (Stein, 2013). MeDUSA is useful for investigating the capabilities of metabolic networks under different growth conditions, and we believe it will benefit researchers in systems biology and biotechnology.

O063 - timeClip: viewing the pathway along time

Short Abstract: Recently, much attention has been directed towards the study of gene sets in the context of expression data analysis. A microarray experiment typically provides a list of genes differentially expressed between static conditions, which is the starting point of a challenging process of result interpretation. However, biological processes within a cell are dynamic, and the appropriate experimental design for investigating these trends is the time course. Unfortunately, only a few pathway analysis methods account for time-dependent designs, and none of them exploits pathway topology to identify the portion of the pathway most involved in the biological problem under study.
We present timeClip, a new method for identifying the time-dependent activation of different portions of a biological pathway. timeClip works in two steps: (i) the whole pathway is first tested for time dependency, and (ii) if this test is significant, Gaussian graphical model theory is used to decompose the pathway into small connected components linked to each other through a junction tree. Each connected component is tested for time dependency, and the resulting p-values are used to calculate a score for each portion of the pathway.
We apply timeClip to a dataset of mouse muscle regeneration with 27 time points. We identify 131 time-dependent pathways that chronologically order the regeneration phases into inflammation, proliferation, development and muscle mass increase. In the Innate Immune System pathway, timeClip identifies sequentially activated portions, revealing an early response to the injury followed by a slow return to normality.

O064 - De novo analysis of electron ionization mass spectra using fragmentation trees

Short Abstract: Electron ionization (EI) is the most common form of ionization for GC-MS analysis of metabolites. This hard ionization method produces mass spectra that do not necessarily contain the molecular ion peak. Methods for the automated analysis of EI mass spectra are in high demand, but are currently limited to database searching and rule-based approaches.

We present the automated analysis of high resolution EI mass spectra based on fragmentation trees that are constructed by automated signal extraction and evaluation. These trees explain relevant fragmentation reactions and assign molecular formulas to fragments. The method enables the identification of the molecular ion and the molecular formula of a metabolite if the molecular ion is present in the spectrum, even if it is of very low abundance or hidden under contaminants with higher masses. These identifications are independent of existing library knowledge.
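One ingredient of fragmentation-tree construction is assigning candidate molecular formulas to peaks and reading neutral losses off the formula differences between a parent and a fragment. The brute-force sketch below shows this for CHNO formulas. It is a simplification under stated assumptions: neutral monoisotopic masses only (the electron mass of the EI radical cation is ignored), no chemical plausibility filters such as ring-plus-double-bond checks, and hypothetical element limits.

```python
from itertools import product

# Monoisotopic masses (u) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
ELEMENTS = ("C", "H", "N", "O")


def formula_mass(counts):
    """Neutral monoisotopic mass of a formula given as (C, H, N, O) counts."""
    return sum(MONO[e] * n for e, n in zip(ELEMENTS, counts))


def candidate_formulas(mass, ppm=5.0, max_counts=(20, 40, 5, 10)):
    """All CcHhNnOo formulas whose neutral mass lies within `ppm` of `mass` (brute force)."""
    tol = mass * ppm * 1e-6
    return [counts
            for counts in product(*(range(m + 1) for m in max_counts))
            if abs(formula_mass(counts) - mass) <= tol]


def loss_formula(parent, fragment):
    """Neutral loss on a tree edge: element-wise difference, or None if chemically impossible."""
    diff = tuple(p - f for p, f in zip(parent, fragment))
    return diff if all(d >= 0 for d in diff) else None


co2 = formula_mass((1, 0, 0, 2))
hits = candidate_formulas(co2)
```

In a full method, each peak's candidate formulas become nodes, edges connect parent and fragment formulas with a valid loss, and a scored maximum tree is selected; that optimization is not shown here.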

We apply the algorithm to a selection of 50 metabolites and demonstrate that the molecular ion is correctly assigned in 78% of cases. The fragmentation trees allow the assignment of specific relevant fragments and fragmentation pathways even in the most complex EI spectra in our dataset. On a simulated dataset of 22 compounds with annotated fragmentation pathways, we find that fragmentation trees explain the origin of ions in accordance with the literature. No peak is annotated with an incorrect fragment formula, and 79% of the fragmentation processes are correctly reconstructed.

Our approach enables the automated analysis of metabolites independent of existing library knowledge and thus has the potential to support the explorative character of metabolomics studies.

O065 - The Core Diseasome

Short Abstract: Large amounts of protein–protein interaction (PPI) data are available. The human PPI network currently contains over 56 000 interactions between 11 100 proteins. It has been demonstrated that the structure of this network is not random and that the same wiring patterns in it underlie the same biological processes and diseases. In this paper, we ask whether there exists a subnetwork of the human PPI network whose topology is key to disease formation and which should hence be the primary object of therapeutic intervention. We demonstrate that such a subnetwork exists and can be obtained purely computationally. In particular, by successively pruning the entire human PPI network, we are left with a “core” subnetwork that is not only topologically and functionally homogeneous, but is also enriched in disease genes and drug targets, and contains genes that are known to drive disease formation. We call this subnetwork the Core Diseasome.
Furthermore, we show that the topology of the Core Diseasome is unique in the human PPI network, suggesting that it may be the wiring of this network that governs the mutagenesis that leads to disease. Explaining the mechanisms behind this phenomenon and exploiting them remains a challenge.
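The abstract does not state the pruning rule. One standard way of "successively pruning" a network is k-core peeling: repeatedly remove the node of lowest remaining degree and record the core number at which it leaves. The sketch below uses this rule purely as an assumption to illustrate the idea; the authors' criterion may differ.

```python
def core_numbers(adj):
    """Core number of every node via peeling (O(n^2) for clarity).

    Repeatedly remove a node of minimum remaining degree; the largest degree seen at
    removal time so far is that node's core number. Large PPI networks would use a
    bucket queue instead of min() over all remaining nodes.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def innermost_core(adj):
    """Return (k_max, nodes of the innermost k-core)."""
    core = core_numbers(adj)
    k_max = max(core.values())
    return k_max, {v for v, c in core.items() if c == k_max}


# Toy network: a 4-clique A-D with a tail D-E-F (adjacency must be symmetric).
adj = {
    "A": {"B", "C", "D"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"},
    "D": {"A", "B", "C", "E"}, "E": {"D", "F"}, "F": {"E"},
}
k_max, core_nodes = innermost_core(adj)
```

Enrichment of disease genes or drug targets would then be tested on the surviving node set, for example with a hypergeometric test against the whole network.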

O066 - Systematically Studying Kinase Inhibitor Induced Signaling Network Signature by Integrating both Therapeutic and Side Effects

Short Abstract: Substantial effort in recent years has been devoted to analyzing data-based large-scale biological networks, which provide valuable insight into the topology of complex biological networks but are rarely context-specific and cannot be used to predict the responses of cell signaling proteins to specific ligands or compounds. In this work, we propose a novel strategy for investigating kinase inhibitor-induced pathway signatures by integrating multiplex data, e.g., KINOMEscan data and cell proliferation/mitosis imaging data. Using this strategy, we established a mathematical model to investigate the pathway signature of the PC9 cell line when perturbed by the small-molecule kinase inhibitor GW843682. The PC9-specific pathway revealed the role of PI3K/AKT in modulating cell proliferation and the absence of two anti-proliferation links, which indicates a potential mechanism of abnormal expansion in the PC9 cell line. Incorporating kinase inhibitor toxicity predicted by a primary human hepatocyte-specific pathway model, we used this system model to screen 27 kinase inhibitors in the LINCS database, and several candidates were suggested, each with an optimal concentration for suppressing PC9 cancer cell expansion while avoiding severe damage to primary human hepatocytes. Drug combination analysis revealed that the synergistic effect region can be easily predicted based on a threshold that appears to be an inherent property of each kinase inhibitor. Furthermore, this integration strategy can easily be extended to other cancer cell types, providing a powerful tool for drug screening before clinical trials.

O067 - E. coli traversal of environments by horizontal gene transfer in silico

Short Abstract: To adapt to novel environments, bacteria need to add the necessary enzymes and transporters to their metabolic networks. The corresponding genes are typically acquired via horizontal gene transfer [Pal2005], a process usually restricted to at most a few genes per uptake event. It is thus likely that adaptations to metabolically distant environments do not occur in a single step; instead, evolution likely proceeds through intermediate environments, such that each adaptive step requires the addition of a small number of genes. Here, we tested this ‘pre-adaptation’ hypothesis in silico.
We started our simulations from the metabolic network of Escherichia coli K12 [Feist2007]. We used flux balance analysis within the SyBiL framework (available on CRAN) to test if biomass can be produced in a given metabolic environment. To simulate the adaptation of E. coli to a universe of different environments, we assembled a large set of known metabolic reactions and transporters (modified from [Henry2010]), which acted as a repository for horizontal gene transfers.
In our simulations, the E. coli network was viable in 302 environments. A further 124 environments can be reached through the addition of a single reaction; however, only 8 further environments are reachable from these expanded networks within a single step. We then characterized the distance to all environments in terms of the number of additional reactions required. We found that all environments to which adaptation is possible at all (based on the universal repository of genes) can be reached with the addition of at most 12 reactions, with a median of only 2 reactions.
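The "distance to an environment" above is the smallest number of repository reactions whose addition makes the network viable there. The sketch below computes that distance by exhaustive search over added reaction sets. To stay self-contained, it swaps FBA for a simpler qualitative viability test, network expansion (all biomass precursors must be producible from the nutrients), so it is an illustration of the search, not the authors' sybil-based pipeline. All metabolite and reaction names are hypothetical.

```python
from itertools import combinations


def reaction(substrates, products):
    """A reaction as (substrates, products); hypothetical representation."""
    return frozenset(substrates), frozenset(products)


def scope(nutrients, reactions):
    """Network expansion: all metabolites producible from `nutrients` with `reactions`."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def viable(environment, reactions, biomass):
    """Qualitative stand-in for FBA: can every biomass precursor be produced?"""
    return biomass <= scope(environment, reactions)


def reactions_needed(environment, network, repository, biomass, max_added=3):
    """Smallest number of repository reactions that makes `environment` viable."""
    for k in range(max_added + 1):
        for added in combinations(repository, k):
            if viable(environment, list(network) + list(added), biomass):
                return k
    return None  # not reachable within max_added transfers


network = [reaction({"A"}, {"B"})]
repository = [reaction({"X"}, {"A"}), reaction({"Y"}, {"X"}), reaction({"Z"}, {"Q"})]
biomass = {"B"}
```

The exhaustive search grows combinatorially with the repository size, which is why stepwise expansion through intermediate environments, as in the abstract, is the more tractable formulation for genome-scale repositories.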

O068 - Systems Level Analysis of Breast Cancer Reveals the Differences between Lung and Brain Metastasis through Protein-Protein Interactions

Short Abstract: According to the American Cancer Society, breast cancer is the second most common cause of cancer death among women. Death is generally caused by metastasis to another organ rather than by the primary tumor in the breast. A better understanding of the molecular mechanism of the metastatic process may help improve clinical methods. For this purpose, we have used protein structures and protein networks together at the systems level to explain genotype-phenotype relationships, and applied this approach to breast cancer metastasis.
We have built a comprehensive human PPI network by combining the available protein-protein interaction data from various databases. We then ranked all the interactions of the human PPI network according to their relevance to genes known to mediate breast cancer metastasis to brain and lung, and formed two distinct metastasis PPI networks from the high-ranked interactions.
We have performed functional analyses on the brain/lung metastasis PPI networks and observed that the proteins of the lung metastasis network are also enriched in the “Cancer”, “Infectious Diseases” and “Immune System” KEGG pathways. This finding may point to a cause-and-effect relationship between immune system/infectious diseases and lung metastasis progression.
We have enriched the metastasis PPI networks with structural information, using both the available data in the Protein Data Bank and our protein interface predictions. In the interface prediction step, the most common protein-protein interface templates in lung metastasis were observed to come from bacterial proteins. This finding reinforces our claim about the relationship between lung metastasis and infectious diseases.

O069 - Stratified benchmarking of methods for gene regulatory network inference

Short Abstract: New methods for gene regulatory network (GRN) reconstruction from biological data are continuously being developed. While each method may be benchmarked against older methods upon publication, it is seldom independently benchmarked after publication on diverse data and against newer competing methods.

We created diverse data sets capturing a range of network properties such as: network size, connectivity, interampatteness, as well as data properties, such as: signal to noise ratio, number of samples and the condition number of the response matrix. As a start, we compare 6 different GRN inference methods selected for their capability to use steady state perturbation data.

The fact that these methods infer the same type of GRN means that we can focus on data and network properties. This gives the basis for informed decisions about under what conditions a method performs well and how to optimally utilize each method when the data is not fully informative.

We compare the different inference algorithms using a wide range of performance measures and investigate under what circumstances a given inference method gives informative network models.
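The abstract does not list the performance measures used. The sketch below computes a few common ones (precision, recall, F1 and the Matthews correlation coefficient) for directed regulatory links, to illustrate the kind of comparison involved. Two assumptions apply: links are unsigned (regulator, target) pairs, and the number of possible links is n(n − 1), excluding self-regulation.

```python
from math import sqrt


def edge_metrics(true_edges, predicted_edges, n_genes):
    """Precision, recall, F1 and MCC for directed (regulator, target) links."""
    true_edges, predicted_edges = set(true_edges), set(predicted_edges)
    tp = len(true_edges & predicted_edges)
    fp = len(predicted_edges) - tp
    fn = len(true_edges) - tp
    tn = n_genes * (n_genes - 1) - tp - fp - fn  # possible directed links, no self-loops
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical 3-gene ring network and an inferred network with one reversed link.
true_net = {("g1", "g2"), ("g2", "g3"), ("g3", "g1")}
inferred = {("g1", "g2"), ("g2", "g3"), ("g1", "g3")}
m = edge_metrics(true_net, inferred, 3)
```

MCC is often preferred for sparse networks because, unlike precision and recall, it also accounts for the large number of correctly absent links.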

We demonstrate that the quality of inferred gene networks in terms of mechanistic insight is highly dependent on the algorithm and properties of the network and data. The algorithm should therefore be chosen based on the expected properties of the network and data. Moreover, not only accurate measurements, but also experiments designed specifically to counteract intrinsic signal attenuation are required.

O070 - What is in Control of DNA Replication Timing?

Short Abstract: Initiation of DNA replication at specific sites in the genome is known to be inefficient in metazoans, where even the most efficient initiation sites fire in less than 10% of cell cycles. At the same time, on a genomic scale, the replication timing program is highly orchestrated, with specific regions of the genome replicated at reproducible time points throughout S-phase. In an attempt to reconcile these two somewhat discordant observations, many researchers have assumed that an as yet unknown, active regulatory mechanism in the cell is responsible for coordinating replication timing.

Here, we present a mechanistic explanation for genome-wide replication timing patterns. Our model is purely stochastic, does not contain any active control components and reproduces existing replication timing data with extreme fidelity (correlation between prediction and experiment > 0.9), approaching the limits set by experimental noise.

The main findings of this study are:

- Integrative analysis of recently released ENCODE data identifies semi-local chromatin structure sufficient to fully determine global replication timing.

- Replication timing is a “systems” phenomenon emerging from the collective action of many uncoordinated, stochastic replication initiation sites. Replication timing cannot be understood by investigating individual initiation sites: deleting up to 30% of high probability initiation sites does not significantly change replication timing.

- Plasticity of replication timing across different cell types is a direct consequence of changes in the chromatin structure following cell differentiation.

- The existence of reproducible genome-wide replication patterns does not logically necessitate a complex, active control system in the cell.
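The findings above describe timing as an emergent property of many independent, stochastic initiation events. The Monte Carlo sketch below illustrates how reproducible mean timing can arise without coordination: in each simulated S-phase, every origin draws an independent exponential firing time, and each locus is replicated by whichever fork reaches it first. The origin positions, firing rates and constant fork speed are hypothetical. In the authors' model, firing propensities are derived from chromatin data, which is not reproduced here.

```python
import random


def replication_timing(origins, n_loci, fork_speed=1.0, n_cycles=500, seed=0):
    """Mean replication time per locus under uncoordinated stochastic origin firing.

    origins: list of (position, firing_rate). In each simulated S-phase every origin
    draws an independent exponential firing time, and locus x is replicated at
        min over origins of (t_fire + |x - position| / fork_speed).
    Passive replication needs no special case: if a fork passes an origin before it
    fires, by the triangle inequality that origin's own term cannot give an earlier
    time than the fork that already passed it.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_loci
    for _ in range(n_cycles):
        fire = [(pos, rng.expovariate(rate)) for pos, rate in origins]
        for x in range(n_loci):
            totals[x] += min(t + abs(x - pos) / fork_speed for pos, t in fire)
    return [t / n_cycles for t in totals]


# A single origin: mean timing rises linearly with distance from it.
single = replication_timing([(5, 1.0)], n_loci=11)
# A strong (early-firing) origin at 0 and a weak one at 20.
pair = replication_timing([(0, 5.0), (20, 0.2)], n_loci=21)
```

Averaging over cycles turns highly variable single-cell firing into a smooth, reproducible timing profile, which is the qualitative point of the "systems" finding above.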

O071 - Metabolic Modeling of Glioblastoma Subtypes

Short Abstract: Glioblastoma multiforme (GBM) is the most common form of malignant glioma, characterized by unpredictable clinical behavior that suggests distinct molecular subtypes. Since the tumour metabolic phenotype is one of the hallmarks of cancer, we set out to investigate whether GBMs show differences in their metabolic profiles.
The iMAT algorithm was used to reconstruct context-specific constraint-based models (CBM) using our manually curated version of a published brain model and gene expression data from 159 GBM patients in four intrinsic molecular subgroups that were previously validated on five independent sample cohorts.
The context-specific CBMs showed that the serine biosynthesis pathway tends to be more essential in the molecular subgroups of intermediate prognosis (median survival, 1–4 years) than in the molecular subgroups of poor prognosis (median survival, < 1 year). We also observed that the activity of the serine pathway is negatively coupled with that of the branched-chain amino acid degradation pathway.
Using single gene deletion studies, we found that phosphoglycerate dehydrogenase (PHGDH) is critical for the cellular program promoting the flux of serine metabolism, in particular for the subgroup of intermediate prognosis (median survival 3.32 years).
The results presented here provide new insights into the metabolic subtypes of GBM and pave the way for a system-level understanding of GBM metabolism that is cardinal for its diagnosis and treatment.

O072 - Graphlet-based edge clustering reveals pathogen-interacting proteins

Short Abstract: Prediction of protein function from protein interaction networks has received attention in the post-genomic era. A popular strategy has been to cluster the network into functionally coherent groups of proteins and assign the entire cluster with a function based on functions of its annotated members. Traditionally, network research has focused on clustering of nodes. However, clustering of edges may be preferred: nodes belong to multiple functional groups, but clustering of nodes typically cannot capture the group overlap, while clustering of edges can. Clustering of adjacent edges that share many neighbors was proposed recently, outperforming different node clustering methods. However, since some biological processes can have characteristic "signatures" throughout the network, not just locally, it may be of interest to consider edges that are not necessarily adjacent.

We design a sensitive measure of the "topological similarity" of edges that can deal with edges that are not necessarily adjacent. We cluster edges that are similar according to our measure in different baker's yeast protein interaction networks, outperforming existing node and edge clustering approaches. We apply our approach to the human network to predict new pathogen-interacting proteins. This is important, since these proteins represent drug target candidates.
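The sketch below illustrates how two edges that need not be adjacent can be compared purely by their local topology. It counts only two simple edge "orbits": the triangles through the edge and the open two-step paths leaving it. The actual measure uses edge graphlet degree vectors over many more orbits, so this is a deliberate simplification. The distance follows the log-scaled form used for node graphlet degree vector similarity, with uniform orbit weights as an assumption.

```python
from math import log


def edge_signature(adj, u, v):
    """Simplified edge orbit counts: (triangles through u-v, open 2-paths leaving u-v)."""
    common = len(adj[u] & adj[v])
    open_paths = (len(adj[u]) - 1 - common) + (len(adj[v]) - 1 - common)
    return (common, open_paths)


def signature_similarity(sig_a, sig_b, weights=None):
    """Similarity in (0, 1]; 1 means identical orbit counts (log-scaled per orbit)."""
    weights = weights or [1.0] * len(sig_a)
    distance = sum(
        w * abs(log(a + 1) - log(b + 1)) / log(max(a, b) + 2)
        for w, a, b in zip(weights, sig_a, sig_b)
    )
    return 1.0 - distance / sum(weights)


# Toy network: triangle A-B-C with a pendant protein D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
ab = edge_signature(adj, "A", "B")   # (1, 0)
ac = edge_signature(adj, "A", "C")   # (1, 1)
cd = edge_signature(adj, "C", "D")   # (0, 2)
```

Any standard clustering method can then be run on the resulting edge-by-edge similarity matrix; because edges rather than nodes are clustered, a protein can end up in several functional groups.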

O073 - Heterogeneous Data Integration to Inform Kinetic Model Building for Systems Biology

Short Abstract: Signaling pathways are central to the maintenance of correct cellular function, and their malfunction is well known to lead to a range of diseases. Within Systems Biology projects such as the Virtual Liver Network, signaling pathways are studied by combining quantitative experimental data with kinetic models that represent the function of the pathway. An iterative process of experiment and refinement can lead to improved models and the generation of new hypotheses. However, the effect of small-molecule inhibitors on these pathways is not always restricted to a single protein (due to off-target effects or promiscuous binding), and parameter estimation for the models is often challenging.

Previously, we have developed SYCAMORE to assist with kinetic model building and parameter estimation. Here, we present tools that extend the capabilities of SYCAMORE, by focusing on small-molecule ligands. The tools are integrated into a web framework allowing users to evaluate data from a range of sources, such as reaction kinetics data from SABIO-RK and small-molecule binding affinity data from ChEMBL. Additionally, we have developed computational tools to improve coverage for cases where there are no data on the actual protein or compound of interest but data are available for similar proteins or compounds. The computational tools use protein 3D structural information from the Protein Data Bank (PDB) to perform structural alignments of binding sites and comparison of protein-ligand interactions. We will present their application in a case study based on work from Lorenza D'Alessandro, Ursula Klingmüller (DKFZ, Heidelberg), Tim Maiwald and Jens Timmer (University of Freiburg).

O074 - Extended multi-level logical regulatory and metabolic model for Quorum Sensing in Pseudomonas aeruginosa

Short Abstract: The pathogen Pseudomonas aeruginosa usually infects patients with immune system deficiencies. Its pathogenicity is related to the production of virulence factors. The number of infecting strains that are resistant to most current antibiotics is increasing. Thus, new antibacterial strategies consider inhibitors that are highly selective for targets directly involved in the regulatory pathways of virulence factors, instead of the commonly applied growth inhibitors. This avoids selection pressure for resistance mutations. Here, we have analyzed computationally whether enzyme inhibitors or receptor antagonists are more efficient in reducing the formation of small signaling molecules and virulence factors. A topology analysis suggested that PqsE, PqsR and an autoinducer are together required to form pyocyanin. We further showed that, in a manner highly dependent on the inhibition level, PqsBCD inhibitors strongly reduce the levels of HHQ and PQS, whereas PqsR antagonists clearly decrease the level of the virulence factor pyocyanin.

O075 - KBase: An Integrated Knowledgebase for Predictive Biology and Environmental Research

Short Abstract: The new Systems Biology Knowledge base (KBase) is integrating commonly used core tools and their associated data, and building new capabilities on top of the combined data. New functionality allows users to visualize data, create powerful models or design experiments based on KBase‐generated suggestions. Although the integration of different data types will itself be a major offering to users, the project is about much more than data unification. KBase is distinguished from a database or existing biological tools by its focus on interpreting missing information necessary for predictive modeling, on aiding experimental design to test model‐based hypotheses, and by delivering quality‐controlled data. The project leverages the power of cloud computing and high‐performance computing resources across the DOE system of labs to handle the anticipated rapid growth in data volumes and computing requirements of the KBase.

KBase is a collaborative effort designed to accelerate our understanding of microbes, microbial communities, and plants. It is a community-driven, extensible and scalable open-source software framework, and application system. KBase offers free and open access to data, models and simulations, enabling scientists and researchers to build new knowledge, test hypotheses, design experiments, and share their findings.

O076 - Cross-species alignment of gene-gene coexpression networks

Short Abstract: BACKGROUND: Animal models have been useful for improving our knowledge of the molecular interactions underlying human diseases. However, animal models often fail to mimic human disease adequately. One way of validating the similarity of a model organism to its human counterpart is to integrate gene expression profiles from different studies and then identify conserved co-expression subnetworks across species via network alignment. However, gene-gene coexpression networks are densely connected and require the alignment of millions of weighted edges, making the alignment problem computationally demanding. We implemented Natalie, a network alignment algorithm, to find the optimal alignment of coexpression networks. The NP-hardness of the optimization problem complicates the search for the best-scoring alignment; Natalie therefore uses a Lagrangian relaxation approach to obtain lower and upper bounds on the solution.

RESULTS: Natalie calculates the best-scoring alignment of two coexpression networks by optimizing an objective function that incorporates (i) similarity between nodes (genes) based on all-against-all BLAST and (ii) conservation of the degree of coexpression between pairs of genes. We assess Natalie’s ability by aligning two coexpression networks constructed from large human and mouse liver gene expression datasets and by calculating empirical P-values for the alignment using a permutation test. The conserved subnetworks derived from the alignment of the liver datasets showed strong concordance in terms of the biological processes involved.
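To make the two-part objective concrete, the sketch below scores a one-to-one gene mapping as a weighted sum of node similarity (standing in for BLAST scores) and conserved coexpression (product of the aligned edge weights). It then finds the best mapping by exhaustive search. The additive form, the alpha/beta weights and the toy data are assumptions, not Natalie's exact objective. Brute force is only feasible for toy inputs: the NP-hardness mentioned above is precisely why Natalie uses Lagrangian relaxation instead.

```python
from itertools import permutations


def edge_weight(weights, a, b):
    """Undirected lookup of a coexpression weight (0.0 if the pair is absent)."""
    return weights.get((a, b), weights.get((b, a), 0.0))


def alignment_score(mapping, node_sim, w1, w2, alpha=1.0, beta=1.0):
    """alpha * node similarity + beta * conserved coexpression under a 1-to-1 mapping."""
    nodes = alpha * sum(node_sim.get((i, k), 0.0) for i, k in mapping.items())
    edges = beta * sum(w * edge_weight(w2, mapping[i], mapping[j])
                       for (i, j), w in w1.items())
    return nodes + edges


def best_alignment_bruteforce(genes1, genes2, node_sim, w1, w2):
    """Exhaustive search over injective mappings genes1 -> genes2 (toy sizes only)."""
    best, best_score = None, float("-inf")
    for perm in permutations(genes2, len(genes1)):
        mapping = dict(zip(genes1, perm))
        score = alignment_score(mapping, node_sim, w1, w2)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score


# Hypothetical human (a, b, c) and mouse (x, y, z) coexpression networks.
node_sim = {("a", "x"): 1.0, ("b", "y"): 1.0, ("c", "z"): 1.0}
w_human = {("a", "b"): 0.9, ("b", "c"): 0.1}
w_mouse = {("x", "y"): 0.9, ("y", "z"): 0.1}
mapping, score = best_alignment_bruteforce(["a", "b", "c"], ["x", "y", "z"],
                                           node_sim, w_human, w_mouse)
```

The permutation test mentioned in the abstract would repeat such scoring on shuffled networks to obtain an empirical P-value for the observed score.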

CONCLUSION: The results derived from Natalie show that the method scales well and produces meaningful conserved co-expressed clusters.

O077 - Ciruvis: a web-based tool for rule networks and interaction detection using rule-based classifiers

Short Abstract: The use of classification methods is becoming increasingly important for the field of computational biology. However, not only the quality of the classification, but also the biological interpretation is of interest. Tools that aid the interpretation and visualization of the models are thus needed, and feature interactions may be of specific interest.
Using rule-based classifiers, we have previously proposed a rule-network visualization strategy that may be used as a heuristic to find interactions. The networks are constructed based on co-occurrences of rule conditions. Expressed in the form of Circos graphs, the networks associate pairs of features that contribute the most to discerning the outcomes. This provides a fast visualization of the rules, and interacting conditions tend to be strongly connected in the network. We have now developed Ciruvis, a web-based tool for the construction of such rule networks. In addition, using both simulated and real data, we have illustrated the strengths and limitations of the tool and compared it to other methods. For instance, we constructed pairs of features without marginal effects but with a given probability of influencing the decision when taken together; these pairs were clearly identifiable in the network as long as they were not obscured by more predictive features in the data. Removing single correlated attributes may solve this problem.
The rule networks provide a fast visualization method and a heuristic for generating a set of hypotheses about interactions. The tool will be freely available on the web.
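A minimal sketch of building a rule network from co-occurring rule conditions: every pair of features that appears together in one rule's conditions is linked, and the link accumulates that rule's support. Weighting by support is an assumption made for illustration; the abstract does not specify how Ciruvis weights edges, and the Circos rendering is not shown. Feature names and supports are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rule_network(rules):
    """Feature-pair edge weights from co-occurrence in rule conditions.

    rules: list of (features_in_conditions, support) pairs. Each pair of distinct
    features in the same rule gains that rule's support.
    """
    edges = Counter()
    for features, support in rules:
        for a, b in combinations(sorted(set(features)), 2):
            edges[(a, b)] += support
    return edges


# Hypothetical rules from a rule-based classifier.
rules = [
    ({"geneA", "geneB"}, 0.3),
    ({"geneA", "geneB", "geneC"}, 0.2),
    ({"geneC"}, 0.5),  # single-condition rules contribute no edges
]
network = rule_network(rules)
strongest = network.most_common(1)
```

Interacting features that only matter jointly tend to co-occur in many rules, so they accumulate heavy edges even without marginal effects.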

O078 - Construction of Regulatory Network from Positive Symptoms (BA22) Candidate

Short Abstract: Human Brodmann Area 22 (BA22) of the superior temporal gyrus is thought to be involved in positive symptoms and cognitive dysfunction, and its function is often disrupted in patients with psychiatric illness. Previously discovered modules, derived from genomic profiling of biological samples and from gene expression pathways, have been associated with different layers of cellular activity and genomic function, allowing the identification of a specific genetic network responsible for an anti-cancer effect in schizophrenia. Differential gene expression in the BA22 samples was assessed with t-tests comparing microarray data from schizophrenia and control samples. Genes whose probes had P values < 0.05 were defined as abnormally expressed and proposed as candidate genes for schizophrenia. The resulting level-one (L1) PPI network of over- and under-expressed genes consists of 139 over-expressed genes, 336 under-expressed genes and 3248 PPIs. The clique-5 subnetwork of the up/down L1 network contains 116 PPIs among 35 proteins, with UBC as its most important mediator. A set of genes including CD4, PTPRC, LCK, ZAP70 and FCGR3A forms an important immune-system module, which illustrates the relationship between schizophrenia and autoimmune diseases. This finding suggests that pathways involving human monoclonal antibodies, immunoglobulins, immunosuppression and anti-cancer agents merit further investigation for a better understanding of schizophrenia.
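The per-gene screening step (compare schizophrenia and control samples, keep genes with P < 0.05) can be sketched as below. To stay dependency-free, this swaps the parametric t-test for an exact permutation test that uses the Welch t statistic and enumerates every relabeling of the samples, so the P values differ from those of a standard t-test. Expression values are hypothetical, and exhaustive enumeration is only practical for small groups.

```python
import math
from itertools import combinations
from statistics import mean, variance


def welch_t(x, y):
    """Welch's two-sample t statistic."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    diff = mean(x) - mean(y)
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def permutation_p_value(cases, controls):
    """Exact two-sided permutation P value of |t| over all relabelings of the samples."""
    pooled = list(cases) + list(controls)
    observed = abs(welch_t(cases, controls))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(cases)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(welch_t(a, b)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical expression values (4 schizophrenia vs 4 control samples).
expression = {
    "GENE_UP": ([10.0, 11.0, 12.0, 13.0], [1.0, 2.0, 3.0, 4.0]),
    "GENE_NULL": ([1.0, 4.0, 5.0, 8.0], [2.0, 3.0, 6.0, 7.0]),
}
candidates = [g for g, (scz, ctl) in expression.items()
              if permutation_p_value(scz, ctl) < 0.05]
```

With thousands of probes, some correction for multiple testing would normally be considered before building the PPI network from the candidate list.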

O079 - Open source biomedical event trigger recognition with optimized parameters

Short Abstract: Biomedical events play a key role in the understanding of biomedical processes and functions, unveiling complex molecular mechanisms. Automatic extraction of such events from the literature is an important contribution to the progress of the biomedical domain, allowing faster updates of existing knowledge. Trigger recognition is a crucial step in event extraction, since the subsequent tasks rely on its output. However, it presents various complex and unsolved challenges, namely the selection of informative features and the definition of textual context.
We propose an open source machine learning-based solution for biomedical event trigger recognition. It takes advantage of Conditional Random Fields (CRFs) with a high-end feature set, namely linguistic, orthographic, morphological, local context and dependency parsing features. Additionally, a fully configurable algorithm automatically optimizes the feature set and training parameters for each event trigger type, selecting the features that contribute positively and optimizing the CRF model order, n-gram sizes, vertex information and maximum hops for dependency parsing features. The final output consists of several CRF models, each optimized for the linguistic characteristics of one event trigger type.
The proposed solution allows biomedical text mining researchers to easily apply complex, optimized techniques to the recognition of biomedical event triggers, turning their application into a routine task. We believe this work is an important contribution to the biomedical text mining community, enabling better and faster event recognition from scientific articles and, consequently, hypothesis generation and knowledge discovery. This solution is freely available at http://bioinformatics.ua.pt/trigner.

O080 - Network-based gene prioritization using DTProbLog

Short Abstract: The recent revolution in sequencing technologies makes it possible to obtain the complete DNA sequence of a genome, including the human genome. Having data on polymorphisms at a genome-wide scale makes the challenge of identifying the genes that lead to quantitative phenotypic traits (i.e., quantitative trait loci or QTLs) tractable. However, a genetic study for such a purpose (e.g., GWAS, eQTL) typically results in one or more regions of the DNA that associate with the phenotype of interest. Due to linkage disequilibrium, the associated regions often contain several candidate genes, complicating a direct biological interpretation. Several computational approaches have been developed to identify the most promising genes from these associated regions. This task is referred to as gene prioritization.
In this work, we developed a network-based gene prioritization approach that also extracts the subnetworks involved in transducing the signal from the gene predicted to be causal to the target genes (genes that are differentially expressed under the phenotype of interest). In our setting, the problem of gene prioritization and subnetwork extraction was solved in a decision-theoretic version of ProbLog, a novel probabilistic programming language based on logic programming and Prolog. We benchmarked our method on a yeast single gene knockout expression compendium and also applied it to a real dataset resulting from a pooled-segregant, pooled-expression analysis. As a result, we identified novel causative genes responsible for ethanol production capacity in yeast.

O081 - Integrated visualization of pathways and large biological ontologies

Short Abstract: Ontologies are hierarchically structured data sets commonly used in biology and the size of such data sets continues to grow. Examples include databases such as KEGG Brite and Gene Ontology that group genes and/or their products into functional categories. Treemaps have proven to be a particularly efficient way to visualize such hierarchies in the context of analyzing and comparing gene expression data.

Part of such analyses is the investigation of metabolic pathways, especially which proteins are involved and in which order they appear along the pathway. This information is usually displayed using labeled graph layouts, with colors used to distinguish between proteins involved in different metabolic pathways. This makes it difficult, however, to also include information on gene expression levels in such a layout, as expression is usually encoded by color gradients. Moreover, such graph layouts waste a large fraction of the often limited display space available (e.g. a computer screen). In contrast, treemaps use all the available display space.

We present a new approach that integrates the visualization of gene expression data and metabolic pathways into various types of treemaps. The challenge is to compute an optimal layout of the pathways that respects the geometric structure of the underlying treemap. We demonstrate the applicability and the benefit of the new approach in the context of analyzing gene expression data for several organisms.
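Since the approach builds on treemaps, a minimal sketch of the classic slice-and-dice treemap layout shows how a hierarchy fills the whole display area. This is the textbook baseline layout, not the pathway-aware layout described above; the Node class and the example hierarchy are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    size: float = 0.0  # only used for leaves
    children: list = field(default_factory=list)

    def total(self) -> float:
        """Leaf size, or the summed size of all descendant leaves."""
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Assign a rectangle (x, y, width, height) to every node.

    Children split their parent's rectangle in proportion to their total
    size, alternating between cuts along the x axis (even depth) and the
    y axis (odd depth), so the leaves tile the whole display area.
    """
    if out is None:
        out = {}
    out[node.name] = (x, y, w, h)
    total = node.total()
    offset = 0.0
    for child in node.children:
        frac = child.total() / total if total else 0.0
        if depth % 2 == 0:  # cut along the x axis
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:  # cut along the y axis
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


# Hypothetical two-level hierarchy (sizes could be gene counts per category).
root = Node("root", children=[
    Node("Metabolism", children=[Node("Glycolysis", 3), Node("TCA cycle", 1)]),
    Node("Signaling", 4),
])
for name, rect in slice_and_dice(root, 0, 0, 100, 100).items():
    print(name, rect)
```

Because every leaf receives an area proportional to its size and the leaves together cover the full rectangle, no display space is left unused; the open problem addressed by the poster is routing pathway information over such a layout.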

O082 - Analysing the anti-cancer effect of DRD2 and HTR2A genes using a protein-protein interaction network

Short Abstract: In this study, network biology and systems bioinformatics data, such as protein-protein interaction and genetic networks, were used to further analyze dopaminergic and serotonergic genes associated with the cancer genetic network, revealing a potential anti-cancer effect of the dopamine and serotonin pathways. The candidate genes were selected from Phenopedia as input to a human protein-protein interaction network resource that provides a straightforward viewer with sufficient functionality to generate genetic networks and tissue-specific expression profiles through the selection of different layers. LevelOne PPI and Query-Query PPI (QQPPI) networks were constructed, allowing the formulated genetic networks to be visualized directly in the browser. DrugBank was used to query gene functions and interactions of potential drugs. The cliques found were matched against CORUM to identify significant protein complexes that might have a potential anti-cancer effect. In total, 401 candidate genes were collected from cancer groups. A network of 706 QQPPIs was generated, whose maximum subgraph contained 237 genes with 697 interactions (QQPPIs). The top 30 genes were ranked as core genes for construction of the cancer network. We propose that candidate genes such as DRD2, ADORA2A, CALM1, HSP90AA1, JAK2, EGFR, STAT3, AR and TP53, within the merged genetic network, might be responsible for the anti-cancer effect of the dopamine/serotonin pathway.

O083 - The DOE Systems Biology Knowledgebase: Microbial Communities Science Domain

Short Abstract: The DOE Systems Biology Knowledgebase (KBase) is a new community resource for predictive biology. It integrates a wide spectrum of data types across the microbial, microbial community, and plant domains, and ties this data into a varied set of powerful computational tools that can analyze and simulate data to predict biological behavior, generate and test hypotheses, design new biological functions, and propose new experiments. The overarching objective is to provide a solid platform that supports predictive biology in a framework that does not require users to learn separate systems to formulate and answer questions spanning a variety of topics in systems biology research.

The microbial communities team is integrating both existing and new tools and data into a single, unified framework that is accessible programmatically and through web services. This will allow the construction of sophisticated analysis workflows by facilitating the linkages between data and analysis methods. The standardization, integration and harmonization of diverse data types housed within the KBase and data located on servers maintained by the larger scientific community will allow for a single point of access, ensuring consistency, quality assurance, and quality control checks of data.

We have begun by creating KBase data and analysis services that will link our core resources, which will allow clients to access data and analysis methods across these tools without programmatic burdens. New functionality, not currently available in our core tools, is being created within KBase using the programmatic interfaces.

O084 - Deriving coordinated conditions for stability and oscillations in metabolic systems by machine learning

Short Abstract: Metabolic systems tend to exhibit steady states that can be measured in terms of their concentrations and fluxes. However, determining the dynamic properties of such steady states in the absence of fully specified kinetic models remains an important challenge of computational systems biology.

Structural Kinetic Modeling (SKM) is a framework for the systematic analysis of local system properties like stability or oscillatory behavior. It provides a parameterized representation of the system Jacobian in which the model parameters encode information about the enzyme-metabolite interactions.

We recently presented a machine-learning approach for the evaluation of SKM experiments that enabled the detection of enzyme-metabolite interaction patterns that act together in an orchestrated manner to ensure stability. Here, we extend this methodology by comparing the previously applied decision tree classifier to the performance of other classifiers like random forests, support vector machines, and relevance vector machines. Using a set of simulated example models, we explore the potential of the classifiers to derive criteria for stability and metabolic oscillations. We also show how the derived criteria can be translated into distinct kinetic parameter sets specifying systems with the predicted properties in the observed steady state.
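The SKM idea can be illustrated on a toy pathway. Following our reading of Steuer et al. (2006), the Jacobian is parameterized as J = Λθ with Λ = diag(S0)^-1 · N · diag(v0), where N is the stoichiometric matrix, S0 and v0 are the steady-state concentrations and fluxes, and θ holds the normalized saturation parameters. The sketch below samples θ for a hypothetical two-metabolite chain with one feedback term, labels each sample from the trace and determinant of the 2×2 Jacobian, and so produces the kind of labeled parameter set a classifier (decision tree, random forest, SVM) could be trained on. The network, parameter ranges and function names are assumptions for illustration only.

```python
import random

# Hypothetical toy pathway. Rows: metabolites S1, S2.
# Columns: v1 (-> S1), v2 (S1 -> S2), v3 (S2 ->).
TOY_N = [[1, -1, 0], [0, 1, -1]]
TOY_S0 = [1.0, 1.0]       # steady-state concentrations (assumed)
TOY_V0 = [1.0, 1.0, 1.0]  # steady-state fluxes (assumed)


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def skm_jacobian(N, S0, v0, theta):
    """J = Lambda * theta, with Lambda = diag(1/S0) * N * diag(v0)."""
    lam = [[N[i][j] * v0[j] / S0[i] for j in range(len(v0))]
           for i in range(len(S0))]
    return matmul(lam, theta)


def classify_2x2(J):
    """Classify a steady state from the trace and determinant of a 2x2 Jacobian."""
    tr = J[0][0] + J[1][1]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    if det < 0:
        return "saddle (unstable)"
    if det == 0 or tr == 0:
        return "marginal"  # zero eigenvalue or purely imaginary pair
    if tr < 0:
        return "stable focus" if tr * tr < 4 * det else "stable node"
    return "unstable focus" if tr * tr < 4 * det else "unstable node"


def sample_dataset(n, seed=0):
    """Monte Carlo sampling of theta; returns ((a, b, c), is_stable) pairs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = rng.uniform(-1.0, 1.0)  # effect of S2 on v1 (negative = inhibition)
        b = rng.uniform(0.0, 1.0)   # saturation of v2 with respect to S1
        c = rng.uniform(0.0, 1.0)   # saturation of v3 with respect to S2
        theta = [[0.0, a], [b, 0.0], [0.0, c]]  # rows: reactions; cols: metabolites
        label = classify_2x2(skm_jacobian(TOY_N, TOY_S0, TOY_V0, theta))
        data.append(((a, b, c), label.startswith("stable")))
    return data


data = sample_dataset(20000, seed=1)
print("fraction stable:", sum(stable for _, stable in data) / len(data))
```

For this toy system the Jacobian works out to [[-b, a], [b, -c]], so instability appears only when activation a exceeds c; a classifier trained on the sampled data should recover a rule of roughly that form, which is the kind of coordinated condition the poster aims to extract.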

References:
Steuer, R. (2007). Computational approaches to the topology, stability and dynamics of metabolic networks. Phytochemistry, 68(16-18), 2139-2151.

Steuer, R., et al. (2006). Structural kinetic modeling of metabolic networks. Proceedings of the National Academy of Sciences, 103(32), 11868-11873.

Girbig, D., et al. (2012). Systematic analysis of stability patterns in plant primary metabolism. PLoS ONE, 7(4), e34686.

O085 - Bicluster analysis in drug discovery

Short Abstract: Unsupervised bicluster analysis is a hot topic in Bioinformatics and has become an invaluable tool for extracting knowledge from high-dimensional -omics data. Biclustering simultaneously organizes a data matrix into subsets of rows and columns in which the entities of each row subset are similar to each other on the column subset and vice versa. This simultaneous grouping of rows (e.g. genes, bioassays, or chemical fingerprints) and columns (e.g. conditions or compounds) allows the identification of new subgroups within the conditions, e.g. in drug design, where researchers want to reveal how compounds affect gene expression (the effects of compounds may only be similar on a subgroup of genes), or for identifying chemical substructures that are shared by bioactive compounds. Standard clustering methods are not suited to these kinds of problems. We therefore present a new biclustering approach, called FABIA, which goes far beyond the usual clustering concept. FABIA is a multiplicative latent variable model that extracts linear dependencies between column and row subsets by forcing both the hidden factors and the loading matrix to be sparse.
FABIA is a mathematically well-founded analysis technique that allows high-dimensional data to be explored in an unsupervised manner, thereby shedding new light on the dark matter of many biological problems. During the poster session, we will present:
a) the FABIA model for extracting biclusters and their ranking according to information content;
b) results from a high-throughput compound screening;
c) biclustering ChEMBL’s bioactive small molecules (16 million chemical fingerprints times 1 million compounds).

O086 - Inferring the yeast salt stress response subnetwork from diverse, complementary data sources using integer linear programming

Short Abstract: To adapt to environmental stress, Saccharomyces cerevisiae undergoes widespread changes in gene regulation. While some key signaling proteins have been identified and characterized, the complex signaling network that coordinates these changes is incompletely understood. We present an integer linear programming-based approach to distill large-scale experimental data into predicted signaling pathways that control the yeast salt stress response. Our approach takes as input experimental data that examine the salt response from multiple, complementary perspectives: single gene mutants that confer a defect in stress resistance, proteins with an altered phosphorylation state during stress, and genes whose salt-responsive expression is dysregulated in fifteen single gene mutants corresponding to known regulators. Using these data, a background network of publicly available protein-protein, protein-DNA, and protein-RNA interactions, and an integer linear program, we infer a subnetwork that provides directed pathways through which the fifteen signaling regulator mutants perturb the regulation of their affected downstream targets. While previous methods for signaling network inference have addressed the task of inferring a directed subnetwork from expression profiles, our method also exploits additional sources of condition-specific experimental data. The objective function maximizes the inclusion of genes and proteins identified by fitness and phosphoproteomic assays while minimizing the total number of proteins in the inferred subnetwork. The resulting inferred subnetwork includes known or likely regulators of the salt response. Additionally, we predict the involvement of additional proteins that have not been previously annotated as being instrumental in the salt response; these are promising candidates for future experimental work.

O087 - Assessment of the global microRNA expression profile in the aged versus young cortex of mice and assessment of microRNA and target mRNA networks during aging

Short Abstract: Neurodegenerative diseases (NDs), e.g. Alzheimer’s disease, are considered non-linear, multi-factorial diseases with a characteristic late phenotypic onset during lifespan. Although genetic predisposition and sporadic events contribute to disease development, they are not sufficient to explain it. Aging is considered an important risk factor for NDs. However, how aging affects the late phenotypic onset of NDs remains largely unknown.
MicroRNAs are small non-coding RNAs that fine-tune the transcriptome post-transcriptionally, thus modulating the proteome of a cell and therefore regulating complex cellular networks in a cell-autonomous or non-cell-autonomous manner.
MicroRNAs play crucial roles in nervous system development and in the maintenance of neuronal cell identity. They specifically function in synaptic plasticity and neuronal survival during aging and are involved in a number of diseases including NDs.
We used a systems biology approach to address the role of microRNAs during aging and neurodegeneration. We assessed the global microRNA expression profile in the cerebral cortex of aged versus young wild-type mice by qPCR and deep sequencing. The relative changes in microRNA expression were in the range of 2- to 3-fold. A global trend of microRNA downregulation (43% of microRNAs) was observed in the aged cortex, with 19% of microRNAs remaining unaltered. Quantitative RNA-seq data of aged versus young cortex will be analyzed and placed in the context of differential microRNA expression during aging. We developed a mathematical model to understand the relationships within the global microRNA network. We present the outcome of combining empirical and in silico data and discuss the possible role of microRNAs in NDs.

O088 - BioModule: A Web Portal to Explore Modules from Complex Biological Networks

Short Abstract: Proteins that participate in the same biochemical or cellular event are expected to form locally dense modules in protein networks. Here, we propose a novel clustering algorithm, the Clique Aggregation Method (CAM), to extract biological entities highlighted by topological features in both weighted and unweighted protein networks (binary interactions). Constraints on cluster assignment are relaxed to preserve information about proteins involved in multiple cellular events. Compared with existing cluster-identification algorithms, CAM performs well as evaluated by F-measure against two experimental protein complex datasets for the yeast interactome. The framework, BioModule, which integrates CAM, provides a friendly web interface for data submission, cluster exploration and single-cluster or whole-batch downloading. BioModule not only detects biological modules in minutes, but also performs GO (Gene Ontology) enrichment analysis and visualizes the results. Pre-calculated clusterings of human, mouse, fly, nematode, yeast, and E. coli protein network datasets are also available as system demos. BioModule requires no login or installation and is freely available at http://hub.iis.sinica.edu.tw/biomodule
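Because CAM is evaluated by F-measure against reference complexes, a small sketch of that scoring may help. Precision and recall of a predicted cluster against a reference complex are the usual set-overlap ratios; averaging each reference complex's best-matching F-measure is one common aggregation, assumed here, and may differ from the exact protocol used for BioModule. The protein names are placeholders.

```python
def f_measure(predicted: set, reference: set) -> float:
    """Harmonic mean of precision and recall for one cluster/complex pair."""
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def mean_best_f(clusters, complexes):
    """For each reference complex, take the best F-measure over all predicted
    clusters, then average over complexes (an assumed aggregation)."""
    best = [max(f_measure(c, ref) for c in clusters) for ref in complexes]
    return sum(best) / len(best)


# Placeholder clusters and complexes.
predicted_clusters = [{"P1", "P2", "P3", "P4"}, {"P7", "P8", "P9"}]
reference_complexes = [{"P1", "P2", "P3", "P5"}, {"P7", "P8"}]
print(mean_best_f(predicted_clusters, reference_complexes))
```

Overlapping clusters, which CAM deliberately allows, are handled naturally here: a protein may count toward several predicted clusters without penalty.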

O089 - Systems biology approach and multivariate analysis of phenotypic data for drug discovery

Short Abstract: Despite growing evidence that neurodegenerative diseases (NDs) result from non-linear multifactorial components, our efforts to treat them still rely on drug discovery programs that use simplistic approaches during compound selection and are frequently based on univariate analysis of data.
By contrast, here we used a systems biology approach that takes advantage of high content screening platforms and delivers detailed multivariate profiles of cellular systemic function. By using phenotypic cell-based assays applied to chemo-biological screens, we queried specific physiological pathways with the aim of finding new active molecules and targets.
Our approach is based on the construction of a knowledge database built on a reference chemical library composed of approved drugs. This library has been screened in a series of cell-based assays, and the phenotypes were aligned by similarity and known targets. To do this, we used multivariate data obtained from micro-images to profile our phenotypes. We then used a set of statistical analyses combined with neural-network-based machine learning techniques to classify the phenotypes. Finally, we matched the resulting phenotypic clusters to cellular functions and specific targets. Hence, we developed new predictive computational tools for drug discovery that will be used to assign targets to unknown small molecules. This approach has the potential to identify novel molecules active on specific cellular pathways. Moreover, our tool could identify different chemotypes that cause similar phenotypic effects, thus facilitating “scaffold hopping.”

O090 - KeyPathwayMiner - Identifying aberrant pathways by combining OMICS data and biological networks

Short Abstract: The emergent field of systems biology is providing the life sciences with large biological networks reconstructed from data generated by recent advances in wet lab technologies. While these networks provide a static picture of the interplay of genes and their products, they fail to capture dynamic changes taking place during the development of complex diseases. We seek to fill this gap by integrating networks and OMICS datasets (DNA microarrays, RNA sequencing, genome-wide methylation studies, etc.) to extract connected sub-networks with a high number of aberrant genes. Efforts to tackle this challenge usually rely on setting unintuitive parameters for the underlying combined statistics. We circumvent these problems with KeyPathwayMiner, introducing an easy-to-interpret model that performs at least as well as similar approaches when tested on real datasets. Given a biological network and a set of case-control studies, KeyPathwayMiner efficiently identifies and visualizes all maximal connected sub-networks that contain mainly genes that are aberrant, e.g., differentially expressed, in most studied cases. The exact quantities for "mainly" and "most" are modeled with two parameters (K, L) that enable the user to control the level of noise (i.e. non-aberrant genes/cases) allowed in the solutions. We developed two slightly different models (INES and GLONE) that fall into the class of NP-hard optimization problems. To tackle the combinatorial explosion of the search space, we designed a set of exact and heuristic algorithms that use a swarm intelligence approach, namely ant colony optimization. KeyPathwayMiner has been implemented both as a Cytoscape plug-in and as a Java library.
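To make the roles of K and L concrete, the sketch below gives a simplified reading of the INES setting: a gene counts as active if it is aberrant in at least m - L of m cases, and a connected sub-network may contain at most K non-active exception nodes. The search is a plain greedy expansion, a stand-in chosen for brevity; it is neither the exact nor the ant colony optimization algorithm of KeyPathwayMiner, and the function names and example network are hypothetical.

```python
from collections import defaultdict


def active_genes(matrix, L):
    """matrix maps gene -> list of 0/1 flags, one per case (1 = aberrant).
    A gene counts as active if it is aberrant in at least m - L of m cases."""
    return {g for g, row in matrix.items() if sum(row) >= len(row) - L}


def greedy_ines(edges, matrix, K, L):
    """Greedily grow connected sub-networks with at most K exception
    (non-active) nodes and return the largest one found."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    active = active_genes(matrix, L)
    best = set()
    for seed in sorted(active):
        component, exceptions = {seed}, 0
        while True:
            frontier = {n for g in component for n in adj[g]} - component
            new_active = frontier & active
            if new_active:
                component |= new_active
                continue
            # No active neighbour left: spend one exception on the inactive
            # neighbour that would connect the most new active genes.
            gains = {n: len((adj[n] & active) - component) for n in frontier}
            candidate = max(sorted(gains), key=gains.get, default=None)
            if exceptions < K and candidate is not None and gains[candidate] > 0:
                component.add(candidate)
                exceptions += 1
            else:
                break
        if len(component) > len(best):
            best = component
    return best


# Hypothetical chain network and three hypothetical cases.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
cases = {"A": [1, 1, 1], "B": [1, 1, 1], "C": [0, 0, 0],
         "D": [1, 1, 0], "E": [0, 1, 1], "F": [1, 0, 0]}
print(sorted(greedy_ines(edges, cases, K=1, L=1)))
print(sorted(greedy_ines(edges, cases, K=0, L=1)))
```

With K = 1 the non-aberrant gene C is tolerated as a bridge between two aberrant segments; with K = 0 the sub-network stays confined to one segment, which illustrates how the two parameters trade noise tolerance against pathway size.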

O091 - PathVisio 3 : new features for pathway analysis and visualization

Short Abstract: In 2008, we presented the first version of our pathway visualization and analysis tool PathVisio. Since then, PathVisio has been used in a number of studies to create pathway maps, perform pathway statistics or visualize biological data on pathways. The core application of PathVisio has now been refactored using the OSGi framework (Open Service Gateway initiative) with the goal to achieve a better, modular system that can be easily extended with plugins.

We will focus on the newest features and extensions of PathVisio 3, namely

Plugin repository and manager: The new plugin manager helps users to search for available plugins in the repository and provides easy and fast installation.
Annotation of interactions: Biological interactions can now be annotated with identifiers from online databases, like Reactome or KEGG Reaction, to provide additional information.
Multi-omics data visualization: PathVisio 3 enables users to simultaneously visualize transcriptomics, proteomics, metabolomics and fluxomics data on biological pathways.
XMLRPC interface for PathVisio: PathVisio functionality can now be used from different programming languages directly.

PathVisio plugins are extensions of the core application that provide features relevant for a specific task. Plugins are accessible to users through the new plugin repository and they can be installed through the plugin manager from within the application. This is an important aspect of usability that will allow users to build an application with all the necessary modules relevant for their work. We will show some use cases to demonstrate the extended abilities of PathVisio 3.

O092 - Experimental validation of the FA/BRCA pathway boolean network model

Short Abstract: The FA/BRCA pathway repairs DNA interstrand crosslinks (ICLs). Mutations in this pathway cause Fanconi anemia (FA), a chromosome instability syndrome with pancytopenia and sensitivity to ICL-inducing agents. We present here a simplified version of a Boolean network model (BNM) of the FA/BRCA pathway previously developed by our group, as well as new robustness tests and some experimental validations. We reconstructed a minimal FA/BRCA network and studied its dynamical properties. Our model predicts that FA cells: 1) activate the expression of alternative DNA endonucleases to process damage, 2) activate the non-homologous end-joining (NHEJ) pathway followed by homologous recombination repair, and 3) divide, even with unrepaired DNA double-strand breaks, by activating a mechanism known as Checkpoint Recovery. These predictions were addressed in FA-A and wild-type cell lines exposed to mitomycin C. For prediction no. 1, the simulations and RT-PCR analysis concur that the expression of the XPF endonuclease must change to repair DNA damage in FA-A cells. For prediction no. 2, the simulations and flow cytometry analysis concur that the NHEJ pathway, mediated by DNA-PKcs, is the main alternative DNA repair pathway in FA-A cells. For prediction no. 3, the model and our experimental approach concur that, although FA-A cells arrest in the G2 phase of the cell cycle due to large amounts of DNA damage, they still divide thanks to the activation of the G2 transcriptional program and the Checkpoint Recovery process, which was verified by nuclear index scoring, RT-PCR and flow cytometry.
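For readers unfamiliar with Boolean network models, the sketch below shows how attractors (the long-term states that predictions such as the ones above are read from) can be enumerated. The three-node network (DSB, Repair, Arrest) is a hypothetical toy, not the FA/BRCA model, and synchronous updating is an assumption; published BNMs may use asynchronous schemes instead.

```python
from itertools import product

NODES = ("DSB", "Repair", "Arrest")  # hypothetical node names


def step(state, rules):
    """Synchronous update: every node reads the same previous state."""
    return tuple(int(bool(rule(state))) for rule in rules)


def attractors(rules):
    """Run from all 2^n initial states and collect the attractors reached
    (fixed points are cycles of length 1), rotated to a canonical start."""
    found = set()
    for initial in product((0, 1), repeat=len(rules)):
        trajectory, state = [], initial
        while state not in trajectory:
            trajectory.append(state)
            state = step(state, rules)
        cycle = trajectory[trajectory.index(state):]
        k = cycle.index(min(cycle))
        found.add(tuple(cycle[k:] + cycle[:k]))
    return found


wild_type = [
    lambda s: s[0] and not s[1],  # damage persists until repaired
    lambda s: s[0],               # damage activates repair
    lambda s: s[0],               # damage triggers checkpoint arrest
]
# Hypothetical repair-deficient mutant: the Repair node is fixed at 0.
mutant = [wild_type[0], lambda s: 0, wild_type[2]]

print("wild type:", attractors(wild_type))
print("mutant:", attractors(mutant))
```

In this toy, the wild type always returns to the undamaged, non-arrested state, whereas the repair-deficient mutant gains a second attractor with persistent damage and arrest; comparing attractor sets between wild-type and mutant rules is the general pattern behind in silico predictions for mutant cells.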

O093 - Parameter inference and model selection on small gene regulatory networks using single cell time course data

Short Abstract: Single-cell time course measurements of mRNA and protein counts are available for both prokaryotic and eukaryotic cells. To use these precise data to their full extent, mechanistic models that capture the discreteness and the stochastic nature of gene expression have to be applied.

We have used particle Markov chain Monte Carlo methods to perform parameter inference and model selection on small gene regulatory models. Using simulated data, we have investigated the limits of this approach in terms of measurement noise, the total number of particles used and prior knowledge of the system. Using a complete probability model and Bayesian methodology enables us to incorporate various data sources and account for uncertainty in all stages of the analysis, including the specification of prior knowledge and measurement error. We also show how our method performs on real measurement data.
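Particle MCMC relies on forward-simulating the discrete stochastic model to propagate particles between observation times. The sketch below shows Gillespie's direct method for a standard two-stage mRNA/protein model, as one example of such a simulator. The reactions, rate names and values are assumptions for illustration, not the models used in this work.

```python
import random


def gillespie_two_stage(k_m, g_m, k_p, g_p, t_end, seed=0):
    """Simulate mRNA (m) and protein (p) copy numbers with Gillespie's direct
    method. Reactions: 0 -> m (k_m), m -> 0 (g_m*m), m -> m + p (k_p*m),
    p -> 0 (g_p*p). Returns a list of (time, m, p) after each event."""
    rng = random.Random(seed)
    t, m, p = 0.0, 0, 0
    trajectory = [(t, m, p)]
    changes = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    while True:
        propensities = [k_m, g_m * m, k_p * m, g_p * p]
        total = sum(propensities)
        if total == 0:
            break  # no reaction can fire
        t += rng.expovariate(total)  # exponential waiting time to next event
        if t > t_end:
            break
        # Choose which reaction fires, proportionally to its propensity.
        r = rng.random() * total
        acc = 0.0
        for (dm, dp), a in zip(changes, propensities):
            acc += a
            if r < acc:
                m, p = m + dm, p + dp
                break
        trajectory.append((t, m, p))
    return trajectory


traj = gillespie_two_stage(k_m=2.0, g_m=0.2, k_p=5.0, g_p=0.1, t_end=50.0, seed=3)
print(len(traj), "events; final state", traj[-1])
```

Within a particle filter, many such trajectories would be simulated from the current parameter proposal and weighted by the likelihood of the observed counts under the measurement-error model, giving the marginal likelihood estimate that the MCMC acceptance step uses.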

O094 - Computational metabolic engineering of Pseudomonas putida KT2440 for production of bioplastics

Short Abstract: The widespread use of fossil based plastics is a major cause of environmental pollution, and their production depends on limited non-renewable oil resources. The immense diversity of catalytic processes encoded in the genomes of sequenced organisms holds the promise of developing biotechnological systems for the efficient production of degradable materials via metabolic engineering. However, the combinatorial possibilities for integrating enzyme-coding genes from different organisms into a microbial host cannot be explored systematically by the manual review of metabolic pathways.

We have developed a computational method (Larhlimi et al. Bioinformatics 2012, 28(18): i502-i508) for determining the maximum theoretical production capability of a target compound and identifying corresponding metabolic engineering strategies. The method predicts feasible chemical reactions improving product yield when inserted into a metabolic network. Next, an identified reaction is decomposed into a set of enzyme-coding genes catalyzing its conversion, thus providing a strategy of metabolic engineering for the efficient production of valuable compounds. We have previously shown that the method correctly predicts enzymes known to be required for the production of glucose from fatty acids in a model of the TCA cycle.

We validate three existing metabolic network reconstructions of P. putida KT2440 and develop a refined model using knockout experiments. P. putida KT2440 has previously been engineered for the production of bioplastics, but the efficiency is insufficient for industrial applications. We apply our computational method to determine enzyme-coding genes that improve production efficiency, thus providing novel hypotheses for the metabolic engineering of P. putida KT2440 aimed at the efficient production of bioplastics.

O095 - The DOE Systems Biology Knowledgebase: Microbial Science Domain

Short Abstract: KBase is a software and data environment designed to enable researchers to collaboratively generate, test and share new hypotheses about gene and protein functions, perform large-scale analyses on our scalable computing infrastructure, and model interactions in microbes, plants, and their communities. KBase provides an open, extensible framework for secure sharing of data, tools, and scientific conclusions in the fields of predictive and systems biology.

The microbes component of the KBase project aims to unify existing ‘omics datasets and modeling toolsets within a single integrated framework that will enable users to move seamlessly from the genome annotation process through to a reconciled metabolic and regulatory model that is linked to all existing experimental data for a particular organism. The results are hypotheses for such things as gene-function matching and the use of comparative functional genomics to perform higher quality annotations. KBase will embody tools for applying these models and datasets to drive the advancement of biological understanding and microbial engineering.

In order to drive the development of the microbes area and enable new science, we will focus on accomplishing prototype science workflows rather than general tasks. We have developed KBase workflows for: (1) genome annotation and metabolic reconstruction, (2) regulon reconstruction, (3) metabolic and regulatory model reconstruction, and (4) reconciliation with experimental phenotype and expression data.

O096 - Comparative co-expression: pairing transcriptomics and metabolomics data from Solanaceous species reveals genes mediating the biosynthesis of anti-nutritional glycoalkaloids

Short Abstract: Steroidal glycoalkaloids (SGAs) found in Solanaceae food plants viz. potato, tomato and eggplant are well-known anti-nutritional factors in humans. Nearly 200 years since the first report on the main potato SGA, namely, α-solanine, the biosynthetic pathway of these molecules remains largely unknown. We took advantage of the extensive transcriptome data available for the related potato and tomato plants, both producing SGAs, to identify conserved genes that are co-expressed in both species. This approach proved highly valuable, as we could identify multiple genes that likely participate in SGA metabolism and its control. Detailed characterization of one of these candidates, GLYCOALKALOID METABOLISM 4 (GAME4), encoding a cytochrome P450 protein, revealed that it performs a key step in the cholesterol-derived SGA pathway. The in silico comparative co-expression approach used in this study could be highly effective for gene discovery in other related plant species that produce analogous specialized metabolites.

O097 - An Integrated Systems Biology Platform for Complete Proteogenomic Analysis

Short Abstract: Proteogenomic studies use large-scale MS-based proteomics data to identify novel gene products and genomic reorganizations, thereby revealing new insights into genome biology. However, this comes at the cost of complex informatics, requiring combinations of disparate analysis tools and the integration of large-scale proteomic and genomic information. Currently, no complete open-source proteogenomic platform exists. As a solution, using Galaxy-P (our extension to the popular Galaxy framework), we have developed a complete pipeline – seamlessly integrating protein identification tools with genome mapping and visualization tools. The resulting workflows are sharable, reproducible, and can be creatively modified by non-technical users according to the dataset.

We analyzed three datasets for proteogenomic analysis using tools integrated into Galaxy-P. A 3D-fractionated salivary dataset and a lesion-control matched oral premalignant lesions (OPML) dataset were searched against a translated-EST database (8,000,000 sequences) and against a putative exon-exon junction human database (650,000 sequences). An iTRAQ-labeled pig islet dataset was searched against an AUGUSTUS gene-predicted pig database (42,000 sequences) and a putative exon-exon junction pig database (450,000 sequences). As part of the analytical workflow, we have integrated tools and built workflows for generating protein databases from large RNA/DNA databases, which can be searched using both open-source and commercial search algorithms. Moreover, steps such as two-step protein database search, BLAST searches, peptide-spectrum-match evaluation, spectral visualization and genomic context analysis and mapping were also integrated. While our initial focus has been proteogenomics, the existing tools can, with simple modifications, be used for metaproteomics or other systems-biology applications using MS-based proteomics.

O098 - Decoupling Linear and Non-linear Associations of Gene Expression Over Time and Across Tissue Types

Short Abstract: The FANTOM consortium has generated a large gene expression dataset of different cell lines and tissue cultures using the single-molecule sequencing technology of HeliscopeCAGE. This provides a unique opportunity to investigate novel associations between gene expression over time and across cell types. Here, we use two different statistical approaches to decouple linear and nonlinear associations. In the first, we calculated the Pearson correlation coefficient (r) and the Spearman rank correlation coefficient (ρ) for associations between ~20,000 genes over ~1,400 different time courses and treatments, and used the difference between the two coefficients to tease apart linear and nonlinear relationships. In the second approach, we use a much more powerful (and computationally intensive) set of statistics, Maximal Information-based Nonparametric Exploration (MINE), that captures a wide range of associations. From this analysis, we create a global network of significant gene associations that distinguishes between linear and nonlinear coexpression. We then characterize the graph, focusing on modularity as a way to identify clusters of linear gene associations. Emerging from this large network are clusters of linear gene associations that are in turn associated nonlinearly with other clusters, providing insight into potential complex interactions between gene expression patterns across time and tissue types.

O099 - Integrative analysis of metabolic reaction chains in non-targeted experiments with MarVis-Graph

Short Abstract: High-throughput technologies allow the comprehensive analysis of an organism by integrating measurements from different biological fields of study such as metabolomics, transcriptomics, and proteomics. Therefore, the development of algorithms and software tools that assist the combined analysis of the large, heterogeneous datasets generated in these studies is a key challenge in current bioinformatics.

The MarVis-Graph software is an interactive tool for investigating the metabolism of an organism using data from non-targeted metabolomic, transcriptomic and proteomic studies. Measurements from mass spectrometry, microarray or RNA-Seq experiments are mapped to the corresponding entities in metabolic networks assembled from the KEGG or BioCyc database collections. For analysis, MarVis-Graph scores all reactions of the metabolic network based on the associated experimental data and identifies sub-networks that contain high-scoring reaction chains. To cope with missing experimental data, the Random Walk with Restart graph algorithm [1] is applied to adjust the scores of nearby reactions. The resulting sub-networks are ranked, evaluated with a random permutation test, and finally visualized for interactive inspection.
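How a random walk with restart propagates scores onto reactions that lack measurements can be shown with a minimal sketch. This is a generic formulation of the algorithm cited as [1], not MarVis-Graph's code; the restart probability of 0.3 and the even split of walk mass across neighbours are assumptions.

```python
def random_walk_with_restart(adjacency, seed_scores, restart=0.3, tol=1e-10, max_iter=10_000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    adjacency: {reaction: [neighbouring reactions]} for an undirected graph in which every
    neighbour is also a key. seed_scores: {reaction: non-negative score from measured data};
    reactions without data start at 0. W spreads each node's mass evenly over its neighbours.
    """
    nodes = list(adjacency)
    total = sum(seed_scores.values())
    p0 = {n: seed_scores.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if neighbours:  # an isolated node's walking mass is dropped; only restart mass remains
                share = (1 - restart) * p[n] / len(neighbours)
                for m in neighbours:
                    nxt[m] += share
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


# Linear chain R1-R2-R3-R4: only R1 has experimental support.
chain = {"R1": ["R2"], "R2": ["R1", "R3"], "R3": ["R2", "R4"], "R4": ["R3"]}
scores = random_walk_with_restart(chain, {"R1": 1.0})
print(scores)
```

Reactions without data receive a score that decays with their network distance from measured reactions, which is what allows a reaction chain with a gap in the measurements to still score highly.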

MarVis-Graph investigates the full metabolic network of an organism independently of annotated pathways. The tool is therefore particularly useful for discovering reaction chains from heterogeneous data produced by non-targeted experiments. In combination with tools from the MarVis-Suite [2], MarVis-Graph has been successfully applied in wound-response studies of Arabidopsis thaliana.

[1] Glaab, E. et al. (2012) EnrichNet: network-based gene set enrichment analysis.

[2] Kaever, A. et al. (2012) MarVis-Filter: Ranking, Filtering, Adduct and Isotope Correction of Mass Spectrometry Data.

O100 - An R tool to transform biological pathways into protein networks integrating transcriptomic and interactomic information: application to human MSCs.

Short Abstract: The use of new high-throughput genomic, transcriptomic and proteomic technologies in biomedical studies is producing large amounts of biomolecular data. Developing bioinformatics methods that integrate this information efficiently and robustly will make it possible to identify new biomarkers in omics datasets and to obtain better molecular profiles of different biological and pathological states. We have developed an R tool, "Path2enet", that integrates biological pathways (from KEGG), protein interaction information (from PPI databases) and transcriptomic expression data (from ESTs and microarray experiments). The method produces a network-based representation of the pathways that includes tissue- or cell-specific filtering. Path2enet allows users to visualize and analyse the different layers of the network, in a global or a specific view. Biological researchers can use these networks (i) to get an overview of pathways as networks and (ii) to compare the networks in different tissues or cell types. In both cases, analysis of network parameters (such as degree, betweenness, eigenvector centrality and clustering coefficient) allows identification of central key nodes (i.e. key proteins). We applied the tool to create and study the network of human mesenchymal stem cells (MSCs) isolated from bone marrow and placenta. We integrate RNA-seq data from these MSCs to build the global pathway network for this cell type and compare it with the networks obtained for other related cell types (such as fibroblasts or hematopoietic stem cells, HSCs).
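Tissue-specific filtering followed by computing two of the network parameters mentioned above (degree and local clustering coefficient) can be sketched in a few lines. Path2enet itself is an R package; this Python sketch only illustrates the calculations. The `expressed` set and the toy interactions are hypothetical, and betweenness and eigenvector centrality are omitted for brevity.

```python
from itertools import combinations


def tissue_filter(edges, expressed):
    """Keep interactions whose two proteins are both expressed in the tissue of interest."""
    return [(a, b) for a, b in edges if a in expressed and b in expressed]


def degree_and_clustering(edges):
    """Return {protein: (degree, local clustering coefficient)} for an undirected network."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # self-interactions do not contribute to degree or triangles here
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    stats = {}
    for node, nbrs in neighbours.items():
        k = len(nbrs)
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
        stats[node] = (k, 2 * links / (k * (k - 1)) if k > 1 else 0.0)
    return stats


ppi = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
msc_network = tissue_filter(ppi, expressed={"A", "B", "C", "D"})  # E not expressed
print(degree_and_clustering(msc_network))
```

Comparing these per-protein values between two tissue-filtered networks is one straightforward way to see which proteins gain or lose centrality in a given cell type.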

O101 - An Information Commons for Biological Networks

Short Abstract: The Network Database Project is an open-source, web-based software system to enable individuals, groups, and software applications to share, store, manipulate, and publish biological network knowledge.

The system supports multiple network formats, including OpenBEL, SBML, BioPAX, and PSI-MI. Because full semantic interoperability between existing network formats is an open issue, the project instead creates and fosters task-specific format translation services.

The Network Database Project applies ideas from social networking and collaboration systems to create a novel information commons in which users and groups interact by sharing networks. Users control which groups and users have access to each network. Groups may represent organizations or they can be created for specific projects or collaborations.

Networks are downloaded by users or manipulated and visualized in place using web-based tools. Cytoscape users access the site via an App, allowing use of the broad range of Cytoscape tools to analyze and transform networks. A web API enables custom analytic applications to access and manipulate networks.

Beyond sharing, the system addresses the issue of publication of stable versions of networks. Networks can be explicitly created as dated versions including publication meta-data. Published versions are immutable and are assigned stable URIs and DOIs, making the networks and their contents suitable for reference by publications, by other networks, and by analytic applications.

The public release of the Network Database Project is scheduled for the fourth quarter of 2013. We invite scientists and organizations - academic, non-profit, and commercial - to share and publish networks as users of this system.

O102 - MetDisease – connecting metabolites to diseases

Short Abstract: Recent progress in the field of metabolomics has created an opportunity to advance our understanding of physiological and pathological processes. It has also posed a number of bioinformatics challenges associated with data analysis and interpretation. To date, only a few tools allow users to analyze metabolomics data and to link different types of omics data. Our recently developed tool Metscape aims to address these issues (http://metscape.ncibi.org). Metscape is a plugin for Cytoscape. It uses an internal Microsoft SQL Server database that integrates data from KEGG (http://www.genome.jp/kegg/) and the Edinburgh Human Metabolic Network (EHMN, http://www.ehmn.bioinformatics.ed.ac.uk/). It allows users to upload a list of metabolites with experimentally determined concentrations and map them to reactions, genes and pathways. It also supports the identification of enriched biological pathways from expression profiling data, builds networks of the genes and metabolites involved in these pathways, and allows users to visualize changes in the gene and metabolite data across time points or experimental conditions.
While linking metabolites to metabolic pathways has proved useful, only about half of experimentally detected metabolites can be mapped. Additional annotations are needed to enhance the biological interpretation of metabolomics data. With this in mind, we recently developed a web-based tool, Metab2MeSH, that uses a statistical approach to annotate compounds with Medical Subject Headings (MeSH) (http://metab2mesh.ncibi.org). We will present our new tool MetDisease, which uses the resulting dataset to annotate compound networks with MeSH disease terms.
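Both pathway enrichment and term annotation are commonly scored with a hypergeometric over-representation test. The sketch below shows that generic test, assuming a fixed background of compounds. It is not necessarily the statistic used by Metscape or Metab2MeSH, and the compound IDs and term names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population size, successes in population, draws)."""
    upper = min(successes, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return tail / comb(population, draws)


def term_enrichment(hits, annotations, background_size):
    """hits: set of measured compounds; annotations: {term: set of annotated compounds}.

    Returns {term: p-value} for terms sharing at least one compound with the hits, most
    significant first. No multiple-testing correction is applied in this sketch.
    """
    pvalues = {}
    for term, members in annotations.items():
        overlap = len(hits & members)
        if overlap:
            pvalues[term] = hypergeom_sf(overlap, background_size, len(members), len(hits))
    return dict(sorted(pvalues.items(), key=lambda item: item[1]))


hits = {"c1", "c2"}
annotations = {"Diabetes Mellitus": {"c1", "c2", "c3"}, "Obesity": {"c2", "c4", "c5", "c6", "c7"}}
print(term_enrichment(hits, annotations, background_size=10))
```

With many terms tested at once, the resulting p-values would normally be adjusted (e.g. by a false discovery rate procedure) before any term is reported as enriched.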

O103 - An optimisation approach for pathway activity inference

Short Abstract: Analysis of microarray gene expression profiles to gain insight into disease status or mechanisms relies on efficient computational strategies that can accurately assign samples to the appropriate disease classes based on gene expression intensities. However, the high dimensionality of this classification task (a large number of genes against a small number of samples) and the inherent heterogeneity of clinical outcomes hinder accurate disease classification. A promising alternative is to replace individual gene expression measurements with a pathway activity metric that expresses the coordinated action of the genes within each pathway; this can improve disease classification considerably [1].
A novel optimisation-based approach is presented here that infers pathway activity as a linear combination of member genes. The main aim of the optimisation problem is to make the pathway activity feature as discriminative as possible. The proposed mathematical formulation is a mixed integer linear programming model. A number of public microarray gene expression profiles for various diseases (breast cancer, and skin inflammation as in psoriasis) were used to compare the proposed pathway activity inference model with existing approaches in the literature. We show that the proposed mathematical programming model achieves significantly higher classification rates than existing pathway activity inference schemes for all classifiers tested (SVM, KNN, etc.). Beyond higher classification rates, the method also performs efficiently on multi-class classification problems, not just in the more commonly studied two-class case.
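The idea of pathway activity as a discriminative linear combination of member genes can be sketched on a toy scale. This is not the authors' mixed integer linear program. As a stand-in, it searches exhaustively over integer weights in {-1, 0, 1} on z-scored genes and uses the absolute Welch t-statistic between two classes as the discriminative criterion; both choices are assumptions made for illustration.

```python
from itertools import product
from statistics import mean, pstdev, stdev


def zscore(values):
    """Standardise one gene across samples (assumes the gene is not constant)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def welch_t(a, b):
    """Welch t-statistic; returns 0.0 in the degenerate case where both groups have zero variance."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se if se else 0.0


def best_pathway_activity(expr, labels):
    """Search weights w in {-1, 0, 1}^n for the activity sum_i w_i * z_i with the largest |t|.

    expr: {gene: [expression per sample]} for one pathway's member genes.
    labels: 0/1 class label per sample (two-class case only in this sketch).
    The search grows as 3^n, so it only suits small pathways.
    """
    genes = sorted(expr)
    z = {g: zscore(expr[g]) for g in genes}
    best_t, best_w = 0.0, None
    for weights in product((-1, 0, 1), repeat=len(genes)):
        if not any(weights):
            continue
        activity = [sum(w * z[g][s] for w, g in zip(weights, genes)) for s in range(len(labels))]
        cases = [x for x, lab in zip(activity, labels) if lab == 1]
        controls = [x for x, lab in zip(activity, labels) if lab == 0]
        t = abs(welch_t(cases, controls))
        if t > best_t:
            best_t, best_w = t, dict(zip(genes, weights))
    return best_t, best_w


expr = {"g1": [5, 6, 7, 1, 2, 1.5], "g2": [3, 1, 2, 2, 3, 1]}
labels = [1, 1, 1, 0, 0, 0]
print(best_pathway_activity(expr, labels))
```

An integer programming solver replaces this brute-force loop in practice, since it can search the same kind of weight space for pathways with hundreds of genes.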

1. Lee E, Chuang HY, Kim JW, Ideker T, Lee D. (2008) Inferring Pathway Activity toward Precise Disease Classification. PLoS Comput Biol 4(11): e1000217. doi:10.1371/journal.pcbi.1000217.

O104 - Systematic integration and comparison of multiple omics datasets

Short Abstract: Modern 'omics' technologies enable quantitative monitoring of the abundance of various biological molecules in a high-throughput manner, accumulating an unprecedented amount of quantitative information on a genomic scale. Systematic integration and comparison of multiple layers of information is required to provide deeper insights into biological systems.
Here, we describe an application of a novel multivariate statistical method to the integration, comparison and visualization of multiple datasets from the same and different omics technologies. In particular, a graphical representation of the data in a lower-dimensional space makes the results easy to interpret.
We illustrate the method on four closely related and three distantly related datasets. The first datasets were generated from four microarray platforms representing NCI-60 panel transcript profiles. The results confirmed good reproducibility across the different microarray studies. More importantly, the method revealed strong cell line clusters according to tissue type and disclosed hundreds of differentially regulated genes representing potential biomarkers for numerous tumor properties. Second, we applied the method to gene expression, protein abundance and phosphoproteome abundance data from stem cell lines. Interestingly, the integration analysis indicated that the biological information contained in transcriptomic and proteomic data is equivalent, but also that each technique provides complementary information. To enable community access to this technique, we implemented it in R/Bioconductor as an easy-to-use package for biologists.

O105 - Generative model-based prediction of LC-MS/MS spectra for metabolite identification

Short Abstract: In order to fully understand complex biological processes, high-throughput methods are required to allow examination of all the small chemical molecules of an organism’s metabolome. Liquid chromatography tandem mass spectrometry (LC-MS/MS) shows promise as a platform for the development of such methods. However, one of the key obstacles to the high-throughput use of this technology is the difficulty in accurately and efficiently identifying a metabolite from its tandem mass spectra.

Standard methods compare collected spectra against spectra in a reference database. However, the limited coverage of available databases has led to interest in computational methods for predicting spectra from a chemical structure. Existing state-of-the-art methods do this by enumerating all possible ways a molecule can break, and only simple heuristic approaches have so far been applied to predict which of these breaks are most likely. While these methods generally have good recall, explaining most if not all peaks in a spectrum, they have poor precision, predicting many more peaks than are actually observed.

We propose a probabilistic generative model of the fragmentation process, and a means of learning the parameters of this model from data. We use this model to predict spectra in silico. Comparing the predicted and measured spectra, we show that our method achieves an improved Jaccard score over a purely combinatorial approach on tripeptide and metabolite data. We also present results comparing the metabolite identification performance of our method with that of existing methods, including MetFrag.
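The Jaccard score used to compare predicted and measured spectra is easy to illustrate, and it shows why over-prediction is penalised. In this sketch, peaks are treated as m/z values, and a predicted peak counts as observed if it matches a measured peak within an absolute tolerance. The 0.01 Da tolerance and the greedy one-to-one matching are assumptions; the abstract does not state how peaks were matched.

```python
def count_matches(predicted, observed, tol=0.01):
    """Greedy one-to-one matching of m/z values within an absolute tolerance (in Da)."""
    remaining = sorted(observed)
    matched = 0
    for mz in sorted(predicted):
        for i, obs in enumerate(remaining):
            if abs(obs - mz) <= tol:
                matched += 1
                del remaining[i]
                break
    return matched


def spectrum_jaccard(predicted, observed, tol=0.01):
    """|P ∩ O| / |P ∪ O|, where the intersection is the set of peaks matched within tol."""
    shared = count_matches(predicted, observed, tol)
    union = len(predicted) + len(observed) - shared
    return shared / union if union else 1.0


# Two of four predicted peaks are observed, so the extra predictions halve the score.
print(spectrum_jaccard([100.0, 150.05, 200.1, 250.0], [100.005, 200.1]))  # 0.5
```

A purely combinatorial predictor that explains every observed peak can still score poorly here, because each unobserved predicted peak enlarges the union.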

O106 - Unraveling metabolic changes in long-term experimental evolution: A systems biology approach

Short Abstract: Understanding the principles that govern the distribution of metabolic fluxes is a major challenge in systems biology. Assuming the optimization of fitness-relevant criteria has proven a successful approach to predicting the evolution of metabolic fluxes (Ibarra et al., 2002; Schuetz et al., 2012), but the underlying mechanisms of optimization remain elusive.
High-resolution 13C flux maps and quantitative transcriptomics datasets of long-term evolved E. coli strains were produced by Chen et al. (in prep.). These data indicate that diverging evolutionary strategies were used to reach similar fitness values. To understand these changes at the systems level, these data are being analyzed in the context of metabolic optimality by comparison with fluxes predicted by Flux Balance Analysis (FBA). This requires modifying existing models of E. coli K-12 metabolism, using a comparative genomics approach, to describe the strains used in this study.
Multiple mechanisms could underlie the observed changes in metabolic flux distribution. To determine which levels of regulation give rise to the observed phenotypes, we are expanding the dataset with quantitative proteomics. Correlations between flux predictions, measured fluxes, transcripts and proteins, as well as implications for the evolutionary mechanisms underlying metabolic optimization in bacteria, will be presented.

Ibarra, R.U., Edwards, J.S., and Palsson, B.O. (2002). Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 420, 186-189.
Schuetz, R., Zamboni, N., Zampieri, M., Heinemann, M., and Sauer, U. (2012). Multidimensional Optimality of Microbial Metabolism. Science 336, 601-604.

O107 - A comparative analysis of novel complex disease pathways

Short Abstract: In recent years a number of tools have been developed to study disease-specific genetic data in the context of interaction networks, with the aim of identifying novel molecular pathways underlying common complex diseases. Typically such tools are validated using data from a well-studied disease so that results can be compared to known pathway genes. Beyond this, although the aim of using the network to propose novel disease mechanisms is clear, the interpretation of results is difficult. We have developed a framework which allows a comparative analysis between disease-related subnetworks. This can be used to compare subnetworks derived from different data sources for the same disease, or subnetworks corresponding to different diseases. Region Growing Analysis (RGA) is a tool we have developed, whose simple input requirements provide the flexibility to identify regions (proposed disease pathways) based on DNA sequence data, GWAS results, expression data or other experimental output. Using RGA we compare putative disease pathways for different diseases based on Wellcome Trust Case Control Consortium GWAS results. We test the hypothesis that diseases of the same physiological system will show greater pathway similarity than unrelated diseases. We suggest that studying the pathway components that differentiate one disease from another will lead to important inferences regarding the contribution of constituent genes to the disease process.
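The general idea of growing a candidate disease region from a seed gene, and then comparing regions between diseases, can be sketched as below. This is a generic greedy network expansion written for illustration only; it is not the RGA algorithm, whose growth rules are not described in this abstract. The scores (e.g. −log10 GWAS p-values), the stopping threshold and the Jaccard overlap measure are assumptions.

```python
def grow_region(adjacency, scores, seed, min_score=0.0):
    """Greedily add the best-scoring neighbouring gene until none exceeds min_score.

    adjacency: {gene: set of interacting genes}; scores: {gene: evidence, e.g. -log10 GWAS p}.
    Ties are broken by gene name so the result is deterministic.
    """
    region = {seed}
    frontier = set(adjacency[seed])
    while frontier:
        best = max(frontier, key=lambda g: (scores.get(g, 0.0), g))
        if scores.get(best, 0.0) <= min_score:
            break
        region.add(best)
        frontier |= adjacency[best]
        frontier -= region
    return region


def region_overlap(a, b):
    """Jaccard similarity between two proposed disease pathways."""
    return len(a & b) / len(a | b)


network = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B", "E"}, "E": {"D"}}
disease1 = grow_region(network, {"A": 5, "B": 3, "D": 2}, seed="A")
print(sorted(disease1), region_overlap(disease1, {"B", "D", "F"}))  # ['A', 'B', 'D'] 0.5
```

An overlap measure of this kind is one way to test whether regions grown for diseases of the same physiological system are more similar than those grown for unrelated diseases.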

O108 - Domain interactions from protein interactions with Formal Concept Analysis

Short Abstract: Formal Concept Analysis (FCA) was applied to a set of proteins and their domains. The resulting concept lattice was used to discern domain-domain interactions (DDIs) that underlie a given set of protein-protein interactions (PPIs). Using a published methodology and dataset, and assuming 50% reliability for each PPI, we observed that
(i) high-ranking concept pairs are enriched in gold standard domain pairs, self-interacting domain pairs and co-occurring domain pairs. This is noteworthy because these types of domain pairs are more likely to interact with one another. A domain pair (x, y) is co-occurring if domains x and y are both found in at least one protein in the input set; and
(ii) the highest-ranking concept pair(s) for each PPI in a selected set of PPIs are highly likely to contain at least one gold standard domain pair.
These observations indicate that an FCA-based approach is applicable to the problem of inferring DDIs from a given set of PPIs. They were made with Pfam-B domains included, so the approach may help fill current information gaps about Pfam-B domains. Further, because concept pairs are evaluated instead of individual domain pairs, the role of domain context can be examined without constraining the size of domain contexts or enumerating power sets. So far, we have observed that considering domains within their respective contexts helps reduce false positive PPI predictions. In short, FCA is a promising, discrete and possibly simpler alternative to the existing methods surveyed.
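What a formal concept is in this protein × domain setting can be made concrete with a naive enumeration: each concept pairs a set of proteins (extent) with exactly the domains they all share (intent). This sketch closes every subset of proteins, which is exponential and only suitable for toy contexts. The protein and domain names are placeholders, and the published methodology for ranking concept pairs against PPIs is not reproduced.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts of a protein x domain context.

    context: {protein: set of domains}. A concept is a pair (extent, intent): the intent is
    exactly the set of domains shared by every protein in the extent, and the extent is
    exactly the set of proteins carrying all of those domains.
    """
    proteins = list(context)
    all_domains = set().union(*context.values())
    concepts = set()
    for r in range(len(proteins) + 1):
        for subset in combinations(proteins, r):
            intent = set.intersection(*(context[p] for p in subset)) if subset else all_domains
            extent = frozenset(p for p in proteins if intent <= context[p])
            concepts.add((extent, frozenset(intent)))
    return concepts


toy = {"P1": {"domA", "domB"}, "P2": {"domB", "domC"}, "P3": {"domB"}}
for extent, intent in sorted(formal_concepts(toy), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Because a concept carries a whole domain set rather than a single domain, pairing the concepts of two interacting proteins is what lets domain context enter the analysis.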

O109 - Computational methods integrated with systems pharmacology models to discover novel pathways leading to Alzheimer’s disease

Short Abstract: Discovery of genes associated with sporadic diseases such as cancer and neurodegenerative diseases relies heavily on conventional bioinformatics tools. These methods are useful only if disease-relevant genetic variations are much more prevalent in affected individuals than in normal individuals. Often, pathological changes in patients occur surreptitiously and escape capture by existing computational methods. We have therefore developed a comprehensive systems pharmacology approach that integrates pathophysiological changes related to Alzheimer's disease (AD) with the genomic changes in the AD brain. We have incorporated the following functional assays and biomarker information into the model:
1. Brain amyloid burden in AD patients, determined by positron emission tomography (PET).
2. Perturbations in Aβ distribution kinetics in the presence of endogenous factors that modulate AD risk, such as insulin and APOE.
3. The gene expression changes triggered by such factors, related to these kinetics and overlaid on proteomic analyses conducted in vitro as well as in vivo.

Such a vertical integration of the features of disease progression with the omics data highlighted impaired vesicular transcytosis as a key pathway associated with AD pathophysiology. This pathway is downregulated in the normal aging brain but was found to be severely impaired in the AD brain. Further functional validation of key genes in this pathway will help us identify novel molecular targets for AD diagnosis and treatment.

O110 - Computational mining of transcriptional regulators from the genome of Plasmodium falciparum

Short Abstract: Reports have shown that Plasmodium falciparum (P. f.) is developing resistance to artemisinins and pyrethroids. Such resistance potentially threatens future progress in malaria control, as there is as yet no licensed malaria vaccine. Despite the availability of the complete P. f. genome sequence, not much is known about its mechanisms of transcriptional gene regulation. The extremely AT-rich nature of P. f. intergenic regions (~90% AT) presents challenges for computational discovery of regulatory elements. It has been argued that the parasite may employ mechanisms of transcriptional control drastically different from those used by other eukaryotes.
Our approach to identifying novel transcriptional regulators within the P. f. genome is as follows. We analyse gene expression datasets covering part or all of the parasite's life cycle to infer smoothed time-course data for each gene, and combine these time courses into clusters with unsupervised classification methods. For each cluster, we retrieve the upstream sequence of each transcript it contains and search these putative promoter sequences for over-represented sequence elements. This yields co-expression groups that may be regulated by the same factor, together with candidate regulatory genomic elements. However, the identity of the regulating transcription factor (TF) remains unknown at this stage. To identify these TFs, we model the regulatory interactions of the co-expression groups by reverse engineering. A very good anti-plasmodial strategy would then be to inactivate one or more key sets of TFs with drugs.
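The motif-finding step can be sketched as a simple k-mer over-representation count. This sketch ranks k-mers from a cluster's promoter sequences by their log2 frequency ratio against a background set. The choice of k = 6, the pseudocount, and the use of other AT-rich intergenic sequences as background are assumptions; dedicated motif-discovery tools use more elaborate statistics.

```python
from collections import Counter
from math import log2


def kmer_counts(seqs, k):
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def overrepresented_kmers(cluster_seqs, background_seqs, k=6, pseudo=1.0, top=5):
    """Rank k-mers seen in a cluster's promoters by log2 enrichment over background.

    Using AT-rich intergenic background sequences (rather than a uniform base model)
    keeps the extreme base composition from dominating the ranking.
    """
    fg, bg = kmer_counts(cluster_seqs, k), kmer_counts(background_seqs, k)
    n_kmers = 4 ** k
    fg_total, bg_total = sum(fg.values()), sum(bg.values())

    def log_enrichment(kmer):
        fg_freq = (fg[kmer] + pseudo) / (fg_total + pseudo * n_kmers)
        bg_freq = (bg[kmer] + pseudo) / (bg_total + pseudo * n_kmers)
        return log2(fg_freq / bg_freq)

    return sorted(fg, key=log_enrichment, reverse=True)[:top]


cluster = ["ATATATGCATGCATATAT", "TTATGCATGCAAATTA"]      # toy upstream sequences
background = ["ATATATATATTTAAATAT", "TTTAAATATATAATTA"]  # toy AT-rich background
print(overrepresented_kmers(cluster, background))
```

The k-mers returned for a co-expression cluster are only candidate elements; linking them to a specific TF is the reverse-engineering step described above.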

O111 - In silico prediction of miRNA dynamics in complex interaction networks

Short Abstract: Aging is a major risk factor for neurodegenerative diseases (NDs), and aging-related NDs are the culmination of non-linear, multi-factorial events. Recent evidence suggests that microRNAs (miRNAs) may be a contributing factor in both aging and neurodegeneration. miRNAs are small non-coding RNAs that participate in the regulation of mRNA translation and possibly in maintaining cellular identity. Although there is some evidence of miRNA involvement in NDs, their specific role in NDs is largely unknown. We employ a systems biology approach to explore miRNA influences in aging and NDs by combining experimental data from tissues with computational modeling.
Briefly, to elucidate the effect of miRNAs on protein dynamics, we are developing a computational model of miRNA-target mRNA interactions based on available interaction information derived from experimental data and in silico predictions. Our goal is to model translational changes resulting from differential RNA expression profiles in aging and neurodegeneration, and to derive the consequent effects on protein and m(i)RNA dynamics. We apply the model to complex subnetworks of miRNA-mRNA interactions, supplemented by deep-sequencing data, and thus aim to gain insights into the role of miRNA regulation.
Here, we present preliminary data and discuss their relevance in aging and NDs.
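A minimal sketch of how a miRNA level can propagate to target protein dynamics is a two-variable ODE. In this model, the miRNA accelerates mRNA decay and represses translation. The model structure and all rate constants are hypothetical placeholders, not the authors' model; the sketch only illustrates the kind of translational effect the model is meant to capture.

```python
def simulate_target(mirna, k_tx=1.0, d_m=0.1, k_mi=0.05, k_tl=2.0, repress=0.5, d_p=0.05,
                    dt=0.1, t_end=500.0):
    """Forward-Euler integration of a toy miRNA -> target mRNA -> protein model.

        dm/dt = k_tx - d_m*m - k_mi*mirna*m              (miRNA-enhanced mRNA decay)
        dp/dt = k_tl*m / (1 + repress*mirna) - d_p*p     (translational repression)

    All rate constants are hypothetical placeholders. Returns (mRNA, protein) at t_end.
    """
    m = p = 0.0
    for _ in range(round(t_end / dt)):
        dm = k_tx - d_m * m - k_mi * mirna * m
        dp = k_tl * m / (1 + repress * mirna) - d_p * p
        m, p = m + dm * dt, p + dp * dt
    return m, p


for level in (0.0, 1.0, 2.0):
    print(level, simulate_target(level))
```

At steady state, the protein level is k_tl·m*/(d_p·(1 + repress·mirna)) with m* = k_tx/(d_m + k_mi·mirna). This makes the two layers of repression explicit: they multiply, so a modest change in miRNA expression can produce a larger change in protein level.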

O112 - The Metabolic Interplay between Plants and Phytopathogens.

Short Abstract: We present a study of plant-pathogen interactions at the metabolic level. We selected five plant-pathogen pairs for which both genomes have been fully sequenced, and constructed the corresponding genome-scale metabolic networks. We present theoretical investigations of the metabolic interactions and quantify the positive and negative effects each network has on the other when they are combined into a single plant-pathogen network. Merged networks were examined both for the native plant-pathogen pairs and for all other combinations. Our calculations indicate that the presence of the parasite metabolic networks reduces the ability of the plants to synthesize key biomass precursors. While the producibility of some precursors is reduced in all investigated pairs, others are impaired only in specific plant-pathogen pairs. Interestingly, we found that the specific effects on the host's metabolism are largely dictated by the pathogen and not by the host plant.

O113 - Understanding protein-protein interactions using local structural features

Short Abstract: Protein-protein interactions play a key role in many cellular functions. Identifying the protein-protein interaction network of a given organism (the interactome) helps shed light on the key molecular mechanisms within a biological system. In this work, we show the role of structural features (loops and domains) in understanding the molecular mechanisms of protein-protein interactions. A paradox in protein-protein binding is explaining how the unbound proteins of a binary complex recognize each other among a large population within a cell, and how they find their best docking interface on a short time scale. We use interacting and non-interacting protein pairs to classify the structural features that sustain binding (or non-binding) behaviour. Our study indicates that not only the interacting region but also the rest of the protein surface is important for the interaction fate. The interpretation of this classification suggests that the balance between favouring and disfavouring structural features determines whether or not a pair of proteins interacts. Our results agree with previous work and support the funnel-like intermolecular energy landscape theory of protein-protein interactions. We have used these features to score the likelihood of interaction between two proteins and to develop a method for predicting protein-protein interactions. We tested our method on several sets with unbalanced ratios of interacting and non-interacting pairs to simulate real conditions, obtaining accuracies of almost 50% in the most unfavourable circumstances.

O114 - EPSILON: an eQTL prioritization framework using similarity measures derived from local networks

Short Abstract: Expression quantitative trait loci (eQTLs) derived from expression and genomic data are very likely to span multiple genes. eQTL prioritization techniques can be used to select, from a list of candidates, the gene most likely to causally affect the expression of a target gene. As input, these techniques use physical interaction networks that often contain highly connected genes and unreliable or irrelevant interactions, which can interfere with the prioritization process. We present EPSILON, an extendable framework for eQTL prioritization that mitigates the effect of highly connected genes and unreliable interactions by constructing a local network before a network-based similarity measure is applied to select the true causal gene. We tested the new method on three eQTL datasets derived from yeast data using three different association techniques. A physical interaction network was constructed, and each eQTL in each dataset was prioritized using the EPSILON approach: first, a local network was constructed using a k-trials shortest path algorithm; then, a network-based similarity measure was calculated. The aim was to predict knockout interactions from a yeast knockout compendium. EPSILON outperformed two reference prioritization methods: random assignment and shortest path prioritization. Next, we found that using a local network significantly increased prioritization performance, in terms of predicted knockout pairs, compared with applying exactly the same network similarity measures to the global network, with an average increase in prioritization performance of 8 percentage points (p < 10⁻⁵).
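A rough sketch can show how restricting attention to a local network dampens the influence of hubs. It uses a stand-in for the steps described above: the local network is taken as the union of all shortest candidate-to-target paths (not the k-trials shortest path algorithm), and a hypothetical similarity sums, over those paths, the product of 1/degree of each intermediate gene. Neither choice is EPSILON's actual method.

```python
from collections import deque


def bfs_distances(adjacency, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def hub_penalised_score(adjacency, candidate, target):
    """Sum over all shortest candidate->target paths of prod(1/degree) of intermediate genes.

    Only shortest paths contribute, so the score is effectively computed on the local
    network formed by their union; routing through a hub shrinks a path's contribution.
    """
    dist = bfs_distances(adjacency, candidate)
    if target not in dist:
        return 0.0
    weight = {candidate: 1.0}
    for v in sorted(dist, key=dist.get)[1:]:  # candidate (distance 0) comes first
        weight[v] = sum(
            weight[u] * (1.0 if u == candidate else 1.0 / len(adjacency[u]))
            for u in adjacency[v] if dist.get(u) == dist[v] - 1
        )
    return weight[target]


def prioritise(adjacency, candidates, target):
    return sorted(candidates, key=lambda c: hub_penalised_score(adjacency, c, target), reverse=True)


hub_leaves = {f"x{i}": {"hub"} for i in range(5)}
net = {"c1": {"hub"}, "hub": {"c1", "t"} | set(hub_leaves), "t": {"hub", "g"},
       "g": {"c2", "t"}, "c2": {"g"}, **hub_leaves}
print(prioritise(net, ["c1", "c2"], "t"))  # ['c2', 'c1']
```

Both candidates are two steps from the target, so plain shortest-path prioritization cannot separate them. The hub penalty favours the candidate connected through a specific, low-degree intermediate.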

O115 - Metabolic phenotypic analysis uncovers reduced proliferation associated with oxidative stress in progressed breast cancer

Short Abstract: The importance of metabolic reprogramming in cancer is increasingly recognized. However, comprehensive metabolic flux measurements in cancer are still scarce. We therefore developed a novel Metabolic Phenotypic Analysis (MPA) method that profiles the metabolic phenotype of a tumor based on its gene or protein expression. We applied MPA to conduct the first genome-scale study of breast cancer metabolism based on the gene expression of a large cohort of cell lines and clinical samples. The modeling correctly predicted cell lines' growth rates, tumor lipid levels and amino acid biomarkers, outperforming other metabolic modeling methods. MPA revealed that tumor proliferation decreases as tumors evolve metastatic capability. We experimentally validated this "go or grow" dichotomy in vitro and linked the decrease in proliferation to oxidative stress. Finally, we found fundamental metabolic differences between estrogen receptor (ER)+ and ER- tumors. These findings provide new insights into core metabolic aberrations in breast cancer.

O116 - Exploring causal genes responsible for a phenotype by intervention-calculus with the nonparanormal method

Short Abstract: Intervention experiments, e.g., knockdown or overexpression, are commonly conducted to identify genes that determine cell fate, such as differentiation, induction of pluripotency and direct reprogramming. Although such experiments reveal causal genes responsible for a phenotype of interest, they are prohibitively time-consuming and costly. Moreover, high-throughput intervention experiments are known to include many false positives. It is therefore crucial to prioritize, with high confidence, the causal genes to focus on. Here, we present NPN-IDA (Nonparanormal Intervention-calculus when the DAG is Absent), which accounts for the cumulative nature of effects through cascaded pathways via causal inference in order to rank causal genes for a phenotype. IDA originally consists of the PC algorithm for causal inference and the estimation of causal effects based on intervention calculus. To relax the Gaussian assumption in causal inference, we introduce the nonparanormal method. The nonparanormal method uses a nonparametric correlation coefficient instead of Pearson's correlation coefficient, and infers causality more accurately even when the data are not Gaussian. We demonstrate that NPN-IDA works well in identifying regulators of flowering time in Arabidopsis microarray data. Despite its simplicity, our proposed method enables effective design of intervention experiments and, owing to its generality, can be applied to a wide range of research purposes, including drug discovery.

O117 - Functional network signatures link anti-diabetic interventions with disease parameters

Short Abstract: Our current methods of intervening in type 2 diabetes mellitus (T2DM) are insufficient, and novel interventions are being developed to improve treatment efficiency across the whole range of T2DM-related complications. To understand and improve our ability to intervene in complex diseases such as T2DM, it is important to elucidate which underlying molecular mechanisms contribute to the development of pathology and to its regression under different interventions, and how they do so. We applied a network biology approach to the hepatic transcriptome dataset from a recent systems study of anti-diabetic treatments in LDLR-/- mice to identify network signatures relevant to the effects of interventions on disease endpoints. The resulting network signatures explain molecular paths linking a dietary lifestyle intervention and two anti-diabetic drug interventions (fenofibrate, T0901317) to four dyslipidemia-related disease endpoints (atherosclerosis, plasma cholesterol levels, liver weight, and plasma triglyceride levels). The analysis combines knowledge-based and data-driven networks with a random-walk-based algorithm to extract molecular paths connecting intervention targets with network modules that correlate with disease endpoints. Because these disease endpoints were restored to healthy levels by the dietary lifestyle intervention but aggravated by the drugs, the resulting network signatures are putative signatures that novel interventions should mimic or circumvent to achieve optimal pathology outcomes. Together, this approach provides insight into the molecular paths underlying both positive and negative effects of anti-diabetic interventions, and holds promise for improving intervention design.

O118 - Modeling the Wnt/ beta-catenin Signaling Pathway

Short Abstract: This poster is based on a Proceedings submission describing our work on modeling the Wnt/β-catenin signaling pathway. The Wnt/β-catenin signaling pathway is important for cell development and stem cell maintenance. Dysregulation of the pathway can lead to tumor formation and colon cancer. Because few drugs are available for effective treatment, new insights into the signaling mechanism of the pathway could reveal new treatment targets. In mature cells, Wnt is scarce, leading to β-catenin degradation in the cytoplasm and a limited concentration of nuclear β-catenin. When Wnt is present, it activates the signaling pathway, resulting in nuclear β-catenin accumulation, which in turn activates Tcf/Lef-mediated transcription of various Wnt target genes. In colon cancer, this dysregulation is mainly caused by mutations in the adenomatous polyposis coli (APC) gene or in β-catenin. We have developed a non-deterministic, concentration-dependent Petri net model of the Wnt/β-catenin signaling pathway. Simulations of the model recapitulated the increased Tcf/Lef levels seen in experiments with Wnt signaling, APC mutation and β-catenin, respectively. In addition, the model recapitulated a number of experiments involving overexpression, knockdown and knockout of different proteins in the signaling pathway. The next step will be to refine the current model and build an extended model, and to use these models to predict Tcf/Lef levels in various experiments for further experimental validation.

O119 - Robust data-driven incorporation of prior knowledge into the inference of dynamic regulatory networks

Short Abstract: Inferring global regulatory networks (GRNs) from genome-wide data is a computational challenge central to the field of systems biology. Although the primary data currently used to infer GRNs consist of gene expression and proteomics measurements, there is a growing abundance of alternative data types that can reveal regulatory interactions, such as ChIP-chip data, literature-derived interactions and protein–protein interactions. GRN inference requires the development of integrative methods capable of using these alternative data as priors on the GRN structure. Each source of structure priors has its own biases and potential errors; thus, GRN methods using these data must be robust to noisy inputs.

We developed two methods, 1) Modified Elastic Net (MEN) and 2) Bayesian Best Subset Regression (BBSR), for incorporating structure priors into GRN inference. Both methods extend the previously described Inferelator framework, enabling the use of prior information. We test our methods on one synthetic and two bacterial datasets, and show that both MEN and BBSR infer accurate GRNs even when the structure prior used has significant amounts of error (> 90% erroneous interactions).

Currently, we are further improving these methods by also using the structure priors to predict transcription factor activities (TFAs). TFAs represent a hidden layer between the observed transcription factor expression and its regulatory strength. Early experiments with TFAs indicate a significant improvement in predicting the held-out set of regulatory interactions, i.e. the part of the network for which no priors were provided.
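
One generic way to let a structure prior guide a sparse regression is to give each candidate regulator its own penalty weight, so that regulators supported by the prior are shrunk less. The sketch below is a prior-weighted elastic net solved by coordinate descent; it illustrates the idea behind MEN but is not the authors' implementation, and the function name and weighting scheme (a weight of 0 or a small value for prior-supported regulators) are assumptions.

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma (the lasso proximal step)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def prior_weighted_elastic_net(X, y, weights, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - X b||^2 + lam * sum_j w_j * (alpha*|b_j| + (1-alpha)/2 * b_j^2).

    X: list of rows (samples x candidate regulators); y: target gene expression;
    weights: per-regulator penalty factors (small for prior-supported edges).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of regulator j with the partial residual
            sq = 0.0   # mean squared value of regulator j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                sq += X[i][j] ** 2
            rho /= n
            sq /= n
            penalty = lam * weights[j]
            beta[j] = soft_threshold(rho, penalty * alpha) / (sq + penalty * (1 - alpha))
    return beta


# Hypothetical data: regulator 0 is in the prior (weight 0), regulator 1 is not.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1], [2, 0], [-2, 0]]
y = [2 * a + 0.1 * b for a, b in X]
beta = prior_weighted_elastic_net(X, y, weights=[0.0, 1.0], lam=1.0, alpha=1.0)
print(beta)
```

The weak regulator without prior support is shrunk to exactly zero, while the prior-supported one keeps its full coefficient; an erroneous prior only lowers a penalty, so strong data can still override it.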

O120 - Modeling cellular ROS defense in mitochondrial-related diseases

Short Abstract: Reactive Oxygen Species (ROS) generation is an unavoidable background process during normal cellular function. The main contributor to ROS production is the electron transport chain, which reduces oxygen to water. Some incompletely reduced oxygen species escape and oxidize a variety of organic molecules, leading to molecular dysfunction and initiating a positive feedback loop of ever-increasing radical production: the increased concentration of ROS damages the mitochondria, which further elevates the rate of ROS generation. Healthy cells manage ROS enzymatically and by mitophagy of damaged mitochondria. The precise tuning of mitophagy is crucial for cell survival and is controlled by a ROS-induced regulatory network. We have built a set of kinetic models of varying complexity based on the current understanding of the mechanism of cellular ROS defense. Our models allow simulation of various pathophysiological scenarios related to mitochondrial dysfunction and the failure of ROS regulation in human cells. We use these models to simulate the effects of diseases related to mitochondrial dysfunction and excessive ROS generation, such as Parkinson’s disease, Huntington’s disease and cancer. Experimental data are used for model fitting, and we propose model improvements based on the incorporation of single-cell experimental measurements. Finally, we discuss the prospect of integrating our kinetic models with genome-scale, constraint-based, tissue-specific models of metabolism to study the effect of ROS misregulation on the metabolic phenotype.
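
A two-variable toy model illustrates the feedback loop described above (ROS damages mitochondria, and damaged mitochondria raise ROS production) together with the protective role of mitophagy. The equations, parameter names and values are hypothetical and are not the authors' kinetic models: d(ROS)/dt = k_prod·(1 + a·D) - k_scav·ROS and dD/dt = k_dmg·ROS - k_mito·D, where D is the amount of damaged mitochondria.

```python
def simulate_ros(k_prod=1.0, a=0.5, k_scav=2.0, k_dmg=0.2, k_mito=0.5,
                 dt=0.01, t_end=100.0):
    """Forward-Euler integration of a hypothetical two-variable ROS model.

    ros:    ROS level; basal production amplified by damage, enzymatic removal (k_scav).
    damage: damaged mitochondria; created by ROS, removed by mitophagy (k_mito).
    """
    ros, damage = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        d_ros = k_prod * (1.0 + a * damage) - k_scav * ros
        d_damage = k_dmg * ros - k_mito * damage
        ros += dt * d_ros
        damage += dt * d_damage
    return ros, damage


def steady_state(k_prod, a, k_scav, k_dmg, k_mito):
    """Analytical fixed point of the toy model, or None if the feedback runs away."""
    denom = k_scav - k_prod * a * k_dmg / k_mito
    if denom <= 0:
        return None
    ros = k_prod / denom
    return ros, k_dmg * ros / k_mito


print(simulate_ros(), steady_state(1.0, 0.5, 2.0, 0.2, 0.5))
```

In this caricature the feedback loop has no positive steady state once k_mito drops to k_prod·a·k_dmg/k_scav or below, a simple picture of how failing mitophagy can let ROS escalate.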

O121 - Simulation of Predator-Prey Dynamics in Biological Wastewater Treatment Processes

Short Abstract: The microbial community that develops in biological wastewater treatment plants is crucial for their performance. This community is composed of both bacteria and bacteriophages. Until recently, most research efforts have focused on the bacterial community, while bacteriophages have received little attention. We believe that bacteriophages may have a major influence on the bacterial population and consequently on process efficiency. In this study, we investigated the predation dynamics between bacteriophages and bacteria and developed a suitable mathematical model. The model equations describe both the microbial populations and the substrate transformations. Model parameters were calibrated by comparing calculated results with experimental results obtained in microtiter plates, which served as micro-scale ecosystems. Finally, we simulated the effect of bacteriophages on the bacterial population and therefore on substrate degradation. Our findings show that mathematical models can be used to investigate the biological network composed of bacteriophages, bacteria and substrate in wastewater treatment plants. Furthermore, simulating the processes occurring in the bioreactor may provide new insights into the influence of bacteriophages on substrate degradation.
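
A minimal phage-bacteria-substrate model of the kind described above can be written with Monod growth of bacteria on substrate and mass-action phage infection. The equations, parameter names and values below are illustrative assumptions (for instance, infection is treated as immediate lysis with a fixed burst size and no latent period), not the calibrated model from the study; integration uses a simple forward Euler scheme.

```python
def simulate_wwtp(S0=100.0, B0=1.0, P0=0.0, mu_max=0.5, Ks=10.0, Y=0.5,
                  k_inf=1e-3, burst=20.0, decay=0.1, dt=0.01, t_end=48.0):
    """Forward-Euler integration of a hypothetical substrate (S), bacteria (B),
    phage (P) model; returns the final (S, B, P)."""
    S, B, P = S0, B0, P0
    for _ in range(int(round(t_end / dt))):
        growth = mu_max * S / (Ks + S) * B  # Monod growth on substrate
        infection = k_inf * B * P           # mass-action infection, lysis assumed immediate
        dS = -growth / Y                    # substrate consumed per unit biomass formed
        dB = growth - infection
        dP = burst * infection - infection - decay * P  # progeny minus adsorbed and decayed phages
        S, B, P = S + dt * dS, B + dt * dB, P + dt * dP
    return S, B, P


no_phage = simulate_wwtp(P0=0.0)
with_phage = simulate_wwtp(P0=1.0)
print(no_phage, with_phage)
```

Without phages, B + Y·S is conserved because all consumed substrate becomes biomass; with phages, this quantity decreases as infected cells are lysed, which is the kind of effect on substrate conversion the simulations are meant to quantify.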

O122 - Inferring systematic experimental bias in protein interaction datasets.

Short Abstract: Experimental approaches for protein interaction detection vary widely in scope and suffer from a range of potential error-introducing steps. Technique-specific systematic errors are likely to have a significant effect on the quality and interpretability of integrated datasets. Here we present a probabilistic error model that captures technique-specific false positive and false negative rates within a generically defined biochemical space. Our model also incorporates estimates of the set of interactions tested in individual high-throughput studies. We used the model to infer error rates for the most commonly used experimental techniques (yeast two-hybrid, tandem affinity purification and protein-fragment complementation assay) within the functional space defined by the Gene Ontology vocabularies. The inferred error rates show clear differences between the experimental methodologies in interaction detection fidelity across functional space. In addition, using a Bayesian approach, our model infers the posterior distribution of the likely biochemical structure of the true underlying interactome by integrating datasets of different experimental types in the inference process. This joint inference of true interactome structure and error rates for different experimental types should alleviate some of the issues arising from the straightforward integration of datasets and lead to more robust quality estimates.
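
To make technique-specific error rates concrete, the sketch below computes the posterior probability that a single protein pair truly interacts, given which techniques tested it and whether each detected it. It assumes fixed, known false positive and false negative rates per technique and conditional independence between experiments; the technique labels and rates are hypothetical, and the full model additionally infers the rates themselves and lets them vary across functional space.

```python
def posterior_true_interaction(prior, observations, error_rates):
    """Posterior probability that a protein pair truly interacts.

    observations: list of (technique, tested, observed) tuples.
    error_rates: technique -> (false_positive_rate, false_negative_rate).
    Experiments are assumed conditionally independent given the true state.
    """
    like_true, like_false = prior, 1.0 - prior
    for technique, tested, observed in observations:
        if not tested:
            continue  # an untested pair carries no evidence either way
        fpr, fnr = error_rates[technique]
        if observed:
            like_true *= 1.0 - fnr
            like_false *= fpr
        else:
            like_true *= fnr
            like_false *= 1.0 - fpr
    return like_true / (like_true + like_false)


# Hypothetical error rates for two techniques.
rates = {"Y2H": (0.10, 0.20), "TAP": (0.05, 0.50)}
print(posterior_true_interaction(0.5, [("Y2H", True, True), ("TAP", False, False)], rates))
```

This is where the set of tested interactions matters: absence from a screen that never tested a pair leaves the posterior unchanged, whereas absence from a screen that did test it counts as evidence against the interaction.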

O123 - Minimum curvilinearity to enhance topological prediction of protein interactions by network embedding

Short Abstract: Motivation: Most functions within the cell emerge thanks to protein-protein interactions (PPIs), yet experimental determination of PPIs is both expensive and time-consuming. PPI networks present significant levels of noise and incompleteness. Predicting interactions using only PPI network topology (topological prediction) is difficult but essential when prior biological knowledge is absent or unreliable.
Methods: Network embedding emphasises the relations between network proteins embedded in a low-dimensional space, in which protein pairs that are closer to each other represent good candidate interactions. To achieve network denoising, which boosts prediction performance, we first applied minimum curvilinear embedding (MCE) and then adopted shortest paths (SP) in the reduced space to assign likelihood scores to candidate interactions. Furthermore, we introduce: (i) a new valid variation of MCE, named non-centred MCE (ncMCE); (ii) two automatic strategies for selecting the appropriate embedding dimension; and (iii) two new randomised procedures for evaluating predictions.
Results: We compared our method against several unsupervised and supervised embedding approaches and node-neighbourhood techniques. Despite its computational simplicity, ncMCE-SP was the overall leader, outperforming current methods in topological link prediction.
Conclusion: Minimum curvilinearity is a valuable nonlinear framework that we successfully applied to the embedding of protein networks for the unsupervised prediction of novel PPIs. The rationale for our approach is that biological and evolutionary information is imprinted in the nonlinear patterns hidden behind the protein network topology and can be exploited for predicting new protein links. The predicted PPIs represent good candidates for testing in high-throughput experiments or for exploitation in systems biology tools such as those used for network-based inference and prediction of disease-related functional modules.
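
Minimum curvilinearity measures the distance between two nodes along the minimum spanning tree (MST) of the weighted network rather than directly. A minimal sketch of this distance computation follows; the subsequent embedding step (e.g. a singular value decomposition of the resulting distance matrix, without centring in ncMCE) and the shortest-path scoring are omitted. The edge weights are hypothetical distances, and the graph is assumed connected.

```python
import heapq


def minimum_spanning_tree(graph):
    """Prim's algorithm. graph: dict node -> dict neighbour -> weight (connected)."""
    start = next(iter(graph))
    visited = {start}
    tree = {v: {} for v in graph}
    heap = [(w, start, u) for u, w in graph[start].items()]
    heapq.heapify(heap)
    while heap and len(visited) < len(graph):
        w, a, b = heapq.heappop(heap)
        if b in visited:
            continue
        visited.add(b)
        tree[a][b] = tree[b][a] = w
        for c, wc in graph[b].items():
            if c not in visited:
                heapq.heappush(heap, (wc, b, c))
    return tree


def minimum_curvilinear_distances(graph):
    """Pairwise distances measured along the paths of the MST."""
    tree = minimum_spanning_tree(graph)
    dist = {}
    for source in tree:
        d = {source: 0.0}
        stack = [source]
        while stack:
            v = stack.pop()
            for u, w in tree[v].items():
                if u not in d:
                    d[u] = d[v] + w
                    stack.append(u)
        dist[source] = d
    return dist


# Hypothetical weighted network (weights act as distances).
net = {
    "A": {"B": 1.0, "C": 5.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 5.0, "B": 1.0, "D": 2.0},
    "D": {"C": 2.0},
}
mc = minimum_curvilinear_distances(net)
print(mc["A"]["C"], mc["A"]["D"])  # 2.0 4.0
```

The long direct edge A-C is ignored in favour of the path through B, which is the nonlinear "curvilinear" view of the topology that the embedding then compresses.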

O124 - TRANSTAGING: Transcriptogram-based staging of cancer.

Short Abstract: The classification of different tumor types is of great importance in cancer diagnosis. Current cancer classification is clinically based and has limited diagnostic ability. Cancer classification using gene expression data is known to hold the key to addressing central problems in cancer diagnosis and drug discovery. However, extracting relevant evidence from genome-wide expression data is challenging. We use a computational method that orders genes on a line, clustering them by the probability that their products interact. Protein–protein association information can be obtained from large databases such as STRING. The genome organization obtained in this way is independent of specific experiments and defines functional modules that are associated with Gene Ontology terms. The starting point is a gene list and a matrix specifying interactions. Using the Homo sapiens genome, we projected gene expression onto this ordering, producing plots of transcription levels for three different tumor types (lung, neuroblastoma and breast), whose data are available in the Gene Expression Omnibus database. This analysis differentiated normal and tumor tissues. Moreover, the method classified tumor tissues into many classes; inspection with biological process ontologies showed that each class has its own set of altered processes. These results provide first evidence that biomarkers for tumor staging can be found by a computational method.
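
Once genes are ordered, a transcriptogram can be obtained by averaging expression over a sliding window along the ordering, which smooths gene-level noise into module-level signals. A minimal sketch, assuming a window of 2·radius + 1 genes truncated at the ends of the ordering; the original method's boundary handling and window size may differ, and the gene names are hypothetical.

```python
def transcriptogram(expression, ordering, radius):
    """Average expression over a window of up to 2*radius + 1 consecutive genes.

    expression: dict gene -> expression level (every ordered gene must be present).
    ordering: list of genes in the interaction-based order.
    Windows are truncated at both ends of the ordering (an assumption).
    """
    values = [expression[g] for g in ordering]
    n = len(values)
    profile = []
    for i in range(n):
        window = values[max(0, i - radius):min(n, i + radius + 1)]
        profile.append(sum(window) / len(window))
    return profile


# Hypothetical ordering and expression levels.
order = ["g1", "g2", "g3", "g4"]
levels = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0}
print(transcriptogram(levels, order, radius=1))  # [1.5, 2.0, 3.0, 3.5]
```

Comparing such profiles between tumor and normal samples, for example as the difference of the two curves, highlights stretches of the ordering, and hence functional modules, whose expression shifts together.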

O125 - A web implementation of ClusterONE

Short Abstract: Uncovering the set of protein complexes of an organism is an important step towards understanding its cellular organization and towards protein function prediction. This is often done by clustering a network of protein-protein interactions (PPI) derived from experimental data. However, it is well known that a protein may participate in several protein complexes, so standard clustering methods, which simply partition the network into disjoint groups, are not ideal for this problem. In this poster, we present ClusterONEWeb (http://www.paccanarolab.org/clusteroneweb), a web server implementation of the ClusterONE algorithm (Nepusz et al., Nature Methods, 2012), the current state of the art for detecting protein complexes from protein interaction networks. ClusterONE can detect potentially overlapping protein complexes and works with both binary and weighted input networks. ClusterONEWeb runs on our server, allowing users to experiment with the algorithm without installing it on their local machine. ClusterONEWeb lets the user set the algorithm parameters through a user-friendly interface. Results are presented in an interactive table that can be sorted according to different criteria. ClusterONEWeb is very fast: it can cluster the entire human interaction dataset from HPRD (39,240 interactions) in less than ten seconds.
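
ClusterONE grows candidate complexes by optimizing a cohesiveness score, f(V) = w_in(V) / (w_in(V) + w_bound(V) + p·|V|), where w_in is the total weight of edges inside the group, w_bound the total weight of edges leaving it and p a penalty term. The sketch below computes only this score for a given group (the greedy growth and merging steps are omitted); the toy graph is hypothetical.

```python
def cohesiveness(graph, group, penalty=0.0):
    """ClusterONE-style cohesiveness of a candidate complex.

    graph: dict node -> dict neighbour -> edge weight (undirected, stored both ways).
    group: iterable of nodes; penalty: the p term, scaled by group size.
    """
    group = set(group)
    w_in = w_bound = 0.0
    for v in group:
        for u, w in graph[v].items():
            if u in group:
                w_in += w / 2.0  # internal edges are visited from both endpoints
            else:
                w_bound += w
    return w_in / (w_in + w_bound + penalty * len(group))


# Hypothetical network: triangle a-b-c with one extra edge c-d.
g = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
print(cohesiveness(g, {"a", "b", "c"}))  # 0.75
```

Because the score depends only on a group's own internal and boundary weight, different groups can be optimized independently and may share proteins, which is what allows overlapping complexes.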

O126 - PENDISC: Parameter Estimation in a Non-DImensionalized S-system with Constraints

Short Abstract: Time-series data of metabolite concentrations have been shown to be an important tool for the identification and characterization of metabolic reaction networks. A number of experiments have recently been designed and conducted to obtain time-series data for comprehensively understanding these networks and their regulation, which is essential for synthetic biology and metabolic engineering. Concurrently, efforts have been made to construct mathematical models from the available time-series data, with the expectation that such models will allow further analysis of the system. However, the time-series data obtained contain noise, and some metabolite concentrations are undetectable, which makes metabolic behavior difficult to understand and models difficult to construct, because it is hard to estimate suitable model parameters from noisy data. To handle these problems, we propose a new, simple method called PENDISC (Parameter Estimation in a Non-DImensionalized S-system with Constraints). This method aims to significantly reduce the number of parameters to be estimated by normalizing the S-system equations in the framework of biochemical systems theory. Its performance was validated using several well-known generic models. The results showed that PENDISC provides a simple technique for model construction from time-series data, even when the data contain noise and some metabolites are undetectable. This allows the analysis of real metabolic reaction networks and the prediction of metabolic behaviors.
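
For reference, an S-system describes each metabolite X_i by dX_i/dt = α_i ∏_j X_j^g_ij - β_i ∏_j X_j^h_ij, so a network of n metabolites has 2n(n + 1) parameters (α_i, β_i, g_ij, h_ij). The sketch below evaluates and integrates such a system with forward Euler; the normalization that PENDISC applies to reduce the parameter count is not reproduced, and the function names are illustrative. Concentrations must stay positive for non-integer kinetic orders.

```python
def s_system_rhs(x, alpha, beta, g, h):
    """dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij."""
    n = len(x)
    rates = []
    for i in range(n):
        production, degradation = alpha[i], beta[i]
        for j in range(n):
            production *= x[j] ** g[i][j]
            degradation *= x[j] ** h[i][j]
        rates.append(production - degradation)
    return rates


def simulate_s_system(x0, alpha, beta, g, h, dt=0.01, steps=1000):
    """Forward-Euler integration; concentrations are assumed to stay positive."""
    x = list(x0)
    for _ in range(steps):
        rates = s_system_rhs(x, alpha, beta, g, h)
        x = [xi + dt * ri for xi, ri in zip(x, rates)]
    return x


def n_parameters(n):
    """Free parameters of a full S-system with n variables: alpha, beta, g, h."""
    return 2 * n + 2 * n * n


# One metabolite with constant influx and first-order efflux: X* = alpha / beta.
print(simulate_s_system([0.5], alpha=[2.0], beta=[1.0], g=[[0.0]], h=[[1.0]], steps=2000))
print(n_parameters(10))  # 220
```

The quadratic growth of the parameter count with network size is why reducing the number of parameters to be estimated, as PENDISC aims to do, matters when fitting noisy and incomplete time series.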

O127 - A boolean approach for indirect inference of non-transcriptional pathways.

Short Abstract: Understanding cell signaling pathways is key in battling cancer. Extensions and revisions of current pathway models can have immediate consequences for treatment strategies.
Saez-Rodriguez et al. (2009) proposed a method for updating pathway models (prior knowledge networks, PKNs) in the light of new data from perturbation experiments. They define a Boolean network on the topology of a PKN and update these logical networks using phosphorylation assay data observed after perturbing multiple pathway molecules alone and in combination. Updating is achieved by minimizing an objective function that combines the mean squared error with a penalty for network size. Edges from the PKN are either removed or equipped with a different logical function: for example, are both upstream activators A and B needed to activate C, or can the signal alternatively be propagated via either A or B? Besides activation, repression can also be modelled in the network.
We propose an extension of this method that can use high-dimensional gene expression data from next-generation sequencing assays instead of phosphorylation assays. This type of data is much more readily available, but inference on non-transcriptional signalling must be indirect. We combine the Saez-Rodriguez model with ideas from Nested Effects Models, which allow indirect estimation of signalling pathways from downstream effects in gene expression.
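
The model-update step can be illustrated with a tiny logic model written as an OR of AND-clauses and scored by an objective that combines the mean squared error with a network-size penalty, as described above. This is a simplified sketch, not the original implementation: inhibitory edges are omitted, synchronous updates to a fixed point are assumed, and the node names, data and penalty value are hypothetical. It answers the A-AND-B versus A-OR-B question from the text.

```python
def steady_state(model, stimuli, inhibited=(), max_steps=50):
    """Synchronously update a Boolean network until it stops changing.

    model: dict target -> list of AND-clauses (tuples of activating inputs);
           the target is ON if any clause is fully ON (an OR of ANDs).
    stimuli: dict of externally fixed input nodes -> bool.
    inhibited: nodes forced OFF (e.g. by a kinase inhibitor).
    """
    state = dict(stimuli)
    for node in model:
        state.setdefault(node, False)
    for _ in range(max_steps):
        new = dict(state)
        for node, clauses in model.items():
            if node in inhibited:
                new[node] = False
            else:
                new[node] = any(all(state[i] for i in clause) for clause in clauses)
        if new == state:
            break
        state = new
    return state


def objective(model, experiments, size_penalty=0.1):
    """Mean squared error over measured nodes plus a penalty per model input."""
    sq_err, n_obs = 0.0, 0
    for stimuli, inhibited, measured in experiments:
        state = steady_state(model, stimuli, inhibited)
        for node, value in measured.items():
            sq_err += (float(state[node]) - value) ** 2
            n_obs += 1
    size = sum(len(clause) for clauses in model.values() for clause in clauses)
    return sq_err / n_obs + size_penalty * size


# Hypothetical data: C is active whenever A or B is stimulated.
experiments = [
    ({"A": a, "B": b}, (), {"C": float(a or b)})
    for a in (False, True) for b in (False, True)
]
or_gate = {"C": [("A",), ("B",)]}
and_gate = {"C": [("A", "B")]}
print(objective(or_gate, experiments), objective(and_gate, experiments))
```

Both candidate models have the same size, so the penalty does not distinguish them here and the fit alone selects the OR gate; with larger models, the penalty favours removing edges that do not improve the fit.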

O128 - A Significant Pathway Finding by Constructing Pathway Interaction Network based on PPI

Short Abstract: Discovering and understanding biologically relevant pathways and disease-related genes for given phenotypes are fundamental challenges in the study of human diseases. In recent years, several computational methods have been introduced to prioritize candidate genes and related pathways for human disease using different kinds of data. For example, SUSPECT prioritizes genes from a given chromosomal region using available gene and protein information. ENDEAVOUR uses a set of known genes as a training set to model the biological process under study, and then scores and ranks the candidate genes using that model. Another tool, GENEWANDERER, is based on protein-protein interactions and uses a global network distance measure, random walk analysis, to identify gene-disease associations.
Here we propose a new method for finding disease-related pathways, which relies on a pathway interaction network constructed from pathway information and a protein interaction network. Specifically, to identify significant pathways for a given disease phenotype, we consider the association between pathway information and protein interactome data.
We have tested our method for prioritizing novel disease-related pathways in different diseases, such as breast cancer, using network-based machine learning approaches.

O129 - Novel time-series inference method provides insight into intestinal microbiota dynamics and stability

Short Abstract: The increased availability of metagenomics data has emphasized the importance of the intestinal microbiota in health and disease. In healthy individuals, the intestinal microbiota resides in relatively stable conditions; shifts in composition are indicators of external perturbations, such as diet changes or antibiotic administration, and imbalance in composition has been linked to the progression of disease. Knowledge of the mechanisms responsible for this ecosystem's dynamics, especially in response to external stimuli, is crucial for designing therapies. However, to date there has been no attempt to disentangle these mechanisms from high-throughput data and exploit them for prediction. In this work, we present a novel ecological data-based inference method that uses the measured temporal variations of species densities to infer global parameters describing the growth, the interactions and the susceptibilities to external stimuli of the species in the studied community, relative to an underlying microbial community model. These parameters not only characterize the ecological system but also provide experimentally testable hypotheses. Moreover, they can be used to predict the system's temporal dynamics on short and long time-scales. As a result, the system's steady states may explain experimentally observed catastrophic shifts induced by external perturbations. Applied to the bacteria in the gut, such predictions may help improve the design of antibiotic therapies.
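
The inference described above can be sketched with a generalized Lotka-Volterra (gLV) community model, dx_i/dt = x_i (μ_i + Σ_j M_ij x_j + ε_i u(t)), where μ are growth rates, M interactions and ε susceptibilities to an external stimulus u. A minimal sketch, assuming this model form: because the per-capita growth rate (the time derivative of log density) is linear in the parameters, the parameters can be estimated per species by least squares on finite differences of log x. A real method would need regularization and noise handling; the data here are simulated, noise-free and hypothetical, and the function names are illustrative.

```python
import math


def simulate_glv(x0, mu, M, eps, u, dt, steps):
    """Hypothetical noise-free data: exact log-linear updates of a gLV model."""
    n = len(x0)
    xs = [list(x0)]
    for t in range(steps):
        x = xs[-1]
        xs.append([
            x[i] * math.exp(dt * (mu[i] + sum(M[i][j] * x[j] for j in range(n)) + eps[i] * u[t]))
            for i in range(n)
        ])
    return xs


def solve(A, b):
    """Solve the square system A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def infer_glv(xs, u, dt):
    """Least-squares estimate of (mu_i, M_i., eps_i) for each species i."""
    n = len(xs[0])
    rows = [[1.0] + xs[t] + [u[t]] for t in range(len(xs) - 1)]
    k = len(rows[0])
    # Normal equations; the design matrix is shared by all species.
    AtA = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    params = []
    for i in range(n):
        # Per-capita growth rate from finite differences of log density.
        g = [(math.log(xs[t + 1][i]) - math.log(xs[t][i])) / dt for t in range(len(xs) - 1)]
        Atg = [sum(r[a] * gt for r, gt in zip(rows, g)) for a in range(k)]
        w = solve(AtA, Atg)
        params.append({"mu": w[0], "M": w[1:1 + n], "eps": w[1 + n]})
    return params


u = [1.0 if 15 <= t < 30 else 0.0 for t in range(60)]  # antibiotic-like pulse
xs = simulate_glv([0.1, 0.2], mu=[0.8, 0.5], M=[[-1.0, -0.2], [-0.3, -0.8]],
                  eps=[-0.5, 0.2], u=u, dt=0.1, steps=60)
est = infer_glv(xs, u, dt=0.1)
print(est[0])
```

With fitted μ and M, the model's steady states (where μ + Mx = 0 for the surviving species) can be computed and compared before and after a perturbation, which is how such a model can account for abrupt shifts in community composition.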

O130 - Information-theoretic approach to extract topological and functional motifs in Protein-Protein Interaction Networks

Short Abstract: Interactome comparisons have highlighted conserved modules that might represent common functional cores of ancestral origin. However, recent analyses of protein-protein interaction networks (PPINs) have sparked a debate about the influence of the experimental method on the quality and biological relevance of the interaction data [1,2]. It is crucial to know to what extent discrepancies between the networks of different species reflect sampling biases of the respective experimental methods, as opposed to topological features arising from biological functionality. This requires new, precise and practical mathematical tools with which to quantify and compare the topological structures of networks macroscopically. To this end, we began studying the relationship between structured random graph ensembles and real biological signaling networks, focusing on the number of short loops, which represent complexes in PPINs. In this contribution, we present a large-scale investigation of the role of loops of length 3, 4, 5 and 6 in 28 PPINs from different species. By combining a method for graph dynamics with a loop-counting algorithm, we estimated the relative importance of loops in biological networks compared with random graphs. We found that loops are a predominant feature of PPINs, suggesting that their enrichment has a key functional role. We also investigated the abundance of disease-related proteins in short loops.
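
Short loops can be counted directly from neighbour sets, and a degree-preserving edge-swap process produces random graphs against which those counts can be compared. A minimal sketch for loops of length 3 and 4; the edge-swap null model is a common choice but is an assumption here, and the structured random graph ensembles used in the study may be defined differently. Node labels must be sortable (e.g. integers).

```python
import random
from math import comb


def count_triangles(adj):
    """Loops of length 3: common neighbours summed over edges; each triangle is seen 3 times."""
    total = sum(len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v)
    return total // 3


def count_squares(adj):
    """Loops of length 4: each square is found once from each of its two diagonals."""
    nodes = sorted(adj)
    total = 0
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            total += comb(len(adj[u] & adj[v]), 2)
    return total // 2


def degree_preserving_swaps(adj, n_swaps, seed=0):
    """Randomize a simple graph by swaps (a-b, c-d -> a-d, c-b) that keep every degree."""
    rng = random.Random(seed)
    adj = {v: set(nb) for v, nb in adj.items()}
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Reject swaps that would create self-loops or duplicate edges.
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue
        adj[a].discard(b)
        adj[b].discard(a)
        adj[c].discard(d)
        adj[d].discard(c)
        adj[a].add(d)
        adj[d].add(a)
        adj[c].add(b)
        adj[b].add(c)
        edges[i] = (min(a, d), max(a, d))
        edges[j] = (min(c, b), max(c, b))
        done += 1
    return adj


k4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
ring = {v: {(v - 1) % 8, (v + 1) % 8} for v in range(8)}
shuffled = degree_preserving_swaps(ring, 10)
print(count_triangles(k4), count_squares(k4), count_triangles(shuffled))
```

Averaging loop counts over many randomized copies of a PPIN and comparing them with the real network's counts gives a simple measure of loop enrichment.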

O131 - Network Biology of Systems Flexibility

Short Abstract: The current definition of health views optimally functioning human physiology as the ability to adapt to one’s environment. To achieve such optimal function, flexibility should be established and maintained at all levels of systems complexity, e.g. in different organs and biochemical processes. Network biology is fundamentally suited to investigating system flexibility, based on the following properties: (a) network biology encompasses multi-level mapping of system components and the interactions between them, revealing processes and “hotspot” nodes required for a flexible response to perturbations; (b) generic topological network properties, such as feedback loops, redundancy, modularity and hierarchical organization, contribute to the ability of a system to respond flexibly to perturbations while maintaining its functionality; (c) system adaptation to perturbations can be quantified with dynamic networks by mapping changes in edges and network topology during a challenge. We are developing solutions to address systems flexibility using network properties and to implement these findings in healthcare, focusing on the following critical aspects:
1. Assessment of dynamic changes in network models (e.g. network alignment, differential network mapping, incorporation of challenge test data)
2. Quantification of flexibility based on network topology, and identification of bottleneck nodes
3. Simulation of the effects on flexibility of interventions on specific nodes and/or edges
4. Visualization of dynamic changes, as an instrument for healthcare providers
Together, these methodologies build upon and complement our current and past experimental work, different datasets and existing knowledge, and thus provide the key to unifying different approaches to studying systems flexibility.
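
Bottleneck nodes are often identified by betweenness centrality, i.e. how many shortest paths pass through a node. A minimal sketch of Brandes' algorithm for unweighted, undirected networks follows; using betweenness as the bottleneck measure is an assumption, not necessarily the authors' choice, and the toy network is hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (dict node -> set)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each undirected pair was counted twice


# Hypothetical linear pathway a-b-c-d: the inner nodes are bottlenecks.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(betweenness(chain))
```

Comparing betweenness before and after a challenge, or after removing a node in silico, is one simple way to connect the list items above: it quantifies which nodes the network's flexible response depends on.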

O132 - Supervised de novo reconstruction of metabolic pathways

Short Abstract: We developed a novel method to reconstruct metabolic pathways from a large compound set in the reaction-filling framework. We define feature vectors representing the chemical transformation patterns of compound-compound pairs in enzymatic reactions using chemical fingerprints. We apply a sparsity-inducing classifier to learn what we refer to as “enzymatic-reaction likeness”, i.e., whether or not compound pairs can be converted to each other by enzymatic reactions. The originality of our method lies in the search for potential reactions among many compounds at a time, in the extraction of reaction-related chemical transformation patterns, and in the large-scale applicability owing to its computational efficiency. We demonstrate the usefulness of our proposed method on the de novo reconstruction of 134 metabolic pathways in KEGG. Our comprehensively predicted reaction networks of 15,698 compounds enable us to suggest many potential pathways and to increase research productivity in metabolomics.

O133 - Dynamic regulatory models of disease progression

Short Abstract: Methods for modeling regulatory and other networks by integrating expression and interaction data have provided important insights about the regulation of development and diseases. A subset of these methods can reconstruct dynamic models providing specific temporal information about the activation of transcription factors, microRNAs and the genes they regulate.

However, these methods for modeling dynamic networks have primarily focused on model organisms and on a single input time series. Reconstructing dynamic regulatory networks from populations of patients that are tracked over time presents unique challenges, including differing disease progression rates among individuals and differences in the background expression and genetics of patients.

To address these issues we developed a new method that allows the simultaneous analysis of multiple time series expression data sets from multiple patients, with the end goal of identifying differences in the regulatory programs between groups of patients with different phenotypic responses to treatment.

Our method works through a multi-stage process. First, we align the time series using a sliding window approach, since the rate of biological processes can differ among organisms. We then use the Dynamic Regulatory Event Miner (DREM) to build multiple models that represent possible regulatory activity programs. We assign each gene of each patient to one of these models, and then assign the patient to the model s/he most resembles. This allows us to cluster patients. We then iteratively refine the model. We use our finalized patient groupings and models to identify TF activity that differs between the groups.
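
The sliding-window alignment step can be illustrated by searching for the time shift that minimizes the mean squared difference between a patient's series and a reference series over their overlap. A minimal sketch, assuming uniformly sampled series and integer shifts; the function name and minimum-overlap parameter are hypothetical, and the alignment actually used before running DREM may differ.

```python
def best_shift(reference, series, max_shift, min_overlap=3):
    """Integer shift s minimizing the mean squared difference between
    series[t] and reference[t + s] over their overlap; returns (s, mse)."""
    best = None
    for s in range(-max_shift, max_shift + 1):
        pairs = [(reference[t + s], series[t])
                 for t in range(len(series)) if 0 <= t + s < len(reference)]
        if len(pairs) < min_overlap:
            continue  # too little overlap to compare reliably
        mse = sum((r - v) ** 2 for r, v in pairs) / len(pairs)
        if best is None or mse < best[1]:
            best = (s, mse)
    return best


# Hypothetical patient whose series starts two time points further along the reference.
reference = [0, 1, 4, 9, 16, 25, 36, 49]
patient = [4, 9, 16, 25, 36]
print(best_shift(reference, patient, max_shift=3))  # (2, 0.0)
```

Requiring a minimum overlap keeps large shifts from winning simply because they compare only one or two points.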

O134 - Plant Resources at the DOE KBase Project

Short Abstract: The U.S. Department of Energy Systems Biology Knowledgebase (KBase, http://kbase.us) is a collaborative effort designed to accelerate our understanding of microbes, microbial communities, and plants. It is a community-driven, extensible and scalable open-source software framework, and application system. KBase offers free and open access to data, models and simulations, enabling scientists and researchers to build new knowledge, test hypotheses, design experiments, and share their findings.

KBase integrates commonly used core tools, reference and experimental data, and overlays them with new capabilities for visualization, exploration, and predictive analysis with KBase-generated recommendations. KBase is distinguished from a database or existing biological tools by its focus on interpreting missing information necessary for predictive modeling, on aiding experimental design to test model‐based hypotheses, and by delivering quality‐controlled data. The project leverages the power of cloud-based high‐performance computing resources across the USDOE system of labs to handle the anticipated rapid growth in data volumes and computing requirements of the community.

KBase empowers bioenergy science with a variety of plant resources and analytical services. Reference plant genomes such as poplar and Arabidopsis are integrated with phenotype experiments, gene expression profiles, and regulatory, interaction and metabolic networks for building models and generating new hypotheses. User-furnished data can be uploaded, analyzed using high-performance bioinformatics tools, and overlaid on the data already available, visually and analytically. As the project matures, partnerships with plant resources such as iPlant and Gramene/Ensembl will lead to a broader research platform for predictive plant and microbial biology.

O135 - Weighted maximum clique model for identifying condition specific sub-network

Short Abstract: Sub-networks can reveal the complex patterns of the whole bio-molecular network by extracting the interactions that depend on temporal or condition specific context. The identification of condition specific sub-networks is of great importance for investigating how a living cell adapts to changing environments.
In this work, we propose a continuous optimization model, called the weighted maximum clique model, which uses scoring parameters that jointly measure the condition-specific changes of both individual genes and gene-gene co-expression to identify the condition-specific sub-network with the maximal score. Finding a maximal-scoring sub-network is generally formulated as a combinatorial optimization problem. Bio-molecular networks are often large in scale, and it is infeasible to solve such a large combinatorial optimization problem exactly in reasonable time. To address this issue, we formulate the sub-network identification problem as a continuous optimization problem that approximates the general combinatorial problem, based on a theorem due to Motzkin and Straus, which relates maximum cliques of a weighted graph to the optimization of a quadratic function under sparsity constraints. The optimization problem can be efficiently solved by a continuous genetic algorithm to find a single optimal sub-network that maximizes the quadratic objective function under sparsity constraints.
We construct the background network by obtaining weight parameters for our model and apply the model to analyze real prostate and ovarian cancer data sets. Compared with previous methods, our method is more robust in identifying truly significant sub-networks of appropriate size and meaningful biological relevance.
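
The Motzkin-Straus theorem states that for an unweighted graph with adjacency matrix A and clique number ω, the maximum of x^T A x over the probability simplex is 1 - 1/ω, attained at the uniform vector on a maximum clique. The sketch below finds a local maximizer with discrete replicator dynamics, a standard solver for this problem that is substituted here for the continuous genetic algorithm used in the study; the condition-specific weights of the actual model are not included, and the toy graph is hypothetical.

```python
def replicator_clique(A, n_iter=1000, tol=1e-12):
    """Discrete replicator dynamics x_i <- x_i (A x)_i / (x^T A x) on the simplex.

    A: symmetric non-negative matrix (list of lists) with zero diagonal.
    Returns the final x and the objective x^T A x.
    """
    n = len(A)
    x = [1.0 / n] * n  # start at the barycenter of the simplex
    for _ in range(n_iter):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        value = sum(x[i] * Ax[i] for i in range(n))
        new = [x[i] * Ax[i] / value for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(new, x))
        x = new
        if change < tol:
            break
    value = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return x, value


# Hypothetical graph: triangle {0, 1, 2} plus a pendant node 3 attached to 2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
x, value = replicator_clique(A)
print([round(v, 3) for v in x], round(1 / (1 - value)))
```

The support of the converged vector (nodes with non-negligible weight) is the extracted sub-network, and 1/(1 - value) recovers its clique size; replicator dynamics guarantees only a local maximum, which is one reason global heuristics such as a genetic algorithm are used in practice.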

O136 - Engineering a portable riboswitch-LacP hybrid device for two-way gene regulation

Short Abstract: Riboswitches are RNA devices that mediate ligand-dependent control of gene expression. Previously described riboswitches are confined to a particular gene and perform only one-way regulation. Here, we used a library screening strategy to efficiently create a riboswitch-LacP hybrid device that achieves portable gene control in response to theophylline and IPTG. This device regulated target expression in a ‘two-way’ manner: the default state of target expression was ON; expression was switched off by adding theophylline and restored to the ON state by adding IPTG, without changing the growth medium. Use of the hybrid device uncovered an inhibitory role of RpoS in acetate assimilation, a function that is otherwise overlooked with conventional genetic approaches. Overall, this work establishes a portable riboswitch-LacP device that achieves sequential OFF-and-ON gene regulation. Two-way control of gene expression has various potential scientific and biotechnological applications and can help reveal novel gene functions.
