20th Annual International Conference on
Intelligent Systems for Molecular Biology


Posters

Poster numbers will be assigned May 30th.
If you cannot find your poster below, you have probably not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the poster acceptance email, which contains a confirmation link. Click on it and follow the instructions.

If you need further assistance, please contact submissions@iscb.org and provide your poster title or submission ID.

Category X - ''
X01 - ConsensusPathDB: a meta-database for functional associations and pathways
Short Abstract: ConsensusPathDB is a meta-database that integrates physical protein interactions, genetic interactions, metabolic and signaling reactions, gene regulatory interactions and drug-target interactions into a seamless functional association network. This network simultaneously reveals multiple functional aspects of genes, proteins, complexes, metabolites, etc., capturing the molecular biology of the cell in a more complete and unbiased manner. ConsensusPathDB currently integrates ~170,000 unique human interactions as well as over 3,250 human pathways from ~30 public resources; separate database instances also exist for yeast and mouse. Physical entities and interactions from different sources are mapped to each other to avoid data redundancy. Regular database rebuilds keep the integrated interaction content up to date. The web interface of ConsensusPathDB offers different ways of using these integrated interaction data and features tools for visualization, analysis and interpretation of high-throughput expression data in the light of functional associations and biological pathways. For example, these tools can aid the identification of patient-specific disease-causative genes and, at the same time, highlight drugs that target those genes.
ConsensusPathDB can be accessed freely at http://cpdb.molgen.mpg.de
X02 - Global Network Alignment Using Multiscale Spectral Signatures
Short Abstract: The alignment of protein interaction networks is a fundamental problem in network analysis. Global alignments put the proteins of one network into correspondence with the proteins of another network in a manner that aligns many of their interactions while respecting other evidence of their homology. Such alignments can be used to inform hypotheses about the functions of unannotated proteins, the existence of unobserved interactions, the evolutionary divergence between the two species, and the evolution of complexes and pathways.

We introduce GHOST, a global pairwise network aligner that uses a multiscale spectral signature to measure topological similarity between subnetworks. Combining spectral information from subgraphs of varying diameters around each node yields a discriminative, yet robust, topological signature. GHOST combines this topological signature, sequence-based evidence of homology, and a seed-and-extend approach that aligns local neighborhoods using a relaxation of the quadratic assignment problem. After the initial computation of a network's topological signatures, which is independent of any particular alignment and needs to be performed only once, GHOST can align networks with thousands of nodes in a matter of minutes.

We demonstrate that, under a number of existing biological and topological quality metrics, GHOST matches or exceeds state-of-the-art performance in network alignment. Under a newly proposed measure that addresses shortcomings of the prevailing edge correctness measure, GHOST performs on par with the best topological aligner while producing alignments of greater biological relevance. In general, GHOST efficiently uncovers large, connected, and conserved patterns of biologically meaningful interactions between the networks of even distantly related species.
X03 - Efficient Maximum Expected Accuracy Alignment of Multiple Biological Networks
Short Abstract: Cross-species comparison of genome-scale protein-protein interaction (PPI) networks can elucidate important similarities and variations across species. Network alignment provides an effective means of discovering similar network regions across two or more networks, thereby predicting novel functional modules, such as signaling pathways and protein complexes, that are conserved in different species. To obtain biologically meaningful results, an effective network alignment scheme should use both the molecular similarity between individual proteins and the similarity between their interaction patterns. In this work, we present a probabilistic network alignment scheme that finds accurate alignments based on the maximum expected accuracy alignment principle. The proposed scheme first estimates probabilistic correspondence scores between nodes that belong to different networks, using an efficient semi-Markov random walk approach. These probabilistic scores are refined through two types of consistency transformations, namely an intra-network transformation and an inter-network transformation, to effectively incorporate information from neighboring nodes as well as from nodes in other networks. The transformed scores are then used to predict the maximum expected accuracy alignment through an efficient greedy alignment process. Experimental results based on real and synthetic PPI networks show that the proposed scheme can significantly improve the overall specificity and sensitivity of the alignment, clearly outperforming current state-of-the-art network alignment algorithms in terms of both accuracy and computational efficiency.
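The abstract does not spell out its greedy step. As a minimal sketch, assuming the (already transformed) correspondence scores are available as probabilities for node pairs drawn from two networks, a generic greedy maximum expected accuracy decoder repeatedly accepts the highest-scoring pair whose nodes are both still unaligned. The function names, the `min_score` cutoff and the toy scores are hypothetical; the semi-Markov random walk and the consistency transformations are not shown, and the multiple-network case is reduced to two networks.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (node in network A, node in network B)


def greedy_mea_alignment(scores: Dict[Pair, float], min_score: float = 0.0) -> List[Pair]:
    """Build a one-to-one alignment by greedily accepting high-probability pairs."""
    used_a, used_b = set(), set()
    alignment: List[Pair] = []
    # Visit pairs by decreasing probability; ties are broken by node names.
    for (a, b), prob in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if prob <= min_score:
            break  # every remaining pair is at or below the cutoff
        if a in used_a or b in used_b:
            continue  # keep the alignment one-to-one
        alignment.append((a, b))
        used_a.add(a)
        used_b.add(b)
    return alignment


def expected_accuracy(alignment: List[Pair], scores: Dict[Pair, float]) -> float:
    """Expected number of correct matches: the sum of the accepted pair probabilities."""
    return sum(scores[pair] for pair in alignment)
```

For example, with scores {("A1", "B1"): 0.9, ("A2", "B2"): 0.7, ("A1", "B2"): 0.6}, the decoder keeps (A1, B1) and (A2, B2) and skips (A1, B2) because A1 is already used. Greedy decoding is fast but not guaranteed to maximize the summed probability.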
X04 - Identification of unknown metabolites by aligning fragmentation trees
Short Abstract: Small molecules play an essential role in metabolism and serve as drugs in pharmacology. Mass spectrometry in combination with a fragmentation technique allows sensitive and high-throughput analysis of small molecules. Unfortunately, automated interpretation of the data is still in its infancy. In 2008, Böcker and Rasche introduced fragmentation trees for the automated analysis of fragmentation patterns, and in 2011 they showed that fragmentation trees are of excellent quality.
Here, we present a method that allows fully automated computational classification and identification of small molecules that cannot be found in any database. The method is based on the assumption that similar molecules fragment in a similar way, which implies that the fragmentation trees of similar molecules are not very different. By aligning the compounds' fragmentation trees, we obtain a similarity score for the corresponding molecules, derived solely from their fragmentation spectra.
We test our method on three datasets containing 97, 370 and 44 compounds, measured on three different instruments. We cluster compounds based solely on the alignment similarity scores and find good agreement with known compound classes. We show that fragmentation tree similarities are strongly correlated with the chemical similarity of molecules. Furthermore, we present FT-BLAST, a database search tool that identifies compounds whose fragmentation patterns are similar to that of an unknown sample compound. FT-BLAST allows false discovery rates to be calculated using a decoy database, which is not possible with common spectral comparisons. FT-BLAST shows good results on a dataset of unknown metabolites from Icelandic poppy.
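The abstract states that FT-BLAST estimates false discovery rates with a decoy database but does not give the formula. A common target-decoy estimate, assumed here, divides the number of decoy hits scoring at or above a threshold by the number of target hits at or above the same threshold. The sketch below implements only that estimate on plain score lists; the function names are hypothetical and this is not FT-BLAST code.

```python
from typing import List, Optional


def decoy_fdr(target_scores: List[float], decoy_scores: List[float], threshold: float) -> float:
    """Estimate the FDR among target hits scoring at or above `threshold`.

    Decoy hits at or above the threshold are taken as an estimate of the
    number of false target hits at or above the same threshold.
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0
    return min(1.0, n_decoy / n_target)


def threshold_for_fdr(target_scores: List[float], decoy_scores: List[float],
                      max_fdr: float) -> Optional[float]:
    """Return the lowest observed target score whose estimated FDR is <= max_fdr."""
    for t in sorted(set(target_scores)):
        if decoy_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None
```

Scanning thresholds from low to high returns the most permissive cutoff that meets the FDR bound; because the estimate is not strictly monotone, stricter procedures (for example q-value conversion) may be preferable in practice.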
X05 - A general fuzzy multi-objective optimization approach for metabolic engineering problems
Short Abstract: Improving the synthesis rate of desired metabolites in metabolic systems is one of the main tasks in metabolic engineering. In the last decade, metabolic engineering approaches based on mathematical optimization have been used extensively for the analysis and manipulation of metabolic networks. Experimental results show that a strain may exhibit a resilience phenomenon after stressful environmental changes and genetic perturbations. This resilience phenomenon means that the mutant strain may respond to global genetic perturbations with rapid and dramatic alterations. However, after genetic perturbations, the mutant tries to evolve to a new steady state that may be only slightly different from its previous steady state. This new steady state indicates that the mutant strain tries to recover its “wild-type” characteristics and maintain relative metabolic stability. To show the effects of resilience phenomena on metabolic engineering strategies, this study proposes a generalized fuzzy multi-objective optimization approach that formulates the enzyme intervention problem for metabolic networks while considering resilience phenomena and cell viability. This approach is a general framework that can be applied to any metabolic network to investigate the influence of resilience phenomena on gene intervention strategies and maximum target synthesis rates. This study evaluates the performance of the proposed approach by applying it to two metabolic systems: S. cerevisiae and E. coli. Results show that the maximum synthesis rates of target products achieved by genetic interventions are always overestimated in metabolic network models that do not consider resilience effects.
X06 - Using Molecular Interaction Maps as Formal Representations of Pathway Interactions: The Notation and Software Tools
Short Abstract: The Molecular Interaction Map (MIM) notation helps users produce concise and unambiguous diagrams showcasing the complexity of protein interactions participating in biological pathways. Protein interactions cannot be easily represented by simple protein-protein interaction graphs because of post-translational modifications, multi-protein complexes, and the capabilities of specific protein domains, all of which are important in the regulation of signal transduction pathways. Here we present several novel software tools based on an updated specification of the MIM notation. These tools include PathVisio-MIM (a tool for end users to draw, edit, and annotate MIM diagrams), MIMML (an exchangeable XML-based format that captures diagrams and facilitates reuse of diagram content), a Java-based application programming interface (API), a Schematron-based MIM diagram validator, and a Groovy-based scripting interface that simplifies programmatic manipulation of diagrams and can be used to overlay experimental data on MIM diagrams (as an example, we overlay drug-target information on MIM diagrams). The updated notation and these software tools make MIM diagrams a feature-rich, formalized representation of pathway interactions that allows better communication and analysis of pathway knowledge.
X07 - From network topology to pathogenicity
Short Abstract: Proteins carry out biological processes by interacting in complex networked ways. Hence, analyses of protein interaction networks (PINs) have been invaluable for studying biological function and disease in the post-genomic era. A goal of biological network research is to extract new biological knowledge from network topology, which requires an efficient measure of topology. Using node degrees or other limited measures of topology has often led to scientific controversy over whether a node's functional activity is linked to its network position. Hence, we designed a sensitive measure to quantify the topological similarity of nodes' extended network neighborhoods. We used it to show that in PINs: protein function and network topology are closely related; function that cannot always be extracted from sequence can be extracted from topology; network topology can successfully identify new cancer genes; aging and disease genes are network-central; and network alignment can extract species phylogeny.

Here, we apply our measure to the identification of novel pathogen-interacting (PI) proteins, and hence potential drug targets, in the human PIN. We find that PI proteins have different network topologies than non-PI proteins. Furthermore, among all PI genes, those involved with a particular pathogen are more topologically similar to each other than to other PI genes. Hence, our measure appears to be pathogen-specific. We use this specificity to cluster topologically similar proteins and predict novel PI genes from the clusters, validating our predictions against the literature. Moreover, we extend the measure to allow for node overlap between clusters, which further improves prediction accuracy.
X08 - Systematic identification of overlapping functional modules using weighted functional linkage networks
Short Abstract: Complex traits generally reflect contributions from sets of genes, and their substantial variability reflects the combinatorially large number of allelic groups that are possible even for relatively small gene sets. Moreover, many traits, including diseases, do not conform to groups represented by traditional pathways, but reflect crosstalk between and among subsets of genes in different pathways, or previously ungrouped sets of genes. Most existing algorithms for finding functionally related groups of genes, or modules, are limited to traditionally classified, non-overlapping sets of genes.
We report a new approach to the systematic detection of overlapping functional modules at the genome scale using a human functional linkage network (FLN) previously developed in our lab (Linghu et al.). The FLN was developed using Bayesian methods to infer weighted links between functionally related genes based on the integration of 16 data sets. Our algorithm is based on the observation that intra-module genes are more strongly correlated than inter-module genes. The algorithm starts with a small number of seed genes and uses an annealing function to identify potential modules. We systematically identify the modules for each gene in the genome using different combinations of up to 10 seed genes, keeping only those modules that have an average FLN weight above a given threshold and that can be identified repeatedly using different seed genes. The results indicate that we can successfully uncover most of the existing functional modules defined by either KEGG pathways or GO categories using reasonably small seed sets.
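As a minimal sketch, assuming the FLN is stored as a dictionary that maps unordered gene pairs to link weights, the module score named above (average FLN weight) and a simple seed-expansion loop could look as follows. The greedy expansion is a swapped-in stand-in for the authors' annealing function, and all names, weights and the threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set

Edge = FrozenSet[str]  # an undirected gene pair


def average_weight(module: Iterable[str], weights: Dict[Edge, float]) -> float:
    """Mean FLN weight over all gene pairs in the module (absent links count as 0)."""
    pairs = list(combinations(sorted(set(module)), 2))
    if not pairs:
        return 0.0
    return sum(weights.get(frozenset(p), 0.0) for p in pairs) / len(pairs)


def grow_module(seeds: Iterable[str], weights: Dict[Edge, float], threshold: float) -> Set[str]:
    """Greedily add the gene that keeps the average weight highest.

    Stops when the best addition would drop the average below `threshold`.
    Greedy stand-in for the annealing step described in the abstract.
    """
    module = set(seeds)
    candidates = {g for edge in weights for g in edge} - module
    while candidates:
        # sorted() makes tie-breaking deterministic (max keeps the first maximum).
        best = max(sorted(candidates), key=lambda g: average_weight(module | {g}, weights))
        if average_weight(module | {best}, weights) < threshold:
            break
        module.add(best)
        candidates.discard(best)
    return module
```

Running `grow_module` from many seed combinations and keeping modules that recur would mirror the repeated-identification filter described in the abstract.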
X09 - Predictive models of gene regulation for Mycobacterium tuberculosis
Short Abstract: The mechanisms underlying the persistence of Mycobacterium tuberculosis (MTB) in the host are poorly understood. The pathogen can remain in a non-replicating, asymptomatic state for months to decades and still remain virulent. Adaptations to hypoxia, a shift to the use of lipids, production of a complex cell wall, and molecules that specifically manipulate the host immune system are thought to be involved in this process. These adaptations require dynamic changes in gene expression orchestrated, in part, through a complex interacting network of transcription factors (TFs). This network and its interactions can be thought of as the “cellular circuitry” that underlies the dynamic behavior of MTB. Here we aim to develop quantitative models that capture the dynamic behavior of, and complex interactions between, the components of this circuitry. We have combined a genome-scale regulatory network, consisting of 51 of the 183 MTB transcription factors mapped using ChIP-seq, with gene expression measurements derived from TF induction to develop predictive models of dynamic gene expression. These models are able to accurately predict the expression of target genes during a time course of hypoxia and re-aeration. Through these efforts, we have begun to reveal key interactions underlying specific MTB adaptations. The network also reveals interactions between TFs that can result in complex dynamic responses. We are therefore extending the models to account for feed-forward loops and more complex combinatorial interactions. Ultimately, accurate and predictive quantitative models hold out the promise of rationally guiding the development of drugs, vaccines, and diagnostics.
X10 - Integrating Genotype data and Transcriptome data to detect CKD associated pathways
Short Abstract: An integrative approach combining GWAS-derived candidate genes with clinical information and transcriptomic data could help to define common pathways in chronic kidney diseases (CKD). CKD candidate genes from previous publications were tested for their correlation with renal function and served as a starting point for detecting correlated transcripts. A network of pathways enriched for these correlated transcripts was generated and screened for known CKD associations and for its ability to aggregate biologically related pathways. Cluster analysis of the pathway network revealed a first subcluster of 22 mainly inflammation-related pathways and a second subcluster of 25 mainly metabolic pathways. As expected, metabolism-related transcripts were mainly down-regulated and inflammation-related transcripts mainly up-regulated. Of the 309 connecting genes of the pathway network, 281 correlated with renal function. Nineteen of the connecting genes were found to be enriched for the functional term “Renal tubule injury”. Four connecting genes were previously identified CKD biomarkers. A protein-protein interaction network generated from the connecting genes was more strongly interconnected than a randomized version of this network, confirming the close functional relationships among the connecting genes. Validation in an independent cohort showed that 80% of the pathways and 95% of the connections among them were recaptured. These data show that a combined analysis of genotype and transcriptomic data identifies both known and as yet unknown CKD-associated pathways and their interplay via CKD-associated transcripts. Further hypotheses about how SNPs might influence other transcripts in the context of CKD can be tested with our approach.
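The abstract does not state which correlation measure or cutoff was used to link transcripts to renal function. Assuming Pearson correlation against a per-sample renal function value (for example an eGFR-like measure) and a hypothetical minimum absolute correlation, the selection of "correlated transcripts" can be sketched as below; the names and toy data are illustrative only, and no significance testing or multiple-testing correction is included.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def correlated_transcripts(expression: Dict[str, List[float]],
                           renal_function: List[float],
                           min_abs_r: float) -> Dict[str, float]:
    """Keep transcripts whose |r| with the renal function measure is at least min_abs_r."""
    selected = {}
    for transcript, values in expression.items():
        r = pearson(values, renal_function)
        if abs(r) >= min_abs_r:
            selected[transcript] = r
    return selected
```

The sign of r is kept so that transcripts rising and falling with renal function can be distinguished downstream.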
X11 - Experimental validation of gene function prediction from graph kernels
Short Abstract: As genome-scale RNAi screens require large investments, we would like to use a gene function prediction approach to build small process-specific siRNA libraries. To this end, we introduce different kernels on graph nodes and explore how they capture functional relationships between genes when derived from various data sources. Validation with independent RNAi screen data shows that graph kernels can be used to predict gene function and to significantly increase efficiency in the selection of system-specific RNAi experimental targets. In particular, we validate predictions of chromosome condensation genes using live cell imaging and automated microscopy.
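The abstract does not say which kernels on graph nodes were used. As an illustration only, the diffusion (heat) kernel K = exp(-βL), where L is the graph Laplacian, is one well-known kernel of this kind: genes that are close in the network receive larger kernel values, which can then be used to rank candidate genes by similarity to known process members. The sketch below uses plain Python lists and a truncated Taylor series, assuming a small symmetric adjacency matrix; `beta` and `terms` are hypothetical parameters, and a real analysis would use a numerical library.

```python
from typing import List

Matrix = List[List[float]]


def laplacian(adj: Matrix) -> Matrix:
    """Graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def diffusion_kernel(adj: Matrix, beta: float, terms: int = 30) -> Matrix:
    """Approximate K = exp(-beta * L) with a truncated Taylor series."""
    n = len(adj)
    step = [[-beta * x for x in row] for row in laplacian(adj)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # (-beta L)^0 / 0!
    kernel = [row[:] for row in term]
    for k in range(1, terms):
        # term_k = term_{k-1} @ (-beta L) / k, i.e. (-beta L)^k / k!
        term = [[x / k for x in row] for row in matmul(term, step)]
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return kernel
```

Because every row of L sums to zero, each row of the exact diffusion kernel sums to one, which is a convenient sanity check on the truncation.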
X12 - Food systems biology network to explore diet-health relationships.
Short Abstract: Through their diets, humans are exposed to complex mixtures of compounds that may be involved in causing, modulating and preventing diseases. These compounds may be nutrients, lipids and carbohydrates, but also other small molecules, e.g. volatile molecules and phenolic compounds. With the advance of the ‘omics’ field, data have become available and opportunities have emerged to increase our understanding of how food components can modulate our biological systems.

Since several components are shared by a large number of foods (e.g. dodecalactone, which occurs in cooked beef, apricot and blue cheese), we developed a network-based approach to identify food-food associations. We built a food-component matrix to explore connections between foods, and from this matrix a food-food association network is created. This network is based on a recently published computational systems biology method[1], which rests on the assumption that if two foods contain the same compound and perturb the same set of proteins, they might have a higher (positive or negative) impact on human systems biology.
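As a minimal sketch of the first step only, assuming the food-component matrix is stored row-wise as a set of compounds per food, food-food edges can be drawn between foods that share at least one compound and weighted by the shared count and the Jaccard index. The weighting, the function name and the toy data below (placeholder compounds `c1` to `c4` and the food "rice") are hypothetical, not real composition data, and the protein-perturbation criterion of the cited method is not modelled.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def food_associations(components: Dict[str, Set[str]],
                      min_shared: int = 1) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """Link foods that share compounds.

    `components` is the food-component matrix stored row-wise: each food maps to
    the set of compounds it contains. Each edge carries the number of shared
    compounds and their Jaccard index.
    """
    edges = {}
    for a, b in combinations(sorted(components), 2):
        shared = components[a] & components[b]
        if len(shared) >= min_shared:
            union = components[a] | components[b]
            edges[(a, b)] = (len(shared), len(shared) / len(union))
    return edges


# Toy data: only the dodecalactone occurrences come from the abstract.
toy_foods = {
    "apricot": {"dodecalactone", "c1"},
    "blue cheese": {"dodecalactone", "c2"},
    "cooked beef": {"dodecalactone", "c1", "c3"},
    "rice": {"c4"},
}
```

On the toy data, apricot, blue cheese and cooked beef become pairwise linked through dodecalactone, while the hypothetical "rice" stays isolated.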

The developed food network will help us understand the underlying molecular mechanisms of food components and the biological pathways they perturb by integrating protein-compound data, protein-protein interaction data[2], disease-component annotations and functional annotations of proteins. With the proposed computational systems biology approach, the identification of food-food-disease associations is of potential interest, especially for elucidating interactions between foods and genes with the aim of optimizing health through personalized diets.

[1] Audouze K, et al. (2010) PLoS Comput. Biol. 6(5):e1000788.
[2] Lage K, et al. (2007) Nat. Biotechnol. 25:309-316.
X13 - System Bioinformatics Approaches to Model and Decipher Plant Hormonal Regulatory Networks
Short Abstract: Plant hormone (also known as phytohormone) signaling and regulatory networks play critical roles in all aspects of plant function, such as seed germination, organ development, plant growth, and response to environmental challenges. Significant progress has been made during the past decade in elucidating many of the genes involved in hormone synthesis, transport and early signal transduction. In contrast, systematic knowledge of global hormone regulatory networks is still lacking; for example, little is known about the downstream regulatory target genes and phytohormone cross-talk.

We employed both "bottom-up" and "top-down" strategies to model and construct global plant hormonal regulatory networks. Briefly, we developed comprehensive graphical models of plant hormone gene regulatory networks (GRNs) that integrate information about molecular interactions, including known and predicted protein-protein, protein-DNA, and regulatory small RNA-mRNA interactions. The GRN models consist of "nodes," which represent comprehensive collections of hormone receptors, compounds, kinases, transcription factors (TFs), microRNAs (miRNAs), trans-acting small interfering RNAs (ta-siRNAs), and their regulatory targets, and "edges" between nodes, which quantitatively represent individual molecular reactions through which the products of one gene affect another. We have published the constructed global plant hormonal regulatory networks, which integrate information on hormone biosynthesis, functional proteins, non-coding RNAs, metabolic cascades, and key signal transduction events in Arabidopsis thaliana phytohormone pathways, at PlantGRN::hrGRN, an integrative plant hormonal gene discovery and transcriptional network analysis platform (http://plantgrn.noble.org). The PlantGRN web server also hosts integrative analysis tools and data resources for GRN inference and gene co-expression analysis.
X14 - Identification of protein complexes maintaining Oct4 expression in ES cells
Short Abstract: Octamer binding transcription factor-4 (Oct4) is a key factor maintaining pluripotency of embryonic stem (ES) cells. Its biological function made Oct4 a target of several RNAi screens aiming at elucidating mechanisms involved in the maintenance of pluripotency.

We have analyzed three genome-wide RNAi screens conducted in mouse and human ES cells with the aim of identifying genes affecting Oct4 expression. Despite their similar experimental setups, these screens showed little overlap between hit genes. We reasoned that incorporating protein interaction information would help to remove noise, improve the consistency between screens and aid the molecular interpretation of the data. We therefore mapped genes tested in the RNAi screens onto known mammalian protein complexes, assuming that genes participating in the same complex should cause similar phenotypes. To identify complexes enriched with high-scoring genes, we tested four set enrichment methods. We demonstrated that scoring the protein complexes considerably increases the consistency between screens and improves recovery of known pluripotency genes. Our analysis identified several complexes acting at the post-transcriptional level that indirectly affect the expression of Oct4. The regulation of pluripotency at the post-transcriptional level is largely unexplored, and our findings provide first insights into the machinery involved in this process.
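The abstract does not name the four set enrichment methods it compared. One standard choice, used here purely as an assumption, is a one-sided hypergeometric test: given all screened genes, the set of screen hits and the members of one complex, it asks how likely it is to see at least the observed number of hits in the complex by chance. Function names and the toy gene lists are hypothetical.

```python
from math import comb
from typing import Iterable


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N population, K successes, n draws)."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def complex_enrichment(complex_genes: Iterable[str], hit_genes: Iterable[str],
                       screened_genes: Iterable[str]) -> float:
    """One-sided p-value that a complex contains more screen hits than expected by chance."""
    screened = set(screened_genes)
    members = set(complex_genes) & screened  # only genes actually tested count
    hits = set(hit_genes) & screened
    overlap = len(members & hits)
    return hypergeom_sf(overlap, len(screened), len(hits), len(members))
```

With 10 screened genes, 3 hits and a 3-member complex containing all 3 hits, the p-value is 1/120; restricting to genes actually present in the screen keeps complexes with untested members from being penalized.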

Our analysis reveals that combining results of RNAi screens and incorporating external data help to extract more comprehensive information from the experiments. We present a catalogue of protein complexes potentially important for ES cell maintenance.
X15 - Integrative analysis of Metabolomics and Transcriptomics data: Filtering, clustering, visualization and Set Enrichment Analysis within the MarVis-Suite
Short Abstract: The combination of experimental data from different omics platforms and methods for data processing plays a central role in comprehensive systems biology analysis. The MarVis-Suite is interactive software that provides a set of tools for the integrative analysis of large data sets. These data sets may be obtained from different omics technologies, such as mass spectrometry (MS)-based fingerprinting, DNA microarray or RNA-Seq analysis. Via the MarVis-Filter interface, data sets containing multivariate intensity/expression profiles for a large number of marker candidates (e.g. molecular features in MS or mRNAs in microarray analysis) can be imported from comma-separated values files or Excel spreadsheets. After preprocessing, which includes adduct and isotope correction of MS data, the data sets may be filtered according to ANOVA or Kruskal-Wallis tests in combination with correction for multiple testing and heuristic methods, such as signal-to-noise ratio or fold change. The filtered data sets can then easily be combined and further analyzed. The MarVis-Cluster tool is used for clustering, visualization and interactive selection of marker candidates. Via the MarVis-Pathway interface, whole data sets or selected clusters can be annotated in the functional context of organism-specific pathways from the KEGG and BioCyc database collections. The functional annotation is based either on gene/compound ID matching or on comparison of exact molecular masses for compounds. All interfaces are embedded in a statistical framework that allows data sets from different omics platforms to be analyzed in a robust (gene/metabolite) Set Enrichment Analysis.
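The abstract does not name the multiple-testing correction applied after the ANOVA or Kruskal-Wallis filter. Benjamini-Hochberg FDR control is one common choice and is assumed here; the per-candidate p-values are taken as already computed. This is a minimal stand-alone sketch, not MarVis code, and the function and marker names are hypothetical.

```python
from typing import Dict, List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_markers(p_values: Dict[str, float], alpha: float) -> List[str]:
    """Names of marker candidates whose adjusted p-value is <= alpha."""
    names = list(p_values)
    adjusted = benjamini_hochberg([p_values[n] for n in names])
    return [n for n, q in zip(names, adjusted) if q <= alpha]
```

For raw p-values [0.01, 0.04, 0.03, 0.20] the adjusted values are approximately [0.04, 0.053, 0.053, 0.20], so at alpha = 0.05 only the first candidate passes. Heuristic filters such as fold change would be applied separately.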
X16 - Comprehensive analysis of microRNA-mediated regulation in colon cancer cell lines
Short Abstract: MicroRNAs (miRNAs) are key post-transcriptional regulators that suppress gene expression by inhibiting translation, promoting mRNA decay or both. Here we integrated global expression profiling data on miRNAs, mRNAs and proteins from nine colorectal cancer cell lines to perform comprehensive analyses of miRNA-mediated regulation. In contrast to recent studies suggesting a dominant role of mRNA decay, our results revealed that translational repression played a major role in 31% of the miRNA-target interactions and that another 25% of the interactions involved weak but concordant mRNA decay and translational repression, highlighting an equally important role of translational repression in miRNA-mediated regulation. After evaluating sequence features known to drive site efficacy in mRNA decay, including site type, site location, local AU-context and additional 3’ pairing, we found that these features were generally not applicable to translational repression, except for local AU-context. This may explain the underestimation of translational repression in previous studies. Interestingly, our results suggested that the preference for mRNA decay or translational repression may be entirely dictated by the miRNA itself, as evidenced by the strong mechanism preference inferred for several miRNAs and supported by further experimental analysis. Besides expanding our knowledge of miRNA-mediated gene repression, our study also provided interesting insights into colon cancer biology, such as the possible contributions of miR-138 and miR-141/miR-200c to inducing specific phenotypes of the SW480 and RKO cell lines, respectively.
X17 - Deciphering genomic alterations in colorectal cancer through subtype-specific driver networks
Short Abstract: High-throughput genomic studies have identified thousands of genomic alterations in colorectal cancer (CRC). Distinguishing driver from passenger mutations is critical for developing rational therapeutic strategies. Because only a few transcriptional subtypes exist in previously studied tumor types (e.g. breast and ovarian), we hypothesized that highly heterogeneous genomic alterations may converge on a limited number of distinct mechanisms that drive unique cancer biology in different transcriptional subtypes. In this study, we seek to define transcriptional subtypes for CRC and to identify subtype-specific driver mutations and networks. Consensus clustering of a patient cohort with 1173 samples identified three transcriptional subtypes, which were validated in an independent cohort with 485 samples. Survival analysis demonstrated that each subtype was associated with a statistically different prognosis. For each subtype, we mapped somatic mutation and copy number variation data onto an integrated signaling network and identified subtype-specific driver networks using a random walk-based strategy. For the subtype with the worst prognosis, we found that the driver network was enriched with genomic alterations in the Wnt and VEGF signaling pathways. Consistently, Wnt targets were significantly enriched in the transcriptional signature of this subtype, as were genes involved in biological processes regulated by these pathways, such as “cell migration” and “blood vessel morphogenesis”. Functional correlations between the inferred upstream driver networks and the downstream expression signatures were also observed for the other two subtypes. These results support the hypothesis stated above, and our work provides a general framework for identifying subtype-specific driver mutations and networks.
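The abstract only says "a random walk-based strategy". Random walk with restart is one widely used variant for propagating signals from altered genes over a network, and is assumed here: a walker starts at the altered (seed) genes, moves to a random neighbour with probability 1 - `restart`, and jumps back to the seeds otherwise, so genes with high steady-state probability lie close to many alterations. The adjacency format, the restart value and the names are hypothetical, and seeds are assumed to be nodes of the graph.

```python
from typing import Dict, List


def random_walk_with_restart(adjacency: Dict[str, List[str]], seeds: List[str],
                             restart: float = 0.3, tol: float = 1e-10,
                             max_iter: int = 1000) -> Dict[str, float]:
    """Steady-state visiting probabilities of a walker that restarts at the seed genes."""
    nodes = sorted(adjacency)
    seed_set = set(seeds)
    # Restart distribution: uniform over the seeds (e.g. mutated or CNV-altered genes).
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            moving = (1.0 - restart) * p[n]
            neighbours = adjacency[n]
            if neighbours:
                for m in neighbours:
                    nxt[m] += moving / len(neighbours)
            else:
                # Dangling node: send its mass back to the restart distribution.
                for m in nodes:
                    nxt[m] += moving * p0[m]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p
```

On a simple path a-b-c-d seeded at a, the scores decrease with distance from a and still sum to one; a driver network could then be read off from the highest-scoring genes and their connecting edges.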
X18 - NetGestalt: Data integration over hierarchically and modularly organized networks
Short Abstract: Advanced high-throughput omics technologies have led to an increasing gap between data generation and investigators’ ability to interpret the vast amount of interconnected data. To fill this gap, we have developed NetGestalt, a web application for the visualization and integrative analysis of multidimensional omics data within a network context. Because alterations at the DNA, RNA, and protein levels exert their effects primarily by changing the activity of proteins and the networks they participate in, protein interaction networks have become a powerful model for the visualization and integration of different types of molecular data. However, standard graph-based network visualization becomes inadequate as network size and data complexity increase. We address this challenge by exploiting the inherent hierarchical architecture of protein interaction networks. By using only the horizontal dimension of a webpage to lay out genes according to the hierarchical network architecture, NetGestalt allows users to simultaneously compare and correlate information from experimental data, network modules, and existing knowledge rendered as tracks along the vertical dimension of the webpage, similar to widely used genome browsers. However, because it is not constrained to genomic sequence-based coordinates, NetGestalt is able to reveal functional relationships between genes as encoded in the network. We employ an efficient software architecture to enable fast track rendering and smooth navigation between resolution scales, from individual genes to the whole network. The potential of NetGestalt is demonstrated using the recently published TCGA ovarian cancer data, with multiple types of genomic measurements on around 500 tumor samples and corresponding normal controls.
X19 - Algorithm to Identify Frequent Coupled Modules from Two-Layered Network Series: Application to Study Transcription and Splicing Coupling
Short Abstract: Current network analysis methods all focus on one or multiple networks of the same type. However, cells are organized by multi-layer networks, e.g. transcriptional regulatory networks, splicing regulatory networks, protein-protein interaction networks, which interact and influence each other. Elucidating the coupling mechanisms among those different types of networks is essential in understanding the functions and mechanisms of cellular activities. In this paper, we developed the first computational method for pattern mining across many two-layered graphs, with the two layers representing different types yet coupled biological networks. We formulated the problem of identifying frequent coupled clusters between the two layers of networks into a tensor-based computation problem, and proposed an efficient solution to solve the problem. We applied the method to 38 two-layered co-transcription and co-splicing networks, derived from 38 RNA-seq datasets. With the identified atlas of coupled transcription-splicing modules, we explored to what extent, for which cellular functions, and by what mechanisms transcription-splicing coupling takes place.
X20 - Reverse Engineering of Mycoplasma pneumoniae Gene Regulatory Network
Short Abstract: The available genomic, metabolomic and proteomic data for Mycoplasma pneumoniae offer a basis for understanding the minimal-cell concept and for characterizing the structure, function and dynamics of the organism. Despite the bacterium's tiny genome, its regulatory network is complex. We gain insight into its dynamic gene regulatory network through a reverse engineering approach, using genome-wide time-series measurements of mRNA and protein concentrations under different growth conditions in mathematical modeling.
Our approach consists of three steps: 1) clustering of genes/proteins with cMonkey, 2) static network inference with Inferelator, and 3) modeling of network dynamics. cMonkey is customized for our input types. The dynamical model is built from mathematical models and parameter optimization techniques that we developed, and time-series experimental data are used to construct the dynamic networks. The model functions describe transcription, translation, degradation, transcription factor binding and enzymatic conversions. The change of each component over time is represented by ordinary differential equations (ODEs), which are solved by our ODE model. The parameters of the mathematical models are estimated with two optimization techniques: simulated annealing and evolutionary algorithms.
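The ODE-plus-optimization step can be illustrated with a deliberately small sketch. It assumes a hypothetical two-variable model (one mRNA and one protein, with constant transcription and first-order translation and degradation), integrates it with forward Euler, and fits the four rate constants to synthetic time-series data by simulated annealing. The actual model also covers transcription factor binding and enzymatic conversions, and the evolutionary-algorithm variant is not shown.

```python
import math
import random


def simulate(params, t_end=10, dt=0.05):
    """Forward-Euler integration of a toy mRNA/protein model:
        dm/dt = k_tx - d_m * m        (transcription, mRNA degradation)
        dp/dt = k_tl * m - d_p * p    (translation, protein degradation)
    starting from m = p = 0. Returns (m, p) at t = 0, 1, ..., t_end."""
    k_tx, d_m, k_tl, d_p = params
    m = p = 0.0
    per_unit = int(round(1 / dt))
    samples = []
    for step in range(t_end * per_unit + 1):
        if step % per_unit == 0:
            samples.append((m, p))
        m, p = m + dt * (k_tx - d_m * m), p + dt * (k_tl * m - d_p * p)
    return samples


def sse(params, data):
    """Sum of squared errors between simulated and measured time courses."""
    return sum((ms - md) * (ms - md) + (ps - pd) * (ps - pd)
               for (ms, ps), (md, pd) in zip(simulate(params), data))


def anneal(data, start, n_iter=2000, temp0=1.0, step=0.1, seed=0):
    """Simulated annealing over positive rate constants; returns the best fit."""
    rng = random.Random(seed)
    current, cur_loss = list(start), sse(start, data)
    best, best_loss = list(current), cur_loss
    for k in range(n_iter):
        temp = temp0 * (1 - k / n_iter) + 1e-9  # linear cooling schedule
        # Multiplicative log-normal moves keep every rate constant positive.
        cand = [x * math.exp(rng.gauss(0, step)) for x in current]
        cand_loss = sse(cand, data)
        if cand_loss < cur_loss or rng.random() < math.exp((cur_loss - cand_loss) / temp):
            current, cur_loss = cand, cand_loss
            if cur_loss < best_loss:
                best, best_loss = list(current), cur_loss
    return best, best_loss


# Synthetic "measurements" generated from hypothetical known rates, then re-fitted.
true_params = (2.0, 0.5, 1.0, 0.3)
data = simulate(true_params)
start = (1.0, 1.0, 1.0, 1.0)
fitted, fitted_loss = anneal(data, start)
print(fitted, fitted_loss, sse(start, data))
```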
The method reveals both quantitative and qualitative properties of the network. Knowledge of the topology and function of the M. pneumoniae network will be helpful for further studies on drug design to treat and prevent pneumonia.
X21 - Reconstructing targetable pathways in lung cancer
Short Abstract: Signaling networks are frequently perturbed in cancer cells, and their aberrant activity leads to cancer initiation and progression. Although oncogenic pathways have been extensively characterized, in many cases, such as the KRAS oncogenic pathway, the specific network of effector proteins that drives carcinogenesis in a particular tissue is still far from understood. In this work, we studied the KRAS signaling pathway in KRAS-dependent non-small cell lung cancer (NSCLC) cell lines by integrating transcriptome, proteome and phosphoproteome data. For each of 12 cell lines, gene expression was quantified using RNA sequencing (RNA-seq), and protein abundance and protein phosphorylation using label-free quantitative tandem mass spectrometry (LC-MS/MS). To reconstruct active and targetable networks associated with KRAS dependency, we formulated the network reconstruction task as a Prize-Collecting Steiner Tree (PCST) problem, which allowed us to combine transcriptome, proteome and phosphoproteome signatures with a human protein interaction network derived from public repositories. This network reconstruction formulation has several advantages over traditional pathway enrichment methods: it uses the topology of the network; it can find hidden modules or nodes that are relevant to the network but were not directly measured; and, unlike network inference methods, it does not require very large data sets to reconstruct the network. Using this strategy, we propose a druggable pathway (MET, LCK, PAK1) that is active in the KRAS-dependent but not in the KRAS-independent phenotype, thereby defining new potential drug targets for treating KRAS-dependent lung cancer.
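For readers unfamiliar with PCST, the sketch below solves a toy instance exactly by brute force, using a standard formulation: minimize the total cost of the tree's edges plus the prizes of the nodes left out. For any node subset, the cheapest tree spanning it is the minimum spanning tree of the induced subgraph, so enumerating subsets yields the optimum; this only scales to a handful of nodes, and real analyses rely on dedicated PCST solvers. The prizes and edge costs are invented, and the prize-free "HUB" node is a hypothetical stand-in for an unmeasured intermediate.

```python
from itertools import combinations


def mst_cost(nodes, edges):
    """Kruskal's algorithm on the subgraph induced by `nodes`.
    Returns the spanning-tree cost, or None if that subgraph is disconnected."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    cost, used = 0.0, 0
    induced = sorted((w, u, v) for (u, v), w in edges.items() if u in parent and v in parent)
    for w, u, v in induced:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            cost += w
            used += 1
    return cost if used == len(parent) - 1 else None


def pcst_bruteforce(prizes, edges):
    """Exact PCST for tiny graphs: minimise (edge cost of the tree) +
    (prizes of the nodes left out), trying every node subset."""
    total_prize = sum(prizes.values())
    best_set, best_value = frozenset(), total_prize  # empty tree forfeits all prizes
    for r in range(1, len(prizes) + 1):
        for subset in combinations(prizes, r):
            cost = mst_cost(subset, edges)
            if cost is None:
                continue
            value = cost + total_prize - sum(prizes[v] for v in subset)
            if value < best_value:
                best_set, best_value = frozenset(subset), value
    return best_set, best_value


# Hypothetical toy network: HUB carries no prize but connects measured proteins.
prizes = {"KRAS": 5.0, "MET": 4.0, "LCK": 3.0, "HUB": 0.0, "NOISE": 0.2}
edges = {("KRAS", "HUB"): 1.0, ("HUB", "MET"): 1.0,
         ("HUB", "LCK"): 1.0, ("KRAS", "NOISE"): 2.0}
tree_nodes, value = pcst_bruteforce(prizes, edges)
print(sorted(tree_nodes), round(value, 3))  # ['HUB', 'KRAS', 'LCK', 'MET'] 3.2
```

The unmeasured HUB enters the solution because it is the cheapest way to connect the high-prize nodes, which is the "hidden node" behavior the abstract refers to.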
X22 - Spindle organization is regulated nonlinearly
Short Abstract: The spindle apparatus is a crucial cellular component responsible for chromosome segregation during cell division. Its construction is a complex process with several checkpoints that prevent failures, which may lead to cancer. Spindle organization is thought to involve signaling cascades and feedback loops that control this process. Whether it relies on nonlinear regulation of transcripts to maximize its efficiency is unknown.

Systems biologists usually construct gene networks to model a biological process based on known interactions between these genes. While linear gene interactions can be detected robustly, it is difficult to reliably detect nonlinear dependencies between genes because nonlinearity and noise must be distinguished from each other.

We propose a new generative model to detect nonlinear interactions between genes. The model extends linear Gaussian factor analysis, in which the expression values of genes belonging to a pathway are driven mainly by a single latent factor; in our model, genes can also be driven nonlinearly by this hidden factor. To avoid interpreting noise as nonlinearity, we compute p-values that measure the probability that a linear gene is detected as nonlinear by chance.

Using this algorithm, we detected nonlinear interactions among genes involved in spindle organization in microarray gene expression data. We found that TTK and ZWINT may have a strong nonlinear dependency on the underlying factor driving spindle organization.
X23 - Integrating Many Co-Splicing Networks to Reconstruct Splicing Regulatory Modules
Short Abstract: Alternative splicing is a ubiquitous gene regulatory mechanism that dramatically increases the complexity of the proteome. However, the regulation of alternative splicing is poorly understood, and studies of coordinated splicing regulation have been limited to individual cases. To study genome-wide splicing regulation, we integrate many human RNA-seq datasets from the Sequence Read Archive to identify splicing modules, which we define as sets of cassette exons co-regulated by the same splicing factors. We have designed a tensor-based approach to identify co-splicing clusters that frequently appear across multiple conditions and are therefore likely to represent splicing modules, the units of the splicing regulatory network. Specifically, we model each RNA-seq dataset as a co-splicing network in which nodes represent exons and edges are weighted by the correlations between exon inclusion rate profiles. We apply our tensor-based method to 38 co-splicing networks derived from human RNA-seq datasets and identify an atlas of frequent co-splicing clusters. Validation against four biological knowledge databases shows that these clusters represent potential splicing modules. The likelihood that a frequent co-splicing cluster is biologically meaningful increases with its recurrence across datasets, highlighting the importance of the integrative approach. Co-splicing clusters reveal novel functional groups that cannot be identified by co-expression clusters; in particular, they offer new insights into functions associated with post-transcriptional regulation and show that the same exons can participate dynamically in different pathways depending on the condition and on the other exons with which they are co-spliced.
X24 - Structural Properties of Proteins and their Relationships to Fold Age
Short Abstract: The evolution of proteins, the basic units of biological function, is the single process that has produced the diversity and complexity of life we see around us today. While the evolution of species depends on continued and successful reproduction, protein evolution faces selection pressure purely from improved, diversified or even unaltered function. The sequence-structure-function paradigm is highly relevant to any discussion of protein evolution. While genetic changes, the vehicle of evolution, occur in a protein's amino acid sequence, selection acts on these changes at the functional level. The 3-D structure of a protein is intimately connected both to its sequence, through the physical and chemical properties of amino acids, and to its function, by determining the interactions a protein can maintain. As such, structure is arguably an appropriate unit for studying the evolution of proteins.

Existing work in the field has already shown that certain relationships exist between properties of structural domains and their fold age. Here we examine a variety of definitions of the "age" of a protein fold and comprehensively explore how a portfolio of domain characteristics relates to the age of the fold. In doing so, we aim to contribute to a coarse-grained understanding of the forces shaping the evolutionary landscape of protein structures.
X25 - ENViz: a Cytoscape Plugin for Integrative Statistical Analysis and Visualization of Multiple Sample Matched Data Sets
Short Abstract: Modern genomic, metabolomic and proteomic assays produce high-quality multiplexed measurements that characterize the molecular composition and activity of biological samples, often from complementary angles. Integrative analysis of such measurements remains a challenge for researchers in the field. Here we present ENViz, a Cytoscape plugin that implements the enrichment network approach to the joint analysis of two types of sample-matched datasets and available systematic annotations. ENViz analyses a primary dataset (e.g. gene expression) with respect to a pivot dataset (e.g. miRNA expression, metabolomics or proteomics measurements) and a primary data annotation (e.g. pathway or gene ontology) as follows. For each pivot entry, we rank the elements of the primary data by their correlation to the pivot across all samples, and compute the statistical enrichment of annotation elements at the top of this ranked list using the minimum hypergeometric (mHG) statistic (Eden et al., 2007). Significant results are represented as an enrichment network: a bipartite graph whose nodes correspond to pivot and annotation entries and whose edges correspond to pivot-annotation pairs with enrichment scores above a user-defined threshold. For significant pivot-annotation pairs, correlations between primary and pivot data can be visually overlaid on biological pathways using WikiPathways (Kelder et al., 2012). Edges of the enrichment network may point to functionally relevant mechanisms. In Enerly et al. (2010), an association between miR-19a and a cell-cycle module was interpreted as an association with proliferation and validated using high-throughput transfection assays, in which transfection of miR-19a into MCF7 cells resulted in increased proliferation.
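The minimum hypergeometric (mHG) step can be sketched as follows, assuming the primary elements have already been ranked by correlation to one pivot and flagged by membership in one annotation term. For each cutoff n, the code computes the hypergeometric tail probability of seeing at least the observed number of annotated elements in the top n, and keeps the minimum. The mHG score is not itself a calibrated p-value; correcting for the choice among cutoffs (as Eden et al., 2007 describe) is a further step not shown here. Gene names and correlations are hypothetical.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(population N, B marked, n drawn)."""
    hits = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, min(n, B) + 1))
    return hits / comb(N, n)


def min_hypergeometric(ranked_labels):
    """ranked_labels: 0/1 flags (1 = annotated) ordered from most to least
    correlated with the pivot. Returns (mHG score, best cutoff)."""
    N, B = len(ranked_labels), sum(ranked_labels)
    best_p, best_n, b = 1.0, 0, 0
    for n in range(1, N):  # the cutoff n = N always gives p = 1
        b += ranked_labels[n - 1]
        p = hypergeom_tail(N, B, n, b)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n


# Hypothetical pivot: rank genes by correlation, flag members of one pathway.
correlations = {"g1": 0.9, "g2": 0.8, "g3": 0.7, "g4": 0.2,
                "g5": 0.1, "g6": -0.1, "g7": -0.3, "g8": -0.5}
pathway = {"g1", "g2", "g3"}
ranked = sorted(correlations, key=correlations.get, reverse=True)
labels = [1 if g in pathway else 0 for g in ranked]
score, cutoff = min_hypergeometric(labels)
print(cutoff, score)  # cutoff 3; score equals 1/56
```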
X26 - Network Biology and Machine Learning Approaches to Metastasis and Treatment Response
Short Abstract: Metastasis causes 90% of cancer deaths, and is clearly a major clinical concern. Epithelial-mesenchymal transition (EMT) is a cellular programme implicated in normal development and the invasion-metastasis cascade. We have taken a network biology approach to EMT proteins in human breast and ovarian cancers using tissue microarray (TMA) quantitative immunofluorescence.

We have developed a toolkit specifically focused on TMAs, including methods for normalisation, batch correction, clustering, estimation of survival functions and inference of the small-scale networks typical of these data. EMT protein networks were produced using a combination of Spearman's correlation and kernel density estimated mutual information, with per-protein adaptive bandwidth estimation. As a result, the networks include both signed (monotonic) and non-monotonic edges. For example, a nine-node network calculated over 128 node-positive breast cancer patients had twenty-one non-monotonic relationships among twenty-eight significant edges.
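Why both measures are needed can be shown with a short sketch: a U-shaped relationship has a Spearman correlation of zero yet clearly carries information. The code computes Spearman's rho as the Pearson correlation of tie-averaged ranks. For mutual information it substitutes a simple equal-width histogram estimate for the kernel density estimator with adaptive bandwidths described above, so its values are only indicative. The data are a toy example.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def binned_mutual_information(x, y, bins=4):
    """Plug-in mutual information (nats) from equal-width histograms.
    A crude stand-in for a kernel density estimate."""
    def digitize(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0
        return [min(int((a - lo) / width), bins - 1) for a in v]

    bx, by = digitize(x), digitize(y)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]  # U-shaped, i.e. non-monotonic
print(spearman(x, y), binned_mutual_information(x, y))
```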

Building on this, we developed an algorithm (TamSVM) to risk-stratify tamoxifen-treated estrogen receptor-positive, node-negative breast cancer patients for treatment selection. These patients commonly receive tamoxifen and chemotherapy. However, retrospective analysis indicates 80% of patients derive little benefit from chemotherapy. TamSVM predicts survival under tamoxifen alone to assist clinical decision-making. Candidate features were seven proteins relevant to breast cancer and EMT, plus clinical parameters (age, tumour grade and size) for >1200 breast cancer patients. A support vector machine (SVM) was developed with wrapper feature selection to predict 10-year distant metastasis-free survival. The classifier gave an area under ROC curve (AUC) of 0.8 on an independent blind test cohort and compares well with alternative approaches.
X27 - Discriminative modeling of protein networks based on ensemble machine learning
Short Abstract: Understanding differences between pathological and normal phenotypes can be facilitated by discovering networks of proteins that behave differently in different phenotypes. To aid this task, we developed a method based on ensemble machine learning that integrates experimental data with existing biological knowledge.
Specifically, we focused on ensembles of linear support vector machine (SVM) models. Each model is trained using experimental data combined with knowledge in the form of a collection of pathways obtained from the Pathway Commons database. Based on shortest paths in the pathways, we derived connections between the proteins for which proteomic measurements are available in the analyzed dataset. Training of each model within the ensemble starts with a single seed protein and expands the network by adding neighboring proteins that together best discriminate between the phenotypes. The ensemble is then formed by combining models trained with different seed proteins.
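The seed-and-expand procedure can be sketched as follows. To stay dependency-free, the sketch replaces the linear SVM with a hypothetical stand-in score (the standardized difference between groups of the averaged protein levels) and uses a nearest-group-mean vote for prediction; the network, protein names and measurements are invented. Only the structure, greedy expansion from each seed along network edges followed by an ensemble over seeds, mirrors the description above.

```python
from statistics import mean, stdev


def separation(data, labels, proteins):
    """Standardised group difference of the averaged protein levels;
    a simple stand-in for a linear SVM's discriminative power."""
    feature = [mean(s[p] for p in proteins) for s in data]
    g1 = [f for f, y in zip(feature, labels) if y == 1]
    g0 = [f for f, y in zip(feature, labels) if y == 0]
    pooled = ((stdev(g1) ** 2 + stdev(g0) ** 2) / 2) ** 0.5 or 1e-9
    return abs(mean(g1) - mean(g0)) / pooled


def grow_from_seed(seed, graph, data, labels, max_size=4):
    """Greedily add the network neighbour that most improves separation."""
    module = {seed}
    score = separation(data, labels, module)
    while len(module) < max_size:
        frontier = sorted({n for p in module for n in graph[p]} - module)
        if not frontier:
            break
        best = max(frontier, key=lambda n: separation(data, labels, module | {n}))
        best_score = separation(data, labels, module | {best})
        if best_score <= score:
            break
        module, score = module | {best}, best_score
    return module


def predict(modules, data, labels, sample):
    """Majority vote: each module assigns the sample to the nearer group mean."""
    votes = 0
    for module in modules:
        f = lambda s: mean(s[p] for p in module)
        m1 = mean(f(s) for s, y in zip(data, labels) if y == 1)
        m0 = mean(f(s) for s, y in zip(data, labels) if y == 0)
        votes += 1 if abs(f(sample) - m1) < abs(f(sample) - m0) else -1
    return 1 if votes > 0 else 0


# Hypothetical network and measurements (4 stressed, 4 control animals).
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
labels = [1, 1, 1, 1, 0, 0, 0, 0]
rows = [(5.0, 4.8, 1.0, 2.0), (5.2, 5.1, 1.4, 1.5), (4.9, 5.3, 0.9, 2.2), (5.1, 4.9, 1.2, 1.8),
        (3.0, 3.1, 1.1, 2.1), (3.2, 2.9, 1.3, 1.7), (2.9, 3.0, 1.0, 1.9), (3.1, 3.2, 1.2, 2.0)]
data = [dict(zip("ABCD", r)) for r in rows]
modules = [grow_from_seed(s, graph, data, labels) for s in ("A", "C")]
print(modules, predict(modules, data, labels, {"A": 5.0, "B": 5.0, "C": 1.1, "D": 1.9}))
```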
We tested the method using a proteomic dataset related to wound healing that includes levels of 175 proteins, each measured in 8 mice. Half of the animals were exposed to stress, which is known to delay wound healing; the other mice served as controls. In discriminating stressed from non-stressed mice on an independent test set, the ensemble was markedly more accurate (area under the ROC curve, AUC, of 0.9) than a single large SVM model trained without network information and without the ensemble approach (AUC of 0.7). The results show that ensemble methods that incorporate prior knowledge can markedly improve the accuracy of discriminative models used for protein network modeling.
X28 - A Supply Chain Landscape Model for Mechanisms Investigation and Drug Efficacy Prediction of Multiple Myeloma Drug Resistance
Short Abstract: Tumor signaling pathway networks are supply chain networks. Fast tumor growth requires strong oncogenic signals delivered via cellular signaling pathways at a low cost in cellular resources. Anti-tumor medicines block the transduction of growth signals and induce apoptotic cues, so that the demand for tumor growth signals is no longer met. The development of drug resistance is a typical re-routing of the signaling supply chain: bypass pathways are activated and death pathways shut down to restore the supply of tumor growth cues. We therefore 1) defined the tumor signaling network and the drug resistance process in terms of supply chain management, re-routing and optimization problems, 2) established a landscape-based signal supply network from gene expression and dynamic functional protein data for model parameterization, and 3) applied supply chain network analysis approaches to illustrate the mathematical principles underlying tumor cell responses to drug treatments.

Our supply chain landscape suggested a pivotal role for autophagy in ATO-induced myeloma cell death and revealed the importance of lipid metabolism for drug resistance. The re-routing analysis demonstrated possible remodeling strategies that tumor cells may use in response to drug combinations, and predicted candidate drug targets to re-sensitize drug-resistant cells. Our analysis also emphasized that cell attachment contributes to the post-treatment survival of blood cells, which we have validated experimentally.

In summary, our work provides a proof of concept for the supply chain landscape and establishes fundamental concepts and approaches for applying the latest supply chain models and methods to a wide range of signaling network problems.
X29 - PARADIGM 2.0: An extended pathway inference engine incorporating predicted interactions
Short Abstract: We have extended the PARADIGM model, which infers protein activity from genomic copy number and mRNA expression in a pathway context. The new additions include predicted functional interactions superimposed on existing curated pathways and the use of methylation data to infer epigenetic state.
X30 - HotLink: Identifying causal paths linking genomic perturbations to expression states in cancer.
Short Abstract: Samples from the same cohort are characterized by any number of genomic perturbations involving gene mutations, focal copy number gains and losses, and distinct promoter methylation events. One goal of cancer genomics is to connect these observed and imposed perturbations to the molecular changes that occur in cancer cells. Identifying genetic pathways activated in response to perturbations will lead to a mechanistic understanding of drug response and disease progression.

We have developed a method based on a heat-diffusion kernel approach that connects genomic perturbations to gene expression changes. The method computes a subnetwork solution that interconnects protein level data to gene expression level data using protein-protein interactions, predicted transcription factor to target connections, and curated interactions from literature. Permutation-based analysis is then used to gauge the significance of the solutions resulting from the HotLink network.
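A minimal sketch of heat diffusion on an interaction network, assuming an unweighted, undirected graph: heat placed on perturbed genes spreads according to h(t) = exp(-tL) h0, where L is the graph Laplacian, computed here with a truncated Taylor series that is adequate only for small graphs and moderate t. The permutation helper compares the heat reaching a target with that from randomly chosen source genes. The diffusion time, node names and toy path are assumptions; HotLink's actual kernel, edge types and test statistic may differ. On a four-node toy graph no result can be significant, so the p-value only illustrates the mechanics.

```python
import random


def laplacian(nodes, edges):
    """Graph Laplacian L = D - A of an undirected, unweighted graph."""
    idx = {v: i for i, v in enumerate(nodes)}
    L = [[0.0] * len(nodes) for _ in nodes]
    for u, v in edges:
        i, j = idx[u], idx[v]
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def heat_diffuse(L, h0, t=0.5, terms=30):
    """h(t) = exp(-t L) h0, via the truncated series sum_k (-t L)^k h0 / k!."""
    n = len(h0)
    term, result = list(h0), list(h0)
    for k in range(1, terms):
        term = [-t / k * sum(L[i][j] * term[j] for j in range(n)) for i in range(n)]
        result = [r + s for r, s in zip(result, term)]
    return result


def permutation_pvalue(L, nodes, sources, target, n_perm=200, seed=0):
    """Fraction of random source sets that put at least as much heat on
    `target` as the observed perturbed genes (with a +1 correction)."""
    rng = random.Random(seed)

    def heat_at_target(src):
        h0 = [1.0 if v in src else 0.0 for v in nodes]
        return heat_diffuse(L, h0)[nodes.index(target)]

    observed = heat_at_target(set(sources))
    candidates = [v for v in nodes if v != target]
    hits = sum(heat_at_target(set(rng.sample(candidates, len(sources)))) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


# Hypothetical path: a mutated gene signals through two proteins to a TF.
nodes = ["MUT", "P1", "P2", "TF"]
L = laplacian(nodes, [("MUT", "P1"), ("P1", "P2"), ("P2", "TF")])
heat = heat_diffuse(L, [1.0, 0.0, 0.0, 0.0])
print([round(h, 3) for h in heat], permutation_pvalue(L, nodes, ["MUT"], "TF"))
```

Because every row of L sums to zero, diffusion conserves the total amount of heat; it only redistributes it along network edges.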

We have applied our method to four Cancer Genome Atlas (TCGA) datasets, including glioblastoma multiforme, ovarian cystadenocarcinoma, colorectal and breast cancer, and found that the method identifies the expected major pathways in these tumor types. In the TCGA breast cancer dataset, our method identified a key signaling pathway through beta-catenin that explains MYC activity in basal tumors, as well as additional signaling pathways involved in the basal tumor phenotype. In each case, these pathways contain genes that lack any genomic perturbation data and can only be identified with a pathway-based approach. In addition to uncovering these key genes, our results provide a mechanistic explanation of tumor behavior that may suggest subtype-specific drug targets.
X31 - A Systems Biology Approach to Discover the Components of Perturbed Signaling Pathways in Huntington’s Disease by Integrating Genetic and Proteomic Data
Short Abstract: This poster is based on Proceedings Submission:

Huntington’s disease (HD) is a dominant neurodegenerative disorder caused by the expansion of a polyglutamine tract in the N-terminus of the huntingtin protein. Although many genes have been shown to increase or decrease the toxicity of mutant huntingtin or polyglutamine proteins in model systems, the exact mechanisms by which these genes alter the toxicity of mutant huntingtin are unclear. We hypothesized that many of these genes are linked to changes in phosphoregulation through perturbed signaling pathways in HD.

To discover these perturbed signaling pathways in HD, we applied a combined computational and experimental approach. We first compiled genes that influence the toxicity of mutant huntingtin in yeast, Drosophila and C. elegans. Next, using mass spectrometry, we identified differentially phosphorylated proteins between striatal cells from knock-in mice expressing either wild-type or polyglutamine-expanded huntingtin. We then linked the genetic screening data to the differentially phosphorylated proteins by adapting a computational approach based on the prize-collecting Steiner tree (PCST) algorithm, which finds an optimal network that connects experimentally detected proteins, through undetected ones, via known protein-protein interactions.

Using PCST, we connected the mouse orthologs of the genes from the genetic screens to the differentially phosphorylated proteins in HD. Gene ontology (GO) enrichment analyses of the resulting network identify clusters of proteins enriched for HD-related functions. Furthermore, we ranked network nodes by their importance in the network; higher-ranked nodes are potential components of perturbed signaling pathways in HD.
X32 - Dynamic Network Analysis Reveals Stage-specific Changes in Early Zebrafish Embryo Development
Short Abstract: Molecular networks act as the backbone of molecular activities within cells, providing a unique framework for analyzing other omics data, such as whole-transcriptome RNA sequencing data. However, a static network cannot explain how the molecules in the network are dynamically regulated in space and time. To address this challenge, we propose a novel network-based computational framework that investigates spatiotemporal changes of a biological system at the network level by integrating time-series mRNA sequencing data, a molecular network and the gene ontology (GO). Specifically, differentially expressed genes (DEGs) were first identified with a negative binomial model, and a network among these DEGs was extracted from the static molecular network. A meta-flow network of the GO biological process categories enriched in the DEGs was then constructed. This GO-level network is statistically more reliable than the individual gene network and gives a clearer picture of the temporal patterns across stages of the biological system. We applied the method to investigate changes induced by 1α,25-Dihydroxyvitamin D3 during early zebrafish embryo development. We observed a temporal propagation of 1α,25-Dihydroxyvitamin D3-altered transcriptional changes from a few genes altered at earlier stages to large groups of biologically coherent genes at later stages. The most notable biological processes included cartilage development, one-carbon compound metabolic process, retinal development and generalized stress response. These results demonstrate that our approach can identify both well-studied and unrecognized mechanisms by which 1α,25-Dihydroxyvitamin D3 functions in vivo in developing eukaryotes.
X33 - Integrating sparse data and "omics" networks using Bayes
Short Abstract: We address the challenge of prioritizing genes that may be common members of a functional module associated with a specific phenotype or condition. Two challenges frequently arise with this objective: sparse experimental measurements for certain genes, and making use of all public or otherwise available data that can supplement the experimental data.

A Bayesian approach allows us to address both challenges. Missing measurements for some genes can be accommodated through the use of priors, and information from outside the experiment can be incorporated into these priors.

Our approach differs from other Bayesian approaches by focusing on network biology. Given a set of genes of known association (seed genes), we take a network model of gene interaction (e.g. a regulatory network or protein interaction network) and calculate a score reflecting the topological proximity between each gene in the network and the seed genes. Such network connectivity scores measure functional association with the seed genes. Moreover, because these scores are probabilities, they can be used to create prior and likelihood distributions in a Bayesian framework. Any number of interaction networks can be integrated into the prior. Once the posterior probability distribution is calculated, Bayesian decision theory is used to select a module of genes associated with the condition.
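The Bayesian prioritization scheme can be sketched for a single network. The proximity prior here is a hypothetical choice (a linear function of the fraction of a gene's neighbors that are seeds), the likelihood values are invented, and the decision rule assumes simple false-positive and false-negative costs. The sketch shows two points made above: an unmeasured gene simply keeps its network-derived prior, and the final module is chosen by minimizing expected loss.

```python
def network_prior(graph, seeds, base=0.05):
    """Hypothetical proximity prior: P(in module) rises linearly with the
    fraction of a gene's neighbors that are seed genes."""
    prior = {}
    for gene, neighbors in graph.items():
        if gene in seeds:
            prior[gene] = 1 - base
        else:
            frac = sum(n in seeds for n in neighbors) / len(neighbors) if neighbors else 0.0
            prior[gene] = base + (1 - 2 * base) * frac
    return prior


def posterior(prior, lik_in, lik_out):
    """Bayes' rule per gene. lik_in[g] = P(data | g in module),
    lik_out[g] = P(data | g not in module). Unmeasured genes keep their prior."""
    post = {}
    for gene, p in prior.items():
        if gene in lik_in:
            num = p * lik_in[gene]
            post[gene] = num / (num + (1 - p) * lik_out[gene])
        else:
            post[gene] = p
    return post


def select_module(post, cost_fp=1.0, cost_fn=1.0):
    """Bayes decision rule: include a gene when the expected loss of including
    it, (1 - p) * cost_fp, is smaller than that of excluding it, p * cost_fn."""
    return {g for g, p in post.items() if (1 - p) * cost_fp < p * cost_fn}


# Hypothetical network with two seed genes.
graph = {"S1": ["G1", "G2"], "S2": ["G1"], "G1": ["S1", "S2"],
         "G2": ["S1", "G3"], "G3": ["G2"]}
prior = network_prior(graph, seeds={"S1", "S2"})
# Only G2 and G3 were measured; G1 has no data and relies on its prior.
post = posterior(prior, lik_in={"G2": 0.8, "G3": 0.9}, lik_out={"G2": 0.2, "G3": 0.1})
print(sorted(select_module(post)))  # ['G1', 'G2', 'S1', 'S2']
```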

We are currently using simulation to test how robust this approach is to stochasticity, to the size of the functional module being studied, and to the degree of sparseness in the data.
X34 - Coral: an integrated suite of visualizations for comparing clusterings of biological data
Short Abstract: Clustering is a standard analysis for many types of biological data (e.g. interaction networks, gene expression, metagenomic samples). However, it is typically possible to obtain a large number of contradictory clusterings, each capturing a different aspect of the data, by varying the clustering algorithm or its parameters, or by considering near-optimal clusterings. This creates uncertainty about which clusters represent real features and have biological meaning.

We present Coral, an application for the interactive exploration and comparison of ensembles of disagreeing clusterings. Coral provides several coordinated visualizations comparing clusterings at various levels of detail. Visualizations include a ladder component that shows clustering comparison metrics (such as Jaccard similarity, variation of information, F-measure, and others), providing a high-level comparison between entire clusterings. A co-clustering matrix shows how often each data item co-clustered with other items and helps identify core groups of frequently co-clustered items. A parallel partitions plot allows users to trace the cluster membership of specific sets of items across all clusterings, naturally highlighting clusters that have split or merged. Using these and other visualizations in Coral, researchers can assess the diversity of clusterings and gain insight into why they are (dis-)similar.
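Two of the comparisons that Coral visualizes can be computed in a few lines. The sketch below builds the co-clustering matrix (the fraction of clusterings in which each pair of items shares a cluster) and a pair-counting Jaccard similarity between two clusterings. Coral's exact metric definitions may differ, and the three toy clusterings are invented.

```python
from itertools import combinations


def co_clustering(clusterings):
    """clusterings: list of {item: cluster_id}. Returns, for every item pair,
    the fraction of clusterings that place both items in the same cluster."""
    items = sorted(clusterings[0])
    return {(a, b): sum(c[a] == c[b] for c in clusterings) / len(clusterings)
            for a, b in combinations(items, 2)}


def pair_jaccard(c1, c2):
    """Pair-counting Jaccard similarity between two clusterings:
    co-clustered pairs shared by both / co-clustered pairs in either."""
    both = either = 0
    for a, b in combinations(sorted(c1), 2):
        same1, same2 = c1[a] == c1[b], c2[a] == c2[b]
        both += same1 and same2
        either += same1 or same2
    return both / either if either else 1.0


# Three hypothetical clusterings of four proteins.
runs = [
    {"p1": 0, "p2": 0, "p3": 1, "p4": 1},
    {"p1": 0, "p2": 0, "p3": 0, "p4": 1},
    {"p1": 0, "p2": 1, "p3": 1, "p4": 1},
]
matrix = co_clustering(runs)
print(matrix[("p1", "p2")], pair_jaccard(runs[0], runs[1]))  # 2/3 of runs; Jaccard 0.25
```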

As a case study, we compare clusterings of a protein interaction network of Arabidopsis thaliana. We find that the clusterings vary significantly and that few proteins are consistently co-clustered in all clusterings. We suggest that several clusterings should typically be considered, and Coral can be used to perform a comprehensive analysis of these clustering ensembles.
X35 - Cross-species alignment of gene-gene coexpression networks for metabolic syndrome related datasets
Short Abstract: BACKGROUND: Animal models have been useful for improving our knowledge of molecular interactions underlying human diseases. However, often animal models fail to mimic human disease adequately. One way of validating the similarity of a model organism to its human counterpart is to integrate gene expression profiles from different studies and then identify conserved co-expression subnetworks across species. Nonetheless, gene-gene coexpression networks are significantly denser than protein-protein interaction networks and require the alignment of millions of weighted edges.

RESULTS: We developed an R package, geoDB, consisting of a MySQL database schema that provides a homogeneous framework for the systematic collection, storage and retrieval of gene expression data from heterogeneous platforms stored in GEO. Coexpression networks are calculated for datasets stored via geoDB and aligned with the network alignment tool Natalie using a suitably designed scoring function. The optimal alignment of two coexpression networks is based on optimizing an objective function that incorporates (i) similarity between nodes (genes) based on all-against-all BLAST and (ii) conservation of the degree of coexpression between pairs of genes. The NP-hardness of this optimization problem complicates the search for the best-scoring alignment; Natalie therefore uses a Lagrangian relaxation approach to obtain lower and upper bounds on the solution.
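To illustrate an objective of the kind described, the sketch scores a one-to-one mapping between two small coexpression networks as a weighted sum of a node term (sequence similarity of aligned genes) and an edge term (how closely coexpression weights agree on edges present in both networks). The exact form, the weight alpha and all values are hypothetical; this is not Natalie's scoring function. The exhaustive search over permutations makes the combinatorial difficulty obvious, since it grows factorially with network size, which is why methods such as Lagrangian relaxation are needed at realistic sizes.

```python
from itertools import permutations


def alignment_score(mapping, node_sim, w1, w2, alpha=0.5):
    """Hypothetical objective for aligning coexpression network 1 onto network 2.
    mapping:  {gene in net 1: gene in net 2}
    node_sim: {(gene1, gene2): sequence similarity in [0, 1]}
    w1, w2:   {frozenset({a, b}): coexpression weight in [0, 1]}
    Node term: summed similarity of aligned genes.
    Edge term: 1 - |w1 - w2| for every edge present in both networks."""
    node_term = sum(node_sim.get(pair, 0.0) for pair in mapping.items())
    edge_term = 0.0
    for edge, weight1 in w1.items():
        a, b = tuple(edge)
        if a in mapping and b in mapping:
            weight2 = w2.get(frozenset((mapping[a], mapping[b])))
            if weight2 is not None:
                edge_term += 1.0 - abs(weight1 - weight2)
    return alpha * node_term + (1 - alpha) * edge_term


def best_alignment(genes1, genes2, node_sim, w1, w2):
    """Exhaustive search over one-to-one mappings; feasible only for toy sizes."""
    candidates = (dict(zip(genes1, perm)) for perm in permutations(genes2, len(genes1)))
    return max(candidates, key=lambda m: alignment_score(m, node_sim, w1, w2))


# Invented human (h*) and mouse (m*) genes, similarities and coexpression weights.
node_sim = {("h1", "m1"): 1.0, ("h2", "m2"): 0.9, ("h3", "m3"): 0.8, ("h1", "m2"): 0.3}
w_human = {frozenset(("h1", "h2")): 0.9, frozenset(("h2", "h3")): 0.7}
w_mouse = {frozenset(("m1", "m2")): 0.8, frozenset(("m2", "m3")): 0.1}
best = best_alignment(["h1", "h2", "h3"], ["m1", "m2", "m3"], node_sim, w_human, w_mouse)
print(best, alignment_score(best, node_sim, w_human, w_mouse))
```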

CONCLUSION: The geoDB R package provides a platform for loading gene expression data from heterogeneous platforms into a local database. The subnetworks derived from the network alignment of human and mouse liver expression datasets indicate that the method scales well and produces meaningful conserved co-expressed clusters.
X36 - Integrating to Interact: Using Reconstructability Analysis to Integrate Genomic and Proteomic Data to Predict Drug Sensitivity
Short Abstract: The challenges of developing personalized cancer therapy are immense, given the diversity of data types and the size of the data needed for a representation comprehensive enough to tailor a therapy. Integration of multiple -omics types (proteomics, genomics, transcriptomics) is required to gain a unified understanding of how genetic alterations affect drug sensitivity. Discovering which gene states and protein levels are associated with drug sensitivity is difficult, given the high dimensionality and heterogeneity of these data types. We explore the impact of integrating proteomic, genomic and copy number variation data using Reconstructability Analysis (RA), as well as several other discrete data modeling methods, to predict drug sensitivities (GI50s) in a panel of breast cancer cell lines. RA is a discrete information-theoretic modeling framework, related to Bayesian networks, that can detect interactions between variables, such as protein levels and copy numbers, in predicting drug sensitivity. By evaluating nested models that include interactions between -omic variables, RA can highlight which variables and interactions are important for predicting drug sensitivity. We examine the impact of the discretization of continuous data, as well as the impact of adding each data type to the analysis. The results provide guidance on which proteins, transcripts, gene alterations and states to focus on when developing continuous models of drug response.
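The value of modeling interactions can be seen in a small information-theoretic sketch. It discretizes two continuous variables into two equal-frequency bins (one of several possible discretizations) and measures, in bits, how much each variable alone and the pair jointly reduce uncertainty about a sensitivity class. The data are invented to follow an exclusive-or pattern, in which neither variable is informative alone but the pair determines the outcome. This is not the RA software itself, which searches over nested models.

```python
import math
from collections import Counter


def equal_frequency_bins(values, k=2):
    """Discretise continuous values into k bins of (nearly) equal size by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def entropy(symbols):
    """Shannon entropy in bits of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())


def information(y, *xs):
    """I(Y; X1..Xm) = H(Y) - H(Y | X1..Xm) = H(Y) - [H(X, Y) - H(X)], in bits."""
    x_joint = list(zip(*xs))
    return entropy(y) - (entropy(list(zip(x_joint, y))) - entropy(x_joint))


# Invented measurements for 8 cell lines.
protein = [0.1, 0.2, 0.9, 0.8, 0.15, 0.25, 0.85, 0.95]
copy_number = [1.0, 3.1, 1.2, 2.9, 0.9, 3.3, 1.1, 3.0]
sensitive = [0, 1, 1, 0, 0, 1, 1, 0]  # hypothetical GI50-based class labels
p_bin, c_bin = equal_frequency_bins(protein), equal_frequency_bins(copy_number)
print(information(sensitive, p_bin), information(sensitive, c_bin),
      information(sensitive, p_bin, c_bin))  # 0.0 0.0 1.0
```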
X37 - Comparing Diverse Pluripotent Stem Cell Fate Programs Using Cell-Type-Specific Data Integration and Machine Learning Methods
Short Abstract: Machine learning techniques that apply Bayesian networks to genomic data integration were originally developed to predict functional linkages among proteins in yeast and have since been successfully applied to explore functional gene relationships in mouse and human systems. These predictive functional networks enable researchers to explore proteins likely to be related, based on data collected from scores of studies, under hundreds of conditions, using many different high-throughput experimental techniques. However, little work to date has considered the importance of cellular context for data integration in mammalian systems, in which the same protein may perform very different functions in different cell types. We developed a Bayesian network machine learning methodology designed to generate predictive functional networks from cell-type-specific, high-throughput mammalian data. For this study, we assembled separate compendia of mouse and human pluripotent stem cell data, which we integrated into cell-type-specific functional relationship networks focused on biological processes known to be active in those cell types: self-renewal and cell fate determination. We have compared the networks generated for different pluripotent stem cell types in different species, analyzed and compared their biological content, and selected top predictions for experimental validation. These pluripotent stem cell networks will be publicly available at StemSight.org, our portal for high-throughput stem cell data analysis. Our results demonstrate that Bayesian network integration of high-throughput data restricted to a single cell type can significantly enhance the predictive clarity of mammalian functional relationship networks.
X38 - Learning dynamic networks for combined miRNA and transcription factor regulation
Short Abstract: Understanding the regulation of gene expression is key to understanding the mechanisms that lead to disease. A common approach is to model the interdependencies of regulatory relationships in the form of a regulatory network (RN). In most cases RNs are learned as static snapshots, but we aim to construct dynamic RNs that capture regulatory relationships changing over time.
The Dynamic Regulatory Events Miner (DREM) [1] was previously developed to integrate static transcription factor (TF) binding data (ChIP-chip or ChIP-seq) with gene expression time series data to produce a regulatory map. In this work we show how to extend the DREM approach to model the down-regulating effect of miRNAs by including miRNA time series expression data. The new approach can predict joint TF-miRNA regulatory events. We demonstrate the effect of including miRNA down-regulation by applying the method to mouse time series data, shedding new light on the impact of miRNA regulation in the context of development.
References:
[1] Jason Ernst, Oded Vainas, Christopher T. Harbison, Itamar Simon, and Ziv Bar-Joseph. Reconstructing Dynamic Regulatory Maps. Molecular Systems Biology, 3:74, 2007.
TOP
X39 - Integrated Network Approaches to Understanding the Combinatorial Control of Expression by Variants of the Mediator Complex
Short Abstract: The Mediator complex is a large multi-subunit protein complex that is required for transcription of most protein-coding genes. Structurally, Mediator is composed of head, middle, tail and CDK modules. In yeast, the CDK module includes the subunits SRB8, SRB9, and SRB10. Each of these three subunits has two homologs in vertebrates: MED12 and MED12L for SRB8, MED13 and MED13L for SRB9, and CDK8 and CDK19 for SRB10. Preliminary work indicates that exactly one member of each of these three homolog pairs is included in a mammalian Mediator complex. Further work indicates that each of these six subunits appears in complexes with both homologs of the other two pairs, suggesting that these three pairs of homologs give rise to eight variants of Mediator. We use computational methods to provide evidence for which of these eight variants of Mediator occur in nature and to determine their functions.
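If each complex takes exactly one homolog from each of the three pairs, the number of possible variants is 2 × 2 × 2 = 8. A short Python sketch that enumerates them from the homolog pairs listed above (the dictionary layout and function name are illustrative only):

```python
from itertools import product

# Vertebrate homolog pairs of the three yeast CDK-module subunits.
HOMOLOG_PAIRS = {
    "SRB8": ("MED12", "MED12L"),
    "SRB9": ("MED13", "MED13L"),
    "SRB10": ("CDK8", "CDK19"),
}


def mediator_variants(pairs=HOMOLOG_PAIRS):
    """Every combination that takes exactly one homolog from each pair."""
    return [tuple(choice) for choice in product(*pairs.values())]


variants = mediator_variants()
for variant in variants:
    print(" + ".join(variant))
```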

We use statistical as well as graph clustering methods to find proteins that are either more strongly associated with one of the Mediator subunits than with its homolog, or more strongly associated with one of the eight purported variants of Mediator than with the others. By examining the functions of the proteins in these sets and determining which functions are significantly overrepresented, we hope to discover distinct functions for different variants of Mediator.
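Functional overrepresentation is commonly assessed with a one-sided hypergeometric test. The abstract does not name its test, so this is an assumption; the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n):
    N annotated background proteins, K of which carry the function,
    n proteins in the associated set, k of which carry the function."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 12 of 40 associated proteins carry a function seen in 150 of 6000.
print(hypergeom_sf(12, 6000, 150, 40))
```

With many functions tested per protein set, the resulting p-values would still need a multiple-testing correction.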
TOP
X40 - Integrated Analysis of new autism protein-protein interaction network
Short Abstract: Autism is a neurodevelopmental disorder involving many functionally heterogeneous genes. Here we investigate autism from a systems biology perspective. First, we performed large-scale discovery of brain-expressed alternatively spliced isoforms of 191 autism candidate genes. Then, we screened 420 discovered isoforms for interactions against the human ORFeome (~15,000 clones) using the yeast two-hybrid system. Finally, we built an autism-centered interactome network (ACIN) composed of 534 interactions between 81 genes and 293 binding partners.
We found that adding splice isoform-specific interactions expanded ACIN by 25%, and some of the novel interactions influenced network topological properties by connecting previously disconnected modules. The network analysis allowed the identification of binding partners (preys) shared between autism candidate genes, thereby implicating new gene targets in autism. The preys’ inter-connectivity is significantly higher than that of preys in the background human interactome (p<0.0001). In particular, 18 preys in ACIN bound significantly more autism candidate genes than expected from the background. Five of these preys are located within autism-associated de novo copy number variants (CNVs), a significant enrichment over the expectation (p=0.0077). The number of preys connecting 2, 3 or 4 different de novo CNVs was also significantly higher in ACIN than in the background network. In summary, the new autism interactome provides a more complete view of the connectivity among known autism candidate genes. Furthermore, novel interactions combine genes from functionally diverse pathways (synaptic transmission, neurogenesis, mTOR, hormone-related cluster) into a single connected component. The new autism interactome represents a valuable resource for the research community and for future autism studies.
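Finding preys shared between candidate genes amounts to inverting the bait-prey edge list. A minimal Python sketch, assuming isoform-level baits have already been collapsed to their genes; the gene and prey names are hypothetical placeholders, and the significance test against the background interactome is not shown.

```python
from collections import defaultdict


def shared_preys(interactions, min_baits=2):
    """Map each prey to the set of candidate genes (baits) it binds,
    keeping only preys bound by at least `min_baits` distinct baits."""
    baits_per_prey = defaultdict(set)
    for bait, prey in interactions:
        baits_per_prey[prey].add(bait)
    return {prey: baits for prey, baits in baits_per_prey.items()
            if len(baits) >= min_baits}


# Hypothetical bait-prey pairs from a yeast two-hybrid screen.
edges = [("GENE_A", "PREY_1"), ("GENE_B", "PREY_1"), ("GENE_C", "PREY_2"),
         ("GENE_A", "PREY_3"), ("GENE_C", "PREY_3"), ("GENE_B", "PREY_3")]
result = shared_preys(edges)
print(sorted((prey, sorted(baits)) for prey, baits in result.items()))
```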
TOP
X41 - Meta-Storms: Efficient Search for Similar Microbial Communities Based on a Novel Indexing Scheme and Similarity Score for Metagenomic Data
Short Abstract: This poster is based on Proceedings Submission 20.

Scientists have long sought to compare different microbial communities effectively and at large scale: given a set of unknown samples, find similar metagenomic samples in a large database and assess how similar these samples are. With the number of metagenomic samples accumulated so far, it is possible to build a database of metagenomic samples of interest and then search it for the most similar sample for each unknown sample. However, on the one hand, current databases with a large number of metagenomic samples mostly serve as data repositories that offer few analysis functionalities; on the other hand, methods that measure pairwise similarity of metagenomic samples work well only for small sets of samples. It is not yet clear how to efficiently search a large database of metagenomic samples. In this study, we propose a novel method, Meta-Storms, to systematically and efficiently organize and search metagenomic data. It includes the following components: (i) creating a database of metagenomic samples based on their taxonomical annotations, (ii) efficiently indexing all samples in the database based on taxonomical structures, (iii) searching for a metagenomic sample against the database with a fast scoring function based on quantitative phylogeny, and (iv) managing the database by index import/export, data insertion/deletion and database merging. It can achieve accuracies similar to current popular significance-testing-based methods, and indexed queries are more than 10 times faster than brute-force one-against-one metagenomic sample comparison.
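The exact Meta-Storms scoring function is not given here. As a stand-in, the sketch below uses a branch-length-weighted Ruzicka similarity on abundances propagated up a taxonomy; it shows why a phylogeny-aware score gives partial credit to samples that share only sister taxa. The function names, the toy taxonomy and the weighting are assumptions, not the published method.

```python
from collections import defaultdict


def propagate(parent, leaf_abundance):
    """Add each leaf's relative abundance to the leaf and all of its ancestors."""
    totals = defaultdict(float)
    for leaf, value in leaf_abundance.items():
        node = leaf
        while node is not None:
            totals[node] += value
            node = parent.get(node)  # the root has no parent entry, so the walk stops there
    return totals


def tree_similarity(parent, branch_length, sample_a, sample_b):
    """Weighted Ruzicka similarity of two samples on a taxonomy (1.0 = identical).
    Each non-root node is weighted by the length of the branch above it."""
    a = propagate(parent, sample_a)
    b = propagate(parent, sample_b)
    shared = total = 0.0
    for node, weight in branch_length.items():
        shared += weight * min(a.get(node, 0.0), b.get(node, 0.0))
        total += weight * max(a.get(node, 0.0), b.get(node, 0.0))
    return shared / total if total else 1.0


# Hypothetical three-genus taxonomy with unit branch lengths.
parent = {"Firmicutes": "root", "Bacteroidetes": "root",
          "Lactobacillus": "Firmicutes", "Clostridium": "Firmicutes",
          "Bacteroides": "Bacteroidetes"}
branch_length = {node: 1.0 for node in parent}
s1 = {"Lactobacillus": 0.5, "Bacteroides": 0.5}
s2 = {"Clostridium": 0.5, "Bacteroides": 0.5}
print(tree_similarity(parent, branch_length, s1, s2))
```

Because the score only needs per-node abundances, those propagated vectors are natural candidates for precomputing in an index.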
TOP
X42 - Construction of Flux-Balance Models with MetaFlux and Computing Accurate Atom-Mappings for Biochemical Reactions
Short Abstract: Flux Balance Analysis (FBA) can be applied to metabolic models to predict the growth rate of an organism, analyze the effect of gene deletions, and more. But obtaining a working FBA model can be challenging and time-consuming. Indeed, a workable FBA model is based on: 1) a sufficiently rich set of balanced reactions; 2) a correct set of biomass metabolites; 3) an appropriate set of secreted metabolites; and 4) a sufficient set of nutrient compounds. If even one of these requirements is not met, flux-balance analysis cannot be performed.

We present MetaFlux, a tool to help generate FBA models from metabolic databases. This tool greatly reduces the time required to obtain a working FBA model. MetaFlux, based on Mixed Integer Linear Programming (MILP), obtains a working FBA model using a multiple gap-filling approach. Starting from a possibly incomplete set of reactions, nutrients, secretions, and biomass metabolites, multiple gap-filling completes these sets to obtain a feasible FBA model by adding new reactions from a reaction database and new secretions, nutrients, and biomass metabolites from user-provided "try-sets".
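The idea of gap-filling can be illustrated without an MILP solver. The sketch below swaps in a simpler, purely topological technique: network expansion decides which metabolites are producible from the nutrients, and a brute-force search finds the smallest set of database reactions that makes every biomass metabolite producible. Unlike MetaFlux, it ignores stoichiometric balance and flux feasibility; all reaction and metabolite names are hypothetical.

```python
from itertools import combinations


def reachable(reactions, nutrients):
    """Network expansion: metabolites producible from the nutrients."""
    produced = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if set(substrates) <= produced and not set(products) <= produced:
                produced |= set(products)
                changed = True
    return produced


def gap_fill(model, database, nutrients, biomass, max_added=3):
    """Smallest set of database reactions that makes all biomass metabolites reachable."""
    for size in range(max_added + 1):
        for extra in combinations(sorted(database), size):
            candidate = dict(model)
            candidate.update({rid: database[rid] for rid in extra})
            if set(biomass) <= reachable(candidate, nutrients):
                return list(extra)
    return None  # no solution within max_added reactions


# Hypothetical draft model missing one glycolytic step.
model = {"R1": (["glc"], ["g6p"]), "R3": (["f6p"], ["pyr"])}
database = {"PGI": (["g6p"], ["f6p"]), "X": (["glc"], ["lac"])}
print(gap_fill(model, database, nutrients=["glc"], biomass=["pyr"]))
```

The brute-force search grows combinatorially with the database size, which is one reason an optimization formulation such as MILP is preferred at genome scale.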

We also present a new approach to compute atom-mappings of biochemical reactions based on a minimum weighted edit-distance (MWED) metric using MILP. We have applied our approach to the reactions of the MetaCyc database. A comparison of the computed atom-mappings to the KEGG RPAIR database shows an error rate of 0.9%.
TOP
X44 - Prediction of genetic interactions using network topology
Short Abstract: Motivation: Genetic interaction (GI) detection impacts our understanding of human disease and the ability to design personalised treatments. The mapping of every GI in most organisms (even yeast and worm) is far from complete because of the combinatorial number of gene deletions and knockdowns required. Computational techniques that predict new interactions based only on network topology have been developed but have never been applied to GI networks.
Methods: We applied several neighbourhood-based and network-embedding techniques to yeast and worm GI networks to predict new links. To investigate the true robustness of each approach, we removed links uniformly at random from the networks and analysed how sparsification affects prediction. We also tested whether a biologically meaningful network topology can be modelled by adding links uniformly at random to the aforementioned sparsified networks.
Results: We show that topological prediction of GIs is possible with unexpectedly high precision. Node-neighbourhood-based techniques perform better when the network is dense, while network-embedding approaches perform similarly in both dense and sparse networks, with Minimum Curvilinear Embedding attaining the best result. We also demonstrate that a random network re-densification process cannot regenerate the topology shaped by the evolution of past biological processes.
Conclusion: Computational prediction of GIs is a strong tool to aid high-throughput GI determination and untangle the complex relationships between genotype and phenotype. An innovative technique, inspired by cybernetic principles, should self-adapt its prediction in relation to different sparsity levels.
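The evaluation protocol described under Methods (hide links uniformly at random, then check whether they are recovered) can be sketched with common neighbours, one representative node-neighbourhood technique. This is an illustrative sketch, not the poster's pipeline; the function names and the precision-at-k metric are assumptions.

```python
import random
from itertools import combinations


def common_neighbour_scores(edges):
    """Score every non-adjacent node pair by its number of shared neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            shared = len(adj[u] & adj[v])
            if shared:
                scores[(u, v)] = shared
    return scores


def precision_after_sparsification(edges, remove_fraction=0.1, top_k=10, seed=0):
    """Hide a uniformly random fraction of links, rank candidate pairs by common
    neighbours, and return the fraction of the top-k predictions that were hidden links."""
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    n_hidden = max(1, int(len(edges) * remove_fraction))
    hidden = set(rng.sample(edges, n_hidden))
    observed = [e for e in edges if e not in hidden]
    scores = common_neighbour_scores(observed)
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return sum(pair in hidden for pair in top) / max(1, len(top))
```

Raising `remove_fraction` mimics increasing sparsity, which is where neighbourhood scores run out of shared neighbours and embedding methods are reported to hold up better.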
TOP
X45 - Minimum curvilinearity for topological prediction of protein-protein interactions by network embedding
Short Abstract: Motivation: Most cellular functions emerge from protein-protein interactions (PPIs), yet determining them experimentally is both expensive and time-consuming. Prediction of interactions using solely PPI-network topology (topological prediction) is a novel approach that is useful when prior biological knowledge is absent or unreliable.
Methods: Network embedding represents network proteins in a low-dimensional space, where protein pairs closer to each other represent candidate interactions. Our approach is based on the intuition that using non-centred minimum curvilinear embedding (ncMCE) (first innovation), combined with the shortest-path (SP) distance for assigning likelihood scores in the reduced space (second innovation), might boost prediction performance. In addition, we introduce an automatic strategy (third innovation) for selecting the appropriate embedding dimension.
Results: We compared our method against several unsupervised and supervised embedding approaches and against node-neighbourhood-based techniques. Despite its computational simplicity, ncMCE-SP was the overall leader, outperforming current methods for topological link prediction. The superiority of ncMCE may be due to its ‘soft-threshold effect’, which boosts the separation between good and bad candidate links.
Conclusion: Minimum curvilinearity is a valuable nonlinear framework, which we successfully applied to the embedding of protein networks for unsupervised prediction of novel PPIs. The rationale is that a certain level of prior biological knowledge is hidden and memorised in the ‘nonlinear evolutionary relations’ of the network topology and thus can be used for prediction. The predicted PPIs represent good candidates to test in high-throughput experiments.
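Minimum curvilinearity measures the distance between two nodes along the minimum spanning tree (MST) rather than directly. The sketch below computes that MST-path distance matrix, which ncMCE would then embed (the non-centred decomposition and the SP scoring step are not shown). The input format of weighted edges and the function names are assumptions.

```python
from collections import defaultdict


def minimum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm with union-find; returns the MST as an adjacency list."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = defaultdict(list)
    for w, u, v in sorted((w, u, v) for u, v, w in weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so it belongs to the MST
            parent[ru] = rv
            tree[u].append((v, w))
            tree[v].append((u, w))
    return tree


def minimum_curvilinear_distances(nodes, weighted_edges):
    """Pairwise distances measured along the unique MST path between nodes."""
    tree = minimum_spanning_tree(nodes, weighted_edges)
    dist = {}
    for source in nodes:
        stack, seen = [(source, 0.0)], {source}
        while stack:
            node, d = stack.pop()
            dist[(source, node)] = d
            for nxt, w in tree[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, d + w))
    return dist


edges = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0), ("a", "d", 5.0), ("a", "c", 1.5)]
mc = minimum_curvilinear_distances(["a", "b", "c", "d"], edges)
print(mc[("a", "c")], mc[("a", "d")])
```

Note that `a` and `c` end up 2.0 apart even though a direct edge of weight 1.5 exists: the MST path, not the shortcut, defines the curvilinear distance.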
TOP
X46 - Revealing signaling pathways from systematic perturbation experiments by unifying knowledge mining and data mining
Short Abstract: This poster is based on Proceedings Submission 26.

Motivation: Genetic and pharmacological perturbations are powerful systems biology tools to study cellular signal transduction pathways. Here, we report a framework to unify knowledge mining of gene functions and data mining of systematic perturbation data to identify modules of genes that likely constitute signal transduction pathways.
Results: Our framework consists of three steps: 1) applying ontology-guided knowledge mining approaches to identify functionally coherent modules among the genes that respond to systematic perturbations; 2) employing a graph algorithm to identify perturbation instances that are densely connected to the genes in a responding module, as a means to reveal the members of a potential signaling pathway; 3) organizing perturbation instances in a manner that genes carrying common signals are grouped as a signal component, and the relationships among different signal components are further revealed as a network. Applying our approach to a compendium of yeast systematic perturbation data, we have successfully identified perturbation modules, with some corresponding to well-known signal transduction pathways and some leading to new hypotheses. Further grouping perturbed genes into signaling components revealed the organization of cellular signaling systems. Our results have led to new hypotheses regarding yeast cellular signaling systems: some are supported by existing publications and some need further validation. In conclusion, we have demonstrated that our framework enables conceptualization of experimental results, which enhances the capability of identifying signal transduction pathways from systematic perturbation data and leads to new directions of knowledge acquisition and representation.
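Step 2 links perturbation instances to a responding module through a bipartite perturbation-gene graph. The abstract does not specify the graph algorithm, so the sketch below uses a simple coverage threshold as a stand-in for "densely connected"; the threshold, function name and toy data are hypothetical.

```python
def densely_connected_perturbations(responses, module, min_fraction=0.6):
    """Return perturbation instances whose responding genes cover at least
    `min_fraction` of a module, sorted by coverage (a crude density criterion)."""
    module = set(module)
    hits = {}
    for perturbation, responding_genes in responses.items():
        fraction = len(module & set(responding_genes)) / len(module)
        if fraction >= min_fraction:
            hits[perturbation] = fraction
    return dict(sorted(hits.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical perturbation -> responding-gene sets.
responses = {
    "pert_1": {"GENE_1", "GENE_2", "GENE_3", "GENE_9"},
    "pert_2": {"GENE_1", "GENE_7"},
}
print(densely_connected_perturbations(responses, {"GENE_1", "GENE_2", "GENE_3"}))
```

Perturbations that pass the threshold for the same module are the candidate members of one signaling pathway.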
TOP
X47 - Bayesian Estimation and Analysis of Pathway Models using Kernel-enhanced Particle Filters
Short Abstract: Estimating the parameters of biochemical pathway models remains a challenging task. Model parameters are commonly under-constrained, and several explanations exist for the same set of measurement data. Choosing a single best estimate of the parameters is often inadequate for further analysis. Bayesian filtering provides a framework for estimating and representing a posterior probability distribution over the space of parameters. In particular, particle filters can sequentially approximate such probability distributions in a sparse and nonparametric way. However, the performance of particle filters can degrade due to limited sample size, resulting in collapsed distributions.

We provide an improved particle filter that guarantees sample diversity, while preserving the parameter posterior. This is achieved by applying an additional MCMC kernel at each stage of the sequential estimation process. The kernel is automatically tuned based on sensitivities, to ensure high acceptance rates. The method is demonstrated on a widely used model of the JAK-STAT signalling pathway, and results are compared to previously proposed particle filtering methods as well as pure MCMC methods.

The results show that using the same number of samples, the proposed method is superior in providing a representative estimation of the parameter posterior. This is critical in making accurate predictions in a Bayesian manner, without assuming a single best choice of model parameters. Using the set of particles, we show a more informative description of unmeasured species, represented as probability distributions. The particle filter can also be used to select between alternative models with differing structures.
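The core idea (resample, then apply an MCMC kernel so that particles stay diverse while still targeting the current posterior) can be shown on a one-parameter toy model. This is a generic resample-move particle filter under stated assumptions: a hypothetical decay model y = exp(-k t) with Gaussian noise, a flat prior on [0, 2], and a fixed random-walk step rather than the sensitivity-based tuning described above. It is not the JAK-STAT model.

```python
import math
import random


def log_likelihood(k, times, obs, sigma):
    """Gaussian log-likelihood (up to a constant) of y = exp(-k t)."""
    return sum(-0.5 * ((y - math.exp(-k * t)) / sigma) ** 2 for t, y in zip(times, obs))


def resample_move_filter(times, obs, n_particles=500, sigma=0.05, step=0.05, seed=1):
    """Sequential importance resampling over a static parameter k, followed by
    one Metropolis move per particle after each resampling to restore diversity."""
    rng = random.Random(seed)
    particles = [rng.uniform(0.0, 2.0) for _ in range(n_particles)]  # flat prior on [0, 2]
    for i, (t, y) in enumerate(zip(times, obs)):
        # 1) Reweight by the likelihood of the newest observation.
        logw = [-0.5 * ((y - math.exp(-k * t)) / sigma) ** 2 for k in particles]
        m = max(logw)
        weights = [math.exp(lw - m) for lw in logw]
        # 2) Multinomial resampling (duplicates particles, reducing diversity).
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # 3) MCMC kernel: random-walk Metropolis targeting p(k | y_1..y_i).
        seen_t, seen_y = times[: i + 1], obs[: i + 1]
        moved = []
        for k in particles:
            proposal = k + rng.gauss(0.0, step)
            if 0.0 <= proposal <= 2.0:  # outside the prior support: reject
                log_ratio = (log_likelihood(proposal, seen_t, seen_y, sigma)
                             - log_likelihood(k, seen_t, seen_y, sigma))
                if rng.random() < math.exp(min(0.0, log_ratio)):
                    k = proposal
            moved.append(k)
        particles = moved
    return particles


# Synthetic data from the hypothetical model with k = 0.5.
noise = random.Random(0)
times = [0.5 * i for i in range(1, 11)]
obs = [math.exp(-0.5 * t) + noise.gauss(0.0, 0.05) for t in times]
posterior = resample_move_filter(times, obs)
print(sum(posterior) / len(posterior))
```

Without step 3, repeated resampling would leave only a handful of distinct parameter values, which is the collapse described above.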
TOP
X48 - Integrative approaches for mode of action determination
Short Abstract: High-throughput expression profiling techniques, e.g. mRNA profiling, often result in unstructured lists of differentially expressed genes. Computational analysis of such lists is in most cases restricted to a simple functional enrichment analysis to elucidate the potential pathways or functions the candidate genes belong to, but the interactions between the identified candidate genes and their role in the global molecular network of the cell remain unknown. In this work we present a suite of computational tools to analyse such data sets and to integrate them with publicly available omics data in two specific applications:
1) Results generated by genetic screens can be expanded with an expression compendium, which consists of the expression values of genes over thousands of conditions. Identifying the genes in a large compendium that are coexpressed with a query gene list helps identify the larger context of pathways and conditions under which the genes of interest are active (modules). A coclustering strategy can then detect to what extent the identified modules are conserved across species.
2) Molecular profiles of individuals obtained at different levels (for instance miRNA and mRNA) can be treated either separately or in combination. We illustrate three complementary strategies: coclustering of data to detect coordinated changes in expression across samples; explanatory models to help find regulatory programs that can explain the expression profiles of mRNA modules; and integrative procedures that use regulatory motifs to identify modules whose expression profiles are not necessarily related to those of their regulators.
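The query-driven step in application 1 can be sketched as ranking compendium genes by their mean correlation with the query genes. This is a simplified stand-in: the tools in the abstract are not specified here and are likely more elaborate (for example, selecting condition subsets). Function names and toy profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def rank_by_query_coexpression(compendium, query):
    """Rank non-query genes by mean Pearson correlation with the query genes
    across all conditions (compendium: gene -> list of expression values)."""
    scores = {}
    for gene, profile in compendium.items():
        if gene in query:
            continue
        scores[gene] = sum(pearson(profile, compendium[q]) for q in query) / len(query)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical four-condition compendium.
compendium = {"q1": [1, 2, 3, 4], "q2": [2, 4, 6, 8], "g1": [1, 2, 3, 5], "g2": [4, 3, 2, 1]}
print(rank_by_query_coexpression(compendium, {"q1", "q2"}))
```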
TOP
X49 - Autonomous Sequential Periodic Pattern Formation from Density-dependent Motility
Short Abstract: Sequential and periodic patterns are recurring anatomical features in living organisms. Their rhythmic dynamics and intriguing beauty have fascinated generations of scientists. However, understanding of the underlying mechanisms is in most cases hindered by overwhelming molecular complexity. Engineered synthetic systems can reduce this complexity and refine theoretical assumptions, thereby providing insights into the principles of naturally occurring phenomena. Here we describe a synthetic pattern formation system that simply couples cell density and motility, which enables the programmed cells to form crisp, periodic stripes of high and low density in a sequential and autonomous manner. Theoretical and experimental analyses revealed that the periodic structure arises from a recurrent aggregation process generated during the continuous expansion of the cell population. In accordance with our model prediction, patterns with different numbers of stripes were generated by tuning the activity of a single promoter. The results establish motility control as a simple, potent route for generating regular spatial structures without the need for a pacemaker, and illustrate the utility of synthetic genetic systems in studying pattern formation in spatially extended systems.

References

[1] Chenli Liu, Science 334, 238 (2011)
TOP
X50 - Predicting Long Term Behavior of Genetic Regulatory Networks with Answer Set Programming
Short Abstract: Genetic Regulatory Networks (GRNs) hold the key to understanding the complex behavior of living organisms. It is therefore important to have flexible tools that can accurately simulate and handle different situations in GRNs. One particularly interesting aspect of modeling and simulating GRNs is their long-term behavior: a GRN eventually stabilizes, either in a steady state or in an oscillatory cycle. In biology, such stable behaviors can correspond to stable cell types, e.g. skin cells or liver cells. In computational models of GRNs, we are interested in the sets of states that represent these stable behaviors, including the states visited during an oscillation. Such sets of states are called attractors. The focus of this contribution is on finding all attractors of a given GRN.
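To make the notion of an attractor concrete, the sketch below finds all attractors of a small Boolean network under synchronous updating by exhaustively following every start state. This brute-force enumeration is swapped in purely for illustration and is not the ASP method; it scales as 2^n states, which is exactly what a declarative solver tries to avoid. The Boolean encoding and the toy network are assumptions.

```python
from itertools import product


def synchronous_attractors(update_rules):
    """All attractors (fixed points and cycles) of a Boolean network under
    synchronous updating. update_rules: gene -> function(state_dict) -> bool."""
    genes = sorted(update_rules)

    def step(state):
        current = dict(zip(genes, state))
        return tuple(int(update_rules[g](current)) for g in genes)

    attractors = set()
    for start in product((0, 1), repeat=len(genes)):
        visited, state = set(), start
        while state not in visited:
            visited.add(state)
            state = step(state)
        # `state` is the first revisited state, so it lies on the attractor cycle.
        cycle, s = [state], step(state)
        while s != state:
            cycle.append(s)
            s = step(s)
        attractors.add(frozenset(cycle))
    return genes, attractors


# Hypothetical two-gene network in which each gene copies the other.
genes, atts = synchronous_attractors({"A": lambda s: s["B"], "B": lambda s: s["A"]})
for attractor in atts:
    print(sorted(attractor))
```

This toy network has two fixed points, (0, 0) and (1, 1), and one oscillatory attractor that alternates between (0, 1) and (1, 0).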

Our approach is based on a logical formalism called Answer Set Programming (ASP). Intuitively, in answer set programming, one writes a set of rules (the program) such that certain minimal models (the answer sets) of this program correspond to solutions of the problem of interest. Specifically, we describe the topology of a GRN as well as its intended behavior as an ASP program P. The attractors of the GRN can then be found by computing the answer sets of P.

We validated our method against the well-known networks of yeast (Li et al. (2004)) and fission yeast (Davidich and Bornholdt (2008)) for both synchronous and asynchronous dynamics and, to showcase the scalability of ASP-G,
TOP
X51 - A simple network-based analysis for interpreting multi-omics data in cancer research
Short Abstract: Advances in DNA sequencing technology and sophisticated microarray technology have increased the diversity of high-throughput biological experiments. In addition to gene expression, DNA methylation status and mutations in cancer have been studied in a genome-wide manner.
Although network-based and pathway-based analysis methods are indispensable for interpreting these multi-omics data, many currently available tools tend to produce results biased toward so-called hub genes in a biological network, such as P53 or EGFR.
Here we developed a new network-based analysis method that can detect possible key molecules associated with the input genes. The method focuses especially on hidden key molecules that are not well-known hub genes, and it also identifies the sub-networks around those key molecules.
We validated the method with the somatically mutated genes in glioblastoma available from The Cancer Genome Atlas (TCGA). We then applied the method to multi-omics data, including gene expression and DNA methylation profiles of colorectal cancer (CRC) samples, and identified some genes that are likely responsible for CRC tumorigenesis.
Through further analysis of various types of cancer samples, we will show the importance, in cancer genome research, of network-based analyses that look beyond hub genes to hidden key molecules.
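One simple way to keep hubs from dominating is to normalize by degree, scoring each candidate by the fraction of its neighbours that are input genes rather than by their raw count. The abstract does not describe its scoring, so this is a hypothetical illustration of hub-bias correction, not the authors' method; names and the toy network are placeholders.

```python
def key_molecule_scores(adjacency, input_genes):
    """Score each non-input node by the fraction of its neighbours that are input
    genes, so a hub that touches input genes by chance is not automatically first."""
    inputs = set(input_genes)
    scores = {}
    for node, neighbours in adjacency.items():
        if node in inputs or not neighbours:
            continue
        hits = len(inputs & neighbours)
        if hits:
            scores[node] = hits / len(neighbours)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical network: HUB touches 2 of 10 neighbours in the input set, KEY touches 3 of 3.
adjacency = {
    "HUB": {"g1", "g2"} | {f"x{i}" for i in range(1, 9)},
    "KEY": {"g1", "g2", "g3"},
}
print(key_molecule_scores(adjacency, ["g1", "g2", "g3"]))
```

A raw neighbour count would rank HUB (2) below KEY (3) here only narrowly; in real interactomes, hubs with hundreds of partners would win on raw counts, which the normalization prevents.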
TOP
X52 - hiPathDB : A Human Integrated Pathway Database with Facile Visualization
Short Abstract: hiPathDB is an integrated database that combines the curated human pathway data of NCI-Nature PID, Reactome, BioCarta and KEGG. It provides two different types of pathway integration. The pathway-level integration, conceptually a simple collection of individual pathways, was achieved by devising an elaborate model that takes the distinct features of the four databases into account and then reformatting all pathways in accordance with that model. The entity-level integration creates a single unified pathway that encompasses all pathways by merging common components. Such integration makes it possible to investigate the signaling network across all pathways and allows identification of pathway cross-talk. In addition, we have collected pharmacological and biomedical relationships, such as drug-gene, gene-disease and disease-drug, from PharmDB and the NCI Cancer Gene Index. These additional data can help us understand the underlying mechanism of drug action in a sub-network composed of properly selected relationships between drug, gene, related disease and signaling pathway. The built-in pathway visualization module supports exploratory studies of complex networks in an interactive fashion. The layout algorithm is optimized for virtually automatic visualization of the pathways and the relationships between drug, gene and related disease. hiPathDB is available at http://hiPathDB.kobic.re.kr.
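Entity-level integration can be pictured as a graph union in which identical components from different pathways collapse into one node; nodes that came from more than one pathway are then candidate cross-talk points. A minimal sketch, assuming identifiers have already been mapped to a common namespace (the hard part that a database like this has to solve); pathway contents are toy examples.

```python
from collections import defaultdict


def merge_pathways(pathways):
    """Union the edges of all pathways into one graph; shared components become
    single nodes. Also records which source pathways each node came from."""
    merged = defaultdict(set)
    membership = defaultdict(set)
    for name, edges in pathways.items():
        for u, v in edges:
            merged[u].add(v)
            merged[v].add(u)
            membership[u].add(name)
            membership[v].add(name)
    return merged, membership


def cross_talk_nodes(membership):
    """Components that occur in more than one source pathway."""
    return {node: sorted(paths) for node, paths in membership.items() if len(paths) > 1}


# Toy pathway fragments using a common gene-symbol namespace.
pathways = {
    "MAPK": [("EGFR", "GRB2"), ("GRB2", "SOS1"), ("SOS1", "KRAS")],
    "PI3K": [("EGFR", "PIK3CA"), ("PIK3CA", "AKT1")],
}
merged, membership = merge_pathways(pathways)
print(cross_talk_nodes(membership))
```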
TOP
X53 - Analysis of genotype-based drug sensitivity and gene expression
Short Abstract: Identifying associations between drug sensitivity and gene expression from various high-throughput data is challenging. Numerous studies have focused on cell lines, based either on tumor lineage or on multiple genotypes. These studies tried to discover meaningful relationships between drug response and gene expression that would provide clues to the mechanisms of drug sensitivity or resistance across diverse cancer cell lines with different genotypic backgrounds. Drug sensitivity appears to depend strongly on genomic mutations, which can explain the diverse drug responses observed across multiple cell lines. Notably, mutation of the TP53 gene is one of the most common genetic alterations in human cancers and plays a critical role in tumorigenesis. In this study, we applied a statistical framework, Cell Line Enrichment Analysis (CLEA), to quantify associations between high-frequency cancer genotypes and both drug response and gene expression, using the NCI60 cell line panel. Each genotype was divided into two groups based on TP53 mutation status. Many different drug response and gene expression signatures were identified for TP53 mutation-status-dependent genotypes. These results expand our understanding of the complex genomic associations underlying drug sensitivity. In addition, the combined study of genotype-based drug sensitivity and gene expression is a comprehensive method for identifying new potential chemicals for cancer therapy.
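The CLEA statistics are not described here. As a generic stand-in, the sketch below splits cell lines by TP53 mutation status and ranks drugs by Welch's t statistic for the difference in sensitivity between the two groups. Function names, the data layout and the toy values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    denom = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return 0.0 if denom == 0 else (mean(group_a) - mean(group_b)) / denom


def rank_drugs_by_tp53(sensitivity, tp53_mutant):
    """sensitivity: drug -> {cell_line: value}; tp53_mutant: set of mutant cell lines.
    Returns drugs sorted by |t|, mutant versus wild type."""
    results = {}
    for drug, values in sensitivity.items():
        mutant = [v for line, v in values.items() if line in tp53_mutant]
        wild_type = [v for line, v in values.items() if line not in tp53_mutant]
        if len(mutant) >= 2 and len(wild_type) >= 2:  # variance needs two points per group
            results[drug] = welch_t(mutant, wild_type)
    return sorted(results.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical sensitivity values for four cell lines, two of them TP53-mutant.
sensitivity = {
    "drugX": {"c1": 5, "c2": 6, "c3": 1, "c4": 2},
    "drugY": {"c1": 3, "c2": 4, "c3": 3, "c4": 4},
}
print(rank_drugs_by_tp53(sensitivity, {"c1", "c2"}))
```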
TOP
X54 - Identification of proteomic signature for glucose starvation in cancer cells
Short Abstract: Cancer cells typically have increased rates of energy metabolism compared to normal cells, which they require for proliferation, metastasis and spread throughout the body. In this study we focus on the analysis of proteomic signatures under low-glucose conditions for diverse cancer cell lines. We used reverse phase protein array (RPPA) data covering cancer cell lines of 170 lineages with diverse genetic mutations. The expression and phosphorylation levels measured by 115 protein antibodies under normal and low-glucose conditions were analyzed in terms of lineages and mutation patterns. A statistical framework, Cell Line Enrichment Analysis (CLEA), was used to analyze the association of protein expression/phosphorylation with diverse cancer genotypes. As a result, genotype-dependent proteomic signatures were identified, which might provide new insights into the varied mechanisms of cell metabolism in cancers.
TOP
X55 - Inferring host subnetworks involved in viral replication
Short Abstract: Understanding the interactions that occur between viruses and their hosts is important for controlling the impact of viruses on human health. Systematic, genome-wide loss-of-function experiments can be used to identify host factors that directly or indirectly facilitate or inhibit the replication of a virus in a host cell. We present a computational approach that uses an integer linear program and a graph kernel to infer the intracellular pathways through which these host factors modulate viral replication. The input is a set of viral phenotypes observed in single-host-gene mutants and a background network consisting of a variety of host intracellular interactions. The output is an ensemble of subnetworks that provides a consistent explanation for the observed phenotypes, and predicts which unassayed host factors modulate the virus and which host factors are the most direct interfaces with a viral component. These subnetworks could be used to guide further experimentation toward uncovering and validating the mechanisms of host-virus interactions.

We analyze data from experiments screening the yeast genome for genes modulating the replication of two RNA viruses. To evaluate our method, we conduct a cross-validation experiment in which we predict whether held-aside test genes have an effect on viral replication. We compare our method to several baseline methods. As an additional evaluation, we use our approach to predict which unassayed host genes are likely to be involved in viral replication. Multiple predictions are supported by recent independent experimental data.
TOP
X56 - Integration of prior knowledge enables cell wide causal inference
Short Abstract: Numerous large data collections are now available, describing all kinds of protein properties, such as their functions, interactions and conditional responses.

These data contain information relevant to the structure and functioning of the regulatory network. However, combining such heterogeneous data to construct a common regulatory model of the cell has proven to be a challenging task.

Here, we propose a new machine-learning-based method that learns how these data can be used together to infer a regulatory model of the cell.

The resulting model is used to predict and understand regulatory cause-effect relationships. We validate the method in Saccharomyces cerevisiae, showing how the reconstructed regulatory network accurately predicts responses to gene perturbations, and in reverse, predicts causal genes given observed responses. Application to the KEGG MAPK pathways shows that, by simulating the pathway in its cellular context, we can correctly predict feedforward, cross-talk and feedback effects that are not obvious from the pathway diagram itself.
TOP
X57 - Timing the regulation of gene expression during mouse somitogenesis
Short Abstract: In all vertebrates, the segmental pattern of the body axis is established during embryogenesis as somites, masses of mesoderm distributed along the two sides of the neural tube, are formed sequentially in the anterior-posterior direction. Although it is widely accepted that the mechanism is governed by the clock and wave-front model, involving genes from the Wnt, Fgf and Notch pathways, the exact phases of and the hierarchy between those pathways are not well understood. Moreover, the exact number of genes involved, as well as their timing, is a subject of current debate [Krol et al 2011].
Based on the analogy between the clock and wave-front model and wave propagation in physics, we propose a model of gene regulation during this process that allows us to determine the correct phase of each gene. The method, based on maximum entropy deconvolution [Rowicka et al 2007] and the geometry of the system, is applied to the gene expression time-course data sets of [Dequeant et al 2006; Krol et al 2011]. The results allow us, first, to confirm the existing model and, second, to propose new candidate cycling genes as well as possible interactions between those genes and a hierarchy between the corresponding pathways. We have identified a number of genes that have two expression peaks per cycle, which allows us to postulate a previously unknown role for those genes.
We present the timing of expression of genes involved in somitogenesis, discuss the results in light of the known regulated genes, and propose new candidate genes involved in the process.
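Estimating a gene's phase, and spotting genes with two peaks per cycle, can be illustrated with harmonic regression. This is a deliberately simpler technique swapped in for illustration, not the maximum entropy deconvolution used in the poster. It assumes evenly spaced samples covering a whole number of cycles of known length (`period_samples`, a hypothetical parameter).

```python
import math


def harmonic_amplitude_phase(values, period_samples, harmonic=1):
    """Amplitude and peak time (in samples) of one Fourier harmonic of a profile
    sampled evenly over whole cycles of `period_samples` points."""
    n = len(values)
    omega = 2 * math.pi * harmonic / period_samples
    # Over whole cycles the cosine and sine regressors are orthogonal,
    # so the least-squares coefficients reduce to these projections.
    a = 2 / n * sum(y * math.cos(omega * t) for t, y in enumerate(values))
    b = 2 / n * sum(y * math.sin(omega * t) for t, y in enumerate(values))
    amplitude = math.hypot(a, b)
    peak = (math.atan2(b, a) / omega) % (period_samples / harmonic)
    return amplitude, peak


# Synthetic profiles over two cycles of 12 samples each.
one_peak = [math.cos(2 * math.pi * (t - 3) / 12) for t in range(24)]
two_peaks = [math.cos(4 * math.pi * t / 12) for t in range(24)]
print(harmonic_amplitude_phase(one_peak, 12))
print(harmonic_amplitude_phase(two_peaks, 12, 1), harmonic_amplitude_phase(two_peaks, 12, 2))
```

A gene whose second-harmonic amplitude dominates its first-harmonic amplitude is a candidate for the two-peaks-per-cycle behaviour mentioned above.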
TOP
X58 - A three-dimensional map of protein networks within and between species
Short Abstract: General properties of the largely antagonistic biomolecular interactions between pathogens and their hosts remain poorly understood, and may differ significantly from known principles governing the cooperative interactions within the host. Recent host-pathogen systems biology efforts have generated global, but low-resolution, maps of host-pathogen protein-protein interaction networks. Here, we integrate three-dimensional homology models of protein complexes with interaction networks among human and viral proteins to construct the first human-virus structural interaction network. Subsequent analyses reveal significant biophysical, functional, and evolutionary differences between host-virus and within-host structural interaction networks. We find that viral proteins tend to bind to existing within-host interfaces. Compared to within-host protein-protein interfaces, host-virus protein-protein interfaces tend to be more transient, targeted by more host proteins, more regulatory in function, faster evolving, and rely less on sequence similarity to achieve interface mimicry. These results highlight the distinct consequences of cooperation versus antagonism in biological networks within and between species.
TOP
X59 - Direct Linkage of Transcriptional Response and Metabolic Phenotype via Kinetic Modeling
Short Abstract: Cellular behavior is the outcome of multiple biological processes that are orchestrated with exquisite complexity. Two of these processes are gene expression and cellular metabolism, for which considerable amounts of quantitative profiling data have been generated. However, extracting biological knowledge from these data is challenging. In this work, we aim to increase our understanding of cellular behavior by bridging the gap between gene expression and cellular metabolism through kinetic modeling. To this end, we developed a modeling framework that addresses key modeling challenges by defining kinetic expressions that can be directly parameterized using experimental gene expression and metabolic profiles. This allows us to construct large-scale kinetic models while avoiding exhaustive experimental characterization, extensive literature mining, and complex optimization problems. We tested the ability of the approach to connect gene expression and metabolism by investigating the response of Saccharomyces cerevisiae under different experimental conditions. For this, we constructed models that displayed predictive capabilities that, to our knowledge, have not been reported by similar modeling efforts and that proved to be effective tools for revealing biological insights hidden in the gene expression and metabolic profiles. Importantly, because of its flexibility and robustness, we expect that our modeling framework can be applied by the research community to other cellular systems of medical and industrial relevance, to understand the effect of the regulation of metabolic genes on the cellular phenotype.
TOP
X60 - VisANT: Integrative network platform to connect genes, drugs, diseases and therapies
Short Abstract: VisANT is a network-driven platform for biological data exploration, mining, and integration. Its unique metagraph implementation supports multi-scale network visualization with integrated hierarchical knowledge such as Gene Ontology and disease classification. In VisANT 4.0, we provide a set of new features supporting network pharmacology and translational medicine research, such as 1) a new user interface that allows easy management of the workspace and editing of node and edge properties; 2) hierarchical presentation of diseases and therapies with integrated associations between diseases and genes, therapies and drugs, and drugs and their targets; 3) visualization of diseases and therapies as metanetworks, which consist of diseases or therapies as metanodes, with metanodes comprising genes or drugs; 4) built-in functions that infer correlations between diseases/therapies at the same level or across different levels using a bottom-up approach; 5) a new visualization layout presenting metanetworks of diseases and therapies as a bipartite graph of drugs and genes or of therapies and diseases, or even as a tripartite graph visualizing the interconnections between therapies, drugs and drug targets. The new features allow easy mining of biological data for understanding disease mechanisms and discovering new drugs. VisANT is freely available at http://visant.bu.edu
TOP
X61 - Gateway Gene Discovery with Network Integration
Short Abstract: High-throughput studies have produced volumes of publicly available metadata that provide a wealth of information for better guiding biological research. However, models that can readily identify actual signal and biological impact in these data have not been developed at the same rate, due in part to dimensional noise and a lack of sufficiently powerful analysis algorithms. Network modeling is increasingly used to represent high-throughput data, because networks can capture biological elements and their relationships en masse. Moreover, well-established graph-theoretic methodology can be applied to network models to increase the efficiency and speed of data analysis. We propose an integrated network model that represents multiple states of biological data at the transcriptional level via correlation of gene expression. This work formally defines the concept of “gateway nodes” in the integrated network obtained from two states: genes that represent a transition in expression, and possibly function, from one state to another. Further, we provide a proof of concept for the existence and importance of these gateway elements by mining critical genes related to murine aging from networks built from hypothalamic gene expression in young and middle-aged mice. This research highlights the need for methods that compare states (of temporal or multi-stage nature) using network models. It also provides a first step toward identifying the processes behind comparative experiments in aging and disease progression, and is applicable to any type of modular multi-state network.
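The abstract does not spell out the formal definition of a gateway node, so the following Python sketch is only one hypothetical reading of the idea. It builds a thresholded co-expression network for each state (Pearson correlation, an assumption), takes their union as the integrated network, and flags genes that touch both edges found only in state A and edges found only in state B. The function names, the 0.8 threshold and the criterion itself are illustrative assumptions, not the authors' method.

```python
import numpy as np

def correlation_edges(expr, genes, threshold):
    """Undirected co-expression edges with |Pearson r| >= threshold.

    expr: array of shape (n_genes, n_samples); genes: list of gene names.
    """
    corr = np.corrcoef(expr)  # rows are genes, so this is gene-by-gene
    edges = set()
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if abs(corr[i, j]) >= threshold:
                edges.add(frozenset((genes[i], genes[j])))
    return edges

def gateway_candidates(expr_a, expr_b, genes, threshold=0.8):
    """Hypothetical gateway criterion (not the authors' definition): a gene
    with at least one edge present only in state A and at least one edge
    present only in state B of the integrated (union) network."""
    edges_a = correlation_edges(expr_a, genes, threshold)
    edges_b = correlation_edges(expr_b, genes, threshold)
    nodes_only_a = {g for e in edges_a - edges_b for g in e}
    nodes_only_b = {g for e in edges_b - edges_a for g in e}
    return sorted(nodes_only_a & nodes_only_b)

# Toy example: g1 co-varies with g2 in the first state and with g3 in the second.
genes = ["g1", "g2", "g3"]
young = np.array([[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, 1, -1]], dtype=float)
middle = np.array([[1, 2, 3, 4], [1, -1, 1, -1], [4, 5, 6, 7]], dtype=float)
print(gateway_candidates(young, middle, genes))  # ['g1']
```

Under this reading, a gateway gene is one whose partners switch between states, which is one way a gene could "represent a transition" between two network states.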
TOP
X62 - Identification of candidate disease genes in rare copy number variant association studies: a novel approach using the STRING network
Short Abstract: There is accumulating evidence that rare copy number variants (CNVs) contribute to neurodevelopmental and other congenital disorders. Rare variants are often defined as those occurring in < 1% of the population; hence large cohorts (> 2,000 subjects) are required to achieve significant variant- or gene-level association in a case-control study. Alternative strategies consist of testing association for functional gene sets and pathways, or prioritizing genes based on recurrence as well as known functions and phenotypes.

In this work we propose a novel network-based method for candidate disease gene prioritization. Unlike existing methods, the proposed strategy models the case-control structure, corrects for variable CNV size and does not require a list of known disease genes. Our method requires a weighted network, where the weight of an edge between two genes represents the probability of a functional interaction; we used STRING, one of the most popular probabilistic networks. Each gene (“bait”) is tested as follows: (1) for each subject, identify the rare CNV gene (“prey”) with the highest interaction weight with the bait gene; (2) test whether case subjects' weights are higher than those of controls, using a linear regression model. Top-scoring bait genes can be further mined for reliable interaction partners harboring rare CNVs that are present only in cases.
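A minimal Python sketch of the bait test described above, under stated assumptions: edge weights are held in a dictionary keyed by gene pairs (STRING-style scores in [0, 1]), a subject with no interacting prey contributes a weight of 0, and total CNV size enters the linear regression as a covariate (the abstract says the method corrects for CNV size but not exactly how). The helper names and data layout are hypothetical, and the sketch reports the case coefficient and its t statistic rather than a p-value.

```python
import numpy as np

def max_prey_weight(bait, cnv_genes, weights):
    """Highest weight between `bait` and any rare-CNV gene (prey) of one subject.

    weights: dict mapping frozenset({gene_a, gene_b}) -> score in [0, 1].
    Returns 0.0 when the subject carries no prey that interacts with the bait.
    """
    scores = [weights.get(frozenset((bait, prey)), 0.0)
              for prey in cnv_genes if prey != bait]
    return max(scores, default=0.0)

def bait_test(bait, subjects, weights):
    """subjects: list of (is_case, cnv_genes, total_cnv_size) tuples.

    Fits weight ~ intercept + case + cnv_size by ordinary least squares
    (CNV size as covariate is an assumption) and returns the coefficient
    and t statistic of the case indicator; positive t means cases score higher.
    """
    y = np.array([max_prey_weight(bait, genes, weights) for _, genes, _ in subjects])
    X = np.column_stack([
        np.ones(len(subjects)),
        [float(is_case) for is_case, _, _ in subjects],
        [float(size) for _, _, size in subjects],
    ])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # OLS coefficient covariance
    return float(beta[1]), float(beta[1] / np.sqrt(cov[1, 1]))
```

Taking the maximum over a subject's preys means one strong interaction partner is enough for a subject to score highly, while the regression compares those per-subject scores between cases and controls.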

We discuss results for several rare CNV case-control studies (congenital cardiac malformation, schizophrenia and autism). We successfully generated interesting hypotheses on disease genes and pathways. However, we also found that STRING interactions often require time-consuming follow-up to assess their validity.
TOP
X63 - Conservation and Divergence of Protein Complexes Across Evolution
Short Abstract: Although the vast majority of life's processes at the cellular level are carried out by complexes of multiple proteins, a complete catalog of the complexes formed in a cell and their members remains a distant goal. Using a new approach developed by our collaborators Havugimana, Hart et al., which consists of 1) subjecting biological samples to many fractionations of many different types, 2) using LC-MS/MS to quantify protein levels in each fraction, and 3) processing the data through a machine learning pipeline, we are able to search for complexes with a high-throughput, all-by-all approach. By incorporating additional functional genomic information into the learning process, we are able to reconstruct maps of complexes that so far appear to rival in quality those generated by more traditional, far more labor-intensive methods. Here, we apply the approach to biological samples from many organisms, including human, sea urchin, fly, worm, and multicellular amoebas, in order to rapidly identify soluble (nuclear and cytosolic) complexes in many species. From these maps we identify interesting instances of conservation and divergence of complexes that have not previously been understood or studied.
TOP
X64 - Causal Network Analysis of Glioblastoma Stem-like Cells Reveals a Mechanism of Transcriptional Inhibition of Cancer Differentiation Pathways
Short Abstract: Cancer is thought to arise from a single cell, presumably a normal adult stem cell. Like most adult stem cells, cancer stem-like cells (CSCs) are long-lived, self-renewing and multipotent. The mechanisms that maintain this stable “stemness” state and that lead to the transformation of CSCs into proliferating, differentiated cancer cells are the subject of intensive research. We applied a combination of gene expression profiling and causal modeling to develop a hypothesis on the inhibition of differentiation pathways in CSCs by an activated set of transcription factors and kinases.
The data included global expression profiles of neurospheres (colonies of glioblastoma stem-like cells growing in the absence of serum) and of the corresponding differentiated glioblastoma cells for four cell lines. The cell lines were highly heterogeneous, with no genes commonly differentially expressed between neurospheres and differentiated cells. However, by combining pathway enrichment and topological significance methods, we identified common sub-networks responsible for “stemness” and differentiation in all four cell lines. We have shown that, despite the absence of commonality between the four cell lines at the level of differential gene expression, there is commonality at the regulatory level, i.e. among topologically significant genes. Separate networks of pathways upregulated and downregulated in neurospheres were interconnected by protein interactions (from the MetaCore database) passing through a set of topologically significant expression regulators (transcription factors and kinases) common to all four cell lines. These genes had disproportionately many interactions with the genes underexpressed in differentiated glioblastoma cells. We also identified the key repressors of differentiation-specific genes.
TOP
X65 - Systems biology analysis of complex disorders
Short Abstract: Progress in understanding the molecular mechanisms underlying complex heritable disorders (e.g., autism, schizophrenia, diabetes) depends on new bioinformatics approaches for systems-level analysis and identification of disease-specific patterns of inheritance.
We present an approach and a supporting computational platform, GEDI (http://gedi.ci.uchicago.edu/), for the analysis of common heritable disorders from a systems biology perspective. Our approach is based on large-scale integration of genomic and clinical data with various classes of biological information from over 35 public and private databases. These data are used to identify genes and molecular networks contributing to phenotypes of interest, and to predict additional high-confidence disease genes to be tested experimentally. Our analytical strategy is three-fold and includes (a) enrichment analysis of high-throughput genomic data by Bayes factor and P-value estimates; (b) feature-based gene prioritization using support vector machines; and (c) the development of network-based disease models to identify molecular mechanisms involved in disease pathogenesis. Network-based gene prioritization leverages previous work of Dr. Börnigen-Nitsch and uses Heat Kernel diffusion, Random Walk, PageRank with priors, HITS with priors and K-step Markov model algorithms. These algorithms were modified to accommodate a variety of weighted data types for gene prioritization. We illustrate our approach with an analysis of brain connectivity disorders (e.g., agenesis of the corpus callosum, autism, schizophrenia). This analysis uncovered some of the common molecular mechanisms underlying these disorders. This knowledge should eventually help in developing efficient diagnostic and therapeutic strategies.
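PageRank with priors, one of the prioritization algorithms listed above, can be sketched as a random walk on a weighted gene network that restarts at known disease genes with a fixed probability. The Python sketch below is the standard power-iteration form; the restart probability of 0.3, the handling of nodes without outgoing edges, and the function name are assumptions, and it does not reproduce the modifications made in GEDI.

```python
import numpy as np

def pagerank_with_priors(adj, prior, restart=0.3, tol=1e-10, max_iter=1000):
    """Personalized PageRank by power iteration.

    adj: (n, n) non-negative weights, adj[i, j] = weight of edge i -> j
         (use a symmetric matrix for an undirected network).
    prior: non-negative length-n vector, nonzero on seed (known disease) genes.
    restart: probability of jumping back to the prior at each step (assumed value).
    Returns a score vector that sums to 1; higher means closer to the seeds.
    """
    adj = np.asarray(adj, dtype=float)
    p = np.asarray(prior, dtype=float)
    p = p / p.sum()
    row_sums = adj.sum(axis=1, keepdims=True)
    # Row-normalize weights into transition probabilities.
    trans = np.divide(adj, row_sums, out=np.zeros_like(adj), where=row_sums > 0)
    # Nodes with no outgoing weight send their mass back to the prior.
    dangling = row_sums[:, 0] == 0
    r = p.copy()
    for _ in range(max_iter):
        r_new = (1 - restart) * (r @ trans + r[dangling].sum() * p) + restart * p
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r

# Toy 4-gene path network seeded at gene 0.
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
scores = pagerank_with_priors(path, [1, 0, 0, 0])  # roughly [0.424, 0.354, 0.164, 0.057]
```

Candidate genes are then ranked by score; because edge weights are row-normalized, the same routine accepts weighted networks such as confidence-scored interactions.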
TOP
X66 - A Computational Study of Carotenoid Transcriptional Regulation in Arabidopsis thaliana During Abiotic Stress
Short Abstract: Genetic engineering of food crops for improved provitamin A carotenoid content has made good progress in recent years (Diretto et al., 2007; Paine et al., 2005). However, to achieve predictable crop development in this regard, it is necessary to understand pathway regulation at the systems level. Recent transcriptome studies of isoprenoid gene expression in response to development and osmotic stress in Arabidopsis (Meier et al., 2011) have provided significant insights. Manipulating the biosynthetic genes that control the committed steps of carotenoid biosynthesis has been noted as an effective approach to quantitative and qualitative alteration of carotenoids in food crops. This study focused on transcriptional regulation around DXS and PSY in response to cold, drought, salt and mannitol stresses. Both DXS and PSY are known rate-limiting steps of the pathway and thus direct metabolic flux towards carotenoid synthesis. Genome-wide correlation analysis relative to the MEP and carotenoid pathway genes revealed two distinct clusters. Microarray analysis indicated differential gene expression depending on the source of stress, tissue type and time. NCED3, one of the carotenoid degradative enzymes, was identified as an early responsive gene regulated by the HY5 transcription factor, suggesting an ABA-mediated response. Promoter content analysis revealed putative GA- and ABA-responsive elements. Coexpression analysis revealed the following TF regulators during abiotic stress: SEP3, AGL15, AP2, HY5 and PIL5. Thus, DXS, PSY and NCED3 are potential targets for genetic perturbation to enhance provitamin A carotenoid accumulation.
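The coexpression step can be illustrated by ranking genes by their Pearson correlation with a pathway gene across arrays. This Python sketch assumes a genes × samples matrix of (e.g. log-scale) expression values; the function name, the toy values and the choice of Pearson correlation are assumptions, and the study's actual correlation measure and thresholds are not stated in the abstract.

```python
import numpy as np

def top_coexpressed(expr, genes, query, k=5):
    """Rank genes by Pearson correlation with `query` across samples.

    expr: (n_genes, n_samples) expression matrix; genes: list of names.
    Returns up to k (gene, r) pairs, strongest positive correlation first,
    excluding the query gene itself. Constant-expression genes get r = 0.
    """
    x = np.asarray(expr, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)        # center each gene
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                       # avoid dividing by zero
    x = x / norms                                 # unit-length rows
    q = genes.index(query)
    r = x @ x[q]                                  # Pearson r with the query
    order = np.argsort(-r)
    return [(genes[i], float(r[i])) for i in order if i != q][:k]

# Toy data (hypothetical values): DXS tracks PSY exactly, G4 partially, G3 inversely.
genes = ["PSY", "DXS", "G3", "G4"]
expr = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1], [1, 3, 2, 5, 4]]
print(top_coexpressed(expr, genes, "PSY", k=2))  # DXS (r = 1.0), then G4 (r = 0.8)
```

Centering and scaling each row once turns every correlation with the query into a single dot product, so the whole genome can be ranked with one matrix-vector multiplication.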
TOP
X67 - Human Embryonic Stem Cells Network and Reprogramming
Short Abstract:
TOP
X68 - MONGKIE: Modular Network Generation and Visualization Platform with Knowledge Integration Environment
Short Abstract: In recent years, high-throughput studies of biological systems have produced a greatly increased volume of complex and interconnected data. Given the volume and heterogeneity of these data, well-integrated network visualization together with data analysis methods is key to both understanding and analyzing them. Several useful network visualization tools exist, but integrating visualization with analysis remains a challenge. Here, we present MONGKIE (Modular Network Generation and Visualization Platform with Knowledge Integration Environment), an integrated network visualization and analysis platform that allows users to explore and analyze biological networks interactively within a knowledge integration environment. Although it is optimized for exploring binary interactions and pathways and is tightly integrated with hiPathDB (http://hipathdb.kobic.re.kr), it can easily be applied to any biological data that can be modeled as a network by using the data integration methods provided by MONGKIE. MONGKIE provides a generalized visualization data model that represents domain-specific types of biological entities and the interactions between them. It is designed for both the visualization and the analysis of networks, with seamless integration between the two. MONGKIE incorporates various knowledge integration and network analysis methods into the visualization platform, such as Import and Export, Interaction Manager, Gene ID Conversion, Expression Overlay, Network Clustering, Gene Set Enrichment Analysis of GO and Pathway, and Pathway Integration and Visualization. MONGKIE is a Java-based application with a plug-in architecture, making it platform-independent and easily extendable with additional functionality. MONGKIE is available at http://mongkie.org.
TOP
X69 - Understanding human disease through 3D protein interactome network
Short Abstract:
TOP
X70 - "Guilt by association" is the exception rather than the rule in gene networks
Short Abstract:
TOP
X71 - Systematically Model Cell-Cell Interactions in Regulating Multiple Myeloma Initiating Cell Fate
Short Abstract: Cancer initiating cells (CICs) have been documented in multiple myeloma (MM) and are believed to be a key factor that initiates and drives tumor growth, differentiation, metastasis and recurrence of the disease. Although myeloma initiating cells (MICs) are likely to share many properties of normal stem cells, the underlying mechanisms regulating the fate of MICs are largely unknown. Studies exploring how MICs communicate with their niches are urgently needed to enhance our ability to predict the fate decisions of MICs (self-renewal, proliferation and differentiation). In this study, we developed a novel system to understand the intercellular communication between MICs and their niches by seamlessly integrating experimental data with a mathematical model. We first designed the experiment and collected data for three stages, MICs (identified as 'side population' or SP cells), progenitor cells (CD138- non-SP cells) and mature myeloma cells (CD138+ non-SP cells), under various conditions, unsorted
TOP
X72 - Choosing the Best Error Metric for Parameter Estimation: A Case Study in Drosophila Gap Gene Interaction
Short Abstract: Mathematical models of biological dynamics inform research by providing explicit representations of hypotheses, characterizations of system behavior, and frameworks for regulatory inference. However, they require estimation of uncertain physicochemical parameters. Parameter estimation is driven by data: optimization methods search a physiologically constrained parameter space to minimize the error between parameterized model output and data. With semi-quantitative data (e.g., immunofluorescence intensities without calibration standards), scaling is required to account for nonlinearities between the data and the underlying concentrations of interest. The choices of error metric and scaling complicate parameter estimation, and the final choice of optimal parameters retains an element of subjectivity.

To find improved error metrics for semi-quantitative spatiotemporal data, we characterize and compare Euclidean, cosine and Chebyshev distances as well as less conventional metrics such as Kolmogorov-Smirnov statistics, Hausdorff distance, correlation of Fourier coefficients, and cross-entropy. We also examine error without scaling, with linear scaling, and with quantile normalization. Using a reaction-diffusion PDE model of gap gene interaction in Drosophila, we use these metrics to estimate genetic regulatory parameters by fitting the model to gene expression immunofluorescence data. Past modeling studies of this system estimated parameters via multiple stochastic optimizations with SSE metrics; the final parameter choice required subjective visual comparison of optimization results. For this model, we characterize the smoothness of the resulting cost surfaces and compare the results of stochastic optimization (genetic algorithms) using each metric. We find that the commonly used Euclidean metrics sometimes perform poorly compared to the others, with optimization converging to unrealistic solutions. Nonlinear scaling also smooths cost surfaces, improving the convergence behavior of stochastic searches.
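Several of the metrics and scalings compared above can be written compactly for one-dimensional expression profiles (model output m and data d as float NumPy arrays sampled at positions x along the embryo axis). The Python sketch below is illustrative only: normalizing profiles to unit mass for the Kolmogorov-Smirnov and cross-entropy terms, and the rank-based mapping used for quantile normalization, are assumptions that may differ from the study's definitions. The Fourier-coefficient metric is omitted.

```python
import numpy as np

def euclidean(m, d):
    return float(np.linalg.norm(m - d))

def cosine_distance(m, d):
    return float(1.0 - m @ d / (np.linalg.norm(m) * np.linalg.norm(d)))

def chebyshev(m, d):
    return float(np.max(np.abs(m - d)))

def ks_statistic(m, d):
    """Largest gap between cumulative sums of the two profiles, each scaled
    to unit mass (profiles treated as distributions along the axis; assumption)."""
    return float(np.max(np.abs(np.cumsum(m) / m.sum() - np.cumsum(d) / d.sum())))

def hausdorff(m, d, x):
    """Symmetric Hausdorff distance between point sets {(x, m)} and {(x, d)}."""
    pts_m = np.column_stack([x, m])
    pts_d = np.column_stack([x, d])
    dist = np.linalg.norm(pts_m[:, None, :] - pts_d[None, :, :], axis=2)
    return float(max(dist.min(axis=1).max(), dist.min(axis=0).max()))

def cross_entropy(m, d, eps=1e-12):
    """Cross-entropy of the unit-mass data profile against the unit-mass model."""
    p = d / d.sum()
    q = m / m.sum()
    return float(-(p * np.log(q + eps)).sum())

def linear_scale(m, d):
    """Least-squares a, b so that a * m + b approximates d; returns scaled model."""
    A = np.column_stack([m, np.ones_like(m)])
    (a, b), *_ = np.linalg.lstsq(A, d, rcond=None)
    return a * m + b

def quantile_normalize(m, d):
    """Rank-based mapping (assumption): each model value is replaced by the
    data value of the same rank, so the model takes on the data's distribution."""
    out = np.empty_like(m)
    out[np.argsort(m)] = np.sort(d)
    return out
```

Scaling is applied to the model before any metric is computed, for example `chebyshev(linear_scale(m, d), d)`; the rank-based mapping is nonlinear, which is the kind of scaling the abstract reports as smoothing cost surfaces.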
TOP
