Session A: (July 7 and July 8)
Session B: (July 9 and July 10)
Short Abstract: SBML is the most widely used data format for encoding and exchanging models in systems biology. The open-source JSBML project was launched in 2009 as an international collaboration aiming to provide a feature-rich Java implementation for reading, manipulating, and writing SBML files. JSBML is a stable, actively developed, and well-documented software project with many contributors around the world. A growing number of applications now use JSBML as their back end for data manipulation. These cover diverse use cases, including model building and graphical display, constraint-based modeling, dynamic simulation, and annotation. JSBML supports all levels, versions, and releases of SBML and provides numerous utility functions for working with this standard. In addition, JSBML integrates well with other Java libraries for community standards. The JSBML team actively maintains and updates the project. JSBML is used in student education and in numerous research projects. Major model databases, such as BioModels and BiGG Models, use JSBML-based tools in their curation pipelines. JSBML is also regularly the subject of international student coding events. JSBML can be freely obtained under the terms of the LGPL 2.1 from https://github.com/sbmlteam/jsbml/. The users’ guide at http://sbml.org/Software/JSBML/docs/ provides further information about using JSBML. Contact: email@example.com
Short Abstract: The DOE Systems Biology Knowledgebase (KBase) is a free, open-source software and data platform that enables researchers to collaboratively generate, test, compare, and share hypotheses about biological functions; analyze their own data along with public and collaborator data; and combine experimental evidence and conclusions to model plant and microbial physiology and community dynamics. KBase currently has over 160 analysis tools (see https://narrative.kbase.us/#appcatalog) that offer diverse scientific functionality for (meta)genome assembly, contig binning, genome annotation, sequence homology analysis, tree building, comparative genomics, metabolic modeling, community modeling, gap-filling, RNA-seq processing, and expression analysis (see Figure 1). Users can build and share sophisticated workflows by chaining together multiple apps. For example, one could predict species interactions from metagenomic data by assembling raw reads, binning assembled contigs by species, annotating genomes, aligning RNA-seq reads, and reconstructing and analyzing individual and community metabolic models. Computational experiments in KBase are saved in the form of Narratives. A finished Narrative is a complete record of every step the authors took to complete their analysis. Recording a user’s KBase activities within a sharable Narrative is a central pillar of KBase’s support for reproducible, transparent research, and it simplifies the re-purposing, re-application, and extension of scientific techniques.
Short Abstract: The ModelSEED is a leading platform for automated genome-scale metabolic reconstruction, with over 100k models constructed since its release in 2010. Here we introduce the largest ModelSEED update since its initial release. First, we are launching a new website (www.modelseed.org), which integrates functionality from the PlantSEED resource for plant model reconstruction. This new site offers improvements to the biochemistry search and model reconstruction interfaces. Additionally, prokaryotic and plant genomes may now be annotated directly on the ModelSEED site. The ModelSEED Biochemistry Database was also updated and loaded into GitHub (https://github.com/ModelSEED/ModelSEEDDatabase), enabling users to curate the existing biochemistry and submit their own additions. A major part of this update was the curation of the ModelSEED template models, fixing and expanding gene–reaction mappings. Additionally, biomass compositions were extended to include new metabolites that are essential for many organisms. We also improved the ModelSEED gap-filling algorithm to restrict the addition of thermodynamically infeasible pathways. We validated our improvements by constructing new models for a diverse set of microbial genomes and testing model accuracy in predicting growth and knockout phenotype data. These changes also affect the ModelSEED deployments in KBase and PATRIC.
Short Abstract: Finding sets of mechanisms that can explain observations in biological experiments (e.g., “How does treatment with SB431542 decrease the amount of SMURF2?”) generally involves a laborious process of gathering information and constructing and testing a hypothesis in the form of a model. We present a system in which a user interacts with a computer partner through open-ended, two-way English-language dialogue to collect information on relevant mechanisms and to construct and test a mechanistic model that explains the observations of interest. The integrated system combines natural language understanding, dialogue management, and recognition of the user’s goals with the planning and execution of the capabilities of a variety of biological reasoning agents (Bioagents). Bioagents report on relationships between drugs, their targets, and their associations with disease; between transcription factors and their targets; and on mechanistic paths between proteins in pathway databases and in networks assembled by literature mining. Bioagents also interface with automated model assembly (INDRA) and simulation systems (Kappa, BioNetGen), allowing incremental model building and queries about the model in the course of the dialogue. The dialogue system is embedded in a web-based interface that allows multimodal visualization of the model being discussed.
Short Abstract: Single-cell measurements have shown that populations of cells are intrinsically diverse in their biomolecular composition, state, and responsiveness to environmental conditions. Surprisingly, genetic variability is not necessary to establish population diversity. In fact, non-genetic sources of cell-to-cell variability (ngCCV) are a manifestation of the physical properties of cellular biochemical processes and consequently represent a general property of life at the single-cell level. Of particular interest to the biomedical community is how ngCCV contributes to pathway regulation and disease. To date, a quantitative framework that specifically attributes population diversity to the observed variability in biomolecular components is lacking. To this end, we developed a method for DEtermining Parameter Influence on Cell-to-cell variability through the Inference of Variance Explained (DEPICTIVE). Using single-cell measurements, DEPICTIVE computes the contribution of each biomolecular observable to the binary response being studied. We validated our method with both simulated data and experimental measurements of TRAIL-induced apoptosis in Jurkat cells. Our method uncovered mitochondrial abundance as a novel source of ngCCV that tunes the sensitivity of individual cells to TRAIL. Because ngCCV can manifest as diverse sensitivities to therapeutic intervention, it is an important consideration for precision medicine.
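The abstract does not spell out how DEPICTIVE computes each observable's contribution. As a generic illustration only, not the authors' algorithm, the law of total variance, Var(Y) = Var(E[Y|X]) + E[Var(Y|X)], gives one way to ask how much of the variance in a binary response a single observable explains. The sketch below approximates E[Y|X] by the response rate within equal-count bins of X; the function name `variance_explained`, the binning scheme, and the simulated cells are hypothetical.

```python
import math
import random
from statistics import mean, pvariance


def variance_explained(x, y, n_bins=10):
    """Fraction of Var(y) explained by x, i.e. Var(E[y|x]) / Var(y).

    y is a binary response (0/1). E[y|x] is approximated by the mean
    response within equal-count bins of x, so the law of total variance
    Var(y) = Var(E[y|x]) + E[Var(y|x)] holds exactly on the sample.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    n = len(order)
    fitted = [0.0] * n
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue
        rate = mean(y[i] for i in idx)  # response rate within this bin
        for i in idx:
            fitted[i] = rate
    total = pvariance(y)
    return pvariance(fitted) / total if total > 0 else 0.0


# Hypothetical single-cell data: one observable drives the response, one does not.
rng = random.Random(0)
n_cells = 5000
informative = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]
irrelevant = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]
response = [1 if rng.random() < 1.0 / (1.0 + math.exp(-4.0 * v)) else 0 for v in informative]

print(variance_explained(informative, response))  # a large fraction
print(variance_explained(irrelevant, response))   # close to zero
```

Binning keeps the sketch model-free; a fitted logistic curve of P(response | observable) could replace the bin means without changing the variance decomposition.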
Short Abstract: Chimeric proteins, comprising peptides derived from the translation of two parental genes, are produced in cancers by chromosomal aberrations. Considering discrete protein domains as binding sites for specific domains of interacting proteins, we catalogued the protein interaction networks of more than 11,000 cancer fusions to build ChiPPI (Chimeric Protein-Protein Interactions). Mapping the influence of fusion proteins on cell metabolism and protein interaction networks reveals that chimeric protein-protein interaction (PPI) networks often lose tumor suppressor proteins and gain oncoproteins. We compared ChiPPI networks across cancer phenotypes (leukemia/lymphoma, sarcoma, and solid tumors) and found distinct enrichment patterns for each disease type. While certain pathways are enriched in all three disease types (Wnt, Notch, TGF beta), there are distinct patterns for leukemia (EGF receptor, DNA replication, CCKR), sarcoma (p53 pathway, CCKR), and solid tumors (FGF and EGF signaling). We validated the predicted PPI networks using high-throughput transcriptomics and proteomics methods. More than 65% of fusions were confirmed at their unique junction sites, and more than 46% of PPI networks were altered in at least two data samples. Thus, ChiPPI represents a comprehensive tool for studying the skewed cellular networks produced by fusion proteins in different cancer types.
Short Abstract: Despite substantial experimental and computational efforts, mechanistic modeling remains more predictive in engineering than in systems biology. The reason for this discrepancy is not fully understood. Although the randomness and complexity of biological systems contribute, we hypothesize that significant and overlooked challenges arise from specific features of the single-molecule events that control crucial biological responses. Here we demonstrate why modern statistical tools for disentangling complexity and stochasticity, which assume normally distributed fluctuations or enormous datasets, do not apply to the discrete, positive, and asymmetric distributions that characterize spatiotemporal mRNA fluctuations in single cells. As an example, we integrate single-molecule measurements and advanced computational analyses to explore Mitogen-Activated Protein Kinase induction of multiple stress response genes. Through systematic comparisons of the same model to the same data, we elucidate why standard modeling approaches yield non-predictive models for single-cell gene regulation. We further explain how advanced tools recover a precise, reproducible, and predictive understanding of diverse transcription regulation mechanisms, including gene activation, polymerase initiation, elongation, mRNA accumulation, spatial transport, and degradation. Our model–data integration approach should extend to any discrete dynamic process with rare events and realistically limited data.
Short Abstract: Exaggerated cutaneous scarring is a debilitating medical problem that occurs after trauma and surgical procedures. Extreme scarring frequently leads to permanent loss of function in the scar tissue and significant disfigurement in patients. The ability to predict the scarring outcome during the early stages of the wound-healing response is key to developing successful prophylactic therapeutic interventions. We sought to computationally identify prospective protein biomarkers that would enable such predictions. Using a previously developed and validated computational model that captures the kinetics of essential cell types and proteins during injury-initiated wound healing, we generated a dataset of 120,000 simulations representing distinct wound-healing scenarios. By applying a recently published, novel computational strategy that combines data classification, protein concentration distribution analysis, and logistic regression models, we identified diagnostic and prognostic biomarkers of excessive wound scarring. Specifically, we found that increased levels of interleukin (IL)-10, tissue inhibitor of matrix metalloproteinase (TIMP)-1, and fibronectin could predict pathological scarring with an accuracy of ~80% as early as 4 weeks in advance, and with an accuracy of ~86% if the proteins are assayed 3 weeks in advance. Clinical validation of these model-predicted biomarkers may provide prognostic tools for objective, personalized clinical assessments of traumatic and surgical wounds. Disclaimer: The opinions and assertions contained herein are the private views of the authors and are not to be construed as official or as reflecting the views of the U.S. Army or of the U.S. Department of Defense. This abstract has been approved for public release with unlimited distribution.
Short Abstract: Mathematical models of cellular processes can systematically predict the phenotypes of novel combinations of multi-gene mutations. Searching for informative mutants is challenging because the number of possible combinations grows explosively. Moreover, keeping track of the genetic crosses needed to make new mutants and planning sequences of experiments become unmanageable when there are hundreds of predictions to test. We present CrossPlan, an algorithm for systematically planning genetic crosses to make a set of target mutants from a set of source mutants. We base our approach on a generic experimental workflow for performing genetic crosses in budding yeast. CrossPlan uses an integer linear program to maximize the number of target mutants that can be made under given experimental constraints. We apply our method to a comprehensive mathematical model of the protein regulatory network controlling cell division in budding yeast. The number of target mutants we can plan increases linearly with the number of batches planned. Interestingly, planning two or three batches at a time performs nearly as well as planning all batches simultaneously. The experimental workflow that underlies our work is quite generic, and our algorithm is easy to modify. Hence, our framework should be relevant to mammalian systems as well.
Short Abstract: All biological systems exhibit cell-to-cell variability, and this variability often has functional implications. To gain a thorough understanding of biological processes, the latent causes and underlying mechanisms of this variability must be elucidated. Cell populations comprising multiple distinct subpopulations are commonplace in biology, yet no current methods allow the sources of variability between and within individual subpopulations to be identified. This limits the analysis of single-cell data, for example, data obtained by flow cytometry and microscopy. We present a data-driven modeling framework for analyzing cell populations that comprise heterogeneous subpopulations. Our approach combines mixture modeling with frameworks for distribution approximation, facilitating the integration of multiple single-cell datasets and the detection of causal differences between and within subpopulations. We demonstrated the ability of our method to capture multiple levels of heterogeneity in analyses of simulated data and of data from primary sensory neurons involved in pain initiation. Our approach predicted that relative changes in TrkA and Erk1/2 expression levels, but not changes in subgroup composition, underlie the increased NGF responsiveness caused by exposure to different extracellular scaffolds.
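As a minimal sketch of the mixture-modeling ingredient only (the distribution-approximation frameworks and multi-dataset integration described above are not reproduced), the following fits a two-component Gaussian mixture to one simulated single-cell readout by expectation-maximization (EM). The Gaussian components, the two-subpopulation assumption, and all names and numbers are illustrative choices.

```python
import math
import random
from statistics import pstdev


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_subpopulations(data, n_iter=200):
    """EM for a two-component 1D Gaussian mixture.

    Returns (weights, means, sds) with components ordered by mean.
    """
    sd0 = pstdev(data)
    w, mu, sd = [0.5, 0.5], [min(data), max(data)], [sd0, sd0]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each cell
        resp = []
        for x in data:
            dens = [w[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate subpopulation weights, means and spreads
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sd[k] = max(math.sqrt(var), 1e-6)
    order = sorted(range(2), key=lambda k: mu[k])
    return [w[k] for k in order], [mu[k] for k in order], [sd[k] for k in order]


# Hypothetical readout from 1000 cells drawn from two subpopulations.
rng = random.Random(1)
cells = [rng.gauss(2.0, 0.5) for _ in range(600)] + [rng.gauss(5.0, 0.8) for _ in range(400)]
weights, means, sds = fit_two_subpopulations(cells)
print(weights, means, sds)
```

The recovered weights describe subpopulation composition and the component means and spreads describe within-subpopulation behavior, which is the distinction the abstract draws when it attributes NGF responsiveness to expression changes rather than to composition.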
Short Abstract: E-Cell System version 4 (E-Cell4) is a software environment that supports cellular simulations across multiple scales (spatial/nonspatial), algorithms (deterministic/stochastic), and platforms (operating systems, high-performance computing resource managers, and cloud computing). E-Cell4 also provides unified APIs to combine and switch between multiple algorithms independently of the model. In this poster, we introduce a new feature of E-Cell4, model annotation, which directly links a cellular model to bioinformatics databases. Data pipelines written in Python support constructing and customizing fully annotated models based on various types of databases, and model annotations allow automatic acquisition and integration of metadata. Based on these annotations of model entities (species, reactions), users can interactively access databases from Jupyter Notebook. The annotations generate useful information related to the model, e.g., descriptions of entities, formatted equations, a summary table of parameters, and a list of publications, as a publishable document at no extra cost. In addition, the rule-based model notation facilitates the natural representation and consistent integration of interactions and reactions such as protein modification and isotopic labeling. We demonstrate a notebook for modeling and simulating a metabolic network based on the KEGG pathway database. E-Cell4 is freely available at https://github.com/ecell/ecell4.
Short Abstract: CD4+ T cells provide cell-mediated immunity in response to pathogens and diseases. After activation, naïve T cells differentiate into effector T helper and regulatory subtypes. These subtypes were initially thought to be terminally differentiated; however, recent studies have observed plasticity in T cell differentiation. In this study, we developed a logic-based computational model of the signaling pathways that govern the differentiation of naïve T cells into T helper 1, 2, and 17 cells and induced Treg cells. We characterized the dynamic capacity of T cell differentiation in response to varying doses of 512 combinations of extracellular cytokines. In addition to the classical phenotypes, we predicted previously reported and novel complex T cell phenotypes in which multiple lineage-specifying transcription factors (TFs) co-exist. Our results suggest that plasticity in T cell differentiation is a function of both cytokine composition and dose. We also identified the specific patterns of extracellular environments that can lead to each T cell subtype. Based on cytokine dose, we identified the dominant stimuli that control the transition between canonical and complex phenotypes. Finally, we predicted the optimal activity of input cytokines that maximizes the activity levels of multiple lineage-specifying TFs in complex phenotypes.
Short Abstract: A long-standing modeling problem is to infer the activity level of each transcription factor (TF) in many cell samples, given the gene expression profile of each sample and a qualitative network map indicating which TFs have the potential to regulate each gene in the genome. Accurate TF-activity (TFA) inference would be useful for identifying TFs whose activity is affected by drug treatments or cancer mutations. It would also provide models for predicting the effects of knocking out or overexpressing specific combinations of TFs. We present solutions to problems that have limited the practical utility of TFA inference: (1) a method for constructing the required qualitative TF-network maps; (2) a method for exploiting samples of cells in which a TF has been genetically deleted; (3) a combination of regularization and constraints on parameters that improves both accuracy and interpretability; and (4) the first application of TFA inference to a large collection of expression profiles, on a whole-cell scale, without prior knowledge beyond the qualitative network map. Our systematic, objective, genome-scale evaluations of inferred activity levels, using real data, show that our approach works at genome scale. This opens the door to meaningful comparison of TFA inference methods and to their widespread application.
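A common formulation of TFA inference, assumed here rather than taken from the abstract, factorizes an expression matrix E (genes × samples) as C·A, where the zero pattern of the control-strength matrix C is fixed by the qualitative network map and A holds TF activities. The sketch below alternates ridge-regularized least-squares updates of A and of each gene's allowed entries of C. It is a generic network-component-style illustration, not the authors' method, and the toy network, sizes, and names (`infer_tfa`, `mask`) are hypothetical. Because C and A are only determined up to a per-TF scale, the example checks the objective rather than the recovered values.

```python
import random


def solve(M, b):
    """Solve the small dense system M x = b (Gaussian elimination, partial pivoting)."""
    n = len(M)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def objective(E, C, A, lam):
    """Squared reconstruction error of E ~ C A plus a ridge penalty."""
    T = len(A)
    err = sum((E[g][s] - sum(C[g][t] * A[t][s] for t in range(T))) ** 2
              for g in range(len(E)) for s in range(len(E[0])))
    reg = sum(v * v for row in C for v in row) + sum(v * v for row in A for v in row)
    return err + lam * reg


def infer_tfa(E, mask, n_iter=50, lam=1e-3, seed=0):
    """Alternating ridge least squares for E (genes x samples) ~ C (genes x TFs) A (TFs x samples).

    mask[g][t] is True when the map allows TF t to regulate gene g;
    every other entry of C stays at zero.
    """
    rng = random.Random(seed)
    G, S, T = len(E), len(E[0]), len(mask[0])
    C = [[rng.gauss(0, 1) if mask[g][t] else 0.0 for t in range(T)] for g in range(G)]
    A = [[rng.gauss(0, 1) for _ in range(S)] for _ in range(T)]
    history = [objective(E, C, A, lam)]
    for _ in range(n_iter):
        # A-step: (C^T C + lam I) a_s = C^T e_s for every sample s
        CtC = [[sum(C[g][i] * C[g][j] for g in range(G)) + (lam if i == j else 0.0)
                for j in range(T)] for i in range(T)]
        for s in range(S):
            a_s = solve(CtC, [sum(C[g][i] * E[g][s] for g in range(G)) for i in range(T)])
            for i in range(T):
                A[i][s] = a_s[i]
        # C-step: each gene's row, restricted to the TFs allowed by the map
        for g in range(G):
            sup = [t for t in range(T) if mask[g][t]]
            if not sup:
                continue
            M = [[sum(A[i][s] * A[j][s] for s in range(S)) + (lam if i == j else 0.0)
                  for j in sup] for i in sup]
            c_g = solve(M, [sum(E[g][s] * A[i][s] for s in range(S)) for i in sup])
            for k, t in enumerate(sup):
                C[g][t] = c_g[k]
        history.append(objective(E, C, A, lam))
    return C, A, history


# Hypothetical example: 12 genes, 3 TFs, 20 samples, noise-free expression.
rng = random.Random(42)
mask = [[t == g % 3 or (g % 2 == 0 and t == (g + 1) % 3) for t in range(3)] for g in range(12)]
C_true = [[rng.uniform(0.5, 2.0) if mask[g][t] else 0.0 for t in range(3)] for g in range(12)]
A_true = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(3)]
E = [[sum(C_true[g][t] * A_true[t][s] for t in range(3)) for s in range(20)] for g in range(12)]
C_hat, A_hat, history = infer_tfa(E, mask)
print(history[0], history[-1])
```

Each step is an exact minimization of the ridge objective in one block of variables, so the objective can only decrease; the zero pattern of C is where the qualitative network map enters.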
Short Abstract: Biochemical reaction networks are often stochastic because their reactions occur on different time scales and the participating molecular species are often present at low copy numbers. The discrete Chemical Master Equation provides a fundamental framework for studying their time-evolving and steady-state probability landscapes. Vector fields of probability velocity and flux can further characterize the time-varying and non-equilibrium steady-state properties of these systems. Here we describe a general approach for analyzing the global flow map of probability mass in the directions of all molecular species. It fully accounts for the discreteness of both states and jump reactions, and it provides an exact quantification of the vector fields along the boundaries of the state space dictated by the reaction network. We apply this approach to the toggle switch network, in which transcription and translation are both modeled explicitly. We describe the mechanism of transitions between important cellular states and examine how duplication of genes in the toggle switch affects the non-equilibrium dynamics of these transitions. We explore changes in the non-equilibrium probability landscape, the appearance of new cellular states, and changes in their locations.
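The flux picture can be illustrated in the simplest possible case: a single species produced at constant rate k and degraded at rate γn (an assumption; the toggle switch above has several species). Writing the Chemical Master Equation as dp_n/dt = J_{n−1} − J_n, with net flux J_n = k·p_n − γ(n+1)·p_{n+1} between neighboring states, the sketch integrates a truncated state space with explicit Euler steps and returns both the distribution and the fluxes. In this one-dimensional birth–death case every net flux vanishes at steady state and the distribution converges to a Poisson distribution with mean k/γ, which gives a simple check. Parameter values and names are illustrative.

```python
import math


def net_flux(p, k, gamma):
    """Net probability flux J[n] from state n to n+1: k*p[n] - gamma*(n+1)*p[n+1]."""
    return [k * p[n] - gamma * (n + 1) * p[n + 1] for n in range(len(p) - 1)]


def birth_death_cme(k=5.0, gamma=1.0, n_max=40, t_end=30.0, dt=0.005):
    """Integrate the CME of 0 -> X (rate k), X -> 0 (rate gamma*n), truncated at n_max.

    Uses dp[n]/dt = J[n-1] - J[n] with explicit Euler steps, starting from zero
    molecules. Returns the final distribution and its net fluxes.
    """
    p = [0.0] * (n_max + 1)
    p[0] = 1.0
    for _ in range(int(round(t_end / dt))):
        J = net_flux(p, k, gamma)
        inflow = [0.0] + J   # J[n-1]; nothing flows in below n = 0
        outflow = J + [0.0]  # J[n]; no births out of the truncation state n_max
        p = [p[n] + dt * (inflow[n] - outflow[n]) for n in range(n_max + 1)]
    return p, net_flux(p, k, gamma)


p, J = birth_death_cme()
poisson = [math.exp(-5.0) * 5.0 ** n / math.factorial(n) for n in range(len(p))]
print(max(abs(a - b) for a, b in zip(p, poisson)))  # distance to Poisson(k/gamma)
print(max(abs(j) for j in J))                       # steady-state net flux
```

Writing the update as inflow minus outflow conserves total probability exactly, and the same bookkeeping generalizes to multi-species state spaces, where each reaction direction contributes its own flux component.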
Short Abstract: Roots play an important role in the absorption of water and minerals, the regulation of metabolism, and overall plant growth and maintenance of homeostasis. Furthermore, roots form complex interactions with their soil microbial community. Despite its importance, root metabolism remains largely unexplored. Given the complex nature of the root system within the plant and its soil environment, a multiscale model of root metabolism and the soil microbiome community has the potential to better characterize the relationship between genotype and phenotype and to predict new interventions for crop improvement. Herein we describe a metabolic model of maize root that was developed using public transcriptomic data from 18 maize root tissues and previous maize models. This model consists of 4,917 reactions associated with 5,637 genes. To characterize interactions between maize root and soil microbes, we also developed metabolic models of seven microbes that are in a symbiotic relationship with maize. We are in the process of integrating these models into a multiscale model that will be able to describe the dynamic interactions between maize root and soil microbes. This model will be used to understand the root–microbe interplay regulating maize metabolic responses and to predict novel pathways associated with maize root exudate and rhizobiome interactions.
Short Abstract: The immune system is regulated by biological and biochemical networks integrated across multiple scales (e.g., signal transduction and metabolism). These networks exist both within individual cells and at the cell population level. To understand the dynamics of the immune system under healthy and diseased conditions, multi-scale models that fully leverage mathematical and computational tools are needed. Herein, we discuss the first step we have taken towards describing the immune system in such a computational, system-level framework, exemplified by a multi-scale model of CD4+ T lymphocytes, including naive, effector (Th1, Th2, and Th17), regulatory, and memory cells. Within this framework, the following scales of CD4+ T lymphocyte biology are integrated: metabolism (described by constraint-based models), gene regulation and signal transduction (logical model), the population level (agent-based model), and extracellular cytokine concentrations (ordinary differential equations). Furthermore, the framework is spatially organized into three compartments, namely an infection site, a draining lymph node, and the circulatory system. The model was validated by using a Monte Carlo method to reproduce known phenomena, including the phenotypic plasticity of CD4+ T lymphocytes, the effects of IL-2 on their proliferation and survival, and the effects of chronic inflammation.
Short Abstract: Background: Helicobacter pylori, although known to cause gastric cancer in 1–2% of cases, exerts beneficial effects, including protection against allergies and gastroesophageal diseases. Motivation: To examine H. pylori as a double-edged sword, both a pathogen and a beneficial organism, and to investigate the immunoregulatory responses during H. pylori infection, we used the high-performance computing (HPC)-driven ENteric Immunity SImulator (ENISI). Method: The multiscale model simulated the gut mucosal immune system (> 10^… cells). We performed simulations integrating various spatiotemporal scales, combining agent-based (ABM; tissue), ordinary differential equation (ODE; cellular), and partial differential equation (PDE; cytokine gradients) methods. The simulation data were analyzed by building a stochastic kriging metamodel over a space-filling Latin hypercube design. Spatiotemporal metamodel-based variance and partial rank correlation coefficient (PRCC)-based regression sensitivity analyses were conducted to identify the parameters influencing the initiation, peak, and recovery stages of the infection. Results: The data analytics methods identified the parameters related to epithelial cell death and epithelial cell proliferation, validating findings from experimental models, and highlighted the crucial role of IL-12 in influencing host responses to infection. Conclusion: Thus, ENISI identified factors critical for the survival of H. pylori and lesion development.
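The design and sensitivity steps named above can be sketched generically. A Latin hypercube design places exactly one sample in each of n equal-width strata of every parameter, and the partial rank correlation coefficient (PRCC) of a parameter is the correlation between the rank-transformed parameter and the rank-transformed output after linearly regressing out the ranks of all other parameters. The toy model below is a hypothetical stand-in for an ENISI simulation, as are all names; the stochastic kriging metamodel is not reproduced.

```python
import math
import random


def latin_hypercube(n_samples, n_params, rng):
    """One point per stratum of [0, 1) for every parameter, strata shuffled independently."""
    columns = []
    for _ in range(n_params):
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    return [list(row) for row in zip(*columns)]


def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r


def solve(M, b):
    """Solve the small dense system M x = b (Gaussian elimination, partial pivoting)."""
    n = len(M)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def residuals(y, covariates):
    """Residuals of an ordinary least-squares fit of y on the covariates plus an intercept."""
    X = [[1.0] + [c[i] for c in covariates] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    return [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def prcc(samples, outputs):
    """Partial rank correlation coefficient of each parameter with the output."""
    k = len(samples[0])
    rx = [ranks([s[j] for s in samples]) for j in range(k)]
    ry = ranks(outputs)
    result = []
    for j in range(k):
        others = [rx[m] for m in range(k) if m != j]
        result.append(pearson(residuals(rx[j], others), residuals(ry, others)))
    return result


rng = random.Random(7)
design = latin_hypercube(300, 3, rng)
# Hypothetical model output: x0 raises it, x1 lowers it, x2 has no effect.
output = [2.0 * x0 - x1 + 0.05 * rng.gauss(0, 1) for x0, x1, x2 in design]
print(prcc(design, output))
```

Working on ranks makes PRCC suited to monotone but nonlinear parameter effects, which is why it is commonly paired with Latin hypercube sampling in simulation sensitivity studies.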
Short Abstract: The spontaneous self-assembly of molecules into functional complexes is central to all major cellular processes, yet self-assembly chemistry has only slowly been incorporated into systems biology modeling. This is largely due to the substantial computational and experimental challenges of self-assembly modeling, simulation, and model inference compared with simpler enzymatic and transport networks. Rule-based stochastic simulation has provided a way to model and tractably simulate even highly complicated self-assembly reaction networks and to learn model parameters from experimental data via simulation-based model fitting. Nonetheless, large parameter spaces, the high computational cost of simulations, and limited experimental data have so far precluded the use of Bayesian methods for characterizing the uncertainty of model fits, even though such methods have become standard for most other systems biology model inference. In the present work, we improve on prior data-driven model inference for self-assembly systems in two directions: 1) extending data fitting to encompass small-angle scattering (SAS), a richer experimental data source than the static light scattering (SLS) used in prior work, and 2) developing an efficient Bayesian optimization framework that learns Gaussian process models as surrogate functions to capture model uncertainty. We demonstrate and validate the approach on synthetic SAS data for a virus capsid assembly model.
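A minimal sketch of Gaussian-process-based Bayesian optimization, assuming a one-parameter search on [0, 1], a squared-exponential kernel with fixed hyperparameters, a prior mean equal to the mean of the observed losses, and the expected-improvement acquisition rule. The quadratic loss is a hypothetical stand-in for the expensive simulate-and-compare-to-SAS-data objective, and all function names are illustrative; the authors' surrogate construction and acquisition strategy may differ.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K):
    """Lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][m] * x[m] for m in range(i))) / L[i][i])
    return x


def solve_upper(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(L[m][i] * x[m] for m in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(xs, ys, noise=1e-6):
    """Condition a GP (prior mean = mean of ys) on the observations."""
    y_mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper(L, solve_lower(L, [y - y_mean for y in ys]))
    return L, alpha, y_mean


def gp_predict(xs, model, x):
    """Posterior mean and standard deviation at x."""
    L, alpha, y_mean = model
    k_star = [rbf(x, a) for a in xs]
    v = solve_lower(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 1e-12)
    return y_mean + sum(k * a for k, a in zip(k_star, alpha)), math.sqrt(var)


def expected_improvement(mean, sd, best):
    """Expected reduction below the best loss seen so far."""
    z = (best - mean) / sd
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + sd * pdf


def bayes_opt(loss, n_iter=12):
    xs = [0.1, 0.5, 0.9]                  # initial design
    ys = [loss(x) for x in xs]
    grid = [i / 200 for i in range(201)]  # candidate parameter values
    for _ in range(n_iter):
        model = gp_fit(xs, ys)
        best = min(ys)
        candidates = [x for x in grid if x not in xs]
        x_next = max(candidates,
                     key=lambda x: expected_improvement(*gp_predict(xs, model, x), best))
        xs.append(x_next)
        ys.append(loss(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


# Hypothetical stand-in for an expensive simulation-vs-data misfit.
x_best, y_best = bayes_opt(lambda theta: (theta - 0.37) ** 2)
print(x_best, y_best)
```

The GP posterior standard deviation is what expresses model uncertainty here: expected improvement trades it off against the predicted loss, so only a handful of expensive simulations are spent on each decision.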
Short Abstract: Signal transduction networks, such as the cell division cycle, are prone to combinatorial complexity. While the number of microstates in such a system increases exponentially, the empirical data describing these states tend to be scarce. These two characteristics challenge mathematical descriptions in terms of scalability and data congruence. We developed a large-scale, mechanistically detailed, and executable bipartite Boolean network of the cell cycle in Saccharomyces cerevisiae. We based this network on the reaction-contingency language, which scales with and captures the measured elemental states. Analyzing the attractors of this Boolean network enables the study of phenotypes that lead to normal or abnormal growth of a cell. Determining cyclic attractors in such a Boolean network is NP-hard and hence cannot be done exhaustively for a network of this size. We address this challenge by using partial information to reduce the number of trials in a heuristic search. We use this method to study the behavior of a reduced version of the original network. This analysis enables us to explore which components in the network control the cyclic behavior of the cell cycle network. In the future, analysis tools for mechanistically detailed Boolean models will enable the development of whole-cell models and, ultimately, personalized medicine.
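The heuristic above uses partial information to cut down the number of trials; that refinement is not reproduced here. The sketch below shows only the baseline it improves on, assuming synchronous updates: simulate the Boolean network from random initial states until a state repeats, and record the cycle that is reached. The three-node repressor ring is a hypothetical toy, not the yeast cell cycle network; under synchronous updating it has no fixed points and two cyclic attractors, of lengths 2 and 6.

```python
import random


def step(state, rules):
    """Synchronous update: every node reads the same current state."""
    return tuple(int(rule(state)) for rule in rules)


def find_attractor(state, rules):
    """Iterate until a state repeats; return the cycle in a canonical rotation."""
    first_seen = {}
    trajectory = []
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, rules)
    cycle = trajectory[first_seen[state]:]
    start = cycle.index(min(cycle))
    return tuple(cycle[start:] + cycle[:start])


def sample_attractors(rules, n_nodes, n_trials=50, seed=0):
    """Random-restart search: count how often each attractor is reached."""
    rng = random.Random(seed)
    found = {}
    for _ in range(n_trials):
        start = tuple(rng.randint(0, 1) for _ in range(n_nodes))
        attractor = find_attractor(start, rules)
        found[attractor] = found.get(attractor, 0) + 1
    return found


# Hypothetical three-gene repressor ring: A -| B -| C -| A.
ring = [
    lambda s: not s[2],  # A is on unless C represses it
    lambda s: not s[0],  # B is on unless A represses it
    lambda s: not s[1],  # C is on unless B represses it
]
attractors = sample_attractors(ring, 3)
for cycle, hits in attractors.items():
    print(len(cycle), hits, cycle)
```

Random restarts can miss attractors with small basins, and each trial may walk a long transient before cycling; both costs grow quickly with network size, which is what motivates smarter, information-guided search.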
Short Abstract: mRNA transfection is the process of introducing mRNA into a living cell. mRNA delivery is becoming increasingly attractive for biomedical applications because it enables the treatment of diseases through targeted expression of proteins, and because it is transient, avoiding the risk of permanent integration into the genome. Despite its potential for treating diseases, many parameters of mRNA transfection remain unknown. We study mRNA transfection at the single-cell level. To that end, we model its dynamics with diffusion approximations of the discrete-state processes. Several models address different aspects of the system, e.g., enzymatic degradation of the mRNA or ribosomal binding to mRNA for translation. The corresponding diffusion processes are equivalently described by stochastic differential equations (SDEs). Based on data from time-lapse fluorescence microscopy, we estimate the SDE model parameters. As observations are usually available only at low frequency, we use a Markov chain Monte Carlo algorithm that employs Bayesian data imputation and can also handle latent variables and measurement error. We compare our approach to a recently published one based on ordinary differential equations (ODEs) and investigate, e.g., to what extent identifiability problems in the ODE setting can be overcome by our SDE approach.
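To illustrate the kind of SDE such a diffusion approximation produces, the sketch below assumes a two-variable model: transfected mRNA m degrades at rate δ, and protein p is translated at rate k·m and degraded at rate β. Its chemical Langevin approximation, dm = −δm dt + sqrt(δm) dW1 and dp = (km − βp) dt + sqrt(km + βp) dW2, is simulated with the Euler–Maruyama scheme. The model, parameter values, and function name are illustrative assumptions, and only forward simulation is shown, not the MCMC parameter estimation with data imputation described above.

```python
import math
import random


def simulate_translation(m0=100.0, delta=0.3, k=2.0, beta=0.1,
                         t_end=10.0, dt=0.001, noise=True, seed=0):
    """Euler-Maruyama for a chemical Langevin model of mRNA decay and translation.

    Returns (m, p) at t_end. With noise=False the scheme reduces to forward
    Euler for the deterministic ODE model.
    """
    rng = random.Random(seed)
    m, p = m0, 0.0
    sq = math.sqrt(dt)
    for _ in range(int(round(t_end / dt))):
        dm = -delta * m * dt               # drift: mRNA degradation
        dp = (k * m - beta * p) * dt       # drift: translation minus protein decay
        if noise:
            # diffusion terms: square roots of the summed propensities
            dm += math.sqrt(delta * m) * sq * rng.gauss(0.0, 1.0)
            dp += math.sqrt(k * m + beta * p) * sq * rng.gauss(0.0, 1.0)
        m = max(m + dm, 0.0)               # keep copy numbers non-negative
        p = max(p + dp, 0.0)
    return m, p


# Deterministic check against p(t) = k*m0*(exp(-delta*t) - exp(-beta*t)) / (beta - delta).
m_det, p_det = simulate_translation(noise=False)
p_exact = 2.0 * 100.0 * (math.exp(-3.0) - math.exp(-1.0)) / (0.1 - 0.3)
print(p_det, p_exact)
print(simulate_translation(seed=3))  # one stochastic single-cell trajectory endpoint
```

Repeated stochastic runs give the cell-to-cell spread that the SDE likelihood has to account for, which is the information a purely deterministic ODE fit discards.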
Short Abstract: Acute myeloid leukemia is one of the most common hematological malignancies and is characterized by high relapse and mortality rates; its inherent intra-tumor heterogeneity between subpopulations of cells is thought to play an important role in disease recurrence and resistance to chemotherapy. Current experimental methods are often insufficient to quantify and assess the dynamics of these subpopulations. To overcome this limitation, we introduce a novel modeling and simulation framework that accounts for the inherent stochasticity of cell division events to investigate the possible occurrence of different subpopulations of cell types in acute myeloid leukemia, notably leveraging experimental data derived from human xenografts in mice. Our results highlight the role played by quiescent cells, as well as by proliferating cells with different division rates, in the progression and evolution of the disease, suggesting the need to further characterize tumor cell subpopulations.
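A minimal Gillespie (stochastic simulation algorithm) sketch of division stochasticity with two subpopulations, assuming proliferating cells P that divide (rate b), die (rate d), and enter quiescence (rate q), and quiescent cells Q that re-enter the cycle (rate a). These reactions, rates, and names are hypothetical and much simpler than the authors' framework; the point is that each division or transition is a random event, so repeated runs yield a distribution of subpopulation sizes rather than a single trajectory.

```python
import random


def simulate_subpopulations(p0, q0, b, d, q, a, t_end, seed=0):
    """Gillespie simulation of proliferating (P) and quiescent (Q) cells.

    Reactions: P -> 2P (rate b*P), P -> 0 (rate d*P),
               P -> Q (rate q*P), Q -> P (rate a*Q).
    Returns the cell counts (P, Q) at time t_end.
    """
    rng = random.Random(seed)
    t, P, Q = 0.0, p0, q0
    while True:
        rates = [b * P, d * P, q * P, a * Q]
        total = sum(rates)
        if total == 0:
            break                          # nothing left that can change
        t += rng.expovariate(total)        # waiting time to the next event
        if t > t_end:
            break
        r = rng.random() * total           # choose which event happens
        if r < rates[0]:
            P += 1                         # division
        elif r < rates[0] + rates[1]:
            P -= 1                         # death
        elif r < rates[0] + rates[1] + rates[2]:
            P, Q = P - 1, Q + 1            # entry into quiescence
        else:
            P, Q = P + 1, Q - 1            # re-entry into the cycle
    return P, Q


# Hypothetical rates (per day) for a starting population of 50 proliferating cells.
runs = [simulate_subpopulations(50, 0, b=0.7, d=0.3, q=0.2, a=0.1, t_end=10.0, seed=s)
        for s in range(5)]
print(runs)
```

Comparing the spread of (P, Q) across seeds with measured subpopulation sizes is one way such a model can be confronted with xenograft data.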
Short Abstract: A key challenge in systems biology is elucidating the molecular pathways that regulate cellular phenotype in time series experiments in which biological tissues or cells are exposed to stressors. There may be undiscovered interactions between genes or proteins in response to a given stressor that can only be identified by a novel approach to assessing gene regulatory network dynamics. We propose a novel framework based on statistical mechanical principles for the systems analysis and interpretation of molecular omics data. Specifically, we propose network signaling entropy (or uncertainty) as a means of elucidating novel interactions that will provide insights into underlying basic biology and into disease and repair mechanisms. We describe how network signaling entropy can discriminate cells according to their distinct states of injury or repair in a time series transcriptomic analysis. Our analyses indicate that network signaling entropy decreases in response to inflammatory stimulation, suggesting that entropy can be used to identify novel regulatory elements mediating inflammatory injury and post-injury repair. We thus propose network signaling entropy as a powerful approach for understanding signaling promiscuity during tissue injury, repair, and regeneration.
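The abstract does not give a formula. One widely used definition, assumed here in simplified form, weights a random walk on a protein interaction network by expression: from node i, the walk moves to neighbor j with probability p_ij = x_j / Σ_{k∈N(i)} x_k, and the local entropy S_i = −Σ_j p_ij log p_ij is divided by log(deg i), so that 1 means signaling is spread evenly over all neighbors. The sketch averages these normalized local entropies without the stationary-distribution weighting used in some formulations; the toy network, expression values, and function names are hypothetical.

```python
import math


def local_entropies(adj, expr):
    """Normalized local signaling entropy of every node with at least two neighbors.

    adj: node -> list of neighbors; expr: node -> positive expression level.
    """
    out = {}
    for i, nbrs in adj.items():
        if len(nbrs) < 2:
            continue  # a single neighbor leaves no uncertainty to normalize
        total = sum(expr[j] for j in nbrs)
        probs = [expr[j] / total for j in nbrs]  # expression-weighted random walk
        s = -sum(p * math.log(p) for p in probs if p > 0)
        out[i] = s / math.log(len(nbrs))         # 1.0 = maximally promiscuous
    return out


def signaling_entropy(adj, expr):
    """Unweighted mean of the normalized local entropies (a simplification)."""
    s = local_entropies(adj, expr)
    return sum(s.values()) / len(s)


# Hypothetical toy interaction network: a hub H plus one extra edge A-B.
adj = {
    "H": ["A", "B", "C", "D"],
    "A": ["H", "B"],
    "B": ["H", "A"],
    "C": ["H"],
    "D": ["H"],
}
uniform = {node: 1.0 for node in adj}
skewed = dict(uniform, A=10.0)  # one strongly induced gene
print(signaling_entropy(adj, uniform), signaling_entropy(adj, skewed))
```

Concentrating expression on a few interaction partners lowers the entropy, which is the direction of change the abstract reports under inflammatory stimulation.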
Short Abstract: The course and outcome of gastrointestinal infections depend on the complex interplay of pathogens, their virulence and fitness factors, the host immune response, and the presence and composition of the endogenous microbiome. Expansion of pathogens within the gastrointestinal tract implies an increased risk of severe systemic infections, especially in patients receiving antibiotic treatment or in an immunocompromised state. We developed a computational model to predict pathogen expansion, gut colonization, and infection outcome. To implement and challenge the model, we used oral mouse infection experiments with the enteropathogen Yersinia enterocolitica (Ye). Our model calculates bacterial population dynamics during gastrointestinal infection and accounts for specific pathogen characteristics, the host immune capacity, and colonization resistance mediated by the endogenous microbiome. We calibrated the model to experimental data obtained by infecting a healthy host. Afterward, we challenged our model with scenarios in which either the microbiome was lacking (mimicking antibiotic treatment of patients) or the immune response was partially impaired. Experimental mouse infections confirmed the population dynamics predicted for these scenarios. Our model provides new hypotheses about the roles of host- and pathogen-derived factors and might be useful for developing personalized infection prevention and treatment strategies.
Short Abstract: Whole-cell (WC) models are needed to guide medicine and bioengineering. These models require data about each gene, RNA, protein, complex, and reaction. Unfortunately, these data are hard to collect because they are scattered across repositories and articles; described with different formats, identifiers, and units; and obtained from different methods, organisms, and conditions. To accelerate WC modeling, we developed Datanator, an integrated database, search engine, and web interface for modeling data. The database includes metabolite, RNA, and protein concentrations; protein complex subunit compositions; and rate laws and kinetic constants from ArrayExpress, CORUM, ECMDB, PaxDB, and SABIO-RK. The search engine finds data for modeling specific compounds, reactions, organisms, and environments, including data from similar compounds, reactions, organisms, and environments. The web interface helps modelers explore the database. In addition to using Datanator to build a WC model of Mycoplasma pneumoniae, we have shown that Datanator can find missing parameters for ODE models, augment FBA models with kinetic bounds, and recalibrate models to similar organisms. We believe that Datanator will accelerate WC modeling and enable more predictive models. To further accelerate WC modeling, we plan to integrate additional data sources into Datanator and to integrate Datanator with model design tools.
Short Abstract: Human milk oligosaccharides (HMOs) are abundant, functional components of human milk that affect infant health and development. Their biosynthesis in the human mammary gland remains elusive despite nearly half a century of investigation. Here we have developed a framework for resolving ambiguous enzymes and metabolites in this process. Our approach uses metabolic data to construct candidate metabolic models, which are then scored and selected based on their consistency with transcriptomic data. We start from a generic metabolic network describing all feasible biosynthetic pathways of 34 potential HMO structures related to the 16 most abundant oligosaccharides in human milk (together >97% by weight). Through the integration of HMO glycoprofiling and transcriptomics, our modeling approach identifies the most likely structures of uncharacterized HMOs, the biosynthetic reactions associated with those HMOs, and candidate genes for the elongation, branching, fucosylation, and sialylation of HMOs. These results provide a molecular basis for HMO biosynthesis and can therefore guide new strategies for HMO synthesis for academic and nutritional use. Most notably, we observe unique metabolic activity as a function of maternal blood type; the glycosyltransferases that determine blood type are also believed to influence HMO biosynthesis.
Short Abstract: It is hoped that combination therapies will overcome the challenge of emerging cancer drug resistance by blocking malicious acquired bypass mechanisms. Because the space of possible combinations is vast, undirected testing will not suffice to identify effective ones. Computational network approaches have previously been used to successfully predict effective drug combinations (e.g. Flobak, 2015). Many of these rely on data from perturbation experiments, which cannot be transferred to a clinical setting. Here we present the use of logical modeling informed by baseline molecular data from unperturbed systems. We manually curated a prior knowledge network of 144 nodes encompassing 19 drug targets, which we screened experimentally in single and double perturbations across eight human cancer cell lines of different origins. Cell-line-specific logical models were tailored to agree with transcriptomic and literature-curated baseline data. Our in silico modeling experiments suggest that network refinement accounting for subtle, biologically founded mechanisms increases the models’ predictive capability, and that it is more important to accurately describe the activity of nodes with high rather than low out-degree. Our work implies that models informed by baseline data could be used to economize screening efforts by enriching screening designs for beneficial drug combinations.
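To illustrate why a single inhibitor can be bypassed while a combination cannot, the Python sketch below simulates a hypothetical six-node Boolean network with synchronous updates and models drug inhibition by clamping a node to False. The node names, rules, and update scheme are illustrative assumptions; they are not the 144-node network or the logical modeling software used in this work.

```python
def find_attractor(rules, initial, inhibited=frozenset(), max_steps=1000):
    """Synchronously update a Boolean network until a state repeats; return the attractor.

    rules: dict node -> function(state dict) -> bool
    inhibited: nodes clamped to False, a simple stand-in for drug inhibition
    """
    state = {n: False if n in inhibited else bool(initial[n]) for n in rules}
    seen, history = {}, []
    for step in range(max_steps):
        key = tuple(sorted(state.items()))
        if key in seen:
            return history[seen[key]:]  # states forming the cycle (length 1 = fixed point)
        seen[key] = step
        history.append(state)
        state = {n: False if n in inhibited else bool(rule(state)) for n, rule in rules.items()}
    raise RuntimeError("no attractor reached within max_steps")


# Hypothetical toy network with two redundant routes to proliferation (a "bypass")
rules = {
    "Receptor": lambda s: True,
    "A1": lambda s: s["Receptor"],
    "A2": lambda s: s["A1"],
    "B1": lambda s: s["Receptor"],
    "B2": lambda s: s["B1"],
    "Prolif": lambda s: s["A2"] or s["B2"],
}
start = {n: False for n in rules}


def proliferates(inhibited):
    return all(s["Prolif"] for s in find_attractor(rules, start, frozenset(inhibited)))


print(proliferates([]), proliferates(["A1"]), proliferates(["A1", "B1"]))  # True True False
```

Inhibiting one branch leaves proliferation on through the other, while inhibiting both branches switches it off, which is the qualitative behavior that combination screens look for.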
Short Abstract: Transcriptional regulatory networks (TRNs) provide insight into cellular behavior by describing interactions between transcription factors (TFs) and their target genes. The Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq), coupled with TF motif analysis, provides indirect evidence of chromatin binding for hundreds of TFs. Here, we propose modified LASSO regression with StARS model selection for TRN inference in a mammalian setting, using ATAC-seq data to guide gene expression modeling. We rigorously test our methods in the context of T helper 17 (Th17) cell differentiation, generating new ATAC-seq data to complement existing Th17 genomic resources (plentiful gene expression data, 25 TF knockouts, and 9 TF ChIP-seq experiments). In this resource-rich mammalian setting, we undertake a quantitative, genome-scale evaluation of our methods. In addition to context-specific ATAC-seq, we evaluate generic sources of prior information from a curated database and from other cell types. We refine and extend our previous Th17 TRN, using our new TRN inference methods to integrate all Th17 data and highlighting new TFs in Th17 gene regulation. Given the popularity of ATAC-seq, which offers high resolution with low sample-input requirements, our methods will improve TRN inference in new mammalian systems, especially in vivo, for cells taken directly from humans and animal models.
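The abstract does not detail how the LASSO is modified. One common way to let prior evidence influence a sparse regression is to give each predictor its own penalty weight, lowering the penalty for TFs whose binding is supported by, for example, ATAC-seq motif accessibility. The Python sketch below assumes that form and solves it by coordinate descent; the function name and toy data are hypothetical, and StARS selection of the penalty strength is not shown.

```python
import math


def prior_weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam * sum_j weights[j] * |b_j|.

    X: list of n rows with p predictors (e.g. TF activities); y: target gene expression.
    weights[j] < 1 lowers the penalty for a TF with prior support (an assumption here).
    Assumes no all-zero predictor columns.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of predictor j with the partial residual that excludes predictor j
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            # Soft-thresholding with a predictor-specific threshold lam * weights[j]
            b[j] = math.copysign(max(abs(rho) - lam * weights[j], 0.0), rho) / col_sq[j]
    return b


X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]  # two orthogonal toy predictors
y = [2, 0, 0, -2]                         # y = x0 + x1
print(prior_weighted_lasso(X, y, lam=1.5, weights=[1.0, 1.0]))  # [0.0, 0.0]
print(prior_weighted_lasso(X, y, lam=1.5, weights=[0.2, 1.0]))  # about [0.7, 0.0]
```

With equal weights the penalty removes both predictors, whereas lowering the penalty on the prior-supported predictor lets it stay in the model.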
Short Abstract: Insights into cancer-specific metabolism gained from genome-scale metabolic models (GEMs) have been used to identify potential therapeutic agents and drug targets. Drug repositioning is also of utmost importance in cancer drug discovery. We aimed to reconstruct a generic prostate cancer (PRAD)-specific model, both to explore PRAD metabolism and to repurpose existing drugs as new therapeutic agents. RNA-Seq data for 495 PRAD patients and 52 noncancerous prostate samples were retrieved from The Cancer Genome Atlas, together with proteome data from the Human Protein Atlas v18. In addition, all personalized GEMs based on PRAD transcriptomes were obtained from the Human Pathology Atlas to reconstruct a generic model covering individual variation as well as proteome and transcriptome data. The tINIT algorithm and the reporter metabolites algorithm, both in the RAVEN toolbox, were used to reconstruct the model and to identify reporter metabolites, respectively. Differentially expressed genes in the PRAD-specific metabolic model were used as metabolic signatures for drug repurposing. Gene expression profiles from CMap2 were analyzed and statistically evaluated. As a result, eleven novel drug candidates were identified for repurposing in PRAD. The reversal effects of these drug candidates are still under investigation using the PRAD-specific GEM.
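The core scoring step of the reporter metabolites algorithm, as described by Patil and Nielsen, converts each gene's differential-expression p-value into a Z-score and combines the Z-scores of the k genes around a metabolite as their sum divided by sqrt(k). The Python sketch below shows only that step, with a hypothetical toy gene-metabolite mapping; it is not the RAVEN (MATLAB) implementation and omits the background correction against random gene sets of the same size.

```python
import math
from statistics import NormalDist


def reporter_metabolite_scores(gene_pvalues, metabolite_genes):
    """Aggregate gene-level p-values into uncorrected metabolite Z-scores.

    gene_pvalues: dict gene -> p-value
    metabolite_genes: dict metabolite -> genes whose enzymes use that metabolite
    Background correction by random gene sets is omitted in this sketch.
    """
    std_normal = NormalDist()
    scores = {}
    for metabolite, genes in metabolite_genes.items():
        # Z_i = inverse standard normal CDF of (1 - p_i); p is clipped to keep inv_cdf finite
        zs = [std_normal.inv_cdf(1.0 - min(max(gene_pvalues[g], 1e-15), 1.0 - 1e-15))
              for g in genes if g in gene_pvalues]
        if zs:
            # Z_metabolite = sum(Z_i) / sqrt(k) over the k neighbouring genes
            scores[metabolite] = sum(zs) / math.sqrt(len(zs))
    return scores


pvals = {"g1": 0.025, "g2": 0.025, "g3": 0.5}
neighbours = {"m_high": ["g1", "g2"], "m_null": ["g3"]}  # hypothetical toy mapping
print(reporter_metabolite_scores(pvals, neighbours))  # m_high about 2.77, m_null about 0.0
```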
Short Abstract: Coagulation and fragmentation (CF) is a fundamental process in which particles attach to each other to form clusters while existing clusters break into smaller ones. This ubiquitous process plays significant roles in biological problems such as brain shrinkage, Alzheimer’s disease, and amyloid-beta aggregation in neurodegenerative disease. CF often occurs in confined spaces with a limited number of particles, so the system can be highly stochastic. A fundamental approach to investigating CF is to solve the underlying discrete Chemical Master Equation (dCME), which provides exact descriptions of both the time-evolving and the steady states of the CF system. Recent theoretical models based on the dCME do not simultaneously account for attachment, detachment, synthesis, and degradation, together with the effects of dimensionality. We use the newly developed Accurate Chemical Master Equation (ACME) method to solve the underlying dCME of the CF process and examine the time-evolving dynamics of the CF system at different attachment, detachment, synthesis, and degradation rates. We demonstrate how these factors can have profound effects on the CF process.
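The ACME method itself is not described in the abstract. To make the idea of solving a dCME concrete, the Python sketch below writes the master equation for a deliberately tiny, hypothetical CF system (monomers and dimers only, with mass-action propensities for attachment, detachment, synthesis, and degradation) and finds its steady state by capping the total particle count and iterating the uniformized Markov chain. The cap is a crude truncation with no error bound, so this is a naive stand-in rather than ACME.

```python
def truncated_cf_steady_state(s, d, a, f, cap, iters=5000):
    """Steady state of a small truncated CME with monomers (m) and dimers (k).

    Reactions and propensities (hypothetical toy system):
      0 -> M      s            (synthesis; blocked once m + 2k == cap)
      M -> 0      d*m          (degradation)
      M + M -> D  a*m*(m-1)/2  (attachment)
      D -> M + M  f*k          (detachment)
    """
    states = [(m, k) for k in range(cap // 2 + 1) for m in range(cap - 2 * k + 1)]

    def reactions(m, k):
        out = [((m + 1, k), s if m + 2 * k < cap else 0.0),
               ((m - 1, k), d * m),
               ((m - 2, k + 1), a * m * (m - 1) / 2),
               ((m + 2, k - 1), f * k)]
        return [(target, rate) for target, rate in out if rate > 0]

    trans = {st: reactions(*st) for st in states}
    # Uniformization: discrete chain with step probabilities rate / lam (lam > max exit rate)
    lam = 1.05 * max(sum(rate for _, rate in ts) for ts in trans.values())
    p = {st: 0.0 for st in states}
    p[(0, 0)] = 1.0  # start with no particles
    for _ in range(iters):
        nxt = {st: 0.0 for st in states}
        for st, prob in p.items():
            if prob == 0.0:
                continue
            stay = 1.0
            for target, rate in trans[st]:
                nxt[target] += prob * rate / lam
                stay -= rate / lam
            nxt[st] += prob * stay
        p = nxt
    return p


p = truncated_cf_steady_state(s=2.0, d=1.0, a=0.0, f=1.0, cap=10)
print(p[(1, 0)] / p[(0, 0)])  # about 2.0 = s/d when attachment is switched off
q = truncated_cf_steady_state(s=2.0, d=1.0, a=0.5, f=1.0, cap=10)
print(sum(prob for (m, k), prob in q.items() if k > 0))  # probability that a dimer is present
```

Without attachment the monomer count follows a truncated Poisson distribution, so the ratio of the first two probabilities equals s/d, which is a useful sanity check before switching attachment on.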
Short Abstract: Bioinformatics techniques for analyzing time-course bulk and single-cell omics data are advancing. However, because the true dynamics of molecular changes are unknown in real data, benchmarking the performance of these techniques is difficult. Realistic simulated time-course datasets are therefore essential for assessing time-course bioinformatics algorithms. We developed an R/Bioconductor package, CancerInSilico, to simulate bulk and single-cell transcriptional data from a known ground truth obtained from mathematical models of cellular systems. The core model of the package is an off-lattice, cell-center Monte Carlo model of cellular growth. We adapt this model to simulate the impact of growth suppression by targeted therapeutics in cancer and benchmark the simulations against bulk in vitro experimental data. Parameter sensitivity is evaluated and used to predict the relative impact of variation in cellular growth parameters and cell types on tumor heterogeneity in therapeutic response.
Short Abstract: Whole-cell (WC) computational models of human cells are a central goal of systems biology. WC models could help researchers understand cell biology and help physicians treat disease. Ongoing technological advances in experimentation and modeling are enhancing the feasibility of WC models. However, progress toward WC models remains slow. To identify the bottlenecks to WC modeling and develop a long-term plan to achieve human WC models, we surveyed the biomodeling community, reviewed the literature, and reflected on our experience prototyping WC models of bacteria. We identified four major bottlenecks: a) inadequate experimental methods and data repositories; b) inadequate tools for designing, describing, simulating, calibrating, and validating large models; c) few models of individual processes that can be combined into WC models; and d) insufficient coordination within the biomodeling community. Further, we propose a project, termed the Human Whole-Cell Modeling Project, which would overcome these bottlenecks and achieve the first human WC models. The cornerstones of the project include developing computational technologies for scalably building and simulating models, developing standard protocols and formats for collaborative modeling, collaboratively building models as a community, and focusing on a single cell line. We invite the community to join this exciting and ambitious effort.
Short Abstract: Whole-cell modeling has been one of the grand challenges of the post-genomic era. However, it remains very difficult to model a cell sustainably and to simulate it predictively. Here, we present a novel framework for automatic bottom-up modeling from a genomic sequence and for genome-scale simulation of prokaryotic cells at single-molecule and single-nucleotide resolution. As an example, we demonstrate a whole-cell simulation of Escherichia coli. The software (1) accepts a genomic sequence, (2) automatically annotates genomic regions (e.g. operons, open reading frames, and protein domains) based on various databases, (3) generates a whole-cell model consisting of gene expression, protein modification, metabolism, and replication, and (4) simulates the stochastic agent-based model, representing individual molecules and events at single-nucleotide resolution. It can directly evaluate the effects of mutations and synthetic genomes simply by editing DNA sequences, without requiring detailed knowledge of the mathematical model. To evaluate predictive power, the simulation enables genome-scale computational experiments (in silico omics) that can be compared quantitatively with wet-lab experiments such as RNA-Seq and ChIP-seq. Integrating bioinformatics and systems biology on the basis of a genomic sequence enables a sustainable approach to whole-cell modeling and bridges the gap between computational and experimental biology.
Short Abstract: DNA constantly undergoes double-strand breaks (DSBs), which result in cell death if they are not repaired. An early step in DSB repair is the phosphorylation of H2A histones (called gamma-H2AX) around the break site, extending up to 50 kilobases from the DSB in S. cerevisiae. The kinases Mec1 and Tel1 in S. cerevisiae (ATR and ATM in mammals) are responsible for the phosphorylation of H2A and are known to bind to the DSB site. We aim to understand how these histone modifications propagate along the chromosome from the break site. We create mathematical models of several potential propagation mechanisms, in which the kinases reach distant H2As by (1) sliding along the chromosome, (2) diffusing in 3D from the break site to the H2As, or (3) chromatin looping that brings a DSB-bound kinase into contact with a distant H2A. For each model, we derive the probability of H2A phosphorylation as a function of the distance from the DSB and the time since the DSB formed. We quantitatively compare these theories with chromatin immunoprecipitation measurements of the kinetics of H2A phosphorylation in S. cerevisiae. We find that Tel1 slides along the chromosome and that Mec1 likely slides as well.
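For the sliding mechanism, a textbook first-passage result gives one possible form of the distance- and time-dependent phosphorylation probability: a particle diffusing in 1D from the break with coefficient D reaches distance x by time t with probability erfc(|x| / (2 sqrt(D t))). The Python sketch below assumes a single kinase, unbiased sliding without unbinding, and phosphorylation on first contact; it is not the authors' derived model, and the value of D is hypothetical.

```python
import math


def p_visited_by_sliding(distance_bp, t_s, D_bp2_per_s):
    """Probability that a kinase sliding by 1D diffusion from the DSB (position 0)
    has reached a site at distance_bp by time t_s: erfc(|x| / (2 * sqrt(D * t))).

    Simplifying assumptions: one kinase, free unbiased 1D diffusion, no unbinding,
    and an H2A is phosphorylated the first time the kinase reaches it.
    """
    if t_s <= 0:
        return 1.0 if distance_bp == 0 else 0.0
    return math.erfc(abs(distance_bp) / (2.0 * math.sqrt(D_bp2_per_s * t_s)))


D = 1e4  # hypothetical sliding coefficient in bp^2/s, for illustration only
for minutes in (1, 10, 60):
    probs = [round(p_visited_by_sliding(x, 60 * minutes, D), 3) for x in (1_000, 10_000, 50_000)]
    print(minutes, probs)
```

The probability falls off with distance from the break and rises with time, which is the qualitative shape that the ChIP kinetics can be compared against; the 3D-diffusion and looping mechanisms would need different expressions.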