The Community of Special Interest (COSI) in Systems Modeling (SysMod) organizes annual gatherings...
Respiratory diseases typically involve processes across several spatial and temporal scales. The immune system plays an important role in a variety of ways, both in infectious and sterile conditions. Computational models can help understand the mechanisms that link different scales together, and help discover new and compare existing treatment options. Their usefulness would be greatly enhanced if they could be personalized to characteristics of individual patients. This talk will describe such a model of the innate immune response to fungal and viral pathogens, and show some examples of how the model can be used as a virtual laboratory. The talk will also describe tools helpful for building large-scale models of this type.
Modular Response Analysis (MRA) is a method for inferring biological networks. From a set of independent perturbations applied to network nodes, it computes the connectivity between every pair of nodes. Classical MRA is known to be sensitive to measurement noise and perturbation intensity. One of the most important questions about MRA concerns the meaning of the edges it discovers. We have developed a new approach that links MRA to multiple linear regression: estimating the connectivity coefficients is equivalent to estimating regression parameters together with their confidence intervals. We validated this approach by comparing it with classical MRA on an in silico six-node MAP kinase network. An important application of MRA coupled with regression is the identification of null edges, notably in gene networks, which are often sparse. We compared several regression methods (multiple regression, Lasso, and threshold regression) on in silico gene networks from DREAM Challenge 4. The results showed an acceptable error rate for 10-gene and 100-gene networks.
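The link between MRA and regression can be illustrated with a minimal sketch. It assumes the textbook MRA setting, which the abstract does not spell out: each perturbation acts directly on a single node, the diagonal of the connectivity matrix is fixed at -1, and the global response of node i to any perturbation that does not target it is a linear combination of the other nodes' responses. The function names, the three-node network, and all numbers are hypothetical. The sketch uses ordinary least squares on noise-free data and does not compute confidence intervals.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mra_row_by_regression(R, targets, i):
    """Least-squares estimate of the connectivity coefficients r_ij (j != i).

    R[j][k] is the global response of node j to perturbation k, and
    targets[k] is the node that perturbation k acts on directly.
    """
    others = [j for j in range(len(R)) if j != i]
    # Only perturbations that do not hit node i directly satisfy
    # R[i][k] = sum over j != i of r_ij * R[j][k].
    perts = [k for k, t in enumerate(targets) if t != i]
    X = [[R[j][k] for j in others] for k in perts]
    y = [R[i][k] for k in perts]
    # Normal equations (X^T X) beta = X^T y.
    p = len(others)
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yk for row, yk in zip(X, y)) for a in range(p)]
    return dict(zip(others, solve_linear(xtx, xty)))


# Toy three-node network (made-up coefficients, diagonal fixed at -1).
r_true = [[-1.0, 0.0, -0.5],
          [0.8, -1.0, 0.0],
          [0.0, 0.6, -1.0]]
strengths = [0.5, 1.0, 2.0]  # made-up direct perturbation effects
targets = [0, 1, 2]          # perturbation k hits node k

# Noise-free global responses: column k solves r_true x = -strengths[k] * e_k.
columns = [solve_linear(r_true, [-strengths[k] if j == k else 0.0 for j in range(3)])
           for k in range(3)]
R = [[columns[k][j] for k in range(3)] for j in range(3)]

# Reassemble the estimated connectivity matrix row by row.
r_hat = [[-1.0] * 3 for _ in range(3)]
for i in range(3):
    for j, coef in mra_row_by_regression(R, targets, i).items():
        r_hat[i][j] = coef
```

With exactly one perturbation per node the regression is exactly determined and reproduces classical MRA. Supplying more perturbations than nodes (extra columns in `R` and extra entries in `targets`) makes the system overdetermined, which is the setting in which confidence intervals and tests for null edges become possible.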
Metabolomics, synthetic biology, and microbiome research demand
information about organism-scale metabolic networks. The convergence
of genome sequencing and computational inference of metabolic networks
has enabled great progress toward satisfying that demand by generating
metabolic reconstructions from the genomes of thousands of sequenced
organisms. Visualization of whole metabolic networks is critical for
aiding researchers in understanding, analyzing, and exploiting those
reconstructions. We have developed bioinformatics software tools that
automatically generate a full metabolic-network diagram for an
organism, and that enable searching and analyses of the network. The
software generates metabolic-network diagrams for unicellular
organisms, for multi-cellular organisms, and for pan-genomes and
organism communities. The diagrams are zoomable to enable researchers
to study local neighborhoods in detail and to see the big picture. The
diagrams also serve as tools for comparison of metabolic networks and
for interpreting high-throughput datasets, including transcriptomics
and metabolomics data. These data can be overlaid on the metabolic
charts to produce animated zoomable displays of gene expression and
metabolite abundance. The BioCyc.org website contains whole-network
diagrams for more than 20,000 sequenced organisms.
Human Milk Oligosaccharides (HMOs) are abundant carbohydrates fundamental to infant health and development and to modulation of the infant microbiome. Although these oligosaccharides were discovered more than half a century ago, their biosynthesis in the mammary gland remains largely uncharacterized. Here, we use a constraint-based modeling framework that integrates glycan and RNA expression data to construct an HMO biosynthetic network and predict the glycosyltransferases involved. To accomplish this, we construct models describing the most likely pathways for the synthesis of the oligosaccharides accounting for >95% of the HMO content in human milk. Through our models, we propose candidate genes for elongation, branching, fucosylation, and sialylation of HMOs. Our model aggregation approach recovers 2 of 2 previously known gene-enzyme relations and 2 of 3 empirically confirmed gene-enzyme relations. The top genes we propose for the remaining 5 linkage reactions are consistent with previously published literature. These results provide the molecular basis of HMO biosynthesis necessary to guide progress in HMO research, including the study of the effects of maternal genetics on HMO composition and the metabolic engineering of microbes to facilitate the manufacturing of these molecules as invaluable nutraceuticals to improve infant health and development.
In recent years, the production of chemicals using biological systems has been gaining traction. Bioproduction is a sustainable alternative to traditional chemical processes, but it is often associated with higher operational costs. The economics of bioprocesses can be improved by co-producing multiple products using the same system. Multiple algorithms exist for in silico metabolic engineering of organisms to achieve overproduction of a single product. However, computational tools that can co-optimize a set of metabolites are lacking. Here, we present co-FSEOF (co-production using Flux Scanning based on Enforced Objective Flux), an algorithm designed to identify intervention strategies to co-optimize the production of multiple metabolites. Co-FSEOF can identify all pairs of metabolites that can be co-produced in an organism using a single intervention. It can also identify higher-order intervention strategies for a chosen set of metabolites. We have used this tool to identify intervention strategies for the co-production of pairs of metabolites in Escherichia coli and Saccharomyces cerevisiae under aerobic and anaerobic conditions. The proposed computational tool provides a systematic approach to studying co-production and thereby aids the design of better bioprocesses.
Signaling crosstalk occurs when the stimulation of a signaling pathway's receptors results in downstream effects on another signaling pathway. While numerous network-based methods identify the presence of signaling crosstalk, they do not distinguish between different signaling events and therefore offer limited mechanistic insights into crosstalk. Given the context-specific and often concurrent types of interactions in cellular signaling, multilayer networks offer the potential to better understand signaling pathways and predict their crosstalk. We built a multilayer network consisting of a gene regulatory layer and a signaling layer, and developed a statistical framework, MuXTalk, that uses high-dimensional edges, or multilinks, to model signaling crosstalk. Using statistically over-represented multilinks as proxies of crosstalk between signaling pathways, we identified potentially crosstalking pathway pairs among 61 KEGG pathways. In our benchmark, MuXTalk had a higher area under the ROC and precision-recall curves compared to all single layer-based methods tested, identifying additions to the current gold-standard. Crosstalk predictions in our “discovery” set of pathway pairs were highly supported in the literature with a precision >80% for the top 50 pairs. Overall, our findings suggest the utility of the multilayer modeling of signaling crosstalk, with possible future applications to extend our approach to tissue- and disease-specific crosstalk.
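As a rough illustration of what a multilink is, the sketch below uses the general multilayer-network definition: for an ordered node pair, the multilink is the vector recording whether an edge exists in each layer. This is not MuXTalk's implementation. The function name, the two toy layers, and the gene labels are hypothetical, and the statistical test for over-representation is not shown.

```python
from collections import Counter
from itertools import permutations


def multilink_counts(layers, nodes):
    """Count multilink types over all ordered pairs of distinct nodes.

    layers is a sequence of directed edge sets, one per layer. The multilink
    of a pair (u, v) is the tuple of 0/1 flags saying in which layers the
    edge u -> v is present.
    """
    counts = Counter()
    for pair in permutations(nodes, 2):
        multilink = tuple(int(pair in layer) for layer in layers)
        counts[multilink] += 1
    return counts


# Toy two-layer network (made-up edges).
signaling_layer = {("A", "B"), ("B", "C")}
regulatory_layer = {("A", "B"), ("C", "A")}

counts = multilink_counts([signaling_layer, regulatory_layer], ["A", "B", "C"])
# counts[(1, 1)] is the number of pairs linked in both layers,
# counts[(1, 0)] the number linked in the signaling layer only, and so on.
```

Comparing such counts against those obtained from suitably randomized networks is one way a multilink type could be judged over-represented; the abstract does not specify the null model used.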
The Toll-like receptor (TLR) signaling pathway is crucial for the initiation of effective innate immune responses. In this investigation, experimental and computational techniques are being integrated to generate a strongly data driven model of the TLR pathway. Targeted mass spectrometry was used to measure the absolute abundance of 54 (phospho)proteins (using 136 unmodified peptides and 29 phosphopeptides). The protein abundances ranged from 1,332 to 227,000,000 copies per cell (mouse bone marrow-derived macrophages). They moderately correlated with transcript abundance values (r = 0.699, p = 1.37e-17), and these data were used to make proteome-wide abundance estimates. Hundreds of TLR pathway protein-protein association rates were estimated using protein structures and molecular simulations (TransComp and Simulation of Diffusional Association). Rule-based pathway modeling and simulation is being performed using the Simmune software suite. The obtained values for absolute protein abundances and protein-protein interaction rates are being used as model parameters, and targeted phosphoproteomics is being used for model training, testing, and validation. This work was supported by the Intramural Research Program of NIAID, NIH.
The increasing number of available large RNA-seq datasets, combined with genome-wide association studies (GWAS), differential gene expression (DEG) studies, and gene regulatory network (GRN) analyses, has led to the discovery of many novel therapeutics. Despite this progress, our ability to translate GWAS and DEG analyses into an improved mechanistic understanding of many diseases remains limited, as both analyses disregard information about the cell types in which the causative mechanisms driving the disease take place. This is critical because regulatory mechanisms may differ widely across cell types. We explore several independent approaches to elucidate cell-type-specific regulatory information about candidate genes associated with rheumatoid arthritis (RA). We compute sample-specific GRNs, which is a substantial advance over cohort-specific GRNs, as it enables the use of statistical techniques to compare network properties between phenotypic groups. Our analysis makes very precise experimental predictions, such as the impact of the knockdown of a specific TF in a specific cell type, and we therefore expect it to be very useful both to rheumatologists and to the broader scientific community interested in identifying cell-specific driver genes in other complex diseases.
Many proteins only function when in complex with other proteins, yet experimental methods for protein-protein interaction (PPI) determination lack the ability to accurately construct interaction networks on a proteome-wide scale. To address this shortcoming, computational methods can be used to complement high throughput experimental datasets as well as motivate small-scale biochemical experiments. We have developed such an approach, PEPPI, which predicts PPIs on a whole-proteome scale through a combination of homology, functional association, and machine learning; we find that PEPPI shows superior performance when compared with current state-of-the-art methods while remaining species-agnostic and computationally efficient. With PEPPI, we have predicted several cross-species interaction networks, including the network between humans and SARS-CoV-2 virus as well as the network between humans and the probiotic bacterial strain E. coli Nissle. Ongoing refinements include incorporation of a deep learning-based approach which leverages interchain co-evolution to make its predictions. Finally, to more faithfully represent the complexity of PPI networks, we have developed a 3D network visualization program, FALCON, which supports visualization of networks in immersive virtual reality space. Through the understanding these programs provide of protein function expressed through PPI networks, we can more effectively develop therapeutics through future drug development and protein engineering studies.
Cells and tissues respond to perturbations in multiple ways that can be sensitively reflected in alterations of gene expression. Current approaches to finding and quantifying the effects of perturbations on cell-level responses over time disregard the temporal consistency of identifiable gene programs. To leverage the occurrence of these patterns for perturbation analyses, we developed CellDrift (https://github.com/KANG-BIOINFO/CellDrift), a generalized linear model-based functional data analysis method capable of identifying covarying temporal patterns of various cell types in response to perturbations. As compared to several other approaches, CellDrift demonstrated superior performance in the identification of temporally varied perturbation patterns and the ability to impute missing time points. We applied CellDrift to multiple longitudinal datasets, including COVID-19 disease progression and gastrointestinal tract development, and demonstrated its ability to identify specific gene programs associated with sequential biological processes, trajectories, and outcomes.
Single-cell RNA-seq assays have dramatically advanced our ability to study and model cellular differentiation and cell fate decisions. RNA velocity is an analysis framework that has become fundamental in the toolbox of the single-cell research community. However, the preprocessing choices and model assumptions of current RNA velocity models can dramatically influence their predictions and lead to misinterpretation of developmental order and cell fate. To address this, we propose Pyro-Velocity, a probabilistic, end-to-end inference framework for RNA velocity. It is based on variational inference, scales to millions of cells, works on unsmoothed data, and naturally provides uncertainty estimates of cell fate through joint learning of transcriptional processes across genes. In addition, Pyro-Velocity can be used to learn cell fate decisions from one dataset and predict RNA velocity in other datasets with similar biological contexts. Our method outperforms existing methods in predicting cell fate on a compendium of single-cell RNA-seq datasets, with or without lineage information, and across different biological systems.
In summary, Pyro-Velocity is a new probabilistic RNA velocity framework and a user-friendly end-to-end software package to study cell fate decision from single cell data.
Despite the abundance of multi-modal data, suitable statistical models that can improve our understanding of diseases with genetic underpinnings are challenging to develop. Here we present SparseGMM, a novel statistical approach for gene regulatory network discovery. SparseGMM uses latent variable modeling with sparsity constraints on regulators to learn Gaussian mixtures from multi-omic data. By combining co-expression patterns with a Bayesian framework, SparseGMM quantitatively measures confidence in regulators and uncertainty in target gene assignment by computing gene entropy. We apply SparseGMM to liver cancer and normal liver tissue data and evaluate the discovered gene modules in an independent scRNA-seq dataset. SparseGMM identifies PROCR as a regulator of angiogenesis, and PDCD1LG2 and HNF4A as regulators of immune response and blood coagulation in cancer, respectively. Additionally, we show that more genes have significantly higher entropy in cancer than in normal liver; among the high-entropy genes are key multifunctional components shared by critical pathways, such as p53 and estrogen signaling.
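The idea of measuring assignment uncertainty with entropy can be sketched as follows. The abstract does not give SparseGMM's exact definition, so this assumes the Shannon entropy of a gene's posterior probabilities of belonging to each mixture component (module). The function name and the probability vectors are hypothetical.

```python
import math


def assignment_entropy(probabilities):
    """Shannon entropy (in bits) of a gene's module-assignment probabilities.

    Zero-probability modules contribute nothing, following the usual
    convention that 0 * log(0) = 0.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Made-up posterior assignment probabilities over four modules.
confident_gene = [1.0, 0.0, 0.0, 0.0]      # always assigned to one module
ambiguous_gene = [0.25, 0.25, 0.25, 0.25]  # equally likely in every module

low = assignment_entropy(confident_gene)
high = assignment_entropy(ambiguous_gene)
```

Under this assumption, a gene assigned to one module with certainty has entropy 0, while a gene spread evenly over four modules has the maximum entropy of 2 bits, which fits the reading of high-entropy genes as multifunctional components shared by several pathways.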
Aging profoundly affects immune system function, rendering the elderly more susceptible to pathogens, cancers and chronic inflammation. Single-cell genomics studies have accelerated the discovery of age-dependent immune-cell populations, linking aging phenotypes to changes in diverse immune populations. Here, we used single-cell RNA-seq (scRNA-seq) and chromatin accessibility (scATAC-seq) to deeply profile the CD4+ memory T cell (CD4+TM) compartment over time, enriching for IL10+ cells. We captured many T cell subsets including a population of IL10-producing, T follicular helper-like cells (Tfh10) we previously linked to suppressed vaccine responses in aged mice. From these data, we inferred gene regulatory networks (GRNs) and predicted transcription factor control of gene expression across T cell subsets in youth and old age. Further, we integrated pan-cell sc-genomics studies to identify factors from the microenvironment driving age-dependent changes in CD4+TM. Through computational modeling and broad integration across sc-genomics aging studies, our atlas of finely resolved CD4+TM subsets, GRNs and extracellular-signaling networks opens new opportunities to manipulate IL-10 production and improve immune responses in the elderly.
Multiomics experiments are now widespread and a powerful tool for precision genomics. However, transforming multiomics data into a multi-layered representation of the underlying biology is still a highly challenging task. I will review the challenges the field faces in multi-scale modeling and present novel statistical methods for interpretable multi-omics integration and for connecting epigenetic modifications to the control of cellular metabolism.
This talk briefly reviews this year’s SysMod community meeting, including speakers, chairpersons, ...