NetBio COSI

Schedule subject to change
All times listed are in CDT
Tuesday, July 12th
10:30-11:10
Keynote Presentation: Network-based approaches to study mutational processes in cancer
Room: Madison CD
Format: Live-stream

Moderator(s): Marinka Zitnik

  • Teresa Przytycka


Presentation Overview:

Cancer genomes accumulate a large number of somatic mutations resulting from imperfect DNA processing during the normal cell cycle as well as from carcinogenic exposures or cancer-related aberrations of the DNA maintenance machinery. These processes often lead to distinctive patterns of mutations, called mutational signatures. Considering these signatures as quantitative traits, we leverage them for studies of the interactions between mutagenic processes, other cellular processes, and the environment. Untangling these interactions is critical for understanding the processes underlying mutational signatures. To address these challenges, we developed several complementary computational approaches allowing us to link several mutational signatures to their causes. I will discuss selected approaches, focusing on network-based methods.

11:10-11:30
Proceedings Presentation: DEMA: a distance-bounded energy-field minimization algorithm to model and layout bio-molecular networks with quantitative features
Room: Madison CD
Format: Live from venue

Moderator(s): Marinka Zitnik

  • Zhenyu Weng, Institute of Big Data Technologies, Shenzhen Graduate School, Peking University, China
  • Zongliang Yue, Informatics Institute, School of Medicine, University of Alabama at Birmingham, United States
  • Yuesheng Zhu, Institute of Big Data Technologies, Shenzhen Graduate School, Peking University, China
  • Jake Chen, Informatics Institute, School of Medicine, University of Alabama at Birmingham, United States


Presentation Overview:

In biology, graph layout algorithms can reveal comprehensive biological contexts by visually positioning graph nodes in their relevant neighborhoods. A layout algorithm or engine commonly takes a set of nodes and edges and produces layout coordinates of nodes according to edge constraints. However, current layout engines normally do not consider node, edge, or node-set properties during layout and only curate these properties after the layout is created. Here, we propose a new layout algorithm, the distance-bounded energy-field minimization algorithm (DEMA), to natively consider various biological factors, i.e., the strength of gene-to-gene association, the gene’s relative contribution weight, and the functional groups of genes, to enhance the interpretation of complex network graphs. In DEMA, we introduce a parameterized energy model where nodes are repelled by the network topology and attracted by a few biological factors, i.e., interaction coefficient (IC), effect coefficient (EC), and fold change (FC) of gene expression. We generalize these factors as gene weights, PPI weights, gene-to-gene correlations, and gene set annotations, the four parameterized functional properties used in DEMA. Moreover, DEMA considers further attraction, repulsion, and grouping coefficients to enable different preferences in generating network views. Applying DEMA, we performed two case studies using genetics data in Autism Spectrum Disorder (ASD) and Alzheimer’s disease (AD), respectively, for gene candidate discovery. Furthermore, we implement our algorithm as a plugin to Cytoscape, an open-source software platform for visualizing networks, which makes it convenient to use. Our software and demo can be freely accessed at http://discovery.informatics.uab.edu/dema.
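The idea of a layout driven by an energy model with repulsion and weighted attraction can be illustrated with a small toy. The sketch below is a generic force-directed layout, not DEMA's actual energy function: the energy terms (`k_rep / d` for every node pair, `w * d^2` for every weighted edge), the function name `toy_energy_layout`, and all parameter values are assumptions made for illustration only.

```python
import math


def distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def toy_energy_layout(nodes, weighted_edges, steps=500, lr=0.01, k_rep=1.0):
    """Minimize a toy layout energy by plain gradient descent.

    Assumed energy (illustrative, not taken from the DEMA paper):
      repulsion  k_rep / d   for every pair of nodes
      attraction w * d**2    for every edge (u, v, w)
    """
    n = len(nodes)
    # Deterministic start: nodes evenly spaced on the unit circle.
    pos = {
        v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
        for i, v in enumerate(nodes)
    }
    for _ in range(steps):
        grad = {v: [0.0, 0.0] for v in nodes}
        # Gradient of the pairwise repulsion term.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-6)
                g = -k_rep / d ** 3
                grad[u][0] += g * dx
                grad[u][1] += g * dy
                grad[v][0] -= g * dx
                grad[v][1] -= g * dy
        # Gradient of the weighted attraction term along edges.
        for u, v, w in weighted_edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            grad[u][0] += 2 * w * dx
            grad[u][1] += 2 * w * dy
            grad[v][0] -= 2 * w * dx
            grad[v][1] -= 2 * w * dy
        # Descend along the negative gradient.
        for v in nodes:
            pos[v][0] -= lr * grad[v][0]
            pos[v][1] -= lr * grad[v][1]
    return pos


# Hypothetical three-gene example: one strong and one weak association.
positions = toy_energy_layout(
    ["geneA", "geneB", "geneC"],
    [("geneA", "geneB", 5.0), ("geneB", "geneC", 0.5)],
)
d_strong = distance(positions["geneA"], positions["geneB"])
d_weak = distance(positions["geneB"], positions["geneC"])
print(f"strong edge length: {d_strong:.3f}, weak edge length: {d_weak:.3f}")
```

In this toy, the strongly associated pair ends up closer together than the weakly associated pair, which is the kind of behavior a weight-aware layout aims for.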

11:30-11:50
Joint embedding of biological networks for cross-species functional alignment
Room: Madison CD
Format: Live from venue

Moderator(s): Marinka Zitnik

  • Lechuan Li, Rice University, United States
  • Ruth Dannenfelser, Rice University, United States
  • Yu Zhu, Rice University, United States
  • Nathaniel Hejduk, Rice University, United States
  • Santiago Segarra, Rice University, United States
  • Vicky Yao, Rice University, United States


Presentation Overview:

Model organisms are widely used to better understand the molecular causes of human disease. While sequence similarity greatly aids this transfer, sequence similarity does not imply functional similarity, and thus, several current approaches incorporate protein-protein interactions (PPIs) to help map findings between species. Existing transfer methods either formulate the alignment problem as a matching problem, which pits network features against known orthology, or more recently, as a joint embedding problem. Here, we propose a novel state-of-the-art joint embedding solution: Embeddings to Network Alignment (ETNA). More specifically, ETNA generates individual network embeddings based on network topological structures and then uses a Natural Language Processing-inspired cross-training approach to align the two embeddings using sequence orthologs. The final embedding preserves both within and between species gene functional relationships, and we demonstrate that it captures both pairwise and group functional relevance. In addition, ETNA's embeddings can be used to transfer genetic interactions across species and identify phenotypic alignments, laying the groundwork for potential opportunities for drug repurposing and translational studies.

11:50-12:10
Accurately identifying disease genes and relevant contexts using context-specific network embeddings
Room: Madison CD
Format: Live from venue

Moderator(s): Marinka Zitnik

  • Renming Liu, Michigan State University, United States
  • Matthew Hirn, Michigan State University, United States
  • Arjun Krishnan, Michigan State University, United States


Presentation Overview:

Accurately identifying genes associated with diseases is the key to understanding the disease mechanisms and finding treatment strategies accordingly. The modular nature of disease genes in the human gene interaction network has motivated several network-based disease gene prediction methods, including network embeddings. However, complex diseases are heterogeneous, involving several hundreds of genes, and can manifest differently in various contexts such as tissues and disease states. Overlooking the contexts in which diseases and traits manifest themselves could lead to a less accurate understanding of the human disease genes. Here, we developed a context-specific network embedding method highlighting certain contextual information, including the tissue specificity and gene expression study specificity. We then used an ensemble logistic regression model to combine all the context-specific embeddings to perform disease gene predictions according to the validation scores. Our method significantly improves disease gene prediction performance over the context-naive embeddings. Furthermore, the resulting ensemble model coefficients accurately reflect the biologically meaningful disease-context association. Finally, our method is general and can be applied to a user-defined gene expression dataset to generate the corresponding context-specific embeddings to better understand the context information by finding the top diseases related to the gene expression dataset.

12:10-12:30
Multi-layer networks improve protein structural classification
Room: Madison CD
Format: Live from venue

Moderator(s): Marinka Zitnik

  • Khalique Newaz, University of Hamburg, Germany
  • Jacob Piland, University of Notre Dame, United States
  • Patricia Clark, University of Notre Dame, United States
  • Scott Emrich, University of Tennessee, United States
  • Jun Li, University of Notre Dame, United States
  • Tijana Milenkovic, University of Notre Dame, United States


Presentation Overview:

Protein structural classification (PSC) is a supervised problem of assigning proteins into pre-defined structural (e.g., CATH or SCOPe) classes based on the proteins' sequence or 3D structural features. We recently proposed PSC approaches that model protein 3D structures as protein structure networks (PSNs) and analyze PSN-based protein features, which performed better than or comparable to state-of-the-art sequence or other 3D structure-based PSC approaches. However, existing PSN-based PSC approaches model the whole 3D structure of a protein as a static (i.e., single-layer) PSN. Because folding of a protein is a dynamic process, where some parts (i.e., sub-structures) of a protein fold before others, modeling the 3D structure of a protein as a PSN that captures the sub-structures might further help improve the existing PSC performance. Here, we propose to model 3D structures of proteins as multi-layer sequential PSNs that approximate 3D sub-structures of proteins, with the hypothesis that this will improve upon the current state-of-the-art PSC approaches that are based on single-layer PSNs (and thus upon the existing state-of-the-art sequence and other 3D structural approaches). Indeed, we confirm this on 72 datasets spanning ~44,000 CATH and SCOPe protein domains.

14:30-14:50
Network-based data integration and visualization provides a global understanding of regulatory mechanisms in Aspergillus fumigatus
Room: Madison CD
Format: Live from venue

Moderator(s): Tijana Milenkovic

  • Spencer Halberg-Spencer, University of Wisconsin-Madison, Wisconsin Institute for Discovery, United States
  • Saptarshi Pyne, University of Wisconsin-Madison, Wisconsin Institute for Discovery, United States
  • Cristobal Carriel, University of Wisconsin-Madison, United States
  • Jean-Michel Ané, University of Wisconsin-Madison, United States
  • Nancy Keller, University of Wisconsin-Madison, United States
  • Sushmita Roy, University of Wisconsin-Madison, Wisconsin Institute for Discovery, United States


Presentation Overview:

Invasive Aspergillosis (IA), a fungal infection of the lungs caused by the pathogen Aspergillus fumigatus, is the most common invasive fungal infection in immunosuppressed individuals. Recent studies have recognized IA as a secondary infection that complicates COVID-19, increasing mortality. Despite the high clinical relevance of A. fumigatus, the molecular mechanisms that underlie IA and co-morbid conditions remain poorly characterized. We present a network-based analysis pipeline that combines gene regulatory network (GRN) inference and network-based interpretation of regulatory modules to characterize the A. fumigatus transcriptional response. Our GRN inference approach incorporates latent transcription factor activity (TFA) estimation to elucidate transcription factors that are post-transcriptionally regulated, for which gene expression may not be informative. We provide an interactive network visualization framework that incorporates statistical and topological tools used to investigate context-specific roles of regulators within the network. Our framework can be used to interpret input gene lists to predict associated biological pathways, prioritize regulators based on kernel diffusion, and identify novel subnetwork components using a Steiner tree approximation. Application of our framework to A. fumigatus predicted known and novel regulators of multiple secondary metabolite regulatory pathways. Our approach and resource are broadly applicable for network-based interpretation of clinically significant fungal species.

14:50-15:10
hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies
Room: Madison CD
Format: Live from venue

Moderator(s): Tijana Milenkovic

  • Kevin Drew, University of Illinois at Chicago, United States
  • John B Wallingford, University of Texas at Austin, United States
  • Edward M Marcotte, University of Texas at Austin, United States


Presentation Overview:

A general principle of biology is the self-assembly of proteins into functional complexes. Characterizing their composition is, therefore, required for our understanding of cellular functions. Unfortunately, we lack knowledge of the comprehensive set of identities of protein complexes in human cells. To address this gap, we developed a machine learning framework to identify protein complexes in over 15,000 mass spectrometry experiments, which resulted in the identification of nearly 7,000 physical assemblies. We show that our resource, hu.MAP 2.0, is more accurate and comprehensive than previous state-of-the-art high-throughput protein complex resources and gives rise to many new hypotheses, including for 274 completely uncharacterized proteins. Further, we identify 259 promiscuous proteins that participate in multiple complexes, pointing to possible moonlighting roles. We have made hu.MAP 2.0 easily searchable in a web interface (http://humap2.proteincomplexes.org/), which will be a valuable resource for researchers across a broad range of interests including systems biology, structural biology, and molecular explanations of disease.

15:10-15:30
Are Transient Protein-Protein Interactions More Dispensable?
Room: Madison CD
Format: Live from venue

Moderator(s): Tijana Milenkovic

  • Mohamed Ghadie, McGill University, Canada
  • Yu Xia, McGill University, Canada


Presentation Overview:

Protein-protein interactions (PPIs) are key drivers of cell function. While it is widely assumed that permanent PPIs tend to be important for cellular function and therefore not dispensable, it remains unclear whether or not transient PPIs are more dispensable than permanent PPIs. Here, we estimate and compare dispensable content among transient and permanent PPIs in the human interactome, by calculating the fractions of transient and permanent interactions that are neutral upon disruption. Starting with a human reference interactome mapped by experiments, we construct a human structural interactome by building three-dimensional structural models for PPIs using homology modeling, and then distinguish transient interactions from permanent interactions using several structural and biophysical properties. Next, we map common mutations from healthy individuals and disease-causing mutations onto the structural interactome, and perform structure-based calculations of the probabilities for common mutations (assumed to be neutral) and disease mutations (assumed to be mildly deleterious) to disrupt transient interactions and permanent interactions. Using Bayes’ theorem, we estimate that a similarly small fraction (<~20%) of both transient and permanent PPIs are completely dispensable, i.e., effectively neutral upon disruption by mutation. Hence, transient and permanent interactions are subject to similarly strong selective constraints in the human protein interactome.
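The Bayes' theorem step mentioned above can be sketched in a few lines. This is only an illustration of the general calculation: the two-class split (neutral vs. mildly deleterious), the function name `posterior_given_disruption`, and every number below are hypothetical placeholders, not values or the exact formulation from the study.

```python
def posterior_given_disruption(p_disrupt_given_class, prior):
    """Apply Bayes' theorem: P(class | mutation disrupts a PPI).

    p_disrupt_given_class: P(disruption | class), e.g. estimated from
        structure-based calculations on mutations of each class.
    prior: P(class), the assumed share of each class among mutations.
    """
    evidence = sum(p_disrupt_given_class[c] * prior[c] for c in prior)
    return {c: p_disrupt_given_class[c] * prior[c] / evidence for c in prior}


# Hypothetical inputs for illustration only.
likelihood = {"neutral": 0.02, "mildly_deleterious": 0.10}
prior = {"neutral": 0.4, "mildly_deleterious": 0.6}

posterior = posterior_given_disruption(likelihood, prior)
# posterior["neutral"] is the estimated probability that disrupting an
# interaction is effectively neutral, i.e. the dispensable fraction
# under these made-up inputs.
print(posterior)
```

Running the same calculation separately for transient and permanent interactions, each with its own disruption probabilities, would give the two fractions that the abstract compares.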

16:00-16:20
Proceedings Presentation: Computing optimal factories in metabolic networks with negative regulation
Room: Madison CD
Format: Live from venue

Moderator(s): Tijana Milenkovic

  • Spencer Krieger, University of Arizona, United States
  • John Kececioglu, University of Arizona, United States


Presentation Overview:

Motivation: A factory in a metabolic network specifies how to produce target molecules from source compounds through biochemical reactions, properly accounting for reaction stoichiometry to conserve or not deplete intermediate metabolites. While finding factories is a fundamental problem in systems biology, available methods do not consider the number of reactions used, nor address negative regulation.
Methods: We introduce the new problem of finding optimal factories that use the fewest reactions, for the first time incorporating both first- and second-order negative regulation. We model this problem with directed hypergraphs, prove it is NP-complete, solve it via mixed-integer linear programming, and accommodate second-order negative regulation by an iterative approach that generates next-best factories.
Results: This optimization-based approach is remarkably fast in practice, typically finding optimal factories in a few seconds, even for metabolic networks involving tens of thousands of reactions and metabolites, as demonstrated through comprehensive experiments across all instances from standard reaction databases.
Availability and implementation: Source code for an implementation of our new method for optimal factories with negative regulation in a new tool called Odinn, together with all datasets, is available free for non-commercial use at http://odinn.cs.arizona.edu.

16:20-16:40
The BioCyc Metabolic Network Explorer
Room: Madison CD
Format: Live-stream

Moderator(s): Tijana Milenkovic

  • Suzanne Paley, SRI International, United States
  • Peter Karp, SRI International, United States


Presentation Overview:

The Metabolic Network Explorer is a new addition to the BioCyc.org website and Pathway Tools software that supports interactive exploration of metabolic networks. Any metabolic network visualization tool must by necessity show only a subset of all possible metabolite connections, or the results will be visually overwhelming. Other tools limit the set of displayed connections based on predefined pathways or other preselected criteria. We sought instead to provide a tool that would give the user dynamic control over which connections to follow. The Metabolic Network Explorer is a web-based software tool that allows the user to specify a starting metabolite of interest and interactively explore its immediate metabolic neighborhood in both directions, letting the user select from the full set of connected reactions. Although only a small portion of the metabolic network is visible at a time, that portion is selected by the user, based on the full reaction complement, and it is easy to switch among alternate paths of interest. The display is intuitive, customizable, and provides copious links to more detailed information pages. The Metabolic Network Explorer fills a gap in the set of metabolic network visualization tools and complements other modes of exploration.

16:40-17:00
GRaNIE and GRaNPA: Inference and evaluation of enhancer-mediated gene regulatory networks applied to study macrophages
Room: Madison CD
Format: Live from venue

Moderator(s): Tijana Milenkovic

  • Aryan Kamal, EMBL, Germany
  • Christian Arnold, EMBL, Germany
  • Annique Claringbould, EMBL, Germany
  • Rim Moussa, EMBL, Germany
  • Neha Daga, EMBL, Germany
  • Daria Nogina, EMBL, Germany
  • Maksim Kholmatov, EMBL, Germany
  • Nila Servaas, EMBL, Germany
  • Sophia Mueller-Dott, EMBL, Germany
  • Armando Reyes-Palomares, EMBL, Germany
  • Giovanni Palla, EMBL, Germany
  • Olga Sigalova, EMBL, Germany
  • Daria Bunina, EMBL, Germany
  • Caroline Pabst, Department of Medicine V, Hematology, Oncology and Rheumatology, University Hospital Heidelberg, Germany
  • Judith Zaugg, EMBL, Germany


Presentation Overview:

The interpretation of disease-associated genetic variants in non-coding genomic regions remains challenging in the post-GWAS era, and enhancers have emerged as key players in mediating the effect of genetic variants on complex traits/diseases. Their activity is often regulated via transcription factors (TFs), epigenetic changes and genetic variants. While existing approaches link enhancers to their target genes and infer TF-gene connections, we currently lack a framework that systematically integrates enhancers into TF-gene regulatory networks. Furthermore, we lack an unbiased way of assessing the biological meaningfulness of inferred regulatory interactions. Here we present two methods, implemented as user-friendly R packages, for building and evaluating enhancer-mediated gene regulatory networks (eGRNs) called GRaNIE (Gene Regulatory Network Inference including Enhancers - https://git.embl.de/grp-zaugg/GRaNIE) and GRaNPA (Gene Regulatory Network Performance Analysis - https://git.embl.de/grp-zaugg/GRaNPA), respectively. GRaNIE jointly infers TF-enhancer, enhancer-gene and TF-gene interactions by integrating open chromatin data (e.g., ATAC-Seq or H3K27ac) with RNA-seq across samples (e.g., individuals), and optionally also Hi-C. GRaNPA is a general framework for evaluating the biological relevance of TF-gene GRNs by assessing their performance for predicting cell-type-specific differential expression. We demonstrate their power by investigating gene regulatory mechanisms in macrophages that underlie their response to infection, and their involvement in common genetic (autoimmune) diseases.

17:00-17:20
FAVA: High-quality functional association networks inferred from massive scRNA-seq and proteomics data
Room: Madison CD
Format: Live from venue

Moderator(s): Tijana Milenkovic

  • Mikaela Koutrouli, Novo Nordisk Foundation Center of Protein Research, Denmark
  • Pau Piera Líndez, Novo Nordisk Foundation Center of Protein Research, Denmark
  • Robbin Bouwmeester, VIB-UGent Center for Medical Biotechnology | Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
  • Lennart Martens, VIB-UGent Center for Medical Biotechnology | Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
  • Lars Juhl Jensen, Novo Nordisk Foundation Center of Protein Research, Denmark


Presentation Overview:

Protein networks are commonly used for understanding the interplay between proteins in the cell as well as for visualizing omics data. Unfortunately, existing networks such as STRING are heavily biased by data availability in the sense that well-studied proteins have many more interactions than understudied proteins. To create networks also for the latter, we need to use high-throughput data, such as single-cell RNA-seq (scRNA-seq) and proteomics, which do not have this literature bias. However, due to the sparseness (i.e., many proteins not observed in each cell/sample) and redundancy (many similar cells/samples) of such data, simple correlation analysis does not result in high-quality networks. We present FAVA, Functional Associations using Variational Autoencoders, which deals with these issues by compressing the high-dimensional data into a meaningful, dense, low-dimensional latent space. We demonstrate that calculating correlations in this latent space results in much improved networks compared to the original representation for massive scRNA-seq and proteomics data from Human Protein Atlas and PRIDE, respectively. We show that these networks, which given the nature of the input data should be free of literature bias, indeed have much better coverage of understudied proteins than existing networks.
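The final step described above, turning correlations in a latent space into a network, can be sketched as follows. The variational autoencoder itself is omitted: `toy_latent` stands in for already-computed latent vectors, and the names `pearson` and `latent_network`, the threshold, and all values are hypothetical choices for illustration, not FAVA's actual code or settings.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / (sx * sy)


def latent_network(latent, threshold):
    """Return edges (protein_a, protein_b, r) with correlation >= threshold."""
    names = sorted(latent)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(latent[a], latent[b])
            if r >= threshold:
                edges.append((a, b, r))
    return edges


# Made-up latent vectors: P2 tracks P1, while P3 is anti-correlated with both.
toy_latent = {
    "P1": [0.1, 0.9, 0.4, 0.7],
    "P2": [0.2, 1.8, 0.8, 1.4],
    "P3": [0.9, 0.1, 0.6, 0.3],
}
edges = latent_network(toy_latent, threshold=0.8)
print(edges)
```

With these toy vectors only the P1-P2 pair passes the threshold, so the resulting network has a single edge.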

17:20-18:00
Keynote Presentation: Mapping and analysis of a global reference genetic interaction network for human cells
Room: Madison CD
Format: Live from venue

Moderator(s): Tijana Milenkovic

  • Chad Myers


Presentation Overview:

Despite our ability to efficiently sequence human genomes, we remain far from accurately predicting phenotypes from sequence. There are a variety of reasons for this gap, but one is the potential for genetic interactions among variants. Efforts using reverse genetic approaches in the yeast model system have shed light on this problem. Combinations of mutations in nearly all possible yeast genes were constructed and phenotyped, producing a global genetic network that has been a valuable resource for understanding yeast biology. While technical challenges have previously limited similar endeavors in human cells, CRISPR/Cas9-based genome editing technology now makes this powerful combinatorial mutation approach possible.

I will discuss our recent efforts to map a global genetic interaction network for human cells based on genome-wide CRISPR/Cas9 screens in a reference human cell line. We have identified several challenges associated with interpreting data from differential CRISPR screens and have developed a novel computational pipeline for accurate scoring of quantitative genetic interactions in this context. I will describe these lessons learned and other insights from our growing reference human genetic interaction map.