NetBio

Schedule subject to change
All times listed are in CEST
Monday, July 24th
10:30-11:10
Invited Presentation: Omics Data Fusion for Understanding Molecular Complexity Enabling Precision Medicine
Room: Salle Saint Claire 3
Format: Live from venue

Moderator(s): Martina Summer-Kutmon

  • Natasa Przulj


We are flooded by increasing volumes of heterogeneous, interconnected, systems-level, molecular (multi-omic) data. They provide complementary information about cells, tissues and diseases. We need to utilize them to better stratify patients into risk groups, discover new biomarkers, and repurpose known and discover new drugs to personalize medical treatment. This is nontrivial, because of computational intractability of many underlying problems, necessitating the development of algorithms for finding approximate solutions (heuristics).

We develop a versatile data fusion (integration) machine learning (ML) framework to address key challenges in precision medicine from these data: better stratification of patients, prediction of biomarkers, and re-purposing of approved drugs for particular patient groups, applied to cancer, COVID-19, rare thrombophilia and Parkinson’s Disease. Our new methods stem from graph-regularized non-negative matrix tri-factorization (NMTF), a machine learning technique for dimensionality reduction, inference and co-clustering of heterogeneous datasets, coupled with novel network science algorithms. We utilize our new framework to develop methodologies for improving the understanding of molecular organization and disease from the omics data embedding space.

11:10-11:30
Proceedings Presentation: Higher-order genetic interaction discovery with network-based biological priors
Room: Salle Saint Claire 3
Format: Live from venue

Moderator(s): Martina Summer-Kutmon

  • Paolo Pellizzoni, ETH Zurich, Switzerland
  • Giulia Muzio, ETH Zurich, Switzerland
  • Karsten Borgwardt, ETH Zurich, Switzerland


Motivation: Complex phenotypes, such as many common diseases and morphological traits, are controlled by multiple genetic factors, namely genetic mutations and genes, and are influenced by environmental conditions. Deciphering the genetics underlying such traits requires a systemic approach, where many different genetic factors and their interactions are considered simultaneously. Many association mapping techniques available nowadays follow this reasoning, but have some severe limitations. In particular, they require binary encodings for the genetic markers, forcing the user to decide beforehand whether to use, for example, a recessive or a dominant encoding. Moreover, most methods cannot include any biological prior or are limited to testing only lower-order interactions among genes for association with the phenotype, potentially missing a large number of marker combinations.
Results: We propose HOGImine, a novel algorithm that expands the class of discoverable genetic meta-markers by considering higher-order interactions of genes and by allowing multiple encodings for the genetic variants. Our experimental evaluation shows that the algorithm has a substantially higher statistical power compared to previous methods, allowing it to discover genetic mutations statistically associated with the phenotype at hand that could not be found before. Our method can exploit prior biological knowledge on gene interactions, such as protein-protein interaction networks, genetic pathways and protein complexes, to restrict its search space. Since computing higher-order gene interactions poses a high computational burden, we also develop novel algorithmic techniques to make our approach applicable in practice, leading to substantial runtime improvements compared to state-of-the-art methods.

11:30-11:50
The axes of biology: a novel axes-based network embedding approach to decipher the fundamental mechanisms of the cell
Room: Salle Saint Claire 3
Format: Live from venue

Moderator(s): Martina Summer-Kutmon

  • Sergio Doria-Belenguer, Barcelona Supercomputing Center, Spain
  • Alexandros Xenos, Barcelona Supercomputing Center, Spain
  • Gaia Ceddia, Barcelona Supercomputing Center, Spain
  • Noel Malod-Dognin, Barcelona Supercomputing Center, Spain
  • Natasa Przulj, ICREA; Barcelona Supercomputing Center; University College London, Spain


The increasing availability of omic biological data has yielded an unprecedented opportunity to understand the complex functional machinery of the cell. A common approach to deciphering these complex data is to apply network embedding algorithms. Embedding approaches strictly focus on clustering the genes' embedding vectors and interpreting such clusters to reveal the hidden information of the biological networks. However, the limitations of the functional annotation resources and the difficulty in interpreting the genes' clusters hinder the identification of the cell's fundamental mechanisms. Thus, we propose to shift the exploration of the gene embedding space from genes' embedding vectors to the axes of the space.

We introduce an axes-based approach to explore the functional organization of species-specific embedding spaces that we generate with the Non-negative Matrix Tri-Factorization and DeepWalk algorithms. We demonstrate that our method outperforms the gene-centric approach in capturing functional information from the space. Moreover, we use our approach to find the optimal dimensionality of the embedding spaces and explore the meaning of their axes in detail. We demonstrate that each axis represents a higher-level cellular function that we term Axes-Specific Functional Annotations (ASFAs) and validate them by literature curation. Finally, we use the ASFAs to find new evolutionary connections between species.

11:50-12:10
Exploring the relation between evolutionary gene age, gene expression and chromatin 3D structure in cancer
Room: Salle Saint Claire 3
Format: Live from venue

Moderator(s): Martina Summer-Kutmon

  • Flavien Raynal, Centre de Recherches en Cancérologie de Toulouse, France
  • Benoît Aliaga, Centre de Recherches en Cancérologie de Toulouse, France
  • Kaustav Sengupta, Centre of New Technologies, University of Warsaw, Poland
  • Dariusz Plewczynski, Centre of New Technologies, University of Warsaw, Poland
  • Vera Pancaldi, Centre de Recherches en Cancérologie de Toulouse, France


The human genome is composed of genes that appeared at different evolutionary ages over the past 3.5 billion years. Those genes have progressively been integrated across time, acquiring new functions and making species genomes more sophisticated. Evolutionary scientists have been able to precisely estimate current human gene ages by studying duplication events through time. Whether genes with the same evolutionary age are associated within the genome, have similar expression or share the same chromatin structure is still not well characterized. Inspired by the atavistic theory of cancer, which relates malignancy to the expression of evolutionarily ancient phenotypes, we investigate whether cell differentiation and cancer can alter the associations between gene age and other (epi)genomic characteristics. We therefore investigate whether genes with different evolutionary ages could show specific epigenome properties, expression regulation, variability and location in 3D chromatin structures that can be potentially altered in cancer and during differentiation. We identify consistent changes during oncogenesis in the spatial organization of genes from different evolutionary classes, correlated with changes in their expression variability across individuals, reinforcing the important role that old genes and their organization in the nucleus play in cancer phenotypes.

12:10-12:30
Cytokine Module Dynamics during Respiratory Challenges among Pre-diabetic Individuals
Room: Salle Saint Claire 3
Format: Live-stream

Moderator(s): Martina Summer-Kutmon

  • Mireya Diaz, Homer Stryker M.D. School of Medicine, Western Michigan University, United States


Background
The Integrative Human Microbiome Project (iHMP) collected a wealth of omics and laboratory profiles for three conditions. Analysis of the pre-diabetes cohort found that cytokines explained a large proportion of the variability. The present analysis focused on identifying the cytokine networks and their dynamics.

Methods
The time profiles of 62 immune analytes in serum from 13 participants who experienced a respiratory infection (INF) and/or immunization (IMZ) were selected. Dynamic weighted gene co-expression network analysis (WGCNA) identified network modules and their variation.

Results
WGCNA identified between 8 and 12 modules. Many of these are small and some remain unchanged. Other modules are larger. One comprises several analytes shared by unchallenged periods. This corresponds to non-interferon responses from neutrophil activity. A second large module comprises analytes representing Th1, Th2 and Th17 responses. The third module concentrates cytokines specific to respiratory conditions, Th2, and macrophage activity.

Discussion
Network measures examined immune dynamics during respiratory challenges. The network topology differs between them. However, this difference could be confounded by differences in baseline networks. These cannot be ascribed to disparities in insulin sensitivity distribution as this was fixed by design. Future steps include assessment of these baseline differences, and robustness to signed correlations.

13:50-14:10
Proceedings Presentation: Characterising Alternative Splicing Effects on Protein Interaction Networks with LINDA
Room: Salle Saint Claire 3
Format: Live from venue

Moderator(s): Anais Baudot

  • Enio Gjerga, University Hospital Heidelberg, Germany
  • Isabel S. Naarmann-de Vries, University Hospital Heidelberg, Germany
  • Christoph Dieterich, University Hospital Heidelberg, Germany


Alternative RNA splicing plays a crucial role in defining protein function. However, despite its relevance, there is a lack of tools that characterise effects of splicing on protein interaction networks in a mechanistic manner (i.e. presence or absence of protein-protein interactions due to RNA splicing). To fill this gap, we present LINDA (Linear Integer programming for Network reconstruction using transcriptomics and Differential splicing data Analysis) as a method that integrates resources of protein-protein and domain-domain interaction, transcription factor targets and differential splicing/transcript analysis to infer splicing-dependent effects on cellular pathways and regulatory networks. We have applied LINDA to a panel of 54 shRNA depletion experiments in HepG2 and K562 cells from the ENCORE initiative. Through computational benchmarking, we could show that the integration of splicing effects with LINDA can identify pathway mechanisms contributing to known bioprocesses better than other state-of-the-art methods, which do not account for splicing. Additionally, we have experimentally validated some of the predicted splicing effects that the depletion of HNRNPK in K562 cells has on signalling. LINDA has been implemented as an R package and is available online at https://dieterich-lab.github.io/LINDA/ along with results and tutorials.

14:10-14:30
scSeqComm: a statistical and network-based framework to infer inter- and intra-cellular communication from single-cell RNA sequencing data
Room: Salle Saint Claire 3
Format: Live from venue

Moderator(s): Anais Baudot

  • Giacomo Baruzzo, University of Padova, Italy
  • Giulia Cesaro, University of Padova, Italy
  • Barbara Di Camillo, University of Padova, Italy


Tissues are complex systems made of multiple cells, spatially and temporally organized, that interact with each other. Recently, single-cell RNA-sequencing has emerged as a powerful tool to study such cellular communications and several bioinformatics methods have been proposed to infer intercellular signalling between groups of cells, combining ligand and receptor expression levels. Only a few methods also investigate the intracellular signalling activated by the ligand–receptor binding, i.e. the signalling cascade triggering cell response and transcriptional activation/inhibition of specific genes.
We proposed scSeqComm (R-package https://gitlab.com/sysbiobig/scseqcomm), a computational method to identify, quantify and characterize cellular communication at both intercellular and intracellular signalling level from scRNA-seq data. Compared to previous approaches, scSeqComm quantifies intracellular signalling not only to characterize the functional effects of communication, but also to support the evidence that the communication has occurred.
With respect to the original publication, the proposed framework was extended to enable analysis of differential cellular communication in multi-condition and multi-patient scenarios. Moreover, parallelism and in-memory computation features were introduced, along with a user-friendly and interactive dashboard to support analysis and interpretations of results.
We show applications of the approach to scRNA-seq datasets, its validation using spatial transcriptomics data and the comparison with state-of-the-art intercellular scoring schemes.

14:30-14:50
Identifying and refining regulatory pathways through full-genome loss-of-function correlation networks
Room: Salle Saint Claire 3
Format: Live from venue

Moderator(s): Anais Baudot

  • Florian Klimm, Novo Nordisk Research Centre Oxford, United Kingdom
  • Maxwell Ruby, Novo Nordisk Research Centre Oxford, United Kingdom
  • Robert Kitchen, Novo Nordisk Research Centre Oxford, United Kingdom


Genome-wide loss-of-function screens are powerful tools for deciphering gene knockout effects and identifying therapeutic targets. To what extent such data can be used to refine regulatory pathways, however, is unknown. We demonstrate that constructing a correlation network from the DepMap CRISPR knockout screens reveals genes with common biological functions. Extracting subnetworks that represent well-studied pathways uncovers a hierarchical substructure that coincides with the compartmentalisation of glycolysis. The incompleteness of pathway annotation data raises the question of whether we can build on the loss-of-function correlation network to refine these annotations. Using network propagation, specifically the well-established personalised PageRank algorithm, we identify genes that are in proximity to selected seed genes. We verify this approach with cross-validation and outperform baseline ranking across a wide range of parameters. We then demonstrate the method by using members of the Ragulator-Rag complex and identify genes functionally associated with this complex. The presented method is a general tool for ranking genes from a list of seed genes, based on similarity in loss-of-function screens. As such, we anticipate that it can be used as a hypothesis generator for biologists who aim to extend an identified list of genes.
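
The seed-based ranking described in this abstract can be illustrated with a minimal sketch of personalised PageRank computed by power iteration. This is not the authors' implementation: the function name personalised_pagerank, the restart probability alpha, the node labels A to E, and the toy edge weights (standing in for loss-of-function correlations) are all hypothetical values chosen for illustration.

```python
def personalised_pagerank(weights, seeds, alpha=0.15, tol=1e-10, max_iter=1000):
    """Rank nodes of a weighted undirected graph by proximity to seed nodes.

    weights: dict mapping (u, v) pairs to non-negative edge weights
             (hypothetical stand-in for correlations between gene profiles).
    seeds:   iterable of seed node names.
    alpha:   restart probability (an assumed value, not taken from the abstract).
    """
    # Build a symmetric adjacency map from the edge list.
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    nodes = sorted(adj)

    seeds = [s for s in seeds if s in adj]
    if not seeds:
        raise ValueError("no seed is present in the network")

    # The random walker restarts uniformly over the seed nodes.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    strength = {n: sum(adj[n].values()) for n in nodes}

    score = dict(restart)
    for _ in range(max_iter):
        new = {n: alpha * restart[n] for n in nodes}
        for u in nodes:
            if strength[u] == 0.0:
                # A node with only zero-weight edges returns its mass to the seeds.
                for n in nodes:
                    new[n] += (1 - alpha) * score[u] * restart[n]
                continue
            # Otherwise the walker moves to a neighbour with probability
            # proportional to the edge weight.
            for v, w in adj[u].items():
                new[v] += (1 - alpha) * score[u] * w / strength[u]
        delta = sum(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break

    # Highest score first: nodes most strongly connected to the seeds.
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


# Toy chain A - B - C - D - E with a weak link between C and D (made-up weights).
toy = {("A", "B"): 0.9, ("B", "C"): 0.8, ("C", "D"): 0.1, ("D", "E"): 0.9}
ranking = personalised_pagerank(toy, seeds=["A"])
for node, value in ranking:
    print(f"{node}\t{value:.3f}")
```

In this sketch, nodes behind the weak C-D link receive little score, which mirrors the idea of ranking candidates by their network proximity to the seeds. In practice a library routine such as NetworkX's pagerank with its personalization argument would be a natural substitute for the hand-written loop.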

14:50-15:10
Gene-specific optimization of binding sites integration to expression data improves regression-based Gene Regulatory Network inference in Arabidopsis thaliana
Room: Salle Saint Claire 3
Format: Live from venue

Moderator(s): Anais Baudot

  • Océane Cassan, LIRMM, Univ Montpellier, CNRS, Montpellier, France
  • Charles-Henri Lecellier, IGMM, Univ Montpellier, CNRS, Montpellier, France
  • Antoine Martin, IPSIM, CNRS, INRAE, Institut Agro, Univ Montpellier, Montpellier, France
  • Laurent Bréhélin, LIRMM, Univ Montpellier, CNRS, Montpellier, France
  • Sophie Lèbre, IMAG, Univ Montpellier, CNRS, Montpellier, France


Gene Regulatory Networks (GRNs) are abstract models of the molecular interactions governing gene expression. In the last decade, integrative regression-based strategies have successfully emerged to guide GRN inference from gene expression with prior data.
However, prior knowledge datasets and validation gold standards are often redundant. This lack of complete and independent evaluation calls for new criteria to robustly estimate the optimal intensity of prior knowledge integration. Furthermore, previous works in integrative GRN inference proposed a system-level strength of integration, whereas the added value of prior information might vary from gene to gene.
We address these limitations for two common regression models, an integrative Random Forest and a generalized linear model with stability selection estimated under a weighted LASSO penalty as we model the temporal response to nitrate induction in Arabidopsis thaliana.
For each gene, we measure how the integration of transcription factor binding sites influences gene expression prediction. Comparison with randomly permuted datasets, where the link between gene expression and prior information is broken, allows us to optimize integration strength.
As a result, this gene-specific integration optimization scheme provides a good trade-off between data integration intensity, prediction error minimization and precision on experimental interactions, while master regulators of nitrate induction are accurately retrieved.

15:10-15:30
Proceedings Presentation: Gemini: Memory-efficient integration of hundreds of gene networks with high-order pooling
Room: Salle Saint Claire 3
Format: Live from venue

Moderator(s): Anais Baudot

  • Addie Woicik, University of Washington, United States
  • Mingxin Zhang, University of Washington, United States
  • Hanwen Xu, University of Washington, United States
  • Sara Mostafavi, University of Washington, United States
  • Sheng Wang, University of Washington, United States


The exponential growth of genomic sequencing data has created ever-expanding repositories of gene networks. Unsupervised network integration methods are critical to learn informative representations for each gene, which are later used as features for downstream applications. However, these network integration methods must be scalable to account for the increasing number of networks and robust to an uneven distribution of network types within hundreds of gene networks. To address these needs, we present Gemini, a novel network integration method that uses memory-efficient high-order pooling to represent and weight each network according to its uniqueness. Gemini then mitigates the uneven distribution through mixing up existing networks to create many new networks. We find that Gemini leads to more than a 10% improvement in F1 score, 14% improvement in micro-AUPRC, and 71% improvement in macro-AUPRC for protein function prediction by integrating hundreds of networks from BioGRID, and that Gemini's performance significantly improves when more networks are added to the input network collection, while the comparison approach's performance deteriorates. Gemini thereby enables memory-efficient and informative network integration for large gene networks, and can be used to massively integrate and analyze networks in other domains. Gemini can be accessed at: https://github.com/MinxZ/Gemini.

16:00-16:20
Proceedings Presentation: Supervised biological network alignment with graph neural networks
Room: Salle Saint Claire 3
Format: Live-stream

Moderator(s): Natasa Przulj

  • Kerr Ding, Georgia Institute of Technology, United States
  • Sheng Wang, University of Washington, United States
  • Yunan Luo, Georgia Institute of Technology, United States


Despite the advances in sequencing technology, a massive number of proteins with known sequences remain functionally unannotated. Biological network alignment (NA), which aims to find the node correspondence between species' protein-protein interaction (PPI) networks, has been a popular strategy to uncover missing annotations by transferring functional knowledge across species. Traditional NA methods assumed that topologically similar proteins in PPI networks are functionally similar. However, it was recently reported that functionally unrelated proteins can be as topologically similar as functionally related pairs, and a new data-driven or supervised NA paradigm has been proposed, which uses protein function data to discern which topological features correspond to functional relatedness. Here, we propose GraNA, a deep learning framework for the supervised NA paradigm. Employing graph neural networks, GraNA utilizes within-network interactions and across-network anchor links for learning protein representations and predicting functional correspondence between across-species proteins. A major strength of GraNA is its flexibility to integrate multi-faceted non-functional relationship data, such as sequence similarity and ortholog relationships, as anchor links to guide the mapping of functionally related proteins across species. Evaluating GraNA on a benchmark dataset composed of several NA tasks between different pairs of species, we observed that GraNA accurately predicted the functional relatedness of proteins and robustly transferred functional annotations across species, outperforming a number of existing NA methods. When applied to a case study on a humanized yeast network, GraNA also successfully discovered functionally replaceable human-yeast protein pairs that were documented in previous studies.

16:20-16:40
Accurate Cross-Species, Out-of-Distribution Predictions of Protein-Protein Interactions using Deep Learning
Room: Salle Saint Claire 3
Format: Live from venue

Moderator(s): Natasa Przulj

  • Joseph Szymborski, McGill University, Canada
  • Amin Emad, McGill University, Canada


Advancements in technology have led to “all-by-all” proteome-scale protein-protein interaction (PPI) experiments, which typically involve large investments of time, money, and resources. Such experiments are usually performed on a select few well-known model organisms, leading to a significant disparity in PPI information. In silico methods represent a potential solution to narrow this information gap by requiring fewer resources and less time. However, machine learning methods have shown an inability to generalize predictions of PPIs, which led to the development of the RAPPPID model, a deep regularized PPI prediction model.

We show here for the first time that RAPPPID makes accurate interaction predictions between proteins, independent of their species of origin or their presence in the training dataset. Additionally, RAPPPID maintains comparable performance when tested on various species independent of their evolutionary distance. The model is calibrated against strict datasets which are carefully controlled for data leakage. The RAPPPID online interface offers an accessible service for interactive predictions. Altogether, we show that RAPPPID is able to make predictions that are more accurate than existing methods, effective across species, and computationally efficient.

16:40-17:00
Proceedings Presentation: Trap spaces of multi-valued networks: Definition, computation, and applications
Room: Salle Saint Claire 3
Format: Live from venue

Moderator(s): Natasa Przulj

  • Van-Giang Trinh, Aix-Marseille University, France
  • Belaid Benhamou, Aix-Marseille University, France
  • Thomas Henzinger, Institute of Science and Technology Austria, Austria
  • Samuel Pastva, Institute of Science and Technology Austria, Austria


Boolean networks are a simple but efficient mathematical formalism for modeling complex biological systems. However, having only two levels of activation is sometimes not enough to fully capture the dynamics of real-world biological systems. Hence the need for multi-valued networks, a generalization of Boolean networks. Despite the importance of multi-valued networks for modeling biological systems, only limited progress has been made on developing theories, analysis methods, and tools that can support them. In particular, the recent use of trap spaces in Boolean networks made a great impact on the field of systems biology, but there has been no similar concept defined and studied for multi-valued networks to date.

In this work, we generalize the concept of trap spaces from Boolean networks to multi-valued networks. We then develop the theory and the analysis methods for trap spaces in multi-valued networks. In particular, we implement all proposed methods in a Python package called trapmvn. Besides showing the applicability of our approach via a realistic case study, we also evaluate the time efficiency of the method on a large collection of real-world models. The experimental results confirm the time efficiency, which we believe enables more accurate analysis on larger and more complex multi-valued models.

17:00-17:20
Proceedings Presentation: Optimal adjustment sets for causal query estimation in partially observed biomolecular networks
Room: Salle Saint Claire 3
Format: Live from venue

Moderator(s): Natasa Przulj

  • Sara Mohammad-Taheri, Northeastern University, United States
  • Vartika Tewari, Northeastern University, United States
  • Rohan Kapre, Northeastern University, United States
  • Ehsan Rahiminasab, Google Inc, United States
  • Karen Sachs, Next Generation Analytics, United States
  • Charles Tapley Hoyt, Laboratory of Systems Pharmacology, Harvard Medical School, United States
  • Jeremy Zucker, Pacific Northwest National Laboratory, United States
  • Olga Vitek, Northeastern University, United States


Causal query estimation commonly selects a valid adjustment set, i.e. a subset of covariates in a model that eliminates the bias of the estimator. The same query may have multiple valid adjustment sets, each with a different variance. When networks are partially observed, current methods use graph-based criteria to find an adjustment set that minimizes asymptotic variance. Many models share the same graph topology, and thus share the same functional dependencies, but may differ in the specific functions that generate the observational data. In these cases, the topology-based criteria fail to distinguish the variances of the adjustment sets. This deficiency can lead to sub-optimal adjustment sets, and to mischaracterization of the effect of the intervention. We propose an approach for deriving optimal adjustment sets that take into account the nature of the data generation process, bias, finite-sample variance, and cost. It empirically learns the data generating processes from historical experimental data, and characterizes the properties of the estimators by simulation. We demonstrate the utility of the proposed approach in four biomolecular case studies with different topologies and different data generation processes. The implementation and reproducible case studies are at https://anonymous.4open.science/r/OptimalAdjustmentSet-E543

17:20-18:00
Invited Presentation: Triadic Closure and Bistability in Evolving Networks
Room: Salle Saint Claire 3
Format: Live from venue

Moderator(s): Natasa Przulj

  • Desmond Higham


In the study of social interaction networks, triadic closure describes the tendency for new friendships to form between individuals who already have friends in common. It has been argued heuristically that a triadic closure mechanism can lead to a bistability effect when large-scale social interaction networks evolve over time. Here, depending on the initial state and the transient dynamics, the system may evolve towards either of two long-time states. In this work, we propose and study a hierarchy of network evolution models that incorporate triadic closure. We use a chemical kinetics framework, paying careful attention to the reaction rate scaling with respect to the system size. In a macroscale regime, we show rigorously that a bimodal steady state distribution is admitted. This behaviour corresponds to the existence of two distinct stable fixed points in a deterministic mean-field ODE. The macroscale model is also seen to capture an apparent metastability property of the microscale system. Computational simulations will be used to support the analysis. This is joint work with Stefano Di Giovacchino (University of L'Aquila) and Kostas Zygalakis (University of Edinburgh).