Schedule subject to change
All times listed are in EDT
Sunday, July 14th
15:10-15:20
NetBio Opening
Room: 520c
Format: In Person

Moderator(s): Anaïs Baudot


Authors List:

  • Martina Summer-Kutmon
15:20-16:00
Invited Presentation: Towards semantic representation and causal inference in biomedicine. Challenges and applications
Confirmed Presenter: Sergio Baranzini

Room: 520c
Format: In Person

Moderator(s): Anaïs Baudot


Authors List:

  • Sergio Baranzini

Presentation Overview:

Massive amounts of data and information are available for analysis in biomedicine. However, integration of these resources in a powerful statistical but also biologically meaningful framework poses a considerable challenge. SPOKE is a large knowledge graph containing information from more than 40 specialized databases and spanning multiple disciplines within biomedicine. Currently SPOKE contains 50 million concepts and more than 130 million relationships organized in a semantic graph. This talk will cover the creation of SPOKE and some of its cutting-edge applications. Some examples will include the embedding of more than 2 million electronic health records onto SPOKE, which led to training of machine learning models to predict diagnosis and outcomes in multiple sclerosis (MS), Parkinson’s disease (PD) and Alzheimer’s (AD). In addition, efforts directed towards applications in drug development and repurposing will be presented. Finally, automated integrative strategies are needed to fully harness the power of biomedical information. To that end, a novel method of knowledge graph-based retrieval augmentation (KG-RAG) implemented over SPOKE will be discussed.

16:40-17:00
Proceedings Presentation: Modeling metastatic progression from cross-sectional cancer genomics data
Confirmed Presenter: Kevin Rupp, ETH Zurich, Switzerland

Room: 520c
Format: In Person

Moderator(s): Anaïs Baudot


Authors List:

  • Kevin Rupp, ETH Zurich, Switzerland
  • Andreas Lösch, University of Regensburg, Germany
  • Y. Linda Hu, University of Regensburg, Germany
  • Chenxi Nie, ETH Zurich, Switzerland
  • Rudolf Schill, ETH Zurich, Switzerland
  • Maren Klever, RWTH Aachen, Germany
  • Simon Pfahler, University of Regensburg, Germany
  • Lars Grasedyck, RWTH Aachen, Germany
  • Tilo Wettig, University of Regensburg, Germany
  • Niko Beerenwinkel, ETH Zurich, Switzerland
  • Rainer Spang, University of Regensburg, Germany

Presentation Overview:

Metastasis formation is a hallmark of cancer lethality. Yet, metastases are generally unobservable during their early stages of dissemination and spread to distant organs. Genomic datasets of matched primary tumors and metastases may offer insights into the underpinnings and the dynamics of metastasis formation. We present metMHN, a cancer progression model designed to deduce the joint progression of primary tumors and metastases using cross-sectional cancer genomics data. The model elucidates the statistical dependencies among genomic events, the formation of metastasis, and the clinical emergence of both primary tumors and their metastatic counterparts. metMHN enables the chronological reconstruction of mutational sequences and facilitates estimation of the timing of metastatic seeding. In a study of nearly 5000 lung adenocarcinomas, metMHN pinpointed TP53 and EGFR as mediators of metastasis formation. Furthermore, the study revealed that post-seeding adaptation is predominantly influenced by frequent copy number alterations. All datasets and code are available on GitHub at https://github.com/cbg-ethz/metMHN.

17:00-17:20
Multi-omics systems biology approach identifies novel signature genes for neuropsychiatric disorders
Room: 520c
Format: In Person

Moderator(s): Anaïs Baudot


Authors List:

  • Deisy Morselli Gysi, Federal University of Paraná, Brazil
  • Katja Nowick, Free University Berlin, Germany

Presentation Overview:

The complex nature of mental disorders has long fascinated scientists, driving them to uncover the shared genetic factors that link these conditions. Much is still unknown about the genetic overlap across mental disorders and about the specificities of their genetic underpinnings. We created a gene-disease network using genes associated with disorders from multiple curated sources, which revealed clusters of highly genetically related diseases, corroborating the main chapters of the DSM-5. Interestingly, psychiatric disorders formed a tight cluster with neurodegenerative disorders. This prompted us to investigate that cluster using a combination of gene coexpression networks and protein-protein interaction networks. To this end, we constructed 61 independent coexpression networks, focusing on transcription factors (TFs), from studies including data from patients with autism spectrum disorder, bipolar disorder, major depressive disorder, schizophrenia, Alzheimer’s disease and Parkinson's disease, as well as control individuals. We employed rigorous statistical methods to reduce bias between studies and the number of false positive links, and performed a differential network analysis to compare networks across diseases. Our analysis pinpointed signature TF genes for each disorder that could help improve disease diagnosis. Taken together, our discoveries not only advance our understanding of the interconnectedness of the investigated mental diseases but also offer the possibility of improving diagnostic approaches to distinguish between diseases, ultimately benefiting individuals affected by these challenging disorders.

17:20-17:40
Improved community detection through signed graphs in single-cell co-expression networks
Confirmed Presenter: Luis Augusto Eijy Nagai, University of Tokyo, Institute for Quantitative Biosciences, Japan

Room: 520c
Format: In Person

Moderator(s): Anaïs Baudot


Authors List:

  • Luis Augusto Eijy Nagai, University of Tokyo, Institute for Quantitative Biosciences, Japan
  • Ryuichiro Nakato, University of Tokyo, Institute for Quantitative Biosciences, Japan

Presentation Overview:

Recent advances in single-cell RNA sequencing (scRNA-seq) have highlighted the limitations of traditional gene co-expression network analysis in capturing the full spectrum of gene relationships, particularly in terms of negative correlations. Our study introduces an improved community detection method leveraging signed graphs in single-cell gene co-expression networks (scGCNs) to address this gap. We compared the traditional Louvain algorithm with our proposed Louvain Signed method across three distinct tests: a simulated dataset with inherent subgroups, a real dataset of CD4 cell subtypes, and a challenging dataset of ventral midbrain cells exhibiting stemness properties. The Louvain Signed approach demonstrated superior capability in distinguishing nested gene groups, identifying crucial marker genes, and discerning gene communities linked to specific biological functions, even in datasets where cell types were not clearly defined. Our findings suggest that incorporating both positive and negative gene correlations significantly enhances the resolution and relevance of community detection in scGCNs, offering a more nuanced understanding of cellular functions in single-cell studies. This approach promises to refine our understanding of gene dynamics and cellular heterogeneity, complementing existing methods in single-cell analysis.

17:40-18:00
Fast Gene Regulatory Network Inference in Single-cell RNA-Seq with RegDiffusion
Confirmed Presenter: Hao Zhu, Tufts University, United States

Room: 520c
Format: In Person

Moderator(s): Anaïs Baudot


Authors List:

  • Hao Zhu, Tufts University, United States
  • Donna Slonim, Tufts University, United States

Presentation Overview:

Understanding gene regulatory networks (GRNs) is crucial for elucidating cellular mechanisms and advancing therapeutic interventions. Many existing methods struggle with the high dimensionality and inherent noise of single-cell data. Inspired by our previous work on dropout augmentation, we introduce RegDiffusion, a new class of Denoising Diffusion Probabilistic Models for fast and accurate GRN inference. RegDiffusion adds Gaussian noise to the input gene expression data following a diffusion schedule, and a neural network with a parameterized adjacency matrix is trained to predict the added noise. This approach eliminates costly matrix inversion and significantly accelerates the inference process. Analyzing real-world single-cell data with over 14,000 genes now completes in under five minutes, in contrast to the hours required by previous deep learning methods. Further, to verify the biological validity of the inferred networks, we visualized the inferred local regulatory neighborhood around well-studied key genes in mouse microglia cells. We found that genes identified in those neighborhoods are consistent with prior biological knowledge, and genes from the same functional groups are often topologically clustered together. Finally, we would like to demonstrate the regdiffusion package, which includes a straightforward interface to this model and a set of tools to analyze and visualize the inferred GRNs. Overall, with its capacity for rapid inference on large-scale data and the explainability of the inferred networks, we believe RegDiffusion will be a useful tool in computational biology and help deliver new insights into complex biological data. Project site: https://tuftsbcb.github.io/RegDiffusion/
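
The noising step described in this overview can be sketched as follows. This is only a minimal illustration of the standard DDPM forward process, assuming a linear noise schedule; the function names, schedule values, and toy matrix shape are illustrative assumptions and are not taken from the regdiffusion package.

```python
import numpy as np


def linear_beta_schedule(n_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule (values are illustrative)."""
    return np.linspace(beta_start, beta_end, n_steps)


def add_noise(x0, t, alpha_bar, rng):
    """Standard DDPM forward step:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps


rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal retention per step

# Toy expression matrix: 8 cells x 5 genes (random placeholder data).
x0 = rng.standard_normal((8, 5))
x_t, eps = add_noise(x0, t=500, alpha_bar=alpha_bar, rng=rng)

# A denoising network would be trained to predict `eps` from (x_t, t),
# for example by minimizing the mean squared error against `eps`.
```

In a model of the kind the overview describes, the network predicting `eps` would contain the learnable adjacency matrix; that part is omitted here because its details are not given in the overview.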

Monday, July 15th
10:40-11:20
Invited Presentation: Using proximity-dependent biotinylation to understand dynamic cell organization
Confirmed Presenter: Anne-Claude Gingras

Room: 520c
Format: In Person

Moderator(s): Chad Myers


Authors List:

  • Anne-Claude Gingras

Presentation Overview:

Compartmentalization is essential for all complex forms of life. In eukaryotic cells, membrane-bound organelles and a multitude of protein- and nucleic acid-rich subcellular structures maintain boundaries and serve as enrichment zones to promote and regulate protein function, including signalling events. Consistent with the critical importance of these boundaries, alterations in the machinery that mediates protein transport between these compartments have been implicated in several diverse diseases. Understanding the composition of each cellular “compartment” (be it a classical organelle or a large protein complex) remains a challenging task. Using the proximity-dependent biotinylation approach BioID, we systematically mapped the composition of various subcellular structures, using well-characterized subcellular markers for a specified location as bait proteins. We defined how relationships between “prey” proteins detected through this approach can help understand the protein organization inside a cell, further facilitated by newly developed computational tools. We will first discuss our map of a human cell containing major organelles and non-membrane-bound structures at steady state, and illustrate how this map can be leveraged to devise “compartment sensors” to explore dynamic cell signalling. We will then describe a computational and experimental strategy to generate multiple contextual maps of subcellular organization.

11:20-11:40
Proceedings Presentation: GraphCompass: Spatial metrics for differential analyses of cell organization across conditions
Confirmed Presenter: Merel Kuijs, Helmholtz Munich, Germany

Room: 520c
Format: In Person

Moderator(s): Chad Myers


Authors List:

  • Mayar Ali, Helmholtz Munich, Germany
  • Merel Kuijs, Helmholtz Munich, Germany
  • Soroor Hediyeh-zadeh, Helmholtz Munich, Germany
  • Tim Treis, Helmholtz Munich, Germany
  • Karin Hrovatin, Helmholtz Munich, Germany
  • Giovanni Palla, Helmholtz Munich, Germany
  • Anna Schaar, Helmholtz Munich, Germany
  • Fabian Theis, Helmholtz Munich, Germany

Presentation Overview:

Spatial omics technologies are increasingly leveraged to characterize how disease disrupts tissue organization and cellular niches. While multiple methods to analyze spatial variation within a sample have been published, statistical and computational approaches to compare cell spatial organization across samples or conditions are mostly lacking. We present GraphCompass, a comprehensive set of omics-adapted graph analysis methods to quantitatively evaluate and compare the spatial arrangement of cells in samples representing diverse biological conditions. GraphCompass builds upon the Squidpy spatial omics toolbox and encompasses various statistical approaches to perform cross-condition analyses at the level of individual cell types, niches, and samples. Additionally, GraphCompass provides custom visualization functions that enable effective communication of results. We demonstrate how GraphCompass can be used to address key biological questions, such as how cellular organization and tissue architecture differ across various disease states and which spatial patterns correlate with a given pathological condition. GraphCompass can be applied to various popular omics techniques, including, but not limited to, spatial proteomics (e.g. MIBI-TOF), spot-based transcriptomics (e.g. 10x Genomics Visium), and single-cell resolved transcriptomics (e.g. Stereo-seq). In this work, we showcase the capabilities of GraphCompass through its application to three different studies that may also serve as benchmark datasets for further method development. With its easy-to-use implementation, extensive documentation, and comprehensive tutorials, GraphCompass is accessible to biologists with varying levels of computational expertise. By facilitating comparative analyses of cell spatial organization, GraphCompass promises to be a valuable asset in advancing our understanding of tissue function in health and disease.

11:40-12:00
Functional analysis of MS-based proteomics data: from protein groups to networks
Confirmed Presenter: Nadezhda T. Doncheva, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark

Room: 520c
Format: In Person

Moderator(s): Chad Myers


Authors List:

  • Nadezhda T. Doncheva, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark
  • Marie Locard-Paulet, Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, France
  • John H. Morris, Resource on Biocomputing, Visualization, and Informatics, University of California, San Francisco, United States
  • Lars Juhl Jensen, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark

Presentation Overview:

In high-throughput mass spectrometry (MS), proteins are digested into peptides and the peptide MS signals are then used to infer protein relative quantities across samples. Proteins that cannot be unambiguously distinguished based on the available set of peptides are reported as protein groups containing several protein accessions. However, typical follow-up analyses, such as gene set enrichment and protein interaction network analysis, are based on gene-level annotation. Thus, they can only be performed on single proteins or genes, rendering such analyses incompatible with protein group outputs. Currently, there is no best practice for handling this, and its impact on functional analysis is unknown.
Here, we investigate the composition of protein groups identified in 14 published proteomics data sets, including deep proteomes, phosphoproteomics data, single-cell proteomics, and pull-down experiments from different species. We show that the arbitrary choice of one protein from each group affects gene set enrichment and network analysis to varying degrees, and that this issue should especially not be ignored in network analysis. We thus developed the Cytoscape app “Proteo Visualizer” that can complement the widely used stringApp by creating STRING networks from protein group input instead of single protein accessions. In the resulting networks, each protein group is represented as a single node that inherits all existing edges of the group members. In addition, all relevant node and edge attributes are aggregated. This app opens new avenues for performing network analysis with protein groups from MS studies.
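
The idea of representing each protein group as a single node that inherits its members' edges can be illustrated with a small sketch. This is not the Proteo Visualizer implementation: the protein identifiers, the toy edge list, and the function name `collapse_groups` are hypothetical, and attribute aggregation is reduced to keeping the highest edge score, which is an assumption made only for this example.

```python
def collapse_groups(edges, groups):
    """Collapse member proteins into one node per protein group.

    edges:  dict mapping frozenset({protein_a, protein_b}) -> score
            (no self-loops are expected in this toy format)
    groups: dict mapping group name -> list of member proteins
    Returns group-level edges; each group inherits the edges of all
    its members, and parallel edges keep the highest score
    (an assumed aggregation rule).
    """
    member_to_group = {m: g for g, members in groups.items() for m in members}
    collapsed = {}
    for pair, score in edges.items():
        a, b = tuple(pair)
        ga = member_to_group.get(a, a)  # ungrouped proteins keep their own ID
        gb = member_to_group.get(b, b)
        if ga == gb:
            continue  # drop edges between members of the same group
        key = frozenset((ga, gb))
        collapsed[key] = max(score, collapsed.get(key, 0.0))
    return collapsed


# Hypothetical input: P1 and P2 cannot be distinguished and form one group.
edges = {
    frozenset(("P1", "P3")): 0.9,
    frozenset(("P2", "P3")): 0.7,
    frozenset(("P1", "P2")): 0.8,
    frozenset(("P2", "P4")): 0.5,
}
groups = {"P1;P2": ["P1", "P2"]}
network = collapse_groups(edges, groups)
```

Here the group node `P1;P2` ends up connected to both `P3` and `P4`, whereas picking only `P1` as the group representative would have lost the edge to `P4`.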

12:00-12:20
Direct Contacts 2: identification of direct physical interactions from > 25,000 mass spectrometry experiments
Confirmed Presenter: Kevin Drew, University of Illinois at Chicago, United States

Room: 520c
Format: In Person

Moderator(s): Chad Myers


Authors List:

  • Erin Claussen, University of Illinois at Chicago, United States
  • Kevin Drew, University of Illinois at Chicago, United States

Presentation Overview:

Protein complexes are essential to biological function and, when disrupted, can cause adverse health outcomes such as neurodegenerative disease, developmental disease, and cancer. Many research efforts have identified thousands of protein complexes, yet we are severely limited in our knowledge of direct physical interactions among complex subunits. Further, knowledge of the three-dimensional (3D) structure of protein complexes illuminates their function, but unfortunately the 3D structures of the vast majority of protein complexes are unsolved. These include many implicated in disease, limiting our ability to interpret human pathogenic mutations. Here we describe Direct Contacts 2, our machine-learning model for predicting direct physical interactions among pairs of proteins. Our highly accurate network consists of >15k protein pairs predicted to directly interact within identified protein complexes. We developed the Direct Contacts 2 model using > 25,000 high-throughput mass spectrometry experiments, including affinity purification and co-fractionation mass spectrometry experiments. Our model was built using the AutoGluon model selection framework and trained on physically interacting proteins from Protein Data Bank (PDB) structures. We evaluated our method using left-out sets of structures from the PDB, a large set of AlphaFold2 structure predictions, and chemical cross-linking data. We illustrate the usefulness of our model in investigating complexes associated with developmental disease, including using our predictions to build AlphaFold-multimer models of Oral-facial-digital syndrome associated proteins and modeling specific mutations. This work informs future research on human genetic disease and provides a framework to place disease mutations into their structural context.

14:20-14:40
Proceedings Presentation: Identifying new cancer genes based on the integration of annotated gene sets via hypergraph neural networks
Confirmed Presenter: Chao Deng, Central South University, China

Room: 520c
Format: Live Stream

Moderator(s): Deisy Gysi


Authors List:

  • Chao Deng, Central South University, China
  • Hongdong Li, Central South University, China
  • Lishen Zhang, Central South University, China
  • Yiwei Liu, Central South University, China
  • Yaohang Li, Old Dominion University, United States
  • Jianxin Wang, Central South University, China

Presentation Overview:

Motivation: Identifying cancer genes remains a significant challenge in cancer genomics research. Annotated gene sets encode functional associations among multiple genes, and cancer genes have been shown to cluster in hallmark signaling pathways and biological processes. The knowledge of annotated gene sets is critical for discovering cancer genes but remains to be fully exploited.
Results: Here, we present the DIsease-Specific Hypergraph neural network (DISHyper), a hypergraph-based computational method that integrates the knowledge from multiple types of annotated gene sets to predict cancer genes. First, our benchmark results demonstrate that DISHyper outperforms the existing state-of-the-art methods and highlight the advantages of employing hypergraphs for representing annotated gene sets. Second, we validate the accuracy of DISHyper-predicted cancer genes using functional validation results and multiple independent functional genomics data. Third, our model predicts 44 novel cancer genes, and subsequent analysis shows their significant associations with multiple types of cancers. Overall, our study provides a new perspective for discovering cancer genes and reveals previously undiscovered cancer genes.
Availability: DISHyper is freely available for download at https://github.com/genemine/DISHyper.

14:40-15:00
Are under-studied proteins under-represented? How to fairly evaluate link prediction algorithms in network biology
Confirmed Presenter: Mehmet Koyutürk, Case Western Reserve University, United States

Room: 520c
Format: In Person

Moderator(s): Deisy Gysi


Authors List:

  • Serhan Yılmaz, Case Western Reserve University, United States
  • Kaan Yorgancioglu, Case Western Reserve University, United States
  • Mehmet Koyutürk, Case Western Reserve University, United States

Presentation Overview:

For biomedical applications, new link prediction algorithms are continuously being developed and these algorithms are typically evaluated computationally, using test sets generated by sampling the edges uniformly at random. However, as we demonstrate, this evaluation approach introduces a bias towards “rich nodes”, i.e., those with higher degrees in the network. More concerningly, this bias persists even when different network snapshots are used for evaluation, as recommended in the machine learning community. This creates a cycle in research where newly developed algorithms generate more knowledge on well-studied biological entities while under-studied entities are commonly overlooked. To overcome this issue, we propose a weighted validation setting specifically focusing on under-studied entities and present AWARE strategies to facilitate bias-aware training and evaluation of link prediction algorithms. These strategies can help researchers gain better insights from computational evaluations and promote the development of new algorithms focusing on novel findings and under-studied proteins.
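
The degree bias described in this overview can be made concrete with a toy calculation. In the sketch below, the graph and the inverse-degree edge weighting are illustrative assumptions; the weighting is not the authors' AWARE procedure, only one simple way to down-weight edges that touch high-degree nodes.

```python
from collections import Counter

# Toy network: a hub "A" connected to five nodes, plus one peripheral edge.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "F"), ("G", "H")]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Under uniform edge sampling every edge has probability 1 / len(edges),
# so the chance that a test edge touches a node is proportional to its degree.
hub_share_uniform = sum(1 for u, v in edges if "A" in (u, v)) / len(edges)

# Assumed alternative: weight each edge by 1 / (deg(u) * deg(v)).
weights = [1.0 / (degree[u] * degree[v]) for u, v in edges]
total = sum(weights)
hub_share_weighted = sum(
    w for (u, v), w in zip(edges, weights) if "A" in (u, v)
) / total

print(f"hub share, uniform sampling:  {hub_share_uniform:.2f}")
print(f"hub share, weighted sampling: {hub_share_weighted:.2f}")
```

With uniform sampling, most of the test set in this toy graph consists of edges at the hub, so an evaluation metric mostly reflects performance on the best-connected node; the weighted variant gives the peripheral edge a comparable share.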

15:00-15:20
Protein Large Language Models are Effective, Generalized Protein-Protein Interaction Predictors
Confirmed Presenter: Joseph Szymborski, McGill University, Department of Electrical and Computer Engineering, Canada

Room: 520c
Format: In Person

Moderator(s): Deisy Gysi


Authors List:

  • Joseph Szymborski, McGill University, Department of Electrical and Computer Engineering, Canada
  • Amin Emad, McGill University, Department of Electrical and Computer Engineering, Canada

Presentation Overview:

Advancements in large language model (LLM) training have led to their widespread use across various applications, including predicting protein secondary structure and function using protein LLMs (pLLMs). Few studies characterize the suitability of pLLMs for protein-protein interaction (PPI) inference, and none have yet investigated the role of data leakage. Training models for PPI prediction poses challenges related to data leakage and generalization, and because the pre-training data of pLLMs frequently include proteins found in the testing set's PPI pairs, pLLMs are a potential source of data leakage. This study evaluates pLLM suitability for PPI inference, focusing on data leakage and optimal architectural considerations for accuracy.

To test hypotheses regarding pLLMs and their effects on hyperparameters, we trained numerous pLLMs using an efficient transformer architecture (SqueezeBERT) and a high-quality dataset of amino acid sequences (SWISS-PROT), significantly improving the problem’s tractability by reducing both training time and equipment costs. We tested two datasets, "strict" and "non-strict", to investigate the role of data leakage in the performance of pLLM-based PPI inference models. The "strict" dataset excluded proteins found in the testing set of our PPI dataset, while the "non-strict" dataset had no such restrictions. Our experiments revealed that data leakage plays a negligible role in performance and may be mitigated by tuning architecture.

Our benchmarking of various pre-trained pLLMs against previous state-of-the-art methods, which do not use pLLMs, showed that all pLLM-based methods outperformed traditional PPI inference methods. Together, our results demonstrate for the first time that pLLM embeddings are both generalized and effective features for PPI inference.

15:20-15:40
Addressing data scarcity in biomedical research using Multilayer Networks
Confirmed Presenter: Iker Núñez Carpintero, Barcelona Supercomputing Center, Spain

Room: 520c
Format: In Person

Moderator(s): Deisy Gysi


Authors List:

  • Iker Núñez Carpintero, Barcelona Supercomputing Center, Spain
  • Davide Cirillo, Barcelona Supercomputing Center, Spain
  • Alfonso Valencia, Barcelona Supercomputing Centre BSC, Spain

Presentation Overview:

Realization of the new paradigm brought by Precision Medicine heavily relies on the development of integrative and cost-effective methodologies for analyzing biomedical data. However, its application to rare diseases and precision oncology faces a major challenge: data scarcity.

To address data-scarce biomedical scenarios, it is crucial to explore the potential of integrating complementary biomedical knowledge with patient-specific data. Here, we illustrate how network biology, and in particular multilayer network models, can play a pivotal role in addressing this challenge, given their advantages in molecular interpretability.

We delve into three distinct applications of multilayer network models for the analysis of biomedical scenarios heavily impacted by data scarcity. The first article presents a personalized medicine study of the molecular determinants of severity in congenital myasthenic syndromes (CMS), a group of rare disorders of the neuromuscular junction (NMJ), recently published (February 2024) in Nature Communications.
In the second publication, our focus shifts to the application of multilayer networks in prioritizing genes in medulloblastoma, a childhood brain tumor. This extends and refines concepts introduced in our previous research.
Lastly, we present ongoing efforts aimed at applying the developed methodologies to provide a comprehensive molecular understanding of the existing subtypes of rare PIK3CA/TEK vascular malformations.

This work presents major advances on the use of multilayer network-based approaches for the application of precision medicine to data-scarce scenarios, exploring the potential of integrating extensive available biomedical knowledge with patient-specific data.

15:40-16:00
Target repositioning using multi-layer networks and machine learning: the case of prostate cancer
Confirmed Presenter: Milan Picard, Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada

Room: 520c
Format: In Person

Moderator(s): Deisy Gysi


Authors List:

  • Milan Picard, Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
  • Marie-Pier Scott-Boyer, Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
  • Antoine Bodein, Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
  • Mickaël Leclercq, Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
  • Olivier Perin, Digital Transformation and Innovation Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
  • Arnaud Droit, Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada

Presentation Overview:

The discovery of novel drug targets typically represents the first and most important step of drug discovery. One solution for target discovery is target repositioning, a strategy which relies on the repurposing of known targets for new diseases, leading to new treatments, fewer side effects and potential drug synergies. Biological networks have emerged as powerful tools for integrating heterogeneous data and facilitating the prediction of biological or therapeutic properties. Consequently, they are widely employed to predict new therapeutic targets by characterizing potential candidates, often based on their interactions within a Protein-Protein Interaction (PPI) network and their proximity to genes associated with the disease. However, over-reliance on PPI networks and the assumption that potential targets are necessarily near known genes can introduce biases that may limit their effectiveness. We propose to address these limitations in two ways. First, by creating a multi-layer network which incorporates additional information such as gene regulation, metabolite interactions, metabolic pathways, and several disease signatures. Second, by exploiting several network-based approaches, including proximity to disease-associated genes but also unbiased approaches such as propagation-based methods, topological metrics, and module detection algorithms. Using prostate cancer as a case study, each approach extracted relevant features from the network, and their predictive power was evaluated independently. Using the best features identified, machine learning algorithms were then applied to predict novel promising therapeutic targets for prostate cancer.

16:40-17:00
Proceedings Presentation: Transfer Learning of Condition-Specific Perturbation in Gene Interactions Improves Drug Response Prediction
Confirmed Presenter: Dongmin Bang, Seoul National University, South Korea

Room: 520c
Format: In Person

Moderator(s): Martina Summer-Kutmon


Authors List:

  • Dongmin Bang, Seoul National University, South Korea
  • Bonil Koo, Seoul National University, South Korea
  • Sun Kim, Seoul National University, South Korea

Presentation Overview:

Drug response is conventionally measured at the cell level, often quantified by metrics like IC50. However, to gain a deeper understanding of drug response, cellular outcomes need to be understood in terms of pathway perturbation. This perspective leads us to recognize a challenge posed by the gap between two widely used large-scale databases, LINCS L1000 and GDSC, which measure drug response at different levels: L1000 captures information at the gene expression level, while GDSC operates at the cell line level. Our study aims to bridge this gap by integrating the two databases through transfer learning, focusing on condition-specific perturbations in gene interactions from L1000 to interpret drug response integrating both gene and cell levels in GDSC. This transfer learning strategy involves pretraining on the transcriptomic-level L1000 dataset, with parameter-frozen fine-tuning to cell line-level drug response. Our novel Condition-Specific Gene-Gene Attention (CSG2A) mechanism dynamically learns gene interactions specific to input conditions, guided by both data and biological network priors. The CSG2A network, equipped with this transfer learning strategy, achieves state-of-the-art performance in cell line-level drug response prediction. In two case studies, well-known mechanisms of drugs are well represented in both the learned gene-gene attention and the predicted transcriptomic profiles. This alignment supports the modeling power in terms of interpretability and biological relevance. Furthermore, our model's unique capacity to capture drug response in terms of both pathway perturbation and cell viability extends predictions to the patient level using TCGA data, demonstrating its expressive power obtained from both gene and cell levels.

17:00-17:20
Draphnet: Learning the drug and phenotype network linking drug effects to disease genetics
Confirmed Presenter: Rachel Melamed, uchicago, United States

Room: 520c
Format: In Person

Moderator(s): Martina Summer-Kutmon


Authors List:

  • Mamoon Habib, UMass Lowell Department of Computer Science, United States
  • Panagiotis Nikolaos Lalagkas, University of Massachusetts Lowell, United States
  • Rachel Melamed, uchicago, United States

Presentation Overview:

Medications can have unexpected effects on disease, including not only harmful drug side effects, but also beneficial drug repurposing. These effects on disease may result from hidden influences of drugs on disease networks. Discovering how biological effects of drugs relate to disease biology can both provide insight into the mechanisms of latent drug effects and help predict new effects.
Here, we aim to learn how drug impact on disease can be explained by the relationship between 1) biological effects of the drug and 2) genetic alterations driving disease. We propose that simple linear models connecting a drug's molecular effects to a disease's genetic drivers can explain the drug's effect on phenotype. Our design learns a network linking the biological processes altered by each drug to the gene drivers of phenotype risk, where the model is trained to predict the (incomplete) matrix of drug-phenotype association from SIDER. To estimate this interaction network, we take a supervised approach, training the model based on known drug impacts on disease. By simultaneously training this model to predict relationships between tens of thousands of drug-disease pairs in a multitask fashion, we aim to learn an interpretable network connecting drugs to phenotypes. We call this method Draphnet, or Drug and Phenotype Network.

The network's interpretability provides a rationale for predictions, increasing confidence in these predictions. In addition, this model can provide testable hypotheses for future analysis of drug-disease biology. Finally, it can allow a new classification of drugs based on their downstream effects on disease biology.

17:20-18:00
Invited Presentation: Blending Biology, Chemistry and AI through network embeddings
Confirmed Presenter: Patrick Aloy, Institute for Research in Biomedicine (IRB Barcelona), Spain

Room: 520c
Format: In Person

Moderator(s): Martina Summer-Kutmon


Authors List:

  • Patrick Aloy, Institute for Research in Biomedicine (IRB Barcelona), Spain

Presentation Overview:

Biological data is accumulating at an unprecedented rate, escalating the role of data-driven methods in computational drug discovery. The urge to couple biological data to cutting-edge machine learning has spurred developments in data integration and knowledge representation, especially in the form of heterogeneous, multiplex and semantically-rich biological networks. Today, thanks to the propitious rise in knowledge embedding techniques, these large and complex biological networks can be converted to a vector format that suits the majority of machine learning implementations. In this computational framework, complex connections between entities can be unveiled by means of simple arithmetic operations. Indeed, we demonstrate and experimentally validate that these descriptors can be used to reverse and mimic biological signatures of disease models and genetic perturbations in vitro and in vivo. However, only a tiny fraction of the possible chemical space has been so far explored, meaning that most compounds able to modulate biological activities (i.e. drugs) are yet to be discovered. Thus, we are currently developing strategies to couple bioactivity signatures and inverse design algorithms to generate new chemical entities with a desired functionality.