Presentation Overview:
Dr Maggie Cheang is the Group Leader of the ICR-CTSU Integrative Genomics Analysis in clinical trials, Division of Clinical Studies and Head of Clinical Informatics, Centre for Global Oncology at ICR. Her primary research focus is to identify clinically relevant biomarkers. Combining biological knowledge with advanced statistical analytics and machine learning to model multi-scale, multi-omics data against clinical outcome, her team has been developing multi-parametric molecular classifiers to predict the sensitivity and resistance of tumour biological subtypes to therapeutic agents, and is currently testing the performance of one of these integrated omics/mathematical algorithms within a Phase III clinical trial. She co-invented the 50-gene classifier for the intrinsic subtypes of breast cancer, commonly known as PAM50, which is licensed as Prosigna and has been implemented into multiple international clinical practice guidelines. She chairs the UK National Cancer Research Institute (NCRI) Clinical Trial Pathology Advisory Group, a group that spearheaded the concept of implementing a multidisciplinary proposal guidance. She also sits on various grant review panels, including the MRC/UKRI and Breast Cancer Now scientific grant committees.
My current research focus is to identify and develop robust multi-parametric or multi-modal signatures for molecular stratification by unravelling tumour complexities and undertaking discovery science within trials. In this presentation, I will discuss, with the aid of a few exemplars, the multidisciplinary approaches that we have applied to understanding the molecular characteristics of tumours in order to develop better diagnostic tools (e.g., companion diagnostic assays) that pair patients with optimal anti-cancer treatments, inform future trial designs, and tackle the challenges of heterogeneity in treatment response.
Presentation Overview:
Spatial proteomics data has been used to map cell states and improve our understanding of tissue organization. More recently, these methods have been extended to study the impact of such organization on disease progression and patient survival. However, to date, the majority of supervised learning methods utilizing these data types did not take full advantage of the spatial information, impacting their performance and utilization. Taking inspiration from ecology and epidemiology, we developed novel spatial feature extraction methods for use with spatial proteomics data. We used these features to learn prediction models for cancer patient survival. As we show, using the spatial features led to consistent improvement over prior methods that used the spatial proteomics data for the same task. In addition, feature importance analysis revealed new insights about the cell interactions that contribute to patient survival.
Presentation Overview:
Background: Genomic alterations can significantly affect cellular biology; however, identifying which alterations have a system-wide impact is a challenging task. Cancer samples typically contain many alterations. Identifying those with an impact is essential for biomarker development. Here we present a novel computational tool, CIBRA, to detect genomic alterations with a system-wide impact.
Method: CIBRA integrates two omics data types to determine the system-wide impact: one indicates genomic alterations, and another defines the system-wide expression response, e.g. RNAseq data. CIBRA can both identify the system-wide response to genomic alterations and evaluate the degree of similarity in expression responses between alterations.
Results: Applying CIBRA to a genome-wide screen of SNVs in several cancer types, we could identify the system-wide impact on expression for most known drivers, validating CIBRA’s ability to detect biologically relevant impact. Surprisingly, when applying CIBRA to structural variants, we found a similarly large proportion of genomic alterations with a system-wide impact, suggesting that the biological impact of structural variants has been largely underreported.
Relevance: CIBRA is an essential new tool to identify the impact of genomic alterations by combining multi-omics data types and can refine our current definitions of alterations, in order to derive more accurate biomarkers.
Presentation Overview:
Unintended effects of medications on diverse diseases have been identified many years after these drugs enter common use. This implies that the drugs unexpectedly influence disease pathways. Discovering how the biological effects of drugs relate to disease biology can therefore both provide insight into the biological basis for latent drug effects and help predict new effects. Rich data now comprehensively profile both the biological processes impacted by common drugs and the human phenotypes known to be affected by these drugs. At the same time, systematic phenome-wide genetic studies have linked each common phenotype with putative disease driver genes. Here, we develop a supervised method that integrates in vitro data on 429 drugs and gene associations of 151 common phenotypes to learn how these molecular signals can explain drug effects on disease. Our predictions of drug-phenotype relationships outperform a baseline predictive model. More importantly, by projecting each drug to the space of its influence on disease driver genes, we can propose the biological mechanism of unexpected effects of drugs on disease phenotypes. We present evidence that this model recovers known information about drug biology, supporting its potential to provide insights into the biology of unexpected effects of drugs on disease.
Presentation Overview:
We present iQPA, an integrated quantitative pathway analysis platform that functionally matches modelled disease mechanisms with actual human diseases to improve drug discovery. It is challenging to evaluate how well a model system matches an actual human disease. iQPA integrates human tissue and model system transcriptomes to provide unequivocal functional phenotype matches. iQPA transforms gene expression into quantifiable pathway activities to determine pathway dysregulation. It assesses similarity by establishing a reference of common functional dysregulation between models and humans. iQPA is applied to Alzheimer’s disease, determining high-fidelity therapeutic target pathways. A cellular model with a high Aβ42/40 ratio closely recapitulated human dysregulation events in the temporal cortex. Dysregulation events in the hippocampus of 5xFAD mouse models significantly correlated with human temporal cortex. iQPA identified 83 commonly dysregulated pathways with consistent dysregulation across human brains and the Aβ42/40-high model. We validated commonly dysregulated pathways in the Aβ42/40-high model, including the p38 MAPK pathway. A clinical p38 MAPK inhibitor dramatically ameliorated Aβ-induced tau pathology and neuronal death in the matched Aβ42/40-high model. iQPA guides targeting of the right pathogenic pathway in the right model. It preclinically assesses candidate target pathways with greater confidence for impact on disease while reducing the risk of mistargeting.
Presentation Overview:
The co-administration of drugs known to interact has a high impact on morbidity, mortality, and health economics. We study the drug-drug interaction (DDI) phenomenon by analyzing drug administrations from population-wide Electronic Health Records (EHR) in Blumenau (Brazil), Catalonia (Spain), and Indianapolis (USA). Despite different health care systems and drug availability, we find a common large risk of DDI administration that affected 13 to 20% of individuals. In addition, DDI risk increases with aging but is not explained solely by higher co-administration rates in the elderly. We also find that women are at higher risk of DDI overall, except for men over 50 years old in Indianapolis. Finally, we show that proton pump inhibitor (PPI) alternatives to Omeprazole can reduce the number of patients affected by known DDIs by up to 21% in both Blumenau and Catalonia, and 2% in Indianapolis, exemplifying how analysis of EHR data can lead to a significant reduction of DDI and its associated human and economic costs. Although the risk of DDIs increases with age, administration patterns point to a complex phenomenon that cannot be solely explained by polypharmacy and multimorbidity. The lack of safer drug alternatives further overburdens health systems, thus highlighting the need for disruptive drug research.
Presentation Overview:
Kinase inhibitors (KIs) successfully treat diseases such as cancer, autoimmunity, and neurodegeneration. Despite their success, KIs exhibit adverse events which can limit their therapeutic windows, or worse, result in their failure during clinical trials. While a KI’s adverse event profile may be explained by its kinase binding profile, very few resources exist that couple inhibition of a specific kinase to a corresponding adverse event. Delineating which adverse events are attributable to the inhibition of a specific kinase, and which are more generally caused by the KI drug class, would allow next-generation drug design to avoid toxic off-target binders and improve clinical viability.
To address this issue, we integrated data from the FDA Adverse Events Reporting System and OnSIDES datasets with machine-learned KI binding profile information and developed a statistical method to associate toxic events with certain kinases or with the KI class. We performed extensive permutation testing across KI binding profiles to empirically determine the strength of individual associations, and we validated our model by reviewing established kinase-toxicity associations and prospectively predicting adverse events of KIs not previously described in public databases. We show that our method can recapitulate well-established kinase-toxicity associations and identify previously unreported kinase-adverse event pairs.
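The permutation testing mentioned above can be illustrated with a minimal sketch. It is not the authors' statistical method: the test statistic (difference in mean adverse event rate between KIs that bind a kinase and KIs that do not), the one-sided alternative, and all data values are assumptions made up for illustration.

```python
import random
from statistics import mean


def permutation_p_value(binds, ae_rate, n_perm=2000, seed=0):
    """Empirical one-sided p-value for 'binders show a higher adverse event rate'.

    binds   -- 0/1 flag per drug: does the KI bind the kinase of interest? (hypothetical)
    ae_rate -- reporting rate of one adverse event per drug (hypothetical)
    """
    def stat(labels):
        hit = [r for b, r in zip(labels, ae_rate) if b]
        miss = [r for b, r in zip(labels, ae_rate) if not b]
        return mean(hit) - mean(miss)

    rng = random.Random(seed)
    observed = stat(binds)
    shuffled = list(binds)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling the binding labels breaks any real kinase-event link
        # while keeping the number of binders fixed.
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            exceed += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data: four binders with high event rates, six non-binders with low rates.
binds = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
ae_rate = [0.30, 0.28, 0.35, 0.31, 0.05, 0.07, 0.04, 0.06, 0.08, 0.05]
observed, p = permutation_p_value(binds, ae_rate)
print(f"difference in mean rate = {observed:.3f}, empirical p = {p:.4f}")
```

In a real analysis the same procedure would be repeated for every kinase and adverse event pair, which calls for a multiple-testing correction that this sketch leaves out.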
Presentation Overview:
Jointly analyzing biomedical data held by different institutions is challenging due to privacy concerns. This presents a significant obstacle to biomedical advances, since analyzing data from large and diverse cohorts is essential for novel discoveries. Recent collaborative GWAS studies have demonstrated how jointly analyzing multiple datasets can enable detection of an unprecedented number of genetic associations. However, existing federated analysis approaches suffer from reduced accuracy, limited supported analyses, and unresolved privacy concerns.
We present a novel collaborative analysis framework for analyzing large-scale biomedical data across multiple institutions, providing rigorous privacy guarantees. Our approach combines efficient federated computation strategies with state-of-the-art cryptography techniques, enabling biobank-scale studies across siloed datasets while guaranteeing privacy.
We showcase our framework on essential genomic analysis tasks, including GWAS, PCA, and identification of genetic relatives in the federated setting. Our framework uncovers insights that cannot be identified using each dataset alone. Moreover, our secure and federated analysis tools can be easily deployed via a user-friendly web server (sfkit.org), with the potential to accelerate biomedical research by unlocking new collaborative studies across biomedical data silos.
Presentation Overview:
The pathogenesis of allograft rejection in organ transplantation has been explored through various 'omics' technologies. However, the lack of a homogeneous signal across organ types has raised questions about the generalizability of these findings. In this study, we present the Transferable Omics Prediction (TOP) framework, a novel approach to identify shared gene expression changes in allograft dysfunction across distinct organs without the need for batch correction.
We curated 233 public gene expression datasets, totalling 15,000 patient samples, from heart, liver, lung, and kidney transplants. We trained a transfer learning model using our novel TOP framework on 168 datasets and validated it on the remaining independent datasets. Our results reveal, for the first time, homogeneous gene expression changes in allograft dysfunction across different organs.
As well as establishing a method for integrating over 230 datasets, our findings demonstrate the existence of common gene expression changes in the rejection process across dissimilar organs. These changes were isolated to a common tissue-resident myeloid cell population using single-cell RNAseq. Further, transfer learning models built across organs performed better in classification tasks than organ-specific models. We also validated these findings in two prospective cohorts from Australia and Canada.
Presentation Overview:
Lung cancer is the leading cause of cancer-related death in the world. In contrast to many other cancers, a direct connection to modifiable lifestyle risk in the form of tobacco smoke has long been established. More than 50% of all smoking-related lung cancers occur in former smokers, often many years after smoking cessation. Despite extensive research, the molecular processes for persistent lung cancer risk are unclear.
To examine whether risk stratification in the clinic and in the general population can be improved by the addition of genetic data, and to explore the mechanisms of the persisting risk in former smokers, we have analysed transcriptomic data from accessible airway tissues of 487 subjects, including healthy volunteers and clinic patients of different smoking status. We developed a model to assess smoking-associated gene expression changes and their reversibility after smoking is stopped. We find persistent smoking-associated immune alterations to be a hallmark of the clinic patients. Integrating GWAS data using a transcriptional network approach, we demonstrate that the same immune and interferon-related pathways are strongly enriched for genes linked to known genetic risk factors. Finally, we used accessible airway transcriptomic data to derive a non-invasive lung cancer risk classifier.
Presentation Overview:
There are more than 7,000 rare diseases, some affecting 3,500 or fewer patients in the US. Due to clinicians’ limited experience with such diseases and the heterogeneity of clinical presentations, approximately 70% of individuals seeking a diagnosis today and up to 50% of the suspected rare diseases remain undiagnosed. Artificial intelligence has demonstrated success in aiding the diagnosis of common diseases. However, existing approaches require labeled datasets with thousands of diagnosed patients per disease. We present SHEPHERD, a machine learning approach for multifaceted rare disease diagnosis. SHEPHERD uses geometric deep learning with multimodal clinico-genetic information and is trained exclusively on synthetically generated patients. Once trained, we show that SHEPHERD can provide clinical insights about real-world patients. We evaluate SHEPHERD on a cohort of 465 patients representing 299 diseases in the Undiagnosed Diseases Network. SHEPHERD excels at several diagnostic facets: performing causal gene discovery (causal genes are predicted at rank = 3.52 on average), retrieving “patients-like-me” with the same causal gene or disease, and providing interpretable characterizations of novel disease presentations. SHEPHERD demonstrates the potential of artificial intelligence to accelerate rare disease diagnosis and has implications for using deep learning on medical datasets with very few labels.
Presentation Overview:
Drugs targeting disease causal genes are more likely to succeed for that disease. However, common complex diseases are caused by many risk variants, and the causal gene is not always clear. In contrast, Mendelian disease causal genes are well-known and druggable. Some Mendelian diseases are known to predispose patients to specific complex diseases (comorbidity), suggesting that they may share pathogenic processes. Here, we hypothesize that Mendelian and complex disease comorbidity can be used to find new drugs for the complex disease. Building on previous work, we examined 90 Mendelian and 65 complex diseases, finding 2,908 pairs of clinically associated (comorbid) diseases. Using this clinical signal, we match each complex disease to a set of relevant Mendelian disease genes and suggest that drugs targeting these genes may be successful for the complex disease. To test our hypothesis, we used data from clinical trials, known drug indications, and ATC categories at levels 3 and 4. After adjusting for the number of drug targets, we found significant enrichment of drugs recommended for repurposing among indicated and investigated drugs for the cancer, hormonal, and cardiovascular disease categories. Our findings suggest that disease comorbidity can be leveraged for drug repurposing.
Presentation Overview:
Neoadjuvant chemo-hormonal therapy followed by cytoreductive radical prostatectomy is a novel therapeutic approach in patients with newly diagnosed high-risk primary multifocal prostate cancer with oligometastatic disease. To help guide treatment decisions, we collected clinical and imaging (PET/CT and PET/MRI) biomarkers from 30 patients and identified those that are predictive of disease progression. The following radiological biomarkers were collected before and after neoadjuvant treatment: MRI Likert score, PET Likert score, PET maximum standardized uptake value (SUVmax), MRI apparent diffusion coefficient (ADC), diffusion-weighted imaging (DWI), and dynamic contrast-enhanced (DCE) imaging. The following clinical biomarkers were collected: tumor volume from pre-CHT MRI, pre- and post-treatment prostate specific antigen (PSA) levels, 3D prostate volume, PSA density (PSAd), and PSA change from pre- to post-treatment. Using univariate and multivariate Cox regression, we observe that post-treatment SUVmax is predictive of disease progression (p=1e-6), as are tumor volume (p=2e-6), pre- and post-treatment PSA levels (p=1e-4), and PSA change (p=4e-4). The combination of SUVmax and tumor volume is the most predictive combination of biomarkers (p=5e-8), and is more predictive than either biomarker individually. These results demonstrate the existence of imaging and clinical biomarkers that are predictive of disease progression in neoadjuvant chemo-hormonal therapy of multifocal prostate cancer.
Presentation Overview:
Understanding cell-cell communication in the complex cellular microenvironment is essential, but current single-cell and spatial transcriptomics-based methods mainly concentrate on identifying cell-type pairs with high ligand-receptor expression values, rather than prioritizing interaction features. To address this, we introduce SpatialDM, a statistical model and spatial transcriptomics toolbox that uses a bivariate Moran's statistic to detect spatially co-expressed ligand and receptor pairs in spatial transcriptomics data. Unlike other methods, SpatialDM does not require pre-annotation of cell types and can detect local interacting spots and patterns. This method is scalable and has shown accurate and robust performance in multiple simulations. With an analytically derived z-score approach, SpatialDM takes only 12 min for a million-spot dataset and a few seconds for the most prevalent thousand-spot datasets. SpatialDM achieves an AUROC of up to 0.959 under various simulation settings, whereas other methods range from 0.563 to 0.871. SpatialDM has been applied to melanoma and intestinal datasets from different sequencing platforms, where it has identified well-known communication patterns, promising tumor microenvironment insights, and differential interactions between conditions, thereby enabling context-specific cell interaction discovery. Revealing interactions specific to inflammation or cancer in the colon (e.g. BMP2, CEACAM) may point to promising treatment targets.
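To make the idea of a bivariate Moran's statistic concrete, here is a minimal pure-Python sketch of one common global form of the statistic. It is not SpatialDM's implementation: the Gaussian distance kernel, the `length_scale` parameter, the normalisation, and the toy expression values are all assumptions chosen for illustration.

```python
import math


def bivariate_moran(coords, ligand, receptor, length_scale=1.0):
    """Global bivariate Moran's I between ligand and receptor expression.

    Positive values mean spots with high ligand tend to sit next to spots
    with high receptor; negative values mean the two avoid each other.
    """
    n = len(coords)
    l_mean = sum(ligand) / n
    r_mean = sum(receptor) / n
    zl = [v - l_mean for v in ligand]
    zr = [v - r_mean for v in receptor]

    cross = 0.0   # weighted sum of ligand(i) * receptor(j) deviations
    w_sum = 0.0   # total spatial weight, used for normalisation
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a spot is not its own neighbour
            d2 = (coords[i][0] - coords[j][0]) ** 2 + (coords[i][1] - coords[j][1]) ** 2
            w = math.exp(-d2 / (2 * length_scale ** 2))  # assumed Gaussian kernel
            cross += w * zl[i] * zr[j]
            w_sum += w

    scale = math.sqrt(sum(v * v for v in zl) * sum(v * v for v in zr))
    return (n / w_sum) * cross / scale


# Six spots on a line; expression values are invented.
coords = [(x, 0.0) for x in range(6)]
ligand = [5, 5, 5, 0, 0, 0]
co_located = bivariate_moran(coords, ligand, [5, 5, 5, 0, 0, 0])
segregated = bivariate_moran(coords, ligand, [0, 0, 0, 5, 5, 5])
print(co_located, segregated)
```

The double loop is quadratic in the number of spots, so this form only suits small examples; the abstract's reported runtimes rely on an analytically derived z-score rather than anything shown here.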
Presentation Overview:
Databases of biomedical knowledge are rapidly proliferating, with recent advances focusing on integrating knowledge under a standardized schema and semantic layer (e.g., the Biolink standard). This sets the stage for developing computational systems that can discover novel connections between drugs and diseases (i.e., drug repurposing) or answer other kinds of translational questions. To build such a system, improved methods and languages for knowledge-graph-based reasoning are needed. Progress toward biomedical reasoning systems has been hindered by (1) the lack of an expressive analysis workflow language for translational reasoning and (2) the lack of an associated reasoning engine that federates semantically integrated knowledge-bases. As a part of the NCATS Translator project, we developed ARAX, a new computational reasoning system for translational biomedicine that combines (1) an innovative workflow language, (2) a comprehensive, semantically unified biomedical knowledge graph (RTX-KG2), and (3) a versatile method for scoring search results. Users or application-builders can query ARAX via a browser interface or an API. ARAX enables users to encode translational biomedical questions and to integrate knowledge across sources to answer the user’s query and facilitate exploration of results. To illustrate ARAX’s utility in specific disease contexts, we will present and discuss several biomedical use-cases.
Presentation Overview:
Providing an accurate prognosis for individual dementia patients remains a challenge since they differ greatly in rates of cognitive decline. In this study, we used machine learning (ML) techniques to identify cerebrospinal fluid (CSF) biomarkers that predict the rate of cognitive decline in dementia patients. First, longitudinal cognitive scores of 210 dementia patients were used to create fast and slow progression groups. Second, we trained ML classifiers on CSF proteomic profiles and obtained a well-performing prediction model (ROC-AUC = 0.82). Lastly, we explored the potential of each of the 20 top candidates in internal sensitivity analyses. TNFRSF4 and TGF β-1 emerged as the top markers, being lower in fast-progressing patients than in slow-progressing patients. Proteins whose low concentration was associated with fast progression were enriched for cell signalling and immune response pathways. None of our top markers stood out as strong individual predictors of subsequent cognitive decline. This could be explained by small effect sizes per protein and biological heterogeneity among dementia patients. Taken together, this study presents a novel progression biomarker identification framework and protein leads for personalised prediction of cognitive decline in dementia.
Presentation Overview:
Background. Liver Transplant Recipients (LTRs) with elevated liver enzymes can have graft injury due to various etiologies, such as recurrent or de novo non-alcoholic steatohepatitis (NASH), T-cell mediated rejection (TCMR), and other conditions. The goal of our study was to develop a Machine Learning (ML) tool integrating methylation patterns on circulating DNA in plasma with clinical variables to non-invasively and accurately classify liver graft injury.
Method and results. We generated methylation profiles on circulating DNA in a pilot study of 43 LTRs, comprising NASH LTRs (n=11), TCMR (n=19), and Controls (n=13), and developed an L2 multinomial logistic regression ML approach across 101 bootstrapped models to distinguish between the graft conditions. Our ML model achieved a mean multi-classification accuracy of 0.91, with mean specificity and sensitivity of 0.94 and 0.91, respectively. The model was found to be particularly adept at detecting TCMR and Controls, with true positive rates of 95% and 90%, and AUROCs of 0.992 and 0.985, respectively. For NASH LTRs, the models achieved fair performance, with a true positive rate of 82% and an AUROC of 0.991.
Conclusion. The newly developed ML tool holds significant promise as a novel, non-invasive and specific diagnostic tool for liver pathology.
Presentation Overview:
Introduction: The electrocardiogram (ECG) is a central tool for developing algorithms to predict cardiovascular diseases. Moreover, differences in prediction accuracy have been noted between self-declared ethnic groups. However, no investigation has been performed on the ability to predict genetically inferred ancestry using 12-lead ECG, limiting the assessment of such variations.
Methods: We studied the genetic backgrounds of 16,707 individuals with 220,355 ECG waveforms. Ancestry was obtained using RFMix trained on 1000 Genomes Project profiles. We classified individuals into one of four ethnic groups (European, South Asian, East Asian, and African) if >70% of their genome matched that ancestry; otherwise, they were labelled as admixed. Next, we used a ResNet50 with depthwise convolutions paired with focal loss to predict ancestry.
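The labelling rule and the loss named in the Methods can be sketched as follows. This is an illustration only, not the study's code: the function names, the example ancestry fractions, and the focusing parameter `gamma=2.0` are assumptions (the abstract does not state the value used), and the loss is the standard focal loss without class weighting.

```python
import math


def ancestry_label(fractions, threshold=0.70):
    """Return the dominant ancestry if it covers more than `threshold`
    of the genome, otherwise 'Admixed' (the >70% rule from the Methods)."""
    group, share = max(fractions.items(), key=lambda kv: kv[1])
    return group if share > threshold else "Admixed"


def focal_loss(probs, true_index, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    With gamma = 0 this reduces to ordinary cross-entropy; larger gamma
    down-weights samples the model already classifies confidently.
    """
    p_t = probs[true_index]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Hypothetical per-individual ancestry fractions.
clear_case = ancestry_label({"European": 0.85, "South Asian": 0.10, "African": 0.05})
mixed_case = ancestry_label({"European": 0.60, "African": 0.40})

# An easy sample (true class predicted at 0.9) contributes far less
# under focal loss than under cross-entropy.
probs = [0.9, 0.04, 0.03, 0.02, 0.01]
cross_entropy = focal_loss(probs, 0, gamma=0.0)
focal = focal_loss(probs, 0, gamma=2.0)
print(clear_case, mixed_case, cross_entropy, focal)
```

Focal loss is a natural fit when class sizes are unequal, because abundant, easily classified samples stop dominating the gradient.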
Results and discussion: Our study demonstrates, for the first time, that ECGs can be used to predict ancestry with a notable signal-to-noise ratio: a balanced accuracy of 0.625±0.013 on a five-class prediction task versus a noise distribution of 0.342±0.046. Our results highlight the potential for ECG-based deep learning algorithms to implicitly learn ethnicity, which may lead to bias amplification and thus a lack of fairness. Further investigation into the ability to predict individual attributes from ECGs is essential to ensure the development of ethical clinical tools.
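For readers unfamiliar with the metric, balanced accuracy is usually defined as the mean of the per-class recalls, so that a large class cannot mask poor performance on a small one. The sketch below uses that standard definition with invented labels; it does not reproduce the study's evaluation or its noise distribution.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


# Invented example: class "A" is twice as common as class "B".
y_true = ["A", "A", "A", "A", "B", "B"]
y_pred = ["A", "A", "A", "A", "B", "A"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(plain, balanced)
```

Here plain accuracy is 5/6 while balanced accuracy is 0.75, because the single error falls on the smaller class.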
Presentation Overview:
Histopathological image analysis is a critical task for accurate diagnosis and treatment of cancer. In recent years, weakly supervised learning (WSL) has emerged as a promising approach to overcome the challenge of annotation scarcity in histopathological images. However, the performance of WSL methods varies significantly depending on the architecture and hyperparameters used, and systematic comparisons of their performance are lacking.
In this study, we present a benchmarking study of various WSL approaches. Our results demonstrate that transformer-based models outperform the DSMIL approach. However, they require tuning a significantly higher number of hyperparameters, making them computationally expensive. To address this challenge, we propose a modified DSMIL architecture (DSMIL+) (Fig. 1) that has 90% fewer parameters. DSMIL+ achieves comparable or even better performance than transformer-based models at a fraction of the computational cost. Table 1 shows our results for classifying different glioma subtypes.
In conclusion, our study provides a systematic comparison of various WSL approaches for histopathological image analysis and highlights the challenges associated with their performance. The DSMIL+ architecture can serve as a computationally efficient alternative to transformer-based models, thereby facilitating the adoption of WSL methods for histopathological image analysis.
Presentation Overview:
In this talk, I will present two different network-based approaches adapted to two very distinct disease scenarios.
The first scenario addresses the challenges posed by small sample sizes in the study of rare diseases. I will introduce a novel methodology that overcomes these limitations by leveraging a multilayer framework incorporating patient data and real-world data. This innovative approach identifies clusters of patients exhibiting disease-specific characteristics, focusing on deciphering the molecular basis of disease severity, an important but often elusive medical problem (1).
The second scenario focuses on the analysis of population mobility's impact on the spread of COVID-19. In this case, we used Transfer Entropy, a powerful analytical tool, to analyse the relationship between mobility and disease incidence in the dense network connecting Spanish regions. I will show how Transfer Entropy data can pinpoint potential causes of disease waves in different locations and assess the effects of mobility restriction measures (2).
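As a rough illustration of what Transfer Entropy measures, the sketch below computes a plug-in estimate for two discrete time series with a history length of one step. It is not the analysis pipeline of the cited studies: the history length, the binary encoding, and the synthetic series (a "target" that copies the "source" with a one-step delay) are assumptions chosen to keep the example small.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in transfer entropy (in bits) from source to target, history length 1.

    TE = sum p(y1, y0, x0) * log2( p(y1 | y0, x0) / p(y1 | y0) ),
    where y1 is the next target value, y0 the current target value,
    and x0 the current source value.
    """
    n = len(target) - 1
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    pairs_yx = Counter(zip(target[:-1], source[:-1]))
    pairs_yy = Counter(zip(target[1:], target[:-1]))
    singles = Counter(target[:-1])

    te = 0.0
    for (y1, y0, x0), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_yx[(y0, x0)]
        p_given_self = pairs_yy[(y1, y0)] / singles[y0]
        te += p_joint * math.log2(p_given_both / p_given_self)
    return te


# Synthetic stand-ins: think of 1 = "above median" and 0 = "below median"
# for mobility (source) and incidence (target) in some region.
rng = random.Random(0)
mobility = [rng.randint(0, 1) for _ in range(2000)]
incidence = [0] + mobility[:-1]  # incidence copies mobility one step later

forward = transfer_entropy(mobility, incidence)   # close to 1 bit
backward = transfer_entropy(incidence, mobility)  # close to 0 bits
print(forward, backward)
```

The asymmetry between the two directions is what makes the measure useful for asking which region's mobility helps predict another region's incidence; real data would need longer histories, careful discretisation, and significance testing.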
Throughout the talk, three common challenges encountered in these studies will be discussed: the complexity of underlying data, the increasing demand for computational power, and the intricacies of the interpretation of biomedical data.
References:
(1) Núñez-Carpintero et al., 2023. Rare disease research workflow using multilayer networks elucidates the molecular determinants of severity in Congenital Myasthenic Syndromes. bioRxiv 2023.01.19.524736.
Núñez-Carpintero et al., 2021. The multilayer community structure of medulloblastoma. iScience 24.
(2) Smith et al., 2022. Evaluating the policy of closing bars and restaurants in Cataluña and its effects on mobility and COVID19 incidence. Scientific Reports 12, 9132.
Ponce-de-Leon et al., 2021. COVID-19 Flow-Maps an open geographic information system on COVID-19 and human mobility for Spain. Scientific Data 8, 310.
Pontes C., Ponce-de-Leon M, Arenas A, Valencia A. (2023)