Monday, July 24, between 18:00 CEST and 19:00 CEST | Tuesday, July 25, between 18:00 CEST and 19:00 CEST |
---|---|
Session A Poster Set-up and Dismantle. Session A Posters set up: Monday, July 24, between 08:00 CEST and 08:45 CEST. Session A Posters dismantle: Monday, July 24, at 19:00 CEST | Session B Poster Set-up and Dismantle. Session B Posters set up: Tuesday, July 25, between 08:00 CEST and 08:45 CEST. Session B Posters dismantle: Tuesday, July 25, at 19:00 CEST |
Wednesday, July 26, between 18:00 CEST and 19:00 CEST | |
---|---|
Session C Poster Set-up and Dismantle. Session C Posters set up: Wednesday, July 26, between 08:00 CEST and 08:45 CEST. Session C Posters dismantle: Wednesday, July 26, at 19:00 CEST | |
Virtual |
|
---|
Presentation Overview: Show
Infective endocarditis (IE) complicates 10%-20% of cases of Staphylococcus aureus bacteraemia (SAB). We sought to determine whether IE strains of S. aureus are genotypically different or behave differently in experimental models of IE, in comparison with non-IE SAB strains.
We performed analyses of genes and non-coding RNAs, variant calling, and a genome-wide association study on 924 S. aureus genomes of IE (n=274) and non-IE (n=650) strains from international patient cohorts. In parallel, we tested a subset of strains in two experimental animal models of IE, one studying the early stage of bacterial adhesion to inflamed mouse valves, the other evaluating the process of local and systemic development of IE on mechanically damaged rabbit valves.
The genetic profiles of S. aureus IE and non-IE SAB strains did not differ in any of the bioinformatics analyses. No differences were observed in the two in vivo models of IE.
All of these results suggest that S. aureus strain variation is not the primary determinant of IE. Pending the possible identification of host factors predisposing to IE, all strains of S. aureus should be considered as capable of causing this common and deadly infection once they enter the bloodstream.
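As a rough illustration of the kind of per-variant association test that a genome-wide association study repeats across many variants, the sketch below computes a Pearson chi-square statistic for a single hypothetical variant in the IE and non-IE groups. Only the group sizes (274 and 650) come from the abstract; the carrier counts and the function names are invented for illustration, and the abstract does not state which GWAS method was actually used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
    [[a, b],
     [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("degenerate table: a row or column sums to zero")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def chi_square_p_value_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


IE_TOTAL, NON_IE_TOTAL = 274, 650  # group sizes from the abstract
ie_carriers = 137                  # hypothetical: 50% of IE strains carry the variant
non_ie_carriers = 325              # hypothetical: 50% of non-IE strains carry it

stat = chi_square_2x2(
    ie_carriers, IE_TOTAL - ie_carriers,
    non_ie_carriers, NON_IE_TOTAL - non_ie_carriers,
)
p_value = chi_square_p_value_1df(stat)
print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
```

With identical carrier frequencies in both groups the statistic is zero, which is the pattern one would expect for a variant that is unrelated to IE status. A real analysis would also correct for multiple testing and population structure.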
Presentation Overview: Show
Cardiovascular diseases (CVDs) are one of the leading causes of mortality and of lost disability-adjusted life years (DALYs) globally. Heart failure (HF) and atrial fibrillation (AF) are among the most common manifestations of CVD and contribute to about 45% of all CVD deaths. In this study, we applied AI/ML techniques to RNA-seq-derived gene expression data to investigate genes associated with HF, AF, and other CVDs, and to predict disease with high accuracy. The study involved generating RNA-seq data from the serum of consented CVD patients. Next, we processed the sequenced data using our RNA-seq pipeline and applied GVViZ for gene-disease data annotation and expression analysis. To achieve our research objectives, we developed a new Findable, Accessible, Intelligent, and Reproducible (FAIR) approach that includes a five-level biostatistical evaluation, primarily based on the Random Forest (RF) algorithm. During our AI/ML analysis, we fitted, trained, and implemented our model to classify and distinguish high-risk CVD patients based on their age, gender, and race. With the successful execution of our model, we predicted the association of highly significant HF, AF, and other CVD genes with demographic variables.
Presentation Overview: Show
A timely understanding of the biological secrets of complex diseases will ultimately benefit millions of individuals by reducing mortality risk and improving quality of life through personalized diagnoses and treatments. Diverse, high-volume genomic and clinical data have the potential to broaden the scope of biological discoveries and insights through the extraction, analysis, and interpretation of hidden information. A current and still unresolved challenge is the integration of patients' genomic profiles with their medical records. Disease definitions in genomic medicine are simplified, whereas in the clinical world diseases are classified, identified, and adopted with their ICD codes, which are maintained by the World Health Organization. Several biological databases have been produced that include information about human genes and related diseases. Still, no database exists that can precisely link clinical codes with relevant genes and variants to support genomic and clinical data integration for clinical and translational medicine. In this project, we focus on the development of an annotated gene-disease-code database, which is accessible through an online, cross-platform, and user-friendly application, PAS-GDC. Our scope here is limited to the integration of ICD-9 and ICD-10 codes with the list of genes approved by the ACMG.
Presentation Overview: Show
Stage I epithelial ovarian cancer (EOC) generally has a favorable prognosis. Despite their heterogeneity, in most cases stage I EOCs are treated in the same manner (surgery and chemotherapy). No robust prognostic markers are available, which often leads to overtreatment. A better molecular characterization of stage I EOCs is an unmet need. Previously, we identified patterns of somatic copy number alteration (stable, S; unstable, U; highly unstable, HU) that are predictive of patient prognosis.
To elucidate the mechanisms leading to these patterns and to improve patient stratification, we used sequencing data from our previous work to calculate the activity of 17 recently published copy number signatures.
Using unsupervised methods, we identified four groups of patients with different signature activity related to SCNA patterns, in particular signatures CX1 and CX3. Similar results were observed in a cohort of late-stage EOC and in TCGA. These patterns were also consistent across different spatial metastases in an additional cohort. Lastly, one group of stage I patients with high CX3 activity had significantly worse prognosis than the others.
These results show that specific processes underlie the three copy number alteration patterns in early-stage EOC, and that these patterns reflect not only different aetiopathogenic processes but also patient prognosis.
Presentation Overview: Show
Background
High grade serous ovarian cancer (HGSOC) is one of the most lethal histotypes, usually detected when the disease is already at an advanced stage.
Aiming to develop a novel approach for early disease detection, we built an analysis workflow to detect somatic copy number alterations (SCNAs), a molecular hallmark of HGSOC, in DNA extracted from Pap test smears (pDNA) taken years before diagnosis, using shallow whole genome sequencing (sWGS).
Methods
We evaluated both the characteristics of pDNA and potential analytical approaches to detect SCNAs, including fragment size distribution, choices of aligners, and post-processing tools. WisecondorX (Raman, 2019) was selected for the identification of SCNAs. Parameters were tuned to identify SCNAs and minimize background noise. A set of matched tumor tissues was used as ground truth.
Results
First, we confirmed that our approach is feasible for the detection of tumor SCNAs from pDNA. Then, using the copy number profile abnormality (CPA) quantification, we demonstrated that pDNA from women with a diagnosis of HGSOC had higher CPA values (>0.61) than pDNA from healthy donors.
Conclusions
We provided a workflow of analysis that can be applied for the identification of SCNAs from pDNA.
Presentation Overview: Show
Dendritic cell (DC) vaccination has shown promising results in a cohort of newly diagnosed glioblastoma patients, and the nonsystematic response suggests that inherent differences in tumor and host immune factors among patients play a role in determining efficacy. We performed RNA-Seq on samples from newly diagnosed glioblastoma patients treated with DC vaccination and characterized the molecular mechanisms and immune microenvironments of long-term (LTS) and short-term (STS) survivors. Analysis of outcome by molecular subtype showed that LTS were found in 3/8 (37.5%) mesenchymal tumors, 4/16 (25%) proneural tumors, and 1/15 (7%) classical tumors, while STS were found in 3/8 (37.5%) mesenchymal tumors, 7/16 (44%) proneural tumors, and 7/15 (47%) classical tumors. However, no significant difference in overall survival between subtypes was found. LTS patients treated with DC vaccination had increased calcium signaling, neurogenesis, synaptic activity, and transmembrane signaling, while the STS group showed enrichment for cell migration and invasion, and for cellular stress and injury. The deconvolution analysis revealed that the LTS patients had a higher proportion of macrophage type 1 or 2 cells and microglia 1 cells, while the STS patients had a higher proportion of microglia 2 cells. These findings will improve our understanding of the modulation of the glioblastoma immune microenvironment by DC vaccination.
Presentation Overview: Show
CKD is characterized by a sustained proinflammatory response of the immune system, promoting hypertension and cardiovascular disease. The underlying mechanisms are incompletely understood but may be linked to gut dysbiosis. Dysbiosis has been described in adults with CKD; however, comorbidities limit CKD-specific conclusions.
We analyzed the fecal microbiome, metabolites, and immune phenotypes in 48 children (normal kidney function, CKD stage G3-G4, G5 treated by hemodialysis [HD], kidney transplantation) with a mean±SD age of 10.6±3.8 years.
Serum TNF-α and sCD14 were stage-dependently elevated, indicating inflammation, gut barrier dysfunction, and endotoxemia. We observed compositional and functional alterations of the microbiome, including diminished production of short-chain fatty acids. Plasma metabolite analysis revealed a stage-dependent increase of tryptophan metabolites of bacterial origin. Serum from patients on HD activated the aryl hydrocarbon receptor and stimulated TNF-α production in monocytes, corresponding to a proinflammatory shift from classic to nonclassic and intermediate monocytes. Unsupervised analysis of T-cells revealed a loss of mucosa-associated invariant T-cells and regulatory T-cell subtypes in patients on HD.
Gut barrier dysfunction and microbial metabolite imbalance apparently mediate the proinflammatory immune phenotype, thereby driving the susceptibility to cardiovascular disease. The data highlight the importance of the microbiota-immune axis in CKD, irrespective of confounding comorbidities.
Presentation Overview: Show
Vaccines are essential to the functioning of healthcare systems. One vaccine design strategy focuses upon the usage of inactivated recombinant exotoxin. This strategy avoids the employment of killed (inactivated) or attenuated viruses and their respective inconveniences. However, pathogenic macromolecules presented to a human immune system risk leading to an autoimmune response wherein the vaccine protein may target a human protein with high similarity to the pathogenic protein. While the controlled introduction of mutations into the query sequence of a vaccine candidate may alleviate potential protein mimicry and any subsequent immune system responses, this can significantly disrupt protein functioning. In order to overcome this problem, a pipeline was designed to “derisk” candidate vaccines. Simultaneously, the stability and the autoimmune response risk of the query sequence were assessed through the re-building of the 3D structure of the protein and a screening of similarity to human sequences respectively. Final phases of the work focused on analysing the stability of the 3D structure via molecular dynamics simulation and the screening of the mutated sequence against T-cell epitopes. This pipeline was then tested on sequences provided by our industrial partner, delivering an enhanced yield and safety profile.
Presentation Overview: Show
Differential expression studies on bulk sequencing of tumor biopsies can yield incorrect results because varying proportions of non-tumor tissues in the samples obscure the true signal, while single-cell sequencing is still too expensive to use in a clinical setting. Various deconvolution algorithms extract tissue-specific gene expression profiles, but cannot do this on a per-sample basis. We take a popular deconvolution method for expression data, based on Non-negative Matrix Factorization (NMF), and extend it by applying a per-sample, per-gene constrained optimization. We compare against existing methods and demonstrate improved accuracy in reconstructing true sample-and-tissue-specific gene expression profiles for dataset simulations based on bulk and single-cell RNA-seq data. We then apply our method to the TCGA PAAD dataset to reconstruct the sample-and-tissue-specific gene expression profiles, use marker gene values for consensus clustering of samples, and recapitulate the survival differences found by an orthogonal study. Our work demonstrates an improvement in the area of compartment deconvolution and has a direct application to differential expression studies in a clinical setting. It can advance personalized medicine through more precise characterization of patient samples, and has the potential to address racial disparities by helping identify patient-specific patterns in currently undersampled minority communities.
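To make the NMF starting point concrete, here is a minimal sketch of plain NMF with the standard multiplicative update rules, factorizing a genes-by-samples matrix V into a genes-by-tissues profile matrix W and a tissues-by-samples proportion matrix H. It is not the authors' per-sample, per-gene constrained extension; the toy matrices, function names, and parameter defaults are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF via multiplicative updates; returns (W, H)."""
    rng = random.Random(seed)
    n_genes, n_samples = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_genes)]
    h = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_samples)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_genes)]
    return w, h


# Toy bulk data: 4 genes, 2 tissues (tumor, stroma), 5 samples with varying purity.
w_true = [[5.0, 0.0], [4.0, 1.0], [0.0, 6.0], [1.0, 3.0]]
h_true = [[0.9, 0.7, 0.5, 0.3, 0.1], [0.1, 0.3, 0.5, 0.7, 0.9]]
v = matmul(w_true, h_true)

err_start = frobenius_error(v, *nmf(v, k=2, n_iter=0))  # error of the random init
w_est, h_est = nmf(v, k=2, n_iter=200)
err_end = frobenius_error(v, w_est, h_est)
print(f"reconstruction error: {err_start:.3f} -> {err_end:.6f}")
```

Note that W here holds one expression profile per tissue shared by all samples, which is exactly the limitation the abstract describes: plain NMF does not yield a separate tissue-specific profile for each individual sample.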
Presentation Overview: Show
The COVID-19 pandemic has stressed healthcare systems worldwide, highlighting the need for effective risk prediction models to prioritize patient care. Electronic Health Records (EHRs) are a rich data source for developing such models, but their complexity and variability challenge traditional machine learning algorithms. Given their exceptional performance in natural language processing and other applications, transformer-based models are promising for such data. Building on our previous work, we developed a novel, multi-modal transformer architecture (ExMed-BERT), which extends the previously published Med-BERT by including demographic and clinical features, information about prescribed drugs, and quantitative measures such as heart rate, temperature, and blood pressure. After pre-training our model on over 3.5 million patients, we performed transfer learning to predict the risk of acute respiratory manifestations after a COVID-19 infection using data from 80,211 COVID-19 patients. Our transformer-based models achieved an AUC of 79.8%, outperforming traditional machine learning models such as XGBoost and Random Forests. To improve interpretability, we used integrated gradients and Bayesian networks to understand the relationships between the most relevant features. Additionally, we evaluated the model's adaptation to Austrian in-patient data. Our results demonstrate the potential of transformer-based models for precision medicine and provide insights into factors driving our model's predictions.
Presentation Overview: Show
Background: As dengue is an increasing global health threat, a better understanding of the global circulation dynamics and its determinants would be helpful.
Methods: Dynamics of global circulation of the four dengue virus serotypes were explored from 1990 to 2019 utilizing genetic sequences through a network-based method. Four new circulation indicators, including local intensity, betweenness centrality, tip frequency, and persistence time were defined. Three circulation roles, including source, hub and destination, were further proposed based on the indicators. Spatial and temporal changes of the circulation roles and persistence time were explored. Important determinants were evaluated through machine learning models.
Results: Thailand, Indonesia, and Vietnam in Asia and Venezuela and Colombia in the Americas acted as sources for all serotypes in different decades. Destinations were observed mostly in island regions. Across decades, the number of regions with different circulation roles and the persistence of DENV-1 increased significantly. However, no difference was found in median persistence time or in the mean latitude of regions with different circulation roles. Forest, population, socioeconomic, climate, and airline factors were among the important determinants of the circulation roles and persistence of dengue. These findings provide new insights into global dengue dynamics and can inform targeted control of dengue.
Presentation Overview: Show
Upadacitinib and risankizumab are efficacious in inflammatory bowel disease (IBD) patients who are anti-TNF inadequate responders (TNF-IR). We aimed to understand the mechanisms mediating the response to upadacitinib and risankizumab.
Eight tissue transcriptomic datasets from IBD patients treated with anti-TNF therapies along with single-cell RNAseq data from ulcerative colitis were integrated to identify TNF-IR mechanisms. RNAseq colon tissue data from clinical studies of TNF-IR Crohn’s disease patients treated with upadacitinib or risankizumab were used to identify TNF-IR mechanisms that were favorably modified by upadacitinib and risankizumab.
We found seven TNF-IR up-regulated modules mainly related to innate/adaptive immune responses and tissue remodeling, and six TNF-IR up-regulated cell types mainly related to inflammatory fibroblasts and inflammatory monocytes. Upadacitinib was associated with a significant decrease in the expression of most TNF-IR up-regulated modules in JAK1 responders (JAK1-R); in contrast, there was no change in these modules among TNF-IR patients treated with a placebo or among JAK1 inadequate responders (JAK1-IR). In addition, four of the six TNF-IR up-regulated cell types were significantly decreased after upadacitinib treatment in JAK1-R but not among subjects treated with a placebo or among JAK1-IR patients. We observed similar findings from colon biopsy samples from TNF-IR patients treated with risankizumab.
Presentation Overview: Show
Glioblastoma (GBM) is a highly deadly brain tumor. The chemotherapeutic treatment still lacks solid patient stratification, as temozolomide (TMZ) is administered to the majority of GBM patients. In this paper, we investigated the reliability of the NAD(P)H-fluorescence lifetime imaging microscopy (NAD(P)H-FLIM) in identifying GBM tissues that will respond to TMZ treatment.
Using the information obtained by NAD(P)H-FLIM, we conducted a DE analysis on RNA-seq samples collected by some of us, comparing TMZ responder and non-responder tumors. To validate the NAD(P)H-FLIM classification, we conducted a comparable DE analysis on the GBM TCGA (The Cancer Genome Atlas) RNA-seq data using the progression-free interval (PFI) as a responsiveness indicator.
We selected the most informative genes shared by both DE analyses (BIRC3, CBLC, IL6, PTX3, SRD5A1, TNFAIP3) and employed them as a transcriptomic signature. Using a different dataset (GBM TCGA Agilent-Microarray), we built a signature-based model capable of predicting the PFI. We also showed that the performance of our model is similar to that obtained with a well-established biomarker: the methylation status of the MGMT promoter.
In conclusion, we validate the reliability of the NAD(P)H-FLIM-based drug response assessment procedure and provide a new transcriptomics-based model for determining patients’ responsiveness to TMZ treatment.
Presentation Overview: Show
Distal sensorimotor polyneuropathy (DSPN) is a common neurological disorder in elderly, obese, prediabetic or type 2 diabetic individuals and is associated with high morbidity and premature mortality. DSPN is a multifactorial disease that is not yet fully understood. To address this, we developed the Interpretable Multimodal Machine Learning (IMML) framework for improving DSPN prevalence and incidence prediction based on sparse multimodal data. We leveraged the population-based KORA F4/FF4 cohort, including 1,091 participants and their deep multimodal characterisation, i.e. clinical data, genomics, methylomics, transcriptomics, proteomics, inflammatory proteins and metabolomics. Results showed that clinical data alone was sufficient to stratify individuals with and without DSPN (AUROC = 0.752), whilst predicting DSPN incidence 6.5±0.2 years later strongly benefitted from clinical data complemented with two or more molecular modalities (improved ΔAUROC >0.1). IMML also revealed important features including up-regulation of proinflammatory cytokines and down-regulation of the SUMOylation pathway and essential fatty acids in the blood at baseline, thus yielding insights into the disease pathology and putative biomarkers of DSPN incidence, which could guide prevention strategies. The IMML framework shows strong utility, as it could be generalised to integrate sparse multimodal datasets to study other complex multifactorial diseases, facilitating translational research in precision medicine.
Presentation Overview: Show
Background. Liver Transplant Recipients (LTRs) with elevated liver enzymes can have graft injury due to various etiologies, such as recurrent or de novo non-alcoholic steatohepatitis (NASH), T-cell mediated rejection (TCMR), and other conditions. The goal of our study was to develop a Machine Learning (ML) tool integrating methylation patterns on circulating DNA in plasma with clinical variables to non-invasively and accurately classify liver graft injury.
Method and results. We generated methylation profiles on circulating DNA in a pilot study of 43 LTRs, comprising NASH LTRs (n=11), TCMR (n=19), and Controls (n=13), and developed an L2 multinomial logistic regression ML approach across 101 bootstrapped models to distinguish between the graft conditions. Our ML model achieved a mean multi-class classification accuracy of 0.91, with mean specificity and sensitivity of 0.94 and 0.91, respectively. The model was found to be particularly adept at detecting TCMR and Controls, with true positive rates of 95% and 90%, and AUROCs of 0.992 and 0.985, respectively. For NASH LTRs, the models achieved fair performance, with a true positive rate of 82% and an AUROC of 0.991.
Conclusion. The newly developed ML tool holds significant promise as a novel, non-invasive and specific diagnostic tool for liver pathology.
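The multi-class metrics quoted above (accuracy, per-class true positive rate, specificity) can all be read off a confusion matrix. The sketch below shows the arithmetic for a three-class problem. The class sizes (11, 19, 13) match the abstract, but the individual cell counts are hypothetical and are not the study's results; the function name is also an assumption.

```python
def per_class_metrics(confusion, labels):
    """Sensitivity (true positive rate) and specificity for each class.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for idx, label in enumerate(labels):
        tp = confusion[idx][idx]
        fn = sum(confusion[idx]) - tp                     # missed members of the class
        fp = sum(row[idx] for row in confusion) - tp      # others called as this class
        tn = total - tp - fn - fp
        metrics[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return metrics


labels = ["NASH", "TCMR", "Control"]
# Hypothetical counts; rows are true classes (11 NASH, 19 TCMR, 13 Controls).
confusion = [
    [9, 1, 1],
    [1, 18, 0],
    [0, 1, 12],
]

metrics = per_class_metrics(confusion, labels)
accuracy = sum(confusion[i][i] for i in range(len(labels))) / sum(map(sum, confusion))

for label in labels:
    m = metrics[label]
    print(f"{label}: sensitivity={m['sensitivity']:.3f}, specificity={m['specificity']:.3f}")
print(f"accuracy={accuracy:.3f}")
```

In a bootstrapped setting such as the one described, these quantities would be computed once per resampled model and then averaged.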
Presentation Overview: Show
Tumor mutation burden (TMB), mostly defined as the total number of nonsynonymous mutations in the tumor genome, is recognized as a valuable biomarker for cancer immunotherapy across multiple cancer types. Patients with higher TMB tend to respond better to immune checkpoint blockade (ICB) therapy. Recently, targeted gene panels were developed to estimate TMB. We investigated four TMB estimation gene panels and found that the genes in these panels were enriched for calcium ion binding function. Nevertheless, there has been no evidence that TMB is associated with calcium ion binding genes. In this study, we aimed to explore the association between TMB and calcium ion binding genes by analyzing mutation data from The Cancer Genome Atlas database, which contains 33 different cancer types. We confirmed the association between calcium ion binding genes and TMB in 27 out of 33 cancer types. Furthermore, we constructed a general pan-cancer model using calcium ion binding genes only, which can be used to estimate TMB precisely. Additionally, TMB estimated by this general model can identify ICB therapy response in independent datasets in different cancer types. In conclusion, our findings indicate a significant association between calcium ion binding genes and TMB.
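The count-based definition of TMB given above can be sketched in a few lines. The example below tallies nonsynonymous mutations per sample from a list of (sample, variant classification) records. The sample identifiers and records are invented, and the set of classifications treated as nonsynonymous is an assumption that follows MAF-style labels; the panel-based estimation model described in the abstract is not reproduced here.

```python
from collections import Counter

# Assumption: MAF-style variant classifications counted as nonsynonymous.
NONSYNONYMOUS = {
    "Missense_Mutation",
    "Nonsense_Mutation",
    "Nonstop_Mutation",
    "Frame_Shift_Del",
    "Frame_Shift_Ins",
    "In_Frame_Del",
    "In_Frame_Ins",
}


def tumor_mutation_burden(records):
    """Count nonsynonymous mutations per sample.

    records: iterable of (sample_id, variant_classification) pairs.
    """
    counts = Counter()
    for sample_id, classification in records:
        counts[sample_id] += classification in NONSYNONYMOUS  # adds 0 or 1
    return dict(counts)


# Hypothetical mutation calls for two samples.
records = [
    ("S1", "Missense_Mutation"),
    ("S1", "Silent"),
    ("S1", "Nonsense_Mutation"),
    ("S2", "Silent"),
    ("S2", "Missense_Mutation"),
]

tmb = tumor_mutation_burden(records)
print(tmb)
```

Restricting `records` to mutations in a chosen gene set (for example a panel, or calcium ion binding genes) before counting gives the kind of gene-set-level burden that a panel-based estimator would take as input.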
Presentation Overview: Show
Cancer cells harbor molecular alterations at all levels of information processing. Genomic/epigenomic and transcriptomic alterations are inter-related between genes, within and across cancer types, and may affect clinical phenotypes. Despite abundant prior studies integrating cancer multi-omics data, none of them organizes these associations in a hierarchical structure and validates the discoveries in extensive external data. We infer this Integrated Hierarchical Association Structure (IHAS) from The Cancer Genome Atlas (TCGA) data and compile a compendium of cancer multi-omics associations. Intriguingly, diverse alterations in genomes/epigenomes from multiple cancer types impact the transcription of 18 Gene Groups. Half of them are further reduced to three Meta Gene Groups enriched with (1) immune and inflammatory responses, (2) embryonic development and neurogenesis, (3) cell cycle process and DNA repair. Over 80% of the clinical/molecular phenotypes in TCGA are aligned with the combinatorial expressions of Meta Gene Groups, Gene Groups, and other IHAS subunits. IHAS derived from TCGA is validated in more than 300 external datasets. To sum up, IHAS stratifies patients in terms of molecular signatures of its subunits, selects targeted genes or drugs for precision cancer therapy, and demonstrates that associations between survival times and transcriptional biomarkers may vary with cancer types.
Presentation Overview: Show
Background
Our study aims to establish molecular mechanisms by synthesizing multiomics data using computational methods for effective management of diabetes and chronic kidney disease.
Method
Multi-omics data were generated from patients with and without diabetes. A marker panel of circulating proteins predicting outcome was identified using the LASSO Cox method via 5-fold cross-validation with 200 repeats and evaluated using the C statistic. An ANG-TIE pathway gene set was curated from three open databases. An ANG-TIE pathway score, receptor level, and the effect of SGLT2i treatment were evaluated using scRNAseq profiles of DKD participants.
Results
The 3-panel model significantly improved prediction of kidney outcome over the clinical parameters in the discovery and validation groups (LR test p=0.003 and p=0.0004, respectively). Joint modelling combining plasma proteins and clinical parameters resulted in the same markers. The ANG-TIE pathway scores were elevated in progressors (p=0.02), along with a higher TEK receptor level. SGLT2 inhibitor treatment did not effectively reverse the increased ANG-TIE score in T2D to healthy control levels.
Conclusion
The machine learning approach using high-throughput molecular data provided the basis for a plausible signaling mechanism that could lead to a potential novel treatment option as a supplement to standard care and SGLT2i for patients with DKD.
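The C statistic used to evaluate the marker panel measures how often a model ranks patients correctly by risk. The following is a minimal sketch of Harrell's concordance index for right-censored survival data, with ties in predicted risk counted as half-concordant. The follow-up times, event flags, and risk scores are invented, and the function name is an assumption; the study's own implementation is not described in the abstract.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. It is concordant when
    the earlier-event patient has the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up: time in years, event flag (1 = outcome, 0 = censored).
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]

perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
inverted = concordance_index(times, events, [0.1, 0.5, 0.7, 0.9])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
print(perfect, inverted, uninformative)
```

A value of 0.5 corresponds to a model with no discriminative ability, and 1.0 to perfect ranking, which is why an improvement in C over clinical parameters alone is a natural way to assess added markers.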
Presentation Overview: Show
TP53 is a master tumor suppressor gene, mutated in approximately half of all human cancers. It is possible to infer loss of p53 activity, which may result from trans-acting alterations, from gene expression patterns. We apply this approach to transcriptomes of ~8,000 tumors and ~1,000 cell lines, estimating that 12% of tumors and 8% of cancer cell lines phenocopy TP53 loss: they are likely deficient in the activity of the p53 pathway, while not bearing obvious TP53 inactivating mutations. While some of these are explained by amplifications in the known phenocopying genes MDM2, MDM4 and PPM1D, others are not. An analysis of cancer genomic scores jointly with CRISPR/RNAi genetic screening data identified an additional TP53-loss phenocopying gene, USP28. Deletions in USP28 are associated with a TP53 functional impairment in 2.9-7.6% of breast, bladder, lung, liver and stomach tumors, and are comparable to MDM4 amplifications in terms of effect size. An analysis using the phenocopy scores suggests that TP53 (in)activity commonly modulates associations between anticancer drug effects and relevant genetic markers, such as PIK3CA and PTEN mutations, and should thus be considered as a relevant interacting factor in personalized medicine studies.
Presentation Overview: Show
The identification of therapeutic targets for diseases is an important step in drug discovery. In this study, we propose a computational approach to predict new therapeutic targets of various diseases integrating genetically perturbed transcriptomic signatures (knock-down and over-expression of protein-coding genes) and disease-specific gene transcriptomic signatures of a variety of diseases. The concept of target repositioning is an extension of drug repositioning. The trans-disease method, which is a multitask learning method that takes into account similarities among diseases, enabled us to distinguish the inhibitory from activatory targets, and to predict the therapeutic targetability of not only proteins with known target–disease associations, but also orphan proteins without known associations. Our proposed method is expected to be useful for understanding the commonality and specificity of mechanisms among diseases and for therapeutic target identification in drug discovery.
Presentation Overview: Show
MDS/MPN with neutrophilia, previously known as aCML, is a rare and aggressive hematologic malignancy. The options for treating aCML are limited, and the prognosis is very poor. The mutational landscape of this disease is highly heterogeneous, and the presence of specific mutations can affect the prognosis and require a different therapeutic approach. To better understand the genetic mechanisms underlying aCML, we performed a combined analysis of single-cell RNA sequencing and whole-exome sequencing mutation profiling in 4 healthy donors and 12 aCML patients. The integration of these high-dimensional data is challenging, so we developed a novel computational framework based on regularized regression with the LASSO penalty. Our approach first compares healthy donors vs patients to estimate gene expression in different conditions, and then quantifies the impact of specific mutations on gene expression returned as fold change compared to the inferred expression in normal samples.
This analysis revealed dysregulated genes in aCML patients, such as the upregulation of the calcium-binding protein S100A10, associated with mutations in ASXL1 and RUNX1, and the downregulation of the phospholipid-binding protein ANXA1, related to SETBP1 mutations. This study may improve our understanding of the molecular machinery of aCML and provide new insights into therapeutic targets for this disease.
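To illustrate how a LASSO penalty can single out the mutations that affect a gene's expression, here is a minimal sketch of LASSO regression fitted by cyclic coordinate descent with soft-thresholding. It regresses one gene's expression change on binary mutation indicators. The data, the penalty value, and the function names are all hypothetical, and the authors' actual framework (which first models healthy versus patient expression) is more elaborate than this generic example.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; exactly zero inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(x, y, penalty, n_iter=100):
    """Minimise (1 / 2n) * ||y - X b||^2 + penalty * ||b||_1 (no intercept)."""
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = 0.0
            for i in range(n):
                prediction = sum(x[i][k] * beta[k] for k in range(p))
                partial_residual = y[i] - prediction + x[i][j] * beta[j]
                rho += x[i][j] * partial_residual
            rho /= n
            scale = sum(x[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, penalty) / scale if scale > 0 else 0.0
    return beta


# Hypothetical design: columns are presence (1) or absence (0) of mutations A and B
# in six patients; y is the change in expression of one gene relative to normal.
x = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 1]]
y = [2.0, 2.0, 0.0, 0.0, 2.0, 0.0]  # driven by mutation A only

beta = lasso_coordinate_descent(x, y, penalty=0.1)
print(beta)
```

The coefficient for mutation B is set exactly to zero while the coefficient for mutation A is shrunk slightly below its true effect of 2.0, which is the sparsity-inducing behaviour that makes the LASSO attractive when there are many candidate mutations and few patients.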
Presentation Overview: Show
Breast cancer is the leading cancer type in women worldwide. Breast cancer staging in sentinel nodes is an essential step for detecting early signs of tumor spread. However, this assessment by pathologists is not always easy, and retrospective surveys often requalify the status of a high proportion of sentinel nodes. Convolutional Neural Networks (CNNs) are a class of deep learning algorithms that have shown excellent performance in the most challenging visual classification tasks. In this study I compare different CNN architectures and different hardware acceleration devices for the detection of breast cancer from microscopic images of sentinel lymph node tissue. The performance of the models for the classification of normal vs tumor tissue is assessed with the area under the ROC curve (AUC). All models are trained and tested on a public data set of more than 300,000 images of lymph node tissue, on four different hardware acceleration cards. The impact of transfer learning, data augmentation and hyperparameter fine-tuning is also evaluated. Hardware type can improve training time by a factor of five to twelve. The training time increases significantly with the model depth. Increasing the depth of the model improves AUC, while data augmentation and transfer learning do not.
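The area under the ROC curve used to compare the models has a simple probabilistic reading: it is the probability that a randomly chosen tumor image receives a higher score than a randomly chosen normal image. The sketch below computes it directly from that definition, counting tied scores as one half. The labels and scores are invented, and the function name is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for tumor, 0 for normal. Ties in score count as one half.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier outputs for four image patches.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]

auc = roc_auc(labels, scores)
print(auc)  # 0.75: three of the four tumor/normal pairs are ranked correctly
```

The pairwise loop is quadratic in the number of samples, so for a data set of hundreds of thousands of images a rank-based or library implementation would be preferable; the result is the same.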
Presentation Overview: Show
Adenosquamous carcinoma is a rare type of lung cancer, accounting for 2-3% of all lung cancers and is associated with a poor prognosis. It is characterized by a mixture of adenocarcinoma and squamous cell carcinoma components. Previous studies have used gene panels to characterize adenosquamous carcinoma and compare the molecular differences with pure adenocarcinoma and squamous cell carcinoma, but whole exome sequencing can provide a more comprehensive molecular profile. This study aims to molecularly characterize lung adenosquamous carcinoma using genomics analysis. We plan to use whole exome sequencing to determine the genome DNA sequence of 28 patients diagnosed with lung adenosquamous carcinoma and analyze the mutational landscape using a predefined pipeline for data preprocessing and variant calling. To compare the molecular differences between adenosquamous carcinoma and pure adenocarcinoma and squamous cell carcinoma, we utilized publicly available Asian datasets. Preliminary results suggest that adenosquamous carcinoma has a higher number of nonsynonymous mutations compared to pure squamous cell carcinoma and adenocarcinoma, and its base substitution type is dominated by C>T. Furthermore, given the distinct mutation hotspots observed in adenosquamous carcinoma, a more detailed analysis is necessary to unravel the complex pathogenesis mechanism of this rare and aggressive lung cancer subtype.
Presentation Overview: Show
The exponential growth of data generated by next-generation sequencing, medical imaging, and electronic health records highlights the need for specialized tools for the exploitation of multimodal biomedical data. We have developed AI-based methods to explore data from patients with congenital myopathies, rare genetic diseases that are difficult to diagnose.
We developed a tool called NLMyo (Natural Language Myopathies) that builds on recent developments in natural language processing, such as ChatGPT and LLAMA, to exploit free-text medical reports. NLMyo is a toolbox that anonymizes reports, extracts information, facilitates diagnosis and automatically creates a search engine for patient symptoms. We used NLMyo to analyze a corpus of 192 biopsy reports of patients with congenital myopathies.
We also developed a tool called MyoQuant to automatically quantify pathological features in muscle fiber histology images. MyoQuant builds on recent AI models in biomedical imaging such as Cellpose and Stardist. Using custom algorithms and AI models, it can automatically quantify pathological features in three standard stainings for the diagnosis of congenital myopathies (HE, ATPase and SDH), such as centralized nuclei, fiber type imbalance and anomalies in mitochondrial distribution.
Online demo versions of NLMyo and MyoQuant are available at https://lbgi.fr/NLMyo and https://lbgi.fr/MyoQuant, respectively.
Presentation Overview: Show
Providing an accurate prognosis for individual dementia patients remains a challenge because patients differ greatly in their rates of cognitive decline. In this study, we used machine learning techniques to identify cerebrospinal fluid (CSF) biomarkers that predict the rate of cognitive decline in dementia patients. First, longitudinal cognitive scores of 210 dementia patients were used to create fast and slow progression groups. Second, we trained ML classifiers on CSF proteomic profiles and obtained a well-performing prediction model (ROC-AUC = 0.82). Lastly, we explored the potential of each of the 20 top candidates in internal sensitivity analyses. TNFRSF4 and TGF β-1 emerged as the top markers, being lower in fast-progressing patients than in slow-progressing patients. Proteins for which a low concentration was associated with fast progression were enriched for cell signalling and immune response pathways. None of our top markers stood out as a strong individual predictor of subsequent cognitive decline. This could be explained by small effect sizes per protein and biological heterogeneity among dementia patients. Taken together, this study presents a novel framework for identifying progression biomarkers and protein leads for personalised prediction of cognitive decline in dementia.
Presentation Overview: Show
Seasonal variation in laboratory data, attributable to yearly weather and dietary variation, is a well-known and well-studied phenomenon. This study investigates the potential benefits of applying a newly developed seasonal adjustment method for laboratory data to a large Danish cohort of approximately 575 thousand patients. We developed and trained four basic machine learning models to classify 35 cardiovascular diagnoses, with 23 laboratory tests and patient sex as the only input features. The machine learning models trained in this study were AdaBoost, Decision Tree, Neural Net (NN), and Random Forest. Model performance was compared before and after the seasonal adjustment method was applied, using AUROC and AUPRC metrics. Feature contributions were quantified using SHAP values. Classification improved for most of the 35 ICD-10 circulatory disease chapter codes assessed in this study (24 out of 35 for the NN model). In summary, this study stresses the clinical value of adjusting for seasonality when conducting EHR-based studies.
Presentation Overview: Show
Internal tandem duplication (ITD) is a type of mutation in which a tandem duplication occurs within an exonic region. To date, ITDs have rarely been studied; FLT3-ITD in acute myeloid leukemia is one of the few that have. In this study, we investigated ITD across different cancer types to comprehensively understand its relation to cancer. We downloaded the raw sequence data for normal and tumor tissues of 33 cancer types from The Cancer Genome Atlas and focused on the genes listed in the Cancer Gene Census. Two ITD calling tools, GenomonITDetector and Pindel, were run for ITD detection, and the ITDs they identified were intersected for further analysis. ITD was detected in 1,275 of 10,372 patients (12.3%). Among the 33 cancer types investigated, ITD is highly enriched in female cancers such as ovarian cancer, while no ITD was detected in six cancer types. We also found that ITD is associated with poor survival in the pan-cancer analysis. Moreover, several genes, such as MUC16, TRRAP, RNF213, EGFR, and MTOR, showed a relatively high frequency of ITD, and ITDs in some of them may also predict poor survival. In conclusion, ITD is enriched in female cancers and may be considered a prognostic biomarker across cancer types.
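The step of intersecting the calls of two ITD callers can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `ItdCall` record, the `intersect_calls` helper, the matching rule (same sample, gene and chromosome, with both breakpoints within a tolerance) and all coordinates below are assumptions made for illustration only.

```python
from typing import NamedTuple


class ItdCall(NamedTuple):
    """Hypothetical, simplified record for one ITD call."""
    sample: str
    gene: str
    chrom: str
    start: int
    end: int


def intersect_calls(calls_a, calls_b, tolerance=0):
    """Return the calls in calls_a that are supported by a call in calls_b.

    Two calls match when sample, gene and chromosome are identical and
    both breakpoints differ by at most `tolerance` bases (an assumed rule).
    """
    supported = []
    for a in calls_a:
        for b in calls_b:
            same_locus = (a.sample, a.gene, a.chrom) == (b.sample, b.gene, b.chrom)
            close = (abs(a.start - b.start) <= tolerance
                     and abs(a.end - b.end) <= tolerance)
            if same_locus and close:
                supported.append(a)
                break  # one supporting call is enough
    return supported


# Made-up calls; the coordinates are placeholders, not real breakpoints.
caller_1 = [
    ItdCall("P1", "FLT3", "chr13", 1000, 1030),
    ItdCall("P2", "EGFR", "chr7", 5000, 5045),
]
caller_2 = [
    ItdCall("P1", "FLT3", "chr13", 1002, 1031),
    ItdCall("P3", "MTOR", "chr1", 9000, 9021),
]

consensus = intersect_calls(caller_1, caller_2, tolerance=5)
print(consensus)
```

Keeping only calls supported by both tools trades sensitivity for specificity; the tolerance controls how strictly the two callers must agree on breakpoints.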
Presentation Overview: Show
The involvement of immune reactions and inflammation in chronic diseases, such as cancer, is a topic of considerable debate. These conditions often exhibit similar clinical phenotypes despite potentially diverse molecular origins. We aim to explore the role of inflammatory pathways in the aetiology of chronic diseases and cancer by mining disease maps with heterogeneous patient datasets across different cohorts.
We approach the problem by mapping longitudinal molecular datasets, such as ‘omics (scRNAseq and microbiome), onto relevant disease maps to enable identification of mechanistically pertinent subgroups based on phenotypic stratification. In the poster, we show preliminary results on the quantification of molecular dysregulation and common pathways across different diseases, and on determining whether inflammation acts as an upstream or downstream mechanism. By identifying distinct patterns of inflammatory pathways and relevant subgroups, we gain a better understanding of the heterogeneity of molecular origins and uncover potential avenues for personalized medicine.
In conclusion, the poster demonstrates the potential of disease maps as a valuable tool for investigating the role of inflammation in disease aetiology. The findings of this project will feed into the i2TRON innovative translational collaborations, which ultimately aim to translate the results into improved patient outcomes.
Presentation Overview: Show
Clinical Alzheimer’s disease (AD) is a progressive condition that impairs cognitive function and daily living. AD patients exhibit vast heterogeneity in their disease progression both with respect to progression speed and severity of individual symptoms. In this work, we used a deep learning method to cluster the multivariate disease trajectories of AD patients along cognitive and functional outcomes. We identified two distinct clusters that separate patients into ‘slower’ and ‘faster’ progressing individuals. We validated and replicated the identified clustering in external data.
Using a machine learning classifier, we were able to predict the progression cluster of a patient from only cross-sectional data at study baseline, with an average area under the receiver operating characteristic curve (AUC) of 0.76. External validation of the classifier achieved an AUC of 0.70. We then explored the classifier's utility for enrichment trials and show how it could be used to recruit more homogeneous trial cohorts that allow smaller sample sizes while maintaining statistical power.
In conclusion, we propose a robust, validated, data-driven disambiguation of the symptomatic progression of AD that is predictable from cross-sectional data. This can support the design of enrichment trials and facilitate new approaches to identifying interventions targeting dementia.
Presentation Overview: Show
Context: Non-functioning pituitary adenomas (NF-PA) are benign intracranial neoplasms without hormonal hypersecretion and, in most cases, an incidental diagnosis. Most NF-PA show characteristics of local invasion, leading to high recurrence rates. This situation calls for the identification of new molecular markers that can be used to predict the clinical evolution of these tumours. Recent evidence suggests an emerging role of the primary cilia in the regulation of cancer development.
Methodology: We used two different public transcriptomic data repositories (GSE63357 and E-MTAB-7768) to perform an in silico analysis of cilia molecular signatures in NF-PA. Differential expression analysis of controls and NF-PA was performed on the microarray data in GSE63357. To validate the different signatures, we used the second repository (E-MTAB-7768). To correlate cilia-related genes with clinical data, we performed PCA and hierarchical clustering analysis, visualized with heatmap plots.
Results: We identified significantly dysregulated genes involved in the ciliogenesis process, such as CCP110, DCDC2, DPCD and ARL13B. This ciliary marker had a significant positive correlation with a worse prognosis.
Conclusions: In silico analysis of NF-PA transcriptomic profiles can be used to discover new pathways as potential predictors of the clinical evolution of this type of tumour.
Presentation Overview: Show
Advancements in artificial intelligence (AI) have shown great promise in medicine, particularly in complex tasks. However, to make significant discoveries from big data, high-quality data preparation is critical. This can involve programmatic data quality control (QC) followed by semantic enrichment, which enhances the clarity of relationships between patients and features. AI models can then identify complex patterns in the data and generate interpretable outputs. Stage II/III colon cancer patients represent a diverse group with difficult to predict outcomes. In this study, we used advanced AI models, specifically variational deep autoencoders (VAEs), along with data preparation tools to identify new patient cohorts. Our QC methods resulted in a consistent and accurate numeric dataset suitable for AI application, while semantic enrichment incorporated semantic relationships from medical ontologies. We used VAEs to reduce dimensionality and clustered patients into distinct cohorts, and we also employed model interpretation methods to understand which features were driving the model's predictions. Our findings demonstrate that these methods can effectively identify patient cohorts within complex medical datasets, providing insights into aetiology and treatment decisions. Additionally, prognostic stratification accuracy can be improved within cohorts due to similarities in disease mechanisms.
Presentation Overview: Show
The aim of this project is to develop modeling approaches to measure patient similarity regarding the overall survival and the molecular tumor board (MTB) recommendation using multimodal data sets, such as patient history and molecular characterization.
The project is based on data from 2189 patients with non-small cell lung carcinoma who were presented at the MTB of the Charité from 2019 to October 2022. Different approaches for encoding clinical patient history were compared with respect to their ability to identify relevant subgroups in the cohort.
First, a symbolic sequence representation of the clinical history was generated. The sequences were then clustered hierarchically, using edit distance as the metric. Second, the individual clinical history of each patient was represented as a path and as a time-dependent graph. Frequent subgraph mining was then performed on these graphs.
The results highlight the main characteristics of the data. First, the clustering of the sequence representation was shown to depend on the length of the reported clinical history. In comparison, the frequent subgraph mining revealed characteristic patterns in the clinical history of the patients. This approach allows the patient history to be encoded into patterns and facilitates the integration of additional patient status data.
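The sequence-based approach (edit distance combined with hierarchical clustering) can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the single-letter event codes, the example histories, the average-linkage rule and the helper names `edit_distance` and `agglomerative` are all hypothetical and are not taken from the project.

```python
def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, sym_a in enumerate(a, 1):
        curr = [i]
        for j, sym_b in enumerate(b, 1):
            cost = 0 if sym_a == sym_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def agglomerative(seqs, n_clusters):
    """Naive average-linkage agglomerative clustering on edit distance.

    Returns a list of clusters, each a list of indices into `seqs`.
    """
    dist = {(i, j): edit_distance(seqs[i], seqs[j])
            for i in range(len(seqs)) for j in range(i + 1, len(seqs))}

    def linkage(c1, c2):
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance.
        _, x, y = min((linkage(clusters[x], clusters[y]), x, y)
                      for x in range(len(clusters))
                      for y in range(x + 1, len(clusters)))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical encoding: D = diagnosis, S = surgery, C = chemo, R = radiation.
histories = ["DSC", "DSCR", "DSCC", "DRRRRRRR", "DRRRRRR"]
clusters = agglomerative(histories, n_clusters=2)
print(clusters)
```

The edit distance between two sequences is never smaller than the difference in their lengths, which is one way a clustering of this kind can end up driven by the length of the recorded history.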
Presentation Overview: Show
Many cancer patients experience clinical benefits from immune checkpoint inhibitor (ICI) treatment. However, ICI is effective for only a minority of the patients who qualify for this kind of treatment. Some DNA- and RNA-Seq-based biomarkers have been shown to correlate with the outcome of ICI. To date, however, only the tumor mutation burden (TMB) is an FDA-approved stand-alone biomarker for ICI selection, with high TMB improving the chance of a favorable outcome. Improving ICI response prediction is an urgent need, considering that using high TMB as the sole selection criterion results in an overall response rate of only 29% of treated patients. Moreover, many patients suffer from potentially life-threatening side effects of the treatment while not showing any benefit.
The goal of my study is to integrate several DNA and RNA biomarkers, generated by Next Generation Sequencing (NGS) of tumor and control tissue, into a machine learning model that predicts response to ICI with high accuracy. Many different biomarkers can be derived from NGS data of tumor samples, including TMB, mutations of known ICI resistance genes, immune cell infiltration, HLA-related features and microbial infiltration. Models using different combinations of features are currently being trained on a cohort of ICI-treated patients for whom outcome data is available.
Presentation Overview: Show
B lymphocytes are an important part of humoral immunity, which protects us from external pathogens, but they can also cause disease if their development, selection and/or maturation is compromised and dysregulated. Unfortunately, despite intensive research, the mechanisms underlying such compromising events are still poorly understood. In this study we investigated the gene expression and repertoire of human B-cell populations in patients at different stages of rheumatoid arthritis (RA). We established a full-length Smart-Seq2 protocol and a computational workflow that enabled us to obtain gene expression levels and to reconstruct paired heavy and light chain sequences for ~90% of the cells in a population of 800 single cells. Information retrieved from the different variable and constant regions of these heavy and light chains was used to delineate patterns of BCR modifications, such as VDJ-gene segment frequencies, somatic hypermutation and N-glycosylation sites, among patients at different stages of RA. This work is a first attempt to leverage the B-cell receptor repertoire of patients with rheumatoid arthritis for precision medicine.
Presentation Overview: Show
Objective: This study aims to identify biomarker(s) from gene expression and methylation profile(s) for the pre-selection of patients with urothelial carcinoma to be treated with radioimmune therapy.
Background: Though urothelial bladder cancer is responsive to immune checkpoint inhibitor (ICI) therapy, few patients respond to monotherapy. Thus, combining ICI treatment with other treatments such as radiation along with pre-selection of patients based on certain biomarkers can increase the efficacy of treatment.
Method/Results: Radioimmune therapy comprising neoadjuvant Nivolumab (ICI) treatment along with radiation (RTX) was administered as first-line treatment to a cohort of 30 patients. This was followed by transurethral resection of the bladder tumor (TURBT) and, after a certain period of time, cystectomy. RNA-seq and methylation profiling with EPIC methylation arrays, performed both before and after treatment, and their combined analysis helped us identify subgroups of patients with varying response to ICI-RTX therapy, along with thirteen differentially methylated regions, such as in the transcription factor ZNF804A.
Conclusion: The application of RNA-seq and methylation approaches allowed putative prediction of the response to ICI-RTX in patients with urothelial carcinoma. The detection of differentially methylated regions between therapy responders and non-responders might help improve treatment pre-selection in future studies.
Presentation Overview: Show
Motivation: Using clinical time courses from multiple patient cohorts, the goal of this work is to understand what the current state of symptoms, together with genetic characteristics, can tell us about the future disease course. Spinocerebellar ataxias (SCAs) are rare, inherited neurological diseases. The four most common types are caused by a CAG repeat expansion in the protein-coding region of a single, type-specific gene. The polyglutamine stretch in the resulting protein leads to protein aggregates in brain cells and to adult-age symptoms related to the loss of coordination and balance.
Method and results: Based on a compiled data set comprising 1538 patients with up to six measurements of 39 neurological scales within three years, we explored type-specific models that exploit patient profiles to predict the risk of deterioration for each scale. The most universal predictors according to a survival forest analysis included the SARA sum score and gait. Among SCA3 patients, a larger number of repeats in the expanded allele increased the risk of needing a wheelchair, whereas it did not affect the transition to walking aids.
Conclusion: While comprehensive feature profiles are useful for risk stratification, task-specific clinical markers successfully capture core trends.
Presentation Overview: Show
Background:
Cancer-associated fibroblasts (CAFs) are a heterogeneous cell type found in the tumour microenvironment. CAFs support angiogenesis and tumour growth, and induce therapeutic resistance through the production of extracellular matrix. Here, we investigate the therapeutic potential of targeting CAFs, either transcriptionally or via CAF-specific neoantigens.
Methods:
CAFs and matched tumour-associated normal fibroblasts were cultured from tissue from 12 breast cancer patients. Bulk RNA-sequencing (all samples) and whole-exome sequencing (WES, samples from six patients) were carried out. CIBERSORTx [PMID:31960376] was used to characterise sample cell type proportions, leveraging publicly available data from three distinct CAF subpopulations as a reference. Predicted fusion neoantigens were identified using NeoFuse [PMID:31755900], and WES data was processed with nf-core/sarek [PMID:32269765].
Results:
The immunosuppressive-myofibroblastic subpopulation was the most prevalent in our samples, with most samples also containing the normal-like subpopulation. This confirms the heterogeneity of the cultured fibroblasts, with important implications for CAF-targeting therapeutic strategies. Four of the 12 patients had at least one predicted fusion neoantigen specific to their CAF sample and seven missense single-nucleotide polymorphisms were called across the WES CAF samples. These preliminary results suggest that CAFs may be a viable source of targetable neoantigens.
Presentation Overview: Show
Assay for transposase-accessible chromatin sequencing (ATAC-seq) is a simple technique for profiling open chromatin regions, offering the opportunity to classify tumors based on their inherited epigenetic landscape rather than a more variable final tumoral state. We applied ATAC-seq to Mature T-cell neoplasms (MTCN), a challenging disease in which up to 30% of diagnoses have been reported to be reclassified following expert review of cases.
A Singularity-backed Nextflow pipeline was developed, which proved 6.5 times faster than ENCODE’s in our setting, while providing additional data useful for quality assessment (DNA contamination) and MTCN classification (quantification of integrated viruses and DNA copy-number alteration detection).
Over a thousand hematological samples of various natures and origins were sequenced, with a focus on a cohort of 100 FACS-sorted tumor cells from primary MTCN samples. Samples were clustered and reclassified using a custom t-SNE-based approach, and analyzed for differential opening between MTCN subtypes. Novel markers of interest such as CXCR5 and EFNB2 were identified, and preliminary results of joint clustering with sorted normal cell samples provided new insights into potential cells of origin for various MTCN subtypes. Recent protocols suitable for paraffin-embedded samples, currently under investigation, could make ATAC-seq a key player in the diagnosis of MTCN and other cancers.
Presentation Overview: Show
Paediatric cancer survivors face lifelong battles with severe morbidities, including a significant risk of recurrence. Pre-existing genetic variation within primary tumours offers certain cell populations an evolutionary advantage, increasing the likelihood that some will be resistant to treatment. The clues to predicting resistance and relapse are embedded within mutations present at the time of diagnosis. However, because of the inherent complexity of the tumour, the search for these markers of resistance has yet to be fully explored across childhood cancer. To characterise the progression of primary, untreated tumours to resistant relapses, I analysed the tumour genomes of 1,733 paediatric cancer patients together with clinical metadata. For each sample, I reconstructed the clonal evolution and extracted mutational signatures at the subclonal level. Results show that subclones survive and become enriched post-treatment, of which two subclones, on average, carry mutational signatures denoting chemoresistance. Further, quantitative metrics of clonality indicate differences between post-treatment samples with and without therapy-associated signatures. These findings suggest that features of clonal structure can form the basis for machine learning models that predict treatment failure from the genome alone. The ability to differentiate the tumours at greatest risk of recurrence will help in devising more effective treatment plans to improve quality of life and survival rates for paediatric cancer patients.
Presentation Overview: Show
Hematological malignancies are lethal diseases marked by the abnormal proliferation of blood-forming cells, originating in bone marrow, lymph nodes, or blood. Existing immunotherapies aim to activate infiltrated immune cells to target malignant cells. To improve our understanding of the types and states of immune cells in the tumor microenvironment (TME), we analyzed publicly available single-cell RNA-sequencing data from patients with hematological malignancies and healthy donors. Identifying cancer cells in the TME of hematological malignancies using gene expression profiles is a challenging task because of the lack of established gene markers that confidently differentiate malignant cells from healthy bystander cells of the same type. Therefore, we first benchmarked several computational methods designed to identify cancer cells from transcriptome profiles, which use both statistical and segmentation approaches to detect copy number variation events. Subsequently, we applied single-cell analytics to identify healthy immune cells and hematopoietic lineages in the diseased and healthy samples. This enabled us to further characterize the states of immune cells in the TME and the gene sets that differentiate them from their counterparts in healthy bone marrow, blood, and other tissues. The findings will enrich our knowledge of the TME and aid in the development of innovative immunotherapies.
Presentation Overview: Show
Background
Rheumatoid Arthritis (RA) is a chronic autoimmune disorder that significantly impacts quality of life and work capacity. Treatment of RA aims to control inflammation and alleviate pain; however, achieving remission with minimal toxicity is frequently not possible with the current suite of drugs. Additionally, with escalating de novo drug development costs, bioinformatic discovery pipelines offer the ability to repurpose already licensed compounds and explore synergistic combinations more efficiently.
Method
Public datasets are mined and pre-processed followed by differential expression analysis to obtain a list of differentially expressed genes (DEGs). DEGs from multiple datasets are merged and mapped to Affymetrix probeset IDs to create a treatment response gene signature. Connectivity mapping analysis is used to obtain a list of alternate drugs with high probability of inducing therapeutic response.
Results
CMap analysis identified a total of 6 statistically significant candidate drugs that induced gene expression profiles indicative of a theoretical response. The next step involves in silico toxicity screening of the identified candidate drugs to focus in vitro tests on a list of optimal drugs.
Conclusion
Analysis with this pipeline illustrates the potential of treatment response DEG extraction from expression datasets to predict novel drugs which may offer new options to refractory patients.
Presentation Overview: Show
Globally, cancer is a leading cause of mortality, responsible for nearly 10 million deaths in 2020. Drug resistance is a significant contributor to treatment failure and patient mortality. While combination therapies can address the intratumor heterogeneity that allows resistance to single-drug treatments, discovering novel combinations has historically relied on in vitro and in vivo assays that are logistically demanding, slow and costly. There is a clear opportunity for in silico approaches to systematically predict and prioritize synergistic combinations. Many recent machine learning (ML) models have limited clinical translatability because they exclude drug concentrations, ignore relationships between multi-omic data modalities, or lack biological interpretability. We present a modified biologically-annotated neural network that embeds established biological interactions into the model architecture to predict synergistic anticancer drug combinations from publicly available drug and multi-omic data. Our model uses variational inference to produce posterior inclusion probabilities for the weights on each of the connections in the model. This allows us to identify interesting targets that can lead to new biomarkers and mechanisms, critical for future clinical monitoring and appropriate distribution to patients. This work advances our understanding of biologically-motivated ML approaches, informs future drug combination discovery, and enables drug repurposing efforts.
Presentation Overview: Show
Alzheimer's disease (AD) is characterized by cognitive decline and dementia and has few clinically approved treatments. The clinical and neuropathological heterogeneity of AD suggests the existence of AD subgroups with unique mechanisms and implies the need for cell-type specific subtyping and treatments. The goal of this project was to investigate the differential effects of medications on cognitive tests in eight previously defined brain cell type-based subtypes. The cognitive effects of 47 medications with at least 100 treated subjects were analyzed using 487,409 participants from the UK Biobank. Four cognitive test scores (fluid intelligence, reaction time, prospective memory, and pairs matching) were rank-transformed after adjusting for sex and age at exam. The association of transformed scores with medication status (treated vs. untreated) was determined using logistic regression, performed separately for high-risk and low-risk genetic subtype groups for comparison. Six medications were significantly associated (P<0.05) with improved cognitive function at the follow-up examination. Atenolol, a blood pressure medication, was associated with the largest gain: its effect on cognitive function was 67% greater in high-risk than in low-risk subjects of an astrocyte genetic subtype. These results demonstrate potential novel implications for genetic subtype-driven treatment options for AD.
Presentation Overview: Show
Fetal growth is monitored periodically during pregnancy via ultrasound measurements of fetal dimensions such as femur length, head circumference, abdominal circumference, and biparietal diameter. We show that the Gompertz model, a standard model for constrained growth with just three intuitive parameters, convincingly fits the growth of fetal ultrasound biometries. Two of these parameters, t0 (the inflection time) and c (the rate of decrease of the growth rate), can be treated as universal to all fetuses, while the third parameter, A, can be modeled as an overall scale parameter specific to each fetus, which captures the individual variation in growth. In our cohort of 817 pregnant women (“Seethapathy cohort”), we use the fit to the Gompertz model, based on the available ultrasound parameters, to accurately predict birth weight. We find that deviation from Gompertz-like growth is linked to neonatal complications. Finally, we show that the Gompertz growth curve is a close fit to fetal growth standards from WHO, NICHD, and INTERGROWTH, with the optimal t0 and c close to those in the Seethapathy cohort. The Gompertz formula thus has descriptive and predictive abilities and can be used prescriptively to model ideal growth.
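As a rough illustration of the three-parameter model, the sketch below uses one common parameterization of the Gompertz curve, y(t) = A·exp(−exp(−c·(t − t0))), whose inflection point lies at t = t0 and whose relative growth rate decays exponentially at rate c. The abstract does not state the exact form used, so this parameterization is an assumption, as are the helper names and every numeric value; none of them come from the study. With t0 and c held fixed, the fetus-specific scale A has a closed-form least-squares estimate.

```python
import math


def gompertz(t, A, t0, c):
    """Gompertz curve with scale A, inflection time t0 and rate c (assumed form)."""
    return A * math.exp(-math.exp(-c * (t - t0)))


def fit_scale(times, values, t0, c):
    """Least-squares estimate of A when t0 and c are treated as fixed.

    Minimising sum((y - A*g)^2) over A gives A = sum(y*g) / sum(g*g),
    where g is the Gompertz curve evaluated with A = 1.
    """
    shape = [gompertz(t, 1.0, t0, c) for t in times]
    numerator = sum(y * g for y, g in zip(values, shape))
    denominator = sum(g * g for g in shape)
    return numerator / denominator


# Illustrative values only: gestational weeks and made-up "universal" t0 and c.
T0, C = 20.0, 0.08
weeks = [14, 20, 26, 32, 38]
true_scale = 350.0
measurements = [gompertz(t, true_scale, T0, C) for t in weeks]

estimated_scale = fit_scale(weeks, measurements, T0, C)
print(round(estimated_scale, 3))
```

Because only one parameter is fitted per fetus, even a handful of ultrasound measurements is enough to pin down an individual curve in this setup.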
Presentation Overview: Show
Diverse single-cell modalities such as single-cell transcriptomics, ATAC-seq, and spatial transcriptomics capture different views of the underlying biological system in healthy and disease states. Combining these views increases our understanding of disease states and helps find novel drug targets and biomarkers. However, standardisation of different datasets and their optimal integration into a common representation are necessary for downstream analyses and interpretation. We have developed novel computational tools for automated cell-type annotation, integration, interactive visualisation and single-cell analyses in a secure cloud platform. These tools address some challenges in single-cell data analysis in a scalable and resource-efficient manner, while maintaining data interpretability.
Using our spatial human myocardial infarction atlas and data obtained from fibrosis models of heart and kidney disease, we highlight the utility of our computational framework to better understand cardiometabolic diseases and identify potential targets. Here, we will showcase key cellular mechanisms involved in myofibroblast differentiation in the human heart and the role of mesenchymal–myeloid interactions in cardiac remodelling. Finally, we will highlight the use of single-cell data in biomarker discovery by discussing our multi-omics study to characterise the CD8+ T cell response to SARS-CoV-2 infection at single-cell resolution.
Presentation Overview: Show
Immune Checkpoint Blockade (ICB) is one of the most promising cancer treatments; however, it is effective for only a limited proportion of patients, approximately 20-30%. One of the main issues is the lack of biomarkers to predict resistance to ICB. Here, we present a putative signature for treatment response prediction, based on a comprehensive flow cytometry analysis.
We analysed 100 blood samples from melanoma and non-small cell lung cancer patients, collected prior to ICB treatment, by flow cytometry. The flowTOTAL R package was developed for mass analysis, comprising preprocessing and automated gating. A two-step strategy based on a training cohort (67 patients) and a test cohort (33 patients) was used for model training and evaluation. LASSO, XGBoost and SVM were used as Machine Learning (ML) models. Model performance was measured with AUC, with confidence intervals based on bootstrap analysis. Repeated cross-validation over the training data determined the best hyperparameter tuning for each ML model. LASSO was the model with the best generalization capacity, obtaining an AUC of 0.80 (CI: [0.73-0.91]). The selected features were CD19+, CD8+PDL1+, CD8+PD1, together with cell-free DNA concentration.
We reveal a promising signature based on flow cytometry data to assist in patient selection for ICB.
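The evaluation step (AUC with a bootstrap confidence interval) can be sketched as follows in standard-library Python. The abstract does not say which bootstrap variant was used, so the percentile bootstrap here is an assumption, and the helper names and the toy labels and scores are invented for illustration only.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample has a single class
        stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


# Toy test-set labels (1 = responder) and model scores, not real data.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.60]

point_estimate = roc_auc(y_true, y_score)
ci_low, ci_high = bootstrap_auc_ci(y_true, y_score)
print(point_estimate, (ci_low, ci_high))
```

With a test cohort of only a few dozen patients, intervals of this kind tend to be wide, which is why reporting them alongside the point estimate is informative.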
Presentation Overview: Show
In recent years, it has been shown that a subset of patients with depression has low-grade inflammation, measured by the concentration of C-reactive protein (CRP) in blood [1]. However, studies with deep phenotyping, an extensive cytokine panel and multiomics measurements to better characterize subgroups within patients are missing.
237 probands from the two in-house studies [2,3] with a diagnosis according to the Composite International Diagnostic Interview (mostly depression) were selected. A panel with 43 cytokines and RNA expression was measured in blood. The probands were clustered based on age, BMI, the cytokine and RNA data.
Our results show that inflamed subgroups with immune-metabolic symptoms can be characterized within a transdiagnostic case sample. We find cytokine markers besides CRP, including the less well described chemokines CCL2 and CCL4. Our findings also highlight the advantage of using multiple data modalities for disease classification. The most important cytokines differentiate the inflamed subgroups, while the most important genes differentiate the lowest severity group from the other groups. Lastly, we show that clinical phenotypes like age and BMI are still very important for subgroup discovery.
[1] Milaneschi, et al. (2020). Biological Psychiatry.
[2] Kopf-Beck, et al. (2020). BMC Psychiatry.
[3] Brückl, et al. (2020). BMC Psychiatry.
Presentation Overview: Show
An individualized cancer therapy is ideally chosen to target the cancer’s driving biological pathways, but identifying such pathways is challenging because of their underlying heterogeneity and there is no guarantee that they are druggable. We hypothesize that a cancer with an activated druggable cancer-specific pathway (DCSP) is more likely to respond to the relevant drug. Here we develop and validate a systematic method to search for such DCSPs, by (i) introducing a pathway activation score (PAS) that integrates cancer-specific driver mutations and gene expression profile and drug-specific gene targets, (ii) applying the method to identify DCSPs from pan-cancer datasets, and (iii) analyzing the correlation between PAS and the response to relevant drugs. In total, 4,794 DCSPs from 23 different cancers have been discovered in the Genomics of Drug Sensitivity in Cancer database and validated in The Cancer Genome Atlas database. Supporting the hypothesis, for the DCSPs in acute myeloid leukemia, cancers with higher PASs are shown to have stronger drug response, and this is validated in the BeatAML cohort. All DCSPs are publicly available at https://www.meb.ki.se/shiny/truvu/DCSP/.
Presentation Overview: Show
Chronic kidney disease is a global health burden affecting >10% of the world’s population. Patient responses to therapy and rates of disease progression are highly variable. To begin to address this, we used unbiased consensus clustering to determine if common transcriptomic clusters from 285 kidney biopsies from patients with rare kidney diseases in the Nephrotic Syndrome Study Network (NEPTUNE), the European Renal cDNA Bank Cohort (ERCB), and the Human Hereditary & Health Cohort in Africa (H3Africa) were associated with disease progression. One cluster of patients had a differentially expressed gene profile consistent with TNF activation and was associated with a more rapid loss of kidney function in NEPTUNE. Two TNF pathway markers in the urine, tissue inhibitor of metalloproteinases-1 (TIMP-1) and monocyte chemoattractant protein-1 (MCP-1 encoded by CCL2), could predict the level of TNF pathway activation in the kidney (p<0.01). A clinical trial assessing these as anti-TNF therapy response biomarkers is ongoing. Open-source resources and databases that incorporate deep patient phenotyping with high dimensional -omics data including those by NEPTUNE, H3 Africa, and the Kidney Precision Medicine Project can help decipher molecular underpinnings of kidney disease, the cellular and structural location of disease-associated genes, and help facilitate therapeutic development.
Presentation Overview: Show
Introduction
Peripheral T cell lymphomas (PTCL) are a group of rare and aggressive lymphoid malignancies affecting post-thymic T-cells. The commonest subtype is biologically heterogeneous, hence termed Not Otherwise Specified (PTCL-NOS). Chemotherapy is the primary treatment, with short-lived responses (~30% 5-year survival rate). Discovery of new therapeutic targets for the development of safe and effective treatments is critically needed. This project aims to identify new therapeutic targets for PTCL-NOS.
Results
Differential expression analysis of public microarray datasets was performed to identify upregulated genes in PTCL-NOS compared to healthy T cells; 1098 genes were identified and considered candidate gene (CG) targets. To avoid off-target effects, CG expression was evaluated in healthy human tissues (public data), where 26 CG passed low-expression requirements. Protein expression for the 10 highest ranked CG was assessed by immunohistochemistry (IHC), resulting in 4 CG strongly expressed in PTCL-NOS. CIBERSORTx cell type deconvolution and novel single-cell RNA-seq of 3 PTCL primary tumour samples suggest that the 4 CG are expressed by myeloid and T cells in the tumours.
Conclusion
Four new potential therapeutic targets have been identified at mRNA and protein level in PTCL-NOS. Our next aim is to investigate the roles of these genes in PTCL.
Presentation Overview: Show
Deletion 5q (del(5q)) is a type of Myelodysplastic Syndrome characterized by anemia and low risk of progression to acute myeloid leukemia. This study aims to identify the pathways and genes relevant to the development of the disease by studying del(5q) and non-del(5q) cell populations within patients at a single-cell level. CD34+ cells from four untreated patients with del(5q) MDS were isolated, and single-cell RNA-seq was performed. Two independent copy number alteration algorithms were used to classify cells as 5q-depleted or not, and the del(5q) cells were mainly enriched in the stem cell and erythroid compartments. The gene expression profile of del(5q) cells was compared to their normal counterparts, and although a low number of differentially expressed genes were found, they were enriched in similar pathways in most of the cell types. Regulon activity discrepancies were also studied, and the activity of the regulons in non-del(5q) cells was closer to del(5q) than to healthy ones. This study provides the first characterization of cells harboring del(5q) at the single-cell level and reveals previously unknown expression alterations that could have implications for the phenotype of del(5q) patients.
Presentation Overview: Show
Understanding cell-cell communication in the complex cellular microenvironment is essential, but current single-cell and spatial transcriptomics-based methods mainly concentrate on identifying cell-type pairs with high ligand-receptor expression values, rather than prioritizing interaction features. To address this, we introduce SpatialDM, a statistical model and spatial transcriptomics toolbox that uses a bivariate Moran's statistic to detect spatially co-expressed ligand and receptor pairs in spatial transcriptomics data. Unlike other methods, SpatialDM does not require pre-annotation of cell types and can detect local interacting spots and patterns. This method is scalable and has shown accurate and robust performance in multiple simulations. With an analytically derived z-score approach, SpatialDM takes only 12 min for a million-spot dataset and a few seconds for the most prevalent thousand-spot datasets. SpatialDM achieves an AUROC of up to 0.959 under various simulation settings, whereas other methods range from 0.563 to 0.871. SpatialDM has been applied to melanoma and intestinal datasets from different sequencing platforms, where it has identified well-known communication patterns, promising tumor microenvironment insights, and differential interactions between conditions, thereby enabling context-specific cell interaction discovery. Revealing interactions specific to inflammation or cancer in the colon (e.g. BMP2, CEACAM) may point to promising treatment targets.
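To make the idea of spatial ligand-receptor co-expression concrete, here is a minimal sketch of a global bivariate Moran-type statistic. It is not SpatialDM's code: the RBF weighting, the normalisation, and the permutation test (used here in place of the analytically derived z-score mentioned above) are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math
import random

def rbf_weights(coords, length_scale=1.0):
    """Spot-to-spot weights from an RBF kernel (assumed form), zero diagonal, summing to 1."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
                w[i][j] = math.exp(-d2 / (2 * length_scale ** 2))
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w]

def bivariate_moran(ligand, receptor, w):
    """Weighted cross-product of centred ligand (spot i) and receptor (spot j) values."""
    n = len(ligand)
    mean_l = sum(ligand) / n
    mean_r = sum(receptor) / n
    dl = [v - mean_l for v in ligand]
    dr = [v - mean_r for v in receptor]
    num = sum(w[i][j] * dl[i] * dr[j] for i in range(n) for j in range(n))
    den = math.sqrt(sum(v * v for v in dl)) * math.sqrt(sum(v * v for v in dr))
    return num / den

def permutation_pvalue(ligand, receptor, w, n_perm=999, seed=0):
    """One-sided p-value from shuffling receptor values across spots."""
    rng = random.Random(seed)
    observed = bivariate_moran(ligand, receptor, w)
    shuffled = list(receptor)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if bivariate_moran(ligand, shuffled, w) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A positive statistic indicates that spots with high ligand expression tend to sit near spots with high receptor expression, without any cell-type annotation being required.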
Presentation Overview: Show
Frequent longitudinal measurements of multiple omics data are becoming feasible as new technical advancements enable the in-depth analysis of molecules in small blood volumes (1). This makes the sampling process less disruptive for patients (no venepuncture, self-collection possible) and can allow the routine use of such measurements in clinical practice.
However, applications that demonstrate the added value of these measurements in the long run for assessing disease development are lacking.
Our I AM Frontier (IAF) cohort (2) contains monthly follow-up measurements for multiple omics types. We matched our data with UK Biobank (UKB) data (3) to bring in robust phenotypes.
We assessed whether non-diagnosed IAF individuals at high risk for type 2 diabetes show intermediate changes in a disease spectrum stretching from healthy ‘baseline’ (4) individuals to diagnosed individuals in the UKB cohort.
This was done by applying the Pareto task inference method (ParTI) (5) on the combined dataset.
Trajectories identified with ParTI were contextualized within a customized CKG graph database environment (6). The CKG library represents established molecular interaction pathways up to a molecule-specific level for the disease of interest. This enabled us to evaluate the added value of frequent sampling by assessing the time-dependent development of chronic diseases.
Presentation Overview: Show
Introduction: This study examines the immunological changes in blood samples of PDAC patients who received FOLFIRINOX combined with lipegfilgrastim (FFX-Lipeg). We aimed to assess how the choice of flow cytometry or targeted gene expression influences the immunological changes observed in blood samples of PDAC patients treated with a single cycle of FFX-Lipeg.
Material & Methods: Whole blood samples from 44 PDAC patients were collected before the first FOLFIRINOX cycle and 14 days after the first cycle. EDTA blood tubes were used for multiplex flow cytometry analyses and complete blood count tests, whereas Tempus blood tubes were used to measure immune-related genes using the NanoString Technology®.
Results: Data analysis showed that FFX-Lipeg treatment increased neutrophils and monocytes. Interestingly, flow cytometry analysis showed an increase in the number of B and T cells after treatment, whereas targeted gene expression analysis showed a decrease in B and T cell-specific gene expression.
Conclusion: The findings suggest that the different techniques used to analyze the data can influence clinical discoveries. Targeted gene expression complements flow cytometry analysis to provide a comprehensive understanding of the effect of FFX-Lipeg. Careful selection of the measurement technique is important for studying the effect of treatment.
Presentation Overview: Show
USP7 is a ubiquitin specific protease that removes ubiquitin from its protein substrates, stabilising them and preventing degradation. It has long been postulated as a drug target through its dual role in stabilising the tumour suppressor p53 and its E3 ligase Mdm2. We have developed potent, selective USP7 inhibitors and characterised their mechanism of action (MOA) in human primary cells, cancer cells and a syngeneic in vivo model using bulk RNA-seq. This approach revealed that USP7 has multiple MOA that are highly context dependent, including modulation of HIF1-alpha present only in the fibroblast component, and modulation of the transcriptional repressor complex PRC2. We have demonstrated in an in vivo syngeneic mouse model, CT26, treated with a USP7 inhibitor and analysed using RNA-Seq, that these primary cell mechanisms drive efficacy through tumour microenvironment (TME) reprogramming. We have used the transcriptional signature from this study to suggest which patients may respond to USP7 inhibitors by performing a pan-cancer analysis using gold-standard public domain datasets. This work transforms our understanding of USP7 and provides a compelling path to clinic for USP7 inhibitors. It also illustrates the potential of dynamic RNA-Seq and data integration methods to reveal the MOA of small molecule inhibitors.
Presentation Overview: Show
Rheumatoid arthritis (RA) is a chronic, inflammatory and autoimmune disease affecting 1% of the worldwide population. It is characterized by symptomatic flares during which significant inflammation and destruction of the joints appear. The pathophysiology of the disease remains poorly understood and the available treatments can only alleviate its symptoms and rarely induce long-term remission. We currently lack effective tools to predict the course and response of RA to treatments.
RA is a heterogeneous entity, and not all patients respond to the same treatments. We posit that this difference is due to a plurality of immune alterations causing distinct immune cell expression profiles, which we term RA endophenotypes.
Using single-cell RNA sequencing of PBMCs and clinical data of RA patients, before the initiation of treatments, we applied bioinformatics and machine learning methods characterizing several potential RA endophenotypes and their specific clinical outcomes.
The long-term objectives of this ongoing project are to study the heterogeneity among patients of RA physiopathology and identify biomarkers for RA treatment response. Developing a tool linking a new patient to a characterized endophenotype - and thus its clinical outcomes - based on a few meaningful biological characteristics would pave the way for personalized medicine in RA.
Presentation Overview: Show
Screens in primary patient material have emerged as a promising platform to discover novel drug candidates in challenging indications such as ovarian and non-small cell lung cancer. Unlike traditional models in drug development, our primary model systems closely recapitulate the complex target niche with proven translatability (Kornauth et al. 2021). Here, we demonstrate a computational framework for novel target discovery, leveraging ex vivo sensitivity and multi-omic profiles of over 100 small molecule drugs (SMDs) on primary ovarian and lung cancer samples.
Using SMDs as tools, we systematically perturbed the interactome and developed a graph-based discriminant algorithm to identify topological hotspots of targets from SMDs that uniquely show high sensitivity across our primary cohorts. This allows us to further discover novel targets while avoiding community biases inherent in biological knowledge graphs. Importantly, our models allow for putative target activity to be evaluated directly in a clinically relevant system. Utilising single-cell transcriptomics, we confirmed the activity of predicted pathways which led to insights into novel mechanisms of action.
Overall, we demonstrate that our approach is highly effective at recovering known targets while also uncovering a large number of completely novel targets, with applications ranging from drug repurposing to first-in-class drug development.
Presentation Overview: Show
Viral infections can induce widespread alterations of cellular splicing, favouring viral replication. However, how SARS-CoV-2 induces host cell differential splicing and affects the landscape of transcript alteration in severe COVID-19 infection remains elusive. Understanding the differential splicing and transcript alterations in severe COVID-19 infection may improve our molecular insights into COVID-19. In this study, we analysed publicly available blood and lung transcriptome data from severe COVID-19 patients, blood transcriptome data from recovered COVID-19 patients at 12, 16 and 24 weeks postinfection, and data from healthy controls. We identified significant transcript isoform switching in the individual blood and lung RNA-seq data of severe COVID-19 patients, and 25 common genes that alter their transcript isoform in both blood and lung samples. Altered transcripts show significant loss of the open reading frame and functional domains, and a switch from coding to noncoding transcripts. We identified the expression of several novel recurrent chimeric transcripts in the blood samples from severe COVID-19 patients. Moreover, the analysis of isoform switching in blood samples from recovered COVID-19 patients highlights that there is no significant isoform switching at 16 and 24 weeks postinfection, and the levels of expressed chimeric transcripts are reduced. This finding emphasizes that severe SARS-CoV-2 infection induces splicing changes.
Presentation Overview: Show
At Genomics England we are developing a model-agnostic multimodal machine learning platform for cancer analysis, with a linked multimodal dataset available for over 17,000 patients. Data are available through the Genomics England Research Environment and contain genomic, histopathology, radiology, and clinical information. We have also developed a feature store containing embeddings, which users can use to build their own models, facilitating more rapid development and significantly reduced compute. To demonstrate the benefits of our platform we have implemented published, state-of-the-art models in cancer survival analysis. This multimodal model architecture performs better than unimodal models in 11/14 cancer types when tested against the TCGA dataset (mean multimodal c-index=0.64, histopathology c-index=0.61). We also investigate model portability by testing models pretrained on TCGA against internal Genomics England data and compare the results to training and validating on Genomics England data. The modularity of the model also allows us to easily exchange individual components, which we demonstrate with an improved histopathology model. Finally, we explore how cloud-enabled tools facilitate novel biomarker discovery which can have clinical relevance. To summarise, we demonstrate state-of-the-art machine learning models on Genomics England’s multimodal dataset.
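For readers unfamiliar with the c-index reported above, the following minimal sketch computes Harrell's concordance index for right-censored survival data. It is a generic illustration rather than the platform's evaluation code; the function name and the handling of tied risk scores are assumptions.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index: share of comparable patient pairs ordered correctly.

    times  : observed follow-up time per patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : model risk score, higher meaning shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if the patient with the shorter
            # time actually experienced the event.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the difference between 0.64 and 0.61.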
Presentation Overview: Show
Hepatocellular carcinoma (HCC) is an inflammation-associated cancer arising from viral and non-viral etiologies. Expansion of suppressive myeloid cells is a hallmark of chronic inflammation and cancer, but their heterogeneity in HCC is not fully resolved and might underlie immunotherapy resistance in the steatohepatitis setting. Here, we present a high-resolution atlas of hepatic innate immune cells (~100,000 single-cell transcriptomes) from ten patients with HCC and unravel several discrete populations of multi-lineage entropic monocytes that expand in the tumoral tissue. Among them, we identified a population of suppressive monocytes that is rare in viral HCC but increased in the steatohepatitis setting. Using the Visium spatial transcriptomic technology, we showed that these suppressive cells are spatially enriched with cancer-associated fibroblasts at HCC fibrotic lesions. Finally, we investigated the behavior of our different cell populations in larger cohorts of patients by applying several deconvolution methods to bulk RNA-seq data. Given the difficulty of properly deconvoluting multi-lineage populations, we established a resampling protocol for our scRNA-seq data and compared the performance of deconvolution methods. Our results support 1) the stratification of patients according to cancer grade and liver settings and 2) myeloid subset-targeted immunotherapies via specifically expressed genes to treat HCC.
Presentation Overview: Show
Achilles' heel relationships arise when the functional status of one gene exposes vulnerability to perturbation of a second gene, such as chemical inhibition, providing opportunities for precision oncology. We developed SynLeGG (www.overton-lab.uk/synlegg) and the MultiSEp R package for integrative interrogation of mutually exclusive loss signatures in multiomics datasets. Transcriptome data has advantages for investigation of GDRs and remains relatively underexplored. MultiSEp clusters samples by gene expression, with cardinality determined by regularisation, allowing partitioning of CRISPR scores and mutations for candidate GDR discovery. MultiSEp outperforms competing approaches in benchmarking, including DAISY and BiSEp, with higher AUC and between 2.8-fold and 8.5-fold greater GDR coverage. We predicted Multiple Myeloma (MM) GDRs by MultiSEp analysis of gene expression and mutations from the CoMMpass study (n=1150 patients). Almost all MM patients relapse and succumb to therapy-resistant disease; accordingly, more effective treatments are urgently needed. Our predicted MM GDR networks included ‘nexus’ genes where the GDR neighbourhood mutations cover a high proportion of MM patients, representing attractive drug targets. The MM GDR networks also reveal context-specific biochemical interactions, illuminating fundamental biology, for example around the Ubiquitin-Proteasome System. Laboratory validation of a novel target further supports the value of our approach as a step towards new tools for cancer medicine.
Presentation Overview: Show
Cell segmentation is a common task in analysing the tumor microenvironment (TME), which encompasses vascular cells and cells of the innate and adaptive immune system in and around the tumor. The impact of the TME on survival and effectiveness of immune therapies has been shown for different cancer types e.g. gliomas, the most common type of primary brain tumors. But the immune status is usually not included in the classification of gliomas.
Here we developed an automated pipeline for cell segmentation of brightfield microscopy images using a Residual Neural Network (ResNet34) to select areas of interest and a U-Net-based classifier to segment counterstained cells and cells with immunostaining. These convolutional neural networks are retrained using transfer learning with images created by immunohistochemistry of formalin-fixed paraffin-embedded tissue samples applying CD3, CD4, CD8, CD20, CD163 and Iba1 antibodies.
Large-scale image analysis can now be performed efficiently in a standardized and labour-saving manner. Survival analysis was conducted using the cell count information. In different Cox regression models, we found first evidence for a relationship between CD20 immune infiltration and survival in the IDH-mutant glioma subgroup, and for a relationship between Iba1 and survival in IDH-wildtype gliomas.
Presentation Overview: Show
Drugs targeting disease-causal genes are more likely to succeed for that disease. However, common complex diseases are caused by many risk variants, and the causal gene is not always clear. In contrast, Mendelian disease causal genes are well known and druggable. Some Mendelian diseases are known to predispose patients to specific complex diseases (comorbidity), suggesting that they may share pathogenic processes. Here, we hypothesize that Mendelian and complex disease comorbidity can be used to find new drugs for the complex disease. Building on previous work, we examined 90 Mendelian and 65 complex diseases, finding 2,908 pairs of clinically associated (comorbid) diseases. Using this clinical signal, we match each complex disease to a set of relevant Mendelian disease genes and suggest that drugs targeting these genes may be successful for the complex disease. To test our hypothesis, we used data from clinical trials, known drug indications, and ATC categories level 3 and 4. After adjusting for the number of drug targets, we found significant enrichment of recommended drugs for repurposing among indicated and investigated drugs for cancer, hormonal and cardiovascular disease categories. Our findings suggest that disease comorbidity can be leveraged for drug repurposing.
Presentation Overview: Show
Artemisinin combination therapies are the first-line treatments of malaria. Resistance to these antimalarials could be catastrophic, particularly in Africa, where ~588,000 people died from malaria in 2021. Partial artemisinin resistance, mediated by mutations in the kelch13 protein (PfK13) and first reported in Southeast (SE) Asia in 2008, has since spread rapidly. To evaluate the recent spread in East Africa, we investigated the increase in mutation frequencies by comparing annual changes in PfK13 prevalence in Uganda with estimates in SE Asia. We fitted a Bayesian mixed-effects model to the annual prevalence by district for each mutation to estimate selection coefficients. In Uganda, we estimated selection coefficients from 2016-2021 across seven districts for three mutations (469Y, 469F, 675V). The selection coefficient for all mutations in Uganda was s=0.36, i.e. prevalence is estimated to have increased by 36% yearly. To compare our estimates to the spread in SE Asia, we applied the model to MalariaGen samples from 2003-2018 in Cambodia, Laos, Myanmar, Thailand, and Vietnam to estimate selection coefficients for 580Y (s=0.39) and all validated PfK13 mutations (s=0.32). These findings suggest that the spread of partial artemisinin resistance in East Africa has been similar to that in SE Asia, where resistance is now widespread.
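The following sketch illustrates one common way to think about a selection coefficient: as the yearly change in the log-odds of mutation prevalence under a logistic growth model. This is an assumption made for illustration and not the Bayesian mixed-effects model of the abstract; it uses ordinary least squares on a single series, and the prevalence values in the usage example are synthetic.

```python
import math

def logit(p):
    """Log-odds of a prevalence strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def logistic_prevalence(p0, s, t):
    """Prevalence after t years if the odds grow by a factor exp(s) per year."""
    odds = p0 / (1.0 - p0) * math.exp(s * t)
    return odds / (1.0 + odds)

def estimate_selection_coefficient(years, prevalence):
    """Least-squares slope of logit(prevalence) against year (assumed estimator)."""
    ys = [logit(p) for p in prevalence]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, ys))
    return sxy / sxx

# Synthetic series (not study data): start at 5% prevalence, s = 0.36.
years = list(range(2016, 2022))
synthetic = [logistic_prevalence(0.05, 0.36, y - 2016) for y in years]
s_hat = estimate_selection_coefficient(years, synthetic)
```

On the noise-free synthetic series the slope recovers the value of s used to generate it; real district-level data would require the pooling across districts that a mixed-effects model provides.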
Presentation Overview: Show
DNA methylation (DNAm) at specific CG dinucleotides (CpG sites) in the genome provides powerful biomarkers for a wide range of applications, such as epigenetic clocks, cell type deconvolution, and stratification of cancer patients. Such epigenetic signatures are usually based on multivariate approaches that require hundreds or even thousands of DNAm sites for predictions. For this reason, hardly any of them have been translated to clinical practice. In contrast, targeted assays for only a few selected CpGs may facilitate faster turnover, better standardisation, and reduced costs, which is especially relevant in a clinical setting. Here, we propose a computational framework named CimpleG (Maié et al., 2022) for the detection of small CpG methylation signatures, for cell type classification and cell mixture deconvolution.
In previous studies, we have shown that these singular signatures can be used to estimate the composition of tissues and cellular mixes with an accuracy comparable to that of models using orders of magnitude more features (Schmidt et al., 2020).
In this work we evaluate the performance of CimpleG and competing methods on two distinct datasets in regard to the cell type classification of blood cells and other somatic cells.
Furthermore, we benchmark how well these classifiers and other state-of-the-art cell deconvolution methods perform cellular deconvolution on three different datasets with real or artificially mixed leukocyte subsets. We show that CimpleG is time-efficient and performs as well as top-performing methods, while basing its prediction on a single DNAm site per cell type. Taken together, CimpleG provides valuable tools for clinician scientists. It is an easy-to-use, stand-alone framework for the delineation of CpG sites that are best suited as biomarkers for targeted DNAm analysis, focused on the problems of cell type identification and cell mixture deconvolution.
References:
Maié, T., Schmidt, M., Erz, M., Wagner, W., & Costa, I. G. (2022). CimpleG: Finding simple CpG methylation signatures. bioRxiv, 2022.09.12.507513. URL: https://www.biorxiv.org/content/early/2022/09/14/2022.09.12.507513. doi:10.1101/2022.09.12.507513.
Schmidt, M., Maié, T., Dahl, E., Costa, I. G., & Wagner, W. (2020). Deconvolution of cellular subsets in human tissue based on targeted DNA methylation analysis at individual CpG sites. BMC Biology, 18, 178. URL: https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-020-00910-4. doi:10.1186/s12915-020-00910-4.
Presentation Overview: Show
Background/motivation
Patients are frequently multimorbid at presentation in secondary care. Comorbidities can interact and the presence of particular comorbidities can be symptomatic of a common underlying cause. The complexity of observed multimorbidity combinations means that there is a pressing need to better understand how presenting and preexisting comorbid diseases relate to patient outcomes.
Methods
We performed a sex-stratified analysis of whole-cohort UK Biobank hospital inpatient data and assembled disease sequence trajectories of ICD10 blocks to identify statistically significant disease combinations and orderings. Age-relative 1-year post-trajectory mortality and hospitalisation rates were calculated for each trajectory using Accelerated Failure Time (AFT) models with a 1:3 case-control ratio.
Results
We identified 1784 and 1762 significant disease sequence trajectories for women and men, respectively. We used mortality and hospitalisation outcomes to develop triage rules that identify the highest risk multimorbid patients for females and males presenting with cardiometabolic, respiratory or digestive diseases, cancers, infections, renal failure, mental disorders, and more based on prior diagnostic trajectories.
Conclusions
We provide a useful resource for triaging multimorbid patients based on prior histories of disease and for facilitating further research into drug discovery/repurposing and biomarker identification which may aid in preventing these high risk disease combinations from developing.
Presentation Overview: Show
Biomedicine has experienced a surge in groundbreaking discoveries over the last three decades, aided by the exponential growth of global biobanks and advances in computing power. These developments have facilitated the accumulation of vast amounts of genomics data available for analysis, empowering scientists to conduct population-scale investigations. As the volume of genetic data soars exponentially, driven by improvements in sequencing technology and decreasing costs, an urgent need for high-performance computing (HPC) tools that can efficiently handle these massive datasets emerges.
Current tools offer capabilities for analyzing individual samples; however, they can be expensive and time-consuming when faced with millions of samples. HPCs are crucial for harnessing the power of GPUs, enabling the implementation of cutting-edge methods. In our work, we scaled up the capabilities of SAIGE, a popular tool for Genome-wide Phenotype-wide Analysis, to an exascale level. To achieve this feat, we modified SAIGE to utilize GPUs, resulting in a 20-fold increase in processing speed.
As we continue to push the boundaries of biomedicine, our goal is to empower scientists to conduct analyses at the exascale level by providing HPC tools capable of keeping pace with the ever-growing volume of genetic data, unlocking new possibilities for life-changing discoveries.
Presentation Overview: Show
Introduction: The electrocardiogram (ECG) is a central tool for developing algorithms to predict cardiovascular diseases. Moreover, differences in prediction accuracy have been noted between self-declared ethnicities. However, no investigation has been performed on the ability to predict genetically inferred ancestry using 12-lead ECG, limiting the assessment of such variations.
Methods: We studied the genetic backgrounds of 16,707 individuals with 220,355 ECG waveforms. Ancestry was obtained using RFMix trained on 1000 Genomes Project profiles. We classified individuals into four ethnic groups (European, South Asian, East Asian, and African) if >70% of their genome matched that ancestry; otherwise, they were labelled as admixed. Next, we used a ResNet50 with depthwise convolutions paired with focal loss to predict ancestry.
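The labelling rule in the Methods can be illustrated with a minimal sketch. It assumes the per-individual ancestry fractions have already been estimated (for example from RFMix output); the function name, the dictionary layout, and the strict ">" comparison at exactly 70% are assumptions made for illustration, not details given in the abstract.

```python
from typing import Dict

# The four reference groups named in the abstract.
ANCESTRIES = ("European", "South Asian", "East Asian", "African")


def assign_ancestry_label(fractions: Dict[str, float], threshold: float = 0.70) -> str:
    """Return the ancestry whose genome fraction exceeds `threshold`, else "Admixed".

    `fractions` maps an ancestry name to the share of the genome assigned to it
    (hypothetical input format; missing ancestries count as 0.0).
    """
    best = max(ANCESTRIES, key=lambda name: fractions.get(name, 0.0))
    return best if fractions.get(best, 0.0) > threshold else "Admixed"


if __name__ == "__main__":
    print(assign_ancestry_label({"European": 0.85, "African": 0.15}))
    print(assign_ancestry_label({"East Asian": 0.55, "South Asian": 0.45}))
```

With four reference groups plus the admixed label, this rule yields the five classes used in the prediction task.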
Results and discussion: Our study demonstrates, for the first time, that ECGs can be used to predict ancestry with a notable signal-to-noise ratio: balanced accuracy was 0.625±0.013 on a five-class prediction task, versus 0.342±0.046 for the noise distribution. Our results highlight the potential for ECG-based deep learning algorithms to implicitly learn ethnicity, which may lead to bias amplification and thus a lack of fairness. Further investigation into the ability to predict individual attributes from ECGs is essential to ensure the development of ethical clinical tools.
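For readers unfamiliar with the metric, balanced accuracy is the mean of the per-class recalls, so that a large class cannot dominate the score. The sketch below is a generic definition in plain Python; the toy labels are invented for illustration and are unrelated to the study data.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall over the classes present in `y_true`."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


if __name__ == "__main__":
    # Toy example: always predicting the majority class "A".
    y_true = ["A", "A", "A", "B"]
    y_pred = ["A", "A", "A", "A"]
    print(balanced_accuracy(y_true, y_pred))
```

In the toy example, plain accuracy would be 0.75 while balanced accuracy is 0.5, which is why the metric suits imbalanced class distributions.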
Presentation Overview: Show
Genetics, environment, and lifestyle choices all shape the state of our immune system as we age, and contribute to its inevitable dysfunction and a growing susceptibility to disease. However, existing tools do not provide us with a simple metric for measuring one’s immune system health.
We recently tracked ~140 healthy individuals across 9 years and identified a high-dimensional trajectory that describes the dynamics of immune aging. Critically, the position of a given patient along this trajectory defines a single-score metric we term immune-age (IMM-AGE), which outperforms current clinical criteria for predicting cardiovascular risk and all-cause mortality.
However, to date, IMM-AGE is assessed by expensive, technically challenging methods unavailable to the clinical world. Here, through the use of computational modeling and experimental validation, we detail a scalable, cost-effective IMM-AGE measurement compatible with clinical flow cytometers. Moreover, we introduce a robust IMM-AGE gene expression signature that is technology-independent, and by leveraging publicly available datasets, and these new approaches, we show the predictive power of IMM-AGE in various, additional disease conditions.
Our work helps to establish a future where IMM-AGE serves as a widely used predictive tool for disease emergence and outcome, taking us one step closer towards personalized health management.
Presentation Overview: Show
We present iQPA, an integrated quantitative pathway analysis platform that functionally matches modelled disease mechanisms with actual human diseases to improve drug discovery. It is challenging to evaluate how well a model system matches an actual human disease. iQPA integrates human tissue and model system transcriptomes to provide unequivocal functional phenotype matches. iQPA transforms gene expression into quantifiable pathway activities to determine pathway dysregulation. It assesses similarity by establishing a reference of common functional dysregulation between models and human. iQPA is applied to Alzheimer’s disease, determining high-fidelity therapeutic target pathways. A cellular model with a high Aβ42/40 ratio closely recapitulated human dysregulation events in the temporal cortex. Dysregulation events in the hippocampus of 5xFAD mouse models significantly correlated with human temporal cortex. iQPA identified 83 commonly dysregulated pathways with consistent dysregulation across human brains and the Aβ42/40-high model. We validated commonly dysregulated pathways in the Aβ42/40-high model, including the p38 MAPK pathway. A clinical p38 MAPK inhibitor dramatically ameliorated Aβ-induced tau pathology and neuronal death in the matched Aβ42/40-high model. iQPA guides targeting of the right pathogenic pathway in the right model. It preclinically assesses candidate target pathways with greater confidence for impact on disease while reducing the risk of mistargeting.
Presentation Overview: Show
Background: Remission rates with biologic therapies and placebo are higher in early than in late Crohn’s disease (CD), but not in ulcerative colitis (UC). Here, we describe transcriptomic changes in intestinal tissue associated with disease duration in CD and UC.
Methods: We analyzed bulk RNA sequencing data from two independent prospective CD and UC adult cohorts, MSCCR (nCD=498, nUC=421) and SPARC IBD (nCD=850, nUC=440). Differential expression with disease duration as a continuous covariate was tested using linear mixed models. We conducted pathway enrichment analysis and calculated the enrichment scores for IBD patients before biologic treatment (n=61). Co-expression networks for low and high disease duration were estimated, and BayesDeBulk, a novel Bayesian algorithm for cellular deconvolution, was applied.
Results: In both cohorts, more genes were differentially expressed in CD than in UC. We identified a shared signature of 263 down- and 135 up-regulated genes in longer-standing disease in MSCCR and SPARC (q value<0.05). Genes were enriched for pathways related to energy and lipid metabolism and protein modifications. Baseline expression levels of the LXR/RXR activation pathway were associated with infliximab treatment response.
Conclusion: In two independent cohorts, we demonstrate that duration of disease influences gene expression levels, with higher effects observed for CD compared to UC.
Presentation Overview: Show
Understanding human health and diseases requires interpretation of the human biological system at the interface of multiple aspects. The emergence of multi-omics data in various human cohorts worldwide offers possibilities for integrative analysis in both the sample and variable dimensions. However, gathering sensitive data remains challenging due to disparate privacy regulations among data contributors. We present dsMO, a protocol based on the DataSHIELD infrastructure for non-disclosive data communication, along with a class of statistical and learning methods that allows for federated analyses on individual cohorts and across cohorts with multiple data types. These methods, redesigned from known algorithms, take covariance-related matrices instead of the raw data as input to build their model, rather than sharing parameters from local learning models to build a global one, as is done in federated and swarm learning. We demonstrated the non-disclosive property of dsMO with mathematical proofs and tested its functionality by applying the implemented methods to toy datasets and two virtual cohorts from international consortia for type 2 diabetes and obesity studies. This creates opportunities to develop and apply such analyses across multiple cohorts while preserving the privacy of sensitive data for each data owner. The package suite dsMO is available at https://github.com/sib-swiss/dsMO.
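The idea of building a model from covariance-related matrices rather than raw data can be illustrated with a minimal sketch of principal component analysis across two sites. This is not the dsMO or DataSHIELD API: the function names are hypothetical, the choice of PCA is an assumption, and the sketch makes no claim about disclosure guarantees, which the authors establish separately for their own protocol.

```python
import numpy as np


def local_summaries(X: np.ndarray):
    """Computed inside each site: sample count, column sums, cross-product matrix."""
    return X.shape[0], X.sum(axis=0), X.T @ X


def pooled_pca(summaries, n_components: int = 2):
    """Combine per-site summaries into a pooled covariance matrix and decompose it.

    Uses the identity sum_i (x_i - m)(x_i - m)^T = X^T X - n * m m^T,
    so no individual-level rows are needed at the combining step.
    """
    n = sum(s[0] for s in summaries)
    col_sum = sum(s[1] for s in summaries)
    cross = sum(s[2] for s in summaries)
    mean = col_sum / n
    cov = (cross - n * np.outer(mean, mean)) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    site_a = rng.normal(size=(30, 5))  # toy data held by one cohort
    site_b = rng.normal(size=(40, 5))  # toy data held by another cohort
    variances, loadings = pooled_pca([local_summaries(site_a), local_summaries(site_b)])
    print(variances.shape, loadings.shape)
```

The pooled result matches what PCA on the stacked data would give, although only aggregate matrices leave each site.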
Presentation Overview: Show
Squamous cell carcinomas (SCCs) are tumors that occur in multiple sites and share molecular features independently of their tissue of origin. Advanced tumors are treated with immune checkpoint inhibitors (ICIs). However, responses are not sustained. Increasing overall response rate by priming immune response to ICIs through epigenome modulation is a promising approach under investigation with ongoing clinical trials.
In this study, whole-exome sequencing (WES) data from tumor samples collected in the PEVOsq trial, and RNA-seq/ATAC-seq data available in The Cancer Genome Atlas (TCGA) from the lung, head and neck, cervix, anus, vulva, and penis were analyzed using a multi-omics approach. The objectives of this study are to conduct exploratory analysis to assess multi-omics patterns of SCCs and to further identify prognostic and predictive biomarkers of response to treatment with vorinostat and pembrolizumab. Identification of immune and epigenetic biomarkers based on affected genes and pathways would be beneficial to assess treatment performance. RNA-seq and ATAC-seq of TCGA SCC samples showed shared molecular features, but samples still clustered according to tissue of origin. WES analysis of PEVOsq SCC samples showed genes characteristic of SCCs, which was consistent with previous studies. Multi-omics integration is being planned for a more comprehensive understanding of the data.
Presentation Overview: Show
The tumor micro-environment (TME) is composed of a myriad of cell types with varying functions, capable of modulating the physio-pathological process, leading to either tumor progression or ablation. Bulk transcriptomics characterizes the whole population of cells present in the TME, which masks the effects of the different cell types. To remedy this, several deconvolution methods that estimate cell type proportions have been developed to deal with the high cost and complexity of current high-throughput techniques (e.g. single-cell transcriptomics). However, the resulting features are numerous and highly variable, so there is no universal way of selecting one method over another, and the curse of dimensionality makes patient classification very hard to achieve. We propose an algorithm to capture all possible phenotypes of immune cells in tumor samples in order to describe the continuum of cell states influencing the TME. Feature selection (FS) approaches and transcription factor (TF) activities were employed to identify groups of cells with predictive power on cancer progression and to biologically classify the behavior of these cell niches. Results report subgroups of NK cells and cancer cells associated with a more immune-suppressive behavior and other subgroups of NK cells and myeloid cells associated with an immune-activating behavior.
Presentation Overview: Show
LINE-1 retrotransposons are DNA transposable elements propagating through reverse transcription of an RNA intermediate. Of the 500,000 L1 copies dispersed over the genome, approximately 100-150 are retrotransposition-competent. To elucidate the role of L1s in tumour development, the PCAWG consortium recently identified 124 recurrently active L1 sources across thousands of primary cancer tissues. Here, we aim to describe L1 activity in metastatic colorectal cancer (mCRC).
We make use of 610 mCRC samples from the Hartwig Medical Foundation (HMF). We define L1 sources as locations in the reference genome which accumulate L1-associated genomic breakends. L1 destinations were identified as all breakends which do not coincide with a source region.
We identified 19 recurrently active L1 sources in mCRC. Surprisingly, nearly all samples (n=602) display at least one L1 event, which often could not be traced back to its source. Although 33 genes showed L1 insertions in at least 30 samples, no single 100-kb region was affected in more than 22 samples.
We conclude that L1 activity is highly prevalent in mCRC. We aim to experimentally validate our results in different stages of CRC development. In addition, we intend to refine the definition of variant allele frequency (VAF), reflecting timing of retrotransposition events.
Presentation Overview: Show
Intra-operative diagnostics of CNS tumors currently rely on histopathological analysis, which is often confounded by inter-observer variability. Rapid and precise molecular characterization by DNA methylation profiling holds enormous therapeutic promise in guiding neurosurgical interventions and improving patient outcomes. The pocket-sized Nanopore MinION provides real-time detection of epigenetic modifications at single-base resolution, making it an ideal platform for on-site genomic diagnostics. However, current DNA methylation-based classifiers require ad hoc training on subsets of genomic locations covered by shallow Nanopore sequencing, making fast classification challenging. We have developed an online probabilistic framework that integrates sparse DNA methylation profiles from intra-operative genomic sequencing with past clinical information to enable ultra-rapid classification of nervous system malignancies into clinically relevant subtypes. Our optimized pipeline delivers highly accurate brain tumor type diagnosis within one hour, including library preparation, sequencing, and prediction. In a test cohort of Nanopore samples, we observed a median accuracy of 91% for predicted tumor classes from CpGs covered in the first 15 minutes of sequencing, demonstrating high precision and rapid turnaround times. These results suggest that our molecular diagnostic approach has the potential to inform surgical strategy in the future and could facilitate the adoption of personalized therapeutic modalities during neurosurgical interventions.
Presentation Overview: Show
There are more than 7,000 rare diseases, some affecting 3,500 or fewer patients in the US. Due to clinicians’ limited experience with such diseases and the heterogeneity of clinical presentations, approximately 70% of individuals seeking a diagnosis today and up to 50% of the suspected rare diseases remain undiagnosed. Artificial intelligence has demonstrated success in aiding the diagnosis of common diseases. However, existing approaches require labeled datasets with thousands of diagnosed patients per disease. We present SHEPHERD, a machine learning approach for multifaceted rare disease diagnosis. SHEPHERD uses geometric deep learning with multimodal clinico-genetic information and is trained exclusively on synthetically generated patients. Once trained, we show that SHEPHERD can provide clinical insights about real-world patients. We evaluate SHEPHERD on a cohort of 465 patients representing 299 diseases in the Undiagnosed Diseases Network. SHEPHERD excels at several diagnostic facets: performing causal gene discovery (causal genes are predicted at rank = 3.52 on average), retrieving “patients-like-me” with the same causal gene or disease, and providing interpretable characterizations of novel disease presentations. SHEPHERD demonstrates the potential of artificial intelligence to accelerate rare disease diagnosis and has implications for using deep learning on medical datasets with very few labels.
Presentation Overview: Show
Achieving the full potential of precision oncology requires precise prediction of treatment response. Previously, we developed DeepDR, a deep learning model that predicts drug sensitivity by integrating mutations and gene expression of a cancer sample. The model features a transfer learning design that captures two types of features: tumor relevant representations of mutation and expression data learned from tumors, and a predictive model for drug response learned from high-throughput drug screens of cell lines. Thus, DeepDR was applicable to both cell lines and tumors and achieved superior performance over conventional methods. To make DeepDR more accessible to biomedical researchers with limited programming skills, we present a user-friendly web server using an R Shiny framework. The web app allows users to upload mutation and/or gene expression profiles of a cancer sample (cell or tumor model) and predict the sample's response to 265 anti-cancer compounds. It provides an intuitive user interface to interactively visualize, search, and filter prediction results. Additionally, it enables various downstream analyses, including statistical tests, and provides links to external compound databases such as PubChem. We believe that the R Shiny app will foster accessibility of our deep learning prediction machine and facilitate the drug development process in cancer.
Presentation Overview: Show
Psychiatric disorders, such as schizophrenia (SCZ), schizoaffective disorder (SCA), bipolar disorder (BIP) and major depressive disorder (MDD), are known to have a high genetic and phenotypic overlap. Due to the high polygenicity and the variety of cell types with complex interactions in the human brain, the processes underlying these disorders are only partially understood. Our goal is to gain new insights into the pathophysiology of psychiatric diseases in different human brain cell types at the transcriptomic and epigenomic level. Therefore, we studied gene expression and chromatin accessibility at the cell-type level in postmortem human brain tissue from the orbitofrontal cortex of individuals with SCZ, SCA, BIP or MDD (n=38, 7, 7 and 5, respectively) and 35 healthy controls. We were able to identify 18 cell types from more than 800,000 cells in the single-nucleus (sn)RNA-seq data, while about 400,000 snATAC-seq cells were assigned to 15 cell types. Differential expression analysis implicates the strongest dysregulation between cases and controls in excitatory neurons and oligodendrocytes. Only part of the regulatory processes causing these differences can be explained by differential chromatin accessibility between cases and controls.
Presentation Overview: Show
Knowing a patient's genetic ancestry is crucial in clinical settings, providing benefits such as tailored genetic testing, targeted health screening based on ancestral disease-predisposition rates, and personalized medication dosages. However, self-reported ethnicity or ancestry is frequently used in research and clinical settings and is often inconsistent with genetic ancestry derived from genomic data, potentially driving health disparities. Many existing approaches use genome sequencing data to infer ancestry at the continental level, creating the need for methods optimized for individualized ancestry assignment. We present SNVstory, a method built upon three independent machine learning models for inferring the sub-continental ancestry of individuals across 36 different populations with high accuracy.
Additionally, we apply SNVstory to our in-house clinical dataset of 293 individuals, comparing self-reported ethnicity and race to the inferred genetic ancestry. Greater than 90% of individuals share agreement between genetic ancestry and self-reported ethnicity/race at the continental level. However, several cases exist where there is a discrepancy between the self-reported information and the inferred genetic ancestry, and 13 cases have no self-reported data. The ability to refine or add genetic ancestry information in these cases is helpful for added diagnostic precision in variant filtering/prioritization.
Presentation Overview: Show
Unintended effects of medications on diverse diseases have been identified many years after these drugs entered common use. This implies that the drugs unexpectedly influence disease pathways. Thus, discovering how the biological effects of drugs relate to disease biology can both provide insight into the biological basis for latent drug effects and help predict new effects. Rich data now comprehensively profile both the biological processes impacted by common drugs and the human phenotypes known to be affected by these drugs. At the same time, systematic phenome-wide genetic studies have linked each common phenotype with putative disease driver genes. Here, we develop a supervised method that integrates in vitro data on 429 drugs and gene associations of 151 common phenotypes to learn how these molecular signals can explain drug effects on disease. Our predictions of drug-phenotype relationships outperform a baseline predictive model. More importantly, by projecting each drug to the space of its influence on disease driver genes, we can propose the biological mechanism of unexpected effects of drugs on disease phenotypes. We present evidence that this model recovers known information about drug biology, supporting its potential to provide insights into the biology of unexpected effects of drugs on disease.
Presentation Overview: Show
Jointly analyzing biomedical data held by different institutions is challenging due to privacy concerns. This presents a significant obstacle to biomedical advances, since analyzing data from large and diverse cohorts is essential for novel discoveries. Recent collaborative GWAS studies have demonstrated how jointly analyzing multiple datasets can enable detection of an unprecedented number of genetic associations. However, existing federated analysis approaches suffer from reduced accuracy, limited supported analyses, and unresolved privacy concerns.
We present a novel collaborative analysis framework for analyzing large-scale biomedical data across multiple institutions, providing rigorous privacy guarantees. Our approach combines efficient federated computation strategies with state-of-the-art cryptography techniques, enabling biobank-scale studies across siloed datasets while guaranteeing privacy.
We showcase our framework on essential genomic analysis tasks, including GWAS, PCA, and identification of genetic relatives in the federated setting. Our framework uncovers insights that cannot be identified using each dataset alone. Moreover, our secure and federated analysis tools can be easily deployed via a user-friendly web server (sfkit.org), with the potential to accelerate biomedical research by unlocking new collaborative studies across biomedical data silos.
Presentation Overview: Show
Neoadjuvant chemo-hormonal therapy followed by cytoreductive radical prostatectomy is a novel therapeutic approach in patients with newly diagnosed high-risk primary multifocal prostate cancer with oligometastatic disease. To help guide treatment decisions, we collected clinical and imaging (PET/CT and PET/MRI) biomarkers from 30 patients and identified those that are predictive of disease progression. The following radiological biomarkers were collected before and after neoadjuvant treatment: MRI Likert score, PET Likert score, PET maximum standardized uptake value (SUVmax), MRI apparent diffusion coefficient (ADC), diffusion-weighted imaging (DWI), and dynamic contrast-enhanced (DCE) imaging. The following clinical biomarkers were collected: tumor volume from pre-CHT MRI, pre- and post-treatment prostate-specific antigen (PSA) levels, 3D prostate volume, PSA density (PSAd), and PSA change from pre- to post-treatment. Using univariate and multivariate Cox regression, we observe that post-treatment SUVmax is predictive of disease progression (p=1e-6), as is tumor volume (p=2e-6), pre- and post-treatment PSA levels (p=1e-4), and PSA change (p=4e-4). The combination of SUVmax and tumor volume is the most predictive combination of biomarkers (p=5e-8), and is more predictive than either biomarker individually. These results demonstrate the existence of imaging and clinical biomarkers that are predictive of disease progression in neoadjuvant chemo-hormonal therapy of multifocal prostate cancer.
Presentation Overview: Show
Background: Genomic alterations can significantly affect cellular biology; however, identifying which alterations have a system-wide impact is a challenging task. Cancer samples typically contain many alterations. Identifying those with an impact is essential for biomarker development. Here we present a novel computational tool, CIBRA, to detect genomic alterations with a system-wide impact.
Method: CIBRA integrates two omics data types to determine the system-wide impact: one indicates genomic alterations, and another defines the system-wide expression response, e.g. RNA-seq data. CIBRA can both identify the system-wide response of genomic alterations and evaluate the degree of similarity in expression responses between alterations.
Results: Applying CIBRA to a genome-wide screen of SNVs in several cancer types, we could identify the system-wide impact on expression for most known drivers, validating CIBRA’s ability to detect biologically relevant impact. Surprisingly, when applying CIBRA to structural variants, we found a similarly large proportion of genomic alterations with a system-wide impact, suggesting that the biological impact of structural variants has been largely underreported.
Relevance: CIBRA is an essential new tool to identify the impact of genomic alterations by combining multi-omics data types and can refine our current definitions of alterations, in order to derive more accurate biomarkers.
Presentation Overview: Show
Cell therapy biomanufacturing is a growing industry. Cell therapy products have the potential to provide life changing treatment for patients. Yet challenges remain. The challenges we face are specifically derived from the short shelf-life, small lot size and subsequent strict sterility requirements necessary during release testing. Our aim is to meet and solve the challenges of rapid, sensitive and unbiased detection of adventitious agents by combining third generation nanopore amplicon sequencing with background sequencing noise controls and machine learning to provide a decision on a clinical sample’s sterility status. Our focus is on developing a highly replicable sterility workflow that can be used in T-cell therapies post-leukapheresis, at-line during culture and at release testing. We generated samples incorporating T-cells spiked with bacterial and fungal species and used rDNA amplicons to identify contaminant species. Sequenced sample reads are processed, with host reads removed and potential contaminant species identified in an untargeted metagenomics approach. The analysed sequencing data are used to build a machine learning model to understand if a sample is contaminated. Using this approach, we are able to detect the USP<71> contaminant organisms at 10 CFU within 24 hours from low initial sample volumes.
Presentation Overview: Show
Myelodysplastic syndromes (MDS) are disorders characterised by abnormal development and function of blood-forming cells. This is often associated with chromosomal abnormalities, most frequently a deletion in chromosome 5 [del(5q)]. This deletion affects multiple genes, including casein kinase 1α (CSNK1A1), a part of the β-catenin destruction complex which has been identified as a tumour suppressor. CSNK1A1 is frequently mutated in del(5q) MDS, which is associated with poorer prognosis; however, the underlying mechanism remains unclear.
We investigated the contribution of altered CSNK1A1 expression to clonal expansion in del(5q) MDS, through parallel (phospho)proteomic and transcriptomic data analysis from cell lines with CSNK1A1 mutation or haploinsufficiency. Differential analysis and pathway enrichment techniques identified altered iron and lipid metabolism and protein translation as potential mechanisms of CSNK1A1 deregulation in disease progression. We are investigating the role of these factors in clonal selection, as their targeting might not only improve outcomes for patients with del(5q) MDS, but also other haematological malignancies.
This multiomic analysis has revealed strong candidates toward new therapeutic treatments for del(5q) MDS. We will continue to build upon this by incorporating single-cell data from patient cohorts and genome-scale network analysis.
Presentation Overview: Show
On average, it takes 12 years and costs over $2.7 billion to bring a new cancer drug to the market. A major challenge is assessing molecule safety and efficacy early in clinical trials. To tackle this, Digital Twins have been proposed to predict the trajectory of oncology patients, for example to simulate a control arm. Here, we explore whether a Digital Twin can be developed using generative deep learning models trained on real world data (RWD) from electronic health records (EHRs). This study used a nationwide Flatiron Health EHR-derived de-identified database. We first extracted and cleaned data from over 13500 multiple myeloma patients, resulting in 697 sparse longitudinal patient variables. In order to utilize large language models (LLMs) on tabular data, we procedurally transformed the input data into sentences. Then, we used pre-trained LLMs, including the open-source LongT5 model, to exploit the non-linear relations between patient variables. Finally, we ran experiments and compared LLMs against a majority classifier baseline. We encountered some open issues, such as handling long sequences and evaluating predicted trajectories, but found that overall, LLMs have high potential to model Digital Twins. In conclusion, we propose using LLMs as Digital Twins to help revolutionize clinical trials.
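The step of turning tabular patient data into sentences for a language model can be sketched as follows. The abstract does not give the actual template, so the wording, the variable names, and the choice to skip missing values are all assumptions made for illustration.

```python
from typing import Mapping, Optional, Union

Value = Optional[Union[int, float, str]]


def row_to_sentence(row: Mapping[str, Value]) -> str:
    """Serialize one sparse record as text, skipping variables with no value.

    The "<name> is <value>" template is hypothetical; any consistent
    template could be used for the same purpose.
    """
    parts = [f"{name} is {value}" for name, value in row.items() if value is not None]
    return "; ".join(parts) + "." if parts else ""


if __name__ == "__main__":
    # Invented example record; the variable names are not from the study.
    visit = {"age": 67, "hemoglobin": None, "line_of_therapy": 2}
    print(row_to_sentence(visit))
```

Skipping missing values keeps the text short, which matters for sparse longitudinal variables given the long-sequence issue the authors mention.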
Presentation Overview: Show
The integration of genomics into clinical practice holds transformative potential. By illuminating the unique interplay between each individual's genome and drug compounds, genomics empowers healthcare providers to tailor treatments to patient-specific needs, enhancing outcomes while minimizing adverse reactions.
We present a novel pharmacogenomics application for individualized drug sensitivity prediction in oncology. This tool utilizes preprocessed personal genome data and publicly accessible pharmacogenomics databases to generate patient-specific drug sensitivity reports. Unique to our application is the development of a confidence score for each prediction, integrating existing knowledge of specific genetic variations and compound interactions. The application streamlines a complex bioinformatics workflow, transforming preprocessed genomic data into pharmacogenomics insights accompanied by confidence scores. Distinct graphical user interfaces (GUIs) cater to oncologists, pharmacists, and patients. The oncologists' GUI provides comprehensive drug sensitivity predictions with confidence scores; the pharmacists' interface emphasizes drug interactions, dosage adjustments, and potential adverse reactions; while the patient GUI offers simplified treatment options and potential outcomes.
This application aims to pioneer a new era of personalized medicine, merging genomics and clinical practice, to foster a more accurate, efficient, and patient-centric healthcare paradigm. Future iterations will extend the tool's utility and accessibility to other medical domains.
Presentation Overview: Show
Emerging evidence indicates that tumor transcriptional heterogeneity and plasticity play an important role in drug response and resistance. We created a responsive web tool, iscTrack, to track tumor progression and predict novel therapeutic strategies to overcome drug resistance. We devised a multi-stage semi-supervised approach to analyze single-cell RNA sequencing (scRNA-seq) data to track transcriptional state shifts, expansions, and emerging and disappearing states in patient tumor samples over time and in response to therapy. We conducted simulations to evaluate its performance and benchmarked it against commonly used approaches. The simulation results showed that our approach successfully identifies and clearly interprets existing, newly emerged, or suppressed states. Furthermore, we applied the proposed method to analyze serial samples from melanoma patients to identify elevated biomarkers, enriched pathways and therapies targeting emerging resistant states. We experimentally validated the predicted drugs, PHA-767491, a dual CDC7/CDK9 kinase inhibitor, and OTSSP167, a leucine zipper kinase MELK inhibitor, using two pairs of parental and resistant melanoma cell lines, SK_Mel28 (-RR) and WM164 (-RR). We showed that iscTrack identified emerging resistant states in patients’ progressive samples and predicted effective therapies overcoming resistance in cell culture models.
Presentation Overview: Show
Multidimensional clinical or omics data is often used to describe the variability among patients. To better understand disease phenotypic heterogeneity, simultaneous analysis of these variables is essential.
Here we present two use cases for data dimensionality reduction with DDRTree, a reversed graph embedding algorithm, on multidimensional datasets. First, we used the continuous glucose monitoring metrics from a type 1 diabetes (T1D) cohort study (SFDT1) and applied DDRTree on six phenotypic variables normalised for age and sex. We then overlaid the resulting tree structure with the phenotype values to visualize their distribution. The clinical trajectory tree of 618 patients comprised seven distal ends of the tree branches. Five glycaemic phenotypes presented significantly different distributions in the branches. Second, we applied DDRTree on the methylation data of neoplastic precursor lesions of the biliary tract and invasive cholangiocarcinoma. The constructed tree inferred the lineage trajectory of tumorigenesis.
Conclusions: Applying DDRTree on multidimensional clinical data could help clinicians characterise the glycaemic phenotypic heterogeneity of patients with T1D, and use this representation as a personalised care solution to prevent complications. Its application to DNA methylation data can reveal distinct trajectories and infer altered developmental trajectories or different cell-of-origin in cancer.
Presentation Overview: Show
Chemotherapy resistance remains a significant challenge in high-grade serous ovarian cancer (HGSOC) treatment. To address this, we developed a framework utilizing scRNA-seq data to achieve two objectives: 1) identifying and characterizing resistant subpopulations of cells, and 2) predicting drugs targeting these resistant subpopulations.
To address our first objective, we use non-negative matrix factorization (NMF) to model the expression profile of each cell as a weighted sum over elementary gene signatures. We then assess each signature’s association with resistance using clinical variables and differences in the proportions of cells expressing the signature (pre- to post-chemotherapy). Additionally, external validation is performed in bulk RNA-seq data. For our second objective, we use the PRISM database to identify drugs that target each signature.
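The NMF step described above can be illustrated with a minimal sketch. It uses the classic multiplicative-update rule on a toy cells-by-genes matrix in plain Python; the update rule, the toy matrix, the rank and all function names are assumptions chosen for illustration, and the abstract does not state which NMF solver or settings were used.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative cells x genes matrix V into W (cells x rank,
    per-cell signature weights) and H (rank x genes, gene signatures)
    using multiplicative updates (an illustrative choice of solver)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy data (hypothetical): 4 cells x 4 genes with two block-like signatures.
expression = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 6], [0, 0, 6, 3]]
w0, h0 = nmf(expression, rank=2, n_iter=0)   # random initialisation only
w, h = nmf(expression, rank=2, n_iter=200)   # fitted factors
initial_error = frobenius_error(expression, w0, h0)
fitted_error = frobenius_error(expression, w, h)
```

Each row of `w` gives one cell's weights over the signatures, which is the quantity one would compare between pre- and post-chemotherapy cells.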
We applied our framework to a unique scRNA-seq dataset of HGSOC patients. The three candidate signatures that showed the strongest association with resistance are 1) a signature marked by FN1, 2) a signature of genes from the same genomic region (including LYNX1 and LYPD2) and 3) a KRAS- and immune-related signature.
Future work involves validating candidate signatures and predicted drugs in organoid and PDX models.
Our framework is generalizable and offers the potential to identify precision drugs for overcoming resistance in other cancer types.
Presentation Overview: Show
In spatial biology, cellular interaction rates are frequently assessed using distance-based metrics derived from images of sectioned tissue samples. To evaluate statistical significance, it is common practice to use Monte Carlo methods where the null distribution is computed by permuting cell labels across the whole region of interest. However, this approach destroys the underlying tissue architecture and rather corresponds to testing against a spatially homogenized cell suspension.
Here, we propose to constrain permutations to distinct tissue compartments to preserve the general architecture. To compare both randomization approaches, we assessed interaction patterns of predefined distributions in synthetic cell data based on real tissue images. Results indicate that the standard procedure strongly overestimates interaction rates within compartments while underestimating cross-compartment interactions. In contrast, constrained permutation yields a more realistic null model that prevents false positives.
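The difference between the two randomization schemes can be sketched in a few lines. The example below is a minimal illustration, assuming a simple radius-based interaction count and a one-sided Monte Carlo p-value; the coordinates, labels, compartment names, radius and function names are hypothetical and not taken from the study.

```python
import random
from collections import defaultdict


def permute_labels(labels, compartments=None, rng=None):
    """Shuffle cell labels. With compartments given, labels are only
    exchanged between cells of the same compartment (constrained null);
    without them, labels are shuffled across the whole region."""
    if rng is None:
        rng = random.Random()
    shuffled = list(labels)
    if compartments is None:
        rng.shuffle(shuffled)
        return shuffled
    groups = defaultdict(list)
    for idx, comp in enumerate(compartments):
        groups[comp].append(idx)
    for indices in groups.values():
        values = [labels[i] for i in indices]
        rng.shuffle(values)
        for i, value in zip(indices, values):
            shuffled[i] = value
    return shuffled


def count_interactions(coords, labels, type_a, type_b, radius):
    """Count ordered pairs (a, b) of cells of the given types within radius."""
    r2 = radius * radius
    count = 0
    for i, (xi, yi) in enumerate(coords):
        if labels[i] != type_a:
            continue
        for j, (xj, yj) in enumerate(coords):
            if i != j and labels[j] == type_b and (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                count += 1
    return count


def permutation_p_value(coords, labels, compartments, type_a, type_b,
                        radius, n_perm=200, seed=0):
    """One-sided Monte Carlo p-value for an enriched interaction count."""
    rng = random.Random(seed)
    observed = count_interactions(coords, labels, type_a, type_b, radius)
    hits = 0
    for _ in range(n_perm):
        perm = permute_labels(labels, compartments, rng)
        if count_interactions(coords, perm, type_a, type_b, radius) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy tissue: four stromal cells and two cells in a tumor nest.
coords = [(0, 0), (1, 0), (10, 0), (11, 0), (0, 1), (1, 1)]
labels = ["T cell", "T cell", "tumor", "tumor", "macrophage", "macrophage"]
compartments = ["stroma", "stroma", "tumor", "tumor", "stroma", "stroma"]
constrained = permute_labels(labels, compartments, random.Random(1))
p_value = permutation_p_value(coords, labels, compartments,
                              "T cell", "macrophage", radius=1.5)
```

Passing `compartments=None` to `permutation_p_value` gives the unconstrained whole-region null for comparison.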
We then assessed real interactions between cytotoxic T cells and either tumor cells or macrophages in over 1200 samples from six tumor types. Our observations are in line with results obtained from synthetic data and emphasize the need for careful definition of the null distribution in spatial Monte Carlo tests. Together, these findings demonstrate that constrained cell label permutations enable more accurate spatial characterization of tissue samples.
Presentation Overview: Show
Although surgery benefits patients with intrahepatic cholangiocarcinoma (iCCA), recurrence rates are high and overall survival is low. Accurate prognostic models that predict outcomes are lacking, but deep learning methods offer an opportunity to enhance prognosis prediction by leveraging multiple data sources and extracting distinguishing characteristics.
To investigate this, we analyzed 83 iCCA patients who underwent surgery and developed a multi-modal model to predict overall survival (OS) and progression-free survival (PFS). Histology slides were pre-processed into patches and fed into a ResNet feature extractor. Tumors were profiled for somatic genomic alterations using MSK-IMPACT, a deep targeted-sequencing assay. Cox proportional hazard models were designed for each modality (clinical data, histological slides, and altered genes), and their predictions were averaged to produce a final log risk score. The models were validated using a 5-fold, 5-repeat cross-validation.
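The late-fusion and evaluation steps can be sketched as follows: per-modality log risk scores are averaged per patient, and the fused score is evaluated with a concordance index. This is a minimal sketch assuming Harrell's definition of the concordance index; the scores, survival times and function names are made up for illustration and are not data or code from the study.

```python
def average_log_risk(*modality_scores):
    """Average per-patient log risk scores across modalities."""
    n = len(modality_scores[0])
    if any(len(scores) != n for scores in modality_scores):
        raise ValueError("all modalities must score the same patients")
    return [sum(vals) / len(vals) for vals in zip(*modality_scores)]


def concordance_index(times, events, risks):
    """Harrell's C: among comparable pairs, the share in which the patient
    with the shorter observed time has the higher predicted risk.
    A pair (i, j) is comparable if patient i had an event before time j."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical example: four patients, two modalities.
times = [5, 10, 15, 20]        # follow-up time
events = [1, 1, 0, 1]          # 1 = event observed, 0 = censored
clinical = [2.0, 1.0, 0.5, 0.0]
histology = [1.0, 1.5, 0.25, -0.5]
fused = average_log_risk(clinical, histology)
c_index = concordance_index(times, events, fused)
reversed_c = concordance_index(times, events, [-r for r in fused])
```

A value of 0.5 corresponds to random ranking, so the reported indices of 0.72 to 0.80 can be read on that scale.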
The clinical and histological data model achieved a concordance index of 0.74 for OS and 0.73 for PFS. Adding genetic alteration information improved OS performance (concordance index: 0.80) and showed similar PFS performance (concordance index: 0.72). Both models outperformed a staging-based patient stratification approach.
This study demonstrates that machine learning models can improve survival prediction using multi-modal data after resection of intrahepatic cholangiocarcinoma.
Presentation Overview: Show
The aim of the study was to estimate the minimal depth of coverage required for sensitive exon-level copy number variant (CNV) detection in targeted next-generation sequencing (NGS) panels.
NGS was performed on an Illumina MiniSeq sequencer using custom hybrid capture NGS panels targeting genes related to hereditary endocrine tumors, Lynch syndrome and hereditary breast cancer. CNV detection was performed using ExomeDepth on 18 sequencing runs (34 CNV-positive samples, 396 samples in total).
We performed two simulations in which we downsampled BAM files to obtain 95%, 90% … 5% of the initial depth of coverage. In the first simulation, all BAM files in each run were downsampled. A depth of coverage of 100x resulted in 88% CNV detection sensitivity, 150x resulted in 97%, and 173x resulted in 100% sensitivity. In the second simulation, only the BAM files from positive samples were downsampled. A depth of coverage of 100x resulted in 91% CNV detection sensitivity, 150x resulted in 94%, 200x resulted in 97% and 346x resulted in 100% sensitivity. The highest depth of coverage was required for short CNVs (e.g. deletion of exon 22 in BRCA1). The lowest depth of coverage was required for long CNVs (e.g. deletion of the whole MEN1 gene).
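The downsampling grid and the way the reported figures can be read are sketched below. The helper names and the 0.95 target are illustrative assumptions; the sensitivity values are the ones quoted in the abstract, and the abstract does not say which tool performed the downsampling.

```python
def downsampling_fractions(start=95, stop=5, step=5):
    """Fractions of the initial depth kept at each step: 0.95, 0.90, ..., 0.05."""
    return [pct / 100 for pct in range(start, stop - 1, -step)]


def expected_depths(initial_depth, fractions):
    """Expected mean depth of coverage after keeping each fraction of reads."""
    return [initial_depth * f for f in fractions]


def minimal_depth_for(sensitivity_by_depth, target):
    """Smallest reported depth whose sensitivity meets the target, else None."""
    qualifying = [d for d, s in sensitivity_by_depth.items() if s >= target]
    return min(qualifying) if qualifying else None


fractions = downsampling_fractions()

# Sensitivities quoted in the abstract (depth in x -> sensitivity).
first_simulation = {100: 0.88, 150: 0.97, 173: 1.00}
second_simulation = {100: 0.91, 150: 0.94, 200: 0.97, 346: 1.00}

# Hypothetical target of 95% sensitivity.
depth_all_downsampled = minimal_depth_for(first_simulation, 0.95)
depth_positives_only = minimal_depth_for(second_simulation, 0.95)
```

With this illustrative 95% target, the two simulations point to different depths (150x and 200x), which shows how the choice of which samples are downsampled affects the estimate.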
Presentation Overview: Show
Surgery followed by adjuvant chemotherapy (ACT) is the standard of care in stage III colon cancer. However, only 15-20% of patients benefit from ACT. Therefore, there is a need for prognostic biomarkers to better stratify this group of patients and avoid futile treatment. Circulating tumor DNA (ctDNA)-guided minimal residual disease (MRD) assessment after resection of the primary tumor has shown prognostic value in colon cancer, but studies with large well-defined patient cohorts are needed to demonstrate clinical utility and determine optimal usage. To determine the prognostic value of ctDNA in stage III colon cancer patients treated with ACT, 238 patients were included in the prospective observational study “PROVENC3” (PROgnostic Value of Early Notification by Ctdna in Colon Cancer stage 3). Blood was collected pre-surgery, post-surgery, post-ACT and every six months up to three years. Tumor-informed detection of ctDNA was performed through integrated whole genome sequencing analyses of formalin-fixed paraffin-embedded tumor tissue DNA (80x), germline DNA (40x), and plasma cell-free DNA (30x). Analyses are currently ongoing for pre-surgery, post-surgery and post-ACT samples. Preliminary results demonstrated a ctDNA detection rate of 93.4% pre-surgery and 17.1% post-surgery, which was associated with disease recurrence.
Presentation Overview: Show
Background: CAR-T cells have become a promising option to treat hematological malignancies, such as Multiple Myeloma or Acute Lymphoid Leukemia. However, despite their initial success, they eventually fail. Advances in the field of single-cell sequencing have allowed progress in understanding the genomic landscape of CAR-T cells. Despite this, there is still a lack of studies that focus on critical questions regarding the molecular mechanisms governing CAR-T cell function, such as the mechanisms underlying gene regulatory network (GRN) changes in unresponsive patients.
Results: We have created the first functional single-(meta)cell atlas of CAR-Ts in hematological malignancies from public datasets. We have integrated data coming from 11 studies yielding around 220K cells. Then, we used SeaCells to calculate the metacells, allowing us to maximize cellular resolution while maintaining data statistical power. Finally, we used SimiC to obtain the GRNs underlying CAR-T cell regulatory dynamics adding functional information to the atlas. Thanks to this pan-study integration we were able to identify cellular populations that were not considered individually but become relevant when leveraging all datasets.
Conclusion: The developed single-cell atlas will enhance our understanding of CAR-T cells and enable us to create improved therapies for our patients.
Presentation Overview: Show
Interstitial cystitis (IC) is a chronic condition that causes pain, pressure, and discomfort in the bladder and pelvic area. However, its underlying pathogenesis is not clearly understood, and genetic studies can provide valuable insights into the biological mechanisms involved. To investigate the molecular mechanisms of IC, we conducted a transcriptome analysis on bladder tissue samples of the mucosal and serosal/muscular layers obtained from 10 patients. To broaden our understanding of IC, transcriptomic profiling was performed on non-ulcerated lesions (known as “non-Hunner”) as well as on ulcerated lesions (known as “Hunner”). In the mucosal layer, we observed activation of the immune system and fibrosis in both Hunner and non-Hunner lesions compared to normal tissue. Specifically, genes related to anti-inflammation and immune suppression were deficient in Hunner lesions compared to non-Hunner lesions. In the muscular layer, transcriptomic evidence of muscle thickness was observed in both Hunner and non-Hunner lesions; however, the difference between the two was not significant. Our study provides a foundation for future research aimed at better understanding the complex nature of interstitial cystitis and identifying new therapeutic targets.
Presentation Overview: Show
The molecular causes and mechanisms of neurodegenerative disease development are largely unknown or incompletely understood. Still, the rapidly growing body of targeted single-cell literature suggests a broad implication of neural, glial, and immune cell subtypes in either one or multiple central nervous system disorders. A current challenge in the field is to integrate all these single-cell pieces of evidence into a greater picture of reproducible findings. Here, we present a large-scale single-cell RNA-seq brain atlas harmonizing and integrating the data of 36 studies and a total of 4.4 million mouse and human cells. Our atlas contains samples of various neurodegenerative diseases such as Alzheimer’s disease, Parkinson’s disease, and multiple sclerosis. The dataset comprises samples from 490 human and 341 mouse donors across 18 different brain regions. We used scVI, an auto-encoder model, for integrating the single-cell RNA-seq collection. The curated and harmonized data, accessible via an interactive web server, provides cell-type and disease-specific markers. We show that each sample's brain region of origin is a major driver of the biological heterogeneity observed at the transcriptional level. This resource aids experts in the field in generating data-driven hypotheses and validating new single-cell findings in neurodegenerative diseases.
Presentation Overview: Show
Three years into the SARS-CoV-2 pandemic, a mixture of worldwide surveillance, vaccination campaigns and improved clinical protocols has enabled Coronavirus disease to become a treatable disease. However, for fragile, immuno-depressed patients, SARS-CoV-2 infection still represents an unchallenged threat, with a normalized mortality of up to 60% of infected individuals (Nadkarni, Vijayakumaran, Gupta, & Divatia, 2021). Furthermore, the infection evolution in ultra-long positive patients may ease the rise of novel Variants of Concern due to massive replication, poor immune surveillance and viral recombination. During the core of the pandemic, the IFO acted as one of the virology hubs for outpatient viral surveillance while admitting and treating oncological patients in its National Cancer Centre (Donzelli et al., 2022). This peculiar situation enabled us to mine the wealth of data produced during that period, performing a Computational Real-World Genomics Data Analysis. Specifically, we present results on (1) in-patient and out-patient outbreak cluster distribution, with different epidemiological parameters during the different lineage-associated pandemic waves; (2) a novel genomics-driven biomarker defined as Intra-Host-Diversity-Index (IHDI), found to be slightly higher in cancer patients; and (3) an immuno-informatics dissection of the putative low-fraction viral antigens during long-term infections.
Presentation Overview: Show
With a growing worldwide shortage of pathologists and an increasing demand for subspecialized pathological diagnostics, the need for automated tools to assist pathologists has grown. The accurate diagnosis of the most frequent CNS tumor group, constituted by various glioma subtypes, plays a crucial role in selecting treatment strategies to improve patient survival rates. We present here an online visualization tool and pre-trained models that leverage deep learning algorithms based on H&E-stained whole slide images (WSIs) of brain tumor and normal CNS tissue samples. With our web-based application, pathologists can be supported in the diagnostic process for adult-type diffuse gliomas by an accurate AI-based classification and precise segmentation of tumor tissue and remote normal brain areas. We used deep-learning models pretrained on a large-scale natural image dataset (out-of-domain) and multiple mid-scale histopathological image datasets (in-domain), and fine-tuned them with an in-house dataset consisting of 74 WSIs obtained from 28 glioma patients. This dataset included five tissue classes: astrocytoma, glioblastoma, oligodendroglioma, necrosis and normal tissue. The performance of in-domain transfer learning was higher than that of out-of-domain transfer learning. We proposed an optimal model and training strategy that showed a balanced accuracy of 96.9%. The tool is available online at https://bioinfo.lih.lu/deephisto .
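For readers unfamiliar with the reported metric, balanced accuracy is commonly defined as the mean of the per-class recalls, which keeps a frequent class from dominating the score. The sketch below assumes that definition; the labels are invented toy data, and the abstract does not describe how the metric was computed in the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, label in enumerate(y_true) if label == cls]
        correct = sum(1 for i in indices if y_pred[i] == cls)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Plain accuracy, for comparison."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced example: 8 normal patches, 2 necrosis patches.
y_true = ["normal"] * 8 + ["necrosis"] * 2
y_pred = ["normal"] * 8 + ["necrosis", "normal"]
plain = accuracy(y_true, y_pred)            # 9 of 10 patches correct
balanced = balanced_accuracy(y_true, y_pred)  # mean of recalls 1.0 and 0.5
```

In this toy case the plain accuracy is 0.9 while the balanced accuracy is 0.75, because half of the rare class is missed.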
Presentation Overview: Show
Systemic Autoinflammatory Diseases (SAIDs) result from the dysregulation of the innate immune system. On average, the time between disease onset and the correct diagnosis and treatment is seven years. During this time, a SAID patient can receive up to 4-5 misdiagnoses without getting the correct treatment for their ailment, reducing their quality of life. Furthermore, around 40-60% of SAID patients will not get a specific diagnosis. The problem of correctly diagnosing SAIDs is the result of their strikingly similar clinical manifestations and a lack of known biomarkers that differentiate them. In this study, we aim to take advantage of the extensive Danish health registries to shed light on the clinical features that may help to streamline the diagnosis. For this study, we analysed the disease trajectories of 339,602 patients from a comprehensive list of 47 SAIDs present in the registry. With this, we expect to find key clinical differences between and within SAIDs that aid in the classification, stratification, and diagnosis of these diseases.
Presentation Overview: Show
Pancreatic cancer is one of the deadliest cancers, with a very low survival rate. The disease is challenging because of the lack of early symptoms and because small lesions are difficult to detect on imaging. Moreover, the symptoms are so broad that when pancreatic cancer is suspected but no lesions or abnormalities are detected, other diseases will be investigated instead. Deep learning has already shown promising results for medical imaging analysis and can be used to develop better imaging techniques such as lesion detection. In this study, we explore the potential of deep learning techniques on images to help radiologists with lesion detection on computed tomography (CT). The uniqueness of the Danish registries gives our study a cohort of 2877 pancreatic cancer patients with available CT scans. A number of scans, made at least 3 months before the diagnosis, have been labeled by a radiologist specialised in pancreatic lesions, in order to train a classification model for lesions easily missed due to their size or location in the pancreas. In this poster, we present the preprocessing of medical images followed by the classification model, which classifies CT volumes as lesion, without lesion or suspicion.
Presentation Overview: Show
Background: Biosimulation models can support clinical diagnosis, therapy, and scientific investigations. However, determining a model’s quality and adherence to clinical standards and regulations remains difficult. Thus, their integration into clinical routines is still hindered.
Methods: We developed a framework based on Research Data Alliance indicators (https://doi.org/10.15497/rda00050) and following the IMI FAIRplus template to evaluate the FAIRness of biosimulation models encoded in COMBINE standards (https://doi.org/10.1515/jib-2021-0026). We developed a structured survey to collect further community requirements with regard to information sharing and transparency.
Results: A semi-automated FAIR evaluation tool (https://github.com/FAIR-CA-indicators) was implemented using examples of curated and non-curated biomodels. We developed recommendations for model FAIRification, and we present a community survey as a means to get structured feedback on the requirements for improved communication and transparency.
Conclusions: We believe that adherence to FAIR principles supports reproducibility and improves accessibility of biosimulation models - which are relevant indicators for high model quality. We encourage the conference participants to give us feedback on community needs regarding transparency and findability of information about existing models, tools, guidelines and training materials by answering the survey provided at https://www.soscisurvey.de/sysmed-portal/.