Posters - Schedules

Monday, July 24, between 18:00 CEST and 19:00 CEST
Tuesday, July 25, between 18:00 CEST and 19:00 CEST
Session A Poster Set-up and Dismantle
Session A Posters set up:
Monday, July 24, between 08:00 CEST and 08:45 CEST
Session A Posters dismantle:
Monday, July 24, at 19:00 CEST
Session B Poster Set-up and Dismantle
Session B Posters set up:
Tuesday, July 25, between 08:00 CEST and 08:45 CEST
Session B Posters dismantle:
Tuesday, July 25, at 19:00 CEST
Wednesday, July 26, between 18:00 CEST and 19:00 CEST
Session C Poster Set-up and Dismantle
Session C Posters set up:
Wednesday, July 26, between 08:00 CEST and 08:45 CEST
Session C Posters dismantle:
Wednesday, July 26, at 19:00 CEST
Virtual
A quantitative framework for investigating mechanisms of spatiotemporal regulation of actin microridge patterns
Track: MLCSB
  • Bhavna Rajasekaran, IISER Bhopal, India
  • Mahendra Sonawane, Tata Institute of Fundamental Research, Colaba, Mumbai 400005, India



Squamous epithelial cells across vertebrate species exhibit distinct actin-rich projections called microridges. Microridges emerge on the apical surface of epithelial (periderm) cells as F-actin puncta that associate with several actin-binding proteins, primarily the Arp2/3 complex. Over the course of embryonic development, the puncta spontaneously fuse and form labyrinth-like patterns that remain dynamic. Our deep learning microridge segmentation strategy achieved a pixel-level classification accuracy of ~95%. We performed detailed quantitative analysis of microridge patterns and analysed their biophysical and mechanical characteristics, including flexural rigidity and stress distribution within the actomyosin networks of microridges. We also investigated how transiently formed actin clusters influence pattern rearrangements over short length and time scales, which are important for pattern maintenance. Our framework allows large-scale analysis of microridges to unravel the underlying patterning mechanisms during epithelial development and their responses to chemical and genetic perturbations.

An Ensemble AI model for Prediction of Antiviral peptides and their mechanisms of action
Track: MLCSB
  • Fernanda M Abukawa, Laboratory of Applied Toxinology, Butantan Institute, Brazil
  • Thales AM Fernandes, Laboratory of Applied Toxinology, Brazil
  • Ana Claudia O Carreira, Federal University of ABC, Brazil
  • Milton Yutaka Nishiyama-Jr, Laboratory of Applied Toxinology, Butantan Institute, Brazil



To fight harmful viral epidemics, developing new antiviral drugs and vaccines is the best strategy. The drug discovery process can take many years and cost billions of dollars. In this regard, venomous organisms stand out as a potential source of antiviral peptides (AVPs). However, it is difficult to predict potential AVPs and establish their mechanisms of action. We propose an ensemble machine learning and deep learning model to predict and classify AVPs. It uses RNN-LSTM and Random Forest models that take physicochemical and primary-sequence features as input; an SVM model then classifies the predicted AVPs by mechanism of action (Membrane, Replication, and Assembly). Compared with other AVP predictors, our model achieved the best accuracy (96%) and AUC (98.7%), with classification precision of 91%, 87%, and 73% for the Membrane, Replication, and Assembly classes, respectively. In a case study of ten experimentally validated anti-arboviral peptides from venomous animals, our approach identified 100% of the AVPs and assigned the correct mechanism of action to 80%. The Latarcin 1 peptide was classified into the Replication class; structural evaluation by molecular docking showed that it binds the inhibitory pocket of the DENV-2 replication protein NS5 with higher affinity than the envelope protein (E).

Application of machine learning algorithms in the analysis of SERS spectra for rapid discrimination of Shigella spp. and Escherichia coli
Track: MLCSB
  • Wei Liu, Xuzhou Medical University, China
  • Jia-Wei Tang, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, China
  • Jing-Yi Mou, Xuzhou Medical University, China
  • Zheng-Kang Li, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, China
  • Xiang Wu, Xuzhou Medical University, China
  • Liang Wang, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, China



Shigella spp. and E. coli are closely related and share many characteristics. Evolutionarily, Shigella spp. are positioned within the phylogenetic tree of E. coli, so discriminating Shigella spp. from E. coli is very difficult. As a low-cost and non-invasive method, surface-enhanced Raman spectroscopy (SERS) is under intensive study for its diagnostic potential in bacterial pathogens and merits further investigation for bacterial discrimination. In this study, we focused on clinically isolated E. coli strains and four Shigella species (S. dysenteriae, S. boydii, S. flexneri, and S. sonnei). We generated SERS spectra for these isolates and identified characteristic peaks for Shigella spp. and E. coli, revealing unique molecular components in the two bacterial groups. A comparative analysis of machine learning algorithms showed that the Convolutional Neural Network (CNN) achieved the best performance and robustness in bacterial discrimination compared with the Random Forest (RF) and Support Vector Machine (SVM) algorithms. Taken together, this study confirmed that SERS paired with machine learning can discriminate Shigella spp. from E. coli with high accuracy, which supports its potential application to diarrheal disease prevention and control in clinical settings.

Bacterial Pathogen Virulence using ML - Defining and learning it
Track: MLCSB
  • Akshay Agarwal, IBM, United States
  • Kristen Beck, IBM, United States
  • L. Clifford McDonald, CDC, United States
  • Christopher Elkins, CDC, United States
  • Susannah McKay, CDC, United States
  • Alison Halpin, CDC, United States
  • Edward Seabolt, IBM, United States
  • Vandana Mukherjee, IBM, United States



Virulence is defined as the ability of microbes to infect and cause disease in a host, and it can involve acquired elements with variable representation across a given species. Our objective was to predict the virulence of the bacterium Klebsiella pneumoniae by applying machine learning (ML). Our approach involved (i) categorizing publicly available genomes using isolation sources as proxies for virulence, (ii) profiling defined molecular features for virulence prediction across genomes, and (iii) developing and iteratively assessing ML models to predict virulence. Using approximately 2,800 Klebsiella genomes, we developed a set of models that can predict high vs. low virulence with an F1 score of 0.84, and virulence vs. no virulence with an F1 score of 0.7. This application provides a foundation for applying ML to virulence prediction of Klebsiella genomes by defining an ML problem statement, and it showcases models that can be used for prediction and future rapid risk assessment of novel or emerging pathogen strains. Furthermore, we intend to discuss how these models may enable discovery of additional virulence factors, and how they might be enhanced both algorithmically and by reinterpreting the labels, which may have significant implications for developing metadata standards for public datasets.
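
For readers unfamiliar with the F1 score reported above, the following minimal Python sketch shows how it is computed from true and predicted labels. The label values and the toy data are hypothetical and only illustrate the metric; they are not taken from the study.

```python
def f1_score(true_labels, predicted_labels, positive="high"):
    """Return the F1 score for the class treated as 'positive'."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical virulence labels (illustrative only).
truth = ["high", "high", "low", "low", "high", "low"]
preds = ["high", "low", "low", "high", "high", "low"]
score = f1_score(truth, preds)
```

Because F1 balances precision and recall, it is often preferred over plain accuracy when the two classes are unevenly represented.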

Creating a Novel Deep Learning Pipeline to Generate and Screen Molecules for Hormone-Positive Breast Cancer Treatment
Track: MLCSB
  • Nishank Raisinghani, Dougherty Valley High School, United States
  • David DiStefano, Tufts University, United States



In this work, we design a novel architecture for generating candidate molecules to treat hormone-receptor-positive breast cancer. These molecules aim to inhibit the aromatase, CDK4, CDK6, PI3K, and mTOR proteins. To do this, we used an NLP-based variational autoencoder. Our model is trained on the open-source ZINC dataset because of its vast library of drug molecules. To generate our molecules, we compiled a test set of molecules that have been proven to bind to the target proteins listed above. To measure the initial viability of our generated molecules, we used RDKit's QED score, which provides insight into the drug-likeness of the generated data. Supplementary models predict other properties of the molecules, specifically solubility, synthetic accessibility, and toxicity, strengthening our screening process. We used the AutoDock Vina framework to predict the Gibbs free energy score between these molecules and the desired target proteins. The goal of our research is to develop a novel process for generating and screening hormone-positive breast cancer drug molecules that are feasible in practice. Since the drug discovery space is so large, neural networks are a valuable tool to help cut down the time it takes to find these molecules.

Deep learning matches human performance for the detailed assessment of lung injury features in histology images
Track: MLCSB
  • Salma Kazemi Rashed, Department of Experimental Medical Science, Faculty of Medicine, Lund University, Lund, Sweden
  • Iran Augusto Silva, Department of Experimental Medical Sciences, Faculty of Medicine, Lund University, Lund, Sweden
  • Darcy Wagner, Department of Experimental Medical Sciences, Faculty of Medicine, Lund University, Lund, Sweden
  • Sonja Aits, Department of Experimental Medical Science, Faculty of Medicine, Lund University, Lund, Sweden



Acute respiratory distress syndrome (ARDS) is a highly heterogeneous and life-threatening disease, with reported mortality rates ranging from 30% to 50% worldwide, that arises from a variety of infectious and non-infectious injuries. In animal models, because resources to mimic clinical observations such as the Berlin criteria are lacking, the severity of lung injury is typically assessed by examining the extent of histological tissue damage. Damage assessment uses semi-quantitative scoring systems for individual damage features in histological slides, e.g., haemorrhage, inflammatory cell influx, thickening of the alveolar wall, and others. Manual semi-quantitative scoring is time-consuming, expensive, and labor-intensive, and it can be prone to variability and inconsistency even among highly trained pathologists. To overcome these shortcomings, we applied deep convolutional neural networks with a transfer learning approach, a key tool in image analysis, to automate the scoring process. First, we trained a binary classifier that accurately separated healthy from damaged tissue. This was then refined into a 3-class classifier that distinguished between high, medium, and low damage. Finally, a series of regression models was trained to score seven features that reflect the extent of lung damage. The classification models achieved up to 84% accuracy, and the trained regression models showed a correlation of up to 94% with the ground truth obtained by aggregating multiple human scorers. This was comparable to the performance of highly experienced individual human scorers.
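
The comparison between model scores and an aggregated human ground truth can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the correlation measure or the aggregation rule, so Pearson correlation and a per-image mean are assumed here, and all scores are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical damage scores from three human scorers for five images.
scorer_scores = [
    [0, 1, 2, 3, 4],
    [0, 2, 2, 3, 4],
    [1, 1, 2, 4, 4],
]
# Assumed aggregation: the mean score per image across scorers.
ground_truth = [sum(column) / len(column) for column in zip(*scorer_scores)]

# Hypothetical regression-model outputs for the same five images.
model_scores = [0.2, 1.1, 2.3, 3.0, 4.1]
r = pearson(model_scores, ground_truth)
```

The same function can be applied to each individual scorer against the aggregate, which gives a human baseline to compare the model against.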

HealthNet: Machine Learning For Prediction of Cystic Fibrosis Severity
Track: MLCSB
  • Manasvi Pinnaka, Basis Independent Silicon Valley, United States
  • Eric Cheek, University of Michigan, United States
  • Brianna Chrisman, Stanford University, United States



Cystic fibrosis patients often develop lung infections because of the presence of thick and sticky mucus that fills their airways. This thick mucus prevents the lungs from filtering out certain dominant bacterial types, making patients highly susceptible to infections that range in severity from mild to life-threatening. These infections can cause great distress for patients: breathing becomes harder and the risk of death from respiratory failure increases. It is important to be able to track the progression or regression of cystic fibrosis to determine the best course of treatment. Thus, this project focuses on the use of an AI model to examine the microbiology of cystic fibrosis patients and predict the future condition or stage of lung function, as a way to guide doctors in their treatment plans. Because publicly available patient data are limited, we initially used all of the data for both training and testing of our machine learning algorithms and then tried a split of 50% training, 10% validation, and 40% testing. Our results show that relatively simple models (cubic polynomials) can predict FEV1 from statistically significant bacterial sequences with 98% accuracy when trained on sufficiently large samples.
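
The 50% / 10% / 40% split mentioned above could be produced along the following lines. This is a minimal sketch, not the authors' code: the function name, the fixed random seed, and the placeholder sample list are all assumptions made for illustration.

```python
import random


def split_dataset(samples, train_fraction=0.5, val_fraction=0.1, seed=0):
    """Shuffle samples and split them into train, validation, and test sets.

    Whatever remains after the train and validation portions (40% with the
    default fractions) becomes the test set.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_fraction)
    n_val = round(len(shuffled) * val_fraction)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Placeholder patient identifiers (hypothetical).
samples = list(range(20))
train_set, val_set, test_set = split_dataset(samples)
```

Keeping the test portion separate from the data used for fitting is what makes the reported accuracy an estimate of performance on unseen patients.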

Machine Learning for Aptamer Design
Track: MLCSB
  • Akriti Jain, TCS Research (Life sciences division), India
  • Rajgopal Srinivasan, TCS Research (Life sciences division), India



Short single-stranded oligonucleotide sequences known as aptamers can bind to specific targets. Systematic evolution of ligands by exponential enrichment (SELEX) is an in vitro selection method for identifying aptamers. In our study, we learn models from round 2 particle display data with neutrophil gelatinase-associated lipocalin (NGAL) as the target protein. While a previous method used deep neural networks to build such models, we attempt to learn comparable models using well-understood features, which can then reveal the features necessary for good binding. We begin by identifying 20 hexamer sequences that are over-represented in the better-binding sequences from the round 2 data, then use ViennaRNA to compute the unpaired probability and iFeatureOmega to compute additional sequence features. By combining a genetic algorithm with our machine-learned models, we were able to generate novel aptamer sequences with high binding affinity to the target molecule. This study shows that machine learning models trained on round 2 data, with appropriate feature selection, can identify high-binding aptamers from round 2 data.
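
One simple way to find over-represented hexamers is to compare k-mer frequencies between better-binding and weaker-binding sequences. The sketch below assumes a frequency ratio with a pseudocount as the enrichment measure; the abstract does not state which statistic was used, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(sequences, k=6):
    """Count every overlapping k-mer across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


def enriched_kmers(strong, weak, k=6, top_n=20, pseudocount=1.0):
    """Rank k-mers by their frequency ratio in strong vs. weak binders."""
    strong_counts = kmer_counts(strong, k)
    weak_counts = kmer_counts(weak, k)
    strong_total = sum(strong_counts.values())
    weak_total = sum(weak_counts.values())
    ratios = {}
    for kmer, count in strong_counts.items():
        # The pseudocount avoids division by zero for k-mers absent
        # from the weak-binding set.
        strong_freq = (count + pseudocount) / (strong_total + pseudocount)
        weak_freq = (weak_counts[kmer] + pseudocount) / (weak_total + pseudocount)
        ratios[kmer] = strong_freq / weak_freq
    return sorted(ratios, key=ratios.get, reverse=True)[:top_n]


# Toy sequences (illustrative only, not real aptamers).
strong_binders = ["ACGTACGTAC", "TTACGTACGG"]
weak_binders = ["GGGGGGGGGG", "ACGTTTTTTT"]
top_hexamers = enriched_kmers(strong_binders, weak_binders, top_n=1)
```

The presence or count of each selected hexamer can then serve as an interpretable input feature alongside structural features.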

MLcps: Machine Learning Cumulative Performance Score for Classification Problems
Track: MLCSB
  • Ali Hashemi Gheinani, Boston Children's Hospital, United States
  • Akshay Akshay, Functional Urology Research Group, Department for BioMedical Research DBMR, University of Bern, Switzerland
  • Masoud Abedi, Department of Medical Data Science, Leipzig University Medical Centre, 04107 Leipzig, Germany
  • Navid Shekarchizadeh, Department of Medical Data Science, Leipzig University Medical Centre, 04107 Leipzig, Germany
  • Fiona C. Burkhard, Department of Urology, Inselspital University Hospital, 3010 Bern, Switzerland
  • Mitali Katoch, Institute of Neuropathology, Universitätsklinikum Erlangen, Erlangen, Germany
  • Alex Bigger-Allen, Harvard Medical School, Department of Surgery, Boston, MA, United States
  • Rosalyn M. Adam, Harvard Medical School, Department of Surgery, Boston, MA, United States
  • Katia Monastyrskaya, Functional Urology Research Group, Department for BioMedical Research DBMR, University of Bern, Switzerland
  • Mustafa Besic, Functional Urology Research Group, Department for BioMedical Research DBMR, University of Bern, Switzerland



Numerous performance metrics have been developed for classification problems, and choosing the appropriate one is difficult because each represents a particular aspect of the model. Medical datasets are often unbalanced, and when the prevalence of a disease is low or patient samples are difficult to collect, deciding on an appropriate metric for evaluating an ML model becomes quite challenging. The most common approach to this problem is to measure multiple metrics and compare them to determine the best-performing ML model. However, comparing multiple metrics is tedious and prone to bias due to user preferences. Feature selection algorithms, which are powerful tools for biomarker discovery, also require a scoring metric for model evaluation. However, these algorithms can only use one evaluation metric at a time while searching for a subset of important features, and thus ultimately produce suboptimal results. Here, we propose a new metric called Machine Learning Cumulative Performance Score (MLcps), provided as a Python package, which combines multiple precomputed performance metrics into one metric that preserves the essence of all precomputed metrics for a given model. The results show that MLcps provides a comprehensive picture of overall model robustness.
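
To make the idea of folding several metrics into one number concrete, the sketch below places each metric on an axis of a radar chart and uses the normalised area of the resulting polygon as the combined score. This is an illustrative assumption, not necessarily the formula implemented in the MLcps package, and the function name and metric values are hypothetical.

```python
from math import pi, sin


def radar_area_score(metrics):
    """Combine metrics in [0, 1] into one score via radar-chart polygon area.

    The metrics are placed on equally spaced axes; the polygon area is
    divided by the area obtained when every metric equals 1, so the
    result also lies in [0, 1]. The score depends on the axis order.
    """
    values = list(metrics.values())
    n = len(values)
    if n < 3:
        raise ValueError("need at least three metrics to form a polygon")
    # Area of one triangular wedge between adjacent axes of unit length.
    wedge = 0.5 * sin(2 * pi / n)
    area = wedge * sum(values[i] * values[(i + 1) % n] for i in range(n))
    max_area = wedge * n
    return area / max_area


# Hypothetical precomputed metrics for one classifier.
model_metrics = {"precision": 0.5, "recall": 0.5, "f1": 0.5, "accuracy": 0.5}
combined = radar_area_score(model_metrics)
```

An area-based score penalises a model that is weak on any one metric more than a plain average would, since each metric multiplies its neighbours.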

Pipeline for automatic segmentation of epithelial cells in buccal swabs slides with custom CellPose-based model
Track: MLCSB
  • Oleksandr Skorobohatov, Institute of Molecular Biology and Genetics, Ukraine
  • Dmytro Horyslavets, Institute of Molecular Biology and Genetics, Ukraine
  • Yaroslav Ryndyk, Institute of Molecular Biology and Genetics, Ukraine
  • Pavlo Areshkov, Institute of Molecular Biology and Genetics, Ukraine
  • Olena Romantsova, Institute of Molecular Biology and Genetics, Ukraine
  • Charlotte Oliver, North West Anglia NHS Foundation Trust, United Kingdom
  • Anna Paterson, Cambridge University Hospitals NHS Foundation Trust, United Kingdom
  • Wei Cope, Cambridge University Hospitals NHS Foundation Trust, United Kingdom
  • Emily Joslin, Cambridge University Hospitals NHS Foundation Trust, United Kingdom
  • Emily O'Dea, Cambridge University Hospitals NHS Foundation Trust, United Kingdom
  • Jon Teague, Wellcome Sanger Institute, United Kingdom
  • Peter Clapham, Wellcome Sanger Institute, United Kingdom
  • Moritz Przybilla, Wellcome Sanger Institute, United Kingdom
  • Mykhailo Tukalo, Institute of Molecular Biology and Genetics, Ukraine
  • Inigo Martincorena, Wellcome Sanger Institute, United Kingdom
  • Alina Frolova, Institute of Molecular Biology and Genetics, Ukraine



Collecting buccal swabs is one of the two most common non-invasive oral sampling methods used for medical research. With the development of high-precision protocols such as nanorate sequencing (NanoSeq), a duplex sequencing protocol with error rates of less than five errors per billion base pairs in single DNA molecules, it is now possible to use buccal swabs to study somatic mutations. At the same time, microscopy image data from buccal swabs can be used during the quality control step to assess possible contamination by different cell types, which matters for epigenetic studies in general. As the amount of data may approach hundreds of high-resolution slides, the need for an automated cell segmentation and classification pipeline arises. Thus, we developed the Automated Buccal Swabs Cells Recognition (ABSCR) tool, an efficient and scalable Python library, in which we address the following problems: the need to reliably determine the count of epithelial cells in the sample; the need to scale up the procedure to much higher volumes, including automating pathology work such as manual detection of cells; and the lack of well-established and systematised capture, management, and storage of spatial data with the required metadata.

Recommendation system for scientific tools and workflows in Galaxy using Transformers
Track: MLCSB
  • Anup Kumar, University of Freiburg, Germany
  • Björn Grüning, University of Freiburg, Germany
  • Rolf Backofen, University of Freiburg, Germany



Galaxy is a web-based open-source platform for executing tools and workflows to perform scientific analyses. Researchers can use thousands of high-quality tools and workflows for their respective analyses in Galaxy. A tool recommender system shows a collection of tools, predicted by using a deep learning model, that can be used to enhance analysis.

In this work, a tool recommender system is upgraded by training a transformer-based neural network on sequences of tools extracted from workflows available on Galaxy Europe. Compared with the older tool recommender system on Galaxy Europe, which was based on a recurrent neural network (RNN), the transformer-based neural network converges faster, has a lower model usage time, and generalises better beyond the training workflows.

A more robust tool recommendation model with a significantly lower usage time than the RNN would assist researchers in creating scientifically valid workflows and in exploratory data analysis in Galaxy. Additionally, the ability to train faster than the RNN makes training on large datasets more scalable. To our knowledge, this is a novel use of transformers to create a recommendation engine for scientific workflows. Open-source scripts to create the recommendation model are available under the MIT licence at https://github.com/anuprulez/galaxy_tool_recommendation_transformers

regGPT: Integrating autoregressive DNA language models and supervised models to design cell type specific regulatory elements
Track: MLCSB
  • Avantika Lal, Genentech, United States
  • Tommaso Biancalani, Genentech, United States
  • Gokcen Eraslan, Genentech, United States



Designing cell-type-specific cis-regulatory DNA elements is a challenging task with numerous potential therapeutic applications, including gene and cell therapy. Autoregressive language models, such as Generative Pre-trained Transformer (GPT), are a class of expressive generative models that can produce realistic content in natural languages. In this study, we employed autoregressive language models, trained on human regulatory elements, in conjunction with a series of supervised models, trained on various genomic assays, to design realistic and cell-type-specific regulatory elements in an interpretable manner. Our approach allows the design of enhancers with desired properties, such as high activity in some cell types and no activity in others. We show that the synthetic enhancers designed by our method mimic the sequence and motif composition of cell-type-specific genomic enhancers. For example, in the synthetic enhancers predicted to be microglia-specific, we identified a binding site motif for the SPI-C transcription factor, a known regulator in immune cells. The overall framework thus facilitates the design of realistic sequences while providing valuable insights into the grammar of the cis-regulatory code.

Uncover spatially informed biological axes by identifying shared variations underlying single-cell spatial transcriptomics with stCCA
Track: MLCSB
  • Nanxi Guo, Biostatistics and Informatics PhD Program, Center for Health Artificial Intelligence, University of Colorado, United States
  • Juan Vargas, Center for Health Artificial Intelligence, MPH Biostatistics, University of Colorado, United States
  • Revanth Krishna, Center for Health Artificial Intelligence, MPH Biostatistics, University of Colorado, United States
  • Douglas Fritz, Center for Health Artificial Intelligence, Medical Scientist Training Program, University of Colorado, United States
  • Fan Zhang, Center for Health Artificial Intelligence, Division of Rheumatology, University of Colorado, United States



The recent advancement of spatial transcriptomics (ST) technologies has enabled characterization of gene expression patterns and spatial information, advancing our understanding of cell lineages and their organization within diseased tissues. Several ST data analysis approaches have been proposed, but effectively utilizing spatial information to unveil the variation it shares with the gene expression space remains a computational challenge. Here, we introduce stCCA, a multi-view representation learning method based on sparse canonical correlation analysis that jointly analyzes spatial information and gene expression in a scalable manner, followed by a data-driven statistical framework to measure the goodness of fit for biological axis-specific genes. We first demonstrated that our method is scalable and generates robust results that are insensitive to model parameters (e.g., the sparsity constraint). Then, we applied stCCA to ST data acquired by 10x Visium and Slide-seqV2 from human and mouse brain regions, revealing distinct biologically meaningful gradients and spatially informed cell type clusters. By benchmarking our results against ground-truth manual annotations, we demonstrated that stCCA achieves high clustering accuracy as measured by the adjusted Rand index (ARI). Hence, stCCA is a generalized method to identify both gradually differentiated cell lineages and spatially informed distinct clusters in complex tissues, providing key insights into disease pathogenesis.
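
The adjusted Rand index used for benchmarking compares two labelings of the same items while correcting for chance agreement. The following self-contained Python sketch implements the standard pair-counting formula; the example labels are hypothetical and unrelated to the study's data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # a single item is trivially clustered identically
    # Pairs of items that share a cluster in both labelings.
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(count, 2) for count in contingency.values())
    # Pairs that share a cluster within each labeling separately.
    sum_true = sum(comb(count, 2) for count in Counter(labels_true).values())
    sum_pred = sum(comb(count, 2) for count in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical manual annotations and predicted clusters for six spots.
manual = ["layer1", "layer1", "layer2", "layer2", "layer3", "layer3"]
predicted = [0, 0, 1, 1, 2, 2]
ari = adjusted_rand_index(manual, predicted)
```

The index equals 1 for identical partitions regardless of how clusters are named, is close to 0 for random assignments, and can be negative when agreement is worse than chance.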

Unlocking Insights from Spatial Transcriptomics with Large Language Models
Track: MLCSB
  • Chao Hui Huang, Pfizer Inc., United States
  • Keith Ching, Pfizer Inc., United States
  • Jadwiga Bienkowska, Pfizer Inc., United States



In situ RNA capturing presents a unique opportunity to connect transcriptomic data with spatial information, enabling the mapping of gene expression to corresponding anatomical structures. This allows researchers to gain a deeper understanding of transcriptional heterogeneity within the context of anatomical and pathophysiological information. As a result, interpreting spatially resolved data enables scientists to glean new insights into the molecular mechanisms underlying complex biological processes.

On the other hand, given the potential of artificial general intelligence, one of the most successful large language models (LLM), Vicuna, incorporates the latest advances in deep learning and natural language processing to enable efficient and accurate interpretation of the data. The model has been fine-tuned using scientific reports of specific topics, e.g., cell-cell interaction, gene ontologies, etc., obtained from PubMed, as the corpus, allowing it to identify patterns and relationships that may not be immediately apparent to human analysts.

In this work, we combined an LLM with advanced machine learning approaches for decoding spatial transcriptomics to provide comprehensive and detailed analyses. The results suggest that the development of LLMs represents a significant step forward in spatial transcriptomic data analysis and has the potential to greatly enhance our understanding of complex biological processes.

WSI-based prediction of TP53 mutations identifies aggressive disease phenotype in prostate cancer
Track: MLCSB
  • Marija Pizurica, IDLab, Ghent University, Technologiepark-Zwijnaarde 126, 9052 Ghent, Belgium
  • Maarten Larmuseau, IDLab, Ghent University, Technologiepark-Zwijnaarde 126, 9052 Ghent, Belgium
  • Kim Van der Eecken, Department of Pathology, Ghent University Hospital, Corneel Heymanslaan 10, 9000 Ghent, Belgium
  • Louise de Schaetzen van Brienen, IDLab, Ghent University, Technologiepark-Zwijnaarde 126, 9052 Ghent, Belgium
  • Francisco Carrillo-Perez, Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, 1265 Welch Rd, Stanford, CA 94305-547, USA
  • Simon Isphording, IDLab, Ghent University, Technologiepark-Zwijnaarde 126, 9052 Ghent, Belgium
  • Nikolaas Lumen, Department of Radiotherapy, Ghent University Hospital, De Pintelaan 185, 9000 Ghent, Belgium
  • Jo Van Dorpe, Department of Pathology, Ghent University Hospital, Corneel Heymanslaan 10, 9000 Ghent, Belgium
  • Piet Ost, Department of Radiotherapy, Ghent University Hospital, De Pintelaan 185, 9000 Ghent, Belgium
  • Sofie Verbeke, Department of Pathology, Ghent University Hospital, Corneel Heymanslaan 10, 9000 Ghent, Belgium
  • Olivier Gevaert, Department of Biomedical Data Science, Stanford University, School of Medicine, USA
  • Kathleen Marchal, IDLab, Ghent University, Technologiepark-Zwijnaarde 126, 9052 Ghent, Belgium



Disease management in prostate cancer (PCa) depends on correctly predicting whether patients will develop aggressive disease. Although molecular signatures such as TP53 mutations are associated with aggressive PCa, the application of such biomarkers is hampered by the heterogeneity of the primary prostate tumor. In our study (Pizurica et al., accepted), we exploit the potential of using whole-slide images (WSIs) of the primary prostate tumor as a proxy for profiling TP53 mutations. To this end, deep learning models were trained on publicly available WSIs. Our model has state-of-the-art performance and generalizes well to an independent validation cohort. In addition, the multifocal setup of the validation set allowed us to show that the model indicates reasonably well which lesions within the primary tumor have the highest prevalence of TP53 mutations. In-depth molecular analysis of the predictions shows that our model, trained on the TP53 mutation label, captures a cellular phenotype from WSIs that is representative of aggressive disease and that is triggered by, but not restricted to, TP53 mutations. This indicates that features derived from WSIs have the potential to become the next-generation, in silico biomarkers for PCa prognosis.

Pizurica, M. et al. WSI based prediction of TP53 mutations identifies aggressive disease phenotype in prostate cancer. Cancer Research, in press.

Self-learning systems in the design of GPCR-targeting drugs
Track: MLCSB
  • Kavita Joshi, University of Warsaw, Poland
  • Matthew Merski, University of Warsaw, Poland
  • Paulina Dragan, University of Warsaw, Poland
  • Dorota Latek, University of Warsaw, Poland



A growing number of known active compounds for GPCRs allows the accuracy of machine learning (ML)-assisted virtual screening to be improved. Successful training of ML models requires large and well-described datasets of active compounds [1]. In addition, discrepancies, duplicates, or a lack of detailed ligand assignment (e.g., allo/ortho) degrade performance by increasing noise [1]. Two different ML algorithms were tested against ligand datasets for class A and B GPCRs to show the impact of dataset composition on model accuracy [2]. The combined neural network and gradient boosting machine approach proved a useful aid to structure-based drug discovery for the CCR2 and CCR3 receptors. Previously, logistic regression (ROC curves) was used to select the best chemokine receptor model for discriminating between DUD-E decoys and actives [3]. Acknowledgments: National Science Centre in Poland (2020/39/B/NZ2/00584).

[1] M. Mizera, D. Latek. Ligand-receptor interactions and machine learning in GCGR and GLP-1R drug discovery. Int. J. Mol. Sci. 2021, 22(8), 4060.
[2] P. Dragan, M. Merski, S. Wisniewski, S.G. Sanmukh, D. Latek. Chemokine Receptors - Structure-Based Virtual Screening Assisted by Machine Learning. Pharmaceutics 2023, 15(2), 516.
[3] Latek D et al. Drug-induced diabetes type 2: In silico study involving class B GPCRs. PLoS ONE 2019 14(1): e0208892.