The SciFinder tool lets you search the titles, authors, and abstracts of talks and panels.



Results

July 16, 2024
8:40-10:00
Invited Presentation: Computational dissection of complex human disease
Confirmed Presenter: Andrey Rzhetsky
Track: CAMDA

Room: 520b
Format: In Person
Moderator(s): Joaquin Dopazo


Authors List:

  • Andrey Rzhetsky

Presentation Overview:

I will cover a collection of interrelated topics in dissecting the etiology of complex human diseases, as seen through the lens of large-scale medical data analysis. The individual studies I will cover focus on a mosaic of genetic, environmental, and gene-environment interaction factors. The studies relied on massive medical records from the US, Sweden, Denmark, and Japan, and on a battery of modeling approaches.

July 16, 2024
10:40-11:10
Invited Presentation: The Synthetic Clinical Health Records Challenge - Introduction
Confirmed Presenter: Joaquin Dopazo
Track: CAMDA

Room: 520b
Format: In Person

Authors List:

  • Joaquin Dopazo

Presentation Overview:

Although data protection is necessary to preserve patient confidentiality, privacy regulations are also an obstacle to biomedical research. An interesting alternative is the use of synthetic patients. However, conventional synthetic patients are of little use for discovery because they are built from known data distributions. Generative Adversarial Networks (GANs) and related developments have emerged as powerful tools for generating synthetic data that capture relationships among variables, even relationships that were previously unknown. GANs became popular for generating highly realistic synthetic images but have since been applied in many fields, including the generation of synthetic patients, with applications such as medGAN and others.
Since CAMDA 2023, two datasets of synthetic patients have been created for this challenge. Both datasets were generated from a real cohort retrieved from the Health Population Database (Base Poblacional de Salud, BPS) of the Andalusian Health System (Spain) using a Dual Adversarial AutoEncoder (DAAE) approach, and each contains data on about 1 million patients.
Two challenges are proposed for both datasets, although any other original analysis is also welcome:
1) Finding strong relationships among diabetes-associated pathologies that allow a pathology to be predicted before it is diagnosed. Well-known pathological consequences of diabetes that can be considered relevant endpoints to predict include: a) retinopathy, b) chronic kidney disease, c) ischemic heart disease, and d) amputations.
2) Predicting disease trajectories in patients with diabetes.

July 16, 2024
11:10-11:30
Predicting Diabetes Complications from Electronic Health Record Visits Using Machine Learning Algorithms
Confirmed Presenter: Daniel Voskergian, Computer Engineering Department, Al-Quds University
Track: CAMDA

Room: 520b
Format: Live Stream

Authors List:

  • Daniel Voskergian, Computer Engineering Department
  • Malik Yousef, Zefat Academic College
  • Burcu Bakir-Gungor, Department of Computer Engineering

Presentation Overview:

This study employed a novel approach to feature engineering, combining XGB feature selection with various supervised machine learning algorithms, including Random Forest, XGBoost, LogitBoost, AdaBoost, and Decision Tree, to develop predictive models for four complications of diabetes mellitus: retinopathy, chronic kidney disease, ischemic heart disease, and amputations. The models were built on synthetic electronic health records generated by dual-adversarial autoencoders, representing nearly 1 million synthetic patients in each of the two datasets used. These synthetic patients were derived from real cohorts of 979,308 and 984,414 individuals with diabetes, extracted from the Health Population Database (Base Poblacional de Salud, BPS) of the Andalusian Health System in Spain. The models considered variables such as age range and the chronic diseases recorded during patient visits since the onset of diabetes. The final models, tailored to each complication, achieved accuracies between 69% and 77% and AUCs between 0.77 and 0.84. XGBoost and Random Forest showed the best overall predictive performance, highlighting the effectiveness of our feature engineering and selection approach in improving model accuracy and robustness.

July 16, 2024
11:30-11:50
Cluster-based machine learning prediction of diabetes complications
Confirmed Presenter: Daniel Santana-Quinteros, Universidad Nacional Autónoma de México, Mexico
Track: CAMDA

Room: 520b
Format: Live Stream

Authors List:

  • Daniel Santana-Quinteros, Universidad Nacional Autónoma de México
  • Mario Rodriguez-Moran, Amphora Health
  • Joaquin Tripp, Amphora Health
  • Diana Colín-Ayala, Universidad Michoacana de San Nicolas de Hidalgo
  • María Arroyo-Perez, Universidad Michoacana de San Nicolas de Hidalgo
  • Andrés Anguiano-Peña, Universidad Michoacana de San Nicolas de Hidalgo
  • Pedro Salas, Universidad Michoacana de San Nicolas de Hidalgo
  • Juan González-Tapia, Universidad Michoacana de San Nicolas de Hidalgo
  • Roxana Villanueva-Calderón, Universidad Michoacana de San Nicolas de Hidalgo
  • Alejandro Solorio-Solorio, Universidad Michoacana de San Nicolas de Hidalgo
  • Liliana Solorio-Cázares, Universidad Michoacana de San Nicolas de Hidalgo
  • Axel Quiroz-Ávalos, Universidad Michoacana de San Nicolas de Hidalgo
  • Ana Escalera-Doming

Presentation Overview:

Background: Type 2 Diabetes Mellitus (T2D) is a prevalent metabolic disorder characterized by hyperglycemia due to defects in insulin secretion or action. T2D often leads to severe complications, including cardiovascular diseases, nephropathy, retinopathy, and neuropathy. This study explores the feasibility of employing cluster-based machine learning techniques to predict diabetes complications.
Methods: We used synthetic patient data provided by the CAMDA 2023 challenge (Spain) and real-world data from the DiabetIA database (Mexico), analyzing records of 997,657 patients in total. Data transformation involved converting records from JSON to tabular form, cleaning sex information, and propagating chronic conditions across subsequent visits. Machine learning models, including Support Vector Machines (SVM) and Neural Networks (NN), were trained on stratified datasets to predict the onset of diabetic complications. UMAP dimensionality reduction and BIRCH clustering were used to group patients by comorbidities.
Results: The cluster-based machine learning models showed potential for classifying diabetic complications. The models identified distinct clusters of patients with similar comorbidities and disease trajectories. For next-year prediction, classification achieved an area under the curve (AUC) of 0.59 for NN and 0.56 for SVM.
Discussion: Cluster analysis can effectively enhance the understanding of T2D by revealing the interplay between various comorbidities and their impact on disease progression. The integration of advanced predictive models within a precision medicine framework promises more personalized and proactive healthcare interventions, ultimately improving patient outcomes and optimizing healthcare resources.
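The Methods step of converting JSON records to tabular form while propagating chronic conditions across subsequent visits can be illustrated with a minimal Python sketch. It assumes a hypothetical JSON layout that maps each patient ID to a chronologically ordered list of visits (each visit a list of condition codes) and a hypothetical set CHRONIC of codes treated as chronic; neither the schema nor the code set comes from the CAMDA or DiabetIA data, and this is not the authors' pipeline.

```python
import json
from typing import Dict, List, Set, Tuple

# Hypothetical set of condition codes treated as chronic (illustrative only).
CHRONIC: Set[str] = {"hypertension", "ckd", "retinopathy"}


def propagate_chronic(visits: List[Set[str]], chronic: Set[str] = CHRONIC) -> List[Set[str]]:
    """Once a chronic condition appears in a visit, carry it into every later visit."""
    carried: Set[str] = set()
    result: List[Set[str]] = []
    for codes in visits:
        carried |= codes & chronic      # remember chronic codes seen so far
        result.append(codes | carried)  # this visit's codes plus all carried chronic codes
    return result


def json_to_rows(raw: str, vocab: List[str]) -> List[Tuple]:
    """Flatten hypothetical JSON {patient_id: [[codes], ...]} into one binary row per visit."""
    data: Dict[str, List[List[str]]] = json.loads(raw)
    rows: List[Tuple] = []
    for pid in sorted(data):
        visits = propagate_chronic([set(v) for v in data[pid]])
        for idx, codes in enumerate(visits):
            # Row layout: (patient_id, visit_index, one 0/1 column per vocabulary code)
            rows.append((pid, idx, *(int(c in codes) for c in vocab)))
    return rows
```

For example, json_to_rows('{"p1": [["hypertension"], ["flu"], []]}', ["hypertension", "flu"]) returns [("p1", 0, 1, 0), ("p1", 1, 1, 1), ("p1", 2, 1, 0)]: hypertension persists in every later visit, while the acute "flu" code is not carried forward.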

July 16, 2024
11:50-12:20
Statistical Measures for the Evaluation of Clustering Methods on Single Cell Data
Confirmed Presenter: Owen Visser, University of Florida, United States
Track: CAMDA

Room: 520b
Format: In Person

Authors List:

  • Owen Visser, University of Florida
  • Somnath Datta, University of Florida

Presentation Overview:

The growing efficiency of single-cell sequencing technology has given biologists an abundance of cells to identify and differentiate, often through clustering. Heuristic approaches to choosing a clustering method have become more prevalent and could lead to inaccurate reports if statistical evaluation of the resulting clusters is omitted. When microarray data first emerged, a similar dilemma was addressed in the literature by providing supervised and unsupervised measures, which were evaluated through rank aggregation. In this paper, these measures are adapted to the single-cell framework through a leave-one-out approach. Additionally, a scheme was created that uses the ranking of cluster sizes to assign importance in the aggregation of methods, resulting in a single table of methods ranked by cluster size. To demonstrate the ensemble of measures and the scheme, five benchmark single-cell datasets were clustered with various methods at appropriate cluster sizes. We show that, through rank aggregation and our importance scheme, our adapted measures select clustering methods that perform better at cluster sizes associated with true biological groups than those selected through traditional measures. For four of the five datasets, using internal measures alone, the rank aggregation scheme correctly identified the methods that performed best at cluster sizes matching the original biological groups. We plan to package this ensemble of measures in the hope of providing others with a tool to identify the best-performing clustering methods and associated cluster sizes for a variety of datasets.
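Rank aggregation combines several ranked lists of methods (for example, one list per validation measure) into one consensus ordering. As a simple stand-in, and not the authors' procedure, which the abstract does not detail, the Python sketch below uses a weighted Borda count; the optional per-list weights could loosely play the role of the cluster-size importance described above. The method names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def borda_aggregate(rankings: List[List[str]], weights: Optional[List[float]] = None) -> List[str]:
    """Aggregate ranked lists of methods (best first) with a weighted Borda count."""
    if weights is None:
        weights = [1.0] * len(rankings)
    scores: Dict[str, float] = defaultdict(float)
    for ranking, w in zip(rankings, weights):
        n = len(ranking)
        for pos, method in enumerate(ranking):
            scores[method] += w * (n - pos)  # best method gets n points, last gets 1
    # Sort by descending score, breaking ties alphabetically so the result is deterministic.
    return sorted(scores, key=lambda m: (-scores[m], m))
```

With three measures ranking [["kmeans", "leiden", "hclust"], ["leiden", "kmeans", "hclust"], ["leiden", "hclust", "kmeans"]], the unweighted consensus is ["leiden", "kmeans", "hclust"]; giving the first list weight 3 ties kmeans and leiden, and the alphabetical tie-break then puts kmeans first. Borda counting is chosen here only because it is short and transparent; optimization-based aggregation (e.g. cross-entropy search over orderings) is a common alternative.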

July 16, 2024
14:20-14:25
The Anti-Microbial Resistance Prediction Challenge - Introduction
Track: CAMDA

Room: 520b
Format: In Person

Authors List:

  • Paweł Łabaj

July 16, 2024
14:25-14:55
The Antimicrobial Resistance Prediction Challenge
Confirmed Presenter: Alper Yurtseven, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbruecken
Track: CAMDA

Room: 520b
Format: Live Stream

Authors List:

  • Alper Yurtseven, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS)
  • Guangyi Chen, HIPS
  • Olga V. Kalinina, HIPS

Presentation Overview:

Antimicrobial resistance (AMR) is an urgent threat to human health worldwide, as microbes have developed resistance to even the most advanced drugs. In this year's CAMDA challenge, we focused on using machine learning to predict the AMR status of 1,820 bacterial strains belonging to seven species (Campylobacter jejuni, Campylobacter coli, Escherichia coli, Klebsiella pneumoniae, Neisseria gonorrhoeae, Pseudomonas aeruginosa, and Salmonella enterica).

July 16, 2024
14:55-15:15
Machine learning models for AMR prediction
Confirmed Presenter: Jaime Salvador López Viveros, CCM UNAM, Mexico
Track: CAMDA

Room: 520b
Format: Live Stream

Authors List:

  • Adriana Haydeé Contreras Peruyero, CCM UNAM
  • Yesenia Villicaña Molina, CCM UNAM
  • Nelly Sélem Mojica, CCM UNAM
  • Francisco Santiago Nieto de la Rosa, CCM UNAM
  • Victor Muñiz Sánchez, CIMAT MTY
  • Lilia Leticia Ramírez Ramírez, CIMAT
  • Anton Pashkov, ENES Morelia UNAM
  • Mariel Guadalupe Gutiérrez Chaveste, CCM UNAM
  • Jaime Salvador López Viveros, CCM UNAM
  • Johanna Atenea Carreón Baltazar, CCM UNAM
  • Luis Raúl Figueroa Martínez, CCM UNAM
  • Ronald Cardenas Catota, CIMAT
  • Alejandro Sierra Conde, CIMAT
  • Fernando Fontove Herrera, C3
  • Diana Barcelo, CINVESTAV
  • Miguel Calderon León, CCM UNAM
  • Shaday Guerrero Flores, CCM UNAM
  • César Aguilar Martínez, Purdue University
  • Kotaro Hata, CIMAT

Presentation Overview:

Each year, the CAMDA (Critical Assessment of Massive Data Analysis) Community of Interest presents various challenges related to massive data analysis and life sciences data. One of this year's challenges addresses predicting antimicrobial resistance in isolated samples. We analyzed the data using approaches such as pangenome analysis and RGI (Resistance Gene Identifier) to obtain data frames with counts of similar genes in gene families and counts of AMR gene families. We then applied various machine learning (ML) models: some to predict resistance or susceptibility, and others to predict the amount of antibiotic needed to classify the sample. A wide variety of preprocessing and dimensionality reduction methods, together with supervised and unsupervised ML models, were used. The best F1 scores ranged from 0.76 to 0.96, with the top result obtained by logistic regression with L1 regularization.
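Logistic regression with L1 regularization tends to drive the weights of uninformative features to exactly zero, which may help with wide tables of gene-family counts. The dependency-free Python sketch below fits such a model by proximal gradient descent (ISTA: a gradient step on the log-loss followed by soft-thresholding). It illustrates the technique only; it is not the authors' implementation, whose solver and hyperparameters the abstract does not state, and the toy features in the example are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w: float, t: float) -> float:
    """Proximal operator of t * |w|: shrinks w toward zero by t, clipping at zero."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def fit_l1_logreg(X: List[List[float]], y: List[int], lam: float = 0.01,
                  lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit L1-penalized logistic regression by full-batch proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the mean log-loss, then soft-threshold the weights.
        # The intercept b is not penalized.
        w = [soft_threshold(wj - lr * g, lr * lam) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Predicted probability of the positive class (e.g. 'resistant')."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```

On a toy table where only the first feature separates the classes, for example X = [[1, 0], [1, 1], [0, 0], [0, 1]] with y = [1, 1, 0, 0], the fitted first weight is positive and the second stays near zero. In practice a library solver (such as scikit-learn's LogisticRegression with an L1 penalty) would be the usual choice.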

July 16, 2024
15:15-15:35
Proceedings Presentation: Biomarker identification by interpretable Maximum Mean Discrepancy
Confirmed Presenter: Dexiong Chen, Max Planck Institute of Biochemistry, Germany
Track: CAMDA

Room: 520b
Format: Live Stream

Authors List:

  • Michael Adamer, ETH Zürich
  • Sarah Brüningk, ETH Zürich
  • Dexiong Chen, Max Planck Institute of Biochemistry
  • Karsten Borgwardt, Max Planck Institute of Biochemistry

Presentation Overview:

Motivation: In many biomedical applications, we are confronted with paired groups of samples, such as treated vs. control. The aim is to detect discriminating features, i.e. biomarkers, based on high-dimensional (omics) data. This problem can be phrased more generally as a two-sample problem, which requires statistical significance testing to establish differences and interpretation to identify distinguishing features. The multivariate maximum mean discrepancy (MMD) test quantifies group-level differences, whereas statistically significantly associated features are usually found by univariate feature selection. Currently, few general-purpose methods simultaneously perform multivariate feature selection and two-sample testing.
Results: We introduce a sparse, interpretable, and optimised MMD test (SpInOpt-MMD) that enables two-sample testing and feature selection in the same experiment. SpInOpt-MMD is a versatile method, and we demonstrate its application to a variety of synthetic and real-world data types, including images, gene expression, and text data. SpInOpt-MMD is effective at identifying relevant features with small sample sizes and outperforms other feature selection methods, such as SHapley Additive exPlanations (SHAP) and univariate association analysis, in several experiments.
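To make the underlying two-sample statistic concrete, the Python sketch below computes the standard unbiased estimate of the squared MMD with a Gaussian (RBF) kernel and calibrates it with a permutation test. This is the plain multivariate MMD test, not SpInOpt-MMD: it has no sparsity, no learned feature weights, and no kernel optimisation. The kernel bandwidth gamma, the number of permutations, and the seed are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def rbf_kernel(x: Point, y: Point, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2_unbiased(X: List[Point], Y: List[Point], gamma: float = 1.0) -> float:
    """Unbiased estimate of MMD^2 between samples X and Y (each needs at least 2 points)."""
    m, n = len(X), len(Y)
    k_xx = sum(rbf_kernel(X[i], X[j], gamma)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(rbf_kernel(Y[i], Y[j], gamma)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(rbf_kernel(x, y, gamma) for x in X for y in Y) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def mmd_permutation_test(X: List[Point], Y: List[Point], gamma: float = 1.0,
                         n_perm: int = 200, seed: int = 0) -> float:
    """p-value: share of random group relabelings whose MMD^2 reaches the observed value."""
    rng = random.Random(seed)
    observed = mmd2_unbiased(X, Y, gamma)
    pooled = list(X) + list(Y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of samples to the two groups
        if mmd2_unbiased(pooled[:len(X)], pooled[len(X):], gamma) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one correction keeps the p-value valid
```

For two clearly shifted one-dimensional groups, such as X = [[0.1 * i] for i in range(10)] and Y = [[5 + 0.1 * i] for i in range(10)], the estimate is large and the permutation p-value is close to 1 / (n_perm + 1). Feature selection in the spirit of the abstract would additionally learn per-feature weights inside the kernel, which this sketch deliberately omits.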

July 16, 2024
15:35-15:45
CAMDA Trophy ceremony
Track: CAMDA

Room: 520b
Format: Live Stream

Authors List:

  • David Kreil

July 16, 2024
15:45-15:50
CAMDA summary and closing remarks
Track: CAMDA

Room: 520b
Format: Live Stream

Authors List:

  • Wenzhong Xiao