CAMDA

Schedule subject to change
All times listed are in CEST
Wednesday, July 26th
10:30-10:40
Welcome
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Joaquin Dopazo

  • David Kreil
10:40-11:40
Invited Presentation: How can large-scale genomics be used to manage antimicrobial resistance in non-clinical (‘One-Health’) settings?
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Joaquin Dopazo

  • Edward Feil, University of Bath, United Kingdom
11:40-12:00
The Anti-Microbial Resistance Prediction and Forensics Challenge - Introduction
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Joaquin Dopazo

  • Paweł Łabaj
12:00-12:20
Antimicrobial Resistance Prediction and Forensics
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Joaquin Dopazo

  • Amay Ajaykumar Agrawal, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS) / Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
  • Guangyi Chen, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS) / Helmholtz Centre for Infection Research (HZI), Saarland University, Saarbrücken, Germany
  • Olga V. Kalinina, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS) / Helmholtz Centre for Infection Research (HZI), Faculty of Medicine, Center for Bioinformatics, Saarland University, Saarbrücken, Germany


Presentation Overview:

Antimicrobial resistance (AMR) is an urgent threat to human health worldwide, as microbes have developed resistance to even the most advanced drugs. In this year’s CAMDA challenge, we focused on exploring the metagenomic surveillance data from a selection of 144 isolates from six US cities (Baltimore, Denver, Minneapolis, New York, Sacramento, and San Antonio) provided by the MetaSUB International Consortium. We found that the AMR marker genes identified from the metagenomic data could be used to distinguish different city origins. Given the query AMR markers, we successfully identified the city of origin as New York.

13:50-14:30
Geolocation of Antimicrobial Resistance Markers in Metagenomic Surveillance Data
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): David Kreil

  • Shlomo Geva, Queensland University of Technology, Australia
  • Colin Wendt-Thorne, Queensland University of Technology, Australia
  • Stephen Bent, CSIRO, Australia
  • Timothy Chappell, Queensland University of Technology, Australia
  • James Hogan, Queensland University of Technology, Australia
  • Dimitri Perrin, Queensland University of Technology, Australia


Presentation Overview:

The CAMDA2023 Anti-Microbial Resistance Prediction and Forensics Challenge features resistance profiles of clinical isolates as well as environmental metagenomic sequences. The goal is to identify resistance-conferring genes in read samples collected from urban transport locations, and from metadata (AMR markers) associated with isolates collected in a hospital. The specific goal is to predict the city where the hospital is located. In this paper, we describe KISS, a novel implementation of a tool for filtering reads against a reference database and predicting the location of the hospital. The abundance of reference sequences from the CARD database in the environmental metagenomic samples collected in various US cities is used to deduce the hospital location. A prototype implementation is compared to Bowtie2 and produces comparable results in less time.
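
As a rough illustration of filtering reads against a reference database, the sketch below indexes the k-mers of each reference sequence and counts the reads that share several k-mers with a reference. It is not the KISS implementation, whose internals the abstract does not describe; the function names, the toy sequences, and the `min_hits` threshold are all assumptions made for this example.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = {}
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    return index


def filter_reads(reads, index, k, min_hits=2):
    """Count, per reference, the reads sharing at least min_hits k-mers with it."""
    abundance = Counter()
    for read in reads:
        hits = Counter()
        for kmer in kmers(read, k):
            for name in index.get(kmer, ()):
                hits[name] += 1
        for name, n in hits.items():
            if n >= min_hits:
                abundance[name] += 1
    return abundance


# Toy data (hypothetical, not taken from CARD or MetaSUB).
references = {"geneA": "ACGTACGGTCA", "geneB": "TTGACCATGCA"}
reads = ["GTACGG", "GACCAT", "AAAAAA", "CGGTCA"]
index = build_index(references, k=4)
abundance = filter_reads(reads, index, k=4)
print(abundance)
```

On this toy input, two reads are assigned to geneA and one to geneB, while the read with no matching k-mer is discarded. Per-city abundance profiles built this way could then be compared with the query markers, although how KISS does that is not stated in the abstract.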

14:30-14:50
Detecting Bacteriophages Associated with Antimicrobial Resistance in the Presence of Confounding Factors
Room: Salle Roseraie 1/2
Format: Live-stream

Moderator(s): David Kreil

  • Shoumi Sarkar, University of Florida, United States
  • Samuel Anyaso-Samuel, University of Florida, United States
  • Somnath Datta, University of Florida, United States


Presentation Overview:

Antimicrobial resistance (AMR) poses a severe global threat, with several genetic and non-genetic factors aiding in its spread. While bacteriophages (phages) are implicated in the dissemination of antibiotic resistance genes (ARGs), their role remains uncertain. This study focuses on the significance of phages, alongside other environmental factors, in ARG spread. Variables are grouped into blocks, and multi-block partial least squares regression is employed to explore variable and block importance scores. Our study identifies phages as the most critical block of variables, while environmental factors known to contribute to ARG spread are among the globally important variables. This underscores the need for further investigation into the role of phages and suggests that other variables may act as confounders. We control for the effect of environmental factors and model the phage abundances on ARG abundances for several antibiotic classes. Lists of phage importance scores are generated, and a consensus list is obtained using weighted rank aggregation. Additionally, we aim to incorporate bacterial abundances as genetic confounders, and develop a measure to obtain variable significance for phages, similar to p-values. Clinical literature corroborates the top phages selected through our approach, validating our method as a viable alternative to laboratory detection methods.
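
To make the idea of combining several importance lists into a consensus concrete, here is a minimal sketch using a weighted Borda count. This is a simple stand-in chosen for illustration; the abstract does not say which weighted rank aggregation algorithm the authors used, and the phage names and weights below are hypothetical.

```python
def weighted_borda(rankings, weights):
    """Aggregate ranked lists into a consensus using a weighted Borda count.

    rankings: list of lists, each ordered from most to least important.
    weights:  one non-negative weight per ranking.
    Items missing from a list receive zero points from that list.
    """
    scores = {}
    for ranking, w in zip(rankings, weights):
        n = len(ranking)
        for position, item in enumerate(ranking):
            # The top item earns n points, the last item earns 1 point.
            scores[item] = scores.get(item, 0.0) + w * (n - position)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical importance lists, e.g. one per antibiotic class.
rankings = [
    ["phageA", "phageB", "phageC"],
    ["phageB", "phageA", "phageC"],
    ["phageB", "phageC", "phageA"],
]
weights = [2.0, 1.0, 1.0]
consensus = weighted_borda(rankings, weights)
print(consensus)
```

With these inputs, phageB scores 10, phageA scores 9, and phageC scores 5, so the consensus order is phageB, phageA, phageC even though the most heavily weighted list ranks phageA first.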

14:50-15:30
Antimicrobial Resistance in Diverse Urban Microbiomes: Uncovering Patterns and Predictive Markers
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): David Kreil

  • Rodolfo Toscan, Jagiellonian University - Małopolska Centre of Biotechnology, Poland
  • Wojciech Lesinski, University of Bialystok - Institute of Computer Science, Poland
  • Piotr Stomma, University of Bialystok - Institute of Computer Science, Poland
  • Agnieszka Golinska, University of Bialystok - Institute of Computer Science, Poland
  • Witold Rudnicki, University of Bialystok - Institute of Computer Science, Poland


Presentation Overview:

A comprehensive study was conducted using whole-genome shotgun sequencing technology to examine the microbiomes of urban environments across different cities worldwide. A subset of the collected samples underwent quality control procedures, and the resistome profile, which represents the collection of AMRs, was determined. Various annotation and statistical techniques were employed, including PCA, MCV, MDFS and the RF prediction algorithm. We also carried out clustering, which gave results consistent with the classification. The main objective was to identify AMRs that play a crucial role in characterizing the origins of the sampled urban microbiomes. This would aid in determining the geographic location of the isolates provided by CAMDA 2023. Although a significant number of AMRs were detected in the urban dataset, only a small subset corresponded to the AMRs associated with the provided isolates. This finding suggests that 1) the sequencing depth of the urban samples was insufficient, 2) the isolated species were not dominant in the urban dataset, or 3) the classification methods were limited by incomplete reference databases. Nonetheless, based on the analysis, a subset of cities (Auckland, New York and Tokyo) was identified as potential candidates for the origin of the isolates.

16:00-16:30
Invited Presentation: Data diversity in Antimicrobial Resistance (AMR)
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Paweł Łabaj

  • Derry Mercer


Presentation Overview:

Antimicrobial resistance (AMR) is among the top 10 global health threats according to the World Health Organization. AMR has been described as the ‘silent pandemic’ or the ‘overlooked pandemic’ in this post-COVID-19 world. It has been predicted that by 2050 the rise of AMR will put 10 million lives a year and a cumulative 100 trillion USD of economic output at risk, and that the death toll could be a staggering one person every three seconds.

There are many types and sources of data describing AMR. These range from prospective data forecasting a global catastrophe, through current and recent data from point prevalence surveys and other sources, to historical data regarding the origins of AMR. Reliable, accurate, and representative information on the incidence of resistant infections is vital for monitoring the national and global scale of AMR, identifying emerging threats, and evaluating the impact of interventions.

In this presentation, I will describe AMR and how this differs from other cases in which microbes do not respond to antimicrobials, some of the techniques used to detect AMR and some of the data sources available that describe the AMR problem.

16:30-17:10
Antimicrobial Resistance Prediction and Forensics CAMDA 2023
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Paweł Łabaj

  • Nelly Sélem Mojica, CCM, Mexico
  • Miguel Ángel Magaña Lemus, CCM, Mexico
  • Lilia Leticia Ramírez Ramírez, CIMAT, Mexico
  • Mirna Vázquez Rosas Landa, ICMyL, Mexico
  • Eugenio Balanzario, CCM, Mexico
  • Miguel Nakamura Savoy, CIMAT, Mexico
  • Victor Muñiz Sánchez, CIMAT, Mexico
  • Shaday Guerrero Flores, CCM, Mexico
  • Adriana Haydeé Contreras Peruyero, CCM, Mexico
  • Imanol Nuñez Morales, CIMAT, Mexico
  • Antón Pashkov, ENES, Mexico
  • Fernando Fontove Herrera, C3, Mexico
  • José María Ibarra Rodríguez, C3, Mexico
  • Daniel Santana Quinteros, AMPHORA, Mexico
  • Maribel Hernández Rosales, CINVESTAV, Mexico
  • Mario Enrique Carranza Barragán, CIMAT, Mexico
  • Miguel Raggi Pérez, ENES Unidad Morelia UNAM, Mexico
  • Jose Abel Lovaco-Flores, CINVESTAV, Mexico
  • Paula Camila Silva Gomez, Universidad Nacional Autónoma de México, Mexico
  • Andrés Arredondo Cruz, Universidad Nacional Autónoma de México, Mexico
  • José Miguel Calderón León, Universidad Nacional Autónoma de México, Mexico
  • Francisco Santiago Nieto de la Rosa, CCM, Mexico
  • Johan Eduardo Pérez Ramírez, UNAM, Mexico
  • Mariel Guadalupe Gutiérrez Chaveste, CCM, Mexico
  • Rafael Pérez Estrada, ENES, Mexico
  • Karina Enriquez Guillén, ENES, Mexico
  • Diana Barceló Antemate, IBT, Mexico
  • Francisco José Villalobos Salcido, UNAM, Mexico


Presentation Overview:

Taxonomic and antimicrobial resistance (AMR) patterns arise in different cities. Every year, the Community of Interest Critical Assessment of Massive Data Analysis (CAMDA) provides a challenge that helps scientists build capacity and good practices for working with extensive data. We explored microbiome data from 15 cities. Samples from 2016 and 2017 were supplied by MetaSUB, and the aim was to identify a mystery city given an AMR pattern. Here we address both 1) the forensic geolocalization challenge, i.e., predicting the city label of a test set given a training set, and 2) discovering the mystery city given the AMR profile. These are preliminary results; we will hold a hackathon from 2-7 June to work on the challenges.
Our work is divided into 1) antibiotic profiling, 2) preliminary data exploration, 3) classification algorithms, 4) variance reduction, and 5) hypothesis testing. Antibiotic profiling shows NYC as the city with the most antibiotic resistance mechanisms. We used logistic regression and neural networks for the classification problem, and we will expand our analyses by incorporating support vector machines, random forests, and other methods. For variance reduction, we used negative binomial regression to identify differentially abundant OTUs, and used the results to reduce both the number of OTUs and the sparsity of the dataset.

17:10-17:20
Exploratory analysis of antibiotic microbial resistance and its correlation with codon usage of microbes
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Paweł Łabaj

  • Alejandra Cervera, INMEGEN, Mexico
  • Antonio Neme, UNAM IIMAS, Mexico
  • Alfonso Toriz, PCIC UNAM, Mexico
  • Michelle Mata, PCIC UNAM, Mexico
  • Sergio Martínez, PCIC UNAM, Mexico


Presentation Overview:

Antimicrobial resistance (AMR) detection is of medical and social relevance. Several algorithms are able to detect sequences associated with organisms known to present some degree of AMR. In this contribution, we approach the problem from a different perspective. We aim to characterize the codon usage of all samples available from a city. Then, we link the codon usage of all samples in a city to the histogram of the most frequent resistance mechanisms and AMR gene families. We followed an exploratory data analysis path, and in this report, we briefly sketch the steps. As a preliminary result, we have found a discrete correlation between the codon usage and the relative frequency of the most common resistance mechanisms.
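
For readers unfamiliar with codon usage profiles, the following minimal sketch computes the relative frequency of each codon in a single in-frame coding sequence. It only illustrates the basic quantity; the authors' actual pipeline, including how they handle reading frames in metagenomic samples, is not described in the abstract, and the function name and example sequence are assumptions.

```python
from collections import Counter


def codon_usage(seq):
    """Relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not form a complete codon are ignored, as are
    codons containing characters other than A, C, G and T.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if set(c) <= set("ACGT"))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical 12-base sequence: ATG GCT GCT TAA
usage = codon_usage("ATGGCTGCTTAA")
print(usage)
```

Here GCT accounts for half of the four codons and ATG and TAA for a quarter each. Profiles like this, pooled over all samples from a city, are the kind of vector that could be compared with the frequencies of resistance mechanisms.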

17:20-17:30
Evaluating the Robustness and Reproducibility of RNA-Seq Quantification Tools
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Paweł Łabaj

  • Fangyun Liu, University of Southern California, United States
  • Brian Nadel, University of Southern California, United States
  • Pelin Icer Baykal, ETH Zurich, Switzerland
  • Serghei Mangul, University of Southern California, United States


Presentation Overview:

With the rapid development of bioinformatics tools in recent years, the significance of benchmarking the performance of these tools is increasing. To evaluate the reproducibility of transcriptome quantification tools, we present a novel approach that involves generating computational replicates. This enables us to assess the ability of RNA-Seq quantification tools to tolerate experimental variation and produce consistent results across these replicates. Our proposed approach provides a valuable tool for researchers seeking to evaluate the performance of transcriptome quantification tools and identify those that are most robust in the face of experimental variability. Our preliminary results suggest that our approach can effectively assess the ability of RNA-Seq quantification tools to produce consistent estimates of gene expression across replicates with different types of perturbation. We believe that our results will be valuable for the biomedical community by providing insights into the reproducibility of bioinformatics tools and guiding researchers in selecting the appropriate tool for their analysis.

17:30-17:40
A systematic assessment of the completeness of TCR databases across Mus musculus strains.
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Paweł Łabaj

  • Yu Ning Huang, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, United States
  • Yupeng He, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, United States
  • Serghei Mangul, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, United States


Presentation Overview:

Immunogenetics databases aid genetic research on disease and drug development. Lab mice (Mus musculus) are crucial for in-vivo research and constitute most non-human vertebrate data. The IMGT database's completeness in representing diverse mouse strains is unclear, leading to disparities in the representation of different strains in immunogenetics databases. We assessed the IMGT database's representation of four lab mouse strains (C57BL/6, C57BL/6J, BALB/c, and NOD) by analyzing 181 TCR-seq samples from the SRA using MiXCR software. MiXCR aligns TCR reads to the IMGT database and compares them to the reference reads. Results revealed that C57BL/6 mice are more representative of the V gene in the IMGT database (0.66 ± 8.71), while BALB/c mice are more representative of the J gene (0.08 ± 1.58). Our study is the first to comprehensively evaluate the completeness of the IMGT database for diverse mouse strains. It demonstrates that the database is severely incomplete for various mouse strains and provides compelling evidence for the urgent need to diversify the databases. We identify underrepresented mouse strains in the database and emphasize the importance of diverse immunogenetics databases for understanding the immune responses in different mouse strains.

17:40-17:50
Data Lakehouse to support the development of AI models for predicting patient clinical response to targeted and immuno-therapies
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Paweł Łabaj

  • Elodine Coquelet, CEA, France
  • Javier Alfaro, International Center for Cancer Vaccine Science, University of Gdansk, Poland
  • Fabio Massimo Zanzotto, University of Rome Tor Vergata, Italy
  • Catia Pesquita, LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal
  • Rohit Kumar, Fundacio Eurecat, Spain
  • Christophe Battail, CEA, France


Presentation Overview:

This study was motivated by the difficulty of identifying the patients who will respond well to anti-tumor treatments, in particular targeted and immuno-therapies. There is currently a lack of databases specifically dedicated to supporting the development of AI models that help doctors in this mission. The Data Lakehouse architecture, which we have implemented with open source Delta Lake technology, brings together the best features of Data Lake and Data Warehouse.

In the context of the European project KATY (GA: 101017453), we prototyped the development of a Data Lakehouse by integrating three research studies that generated molecular profiling data from cohorts of tumor tissues taken from patients included in drug clinical trials. We will present the challenges related to the creation of this data architecture as well as the ongoing developments on data governance and secure access.

This Data Lakehouse will allow three types of access for the KATY consortium: querying with data analytics approaches, targeted data extraction for the development of AI models and feeding a Knowledge Graph to structure experimental data and a priori biological and clinical knowledge.

17:50-18:00
Day 1 closing remarks
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Paweł Łabaj

  • Wenzhong Xiao
Thursday, July 27th
10:00-10:10
Opening
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Wenzhong Xiao

  • Paweł Łabaj
10:10-11:10
Invited Presentation: Predicting medical complications in intensive care units using machine learning
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Wenzhong Xiao

  • Karsten Borgwardt, Max Planck Institute of Biochemistry, Germany


Presentation Overview:

The ongoing digitalization of clinical health information creates new opportunities for building early warning systems for medical complications. In particular, the rich information that intensive care units record about critically ill patients can be used to develop machine learning-based early warning systems, for instance, for sepsis or for circulatory failure. In this talk, we will describe these opportunities for big data analysis in medicine, and the challenges in creating these predictive systems.

11:10-11:30
The Synthetic Clinical Health Records Challenge - Introduction
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Wenzhong Xiao

  • Joaquin Dopazo
11:30-11:50
Invited Presentation: Synthetic Clinical Health Records Challenge - the background analysis
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Wenzhong Xiao

  • Carlos Loucera, Andalusian Platform for Computational Medicine, Spain
  • Francisco M. Ortuño, Department of Computer Architecture and Computer Technology, University of Granada, 18011 Granada, Spain
13:20-14:00
CAMDA 2023 Challenge: Predictions of Pathology before Diagnosis from Electronic Health Record Visits
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Joaquin Dopazo

  • Daniel Voskergian, Computer Engineering Department, Al-Quds University, Palestine
  • Burcu Bakir, Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
  • Malik Yousef, Zefat Academic College, Israel


Presentation Overview:

Our research proposes a novel framework for sequence prediction in healthcare, specifically focusing on identifying and predicting endpoints associated with diabetes using Electronic Health Records (EHRs). By employing feature extraction and engineering techniques, we present a machine learning-based approach using Random Forest to predict pathology outcomes based on a patient's visit history. The identified endpoints encompass complications such as Retinopathy, Chronic Kidney Disease, Ischemic Heart Disease, and Amputations. Our study demonstrates the conversion of the Random Forest model into a set of rules for accurate pathology predictions.

14:00-14:10
Future of Synthetic Clinical Health Records challenges
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Joaquin Dopazo

  • Wenzhong Xiao
14:10-14:20
Future of Anti-Microbial Resistance Prediction based challenges
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Joaquin Dopazo

  • Paweł Łabaj
14:20-15:00
Hypothyroidism Genetics: Functional Insights from Gene-Based Association Studies in Large Populations
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Joaquin Dopazo

  • Michal Linial, The Hebrew University of Jerusalem, Israel
  • Amos Stern, The Hebrew University of Jerusalem, Israel
  • Roei Zucker, The Hebrew University of Jerusalem, Israel
  • Michael Kovalerchik, The Hebrew University of Jerusalem, Israel


Presentation Overview:

Hypothyroidism is a common endocrine disorder. The disease is manifested when the thyroid gland fails to produce sufficient thyroid hormones. In many cases, the low levels of thyroid hormones are due to environmental factors. Congenital hypothyroidism (CH) results from thyroid development abnormalities. A large-scale study was conducted to identify functional genes associated with increased or decreased risk for hypothyroidism. The study used the UK-Biobank database, which reports on 13,687 cases of European ancestry with a prevalence of 7.5% and 2.0% for females and males, respectively. Using the gene-based proteome-wide association study (PWAS) method, we identified a ranked list of 77 statistically significant genes. We observed that many of the genes involved in hypothyroidism function in the recognition and response of immune cells, with a strong signature of autoimmunity. Expanding the analysis to additional genetic association protocols (e.g., TWAS, Open Targets Genetics, and coding-GWAS) revealed a complex etiology of hypothyroidism with genes explaining the CH developmental program, autoimmunity, and expression dysregulation in target tissues. No sex-dependent genetic effects were found. The study highlights the importance of applying complementary genome-based association methods to complex diseases. We conclude that an integration of established association methods can improve interpretability and clinical utility.

15:30-15:50
CAMDA Trophy ceremony
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Paweł Łabaj

  • David Kreil
15:50-16:10
Proceedings Presentation: PPAD: A deep learning architecture to predict progression of Alzheimer’s disease
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Paweł Łabaj

  • Mohammad Al Olaimat, University of North Texas, United States
  • Fahad Saeed, Florida International University, United States
  • Serdar Bozdag, University of North Texas, United States
  • Jared Martinez, University of North Texas, United States


Presentation Overview:

Alzheimer's disease (AD) is a neurodegenerative disease that affects millions of people worldwide. Mild cognitive impairment (MCI) is an intermediary stage between the cognitively normal (CN) state and AD. Not all people who have MCI convert to AD. The diagnosis of AD is made after significant symptoms of dementia, such as short-term memory loss, are already present. Since AD is currently an irreversible disease, diagnosis at the onset of disease brings a huge burden on patients, their caregivers, and the healthcare sector. Thus, there is a crucial need to develop methods for the early prediction of AD for patients who have MCI. Recurrent Neural Networks (RNNs) have been successfully used to handle Electronic Health Records (EHR) for predicting conversion from MCI to AD. However, RNNs ignore the irregular time intervals between successive events that commonly occur in EHR data. In this study, we propose two deep learning architectures based on RNNs, namely Predicting Progression of Alzheimer’s Disease (PPAD) and PPAD-Autoencoder (PPAD-AE). PPAD and PPAD-AE are designed for early prediction of conversion from MCI to AD at the next visit and multiple visits ahead, respectively. To minimize the effect of the irregular time intervals between visits, we propose using age at each visit as an indicator of time change between successive visits. Our experiments on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and National Alzheimer’s Coordinating Center (NACC) datasets showed that our proposed models outperformed all baseline models for most prediction scenarios in terms of F2 and sensitivity.

16:10-16:20
Closing remarks
Room: Salle Roseraie 1/2
Format: Live from venue

Moderator(s): Paweł Łabaj

  • David Kreil