July 23, 2025
11:20-12:20
Invited Presentation: Genome-based prediction of microbial traits
Confirmed Presenter: Thomas Rattei, University of Vienna, Austria
Track: CAMDA: Critical Assessment of Massive Data Analysis

Room: 01C
Format: In person
Moderator(s): Paweł Łabaj


Authors List:

  • Thomas Rattei, University of Vienna

Presentation Overview:

The prediction of phenotypic traits from genomic information is an ongoing challenge in computational biology. Although the fundamental principles of information encoding in genomes have been studied for decades and have enabled the first directed modifications, the expression of phenotypic traits is often the result of complex interactions. Predictive approaches in bioinformatics therefore focus on machine learning from labeled genomic data.

In recent years, we have focused on the computational prediction of microbial phenotypic traits from metagenomic data. These data have been collected on a large scale to explore the diversity and composition of microbial communities and to correlate them with environmental factors (e.g., human health and disease). Predicting traits for these millions of genomes, using neural networks with protein families as features, goes one step further and can already be used in first applications.
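To make the feature representation concrete, the sketch below encodes each genome as a presence/absence vector over protein families and fits a trait classifier. It is a minimal, hypothetical illustration rather than the speaker's pipeline: the genome IDs, the family IDs `PF001` to `PF005`, and the labels are invented, and a single-layer logistic model trained by gradient descent stands in for the neural networks mentioned above.

```python
import math

# Hypothetical toy input: protein families detected in each genome.
genomes = {
    "g1": {"PF001", "PF002", "PF003"},
    "g2": {"PF001", "PF002"},
    "g3": {"PF004", "PF005"},
    "g4": {"PF004", "PF003"},
}
# Hypothetical binary trait labels (1 = trait present).
labels = {"g1": 1, "g2": 1, "g3": 0, "g4": 0}

# Fixed, sorted vocabulary so every genome maps to the same feature order.
vocab = sorted(set().union(*genomes.values()))

def featurize(families, vocab):
    """Binary presence/absence vector over the protein-family vocabulary."""
    return [1.0 if fam in families else 0.0 for fam in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Single-layer model (logistic regression) fit by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

names = sorted(genomes)
X = [featurize(genomes[g], vocab) for g in names]
y = [labels[g] for g in names]
w, b = train_logistic(X, y)

# Score a genome carrying families seen only in trait-positive genomes.
new_genome = featurize({"PF001", "PF002"}, vocab)
print(round(predict_proba(w, b, new_genome), 2))
```

A real model would use many more genomes, a held-out test set, and a proper neural network library; the point here is only the genome-to-feature-vector mapping.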

July 23, 2025
12:20-12:40
Invited Presentation: The Anti-Microbial Resistance Prediction Challenge - Introduction
Confirmed Presenter: Leonid Chidelevitch
Track: CAMDA: Critical Assessment of Massive Data Analysis

Room: 01C
Format: In person
Moderator(s): Paweł Łabaj


Authors List:

  • Leonid Chidelevitch

Presentation Overview:

The AMR prediction challenge at CAMDA is now in its third year. This year's challenge on predicting AMR quantitatively (MIC values) as well as qualitatively (resistance vs. susceptibility) has been developed in conjunction with our CABBAGE project. CABBAGE, which stands for a Comprehensive Assessment of Bacterial-Based AMR prediction from GEnotypes, involves the collection, curation, and exploitation of all publicly available data on AMR genotypes and phenotypes, not only from databases but also from individual publications. In this introductory talk I will describe the process by which we arrived at the selected datasets for this year's challenge, discuss other progress we've made on CABBAGE so far, and preview the plans for next year's challenge.
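As a sketch of the two prediction targets, the snippet below puts MIC values on the log2 doubling-dilution scale (quantitative target) and converts them to a binary call using a breakpoint (qualitative target). The species/drug keys and the breakpoint value are hypothetical placeholders; real breakpoints come from guidelines such as CLSI or EUCAST, vary by species and antibiotic, and may include an intermediate category that this simplified version ignores.

```python
import math

# Hypothetical susceptibility breakpoints in mg/L, keyed by (species, antibiotic).
# Real values come from guidelines such as CLSI or EUCAST.
BREAKPOINTS = {("species_X", "drug_A"): 0.5}

def log2_mic(mic):
    """Quantitative target: MIC on the log2 (doubling-dilution) scale."""
    return math.log2(mic)

def phenotype_from_mic(mic, species, antibiotic):
    """Qualitative target: 'R' if MIC exceeds the assumed breakpoint, else 'S'."""
    breakpoint_mgl = BREAKPOINTS[(species, antibiotic)]
    return "R" if mic > breakpoint_mgl else "S"

for mic in (0.25, 0.5, 1.0, 4.0):
    print(mic, log2_mic(mic), phenotype_from_mic(mic, "species_X", "drug_A"))
```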

July 23, 2025
12:40-13:00
A Hybrid Pipeline for Feature Reduction and Ordinal Classification to Predict Antimicrobial Resistance from Genetic Profiles
Confirmed Presenter: Anton Pashkov, ENES Morelia UNAM, Mexico
Track: CAMDA: Critical Assessment of Massive Data Analysis

Room: 01C
Format: In person
Moderator(s): Paweł Łabaj


Authors List:

  • Adriana Haydeé Contreras Peruyero, Centro de Ciencias Matemáticas UNAM Morelia
  • Yesenia Villicaña Molina, Centro de Ciencias Matemáticas UNAM Morelia
  • Nelly Sélem Mojica, Centro de Ciencias Matemáticas UNAM Morelia
  • Francisco Santiago Nieto de la Rosa, Centro de Ciencias Matemáticas UNAM Morelia
  • Victor Muñiz Sánchez, CIMAT MTY
  • Anton Pashkov, ENES Morelia UNAM
  • Johanna Atenea Carreón Baltazar, Centro de Ciencias Matemáticas UNAM Morelia
  • Luis Raúl Figueroa Martínez, Centro de Ciencias Matemáticas UNAM Morelia
  • Evelia Lorena Coss Navarrete, LIIGH
  • César Augusto Aguilar Martínez, Campus Monterrey
  • Varinia López-Ramírez, Tecnológico Nacional de México/ITS de Irapuato
  • Mariana Jaired Ruíz Amaro, ENES León UNAM
  • Johanna Castelá

Presentation Overview:

One of the three challenges proposed by the Community of Interest Critical Assessment of Massive Data Analysis (CAMDA) involves predicting antimicrobial resistance or susceptibility for nine bacterial species and four antibiotics of interest. The dataset underwent a cleaning process to remove duplicate IDs with differing MIC values or phenotypes. After data cleaning and preprocessing, three distinct strategies were implemented to perform the predictions. The first strategy focused on predicting minimum inhibitory concentration (MIC) values. We adapted machine learning models for ordinal classification, assuming MIC as an ordinal variable. Two main approaches were used: multiple binary models (logistic regression, CART, random forests) and threshold models (neural networks). Due to the high dimensionality and sparsity of the AMR gene count data, we applied preprocessing techniques including a TF-IDF-like transformation (GF-IAF) and dimensionality reduction (truncated SVD and NMF). In the second strategy, we tested several classical machine learning models to predict the phenotype directly and used a grid search to find the optimal set of parameters, without using MIC values. In the third, we applied dimensionality reduction methods such as TF-IDF, along with a biological filtering step, before predicting the phenotype. Finally, as a preliminary result, ANI and pangenome analyses of E. coli isolates revealed divergence in gene content among some strains. Accessory regions potentially linked to antibiotic resistance suggest that key resistance determinants may lie outside the core genome.
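The cleaning step and the "multiple binary models" approach to ordinal MIC prediction can be sketched as follows. This assumes a cumulative decomposition in the style of Frank and Hall, in which K ordered MIC classes become K-1 binary problems of the form "MIC > c_k"; the abstract does not say exactly how the binary models were combined, and the MIC levels, records, and probabilities below are hypothetical.

```python
from collections import defaultdict

# Hypothetical ordered MIC levels (mg/L) forming K = 5 ordinal classes.
MIC_LEVELS = [0.25, 0.5, 1.0, 2.0, 4.0]

def drop_conflicting_duplicates(records):
    """Keep an isolate ID only if all its records agree on (MIC, phenotype)."""
    by_id = defaultdict(set)
    for isolate_id, mic, phenotype in records:
        by_id[isolate_id].add((mic, phenotype))
    return {i: next(iter(v)) for i, v in by_id.items() if len(v) == 1}

def cumulative_targets(mic, levels=MIC_LEVELS):
    """Encode one MIC as K-1 binary targets: is MIC > levels[k]?"""
    return [1 if mic > levels[k] else 0 for k in range(len(levels) - 1)]

def class_probabilities(p_greater):
    """Turn K-1 binary outputs P(MIC > level_k) into K class probabilities.

    P(class i) = P(MIC > level_{i-1}) - P(MIC > level_i), with the boundary
    terms fixed at 1 and 0. Differences are clipped at 0 because separately
    trained binary models need not be monotone, then renormalised.
    """
    k = len(p_greater) + 1
    probs = []
    for i in range(k):
        upper = 1.0 if i == 0 else p_greater[i - 1]
        lower = p_greater[i] if i < k - 1 else 0.0
        probs.append(max(upper - lower, 0.0))
    total = sum(probs)  # >= 1 after clipping, so the division is safe
    return [p / total for p in probs]

records = [("a", 1.0, "R"), ("a", 1.0, "R"), ("b", 0.5, "S"), ("b", 4.0, "R")]
print(drop_conflicting_duplicates(records))  # isolate "b" is dropped
print(cumulative_targets(1.0))
probs = class_probabilities([0.9, 0.6, 0.2, 0.05])
print(MIC_LEVELS[probs.index(max(probs))])
```

Each of the K-1 binary targets can be learned by any binary classifier (logistic regression, CART, random forests), which is what makes this decomposition convenient.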

July 23, 2025
14:00-14:40
Predicting Antimicrobial Resistance Using Microbiome-Pretrained DNABERT2 and DBGWAS-Derived Genomic Features
Confirmed Presenter: Jack Vaska, Department of Biomedical Informatics, Stony Brook University
Track: CAMDA: Critical Assessment of Massive Data Analysis

Room: 01C
Format: In person
Moderator(s): Paweł Łabaj


Authors List:

  • Jack Vaska, Department of Biomedical Informatics
  • Pratik Dutta, Department of Biomedical Informatics
  • Max Chao, Department of Biomedical Informatics
  • Rekha Sathian, Department of Biomedical Informatics
  • Zhihan Zhou, Department of Computer Science
  • Han Liu, Department of Computer Science
  • Ramana Davuluri, Department of Biomedical Informatics

Presentation Overview:

Antimicrobial resistance (AMR) is an escalating public health threat, especially in hospitals where diverse resistance gene reservoirs have emerged. With the increasing availability of metagenomic and whole-genome sequencing data from AMR pathogens, there is a timely opportunity to develop predictive models. Given the complexity of these genomic datasets, large language models (LLMs) offer a promising approach due to their ability to capture long-range sequence patterns. DNABERT2, an LLM pretrained on diverse DNA sequences, has shown strong performance in various genomic tasks and is well-suited for AMR prediction (Zhou et al., 2023). We present a novel method to predict AMR across nine pathogenic bacterial species treated with four common antibiotics. Four custom DNABERT2 models, pretrained on human microbiome-derived genomic sequences, were fine-tuned on sequences obtained from de novo assembled bacterial genomes. To extract phenotype-associated features, we employed De Bruijn Graph-based Genome-Wide Association Study (DBGWAS) in an alignment-free manner (Jaillard et al., 2018). Statistically significant sequences (p < 0.05) were aligned back to assemblies using BLAST (≥80% identity), and 1,000 bp flanking subsequences were extracted. Resistant samples showed a markedly higher number of BLAST hits than susceptible ones. Data were grouped by antibiotic, and a DNABERT2 model was fine-tuned for each group, incorporating species and BLAST hit count as additional features. Consensus predictions across sequences achieved 84.5% accuracy and a macro F1 score of 0.84. Our findings demonstrate that resistant bacteria contain distinct genomic features absent in susceptible strains, highlighting the promise of LLM-based methods for AMR prediction.
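Two steps of this pipeline lend themselves to a short sketch: collapsing per-sequence predictions into one consensus call per isolate, and scoring those calls with macro F1. The sketch assumes a simple majority vote (ties go to the label seen first) and invented isolate IDs; the authors' actual consensus rule is not specified in the abstract.

```python
from collections import Counter, defaultdict

def consensus(sequence_predictions):
    """Majority vote over per-sequence labels, giving one call per isolate.

    Ties go to the label encountered first (Counter.most_common ordering).
    """
    votes = defaultdict(Counter)
    for isolate_id, label in sequence_predictions:
        votes[isolate_id][label] += 1
    return {i: c.most_common(1)[0][0] for i, c in votes.items()}

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for lab in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical per-sequence predictions and ground-truth phenotypes.
seq_preds = [("iso1", "R"), ("iso1", "R"), ("iso1", "S"),
             ("iso2", "S"), ("iso2", "S"), ("iso3", "R")]
truth = {"iso1": "R", "iso2": "S", "iso3": "S"}
calls = consensus(seq_preds)
ids = sorted(truth)
print(calls, macro_f1([truth[i] for i in ids], [calls[i] for i in ids]))
```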

July 23, 2025
14:40-15:00
The Antimicrobial Resistance Prediction Challenge
Confirmed Presenter: Alper Yurtseven, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbruecken
Track: CAMDA: Critical Assessment of Massive Data Analysis

Room: 01C
Format: In person
Moderator(s): Paweł Łabaj


Authors List:

  • Alper Yurtseven, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS)
  • Dilfuza Djamalova, Computational Metagenomics (IBG-5)
  • Marco Galardini, Molecular Bacteriology Institute
  • Olga V. Kalinina, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS)

Presentation Overview:

Antimicrobial Resistance (AMR) is an urgent threat to human health worldwide as microbes have developed resistance to even the most advanced drugs. In this year’s CAMDA challenge, we focused on predicting antimicrobial resistance of 5,346 bacterial strains that belong to 9 different species (Acinetobacter baumannii, Campylobacter jejuni, Escherichia coli, Klebsiella pneumoniae, Neisseria gonorrhoeae, Pseudomonas aeruginosa, Salmonella enterica, Staphylococcus aureus, Streptococcus pneumoniae) using two machine learning algorithms.

July 23, 2025
15:00-15:20
Antimicrobial Resistance Prediction via Binary Ensemble Classifier and Assessment of Variable Importance
Confirmed Presenter: Owen Visser, University of Florida, United States
Track: CAMDA: Critical Assessment of Massive Data Analysis

Room: 01C
Format: In person
Moderator(s): Paweł Łabaj


Authors List:

  • Owen Visser, University of Florida
  • Victor Agboli, University of Florida
  • Somnath Datta, University of Florida

Presentation Overview:

Antimicrobial resistance (AMR) presents a growing challenge to global health, driven by antibiotic overuse and the rapid evolution of resistant bacteria. Predicting whether an isolate is resistant or susceptible to a drug remains difficult because of genomic variability. As part of the 2025 CAMDA Challenge, we adapted a standard bioinformatic pipeline to preprocess the heterogeneous raw sequencing data and derived features from strain-specific markers and AMR gene classes. Three machine learning methods that have shown high accuracy in recent AMR prediction research were trained and combined into an ensemble to predict binary resistance phenotypes for nine bacterial pathogens and four antibiotics. The ensemble performed well across most species, notably achieving 96.8% accuracy for C. jejuni and 98.2% for A. baumannii. Permutation-based variable importance analysis identified relevant resistance genes and strains, such as sulphonamide and aminoglycoside genes and the LAC-4 strain in A. baumannii. These results demonstrate the utility of ensemble models for AMR prediction on large, heterogeneous genomic datasets.
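A hard-voting ensemble and permutation-based variable importance can be illustrated on toy data. In the sketch below, three hand-written rules stand in for the trained models, and the features (`sul1`, `aac3`, `noise`), samples, and labels are hypothetical; a feature's importance is the mean drop in ensemble accuracy after its column is shuffled across samples.

```python
import random

# Hand-written rules standing in for trained base models (hypothetical).
# Each maps a feature dict to 1 (resistant) or 0 (susceptible).
def model_a(x):
    return 1 if x["sul1"] else 0

def model_b(x):
    return 1 if x["sul1"] or x["aac3"] else 0

def model_c(x):
    return 1 if x["sul1"] and x["aac3"] else 0

MODELS = (model_a, model_b, model_c)

def ensemble_predict(x):
    """Hard majority vote over the base models."""
    return 1 if 2 * sum(m(x) for m in MODELS) > len(MODELS) else 0

def accuracy(data, labels):
    return sum(ensemble_predict(x) == y for x, y in zip(data, labels)) / len(labels)

def permutation_importance(data, labels, feature, n_repeats=20, seed=0):
    """Mean drop in ensemble accuracy after shuffling one feature across samples."""
    rng = random.Random(seed)
    base = accuracy(data, labels)
    drops = []
    for _ in range(n_repeats):
        column = [x[feature] for x in data]
        rng.shuffle(column)
        shuffled = [{**x, feature: v} for x, v in zip(data, column)]
        drops.append(base - accuracy(shuffled, labels))
    return sum(drops) / n_repeats

data = [
    {"sul1": 1, "aac3": 1, "noise": 0},
    {"sul1": 1, "aac3": 0, "noise": 1},
    {"sul1": 1, "aac3": 1, "noise": 1},
    {"sul1": 1, "aac3": 0, "noise": 0},
    {"sul1": 0, "aac3": 1, "noise": 0},
    {"sul1": 0, "aac3": 0, "noise": 1},
    {"sul1": 0, "aac3": 1, "noise": 1},
    {"sul1": 0, "aac3": 0, "noise": 0},
]
labels = [x["sul1"] for x in data]
importance = {f: permutation_importance(data, labels, f) for f in ("sul1", "aac3", "noise")}
print(importance)
```

Because the ensemble's vote depends only on `sul1` in this toy setup, shuffling `aac3` or `noise` leaves accuracy unchanged and their importance is exactly zero.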

July 23, 2025
15:00-15:20
A Highly Accurate Workflow for Inference of Antimicrobial Resistance from Genetic Data Based on Machine Learning and Global Data Curation
Confirmed Presenter: David Danko, Biotia Inc, United States
Track: CAMDA: Critical Assessment of Massive Data Analysis

Room: 01C
Format: In person
Moderator(s): Paweł Łabaj


Authors List:

  • Gabor Fidler, Biotia Inc
  • Heather Wells, Biotia Inc
  • Ford Combs, Biotia Inc
  • John Papciak, Biotia Inc
  • Mara Couto-Rodriguez, Biotia Inc
  • Sol Rey, Biotia Inc
  • Tiara Rivera, Biotia Inc
  • Lorenzo Uccellini, Biotia Inc
  • Christopher Mason, Biotia Inc
  • Niamh O'Hara, Biotia Inc
  • Dorottya Nagy-Szakal, Biotia Inc
  • David Danko, Biotia Inc

Presentation Overview:

Note: This abstract is paired with the prediction submission “Base Model, 2nd Submission (Biotia)” made by user gfidler from team Biotia on May 15, 2025.

We present BIOTIA-DX Resistance, our submission to the CAMDA AMR Challenge. This tool builds on our clinically validated metagenomic workflow to provide broad-domain predictions of antimicrobial resistance from microbial sequencing data. We achieved an F1 score of 84 on the CAMDA challenge test set. Our technique is based on curation of global datasets, machine learning-based predictions from input data, and highly stringent preprocessing of input data and databases.

July 23, 2025
15:20-15:40
Invited Presentation: The Gut Microbiome Health Index Challenge - Introduction
Confirmed Presenter: Kinga Zielińska
Track: CAMDA: Critical Assessment of Massive Data Analysis

Room: 01C
Format: In person
Moderator(s): Paweł Łabaj


Authors List:

  • Kinga Zielińska

July 23, 2025
15:40-16:00
Integrating Taxonomic and Functional Features for Gut Microbiome Health Indexing
Confirmed Presenter: Rafael Pérez Estrada, Centro de Ciencias Matemáticas, UNAM
Track: CAMDA: Critical Assessment of Massive Data Analysis

Room: 01C
Format: In person
Moderator(s): Kinga Zielińska


Authors List:

  • Shaday Guerrero Flores, Cinvestav Unidad Irapuato
  • Rafael Pérez Estrada, Centro de Ciencias Matemáticas
  • Juan Francisco Espinosa Maya, Centro de Ciencias Matemáticas UNAM
  • Nelly Selem Mojica, Centro de Ciencias Matemáticas de la UNAM
  • David Alberto García Estrada, Unidad Genómica Avanzada Cinvestav
  • Orlando Camargo Escalante, Unidad Genómica Avanzada Cinvestav
  • Mario Jardón Santos, Centro de Ciencias Matemáticas UNAM
  • Jose Daniel Chavez Gonzalez, Universidad Autonoma de Guerrero
  • Laura Quetzally Medina Velázquez, Universidad Nacional Autónoma de México
  • José Yahir Cabrera Gonzáles, Universidad Autonoma de Chiapas
  • José Daniel Sánchez Estrada

Presentation Overview:

Accurate characterization of the gut microbiome is essential for understanding its role in health and disease. Current indices such as GMHI and hiPCA rely on taxonomic profiles to associate microbiome composition with health states, but they do not consider underlying functional variability. Here, we integrate species-level (MetaPhlAn) and pathway-level (HUMAnN) data from 4,398 samples provided by CAMDA 2025 to identify key organisms and pathways across disease groups and to develop and evaluate composite health indices.
We first built co-occurrence networks and identified keystone taxa. We then recalibrated GMHI and hiPCA for both taxonomic and functional data and developed three ensemble models; the best-performing one, the Optimized Pathway Ensemble, reached an F1 score of 0.76. We extended GMHI to distinguish between disease groups and tested pairwise classifiers across conditions, including healthy controls and gastrointestinal, metabolic, psychiatric, and neurological disorders. Additionally, we developed the Gut Microbiome Health Calculator, a web tool for computing and comparing these indices. Our results show that combining taxonomic and functional features improves classification and reveals biologically relevant patterns in disease.
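For reference, the sketch below computes a GMHI-style score following our reading of the published definition: for a species set M, the collective abundance is ψ = (R/|M|)·Σ|n ln n| over the members of M present in the sample (R of them, with relative abundances n), and GMHI = log10(ψ_healthy / ψ_nonhealthy). The species names, abundances, and the small floor used to avoid division by zero are assumptions. The same function could in principle be applied to pathway abundances, but how the authors recalibrated GMHI for functional data is not described here.

```python
import math

def collective_abundance(sample, species_set):
    """psi = (R / |M|) * sum of |n ln n| over members of M present in the sample."""
    present = [sample[s] for s in species_set if sample.get(s, 0.0) > 0]
    richness_fraction = len(present) / len(species_set)
    return richness_fraction * sum(abs(n * math.log(n)) for n in present)

def gmhi(sample, health_prevalent, health_scarce, floor=1e-5):
    """log10 ratio of collective abundances; `floor` (assumed) avoids dividing by 0."""
    psi_h = max(collective_abundance(sample, health_prevalent), floor)
    psi_n = max(collective_abundance(sample, health_scarce), floor)
    return math.log10(psi_h / psi_n)

# Hypothetical relative abundances (proportions) and species sets.
sample = {"sp_A": 0.30, "sp_B": 0.20, "sp_C": 0.05, "sp_D": 0.0}
healthy_set = {"sp_A", "sp_B"}
nonhealthy_set = {"sp_C", "sp_D"}
print(round(gmhi(sample, healthy_set, nonhealthy_set), 3))
```

A positive score means the health-prevalent set dominates; swapping the two sets flips the sign.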

July 23, 2025
16:40-17:20
Building a Rare-Disease Microbiome Health Index: Integrating Gut Metagenomes, Synthetic PKU EHRs and Rare-Variant Profiles to Forecast Phenylalanine Crises
Track: CAMDA: Critical Assessment of Massive Data Analysis

Room: 01C
Format: In person
Moderator(s): Kinga Zielińska


Authors List:

  • Khartik Uppalapati, RareGen Youth Network 501(c)(3)
  • Bora Yimenicioglu, RareGen Youth Network 501(c)(3)
  • Shakeel Abdulkareem, RareGen Youth Network 501(c)(3)
  • Adan Eftekhari, Harvard University

Presentation Overview:

Phenylketonuria (PKU) is an autosomal recessive metabolic disorder characterized by deficient phenylalanine hydroxylase activity, leading to episodic neurotoxic elevations in plasma phenylalanine (Phe) despite strict dietary management. Existing gut-health metrics, however, fail to capture rare-disease-specific dysbiosis. To address this gap, we developed a Rare-Disease Microbiome Health Index (RDMHI) that integrates MetaPhlAn-derived species abundances, HUMAnN functional pathways, synthetic electronic health record timelines, and rare-variant burdens to forecast imminent Phe crises. We curated 4,398 metagenomic profiles from the CAMDA dataset alongside three external PKU cohorts (n < 100), applied centered log-ratio transformation and batch correction, and generated 5,000 patient-month windows via Synthea-augmented GAN models to simulate clinical and laboratory events. Rare-variant burdens for PAH and BH₄-pathway genes were collapsed into gene-level indicators. A LightGBM-DART classifier was trained under nested five-fold, leave-one-dataset-out cross-validation and evaluated by AUROC, AUPRC, and Matthews correlation coefficient (MCC) with 1,000-sample bootstrap CIs. RDMHI achieved an AUROC of 0.91 (95% CI 0.88–0.94) and an MCC of 0.64, outperforming clinical-only (AUROC 0.78; MCC 0.38) and microbiome-only (AUROC 0.81; MCC 0.45) baselines. External validation on 50 registry windows yielded an AUROC of 0.85 (0.81–0.89) and 78% sensitivity at a 22% false-positive rate. By outperforming existing gut-health indices (GMHI and hiPCA), RDMHI demonstrates the impact of tailoring health indices to rare diseases and establishes a new standard of microbiome-based prognostic modeling for precision risk stratification in rare metabolic disorders.
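Two of the building blocks named above, the centered log-ratio (CLR) transform and the Matthews correlation coefficient, are short enough to sketch directly. The pseudocount used to handle zero abundances and the example counts are assumptions; the abstract does not state how zeros were treated.

```python
import math

def clr(abundances, pseudocount=1e-6):
    """Centered log-ratio: log(x_i + pseudocount) minus the mean of those logs."""
    logs = [math.log(a + pseudocount) for a in abundances]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

profile = [0.5, 0.3, 0.2, 0.0]  # hypothetical relative abundances
print([round(v, 3) for v in clr(profile)])
print(mcc(tp=40, tn=45, fp=5, fn=10))
```

CLR values always sum to zero, which removes the unit-sum constraint of compositional data before modeling.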

July 23, 2025
17:20-17:40
Toward the Development of a Novel and Comprehensive Gut Health Index: An Ensemble Model Integrating Taxonomic and Functional Profiles
Confirmed Presenter: Vincent Mei, University of Florida, United States
Track: CAMDA: Critical Assessment of Massive Data Analysis

Room: 01C
Format: In person
Moderator(s): Kinga Zielińska


Authors List:

  • Vincent Mei, University of Florida
  • Yulin Li, University of Florida
  • Somnath Datta, University of Florida

Presentation Overview:

Diseases linked to the gut microbiome are increasingly common, contributing to rising healthcare costs and worsening patient outcomes. Because stool samples provide an accurate representation of the gut microbiome and can be collected frequently and non-invasively, an index that accurately classifies samples as healthy or non-healthy is of clinical interest.
Several indices already exist to assess microbiome health, such as the Gut Microbiome Health Index (GMHI), health index with PCA (hiPCA), and Shannon entropy measures, but their reliance solely on species abundance limits their ability to distinguish between healthy and non-healthy individuals.
To improve on these indices, we proposed a novel ensemble-based index that integrates both taxonomic and metabolic pathway abundance data from stool samples to predict individual health status.
From the provided data, comprising 1,211 species features and 619 pathway features, 61 species and 21 pathways were identified and used to train the ensemble model. The optimal threshold for the ensemble-derived index was selected using Youden's index, yielding a balanced accuracy of 0.7234, compared with values below 0.5 for GMHI, hiPCA, and Shannon entropy measures.
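Selecting an index cutoff with Youden's index can be sketched as follows: each observed score is tried as a threshold, and the one maximizing J = sensitivity + specificity - 1 is kept. The scores and labels are hypothetical, and the sketch assumes that higher index values indicate health (label 1).

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called healthy (label 1)."""
    pairs = list(zip(scores, labels))
    tp = sum(s >= threshold and y == 1 for s, y in pairs)
    fn = sum(s < threshold and y == 1 for s, y in pairs)
    tn = sum(s < threshold and y == 0 for s, y in pairs)
    fp = sum(s >= threshold and y == 0 for s, y in pairs)
    return tp / (tp + fn), tn / (tn + fp)

def youden_threshold(scores, labels):
    """Return the observed score maximising J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, t)
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

def balanced_accuracy(scores, labels, threshold):
    sens, spec = sens_spec(scores, labels, threshold)
    return (sens + spec) / 2

# Hypothetical index values and health labels (1 = healthy, 0 = non-healthy).
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
labels = [0, 0, 1, 1, 1, 1]
t, j = youden_threshold(scores, labels)
print(t, j, balanced_accuracy(scores, labels, t))
```

At the chosen cutoff, balanced accuracy equals (J + 1) / 2, so maximizing Youden's J also maximizes balanced accuracy over the candidate thresholds.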
Feature importance was computed alongside ensemble training by permuting one feature at a time, identifying the 20 species and pathways most important for determining gut microbiome health.

July 23, 2025
17:40-18:00
Topology-Enabled Integration of Taxonomic and Functional Microbiome Profiles Reveals Distinct Subgroups in Healthy Individuals
Confirmed Presenter: Doroteya Staykova, Multicore Dynamics Ltd, New Milton
Track: CAMDA: Critical Assessment of Massive Data Analysis

Room: 01C
Format: In person
Moderator(s): Kinga Zielińska


Authors List:

  • Doroteya Staykova, Multicore Dynamics Ltd

Presentation Overview:

High-throughput sequencing technologies have enabled detailed taxonomic and functional profiling of the human gut microbiome. However, integrating these diverse, high-dimensional data sources remains a major challenge, particularly for defining robust, cross-modal indicators of gut health, because of the significant inter-individual variability observed even within healthy populations. In this study, I applied Topological Data Analysis (TDA) to the CAMDA 2025 Microbiome Challenge dataset to integrate taxonomic and functional profiles from healthy individuals. My primary aim was to establish a baseline for human gut health by identifying microbial patterns within a large, healthy cohort. A cross-modal network representation of over 1,600 microbiome samples was constructed using the Mapper algorithm with PHATE-based topological lenses. The derived topological shape revealed two distinct subgroups within the landscape of the healthy gut microbiome. Subsequent statistical analyses identified characteristic taxonomic and functional signatures associated with each subgroup, demonstrating the utility of TDA in uncovering intrinsic patterns and providing a data-driven framework for more precise stratification of gut health.
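A minimal version of the Mapper construction is sketched below: the range of a one-dimensional lens is covered by overlapping intervals, the points falling in each interval are clustered, and clusters that share points are joined by an edge. For simplicity this assumes a precomputed 1-D lens (a stand-in for the PHATE-based lenses used in the study) and single-linkage clustering with a hypothetical distance cutoff `eps`; the toy points are invented.

```python
import math

def single_linkage(points, idx, eps):
    """Connected components of the graph linking points within distance eps."""
    remaining = set(idx)
    clusters = []
    while remaining:
        seed = remaining.pop()
        stack, comp = [seed], {seed}
        while stack:
            i = stack.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            comp |= near
            stack.extend(near)
        clusters.append(comp)
    return clusters

def mapper_graph(points, lens, n_intervals=4, overlap=0.3, eps=1.0):
    """Cover the lens range with overlapping intervals, cluster each preimage,
    and connect clusters (nodes) that share at least one point."""
    lo, hi = min(lens), max(lens)
    length = (hi - lo) / n_intervals
    pad = length * overlap
    nodes = []
    for k in range(n_intervals):
        a, b = lo + k * length - pad, lo + (k + 1) * length + pad
        members = [i for i, v in enumerate(lens) if a <= v <= b]
        nodes.extend(frozenset(c) for c in single_linkage(points, members, eps))
    edges = {(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))
             if nodes[i] & nodes[j]}
    return nodes, edges

def n_components(n_nodes, edges):
    """Count connected components of the Mapper graph (union-find)."""
    parent = list(range(n_nodes))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(i) for i in range(n_nodes)})

# Two well-separated hypothetical groups; the lens is simply the x coordinate.
pts = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.0), (10.0, 0.0), (10.5, 0.2), (11.0, 0.0)]
lens = [p[0] for p in pts]
nodes, edges = mapper_graph(pts, lens)
print(len(nodes), n_components(len(nodes), edges))
```

Disconnected components (or distinct branches) of the resulting graph are how Mapper exposes subgroups such as the two reported here.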

July 23, 2025
17:40-18:00
Ensemble-Based Topic Selection for Text Classification via a Grouping, Scoring, and Modeling Approach
Track: CAMDA: Critical Assessment of Massive Data Analysis

Room: 01C
Format: In person
Moderator(s): Kinga Zielińska


Authors List:

  • Daniel Voskergian, Al-Quds University
  • Burcu Bakir-Gungor, Abdullah Gul University
  • Malik Yousef, Zefat College

Presentation Overview:

The exponential growth in scientific literature, especially in biomedical domains, has intensified the need for effective automatic text classification (ATC) systems. TextNetTopics is a recent approach that classifies documents using topic-based features derived from Latent Dirichlet Allocation (LDA), reducing dimensionality while maintaining semantic richness. However, TextNetTopics’ reliance on a single topic model introduces performance variability across datasets, limiting its generalizability.

This study introduces ENTM-TS (Ensemble Topic Modeling for Topic Selection), a novel framework that enhances TextNetTopics by integrating multiple topic models through a three-stage Grouping, Scoring, and Modeling (GSM) approach. First, topics are extracted from various models and merged based on semantic similarity to reduce redundancy and generate discriminative topic groups. These groups are then scored using internal and external evaluation strategies, ensuring normalized comparison and identifying top-performing subsets. Finally, a modeling phase iteratively aggregates and evaluates these groups to build an optimal feature set for classification.
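The grouping stage can be sketched as a greedy merge of topics, pooled from different topic models, whose top-word sets overlap. Jaccard similarity over top words is an assumed stand-in for the semantic similarity measure used in ENTM-TS, and the topic IDs, words, and threshold are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two word sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)

def group_topics(topics, threshold=0.5):
    """Greedily merge topics (pooled from several topic models) whose top words overlap.

    topics: list of (topic_id, set_of_top_words). A topic joins the first group
    whose representative word set reaches `threshold` Jaccard similarity; the
    representative then absorbs the topic's words.
    """
    groups = []  # list of (representative word set, member topic ids)
    for topic_id, words in topics:
        for rep, members in groups:
            if jaccard(words, rep) >= threshold:
                members.append(topic_id)
                rep.update(words)
                break
        else:
            groups.append((set(words), [topic_id]))
    return [members for _, members in groups]

topics = [
    ("lda_1", {"liver", "injury", "drug", "hepatotoxicity"}),
    ("nmf_3", {"liver", "drug", "injury", "toxicity"}),
    ("lda_2", {"gene", "expression", "cancer", "tumor"}),
]
print(group_topics(topics))
```

The scoring and modeling stages would then rank these groups and add them to the feature set iteratively; those steps are not sketched here.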
ENTM-TS was evaluated on two biomedical text datasets: the DILI dataset and the WOS-5736 dataset of scientific abstracts. Results demonstrate that ENTM-TS consistently meets or exceeds the performance of single-model configurations, improving classification accuracy and reducing variability. This ensemble-based approach not only preserves semantic richness but also ensures robustness across diverse datasets.
ENTM-TS offers a generalizable and interpretable solution for biomedical text mining, with future work aimed at automating parameter selection for greater usability.