July 24, 2025: ISMB/ECCB 2025 Day 5: Highlights and Recap

Thursday, July 24: ISMB/ECCB Day 5 Highlights and Recap

Today was the final day of ISMB/ECCB 2025, and it was a fantastic end to a great conference!

ISCB and the ISMB/ECCB 2025 Steering Committee wish to thank every attendee for being part of the world's largest conference on bioinformatics and computational biology. We hope you enjoyed your time in Liverpool and the five days of science, networking, and discussion!

Fireside Chat with David Baker

Before our final keynote, ISCB hosted a special fireside chat with 2024 Nobel Laureate David Baker, moderated by Christine Orengo. The conversation highlighted the creative journey behind Baker’s pioneering work in protein design, from his early skepticism about computation to the development of generative design tools with wide-ranging applications in medicine, sustainability, and technology.

Baker spoke about the power of collaboration—both within his lab and through global partnerships—and shared advice for early-career researchers on building strong, interdisciplinary teams. He emphasized the importance of high-quality data, community-driven infrastructure, and staying grounded in experimental feedback. When asked about the future, Baker expressed optimism about the evolving role of scientists in an increasingly AI-driven landscape, encouraging researchers to stay curious, collaborative, and adaptable.

Keynote Address

In his keynote, “Decoding cellular systems: From observational atlases to generative interventions,” Fabian Theis—the 2025 ISCB Innovator Award winner—traced the evolution of computational cell biology, from mapping cellular identities to modeling and predicting how cells behave, interact, and respond to interventions.

He began with early efforts to build single-cell atlases, sharing how tools like Scanpy, Squidpy, and SpatialData helped unify transcriptomic and spatial data at scale. These frameworks paved the way for deep learning models that could represent cellular states in biologically meaningful latent spaces. Theis emphasized that engineering and data quality remain critical, as do community efforts like scverse that support reproducible, scalable single-cell analysis.

Moving toward virtual cells, Theis introduced foundation models trained across diverse datasets and modalities. His group’s Nicheformer, a transformer-based model, captures spatial and transcriptional relationships across millions of cells and multiple species, enabling tasks like cell-type prediction and niche composition analysis. It outperforms other embeddings and shows promise for generalizable spatial inference.

The keynote culminated with recent work on modeling cellular perturbations using generative AI. Theis presented CellFlow, a flow-matching model for predicting how cells respond to interventions such as gene edits or drug treatments. Together with tools like moscot, which maps unpaired control and perturbed cells using optimal transport, these advances support lab-in-the-loop experimental design—offering a path toward virtual experimentation at scale.

Theis closed by noting that while challenges remain in scaling, evaluation, and computational cost, foundation models offer an unprecedented opportunity not just to observe biology but to actively shape it.

Session Recaps

Bioinfo-Core

Wednesday, July 23

The bioinfo-core COSI brings together managers and staff working in bioinformatics core facilities around the world. In our session we had a well-rounded and interesting mix of presentations, panel discussions, and breakout groups.

Talks:

Damian Dalle Nogare - Bioimage analysis in the age of AI: lessons and a path forward from a core facility perspective. A thought-provoking keynote on the history of the computational imaging field and its journey to becoming fully AI-driven. We can learn a lot from what the imaging community has gone through, and we should work together.

Kübra Narcı - Benchmarking Variant-Calling Workflows: The nf-core/variantbenchmarking Pipeline within the GHGA Framework: Variant callers behave very differently, and comparing structural variants is non-trivial. This nf-core pipeline handles this type of comparison; it was developed for GHGA and shared with the world.

Thomas Roder - Assembly Curator: rapid and interactive consensus assembly generation for bacterial genomes: A microbial assembly benchmarking paper (Wick and Holt) found that no single assembler worked well on all assemblies. Trycycler, from Ryan Wick, produces good results but is time-consuming. The goal of Assembly Curator is to achieve 80% of Trycycler's quality in 5% of the time. It is available on GitHub at https://github.com/MrTomRod/assembly-curator

Adam Giess - Long Read Sequencing at Genomics England: Genomics England has run 12,000 Nanopore PromethION flow cells, and the talk shared details of their setup and approach.

Anil S. Thanki - Autonomous Single Cell Transcriptomics Analysis in Persist-seq: A multi-institutional effort to study the early tumor environment. The diverse toolset included Kubernetes, Jenkins, Galaxy (run through a command-line workflow executor), AWS, and Slack.

Iris Diana Yu - Advancing the Expression Atlas Resources: A Scalable Single-Cell Transcriptomics Pipeline to Facilitate Scientific Discoveries: Expression Atlas and Single Cell Expression Atlas ingest, annotate, curate, and serve data!

Ayushi Agrawal - Mixed effects models applied to single nucleus RNA-seq data identify cell types associated with animal level pathological trait of Alzheimer’s disease: LODopt model applied to single nucleus RNA-seq data. https://github.com/gladstone-institutes/LODopt

Natalie Gill - Optimizing Clustering Resolution for Multi-subject Single Cell Studies: A method for optimizing clustering resolution by splitting principal components (PCs) into odd and even sets. https://github.com/gladstone-institutes/clustOpt

Hubert Rehrauer - GEO Uploader: Simplifying the data deposition in the GEO repository: A tool to make it easier for researchers to upload their data to GEO: https://github.com/fgcz/geo-uploader

Carlos Prieto - Enhancing Bioinformatics Workflows with Analytical Visualization Tools: A series of impressive interactive tools and data visualizations, including rjsplot, D3GB, looking4clusters, Rvisdiff, and RD3plot. https://github.com/BioinfoUSAL

Patricia Carvajal-López - Competency framework profiles to reflect career progression within bioinformatics core facility scientists: A collaboration and community effort led by EBI to develop more robust career definitions and stages for people in core facilities. Publication coming soon in Bioinformatics Advances.

Panels:

The rise of computational imaging

- Panelists: Damian Dalle Nogare, Jamie Soul, Syed Murtuza Baker, Emily Johnson
- Imaging groups and bioinformatics groups are increasingly coming together, mainly because of spatial transcriptomics. How can these two groups interact and learn from each other to advance science? We heard about problems with the software built into some commercial platforms when it comes to calling cells correctly, so more custom solutions are probably needed. Some things can be learned by the bioinformatics core community, and others will require collaboration between the two groups, but we need to be clear about who does what and who needs which files from whom. In some ways, imaging has transformed into a fully AI-driven field, and we can learn from its growing pains.

The practical use of AI in cores
- Panelists: Madelaine Gogol, Ashley Sawle, Mohab Helmy, Ken Brewer
- We discussed training users in the use of genAI, since we as bioinformaticians may be more experienced with it. Do we still need to teach coding? I think the consensus was yes, at least some coding for some people, but it is not clear how much. How do we keep up with new tools? We need to be intentional about setting aside time to read papers, experiment, and try things. Seqera AI was mentioned as a platform for writing Nextflow code (many of the models are not good at it yet), and it could be used to turn a collection of scripts into a Nextflow pipeline with a little effort. MultiQC now has an option to generate an AI summary of data quality in its report. The University of Queensland fast.ai course was mentioned as a potential resource for upskilling. On hiring in the age of AI, in-person interviews and deeper questions to check knowledge may become more necessary. Finally, watch out for slopsquatting (attackers registering package names that AI models hallucinate) and for bad prompting that produces garbage for collaborators.

Breakout groups:

We broke into four breakout groups based on what the people in the room were interested in discussing.

Spatial / Imaging – Imaging and Bioinformatics groups can stay separate but should improve communication and make sure it’s clear who does what. Bioinformaticians need to learn to perform common, easy image-based tasks and to ask for help with the hard ones. We have a lot to learn from imaging groups, so short training sessions could help us apply some of their techniques to spatial transcriptomics.
AI – When users ask for AI, we need to find out whether it’s the right tool and then guide them; otherwise, they may turn to less reputable sources. Upskilling our users in the use of AI could work better with free food. Designating a few people on the team as AI experts could help the team learn and support users better.
Project Management and tools – Jira and Agile might help manage projects and time. Getting involved early can help avoid bad data. Some teams like strict scoping, while others feel it’s not collaborative. Being able to say no is essential, and so is maintaining good relationships.
Reporting / communicating results – Quarto was mentioned as a great solution for updating reports and presentations when something changes. Containers and Nextflow are important for reproducibility, though clinical environments may be more resistant to change.

CAMDA

Wednesday, July 23

The first day of CAMDA 2025 opened with a striking keynote by Thomas Rattei (University of Vienna), exploring the computational prediction of microbial phenotypic traits using neural networks trained on protein families from large-scale metagenomic datasets, linking microbiome diversity to human health.

The following session on the AMR Prediction Challenge featured diverse approaches to predicting antimicrobial resistance. Leonid Chindelevitch (Imperial College London) described how the challenge was built on the CABBAGE project, which compiled a unique curated genotype-phenotype database. Anton Pashkov (UNAM) compared a range of ML strategies integrating taxonomic and functional features. Jack Vaska (Stony Brook University) demonstrated good performance (~85% accuracy) from DNABERT2 language models with DBGWAS-informed features. Alper Yurtseven (Helmholtz Institute) applied scalable models across more than 5,000 bacterial strains. Owen Visser (University of Florida) presented an ensemble model, through which he also identified key resistance genes. David Danko (Biotia Inc.) showcased the BIOTIA-DX pipeline, a clinically validated metagenomic workflow that achieved an F1 score of 84%.

The second session of the day, on the Gut Microbiota Challenge, opened with Kinga Zielińska (Jagiellonian University) introducing new microbiome health indices that integrate taxonomy and function under the Theatre of Activity concept. Rafael Pérez Estrada (UNAM) developed an ensemble approach combining taxonomic and functional profiles that outperformed existing indices, and presented a web-based health calculator. Khartik Uppalapati (RareGen) built RDMHI, a rare-disease-specific index for phenylketonuria that integrates EHR simulations and genetic data, achieving an AUROC of 91%. Vincent Mel (University of Florida) introduced an ensemble index combining 61 species and 21 pathways, achieving 72% balanced accuracy. Doroteya Staykova (Multicore Dynamics) used topological data analysis to identify subgroups within healthy microbiomes, offering a stratified framework for gut health assessment.

Together, the first day of CAMDA 2025 highlighted how curated benchmarking data, advanced ML, and innovative integration strategies are driving new advances and insights in AMR prediction and microbiome-based health measures.

Join us tomorrow for a keynote by Spiros Denaxas (UCL) discussing the value hidden in electronic health records, as well as contributed talks and a panel discussion on:

- The Health Privacy Challenge, presenting an interactive platform for achieving trust and robustness in the generation of privacy-preserving synthetic gene expression datasets

- The Synthetic Clinical Health Records Challenge, providing a rich set of highly realistic Electronic Health Records (EHRs) tracing the diagnosis trajectories, distilling information from 1.2 million real diabetic patients

Find our programme at http://www.camda.info/.

Thursday, July 24

The second day of CAMDA 2025 began with a keynote by Spiros Denaxas (University College London), who explored how electronic health records (EHRs) are reshaping biomedical research by enabling the simultaneous analysis of thousands of conditions, while also highlighting the challenges of data sensitivity and privacy.

The Synthetic EHR Challenge followed, with Carlos Loucera (FPS) describing the generation of synthetic EHRs for one million diabetic patients using VAEs and GANs. Daniel Voskergian (Al-Quds University) leveraged these data to predict diabetic complications using a Grouping–Scoring–Modeling framework, achieving AUCs up to 84%.

The Health Privacy Challenge, part of the ELSA initiative, was introduced by Antti Honkela (University of Helsinki) and Hakime Öztürk (EMBL), outlining a Blue Team vs Red Team format focused on privacy-preserving synthetic data. A panel featuring Honkela, Denaxas, David Kreil, Wenzhong Xiao, and Joaquin Dopazo debated regulatory and technical aspects of privacy in AI and health data.

Blue Team participants presented a variety of approaches: Andrew Wicks (DKFZ) used non-negative matrix factorization (NMF) with differential privacy for synthetic genomics; Steven Golob (UW Tacoma) assessed synthetic data generation (SDG) performance on RNA-seq; Patrick McKeever (UW) evaluated scRNA-seq generators, finding scDesign2 the most effective; and Jules Kreuer (Tübingen) introduced NoisyDiffusion, a privacy-preserving diffusion model with strong predictive accuracy and resilience to membership inference attacks (MIAs).

Serghei Mangul (Sage Bionetworks / Univ. of Suceava) gave two talks: one analyzing more than 6 million publications to reveal the underuse of RNA-seq data in secondary analyses, and another uncovering fragmented pre-publication data-sharing practices and showing that early release improves citations. Finally, Yuexi Gu (Xi’an Jiaotong Univ.) presented HI-MGSyn, a multi-granularity hypergraph model for drug synergy prediction, which identified novel combinations supported by the literature.

These sessions showcased the convergence of synthetic data generation, privacy-preserving AI, and open science to drive innovation in health research.

CAMDA 2025 concluded with the CAMDA Trophy ceremony, where the First Prize was awarded to Anton Pashkov (ENES Morelia, UNAM, Mexico). The Second Prize went to Rafael Pérez-Estrada (Centro de Ciencias Matemáticas, UNAM, Mexico), and the Third Prize to Owen Visser (University of Florida, U.S.A.). An Honourable Mention for Sustainability was awarded to Serghei Mangul (University of Suceava, Romania).

Stay tuned for the CAMDA 2026 challenges! Visit our website at www.camda.info and sign up for the low-volume CAMDA COSI announcements mailing list.

HiTSeq

What a Day 2 at #HiTSeq! The topics were wide-ranging and highly original! We began the day with our final keynote speaker, Tobias Marschall from Heinrich Heine University Düsseldorf, who discussed pangenome-based analysis of structural variation. He addressed the current limitations of draft pangenomes, including remaining gaps and small sample sizes that hinder the detection of rare alleles, and discussed approaches to overcome them. Good news: HGSVC3 is underway, working to complete the assembly of 65 new genomes!

Next, our second fabulous sponsor, Oxford Nanopore, gave a promotional talk by Mike Vella, Senior Director of Machine Learning at Oxford Nanopore Technologies. According to Mike, basecalling is one of the most critical challenges for long-read sequencing data. He introduced duplex basecalling and explained how machine learning is solving several basecalling problems that were previously almost intractable.

Then we moved on to our amazing, carefully selected proceedings and abstracts, showcased in the final session of the HiTSeq COSI track. Refreshing presentations included new algorithms to accelerate genome assembly (ALICE), capture uncertainty in single-cell copy-number calling, and assess long-read data using SQANTI-reads (Netanya Keil, winner of the oral presentation award!), as well as a new algorithm to detect copy number variants (CNVs) in ancient genomes (LYCEUM!).

Thank you for attending and listening to the excellent presentations at HiTSeq! HiTSeq aims to leverage the latest advances in high-throughput sequencing algorithms. We look forward to seeing you next year! Don’t forget to follow us on social media for more updates!

Twitter (X) @HiTSeq, BlueSky @hitseq.bsky.social

iRNA

The first day of the iRNA COSI started with a great keynote by Steven West from the University of Exeter, who discussed transcription attrition and explained why the decision between productive elongation and premature termination involves two complexes, Integrator and Restrictor, each monitoring different stages of transcription and providing a sequential verification mechanism. We had several talks from abstracts on methods for the identification, regulation, and function of diverse noncoding RNAs, including circular RNAs, miRNAs, snoRNAs, tRNAs, and lncRNAs. These were followed by talks on approaches for designing different flavours of CRISPR/Cas systems, RNA editing detection, and mRNA design, as well as our second keynote, given by Roser Vento-Tormo from the Wellcome Sanger Institute, who showed beautiful work using single-cell sequencing and highlighted the importance of spatial data for studying and comparing the female reproductive tract throughout development and adulthood. The day ended with the iRNA dinner at the ISMB networking event at Punch Tarmey’s and the traditional iRNA quiz, which ended in a tie between ChatGPT and a team of dedicated students.

The second day covered diverse topics, including predicting and characterizing split open reading frames, RBP binding, intron retention, RNA-DNA interactions, pseudouridine levels, isoform quantification, and cryptic exons. We had a special session on ‘Building the future of RNA tools’ with a keynote by Blake Sweeney from EMBL-EBI and talks from abstracts on current tools for characterizing RNA biology, finishing with a thought-provoking panel discussion with Ana Conesa (University of Valencia), Jan Gorodkin (University of Copenhagen), Lina Ma (Beijing Institute of Genomics), and Yaron Orenstein (Bar-Ilan University). Responding to numerous questions from the audience, the panel highlighted the importance of generating more high-quality data to study different RNA characteristics, particularly RNA structure, and of better covering the whole RNA space. Our last keynote of the day was given by Yiliang Ding from the John Innes Centre, who discussed the tools and approaches her group has built to study and characterize functional and dynamic RNA structures, including G-quadruplexes, which her group found serve as dynamic cold sensors.

JPI

ISCB Junior PI Community Grows at ISMB

Junior PIs met at ISMB to network and discuss upcoming activities. The meetup was exciting, welcoming new members and ideas for the community. Leading a research group or projects as an independent researcher? Join our Slack workspace: https://tinyurl.com/jpislack

If you have questions, let us know: [email protected]

How to Receive Your Certificate of Participation/Presentation

As with each year's conference, you will be able to receive a certificate of participation/presentation for ISMB/ECCB 2025. The certificate will be accessible upon completion of the conference survey, which was sent out this evening after the last keynote. The email subject line is "ISMB/ECCB 2025 Conference Survey - Your feedback requested".

A Gentle Reminder

To help us protect the integrity of the conference certificates and to prevent potential misuse, we kindly ask that you do not share images of your certificate on any social media platform.

ISMB/ECCB 2025 On-Demand

Remember that sessions have been recorded (with author permissions) and will be part of the ISMB/ECCB 2025 On-Demand library on Nucleus.

After the conference, we will work to edit the live session recordings into individual videos. All registered participants of ISMB/ECCB will have exclusive access to the conference content and will be able to log in at any time to view recordings of any talks they’d like to see again!