Attention Presenters - please review the Speaker Information Page available here
Schedule subject to change
All times listed are in BST
Wednesday, July 23rd
11:20-12:00
Invited Presentation:
Format: In person

Moderator(s): Alberto Riva


Authors List:

  • Damian Dalle Nogare
12:00-13:00
Panel: The rise of computational imaging
Format: In person

Moderator(s): Alberto Riva


Presentation Overview:

Thanks in part to the popularity of spatial transcriptomics, many of us now face challenges that imaging data can help address or solve, or the need to combine images with other forms of data. How can we leverage the robust tools of computational imaging to help our collaborators and solve problems in our projects?

14:00-15:00
Panel: The practical use of AI in cores
Format: In person

Moderator(s): TBA


Presentation Overview:

It’s here and it’s being used. How do we use it to good effect, and how do we teach our collaborators to use it? This topic could include not only generative AI for code but also the use of foundation models in single-cell analysis and related topics.

15:00-15:20
Benchmarking Variant-Calling Workflows: The nf-core/variantbenchmarking Pipeline within the GHGA Framework
Format: In person

Moderator(s): TBA


Authors List:

  • Kübra Narcı, DKFZ, Germany

Presentation Overview:

The nf-core/variantbenchmarking pipeline (https://github.com/nf-core/variantbenchmarking) is a versatile and comprehensive workflow designed to benchmark variant-calling tools across various use cases. Developed as part of the German Human Genome-Phenome Archive (GHGA) project, this pipeline supports the evaluation of small variants, insertions and deletions (indels), and structural variants for both germline and somatic samples.
Users can leverage publicly available truth datasets, such as Genome in a Bottle or SEQC2, for benchmarking or provide custom VCF files with or without specific regions of interest. The pipeline supports diverse normalization methods, including variant splitting, deduplication, left or right alignment, filtration, and different benchmarking tools such as hap.py, RTG Tools, Truvari, SVAnalyzer, or Witty.er. This flexibility enables tailored analyses to meet specific research needs. The workflow generates detailed performance metrics, such as precision, recall, and F1 scores, allowing researchers to accurately assess the strengths and limitations of their variant-calling workflows.
GHGA’s architecture is built on cloud computing infrastructures and includes an ethico-legal framework to ensure data protection compliance. GHGA enables researchers to conduct reproducible, rigorous, and secure research by standardizing bioinformatics workflows and governing reusability through harmonized metadata schemas.
Built using Nextflow, the nf-core/variantbenchmarking pipeline is scalable, reproducible, and compatible with diverse computational environments, including local systems, high-performance clusters, and cloud platforms. This ensures seamless integration with secure platforms like GHGA for smooth benchmarking analyses. Additionally, the pipeline is fully open source and adheres to nf-core community guidelines, ensuring high-quality, reviewed code, modularity, and extensibility.
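
As a quick illustration of the reported metrics (not part of the pipeline itself), the sketch below shows how precision, recall, and F1 are derived from the true-positive, false-positive, and false-negative counts that comparison engines such as hap.py produce.

```python
# Illustrative only: how precision, recall, and F1 relate to the
# TP/FP/FN counts reported by variant benchmarking tools.
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and F1 from variant comparison counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: 9,500 calls matched the truth set, 300 calls were absent from
# the truth set, and 500 truth variants were missed.
print(benchmark_metrics(tp=9500, fp=300, fn=500))
```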

Assembly Curator: rapid and interactive consensus assembly generation for bacterial genomes
Confirmed Presenter: Thomas Roder, Interfaculty Bioinformatics Unit, University of Bern, Switzerland

Format: In person

Moderator(s): TBA


Authors List:

  • Thomas Roder, Interfaculty Bioinformatics Unit, University of Bern, Switzerland
  • Rémy Bruggmann, Interfaculty Bioinformatics Unit, University of Bern, Switzerland

Presentation Overview:

Introduction
Long-read sequencing technologies enable the generation of near-complete bacterial genome assemblies. However, no de novo assembler is perfect: duplicated or missing plasmids, spurious contigs, and failures to circularize sequences remain common. Achieving optimal results still requires manual consensus generation. While tools like Trycycler simplify this process, they are labor-intensive and require command-line expertise. With the ability to sequence hundreds of datasets quickly and affordably, there is a growing need for faster, more accessible solutions.

Methods
Here, we present Assembly Curator, a platform that (i) imports multiple assemblies, (ii) clusters contigs, and (iii) facilitates interactive comparison and selection through a user-friendly graphical interface. The software has a plug-in system to enable the import of data produced by different assemblers. Assembly Curator enables on-the-fly calculation of dotplots and can submit contig subsequences directly to NCBI’s BLAST servers for approximate taxonomic identification, aiding in contamination assessment. It generates standardized and informative headers in FASTA files that are directly compatible with the NCBI annotation pipeline PGAP.
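
As a rough sketch of the dotplot idea (hypothetical code, not Assembly Curator's implementation), two contigs can be compared by indexing the k-mers of one sequence and emitting the positions of shared k-mers:

```python
# Minimal sketch (not Assembly Curator's actual code) of computing
# dotplot points between two contigs from shared k-mer positions.
from collections import defaultdict

def dotplot_points(seq_a: str, seq_b: str, k: int = 16):
    """Yield (x, y) positions where seq_a and seq_b share a k-mer."""
    index = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], ()):
            yield i, j  # a match at (position in A, position in B)

# Toy example: two sequences sharing a 40 bp segment.
a = "ACGTACGTGGCCTTAACGGT" * 3
b = "TTTT" + a[10:50] + "GGGG"
print(sum(1 for _ in dotplot_points(a, b)))
```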

Results and Discussion
Assembly Curator enables the semi-automatic processing of hundreds of genomes in just a few hours, significantly reducing manual effort while maintaining high assembly completeness and accuracy.
Moreover, the browser-based UI enables biologists who lack programming skills but have domain-specific knowledge to perform or participate in the curation process, potentially leading to superior results.

15:20-15:40
Long Read Sequencing at Genomics England
Format: In person

Moderator(s): TBA


Authors List:

  • Adam Giess, Genomics England, United Kingdom

Presentation Overview:

At Genomics England, the Scientific R&D Team is evaluating the potential role of long-read technologies in clinical whole genome sequencing. Long-read technologies such as those developed by Oxford Nanopore offer the promise of comprehensive whole genome sequencing, providing nucleotide variants and epigenetic modifications alongside the potential to resolve large-scale variation and to uncover previously inaccessible parts of the genome. Such a comprehensive view of the genome is particularly appealing in clinical settings, and with platforms like the Oxford Nanopore PromethION sequencer, long-read sequencing at scale has become a realistic prospect. However, large publicly available clinical long-read datasets are still scarce, which presents a problem both for assessing the technologies themselves and for developing tools to get the most from them. Here we present our experiences with Oxford Nanopore PromethION sequencing at Genomics England, moving from pilot studies to projects involving thousands of participants across rare disease, cancer, and diverse ancestries. We present our long-read datasets and the steps we took to generate them, highlighting the challenges unique to this developing technology and the solutions we adopted on our journey to long-read sequencing at scale.

Autonomous Genomics Analysis in Persist-SEQ
Format: In person

Moderator(s): TBA


Authors List:

  • Anil S. Thanki, EMBL-EBI, United Kingdom
  • Pablo Moreno, AstraZeneca, United Kingdom
  • Ultan McDermott, AstraZeneca, United Kingdom

Presentation Overview:

The analysis of large-scale biological datasets poses considerable challenges, particularly in managing data complexity, ensuring reproducibility, and reducing manual intervention. Traditional data processing pipelines often suffer from scalability issues, susceptibility to errors, and inconsistent reproducibility across computing environments.

The Persist-SEQ consortium, comprising multiple partner institutions, is focused on generating and analyzing single-cell sequencing data to investigate early persister tumor cells in cancer treatment. Like many collaborative efforts, the consortium encounters the limitations of conventional data processing approaches.
To overcome these barriers, we have developed a fully automated, scalable, and reproducible infrastructure tailored for high-throughput genomics analysis. The system leverages Kubernetes, Jenkins, and Galaxy. The platform automates data retrieval from AWS, constructs Galaxy data libraries, and executes predefined single-cell analysis workflows with no manual intervention. Jenkins coordinates the end-to-end workflow, from data ingestion through to results delivery, while Kubernetes ensures a consistent, portable execution environment across deployments. Galaxy provides an intuitive interface for executing reproducible analytical workflows, as well as giving users access to data.

For enhanced operational transparency, the system integrates with Slack to deliver real-time status updates and error alerts, facilitating prompt monitoring and resolution. Currently deployed on the secure EBI Embassy Cloud, the infrastructure offers robust performance, data security, and efficient resource utilization. Our platform has been successfully implemented within the Persist-SEQ consortium, demonstrating its ability to streamline transcriptomic data analysis, enhance reproducibility, and reduce operational overhead. This approach represents a scalable and reliable solution for the evolving demands of modern biological research.
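
As an illustration of the Slack integration pattern described above (a hedged sketch; the webhook URL and message format here are hypothetical, not the consortium's actual configuration):

```python
# Hedged sketch: sending pipeline status updates to Slack via an
# incoming webhook. URL and message format are illustrative only.
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"  # hypothetical

def notify_slack(stage: str, status: str, detail: str = "") -> None:
    """Post a short status message for a workflow stage to a Slack channel."""
    text = f"[{stage}] {status}" + (f": {detail}" if detail else "")
    resp = requests.post(SLACK_WEBHOOK_URL, json={"text": text}, timeout=10)
    resp.raise_for_status()  # surface HTTP errors so failures are visible

# Example usage inside an orchestration step:
# notify_slack("data-ingestion", "completed", "42 FASTQ files staged")
```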

15:40-16:00
Advancing The Expression Atlas Resources: A Scalable Single-Cell Transcriptomics Pipeline to Facilitate Scientific Discoveries
Format: In person

Moderator(s): TBA


Authors List:

  • Iris Diana Yu, EMBL-EBI, United Kingdom
  • Pedro Madrigal, EMBL-EBI, United Kingdom
  • Anil Thanki, EMBL-EBI, United Kingdom
  • Christina Ernst, EMBL-EBI, United Kingdom

Presentation Overview:

The Expression Atlas (https://www.ebi.ac.uk/gxa) and Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc) are EMBL-EBI knowledgebases for gene and protein expression across tissues and cells. They provide standardised re-analysis of high-quality RNA-seq and single-cell RNA-seq (scRNA-seq) datasets, respectively. Both pipelines use open-source tools for quantification, aggregation, and downstream analysis, with workflows publicly available. Their web interfaces support dataset exploration via gene and metadata queries. Current datasets span animal, plant, fungal, and protist species, and integrate data from major public archives, including the International Cancer Genome Consortium, the COVID-19 Data Portal, and the Human Cell Atlas.

Over the past year, the Single Cell Expression Atlas (SCEA) has undergone significant changes. First, it can now consume and partially re-analyse pre-processed data, enabling the inclusion of studies that cannot provide raw data due to various constraints. The SCEA pipeline was also recently restructured into a fully end-to-end Nextflow pipeline, replacing a hybrid of Nextflow and Galaxy. This redesign improves maintainability and enables faster adaptation to advances in single-cell transcriptomics. Additionally, the post-quantification analysis workflow has been enhanced with new features that improve its portability and compatibility with non-Atlas workflows, supporting FAIR data principles. Key improvements include modularised processes, automated testing for continuous integration, full containerisation of the analysis environment, and support for different usage scenarios. Ongoing feature development aims to further modernise the pipeline and broaden its utility within the scientific community.
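
As a hedged illustration of the kind of automated check that supports continuous integration in such a pipeline (the normalisation function below is hypothetical, not part of the SCEA codebase):

```python
# Hypothetical example of an automated test suitable for a CI suite;
# this function is illustrative and not from the actual SCEA pipeline.
import numpy as np

def cpm_normalise(counts: np.ndarray) -> np.ndarray:
    """Counts-per-million normalisation of a cells-by-genes count matrix."""
    totals = counts.sum(axis=1, keepdims=True)
    return counts / np.clip(totals, 1, None) * 1e6

def test_cpm_rows_sum_to_one_million():
    counts = np.array([[10.0, 90.0], [1.0, 3.0]])
    assert np.allclose(cpm_normalise(counts).sum(axis=1), 1e6)
```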

Mixed effects models applied to single nucleus RNA-seq data identify cell types associated with animal-level pathological traits of Alzheimer’s disease
Format: In person

Moderator(s): TBA


Authors List:

  • Ayushi Agrawal, Gladstone Institutes, United States
  • Michela Traglia, Gladstone Institutes, United States
  • Nicole Koutsodendris, Gladstone Institutes; University of California, San Francisco, United States
  • Yadong Huang, Gladstone Institutes; University of California, San Francisco, United States
  • Reuben Thomas, Gladstone Institutes, United States

Presentation Overview:

Apolipoprotein E4 (APOE4) is the strongest genetic risk factor for Alzheimer’s disease (AD). Although neuronal APOE4 expression is induced under conditions of stress or injury, its role in AD pathogenesis remains unclear. To investigate this, we analyzed single-nucleus RNA-seq data from APOE4 knock-in mice expressing the human Tau-P301S mutant, alongside animal-level measurements of tau pathology, neurodegeneration, and myelin deficits.

We applied generalized linear mixed effects models to test associations between transcriptionally defined cell-type proportions and neuropathological severity. Our analysis identified disease-associated subpopulations of neurons, oligodendrocytes, astrocytes, and microglia, whose relative abundance correlated with increased tau pathology, neuronal loss, and myelin disruption. These findings suggest that specific cellular populations track with AD-related pathologies in the presence of neuronal APOE4.
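
A simplified sketch of the modelling idea follows, with synthetic data and hypothetical column names; a linear mixed model on logit-transformed proportions stands in here for the generalized linear mixed effects models used in the study.

```python
# Hedged sketch: relate a cell type's per-sample proportion to a pathology
# score with a mixed model. Data and column names are synthetic/hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "animal_id": rng.integers(0, 12, n),        # which animal each sample is from
    "tau_pathology": rng.normal(size=n),        # pathology severity score
    "proportion": rng.uniform(0.01, 0.4, n),    # cell-type proportion per sample
})

# Logit-transform proportions so a linear mixed model is a reasonable stand-in.
df["logit_prop"] = np.log(df["proportion"] / (1 - df["proportion"]))

# Random intercept per animal accounts for samples nested within animals.
model = smf.mixedlm("logit_prop ~ tau_pathology", df, groups=df["animal_id"])
print(model.fit().summary())
```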

To evaluate the validity of inferring absolute abundance changes from single-nucleus RNA-seq data, we generated a simulated dataset with known ground truth cell-type compositions and pathology metrics. By benchmarking multiple normalization strategies, we determined the conditions under which statistical inference of compositional changes is reliable.

Together, our results underscore the utility of mixed-effects models for integrating single-nucleus transcriptomics with phenotypic data and highlight how normalization choices critically influence biological interpretation of cell-type shifts in complex tissues like the brain.

Optimizing Clustering Resolution for Multi-subject Single Cell Studies
Format: In person

Moderator(s): TBA


Authors List:

  • Natalie Gill, Gladstone Institutes, United States
  • Min-Gyoung Shin, Gladstone Institutes, United States
  • Ayushi Agrawal, Gladstone Institutes, United States
  • Reuben Thomas, Gladstone Institutes, United States

Presentation Overview:

Increasingly, single-cell omics analysis is being performed on large cohorts of patients and model organisms, and modularity-based graph clustering algorithms are used to identify cell types and states across all subjects. Selecting the clustering resolution parameter is often based on the concentration of cell-type marker gene expression within clusters, increasing the parameter as needed to resolve clusters with mixed cell-type gene signatures. This approach, however, is subjective when one lacks complete knowledge of condition- or disease-associated cell types in the context of novel biology; it is also time-consuming and can bias the final clustering results because of individual transcriptomic heterogeneity and subject-specific differences in cell composition. We introduce clustOpt, a method that improves the reproducibility of modularity-based clustering in multi-subject experiments by using a combination of subject-wise cross-validation, feature splitting, random forests, and silhouette-based measures of cluster quality to guide the selection of the resolution parameter. We describe results from benchmarking this method on the Asian Immune Diversity Atlas dataset.
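
A minimal sketch of the underlying idea (not the clustOpt implementation, which additionally uses subject-wise cross-validation, feature splitting, and random forests): scan Leiden resolutions and score each clustering with the silhouette metric. This assumes scanpy with leidenalg installed and uses a bundled example dataset.

```python
# Minimal sketch, not clustOpt itself: scan Leiden resolutions and score
# each clustering with the silhouette metric in PCA space.
import scanpy as sc
from sklearn.metrics import silhouette_score

adata = sc.datasets.pbmc3k_processed()      # example data with a PCA embedding
sc.pp.neighbors(adata, use_rep="X_pca")     # graph needed for Leiden clustering

scores = {}
for res in (0.2, 0.4, 0.6, 0.8, 1.0):
    key = f"leiden_{res}"
    sc.tl.leiden(adata, resolution=res, key_added=key)
    labels = adata.obs[key].astype(str)
    if labels.nunique() > 1:                # silhouette needs >= 2 clusters
        scores[res] = silhouette_score(adata.obsm["X_pca"], labels)

best = max(scores, key=scores.get)
print(f"Best resolution by silhouette: {best} ({scores[best]:.3f})")
```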

16:40-17:00
GEO Uploader: Simplifying data deposition in the GEO repository
Format: In person

Moderator(s): Madelaine Gogol


Authors List:

  • Ronald Domi, University of Zurich, Switzerland
  • Falko Noé, ETH Zurich / University of Zurich, Switzerland
  • Peter Leary, ETH Zurich / University of Zurich, Switzerland
  • Hubert Rehrauer, ETH Zurich / University of Zurich, Switzerland

Presentation Overview:

Introduction
Making data FAIR is a key step in every research project. For NGS data, the GEO and ENA repositories provide long-term storage and open access and are widely adopted in the research community. However, transferring the data and compiling the metadata appropriately is still a manual activity that can be cumbersome for massive NGS datasets.

Methods
We implemented a Python-based web application that performs the data upload for users and compiles the metadata in an appropriate way. The GEO Uploader can be run standalone but, in our environment, is tightly integrated with our SUSHI web framework for reproducible, web-based analysis of sequencing data.

Results
The GEO Uploader is running at our center at https://geo-uploader.fgcz.uzh.ch/, and close to 50 datasets have already been uploaded to GEO. The uploader collects the files, generates MD5 sums, transfers the data, and compiles the Excel table needed to provide the meta-information. It fills in protocol information automatically based on the data and lets users enter other information through a convenient web interface. It currently supports bulk RNA-seq as well as single-cell RNA-seq data.
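
As a minimal sketch of one step the uploader performs (illustrative code, not the uploader's actual implementation), MD5 checksums can be streamed over large files before transfer:

```python
# Illustrative sketch: compute MD5 checksums for sequencing files in
# chunks, so large FASTQ/BAM files never need to fit in memory.
import hashlib
from pathlib import Path

def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

for fastq in sorted(Path("raw_data").glob("*.fastq.gz")):  # hypothetical folder
    print(f"{md5sum(fastq)}  {fastq.name}")
```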

Discussion
Our GEO Uploader contributes to the community-wide adoption of Open Research Data (ORD) best practices. It invites researchers to make data available early in the research process, and it simplifies and speeds up this step.

Enhancing Bioinformatics Workflows with Analytical Visualization Tools
Confirmed Presenter: Carlos Prieto, Bioinformatics Service, Nucleus, University of Salamanca, Spain

Format: In person

Moderator(s): Madelaine Gogol


Authors List:

  • Carlos Prieto, Bioinformatics Service, Nucleus, University of Salamanca, Spain
  • David Barrios, Bioinformatics Service, Nucleus, University of Salamanca, Spain

Presentation Overview:

Modern front-end development technologies have enabled the creation of new visual analytics tools. These methodologies allow data to be visualized in a web browser in an interactive and dynamic way. The development of new visualization tools is essential for the effective exploration and interpretation of datasets and results produced by bioinformatics analysis techniques.
This work presents programming methodologies and analytical visualization solutions that have been applied to the analysis of high-throughput sequencing data. They were developed using new web visualization technologies and a development architecture called LAMPR (Linux, Apache, MySQL, PHP, R). The following bioinformatics tools will be presented:
- RJSplot: A collection of 17 interactive plots implemented in R.
- D3GB: An interactive genome browser.
- Looking4clusters: A tool for the interactive visualization of single-cell data.
- Rvisdiff: Analytical visualization of differential expression results.
- RaNA-Seq: A web-based platform for the analysis and visualization of RNA-Seq data.
- MutationMinning: A self-analytical interface for the exploration of DNA resequencing results.

The use of interactive and dynamic visualization tools enhances the interpretation of complex datasets and enables study designers or wet lab members to work toward a deeper understanding of their data.

Competency framework profiles to reflect career progression within bioinformatics core facility scientists
Format: In person

Moderator(s): Madelaine Gogol


Authors List:

  • Patricia Carvajal-López, EMBL's European Bioinformatics Institute, United Kingdom
  • Marta Lloret-Linares, EMBL's European Bioinformatics Institute, United Kingdom
  • Cath Brooksbank, EMBL's European Bioinformatics Institute, United Kingdom

Presentation Overview:

There is an expanding need for specialised services from Bioinformatics Core Facilities (BCFs). Providing services for these infrastructures requires highly trained specialists; however, their ill-defined career pathways and highly specialised skill sets often hinder their efforts to progress in their professions.

To address this challenge, members of the ISCB’s Bioinfo-Core group, the Curriculum Task Force of the ISCB Education Committee, and other interested individuals joined forces at the 2023 ISMB Bioinfo-Core meeting to create the ‘Bioinformatics Core Facility Scientists Competencies Taskforce’ (https://sites.google.com/ebi.ac.uk/bioinfocore-competencies). The taskforce worked to provide a benchmark for reflecting the knowledge, skills and attitudes required by professionals in BCFs, and to provide a potential template for career progression for BCF scientists.

This benchmark was developed as an extension of the ISCB Competency Framework (https://competency.ebi.ac.uk), which defines a ‘minimum standard’ for a generic, mid-career BCF scientist (and for several other distinct career profiles). The outcome of this work was the addition of six BCF scientist-focused competencies (project management, people management, collaborator engagement, users and service, training, and leadership) to the thirteen that already exist. We also created four different professional profiles for BCF scientists, outlining a potential transition from entry level to a managerial role.

The development of a well-defined, competency-based career pathway, along with training for this community, is essential to support career progression of BCF specialists who, in return, enable research and development within the life sciences.

17:00-18:00
Panel: Breakout Groups
Format: In person

Moderator(s): Madelaine Gogol


Presentation Overview:

Our unconferencing event: attendees will break into groups based on topics of interest for further discussion with other core members.