Attention Presenters: please review the Speaker Information Page
Schedule subject to change
All times listed are in BST
Thursday, July 24th
8:40-8:45
Opening
Room: 04AB
Format: In person


Authors List:

  • Qianwen Wang, Zeynep Gumus
8:45-9:40
Invited Presentation: TBD
Room: 04AB
Format: In person


Authors List:

  • Kay Nieselt
9:40-10:00
GENET: AI-Powered Interactive Visualization Workflows to Explore Biomedical Entity Networks
Confirmed Presenter: Bum Chul Kwon, IBM Research, United States

Room: 04AB
Format: In person


Authors List:

  • Bum Chul Kwon, IBM Research, United States
  • Natasha Mulligan, IBM Research, Ireland
  • Joao Bettencourt-Silva, IBM Research, Ireland
  • Ta-Hsin Li, IBM Research, United States
  • Bharath Dandala, IBM Research, United States
  • Feng Lin, Cleveland Clinic, United States
  • Pablo Meyer, IBM Research, United States
  • Ching-Huei Tsou, IBM Research, United States

Presentation Overview:

Formulating experimental hypotheses that test the association between SNPs and diseases involves logical reasoning derived from prior observations, followed by the labor-intensive process of collecting and analyzing relevant literature to assess scientific plausibility and viability. AI models trained on previous association data (e.g., the GWAS Catalog) can help infer potential associations between SNPs and diseases, but scientists still need to manually collect and inspect the evidence for such predictions from prior literature. To alleviate this burden, we introduce an AI-enhanced, end-to-end visual analytics workflow called GENET, which aims to help scientists discover SNP-target associations, collect evidence from scientific literature, extract knowledge as biomedical entity networks, and interactively explore them using visualizations. The workflow consists of the following four steps, where each step’s output serves as the input for the next step: 1) biomedical network analysis: identify interesting genes/SNPs that are associated with a target disease through indirectly connected genes/SNPs using a neural network; 2) literature evidence mining pipeline: collect relevant literature on the target diseases or the inferred genes/SNPs, and extract biomedical entities and their relations from the collection using large language models; 3) clustering: cluster the extracted entities and relations by generating embeddings using pre-trained biomedical language models (e.g., BioBERT, BioLinkBERT); 4) interactive visualizations: visualize the clusters of biomedical entities and their networks and provide interactive handles for exploration. The workflow enables users to iteratively formulate and test hypotheses involving SNPs/genes and diseases against evidence from scientific literature and databases and gain novel insights.
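The clustering step (3) can be illustrated with a minimal sketch: given entity embedding vectors (stand-ins here for BioBERT/BioLinkBERT outputs), a greedy cosine-similarity pass groups similar entities together. This is illustrative only, not GENET's actual algorithm; the function name `cluster_embeddings` and the 0.8 threshold are assumptions.

```python
import numpy as np

def cluster_embeddings(embeddings, threshold=0.8):
    """Greedy cosine-similarity clustering: each vector joins the first
    cluster whose centroid is similar enough, else it starts a new cluster."""
    centroids, members, labels = [], [], []
    for vec in embeddings:
        v = vec / np.linalg.norm(vec)  # unit-normalize so dot product = cosine
        placed = False
        for i, c in enumerate(centroids):
            if np.dot(v, c / np.linalg.norm(c)) >= threshold:
                members[i].append(v)
                centroids[i] = np.mean(members[i], axis=0)  # update centroid
                labels.append(i)
                placed = True
                break
        if not placed:
            centroids.append(v)
            members.append([v])
            labels.append(len(centroids) - 1)
    return labels
```

With three toy 2-D "embeddings", the two nearly parallel vectors share a cluster while the orthogonal one does not: `cluster_embeddings(np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]))` returns `[0, 0, 1]`.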

11:20-11:40
Prostruc: an open-source tool for 3D structure prediction using homology modeling
Room: 04AB
Format: In person


Authors List:

  • Shivani Pawar, Department of Biotechnology and Bioinformatics, Deogiri College, Aurangabad, Maharashtra, India
  • Wilson Sena Kwaku Banini, Department of Theoretical and Applied Biology, Kwame Nkrumah University of Science and Technology, Ghana
  • Musa Muhammad Shamsuddeen, Department of Public Health, Faculty of Health Sciences, National Open University of Nigeria, Abuja, Nigeria
  • Toheeb A Jumah, School of Collective Intelligence, University Mohammed VI Polytechnic, Rabat, Morocco
  • Nigel N O Dolling, Department of Parasitology, Noguchi Memorial Institute for Medical Research, University of Ghana, Accra, Ghana
  • Abdulwasiu Tiamiyu, School of Collective Intelligence, University Mohammed VI Polytechnic, Rabat, Morocco
  • Olaitan I. Awe, African Society for Bioinformatics and Computational Biology, Cape Town, South Africa

Presentation Overview:

Homology modeling is a widely used computational technique for predicting the three-dimensional (3D) structures of proteins based on known templates and evolutionary relationships, providing structural insights critical for understanding protein function, interactions, and potential therapeutic targets. However, existing tools often require significant expertise and computational resources, presenting a barrier for many researchers.
Prostruc is a Python-based homology modeling tool designed to simplify protein structure prediction through an intuitive, automated pipeline. Integrating Biopython for sequence alignment, BLAST for template identification, and ProMod3 for structure generation, Prostruc streamlines complex workflows into a user-friendly interface. The tool enables researchers to input protein sequences, identify homologous templates from databases such as the Protein Data Bank (PDB), and generate high-quality 3D structures with minimal computational expertise. Prostruc implements a two-stage validation process: first, it uses TM-align for structural comparison, assessing root mean square deviation (RMSD) and TM-scores against reference models; second, it evaluates model quality via QMEANDisCo to ensure high accuracy.
The top five models are selected based on these metrics and provided to the user. Prostruc stands out by offering scalability, flexibility, and ease of use. It is accessible via a cloud-based web interface or as a Python package for local use, ensuring adaptability across research environments. Benchmarking against existing tools like SWISS-MODEL, I-TASSER, and Phyre2 demonstrates Prostruc's competitive performance in terms of structural accuracy and job runtime, while its open-source nature encourages community-driven innovation.
Prostruc is positioned as a significant advancement in homology modeling, making high-quality protein structure prediction more accessible to the scientific community.
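The RMSD metric used in the validation stage can be sketched generically: the Kabsch algorithm finds the optimal rotation superimposing two coordinate sets and reports the residual deviation. This is a plain numpy sketch of the standard algorithm, not Prostruc's TM-align-based implementation; `kabsch_rmsd` is an illustrative name.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two Nx3 coordinate sets after optimal superposition
    (Kabsch algorithm): center both sets, find the best rotation via SVD,
    then measure the remaining per-atom deviation."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                       # cross-covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])        # guard against improper rotations
    R = Vt.T @ D @ U.T                # optimal rotation mapping P onto Q
    P_rot = P @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1))))
```

A set rotated by any rigid rotation has RMSD ~0 against the original, which is a handy sanity check for such an implementation.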

11:40-12:00
Automatic Generation of Natural Language Descriptions of Genomics Data Visualizations for Accessibility and Machine Learning
Room: 04AB
Format: In person


Authors List:

  • Thomas C. Smits, Harvard Medical School, MA, USA, United States
  • Sehi L'Yi, Harvard Medical School, MA, USA, United States
  • Andrew P. Mar, University of California, Berkeley, USA, United States
  • Nils Gehlenborg, Harvard Medical School, MA, USA, United States

Presentation Overview:

Availability of multimodal representations, i.e., visual and textual, is crucial for both information accessibility and construction of retrieval systems and machine learning (ML) models. Interactive data visualizations, omnipresent in data analysis tools and data portals, are key to accessing biomedical knowledge and detecting patterns in large datasets. However, large-scale ML models for generating descriptions of visualizations are limited and cannot handle the complexity of data and visualizations in fields like genomics. Generating accurate descriptions of complex interactive genomics visualizations remains an open challenge. This limits both access for blind and visually impaired users, and the development of multimodal datasets for ML applications. Grammar-based visualizations offer a unique opportunity. Since specifications of visualization grammars contain structured information about visualizations, they can be used to generate text directly, rather than interpreting the rendered visualization, potentially resulting in more precise descriptions.

We present AltGosling, an automated description generation tool focused on interactive visualizations of genome-mapped data, created with grammar-based toolkit Gosling. AltGosling uses a logic-based algorithm to create descriptions in various forms, including a tree-structured navigable panel for keyboard accessibility, and visualization-text pairs for ML training. We show that AltGosling outperforms state-of-the-art large language models and image-based neural networks for text generation of genomics data visualizations. AltGosling was adopted in our follow-up study to construct a retrieval system for genomics visualizations combining different modalities (specification, image, and text). As a first in genomics research, we lay the groundwork for building multimodal resources, improving accessibility, and enabling integration of biomedical visualizations and ML.
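The core idea — generating text directly from a structured specification rather than interpreting rendered pixels — can be sketched in a few lines. The field names (`mark`, `x`, `y`) are simplified stand-ins, not the actual Gosling schema, and this is not AltGosling's logic-based algorithm.

```python
def describe_track(track):
    """Render a one-sentence description from a toy declarative spec.
    Because the spec is structured, no image interpretation is needed."""
    mark = track.get("mark", "mark")
    x = track.get("x", {}).get("field", "genomic position")
    y = track.get("y", {}).get("field", "value")
    return f"A {mark} chart showing {y} along {x}."
```

For example, `describe_track({"mark": "bar", "x": {"field": "position"}, "y": {"field": "coverage"}})` yields "A bar chart showing coverage along position." — a precise description derived purely from the specification.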

12:00-12:20
Can LLMs Bridge Domain and Visualization? A Case Study on High-Dimension Data Visualization in Single-Cell Transcriptomics
Room: 04AB
Format: In person


Authors List:

  • Qianwen Wang, University of Minnesota, United States
  • Xinyi Liu, Northeastern University, United States
  • Nils Gehlenborg, Harvard Medical School, United States

Presentation Overview:

While many visualizations are built for domain users (biologists), understanding how visualizations are used in the domain has long been a challenging task. Previous research has relied on either interviewing a limited number of domain users or reviewing relevant application papers in the visualization community, neither of which provides comprehensive insight into how visualizations are used in the wild in a specific domain. This paper aims to fill this gap by examining the potential of using Large Language Models (LLMs) to analyze visualization usage in domain literature. We use high-dimension (HD) data visualization in single-cell transcriptomics as a test case, analyzing 1,203 papers that describe 2,056 HD visualizations with highly specialized domain terminologies (e.g., biomarkers, cell lineage). To facilitate this analysis, we introduce a human-in-the-loop LLM workflow that can effectively analyze a large collection of papers and translate domain-specific terminology into standardized data and task abstractions. Instead of relying solely on LLMs for end-to-end analysis, our workflow enhances analytical quality through 1) integrating image processing and traditional NLP methods to prepare well-structured inputs for three targeted LLM subtasks (i.e., translating domain terminology, summarizing analysis tasks, and performing categorization), and 2) establishing checkpoints for human involvement and validation throughout the process.
The analysis results, validated with expert interviews and a test set, revealed three often overlooked aspects in HD visualization: trajectories in HD spaces, inter-cluster relationships, and dimension clustering.
This research provides a stepping stone for future studies seeking to use LLMs to bridge the gap between visualization design and domain-specific usage.
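One of the human-involvement checkpoints described above might look like the following sketch: terms the LLM has normalized are auto-accepted when they match a controlled vocabulary, and everything else is queued for human review. The function and vocabulary are hypothetical illustrations, not the paper's implementation.

```python
def checkpoint(llm_outputs, vocabulary):
    """Split LLM-normalized terms into auto-accepted (found in the
    controlled vocabulary) and flagged-for-human-review (everything else).
    `vocabulary` is a set of lowercase canonical terms."""
    accepted, review = [], []
    for term in llm_outputs:
        (accepted if term.lower() in vocabulary else review).append(term)
    return accepted, review
```

Routing only the unmatched residue to humans is what keeps such a workflow scalable across a thousand-paper corpus.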

12:20-12:40
ClusterChirp: A GPU-Accelerated Web Platform for AI-Supported Interactive Exploration of High-Dimensional Omics Data
Room: 04AB
Format: In person


Authors List:

  • Osho Rawal, Icahn School Of Medicine At Mount Sinai, United States
  • Edgar Gonzalez-Kozlova, Icahn School Of Medicine At Mount Sinai, United States
  • Sacha Gnjatic, Icahn School Of Medicine At Mount Sinai, United States
  • Zeynep H. Gumus, Icahn School Of Medicine At Mount Sinai, United States

Presentation Overview:

Modern omics technologies generate high-dimensional datasets that overwhelm traditional visualization tools, requiring computational tradeoffs that risk losing important patterns. Researchers without computational expertise face additional barriers when tools demand specialized syntax or command-line proficiency, while connecting visual patterns to biological meaning typically requires manual navigation across platforms. To address these challenges, we developed ClusterChirp, a GPU-accelerated web platform for real-time exploration of data matrices containing up to 10 million values. The platform leverages deck.gl for hardware-accelerated rendering and optimized multi-threaded clustering algorithms that significantly outperform conventional methods. Its intuitive interface features interactive heatmaps and correlation networks that visualize relationships between biomarkers, with capabilities to dynamically cluster or sort data by various metrics, search for specific biomarkers, and adjust visualization parameters. Uniquely, ClusterChirp includes a natural language interface powered by a large language model (LLM), enabling interaction through conversational commands. The platform connects with biological knowledge bases for pathway and ontology enrichment analyses. ClusterChirp is being developed through iterative feedback from domain experts while adhering to FAIR principles, and will be freely available upon publication. By uniting performance, usability, and biological context, ClusterChirp empowers researchers to extract meaningful insights from complex omics data with unprecedented ease.
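The correlation-network view can be illustrated with a small sketch: compute pairwise Pearson correlations between biomarker rows of the data matrix and keep only strong edges. This is a generic numpy sketch, not ClusterChirp's GPU-accelerated implementation; the 0.9 threshold is an arbitrary assumption.

```python
import numpy as np

def correlation_edges(matrix, names, threshold=0.9):
    """Build a correlation network: nodes are biomarkers (rows of
    `matrix`), edges connect pairs whose |Pearson r| >= threshold."""
    r = np.corrcoef(matrix)  # pairwise correlations between rows
    edges = []
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):
            if abs(r[i, j]) >= threshold:
                edges.append((names[i], names[j], float(r[i, j])))
    return edges
```

In a real tool the resulting edge list would feed a force-directed layout; here it is just the network's adjacency data.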

12:40-13:00
Sketch, capture and layout Phylogenies
Confirmed Presenter: Daniel Huson, University of Tuebingen, Germany

Room: 04AB
Format: In person


Authors List:

  • Daniel Huson, University of Tuebingen, Germany

Presentation Overview:

Phylogenetic trees and networks play a central role in biology, bioinformatics, and mathematical biology, and producing clear, informative visualizations of them is an important task. We present new algorithms for visualizing rooted phylogenetic networks as either combining or transfer networks, in both cladogram and phylogram style. In addition, we introduce a layout algorithm that aims to improve clarity by minimizing the total stretch of reticulate edges. To address the common issue that biological publications often omit machine-readable representations of depicted trees and networks, we also provide an image-based algorithm for extracting their topology from figures. All algorithms are implemented in our new PhyloSketch app, which is open source and freely available at:
https://github.com/husonlab/phylosketch2.

PhageExpressionAtlas - a comprehensive transcriptional atlas of phage infections of bacteria
Confirmed Presenter: Maik Wolfram-Schauerte, University of Tübingen, Institute for Bioinformatics and Medical Informatics, Germany

Room: 04AB
Format: In person


Authors List:

  • Maik Wolfram-Schauerte, University of Tübingen, Institute for Bioinformatics and Medical Informatics, Germany
  • Caroline Trust, University of Tübingen, Institute for Bioinformatics and Medical Informatics, Germany
  • Nils Waffenschmidt, University of Tübingen, Institute for Bioinformatics and Medical Informatics, Germany
  • Kay Nieselt, University of Tübingen, Institute for Bioinformatics and Medical Informatics, Germany

Presentation Overview:

Bacteriophages (phages) are bacterial viruses that infect and lyse their hosts. Phages shape microbial ecosystems and have contributed essential tools for biotechnology and medical research. Their enzymes, takeover mechanisms, and interactions with their bacterial hosts are increasingly relevant, especially as phage therapy emerges to combat antibiotic resistance.
Therefore, a thorough understanding of phage-host interactions, especially on the transcriptional level, is key to unlocking their full potential. Dual RNA sequencing (RNA-seq) enables such insight by capturing gene expression in both phages and hosts across infection stages. While individual studies have revealed host responses and phage takeover strategies, comprehensive and systematic analyses remain scarce.
To fill this gap, we present the PhageExpressionAtlas, the first interactive resource for exploring phage-host interactions at the transcriptome level. We developed a unified analysis pipeline to process over 20 public dual RNA-seq datasets, covering diverse phage-host systems, including therapeutic and model phages infecting ESKAPE pathogens like Staphylococcus aureus and Pseudomonas aeruginosa. Users can visualize gene expression across infection phases, download datasets, and classify phage genes as early, middle, or late expressed using customizable criteria. Expression data can be explored via heat maps, profile plots, and in genome context, aiding functional gene characterization and phage genome analysis.
The PhageExpressionAtlas will continue to grow, integrating new datasets and features, including cross-phage/host comparisons and host transcriptome architecture analysis. We envision the PhageExpressionAtlas to become a central resource for the phage research community, fostering data-driven insights and interdisciplinary collaboration. The resource is available at phageexpressionatlas.cs.uni-tuebingen.de.
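The early/middle/late classification of phage genes can be sketched as a simple peak-time rule over an expression time course. The cutoff values here are arbitrary placeholders; the atlas itself lets users customize the classification criteria.

```python
def classify_gene(timepoints, expression, early_end=5, middle_end=15):
    """Label a phage gene 'early', 'middle', or 'late' by the time
    (e.g., minutes post infection) at which its expression peaks.
    The cutoffs are illustrative, not the atlas defaults."""
    peak_idx = max(range(len(expression)), key=expression.__getitem__)
    peak_time = timepoints[peak_idx]
    if peak_time <= early_end:
        return "early"
    if peak_time <= middle_end:
        return "middle"
    return "late"
```

For instance, a gene peaking at 10 minutes post infection would be labeled "middle" under these cutoffs.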

14:00-14:40
SEAL: Spatially-resolved Embedding Analysis with Linked Imaging Data
Room: 04AB
Format: In person


Authors List:

  • Simon Warchol, Harvard School of Engineering and Applied Sciences, Laboratory of Systems Pharmacology, Harvard Medical School, United States
  • Grace Guo, Harvard School of Engineering and Applied Sciences, Laboratory of Systems Pharmacology, Harvard Medical School, United States
  • Johannes Knittel, Harvard John A. Paulson School of Engineering and Applied Sciences, United States
  • Dan Freeman, Laboratory of Systems Pharmacology, Harvard Medical School, United States
  • Usha Bhalla, Harvard John A. Paulson School of Engineering and Applied Sciences, United States
  • Jeremy Muhlich, Laboratory of Systems Pharmacology, Harvard Medical School, United States
  • Peter K Sorger, Laboratory of Systems Pharmacology, Harvard Medical School, United States
  • Hanspeter Pfister, Harvard John A. Paulson School of Engineering and Applied Sciences, United States

Presentation Overview:

Dimensionality reduction techniques help analysts interpret complex, high-dimensional spatial datasets by projecting data attributes into two-dimensional space. For instance, when investigating multiplexed tissue imaging, these techniques help researchers identify and differentiate cell types and states. However, they abstract away crucial spatial, positional, and morphological contexts, complicating interpretation and limiting deeper biological insights. To address these limitations, we present SEAL, an interactive visual analytics system designed to bridge the gap between abstract 2D embeddings and their rich spatial imaging context. SEAL introduces a novel hybrid-embedding visualization that preserves morphological and positional information while integrating critical high-dimensional feature data. By adapting set visualization methods, SEAL allows analysts to identify, visualize, and compare selections—defined manually or algorithmically—in both the embedding and original spatial views, enabling richer interpretation of the spatial arrangement and morphological characteristics of entities of interest. To elucidate differences between selected sets, SEAL employs a scalable surrogate model to calculate feature importance scores, identifying the most influential features governing the position of objects within embeddings. These importance scores are visually summarized across selections, with mathematical set operations enabling detailed comparative analyses. We demonstrate SEAL’s effectiveness through two case studies with cancer researchers: colorectal cancer analysis with a pharmacologist and melanoma investigation with a cell biologist. We then illustrate broader cross-domain applicability by exploring multispectral astronomical imaging data with an astronomer. 
Implemented as a standalone tool or integrated seamlessly with computational notebooks, SEAL provides an interactive platform for spatially informed exploration of high-dimensional datasets, significantly enhancing interpretability and insight generation.
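The idea of scoring feature importance between two selections can be approximated by a much simpler stand-in: rank features by the standardized mean difference between the two selected sets. This is not SEAL's surrogate model, just a sketch of what such scores capture.

```python
import numpy as np

def feature_importance(X_a, X_b):
    """Rank features by absolute standardized mean difference between two
    selected sets (rows = objects, columns = features); a crude stand-in
    for surrogate-model importance scores."""
    mu_a, mu_b = X_a.mean(axis=0), X_b.mean(axis=0)
    pooled_sd = np.sqrt((X_a.var(axis=0) + X_b.var(axis=0)) / 2) + 1e-9
    scores = np.abs(mu_a - mu_b) / pooled_sd
    return np.argsort(scores)[::-1]  # feature indices, most important first
```

A feature that cleanly separates the two selections lands at the front of the ranking, mirroring how importance scores highlight what drives an embedding's structure.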

Nightingale - A collection of web components for visualizing protein related data
Confirmed Presenter: Swaathi Kandasaamy, UniProt - EMBL-EBI, United Kingdom

Room: 04AB
Format: In person


Authors List:

  • Swaathi Kandasaamy, UniProt - EMBL-EBI, United Kingdom
  • Daniel Rice, UniProt - EMBL-EBI, United Kingdom
  • Aurélien Luciani, UniProt - EMBL-EBI, United Kingdom
  • Adam Midlik, PDBe - EMBL-EBI, United Kingdom
  • Maria Martin, UniProt - EMBL-EBI, United Kingdom

Presentation Overview:

Nightingale is an open-source web visualization library for rendering protein-related data, including domains, sites, variants, structures, and interactions, using reusable web components. It employs a track-based approach, where sequences are represented horizontally and multiple tracks can be stacked vertically to visualize different annotations at the same position, aiding the discovery of relationships across annotations. This intuitive approach enhances the exploration and interpretation of complex biological data. Nightingale leverages the HTML5 Canvas API for improved performance, handling large datasets efficiently in the most heavily used tracks, while using an SVG layer on top of the canvas for interactive elements that are not performance-critical.

It is a collaborative effort by UniProt, InterPro, and PDBe to provide a unified set of components for their websites, including UniProt’s ProtVista, while allowing flexibility for specific needs. As a collection of standard web components, Nightingale integrates seamlessly into any web application, ensuring compatibility with various frameworks and libraries. It utilizes standard DOM event propagation and attribute-based communication to facilitate interoperability between Nightingale components and other web components, irrespective of their internal implementation details. As an evolving platform, we aim to engage with parallel visualization projects to identify and promote best practices in the application of web standards, with a focus on advancing the adoption and integration of web components within the domain of biological data visualization.

A Multimodal Search and Authoring System for Genomics Data Visualizations
Room: 04AB
Format: In person


Authors List:

  • Huyen N. Nguyen, Harvard Medical School, United States
  • Sehi L'Yi, Harvard Medical School, United States
  • Thomas C. Smits, Harvard Medical School, United States
  • Shanghua Gao, Harvard Medical School, United States
  • Marinka Zitnik, Harvard Medical School, United States
  • Nils Gehlenborg, Harvard Medical School, United States

Presentation Overview:

We present a database system for retrieving interactive genomics visualizations through multimodal search capabilities. Our system offers users flexibility through three query methods: example images, natural language, or grammar-based queries, via a user interface. For each visualization in our database, we generate three complementary representations: a declarative specification using the Gosling visualization grammar, a pixel-based image, and a natural language description. To support investigation of multiple embeddings and retrieval strategies, we implement three embedding methods that capture different aspects of these visualizations: (1) Context-free grammar embeddings specifically designed for genomics visualizations, addressing specialized features like genomic tracks, views, and interactivity, (2) Multimodal embeddings derived from a state-of-the-art biomedical vision-language foundation model, and (3) Textual embeddings generated by our fine-tuned specification-to-text large language model. We evaluated the proposed embedding strategies across different modality variations using top-k retrieval accuracy. Notably, our findings demonstrate that context-free grammar embedding approaches achieve comparable retrieval results with lower computational demands. Our current collection contains over three thousand visualization examples spanning approximately 50 categories, from basic to scalable encodings, from single- to coordinated multi-view visualizations, supporting diverse genomics applications including gene annotations and single-cell epigenomics analysis. Retrieved visualizations serve as ready-to-use scaffolds for authoring: they are templates that users can modify with their data and customize to their visual preferences. This approach provides researchers with reusable examples, allowing them to concentrate on meaningful data analysis and interpretation instead of the technicalities of building visualizations from scratch.
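The top-k retrieval accuracy metric used in the evaluation can be sketched generically: embed queries and corpus items, rank corpus items by cosine similarity, and count how often the ground-truth item appears among the k nearest. The function name and details are illustrative, not the paper's evaluation code.

```python
import numpy as np

def top_k_accuracy(queries, corpus, targets, k=3):
    """Fraction of queries whose ground-truth corpus index (targets[i])
    appears among the k most cosine-similar corpus vectors."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = q @ c.T  # cosine similarity matrix, one row per query
    hits = 0
    for row, target in zip(sims, targets):
        topk = np.argsort(row)[::-1][:k]
        hits += int(target in topk)
    return hits / len(targets)
```

The same harness works for any of the three embedding methods, since each just maps a visualization (or query) to a vector.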

Tersect Browser: characterising introgressions through interactive visualisation of large numbers of resequenced genomes
Room: 04AB
Format: In person


Authors List:

  • Tomasz Kurowski, Cranfield University, United Kingdom
  • Fady Mohareb, Cranfield University, United Kingdom

Presentation Overview:

Introgressive hybridisation has long been a major source of genetic variation in plant genomes, and the ability to precisely identify and delimit intervals of DNA originating from wild species or cultivars of interest is of great importance to both researchers seeking insights into the evolution and breeding of crops, and to plant breeders seeking to protect their intellectual property. The low cost of genome resequencing and the public availability of large sets of resequenced genomes for many species of commercial importance, as well as for their wild relatives, have made it possible to reliably characterise the origins of specific genomic intervals. However, such analyses are often hampered by the same large volume of data that enables them. They generally take a long time to execute, and their results are difficult to visualise in an easily explorable manner.
We present Tersect Browser, a Web-based tool that leverages a novel, multi-tier indexing and pre-calculation scheme to allow biologists to explore the relationships between large sets of resequenced genomes in a fully interactive fashion. Users have the option to freely adjust interval size and resolution while navigating through detailed genetic distance heatmaps and phylogenies for genomes and regions of interest, smoothly zooming in and out depending on the needs of their exploratory data analysis, aided by extendable plugins and annotations. Results and visualisations can also be shared with others and downloaded as high-resolution figures for use outside the application, placing the researcher best prepared to interpret the results in full control.
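A single cell of such a genetic distance heatmap can be sketched with a toy measure: the Jaccard distance between two genomes' SNP position sets within an interval. Tersect Browser's actual multi-tier indexing and distance scheme is far more sophisticated; this only illustrates the per-interval idea.

```python
def interval_distance(snps_a, snps_b, start, end):
    """Jaccard distance between two genomes' SNP sets restricted to
    [start, end); 0.0 means identical variant content in the interval,
    1.0 means no shared variants."""
    a = {p for p in snps_a if start <= p < end}
    b = {p for p in snps_b if start <= p < end}
    if not a and not b:
        return 0.0  # neither genome varies here: treat as identical
    return 1 - len(a & b) / len(a | b)
```

Sliding the interval along a chromosome and evaluating this for every genome pair yields one row of heatmap cells per pair; introgressed intervals show up as runs of low distance to the donor genome.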

14:40-15:40
Invited Presentation: TBD
Room: 04AB
Format: In person


Authors List:

  • Ingrid Hotz