SEAL: Spatially-resolved Embedding Analysis with Linked Imaging Data
Room: 04AB
Format: In person
Authors List:
- Simon Warchol, Harvard School of Engineering and Applied Sciences, Laboratory of Systems Pharmacology, Harvard Medical School, United States
- Grace Guo, Harvard School of Engineering and Applied Sciences, Laboratory of Systems Pharmacology, Harvard Medical School, United States
- Johannes Knittel, Harvard John A. Paulson School of Engineering and Applied Sciences, United States
- Dan Freeman, Laboratory of Systems Pharmacology, Harvard Medical School, United States
- Usha Bhalla, Harvard John A. Paulson School of Engineering and Applied Sciences, United States
- Jeremy Muhlich, Laboratory of Systems Pharmacology, Harvard Medical School, United States
- Peter K Sorger, Laboratory of Systems Pharmacology, Harvard Medical School, United States
- Hanspeter Pfister, Harvard John A. Paulson School of Engineering and Applied Sciences, United States
Presentation Overview:
Dimensionality reduction techniques help analysts interpret complex, high-dimensional spatial datasets by projecting data attributes into two-dimensional space. For instance, when investigating multiplexed tissue imaging, these techniques help researchers identify and differentiate cell types and states. However, they abstract away crucial spatial, positional, and morphological contexts, complicating interpretation and limiting deeper biological insights. To address these limitations, we present SEAL, an interactive visual analytics system designed to bridge the gap between abstract 2D embeddings and their rich spatial imaging context. SEAL introduces a novel hybrid-embedding visualization that preserves morphological and positional information while integrating critical high-dimensional feature data. By adapting set visualization methods, SEAL allows analysts to identify, visualize, and compare selections—defined manually or algorithmically—in both the embedding and original spatial views, enabling richer interpretation of the spatial arrangement and morphological characteristics of entities of interest. To elucidate differences between selected sets, SEAL employs a scalable surrogate model to calculate feature importance scores, identifying the most influential features governing the position of objects within embeddings. These importance scores are visually summarized across selections, with mathematical set operations enabling detailed comparative analyses. We demonstrate SEAL’s effectiveness through two case studies with cancer researchers: colorectal cancer analysis with a pharmacologist and melanoma investigation with a cell biologist. We then illustrate broader cross-domain applicability by exploring multispectral astronomical imaging data with an astronomer. 
Implemented as a standalone tool or integrated seamlessly with computational notebooks, SEAL provides an interactive platform for spatially informed exploration of high-dimensional datasets, significantly enhancing interpretability and insight generation.
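The surrogate-model idea in the abstract above can be illustrated with a small sketch. The abstract does not say which surrogate SEAL uses, so this example assumes a plain logistic regression as a stand-in: it is trained to separate a selected set of objects from the rest, and the absolute weight of each standardized feature is read as its importance. The data, the function names, and the selections `selection_a` and `selection_b` are all hypothetical.

```python
import math
import random


def standardize(rows):
    """Scale every feature column to zero mean and unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
        for j in range(d)
    ]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def surrogate_importance(features, selected, epochs=300, lr=0.5):
    """Fit a logistic-regression surrogate (an assumption, not SEAL's model)
    that separates selected objects from the rest; return |weight| per feature."""
    x = standardize(features)
    y = [1.0 if i in selected else 0.0 for i in range(len(x))]
    d = len(x[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(x, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() in range
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(x) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(x)
    return [abs(wj) for wj in w]


# Synthetic stand-in data: 200 objects, 3 features; only feature 0
# differs for the first 60 objects.
random.seed(0)
features = [
    [random.gauss(3.0 if i < 60 else 0.0, 1.0),
     random.gauss(0.0, 1.0),
     random.gauss(0.0, 1.0)]
    for i in range(200)
]
selection_a = set(range(0, 60))
selection_b = set(range(40, 100))

scores = surrogate_importance(features, selection_a)
ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("features ranked by importance:", ranked)

# Set operations let selections be compared: objects in A but not in B.
only_a = selection_a - selection_b
print("importance for A minus B:", surrogate_importance(features, only_a))
```

A linear surrogate is used here only because it is easy to read; any model that exposes per-feature importance could take its place.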
Nightingale - A collection of web components for visualizing protein related data
Confirmed Presenter: Swaathi Kandasaamy, UniProt - EMBL-EBI, United Kingdom
Room: 04AB
Format: In person
Authors List:
- Swaathi Kandasaamy, UniProt - EMBL-EBI, United Kingdom
- Daniel Rice, UniProt - EMBL-EBI, United Kingdom
- Aurélien Luciani, UniProt - EMBL-EBI, United Kingdom
- Adam Midlik, PDBe - EMBL-EBI, United Kingdom
- Maria Martin, UniProt - EMBL-EBI, United Kingdom
Presentation Overview:
Nightingale is an open-source web visualization library for rendering protein-related data, including domains, sites, variants, structures, and interactions, using reusable web components. It employs a track-based approach, where sequences are represented horizontally and multiple tracks can be stacked vertically to visualize different annotations at the same position, aiding in the discovery of relationships across annotations. This intuitive approach enhances the exploration and interpretation of complex biological data. The most frequently used tracks render with the HTML5 Canvas API so that large datasets are handled efficiently, while an SVG layer on top of the canvas provides the interactivity, where performance is less critical.
It is a collaborative effort by UniProt, InterPro, and PDBe to provide a unified set of components for their websites, including UniProt’s ProtVista, while allowing flexibility for specific needs. As a collection of standard web components, Nightingale integrates seamlessly into any web application, ensuring compatibility with various frameworks and libraries. It uses standard DOM event propagation and attribute-based communication to facilitate interoperability between Nightingale components and other web components, irrespective of their internal implementation details. Nightingale is an evolving platform, and we aim to engage with parallel visualization projects to identify and promote best practices in applying web standards, with a focus on advancing the adoption and integration of web components in biological data visualization.
A Multimodal Search and Authoring System for Genomics Data Visualizations
Room: 04AB
Format: In person
Authors List:
- Huyen N. Nguyen, Harvard Medical School, United States
- Sehi L'Yi, Harvard Medical School, United States
- Thomas C. Smits, Harvard Medical School, United States
- Shanghua Gao, Harvard Medical School, United States
- Marinka Zitnik, Harvard Medical School, United States
- Nils Gehlenborg, Harvard Medical School, United States
Presentation Overview:
We present a database system for retrieving interactive genomics visualizations through multimodal search capabilities. Through its user interface, the system supports three query methods: example images, natural language, and grammar-based queries. For each visualization in our database, we generate three complementary representations: a declarative specification using the Gosling visualization grammar, a pixel-based image, and a natural language description. To support investigation of multiple embeddings and retrieval strategies, we implement three embedding methods that capture different aspects of these visualizations: (1) context-free grammar embeddings specifically designed for genomics visualizations, addressing specialized features like genomic tracks, views, and interactivity; (2) multimodal embeddings derived from a state-of-the-art biomedical vision-language foundation model; and (3) textual embeddings generated by our fine-tuned specification-to-text large language model. We evaluated the proposed embedding strategies across different modality variations using top-k retrieval accuracy. Notably, our findings demonstrate that context-free grammar embedding approaches achieve comparable retrieval results with lower computational demands. Our current collection contains over three thousand visualization examples spanning approximately 50 categories, from basic to scalable encodings, from single- to coordinated multi-view visualizations, supporting diverse genomics applications including gene annotations and single-cell epigenomics analysis. Retrieved visualizations serve as ready-to-use scaffolds for authoring: they are templates that users can modify with their data and customize to their visual preferences. This approach provides researchers with reusable examples, allowing them to concentrate on meaningful data analysis and interpretation instead of the technicalities of building visualizations from scratch.
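The top-k retrieval accuracy used in the evaluation above can be sketched as follows. This is a minimal illustration, assuming cosine similarity as the ranking function, which the abstract does not state. The three-dimensional embeddings and the visualization identifiers are made up for the example and are not taken from the authors' collection.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, index, k):
    """Return the ids of the k indexed embeddings most similar to the query."""
    ranked = sorted(index, key=lambda vid: cosine(query, index[vid]), reverse=True)
    return ranked[:k]


def top_k_accuracy(queries, index, k):
    """Fraction of queries whose expected id appears among the top k results."""
    hits = sum(1 for emb, expected in queries if expected in top_k(emb, index, k))
    return hits / len(queries)


# Hypothetical index: visualization id -> embedding.
index = {
    "gene_annotation": [1.0, 0.0, 0.1],
    "hic_matrix": [0.0, 1.0, 0.1],
    "ideogram": [0.1, 0.1, 1.0],
}

# Hypothetical queries: (query embedding, id of the expected visualization).
queries = [
    ([0.9, 0.1, 0.0], "gene_annotation"),
    ([0.1, 0.8, 0.2], "hic_matrix"),
    ([0.6, 0.7, 0.0], "gene_annotation"),  # ambiguous: closer to hic_matrix
]

print("top-1 accuracy:", top_k_accuracy(queries, index, 1))
print("top-2 accuracy:", top_k_accuracy(queries, index, 2))
```

The third query ranks its expected result second, so it counts as a miss at k = 1 and a hit at k = 2. The same function can score any of the three embedding methods, provided queries and indexed items share an embedding space.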
Tersect Browser: characterising introgressions through interactive visualisation of large numbers of resequenced genomes
Room: 04AB
Format: In person
Authors List:
- Tomasz Kurowski, Cranfield University, United Kingdom
- Fady Mohareb, Cranfield University, United Kingdom
Presentation Overview:
Introgressive hybridisation has long been a major source of genetic variation in plant genomes, and the ability to precisely identify and delimit intervals of DNA originating from wild species or cultivars of interest is of great importance both to researchers seeking insights into the evolution and breeding of crops and to plant breeders seeking to protect their intellectual property. The low cost of genome resequencing and the public availability of large sets of resequenced genomes for many species of commercial importance, as well as for their wild relatives, have made it possible to reliably characterise the origins of specific genomic intervals. However, such analyses are often hampered by the same large volume of data that enables them: they generally take a long time to execute, and their results are difficult to visualise in an easily explorable manner.
We present Tersect Browser, a Web-based tool that leverages a novel, multi-tier indexing and pre-calculation scheme to allow biologists to explore the relationships between large sets of resequenced genomes in a fully interactive fashion. Users can freely adjust interval size and resolution while navigating through detailed genetic distance heatmaps and phylogenies for genomes and regions of interest, smoothly zooming in and out depending on the needs of their exploratory data analysis, aided by extendable plugins and annotations. Results and visualisations can also be shared with others and downloaded as high-resolution figures for use outside the application. This puts the researchers best placed to interpret the results in full control.
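The abstract does not describe the indexing and pre-calculation scheme itself, so the following is only a sketch of one generic pre-calculation technique, prefix sums over fixed-size bins, and not Tersect Browser's actual method. It shows how pairwise differences between genomes could be pre-computed once so that the distance over any interval is answered with a single subtraction. The genome names, variant identifiers, and function names are hypothetical.

```python
from itertools import accumulate, combinations


def build_prefix_index(variant_bins):
    """Pre-compute, for every genome pair, prefix sums of per-bin differences.

    variant_bins maps a genome name to a list of sets of variant ids,
    one set per fixed-size genomic bin (all lists have the same length).
    """
    index = {}
    for g1, g2 in combinations(sorted(variant_bins), 2):
        # Variants present in exactly one of the two genomes, per bin.
        diffs = [len(a ^ b) for a, b in zip(variant_bins[g1], variant_bins[g2])]
        index[(g1, g2)] = [0] + list(accumulate(diffs))
    return index


def interval_distance(index, g1, g2, first_bin, last_bin):
    """Count differing variants in bins first_bin..last_bin (inclusive)."""
    prefix = index[tuple(sorted((g1, g2)))]
    return prefix[last_bin + 1] - prefix[first_bin]


# Hypothetical data: three genomes, three bins each.
variant_bins = {
    "cultivar_a": [{1, 2}, {3}, {4, 5}],
    "cultivar_b": [{1}, {3}, {6}],
    "wild_x": [{7}, {8, 9}, {4, 5}],
}
index = build_prefix_index(variant_bins)

# Whole region: cultivar_a and cultivar_b differ at 4 variants.
print(interval_distance(index, "cultivar_a", "cultivar_b", 0, 2))
# Last bin only: cultivar_a matches wild_x exactly, hinting at shared origin.
print(interval_distance(index, "wild_x", "cultivar_a", 2, 2))
```

Because each query is a constant-time lookup, changing the interval size or zoom level does not require rescanning the variant data, which is the kind of interactivity the abstract describes.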