Presentation Overview
Research in visualization is often motivated by the endeavor to improve the illustration of data, both to better communicate data to others and to gain deeper insights into complex datasets, possibly from a variety of data sources. In the medical domain, the data can include, for example, patient data, health records, and biological data such as genomes. Insights to be obtained from data may relate, among other things, to the spreading of diseases, evolutionary analysis and virus mutations. The tasks include both retrospective analysis for finding patient zero and modelling the spreading of a disease, as well as predictive modelling of virus mutations and future disease spreading. The COVID-19 pandemic has confronted this general motivation for our research with a need for practical solutions. Infection control experts needed to quickly gain insights into novel datasets and to communicate the insights to colleagues and to a broader public, requiring quick and efficient visualization solutions.
New methods, tools, and methodologies have emerged from both basic and applied research. New data was collected and model results were produced that required rapid analysis. Multi-disciplinary teams worked on and applied solutions to the new challenges resulting from the pandemic. The rapid response was only possible by leveraging experience and past research. Thus, the talk will take a larger historical perspective and present specific solutions from our own experience, including reflections on the data, task and user triangle as well as the challenges of multidisciplinary working styles.
Presentation Overview
Reference-based cell-type annotation can significantly reduce time and effort in single-cell analysis by transferring labels from a previously-annotated dataset to a new dataset. However, label transfer is challenging. End-to-end computational methods can fail due to mixing technical variants (e.g., different sequencing batches or techniques) that must be removed and biological variants (e.g., different cells) that must be conserved among datasets. To address this issue, we propose Polyphony, an interactive transfer learning (ITL) framework, to complement biologists' knowledge with advanced computational methods. Polyphony is motivated and guided by domain experts' needs for a controllable, interactive, and algorithm-assisted annotation process, identified through our multi-round expert interviews with six biologists. We introduce anchors, i.e., analogous cell populations across datasets, as a paradigm to explain the computational process and collect users' feedback for model improvement. A set of visualizations and interactions is provided to empower users to add, delete, or modify anchors, resulting in refined cell type annotations. We demonstrate the effectiveness of this approach through two usage scenarios and interviews with two biologists. The results show that our anchor-based ITL method takes advantage of both human and machine intelligence in annotating massive single-cell datasets.
Presentation Overview
Nonlinear dimensionality reduction (DR) methods are commonly used to create two-dimensional embeddings of high-dimensional data for visualization. Since the effectiveness of learned embeddings can depend markedly on the choice of the DR method’s hyperparameters, prior work has focused on evaluating hyperparameter settings. However, data transformations can be equally important for creating effective embeddings. Yet, they have received less attention.
In this talk, we’re going to present data transformation approaches for the embedding of single-cell data, specifically surface proteomics. Using computationally-derived labels for expression groups (e.g., low, medium, high) we can spread out and normalize the expression range of different cell phenotypes. Visually this allows for the identification of rare and complex cell types that would otherwise be indistinguishable from broad cell phenotypes. Moreover, such an approach effectively eliminates batch effects that are otherwise the cause for great differences in the lower-dimensional embedding and make sample-by-sample comparisons ineffective. Finally, we’re going to show a data transformation approach using simulated data to create a generic embedding with concrete data being mapped into it. Such an approach enables relative comparison of cluster expression profiles while still providing a global map for broad cluster similarities.
Presentation Overview
Visualizing single-cell transcriptomics data in an informative way is a major challenge in biological data analysis. Clustering of cells is a prominent analysis step and the results are usually visualized in a planar embedding of the cells using methods like PCA, t-SNE, or UMAP. Given a cluster of cells, one frequently searches for the genes highly expressed specifically in that cluster. At this point, visualization is usually replaced by studying a list of differentially expressed genes.
We address this bottleneck by presenting Association Plots (APs) adapted to single-cell data. APs are derived from correspondence analysis, a projection method which embeds both genes and cells in a high-dimensional space, where genes associated with a cell cluster lie in a particular direction. By employing this feature, APs constitute a dimension-independent visualization of cluster-specific genes from single-cell datasets. Our method is now available as the free Bioconductor package APL.
We demonstrate the application of APs to single-cell RNA-seq data through several examples. First, we show the identification of marker genes using APs. Second, we present how APs aid in cell cluster annotation using a predefined list of marker genes. Finally, we compare results from APs to results from existing differential expression testing tools.
Presentation Overview
We present kana, a web application for interactive scRNA-seq data analysis that combines execution of both visualization and computational analysis in the web browser. Kana leverages web technologies such as WebAssembly to efficiently perform the relevant computations on the user’s machine, using C++ libraries whose analysis steps are also re-usable in non-visualization or client/server approaches. As an added benefit of this client-side approach, user data is never transferred or uploaded to a server, avoiding problems with data privacy. Since computations run in the browser, this also removes network latency, providing a smooth interactive experience. Kana provides a streamlined one-click workflow for all steps in a typical scRNA-seq analysis, starting from a count matrix and finishing with marker detection and cell type annotation. Results are rendered progressively, as soon as each underlying analysis step is complete, and are presented in an intuitive web interface for further exploration and iterative analysis. Testing on public datasets shows that kana can analyze over 100,000 cells within 5 minutes on a typical laptop.
The application is hosted on GitHub: http://github.com/jkanche/kana. The preprint is available at https://www.biorxiv.org/content/10.1101/2022.03.02.482701v1
Presentation Overview
Recently, various visualization methods have been developed to analyze scRNA-seq data. However, current visualization methods, including UMAP and t-SNE, are limited in how accurately they render the geometric relationships between populations with distinct functional states. Most visualization methods are unsupervised, leaving out information from the clustering results or given labels. This leads to an inaccurate depiction of the distances between the bona fide functional states. In particular, UMAP and t-SNE are not optimal for preserving the global geometric structure. They may produce a contradiction in which clusters that are close in the embedded dimensions are in fact far apart in the original dimensions. Moreover, UMAP and t-SNE cannot track cluster variance: the embedded cluster variance is not only associated with the true variance but also proportional to the sample size.
We present supCPM, a robust supervised visualization method utilizing clustering results, which separates different clusters, preserves the global structure and tracks the cluster variance. In comparisons with existing methods on synthetic and real datasets, supCPM shows improved performance in preserving the global geometric structure and data variance. Overall, supCPM provides an enhanced visualization pipeline to assist the interpretation of functional transitions and accurately depict population segregation.
Presentation Overview
Abundance profiles from metagenomic sequencing data synthesize information from billions of sequenced reads coming from thousands of microbial genomes. Analyzing and understanding these profiles can be a challenge since the data they represent are complex. Particularly challenging is their visualization, and here we present a technique called a "Microbiome Map" which visualizes a microbiome profile using a Hilbert curve.
The maps are created using the Jasper software, which generates colorful 2D images that succinctly visualize a microbiome sequencing profile. Color and location in a microbiome map play a vital role: each location represents a genome from a reference collection (whole-genome sequencing) or a set of OTUs (16S sequencing), and color can represent their relative abundance. Maps can also be interactively explored using Jasper, which integrates with online resources such as Ensembl, GenBank, and UniProt.
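The core idea of laying a linear abundance profile onto a Hilbert curve can be sketched in a few lines of Python. This is a minimal illustration and not Jasper's implementation: the function names `hilbert_d2xy` and `hilbert_grid` are hypothetical, and the sketch assumes the genomes have already been placed in some meaningful linear order, so that neighbours along the curve end up as neighbours in the image.

```python
def hilbert_d2xy(side, d):
    """Convert position d along a Hilbert curve to (x, y) on a side x side grid.

    `side` must be a power of two; this is the standard iterative conversion.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate (and possibly flip) the current quadrant.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_grid(abundances, side):
    """Place a linear list of abundances on a side x side grid (grid[y][x]).

    Cells beyond the end of the list stay None.
    """
    if side * side < len(abundances):
        raise ValueError("grid is too small for the profile")
    grid = [[None] * side for _ in range(side)]
    for d, value in enumerate(abundances):
        x, y = hilbert_d2xy(side, d)
        grid[y][x] = value
    return grid


# Hypothetical relative abundances for four genomes, in curve order.
example = hilbert_grid([0.5, 0.2, 0.2, 0.1], side=2)
```

Mapping each grid value to a colour then gives a small map. Because consecutive curve positions always land in adjacent cells, entries that are close in the linear order stay spatially close in the image.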
We discuss how microbiome maps can be a powerful asset for classification and prediction models by visualizing the strain-level abundances of 44K genomes in 328 samples from the Human Microbiome Project, as well as 5K species in 200 fecal samples from a collaboration with Kangwon National University and Seoul National University in South Korea.
More information can be found at www.microbiomemaps.org.
Presentation Overview
A main task in computational cancer analysis is the identification of patient subgroups (i.e. cohorts) based on a rich collection of metadata attributes (patient stratification) or genomic markers of response (biomarkers). Coral is a web-based cohort analysis tool that is designed to support this task: Users can interactively create and refine multiple cohorts, based on quantitative or categorical attributes, which can then be compared, characterized, and inspected down to the level of single items. The characterization includes the possibility for statistical testing between cohorts and provides intuitive access to prevalence information. Coral visualizes the evolution of cohorts as well as their relationships as a graph. Furthermore, findings can be stored, shared, and reproduced via the integrated session management. Coral is pre-loaded with data from over 128 000 samples from the AACR Project GENIE, The Cancer Genome Atlas, the Cell Line Encyclopedia, and two depletion screen datasets.
To demonstrate the usefulness of Coral, we reproduce findings from a published article about KRASG12C somatic mutations in the AACR Project GENIE patients. We analyze the KRASG12C mutation frequencies for Non-Small Cell Lung Cancer (NSCLC) and colorectal cancer patient cohorts with regard to their differences in race and gender.
Presentation Overview
The Cancer Genome Atlas (TCGA) contains multidimensional molecular data of 11,000 cancer patients across 33 cancer types. In our work, we aimed to present a visual analysis tool integrating our recently published gene-signature based low-risk/high-risk TCGA patient cohorts (Zengin T and Önal-Süzek T., 2020; Zengin T and Önal-Süzek T., 2021) along with curatedTCGA patient cohorts, together with all the single-nucleotide variations (SNVs), copy number variations (CNVs), RNA-seq and clinical data of patients of 33 different cancer types from TCGA: http://tcganalyzer.mu.edu.tr/
Our interactive Shiny-based web platform TCGAnalyzeR enables statistical analysis of big data in 4 main categories, allowing users to interactively select the cancer type, data category (SNV/CNV/DEA/Clinical), mutation type (somatic or all), risk group (low-risk/high-risk) and cohort type (paired/all). Downloadable plots and data tables are provided to interactively visualize data specific to each category. Each plot has its own filtering options. The gene and patient (sample) names given in the tables and plots are selectable, which enables the user to add a gene or patient to the “My genes” or “My patients” panel, respectively, for filtering other plots and copying the selections to the user's clipboard.
For 3 cancer types (LUAD, LUSC, COAD), we provide pre-clustered low-risk or high-risk cohorts using our gene signature method. For 15 cancer types, patient cohorts from curatedTCGA are integrated, and for 5 cancer types we computed an iCluster+ based multi-omic patient clustering and integrated it into the web interface, enabling a comparative visual analysis of user-defined subcohorts.
Presentation Overview
Cancer is the result of an evolutionary process, where somatic mutations accumulate over time in a population of cells. As such, a tumor is composed of multiple subpopulations of cells, or clones, with distinct complements of mutations. This intra-tumor heterogeneity is a major driver for resistance to therapy. Researchers use evolutionary trees, or phylogenies, to study intra-tumor heterogeneity and reason about cancer evolution. While many methods have been developed to visualize and interpret tumor phylogenies, these methods often provide either 1) a static image of clonal evolution that does not accommodate user interaction or 2) tree layout interfaces that do not incorporate clonal proportions and mutation details. Here, we introduce PhyloDiver, a novel visual analytics tool that enables end-users to study clonal evolution in an interactive fashion while remaining connected to the underlying annotated mutations.
Presentation Overview
We present an interactive visual approach for the exploration and formation of structural relationships in embeddings of high-dimensional data.
These structural relationships, such as item sequences, associations of items with groups, and hierarchies between groups of items, define properties of many real-world datasets. Nevertheless, most existing methods for the visual exploration of embeddings treat these structures as second-class citizens or do not take them into account at all.
In our proposed analysis workflow, users explore enriched scatterplots of the embedding, in which relationships between items and/or groups are visually highlighted. During their exploratory analysis, users can externalize their insights by setting up additional groups and relationships between items and/or groups---for example, by dividing a heterogeneous group of patients into several subgroups.
The original high-dimensional data for single items, groups of items, or differences between items and groups are accessible through additional summary visualizations and difference visualizations that complement the embedding with a detailed look at the high-dimensional attributes.
We carefully tailored these summary and difference visualizations to various data types and semantic contexts.
We implemented the approach as a web application, which is open-source and publicly available at https://jku-vds-lab.at/apps/embedding-structure-explorer.
Presentation Overview
Reconstruction of clonal evolution involves complex integrated analyses. The results are, in addition to classical representation by phylogenetic or clonal evolution trees, commonly visualized using fish plots. In these plots, the development of every individual clone is displayed, considering time on the x-axis, and cancer cell fraction on the y-axis. Thereby, fish-shaped objects are generated.
Despite providing a comprehensive visualization of clonal evolution, fish plots display information only on clone-, not on allele-level. Biallelic mutations cannot be identified at first sight. However, with respect to disease progression, these mutations play an essential role. To fill this gap, we introduce plaice plots as a derivative of fish plots. The actual 'fish' become flatfish, i.e. plaice, and are mirrored - above and below the y-axis. The upper plot visualizes common clonal development, while the lower plot shows the fraction of remaining healthy alleles. For example, in case of mutated TP53 and additional del17p affecting the remaining healthy allele, the fraction of cells with deficient TP53 is marked in the lower plot. Similarly, X-chromosomal mutations in male samples, leading to a loss of the only available healthy allele, are visualized. Thereby, plaice plots allow for immediate identification of double-hit events.
Presentation Overview
The rapidly growing cancer dependency maps pave the way to precision oncology by identifying and targeting the “Achilles’ heel” of cancer. There is a pressing need for software that systematically links such genetic (gene knockouts) and pharmacologic dependencies (small compounds). Here we present a web-based R Shiny app that incorporates heterogeneous data from large-scale high-throughput CRISPR screens, pharmacologic screens, and a molecular signatures library, jointly covering 17k genes, 20k drugs, and 1k cell lines. The major goal is to match gene knockouts and drug treatments that induce similar effects in cell viability and/or gene expression perturbation in order to address two fundamental questions: 1) which drugs can be potential surrogates to the knockout of a gene, and 2) which genes are potential targets or mechanisms of action of a drug. The app has four complementary and interconnected modules that address various query scenarios to identify potential druggable genetic vulnerabilities and understand the mechanisms of action of a known or new drug. The results are represented by interactive figures and networks, as well as annotated data tables. In summary, our Shiny app enables easy and systematic navigation, visualization, and integration of the rapidly evolving genetic and pharmacologic dependency maps of cancer.
Presentation Overview
Networks have become a ubiquitous research focus in the biological and biomedical research fields. Complex phenotypes, such as disease vulnerability, result not only from single-gene mutations that act in isolation but also from the perturbation of a gene’s network context. Interactive visualisation can support interpretation and understanding of the complexities inherent in such biological network data. The research presented here aims to investigate and review existing network visualisation tools in terms of their ability to support human cognition and data exploration, and through this provide guidance for next steps to improve such visualisation.
The effectiveness of the visualisation tools was measured using 25 factors, which were identified from the literature using the systematic review method. Additionally, primary data was gathered using interviews and surveys to capture the data analysts’ experiences, expectations and opinions about the visualisation tools. Such a mixed-methods approach enables the researcher to juxtapose results from different angles for an accurate conclusion.
The results show that out of all the visualisation factors considered in the research, only “Advanced search” has emerged as a non-essential factor to be included in visualisation applications for complex networks. However, features would reveal more profound insight into the essential factors for visualisation application for complex networks.
Presentation Overview
ECellDive is a virtual environment where users can model, simulate and visualize biological systems in collaboration with their colleagues. In ECellDive, everything is a module representing either data (e.g. a metabolic pathway) or any transform on this data (e.g. a Flux Balance Analysis).
For demonstration purposes we import the Escher-FBA model into our virtual scene (Zachary A. King et al. 2017, doi:10.1371/journal.pcbi.1004321) and dive into it. Diving transfers us to a new scene containing the metabolic pathway encoded in Escher-FBA. From there on, we explore the pathway by strolling around. This is a major improvement compared to the original web app, where we have to zoom in/out or pan to explore the model. Then, we highlight the structure of the network by grouping modules together automatically or manually. This is particularly effective for contextualizing the model by, for example, visualizing cellular compartments and metabolic subsystems. Next, we perform a Flux Balance Analysis (FBA) of the pathway and update the simulation results by knocking out or activating reactions of interest. Finally, ECellDive is about collaboration: any changes can be exported and shared, and we can also join a session hosted by someone else in real time to modify the same file.
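For readers unfamiliar with FBA, the computation behind such a knockout experiment is a small linear program: maximise an objective flux subject to the steady-state constraint S·v = 0 and per-reaction flux bounds, where a knockout simply clamps a reaction's bounds to zero. The sketch below uses `scipy.optimize.linprog` on a hypothetical four-reaction toy network; it is an illustration only and is not the code used by ECellDive or Escher-FBA.

```python
import numpy as np
from scipy.optimize import linprog


def fba(S, bounds, objective_index, knockouts=()):
    """Maximise one flux subject to S @ v = 0 and flux bounds.

    S          : stoichiometric matrix (metabolites x reactions)
    bounds     : list of (lower, upper) flux bounds, one per reaction
    knockouts  : reaction indices whose flux is forced to zero
    Returns (optimal objective flux, flux vector).
    """
    bounds = [(0.0, 0.0) if j in knockouts else b for j, b in enumerate(bounds)]
    c = np.zeros(S.shape[1])
    c[objective_index] = -1.0  # linprog minimises, so negate to maximise
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun, res.x


# Hypothetical toy network with metabolites A and B:
#   R0: -> A (uptake)   R1: A -> B   R2: A -> B (alternative route)   R3: B -> (export)
S = np.array([
    [1.0, -1.0, -1.0, 0.0],   # balance of A
    [0.0, 1.0, 1.0, -1.0],    # balance of B
])
bounds = [(0, 10), (0, 6), (0, 3), (0, 1000)]

wild_type, _ = fba(S, bounds, objective_index=3)
knockout, _ = fba(S, bounds, objective_index=3, knockouts={1})
print(f"export flux: wild type {wild_type:.1f}, R1 knocked out {knockout:.1f}")
```

In this toy case the export flux is limited by the two parallel routes from A to B, so removing the larger route reduces the optimum to the capacity of the remaining one.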
Presentation Overview
Animal venoms have fascinated humanity for a long time, mainly due to their complex actions and effects. Nowadays, these substances still intrigue humans and represent one of the main drivers for the discovery of novel natural drugs with potential therapeutic, medicinal and agricultural properties. Venoms vary widely, and their biotechnological relevance is mostly attributed to their complex composition, comprising a plethora of peptides, enzymes and other molecular compounds. Due to the importance of venoms, a new field, Venomics, which combines high-throughput data from different biological levels with molecular and computational techniques, has emerged. A deeper understanding of these substances can aid the generation of more effective antivenoms and the discovery of new biomolecules. Here, we present the VenOmics and Cell Signaling Environment for BioDiscoveries (VEnOmiCS4BD), a novel web-based public database, in development, for the storage and integration of multi-level -omics venom data, such as transcriptomics and proteomics, derived from venomous and envenomated organisms, as well as a platform for integrative analysis that allows data exploration of gene expression profiles, crossing experiments, signaling pathways and knowledge discovery. With VEnOmiCS4BD, we hope to facilitate Venomics research, serving as a common place for deposition and downstream analysis of heterogeneous biological data.
Presentation Overview
Single-cell sequencing improves our ability to understand biological systems at single-cell resolution and can be used to identify novel drug targets and optimal cell-types for target validation. However, tools that can interactively visualize and provide target-centric views of these large datasets are limited. We present SciViewer (Single-cell Interactive Viewer), a novel tool to interactively visualize, annotate and share single-cell datasets. SciViewer allows visualization of cluster, gene and pathway level information such as clustering annotation, differential expression, pathway enrichment, cell-type specificity, cellular composition, normalized gene expression and comparison across datasets. Further, we provide APIs for SciViewer to interact with publicly available pharmacogenomics databases for systematic evaluation of potential novel drug targets. We provide a module for non-programmatic upload of single-cell datasets. SciViewer will be a useful tool for data exploration and target discovery from single-cell datasets. It is available on GitHub (https://github.com/Dhawal-Jain/SciViewer).
Presentation Overview
Existing genomic visualization tools are tailored towards specific tasks and as such are limited in expressiveness. The Gosling visualization grammar defines a set of primitives that specify how genomic datasets can be transformed and mapped to visual properties, providing building-blocks to compose unique scalable and interactive genomic data visualizations. Gosling visualizations are defined via JSON, however, which can be tedious and error-prone to edit manually – especially for complex specifications containing many layered and repeated elements. Additionally, genomic datasets defined by the Gosling grammar are expected to be accessible via HTTP, which poses challenges for users since a simple web-server and/or HiGlass server must be configured separately to view local data. Here we present Gos – a Python library which includes an API designed for computational biologists to quickly compose Gosling visualizations. Gos allows the use of familiar language features (variables, functions, for-loops, etc.) to author validated Gosling specifications (JSON) and additionally implements data-loading utilities to transparently load local data into visualizations, abstracting away the complexity of configuring custom web-servers. Gos is designed for interactive analysis within a computational notebook environment and integrates into Jupyter Notebook, JupyterLab, and Google Colab.
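To illustrate why authoring such JSON programmatically helps, the sketch below builds a Gosling-style specification with plain Python dictionaries and a loop. It deliberately does not use the Gos API. The helper `bar_track`, the sample names and the example.org URLs are hypothetical, and the field names (`tracks`, `data`, `mark`, `x`, `y`) follow our reading of the Gosling grammar, so they should be checked against the Gosling documentation before use.

```python
import json


def bar_track(url, height=60):
    """Return one bar-chart track as a plain dict (field names are assumptions)."""
    return {
        "data": {"url": url, "type": "multivec",
                 "column": "position", "value": "peak"},
        "mark": "bar",
        "x": {"field": "position", "type": "genomic"},
        "y": {"field": "peak", "type": "quantitative"},
        "width": 600,
        "height": height,
    }


# Hypothetical samples; a loop replaces three near-identical hand-written JSON blocks.
samples = ["sample_a", "sample_b", "sample_c"]
spec = {
    "title": "Peaks per sample",
    "tracks": [bar_track(f"https://example.org/data/{name}.multivec")
               for name in samples],
}

print(json.dumps(spec, indent=2))
```

Changing the track height or adding a fourth sample is a one-line edit here, whereas the equivalent hand-edited JSON would need the same change repeated in every track.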
Presentation Overview
Studying patient variants in model organisms is an active area of research. The key challenge is determining an ideal model organism for modeling and studying the patient variant phenotype. This task requires collaboration between a diverse group of experts and involves complex evaluations across multiple metrics like sequence alignment, human protein, and gene expression. Though there are many challenges in comparing the expression variation of a gene-associated variant, the advent of new databases with preprocessed expression data across species and tissues has prompted the exploration of transcriptome diversity aiding scientists in selecting a suitable model organism for phenotypic studies. We are developing CoSIA (Cross-Species Investigation and Analysis), an R package that provides researchers with multiple metrics for choosing the most suitable model organism for study by measuring and visualizing a diverse group of gene expression-based metrics. CoSIA uses curated non-diseased wild-type RNA-sequencing expression data from Bgee to visualize a gene’s expression across tissues and model organisms. Additionally, CoSIA provides functions to measure and visualize transcriptome diversity for a gene using median-based coefficients of variation and Shannon Entropy calculations. Thus, CoSIA provides researchers with tools to visualize the variation in a gene’s expression profile to determine a suitable model organism.
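The two diversity metrics named above can be sketched as follows. This is not CoSIA's code (CoSIA is an R package), and the abstract does not give the exact formulas: here the median-based coefficient of variation is assumed to be the median absolute deviation divided by the median, and the Shannon entropy is computed over a gene's expression values normalised to sum to one across tissues. The function names and the expression values are hypothetical.

```python
import math
import statistics


def shannon_entropy(expression):
    """Shannon entropy (in bits) of a gene's expression across tissues.

    Low entropy means expression is concentrated in few tissues;
    high entropy means it is spread evenly.
    """
    total = sum(expression)
    if total <= 0:
        raise ValueError("expression values must sum to a positive number")
    probs = [v / total for v in expression if v > 0]
    return -sum(p * math.log2(p) for p in probs)


def median_cv(expression):
    """Median-based coefficient of variation, assumed here to be MAD / median."""
    med = statistics.median(expression)
    if med == 0:
        raise ValueError("median expression is zero")
    mad = statistics.median(abs(v - med) for v in expression)
    return mad / med


# Hypothetical expression of one gene across five tissues in two species.
profiles = {
    "species_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "species_2": [0.0, 0.0, 12.0, 0.5, 0.5],
}
for species, values in profiles.items():
    print(species, round(shannon_entropy(values), 3))
```

Comparing such values across candidate organisms gives a rough sense of whether a gene is broadly expressed or tissue-specific in each of them.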
Presentation Overview
Visual patterns of tissues and cells in microscopy images can unravel valuable insights to understand human bodies and treat diseases (e.g., histopathology). Recent advances in spatial omics enable the analysis of tissues at the cellular level and lead to an explosion of research interest. However, current studies rarely discuss visual patterns, which is partly due to the difficulty for humans to interpret the generated multiplexed images, which can have more than 40 channels.
To tackle this research gap, this study proposes a visual analytics approach to facilitate the visual exploration of tissues and cells through visual pattern mining. Specifically, the proposed method consists of a backend data module and a frontend visualization module. The backend module employs a beta-VAE module and extracts visual patterns by simultaneously considering all channels of the multiplexed images. The frontend module supports users in arranging and grouping items (e.g., cell thumbnails, tissue patches) based on the identified visual patterns. Users can examine the distribution of certain visual patterns and associate the item visual patterns with their spatial contexts and other types of biological information. A preliminary case study on breast cancer demonstrates the effectiveness of our proposed approach.
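As background on the beta-VAE mentioned above, its training objective is a reconstruction term plus a KL divergence term weighted by a factor beta. The sketch below computes that objective for a single item, assuming a diagonal Gaussian encoder, a standard normal prior and a squared-error reconstruction term. These choices, the value of beta and the function names are assumptions for illustration; the abstract does not describe the actual architecture or loss.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def beta_vae_loss(x, x_hat, mu, log_var, beta=4.0):
    """Squared reconstruction error plus beta-weighted KL term.

    x and x_hat are flattened pixel values covering all image channels at once,
    so every channel contributes to the same latent code.
    """
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return reconstruction + beta * gaussian_kl(mu, log_var)


# Hypothetical two-value "image" and a one-dimensional latent code.
loss = beta_vae_loss(x=[1.0, 0.0], x_hat=[0.0, 0.0], mu=[1.0], log_var=[0.0])
print(loss)
```

With beta greater than one, the KL term is penalised more strongly than in a plain VAE, which is the usual motivation for using this variant to obtain more disentangled latent dimensions.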
Presentation Overview
Visualisation of cancer tissues is important for diagnosis and for identifying driving pathological processes and potential biomarkers. Existing visualisation methods do not represent different tissue components and the tumour microenvironment intuitively and are therefore difficult for pathologists to interpret. Previously, we developed ShapoGraphy (www.shapography.com), a user-friendly web app for interactive creation of new glyph-based representations. Here we use ShapoGraphy to develop semantically relevant representations of multiplexed tissue image data that facilitate the pathological assessment and pattern discovery of tumour microenvironment phenotypes.
We will present the development of our representation and demonstrate its utility using several datasets measuring protein activities in stromal, immune and cancer cells. We will also present the exploration of various glyph design choices that use different shapes and marks to represent different tissue compartments and tumour heterogeneity. To determine the effectiveness of our approach, we reviewed our designs with pathologists and biologists. We found that a representation that utilises compactly arranged hexagons that encode variables using colour and symbols is more favourable. Finally, we will discuss general guidelines for producing effective glyph-based representations. In summary, our approach addresses the limitations of other visualisation approaches and provides a flexible way of summarising tissue image data.
Presentation Overview
As collections of data grow in size, it is increasingly important to have efficient means of analyzing large data sets. Topological data analysis (TDA) uses concepts from the mathematical field of topology to not only efficiently examine large data sets, but to make inferences related to the "shape" of data. In this project, we use Mapper, a tool from TDA that summarizes data into a graph, to discover an underlying structure relating the shapes of more than 3,300 Passiflora leaves from 40 different species. As the Mapper graph has a structure, or "shape" of its own, we think of it as a "shape of shapes" that provides information on the interplay between the developmental processes determining leaf shape within a single plant and the evolutionary processes between species. In particular, we examine the interactions between leaf species and both leaf age and leaf area by constructing a Mapper graph for each measure. For each node in the resulting graphs, we then compute the average leaf shape to obtain a graph structure that reveals how morphometric differences between species relate to the developmental changes that must occur for those shapes to be realized.
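The Mapper construction described above can be sketched in plain Python: cover the range of a measure (the filter, for example leaf age) with overlapping intervals, cluster the points that fall into each interval, turn each cluster into a node, and connect nodes that share points. This is a minimal sketch and not the study's pipeline. The clustering step (connected components at a distance threshold `eps`), the parameter values and the function names are assumptions, and the one-dimensional toy data stands in for real leaf-shape descriptors.

```python
import math
from itertools import combinations


def cover(lo, hi, n_intervals, overlap):
    """Split [lo, hi] into n equally long intervals overlapping by a fraction."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def components(indices, points, eps):
    """Group indices into connected components of the 'closer than eps' graph."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining
                    if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            component |= near
            frontier.extend(near)
        clusters.append(component)
    return clusters


def mapper(points, filter_values, n_intervals=3, overlap=0.5, eps=1.0):
    """Return (nodes, edges): nodes are sets of point indices."""
    tol = 1e-9  # guard against rounding at interval ends
    nodes = []
    for a, b in cover(min(filter_values), max(filter_values), n_intervals, overlap):
        members = [i for i, f in enumerate(filter_values) if a - tol <= f <= b + tol]
        nodes.extend(components(members, points, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


def average_shape(node, points):
    """Coordinate-wise mean of the points in one node."""
    dim = len(points[0])
    return tuple(sum(points[i][k] for i in node) / len(node) for k in range(dim))


# Toy data: ten points on a line, filtered by their own coordinate.
points = [(float(i),) for i in range(10)]
nodes, edges = mapper(points, [p[0] for p in points])
print(nodes, edges)
```

On the toy data the result is a simple chain of three nodes, mirroring the linear structure of the input; `average_shape` then summarises each node in the way the per-node average leaf shape does in the study.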
Presentation Overview
Over the last two decades, mass spectrometry based proteomics has evolved quite dramatically, levera...