Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

BioVis: Biological Data Visualization

COSI Track Presentations

Schedule subject to change
Monday, July 22nd
10:15 AM-10:20 AM
Opening Remarks - BioVis
Room: Boston 1/2 (Ground Floor)
  • BioVis General Chairs
10:20 AM-11:20 AM
BioVis Keynote: Visualisation as a partner to AI and machine learning in drug discovery
Room: Boston 1/2 (Ground Floor)
  • Lindsay Edwards, GSK, United Kingdom

Presentation Overview:

The rise of Artificial Intelligence (AI) - principally deep learning - and its influence in our everyday lives is plain. Yet despite significant progress in computer vision and natural language processing, the promised impact of AI in drug discovery (and biomedicine generally) has been slow to emerge. I will start by outlining what exactly I mean by AI (and machine learning) and briefly introduce deep learning and its canonical applications. The focus of my talk will then be on a) how AI is being used to tackle fundamental problems in drug discovery while b) visualisation is being used to tackle fundamental problems in AI, specifically (but not solely) related to interpretability. Visualisation is emerging as a key component of the machine learning pipeline. This coming together of visualisation and AI presents an opportunity to both the visualisation and drug discovery communities.

11:20 AM-11:40 AM
Epilogos: information-theoretic navigation of multi-tissue functional genomic annotations
Room: Boston 1/2 (Ground Floor)
  • Alex Reynolds, Altius Institute for Biomedical Sciences, United States
  • Eric Rynes, Altius Institute for Biomedical Sciences, United States
  • Manolis Kellis, Massachusetts Institute of Technology, United States
  • Wouter Meuleman, Altius Institute for Biomedical Sciences, United States

Presentation Overview:

Charting the non-coding genome is paramount for understanding human biology and disease. Chromatin state maps provide biologically meaningful annotations, but tools for navigating them are lacking. Here, we present Epilogos, an information-theoretic framework for analyzing, visualizing, and navigating multi-biosample epigenomic, transcriptomic, and functional genomic annotations. We apply Epilogos to 127 genome-wide epigenomic maps, demonstrating its diverse functionality, including: prioritizing salient genomic regions of interest based on information content; building a single reference annotation summarizing a set of biosamples; constructing intuitive visualizations of multi-tissue chromatin state maps; recognizing maximally different regions between groups of tissues; discovering common epigenomic motifs across multiple regions of interest; and searching for instances of these motifs in new regions or new biosamples. For each application, Epilogos provides a simple, rigorous, and efficient framework for quantitative interpretation of epigenomic signals that scales to thousands of genome-wide annotations.
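
To make "prioritizing regions based on information content" concrete, the following is a minimal sketch of one possible scoring, not the Epilogos implementation. It assumes each genomic bin carries one chromatin-state label per biosample and scores a bin by the relative entropy of its state frequencies against a pooled background. The function name, state labels, and toy data are all hypothetical.

```python
import math
from collections import Counter


def state_information(column, background):
    """Relative entropy (bits) of the state frequencies seen at one bin
    across biosamples, measured against background state frequencies."""
    counts = Counter(column)
    total = len(column)
    info = 0.0
    for state, count in counts.items():
        p = count / total          # frequency of this state at the bin
        q = background[state]      # background frequency of this state
        info += p * math.log2(p / q)
    return info


# Hypothetical toy input: one row per genomic bin, one label per biosample.
bins = [
    ["TSS", "TSS", "TSS", "TSS"],
    ["Quies", "Enh", "Quies", "TSS"],
    ["Quies", "Quies", "Quies", "Quies"],
]

# Assumed background: state frequencies pooled over all bins and biosamples.
pooled = Counter(state for row in bins for state in row)
n_labels = sum(pooled.values())
background = {state: count / n_labels for state, count in pooled.items()}

scores = [state_information(row, background) for row in bins]
# Bins ordered from most to least informative under this scoring.
ranked = sorted(range(len(bins)), key=lambda i: scores[i], reverse=True)
```

Under this assumed scoring, a bin where every biosample shares a state that is rare genome-wide ranks highest, while a bin whose state mix resembles the background ranks lowest.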

11:40 AM-11:50 AM
GeneDMRs: an R package for Gene-based Differentially Methylated Regions analysis
Room: Boston 1/2 (Ground Floor)
  • Xiao Wang, Technical University of Denmark, Denmark
  • Haja Kadarmideen, Technical University of Denmark, Denmark

Presentation Overview:

Calculating methylation levels within a single gene can help to identify and investigate Gene-based Differentially Methylated Regions (GeneDMRs). Such GeneDMRs are more informative than single differentially methylated cytosines (DMCs), as they provide an overall methylation profile for each gene.
The mean methylation of a gene of one treatment group is defined as:
\[ \sum_{i=1}^{n} \frac{\sum_{j=1}^{m} MR_{ij}}{\sum_{j=1}^{m} TR_{ij}} \cdot W_{ij} \quad \text{and} \quad W_{ij} = \frac{\sum_{j=1}^{m} TR_{ij}}{\sum_{i=1}^{n} \sum_{j=1}^{m} TR_{ij}}, \]
where $MR_{ij}$ and $TR_{ij}$ are methylated and total read number of the involved CpG/DMC $j$ at a given gene of individual $i$, $n$ is the total individual number of one treatment group, $m$ is total number of CpG/DMC involved in this gene and $W_{ij}$ is the weight of reads.
GeneDMRs is defined by the comparisons across different treatment groups following logistic regression model:
\[ \ln\left(\frac{\pi_i}{1-\pi_i}\right) = u + \beta T_i + e, \]
where $\pi_i$ is the mean methylation and $T_i$ is the treatment.
GeneDMRs is a user-friendly package that can easily output the required results and figures, such as mean methylation levels of all genes and CpG islands, or of a specific gene or promoter/exon/intron region, with the corresponding boxplot, heat map and correlation matrix, as well as GO terms/pathways in hyper-/hypo-methylated categories. As more features for GeneDMRs are being added, the current offline GeneDMRs package is available from the authors.
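
The read-weighted mean methylation defined above can be sketched in a few lines. This is an illustrative Python stand-in, not code from the GeneDMRs R package. The function name and toy read counts are hypothetical, and the weight is read as each individual's share of the group's total reads for the gene.

```python
def gene_mean_methylation(methylated, total):
    """Read-weighted mean methylation of one gene in one treatment group.

    methylated[i][j] and total[i][j] are the methylated and total read
    counts of CpG/DMC j in individual i (MR and TR in the abstract).
    """
    group_total = sum(sum(row) for row in total)
    mean = 0.0
    for mr_row, tr_row in zip(methylated, total):
        tr_i = sum(tr_row)
        if tr_i == 0:
            continue  # an individual with no reads carries zero weight
        weight = tr_i / group_total            # W: share of the group's reads
        mean += (sum(mr_row) / tr_i) * weight  # methylation ratio times weight
    return mean


# Hypothetical counts: two individuals, two CpGs in the gene.
methylated = [[8, 2], [60, 12]]
total = [[10, 10], [40, 40]]
result = gene_mean_methylation(methylated, total)
```

Under this reading the weights cancel the per-individual denominators, so the result equals the group's pooled methylated reads divided by its pooled total reads (82/100 in the toy data).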

11:50 AM-12:00 PM
The gene Expression and Analysis Resource (gEAR) Portal
Room: Boston 1/2 (Ground Floor)
  • Joshua Orvis, University of Maryland School of Medicine - Institute for Genome Sciences, United States
  • Ronna Hertzano, University of Maryland School of Medicine, United States
  • Anup Mahurkar, University of Maryland School of Medicine - Institute for Genome Sciences, United States
  • Yang Song, University of Maryland School of Medicine - Institute for Genome Sciences, United States
  • Kevin Rose, University of Maryland School of Medicine, United States
  • Beatrice Milon, University of Maryland School of Medicine, United States
  • Jayaram Kancherla, University of Maryland, United States
  • Dustin Olley, University of Maryland School of Medicine - Institute for Genome Sciences, United States
  • Brian Gottfried, University of Maryland School of Medicine - Institute for Genome Sciences, United States
  • Hector Corrada Bravo, University of Maryland, College Park, United States

Presentation Overview:

The gEAR portal (umgear.org) is an online tool for multi-omic and multi-species data visualization, sharing, and analysis. Originally designed for auditory researchers, the gEAR portal has now been expanded for general use. The gEAR is unique in its ability to allow users to upload, view and analyze their own data in the context of previously published datasets, as well as confidentially share their data with collaborators prior to publication. It is also unique in combining not only multiple species but multiple data types including bulk RNA-seq, sorted cell RNA-seq, single cell RNA-seq (scRNA-seq) and epigenomics in a one page, user-friendly, browseable format. Individual expression datasets can be displayed in a variety of ways alongside each other, including interactive bar, line or violin plots, colorized anatomical SVGs, tSNE and PCA plots.

We have integrated a scRNA-seq workbench into the gEAR which provides access to both the raw data of scRNA-seq datasets, as well as saved expert analyses where cell types have already been assigned – giving researchers rapid insight into gene expression of their cell type of interest. This presentation functions as a step-by-step introduction to the gEAR portal, now a mainstream multi-omic data source for the ear research community.

12:00 PM-12:10 PM
Proactive Visual and Statistical Analysis of Genomic Data in Epiviz
Room: Boston 1/2 (Ground Floor)
  • Jayaram Kancherla, University of Maryland, United States
  • Zhe Cui, University of Maryland, United States
  • Niklas Elmqvist, University of Maryland, United States
  • Hector Corrada Bravo, University of Maryland, College Park, United States

Presentation Overview:

Integrative analysis of genomic data that combines statistical methods with visual exploration has gained widespread adoption. Many existing methods involve a combination of tools and resources: user interfaces that provide visualization of large genomic datasets, and computational environments that focus on data analyses over various subsets of a given dataset. Over the last few years, we have developed Epiviz as an integrative and interactive genomic data analysis tool that tightly couples visualization with a state-of-the-art statistical analysis framework. We present a proactive and automatic visual analytics system integrated with Epiviz that alleviates the burden of manually executing the data analyses required to test biologically meaningful hypotheses. Results of potential interest that are proactively identified by server-side computations are listed as notifications in a feed. The feed turns genomic data analysis into collaborative work between the analyst and the computational environment, which shortens the analysis time and allows the analyst to explore results efficiently. This effort provides initial work on systems that substantially expand how computational and visualization frameworks can be tightly integrated to facilitate interactive genomic data analysis.

12:10 PM-12:20 PM
ImmuneRegulation: A web based tool for identifying human immune regulatory elements
Room: Boston 1/2 (Ground Floor)
  • Selim Kalayci, Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, United States
  • Robert J. Klein, Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, United States
  • John S. Tsang, Multiscale Biology Section, Laboratory of Immune System Biology, NIAID & NIH Center for Human Immunology, NIH, United States
  • Bali Pulendran, Emory Vaccine Center/Yerkes National Primate Research Center at Emory University, United States
  • Gregory Poland, Mayo Clinic, United States
  • Ruth R. Montgomery, Section of Rheumatology, Department of Internal Medicine, Yale School of Medicine, United States
  • Eva Harris, Division of Infectious Diseases and Vaccinology, School of Public Health, University of California, Berkeley, United States
  • Chris Cotsapas, Department of Neurology, Yale University, United States
  • Irene Ramos, Department of Microbiology and Global Health & Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, United States
  • Myvizhi Esai Selvan, Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, United States
  • Zeynep H. Gümüş, Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, United States

Presentation Overview:

Humans vary considerably in their healthy immune phenotypes and in their immune responses to various stimuli. Recent high-throughput studies are contributing to an improved understanding of immune cell function and regulation. Extensive datasets are publicly available through multiple large consortia, including the Human Immunology Project Consortium, which provides measurements of the healthy and activated human immune system, coupled with detailed clinical phenotyping in well-characterized cohorts. However, there is currently no central resource to interactively explore this wealth of data.

We developed a user-friendly open-access web portal, ImmuneRegulation, that enables users to interactively explore immune regulatory elements. ImmuneRegulation currently provides the largest centrally integrated resource on human transcriptome regulation across whole blood and blood cell types, including (i) ~43,000 genotyped individuals with associated gene expression data from ~51,000 experiments, yielding associations on ~220 million eQTLs; (ii) 14 million transcription factor binding region hits extracted from 1,945 ChIP-seq studies; and (iii) the latest GWAS catalog with 67,230 published variant-trait associations. In its front-end, a visually intuitive web interface enables query, browsing and interaction with large volumes of data, including user-supplied data. For gene(s) queried, visual, interactive summaries of regulatory elements are returned to help explore and communicate results. ImmuneRegulation is available at https://icahn.mssm.edu/immuneregulation.

12:20 PM-12:40 PM
Poster lightning talks 1 - BioVis
Room: Boston 1/2 (Ground Floor)
2:00 PM-3:00 PM
BioVis Keynote: Physical, Contextual, and Full of Value? What do novel directions in Visualization teach us about judging the value of visualization?
Room: Boston 1/2 (Ground Floor)
  • Petra Isenberg, INRIA, France

Presentation Overview:

Judging the value of visualizations is an important task in many situations, and research has already derived several methodologies and theoretical frameworks for judging the success of visualizations. Traditionally, discussions of value have focused largely on assessing visualizations with objective measures such as the effectiveness and efficiency of knowledge or insight gained, or task completion more broadly. In this talk I will discuss areas of visualization where the measures of success or value from the existing literature are harder to apply. In particular, I will discuss the problem of judging value for data physicalizations and how focusing on traditional "values of visualizations" may actually devalue work in this area that has shown obvious merit. I will argue for a more holistic approach to defining the success or "value" of visualization, especially with a focus on human emotion.

3:00 PM-3:20 PM
Combining programmatic visualization and interactive exploration of large phylogenetic trees using ETE toolkit
Room: Boston 1/2 (Ground Floor)
  • Jaime Huerta-Cepas, Centro de Biotecnología y Genómica de Plantas, (CBGP UPM-INIA), Spain

Presentation Overview:

ETE is a computational framework that assists in the reconstruction, analysis and visualization of phylogenetic trees and multiple sequence alignments. Over the past 10 years, ETE’s visualization options have evolved to provide a highly flexible programmatic tree drawing system in Python, which includes the ability to create custom decorated figures with notebook integration.

However, the size and complexity of current phylogenetic datasets pose fundamental problems for practical visualization. First, smooth interactive browsing and zooming capabilities are difficult to achieve when dealing with large tree structures. Second, the computational cost of rendering large tree figures, with potentially tens of thousands of different graphical elements, tends to be high: too high to be overcome by just delegating visualization to high-level libraries such as matplotlib or d3, or even to raw OpenGL directives.

Here, I will report on the current advances in ETE’s tree visualization system, whose most experimental prototype allows for the interactive visualization of programmatically decorated tree diagrams with more than a million leaves. For this, I will discuss successful (and failed) attempts built around different strategies, such as performance tricks inspired by video-game scene rendering and context-based smart zooming approaches.

3:20 PM-3:30 PM
Flud: a hybrid crowd-algorithm approach for visualizing biological networks
Room: Boston 1/2 (Ground Floor)
  • T. M. Murali, Virginia Tech, United States
  • Aditya Bharadwaj, Virginia Tech, United States
  • David Gwizdala, Bridgewater Associates, United States
  • Yoonjin Kim, Virginia Tech, United States
  • Kurt Luther, Virginia Tech, United States

Presentation Overview:

Network biologists use graphs to understand the protein interactions that underlie processes that take place in the cell. To present and analyze these graphs, researchers require aesthetic layouts that clearly convey the relevant biological information. However, the problem remains challenging due to multiple conflicting aesthetic criteria and complex domain-specific constraints. In this research, we have developed Flud, an online game with a purpose (GWAP) that allows humans with no expertise to design biologically meaningful graph layouts with the help of algorithmically generated suggestions. The goal of the players is to create a layout that optimises a weighted score based on previously defined aesthetic considerations and a new biologically inspired criterion: “maximize the number of downward pointing.” Further, we propose a novel hybrid approach for graph layout wherein crowd workers and a simulated annealing algorithm build on each other's progress. To showcase the effectiveness of Flud, we recruited crowd workers on Amazon Mechanical Turk to lay out complex protein networks that represent signaling pathways. Our results show that the proposed hybrid approach outperforms state-of-the-art techniques for graphs with a large number of feedback loops.
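
As an illustration of a "downward pointing" layout criterion, here is a minimal sketch that scores a layout by the fraction of directed edges whose target sits below its source. The abstract does not state what exactly is counted or how it is weighted, so the choice of edges as the counted unit, the coordinate convention (y grows upward), the function name, and the toy network are all assumptions rather than Flud's actual scoring.

```python
def downward_edge_fraction(positions, edges):
    """Fraction of directed edges (source, target) whose target is drawn
    lower than its source. Assumes y grows upward, so lower means smaller y."""
    if not edges:
        return 0.0
    downward = sum(
        1 for source, target in edges
        if positions[target][1] < positions[source][1]
    )
    return downward / len(edges)


# Hypothetical three-node signaling pathway with one feedback edge.
positions = {"receptor": (0.0, 2.0), "kinase": (0.0, 1.0), "tf": (1.0, 0.0)}
edges = [("receptor", "kinase"), ("kinase", "tf"), ("tf", "kinase")]
fraction = downward_edge_fraction(positions, edges)
```

In this toy layout two of the three edges point downward. The feedback edge cannot be turned downward without flipping another edge, which hints at why graphs with many feedback loops are hard to lay out under such a criterion.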

3:30 PM-3:40 PM
Evidente – A visual analytics tool for data enrichment in SNP-based phylogenetic trees
Room: Boston 1/2 (Ground Floor)
  • Mathias Witte Paz, University of Tübingen, Center for Bioinformatics, Germany
  • Alexander Seitz, University of Tübingen, Center for Bioinformatics, Germany
  • Kay Nieselt, Center for Bioinformatics Tübingen, University of Tübingen, Germany

Presentation Overview:

In recent years, developments in next-generation sequencing technologies have enabled genome resequencing projects covering many individuals within one species. The genomes are often analysed with respect to single-nucleotide polymorphisms (SNPs) or small indels. This makes it possible to reconstruct a phylogenetic tree of all individuals based on the detected mutations. Given such a phylogenetic tree, a common task is to identify clade-specific SNPs within the reconstructed phylogeny, i.e. SNPs that support the computed topology. One often also wishes to analyse these mutations in more detail, for example to retrieve the functional consequences that a SNP may have on the organism or to compute the enrichment of certain features within the phylogenetic tree. Here, we present on-going work on the visual analytics tool Evidente for annotation and analysis of metadata in SNP-based phylogenetic trees. Besides visualizing a phylogenetic tree, Evidente gives the user a visual overview of the distribution of SNPs across all samples as well as of clade-specific SNPs within the tree. Furthermore, Evidente allows the user to run an enrichment analysis, for example for Gene Ontology (GO) annotations.

3:40 PM-3:50 PM
BioCicle: A Tool for Summarizing and Comparing Taxonomic Profiles out of Biological Sequence Alignments
Room: Boston 1/2 (Ground Floor)
  • Meili Vanegas-Hernandez, Universidad de los Andes, Colombia
  • Fabio Andres Lopez-Corredor, Universidad de los Andes, Colombia
  • Tiberio Hernandez, Universidad de los Andes, Colombia
  • Alejandro Reyes, Universidad de los Andes, Colombia
  • John Alexis Guerra-Gomez, Northeastern University Silicon Valley, Colombia

Presentation Overview:

Biological sequence comparison is a crucial step in identifying and cataloging new sequences. To achieve this, computational biologists must compare a new sequence against ever-growing biological databases. This comparison produces a myriad of results, from which extracting useful information is costly given the lack of tools that provide an overview of the results. Moreover, poor comparison analysis can lead to new sequences being cataloged incorrectly. This project is the outcome of a close collaboration with domain experts and a thorough study of the state of the art. As a result, six analysis tasks commonly performed by bioinformaticians were identified. Each task consists of either summarizing (for single sequence results) or comparing (for multiple sequence results): regions of interest (AT1), taxonomic reports (AT2), and sequences' descriptions (AT3). A user test was conducted with a group of computational biologists, which allowed us to evaluate both the usability and usefulness of the platform. In brief, this project presents a taxonomy of analysis tasks, a visualization design for AT2a and AT2b, a dummy use case for a subset of a real metagenomics result set, and an open source prototype (available at http://54.208.29.57) presented as a proof of concept.

3:50 PM-4:00 PM
PubScore: quantifying and visualizing the literature relevance of a gene set about any topic
Room: Boston 1/2 (Ground Floor)
  • Tiago Lubiana, School of Pharmaceutical Sciences, University of São Paulo, Brazil
  • Helder I. Nakaya, University of São Paulo, Brazil

Presentation Overview:

Big data analysis in biology often generates lists of genes. Finding important targets for experimental testing, or confirming the validity of the analysis, requires expertise on the specific topic of study. To facilitate interpretation, databases such as KEGG and Reactome associate gene lists with predefined terms. These, however, have significant limitations: there are only pre-established categories; they become outdated as new literature appears; and membership of a category is usually binary (a gene either is a member or is not). In this work, we present PubScore, an R package that provides a quantitative measurement of the literature relevance of a gene set together with an interactive visualization. By searching PubMed, the PubScore package retrieves a literature score for combinations of genes and terms of interest, estimates a p-value for how likely such a score would be at random, and provides an interactive output that enables pinpointing of interesting genes. Our approach is category-free, returns up-to-date associations and distinguishes quantitatively between strong and weak relations. Combining the scoring algorithm with a straightforward, biologist-friendly visualization, PubScore is a powerful tool for understanding the known relevance of a set of genes in the literature.
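
The idea of testing a literature score against chance can be illustrated with a small simulation. This is a Python sketch, not the PubScore R package or its API. It assumes the score is the mean number of gene-term co-mentioning articles per gene and that the p-value is estimated by drawing random gene sets of the same size. The article counts below are made up and no PubMed query is performed.

```python
import random


def literature_score(genes, article_counts):
    """Mean number of articles mentioning a gene together with the term."""
    return sum(article_counts.get(gene, 0) for gene in genes) / len(genes)


def empirical_p_value(genes, article_counts, universe, n_draws=1000, seed=0):
    """Share of random gene sets scoring at least as high as the query set.
    The +1 terms keep the estimate strictly above zero."""
    rng = random.Random(seed)
    observed = literature_score(genes, article_counts)
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(universe, len(genes))
        if literature_score(sample, article_counts) >= observed:
            hits += 1
    return (hits + 1) / (n_draws + 1)


# Hypothetical universe of 100 genes; only three co-occur with the term.
universe = [f"GENE{i}" for i in range(100)]
article_counts = {gene: 0 for gene in universe}
for gene in ("GENE0", "GENE1", "GENE2"):
    article_counts[gene] = 50

p_relevant = empirical_p_value(["GENE0", "GENE1", "GENE2"], article_counts, universe)
p_unrelated = empirical_p_value(["GENE10", "GENE11", "GENE12"], article_counts, universe)
```

A gene set made up of the three well-covered genes gets a small p-value, while a set with no co-mentions gets a p-value of 1, since every random set scores at least as high.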

4:40 PM-5:00 PM
Fast and curious: Efficient exploratory visualization of cross-domain interaction networks
Room: Boston 1/2 (Ground Floor)
  • Mehmet Gönen, Koç University, Turkey

Presentation Overview:

Exploratory visualization of biological datasets has many application areas. For example, visualizing single-cell RNA sequencing datasets in two-dimensional plots using the t-Distributed Stochastic Neighbor Embedding (tSNE) or Uniform Manifold Approximation and Projection (UMAP) algorithms has become a standard approach to identifying cell subpopulations. However, these approaches have not been extended to modeling cross-domain interaction networks, which contain interactions between two types of objects such as drug-target or drug-disease pairs. We developed a computational framework that extends these dimensionality reduction algorithms to map cross-domain interaction networks into low-dimensional subspaces while preserving both within-domain similarities and cross-domain interactions. We extensively tested our proposed framework, named Fast & Curious (FC), on five different drug-target and three different drug-disease interaction networks. We evaluated the visualization and retrieval performance of our algorithm by embedding these eight interaction networks into two-dimensional subspaces.

5:00 PM-5:10 PM
Density-Preserving Visualization of Single Cells
Room: Boston 1/2 (Ground Floor)
  • Bonnie Berger, Massachusetts Institute of Technology, United States
  • Ashwin Narayan, Massachusetts Institute of Technology, United States
  • Hyunghoon Cho, Massachusetts Institute of Technology, United States

Presentation Overview:

Exploratory analysis of single-cell omics datasets begins with visualizing the data in low dimensions to reveal structural insights that can be probed in future experiments. For these insights to be biologically meaningful, it is crucial that the visualizations be as faithful to the source dataset as possible. However, as we demonstrate both theoretically and empirically, widely used methods for single-cell data visualization, including t-SNE and UMAP, entirely ignore local density information, leading to visualizations where the sizes and densities of clusters have no bearing on the true transcriptional variability of the underlying cell states. To address this problem, we present density-aware t-SNE (da-SNE), which produces visualizations whose density landscape correlates better with that of the source dataset. We achieve this property by incorporating a differentiable metric for the density of a point in a dataset into the t-SNE objective function to obtain a joint optimization problem. On simulated and real datasets, we demonstrate that da-SNE not only allows researchers to accurately compare cluster sizes and densities in the visualizations, but also represents overlapping clusters more faithfully and is more robust to parameter selection than t-SNE. Our approach is broadly applicable to other data science domains where the local density of data encodes valuable information.

5:10 PM-5:20 PM
iBioProVis: Interactive Visualization and Analysis of Compound Bioactivity Space
Room: Boston 1/2 (Ground Floor)
  • Ahmet Sureyya Rifaioglu, Middle East Technical University, Turkey
  • Maria Jesus Martin, EMBL-EBI, United Kingdom
  • Ataberk Donmez, Middle East Technical University, Turkey
  • Aybar Can Acar, Middle East Technical University, Turkey
  • Rengül Atalay, Middle East Technical University, Turkey
  • Tunca Dogan, European Bioinformatics Institute, Turkey
  • Mehmet Volkan Atalay, Middle East Technical University, Turkey

Presentation Overview:

Visualization and interpretation of the high-dimensional chemical compound and target space is critical for a better understanding of the mechanisms of bioactivity and of the drug discovery process. Here, we describe iBioProVis, which projects and visualizes compounds in 2D space based on their structural features, in the context of their cognate targets. The inputs are pairs of ChEMBL target identifiers and the output is the 2D projection plot of the active compounds of the input targets. By looking at the distribution of compounds (i.e., points) in a projection, the user can infer that compounds that are close to each other may possess similar binding characteristics. One interesting additional feature is that the user can also provide a list of SMILES strings as input. In this way, the user can observe the projection of these compounds along with the projections of previously reported active compounds of the selected targets. iBioProVis provides an interactive environment where users can select different compounds and obtain information about them. iBioProVis also provides cross-references to well-known databases, so that users can easily relate the entities and navigate to those databases through clickable links. iBioProVis is freely available at http://ibioprovis.kansil.org/.

5:20 PM-5:30 PM
BioVis Challenges session @VIS
Room: Boston 1/2 (Ground Floor)
5:30 PM-5:50 PM
Poster lightning talks 2 - BioVis
Room: Boston 1/2 (Ground Floor)
5:50 PM-6:00 PM
BioVis@VIS and Closing Remarks
Room: Boston 1/2 (Ground Floor)
  • BioVis General Chairs
6:00 PM-8:00 PM
BioVis poster session
Room: Boston 1/2 (Ground Floor)