Session A: (July 7 and July 8)
Session B: (July 9 and July 10)
Short Abstract: The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) projects produced RNA-Seq data for tens of thousands of cancer and non-cancer samples, providing an unprecedented opportunity for data mining, cancer drug target discovery and data visualization. In recent years, promising cancer drugs, including Panitumumab and Bevacizumab, have been developed that inhibit cancer cells by selectively targeting over-expressed EGFR or VEGF genes in cancer cells, while leaving normal cells unharmed. Genetic alterations influence gene expression directly or indirectly. A frequently used strategy for discovering candidate cancer drug targets is therefore to find genes that are expressed specifically in cancer. This study aims to investigate normalization methods for integrating different expression datasets, explore effective approaches to obtain differentially expressed genes, profile the prognostic genes and transcripts in survival analyses, characterize the distribution of cancer-specific genes or transcripts, and analyze their biological functions. We will also develop tools for visualizing integrated expression data, with the aim of disseminating such data to the wider research community. We also plan to find useful biomarkers for early diagnosis. Finally, by investigating the association between genetic alterations and over-expression, we aim to elucidate the underlying genetic mechanisms of differentially expressed genes.
Short Abstract: Motivation: RNA-seq data is biased and accurate detection of differentially expressed genes is not a trivial task. As a result, researchers should analyze RNA-seq data like they would any other biased multivariate data. The most effective approach to modern data analysis is to iterate between models and visuals, and to enhance the appropriateness of models based on feedback from visuals. As it stands, there is a need to make it easier for researchers to use models and visuals in a complementary fashion during RNA-seq data analysis. Results: We use real RNA-seq data to show that our visualization tools can detect normalization problems, DEG designation problems, and common errors in RNA-seq analysis. We also show that our tools can identify genes of interest that cannot be obtained by models. Conclusion: In this project, we do not propose that users radically change their approach to RNA-seq analysis. Instead, we propose that users simply modify their usual approach to RNA-seq analysis by quickly assessing the sensibility of their models with multivariate statistical graphics. We plan to serve a role in this solution by publishing a new R software package that includes the useful plotting techniques we introduce in this project.
Short Abstract: Thanks to the reduction in cost of sequencing technologies, many laboratories are collecting enormous amounts of genomic data. Powerful and easy to use software applications are needed to translate the information hidden in sequencing data into biological discovery to impact patient care. Available software packages typically require advanced programming knowledge, system administration privileges or they are web services that force researchers to work on outside servers. We developed D3Oncoprint to facilitate the interactive exploration of genomic datasets on local machines with no programming skills required. D3Oncoprint is a standalone application used to visualize and dynamically explore annotated genomic mutation files. D3Oncoprint provides links to curated databases (e.g., CIViC, OncoKB, My Cancer Genome, and FDA approved drugs), as well as curated gene lists from BioCarta pathways, and FoundationOne cancer panels to explore commonly investigated biological processes. D3Oncoprint is free and available for download from the website of the Biometric Research Program (BRP) of the Division of Cancer Treatment and Diagnosis, NCI (https://brb.nci.nih.gov/d3oncoprint/). The focus on interactive visualization with biological and medical annotation significantly lowers the barriers between complex genomic data and biomedical investigators. D3Oncoprint can help researchers explore their own data, without the need of an extensive computational background.
Short Abstract: We present the BioBlox suite of gaming-inspired programs based around protein docking (www.bioblox.org). BioBlox3D (www.bioblox3d.org) is a web-based program for desktops tackling the challenge of starting with two unbound protein molecules and predicting the structure of the complex. BioBlox3D provides an easy-to-use approach for biologists to explore protein docking graphically. A motivation is that humans are particularly skilled at visual reasoning, sometimes outperforming computers. We have a gaming scoreboard and will establish a crowdsourcing approach for protein docking. BioBloxVR is a virtual reality version of BioBlox3D which provides an inspiring game for exhibitions and outreach meetings. Users can pick up, rotate and dock protein molecules using their Vive controllers in a highly intuitive way. Inspired by BioBlox3D, we have developed a simple, fun-to-play game named BioBlox2D, designed for phones and tablets and available from the App Store and Google Play. The game models fragment-based drug discovery. Small shapes with charges need to be correctly assembled so they fit into a complementary protein receptor. As the player progresses through the game there are questions in a biomedical quiz posing problems such as identifying glucose as the molecule used by our cells to produce energy. YouTube videos are at: https://www.youtube.com/watch?v=2z8y7rUWOos https://www.youtube.com/watch?v=4X9vgPzk_pM https://youtu.be/YInXSEVWPvk
Short Abstract: Advances in single-cell RNA-seq technology, flow cytometry and mass cytometry have enabled the expression profiling of a large number of genes and proteins for hundreds of thousands of individual cells. However, current visualisation techniques do not allow for effective display and understanding of the data due to the large number of points and the use of non-immersive flat-screen visualisation. With the widespread availability of low-cost virtual reality (VR) devices, such as Google Cardboard, we propose the use of these devices as an immersive environment for visualising single-cell data in order to improve the navigation and exploration of the large number of cells. We have developed starmap, a VR program for visualising single-cell data designed to work with low-cost VR headsets. starmap offers a number of interaction methods, such as a wireless controller and voice control, and has a built-in star plot visualisation to allow users to explore features of the cells.
Short Abstract: Blood and corresponding saliva samples were collected from 1,213 subjects of all ages and both sexes presenting with fever and a parasitaemia level ≥2000 over a period of three years. Samples were collected from selected hospitals in Ado-Odo/Ota, Ogun State to determine the prevalence of falciparum malaria and its resistance genes. The incidence of falciparum malaria was 51.04% in males and 39.62% in females, with a total incidence of 45.01%. Point mutations of K76T and N86Y in pfcrt and pfmdr1 genes respectively, as well as non-synonymous mutations in Pfk13 genes, were targeted using polymerase chain reaction-restriction fragment length polymorphism (PCR/Nested PCR-RFLP) and sequenced for further analysis. Epidemiological studies identified Pfk13 genes in 21.84% of 55 blood samples and 44.44% of 15 saliva samples. For Pfcrt, 47.89% of 55 blood and 31.43% of 35 saliva samples were identified. Pfmdr1 showed 34.78% of 46 blood samples and 26.67% of 30 saliva samples evaluated. Findings from this study establish the prevalence of malaria and the resistance pattern of P. falciparum in the study area, and hence aid in suggesting new or existing drugs of choice. They also suggest the use of saliva as a non-invasive malaria diagnostic tool that will be deployable to rural endemic areas.
Short Abstract: Enzymes play a crucial role in all living organisms as well as in industrial applications such as biofuel or cheese production. Most enzymes are catalytic proteins, that is, they accelerate a chemical reaction of one or more contacting substrate molecules. In general, these reactions do not happen at arbitrary positions of the enzyme’s surface but at specific binding sites. We present two visualization methods to assist users during the analysis of Molecular Dynamics simulations of such enzyme-substrate interactions. The first one targets the extraction and visualization of cavities, which are often the location of a binding site. Our second visual analysis method targets the substrate in the proximity of the enzyme. We visualize the temporally aggregated locations of the substrate molecules, thereby showing paths travelled by the substrate as well as preferred binding locations. We demonstrate the applicability of our approach for the visual analysis of simulation trajectories with more than 2M time steps, each one containing over 3M atoms, which occupy more than 2 TB of hard drive space. Our proposed visualization of aggregated properties can not only be rendered on standard consumer desktop hardware, but also directly summarizes the temporal development during the simulation.
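The temporal aggregation of substrate locations described above can be illustrated with a minimal sketch. The abstract does not specify the aggregation scheme, so the regular voxel grid, the function name aggregate_occupancy and the voxel_size parameter are all assumptions made for illustration only.

```python
from collections import Counter

def aggregate_occupancy(trajectory, voxel_size):
    """Count how often substrate atoms fall into each grid voxel over all frames.

    trajectory: list of frames, each frame a list of (x, y, z) positions.
    voxel_size: edge length of the cubic voxels (hypothetical parameter).
    """
    counts = Counter()
    for frame in trajectory:
        for x, y, z in frame:
            # Map the continuous position to an integer voxel index.
            voxel = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
            counts[voxel] += 1
    return counts

# Three frames with one substrate atom each.
frames = [[(0.5, 0.5, 0.5)], [(0.7, 0.2, 0.9)], [(1.5, 0.1, 0.1)]]
occupancy = aggregate_occupancy(frames, voxel_size=1.0)
```

Voxels with high counts would correspond to frequently visited locations; only the aggregated counts, not the full trajectory, need to be kept for display.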
Short Abstract: Epigenetics involves the study of histones (proteins that DNA wraps around). Side chains — known as marks — attached to these histones may determine the function of the DNA wrapped around them. Identifying patterns or signatures consisting of 100 marks can be a challenging task. We developed a tool called HebbPlot, which can learn and visualize a signature from thousands of genomic locations that have the same function, e.g. active promoters. HebbPlot obtains vectors that represent overlaps between a set of regions and histone marks. These vectors are fed into a Hebbian network, which outputs a gray scale image of the overlaps between the genetic element and the epigenome. Each pixel represents the presence or absence of a mark. We used HebbPlot in six case studies conducted on 57 cell types. HebbPlots of promoters on the positive and the negative strands are mirror images, indicating the directionality of histone marks around active promoters. We confirmed that some marks are only present in high-CpG promoters in contrast to low-CpG promoters. HebbPlots show clear associations between the abundance of histone marks around coding regions and the level of gene expression. We hope HebbPlot will help biologists decipher the histone code.
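As a rough illustration of how overlap vectors could be condensed into a grayscale signature, the sketch below applies a plain Hebbian update with the output fixed at 1. The abstract does not give the network's details, so the bipolar coding (+1 for an overlapping mark, -1 otherwise), the normalisation and the function names are assumptions, not HebbPlot's actual implementation.

```python
def hebbian_signature(overlap_vectors, learning_rate=1.0):
    """Learn one weight per mark with the Hebb rule dw = lr * x * y, with y fixed at 1."""
    n_marks = len(overlap_vectors[0])
    weights = [0.0] * n_marks
    for vec in overlap_vectors:
        for i, present in enumerate(vec):
            x = 1.0 if present else -1.0  # bipolar coding (assumption)
            weights[i] += learning_rate * x * 1.0
    # Normalise by the number of regions so each weight lies in [-1, 1].
    n_regions = len(overlap_vectors)
    return [w / (learning_rate * n_regions) for w in weights]

def to_grayscale(weights):
    """Map weights in [-1, 1] to pixel intensities in [0, 255]."""
    return [round((w + 1.0) / 2.0 * 255) for w in weights]

# Four regions, three marks: mark 0 always overlaps, mark 1 never, mark 2 mostly.
regions = [[1, 0, 1], [1, 0, 1], [1, 0, 1], [1, 0, 0]]
signature = hebbian_signature(regions)
pixels = to_grayscale(signature)
```

Under these assumptions, a bright pixel indicates a mark that overlaps most regions and a dark pixel a mark that is mostly absent.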
Short Abstract: Affordable next-generation sequencing techniques and immuno-profiling assays have made it possible to monitor tumor evolution in patients over time, and to study their response to therapy. To interpret data from such studies, there is a critical need for visualization tools that help researchers explore temporal patterns within a single patient and across an entire patient cohort. In a week-long design sprint, we identified the requirements for a longitudinal cancer genomics visualization tool by interviewing potential end users. The sprint also included the implementation and testing of a first prototype. Inspired by the design of the Domino technique (Gratzl et al., 2014), our approach is based on heterogeneous heatmaps representing patient samples (columns) at different timepoints (blocks). Each sample can be represented by various user-defined variables such as mutation count in genes of interest, or measurements from other assays. Between timepoint blocks, users can add treatment blocks representing information about drug regimens. Within each block, samples can be grouped to show proportions of patients with a particular attribute. With this grouping, the heatmap can be transformed iteratively into a Sankey diagram. We will report on our visualization approach and insights from our design process.
Short Abstract: We present Scalable Insets, a feature-centric technique to visually explore genome interaction maps with many 2D features. Genome interaction maps present an approximate likelihood of physical interaction of pairs of regions on the genome. These maps contain up to several thousands of 2D features such as compartments, domains, and loops. Visual inspection is critical to generate new hypotheses, evaluate the performance of feature detectors, and stratify features into groups. Exploration of many but sparsely distributed features in traditional pan-and-zoom heatmaps is challenging, as visual representations change across zoom levels, context and navigational cues get lost upon zooming, and navigation is time consuming. Our technique visualizes features too small to be identifiable at certain zoom levels using magnified thumbnail views of the features called insets. Insets support users in searching, comparing, and contextualizing features, while reducing the amount of navigation needed. They are dynamically placed either within the viewport or along the boundary of the viewport to offer a compromise between locality and context preservation. Features are interactively clustered by location and type and are visually represented as a pileup showing the average and variance of features to provide scalable exploration within a single viewport.
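The pileup of a feature cluster, showing the average and variance of its members, can be sketched as a per-pixel computation over equally sized snippets. This is a minimal illustration under assumptions: the function name pileup, the nested-list representation and the use of population variance are not taken from the abstract.

```python
def pileup(snippets):
    """Return per-pixel mean and population variance over equally sized 2D snippets."""
    n = len(snippets)
    rows, cols = len(snippets[0]), len(snippets[0][0])
    mean = [[sum(s[r][c] for s in snippets) / n for c in range(cols)]
            for r in range(rows)]
    variance = [[sum((s[r][c] - mean[r][c]) ** 2 for s in snippets) / n
                 for c in range(cols)]
                for r in range(rows)]
    return mean, variance

# Two 2x2 snippets cut out around clustered features.
cluster = [
    [[0, 2], [4, 6]],
    [[2, 2], [0, 6]],
]
mean_img, var_img = pileup(cluster)
```

Pixels with high variance mark positions where the clustered features disagree, which is the information a pileup adds over a plain average.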
Short Abstract: An organism can be described by both its observable characteristics (phenotypes) and the underlying genes and genomic data (genotype) that cause the phenotype. There has been tremendous growth in both genomic data and phenotype imaging data. This growth has been seen in the maize research community with the sequencing of thousands of maize accessions and the availability of large-scale phenotype data. A challenge at MaizeGDB, the model organism database for maize, is to build connections between the phenotype and genotype data sets. BioDIG (Biological database of images and genomes) is a GMOD project consisting of a web-based software package that allows annotation of genotype-phenotype relationships. To integrate the phenotype images and the genomics information at MaizeGDB, we implemented and updated a maize-based version of BioDIG called MaizeDIG. MaizeDIG is enhanced to handle multiple genomes and is integrated with genome browsers to provide tracks showing mutant phenotype images within their genomic context. MaizeDIG allows for custom tagging of images to highlight regions related to the phenotypes and to curate and search by gene model, gene symbol, gene name, and allele. MaizeDIG is preloaded with 2,721 mutant phenotype images that are available on seven genome browsers.
Short Abstract: Enhancers, as specialized genomic cis-regulatory elements, activate transcription of their target genes and play an important role in pathogenesis of many human complex diseases. Despite recent systematic identification of them in the human genome, currently there is an urgent need for comprehensive annotation databases of human enhancers with a focus on their disease connections. In response, we built the Human Enhancer Disease Database (HEDD) to facilitate studies of enhancers and their potential roles in human complex diseases. HEDD currently provides comprehensive genomic information for ~2.8 million human enhancers identified by ENCODE, FANTOM5, and RoadMap with disease association scores based on enhancer-gene and gene-disease connections. It also provides Web-based analytical tools to visualize enhancer networks and score enhancers given a set of selected genes in a specific gene network. HEDD is freely accessible at http://zdzlab.einstein.yu.edu/1/hedd.php.
Short Abstract: Visualization of regions of conserved synteny between two genomes is supported by a number of available software applications. However, existing tools do not support interactive visualization of the genome features in those blocks based on their biological properties such as function and phenotype. To address this usability gap, we have developed a new interactive web-based conserved synteny browser, the JAX Synteny Browser. The JAX browser allows users to easily navigate from a genome-wide view of synteny to a conserved region of interest and then select which genome features are displayed in syntenic blocks of the reference and/or the comparison genome according to functional and phenotypic annotations. The dynamic, interactive visualizations allow users to explore syntenic relationships at four different view levels: genome, chromosome, block and feature. The initial implementation of the browser supports navigation and visualization of conserved synteny between the genomes of the laboratory mouse and human. However, the software design is intentionally genome agnostic and relies on data types that are available in well-accepted data format standards where possible. The software can be used for any two genomes for which synteny block information can be generated and for which biological attributes of genome features are available.
Short Abstract: While the structure of populations has garnered increasing interest as a source of insight into their evolution and behavior, an intuitive and informative method for its visualization remains a challenge to be met. We previously developed PopNet, featuring network graphs and chromosome painting as innovations in visualizing the genomes of populations. To make it accessible to a wider audience, we present PopNetD3, a browser-based interface to PopNet. Users can submit data to a cloud server for analysis, and view their network in-browser through a D3-based network visualizer. The advantages of PopNetD3 include low user requirements and a stable run environment, along with the introduction of new functionalities. PopNetD3’s functionalities are illustrated in an analysis of 44 whole genome sequence samples of Neisseria gonorrhoeae, the causative agent of gonorrhea. The network graph generated by PopNetD3 reveals 5 subpopulations within the sample set, and represents their relatedness as links between nodes. Geography is shown as an important factor modulated by long-distance travel. In addition, segregation between male and female samples points to host-specific adaptations. The chromosome paintings, embedded within each node of the network, depict the degree of shared ancestry between subpopulations.
Short Abstract: Advances in high-throughput technologies are leading to major new insights into human genetic disease mechanisms. However, the many results are scattered throughout the literature and are represented in many different ways, including free text, cartoons, pathway diagrams, and network graphs. Thus there is a need for a framework that is capable of integrating and presenting this knowledge in a coherent, comprehensive, and intuitive way. The MecCog framework utilizes formal concepts of biological mechanism to achieve this goal. In MecCog, a disease mechanism schema consists of a sequence of substate perturbations, starting at the DNA stage of biological organization and progressing through RNA, protein, molecular complex, cell, tissue and organ stages to the disease phenotype. Each substate perturbation produces the next through a mechanism module. Representation of uncertainty, ambiguity and ignorance, as well as inclusion of the evidence supporting each mechanism component, is tightly integrated into the framework. Mechanism schemas are represented by a diagrammatic language and are accessed through an interactive graphical interface (http://www.meccog.org). In addition to providing an integrative framework for disease mechanism, MecCog facilitates prioritization of future experiments, identification of new therapeutic targets, detection of epistatic interactions between loci in complex trait disease, and optimized therapy choice.
Short Abstract: The increasing abundance of genome-wide data has created a need for visual comparison between different genomic data sets as well as comparison between different regions in the same genome. We have developed HiGlass (http://higlass.io), a genome visualization tool that supports visualization of genome-wide 1D tracks and 2D interaction matrices in flexible view configurations that users can create on the fly. Recently, we extended HiGlass to provide a solution for comparing hundreds of genomic datasets efficiently by using a multi-resolution matrix data type with scaling and aggregation in one dimension. The ‘multi-vector’ matrix can store multiple data sets so that they are rapidly retrievable, thus eliminating the need to read and compare data across many files. The advantages of this solution are demonstrated in our use of HiGlass to display 256 ChIP-seq profiles using a heatmap as well as a chromatin state model with 15 states using a stacked bar chart. This data format can also be used to display genomic sequences. Returning tile data instead of pre-rendered image tiles allows for flexibility. This is particularly useful in features we have implemented for our visualizations, such as dynamic scaling, plot type selection, and smooth zooming and panning.
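The idea of a matrix that is aggregated along one dimension only can be sketched as follows: rows stand for datasets and are kept intact, while adjacent genomic bins (columns) are merged pairwise to form coarser zoom levels. This is an illustrative sketch, not the HiGlass storage format; the function name build_zoom_levels, the pairwise merging and the use of summation as the aggregation function are assumptions.

```python
def build_zoom_levels(matrix, min_bins=1):
    """Build coarser zoom levels by summing adjacent column pairs; rows are untouched.

    matrix: list of rows, one row per dataset, one column per genomic bin.
    Returns a list of matrices from finest (input) to coarsest.
    """
    levels = [matrix]
    current = matrix
    while len(current[0]) > min_bins:
        # Merge bins 2i and 2i+1; a trailing odd bin is carried over on its own.
        coarser = [[sum(row[i:i + 2]) for i in range(0, len(row), 2)]
                   for row in current]
        levels.append(coarser)
        current = coarser
    return levels

# Two datasets (rows) over four genomic bins (columns).
signal = [[1, 2, 3, 4],
          [0, 1, 0, 1]]
levels = build_zoom_levels(signal)
```

Because every level keeps one row per dataset, a viewer can fetch a single tile at the appropriate resolution and still draw all datasets side by side.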
Short Abstract: The Arabidopsis Interactions Viewer 2.0 (AIV 2.0) is a vast improvement over a former web application (AIV 1.0) and allows network visualization of 2.8 million experimentally-validated and 70,000 predicted Arabidopsis protein-protein and protein-DNA interactions curated by the Bio-Analytic Resource (BAR) for Plant Biology. Additional interaction data can also be loaded from other Proteomics Standards Initiative Common QUery InterfaCe (PSICQUIC) servers such as BioGrid or IntAct. Major technical improvements include using newer, faster technology such as cytoscape.js, allowing a much larger network to be loaded. Moreover, AIV 2.0 integrates multiple facets of biology such as subcellular localization pie-chart overlays, plant gene function terms (MapMan), group-by-localization layouts and filtering by co-expression scores. Multiple export options are available, such as exporting to a Cytoscape-tailored JSON and a tabular format (CSV), which allows user-defined scripting. To our knowledge, this is the only interactions visualization tool for plant biologists to feature the above facets of biological knowledge. Our goal is therefore to allow researchers to use this tool’s multiple data layers to generate new hypotheses relevant to plant biology.
Short Abstract: Functional enrichment analysis (FEA) is a method used to identify functional profiles or over-represented classes in a set of genes (typically differentially expressed genes). Popular enrichment analysis algorithms/tools include singular enrichment analysis (SEA), gene set enrichment analysis (GSEA), and modular enrichment analysis (MEA). Regardless of the algorithm, an enrichment P-value is always calculated to indicate the enrichment of the tested biological function. Even though some FEA tools can visualize their particular analysis results, none of them allow for the visualization of enrichment scores across multiple lists of genes of interest, or across different analysis algorithms. To enable cross-comparison of FEA scores, we developed FeaVis, a user-friendly Shiny application that enables easy and interactive visualization of several enrichment scores simultaneously. After uploading the FEA results into FeaVis, the user can interactively adjust filtering thresholds and search for a specific biological function in the datasets, to simultaneously visualize the corresponding enrichment analysis results via dot plots. Users can also save the generated figures locally. FeaVis is publicly available at https://yanli.shinyapps.io/FeaVis_v1/
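For readers unfamiliar with how an enrichment P-value can arise, the sketch below computes a one-sided hypergeometric tail probability, a test commonly used for over-representation analysis. The abstract does not state which test any particular tool uses, so the choice of test, the function name enrichment_p_value and its parameter names are assumptions for illustration.

```python
from math import comb

def enrichment_p_value(population, annotated, selected, overlap):
    """P(X >= overlap) under the hypergeometric distribution.

    population: total number of genes considered.
    annotated:  genes in the population carrying the functional term.
    selected:   size of the gene list of interest.
    overlap:    genes in the list that carry the term.
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probabilities of observing `overlap` or more annotated genes.
    tail = sum(comb(annotated, i) * comb(population - annotated, selected - i)
               for i in range(overlap, upper + 1))
    return tail / total

# 10 genes in total, 4 annotated; a list of 3 genes contains 2 annotated ones.
p = enrichment_p_value(population=10, annotated=4, selected=3, overlap=2)
```

In this toy case 40 of the 120 possible gene lists contain at least two annotated genes, so the P-value is one third.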
Short Abstract: Analysis of single-cell RNA-Seq (scRNA-Seq) datasets is currently a complex and time-consuming process, often requiring heuristics and guesswork from the user in order to obtain biologically meaningful results. Here we introduce UNCURL-App, a comprehensive online tool for analyzing scRNA-Seq data, which allows for the integration of prior knowledge into all stages of the analysis pipeline, including clustering, visualization, and differential expression. This tool provides an interactive interface to our UNCURL software for data preprocessing and clustering, thereby allowing users to use UNCURL without programming. This step identifies cell types and creates a low-dimensional representation for visualization. In addition, it allows users to assess the importance of the identified clusters. This is done by finding the differentially expressed genes in each cell type, and integrating external knowledge bases into the data analysis process to determine the biological relevance of the identified genes. Finally, UNCURL-App allows users to interact with the analysis pipeline by iteratively splitting or merging cell types.
Short Abstract: Genome browsers that support fast navigation through large datasets while providing interactive visual analytics functions help scientists achieve deeper insight into biological systems. To help scientists understand data better, we develop and maintain Integrated Genome Browser (IGB), a highly configurable, interactive and fast open source desktop genome browser. Here we describe multiple IGB updates, including new capabilities to display and interact with high-throughput sequencing data. To demonstrate, we describe example visualizations of datasets from RNA-Seq, ChIP-Seq and bisulfite sequencing experiments. Understanding results from genome-scale experiments requires viewing the data in the context of reference genome annotations and other related datasets. To facilitate this, we enhanced IGB's ability to consume data from diverse sources, including Galaxy, the BioAnalytic Resource (bar.utoronto.ca), and multiple REST-based IgbQuickload sites. To support future visualization needs as new genome-scale assays enter wide use, we transformed IGB into a modular, extensible platform for developers to create and deploy all-new visualizations as IGB apps. We present an example IGB App called ProtAnnot, which searches InterPro and displays protein profile matches alongside gene models, exposing how alternative promoters, splicing and 3' end processing add, remove, or remodel functional motifs. These techniques make visual analysis faster and more convenient for biologists.
Short Abstract: Large and complex datasets often have internal structure that can be interpreted as association relationships and analyzed as networks. Cytoscape is often used to filter, cluster, and visualize data, particularly in molecular biology, but also in diverse domains such as sociology, economics, and computer science. While clustering can reveal the substructure of large data, the visualization of substructure remains an unsolved problem. For example, while human genetic interactions can be visualized as a node-link diagram, the many links between clusters, or subsystems, often impose such a high cognitive load as to obscure actionable information. In our approach, we leverage different types of structural information to stratify data while maintaining links between strata, and then allow visualization and exploration of linked layers simultaneously using various types of visualization techniques. In our example, the higher stratum would be a subsystem consisting of a group of genes, while the lower stratum would be the gene-gene interactions associated with the subsystem. Our new application HiView (http://hiview.ucsd.edu/) assists biologists in visualizing multiscale data generated by the data-driven ontology toolkit (DDOT: http://the-data-driven-ontology-toolkit-ddot.readthedocs.io/en/latest/), consisting of 1) how genes participate in functional subsystems and 2) those genes’ interactions with other genes in various functional contexts.