Customized workflows for the interactive visualization of large-scale biological networks: A Python-based framework

Monday May 13, 2024 08:00 AM - 12:00 PM

Networks are extremely useful tools for analyzing biological systems, and their effective visualization is crucial to properly interpreting and communicating their message. In large-scale networks with thousands of nodes and tens of thousands of edges, hand-tailoring the visualization is often impractical, which makes it hard to avoid a common pitfall of network visualization: the so-called “hairball network.” Interactive visualizations in which users can zoom in/out, pan, highlight, annotate, search, and select portions of the network offer a feasible middle ground that can help alleviate this challenge. Current freely available software packages and tools each address parts of this task; however, end-to-end visualization workflows that integrate these individual parts are lacking, and knowledge of them is sparse and not centralized. Our aim with this tutorial is to create working knowledge of the building blocks of interactive network visualizations, and to enable users to bring them together to design, create, and deploy their own interactive network visualizations. Our tutorial will integrate elements from a lean set of tools (Gephi, NetworkX, Pajek, Bokeh, and Streamlit) in a Python-based framework, and use Google Colab for the majority of coding tasks.

Capacity: 40

Schedule

8:00 am – 10:00 am
Part I: To begin is half the work: A Pythonic preamble to preparing your network for Gephi
Having the right input is crucial to maximally leveraging the capabilities of Gephi. This portion will include a hands-on tutorial in Google Colab on how to generate multi-feature node and edge files for highly customized visualizations.
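
As a rough illustration of what multi-feature node and edge files can look like, the sketch below builds two tables with pandas. It is not the tutorial's own material: the gene names, the `module` and `score` columns, and the helper `build_gephi_tables` are hypothetical. The column names `Id`, `Label`, `Source`, `Target`, `Type`, and `Weight` follow the headers that Gephi's spreadsheet importer recognizes.

```python
import pandas as pd

# Hypothetical toy data: pairwise interactions with a confidence score.
interactions = pd.DataFrame(
    {
        "gene_a": ["TP53", "TP53", "MDM2", "BRCA1"],
        "gene_b": ["MDM2", "BRCA1", "CDKN1A", "BARD1"],
        "score": [0.95, 0.80, 0.70, 0.99],
    }
)
# Hypothetical per-node annotation that could drive color or size in Gephi.
annotations = pd.DataFrame(
    {
        "gene": ["TP53", "MDM2", "BRCA1", "CDKN1A", "BARD1"],
        "module": ["p53", "p53", "repair", "p53", "repair"],
    }
)


def build_gephi_tables(interactions, annotations):
    """Return (nodes, edges) tables using Gephi's spreadsheet column names."""
    edges = interactions.rename(
        columns={"gene_a": "Source", "gene_b": "Target", "score": "Weight"}
    )
    edges["Type"] = "Undirected"

    # Degree is computed here as one extra node feature.
    degree = pd.concat([edges["Source"], edges["Target"]]).value_counts()

    nodes = annotations.rename(columns={"gene": "Id"})
    nodes["Label"] = nodes["Id"]
    nodes["degree"] = nodes["Id"].map(degree).fillna(0).astype(int)

    return (
        nodes[["Id", "Label", "module", "degree"]],
        edges[["Source", "Target", "Type", "Weight"]],
    )


nodes, edges = build_gephi_tables(interactions, annotations)
print(nodes.to_csv(index=False))
print(edges.to_csv(index=False))
```

Passing a file path to `to_csv` (for example `nodes.to_csv("nodes.csv", index=False)`, with a file name of your choosing) writes files that can then be imported into Gephi's Data Laboratory.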

Part II: Visualizing your networks in Gephi – using key common and advanced functionalities for interactive visualizations
This portion of the tutorial will include an in-depth walkthrough of the visualization process with a focus on using Data Laboratory features that will enable downstream interactive visualizations. It will also include some helpful tips on layout choices and basic network analyses.

10:00 am – 10:15 am
Coffee break

10:15 am – 12:00 pm
Part III: Using the Gephi output for inter-operable interactive visualizations
This portion of the tutorial will focus on using Bokeh to create interactive versions of the Gephi-generated networks and adding additional features such as dropdown menus and interaction events (e.g., mouse click and hover events) via custom JavaScript callbacks.
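
A minimal sketch of this idea is shown below, assuming a small hypothetical network whose `x` and `y` node attributes stand in for layout coordinates exported from Gephi. The gene names, the `module` attribute, and the dimming behavior are illustrative assumptions, not the tutorial's own code. It combines a hover tool with a dropdown menu whose selection is handled by a custom JavaScript callback.

```python
import networkx as nx
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, HoverTool, Select
from bokeh.plotting import figure

# Hypothetical toy network; "x" and "y" stand in for Gephi layout coordinates.
G = nx.Graph()
G.add_node("TP53", x=0.0, y=0.0, module="p53 pathway")
G.add_node("MDM2", x=1.0, y=0.5, module="p53 pathway")
G.add_node("BRCA1", x=-1.0, y=0.5, module="DNA repair")
G.add_node("BARD1", x=-1.5, y=1.5, module="DNA repair")
G.add_edges_from([("TP53", "MDM2"), ("TP53", "BRCA1"), ("BRCA1", "BARD1")])

names = list(G.nodes)
node_source = ColumnDataSource(
    data={
        "name": names,
        "x": [G.nodes[n]["x"] for n in names],
        "y": [G.nodes[n]["y"] for n in names],
        "module": [G.nodes[n]["module"] for n in names],
        "alpha": [1.0] * len(names),  # updated in the browser by the callback
    }
)
edge_source = ColumnDataSource(
    data={
        "xs": [[G.nodes[u]["x"], G.nodes[v]["x"]] for u, v in G.edges],
        "ys": [[G.nodes[u]["y"], G.nodes[v]["y"]] for u, v in G.edges],
    }
)

plot = figure(width=500, height=400, tools="pan,wheel_zoom,reset", title="Toy network")
plot.multi_line("xs", "ys", source=edge_source, line_color="gray", line_width=1)
node_renderer = plot.scatter(
    "x", "y", source=node_source, size=18, fill_alpha="alpha", line_alpha="alpha"
)

# Hover event: show node attributes as a tooltip.
plot.add_tools(
    HoverTool(renderers=[node_renderer], tooltips=[("gene", "@name"), ("module", "@module")])
)

# Dropdown menu: dim every node outside the chosen module.
modules = sorted(set(node_source.data["module"]))
select = Select(title="Highlight module", value="all", options=["all"] + modules)
select.js_on_change(
    "value",
    CustomJS(
        args={"source": node_source},
        code="""
        const chosen = cb_obj.value;
        const data = source.data;
        for (let i = 0; i < data["module"].length; i++) {
            const keep = chosen === "all" || data["module"][i] === chosen;
            data["alpha"][i] = keep ? 1.0 : 0.15;
        }
        source.change.emit();
        """,
    ),
)

layout = column(select, plot)
```

Calling `bokeh.io.show(layout)` opens the result in a browser, and `bokeh.io.save` writes it to a standalone HTML file. Because the callback runs in JavaScript, the dropdown keeps working in the saved file without a Python server.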

Part IV: Using Streamlit to create and deploy interactive network visualization applications
Building an interactive application with your networks can boost the visibility of your research. This portion of the tutorial will focus on designing and deploying a Streamlit web app of your interactive network visualizations.

Organizers

Arda Halu, PhD, Instructor in Medicine and Associate Scientist, Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School

Prerequisites

Working knowledge of Python
Gephi (installed locally on the participant's laptop)
Access to Google Colab (requires a Google account)
- or -
The following Python libraries installed locally:
- Pandas
- NetworkX
- Bokeh
- Streamlit

Beyond normalization: Incorporating scale into high-throughput sequencing analysis

Monday May 13, 2024 08:00 AM - 12:00 PM

In this workshop, we discuss a novel modeling framework for high-throughput sequencing (HTS) data that moves away from normalization-based analyses and addresses some critical limitations of normalization-based approaches. Many tools for analyzing sequence count data use some form of statistical normalization in an attempt to remove the non-biological variation in sequencing depth. However, simply removing the non-biological variation is not sufficient to recover biological insights from these data because the unmeasured variation in scale (i.e., size) is often important. Recently, we have shown that many normalizations make some identifying assumption about the scale of the system in order to perform many common modeling tasks for HTS data (Nixon et al., 2023). Moreover, we found that even slight errors in these assumptions could lead to a troubling phenomenon called unacknowledged bias: statistical estimates are biased, but this bias is not considered in the corresponding uncertainty estimates (e.g., confidence intervals or credible sets). In the context of HTS analyses, we found this unacknowledged bias could lead to false positive rates as high as 80% and false negative rates as high as 86%. To address these limitations, we developed Scale Simulation Random Variables (SSRVs), which use a generalization of normalizations that we call scale models (Nixon et al., 2023). In contrast to normalizations, scale models make explicit, transparent, and modifiable assumptions about the system scale.

In this workshop, we will discuss the pitfalls associated with normalization-based analyses and demonstrate how incorporating scale uncertainty can address some of these limitations. Going further, we will show how scale can be incorporated into many analyses such as differential abundance, introduce a variety of easy-to-use software tools, and highlight recent modifications to the ALDEx2 software suite, which make this software the first and only tool capable of considering scale uncertainty in analyses (i.e., considering uncertainty in the chosen normalization). Participants will gain an understanding of how the ALDEx2 software was modified to incorporate scale, including a discussion of the new functionality we have incorporated into the package to facilitate scale-based analyses. Furthermore, through guided analyses, we will show how incorporating scale can improve the flexibility, robustness, and reproducibility of HTS analyses and demonstrate how these tools can reveal novel biological insights by allowing analysts to create models that more faithfully reflect known patterns of scale variation (e.g., decreased microbial load after antibiotic treatment). Finally, workshop participants will have dedicated time to apply these tools to their own data with support from the workshop organizers.
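
The role of scale can be seen in a small arithmetic sketch. The numbers below are invented for illustration, and the code is not the SSRV or ALDEx2 method. It only shows that a log fold change computed from proportions differs from the true log fold change by exactly the log fold change in total scale, which sequencing data do not measure.

```python
import math

# Hypothetical absolute abundances of three taxa before and after an antibiotic.
# Taxon 0 is unaffected; the other two drop tenfold, so total load falls.
before = [100.0, 100.0, 100.0]
after = [100.0, 10.0, 10.0]


def proportions(counts):
    """Convert absolute abundances to relative abundances."""
    total = sum(counts)
    return [c / total for c in counts]


def log2_fold_changes(reference, treated):
    """Per-taxon log2 fold change of treated relative to reference."""
    return [math.log2(t / r) for r, t in zip(reference, treated)]


true_lfc = log2_fold_changes(before, after)
apparent_lfc = log2_fold_changes(proportions(before), proportions(after))
scale_lfc = math.log2(sum(after) / sum(before))

for taxon, (true, apparent) in enumerate(zip(true_lfc, apparent_lfc)):
    # The true change equals the compositional change plus the scale change.
    print(
        f"taxon {taxon}: true {true:+.2f}, "
        f"from proportions {apparent:+.2f}, "
        f"proportions + scale {apparent + scale_lfc:+.2f}"
    )
```

In this toy case the unaffected taxon appears to increase when only proportions are compared, because total load fell. Any analysis of the proportions alone has to assume something about that missing scale term, whether or not the assumption is stated.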

Capacity: 30

Schedule

Review of HTS data (e.g., collection and processing), common normalizations, their assumptions, and the concept of scale with a guided discussion/analysis of a previously published microbiome study.
Introduction and guided discussion/analysis of incorporating scale uncertainty in analyses in ALDEx2 with application to the data sets from part 1.
An introduction to Robust Differential Set Analysis (DSA), including scale sensitivity analyses and the Log Fold Change Sensitivity Hypothesis Test, with a guided DSA analysis to identify up- and down-regulated pathways in a previously published gene expression study.
Dedicated time to discuss and apply learned concepts and tools to attendees’ own data.

Organizers

Michelle P. Nixon, Ph.D. (corresponding organizer)
Gregory Gloor, Ph.D.
Justin D. Silverman, Ph.D., MD

Prerequisites

Workshop participants are expected to have a basic knowledge of R. Installing the ALDEx2 (version 1.35.0 or higher), DESeq2, and fgsea packages prior to the workshop is recommended.

References

Nixon, M. P., Letourneau, J., David, L. A., Lazar, N. A., Mukherjee, S., and Silverman, J. D. (2023). Scale reliant inference. arXiv preprint arXiv:2201.03616.

Student-Focused Session: Navigating In-Person Conferences

Monday May 13, 2024 1:00 PM - 5:00 PM

In this student-focused interactive half-day session we will have several activities designed to introduce trainees who may not have attended an in-person conference to the academic meeting environment. While the discussions will be geared toward first-time attendees, other students will be encouraged to attend not only to participate in the networking component of the session but to help impart any experiences from previous meetings. The day's events will include not only time for participants to hear from student-experts but also interact with other attendees in order to form connections that will be useful during the entire GLBIO meeting, and hopefully beyond.

Capacity: 50

Website: https://attending-conferences.github.io/glbio24.html

Schedule

13:00-13:45	Scientific Speed Dating
13:45-14:00	Welcome and Introduction to Session
14:00-14:45	Student Tips and Tricks Discussions
14:45-16:00	Project Concept Ideation
16:00-16:45	Interactive Session on Science Communication featuring members of BioZone
16:45-17:00	Closing remarks

Organizers

Dan DeBlasio, Assistant Teaching Professor, Carnegie Mellon University
Raehash Shah, Undergraduate Student/Computational Biology Society President, Carnegie Mellon University
Caleb Ellington, PhD Student/Graduate Student Association President, Carnegie Mellon University/CPCB
Monica Dayao, PhD Student, Carnegie Mellon University/CPCB

Prerequisites

This session is open to trainees at any level (undergraduates, graduates, postdocs, etc.) who may not have attended an in-person meeting before, or have some tips and pointers that may be useful to others they want to share.

Applications of Generative AI, LLMs, and Traditional Machine Learning for Personalized Medicine and Drug Discovery

Monday May 13, 2024 1:00 PM - 5:00 PM

As the amount of data increases, so does its complexity, which complicates data science. There is therefore a need for sophisticated technology that can address the changing landscape of data, harness the information it contains, and translate it into a tangible outcome.

The workshop will focus on applying new tools in the fields of multi-omics, data integration, and artificial intelligence. Examples include state-of-the-art software packages for precision medicine and large language models (LLMs), including OpenAI’s GPT-4 and other LLMs in the field. With greater understanding of existing tools and their use comes opportunity.

Capacity: 150

Schedule

TBD

Organizers

Gu, Quincy, Ph.D. in Bioinformatics and Computational Biology, Research Collaborator, Department of Neurology, Mayo Clinic
Stover, Danielle Maeser, Ph.D. in Bioinformatics and Computational Biology, Senior Data Scientist, Humana
Griffin, Nicole Maeser, M.S. in Bioinformatics and Computational Biology, Senior Bioinformatician, Contractor through Sault Tribe Inc. for the Centers for Disease Control and Prevention, Immunology and Pathogenesis Branch
Patel, Ankush, Consultant, Mayo Clinic Platform, Mayo Clinic Department of Laboratory Medicine and Pathology; Consultant, Oncodea

Prerequisites

This workshop is relevant to all participants with an interest in understanding and/or adopting new technology to support healthcare (including undergraduates, graduate students, postdoctoral fellows, experimental researchers, industry professionals, etc.).