When and how to build a web-app and software package?
May 15, 2023
13:00 - 17:00
We recently developed a web application, MolEvolvR, to characterize proteins using molecular evolution and phylogeny. This session will serve as a behind-the-scenes (BTS) sneak peek into what MolEvolvR does, how it does it, and its origin story. The hands-on training component will provide the basic principles of why and how to build a dashboard/web-app for your biological problem of interest with R/Shiny, and when and how you would set up your back-end as an R package.
Workshop outline | Rationale with hands-on component
- When are you ready to set up a dashboard? When would you build a simple web-application instead?
- A little bit of bioinformatics: a quick preview of our MolEvolvR web-app. How did it come to be, growing from special cases to a generalized version? How do you characterize proteins of interest by identifying their underlying molecular building blocks? Where do these features come from and what do they mean, i.e., how do you interpret the results through the lens of evolution?
- How do you pick your feature list for your web-app? Some lessons and reflections.
- Project management 101: how to set up version-controlled projects and repositories for collaborative computational biology projects (with other dry/wet-lab colleagues)?
- Dashboard 101: How to develop a dashboard for your data/visualizations using Rmarkdown/Quarto?
- Web-app 101: How to develop and host an interactive R/Shiny web application?
- R-package 101: When, why, and how would you create your first R-package?
- Quick tour of how to set up the backend that supports the workflow, data tables, and visualizations of a multi-functional web-app such as MolEvolvR.
- Janani Ravi, University of Colorado Anschutz Medical Campus
- Jacob D Krol, University of Colorado Anschutz Medical Campus
- Faisal S Alquaddoomi, University of Colorado Anschutz Medical Campus
Max capacity: 25
Intended audience: Any bioinformatician/computational biologist interested in developing open source software (e.g., R-package) and web applications (e.g., using R/Shiny) for their particular use cases, or for developing more broadly applicable open resources. Familiarity with R/tidyverse would help but is not required.
ITCR Training Network
Using reproducibility techniques and version control to improve daily bioinformatics research practices and publish scientific websites
May 15, 2023
8:00 - 12:00
This workshop aims to equip individuals with an understanding of how to apply reproducibility skills, version control with GitHub, and code review in their daily work. To give attendees hands-on practice with these skills and concepts, and a useful product to take away, attendees will use our Open-source Tools for Training Resources, OTTR (https://www.ottrproject.org/), to create or update a free scientific website hosted on GitHub during the workshop, with automated checks and rendering. OTTR is a suite of publishing tools built on R packages, GitHub templates, and GitHub Actions that can automate portions of the creation process for scientific websites and courses. The workshop is aimed at undergraduates, graduate students, medical students, and postdoctoral fellows.
The workshop will cover fundamental concepts of reproducibility practices, code review, and the basics of version control with Git and GitHub. These basics include pushing and pulling data and code to and from GitHub, creating development branches, submitting pull requests, code review and collaborative coding, hosting free websites with GitHub Pages, and automating processes with GitHub Actions. Reproducibility best practices are critical to performing rigorous and transparent research. Through a hands-on activity, this workshop will give trainees the skills to incorporate more reproducible practices into their daily work. Learners will also leave with a new website that has automated checks for various aspects of the site, such as broken URLs and misspellings based on a custom dictionary. Now more than ever, personal scientific websites are extremely helpful for demonstrating an individual's contributions to the field (for career advancement). For learners who already have a personal website, websites that advertise software tools and provide documentation are also vital for increasing tool usage.
- Candace Savonen, Fred Hutchinson Cancer Center
- Carrie Wright, Fred Hutchinson Cancer Center
Intended audience: Trainees (undergraduates, graduate students, medical students, or postdoctoral fellows) that work on informatics related code.
Best practice of bioinformatics engineering: how to implement a scalable production workflow, with examples in population level cohort study
May 15, 2023
8:00 - 17:00
A common challenge in bioinformatics is engineering an informatics solution from a scientific proof-of-concept. Many people build their own pipelines, which often consist of a large variety of scripts and software of differing origin and quality, including many proof-of-concept scripts that researchers developed with a focus on methodology rather than production code. The complexity of these pipelines, together with a lack of clear software engineering standards in this research field, makes it difficult for institutions and companies to properly deploy, test, update, and maintain software solutions, which nevertheless must run at production level to serve as the primary competitive advantage in their applications or field of innovation.
In this workshop, we start with an in-depth discussion of “what makes a good bioinformatics workflow” and dive into different perspectives on engineering best practices for turning a scientific proof-of-concept into a more repeatable, reproducible, scalable, and interactive/exploratory software solution. We will give attendees access to Illumina’s cloud software platform tools to support learning through hands-on exercises, but the concepts are the same regardless of where they are implemented. Attendees may take these concepts and apply them in their own infrastructure, in the cloud or on premises, to implement better bioinformatics software solutions.
We discuss requirements, considerations, and tools for implementing a bioinformatics pipeline to:
- Make it as easy as possible to re-run
- Make it repeatable
- Make it as easy as possible to adjust configurations
- Support better testing with isolated execution environments
- Distribute jobs and achieve higher scalability
- Be able to track where things are
- Account for software maintenance, data management and project management
- Enable real-time, interactive exploration of data and analyses
We then extend the discussion to how best to organize cohort-level population studies in genomics, and how to aggregate the results of single-sample or single-subject pipelines into a data warehouse for interactive exploration. The examples we will show include a cohort analysis tool on Illumina’s platform, as well as a crowdsourcing study on temporal transcriptomic coronavirus host infection data, from hypothesis generation to multi-omics validation through the development of an integrative data portal.
Learning objectives for workshop attendees:
- To get exposed to the considerations and requirements for operationalizing bioinformatics workflows
- To get exposed to software tools and features that could help make bioinformatics solutions more repeatable, reproducible, scalable, and interactive
- To get some hands-on experience through exercises powered by a cloud platform
- To cross-pollinate ideas for cohort-level population genomics studies
- Illumina team led by Ming Wu, Manager, ACE Data Sciences, Illumina
- Ken Eng, Project lead
- Rachel Sherman
- Sidki Bouslama
- James Flynn
- Dan Crawford
URL to see tools that will be taught during the workshop: https://developer.illumina.
Max capacity: 40
Intended audience: Beginner or intermediate level. This tutorial is aimed at bioinformaticians, computational biologists or molecular biologists – anyone who has the need to implement and operationalize a solution based on a scientific prototype in the field of genomics and bioinformatics.