Workshop 02 (WK02): Workshop on Education in Bioinformatics (WEB) - Exploiting Cloud and Virtual Resources for Training
Monday, July 11, 10:10 am – 12:40 pm
Organizer(s):
Presentation Overview:
Computing in cloud-based infrastructure is becoming increasingly prevalent in bioinformatics. The popularity of code repositories, forums and, in particular, application distribution platforms has grown in parallel with increased usage of the cloud for bioinformatics. The movement goes beyond virtual machines and open sharing of code. Cloud services (Amazon, Google, iPlant) and home institution settings make available full-fledged analysis pipelines (tools, data storage, access to high-performance computing), scalable to any size of research project. How do bioinformatics training programs keep pace with this changing landscape? How do bioinformatics trainers use these technology resources in their own classes, while keeping the complexity and ensuing stress to a minimum, for themselves and the trainees? What are the best technology choices for a trainee, and how can learning be translated from the training environment back to the lab? More importantly, can cloud resources in training effectively enhance bioinformatics skills?
Through a series of presentations showcasing the use of cloud-based technologies and related tools in bioinformatics training programs, this workshop aims to highlight how these technologies can be effectively used in educational environments.
This workshop will consist of three presentations on topics ranging from packaging bioinformatics software to cloud-based compute environments and their easy and reliable use in classrooms. It will conclude with a panel debate on the merits and pitfalls of shifting bioinformatics training programs to the cloud.
Part A: Getting the Best Training in Computational Biology in an Era of Cloud Computing and Big Data
Speaker: Phil Bourne, National Institutes of Health, Bethesda, United States
Presentation Overview:
The NIH has established a data science initiative in recognition of the increasingly analytical nature of biomedical research. From the point of view of the external research community, this is embodied in the Big Data to Knowledge (BD2K) initiative, which has an extensive training component. This talk will outline some of the experiences and opportunities with current training programs (courses available, training modalities, etc.), with particular emphasis on the use of clouds.
Part B: How to Scale Science and People Using the Cloud
Speaker: Nirav Merchant, Director of Bio Computing, University of Arizona, Co-PI, CyVerse Collaborative (formerly iPlant Collaborative)
Presentation Overview:
Nirav Merchant will discuss the benefits (and challenges) of adapting cloud environments to education as well as research. Working within their own customized instances, an educator (faculty member, workshop instructor, colleague) can offer learners a uniform and reproducible setting – making it easier to teach, and safe to make mistakes. As learners scale, the cloud scales with them – from learning how to use Linux on a single-CPU instance to understanding how to mix and match cloud with high-performance computing and data grid resources.
Coffee Break (11:10 - 11:40 am)
Part C: Packaging computational biology tools for broad distribution and ease-of-reuse
Speaker: Matthew Vaughn, Director of Life Sciences Computing, Texas Advanced Computing Center, Co-PI: CyVerse, Araport, Jetstream Cloud
Presentation Overview:
A typical instance of computational biology software is composed of interpreted code, compiled binaries, shared libraries, and shell scripts, sometimes mixed in with use of web services or databases, running in the context of a complex computer operating system, atop increasingly sophisticated physical resources. How can we expect computations to be sharable and reproducible, and how can we hope to train people to use such resources? This talk will describe how the Texas Advanced Computing Center enables distribution and use of scientific software via various approaches, including Jupyter notebooks, GitHub repositories, computation-oriented web service APIs, virtual machine images, and container technologies such as Docker, and how these approaches complement one another for training and education.
Part D: Panel - Experience Exchange: Ideas for Exploiting the Cloud in Bioinformatics Training
Moderator: Michelle Brazas, Ontario Institute for Cancer Research
Panel Speaker: Phil Bourne, National Institutes of Health
Panel Speaker: Nirav Merchant, iPlant Collaborative
Panel Speaker: Annette McGrath, Life Science Informatics, CSIRO, Australia
Panel Speaker: Matthew Vaughn, Life Sciences Computing, Texas Advanced Computing Center
Presentation Overview:
This panel session will be a forum for discussion and exchange of strategies and approaches for applying cloud technologies and tools to the bioinformatics classroom. It will also be a discussion of the gaps and pitfalls in doing so. Come share your experiences and ideas on cloud-based bioinformatics training with the panel and audience.