WK02 - Exploiting Cloud and Virtual Resources for Training

Workshop 02 (WK02): Workshop on Education in Bioinformatics (WEB) - Exploiting Cloud and Virtual Resources for Training

Monday, July 11, 10:10 am – 12:40 pm

Organizer(s):

Mainá Bitar is currently a postdoctoral researcher in Brazil and has been involved with education initiatives for a few years. She has been a member of the ISCB Student Council (ISCB-SC) Education Committee and is a former head of its Internship Committee. In Brazil, she has been a member of the consultative committee of the Brazilian Association on Bioinformatics and Computational Biology (AB3C). She is also a member of the ISCB Education committee and the student representative on the board of ISCB.

Michelle D. Brazas is the Program Manager for Informatics and Bio-computing at the Ontario Institute for Cancer Research. She was previously the lead for the Canadian Bioinformatics Workshops (bioinformatics.ca) and Manager of Bioinformatics Education at OICR. She is also an executive on GOBLET and a member of the ISCB Education committee.

Fran Lewitter is Founding Director of Bioinformatics and Research Computing at Whitehead Institute for Biomedical Research. The group develops materials and provides training to biologists in the Institute. In addition, Fran is a member of the ISCB board of directors and the chair of the Education committee. She is also the former Education Editor for PLOS Computational Biology (currently on the Editorial Advisory Board) and treasurer of GOBLET.

Dr. Patricia M. Palagi is the Head of Training at the SIB Swiss Institute of Bioinformatics, Switzerland, and has been involved in bioinformatics education and training for several years. In the past, she has co-organised the ISCB workshops: ECCB12, WEB13, WEB14 and WEB15. Patricia is chair of the Fund-raising committee of GOBLET and also a member of the ISCB Education committee.

Presentation Overview:

Computing in cloud-based infrastructure is becoming increasingly prevalent in bioinformatics. The popularity of code repositories, forums and, in particular, application distribution platforms has grown in parallel with increased usage of the cloud for bioinformatics. The movement goes beyond virtual machines and open sharing of code. Cloud services (Amazon, Google, iPlant) or home-institution settings make available full-fledged analysis pipelines (tools, data storage, access to high-performance computing) that scale to any size of research project. How do bioinformatics training programs keep pace with this changing landscape? How do bioinformatics trainers use these technology resources in their own classes, while keeping the complexity and ensuing stress to a minimum, for themselves and the trainees? What are the best technology choices for a trainee, and how can learning be translated from the training environment back to the lab? More importantly, can cloud resources be used in training to effectively enhance bioinformatics skills?

Through a series of presentations showcasing the use of cloud-based technologies and related tools in bioinformatics training programs, this workshop aims to highlight how these technologies can be effectively used in educational environments.

This workshop will consist of three presentations on topics ranging from packaging bioinformatics software to cloud-based compute environments and their easy, reliable use in classrooms. It will conclude with a panel debate on the merits and pitfalls of shifting bioinformatics training programs to the cloud.

Part A: Getting the Best Training in Computational Biology in an Era of Cloud Computing and Big Data

10:10 am – 10:35 am

Speaker: Phil Bourne, National Institutes of Health, Bethesda, United States

Presentation Overview:

The NIH has established a data science initiative in recognition of the increasingly analytical nature of biomedical research. From the point of view of the external research community, this is embodied in the Big Data to Knowledge (BD2K) initiative, which has an extensive training component. This talk will outline some of the experiences and opportunities with current training programs (courses available, training modalities, etc.), with particular emphasis on the use of clouds.

Part B: How to Scale Science and People Using the Cloud

10:35 am – 11:10 am

Speaker: Nirav Merchant, Director of Bio Computing, University of Arizona, Co-PI, CyVerse Collaborative (formerly iPlant Collaborative)

Presentation Overview:

Nirav Merchant will discuss the benefits (and challenges) of adapting cloud environments to education as well as research. Working within their own customized instances, an educator (faculty member, workshop instructor, a colleague) can offer learners a uniform and reproducible setting, making it easier to teach and safe to make mistakes. As learners scale, the cloud scales with them, from learning how to use Linux on a single-CPU instance to understanding how to mix and match cloud with high-performance computing and data grid resources.

Coffee Break (11:10 - 11:40 am)

Part C: Packaging computational biology tools for broad distribution and ease-of-reuse

11:40 am – 12:05 pm

Speaker: Matthew Vaughn, Director of Life Sciences Computing, Texas Advanced Computing Center, Co-PI: CyVerse, Araport, Jetstream Cloud

Presentation Overview:

A typical instance of computational biology software is composed of interpreted code, compiled binaries, shared libraries, and shell scripts, sometimes mixed in with use of web services or databases, running in the context of a complex computer operating system, atop increasingly sophisticated physical resources. How can we expect computations to be sharable and reproducible, and how can we hope to train people to use such resources? This talk will describe how the Texas Advanced Computing Center enables distribution and use of scientific software via various approaches, including Jupyter notebooks, GitHub repositories, computation-oriented web service APIs, virtual machine images, and container technologies such as Docker, and how these approaches complement one another for training and education.

Part D: Panel - Experience Exchange: Ideas for Exploiting the Cloud in Bioinformatics Training

12:05 pm – 12:40 pm

Moderator: Michelle Brazas, Ontario Institute for Cancer Research
Panel Speaker: Phil Bourne, National Institutes of Health
Panel Speaker: Nirav Merchant, iPlant Collaborative
Panel Speaker: Annette McGrath, Life Science Informatics, CSIRO, Australia
Panel Speaker: Matthew Vaughn, Life Sciences Computing, Texas Advanced Computing Center

Presentation Overview:

This panel session will be a forum for discussion and exchange of strategies and approaches for applying cloud technologies and tools to the bioinformatics classroom. It will also be a discussion of the gaps and pitfalls in doing so. Come share your experiences and ideas on cloud-based bioinformatics training with the panel and audience.
