Workshop 03 (WK03): Bioinfo-core Workshop

Monday, July 11, 2:00 pm – 4:30 pm

Organizers:

Charlie Whittaker, The Koch Institute at MIT, Cambridge, United States
Jian-Liang (Jason) Li, Sanford Burnham Prebys Medical Discovery Institute, Orlando, United States
Madelaine Gogol, Stowers Institute, Kansas City, United States

Presentation Overview:

The workshop will address "The practical experience of big data and big compute". Members of core facilities will share their experience and insights via presentation and panel discussion.

Part A: Big Data

Speaker: Yury Bukhman, Great Lakes Bioenergy Research Center
Time: 2:00 pm – 2:15 pm

Presentation Overview:

The Computational Biology Core of the Great Lakes Bioenergy Research Center supports mostly academic labs at the University of Wisconsin, Michigan State University and other universities. Because these labs run a variety of experiment types, the core faces the challenge of managing and analyzing disparate data and metadata in a diverse academic environment. Details of these data challenges and solutions will be discussed.


Speaker: Alberto Riva, University of Florida
Time: 2:15 pm – 2:30 pm

Presentation Overview:

The Bioinformatics Core of the ICBR provides bioinformatics services to the large and diverse scientific community of the University of Florida. Routine handling of projects covering a vast spectrum of biological and biomedical research requires a flexible and powerful data infrastructure. Implementation details of a software development environment (Actor) for reliable, reusable, reproducible analysis pipelines will be discussed, as well as insights on managing big data projects in a core setting.


Speaker: Big Data Panel
Time: 2:30 pm – 3:00 pm

Moderator: Madelaine Gogol, Stowers Institute for Medical Research
Panel Speaker: Yury Bukhman, Great Lakes Bioenergy Research Center
Panel Speaker: Alberto Riva, University of Florida
Panel Speaker: Hua Li, Stowers Institute for Medical Research
Panel Speaker: Jyothi Thimmapuram, Purdue University

In a panel discussion, the presenters, panelists, and attendees will explore practical experience with “big data” as well as the use of public datasets. Topics may include accuracy of annotation, trust in data, raw versus processed data, data validation, and QC.


Coffee Break (3:00 pm – 3:30 pm)


Part B: Big Compute

Speaker: Sergi Sayols Puig, Institute of Molecular Biology Mainz
Time: 3:30 pm – 3:45 pm

Presentation Overview:

With a variety of computing infrastructures available, building robust, transferable pipelines can increase utilization of compute resources. NGS analysis pipelines implemented as Docker containers and deployed on a variety of compute platforms (cluster, supercomputer, or workstation) will be discussed.


Speaker: Jingzhi Zhu, The Koch Institute at MIT
Time: 3:45 pm – 4:00 pm

Experiences transitioning a bioinformatics core from a local to a cloud-based compute solution will be discussed, including the motivation, performance, cost, and issues encountered in deploying bioinformatics pipelines to Amazon EC2 instances.


Speaker: Big Compute Panel
Time: 4:00 pm – 4:30 pm

Moderator: Brent Richter, Partners HealthCare
Panel Speaker: Sergi Sayols Puig, Institute of Molecular Biology Mainz
Panel Speaker: Jingzhi Zhu, The Koch Institute at MIT
Panel Speaker: Sara Grimm, NIEHS
Panel Speaker: TBA

In a panel discussion, the presenters, panelists, and attendees will discuss how they stay on top of compute requirements at their own sites, including the major hurdles to overcome and the compromises needed for success. The discussion may also touch on experiences with containers and portable computing.