An open tech stack for the development of reproducible, portable, and robust bioinformatic pipelines.
Room: FI
Format: Live from venue
Moderator(s): Madelaine Gogol
- Nikhil Kumar, Memorial Sloan Kettering Cancer Center, United States
- Christopher Bolipata, Memorial Sloan Kettering Cancer Center, United States
- Sinisa Ivkovic, Memorial Sloan Kettering Cancer Center, United States
- Timothy Song, Memorial Sloan Kettering Cancer Center, United States
- Stephen Kelly, Memorial Sloan Kettering Cancer Center, United States
- Suleyman Vural, Memorial Sloan Kettering Cancer Center, United States
- Amy Webber, Memorial Sloan Kettering Cancer Center, United States
- Nicholas Socci, Memorial Sloan Kettering Cancer Center, United States
- David Solit, Memorial Sloan Kettering Cancer Center, United States
Presentation Overview:
Lack of reproducibility and portability is a common problem for bioinformatic pipelines. In addition, complicated workflows can be difficult to run in production environments because of external factors such as non-deterministic server or filesystem errors. The Center for Molecular Oncology (CMO) at MSKCC has created a centralized bioinformatic service for labs to analyze targeted cancer gene panels (MSK-IMPACT) and whole-exome sequencing data. These pipelines need to be reproducible and portable so that any lab can replicate the analysis, and they also need to be robust and easy to debug so that operations run smoothly. To this end, the CMO has assembled, worked with, and developed a suite of open tools covering all phases of genomic analysis: pipeline development using Common Workflow Language, execution with Toil (an open-source pipeline management system), debugging with Elasticsearch, Logstash, and Kibana, and testing with Jenkins. This presentation will highlight and discuss the open tech stack used at the CMO, which other teams and organizations can use as a framework to develop and operate their own pipelines.