ISMB 2016 Applied Knowledge Exchange Sessions
- AKES01: Clouds, Clusters, and Containers: Tools for responsible, collaborative computing
- AKES02: Community Efforts to Enable Scalable, Reproducible, and Portable Biomedical Data Analyses
- CANCELLED: AKES03: Synthetic Biology Open Language – standards, tools and guidelines on managing both the functional design and the sequence details of your experiments
- AKES04: Living on the Edge (of Translational Informatics) - Opportunities and Challenges for Integrating Bioinformatics into the Clinical Realm
- CANCELLED: AKES05: Cytoscape 3 User Tutorial: Introduction to network visualization and analysis using Cytoscape
Room: Pelican 2 (Swan Hotel)
Organizer(s):
John’s interests lie in applying new technologies, hardware, and paradigms to genomic and biological problems in a way that is accessible to the average bench biologist. His Ph.D. work at the University of Texas at Austin was in Biomedical Engineering, involving computational chemistry and binding interactions. In 2011 John joined the life sciences computing group at the Texas Advanced Computing Center and has since served on and led a number of projects, all centered on either developing tools and infrastructure to support life sciences research or training scientists to leverage advanced computing resources. John has been consistently involved in teaching and training for over 8 years, including university courses, 1-on-1 mentoring, consulting, workshops, and presentations. Staying on the "front lines" of teaching technology to scientists is critical to his work as a domain researcher, active programmer, and experienced teacher.
Presentation Overview:
Better tools enable new discoveries; this is as true for computation as it is for experiment. Computing has long been a pervasive part of scientific research across disciplines, but as computational requirements increasingly exceed the capabilities of single workstations, the pressure is mounting for researchers to develop a new skillset and leverage a new toolset.
This workshop addresses the challenges and requirements of working effectively on cloud computing and high performance computing resources, discusses the key principles that should guide responsible scientific computation and collaboration, and, through hands-on sessions, presents practical solutions using emergent software tools that are becoming widely adopted in the global scientific community. Specifically, we will use “containers” to bundle software applications and their full execution environment in a portable way. We will manage and share data across distributed resources. Finally, we will orchestrate job execution across systems and capture metadata on the results (and the process) so that parameters and methodologies are not lost. Perhaps most importantly, we will do all of this without using the command line.
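To give a flavor of the container portion of the day, here is a minimal Python sketch of running a tool inside a container and keeping a record of how it was run, in keeping with the workshop's no-command-line approach. It assumes Docker and the `docker` Python SDK are installed; the image and command are illustrative placeholders rather than the exact materials used in the session.

```python
# Minimal sketch: run a command inside a container and keep a record of the run.
# Assumes Docker is installed and the "docker" Python SDK (docker-py) is available.
# The image and command below are illustrative placeholders.
import json
import docker

client = docker.from_env()

image = "biocontainers/fastqc:v0.11.9_cv8"   # placeholder container image
command = "fastqc --version"                 # placeholder command to run inside it

# Run the command in the container; the container is removed when it exits.
output = client.containers.run(image, command, remove=True)

# Capture minimal provenance alongside the result so parameters are not lost.
record = {
    "image": image,
    "command": command,
    "output": output.decode().strip(),
}
print(json.dumps(record, indent=2))
```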
Agenda:
Timing | Presenter | Topic Area/Activity Description |
---|---|---|
8:30-9:00am | John Fonner, UT Austin | Overview and teacher/student introductions (Presentation and Discussion) - Participants will have a clear understanding of the course goals, the tools we will use, and also information on the teachers and peers in the room |
9:00-9:20am | John Fonner, UT Austin | Connect laptops to online resources (Hands-on) – Instructors will make sure everyone is set up to participate in course activities. |
9:20-10:15am | Matthew Vaughn, UT Austin | Containerization tutorial (Hands-on) – Participants will learn about containers (Docker in this case), building a Dockerfile, running a container, and data handling with containers |
10:15-10:45am | | Coffee Break |
10:45-11:05am | Matthew Vaughn, UT Austin | Cyverse Science APIs (Presentation) – Discussion of managing distributed data storage and compute resources by using application programming interfaces |
11:05-12:00pm | John Fonner, UT Austin | Hacking session 1: Systems and Data movement (Hands-on) – Participants will register data and compute resources with the Cyverse Science APIs and learn to access the same data through the web or through Python or bash scripts using Jupyter notebooks |
12:00-1:00pm | | Lunch |
1:00-1:50pm | Matthew Vaughn, UT Austin | Hacking Session 2: Scalable Analyses (Hands-on)- Participants will use a web portal and Jupyter notebook to launch analyses, manage job execution, set up notifications, and share results with each other. |
1:50-2:10pm | Matthew Vaughn, UT Austin | Metadata and Reproducibility (Presentation and Discussion) – Interactive discussion on the aspects of reproducibility for science and how to achieve them practically. |
2:10-2:50pm | John Fonner, UT Austin | Hacking Session 3: Metadata, Reproducibility, and Collaboration (Hands-on) – Participants will practice associating metadata with their data, and then sharing the entire product with someone else to reproduce. |
2:50-3:30pm | John Fonner, UT Austin | Workflow Example (Hands-on) – Participants will be asked to integrate all they have learned by performing a workflow on genomic data using different compute resources for different steps and then sharing their process, results, and conclusions. |
3:30-4:00pm | | Coffee Break |
4:00-4:30pm | John Fonner and Matthew Vaughn, UT Austin | Web Portals (Presentation) - With the Cyverse Discovery Environment and Agave ToGo as examples, we will look at alternate user interfaces and how to build community-driven resources |
4:30-5:15pm | John Fonner and Matthew Vaughn, UT Austin | Real World Applications (Q&A Session and Discussion) – The last hour will be used to apply the ideas of the course back to the domains of the participants. We will field questions and also use the time to review any section that participants would like to expand on. |
5:15-5:30pm | John Fonner, UT Austin | Closing thoughts and participation survey |
After a postdoctoral fellowship in the Plant Genetics group at Cold Spring Harbor Laboratory, where he researched genome architecture, genetic regulation, and small RNA biology, Matt joined the research faculty of Cold Spring Harbor Laboratory in 2007 to conduct a program of research in epigenetics and life sciences cyberinfrastructure. In 2010, he joined the Texas Advanced Computing Center, where he today serves as the Director of Life Sciences Computing and leads efforts to advance biologists' access to and utilization of advanced scientific computing technologies. Matt regularly develops and teaches training courses for the Texas Advanced Computing Center as well as for research projects such as the iPlant Collaborative and the Arabidopsis Information Portal.
Learning Objectives:
- Build Containers – Like gift wrapping for code, it makes any scrappy workflow more socially acceptable. Containers let you (and everyone else) run your workflows almost anywhere and get the same answers.
- Use Science APIs – Once your workflows are containerized, Science APIs are the key to bending all computers to your will. They are perhaps the most powerful way to collaborate, capture metadata, orchestrate workflows, share data, and scale compute power (see the sketch after this list).
- Publish responsibly – Capturing computational workflows only in the methods section of a paper or by posting source code is woefully insufficient now. Why not let your reviewers repeat your calculations with a few clicks?
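To make the Science API objective above concrete, here is a minimal sketch of submitting a compute job to a REST-style science API from Python. The base URL, token, app identifier, and input URI are placeholders for illustration only, not the exact CyVerse/Agave interface taught in the session.

```python
# Minimal sketch: submit a compute job through a REST-style "Science API".
# The host, token, app id, and input URI are placeholders; consult the actual
# CyVerse/Agave documentation used in the session for the real interface.
import requests

BASE_URL = "https://api.example.org"   # placeholder API host
TOKEN = "my-oauth-token"               # placeholder OAuth bearer token

job_request = {
    "name": "demo-alignment",
    "appId": "my-aligner-1.0",                              # placeholder registered app
    "inputs": {"reads": "agave://my-storage/reads.fastq"},  # placeholder data URI
    "parameters": {"threads": 4},
}

response = requests.post(
    f"{BASE_URL}/jobs/v2",   # placeholder jobs endpoint
    json=job_request,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
response.raise_for_status()
print("Submitted job:", response.json())
```

The same request could be issued from a Jupyter notebook cell, which is how the hands-on hacking sessions approach it.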
Delegate Requirements
Participants should bring their own laptop.
Maximum Attendees: 30
Room: Toucan (Swan Hotel)
Organizer(s):
Dr. Kaushik is a community manager at Seven Bridges and assists both internal and external developers with using Docker and building CWL pipelines on the Cancer Genomics Cloud. He also has an established track record in professional development planning. He is a former chair of the UC San Diego Bioengineering Research Expo, which brings together 400+ researchers and biomedical professionals across California. He has led workshops on graph theory, statistical modeling, and the central dogma of molecular biology as a fellow for the Insight Data Science program.
Presentation Overview:
Reproducibility remains a major concern in biomedical research. Recently, it has been demonstrated that cancer informatics analyses performed within a single consortium may yield wildly variable results. As the collection of genomic data and analyses continues to accelerate, concerns about maintaining the accuracy of results continue to grow. Large-scale, accurate cancer analyses demand scalable informatics. Scalability in turn requires reproducibility and portability of tools, analyses, and data to ensure that researchers can collaborate easily and effectively.
Recent technological developments and organizational efforts have sought to address the reproducibility problem in biomedical data analysis and have been successfully applied to cancer informatics. For example, Docker containers enable researchers to package software with all of its required dependencies and nothing more. This allows software to be shared with anyone in such a way that the exact analysis can be reproduced. Docker containers can be easily shared through GitHub, third-party repositories, or user-to-user with plain-text files. Moreover, external tools can hook into Docker directly, using it as a component of complex pipelines or analyses. The Common Workflow Language (CWL) is a specification that enables researchers to describe analysis tools and workflows in a way that is powerful, portable, and easy to use. Dynamic computing environments, often referred to as ‘the cloud’, are able to support co-localization of cancer data, Docker+CWL workflows, and the computational resources required to perform large-scale analyses. These environments can be extended with collaboration and project management tools that enable researchers to work together in a transparent and reproducible fashion.
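To illustrate how Docker and CWL fit together, the sketch below writes out a minimal CWL CommandLineTool description that pins its software environment to a Docker image. The tool (plain `echo`) and the image tag are placeholders, and the YAML is generated from Python purely for illustration; it is not an excerpt from the workshop materials.

```python
# Minimal sketch: a CWL CommandLineTool that declares the Docker image it runs in.
# The base command (echo) and image tag are placeholders, not a real analysis tool.
import yaml  # requires PyYAML

tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "requirements": [
        {"class": "DockerRequirement", "dockerPull": "ubuntu:16.04"}
    ],
    "inputs": {
        "message": {"type": "string", "inputBinding": {"position": 1}}
    },
    "outputs": [],
}

# Write the description to a .cwl file that any CWL runner can execute.
with open("echo-tool.cwl", "w") as handle:
    yaml.safe_dump(tool, handle, default_flow_style=False)
```

A CWL runner can then execute this description locally or on a cloud platform, pulling the declared image automatically.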
These methodologies have enabled global-scale cancer genomics initiatives such as the International Cancer Genome Consortium (ICGC) Pan-Cancer Analysis of Whole Genomes (PCAWG) project and the National Cancer Institute (NCI) Cancer Genomics Cloud (CGC) pilot program. In this workshop, we will instruct attendees in Docker and CWL, cover their use and best practices, and discuss concretely how these technologies enable scalable, reproducible, and portable cancer informatics. We will also discuss the methodologies behind how these tools are developed and deployed, and pose the following questions: What are the next steps for improving reproducibility in bioinformatics and for scaling informatics efforts? What have we learned from analyses of thousands of cancer genomes that can be applied to other diseases and other consortium efforts? In addition, we will encourage discussion about unmet needs and future solutions in cancer informatics.
Agenda:
Timing | Presenter | Topic Area/Activity Description |
---|---|---|
8:30-8:40am | Gaurav Kaushik, Seven Bridges | Introduction to speakers, overview of agenda, and aims for the day |
8:40-9:25am | Brian O’Connor, OICR | Hands-on: Using Docker to Enable Large-Scale Cancer Genomics Initiatives |
9:25-9:45am | Michael Crusoe, University of California Davis | Lecture: The Inception and Development of the Common Workflow Language: a Model for Community-Driven Projects |
9:45-10:15am | Gaurav Kaushik, Seven Bridges | Lecture: Scalable, Collaborative, Reproducible, and Extensible Analysis of TCGA Data in the Cloud. Dr. Davis Dusenbery will discuss the Cancer Genomics Cloud pilots project funded by the National Cancer Institute. The overarching goal of the project is to explore how co-localizing large genomics datasets, like The Cancer Genome Atlas, with the dynamic compute infrastructure to analyze them can make learning from these data faster and ultimately enable precision medicine. The Cancer Cloud pilots also serve as a model for how collaborative research may be conducted for a variety of disease types at the national scale. |
10:15-10:45am | | Coffee Break |
10:45-11:15am | Gaurav Kaushik, Seven Bridges | Hands-on Tutorial: Building and Executing a Cancer Analysis Pipeline on the CGC |
11:15-11:45am | Kyle Ellrott, Oregon Health & Science University | Lecture: Collaborative Cancer Informatics |
11:45-12:15pm | All Hands | Forum: The audience will break into groups to discuss individual components of the day's events |
12:15-12:30pm | Summary & Conclusion | Attendees, presenters, and organizers will reconvene to summarize the discussions (5 min each). |
Dr. Kaushik is a product manager for the Seven Bridges software development kit for building portable applications and workflows using Docker and CWL. He is also the community manager for the Cancer Genomics Cloud, which entails assisting developers in using open tools to enable their research.
Dr. O'Connor is the Managing Director of Cloud Computing at OICR. He has been a member of major genomics collaborative efforts such as the Pan-Cancer Analysis of Whole Genomes. He has also helped develop Dockstore, an open platform for sharing Docker-based tools described with the Common Workflow Language, which is used by the Global Alliance for Genomics and Health (GA4GH).
Prof. Ellrott is a leading expert in cancer genomics and data integration and management as a contributor to the UCSC Genome Browser. He has participated in and led DREAM challenges, an open-science effort to find solutions for pressing problems in biological science and human health.
Michael Crusoe is the Community Engineer for the Common Workflow Language, which entails managing the development of CWL as well as external communications with collaborators and users. He is also Staff Engineer in the C. Titus Brown lab at UC Davis, where he helped develop new software for analysis of high throughput sequencing data.
Learning Objectives:
From this workshop, you will:
- Get informed about the challenges in enabling reliable cancer informatics and ongoing solutions
- Learn how to create your own Docker containers and CWL applications to run locally or on the cloud (see the sketch after this list)
- Become familiar with the inception and execution of the Cancer Genomics Cloud pilot
- Be able to get involved in the next DREAM challenge and impact our understanding of cancer
- Give your input to help iterate, improve, and extend current collaborative science models
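The sketch below fills in the "run locally" half of the Docker and CWL objective: build a container image, then execute a CWL description with the cwltool reference runner. The file names and image tag are placeholders (the `echo-tool.cwl` file is the one sketched earlier), and it assumes Docker and cwltool are already installed.

```python
# Minimal sketch: build a Docker image, then run a CWL tool locally with cwltool.
# Assumes Docker and cwltool are installed; the names below are placeholders.
import subprocess

# Build an image from a Dockerfile in the current directory (placeholder tag).
subprocess.run(["docker", "build", "-t", "my-tool:latest", "."], check=True)

# Execute a CWL tool description with an accompanying job (inputs) file.
subprocess.run(["cwltool", "echo-tool.cwl", "job.yml"], check=True)
```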
Delegate Requirements
Attendees will benefit from prior experience with bioinformatics or computational biology, especially in the field of cancer. This session will serve as a suitable introduction to dynamic computing, Docker, Common Workflow Language, and the broader subject of reproducible analyses.
Hands-on demos of publicly available software will be conducted, so attendees are encouraged to bring their laptops.
AKES03: CANCELLED: Synthetic Biology Open Language – standards, tools and guidelines on managing both the functional design and the sequence details of your experiments
AKES04: Living on the Edge (of Translational Informatics) - Opportunities and Challenges for Integrating Bioinformatics into the Clinical Realm
Room: Toucan (Swan Hotel)
Organizer(s):
Samuel Volchenboum is a pediatric oncologist and the director of the University of Chicago Center for Research Informatics and the Associate Chief Research Informatics Officer. His group provides divisional support for bioinformatics, data warehousing and business analytics, high-performance computing, application development, and data governance. His team has built a variety of systems used in research, including a platform for collecting and reporting pharmacogenomics data as well as tools to support the clinical molecular pathology core. Dr. Volchenboum’s research is focused on using large clinical data sets to understand the impact of hospital events on patient care and safety. Dr. Volchenboum also directs the University of Chicago Graham School Master’s Program in Biomedical Informatics. He is a chair-elect of the AMIA Genomics and Translational Bioinformatics Working Group.
Presentation Overview:
Bioinformatics tools and techniques are becoming an essential component of clinical care, yet integrating these new modalities into clinical practice remains a challenge. From molecular diagnostics to pharmacogenomics to molecular pathology and molecular medicine, it is increasingly critical that the bioinformaticians performing the analyses and the clinicians interpreting and acting on the results work together closely. This session will focus on these interfaces, with particular attention to the tools, systems, and processes required for an efficient transition “from bench to bedside.” The speakers all have experience working on both sides of this imaginary divide and will provide insights into driving discovery into the clinic. Attendees can expect to gain a more in-depth understanding of the issues facing data scientists and clinicians alike. Dr. Russ Altman will moderate the session and introduce each speaker using relevant examples from the literature. Speakers will cover topics ranging from an overview of bioinformatics techniques used in clinical practice to the ethical and legal implications of genomic testing. Anyone working at the interface of genomics and medicine will find this session useful and informative.
Agenda:
Timing | Presenter | Topic Area/Activity Description |
---|---|---|
1:30-1:45pm | Russ Altman | Introduction to speakers / overview |
1:45-2:15pm | Samuel Volchenboum | Lecture - Overview of bioinformatics techniques and uses in clinical practice |
2:15-2:45pm | Robert R. Freimuth | Lecture - Utilizing genomic data in clinical systems |
2:45-3:15pm | Lewis Frey | Lecture - Collection of data for research |
3:15-3:30pm | Subha Madhavan | Lecture - Practical Precision Medicine: Integration of clinical and genomic data to support cancer care |
3:30-4:00pm | | Coffee Break |
4:00-4:15pm | Subha Madhavan | Demo - G-DOC Plus: A Data Science Platform for Precision Medicine Research |
4:15-4:45pm | Casey Overby | Lecture - Increasing the reach of clinical genomics research and genomics-informed care |
4:45-5:15pm | Jessie Tenenbaum | Lecture - Ethical, legal, and social implications of genomic testing. Though significant advances have been made toward realization of genomic medicine, precision medicine is still in its early days. Here we describe a case in which a patient’s treatment was altered based on direct-to-consumer genetic testing. Genomic medical guidelines continue to evolve such that a different, and less aggressive, therapeutic course would be recommended today. This case illustrates issues around the translation of genetic knowledge into clinical practice: patient engagement and personal preferences, patient and |
5:15-5:30 pm | Panel | Panel discussion |
Dr. Altman is a professor of bioengineering, genetics, & medicine and past chairman of the Bioengineering Department at Stanford University. His primary research interests are in the application of computing and informatics technologies to problems relevant to medicine. He is particularly interested in methods for understanding drug action at molecular, cellular, organism and population levels. Dr. Altman has conducted many teaching sessions in which he curates large amounts of seminal literature and distills the material into the most relevant publications. He delivers the annual AMIA “year in review,” a perennial favorite and standing-room-only lecture.
Dr. Freimuth is an Assistant Professor of Biomedical Informatics in the Department of Health Sciences Research, Mayo Clinic, and maintains adjunct appointments at the University of Minnesota-Rochester and Arizona State University. The goal of his research is to develop scalable and interoperable genomic clinical decision support tools, driven by knowledge bases that utilize standards-based data models, terminologies, message formats, and algorithms, to enable robust management and delivery of genomic knowledge as well as the evaluation of medical outcomes resulting from genomic-guided therapy decisions. He is a member of several research networks and formal standards development initiatives focused on genomic medicine. Dr. Freimuth is co-Chair of the Clinical Pharmacogenetics Implementation Consortium (CPIC) Informatics Work Group and recently completed two terms as Chair of the AMIA Genomics and Translational Bioinformatics Working Group.
Dr. Overby is an Assistant Professor in the Divisions of General Internal Medicine and Health Sciences Informatics at Johns Hopkins University (JHU). Dr. Overby is also Adjunct Investigator in the Genomic Medicine Institute at Geisinger Health System. Her research interests span a number of areas at the intersection of public health genetics and biomedical informatics, including applications that support translation of biological knowledge to clinical care and population healthcare, delivering health information and knowledge to the public, and developing knowledge-based approaches to use big data (including electronic health record data) for population health.
Dr. Madhavan is Director of the Innovation Center for Biomedical Informatics (ICBI) at the Georgetown University Medical Center and Associate Professor of Oncology. She is a world leader in data science, clinical informatics, and health IT who is responsible for several biomedical informatics efforts, including the software development of the Georgetown Database of Cancer (G-DOC), a resource for both researchers and clinicians to realize the goals of personalized medicine, and she co-directs the Lombardi Cancer Center’s Biostatistics and Bioinformatics shared resource. She has taught similar sessions in the past, most recently at the Pacific Symposium on Biocomputing.
Dr. Frey has focused on methods of data integration and analysis for the purpose of discovery. Novel analysis of integrated heterogeneous data provides opportunity for discovery and improved patient care through bench-to-bedside translational research. He is leading the development of big data methodologies applied to the Veterans Affairs Informatics and Computing Infrastructure (VINCI). The goal is to create a system for distributable clinical, personalized, pragmatic predictions of outcome (Clinical3PO) with easy deployment of preconfigured virtual machines. The impact will be a product that is plug and play with existing infrastructure at health care institutions. A distributable Clinical3PO system will enable novel analysis on sequential health care data that overcomes the curse of dimensionality currently limiting the research field.
TBA
Learning Objectives:
- Understand the challenges facing both bioinformaticians and clinical researchers when developing and using new technologies for translational research
- Identify considerations when making genomic testing accessible for use in clinical care
- Comprehend the opportunities and complexities of collecting clinical data for research
- Understand key ethical and legal implications of genomic testing
Maximum Attendees: 50