Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

ISMB 2016 Applied Knowledge Exchange Sessions

AKES01:  Clouds, Clusters, and Containers: Tools for responsible, collaborative computing
Saturday, July 9, 8:30 am – 5:30 pm
Room: Pelican 2 (Swan Hotel)
Organizer(s):

John’s interests lie in applying new technologies, hardware, and paradigms to genomic and biological problems in a way that is accessible to the average bench biologist. His Ph.D. work at the University of Texas at Austin was in Biomedical Engineering, involving computational chemistry and binding interactions. In 2011 John joined the life sciences computing group at the Texas Advanced Computing Center and has served on and led a number of projects, all centered on either developing tools and infrastructure to support life sciences research or training scientists to leverage advanced computing resources. John has been consistently involved in teaching and training for over 8 years, including university courses, 1-on-1 mentoring, consulting, workshops, and presentations. Staying on the "front lines" of teaching technology to scientists is critical to his work as a domain researcher, an active programmer, and an experienced teacher.

Presentation Overview:

Better tools enable new discoveries; it is as true for computation as it is for experiment. Computing has been a pervasive part of scientific research across disciplines for a while, but as computational requirements increasingly exceed the capabilities of single workstations, the pressure is mounting for researchers to develop a new skillset and leverage a new toolset.

This workshop addresses the challenges and requirements for working effectively on cloud computing and high performance computing resources, discusses the key principles that should guide responsible scientific computation and collaboration, and, through hands-on sessions, presents practical solutions based on emerging software tools that are becoming widely adopted in the global scientific community. Specifically, we will look at using “containers” to bundle software applications and their full execution environment in a portable way. We will look at managing and sharing data across distributed resources. Finally, we will tackle how to orchestrate job execution across systems and capture metadata on the results (and the process) so that parameters and methodologies are not lost. And perhaps most important of all, we will not be using the command line to achieve this.
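To make the idea of capturing metadata on results and process concrete, here is a minimal Python sketch that writes a JSON record next to an analysis output. The helper name `record_run`, the field names, and the `.meta.json` suffix are assumptions chosen for illustration; they are not taken from the workshop materials or from the Cyverse Science APIs.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record_run(input_path: Path, output_path: Path, parameters: dict) -> Path:
    """Write a JSON sidecar describing how output_path was produced.

    Hypothetical helper: the record layout is illustrative only.
    """
    record = {
        "input": {"name": input_path.name, "sha256": sha256_of(input_path)},
        "output": {"name": output_path.name, "sha256": sha256_of(output_path)},
        "parameters": parameters,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = output_path.parent / (output_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar


if __name__ == "__main__":
    # Demonstrate on throwaway files so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        reads = Path(tmp) / "reads.txt"
        counts = Path(tmp) / "counts.txt"
        reads.write_bytes(b"ACGT\n")
        counts.write_bytes(b"4\n")
        print(record_run(reads, counts, {"min_quality": 20}).read_text())
```

Storing checksums alongside the parameters lets a collaborator confirm they are starting from the same input before trying to reproduce a result.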

 

Agenda:
Timing | Presenter | Topic Area/Activity Description
8:30-9:00am | John Fonner, UT Austin | Overview and teacher/student introductions (Presentation and Discussion) - Participants will have a clear understanding of the course goals, the tools we will use, and also information on the teachers and peers in the room
9:00-9:20am | John Fonner, UT Austin | Connect laptops to online resources (Hands-on) – Instructors will make sure everyone is set up to participate in course activities.
9:20-10:15am | Matthew Vaughn, UT Austin | Containerization tutorial (Hands-on) – Participants will learn about containers (Docker in this case), building a Dockerfile, running a container, and data handling with containers
10:15-10:45am | | Coffee Break
10:45-11:05am | Matthew Vaughn, UT Austin | Cyverse Science APIs (Presentation) – Discussion of managing distributed data storage and compute resources by using application programming interfaces
11:05-12:00pm | John Fonner, UT Austin | Hacking Session 1: Systems and Data Movement (Hands-on) – Participants will register data and compute resources with the Cyverse Science APIs and learn to access the same data through the web or through Python or bash scripts using Jupyter notebooks
12:00-1:00pm | | Lunch
1:00-1:50pm | Matthew Vaughn, UT Austin | Hacking Session 2: Scalable Analyses (Hands-on) – Participants will use a web portal and Jupyter notebook to launch analyses, manage job execution, set up notifications, and share results with each other.
1:50-2:10pm | Matthew Vaughn, UT Austin | Metadata and Reproducibility (Presentation and Discussion) – Interactive discussion on the aspects of reproducibility for science and how to achieve them practically.
2:10-2:50pm | John Fonner, UT Austin | Hacking Session 3: Metadata, Reproducibility, and Collaboration (Hands-on) – Participants will practice associating metadata with their data, and then sharing the entire product with someone else to reproduce.
2:50-3:30pm | John Fonner, UT Austin | Workflow Example (Hands-on) – Participants will be asked to integrate all they have learned by performing a workflow on genomic data using different compute resources for different steps and then sharing their process, results, and conclusions.
3:30-4:00pm | | Coffee Break
4:00-4:30pm | John Fonner and Matthew Vaughn, UT Austin | Web Portals (Presentation) - With the Cyverse Discovery Environment and Agave ToGo as examples, we will look at alternate user interfaces and how to build community driven resources
4:30-5:15pm | John Fonner and Matthew Vaughn, UT Austin | Real World Applications (Q&A Session and Discussion) – The last hour will be used to apply the ideas of the course back to the domains of the participants. We will field questions and also use the time to review any section that participants would like to expand on.
5:15-5:30pm | John Fonner, UT Austin | Closing thoughts and participation survey

Bio:

After a postdoctoral fellowship in the Plant Genetics group at Cold Spring Harbor Laboratory, where he researched genome architecture, genetic regulation, and small RNA biology, Matt joined the research faculty of Cold Spring Harbor Laboratory in 2007 to conduct a program of research in epigenetics and life sciences cyberinfrastructure. In 2010, he joined the Texas Advanced Computing Center and today serves as the Director of Life Sciences Computing, where he leads efforts to advance biologists' access to and utilization of advanced scientific computing technologies. Matt regularly develops and teaches training courses for the Texas Advanced Computing Center as well as research projects such as the iPlant Collaborative and the Arabidopsis Information Portal.

Learning Objectives:
  • Build Containers – Like gift wrapping for code, it makes any scrappy workflow more socially acceptable.  Containers let you (and everyone else) run your workflows almost anywhere and get the same answers.
  • Use Science APIs – Once your workflows are containerized, Science APIs are the key to bending all computers to your will.  It is perhaps the most powerful way to collaborate, capture metadata, orchestrate workflows, share data, and scale compute power.
  • Publish responsibly – Capturing computational workflows only in the methods section of a paper or by posting source code is woefully insufficient now.  Why not let your reviewers repeat your calculations with a few clicks?
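As a small illustration of the "Build Containers" objective, the following Python sketch assembles a `docker run` invocation that mounts a project directory into a container and runs a tool against it. The image name `example/aligner:1.0`, the tool arguments, and the helper `docker_run_command` are hypothetical; the sketch only builds and prints the command, it does not execute Docker.

```python
import shlex


def docker_run_command(image, command, host_dir, container_dir="/data"):
    """Build an argument list for `docker run` (nothing is executed here)."""
    return [
        "docker", "run",
        "--rm",                               # remove the container when it exits
        "-v", f"{host_dir}:{container_dir}",  # mount the host data directory
        "-w", container_dir,                  # run the tool inside that directory
        image,                                # a pinned tag keeps runs repeatable
        *command,
    ]


# Hypothetical image and tool arguments, for illustration only.
argv = docker_run_command(
    "example/aligner:1.0",
    ["align", "--threads", "4", "reads.fastq"],
    "/home/user/project",
)
print(shlex.join(argv))
```

Pinning the image to an explicit tag rather than `latest` is one simple way to help different people run the same software version.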
Delegate Requirements

Participants should bring their own laptop.


Maximum Attendees: 30

 

AKES02:  Community Efforts to Enable Scalable, Reproducible, and Portable Biomedical Data Analyses
Saturday, July 9, 8:30 am – 12:30 pm
Room: Toucan (Swan Hotel)
Organizer(s):

Dr. Kaushik is a community manager at Seven Bridges and assists both internal and external developers with using Docker and building CWL pipelines on the Cancer Genomics Cloud. He also has an established track record in professional development planning. He is a former chair of the UC San Diego Bioengineering research expo, which brings together 400+ researchers and biomedical professionals across California. He has led workshops on graph theory, statistical modeling, and the central dogma of molecular biology as a fellow for the Insight Data Science program.

Presentation Overview:

Reproducibility remains a major concern in biomedical research. Recently, it has been demonstrated that cancer informatics analyses performed within a single consortium may yield wildly variable results. As the collection of genomic data and analyses continues to accelerate, concerns about maintaining the accuracy of results continue to grow. Large-scale, accurate cancer analyses demand scalable informatics. Scalability in turn requires reproducibility and portability of tools, analyses, and data to ensure that researchers can collaborate easily and effectively.

Recent technological developments and organizational efforts have sought to address the reproducibility problem in biomedical data analysis and have been successfully applied to cancer informatics. For example, Docker containers enable researchers to package software with all of its required dependencies and nothing more. This feature allows software to be shared with anyone in such a way that the exact analysis can be reproduced. Docker containers can be easily shared through GitHub, third-party repositories, or user-to-user with plain-text files. Moreover, external tools can hook into Docker directly, using it as a component of complex pipelines or analyses. The Common Workflow Language (CWL) is one specification that enables researchers to describe analysis tools and workflows that are powerful, easy, and portable. Dynamic computing environments, often referred to as ‘the cloud’, are able to support co-localization of cancer data, Docker+CWL workflows, and the computational resources required to perform large-scale analyses. These environments can be extended with collaboration and project management tools to enable researchers to work together in a transparent and reproducible fashion.
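To show how a Docker image and a CWL tool description fit together, here is a minimal Python sketch that builds a CWL `CommandLineTool` description as a dictionary and prints it as JSON, which CWL accepts alongside YAML. The tool (counting lines with `wc -l`), the image tag, and the input and output names are illustrative assumptions, and the description has not been run through a CWL runner here.

```python
import json


def cwl_tool(base_command, docker_image):
    """Return a minimal CWL v1.0 CommandLineTool description as a dict."""
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": base_command,
        # Ask the runner to execute the tool inside the given Docker image.
        "requirements": [
            {"class": "DockerRequirement", "dockerPull": docker_image}
        ],
        # One file input, passed as the first positional argument.
        "inputs": {
            "input_file": {"type": "File", "inputBinding": {"position": 1}}
        },
        # Capture the tool's standard output as the result file.
        "outputs": {"line_count": {"type": "stdout"}},
        "stdout": "line_count.txt",
    }


# Illustrative choices: any image that provides `wc` would do.
tool = cwl_tool(["wc", "-l"], "ubuntu:16.04")
print(json.dumps(tool, indent=2))
```

Because the description names both the command and the image it runs in, the same file can in principle be handed to a laptop, a cluster, or a cloud platform that supports CWL.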

These methodologies have enabled global-scale cancer genomics initiatives such as the International Cancer Genome Consortium (ICGC) PanCancer Analysis for Whole Genomes Project (PCAWG), and the National Cancer Institute (NCI) Cancer Genomics Cloud (CGC) pilot program. In this workshop, we will instruct attendees in Docker and CWL as well as their use and best practices, and discuss concretely how these technologies enable scalable, reproducible, and portable cancer informatics. We will also discuss the methodologies behind how these tools are developed and deployed and pose the following questions: What are the next steps for improving reproducibility in bioinformatics and scaling informatics efforts? What have we learned from analyses of thousands of cancer genomes that can be applied to other diseases and other consortium efforts? In addition, we will encourage discussion about unmet needs and future solutions in cancer informatics.

Agenda:
Timing | Presenter | Topic Area/Activity Description
8:30-8:40am Gaurav Kaushik, Seven Bridges Introduction to speakers, overview of agenda, aims for the day PDF
8:40-9:25am Brian O’Connor, OICR

Hands On: Using Docker to Enable Large-Scale Cancer Genomics Initiatives
Dr. O’Connor will begin by discussing the ICGC PCAWG project and the insights he's gained from rapidly analyzing thousands of whole cancer genomes. Attendees will learn about how his team has used Docker as a key component to their workflow, best practices for creating shareable tools, as well as the motivation behind the creation of DockStore (dockstore.org), a platform for sharing Docker-based tools described with the Common Workflow Language. Attendees will learn how to create custom Docker containers and push them to DockStore to share with the community.

YouTube Video

Slides 1

Slides 2

9:25-9:45am Michael Crusoe, University of California Davis

Lecture: The Inception and Development of the Common Workflow Language: a Model for Community-Driven Projects
Attendees will learn about how developers in industry and academia recognized the need for a language that enables reproducible workflows and came together to develop the Common Workflow Language, an emerging standard of global cancer informatics. Mr. Crusoe will present CWL as a model for future collaborative efforts to establish standards and practices that benefit the community.

Slides

9:45-10:15am Gaurav Kaushik, Seven Bridges Lecture: Scalable, Collaborative, Reproducible, and Extensible Analysis of TCGA Data in the Cloud
Dr. Davis-Dusenbery will discuss the Cancer Genomics Cloud pilots project funded by the National Cancer Institute. The overarching goal of the project is to explore how co-localizing large genomics datasets, like The Cancer Genome Atlas, with the dynamic compute infrastructure needed to analyze them can make learning from these data faster and ultimately enable precision medicine. The Cancer Cloud pilots also serve as a model for how collaborative research may be conducted for a variety of disease types at the national scale.
PDF
10:15-10:45am Coffee Break
10:45-11:15am Gaurav Kaushik, Seven Bridges

Hands-on Tutorial: Building and Executing a Cancer Analysis Pipeline on the CGC
Dr. Kaushik will take attendees through a demonstration of the Cancer Genomics Cloud platform, including finding and using TCGA data and building and running applications and pipelines. The CGC is a public resource for cancer informatics researchers to work collaboratively.

Slides

 

11:15-11:45am Kyle Ellrott, Oregon Health & Science University

Lecture: Collaborative Cancer Informatics

Dr. Ellrott will discuss his experience creating and using reproducible workflows to facilitate collaborative cancer genomics efforts. He will provide insight into the benefits and challenges of these approaches at scale. Additionally, Dr. Ellrott will present how portable workflows have enabled ‘Team Science’ through the Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenges for collaborative efforts in cancer genomics. In particular he will highlight how the NCI Cancer Genomics Cloud pilots have been utilized by DREAM challenge participants to crowdsource cancer analyses using TCGA data.

Slides

11:45-12:15pm All Hands Forum: The audience will break into groups to discuss individual components of the day's events:
  • Next steps for global cancer analysis with Brian O’Connor
  • The next DREAM challenge for cancer with Kyle Ellrott
  • Cloud computing for cancer genomics with Gaurav Kaushik and Brandi Davis-Dusenbery
12:15-12:30pm Summary & Conclusion Attendees, presenters, and organizers will reconvene to summarize the discussions (5 min each).
PDF

Bio:

Dr. Kaushik is a product manager for the Seven Bridges software development kit for building portable applications and workflows using Docker and CWL. He is also the community manager for the Cancer Genomics Cloud, which entails assisting developers in using open tools to enable their research.

Bio:

Dr. O'Connor is the Managing Director of Cloud Computing at OICR. He has been a member of major genomics collaborative efforts such as the PanCancer Analysis of Whole Genomes. He has also helped develop DockStore, an open platform for sharing Docker-based tools described with the Common Workflow Language, which is used by the Global Alliance for Genomics and Health (GA4GH).

Bio:

Prof. Ellrott is a leading expert in cancer genomics and data integration and management as a contributor to the UCSC Genome Browser. He has participated in and led DREAM challenges, an open-science effort to find solutions for pressing problems in biological science and human health.

Bio:

Michael Crusoe is the Community Engineer for the Common Workflow Language, which entails managing the development of CWL as well as external communications with collaborators and users. He is also Staff Engineer in the C. Titus Brown lab at UC Davis, where he helped develop new software for analysis of high throughput sequencing data.

Learning Objectives:

From this workshop, you will:

  • Get informed about the challenges in enabling reliable cancer informatics and ongoing solutions
  • Learn how to create your own Docker containers and CWL applications to run locally or on the cloud
  • Become familiar with the inception and execution of the Cancer Genomics Cloud pilot
  • Be able to get involved in the next DREAM challenge and impact our understanding of cancer
  • Give your input to help iterate, improve, and extend current collaborative science models
Delegate Requirements

Attendees will benefit from prior experience with bioinformatics or computational biology, especially in the field of cancer. This session will serve as a suitable introduction to dynamic computing, Docker, Common Workflow Language, and the broader subject of reproducible analyses.

Hands-on demos of publicly available software will be conducted so attendees are encouraged to bring their laptops.


 

AKES03: CANCELLED: Synthetic Biology Open Language – standards, tools and guideline on managing both the functional design and the sequence details of your experiments
AKES04:  Living on the Edge (of Translational Informatics) - Opportunities and Challenges for Integrating Bioinformatics into the Clinical Realm
Saturday, July 9, 1:30pm – 5:30 pm
Room: Toucan (Swan Hotel)
Organizer(s):

Samuel Volchenboum is a pediatric oncologist and the director of the University of Chicago Center for Research Informatics and the Associate Chief Research Informatics Officer. His group provides divisional support for bioinformatics, data warehousing and business analytics, high-performance computing, application development, and data governance. His team has built a variety of systems used in research, including a platform for collecting and reporting pharmacogenomics data as well as tools to support the clinical molecular pathology core. Dr. Volchenboum’s research is focused on using large clinical data sets to understand the impact of hospital events on patient care and safety. Dr. Volchenboum also directs the University of Chicago Graham School Master’s Program in Biomedical Informatics. He is a chair-elect of the AMIA Genomics and Translational Bioinformatics Working Group.

Presentation Overview:

Bioinformatics tools and techniques are becoming an essential component of clinical care, yet integrating these new modalities into clinical practice remains a challenge. From molecular diagnostics to pharmacogenomics to molecular pathology and molecular medicine, integration between the bioinformaticians performing the analyses and the clinicians interpreting and acting on the results is increasingly critical. This session will focus on these interfaces, with particular attention to the tools, systems, and processes required for an efficient transition “from bench to bedside.” The speakers all have experience working on both sides of this imaginary divide and will provide insights into driving discovery into the clinic. Audience members should gain a more in-depth understanding of the issues facing data scientists and clinicians alike. Dr. Russ Altman will moderate the session and introduce each speaker using relevant examples from the literature. Speakers will cover topics ranging from an overview of bioinformatics techniques used in clinical practice to the ethics and legal implications of genomic testing. Anyone working at the interface of genomics and medicine will find this session useful and informative.

Agenda:
Timing | Presenter | Topic Area/Activity Description
1:30-1:45pm Russ Altman

Introduction to speakers / overview
Dr. Altman will give an overview of the session and the learning objectives. He will introduce each speaker and talk using examples from the literature relevant to each topic.

Slides

 

1:45-2:15pm Samuel Volchenboum

Lecture - Overview of bioinformatics techniques and uses in clinical practice and research
This talk will give a broad, high-level overview of how bioinformatics is being harnessed to answer clinical questions and support medical decision making. Clinical vignettes will be used along with practical examples from a sampling of academic programs. Both panel testing / targeted sequencing and exome and whole genome sequencing will be covered. Also briefly discussed will be how research groups can be leveraged for clinical testing, including CAP certification and HIPAA and 21 CFR Part 11 compliance.

PDF

 

2:15-2:45pm Robert R. Freimuth

Lecture - Utilizing genomic data in clinical systems
Researchers and clinicians have more opportunities to share data and knowledge than ever before, but there remain significant challenges to the meaningful exchange, integration, and use of information. This presentation will provide examples of how genomic data are recorded in electronic health record systems and how those results are used clinically.  Specific challenges will be highlighted, such as variability in the representation of genetic data, and growing needs related to knowledge management, as well as the impact that those challenges have on patient care and translational research. Existing efforts by both international standards organizations and national consortia to develop normalized systems for the efficient exchange and integration of data related to clinical genomics will be reviewed.

PDF

 

2:45-3:15pm Lewis Frey

Lecture - Collection of data for research

The talk will explore how data are collected and used for subsequent research. As panel and sequencing data are collected and stored, it will become important to understand who “owns” these data and who can use them for research. Finally, how these data are integrated into the data warehouse and connected to clinical information will be explored along with how to address issues of interoperability.

Slides

 

3:15-3:30pm  Subha Madhavan

 Lecture - Practical Precision Medicine: Integration of clinical and genomic data to support cancer care



Genomics technologies are increasingly utilized in the management of patients with malignancies, especially in patients who do not respond to “standard of care” treatments. From initial diagnostic and treatment triaging to contemplating options for patients with refractory disease, informatics is at the heart of cancer care management. This session will concentrate on the ways in which genomic data can be integrated into clinical care processes to support decision making. What kinds of data and associated evidence should be made available to clinicians? How should these data be presented to caregivers? How can this information be delivered to maximize comprehension?

Slides

 

3:30-4:00 pm Coffee Break
4:00-4:15 pm Subha Madhavan

Demo - G-DOC Plus: A Data Science Platform for Precision Medicine Research

Slides

 

4:15 - 4:45 pm Casey Overby

Lecture - Increasing the reach of clinical genomics research and genomics-informed care

Underrepresented research populations are less likely to benefit from genomic advances. Thus, there is a critical need to improve the reach of clinical genomics research to broader populations. Further, there is a need to deliver customized guidance for acting upon results from those studies. The learning objectives for this session will be to understand: (a) emerging communication models leveraging technology to increase patient engagement, and (b) the potential for programmatic clinical decision support to facilitate customized genomics-informed care guidance. Specific examples assessing public interest in donating biobank samples and the risk communication needs of cancer genetic counselors will be described.

PDF

 

4:45-5:15 pm Jessie Tenenbaum

Lecture - Ethical, legal, and social implications of genomic testing

Powerpoint

Though significant advances have been made toward realization of genomic medicine, precision medicine is still in its early days. Here we describe a case in which a patient’s treatment was altered based on direct-to-consumer genetic testing. Genomic medical guidelines continue to evolve such that a different, and less aggressive, therapeutic course would be recommended today. This case illustrates issues around the translation of genetic knowledge into clinical practice: patient engagement and personal preferences, patient and clinician education, genomic data integration in EHRs, and legal and financial implications of genetic findings. Researchers and clinicians will need to work closely with patients and patient communities as well as ethical, legal, social, and economic experts in order to fully realize the promise and the value of genomic medical practice.

5:15-5:30 pm Panel Panel discussion

Bio:

Dr. Altman is a professor of bioengineering, genetics, & medicine and past chairman of the Bioengineering Department at Stanford University. His primary research interests are in the application of computing and informatics technologies to problems relevant to medicine. He is particularly interested in methods for understanding drug action at molecular, cellular, organism and population levels. Dr. Altman has conducted many teaching sessions in which he curates large amounts of seminal literature and distills the material into the most relevant publications. He delivers the annual AMIA “year in review,” a perennial favorite and standing-room-only lecture.

Bio:

Dr. Freimuth is an Assistant Professor of Biomedical Informatics in the Department of Health Sciences Research, Mayo Clinic, and maintains adjunct appointments at University of Minnesota-Rochester and Arizona State University. The goal of his research is to develop scalable and interoperable genomic clinical decision support tools, driven by knowledge bases that utilize standards-based data models, terminologies, message formats, and algorithms, that will enable robust management and delivery of genomic knowledge as well as the evaluation of medical outcomes resulting from genomic-guided therapy decisions. He is a member of several research networks and formal standards development initiatives focused on genomic medicine. Dr. Freimuth is co-Chair of the Clinical Pharmacogenetics Implementation Consortium (CPIC) Informatics Work Group, and he recently completed two terms as Chair of the AMIA Genomics and Translational Bioinformatics Working Group.

Bio:

Dr. Overby is an Assistant Professor in the Divisions of General Internal Medicine and Health Sciences Informatics at Johns Hopkins University (JHU). Dr. Overby is also Adjunct Investigator in the Genomic Medicine Institute at Geisinger Health System. Her research interests span a number of areas at the intersection of public health genetics and biomedical informatics, including applications that support translation of biological knowledge to clinical care and population healthcare, delivering health information and knowledge to the public, and developing knowledge-based approaches to use big data (including electronic health record data) for population health.

Bio:

Dr. Madhavan is Director of the Innovation Center for Biomedical Informatics (ICBI) at the Georgetown University Medical Center and Associate Professor of Oncology. She is a world leader in data science, clinical informatics, and health IT and is responsible for several biomedical informatics efforts, including software development of the Georgetown Database of Cancer (G-DOC), a resource for both researchers and clinicians to realize the goals of personalized medicine. She also co-directs the Lombardi Cancer Center’s Biostatistics and Bioinformatics shared resource. She has taught similar sessions in the past, most recently at the Pacific Symposium on Biocomputing.

Bio:

Dr. Frey has focused on methods of data integration and analysis for the purpose of discovery. Novel analysis of integrated heterogeneous data provides opportunity for discovery and improved patient care through bench-to-bedside translational research. He is leading the development of big data methodologies applied to the Veterans Affairs Informatics and Computing Infrastructure (VINCI). The goal is to create a system for distributable clinical, personalized, pragmatic predictions of outcome (Clinical3PO) with easy deployment of preconfigured virtual machines. The impact will be a product that is plug and play with existing infrastructure at health care institutions. A distributable Clinical3PO system will enable novel analysis on sequential health care data that overcomes the curse of dimensionality currently limiting the research field.

Bio:

TBA

Learning Objectives:
  • Understand the challenges facing both bioinformaticians and clinical researchers when developing and using new technologies for translational research
  • Identify considerations when making genomic testing accessible for use in clinical care
  • Comprehend the opportunities and complexities of collecting clinical data for research
  • Understand key ethical and legal implications of genomic testing


Maximum Attendees: 50


 

AKES05: CANCELLED: Cytoscape 3 User Tutorial: Introduction to network visualization and analysis using Cytoscape