Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

NIH/OD Office of Data Science Strategy (ODSS)

Schedule subject to change
All times listed are in UTC
Tuesday, July 27th
11:00-11:10
Welcome and Introduction
Format: Live-stream

Moderator(s): Susan Gregurick

  • Susan Gregurick
11:00-12:20
Session I: The NIH Cloud Platforms Interoperability (NCPI) Efforts
Format: Live-stream

Moderator(s): Susan Gregurick

11:10-11:33
Federated and FAIR Systems Interoperation in NCPI
Format: Live-stream

Moderator(s): Susan Gregurick

  • Brian O'Connor

Presentation Overview:

The NCPI effort started in the fall of 2019 with the goal of establishing and implementing guidelines and technical standards to empower a trans-NIH, federated FAIR data ecosystem. The Systems Interoperation Working Group is a key part of this effort and focuses on putting these approaches and standards into practice, allowing researchers to work across participating cloud platforms. In this talk we look at the intersection of researcher use cases for distributed cloud-based analysis and the data standards that enable this vision, then dive into an example that leverages the Systems Interoperation work as a whole. Along the way we will explore some key technical standards that have been critical to our progress and examine our next steps as we push the envelope of interoperability.

11:33-11:56
Clinical and Phenotypic Data Interoperability using FHIR in NCPI
Format: Live-stream

Moderator(s): Susan Gregurick

  • Allison P Heath
  • Robert Carroll

Presentation Overview:

The increasing ability to analyze and integrate data across research studies in cloud-based platforms creates a corresponding need for clinical data interoperability. One of the major gaps in making clinical and phenotype data available to the research community is the lack of universal standards for representing and transmitting these data. FHIR has emerged as the core interoperability standard for healthcare data and provides a practical framework for implementation and interchange that is extensible to research data. The use of FHIR provides an implementation framework for research platforms to interoperate. It also provides a bridge between data generated by clinical systems and data generated by research systems, which have traditionally been siloed, reducing the capacity for translational impact. We will present work on the NCPI FHIR Implementation Guide, including practical examples of representing research phenotypes across several NIH programs. We will further discuss how FHIR can be a substrate for an ecosystem of tools and downstream analytics.

11:56-12:20
Modeling the computing requirements and costs for genomics analysis in the cloud
Format: Live-stream

Moderator(s): Susan Gregurick

  • Michael C. Schatz

Presentation Overview:

Cost is one of the largest barriers to migrating biomedical analysis into the cloud. Researchers currently have limited information about the expected costs of running analysis tools, which complicates budgeting and prevents many researchers from adopting cloud solutions. In addition, software developers may not focus on optimizing costs for cloud environments, which increases expense even when relatively simple optimizations are available. To address these needs, we are profiling and analyzing the cloud costs for many of the most widely used tools and workflows in genomics. To identify these workflows, we have mined the historical usage data on the global usegalaxy.* Galaxy servers, as it is one of the most popular community resources available. We are now measuring their computing requirements and costs when running with inputs of varying sizes and complexities. From these data, we aim to develop a predictive model and API that can estimate the costs of running these analyses on each of the NCPI cloud platforms. Our goal is to inform investigators of the anticipated costs of their research and to reduce costs by informing software developers of the tools that most urgently need optimization.

12:40-14:00
Session II: FAIR Data/Repositories
Format: Live-stream

Moderator(s): Jennie Larkin & Ishwar Chandramouliswaran

12:40-12:45
NIH Activities to Support a FAIR Repository Ecosystem
Format: Live-stream

Moderator(s): Jennie Larkin & Ishwar Chandramouliswaran

  • Jennie Larkin, NIA/NIH, USA
  • Ishwar Chandramouliswaran, NIAID/NIH, USA

Presentation Overview:

An overview of the NIH activities to support a FAIR data repository ecosystem. The presenters will then moderate the panel of speakers scheduled in this session.

12:45-13:00
FAIR Research - What is in it for the Researchers?
Format: Live-stream

Moderator(s): Jennie Larkin & Ishwar Chandramouliswaran

  • Mark Hahnel

Presentation Overview:

In this short presentation, Mark Hahnel, founder and CEO of the data publishing platform Figshare, will talk about the unexpected benefits of complying with policies to make datasets and code openly available - from the perspective of the researcher.

Mark will talk about how certain researchers have taken advantage of disseminating all of their research outputs to get credit for all of their research. In the competitive academic landscape, not all outputs can have huge impact - but there are simple practices that can help research move faster, whilst complying with funder mandates and ensuring maximum dissemination for your own research.

13:00-13:15
The Tribal Nature of Data Sharing
Format: Live-stream

Moderator(s): Jennie Larkin & Ishwar Chandramouliswaran

  • Lara Mangravite

Presentation Overview:

Sage Bionetworks developed Synapse in 2012 as a general-purpose data repository for cloud-based distribution of data under FAIR principles. Although the system is freely available for any use, data sharing and secondary use are most active where a research community, or its funder, has created a domain-specific space. Here we will discuss the technical, social, and cultural issues that make domain-specific data repositories so successful in catalyzing secondary research, and consider how we might leverage these to promote the use of general infrastructure.

13:15-13:30
FAIR Data and FAIR Repositories
Format: Live-stream

Moderator(s): Jennie Larkin & Ishwar Chandramouliswaran

  • John Chodacki, California Digital Library (CDL), USA

Presentation Overview:

This talk will focus on how both generalist and domain-specific repositories can best support FAIR data principles. Focus will be given to the role of community groups such as FORCE11 in the adoption of best practices by data repositories and data publishers.

13:30-13:45
Bridging from Researcher Data Management to ELIXIR Archives in the RDM Lifecycle
Format: Live-stream

Moderator(s): Jennie Larkin & Ishwar Chandramouliswaran

  • Carole Goble, The University of Manchester / ELIXIR-UK

Presentation Overview:

ELIXIR is the pan-national European Research Infrastructure for Life Science data, whose 23 national nodes and the EBI coordinate the development and long-term sustainability of domain public databases. FAIR services, policies and curation approaches aim to build a FAIR connected data ecosystem of trusted domain repositories, from ENA, HPA and EGA to specialised resources like CorkOakDB and PIPPA for plant phenotypes. But this is only one part of the data landscape and often the end of data’s journey. The nodes support research projects to operate “FAIR data first”, working with institutional and national platforms that are often generic or designed for project-based data management. We need to bridge between project-based and community-based platforms, and support researchers across their whole RDM lifecycle as they navigate the complexity of this ecosystem. The ELIXIR-CONVERGE project and its flagship RDMkit toolkit (https://rdmkit.elixir-europe.org) aim to do just that.

13:45-14:00
An Introduction to ICPSR: A Place to Discover and Access Social and Behavioral Science Data for Secondary Analysis
Format: Live-stream

Moderator(s): Jennie Larkin & Ishwar Chandramouliswaran

  • Sandra Tang

Presentation Overview:

Over the past 60 years, the Inter-university Consortium for Political and Social Research (ICPSR) has successfully coordinated the research needs of over 800 universities and research institutes. This work has led to the expanded use of secondary data, the development of innovative lines of research, and the training of multiple generations of scholars across the fields of social and behavioral sciences. This talk will highlight some of the resources and support that we provide to data users in discovering, accessing, and analyzing existing data, as well as how we support data providers in sharing their data responsibly and demonstrating the impact of their data. Finally, this talk will address new directions for ICPSR given the evolving data landscape and analytic needs of data users.

14:20-15:20
Session III (Panel): Diversity in Data Science Training and Research
Format: Live-stream

Moderator(s): Karol Watson & Susan Gregurick

Panel
Format: Live-stream

Moderator(s): Karol Watson & Susan Gregurick

  • Marcela Nava, University of Texas at Arlington
  • Joshua Pritchett, Google Cloud
  • Jennifer Wagner, Geisinger Center for Translational Bioethics and Health Care Policy
  • Omolola Ogunyemi, Charles R. Drew University of Medicine and Science
Wednesday, July 28th
11:00-11:05
Introduction and Welcome
Format: Live-stream

Moderator(s): Heidi Sofia

  • Susan Gregurick

Presentation Overview:

In this opening introduction, Susan Gregurick, Ph.D., Associate Director for Data Science and Director, Office of Data Science Strategy at the National Institutes of Health, will share the NIH’s vision for a modernized, integrated FAIR biomedical data ecosystem and the strategic roadmap to achieve this vision. Dr. Gregurick will highlight projects being implemented by team members across the NIH’s 27 institutes and centers and will describe ways that industry, academia, and other communities can help NIH enable a FAIR data ecosystem.

11:00-12:20
Session IV: Open Research Software
Format: Live-stream

Moderator(s): Heidi Sofia

11:05-11:20
Manual Brain Segmentation Workflows Using the SPINE Virtual Laboratory: from Desktop to Cloud
Format: Live-stream

Moderator(s): Heidi Sofia

  • Alfredo Morales Pinzon, Brigham and Women's Hospital & Harvard Medical School, USA

Presentation Overview:

Manual segmentation of brain structures on magnetic resonance images (MRI) requires brain anatomical knowledge, well-defined procedures, and tailored neuroimaging segmentation tools in order to enable high reproducibility of results, especially in projects with large data sets requiring segmentations from multiple annotators. Desktop-based solutions work well for single-rater segmentations but are not tailored to collaborative work. For instance, images and segmentations have to be sent back and forth among the various actors of a project, and changes in segmentation guidelines are difficult to share and implement. In this project we aim to translate the core functionalities of a neuroimaging module developed in a desktop solution into a web-based editor, and to codify manual segmentation processes into computerized workflows. A JavaScript Object Notation (JSON) schema is proposed to describe segmentation tools (e.g., configuration of viewers and segmentation widgets) and workflows, allowing the research community to easily edit and share workflows. The web-based editor and the computerized workflows are integrated in the SPINE Virtual Laboratory, which allows for centralized control of access and execution of workflows while maintaining the provenance of the data and results. The SPINE Virtual Laboratory is being tested in the cloud, increasing the scalability of the solution by enabling cloud workflow execution and connection to open-science, cloud-based data repositories.

11:20-11:35
UR_EAR - A Web App supporting computational models for auditory-nerve and midbrain responses
Moderator(s): Heidi Sofia

  • Laurel Carney
11:35-11:50
mPATH: A Digital Health Navigator
Format: Live-stream

Moderator(s): Heidi Sofia

  • Ajay Dharod

Presentation Overview:

As technology evolves, EHRs have increasingly adopted a platform role, enabling third-party applications to perform limited functions within the system of record. Interoperable, standards-based information systems that facilitate the secure but agile exchange of data are critical to contemporary healthcare delivery. Cloud-based services are widely used in the non-healthcare domain. EHR-integrated digital and mobile health applications must include expanded use of the SMART on FHIR standard and cloud-based architecture to allow for scalability and ease of installation across healthcare systems and care settings. This presentation will share the journey toward scalability and cloud readiness, supported by an NIH ODSS supplement grant, of an NCI-funded cancer-prevention digital health navigation application (mPATH). The activities of the Administrative Supplement port and translate the highly efficacious mPATH mobile health platform to a robust and scalable software infrastructure. This includes improving interoperability (SMART on FHIR) and migrating to a cloud computing model, which will greatly increase the impact of the platform and contribute to the literature on open-standards-based software development.

11:50-12:05
Human AI Loop in Cloud for Distributed Computation
Format: Live-stream

Moderator(s): Heidi Sofia

  • Pinaki Sarder, University at Buffalo, USA

Presentation Overview:

Modern medicine is entering the big data revolution, generating large-volume multi-modal, multi-scale, and multi-omics data. This progress has opened up new opportunities for computational scientists to discover previously undiscoverable statistical biomarkers from the data. The ideal end user of the developed computational tools will use them to ask important biological questions, but often does not have a background in computational science. This has further driven computational data science to ensure that the developed tools are accessible to any end user. In this talk, we will shed light on this direction and discuss an integrative tool, HAIL (Human AI Loop) in Cloud, for open data science. This tool is developed in conjunction with HistomicsUI, an open-source whole slide image viewer and digital data archival system. We have integrated our HAIL tool as an end-user plugin for conducting detection, segmentation, and quantification of structures from very large tissue images, as well as fusion of multi-modal data, particularly fusion of spatial molecular and image data. HAIL allows an end user to actively interact with the system to tune the model training using their biological prior knowledge. We will also show results on how the tool can be used to conduct multi-site data analysis, eliminating the need to share data with protected health care information outside the host institution. We will conclude by discussing the need for generation of reference datasets, as well as the importance of integrating various types of data ontology with the system, for reproducible assessment of independent datasets.

12:05-12:20
Software and Science at the Open Force Field Initiative
Format: Live-stream

Moderator(s): Heidi Sofia

  • Jeff Wagner, Open Force Field Initiative, USA

Presentation Overview:

The Open Force Field Initiative aims to publish sustainable software and reproducible research. To this end, we have experimented with several approaches to software development and data management, many of which have become standard practice in the Initiative. For example, to make our research reproducible, we tie data artifacts to GitHub releases that contain complete copies of source code, input data files, and exact conda environments. To make the software sustainable, our packages are templated by the MolSSI cookiecutter, which allows us to standardize infrastructure with other packages in our field and lowers the barrier for contributors. While some of these practices increase upfront project costs, they reduce the personnel time required to onboard new contributors and researchers, leading to a model that facilitates collaboration. This talk will review many of the specific approaches to open source science that Open Force Field has tried, and the degrees to which they have been successful.

12:40-14:00
Session V: Reproducibility & Re-use
Format: Live-stream

Moderator(s): Alex Bui

12:40-12:44
Introduction and Welcome
Format: Live-stream

Moderator(s): Alex Bui

  • Alex Bui
12:44-13:03
A Framework for Making Predictive Models Useful in Medicine
Format: Live-stream

Moderator(s): Alex Bui

  • Nigam Shah, Stanford University, United States

Presentation Overview:

In this session we will explore strategies for, and issues involved in, bringing Artificial Intelligence (AI) technologies to the clinic, safely and ethically. We will discuss the different use cases for AI in personalizing diagnosis, prognosis and treatment recommendation. The session will review the use of routinely collected data on millions of individuals to provide on-demand evidence in those situations where good evidence is lacking and introduce a framework for analyzing the utility of model-guided workflows in healthcare.

13:03-13:22
Veridical Data Science for precision medicine: subgroup discovery through staDISC
Format: Live-stream

Moderator(s): Alex Bui

  • Bin Yu
13:22-13:41
PREMIERE: A community-driven platform for reproducibility and reuse in biomedical predictive modeling
Format: Live-stream

Moderator(s): Alex Bui

  • Anders O Garlid
13:41-14:00
Omics Indexing and Standards
Moderator(s): Alex Bui

  • Yasset Perez-Riverol
14:20-15:20
Session VI (Panel): Bridging International Efforts in Data Science
Moderator(s): Rolf Apweiler

Panel
Moderator(s): Rolf Apweiler

  • Michele Ramsay
  • Claudia Medeiros
  • Chuck Cook
  • Griffin M. Weber, Harvard Medical School, United States



International Society for Computational Biology
525-K East Market Street, RM 330
Leesburg, VA, USA 20176