Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


NIH/OD Office of Data Science Strategy (ODSS)


Schedule subject to change
Tuesday, July 14th
10:40 AM-11:00 AM
Introduction to the ODSS Data Science Sessions
Format: Live-stream

  • Susan Gregurick, Associate Director for Data Science and Director of the NIH Office of Data Science Strategy, United States
  • Patti Brennan, National Library of Medicine, NIH, United States
11:00 AM-11:20 AM
10 Lessons We Learned in 10 Days About Clinical Data Interoperability in the COVID Crisis
Format: Live-stream

  • Atul Butte, Priscilla Chan and Mark Zuckerberg Distinguished Professor of Pediatrics, Bioengineering & Therapeutic Sciences, and Epidemiology & Biostatistics at UCSF, United States

Presentation Overview:

There is an urgent need to take what we have learned in our new data-driven era of medicine and use it to create a new system of precision medicine, delivering the best, safest, and most cost-effective preventive or therapeutic intervention at the right time, for the right patients. Dr. Butte's lab at the University of California, San Francisco builds and applies tools that convert trillions of molecular, clinical, and epidemiological data points (measured by researchers and clinicians over the past decade and now commonly termed “big data”) into diagnostics, therapeutics, and new insights into disease. Dr. Butte, a computer scientist and pediatrician, will highlight his center’s recent work on integrating electronic health record data across the entire University of California system, and how analytics on this “real-world data” can lead to new evidence for drug efficacy, new savings from better medication choices, and new methods to teach intelligence, both real and artificial, to practice medicine more precisely.

11:20 AM-11:40 AM
Network Medicine Framework for Drug Repurposing
Format: Live-stream

  • Joseph Loscalzo, Hersey Professor of the Theory and Practice of Medicine at Harvard Medical School, Chairman of the Department of Medicine, and Physician-in-Chief at Brigham and Women’s Hospital, United States

Presentation Overview:

Repurposing approved drugs for novel uses is emerging as a significant approach for identifying new, effective treatments for both known and new diseases. We have applied the principles of network medicine to identify drug targets that may be effectively repurposed, using the comprehensive protein-protein interaction network (interactome) as the basis for the analysis. To do so, we first recognize that each disease has its own unique (clustered) subnetwork, or disease module, within this interactome. Next, we create a bipartite graph based on the assumption that the proximity of a drug target to the disease module of interest suggests the potential utility of that target’s drug for the disease. We then use network proximity, network diffusion, and AI-based network metrics to rank all approved drugs by their likely efficacy in the disease of interest; aggregate all predictions into an integrated rank order; test its predictive accuracy statistically against ground-truth examples; and arrive at lead candidate drugs for the disease of interest. Lastly, we test selected members of this candidate list in relevant in vitro experiments as a proof of concept. We believe this unique approach to repurposing approved drugs holds promise as an effective, rapid, and relatively inexpensive strategy for safely deploying ‘new’ treatments.
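The proximity-ranking and aggregation steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the speaker's pipeline: it assumes a "closest distance" proximity (the mean, over a drug's targets, of the shortest-path hop count to the nearest disease-module protein) on an unweighted toy interactome, uses hypothetical protein and drug names, stands in a placeholder score for the diffusion and AI-based metrics, and aggregates per-metric ranks by simple mean rank (a Borda-style choice made here for illustration).

```python
from collections import deque
from statistics import mean

# Hypothetical toy interactome type: undirected adjacency sets of protein names.
Graph = dict[str, set[str]]


def bfs_distances(graph: Graph, source: str) -> dict[str, int]:
    """Shortest-path hop counts from source to every reachable protein."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closest_proximity(graph: Graph, targets: set[str], module: set[str]) -> float:
    """Mean over a drug's targets of the hop distance to the nearest module protein.

    Smaller is closer. Targets with no path to the module are skipped;
    if no target reaches the module, the drug is assigned infinity.
    """
    per_target = []
    for t in targets:
        dist = bfs_distances(graph, t)
        reachable = [dist[m] for m in module if m in dist]
        if reachable:
            per_target.append(min(reachable))
    return mean(per_target) if per_target else float("inf")


def rank(scores: dict[str, float], higher_is_better: bool = False) -> dict[str, int]:
    """Turn one metric's scores into ranks (1 = best); ties broken by drug name."""
    sign = -1 if higher_is_better else 1
    ordered = sorted(scores, key=lambda d: (sign * scores[d], d))
    return {drug: i + 1 for i, drug in enumerate(ordered)}


def aggregate(rankings: list[dict[str, int]]) -> list[str]:
    """Integrated order: sort drugs by their mean rank across all metrics."""
    drugs = rankings[0].keys()
    return sorted(drugs, key=lambda d: (mean(r[d] for r in rankings), d))


# Toy path graph A-B-C-D-E-F; all names below are hypothetical.
graph: Graph = {}
for u, v in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

module = {"A", "B"}  # hypothetical disease module
drug_targets = {"drug_x": {"C"}, "drug_y": {"E", "F"}, "drug_z": {"A"}}
proximity = {d: closest_proximity(graph, t, module) for d, t in drug_targets.items()}
# Placeholder second metric (higher is better), standing in for diffusion or AI scores.
other_metric = {"drug_x": 0.9, "drug_y": 0.5, "drug_z": 0.1}
order = aggregate([rank(proximity), rank(other_metric, higher_is_better=True)])
print(order)  # ['drug_x', 'drug_z', 'drug_y']
```

Here drug_z targets a module protein directly yet lands second, because the aggregate also weighs the placeholder metric. Network-medicine proximity analyses typically also normalize each raw distance against randomly sampled, degree-matched protein sets (a z-score) before ranking; this sketch omits that step.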

12:00 PM-12:20 PM
Machine Reading for Precision Medicine
Format: Live-stream

  • Hoifung Poon, Senior Director of Biomedical NLP at Microsoft Research, United States

Presentation Overview:

The advent of big data promises to revolutionize medicine by making it more personalized and effective, but big data also presents a grand challenge of information overload. For example, tumor sequencing has become routine in cancer treatment, yet interpreting the genomic data requires painstaking curation of knowledge from a vast biomedical literature, which grows by thousands of papers every day. Electronic medical records contain valuable information to speed up clinical trial recruitment and drug development, but curating such real-world evidence from clinical notes can take hours for a single patient. Natural language processing (NLP) can play a key role in interpreting big data for precision medicine. In particular, machine reading can help unlock knowledge from text by substantially improving curation efficiency. However, standard supervised methods require labeled examples, which are expensive and time-consuming to produce at scale. In this talk, I'll present Project Hanover, where we overcome the annotation bottleneck by combining deep learning with probabilistic logic, and by exploiting self-supervision from readily available resources such as ontologies and databases. This enables us to extract knowledge from millions of publications, reason efficiently with the resulting knowledge graph by learning neural embeddings of biomedical entities and relations, and apply the extracted knowledge and learned embeddings to support precision oncology.
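Self-supervision from databases can be illustrated with distant supervision, one common way to obtain noisy training labels without manual annotation: every sentence that mentions a drug and a gene is labeled positive if an existing database lists that pair as related. This is a hedged sketch, not Project Hanover's implementation; the knowledge-base pairs, the sentences, and the exact whole-word matching rule are hypothetical simplifications, and the deep-learning and probabilistic-logic components described in the abstract are not modeled.

```python
import re
from dataclasses import dataclass

# Hypothetical knowledge base of (drug, gene) pairs curated as interacting.
KNOWN_PAIRS = {("gefitinib", "EGFR"), ("vemurafenib", "BRAF")}
DRUGS = {d for d, _ in KNOWN_PAIRS}
GENES = {g for _, g in KNOWN_PAIRS}


@dataclass(frozen=True)
class Example:
    sentence: str
    drug: str
    gene: str
    label: bool  # assigned automatically from the KB, so it can be wrong


def mentions(sentence: str, vocab: set[str]) -> list[str]:
    """Vocabulary terms found as whole words in the sentence (case-insensitive)."""
    return sorted(
        t for t in vocab
        if re.search(rf"\b{re.escape(t)}\b", sentence, re.IGNORECASE)
    )


def distant_label(sentences: list[str]) -> list[Example]:
    """Label every drug-gene co-mention by looking the pair up in the KB."""
    examples = []
    for s in sentences:
        for drug in mentions(s, DRUGS):
            for gene in mentions(s, GENES):
                examples.append(Example(s, drug, gene, (drug, gene) in KNOWN_PAIRS))
    return examples


sentences = [
    "Gefitinib inhibits EGFR signaling in lung cancer cells.",
    "Patients received vemurafenib; EGFR status was unknown.",
    "BRAF mutations were profiled in all tumors.",
    "EGFR testing was done before gefitinib was stopped.",  # co-mention, not an interaction
]
for ex in distant_label(sentences):
    print(ex.drug, ex.gene, ex.label)
```

The last sentence is labeled positive even though it states no interaction. Label noise of this kind is inherent to distant supervision, and handling it requires learning methods more robust than this lookup, which the sketch does not attempt.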

12:20 PM-12:40 PM
Internet and Remote Sensing Data for Public Health Surveillance
Format: Live-stream

  • Elaine Nsoesie, Assistant Professor of Global Health, Boston University School of Public Health, United States

Presentation Overview:

In this talk, I will present opportunities and challenges of using data from Internet sources and remote sensing for public health surveillance. If properly mined and filtered, these data can be used to study the dynamics of disease propagation, risk factors, and the relationship between human behavior and disease spread. This talk will include examples of the use of satellite images, social media postings, and product reviews for surveillance of infectious diseases, food products, and obesity prevalence.

2:00 PM-2:20 PM
Lessons from Archives: Strategies for Collecting Sociocultural Data in Machine Learning
Format: Live-stream

  • Timnit Gebru, Technical Co-Lead of the Ethical Artificial Intelligence Team at Google Research, United States

Presentation Overview:

A growing body of work shows that many problems in fairness, accountability, transparency, and ethics in machine learning systems are rooted in decisions surrounding the data collection and annotation process. We argue that a new specialization should be formed within machine learning that focuses on methodologies for data collection and annotation: efforts that require institutional frameworks and procedures. Specifically for sociocultural data, parallels can be drawn from archives and libraries. Archives are the longest-standing communal effort to gather human information, and archive scholars have already developed the language and procedures to address and discuss many challenges pertaining to data collection, such as consent, power, inclusivity, transparency, and ethics and privacy. We discuss five key approaches in archival document collection practices that can inform data collection in sociocultural machine learning.

2:20 PM-2:40 PM
Data Sharing Challenges for Biomedical AI
Format: Live-stream

  • Bradley Malin, Vice Chair for Research and Professor of Biomedical Informatics, Professor of Biostatistics, Professor of Electrical Engineering & Computer Science, and Affiliated Faculty of the Center for Biomedical Ethics & Society and Computer Science at Vanderbilt University, United States

Presentation Overview:

The amount of person-specific data generated in the clinical and research domains continues to grow at an unprecedented rate. The breadth and depth of these data enable better care and hypothesis-driven discovery. At the same time, there is a belief that sharing and reusing these data, as well as combining them with other resources, will lead to new and more robust statistical investigations. However, sharing such data has the potential to infringe upon the rights (or expectations) of the individuals to whom the records correspond and of the organizations making the resources available. In this presentation, I will review various ways in which well-meaning biomedical data science investigations have led to misuse, as well as how data-driven methodologies can help us mitigate such threats while still enabling scientific utility. This presentation will draw upon experiences in data collection and sharing in the context of several large genomic and medical record data sharing initiatives, including the NIH-sponsored Electronic Medical Records and Genomics (eMERGE) Network and the All of Us Research Program, as well as the European Medicines Agency clinical trials data sharing program.

2:40 PM-3:00 PM
Ethics, Bias, and the Adoption of AI in Biomedicine
Format: Live-stream

  • Matthew DeCamp, Associate Professor, Center for Bioethics and Humanities and Division of General Internal Medicine, University of Colorado, United States

Presentation Overview:

Artificial intelligence (AI) has vast potential to revolutionize biomedicine across the translational science spectrum, from basic science to clinical research to practice and policymaking. However, recent high-profile cases (lawsuits alleging that patients’ privacy rights were violated when health data were shared with a technology company, errant treatment recommendations, and racially biased algorithms) suggest that addressing the ethical questions AI raises will be critical to its success. What does privacy require? Who owns data? Where is the line between informing choice through analytics and manipulating it? Should some domains of the human experience be “off limits” for AI? How should the potential for bias and stigmatization resulting from AI applications be managed? Using a highly publicized real-world case of a biased resource allocation algorithm as an example, this presentation will describe three considerations that are critical for understanding how to manage ethical questions surrounding AI: context (i.e., how we manage bias and other ethical issues depends upon the data type and the circumstances under which data are collected and used); upstream action (i.e., bias and other ethical issues must be addressed at the earliest stages of AI development, not after the fact); and engagement (i.e., meaningfully engaging diverse stakeholders is essential to managing ethical issues). For AI to be most effective, it must be ethical. Moving forward, there is a great need for empirical research into how ethical issues and their management affect the diffusion of AI-based innovations.

3:20 PM-3:40 PM
Training at the Intersection: Bringing Together Computation and Biomedicine
Format: Live-stream

  • Alex T. Bui, Director, Medical & Imaging Informatics (MII) Group; Director, UCLA Medical Informatics Home Area Graduate Program; Professor, Departments of Radiological Sciences, Bioengineering & Bioinformatics; David Geffen Chair in Informatics; David Geffen School of Medicine at UCLA, United States

Presentation Overview:

The confluence of data from electronic health records (EHRs) and new types of observations (omic, imaging, mHealth), combined with novel computational methods, holds tantalizing promise to transform biomedical discovery and patient care. New machine learning and reinforcement learning (ML/RL) techniques, for example, are uncovering scientific insights and helping optimize clinical decision making. However, the next generation of scientists must appreciate the proper use of these methods as well as the underlying translational challenges inherent in making such techniques usable in real-world environments. Moreover, such training needs to span an interdisciplinary spectrum, particularly if artificial intelligence (AI) is to advance in healthcare: computer scientists and engineers engaged in methodological development and the theoretical underpinnings of these algorithms need to appreciate the nuanced nature of biomedical data and how the tools they create are ultimately used, while biological and clinical scientists interested in using these tools must understand the appropriate application of data-driven analyses, including their evaluation. Unique opportunities exist to bring these individuals together by employing team science approaches on real-world use cases posed by healthcare systems, providing a pragmatic testbed for learning and implementation. Although short-term effort is required to make such teams successful (e.g., forging a common language, understanding different information needs), the diversity of technical experience and viewpoints enables innovative, effective solutions, and the group learns together, often in a self-sustaining manner.

3:40 PM-4:00 PM
Format: Live-stream

  • Kara Hall, Director of the Science of Team Science (SciTS) Team, United States

4:00 PM-4:20 PM
Reflections and Lessons from 15 Years of Training Computational Biologists
Format: Live-stream

  • Ivet Bahar, Distinguished Professor and JK Vries Chair, Computational & Systems Biology Department, School of Medicine, University of Pittsburgh and Associate Director, UP Drug Discovery Institute, United States

Presentation Overview:

There is an increasing need for computational biologists in both academia and industry. In 2005, we launched a joint doctoral program between Carnegie Mellon University and the University of Pittsburgh to train computational biologists, and the program has evolved continually since then, in tandem with rapid developments in the field and the growing demand for a trained workforce. I will present the challenges we faced in our interdisciplinary, inter-institutional program and the solutions we came up with. Because the program admits students from a wide variety of backgrounds and aims to provide training in a rapidly evolving field, it has been critically important to monitor student progress and success as a function of background and to offer customized training, when needed, in line with career opportunities. One major lesson learned is the importance of becoming familiar with the field early, before graduate studies; another is to gauge the career landscape and proactively adapt the training program to current and future needs.

4:20 PM-4:40 PM
Integrating Biomedical Informatics and Data Science to Prepare the Precision Medicine Workforce
Format: Live-stream

  • Philip Payne, Robert J. Terry Professor and Director, Washington University Institute for Informatics; Professor of Medicine, Washington University School of Medicine; Professor of Computer Science and Engineering, Washington University McKelvey School of Engineering, United States

Presentation Overview:

As biomedicine, and in particular translational research, has entered the era of big data and artificial intelligence, the Washington University School of Medicine has developed a precision medicine roadmap intended to respond to such trends. At the core of this roadmap is the promise of precise, data-driven approaches that enable the delivery of the right treatment to the right patient at the right time, all with the objective of saving and improving lives. As part of our roadmap, we believe that the biomedical research teams of the future will increasingly need to use multidisciplinary approaches, notably incorporating biomedical informatics and data science theories and methods. Based on this belief, we have launched a comprehensive portfolio of practical, in-career, and scientific training programs, all of which transcend traditional disciplinary boundaries and ultimately seek to create a workforce capable of delivering on the promise of precision medicine. In this presentation, we will review the structure, curriculum, and lessons learned from establishing these education and workforce development programs. In particular, we will focus on the critical need to combine biomedical informatics and data science methodologies with driving biological and clinical problems, so as to engage and support learners in a highly contextualized, experiential learning environment.

4:45 PM-6:00 PM
Panel Discussion
Format: Live-stream

  • Session Speakers