ISCB-Asia/SCCG 2012, session on Workflows and the Cloud for Reproducible Computing


Marco Roos
Workflow4Ever/Leiden University Medical Center, The Netherlands

Newton's ideas and methods are preserved forever: how about yours?

Abstract

For the modern scientist, research is becoming increasingly digital. This creates new needs and possibilities for preserving and publishing digital material and for maintaining high standards of reproducibility. To ensure that experiments can be independently verified and reused, we need digital access to research data and computational procedures, to the insights that were obtained, and to information that helps us understand the experiment, such as the scientific context. Beyond infrastructure and maintenance, the challenge lies in capturing, preserving, curating, and verifying computer-supported research. We present two complementary efforts. The EU project Advanced Workflow Preservation Technologies for Enhanced Science (Wf4Ever) aims to develop technology to preserve the reproducibility of scientific workflows. A key concept is the Research Object (RO) model, which can be used to structure, annotate, and preserve digital materials and computational protocols (workflows) throughout the research cycle. The second effort is fostered by the international 'Concept Web Alliance' and revolves around the nanopublication concept. Nanopublications bind scientific assertions to evidence and attribution, thereby supporting researchers who wish to share data that carries valuable scientific meaning. ROs focus on the digital experiment as an executable protocol, while nanopublications focus on scientific insight, data, and attribution. The two are interoperable because both adopt Semantic Web principles in their core design. The RO concept is currently driving further development of the user platform myExperiment.org, where researchers can pack workflows with critical resources, such as (references to) data, previous runs, and annotations, in order to help others understand, re-enact, and, if need be, repair previously executed designs. Nanopublications are being explored for genome annotations, for protein interactions, and as an aid to drug discovery in the EU Open PHACTS project.

Biography

Marco Roos is coordinator and researcher of the BioSemantics group, which is part of the Human Genetics Department of the Leiden University Medical Center (LUMC) in the Netherlands. He is also a guest of the Informatics Institute of the University of Amsterdam.

His interest is in applying computer science techniques to unravel molecular mechanisms in the cell. He contributes to the research and development of biomedical knowledge discovery methods, and to the adoption of an e-Science approach based on computer-facilitated multidisciplinary collaboration, for instance by using workflows and Semantic Web / Linked Data.

Marco studied molecular biology and combined this with various classes in computer science. He did his PhD in the field of Molecular Cytology, where he worked on the structure and function of chromosomes during the cell cycle. After his PhD, he contributed to the Human Transcriptome Map project at the Academic Medical Center in Amsterdam, and subsequently became active in the field of e-Science via the Virtual Laboratory for e-Science project at the University of Amsterdam. He contributed to the development of an e-Science approach based on a combination of workflows, the Semantic Web, and sharing via Web 2.0 tools such as myExperiment.org. His role became that of liaison between (molecular) biology and computer science.

The BioSemantics group in Leiden, headed by Prof. Barend Mons, and its sister group at the Erasmus Medical Center in Rotterdam collaborate closely with the Netherlands Bioinformatics Centre (NBIC), for which Marco is co-chair of the Interoperability Task Force. The BioSemantics group has produced tools and Web Services for biomedical knowledge discovery, such as Anni, Nermal, and Jane. Its research led to the development of the ConceptWiki and of the nanopublication concept. The group also participates in developing standards and tools for preserving high-quality research workflows. See http://biosemantics.org for more information.