Bio-Ontologies COSI

Schedule subject to change
All times listed are in CDT
Wednesday, July 13th
10:30-10:50
Keynote Presentation: Bio-Ontologies COSI Opening Introductions
Room: MQRN
Format: Live from venue

Moderator(s): Tiffany Callahan

  • Tiffany Callahan
10:50-11:50
Keynote Presentation: Bio-Ontologies COSI Keynote: Encoding biases' influences on development and use of ontologies in the life sciences
Room: MQRN
Format: Live from venue

Moderator(s): Tiffany Callahan

  • Maria Keet


Presentation Overview:

Ontology authoring, sometimes referred to as the ‘implementation’ stage of representing knowledge, may seem like a just-do-it task, but even when experts agree on what to represent, there are myriad ways to represent it. When adhered to consistently, these choices lead to transformable modelling styles. However, different representation choices may clash with other ontologies one wishes to reuse, and with some of the purposes for which an ontology was built. Awareness of such differences may facilitate smoother deployment of ontologies in applications with different requirements. These differences also pose challenges for methods and tools, such as competency questions and verification against them, test-driven development, and various bottom-up ontology development approaches, such as knowledge extraction from biological diagrams. In this talk, we take a tour through new insights into the factors that slow down or speed up the development of bio-ontologies and their use in tools for the life sciences.

11:50-12:10
Proceedings Presentation: Exploring Automatic Inconsistency Detection for Literature-based Gene Ontology Annotation
Room: MQRN
Format: Live from venue

Moderator(s): Tiffany Callahan

  • Jiyu Chen, The University of Melbourne, Australia
  • Benjamin Goudey, The University of Melbourne, Australia
  • Justin Zobel, The University of Melbourne, Australia
  • Nicholas Geard, The University of Melbourne, Australia
  • Karin Verspoor, RMIT University, Australia


Presentation Overview:

Motivation: Literature-based Gene Ontology Annotations (GOA) are biological database records that use a controlled vocabulary to uniformly represent gene function information described in the primary literature. Assuring the quality of GOA is crucial for supporting biological research. However, a range of inconsistencies can be identified between the literature cited as evidence and the annotated GO terms; these have not been systematically studied at the record level. The existing manual-curation approach to GOA consistency assurance is inefficient and cannot keep pace with the rate of updates to gene function knowledge. Automatic tools are therefore needed to assist with GOA consistency assurance. This paper presents an exploration of different GOA inconsistencies and an early feasibility study of automatic inconsistency detection.
Results: We have created a reliable synthetic dataset to simulate four realistic types of GOA inconsistency in biological databases. Three automatic approaches are proposed. They provide reasonable performance on the task of distinguishing the four types of inconsistency and are directly applicable to detect inconsistencies in real-world GOA database records. Major challenges resulting from such inconsistencies in the context of several specific application settings are reported.
Conclusion: This is the first study to introduce automatic approaches that are designed to address the challenges in current GOA quality assurance workflows.

12:10-12:30
Evaluation of Named Entity Recognition Systems to Improve Ontology Concept Annotation for Biomedical Knowledge Graphs
Room: MQRN
Format: Live from venue

Moderator(s): Tiffany Callahan

  • Sanya B. Taneja, University of Pittsburgh, United States
  • Marcin Joachimiak, Lawrence Berkeley National Laboratory, United States
  • Harshad Hegde, Lawrence Berkeley National Laboratory, United States
  • William Baumgartner Jr., University of Colorado Anschutz Medical Campus, United States
  • J. Harry Caufield, Lawrence Berkeley National Laboratory, United States
  • Tiffany Callahan, Columbia University, United States
  • Christopher Mungall, Lawrence Berkeley National Laboratory, United States
  • Richard D. Boyce, University of Pittsburgh, United States


Presentation Overview:

Named Entity Recognition (NER) systems are commonly used in the construction of large biomedical knowledge graphs (KGs) from free text or non-standardized data. Their main role is to map biomedical entities to standardized identifiers in ontologies and databases. While NER is only one step in KG construction, NER systems can greatly accelerate it. However, errors introduced by NER systems can systematically affect downstream applications of the KG. In this study, we used two NER systems, BioPortal Annotator and the OntoRunNER OGER++ wrapper, to map biomedical entities from a KG to 13 biomedical ontologies, and subsequently evaluated the mappings. Our preliminary results show that the OntoRunNER wrapper produced an average of 4.26 candidate matches per entity and mapped 76% of the entities correctly, while BioPortal Annotator correctly mapped 60% of the manually reviewed entities. Both systems produced errors, such that using the mappings in a KG without curation could lead to inaccurate inferences. We are currently evaluating the effects of the NER systems on downstream KG applications using graph analysis, embedding similarity, and data source cross-validation.
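The two summary figures quoted above (average candidate matches and percentage of entities mapped correctly) can be illustrated with a minimal sketch. The record layout (`ReviewedEntity`, with a list of candidate identifiers and a reviewer verdict) and the toy data are hypothetical and are not the authors' evaluation code or results; the sketch only shows how such averages are typically computed from manually reviewed mappings.

```python
from dataclasses import dataclass


@dataclass
class ReviewedEntity:
    """One KG entity after NER mapping and manual review (hypothetical schema)."""
    text: str               # surface form of the entity in the KG
    candidates: list[str]   # ontology identifiers proposed by the NER system
    correct: bool           # reviewer judged the mapping correct


def summarize(reviewed: list[ReviewedEntity]) -> tuple[float, float]:
    """Return (average candidate matches per entity, fraction mapped correctly)."""
    if not reviewed:
        raise ValueError("no reviewed entities")
    avg_candidates = sum(len(r.candidates) for r in reviewed) / len(reviewed)
    accuracy = sum(r.correct for r in reviewed) / len(reviewed)  # True counts as 1
    return avg_candidates, accuracy


# Toy data with placeholder identifiers (not real ontology terms).
toy = [
    ReviewedEntity("entity A", ["ONT:1", "ONT:2", "ONT:3"], True),
    ReviewedEntity("entity B", ["ONT:4", "ONT:5", "ONT:6", "ONT:7", "ONT:8"], False),
    ReviewedEntity("entity C", ["ONT:9", "ONT:10", "ONT:11", "ONT:12"], True),
    ReviewedEntity("entity D", ["ONT:13", "ONT:14", "ONT:15", "ONT:16"], True),
]
avg, acc = summarize(toy)
print(avg, acc)  # 4.0 0.75
```

Note that the two numbers answer different questions: a high average candidate count signals ambiguity that a curator must resolve, even when the correct identifier is among the candidates.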

14:30-14:50
EJP RD meets OHDSI: enabling interoperability for rare disease research
Room: MQRN
Format: Live-stream

Moderator(s): Tiffany Callahan

  • Rowdy de Groot, University of Amsterdam, Netherlands
  • Nirupama Benis, University of Amsterdam, Netherlands
  • Pablo Alarcón Moreno, Universidad Politécnica de Madrid, Spain
  • Rajaram Kaliyaperumal, Leiden University Medical Centre, Netherlands
  • Marco Roos, Leiden University Medical Centre, Netherlands
  • Ronald Cornet, University of Amsterdam, Netherlands
  • Núria Queralt-Rosinach, Leiden University Medical Center, Netherlands


Presentation Overview:

Interoperability, one of the FAIR Data principles (Findable, Accessible, Interoperable, and Reusable), requires mapping data models, formats, and semantics. The European Joint Programme on Rare Diseases (EJP RD) CDE semantic data models enable the creation of highly expressive FAIR data for the interoperability of patient registries, facilitating rare disease research. The Observational Health Data Sciences and Informatics (OHDSI) OMOP Common Data Model (CDM) is used to harmonize the representation of healthcare data and to support reproducible, open-source analytics for clinical research. Here, we present our mapping work for schema integration and interoperability between the EJP RD CDE semantic model and the OMOP CDM. Enhancing the interoperability between these two schemas can enrich rare disease research with observational health data and extend the EJP RD Virtual Platform research ecosystem with OMOP analytics.

14:50-15:10
Development of a general purpose cognitive-behavioral symptom taxonomy
Room: MQRN
Format: Live-stream

Moderator(s): Tiffany Callahan

  • Liwei Wang, Mayo Clinic, United States
  • Sunyang Fu, Mayo Clinic, United States
  • Sunghwan Sohn, Mayo Clinic, United States
  • Sungrim Moon, Mayo Clinic, United States
  • Hua Xu, University of Texas Health Science Center, United States
  • Cui Tao, University of Texas Health Science Center, United States
  • Jennifer St. Sauver, Mayo Clinic, United States
  • Ronald Petersen, Mayo Clinic, United States
  • Hongfang Liu, Mayo Clinic, United States
  • Jungwei Fan, Mayo Clinic, United States


Presentation Overview:

Motivation: Cognitive-behavioral symptoms (CBSx) are the surface manifestations of diverse etiologies. Identifying and documenting CBSx is critical to biomedical research that aims to understand the association between the symptoms and their underlying causes. Given the lack of a semantic resource dedicated specifically to the CBSx symptom layer, we sought to develop a CBSx Taxonomy.
Methods: We defined CBSx as any observable abnormality or reduced capacity in a cognitive or behavioral function. We also collected concepts concerning the impact of CBSx on quality of life and daily activities. The taxonomy was developed iteratively by synthesizing knowledge from the literature, existing biomedical ontologies, and clinical instruments that assess CBSx in patients. A domain expert in aging and neurodegenerative diseases was consulted in curating the CBSx Taxonomy.
Results: The derived taxonomy contains 258 concepts that are grouped into four major branches: cognitive symptoms, psychomotor symptoms, behavioral symptoms, and impact on life.
Conclusion: By synthesizing multiple knowledge sources, we developed the CBSx Taxonomy to serve as a dedicated semantic layer for cognitive-behavioral symptoms. The taxonomy is shared publicly and is expected to benefit diverse applications including natural language processing, phenotyping, and semantic harmonization.

15:10-15:20
Identification of clusters containing future gene-to-phenotype relations across heterogeneous data sources
Room: MQRN
Format: Live from venue

Moderator(s): Tiffany Callahan

  • Michael Bradshaw, University of Colorado Boulder, United States
  • Connor Gibbs, Colorado State University, United States
  • Bailey Fosdick, Colorado State University, United States
  • Ryan Layer, University of Colorado Boulder, United States


Presentation Overview:

Due to gaps in knowledge, most rare disease patients never receive a diagnosis, but diagnostic reach can be extended by integrating bio-ontologies. We show that clusters identified across ontologies suggest undiscovered connections between genes and phenotypes. Biological networks have long been used to infer new connections, but the use of heterogeneous networks to infer gene-to-phenotype connections remains largely unexplored. For each year from 2015 to 2021, we create a heterogeneous network of genes and phenotypes, with edges derived from the STRING protein-protein interaction network and the Human Phenotype Ontology. Using a combination of classic and node attribute-aware network clustering algorithms, we identify small, heterogeneous clusters for each year. We show that these clusters contain significantly more gene-to-phenotype edges in the future year than 10,000 replicates from a robust null model (p < 0.0001). Using biologically meaningful cluster properties, we train an XGBoost model to estimate the extent to which a cluster is expected to gain more gene-to-phenotype pairs in the future year than at random, and to prioritize clusters that will be meaningful to those affected by a rare disease.

All data and methods are available in a Snakemake pipeline and Conda environment for the highest degree of reproducibility (https://github.com/MSBradshaw/BOCC).
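A p-value below 0.0001 from 10,000 null replicates is consistent with a one-sided empirical test: count the replicates at least as extreme as the observed value, r, and report (r + 1) / (n + 1). With n = 10,000 and r = 0, this gives 1/10,001 ≈ 0.0000999, just under the reported threshold. The sketch below assumes this standard +1-corrected estimator; the null model in it (independent Bernoulli draws per candidate pair) is a hypothetical stand-in and not the authors' "robust null model", whose construction is not described here.

```python
import random


def empirical_p_value(observed: int, null_counts: list[int]) -> float:
    """One-sided empirical p-value with the usual +1 correction.

    r counts null replicates at least as extreme as the observed value;
    the result (r + 1) / (n + 1) is never exactly zero.
    """
    r = sum(1 for c in null_counts if c >= observed)
    return (r + 1) / (len(null_counts) + 1)


# Stand-in null model (hypothetical, NOT the authors' model): for a cluster with
# 50 candidate gene-phenotype pairs, each pair independently becomes a new edge
# in the future year with probability 0.05.
rng = random.Random(42)
null = [sum(rng.random() < 0.05 for _ in range(50)) for _ in range(10_000)]

p = empirical_p_value(observed=20, null_counts=null)
print(p < 0.0001)  # True: no replicate reaches 20, so p = 1/10001
```

The +1 correction matters here: without it, a cluster that beats every replicate would get p = 0, which overstates the evidence that a finite number of replicates can provide.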

16:00-16:20
Improving the FAIRness of vascular anomaly research data using the International Society for the Study of Vascular Anomalies (ISSVA) Ontology
Room: MQRN
Format: Live-stream

Moderator(s): Nicole Vasilevsky

  • Philip van Damme, Amsterdam UMC, Netherlands
  • Martijn Kersloot, Amsterdam UMC, Netherlands
  • Bruna dos Santos Vieira, Radboud university medical center, Netherlands
  • Leo Schultze Kool, Radboud university medical center, Netherlands
  • Ronald Cornet, Amsterdam UMC, Netherlands


Presentation Overview:

To support the diagnosis, management, and further research of vascular anomalies, Mulliken and Glowacki created a comprehensive classification system for vascular anomalies. The International Society for the Study of Vascular Anomalies (ISSVA), a society of specialists from the various medical disciplines involved in treating patients with vascular anomalies, adopted this classification in 1996. The current version of the classification is available as a PDF file, which supports neither structured registration of these diagnoses using unique identifiers nor implementation in software systems. To make vascular anomaly research data more Findable, Accessible, Interoperable, and Reusable (FAIR), these diagnoses must be registered in a structured, machine-readable manner. The Vascular Anomalies European Reference Network (VASCERN) and its Registry of Rare Vascular Anomalies (VASCA) therefore adopted the ISSVA classification and created a machine-readable representation of it: the ISSVA ontology.

In this session, we will present the ISSVA ontology. We will also present our lessons learned from creating an ontology out of a classification and (semi-automatically) mapping the ontology to existing ontologies.

16:20-16:40
Creating a FAIR data model for personalized risk-based breast cancer research: Findings from the PRISMA study
Room: MQRN
Format: Live-stream

Moderator(s): Nicole Vasilevsky

  • Xiaofeng Liao, Radboud University Medical Center, Netherlands
  • Milou de Jong, Radboud University Medical Center, Netherlands
  • Philip van Damme, Amsterdam University Medical Center, Netherlands
  • Ronald Cornet, Amsterdam University Medical Center, Netherlands
  • Bruna dos Santos Vieira, Radboud University Medical Center, Netherlands
  • Jennifer Lutomski, Radboud University Medical Center, Netherlands
  • Mirjam Brullemans-Spansier, Radboud University Medical Center, Netherlands
  • Peter T Hoen, Radboud University Medical Center, Netherlands


Presentation Overview:

In the Netherlands, women aged 50-75 years are invited for breast cancer screening every two years. The PRISMA (Personalised RISk-based MAmmascreening) study was designed to investigate the added value of risk-based mammography screening. To the best of our knowledge, there is no universally accepted data model for collecting breast cancer risk factors and patient-reported outcome measures (PROMs). We therefore aimed to retrospectively create a domain ontology based on the PRISMA questionnaire. This contributes to global efforts to increase the secondary use of patient-reported outcomes through FAIRification, i.e., ensuring that data are Findable, Accessible, Interoperable, and Reusable.

Initially, 201 questions and 188 variables were identified from the questionnaire. After several inventory meetings with different stakeholders, the resulting 70 data elements were grouped into 17 main classes. The domain ontology is structured as an “is-a” hierarchy.
Given that most of the concepts derived from the questionnaire are measurements of some attributes of the patient, we use the Semantic Science Integrated Ontology (SIO), which is capable of modelling entities, processes, and their qualities/attributes, as guidance to derive our core model.

The data model developed could serve as a template for other breast cancer research groups and for other Patient-Reported Outcome and Real-World Experience questionnaires.

16:40-17:00
Ontology Management in an Industrial Environment: The BASF Governance Operational Model for Ontologies (GOMO)
Room: MQRN
Format: Live-stream

Moderator(s): Nicole Vasilevsky

  • Ana Iglesias-Molina, Universidad Politécnica de Madrid, Spain
  • José Antonio Bernabé-Díaz, BASF Group, Spain
  • Prashant Deshmukh, BASF Group, Germany
  • Paola Espinoza-Arias, BASF Group, Spain
  • Aaron Ayllón-Benítez, BASF Group, Spain
  • Alba Fernández-Izquierdo, BASF Group, Spain
  • José María Ponce-Bernabé, BASF Group, Spain
  • Sara Pérez, BASF Group, Spain
  • Edna Ruckhaus, Universidad Politécnica de Madrid, Spain
  • Oscar Corcho, Universidad Politécnica de Madrid, Spain
  • José Luis Sánchez-Fernández, BASF Group, Spain


Presentation Overview:

Governance of ontology development and maintenance practices within an organization has many advantages over the ungoverned, siloed approaches that many organizations exhibit today. This paper presents the BASF Governance Operational Model for Ontologies (GOMO), which addresses all stages of the ontology lifecycle and provides a framework for developing and maintaining ontologies within BASF. GOMO comprises Principles, Standards and Quality Assurance criteria, Best Practices, and Training and Outreach, and is the result of a collaborative effort between industry and academia in the Semantic Web field. GOMO Principles, Standards, and Best Practices are being applied to all running ontology-based projects in BASF. Through outreach presentations to sections of the BASF community, GOMO has reached a wider audience, fostering understanding of the utility and implementation of ontologies. Finally, GOMO is a framework fit for adoption by other organizations facing similar challenges in ontology governance.

17:00-17:20
From genome annotation to knowledge graph: The case of Pseudomonas fluorescens SBW25
Room: MQRN
Format: Live-stream

Moderator(s): Nicole Vasilevsky

  • Carsten Fortmann-Grote, Max Planck Institute for Evolutionary Biology, Germany
  • Paul Rainey, Max Planck Institute for Evolutionary Biology, Germany


Presentation Overview:

We have recently published an updated genome assembly and annotation of our model organism Pseudomonas fluorescens SBW25. We now face the challenge of keeping the annotation up to date with novel results from experimental and computational studies of gene function, fitness assays, and regulatory and metabolic networks. We will present various open-source software tools and open data and metadata standards, combined into a public knowledge base for our model organism. Its central part is our genome database and genome browser based on Tripal, which allows internal and external colleagues to contribute their data and results in a curated fashion. To further integrate our data, we are working on a Linked Data architecture that connects our genome database to various public *omics databanks as well as to internal data sources, thereby creating an organism-specific knowledge graph. By exposing a public SPARQL endpoint, our data ultimately becomes part of the worldwide Semantic Web, which incorporates other domain-specific knowledge graphs as well as generic data sources such as Wikipedia (via Wikidata). In this way, our system facilitates the growth of the Pseudomonas fluorescens SBW25 knowledge graph through both manual exploration and automated procedures.

17:20-17:40
RTO, A Specific Crop Ontology for Rice Trait Concepts
Room: MQRN
Format: Live-stream

Moderator(s): Nicole Vasilevsky

  • Xinzhi Yao, Huazhong Agricultural University, China
  • Yun Liu, Huazhong Agricultural University, China
  • Qidong Deng, Huazhong Agricultural University, China
  • Yusha Liu, Huazhong Agricultural University, China
  • Xinchen Ma, Huazhong Agricultural University, China
  • Yufei Shen, Huazhong Agricultural University, China
  • Qianqian Peng, Huazhong Agricultural University, China
  • Zaiwen Feng, Huazhong Agricultural University, China
  • Jingbo Xia, Huazhong Agricultural University, China


Presentation Overview:

As the most important crop in Asian countries, rice and its breeding have long been a major research concern. Unfortunately, a standardized rice trait ontology has been lacking, making it challenging to normalize descriptions of lab results. In this work, a rice trait ontology (RTO) was manually curated by aligning three existing terminology sets of rice traits. RTO includes 2,522 rice trait concepts, with corresponding descriptors defining the relations among concepts. RTO is intended to standardize the commonly used trait concepts in rice breeding research and to enable automated mining of rice trait knowledge. To simplify concept queries, a user-friendly web service is available at http://lit-evi.hzau.edu.cn/RiceTraitOntology.

17:40-18:00
EDAM - The data analysis and management ontology (update 2022)
Room: MQRN
Format: Live from venue

Moderator(s): Sanya Taneja

  • Lucie Lamothe, Institut Français de Bioinformatique, France
  • Mads Kierkegaard, University of Southern Denmark, Odense, Denmark
  • Melissa Black, Outreachy intern (EDAM), São Paulo (at the time of contribution), Brazil
  • Hager Eldakroury, Outreachy intern (EDAM), Cairo (at the time of contribution), Egypt
  • Ankita Priya, Birla Institute of Technology, Mesra, India
  • Anne Machinda, independent contributor, Bamenda, Cameroon
  • Uttam Singh Khanduja, Medicaps University, Indore, India
  • Drashti Patoliya, independent contributor, Surat, India
  • Rashika Rathi, Indian Institute of Technology, Mandi, India
  • Tawah Peggy Che Nico, University of Buea, Cameroon
  • Gloria Umutesi, independent contributor, Kigali, Rwanda
  • Claudia Blankenburg, Leibniz Institute of Plant Biochemistry, Halle, Germany
  • Vedran Kasalica, Utrecht University, Netherlands
  • Anita Op, independent contributor, Nigeria
  • Precious Chieke, independent contributor, Nigeria
  • Zm, independent contributor, Nigeria
  • Ellschi, independent contributor, Germany
  • Gerlex, independent contributor, Bouvet Island
  • Steve Laurie, Centre Nacional d'Anàlisi Genòmica, Barcelona, Spain
  • Steffen Neumann, Leibniz Institute of Plant Biochemistry, Halle, Germany
  • Veit Schwämmle, University of Southern Denmark, Odense, Denmark
  • Ivan Kuzmin, University of Tartu, Estonia
  • Jon Ison, Institut Français de Bioinformatique (at the time of contribution), France
  • Chris Hunter, GigaScience GigaDB, Hong Kong, China
  • Jonathan Karr, Icahn School of Medicine at Mount Sinai, New York City, United States
  • Anne Fouilloux, University of Oslo, Norway
  • Alban Gaignard, University of Nantes, France
  • Bryan Brancotte, Institut Pasteur, Paris, France
  • Hervé Ménager, Institut Pasteur, Paris, France
  • Matúš Kalaš, University of Bergen, Norway


Presentation Overview:

EDAM is a domain ontology of data analysis and data management in bio- and other sciences. It comprises concepts related to analysis, modeling, optimization, and the data life cycle, organized into four main sections: topics, operations, data, and formats.
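Each of the four sections shows up directly in EDAM concept identifiers, which take the form `http://edamontology.org/<section>_<number>` (for example `operation_0004` or `format_1915`). The minimal sketch below classifies a concept URI by that prefix; the function name `edam_section` is hypothetical and not part of any EDAM tooling, and it checks only the identifier's shape, not whether the concept actually exists in a given EDAM release.

```python
from urllib.parse import urlparse

# The four main EDAM sections, as they appear in concept identifiers.
EDAM_SECTIONS = ("topic", "operation", "data", "format")


def edam_section(uri: str) -> str:
    """Return which of EDAM's four sections a concept URI belongs to."""
    parsed = urlparse(uri)
    if parsed.netloc != "edamontology.org":
        raise ValueError(f"not an EDAM URI: {uri}")
    local = parsed.path.lstrip("/")           # e.g. "format_1915"
    prefix, _, number = local.partition("_")  # -> ("format", "_", "1915")
    if prefix not in EDAM_SECTIONS or not number.isdigit():
        raise ValueError(f"unrecognized EDAM concept: {uri}")
    return prefix


print(edam_section("http://edamontology.org/format_1915"))  # format
```

A check like this is handy when consuming annotations from resources such as bio.tools, where a tool's inputs, outputs, and functions reference EDAM concepts from different sections.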

EDAM is used in numerous resources, for example bio.tools, Galaxy, CWL, Debian, BioSimulators, FAIRsharing, and the ELIXIR training portal TeSS. Thanks to annotation with EDAM, tools, workflows, standards, data, and learning materials are easier to find, compare, choose, and integrate.

EDAM is developed by a diverse community of contributors. A substantial extension is EDAM Bioimaging, focused on image analysis and machine learning.

The main improvements and ongoing work in 2022 include:

- In addition to using standard tools such as HermiT and ROBOT, we are developing additional validation tools at both the syntactic and semantic levels: https://github.com/edamontology/edam-validation

- Enabling interdisciplinary applications with EDAM Geo (https://github.com/edamontology/edam-geo), an extension of EDAM towards geolocated data (e.g., in ecology, public health, …). EDAM Geo is developed in WebProtégé: https://webprotege.stanford.edu/#projects/69591619-4eda-4f03-9e7f-65b213038fe1/edit/Classes

- Improving the implementation of links to external resources (incl. other ontologies), definitions, and the overall quality

- Addition of numerous data formats, especially for models, simulations, and chemistry

Thursday, July 14th
10:15-10:45
Proceedings Presentation: DeepGOZero: Improving protein function prediction from sequence and zero-shot learning based on ontology axioms
Room: MQRN
Format: Live-stream

Moderator(s): Sanya Taneja

  • Maxat Kulmanov, King Abdullah University of Science and Technology, Saudi Arabia
  • Robert Hoehndorf, King Abdullah University of Science and Technology, Saudi Arabia


Presentation Overview:

Motivation: Protein functions are often described using the Gene Ontology (GO), an ontology consisting of over 50,000 classes and a large set of formal axioms. Predicting the functions of proteins is one of the key challenges in computational biology, and a variety of machine learning methods have been developed for this purpose. However, these methods usually require a significant amount of training data and cannot make predictions for GO classes that have few or no experimental annotations.
Results: We developed DeepGOZero, a machine learning model that improves predictions for functions with no or only a small number of annotations. To achieve this goal, we rely on a model-theoretic approach for learning ontology embeddings and combine it with neural networks for protein function prediction. DeepGOZero can exploit formal axioms in the GO to make zero-shot predictions, i.e., predict protein functions even if not a single protein in the training phase was associated with that function. Furthermore, the zero-shot prediction method employed by DeepGOZero is generic and can be applied whenever associations with ontology classes need to be predicted.

10:45-10:55
Assessing ontology fitness for use with the Harmonized Data Quality Framework
Room: MQRN
Format: Live from venue

Moderator(s): Sanya Taneja

  • Tiffany Callahan, Columbia University, United States
  • William Baumgartner, University of Colorado Anschutz Medical Campus, United States
  • Nicolas Matentzoglu, Semanticly Ltd, United Kingdom
  • Nicole Vasilevsky, University of Colorado Anschutz Medical Campus, United States
  • Lawrence Hunter, University of Colorado Anschutz Medical Campus, United States
  • Michael Kahn, University of Colorado Anschutz Medical Campus, United States


Presentation Overview:

Ontologies play an important role in the representation, standardization, and integration of biomedical data, but are known to have data quality (DQ) issues. We aimed to understand whether the Harmonized Data Quality Framework (HDQF), developed to standardize electronic health record DQ assessment strategies, could be used to improve ontology quality assessment. A novel set of 14 ontology checks was developed. These DQ checks were aligned to the HDQF and examined by HDQF developers. The ontology checks were evaluated using 11 Open Biological and Biomedical Ontology (OBO) Foundry ontologies. Of the ontology checks, 85.7% were successfully aligned to at least one HDQF category. Accommodating the unmapped DQ checks (n=2) required modifying an original HDQF category and adding a new Data Dependency category. While all of the ontology checks were then mapped to an HDQF category, not all HDQF categories were represented by an ontology check, presenting opportunities to strategically develop new ontology checks. The HDQF is a valuable resource, and this work demonstrates its ability to categorize ontology quality assessment strategies.
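The 85.7% figure follows directly from the counts given above: of the 14 ontology checks, 2 were initially unmapped, so 12 aligned to at least one HDQF category.

```latex
\frac{14 - 2}{14} = \frac{12}{14} \approx 0.857 = 85.7\%
```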

10:55-11:05
Systematic Integration of Large-scale Clinical and Phenotype Datasets
Room: MQRN
Format: Live-stream

Moderator(s): Sanya Taneja

  • Irene Kyomugisha, Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, South Africa
  • Gaston Mazandu, Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, South Africa
  • Jack Morrice, Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, South Africa
  • Rapheal Sangenda, Sickle Cell Programme, Muhimbili University of Health and Allied Sciences (MUHAS), Dar es Salaam, Tanzania
  • Annemie Stewart, Computational Biology Division, Faculty of Health Sciences, University of Cape Town, South Africa
  • Victoria Nembaware, Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, South Africa
  • Wilson Mupfururirwa, Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, South Africa
  • Julie Makani, Sickle Cell Programme, Muhimbili University of Health and Allied Sciences (MUHAS), Dar es Salaam, Tanzania
  • Ambroise Wonkam, Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, South Africa


Presentation Overview:

Advances in data collection techniques have yielded large-scale heterogeneous clinical and phenotype datasets from different geographical locations. However, harmonizing these datasets retrospectively for integrative analyses to potentially increase prediction power is still challenging. We present omsim, a model-based ontology mapping and text graph-based similarity information retrieval technique, for automated generation of harmonized datasets from disparate research patient registries. We tested omsim on multi-national sickle cell patient research datasets in sub-Saharan Africa.

11:05-11:25
Predicting the pro-longevity or anti-longevity effect of model organism genes with sub-graph embeddings of Gene Ontology
Room: MQRN
Format: Live-stream

Moderator(s): Sanya Taneja

  • Patrick Greaves, Birkbeck, University of London, United Kingdom
  • Carl Barton, Birkbeck, University of London, United Kingdom
  • Cen Wan, Birkbeck, University of London, United Kingdom


Presentation Overview:

The recent success of node embedding methods has greatly boosted the application of graph data to bioinformatics problems. In this work, we propose a new method for learning a type of Gene Ontology sub-graph embedding to classify model organisms' genes as pro-longevity or anti-longevity genes. The experimental results show that these Gene Ontology sub-graph embeddings achieve higher predictive accuracy than conventional binary Gene Ontology annotation-based features.
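For context, the "conventional binary Gene Ontology annotation-based features" used as the baseline are commonly built as a gene-by-term 0/1 matrix, often after propagating each annotation up the is_a hierarchy (the true-path rule). The minimal sketch below shows that common construction on a toy ontology fragment; whether the authors' baseline propagates annotations is an assumption, and the term and gene names are hypothetical placeholders rather than real GO identifiers.

```python
def propagate(terms: set[str], parents: dict[str, set[str]]) -> set[str]:
    """Close a set of terms under the is_a parent relation (true-path rule)."""
    closed: set[str] = set()
    stack = list(terms)
    while stack:
        t = stack.pop()
        if t not in closed:
            closed.add(t)
            stack.extend(parents.get(t, ()))
    return closed


def binary_features(annotations: dict[str, set[str]],
                    parents: dict[str, set[str]]):
    """Return (genes, term vocabulary, 0/1 matrix with one row per gene)."""
    genes = sorted(annotations)
    propagated = {g: propagate(annotations[g], parents) for g in genes}
    vocab = sorted(set().union(*propagated.values()))
    matrix = [[int(t in propagated[g]) for t in vocab] for g in genes]
    return genes, vocab, matrix


# Toy ontology fragment and annotations (hypothetical placeholders).
parents = {"t_child1": {"t_parent"}, "t_child2": {"t_parent"}, "t_parent": {"t_root"}}
annotations = {"gene1": {"t_child1"}, "gene2": {"t_child2"}}
genes, vocab, matrix = binary_features(annotations, parents)
print(vocab)   # ['t_child1', 't_child2', 't_parent', 't_root']
print(matrix)  # [[1, 0, 1, 1], [0, 1, 1, 1]]
```

Such vectors record which terms a gene is annotated to but ignore how the terms relate beyond shared ancestors, which is the kind of structural information a graph embedding can capture.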

11:25-11:45
OBO Academy: Training materials for bio-ontologists
Room: MQRN
Format: Live from venue

Moderator(s): Sanya Taneja

  • Nicole Vasilevsky, University of Colorado Anschutz Medical Campus, United States
  • James Overton, Knocean, Inc, Canada
  • Rebecca Jackson, Bend Informatics LLC, United States
  • Sabrina Toro, University of Colorado Anschutz Medical Campus, United States
  • Shawn Tan, European Bioinformatics Institute, United Kingdom
  • Bradley Varner, European Bioinformatics Institute, United Kingdom
  • David Osumi-Sutherland, University of Cambridge, United Kingdom
  • Nicolas Matentzoglu, EMBL-EBI, United Kingdom


Presentation Overview:

Biomedical ontologies are widely available, with hundreds of ontologies under development. However, there is a lack of formal training on methods for ontology development, including best practices for creating and editing ontologies and the application of new tools and workflows. This makes it challenging for new and current ontologists to find and access training materials, learn the methodology, or hone existing skills. The OBO Academy provides open, online, self-paced training materials that aim to provide ongoing training for the ontology community on best practices in ontology development. The training materials cover a range of topics, from basics such as getting started contributing to ontologies and editing in Protégé, to more advanced materials covering technical workflows such as using the Ontology Development Kit and ROBOT templates. The initial offering of materials is available online and is under continuous development, and community feedback and contributions are welcome (https://github.com/OBOAcademy/obook).

11:45-12:05
Creation and unification of development and life stage ontologies for animals
Room: MQRN
Format: Live from venue

Moderator(s): Nicole Vasilevsky

  • Anne Niknejad, University of Lausanne, Switzerland
  • Christopher J. Mungall, Lawrence Berkeley National Laboratory, Berkeley, United States
  • David Osumi-Sutherland, EMBL-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
  • Marc Robinson-Rechavi, University of Lausanne, Switzerland
  • Frederic B. Bastian, University of Lausanne, SIB Swiss Institute of Bioinformatics, Switzerland


Presentation Overview:

With the new era of genomics, an increasing number of animal species are amenable to large-scale data generation. This has led to the emergence of new multi-species ontologies to annotate and organize these data. While anatomy and cell types are well covered by these efforts, information regarding development and life stages is also critical in the annotation of animal data. The lack of such information can hamper our ability to answer comparative biology questions and to interpret functional results. We present here a collection of development and life stage ontologies for 21 animal species, and their merge into a common multi-species ontology. This work has allowed the integration and comparison of transcriptomics data in 52 animal species.

12:05-12:15
Bio-Ontologies COSI Closing Remarks
Room: MQRN
Format: Live from venue

Moderator(s): Robert Hoehndorf

  • Robert Hoehndorf


Presentation Overview:

Closing session and COSI remarks

13:15-14:15
Keynote Presentation: The open data highway: turbo-boosting translational traffic with ontologies
Room: Lecture Hall
Format: Live from venue

Moderator(s): Nomi Harris

  • Melissa Haendel, University of Colorado Anschutz Medical Campus, USA


Presentation Overview:

Addressing complex scientific challenges requires a roadmap of data from diverse sources, organisms, contexts, formats, and granularities. Building a coherent, holistic view of the data landscape for any given problem is non-trivial. During aggregation, many of the original connections within the data are often lost, and it is difficult to make new (inferred) connections. Novel data integration strategies that leverage semantic technologies such as ontologies, knowledge graphs, and common modeling strategies can help span disciplinary boundaries. However, it takes people too: robust interdisciplinary collaboration and improved data licensing and access can advance progress and innovation, turbo-boosting the open data highway.

14:15-14:35
The OntoDev Suite of Ontology and Data Integration Tools
Room: Lecture Hall
Format: Live-stream

Moderator(s): Tiffany Callahan

  • Rebecca C. Jackson, Bend Informatics LLC, United States
  • James A. Overton, Knocean, Inc., Canada


Presentation Overview:

The OntoDev Suite (https://ontodev.com, https://github.com/ontodev) brings together modular open-source libraries and applications for ontology development and scientific data integration, with special emphasis on open science and the Open Biological and Biomedical Ontologies (OBO) community. The suite builds on the success of ROBOT, adding data cleaning, ontology-driven validation, development and curation workflows, and more. We strive to make small, focused tools that work well together and also integrate well with other best-in-class software, languages, and platforms. In this talk we present an overview of the suite, its design principles, and future plans.

14:35-14:55
RPhenoscape: Semantic computing with morphological traits and ontologies
Room: Lecture Hall
Format: Live from venue

Moderator(s): Tiffany Callahan

  • Hilmar Lapp, Duke University, United States
  • John Bradley, Duke University, United States
  • Hong Xu, Duke University, United States
  • Amandeep Rathee, Duke University, United States
  • James Balhoff, Renaissance Computing Institute, United States
  • Wasila Dahdul, University of California at Irvine, United States


Presentation Overview:

We present RPhenoscape, a package for the R programming language that provides convenient and robust access to the ontologies and ontology-linked morphological trait data (phenotypes) in the Phenoscape Knowledgebase (KB), as well as to several algorithms for computing with the semantics of traits based on formal logic reasoning. A major aim of the package is to enable the computational integration of trait semantics into evolutionary models for comparative trait analysis, which have traditionally treated traits simply as independent characters and character states. To this end, RPhenoscape provides access to the computational inference of presence/absence trait matrices, presence/absence-based inference of trait dependence, evidence-based mutual trait compatibility/exclusivity, and a variety of semantic similarity metrics for phenotypes. RPhenoscape is currently in the final stages of a new major release series, which adds some of the features presented here; once complete, it will be made available on the Comprehensive R Archive Network (CRAN).
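Many ontology-based semantic similarity metrics compare two terms by the ancestors (subsumers) they share in the class hierarchy. The sketch below shows one simple metric of this kind, the Jaccard index over subsumer sets, on a small hypothetical fin hierarchy. It is an illustration of the general idea only, written in Python rather than R; it is not RPhenoscape's API, and the terms are made-up labels, not Phenoscape KB identifiers.

```python
from typing import Dict, Set

# Hypothetical toy is_a hierarchy (child -> direct parents).
# These labels are illustrative only, not actual Phenoscape KB terms.
PARENTS: Dict[str, Set[str]] = {
    "anatomical structure": set(),
    "fin": {"anatomical structure"},
    "paired fin": {"fin"},
    "pectoral fin": {"paired fin"},
    "pelvic fin": {"paired fin"},
    "dorsal fin": {"fin"},
}


def subsumers(term: str) -> Set[str]:
    """Return the term plus all of its ancestors (reflexive transitive closure of is_a)."""
    result = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard index of the two terms' subsumer sets: |shared| / |combined|."""
    sa, sb = subsumers(a), subsumers(b)
    return len(sa & sb) / len(sa | sb)


print(jaccard_similarity("pectoral fin", "pelvic fin"))  # 0.6
print(jaccard_similarity("pectoral fin", "dorsal fin"))  # 0.4
```

The two paired fins share three of five subsumers, while the pectoral and dorsal fins share only two, so the metric ranks the paired fins as more similar. Information-content-weighted metrics refine this by giving rarer shared ancestors more weight.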

14:55-15:05
KG-OBO: Open Bio-Ontologies in Knowledge Graph Form
Room: Lecture Hall
Format: Live from venue

Moderator(s): Tiffany Callahan

  • Justin Reese, Berkeley Bioinformatics Open-source Projects, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
  • Chris Mungall, Berkeley Bioinformatics Open-source Projects, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
  • Harry Caufield, Berkeley Bioinformatics Open-source Projects, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States


Presentation Overview:

Knowledge graphs (KGs) are representations of entities and their multifaceted relationships. An ongoing challenge in learning from KGs in biology and biomedicine is bridging the gap between real-world observations and conceptual knowledge. Although numerous bio-ontologies address this need, none can be added directly to a KG without significant effort.

Past efforts in aligning instance data to ontologies led to the creation of the OBO Foundry, an open resource for standardized biological ontologies. We developed KG-OBO to allow the community to rapidly integrate OBO Foundry ontologies with biological KGs. KG-OBO translates OBO ontologies into easily parsed KGX TSV graphs aligned with the Biolink Model, then uploads all graphs to a public repository. Users may merge one or more ontology graphs as needed; e.g., combining CHEBI with a KG of protein-chemical interactions allows chemicals to be grouped hierarchically. The added context can also provide further training input for graph machine learning models.
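The merge-and-group use case above can be sketched in a few lines. This minimal example assumes KGX-style edge TSVs reduced to the `subject`, `predicate`, and `object` columns, and uses the Biolink predicates `biolink:subclass_of` and `biolink:interacts_with`; all identifiers are hypothetical placeholders, and the functions are illustrative, not part of KG-OBO or the KGX toolkit.

```python
import csv
import io
from collections import defaultdict

# Hypothetical KGX-style edge files (minimal column subset); IDs are placeholders.
ONTOLOGY_EDGES = """subject\tpredicate\tobject
CHEBI:X1\tbiolink:subclass_of\tCHEBI:GROUP1
CHEBI:X2\tbiolink:subclass_of\tCHEBI:GROUP1
CHEBI:X3\tbiolink:subclass_of\tCHEBI:GROUP2
"""

INTERACTION_EDGES = """subject\tpredicate\tobject
UniProtKB:P1\tbiolink:interacts_with\tCHEBI:X1
UniProtKB:P1\tbiolink:interacts_with\tCHEBI:X2
UniProtKB:P2\tbiolink:interacts_with\tCHEBI:X3
"""


def read_edges(tsv_text):
    """Parse a tab-separated edge table into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def merge_graphs(*edge_lists):
    """Union several edge lists, dropping exact duplicate (subject, predicate, object) rows."""
    seen = set()
    merged = []
    for edges in edge_lists:
        for e in edges:
            key = (e["subject"], e["predicate"], e["object"])
            if key not in seen:
                seen.add(key)
                merged.append(e)
    return merged


def group_by_parent(edges):
    """Map each protein to the direct parent classes of the chemicals it interacts with."""
    parents = defaultdict(set)
    for e in edges:
        if e["predicate"] == "biolink:subclass_of":
            parents[e["subject"]].add(e["object"])
    grouped = defaultdict(set)
    for e in edges:
        if e["predicate"] == "biolink:interacts_with":
            for parent in parents.get(e["object"], ()):
                grouped[e["subject"]].add(parent)
    return grouped


merged = merge_graphs(read_edges(ONTOLOGY_EDGES), read_edges(INTERACTION_EDGES))
grouped = group_by_parent(merged)
print(dict(grouped))  # {'UniProtKB:P1': {'CHEBI:GROUP1'}, 'UniProtKB:P2': {'CHEBI:GROUP2'}}
```

Only direct parents are used here for brevity; a fuller version would follow `subclass_of` edges transitively to group chemicals at any level of the hierarchy.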

The KG-OBO code, graphs, and infrastructure serve a community of knowledge engineers seeking answers to biomedical questions in KGs, including the broader OBO community. We anticipate that continued interest in learning from KGs will require easy access to the comprehensive knowledge within bio-ontologies, and KG-OBO fills this need.

15:05-15:15
Federating and querying heterogeneous and distributed Web APIs and triple stores
Room: Lecture Hall
Format: Live from venue

Moderator(s): Tiffany Callahan

  • Tarcisio Mendes de Farias, SIB Swiss Institute of Bioinformatics, Switzerland
  • Christophe Dessimoz, University of Lausanne, Switzerland
  • Aaron Ayllon-Benitez, BASF Digital Solutions SL, Spain
  • Chen Yang, BASF, Belgium
  • Jiao Long, Ghent University, Belgium
  • Ana-Claudia Sima, SIB Swiss Institute of Bioinformatics, Switzerland


Presentation Overview:

Today’s international corporations such as BASF, a leading company in the crop protection industry, produce and consume ever more data, which are often fragmented and accessible through Web APIs. In addition, some of the proprietary and public data of interest to BASF are stored in triple stores and accessible with the SPARQL query language. Homogenizing the data access modes and the underlying semantics of the data, without modifying or replicating the original data sources, therefore becomes an important requirement for achieving data integration and interoperability. In this work, we propose a federated data integration architecture within an industrial setup that relies on an ontology-based data access method. Our performance evaluation of query response time showed that most queries can be answered in under 1 second.