Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

Bio-Ontologies COSI

Presentations

Schedule subject to change
Wednesday, July 15th
10:40 AM-11:40 AM
Bio-Ontologies Keynote: The crisis of content
Format: Live-stream

  • Michael Gruninger

Presentation Overview:

Although a plethora of ontologies have been developed in a wide variety of domains, there is often a sense in which it is difficult to measure progress in the field of applied ontology. In some domains there is a mindset that treats ontologies as being as arbitrary as software code, so there is no point in evaluating them, and there cannot possibly be any consensus on which ontologies to use. In other domains, there is an abundance of ontologies but no understanding of their relationships, leading to a perception of continually reinventing the wheel. Far too often, the only criteria for selecting ontologies are political, not technical. If we proceed further down this road, we ultimately risk irrelevance. Against this viewpoint, I would offer an approach to ontology design that focuses on formalizing the intended semantics of an ontology, so that sharability and reusability are guaranteed.

11:20 AM-11:30 AM
CORAL: A platform for FAIR, rigorous, self-validated data modeling and integrative, reproducible data analysis
Format: Pre-recorded with live Q&A

  • John-Marc Chandonia, Berkeley National Lab, United States
  • Pavel S Novichkov, Berkeley National Lab, United States
  • Adam P Arkin, Berkeley National Lab, United States

Presentation Overview:

Many organizations face challenges in managing and analyzing data, especially when such data is obtained from multiple sources, created using diverse methods or protocols. Analyzing heterogeneous, structured datasets requires rigorous tracking of their interrelationships and provenance. This task has long been a Grand Challenge of data science, and has more recently been formalized in the FAIR principles: that all data be Findable, Accessible, Interoperable and Reusable, both for machines and for people. Adherence to these principles is necessary for proper stewardship of information, for testing regulatory compliance, for measuring efficiency, and for effectively being able to reuse data analytical frameworks. Interoperability and reusability are especially challenging to implement in practice, to the extent that scientists acknowledge a “reproducibility crisis” across many fields of study. We developed CORAL, a framework for organizing the large diversity of datasets that are generated and used by moderately complex organizations. CORAL features a web interface for bench scientists to upload and explore data, as well as a Jupyter notebook interface for data analysts, both backed by a common API. We describe the CORAL data model and associated tools, and discuss how they greatly facilitate adherence to all four of the FAIR principles.

12:00 PM-12:20 PM
Ontology-based collection and analysis of natural and lab animal hosts of human coronaviruses
Format: Pre-recorded with live Q&A

  • Yongqun He, University of Michigan, United States
  • Yang Wang, Guizhou University Medical College, China
  • Fengwei Zhang, People’s Hospital of Guizhou University, China
  • Hong Yu, People’s Hospital of Guizhou University, China
  • Xianwei Ye, Guizhou University Medical College, China

Presentation Overview:

SARS-CoV-2 is the pathogen that causes COVID-19. It is commonly agreed that SARS-CoV-2 originated from some animal host. However, the exact origin of SARS-CoV-2 remains unclear. The origins of other human coronaviruses, including SARS-CoV and MERS-CoV, are also unclear. This study focuses on the collection, ontological modeling and representation, and analysis of the hosts of various human coronaviruses, with a focus on SARS-CoV-2. Over 20 natural and laboratory animal species were found to be able to host human coronaviruses. All the viruses and hosts were classified using the NCBITaxon ontology. The related terms were also imported to the Coronavirus Infectious Disease Ontology (CIDO), and the relations between human coronaviruses and their hosts were linked using an axiom in CIDO. Our ontological classification of all the hosts also allowed us to hypothesize that human coronaviruses only use mammals as their hosts.

12:20 PM-12:40 PM
CIDO Diagnosis: COVID-19 diagnosis modeling, representation and analysis using the Coronavirus Infectious Disease Ontology
Format: Pre-recorded with live Q&A

  • Yongqun He, University of Michigan, United States
  • Hong Yu, People’s Hospital of Guizhou University, China
  • Asiyah Lin, National Center for Ontological Research, United States

Presentation Overview:

Diagnosis of COVID-19 is critical to the control of the COVID-19 pandemic. Common diagnostic methods include symptom identification, chest imaging, serological testing, and RT-PCR. However, the sensitivity and specificity of different diagnostic methods differ. In this study, we ontologically represent different aspects of COVID-19 diagnosis using the community-based Coronavirus Infectious Disease Ontology (CIDO), an OBO Foundry library ontology. CIDO includes many new terms and also imports many relevant terms from existing ontologies. The high-level hierarchy and design pattern of CIDO are introduced to support COVID-19 diagnosis. Knowledge reported in the literature and in reliable resources such as the FDA website is ontologically represented. We modeled and compared over 20 SARS-CoV-2 RT-PCR assays, which target different gene markers in SARS-CoV-2. The sensitivity and specificity of different methods are discussed.

2:00 PM-2:20 PM
Modeling quantitative traits for COVID-19 case reports
Format: Pre-recorded with live Q&A

  • Robert Hoehndorf, King Abdullah University of Science and Technology, Saudi Arabia
  • Núria Queralt-Rosinach, Leiden University Medical Center, Netherlands
  • Susan Bello, Jackson Laboratory, United States
  • Claus Weiland, Senckenberg Biodiversity and Climate Research Center, Germany
  • Philippe Rocca-Serra, University of Oxford, United Kingdom
  • Paul Schofield, University of Cambridge, United Kingdom

Presentation Overview:

Medical practitioners record the condition status of a patient through qualitative and quantitative observations. The measurement of vital signs and molecular parameters in the clinic gives a complementary description of abnormal phenotypes associated with the progression of a disease. The Clinical Measurement Ontology (CMO) is used to standardize annotations of these measurable traits. However, researchers have no way to describe how these quantitative traits relate to phenotype concepts in a machine-readable manner. Using the WHO clinical case report form standard for the COVID-19 pandemic, we modeled quantitative traits and developed OWL axioms to formally relate clinical measurement terms with anatomical entities, biomolecular entities and phenotypes annotated with the Uber-anatomy ontology (Uberon), Chemical Entities of Biological Interest (ChEBI) and the Phenotype and Trait Ontology (PATO) biomedical ontologies. The formal description of these relations allows interoperability between clinical and biological descriptions, and facilitates automated reasoning for analysis of patterns over quantitative and qualitative biomedical observations.

2:20 PM-2:40 PM
Using ontologies to extract disease-phenotype associations from literature
Format: Pre-recorded with live Q&A

  • Robert Hoehndorf, King Abdullah University of Science and Technology, Saudi Arabia
  • Senay Kafkas, King Abdullah University of Science and Technology, Saudi Arabia
  • Sarah Alghamdi, King Abdullah University of Science and Technology, Saudi Arabia
  • Paul Schofield, University of Cambridge, United Kingdom

Presentation Overview:

With the advances in Next Generation Sequencing (NGS) technologies, a huge volume of clinical genomic data has become available. Efficient exploitation of such data requires linkage to a patient's complete phenotype profile. Current resources providing disease-phenotype associations are not comprehensive, and they often do not cover all of the diseases from OMIM and particularly from ICD10, which are the primary terminologies used in clinical settings. Here, we propose a text-mining system which utilizes semantic relations in the phenotype ontologies and statistical methods to extract disease-phenotype associations from the literature. We compare our findings against established disease-phenotype associations and also demonstrate the system's utility in covering mouse gene-disease associations from Mouse Genome Informatics (MGI). Such associations serve as necessary building blocks of information for understanding underlying disease mechanisms and developing or repurposing drugs.

2:40 PM-3:00 PM
Representing Physician Suicide Claims as Nanopublications
Format: Pre-recorded with live Q&A

  • Michel Dumontier, Maastricht University, Netherlands
  • Tiffany Leung, Maastricht University, Netherlands
  • Tobias Kuhn, VU University Amsterdam, Netherlands

Presentation Overview:

In the poorly studied field of physician suicide, various factors can contribute to misinformation or information distortion, which in turn can influence evidence-based policies and prevention of suicide in this unique population. Here, we report on the use of nanopublications as a scientific publishing approach to establish a citation network of claims drawn from a variety of media concerning the rate of suicide of US physicians. Our work integrates these various claims and enables the verification of non-authoritative assertions, thereby better equipping researchers to advance evidence-based knowledge and make informed statements in the advocacy of physician suicide prevention.

3:20 PM-3:40 PM
Metadata standards for the FAIR sharing of vector embeddings in Biomedicine
Format: Pre-recorded with live Q&A

  • Robert Hoehndorf, King Abdullah University of Science and Technology, Saudi Arabia
  • Senay Kafkas, King Abdullah University of Science and Technology, Saudi Arabia
  • Remzi Çelebi, Maastricht University, Netherlands
  • Mehdi Ali, University of Bonn, Germany
  • Hajira Jabeen, University of Bonn, Germany
  • Michel Dumontier, Maastricht University, Netherlands

Presentation Overview:

Motivation:
Today, we have an enormous amount of biomedical data, and its size and complexity have been increasing over time. The implementation of standards is one of the key drivers of life sciences research as well as of technology transfer. More specifically, standards enable data accessibility, sharing and integration, and therefore facilitate data harnessing and accelerate research and innovation transfer.
The life sciences community has widely developed and used Semantic web technology standards for data representation and sharing. However, given the success of unsupervised machine learning methods such as Word2Vec and BERT, there is a need to develop new standards for sharing the (pre-trained) vector space embeddings of the entities to facilitate reusability of data and method development. Motivated by this, we propose data and metadata standards for the FAIR distribution of vector embeddings and demonstrate utilization of these standards in Bio2Vec, a platform providing a flexible, reliable and standard-compliant data representation, sharing, integration and analysis.

Availability:
The proposed metadata standard and an example are available in the ShEx format at Zenodo.

3:40 PM-4:00 PM
Applying GWAS on UK Biobank by using enhanced phenotype information based on Ontology-Wide Association Study
Format: Pre-recorded with live Q&A

  • Robert Hoehndorf, King Abdullah University of Science and Technology, Saudi Arabia
  • Senay Kafkas, King Abdullah University of Science and Technology, Saudi Arabia
  • Runar Reve, King Abdullah University of Science and Technology, Saudi Arabia

Presentation Overview:

Genome-wide association studies (GWAS) have been widely used to identify potentially causative variants of a genetic disease or trait given the patient phenotypes. However, a GWAS generally cannot present the complete picture, particularly of how the studied trait relates to other similar traits, because often not all of the available phenotype information is exploited in the analyses.
Here, we propose to use an Ontology-Wide Association Study (OWAS) to complete the phenotype profiles of diseases and perform GWAS on UK Biobank.
More specifically, with OWAS, we utilize the phenotype information that exists in the literature as well as in semantic resources to expand the GWAS to cases that are not explicitly associated with the phenotypes. Our initial results show that our approach has the potential to increase the statistical power of GWAS as well as to identify associations for phenotypes which have not been explicitly observed.
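The abstract does not describe how phenotype profiles are completed. As a rough illustration only, one common way to use ontology semantics for this purpose is to propagate each individual's phenotype annotations to all ancestor terms, so that a case group for a general phenotype also includes individuals annotated only with more specific terms. The sketch below assumes this propagation strategy; the term names, the `parents` map and both function names are hypothetical and are not taken from the authors' work.

```python
def ancestors(term, parents):
    """Return all ancestors of `term` in a hypothetical is-a hierarchy.

    `parents` maps each term to a list of its direct parent terms.
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def expand_cases(individual_terms, parents):
    """Build case sets per term, including implied (ancestor) terms.

    `individual_terms` maps an individual ID to the set of terms
    explicitly recorded for that individual.
    """
    cases = {}
    for individual, terms in individual_terms.items():
        closure = set(terms)
        for term in terms:
            closure |= ancestors(term, parents)
        for term in closure:
            cases.setdefault(term, set()).add(individual)
    return cases


# Hypothetical toy hierarchy and annotations (illustrative names only).
parents = {
    "cardiomyopathy": ["heart disease"],
    "arrhythmia": ["heart disease"],
    "heart disease": ["disease"],
}
individual_terms = {
    "a": {"cardiomyopathy"},
    "b": {"arrhythmia"},
    "c": {"heart disease"},
}
cases = expand_cases(individual_terms, parents)
print(sorted(cases["heart disease"]))  # ['a', 'b', 'c']
```

In this toy example only individual "c" is explicitly annotated with "heart disease", but after propagation the case group for that term contains all three individuals, which is the kind of enlarged case set an association test could then use.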

4:00 PM-4:20 PM
hPSCreg-CLO: ontological representation of human pluripotent stem cell lines from the hPSCreg
Format: Pre-recorded with live Q&A

  • Stefanie Seltmann, Charité – Universitätsmedizin Berlin, Germany
  • Yongqun He, University of Michigan, United States

Presentation Overview:

Human pluripotent stem cells (PSC) are immortal, represent the genotype of the donor and can differentiate into all cell types of the human body. These features establish their enormous potential for modelling diseases and tissues in vitro, drug and toxicity testing, and regenerative medicine. To translate this potential into reality, large numbers of PSC lines are being generated from a wide spectrum of donors to make them available for diverse applications. For users to identify suitable PSC lines, information about the donors, cell generation, characterization and quality is essential. The human pluripotent stem cell registry hPSCreg contains more than 3000 cell lines that are richly annotated. To make the hPSCreg resource more accessible and interoperable, we developed hPSCreg-CLO, a new CLO branch that represents various hPSC lines from hPSCreg. hPSC-specific design patterns were generated and used to support computer-assisted ontology development. The hPSCreg-CLO includes over 2,400 hPSC lines and their related information such as cell donors, anatomical entities, and original cell types. DL queries were performed to demonstrate the query capability of hPSCreg-CLO. hPSCreg-CLO will be further integrated with the hPSCreg project and will support database data integration and advanced analyses.

4:20 PM-4:40 PM
A Framework for Automated Construction of Heterogeneous Large-Scale Biomedical Knowledge Graphs
Format: Pre-recorded with live Q&A

  • Tiffany J. Callahan, Computational Bioscience Program, University of Colorado Anschutz Medical Campus, United States
  • Ignacio J. Tripodi, Computer Science Department, University of Colorado Boulder, United States
  • Lawrence E. Hunter, Computational Bioscience Program, University of Colorado Anschutz Medical Campus, United States
  • William A. Baumgartner, Computational Bioscience Program, University of Colorado Anschutz Medical Campus, United States

Presentation Overview:

Although knowledge graphs (KGs) are used extensively in biomedical research to model complex phenomena, many KG construction methods remain largely unable to account for the use of different standardized terminologies or vocabularies, are often difficult to use, and perform poorly as the size of the KG increases in scale. We introduce PheKnowLator (Phenotype Knowledge Translator), a novel KG framework and fully automated Python 3 library explicitly designed for optimized construction of semantically-rich, large-scale biomedical KGs. To demonstrate the functionality of the framework, we built and evaluated eight different parameterizations of a large semantic KG of human disease mechanisms. PheKnowLator is available at: https://github.com/callahantiff/PheKnowLator.

5:00 PM-6:00 PM
Bio-Ontologies Keynote: COVID-SEE: Enabling scientific evidence exploration through semantics in a time of crisis
Format: Live-stream

  • Karin Verspoor

Presentation Overview:

The rapid increases in scientific knowledge have never been more obvious than in the wake of the emergence of the COVID-19 virus, where we have seen hundreds of new research articles being published every week as scientists and medical researchers rush to share knowledge about predicting disease spread, and management or treatment of the disease. This has left scientists scrambling to navigate and synthesise large amounts of information. We have been developing a system we call COVID-SEE (Scientific Evidence Explorer) that leverages natural language processing methods to structure key information in COVID-19-related literature, and facilitates navigation of the literature through a relational lens. I will introduce our approach, and discuss the many ways ontologies enable and support the project.

6:00 PM-6:00 PM
Bio-Ontologies: Closing Day 1
Format: Live-stream

Thursday, July 16th
10:40 AM-11:00 AM
Detecting Gene Ontology misannotations using taxon-specific rate ratio comparisons
Format: Pre-recorded with live Q&A

  • Chengxin Zhang, University of Michigan, United States
  • Xiaoqiong Wei, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, China
  • Peter Freddolino, University of Michigan, United States
  • Yang Zhang, University of Michigan, United States

Presentation Overview:

Many protein function databases are built on automated or semi-automated curations and can contain various annotation errors. The correction of such misannotations is critical to improving the accuracy and reliability of the databases. We proposed a new approach to detect potentially incorrect Gene Ontology (GO) annotations by comparing the ratio of annotation rates (RAR) for the same GO term across different taxonomic groups, where those with a relatively low RAR usually correspond to incorrect annotations. As an illustration, we applied the approach to 20 commonly-studied species in two recent UniProt-GOA releases and identified 250 potential misannotations in the 2018-11-6 release, where only 25% of them were corrected in the 2019-6-3 release. Importantly, 56% of the misannotations are “Inferred from Biological aspect of Ancestor (IBA)”, i.e. reviewed computational annotations based on phylogenetic analysis. This is in contrast to previous observations that attributed misannotations mainly to “Inferred from Sequence or structural Similarity (ISS)”, probably reflecting an error source shift due to the new developments of function annotation databases. The results demonstrated a simple but efficient misannotation detection approach that is useful for large-scale comparative protein function studies. The code and list of identified misannotations are available at https://zhanglab.ccmb.med.umich.edu/RAR.
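The abstract names the ratio of annotation rates (RAR) but does not give its formula. The following minimal sketch assumes one plausible reading: the annotation rate of a GO term in a taxon is the fraction of that taxon's proteins carrying the term, and the RAR is that rate divided by the mean rate in the other taxa being compared. The function names, the input layout and the toy data are hypothetical; the authors' actual definition and code are at the URL above.

```python
from collections import defaultdict


def annotation_rates(annotations, proteome_sizes):
    """Fraction of each taxon's proteins annotated with each GO term.

    `annotations` is an iterable of (taxon, protein_id, go_term) tuples;
    `proteome_sizes` maps taxon -> total number of proteins considered.
    """
    annotated = defaultdict(set)
    for taxon, protein, term in annotations:
        annotated[(term, taxon)].add(protein)
    rates = defaultdict(dict)
    for (term, taxon), proteins in annotated.items():
        rates[term][taxon] = len(proteins) / proteome_sizes[taxon]
    return rates


def ratio_of_annotation_rates(rates, term, taxon, group):
    """Assumed RAR: rate in `taxon` over the mean rate in the rest of `group`."""
    per_taxon = rates.get(term, {})
    others = [t for t in group if t != taxon]
    mean_other = sum(per_taxon.get(t, 0.0) for t in others) / len(others)
    if mean_other == 0:
        return float("inf")
    return per_taxon.get(taxon, 0.0) / mean_other


# Hypothetical toy data: three taxa with four proteins each.
proteome_sizes = {"human": 4, "mouse": 4, "yeast": 4}
annotations = [
    ("human", "p1", "GO:0000001"), ("human", "p2", "GO:0000001"),
    ("mouse", "p1", "GO:0000001"), ("mouse", "p2", "GO:0000001"),
    ("yeast", "p1", "GO:0000001"),
]
rates = annotation_rates(annotations, proteome_sizes)
group = ["human", "mouse", "yeast"]
rar_yeast = ratio_of_annotation_rates(rates, "GO:0000001", "yeast", group)
print(rar_yeast)  # 0.5
```

Under this assumed definition, a term whose rate in one taxon is much lower than in related taxa yields a low RAR, which the abstract associates with annotations that are more likely to be incorrect.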

11:00 AM-11:10 AM
COB as a Community Resource
Format: Live-stream

  • Rebecca Jackson, University of Maryland Baltimore, United States
  • Randi Vita, La Jolla Institute for Allergy & Immunology, United States
  • Lynn Schriml, University of Maryland School of Medicine, United States
  • William D Duncan, Lawrence Berkeley National Laboratory, United States
  • James A Overton, Knocean Inc, Canada
  • Christopher J Mungall, Lawrence Berkeley National Laboratory, United States
  • Bjoern Peters, La Jolla Institute for Allergy & Immunology, United States

Presentation Overview:

The Open Biological and Biomedical Ontology (OBO) is a collective of ontology developers committed to collaboration and shared principles. The OBO Foundry mission is to develop a family of logically well-formed and scientifically accurate interoperable ontologies. Participants voluntarily adhere and contribute to the development of an evolving set of principles including open use, collaborative development, non-overlapping and strictly-scoped content, common syntax and relations. OBO provides services to the community such as hosting persistent URLs and ontology files, recording metadata, and supporting discussion forums and regular calls.
We developed a set of key top-level ontology terms that unify the many OBO Foundry ontologies, termed the Core Ontology for Biology and Biomedicine (COB). COB simplifies the identification of ontology terms and thereby eases navigation across OBO projects. It includes logic that links ontologies together, allowing interoperability problems to be detected and corrected. Related ontology terms from multiple ontologies can be viewed at the same time, illustrating how OBO ontologies and their terms are related and helping to ensure interoperability.
COB is still in active development; we are eager to obtain community feedback. We want to collect actionable suggestions on what users most want in COB and what this community would find most useful to their daily practices.

11:10 AM-11:20 AM
A Structured Model for Immune Exposures
Format: Pre-recorded with live Q&A

  • Randi Vita, La Jolla Institute for Allergy & Immunology, United States
  • James A Overton, Knocean Inc, Canada
  • Bjoern Peters, La Jolla Institute for Allergy & Immunology, United States
  • Patrick Dunn, ImmPort Curation Team, United States
  • Kei-Hoi Cheung, Department of Emergency Medicine, Yale University, United States
  • Steven H Kleinstein, Interdepartmental Program in Computational Biology and Bioinformatics, Yale, United States
  • Alessandro Sette, La Jolla Institute for Allergy & Immunology, United States

Presentation Overview:

An Immune Exposure is the process by which components of the immune system first encounter a potential trigger. Data resources responsible for housing scientific data related to the immune response needed a way to describe the details of the Immune Exposure process consistently. This need was met through the development of a structured model for Immune Exposures. The model was created during curation of the immunology literature, resulting in a robust model capable of meeting the requirements of such data. We present this model with the hope that overlapping projects will adopt and/or contribute to this work.

12:00 PM-12:20 PM
Semantic Variation Graphs: Ontologies for Pangenome Graphs
Format: Pre-recorded with live Q&A

  • Jerven T. Bolleman, SIB Swiss Institute of Bioinformatics, Switzerland
  • Toshiyuki T. Yokoyama, The University of Tokyo, Japan
  • Simon Heumos, Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Germany
  • Josiah Seaman, Max Planck Institute for Developmental Biology, Germany
  • Dmytro Trybushnyi, Karlsruhe Institute of Technology, Germany
  • Torsten Pook, Center of Integrated Breeding Research, University of Goettingen, Germany
  • Andrea Guarracino, University of Rome Tor Vergata, Italy
  • Erik Garrison, University of California Santa Cruz, United States

Presentation Overview:

Background: Variation graphs are a novel way to describe genomic variation across a population. Variation graph tools present a significant improvement in mitigating reference bias compared to the linear reference ecosystem. Existing toolkits focus on algorithms processing pangenome graphs. Yet, they have limited capabilities in integrating various annotations of the biology and providing an interface for large scale visualizations.
Description: To interpret biological meaning in variation graphs by integrating various kinds of annotations for further analysis, FAIR data interchange formats are needed. Borderless technology such as the Semantic Web allows variation graph toolkits and pangenome tools to focus on their core competence while allowing bioinformaticians to integrate, analyze, and visualize the data.
Result: We demonstrate how we can represent a graphical pangenome with pangenome ontologies using a standard declarative graph query language. Then we show how the vg RDF and Pantograph RDF can represent data ready for the Semantic Web and how we can combine existing data from INSDC and UniProt without conversions or loss of information into a single Variation and Knowledge Graph.

12:20 PM-12:30 PM
RGD: Data and tools to aid the discovery of precision models of human disease
Format: Pre-recorded with live Q&A

  • Jeff De Pons, Medical College of Wisconsin, United States
  • Jyothi Thota, Medical College of Wisconsin, United States
  • Anne E. Kwitek, Medical College of Wisconsin, United States
  • Jennifer R. Smith, Rat Genome Database, Medical College of Wisconsin, United States
  • Marek Tutaj, Medical College of Wisconsin, United States
  • Harika Srividya Nalabolu, Medical College of Wisconsin, United States
  • Logan Lamers, Medical College of Wisconsin, United States
  • Monika Tutaj, Medical College of Wisconsin, United States
  • Stan Laulederkind, Medical College of Wisconsin, United States
  • G. Thomas Hayman, Medical College of Wisconsin, United States
  • Shur-Jen Wang, Medical College of Wisconsin, United States
  • Matthew Hoffman, Medical College of Wisconsin, United States
  • Mary Kaldunski, Medical College of Wisconsin, United States
  • Cody Plasterer, Medical College of Wisconsin, United States
  • Mahima Vedi, Medical College of Wisconsin, United States
  • Melinda Dwinell, Medical College of Wisconsin, United States

Presentation Overview:

RGD (https://rgd.mcw.edu) is a multi-species knowledgebase which provides a substantial corpus of genomic, genetic, phenotypic and disease-related data and an innovative suite of tools for analyzing these data. Researchers can leverage cross-species manual annotations from RGD and annotations imported from external sources to search for an appropriate model. As an example, a researcher studying Wilson disease can find a list of associated genes using RGD's OLGA tool. An integrated toolbox facilitates submission of gene lists to other analysis tools to explore annotations across ontologies and across species. In analyses related to Wilson disease, ATP7B is one gene which commonly appears. The association between ATP7B and Wilson disease is well-documented at RGD via disease and phenotype annotations and associated pathogenic variants. Links on the gene page provide access to data for other species, such as an extensive list of mouse phenotypes. For rat, RGD's PhenoMiner tool provides related quantitative measurement data. RGD's strain record details a large Atp7b deletion in the LEC/Hok strain, a Wilson disease model. RGD's Variant Visualizer provides functionality to explore pathogenic or damaging variants for human, rat and dog.

12:30 PM-12:40 PM
Bio-Ontologies: Closing
Format: Live-stream

  • Bio-Ontologies Organizers