Monday, July 11 and Tuesday, July 12 between 12:30 PM CDT and 2:30 PM CDT | Wednesday, July 13 between 12:30 PM CDT and 2:30 PM CDT |
---|---|
Session A Poster Set-up and Dismantle. Session A Posters set up: Monday, July 11 between 7:30 AM CDT and 10:00 AM CDT. Session A Posters dismantle: Tuesday, July 12 at 6:00 PM CDT. | Session B Poster Set-up and Dismantle. Session B Posters set up: Wednesday, July 13 between 7:30 AM CDT and 10:00 AM CDT. Session B Posters dismantle: Thursday, July 14 at 2:00 PM CDT. |
Motivation
Many Francophone African countries are affected by sickle cell disease (SCD). A French translation of the SCD Ontology (SCDO) would thus promote SCDO application by Francophone clinicians, researchers and other relevant stakeholders, aiding SCDO-facilitated harmonisation of data from Francophone research sites and making patient-centered tracking, diagnostics and therapeutics applications more accessible to Francophone patients.
Method
We used language translation software and manually reviewed the French auto-translations. To manage translation of the SCDO, we developed a graphical user interface (GUI) in Python and wrote standard operating procedures (SOPs). To display the French translations, as well as future layperson translations, in the SCDO OWL file, we developed a set of language tags using the IANA Language Subtag Registry and the language tag syntax defined by the Internet Engineering Task Force’s (IETF’s) Best Current Practice (BCP) 47.
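As a rough illustration of how BCP 47 language tags attach to ontology labels, the sketch below checks a simplified tag shape and renders a language-tagged literal in Turtle syntax. It is not the authors' tag set or tooling: the function names are hypothetical, the private-use subtag `x-lay` is an invented stand-in for a layperson variant, and the pattern covers only a subset of BCP 47 (no extlang, variant, or extension subtags).

```python
import re

# Simplified BCP 47 shape: language[-Script][-REGION][-x-private...].
# BCP 47 tags are case-insensitive; this sketch only accepts the
# conventional casing (fr, Latn, CM) to keep the pattern short.
_SIMPLE_TAG = re.compile(
    r"^[a-z]{2,3}"                  # primary language subtag, e.g. "fr"
    r"(-[A-Z][a-z]{3})?"            # optional script subtag, e.g. "Latn"
    r"(-(?:[A-Z]{2}|\d{3}))?"       # optional region subtag, e.g. "CM"
    r"(-x(-[A-Za-z0-9]{1,8})+)?$"   # optional private-use subtags (1-8 chars each)
)


def is_simple_language_tag(tag: str) -> bool:
    """Return True if `tag` matches the simplified BCP 47 shape above."""
    return _SIMPLE_TAG.match(tag) is not None


def turtle_literal(text: str, tag: str) -> str:
    """Render a language-tagged string literal in Turtle syntax."""
    if not is_simple_language_tag(tag):
        raise ValueError(f"unsupported language tag: {tag!r}")
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{tag}'


# Hypothetical usage: a French label and an invented layperson variant.
print(turtle_literal("drépanocytose", "fr"))
print(turtle_literal("drépanocytose", "fr-x-lay"))
```

Private-use subtags are limited to eight characters each, which is why the hypothetical layperson variant is abbreviated to `lay` here.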
Results
An Ontology Translation GUI and SOPs.
Four types of language tags for displaying translations in ontologies.
Version 1.1 French SCDO.
Conclusion
We describe a time-saving and cost-effective workflow, with associated tools, including a standardised protocol of language tags, for the translation of ontologies, and highlight the utility of our method with its successful application in producing the French SCDO.
In the Netherlands, women aged 50-75 years are invited to receive breast cancer screening every two years. The PRISMA (Personalised RISk-based MAmmascreening) study was designed to investigate the added value of risk-based mammography screening. To the best of our knowledge, there is no universally accepted data model for the collection of breast cancer risk factors and patient-reported outcome measures (PROMs). Therefore, we aimed to retrospectively create a domain ontology based on the PRISMA questionnaire. This contributes to global efforts to increase secondary use of patient-reported outcomes through FAIRification, i.e. ensuring data are Findable, Accessible, Interoperable, and Reusable.
Initially, 201 questions and 188 variables were identified from the questionnaire. After several inventory meetings with different stakeholders, the resulting 70 data elements were grouped into 17 main classes. The domain ontology is structured as an “is-a” hierarchy.
Given that most of the concepts derived from the questionnaire are measurements of some attribute of the patient, we use the Semanticscience Integrated Ontology (SIO), which is capable of modelling entities, processes, and their qualities/attributes, as guidance to derive our core model.
The resulting data model could serve as a template for other breast cancer research groups and for other Patient-Reported Outcome and Real-World Experience questionnaires.
Motivation: Cognitive-behavioral symptoms (CBSx) represent the surface manifestation of diverse etiologies. Identification and documentation of CBSx are critical to biomedical research that aims to understand the association between the symptoms and their underlying causes. Given the lack of a semantic resource specifically dedicated to the symptom layer of CBSx, we sought to develop a CBSx Taxonomy.
Methods: We defined CBSx as any observable abnormality or reduced capacity in a cognitive or behavioral function. We also collected concepts concerning the impact of CBSx on quality of life and daily activities. The taxonomy was iteratively developed by synthesizing knowledge from the literature, existing biomedical ontologies, and clinical instruments that assess CBSx in patients. A domain expert in aging and neurodegenerative diseases was consulted in curating the CBSx Taxonomy.
Results: The derived taxonomy contains 258 concepts that are grouped into four major branches: cognitive symptoms, psychomotor symptoms, behavioral symptoms, and impact on life.
Conclusion: By synthesizing multiple knowledge sources, we developed the CBSx Taxonomy to serve as a dedicated semantic layer for cognitive-behavioral symptoms. The taxonomy is shared publicly and is expected to benefit diverse applications including natural language processing, phenotyping, and semantic harmonization.
Interoperability, one of the FAIR Data principles (Findable, Accessible, Interoperable and Reusable), requires mapping data models, formats and semantics. The European Joint Programme on Rare Diseases (EJP RD) CDE semantic data models enable the creation of highly expressive FAIR data for the interoperability of patient registries, facilitating rare disease research. The Observational Health Data Sciences and Informatics (OHDSI) OMOP Common Data Model (CDM) is used to harmonize representations of healthcare data and to support reproducible open-source analytics for clinical research. Here, we present our mapping work for schema integration and interoperability between the EJP RD CDE semantic model and the OMOP CDM. Enhancing the interoperability between these two schemas can enrich rare disease research with observational health data and extend the EJP RD Virtual Platform research ecosystem with OMOP analytics.
We have recently published an updated genome assembly and annotation of our model organism Pseudomonas fluorescens SBW25. We now face the challenge of keeping the annotation up to date with novel results from experimental and computational studies of gene function, fitness assays, and regulatory and metabolic networks. We will present various open-source software tools and open data and metadata standards combined into a public knowledge base for our model organism. The central part is our genome database and genome browser based on Tripal. It allows internal and external colleagues to feed in their data and results in a curated fashion. To further integrate our data, we are working on a Linked Data architecture that connects our genome database to various public *omics databanks as well as to internal data sources, thereby creating an organism-specific knowledge graph. By exposing a public SPARQL endpoint, our data ultimately becomes part of the worldwide Semantic Web, which incorporates other domain-specific knowledge graphs as well as generic data sources such as Wikipedia (via Wikidata). In this way, our system facilitates the growth of the Pseudomonas fluorescens SBW25 knowledge graph both through manual exploration and through automated procedures.
Governance of ontology development and maintenance practices within an organization has many advantages over the ungoverned, siloed approaches that many organizations exhibit today. This paper presents the BASF Governance Operational Model for Ontologies (GOMO), which addresses all stages of the ontology lifecycle and provides a framework for the development and maintenance of ontologies within BASF. GOMO comprises Principles, Standards and Quality Assurance criteria, Best Practices, and Training and Outreach, and is the result of a collaborative effort between industry and academia in the semantic web field. GOMO Principles, Standards and Best Practices are being applied to all running ontology-based projects in BASF. Through outreach presentations to sections of the BASF community, GOMO has reached a wider audience and fostered understanding of the utility and implementation of ontologies. Finally, GOMO stands as a framework that is fit for adoption by other organizations facing similar challenges in ontology governance.
The recent success of node embedding methods has greatly boosted the application of graph data to bioinformatics problems. In this work, we propose a new method for learning a type of Gene Ontology sub-graph embedding to classify model organisms' genes as pro-longevity or anti-longevity. The experimental results show that this type of Gene Ontology sub-graph embedding achieves higher predictive accuracy than conventional binary Gene Ontology annotation-based features.
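For context, the conventional baseline mentioned above, binary Gene Ontology annotation features, can be sketched as a gene-by-term 0/1 matrix. This is only a minimal illustration of that baseline, not the sub-graph embedding method of this work; the function name, the gene names, and the assignment of GO identifiers to genes are hypothetical toy data.

```python
def binary_go_features(annotations):
    """Build 0/1 feature vectors from a {gene: set of GO term IDs} mapping.

    Returns the sorted list of GO terms (the feature columns) and a
    {gene: vector} dict where vector[i] is 1 if the gene is annotated
    with terms[i] and 0 otherwise.
    """
    terms = sorted({term for gene_terms in annotations.values() for term in gene_terms})
    vectors = {
        gene: [1 if term in gene_terms else 0 for term in terms]
        for gene, gene_terms in annotations.items()
    }
    return terms, vectors


# Hypothetical toy annotations (not real gene-to-term assignments).
toy_annotations = {
    "geneA": {"GO:0006979", "GO:0008340"},
    "geneB": {"GO:0008340"},
    "geneC": {"GO:0006979", "GO:0007568"},
}

terms, vectors = binary_go_features(toy_annotations)
print(terms)
for gene, vector in vectors.items():
    print(gene, vector)
```

Each vector could then be passed to any standard classifier together with a pro-/anti-longevity label.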
Today’s international corporations such as BASF, a leading company in the crop protection industry, produce and consume more and more data that are often fragmented and accessible through Web APIs. In addition, part of the proprietary and public data of interest to BASF is stored in triple stores and accessible with the SPARQL query language. Homogenizing the data access modes and the underlying semantics of the data, without modifying or replicating the original data sources, becomes an important requirement for achieving data integration and interoperability. In this work, we propose a federated data integration architecture within an industrial setup that relies on an ontology-based data access method. Our performance evaluation in terms of query response time showed that most queries can be answered in under 1 second.
Ontologies play an important role in the representation, standardization, and integration of biomedical data, but are known to have data quality (DQ) issues. We aimed to understand whether the Harmonized Data Quality Framework (HDQF), developed to standardize electronic health record DQ assessment strategies, could be used to improve ontology quality assessment. A novel set of 14 ontology checks was developed. These DQ checks were aligned to the HDQF and examined by HDQF developers. The ontology checks were evaluated using 11 Open Biological and Biomedical Ontology (OBO) Foundry ontologies. Of the ontology checks, 85.7% (12 of 14) were successfully aligned to at least 1 HDQF category. Accommodating the unmapped DQ checks (n=2) required modifying an original HDQF category and adding a new Data Dependency category. While all of the ontology checks were ultimately mapped to an HDQF category, not all HDQF categories were represented by an ontology check, presenting opportunities to strategically develop new ontology checks. The HDQF is a valuable resource, and this work demonstrates its ability to categorize ontology quality assessment strategies.
With the new era of genomics, an increasing number of animal species are amenable to large-scale data generation. This has led to the emergence of new multi-species ontologies to annotate and organize these data. While anatomy and cell types are well covered by these efforts, information regarding development and life stages is also critical in the annotation of animal data. Its absence can hamper our ability to answer comparative biology questions and to interpret functional results. We present here a collection of development and life stage ontologies for 21 animal species, and their merger into a common multi-species ontology. This work has allowed the integration and comparison of transcriptomics data in 52 animal species.
EDAM is a domain ontology of data analysis and data management in bio- and other sciences. It comprises concepts related to analysis, modeling, optimization, and data life-cycle, divided into 4 main sections: topics, operations, data, and formats.
EDAM is used in numerous resources, for example Bio.tools, Galaxy, CWL, Debian, BioSimulators, FAIRsharing, or the ELIXIR training portal TeSS. Thanks to the annotations with EDAM, tools, workflows, standards, data, and learning materials are easier to find, compare, choose, and integrate.
EDAM is developed by a diverse community of contributors. A substantial extension is EDAM Bioimaging, focused on image analysis and machine learning.
The main improvements and ongoing work in 2022 include:
- In addition to using standard tools such as HermiT and ROBOT, we develop additional validation tools at both the syntactic and semantic levels: https://github.com/edamontology/edam-validation
- Enabling interdisciplinary applications with EDAM Geo (https://github.com/edamontology/edam-geo), an extension of EDAM towards geolocated data (e.g. in ecology, public health, …). Developed at https://webprotege.stanford.edu/#projects/69591619-4eda-4f03-9e7f-65b213038fe1/edit/Classes
- Improving the implementation of links to external resources (incl. other ontologies), definitions, and the overall quality
- Addition of numerous data formats, especially for models and simulations, and chemistry
Named Entity Recognition (NER) systems are commonly used in the construction of large biomedical knowledge graphs (KGs) from free text or non-standardized data. Their main role is to map biomedical entities to standardized identifiers in ontologies and databases. While NER is only one of the steps in KG construction, NER systems can greatly accelerate it. However, errors introduced by the NER systems can systematically affect downstream applications of the KG. In this study, we used two NER systems, BioPortal Annotator and the OntoRunNER OGER++ wrapper, to map biomedical entities from a KG to 13 biomedical ontologies and subsequently evaluated the mappings. Our preliminary results show that the OntoRunNER wrapper produced an average of 4.26 candidate matches and mapped 76% of the entities correctly, while the BioPortal Annotator correctly mapped 60% of the manually reviewed entities. Results from both systems contained errors, such that using the mappings in a KG without curation could lead to inaccurate inferences. We are currently evaluating the effects of the NER systems on downstream KG applications using graph analysis, embedding similarity, and data source cross-validation.
Representing the KEGG human pathway database, a collection of manually drawn maps capturing current knowledge on metabolism and various other functions of the cell and organism, in the graph database Neo4j gives us a panoramic view of the pathway database and thus enables us to find gene relations across multiple KEGG pathway maps. Multiple paths between two genes can also be found, which enhances our understanding of gene pathways. Moreover, Neo4j enables us to customize our queries by constraining the number and length of the pathways and to filter out dubious pathways that are mentioned only a few times in the database. Overall, we conclude that although the representation of KEGG pathways in a graph database may not completely capture the real underlying mechanisms, it can still greatly advance personalized medicine.
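The idea of finding multiple paths between two genes while constraining their number and length can be sketched without a database. The pure-Python example below is only an illustration of that kind of query, not the Neo4j queries used in this work; the function name, the adjacency-list graph, and the node names G1 to G5 are hypothetical placeholders rather than KEGG data.

```python
def bounded_paths(graph, source, target, max_edges, max_paths):
    """Enumerate simple paths from source to target in a directed graph.

    graph     : {node: iterable of successor nodes}
    max_edges : longest path length allowed, counted in edges
    max_paths : stop after this many paths have been found
    """
    found = []
    stack = [(source, [source])]  # depth-first search with an explicit stack
    while stack and len(found) < max_paths:
        node, path = stack.pop()
        if node == target:
            found.append(path)
            continue
        if len(path) - 1 >= max_edges:
            continue  # extending this path would exceed the length limit
        # Push successors in reverse order so they are explored alphabetically.
        for nxt in sorted(graph.get(node, ()), reverse=True):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                stack.append((nxt, path + [nxt]))
    return found


# Hypothetical toy graph of gene-to-gene relations.
toy_graph = {
    "G1": ["G2", "G3"],
    "G2": ["G4"],
    "G3": ["G4", "G5"],
    "G5": ["G4"],
}

for path in bounded_paths(toy_graph, "G1", "G4", max_edges=3, max_paths=10):
    print(" -> ".join(path))
```

Lowering `max_edges` to 2 drops the longer route through G5, and `max_paths` caps how many alternatives are returned.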
Due to gaps in knowledge, most rare disease patients never get a diagnosis, but diagnostic reach can be extended by integrating bio-ontologies. We show that clusters identified across ontologies suggest undiscovered connections between genes and phenotypes. Biological networks have long been used to infer new connections, but the use of heterogeneous networks and the inference of gene-to-phenotype connections remain largely unexplored. We create a heterogeneous network composed of genes and phenotypes, with edges derived from the STRING protein-protein interaction network and the Human Phenotype Ontology, for each year from 2015 to 2021. Employing a combination of classic and node attribute-aware network clustering algorithms, we identify small, heterogeneous clusters for each year. We show that these clusters contain significantly more (p < 0.0001) gene-to-phenotype edges in the future year than 10,000 replicates from a robust null model. Using biologically meaningful cluster properties, we train an XGBoost model to estimate the degree to which we’d expect more gene-to-phenotype pairs in the future year than at random, and we prioritize clusters that will be meaningful to those affected by a rare disease.
All data and methods are available in a Snakemake pipeline and Conda environment for the highest degree of reproducibility (https://github.com/MSBradshaw/BOCC).
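The comparison against 10,000 null-model replicates described in this abstract is a form of empirical significance test. The sketch below shows one common way to compute such a p-value; it is an assumption rather than the procedure in the BOCC pipeline, which the abstract does not detail. The function name, the add-one estimator, and all counts are illustrative.

```python
def empirical_p_value(observed, null_values):
    """One-sided empirical p-value with the add-one correction.

    Counts how many null replicates are at least as large as the
    observed statistic; the +1 terms keep the estimate above zero.
    """
    at_least_as_large = sum(1 for value in null_values if value >= observed)
    return (at_least_as_large + 1) / (len(null_values) + 1)


# Hypothetical numbers: a cluster gains 12 new gene-to-phenotype edges,
# while each of 10,000 null replicates gains between 0 and 6.
null_counts = [i % 7 for i in range(10_000)]
p = empirical_p_value(12, null_counts)
print(p < 0.0001)
```

With 10,000 replicates and no replicate reaching the observed value, this estimator gives 1/10,001, just under 0.0001.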
Motivation: Literature-based Gene Ontology Annotations (GOA) are biological database records that use a controlled vocabulary to uniformly represent gene function information described in the primary literature. Assuring the quality of GOA is crucial for supporting biological research. However, a range of different kinds of inconsistencies between the literature used as evidence and the annotated GO terms can be identified; these have not been systematically studied at the record level. The existing manual-curation approach to GOA consistency assurance is inefficient and unable to keep pace with the rate of updates to gene function knowledge. Automatic tools are therefore needed to assist with GOA consistency assurance. This paper presents an exploration of different GOA inconsistencies and an early feasibility study of automatic inconsistency detection.
Results: We have created a reliable synthetic dataset to simulate four realistic types of GOA inconsistency in biological databases. Three automatic approaches are proposed. They provide reasonable performance on the task of distinguishing the four types of inconsistency and are directly applicable to detect inconsistencies in real-world GOA database records. Major challenges resulting from such inconsistencies in the context of several specific application settings are reported.
Conclusion: This is the first study to introduce automatic approaches that are designed to address the challenges in current GOA quality assurance workflows.