The SciFinder tool lets you search Titles, Authors, and Abstracts of talks and panels. Enter your search term below and your results will be shown at the bottom of the page. You can also click on a track to see all the talks given in that track on that day.

Results

July 14, 2024
10:40-10:50
COSI Announcements
Track: Bio-Ontologies

Room: 522
Moderator(s): Robert Hoehndorf


Authors List:

  • Tiffany Callahan
  • Robert Hoehndorf
July 14, 2024
10:50-11:55
Invited Presentation: Exploring Multiple Perspectives for Associative Knowledgebases
Confirmed Presenter: Karin Slater
Track: Bio-Ontologies

Room: 522
Format: Live Stream
Moderator(s): Robert Hoehndorf


Authors List:

  • Karin Slater

Presentation Overview:

Databases encoding associative relationships between biomedical entities function as background knowledge that is leveraged for a range of purposes. For example, disease-phenotype associations are used for differential diagnosis and variant prediction, while gene-function associations are used in gene set enrichment analyses.

In the ontology world, these associative knowledgebases lie somewhere between the conceptualisation and instance spaces, defining foundational knowledge that is often probabilistic, associative, or uncertain, rather than axiomatic. They are formed through some combination of manual curation from expert knowledge, experimental data, and analysis of co-occurrence in literature text. Due to this aetiology of associations, existing databases represent a particular perspective on biomedical knowledge, and it is one that differs from those that might be cultivated from analysis of other sources, such as clinical data, public discussion, or alternative modularisations of literature text.

We will explore the similarities and differences between associative knowledgebases derived from these contexts, including methodological concerns, hypothesis generation, characterisation, and implications for downstream applications.

July 14, 2024
11:55-12:20
Extracting Clinical Significance for Drug-Gene Interactions using FDA Label Packages
Confirmed Presenter: Matthew Cannon, Institute for Genomic Medicine, Nationwide Children's Hospital
Track: Bio-Ontologies

Room: 522
Format: In Person
Moderator(s): Robert Hoehndorf


Authors List:

  • Matthew Cannon, Institute for Genomic Medicine
  • James Stevenson, Institute for Genomic Medicine
  • Kathryn Stahl, Institute for Genomic Medicine
  • Rohit Basu, Institute for Genomic Medicine
  • Kori Kuzma, Institute for Genomic Medicine
  • Adam Coffman, Department of Medicine
  • Susanna Kiwala, Department of Medicine
  • Joshua McMichael, Department of Medicine
  • Elaine Mardis, Institute for Genomic Medicine
  • Obi Griffith, Department of Medicine
  • Malachi Griffith, Department of Medicine

Presentation Overview:

The drug-gene interaction database (DGIdb) is a resource that aggregates interaction data from over 40 different resources into one platform with the primary goal of making the druggable genome accessible to clinicians and researchers. By providing a public, computationally accessible database, DGIdb enables therapeutic insights through broad aggregation of drug-gene interaction data.

As part of our aggregation process, DGIdb preserves data regarding interaction types, directionality, and other attributes that enable filtering or biochemical insight. However, source data are often incomplete and may not contain the therapeutic relevance of the interaction. In this report, we address these missing data and demonstrate a pipeline for extracting physiological context from free-text sources. We apply existing large language models (LLMs) to tag and extract indications, cancer types, and relevant pharmacogenomics from free-text, FDA-approved labels. We are then able to use the Variant Interpretation for Cancer Consortium (VICC) normalization services to ground extracted data back to formally grouped concepts.

In a preliminary test set of 355 FDA labels, we were able to normalize 59.4% of extracted chemical entities back to ontologically-grounded therapeutic concepts. We can link this therapeutic context data back to interaction records already searchable within DGIdb. By using LLMs to extract this data set, we can supplement our existing interaction data with relevant indications, pharmacogenomic data and mutational statuses that may inform the therapeutic relevance of a particular interaction. Inclusion of these data will be invaluable for variant interpretation pipelines where mutational status can lead to the identification of a lifesaving therapeutic.

July 14, 2024
14:20-15:05
Proceedings Presentation: Predicting protein functions using positive-unlabeled ranking with ontology-based priors
Confirmed Presenter: Fernando Zhapa-Camacho, King Abdullah University of Science and Technology, Saudi Arabia
Track: Bio-Ontologies

Room: 522
Format: Live Stream
Moderator(s): Tiffany Callahan


Authors List:

  • Fernando Zhapa-Camacho, King Abdullah University of Science and Technology
  • Zhenwei Tang, University of Toronto
  • Maxat Kulmanov, King Abdullah University of Science and Technology
  • Robert Hoehndorf, King Abdullah University of Science and Technology

Presentation Overview:

Automated protein function prediction is a crucial and widely studied problem in bioinformatics. Computationally, protein function prediction is a multilabel classification problem where only positive samples are defined and there is a large number of unlabeled annotations. Most existing methods rely on the assumption that the unlabeled set of protein function annotations consists of negatives, inducing the false negative issue, where potential positive samples are trained as negatives. We introduce a novel approach named PU-GO, wherein we address function prediction as a positive-unlabeled ranking problem. We apply empirical risk minimization, i.e., we minimize the classification risk of a classifier where class priors are obtained from the Gene Ontology hierarchical structure. We show that our approach is more robust than other state-of-the-art methods on similarity-based and time-based benchmark datasets. Data and code are available at https://github.com/bio-ontology-research-group/PU-GO.
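
As an illustration of the positive-unlabeled idea described in this abstract, the following is a minimal sketch of a non-negative PU risk estimate for a single function class. It is not taken from the PU-GO code. The sigmoid surrogate loss, the function names, and the way the class prior is passed in as a plain number are all assumptions made for this example; the abstract states only that class priors come from the Gene Ontology hierarchy.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss for a label in {+1, -1} (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk for one class.

    pos_scores: classifier scores for proteins annotated with the class.
    unl_scores: classifier scores for proteins with no annotation (unlabeled).
    prior: assumed fraction of true positives for this class (hypothetical input).
    """
    risk_pos_as_pos = mean(sigmoid_loss(s, +1) for s in pos_scores)
    risk_pos_as_neg = mean(sigmoid_loss(s, -1) for s in pos_scores)
    risk_unl_as_neg = mean(sigmoid_loss(s, -1) for s in unl_scores)
    # The unlabeled set mixes positives and negatives, so the estimated
    # negative risk subtracts the share explained by hidden positives.
    # Clamping at zero keeps the estimate from going negative.
    neg_risk = max(0.0, risk_unl_as_neg - prior * risk_pos_as_neg)
    return prior * risk_pos_as_pos + neg_risk


if __name__ == "__main__":
    good = nn_pu_risk([10.0, 10.0], [-10.0, -10.0], prior=0.5)
    bad = nn_pu_risk([-10.0], [10.0], prior=0.5)
    print(good < bad)
```

The point of the sketch is that unlabeled proteins are never treated as confirmed negatives; their contribution to the risk is discounted by the class prior.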

July 14, 2024
15:05-15:30
Protein Function: how much do we know and how much do we care?
Confirmed Presenter: An Phan, Iowa State University, United States
Track: Bio-Ontologies

Room: 522
Format: In Person
Moderator(s): Tiffany Callahan


Authors List:

  • An Phan, Iowa State University
  • Karin Dorman, Iowa State University
  • Claus Kadelka, Iowa State University
  • Iddo Friedberg, Iowa State University

Presentation Overview:

The resources required to study gene function are limited, especially when considering the number of genes in the human genome and the complexity of their function. Genes are prioritized for experimental studies based on many different considerations, including, but not limited to, perceived biomedical importance and the understanding of biomedical processes. At the same time, the lion's share of genes are not studied or are under-characterized, with detrimental results to our understanding of the functions inherent to them, and their effects on human health and wellness. However, the size of this disparity in knowledge has not yet been quantified. Understanding function annotation disparity is a necessary first step toward understanding how much functional knowledge of the human genome has been gained, and toward effectively guiding future studies of its component genes.

Here, we present a comprehensive longitudinal analysis of our understanding of the human proteome utilizing tools from economics and information theory. Specifically, we view the human proteome as a population of proteins with a knowledge economy: we treat quantified knowledge of the function of each protein as the equivalent of its wealth, and examine the distribution of knowledge of proteins within a proteome in the same manner that the distribution of wealth is studied in societies. Our results show a broad distribution of functional knowledge about human proteins over the last decade, in which the inequality in annotations of these proteins remains high.
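
To make the wealth analogy in this abstract concrete, here is a minimal sketch of one standard inequality measure from economics, the Gini coefficient, applied to per-protein knowledge scores. The abstract does not say which measures the authors use, so the choice of the Gini coefficient, the function name, and the toy scores are assumptions for illustration only.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 is equal, near 1 is unequal."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    # Hypothetical knowledge scores for four proteins.
    print(gini([1, 1, 1, 1]))   # evenly annotated proteins
    print(gini([0, 0, 0, 10]))  # one heavily studied protein
```

Under this measure, a proteome in which a few proteins hold most of the annotations scores close to 1, while an evenly annotated proteome scores 0.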

July 14, 2024
15:30-15:55
Harmonizing human and microbial datasets to explore mechanisms of the gut microbiome in disease
Confirmed Presenter: Brook Santangelo, University of Colorado Anschutz Medical Campus, United States
Track: Bio-Ontologies

Room: 522
Format: In Person
Moderator(s): Tiffany Callahan


Authors List:

  • Brook Santangelo, University of Colorado Anschutz Medical Campus
  • Marcin Joachimiak, Lawrence Berkeley National Laboratory
  • Harshad Hegde, Lawrence Berkeley National Laboratory
  • Lawrence Hunter, University of Chicago
  • Catherine Lozupone, University of Colorado Anschutz Medical Campus

Presentation Overview:

The integration of disparate forms of biological data is essential for understanding human health and disease. Doing so is particularly challenging in the context of microbe-host interactions that contribute to both positive and negative health outcomes. There are thousands of relevant microbial species, and many interactions among those microbes and with the host. To facilitate understanding of these complex interactions, information about host and microbial physiology, genetics, and metabolism, including their interactions, must be assembled. We address this technical challenge by harmonizing data in the form of a knowledge graph (KG) of the gut microbiome in disease. We present a KG that integrates enzymatic data of human and over 1,500 microbial proteomes, drawn from UniProt and 8 other reaction, enzymatic, genomic, chemical, pathway, and disease-oriented resources. We also provide a framework that supports customizable subsets which represent a microbial community of interest. We use a version of the graph constrained by gut microbes known to be correlated with disease that contains over 8 million nodes and 30 million edges. We apply a novel semantic search to identify meaningful mechanistic hypotheses for these microbe-disease relationships. Finally, we demonstrate the predictive capabilities of the KG by using graph embeddings to identify similarities among individual microbial taxa and human disease. This KG is an important enabling technology for automated methods to uncover mechanistic explanations for microbe-disease associations.

July 14, 2024
16:40-17:05
Using ontologies to make bioassay protocols machine readable
Confirmed Presenter: Alex Clark, Collaborative Drug Discovery, Canada
Track: Bio-Ontologies

Room: 522
Format: In Person
Moderator(s): Robert Hoehndorf


Authors List:

  • Alex Clark, Collaborative Drug Discovery
  • Barry Bunin, Collaborative Drug Discovery
  • Jason Harris, Collaborative Drug Discovery

Presentation Overview:

Bioassay protocols have lagged behind other areas of drug discovery in terms of digitization. While molecules and proteins have spawned entire disciplines (cheminformatics and bioinformatics), most archives of assays carried out by companies are siloed away as a combination of bespoke pick lists, plain text, and sporadic links to globally meaningful dictionaries. Published experiments often err on the side of terseness and obfuscation by referring to similar work. This leads to serious challenges for anyone who wants to federate data, search it effectively, or use it as the basis of any kind of machine learning inference. Reproducibility issues are further confounded by the difficulty of ascertaining whether any two experiments are comparable. Public ontologies can greatly improve the machine readability of assay protocols by virtue of having universal meaning. We will describe an open source project, BioAssay Express, that uses templates to gather and organize ontologies into a coherent user interface for curating data content. We have marked up 4000 assays from PubChem using our templates, plus another 2600 from the DataFAIRy project, using a hybrid automated model/expert curation workflow. This freely available data can be precisely and rapidly searched as well as used for sophisticated analysis techniques and model building. We have integrated these curation tools into a commercial product in order to make the process of creating marked-up data less work than traditional writeups, with the ultimate goal of making machine readable data the standard practice rather than a post-publication cleanup chore.

July 14, 2024
17:05-17:30
Knowledge graphs in Cancer Genomics: The Case of Mutational Signatures
Confirmed Presenter: Ulrike Steindl, Computational Biomedicine, University Hospital Aachen
Track: Bio-Ontologies

Room: 522
Format: In Person
Moderator(s): Robert Hoehndorf


Authors List:

  • Ulrike Steindl, Computational Biomedicine
  • Arnab Chakrabarti, Computational Biomedicine
  • Kjong-Van Lehmann, Computational Biomedicine

Presentation Overview:

Mutational signatures are generated from somatic genomic mutation data based on their sequence context and have been shown to be indicative of various functional changes in cancer patients. Studying cancer biology using mutational signatures is an emerging field of research, and the analyses are continuously being refined. The findings generated using this approach are manifold, making it challenging to draw conclusions. Because this knowledge is distributed, integrating it allows for new discoveries. In this work, we will introduce the Mutational Signature Ontology, an ontology describing the space of mutational signatures.

The Mutational Signature Ontology represents the numeric data of the signatures in COSMIC database version 3.4 (Sondka et al. 2023) and selected metadata. It is implemented as an OWL/RDF knowledge graph, encoding other necessary information regarding the sample used, and other features encoded in the COSMIC dataset.

We also integrated the discoveries based on Alexandrov et al. (2020), which provide a quantitative link between cancer types and mutational signatures. The tumor, etiology, and treatment classes of the Mutational Signature Ontology have been designed to be interoperable with the National Cancer Institute Thesaurus (NCIT).

The Mutational Signature Ontology models relations between mutational signatures, mutations, and their genomic locations, using concepts from the Gene Ontology (Ashburner et al. 2000) and Sequence Ontology (Eilbeck et al. 2005).

The Mutational Signature Ontology and knowledge graph provide missing links in the existing ontology space in oncology. They enable interaction between previously unrelated knowledge spaces and will allow for new predictions.

July 14, 2024
17:30-18:00
COSI Closing Remarks
Track: Bio-Ontologies

Room: 522
Format: In Person
Moderator(s): Robert Hoehndorf


Authors List:

  • Robert Hoehndorf
  • Tiffany Callahan

Presentation Overview:

Speaker Questions and COSI Closing / Community Discussion