All times listed are in UTC
- Marinka Zitnik, Harvard Medical School, United States
The success of machine learning depends heavily on the choice of features on which the algorithms are applied. For that reason, much effort goes into the engineering of informative features. In this talk, I describe our research in learning deep representations that are actionable and allow endpoint users to ask what-if questions and receive robust predictions that can be interpreted meaningfully. These methods specify deep graph neural functions that can flexibly embed data points into an embedding space, optimized to reflect the topology of input data. Our recent theoretical results allow us to augment the embeddings and make them robust and counterfactually fair. I will describe how these methods enabled the repurposing of drugs for an emerging pathogen and led to downstream validation in human cells. Last, I will highlight Therapeutics Data Commons (https://tdcommons.ai), a platform with AI/ML-ready datasets and tasks for therapeutics together with an ecosystem of tools, libraries, leaderboards, and community resources.
- Sanya Taneja, University of Pittsburgh, United States
- Tiffany Callahan, University of Colorado Anschutz Medical Campus, United States
- Mathias Brochhausen, University of Arkansas for Medical Sciences, United States
- Mary Paine, Washington State University, United States
- Sandra Kane-Gill, University of Pittsburgh, United States
- Richard Boyce, University of Pittsburgh, United States
Botanical and other natural products (NPs) are less widely represented in biomedical ontologies than conventional drugs. The growing use of NPs that have been implicated in clinically significant pharmacokinetic NP–drug interactions (NPDIs) renders addressing this knowledge gap imperative. In this study, we designed potential logical extensions to the Chemical Entities of Biological Interest (ChEBI) ontology that map information about NPs and NP constituents from the Global Substance Registration System (G-SRS). We extracted information from the G-SRS database using SQL; created semantically consistent logical representations for three case NPs (kratom, goldenseal, and green tea); and integrated them within the ChEBI ontology. The merged ontology contains NP information in computable form and is compatible with the principles of the Open Biomedical Ontologies Foundry. The potential logical extensions are the first step in advancing research related to NPDIs using biomedical ontologies and knowledge graphs.
- Jinzhou Yang, Maastricht University, Institute of Data Science, Netherlands
- Remzi Celebi, Maastricht University, Institute of Data Science, Netherlands
- Leoni Bücken, Maastricht University, Institute of Data Science, Netherlands
- Sarah Chenine, Maastricht University, Institute of Data Science, Netherlands
- Vincent Emonet, Maastricht University, Institute of Data Science, Netherlands
- Michel Dumontier, Maastricht University, Institute of Data Science, Netherlands
Motivation: Understanding the medical context of therapeutic intervention is crucial to its successful use in people. However, this contextual information is not recorded in a machine-readable manner, thereby limiting its use in query answering, clinical decision support, and computational drug discovery. Here, we describe a semi-automated approach to capture drug indications and their medical context. Our approach involves i) a pre-screening of relevant terms using natural language processing tools, and ii) the development and use of Nanobench semantic templates to facilitate data curation with support for term auto-completion from vocabulary standards. We apply our method to create the NeuroDKG, a knowledge graph for Neuropharmaceutical Drugs, which is available as a set of nanopublications.
Availability: The NeuroDKG is available at https://github.com/MaastrichtU-IDS/neuro_dkg
- Sarah Mullin, Yale University, United States
- Robert McDougal, Yale University, United States
- Kei-Hoi Cheung, Yale University, United States
- Halil Kilicoglu, University of Illinois Urbana-Champaign, United States
- Amanda Beck, Albert Einstein College of Medicine, United States
- Caroline Zeiss, Yale University, United States
Despite advances in identifying the biological basis of Alzheimer’s disease (AD) and dementia, there remain few chemical therapeutic interventions. One major challenge is the poor translation of effective therapies from animals to humans. Text mining translation-related characteristics, such as chemical interventions, can help to address this challenge. However, normalization to a standardized ontology that contains hierarchical relations and molecule structure information is challenging. We provide a reproducible, hierarchical, primarily dictionary-based method to normalize chemical mentions from PubTator to Chemical Entities of Biological Interest (ChEBI), a fully curated database and OBO Foundry ontology for molecular entities. To generate this mapping, we make use of external synonym databases, ChEBI parent-child relationships, and nearby context words. We found 277,844 PubMed abstracts related to Alzheimer’s and dementia in PubTator. Of the total 55,574 chemical mentions found in article titles, we normalized 49,966 mentions to 3,507 unique ChEBI entities. In addition, we were able to identify potential new candidate entities related to AD and dementia from the remaining 9.4%. Patterns that emerge from aggregation of standardized chemical interventions can help ascertain translational potential. In addition, effective and correct normalization in text mining is important for future downstream applications, such as improved efficacy and drug design.
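The dictionary-based step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the two toy dictionaries, and the identifiers (`CHEBI:X1`, `CHEBI:X2`) are placeholders invented for illustration and are not real ChEBI accessions. The sketch tries an exact name match first and then falls back to a synonym table, leaving the parent-child and context-word steps out.

```python
from typing import Iterable, Optional

# Toy lookup tables. The identifiers are placeholders, NOT real ChEBI accessions.
PRIMARY_NAMES = {"donepezil": "CHEBI:X1", "memantine": "CHEBI:X2"}
# Hypothetical external synonym table mapping brand names to primary names.
SYNONYMS = {"aricept": "donepezil", "namenda": "memantine"}


def normalize_mention(mention: str) -> Optional[str]:
    """Map a chemical mention to an identifier, or None if no entry matches."""
    # Lowercase and collapse whitespace so trivial variants share one key.
    key = " ".join(mention.lower().split())
    if key in PRIMARY_NAMES:
        return PRIMARY_NAMES[key]
    # Fall back to the synonym table, then resolve the primary name.
    canonical = SYNONYMS.get(key)
    if canonical is not None:
        return PRIMARY_NAMES.get(canonical)
    return None


def coverage(mentions: Iterable[str]) -> float:
    """Fraction of mentions that could be normalized."""
    items = list(mentions)
    if not items:
        return 0.0
    hits = sum(1 for m in items if normalize_mention(m) is not None)
    return hits / len(items)
```

Mentions that return `None` here correspond to the unnormalized remainder, which in the abstract is where new candidate entities were sought.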
- Min Wu, I2R, A*STAR, Singapore
- Yong Liu, Nanyang Technological University, Singapore
- Shike Wang, ShanghaiTech University, China
- Fan Xu, ShanghaiTech University, China
- Yunyang Li, ShanghaiTech University, China
- Jie Wang, ShanghaiTech University, China
- Ke Zhang, ShanghaiTech University, China
- Jie Zheng, ShanghaiTech University, China
Motivation: Synthetic lethality (SL) is a promising gold mine for the discovery of anti-cancer drug targets. Wet-lab screening of SL pairs is afflicted with high cost, batch-effect, and off-target problems. Current computational methods for SL prediction include gene knock-out simulation, knowledge-based data mining, and machine learning methods. Existing methods tend to assume that SL pairs are independent of each other, without taking into account their intrinsic correlation. Although several methods have incorporated genomic and proteomic data to aid SL prediction, these methods involve manual feature engineering that heavily relies on domain knowledge.
Results: Here we propose a novel graph neural network (GNN)-based model, named KG4SL, by incorporating knowledge graph message-passing into SL prediction. The knowledge graph was constructed using 11 kinds of entities (including genes, compounds, diseases, and biological processes) and 24 kinds of relationships that could be pertinent to SL. Integrating the knowledge graph helps address the independence issue and circumvents manual feature engineering by conducting message-passing on the knowledge graph. Our model outperformed all the state-of-the-art baselines in AUC, AUPR and F1. Extensive experiments, including the comparison of our model with an unsupervised TransE model, a vanilla GCN (graph convolutional network) model, and their combination, demonstrated the significant impact of incorporating the knowledge graph into the GNN for SL prediction.
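To make the idea of message-passing on a knowledge graph concrete, here is a generic sketch of one round of mean-aggregation over (head, relation, tail) triples. It is not the KG4SL architecture: the function name, the fixed 0.5/0.5 mixing weights, and the toy entities are assumptions for illustration, and the relation type is ignored, whereas a real model would typically use learned, relation-aware weights.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def message_passing_round(
    features: Dict[str, List[float]], triples: List[Triple]
) -> Dict[str, List[float]]:
    """Blend each node's vector with the mean of its neighbours' vectors."""
    neighbours = defaultdict(list)
    for head, _relation, tail in triples:
        # Treat edges as undirected; the relation type is ignored in this sketch.
        neighbours[head].append(tail)
        neighbours[tail].append(head)

    updated = {}
    for node, vec in features.items():
        msgs = [features[n] for n in neighbours[node] if n in features]
        if not msgs:
            # Isolated nodes keep their original features.
            updated[node] = list(vec)
            continue
        mean = [sum(col) / len(msgs) for col in zip(*msgs)]
        # Fixed 0.5/0.5 mix of self and neighbourhood (an illustrative choice).
        updated[node] = [0.5 * a + 0.5 * b for a, b in zip(vec, mean)]
    return updated


# Toy graph: two genes linked through a shared pathway entity.
toy_features = {"geneA": [1.0, 0.0], "geneB": [0.0, 1.0], "pathway": [1.0, 1.0]}
toy_triples = [
    ("geneA", "participates_in", "pathway"),
    ("geneB", "participates_in", "pathway"),
]
toy_updated = message_passing_round(toy_features, toy_triples)
```

After one round, each gene's vector carries information from the shared pathway, which is how graph context can relate gene pairs that would otherwise be scored independently.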
- Núria Queralt Rosinach, Leiden University Medical Center, Netherlands
- Paul Schofield, University of Cambridge, United Kingdom
- Robert Hoehndorf, King Abdullah University of Science and Technology, Saudi Arabia
- Claus Weiland, Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt am Main, Germany
- Erik Schultes, GO FAIR International Support and Coordination Office: Leiden, NL, Netherlands
- César Henrique Bernabé, Leiden University Medical Center, Netherlands
- Marco Roos, Leiden University Medical Center, Netherlands
One year ago, the novel COVID-19 infectious disease emerged and spread, causing high mortality and morbidity rates worldwide. In the OBO Foundry, there are more than one hundred ontologies to share and analyse large-scale datasets for biological and biomedical sciences. However, this pandemic revealed that we lack tools for an efficient and timely exchange of the epidemiological data that are necessary to assess the impact of disease outbreaks and the efficacy of mitigating interventions, and to provide a rapid response. Recently, several new COVID-19 ontologies have been developed, such as the IDO extension or CIDO. Hence, our research question was to determine whether there was a good representation of epidemiological quantitative concepts in OBO ontologies. Our objectives were to identify missing COVID-19 epidemiological terms and implement axiom patterns for extensions to existing ontologies or to build a new, logically well-formed, and accurate ontology in OBO. In this study we present our findings and contributions for the bio-ontologies community.
- Aditya Rao, Tata Consultancy Services Ltd, India
- Thomas Joseph, Tata Consultancy Services Ltd, India
- Vangala Govindakrishnan Saipradeep, Tata Consultancy Services Ltd, India
- Sujatha Kotte, Tata Consultancy Services Ltd, India
- Naveen Sivadasan, Tata Consultancy Services Ltd, India
- Rajgopal Srinivasan, Tata Consultancy Services Ltd, India
Comprehensive and high-quality entity dictionaries such as rare disease dictionaries are invaluable resources for use by clinicians and researchers, for building domain ontologies, and for a wide range of IR and IE tasks. We present a text-mining approach for rare disease dictionary augmentation. Towards this, we build a text-mining pipeline that retrieves new rare disease terms from MEDLINE that are highly related to a given set of dictionary terms and recommends the new terms in a ranked order. Unlike general disease terms, rare disease terms are significantly longer complex phrases consisting of multiple compound nouns and parts-of-speech tags such as conjunctions, determiners, prepositions and adjectives. Our pipeline uses syntactic and semantic similarity measures in combination with nearest neighbor search for efficient retrieval. We demonstrate the utility of our pipeline for augmenting the Orphanet rare diseases dictionary. Manual quality assessment of the top 10,000 rank ordered output revealed high quality recommendations with PR-AUC and mean precision of 0.947. Most terms among these were new synonyms of Orphanet terms (PR-AUC and mean precision of 0.902). We further show the utility of the recommended terms for improved IR tasks.
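The retrieve-and-rank idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: the term vectors are made-up toy values, the function names are hypothetical, and the search is brute-force cosine similarity rather than the efficient nearest neighbor index the abstract refers to.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(
    seed_vectors: List[Sequence[float]],
    candidates: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score each candidate by its best similarity to any seed term, highest first."""
    scored = []
    for term, vec in candidates.items():
        best = max((cosine(vec, s) for s in seed_vectors), default=0.0)
        scored.append((term, best))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-d vectors standing in for embeddings of dictionary and candidate terms.
toy_seeds = [[1.0, 0.0]]
toy_candidates = {
    "close term": [1.0, 0.1],
    "unrelated term": [0.0, 1.0],
    "partly related term": [0.5, 0.5],
}
toy_ranking = [term for term, _score in rank_candidates(toy_seeds, toy_candidates)]
```

At MEDLINE scale the exhaustive loop would be replaced by an approximate nearest neighbor index, but the ranking criterion stays the same.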
- Tiago Lubiana, University of São Paulo, Brazil
- João Vitor Ferreira Cavalcante, Federal University of Rio Grande do Norte, Brazil
PanglaoDB is a database of cell type markers widely used for single-cell RNA sequencing data analysis. PanglaoDB is in a 3-star category for Linked Open Data. Conforming data to W3C standards with cross-database links makes data 5-star and is a valuable step in making biological sources Findable, Accessible, Interoperable, and Reusable. Thus, we leveraged Wikidata, a freely editable knowledge graph database, to connect PanglaoDB to the semantic web. After creating classes and relations, we matched PanglaoDB's categories to Wikidata URIs and added the information via Wikidata's API. Then, we explored the 5-star data with SPARQL queries to ask questions like “which cell types express markers related to neurogenesis?” and “which diseases are related to human pancreatic beta cells?”. As Wikidata is connected to several biomedical resources, is under stable funding, and is continuously updated by contributors, it increases the magnitude and the stability of the contribution of PanglaoDB to the scientific community. The approach can be applied to any knowledge set of public interest (given proper permissions), providing a low-cost and low-barrier platform for sharing curated biological knowledge.
- Matthias Samwald
Recent years were marked by outstanding advances in the capabilities of artificial intelligence (AI). Deep learning enabled rapid progress on many benchmarks that were previously deemed difficult to tackle for machine learning algorithms, often achieving human-level performance. Models such as GPT-3 demonstrated problem-solving capabilities across a wide range of intelligence tasks. Given these developments, AI holds the potential to radically transform society through accelerating technological, biomedical, and societal knowledge creation and innovation. The realization of this potential increasingly hinges on our understanding of how to best define, train, measure, interconnect and utilize AI capabilities. The development of a shared ontology of AI tasks and capabilities is instrumental towards achieving such an understanding. In this talk, I will present recent work on the Intelligence Task Ontology, a large-scale ontology with broad coverage of artificial intelligence tasks, benchmarks, and datasets. I will demonstrate how such ontological models can form the foundation for harnessing AI in biomedical research and clinical decision-making.
- Susana Nunes, LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal
- Rita T. Sousa, LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal
- Catia Pesquita, LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal
Ontology-based approaches for predicting gene-disease associations include the more classical semantic similarity methods and more recently knowledge graph embeddings. While semantic similarity is typically restricted to hierarchical relations within the ontology, knowledge graph embeddings consider their full breadth. However, embeddings are produced over a single graph and complex tasks such as gene-disease association may require additional ontologies.
We investigate the impact of employing richer semantic representations that are based on more than one ontology, able to represent both genes and diseases and consider multiple kinds of relations within the ontologies. Our experiments demonstrate the value of employing knowledge graph embeddings based on random walks and highlight the need for a closer integration of different ontologies.
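As a rough illustration of the random-walk step that such embedding methods rely on, the sketch below samples fixed-length walks over a small graph that mixes gene, disease, and ontology-term nodes. The node names, edges, and function name are invented for illustration, and the abstract does not state which walk strategy was used. In walk-based embedding approaches, the resulting sequences are typically passed to a skip-gram style model to learn node vectors; that step is omitted here.

```python
import random
from typing import Dict, List


def random_walks(
    adjacency: Dict[str, List[str]],
    walk_length: int,
    walks_per_node: int,
    seed: int = 0,
) -> List[List[str]]:
    """Sample uniform random walks starting from every node in the graph."""
    rng = random.Random(seed)  # local generator so results are reproducible
    walks = []
    for start in sorted(adjacency):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adjacency.get(walk[-1], [])
                if not nbrs:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks


# Toy graph linking a gene and a disease through terms from two ontologies.
toy_graph = {
    "gene:G1": ["term:process_1"],
    "term:process_1": ["gene:G1", "term:phenotype_1"],
    "term:phenotype_1": ["term:process_1", "disease:D1"],
    "disease:D1": ["term:phenotype_1"],
}
toy_walks = random_walks(toy_graph, walk_length=4, walks_per_node=2)
```

Because the bridge edge between the two term nodes is the only path from gene to disease in this toy graph, walks can connect them only when the ontologies are linked, which mirrors the abstract's point about closer integration.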
- Paul Schofield, University of Cambridge, United Kingdom
- Robert Hoehndorf, King Abdullah University of Science and Technology, Saudi Arabia
- Sarah Alghamdi, King Abdullah University of Science and Technology, Saudi Arabia
The use of model organisms such as the mouse, fruit fly and zebrafish has been key in driving our understanding of human disease and its underlying biology for arguably a century, mainly due to the availability of genetic approaches. Many thousands of phenotypic annotations are now available for the major experimental model organisms. Different organisms offer differing strengths and weaknesses. When combining the phenotypic annotations across multiple model organisms, the strengths and weaknesses of each model may be compensated and coverage of the human genome can be optimised. Work over the past decade has demonstrated the power of cross-species phenotypic comparisons, and cross-species phenotype ontologies such as uPheno and the PhenomeNET ontology have been developed for this purpose. We report further development of the pan-species phenotype ontology PhenomeNet-Extended (Pheno-e), in particular including phenotypes from Schizosaccharomyces and Drosophila.
We apply ontology embeddings and unsupervised machine learning to measure the semantic similarity between phenotypes resulting from loss-of-function mutations in model organisms and their associated phenotypes. We demonstrate the different contributions of each species' phenotypic data to the identification of human gene-disease associations, and investigate the physiological and anatomical properties through which each species contributes.
- Cen Wan, Birkbeck, University of London, United Kingdom
The recent success of hierarchical feature selection methods enables us to discover knowledge from ontology data. In this work, we focus on discovering relationships between ageing and human phenotypic abnormalities by adopting a well-known hierarchical feature selection method. The selected human phenotype ontology terms further reveal strong links between ageing and developmental phenotypic abnormalities.
- Bio-Ontologies Community
The Plenary is an opportunity for Bio-Ontologies community members to discuss current developments and provide feedback to the organizers and the community.