Posters - Schedules

Monday, July 24, between 18:00 CEST and 19:00 CEST
Tuesday, July 25, between 18:00 CEST and 19:00 CEST
Session A Poster Set-up and Dismantle
Session A Posters set up:
Monday, July 24, between 08:00 CEST and 08:45 CEST
Session A Posters dismantle:
Monday, July 24, at 19:00 CEST
Session B Poster Set-up and Dismantle
Session B Posters set up:
Tuesday, July 25, between 08:00 CEST and 08:45 CEST
Session B Posters dismantle:
Tuesday, July 25, at 19:00 CEST
Wednesday, July 26, between 18:00 CEST and 19:00 CEST
Session C Poster Set-up and Dismantle
Session C Posters set up:
Wednesday, July 26, between 08:00 CEST and 08:45 CEST
Session C Posters dismantle:
Wednesday, July 26, at 19:00 CEST
Virtual
SciLinker: A Scalable Text Mining Framework for Inferring Gene-disease and Cell type-disease Associations from Scientific Knowledge
Track: Text Mining
  • Dongyu Liu, Sanofi US Services Inc, United States
  • Franck Rapaport, Sanofi US Services Inc, United States
  • Shameer Khader, Sanofi US Services Inc, United States
  • Emanuele de Rinaldis, Sanofi US Services Inc, United States


Presentation Overview:

Natural Language Processing (NLP) enables the extraction of valuable information from unstructured text data, such as biomedical entities and their relationships from biomedical literature. Here, we present SciLinker, an NLP framework to extract gene-disease and cell type-disease associations from PubMed abstracts.

We used the open-source ScispaCy library as the foundation for SciLinker and employed pre-trained named entity recognition models to identify human genes, cell types and diseases. We then normalized these genes, cell types, and diseases to the Unified Medical Language System (UMLS) knowledge base. Finally, we quantified the statistical significance of co-occurrences, normalized to the occurrence frequency of each separate entity. We applied this framework to extract over 1.2 million unique gene-disease associations for 27,013 genes and 14,219 diseases as well as over 200,000 unique cell type-disease associations for 1,010 cell types and 11,676 diseases from 35 million abstracts.
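
The co-occurrence scoring step described above can be illustrated with a minimal sketch. The abstract does not name the statistical test, so the one-sided hypergeometric test used here is an assumption, as are the function names `cooccurrence_pvalue` and `score_pairs` and the placeholder identifiers (`GENE_A`, `DISEASE_X`), which are not real UMLS concept IDs. The input is assumed to be the output of the entity recognition and normalization steps: one pair of sets (normalized genes, normalized diseases) per abstract.

```python
from collections import Counter
from itertools import product
from math import comb


def cooccurrence_pvalue(n_total, n_gene, n_disease, n_both):
    """One-sided hypergeometric tail probability P(X >= n_both).

    X is the number of abstracts mentioning both entities if the n_gene
    gene-mentioning abstracts were drawn at random from n_total abstracts,
    of which n_disease mention the disease. (Assumed test, not confirmed
    by the source.)
    """
    upper = min(n_gene, n_disease)
    tail = sum(
        comb(n_disease, k) * comb(n_total - n_disease, n_gene - k)
        for k in range(n_both, upper + 1)
    )
    return tail / comb(n_total, n_gene)


def score_pairs(abstracts):
    """Score every gene-disease pair that co-occurs in at least one abstract.

    abstracts: list of (genes, diseases) tuples, each element a set of
    normalized identifiers found in one abstract.
    Returns a dict mapping (gene, disease) to a p-value.
    """
    n_total = len(abstracts)
    gene_counts, disease_counts, pair_counts = Counter(), Counter(), Counter()
    for genes, diseases in abstracts:
        gene_counts.update(genes)          # abstracts mentioning each gene
        disease_counts.update(diseases)    # abstracts mentioning each disease
        pair_counts.update(product(genes, diseases))  # abstracts mentioning both
    return {
        (g, d): cooccurrence_pvalue(n_total, gene_counts[g], disease_counts[d], n)
        for (g, d), n in pair_counts.items()
    }


# Toy corpus with hypothetical identifiers, for illustration only.
toy_abstracts = [
    ({"GENE_A"}, {"DISEASE_X"}),
    ({"GENE_A"}, {"DISEASE_X"}),
    ({"GENE_A", "GENE_B"}, {"DISEASE_X"}),
    ({"GENE_B"}, {"DISEASE_Y"}),
    ({"GENE_B"}, {"DISEASE_Y"}),
    (set(), {"DISEASE_Y"}),
]

for pair, p in sorted(score_pairs(toy_abstracts).items(), key=lambda kv: kv[1]):
    print(pair, round(p, 3))
```

Because each p-value depends on how often the gene and the disease occur individually, a frequently mentioned entity is not rewarded merely for appearing in many abstracts. At the scale of tens of millions of abstracts, exact binomial coefficients become slow, so a numerical routine such as `scipy.stats.hypergeom` would likely be preferable, and the large number of tested pairs would call for multiple-testing correction.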

SciLinker is an integral part of our drug discovery pipeline. It supports us in generating and validating hypotheses about potential gene-disease relationships, facilitating target identification and credentialing. Moreover, it enables the validation of disease-driven “pathogenic” cell types, identified by leveraging both single-cell transcriptomics and population genetics, by checking them against the cell type-disease relationships extracted from the literature.