MyMED - An Internal XML Relational Database Implementation of MEDLINE Citations

K Lewis (1), CW Hogue (2)
(1) lewis@mshri.on.ca, Samuel Lunenfeld Research Institute, Mt Sinai Hospital, Dept of Biochemistry, University of Toronto; (2) hogue@mshri.on.ca, Samuel Lunenfeld Research Institute, Mt Sinai Hospital, Dept of Biochemistry, University of Toronto

The MyMED database is an internal relational database implementation of MEDLINE XML data leased from the National Library of Medicine (NLM). MEDLINE is a large database of more than twelve million medical bibliographic citations and author abstracts drawn from more than 4,600 biomedical journals. MyMED allows text mining algorithms and complex SQL queries to run quickly and securely against a local copy. The alternative is to issue multiple CGI calls to public MEDLINE databases, which impose a daily limit on the number of requests per machine. Citations in XML format are extracted from NLM files using the DB2 XML Extender and stored in an IBM DB2 Enterprise Edition Universal Database enabled for XML. Updates are processed daily using stored procedures. Selected XML text fields are indexed using the IBM DB2 Text Information Extender, which supports advanced queries including wildcard searching, proximity searching, scoring functions and thesaurus building. The system includes C and Perl APIs and uses XSLT to transform data from XML to text and MEDLINE formats.
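The extract-and-store step described above can be illustrated with a minimal sketch. It is not the MyMED implementation: Python's built-in sqlite3 module stands in for IBM DB2, ElementTree stands in for the DB2 XML Extender, and a plain SQL LIKE pattern stands in for the Text Information Extender's wildcard search. The table layout, function names and sample citations are hypothetical; only the element names (MedlineCitation, PMID, ArticleTitle, AbstractText) follow the MEDLINE XML format.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical two-record sample shaped like MEDLINE XML (not real citations).
SAMPLE_XML = """<MedlineCitationSet>
  <MedlineCitation>
    <PMID>1</PMID>
    <Article>
      <Journal><Title>Hypothetical Journal A</Title></Journal>
      <ArticleTitle>Protein interaction networks in yeast</ArticleTitle>
      <Abstract><AbstractText>We map protein interactions.</AbstractText></Abstract>
    </Article>
  </MedlineCitation>
  <MedlineCitation>
    <PMID>2</PMID>
    <Article>
      <Journal><Title>Hypothetical Journal B</Title></Journal>
      <ArticleTitle>Text mining of biomedical abstracts</ArticleTitle>
    </Article>
  </MedlineCitation>
</MedlineCitationSet>"""


def parse_citations(xml_text):
    """Extract (pmid, journal, title, abstract) tuples from citation XML."""
    root = ET.fromstring(xml_text)
    rows = []
    for cit in root.iter("MedlineCitation"):
        rows.append((
            int(cit.findtext("PMID")),
            cit.findtext("Article/Journal/Title", default=""),
            cit.findtext("Article/ArticleTitle", default=""),
            # Some citations have no abstract; store an empty string then.
            cit.findtext("Article/Abstract/AbstractText", default=""),
        ))
    return rows


def load_citations(conn, rows):
    """Create the (assumed) citation table and insert or update rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS citation ("
        "pmid INTEGER PRIMARY KEY, journal TEXT, title TEXT, abstract TEXT)"
    )
    # INSERT OR REPLACE lets a later update overwrite a revised citation.
    conn.executemany("INSERT OR REPLACE INTO citation VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def search_titles(conn, pattern):
    """Return PMIDs whose title matches a SQL LIKE pattern, in PMID order."""
    cur = conn.execute(
        "SELECT pmid FROM citation WHERE title LIKE ? ORDER BY pmid", (pattern,)
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_citations(conn, parse_citations(SAMPLE_XML))
    print(search_titles(conn, "%mining%"))
```

Because the data sits in a local table, a query of this kind runs without any request to a public MEDLINE server, which is the advantage the abstract attributes to MyMED. Proximity searching, scoring and thesaurus support have no direct equivalent in this sketch.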