Surfing data sources in drug discovery

Dennis Madsen
Novo Nordisk, dnnm@novonordisk.com

Using data from different experimental and computational methods is one of the key challenges in modern drug discovery. Much effort has gone into complex data integration schemes, and some commercial solutions have recently appeared. Asking the right questions is difficult, since most scientists have expert knowledge of only a few research methods. Drug discovery projects also extend over many years and may have several project managers. An overview of the available experimental data is therefore very useful as a starting point for asking more specific questions. A system has been designed in which the different data sources can be queried for specific types of information: project name, chemical compound id, biological sequence id, protein structure id, metabolic pathway name, and metabolite name. The user can query a single data source or all available data sources simultaneously. The data types impose a simple model on the data, and some data sources contain information for only some of the data types. Hits are displayed with links to the originating specialist data source and with an option to use the hit as the next query. The user can thereby browse, e.g., from a project name via the available protein structures to the chemical compounds that were bound to those structures. Alternatively, a project manager can get a quick overview of the information in the data sources based on, e.g., the project name. A prototype was developed as a web application, but the system has now been implemented as a Windows application. The data sources are available through CORBA services.
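The query model described above can be illustrated with a minimal Python sketch. It assumes in-memory dictionaries in place of the CORBA services, and every name in it is hypothetical and not taken from the actual system: the classes DataSource, Query and Hit, the function search, the method as_query, the link format, and the identifiers "ProjectX", "1ABC", "CMPD-7" and "CMPD-9". It shows three things: the six typed queries, searching one source or all sources, and re-using a hit as the next query to browse from a project via a protein structure to a bound compound.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Optional, Set, Tuple


class DataType(Enum):
    """The six query types named in the abstract."""
    PROJECT = "project name"
    COMPOUND = "chemical compound id"
    SEQUENCE = "biological sequence id"
    STRUCTURE = "protein structure id"
    PATHWAY = "metabolic pathway name"
    METABOLITE = "metabolite name"


@dataclass(frozen=True)
class Query:
    data_type: DataType
    value: str


@dataclass(frozen=True)
class Hit:
    source: str      # name of the originating specialist data source
    data_type: DataType
    value: str
    link: str        # hypothetical link format back into that source

    def as_query(self) -> Query:
        """Re-use this hit as the next query (the browsing step)."""
        return Query(self.data_type, self.value)


Key = Tuple[DataType, str]


class DataSource:
    """Hypothetical in-memory stand-in for one CORBA-served data source."""

    def __init__(self, name: str, supported: Set[DataType],
                 relations: Dict[Key, List[Key]]) -> None:
        self.name = name
        self.supported = supported    # data types this source holds information on
        self.relations = relations    # record -> related records

    def query(self, q: Query) -> List[Hit]:
        if q.data_type not in self.supported:
            return []                 # this source has nothing for that data type
        return [
            Hit(self.name, t, v, f"{self.name}:{t.name.lower()}/{v}")
            for t, v in self.relations.get((q.data_type, q.value), [])
        ]


def search(sources: Iterable[DataSource], q: Query,
           only: Optional[str] = None) -> List[Hit]:
    """Query one named source, or all sources when `only` is None."""
    hits: List[Hit] = []
    for src in sources:
        if only is None or src.name == only:
            hits.extend(src.query(q))
    return hits


# Usage with made-up example data.
P, S, C = DataType.PROJECT, DataType.STRUCTURE, DataType.COMPOUND
structures = DataSource("structures", {P, S}, {
    (P, "ProjectX"): [(S, "1ABC")],
    (S, "1ABC"): [(C, "CMPD-7")],
})
compounds = DataSource("compounds", {P, C}, {
    (P, "ProjectX"): [(C, "CMPD-9")],
})
sources = [structures, compounds]

overview = search(sources, Query(P, "ProjectX"))   # project overview across all sources
structure_hit = next(h for h in overview if h.data_type is S)
bound = search(sources, structure_hit.as_query())  # compounds bound to that structure
print([h.value for h in bound])  # ['CMPD-7']
```

A source that does not support a data type simply returns no hits. This mirrors the point that the simple type model lets each source answer only the queries it has data for. In the sketch, the compounds source therefore stays silent when queried by structure id.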
The simple data types are now being reused as building blocks in more advanced systems for specific high-level queries and for visualization, e.g. in microarray, metabonomics, and proteomics experiments.