Uncovering Hidden Linkages among Disparate Information Sources

Edy S. Liongosari1, Mitu Singh2
1edy.s.liongosari@accenture.com, Accenture Technology Labs; 2mitu.singh@accenture.com, Accenture Technology Labs

KDT is a tool that uses a knowledge modeling approach to intelligently extract and integrate a large set of disconnected bio-medical information. From fifteen publicly available sources totaling over 200 GB of raw data, the tool identifies over 2.5 million bio-medical entities and two billion relationships among them. Through extensive use of user-definable rules, the entities and relationships are cleansed, analyzed and integrated to form a single unified knowledge web. The rules also allow KDT users to customize which entities and relationships are displayed on the screen and how relationships are derived and prioritized.
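
As an illustration of how user-definable display rules might be expressed, the following minimal Java sketch treats each rule as a predicate over a relationship and orders the surviving relationships by certainty. The Relationship and DisplayRules classes, the "associated-with" relationship type, the entity names and the certainty values are all hypothetical; the abstract does not describe KDT's actual rule language.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical relationship record; the field names are illustrative assumptions.
class Relationship {
    final String source;
    final String target;
    final String type;
    final double certainty;

    Relationship(String source, String target, String type, double certainty) {
        this.source = source;
        this.target = target;
        this.type = type;
        this.certainty = certainty;
    }
}

// Hypothetical rule set: every rule must accept a relationship for it to be shown.
class DisplayRules {
    private final List<Predicate<Relationship>> rules = new ArrayList<>();

    DisplayRules require(Predicate<Relationship> rule) {
        rules.add(rule);
        return this;
    }

    // Keeps relationships that pass all rules, highest certainty first.
    List<Relationship> apply(List<Relationship> all) {
        List<Relationship> shown = new ArrayList<>();
        for (Relationship r : all) {
            boolean keep = true;
            for (Predicate<Relationship> rule : rules) {
                if (!rule.test(r)) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                shown.add(r);
            }
        }
        shown.sort(Comparator.comparingDouble((Relationship r) -> r.certainty).reversed());
        return shown;
    }

    public static void main(String[] args) {
        List<Relationship> all = new ArrayList<>();
        all.add(new Relationship("GeneX", "DiseaseA", "associated-with", 0.7));
        all.add(new Relationship("GeneX", "DiseaseB", "associated-with", 0.9));
        all.add(new Relationship("GeneX", "PaperP", "mentioned-in", 0.95));
        all.add(new Relationship("GeneY", "DiseaseA", "associated-with", 0.3));

        DisplayRules rules = new DisplayRules()
            .require(r -> r.type.equals("associated-with"))
            .require(r -> r.certainty >= 0.6);

        for (Relationship r : rules.apply(all)) {
            System.out.println(r.source + " -> " + r.target + " (" + r.certainty + ")");
        }
    }
}
```

Modeling rules as composable predicates is only one possible design; it keeps each rule independent, so a user could add or remove one without touching the others.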

This allows KDT to go beyond simply showing a web page of links pertinent to a particular subject. It allows its users to see how the entities are linked together at different levels of certainty. It uncovers hidden linkages by intelligently traversing the integrated knowledge web and deriving additional relationships. It also highlights certain unusual links that might be worth exploring. KDT comes with a unique web-enabled user interface that allows users to navigate through a large knowledge web with ease, view a large set of relationships at once and create additional annotations.
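
The idea of uncovering an indirect link by traversing the knowledge web can be sketched as a best-path search over a graph whose edges carry certainty scores. The sketch below is an assumption-laden illustration, not KDT's algorithm: the KnowledgeWeb class, the entity names, the certainty values in (0, 1], and the choice to combine certainties along a path by multiplication are all hypothetical. Under those assumptions, a Dijkstra-style search that maximizes the product finds the most certain chain between two entities.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical in-memory knowledge web: entity -> (neighbor -> certainty).
class KnowledgeWeb {
    private final Map<String, Map<String, Double>> links = new HashMap<>();

    // Adds an undirected link; certainty must lie in (0, 1] so that
    // extending a path can never increase its combined certainty.
    void addLink(String a, String b, double certainty) {
        if (certainty <= 0.0 || certainty > 1.0) {
            throw new IllegalArgumentException("certainty must be in (0, 1]");
        }
        links.computeIfAbsent(a, k -> new HashMap<>()).put(b, certainty);
        links.computeIfAbsent(b, k -> new HashMap<>()).put(a, certainty);
    }

    // Returns the path with the highest product of certainties,
    // or an empty list if the two entities are not connected.
    List<String> bestPath(String source, String target) {
        Map<String, Double> best = new HashMap<>();
        Map<String, String> previous = new HashMap<>();
        PriorityQueue<Map.Entry<String, Double>> queue = new PriorityQueue<>(
            (x, y) -> Double.compare(y.getValue(), x.getValue()));
        best.put(source, 1.0);
        queue.add(new AbstractMap.SimpleEntry<>(source, 1.0));

        while (!queue.isEmpty()) {
            Map.Entry<String, Double> current = queue.poll();
            String node = current.getKey();
            if (current.getValue() < best.getOrDefault(node, 0.0)) {
                continue; // stale queue entry, a better path was already found
            }
            if (node.equals(target)) {
                break;
            }
            Map<String, Double> neighbors = links.getOrDefault(node, Collections.emptyMap());
            for (Map.Entry<String, Double> edge : neighbors.entrySet()) {
                double candidate = current.getValue() * edge.getValue();
                if (candidate > best.getOrDefault(edge.getKey(), 0.0)) {
                    best.put(edge.getKey(), candidate);
                    previous.put(edge.getKey(), node);
                    queue.add(new AbstractMap.SimpleEntry<>(edge.getKey(), candidate));
                }
            }
        }

        if (!best.containsKey(target)) {
            return Collections.emptyList();
        }
        LinkedList<String> path = new LinkedList<>();
        for (String n = target; n != null; n = previous.get(n)) {
            path.addFirst(n);
        }
        return path;
    }

    // Combined certainty of a path returned by bestPath (product of its edges).
    double certaintyOf(List<String> path) {
        double product = 1.0;
        for (int i = 0; i + 1 < path.size(); i++) {
            product *= links.get(path.get(i)).get(path.get(i + 1));
        }
        return product;
    }

    public static void main(String[] args) {
        KnowledgeWeb web = new KnowledgeWeb();
        web.addLink("DiseaseA", "GeneX", 0.9);
        web.addLink("GeneX", "DiseaseB", 0.8);
        web.addLink("DiseaseA", "DiseaseB", 0.5);
        List<String> path = web.bestPath("DiseaseA", "DiseaseB");
        System.out.println(path); // prints [DiseaseA, GeneX, DiseaseB]
        System.out.println(web.certaintyOf(path));
    }
}
```

In this toy example the two diseases are directly linked with low certainty (0.5), while the indirect route through a shared gene is more certain (about 0.72), so the search surfaces the indirect link.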

KDT has proven useful in the early stages of the drug discovery process at pharmaceutical companies in several ways. First, it can quickly show researchers how two seemingly unrelated entities, such as two diseases, may be related to each other. This helps researchers quickly narrow the scope of their literature search and decide which route to pursue. Second, it can quickly prioritize potential biological targets and identify the relationships among those targets. Third, it can identify potential collaboration opportunities among its users by matching their bookmarks, navigation history and profiles, which helps remove potentially redundant activities across the organization.
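
One simple way to match users by their bookmarks is set overlap. The sketch below uses Jaccard similarity (shared bookmarks divided by all distinct bookmarks of the two users) and suggests pairs above a threshold. This measure, the CollaborationMatcher class, the user names and the threshold are assumptions chosen for illustration; the abstract does not state how KDT performs the matching, and navigation history and profiles are left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical matcher: suggests user pairs whose bookmarked entities overlap.
class CollaborationMatcher {

    // Jaccard similarity: |A intersect B| / |A union B|; 0.0 when both sets are empty.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    // Returns "userA & userB" for every pair whose similarity meets the threshold.
    // Users are sorted by name so the output order is deterministic.
    static List<String> suggestPairs(Map<String, Set<String>> bookmarksByUser, double threshold) {
        List<String> users = new ArrayList<>(bookmarksByUser.keySet());
        Collections.sort(users);
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < users.size(); i++) {
            for (int j = i + 1; j < users.size(); j++) {
                double score = jaccard(bookmarksByUser.get(users.get(i)),
                                       bookmarksByUser.get(users.get(j)));
                if (score >= threshold) {
                    pairs.add(users.get(i) + " & " + users.get(j));
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> bookmarks = new HashMap<>();
        bookmarks.put("alice", new HashSet<>(Arrays.asList("GeneX", "DiseaseA", "ProteinP")));
        bookmarks.put("bob", new HashSet<>(Arrays.asList("DiseaseA", "ProteinP", "CompoundC")));
        bookmarks.put("carol", new HashSet<>(Arrays.asList("GeneZ")));
        System.out.println(suggestPairs(bookmarks, 0.5)); // prints [alice & bob]
    }
}
```

The pairwise comparison is quadratic in the number of users, which is acceptable for a sketch; a real deployment would likely need an index over bookmarked entities.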

KDT is implemented as a three-tier web-based architecture. The back-end components, which are responsible for data extraction, integration and link creation, rely heavily on a relational database, third-party data interfaces and a large set of custom Java components. The middle-tier application server is based on Java 2 Enterprise Edition. The front-end visualization component is a Java applet that can be downloaded to any Java-enabled web browser.