Automating Data Collection And Categorisation Using The CAS Software

John Gill1
1john.gill@monash.edu.au, Victorian Bioinformatics Consortium, Monash University

Biologists often wish to obtain related information from two or more biological databases, frequently grouping and relating the data elements in one or more ways. This presents a number of problems, including differing information formatting styles, database design and inconsistencies, and the act of searching multiple databases for the required data elements. When this needs to be done for a large number of biological items, or repeated frequently to keep the information up to date, it can quickly become a laborious, if not painstaking, task. To help alleviate this problem we have developed the CAS (Categorised Annotation Sets) system. This system allows the user to define information categories, incoming data formats and data sources. Each category acts as an object, consisting of both attribute and method elements. By defining the external data sources and their accompanying data formats, the user can create association links between the incoming data fields and the category attributes. Each association may also be given one or more prioritized sequences of conditional association rules, giving the user complete control over how data are allocated to each of the category attributes. Methods defined within a category may be used for data formatting, data manipulation or external database calls, using one or more fields within the category attribute set. These methods can be attached to one or more of the category attributes and may be triggered by various events, such as data additions, updates or deletions, or by external method calls. An inbuilt Display Manager allows the user to navigate through the CAS dataset, create new associations between data elements and investigate data relationships. Through the CAS software and its underlying data management concepts, large ongoing biological data sets, consisting of information obtained from multiple data sources, can be built and maintained automatically.
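The category model described above can be illustrated with a minimal sketch. It is not the CAS implementation or its API: the names Category, AssociationRule, associate, on and load, the event labels "add" and "update", and the example field names are all hypothetical, chosen only to show how prioritized conditional rules and event-triggered methods might fit together.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]  # one incoming row from an external data source


@dataclass(order=True)
class AssociationRule:
    """Hypothetical rule: copy source_field into an attribute if condition holds."""
    priority: int  # lower number = tried first; the only field used for ordering
    source_field: str = field(compare=False)
    condition: Callable[[Record], bool] = field(compare=False)


@dataclass
class Category:
    """Hypothetical category object holding attributes, rules and triggered methods."""
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    rules: Dict[str, List[AssociationRule]] = field(default_factory=dict)
    triggers: Dict[Tuple[str, str], List[Callable[["Category", str], None]]] = field(
        default_factory=dict
    )

    def associate(self, attribute: str, rule: AssociationRule) -> None:
        """Link an incoming field to an attribute, keeping rules in priority order."""
        self.rules.setdefault(attribute, []).append(rule)
        self.rules[attribute].sort()

    def on(self, event: str, attribute: str, method: Callable[["Category", str], None]) -> None:
        """Attach a method to an attribute for a given event ("add" or "update")."""
        self.triggers.setdefault((event, attribute), []).append(method)

    def load(self, record: Record) -> None:
        """Apply the first matching rule per attribute, then fire attached methods."""
        for attribute, rules in self.rules.items():
            for rule in rules:
                if rule.source_field in record and rule.condition(record):
                    event = "update" if attribute in self.attributes else "add"
                    self.attributes[attribute] = record[rule.source_field]
                    for method in self.triggers.get((event, attribute), []):
                        method(self, attribute)
                    break  # lower-priority rules are skipped once one matches


def uppercase(category: Category, attribute: str) -> None:
    """Example formatting method: normalise the stored value to upper case."""
    category.attributes[attribute] = category.attributes[attribute].upper()


# Illustrative data only: prefer a gene name, fall back to a locus tag.
gene = Category("Gene")
gene.associate("symbol", AssociationRule(2, "locus_tag", lambda r: True))
gene.associate("symbol", AssociationRule(1, "gene_name", lambda r: bool(r["gene_name"])))
gene.on("add", "symbol", uppercase)

gene.load({"gene_name": "", "locus_tag": "b0001"})
first_symbol = gene.attributes["symbol"]   # fallback rule matched, "add" method ran
gene.load({"gene_name": "thrL", "locus_tag": "b0001"})
second_symbol = gene.attributes["symbol"]  # top-priority rule matched on "update"
```

In this sketch the first record has an empty gene name, so the lower-priority locus tag rule supplies the value and the formatting method fires on the "add" event; the second record satisfies the top-priority rule and overwrites the attribute as an "update". Deletion events and external database calls are omitted for brevity.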
A complementary software product, GET3D (Genomic Exploration Tool in 3D), can be used to navigate, manipulate and build onto the CAS database. This tool uses 3D visualization and interactive techniques to navigate and manipulate the CAS dataset, including the creation and management of visual links and data associations.