cDNA2Genome: A TOOL FOR MAPPING AND ANNOTATING cDNAS

Coral del Val1, Karl-Heinz Glatting2, S. Suhai
1c.delval@dkfz.de, Department of Molecular Biophysics DKFZ, German Cancer Research Center; 2glatting@dkfz.de, Department of Molecular Biophysics DKFZ, German Cancer Research Center

The aim of high-quality annotation is to identify the key features of a genome, in particular the genes and their products, in order to bridge the considerable gap between large-scale data collection and its interpretation. The tools and resources for annotation are developing rapidly; however, high-throughput data analysis requires the correct combination of these applications and databases.

In this context we have developed several analysis tools for the semi-automatic annotation of sequences. In collaboration with other NGFN/DHGP research groups we have designed cDNA2Genome, a web tool for the automatic high-throughput mapping and characterization of cDNAs, including full-length cDNAs. It uses existing annotation data and, where possible, improves them with the most up-to-date databases, especially for ESTs, proteins and mRNAs. cDNA2Genome focuses on determining the exon-intron structure of the cDNA, which is assessed exhaustively with a large number of gene prediction approaches. The final result of cDNA2Genome is an XML file that contains all relevant information obtained by the task. This XML output can easily be used in subsequent analyses (e.g. in pipelines or for integrating the data into databases). At the same time, web users can inspect the XML output through a graphical display in a standard web browser. The graphical representation provides an interactive view of the annotations, and the complete outputs of the individual applications are immediately accessible via hyperlinks.
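As an illustration of how such XML output might be consumed in a downstream pipeline, the following minimal Python sketch reads exon coordinates and derives the intervening introns. The element and attribute names (`cdna2genome`, `cdna`, `exon`, `start`, `end`) and the sample record are assumptions made for this example only; the actual cDNA2Genome schema is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical output layout, invented for illustration;
# the real cDNA2Genome XML schema may differ.
SAMPLE_XML = """
<cdna2genome>
  <cdna id="example_cdna" chromosome="chr1" strand="+">
    <exon start="1000" end="1200"/>
    <exon start="1500" end="1650"/>
    <exon start="2000" end="2300"/>
  </cdna>
</cdna2genome>
"""


def exon_intron_structure(xml_text):
    """Map each cDNA id to its exons and the introns between them.

    Coordinates are treated as 1-based and inclusive (an assumption).
    """
    root = ET.fromstring(xml_text)
    result = {}
    for cdna in root.iter("cdna"):
        # Sort exons by genomic start so neighbouring exons are adjacent.
        exons = sorted(
            (int(e.get("start")), int(e.get("end"))) for e in cdna.iter("exon")
        )
        # An intron spans the gap between two consecutive exons.
        introns = [
            (prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
        ]
        result[cdna.get("id")] = {"exons": exons, "introns": introns}
    return result


structure = exon_intron_structure(SAMPLE_XML)
```

A database loader or a further analysis step could iterate over `structure` in the same way, which is the kind of reuse the XML format is intended to support.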

cDNA2Genome has been implemented under the W3H task framework using HUSAR (Heidelberg Unix Sequence Analysis Resources, http://genome-dkfz-heidelberg.de). This framework allows the straightforward implementation of a combination of bioinformatics tools. Furthermore, the task system allows the immediate integration of cDNA2Genome into the W2H web interface, which gives easy access to sequence databases as well as analysis programs. It also provides secure storage of sequences and results, and secure access through HTTPS (SSL, HTTP://www.nyx.net/%7Elmulcahy/ssl.html) and SSH (ftp://ftp.cs.hut.fi/pub/ssh/). Through the W3H task system, cDNA2Genome supports the sequential or parallel computation of many sequences for large-scale analysis (batch processes).
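The difference between sequential and parallel batch processing can be sketched in a few lines of Python. This is not the W3H task system: `annotate_cdna` is a hypothetical stand-in for a single annotation task (here it only reports sequence length and GC fraction), and `run_batch`, the worker count and the sample sequences are likewise assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def annotate_cdna(record):
    """Hypothetical placeholder for one annotation task.

    Returns the sequence name with its length and GC fraction.
    """
    name, seq = record
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC") / len(seq) if seq else 0.0
    return name, {"length": len(seq), "gc": gc}


def run_batch(records, parallel=True, workers=4):
    """Apply annotate_cdna to every record, sequentially or in parallel."""
    if parallel:
        # pool.map preserves input order, so both modes give the same result.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return dict(pool.map(annotate_cdna, records))
    return dict(map(annotate_cdna, records))


# Made-up input sequences for the example.
records = [("seq1", "ATGCGC"), ("seq2", "ATATAT"), ("seq3", "GGGCCC")]
batch_results = run_batch(records, parallel=True)
```

Because each sequence is processed independently, the same batch can be run one sequence at a time or distributed across workers without changing the results.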