SeqHound: biological sequence and structure database as a platform for bioinformatics research

Katerina Michalickova (1), Hao Lieu (2), Gary D. Bader, Michel Dumontier, Doron Betel, Ruth Isserlin, Christopher W.V. Hogue
(1) katerina@mshri.on.ca, Samuel Lunenfeld Research Institute and Department of Biochemistry, University of Toronto; (2) lieu@mshri.on.ca, Samuel Lunenfeld Research Institute

SeqHound [http://seqhound.mshri.on.ca] is an integrated biological sequence and 3-D structure database system designed to serve as a bioinformatics resource for a research group. It is based on the National Center for Biotechnology Information (NCBI) data model and programming tools. It offers the daily updated contents of all Entrez sequence databases, together with 3-D structural data. It also holds links to sequence redundancies, sequence neighbours, taxonomy, complete genomes, structural domains as defined by Reverse Position Specific BLAST, functional annotation including Gene Ontology terms, and literature links to PubMed. SeqHound can be accessed remotely, via a web server, through a Perl, BioPerl, C or C++ API, or locally through an optimized C API. The API provides the functionality needed to retrieve sequences, structures and structural domains using any of the links listed above as a query. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats; structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on automated retrieval by identifier, while fielded text indexing remains under development. SeqHound also offers a streamlined WWW interface for simple queries by web users. The source code and examples are available under the terms of the GNU General Public License as part of the SLRI Toolkit at the SourceForge site [http://sourceforge.net/projects/slritools/].
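
Since FASTA is one of the retrieval formats mentioned above, the following minimal Python sketch shows how a client program might consume FASTA text once it has been retrieved. It assumes the record arrives as plain text. The function name `parse_fasta` and the sample record are hypothetical illustrations; they are not part of the SeqHound API and do not represent real SeqHound output.

```python
from typing import Dict, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its sequence.

    Sequence lines belonging to one record are concatenated; blank lines
    and any text before the first header are ignored.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the rest of the line is its definition line.
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


# Hypothetical record for illustration only; not real database content.
SAMPLE = """>example_id hypothetical protein
MKTAYIAKQR
QISFVKSHFS
"""

if __name__ == "__main__":
    for header, sequence in parse_fasta(SAMPLE).items():
        print(header, len(sequence))
```

The parser keeps the whole definition line as the key, so callers that need only the identifier can split it on the first space.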