BlastNP: A new sequence similarity searching and visualization method

Jan C. Biro¹
¹ jan.biro@kbh.ki.se, Homulus Informatics

Background

BLAST® (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. The standard nucleotide-nucleotide BLAST [blastN] has relatively low sensitivity. The standard protein-protein BLAST [blastP] is much more sensitive, but the number of known real protein sequences is limited. It is possible to combine the advantages of nucleotide and protein BLAST searches by translating the nucleotide query [blastX], the database [TblastN] or both [TblastX] into real or conceptual protein frames.

Results

An alternative to TblastX, called blastNP, has been developed. Nucleic acids in the database and query sequences were translated into overlapping protein-like sequences (overlappingly translated sequences, or OTSs) before searching with blastP. Thus, each nucleic acid sequence is represented by a single protein-like sequence (instead of three hypothetical proteins in different reading frames). The blastNP method is defined as a blastP search performed on an overlappingly translated nucleic acid database using a similarly converted nucleic acid query. The specificity and sensitivity of blastNP and TblastX are very similar; however, blastNP is more sensitive in detecting short sequence similarities (fewer than 50 residues).
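As an illustration only, the overlapping translation step might be sketched as follows. The sketch assumes that an OTS is obtained by sliding a three-nucleotide window along the sequence one base at a time and translating each window with the standard genetic code; the function name `overlapping_translate`, the use of `*` for stop codons and `X` for windows containing ambiguous bases are hypothetical choices made for this example, not details taken from the method description above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def overlapping_translate(seq, unknown="X"):
    """Translate every overlapping triplet (window 3, step 1) of a nucleotide
    sequence into one residue, giving a single protein-like string.

    Hypothetical helper: stop codons become '*', and triplets that contain
    ambiguous bases become `unknown`.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], unknown) for i in range(len(seq) - 2)
    )


if __name__ == "__main__":
    ots = overlapping_translate("ATGGCC")
    print(ots)        # one residue per overlapping triplet
    print(ots[0::3])  # residues that belong to the first forward reading frame
```

Under these assumptions a sequence of n nucleotides gives an OTS of n - 2 residues, and taking every third residue starting at offset 0, 1 or 2 recovers the conventional translation of the corresponding forward reading frame, so the three frames are interleaved in one string.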

Conclusion

BlastNP combines the advantages of nucleotide and protein BLAST searches and bypasses many difficulties: (1) it is more sensitive to weak sequence similarities than blastN; (2) codon redundancy is eliminated; (3) sensitivity to single nucleotide polymorphisms, mutations and sequencing errors is reduced; (4) it is insensitive to frame shifts.

Key words: blast, translation, nucleic acid, databases, data management, methods and systems, sequence analysis, new technologies and methods, data visualization.