SeqFreq: A Statistical Repetitive Motif Discovery Tool

Roger Craig¹, Li Liao², Javier Garcia-Frias, Adam Marsh
¹rcraig@eecis.udel.edu, University of Delaware; ²lliao@cis.udel.edu, University of Delaware

SeqFreq is a software tool for discovering and analyzing repetitive motifs in biological sequences. It uses a numerical suffix tree method to rank n-mers quantitatively by criteria such as length, GC%, frequency, and maximum likelihood estimates. It can find either all n-mers or only those of a user-specified length. The search can be exhaustive, finding every repetitive motif, or it can prune the tree during construction according to specified parameters and thresholds. Discovering and enumerating motifs, their structure, and their statistical distributions remains an active area of investigation. Multiple microbial genomes can be compared and contrasted on the basis of n-mer choice and usage. How the choice and usage of these n-mers correlate with differing phenotypes, environmental habitats, genomic entropy, and genomic complexity is an area of future interest.
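The abstract does not describe SeqFreq's implementation, so the following Python sketch is only a rough illustration of the general idea, not SeqFreq's numerical suffix tree. It builds a depth-limited suffix trie level by level over a DNA string and prunes any node whose occurrence count falls below a threshold. Pruning on frequency is safe because an n-mer can never occur more often than its own prefix. Surviving n-mers are then scored by GC% and by an observed/expected ratio under an i.i.d. base model whose base probabilities are maximum likelihood estimates (relative frequencies). The names `frequent_nmers`, `gc_percent`, `obs_exp_ratio`, and `min_count` are hypothetical.

```python
from collections import Counter, defaultdict
from math import prod


def frequent_nmers(seq, min_count=2, max_len=None):
    """Level-wise suffix-trie construction with frequency pruning (sketch).

    Returns a dict mapping every n-mer that occurs at least `min_count`
    times (overlapping occurrences counted) to its list of start positions.
    """
    seq = seq.upper()
    n = len(seq)
    if max_len is None:
        max_len = n
    # Depth 1 of the trie: start positions of every single symbol.
    level = defaultdict(list)
    for i, ch in enumerate(seq):
        level[ch].append(i)
    result = {}
    k = 1
    while level and k <= max_len:
        # Prune: a child's count never exceeds its parent's, so a node below
        # the threshold cannot have frequent descendants.
        level = {m: pos for m, pos in level.items() if len(pos) >= min_count}
        result.update(level)
        if k == max_len:
            break
        # Extend each surviving k-mer by the next symbol at each occurrence.
        nxt = defaultdict(list)
        for m, positions in level.items():
            for i in positions:
                if i + k < n:
                    nxt[m + seq[i + k]].append(i)
        level = nxt
        k += 1
    return result


def gc_percent(m):
    """Percentage of G or C symbols in the n-mer."""
    return 100.0 * sum(c in "GC" for c in m) / len(m)


def obs_exp_ratio(m, count, seq):
    """Observed/expected count under an i.i.d. model with MLE base frequencies."""
    seq = seq.upper()
    freqs = Counter(seq)
    p = prod(freqs[c] / len(seq) for c in m)
    expected = (len(seq) - len(m) + 1) * p
    return count / expected
```

For example, `frequent_nmers("ACGTACGTACGA", min_count=2)` keeps `AC` (3 occurrences) but prunes `GA` (1 occurrence), and its longest surviving n-mer is `ACGTACG`. The results can then be sorted by any of the scores, e.g. `sorted(motifs, key=lambda m: (len(motifs[m]), len(m)), reverse=True)` to rank by frequency and then length. Pruning on GC% or on the observed/expected ratio would not be safe in the same way, since those scores are not monotone along a trie path.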