Annotation of non-coding RNA molecules in cattle genomic sequences

Brian Dalrymple1, Sean McWilliam2, Wes Barris, Pradeep Tokachichu
1Brian.Dalrymple@csiro.au, CSIRO Livestock Industries; 2Sean.McWilliam@csiro.au, CSIRO Livestock Industries

Non-coding RNA molecules play an important, but probably underappreciated, role in cellular function. It is quite possible that many important production traits in animals involve non-coding RNA molecules. However, whilst protein-coding regions are generally well annotated, the annotation of non-coding RNAs is very poor. The full sequencing of the bovine genome is likely to be underway towards the end of 2003. In preparation for this we are establishing an annotation pipeline for the identification of putative small and micro RNA molecules in the bovine genome, and several complementary methods have been implemented so far. To identify members of characterised families of small RNAs, a combination of BLAST and INFERNAL (Eddy, 2002, BMC Bioinformatics 3:18) is used with the RNA covariance models in Rfam (http://www.sanger.ac.uk/Software/Rfam/). The sequence of each BAC is used to search the database of Rfam family members filtered to <90% identity. Regions of the BACs that give matches with an E-value < 10 are then compared against the relevant Rfam family covariance model, essentially as described by Griffiths-Jones et al., 2003 (Nucleic Acids Research 31(1):439-441). To identify potential microRNAs, bovine BAC sequences are searched with BLAST against a database of precursor sequences from the miRNA registry (http://www.sanger.ac.uk/Software/Rfam/mirna/search/shtml). To identify snoRNAs and other known small RNAs, BLAST searches of focussed databases are carried out. To identify potential new non-coding structural RNAs, a combination of genome-specific BLAST and QRNA (Rivas and Eddy, 2001, BMC Bioinformatics 2:8), a prototype structural non-coding RNA gene finder, is used. Alignments with a log-odds score of >= 5.0 are processed further and compared with other genomes. The results of the analysis of bovine BAC-end sequences using these approaches will be shown.
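The BLAST pre-filtering step for the Rfam search can be illustrated with a minimal Python sketch. It is not the cmblast script itself. It assumes BLAST hits are available in the standard 12-column tabular format (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The function names and the `family_of` callback that maps a subject ID to its Rfam family are hypothetical and introduced only for illustration. Hits with E-value < 10 are grouped by BAC and family, and overlapping query intervals are merged into candidate regions that could then be passed to a covariance model search.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 10.0  # threshold quoted in the abstract


def parse_tabular_hits(lines):
    """Yield (query, subject, qstart, qend, evalue) from 12-column tabular BLAST lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Order the query coordinates defensively so start <= end.
        qstart, qend = sorted((int(fields[6]), int(fields[7])))
        yield fields[0], fields[1], qstart, qend, float(fields[10])


def candidate_regions(lines, family_of, cutoff=E_VALUE_CUTOFF):
    """Return {(query, family): [(start, end), ...]} of merged hit regions.

    `family_of` is a hypothetical callback mapping a subject ID to a family name.
    """
    intervals = defaultdict(list)
    for query, subject, start, end, evalue in parse_tabular_hits(lines):
        if evalue < cutoff:
            intervals[(query, family_of(subject))].append((start, end))

    regions = {}
    for key, ivs in intervals.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= merged[-1][1]:
                # Overlaps the previous region: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        regions[key] = [tuple(m) for m in merged]
    return regions
```

Merging overlapping hits before the covariance model step keeps the slower structural comparison restricted to a small number of non-redundant regions per family.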
The authors would like to thank Sam Griffiths-Jones for sharing the cmblast script used on the Rfam website with us.