Bioinformatics Tools to Support siRNA Technology

Fran Lewitter (1), Bingbing Yuan (2), Markus Hossbach, Thomas Tuschl, George Bell, Robert Latek
(1) lewitter@wi.mit.edu, Biocomputing Group, Whitehead Institute, Cambridge, MA; (2) yuan@wi.mit.edu, Biocomputing Group, Whitehead Institute, Cambridge, MA

One way to study the function of a gene is to reduce or eliminate its expression in a cell. This approach is particularly useful for studying a gene whose dysfunction is known or thought to cause a particular disease. A promising technology for studying the role of an individual gene is RNA interference (RNAi) and its effector molecule, short interfering RNA (siRNA). A properly selected short double-stranded RNA (~21 nucleotides) can specifically silence gene expression. We have built a first-generation computational tool for siRNA selection (http://jura.wi.mit.edu/bioc/siRNA) that implements selection algorithms to identify siRNAs with a high probability of specifically silencing the target gene.
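Candidate generation can be pictured as scanning the target transcript for short windows that fit a sequence pattern. The sketch below assumes, as an example only, the target pattern AA(N19)TT written as a regular expression over a DNA-alphabet (cDNA) sequence; the function name `find_candidates` and the default pattern are illustrative assumptions, not the tool's actual code. Wrapping the pattern in a zero-width lookahead ensures that overlapping windows are all reported.

```python
import re

# Assumed example pattern: AA, then any 19 bases, then TT (23 nt in total).
DEFAULT_PATTERN = r"AA[ACGT]{19}TT"


def find_candidates(mrna: str, pattern: str = DEFAULT_PATTERN):
    """Return (start, window) pairs for every window of `mrna` matching `pattern`.

    `start` is the 0-based position of the window in the transcript.
    """
    # Normalize case and accept RNA input by converting U to T.
    seq = mrna.upper().replace("U", "T")
    # A lookahead makes each match zero-width, so overlapping windows are not skipped.
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(seq)]


if __name__ == "__main__":
    target = "GG" + "AA" + "C" * 19 + "TT" + "GG"
    for start, window in find_candidates(target):
        # Under the assumed pattern, the 19 bases between AA and TT form the siRNA core.
        print(start, window, window[2:21])
```

Under this assumed pattern, `window[2:21]` is the 19-nt core that would form the duplex region; a user-supplied pattern could be passed as the `pattern` argument instead.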

Our tool gives the biologist the flexibility to use predefined siRNA patterns or to enter a custom pattern. Several filters let users constrain oligonucleotide sequence characteristics such as GC content, base variations, and the number of consecutive identical bases. Because the goal of using an siRNA is to silence a specific gene in a mammalian cell, its base-pairing region must be selected carefully to avoid similarity to any unrelated mRNA. To this end, the program runs a similarity search of each candidate siRNA against the human or mouse UniGene database. Each candidate is then mapped to the human or mouse genome to indicate whether it spans an exon-exon boundary. To help users choose an siRNA from a region of minimal genetic variation, the tool also displays single nucleotide polymorphisms (SNPs) within the region of each candidate.
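The sequence filters and genomic annotations described above can be sketched as follows. Everything here is a hypothetical illustration: the thresholds (30-70% GC, at most three identical bases in a row), the function names, and the input format (candidate windows as `(start, sequence)` pairs in transcript coordinates, exon lengths in transcript order, and SNP positions already projected onto the transcript) are assumptions. The UniGene similarity search, which would typically be run with a BLAST-style tool, is not shown.

```python
from itertools import accumulate


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a non-empty sequence."""
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated base (sequence assumed non-empty)."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def exon_junctions(exon_lengths):
    """0-based transcript positions where each exon after the first begins."""
    return list(accumulate(exon_lengths))[:-1]


def spans_junction(start, length, junctions):
    """True if the window [start, start + length) contains bases from two exons."""
    return any(start < j < start + length for j in junctions)


def snps_in_window(start, length, snp_positions):
    """SNP positions (transcript coordinates) that fall inside the window."""
    return [p for p in snp_positions if start <= p < start + length]


def filter_and_annotate(candidates, exon_lengths, snp_positions,
                        gc_range=(30.0, 70.0), max_run=3):
    """Drop candidates failing the (assumed) sequence filters; annotate the rest."""
    junctions = exon_junctions(exon_lengths)
    kept = []
    for start, seq in candidates:
        gc = gc_percent(seq)
        if not gc_range[0] <= gc <= gc_range[1] or longest_run(seq) > max_run:
            continue
        kept.append({
            "start": start,
            "seq": seq,
            "gc": round(gc, 1),
            # Reported as flags, not used to filter.
            "spans_junction": spans_junction(start, len(seq), junctions),
            "snps": snps_in_window(start, len(seq), snp_positions),
        })
    return kept
```

Junction status and SNPs are reported alongside each surviving candidate rather than used to discard it, matching the description of the tool showing this information to the user.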

We have also designed a prototype database, sirBank ("The siRNA data bank"), as a repository for siRNA molecules known to silence target genes. Through a web interface, users can browse the data or search on fields such as target gene name, accession number, species, and investigator. Entries have been collected from the literature, and users can also submit data online.
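The field-based search offered by sirBank's web interface can be approximated with a simple in-memory model. This is a hypothetical sketch: the `SiRNARecord` fields, the `search` function, and the case-insensitive substring matching are assumptions, and the example records are placeholders rather than real sirBank entries.

```python
from dataclasses import dataclass


@dataclass
class SiRNARecord:
    """Hypothetical sirBank entry; field names are illustrative."""
    gene_name: str
    accession: str
    species: str
    investigator: str
    sequence: str


def search(records, **criteria):
    """Return records matching every criterion (case-insensitive substring match).

    Example: search(records, species="human", gene_name="gene1").
    An unknown field name raises AttributeError.
    """
    def matches(rec):
        return all(
            str(value).lower() in str(getattr(rec, field)).lower()
            for field, value in criteria.items()
        )
    return [rec for rec in records if matches(rec)]


if __name__ == "__main__":
    # Placeholder records, not real data.
    records = [
        SiRNARecord("GENE1", "ACC0001", "human", "Investigator A", "ACGTACGTACGTACGTACG"),
        SiRNARecord("GENE2", "ACC0002", "mouse", "Investigator B", "TTGGCCAATTGGCCAATTG"),
    ]
    for rec in search(records, species="Human"):
        print(rec.gene_name, rec.accession)
```

Combining criteria with `all` gives AND semantics, so adding fields narrows the result; calling `search` with no criteria returns every record, which corresponds to browsing.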