One of the essential features of eukaryotic genes is that a single gene can produce many mRNA isoforms. Sequencing of the human and mouse genomes led to the identification of about 30 thousand genes, in contrast to earlier estimates of up to 120 thousand human genes. Recent estimates indicate that at least 60% of human genes are alternatively spliced, while the frequency of alternative splicing of mouse genes was estimated at 41%. Regulated alternative splicing is involved in human genetic disease, and up to 15% of disease-causing point mutations are thought to cause aberrant splicing. Thus alternative splicing emerges as a major mechanism for generating proteome diversity.
It is clear that current EST databases are incomplete, so a direct comparison of alternative splicing isoforms deduced from EST data would severely underestimate the conservation of alternative splicing. Instead, we apply a conservative approach: we analyze whether an alternatively spliced region observed in one species is present in the genome of the other species. A variant that is present in the other genome may still be non-functional there due to changes in regulation. Thus our analysis may underestimate the true extent of non-conserved alternative splicing.
Analysis of GenBank, the draft human genome, and two databases of alternative splicing, AsMamDB and HASDB, resulted in the identification of 166 pairs of orthologous alternatively spliced human and mouse genes. We considered four types of elementary alternatives: cassette (on/off) exons, alternative donor and acceptor splice sites, and retained introns. We did not distinguish cassette and alternative (mutually exclusive) exons, as the comparative genomic approach does not allow one to take into account dependencies between elementary alternatives. 126 alternatively spliced human genes contained 177 cassette exons, 51 alternative acceptor sites, 52 alternative donor sites, and 12 retained introns. The total number of known human elementary alternatives was 285. 124 alternatively spliced mouse genes contained 123 cassette exons, 46 alternative acceptor sites, 53 alternative donor sites, and 29 retained introns. The total number of known mouse elementary alternatives was 252. Only 51 elementary alternatives were described in both genomes, which comprises about 20% of all elementary alternatives.
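The four types of elementary alternatives can be illustrated with a minimal sketch. It is not the procedure used in this study: it assumes exons are given as hypothetical (start, end) coordinate pairs on the plus strand, compares one exon of an isoform against a reference isoform of the same gene, and, like the analysis above, does not separate cassette exons from mutually exclusive ones. The function name `classify_exon` and all coordinates are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical exon representation: (start, end), 1-based inclusive,
# plus-strand coordinates. Minus-strand genes would swap donor and acceptor.
Exon = Tuple[int, int]


def classify_exon(exon: Exon, reference: List[Exon]) -> str:
    """Label one exon relative to a reference isoform of the same gene."""
    start, end = exon
    if exon in reference:
        return "shared"
    # Introns of the reference isoform are the gaps between adjacent exons.
    introns = [(left[1] + 1, right[0] - 1)
               for left, right in zip(reference, reference[1:])]
    # Retained intron: the exon spans a whole reference intron.
    if any(start < i_start and i_end < end for i_start, i_end in introns):
        return "retained_intron"
    # Cassette exon: the exon lies entirely inside a reference intron.
    if any(i_start <= start and end <= i_end for i_start, i_end in introns):
        return "cassette_exon"
    # Alternative donor: same exon start, different 3' end (plus strand).
    if any(r_start == start and r_end != end for r_start, r_end in reference):
        return "alternative_donor"
    # Alternative acceptor: same exon end, different 5' start (plus strand).
    if any(r_end == end and r_start != start for r_start, r_end in reference):
        return "alternative_acceptor"
    return "other"


# Toy reference isoform with three exons (illustrative coordinates only).
reference_isoform = [(1, 100), (201, 300), (401, 500)]

for candidate in [(331, 370), (1, 120), (181, 300), (201, 500)]:
    print(candidate, classify_exon(candidate, reference_isoform))
```

The checks are ordered so that an exon spanning a whole intron is labeled a retained intron before the weaker boundary comparisons are tried.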
We observed that only 69-83% of elementary alternatives were conserved. The degree of conservation was higher in mRNA-derived alternatives (76-83%) compared to EST-derived ones (69-75%), and higher for mouse alternatives (73-83%) than for human ones (69-76%). About half of alternatively spliced genes had at least one species-specific isoform. The differences between human and mouse data and between mRNA and EST data are not large and may be explained by differences in EST coverage of the human and mouse genomes, especially with regard to relatively young species-specific isoforms with likely narrow tissue or stage specificity.
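The breakdown of conservation by species and by evidence type can be sketched as a simple tally. The record layout, the function name `conservation_by_group`, and the toy records below are assumptions made for illustration; they are not the study's data and do not reproduce the percentages reported above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (species, evidence source, conserved in other genome?)
Record = Tuple[str, str, bool]


def conservation_by_group(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Return the fraction of conserved alternatives per (species, evidence)."""
    totals = defaultdict(int)
    conserved = defaultdict(int)
    for species, evidence, is_conserved in records:
        key = (species, evidence)
        totals[key] += 1
        if is_conserved:
            conserved[key] += 1
    return {key: conserved[key] / totals[key] for key in totals}


# Toy records only; the values are invented to exercise the function.
toy_records = [
    ("human", "mRNA", True), ("human", "mRNA", True),
    ("human", "mRNA", True), ("human", "mRNA", False),
    ("human", "EST", True), ("human", "EST", False),
    ("mouse", "mRNA", True), ("mouse", "mRNA", True),
    ("mouse", "EST", True), ("mouse", "EST", False),
]

for group, fraction in sorted(conservation_by_group(toy_records).items()):
    print(group, f"{fraction:.0%}")
```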
These results demonstrate considerable diversity of alternative splicing in the human and mouse genomes. Orthologous genes with non-conserved isoforms may play a role in species-specific development and in speciation in general.
R.N. Nurtdinov, I.I. Artamonova, A.A. Mironov, M.S. Gelfand (2003) Low conservation of alternative splicing patterns in the human and mouse genomes. Human Molecular Genetics, accepted