Microsatellites (simple sequence repeats, SSRs) are the fastest-mutating sequences in genomes. In structural genes, a change in the number of repeats can cause a frameshift mutation, leading to the expression of a completely different or truncated protein. Microsatellites in genic regions are therefore subject to strong selection during evolution. To explore how genic microsatellites vary among tree species, we used the SPUTNIK program to analyze 30,000 expressed sequence tag (EST) sequences each from pine (Pinus spp.), poplar (Populus spp.) and eucalyptus (Eucalyptus spp.), all retrieved from the NCBI database. The proportion of ESTs containing microsatellites was similar in eucalyptus and poplar (18.71% and 15.33%, respectively) but considerably lower in pine (8.22%). Trinucleotide repeats were the predominant microsatellite type in the ESTs of all three species. Apart from trinucleotide repeats, the abundance of the other microsatellite types decreased as repeat unit length increased in eucalyptus and poplar, whereas the opposite trend was observed in pine. Notably, pine ESTs contained clearly fewer highly variable microsatellites (>20 bp) than those of eucalyptus and poplar. In all three species, the rate at which microsatellites gained or lost repeat units decreased as repeat unit length increased. This study is the first to report a comparison of genic microsatellites among tree species, and it reveals similarities and differences in abundance and variability between the EST microsatellites of pine and those of poplar and eucalyptus. Because microsatellites within genes strongly affect gene function, these results provide important parameters for understanding the characteristics of genic microsatellites in different tree species, as well as useful bioinformatic guidance for developing highly polymorphic EST-SSR markers in the species studied.
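As a rough illustration of the kind of tally reported above (the share of ESTs containing microsatellites, counts by repeat unit length, and the number of repeats longer than 20 bp), the following minimal Python sketch scans sequences for perfect tandem repeats with a regular expression. It is not the SPUTNIK algorithm, which is not described here. The thresholds `MIN_UNIT`, `MAX_UNIT` and `MIN_REPEATS`, the function names, and the toy sequences are all assumptions made for illustration; only the 20 bp cut-off comes from the text.

```python
import re
from collections import Counter

# Hypothetical thresholds: the abstract does not state the settings used.
MIN_UNIT, MAX_UNIT = 2, 5   # repeat unit lengths scanned (di- to pentanucleotide)
MIN_REPEATS = 4             # minimum number of tandem copies (assumed)
LONG_BP = 20                # the abstract treats SSRs longer than 20 bp as highly variable


def is_primitive(unit):
    """True if unit is not itself a repetition of a shorter motif (e.g. 'ATAT', 'AA')."""
    return unit not in (unit + unit)[1:-1]


def find_ssrs(seq):
    """Return (start, unit, copies) for each perfect tandem repeat found in seq."""
    seq = seq.upper()
    hits = []
    for unit_len in range(MIN_UNIT, MAX_UNIT + 1):
        # One captured unit followed by at least MIN_REPEATS - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, MIN_REPEATS - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if not is_primitive(unit):
                continue  # skip homopolymers and motifs reported at a shorter unit length
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


def summarise(ests):
    """Return (fraction of sequences with an SSR, counts by unit length, count of SSRs > LONG_BP)."""
    with_ssr = 0
    by_unit_len = Counter()
    long_ssrs = 0
    for seq in ests.values():
        hits = find_ssrs(seq)
        if hits:
            with_ssr += 1
        for _, unit, copies in hits:
            by_unit_len[len(unit)] += 1
            if len(unit) * copies > LONG_BP:
                long_ssrs += 1
    fraction = with_ssr / len(ests) if ests else 0.0
    return fraction, by_unit_len, long_ssrs


# Toy sequences, invented for this example (not data from the study).
toy_ests = {
    "est1": "GGC" + "AT" * 6 + "CCG",
    "est2": "TTA" + "CAG" * 5 + "ACT",
    "est3": "ACGTTGCAAGCTTAGC",
}
fraction, by_unit_len, long_ssrs = summarise(toy_ests)
```

A perfect-repeat regular expression is the simplest possible detector; tools used in practice may tolerate mismatches and apply different length thresholds, so counts from this sketch would not be expected to reproduce the percentages reported above.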