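Bioinformatic methods were used to analyze EST-SSR loci in the transcriptomes of tea plant (Camellia sinensis) leaves at different developmental stages (single bud and third leaf), with the aim of providing a useful reference for marker-assisted breeding in tea plant. Using the SSRFINDER software, large-scale SSR screening was performed on 97,454 unigenes (100.91 Mb) obtained by high-throughput sequencing of the tea plant bud and leaf transcriptome. A total of 36,249 SSR loci were identified, distributed across 27,913 unigenes. This gives an occurrence frequency of 28.64% and an average distribution density of one SSR per 2.78 kb. In total, 962 repeat motifs were found in the bud and leaf transcriptome sequences. Dinucleotide repeats were dominant, accounting for 59.19% of all SSRs, with (AG/CT)n as the main motif (23.78% of all SSRs). Trinucleotide and mononucleotide repeats followed, at 20.92% and 13.13%, respectively. Compared with the whole-organ transcriptome, SSRs of each repeat-motif type in the bud and leaf transcriptome were shorter on average. Medium-length repeats of 18 bp made up the largest proportion, accounting for 23.37% of all SSRs, whereas longer repeats exceeding 25 bp were very rare. The SSR distribution characteristics of transcriptomes from different tea plant organs were also compared, to provide a reference for subsequent molecular marker development.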
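The occurrence frequency and distribution density quoted above follow from simple ratios of the reported counts. The short Python sketch below recomputes them under two assumptions. Frequency is taken as the number of unigenes containing at least one SSR divided by the total number of unigenes. Density is taken as total sequence length divided by the number of SSRs, with 1 Mb taken as 1,000 kb. The function names are illustrative only.

```python
def ssr_frequency_pct(unigenes_with_ssr: int, total_unigenes: int) -> float:
    """Percentage of unigenes that contain at least one SSR (assumed definition)."""
    return 100.0 * unigenes_with_ssr / total_unigenes


def ssr_spacing_kb(total_length_mb: float, ssr_count: int) -> float:
    """Mean sequence length per SSR in kb, i.e. density expressed as 1/N kb."""
    return total_length_mb * 1000.0 / ssr_count


# Counts taken from the abstract: 27,913 of 97,454 unigenes carry SSRs,
# and 36,249 SSRs occur in 100.91 Mb of unigene sequence.
freq = ssr_frequency_pct(27913, 97454)
spacing = ssr_spacing_kb(100.91, 36249)
print(f"frequency: {freq:.2f}%")
print(f"density: 1/{spacing:.2f} kb")
```

These ratios reproduce the reported 28.64% and 1/2.78 kb. Dividing SSR count by unigene count instead (36,249 / 97,454) would give about 37.2%, so the reported frequency is consistent with the per-unigene definition.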
Bioinformatic methods were used to analyze EST-SSR loci in the transcriptomes of tea plant (Camellia sinensis) leaves at different developmental stages (single bud and third leaf), in order to provide an effective reference for marker-assisted breeding in tea plant. In total, 97,454 unigenes (100.91 Mb) derived from deep sequencing of the Camellia sinensis bud and third-leaf transcriptome were used to develop functional microsatellite, or simple sequence repeat (EST-SSR), molecular markers. Using the SSRFINDER software, 36,249 SSRs were identified in 27,913 unigenes. The SSR occurrence frequency was 28.64% and the mean distribution density was one SSR per 2.78 kb. Dinucleotide repeats were the major repeat type, accounting for 59.19%, while trinucleotide and mononucleotide repeats accounted for 20.92% and 13.13%, respectively. Among the 962 SSR motifs, A/T, AG/CT and AAG/CTT were the most frequent mononucleotide, dinucleotide and trinucleotide motifs, accounting for 12.98%, 45.95% and 5.59% of all SSR repeat motifs, respectively. Microsatellites 18 bp in length made up the largest proportion, accounting for 23.37%, while microsatellites longer than 25 bp made up only 3.6%. The results indicate that unigenes obtained by transcriptome sequencing of C. sinensis are an effective resource for developing SSR loci. Additional SSR primers will provide more markers for studying genetic variation in C. sinensis.
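The SSR search and motif grouping summarized above can be illustrated with a small Python sketch. It is not SSRFINDER's algorithm. It detects only perfect repeats with regular expressions and uses hypothetical minimum repeat counts (`MIN_REPEATS`), because the abstract does not state the actual search criteria. It groups each motif with its rotations and reverse complement, so that AG, GA, CT and TC all fall under the AG/CT class used in the text. The toy sequences and all function names are illustrative.

```python
import re
from collections import Counter

# Hypothetical minimum repeat counts per unit length (1-6 nt); placeholders only,
# not SSRFINDER's settings.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'AGAG')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def motif_class(motif: str) -> str:
    """Merge a motif with its rotations and reverse complement, e.g. 'GA' -> 'AG/CT'."""
    def canon(m: str) -> str:
        return min(m[i:] + m[:i] for i in range(len(m)))
    return "/".join(sorted([canon(motif), canon(revcomp(motif))]))


def find_ssrs(seq: str) -> list[tuple[str, int, int, int]]:
    """Return (motif_class, unit_len, start, length) for each perfect SSR found."""
    seq = seq.upper()
    hits = []
    for k, n in MIN_REPEATS.items():
        # A k-nt unit followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' runs already counted as mono
                hits.append((motif_class(motif), k, m.start(), m.end() - m.start()))
    return hits


def summarize(unigenes: dict[str, str]) -> dict:
    """Count SSRs, SSR-bearing sequences, motif classes and SSR lengths."""
    per_seq = {name: find_ssrs(seq) for name, seq in unigenes.items()}
    hits = [h for seq_hits in per_seq.values() for h in seq_hits]
    with_ssr = sum(1 for seq_hits in per_seq.values() if seq_hits)
    return {
        "ssr_count": len(hits),
        "seqs_with_ssr": with_ssr,
        "frequency_pct": 100.0 * with_ssr / len(unigenes),
        "motif_classes": Counter(h[0] for h in hits),
        "lengths": Counter(h[3] for h in hits),
    }


# Illustrative toy input, not data from the study.
toy_unigenes = {
    "u1": "CCC" + "AG" * 9 + "TTC",
    "u2": "GC" + "AAG" * 6 + "C",
    "u3": "ACGTACGTTGCA",
}
report = summarize(toy_unigenes)
print(report["ssr_count"], report["seqs_with_ssr"])
print(report["motif_classes"])
print(report["lengths"])
```

On the toy input the sketch reports two 18 bp SSRs, one AG/CT and one AAG/CTT, in two of the three sequences. A real pipeline would also need to handle compound and imperfect repeats and resolve overlaps between unit sizes, which this sketch ignores.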