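The transcriptome of Pinus yunnanensis was sequenced on the Illumina HiSeq 2000 next-generation high-throughput sequencing platform, and the resulting data were assembled de novo, yielding 80 000 Unigenes with an N50 of 1 881 nt and an average length of 890 nt. Comparison against public databases annotated 43 434, 46 415 and 29 418 Unigenes in the NR, NT and Swiss-Prot databases, respectively. Comparison against the COG database annotated 14 792 Unigenes, which fell into roughly 25 functional classes. Comparison against the GO database annotated 26 743 Unigenes, grouped by function into 3 main categories (cellular component, molecular function and biological process) and 55 subcategories, with biological process the most represented. With the KEGG database as reference, 25 873 Unigenes were assigned to 128 metabolic pathway branches, concentrated mainly in metabolism-related pathways, and Unigenes for key enzymes of lignin synthesis were identified. These results greatly expand the gene resources of P. yunnanensis and will support gene discovery and utilization, molecular marker development and research on the genetic improvement of its germplasm resources.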
The transcriptome of Pinus yunnanensis was sequenced using the Illumina HiSeq 2000 platform. De novo assembly yielded a total of 80 000 Unigenes with an average length of 890 nt and an N50 of 1 881 nt. Of these, 43 434, 46 415 and 29 418 Unigenes showed significant similarity to known sequences in the NR, NT and Swiss-Prot databases, respectively. A total of 14 792 Unigenes were annotated in the Clusters of Orthologous Groups of proteins (COG) database and assigned to 25 clusters. A total of 26 743 Unigenes were annotated in Gene Ontology (GO) and grouped into three functional categories (biological process, cellular component and molecular function) and 55 sub-categories, with biological process the most represented. A total of 25 873 Unigenes were assigned to 128 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, most of which were related to metabolism. Unigenes encoding key enzymes of lignin biosynthesis were also identified. The sequence data for P. yunnanensis will be helpful for gene discovery and utilization, molecular marker development and genetic improvement in further research.
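The two assembly statistics quoted above, average length and N50, can be illustrated with a minimal sketch. It uses the common definition of N50: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. The function names and the toy length list are hypothetical, chosen only for illustration; they are not the study's data or the software the authors used.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    at which the running total first reaches half of all assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


def mean_length(lengths):
    """Return the arithmetic mean of the sequence lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    return sum(lengths) / len(lengths)


# Hypothetical toy assembly (lengths in nt), not data from the study.
toy_lengths = [200, 300, 400, 600, 900, 1200]

if __name__ == "__main__":
    print("N50:", n50(toy_lengths))
    print("Mean length:", mean_length(toy_lengths))
```

Because N50 is weighted by sequence length, it is typically larger than the mean when an assembly contains many short sequences, which is consistent with the reported N50 of 1 881 nt against an average of 890 nt.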