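【Objective】To analyze codon usage bias in the coding sequences (CDS) of the pineapple [Ananas comosus (L.) Merr.] genome, providing a theoretical basis for understanding the codon usage patterns of pineapple and for its molecular modification, and to promote biological research on plant codons.【Methods】The 30 663 CDS obtained from pineapple genome sequencing were used as the data source. Codon usage bias, double codons (codon pairs) and multivariate statistics were analyzed with custom Perl scripts, CUSP and SPSS.【Results】The average GC content of the CDS in the pineapple genome was 52.09%, the average GC content at the third codon position (GC3S) was 55.41%, and the effective number of codons (ENC) was 58.41, with the vast majority of ENC values greater than 35. In addition, 34 high-frequency codons (RSCU > 1) were identified, of which only 8 end in A/T and 25 end in C/G; 31 high-expression optimal codons were also identified. Combining these results, 13 optimal codons were finally selected. A comparison of GC3S and codon usage frequency with 17 plant species showed large differences between dicotyledons and monocotyledons, and pineapple was closer to the dicotyledons than the other monocotyledons were.【Conclusion】Codon usage bias in pineapple was analyzed at three levels (across genes, across positions within genes, and across plant species), and 13 optimal codons of pineapple were identified. This study helps to clarify the codon usage patterns of pineapple and supports plant codon biology research and the potential application of genome data in non-model plants.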
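The GC content and GC3S figures in the Results are per-sequence base-composition measures. The sketch below shows one common way to compute them for a single CDS. It assumes the standard genetic code, excludes Met (ATG), Trp (TGG) and stop codons from GC3S because they have no synonymous alternative, and ignores ambiguous bases; the helper names `in_frame_codons`, `gc_content` and `gc3s` are hypothetical and not taken from the authors' Perl scripts. Genome-wide averages such as those reported would then be obtained by averaging the per-gene values, e.g. `sum(gc3s(s) for s in cds_list) / len(cds_list)`, although the exact averaging used in the study is not stated here.

```python
# Codons excluded from GC3s: Met/Trp have a single codon each, stops are not sense codons.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def in_frame_codons(seq: str) -> list[str]:
    """Split a CDS into complete in-frame codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def gc_content(seq: str) -> float:
    """Percentage of G+C among unambiguous (A/C/G/T) bases of the whole CDS."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return float("nan")
    return 100 * sum(b in "GC" for b in bases) / len(bases)


def gc3s(seq: str) -> float:
    """Percentage of G+C at the third position of synonymous sense codons."""
    thirds = [c[2] for c in in_frame_codons(seq)
              if set(c) <= set("ACGT") and c not in EXCLUDED]
    if not thirds:
        return float("nan")
    return 100 * sum(b in "GC" for b in thirds) / len(thirds)


if __name__ == "__main__":
    cds = "ATGGCCGCAAAGTGGTAA"  # toy sequence, not pineapple data
    print(gc_content(cds))  # 9 of 18 bases are G or C
    print(gc3s(cds))        # third positions of GCC, GCA, AAG: C, A, G
```

For the toy sequence, ATG, TGG and TAA are skipped, so GC3S is computed from only three codons, while GC content counts every base.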
【Objective】Codon usage bias refers to differences in the frequency with which synonymous codons occur in coding DNA. A codon is a series of three nucleotides (a triplet) that encodes a specific amino acid residue in a polypeptide chain or signals the termination of translation (stop codons). Over a long evolutionary history, each species has developed its own codon usage pattern. Pineapple [Ananas comosus (L.) Merr.] is a nutrient-dense fruit with strong consumer demand and high commercial value. However, little is known about the rules governing pineapple codon usage. The aim of the present study was to investigate the codon usage pattern in the genome sequencing data of pineapple, in order to provide guidance for genetic transformation, new gene discovery, regulation of functional gene expression, prediction of protein structure and function, comparative genomics research with other species, and molecular breeding in pineapple.【Methods】Data were obtained from the JGI database. We analyzed the 30 663 genes in the pineapple genome sequencing data with Perl scripts, CUSP and SPSS software, calculating GC content, the effective number of codons (ENC), relative synonymous codon usage (RSCU) and double codons. The RSCU value of a codon is its observed frequency divided by the frequency expected if all synonymous codons for the same amino acid were used equally. In the absence of codon usage preference, the RSCU of each synonymous codon is 1. When the RSCU of a codon is greater than 1, the codon is defined as a high-frequency codon, indicating that it is used more often than its synonyms and that the gene prefers it. The ENC value describes the degree to which codon usage deviates from random selection, and thus reflects the strength of preference for particular synonymous codons within a codon family; it ranges from 20 (only one codon used per amino acid) to 61 (all synonymous codons used equally). The smaller the ENC value, the stronger the codon usage bias, which is generally associated with a higher expression level of the corresponding endogenous gene.
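The RSCU and ENC definitions in the Methods can be made concrete with a short sketch. It assumes the standard genetic code and uses Wright's (1990) original ENC formula, ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where Fk is the mean codon homozygosity of the amino acids with k synonymous codons; the Perl scripts and CUSP settings used in the study may differ (for example in how rare amino acids are handled), and the function names below are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons ordered TCAG at each position ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Sense codons grouped by the amino acid they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def codon_counts(seq: str) -> Counter:
    """Count in-frame sense codons; stops, partial and ambiguous codons are skipped."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")


def rscu(counts: Counter) -> dict:
    """RSCU = observed count / count expected if all synonyms were used equally."""
    values = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the gene: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


def enc(counts: Counter) -> float:
    """Wright's (1990) effective number of codons, between 20 and 61."""
    by_class = defaultdict(list)  # degeneracy class k -> homozygosity values F
    for aa, codons in SYNONYMS.items():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp offer no choice; F needs at least two observations
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        by_class[k].append((n * sum_p2 - 1) / (n - 1))
    f = {k: sum(v) / len(v) for k, v in by_class.items()}
    if 3 not in f and 2 in f and 4 in f:
        f[3] = (f[2] + f[4]) / 2  # Wright's substitute when Ile is missing
    if any(k not in f or f[k] <= 0 for k in (2, 3, 4, 6)):
        return float("nan")  # too few data for an estimate in some class
    value = 2 + 9 / f[2] + 1 / f[3] + 5 / f[4] + 3 / f[6]
    return min(value, 61.0)
```

With this sketch, the high-frequency codons of a gene would be `[c for c, v in rscu(codon_counts(seq)).items() if v > 1]`. A gene that uses a single codon for every amino acid gets an ENC of 20, the strongest possible bias; short genes often yield no estimate, which is one reason ENC is usually reported only for sequences above a minimum length.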
According to the size of the ENC of each gene, the values of RSCU of the gene