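Using hidden Markov models and Perl programming, and building on the protein databases of several model organisms, we constructed a new method for genome-wide prediction of target genes. The method offers high throughput, high accuracy, and simple operation, and it is especially advantageous for predicting multi-domain protein families. We applied it to predict the PPR and TPR protein families across the whole genomes of several model organisms. The japonica rice cultivar Nipponbare contains 536 PPR proteins and 199 TPR proteins; the indica rice cultivar 9311 contains 519 PPR proteins and 177 TPR proteins; Arabidopsis thaliana contains 735 PPR proteins and 292 TPR proteins; and the red alga contains 6 PPR proteins and 32 TPR proteins. Neither the cyanobacterium nor the archaebacterium contains PPR proteins, but the cyanobacterium contains 10 TPR proteins and the archaebacterium contains 4. The results were then subjected to further bioinformatic analysis.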
Drawing on the protein databases of several model species, this study developed a new method for genome-wide prediction of target genes, using hidden Markov models implemented in Perl. The method offers high throughput, high accuracy, and ease of use, and it is particularly well suited to multi-domain protein families. We used it to predict the PPR and TPR protein families across the whole genomes of several model species. We found 536 PPR proteins and 199 TPR proteins in Oryza sativa ssp. japonica, 519 PPR proteins and 177 TPR proteins in Oryza sativa L. ssp. indica, 735 PPR proteins and 292 TPR proteins in Arabidopsis thaliana, and 6 PPR proteins and 32 TPR proteins in Cyanidioschyzon merolae. No PPR proteins were found in Synechococcus or in the thermophilic archaebacterium. By contrast, 10 TPR proteins were found in Synechococcus and 4 in the thermophilic archaebacterium. Further bioinformatic analyses of these results were also conducted.
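The abstract does not describe how motif hits from the hidden Markov model search are turned into per-genome family counts, so the following is only a minimal sketch of one plausible counting step. It is written in Python rather than the Perl used in the study, and everything in it is an assumption made for illustration: the function name `count_family_members`, the `(protein_id, motif, score)` tuple layout, the score cutoff, the minimum number of repeats, and the sample hits.

```python
from collections import defaultdict


def count_family_members(hits, motif, min_score=0.0, min_repeats=2):
    """Return sorted IDs of proteins with at least `min_repeats` hits of `motif`.

    `hits` is an iterable of (protein_id, motif, score) tuples. This layout,
    the score cutoff, and the repeat threshold are illustrative assumptions,
    not values taken from the study.
    """
    repeats = defaultdict(int)
    for protein_id, hit_motif, score in hits:
        # Count only hits of the requested motif that pass the score cutoff.
        if hit_motif == motif and score >= min_score:
            repeats[protein_id] += 1
    return sorted(p for p, n in repeats.items() if n >= min_repeats)


# Hypothetical motif hits; not real data from the paper.
hits = [
    ("prot1", "PPR", 25.3),
    ("prot1", "PPR", 18.9),
    ("prot2", "TPR", 30.1),
    ("prot2", "TPR", 12.4),
    ("prot2", "TPR", 22.0),
    ("prot3", "PPR", 5.2),
]

ppr = count_family_members(hits, "PPR", min_score=10.0)
tpr = count_family_members(hits, "TPR", min_score=10.0)
print(len(ppr), len(tpr))
```

Requiring several motif hits per protein reflects the fact that PPR and TPR are tandem-repeat families; the actual criteria used by the authors may differ.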