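Knowing where proteins localize within the nucleus of eukaryotic cells is important for the functional annotation of newly discovered proteins. As the number of protein sequences in protein databases grows rapidly, computational prediction of protein subnuclear localization has become an active research topic in protein science. Based on the discrete pseudo amino acid composition model proposed by Chou, a new method for predicting protein subnuclear localization is proposed. The approximate entropy of the protein sequence is computed as an additional feature to construct the pseudo amino acid composition that represents the sequence, and the AdaBoost classification algorithm serves as the prediction tool. Compared with previously reported subnuclear localization prediction methods, this method achieves higher accuracy.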
Knowledge of the subnuclear locations of proteins in eukaryotic cells greatly aids the annotation of protein function. The gap between the number of proteins with known function and the number of known sequences in protein databanks is widening rapidly, so the prediction of protein subnuclear localization has become an active research topic in protein science. A novel approach based on pseudo amino acid (PseAA) composition is proposed to predict protein subnuclear localization. Following the concept of PseAA composition originally introduced by Chou, a new PseAA composition built on approximate entropy (ApEn) is presented. The AdaBoost classifier is used as the prediction engine. The encouraging results indicate that the approach is effective and may become a useful tool for this task and for the prediction of other protein attributes.
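The feature construction summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the numeric encoding of residues (Kyte-Doolittle hydropathy values), the embedding dimensions `m_values`, the tolerance `r = 0.2 × standard deviation`, and the weight factor `weight` are all assumptions chosen for illustration, and the function names are hypothetical. The sketch computes the 20 amino acid frequencies, appends ApEn values as additional components, and normalizes the vector in the style of Chou's PseAA composition.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: Kyte-Doolittle hydropathy values serve as the numeric encoding
# of residues. The abstract does not say which encoding is used.
HYDROPATHY = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def _phi(series, m, r):
    """Mean log-fraction of length-m windows within Chebyshev distance r."""
    n = len(series) - m + 1
    windows = [series[i:i + m] for i in range(n)]
    total = 0.0
    for w in windows:
        # Self-matches are counted, so matches >= 1 and the log is defined.
        matches = sum(
            1 for v in windows
            if max(abs(a - b) for a, b in zip(w, v)) <= r
        )
        total += math.log(matches / n)
    return total / n


def approximate_entropy(series, m=2, r=None):
    """Approximate entropy ApEn(m, r) of a numeric series."""
    if len(series) < m + 2:
        raise ValueError("series too short for the chosen m")
    if r is None:
        # Assumption: the common default tolerance of 0.2 * standard deviation.
        mean = sum(series) / len(series)
        std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
        r = 0.2 * std
    return _phi(series, m, r) - _phi(series, m + 1, r)


def pseaa_apen_features(sequence, m_values=(2, 3), weight=0.5):
    """Hypothetical helper: 20 frequencies plus weighted ApEn components."""
    residues = [c for c in sequence.upper() if c in HYDROPATHY]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    freqs = [residues.count(a) / len(residues) for a in AMINO_ACIDS]
    signal = [HYDROPATHY[c] for c in residues]
    extras = [approximate_entropy(signal, m) for m in m_values]
    # PseAA-style normalization: all components share one denominator.
    denom = sum(freqs) + weight * sum(extras)
    return [f / denom for f in freqs] + [weight * e / denom for e in extras]
```

Calling `pseaa_apen_features("MKKLLPTAAAGLLLLAAQPAMA")` returns a vector with 20 + `len(m_values)` components. Vectors of this kind, computed for a labeled set of nuclear proteins, could then be passed to any AdaBoost implementation (for example, scikit-learn's `AdaBoostClassifier`) for training and prediction; that step is not shown here.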