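Protein sequence features were extracted using pseudo amino acid composition, and the effects of the parameters λ and w on recognition performance were examined. With k-nearest neighbor as the base classifier, the method was applied to predicting hydrolase subfamily types. The results show that pseudo amino acid composition features give substantially higher recognition accuracy than features based on the plain composition of the 20 amino acids: the average prediction accuracy of the 20-AA composition was 72.3%, whereas pseudo amino acid composition features reached 82.7%. Regarding the parameters, the choice of the number of autocorrelation functions (λ) strongly affects recognition performance, whereas the weight factor w has very little effect.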
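To show how λ and w enter the feature vector, here is a minimal Python sketch of a Chou-style (type I) pseudo amino acid composition. The abstract does not say which physicochemical properties were used for the correlation factors, so the function takes the property scales as an argument. The scale in the usage lines is a hypothetical placeholder, not a real physicochemical index, and the names `pseaac` and `standardize` are illustrative. The first 20 components are the amino-acid frequencies divided by 1 + w·Σθ_k. The last λ components are w·θ_k divided by the same quantity. With λ = 0 or w = 0, the first 20 components reduce to the ordinary 20-AA composition used as the baseline.

```python
from collections import Counter
from statistics import mean, pstdev

# The 20 standard amino acids, in a fixed order for the first 20 components.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(scale):
    """Shift and scale a property so it has mean 0 and std 1 over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mu, sd = mean(values), pstdev(values)
    return {a: (scale[a] - mu) / sd for a in AMINO_ACIDS}


def pseaac(seq, scales, lam, w):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector.

    seq:    protein sequence over the 20 standard residues
    scales: list of dicts mapping residue -> raw property value (assumed input)
    lam:    number of sequence-order correlation factors (lambda)
    w:      weight factor for the correlation factors
    """
    seq = seq.upper()
    if set(seq) - set(AMINO_ACIDS):
        raise ValueError("sequence contains non-standard residues")
    n = len(seq)
    if not 0 <= lam < n:
        raise ValueError("lambda must satisfy 0 <= lambda < len(seq)")

    std_scales = [standardize(s) for s in scales]

    def pair_corr(a, b):
        # Mean squared difference of the standardized properties of two residues.
        return sum((s[b] - s[a]) ** 2 for s in std_scales) / len(std_scales)

    # theta_k: average correlation between residues k positions apart, k = 1..lam.
    thetas = [
        sum(pair_corr(seq[i], seq[i + k]) for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]

    counts = Counter(seq)
    freqs = [counts[a] / n for a in AMINO_ACIDS]  # ordinary 20-AA composition

    denom = sum(freqs) + w * sum(thetas)  # sum(freqs) is 1
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Usage with a HYPOTHETICAL placeholder scale (not a real physicochemical index).
placeholder_scale = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ", [placeholder_scale], lam=3, w=0.05)
print(len(vec))  # 23
```

Passing the scales in keeps the sketch neutral about which properties the original work used. Substituting real scales, such as hydrophobicity, hydrophilicity, and side-chain mass, only changes the inputs.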
Predicting hydrolase subfamilies is important for designing a fast and reliable classification system. In this paper, the pseudo amino acid composition method was used to extract features from protein sequences, and the k-nearest neighbor algorithm was used as the classifier to predict the hydrolase subfamily. The influence of the parameters λ and w on prediction accuracy was also studied. The results showed that prediction accuracy with pseudo amino acid composition was considerably higher (by about 10.4 percentage points) than with amino acid composition alone: 72.3% for amino acid composition versus 82.7% for pseudo amino acid composition. The parameter λ had a much greater influence on prediction accuracy than the weight factor w.
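The abstract names k-nearest neighbor as the classifier but does not give k, the distance metric, or the validation protocol. The sketch below therefore assumes Euclidean distance, a majority vote, and leave-one-out (jackknife) evaluation. In practice, each feature vector would be the pseudo amino acid or 20-AA composition of one protein. The subfamily labels and two-dimensional toy vectors in the usage lines are hypothetical, and the function names are illustrative.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=1):
    """Majority vote among the k training vectors closest to `query` (Euclidean)."""
    neighbours = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    # Counter.most_common orders equal counts by first occurrence, so a tied vote
    # goes to the label whose closest member is nearer to the query.
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k=1):
    """Fraction of samples whose label is recovered when each is held out in turn."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        correct += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return correct / len(X)


# Toy feature vectors with HYPOTHETICAL subfamily labels.
X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
y = ["hydrolase_A", "hydrolase_A", "hydrolase_B", "hydrolase_B"]
print(knn_predict(X, y, (0.2, 0.3), k=3))  # hydrolase_A
print(leave_one_out_accuracy(X, y, k=1))   # 1.0
```

Comparing the two feature sets amounts to running `leave_one_out_accuracy` once on composition vectors and once on pseudo amino acid composition vectors built from the same proteins. Sweeping λ and w in the second run reproduces the kind of parameter study described above.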