A weighting scheme is introduced to form a novel feature extraction method, the weighted auto-correlation function, for representing protein sequences. This method is combined with the support vector machine (SVM) algorithm, and two classification strategies ('one-versus-rest' and 'one-versus-one') are used to classify membrane proteins, giving markedly improved results. With the same SVM algorithm and the 'one-versus-rest' strategy, the per-class accuracy, the Matthews correlation coefficient, and the total accuracy obtained with the weighted auto-correlation function method are all higher than those obtained with the amino acid composition method. In the jackknife test, the total accuracy and the accuracy for lipid chain-anchored proteins are 87.98% and 65.85%, which are 3.38 and 9.75 percentage points higher, respectively, than those of the amino acid composition method. The total accuracy of the 'one-versus-one' strategy reaches 94.88% in the jackknife test, 6.9 percentage points higher than that of the 'one-versus-rest' strategy. The classification performance of the SVM is superior to that of the Bayes covariant discriminant algorithm: its total accuracy is higher by up to 15.6 percentage points.
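As an illustration of the kind of feature vector described above, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the physicochemical index, the maximum lag, or the weight, so everything here is an assumption: the Kyte-Doolittle hydropathy scale as the index, a single scalar `weight` applied to the auto-correlation terms, and the names `composition`, `autocorrelation`, and `weighted_features`, which are hypothetical. The vector concatenates the 20 amino acid composition fractions with the weighted auto-correlation values.

```python
# Sketch only: index scale, weight, and max_lag are assumptions, not values from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index (assumed choice of physicochemical property).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def clean(seq):
    """Keep only the 20 standard residues, upper-cased."""
    return [aa for aa in seq.upper() if aa in HYDROPATHY]


def composition(seq):
    """Fraction of each of the 20 amino acids in the sequence."""
    residues = clean(seq)
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def autocorrelation(seq, max_lag):
    """r_n = 1/(L-n) * sum_i h_i * h_(i+n), for n = 1..max_lag."""
    h = [HYDROPATHY[aa] for aa in clean(seq)]
    length = len(h)
    if length <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    return [
        sum(h[i] * h[i + n] for i in range(length - n)) / (length - n)
        for n in range(1, max_lag + 1)
    ]


def weighted_features(seq, max_lag=5, weight=0.1):
    """Composition (20 values) followed by weighted auto-correlation terms."""
    return composition(seq) + [weight * r for r in autocorrelation(seq, max_lag)]


if __name__ == "__main__":
    vector = weighted_features("MKTLLVAGAVLLSAGCSSE", max_lag=3, weight=0.5)
    print(len(vector))  # 20 composition values + 3 auto-correlation values
```

Vectors built this way could then be passed to any SVM implementation. For k classes, a 'one-versus-rest' scheme trains k binary classifiers, while 'one-versus-one' trains k(k-1)/2 and combines them by voting.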