蛋白质序列特征表示和机器学习算法是影响蛋白质结构类预测效果好坏的两个重要方面.本研究基于k-字统计频率和k-片段位置分布两种特征提取方法,将分别提取到的氨基酸序列信息和物理化学性质信息同蛋白质二级结构信息进行融合,建立17维和57维的特征信息集,并尝试在Adaboost.M1算法中引入Multi-Agent多智能体融合的思想,提出了一种Ma-Ada多分类器融合算法.该算法作为蛋白质结构类的预测工具,充分挖掘了单分类器度量层信息以及各个单分类器之间的交互融合信息.实验结果表明,Ma-Ada算法在Z277、Z498、1189和D640四个蛋白质数据集的57维特征信息集上的分类率分别达到了91.3%、96.8%、85.3%和87.2%,在17维特征信息集上的分类率也分别达到了90.6%、95.8%、84.8%和88.3%.与其它蛋白质结构类预测方法的结果相比,本方法能够获得较好的分类率.
Protein sequence feature and machine learning algorithm are two important aspects to determine the results of protein structural class prediction. In this study, we established 17-D and 57-D feature information sets through fusing the sequence information, physical and chemical information with the secondary structure information based on the k-word statistical frequency and the k-fragment distribution feature extraction method. By introducing Multi-Agent's idea into Adaboost. M1 algorithm, a novel method for protein structural class prediction, called Ma-Ada multi-classifier fusion algorithm, was proposed, which fully utilized the information of the single classifier metric layer and the fusion of information among individual classifiers. Four protein datasets including Z277, Z498, 1189, D640 were used to validate the performance of the Ma-Ada algorithm. Classification accuracies are 91.3 % , 96.8 % , 85.3% and 87.2 % with 57 -D features, and 90.6 % , 95. 8 % , 84.8 % and 88.3 % with 17 D features on datasets Z277, Z498, 1189 and D640, respectively. The experimental results show better.