Based on the characteristics of energy hot spots in protein-protein interactions, 18 new features were defined, including residue contact number and the proportion of the relative change in solvent accessible surface area. Feature selection was performed with two recursive feature elimination (RFE) methods, one based on the support vector machine (SVM) and one based on the F-Score, yielding two corresponding prediction models, denoted SVM-RFE and F-Score-RFE. On the independent test set, the F-Score-RFE model improved the F1 score by 6.25% over the best-performing existing method, indicating that the newly defined features contribute substantially to the identification of protein energy hot spots.
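The F-Score-based elimination mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the commonly used two-class F-score (squared deviations of the class means from the overall mean, divided by the sum of the within-class sample variances), removes the single lowest-scoring feature per round, and runs on made-up toy values. The function names, the feature names, and the `keep` parameter are all hypothetical.

```python
from statistics import mean, variance


def f_score(column, labels):
    """Two-class F-score of one feature (assumed definition; 1 = hot spot).

    Each class needs at least two samples for the sample variance.
    """
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y != 1]
    overall = mean(column)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1)
    if within == 0:
        # No within-class spread: perfectly separating or constant feature.
        return float("inf") if between > 0 else 0.0
    return between / within


def f_score_rfe(X, y, names, keep):
    """Drop the lowest-scoring feature each round until `keep` remain.

    The F-score is univariate, so re-scoring gives the same ranking every
    round; the loop only mirrors the RFE structure, in which a real model
    would typically retrain and evaluate a classifier at each step.
    """
    remaining = list(range(len(names)))
    eliminated = []
    while len(remaining) > keep:
        scores = {j: f_score([row[j] for row in X], y) for j in remaining}
        worst = min(remaining, key=lambda j: scores[j])
        remaining.remove(worst)
        eliminated.append(names[worst])
    return [names[j] for j in remaining], eliminated


# Toy data with invented values: two informative features and one noisy one.
names = ["contact_number", "rel_asa_change", "noise"]
X = [
    [12, 0.80, 0.3],
    [11, 0.70, 0.9],
    [13, 0.90, 0.5],
    [5, 0.20, 0.4],
    [6, 0.30, 0.8],
    [4, 0.25, 0.6],
]
y = [1, 1, 1, 0, 0, 0]

selected, eliminated = f_score_rfe(X, y, names, keep=2)
print("selected:", selected)
print("eliminated:", eliminated)
```

On this toy set the noisy column has by far the smallest score and is removed first. An SVM-based variant would instead rank features by the weights of a trained linear SVM and retrain after each removal.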