P2P network traffic exhibits concept drift as it is generated. To address this, a traffic identification scheme is proposed that dynamically ensembles multiple classifiers and includes concept drift detection. The scheme consists of two modules: concept drift detection and dynamic classifier ensemble. The drift detection module is implemented using chi-square statistical inference. The ensemble module applies a performance-based elimination strategy to its base classifiers so that traffic can still be identified after drift has occurred. Simulation experiments were run with the Bayes classifier, support vector machine, and decision tree as base classifiers, across different ensemble sizes and data block sizes. The results show that the scheme is feasible, with an identification accuracy above 82%.
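The two modules can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: drift is tested with a 2x2 Pearson chi-square statistic on correct/incorrect counts from two consecutive data blocks, the critical value 3.841 (1 degree of freedom, significance level 0.05) is used as the threshold, and "performance-based elimination" is read as dropping the members with the lowest accuracy on the newest block. All function names are hypothetical, and base classifiers are represented as plain callables rather than trained Bayes, SVM, or decision-tree models.

```python
from collections import Counter

# Chi-square critical value for 1 degree of freedom at alpha = 0.05 (assumed threshold).
CHI2_CRITICAL_DF1 = 3.841


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of change
    return n * (a * d - b * c) ** 2 / denom


def drift_detected(ref_correct, ref_total, cur_correct, cur_total,
                   critical=CHI2_CRITICAL_DF1):
    """Flag drift when the correct/incorrect split differs significantly between two blocks."""
    stat = chi_square_2x2(ref_correct, ref_total - ref_correct,
                          cur_correct, cur_total - cur_correct)
    return stat > critical


def accuracy(classifier, block):
    """Fraction of (features, label) pairs in the block that are classified correctly."""
    return sum(classifier(x) == y for x, y in block) / len(block)


def update_ensemble(ensemble, candidate, block, max_size):
    """Add a candidate, score every member on the newest block, keep the best max_size."""
    members = ensemble + [candidate]
    members.sort(key=lambda clf: accuracy(clf, block), reverse=True)
    return members[:max_size]


def vote(ensemble, x):
    """Majority vote over the ensemble members."""
    return Counter(clf(x) for clf in ensemble).most_common(1)[0][0]
```

In use, `drift_detected` would be called once per incoming data block. When it returns `True`, a classifier trained on that block would be passed to `update_ensemble` as `candidate`, with `max_size` playing the role of the ensemble size varied in the experiments.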