Real-time classification of dynamic data streams faces three difficulties: processing massive volumes of data in real time, tracking concept drift and updating the model accordingly, and keeping the model stable and robust. To address these problems, the extreme support vector machine (ESVM) is combined with the MapReduce framework, and a forgetting factor robust ESVM algorithm (FFR-ESVM) is proposed. The algorithm corrects the residuals by constructing a residual weight matrix, and introduces a forgetting factor that increases the influence of new samples; combined with MapReduce, this makes the method applicable to massive data. Experimental results show that FFR-ESVM classifies dynamic data streams quickly and effectively, and that its results are stable and less affected by noise.
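A minimal Python sketch of how the ingredients named in the abstract could fit together. It rests on several assumptions the abstract does not confirm. ESVM is modeled as a random tanh hidden layer followed by the proximal-SVM-style closed form (I/nu + E^T S E) u = E^T S y, where E = [H, -e] and S is a diagonal residual weight matrix. MapReduce is imitated by splitting each stream chunk into sub-chunks whose partial sums E^T S E and E^T S y are computed independently (map) and then added (reduce). The forgetting factor `lam` discounts the accumulated sums before each new chunk is added. The residual weights use a Huber-type function with a median-based scale; this is a stand-in, since the paper's actual weight function is not given here. All names (`StreamingRobustESVM`, `map_chunk`, `reduce_sums`, `huber_weights`, `demo`), the parameter values, and the synthetic two-concept stream are hypothetical.

```python
import math
import random

def features(x, W, b):
    """Random hidden layer tanh(Wx + b) plus a -1 entry for the bias term (E = [H, -e])."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bk)
            for row, bk in zip(W, b)] + [-1.0]

def map_chunk(chunk, W, b, weights):
    """Map step: partial sums G = E^T S E and r = E^T S y for one sub-chunk, S = diag(weights)."""
    dim = len(W) + 1
    G, r = [[0.0] * dim for _ in range(dim)], [0.0] * dim
    for (x, y), s in zip(chunk, weights):
        e = features(x, W, b)
        for i in range(dim):
            r[i] += s * e[i] * y
            for j in range(dim):
                G[i][j] += s * e[i] * e[j]
    return G, r

def reduce_sums(parts):
    """Reduce step: element-wise sum of the mappers' (G, r) pairs."""
    dim = len(parts[0][1])
    G = [[sum(p[0][i][j] for p in parts) for j in range(dim)] for i in range(dim)]
    return G, [sum(p[1][i] for p in parts) for i in range(dim)]

def solve_spd(A, y):
    """Gaussian elimination without pivoting; adequate here since I/nu + G is SPD."""
    n = len(y)
    M = [A[i][:] + [y[i]] for i in range(n)]
    for c in range(n):
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    u = [0.0] * n
    for i in reversed(range(n)):
        u[i] = (M[i][n] - sum(M[i][j] * u[j] for j in range(i + 1, n))) / M[i][i]
    return u

def huber_weights(res, c=1.345):
    """Assumed Huber-type weights (a stand-in); scale from the median absolute residual."""
    scale = max(sorted(abs(v) for v in res)[len(res) // 2] / 0.6745, 1e-12)
    return [min(1.0, c * scale / abs(v)) if v else 1.0 for v in res]

class StreamingRobustESVM:
    """Hypothetical FFR-ESVM-style streaming classifier (illustrative sketch only)."""
    def __init__(self, n_in, n_hidden=15, nu=10.0, lam=0.5, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.nu, self.lam, dim = nu, lam, n_hidden + 1
        self.G, self.r, self.u = [[0.0] * dim for _ in range(dim)], [0.0] * dim, [0.0] * dim

    def _score(self, u, x):
        return sum(a * e for a, e in zip(u, features(x, self.W, self.b)))

    def _solve(self, parts):
        """Discount old sums by lam (forgetting factor), add new ones, solve (I/nu + G) u = r."""
        G_new, r_new = reduce_sums(parts)
        n = len(r_new)
        G = [[self.lam * self.G[i][j] + G_new[i][j] for j in range(n)] for i in range(n)]
        r = [self.lam * self.r[i] + r_new[i] for i in range(n)]
        A = [[G[i][j] + (1.0 / self.nu if i == j else 0.0) for j in range(n)] for i in range(n)]
        return G, r, solve_spd(A, r)

    def partial_fit(self, sub_chunks):
        """Learn from one stream chunk that has been split across 'mappers' (sub_chunks)."""
        # Pass 1: unit weights, used only to obtain residuals on the new samples.
        ones = [[1.0] * len(c) for c in sub_chunks]
        _, _, u0 = self._solve([map_chunk(c, self.W, self.b, w) for c, w in zip(sub_chunks, ones)])
        # Pass 2: residual weights (scale estimated per mapper), then refit and commit.
        wts = [huber_weights([y - self._score(u0, x) for x, y in c]) for c in sub_chunks]
        self.G, self.r, self.u = self._solve(
            [map_chunk(c, self.W, self.b, w) for c, w in zip(sub_chunks, wts)])

    def predict(self, x):
        return 1 if self._score(self.u, x) >= 0 else -1

def demo(seed=1, noise=0.05):
    """Synthetic stream: concept A (x0 + x1 > 0) drifts to B (x0 - x1 > 0); `noise` flips labels."""
    rng = random.Random(seed)
    def chunk(n, concept, flip):
        pts = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
        lab = [1 if (p[0] + p[1] if concept == "A" else p[0] - p[1]) > 0 else -1 for p in pts]
        return [(p, -y if rng.random() < flip else y) for p, y in zip(pts, lab)]
    model, accs = StreamingRobustESVM(n_in=2), []
    for concept, steps in (("A", 4), ("B", 8)):
        for _ in range(steps):
            model.partial_fit([chunk(40, concept, noise) for _ in range(3)])
        test = chunk(500, concept, 0.0)  # noise-free test set for the current concept
        accs.append(sum(model.predict(x) == y for x, y in test) / len(test))
    return accs

print(demo())
```

Because E^T S E and E^T S y are plain sums over samples, the map outputs can be combined in any order, which is what makes the least-squares form of ESVM a natural fit for MapReduce. `demo()` returns the clean-test accuracy after concept A and after the drift to concept B. A smaller `lam` forgets old concepts faster, at the cost of fewer effective training samples per solve.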