Recurring concept drift is a sub-type of concept drift in which previously seen concepts reappear. To address the problems of concept representation and classifier selection in recurring concept drift detection, we propose an algorithm for classifying data streams with recurring concept drift: conceptual clustering and prediction through main feature extraction (MFCCP). MFCCP recognizes recurring concepts by computing the difference between the main features and impact factors of different batches of samples. It maintains and promptly updates one classifier per concept, monitors classification accuracy, and uses the Hoeffding inequality to select the most suitable classifier for the current batch of samples, which improves its ability to react to concept drift. Experiments on three datasets show that, on data streams with recurring concept drift, MFCCP clearly outperforms the four comparison algorithms in classification accuracy, in speed of reaction to concept drift, and in the accuracy of concept drift detection. MFCCP is also suitable for classifying data streams without recurring concept drift.
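As an illustration of how a Hoeffding-based selection rule of this kind might look, the following Python sketch computes the standard Hoeffding bound and switches classifiers only when the observed accuracy gap on the current batch exceeds it. This is a minimal sketch under stated assumptions, not the MFCCP procedure itself: the abstract does not give the exact selection rule, and the function names `hoeffding_bound` and `select_classifier`, the confidence parameter `delta`, and the switching criterion are all hypothetical.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: with probability at least 1 - delta, the mean
    of n independent observations in a range of width value_range lies within
    this distance of its expectation."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def select_classifier(accuracies: dict, current: str, n: int,
                      delta: float = 0.05) -> str:
    """Assumed rule (not taken from the paper): keep the current classifier
    unless another one beats it on the latest batch of n samples by more
    than the Hoeffding bound."""
    eps = hoeffding_bound(1.0, delta, n)  # accuracy lies in [0, 1]
    best = max(accuracies, key=accuracies.get)
    if accuracies[best] - accuracies[current] > eps:
        return best
    return current


if __name__ == "__main__":
    # Hypothetical batch accuracies for two stored concept classifiers.
    batch_accuracy = {"concept_1": 0.60, "concept_2": 0.85}
    print(select_classifier(batch_accuracy, current="concept_1", n=200))
    print(select_classifier(batch_accuracy, current="concept_1", n=10))
```

With a large batch the bound is tight, so a 0.25 accuracy gap triggers a switch; with a very small batch the same gap falls inside the bound and the current classifier is kept, which is the kind of guard against noisy switching that a Hoeffding-based test is meant to provide.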