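Classifying data streams with hidden concept drift is one of the hot research topics in data mining, and noise in real data directly affects both concept drift detection and classification quality; a data stream classification method with good noise resistance is therefore of significant research and practical value. The ensemble model of random decision trees is an effective model for data stream classification. Building on random decision trees, this paper introduces the Hoeffding Bounds inequality to detect and distinguish concept drift and noise, dynamically adjusts the sliding window size and the drift detection period according to the detection results, and proposes an incremental ensemble classification method, ICDC. Experimental results show that the proposed algorithm handles concept drift effectively on noisy data streams.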
Classification of data streams with concept drift has become one of the hot research topics in data mining. However, noise in real data directly affects both the detection of concept drift and the quality of classification, so a noise-tolerant classification approach is of significant value for research and application. Based on the ensemble of random decision trees, an effective model for data stream classification, an incremental approach, ICDC, is proposed. It introduces the Hoeffding Bounds inequality to distinguish concept drift from noise during classification, and it adjusts the detection period and the size of the training-data window according to the detection results. Extensive studies on synthetic and real streaming databases demonstrate that ICDC performs quite effectively compared with several known single or ensemble online algorithms.
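As an illustration of how a Hoeffding bound can separate noise from drift, the following is a minimal sketch and not the ICDC procedure itself. It assumes the standard form of the bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), applied to the classification error rate (range R = 1) of two consecutive windows. The function names, the default `delta`, and the `drift_factor` multiplier that splits "noise" from "drift" are hypothetical choices made for this example; the abstract does not state the thresholds ICDC actually uses.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Return epsilon such that, with probability at least 1 - delta, the
    true mean lies within epsilon of the mean observed over n samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def classify_change(prev_error, curr_error, n, delta=0.05, drift_factor=2.0):
    """Label the change in error rate between two consecutive windows.

    Hypothetical rule (an assumption of this sketch, not taken from the paper):
      increase <= epsilon                 -> "stable"
      increase <= drift_factor * epsilon  -> "noise"
      otherwise                           -> "drift"
    """
    # Error rates lie in [0, 1], so the value range R is 1.
    eps = hoeffding_bound(1.0, delta, n)
    increase = curr_error - prev_error
    if increase <= eps:
        return "stable"
    if increase <= drift_factor * eps:
        return "noise"
    return "drift"


if __name__ == "__main__":
    # Three windows of 1000 instances with growing error-rate increases.
    for curr in (0.12, 0.16, 0.25):
        print(curr, classify_change(0.10, curr, n=1000))
```

Under such a rule, a detector could plausibly shrink its window and check more often after a "drift" label, and relax both after a "stable" label, which is the kind of adaptive adjustment the abstract describes.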