As big data and Internet of Things technologies are applied ever more widely, scenarios that generate data streams are becoming more common, and data stream classification has become a research hotspot. Data streams are temporally ordered and subject to concept drift, so traditional data stream classification models cannot be transferred directly to a new environment. This paper first reviews research on applying ensemble learning and incremental learning to the classification of data streams with concept drift, and then discusses how active learning, semi-supervised learning, and transfer learning can be used to address the difficulty of labeling samples in data stream classification. Finally, it analyzes the open problems and development trends in classifying data streams with concept drift and proposes directions for further research.