To address the poor handling of outliers in data streams, the low efficiency of data stream clustering, and the inability to detect dynamic changes in a stream in real time, an evolutionary data stream clustering algorithm that integrates affinity propagation with density (I-APDenStream) is proposed. The algorithm follows the traditional two-stage processing model, which consists of an online clustering stage and an offline clustering stage. It introduces a decay density for micro-clusters, which reflects the dynamic changes of the data stream, together with a deletion mechanism that maintains the micro-clusters dynamically online. In addition, an outlier detection and deletion mechanism is introduced for the step in which the model is rebuilt with the extended Weighted Affinity Propagation (WAP) clustering. Experimental results on two types of data sets show that the clustering accuracy of the proposed algorithm generally remains above 95%, and that it achieves good results in purity comparisons with other algorithms and in other related tests. The proposed algorithm can therefore cluster data streams with good real-time performance, high quality, and high efficiency.
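The online stage described above can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the exponential fading factor 2^(-lam * dt) is borrowed from common density-based stream clustering practice, and the names `MicroCluster`, `update`, `lam`, `radius`, and `min_weight` are hypothetical. The offline WAP reconstruction step is not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Hypothetical micro-cluster summary: a center and a decayed weight."""
    center: List[float]
    weight: float = 1.0
    last_update: float = 0.0

    def decay(self, now: float, lam: float) -> None:
        # Assumed fading function: weight shrinks by 2^(-lam * elapsed time).
        self.weight *= 2.0 ** (-lam * (now - self.last_update))
        self.last_update = now

    def absorb(self, point: Sequence[float], now: float, lam: float) -> None:
        # Decay first, then merge the new point as a weight-1 observation.
        self.decay(now, lam)
        new_weight = self.weight + 1.0
        self.center = [
            (c * self.weight + x) / new_weight
            for c, x in zip(self.center, point)
        ]
        self.weight = new_weight


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update(
    clusters: List[MicroCluster],
    point: Sequence[float],
    now: float,
    lam: float = 0.25,
    radius: float = 1.0,
    min_weight: float = 0.1,
) -> List[MicroCluster]:
    """Process one arriving point: decay, prune, then absorb or create."""
    # 1. Decay every micro-cluster to the current time.
    for mc in clusters:
        mc.decay(now, lam)
    # 2. Deletion mechanism (assumed rule): drop clusters whose decayed
    #    weight has fallen below min_weight.
    clusters[:] = [mc for mc in clusters if mc.weight >= min_weight]
    # 3. Absorb the point into the nearest cluster if it is close enough,
    #    otherwise open a new micro-cluster for it.
    nearest = min(clusters, key=lambda mc: sq_dist(mc.center, point), default=None)
    if nearest is not None and sq_dist(nearest.center, point) <= radius ** 2:
        nearest.absorb(point, now, lam)
    else:
        clusters.append(MicroCluster(list(point), 1.0, now))
    return clusters
```

In this sketch a micro-cluster that stops receiving points loses weight over time and is eventually removed, which is one simple way a summary can follow a drifting stream. The actual decay density and deletion criteria of I-APDenStream may differ.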