Arbitrary-shape clustering is an important research topic in data stream mining. This paper presents SWASCStream, an algorithm for discovering arbitrary-shape clusters in an evolving data stream over sliding windows. It makes three contributions. First, an improved micro-cluster feature structure comprehensively describes arbitrary-shape clusters within the sliding window. Second, new definitions of sparse, critical, and non-sparse micro-clusters help improve clustering quality within the sliding window at a fundamental level. Third, a periodic micro-cluster pruning strategy reduces the maintenance cost of the algorithm while keeping the error controllable. Experiments on a number of real and synthetic data sets demonstrate the effectiveness and efficiency of the method.