Clustering is a very active research area in data mining, and clustering data of arbitrary shape remains an open problem. This paper introduces an inter-cluster dissimilarity measure that incorporates the frequency of categorical attribute values, together with a definition of the similarity between an object and a cluster. Building on these, it proposes a clustering algorithm that can discover clusters of arbitrary shape and can handle datasets with mixed (numerical and categorical) attributes. The algorithm was evaluated on synthetic and real-life datasets and compared with related algorithms; the experimental results show that it is feasible and effective.
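
To make the idea of a frequency-aware inter-cluster dissimilarity concrete, the following is a minimal sketch and not the measure defined in the paper, whose exact formula is not given in this abstract. It assumes that a cluster is summarized by the relative frequency of each categorical value and by the mean of each numerical attribute, that numerical attributes are already scaled to [0, 1], and that per-attribute differences are combined by a root mean square. The function names `summarize` and `cluster_dissimilarity` are hypothetical.

```python
from collections import Counter
from math import sqrt


def summarize(cluster, categorical_idx):
    """Summarize a cluster attribute by attribute (illustrative assumption):
    categorical -> dict of relative value frequencies, numerical -> mean."""
    n = len(cluster)
    summary = []
    for j in range(len(cluster[0])):
        column = [row[j] for row in cluster]
        if j in categorical_idx:
            counts = Counter(column)
            summary.append({value: c / n for value, c in counts.items()})
        else:
            summary.append(sum(column) / n)
    return summary


def cluster_dissimilarity(a, b, categorical_idx):
    """Assumed frequency-aware dissimilarity between clusters a and b.

    Categorical attribute: 1 minus the probability that a value drawn at
    random from a equals a value drawn at random from b.
    Numerical attribute: absolute difference of the means (values are
    assumed to be scaled to [0, 1]).
    """
    sa = summarize(a, categorical_idx)
    sb = summarize(b, categorical_idx)
    total = 0.0
    for j, (fa, fb) in enumerate(zip(sa, sb)):
        if j in categorical_idx:
            d = 1.0 - sum(p * fb.get(value, 0.0) for value, p in fa.items())
        else:
            d = abs(fa - fb)
        total += d * d
    return sqrt(total / len(sa))


# Toy mixed-attribute data: (colour, scaled numerical feature).
red = [("red", 0.1), ("red", 0.2)]
blue = [("blue", 0.9), ("blue", 0.8)]
print(cluster_dissimilarity(red, blue, categorical_idx={0}))

# An object can be compared with a cluster by treating it as a singleton.
print(cluster_dissimilarity([("red", 0.15)], red, categorical_idx={0}))
```

Under these assumptions, treating a single object as a one-element cluster gives an object-to-cluster dissimilarity from the same function, and a similarity can be taken as one minus that value. Note that with this categorical term a cluster containing several distinct values has a nonzero dissimilarity to itself, which is a property of the assumed formula rather than a claim about the paper's measure.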