A single clustering algorithm tends to give inaccurate results with a high degree of randomness, and existing algorithms introduce error when they convert categorical data into numerical form before clustering. To address these problems, this paper proposes a clustering ensemble algorithm for data with categorical attributes. The algorithm generates the clustering members directly from the differences among the original categorical attribute values, then partitions the data with a similarity-based method, simplifying the clustering process by seeking the partition that minimizes the objective function. The algorithm was validated on UCI datasets. The results show that it outperforms existing algorithms in both efficiency and accuracy, indicating that its design and update strategy are effective.
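The pipeline summarized above (clustering members derived from categorical values, a similarity measure, and selection of the partition with the smallest objective value) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the objective function, or the search procedure, so everything concrete here is an assumption for illustration only: the similarity is taken to be the co-association rate (the fraction of attributes on which two objects agree), the objective is the total pairwise disagreement with the ensemble, and the search simply picks the best of the attribute-induced partitions. All function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def attribute_partitions(rows):
    """Assumed member generation: each categorical attribute induces one
    base partition, where objects sharing a value share a cluster label."""
    n_attrs = len(rows[0])
    return [[row[j] for row in rows] for j in range(n_attrs)]

def co_association(partitions):
    """Assumed similarity: sim[i][j] is the fraction of base partitions
    that place objects i and j in the same cluster."""
    n = len(partitions[0])
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            together = sum(p[i] == p[j] for p in partitions)
            sim[i][j] = together / len(partitions)
    return sim

def disagreement(candidate, sim):
    """Assumed objective: total pairwise disagreement between a candidate
    partition and the ensemble. Grouping a pair costs 1 - sim, separating
    a pair costs sim."""
    cost = 0.0
    for i, j in combinations(range(len(candidate)), 2):
        if candidate[i] == candidate[j]:
            cost += 1.0 - sim[i][j]
        else:
            cost += sim[i][j]
    return cost

def best_of_k(rows):
    """Return the attribute-induced partition with the smallest objective.
    This is a simple stand-in for the paper's unspecified search."""
    partitions = attribute_partitions(rows)
    sim = co_association(partitions)
    return min(partitions, key=lambda p: disagreement(p, sim))

if __name__ == "__main__":
    # Toy categorical records (made up for illustration, not UCI data).
    rows = [
        ("red", "round", "small"),
        ("red", "round", "small"),
        ("red", "round", "large"),
        ("blue", "square", "large"),
        ("blue", "square", "large"),
    ]
    print(best_of_k(rows))
```

Because the sketch works on the raw category labels, no conversion to numerical values is needed, which mirrors the motivation stated in the abstract. Restricting the search to the base partitions keeps the example short; a fuller treatment would search a larger space of candidate partitions.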