Clustering ensemble is an important branch of ensemble learning. It aims to address the poor results that arise in unsupervised cluster analysis from the choice of clustering algorithm, algorithmic bias, and the particular characteristics of the data. This paper proposes a clustering ensemble algorithm based on data association (CEBDR). The algorithm first extracts the data objects that show an association relationship across the clustering members and groups them into new classes, and then clusters these classes a second time to obtain the final ensemble result. Comparative experiments on selected standard datasets, using CEBDR, existing base clustering algorithms, and existing clustering ensemble algorithms, show that the proposed algorithm effectively improves clustering quality.
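The two-stage idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how association is measured or how the second clustering is performed, so everything below is an assumption made for illustration and not the authors' CEBDR procedure: objects are treated as associated when every clustering member places them in the same cluster, and the resulting classes are merged by average-linkage agglomeration on co-association frequency. The function names `association_classes`, `class_similarity`, and `ensemble` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def association_classes(members):
    """Stage 1 (assumed rule): group objects that share a cluster
    label in every clustering member."""
    groups = defaultdict(list)
    # zip(*members) yields, per object, its tuple of labels across members.
    for obj, signature in enumerate(zip(*members)):
        groups[signature].append(obj)
    return list(groups.values())


def class_similarity(a, b, members):
    """Average co-association between two classes: the fraction of
    (member, object pair) combinations in which the pair co-clusters."""
    agree = sum(m[i] == m[j] for m in members for i in a for j in b)
    return agree / (len(members) * len(a) * len(b))


def ensemble(members, k):
    """Stage 2 (assumed rule): merge the most similar classes until
    k clusters remain, then return one label per object."""
    clusters = association_classes(members)
    while len(clusters) > k:
        # combinations() yields index pairs with x < y, so deleting
        # clusters[y] after the merge leaves clusters[x] in place.
        x, y = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: class_similarity(
                clusters[p[0]], clusters[p[1]], members
            ),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    labels = [0] * len(members[0])
    for label, cluster in enumerate(clusters):
        for obj in cluster:
            labels[obj] = label
    return labels


# Three hypothetical clustering members over six objects.
members = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
print(association_classes(members))  # [[0, 1], [2], [3], [4, 5]]
print(ensemble(members, k=2))        # [0, 0, 0, 1, 1, 1]
```

In this toy input, objects 0 and 1 (and likewise 4 and 5) are never separated by any member, so they form classes in the first stage; the second stage then attaches objects 2 and 3 to whichever class they co-occur with most often. The pairwise similarity search is quadratic in the number of classes per merge, which is acceptable for a sketch but would need a cached similarity matrix for larger datasets.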