Most existing semi-supervised clustering methods make little use of the spatial structure of the dataset; with only a few constraints available, this limits clustering performance. This paper presents a Density-based Constraint Expansion (DCE) method. The dataset is represented as a graph, on which a density-based graph similarity is defined. The known constraint set is then expanded according to the distances and similarities between data samples, and the expanded constraint set can be used with a variety of semi-supervised clustering algorithms. Constrained complete-link clustering and pairwise constrained K-means are used as examples to illustrate how the expanded constraints are applied. Experimental results on several synthetic and real-world datasets show that DCE can effectively improve the performance of semi-supervised clustering algorithms.
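The general idea of expanding a constraint set over a graph can be illustrated with a minimal sketch. The abstract does not give the paper's definition of the density-based similarity or its expansion rule, so everything below is an assumption made for illustration: the similarity is replaced by a Jaccard overlap of k-nearest-neighbour sets, the expansion goes only one hop from each constrained point, and the names `knn_graph`, `similarity`, `expand_constraints`, `k`, and `threshold` are hypothetical.

```python
from math import dist


def knn_graph(points, k):
    """Map each point index to the set of its k nearest neighbours (Euclidean)."""
    graph = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        graph[i] = set(others[:k])
    return graph


def similarity(graph, i, j):
    """Assumed stand-in for the density-based graph similarity (not the
    paper's definition): Jaccard overlap of the closed k-NN neighbourhoods."""
    a, b = graph[i] | {i}, graph[j] | {j}
    return len(a & b) / len(a | b)


def expand_constraints(points, must_link, cannot_link, k=2, threshold=0.5):
    """One-hop expansion of pairwise constraints (illustrative only)."""
    graph = knn_graph(points, k)

    def companions(i):
        # i itself plus its mutual k-NN neighbours that are similar enough.
        return {i} | {j for j in graph[i]
                      if i in graph[j] and similarity(graph, i, j) >= threshold}

    def normalise(pairs):
        # Store every pair as (smaller index, larger index).
        return {(min(a, b), max(a, b)) for a, b in pairs}

    def expand(pairs):
        # Copy each constraint (a, b) to every pair of their companions.
        return normalise((x, y) for a, b in pairs
                         for x in companions(a) for y in companions(b) if x != y)

    ml, cl = expand(must_link), expand(cannot_link)
    conflicts = ml & cl  # inferred pairs that contradict each other are dropped
    # The constraints supplied by the user are always kept.
    return ((ml - conflicts) | normalise(must_link),
            (cl - conflicts) | normalise(cannot_link))


# Two tight groups of three points each, with one constraint of each type.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
ml, cl = expand_constraints(points, must_link=[(0, 1)], cannot_link=[(0, 3)])
print(sorted(ml))  # [(0, 1), (0, 2), (1, 2)]
print(len(cl))     # 9
```

In this toy example a single must-link and a single cannot-link pair grow into three must-link and nine cannot-link pairs, which could then be passed to any constraint-based clustering routine. A real implementation would follow the paper's own similarity and expansion criteria rather than these placeholders.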