Authors: 王娜 (Wang Na), 李霞 (Li Xia)

Semi-supervised clustering uses a small amount of supervised information, such as pairwise constraints, to guide unsupervised learning on a large amount of unlabeled data. The resulting gain in clustering performance depends heavily on which constraints are chosen, so finding pairwise constraints well suited to semi-supervised clustering is critical. This paper presents an active learning strategy based on the characteristics of the supervised information: it selects data pairs that are relatively far apart within the same class and data pairs that are relatively close but belong to different classes. These informative constraints are introduced into spectral clustering to build an active semi-supervised spectral clustering algorithm, ASSC (Active Semi-supervised Spectral Clustering). The selected constraints are used to adjust the point-to-point distance matrix in spectral clustering, so that intra-cluster distances decrease and inter-cluster distances increase. Experimental results on UCI benchmark data sets and an artificial data set show that these informative pairwise constraints yield a substantial performance improvement over spectral clustering with randomly selected pairwise constraints.
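The pipeline described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch under several assumptions that the abstract does not confirm: an oracle (simulated here by the true labels in `oracle_labels`) answers same-class or different-class queries; must-link pairs have their distance set to zero and cannot-link pairs are set to the largest observed distance; the clustering step is a standard normalized spectral clustering in the style of Ng, Jordan and Weiss with a Gaussian affinity of width `sigma`. All function names and parameters (`select_constraints`, `adjust_distances`, `assc_sketch`, `n_pairs`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def select_constraints(D, oracle_labels, n_pairs):
    """Pick the farthest same-class pairs and the closest different-class pairs.

    Assumption: oracle_labels stands in for the human oracle that an
    active learner would query.
    """
    iu, ju = np.triu_indices(len(oracle_labels), k=1)
    same = oracle_labels[iu] == oracle_labels[ju]
    d = D[iu, ju]
    ml = np.argsort(-d[same])[:n_pairs]   # same class, largest distance first
    cl = np.argsort(d[~same])[:n_pairs]   # different class, smallest distance first
    must = [(int(a), int(b)) for a, b in zip(iu[same][ml], ju[same][ml])]
    cannot = [(int(a), int(b)) for a, b in zip(iu[~same][cl], ju[~same][cl])]
    return must, cannot

def adjust_distances(D, must, cannot):
    """Assumed adjustment rule: must-link -> 0, cannot-link -> max distance."""
    D_adj = D.copy()
    d_max = D.max()
    for i, j in must:
        D_adj[i, j] = D_adj[j, i] = 0.0
    for i, j in cannot:
        D_adj[i, j] = D_adj[j, i] = d_max
    return D_adj

def spectral_embedding(D, k, sigma):
    """Row-normalised top-k eigenvectors of D^{-1/2} W D^{-1/2}."""
    W = np.exp(-D ** 2 / (2.0 * sigma ** 2))  # Gaussian affinity
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)               # eigenvalues in ascending order
    U = vecs[:, -k:]
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(U, k, n_iter=50):
    """Small k-means with deterministic farthest-point initialisation."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(U[:, None] - centers[None], axis=2)
        assign = np.argmin(dists, axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = U[assign == c].mean(axis=0)
    return assign

def assc_sketch(X, oracle_labels, k, n_pairs, sigma):
    """Select constraints, adjust the distance matrix, then cluster spectrally."""
    D = pairwise_distances(X)
    must, cannot = select_constraints(D, oracle_labels, n_pairs)
    D_adj = adjust_distances(D, must, cannot)
    return kmeans(spectral_embedding(D_adj, k, sigma), k)
```

Replacing `select_constraints` with a routine that draws pairs uniformly at random would give the random-constraint baseline that the abstract compares against. A practical active learner would not have the labels in advance; it would first shortlist candidate pairs from the distance matrix or from a preliminary clustering and then send only those pairs to the oracle.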