Semi-supervised clustering algorithms usually use a small amount of labeled data to optimize cluster description parameters (such as cluster centroids) and then assign unlabeled data to clusters according to those parameters. However, existing algorithms ignore the direct influence that labeled data can have on the cluster assignment of the unlabeled data around them. This paper proposes a double adjustment strategy: after the data are clustered according to the cluster parameters, the labeled data are used to adjust the cluster labels of the nearby unlabeled data, which improves clustering accuracy. The adjustment range is determined dynamically from the local data density around each labeled point, and a novel similarity measure is proposed to improve the accuracy of the adjusted unlabeled data. The strategy is applied to two algorithms, a multinomial-model-based semi-supervised clustering algorithm and a semi-supervised fuzzy clustering algorithm, and the modified algorithms are evaluated on several standard datasets. Experimental results show that the modified algorithms effectively improve the accuracy of semi-supervised clustering.
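The adjustment step can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the density rule or the similarity measure, so the sketch assumes that the adjustment radius of a labeled point is the distance to its k-th nearest neighbor (small in dense regions, large in sparse ones) and substitutes plain Euclidean distance for the proposed similarity measure. The function name `adjust_labels` and the parameter `k` are hypothetical.

```python
import numpy as np


def adjust_labels(X, labels, labeled_idx, labeled_classes, k=3):
    """Relabel unlabeled points that lie close to a labeled point.

    X               : (n, d) array of data points
    labels          : (n,) cluster labels from an initial parameter-based clustering
    labeled_idx     : indices of the points whose true class is known
    labeled_classes : the known classes of those points
    k               : assumed neighborhood size that sets the adjustment radius
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).copy()

    is_labeled = np.zeros(len(X), dtype=bool)
    is_labeled[labeled_idx] = True
    labels[labeled_idx] = labeled_classes  # labeled points keep their known class

    # Distance from each unlabeled point to the labeled point that last claimed it.
    best_dist = np.full(len(X), np.inf)

    for i, c in zip(labeled_idx, labeled_classes):
        # Euclidean distance stands in for the paper's similarity measure (assumption).
        d = np.linalg.norm(X - X[i], axis=1)

        # Assumed density rule: radius = distance to the k-th nearest neighbor.
        # Index 0 of the sorted distances is the point itself (distance 0).
        radius = np.sort(d)[min(k, len(X) - 1)]

        # A labeled point claims unlabeled points inside its radius; a point that
        # several labeled points claim goes to the closest of them.
        near = (~is_labeled) & (d <= radius) & (d < best_dist)
        labels[near] = c
        best_dist[near] = d[near]

    return labels
```

In this sketch the adjustment runs once, after the initial clustering; points outside every radius keep the label assigned from the cluster parameters.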