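To address the problem that existing semi-supervised maximum margin clustering algorithms struggle to improve clustering accuracy when many samples from different classes are very similar, the following strategy is proposed. First, a robust pairwise-constraint loss function is designed based on the maximum margin criterion; even when many samples from different classes are very similar, this function can still effectively detect clustering results that violate the pairwise constraints and penalize them accordingly, thereby improving clustering performance. Second, an iterative algorithm based on the constrained concave-convex procedure is designed to solve the resulting problem. Building on this strategy, a new clustering algorithm is proposed: the robust pairwise constrained maximum margin clustering (RPCMMC) algorithm. Experimental results show that the algorithm effectively overcomes the shortcomings of existing semi-supervised maximum margin clustering algorithms, and its clustering error rate is clearly lower than that of traditional semi-supervised clustering algorithms.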
To solve the problem that existing semi-supervised maximum margin clustering algorithms do not work robustly when many very similar samples exist in different categories, this study adopted the following strategy. First, a robust loss function for violations of the pairwise constraints is designed based on the maximum margin principle; it penalizes violations of the pairwise constraints robustly. Second, an iterative algorithm based on the constrained concave-convex procedure (CCCP) is designed to improve the clustering accuracy. Based on this strategy, a new semi-supervised clustering algorithm, the robust pairwise constrained maximum margin clustering (RPCMMC) algorithm, is put forward. The experimental results demonstrate that the proposed algorithm overcomes the drawbacks of existing semi-supervised maximum margin clustering algorithms and outperforms some representative semi-supervised clustering algorithms.
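
For readers unfamiliar with the concave-convex procedure mentioned above, the following is a minimal sketch of the basic iteration on a toy one-dimensional objective. It is not the RPCMMC algorithm: the objective f(x) = x^4 - 2x^2, its split into u(x) = x^4 and v(x) = 2x^2, and the function name `cccp_minimize` are assumptions chosen only for illustration, and the constraints handled by the constrained variant are omitted.

```python
import math


def cccp_minimize(x0, steps=50, tol=1e-12):
    """Toy concave-convex procedure for f(x) = x**4 - 2*x**2 (illustrative only).

    The objective is written as f = u - v with u(x) = x**4 and v(x) = 2*x**2,
    both convex. Each iteration replaces v by its tangent at the current point
    and minimizes the resulting convex surrogate u(x) - v'(x_t) * x exactly.
    """
    def f(x):
        return x**4 - 2 * x**2

    x = x0
    history = [f(x)]  # objective value at every iterate
    for _ in range(steps):
        slope = 4 * x  # v'(x_t), the gradient of the linearized concave part
        # The surrogate x**4 - slope * x is minimized where 4 * x**3 = slope,
        # i.e. at the real cube root of slope / 4.
        x_new = math.copysign(abs(slope / 4) ** (1 / 3), slope)
        history.append(f(x_new))
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x, history


if __name__ == "__main__":
    x_star, values = cccp_minimize(0.5)
    print(x_star, values[-1])
```

Because each surrogate upper-bounds the objective and touches it at the current iterate, the objective value cannot increase from one step to the next. This monotone descent is the property that makes the procedure attractive for non-convex margin-based formulations.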