距离度量对模糊聚类算法FCM的聚类结果有关键性的影响。实际应用中存在这样一种场景,聚类的数据集中存在着一定量的带标签的成对约束集合的辅助信息。为了充分利用这些辅助信息,首先提出了一种基于混合距离学习方法,它能利用这样的辅助信息来学习出数据集合的距离度量公式。然后,提出了一种基于混合距离学习的鲁棒的模糊C均值聚类算法(HR—FCM算法),它是一种半监督的聚类算法。算法HR.FCM既保留了GIFP-FCM(Generalized FCM algorithm with improved fuzzy partitions)算法的鲁棒性等性能,也因为所采用更为合适的距离度量而具有更好的聚类性能。实验结果证明了所提算法的有效性。
The distance metric plays a vital role in the fuzzy C-means clustering algorithm. In actual applications, there is a practical scenario in which the clustered data have a certain amount of side information, such as pairwise constraints with labels. To sufficiently utilize this side information, first, we propose a learning method based on hybrid distance, in which side information can be utilized to attain a distance metric formula for the data set. Next, we propose a robust fuzzy C-means clustering algorithm (HR-FCM algorithm) based on hybrid-distance learning, which is semi-supervised. The HR-FCM inherits the robustness of the GIFP-FCM ( generalized FCM algorithm with improved fuzzy partitions) and has better clustering performance due to the more appropriate distance metric. The experimental results confirm the effectiveness of the proposed algorithm.