To analyze more effectively the clustering of high-order heterogeneous data in the regions where clusters overlap, a fuzzy co-clustering algorithm for high-order heterogeneous data (HFCC) is proposed. HFCC minimizes the weighted distances between objects and cluster centers in each feature space. Iterative update rules for the fuzzy memberships of objects and the weights of features are derived, an iterative algorithm is designed for the clustering process, and the convergence of this algorithm is proved theoretically. In addition, the XB validity index is generalized into the GXB validity index, which measures the quality of clustering results on high-order heterogeneous data and is used to determine the number of clusters. Experimental results show that HFCC effectively detects the overlapping cluster structures hidden in the data, that its clustering quality is clearly superior to that of five representative hard-partitioning high-order co-clustering algorithms, and that the GXB index effectively estimates the number of clusters in high-order heterogeneous data.
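For orientation, the classical XB (Xie-Beni) index that GXB generalizes can be sketched as below. This is a minimal illustration of the standard single-feature-space index only; the GXB generalization to high-order heterogeneous data is not specified in this abstract and is not reproduced here. The function names, the argument layout (`memberships[c][i]` is the membership of object `i` in cluster `c`), and the default fuzzifier `m = 2` are assumptions made for the example.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni_index(
    points: Sequence[Sequence[float]],
    centers: Sequence[Sequence[float]],
    memberships: Sequence[Sequence[float]],
    m: float = 2.0,
) -> float:
    """Classical Xie-Beni index: fuzzy compactness divided by
    (number of objects * minimum squared distance between centers).
    Smaller values indicate compact, well-separated clusters."""
    n = len(points)
    k = len(centers)
    if k < 2:
        raise ValueError("the index needs at least two clusters")

    # Membership-weighted sum of squared object-to-center distances.
    compactness = sum(
        (memberships[c][i] ** m) * squared_distance(points[i], centers[c])
        for c in range(k)
        for i in range(n)
    )

    # Smallest squared distance between any two distinct centers.
    separation = min(
        squared_distance(centers[a], centers[b])
        for a in range(k)
        for b in range(a + 1, k)
    )
    if separation == 0.0:
        raise ValueError("two cluster centers coincide")

    return compactness / (n * separation)
```

A typical use of such an index is to run the clustering algorithm for several candidate numbers of clusters and keep the number that yields the smallest index value.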