Many cluster validity functions have been proposed, especially those based on the geometric structure of the data set, such as Dunn's index and the Xie-Beni index. The Xie-Beni index performs well as a validity function, but it decreases monotonically when the number of clusters becomes very large, which makes it difficult to choose the optimal partition of the data in that regime. To address this problem, the improved Hubert Γ statistic and the cluster separation are examined in turn, from the viewpoint of the compactness and the separation of a clustering, and a new validity function based on the geometric structure of the data is derived by combining them. The function has a unique maximum, so its tendency to decrease as the number of clusters grows does not affect the determination of the optimal number of clusters. Experiments show that the function finds the optimal number of clusters, performs well on data with a fairly clear class structure, and is robust to the fuzzy factor m. It is simple and precise, and can serve as an index for choosing the optimal partition of the data.
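To make the limitation of the Xie-Beni index concrete, the following is a minimal sketch of the index in its commonly cited form: the membership-weighted within-cluster scatter divided by n times the smallest squared distance between two cluster centers. It does not implement the validity function proposed here, whose definition is not given in this abstract. The names `xie_beni` and `sq_dist`, the argument layout, and the toy data are assumptions made for illustration only.

```python
from typing import Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(points, centers, memberships, m: float = 2.0) -> float:
    """Xie-Beni index (smaller is better).

    points:      list of n data points
    centers:     list of c cluster centers (c >= 2)
    memberships: c x n matrix, memberships[i][k] is the degree to which
                 point k belongs to cluster i (assumed layout)
    m:           fuzzy factor; m = 2 gives the original form of the index
    """
    n, c = len(points), len(centers)
    if c < 2:
        raise ValueError("the index needs at least two clusters")
    # Compactness: membership-weighted squared distances to the centers.
    compactness = sum(
        (memberships[i][k] ** m) * sq_dist(points[k], centers[i])
        for i in range(c)
        for k in range(n)
    )
    # Separation: smallest squared distance between two distinct centers.
    separation = min(
        sq_dist(centers[i], centers[j])
        for i in range(c)
        for j in range(i + 1, c)
    )
    return compactness / (n * separation)


# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Natural two-cluster partition with crisp memberships.
xb_two = xie_beni(points, [(0.0, 0.5), (10.0, 0.5)],
                  [[1, 1, 0, 0], [0, 0, 1, 1]])

# Degenerate partition: every point is its own cluster center.
identity = [[1 if i == k else 0 for k in range(4)] for i in range(4)]
xb_all = xie_beni(points, points, identity)

print(xb_two, xb_all)
```

In this toy case the degenerate partition with one cluster per point drives the compactness term to zero, so it scores lower (better) than the natural two-cluster partition. This is the behavior that makes the index hard to use when the number of clusters is large.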