To determine the optimal number of clusters in a dataset more effectively, this paper proposes a novel algorithm for finding the optimal clustering number in large datasets. Drawing on the idea of hierarchical clustering, the algorithm generates all possible partitions of the dataset in a single run, selects the best partition according to a cluster validity index, and thereby obtains the optimal number of clusters. Theoretical analysis and experimental results show that the algorithm is effective and performs well.
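The general scheme described above (build every partition from n singletons down to one cluster in a single agglomerative pass, then score each partition with a validity index and keep the best) can be sketched as follows. This is an illustrative sketch only, not the authors' algorithm: the abstract names neither the linkage criterion nor the validity index, so average linkage and the silhouette coefficient are assumptions here, and the function names `all_partitions`, `silhouette`, and `optimal_k` are hypothetical.

```python
import math
from itertools import combinations


def all_partitions(points):
    """Average-linkage agglomerative clustering (assumed linkage) that records
    the partition at every level, from n singletons down to one cluster."""
    clusters = [[i] for i in range(len(points))]
    partitions = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for i, j in combinations(range(len(clusters)), 2):
            d = sum(math.dist(points[a], points[b])
                    for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
        partitions[len(clusters)] = [list(c) for c in clusters]
    return partitions


def silhouette(points, partition):
    """Mean silhouette coefficient (assumed validity index) of a partition."""
    label = {m: c for c, members in enumerate(partition) for m in members}
    total = 0.0
    for p in range(len(points)):
        own = partition[label[p]]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(math.dist(points[p], points[q]) for q in own if q != p) / (len(own) - 1)
        b = min(sum(math.dist(points[p], points[q]) for q in other) / len(other)
                for c, other in enumerate(partition) if c != label[p])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def optimal_k(points):
    """Score every partition with 2 <= k <= n-1 clusters and return the best k
    (requires at least 3 points, since the silhouette needs 2 <= k <= n-1)."""
    parts = all_partitions(points)
    scores = {k: silhouette(points, parts[k]) for k in range(2, len(points))}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Three small, well-separated groups of 2-D points.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
           (20, 0), (20, 1), (21, 0)]
    k, scores = optimal_k(pts)
    print("optimal k:", k)
```

On the three well-separated groups in the usage example, the three-cluster partition receives the highest silhouette score. Note that this naive sketch recomputes all pairwise linkages at every merge, so it is only suitable for small inputs; an implementation aimed at large datasets, as the paper targets, would need a more efficient hierarchy construction.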