为确定K-means等聚类算法的初始聚类中心,首先由样本总量及其取值区间长度确定对应维上的样本密度统计区间数,并将满足筛选条件的密度峰值所在区间内的样本均值作为候选初始聚类中心;然后,根据密度峰值区间在各维上的映射关系建立候选初始聚类中心关系树,进一步采用最大最小距离算法获得初始聚类中心;最后为确定最佳聚类数,基于类内样本密度及类密度建立聚类有效性评估函数.针对人工数据集及UCI数据集的实验结果表明了所提出算法的有效性.
To determine the initial cluster centers for partitional clustering algorithms such as K-means, the number of intervals used for sample density statistics in each dimension is first determined from the total number of samples and the length of their value range in that dimension, and the mean of the samples in each density-peak interval that satisfies the filtering conditions is taken as a candidate initial cluster center. Then, a relationship tree of the candidate initial cluster centers is built from the mapping relations of the density-peak intervals across the dimensions, and the initial cluster centers are obtained with the max-min distance algorithm. Finally, to determine the optimal number of clusters, a clustering validity evaluation function is established based on the within-cluster sample density and the cluster density. Experimental results on synthetic and UCI data sets demonstrate the effectiveness of the proposed algorithm.
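Two of the steps named above, taking the sample mean of density-peak intervals in one dimension and selecting centers with the max-min distance rule, can be illustrated with a minimal Python sketch. It is not the proposed algorithm itself: the abstract gives neither the formula for the number of intervals nor the filtering conditions, so `n_bins` and `min_count` are hypothetical parameters, the peak test is a simple local-maximum check, and the choice of the first center is an assumption. The relationship tree and the validity evaluation function are not covered.

```python
import math


def peak_bin_means(values, n_bins, min_count=2):
    """Mean of the samples in each local density-peak bin of one dimension.

    n_bins and min_count are illustrative assumptions, not the paper's rules.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range
    bins = [[] for _ in range(n_bins)]
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # maximum goes in the last bin
        bins[idx].append(v)
    counts = [len(b) for b in bins]
    means = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else -1
        right = counts[i + 1] if i < n_bins - 1 else -1
        # Simple peak test: higher than the left bin, not lower than the right.
        if c >= min_count and c > left and c >= right:
            means.append(sum(bins[i]) / c)
    return means


def max_min_centers(candidates, k):
    """Pick up to k centers from candidate points with the max-min distance rule."""
    centers = [candidates[0]]  # assumption: start from the first candidate
    while len(centers) < min(k, len(candidates)):
        # Distance from each candidate to its nearest already-chosen center.
        nearest = [min(math.dist(p, c) for c in centers) for p in candidates]
        # The next center is the candidate that is farthest from all chosen ones.
        best = max(range(len(candidates)), key=nearest.__getitem__)
        centers.append(candidates[best])
    return centers


if __name__ == "__main__":
    xs = [1.0, 1.1, 1.2, 3.0, 5.0, 5.1, 5.2, 5.3]
    print(peak_bin_means(xs, n_bins=5))
    print(max_min_centers([(0, 0), (1, 0), (10, 0), (10, 10)], k=3))
```

In this sketch the number of clusters `k` is passed in directly, whereas the abstract selects it with the validity evaluation function; a threshold on the max-min distance is another common stopping rule.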