The traditional K-means algorithm is sensitive to the choice of initial cluster centers, so its results may converge to a suboptimal solution. To address this, this paper presents a hybrid text clustering algorithm that combines dual particle swarm optimization with K-means. A self-adjusting inertia weight strategy is designed, which adjusts the inertia weight dynamically according to the rate of change of the optimal fitness value. Two sub-swarms evolve using PSO with different inertia weight strategies. Information is exchanged between the offspring of the two sub-swarms and between offspring and parents: the best particle is shared and the worst particle is replaced, which completes the evolution. This algorithm is named dual particle swarm optimization. Dual particle swarm optimization, which balances global and local search, is then combined with the efficient K-means algorithm: each particle encodes a set of cluster centers, the fitness function is the reciprocal of the sum of within-class scatter, and each newly generated particle is refined with K-means. The result is the hybrid text clustering algorithm based on dual particle swarm optimization and K-means. Experimental results show that, compared with text clustering algorithms such as K-means and PSO, the proposed algorithm is more robust and yields clearly better clustering results.
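The scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives no formulas, so several pieces are assumptions made for illustration: Euclidean distance as the scatter measure, the linearly decreasing weight used by the first sub-swarm, the exponential self-adjusting rule for the second, the acceleration coefficients of 2.0, and all function and parameter names (`assign`, `fitness`, `kmeans_refine`, `dual_pso_kmeans`).

```python
import numpy as np

def assign(X, centers):
    # Nearest-center index and distance for every document vector.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1), d.min(axis=1)

def fitness(X, centers):
    # Reciprocal of the summed within-class scatter.
    # Assumption: scatter is the Euclidean distance to the nearest center.
    _, dist = assign(X, centers)
    return 1.0 / (dist.sum() + 1e-12)

def kmeans_refine(X, centers, steps=1):
    # Refine one particle (a set of cluster centers) with K-means iterations.
    centers = centers.copy()
    for _ in range(steps):
        labels, _ = assign(X, centers)
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
    return centers

def dual_pso_kmeans(X, k, n_particles=10, iters=30, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    def init_swarm():
        # Each particle is k centers drawn from the data without replacement.
        pos = np.stack([X[rng.choice(len(X), k, replace=False)]
                        for _ in range(n_particles)])
        pfit = np.array([fitness(X, p) for p in pos])
        return {"pos": pos, "vel": np.zeros_like(pos),
                "pbest": pos.copy(), "pfit": pfit}

    swarms = [init_swarm(), init_swarm()]
    gbest, gfit = None, -np.inf
    for s in swarms:
        i = s["pfit"].argmax()
        if s["pfit"][i] > gfit:
            gbest, gfit = s["pbest"][i].copy(), s["pfit"][i]

    w_adapt = 0.9  # inertia weight of the self-adjusting sub-swarm
    for t in range(iters):
        prev_gfit = gfit
        # Assumption: the first sub-swarm uses a linearly decreasing weight.
        w_linear = 0.9 - 0.5 * t / max(iters - 1, 1)
        for s, w in zip(swarms, (w_linear, w_adapt)):
            r1, r2 = rng.random(s["pos"].shape), rng.random(s["pos"].shape)
            s["vel"] = (w * s["vel"]
                        + 2.0 * r1 * (s["pbest"] - s["pos"])
                        + 2.0 * r2 * (gbest - s["pos"]))
            s["pos"] = s["pos"] + s["vel"]
            for i in range(n_particles):
                # Optimize each newly generated particle with K-means.
                s["pos"][i] = kmeans_refine(X, s["pos"][i])
                f = fitness(X, s["pos"][i])
                if f > s["pfit"][i]:
                    s["pbest"][i], s["pfit"][i] = s["pos"][i].copy(), f
        # Information exchange: share the best particle across both sub-swarms.
        for s in swarms:
            i = s["pfit"].argmax()
            if s["pfit"][i] > gfit:
                gbest, gfit = s["pbest"][i].copy(), s["pfit"][i]
        # Replace the worst particle of each sub-swarm with the shared best.
        for s in swarms:
            worst = s["pfit"].argmin()
            s["pos"][worst] = gbest
            s["pbest"][worst], s["pfit"][worst] = gbest.copy(), gfit
        # Hypothetical self-adjusting rule: a stagnating best fitness raises
        # the weight (more exploration), fast improvement lowers it.
        rate = (gfit - prev_gfit) / prev_gfit
        w_adapt = 0.4 + 0.5 * np.exp(-10.0 * rate)
    return gbest, gfit
```

Calling `dual_pso_kmeans(X, k)` on a matrix of document vectors (for example TF-IDF rows) returns the best set of cluster centers found and its fitness. `assign(X, centers)` then gives the cluster label of each document.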