探讨了“分裂-合并”(DM)策略对文本聚类集成算法改进的效果。首先在聚类成员生成阶段运行使用DM策略的超球K均值(SKM)算法r次,每次生成较多的文本子簇,并根据子簇的相似性使用凝聚层次聚类方法合并这些子簇,得到r个聚类成员,随后在聚类集成阶段采用两个快速的谱聚类算法进行集成。在6组真实文本集上进行了实验,使用DM策略的两个聚类集成算法获得的平均标准化互信息(NMI)分别比改进前的算法提高了4.6和7.9个百分点,证明了DM策略可以有效提高文本聚类集成算法的聚类质量。
The effect of the divide-and-merge (DM) strategy on document cluster ensemble algorithms was explored. First, in the ensemble member generation phase, the spherical K-means (SKM) algorithm with the DM strategy was run r times; each run generated a relatively large number of document sub-clusters, which were then merged by agglomerative hierarchical clustering according to their similarity, yielding r ensemble members. Then, in the ensemble phase, two fast spectral clustering algorithms were used to combine the r clusterings. Experiments on six real-world document sets showed that, with the DM strategy, the average normalized mutual information (NMI) of the two cluster ensemble algorithms was 4.6 and 7.9 percentage points higher, respectively, than that of the original algorithms. These results demonstrate that the DM strategy can effectively improve the clustering quality of document cluster ensemble algorithms.
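The member generation phase described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`spherical_kmeans`, `merge_subclusters`, `dm_ensemble_members`), the parameter `k_sub` (number of sub-clusters per run), and the merge criterion (cosine similarity between sub-cluster centroids) are all assumptions made for illustration, since the abstract does not specify the similarity measure. The spectral ensemble phase is not covered.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def spherical_kmeans(docs, k, rng, n_iter=20):
    """Spherical K-means on unit vectors: assign by cosine, re-normalize centroids."""
    centroids = [list(d) for d in rng.sample(docs, k)]
    labels = [0] * len(docs)
    for _ in range(n_iter):
        labels = [max(range(k), key=lambda c: dot(d, centroids[c])) for d in docs]
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels

def merge_subclusters(docs, sub_labels, k):
    """Merge step: agglomeratively join the two most similar sub-clusters
    until k clusters remain (assumed criterion: cosine of centroids)."""
    groups = {}
    for i, lab in enumerate(sub_labels):
        groups.setdefault(lab, []).append(i)
    clusters = list(groups.values())  # each cluster is a list of document indices

    def centroid(idx):
        return normalize([sum(col) for col in zip(*(docs[i] for i in idx))])

    while len(clusters) > k:
        cents = [centroid(c) for c in clusters]
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        a, b = max(pairs, key=lambda p: dot(cents[p[0]], cents[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected

    labels = [0] * len(docs)
    for new_lab, idx in enumerate(clusters):
        for i in idx:
            labels[i] = new_lab
    return labels

def dm_ensemble_members(raw_docs, k, k_sub, r, seed=0):
    """Run divide (SKM with k_sub > k sub-clusters) then merge, r times."""
    docs = [normalize(d) for d in raw_docs]
    rng = random.Random(seed)
    members = []
    for _ in range(r):
        sub_labels = spherical_kmeans(docs, k_sub, rng)      # divide
        members.append(merge_subclusters(docs, sub_labels, k))  # merge
    return members
```

For example, `dm_ensemble_members(vectors, k=2, k_sub=4, r=3)` returns three label lists, one per ensemble member, where `vectors` stands for a list of term-weight vectors (e.g. TF-IDF). Over-partitioning first and merging afterwards lets each member capture clusters that a single K-means pass with k centroids might split or blend.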