This paper proposes a clustering algorithm that handles high-dimensional data effectively. The algorithm first partitions the document collection into clusters by constructing feature chains. In both similarity computation and weight adjustment, it takes the influence of similar features into account so that semantically similar documents are grouped together, and it dynamically adjusts document weights so that documents from unevenly distributed classes are adequately trained. Experiments show that the algorithm obtains good clustering results in high-dimensional spaces, with high intra-cluster similarity and good inter-cluster separation, and that it requires relatively few iterations.
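The abstract does not give the formula used to account for similar features in similarity computation. As an illustration only, the sketch below uses the soft cosine measure, a known technique swapped in here and not confirmed to be the authors' method. The function name `soft_cosine`, the three-word vocabulary, and the values in `feature_sim` are all hypothetical.

```python
import math


def soft_cosine(a, b, feature_sim):
    """Cosine-style similarity in which related features also count as matches.

    a, b        : term-weight vectors of equal length
    feature_sim : square matrix, feature_sim[i][j] in [0, 1], with 1.0 on the diagonal
    """
    def bilinear(x, y):
        # x^T * S * y, where S is the feature-similarity matrix
        return sum(feature_sim[i][j] * x[i] * y[j]
                   for i in range(len(x)) for j in range(len(y)))

    denom = math.sqrt(bilinear(a, a) * bilinear(b, b))
    return bilinear(a, b) / denom if denom > 0 else 0.0


# Hypothetical vocabulary: ["car", "automobile", "banana"]
feature_sim = [
    [1.0, 0.9, 0.0],  # "car" is assumed to be close to "automobile"
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# With the identity matrix the measure reduces to ordinary cosine similarity.
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

doc_car = [1.0, 0.0, 0.0]
doc_auto = [0.0, 1.0, 0.0]
doc_banana = [0.0, 0.0, 1.0]

print(soft_cosine(doc_car, doc_auto, identity))     # no shared terms
print(soft_cosine(doc_car, doc_auto, feature_sim))  # related terms are credited
```

Under these assumed values, two documents that share no terms but use related features receive a nonzero similarity, which is the kind of effect the abstract describes when it speaks of grouping semantically similar documents.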