Text clustering is an important research direction within clustering and a major application of clustering to text processing. However, traditional clustering algorithms do not perform satisfactorily on text clustering tasks. This paper introduces a new clustering algorithm, Chameleon, to the field of Chinese text clustering. A Chinese text clustering model is constructed, and experiments are carried out by combining it with word segmentation and text vectorization techniques. The results show that Chameleon can be applied to Chinese text clustering and that it addresses the weakness of traditional algorithms in discovering cluster shapes. The experiments demonstrate the effectiveness and practicality of the algorithm for Chinese text clustering.
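As a rough illustration of the pipeline the abstract describes (segmentation, vectorization, then Chameleon), the following minimal sketch turns a few Chinese sentences into vectors and builds a k-nearest-neighbor similarity graph, which is the sparse graph that Chameleon, as generally described, starts from. Several parts are assumptions not taken from the paper: overlapping character bigrams stand in for a real Chinese word segmenter, TF-IDF is one possible weighting for the "text vectorization" step, and the function names, sample sentences, and `k` value are hypothetical. The later Chameleon phases (partitioning the graph into sub-clusters and merging them by relative interconnectivity and relative closeness) are not shown.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Stand-in tokenizer (assumption): overlapping character bigrams.

    A real system would use a Chinese word segmenter here instead.
    """
    chars = [c for c in text if not c.isspace()]
    return [a + b for a, b in zip(chars, chars[1:])]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [char_bigrams(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF so that no weight is zero or undefined.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_graph(vectors, k):
    """For each document, keep edges to its k most similar other documents.

    Edges are weighted by cosine similarity; zero-similarity edges are dropped.
    """
    graph = {}
    for i, u in enumerate(vectors):
        sims = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        sims.sort(key=lambda pair: (-pair[0], pair[1]))
        graph[i] = {j: s for s, j in sims[:k] if s > 0.0}
    return graph


# Hypothetical sample sentences: two about text clustering, two about weather.
docs = [
    "文本聚类是聚类的重要应用",
    "文本聚类是文本处理的重要方向",
    "今天天气晴朗适合出门散步",
    "明天天气晴朗适合户外运动",
]
graph = knn_graph(tfidf_vectors(docs), k=1)
for node, edges in graph.items():
    print(node, sorted(edges))
```

With `k=1`, each sentence links only to its single most similar neighbor, so the two topic groups form separate components of the graph. A larger `k` yields a denser graph for the subsequent partitioning and merging phases to work on.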