
Research on Text Clustering Technology Based on the Chameleon Algorithm

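ISSN: 1005-3751
Journal: Computer Technology and Development (计算机技术与发展)
Date: 0
Pages: 1-4
Language: Chinese
Classification: TP311.5 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: [1] School of Information Security Engineering, Shanghai Jiao Tong University, Shanghai 200240
Funding: National Natural Science Foundation of China (60772098); Program for New Century Excellent Talents in University, Ministry of Education (NCET-0600393); Science and Technology Commission of Shanghai Municipality key research project (08511501902); 2007 Shanghai Shuguang Program (IAP1027)
Related project: Research on public-opinion analysis of discrete online text information based on a feature concept network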

Keywords: text clustering, Chameleon, text vector

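Abstract (translated from Chinese):

Text clustering is an important research direction within clustering and an important application of clustering to text processing. However, traditional clustering algorithms do not perform satisfactorily in text clustering applications. This paper introduces a new clustering algorithm, Chameleon, into the field of Chinese text clustering. Building on a Chinese text clustering model, experiments were carried out that combine word segmentation, text vectorization, and related techniques. The results show that the Chameleon algorithm can be applied to Chinese text clustering and that it also addresses the weakness of traditional algorithms in discovering cluster shapes. The experiments demonstrate the effectiveness and practicality of the algorithm for Chinese text clustering.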

English abstract:

Text clustering, one of the most important research branches of clustering, is the application of clustering algorithms to text processing. The performance of traditional clustering algorithms in Chinese text processing is often unsatisfactory. In this paper, a new clustering algorithm, Chameleon, is introduced to text processing. Combining a segmentation algorithm with text vectorization, experiments were carried out on a Chinese text clustering model. The experiments show that Chameleon can be used for Chinese text clustering and that it does better than traditional algorithms at finding cluster shapes. Finally, a series of experiments shows that the Chameleon algorithm is more effective and practical than traditional text clustering algorithms.
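
The abstract names word segmentation and text vectorization as the steps that feed the clustering stage. As a rough illustration only, and not the authors' implementation, the sketch below builds term-frequency vectors and a k-nearest-neighbor similarity graph, which is the kind of sparse graph Chameleon starts from. It assumes the documents are already segmented into space-separated tokens; the sample documents, the function names, and the choice of cosine similarity are hypothetical. Chameleon's later phases (partitioning the graph into small sub-clusters and merging them by relative interconnectivity and relative closeness) are not shown.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Turn a list of tokens into a sparse term-frequency vector."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_graph(vectors, k):
    """Link each document to its k most similar documents (zero-similarity pairs are skipped)."""
    graph = {}
    for i, vi in enumerate(vectors):
        sims = [(cosine(vi, vj), j) for j, vj in enumerate(vectors) if j != i]
        # Highest similarity first; ties broken by document index.
        sims.sort(key=lambda s: (-s[0], s[1]))
        graph[i] = [(j, sim) for sim, j in sims[:k] if sim > 0]
    return graph


# Hypothetical documents, assumed to be already segmented into tokens.
docs = [
    "文本 聚类 算法 研究",
    "聚类 算法 文本 向量",
    "足球 比赛 结果",
    "足球 联赛 比赛",
]
vectors = [tf_vector(d.split()) for d in docs]
graph = knn_graph(vectors, k=1)
for node, edges in graph.items():
    print(node, edges)
```

With these four sample documents, the two clustering-related texts link to each other and the two football-related texts link to each other. In practice a Chinese word segmenter would replace the `split()` call, and a weighting such as TF-IDF might replace raw term counts.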
