东篱科研大数据发现系统（DRDS）

位置：成果数据库 > 期刊 > 期刊详情页

权吸引子传播算法及其在文本聚类中的应用

期刊名称：计算机研究与发展，2010, 47(10): 1733-1740.（EI）
时间：0
分类：TP181[自动化与计算机技术—控制科学与工程;自动化与计算机技术—控制理论与控制工程]
作者机构：[1]吉林大学计算机科学与技术学院,长春130012, [2]符号计算与知识工程教育部重点实验室吉林大学,长春130012, [3]内蒙古民族大学计算机科学与技术学院,内蒙古通辽028000, [4]计算机软件新技术国家重点实验室（南京大学）,南京210093, [5]吉林大学地球科学学院,长春130061
相关基金：国家自然科学基金项目（60673023 10872077 60703025）; 国家“八六三”高技术研究发展计划基金项目（2009AA02Z307）; 吉林省科技发展基金项目（20080708 20090152）; 内蒙古自然科学基金项目（20080404MS0908）; 内蒙古民族大学博士科研启动基金项目（BS191）; 吉林大学研究生创新基金项目（20091024）
相关项目：基于结构Motif方法的蛋白质与DNA相互作用研究

关键词：吸引子传播算法, 相似特征集, 排斥特征集, 仲裁特征集, 文本聚类, affinity propagation, similar feature set, rejective feature set, arbitral feature set, text clustering

中文摘要：

吸引子传播算法（affinity propagation,AP）是一种新的高效聚类算法.由于AP算法简单易用,它已被广泛应用到数据挖掘的各个领域.在AP算法中,相似性度量具有重要作用.另一方面,传统的文本挖掘常采用向量空间模型和满足欧氏空间的相似性度量.这种方法的优点是简单且易于实现,但随着数据规模的膨胀,向量空间将变得高维稀疏并将导致计算复杂度快速增长.为解决此问题,给出了相似特征集、排斥特征集和仲裁特征集的概念,在这些概念的基础上提出了一种能够包含文本结构信息的非欧空间相似性度量方法.并提出了一种新的聚类算法,称之为权吸引子传播算法（weight affinity propagation,WAP）.为检验提出算法的聚类效果,选用标准数据集Reuters-21578进行了验证.实验结果表明WAP明显优于k-means聚类算法、具备非线性特征的SOFM聚类算法和采用经典相似性度量的吸引子传播算法等3种经典聚类算法.

英文摘要：

Affinity propagation（AP）is a newly developed and effective clustering algorithm.For its simplicity,general applicability,and good performance,AP has been used in many data mining research fields.In AP implementations,the similarity measurement plays an important role.Conventionally,text mining is based on the whole vector space model（VSM）and its similarity measurements often fall into Euclidean space.By clustering texts in this way,the advantage is simple and easy to perform.However,when the data scale puffs up,the vector space will become high-dimensional and sparse.Then,the computational complexity grows exponentially.To overcome this difficulty,a non-Euclidean space similarity measurement is proposed based on the definitions of similar feature set（SFS）,rejective feature set（RFS）and arbitral feature set（AFS）.The new similarity measurement not only breaks out the Euclidean space constraint,but also contains the structural information of documents.Therefore,a novel clustering algorithm,named weight affinity propagation（WAP）,is developed by combining the new similarity measurement and AP.In addition,as a benchmark dataset,Reuters-21578 is used to test the proposed algorithm.Experimental results show that the proposed method is superior to the classical k-means,traditional SOFM and affinity propagation with classic similarity measurement.

同期刊论文项目