Improved K-means News Topic Clustering Based on Topic Similarity

ISSN: 1672-9722
Journal: Computer & Digital Engineering (《计算机与数字工程》)
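Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094
Funding: Supported by the National Natural Science Foundation of China project "Research on Software Self-Healing Mechanisms and Methods in Virtual Computing Environments" (Grant No. 61300053).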

Keywords: K-means algorithm, news topic detection, public opinion supervision, text similarity, topic coverage

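Abstract (translated from Chinese):

News topic clustering has important applications in public opinion supervision, hot topic discovery, and real-time tracking of emergencies. K-means-based text clustering is widely used for news topic clustering because the algorithm is simple and easy to implement, has low time and space complexity, and gives good clustering results. The traditional K-means algorithm nevertheless has limitations, such as sensitivity to the choice of initial center points and the need for the user to specify the number of groups K, which cause the algorithm to converge to a local optimum instead of the global optimum. To address the unstable clustering results caused by random selection of the initial cluster centers in traditional K-means, this paper proposes an improved K-means algorithm for news topic detection. The algorithm selects the initial cluster centers based on the similarity of news reports, ensuring that the news topic clusters are well separated from one another. Building on this, it determines the number of topic clusters K automatically from the news topic coverage. Experimental results show that the improved algorithm produces stable, high-quality topic clusters.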

English abstract:

News topic clustering plays an important role in public opinion supervision, hot topic detection, and real-time tracking. Text clustering based on K-means is widely used for news topic clustering because it is simple to implement, has low time and space complexity, and yields good clustering results. However, the traditional K-means algorithm has limitations: it is sensitive to the choice of the initial center points, and the user must specify K, so the algorithm converges to a local optimum and cannot reach the global optimum. To address the unstable clustering caused by random selection of the initial cluster centers in traditional K-means, an improved K-means algorithm for news topic detection is proposed. The algorithm selects the initial cluster centers based on the similarity of news reports, which ensures that the news topic clusters are well discriminated. On this basis, the number of clusters K is determined automatically according to the coverage rate of the news topics. The experimental results show that the improved algorithm can generate stable, high-quality topic clusters.
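
The abstract names two ideas without giving their details: choosing initial centers from the similarity between news reports, and fixing K from topic coverage. The Python sketch below is one possible reading of those ideas, not the paper's actual procedure. Everything specific in it is an assumption made for illustration: the bag-of-words vectors, cosine similarity, the farthest-first seeding rule, the definition of coverage as the share of reports whose similarity to some seed reaches `sim_threshold`, and the names `select_seeds`, `sim_threshold`, and `target_coverage`.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def select_seeds(vectors, sim_threshold=0.3, target_coverage=0.9):
    """Pick initial centers and K together (hypothetical rule, see lead-in)."""
    n = len(vectors)
    sims = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    # First seed: the report with the highest total similarity to all reports.
    seeds = [max(range(n), key=lambda i: sum(sims[i]))]
    while True:
        # best[i] is the similarity of report i to its closest seed so far.
        best = [max(sims[i][s] for s in seeds) for i in range(n)]
        coverage = sum(b >= sim_threshold for b in best) / n
        if coverage >= target_coverage or len(seeds) == n:
            return seeds, coverage
        # Next seed: the unchosen report least similar to every existing seed,
        # which keeps the initial centers well separated.
        candidates = [i for i in range(n) if i not in seeds]
        seeds.append(min(candidates, key=lambda i: best[i]))


def kmeans(vectors, seeds, max_iter=20):
    """Standard K-means iterations under cosine similarity, started from seeds."""
    centers = [Counter(vectors[s]) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(len(centers)), key=lambda k: cosine(v, centers[k]))
                      for v in vectors]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                # Cosine ignores scale, so the summed vector acts as the mean.
                centers[k] = total
    return labels


# Toy corpus invented for this example: three topics, two reports each.
docs = [
    "earthquake hits city rescue teams arrive",
    "rescue teams search city after earthquake",
    "team wins football final match",
    "football final match ends in penalty win",
    "central bank raises interest rates",
    "interest rates rise as central bank acts",
]
vectors = [tf_vector(d) for d in docs]
seeds, coverage = select_seeds(vectors)
labels = kmeans(vectors, seeds)
print("K =", len(seeds), "coverage =", coverage, "labels =", labels)
```

Because the seeds are chosen deterministically from the similarity matrix, repeated runs on the same corpus give the same clusters, which is the stability property the abstract attributes to the improved algorithm. The threshold and coverage target here are arbitrary and would need tuning on real news data.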

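Project associated with this paper

Research on Software Self-Healing Mechanisms and Methods in Virtual Computing Environments

Journal papers: 4

Other journal papers from the same project

Path Optimization for Scenario-Based Testing Using Implication Relations

Intrusion Detection Based on an Improved Self-Organizing Map

Data Stream Anomaly Detection Based on Combined Incremental Clustering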

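Journal information

Computer & Digital Engineering (《计算机与数字工程》)

Supervising organization: China Shipbuilding Industry Corporation
Sponsor: No. 709 Research Institute, Seventh Academy, China Shipbuilding Industry Corporation
Editor-in-chief: Wang Xiaofei
Address: P.O. Box 74223, Wuchang
Postal code: 430074
Email: jssg@chinajournal.net.cn
Phone: 027-87534308, 87534205

International standard serial number: ISSN 1672-9722
Domestic unified serial number: CN 42-1372/TP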

Times cited: 13630