

Research on a Compound Keyword Extraction Method Based on the Small-World Model (基于小世界模型的复合关键词提取方法研究)

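Journal: 中文信息学报 (Journal of Chinese Information Processing)
Date: 0
Pages: 121-128
Language: Chinese
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Institute of Intelligent Information Processing, Xidian University, Xi'an, Shaanxi 710071; [2] Information Center, Xi'an Institute of Posts and Telecommunications, Xi'an, Shaanxi 710061; [3] Library, Xidian University, Xi'an, Shaanxi 710071
Funding: National Natural Science Foundation of China (60803162); Natural Science Foundation of Shaanxi Province (SJ08-ZT15); Scientific Research Program of the Shaanxi Provincial Education Department (08JK245)
Related project: Research on Key Technologies for Web Service Composition Management Based on Multi-Stage Availability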

Keywords: computer application, Chinese information processing, small world network, term network graph, average shortest path length increment, average clustering coefficient increment, compound keywords

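Abstract (translated from the Chinese):

This paper proposes a new keyword extraction algorithm based on the properties of small-world networks. First, the document is represented as a term network, built in the manner of a K-nearest-neighbor coupled graph. The change in clustering coefficient and the change in average shortest path length are then introduced to measure the importance of each term, and the most important terms form the candidate keyword set. Finally, compound keywords are extracted using the positional relationships among the terms of the candidate set and the part-of-speech collocation rules of Chinese. Experimental results show that the method is feasible and effective, and that compound keywords convey meaning in a way that makes the text easier to understand than ordinary keywords do.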
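The first two steps of the abstract (building the term network, then scoring terms by the change in clustering coefficient and in average shortest path length) can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, K-nearest-neighbor coupling is read here as "link each term to the next k terms in reading order", each change is measured by deleting one term from the network, and unreachable term pairs are simply skipped. The abstract does not specify any of these details, nor how the two changes are combined into one score, so the sketch returns them separately.

```python
from collections import deque
from itertools import combinations


def build_term_network(terms, k=2):
    """Undirected term graph: each term is linked to the next k terms in the
    sequence (an assumed reading of K-nearest-neighbor coupling)."""
    graph = {t: set() for t in terms}
    for i, t in enumerate(terms):
        for j in range(i + 1, min(i + k + 1, len(terms))):
            u = terms[j]
            if u != t:  # no self-loops when a term repeats
                graph[t].add(u)
                graph[u].add(t)
    return graph


def avg_clustering(graph):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    if not graph:
        return 0.0
    total = 0.0
    for nbrs in graph.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2.0 * links / (d * (d - 1))
    return total / len(graph)


def avg_shortest_path(graph):
    """Mean BFS distance over reachable ordered pairs (assumption: pairs in
    different components are skipped rather than treated as infinite)."""
    total, pairs = 0, 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in graph[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def remove_node(graph, node):
    """Copy of the graph with one term and its edges deleted."""
    return {n: nbrs - {node} for n, nbrs in graph.items() if n != node}


def term_importance(graph):
    """Map each term to (|delta clustering|, |delta path length|) caused by
    deleting it. How to combine the two values is left to the caller."""
    c0, l0 = avg_clustering(graph), avg_shortest_path(graph)
    scores = {}
    for node in graph:
        g = remove_node(graph, node)
        scores[node] = (abs(avg_clustering(g) - c0),
                        abs(avg_shortest_path(g) - l0))
    return scores


if __name__ == "__main__":
    words = ["small", "world", "network", "keyword", "network", "term"]
    for term, deltas in term_importance(build_term_network(words)).items():
        print(term, deltas)
```

A candidate keyword set could then be formed by ranking terms on these two values; the ranking rule and any cut-off threshold would have to be taken from the full paper.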

English abstract:

In this paper, a new algorithm is proposed for extracting compound keywords from Chinese documents using the small-world network model. Using a k-nearest-neighbor coupled graph, a Chinese document is first represented as a network: each node represents a term, and each edge represents the co-occurrence of two terms. Then two variables, the clustering coefficient increment and the average path length increment, are introduced to measure each term's importance and to generate the candidate keyword set. Using factors such as the part-of-speech collocation between any two terms in a sentence and the adjacency of any two terms in the candidate set, related words in the candidate set are combined into compound keywords. The experimental results show that the algorithm is effective and accurate in comparison with manual keyword extraction from the same documents. The semantic representation of a document by compound keywords is far clearer than that by a set of single keywords, facilitating better comprehension of the document.
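The final step, combining neighboring candidate terms into compound keywords, might look like the following sketch. It assumes the sentence has already been segmented and part-of-speech tagged, and that the permitted part-of-speech pairs are supplied as a set; the function name, the tag labels, and the example pair set are hypothetical, since the abstract does not list the actual collocation rules.

```python
def extract_compound_keywords(tagged_sentence, candidates,
                              allowed_pos_pairs, sep=""):
    """Merge runs of adjacent candidate terms into compound keywords.

    tagged_sentence   -- list of (term, pos_tag) pairs in reading order
    candidates        -- set of candidate keywords
    allowed_pos_pairs -- set of (left_tag, right_tag) pairs that may be
                         joined (hypothetical stand-in for the paper's
                         part-of-speech collocation rules)
    sep               -- joiner: "" suits Chinese, " " suits English
    """
    compounds = []
    run = []  # adjacent candidate terms collected so far

    def flush():
        # Only a run of two or more terms forms a compound keyword.
        if len(run) > 1:
            compounds.append(sep.join(term for term, _ in run))
        run.clear()

    for term, pos in tagged_sentence:
        if term not in candidates:
            flush()  # a non-candidate word breaks adjacency
            continue
        if run and (run[-1][1], pos) not in allowed_pos_pairs:
            flush()  # adjacent, but the tag pair is not permitted
        run.append((term, pos))
    flush()
    return compounds


if __name__ == "__main__":
    sentence = [("小世界", "n"), ("网络", "n"), ("的", "u"),
                ("关键词", "n"), ("提取", "v")]
    cands = {"小世界", "网络", "关键词", "提取"}
    pairs = {("n", "n"), ("n", "v")}  # illustrative only
    print(extract_compound_keywords(sentence, cands, pairs))
```

Restricting merges to strictly adjacent terms is the simplest reading of the "neighborhood" condition; a looser window would be an equally plausible interpretation.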
