Automatic keyword extraction is one of the important techniques in natural language processing. This paper studies the complex-network properties of natural language networks built from Chinese text. Drawing on the small-world structure of language networks and on some of the new theoretical results in complex network research from the past two years, it proposes a keyword extraction algorithm for Chinese documents based on complex network features. The algorithm extracts keywords according to the complex-network feature values of the word nodes in a document's language network. Experimental results show that the proposed algorithm achieves higher average precision than the keyword extraction algorithm based on TFIDF.
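As a rough illustration of the general idea, the sketch below builds a word co-occurrence network from a tokenized document and ranks word nodes by a network feature. It is not the paper's algorithm: the abstract does not say which complex-network features are used or how they are combined, so the choice of node degree as the ranking score, the clustering coefficient as an additional feature, the co-occurrence window, and all function names here are assumptions made only for this example. The input is assumed to be already segmented into words, since Chinese word segmentation is outside the scope of the sketch.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_network(tokens, window=2):
    """Link two distinct words if one follows the other within `window` tokens.

    Assumption: co-occurrence within a small window defines an edge.
    """
    adjacency = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window + 1]:
            if other != word:
                adjacency[word].add(other)
                adjacency[other].add(word)
    return adjacency


def clustering_coefficient(adjacency, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return 2.0 * links / (k * (k - 1))


def node_features(adjacency):
    """Return {word: (degree, clustering coefficient)} for every word node."""
    return {
        word: (len(adjacency[word]), clustering_coefficient(adjacency, word))
        for word in list(adjacency)
    }


def extract_keywords(tokens, top_n=3, window=2):
    """Rank words by degree (assumed score), breaking ties alphabetically."""
    adjacency = build_cooccurrence_network(tokens, window)
    features = node_features(adjacency)
    ranked = sorted(features, key=lambda w: (-features[w][0], w))
    return ranked[:top_n]
```

For a segmented document, calling `extract_keywords(tokens, top_n=5)` returns the five word nodes with the highest degree. Any other node-level feature could be substituted in the sort key of `extract_keywords` without changing the rest of the sketch.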