Evaluating the importance of words is a key step in keyword extraction. A commonly used approach computes a comprehensive weight for each content word from a weighted analysis of its attributes; this weight then determines how likely the word is to be selected as a keyword. A word's position, frequency, part of speech (POS), and co-occurrence with cue words are all important factors in computing that weight. In this paper, we first analyze the impact of these factors on keyword extraction from both theoretical and statistical perspectives, and then use a genetic algorithm (GA) to optimize the coefficients of these knowledge-source attributes. Tests on a manually annotated corpus verify that the method is feasible.
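The idea of a comprehensive weight with GA-tuned coefficients can be illustrated with a minimal sketch. It is not the implementation used in this paper: the feature values, the toy corpus `DOCS`, the fitness definition (fraction of annotated keywords recovered), and the GA operators (elitist selection, one-point crossover, Gaussian mutation) are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy corpus. Each candidate word has four assumed feature scores:
# (position, normalized frequency, POS, co-occurrence with cue words).
DOCS = [
    {
        "candidates": {
            "keyword": (1.0, 0.8, 1.0, 1.0),
            "extraction": (1.0, 0.6, 1.0, 0.0),
            "method": (0.0, 0.9, 1.0, 0.0),
            "result": (0.0, 0.3, 1.0, 0.0),
        },
        "gold": {"keyword", "extraction"},
    },
    {
        "candidates": {
            "genetic": (0.0, 0.4, 0.5, 1.0),
            "algorithm": (1.0, 0.5, 1.0, 1.0),
            "paper": (1.0, 0.9, 1.0, 0.0),
            "test": (0.0, 0.7, 1.0, 0.0),
        },
        "gold": {"genetic", "algorithm"},
    },
]


def comprehensive_weight(features, coeffs):
    """Linear combination of attribute scores (an assumed form of the weight)."""
    return sum(c * f for c, f in zip(coeffs, features))


def extract(candidates, coeffs, k):
    """Return the k candidate words with the highest comprehensive weight."""
    ranked = sorted(
        candidates,
        key=lambda w: comprehensive_weight(candidates[w], coeffs),
        reverse=True,
    )
    return set(ranked[:k])


def fitness(coeffs, docs):
    """Mean fraction of annotated keywords recovered per document."""
    total = 0.0
    for doc in docs:
        picked = extract(doc["candidates"], coeffs, len(doc["gold"]))
        total += len(picked & doc["gold"]) / len(doc["gold"])
    return total / len(docs)


def optimize(docs, n_coeffs=4, pop_size=20, generations=30, seed=0):
    """Tiny elitist GA over coefficient vectors in [0, 1]^n_coeffs."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_coeffs)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism).
        pop.sort(key=lambda c: fitness(c, docs), reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_coeffs)      # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_coeffs)           # mutate one coefficient
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda c: fitness(c, docs))


if __name__ == "__main__":
    best = optimize(DOCS)
    print("coefficients:", [round(c, 3) for c in best])
    print("fitness:", fitness(best, DOCS))
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. A real setup would replace `DOCS` with features computed from the annotated corpus and evaluate on held-out documents.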