Keyword extraction is an important research topic in information retrieval. This paper gives a specification of keywords in Chinese news documents, based on an analysis of the linguistic characteristics of such documents, and then proposes a new keyword extraction method based on tf/idf with multiple strategies. The approach selects uni-, bi- and tri-grams as candidate keywords and then defines features according to their morphological characteristics and context information. Moreover, the paper proposes several strategies to repair the incomplete words produced by word segmentation and to find unknown potential keywords in news documents. Experimental results show that the proposed method significantly outperforms the baseline method. We also apply it to retrospective event detection, where experimental results show that both the accuracy and the efficiency of retrospective event detection on news can be significantly improved.
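
To make the candidate-selection step concrete, the following is a minimal sketch of ranking uni-, bi- and tri-gram candidates by plain tf/idf. It is not the method proposed in this paper: the morphological and context features and the strategies for repairing segmentation errors are omitted. The function names `ngrams` and `extract_keywords`, the logarithmic idf formula, and the assumption that each document is already segmented into a list of tokens are illustrative choices only.

```python
import math
from collections import Counter


def ngrams(tokens, max_n=3):
    """Yield every uni-, bi- and tri-gram of a token list as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def extract_keywords(docs, doc_index, top_k=5):
    """Rank the candidate n-grams of docs[doc_index] by plain tf/idf.

    docs is a list of token lists (assumed to be pre-segmented text).
    """
    # Document frequency: number of documents containing each candidate.
    df = Counter()
    for tokens in docs:
        df.update(set(ngrams(tokens)))

    # Term frequency of each candidate within the target document.
    tf = Counter(ngrams(docs[doc_index]))
    total = sum(tf.values())
    n_docs = len(docs)

    scores = {
        gram: (count / total) * math.log(n_docs / df[gram])
        for gram, count in tf.items()
    }
    # Sort by descending score; break ties by the n-gram itself.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy, hypothetical corpus of already-segmented documents.
    corpus = [
        ["stock", "market", "falls", "stock", "market"],
        ["market", "opens"],
        ["rain", "falls"],
    ]
    for gram, score in extract_keywords(corpus, 0, top_k=3):
        print(" ".join(gram), round(score, 3))
```

In this sketch a candidate that occurs in every document receives a score of zero, which is why a pure tf/idf ranking is usually combined with further features such as those described above.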