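Paradigm words are words with an unambiguous positive or negative sentiment orientation, and the choice of paradigm words affects the accuracy of word sentiment orientation discrimination. Existing paradigm word selection methods focus mainly on the frequency, category-distinguishing ability, and context sensitivity of paradigm words, but ignore their sentiment intensity, which causes semantic information to be lost for words and for larger linguistic units. This paper proposes an intensity-aware method for paradigm word selection and word sentiment orientation discrimination. Sentiment words are first clustered by semantic similarity and then scored and classified by sentiment orientation, yielding a paradigm word set that carries both semantic and intensity information; this set can be used to determine both the polarity and the intensity of a word's sentiment. We evaluated the method with a general-purpose search engine and a domain-specific search engine. The results show that with the domain search engine, the accuracy of word polarity discrimination reaches 84.00%, and the accuracy of intensity discrimination reaches 80.49% for positive words and 76.47% for negative words.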
Paradigm words are words that have an unambiguous sentiment orientation. The choice of paradigm words affects the accuracy of word sentiment orientation discrimination. Previous work concentrated on the frequency, category-distinguishing ability, and context sensitivity of paradigm words but paid little attention to the intensity of their sentiment orientation, which led to the loss of semantic information for words and larger linguistic units. We propose an intensity-aware method for paradigm word selection and word sentiment orientation discrimination. In this method, sentiment words are first clustered by semantic similarity and then classified by sentiment orientation. The resulting paradigm words carry semantic and intensity information and can be used to discriminate both the polarity and the intensity of word sentiment. The method was evaluated with a general-purpose search engine and a domain-specific search engine. With the domain search engine, the accuracy of word polarity discrimination reaches 84.00%, and the accuracy of intensity discrimination reaches 80.49% for positive words and 76.47% for negative words.
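The abstract does not state how search-engine results are combined with the paradigm words to score a word. One well-known approach, assumed here and not confirmed by the text, is the SO-PMI measure of Turney and Littman, which compares how strongly a word co-occurs with positive versus negative paradigm words according to hit counts. The sketch below multiplies each paradigm word's PMI by a hypothetical intensity weight. The paradigm words, weights, hit counts, and the names `pmi`, `so_pmi`, and `classify` are illustrative assumptions, not the paper's method or data.

```python
import math
from typing import Dict, Tuple

# Hypothetical intensity-weighted paradigm words (illustrative only; these are
# NOT the paradigm set derived in the paper).
POSITIVE: Dict[str, float] = {"excellent": 1.0, "good": 0.6}
NEGATIVE: Dict[str, float] = {"terrible": 1.0, "bad": 0.6}


def pmi(co_hits: float, w_hits: float, p_hits: float, total: float,
        smooth: float = 0.01) -> float:
    """PMI of a word and a paradigm word, estimated from hit counts.

    `smooth` is added to the co-occurrence count so a zero count does not
    produce log(0); w_hits and p_hits must be positive.
    """
    return math.log2((co_hits + smooth) * total / (w_hits * p_hits))


def so_pmi(word: str, hits: Dict[str, int],
           co_hits: Dict[Tuple[str, str], int], total: int) -> float:
    """Intensity-weighted SO-PMI: weighted PMI summed over positive paradigm
    words minus the same weighted sum over negative paradigm words."""
    def side(paradigm: Dict[str, float]) -> float:
        return sum(
            weight * pmi(co_hits.get((word, p), 0), hits[word], hits[p], total)
            for p, weight in paradigm.items()
        )
    return side(POSITIVE) - side(NEGATIVE)


def classify(score: float) -> Tuple[str, float]:
    """Map a score to (polarity, intensity), using |score| as intensity."""
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", -score
    return "neutral", 0.0


# Made-up hit counts standing in for search-engine queries (assumption).
TOTAL = 1_000_000
HITS = {"excellent": 1000, "good": 5000, "terrible": 1000, "bad": 5000,
        "delicious": 200, "filthy": 200}
CO_HITS = {
    ("delicious", "excellent"): 40, ("delicious", "good"): 100,
    ("delicious", "terrible"): 2, ("delicious", "bad"): 10,
    ("filthy", "excellent"): 2, ("filthy", "good"): 10,
    ("filthy", "terrible"): 40, ("filthy", "bad"): 100,
}

if __name__ == "__main__":
    for w in ("delicious", "filthy"):
        label, strength = classify(so_pmi(w, HITS, CO_HITS, TOTAL))
        print(f"{w}: {label} (intensity {strength:.2f})")
```

The per-word weights are where an intensity-aware paradigm set would plug in; with every weight set to 1, the score reduces to an unweighted sum-of-PMI comparison.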