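The HowNet-based word similarity computation proposed by Liu Qun of the Chinese Academy of Sciences is currently one of the more representative methods for computing word similarity. In our tests, we found that some words with converse or antonymous meanings receive similarity scores as high as those of synonyms and near-synonyms, while some clearly similar words receive low scores. For example, the similarity between “美丽” (beautiful) and “贼眉鼠眼” (shifty-looking) is 0.814815, whereas the similarity between “美丽” and “优雅” (elegant) is 0.788360, and the similarity between “深红” (deep red) and “粉红” (pink) is only 0.074074. This is detrimental to word polarity recognition. To meet the needs of text sentiment analysis, we define the range of word similarity as [-1, +1]. Building on Liu Qun's paper, we further take the depth information of sememes into account, and we use the antonymy and converse relations between HowNet sememes, together with the sememes' definition information, to compute word similarity. In a word polarity recognition experiment, the method obtained good results: a precision (P) of 99.07% and a recall (R) of 99.11%.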
Liu Qun's word similarity computation based on HowNet is a representative method for computing word similarity. However, we found that it assigns some words with converse or antonymous meanings a similarity as high as that of true synonyms. To remedy this defect for word polarity analysis, we define the value of word similarity on the interval [-1, +1] and extend the computation in Liu's paper by using the depth information of sememes, the antonymy and converse relations between sememes, and the sememes' definition information. The method performs well in a word polarity recognition experiment, achieving 99.07% precision and 99.11% recall.
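The idea of a signed similarity on [-1, +1] can be illustrated with a minimal sketch. It assumes the path-distance form α/(d + α) with α = 1.6 as the unsigned base similarity, and it simply flips the sign for sememe pairs listed as antonymous or converse. The toy hierarchy, the antonym table, and all function names are hypothetical and are not taken from HowNet or from the paper. The sememe depth and definition terms are omitted because the abstract does not give their formulas.

```python
# Toy sememe hierarchy: child -> parent (hypothetical data, not taken from HowNet).
PARENT = {
    "attribute value": None,
    "appearance": "attribute value",
    "beautiful": "appearance",
    "ugly": "appearance",
    "color": "attribute value",
    "red": "color",
}

# Hypothetical antonym/converse pairs between sememes.
ANTONYMS = {frozenset({"beautiful", "ugly"})}

ALPHA = 1.6  # assumed value of the adjustable parameter in alpha / (d + alpha)


def ancestors(sememe):
    """Return the path from a sememe up to the root, starting with the sememe itself."""
    path = []
    while sememe is not None:
        path.append(sememe)
        sememe = PARENT[sememe]
    return path


def path_distance(a, b):
    """Count the edges between a and b through their lowest common ancestor."""
    path_a = ancestors(a)
    steps_from_b = {node: i for i, node in enumerate(ancestors(b))}
    for steps_from_a, node in enumerate(path_a):
        if node in steps_from_b:
            return steps_from_a + steps_from_b[node]
    raise ValueError("sememes do not share a root")


def signed_similarity(a, b):
    """Unsigned path-based similarity, negated for antonymous or converse pairs."""
    magnitude = ALPHA / (path_distance(a, b) + ALPHA)
    sign = -1.0 if frozenset({a, b}) in ANTONYMS else 1.0
    return sign * magnitude


if __name__ == "__main__":
    for pair in [("beautiful", "beautiful"), ("beautiful", "ugly"), ("beautiful", "red")]:
        print(pair, round(signed_similarity(*pair), 6))
```

In this sketch, an antonymous pair that sits close together in the hierarchy gets a large negative score instead of a large positive one, which is the property a polarity classifier needs.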