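Existing methods for selecting paradigm words suffer from randomness and subjectivity. This paper proposes a paradigm-word selection method based on word clustering: a set of initial seed words is chosen from the ontology of the target domain and expanded; the expanded words are clustered to obtain second-generation seed words; these are in turn expanded and clustered, and the process iterates until the optimal cluster seed words are obtained, which serve as the final paradigm words. Experimental results show that the paradigm words extracted by this method achieve relatively high accuracy in classifying the sentiment orientation of words.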
This paper puts forward a method for selecting paradigm words that addresses the randomness and subjectivity of existing selection methods. First, a group of selected initial seed words is expanded. Second, the expanded words are grouped by hierarchical clustering, according to the similarity between each pair of expanded words, to obtain the second generation of seed words. The second-generation seed words are then expanded and clustered in turn. Finally, the same procedure is iterated until the optimal clustering seed words are obtained, and these are taken as the final paradigm words. The experimental results indicate that the paradigm words selected by the new method achieve higher accuracy in classifying the sentiment orientation of words.
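The expand-cluster-iterate loop described above can be sketched roughly as follows. This is only an illustrative sketch, not the paper's implementation: the toy lexicon `NEIGHBORS`, the polarity scores in `SCORE`, the similarity function `sim`, the average-link merge rule, the `threshold` value, the choice of each cluster's medoid as its new seed, and the stopping rule (seeds no longer change) are all assumptions introduced here, since the abstract does not specify them.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): related-word lists and polarity scores.
NEIGHBORS = {"fine": ["good", "great"], "good": ["great", "fine"], "great": ["good"],
             "poor": ["bad"], "bad": ["awful", "poor"], "awful": ["bad"]}
SCORE = {"great": 1.0, "good": 0.8, "fine": 0.5, "poor": -0.4, "bad": -0.7, "awful": -1.0}

def sim(a, b):
    """Assumed similarity in [0, 1]: closer polarity scores mean more similar words."""
    return 1 - abs(SCORE[a] - SCORE[b]) / 2

def expand(seeds, neighbors):
    """Expand the seed set with every word listed as related to a seed."""
    words = set(seeds)
    for s in seeds:
        words.update(neighbors.get(s, ()))
    return sorted(words)

def average_link(a, b, sim):
    """Mean pairwise similarity between two clusters."""
    return sum(sim(x, y) for x in a for y in b) / (len(a) * len(b))

def cluster(words, sim, threshold):
    """Agglomerative clustering: merge the most similar pair of clusters
    until the best remaining pair falls below the threshold."""
    clusters = [[w] for w in words]
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: average_link(clusters[p[0]], clusters[p[1]], sim))
        if average_link(clusters[i], clusters[j], sim) < threshold:
            break
        clusters[i] += clusters.pop(j)  # i < j, so index i is still valid
    return clusters

def medoid(members, sim):
    """Pick the member most similar, in total, to the rest of its cluster."""
    return max(members, key=lambda w: sum(sim(w, o) for o in members))

def select_paradigm_words(seeds, neighbors, sim, threshold=0.6, max_iter=10):
    """Iterate expansion and clustering until the seed set stops changing."""
    seeds = sorted(seeds)
    for _ in range(max_iter):
        words = expand(seeds, neighbors)
        new_seeds = sorted(medoid(c, sim) for c in cluster(words, sim, threshold))
        if new_seeds == seeds:
            break
        seeds = new_seeds
    return seeds

if __name__ == "__main__":
    print(select_paradigm_words(["fine", "poor"], NEIGHBORS, sim))
```

In a real setting the similarity would presumably come from a lexical resource or corpus statistics rather than hand-assigned scores, and the convergence test would follow whatever optimality criterion the full paper defines.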