Computing the semantic similarity between concepts underlies the resolution of semantic heterogeneity and has become an active research topic. This paper proposes a comprehensive WordNet-based method for computing concept semantic similarity. The method integrates the traditional semantic-distance-based and information-content-based algorithms, and additionally introduces depth and density factors and semantic overlap for a combined analysis. Because the weights in such a combined algorithm are difficult to determine, principal component analysis is introduced to improve the weight allocation. Experimental results show that the similarities computed by the improved method correlate well with human similarity judgments, effectively improving the accuracy of concept semantic similarity computation.
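The abstract does not state the weighting formula, so the following is only a minimal sketch of one common way to derive weights for a combined measure from principal component analysis, not the authors' exact procedure. The function names `pca_weights` and `combined_similarity` are hypothetical, and the score matrix holds made-up illustrative values (one row per concept pair, one column per component measure such as a distance-based, an information-content-based, and an overlap-based score), not results from the paper.

```python
import numpy as np

def pca_weights(scores):
    """Derive non-negative weights that sum to 1 from the first principal component.

    scores: array of shape (n_pairs, n_measures), one column per similarity measure.
    Assumption: weights are taken as the normalized absolute loadings of the
    first principal component of the correlation matrix.
    """
    # Correlation matrix of the measures (equivalent to PCA on standardized columns).
    corr = np.corrcoef(scores, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    first_pc = eigvecs[:, -1]  # eigenvector of the largest eigenvalue
    # The sign of an eigenvector is arbitrary, so use absolute loadings.
    loadings = np.abs(first_pc)
    return loadings / loadings.sum()

def combined_similarity(scores, weights):
    """Weighted sum of the component similarity measures for each concept pair."""
    return scores @ weights

# Made-up scores in [0, 1] for five concept pairs and three component measures.
scores = np.array([
    [0.90, 0.85, 0.80],
    [0.70, 0.75, 0.60],
    [0.40, 0.50, 0.45],
    [0.20, 0.15, 0.30],
    [0.05, 0.10, 0.10],
])

weights = pca_weights(scores)
combined = combined_similarity(scores, weights)
print("weights:", weights)
print("combined similarity:", combined)
```

Because the weights are non-negative and sum to 1, each combined score is a convex combination of the component scores and stays within their range. In an evaluation like the one described, such combined scores would then be correlated against human similarity ratings.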