Traditional algorithms in natural language processing and information retrieval rely on a single impact factor, converge poorly, and are strongly affected by data sparsity and data noise. To address these problems, a comprehensive weighted semantic similarity calculation method based on a domain-specific ontology is proposed. First, a new quantification method is used to obtain the factors that determine the weight of a relation path in the domain ontology hierarchy: relation type, relation strength, node depth, and node density. On this basis, the semantic distance of the concept relation edges is obtained. Then, the semantic similarity between concept nodes is calculated from the semantic distance. Finally, part of the hierarchical network of a tourism ontology in the geography domain is used for experimental analysis and comparison. The results show that the algorithm effectively improves the accuracy and validity of concept semantic similarity calculation and yields information retrieval results that better match reality.
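The pipeline summarized above (edge factors, then semantic distance, then similarity) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the toy tourism hierarchy, the `TYPE_FACTOR` values, the way `edge_weight` combines type, strength, depth, and density, and the `alpha / (alpha + distance)` similarity mapping. It is not the paper's actual quantification method.

```python
import math
from dataclasses import dataclass

# Hypothetical relation-type factors; the paper's real values are not given.
TYPE_FACTOR = {"is_a": 1.0, "part_of": 0.8}


@dataclass(frozen=True)
class Edge:
    parent: str
    relation: str    # key into TYPE_FACTOR
    strength: float  # assumed in (0, 1]; higher means a tighter relation


# Toy tourism hierarchy (illustrative only): child -> edge to its parent.
ONTOLOGY = {
    "Attraction": Edge("Tourism", "is_a", 1.0),
    "Accommodation": Edge("Tourism", "is_a", 1.0),
    "NaturalSite": Edge("Attraction", "is_a", 0.9),
    "CulturalSite": Edge("Attraction", "is_a", 0.9),
    "Mountain": Edge("NaturalSite", "is_a", 0.8),
    "Lake": Edge("NaturalSite", "is_a", 0.8),
    "Museum": Edge("CulturalSite", "is_a", 0.8),
    "Hotel": Edge("Accommodation", "is_a", 0.9),
}


def ancestors(concept):
    """Path from the concept up to the root, starting with the concept."""
    path = [concept]
    while path[-1] in ONTOLOGY:
        path.append(ONTOLOGY[path[-1]].parent)
    return path


def depth(concept):
    """Number of edges between the concept and the root."""
    return len(ancestors(concept)) - 1


def density(concept):
    """Number of direct children of the concept."""
    return sum(1 for edge in ONTOLOGY.values() if edge.parent == concept)


def edge_weight(child):
    """Assumed weighting: an edge is shorter when the relation is stronger,
    the child lies deeper, and the parent has more children."""
    edge = ONTOLOGY[child]
    closeness = (
        TYPE_FACTOR[edge.relation]
        * edge.strength
        * depth(child)
        * (1.0 + math.log(density(edge.parent)))
    )
    return 1.0 / closeness


def semantic_distance(a, b):
    """Sum of edge weights on the path through the lowest common ancestor.
    Assumes a single-rooted hierarchy, so a common ancestor always exists."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = next(c for c in up_a if c in up_b)
    legs = up_a[: up_a.index(common)] + up_b[: up_b.index(common)]
    return sum(edge_weight(c) for c in legs)


def similarity(a, b, alpha=1.0):
    """Map distance to (0, 1]; alpha is a hypothetical tuning constant."""
    return alpha / (alpha + semantic_distance(a, b))


if __name__ == "__main__":
    for other in ("Lake", "Museum", "Hotel"):
        print(f"Mountain vs {other}: {similarity('Mountain', other):.3f}")
```

Under these assumed weights, sibling concepts such as `Mountain` and `Lake` score higher than concepts that only share a distant ancestor, such as `Mountain` and `Hotel`, which is the qualitative behavior a path-based similarity measure is expected to show.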