A method is proposed for automatically measuring the semantic similarity of words from the semi-structured data of the Baidu encyclopedia (Baidu Baike). An encyclopedia entry and each of its related entries are treated as two nodes of a directed graph, joined by the link between them. The SimRank algorithm is then applied to this graph to compute the semantic similarity of encyclopedia entries. Experimental results show that the proposed method outperforms traditional word similarity measures and can accurately reflect the semantic relationships between words.
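The SimRank step can be illustrated with a minimal sketch. It uses the standard SimRank recurrence: a node is fully similar to itself, and for two distinct nodes a and b the similarity is C / (|I(a)| · |I(b)|) times the sum of s(i, j) over all in-neighbors i of a and j of b. If either node has no in-neighbors, the similarity is 0. The abstract does not give the decay factor, the iteration count, or the exact graph construction, so several things here are assumptions: C = 0.8, a fixed 10 iterations, an edge pointing from an entry to each of its related entries, and the hypothetical entry names in the example. This is not the authors' implementation.

```python
from itertools import product


def simrank(edges, c=0.8, iterations=10):
    """Naive iterative SimRank on a directed graph given as (src, dst) pairs.

    Assumed setup: an edge src -> dst means entry `src` lists `dst`
    as a related entry. `c` is the decay factor (assumed 0.8).
    """
    nodes = sorted({n for edge in edges for n in edge})
    # In-neighbors I(x): every node that links to x.
    in_nbrs = {n: [] for n in nodes}
    for src, dst in edges:
        in_nbrs[dst].append(src)

    # Initial scores: s(a, a) = 1, s(a, b) = 0 for a != b.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new[(a, b)] = 1.0
                continue
            ia, ib = in_nbrs[a], in_nbrs[b]
            if not ia or not ib:
                # No in-neighbors on one side: no evidence of similarity.
                new[(a, b)] = 0.0
                continue
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new
    return sim


# Hypothetical entries and related-entry links, for illustration only.
edges = [("fruit", "apple"), ("fruit", "banana"), ("smartphone", "apple")]
scores = simrank(edges)
print(round(scores[("apple", "banana")], 3))
```

In this toy graph, "apple" has in-neighbors {fruit, smartphone} and "banana" has {fruit}. Their score is therefore 0.8 × (s(fruit, fruit) + s(smartphone, fruit)) / 2 = 0.8 × (1 + 0) / 2 = 0.4. The nested loops cost O(n² · d²) per iteration, so this form only suits small graphs. For larger ones, a library routine such as NetworkX's `simrank_similarity` would be the more practical choice.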