In semantic query expansion, computing term-concept relatedness is a key step in finding the concepts that describe the semantics of a query. This paper proposes K2CM (keyword to concept method) for computing this relatedness. K2CM combines two components: the term-document-concept attachment degree and the term-concept co-occurrence degree. The attachment degree derives from an annotated corpus in which a term occurs in a number of documents and each of those documents is labeled with a number of concepts, so that the term is indirectly attached to those concepts. The co-occurrence degree builds on the co-occurrence of term-concept pairs and additionally takes into account the text distance between the term and the concept and the distribution of the pair across documents. Semantic retrieval experiments on three different types of corpora show that, compared with traditional methods, semantic query expansion based on K2CM improves retrieval effectiveness.
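The two components described above can be illustrated with a small sketch. The abstract does not give the K2CM formulas, so everything below is an assumption made for illustration: the function names, the use of a document fraction for the attachment component, the inverse-distance weighting `1 / (1 + d)` for text distance, the averaging over all documents as a stand-in for the distribution feature, and the linear weight `alpha` are hypothetical and are not the authors' definitions. The sketch also assumes that a concept can be located in the text by a single token equal to its label.

```python
def attachment(docs, term, concept):
    """Fraction of documents containing `term` that are labeled with `concept`.

    Assumed stand-in for the term-document-concept attachment component.
    `docs` is a list of (tokens, labels) pairs.
    """
    labels_of_docs_with_term = [labels for tokens, labels in docs if term in tokens]
    if not labels_of_docs_with_term:
        return 0.0
    hits = sum(concept in labels for labels in labels_of_docs_with_term)
    return hits / len(labels_of_docs_with_term)


def cooccurrence(docs, term, concept):
    """Distance-weighted co-occurrence of `term` and the concept label in text.

    Assumed stand-in for the co-occurrence component: each document where both
    tokens occur contributes 1 / (1 + d), with d the smallest token distance,
    and the sum is averaged over all documents so that pairs spread across
    more documents score higher.
    """
    if not docs:
        return 0.0
    total = 0.0
    for tokens, _labels in docs:
        term_pos = [i for i, tok in enumerate(tokens) if tok == term]
        concept_pos = [i for i, tok in enumerate(tokens) if tok == concept]
        if term_pos and concept_pos:
            d = min(abs(i - j) for i in term_pos for j in concept_pos)
            total += 1.0 / (1.0 + d)
    return total / len(docs)


def relatedness(docs, term, concept, alpha=0.5):
    """Linear mix of the two components; `alpha` is a hypothetical weight."""
    return alpha * attachment(docs, term, concept) + (1.0 - alpha) * cooccurrence(
        docs, term, concept
    )


if __name__ == "__main__":
    # Toy annotated corpus: (tokens, concept labels). Invented for illustration.
    corpus = [
        (["jaguar", "speed", "engine"], {"car"}),
        (["jaguar", "habitat", "cat"], {"animal"}),
        (["engine", "car", "repair"], {"car"}),
    ]
    for term in ("engine", "jaguar"):
        print(term, "car", relatedness(corpus, term, "car"))
```

In this toy corpus, "engine" scores higher against the concept "car" than "jaguar" does, because every document containing "engine" is labeled "car" and the two tokens also appear next to each other in one document, whereas "jaguar" is attached to "car" in only one of its two documents and never co-occurs with the label in text.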