Traditionally, retrieval systems implement fuzzy retrieval with a self-constructed thesaurus generated by term co-occurrence analysis, that is, a term-term association matrix (also called a "keyword connection matrix"). A thesaurus generated this way suffers from several problems: only a single type of relation between terms, pseudo-correlation between terms, and poor control of word sense. Drawing on the traditional thesaurus, this paper improves the algorithm that computes the term-term association degree in the self-constructed thesaurus; the new algorithm enriches the types of relations between terms. Empirical analysis indicates that the new algorithm helps improve the retrieval efficiency of the system. The paper first describes the existing algorithms for term-term association degree and their treatment of relational data, and identifies the problems with them. It then introduces a thesaurus control mechanism and proposes an improved association-degree algorithm for each of four cases of term-relation control. Finally, it compares the improved and existing algorithms theoretically by means of set analysis, and uses a term relation network to examine empirically how the improved algorithm affects the relatedness of that network.
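For orientation, the kind of co-occurrence baseline described above can be sketched as follows. This is a minimal illustration and not the paper's own formula: it assumes that the association degree of two terms is the number of documents containing both, divided by the number containing either, a normalization often used for keyword connection matrices. The function name `association_matrix` and the sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def association_matrix(docs):
    """Build a symmetric term-term association table from co-occurrence.

    docs: list of keyword lists, one list per document.
    Assumed weight (not taken from the paper):
        w(a, b) = n_ab / (n_a + n_b - n_ab)
    where n_a is the number of documents containing a and n_ab the
    number containing both a and b.
    """
    doc_freq = Counter()   # n_a: documents containing each term
    pair_freq = Counter()  # n_ab: documents containing both terms
    for keywords in docs:
        terms = sorted(set(keywords))  # count each term once per document
        doc_freq.update(terms)
        pair_freq.update(combinations(terms, 2))

    weights = {}
    for (a, b), n_ab in pair_freq.items():
        w = n_ab / (doc_freq[a] + doc_freq[b] - n_ab)
        weights[(a, b)] = w
        weights[(b, a)] = w  # store both orders so lookups are symmetric
    return weights


# Hypothetical sample data, for illustration only.
docs = [
    ["retrieval", "thesaurus", "fuzzy"],
    ["retrieval", "thesaurus"],
    ["retrieval", "indexing"],
]
weights = association_matrix(docs)
```

In this sketch every pair that happens to share a document receives a nonzero weight of the same kind, which is where the single relation type and the pseudo-correlation noted in the abstract come from. The thesaurus control mechanism that the paper proposes is not shown here.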