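In view of the characteristics of domain knowledge and the limitations of current basic query expansion methods, this paper proposes a query expansion method for answer-text retrieval in domain-specific question answering systems that combines named entity recognition with basic query expansion methods. The method annotates named entities in 18 categories in the tourism domain and builds an entity recognition model based on conditional random fields (CRFs). The recognition model is then fused by linear interpolation into each of the three basic query expansion methods adopted in this paper, which are based on TF-IDF, mutual information, and local context analysis (LCA), and the selected expansion terms are used to query. Experimental results on a tourism-domain dataset show that the method improves accuracy by more than 15.8% over each of the three basic query expansion methods; in particular, the method combining domain named entity recognition with local context analysis improves accuracy by 21.4%.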
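The linear-interpolation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the TF-IDF base score is summed over top-ranked documents and max-normalized, `entity_prob` is a hypothetical map from a term to the probability that the CRF model tags it as a domain named entity, and the weight `lam` and cutoff `k` are illustrative values.

```python
import math
from collections import Counter


def tfidf_scores(top_docs, corpus):
    """Score candidate expansion terms by TF-IDF summed over top-ranked docs."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    scores = Counter()
    for doc in top_docs:
        for term, count in Counter(doc).items():
            # max(..., 1) guards terms missing from the corpus (assumption).
            scores[term] += (count / len(doc)) * math.log(n / max(df[term], 1))
    return scores


def fuse(base, entity_prob, lam=0.5):
    """Linear interpolation: lam * normalized base score + (1 - lam) * NER prob."""
    top = max(base.values()) if base else 0.0
    return {
        t: lam * (s / top if top > 0 else 0.0) + (1 - lam) * entity_prob.get(t, 0.0)
        for t, s in base.items()
    }


def expand(query, top_docs, corpus, entity_prob, lam=0.5, k=3):
    """Return the k best non-query terms after fusion (ties broken by name)."""
    fused = fuse(tfidf_scores(top_docs, corpus), entity_prob, lam)
    candidates = [t for t in fused if t not in query]
    candidates.sort(key=lambda t: (-fused[t], t))
    return candidates[:k]
```

On a toy corpus, a high entity probability for a place name such as `hangzhou` lifts it above a term that scores better on TF-IDF alone; setting `lam=1.0` recovers the plain TF-IDF ranking.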
Given the characteristics of domain knowledge and the limitations of current basic query expansion methods, a query expansion method for domain text retrieval is proposed that combines special named entity recognition (SNER) with basic query expansion methods. First, named entities of eighteen categories are annotated; then a conditional random field (CRF) model is built for entity recognition; finally, the recognition model is integrated by linear interpolation into each of the basic query expansion methods (TF-IDF, mutual information, and local context analysis, LCA) to select expansion terms. Experimental results on a tourism dataset show that the proposed method outperforms all three basic expansion methods, improving accuracy by more than 15.8%; in particular, the method fusing LCA with SNER improves accuracy by 21.4%.
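The LCA variant can be sketched the same way. This is a simplified illustration, not the authors' code: it uses the usual LCA co-occurrence belief score, where each query term contributes a factor `delta + log(af + 1) * idf(c) / log(n)` raised to that term's IDF, and `af` is the co-occurrence count of the query term and the candidate concept across the top-ranked passages. The `log(af + 1)` smoothing, `delta = 0.1`, the capped IDF, and the hypothetical `entity_prob` map from the CRF model are all assumptions.

```python
import math


def lca_scores(query, passages, n_docs, doc_freq, delta=0.1):
    """Simplified LCA belief score bel(Q, c) for each candidate concept c."""

    def idf(term):
        # Capped IDF: min(1, log10(N / N_t) / 5); assumes doc_freq[term] >= 1.
        return min(1.0, math.log10(n_docs / doc_freq[term]) / 5.0)

    log_n = math.log(max(len(passages), 2))  # guard against log(1) == 0
    candidates = {t for p in passages for t in p} - set(query)
    scores = {}
    for c in candidates:
        belief = 1.0
        for t in query:
            # Co-occurrence of query term t and concept c over the top passages.
            af = sum(p.count(t) * p.count(c) for p in passages)
            # log(af + 1) smoothing is an assumption to avoid log(0).
            factor = delta + math.log(af + 1) * idf(c) / log_n
            belief *= factor ** idf(t)
        scores[c] = belief
    return scores


def select_terms(scores, entity_prob, lam=0.5, k=2):
    """Linearly interpolate max-normalized LCA scores with NER probabilities."""
    top = max(scores.values()) if scores else 0.0
    fused = {
        c: lam * (s / top if top > 0 else 0.0) + (1 - lam) * entity_prob.get(c, 0.0)
        for c, s in scores.items()
    }
    return sorted(fused, key=lambda c: (-fused[c], c))[:k]
```

Because the LCA belief is a product over query terms, a concept must co-occur with most of the query to score well; the interpolation with `entity_prob` then lets a recognized domain entity outrank a concept that only wins on co-occurrence.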