In kernel-based methods, traditional phrase trees contain only general-domain information, which makes it difficult to train relation extraction models adapted to a specific domain. To address this problem, this paper proposes a Chinese domain entity relation extraction method that incorporates a domain knowledge phrase tree. Based on the structural characteristics of information on Chinese domain-specific websites, a domain knowledge tree reflecting the semantic relations between domain entities is constructed and fused into the syntax tree of each instance sentence, yielding a domain-specific entity semantic tree. A support vector machine is then trained to obtain an entity relation classification model, which is used to extract entity relations in the given domain. Relation extraction experiments on a collected corpus of 600 tourism-domain documents show that the proposed method outperforms the method that does not incorporate domain information, improving the F-value by 3.4%.
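To make the tree-kernel idea concrete, the following is a minimal sketch of a convolution (subset-tree) kernel in the style of Collins and Duffy, which counts the tree fragments shared by two parse trees. It is an illustration under stated assumptions, not the implementation used in this paper: the nested-tuple tree encoding, the function names, and the decay parameter `lam` are all hypothetical, and the step that fuses the domain knowledge tree into the syntax tree is not reproduced because the abstract does not specify it.

```python
from functools import lru_cache

# Assumed encoding: a tree is a nested tuple (label, child, child, ...),
# where a child is either another tuple (non-terminal) or a str (word).

def production(node):
    """Return the grammar production at a node: its label plus child symbols."""
    label, *children = node
    symbols = tuple(
        ("word", c) if isinstance(c, str) else ("node", c[0]) for c in children
    )
    return (label, symbols)

def nodes(tree):
    """Yield every non-terminal node of the tree, root first."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from nodes(child)

def tree_kernel(t1, t2, lam=1.0):
    """Sum, over all node pairs, the (decayed) count of common tree fragments."""

    @lru_cache(maxsize=None)
    def common(n1, n2):
        # Fragments rooted at n1 and n2 can only match if productions agree.
        if production(n1) != production(n2):
            return 0.0
        score = lam
        for c1, c2 in zip(n1[1:], n2[1:]):
            if not isinstance(c1, str):
                # Each non-terminal child is either cut off or expanded further.
                score *= 1.0 + common(c1, c2)
        return score

    return sum(common(a, b) for a in nodes(t1) for b in nodes(t2))

# Toy trees (hypothetical examples, not taken from the paper's corpus).
tree_a = ("NP", ("D", "the"), ("N", "dog"))
tree_b = ("NP", ("D", "the"), ("N", "cat"))
print(tree_kernel(tree_a, tree_a))
print(tree_kernel(tree_a, tree_b))
```

In a pipeline like the one described above, kernel values computed between all pairs of (domain-enriched) trees would form a Gram matrix that a support vector machine accepting precomputed kernels can be trained on; the choice of SVM library is left open here.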