This paper combines a flat kernel based on feature vectors with a structural kernel based on syntactic parse trees for Chinese entity relation extraction. We first run feature selection experiments to choose the best feature set for constructing the feature vectors of the flat kernel; the candidate features include entity information (entity type, entity subtype, entity class, etc.) and the words before and after the entity pair in the sentence. To define the structural kernel, we extract the shortest path-enclosed tree (shortest path tree, SPT) from each sentence that contains the two entities, and then use a convolution tree kernel to compute the similarity between two SPTs. In experiments on extracting the major relation types on the ACE RDC 2005 Chinese corpus, the combined kernel achieves an F-score of 68.50%, which is 4.36% and 17.37% higher than the two individual kernel methods, respectively. We also run feature selection within the combined kernel and obtain a best F-score of 70.58%, which shows that the best feature set for the flat kernel alone is not necessarily the best one for the combined kernel. The results indicate that constructing the flat kernel from entity semantic information and combining it with the structural kernel performs well for Chinese entity relation extraction.
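As an illustration of the two kernels described above, the following is a minimal Python sketch, not the implementation used in the paper. It assumes the standard Collins and Duffy formulation of the convolution tree kernel, parse trees written as nested tuples `(label, child, ...)` with words as plain strings, a flat kernel computed as a normalized overlap of binary feature sets, and a linear combination with a weight `alpha`. The abstract does not state how the two kernels are combined, so `alpha`, the decay factor `lam`, and all function names are hypothetical.

```python
import math


def nodes(tree):
    """Yield every internal node of a nested-tuple tree (label, child, ...)."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)


def production(node):
    """Grammar production at a node; words and nonterminals are tagged apart."""
    rhs = tuple(("N", c[0]) if isinstance(c, tuple) else ("W", c) for c in node[1:])
    return (node[0],) + rhs


def is_preterminal(node):
    """A preterminal node has only words (strings) as children."""
    return all(not isinstance(c, tuple) for c in node[1:])


def common_subtrees(n1, n2, lam):
    """Decay-weighted count of common subtree fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    if is_preterminal(n1):
        return lam
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, tuple) and isinstance(c2, tuple):
            score *= 1.0 + common_subtrees(c1, c2, lam)
    return score


def tree_kernel(t1, t2, lam=0.5):
    """Convolution tree kernel: sum over all node pairs of the two trees."""
    return sum(common_subtrees(a, b, lam) for a in nodes(t1) for b in nodes(t2))


def normalized_tree_kernel(t1, t2, lam=0.5):
    """Scale the tree kernel to [0, 1] so that tree size does not dominate."""
    denom = math.sqrt(tree_kernel(t1, t1, lam) * tree_kernel(t2, t2, lam))
    return tree_kernel(t1, t2, lam) / denom if denom else 0.0


def flat_kernel(f1, f2):
    """Normalized linear kernel on binary features given as sets of strings."""
    if not f1 or not f2:
        return 0.0
    return len(f1 & f2) / math.sqrt(len(f1) * len(f2))


def combined_kernel(x1, x2, alpha=0.5, lam=0.5):
    """Assumed linear combination; each x is a (feature_set, spt_tree) pair."""
    flat = flat_kernel(x1[0], x2[0])
    tree = normalized_tree_kernel(x1[1], x2[1], lam)
    return alpha * flat + (1.0 - alpha) * tree
```

In this sketch each relation instance would be a pair such as `({"type1=PER", "type2=ORG"}, spt)`, where the feature strings are made-up placeholders and `spt` is the tree extracted for the entity pair. A Gram matrix of `combined_kernel` values could then be passed to any kernel classifier that accepts precomputed kernels.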