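Named entity relation extraction is an important research topic in information extraction. This paper investigates the effectiveness of kernel methods for Chinese relation extraction in three parts. First, we study how using different parse trees in the convolution tree kernel affects relation extraction performance. Second, we examine the complementary effects between tree kernels and flat kernels by constructing composite kernels. Third, we improve the shortest path dependency kernel by computing the kernel over the longest common subsequence of the original shortest dependency paths. This removes the original kernel's overly strict requirement that the two dependency paths have the same length. When kernel methods were first applied to English relation extraction, they reached an F1 score of only about 40%. Our experiments on the standard ACE 2007 corpus show that the convolution kernel over parse trees alone already reaches an F1 score of 35% for Chinese relation extraction, which indicates that the convolution kernel is also effective for Chinese. The experiments also show that the shortest path dependency kernel brings no clear benefit to Chinese relation extraction.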
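The convolution tree kernel and the composite kernel described above can be illustrated with a short sketch. It uses the standard Collins–Duffy recursion, which counts tree fragments shared by two parse trees and down-weights larger fragments by a decay factor λ. The composite kernel is shown as one common choice: a linear combination of the normalized tree kernel and a normalized linear kernel over flat features. This is a minimal sketch under assumptions, not the paper's implementation. Trees are encoded as nested tuples `(label, *children)` with words as string leaves. The names `tree_kernel` and `composite_kernel`, the decay `lam=0.4`, the weight `alpha`, and the flat feature sets are all hypothetical.

```python
import math

# Hypothetical tree encoding: (label, child1, child2, ...); leaves are word strings.

def production(node):
    """Grammar production at a node, marking which children are word leaves."""
    label, *children = node
    return (label, tuple((True, c) if isinstance(c, str) else (False, c[0])
                         for c in children))

def is_preterminal(node):
    return all(isinstance(c, str) for c in node[1:])

def internal_nodes(tree):
    """Yield every non-leaf node of the tree."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from internal_nodes(child)

def tree_kernel(t1, t2, lam=0.4):
    """Collins-Duffy convolution tree kernel: decay-weighted count of shared fragments."""
    memo = {}

    def common(n1, n2):
        key = (id(n1), id(n2))
        if key not in memo:
            if production(n1) != production(n2):
                val = 0.0
            elif is_preterminal(n1):
                val = lam
            else:
                val = lam
                for c1, c2 in zip(n1[1:], n2[1:]):
                    if not isinstance(c1, str):  # identical word leaves contribute factor 1
                        val *= 1.0 + common(c1, c2)
            memo[key] = val
        return memo[key]

    return sum(common(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))

def normalized(k, x, y):
    """Cosine-normalize a kernel so that k(x, x) == 1."""
    denom = math.sqrt(k(x, x) * k(y, y))
    return k(x, y) / denom if denom > 0 else 0.0

def flat_kernel(f1, f2):
    """Linear kernel over binary flat features (e.g. hypothetical entity-type features)."""
    return float(len(f1 & f2))

def composite_kernel(x, y, alpha=0.5):
    """x, y are (tree, flat_feature_set) pairs; alpha weights the flat kernel."""
    k_flat = normalized(flat_kernel, x[1], y[1])
    k_tree = normalized(tree_kernel, x[0], y[0])
    return alpha * k_flat + (1 - alpha) * k_tree

t1 = ("NP", ("NN", "北京"), ("NN", "大学"))
t2 = ("NP", ("NN", "北京"), ("NN", "学校"))
print(tree_kernel(t1, t2, lam=1.0))  # 3.0: NN->北京, NP->NN NN, NP->(NN 北京) NN
```

With `lam=1.0` the kernel is a plain count of shared fragments. Smaller values of `lam` reduce the dominance of large fragments, which otherwise tend to make the kernel values of near-identical trees very peaked.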
Entity relation extraction is an important research topic in information extraction. This paper explores the effectiveness of two kernel-based methods, the convolution tree kernel and the shortest path dependency kernel, for Chinese relation extraction on the ACE 2007 corpus. For the convolution tree kernel, we study how different parse tree spans influence relation extraction performance. We then conduct experiments with composite kernels, which combine the convolution tree kernel with feature-based (flat) kernels, to investigate the complementary effects between tree kernels and flat kernels. Finally, we improve the shortest path dependency kernel by replacing its strict requirement that the two shortest dependency paths have the same length with a computation over their longest common subsequence. Experimental results show that kernel-based methods are effective for Chinese relation extraction as well: the convolution tree kernel alone reaches an F1 score of 35%, while the shortest path dependency kernel yields no clear improvement.
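The abstract does not specify how the longest-common-subsequence variant is scored, so the following is only a sketch under stated assumptions. The original shortest path dependency kernel, as formulated by Bunescu and Mooney, represents each path as a sequence of feature sets (one per word or per edge direction). It returns the product of the per-position common-feature counts when the two paths have equal length, and 0 otherwise. For the improved variant, the sketch assumes, hypothetically, three things. Two path elements match when they share at least one feature. The kernel multiplies the common-feature counts along a longest common subsequence of matching elements. Ties between equally long subsequences are broken by the larger product. The function names `spd_kernel` and `lcs_spd_kernel` and the toy features are illustrative only.

```python
from typing import List, Set

Path = List[Set[str]]  # each element: feature set of a word, or an edge direction

def common_features(a: Set[str], b: Set[str]) -> int:
    return len(a & b)

def spd_kernel(x: Path, y: Path) -> int:
    """Original shortest path dependency kernel: zero unless the lengths match."""
    if len(x) != len(y):
        return 0
    product = 1
    for a, b in zip(x, y):
        product *= common_features(a, b)
    return product

def lcs_spd_kernel(x: Path, y: Path) -> int:
    """Hypothetical LCS variant: align the paths by a longest common subsequence
    (elements match if they share a feature), preferring the alignment with the
    largest product, then multiply the common-feature counts of aligned pairs."""
    m, n = len(x), len(y)
    # best[i][j] = (length, product) of the best common subsequence of x[:i], y[:j]
    best = [[(0, 1)] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cand = max(best[i - 1][j], best[i][j - 1])
            c = common_features(x[i - 1], y[j - 1])
            if c > 0:
                length, product = best[i - 1][j - 1]
                cand = max(cand, (length + 1, product * c))
            best[i][j] = cand
    length, product = best[m][n]
    return product if length > 0 else 0

x = [{"protesters", "NNS", "Noun", "PERSON"}, {"->"}, {"seized", "VBD", "Verb"},
     {"<-"}, {"stations", "NNS", "Noun", "FACILITY"}]
y = [{"troops", "NNS", "Noun", "PERSON"}, {"->"}, {"raided", "VBD", "Verb"},
     {"<-"}, {"all", "DT"}, {"<-"}, {"churches", "NNS", "Noun", "FACILITY"}]
print(spd_kernel(x, y))      # 0: path lengths 5 and 7 differ
print(lcs_spd_kernel(x, y))  # 18 = 3 * 1 * 2 * 1 * 3
```

Under these assumptions, two equal-length paths that match position by position receive the same score from both functions. The variant therefore only changes the result when the lengths differ or some positions fail to match, which is exactly the case the original kernel scores as 0.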