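In recent years, Chinese semantic role labeling has received considerable attention, but most existing systems are traditional parse-tree-based ones that identify and classify semantic roles on the nodes of a parse tree. This paper proposes a different strategy, which we call semantic role labeling based on semantic chunking. In the new method, the pipeline is no longer the traditional "parsing → semantic role identification → semantic role classification" but a simplified "semantic chunk identification → semantic chunk classification". The method turns Chinese semantic role labeling from a node classification problem into a sequence labeling problem; we use conditional random fields as the model and obtain good results. Because the parsing stage is skipped, semantic role labeling no longer depends on syntactic parsing and thus escapes the speed and accuracy limits of Chinese parsers. Experiments show that the new method achieves high accuracy and greatly reduces analysis time. A comparison also shows that results on automatic word segmentation and POS tagging still lag well behind those on gold-standard segmentation and POS tagging.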
In recent years, Chinese semantic role labeling (SRL) has attracted considerable attention. Most SRL systems are built on parse trees: constituents of the sentence structure are first identified and then classified. In contrast, this paper proposes a method based on semantic chunking, which changes the SRL task from the traditional "parsing → semantic role identification → semantic role classification" process into a simpler "semantic chunk identification → semantic chunk classification" pipeline. Semantic chunking, named by analogy with syntactic chunking, identifies semantic chunks, namely the arguments of verbs. With semantic chunking, Chinese SRL becomes a sequence labeling problem rather than a classification problem over parse-tree nodes. We apply conditional random fields to this problem and obtain better performance. Removing the parsing stage frees SRL from its dependence on parsing, which has always been a bottleneck for both speed and precision. Experiments show that our approach outperforms the previously best-reported methods on Chinese SRL while substantially reducing processing time. We also show that the proposed method works much better on gold-standard word segmentation and POS tagging than on automatically produced ones.
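To make the reformulation as sequence labeling concrete, the following is a minimal sketch of how argument chunks can be encoded as one tag per word, so that a sequence model such as a conditional random field can predict them without a parse tree. The IOB2 tagging scheme, the role names, the example sentence, and the helper names `spans_to_iob` and `iob_to_spans` are all illustrative assumptions; the abstract does not state which encoding the paper uses.

```python
from typing import List, Tuple

# (start, end_exclusive, role): a semantic chunk over word positions.
Span = Tuple[int, int, str]


def spans_to_iob(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping argument chunks as IOB2 tags (assumed scheme)."""
    tags = ["O"] * n_tokens
    for start, end, role in spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span out of range: {(start, end, role)}")
        if any(t != "O" for t in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, role)}")
        tags[start] = "B-" + role
        for i in range(start + 1, end):
            tags[i] = "I-" + role
    return tags


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Decode predicted IOB2 tags back into labeled chunks."""
    spans: List[Span] = []
    start, role = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes the last chunk
        continues = tag.startswith("I-") and role == tag[2:]
        if not continues and role is not None:
            spans.append((start, i, role))
            start, role = None, None
        # A stray "I-" tag with no open chunk is treated leniently as a start.
        if tag.startswith("B-") or (tag.startswith("I-") and role is None):
            start, role = i, tag[2:]
    return spans


# Hypothetical segmented sentence; the predicate at index 3 is left as "O".
tokens = ["警方", "正在", "详细", "调查", "事故", "原因"]
spans = [(0, 1, "ARG0"), (1, 2, "ARGM-TMP"), (2, 3, "ARGM-MNR"), (4, 6, "ARG1")]
tags = spans_to_iob(len(tokens), spans)
for word, tag in zip(tokens, tags):
    print(word, tag)
```

In this encoding, chunk identification and chunk classification reduce to predicting the tag sequence, and `iob_to_spans` recovers the labeled arguments from the predicted tags. A real system would add word and POS features and train a sequence model on top of this representation.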