Automatic discourse processing is one of the most challenging tasks in natural language processing (NLP), and it benefits many downstream tasks such as question answering, automatic summarization, and natural language generation. The recent release of the Penn Discourse Treebank (PDTB), a large-scale discourse corpus, provides a common platform for discourse research. Building on the PDTB, this paper proposes an end-to-end explicit discourse parser based on conditional random fields (CRFs). The parser consists of three components joined in a sequential pipeline: connective identification, explicit relation classification, and relation argument extraction. We report the performance of each component on the PDTB and, from an error-cascading perspective, analyze the overall performance of the complete parser in detail.
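The three-stage pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The paper's components are CRF-based. Here each stage is a hypothetical rule-based stand-in: the toy lexicon `CONNECTIVE_SENSES` and the functions `identify_connectives`, `classify_sense`, and `extract_arguments` are invented for illustration. The sketch shows only the wiring. Each stage consumes the previous stage's output, so a connective missed in the first stage can never be recovered downstream.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy lexicon mapping connectives to PDTB top-level senses.
# The paper uses CRF classifiers; these rules only illustrate the pipeline.
CONNECTIVE_SENSES = {
    "because": "Contingency",
    "but": "Comparison",
    "then": "Temporal",
}


@dataclass
class Relation:
    connective: str
    sense: str
    arg1: str
    arg2: str


def identify_connectives(tokens: List[str]) -> List[int]:
    """Stage 1 (stand-in): indices of tokens treated as discourse connectives."""
    return [i for i, tok in enumerate(tokens) if tok.lower() in CONNECTIVE_SENSES]


def classify_sense(connective: str) -> str:
    """Stage 2 (stand-in): assign a top-level sense to an identified connective."""
    return CONNECTIVE_SENSES[connective.lower()]


def extract_arguments(tokens: List[str], idx: int) -> Tuple[str, str]:
    """Stage 3 (stand-in): Arg1 = tokens before the connective, Arg2 = tokens after it."""
    arg1 = " ".join(tokens[:idx]).strip(" ,")
    arg2 = " ".join(tokens[idx + 1:]).strip(" .")
    return arg1, arg2


def parse(tokens: List[str]) -> List[Relation]:
    """Run the three stages in sequence; later stages see only earlier outputs."""
    relations = []
    for idx in identify_connectives(tokens):
        conn = tokens[idx]
        sense = classify_sense(conn)
        arg1, arg2 = extract_arguments(tokens, idx)
        relations.append(Relation(conn, sense, arg1, arg2))
    return relations


if __name__ == "__main__":
    print(parse("John stayed home because it rained .".split()))
    # "since" is absent from the toy lexicon, so stage 1 misses it and
    # stages 2 and 3 never run: the upstream error cascades to an empty output.
    print(parse("John stayed home since it rained .".split()))  # -> []
```

Because the stages run strictly in sequence, an error in an upstream stage propagates to every later stage. This is why end-to-end scores can fall well below the per-component scores measured with gold-standard input.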