Automatic Chinese sentence grouping divides a text into fragments that cover different topics, and it plays an important role in information extraction, summary generation, discourse comprehension, and many other fields. Coreference resolution is the process of identifying antecedents and anaphors in a text and linking them; resolving different expressions that refer to the same entity is one of the foundations of natural language understanding. Current work on sentence grouping focuses on locating the boundaries between topics; it rarely uses the coreference relations within the text for language understanding, and ambiguous reference can lead to incorrect grouping results. To address these problems, a method for automatic sentence grouping based on coreference resolution is proposed. The method starts by resolving coreference in the text, using a multi-layer filtering coreference resolution method suited to Chinese to obtain coreference chains, which removes the problems of different nouns denoting the same entity and of pronouns with unclear referents. Combining the coreference chain information with the cohesive markers of the text, a set of evaluation functions J based on multiple discriminant analysis (MDA) is designed and used in experiments to evaluate sentence grouping. The experimental results show that the proposed method groups sentences effectively, with the average correct segmentation rate Pμ improving by about 7%.
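The idea of combining coreference chains with cohesive markers can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the Jaccard overlap score, the `threshold` value, and the names `chain_overlap` and `group_sentences` are assumptions introduced only for illustration, and the MDA-based evaluation function J is not reproduced. The sketch assumes that coreference resolution has already assigned each sentence the set of chain IDs it mentions, and that each sentence carries a flag saying whether it opens with a cohesive marker.

```python
from typing import List, Set


def chain_overlap(a: Set[int], b: Set[int]) -> float:
    """Jaccard overlap between the coreference-chain IDs of two sentences."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_sentences(
    chains: List[Set[int]],
    has_marker: List[bool],
    threshold: float = 0.2,  # hypothetical cut-off, not taken from the paper
) -> List[List[int]]:
    """Group sentence indices; a boundary is placed before sentence i when it
    shares too few coreference chains with sentence i-1 and does not open
    with a cohesive marker."""
    if not chains:
        return []
    groups: List[List[int]] = [[0]]
    for i in range(1, len(chains)):
        linked = has_marker[i] or chain_overlap(chains[i - 1], chains[i]) >= threshold
        if linked:
            groups[-1].append(i)   # same sentence group as the previous sentence
        else:
            groups.append([i])     # start a new sentence group
    return groups


# Toy input: chain IDs per sentence, and whether each sentence opens with a marker.
chains = [{1}, {1, 2}, {3}, {3}, {4}]
has_marker = [False, False, False, False, True]
print(group_sentences(chains, has_marker))
```

In this toy input, sentences 0 and 1 share chain 1, sentence 2 introduces a new entity without a marker and so opens a new group, and sentence 4 shares no chain with sentence 3 but is attached to it because it opens with a cohesive marker.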