This paper systematically explores noun phrase anaphoricity determination for coreference resolution in both English and Chinese. Building on previous work, a rule-based method is first used to detect non-anaphors that are insensitive to context or follow obvious fixed patterns. Then, both flat feature-based and structured tree kernel-based methods are explored to identify context-sensitive non-anaphors. Finally, a composite kernel is used to combine the flat features with the structured ones to further improve performance. Experimental results on the ACE 2003 English corpus and the ACE 2005 Chinese corpus show that the proposed methods each have their own strengths and all perform well on anaphoricity determination. The resulting anaphoricity determination module is then applied to coreference resolution in both languages, and the experimental results show that proper anaphoricity determination can significantly improve coreference resolution performance in both English and Chinese.