Coreference resolution is an important subtask of information extraction. In recent years, many researchers have applied statistical machine learning methods to this task and made measurable progress. Background knowledge, an emerging research focus, is being exploited in more and more areas of natural language processing. This paper integrates multiple sources of background semantic knowledge as features in a pairwise-classification framework for coreference resolution: knowledge extracted from WordNet and Wikipedia, together with shallow semantic roles within the sentence, common textual patterns, and contextual features of the mentions to be resolved. A feature selection algorithm automatically chooses the best feature combination, and a maximum entropy model is compared against a support vector machine on the same feature set. Experimental results on the ACE dataset show that integrating the selected background semantic knowledge further improves coreference resolution.
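The pairwise (mention-pair) framework described above can be illustrated with a minimal sketch: each candidate pair of mentions is mapped to a feature vector and a binary classifier decides whether they corefer. The features here (string match, head match, distance) and the toy data are hypothetical stand-ins for the paper's WordNet/Wikipedia and semantic-role features, and a simple perceptron stands in for the maximum entropy and SVM models, purely for illustration.

```python
# Minimal sketch of mention-pair coreference classification.
# The features and training data below are illustrative assumptions,
# not the actual feature set or models used in the paper.

def features(m1, m2):
    """Feature vector for a mention pair: exact string match,
    head-word match, inverse token distance, and a bias term."""
    exact = 1.0 if m1["text"].lower() == m2["text"].lower() else 0.0
    head = 1.0 if m1["head"].lower() == m2["head"].lower() else 0.0
    dist = 1.0 / (1 + abs(m1["pos"] - m2["pos"]))
    return [exact, head, dist, 1.0]  # last entry is the bias

def train_perceptron(examples, epochs=20, lr=0.1):
    """Train a simple perceptron over mention-pair feature vectors.
    `examples` is a list of ((mention1, mention2), label) tuples."""
    w = [0.0] * 4
    for _ in range(epochs):
        for (m1, m2), y in examples:
            x = features(m1, m2)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if pred != y:
                for i in range(len(w)):
                    w[i] += lr * (y - pred) * x[i]
    return w

def coreferent(w, m1, m2):
    """Classify a mention pair as coreferent (True) or not (False)."""
    return sum(wi * xi for wi, xi in zip(w, features(m1, m2))) > 0

# Toy mentions: positives share a head word, negatives do not.
m_a = {"text": "Barack Obama", "head": "Obama", "pos": 0}
m_b = {"text": "Obama", "head": "Obama", "pos": 3}
m_c = {"text": "the city", "head": "city", "pos": 5}
m_d = {"text": "Obama", "head": "Obama", "pos": 7}

train = [((m_a, m_b), 1), ((m_a, m_c), 0), ((m_b, m_c), 0), ((m_b, m_d), 1)]
w = train_perceptron(train)
```

In the full system, each positive decision would link a mention to an antecedent, and the pairwise links are then merged into coreference chains; feature selection would operate over the richer background-knowledge features before the classifiers are compared.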