Traditional weakly supervised relation extraction research has mainly been confined to a single language. To fully exploit the complementarity between languages and reduce the need for large-scale training data, a bilingual co-training approach to relation classification is proposed. Starting from a small labeled corpus and a moderate amount of unlabeled data, bilingual views of relation instances are produced through machine translation and entity alignment, and co-training is then applied to obtain classification models for both languages. Experiments on the ACE RDC 2005 Chinese and English corpora show that bilingual co-training can simultaneously improve relation classification performance in both Chinese and English while reducing the amount of labeled training data required.
Traditional semi-supervised learning for relation extraction mainly focuses on monolingual resources and fails to take full advantage of the complementarity between languages to alleviate the need for large-scale annotated corpora. This paper proposes a bilingual co-training paradigm for relation classification. Given a small number of labeled instances and a large number of unlabeled instances in the two languages, the method derives two language views of each relation instance from machine translation and entity alignment, and then generates classification models for both languages by co-training. Experimental results on the ACE RDC 2005 Chinese and English corpora show that bilingual co-training can simultaneously improve relation classification in both languages while decreasing the amount of labeled training data required.
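To make the co-training loop concrete, the following is a minimal sketch rather than the authors' implementation. It assumes relation instances have already been encoded as fixed-length feature vectors in each language view, uses logistic regression as a stand-in classifier, and the function name `bilingual_co_training` and the `rounds` and `top_k` parameters are illustrative placeholders, not values from the paper.

```python
# Minimal sketch of bilingual co-training for relation classification.
# Assumptions (not from the paper): instances are pre-encoded feature
# vectors; logistic regression stands in for the actual classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression


def bilingual_co_training(X_zh, y_zh, X_en, y_en, U_zh, U_en,
                          rounds=20, top_k=50):
    """X_zh/y_zh, X_en/y_en: small labeled sets for each language view.
    U_zh, U_en: aligned unlabeled instances (row i of U_zh and U_en is the
    same relation instance seen through the Chinese and English views,
    obtained via machine translation and entity alignment)."""
    clf_zh = clf_en = None
    for _ in range(rounds):
        # Train one classifier per language view on its current labeled set.
        clf_zh = LogisticRegression(max_iter=1000).fit(X_zh, y_zh)
        clf_en = LogisticRegression(max_iter=1000).fit(X_en, y_en)
        if len(U_zh) == 0:
            break

        # Each classifier labels the shared unlabeled pool in its own view.
        p_zh = clf_zh.predict_proba(U_zh)
        p_en = clf_en.predict_proba(U_en)

        # Select the instances each view labels most confidently.
        pick_zh = np.argsort(-p_zh.max(axis=1))[:top_k]
        pick_en = np.argsort(-p_en.max(axis=1))[:top_k]

        # Add them, with predicted labels, to the *other* view's training
        # data, so the two languages teach each other.
        X_en = np.vstack([X_en, U_en[pick_zh]])
        y_en = np.concatenate(
            [y_en, clf_zh.classes_[p_zh[pick_zh].argmax(axis=1)]])
        X_zh = np.vstack([X_zh, U_zh[pick_en]])
        y_zh = np.concatenate(
            [y_zh, clf_en.classes_[p_en[pick_en].argmax(axis=1)]])

        # Remove the consumed instances from the unlabeled pool.
        used = np.union1d(pick_zh, pick_en)
        keep = np.setdiff1d(np.arange(len(U_zh)), used)
        U_zh, U_en = U_zh[keep], U_en[keep]

    return clf_zh, clf_en
```

The key design point reflected here is that confidently labeled instances from one language view are transferred to the training set of the other language, which is how the complementarity between the Chinese and English views is exploited during co-training.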