针对在中文资源的关系抽取中,由于中文长句句式复杂,句法特征提取难度大、准确度低等问题,提出了一种基于平行语料库的双语协同中文关系抽取方法。首先在中英双语平行语料库中的英文语料上利用英文成熟的句法分析工具,将得到依存句法特征用于英文关系抽取分类器的训练,然后与利用适合中文的n-gram特征在中文语料上训练的中文关系抽取分类器构成双语视图,最后再依靠标注映射后的平行语料库,将彼此高可靠性的语料加入对方训练语料进行双语协同训练,最终得到一个性能更好的中文关系抽取分类模型。通过对中文测试语料进行实验,结果表明该方法提高了基于弱监督方法的中文关系抽取性能,其F值提高了3.9个百分点。
In the relation extraction of Chinese resources, the long Chinese sentence style is complex, the syntactic feature extraction is very difficult, and its accuracy is low. A bilingual cooperative relation extraction method based on a parallel corpus was proposed to resolve these above problems. In a Chinese and English bilingual parallel corpus, the English relation extraction classification was trained by dependency syntactic features which obtained by mature syntax analytic tools of English, the Chinese relation extraction classification was trained by n-gram feature which is suitable for Chinese, then they constituted bilingual view. Finally, based on the annotated and mapped parallel corpus, the training corpus with high reliability of both classifications were added to each other for bilingual collaborative training, and a Chinese relation extraction classification model with better performance was acquired. Experimental results on Chinese test corpus show that the proposed method improves the performance of Chinese relation extraction method based on weak supervision, its F value is increased by 3.9 percentage points.