Verb subcategorization is a finer-grained classification of verbs according to their syntactic behavior: each subcategorization frame (SCF) consists of a head verb and a set of arguments. SCFs have been studied extensively within individual languages, such as English and Chinese, with good results, but cross-lingual work remains scarce. This paper presents a method, based on active learning, for automatically acquiring correspondences between the arguments of English and Chinese SCFs. Working on bilingual parallel corpora, the method requires almost no prior linguistic knowledge. We then integrate the acquired correspondences into a statistical machine translation (SMT) system. Experimental results show that the SMT system incorporating English-Chinese SCF argument correspondences significantly outperforms the baseline. This demonstrates that the automatically extracted correspondences are effective, and it points to a new research direction for SMT.