How to incorporate semantic and pragmatic information has long been one of the main difficulties in spoken language translation research. Dialog acts, which describe shallow discourse structure, have in recent years been applied in several types of translation systems. This paper first introduces dialog act theory and several well-known dialog-act-annotated spoken language corpora. Then, taking a phrase-based statistical translation system as the application and building on annotated corpora and automatic dialog act recognition, we propose three ways of applying dialog acts in the translation process. By automatically classifying dialog acts, our approach improves the consistency between the training data and the test data, between the development set and the test set, and between the source language and the target language. This improves translation performance, and the final translations more accurately reflect the dialog intention of the source language. Experiments on Chinese-to-English spoken language translation evaluation data show that adding dialog act information effectively improves the performance of the translation system.
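The abstract does not spell out the three application modes, but all of them rely on automatic dialog act classification. As a purely illustrative sketch of that step, the code below trains a toy multinomial naive Bayes classifier and uses it to split a parallel corpus into dialog-act-specific subsets. The classifier choice, the label set (`question`, `request`, `statement`), the function names `train_nb`, `classify`, and `partition_by_act`, and the pre-segmented Chinese examples are all hypothetical assumptions. They are not the authors' method or data.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled):
    """Train a toy multinomial naive Bayes dialog act classifier (hypothetical).
    labeled: list of (tokens, act) pairs."""
    act_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, act in labeled:
        act_counts[act] += 1
        word_counts[act].update(tokens)
        vocab.update(tokens)
    return act_counts, word_counts, vocab

def classify(model, tokens):
    """Return the act with the highest log posterior (add-one smoothing)."""
    act_counts, word_counts, vocab = model
    total = sum(act_counts.values())
    best_act, best_score = None, -math.inf
    for act, n in act_counts.items():
        score = math.log(n / total)  # log prior P(act)
        denom = sum(word_counts[act].values()) + len(vocab)
        for w in tokens:
            # Laplace-smoothed log likelihood P(w | act); unseen words get count 0 + 1
            score += math.log((word_counts[act][w] + 1) / denom)
        if score > best_score:
            best_act, best_score = act, score
    return best_act

def partition_by_act(model, pairs):
    """Group (source, target) pairs by the act predicted for the source side."""
    groups = defaultdict(list)
    for src, tgt in pairs:
        groups[classify(model, src.split())].append((src, tgt))
    return dict(groups)

# Hypothetical, pre-segmented Chinese training utterances with dialog act labels.
seed = [
    ("请 帮 我", "request"),
    ("请 给 我 一 杯 水", "request"),
    ("车站 在 哪里", "question"),
    ("你 有 房间 吗", "question"),
    ("我 预订 了 房间", "statement"),
    ("房间 很 好", "statement"),
]
model = train_nb([(s.split(), a) for s, a in seed])

pairs = [("银行 在 哪里", "Where is the bank?"), ("请 帮 我", "Please help me.")]
groups = partition_by_act(model, pairs)
print(groups)
```

Splitting the corpus by predicted act is one way such labels could be used to pair test utterances with training or tuning data of the same act. The abstract does not say whether the paper uses this particular scheme.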