The translation of named entities and terms has an increasingly recognised impact on the performance of natural language processing and machine translation systems, yet existing bilingual dictionaries rarely provide sufficient coverage of such translations. This paper proposes a method for automatically acquiring high-quality phrase translation pairs for named entities from web pages and, for the first time, explores automatically filling in the parts of bilingual text for which alignments are missing. The method exploits the characteristics of bilingual translation pairs in web pages and uses a statistical discriminative model that combines multiple recognition features to automatically mine bilingual phrase translation triples from websites. Experimental results show that the model processes bilingual named-entity translation pairs efficiently, reaching an accuracy of 95.6%.