Automatic word alignment between Chinese and Vietnamese is difficult because the two languages differ considerably in syntax and structure. To address this problem, we propose a Chinese-Vietnamese word alignment method based on a deep neural network (DNN). The method first converts the words of both languages into word embeddings, which serve as the input to the DNN. It then adjusts and extends the HMM model and incorporates context information to build a DNN-HMM word alignment model. The HMM and IBM4 models serve as the baseline models in the experiments. Results on a large-scale Chinese-Vietnamese word alignment task show that the method clearly improves precision and recall over both baselines and greatly reduces the word alignment error rate.
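The three evaluation measures named in the abstract can be illustrated with a short sketch. The abstract does not say which definitions were used, so the code below assumes the commonly used formulation with "sure" and "possible" gold links; the function name `alignment_scores`, the `Link` type, and the toy data are hypothetical and not taken from the paper.

```python
from typing import Optional, Set, Tuple

# A link pairs a source-word index with a target-word index (assumed representation).
Link = Tuple[int, int]


def alignment_scores(
    hypothesis: Set[Link],
    sure: Set[Link],
    possible: Optional[Set[Link]] = None,
) -> Tuple[float, float, float]:
    """Return (precision, recall, AER) for one set of predicted links.

    Assumed definitions, with A = hypothesis, S = sure, P = possible:
      precision = |A & P| / |A|
      recall    = |A & S| / |S|
      AER       = 1 - (|A & S| + |A & P|) / (|A| + |S|)
    """
    if not hypothesis or not sure:
        raise ValueError("hypothesis and sure link sets must be non-empty")
    # Every sure link also counts as possible; with no separate P, use P = S.
    possible = sure if possible is None else (possible | sure)

    hits_sure = len(hypothesis & sure)
    hits_possible = len(hypothesis & possible)

    precision = hits_possible / len(hypothesis)
    recall = hits_sure / len(sure)
    aer = 1.0 - (hits_sure + hits_possible) / (len(hypothesis) + len(sure))
    return precision, recall, aer


# Toy example: three predicted links, three sure links, one extra possible link.
gold_sure = {(0, 0), (1, 1), (2, 2)}
gold_possible = gold_sure | {(2, 3)}
predicted = {(0, 0), (1, 1), (2, 3)}

precision, recall, aer = alignment_scores(predicted, gold_sure, gold_possible)
print(f"precision={precision:.3f} recall={recall:.3f} AER={aer:.3f}")
```

In this toy case every predicted link is at least possible, so precision is 1.0, while one sure link is missed, so recall is 2/3 and AER is 1/6. A lower AER indicates better alignment, which is why the abstract reports it as decreasing while the other two measures increase.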