This paper proposes a transformation-and-mapping approach to automatically extracting semantic elements (SEs) from an English-Chinese bilingual parallel corpus. The approach has three stages. First, a rule system processes the link grammar parse that the link parser produces for each English sentence, assigning every word a semantic level and transforming the parse into an English semantic element representation (ESER). Second, statistical bilingual word alignment maps the ESER onto a Chinese semantic element representation (CSER), yielding bilingual semantic element representations (SERs). Finally, the automatically extracted results are optimized through a series of strategies, including competition among SERs, merging, and recall of unaligned word pairs. The approach can therefore extract SEs from a bilingual parallel corpus without a complete semantic analysis. Experiments show that bilingual SE extraction reaches an F-measure of 74.06%, and that an SE-based machine translation system achieves high accuracy.
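The mapping stage described above, which carries an English representation over to Chinese through word alignment, can be illustrated with a minimal sketch. This is not the authors' implementation: the data model (each semantic element as a set of English token indices, the alignment as a set of index pairs), the function names `project_element` and `map_eser_to_cser`, and the example sentence are all assumptions made for illustration only.

```python
from typing import List, Set, Tuple

# Hypothetical data model (an assumption, not taken from the paper):
# an alignment is a set of (english_index, chinese_index) pairs, and
# a semantic element is a set of English token indices.
Alignment = Set[Tuple[int, int]]


def project_element(english_indices: Set[int], alignment: Alignment) -> List[int]:
    """Return the sorted Chinese token indices aligned to the given English tokens."""
    return sorted({c for e, c in alignment if e in english_indices})


def map_eser_to_cser(eser: List[Set[int]], alignment: Alignment) -> List[List[int]]:
    """Project every English element; an element with no aligned word maps to []."""
    return [project_element(element, alignment) for element in eser]


# Illustrative sentence pair: "He gave up smoking" / "他 戒 了 烟"
english = ["He", "gave", "up", "smoking"]
chinese = ["他", "戒", "了", "烟"]
alignment = {(0, 0), (1, 1), (2, 1), (3, 3)}  # "了" (index 2) is left unaligned
eser = [{0}, {1, 2}, {3}]                     # "gave up" forms one element

cser = map_eser_to_cser(eser, alignment)
print(cser)  # [[0], [1], [3]]
```

In this sketch the unaligned Chinese token 了 belongs to no projected element. Gaps of this kind are presumably what the recall strategy for unaligned word pairs is meant to address, although the abstract does not spell out how.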