Automatically mining term translations from the Web without the aid of any dictionary is an interesting and challenging task. This paper presents an automatic term translation method based on a partially parallel Web corpus. Starting from a single seed term pair, the method first applies Web mining techniques to obtain candidate matching patterns. When a user requests the translation of a source term, it uses the previously acquired patterns to extract a set of candidate answers, ranks the candidates with a scoring function, and returns the results to the user in a formatted form. The scoring function, which evaluates the reliability of each candidate answer, is constructed from three heuristic rules. Experimental results indicate that the scoring function objectively reflects the differing importance of the matching patterns, and that the proposed method reliably finds correct translations of source terms and outperforms existing systems.
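The pipeline summarized above (learn patterns from a seed pair, extract candidates, rank them) can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper. The function names `learn_patterns` and `rank_candidates`, the in-memory snippet lists standing in for Web search results, the restriction to English-letter candidates, and the frequency-based weighting are all assumptions made for illustration. In particular, the weighting is a simple placeholder and does not reproduce the three heuristic rules behind the paper's scoring function.

```python
import re
from collections import Counter, defaultdict


def learn_patterns(seed_src, seed_tgt, snippets, max_gap=6):
    """Count the short separator strings that appear between the seed pair.

    Each separator (for example "(" or ",即") is treated as one matching
    pattern; its count is later used as a crude importance weight.
    """
    gap = "(.{1,%d}?)" % max_gap  # short, non-greedy separator
    rx = re.compile(re.escape(seed_src) + gap + re.escape(seed_tgt))
    patterns = Counter()
    for text in snippets:
        for m in rx.finditer(text):
            patterns[m.group(1)] += 1
    return patterns


def rank_candidates(term, snippets, patterns):
    """Extract candidates that follow `term` plus a learned separator.

    Assumption: the target language is English, so a candidate is a run of
    Latin letters, spaces, and hyphens. Each hit adds the relative
    frequency of the pattern that produced it (placeholder scoring).
    """
    total = sum(patterns.values())
    scores = defaultdict(float)
    for sep, count in patterns.items():
        rx = re.compile(
            re.escape(term) + re.escape(sep) + r"([A-Za-z][A-Za-z \-]*[A-Za-z])"
        )
        for text in snippets:
            for m in rx.finditer(text):
                scores[m.group(1).lower()] += count / total
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical snippets standing in for Web search results.
seed_snippets = [
    "数据挖掘(data mining)是一门学科",
    "数据挖掘(data mining)技术",
    "数据挖掘,即data mining",
]
query_snippets = [
    "机器学习(machine learning)是人工智能的分支",
    "机器学习(Machine Learning)方法",
    "机器学习,即statistical learning的一种",
]

patterns = learn_patterns("数据挖掘", "data mining", seed_snippets)
ranked = rank_candidates("机器学习", query_snippets, patterns)
for candidate, score in ranked:
    print(f"{candidate}\t{score:.2f}")
```

In this toy setting, the parenthesis pattern is seen twice for the seed pair and the "即" pattern once, so candidates found through the parenthesis pattern receive a larger weight. This mirrors, in a simplified way, the idea that different matching patterns carry different importance.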