
Mining Chinese-Khmer Named Entity Equivalents from Comparable Corpora Based on Feature Similarity

ISSN: 1672-9722
Journal: Computer & Digital Engineering (《计算机与数字工程》)
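Classification: TP391.1 [Automation and Computer Technology / Computer Application Technology; Automation and Computer Technology / Computer Science and Technology]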
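Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; [2] Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500; [3] School of Southeast Asian and South Asian Languages and Cultures, Yunnan Minzu University, Kunming 650500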
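Funding: Supported by the National Natural Science Foundation of China projects "Research on Khmer Named Entity Recognition and Chinese-Khmer Bilingual Corpus Construction Methods" (Grant No. 61462055) and "Research on Key Technologies for Extracting Vietnamese News Event Elements Based on Discourse Features" (Grant No. 61562049).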

Authors: Xu Lu (徐璐) [1,2], Yan Xin (严馨) [1,2], Xia Qing (夏青) [1,2], Zhou Feng (周枫) [1,2], Mo Yuanyuan (莫源源) [3]

Keywords: named entity equivalents, Chinese-Khmer bilingual, multi-feature fusion, comparable corpus, transliteration model

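Chinese abstract (translated):

Named entity translation equivalents are highly valuable in cross-language information processing. However, because language resources are limited, methods for extracting Chinese-Khmer named entity equivalents have not yet been studied in depth, either in China or abroad. Starting from comparable corpus texts, and taking into account the characteristics of each entity type and how entities behave in comparable corpora, the paper selects four features: a transliteration feature and a translation feature from Khmer named entities to Chinese named entities, the context feature of named entities in the comparable corpus, and the length feature of the entities themselves. It then proposes a method that fuses these features to compute similarity and uses it to mine Chinese-Khmer bilingual named entity equivalents. Experiments show that the method performs well: for person-name entity pairs it reaches a precision of 76% and a recall of 66%, demonstrating that it outperforms methods that use only a single feature.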

English abstract:

Named entity translation equivalents play a significant role in cross-language information processing. However, because corpus resources are limited, few in-depth studies have addressed the extraction of Chinese-Khmer bilingual named entity equivalents. Starting from comparable corpus text, and based on the characteristics of each entity type and of the comparable corpus, the paper selects four features of Chinese-Khmer named entity pairs: a transliteration feature, a translation feature, a context feature, and a length feature. A method based on multi-feature fusion is then proposed to compute similarity and mine Chinese-Khmer bilingual named entity equivalents. Experiments show that acquiring equivalents through this feature-similarity computation performs well, and that the proposed method outperforms methods that use only a single feature.
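The abstracts name four similarity features but do not state how they are fused. The Python sketch below illustrates one plausible scheme under explicit assumptions: each feature yields a score in [0, 1], and the scores are combined by a weighted linear sum, followed by a thresholded best-match selection per Khmer entity. The weights, the length ratio, the threshold, the toy lexicon, the precomputed transliteration/translation scores, and every function name here are hypothetical illustrations, not the paper's actual method or values.

```python
import math
from collections import Counter

# Hypothetical weights for the four features named in the abstract; the paper's
# actual fusion formula and weight values are not given there.
WEIGHTS = {"translit": 0.4, "transl": 0.3, "context": 0.2, "length": 0.1}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words context vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_context(km_words: list[str], lexicon: dict[str, str]) -> Counter:
    """Project Khmer context words into Chinese via a (hypothetical) bilingual lexicon; unknown words are dropped."""
    return Counter(lexicon[w] for w in km_words if w in lexicon)


def length_sim(len_zh: int, len_km: int, ratio: float = 1.0) -> float:
    """1 minus the relative length difference, after scaling the Khmer length by an assumed ratio."""
    expected = len_km * ratio
    longest = max(len_zh, expected)
    return 1.0 - abs(len_zh - expected) / longest if longest else 1.0


def fused_score(features: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted linear combination of per-feature similarities (missing features count as 0)."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def best_matches(scores: dict[tuple[str, str], float], threshold: float = 0.5) -> dict[str, str]:
    """For each Khmer entity, keep the highest-scoring Chinese entity whose fused score clears the threshold."""
    best: dict[str, tuple[str, float]] = {}
    for (km, zh), s in scores.items():
        if s >= threshold and (km not in best or s > best[km][1]):
            best[km] = (zh, s)
    return {km: zh for km, (zh, _) in best.items()}


if __name__ == "__main__":
    # Toy data: placeholder Khmer tokens, and transliteration/translation scores
    # assumed to come from separate (unspecified) models.
    lexicon = {"km_w1": "总理", "km_w2": "访问"}
    zh_ctx = Counter(["总理", "访问", "北京"])
    km_ctx = map_context(["km_w1", "km_w2", "km_w3"], lexicon)
    feats = {
        "translit": 0.8,
        "transl": 0.0,
        "context": cosine(zh_ctx, km_ctx),
        "length": length_sim(2, 2),
    }
    score = fused_score(feats)
    print(round(score, 3))
    print(best_matches({("km_A", "zh_A"): score, ("km_A", "zh_B"): 0.3}))
```

A linear sum keeps each feature's contribution easy to inspect and tune; a learned combiner (for example, a classifier over the four scores) is an equally plausible alternative that the abstract neither confirms nor rules out.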
