Research on a Word-Vector-Based Method for Tibetan Part-of-Speech Tagging

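ISSN: 1003-0077
Journal: 《中文信息学报》 (Journal of Chinese Information Processing)
Time: 0
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Department of Computer Science and Technology, Tibet University, Lhasa, Tibet 850000; [2] School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 610031
Funding: National Natural Science Foundation of China (61262058); National Social Science Fund of China (15ZDB11); Tibet University Young Teachers Innovation Support Program (QC2005_18); Plateau Scholars Program (珠杰)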

Keywords: word vectors (distributed representation), Tibetan, part-of-speech (POS) tagging

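Chinese abstract:

Tibetan part-of-speech tagging is a foundation of Tibetan information processing and is widely used in Tibetan text classification, automatic retrieval, machine translation, and other areas. To address the scarcity of Tibetan corpora and the time and labor cost of manual annotation, this paper proposes a part-of-speech tagging method based on a word vector model, together with the corresponding algorithms. The method first uses the semantic similarity computation that word vectors provide to expand the tagging dictionary; it then combines semantic similarity computation with the tagging dictionary to complete part-of-speech tagging. Experimental results show that the method expands the tagging dictionary quickly and effectively and achieves good tagging results.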

English abstract:

Part-of-speech (POS) tagging is fundamental to Tibetan language processing, with wide applications in Tibetan text classification, information retrieval, machine translation, and other fields. This paper proposes a method for Tibetan POS tagging based on distributed representations. First, the method extends the tagging dictionary through semantic approximation over the distributed representations. Then POS tagging is completed using the dictionary together with semantic similarity. Experimental results show that the method can expand the dictionary and achieve good tagging results.
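The two steps described in the abstract (dictionary expansion by semantic similarity, then dictionary-plus-similarity tagging) can be illustrated with a minimal sketch. The abstract does not give the paper's actual algorithm, so everything below is an assumption made for illustration: cosine similarity as the similarity measure, the `threshold` value, the nearest-neighbor fallback rule, the tag names `N`/`V`/`UNK`, and the toy two-dimensional vectors with placeholder tokens.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_tagged(word, vectors, tag_dict):
    """Return (similarity, tag) of the dictionary word most similar to `word`."""
    best = (-1.0, None)
    for known, tag in tag_dict.items():
        if known == word or known not in vectors:
            continue
        sim = cosine(vectors[word], vectors[known])
        if sim > best[0]:
            best = (sim, tag)
    return best


def expand_dictionary(vectors, seed_dict, threshold=0.9):
    """Step 1 (assumed form): copy a seed word's tag to untagged words
    whose vectors are at least `threshold` similar to it."""
    expanded = dict(seed_dict)
    for word in vectors:
        if word in seed_dict:
            continue
        sim, tag = nearest_tagged(word, vectors, seed_dict)
        if tag is not None and sim >= threshold:
            expanded[word] = tag
    return expanded


def tag_sentence(tokens, vectors, tag_dict, unknown="UNK"):
    """Step 2 (assumed form): dictionary lookup first, then fall back to
    the tag of the most similar dictionary word, else `unknown`."""
    tags = []
    for tok in tokens:
        if tok in tag_dict:
            tags.append(tag_dict[tok])
        elif tok in vectors:
            _, tag = nearest_tagged(tok, vectors, tag_dict)
            tags.append(tag or unknown)
        else:
            tags.append(unknown)
    return tags


# Toy data: hypothetical tokens and vectors, not taken from the paper.
vectors = {
    "noun_a": [1.0, 0.0],
    "noun_b": [0.95, 0.05],
    "verb_a": [0.0, 1.0],
    "verb_b": [0.1, 0.9],
    "mixed": [0.6, 0.5],
}
seed = {"noun_a": "N", "verb_a": "V"}

expanded = expand_dictionary(vectors, seed, threshold=0.9)
tags = tag_sentence(["noun_b", "mixed", "verb_b", "oov"], vectors, expanded)
print(expanded)
print(tags)
```

In this toy run the seed dictionary of two words grows to four, while the ambiguous token `mixed` stays out of the dictionary because it falls below the threshold and is only tagged at lookup time through the fallback. In practice the vectors would come from a word embedding model trained on an unlabeled Tibetan corpus.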
