A Multi-Strategy Fusion Method for Uyghur Stem Extraction

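ISSN: 1003-0077
Journal: Journal of Chinese Information Processing (中文信息学报)
Date: September 15, 2015
Pages: 204-210
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046; [2] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
Funding: National Natural Science Foundation of China (61163032)
Related project: Research on Uyghur Morpheme Structure Rules and Their Applications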

Keywords: Uyghur, morphology, stem segmentation, N-gram model, part of speech, context information

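Chinese abstract (translated):

Uyghur is an agglutinative language with complex morphological variation, and segmenting Uyghur words into stems and affixes is very important for Uyghur information processing. To date, however, the performance of Uyghur stem extraction still leaves considerable room for improvement. Taking the N-gram model as the basic framework and following the word-formation constraints of Uyghur, this paper proposes a Uyghur stem extraction model that fuses part-of-speech features with contextual stem information. Experimental results show that part-of-speech features and contextual stem information significantly improve the accuracy of Uyghur stem extraction: compared with the baseline system, the experiments that incorporate part-of-speech features and contextual stem information reach accuracies of 95.19% and 96.60%, respectively.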

English abstract:

Uyghur is an agglutinative language with complex morphology, and stem segmentation of Uyghur words plays an important role in Uyghur language information processing. So far, however, the performance of Uyghur stem segmentation still has much room for improvement. Following the constraints of Uyghur word formation, we propose a stem segmentation model for Uyghur, based on the N-gram model, that fuses the part-of-speech feature and contextual stem information. Experimental results show that the part-of-speech feature and the contextual stem information significantly improve the performance of Uyghur stem segmentation over the baseline system, with accuracy reaching 95.19% and 96.60%, respectively.
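The abstract gives no formulas, so the following is only a minimal sketch of how an N-gram stemmer might combine the three ingredients it names: word-formation constraints, a part-of-speech feature, and the stem of the preceding word. It is not the authors' model. The lexicon, suffix counts, bigram counts, and all function names are hypothetical toy values in Latin transliteration, chosen for illustration rather than taken from the paper or from verified linguistic data. Each candidate split is scored as P(stem | previous stem) with add-one smoothing, multiplied by P(suffix | part of speech of the stem).

```python
import math
from collections import defaultdict

# Hypothetical toy lexicon: stem -> part of speech (assumption, not from the paper).
STEM_POS = {"at": "N", "atla": "V", "kitab": "N", "oqu": "V"}

# Hypothetical counts of (POS, suffix) pairs. A suffix that never occurs with a
# POS is treated as forbidden, standing in for word-formation constraints.
SUFFIX_COUNTS = {
    ("N", ""): 4, ("N", "lar"): 3, ("N", "ni"): 3,
    ("V", ""): 4, ("V", "r"): 2, ("V", "di"): 4,
}

# Hypothetical counts of (previous stem, stem) pairs from a segmented corpus.
START = "<s>"
BIGRAM_COUNTS = {
    (START, "at"): 4, (START, "kitab"): 6,
    ("kitab", "oqu"): 5, ("at", "atla"): 1,
}

# Marginal totals needed to turn the counts into probabilities.
PREV_TOTALS = defaultdict(int)
for (prev, _stem), n in BIGRAM_COUNTS.items():
    PREV_TOTALS[prev] += n
POS_TOTALS = defaultdict(int)
for (pos, _suffix), n in SUFFIX_COUNTS.items():
    POS_TOTALS[pos] += n


def candidates(word):
    """Yield every (stem, suffix) split allowed by the lexicon and constraints."""
    for i in range(1, len(word) + 1):
        stem, suffix = word[:i], word[i:]
        pos = STEM_POS.get(stem)
        if pos is not None and (pos, suffix) in SUFFIX_COUNTS:
            yield stem, suffix


def score(prev_stem, stem, suffix):
    """Log of P(stem | prev_stem) * P(suffix | POS of stem)."""
    bigram = BIGRAM_COUNTS.get((prev_stem, stem), 0)
    # Add-one smoothing over the stem vocabulary.
    p_stem = (bigram + 1) / (PREV_TOTALS.get(prev_stem, 0) + len(STEM_POS))
    pos = STEM_POS[stem]
    p_suffix = SUFFIX_COUNTS[(pos, suffix)] / POS_TOTALS[pos]
    return math.log(p_stem) + math.log(p_suffix)


def segment(word, prev_stem=START):
    """Return the best-scoring (stem, suffix), or (word, "") if nothing matches."""
    options = list(candidates(word))
    if not options:
        return word, ""
    return max(options, key=lambda c: score(prev_stem, c[0], c[1]))


def stem_sentence(words):
    """Segment left to right, feeding each chosen stem to the next word."""
    prev, result = START, []
    for word in words:
        stem, suffix = segment(word, prev)
        result.append((stem, suffix))
        prev = stem
    return result


print(segment("atlar"))                      # ('at', 'lar')
print(segment("atlar", prev_stem="at"))      # ('atla', 'r')
print(stem_sentence(["kitabni", "oqudi"]))   # [('kitab', 'ni'), ('oqu', 'di')]
```

In this toy setup the ambiguous form "atlar" is split differently depending on the preceding stem, which is the kind of effect contextual stem information is meant to capture. A real system would estimate the counts from an annotated corpus and would likely use a more careful smoothing scheme.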
